Changeset 217456 in webkit


Timestamp: May 25, 2017 4:11:24 PM (7 years ago)
Author: msaboff@apple.com
Message:

bmalloc: scavenger runs too much on JetStream
https://bugs.webkit.org/show_bug.cgi?id=172373

Reviewed by Geoffrey Garen.

Instruments says that JetStream on macOS spends about 3% of its time in
madvise.

In <https://bugs.webkit.org/show_bug.cgi?id=160098>, Ben saw some
evidence that madvise was the reason that switching to bmalloc for
DFG::Node allocations was a slowdown the first time around.

In <https://bugs.webkit.org/show_bug.cgi?id=172124>, Michael saw that
scavenging policy can affect JetStream.

Intuitively, it seems wrong for the heap to idle shrink during hardcore
benchmarking.

The strategy here is to back off in response to any heap growth event,
and to wait 2s instead of 0.5s for heap growth to take place -- but we
scavenge immediately in response to critical memory pressure, to avoid
jetsam.

One hole in this strategy is that a workload with a perfectly
unfragmented heap that allocates and deallocates ~16kB every 2s will
never shrink its heap. This doesn't seem to be a problem in practice.
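
As a quick illustration, the scavenger's wake-up handler now boils down to the
following (condensed from the Heap::concurrentScavenge change in the diff
below; the Darwin-only QOS call is omitted, and comments are added here):

    void Heap::concurrentScavenge()
    {
        std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());

        // Back off: if the heap has grown since the last pass and we are not
        // under critical memory pressure, skip this pass and re-arm the 2s
        // timer (asyncTaskSleepDuration) instead of scavenging.
        if (m_isGrowing && !isUnderMemoryPressure()) {
            m_isGrowing = false;
            m_scavenger.runSoon();
            return;
        }

        scavenge(lock);
    }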

This looks like a 2% - 4% speedup on JetStream on Mac Pro and MacBook Air.

  • bmalloc/AsyncTask.h:

(bmalloc::AsyncTask::willRun):
(bmalloc::AsyncTask::willRunSoon):
(bmalloc::Function>::AsyncTask):
(bmalloc::Function>::run):
(bmalloc::Function>::runSoon):
(bmalloc::Function>::threadRunLoop):
(bmalloc::Function>::runSlowCase): Deleted. Added a "run soon" state
so that execution delay is modeled directly instead of implicitly
through sleep events. This enables the Heap to issue a "run now" event
at any moment in response to memory pressure.
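
Condensed from the AsyncTask.h diff below (template boilerplate omitted,
explanatory comments added), the new state machine looks like this:

    enum class State { Sleep, Run, RunSoon };

    void run()
    {
        // Request immediate execution.
        m_state = State::Run;

        std::lock_guard<Mutex> lock(m_conditionMutex);
        m_condition.notify_all();
    }

    void runSoon()
    {
        // Request delayed execution; the delay is an explicit state rather
        // than an implicit sleep.
        m_state = State::RunSoon;

        std::lock_guard<Mutex> lock(m_conditionMutex);
        m_condition.notify_all();
    }

    void threadRunLoop()
    {
        while (1) {
            // Asleep: block until run() or runSoon() changes the state.
            if (m_state == State::Sleep) {
                std::unique_lock<Mutex> lock(m_conditionMutex);
                m_condition.wait(lock, [&]() { return m_state != State::Sleep; });
            }

            // Run soon: wait out asyncTaskSleepDuration (2s), or less if the
            // request is upgraded to State::Run in the meantime.
            if (m_state == State::RunSoon) {
                std::unique_lock<Mutex> lock(m_conditionMutex);
                m_condition.wait_for(lock, asyncTaskSleepDuration, [&]() { return m_state != State::RunSoon; });
            }

            m_state = State::Sleep;
            (m_object.*m_function)();
        }
    }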

  • bmalloc/Heap.cpp:

(bmalloc::Heap::Heap): Don't call into our own API -- that's a layering
violation.

(bmalloc::Heap::updateMemoryInUseParameters): No need for
m_scavengeSleepDuration anymore.

(bmalloc::Heap::concurrentScavenge): Added a back-off policy when the
heap is growing.
(bmalloc::Heap::scavenge):

(bmalloc::Heap::scavengeSmallPages):
(bmalloc::Heap::scavengeLargeObjects): Don't try to give up in the middle
of a scavenge event. Our new backoff policy supplants that design. Also,
it's easier to profile and understand scavenging behavior if it always
runs to completion once started.

(bmalloc::Heap::scheduleScavenger):
(bmalloc::Heap::scheduleScavengerIfUnderMemoryPressure): Added a
synchronous amortized check for memory pressure. This check has the
benefit that it runs immediately during high rates of heap activity,
so we can detect memory pressure right away and wake the scavenger
instead of waiting for the scavenger to wake up.
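
For reference, the two scheduling entry points, condensed from the Heap.cpp
diff below with comments added:

    void Heap::scheduleScavengerIfUnderMemoryPressure(size_t bytes)
    {
        // Amortize the cost: only consult the memory-pressure heuristic once
        // per scavengerBytesPerMemoryPressureCheck (16MB) of heap traffic.
        m_scavengerBytes += bytes;
        if (m_scavengerBytes < scavengerBytesPerMemoryPressureCheck)
            return;

        m_scavengerBytes = 0;

        if (m_scavenger.willRun())
            return;

        if (!isUnderMemoryPressure())
            return;

        // Under pressure: wake the scavenger immediately rather than waiting
        // for its timer to fire.
        m_isGrowing = false;
        m_scavenger.run();
    }

    void Heap::scheduleScavenger(size_t bytes)
    {
        scheduleScavengerIfUnderMemoryPressure(bytes);

        if (m_scavenger.willRunSoon())
            return;

        m_isGrowing = false;
        m_scavenger.runSoon();
    }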

(bmalloc::Heap::allocateSmallPage):
(bmalloc::Heap::deallocateSmallLine):
(bmalloc::Heap::splitAndAllocate):
(bmalloc::Heap::tryAllocateLarge):
(bmalloc::Heap::shrinkLarge):
(bmalloc::Heap::deallocateLarge):

  • bmalloc/Heap.h:

(bmalloc::Heap::isUnderMemoryPressure):

  • bmalloc/Sizes.h:
  • bmalloc/VMHeap.h:

(bmalloc::VMHeap::deallocateSmallPage):

  • bmalloc/bmalloc.h:

(bmalloc::api::scavenge): Updated for API changes above.

Location: trunk/Source/bmalloc
Files: 7 edited

  • trunk/Source/bmalloc/ChangeLog

    (r217000 → r217456)

    + 2017-05-25  Geoffrey Garen  <ggaren@apple.com> and Michael Saboff  <msaboff@apple.com>
    +
    +     bmalloc: scavenger runs too much on JetStream
    +     https://bugs.webkit.org/show_bug.cgi?id=172373
    +
    +     Reviewed by Geoffrey Garen.
    +
    +     Instruments says that JetStream on macOS spends about 3% of its time in
    +     madvise.
    +
    +     In <https://bugs.webkit.org/show_bug.cgi?id=160098>, Ben saw some
    +     evidence that madvise was the reason that switching to bmalloc for
    +     DFG::Node allocations was a slowdown the first time around.
    +
    +     In <https://bugs.webkit.org/show_bug.cgi?id=172124>, Michael saw that
    +     scavenging policy can affect JetStream.
    +
    +     Intuitively, it seems wrong for the heap to idle shrink during hardcore
    +     benchmarking.
    +
    +     The strategy here is to back off in response to any heap growth event,
    +     and to wait 2s instead of 0.5s for heap growth to take place -- but we
    +     scavenge immediately in response to critical memory pressure, to avoid
    +     jetsam.
    +
    +     One hole in this strategy is that a workload with a perfectly
    +     unfragmented heap that allocates and deallocates ~16kB every 2s will
    +     never shrink its heap. This doesn't seem to be a problem in practice.
    +
    +     This looks like a 2% - 4% speedup on JetStream on Mac Pro and MacBook Air.
    +
    +     * bmalloc/AsyncTask.h:
    +     (bmalloc::AsyncTask::willRun):
    +     (bmalloc::AsyncTask::willRunSoon):
    +     (bmalloc::Function>::AsyncTask):
    +     (bmalloc::Function>::run):
    +     (bmalloc::Function>::runSoon):
    +     (bmalloc::Function>::threadRunLoop):
    +     (bmalloc::Function>::runSlowCase): Deleted. Added a "run soon" state
    +     so that execution delay is modeled directly instead of implicitly
    +     through sleep events. This enables the Heap to issue a "run now" event
    +     at any moment in response to memory pressure.
    +
    +     * bmalloc/Heap.cpp:
    +     (bmalloc::Heap::Heap): Don't call into our own API -- that's a layering
    +     violation.
    +
    +     (bmalloc::Heap::updateMemoryInUseParameters): No need for
    +     m_scavengeSleepDuration anymore.
    +
    +     (bmalloc::Heap::concurrentScavenge): Added a back-off policy when the
    +     heap is growing.
    +     (bmalloc::Heap::scavenge):
    +
    +     (bmalloc::Heap::scavengeSmallPages):
    +     (bmalloc::Heap::scavengeLargeObjects): Don't try to give up in the middle
    +     of a scavenge event. Our new backoff policy supplants that design. Also,
    +     it's easier to profile and understand scavenging behavior if it always
    +     runs to completion once started.
    +
    +     (bmalloc::Heap::scheduleScavenger):
    +     (bmalloc::Heap::scheduleScavengerIfUnderMemoryPressure): Added a
    +     synchronous amortized check for memory pressure. This check has the
    +     benefit that it runs immediately during high rates of heap activity,
    +     so we can detect memory pressure right away and wake the scavenger
    +     instead of waiting for the scavenger to wake up.
    +
    +     (bmalloc::Heap::allocateSmallPage):
    +     (bmalloc::Heap::deallocateSmallLine):
    +     (bmalloc::Heap::splitAndAllocate):
    +     (bmalloc::Heap::tryAllocateLarge):
    +     (bmalloc::Heap::shrinkLarge):
    +     (bmalloc::Heap::deallocateLarge):
    +     * bmalloc/Heap.h:
    +     (bmalloc::Heap::isUnderMemoryPressure):
    +     * bmalloc/Sizes.h:
    +     * bmalloc/VMHeap.h:
    +     (bmalloc::VMHeap::deallocateSmallPage):
    +     * bmalloc/bmalloc.h:
    +     (bmalloc::api::scavenge): Updated for API changes above.
    +
      2017-05-17  Michael Saboff  <msaboff@apple.com>
  • trunk/Source/bmalloc/bmalloc/AsyncTask.h

    (r208562 → r217456)

      #include "Inline.h"
      #include "Mutex.h"
    + #include "Sizes.h"
      #include <atomic>
      #include <condition_variable>
    ...
          AsyncTask(Object&, const Function&);
          ~AsyncTask();

    +     bool willRun() { return m_state == State::Run; }
          void run();

    +     bool willRunSoon() { return m_state > State::Sleep; }
    +     void runSoon();
    +
      private:
    -     enum State { Sleeping, Running, RunRequested };
    +     enum class State { Sleep, Run, RunSoon };

          void runSlowCase();
    +     void runSoonSlowCase();

          static void threadEntryPoint(AsyncTask*);
          void threadRunLoop();
    ...
      template<typename Object, typename Function>
      AsyncTask<Object, Function>::AsyncTask(Object& object, const Function& function)
    -     : m_state(Running)
    +     : m_state(State::Sleep)
          , m_condition()
          , m_thread(std::thread(&AsyncTask::threadEntryPoint, this))
    ...
      template<typename Object, typename Function>
    - inline void AsyncTask<Object, Function>::run()
    + void AsyncTask<Object, Function>::run()
      {
    -     if (m_state == RunRequested)
    -         return;
    -     runSlowCase();
    +     m_state = State::Run;
    +
    +     std::lock_guard<Mutex> lock(m_conditionMutex);
    +     m_condition.notify_all();
      }

      template<typename Object, typename Function>
    - NO_INLINE void AsyncTask<Object, Function>::runSlowCase()
    + void AsyncTask<Object, Function>::runSoon()
      {
    -     State oldState = m_state.exchange(RunRequested);
    -     if (oldState == RunRequested || oldState == Running)
    -         return;
    -
    -     BASSERT(oldState == Sleeping);
    +     m_state = State::RunSoon;
    +
          std::lock_guard<Mutex> lock(m_conditionMutex);
          m_condition.notify_all();
    ...
          // This loop ratchets downward from most active to least active state. While
          // we ratchet downward, any other thread may reset our state.

          // We require any state change while we are sleeping to signal to our
          // condition variable and wake us up.

          while (1) {
    -         State expectedState = RunRequested;
    -         if (m_state.compare_exchange_weak(expectedState, Running))
    -             (m_object.*m_function)();
    -
    -         expectedState = Running;
    -         if (m_state.compare_exchange_weak(expectedState, Sleeping)) {
    +         if (m_state == State::Sleep) {
                  std::unique_lock<Mutex> lock(m_conditionMutex);
    -             m_condition.wait(lock, [&]() { return m_state != Sleeping; });
    +             m_condition.wait(lock, [&]() { return m_state != State::Sleep; });
              }
    +
    +         if (m_state == State::RunSoon) {
    +             std::unique_lock<Mutex> lock(m_conditionMutex);
    +             m_condition.wait_for(lock, asyncTaskSleepDuration, [&]() { return m_state != State::RunSoon; });
    +         }
    +
    +         m_state = State::Sleep;
    +         (m_object.*m_function)();
          }
      }
  • trunk/Source/bmalloc/bmalloc/Heap.cpp

    (r216763 → r217456)

          m_pressureHandlerDispatchSource = dispatch_source_create(DISPATCH_SOURCE_TYPE_MEMORYPRESSURE, 0, DISPATCH_MEMORYPRESSURE_CRITICAL, queue);
          dispatch_source_set_event_handler(m_pressureHandlerDispatchSource, ^{
    -         api::scavenge();
    +         std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());
    +         scavenge(lock);
          });
          dispatch_resume(m_pressureHandlerDispatchSource);
    ...
          double percentInUse = static_cast<double>(m_memoryFootprint) / static_cast<double>(m_maxAvailableMemory);
          m_percentAvailableMemoryInUse = std::min(percentInUse, 1.0);
    -
    -     double percentFree = 1.0 - m_percentAvailableMemoryInUse;
    -     double sleepInMS = 1200.0 * percentFree * percentFree - 100.0 * percentFree + 2.0;
    -     sleepInMS = std::max(std::min(sleepInMS, static_cast<double>(maxScavengeSleepDuration.count())), 2.0);
    -
    -     m_scavengeSleepDuration = std::chrono::milliseconds(static_cast<long long>(sleepInMS));
      }
      #endif
    ...
      void Heap::concurrentScavenge()
      {
    +     std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());
    +
      #if BOS(DARWIN)
          pthread_set_qos_class_self_np(m_requestedScavengerThreadQOSClass, 0);
      #endif

    -     std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());
    -
    -     scavenge(lock, Async);
    -
    - #if BPLATFORM(IOS)
    -     updateMemoryInUseParameters();
    - #endif
    - }
    -
    - void Heap::scavenge(std::unique_lock<StaticMutex>& lock, ScavengeMode scavengeMode)
    - {
    -     m_isAllocatingPages.fill(false);
    -     m_isAllocatingLargePages = false;
    -
    -     if (scavengeMode == Async)
    -         sleep(lock, m_scavengeSleepDuration);
    -
    -     scavengeSmallPages(lock, scavengeMode);
    -     scavengeLargeObjects(lock, scavengeMode);
    - }
    -
    - void Heap::scavengeSmallPages(std::unique_lock<StaticMutex>& lock, ScavengeMode scavengeMode)
    +     if (m_isGrowing && !isUnderMemoryPressure()) {
    +         m_isGrowing = false;
    +         m_scavenger.runSoon();
    +         return;
    +     }
    +
    +     scavenge(lock);
    + }
    +
    + void Heap::scavenge(std::unique_lock<StaticMutex>& lock)
    + {
    +     scavengeSmallPages(lock);
    +     scavengeLargeObjects(lock);
    + }
    +
    + void Heap::scavengeSmallPages(std::unique_lock<StaticMutex>& lock)
      {
          for (size_t pageClass = 0; pageClass < pageClassCount; pageClass++) {
    ...

              while (!smallPages.isEmpty()) {
    -             if (m_isAllocatingPages[pageClass]) {
    -                 m_scavenger.run();
    -                 break;
    -             }
    -
                  SmallPage* page = smallPages.pop();
    -             m_vmHeap.deallocateSmallPage(lock, pageClass, page, scavengeMode);
    +             m_vmHeap.deallocateSmallPage(lock, pageClass, page);
              }
          }
      }

    - void Heap::scavengeLargeObjects(std::unique_lock<StaticMutex>& lock, ScavengeMode scavengeMode)
    + void Heap::scavengeLargeObjects(std::unique_lock<StaticMutex>& lock)
      {
          auto& ranges = m_largeFree.ranges();
          for (size_t i = ranges.size(); i-- > 0; i = std::min(i, ranges.size())) {
    -         if (m_isAllocatingLargePages) {
    -             m_scavenger.run();
    -             break;
    -         }
    -
              auto range = ranges.pop(i);

    -         if (scavengeMode == Async)
    -             lock.unlock();
    +         lock.unlock();
              vmDeallocatePhysicalPagesSloppy(range.begin(), range.size());
    -         if (scavengeMode == Async)
    -             lock.lock();
    +         lock.lock();

              range.setPhysicalSize(0);
    ...
      }

    + void Heap::scheduleScavengerIfUnderMemoryPressure(size_t bytes)
    + {
    +     m_scavengerBytes += bytes;
    +     if (m_scavengerBytes < scavengerBytesPerMemoryPressureCheck)
    +         return;
    +
    +     m_scavengerBytes = 0;
    +
    +     if (m_scavenger.willRun())
    +         return;
    +
    +     if (!isUnderMemoryPressure())
    +         return;
    +
    +     m_isGrowing = false;
    +     m_scavenger.run();
    + }
    +
    + void Heap::scheduleScavenger(size_t bytes)
    + {
    +     scheduleScavengerIfUnderMemoryPressure(bytes);
    +
    +     if (m_scavenger.willRunSoon())
    +         return;
    +
    +     m_isGrowing = false;
    +     m_scavenger.runSoon();
    + }
    +
      SmallPage* Heap::allocateSmallPage(std::lock_guard<StaticMutex>& lock, size_t sizeClass)
      {
    ...
              return m_smallPagesWithFreeLines[sizeClass].popFront();

    +     m_isGrowing = true;
    +
          SmallPage* page = [&]() {
              size_t pageClass = m_pageClasses[sizeClass];
    ...
                  return m_smallPages[pageClass].pop();

    -         m_isAllocatingPages[pageClass] = true;
    -
    +         scheduleScavengerIfUnderMemoryPressure(pageSize(pageClass));
    +
              SmallPage* page = m_vmHeap.allocateSmallPage(lock, pageClass);
              m_objectTypes.set(Chunk::get(page), ObjectType::Small);
    ...
          m_smallPagesWithFreeLines[sizeClass].remove(page);
          m_smallPages[pageClass].push(page);

    -     m_scavenger.run();
    +     scheduleScavenger(pageSize(pageClass));
      }
    ...
          if (range.physicalSize() < range.size()) {
    -         m_isAllocatingLargePages = true;
    -
    +         scheduleScavengerIfUnderMemoryPressure(range.size());
    +
              vmAllocatePhysicalPagesSloppy(range.begin() + range.physicalSize(), range.size() - range.physicalSize());
              range.setPhysicalSize(range.size());
    ...
          BASSERT(isPowerOfTwo(alignment));

    +     m_isGrowing = true;
    +
          size_t roundedSize = size ? roundUpToMultipleOf(largeAlignment, size) : largeAlignment;
          if (roundedSize < size) // Check for overflow
    ...
          splitAndAllocate(range, alignment, newSize);

    -     m_scavenger.run();
    +     scheduleScavenger(size);
      }
    ...
          m_largeFree.add(LargeRange(object, size, size));

    -     m_scavenger.run();
    +     scheduleScavenger(size);
      }
  • trunk/Source/bmalloc/bmalloc/Heap.h

    (r217000 → r217456)

          void shrinkLarge(std::lock_guard<StaticMutex>&, const Range&, size_t);

    -     void scavenge(std::unique_lock<StaticMutex>&, ScavengeMode);
    -
    - #if BPLATFORM(IOS)
    +     void scavenge(std::unique_lock<StaticMutex>&);
    +
          size_t memoryFootprint();
          double percentAvailableMemoryInUse();
    - #endif
    -
    +     bool isUnderMemoryPressure();
    +
      #if BOS(DARWIN)
          void setScavengerThreadQOSClass(qos_class_t overrideClass) { m_requestedScavengerThreadQOSClass = overrideClass; }
    ...
          LargeRange splitAndAllocate(LargeRange&, size_t alignment, size_t);

    +     void scheduleScavenger(size_t);
    +     void scheduleScavengerIfUnderMemoryPressure(size_t);
    +
          void concurrentScavenge();
    -     void scavengeSmallPages(std::unique_lock<StaticMutex>&, ScavengeMode);
    -     void scavengeLargeObjects(std::unique_lock<StaticMutex>&, ScavengeMode);
    +     void scavengeSmallPages(std::unique_lock<StaticMutex>&);
    +     void scavengeLargeObjects(std::unique_lock<StaticMutex>&);

      #if BPLATFORM(IOS)
          void updateMemoryInUseParameters();
    ...
          Map<Chunk*, ObjectType, ChunkHash> m_objectTypes;

    -     std::array<bool, pageClassCount> m_isAllocatingPages;
    -     bool m_isAllocatingLargePages;
    +     size_t m_scavengerBytes { 0 };
    +     bool m_isGrowing { false };

          AsyncTask<Heap, decltype(&Heap::concurrentScavenge)> m_scavenger;

          Environment m_environment;
          DebugHeap* m_debugHeap;
    -
    -     std::chrono::milliseconds m_scavengeSleepDuration = { maxScavengeSleepDuration };

      #if BPLATFORM(IOS)
    ...
      }

    + inline bool Heap::isUnderMemoryPressure()
    + {
    + #if BPLATFORM(IOS)
    +     return percentAvailableMemoryInUse() > memoryPressureThreshold;
    + #else
    +     return false;
    + #endif
    + }
    +
      #if BPLATFORM(IOS)
      inline size_t Heap::memoryFootprint()
  • trunk/Source/bmalloc/bmalloc/Sizes.h

    (r216960 → r217456)

          static const size_t bumpRangeCacheCapacity = 3;

    -     static const std::chrono::milliseconds maxScavengeSleepDuration = std::chrono::milliseconds(512);
    +     static const size_t scavengerBytesPerMemoryPressureCheck = 16 * MB;
    +     static const double memoryPressureThreshold = 0.75;
    +
    +     static const std::chrono::milliseconds asyncTaskSleepDuration = std::chrono::milliseconds(2000);

          static const size_t maskSizeClassCount = maskSizeClassMax / alignment;
  • trunk/Source/bmalloc/bmalloc/VMHeap.h

    (r215909 → r217456)

      public:
          SmallPage* allocateSmallPage(std::lock_guard<StaticMutex>&, size_t);
    -     void deallocateSmallPage(std::unique_lock<StaticMutex>&, size_t, SmallPage*, ScavengeMode);
    +     void deallocateSmallPage(std::unique_lock<StaticMutex>&, size_t, SmallPage*);

          LargeRange tryAllocateLargeChunk(std::lock_guard<StaticMutex>&, size_t alignment, size_t);
    ...
      }

    - inline void VMHeap::deallocateSmallPage(std::unique_lock<StaticMutex>& lock, size_t pageClass, SmallPage* page, ScavengeMode scavengeMode)
    + inline void VMHeap::deallocateSmallPage(std::unique_lock<StaticMutex>& lock, size_t pageClass, SmallPage* page)
      {
    -     if (scavengeMode == Async)
    -         lock.unlock();
    +     lock.unlock();
          vmDeallocatePhysicalPagesSloppy(page->begin()->begin(), pageSize(pageClass));
    -     if (scavengeMode == Async)
    -         lock.lock();
    +     lock.lock();

          m_smallPages[pageClass].push(page);
  • trunk/Source/bmalloc/bmalloc/bmalloc.h

    (r216763 → r217456)

          std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());
    -     PerProcess<Heap>::get()->scavenge(lock, Sync);
    +     PerProcess<Heap>::get()->scavenge(lock);
      }