Changeset 217456 in webkit


Timestamp: May 25, 2017 4:11:24 PM (7 years ago)
Author: msaboff@apple.com
Message:

bmalloc: scavenger runs too much on JetStream
https://bugs.webkit.org/show_bug.cgi?id=172373

Reviewed by Geoffrey Garen.

Instruments says that JetStream on macOS spends about 3% of its time in
madvise.

In <https://bugs.webkit.org/show_bug.cgi?id=160098>, Ben saw some
evidence that madvise was the reason that switching to bmalloc for
DFG::Node allocations was a slowdown the first time around.

In <https://bugs.webkit.org/show_bug.cgi?id=172124>, Michael saw that
scavenging policy can affect JetStream.

Intuitively, it seems wrong for the heap to idle shrink during hardcore
benchmarking.

The strategy here is to back off in response to any heap growth event,
and to wait 2s instead of 0.5s for heap growth to take place -- but we
scavenge immediately in response to critical memory pressure, to avoid
jetsam.

One hole in this strategy is that a workload with a perfectly
unfragmented heap that allocates and deallocates ~16kB every 2s will
never shrink its heap. This doesn't seem to be a problem in practice.
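
As a quick illustration, the scavenger's wake-up handler now boils down to the
following (condensed from the Heap::concurrentScavenge change in the diff
below; the Darwin-only QOS call is omitted, and comments are added here):

    void Heap::concurrentScavenge()
    {
        std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());

        // Back off: if the heap has grown since the last pass and we are not
        // under critical memory pressure, skip this pass and re-arm the 2s
        // timer (asyncTaskSleepDuration) instead of scavenging.
        if (m_isGrowing && !isUnderMemoryPressure()) {
            m_isGrowing = false;
            m_scavenger.runSoon();
            return;
        }

        scavenge(lock);
    }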

This looks like a 2% - 4% speedup on JetStream on Mac Pro and MacBook Air.

  • bmalloc/AsyncTask.h:

(bmalloc::AsyncTask::willRun):
(bmalloc::AsyncTask::willRunSoon):
(bmalloc::Function>::AsyncTask):
(bmalloc::Function>::run):
(bmalloc::Function>::runSoon):
(bmalloc::Function>::threadRunLoop):
(bmalloc::Function>::runSlowCase): Deleted. Added a "run soon" state
so that execution delay is modeled directly instead of implicitly
through sleep events. This enables the Heap to issue a "run now" event
at any moment in response to memory pressure.
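
Condensed from the AsyncTask.h diff below (template boilerplate omitted,
explanatory comments added), the new state machine looks like this:

    enum class State { Sleep, Run, RunSoon };

    void run()
    {
        // Request immediate execution.
        m_state = State::Run;

        std::lock_guard<Mutex> lock(m_conditionMutex);
        m_condition.notify_all();
    }

    void runSoon()
    {
        // Request delayed execution; the delay is an explicit state rather
        // than an implicit sleep.
        m_state = State::RunSoon;

        std::lock_guard<Mutex> lock(m_conditionMutex);
        m_condition.notify_all();
    }

    void threadRunLoop()
    {
        while (1) {
            // Asleep: block until run() or runSoon() changes the state.
            if (m_state == State::Sleep) {
                std::unique_lock<Mutex> lock(m_conditionMutex);
                m_condition.wait(lock, [&]() { return m_state != State::Sleep; });
            }

            // Run soon: wait out asyncTaskSleepDuration (2s), or less if the
            // request is upgraded to State::Run in the meantime.
            if (m_state == State::RunSoon) {
                std::unique_lock<Mutex> lock(m_conditionMutex);
                m_condition.wait_for(lock, asyncTaskSleepDuration, [&]() { return m_state != State::RunSoon; });
            }

            m_state = State::Sleep;
            (m_object.*m_function)();
        }
    }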

  • bmalloc/Heap.cpp:

(bmalloc::Heap::Heap): Don't call into our own API -- that's a layering
violation.

(bmalloc::Heap::updateMemoryInUseParameters): No need for
m_scavengeSleepDuration anymore.

(bmalloc::Heap::concurrentScavenge): Added a back-off policy when the
heap is growing.
(bmalloc::Heap::scavenge):

(bmalloc::Heap::scavengeSmallPages):
(bmalloc::Heap::scavengeLargeObjects): Don't try to give up in the middle
of a scavenge event. Our new backoff policy supplants that design. Also,
it's easier to profile and understand scavenging behavior if it always
runs to completion once started.

(bmalloc::Heap::scheduleScavenger):
(bmalloc::Heap::scheduleScavengerIfUnderMemoryPressure): Added a
synchronous amortized check for memory pressure. This check has the
benefit that it runs immediately during high rates of heap activity,
so we can detect memory pressure right away and wake the scavenger
instead of waiting for the scavenger to wake up.
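
For reference, the two scheduling entry points, condensed from the Heap.cpp
diff below with comments added:

    void Heap::scheduleScavengerIfUnderMemoryPressure(size_t bytes)
    {
        // Amortize the cost: only consult the memory-pressure heuristic once
        // per scavengerBytesPerMemoryPressureCheck (16MB) of heap traffic.
        m_scavengerBytes += bytes;
        if (m_scavengerBytes < scavengerBytesPerMemoryPressureCheck)
            return;

        m_scavengerBytes = 0;

        if (m_scavenger.willRun())
            return;

        if (!isUnderMemoryPressure())
            return;

        // Under pressure: wake the scavenger immediately rather than waiting
        // for its timer to fire.
        m_isGrowing = false;
        m_scavenger.run();
    }

    void Heap::scheduleScavenger(size_t bytes)
    {
        scheduleScavengerIfUnderMemoryPressure(bytes);

        if (m_scavenger.willRunSoon())
            return;

        m_isGrowing = false;
        m_scavenger.runSoon();
    }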

(bmalloc::Heap::allocateSmallPage):
(bmalloc::Heap::deallocateSmallLine):
(bmalloc::Heap::splitAndAllocate):
(bmalloc::Heap::tryAllocateLarge):
(bmalloc::Heap::shrinkLarge):
(bmalloc::Heap::deallocateLarge):

  • bmalloc/Heap.h:

(bmalloc::Heap::isUnderMemoryPressure):

  • bmalloc/Sizes.h:
  • bmalloc/VMHeap.h:

(bmalloc::VMHeap::deallocateSmallPage):

  • bmalloc/bmalloc.h:

(bmalloc::api::scavenge): Updated for API changes above.

Location: trunk/Source/bmalloc
Files: 7 edited

  • trunk/Source/bmalloc/ChangeLog

    (r217000 → r217456)

    + 2017-05-25  Geoffrey Garen  <ggaren@apple.com> and Michael Saboff  <msaboff@apple.com>
    +
    +     bmalloc: scavenger runs too much on JetStream
    +     https://bugs.webkit.org/show_bug.cgi?id=172373
    +
    +     Reviewed by Geoffrey Garen.
    +
    +     Instruments says that JetStream on macOS spends about 3% of its time in
    +     madvise.
    +
    +     In <https://bugs.webkit.org/show_bug.cgi?id=160098>, Ben saw some
    +     evidence that madvise was the reason that switching to bmalloc for
    +     DFG::Node allocations was a slowdown the first time around.
    +
    +     In <https://bugs.webkit.org/show_bug.cgi?id=172124>, Michael saw that
    +     scavenging policy can affect JetStream.
    +
    +     Intuitively, it seems wrong for the heap to idle shrink during hardcore
    +     benchmarking.
    +
    +     The strategy here is to back off in response to any heap growth event,
    +     and to wait 2s instead of 0.5s for heap growth to take place -- but we
    +     scavenge immediately in response to critical memory pressure, to avoid
    +     jetsam.
    +
    +     One hole in this strategy is that a workload with a perfectly
    +     unfragmented heap that allocates and deallocates ~16kB every 2s will
    +     never shrink its heap. This doesn't seem to be a problem in practice.
    +
    +     This looks like a 2% - 4% speedup on JetStream on Mac Pro and MacBook Air.
    +
    +     * bmalloc/AsyncTask.h:
    +     (bmalloc::AsyncTask::willRun):
    +     (bmalloc::AsyncTask::willRunSoon):
    +     (bmalloc::Function>::AsyncTask):
    +     (bmalloc::Function>::run):
    +     (bmalloc::Function>::runSoon):
    +     (bmalloc::Function>::threadRunLoop):
    +     (bmalloc::Function>::runSlowCase): Deleted. Added a "run soon" state
    +     so that execution delay is modeled directly instead of implicitly
    +     through sleep events. This enables the Heap to issue a "run now" event
    +     at any moment in response to memory pressure.
    +
    +     * bmalloc/Heap.cpp:
    +     (bmalloc::Heap::Heap): Don't call into our own API -- that's a layering
    +     violation.
    +
    +     (bmalloc::Heap::updateMemoryInUseParameters): No need for
    +     m_scavengeSleepDuration anymore.
    +
    +     (bmalloc::Heap::concurrentScavenge): Added a back-off policy when the
    +     heap is growing.
    +     (bmalloc::Heap::scavenge):
    +
    +     (bmalloc::Heap::scavengeSmallPages):
    +     (bmalloc::Heap::scavengeLargeObjects): Don't try to give up in the middle
    +     of a scavenge event. Our new backoff policy supplants that design. Also,
    +     it's easier to profile and understand scavenging behavior if it always
    +     runs to completion once started.
    +
    +     (bmalloc::Heap::scheduleScavenger):
    +     (bmalloc::Heap::scheduleScavengerIfUnderMemoryPressure): Added a
    +     synchronous amortized check for memory pressure. This check has the
    +     benefit that it runs immediately during high rates of heap activity,
    +     so we can detect memory pressure right away and wake the scavenger
    +     instead of waiting for the scavenger to wake up.
    +
    +     (bmalloc::Heap::allocateSmallPage):
    +     (bmalloc::Heap::deallocateSmallLine):
    +     (bmalloc::Heap::splitAndAllocate):
    +     (bmalloc::Heap::tryAllocateLarge):
    +     (bmalloc::Heap::shrinkLarge):
    +     (bmalloc::Heap::deallocateLarge):
    +     * bmalloc/Heap.h:
    +     (bmalloc::Heap::isUnderMemoryPressure):
    +     * bmalloc/Sizes.h:
    +     * bmalloc/VMHeap.h:
    +     (bmalloc::VMHeap::deallocateSmallPage):
    +     * bmalloc/bmalloc.h:
    +     (bmalloc::api::scavenge): Updated for API changes above.
    +
      2017-05-17  Michael Saboff  <msaboff@apple.com>
  • trunk/Source/bmalloc/bmalloc/AsyncTask.h

    (r208562 → r217456)

      #include "Inline.h"
      #include "Mutex.h"
    + #include "Sizes.h"
      #include <atomic>
      #include <condition_variable>
    ...
          AsyncTask(Object&, const Function&);
          ~AsyncTask();

    +     bool willRun() { return m_state == State::Run; }
          void run();

    +     bool willRunSoon() { return m_state > State::Sleep; }
    +     void runSoon();
    +
      private:
    -     enum State { Sleeping, Running, RunRequested };
    +     enum class State { Sleep, Run, RunSoon };

          void runSlowCase();
    +     void runSoonSlowCase();

          static void threadEntryPoint(AsyncTask*);
          void threadRunLoop();
    ...
      template<typename Object, typename Function>
      AsyncTask<Object, Function>::AsyncTask(Object& object, const Function& function)
    -     : m_state(Running)
    +     : m_state(State::Sleep)
          , m_condition()
          , m_thread(std::thread(&AsyncTask::threadEntryPoint, this))
    ...
      template<typename Object, typename Function>
    - inline void AsyncTask<Object, Function>::run()
    + void AsyncTask<Object, Function>::run()
      {
    -     if (m_state == RunRequested)
    -         return;
    -     runSlowCase();
    +     m_state = State::Run;
    +
    +     std::lock_guard<Mutex> lock(m_conditionMutex);
    +     m_condition.notify_all();
      }

      template<typename Object, typename Function>
    - NO_INLINE void AsyncTask<Object, Function>::runSlowCase()
    + void AsyncTask<Object, Function>::runSoon()
      {
    -     State oldState = m_state.exchange(RunRequested);
    -     if (oldState == RunRequested || oldState == Running)
    -         return;
    -
    -     BASSERT(oldState == Sleeping);
    +     m_state = State::RunSoon;
    +
          std::lock_guard<Mutex> lock(m_conditionMutex);
          m_condition.notify_all();
    ...
          // This loop ratchets downward from most active to least active state. While
          // we ratchet downward, any other thread may reset our state.

          // We require any state change while we are sleeping to signal to our
          // condition variable and wake us up.

          while (1) {
    -         State expectedState = RunRequested;
    -         if (m_state.compare_exchange_weak(expectedState, Running))
    -             (m_object.*m_function)();
    -
    -         expectedState = Running;
    -         if (m_state.compare_exchange_weak(expectedState, Sleeping)) {
    +         if (m_state == State::Sleep) {
                  std::unique_lock<Mutex> lock(m_conditionMutex);
    -             m_condition.wait(lock, [&]() { return m_state != Sleeping; });
    +             m_condition.wait(lock, [&]() { return m_state != State::Sleep; });
              }
    +
    +         if (m_state == State::RunSoon) {
    +             std::unique_lock<Mutex> lock(m_conditionMutex);
    +             m_condition.wait_for(lock, asyncTaskSleepDuration, [&]() { return m_state != State::RunSoon; });
    +         }
    +
    +         m_state = State::Sleep;
    +         (m_object.*m_function)();
          }
      }
  • trunk/Source/bmalloc/bmalloc/Heap.cpp

    (r216763 → r217456)

          m_pressureHandlerDispatchSource = dispatch_source_create(DISPATCH_SOURCE_TYPE_MEMORYPRESSURE, 0, DISPATCH_MEMORYPRESSURE_CRITICAL, queue);
          dispatch_source_set_event_handler(m_pressureHandlerDispatchSource, ^{
    -         api::scavenge();
    +         std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());
    +         scavenge(lock);
          });
          dispatch_resume(m_pressureHandlerDispatchSource);
    ...
          double percentInUse = static_cast<double>(m_memoryFootprint) / static_cast<double>(m_maxAvailableMemory);
          m_percentAvailableMemoryInUse = std::min(percentInUse, 1.0);
    -
    -     double percentFree = 1.0 - m_percentAvailableMemoryInUse;
    -     double sleepInMS = 1200.0 * percentFree * percentFree - 100.0 * percentFree + 2.0;
    -     sleepInMS = std::max(std::min(sleepInMS, static_cast<double>(maxScavengeSleepDuration.count())), 2.0);
    -
    -     m_scavengeSleepDuration = std::chrono::milliseconds(static_cast<long long>(sleepInMS));
      }
      #endif
    ...
      void Heap::concurrentScavenge()
      {
    +     std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());
    +
      #if BOS(DARWIN)
          pthread_set_qos_class_self_np(m_requestedScavengerThreadQOSClass, 0);
      #endif

    -     std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());
    -
    -     scavenge(lock, Async);
    -
    - #if BPLATFORM(IOS)
    -     updateMemoryInUseParameters();
    - #endif
    - }
    -
    - void Heap::scavenge(std::unique_lock<StaticMutex>& lock, ScavengeMode scavengeMode)
    - {
    -     m_isAllocatingPages.fill(false);
    -     m_isAllocatingLargePages = false;
    -
    -     if (scavengeMode == Async)
    -         sleep(lock, m_scavengeSleepDuration);
    -
    -     scavengeSmallPages(lock, scavengeMode);
    -     scavengeLargeObjects(lock, scavengeMode);
    - }
    -
    - void Heap::scavengeSmallPages(std::unique_lock<StaticMutex>& lock, ScavengeMode scavengeMode)
    +     if (m_isGrowing && !isUnderMemoryPressure()) {
    +         m_isGrowing = false;
    +         m_scavenger.runSoon();
    +         return;
    +     }
    +
    +     scavenge(lock);
    + }
    +
    + void Heap::scavenge(std::unique_lock<StaticMutex>& lock)
    + {
    +     scavengeSmallPages(lock);
    +     scavengeLargeObjects(lock);
    + }
    +
    + void Heap::scavengeSmallPages(std::unique_lock<StaticMutex>& lock)
      {
          for (size_t pageClass = 0; pageClass < pageClassCount; pageClass++) {
    ...

              while (!smallPages.isEmpty()) {
    -             if (m_isAllocatingPages[pageClass]) {
    -                 m_scavenger.run();
    -                 break;
    -             }
    -
                  SmallPage* page = smallPages.pop();
    -             m_vmHeap.deallocateSmallPage(lock, pageClass, page, scavengeMode);
    +             m_vmHeap.deallocateSmallPage(lock, pageClass, page);
              }
          }
      }

    - void Heap::scavengeLargeObjects(std::unique_lock<StaticMutex>& lock, ScavengeMode scavengeMode)
    + void Heap::scavengeLargeObjects(std::unique_lock<StaticMutex>& lock)
      {
          auto& ranges = m_largeFree.ranges();
          for (size_t i = ranges.size(); i-- > 0; i = std::min(i, ranges.size())) {
    -         if (m_isAllocatingLargePages) {
    -             m_scavenger.run();
    -             break;
    -         }
    -
              auto range = ranges.pop(i);

    -         if (scavengeMode == Async)
    -             lock.unlock();
    +         lock.unlock();
              vmDeallocatePhysicalPagesSloppy(range.begin(), range.size());
    -         if (scavengeMode == Async)
    -             lock.lock();
    +         lock.lock();

              range.setPhysicalSize(0);
    ...
      }

    + void Heap::scheduleScavengerIfUnderMemoryPressure(size_t bytes)
    + {
    +     m_scavengerBytes += bytes;
    +     if (m_scavengerBytes < scavengerBytesPerMemoryPressureCheck)
    +         return;
    +
    +     m_scavengerBytes = 0;
    +
    +     if (m_scavenger.willRun())
    +         return;
    +
    +     if (!isUnderMemoryPressure())
    +         return;
    +
    +     m_isGrowing = false;
    +     m_scavenger.run();
    + }
    +
    + void Heap::scheduleScavenger(size_t bytes)
    + {
    +     scheduleScavengerIfUnderMemoryPressure(bytes);
    +
    +     if (m_scavenger.willRunSoon())
    +         return;
    +
    +     m_isGrowing = false;
    +     m_scavenger.runSoon();
    + }
    +
      SmallPage* Heap::allocateSmallPage(std::lock_guard<StaticMutex>& lock, size_t sizeClass)
      {
    ...
              return m_smallPagesWithFreeLines[sizeClass].popFront();

    +     m_isGrowing = true;
    +
          SmallPage* page = [&]() {
              size_t pageClass = m_pageClasses[sizeClass];
    ...
                  return m_smallPages[pageClass].pop();

    -         m_isAllocatingPages[pageClass] = true;
    -
    +         scheduleScavengerIfUnderMemoryPressure(pageSize(pageClass));
    +
              SmallPage* page = m_vmHeap.allocateSmallPage(lock, pageClass);
              m_objectTypes.set(Chunk::get(page), ObjectType::Small);
    ...
          m_smallPagesWithFreeLines[sizeClass].remove(page);
          m_smallPages[pageClass].push(page);

    -     m_scavenger.run();
    +     scheduleScavenger(pageSize(pageClass));
      }
    ...
          if (range.physicalSize() < range.size()) {
    -         m_isAllocatingLargePages = true;
    -
    +         scheduleScavengerIfUnderMemoryPressure(range.size());
    +
              vmAllocatePhysicalPagesSloppy(range.begin() + range.physicalSize(), range.size() - range.physicalSize());
              range.setPhysicalSize(range.size());
    ...
          BASSERT(isPowerOfTwo(alignment));

    +     m_isGrowing = true;
    +
          size_t roundedSize = size ? roundUpToMultipleOf(largeAlignment, size) : largeAlignment;
          if (roundedSize < size) // Check for overflow
    ...
          splitAndAllocate(range, alignment, newSize);

    -     m_scavenger.run();
    +     scheduleScavenger(size);
      }
    ...
          m_largeFree.add(LargeRange(object, size, size));

    -     m_scavenger.run();
    +     scheduleScavenger(size);
      }
  • trunk/Source/bmalloc/bmalloc/Heap.h

    (r217000 → r217456)

          void shrinkLarge(std::lock_guard<StaticMutex>&, const Range&, size_t);

    -     void scavenge(std::unique_lock<StaticMutex>&, ScavengeMode);
    -
    - #if BPLATFORM(IOS)
    +     void scavenge(std::unique_lock<StaticMutex>&);
    +
          size_t memoryFootprint();
          double percentAvailableMemoryInUse();
    - #endif
    -
    +     bool isUnderMemoryPressure();
    +
      #if BOS(DARWIN)
          void setScavengerThreadQOSClass(qos_class_t overrideClass) { m_requestedScavengerThreadQOSClass = overrideClass; }
    ...
          LargeRange splitAndAllocate(LargeRange&, size_t alignment, size_t);

    +     void scheduleScavenger(size_t);
    +     void scheduleScavengerIfUnderMemoryPressure(size_t);
    +
          void concurrentScavenge();
    -     void scavengeSmallPages(std::unique_lock<StaticMutex>&, ScavengeMode);
    -     void scavengeLargeObjects(std::unique_lock<StaticMutex>&, ScavengeMode);
    +     void scavengeSmallPages(std::unique_lock<StaticMutex>&);
    +     void scavengeLargeObjects(std::unique_lock<StaticMutex>&);

      #if BPLATFORM(IOS)
          void updateMemoryInUseParameters();
    ...
          Map<Chunk*, ObjectType, ChunkHash> m_objectTypes;

    -     std::array<bool, pageClassCount> m_isAllocatingPages;
    -     bool m_isAllocatingLargePages;
    +     size_t m_scavengerBytes { 0 };
    +     bool m_isGrowing { false };

          AsyncTask<Heap, decltype(&Heap::concurrentScavenge)> m_scavenger;

          Environment m_environment;
          DebugHeap* m_debugHeap;
    -
    -     std::chrono::milliseconds m_scavengeSleepDuration = { maxScavengeSleepDuration };

      #if BPLATFORM(IOS)
    ...
      }

    + inline bool Heap::isUnderMemoryPressure()
    + {
    + #if BPLATFORM(IOS)
    +     return percentAvailableMemoryInUse() > memoryPressureThreshold;
    + #else
    +     return false;
    + #endif
    + }
    +
      #if BPLATFORM(IOS)
      inline size_t Heap::memoryFootprint()
  • trunk/Source/bmalloc/bmalloc/Sizes.h

    (r216960 → r217456)

          static const size_t bumpRangeCacheCapacity = 3;

    -     static const std::chrono::milliseconds maxScavengeSleepDuration = std::chrono::milliseconds(512);
    +     static const size_t scavengerBytesPerMemoryPressureCheck = 16 * MB;
    +     static const double memoryPressureThreshold = 0.75;
    +
    +     static const std::chrono::milliseconds asyncTaskSleepDuration = std::chrono::milliseconds(2000);

          static const size_t maskSizeClassCount = maskSizeClassMax / alignment;
  • trunk/Source/bmalloc/bmalloc/VMHeap.h

    (r215909 → r217456)

      public:
          SmallPage* allocateSmallPage(std::lock_guard<StaticMutex>&, size_t);
    -     void deallocateSmallPage(std::unique_lock<StaticMutex>&, size_t, SmallPage*, ScavengeMode);
    +     void deallocateSmallPage(std::unique_lock<StaticMutex>&, size_t, SmallPage*);

          LargeRange tryAllocateLargeChunk(std::lock_guard<StaticMutex>&, size_t alignment, size_t);
    ...
      }

    - inline void VMHeap::deallocateSmallPage(std::unique_lock<StaticMutex>& lock, size_t pageClass, SmallPage* page, ScavengeMode scavengeMode)
    + inline void VMHeap::deallocateSmallPage(std::unique_lock<StaticMutex>& lock, size_t pageClass, SmallPage* page)
      {
    -     if (scavengeMode == Async)
    -         lock.unlock();
    +     lock.unlock();
          vmDeallocatePhysicalPagesSloppy(page->begin()->begin(), pageSize(pageClass));
    -     if (scavengeMode == Async)
    -         lock.lock();
    +     lock.lock();

          m_smallPages[pageClass].push(page);
  • trunk/Source/bmalloc/bmalloc/bmalloc.h

    (r216763 → r217456)

          std::unique_lock<StaticMutex> lock(PerProcess<Heap>::mutex());
    -     PerProcess<Heap>::get()->scavenge(lock, Sync);
    +     PerProcess<Heap>::get()->scavenge(lock);
      }