Changeset 211461 in webkit


Ignore:
Timestamp:
Jan 31, 2017 5:26:00 PM (7 years ago)
Author:
jfbastien@apple.com
Message:

OSR entry: delay outer-loop compilation when at inner-loop
https://bugs.webkit.org/show_bug.cgi?id=167149

Reviewed by Filip Pizlo.

r211224 was reverted because it caused a massive kraken/ai-astar
regression. This patch instead does the minimally-disruptive
change to fix the original bug as described below, but omits extra
tuning and refactoring which I had before. I'll commit tuning and
refactoring separately, if this sticks. This patch is therefore
very minimal, and layers carefully on top of the complex
spaghetti-logic. The only change it makes is that it uses triggers
to indicate to outer loops that they should compile, which fixes
the immediate bug and seems roughly perf neutral (maybe a small
gain on kraken sometimes, other times a small regression as would
be expected from compiling later).

As of https://bugs.webkit.org/show_bug.cgi?id=155217 OSR
compilation can be kicked off for an entry into an outer-loop,
while executing an inner-loop. This is desirable because often the
codegen from an inner-entry isn't as good as the codegen from an
outer-entry, but execution from an inner-loop is often pretty hot
and likely to kick off compilation. This approach provided nice
speedups on Kraken because we'd select to enter to the outer-loop
very reliably, which reduces variability (the inner-loop was
selected roughly 1/5 times from my unscientific measurements).

When compilation starts we take a snapshot of the JSValues at the
current execution state using OSR's recovery mechanism. These
values are passed to the compiler and are used as way to perform
type profiling, and could be used to observe cell types as well as
to perform predictions such as through constant propagation.

It's therefore desired to enter from the outer-loop when we can,
but we need to be executing from that location to capture the
right JSValues, otherwise we're confusing the compiler and giving
it inaccurate JSValues which can lead it to predict the wrong
things, leading to suboptimal code or recompilation due to
misprediction, or in super-corner-cases a crash.

These effects are pretty hard to measure: Fil points out that
marsalis-osr-entry really needs mustHandleValues (the JSValues
from the point of execution) because right now it just happens to
correctly guess int32. I tried removing mustHandleValues entirely
and saw no slowdowns, but our benchmarks probably aren't
sufficient to reliably find issues, sometimes because we happen to
have sufficient mitigations.

DFG tier-up was added here:
https://bugs.webkit.org/show_bug.cgi?id=112838

  • dfg/DFGOperations.cpp:
Location:
trunk/Source/JavaScriptCore
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • trunk/Source/JavaScriptCore/ChangeLog

    r211448 r211461  
     12017-01-31  JF Bastien  <jfbastien@apple.com>
     2
     3        OSR entry: delay outer-loop compilation when at inner-loop
     4        https://bugs.webkit.org/show_bug.cgi?id=167149
     5
     6        Reviewed by Filip Pizlo.
     7
     8        r211224 was reverted because it caused a massive kraken/ai-astar
     9        regression. This patch instead does the minimally-disruptive
     10        change to fix the original bug as described below, but omits extra
     11        tuning and refactoring which I had before. I'll commit tuning and
     12        refactoring separately, if this sticks. This patch is therefore
     13        very minimal, and layers carefully on top of the complex
     14        spaghetti-logic. The only change it makes is that it uses triggers
     15        to indicate to outer loops that they should compile, which fixes
     16        the immediate bug and seems roughly perf neutral (maybe a small
     17        gain on kraken sometimes, other times a small regression as would
     18        be expected from compiling later).
     19
     20        As of https://bugs.webkit.org/show_bug.cgi?id=155217 OSR
     21        compilation can be kicked off for an entry into an outer-loop,
     22        while executing an inner-loop. This is desirable because often the
     23        codegen from an inner-entry isn't as good as the codegen from an
     24        outer-entry, but execution from an inner-loop is often pretty hot
     25        and likely to kick off compilation. This approach provided nice
     26        speedups on Kraken because we'd select to enter to the outer-loop
     27        very reliably, which reduces variability (the inner-loop was
     28        selected roughly 1/5 times from my unscientific measurements).
     29
     30        When compilation starts we take a snapshot of the JSValues at the
     31        current execution state using OSR's recovery mechanism. These
     32        values are passed to the compiler and are used as way to perform
     33        type profiling, and could be used to observe cell types as well as
     34        to perform predictions such as through constant propagation.
     35
     36        It's therefore desired to enter from the outer-loop when we can,
     37        but we need to be executing from that location to capture the
     38        right JSValues, otherwise we're confusing the compiler and giving
     39        it inaccurate JSValues which can lead it to predict the wrong
     40        things, leading to suboptimal code or recompilation due to
     41        misprediction, or in super-corner-cases a crash.
     42
     43        These effects are pretty hard to measure: Fil points out that
     44        marsalis-osr-entry really needs mustHandleValues (the JSValues
     45        from the point of execution) because right now it just happens to
     46        correctly guess int32. I tried removing mustHandleValues entirely
     47        and saw no slowdowns, but our benchmarks probably aren't
     48        sufficient to reliably find issues, sometimes because we happen to
     49        have sufficient mitigations.
     50
     51        DFG tier-up was added here:
     52        https://bugs.webkit.org/show_bug.cgi?id=112838
     53
     54        * dfg/DFGOperations.cpp:
     55
    1562017-01-31  Filip Pizlo  <fpizlo@apple.com>
    257
  • trunk/Source/JavaScriptCore/dfg/DFGOperations.cpp

    r211247 r211461  
    23522352
    23532353    JITCode* jitCode = codeBlock->jitCode()->dfg();
     2354
     2355    bool triggeredSlowPath = false;
     2356    auto tierUpEntryTriggers = jitCode->tierUpEntryTriggers.find(osrEntryBytecodeIndex);
     2357    if (tierUpEntryTriggers != jitCode->tierUpEntryTriggers.end()) {
     2358        if (tierUpEntryTriggers->value == 1) {
     2359            // We were asked to enter as soon as possible. Unset this trigger so we don't continually enter.
     2360            if (Options::verboseOSR())
     2361                dataLog("EntryTrigger for ", *codeBlock, " forced slow-path.\n");
     2362            triggeredSlowPath = true;
     2363            tierUpEntryTriggers->value = 0;
     2364        }
     2365    }
     2366
    23542367    if (worklistState == Worklist::Compiling) {
    23552368        CODEBLOCK_LOG_EVENT(codeBlock, "delayFTLCompile", ("still compiling"));
     
    23672380    }
    23682381
     2382    // The following is only true for triggerTierUpNowInLoop, which can never
     2383    // be an OSR entry.
     2384    bool canOSRFromHere = originBytecodeIndex == osrEntryBytecodeIndex;
     2385
    23692386    // If we can OSR Enter, do it right away.
    2370     if (originBytecodeIndex == osrEntryBytecodeIndex) {
     2387    if (canOSRFromHere) {
    23712388        unsigned streamIndex = jitCode->bytecodeIndexToStreamIndex.get(originBytecodeIndex);
    23722389        if (CodeBlock* entryBlock = jitCode->osrEntryBlock()) {
     
    23822399    // - If we couldn't enter for a while, then trigger OSR entry.
    23832400
    2384     if (!shouldTriggerFTLCompile(codeBlock, jitCode))
     2401    if (!shouldTriggerFTLCompile(codeBlock, jitCode) && !triggeredSlowPath)
    23852402        return nullptr;
    23862403
    2387     if (!jitCode->neverExecutedEntry) {
     2404    if (!jitCode->neverExecutedEntry && !triggeredSlowPath) {
    23882405        triggerFTLReplacementCompile(vm, codeBlock, jitCode);
    23892406
     
    24312448    }
    24322449
     2450    if (!canOSRFromHere) {
     2451        // We can't OSR from here, or even start a compilation because doing so
     2452        // calls jitCode->reconstruct which would get the wrong state.
     2453        if (Options::verboseOSR())
     2454            dataLog("Non-OSR-able bc#", originBytecodeIndex, " in ", *codeBlock, " setting parent loop's trigger and backing off.\n");
     2455        jitCode->tierUpEntryTriggers.set(osrEntryBytecodeIndex, 1);
     2456        jitCode->dontOptimizeAnytimeSoon(codeBlock);
     2457        return nullptr;
     2458    }
     2459
    24332460    unsigned streamIndex = jitCode->bytecodeIndexToStreamIndex.get(osrEntryBytecodeIndex);
    2434     auto tierUpHierarchyEntry = jitCode->tierUpInLoopHierarchy.find(osrEntryBytecodeIndex);
    2435     if (tierUpHierarchyEntry != jitCode->tierUpInLoopHierarchy.end()) {
    2436         for (unsigned osrEntryCandidate : tierUpHierarchyEntry->value) {
    2437             if (jitCode->tierUpEntrySeen.contains(osrEntryCandidate)) {
    2438                 osrEntryBytecodeIndex = osrEntryCandidate;
    2439                 streamIndex = jitCode->bytecodeIndexToStreamIndex.get(osrEntryBytecodeIndex);
     2461
     2462    if (!triggeredSlowPath) {
     2463        auto tierUpHierarchyEntry = jitCode->tierUpInLoopHierarchy.find(osrEntryBytecodeIndex);
     2464        if (tierUpHierarchyEntry != jitCode->tierUpInLoopHierarchy.end()) {
     2465            for (unsigned osrEntryCandidate : tierUpHierarchyEntry->value) {
     2466                if (jitCode->tierUpEntrySeen.contains(osrEntryCandidate)) {
     2467                    // Ask an enclosing loop to compile, instead of doing so here.
     2468                    jitCode->tierUpEntryTriggers.set(osrEntryCandidate, 1);
     2469                    jitCode->dontOptimizeAnytimeSoon(codeBlock);
     2470                    return nullptr;
     2471                }
    24402472            }
    24412473        }
Note: See TracChangeset for help on using the changeset viewer.