Changeset 243851 in webkit


Ignore:
Timestamp:
Apr 3, 2019 8:37:23 PM (5 years ago)
Author:
rmorisset@apple.com
Message:

B3 should use associativity to optimize expression trees
https://bugs.webkit.org/show_bug.cgi?id=194081

Reviewed by Filip Pizlo.

JSTests:

Added three microbenchmarks:

  • add-tree should be the ideal case, but there is no speedup because we are currently unable to prove that the CheckAdd won't overflow
  • bit-xor-tree most closely matches the situation where the optimization triggers on the JetStream2 subtests where it triggers: an unbalanced expression tree of size 8 that can be balanced, with no other optimizations being unlocked. 16% speedup
  • bit-or-tree is an ideal case, where the reassociation also enables a ton of further simplifications. 42% speedup
  • microbenchmarks/add-tree.js: Added.
  • microbenchmarks/bit-or-tree.js: Added.
  • microbenchmarks/bit-xor-tree.js: Added.

Source/JavaScriptCore:

This patch adds a new B3 pass, that tries to find and optimize expression trees made purely of any one associative and commutative operator (Add/Mul/BitOr/BitAnd/BitXor).
The pass only runs in O2, and runs once, after lowerMacros and just before a run of B3ReduceStrength (which helps clean up the dead code it tends to leave behind).
I had to separate killDeadCode out of B3ReduceStrength (as a new B3EliminateDeadCode pass) to run it before B3OptimizeAssociativeExpressionTrees, as otherwise it is stopped by high use counts
inherited from CSE.
This extra run of DCE is by itself a win, most notably on microbenchmarks/instanceof-always-hit-two (1.5x faster), and on microbenchmarks/licm-dragons(-out-of-bounds) (both get 1.16x speedup).
I suspect it is because it runs between CSE and tail-dedup, and as a result allows a lot more tail-dedup to occur.

The pass is currently extremely conservative, not trying anything if it would cause _any_ code duplication.
For this purpose, it starts by computing use counts for the potentially interesting nodes (those with the right opcodes), and segregate them into expression trees.
The root of an expression tree is a node that is either used in multiple places, or is used by a value with a different opcode.
The leaves of an expression tree are nodes that are either used in multiple places, or have a different opcode.
All constant leaves of a tree are combined, as well as all leaves that are identical. What remains is then laid out into a balanced binary tree, hopefully maximizing ILP.

This optimization was implemented as a stand-alone pass and not as part of B3ReduceStrength mostly because it needs use counts to avoid code duplication.
It also benefits from finding all tree roots first, and not trying to repeatedly optimize subtrees.

I added several tests to testB3 with varying patterns of trees. It is also tested in a less focused way by lots of older tests.

In the future this pass could be expanded to allow some bounded amount of code duplication, and merging more leaves (e.g. Mul(a, 3) and a in an Add tree, into Mul(a, 4))
The latter will need exposing the peephole optimizations out of B3ReduceStrength to avoid duplicating code.

  • JavaScriptCore.xcodeproj/project.pbxproj:
  • Sources.txt:
  • b3/B3Common.cpp:

(JSC::B3::shouldDumpIR):
(JSC::B3::shouldDumpIRAtEachPhase):

  • b3/B3Common.h:
  • b3/B3EliminateDeadCode.cpp: Added.

(JSC::B3::EliminateDeadCode::run):
(JSC::B3::eliminateDeadCode):

  • b3/B3EliminateDeadCode.h: Added.

(JSC::B3::EliminateDeadCode::EliminateDeadCode):

  • b3/B3Generate.cpp:

(JSC::B3::generateToAir):

  • b3/B3OptimizeAssociativeExpressionTrees.cpp: Added.

(JSC::B3::OptimizeAssociativeExpressionTrees::OptimizeAssociativeExpressionTrees):
(JSC::B3::OptimizeAssociativeExpressionTrees::neutralElement):
(JSC::B3::OptimizeAssociativeExpressionTrees::isAbsorbingElement):
(JSC::B3::OptimizeAssociativeExpressionTrees::combineConstants):
(JSC::B3::OptimizeAssociativeExpressionTrees::emitValue):
(JSC::B3::OptimizeAssociativeExpressionTrees::optimizeRootedTree):
(JSC::B3::OptimizeAssociativeExpressionTrees::run):
(JSC::B3::optimizeAssociativeExpressionTrees):

  • b3/B3OptimizeAssociativeExpressionTrees.h: Added.
  • b3/B3ReduceStrength.cpp:
  • b3/B3Value.cpp:

(JSC::B3::Value::replaceWithIdentity):

  • b3/testb3.cpp:

(JSC::B3::testBitXorTreeArgs):
(JSC::B3::testBitXorTreeArgsEven):
(JSC::B3::testBitXorTreeArgImm):
(JSC::B3::testAddTreeArg32):
(JSC::B3::testMulTreeArg32):
(JSC::B3::testBitAndTreeArg32):
(JSC::B3::testBitOrTreeArg32):
(JSC::B3::run):

Location:
trunk
Files:
5 added
10 edited
2 copied

Legend:

Unmodified
Added
Removed
  • trunk/JSTests/ChangeLog

    r243835 r243851  
     12019-04-03  Robin Morisset  <rmorisset@apple.com>
     2
     3        B3 should use associativity to optimize expression trees
     4        https://bugs.webkit.org/show_bug.cgi?id=194081
     5
     6        Reviewed by Filip Pizlo.
     7
     8        Added three microbenchmarks:
     9        - add-tree should be the ideal case, but there is no speedup because we are currently unable to prove that the CheckAdd won't overflow
     10        - bit-xor-tree most closely matches the situation where the optimization triggers on the JetStream2 subtests where it triggers:
     11          an unbalanced expression tree of size 8 that can be balanced, with no other optimizations being unlocked. 16% speedup
     12        - bit-or-tree is an ideal case, where the reassociation also enables a ton of further simplifications. 42% speedup
     13
     14        * microbenchmarks/add-tree.js: Added.
     15        * microbenchmarks/bit-or-tree.js: Added.
     16        * microbenchmarks/bit-xor-tree.js: Added.
     17
    1182019-04-03  Yusuke Suzuki  <ysuzuki@apple.com>
    219
  • trunk/Source/JavaScriptCore/ChangeLog

    r243843 r243851  
     12019-04-03  Robin Morisset  <rmorisset@apple.com>
     2
     3        B3 should use associativity to optimize expression trees
     4        https://bugs.webkit.org/show_bug.cgi?id=194081
     5
     6        Reviewed by Filip Pizlo.
     7
     8        This patch adds a new B3 pass, that tries to find and optimize expression trees made purely of any one associative and commutative operator (Add/Mul/BitOr/BitAnd/BitXor).
     9        The pass only runs in O2, and runs once, after lowerMacros and just before a run of B3ReduceStrength (which helps clean up the dead code it tends to leave behind).
     10        I had to separate killDeadCode out of B3ReduceStrength (as a new B3EliminateDeadCode pass) to run it before B3OptimizeAssociativeExpressionTrees, as otherwise it is stopped by high use counts
     11        inherited from CSE.
     12        This extra run of DCE is by itself a win, most notably on microbenchmarks/instanceof-always-hit-two (1.5x faster), and on microbenchmarks/licm-dragons(-out-of-bounds) (both get 1.16x speedup).
     13        I suspect it is because it runs between CSE and tail-dedup, and as a result allows a lot more tail-dedup to occur.
     14
     15        The pass is currently extremely conservative, not trying anything if it would cause _any_ code duplication.
     16        For this purpose, it starts by computing use counts for the potentially interesting nodes (those with the right opcodes), and segregate them into expression trees.
     17        The root of an expression tree is a node that is either used in multiple places, or is used by a value with a different opcode.
     18        The leaves of an expression tree are nodes that are either used in multiple places, or have a different opcode.
     19        All constant leaves of a tree are combined, as well as all leaves that are identical. What remains is then laid out into a balanced binary tree, hopefully maximizing ILP.
     20
     21        This optimization was implemented as a stand-alone pass and not as part of B3ReduceStrength mostly because it needs use counts to avoid code duplication.
     22        It also benefits from finding all tree roots first, and not trying to repeatedly optimize subtrees.
     23
     24        I added several tests to testB3 with varying patterns of trees. It is also tested in a less focused way by lots of older tests.
     25
     26        In the future this pass could be expanded to allow some bounded amount of code duplication, and merging more leaves (e.g. Mul(a, 3) and a in an Add tree, into Mul(a, 4))
     27        The latter will need exposing the peephole optimizations out of B3ReduceStrength to avoid duplicating code.
     28
     29        * JavaScriptCore.xcodeproj/project.pbxproj:
     30        * Sources.txt:
     31        * b3/B3Common.cpp:
     32        (JSC::B3::shouldDumpIR):
     33        (JSC::B3::shouldDumpIRAtEachPhase):
     34        * b3/B3Common.h:
     35        * b3/B3EliminateDeadCode.cpp: Added.
     36        (JSC::B3::EliminateDeadCode::run):
     37        (JSC::B3::eliminateDeadCode):
     38        * b3/B3EliminateDeadCode.h: Added.
     39        (JSC::B3::EliminateDeadCode::EliminateDeadCode):
     40        * b3/B3Generate.cpp:
     41        (JSC::B3::generateToAir):
     42        * b3/B3OptimizeAssociativeExpressionTrees.cpp: Added.
     43        (JSC::B3::OptimizeAssociativeExpressionTrees::OptimizeAssociativeExpressionTrees):
     44        (JSC::B3::OptimizeAssociativeExpressionTrees::neutralElement):
     45        (JSC::B3::OptimizeAssociativeExpressionTrees::isAbsorbingElement):
     46        (JSC::B3::OptimizeAssociativeExpressionTrees::combineConstants):
     47        (JSC::B3::OptimizeAssociativeExpressionTrees::emitValue):
     48        (JSC::B3::OptimizeAssociativeExpressionTrees::optimizeRootedTree):
     49        (JSC::B3::OptimizeAssociativeExpressionTrees::run):
     50        (JSC::B3::optimizeAssociativeExpressionTrees):
     51        * b3/B3OptimizeAssociativeExpressionTrees.h: Added.
     52        * b3/B3ReduceStrength.cpp:
     53        * b3/B3Value.cpp:
     54        (JSC::B3::Value::replaceWithIdentity):
     55        * b3/testb3.cpp:
     56        (JSC::B3::testBitXorTreeArgs):
     57        (JSC::B3::testBitXorTreeArgsEven):
     58        (JSC::B3::testBitXorTreeArgImm):
     59        (JSC::B3::testAddTreeArg32):
     60        (JSC::B3::testMulTreeArg32):
     61        (JSC::B3::testBitAndTreeArg32):
     62        (JSC::B3::testBitOrTreeArg32):
     63        (JSC::B3::run):
     64
    1652019-04-03  Yusuke Suzuki  <ysuzuki@apple.com>
    266
  • trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj

    r243832 r243851  
    852852                2AF7382D18BBBF92008A5A37 /* StructureIDTable.h in Headers */ = {isa = PBXBuildFile; fileRef = 2AF7382B18BBBF92008A5A37 /* StructureIDTable.h */; settings = {ATTRIBUTES = (Private, ); }; };
    853853                2D342F36F7244096804ADB24 /* SourceOrigin.h in Headers */ = {isa = PBXBuildFile; fileRef = 425BA1337E4344E1B269A671 /* SourceOrigin.h */; settings = {ATTRIBUTES = (Private, ); }; };
     854                3395C70722555F6D00BDBFAD /* B3EliminateDeadCode.h in Headers */ = {isa = PBXBuildFile; fileRef = 3395C70522555F6D00BDBFAD /* B3EliminateDeadCode.h */; };
    854855                371D842D17C98B6E00ECF994 /* libz.dylib in Frameworks */ = {isa = PBXBuildFile; fileRef = 371D842C17C98B6E00ECF994 /* libz.dylib */; };
    855856                37C738D21EDB56E4003F2B0B /* ParseInt.h in Headers */ = {isa = PBXBuildFile; fileRef = 37C738D11EDB5672003F2B0B /* ParseInt.h */; settings = {ATTRIBUTES = (Private, ); }; };
     
    33393340                3032175DF1AD47D8998B34E1 /* JSSourceCode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JSSourceCode.h; sourceTree = "<group>"; };
    33403341                30A5F403F11C4F599CD596D5 /* WasmSignatureInlines.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = WasmSignatureInlines.h; sourceTree = "<group>"; };
     3342                33743649224D79EF00C8C227 /* B3OptimizeAssociativeExpressionTrees.cpp */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.cpp; name = B3OptimizeAssociativeExpressionTrees.cpp; path = b3/B3OptimizeAssociativeExpressionTrees.cpp; sourceTree = "<group>"; };
     3343                3374364A224D79EF00C8C227 /* B3OptimizeAssociativeExpressionTrees.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = B3OptimizeAssociativeExpressionTrees.h; path = b3/B3OptimizeAssociativeExpressionTrees.h; sourceTree = "<group>"; };
     3344                3395C70422555F6C00BDBFAD /* B3EliminateDeadCode.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = B3EliminateDeadCode.cpp; path = b3/B3EliminateDeadCode.cpp; sourceTree = "<group>"; };
     3345                3395C70522555F6D00BDBFAD /* B3EliminateDeadCode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = B3EliminateDeadCode.h; path = b3/B3EliminateDeadCode.h; sourceTree = "<group>"; };
    33413346                37119A7720CCB5DC002C6DC9 /* WebKitTargetConditionals.xcconfig */ = {isa = PBXFileReference; lastKnownFileType = text.xcconfig; path = WebKitTargetConditionals.xcconfig; sourceTree = "<group>"; };
    33423347                371D842C17C98B6E00ECF994 /* libz.dylib */ = {isa = PBXFileReference; lastKnownFileType = "compiled.mach-o.dylib"; name = libz.dylib; path = usr/lib/libz.dylib; sourceTree = SDKROOT; };
     
    53595364                                0F725CA31C503DED00AD943A /* B3EliminateCommonSubexpressions.cpp */,
    53605365                                0F725CA41C503DED00AD943A /* B3EliminateCommonSubexpressions.h */,
     5366                                3395C70422555F6C00BDBFAD /* B3EliminateDeadCode.cpp */,
     5367                                3395C70522555F6D00BDBFAD /* B3EliminateDeadCode.h */,
    53615368                                0F5BF16E1F23A5A10029D91D /* B3EnsureLoopPreHeaders.cpp */,
    53625369                                0F5BF16F1F23A5A10029D91D /* B3EnsureLoopPreHeaders.h */,
     
    54065413                                0FEC84D71BDACDAC0080FF74 /* B3Opcode.cpp */,
    54075414                                0FEC84D81BDACDAC0080FF74 /* B3Opcode.h */,
     5415                                33743649224D79EF00C8C227 /* B3OptimizeAssociativeExpressionTrees.cpp */,
     5416                                3374364A224D79EF00C8C227 /* B3OptimizeAssociativeExpressionTrees.h */,
    54085417                                0FEC84D91BDACDAC0080FF74 /* B3Origin.cpp */,
    54095418                                0FEC84DA1BDACDAC0080FF74 /* B3Origin.h */,
     
    90719080                                0FCEFAAC1804C13E00472CE4 /* FTLSaveRestore.h in Headers */,
    90729081                                0F25F1B2181635F300522F39 /* FTLSlowPathCall.h in Headers */,
     9082                                3395C70722555F6D00BDBFAD /* B3EliminateDeadCode.h in Headers */,
    90739083                                0F25F1B4181635F300522F39 /* FTLSlowPathCallKey.h in Headers */,
    90749084                                E322E5A71DA644A8006E7709 /* FTLSnippetParams.h in Headers */,
  • trunk/Source/JavaScriptCore/Sources.txt

    r243832 r243851  
    125125b3/B3Effects.cpp
    126126b3/B3EliminateCommonSubexpressions.cpp
     127b3/B3EliminateDeadCode.cpp
    127128b3/B3EnsureLoopPreHeaders.cpp
    128129b3/B3FenceValue.cpp
     
    144145b3/B3OpaqueByproducts.cpp
    145146b3/B3Opcode.cpp
     147b3/B3OptimizeAssociativeExpressionTrees.cpp
    146148b3/B3Origin.cpp
    147149b3/B3OriginDump.cpp
  • trunk/Source/JavaScriptCore/b3/B3Common.cpp

    r242776 r243851  
    3636namespace JSC { namespace B3 {
    3737
    38 bool shouldDumpIR(B3ComplitationMode mode)
     38bool shouldDumpIR(B3CompilationMode mode)
    3939{
    4040#if ENABLE(FTL_JIT)
     
    4545}
    4646
    47 bool shouldDumpIRAtEachPhase(B3ComplitationMode mode)
     47bool shouldDumpIRAtEachPhase(B3CompilationMode mode)
    4848{
    4949    if (mode == B3Mode)
  • trunk/Source/JavaScriptCore/b3/B3Common.h

    r242776 r243851  
    3535namespace JSC { namespace B3 {
    3636
    37 enum B3ComplitationMode {
     37enum B3CompilationMode {
    3838    B3Mode,
    3939    AirMode
    4040};
    4141
    42 JS_EXPORT_PRIVATE bool shouldDumpIR(B3ComplitationMode);
    43 bool shouldDumpIRAtEachPhase(B3ComplitationMode);
     42JS_EXPORT_PRIVATE bool shouldDumpIR(B3CompilationMode);
     43bool shouldDumpIRAtEachPhase(B3CompilationMode);
    4444bool shouldValidateIR();
    4545bool shouldValidateIRAtEachPhase();
  • trunk/Source/JavaScriptCore/b3/B3EliminateDeadCode.h

    r243850 r243851  
    11/*
    2  * Copyright (C) 2015 Apple Inc. All rights reserved.
     2 * Copyright (C) 2019 Apple Inc. All rights reserved.
    33 *
    44 * Redistribution and use in source and binary forms, with or without
     
    2424 */
    2525
    26 #include "config.h"
    27 #include "B3Common.h"
     26#pragma once
    2827
    2928#if ENABLE(B3_JIT)
    3029
    31 #include "DFGCommon.h"
    32 #include "FTLState.h"
    33 #include "Options.h"
    34 #include <wtf/Optional.h>
    35 
    3630namespace JSC { namespace B3 {
    3731
    38 bool shouldDumpIR(B3ComplitationMode mode)
    39 {
    40 #if ENABLE(FTL_JIT)
    41     return FTL::verboseCompilationEnabled() || FTL::shouldDumpDisassembly() || shouldDumpIRAtEachPhase(mode);
    42 #else
    43     return shouldDumpIRAtEachPhase(mode);
    44 #endif
    45 }
     32class Procedure;
    4633
    47 bool shouldDumpIRAtEachPhase(B3ComplitationMode mode)
    48 {
    49     if (mode == B3Mode)
    50         return Options::dumpGraphAtEachPhase() || Options::dumpB3GraphAtEachPhase();
    51     return Options::dumpGraphAtEachPhase() || Options::dumpAirGraphAtEachPhase();
    52 }
    53 
    54 bool shouldValidateIR()
    55 {
    56     return DFG::validationEnabled() || shouldValidateIRAtEachPhase();
    57 }
    58 
    59 bool shouldValidateIRAtEachPhase()
    60 {
    61     return Options::validateGraphAtEachPhase();
    62 }
    63 
    64 bool shouldSaveIRBeforePhase()
    65 {
    66     return Options::verboseValidationFailure();
    67 }
    68 
    69 Optional<GPRReg> pinnedExtendedOffsetAddrRegister()
    70 {
    71 #if CPU(ARM64)
    72     return static_cast<GPRReg>(+MacroAssembler::dataTempRegister);
    73 #elif CPU(X86_64)
    74     return WTF::nullopt;
    75 #else
    76 #error Unhandled architecture.
    77 #endif
    78 }
     34// The 'Impl' version of this function does not have a scope phase. It is public so that B3ReduceStrength can use it in its fixpoint.
     35bool eliminateDeadCodeImpl(Procedure&);
     36JS_EXPORT_PRIVATE bool eliminateDeadCode(Procedure&);
    7937
    8038} } // namespace JSC::B3
  • trunk/Source/JavaScriptCore/b3/B3Generate.cpp

    r232008 r243851  
    3535#include "B3DuplicateTails.h"
    3636#include "B3EliminateCommonSubexpressions.h"
     37#include "B3EliminateDeadCode.h"
    3738#include "B3FixSSA.h"
    3839#include "B3FoldPathConstants.h"
     
    4445#include "B3LowerToAir.h"
    4546#include "B3MoveConstants.h"
     47#include "B3OptimizeAssociativeExpressionTrees.h"
    4648#include "B3Procedure.h"
    4749#include "B3PureCSE.h"
     
    8890        if (eliminateCommonSubexpressions(procedure))
    8991            eliminateCommonSubexpressions(procedure);
     92        eliminateDeadCode(procedure);
    9093        inferSwitches(procedure);
    9194        if (Options::useB3TailDup())
     
    9396        fixSSA(procedure);
    9497        foldPathConstants(procedure);
    95        
    9698        // FIXME: Add more optimizations here.
    9799        // https://bugs.webkit.org/show_bug.cgi?id=150507
     
    105107
    106108    if (procedure.optLevel() >= 2) {
     109        optimizeAssociativeExpressionTrees(procedure);
    107110        reduceStrength(procedure);
    108111
  • trunk/Source/JavaScriptCore/b3/B3OptimizeAssociativeExpressionTrees.h

    r243850 r243851  
    11/*
    2  * Copyright (C) 2015 Apple Inc. All rights reserved.
     2 * Copyright (C) 2019 Apple Inc. All rights reserved.
    33 *
    44 * Redistribution and use in source and binary forms, with or without
     
    2121 * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    2222 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    23  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
     23 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    2424 */
    2525
    26 #include "config.h"
    27 #include "B3Common.h"
     26#pragma once
    2827
    2928#if ENABLE(B3_JIT)
    3029
    31 #include "DFGCommon.h"
    32 #include "FTLState.h"
    33 #include "Options.h"
    34 #include <wtf/Optional.h>
    35 
    3630namespace JSC { namespace B3 {
    3731
    38 bool shouldDumpIR(B3ComplitationMode mode)
    39 {
    40 #if ENABLE(FTL_JIT)
    41     return FTL::verboseCompilationEnabled() || FTL::shouldDumpDisassembly() || shouldDumpIRAtEachPhase(mode);
    42 #else
    43     return shouldDumpIRAtEachPhase(mode);
    44 #endif
    45 }
     32class Procedure;
    4633
    47 bool shouldDumpIRAtEachPhase(B3ComplitationMode mode)
    48 {
    49     if (mode == B3Mode)
    50         return Options::dumpGraphAtEachPhase() || Options::dumpB3GraphAtEachPhase();
    51     return Options::dumpGraphAtEachPhase() || Options::dumpAirGraphAtEachPhase();
    52 }
    53 
    54 bool shouldValidateIR()
    55 {
    56     return DFG::validationEnabled() || shouldValidateIRAtEachPhase();
    57 }
    58 
    59 bool shouldValidateIRAtEachPhase()
    60 {
    61     return Options::validateGraphAtEachPhase();
    62 }
    63 
    64 bool shouldSaveIRBeforePhase()
    65 {
    66     return Options::verboseValidationFailure();
    67 }
    68 
    69 Optional<GPRReg> pinnedExtendedOffsetAddrRegister()
    70 {
    71 #if CPU(ARM64)
    72     return static_cast<GPRReg>(+MacroAssembler::dataTempRegister);
    73 #elif CPU(X86_64)
    74     return WTF::nullopt;
    75 #else
    76 #error Unhandled architecture.
    77 #endif
    78 }
     34// Finds large expression trees using one of the associative and commutative opcodes (Add/Mul/BitAnd/BitOr/BitXor),
     35// then reorder them to eliminate as many operations as possible, then make balanced binary trees out of them
     36// to maximize ILP and minimize the total latency.
     37bool optimizeAssociativeExpressionTrees(Procedure&);
    7938
    8039} } // namespace JSC::B3
    8140
    8241#endif // ENABLE(B3_JIT)
    83 
  • trunk/Source/JavaScriptCore/b3/B3ReduceStrength.cpp

    r243670 r243851  
    3434#include "B3ComputeDivisionMagic.h"
    3535#include "B3Dominators.h"
     36#include "B3EliminateDeadCode.h"
    3637#include "B3InsertionSetInlines.h"
    3738#include "B3MemoryValueInlines.h"
     
    441442            // keep @thing. That's better, since we usually want things to stay wherever the client
    442443            // put them. We're not actually smart enough to move things around at random.
    443             killDeadCode();
     444            m_changed |= eliminateDeadCodeImpl(m_proc);
    444445           
    445446            simplifySSA();
     
    27332734    }
    27342735
    2735     void killDeadCode()
    2736     {
    2737         GraphNodeWorklist<Value*, IndexSet<Value*>> worklist;
    2738         Vector<UpsilonValue*, 64> upsilons;
    2739         for (BasicBlock* block : m_proc) {
    2740             for (Value* value : *block) {
    2741                 Effects effects;
    2742                 // We don't care about effects of SSA operations, since we model them more
    2743                 // accurately than the effects() method does.
    2744                 if (value->opcode() != Phi && value->opcode() != Upsilon)
    2745                     effects = value->effects();
    2746                
    2747                 if (effects.mustExecute())
    2748                     worklist.push(value);
    2749                
    2750                 if (UpsilonValue* upsilon = value->as<UpsilonValue>())
    2751                     upsilons.append(upsilon);
    2752             }
    2753         }
    2754         for (;;) {
    2755             while (Value* value = worklist.pop()) {
    2756                 for (Value* child : value->children())
    2757                     worklist.push(child);
    2758             }
    2759            
    2760             bool didPush = false;
    2761             for (size_t upsilonIndex = 0; upsilonIndex < upsilons.size(); ++upsilonIndex) {
    2762                 UpsilonValue* upsilon = upsilons[upsilonIndex];
    2763                 if (worklist.saw(upsilon->phi())) {
    2764                     worklist.push(upsilon);
    2765                     upsilons[upsilonIndex--] = upsilons.last();
    2766                     upsilons.takeLast();
    2767                     didPush = true;
    2768                 }
    2769             }
    2770             if (!didPush)
    2771                 break;
    2772         }
    2773 
    2774         IndexSet<Variable*> liveVariables;
    2775        
    2776         for (BasicBlock* block : m_proc) {
    2777             size_t sourceIndex = 0;
    2778             size_t targetIndex = 0;
    2779             while (sourceIndex < block->size()) {
    2780                 Value* value = block->at(sourceIndex++);
    2781                 if (worklist.saw(value)) {
    2782                     if (VariableValue* variableValue = value->as<VariableValue>())
    2783                         liveVariables.add(variableValue->variable());
    2784                     block->at(targetIndex++) = value;
    2785                 } else {
    2786                     m_proc.deleteValue(value);
    2787                     m_changed = true;
    2788                 }
    2789             }
    2790             block->values().resize(targetIndex);
    2791         }
    2792 
    2793         for (Variable* variable : m_proc.variables()) {
    2794             if (!liveVariables.contains(variable))
    2795                 m_proc.deleteVariable(variable);
    2796         }
    2797     }
    2798 
    27992736    void simplifySSA()
    28002737    {
  • trunk/Source/JavaScriptCore/b3/B3Value.cpp

    r241335 r243851  
    6464
    6565    ASSERT(m_type == value->m_type);
     66    ASSERT(value != this);
    6667
    6768    if (m_type == Void) {
  • trunk/Source/JavaScriptCore/b3/testb3.cpp

    r243670 r243851  
    3939#include "B3ComputeDivisionMagic.h"
    4040#include "B3Const32Value.h"
     41#include "B3Const64Value.h"
    4142#include "B3ConstPtrValue.h"
    4243#include "B3Effects.h"
     
    397398    testLoadWithOffsetImpl(32761, 16381);
    398399    testLoadWithOffsetImpl(32768, 16384);
     400}
     401
     402void testBitXorTreeArgs(int64_t a, int64_t b)
     403{
     404    Procedure proc;
     405    BasicBlock* root = proc.addBlock();
     406    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     407    Value* argB = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1);
     408    Value* node = root->appendNew<Value>(proc, BitXor, Origin(), argA, argB);
     409    node = root->appendNew<Value>(proc, BitXor, Origin(), node, argB);
     410    node = root->appendNew<Value>(proc, BitXor, Origin(), node, argA);
     411    node = root->appendNew<Value>(proc, BitXor, Origin(), node, argB);
     412    root->appendNew<Value>(proc, Return, Origin(), node);
     413
     414    CHECK_EQ(compileAndRun<int64_t>(proc, a, b), (((a ^ b) ^ b) ^ a) ^ b);
     415}
     416
     417void testBitXorTreeArgsEven(int64_t a, int64_t b)
     418{
     419    Procedure proc;
     420    BasicBlock* root = proc.addBlock();
     421    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     422    Value* argB = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1);
     423    Value* node = root->appendNew<Value>(proc, BitXor, Origin(), argA, argB);
     424    node = root->appendNew<Value>(proc, BitXor, Origin(), node, argB);
     425    node = root->appendNew<Value>(proc, BitXor, Origin(), node, argA);
     426    root->appendNew<Value>(proc, Return, Origin(), node);
     427
     428    CHECK_EQ(compileAndRun<int64_t>(proc, a, b), ((a ^ b) ^ b) ^ a);
     429}
     430
     431void testBitXorTreeArgImm(int64_t a, int64_t b)
     432{
     433    Procedure proc;
     434    BasicBlock* root = proc.addBlock();
     435    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     436    Value* immB = root->appendNew<Const64Value>(proc, Origin(), b);
     437    Value* node = root->appendNew<Value>(proc, BitXor, Origin(), argA, immB);
     438    node = root->appendNew<Value>(proc, BitXor, Origin(), argA, node);
     439    node = root->appendNew<Value>(proc, BitXor, Origin(), argA, node);
     440    node = root->appendNew<Value>(proc, BitXor, Origin(), immB, node);
     441    root->appendNew<Value>(proc, Return, Origin(), node);
     442
     443    CHECK_EQ(compileAndRun<int64_t>(proc, a), b ^ (a ^ (a ^ (a ^ b))));
     444}
     445
     446void testAddTreeArg32(int32_t a)
     447{
     448    Procedure proc;
     449    BasicBlock* root = proc.addBlock();
     450    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     451    argA = root->appendNew<Value>(proc, Trunc, Origin(), argA);
     452    Value* node = argA;
     453    int32_t expectedResult = a;
     454    for (unsigned i = 0; i < 20; ++i) {
     455        Value* otherNode;
     456        if (!(i % 3)) {
     457            otherNode = root->appendNew<Const32Value>(proc, Origin(), i);
     458            expectedResult += i;
     459        } else {
     460            otherNode = argA;
     461            expectedResult += a;
     462        }
     463        node = root->appendNew<Value>(proc, Add, Origin(), node, otherNode);
     464    }
     465    root->appendNew<Value>(proc, Return, Origin(), node);
     466
     467    CHECK_EQ(compileAndRun<int32_t>(proc, a), expectedResult);
     468}
     469
     470void testMulTreeArg32(int32_t a)
     471{
     472    // Fibonacci-like expression tree with multiplication instead of addition.
     473    // Verifies that we don't explode on heavily factored graphs.
     474    Procedure proc;
     475    BasicBlock* root = proc.addBlock();
     476    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     477    argA = root->appendNew<Value>(proc, Trunc, Origin(), argA);
     478    Value* nodeA = argA;
     479    Value* nodeB = argA;
     480    int32_t expectedA = a, expectedResult = a;
     481    for (unsigned i = 0; i < 20; ++i) {
     482        Value* newNodeB = root->appendNew<Value>(proc, Mul, Origin(), nodeA, nodeB);
     483        nodeA = nodeB;
     484        nodeB = newNodeB;
     485        int32_t newExpectedResult = expectedA * expectedResult;
     486        expectedA = expectedResult;
     487        expectedResult = newExpectedResult;
     488    }
     489    root->appendNew<Value>(proc, Return, Origin(), nodeB);
     490
     491    CHECK_EQ(compileAndRun<int32_t>(proc, a), expectedResult);
     492}
     493
     494void testBitAndTreeArg32(int32_t a)
     495{
     496    Procedure proc;
     497    BasicBlock* root = proc.addBlock();
     498    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     499    argA = root->appendNew<Value>(proc, Trunc, Origin(), argA);
     500    Value* node = argA;
     501    for (unsigned i = 0; i < 8; ++i) {
     502        Value* constI = root->appendNew<Const32Value>(proc, Origin(), i | 42);
     503        Value* newBitAnd = root->appendNew<Value>(proc, BitAnd, Origin(), argA, constI);
     504        node = root->appendNew<Value>(proc, BitAnd, Origin(), node, newBitAnd);
     505    }
     506    root->appendNew<Value>(proc, Return, Origin(), node);
     507
     508    CHECK_EQ(compileAndRun<int32_t>(proc, a), a & 42);
     509}
     510
     511void testBitOrTreeArg32(int32_t a)
     512{
     513    Procedure proc;
     514    BasicBlock* root = proc.addBlock();
     515    Value* argA = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
     516    argA = root->appendNew<Value>(proc, Trunc, Origin(), argA);
     517    Value* node = argA;
     518    for (unsigned i = 0; i < 8; ++i) {
     519        Value* constI = root->appendNew<Const32Value>(proc, Origin(), i);
     520        Value* newBitAnd = root->appendNew<Value>(proc, BitOr, Origin(), argA, constI);
     521        node = root->appendNew<Value>(proc, BitOr, Origin(), node, newBitAnd);
     522    }
     523    root->appendNew<Value>(proc, Return, Origin(), node);
     524
     525    CHECK_EQ(compileAndRun<int32_t>(proc, a), a | 7);
    399526}
    400527
     
    1704817175    RUN(testReturnConst64(-42));
    1704917176    RUN(testReturnVoid());
     17177
     17178    RUN_BINARY(testBitXorTreeArgs, int64Operands(), int64Operands());
     17179    RUN_BINARY(testBitXorTreeArgsEven, int64Operands(), int64Operands());
     17180    RUN_BINARY(testBitXorTreeArgImm, int64Operands(), int64Operands());
     17181    RUN_UNARY(testAddTreeArg32, int32Operands());
     17182    RUN_UNARY(testMulTreeArg32, int32Operands());
     17183    RUN_UNARY(testBitAndTreeArg32, int32Operands());
     17184    RUN_UNARY(testBitOrTreeArg32, int32Operands());
    1705017185
    1705117186    RUN(testAddArg(111));
Note: See TracChangeset for help on using the changeset viewer.