Changeset 192816 in webkit
- Timestamp:
- Nov 30, 2015 1:05:25 PM (8 years ago)
- Location:
- trunk/Source/JavaScriptCore
- Files:
-
- 2 added
- 6 edited
Legend:
- Unmodified
- Added
- Removed
-
trunk/Source/JavaScriptCore/CMakeLists.txt
r192776 r192816 122 122 b3/B3PatchpointValue.cpp 123 123 b3/B3PhaseScope.cpp 124 b3/B3PhiChildren.cpp 124 125 b3/B3Procedure.cpp 125 126 b3/B3ReduceStrength.cpp -
trunk/Source/JavaScriptCore/ChangeLog
r192815 r192816 1 2015-11-30 Filip Pizlo <fpizlo@apple.com> 2 3 B3 should be be clever about choosing which child to reuse for result in two-operand commutative operations 4 https://bugs.webkit.org/show_bug.cgi?id=151321 5 6 Reviewed by Geoffrey Garen. 7 8 When lowering a commutative operation to a two-operand instruction, you have a choice of which 9 child value to move into the result tmp. For example we might have: 10 11 @x = Add(@y, @z) 12 13 Assuming no three-operand add is available, we could either lower it to this: 14 15 Move %y, %x 16 Add %z, %x 17 18 or to this: 19 20 Move %z, %x 21 Add %y, %x 22 23 Which is better depends on the likelihood of coalescing with %x. If it's more likely that %y will 24 coalesce with %x, then we want to use the first form. Otherwise, we should use the second form. 25 26 This implements two heuristics for selecting the right form, and makes those heuristics reusable 27 within the B3->Air lowering by abstracting it as preferRightForResult(). For non-commutative 28 operations we must use the first form, so the first form is the default. The heuristics are: 29 30 - If the right child has only one user, then use the second form instead. This is profitable because 31 that means that @z dies at the Add, so using the second form means that the Move will be coalesced 32 away. 33 34 - If one of the children is a Phi that this operation (the Add in this case) flows into via some 35 Upsilon - possibly transitively through other Phis - then use the form that cases a Move on that 36 child. This overrides everything else, and is meant to optimize variables that accumulate in a 37 loop. 38 39 This required adding a reusable PhiChildren analysis, so I wrote one. It has an API that is mostly 40 based on iterators, and a higher-level API for looking at transitive children that is based on 41 functors. 42 43 I was originally implementing this for completeness, but when looking at how it interacted with 44 imaging-gaussian-blur, I realized the need for some heuristic for the loop-accumulator case. This 45 helps a lot on that benchmark. This widens the overall lead that B3 has on imaging-gaussian-blur, but 46 steady-state runs that exclude compile latency still show a slight deficit. That will most likely get 47 fixed by https://bugs.webkit.org/show_bug.cgi?id=151174. 48 49 No new tests because the commutativity appears to be covered by existing tests, and anyway, there are 50 no correctness implications to commuting a commutative operation. 51 52 * CMakeLists.txt: 53 * JavaScriptCore.xcodeproj/project.pbxproj: 54 * b3/B3LowerToAir.cpp: 55 (JSC::B3::Air::LowerToAir::LowerToAir): 56 (JSC::B3::Air::LowerToAir::canBeInternal): 57 (JSC::B3::Air::LowerToAir::appendUnOp): 58 (JSC::B3::Air::LowerToAir::preferRightForResult): 59 (JSC::B3::Air::LowerToAir::appendBinOp): 60 (JSC::B3::Air::LowerToAir::lower): 61 * b3/B3PhiChildren.cpp: Added. 62 (JSC::B3::PhiChildren::PhiChildren): 63 (JSC::B3::PhiChildren::~PhiChildren): 64 * b3/B3PhiChildren.h: Added. 65 (JSC::B3::PhiChildren::ValueCollection::ValueCollection): 66 (JSC::B3::PhiChildren::ValueCollection::size): 67 (JSC::B3::PhiChildren::ValueCollection::at): 68 (JSC::B3::PhiChildren::ValueCollection::operator[]): 69 (JSC::B3::PhiChildren::ValueCollection::contains): 70 (JSC::B3::PhiChildren::ValueCollection::iterator::iterator): 71 (JSC::B3::PhiChildren::ValueCollection::iterator::operator*): 72 (JSC::B3::PhiChildren::ValueCollection::iterator::operator++): 73 (JSC::B3::PhiChildren::ValueCollection::iterator::operator==): 74 (JSC::B3::PhiChildren::ValueCollection::iterator::operator!=): 75 (JSC::B3::PhiChildren::ValueCollection::begin): 76 (JSC::B3::PhiChildren::ValueCollection::end): 77 (JSC::B3::PhiChildren::UpsilonCollection::UpsilonCollection): 78 (JSC::B3::PhiChildren::UpsilonCollection::size): 79 (JSC::B3::PhiChildren::UpsilonCollection::at): 80 (JSC::B3::PhiChildren::UpsilonCollection::operator[]): 81 (JSC::B3::PhiChildren::UpsilonCollection::contains): 82 (JSC::B3::PhiChildren::UpsilonCollection::begin): 83 (JSC::B3::PhiChildren::UpsilonCollection::end): 84 (JSC::B3::PhiChildren::UpsilonCollection::values): 85 (JSC::B3::PhiChildren::UpsilonCollection::forAllTransitiveIncomingValues): 86 (JSC::B3::PhiChildren::UpsilonCollection::transitivelyUses): 87 (JSC::B3::PhiChildren::at): 88 (JSC::B3::PhiChildren::operator[]): 89 * b3/B3Procedure.cpp: 90 (JSC::B3::Procedure::Procedure): 91 * b3/B3Procedure.h: 92 * b3/B3UseCounts.cpp: 93 (JSC::B3::UseCounts::UseCounts): 94 * b3/B3UseCounts.h: 95 (JSC::B3::UseCounts::numUses): 96 (JSC::B3::UseCounts::numUsingInstructions): 97 (JSC::B3::UseCounts::operator[]): Deleted. 98 1 99 2015-11-30 Filip Pizlo <fpizlo@apple.com> 2 100 -
trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
r192812 r192816 307 307 0F34B14916D42010001CDA5A /* DFGUseKind.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F34B14716D4200E001CDA5A /* DFGUseKind.cpp */; }; 308 308 0F34B14A16D42013001CDA5A /* DFGUseKind.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F34B14816D4200E001CDA5A /* DFGUseKind.h */; }; 309 0F37308C1C0BD29100052BFA /* B3PhiChildren.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F37308A1C0BD29100052BFA /* B3PhiChildren.cpp */; }; 310 0F37308D1C0BD29100052BFA /* B3PhiChildren.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F37308B1C0BD29100052BFA /* B3PhiChildren.h */; }; 309 311 0F37308F1C0CD68500052BFA /* DisallowMacroScratchRegisterUsage.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F37308E1C0CD68500052BFA /* DisallowMacroScratchRegisterUsage.h */; }; 310 312 0F3730911C0CD70C00052BFA /* AllowMacroScratchRegisterUsage.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F3730901C0CD70C00052BFA /* AllowMacroScratchRegisterUsage.h */; }; … … 2370 2372 0F34B14716D4200E001CDA5A /* DFGUseKind.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = DFGUseKind.cpp; path = dfg/DFGUseKind.cpp; sourceTree = "<group>"; }; 2371 2373 0F34B14816D4200E001CDA5A /* DFGUseKind.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGUseKind.h; path = dfg/DFGUseKind.h; sourceTree = "<group>"; }; 2374 0F37308A1C0BD29100052BFA /* B3PhiChildren.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = B3PhiChildren.cpp; path = b3/B3PhiChildren.cpp; sourceTree = "<group>"; }; 2375 0F37308B1C0BD29100052BFA /* B3PhiChildren.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = B3PhiChildren.h; path = b3/B3PhiChildren.h; sourceTree = "<group>"; }; 2372 2376 0F37308E1C0CD68500052BFA /* DisallowMacroScratchRegisterUsage.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = DisallowMacroScratchRegisterUsage.h; sourceTree = "<group>"; }; 2373 2377 0F3730901C0CD70C00052BFA /* AllowMacroScratchRegisterUsage.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = AllowMacroScratchRegisterUsage.h; sourceTree = "<group>"; }; … … 4549 4553 0FEC84DF1BDACDAC0080FF74 /* B3PhaseScope.cpp */, 4550 4554 0FEC84E01BDACDAC0080FF74 /* B3PhaseScope.h */, 4555 0F37308A1C0BD29100052BFA /* B3PhiChildren.cpp */, 4556 0F37308B1C0BD29100052BFA /* B3PhiChildren.h */, 4551 4557 0FEC84E11BDACDAC0080FF74 /* B3Procedure.cpp */, 4552 4558 0FEC84E21BDACDAC0080FF74 /* B3Procedure.h */, … … 6900 6906 0F9D36951AE9CC33000D4DFB /* DFGCleanUpPhase.h in Headers */, 6901 6907 A77A424017A0BBFD00A8DB81 /* DFGClobberize.h in Headers */, 6908 0F37308D1C0BD29100052BFA /* B3PhiChildren.h in Headers */, 6902 6909 A77A424217A0BBFD00A8DB81 /* DFGClobberSet.h in Headers */, 6903 6910 0F3C1F1B1B868E7900ABB08B /* DFGClobbersExitState.h in Headers */, … … 8835 8842 142D6F1113539A4100B02E86 /* MarkStack.cpp in Sources */, 8836 8843 4340A4841A9051AF00D73CCA /* MathCommon.cpp in Sources */, 8844 0F37308C1C0BD29100052BFA /* B3PhiChildren.cpp in Sources */, 8837 8845 14469DDF107EC7E700650446 /* MathObject.cpp in Sources */, 8838 8846 90213E3D123A40C200D422F3 /* MemoryStatistics.cpp in Sources */, -
trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp
r192699 r192816 45 45 #include "B3PatchpointValue.h" 46 46 #include "B3PhaseScope.h" 47 #include "B3PhiChildren.h" 47 48 #include "B3Procedure.h" 48 49 #include "B3StackSlotValue.h" … … 66 67 , m_blockToBlock(procedure.size()) 67 68 , m_useCounts(procedure) 69 , m_phiChildren(procedure) 68 70 , m_procedure(procedure) 69 71 , m_code(procedure.code()) … … 318 320 return false; 319 321 320 // We require internals to have only one use - us. 321 if (m_useCounts[value] != 1) 322 // We require internals to have only one use - us. It's not clear if this should be numUses() or 323 // numUsingInstructions(). Ideally, it would be numUsingInstructions(), except that it's not clear 324 // if we'd actually do the right thing when matching over such a DAG pattern. For now, it simply 325 // doesn't matter because we don't implement patterns that would trigger this. 326 if (m_useCounts.numUses(value) != 1) 322 327 return false; 323 328 … … 526 531 } 527 532 533 // Call this method when doing two-operand lowering of a commutative operation. You have a choice of 534 // which incoming Value is moved into the result. This will select which one is likely to be most 535 // profitable to use as the result. Doing the right thing can have big performance consequences in tight 536 // kernels. 537 bool preferRightForResult(Value* left, Value* right) 538 { 539 // The default is to move left into result, because that's required for non-commutative instructions. 540 // The value that we want to move into result position is the one that dies here. So, if we're 541 // compiling a commutative operation and we know that actually right is the one that dies right here, 542 // then we can flip things around to help coalescing, which then kills the move instruction. 543 // 544 // But it's more complicated: 545 // - Used-once is a bad estimate of whether the variable dies here. 546 // - A child might be a candidate for coalescing with this value. 547 // 548 // Currently, we have machinery in place to recognize super obvious forms of the latter issue. 549 550 bool result = m_useCounts.numUsingInstructions(right) == 1; 551 552 // We recognize when a child is a Phi that has this value as one of its children. We're very 553 // conservative about this; for example we don't even consider transitive Phi children. 554 bool leftIsPhiWithThis = m_phiChildren[left].transitivelyUses(m_value); 555 bool rightIsPhiWithThis = m_phiChildren[right].transitivelyUses(m_value); 556 557 if (leftIsPhiWithThis != rightIsPhiWithThis) 558 result = rightIsPhiWithThis; 559 560 return result; 561 } 562 528 563 template< 529 564 Air::Opcode opcode32, Air::Opcode opcode64, Air::Opcode opcodeDouble, … … 599 634 } 600 635 601 // FIXME: If we're going to use a two-operand instruction, and the operand is commutative, we 602 // should coalesce the result with the operand that is killed. 603 // https://bugs.webkit.org/show_bug.cgi?id=151321 636 if (commutativity == Commutative && preferRightForResult(left, right)) { 637 append(relaxedMoveForType(m_value->type()), tmp(right), result); 638 append(opcode, tmp(left), result); 639 return; 640 } 604 641 605 642 append(relaxedMoveForType(m_value->type()), tmp(left), result); … … 1659 1696 1660 1697 case CheckAdd: 1661 case CheckSub: { 1698 case CheckSub: 1699 case CheckMul: { 1662 1700 CheckValue* checkValue = m_value->as<CheckValue>(); 1663 1701 … … 1685 1723 } 1686 1724 1687 // FIXME: Use commutativity of CheckAdd to increase the likelihood of coalescing.1688 // https://bugs.webkit.org/show_bug.cgi?id=1513211689 1690 append(Move, tmp(left), result);1691 1692 1725 Air::Opcode opcode = Air::Oops; 1726 Commutativity commutativity = NotCommutative; 1727 Arg::Role stackmapRole = Arg::Use; 1693 1728 switch (m_value->opcode()) { 1694 1729 case CheckAdd: 1695 1730 opcode = opcodeForType(BranchAdd32, BranchAdd64, Air::Oops, m_value->type()); 1731 commutativity = Commutative; 1696 1732 break; 1697 1733 case CheckSub: 1698 1734 opcode = opcodeForType(BranchSub32, BranchSub64, Air::Oops, m_value->type()); 1699 1735 break; 1736 case CheckMul: 1737 opcode = opcodeForType(BranchMul32, BranchMul64, Air::Oops, checkValue->type()); 1738 stackmapRole = Arg::LateUse; 1739 break; 1700 1740 default: 1701 1741 RELEASE_ASSERT_NOT_REACHED(); … … 1708 1748 // https://bugs.webkit.org/show_bug.cgi?id=151228 1709 1749 1710 Arg source; 1711 if (imm(right) && isValidForm(opcode, Arg::ResCond, Arg::Imm, Arg::Tmp)) 1712 source = imm(right); 1713 else 1714 source = tmp(right); 1750 Vector<Arg, 2> sources; 1751 if (imm(right) && isValidForm(opcode, Arg::ResCond, Arg::Tmp, Arg::Imm, Arg::Tmp)) { 1752 sources.append(tmp(left)); 1753 sources.append(imm(right)); 1754 } else if (imm(right) && isValidForm(opcode, Arg::ResCond, Arg::Imm, Arg::Tmp)) { 1755 sources.append(imm(right)); 1756 append(Move, tmp(left), result); 1757 } else if (commutativity == Commutative && preferRightForResult(left, right)) { 1758 sources.append(tmp(left)); 1759 append(Move, tmp(right), result); 1760 } else { 1761 sources.append(tmp(right)); 1762 append(Move, tmp(left), result); 1763 } 1715 1764 1716 1765 // There is a really hilarious case that arises when we do BranchAdd32(%x, %x). We won't emit … … 1737 1786 // LateUse here to take care of add-to-self. 1738 1787 1739 CheckSpecial* special = ensureCheckSpecial(opcode, 3);1788 CheckSpecial* special = ensureCheckSpecial(opcode, 2 + sources.size(), stackmapRole); 1740 1789 1741 1790 Inst inst(Patch, checkValue, Arg::special(special)); … … 1743 1792 inst.args.append(Arg::resCond(MacroAssembler::Overflow)); 1744 1793 1745 inst.args.append (source);1794 inst.args.appendVector(sources); 1746 1795 inst.args.append(result); 1747 1796 1748 1797 fillStackmap(inst, checkValue, 2); 1749 1798 1750 m_insts.last().append(WTF::move(inst));1751 return;1752 }1753 1754 case CheckMul: {1755 CheckValue* checkValue = m_value->as<CheckValue>();1756 1757 Value* left = checkValue->child(0);1758 Value* right = checkValue->child(1);1759 1760 Tmp result = tmp(m_value);1761 1762 Air::Opcode opcode =1763 opcodeForType(BranchMul32, BranchMul64, Air::Oops, checkValue->type());1764 CheckSpecial* special = ensureCheckSpecial(opcode, 3, Arg::LateUse);1765 1766 // FIXME: Handle immediates.1767 // https://bugs.webkit.org/show_bug.cgi?id=1512301768 1769 append(Move, tmp(left), result);1770 1771 Inst inst(Patch, checkValue, Arg::special(special));1772 inst.args.append(Arg::resCond(MacroAssembler::Overflow));1773 inst.args.append(tmp(right));1774 inst.args.append(result);1775 1776 fillStackmap(inst, checkValue, 2);1777 1778 1799 m_insts.last().append(WTF::move(inst)); 1779 1800 return; … … 1859 1880 1860 1881 UseCounts m_useCounts; 1882 PhiChildren m_phiChildren; 1861 1883 1862 1884 Vector<Vector<Inst, 4>> m_insts; -
trunk/Source/JavaScriptCore/b3/B3UseCounts.cpp
r191716 r192816 36 36 : m_counts(procedure.values().size()) 37 37 { 38 for (Value* value : procedure.values()) 39 ASSERT_UNUSED(value, !m_counts[value]); 38 Vector<Value*, 64> children; 40 39 for (Value* value : procedure.values()) { 41 for (Value* child : value->children()) 42 m_counts[child]++; 40 children.resize(0); 41 for (Value* child : value->children()) { 42 m_counts[child].numUses++; 43 children.append(child); 44 } 45 std::sort(children.begin(), children.end()); 46 Value* last = nullptr; 47 for (Value* child : children) { 48 if (child == last) 49 continue; 50 51 m_counts[child].numUsingInstructions++; 52 last = child; 53 } 43 54 } 44 55 } -
trunk/Source/JavaScriptCore/b3/B3UseCounts.h
r191705 r192816 41 41 ~UseCounts(); 42 42 43 unsigned operator[](Value* value) const 44 { 45 return m_counts[value]; 46 } 43 unsigned numUses(Value* value) const { return m_counts[value].numUses; } 44 unsigned numUsingInstructions(Value* value) const { return m_counts[value].numUsingInstructions; } 45 47 46 private: 48 IndexMap<Value, unsigned> m_counts; 47 struct Counts { 48 unsigned numUses { 0 }; 49 unsigned numUsingInstructions { 0 }; 50 }; 51 52 IndexMap<Value, Counts> m_counts; 49 53 }; 50 54
Note: See TracChangeset
for help on using the changeset viewer.