Changeset 280452 in webkit
- Timestamp:
- Jul 29, 2021 3:26:13 PM (12 months ago)
- Location:
- trunk
- Files:
-
- 5 added
- 11 edited
-
JSTests/ChangeLog (modified) (1 diff)
-
JSTests/microbenchmarks/jquery-todomvc-regexp.js (modified) (1 diff)
-
JSTests/stress/regexp--bm-search-long-character.js (added)
-
JSTests/stress/regexp--bm-search-long-map.js (added)
-
JSTests/stress/regexp-bitvector-reuse.js (added)
-
JSTests/stress/regexp-non-ascii-bm-search-character.js (added)
-
JSTests/stress/regexp-non-ascii-bm-search-map.js (added)
-
Source/JavaScriptCore/ChangeLog (modified) (1 diff)
-
Source/JavaScriptCore/runtime/OptionsList.h (modified) (1 diff)
-
Source/JavaScriptCore/runtime/RegExp.cpp (modified) (1 diff)
-
Source/JavaScriptCore/yarr/YarrJIT.cpp (modified) (16 diffs)
-
Source/JavaScriptCore/yarr/YarrJIT.h (modified) (7 diffs)
-
Source/WTF/ChangeLog (modified) (1 diff)
-
Source/WTF/wtf/BitVector.cpp (modified) (1 diff)
-
Source/WTF/wtf/Bitmap.h (modified) (3 diffs)
-
Source/WTF/wtf/UniqueRef.h (modified) (3 diffs)
Legend:
- Unmodified
- Added
- Removed
-
trunk/JSTests/ChangeLog
r280289 r280452 1 2021-07-28 Yusuke Suzuki <ysuzuki@apple.com> 2 3 [JSC] Yarr should perform BoyerMoore search 4 https://bugs.webkit.org/show_bug.cgi?id=228301 5 6 Reviewed by Saam Barati. 7 8 * microbenchmarks/jquery-todomvc-regexp.js: 9 * stress/regexp--bm-search-long-character.js: Added. 10 (shouldBe): 11 * stress/regexp--bm-search-long-map.js: Added. 12 (shouldBe): 13 * stress/regexp-bitvector-reuse.js: Added. 14 (shouldBe): 15 * stress/regexp-non-ascii-bm-search-character.js: Added. 16 (shouldBe): 17 * stress/regexp-non-ascii-bm-search-map.js: Added. 18 (shouldBe): 19 1 20 2021-07-25 Alexey Shvayka <shvaikalesh@gmail.com> 2 21 -
trunk/JSTests/microbenchmarks/jquery-todomvc-regexp.js
r280281 r280452 84210 84210 results = regExp$16.exec(string$215); 84211 84211 } 84212 noInline(test); 84212 if (typeof noInline !== "undefined") 84213 noInline(test); 84213 84214 regExps.push(/avoid constant folding/gi); 84214 84215 strings.push(/avoid constant folding/gi); -
trunk/Source/JavaScriptCore/ChangeLog
r280391 r280452 1 2021-07-28 Yusuke Suzuki <ysuzuki@apple.com> 2 3 [JSC] Yarr should perform BoyerMoore search 4 https://bugs.webkit.org/show_bug.cgi?id=228301 5 6 Reviewed by Saam Barati. 7 8 This patch emits skipping fast-path at the beginning of body alternatives with a large stride. So we can quickly discard unrelated characters 9 and attempt to find possibly related sequence in the long sequence. The method is derived from V8's implementation (with some extensions). 10 11 If we have a searching pattern /abcdef/, then we can check the 6th character against a set of {a, b, c, d, e, f}. 12 If it does not match, we can shift 6 characters. We use this strategy since this way can be extended easily to support 13 disjunction, character-class, and ignore-cases. For example, in the case of /(?:abc|def)/, we can check 3rd character 14 against {a, b, c, d, e, f} and shift 3 characters if it does not match. 15 16 Then, the best way to perform the above shifting is that finding the longest character sequence which does not have 17 many candidates. In the case of /[a-z]aaaaaaa[a-z]/, we can extract "aaaaaaa" sequence and check 8th character against {a}. 18 If it does not match, then we can shift 7 characters (length of "aaaaaaa"). This shifting is better than using "[a-z]aaaaaaa[a-z]" 19 sequence and {a-z} set since {a-z} set will almost always match. 20 21 We first collect possible characters for each character position. Then, apply heuristics to extract good character sequence from 22 that and construct fast searching with long stride. 23 24 Microbenchmark which performs RegExp ops in Speedometer2/jQuery-TodoMVC shows 25% improvement. 25 26 ToT Patched 27 28 jquery-todomvc-regexp 723.9739+-1.3997 ^ 579.1698+-1.2505 ^ definitely 1.2500x faster 29 30 This improves Speedometer2/jQuery-TodoMVC by 3%. 31 32 ---------------------------------------------------------------------------------------------------------------------------------- 33 | subtest | ms | ms | b / a | pValue (significance using False Discovery Rate) | 34 ---------------------------------------------------------------------------------------------------------------------------------- 35 | Elm-TodoMVC |123.365625 |123.456250 |1.000735 | 0.804077 | 36 | VueJS-TodoMVC |26.912500 |26.925000 |1.000464 | 0.969603 | 37 | EmberJS-TodoMVC |127.540625 |127.562500 |1.000172 | 0.960474 | 38 | BackboneJS-TodoMVC |50.606250 |50.518750 |0.998271 | 0.670313 | 39 | Preact-TodoMVC |21.018750 |20.850000 |0.991971 | 0.563818 | 40 | AngularJS-TodoMVC |136.943750 |137.271875 |1.002396 | 0.531513 | 41 | Vanilla-ES2015-TodoMVC |68.521875 |68.593750 |1.001049 | 0.701376 | 42 | Inferno-TodoMVC |65.559375 |65.803125 |1.003718 | 0.414418 | 43 | Flight-TodoMVC |77.284375 |76.715625 |0.992641 | 0.219870 | 44 | Angular2-TypeScript-TodoMVC |40.725000 |40.318750 |0.990025 | 0.281212 | 45 | VanillaJS-TodoMVC |55.209375 |54.715625 |0.991057 | 0.056921 | 46 | jQuery-TodoMVC |266.396875 |258.471875 |0.970251 | 0.000000 (significant) | 47 | EmberJS-Debug-TodoMVC |341.550000 |341.856250 |1.000897 | 0.618140 | 48 | React-TodoMVC |88.731250 |88.871875 |1.001585 | 0.512407 | 49 | React-Redux-TodoMVC |150.340625 |150.065625 |0.998171 | 0.412940 | 50 | Vanilla-ES2015-Babel-Webpack-TodoMVC |65.390625 |65.362500 |0.999570 | 0.834760 | 51 ---------------------------------------------------------------------------------------------------------------------------------- 52 a mean = 245.96997 53 b mean = 246.86366 54 pValue = 0.0061448402 55 (Bigger means are better.) 56 1.004 times better 57 Results ARE significant 58 59 * runtime/OptionsList.h: 60 * yarr/YarrJIT.cpp: 61 (JSC::Yarr::BoyerMooreInfo::BoyerMooreInfo): 62 (JSC::Yarr::BoyerMooreInfo::length const): 63 (JSC::Yarr::BoyerMooreInfo::set): 64 (JSC::Yarr::BoyerMooreInfo::index const): 65 (JSC::Yarr::BoyerMooreInfo::setIndex): 66 (JSC::Yarr::BoyerMooreInfo::create): 67 (JSC::Yarr::BoyerMooreInfo::findBestCharacterSequence const): 68 (JSC::Yarr::BoyerMooreInfo::findWorthwhileCharacterSequenceForLookahead const): 69 (JSC::Yarr::BoyerMooreInfo::createCandidateBitmap const): 70 * yarr/YarrJIT.h: 71 (JSC::Yarr::BoyerMooreBitmap::count const): 72 (JSC::Yarr::BoyerMooreBitmap::map const): 73 (JSC::Yarr::BoyerMooreBitmap::isMaskEffective const): 74 (JSC::Yarr::BoyerMooreBitmap::add): 75 (JSC::Yarr::BoyerMooreByteVector::BoyerMooreByteVector): 76 (JSC::Yarr::YarrCodeBlock::set8BitCode): 77 (JSC::Yarr::YarrCodeBlock::set16BitCode): 78 (JSC::Yarr::YarrCodeBlock::set8BitCodeMatchOnly): 79 (JSC::Yarr::YarrCodeBlock::set16BitCodeMatchOnly): 80 (JSC::Yarr::YarrCodeBlock::clear): 81 (JSC::Yarr::YarrCodeBlock::findSameVector const): 82 1 83 2021-07-28 Yusuke Suzuki <ysuzuki@apple.com> 2 84 -
trunk/Source/JavaScriptCore/runtime/OptionsList.h
r279977 r280452 453 453 \ 454 454 v(Bool, dumpCompiledRegExpPatterns, false, Normal, nullptr) \ 455 v(Bool, verboseRegExpCompilation, false, Normal, nullptr) \ 455 456 \ 456 457 v(Bool, dumpModuleRecord, false, Normal, nullptr) \ -
trunk/Source/JavaScriptCore/runtime/RegExp.cpp
r280285 r280452 362 362 #if ENABLE(YARR_JIT) 363 363 if (m_regExpJITCode) 364 m_regExpJITCode->clear( );364 m_regExpJITCode->clear(locker); 365 365 #endif 366 366 m_regExpBytecode = nullptr; -
trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp
r280285 r280452 1 1 /* 2 * Copyright (C) 2009-2018 Apple Inc. All rights reserved. 2 * Copyright (C) 2009-2021 Apple Inc. All rights reserved. 3 * Copyright (C) 2019 the V8 project authors. All rights reserved. 3 4 * 4 5 * Redistribution and use in source and binary forms, with or without … … 35 36 #include "YarrMatchingContextHolder.h" 36 37 #include <wtf/ASCIICType.h> 38 #include <wtf/HexNumber.h> 37 39 #include <wtf/Threading.h> 38 40 … … 41 43 42 44 namespace JSC { namespace Yarr { 45 namespace YarrJITInternal { 46 static constexpr bool verbose = false; 47 } 43 48 44 49 #if CPU(ARM64E) 45 50 JSC_ANNOTATE_JIT_OPERATION(_JITTarget_vmEntryToYarrJITAfter, vmEntryToYarrJITAfter); 46 51 #endif 52 53 class BoyerMooreInfo { 54 WTF_MAKE_NONCOPYABLE(BoyerMooreInfo); 55 WTF_MAKE_FAST_ALLOCATED(BoyerMooreInfo); 56 public: 57 static constexpr unsigned maxLength = 32; 58 59 explicit BoyerMooreInfo(unsigned length) 60 : m_characters(length) 61 { 62 ASSERT(this->length() <= maxLength); 63 } 64 65 unsigned length() const { return m_characters.size(); } 66 67 void set(unsigned index, UChar32 character) 68 { 69 m_characters[index].add(character); 70 } 71 72 unsigned index() const { return m_index; } 73 void setIndex(unsigned index) 74 { 75 m_index = index; 76 } 77 78 static UniqueRef<BoyerMooreInfo> create(unsigned length) 79 { 80 return makeUniqueRef<BoyerMooreInfo>(length); 81 } 82 83 std::optional<std::tuple<unsigned, unsigned>> findWorthwhileCharacterSequenceForLookahead() const; 84 std::tuple<BoyerMooreBitmap::Map, bool> createCandidateBitmap(unsigned begin, unsigned end) const; 85 86 private: 87 std::tuple<int, unsigned, unsigned> findBestCharacterSequence(unsigned numberOfCandidatesLimit) const; 88 89 Vector<BoyerMooreBitmap> m_characters; 90 unsigned m_index { 0 }; 91 }; 92 93 std::tuple<int, unsigned, unsigned> BoyerMooreInfo::findBestCharacterSequence(unsigned numberOfCandidatesLimit) const 94 { 95 int biggestPoint = 0; 96 unsigned beginResult = 0; 97 unsigned endResult = 0; 98 for (unsigned index = 0; index < length();) { 99 while (index < length() && m_characters[index].count() > numberOfCandidatesLimit) 100 ++index; 101 if (index == length()) 102 break; 103 unsigned begin = index; 104 BoyerMooreBitmap::Map map { }; 105 for (; index < length() && m_characters[index].count() <= numberOfCandidatesLimit; ++index) 106 map.merge(m_characters[index].map()); 107 108 // If map has many candidates, then point of this sequence is low since it will match too many things. 109 // And if the sequence is longer, then the point of this sequence is higher since it can skip many characters. 110 // FIXME: Currently we are handling all characters equally. But we should have weight per character since e.g. 'e' should appear more frequently than '\v'. 111 // https://bugs.webkit.org/show_bug.cgi?id=228610 112 int frequency = map.count(); 113 int matchingProbability = BoyerMooreBitmap::mapSize - frequency; 114 int point = (index - begin) * matchingProbability; 115 if (point > biggestPoint) { 116 biggestPoint = point; 117 beginResult = begin; 118 endResult = index; 119 } 120 } 121 return std::tuple { biggestPoint, beginResult, endResult }; 122 } 123 124 std::optional<std::tuple<unsigned, unsigned>> BoyerMooreInfo::findWorthwhileCharacterSequenceForLookahead() const 125 { 126 // If candiates-per-character becomes larger, then sequence is not profitable since this sequence will match against 127 // too many characters. But if we limit candiates-per-character smaller, it is possible that we only find very short 128 // character sequence. We start with low limit, then enlarging the limit to find more and more profitable 129 // character sequence. 130 int biggestPoint = 0; 131 unsigned begin = 0; 132 unsigned end = 0; 133 constexpr unsigned maxCandidatesPerCharacter = 32; 134 for (unsigned limit = 4; limit < maxCandidatesPerCharacter; limit *= 2) { 135 auto [newPoint, newBegin, newEnd] = findBestCharacterSequence(limit); 136 if (newPoint > biggestPoint) { 137 biggestPoint = newPoint; 138 begin = newBegin; 139 end = newEnd; 140 } 141 } 142 if (!biggestPoint) 143 return std::nullopt; 144 return std::tuple { begin, end }; 145 } 146 147 std::tuple<BoyerMooreBitmap::Map, bool> BoyerMooreInfo::createCandidateBitmap(unsigned begin, unsigned end) const 148 { 149 BoyerMooreBitmap::Map map { }; 150 bool isMaskEffective = false; 151 for (unsigned index = begin; index < end; ++index) { 152 auto& bmBitmap = m_characters[index]; 153 map.merge(bmBitmap.map()); 154 isMaskEffective |= bmBitmap.isMaskEffective(); 155 } 156 return std::tuple { map, isMaskEffective }; 157 } 47 158 48 159 class YarrGenerator final : public YarrJITInfo, private MacroAssembler { … … 784 895 : m_term(term) 785 896 , m_op(YarrOpCode::Term) 786 , m_isDeadCode(false)787 897 { 788 898 } … … 790 900 explicit YarrOp(YarrOpCode op) 791 901 : m_op(op) 792 , m_isDeadCode(false)793 902 { 794 903 } … … 820 929 // This flag is used to null out the second pattern character, when 821 930 // two are fused to match a pair together. 822 bool m_isDeadCode ;931 bool m_isDeadCode { false }; 823 932 824 933 // Currently used in the case of some of the more complex management of … … 831 940 // upon backtracking back into the disjunction. 832 941 DataLabelPtr m_returnAddress; 942 943 BoyerMooreInfo* m_bmInfo { nullptr }; 833 944 }; 834 945 … … 1411 1522 allCharacters |= (static_cast<uint64_t>(currentCharacter) << shiftAmount); 1412 1523 1413 if ( (m_pattern.ignoreCase()) && (isASCIIAlpha(currentCharacter)))1524 if (m_pattern.ignoreCase() && isASCIIAlpha(currentCharacter)) 1414 1525 ignoreCaseMask |= 32ULL << shiftAmount; 1415 1526 } … … 2257 2368 // to run the first alternative. (This progresses the input position). 2258 2369 op.m_jumps.append(jumpIfNoAvailableInput(alternative->m_minimumSize)); 2370 m_checkedOffset += alternative->m_minimumSize; 2371 2259 2372 // We will reenter after the check, and assume the input position to have been 2260 2373 // set as appropriate to this alternative. 2261 2374 op.m_reentry = label(); 2262 2375 2263 m_checkedOffset += alternative->m_minimumSize; 2376 // Emit fast skip path with stride if we have BoyerMooreInfo. 2377 if (op.m_bmInfo) { 2378 auto range = op.m_bmInfo->findWorthwhileCharacterSequenceForLookahead(); 2379 if (range) { 2380 auto [beginIndex, endIndex] = *range; 2381 ASSERT(endIndex <= alternative->m_minimumSize); 2382 2383 auto [map, isMaskEffective] = op.m_bmInfo->createCandidateBitmap(beginIndex, endIndex); 2384 unsigned mapCount = map.count(); 2385 // If candiate characters are <= 2, checking each is better than using vector. 2386 if (mapCount <= 2) { 2387 UChar32 character1 = map.findBit(0, true); 2388 ASSERT(character1 != BoyerMooreBitmap::Map::size()); 2389 UChar32 character2 = 0xff; 2390 if (mapCount == 2) { 2391 character2 = map.findBit(character1 + 1, true); 2392 ASSERT(character2 != BoyerMooreBitmap::Map::size()); 2393 } 2394 dataLogLnIf(Options::verboseRegExpCompilation(), "Found 1-or-2 characters lookahead character:(0x", hex(character1), "),character2:(", hex(character2), "),isMaskEffective:(", isMaskEffective,"),range:[", beginIndex, ", ", endIndex, ")"); 2395 2396 JumpList matched; 2397 auto loopHead = label(); 2398 readCharacter(m_checkedOffset - endIndex + 1, regT0); 2399 if (isMaskEffective) 2400 and32(TrustedImm32(BoyerMooreBitmap::mapMask), regT0, regT0); 2401 matched.append(branch32(Equal, regT0, TrustedImm32(character1))); 2402 if (mapCount == 2) 2403 matched.append(branch32(Equal, regT0, TrustedImm32(character2))); 2404 op.m_jumps.append(jumpIfNoAvailableInput(endIndex - beginIndex)); 2405 jump().linkTo(loopHead, this); 2406 matched.link(this); 2407 } else { 2408 const uint8_t* pointer = getBoyerMooreByteVector(op.m_bmInfo->index(), map); 2409 dataLogLnIf(Options::verboseRegExpCompilation(), "Found bitmap lookahead count:(", mapCount, "),range:[", beginIndex, ", ", endIndex, ")"); 2410 2411 move(TrustedImmPtr(pointer), regT1); 2412 auto loopHead = label(); 2413 readCharacter(m_checkedOffset - endIndex + 1, regT0); 2414 and32(TrustedImm32(BoyerMooreBitmap::mapMask), regT0, regT0); 2415 auto matched = branchTest32(NonZero, BaseIndex(regT1, regT0, TimesOne)); 2416 op.m_jumps.append(jumpIfNoAvailableInput(endIndex - beginIndex)); 2417 jump().linkTo(loopHead, this); 2418 matched.link(this); 2419 } 2420 2421 // If the pattern size is not fixed, then store the start index for use if we match. 2422 if (!m_pattern.m_body->m_hasFixedSize) { 2423 if (alternative->m_minimumSize) { 2424 move(index, regT0); 2425 sub32(Imm32(alternative->m_minimumSize), regT0); 2426 setMatchStart(regT0); 2427 } else 2428 setMatchStart(index); 2429 } 2430 } 2431 } 2264 2432 break; 2265 2433 } … … 2781 2949 } 2782 2950 YarrOp& endOp = m_ops[op.m_nextOp]; 2951 ASSERT(endOp.m_op == YarrOpCode::BodyAlternativeEnd); 2783 2952 2784 2953 YarrOp* beginOp = &op; … … 2798 2967 m_backtrackingState.linkTo(endOp.m_reentry, this); 2799 2968 else { 2800 if (m_pattern.sticky() && m_ops[op.m_nextOp].m_op == YarrOpCode::BodyAlternativeEnd) {2969 if (m_pattern.sticky()) { 2801 2970 // It is a sticky pattern and the last alternative failed, jump to the end. 2802 2971 m_backtrackingState.takeBacktracksToJumpList(lastStickyAlternativeFailures, this); … … 2814 2983 m_backtrackingState.link(this); 2815 2984 2816 // No need to advance and retry for a sticky pattern. 2817 if (!m_pattern.sticky()) {2818 // If the pattern size is not fixed, then store the start index for use if we match. 2819 if (!m_pattern.m_body->m_hasFixedSize) {2820 if (alternative->m_minimumSize == 1)2821 setMatchStart(index);2822 else {2823 move(index, regT0);2824 if (alternative->m_minimumSize)2825 sub32(Imm32(alternative->m_minimumSize - 1), regT0);2826 else2827 add32(TrustedImm32(1), regT0);2828 setMatchStart(regT0);2829 }2985 // No need to advance and retry for a sticky pattern. And it is already handled before this branch. 2986 ASSERT(!m_pattern.sticky()); 2987 2988 // If the pattern size is not fixed, then store the start index for use if we match. 2989 if (!m_pattern.m_body->m_hasFixedSize) { 2990 if (alternative->m_minimumSize == 1) 2991 setMatchStart(index); 2992 else { 2993 move(index, regT0); 2994 if (alternative->m_minimumSize) 2995 sub32(Imm32(alternative->m_minimumSize - 1), regT0); 2996 else 2997 add32(TrustedImm32(1), regT0); 2998 setMatchStart(regT0); 2830 2999 } 2831 2832 // Generate code to loop. Check whether the last alternative is longer than the 2833 // first (e.g. /a|xy/ or /a|xyz/).2834 if (alternative->m_minimumSize > beginOp->m_alternative->m_minimumSize) {2835 // We want to loop, and increment input position. If the delta is 1, it is2836 // already correctly incremented, if more than one then decrement as appropriate.2837 unsigned delta = alternative->m_minimumSize - beginOp->m_alternative->m_minimumSize;2838 ASSERT(delta);2839 if (delta != 1)2840 sub32(Imm32(delta - 1), index);2841 jump(beginOp->m_reentry);2842 } else {2843 // If the first alternative has minimum size 0xFFFFFFFFu, then there cannot2844 // be sufficent input available to handle this, so just fall through.2845 unsigned delta = beginOp->m_alternative->m_minimumSize - alternative->m_minimumSize;2846 if (delta != 0xFFFFFFFFu) {2847 // We need to check input because we are incrementing the input.2848 add32(Imm32(delta + 1), index);2849 checkInput().linkTo(beginOp->m_reentry, this);2850 }3000 } 3001 3002 // Generate code to loop. Check whether the last alternative is longer than the 3003 // first (e.g. /a|xy/ or /a|xyz/). 3004 if (alternative->m_minimumSize > beginOp->m_alternative->m_minimumSize) { 3005 // We want to loop, and increment input position. If the delta is 1, it is 3006 // already correctly incremented, if more than one then decrement as appropriate. 3007 unsigned delta = alternative->m_minimumSize - beginOp->m_alternative->m_minimumSize; 3008 ASSERT(delta); 3009 if (delta != 1) 3010 sub32(Imm32(delta - 1), index); 3011 jump(beginOp->m_reentry); 3012 } else { 3013 // If the first alternative has minimum size 0xFFFFFFFFu, then there cannot 3014 // be sufficent input available to handle this, so just fall through. 3015 unsigned delta = beginOp->m_alternative->m_minimumSize - alternative->m_minimumSize; 3016 if (delta != 0xFFFFFFFFu) { 3017 // We need to check input because we are incrementing the input. 3018 add32(Imm32(delta + 1), index); 3019 checkInput().linkTo(beginOp->m_reentry, this); 2851 3020 } 2852 3021 } … … 3644 3813 m_ops.append(YarrOp(YarrOpCode::BodyAlternativeBegin)); 3645 3814 m_ops.last().m_previousOp = notFound; 3815 // Collect BoyerMooreInfo if it is possible and profitable. BoyerMooreInfo will be used to emit fast skip path with large stride 3816 // at the beginning of the body alternatives. 3817 // We do not emit these fast path when RegExp has sticky or unicode flag. Sticky case does not need this since 3818 // it fails when the body alternatives fail to match with the current offset. 3819 // FIXME: Support unicode flag. 3820 // https://bugs.webkit.org/show_bug.cgi?id=228611 3821 if (disjunction->m_minimumSize && disjunction->m_hasFixedSize && !m_pattern.sticky() && !m_pattern.unicode()) { 3822 auto bmInfo = BoyerMooreInfo::create(std::min<unsigned>(disjunction->m_minimumSize, BoyerMooreInfo::maxLength)); 3823 if (collectBoyerMooreInfo(disjunction, currentAlternativeIndex, bmInfo.get())) { 3824 m_ops.last().m_bmInfo = bmInfo.ptr(); 3825 bmInfo->setIndex(m_bmInfos.size()); 3826 m_bmInfos.append(WTFMove(bmInfo)); 3827 } 3828 } 3829 3646 3830 do { 3647 3831 size_t lastOpIndex = m_ops.size() - 1; … … 3667 3851 lastOp.m_alternative = nullptr; 3668 3852 lastOp.m_nextOp = repeatLoop; 3853 } 3854 3855 bool collectBoyerMooreInfo(PatternDisjunction* disjunction, size_t currentAlternativeIndex, BoyerMooreInfo& bmInfo) 3856 { 3857 // If we have a searching pattern /abcdef/, then we can check the 6th character against a set of {a, b, c, d, e, f}. 3858 // If it does not match, we can shift 6 characters. We use this strategy since this way can be extended easily to support 3859 // disjunction, character-class, and ignore-cases. For example, in the case of /(?:abc|def)/, we can check 3rd character 3860 // against {a, b, c, d, e, f} and shift 3 characters if it does not match. 3861 // 3862 // Then, the best way to perform the above shifting is that finding the longest character sequence which does not have 3863 // many candidates. In the case of /[a-z]aaaaaaa[a-z]/, we can extract "aaaaaaa" sequence and check 8th character against {a}. 3864 // If it does not match, then we can shift 7 characters (length of "aaaaaaa"). This shifting is better than using "[a-z]aaaaaaa[a-z]" 3865 // sequence and {a-z} set since {a-z} set will almost always match. 3866 // 3867 // We first collect possible characters for each character position. Then, apply heuristics to extract good character sequence from 3868 // that and construct fast searching with long stride. 3869 3870 ASSERT(disjunction->m_hasFixedSize); // We only support fixed-sized lookahead for BoyerMoore search. 3871 ASSERT(disjunction->m_minimumSize); 3872 3873 // FIXME: Support nested disjunctions (e.g. /(?:abc|def|g(?:hi|jk))/). 3874 // https://bugs.webkit.org/show_bug.cgi?id=228614 3875 // FIXME: Support character-class (e.g. /[\d]test/). 3876 // https://bugs.webkit.org/show_bug.cgi?id=228613 3877 // FIXME: Support non-fixed-sized lookahead (e.g. /.*abc/ and extract "abc" sequence). 3878 // https://bugs.webkit.org/show_bug.cgi?id=228612 3879 auto& alternatives = disjunction->m_alternatives; 3880 for (; currentAlternativeIndex < alternatives.size(); ++currentAlternativeIndex) { 3881 unsigned cursor = 0; 3882 PatternAlternative* alternative = alternatives[currentAlternativeIndex].get(); 3883 for (unsigned index = 0; index < alternative->m_terms.size() && cursor < bmInfo.length(); ++index) { 3884 PatternTerm& term = alternative->m_terms[index]; 3885 switch (term.type) { 3886 case PatternTerm::Type::AssertionBOL: 3887 case PatternTerm::Type::AssertionEOL: 3888 case PatternTerm::Type::AssertionWordBoundary: 3889 case PatternTerm::Type::CharacterClass: 3890 case PatternTerm::Type::BackReference: 3891 case PatternTerm::Type::ForwardReference: 3892 case PatternTerm::Type::ParenthesesSubpattern: 3893 case PatternTerm::Type::ParentheticalAssertion: 3894 case PatternTerm::Type::DotStarEnclosure: 3895 return false; 3896 case PatternTerm::Type::PatternCharacter: { 3897 if (term.quantityType != QuantifierType::FixedCount || term.quantityMaxCount != 1) 3898 return false; 3899 if (term.inputPosition != index) 3900 return false; 3901 if (U16_LENGTH(term.patternCharacter) != 1 && m_decodeSurrogatePairs) 3902 return false; 3903 // For case-insesitive compares, non-ascii characters that have different 3904 // upper & lower case representations are already converted to a character class. 3905 ASSERT(!m_pattern.ignoreCase() || isASCIIAlpha(term.patternCharacter) || isCanonicallyUnique(term.patternCharacter, m_canonicalMode)); 3906 if (m_pattern.ignoreCase() && isASCIIAlpha(term.patternCharacter)) { 3907 bmInfo.set(cursor, toASCIIUpper(term.patternCharacter)); 3908 bmInfo.set(cursor, toASCIILower(term.patternCharacter)); 3909 } else 3910 bmInfo.set(cursor, term.patternCharacter); 3911 ++cursor; 3912 break; 3913 } 3914 } 3915 } 3916 } 3917 dataLogLnIf(YarrJITInternal::verbose, "Characters collected"); 3918 return true; 3919 } 3920 3921 const uint8_t* getBoyerMooreByteVector(unsigned index, const BoyerMooreBitmap::Map& map) 3922 { 3923 auto vector = makeUniqueRef<BoyerMooreByteVector>(map); 3924 if (const auto* existingPointer = m_codeBlock.tryReuseBoyerMooreByteVector(index, vector.get())) 3925 return existingPointer; 3926 const uint8_t* pointer = vector->data(); 3927 m_bmVector.append(WTFMove(vector)); 3928 return pointer; 3669 3929 } 3670 3930 … … 3953 4213 if (m_compileMode == JITCompileMode::MatchOnly) { 3954 4214 if (m_charSize == CharSize::Char8) 3955 codeBlock.set8BitCodeMatchOnly(FINALIZE_REGEXP_CODE(linkBuffer, YarrMatchOnly8BitPtrTag, "Match-only 8-bit regular expression") );4215 codeBlock.set8BitCodeMatchOnly(FINALIZE_REGEXP_CODE(linkBuffer, YarrMatchOnly8BitPtrTag, "Match-only 8-bit regular expression"), WTFMove(m_bmVector)); 3956 4216 else 3957 codeBlock.set16BitCodeMatchOnly(FINALIZE_REGEXP_CODE(linkBuffer, YarrMatchOnly16BitPtrTag, "Match-only 16-bit regular expression") );4217 codeBlock.set16BitCodeMatchOnly(FINALIZE_REGEXP_CODE(linkBuffer, YarrMatchOnly16BitPtrTag, "Match-only 16-bit regular expression"), WTFMove(m_bmVector)); 3958 4218 } else { 3959 4219 if (m_charSize == CharSize::Char8) 3960 codeBlock.set8BitCode(FINALIZE_REGEXP_CODE(linkBuffer, Yarr8BitPtrTag, "8-bit regular expression") );4220 codeBlock.set8BitCode(FINALIZE_REGEXP_CODE(linkBuffer, Yarr8BitPtrTag, "8-bit regular expression"), WTFMove(m_bmVector)); 3961 4221 else 3962 codeBlock.set16BitCode(FINALIZE_REGEXP_CODE(linkBuffer, Yarr16BitPtrTag, "16-bit regular expression") );4222 codeBlock.set16BitCode(FINALIZE_REGEXP_CODE(linkBuffer, Yarr16BitPtrTag, "16-bit regular expression"), WTFMove(m_bmVector)); 3963 4223 } 3964 4224 if (m_failureReason) … … 4198 4458 // The regular expression expressed as a linear sequence of operations. 4199 4459 Vector<YarrOp, 128> m_ops; 4460 Vector<UniqueRef<BoyerMooreInfo>, 4> m_bmInfos; 4461 Vector<UniqueRef<BoyerMooreByteVector>> m_bmVector; 4200 4462 4201 4463 // This records the current input offset being applied due to the current -
trunk/Source/JavaScriptCore/yarr/YarrJIT.h
r280285 r280452 1 1 /* 2 2 * Copyright (C) 2009-2021 Apple Inc. All rights reserved. 3 * Copyright (C) 2019 the V8 project authors. All rights reserved. 3 4 * 4 5 * Redistribution and use in source and binary forms, with or without … … 33 34 #include "Yarr.h" 34 35 #include "YarrPattern.h" 36 #include <wtf/Bitmap.h> 37 #include <wtf/FixedVector.h> 38 #include <wtf/UniqueRef.h> 35 39 36 40 #define YARR_CALL … … 57 61 }; 58 62 63 class BoyerMooreBitmap { 64 WTF_MAKE_FAST_ALLOCATED(BoyerMooreBitmap); 65 public: 66 static constexpr unsigned mapSize = 128; 67 static constexpr unsigned mapMask = 128 - 1; 68 using Map = Bitmap<mapSize>; 69 70 BoyerMooreBitmap() = default; 71 72 unsigned count() const { return m_count; } 73 const Map& map() const { return m_map; } 74 bool isMaskEffective() const { return m_isMaskEffective; } 75 76 void add(UChar32 character) 77 { 78 unsigned position = character & mapMask; 79 if (position != static_cast<unsigned>(character)) 80 m_isMaskEffective = true; 81 if (!m_map.get(position)) { 82 m_map.set(position); 83 ++m_count; 84 } 85 } 86 87 private: 88 Map m_map { }; 89 unsigned m_count { 0 }; 90 bool m_isMaskEffective { false }; 91 }; 92 93 class BoyerMooreByteVector : public std::array<uint8_t, BoyerMooreBitmap::mapSize> { 94 WTF_MAKE_FAST_ALLOCATED; 95 public: 96 BoyerMooreByteVector(const BoyerMooreBitmap::Map& map) 97 { 98 fill(0); 99 map.forEachSetBit([&](unsigned index) { 100 (*this)[index] = 1; 101 }); 102 } 103 }; 104 59 105 #if CPU(ARM64E) 60 106 extern "C" EncodedMatchResult vmEntryToYarrJIT(const void* input, unsigned start, unsigned length, int* output, MatchingContextHolder* matchingContext, const void* codePtr); … … 79 125 bool has8BitCode() { return m_ref8.size(); } 80 126 bool has16BitCode() { return m_ref16.size(); } 81 void set8BitCode(MacroAssemblerCodeRef<Yarr8BitPtrTag> ref) { m_ref8 = ref; } 82 void set16BitCode(MacroAssemblerCodeRef<Yarr16BitPtrTag> ref) { m_ref16 = ref; } 127 void set8BitCode(MacroAssemblerCodeRef<Yarr8BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>> maps) 128 { 129 m_ref8 = ref; 130 m_maps.reserveCapacity(m_maps.size() + maps.size()); 131 for (unsigned index = 0; index < maps.size(); ++index) 132 m_maps.uncheckedAppend(WTFMove(maps[index])); 133 } 134 void set16BitCode(MacroAssemblerCodeRef<Yarr16BitPtrTag> ref, Vector<UniqueRef<BoyerMooreByteVector>> maps) 135 { 136 m_ref16 = ref; 137 m_maps.reserveCapacity(m_maps.size() + maps.size()); 138 for (unsigned index = 0; index < maps.size(); ++index) 139 m_maps.uncheckedAppend(WTFMove(maps[index])); 140 } 83 141 84 142 bool has8BitCodeMatchOnly() { return m_matchOnly8.size(); } 85 143 bool has16BitCodeMatchOnly() { return m_matchOnly16.size(); } 86 void set8BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly8BitPtrTag> matchOnly) { m_matchOnly8 = matchOnly; } 87 void set16BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly16BitPtrTag> matchOnly) { m_matchOnly16 = matchOnly; } 144 void set8BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly8BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>> maps) 145 { 146 m_matchOnly8 = matchOnly; 147 m_maps.reserveCapacity(m_maps.size() + maps.size()); 148 for (unsigned index = 0; index < maps.size(); ++index) 149 m_maps.uncheckedAppend(WTFMove(maps[index])); 150 } 151 void set16BitCodeMatchOnly(MacroAssemblerCodeRef<YarrMatchOnly16BitPtrTag> matchOnly, Vector<UniqueRef<BoyerMooreByteVector>> maps) 152 { 153 m_matchOnly16 = matchOnly; 154 m_maps.reserveCapacity(m_maps.size() + maps.size()); 155 for (unsigned index = 0; index < maps.size(); ++index) 156 m_maps.uncheckedAppend(WTFMove(maps[index])); 157 } 88 158 89 159 bool usesPatternContextBuffer() { return m_usesPatternContextBuffer; } … … 171 241 } 172 242 173 void clear( )243 void clear(const AbstractLocker&) 174 244 { 175 245 m_ref8 = MacroAssemblerCodeRef<Yarr8BitPtrTag>(); … … 177 247 m_matchOnly8 = MacroAssemblerCodeRef<YarrMatchOnly8BitPtrTag>(); 178 248 m_matchOnly16 = MacroAssemblerCodeRef<YarrMatchOnly16BitPtrTag>(); 249 m_maps.clear(); 179 250 m_failureReason = std::nullopt; 251 } 252 253 const uint8_t* tryReuseBoyerMooreByteVector(unsigned index, BoyerMooreByteVector& vector) const 254 { 255 if (index < m_maps.size()) { 256 if (m_maps[index].get() == vector) 257 return m_maps[index]->data(); 258 } 259 return nullptr; 180 260 } 181 261 … … 185 265 MacroAssemblerCodeRef<YarrMatchOnly8BitPtrTag> m_matchOnly8; 186 266 MacroAssemblerCodeRef<YarrMatchOnly16BitPtrTag> m_matchOnly16; 267 Vector<UniqueRef<BoyerMooreByteVector>> m_maps; 187 268 bool m_usesPatternContextBuffer { false }; 188 269 std::optional<JITFailureReason> m_failureReason; -
trunk/Source/WTF/ChangeLog
r280451 r280452 1 2021-07-28 Yusuke Suzuki <ysuzuki@apple.com> 2 3 [JSC] Yarr should perform BoyerMoore search 4 https://bugs.webkit.org/show_bug.cgi?id=228301 5 6 Reviewed by Saam Barati. 7 8 * wtf/BitVector.cpp: 9 (WTF::BitVector::dump const): 10 * wtf/Bitmap.h: 11 (WTF::WordType>::dump const): 12 * wtf/UniqueRef.h: 13 (WTF::makeUniqueRefFromNonNullUniquePtr): 14 (WTF::UniqueRef::UniqueRef): 15 1 16 2021-07-29 Kate Cheney <katherine_cheney@apple.com> 2 17 -
trunk/Source/WTF/wtf/BitVector.cpp
r271921 r280452 263 263 void BitVector::dump(PrintStream& out) const 264 264 { 265 for (size_t i = 0; i < size(); ++i) { 266 if (get(i)) 267 out.printf("1"); 268 else 269 out.printf("-"); 270 } 265 for (size_t i = 0; i < size(); ++i) 266 out.print(get(i) ? "1" : "-"); 271 267 } 272 268 -
trunk/Source/WTF/wtf/Bitmap.h
r278878 r280452 24 24 #include <wtf/HashFunctions.h> 25 25 #include <wtf/MathExtras.h> 26 #include <wtf/PrintStream.h> 26 27 #include <wtf/StdIntExtras.h> 27 28 #include <string.h> … … 131 132 unsigned hash() const; 132 133 134 void dump(PrintStream& out) const; 135 133 136 private: 134 137 static constexpr unsigned wordSize = sizeof(WordType) * 8; … … 508 511 } 509 512 513 template<size_t bitmapSize, typename WordType> 514 inline void Bitmap<bitmapSize, WordType>::dump(PrintStream& out) const 515 { 516 for (size_t i = 0; i < size(); ++i) 517 out.print(get(i) ? "1" : "-"); 518 } 519 510 520 } // namespace WTF 511 521 -
trunk/Source/WTF/wtf/UniqueRef.h
r277571 r280452 47 47 48 48 template<typename T> 49 UniqueRef<T> makeUniqueRefFromNonNullUniquePtr(std::unique_ptr<T> ptr)49 UniqueRef<T> makeUniqueRefFromNonNullUniquePtr(std::unique_ptr<T>&& ptr) 50 50 { 51 return UniqueRef<T>( *ptr.release());51 return UniqueRef<T>(WTFMove(ptr)); 52 52 } 53 53 … … 68 68 } 69 69 70 T* ptr() RETURNS_NONNULL { ASSERT(m_ref); return m_ref.get(); } 71 T* ptr() const RETURNS_NONNULL { ASSERT(m_ref); return m_ref.get(); } 72 70 73 T& get() { ASSERT(m_ref); return *m_ref; } 71 74 const T& get() const { ASSERT(m_ref); return *m_ref; } … … 84 87 private: 85 88 template<class U, class... Args> friend UniqueRef<U> makeUniqueRefWithoutFastMallocCheck(Args&&...); 86 template<class U> friend UniqueRef<U> makeUniqueRefFromNonNullUniquePtr(std::unique_ptr<U> );89 template<class U> friend UniqueRef<U> makeUniqueRefFromNonNullUniquePtr(std::unique_ptr<U>&&); 87 90 template<class U> friend class UniqueRef; 91 92 explicit UniqueRef(std::unique_ptr<T>&& ptr) 93 : m_ref(WTFMove(ptr)) 94 { 95 ASSERT(m_ref); 96 } 88 97 89 98 std::unique_ptr<T> m_ref;
Note: See TracChangeset
for help on using the changeset viewer.