
Changeset 112143 in webkit

Timestamp:

Mar 26, 2012 1:13:39 PM

Author:

barraclough@apple.com

Message:

Greek sigma is handled wrong in case independent regexp.
https://bugs.webkit.org/show_bug.cgi?id=82063

Reviewed by Oliver Hunt.

Source/JavaScriptCore:

The bug here is that we assume that any given codepoint has at most one additional value it
should match under a case-insensitive match, and that the pair of codepoints that match (if
a codepoint does not only match itself) can be determined by calling toUpper/toLower on the
given codepoint. Life is not that simple.
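A minimal sketch of the failure, for illustration only: `miniToUpper` and `miniToLower` below are hypothetical stand-ins for Unicode::toUpper/toLower that know only the three Greek sigmas (U+03A3 Σ, U+03C3 σ, U+03C2 ς). The old assumption gives each character a single partner taken from {toLower(c), toUpper(c)}; the ES5.1 rule (15.10.2.8) instead treats two characters as equal when they canonicalize to the same upper-case value, which puts all three sigmas in one class.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>

// Hypothetical stand-ins for Unicode::toUpper/toLower that know only the sigmas.
constexpr char16_t CAPITAL_SIGMA = 0x03A3; // Σ
constexpr char16_t SMALL_SIGMA = 0x03C3;   // σ
constexpr char16_t FINAL_SIGMA = 0x03C2;   // ς

char16_t miniToUpper(char16_t c)
{
    return (c == SMALL_SIGMA || c == FINAL_SIGMA) ? CAPITAL_SIGMA : c;
}

char16_t miniToLower(char16_t c)
{
    return c == CAPITAL_SIGMA ? SMALL_SIGMA : c;
}

// Old assumption: c matches only itself and its toLower/toUpper forms.
std::set<char16_t> pairMatches(char16_t c)
{
    return { c, miniToLower(c), miniToUpper(c) };
}

// ES5.1-style rule: two characters match when they canonicalize (upper-case) to
// the same value. Candidates are limited to the three sigmas for this illustration.
std::set<char16_t> canonicalMatches(char16_t c)
{
    std::set<char16_t> result;
    for (char16_t candidate : { CAPITAL_SIGMA, SMALL_SIGMA, FINAL_SIGMA }) {
        if (miniToUpper(candidate) == miniToUpper(c))
            result.insert(candidate);
    }
    return result;
}
```

With these stand-ins, pairMatches(ς) yields {ς, Σ} and never σ, while canonicalMatches returns all three sigmas for any of them.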

Instead, pre-calculate a set of tables mapping from a UCS2 codepoint to the set of characters
it may match, under the ES5.1 case-insensitive matching rules. Since Unicode is fairly regular,
we can pack this table quite compactly, down to 364 entries. This means we can use a
simple binary search to find an entry, typically in eight comparisons.
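The lookup side of that design can be sketched as follows, under stated assumptions: the member names `begin`/`end`/`value`/`type` and the kind names are the ones used in the YarrPattern.cpp diff, but the `Range` layout, the `rangeFor` name and the table contents are hypothetical. Only the ASCII letters are filled in, and the final entry is a placeholder for the rest of the generated table. Because the ranges tile 0x0000-0xFFFF without gaps, the last entry whose `begin` is not greater than the code point is the one containing it. For 364 entries, a binary search needs at most floor(log2(364)) + 1 = 9 comparisons.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Kind names as they appear in the YarrPattern.cpp diff (subset).
enum CanonicalizeType { CanonicalizeUnique, CanonicalizeRangeLo, CanonicalizeRangeHi };

// Assumed layout; only the member names appear in the diff.
struct Range {
    uint16_t begin; // first code point covered (inclusive)
    uint16_t end;   // last code point covered (inclusive)
    uint16_t value; // kind-specific payload, e.g. distance to the other case
    CanonicalizeType type;
};

// Hypothetical table: sorted, gap-free, covering 0x0000-0xFFFF. Only the ASCII
// letter entries are meaningful; the last entry stands in for the rest.
const Range table[] = {
    { 0x0000, 0x0040, 0, CanonicalizeUnique },
    { 0x0041, 0x005A, 0x20, CanonicalizeRangeLo }, // 'A'..'Z' pair with 'a'..'z'
    { 0x005B, 0x0060, 0, CanonicalizeUnique },
    { 0x0061, 0x007A, 0x20, CanonicalizeRangeHi }, // 'a'..'z' pair with 'A'..'Z'
    { 0x007B, 0xFFFF, 0, CanonicalizeUnique },     // placeholder
};

// Binary search for the entry containing ch (the job rangeInfoFor does).
const Range* rangeFor(uint16_t ch)
{
    // First entry that starts after ch; the entry before it must contain ch.
    const Range* next = std::upper_bound(std::begin(table), std::end(table), ch,
        [](uint16_t value, const Range& r) { return value < r.begin; });
    assert(next != std::begin(table));
    const Range* found = next - 1;
    assert(ch >= found->begin && ch <= found->end);
    return found;
}
```

For example, rangeFor('Z') lands on the 'A'..'Z' entry, and rangeFor('[') on the unique entry that follows it.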

CMakeLists.txt:
GNUmakefile.list.am:
JavaScriptCore.gypi:
JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.vcproj:
JavaScriptCore.xcodeproj/project.pbxproj:
yarr/yarr.pri:
- Added new files to build systems.
yarr/YarrCanonicalizeUCS2.cpp: Added.
- New: autogenerated tables for UCS2 canonicalized comparison.
yarr/YarrCanonicalizeUCS2.h: Added.

(JSC::Yarr::rangeInfoFor):

Look up the canonicalization info for a UCS2 character.

(JSC::Yarr::getCanonicalPair):

For a UCS2 character with a single equivalent value, look it up.

(JSC::Yarr::isCanonicallyUnique):

Returns true if no other UCS2 code points are canonically equal.

(JSC::Yarr::areCanonicallyEquivalent):

Compare two values, under canonicalization rules.
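The following sketch shows how these helpers could behave for each range kind. The enum names come from the switch in the YarrPattern.cpp diff; the struct layout, the `isUnique`/`canonicalPair`/`equivalentTo` names and the one-entry `characterSets` table are hypothetical. The pair offsets are inferred from how the new putRange treats each kind: RangeLo and RangeHi add or subtract `value`, aligned alternating ranges pair (2k, 2k+1), and unaligned ones pair (2k+1, 2k+2).

```cpp
#include <cassert>
#include <cstdint>

enum CanonicalizeType {
    CanonicalizeUnique,
    CanonicalizeSet,
    CanonicalizeRangeLo,
    CanonicalizeRangeHi,
    CanonicalizeAlternatingAligned,
    CanonicalizeAlternatingUnaligned,
};

// Assumed layout (member names as used in the YarrPattern.cpp diff).
struct Range {
    uint16_t begin;
    uint16_t end;
    uint16_t value;
    CanonicalizeType type;
};

// Hypothetical zero-terminated set table in the style of characterSetInfo.
const uint16_t sigmaSet[] = { 0x03A3, 0x03C2, 0x03C3, 0 };
const uint16_t* const characterSets[] = { sigmaSet };

// Like isCanonicallyUnique: nothing else is equivalent to code points in this range.
bool isUnique(const Range& info)
{
    return info.type == CanonicalizeUnique;
}

// Like getCanonicalPair: the single other code point equivalent to ch.
uint16_t canonicalPair(const Range& info, uint16_t ch)
{
    assert(ch >= info.begin && ch <= info.end);
    switch (info.type) {
    case CanonicalizeRangeLo:
        return static_cast<uint16_t>(ch + info.value);
    case CanonicalizeRangeHi:
        return static_cast<uint16_t>(ch - info.value);
    case CanonicalizeAlternatingAligned:
        return static_cast<uint16_t>(ch ^ 1); // pairs (2k, 2k+1)
    case CanonicalizeAlternatingUnaligned:
        return static_cast<uint16_t>(((ch - 1) ^ 1) + 1); // pairs (2k+1, 2k+2)
    default:
        assert(false && "Unique and Set ranges have no single pair");
        return ch;
    }
}

// Like areCanonicallyEquivalent, given the range that contains a.
bool equivalentTo(const Range& infoForA, uint16_t a, uint16_t b)
{
    if (a == b)
        return true;
    switch (infoForA.type) {
    case CanonicalizeUnique:
        return false;
    case CanonicalizeSet:
        for (const uint16_t* set = characterSets[infoForA.value]; *set; ++set) {
            if (*set == b)
                return true;
        }
        return false;
    default:
        return canonicalPair(infoForA, a) == b;
    }
}
```

Here the caller passes in the range for `a` to keep the sketch self-contained; the real helpers' signatures may differ.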

yarr/YarrCanonicalizeUCS2.js: Added.
- Script used to generate YarrCanonicalizeUCS2.cpp.
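The actual generator is a JavaScript script. The C++ sketch below only illustrates the run-length idea behind the packing, under a simplifying assumption: each code point is described by a single signed delta to its partner (0 meaning it matches only itself), and consecutive code points with equal deltas collapse into one range. The `DeltaRange`/`packRuns` names are hypothetical, and the real script also has to recognize alternating pairs and multi-member sets, which this sketch does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct DeltaRange {
    int begin; // first code point (inclusive)
    int end;   // last code point (inclusive)
    int delta; // partner = code point + delta; 0 means "matches only itself"
};

// Collapse a per-code-point delta array (index = code point) into runs of
// consecutive code points that share the same delta.
std::vector<DeltaRange> packRuns(const std::vector<int>& deltas)
{
    std::vector<DeltaRange> ranges;
    for (std::size_t i = 0; i < deltas.size(); ++i) {
        int cp = static_cast<int>(i);
        if (!ranges.empty() && ranges.back().delta == deltas[i])
            ranges.back().end = cp; // extend the current run
        else
            ranges.push_back({ cp, cp, deltas[i] }); // start a new run
    }
    return ranges;
}
```

Fed the ASCII deltas (+32 for 'A'..'Z', -32 for 'a'..'z', 0 elsewhere), the 128 code points pack into five ranges.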
yarr/YarrInterpreter.cpp:

(JSC::Yarr::Interpreter::tryConsumeBackReference):

Use areCanonicallyEquivalent, rather than Unicode toUpper/toLower.

yarr/YarrJIT.cpp:

(JSC::Yarr::YarrGenerator::jumpIfCharNotEquals):
(JSC::Yarr::YarrGenerator::generatePatternCharacterOnce):
(JSC::Yarr::YarrGenerator::generatePatternCharacterFixed):

Use isCanonicallyUnique, rather than Unicode toUpper/toLower.

yarr/YarrPattern.cpp:

(JSC::Yarr::CharacterClassConstructor::putChar):

Updated to determine canonical equivalents correctly.

(JSC::Yarr::CharacterClassConstructor::putUnicodeIgnoreCase):

Added; used to put a non-ASCII, non-unique character into a case-insensitive match.

(JSC::Yarr::CharacterClassConstructor::putRange):

Updated to determine canonical equivalents correctly.

(JSC::Yarr::YarrPatternConstructor::atomPatternCharacter):

Changed to call putUnicodeIgnoreCase instead of putChar, to avoid a double lookup of rangeInfo.
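In the new putChar, ASCII is handled first, then case-sensitive classes, then characters whose range is Unique; only the remaining characters reach putUnicodeIgnoreCase. A sketch of that last step follows, assuming `addSorted` keeps the match list sorted and duplicate-free. The `Range` layout and the zero-terminated set table imitate `characterSetInfo` in the YarrPattern.cpp diff, but the names and contents are hypothetical, and only the Set, RangeLo and RangeHi kinds are handled (the real code covers the alternating kinds through getCanonicalPair).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Subset of the kind names used in the diff; layout and contents are assumed.
enum CanonicalizeType { CanonicalizeUnique, CanonicalizeSet, CanonicalizeRangeLo, CanonicalizeRangeHi };

struct Range {
    uint16_t begin;
    uint16_t end;
    uint16_t value; // set index for Set, case distance for RangeLo/RangeHi
    CanonicalizeType type;
};

// Hypothetical zero-terminated set table in the style of characterSetInfo.
const uint16_t sigmaSet[] = { 0x03A3, 0x03C2, 0x03C3, 0 };
const uint16_t* const characterSets[] = { sigmaSet };

// Insert ch into a sorted vector unless it is already present.
void addSorted(std::vector<uint16_t>& matches, uint16_t ch)
{
    auto it = std::lower_bound(matches.begin(), matches.end(), ch);
    if (it == matches.end() || *it != ch)
        matches.insert(it, ch);
}

// Add ch and everything canonically equivalent to it (non-ASCII, non-unique only).
void putUnicodeIgnoreCase(std::vector<uint16_t>& matches, uint16_t ch, const Range& info)
{
    assert(ch > 0x7f && ch >= info.begin && ch <= info.end);
    assert(info.type != CanonicalizeUnique);
    if (info.type == CanonicalizeSet) {
        // Every member of the set, ch included, goes into the class.
        for (const uint16_t* set = characterSets[info.value]; *set; ++set)
            addSorted(matches, *set);
        return;
    }
    // Exactly one partner, above ch (RangeLo) or below it (RangeHi).
    uint16_t pair = static_cast<uint16_t>(info.type == CanonicalizeRangeLo ? ch + info.value : ch - info.value);
    addSorted(matches, ch);
    addSorted(matches, pair);
}
```

Putting σ into an empty class this way yields all three sigmas in sorted order, which is what a case-insensitive /σ/i needs.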

LayoutTests:

fast/regex/script-tests/unicodeCaseInsensitive.js: Added.

(shouldBeTrue.ucs2CodePoint):

fast/regex/unicodeCaseInsensitive-expected.txt: Added.
fast/regex/unicodeCaseInsensitive.html: Added.
- Added test cases for case-insensitive matches of non-ASCII characters.

Location:

trunk

Files:

6 added
11 edited

LayoutTests/ChangeLog (modified) (1 diff)
LayoutTests/fast/regex/script-tests/unicodeCaseInsensitive.js (added)
LayoutTests/fast/regex/unicodeCaseInsensitive-expected.txt (added)
LayoutTests/fast/regex/unicodeCaseInsensitive.html (added)
Source/JavaScriptCore/CMakeLists.txt (modified) (1 diff)
Source/JavaScriptCore/ChangeLog (modified) (1 diff)
Source/JavaScriptCore/GNUmakefile.list.am (modified) (1 diff)
Source/JavaScriptCore/JavaScriptCore.gypi (modified) (2 diffs)
Source/JavaScriptCore/JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.vcproj (modified) (1 diff)
Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj (modified) (5 diffs)
Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp (added)
Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.h (added)
Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.js (added)
Source/JavaScriptCore/yarr/YarrInterpreter.cpp (modified) (2 diffs)
Source/JavaScriptCore/yarr/YarrJIT.cpp (modified) (5 diffs)
Source/JavaScriptCore/yarr/YarrPattern.cpp (modified) (5 diffs)
Source/JavaScriptCore/yarr/yarr.pri (modified) (1 diff)


trunk/LayoutTests/ChangeLog

r112137	r112143
+-03-25  Gavin Barraclough  <barraclough@apple.com>
+        Greek sigma is handled wrong in case independent regexp.
+        https://bugs.webkit.org/show_bug.cgi?id=82063
+        Reviewed by Oliver Hunt.
+        * fast/regex/script-tests/unicodeCaseInsensitive.js: Added.
+        (shouldBeTrue.ucs2CodePoint):
+        * fast/regex/unicodeCaseInsensitive-expected.txt: Added.
+        * fast/regex/unicodeCaseInsensitive.html: Added.
+            - Added test cases for case-insensitive matches of non-ascii characters.
 -03-26  Emil A Eklund  <eae@chromium.org>

trunk/Source/JavaScriptCore/CMakeLists.txt

r111974	r112143
220	220	tools/CodeProfiling.cpp
221	221
	222	yarr/YarrCanonicalizeUCS2.cpp
222	223	yarr/YarrPattern.cpp
223	224	yarr/YarrInterpreter.cpp

trunk/Source/JavaScriptCore/ChangeLog

r112123	r112143
+-03-25  Gavin Barraclough  <barraclough@apple.com>
+        Greek sigma is handled wrong in case independent regexp.
+        https://bugs.webkit.org/show_bug.cgi?id=82063
+        Reviewed by Oliver Hunt.
+        The bug here is that we assume that any given codepoint has at most one additional value it
+        should match under a case insensitive match, and that the pair of codepoints that match (if
+        a codepoint does not only match itself) can be determined by calling toUpper/toLower on the
+        given codepoint). Life is not that simple.
+        Instead, pre-calculate a set of tables mapping from a UCS2 codepoint to the set of characters
+        it may match, under the ES5.1 case-insensitive matching rules. Since unicode is fairly regular
+        we can pack this table quite nicely, and get it down to 364 entries. This means we can use a
+        simple binary search to find an entry in typically eight compares.
+        * CMakeLists.txt:
+        * GNUmakefile.list.am:
+        * JavaScriptCore.gypi:
+        * JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.vcproj:
+        * JavaScriptCore.xcodeproj/project.pbxproj:
+        * yarr/yarr.pri:
+            - Added new files to build systems.
+        * yarr/YarrCanonicalizeUCS2.cpp: Added.
+            - New - autogenerated, UCS2 canonicalized comparison tables.
+        * yarr/YarrCanonicalizeUCS2.h: Added.
+        (JSC::Yarr::rangeInfoFor):
+            - Look up the canonicalization info for a UCS2 character.
+        (JSC::Yarr::getCanonicalPair):
+            - For a UCS2 character with a single equivalent value, look it up.
+        (JSC::Yarr::isCanonicallyUnique):
+            - Returns true if no other UCS2 code points are canonically equal.
+        (JSC::Yarr::areCanonicallyEquivalent):
+            - Compare two values, under canonicalization rules.
+        * yarr/YarrCanonicalizeUCS2.js: Added.
+            - script used to generate YarrCanonicalizeUCS2.cpp.
+        * yarr/YarrInterpreter.cpp:
+        (JSC::Yarr::Interpreter::tryConsumeBackReference):
+            - Use isCanonicallyUnique, rather than Unicode toUpper/toLower.
+        * yarr/YarrJIT.cpp:
+        (JSC::Yarr::YarrGenerator::jumpIfCharNotEquals):
+        (JSC::Yarr::YarrGenerator::generatePatternCharacterOnce):
+        (JSC::Yarr::YarrGenerator::generatePatternCharacterFixed):
+            - Use isCanonicallyUnique, rather than Unicode toUpper/toLower.
+        * yarr/YarrPattern.cpp:
+        (JSC::Yarr::CharacterClassConstructor::putChar):
+            - Updated to determine canonical equivalents correctly.
+        (JSC::Yarr::CharacterClassConstructor::putUnicodeIgnoreCase):
+            - Added, used to put a non-ascii, non-unique character in a case-insensitive match.
+        (JSC::Yarr::CharacterClassConstructor::putRange):
+            - Updated to determine canonical equivalents correctly.
+        (JSC::Yarr::YarrPatternConstructor::atomPatternCharacter):
+            - Changed to call putUnicodeIgnoreCase, instead of putChar, avoid a double lookup of rangeInfo.
 -03-26  Kevin Ollivier  <kevino@theolliviers.com>

trunk/Source/JavaScriptCore/GNUmakefile.list.am

r112082	r112143
579	579	Source/JavaScriptCore/tools/TieredMMapArray.h \
580	580	Source/JavaScriptCore/yarr/Yarr.h \
	581	Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp \
	582	Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.h \
581	583	Source/JavaScriptCore/yarr/YarrInterpreter.cpp \
582	584	Source/JavaScriptCore/yarr/YarrInterpreter.h \

trunk/Source/JavaScriptCore/JavaScriptCore.gypi

r111889	r112143
             'runtime/WriteBarrier.h',
             'yarr/Yarr.h',
+            'yarr/YarrCanonicalizeUCS2.h',
             'yarr/YarrInterpreter.h',
             'yarr/YarrPattern.h',
 …
             'runtime/UString.cpp',
             'runtime/UStringConcatenate.h',
+            'yarr/YarrCanonicalizeUCS2.cpp',
             'yarr/YarrInterpreter.cpp',
             'yarr/YarrJIT.cpp',

trunk/Source/JavaScriptCore/JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.vcproj

r111889	r112143
                         </File>
                         <File
+                                RelativePath="..\..\yarr\YarrCanonicalizeUCS2.cpp"
+                                >
+                        </File>
+                        <File
+                                RelativePath="..\..\yarr\YarrCanonicalizeUCS2.h"
+                                >
+                        </File>
+                        <File
                                 RelativePath="..\..\yarr\YarrInterpreter.cpp"
                                 >

trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj

r112040	r112143
 C510151C06A90046D4EF /* RegExpCachedResult.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 86F75EFB151C062F007C9BA3 /* RegExpCachedResult.cpp */; };
 C512151C083D0046D4EF /* RegExpMatchesArray.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 86F75EFD151C062F007C9BA3 /* RegExpMatchesArray.cpp */; };
+C547151FE26B0046D4EF /* YarrCanonicalizeUCS2.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 8642C544151FE26B0046D4EF /* YarrCanonicalizeUCS2.cpp */; };
+C548151FE26B0046D4EF /* YarrCanonicalizeUCS2.h in Headers */ = {isa = PBXBuildFile; fileRef = 8642C545151FE26B0046D4EF /* YarrCanonicalizeUCS2.h */; };
 A30F1135007E100CDB49E /* JSValueInlineMethods.h in Headers */ = {isa = PBXBuildFile; fileRef = 865A30F0135007E100CDB49E /* JSValueInlineMethods.h */; settings = {ATTRIBUTES = (Private, ); }; };
 F408810E7D56300947361 /* APIShims.h in Headers */ = {isa = PBXBuildFile; fileRef = 865F408710E7D56300947361 /* APIShims.h */; settings = {ATTRIBUTES = (Private, ); }; };
 …
 F503143CE1C100B295F5 /* JSGlobalThis.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JSGlobalThis.h; sourceTree = "<group>"; };
 B23DF0FC60E6200703AA4 /* MacroAssemblerCodeRef.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = MacroAssemblerCodeRef.h; sourceTree = "<group>"; };
+C544151FE26B0046D4EF /* YarrCanonicalizeUCS2.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = YarrCanonicalizeUCS2.cpp; path = yarr/YarrCanonicalizeUCS2.cpp; sourceTree = "<group>"; };
+C545151FE26B0046D4EF /* YarrCanonicalizeUCS2.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = YarrCanonicalizeUCS2.h; path = yarr/YarrCanonicalizeUCS2.h; sourceTree = "<group>"; };
+C546151FE26B0046D4EF /* YarrCanonicalizeUCS2.js */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.javascript; name = YarrCanonicalizeUCS2.js; path = yarr/YarrCanonicalizeUCS2.js; sourceTree = "<group>"; };
 A30F0135007E100CDB49E /* JSValueInlineMethods.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JSValueInlineMethods.h; sourceTree = "<group>"; };
 F408710E7D56300947361 /* APIShims.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = APIShims.h; sourceTree = "<group>"; };
 …
                         children = (
 B812DC994500EF7AC4 /* Yarr.h */,
+C544151FE26B0046D4EF /* YarrCanonicalizeUCS2.cpp */,
+C545151FE26B0046D4EF /* YarrCanonicalizeUCS2.h */,
+C546151FE26B0046D4EF /* YarrCanonicalizeUCS2.js */,
 B7D12DBA33700A9FE7B /* YarrInterpreter.cpp */,
 B7E12DBA33700A9FE7B /* YarrInterpreter.h */,
 …
 FA581BB150E953000B9A2D9 /* DFGNodeFlags.h in Headers */,
 FA581BC150E953000B9A2D9 /* DFGNodeType.h in Headers */,
+<<<<<<< .mine
+C548151FE26B0046D4EF /* YarrCanonicalizeUCS2.h in Headers */,
+=======
 F2BDC16151C5D4F00CD8910 /* DFGFixupPhase.h in Headers */,
 F2BDC21151E803B00CD8910 /* DFGInsertionSet.h in Headers */,
 F2BDC2C151FDE9100CD8910 /* Operands.h in Headers */,
+>>>>>>> .r112137
                         );
                         runOnlyForDeploymentPostprocessing = 0;
 …
 C510151C06A90046D4EF /* RegExpCachedResult.cpp in Sources */,
 C512151C083D0046D4EF /* RegExpMatchesArray.cpp in Sources */,
+C547151FE26B0046D4EF /* YarrCanonicalizeUCS2.cpp in Sources */,
                         );
                         runOnlyForDeploymentPostprocessing = 0;

trunk/Source/JavaScriptCore/yarr/YarrInterpreter.cpp

r108858	r112143
 #include "UString.h"
 #include "Yarr.h"
+#include "YarrCanonicalizeUCS2.h"
 #include <wtf/BumpPointerAllocator.h>
 #include <wtf/DataLog.h>
 …
         if (pattern->m_ignoreCase) {
             for (unsigned i = 0; i < matchSize; ++i) {
-                int ch = input.reread(matchBegin + i);
-                int lo = Unicode::toLower(ch);
-                int hi = Unicode::toUpper(ch);
-                if ((lo != hi) ? (!checkCasedCharacter(lo, hi, negativeInputOffset + matchSize - i)) : (!checkCharacter(ch, negativeInputOffset + matchSize - i))) {
-                    input.uncheckInput(matchSize);
-                    return false;
-                }
+                int oldCh = input.reread(matchBegin + i);
+                int ch = input.readChecked(negativeInputOffset + matchSize - i);
+                if (oldCh == ch)
+                    continue;
+                // The definition for canonicalize (see ES 5.1, 15.10.2.8) means that
+                // unicode values are never allowed to match against ascii ones.
+                if (isASCII(oldCh) || isASCII(ch)) {
+                    if (toASCIIUpper(oldCh) == toASCIIUpper(ch))
+                        continue;
+                } else if (areCanonicallyEquivalent(oldCh, ch))
+                    continue;
+                input.uncheckInput(matchSize);
+                return false;
             }
         } else {
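A self-contained sketch of the per-character rule in the new loop. `isASCII`, `toASCIIUpper` and `areCanonicallyEquivalent` here are local stand-ins for the WTF/Yarr helpers, backed by a hypothetical mini case table (ASCII letters, the Greek sigmas and U+017F LATIN SMALL LETTER LONG S). `canonicalize` follows the ES5.1 15.10.2.8 steps except the check for multi-character upper-case results, which the mini table never produces.

```cpp
#include <cassert>

// Local stand-ins for the WTF helpers used in the diff.
bool isASCII(char16_t c) { return c < 0x80; }

char16_t toASCIIUpper(char16_t c)
{
    return (c >= u'a' && c <= u'z') ? static_cast<char16_t>(c - 0x20) : c;
}

// Hypothetical mini case table: ASCII letters, the Greek sigmas and U+017F.
char16_t miniToUpper(char16_t c)
{
    if (c == 0x03C2 || c == 0x03C3) // final and medial small sigma
        return 0x03A3;              // capital sigma
    if (c == 0x017F)                // LATIN SMALL LETTER LONG S
        return u'S';
    return toASCIIUpper(c);
}

// ES5.1 15.10.2.8 Canonicalize (multi-character results cannot occur here):
// a non-ASCII character never canonicalizes to an ASCII one.
char16_t canonicalize(char16_t c)
{
    char16_t upper = miniToUpper(c);
    if (!isASCII(c) && isASCII(upper))
        return c;
    return upper;
}

bool areCanonicallyEquivalent(char16_t a, char16_t b)
{
    return canonicalize(a) == canonicalize(b);
}

// Mirrors the per-character test in the new tryConsumeBackReference loop.
bool backReferenceCharMatches(char16_t oldCh, char16_t ch)
{
    if (oldCh == ch)
        return true;
    if (isASCII(oldCh) || isASCII(ch))
        return toASCIIUpper(oldCh) == toASCIIUpper(ch);
    return areCanonicallyEquivalent(oldCh, ch);
}
```

Under this rule, ς and σ are interchangeable in a back reference, while ſ does not match s even though its upper case is 'S', because Canonicalize refuses to map a non-ASCII character to an ASCII one.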

trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp

r110033	r112143
 #include "LinkBuffer.h"
 #include "Yarr.h"
+#include "YarrCanonicalizeUCS2.h"
 #if ENABLE(YARR_JIT)
 …
         // For case-insesitive compares, non-ascii characters that have different
         // upper & lower case representations are converted to a character class.
-        ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(ch) || (Unicode::toLower(ch) == Unicode::toUpper(ch)));
+        ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
         if (m_pattern.m_ignoreCase && isASCIIAlpha(ch)) {
-            or32(TrustedImm32(32), character);
-            ch = Unicode::toLower(ch);
+            or32(TrustedImm32(0x20), character);
+            ch |= 0x20;
         }
 …
         // For case-insesitive compares, non-ascii characters that have different
         // upper & lower case representations are converted to a character class.
-        ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(ch) || (Unicode::toLower(ch) == Unicode::toUpper(ch)));
-        if ((m_pattern.m_ignoreCase) && (isASCIIAlpha(ch)))
+        ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
+        if (m_pattern.m_ignoreCase && isASCIIAlpha(ch))
             ignoreCaseMask |= 32;
 …
             // For case-insesitive compares, non-ascii characters that have different
             // upper & lower case representations are converted to a character class.
-            ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(currentCharacter) || (Unicode::toLower(currentCharacter) == Unicode::toUpper(currentCharacter)));
+            ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(currentCharacter) || isCanonicallyUnique(currentCharacter));
             allCharacters |= (currentCharacter << shiftAmount);
 …
         // For case-insesitive compares, non-ascii characters that have different
         // upper & lower case representations are converted to a character class.
-        ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(ch) || (Unicode::toLower(ch) == Unicode::toUpper(ch)));
+        ASSERT(!m_pattern.m_ignoreCase || isASCIIAlpha(ch) || isCanonicallyUnique(ch));
         if (m_pattern.m_ignoreCase && isASCIIAlpha(ch)) {
-            or32(TrustedImm32(32), character);
-            ch = Unicode::toLower(ch);
+            or32(TrustedImm32(0x20), character);
+            ch |= 0x20;
         }
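The `| 0x20` form works because ASCII upper- and lower-case letters differ only in bit 0x20, so setting that bit on both sides folds case before a single compare. The sketch below shows that, plus a hypothetical `packPair` helper illustrating how a per-character ignore-case mask lets two UTF-16 units be compared in one 32-bit operation. It assumes the first character occupies the low 16 bits (as with a little-endian load) and sets mask bits only under ASCII letters, since OR-ing 0x20 into a digit would conflate it with another character (for example '1', 0x31, and 0x11). `isASCIIAlpha` here is a local stand-in.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in: true only for 'A'..'Z' and 'a'..'z'.
bool isASCIIAlpha(uint16_t c)
{
    return (c | 0x20) >= 'a' && (c | 0x20) <= 'z';
}

// Single character: OR 0x20 into both sides, as the JIT does for ASCII letters.
bool equalsIgnoringASCIICase(uint16_t input, uint16_t patternChar)
{
    assert(isASCIIAlpha(patternChar));
    return (input | 0x20) == (patternChar | 0x20);
}

// Hypothetical helper: two pattern characters packed into 32 bits, first one in
// the low half, with 0x20 mask bits only under the ASCII letters.
struct PackedCompare {
    uint32_t value;
    uint32_t mask;
};

PackedCompare packPair(uint16_t first, uint16_t second, bool ignoreCase)
{
    PackedCompare packed { 0, 0 };
    const uint16_t chars[2] = { first, second };
    for (unsigned i = 0; i < 2; ++i) {
        uint32_t ch = chars[i];
        unsigned shift = 16 * i;
        if (ignoreCase && isASCIIAlpha(chars[i])) {
            packed.mask |= 0x20u << shift;
            ch |= 0x20; // store the lower-case form
        }
        packed.value |= ch << shift;
    }
    return packed;
}

// One 32-bit compare covers both input characters.
bool matchesPacked(const PackedCompare& packed, uint16_t in0, uint16_t in1)
{
    uint32_t input = static_cast<uint32_t>(in0) | (static_cast<uint32_t>(in1) << 16);
    return (input | packed.mask) == packed.value;
}
```

With the pattern "a1" under ignore-case, both "a1" and "A1" match, while an input whose second unit is 0x11 does not, because no mask bit is set under the digit.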

trunk/Source/JavaScriptCore/yarr/YarrPattern.cpp

r106748	r112143
 #include "Yarr.h"
+#include "YarrCanonicalizeUCS2.h"
 #include "YarrParser.h"
 #include <wtf/Vector.h>
 …
     void putChar(UChar ch)
     {
+        // Handle ascii cases.
         if (ch <= 0x7f) {
             if (m_isCaseInsensitive && isASCIIAlpha(ch)) {
 …
             } else
                 addSorted(m_matches, ch);
+            return;
+        }
+        // Simple case, not a case-insensitive match.
+        if (!m_isCaseInsensitive) {
+            addSorted(m_matchesUnicode, ch);
+            return;
+        }
+        // Add multiple matches, if necessary.
+        UCS2CanonicalizationRange* info = rangeInfoFor(ch);
+        if (info->type == CanonicalizeUnique)
+            addSorted(m_matchesUnicode, ch);
+        else
+            putUnicodeIgnoreCase(ch, info);
+    }
+    void putUnicodeIgnoreCase(UChar ch, UCS2CanonicalizationRange* info)
+    {
+        ASSERT(m_isCaseInsensitive);
+        ASSERT(ch > 0x7f);
+        ASSERT(ch >= info->begin && ch <= info->end);
+        ASSERT(info->type != CanonicalizeUnique);
+        if (info->type == CanonicalizeSet) {
+            for (uint16_t* set = characterSetInfo[info->value]; (ch = *set); ++set)
+                addSorted(m_matchesUnicode, ch);
         } else {
-            UChar upper, lower;
-            if (m_isCaseInsensitive && ((upper = Unicode::toUpper(ch)) != (lower = Unicode::toLower(ch)))) {
-                addSorted(m_matchesUnicode, upper);
-                addSorted(m_matchesUnicode, lower);
-            } else
-                addSorted(m_matchesUnicode, ch);
-        }
-    }
-    // returns true if this character has another case, and 'ch' is the upper case form.
-    static inline bool isUnicodeUpper(UChar ch)
-    {
-        return ch != Unicode::toLower(ch);
-    }
-    // returns true if this character has another case, and 'ch' is the lower case form.
-    static inline bool isUnicodeLower(UChar ch)
-    {
-        return ch != Unicode::toUpper(ch);
+            addSorted(m_matchesUnicode, ch);
+            addSorted(m_matchesUnicode, getCanonicalPair(info, ch));
+        }
     }
 …
             }
         }
-        if (hi >= 0x80) {
-            uint32_t unicodeCurr = std::max(lo, (UChar)0x80);
-            addSortedRange(m_rangesUnicode, unicodeCurr, hi);
-            if (m_isCaseInsensitive) {
-                while (unicodeCurr <= hi) {
-                    // If the upper bound of the range (hi) is 0xffff, the increments to
-                    // unicodeCurr in this loop may take it to 0x10000.  This is fine
-                    // (if so we won't re-enter the loop, since the loop condition above
-                    // will definitely fail) - but this does mean we cannot use a UChar
-                    // to represent unicodeCurr, we must use a 32-bit value instead.
-                    ASSERT(unicodeCurr <= 0xffff);
-                    if (isUnicodeUpper(unicodeCurr)) {
-                        UChar lowerCaseRangeBegin = Unicode::toLower(unicodeCurr);
-                        UChar lowerCaseRangeEnd = lowerCaseRangeBegin;
-                        while ((++unicodeCurr <= hi) && isUnicodeUpper(unicodeCurr) && (Unicode::toLower(unicodeCurr) == (lowerCaseRangeEnd + 1)))
-                            lowerCaseRangeEnd++;
-                        addSortedRange(m_rangesUnicode, lowerCaseRangeBegin, lowerCaseRangeEnd);
-                    } else if (isUnicodeLower(unicodeCurr)) {
-                        UChar upperCaseRangeBegin = Unicode::toUpper(unicodeCurr);
-                        UChar upperCaseRangeEnd = upperCaseRangeBegin;
-                        while ((++unicodeCurr <= hi) && isUnicodeLower(unicodeCurr) && (Unicode::toUpper(unicodeCurr) == (upperCaseRangeEnd + 1)))
-                            upperCaseRangeEnd++;
-                        addSortedRange(m_rangesUnicode, upperCaseRangeBegin, upperCaseRangeEnd);
-                    } else
-                        ++unicodeCurr;
-                }
-            }
-        }
+        if (hi <= 0x7f)
+            return;
+        lo = std::max(lo, (UChar)0x80);
+        addSortedRange(m_rangesUnicode, lo, hi);
+        if (!m_isCaseInsensitive)
+            return;
+        UCS2CanonicalizationRange* info = rangeInfoFor(lo);
+        while (true) {
+            // Handle the range [lo .. end]
+            UChar end = std::min(info->end, hi);
+            switch (info->type) {
+            case CanonicalizeUnique:
+                // Nothing to do - no canonical equivalents.
+                break;
+            case CanonicalizeSet: {
+                UChar ch;
+                for (uint16_t* set = characterSetInfo[info->value]; (ch = *set); ++set)
+                    addSorted(m_matchesUnicode, ch);
+                break;
+            }
+            case CanonicalizeRangeLo:
+                addSortedRange(m_rangesUnicode, lo + info->value, end + info->value);
+                break;
+            case CanonicalizeRangeHi:
+                addSortedRange(m_rangesUnicode, lo - info->value, end - info->value);
+                break;
+            case CanonicalizeAlternatingAligned:
+                // Use addSortedRange since there is likely an abutting range to combine with.
+                if (lo & 1)
+                    addSortedRange(m_rangesUnicode, lo - 1, lo - 1);
+                if (!(end & 1))
+                    addSortedRange(m_rangesUnicode, end + 1, end + 1);
+                break;
+            case CanonicalizeAlternatingUnaligned:
+                // Use addSortedRange since there is likely an abutting range to combine with.
+                if (!(lo & 1))
+                    addSortedRange(m_rangesUnicode, lo - 1, lo - 1);
+                if (end & 1)
+                    addSortedRange(m_rangesUnicode, end + 1, end + 1);
+                break;
+            }
+            if (hi == end)
+                return;
+            ++info;
+            lo = info->begin;
+        };
     }
 …
         // We handle case-insensitive checking of unicode characters which do have both
         // cases by handling them as if they were defined using a CharacterClass.
-        if (m_pattern.m_ignoreCase && !isASCII(ch) && (Unicode::toUpper(ch) != Unicode::toLower(ch))) {
-            atomCharacterClassBegin();
-            atomCharacterClassAtom(ch);
-            atomCharacterClassEnd();
-        } else
+        if (!m_pattern.m_ignoreCase || isASCII(ch)) {
             m_alternative->m_terms.append(PatternTerm(ch));
+            return;
+        }
+        UCS2CanonicalizationRange* info = rangeInfoFor(ch);
+        if (info->type == CanonicalizeUnique) {
+            m_alternative->m_terms.append(PatternTerm(ch));
+            return;
+        }
+        m_characterClassConstructor.putUnicodeIgnoreCase(ch, info);
+        CharacterClass* newCharacterClass = m_characterClassConstructor.charClass();
+        m_pattern.m_userCharacterClasses.append(newCharacterClass);
+        m_alternative->m_terms.append(PatternTerm(newCharacterClass, false));
     }
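The per-entry step of the new putRange can be isolated as below. This is a sketch: `equivalentRanges`, `CodeRange` and the struct layout are hypothetical, Set entries are left out, and a caller would still walk successive table entries the way the diff's `while (true)` loop does. For the alternating kinds, only the endpoints of [lo, end] can have partners outside it; every interior code point's partner is already inside the range.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Kind names as in the diff; Set entries are left out of this sketch.
enum CanonicalizeType {
    CanonicalizeUnique,
    CanonicalizeRangeLo,
    CanonicalizeRangeHi,
    CanonicalizeAlternatingAligned,
    CanonicalizeAlternatingUnaligned,
};

struct Range {
    uint16_t begin;
    uint16_t end;
    uint16_t value;
    CanonicalizeType type;
};

using CodeRange = std::pair<uint32_t, uint32_t>; // inclusive [first, second]

// Extra ranges a case-insensitive class needs when it contains [lo, end],
// where [lo, end] lies inside the single table entry `info`.
std::vector<CodeRange> equivalentRanges(const Range& info, uint32_t lo, uint32_t end)
{
    assert(info.begin <= lo && lo <= end && end <= info.end);
    std::vector<CodeRange> extra;
    switch (info.type) {
    case CanonicalizeUnique:
        break; // nothing else to add
    case CanonicalizeRangeLo:
        extra.push_back({ lo + info.value, end + info.value });
        break;
    case CanonicalizeRangeHi:
        extra.push_back({ lo - info.value, end - info.value });
        break;
    case CanonicalizeAlternatingAligned:
        // Pairs are (2k, 2k+1): an odd lo or an even end has its partner outside.
        if (lo & 1)
            extra.push_back({ lo - 1, lo - 1 });
        if (!(end & 1))
            extra.push_back({ end + 1, end + 1 });
        break;
    case CanonicalizeAlternatingUnaligned:
        // Pairs are (2k+1, 2k+2): an even lo or an odd end has its partner outside.
        if (!(lo & 1))
            extra.push_back({ lo - 1, lo - 1 });
        if (end & 1)
            extra.push_back({ end + 1, end + 1 });
        break;
    }
    return extra;
}
```

For instance, a class containing U+0391..U+0395 under a RangeLo entry with value 0x20 also gets U+03B1..U+03B5.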

trunk/Source/JavaScriptCore/yarr/yarr.pri

r102237	r112143
     $$PWD/YarrInterpreter.cpp \
     $$PWD/YarrPattern.cpp \
-    $$PWD/YarrSyntaxChecker.cpp
+    $$PWD/YarrSyntaxChecker.cpp \
+    $$PWD/YarrCanonicalizeUCS2.cpp
 # For UString.h
