Changeset 255134 in webkit
- Timestamp:
- Jan 26, 2020 3:28:44 PM (4 years ago)
- Location:
- trunk
- Files:
-
- 6 edited
Legend:
- Unmodified
- Added
- Removed
-
trunk/JSTests/ChangeLog
r255112 r255134 1 2020-01-26 Alexey Shvayka <shvaikalesh@gmail.com> 2 3 Invalid ranges in character classes should be banned in unicode patterns 4 https://bugs.webkit.org/show_bug.cgi?id=206768 5 6 Reviewed by Darin Adler. 7 8 * test262/expectations.yaml: Mark 18 test cases as passing. 9 1 10 2020-01-24 Mark Lam <mark.lam@apple.com> 2 11 -
trunk/JSTests/test262/expectations.yaml
r254757 r255134 1241 1241 default: 'SyntaxError: Invalid regular expression: invalid group specifier name' 1242 1242 strict mode: 'SyntaxError: Invalid regular expression: invalid group specifier name' 1243 test/built-ins/RegExp/property-escapes/character-class-range-end.js:1244 default: 'Test262: This statement should not be evaluated.'1245 strict mode: 'Test262: This statement should not be evaluated.'1246 test/built-ins/RegExp/property-escapes/character-class-range-no-dash-end.js:1247 default: 'Test262: This statement should not be evaluated.'1248 strict mode: 'Test262: This statement should not be evaluated.'1249 test/built-ins/RegExp/property-escapes/character-class-range-no-dash-start.js:1250 default: 'Test262: This statement should not be evaluated.'1251 strict mode: 'Test262: This statement should not be evaluated.'1252 test/built-ins/RegExp/property-escapes/character-class-range-start.js:1253 default: 'Test262: This statement should not be evaluated.'1254 strict mode: 'Test262: This statement should not be evaluated.'1255 1243 test/built-ins/RegExp/property-escapes/generated/Alphabetic.js: 1256 1244 default: 'Test262Error: `\p{Alphabetic}` should match U+001CFA (`ᳺ`)' … … 1610 1598 default: 'Test262Error: RegExp("]", "u"): Expected a SyntaxError to be thrown but no exception was thrown at all' 1611 1599 strict mode: 'Test262Error: RegExp("]", "u"): Expected a SyntaxError to be thrown but no exception was thrown at all' 1612 test/built-ins/RegExp/unicode_restricted_character_class_escape.js:1613 default: 'Test262Error: RegExp("[\d-a]", "u"): Expected a SyntaxError to be thrown but no exception was thrown at all'1614 strict mode: 'Test262Error: RegExp("[\d-a]", "u"): Expected a SyntaxError to be thrown but no exception was thrown at all'1615 1600 test/built-ins/RegExp/unicode_restricted_identity_escape.js: 1616 1601 default: "Test262Error: Invalid IdentityEscape in AtomEscape: '\\" … … 3424 3409 default: 'Test262: This statement should not be evaluated.' 3425 3410 strict mode: 'Test262: This statement should not be evaluated.' 3426 test/language/literals/regexp/u-invalid-non-empty-class-ranges-no-dash-a.js:3427 default: 'Test262: This statement should not be evaluated.'3428 strict mode: 'Test262: This statement should not be evaluated.'3429 test/language/literals/regexp/u-invalid-non-empty-class-ranges-no-dash-ab.js:3430 default: 'Test262: This statement should not be evaluated.'3431 strict mode: 'Test262: This statement should not be evaluated.'3432 test/language/literals/regexp/u-invalid-non-empty-class-ranges-no-dash-b.js:3433 default: 'Test262: This statement should not be evaluated.'3434 strict mode: 'Test262: This statement should not be evaluated.'3435 test/language/literals/regexp/u-invalid-non-empty-class-ranges.js:3436 default: 'Test262: This statement should not be evaluated.'3437 strict mode: 'Test262: This statement should not be evaluated.'3438 3411 test/language/literals/regexp/u-invalid-oob-decimal-escape.js: 3439 3412 default: 'Test262: This statement should not be evaluated.' -
trunk/Source/JavaScriptCore/ChangeLog
r255120 r255134 1 2020-01-26 Alexey Shvayka <shvaikalesh@gmail.com> 2 3 Invalid ranges in character classes should be banned in unicode patterns 4 https://bugs.webkit.org/show_bug.cgi?id=206768 5 6 Reviewed by Darin Adler. 7 8 In ES5, grammar of CharacterRange was ambiguous, resulting in invalid ranges 9 like /[\d-a]/ being allowed. As of ES2015, invalid ranges are SyntaxError in 10 unicode patterns, yet still allowed in regular ones to avoid breaking the web. 11 (https://tc39.es/ecma262/#sec-patterns-static-semantics-early-errors-annexb) 12 13 This change adds SyntaxError for unicode patterns and updates explanatory 14 comments. ErrorCode::CharacterClassOutOfOrder is renamed for consistency 15 with newly added error code and ErrorCode::ParenthesesTypeInvalid. 16 17 * yarr/YarrErrorCode.cpp: 18 (JSC::Yarr::errorMessage): 19 (JSC::Yarr::errorToThrow): 20 * yarr/YarrErrorCode.h: 21 * yarr/YarrParser.h: 22 (JSC::Yarr::Parser::CharacterClassParserDelegate::CharacterClassParserDelegate): 23 (JSC::Yarr::Parser::CharacterClassParserDelegate::atomPatternCharacter): 24 (JSC::Yarr::Parser::CharacterClassParserDelegate::atomBuiltInCharacterClass): 25 (JSC::Yarr::Parser::parseCharacterClass): 26 1 27 2020-01-24 Mark Lam <mark.lam@apple.com> 2 28 -
trunk/Source/JavaScriptCore/yarr/YarrErrorCode.cpp
r251425 r255134 36 36 // The order of this array must match the ErrorCode enum. 37 37 static const char* errorMessages[] = { 38 nullptr, // NoError 39 REGEXP_ERROR_PREFIX "regular expression too large", // PatternTooLarge 40 REGEXP_ERROR_PREFIX "numbers out of order in {} quantifier", // QuantifierOutOfOrder 41 REGEXP_ERROR_PREFIX "nothing to repeat", // QuantifierWithoutAtom 42 REGEXP_ERROR_PREFIX "number too large in {} quantifier", // QuantifierTooLarge 43 REGEXP_ERROR_PREFIX "missing )", // MissingParentheses 44 REGEXP_ERROR_PREFIX "unmatched parentheses", // ParenthesesUnmatched 45 REGEXP_ERROR_PREFIX "unrecognized character after (?", // ParenthesesTypeInvalid 46 REGEXP_ERROR_PREFIX "invalid group specifier name", // InvalidGroupName 47 REGEXP_ERROR_PREFIX "duplicate group specifier name", // DuplicateGroupName 48 REGEXP_ERROR_PREFIX "missing terminating ] for character class", // CharacterClassUnmatched 49 REGEXP_ERROR_PREFIX "range out of order in character class", // CharacterClassOutOfOrder 50 REGEXP_ERROR_PREFIX "\\ at end of pattern", // EscapeUnterminated 51 REGEXP_ERROR_PREFIX "invalid unicode {} escape", // InvalidUnicodeEscape 52 REGEXP_ERROR_PREFIX "invalid backreference for unicode pattern", // InvalidBackreference 53 REGEXP_ERROR_PREFIX "invalid escaped character for unicode pattern", // InvalidIdentityEscape 54 REGEXP_ERROR_PREFIX "invalid property expression", // InvalidUnicodePropertyExpression 55 REGEXP_ERROR_PREFIX "too many nested disjunctions", // TooManyDisjunctions 56 REGEXP_ERROR_PREFIX "pattern exceeds string length limits", // OffsetTooLarge 57 REGEXP_ERROR_PREFIX "invalid flags" // InvalidRegularExpressionFlags 38 nullptr, // NoError 39 REGEXP_ERROR_PREFIX "regular expression too large", // PatternTooLarge 40 REGEXP_ERROR_PREFIX "numbers out of order in {} quantifier", // QuantifierOutOfOrder 41 REGEXP_ERROR_PREFIX "nothing to repeat", // QuantifierWithoutAtom 42 REGEXP_ERROR_PREFIX "number too large in {} quantifier", // QuantifierTooLarge 43 REGEXP_ERROR_PREFIX "missing )", // MissingParentheses 44 REGEXP_ERROR_PREFIX "unmatched parentheses", // ParenthesesUnmatched 45 REGEXP_ERROR_PREFIX "unrecognized character after (?", // ParenthesesTypeInvalid 46 REGEXP_ERROR_PREFIX "invalid group specifier name", // InvalidGroupName 47 REGEXP_ERROR_PREFIX "duplicate group specifier name", // DuplicateGroupName 48 REGEXP_ERROR_PREFIX "missing terminating ] for character class", // CharacterClassUnmatched 49 REGEXP_ERROR_PREFIX "range out of order in character class", // CharacterClassRangeOutOfOrder 50 REGEXP_ERROR_PREFIX "invalid range in character class for unicode pattern", // CharacterClassRangeInvalid 51 REGEXP_ERROR_PREFIX "\\ at end of pattern", // EscapeUnterminated 52 REGEXP_ERROR_PREFIX "invalid unicode {} escape", // InvalidUnicodeEscape 53 REGEXP_ERROR_PREFIX "invalid backreference for unicode pattern", // InvalidBackreference 54 REGEXP_ERROR_PREFIX "invalid escaped character for unicode pattern", // InvalidIdentityEscape 55 REGEXP_ERROR_PREFIX "invalid property expression", // InvalidUnicodePropertyExpression 56 REGEXP_ERROR_PREFIX "too many nested disjunctions", // TooManyDisjunctions 57 REGEXP_ERROR_PREFIX "pattern exceeds string length limits", // OffsetTooLarge 58 REGEXP_ERROR_PREFIX "invalid flags" // InvalidRegularExpressionFlags 58 59 }; 59 60 … … 77 78 case ErrorCode::DuplicateGroupName: 78 79 case ErrorCode::CharacterClassUnmatched: 79 case ErrorCode::CharacterClassOutOfOrder: 80 case ErrorCode::CharacterClassRangeOutOfOrder: 81 case ErrorCode::CharacterClassRangeInvalid: 80 82 case ErrorCode::EscapeUnterminated: 81 83 case ErrorCode::InvalidUnicodeEscape: -
trunk/Source/JavaScriptCore/yarr/YarrErrorCode.h
r251425 r255134 46 46 DuplicateGroupName, 47 47 CharacterClassUnmatched, 48 CharacterClassOutOfOrder, 48 CharacterClassRangeOutOfOrder, 49 CharacterClassRangeInvalid, 49 50 EscapeUnterminated, 50 51 InvalidUnicodeEscape, -
trunk/Source/JavaScriptCore/yarr/YarrParser.h
r250005 r255134 55 55 class CharacterClassParserDelegate { 56 56 public: 57 CharacterClassParserDelegate(Delegate& delegate, ErrorCode& err )57 CharacterClassParserDelegate(Delegate& delegate, ErrorCode& err, bool isUnicode) 58 58 : m_delegate(delegate) 59 59 , m_errorCode(err) 60 , m_isUnicode(isUnicode) 60 61 , m_state(Empty) 61 62 , m_character(0) … … 86 87 switch (m_state) { 87 88 case AfterCharacterClass: 88 // Following a built in character class we need look out for a hyphen.89 // Following a built-in character class we need look out for a hyphen. 89 90 // We're looking for invalid ranges, such as /[\d-x]/ or /[\d-\d]/. 90 // If we see a hyphen following a chara ter class then unlike usual91 // If we see a hyphen following a character class then unlike usual 91 92 // we'll report it to the delegate immediately, and put ourself into 92 // a poisoned state. Any following calls to add another character or 93 // character class will result in an error. (A hypen following a 94 // character-class is itself valid, but only at the end of a regex). 93 // a poisoned state. In a unicode pattern, any following calls to add 94 // another character or character class will result in syntax error. 95 // A hypen following a character class is itself valid, but only at 96 // the end of a regex. 95 97 if (hyphenIsRange && ch == '-') { 96 98 m_delegate.atomCharacterClassAtom('-'); … … 117 119 case CachedCharacterHyphen: 118 120 if (ch < m_character) { 119 m_errorCode = ErrorCode::CharacterClass OutOfOrder;121 m_errorCode = ErrorCode::CharacterClassRangeOutOfOrder; 120 122 return; 121 123 } … … 124 126 return; 125 127 126 // See coment in atomBuiltInCharacterClass below. 127 // This too is technically an error, per ECMA-262, and again we 128 // we chose to allow this. Note a subtlely here that while we 129 // diverge from the spec's definition of CharacterRange we do 130 // remain in compliance with the grammar. For example, consider 131 // the expression /[\d-a-z]/. We comply with the grammar in 132 // this case by not allowing a-z to be matched as a range. 128 // If we hit this case, we have an invalid range like /[\d-a]/. 129 // See coment in atomBuiltInCharacterClass() below. 133 130 case AfterCharacterClassHyphen: 131 if (m_isUnicode) { 132 m_errorCode = ErrorCode::CharacterClassRangeInvalid; 133 return; 134 } 134 135 m_delegate.atomCharacterClassAtom(ch); 135 136 m_state = Empty; … … 152 153 case Empty: 153 154 case AfterCharacterClass: 155 m_delegate.atomCharacterClassBuiltIn(classID, invert); 154 156 m_state = AfterCharacterClass; 155 m_delegate.atomCharacterClassBuiltIn(classID, invert);156 157 return; 157 158 158 159 // If we hit either of these cases, we have an invalid range that 159 // looks something like /[ x-\d]/ or /[\d-\d]/.160 // According to ECMA-262 this should be a syntax error, but161 // empirical testing shows this to break teh webz. Instead we162 // comply with to the ECMA-262 grammar, and assume the grammar to163 // have matched the range correctly, but tweak our interpretation164 // of CharacterRange. Effectively we implicitly handle the hyphen165 // as if it were escaped, e.g. /[\w-_]/ is treated as /[\w\-_]/.160 // looks something like /[a-\d]/ or /[\d-\d]/. 161 // Since ES2015, this should be syntax error in a unicode pattern, 162 // yet gracefully handled in a regular regex to avoid breaking the web. 163 // Effectively we handle the hyphen as if it was (implicitly) escaped, 164 // e.g. /[\d-a-z]/ is treated as /[\d\-a\-z]/. 165 // See usages of CharacterRangeOrUnion abstract op in 166 // https://tc39.es/ecma262/#sec-regular-expression-patterns-semantics 166 167 case CachedCharacterHyphen: 167 168 m_delegate.atomCharacterClassAtom(m_character); … … 169 170 FALLTHROUGH; 170 171 case AfterCharacterClassHyphen: 172 if (m_isUnicode) { 173 m_errorCode = ErrorCode::CharacterClassRangeInvalid; 174 return; 175 } 171 176 m_delegate.atomCharacterClassBuiltIn(classID, invert); 172 177 m_state = Empty; … … 202 207 Delegate& m_delegate; 203 208 ErrorCode& m_errorCode; 209 bool m_isUnicode; 204 210 enum CharacterClassConstructionState { 205 211 Empty, … … 596 602 consume(); 597 603 598 CharacterClassParserDelegate characterClassConstructor(m_delegate, m_errorCode );604 CharacterClassParserDelegate characterClassConstructor(m_delegate, m_errorCode, m_isUnicode); 599 605 600 606 characterClassConstructor.begin(tryConsume('^'));
Note: See TracChangeset
for help on using the changeset viewer.