Changeset 259026 in webkit


Ignore:
Timestamp:
Mar 25, 2020 6:24:05 PM (4 years ago)
Author:
Alexey Shvayka
Message:

Invalid numeric and named references should be early syntax errors
https://bugs.webkit.org/show_bug.cgi?id=178175

Reviewed by Ross Kirsling.

JSTests:

  • test262/expectations.yaml: Mark 44 test cases as passing.

Source/JavaScriptCore:

This patch:

  1. Fixes named reference parsing in parseEscape(), making /\k/u throw SyntaxError per spec [1].
  1. Reworks containsIllegalNamedForwardReferences(), making dangling (e.g. /\k<a>(?<b>.)/) and incomplete (e.g. /\k<(?<a>.)/) named references throw SyntaxError if the non-Unicode pattern contains a named group [2].
  1. Moves reparsing logic from YarrPattern to YarrParser, ensuring syntax errors due to illegal references (named & numeric) are thrown at parse time; drops isValidNamedForwardReference() from Delegate, refactors saveUnmatchedNamedForwardReferences(), and overall improves cohesion of illegal references logic.

[1]: https://tc39.es/ecma262/#prod-IdentityEscape
[2]: https://tc39.es/ecma262/#sec-regexpinitialize (step 7.b)

  • yarr/YarrErrorCode.cpp:

(JSC::Yarr::errorMessage):
(JSC::Yarr::errorToThrow):

  • yarr/YarrErrorCode.h:
  • yarr/YarrParser.h:

(JSC::Yarr::Parser::CharacterClassParserDelegate::atomNamedBackReference):
(JSC::Yarr::Parser::Parser):
(JSC::Yarr::Parser::parseEscape):
(JSC::Yarr::Parser::parseParenthesesBegin):
(JSC::Yarr::Parser::parse):
(JSC::Yarr::Parser::handleIllegalReferences):
(JSC::Yarr::Parser::containsIllegalNamedForwardReference):
(JSC::Yarr::Parser::resetForReparsing):
(JSC::Yarr::parse):
(JSC::Yarr::Parser::CharacterClassParserDelegate::isValidNamedForwardReference): Deleted.

  • yarr/YarrPattern.cpp:

(JSC::Yarr::YarrPatternConstructor::atomBackReference):
(JSC::Yarr::YarrPatternConstructor::atomNamedForwardReference):
(JSC::Yarr::YarrPattern::compile):
(JSC::Yarr::YarrPatternConstructor::saveUnmatchedNamedForwardReferences): Deleted.
(JSC::Yarr::YarrPatternConstructor::isValidNamedForwardReference): Deleted.

  • yarr/YarrPattern.h:

(JSC::Yarr::YarrPattern::resetForReparsing):
(JSC::Yarr::YarrPattern::containsIllegalBackReference): Deleted.
(JSC::Yarr::YarrPattern::containsIllegalNamedForwardReferences): Deleted.

  • yarr/YarrSyntaxChecker.cpp:

(JSC::Yarr::SyntaxChecker::atomNamedBackReference):
(JSC::Yarr::SyntaxChecker::resetForReparsing):
(JSC::Yarr::SyntaxChecker::isValidNamedForwardReference): Deleted.

Source/WebCore:

Accounts for changes of YarrParser's Delegate interface, no behavioral changes.
resetForReparsing() is never called because we disable numeric backrefences
and named forward references (see arguments of Yarr::parse() call).

Test: TestWebKitAPI.ContentExtensionTest.ParsingFailures

  • contentextensions/URLFilterParser.cpp:

(WebCore::ContentExtensions::PatternParser::resetForReparsing):
(WebCore::ContentExtensions::URLFilterParser::addPattern):
(WebCore::ContentExtensions::PatternParser::isValidNamedForwardReference): Deleted.

Tools:

Removes FIXME as YarrParser is correct not to throw errors as it is
parsing in non-Unicode mode. Also adds a few named groups tests.

  • TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:

LayoutTests:

  • js/regexp-named-capture-groups-expected.txt:
  • js/script-tests/regexp-named-capture-groups.js:
Location:
trunk
Files:
16 edited

Legend:

Unmodified
Added
Removed
  • trunk/JSTests/ChangeLog

    r258976 r259026  
     12020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
     2
     3        Invalid numeric and named references should be early syntax errors
     4        https://bugs.webkit.org/show_bug.cgi?id=178175
     5
     6        Reviewed by Ross Kirsling.
     7
     8        * test262/expectations.yaml: Mark 44 test cases as passing.
     9
    1102020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
    211
  • trunk/JSTests/test262/expectations.yaml

    r258976 r259026  
    17121712  default: 'SyntaxError: Invalid regular expression: number too large in {} quantifier'
    17131713  strict mode: 'SyntaxError: Invalid regular expression: number too large in {} quantifier'
    1714 test/built-ins/RegExp/unicode_restricted_identity_escape_alpha.js:
    1715   default: "Test262Error: IdentityEscape in AtomEscape: 'k' Expected a SyntaxError to be thrown but no exception was thrown at all"
    1716   strict mode: "Test262Error: IdentityEscape in AtomEscape: 'k' Expected a SyntaxError to be thrown but no exception was thrown at all"
    17171714test/built-ins/RegExp/unicode_restricted_octal_escape.js:
    17181715  default: 'Test262Error: RegExp("[\1]", "u"):  Expected a SyntaxError to be thrown but no exception was thrown at all'
     
    32653262  default: "SyntaxError: Invalid character '\\u08be'"
    32663263  strict mode: "SyntaxError: Invalid character '\\u08be'"
    3267 test/language/literals/regexp/named-groups/invalid-dangling-groupname-2-u.js:
    3268   default: 'Test262: This statement should not be evaluated.'
    3269   strict mode: 'Test262: This statement should not be evaluated.'
    3270 test/language/literals/regexp/named-groups/invalid-dangling-groupname-2.js:
    3271   default: 'Test262: This statement should not be evaluated.'
    3272   strict mode: 'Test262: This statement should not be evaluated.'
    3273 test/language/literals/regexp/named-groups/invalid-dangling-groupname-3-u.js:
    3274   default: 'Test262: This statement should not be evaluated.'
    3275   strict mode: 'Test262: This statement should not be evaluated.'
    3276 test/language/literals/regexp/named-groups/invalid-dangling-groupname-3.js:
    3277   default: 'Test262: This statement should not be evaluated.'
    3278   strict mode: 'Test262: This statement should not be evaluated.'
    3279 test/language/literals/regexp/named-groups/invalid-dangling-groupname-4-u.js:
    3280   default: 'Test262: This statement should not be evaluated.'
    3281   strict mode: 'Test262: This statement should not be evaluated.'
    3282 test/language/literals/regexp/named-groups/invalid-dangling-groupname-4.js:
    3283   default: 'Test262: This statement should not be evaluated.'
    3284   strict mode: 'Test262: This statement should not be evaluated.'
    3285 test/language/literals/regexp/named-groups/invalid-dangling-groupname-5.js:
    3286   default: 'Test262: This statement should not be evaluated.'
    3287   strict mode: 'Test262: This statement should not be evaluated.'
    3288 test/language/literals/regexp/named-groups/invalid-dangling-groupname-u.js:
    3289   default: 'Test262: This statement should not be evaluated.'
    3290   strict mode: 'Test262: This statement should not be evaluated.'
    3291 test/language/literals/regexp/named-groups/invalid-dangling-groupname-without-group-u.js:
    3292   default: 'Test262: This statement should not be evaluated.'
    3293   strict mode: 'Test262: This statement should not be evaluated.'
    3294 test/language/literals/regexp/named-groups/invalid-dangling-groupname.js:
    3295   default: 'Test262: This statement should not be evaluated.'
    3296   strict mode: 'Test262: This statement should not be evaluated.'
    3297 test/language/literals/regexp/named-groups/invalid-incomplete-groupname-2.js:
    3298   default: 'Test262: This statement should not be evaluated.'
    3299   strict mode: 'Test262: This statement should not be evaluated.'
    3300 test/language/literals/regexp/named-groups/invalid-incomplete-groupname-3.js:
    3301   default: 'Test262: This statement should not be evaluated.'
    3302   strict mode: 'Test262: This statement should not be evaluated.'
    3303 test/language/literals/regexp/named-groups/invalid-incomplete-groupname-4.js:
    3304   default: 'Test262: This statement should not be evaluated.'
    3305   strict mode: 'Test262: This statement should not be evaluated.'
    3306 test/language/literals/regexp/named-groups/invalid-incomplete-groupname-5.js:
    3307   default: 'Test262: This statement should not be evaluated.'
    3308   strict mode: 'Test262: This statement should not be evaluated.'
    3309 test/language/literals/regexp/named-groups/invalid-incomplete-groupname-6.js:
    3310   default: 'Test262: This statement should not be evaluated.'
    3311   strict mode: 'Test262: This statement should not be evaluated.'
    3312 test/language/literals/regexp/named-groups/invalid-incomplete-groupname-u.js:
    3313   default: 'Test262: This statement should not be evaluated.'
    3314   strict mode: 'Test262: This statement should not be evaluated.'
    3315 test/language/literals/regexp/named-groups/invalid-incomplete-groupname-without-group-3-u.js:
    3316   default: 'Test262: This statement should not be evaluated.'
    3317   strict mode: 'Test262: This statement should not be evaluated.'
    3318 test/language/literals/regexp/named-groups/invalid-incomplete-groupname.js:
    3319   default: 'Test262: This statement should not be evaluated.'
    3320   strict mode: 'Test262: This statement should not be evaluated.'
    33213264test/language/literals/regexp/u-astral-char-class-invert.js:
    33223265  default: 'Test262Error: Expected SameValue(«�», «null») to be true'
    33233266  strict mode: 'Test262Error: Expected SameValue(«�», «null») to be true'
    3324 test/language/literals/regexp/u-dec-esc.js:
    3325   default: 'Test262: This statement should not be evaluated.'
    3326   strict mode: 'Test262: This statement should not be evaluated.'
    3327 test/language/literals/regexp/u-invalid-legacy-octal-escape.js:
    3328   default: 'Test262: This statement should not be evaluated.'
    3329   strict mode: 'Test262: This statement should not be evaluated.'
    3330 test/language/literals/regexp/u-invalid-oob-decimal-escape.js:
    3331   default: 'Test262: This statement should not be evaluated.'
    3332   strict mode: 'Test262: This statement should not be evaluated.'
    33333267test/language/module-code/eval-rqstd-once.js:
    33343268  module: "SyntaxError: Unexpected identifier 'as'. Expected 'from' before exported module name."
  • trunk/LayoutTests/ChangeLog

    r259024 r259026  
     12020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
     2
     3        Invalid numeric and named references should be early syntax errors
     4        https://bugs.webkit.org/show_bug.cgi?id=178175
     5
     6        Reviewed by Ross Kirsling.
     7
     8        * js/regexp-named-capture-groups-expected.txt:
     9        * js/script-tests/regexp-named-capture-groups.js:
     10
    1112020-03-25  Pinki Gyanchandani  <pgyanchandani@apple.com>
    212
  • trunk/LayoutTests/js/regexp-named-capture-groups-expected.txt

    r255452 r259026  
    6767PASS "1122332211".match(/\k<ones>\k<twos>\k<threes>(?<ones>1*)(?<twos>2*)(?<threes>3*)\k<threes>\k<twos>\k<ones>/u) is ["1122332211", "11", "22", "3"]
    6868PASS "\k<z>XzzX".match(/\k<z>X(z*)X/) is ["k<z>XzzX", "zz"]
    69 PASS "\k<z>XzzX".match(/\k<z>X(z*)X/u) threw exception SyntaxError: Invalid regular expression: invalid backreference for Unicode pattern.
     69PASS "\k<z>XzzX".match(/\k<z>X(z*)X/u) threw exception SyntaxError: Invalid regular expression: invalid \k<> named backreference.
     70PASS /\k<xxx(?<a>y)(/ threw exception SyntaxError: Invalid regular expression: invalid \k<> named backreference.
    7071PASS successfullyParsed is true
    7172
  • trunk/LayoutTests/js/script-tests/regexp-named-capture-groups.js

    r255452 r259026  
    112112
    113113// Check that a named forward reference for a non-existent named capture
    114 // matches for non-unicode patterns and throws for unicode patterns.
     114// matches for non-Unicode patterns w/o a named group and throws for patterns with a named group or Unicode flag.
    115115shouldBe('"\\\k<z>XzzX".match(/\\\k<z>X(z*)X/)', '["k<z>XzzX", "zz"]');
    116 shouldThrow('"\\\k<z>XzzX".match(/\\\k<z>X(z*)X/u)', '"SyntaxError: Invalid regular expression: invalid backreference for Unicode pattern"');
     116shouldThrow('"\\\k<z>XzzX".match(/\\\k<z>X(z*)X/u)', '"SyntaxError: Invalid regular expression: invalid \\\\k<> named backreference"');
     117shouldThrow('/\\\k<xxx(?<a>y)(/', '"SyntaxError: Invalid regular expression: invalid \\\\k<> named backreference"');
  • trunk/Source/JavaScriptCore/ChangeLog

    r259021 r259026  
     12020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
     2
     3        Invalid numeric and named references should be early syntax errors
     4        https://bugs.webkit.org/show_bug.cgi?id=178175
     5
     6        Reviewed by Ross Kirsling.
     7
     8        This patch:
     9
     10        1. Fixes named reference parsing in parseEscape(), making /\k/u throw SyntaxError per spec [1].
     11
     12        2. Reworks containsIllegalNamedForwardReferences(), making dangling (e.g. /\k<a>(?<b>.)/) and
     13           incomplete (e.g. /\k<(?<a>.)/) named references throw SyntaxError if the non-Unicode pattern
     14           contains a named group [2].
     15
     16        3. Moves reparsing logic from YarrPattern to YarrParser, ensuring syntax errors due to illegal
     17           references (named & numeric) are thrown at parse time; drops isValidNamedForwardReference()
     18           from Delegate, refactors saveUnmatchedNamedForwardReferences(), and overall improves cohesion
     19           of illegal references logic.
     20
     21        [1]: https://tc39.es/ecma262/#prod-IdentityEscape
     22        [2]: https://tc39.es/ecma262/#sec-regexpinitialize (step 7.b)
     23
     24        * yarr/YarrErrorCode.cpp:
     25        (JSC::Yarr::errorMessage):
     26        (JSC::Yarr::errorToThrow):
     27        * yarr/YarrErrorCode.h:
     28        * yarr/YarrParser.h:
     29        (JSC::Yarr::Parser::CharacterClassParserDelegate::atomNamedBackReference):
     30        (JSC::Yarr::Parser::Parser):
     31        (JSC::Yarr::Parser::parseEscape):
     32        (JSC::Yarr::Parser::parseParenthesesBegin):
     33        (JSC::Yarr::Parser::parse):
     34        (JSC::Yarr::Parser::handleIllegalReferences):
     35        (JSC::Yarr::Parser::containsIllegalNamedForwardReference):
     36        (JSC::Yarr::Parser::resetForReparsing):
     37        (JSC::Yarr::parse):
     38        (JSC::Yarr::Parser::CharacterClassParserDelegate::isValidNamedForwardReference): Deleted.
     39        * yarr/YarrPattern.cpp:
     40        (JSC::Yarr::YarrPatternConstructor::atomBackReference):
     41        (JSC::Yarr::YarrPatternConstructor::atomNamedForwardReference):
     42        (JSC::Yarr::YarrPattern::compile):
     43        (JSC::Yarr::YarrPatternConstructor::saveUnmatchedNamedForwardReferences): Deleted.
     44        (JSC::Yarr::YarrPatternConstructor::isValidNamedForwardReference): Deleted.
     45        * yarr/YarrPattern.h:
     46        (JSC::Yarr::YarrPattern::resetForReparsing):
     47        (JSC::Yarr::YarrPattern::containsIllegalBackReference): Deleted.
     48        (JSC::Yarr::YarrPattern::containsIllegalNamedForwardReferences): Deleted.
     49        * yarr/YarrSyntaxChecker.cpp:
     50        (JSC::Yarr::SyntaxChecker::atomNamedBackReference):
     51        (JSC::Yarr::SyntaxChecker::resetForReparsing):
     52        (JSC::Yarr::SyntaxChecker::isValidNamedForwardReference): Deleted.
     53
    1542020-03-25  Chris Dumez  <cdumez@apple.com>
    255
  • trunk/Source/JavaScriptCore/yarr/YarrErrorCode.cpp

    r255544 r259026  
    5454        REGEXP_ERROR_PREFIX "invalid Unicode {} escape",                            // InvalidUnicodeEscape
    5555        REGEXP_ERROR_PREFIX "invalid backreference for Unicode pattern",            // InvalidBackreference
     56        REGEXP_ERROR_PREFIX "invalid \\k<> named backreference",                    // InvalidNamedBackReference
    5657        REGEXP_ERROR_PREFIX "invalid escaped character for Unicode pattern",        // InvalidIdentityEscape
    5758        REGEXP_ERROR_PREFIX "invalid \\c escape for Unicode pattern",               // InvalidControlLetterEscape
     
    8889    case ErrorCode::InvalidUnicodeEscape:
    8990    case ErrorCode::InvalidBackreference:
     91    case ErrorCode::InvalidNamedBackReference:
    9092    case ErrorCode::InvalidIdentityEscape:
    9193    case ErrorCode::InvalidControlLetterEscape:
  • trunk/Source/JavaScriptCore/yarr/YarrErrorCode.h

    r255544 r259026  
    5353    InvalidUnicodeEscape,
    5454    InvalidBackreference,
     55    InvalidNamedBackReference,
    5556    InvalidIdentityEscape,
    5657    InvalidControlLetterEscape,
  • trunk/Source/JavaScriptCore/yarr/YarrParser.h

    r258976 r259026  
    11/*
    2  * Copyright (C) 2009-2019 Apple Inc. All rights reserved.
     2 * Copyright (C) 2009-2020 Apple Inc. All rights reserved.
     3 * Copyright (C) 2020 Alexey Shvayka <shvaikalesh@gmail.com>.
    34 *
    45 * Redistribution and use in source and binary forms, with or without
     
    1112 *    documentation and/or other materials provided with the distribution.
    1213 *
    13  * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
    14  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    15  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
    16  * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
    17  * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
    18  * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
    19  * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
    20  * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
    21  * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    22  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    23  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     14 * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
     15 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
     16 * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
     17 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
     18 * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
     19 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
     20 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
     21 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
     22 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
     23 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
     24 * THE POSSIBILITY OF SUCH DAMAGE.
    2425 */
    2526
     
    4243private:
    4344    template<class FriendDelegate>
    44     friend ErrorCode parse(FriendDelegate&, const String& pattern, bool isUnicode, unsigned backReferenceLimit);
     45    friend ErrorCode parse(FriendDelegate&, const String& pattern, bool isUnicode, unsigned backReferenceLimit, bool isNamedForwardReferenceAllowed);
    4546
    4647    /*
     
    201202        NO_RETURN_DUE_TO_ASSERT void atomBackReference(unsigned) { RELEASE_ASSERT_NOT_REACHED(); }
    202203        NO_RETURN_DUE_TO_ASSERT void atomNamedBackReference(const String&) { RELEASE_ASSERT_NOT_REACHED(); }
    203         NO_RETURN_DUE_TO_ASSERT bool isValidNamedForwardReference(const String&) { RELEASE_ASSERT_NOT_REACHED(); }
    204204        NO_RETURN_DUE_TO_ASSERT void atomNamedForwardReference(const String&) { RELEASE_ASSERT_NOT_REACHED(); }
    205205
     
    218218    };
    219219
    220     Parser(Delegate& delegate, const String& pattern, bool isUnicode, unsigned backReferenceLimit)
     220    Parser(Delegate& delegate, const String& pattern, bool isUnicode, unsigned backReferenceLimit, bool isNamedForwardReferenceAllowed)
    221221        : m_delegate(delegate)
    222         , m_backReferenceLimit(backReferenceLimit)
    223222        , m_data(pattern.characters<CharType>())
    224223        , m_size(pattern.length())
    225224        , m_isUnicode(isUnicode)
     225        , m_backReferenceLimit(backReferenceLimit)
     226        , m_isNamedForwardReferenceAllowed(isNamedForwardReferenceAllowed)
    226227    {
    227228    }
     
    339340                unsigned backReference = consumeNumber();
    340341                if (backReference <= m_backReferenceLimit) {
     342                    m_maxSeenBackReference = std::max(m_maxSeenBackReference, backReference);
    341343                    delegate.atomBackReference(backReference);
    342344                    break;
     
    344346
    345347                restoreState(state);
    346 
    347                 if (m_isUnicode) {
    348                     m_errorCode = ErrorCode::InvalidBackreference;
    349                     return false;
    350                 }
    351348            }
    352349
     
    440437            consume();
    441438            ParseState state = saveState();
    442             if (!atEndOfPattern() && !inCharacterClass) {
    443                 if (consume() == '<') {
    444                     auto groupName = tryConsumeGroupName();
    445                     if (groupName) {
    446                         if (m_captureGroupNames.contains(groupName.value())) {
    447                             delegate.atomNamedBackReference(groupName.value());
    448                             break;
    449                         }
    450                        
    451                         if (delegate.isValidNamedForwardReference(groupName.value())) {
    452                             delegate.atomNamedForwardReference(groupName.value());
    453                             break;
    454                         }
    455                     }
    456                     if (m_isUnicode) {
    457                         m_errorCode = ErrorCode::InvalidBackreference;
     439            if (!inCharacterClass && tryConsume('<')) {
     440                auto groupName = tryConsumeGroupName();
     441                if (groupName) {
     442                    if (m_captureGroupNames.contains(groupName.value())) {
     443                        delegate.atomNamedBackReference(groupName.value());
    458444                        break;
    459445                    }
    460                 }
    461             }
     446
     447                    if (m_isNamedForwardReferenceAllowed) {
     448                        m_forwardReferenceNames.add(groupName.value());
     449                        delegate.atomNamedForwardReference(groupName.value());
     450                        break;
     451                    }
     452                }
     453            }
     454
    462455            restoreState(state);
    463             delegate.atomPatternCharacter('k');
     456            if (!isIdentityEscapeAnError('k')) {
     457                delegate.atomPatternCharacter('k');
     458                m_kIdentityEscapeSeen = true;
     459            }
    464460            break;
    465461        }
     
    678674                auto groupName = tryConsumeGroupName();
    679675                if (groupName) {
     676                    if (m_kIdentityEscapeSeen) {
     677                        m_errorCode = ErrorCode::InvalidNamedBackReference;
     678                        break;
     679                    }
     680
    680681                    auto setAddResult = m_captureGroupNames.add(groupName.value());
    681682                    if (setAddResult.isNewEntry)
     
    694695        } else
    695696            m_delegate.atomParenthesesSubpatternBegin();
     697
     698        if (type == ParenthesesType::Subpattern)
     699            ++m_numSubpatterns;
    696700
    697701        m_parenthesesStack.append(type);
     
    882886    {
    883887        if (m_size > MAX_PATTERN_SIZE)
    884             m_errorCode = ErrorCode::PatternTooLarge;
    885         else
     888            return ErrorCode::PatternTooLarge;
     889
     890        parseTokens();
     891
     892        if (!hasError(m_errorCode)) {
     893            ASSERT(atEndOfPattern());
     894            handleIllegalReferences();
     895            ASSERT(atEndOfPattern());
     896        }
     897
     898        return m_errorCode;
     899    }
     900
     901    void handleIllegalReferences()
     902    {
     903        bool shouldReparse = false;
     904
     905        if (m_maxSeenBackReference > m_numSubpatterns) {
     906            // Contains illegal numeric backreference. See https://tc39.es/ecma262/#prod-annexB-AtomEscape
     907            if (m_isUnicode) {
     908                m_errorCode = ErrorCode::InvalidBackreference;
     909                return;
     910            }
     911
     912            m_backReferenceLimit = m_numSubpatterns;
     913            shouldReparse = true;
     914        }
     915
     916        if (m_kIdentityEscapeSeen && !m_captureGroupNames.isEmpty()) {
     917            m_errorCode = ErrorCode::InvalidNamedBackReference;
     918            return;
     919        }
     920
     921        if (containsIllegalNamedForwardReference()) {
     922            // \k<a> is parsed as named reference in Unicode patterns because of strict IdentityEscape grammar.
     923            // See https://tc39.es/ecma262/#sec-patterns-static-semantics-early-errors
     924            if (m_isUnicode || !m_captureGroupNames.isEmpty()) {
     925                m_errorCode = ErrorCode::InvalidNamedBackReference;
     926                return;
     927            }
     928
     929            m_isNamedForwardReferenceAllowed = false;
     930            shouldReparse = true;
     931        }
     932
     933        if (shouldReparse) {
     934            resetForReparsing();
    886935            parseTokens();
    887         ASSERT(atEndOfPattern() || hasError(m_errorCode));
    888        
    889         return m_errorCode;
     936        }
     937    }
     938
     939    bool containsIllegalNamedForwardReference()
     940    {
     941        if (m_forwardReferenceNames.isEmpty())
     942            return false;
     943
     944        if (m_captureGroupNames.isEmpty())
     945            return true;
     946
     947        for (auto& entry : m_forwardReferenceNames) {
     948            if (!m_captureGroupNames.contains(entry))
     949                return true;
     950        }
     951
     952        return false;
     953    }
     954
     955    void resetForReparsing()
     956    {
     957        ASSERT(!hasError(m_errorCode));
     958
     959        m_delegate.resetForReparsing();
     960        m_index = 0;
     961        m_numSubpatterns = 0;
     962        m_maxSeenBackReference = 0;
     963        m_kIdentityEscapeSeen = false;
     964        m_parenthesesStack.clear();
     965        m_captureGroupNames.clear();
     966        m_forwardReferenceNames.clear();
    890967    }
    891968
     
    11541231
    11551232    Delegate& m_delegate;
    1156     unsigned m_backReferenceLimit;
    11571233    ErrorCode m_errorCode { ErrorCode::NoError };
    11581234    const CharType* m_data;
     
    11601236    unsigned m_index { 0 };
    11611237    bool m_isUnicode;
     1238    unsigned m_backReferenceLimit;
     1239    unsigned m_numSubpatterns { 0 };
     1240    unsigned m_maxSeenBackReference { 0 };
     1241    bool m_isNamedForwardReferenceAllowed;
     1242    bool m_kIdentityEscapeSeen { false };
    11621243    Vector<ParenthesesType, 16> m_parenthesesStack;
    11631244    HashSet<String> m_captureGroupNames;
     1245    HashSet<String> m_forwardReferenceNames;
    11641246
    11651247    // Derived by empirical testing of compile time in PCRE and WREC.
     
    11931275 *    void atomBackReference(unsigned subpatternId);
    11941276 *    void atomNamedBackReference(const String& subpatternName);
    1195  *    bool isValidNamedForwardReference(const String& subpatternName);
    11961277 *    void atomNamedForwardReference(const String& subpatternName);
    11971278 *
     
    11991280 *
    12001281 *    void disjunction();
     1282 *
     1283 *    void resetForReparsing();
    12011284 *
    12021285 * The regular expression is described by a sequence of assertion*() and atom*()
     
    12301313
    12311314template<class Delegate>
    1232 ErrorCode parse(Delegate& delegate, const String& pattern, bool isUnicode, unsigned backReferenceLimit = quantifyInfinite)
     1315ErrorCode parse(Delegate& delegate, const String& pattern, bool isUnicode, unsigned backReferenceLimit = quantifyInfinite, bool isNamedForwardReferenceAllowed = true)
    12331316{
    12341317    if (pattern.is8Bit())
    1235         return Parser<Delegate, LChar>(delegate, pattern, isUnicode, backReferenceLimit).parse();
    1236     return Parser<Delegate, UChar>(delegate, pattern, isUnicode, backReferenceLimit).parse();
     1318        return Parser<Delegate, LChar>(delegate, pattern, isUnicode, backReferenceLimit, isNamedForwardReferenceAllowed).parse();
     1319    return Parser<Delegate, UChar>(delegate, pattern, isUnicode, backReferenceLimit, isNamedForwardReferenceAllowed).parse();
    12371320}
    12381321
  • trunk/Source/JavaScriptCore/yarr/YarrPattern.cpp

    r254714 r259026  
    463463    }
    464464
    465     void saveUnmatchedNamedForwardReferences()
    466     {
    467         m_unmatchedNamedForwardReferences.shrink(0);
    468        
    469         for (auto& entry : m_pattern.m_namedForwardReferences) {
    470             if (!m_pattern.m_captureGroupNames.contains(entry))
    471                 m_unmatchedNamedForwardReferences.append(entry);
    472         }
    473     }
    474 
    475465    void assertionBOL()
    476466    {
     
    658648        ASSERT(subpatternId);
    659649        m_pattern.m_containsBackreferences = true;
    660         m_pattern.m_maxBackReference = std::max(m_pattern.m_maxBackReference, subpatternId);
    661650
    662651        if (subpatternId > m_pattern.m_numSubpatterns) {
     
    688677    }
    689678
    690     bool isValidNamedForwardReference(const String& subpatternName)
    691     {
    692         return !m_unmatchedNamedForwardReferences.contains(subpatternName);
    693     }
    694 
    695     void atomNamedForwardReference(const String& subpatternName)
    696     {
    697         m_pattern.m_namedForwardReferences.appendIfNotContains(subpatternName);
     679    void atomNamedForwardReference(const String&)
     680    {
    698681        m_alternative->m_terms.append(PatternTerm::ForwardReference());
    699682    }
     
    11311114    PatternAlternative* m_alternative;
    11321115    CharacterClassConstructor m_characterClassConstructor;
    1133     Vector<String> m_unmatchedNamedForwardReferences;
    11341116    void* m_stackLimit;
    11351117    ErrorCode m_error { ErrorCode::NoError };
     
    11461128        if (hasError(error))
    11471129            return error;
    1148     }
    1149    
    1150     // If the pattern contains illegal backreferences reset & reparse.
    1151     // Quoting Netscape's "What's new in JavaScript 1.2",
    1152     //      "Note: if the number of left parentheses is less than the number specified
    1153     //       in \#, the \# is taken as an octal escape as described in the next row."
    1154     if (containsIllegalBackReference() || containsIllegalNamedForwardReferences()) {
    1155         if (unicode())
    1156             return ErrorCode::InvalidBackreference;
    1157 
    1158         unsigned numSubpatterns = m_numSubpatterns;
    1159 
    1160         constructor.saveUnmatchedNamedForwardReferences();
    1161         constructor.resetForReparsing();
    1162         ErrorCode error = parse(constructor, patternString, unicode(), numSubpatterns);
    1163         ASSERT_UNUSED(error, !hasError(error));
    1164         ASSERT(numSubpatterns == m_numSubpatterns);
    11651130    }
    11661131
  • trunk/Source/JavaScriptCore/yarr/YarrPattern.h

    r249926 r259026  
    392392    {
    393393        m_numSubpatterns = 0;
    394         m_maxBackReference = 0;
    395394        m_initialStartValueFrameLocation = 0;
    396395
     
    416415        m_userCharacterClasses.clear();
    417416        m_captureGroupNames.shrink(0);
    418         m_namedForwardReferences.shrink(0);
    419     }
    420 
    421     bool containsIllegalBackReference()
    422     {
    423         return m_maxBackReference > m_numSubpatterns;
    424     }
    425    
    426     bool containsIllegalNamedForwardReferences()
    427     {
    428         if (m_namedForwardReferences.isEmpty())
    429             return false;
    430 
    431         for (auto& entry : m_namedForwardReferences) {
    432             if (!m_captureGroupNames.contains(entry))
    433                 return true;
    434         }
    435 
    436         return false;
    437417    }
    438418
     
    556536    OptionSet<Flags> m_flags;
    557537    unsigned m_numSubpatterns { 0 };
    558     unsigned m_maxBackReference { 0 };
    559538    unsigned m_initialStartValueFrameLocation { 0 };
    560539    PatternDisjunction* m_body;
     
    562541    Vector<std::unique_ptr<CharacterClass>> m_userCharacterClasses;
    563542    Vector<String> m_captureGroupNames;
    564     Vector<String> m_namedForwardReferences;
    565543    HashMap<String, unsigned> m_namedGroupToParenIndex;
    566544
  • trunk/Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp

    r242699 r259026  
    5050    void atomBackReference(unsigned) {}
    5151    void atomNamedBackReference(const String&) {}
    52     bool isValidNamedForwardReference(const String&) { return true; }
    5352    void atomNamedForwardReference(const String&) {}
    5453    void quantifyAtom(unsigned, unsigned, bool) {}
    5554    void disjunction() {}
     55    void resetForReparsing() {}
    5656};
    5757
  • trunk/Source/WebCore/ChangeLog

    r259024 r259026  
     12020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
     2
     3        Invalid numeric and named references should be early syntax errors
     4        https://bugs.webkit.org/show_bug.cgi?id=178175
     5
     6        Reviewed by Ross Kirsling.
     7
     8        Accounts for changes of YarrParser's Delegate interface, no behavioral changes.
     9        resetForReparsing() is never called because we disable numeric backrefences
     10        and named forward references (see arguments of Yarr::parse() call).
     11
     12        Test: TestWebKitAPI.ContentExtensionTest.ParsingFailures
     13
     14        * contentextensions/URLFilterParser.cpp:
     15        (WebCore::ContentExtensions::PatternParser::resetForReparsing):
     16        (WebCore::ContentExtensions::URLFilterParser::addPattern):
     17        (WebCore::ContentExtensions::PatternParser::isValidNamedForwardReference): Deleted.
     18
    1192020-03-25  Pinki Gyanchandani  <pgyanchandani@apple.com>
    220
  • trunk/Source/WebCore/contentextensions/URLFilterParser.cpp

    r247688 r259026  
    135135    }
    136136
    137     bool isValidNamedForwardReference(const String&)
    138     {
    139         return false;
    140     }
    141 
    142137    void atomNamedForwardReference(const String&)
    143138    {
     
    248243    {
    249244        fail(URLFilterParser::Disjunction);
     245    }
     246
     247    NO_RETURN_DUE_TO_CRASH void resetForReparsing()
     248    {
     249        RELEASE_ASSERT_NOT_REACHED();
    250250    }
    251251
     
    359359    ParseStatus status = Ok;
    360360    PatternParser patternParser(patternIsCaseSensitive);
    361     if (!JSC::Yarr::hasError(JSC::Yarr::parse(patternParser, pattern, false, 0)))
     361    if (!JSC::Yarr::hasError(JSC::Yarr::parse(patternParser, pattern, false, 0, false)))
    362362        patternParser.finalize(patternId, m_combinedURLFilters);
    363363    else
  • trunk/Tools/ChangeLog

    r259019 r259026  
     12020-03-25  Alexey Shvayka  <shvaikalesh@gmail.com>
     2
     3        Invalid numeric and named references should be early syntax errors
     4        https://bugs.webkit.org/show_bug.cgi?id=178175
     5
     6        Reviewed by Ross Kirsling.
     7
     8        Removes FIXME as YarrParser is correct not to throw errors as it is
     9        parsing in non-Unicode mode. Also adds a few named groups tests.
     10
     11        * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
     12
    1132020-03-25  Zhifei Fang  <zhifei_fang@apple.com>
    214
  • trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp

    r246490 r259026  
    17611761    testPatternStatus("[a}", ContentExtensions::URLFilterParser::ParseStatus::YarrError);
    17621762   
    1763     // FIXME: Look into why these do not cause YARR parsing errors.  They probably should.
    17641763    testPatternStatus("a]", ContentExtensions::URLFilterParser::ParseStatus::Ok);
    17651764    testPatternStatus("{", ContentExtensions::URLFilterParser::ParseStatus::Ok);
     
    17901789    testPatternStatus("(a)\\1", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Back references are disabled, it parse as octal 1
    17911790    testPatternStatus("(<A>a)\\k<A>", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Named back references aren't handled, it parse as "k<A>"
     1791    testPatternStatus("(?<A>a)\\k<A>", ContentExtensions::URLFilterParser::ParseStatus::BackReference);
    17921792    testPatternStatus("\\1(a)", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Forward references are disabled, it parse as octal 1
    17931793    testPatternStatus("\\8(a)", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Forward references are disabled, it parse as '8'
    17941794    testPatternStatus("\\9(a)", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Forward references are disabled, it parse as '9'
    17951795    testPatternStatus("\\k<A>(<A>a)", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Named forward references aren't handled, it parse as "k<A>"
     1796    testPatternStatus("\\k<A>(?<A>a)", ContentExtensions::URLFilterParser::ParseStatus::YarrError);
    17961797    testPatternStatus("\\k<A>(a)", ContentExtensions::URLFilterParser::ParseStatus::Ok); // Unmatched named forward references aren't handled, it parse as "k<A>"
    17971798}
Note: See TracChangeset for help on using the changeset viewer.