Changeset 39162 in webkit

Timestamp:

Dec 9, 2008 8:59:06 PM

Author:

ggaren@apple.com

Message:

JavaScriptCore:

2008-12-09 Geoffrey Garen <ggaren@apple.com>

Reviewed by Cameron Zwarich.

In preparation for compiling WREC without PCRE:

Further relaxed WREC's parsing to be more web-compatible. Fixed PCRE to
match in cases where it didn't already.

Changed JavaScriptCore to report syntax errors detected by WREC, rather
than falling back on PCRE any time WREC sees an error.

pcre/pcre_compile.cpp: (checkEscape): Relaxed parsing of \c and \N escapes to be more web-compatible.

runtime/RegExp.cpp: (JSC::RegExp::RegExp): Only fall back on PCRE if WREC has not reported a syntax error.

wrec/WREC.cpp: (JSC::WREC::Generator::compileRegExp): Fixed some error reporting to match PCRE.

wrec/WRECParser.cpp: Added error messages that match PCRE.

(JSC::WREC::Parser::consumeGreedyQuantifier):
(JSC::WREC::Parser::parseParentheses):
(JSC::WREC::Parser::parseCharacterClass):
(JSC::WREC::Parser::parseNonCharacterEscape): Updated the above functions to
use the new setError API.

(JSC::WREC::Parser::consumeEscape): Relaxed parsing of \c \N \u \x \B
to be more web-compatible.

(JSC::WREC::Parser::parseAlternative): Distinguish between a malformed
quantifier and a quantifier with no prefix, like PCRE does.

(JSC::WREC::Parser::consumeParenthesesType): Updated to use the new setError API.

wrec/WRECParser.h:

(JSC::WREC::Parser::error):
(JSC::WREC::Parser::syntaxError):
(JSC::WREC::Parser::parsePattern):
(JSC::WREC::Parser::reset):
(JSC::WREC::Parser::setError): Store error messages instead of error codes,
to provide for exception messages. Use a setter for reporting errors, so
errors detected early are not overwritten by errors detected later.

LayoutTests:

2008-12-09 Geoffrey Garen <ggaren@apple.com>

Reviewed by Cameron Zwarich.

Updated regular expression layout tests to be agnostic between WREC
and PCRE quirks. Also, updated results to match new, more web-compatible
regular expression parsing.

fast/js/regexp-charclass-crash-expected.txt:
fast/js/regexp-charclass-crash.html:
fast/js/regexp-no-extensions-expected.txt:
fast/js/resources/regexp-no-extensions.js:
fast/regex/test1-expected.txt:

Location:

trunk

Files:

12 edited

JavaScriptCore/ChangeLog (modified) (1 diff)
JavaScriptCore/pcre/pcre_compile.cpp (modified) (2 diffs)
JavaScriptCore/runtime/RegExp.cpp (modified) (2 diffs)
JavaScriptCore/wrec/WREC.cpp (modified) (2 diffs)
JavaScriptCore/wrec/WRECParser.cpp (modified) (16 diffs)
JavaScriptCore/wrec/WRECParser.h (modified) (6 diffs)
LayoutTests/ChangeLog (modified) (1 diff)
LayoutTests/fast/js/regexp-charclass-crash-expected.txt (modified) (1 diff)
LayoutTests/fast/js/regexp-charclass-crash.html (modified) (1 diff)
LayoutTests/fast/js/regexp-no-extensions-expected.txt (modified) (2 diffs)
LayoutTests/fast/js/resources/regexp-no-extensions.js (modified) (2 diffs)
LayoutTests/fast/regex/test1-expected.txt (modified) (1 diff)

trunk/JavaScriptCore/ChangeLog

-                      r39161
+                      r39162
+2008-12-09  Geoffrey Garen  <ggaren@apple.com>
+        Reviewed by Cameron Zwarich.
+        In preparation for compiling WREC without PCRE:
+        Further relaxed WREC's parsing to be more web-compatible. Fixed PCRE to
+        match in cases where it didn't already.
+        Changed JavaScriptCore to report syntax errors detected by WREC, rather
+        than falling back on PCRE any time WREC sees an error.
+        * pcre/pcre_compile.cpp:
+        (checkEscape): Relaxed parsing of \c and \N escapes to be more
+        web-compatible.
+        * runtime/RegExp.cpp:
+        (JSC::RegExp::RegExp): Only fall back on PCRE if WREC has not reported
+        a syntax error.
+        * wrec/WREC.cpp:
+        (JSC::WREC::Generator::compileRegExp): Fixed some error reporting to
+        match PCRE.
+        * wrec/WRECParser.cpp: Added error messages that match PCRE.
+        (JSC::WREC::Parser::consumeGreedyQuantifier):
+        (JSC::WREC::Parser::parseParentheses):
+        (JSC::WREC::Parser::parseCharacterClass):
+        (JSC::WREC::Parser::parseNonCharacterEscape): Updated the above functions to
+        use the new setError API.
+        (JSC::WREC::Parser::consumeEscape): Relaxed parsing of \c \N \u \x \B
+        to be more web-compatible.
+        (JSC::WREC::Parser::parseAlternative): Distinguish between a malformed
+        quantifier and a quantifier with no prefix, like PCRE does.
+        (JSC::WREC::Parser::consumeParenthesesType): Updated to use the new setError API.
+        * wrec/WRECParser.h:
+        (JSC::WREC::Parser::error):
+        (JSC::WREC::Parser::syntaxError):
+        (JSC::WREC::Parser::parsePattern):
+        (JSC::WREC::Parser::reset):
+        (JSC::WREC::Parser::setError): Store error messages instead of error codes,
+        to provide for exception messages. Use a setter for reporting errors, so
+        errors detected early are not overwritten by errors detected later.
 2008-12-09  Gavin Barraclough  <barraclough@apple.com>

trunk/JavaScriptCore/pcre/pcre_compile.cpp

-                      r34858
+                      r39162
                  this is not octal. */
-                if ((c = *ptr) >= '8')
+                if ((c = *ptr) >= '8') {
+                    c = '\\';
+                    ptr -= 1;
                     break;
+                }
             /* \0 always starts an octal number, but we may drop through to here with a
 …
                     return 0;
                 }
                 c = *ptr;
+                if (!isASCIIAlpha(c)) {
+                    c = '\\';
+                    ptr -= 2;
+                    break;
+                }
                 /* A letter is upper-cased; then the 0x40 bit is flipped. This coding
                  is ASCII-specific, but then the whole concept of \cx is ASCII-specific. */
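
The relaxed \cx handling in this hunk can be illustrated in isolation. The following is a minimal sketch, not the PCRE code itself: `parseControlEscape`, `EscapeResult` and `isASCIIAlphaSketch` are hypothetical names introduced here. It assumes the caller passes the pattern text that follows a backslash. A letter after `c` yields a control character; anything else turns the backslash into a literal and consumes nothing, so the `c` is parsed again as an ordinary character. As a simplification, a pattern that ends right after `c` is also treated as a literal backslash here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not a WebKit or PCRE type.
struct EscapeResult {
    int character;        // the character the escape stands for
    std::size_t consumed; // how many characters after the backslash were used
};

static bool isASCIIAlphaSketch(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// 'afterBackslash' is the pattern text following a '\'; it is expected to start with 'c'.
EscapeResult parseControlEscape(const std::string& afterBackslash)
{
    if (afterBackslash.size() >= 2 && afterBackslash[0] == 'c' && isASCIIAlphaSketch(afterBackslash[1])) {
        // For ASCII letters, (c & 31) equals upper-casing the letter and flipping the 0x40 bit.
        return { afterBackslash[1] & 31, 2 };
    }
    // Relaxed behaviour: the backslash becomes a literal and "c..." is left unconsumed.
    return { '\\', 0 };
}
```

With this sketch, `parseControlEscape("cA")` and `parseControlEscape("ca")` both produce character code 1, while `parseControlEscape("c[")` produces a literal backslash and consumes nothing.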

trunk/JavaScriptCore/runtime/RegExp.cpp

-                      r39089
+                      r39162
 #if ENABLE(WREC)
     m_wrecFunction = Generator::compileRegExp(globalData, pattern, &m_numSubpatterns, &m_constructionError, m_executablePool);
-    if (m_wrecFunction)
+    if (m_wrecFunction || m_constructionError)
         return;
     // Fall through to non-WREC case.
 …
 #if ENABLE(WREC)
     m_wrecFunction = Generator::compileRegExp(globalData, pattern, &m_numSubpatterns, &m_constructionError, m_executablePool, (m_flagBits & IgnoreCase), (m_flagBits & Multiline));
-    if (m_wrecFunction)
+    if (m_wrecFunction || m_constructionError)
         return;
     // Fall through to non-WREC case.
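
The new condition distinguishes three outcomes of the first compilation attempt. The sketch below models that decision on its own; `FirstCompilerResult`, `Action` and `chooseAction` are hypothetical names for illustration and do not appear in JavaScriptCore.

```cpp
#include <cassert>

// Hypothetical stand-in for the outcome of the first (WREC-like) compiler.
struct FirstCompilerResult {
    bool compiled;           // true if a compiled function was produced
    const char* syntaxError; // non-null if the pattern is known to be malformed
};

enum class Action { UseFirstCompiler, ReportSyntaxError, FallBack };

Action chooseAction(const FirstCompilerResult& result)
{
    if (result.compiled)
        return Action::UseFirstCompiler;
    if (result.syntaxError)
        return Action::ReportSyntaxError; // do not retry with the fallback engine
    return Action::FallBack;              // unsupported syntax, but not known to be invalid
}
```

Only the last case, where nothing was compiled and no syntax error was recorded, reaches the fallback engine.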

trunk/JavaScriptCore/wrec/WREC.cpp

-                      r39128
+                      r39162
 {
     if (pattern.size() > MaxPatternSize) {
-        *error_ptr = "Regular expression too large.";
+        *error_ptr = "regular expression too large";
         return 0;
     }
 …

     if (parser.error()) {
-        *error_ptr = "Regular expression malformed.";
+        *error_ptr = parser.syntaxError(); // NULL in the case of patterns that WREC doesn't support yet.
         return 0;
     }

trunk/JavaScriptCore/wrec/WRECParser.cpp

-                      r39130
+                      r39162
 namespace JSC { namespace WREC {
+// These error messages match the error messages used by PCRE.
+const char* Parser::QuantifierOutOfOrder = "numbers out of order in {} quantifier";
+const char* Parser::QuantifierWithoutAtom = "nothing to repeat";
+const char* Parser::ParenthesesUnmatched = "unmatched parentheses";
+const char* Parser::ParenthesesTypeInvalid = "unrecognized character after (?";
+const char* Parser::ParenthesesNotSupported = ""; // Not a user-visible syntax error -- just signals a syntax that WREC doesn't support yet.
+const char* Parser::CharacterClassUnmatched = "missing terminating ] for character class";
+const char* Parser::CharacterClassOutOfOrder = "range out of order in character class";
+const char* Parser::EscapeUnterminated = "\\ at end of pattern";
 class PatternCharacterSequence {
 typedef Generator::JumpList JumpList;
 …
             if (min > max) {
-                m_error = MalformedQuantifier;
+                setError(QuantifierOutOfOrder);
                 return Quantifier(Quantifier::Error);
             }
 …
         default:
-            m_error = UnsupportedParentheses;
+            setError(ParenthesesNotSupported);
             return false;
     }
     if (consume() != ')') {
-        m_error = MalformedParentheses;
+        setError(ParenthesesUnmatched);
         return false;
     }
 …
         case Quantifier::Greedy:
-            m_error = UnsupportedParentheses;
+            setError(ParenthesesNotSupported);
             return false;
         case Quantifier::NonGreedy:
-            m_error = UnsupportedParentheses;
+            setError(ParenthesesNotSupported);
             return false;
 …
     CharacterClassConstructor constructor(m_ignoreCase);
-    UChar ch;
+    int ch;
     while ((ch = peek()) != ']') {
         switch (ch) {
         case EndOfPattern:
-            m_error = MalformedCharacterClass;
+            setError(CharacterClassUnmatched);
             return false;
 …
                     break;
                 }
-                case Escape::Error: {
-                    m_error = MalformedEscape;
+                case Escape::Error:
                     return false;
-                }
                 case Escape::Backreference:
                 case Escape::WordBoundaryAssertion: {
 …
     // lazily catch reversed ranges ([z-a])in character classes
     if (constructor.isUpsideDown()) {
-        m_error = MalformedCharacterClass;
+        setError(CharacterClassOutOfOrder);
         return false;
     }
 …
         case Escape::Error:
-            m_error = MalformedEscape;
             return false;
+    }
 …
     switch (peek()) {
     case EndOfPattern:
+        setError(EscapeUnterminated);
         return Escape(Escape::Error);
 …
         consume();
         if (inCharacterClass)
-            return Escape(Escape::Error);
+            return PatternCharacterEscape('B');
         return WordBoundaryAssertionEscape(true); // invert
 …
             // To match Firefox, we parse an invalid backreference in the range [1-7]
             // as an octal escape.
-            return peekDigit() > 7 ? Escape(Escape::Error) : PatternCharacterEscape(consumeOctal());
+            return peekDigit() > 7 ? PatternCharacterEscape('\\') : PatternCharacterEscape(consumeOctal());
         }
 …
     // ControlLetter
     case 'c': {
-        consume();
+        SavedState state(*this);
+        consume();
         int control = consume();
-        if (!isASCIIAlpha(control))
-            return Escape(Escape::Error);
+        if (!isASCIIAlpha(control)) {
+            state.restore();
+            return PatternCharacterEscape('\\');
+        }
         return PatternCharacterEscape(control & 31);
     }
 …
     case 'x': {
         consume();
+        SavedState state(*this);
         int x = consumeHex(2);
-        if (x == -1)
-            return Escape(Escape::Error);
+        if (x == -1) {
+            state.restore();
+            return PatternCharacterEscape('x');
+        }
         return PatternCharacterEscape(x);
     }
 …
     case 'u': {
         consume();
+        SavedState state(*this);
         int x = consumeHex(4);
-        if (x == -1)
-            return Escape(Escape::Error);
+        if (x == -1) {
+            state.restore();
+            return PatternCharacterEscape('u');
+        }
         return PatternCharacterEscape(x);
     }
 …
             }
-            if (q.type == Quantifier::Error || !sequence.size()) {
-                m_error = MalformedQuantifier;
+            if (q.type == Quantifier::Error)
+                return;
+            if (!sequence.size()) {
+                setError(QuantifierWithoutAtom);
                 return;
             }
 …
     default:
-        m_error = MalformedParentheses;
+        setError(ParenthesesTypeInvalid);
         return Generator::Error;
     }
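
The \x and \u hunks above share one pattern: remember the position, try to read a fixed number of hex digits, and on failure rewind and treat the escape letter as a literal. The sketch below shows that pattern under stated assumptions. `Cursor` and `parseHexEscape` are hypothetical, and the real parser's `SavedState` class is not reproduced; an explicit saved index stands in for it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical miniature cursor over the text that follows "\x".
class Cursor {
public:
    explicit Cursor(const std::string& text) : m_text(text), m_index(0) {}
    std::size_t index() const { return m_index; }
    void restore(std::size_t saved) { m_index = saved; }

    // Reads exactly 'count' hex digits; returns -1 if a non-hex character or the end is hit first.
    int consumeHex(std::size_t count)
    {
        int value = 0;
        for (std::size_t i = 0; i < count; ++i) {
            if (m_index >= m_text.size() || !std::isxdigit(static_cast<unsigned char>(m_text[m_index])))
                return -1;
            unsigned char c = static_cast<unsigned char>(m_text[m_index++]);
            int digit = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
            value = value * 16 + digit;
        }
        return value;
    }

private:
    std::string m_text;
    std::size_t m_index;
};

// Parses what follows "\x". On failure, rewinds and yields a literal 'x'.
int parseHexEscape(Cursor& cursor)
{
    std::size_t saved = cursor.index();
    int x = cursor.consumeHex(2);
    if (x == -1) {
        cursor.restore(saved); // un-consume any digits read before the failure
        return 'x';
    }
    return x;
}
```

Rewinding matters because `consumeHex` may already have advanced past one valid digit before failing; without the restore, that digit would be lost from the pattern.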

trunk/JavaScriptCore/wrec/WRECParser.h

-                      r39130
+                      r39162
     public:
-        enum Error {
-            NoError,
-            MalformedCharacterClass,
-            MalformedParentheses,
-            MalformedPattern,
-            MalformedQuantifier,
-            MalformedEscape,
-            UnsupportedParentheses,
-        };
         Parser(const UString& pattern, bool ignoreCase, bool multiline)
             : m_generator(*this)
 …
         unsigned numSubpatterns() const { return m_numSubpatterns; }
-        Error error() const { return m_error; }
+        const char* error() const { return m_error; }
+        const char* syntaxError() const { return m_error == ParenthesesNotSupported ? 0 : m_error; }
         void parsePattern(JumpList& failures)
 …
             if (peek() != EndOfPattern)
-                m_error = MalformedPattern; // Parsing the pattern should fully consume it.
+                setError(ParenthesesUnmatched); // Parsing the pattern should fully consume it.
         }
 …
             m_index = 0;
             m_numSubpatterns = 0;
-            m_error = NoError;
+            m_error = 0;
         }
+        void setError(const char* error)
+        {
+            if (m_error)
+                return;
+            m_error = error;
+        }
 …
         static const int EndOfPattern = -1;
+        // Error messages.
+        static const char* QuantifierOutOfOrder;
+        static const char* QuantifierWithoutAtom;
+        static const char* ParenthesesUnmatched;
+        static const char* ParenthesesTypeInvalid;
+        static const char* ParenthesesNotSupported;
+        static const char* CharacterClassUnmatched;
+        static const char* CharacterClassOutOfOrder;
+        static const char* EscapeUnterminated;
         Generator m_generator;
 …
         bool m_multiline;
         unsigned m_numSubpatterns;
-        Error m_error;
+        const char* m_error;
     };
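
The setter introduced in this header keeps the first error and ignores later ones, and `syntaxError()` hides the "not supported" sentinel by comparing pointers. The sketch below is a reduced model of that behaviour: `ErrorState` and `NotSupported` are hypothetical names, and this is not the WebKit class.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical reduced model of the parser's error state.
class ErrorState {
public:
    // Sentinel compared by address: marks unsupported syntax, not a user-visible error.
    static const char* const NotSupported;

    void setError(const char* error)
    {
        if (m_error)
            return; // keep the first error reported
        m_error = error;
    }

    const char* error() const { return m_error; }

    // Returns null for the sentinel so callers only see real syntax errors.
    const char* syntaxError() const { return m_error == NotSupported ? nullptr : m_error; }

    void reset() { m_error = nullptr; }

private:
    const char* m_error = nullptr;
};

const char* const ErrorState::NotSupported = "";
```

Calling `setError("nothing to repeat")` and then `setError("unmatched parentheses")` on the same object leaves `error()` pointing at the first message, which is the property the commit message describes.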

trunk/LayoutTests/ChangeLog

-                      r39159
+                      r39162
+2008-12-09  Geoffrey Garen  <ggaren@apple.com>
+        Reviewed by Cameron Zwarich.
+        Updated regular expression layout tests to be agnostic between WREC
+        and PCRE quirks. Also, updated results to match new, more web-compatible
+        regular expression parsing.
+        * fast/js/regexp-charclass-crash-expected.txt:
+        * fast/js/regexp-charclass-crash.html:
+        * fast/js/regexp-no-extensions-expected.txt:
+        * fast/js/resources/regexp-no-extensions.js:
+        * fast/regex/test1-expected.txt:
 2008-12-09  David Levin  <levin@chromium.org>

trunk/LayoutTests/fast/js/regexp-charclass-crash-expected.txt

-                      r24430
+                      r39162
 Tests a crash in the regular expression engine. If this stops with a single "regular expression too large" exception, then the test succeeded.

-Got up to iteration 1872 and then got this exception: SyntaxError: Invalid regular expression: regular expression too large.
+Got over 1000 iterations and then got this exception: SyntaxError: Invalid regular expression: regular expression too large.

trunk/LayoutTests/fast/js/regexp-charclass-crash.html

-                      r24430
+                      r39162
         new RegExp(string);
     } catch (exception) {
-        if (/too large/.test(exception)) {
-            document.writeln("<div>Got up to iteration " + i + " and then got this exception: " + exception + ".</div>");
+        if (/too large/.test(exception) && i > 1000) {
+            document.writeln("<div>Got over 1000 iterations and then got this exception: " + exception + ".</div>");
             break;
         }

trunk/LayoutTests/fast/js/regexp-no-extensions-expected.txt

-                      r27752
+                      r39162
 PASS /\2147483648/.exec(String.fromCharCode(140) + "7483648").toString() is String.fromCharCode(140) + "7483648"
 PASS /\4294967296/.exec("\"94967296").toString() is "\"94967296"
-PASS /\8589934592/.exec("8589934592").toString() is "8589934592"
+PASS /\8589934592/.exec("\\8589934592").toString() is "\\8589934592"
 PASS "\nAbc\n".replace(/(\n)[^\n]+$/, "$1") is "\nAbc\n"
 PASS /x$/.exec("x\n") is null
 …
 PASS /[\1q]/.exec("y" + String.fromCharCode(1) + "q").toString() is String.fromCharCode(1)
 PASS /[\1q]/.exec("yq").toString() is "q"
-PASS /\8q/.exec("y8q").toString() is "8q"
+PASS /\8q/.exec("\\8q").toString() is "\\8q"
 PASS /[\8q]/.exec("y8q").toString() is "8"
 PASS /[\8q]/.exec("yq").toString() is "q"

trunk/LayoutTests/fast/js/resources/regexp-no-extensions.js

-                      r27752
+                      r39162
 shouldBe('/\\2147483648/.exec(String.fromCharCode(140) + "7483648").toString()', 'String.fromCharCode(140) + "7483648"');
 shouldBe('/\\4294967296/.exec("\\"94967296").toString()', '"\\"94967296"');
-shouldBe('/\\8589934592/.exec("8589934592").toString()', '"8589934592"');
+shouldBe('/\\8589934592/.exec("\\\\8589934592").toString()', '"\\\\8589934592"');
 shouldBe('"\\nAbc\\n".replace(/(\\n)[^\\n]+$/, "$1")', '"\\nAbc\\n"');
 shouldBe('/x$/.exec("x\\n")', 'null');
 …
 shouldBe('/[\\1q]/.exec("y" + String.fromCharCode(1) + "q").toString()', 'String.fromCharCode(1)');
 shouldBe('/[\\1q]/.exec("yq").toString()', '"q"');
-shouldBe('/\\8q/.exec("y8q").toString()', '"8q"');
+shouldBe('/\\8q/.exec("\\\\8q").toString()', '"\\\\8q"');
 shouldBe('/[\\8q]/.exec("y8q").toString()', '"8"');
 shouldBe('/[\\8q]/.exec("yq").toString()', '"q"');

trunk/LayoutTests/fast/regex/test1-expected.txt

-                      r30517
+                      r39162

 /^\ca\cA\c[\c{\c:/
-\e;z: FAIL. Actual results: "null"
+FAILED TO COMPILE

 /^[ab\]cde]/
