Changeset 258531 in webkit
Timestamp: Mar 16, 2020 5:12:17 PM
Location: trunk
Files: 1 added, 13 edited
trunk/JSTests/ChangeLog
--- trunk/JSTests/ChangeLog (r258419)
+++ trunk/JSTests/ChangeLog (r258531)
@@ -1,2 +1,14 @@
+2020-03-16  Keith Miller  <keith_miller@apple.com>
+
+        JavaScript identifier grammar supports unescaped astral symbols, but JSC doesn’t
+        https://bugs.webkit.org/show_bug.cgi?id=208998
+
+        Reviewed by Michael Saboff.
+
+        * stress/unicode-identifiers-with-surrogate-pairs.js: Added.
+        (let.c.of.chars.eval.foo):
+        (throwsSyntaxError):
+        (let.c.of.continueChars.throwsSyntaxError.foo):
+
 2020-03-13  Saam Barati  <sbarati@apple.com>

trunk/LayoutTests/ChangeLog
--- trunk/LayoutTests/ChangeLog (r258526)
+++ trunk/LayoutTests/ChangeLog (r258531)
@@ -1,2 +1,15 @@
+2020-03-16  Keith Miller  <keith_miller@apple.com>
+
+        JavaScript identifier grammar supports unescaped astral symbols, but JSC doesn’t
+        https://bugs.webkit.org/show_bug.cgi?id=208998
+
+        Reviewed by Michael Saboff.
+
+        Fix broken test that asserted a non-ID_START codepoint was a start codepoint and
+        an ID_START codepoint was not a valid codepoint...
+
+        * js/script-tests/unicode-escape-sequences.js:
+        * js/unicode-escape-sequences-expected.txt:
+
 2020-03-16  Jason Lawrence  <lawrence.j@apple.com>

trunk/LayoutTests/js/script-tests/unicode-escape-sequences.js
--- trunk/LayoutTests/js/script-tests/unicode-escape-sequences.js (r183552)
+++ trunk/LayoutTests/js/script-tests/unicode-escape-sequences.js (r258531)
@@ -75,6 +75,6 @@
 testIdentifierStartUnicodeEscapeSequence("{102C0}", "D800,DEC0");
 testIdentifierStartUnicodeEscapeSequence("{102c0}", "D800,DEC0");
-testIdentifierStartUnicodeEscapeSequence("{1D306}", "D834,DF06");
-testIdentifierStartUnicodeEscapeSequence("{1d306}", "D834,DF06");
+testIdentifierStartUnicodeEscapeSequence("{10000}", "D800,DC00");
+testIdentifierStartUnicodeEscapeSequence("{10001}", "D800,DC01");

 testInvalidIdentifierStartUnicodeEscapeSequence("");
@@ -86,6 +86,4 @@
 testInvalidIdentifierStartUnicodeEscapeSequence("{FFFF}");
 testInvalidIdentifierStartUnicodeEscapeSequence("{ffff}");
-testInvalidIdentifierStartUnicodeEscapeSequence("{10000}");
-testInvalidIdentifierStartUnicodeEscapeSequence("{10001}");
 testInvalidIdentifierStartUnicodeEscapeSequence("{10FFFE}");
 testInvalidIdentifierStartUnicodeEscapeSequence("{10fffe}");
@@ -94,4 +92,6 @@
 testInvalidIdentifierStartUnicodeEscapeSequence("{00000000000000000000000010FFFF}");
 testInvalidIdentifierStartUnicodeEscapeSequence("{00000000000000000000000010ffff}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{1D306}");
+testInvalidIdentifierStartUnicodeEscapeSequence("{1d306}");

 testInvalidIdentifierStartUnicodeEscapeSequence("x");
trunk/LayoutTests/js/unicode-escape-sequences-expected.txt
--- trunk/LayoutTests/js/unicode-escape-sequences-expected.txt (r211319)
+++ trunk/LayoutTests/js/unicode-escape-sequences-expected.txt (r258531)
@@ -36,6 +36,6 @@
 PASS codeUnits(function \u{102C0}(){}.name) is "D800,DEC0"
 PASS codeUnits(function \u{102c0}(){}.name) is "D800,DEC0"
-PASS codeUnits(function \u{1D306}(){}.name) is "D834,DF06"
-PASS codeUnits(function \u{1d306}(){}.name) is "D834,DF06"
+PASS codeUnits(function \u{10000}(){}.name) is "D800,DC00"
+PASS codeUnits(function \u{10001}(){}.name) is "D800,DC01"
 PASS codeUnits(function \u(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u'.
 PASS codeUnits(function \u{0}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{0}'.
@@ -46,6 +46,4 @@
 PASS codeUnits(function \u{FFFF}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{FFFF}'.
 PASS codeUnits(function \u{ffff}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{ffff}'.
-PASS codeUnits(function \u{10000}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{10000}'.
-PASS codeUnits(function \u{10001}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{10001}'.
 PASS codeUnits(function \u{10FFFE}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{10FFFE}'.
 PASS codeUnits(function \u{10fffe}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{10fffe}'.
@@ -54,4 +52,6 @@
 PASS codeUnits(function \u{00000000000000000000000010FFFF}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{00000000000000000000000010FFFF}'.
 PASS codeUnits(function \u{00000000000000000000000010ffff}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{00000000000000000000000010ffff}'.
+PASS codeUnits(function \u{1D306}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{1D306}'.
+PASS codeUnits(function \u{1d306}(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{1d306}'.
 PASS codeUnits(function \ux(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u'.
 PASS codeUnits(function \u{(){}.name) threw exception SyntaxError: Invalid unicode escape in identifier: '\u{'.
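The expected code-unit pairs in these tests follow from the standard UTF-16 encoding of supplementary code points. A minimal sketch of that arithmetic, assuming a hypothetical helper `codeUnitsOf` (not part of WebKit or the test harness) that formats its result in the same "LEAD,TRAIL" style as the expected output:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, for illustration only: returns the UTF-16 code units of
// a code point as upper-case hex, comma separated (e.g. "D834,DF06").
// Assumes codePoint is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
std::string codeUnitsOf(char32_t codePoint)
{
    assert(codePoint <= 0x10FFFF);
    char buffer[16];
    if (codePoint < 0x10000) {
        // BMP code points are a single code unit.
        std::snprintf(buffer, sizeof(buffer), "%04X", static_cast<unsigned>(codePoint));
        return buffer;
    }
    // Supplementary code points: subtract 0x10000, then split the remaining
    // 20 bits into a lead surrogate (high 10 bits) and a trail surrogate (low 10 bits).
    unsigned offset = static_cast<unsigned>(codePoint) - 0x10000;
    unsigned lead = 0xD800 + (offset >> 10);
    unsigned trail = 0xDC00 + (offset & 0x3FF);
    std::snprintf(buffer, sizeof(buffer), "%04X,%04X", lead, trail);
    return buffer;
}
```

With this helper, `codeUnitsOf(0x10000)` yields "D800,DC00" and `codeUnitsOf(0x102C0)` yields "D800,DEC0", matching the pairs the tests expect for identifier-start escapes.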
trunk/Source/JavaScriptCore/ChangeLog
--- trunk/Source/JavaScriptCore/ChangeLog (r258498)
+++ trunk/Source/JavaScriptCore/ChangeLog (r258531)
@@ -1,2 +1,47 @@
+2020-03-16  Keith Miller  <keith_miller@apple.com>
+
+        JavaScript identifier grammar supports unescaped astral symbols, but JSC doesn’t
+        https://bugs.webkit.org/show_bug.cgi?id=208998
+
+        Reviewed by Michael Saboff.
+
+        This patch fixes a bug in the parser that allows for surrogate pairs when parsing identifiers.
+        It also makes a few other changes to the parser:
+
+        1) When looking for keywords we just need to check that subsequent
+           character cannot be a identifier part or an escape start.
+
+        2) The only time we call parseIdentifierSlowCase is when we hit an
+           escape start or a surrogate pair so we can optimize that to just
+           copy everything up slow character into our buffer.
+
+        3) We shouldn't allow for asking if a UChar is an identifier start/part.
+
+        * KeywordLookupGenerator.py:
+        (Trie.printSubTreeAsC):
+        (Trie.printAsC):
+        * parser/Lexer.cpp:
+        (JSC::isNonLatin1IdentStart):
+        (JSC::isIdentStart):
+        (JSC::isSingleCharacterIdentStart):
+        (JSC::cannotBeIdentStart):
+        (JSC::isIdentPart):
+        (JSC::isSingleCharacterIdentPart):
+        (JSC::cannotBeIdentPartOrEscapeStart):
+        (JSC::Lexer<LChar>::currentCodePoint const):
+        (JSC::Lexer<UChar>::currentCodePoint const):
+        (JSC::Lexer<LChar>::parseIdentifier):
+        (JSC::Lexer<UChar>::parseIdentifier):
+        (JSC::Lexer<CharacterType>::parseIdentifierSlowCase):
+        (JSC::Lexer<T>::lexWithoutClearingLineTerminator):
+        (JSC::Lexer<T>::scanRegExp):
+        (JSC::isIdentPartIncludingEscapeTemplate): Deleted.
+        (JSC::isIdentPartIncludingEscape): Deleted.
+        * parser/Lexer.h:
+        (JSC::Lexer::setOffsetFromSourcePtr): Deleted.
+        * parser/Parser.cpp:
+        (JSC::Parser<LexerType>::printUnexpectedTokenText):
+        * parser/ParserTokens.h:
+
 2020-03-13  Sergio Villar Senin  <svillar@igalia.com>

trunk/Source/JavaScriptCore/KeywordLookupGenerator.py
--- trunk/Source/JavaScriptCore/KeywordLookupGenerator.py (r250005)
+++ trunk/Source/JavaScriptCore/KeywordLookupGenerator.py (r258531)
@@ -142,5 +142,5 @@

         if self.value != None:
-            print(str + "if (!isIdentPartIncludingEscape(code+%d, m_codeEnd)) {" % (len(self.fullPrefix)))
+            print(str + "if (LIKELY(cannotBeIdentPartOrEscapeStart(code[%d]))) {" % (len(self.fullPrefix)))
             print(str + "    internalShift<%d>();" % len(self.fullPrefix))
             print(str + "    if (shouldCreateIdentifier)")
@@ -185,6 +185,6 @@
         print("namespace JSC {")
         print("")
-        print("static ALWAYS_INLINE bool isIdentPartIncludingEscape(const LChar* code, const LChar* codeEnd);")
-        print("static ALWAYS_INLINE bool isIdentPartIncludingEscape(const UChar* code, const UChar* codeEnd);")
+        print("static ALWAYS_INLINE bool cannotBeIdentPartOrEscapeStart(LChar);")
+        print("static ALWAYS_INLINE bool cannotBeIdentPartOrEscapeStart(UChar);")
         # max length + 1 so we don't need to do any bounds checking at all
         print("static constexpr int maxTokenLength = %d;" % (self.maxLength() + 1))
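The generated keyword lookup now only asks whether the character following a candidate keyword definitely cannot continue an identifier. A simplified, ASCII-only sketch of that idea follows; `isAsciiIdentPart` and `matchesKeyword` are hypothetical names for illustration, and the real lexer uses its Latin-1 character table and generated trie code instead:

```cpp
#include <cctype>
#include <string_view>

// ASCII-only stand-in for the lexer's character table (an assumption of this sketch).
bool isAsciiIdentPart(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$';
}

// Conservative check: returns true only when c definitely cannot continue an
// identifier. Non-ASCII bytes are treated as "might continue", so the answer
// can be a false negative but never a false positive.
bool cannotBeIdentPartOrEscapeStart(char c)
{
    if (static_cast<unsigned char>(c) >= 0x80)
        return false;
    return !isAsciiIdentPart(c) && c != '\\';
}

// Hypothetical helper: does source start with keyword as a complete token?
bool matchesKeyword(std::string_view source, std::string_view keyword)
{
    if (source.substr(0, keyword.size()) != keyword)
        return false;
    if (source.size() == keyword.size())
        return true; // End of input terminates the keyword.
    return cannotBeIdentPartOrEscapeStart(source[keyword.size()]);
}
```

Because every keyword is ASCII, a conservative answer is enough here: when the check fails, the lexer can fall back to ordinary identifier parsing, which handles escapes and surrogate pairs.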
trunk/Source/JavaScriptCore/parser/Lexer.cpp
--- trunk/Source/JavaScriptCore/parser/Lexer.cpp (r257681)
+++ trunk/Source/JavaScriptCore/parser/Lexer.cpp (r258531)
@@ -733,17 +733,35 @@
 }

-static NEVER_INLINE bool isNonLatin1IdentStart(UChar c)
+static bool isNonLatin1IdentStart(UChar32 c)
 {
     return u_hasBinaryProperty(c, UCHAR_ID_START);
 }

-static inline bool isIdentStart(LChar c)
-{
-    return typesOfLatin1Characters[c] == CharacterIdentifierStart;
-}
-
-static inline bool isIdentStart(UChar32 c)
-{
-    return isLatin1(c) ? isIdentStart(static_cast<LChar>(c)) : isNonLatin1IdentStart(c);
+template<typename CharacterType>
+static ALWAYS_INLINE bool isIdentStart(CharacterType c)
+{
+    static_assert(std::is_same_v<CharacterType, LChar> || std::is_same_v<CharacterType, UChar32>, "Call isSingleCharacterIdentStart for UChars that don't need to check for surrogate pairs");
+    if (!isLatin1(c))
+        return isNonLatin1IdentStart(c);
+    return typesOfLatin1Characters[static_cast<LChar>(c)] == CharacterIdentifierStart;
+}
+
+static ALWAYS_INLINE bool isSingleCharacterIdentStart(UChar c)
+{
+    if (LIKELY(isLatin1(c)))
+        return isIdentStart(static_cast<LChar>(c));
+    return !U16_IS_SURROGATE(c) && isIdentStart(static_cast<UChar32>(c));
+}
+
+static ALWAYS_INLINE bool cannotBeIdentStart(LChar c)
+{
+    return !isIdentStart(c) && c != '\\';
+}
+
+static ALWAYS_INLINE bool cannotBeIdentStart(UChar c)
+{
+    if (LIKELY(isLatin1(c)))
+        return cannotBeIdentStart(static_cast<LChar>(c));
+    return Lexer<UChar>::isWhiteSpace(c) || Lexer<UChar>::isLineTerminator(c);
 }

@@ -753,65 +771,58 @@
 }

-static ALWAYS_INLINE bool isIdentPart(LChar c)
-{
+template<typename CharacterType>
+static ALWAYS_INLINE bool isIdentPart(CharacterType c)
+{
+    static_assert(std::is_same_v<CharacterType, LChar> || std::is_same_v<CharacterType, UChar32>, "Call isSingleCharacterIdentPart for UChars that don't need to check for surrogate pairs");
+    if (!isLatin1(c))
+        return isNonLatin1IdentPart(c);
+
     // Character types are divided into two groups depending on whether they can be part of an
     // identifier or not. Those whose type value is less or equal than CharacterOtherIdentifierPart can be
     // part of an identifier. (See the CharacterType definition for more details.)
-    return typesOfLatin1Characters[c] <= CharacterOtherIdentifierPart;
-}
-
-static ALWAYS_INLINE bool isIdentPart(UChar32 c)
-{
-    return isLatin1(c) ? isIdentPart(static_cast<LChar>(c)) : isNonLatin1IdentPart(c);
-}
-
-static ALWAYS_INLINE bool isIdentPart(UChar c)
-{
-    return isIdentPart(static_cast<UChar32>(c));
-}
-
-template<typename CharacterType> ALWAYS_INLINE bool isIdentPartIncludingEscapeTemplate(const CharacterType* code, const CharacterType* codeEnd)
-{
-    if (isIdentPart(code[0]))
-        return true;
-
-    // Shortest sequence handled below is \u{0}, which is 5 characters.
-    if (!(code[0] == '\\' && codeEnd - code >= 5 && code[1] == 'u'))
-        return false;
-
-    if (code[2] == '{') {
-        UChar32 codePoint = 0;
-        const CharacterType* pointer;
-        for (pointer = &code[3]; pointer < codeEnd; ++pointer) {
-            auto digit = *pointer;
-            if (!isASCIIHexDigit(digit))
-                break;
-            codePoint = (codePoint << 4) | toASCIIHexValue(digit);
-            if (codePoint > UCHAR_MAX_VALUE)
-                return false;
-        }
-        return isIdentPart(codePoint) && pointer < codeEnd && *pointer == '}';
-    }
-
-    // Shortest sequence handled below is \uXXXX, which is 6 characters.
-    if (codeEnd - code < 6)
-        return false;
-
-    auto character1 = code[2];
-    auto character2 = code[3];
-    auto character3 = code[4];
-    auto character4 = code[5];
-    return isASCIIHexDigit(character1) && isASCIIHexDigit(character2) && isASCIIHexDigit(character3) && isASCIIHexDigit(character4)
-        && isIdentPart(Lexer<LChar>::convertUnicode(character1, character2, character3, character4));
-}
-
-static ALWAYS_INLINE bool isIdentPartIncludingEscape(const LChar* code, const LChar* codeEnd)
-{
-    return isIdentPartIncludingEscapeTemplate(code, codeEnd);
-}
-
-static ALWAYS_INLINE bool isIdentPartIncludingEscape(const UChar* code, const UChar* codeEnd)
-{
-    return isIdentPartIncludingEscapeTemplate(code, codeEnd);
+    return typesOfLatin1Characters[static_cast<LChar>(c)] <= CharacterOtherIdentifierPart;
+}
+
+static ALWAYS_INLINE bool isSingleCharacterIdentPart(UChar c)
+{
+    if (LIKELY(isLatin1(c)))
+        return isIdentPart(static_cast<LChar>(c));
+    return !U16_IS_SURROGATE(c) && isIdentPart(static_cast<UChar32>(c));
+}
+
+static ALWAYS_INLINE bool cannotBeIdentPartOrEscapeStart(LChar c)
+{
+    return !isIdentPart(c) && c != '\\';
+}
+
+// NOTE: This may give give false negatives (for non-ascii) but won't give false posititves.
+// This means it can be used to detect the end of a keyword (all keywords are ascii)
+static ALWAYS_INLINE bool cannotBeIdentPartOrEscapeStart(UChar c)
+{
+    if (LIKELY(isLatin1(c)))
+        return cannotBeIdentPartOrEscapeStart(static_cast<LChar>(c));
+    return Lexer<UChar>::isWhiteSpace(c) || Lexer<UChar>::isLineTerminator(c);
+}
+
+
+template<>
+ALWAYS_INLINE UChar32 Lexer<LChar>::currentCodePoint() const
+{
+    return m_current;
+}
+
+template<>
+ALWAYS_INLINE UChar32 Lexer<UChar>::currentCodePoint() const
+{
+    ASSERT_WITH_MESSAGE(!isIdentStart(static_cast<UChar32>(U_SENTINEL)), "error values shouldn't appear as a valid identifier start code point");
+    if (!U16_IS_SURROGATE(m_current))
+        return m_current;
+
+    UChar trail = peek(1);
+    if (UNLIKELY(!U16_IS_LEAD(m_current) || !U16_IS_SURROGATE_TRAIL(trail)))
+        return U_SENTINEL;
+
+    UChar32 codePoint = U16_GET_SUPPLEMENTARY(m_current, trail);
+    return codePoint;
 }

@@ -953,13 +964,10 @@

     const LChar* identifierStart = currentSourcePtr();
-    unsigned identifierLineStart = currentLineStartOffset();
+    ASSERT(isIdentStart(m_current) || m_current == '\\');
+    while (isIdentPart(m_current))
+        shift();

-    while (isIdentPart(m_current))
-        shift();
-
-    if (UNLIKELY(m_current == '\\')) {
-        setOffsetFromSourcePtr(identifierStart, identifierLineStart);
-        return parseIdentifierSlowCase<shouldCreateIdentifier>(tokenData, lexerFlags, strictMode);
-    }
+    if (UNLIKELY(m_current == '\\'))
+        return parseIdentifierSlowCase<shouldCreateIdentifier>(tokenData, lexerFlags, strictMode, identifierStart);

     const Identifier* ident = nullptr;
@@ -1008,4 +1016,5 @@
 template <bool shouldCreateIdentifier> ALWAYS_INLINE JSTokenType Lexer<UChar>::parseIdentifier(JSTokenData* tokenData, OptionSet<LexerFlags> lexerFlags, bool strictMode)
 {
+    ASSERT(!m_parsingBuiltinFunction);
     tokenData->escaped = false;
     const ptrdiff_t remaining = m_codeEnd - m_code;
@@ -1017,68 +1026,30 @@
         }
     }
+
+    const UChar* identifierStart = currentSourcePtr();
+    UChar orAllChars = 0;
+    ASSERT(isSingleCharacterIdentStart(m_current) || U16_IS_SURROGATE(m_current) || m_current == '\\');
+    while (isSingleCharacterIdentPart(m_current)) {
+        orAllChars |= m_current;
+        shift();
+    }

-    bool isPrivateName = m_current == '@' && m_parsingBuiltinFunction;
-    bool isWellKnownSymbol = false;
-    if (isPrivateName) {
-        ASSERT(m_parsingBuiltinFunction);
-        shift();
-        if (m_current == '@') {
-            isWellKnownSymbol = true;
-            shift();
-        }
-    }
-
-
-    const UChar* identifierStart = currentSourcePtr();
-    int identifierLineStart = currentLineStartOffset();
-
-    UChar orAllChars = 0;
-
-    while (isIdentPart(m_current)) {
-        orAllChars |= m_current;
-        shift();
-    }
-
-    if (UNLIKELY(m_current == '\\')) {
-        ASSERT(!isPrivateName);
-        setOffsetFromSourcePtr(identifierStart, identifierLineStart);
-        return parseIdentifierSlowCase<shouldCreateIdentifier>(tokenData, lexerFlags, strictMode);
-    }
-
-    bool isAll8Bit = false;
-
-    if (!(orAllChars & ~0xff))
-        isAll8Bit = true;
-
+    if (UNLIKELY(U16_IS_SURROGATE(m_current) || m_current == '\\'))
+        return parseIdentifierSlowCase<shouldCreateIdentifier>(tokenData, lexerFlags, strictMode, identifierStart);
+
+    bool isAll8Bit = !(orAllChars & ~0xff);
     const Identifier* ident = nullptr;

-    if (shouldCreateIdentifier || m_parsingBuiltinFunction) {
+    if (shouldCreateIdentifier) {
         int identifierLength = currentSourcePtr() - identifierStart;
-        if (m_parsingBuiltinFunction && isPrivateName) {
-            if (isWellKnownSymbol)
-                ident = &m_arena->makeIdentifier(m_vm, m_vm.propertyNames->builtinNames().lookUpWellKnownSymbol(identifierStart, identifierLength));
-            else
-                ident = &m_arena->makeIdentifier(m_vm, m_vm.propertyNames->builtinNames().lookUpPrivateName(identifierStart, identifierLength));
-            if (!ident)
-                return INVALID_PRIVATE_NAME_ERRORTOK;
-        } else {
-            if (isAll8Bit)
-                ident = makeIdentifierLCharFromUChar(identifierStart, identifierLength);
-            else
-                ident = makeIdentifier(identifierStart, identifierLength);
-            if (m_parsingBuiltinFunction) {
-                if (!isSafeBuiltinIdentifier(m_vm, ident)) {
-                    m_lexErrorMessage = makeString("The use of '", ident->string(), "' is disallowed in builtin functions.");
-                    return ERRORTOK;
-                }
-                if (*ident == m_vm.propertyNames->undefinedKeyword)
-                    tokenData->ident = &m_vm.propertyNames->undefinedPrivateName;
-            }
-        }
+        if (isAll8Bit)
+            ident = makeIdentifierLCharFromUChar(identifierStart, identifierLength);
+        else
+            ident = makeIdentifier(identifierStart, identifierLength);
         tokenData->ident = ident;
     } else
         tokenData->ident = nullptr;

-    if (UNLIKELY((remaining < maxTokenLength) && !lexerFlags.contains(LexerFlags::IgnoreReservedWords)) && !isPrivateName) {
+    if (UNLIKELY((remaining < maxTokenLength) && !lexerFlags.contains(LexerFlags::IgnoreReservedWords))) {
         ASSERT(shouldCreateIdentifier);
         if (remaining < maxTokenLength) {
@@ -1096,47 +1067,72 @@
 }

-template<typename CharacterType> template<bool shouldCreateIdentifier> JSTokenType Lexer<CharacterType>::parseIdentifierSlowCase(JSTokenData* tokenData, OptionSet<LexerFlags> lexerFlags, bool strictMode)
-{
-    tokenData->escaped = true;
-    auto identifierStart = currentSourcePtr();
-    bool bufferRequired = false;
-
-    while (true) {
-        if (LIKELY(isIdentPart(m_current))) {
-            shift();
-            continue;
-        }
-        if (LIKELY(m_current != '\\'))
-            break;
-
-        // \uXXXX unicode characters.
-        bufferRequired = true;
+template<typename CharacterType>
+template<bool shouldCreateIdentifier>
+JSTokenType Lexer<CharacterType>::parseIdentifierSlowCase(JSTokenData* tokenData, OptionSet<LexerFlags> lexerFlags, bool strictMode, const CharacterType* identifierStart)
+{
+    ASSERT(U16_IS_SURROGATE(m_current) || m_current == '\\');
+    ASSERT(m_buffer16.isEmpty());
+    ASSERT(!tokenData->escaped);
+
+    auto fillBuffer = [&] (bool isStart = false) {
+        // \uXXXX unicode characters or Surrogate pairs.
         if (identifierStart != currentSourcePtr())
             m_buffer16.append(identifierStart, currentSourcePtr() - identifierStart);
-        shift();
-        if (UNLIKELY(m_current != 'u'))
-            return atEnd() ? UNTERMINATED_IDENTIFIER_ESCAPE_ERRORTOK : INVALID_IDENTIFIER_ESCAPE_ERRORTOK;
-        shift();
-        auto character = parseUnicodeEscape();
-        if (UNLIKELY(!character.isValid()))
-            return character.isIncomplete() ? UNTERMINATED_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK : INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK;
-        if (UNLIKELY(m_buffer16.size() ? !isIdentPart(character.value()) : !isIdentStart(character.value())))
-            return INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK;
-        if (shouldCreateIdentifier)
-            recordUnicodeCodePoint(character.value());
+
+        if (m_current == '\\') {
+            tokenData->escaped = true;
+            shift();
+            if (UNLIKELY(m_current != 'u'))
+                return atEnd() ? UNTERMINATED_IDENTIFIER_ESCAPE_ERRORTOK : INVALID_IDENTIFIER_ESCAPE_ERRORTOK;
+            shift();
+            auto character = parseUnicodeEscape();
+            if (UNLIKELY(!character.isValid()))
+                return character.isIncomplete() ? UNTERMINATED_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK : INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK;
+            if (UNLIKELY(isStart ? !isIdentStart(character.value()) : !isIdentPart(character.value())))
+                return INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK;
+            if (shouldCreateIdentifier)
+                recordUnicodeCodePoint(character.value());
+            identifierStart = currentSourcePtr();
+            return IDENT;
+        }
+
+        ASSERT(U16_IS_SURROGATE(m_current));
+        if (UNLIKELY(!U16_IS_SURROGATE_LEAD(m_current)))
+            return INVALID_UNICODE_ENCODING_ERRORTOK;
+
+        UChar32 codePoint = currentCodePoint();
+        if (UNLIKELY(codePoint == U_SENTINEL))
+            return INVALID_UNICODE_ENCODING_ERRORTOK;
+        if (UNLIKELY(isStart ? !isNonLatin1IdentStart(codePoint) : !isNonLatin1IdentPart(codePoint)))
+            return INVALID_IDENTIFIER_UNICODE_ERRORTOK;
+        append16(m_code, 2);
+        shift();
+        shift();
         identifierStart = currentSourcePtr();
-    }
-
-    int identifierLength;
+        return IDENT;
+    };
+
+    JSTokenType type = fillBuffer(identifierStart == currentSourcePtr());
+    if (UNLIKELY(type & ErrorTokenFlag))
+        return type;
+
+    while (true) {
+        if (LIKELY(isSingleCharacterIdentPart(m_current))) {
+            shift();
+            continue;
+        }
+        if (!U16_IS_SURROGATE(m_current) && m_current != '\\')
+            break;
+
+        type = fillBuffer();
+        if (UNLIKELY(type & ErrorTokenFlag))
+            return type;
+    }
+
     const Identifier* ident = nullptr;
     if (shouldCreateIdentifier) {
-        if (!bufferRequired) {
-            identifierLength = currentSourcePtr() - identifierStart;
-            ident = makeIdentifier(identifierStart, identifierLength);
-        } else {
-            if (identifierStart != currentSourcePtr())
-                m_buffer16.append(identifierStart, currentSourcePtr() - identifierStart);
-            ident = makeIdentifier(m_buffer16.data(), m_buffer16.size());
-        }
+        if (identifierStart != currentSourcePtr())
+            m_buffer16.append(identifierStart, currentSourcePtr() - identifierStart);
+        ident = makeIdentifier(m_buffer16.data(), m_buffer16.size());

         tokenData->ident = ident;
@@ -1153,5 +1149,5 @@
         JSTokenType token = static_cast<JSTokenType>(entry->lexerValue());
         if ((token != RESERVED_IF_STRICT) || strictMode)
-            return bufferRequired ? UNEXPECTED_ESCAPE_ERRORTOK : token;
+            return UNEXPECTED_ESCAPE_ERRORTOK;
     }

@@ -1913,10 +1909,14 @@
     if (LIKELY(isLatin1(m_current)))
         type = static_cast<CharacterType>(typesOfLatin1Characters[m_current]);
-    else if (isNonLatin1IdentStart(m_current))
-        type = CharacterIdentifierStart;
-    else if (isLineTerminator(m_current))
-        type = CharacterLineTerminator;
-    else
-        type = CharacterInvalid;
+    else {
+        UChar32 codePoint;
+        U16_GET(m_code, 0, 0, m_codeEnd - m_code, codePoint);
+        if (isNonLatin1IdentStart(codePoint))
+            type = CharacterIdentifierStart;
+        else if (isLineTerminator(m_current))
+            type = CharacterLineTerminator;
+        else
+            type = CharacterInvalid;
+    }

     switch (type) {
@@ -2232,5 +2232,10 @@
             token = tokenTypeForIntegerLikeToken(tokenData->doubleValue);

-        if (UNLIKELY(isIdentStart(m_current))) {
+        if (LIKELY(cannotBeIdentStart(m_current))) {
+            m_buffer8.shrink(0);
+            break;
+        }
+
+        if (UNLIKELY(isIdentStart(currentCodePoint()))) {
             m_lexErrorMessage = "No identifiers allowed directly after numeric literal"_s;
             token = atEnd() ? UNTERMINATED_NUMERIC_LITERAL_ERRORTOK : INVALID_NUMERIC_LITERAL_ERRORTOK;
@@ -2263,5 +2268,12 @@
         }

-        if (UNLIKELY(isIdentStart(m_current))) {
+        if (LIKELY(cannotBeIdentStart(m_current))) {
+            if (LIKELY(token != BIGINT))
+                token = tokenTypeForIntegerLikeToken(tokenData->doubleValue);
+            m_buffer8.shrink(0);
+            break;
+        }
+
+        if (UNLIKELY(isIdentStart(currentCodePoint()))) {
             m_lexErrorMessage = "No space between hexadecimal literal and identifier"_s;
             token = UNTERMINATED_HEX_NUMBER_ERRORTOK;
@@ -2295,5 +2307,12 @@
         }

-        if (UNLIKELY(isIdentStart(m_current))) {
+        if (LIKELY(cannotBeIdentStart(m_current))) {
+            if (LIKELY(token != BIGINT))
+                token = tokenTypeForIntegerLikeToken(tokenData->doubleValue);
+            m_buffer8.shrink(0);
+            break;
+        }
+
+        if (UNLIKELY(isIdentStart(currentCodePoint()))) {
             m_lexErrorMessage = "No space between binary literal and identifier"_s;
             token = UNTERMINATED_BINARY_NUMBER_ERRORTOK;
@@ -2328,5 +2347,12 @@
         }

-        if (UNLIKELY(isIdentStart(m_current))) {
+        if (LIKELY(cannotBeIdentStart(m_current))) {
+            if (LIKELY(token != BIGINT))
+                token = tokenTypeForIntegerLikeToken(tokenData->doubleValue);
+            m_buffer8.shrink(0);
+            break;
+        }
+
+        if (UNLIKELY(isIdentStart(currentCodePoint()))) {
             m_lexErrorMessage = "No space between octal literal and identifier"_s;
             token = UNTERMINATED_OCTAL_NUMBER_ERRORTOK;
@@ -2395,5 +2421,10 @@
         }

-        if (UNLIKELY(isIdentStart(m_current))) {
+        if (LIKELY(cannotBeIdentStart(m_current))) {
+            m_buffer8.shrink(0);
+            break;
+        }
+
+        if (UNLIKELY(isIdentStart(currentCodePoint()))) {
             m_lexErrorMessage = "No identifiers allowed directly after numeric literal"_s;
             token = atEnd() ? UNTERMINATED_NUMERIC_LITERAL_ERRORTOK : INVALID_NUMERIC_LITERAL_ERRORTOK;
@@ -2417,7 +2448,12 @@
         break;
     }
-    case CharacterIdentifierStart:
-        ASSERT(isIdentStart(m_current));
+    case CharacterIdentifierStart: {
+        if constexpr (ASSERT_ENABLED) {
+            UChar32 codePoint;
+            U16_GET(m_code, 0, 0, m_codeEnd - m_code, codePoint);
+            ASSERT(isIdentStart(codePoint));
+        }
         FALLTHROUGH;
+    }
     case CharacterBackSlash:
     parseIdent:
@@ -2579,16 +2615,27 @@

     tokenData->pattern = makeRightSizedIdentifier(m_buffer16.data(), m_buffer16.size(), charactersOredTogether);
-
     m_buffer16.shrink(0);
-    charactersOredTogether = 0;
-
-    while (isIdentPart(m_current)) {
-        record16(m_current);
-        orCharacter<T>(charactersOredTogether, m_current);
-        shift();
-    }
-
-    tokenData->flags = makeRightSizedIdentifier(m_buffer16.data(), m_buffer16.size(), charactersOredTogether);
-    m_buffer16.shrink(0);
+
+    ASSERT(m_buffer8.isEmpty());
+    while (LIKELY(isLatin1(m_current)) && isIdentPart(static_cast<LChar>(m_current))) {
+        record8(static_cast<LChar>(m_current));
+        shift();
+    }
+
+    // Normally this would not be a lex error but dealing with surrogate pairs here is annoying and it's going to be an error anyway...
+    if (UNLIKELY(!isLatin1(m_current))) {
+        m_buffer8.shrink(0);
+        JSTokenType token = INVALID_IDENTIFIER_UNICODE_ERRORTOK;
+        fillTokenInfo(tokenRecord, token, m_lineNumber, currentOffset(), currentLineStartOffset(), currentPosition());
+        m_error = true;
+        String codePoint = String::fromCodePoint(currentCodePoint());
+        if (!codePoint)
+            codePoint = "`invalid unicode character`";
+        m_lexErrorMessage = makeString("Invalid non-latin character in RexExp literal's flags '", getToken(*tokenRecord), codePoint, "'");
+        return token;
+    }
+
+    tokenData->flags = makeIdentifier(m_buffer8.data(), m_buffer8.size());
+    m_buffer8.shrink(0);

     // Since RegExp always ends with /, m_atLineStart always becomes false.
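The new `Lexer<UChar>::currentCodePoint()` combines a lead and trail surrogate into one code point and reports an error value for lone or misordered surrogates. A standalone sketch of that logic without ICU follows; `codePointAt`, `isLeadSurrogate`, `isTrailSurrogate`, and `invalidCodePoint` are hypothetical stand-ins for the lexer member function, the `U16_*` macros, and `U_SENTINEL`:

```cpp
#include <cstddef>

// Stand-in for ICU's U_SENTINEL; the value is an assumption of this sketch.
constexpr char32_t invalidCodePoint = 0xFFFFFFFF;

constexpr bool isLeadSurrogate(char16_t c) { return (c & 0xFC00) == 0xD800; }
constexpr bool isTrailSurrogate(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// Returns the code point that starts at units[index], or invalidCodePoint when
// the unit is a lone or misordered surrogate. Requires index < length.
char32_t codePointAt(const char16_t* units, std::size_t length, std::size_t index)
{
    char16_t current = units[index];
    if (!isLeadSurrogate(current) && !isTrailSurrogate(current))
        return current; // Ordinary BMP code unit.

    // A surrogate is only valid as a lead that is followed by a trail.
    if (!isLeadSurrogate(current) || index + 1 >= length || !isTrailSurrogate(units[index + 1]))
        return invalidCodePoint;

    char32_t high = static_cast<char32_t>(current) - 0xD800;
    char32_t low = static_cast<char32_t>(units[index + 1]) - 0xDC00;
    return 0x10000 + (high << 10) + low;
}
```

In the patch, the identifier slow path maps the error value to `INVALID_UNICODE_ENCODING_ERRORTOK`, and a well-formed pair that is not a valid identifier character to `INVALID_IDENTIFIER_UNICODE_ERRORTOK`.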
trunk/Source/JavaScriptCore/parser/Lexer.h
--- trunk/Source/JavaScriptCore/parser/Lexer.h (r255440)
+++ trunk/Source/JavaScriptCore/parser/Lexer.h (r258531)
@@ -136,4 +136,5 @@
     void append16(const UChar* characters, size_t length) { m_buffer16.append(characters, length); }

+    UChar32 currentCodePoint() const;
     ALWAYS_INLINE void shift();
     ALWAYS_INLINE bool atEnd() const;
@@ -148,5 +149,4 @@
     String invalidCharacterMessage() const;
     ALWAYS_INLINE const T* currentSourcePtr() const;
-    ALWAYS_INLINE void setOffsetFromSourcePtr(const T* sourcePtr, unsigned lineStartOffset) { setOffset(offsetFromSourcePtr(sourcePtr), lineStartOffset); }

     ALWAYS_INLINE void setCodeStart(const StringView&);
@@ -167,5 +167,5 @@
     template <bool shouldCreateIdentifier> ALWAYS_INLINE JSTokenType parseKeyword(JSTokenData*);
     template <bool shouldBuildIdentifiers> ALWAYS_INLINE JSTokenType parseIdentifier(JSTokenData*, OptionSet<LexerFlags>, bool strictMode);
-    template <bool shouldBuildIdentifiers> NEVER_INLINE JSTokenType parseIdentifierSlowCase(JSTokenData*, OptionSet<LexerFlags>, bool strictMode);
+    template <bool shouldBuildIdentifiers> NEVER_INLINE JSTokenType parseIdentifierSlowCase(JSTokenData*, OptionSet<LexerFlags>, bool strictMode, const T* identifierStart);
     enum StringParseResult {
         StringParsedSuccessfully,
trunk/Source/JavaScriptCore/parser/Parser.cpp
--- trunk/Source/JavaScriptCore/parser/Parser.cpp (r258279)
+++ trunk/Source/JavaScriptCore/parser/Parser.cpp (r258531)
@@ -5221,4 +5221,10 @@
         out.print("Invalid string literal: '", getToken(), "'");
         return;
+    case INVALID_UNICODE_ENCODING_ERRORTOK:
+        out.print("Invalid unicode encoding: '", getToken(), "'");
+        return;
+    case INVALID_IDENTIFIER_UNICODE_ERRORTOK:
+        out.print("Invalid unicode code point in identifier: '", getToken(), "'");
+        return;
     case ERRORTOK:
         out.print("Unrecognized token '", getToken(), "'");
trunk/Source/JavaScriptCore/parser/ParserTokens.h
--- trunk/Source/JavaScriptCore/parser/ParserTokens.h (r255440)
+++ trunk/Source/JavaScriptCore/parser/ParserTokens.h (r258531)
@@ -34,5 +34,5 @@

 enum {
-    // Token Bitfield: 0b000000000RTE000IIIIPPPPKUXXXXXXX
+    // Token Bitfield: 0b000000000RTE00IIIIPPPPKUXXXXXXXX
     // R = right-associative bit
     // T = unterminated error flag
@@ -44,10 +44,10 @@
     //
     // We must keep the upper 8bit (1byte) region empty. JSTokenType must be 24bits.
-    UnaryOpTokenFlag = 128,
-    KeywordTokenFlag = 256,
-    BinaryOpTokenPrecedenceShift = 9,
+    UnaryOpTokenFlag = 1 << 8,
+    KeywordTokenFlag = 1 << 9,
+    BinaryOpTokenPrecedenceShift = 10,
     BinaryOpTokenAllowsInPrecedenceAdditionalShift = 4,
     BinaryOpTokenPrecedenceMask = 15 << BinaryOpTokenPrecedenceShift,
-    ErrorTokenFlag = 1 << (BinaryOpTokenAllowsInPrecedenceAdditionalShift + BinaryOpTokenPrecedenceShift + 7),
+    ErrorTokenFlag = 1 << (BinaryOpTokenAllowsInPrecedenceAdditionalShift + BinaryOpTokenPrecedenceShift + 6),
     UnterminatedErrorTokenFlag = ErrorTokenFlag << 1,
     RightAssociativeBinaryOpTokenFlag = UnterminatedErrorTokenFlag << 1
@@ -193,4 +193,6 @@
     INVALID_TEMPLATE_LITERAL_ERRORTOK = 15 | ErrorTokenFlag,
     UNEXPECTED_ESCAPE_ERRORTOK = 16 | ErrorTokenFlag,
+    INVALID_UNICODE_ENCODING_ERRORTOK = 17 | ErrorTokenFlag,
+    INVALID_IDENTIFIER_UNICODE_ERRORTOK = 18 | ErrorTokenFlag,
 };
 static_assert(static_cast<unsigned>(POW) <= 0x00ffffffU, "JSTokenType must be 24bits.");
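The bitfield change shifts the unary, keyword, and precedence fields up by one bit while leaving the error flags where they were, since 4 + 9 + 7 and 4 + 10 + 6 both equal 20. A compile-time sketch that checks this arithmetic follows; the flag names are copied from the diff, while the `sketch` namespace, the `isErrorToken` helper, and the `InvalidUnicodeEncodingErrorToken` constant are illustrative assumptions:

```cpp
#include <cstdint>

namespace sketch {

enum : uint32_t {
    UnaryOpTokenFlag = 1 << 8,
    KeywordTokenFlag = 1 << 9,
    BinaryOpTokenPrecedenceShift = 10,
    BinaryOpTokenAllowsInPrecedenceAdditionalShift = 4,
    BinaryOpTokenPrecedenceMask = 15 << BinaryOpTokenPrecedenceShift,
    ErrorTokenFlag = 1 << (BinaryOpTokenAllowsInPrecedenceAdditionalShift + BinaryOpTokenPrecedenceShift + 6),
    UnterminatedErrorTokenFlag = ErrorTokenFlag << 1,
    RightAssociativeBinaryOpTokenFlag = UnterminatedErrorTokenFlag << 1
};

// Mirrors the shape of the new error tokens (value | ErrorTokenFlag).
constexpr uint32_t InvalidUnicodeEncodingErrorToken = 17 | ErrorTokenFlag;

// Hypothetical helper: the same test the lexer performs with `type & ErrorTokenFlag`.
constexpr bool isErrorToken(uint32_t token) { return (token & ErrorTokenFlag) != 0; }

// The error flag stays at bit 20 under both the old and the new shift values.
static_assert(ErrorTokenFlag == (1u << 20), "new layout: 4 + 10 + 6 == 20");
static_assert((1u << (4 + 9 + 7)) == (1u << 20), "old layout: 4 + 9 + 7 == 20");
// The highest flag still fits in the low 24 bits.
static_assert(RightAssociativeBinaryOpTokenFlag <= 0x00ffffffu, "token must fit in 24 bits");
static_assert(isErrorToken(InvalidUnicodeEncodingErrorToken), "error tokens carry the error flag");

} // namespace sketch
```

Keeping the error flag at a fixed bit means checks such as `type & ErrorTokenFlag` in `parseIdentifierSlowCase` test the same bit position before and after the layout change.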
trunk/Source/WTF/ChangeLog
--- trunk/Source/WTF/ChangeLog (r258478)
+++ trunk/Source/WTF/ChangeLog (r258531)
@@ -1,2 +1,13 @@
+2020-03-16  Keith Miller  <keith_miller@apple.com>
+
+        JavaScript identifier grammar supports unescaped astral symbols, but JSC doesn’t
+        https://bugs.webkit.org/show_bug.cgi?id=208998
+
+        Reviewed by Michael Saboff.
+
+        * wtf/text/WTFString.cpp:
+        (WTF::String::fromCodePoint):
+        * wtf/text/WTFString.h:
+
 2020-03-15  Yusuke Suzuki  <ysuzuki@apple.com>

trunk/Source/WTF/wtf/text/WTFString.cpp
--- trunk/Source/WTF/wtf/text/WTFString.cpp (r254046)
+++ trunk/Source/WTF/wtf/text/WTFString.cpp (r258531)
@@ -889,4 +889,13 @@
 }

+String String::fromCodePoint(UChar32 codePoint)
+{
+    UChar buffer[2];
+    uint8_t length = 0;
+    UBool error = false;
+    U16_APPEND(buffer, length, 2, codePoint, error);
+    return error ? String() : String(buffer, length);
+}
+
 // String Operations
 template<typename CharacterType>
trunk/Source/WTF/wtf/text/WTFString.h
--- trunk/Source/WTF/wtf/text/WTFString.h (r250005)
+++ trunk/Source/WTF/wtf/text/WTFString.h (r258531)
@@ -356,4 +356,6 @@
     WTF_EXPORT_PRIVATE static String fromUTF8WithLatin1Fallback(const LChar*, size_t);
     static String fromUTF8WithLatin1Fallback(const char* characters, size_t length) { return fromUTF8WithLatin1Fallback(reinterpret_cast<const LChar*>(characters), length); };
+
+    WTF_EXPORT_PRIVATE static String fromCodePoint(UChar32 codePoint);

     // Determines the writing direction using the Unicode Bidi Algorithm rules P2 and P3.