Changeset 178265 in webkit
- Timestamp: Jan 12, 2015 8:22:50 AM
- Location: trunk/Source
- Files: 30 edited
trunk/Source/WTF/ChangeLog
r178173 r178265
+2015-01-12 Darin Adler <darin@apple.com>
+
+        Modernize and streamline HTMLTokenizer
+        https://bugs.webkit.org/show_bug.cgi?id=140166
+
+        Reviewed by Sam Weinig.
+
+        * wtf/Forward.h: Removed PassRef, added OrdinalNumber and TextPosition.
+
 2015-01-09 Commit Queue <commit-queue@webkit.org>
trunk/Source/WTF/wtf/Forward.h
r178173 r178265
 template<typename T> class OwnPtr;
 template<typename T> class PassOwnPtr;
-template<typename T> class PassRef;
 template<typename T> class PassRefPtr;
 template<typename T> class RefPtr;
…
 class Encoder;
 class FunctionDispatcher;
+class OrdinalNumber;
 class PrintStream;
 class String;
…
 class StringImpl;
 class StringView;
+class TextPosition;

 }
…
 using WTF::LazyNeverDestroyed;
 using WTF::NeverDestroyed;
+using WTF::OrdinalNumber;
 using WTF::OwnPtr;
 using WTF::PassOwnPtr;
-using WTF::PassRef;
 using WTF::PassRefPtr;
 using WTF::PrintStream;
…
 using WTF::StringImpl;
 using WTF::StringView;
+using WTF::TextPosition;
 using WTF::Vector;
trunk/Source/WebCore/ChangeLog
r178253 r178265
+2015-01-12 Darin Adler <darin@apple.com>
+
+        Modernize and streamline HTMLTokenizer
+        https://bugs.webkit.org/show_bug.cgi?id=140166
+
+        Reviewed by Sam Weinig.
+
+        * html/parser/AtomicHTMLToken.h:
+        (WebCore::AtomicHTMLToken::initializeAttributes): Removed unneeded assertions
+        based on fields I removed.
+
+        * html/parser/HTMLDocumentParser.cpp:
+        (WebCore::HTMLDocumentParser::HTMLDocumentParser): Change to use updateStateFor
+        to set the initial state when parsing a fragment, since it implements the same
+        rule that the tokenizerStateForContextElement function did.
+        (WebCore::HTMLDocumentParser::pumpTokenizer): Updated to use the revised
+        interfaces for HTMLSourceTracker and HTMLTokenizer.
+        (WebCore::HTMLDocumentParser::constructTreeFromHTMLToken): Changed to take a
+        TokenPtr instead of an HTMLToken, so we can clear out the TokenPtr earlier
+        for non-character tokens, and let them get cleared later for character tokens.
+        (WebCore::HTMLDocumentParser::insert): Pass references.
+        (WebCore::HTMLDocumentParser::append): Ditto.
+        (WebCore::HTMLDocumentParser::appendCurrentInputStreamToPreloadScannerAndScan): Ditto.
+
+        * html/parser/HTMLDocumentParser.h: Updated argument type for constructTreeFromHTMLToken
+        and removed now-unneeded m_token data members.
+
+        * html/parser/HTMLEntityParser.cpp: Removed unneeded uses of the inline keyword.
+        (WebCore::HTMLEntityParser::consumeNamedEntity): Replaced two uses of
+        advanceAndASSERT with just plain advance; there's really no need to assert the
+        character is the one we just got out of the string.
+
+        * html/parser/HTMLInputStream.h: Moved the include of TextPosition.h here from
+        its old location since this class has two data members that are OrdinalNumber.
+
+        * html/parser/HTMLMetaCharsetParser.cpp:
+        (WebCore::HTMLMetaCharsetParser::HTMLMetaCharsetParser): Removed most of the
+        initialization, since it's now done by defaults.
+        (WebCore::extractCharset): Rewrote this to be a non-member function, and to
+        use a for loop, and to handle quote marks in a simpler way. Also changed it
+        to return a StringView so we don't have to allocate a new string.
+        (WebCore::HTMLMetaCharsetParser::processMeta): Use a modern for loop, and
+        also take a token argument since it's no longer a data member.
+        (WebCore::HTMLMetaCharsetParser::encodingFromMetaAttributes): Use a modern for
+        loop, StringView instead of string, and don't bother naming the local enum.
+        (WebCore::HTMLMetaCharsetParser::checkForMetaCharset): Updated for the new
+        way of getting tokens from the tokenizer.
+
+        * html/parser/HTMLMetaCharsetParser.h: Got rid of some data members and
+        tightened up the formatting a little. Don't bother allocating the tokenizer
+        on the heap.
+
+        * html/parser/HTMLPreloadScanner.cpp:
+        (WebCore::TokenPreloadScanner::TokenPreloadScanner): Removed unneeded
+        initialization.
+        (WebCore::HTMLPreloadScanner::HTMLPreloadScanner): Ditto.
+        (WebCore::HTMLPreloadScanner::scan): Changed to take a reference.
+
+        * html/parser/HTMLPreloadScanner.h: Removed unneeded includes, typedefs,
+        and forward declarations. Removed explicit declaration of the destructor,
+        since the default one works. Removed unused createCheckpoint and rewindTo
+        functions. Gave initial values for various data members. Marked the device
+        scale factor const because it's set in the constructor and never changed.
+        Also removed the unneeded isSafeToSendToAnotherThread.
+
+        * html/parser/HTMLResourcePreloader.cpp:
+        (WebCore::PreloadRequest::isSafeToSendToAnotherThread): Deleted.
+
+        * html/parser/HTMLResourcePreloader.h:
+        (WebCore::PreloadRequest::PreloadRequest): Removed unneeded calls to
+        isolatedCopy. Also removed isSafeToSendToAnotherThread.
+
+        * html/parser/HTMLSourceTracker.cpp:
+        (WebCore::HTMLSourceTracker::startToken): Renamed. Changed to keep state
+        in the source tracker itself, not the token.
+        (WebCore::HTMLSourceTracker::endToken): Ditto.
+        (WebCore::HTMLSourceTracker::source): Renamed. Changed to use the state
+        from the source tracker.
+
+        * html/parser/HTMLSourceTracker.h: Removed unneeded include of HTMLToken.h.
+        Renamed functions, removed now-unneeded comment.
+
+        * html/parser/HTMLToken.h: Cut down on the fields used by the source tracker.
+        It only needs to know the start and end of each attribute, not each part of
+        each attribute. Removed setBaseOffset, setEndOffset, length, addNewAttribute,
+        beginAttributeName, endAttributeName, beginAttributeValue, endAttributeValue,
+        m_baseOffset and m_length. Added beginAttribute and endAttribute.
+        (WebCore::HTMLToken::clear): No need to zero m_length or m_baseOffset any more.
+        (WebCore::HTMLToken::length): Deleted.
+        (WebCore::HTMLToken::setBaseOffset): Deleted.
+        (WebCore::HTMLToken::setEndOffset): Deleted.
+        (WebCore::HTMLToken::beginStartTag): Only null out m_currentAttribute if we
+        are compiling in assertions.
+        (WebCore::HTMLToken::beginEndTag): Ditto.
+        (WebCore::HTMLToken::addNewAttribute): Deleted.
+        (WebCore::HTMLToken::beginAttribute): Moved the code from addNewAttribute in
+        here and set the start offset.
+        (WebCore::HTMLToken::beginAttributeName): Deleted.
+        (WebCore::HTMLToken::endAttributeName): Deleted.
+        (WebCore::HTMLToken::beginAttributeValue): Deleted.
+        (WebCore::HTMLToken::endAttributeValue): Deleted.
+
+        * html/parser/HTMLTokenizer.cpp:
+        (WebCore::HTMLToken::endAttribute): Added. Sets the end offset.
+        (WebCore::HTMLToken::appendToAttributeName): Updated assertion.
+        (WebCore::HTMLToken::appendToAttributeValue): Ditto.
+        (WebCore::convertASCIIAlphaToLower): Renamed from toLowerCase and changed
+        so it's legal to call on lower case letters too.
+        (WebCore::vectorEqualsString): Changed to take a string literal rather than
+        a WTF::String.
+        (WebCore::HTMLTokenizer::inEndTagBufferingState): Made this a member function.
+        (WebCore::HTMLTokenizer::HTMLTokenizer): Updated for data member changes.
+        (WebCore::HTMLTokenizer::bufferASCIICharacter): Added. Optimized version of
+        bufferCharacter for the common case where we know the character is ASCII.
+        (WebCore::HTMLTokenizer::bufferCharacter): Moved this function here from the
+        header since it's only used inside the class.
+        (WebCore::HTMLTokenizer::emitAndResumeInDataState): Moved this here, renamed
+        it and removed the state argument.
+        (WebCore::HTMLTokenizer::emitAndReconsumeInDataState): Ditto.
+        (WebCore::HTMLTokenizer::emitEndOfFile): More of the same.
+        (WebCore::HTMLTokenizer::saveEndTagNameIfNeeded): Ditto.
+        (WebCore::HTMLTokenizer::haveBufferedCharacterToken): Ditto.
+        (WebCore::HTMLTokenizer::flushBufferedEndTag): Updated since m_token is now
+        the actual token, not just a pointer.
+        (WebCore::HTMLTokenizer::flushEmitAndResumeInDataState): Renamed this and
+        removed the state argument.
+        (WebCore::HTMLTokenizer::processToken): This function, formerly nextToken,
+        is now the internal function used by nextToken. Updated its contents to use
+        simpler macros, changed code to set m_state when returning, rather than
+        constantly setting it when cycling through states, switched style to use
+        early return/goto rather than lots of else statements, took out unneeded
+        braces now that BEGIN/END_STATE handles the braces, collapsed upper and
+        lower case letter handling in many states, changed lookAhead call sites to
+        use the new advancePast function instead.
+        (WebCore::HTMLTokenizer::updateStateFor): Set m_state directly instead of
+        calling a setstate function.
+        (WebCore::HTMLTokenizer::appendToTemporaryBuffer): Moved here from header.
+        (WebCore::HTMLTokenizer::temporaryBufferIs): Changed argument type to
+        a literal instead of a WTF::String.
+        (WebCore::HTMLTokenizer::appendToPossibleEndTag): Renamed and changed type
+        to be a UChar instead of LChar, although all characters will be ASCII.
+        (WebCore::HTMLTokenizer::isAppropriateEndTag): Marked const, and changed
+        type from size_t to unsigned.
+
+        * html/parser/HTMLTokenizer.h: Changed interface of nextToken so it returns
+        a TokenPtr so code doesn't have to understand special rules about when to
+        work with an HTMLToken and when to clear it. Made most functions private,
+        and made the State enum private as well. Replaced the state and setState
+        functions with more specific functions for the few states we need to deal
+        with outside the class. Moved function bodies outside the class definition
+        so it's easier to read the class definition.
+
+        * html/parser/HTMLTreeBuilder.cpp:
+        (WebCore::HTMLTreeBuilder::processStartTagForInBody): Updated to use the
+        new set state functions instead of setState.
+        (WebCore::HTMLTreeBuilder::processEndTag): Ditto.
+        (WebCore::HTMLTreeBuilder::processGenericRCDATAStartTag): Ditto.
+        (WebCore::HTMLTreeBuilder::processGenericRawTextStartTag): Ditto.
+        (WebCore::HTMLTreeBuilder::processScriptStartTag): Ditto.
+
+        * html/parser/InputStreamPreprocessor.h: Marked the constructor explicit,
+        and made it take a reference rather than a pointer.
+
+        * html/parser/TextDocumentParser.cpp:
+        (WebCore::TextDocumentParser::insertFakePreElement): Updated to use the
+        new set state functions instead of setState.
+
+        * html/parser/XSSAuditor.cpp:
+        (WebCore::XSSAuditor::decodedSnippetForName): Updated for name change.
+        (WebCore::XSSAuditor::decodedSnippetForAttribute): Updated for changes to
+        attribute range tracking.
+        (WebCore::XSSAuditor::decodedSnippetForJavaScript): Updated for name change.
+        (WebCore::XSSAuditor::isSafeToSendToAnotherThread): Deleted.
+
+        * html/parser/XSSAuditor.h: Deleted isSafeToSendToAnotherThread.
+
+        * html/track/WebVTTTokenizer.cpp: Removed the local state variable from
+        WEBVTT_ADVANCE_TO; there is no need for it.
+        (WebCore::WebVTTTokenizer::WebVTTTokenizer): Use a reference instead of a
+        pointer for the preprocessor.
+        (WebCore::WebVTTTokenizer::nextToken): Ditto. Also removed the state local
+        variable and the switch statement, replacing with labels instead since we
+        go between states with goto.
+
+        * platform/text/SegmentedString.cpp:
+        (WebCore::SegmentedString::operator=): Changed the return type to be non-const
+        to match normal C++ design rules.
+        (WebCore::SegmentedString::pushBack): Renamed from prepend since this is not a
+        general purpose prepend function. Also fixed assertions to not use the strangely
+        named "escaped" function, since we are deleting it.
+        (WebCore::SegmentedString::append): Ditto.
+        (WebCore::SegmentedString::advancePastNonNewlines): Renamed from advance, since
+        the function only works for non-newlines.
+        (WebCore::SegmentedString::currentColumn): Got rid of unneeded local variable.
+        (WebCore::SegmentedString::advancePastSlowCase): Moved here from header and
+        renamed. This function now consumes the characters if they match.
+
+        * platform/text/SegmentedString.h: Made the changes mentioned above.
+        (WebCore::SegmentedString::excludeLineNumbers): Deleted.
+        (WebCore::SegmentedString::advancePast): Renamed from lookAhead. Also changed
+        behavior so the characters are consumed.
+        (WebCore::SegmentedString::advancePastIgnoringCase): Ditto.
+        (WebCore::SegmentedString::advanceAndASSERT): Deleted.
+        (WebCore::SegmentedString::advanceAndASSERTIgnoringCase): Deleted.
+        (WebCore::SegmentedString::escaped): Deleted.
+
+        * xml/parser/CharacterReferenceParserInlines.h:
+        (WebCore::isHexDigit): Deleted.
+        (WebCore::unconsumeCharacters): Updated for name change.
+        (WebCore::consumeCharacterReference): Removed unneeded name for local enum,
+        renamed local variable "cc" to character. Changed code to use helpers like
+        isASCIIAlpha and toASCIIHexValue. Removed unneeded use of advanceAndASSERT,
+        since we don't really need to assert the character we just extracted.
+
+        * xml/parser/MarkupTokenizerInlines.h:
+        (WebCore::isTokenizerWhitespace): Renamed argument to character.
+        (WebCore::advanceStringAndASSERTIgnoringCase): Deleted.
+        (WebCore::advanceStringAndASSERT): Deleted.
+        Changed all the macro implementations so they set m_state only when
+        returning from the function and just use goto inside the state machine.
+
 2015-01-11 Andreas Kling <akling@apple.com>
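The SegmentedString entries in this ChangeLog say that advancePast, unlike the old lookAhead, consumes the characters when they match. The following is a minimal sketch of that contract, not WebCore's implementation: the InputCursor class, its Match result names, and the use of std::string are all hypothetical stand-ins chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a tokenizer input stream; this is NOT
// WebCore::SegmentedString, only an illustration of "consume on match".
class InputCursor {
public:
    // Hypothetical result names for the three possible outcomes.
    enum class Match { DidNotMatch, DidMatch, NotEnoughCharacters };

    explicit InputCursor(std::string text) : m_text(std::move(text)) { }

    // If the unread input starts with `literal`, consume it and report DidMatch.
    // In every other case the read position is left where it was.
    Match advancePast(const std::string& literal)
    {
        const std::size_t available = m_text.size() - m_position;
        const std::size_t comparable = std::min(available, literal.size());
        if (m_text.compare(m_position, comparable, literal, 0, comparable) != 0)
            return Match::DidNotMatch;
        if (comparable < literal.size())
            return Match::NotEnoughCharacters; // Prefix matches so far; need more input.
        m_position += literal.size();
        return Match::DidMatch;
    }

    std::string remaining() const { return m_text.substr(m_position); }

private:
    std::string m_text;
    std::size_t m_position = 0;
};
```

With a look-ahead-only function the caller has to advance separately after a successful match. Folding the two steps together removes that second call and the assertions that used to guard it.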
trunk/Source/WebCore/html/parser/AtomicHTMLToken.h
r178173 r178265
         continue;

-    ASSERT(attribute.nameRange.start);
-    ASSERT(attribute.nameRange.end);
-    ASSERT(attribute.valueRange.start);
-    ASSERT(attribute.valueRange.end);
-
     QualifiedName name(nullAtom, AtomicString(attribute.name), nullAtom);
trunk/Source/WebCore/html/parser/HTMLDocumentParser.cpp
r178173 r178265
 using namespace HTMLNames;

-// This is a direct transcription of step 4 from:
-// https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
-static HTMLTokenizer::State tokenizerStateForContextElement(Element& contextElement, bool reportErrors, const HTMLParserOptions& options)
-{
-    const QualifiedName& contextTag = contextElement.tagQName();
-
-    if (contextTag.matches(titleTag) || contextTag.matches(textareaTag))
-        return HTMLTokenizer::RCDATAState;
-    if (contextTag.matches(styleTag)
-        || contextTag.matches(xmpTag)
-        || contextTag.matches(iframeTag)
-        || (contextTag.matches(noembedTag) && options.pluginsEnabled)
-        || (contextTag.matches(noscriptTag) && options.scriptEnabled)
-        || contextTag.matches(noframesTag))
-        return reportErrors ? HTMLTokenizer::RAWTEXTState : HTMLTokenizer::PLAINTEXTState;
-    if (contextTag.matches(scriptTag))
-        return reportErrors ? HTMLTokenizer::ScriptDataState : HTMLTokenizer::PLAINTEXTState;
-    if (contextTag.matches(plaintextTag))
-        return HTMLTokenizer::PLAINTEXTState;
-    return HTMLTokenizer::DataState;
-}
-
 HTMLDocumentParser::HTMLDocumentParser(HTMLDocument& document)
     : ScriptableDocumentParser(document)
…
     , m_xssAuditorDelegate(fragment.document())
 {
-    bool reportErrors = false; // For now document fragment parsing never reports errors.
-    m_tokenizer.setState(tokenizerStateForContextElement(contextElement, reportErrors, m_options));
+    // https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
+    if (contextElement.isHTMLElement())
+        m_tokenizer.updateStateFor(contextElement.tagQName().localName());
     m_xssAuditor.initForFragment();
 }
…
     while (canTakeNextToken(mode, session) && !session.needsYield) {
         if (!isParsingFragment())
-            m_sourceTracker.start(m_input.current(), &m_tokenizer, m_token);
-
-        if (!m_tokenizer.nextToken(m_input.current(), m_token))
+            m_sourceTracker.startToken(m_input.current(), m_tokenizer);
+
+        auto token = m_tokenizer.nextToken(m_input.current());
+        if (!token)
             break;

         if (!isParsingFragment()) {
-            m_sourceTracker.end(m_input.current(), &m_tokenizer, m_token);
+            m_sourceTracker.endToken(m_input.current(), m_tokenizer);

             // We do not XSS filter innerHTML, which means we (intentionally) fail
             // http/tests/security/xssAuditor/dom-write-innerHTML.html
-            if (auto xssInfo = m_xssAuditor.filterToken(FilterTokenRequest(m_token, m_sourceTracker, m_tokenizer.shouldAllowCDATA())))
+            if (auto xssInfo = m_xssAuditor.filterToken(FilterTokenRequest(*token, m_sourceTracker, m_tokenizer.shouldAllowCDATA())))
                 m_xssAuditorDelegate.didBlockScript(*xssInfo);
         }

-        constructTreeFromHTMLToken(m_token);
-        ASSERT(m_token.type() == HTMLToken::Uninitialized);
+        constructTreeFromHTMLToken(token);
     }

…

     if (isWaitingForScripts()) {
-        ASSERT(m_tokenizer.state() == HTMLTokenizer::DataState);
+        ASSERT(m_tokenizer.isInDataState());
         if (!m_preloadScanner) {
             m_preloadScanner = std::make_unique<HTMLPreloadScanner>(m_options, document()->url(), document()->deviceScaleFactor());
             m_preloadScanner->appendToEnd(m_input.current());
         }
-        m_preloadScanner->scan(m_preloader.get(), *document());
+        m_preloadScanner->scan(*m_preloader, *document());
     }

…
 }

-void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLToken& rawToken)
-{
-    AtomicHTMLToken token(rawToken);
+void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLTokenizer::TokenPtr& rawToken)
+{
+    AtomicHTMLToken token(*rawToken);

     // We clear the rawToken in case constructTreeFromAtomicToken
…
     // the main thread or once we stop allowing synchronous JavaScript
     // execution from parseAttribute.
-    if (rawToken.type() != HTMLToken::Character)
+    if (rawToken->type() != HTMLToken::Character) {
+        // Clearing the TokenPtr makes sure we don't clear the HTMLToken a second time
+        // later when the TokenPtr is destroyed.
         rawToken.clear();
+    }

     m_treeBuilder->constructTree(token);
-
-    if (rawToken.type() != HTMLToken::Uninitialized) {
-        ASSERT(rawToken.type() == HTMLToken::Character);
-        rawToken.clear();
-    }
 }

…
             m_insertionPreloadScanner = std::make_unique<HTMLPreloadScanner>(m_options, document()->url(), document()->deviceScaleFactor());
         m_insertionPreloadScanner->appendToEnd(source);
-        m_insertionPreloadScanner->scan(m_preloader.get(), *document());
+        m_insertionPreloadScanner->scan(*m_preloader, *document());
     }

…
             m_preloadScanner->appendToEnd(source);
             if (isWaitingForScripts())
-                m_preloadScanner->scan(m_preloader.get(), *document());
+                m_preloadScanner->scan(*m_preloader, *document());
         }
     }
…
     ASSERT(m_preloadScanner);
     m_preloadScanner->appendToEnd(m_input.current());
-    m_preloadScanner->scan(m_preloader.get(), *document());
+    m_preloadScanner->scan(*m_preloader, *document());
 }
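The constructTreeFromHTMLToken change above relies on a handle that clears the token when it is destroyed, unless the caller has already cleared it. The sketch below shows one way such a handle could behave. It is an assumption-laden illustration, not the real HTMLTokenizer::TokenPtr: the Token struct, its clearCount field, and every member shown here are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature token; NOT WebCore's HTMLToken.
struct Token {
    std::string data;
    int clearCount = 0; // Counts clear() calls so the behavior can be checked.
    void clear() { data.clear(); ++clearCount; }
};

// Hypothetical handle in the spirit of a TokenPtr: it refers to a token owned
// elsewhere and clears that token when the handle goes away.
class TokenPtr {
public:
    TokenPtr() = default;
    explicit TokenPtr(Token* token) : m_token(token) { }
    TokenPtr(TokenPtr&& other) : m_token(other.m_token) { other.m_token = nullptr; }
    TokenPtr(const TokenPtr&) = delete;
    TokenPtr& operator=(const TokenPtr&) = delete;
    ~TokenPtr() { clear(); }

    explicit operator bool() const { return m_token != nullptr; }
    Token& operator*() const { assert(m_token); return *m_token; }
    Token* operator->() const { assert(m_token); return m_token; }

    // Clears the token now and detaches from it, so the destructor has
    // nothing left to do and the token is never cleared twice.
    void clear()
    {
        if (!m_token)
            return;
        m_token->clear();
        m_token = nullptr;
    }

private:
    Token* m_token = nullptr;
};
```

Under these assumptions a caller can clear early for non-character tokens and simply let the handle fall out of scope for character tokens, which mirrors the shape of the new code in the hunk above.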
trunk/Source/WebCore/html/parser/HTMLDocumentParser.h
r178173 r178265
     void pumpTokenizer(SynchronousMode);
     void pumpTokenizerIfPossible(SynchronousMode);
-    void constructTreeFromHTMLToken(HTMLToken&);
+    void constructTreeFromHTMLToken(HTMLTokenizer::TokenPtr&);

     void runScriptsForPausedTreeBuilder();
…
     HTMLInputStream m_input;

-    HTMLToken m_token;
     HTMLTokenizer m_tokenizer;
     std::unique_ptr<HTMLScriptRunner> m_scriptRunner;
trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp
r178173 r178265
     }

-    inline static bool acceptMalformed() { return true; }
+    static bool acceptMalformed() { return true; }

-    inline static bool consumeNamedEntity(SegmentedString& source, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter, UChar& cc)
+    static bool consumeNamedEntity(SegmentedString& source, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter, UChar& cc)
     {
         StringBuilder consumedCharacters;
…
                 break;
             consumedCharacters.append(cc);
-            source.advanceAndASSERT(cc);
+            source.advance();
         }
         notEnoughCharacters = source.isEmpty();
…
                 ASSERT_UNUSED(reference, cc == *reference++);
                 consumedCharacters.append(cc);
-                source.advanceAndASSERT(cc);
+                source.advance();
                 ASSERT(!source.isEmpty());
             }
trunk/Source/WebCore/html/parser/HTMLInputStream.h
r178173 r178265
 #include "InputStreamPreprocessor.h"
 #include "SegmentedString.h"
+#include <wtf/text/TextPosition.h>

 namespace WebCore {
trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp
r178173 r178265
 /*
  * Copyright (C) 2010 Google Inc. All Rights Reserved.
+ * Copyright (C) 2015 Apple Inc. All Rights Reserved.
  *
  * Redistribution and use in source and binary forms, with or without
…
 #include "HTMLNames.h"
 #include "HTMLParserIdioms.h"
-#include "HTMLTokenizer.h"
-#include "TextCodec.h"
 #include "TextEncodingRegistry.h"
-
-using namespace WTF;

 namespace WebCore {
…

 HTMLMetaCharsetParser::HTMLMetaCharsetParser()
-    : m_tokenizer(std::make_unique<HTMLTokenizer>(HTMLParserOptions()))
-    , m_assumedCodec(newTextCodec(Latin1Encoding()))
-    , m_inHeadSection(true)
-    , m_doneChecking(false)
+    : m_codec(newTextCodec(Latin1Encoding()))
 {
 }

-HTMLMetaCharsetParser::~HTMLMetaCharsetParser()
+static StringView extractCharset(const String& value)
 {
-}
-
-static const char charsetString[] = "charset";
-static const size_t charsetLength = sizeof("charset") - 1;
-
-String HTMLMetaCharsetParser::extractCharset(const String& value)
-{
-    size_t pos = 0;
     unsigned length = value.length();
-
-    while (pos < length) {
-        pos = value.find(charsetString, pos, false);
+    for (size_t pos = 0; pos < length; ) {
+        pos = value.find("charset", pos, false);
         if (pos == notFound)
             break;

+        static const size_t charsetLength = sizeof("charset") - 1;
         pos += charsetLength;

…
             ++pos;

-        char quoteMark = 0;
-        if (pos < length && (value[pos] == '"' || value[pos] == '\'')) {
-            quoteMark = static_cast<char>(value[pos++]);
-            ASSERT(!(quoteMark & 0x80));
-        }
-
+        UChar quoteMark = 0;
+        if (pos < length && (value[pos] == '"' || value[pos] == '\''))
+            quoteMark = value[pos++];
+
         if (pos == length)
             break;
…
             break; // Close quote not found.

-        return value.substring(pos, end - pos);
+        return StringView(value).substring(pos, end - pos);
     }
-
-    return "";
+    return StringView();
 }

-bool HTMLMetaCharsetParser::processMeta()
+bool HTMLMetaCharsetParser::processMeta(HTMLToken& token)
 {
-    const HTMLToken::AttributeList& tokenAttributes = m_token.attributes();
     AttributeList attributes;
-    for (HTMLToken::AttributeList::const_iterator iter = tokenAttributes.begin(); iter != tokenAttributes.end(); ++iter) {
-        String attributeName = StringImpl::create8BitIfPossible(iter->name);
-        String attributeValue = StringImpl::create8BitIfPossible(iter->value);
+    for (auto& attribute : token.attributes()) {
+        String attributeName = StringImpl::create8BitIfPossible(attribute.name);
+        String attributeValue = StringImpl::create8BitIfPossible(attribute.value);
         attributes.append(std::make_pair(attributeName, attributeValue));
     }
…
 {
     bool gotPragma = false;
-    Mode mode = None;
-    String charset;
+    enum { None, Charset, Pragma } mode = None;
+    StringView charset;

-    for (AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter) {
-        const AtomicString& attributeName = iter->first;
-        const String& attributeValue = iter->second;
+    for (auto& attribute : attributes) {
+        const String& attributeName = attribute.first;
+        const String& attributeValue = attribute.second;

         if (attributeName == http_equivAttr) {
…

     if (mode == Charset || (mode == Pragma && gotPragma))
-        return TextEncoding(stripLeadingAndTrailingHTMLSpaces(charset));
+        return TextEncoding(stripLeadingAndTrailingHTMLSpaces(charset.toStringWithoutCopying()));

     return TextEncoding();
 }
-
-static const int bytesToCheckUnconditionally = 1024; // That many input bytes will be checked for meta charset even if <head> section is over.

 bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
…
     // The following tags are allowed in <head>:
     // SCRIPT|STYLE|META|LINK|OBJECT|TITLE|BASE
     //
     // We stop scanning when a tag that is not permitted in <head>
     // is seen, rather when </head> is seen, because that more closely
     // matches behavior in other browsers; more details in
     // <http://bugs.webkit.org/show_bug.cgi?id=3590>.
     //
     // Additionally, we ignore things that looks like tags in <title>, <script>
     // and <noscript>; see <http://bugs.webkit.org/show_bug.cgi?id=4560>,
     // <http://bugs.webkit.org/show_bug.cgi?id=12165> and
     // <http://bugs.webkit.org/show_bug.cgi?id=12389>.
     //
     // Since many sites have charset declarations after <body> or other tags
     // that are disallowed in <head>, we don't bail out until we've checked at
     // least bytesToCheckUnconditionally bytes of input.

-    m_input.append(SegmentedString(m_assumedCodec->decode(data, length)));
+    static const int bytesToCheckUnconditionally = 1024;

-    while (m_tokenizer->nextToken(m_input, m_token)) {
-        bool end = m_token.type() == HTMLToken::EndTag;
-        if (end || m_token.type() == HTMLToken::StartTag) {
-            AtomicString tagName(m_token.name());
-            if (!end) {
-                m_tokenizer->updateStateFor(tagName);
-                if (tagName == metaTag && processMeta()) {
+    m_input.append(SegmentedString(m_codec->decode(data, length)));
+
+    while (auto token = m_tokenizer.nextToken(m_input)) {
+        bool isEnd = token->type() == HTMLToken::EndTag;
+        if (isEnd || token->type() == HTMLToken::StartTag) {
+            AtomicString tagName(token->name());
+            if (!isEnd) {
+                m_tokenizer.updateStateFor(tagName);
+                if (tagName == metaTag && processMeta(*token)) {
                     m_doneChecking = true;
                     return true;
…
                 && tagName != metaTag && tagName != objectTag
                 && tagName != titleTag && tagName != baseTag
-                && (end || tagName != htmlTag) && (end || tagName != headTag)) {
+                && (isEnd || tagName != htmlTag)
+                && (isEnd || tagName != headTag)) {
                 m_inHeadSection = false;
             }
…
             return true;
         }
-
-        m_token.clear();
     }
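The extractCharset rewrite above returns a view into the attribute value instead of allocating a new string. A rough standard-library analogue is sketched below, using std::string_view in place of WTF::StringView. This is an assumption-heavy illustration: the diff elides the whitespace-skipping and end-finding steps, so those parts (marked in comments) are guesses, and findIgnoringASCIICase and isSpaceOrControl are hypothetical helpers rather than WTF functions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: case-insensitive ASCII search for a lower-case needle.
static std::size_t findIgnoringASCIICase(std::string_view haystack, std::string_view needle, std::size_t from)
{
    for (std::size_t i = from; i + needle.size() <= haystack.size(); ++i) {
        std::size_t j = 0;
        while (j < needle.size() && std::tolower(static_cast<unsigned char>(haystack[i + j])) == needle[j])
            ++j;
        if (j == needle.size())
            return i;
    }
    return std::string_view::npos;
}

// Hypothetical helper: treats space and control characters as skippable.
static bool isSpaceOrControl(char c) { return static_cast<unsigned char>(c) <= ' '; }

// Returns a view into `value`; the caller must keep the underlying buffer alive.
std::string_view extractCharset(std::string_view value)
{
    constexpr std::string_view keyword = "charset";
    const std::size_t length = value.size();
    for (std::size_t pos = 0; pos < length; ) {
        pos = findIgnoringASCIICase(value, keyword, pos);
        if (pos == std::string_view::npos)
            break;
        pos += keyword.size();

        // Assumed step (elided in the diff): skip whitespace and require '='.
        while (pos < length && isSpaceOrControl(value[pos]))
            ++pos;
        if (pos == length || value[pos] != '=')
            continue;
        ++pos;
        while (pos < length && isSpaceOrControl(value[pos]))
            ++pos;

        char quoteMark = 0;
        if (pos < length && (value[pos] == '"' || value[pos] == '\''))
            quoteMark = value[pos++];
        if (pos == length)
            break;

        // Assumed step (elided in the diff): scan to the closing quote, or to
        // the first space, quote, or semicolon when the value is unquoted.
        std::size_t end = pos;
        while (end < length) {
            const char c = value[end];
            const bool atEnd = quoteMark ? c == quoteMark : (isSpaceOrControl(c) || c == '"' || c == '\'' || c == ';');
            if (atEnd)
                break;
            ++end;
        }
        if (quoteMark && end == length)
            break; // Close quote not found.

        return value.substr(pos, end - pos);
    }
    return {};
}
```

The trade-off with any view-returning function is lifetime: the result is only valid while the original value is alive, which is presumably why the caller in the diff converts it with toStringWithoutCopying only at the point of use.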
trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.h
r178173 r178265
 #define HTMLMetaCharsetParser_h

-#include "HTMLToken.h"
+#include "HTMLTokenizer.h"
 #include "SegmentedString.h"
 #include "TextEncoding.h"
-#include <wtf/Noncopyable.h>

 namespace WebCore {

-class HTMLTokenizer;
 class TextCodec;

…
 public:
     HTMLMetaCharsetParser();
-    ~HTMLMetaCharsetParser();

     // Returns true if done checking, regardless whether an encoding is found.
…
     const TextEncoding& encoding() { return m_encoding; }

+    // The returned encoding might not be valid.
     typedef Vector<std::pair<String, String>> AttributeList;
-    // The returned encoding might not be valid.
-    static TextEncoding encodingFromMetaAttributes(const AttributeList&
-        );
+    static TextEncoding encodingFromMetaAttributes(const AttributeList&);

 private:
-    bool processMeta();
-    static String extractCharset(const String&);
+    bool processMeta(HTMLToken&);

-    enum Mode {
-        None,
-        Charset,
-        Pragma,
-    };
-
-    std::unique_ptr<HTMLTokenizer> m_tokenizer;
-    std::unique_ptr<TextCodec> m_assumedCodec;
+    HTMLTokenizer m_tokenizer;
+    const std::unique_ptr<TextCodec> m_codec;
     SegmentedString m_input;
-    HTMLToken m_token;
-    bool m_inHeadSection;
-
-    bool m_doneChecking;
+    bool m_inHeadSection { true };
+    bool m_doneChecking { false };
     TextEncoding m_encoding;
 };
trunk/Source/WebCore/html/parser/HTMLPreloadScanner.cpp
r178173 r178265
 TokenPreloadScanner::TokenPreloadScanner(const URL& documentURL, float deviceScaleFactor)
     : m_documentURL(documentURL)
-    , m_inStyle(false)
     , m_deviceScaleFactor(deviceScaleFactor)
-#if ENABLE(TEMPLATE_ELEMENT)
-    , m_templateCount(0)
-#endif
-{
-}
-
-TokenPreloadScanner::~TokenPreloadScanner()
-{
-}
-
-TokenPreloadScannerCheckpoint TokenPreloadScanner::createCheckpoint()
-{
-    TokenPreloadScannerCheckpoint checkpoint = m_checkpoints.size();
-    m_checkpoints.append(Checkpoint(m_predictedBaseElementURL, m_inStyle
-#if ENABLE(TEMPLATE_ELEMENT)
-        , m_templateCount
-#endif
-        ));
-    return checkpoint;
-}
-
-void TokenPreloadScanner::rewindTo(TokenPreloadScannerCheckpoint checkpointIndex)
-{
-    ASSERT(checkpointIndex < m_checkpoints.size()); // If this ASSERT fires, checkpointIndex is invalid.
-    const Checkpoint& checkpoint = m_checkpoints[checkpointIndex];
-    m_predictedBaseElementURL = checkpoint.predictedBaseElementURL;
-    m_inStyle = checkpoint.inStyle;
-#if ENABLE(TEMPLATE_ELEMENT)
-    m_templateCount = checkpoint.templateCount;
-#endif
-    m_cssScanner.reset();
-    m_checkpoints.clear();
+{
 }

…
 HTMLPreloadScanner::HTMLPreloadScanner(const HTMLParserOptions& options, const URL& documentURL, float deviceScaleFactor)
     : m_scanner(documentURL, deviceScaleFactor)
-    , m_tokenizer(std::make_unique<HTMLTokenizer>(options))
-{
-}
-
-HTMLPreloadScanner::~HTMLPreloadScanner()
+    , m_tokenizer(options)
 {
 }
…
 }

-void HTMLPreloadScanner::scan(HTMLResourcePreloader* preloader, Document& document)
+void HTMLPreloadScanner::scan(HTMLResourcePreloader& preloader, Document& document)
 {
     ASSERT(isMainThread()); // HTMLTokenizer::updateStateFor only works on the main thread.
…
     PreloadRequestStream requests;

-    while (m_tokenizer->nextToken(m_source, m_token)) {
-        if (m_token.type() == HTMLToken::StartTag)
-            m_tokenizer->updateStateFor(AtomicString(m_token.name()));
-        m_scanner.scan(m_token, requests, document);
-        m_token.clear();
-    }
-
-    preloader->preload(WTF::move(requests));
-}
-
-}
+    while (auto token = m_tokenizer.nextToken(m_source)) {
+        if (token->type() == HTMLToken::StartTag)
+            m_tokenizer.updateStateFor(AtomicString(token->name()));
+        m_scanner.scan(*token, requests, document);
+    }
+
+    preloader.preload(WTF::move(requests));
+}
+
+}
trunk/Source/WebCore/html/parser/HTMLPreloadScanner.h
r178173 r178265

 #include "CSSPreloadScanner.h"
-#include "HTMLToken.h"
+#include "HTMLTokenizer.h"
 #include "SegmentedString.h"
-#include <wtf/Vector.h>

 namespace WebCore {

-typedef size_t TokenPreloadScannerCheckpoint;
-
-class HTMLParserOptions;
-class HTMLTokenizer;
-class SegmentedString;
-class Frame;
-
 class TokenPreloadScanner {
-    WTF_MAKE_NONCOPYABLE(TokenPreloadScanner); WTF_MAKE_FAST_ALLOCATED;
+    WTF_MAKE_NONCOPYABLE(TokenPreloadScanner);
 public:
     explicit TokenPreloadScanner(const URL& documentURL, float deviceScaleFactor = 1.0);
-    ~TokenPreloadScanner();

-    void scan(const HTMLToken&, PreloadRequestStream& requests, Document&);
+    void scan(const HTMLToken&, PreloadRequestStream&, Document&);

     void setPredictedBaseElementURL(const URL& url) { m_predictedBaseElementURL = url; }
-
-    // A TokenPreloadScannerCheckpoint is valid until the next call to rewindTo,
-    // at which point all outstanding checkpoints are invalidated.
-    TokenPreloadScannerCheckpoint createCheckpoint();
-    void rewindTo(TokenPreloadScannerCheckpoint);
-
-    bool isSafeToSendToAnotherThread()
-    {
-        return m_documentURL.isSafeToSendToAnotherThread()
-            && m_predictedBaseElementURL.isSafeToSendToAnotherThread();
-    }

 private:
…
     void updatePredictedBaseURL(const HTMLToken&);

-    struct Checkpoint {
-        Checkpoint(const URL& predictedBaseElementURL, bool inStyle
-#if ENABLE(TEMPLATE_ELEMENT)
-            , size_t templateCount
-#endif
-            )
-            : predictedBaseElementURL(predictedBaseElementURL)
-            , inStyle(inStyle)
-#if ENABLE(TEMPLATE_ELEMENT)
-            , templateCount(templateCount)
-#endif
-        {
-        }
-
-        URL predictedBaseElementURL;
-        bool inStyle;
-#if ENABLE(TEMPLATE_ELEMENT)
-        size_t templateCount;
-#endif
-    };
-
     CSSPreloadScanner m_cssScanner;
     const URL m_documentURL;
+    const float m_deviceScaleFactor { 1 };
+
     URL m_predictedBaseElementURL;
-    bool m_inStyle;
-    float m_deviceScaleFactor;
-
+    bool m_inStyle { false };
 #if ENABLE(TEMPLATE_ELEMENT)
-    size_t m_templateCount;
+    unsigned m_templateCount { 0 };
 #endif
-
-    Vector<Checkpoint> m_checkpoints;
 };

 class HTMLPreloadScanner {
-    WTF_MAKE_NONCOPYABLE(HTMLPreloadScanner); WTF_MAKE_FAST_ALLOCATED;
+    WTF_MAKE_FAST_ALLOCATED;
 public:
     HTMLPreloadScanner(const HTMLParserOptions&, const URL& documentURL, float deviceScaleFactor = 1.0);
-    ~HTMLPreloadScanner();

     void appendToEnd(const SegmentedString&);
-    void scan(HTMLResourcePreloader*, Document&);
+    void scan(HTMLResourcePreloader&, Document&);

 private:
     TokenPreloadScanner m_scanner;
     SegmentedString m_source;
-    HTMLToken m_token;
-    std::unique_ptr<HTMLTokenizer> m_tokenizer;
+    HTMLTokenizer m_tokenizer;
 };
trunk/Source/WebCore/html/parser/HTMLResourcePreloader.cpp
r178173 r178265
@@ -35,13 +35,4 @@
 
 namespace WebCore {
-
-bool PreloadRequest::isSafeToSendToAnotherThread() const
-{
-    return m_initiator.isSafeToSendToAnotherThread()
-        && m_charset.isSafeToSendToAnotherThread()
-        && m_resourceURL.isSafeToSendToAnotherThread()
-        && m_mediaAttribute.isSafeToSendToAnotherThread()
-        && m_baseURL.isSafeToSendToAnotherThread();
-}
 
 URL PreloadRequest::completeURL(Document& document)

-
trunk/Source/WebCore/html/parser/HTMLResourcePreloader.h
r178173 r178265
@@ -36,13 +36,11 @@
     PreloadRequest(const String& initiator, const String& resourceURL, const URL& baseURL, CachedResource::Type resourceType, const String& mediaAttribute)
         : m_initiator(initiator)
-        , m_resourceURL(resourceURL.isolatedCopy())
+        , m_resourceURL(resourceURL)
         , m_baseURL(baseURL.copy())
         , m_resourceType(resourceType)
-        , m_mediaAttribute(mediaAttribute.isolatedCopy())
+        , m_mediaAttribute(mediaAttribute)
         , m_crossOriginModeAllowsCookies(false)
     {
     }
-
-    bool isSafeToSendToAnotherThread() const;
 
     CachedResourceRequest resourceRequest(Document&);

-
trunk/Source/WebCore/html/parser/HTMLSourceTracker.cpp
r178173 r178265
@@ -1,4 +1,5 @@
 /*
  * Copyright (C) 2010 Adam Barth. All Rights Reserved.
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -26,4 +27,5 @@
 #include "config.h"
 #include "HTMLSourceTracker.h"
+
 #include "HTMLTokenizer.h"
 #include <wtf/text/StringBuilder.h>
@@ -35,27 +37,32 @@
 }
 
-void HTMLSourceTracker::start(SegmentedString& currentInput, HTMLTokenizer* tokenizer, HTMLToken& token)
+void HTMLSourceTracker::startToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
 {
-    if (token.type() == HTMLToken::Uninitialized) {
-        m_previousSource.clear();
-        if (tokenizer->numberOfBufferedCharacters())
-            m_previousSource = tokenizer->bufferedCharacters();
+    if (!m_started) {
+        if (tokenizer.numberOfBufferedCharacters())
+            m_previousSource = tokenizer.bufferedCharacters();
+        else
+            m_previousSource.clear();
+        m_started = true;
     } else
         m_previousSource.append(m_currentSource);
 
     m_currentSource = currentInput;
-    token.setBaseOffset(m_currentSource.numberOfCharactersConsumed() - m_previousSource.length());
+    m_tokenStart = m_currentSource.numberOfCharactersConsumed() - m_previousSource.length();
 }
 
-void HTMLSourceTracker::end(SegmentedString& currentInput, HTMLTokenizer* tokenizer, HTMLToken& token)
+void HTMLSourceTracker::endToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
 {
+    ASSERT(m_started);
+    m_started = false;
+
+    m_tokenEnd = currentInput.numberOfCharactersConsumed() - tokenizer.numberOfBufferedCharacters();
     m_cachedSourceForToken = String();
-
-    // FIXME: This work should really be done by the HTMLTokenizer.
-    token.setEndOffset(currentInput.numberOfCharactersConsumed() - tokenizer->numberOfBufferedCharacters());
 }
 
-String HTMLSourceTracker::sourceForToken(const HTMLToken& token)
+String HTMLSourceTracker::source(const HTMLToken& token)
 {
+    ASSERT(!m_started);
+
     if (token.type() == HTMLToken::EndOfFile)
         return String(); // Hides the null character we use to mark the end of file.
@@ -64,5 +71,5 @@
         return m_cachedSourceForToken;
 
-    unsigned length = token.length();
+    unsigned length = m_tokenEnd - m_tokenStart;
 
     StringBuilder source;
@@ -84,3 +91,8 @@
 }
 
+String HTMLSourceTracker::source(const HTMLToken& token, unsigned attributeStart, unsigned attributeEnd)
+{
+    return source(token).substring(attributeStart - m_tokenStart, attributeEnd - attributeStart);
 }
+
+}

-
trunk/Source/WebCore/html/parser/HTMLSourceTracker.h
r178173 r178265
@@ -1,4 +1,5 @@
 /*
  * Copyright (C) 2010 Adam Barth. All Rights Reserved.
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -27,9 +28,9 @@
 #define HTMLSourceTracker_h
 
-#include "HTMLToken.h"
 #include "SegmentedString.h"
 
 namespace WebCore {
 
+class HTMLToken;
 class HTMLTokenizer;
 
@@ -39,13 +40,16 @@
     HTMLSourceTracker();
 
-    // FIXME: Once we move "end" into HTMLTokenizer, rename "start" to
-    // something that makes it obvious that this method can be called multiple
-    // times.
-    void start(SegmentedString&, HTMLTokenizer*, HTMLToken&);
-    void end(SegmentedString&, HTMLTokenizer*, HTMLToken&);
+    void startToken(SegmentedString&, HTMLTokenizer&);
+    void endToken(SegmentedString&, HTMLTokenizer&);
 
-    String sourceForToken(const HTMLToken&);
+    String source(const HTMLToken&);
+    String source(const HTMLToken&, unsigned attributeStart, unsigned attributeEnd);
 
 private:
+    bool m_started { false };
+
+    unsigned m_tokenStart;
+    unsigned m_tokenEnd;
+
     SegmentedString m_previousSource;
     SegmentedString m_currentSource;

-
trunk/Source/WebCore/html/parser/HTMLToken.h
r178173 r178265
@@ -54,13 +54,10 @@
 
     struct Attribute {
-        struct Range {
-            unsigned start;
-            unsigned end;
-        };
-
-        Range nameRange;
-        Range valueRange;
         Vector<UChar, 32> name;
         Vector<UChar, 32> value;
+
+        // Used by HTMLSourceTracker.
+        unsigned startOffset;
+        unsigned endOffset;
     };
 
@@ -74,9 +71,4 @@
     Type type() const;
 
-    // Used by HTMLSourceTracker.
-    void setBaseOffset(unsigned); // Base for attribute offsets, and the end of token offset.
-    void setEndOffset(unsigned);
-    unsigned length() const;
-
     // EndOfFile
 
@@ -114,13 +106,8 @@
     void beginEndTag(const Vector<LChar, 32>&);
 
-    void addNewAttribute();
-
-    void beginAttributeName(unsigned offset);
+    void beginAttribute(unsigned offset);
     void appendToAttributeName(UChar);
-    void endAttributeName(unsigned offset);
-
-    void beginAttributeValue(unsigned offset);
     void appendToAttributeValue(UChar);
-    void endAttributeValue(unsigned offset);
+    void endAttribute(unsigned offset);
 
     void setSelfClosing();
@@ -155,7 +142,4 @@
     Type m_type;
 
-    unsigned m_baseOffset;
-    unsigned m_length;
-
     DataVector m_data;
     UChar m_data8BitCheck;
@@ -173,6 +157,7 @@
 
 inline HTMLToken::HTMLToken()
-{
-    clear();
+    : m_type(Uninitialized)
+    , m_data8BitCheck(0)
+{
 }
 
@@ -182,7 +167,4 @@
     m_data.clear();
     m_data8BitCheck = 0;
-
-    m_length = 0;
-    m_baseOffset = 0;
 }
 
@@ -196,19 +178,4 @@
     ASSERT(m_type == Uninitialized);
     m_type = EndOfFile;
-}
-
-inline unsigned HTMLToken::length() const
-{
-    return m_length;
-}
-
-inline void HTMLToken::setBaseOffset(unsigned offset)
-{
-    m_baseOffset = offset;
-}
-
-inline void HTMLToken::setEndOffset(unsigned endOffset)
-{
-    m_length = endOffset - m_baseOffset;
 }
 
@@ -301,6 +268,9 @@
     m_type = StartTag;
     m_selfClosing = false;
+    m_attributes.clear();
+
+#if !ASSERT_DISABLED
     m_currentAttribute = nullptr;
-    m_attributes.clear();
+#endif
 
     m_data.append(character);
@@ -313,6 +283,9 @@
     m_type = EndTag;
     m_selfClosing = false;
+    m_attributes.clear();
+
+#if !ASSERT_DISABLED
     m_currentAttribute = nullptr;
-    m_attributes.clear();
+#endif
 
     m_data.append(character);
@@ -324,62 +297,39 @@
     m_type = EndTag;
     m_selfClosing = false;
+    m_attributes.clear();
+
+#if !ASSERT_DISABLED
     m_currentAttribute = nullptr;
-    m_attributes.clear();
+#endif
 
     m_data.appendVector(characters);
 }
 
-inline void HTMLToken::addNewAttribute()
-{
-    ASSERT(m_type == StartTag || m_type == EndTag);
+inline void HTMLToken::beginAttribute(unsigned offset)
+{
+    ASSERT(m_type == StartTag || m_type == EndTag);
+    ASSERT(offset);
+
     m_attributes.grow(m_attributes.size() + 1);
     m_currentAttribute = &m_attributes.last();
 
+    m_currentAttribute->startOffset = offset;
+}
+
+inline void HTMLToken::endAttribute(unsigned offset)
+{
+    ASSERT(offset);
+    ASSERT(m_currentAttribute);
+    m_currentAttribute->endOffset = offset;
 #if !ASSERT_DISABLED
-    m_currentAttribute->nameRange.start = 0;
-    m_currentAttribute->nameRange.end = 0;
-    m_currentAttribute->valueRange.start = 0;
-    m_currentAttribute->valueRange.end = 0;
+    m_currentAttribute = nullptr;
 #endif
 }
 
-inline void HTMLToken::beginAttributeName(unsigned offset)
-{
-    ASSERT(offset);
-    ASSERT(!m_currentAttribute->nameRange.start);
-    m_currentAttribute->nameRange.start = offset - m_baseOffset;
-}
-
-inline void HTMLToken::endAttributeName(unsigned offset)
-{
-    ASSERT(offset);
-    ASSERT(m_currentAttribute->nameRange.start);
-    ASSERT(!m_currentAttribute->nameRange.end);
-
-    unsigned adjustedOffset = offset - m_baseOffset;
-    m_currentAttribute->nameRange.end = adjustedOffset;
-
-    // FIXME: Is this intentional? Why point the value at the end of the name?
-    m_currentAttribute->valueRange.start = adjustedOffset;
-    m_currentAttribute->valueRange.end = adjustedOffset;
-}
-
-inline void HTMLToken::beginAttributeValue(unsigned offset)
-{
-    ASSERT(offset);
-    m_currentAttribute->valueRange.start = offset - m_baseOffset;
-}
-
-inline void HTMLToken::endAttributeValue(unsigned offset)
-{
-    ASSERT(offset);
-    m_currentAttribute->valueRange.end = offset - m_baseOffset;
-}
-
 inline void HTMLToken::appendToAttributeName(UChar character)
 {
     ASSERT(character);
     ASSERT(m_type == StartTag || m_type == EndTag);
-    ASSERT(m_currentAttribute->nameRange.start);
+    ASSERT(m_currentAttribute);
     m_currentAttribute->name.append(character);
 }
@@ -389,5 +339,5 @@
     ASSERT(character);
     ASSERT(m_type == StartTag || m_type == EndTag);
-    ASSERT(m_currentAttribute->valueRange.start);
+    ASSERT(m_currentAttribute);
     m_currentAttribute->value.append(character);
 }

-
trunk/Source/WebCore/html/parser/HTMLTokenizer.cpp
r178173 r178265 1 1 /* 2 * Copyright (C) 2008 Apple Inc. All Rights Reserved.2 * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved. 3 3 * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/ 4 4 * Copyright (C) 2010 Google, Inc. All Rights Reserved. … … 30 30 31 31 #include "HTMLEntityParser.h" 32 #include "HTML TreeBuilder.h"32 #include "HTMLNames.h" 33 33 #include "MarkupTokenizerInlines.h" 34 #include "NotImplemented.h"35 34 #include <wtf/ASCIICType.h> 36 #include <wtf/CurrentTime.h>37 #include <wtf/text/CString.h>38 35 39 36 using namespace WTF; … … 43 40 using namespace HTMLNames; 44 41 45 static inline UChar toLowerCase(UChar cc) 46 { 47 ASSERT(isASCIIUpper(cc)); 48 const int lowerCaseOffset = 0x20; 49 return cc + lowerCaseOffset; 50 } 51 52 static inline bool vectorEqualsString(const Vector<LChar, 32>& vector, const String& string) 53 { 54 if (vector.size() != string.length()) 55 return false; 56 57 if (!string.length()) 58 return true; 59 60 return equal(string.impl(), vector.data(), vector.size()); 61 } 62 63 static inline bool isEndTagBufferingState(HTMLTokenizer::State state) 64 { 65 switch (state) { 66 case HTMLTokenizer::RCDATAEndTagOpenState: 67 case HTMLTokenizer::RCDATAEndTagNameState: 68 case HTMLTokenizer::RAWTEXTEndTagOpenState: 69 case HTMLTokenizer::RAWTEXTEndTagNameState: 70 case HTMLTokenizer::ScriptDataEndTagOpenState: 71 case HTMLTokenizer::ScriptDataEndTagNameState: 72 case HTMLTokenizer::ScriptDataEscapedEndTagOpenState: 73 case HTMLTokenizer::ScriptDataEscapedEndTagNameState: 42 static inline LChar convertASCIIAlphaToLower(UChar character) 43 { 44 ASSERT(isASCIIAlpha(character)); 45 return toASCIILowerUnchecked(character); 46 } 47 48 static inline bool vectorEqualsString(const Vector<LChar, 32>& vector, const char* string) 49 { 50 unsigned size = vector.size(); 51 for (unsigned i = 0; i < size; ++i) { 52 if (!string[i] || vector[i] != string[i]) 53 return false; 54 } 55 return !string[size]; 56 } 57 58 inline bool 
HTMLTokenizer::inEndTagBufferingState() const 59 { 60 switch (m_state) { 61 case RCDATAEndTagOpenState: 62 case RCDATAEndTagNameState: 63 case RAWTEXTEndTagOpenState: 64 case RAWTEXTEndTagNameState: 65 case ScriptDataEndTagOpenState: 66 case ScriptDataEndTagNameState: 67 case ScriptDataEscapedEndTagOpenState: 68 case ScriptDataEscapedEndTagNameState: 74 69 return true; 75 70 default: … … 78 73 } 79 74 80 #define HTML_BEGIN_STATE(stateName) BEGIN_STATE(HTMLTokenizer, stateName)81 #define HTML_RECONSUME_IN(stateName) RECONSUME_IN(HTMLTokenizer, stateName)82 #define HTML_ADVANCE_TO(stateName) ADVANCE_TO(HTMLTokenizer, stateName)83 #define HTML_SWITCH_TO(stateName) SWITCH_TO(HTMLTokenizer, stateName)84 85 75 HTMLTokenizer::HTMLTokenizer(const HTMLParserOptions& options) 86 : m_ inputStreamPreprocessor(this)76 : m_preprocessor(*this) 87 77 , m_options(options) 88 78 { 89 reset(); 90 } 91 92 HTMLTokenizer::~HTMLTokenizer() 93 { 94 } 95 96 void HTMLTokenizer::reset() 97 { 98 m_state = HTMLTokenizer::DataState; 99 m_token = 0; 100 m_forceNullCharacterReplacement = false; 101 m_shouldAllowCDATA = false; 102 m_additionalAllowedCharacter = '\0'; 79 } 80 81 inline void HTMLTokenizer::bufferASCIICharacter(UChar character) 82 { 83 ASSERT(character != kEndOfFileMarker); 84 ASSERT(isASCII(character)); 85 LChar narrowedCharacter = character; 86 m_token.appendToCharacter(narrowedCharacter); 87 } 88 89 inline void HTMLTokenizer::bufferCharacter(UChar character) 90 { 91 ASSERT(character != kEndOfFileMarker); 92 m_token.appendToCharacter(character); 93 } 94 95 inline bool HTMLTokenizer::emitAndResumeInDataState(SegmentedString& source) 96 { 97 saveEndTagNameIfNeeded(); 98 m_state = DataState; 99 source.advanceAndUpdateLineNumber(); 100 return true; 101 } 102 103 inline bool HTMLTokenizer::emitAndReconsumeInDataState() 104 { 105 saveEndTagNameIfNeeded(); 106 m_state = DataState; 107 return true; 108 } 109 110 inline bool HTMLTokenizer::emitEndOfFile(SegmentedString& source) 111 { 112 
m_state = DataState; 113 if (haveBufferedCharacterToken()) 114 return true; 115 source.advance(); 116 m_token.clear(); 117 m_token.makeEndOfFile(); 118 return true; 119 } 120 121 inline void HTMLTokenizer::saveEndTagNameIfNeeded() 122 { 123 ASSERT(m_token.type() != HTMLToken::Uninitialized); 124 if (m_token.type() == HTMLToken::StartTag) 125 m_appropriateEndTagName = m_token.name(); 126 } 127 128 inline bool HTMLTokenizer::haveBufferedCharacterToken() const 129 { 130 return m_token.type() == HTMLToken::Character; 103 131 } 104 132 … … 120 148 } 121 149 122 bool HTMLTokenizer::flushBufferedEndTag(SegmentedString& source) 123 { 124 ASSERT(m_token->type() == HTMLToken::Character || m_token->type() == HTMLToken::Uninitialized); 125 source.advanceAndUpdateLineNumber(); 126 if (m_token->type() == HTMLToken::Character) 127 return true; 128 m_token->beginEndTag(m_bufferedEndTagName); 150 void HTMLTokenizer::flushBufferedEndTag() 151 { 152 m_token.beginEndTag(m_bufferedEndTagName); 129 153 m_bufferedEndTagName.clear(); 130 154 m_appropriateEndTagName.clear(); 131 155 m_temporaryBuffer.clear(); 156 } 157 158 bool HTMLTokenizer::commitToPartialEndTag(SegmentedString& source, UChar character, State state) 159 { 160 ASSERT(source.currentChar() == character); 161 appendToTemporaryBuffer(character); 162 source.advanceAndUpdateLineNumber(); 163 164 if (haveBufferedCharacterToken()) { 165 // Emit the buffered character token. 166 // The next call to processToken will flush the buffered end tag and continue parsing it. 
167 m_state = state; 168 return true; 169 } 170 171 flushBufferedEndTag(); 132 172 return false; 133 173 } 134 174 135 #define FLUSH_AND_ADVANCE_TO(stateName) \ 136 do { \ 137 m_state = HTMLTokenizer::stateName; \ 138 if (flushBufferedEndTag(source)) \ 139 return true; \ 140 if (source.isEmpty() \ 141 || !m_inputStreamPreprocessor.peek(source)) \ 142 return haveBufferedCharacterToken(); \ 143 cc = m_inputStreamPreprocessor.nextInputCharacter(); \ 144 goto stateName; \ 145 } while (false) 146 147 bool HTMLTokenizer::flushEmitAndResumeIn(SegmentedString& source, HTMLTokenizer::State state) 148 { 149 m_state = state; 150 flushBufferedEndTag(source); 175 bool HTMLTokenizer::commitToCompleteEndTag(SegmentedString& source) 176 { 177 ASSERT(source.currentChar() == '>'); 178 appendToTemporaryBuffer('>'); 179 source.advance(); 180 181 m_state = DataState; 182 183 if (haveBufferedCharacterToken()) { 184 // Emit the character token we already have. 185 // The next call to processToken will flush the buffered end tag and emit it. 186 return true; 187 } 188 189 flushBufferedEndTag(); 151 190 return true; 152 191 } 153 192 154 bool HTMLTokenizer::nextToken(SegmentedString& source, HTMLToken& token) 155 { 156 // If we have a token in progress, then we're supposed to be called back 157 // with the same token so we can finish it. 158 ASSERT(!m_token || m_token == &token || token.type() == HTMLToken::Uninitialized); 159 m_token = &token; 160 161 if (!m_bufferedEndTagName.isEmpty() && !isEndTagBufferingState(m_state)) { 162 // FIXME: This should call flushBufferedEndTag(). 163 // We started an end tag during our last iteration. 164 m_token->beginEndTag(m_bufferedEndTagName); 165 m_bufferedEndTagName.clear(); 166 m_appropriateEndTagName.clear(); 167 m_temporaryBuffer.clear(); 168 if (m_state == HTMLTokenizer::DataState) { 169 // We're back in the data state, so we must be done with the tag. 
193 bool HTMLTokenizer::processToken(SegmentedString& source) 194 { 195 if (!m_bufferedEndTagName.isEmpty() && !inEndTagBufferingState()) { 196 // We are back here after emitting a character token that came just before an end tag. 197 // To continue parsing the end tag we need to move the buffered tag name into the token. 198 flushBufferedEndTag(); 199 200 // If we are in the data state, the end tag is already complete and we should emit it 201 // now, otherwise, we want to resume parsing the partial end tag. 202 if (m_state == DataState) 170 203 return true; 171 }172 204 } 173 205 174 if ( source.isEmpty() || !m_inputStreamPreprocessor.peek(source))206 if (!m_preprocessor.peek(source, isNullCharacterSkippingState(m_state))) 175 207 return haveBufferedCharacterToken(); 176 UChar c c = m_inputStreamPreprocessor.nextInputCharacter();177 178 // Source: http://www.whatwg.org/specs/web-apps/current-work/#tokenisation0208 UChar character = m_preprocessor.nextInputCharacter(); 209 210 // https://html.spec.whatwg.org/#tokenization 179 211 switch (m_state) { 180 HTML_BEGIN_STATE(DataState) { 181 if (cc == '&') 182 HTML_ADVANCE_TO(CharacterReferenceInDataState); 183 else if (cc == '<') { 184 if (m_token->type() == HTMLToken::Character) { 185 // We have a bunch of character tokens queued up that we 186 // are emitting lazily here. 
187 return true; 188 } 189 HTML_ADVANCE_TO(TagOpenState); 190 } else if (cc == kEndOfFileMarker) 212 213 BEGIN_STATE(DataState) 214 if (character == '&') 215 ADVANCE_TO(CharacterReferenceInDataState); 216 if (character == '<') { 217 if (haveBufferedCharacterToken()) 218 RETURN_IN_CURRENT_STATE(true); 219 ADVANCE_TO(TagOpenState); 220 } 221 if (character == kEndOfFileMarker) 191 222 return emitEndOfFile(source); 192 else { 193 bufferCharacter(cc); 194 HTML_ADVANCE_TO(DataState); 195 } 196 } 197 END_STATE() 198 199 HTML_BEGIN_STATE(CharacterReferenceInDataState) { 223 bufferCharacter(character); 224 ADVANCE_TO(DataState); 225 END_STATE() 226 227 BEGIN_STATE(CharacterReferenceInDataState) 200 228 if (!processEntity(source)) 201 return haveBufferedCharacterToken(); 202 HTML_SWITCH_TO(DataState); 203 } 204 END_STATE() 205 206 HTML_BEGIN_STATE(RCDATAState) { 207 if (cc == '&') 208 HTML_ADVANCE_TO(CharacterReferenceInRCDATAState); 209 else if (cc == '<') 210 HTML_ADVANCE_TO(RCDATALessThanSignState); 211 else if (cc == kEndOfFileMarker) 212 return emitEndOfFile(source); 213 else { 214 bufferCharacter(cc); 215 HTML_ADVANCE_TO(RCDATAState); 216 } 217 } 218 END_STATE() 219 220 HTML_BEGIN_STATE(CharacterReferenceInRCDATAState) { 229 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken()); 230 SWITCH_TO(DataState); 231 END_STATE() 232 233 BEGIN_STATE(RCDATAState) 234 if (character == '&') 235 ADVANCE_TO(CharacterReferenceInRCDATAState); 236 if (character == '<') 237 ADVANCE_TO(RCDATALessThanSignState); 238 if (character == kEndOfFileMarker) 239 RECONSUME_IN(DataState); 240 bufferCharacter(character); 241 ADVANCE_TO(RCDATAState); 242 END_STATE() 243 244 BEGIN_STATE(CharacterReferenceInRCDATAState) 221 245 if (!processEntity(source)) 222 return haveBufferedCharacterToken(); 223 HTML_SWITCH_TO(RCDATAState); 224 } 225 END_STATE() 226 227 HTML_BEGIN_STATE(RAWTEXTState) { 228 if (cc == '<') 229 HTML_ADVANCE_TO(RAWTEXTLessThanSignState); 230 else if (cc == kEndOfFileMarker) 231 return 
emitEndOfFile(source); 232 else { 233 bufferCharacter(cc); 234 HTML_ADVANCE_TO(RAWTEXTState); 235 } 236 } 237 END_STATE() 238 239 HTML_BEGIN_STATE(ScriptDataState) { 240 if (cc == '<') 241 HTML_ADVANCE_TO(ScriptDataLessThanSignState); 242 else if (cc == kEndOfFileMarker) 243 return emitEndOfFile(source); 244 else { 245 bufferCharacter(cc); 246 HTML_ADVANCE_TO(ScriptDataState); 247 } 248 } 249 END_STATE() 250 251 HTML_BEGIN_STATE(PLAINTEXTState) { 252 if (cc == kEndOfFileMarker) 253 return emitEndOfFile(source); 254 bufferCharacter(cc); 255 HTML_ADVANCE_TO(PLAINTEXTState); 256 } 257 END_STATE() 258 259 HTML_BEGIN_STATE(TagOpenState) { 260 if (cc == '!') 261 HTML_ADVANCE_TO(MarkupDeclarationOpenState); 262 else if (cc == '/') 263 HTML_ADVANCE_TO(EndTagOpenState); 264 else if (isASCIIUpper(cc)) { 265 m_token->beginStartTag(toLowerCase(cc)); 266 HTML_ADVANCE_TO(TagNameState); 267 } else if (isASCIILower(cc)) { 268 m_token->beginStartTag(cc); 269 HTML_ADVANCE_TO(TagNameState); 270 } else if (cc == '?') { 246 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken()); 247 SWITCH_TO(RCDATAState); 248 END_STATE() 249 250 BEGIN_STATE(RAWTEXTState) 251 if (character == '<') 252 ADVANCE_TO(RAWTEXTLessThanSignState); 253 if (character == kEndOfFileMarker) 254 RECONSUME_IN(DataState); 255 bufferCharacter(character); 256 ADVANCE_TO(RAWTEXTState); 257 END_STATE() 258 259 BEGIN_STATE(ScriptDataState) 260 if (character == '<') 261 ADVANCE_TO(ScriptDataLessThanSignState); 262 if (character == kEndOfFileMarker) 263 RECONSUME_IN(DataState); 264 bufferCharacter(character); 265 ADVANCE_TO(ScriptDataState); 266 END_STATE() 267 268 BEGIN_STATE(PLAINTEXTState) 269 if (character == kEndOfFileMarker) 270 RECONSUME_IN(DataState); 271 bufferCharacter(character); 272 ADVANCE_TO(PLAINTEXTState); 273 END_STATE() 274 275 BEGIN_STATE(TagOpenState) 276 if (character == '!') 277 ADVANCE_TO(MarkupDeclarationOpenState); 278 if (character == '/') 279 ADVANCE_TO(EndTagOpenState); 280 if 
(isASCIIAlpha(character)) { 281 m_token.beginStartTag(convertASCIIAlphaToLower(character)); 282 ADVANCE_TO(TagNameState); 283 } 284 if (character == '?') { 271 285 parseError(); 272 286 // The spec consumes the current character before switching 273 287 // to the bogus comment state, but it's easier to implement 274 288 // if we reconsume the current character. 275 HTML_RECONSUME_IN(BogusCommentState); 276 } else { 277 parseError(); 278 bufferASCIICharacter('<'); 279 HTML_RECONSUME_IN(DataState); 280 } 281 } 282 END_STATE() 283 284 HTML_BEGIN_STATE(EndTagOpenState) { 285 if (isASCIIUpper(cc)) { 286 m_token->beginEndTag(static_cast<LChar>(toLowerCase(cc))); 289 RECONSUME_IN(BogusCommentState); 290 } 291 parseError(); 292 bufferASCIICharacter('<'); 293 RECONSUME_IN(DataState); 294 END_STATE() 295 296 BEGIN_STATE(EndTagOpenState) 297 if (isASCIIAlpha(character)) { 298 m_token.beginEndTag(convertASCIIAlphaToLower(character)); 287 299 m_appropriateEndTagName.clear(); 288 HTML_ADVANCE_TO(TagNameState); 289 } else if (isASCIILower(cc)) { 290 m_token->beginEndTag(static_cast<LChar>(cc)); 291 m_appropriateEndTagName.clear(); 292 HTML_ADVANCE_TO(TagNameState); 293 } else if (cc == '>') { 294 parseError(); 295 HTML_ADVANCE_TO(DataState); 296 } else if (cc == kEndOfFileMarker) { 300 ADVANCE_TO(TagNameState); 301 } 302 if (character == '>') { 303 parseError(); 304 ADVANCE_TO(DataState); 305 } 306 if (character == kEndOfFileMarker) { 297 307 parseError(); 298 308 bufferASCIICharacter('<'); 299 309 bufferASCIICharacter('/'); 300 HTML_RECONSUME_IN(DataState); 301 } else { 302 parseError(); 303 HTML_RECONSUME_IN(BogusCommentState); 304 } 305 } 306 END_STATE() 307 308 HTML_BEGIN_STATE(TagNameState) { 309 if (isTokenizerWhitespace(cc)) 310 HTML_ADVANCE_TO(BeforeAttributeNameState); 311 else if (cc == '/') 312 HTML_ADVANCE_TO(SelfClosingStartTagState); 313 else if (cc == '>') 314 return emitAndResumeIn(source, HTMLTokenizer::DataState); 315 else if (m_options.usePreHTML5ParserQuirks 
&& cc == '<') 316 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 317 else if (isASCIIUpper(cc)) { 318 m_token->appendToName(toLowerCase(cc)); 319 HTML_ADVANCE_TO(TagNameState); 320 } else if (cc == kEndOfFileMarker) { 321 parseError(); 322 HTML_RECONSUME_IN(DataState); 323 } else { 324 m_token->appendToName(cc); 325 HTML_ADVANCE_TO(TagNameState); 326 } 327 } 328 END_STATE() 329 330 HTML_BEGIN_STATE(RCDATALessThanSignState) { 331 if (cc == '/') { 310 RECONSUME_IN(DataState); 311 } 312 parseError(); 313 RECONSUME_IN(BogusCommentState); 314 END_STATE() 315 316 BEGIN_STATE(TagNameState) 317 if (isTokenizerWhitespace(character)) 318 ADVANCE_TO(BeforeAttributeNameState); 319 if (character == '/') 320 ADVANCE_TO(SelfClosingStartTagState); 321 if (character == '>') 322 return emitAndResumeInDataState(source); 323 if (m_options.usePreHTML5ParserQuirks && character == '<') 324 return emitAndReconsumeInDataState(); 325 if (character == kEndOfFileMarker) { 326 parseError(); 327 RECONSUME_IN(DataState); 328 } 329 m_token.appendToName(toASCIILower(character)); 330 ADVANCE_TO(TagNameState); 331 END_STATE() 332 333 BEGIN_STATE(RCDATALessThanSignState) 334 if (character == '/') { 332 335 m_temporaryBuffer.clear(); 333 336 ASSERT(m_bufferedEndTagName.isEmpty()); 334 HTML_ADVANCE_TO(RCDATAEndTagOpenState); 335 } else { 336 bufferASCIICharacter('<'); 337 HTML_RECONSUME_IN(RCDATAState); 338 } 339 } 340 END_STATE() 341 342 HTML_BEGIN_STATE(RCDATAEndTagOpenState) { 343 if (isASCIIUpper(cc)) { 344 m_temporaryBuffer.append(static_cast<LChar>(cc)); 345 addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc))); 346 HTML_ADVANCE_TO(RCDATAEndTagNameState); 347 } else if (isASCIILower(cc)) { 348 m_temporaryBuffer.append(static_cast<LChar>(cc)); 349 addToPossibleEndTag(static_cast<LChar>(cc)); 350 HTML_ADVANCE_TO(RCDATAEndTagNameState); 351 } else { 352 bufferASCIICharacter('<'); 353 bufferASCIICharacter('/'); 354 HTML_RECONSUME_IN(RCDATAState); 355 } 356 } 357 END_STATE() 358 359 
HTML_BEGIN_STATE(RCDATAEndTagNameState) { 360 if (isASCIIUpper(cc)) { 361 m_temporaryBuffer.append(static_cast<LChar>(cc)); 362 addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc))); 363 HTML_ADVANCE_TO(RCDATAEndTagNameState); 364 } else if (isASCIILower(cc)) { 365 m_temporaryBuffer.append(static_cast<LChar>(cc)); 366 addToPossibleEndTag(static_cast<LChar>(cc)); 367 HTML_ADVANCE_TO(RCDATAEndTagNameState); 368 } else { 369 if (isTokenizerWhitespace(cc)) { 370 if (isAppropriateEndTag()) { 371 m_temporaryBuffer.append(static_cast<LChar>(cc)); 372 FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState); 373 } 374 } else if (cc == '/') { 375 if (isAppropriateEndTag()) { 376 m_temporaryBuffer.append(static_cast<LChar>(cc)); 377 FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState); 378 } 379 } else if (cc == '>') { 380 if (isAppropriateEndTag()) { 381 m_temporaryBuffer.append(static_cast<LChar>(cc)); 382 return flushEmitAndResumeIn(source, HTMLTokenizer::DataState); 383 } 337 ADVANCE_TO(RCDATAEndTagOpenState); 338 } 339 bufferASCIICharacter('<'); 340 RECONSUME_IN(RCDATAState); 341 END_STATE() 342 343 BEGIN_STATE(RCDATAEndTagOpenState) 344 if (isASCIIAlpha(character)) { 345 appendToTemporaryBuffer(character); 346 appendToPossibleEndTag(convertASCIIAlphaToLower(character)); 347 ADVANCE_TO(RCDATAEndTagNameState); 348 } 349 bufferASCIICharacter('<'); 350 bufferASCIICharacter('/'); 351 RECONSUME_IN(RCDATAState); 352 END_STATE() 353 354 BEGIN_STATE(RCDATAEndTagNameState) 355 if (isASCIIAlpha(character)) { 356 appendToTemporaryBuffer(character); 357 appendToPossibleEndTag(convertASCIIAlphaToLower(character)); 358 ADVANCE_TO(RCDATAEndTagNameState); 359 } 360 if (isTokenizerWhitespace(character)) { 361 if (isAppropriateEndTag()) { 362 if (commitToPartialEndTag(source, character, BeforeAttributeNameState)) 363 return true; 364 SWITCH_TO(BeforeAttributeNameState); 384 365 } 385 bufferASCIICharacter('<'); 386 bufferASCIICharacter('/'); 387 m_token->appendToCharacter(m_temporaryBuffer); 388 
m_bufferedEndTagName.clear(); 389 m_temporaryBuffer.clear(); 390 HTML_RECONSUME_IN(RCDATAState); 391 } 392 } 393 END_STATE() 394 395 HTML_BEGIN_STATE(RAWTEXTLessThanSignState) { 396 if (cc == '/') { 366 } else if (character == '/') { 367 if (isAppropriateEndTag()) { 368 if (commitToPartialEndTag(source, '/', SelfClosingStartTagState)) 369 return true; 370 SWITCH_TO(SelfClosingStartTagState); 371 } 372 } else if (character == '>') { 373 if (isAppropriateEndTag()) 374 return commitToCompleteEndTag(source); 375 } 376 bufferASCIICharacter('<'); 377 bufferASCIICharacter('/'); 378 m_token.appendToCharacter(m_temporaryBuffer); 379 m_bufferedEndTagName.clear(); 380 m_temporaryBuffer.clear(); 381 RECONSUME_IN(RCDATAState); 382 END_STATE() 383 384 BEGIN_STATE(RAWTEXTLessThanSignState) 385 if (character == '/') { 397 386 m_temporaryBuffer.clear(); 398 387 ASSERT(m_bufferedEndTagName.isEmpty()); 399 HTML_ADVANCE_TO(RAWTEXTEndTagOpenState); 400 } else { 401 bufferASCIICharacter('<'); 402 HTML_RECONSUME_IN(RAWTEXTState); 403 } 404 } 405 END_STATE() 406 407 HTML_BEGIN_STATE(RAWTEXTEndTagOpenState) { 408 if (isASCIIUpper(cc)) { 409 m_temporaryBuffer.append(static_cast<LChar>(cc)); 410 addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc))); 411 HTML_ADVANCE_TO(RAWTEXTEndTagNameState); 412 } else if (isASCIILower(cc)) { 413 m_temporaryBuffer.append(static_cast<LChar>(cc)); 414 addToPossibleEndTag(static_cast<LChar>(cc)); 415 HTML_ADVANCE_TO(RAWTEXTEndTagNameState); 416 } else { 417 bufferASCIICharacter('<'); 418 bufferASCIICharacter('/'); 419 HTML_RECONSUME_IN(RAWTEXTState); 420 } 421 } 422 END_STATE() 423 424 HTML_BEGIN_STATE(RAWTEXTEndTagNameState) { 425 if (isASCIIUpper(cc)) { 426 m_temporaryBuffer.append(static_cast<LChar>(cc)); 427 addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc))); 428 HTML_ADVANCE_TO(RAWTEXTEndTagNameState); 429 } else if (isASCIILower(cc)) { 430 m_temporaryBuffer.append(static_cast<LChar>(cc)); 431 addToPossibleEndTag(static_cast<LChar>(cc)); 432 
HTML_ADVANCE_TO(RAWTEXTEndTagNameState); 433 } else { 434 if (isTokenizerWhitespace(cc)) { 435 if (isAppropriateEndTag()) { 436 m_temporaryBuffer.append(static_cast<LChar>(cc)); 437 FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState); 438 } 439 } else if (cc == '/') { 440 if (isAppropriateEndTag()) { 441 m_temporaryBuffer.append(static_cast<LChar>(cc)); 442 FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState); 443 } 444 } else if (cc == '>') { 445 if (isAppropriateEndTag()) { 446 m_temporaryBuffer.append(static_cast<LChar>(cc)); 447 return flushEmitAndResumeIn(source, HTMLTokenizer::DataState); 448 } 388 ADVANCE_TO(RAWTEXTEndTagOpenState); 389 } 390 bufferASCIICharacter('<'); 391 RECONSUME_IN(RAWTEXTState); 392 END_STATE() 393 394 BEGIN_STATE(RAWTEXTEndTagOpenState) 395 if (isASCIIAlpha(character)) { 396 appendToTemporaryBuffer(character); 397 appendToPossibleEndTag(convertASCIIAlphaToLower(character)); 398 ADVANCE_TO(RAWTEXTEndTagNameState); 399 } 400 bufferASCIICharacter('<'); 401 bufferASCIICharacter('/'); 402 RECONSUME_IN(RAWTEXTState); 403 END_STATE() 404 405 BEGIN_STATE(RAWTEXTEndTagNameState) 406 if (isASCIIAlpha(character)) { 407 appendToTemporaryBuffer(character); 408 appendToPossibleEndTag(convertASCIIAlphaToLower(character)); 409 ADVANCE_TO(RAWTEXTEndTagNameState); 410 } 411 if (isTokenizerWhitespace(character)) { 412 if (isAppropriateEndTag()) { 413 if (commitToPartialEndTag(source, character, BeforeAttributeNameState)) 414 return true; 415 SWITCH_TO(BeforeAttributeNameState); 449 416 } 450 bufferASCIICharacter('<'); 451 bufferASCIICharacter('/'); 452 m_token->appendToCharacter(m_temporaryBuffer); 453 m_bufferedEndTagName.clear(); 454 m_temporaryBuffer.clear(); 455 HTML_RECONSUME_IN(RAWTEXTState); 456 } 457 } 458 END_STATE() 459 460 HTML_BEGIN_STATE(ScriptDataLessThanSignState) { 461 if (cc == '/') { 417 } else if (character == '/') { 418 if (isAppropriateEndTag()) { 419 if (commitToPartialEndTag(source, '/', SelfClosingStartTagState)) 420 return true; 421 
SWITCH_TO(SelfClosingStartTagState); 422 } 423 } else if (character == '>') { 424 if (isAppropriateEndTag()) 425 return commitToCompleteEndTag(source); 426 } 427 bufferASCIICharacter('<'); 428 bufferASCIICharacter('/'); 429 m_token.appendToCharacter(m_temporaryBuffer); 430 m_bufferedEndTagName.clear(); 431 m_temporaryBuffer.clear(); 432 RECONSUME_IN(RAWTEXTState); 433 END_STATE() 434 435 BEGIN_STATE(ScriptDataLessThanSignState) 436 if (character == '/') { 462 437 m_temporaryBuffer.clear(); 463 438 ASSERT(m_bufferedEndTagName.isEmpty()); 464 HTML_ADVANCE_TO(ScriptDataEndTagOpenState); 465 } else if (cc == '!') { 439 ADVANCE_TO(ScriptDataEndTagOpenState); 440 } 441 if (character == '!') { 466 442 bufferASCIICharacter('<'); 467 443 bufferASCIICharacter('!'); 468 HTML_ADVANCE_TO(ScriptDataEscapeStartState); 469 } else { 470 bufferASCIICharacter('<'); 471 HTML_RECONSUME_IN(ScriptDataState); 472 } 473 } 474 END_STATE() 475 476 HTML_BEGIN_STATE(ScriptDataEndTagOpenState) { 477 if (isASCIIUpper(cc)) { 478 m_temporaryBuffer.append(static_cast<LChar>(cc)); 479 addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc))); 480 HTML_ADVANCE_TO(ScriptDataEndTagNameState); 481 } else if (isASCIILower(cc)) { 482 m_temporaryBuffer.append(static_cast<LChar>(cc)); 483 addToPossibleEndTag(static_cast<LChar>(cc)); 484 HTML_ADVANCE_TO(ScriptDataEndTagNameState); 485 } else { 486 bufferASCIICharacter('<'); 487 bufferASCIICharacter('/'); 488 HTML_RECONSUME_IN(ScriptDataState); 489 } 490 } 491 END_STATE() 492 493 HTML_BEGIN_STATE(ScriptDataEndTagNameState) { 494 if (isASCIIUpper(cc)) { 495 m_temporaryBuffer.append(static_cast<LChar>(cc)); 496 addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc))); 497 HTML_ADVANCE_TO(ScriptDataEndTagNameState); 498 } else if (isASCIILower(cc)) { 499 m_temporaryBuffer.append(static_cast<LChar>(cc)); 500 addToPossibleEndTag(static_cast<LChar>(cc)); 501 HTML_ADVANCE_TO(ScriptDataEndTagNameState); 502 } else { 503 if (isTokenizerWhitespace(cc)) { 504 if 
(isAppropriateEndTag()) { 505 m_temporaryBuffer.append(static_cast<LChar>(cc)); 506 FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState); 507 } 508 } else if (cc == '/') { 509 if (isAppropriateEndTag()) { 510 m_temporaryBuffer.append(static_cast<LChar>(cc)); 511 FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState); 512 } 513 } else if (cc == '>') { 514 if (isAppropriateEndTag()) { 515 m_temporaryBuffer.append(static_cast<LChar>(cc)); 516 return flushEmitAndResumeIn(source, HTMLTokenizer::DataState); 517 } 444 ADVANCE_TO(ScriptDataEscapeStartState); 445 } 446 bufferASCIICharacter('<'); 447 RECONSUME_IN(ScriptDataState); 448 END_STATE() 449 450 BEGIN_STATE(ScriptDataEndTagOpenState) 451 if (isASCIIAlpha(character)) { 452 appendToTemporaryBuffer(character); 453 appendToPossibleEndTag(convertASCIIAlphaToLower(character)); 454 ADVANCE_TO(ScriptDataEndTagNameState); 455 } 456 bufferASCIICharacter('<'); 457 bufferASCIICharacter('/'); 458 RECONSUME_IN(ScriptDataState); 459 END_STATE() 460 461 BEGIN_STATE(ScriptDataEndTagNameState) 462 if (isASCIIAlpha(character)) { 463 appendToTemporaryBuffer(character); 464 appendToPossibleEndTag(convertASCIIAlphaToLower(character)); 465 ADVANCE_TO(ScriptDataEndTagNameState); 466 } 467 if (isTokenizerWhitespace(character)) { 468 if (isAppropriateEndTag()) { 469 if (commitToPartialEndTag(source, character, BeforeAttributeNameState)) 470 return true; 471 SWITCH_TO(BeforeAttributeNameState); 518 472 } 519 bufferASCIICharacter('<'); 520 bufferASCIICharacter('/'); 521 m_token->appendToCharacter(m_temporaryBuffer); 522 m_bufferedEndTagName.clear(); 523 m_temporaryBuffer.clear(); 524 HTML_RECONSUME_IN(ScriptDataState); 525 } 526 } 527 END_STATE() 528 529 HTML_BEGIN_STATE(ScriptDataEscapeStartState) { 530 if (cc == '-') { 473 } else if (character == '/') { 474 if (isAppropriateEndTag()) { 475 if (commitToPartialEndTag(source, '/', SelfClosingStartTagState)) 476 return true; 477 SWITCH_TO(SelfClosingStartTagState); 478 } 479 } else if (character == '>') { 480 
if (isAppropriateEndTag()) 481 return commitToCompleteEndTag(source); 482 } 483 bufferASCIICharacter('<'); 484 bufferASCIICharacter('/'); 485 m_token.appendToCharacter(m_temporaryBuffer); 486 m_bufferedEndTagName.clear(); 487 m_temporaryBuffer.clear(); 488 RECONSUME_IN(ScriptDataState); 489 END_STATE() 490 491 BEGIN_STATE(ScriptDataEscapeStartState) 492 if (character == '-') { 531 493 bufferASCIICharacter('-'); 532 HTML_ADVANCE_TO(ScriptDataEscapeStartDashState); 494 ADVANCE_TO(ScriptDataEscapeStartDashState); 533 495 } else 534 HTML_RECONSUME_IN(ScriptDataState); 535 } 536 END_STATE() 537 538 HTML_BEGIN_STATE(ScriptDataEscapeStartDashState) { 539 if (cc == '-') { 496 RECONSUME_IN(ScriptDataState); 497 END_STATE() 498 499 BEGIN_STATE(ScriptDataEscapeStartDashState) 500 if (character == '-') { 540 501 bufferASCIICharacter('-'); 541 HTML_ADVANCE_TO(ScriptDataEscapedDashDashState); 502 ADVANCE_TO(ScriptDataEscapedDashDashState); 542 503 } else 543 HTML_RECONSUME_IN(ScriptDataState); 544 } 545 END_STATE() 546 547 HTML_BEGIN_STATE(ScriptDataEscapedState) { 548 if (cc == '-') { 504 RECONSUME_IN(ScriptDataState); 505 END_STATE() 506 507 BEGIN_STATE(ScriptDataEscapedState) 508 if (character == '-') { 549 509 bufferASCIICharacter('-'); 550 HTML_ADVANCE_TO(ScriptDataEscapedDashState); 551 } else if (cc == '<') 552 HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState); 553 else if (cc == kEndOfFileMarker) { 554 parseError(); 555 HTML_RECONSUME_IN(DataState); 556 } else { 557 bufferCharacter(cc); 558 HTML_ADVANCE_TO(ScriptDataEscapedState); 559 } 560 } 561 END_STATE() 562 563 HTML_BEGIN_STATE(ScriptDataEscapedDashState) { 564 if (cc == '-') { 510 ADVANCE_TO(ScriptDataEscapedDashState); 511 } 512 if (character == '<') 513 ADVANCE_TO(ScriptDataEscapedLessThanSignState); 514 if (character == kEndOfFileMarker) { 515 parseError(); 516 RECONSUME_IN(DataState); 517 } 518 bufferCharacter(character); 519 ADVANCE_TO(ScriptDataEscapedState); 520 END_STATE() 521 522 
BEGIN_STATE(ScriptDataEscapedDashState) 523 if (character == '-') { 565 524 bufferASCIICharacter('-'); 566 HTML_ADVANCE_TO(ScriptDataEscapedDashDashState); 567 } else if (cc == '<') 568 HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState); 569 else if (cc == kEndOfFileMarker) { 570 parseError(); 571 HTML_RECONSUME_IN(DataState); 572 } else { 573 bufferCharacter(cc); 574 HTML_ADVANCE_TO(ScriptDataEscapedState); 575 } 576 } 577 END_STATE() 578 579 HTML_BEGIN_STATE(ScriptDataEscapedDashDashState) { 580 if (cc == '-') { 525 ADVANCE_TO(ScriptDataEscapedDashDashState); 526 } 527 if (character == '<') 528 ADVANCE_TO(ScriptDataEscapedLessThanSignState); 529 if (character == kEndOfFileMarker) { 530 parseError(); 531 RECONSUME_IN(DataState); 532 } 533 bufferCharacter(character); 534 ADVANCE_TO(ScriptDataEscapedState); 535 END_STATE() 536 537 BEGIN_STATE(ScriptDataEscapedDashDashState) 538 if (character == '-') { 581 539 bufferASCIICharacter('-'); 582 HTML_ADVANCE_TO(ScriptDataEscapedDashDashState); 583 } else if (cc == '<') 584 HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState); 585 else if (cc == '>') { 540 ADVANCE_TO(ScriptDataEscapedDashDashState); 541 } 542 if (character == '<') 543 ADVANCE_TO(ScriptDataEscapedLessThanSignState); 544 if (character == '>') { 586 545 bufferASCIICharacter('>'); 587 HTML_ADVANCE_TO(ScriptDataState); 588 } else if (cc == kEndOfFileMarker) { 589 parseError(); 590 HTML_RECONSUME_IN(DataState); 591 } else { 592 bufferCharacter(cc); 593 HTML_ADVANCE_TO(ScriptDataEscapedState); 594 } 595 } 596 END_STATE() 597 598 HTML_BEGIN_STATE(ScriptDataEscapedLessThanSignState) { 599 if (cc == '/') { 546 ADVANCE_TO(ScriptDataState); 547 } 548 if (character == kEndOfFileMarker) { 549 parseError(); 550 RECONSUME_IN(DataState); 551 } 552 bufferCharacter(character); 553 ADVANCE_TO(ScriptDataEscapedState); 554 END_STATE() 555 556 BEGIN_STATE(ScriptDataEscapedLessThanSignState) 557 if (character == '/') { 600 558 m_temporaryBuffer.clear(); 601 559 
ASSERT(m_bufferedEndTagName.isEmpty()); 602 HTML_ADVANCE_TO(ScriptDataEscapedEndTagOpenState); 603 } else if (isASCIIUpper(cc)) { 560 ADVANCE_TO(ScriptDataEscapedEndTagOpenState); 561 } 562 if (isASCIIAlpha(character)) { 604 563 bufferASCIICharacter('<'); 605 bufferASCIICharacter(cc); 564 bufferASCIICharacter(character); 606 565 m_temporaryBuffer.clear(); 607 m_temporaryBuffer.append(toLowerCase(cc)); 608 HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState); 609 } else if (isASCIILower(cc)) { 566 appendToTemporaryBuffer(convertASCIIAlphaToLower(character)); 567 ADVANCE_TO(ScriptDataDoubleEscapeStartState); 568 } 569 bufferASCIICharacter('<'); 570 RECONSUME_IN(ScriptDataEscapedState); 571 END_STATE() 572 573 BEGIN_STATE(ScriptDataEscapedEndTagOpenState) 574 if (isASCIIAlpha(character)) { 575 appendToTemporaryBuffer(character); 576 appendToPossibleEndTag(convertASCIIAlphaToLower(character)); 577 ADVANCE_TO(ScriptDataEscapedEndTagNameState); 578 } 579 bufferASCIICharacter('<'); 580 bufferASCIICharacter('/'); 581 RECONSUME_IN(ScriptDataEscapedState); 582 END_STATE() 583 584 BEGIN_STATE(ScriptDataEscapedEndTagNameState) 585 if (isASCIIAlpha(character)) { 586 appendToTemporaryBuffer(character); 587 appendToPossibleEndTag(convertASCIIAlphaToLower(character)); 588 ADVANCE_TO(ScriptDataEscapedEndTagNameState); 589 } 590 if (isTokenizerWhitespace(character)) { 591 if (isAppropriateEndTag()) { 592 if (commitToPartialEndTag(source, character, BeforeAttributeNameState)) 593 return true; 594 SWITCH_TO(BeforeAttributeNameState); 595 } 596 } else if (character == '/') { 597 if (isAppropriateEndTag()) { 598 if (commitToPartialEndTag(source, '/', SelfClosingStartTagState)) 599 return true; 600 SWITCH_TO(SelfClosingStartTagState); 601 } 602 } else if (character == '>') { 603 if (isAppropriateEndTag()) 604 return commitToCompleteEndTag(source); 605 } 606 bufferASCIICharacter('<'); 607 bufferASCIICharacter('/'); 608 m_token.appendToCharacter(m_temporaryBuffer); 609 
m_bufferedEndTagName.clear(); 610 m_temporaryBuffer.clear(); 611 RECONSUME_IN(ScriptDataEscapedState); 612 END_STATE() 613 614 BEGIN_STATE(ScriptDataDoubleEscapeStartState) 615 if (isTokenizerWhitespace(character) || character == '/' || character == '>') { 616 bufferASCIICharacter(character); 617 if (temporaryBufferIs("script")) 618 ADVANCE_TO(ScriptDataDoubleEscapedState); 619 else 620 ADVANCE_TO(ScriptDataEscapedState); 621 } 622 if (isASCIIAlpha(character)) { 623 bufferASCIICharacter(character); 624 appendToTemporaryBuffer(convertASCIIAlphaToLower(character)); 625 ADVANCE_TO(ScriptDataDoubleEscapeStartState); 626 } 627 RECONSUME_IN(ScriptDataEscapedState); 628 END_STATE() 629 630 BEGIN_STATE(ScriptDataDoubleEscapedState) 631 if (character == '-') { 632 bufferASCIICharacter('-'); 633 ADVANCE_TO(ScriptDataDoubleEscapedDashState); 634 } 635 if (character == '<') { 610 636 bufferASCIICharacter('<'); 611 bufferASCIICharacter(cc); 612 m_temporaryBuffer.clear(); 613 m_temporaryBuffer.append(static_cast<LChar>(cc)); 614 HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState); 615 } else { 637 ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState); 638 } 639 if (character == kEndOfFileMarker) { 640 parseError(); 641 RECONSUME_IN(DataState); 642 } 643 bufferCharacter(character); 644 ADVANCE_TO(ScriptDataDoubleEscapedState); 645 END_STATE() 646 647 BEGIN_STATE(ScriptDataDoubleEscapedDashState) 648 if (character == '-') { 649 bufferASCIICharacter('-'); 650 ADVANCE_TO(ScriptDataDoubleEscapedDashDashState); 651 } 652 if (character == '<') { 616 653 bufferASCIICharacter('<'); 617 HTML_RECONSUME_IN(ScriptDataEscapedState); 618 } 619 } 620 END_STATE() 621 622 HTML_BEGIN_STATE(ScriptDataEscapedEndTagOpenState) { 623 if (isASCIIUpper(cc)) { 624 m_temporaryBuffer.append(static_cast<LChar>(cc)); 625 addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc))); 626 HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState); 627 } else if (isASCIILower(cc)) { 628 
m_temporaryBuffer.append(static_cast<LChar>(cc)); 629 addToPossibleEndTag(static_cast<LChar>(cc)); 630 HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState); 631 } else { 654 ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState); 655 } 656 if (character == kEndOfFileMarker) { 657 parseError(); 658 RECONSUME_IN(DataState); 659 } 660 bufferCharacter(character); 661 ADVANCE_TO(ScriptDataDoubleEscapedState); 662 END_STATE() 663 664 BEGIN_STATE(ScriptDataDoubleEscapedDashDashState) 665 if (character == '-') { 666 bufferASCIICharacter('-'); 667 ADVANCE_TO(ScriptDataDoubleEscapedDashDashState); 668 } 669 if (character == '<') { 632 670 bufferASCIICharacter('<'); 633 bufferASCIICharacter('/'); 634 HTML_RECONSUME_IN(ScriptDataEscapedState); 635 } 636 } 637 END_STATE() 638 639 HTML_BEGIN_STATE(ScriptDataEscapedEndTagNameState) { 640 if (isASCIIUpper(cc)) { 641 m_temporaryBuffer.append(static_cast<LChar>(cc)); 642 addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc))); 643 HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState); 644 } else if (isASCIILower(cc)) { 645 m_temporaryBuffer.append(static_cast<LChar>(cc)); 646 addToPossibleEndTag(static_cast<LChar>(cc)); 647 HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState); 648 } else { 649 if (isTokenizerWhitespace(cc)) { 650 if (isAppropriateEndTag()) { 651 m_temporaryBuffer.append(static_cast<LChar>(cc)); 652 FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState); 653 } 654 } else if (cc == '/') { 655 if (isAppropriateEndTag()) { 656 m_temporaryBuffer.append(static_cast<LChar>(cc)); 657 FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState); 658 } 659 } else if (cc == '>') { 660 if (isAppropriateEndTag()) { 661 m_temporaryBuffer.append(static_cast<LChar>(cc)); 662 return flushEmitAndResumeIn(source, HTMLTokenizer::DataState); 663 } 664 } 665 bufferASCIICharacter('<'); 666 bufferASCIICharacter('/'); 667 m_token->appendToCharacter(m_temporaryBuffer); 668 m_bufferedEndTagName.clear(); 669 m_temporaryBuffer.clear(); 670 
HTML_RECONSUME_IN(ScriptDataEscapedState); 671 } 672 } 673 END_STATE() 674 675 HTML_BEGIN_STATE(ScriptDataDoubleEscapeStartState) { 676 if (isTokenizerWhitespace(cc) || cc == '/' || cc == '>') { 677 bufferASCIICharacter(cc); 678 if (temporaryBufferIs(scriptTag.localName())) 679 HTML_ADVANCE_TO(ScriptDataDoubleEscapedState); 680 else 681 HTML_ADVANCE_TO(ScriptDataEscapedState); 682 } else if (isASCIIUpper(cc)) { 683 bufferASCIICharacter(cc); 684 m_temporaryBuffer.append(toLowerCase(cc)); 685 HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState); 686 } else if (isASCIILower(cc)) { 687 bufferASCIICharacter(cc); 688 m_temporaryBuffer.append(static_cast<LChar>(cc)); 689 HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState); 690 } else 691 HTML_RECONSUME_IN(ScriptDataEscapedState); 692 } 693 END_STATE() 694 695 HTML_BEGIN_STATE(ScriptDataDoubleEscapedState) { 696 if (cc == '-') { 697 bufferASCIICharacter('-'); 698 HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashState); 699 } else if (cc == '<') { 700 bufferASCIICharacter('<'); 701 HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState); 702 } else if (cc == kEndOfFileMarker) { 703 parseError(); 704 HTML_RECONSUME_IN(DataState); 705 } else { 706 bufferCharacter(cc); 707 HTML_ADVANCE_TO(ScriptDataDoubleEscapedState); 708 } 709 } 710 END_STATE() 711 712 HTML_BEGIN_STATE(ScriptDataDoubleEscapedDashState) { 713 if (cc == '-') { 714 bufferASCIICharacter('-'); 715 HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashDashState); 716 } else if (cc == '<') { 717 bufferASCIICharacter('<'); 718 HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState); 719 } else if (cc == kEndOfFileMarker) { 720 parseError(); 721 HTML_RECONSUME_IN(DataState); 722 } else { 723 bufferCharacter(cc); 724 HTML_ADVANCE_TO(ScriptDataDoubleEscapedState); 725 } 726 } 727 END_STATE() 728 729 HTML_BEGIN_STATE(ScriptDataDoubleEscapedDashDashState) { 730 if (cc == '-') { 731 bufferASCIICharacter('-'); 732 HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashDashState); 733 } else if (cc == 
'<') { 734 bufferASCIICharacter('<'); 735 HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState); 736 } else if (cc == '>') { 671 ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState); 672 } 673 if (character == '>') { 737 674 bufferASCIICharacter('>'); 738 HTML_ADVANCE_TO(ScriptDataState); 739 } else if (cc == kEndOfFileMarker) { 740 parseError(); 741 HTML_RECONSUME_IN(DataState); 742 } else { 743 bufferCharacter(cc); 744 HTML_ADVANCE_TO(ScriptDataDoubleEscapedState); 745 } 746 } 747 END_STATE() 748 749 HTML_BEGIN_STATE(ScriptDataDoubleEscapedLessThanSignState) { 750 if (cc == '/') { 675 ADVANCE_TO(ScriptDataState); 676 } 677 if (character == kEndOfFileMarker) { 678 parseError(); 679 RECONSUME_IN(DataState); 680 } 681 bufferCharacter(character); 682 ADVANCE_TO(ScriptDataDoubleEscapedState); 683 END_STATE() 684 685 BEGIN_STATE(ScriptDataDoubleEscapedLessThanSignState) 686 if (character == '/') { 751 687 bufferASCIICharacter('/'); 752 688 m_temporaryBuffer.clear(); 753 HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState); 754 } else 755 HTML_RECONSUME_IN(ScriptDataDoubleEscapedState); 756 } 757 END_STATE() 758 759 HTML_BEGIN_STATE(ScriptDataDoubleEscapeEndState) { 760 if (isTokenizerWhitespace(cc) || cc == '/' || cc == '>') { 761 bufferASCIICharacter(cc); 762 if (temporaryBufferIs(scriptTag.localName())) 763 HTML_ADVANCE_TO(ScriptDataEscapedState); 689 ADVANCE_TO(ScriptDataDoubleEscapeEndState); 690 } 691 RECONSUME_IN(ScriptDataDoubleEscapedState); 692 END_STATE() 693 694 BEGIN_STATE(ScriptDataDoubleEscapeEndState) 695 if (isTokenizerWhitespace(character) || character == '/' || character == '>') { 696 bufferASCIICharacter(character); 697 if (temporaryBufferIs("script")) 698 ADVANCE_TO(ScriptDataEscapedState); 764 699 else 765 HTML_ADVANCE_TO(ScriptDataDoubleEscapedState); 766 } else if (isASCIIUpper(cc)) { 767 bufferASCIICharacter(cc); 768 m_temporaryBuffer.append(toLowerCase(cc)); 769 HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState); 770 } else if (isASCIILower(cc)) { 
771 bufferASCIICharacter(cc); 772 m_temporaryBuffer.append(static_cast<LChar>(cc)); 773 HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState); 774 } else 775 HTML_RECONSUME_IN(ScriptDataDoubleEscapedState); 776 } 777 END_STATE() 778 779 HTML_BEGIN_STATE(BeforeAttributeNameState) { 780 if (isTokenizerWhitespace(cc)) 781 HTML_ADVANCE_TO(BeforeAttributeNameState); 782 else if (cc == '/') 783 HTML_ADVANCE_TO(SelfClosingStartTagState); 784 else if (cc == '>') 785 return emitAndResumeIn(source, HTMLTokenizer::DataState); 786 else if (m_options.usePreHTML5ParserQuirks && cc == '<') 787 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 788 else if (isASCIIUpper(cc)) { 789 m_token->addNewAttribute(); 790 m_token->beginAttributeName(source.numberOfCharactersConsumed()); 791 m_token->appendToAttributeName(toLowerCase(cc)); 792 HTML_ADVANCE_TO(AttributeNameState); 793 } else if (cc == kEndOfFileMarker) { 794 parseError(); 795 HTML_RECONSUME_IN(DataState); 796 } else { 797 if (cc == '"' || cc == '\'' || cc == '<' || cc == '=') 798 parseError(); 799 m_token->addNewAttribute(); 800 m_token->beginAttributeName(source.numberOfCharactersConsumed()); 801 m_token->appendToAttributeName(cc); 802 HTML_ADVANCE_TO(AttributeNameState); 803 } 804 } 805 END_STATE() 806 807 HTML_BEGIN_STATE(AttributeNameState) { 808 if (isTokenizerWhitespace(cc)) { 809 m_token->endAttributeName(source.numberOfCharactersConsumed()); 810 HTML_ADVANCE_TO(AfterAttributeNameState); 811 } else if (cc == '/') { 812 m_token->endAttributeName(source.numberOfCharactersConsumed()); 813 HTML_ADVANCE_TO(SelfClosingStartTagState); 814 } else if (cc == '=') { 815 m_token->endAttributeName(source.numberOfCharactersConsumed()); 816 HTML_ADVANCE_TO(BeforeAttributeValueState); 817 } else if (cc == '>') { 818 m_token->endAttributeName(source.numberOfCharactersConsumed()); 819 return emitAndResumeIn(source, HTMLTokenizer::DataState); 820 } else if (m_options.usePreHTML5ParserQuirks && cc == '<') { 821 
m_token->endAttributeName(source.numberOfCharactersConsumed()); 822 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 823 } else if (isASCIIUpper(cc)) { 824 m_token->appendToAttributeName(toLowerCase(cc)); 825 HTML_ADVANCE_TO(AttributeNameState); 826 } else if (cc == kEndOfFileMarker) { 827 parseError(); 828 m_token->endAttributeName(source.numberOfCharactersConsumed()); 829 HTML_RECONSUME_IN(DataState); 830 } else { 831 if (cc == '"' || cc == '\'' || cc == '<' || cc == '=') 832 parseError(); 833 m_token->appendToAttributeName(cc); 834 HTML_ADVANCE_TO(AttributeNameState); 835 } 836 } 837 END_STATE() 838 839 HTML_BEGIN_STATE(AfterAttributeNameState) { 840 if (isTokenizerWhitespace(cc)) 841 HTML_ADVANCE_TO(AfterAttributeNameState); 842 else if (cc == '/') 843 HTML_ADVANCE_TO(SelfClosingStartTagState); 844 else if (cc == '=') 845 HTML_ADVANCE_TO(BeforeAttributeValueState); 846 else if (cc == '>') 847 return emitAndResumeIn(source, HTMLTokenizer::DataState); 848 else if (m_options.usePreHTML5ParserQuirks && cc == '<') 849 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 850 else if (isASCIIUpper(cc)) { 851 m_token->addNewAttribute(); 852 m_token->beginAttributeName(source.numberOfCharactersConsumed()); 853 m_token->appendToAttributeName(toLowerCase(cc)); 854 HTML_ADVANCE_TO(AttributeNameState); 855 } else if (cc == kEndOfFileMarker) { 856 parseError(); 857 HTML_RECONSUME_IN(DataState); 858 } else { 859 if (cc == '"' || cc == '\'' || cc == '<') 860 parseError(); 861 m_token->addNewAttribute(); 862 m_token->beginAttributeName(source.numberOfCharactersConsumed()); 863 m_token->appendToAttributeName(cc); 864 HTML_ADVANCE_TO(AttributeNameState); 865 } 866 } 867 END_STATE() 868 869 HTML_BEGIN_STATE(BeforeAttributeValueState) { 870 if (isTokenizerWhitespace(cc)) 871 HTML_ADVANCE_TO(BeforeAttributeValueState); 872 else if (cc == '"') { 873 m_token->beginAttributeValue(source.numberOfCharactersConsumed() + 1); 874 
HTML_ADVANCE_TO(AttributeValueDoubleQuotedState); 875 } else if (cc == '&') { 876 m_token->beginAttributeValue(source.numberOfCharactersConsumed()); 877 HTML_RECONSUME_IN(AttributeValueUnquotedState); 878 } else if (cc == '\'') { 879 m_token->beginAttributeValue(source.numberOfCharactersConsumed() + 1); 880 HTML_ADVANCE_TO(AttributeValueSingleQuotedState); 881 } else if (cc == '>') { 882 parseError(); 883 return emitAndResumeIn(source, HTMLTokenizer::DataState); 884 } else if (cc == kEndOfFileMarker) { 885 parseError(); 886 HTML_RECONSUME_IN(DataState); 887 } else { 888 if (cc == '<' || cc == '=' || cc == '`') 889 parseError(); 890 m_token->beginAttributeValue(source.numberOfCharactersConsumed()); 891 m_token->appendToAttributeValue(cc); 892 HTML_ADVANCE_TO(AttributeValueUnquotedState); 893 } 894 } 895 END_STATE() 896 897 HTML_BEGIN_STATE(AttributeValueDoubleQuotedState) { 898 if (cc == '"') { 899 m_token->endAttributeValue(source.numberOfCharactersConsumed()); 900 HTML_ADVANCE_TO(AfterAttributeValueQuotedState); 901 } else if (cc == '&') { 700 ADVANCE_TO(ScriptDataDoubleEscapedState); 701 } 702 if (isASCIIAlpha(character)) { 703 bufferASCIICharacter(character); 704 appendToTemporaryBuffer(convertASCIIAlphaToLower(character)); 705 ADVANCE_TO(ScriptDataDoubleEscapeEndState); 706 } 707 RECONSUME_IN(ScriptDataDoubleEscapedState); 708 END_STATE() 709 710 BEGIN_STATE(BeforeAttributeNameState) 711 if (isTokenizerWhitespace(character)) 712 ADVANCE_TO(BeforeAttributeNameState); 713 if (character == '/') 714 ADVANCE_TO(SelfClosingStartTagState); 715 if (character == '>') 716 return emitAndResumeInDataState(source); 717 if (m_options.usePreHTML5ParserQuirks && character == '<') 718 return emitAndReconsumeInDataState(); 719 if (character == kEndOfFileMarker) { 720 parseError(); 721 RECONSUME_IN(DataState); 722 } 723 if (character == '"' || character == '\'' || character == '<' || character == '=') 724 parseError(); 725 
m_token.beginAttribute(source.numberOfCharactersConsumed()); 726 m_token.appendToAttributeName(toASCIILower(character)); 727 ADVANCE_TO(AttributeNameState); 728 END_STATE() 729 730 BEGIN_STATE(AttributeNameState) 731 if (isTokenizerWhitespace(character)) 732 ADVANCE_TO(AfterAttributeNameState); 733 if (character == '/') 734 ADVANCE_TO(SelfClosingStartTagState); 735 if (character == '=') 736 ADVANCE_TO(BeforeAttributeValueState); 737 if (character == '>') 738 return emitAndResumeInDataState(source); 739 if (m_options.usePreHTML5ParserQuirks && character == '<') 740 return emitAndReconsumeInDataState(); 741 if (character == kEndOfFileMarker) { 742 parseError(); 743 RECONSUME_IN(DataState); 744 } 745 if (character == '"' || character == '\'' || character == '<' || character == '=') 746 parseError(); 747 m_token.appendToAttributeName(toASCIILower(character)); 748 ADVANCE_TO(AttributeNameState); 749 END_STATE() 750 751 BEGIN_STATE(AfterAttributeNameState) 752 if (isTokenizerWhitespace(character)) 753 ADVANCE_TO(AfterAttributeNameState); 754 if (character == '/') 755 ADVANCE_TO(SelfClosingStartTagState); 756 if (character == '=') 757 ADVANCE_TO(BeforeAttributeValueState); 758 if (character == '>') 759 return emitAndResumeInDataState(source); 760 if (m_options.usePreHTML5ParserQuirks && character == '<') 761 return emitAndReconsumeInDataState(); 762 if (character == kEndOfFileMarker) { 763 parseError(); 764 RECONSUME_IN(DataState); 765 } 766 if (character == '"' || character == '\'' || character == '<') 767 parseError(); 768 m_token.beginAttribute(source.numberOfCharactersConsumed()); 769 m_token.appendToAttributeName(toASCIILower(character)); 770 ADVANCE_TO(AttributeNameState); 771 END_STATE() 772 773 BEGIN_STATE(BeforeAttributeValueState) 774 if (isTokenizerWhitespace(character)) 775 ADVANCE_TO(BeforeAttributeValueState); 776 if (character == '"') 777 ADVANCE_TO(AttributeValueDoubleQuotedState); 778 if (character == '&') 779 RECONSUME_IN(AttributeValueUnquotedState); 
780 if (character == '\'') 781 ADVANCE_TO(AttributeValueSingleQuotedState); 782 if (character == '>') { 783 parseError(); 784 return emitAndResumeInDataState(source); 785 } 786 if (character == kEndOfFileMarker) { 787 parseError(); 788 RECONSUME_IN(DataState); 789 } 790 if (character == '<' || character == '=' || character == '`') 791 parseError(); 792 m_token.appendToAttributeValue(character); 793 ADVANCE_TO(AttributeValueUnquotedState); 794 END_STATE() 795 796 BEGIN_STATE(AttributeValueDoubleQuotedState) 797 if (character == '"') { 798 m_token.endAttribute(source.numberOfCharactersConsumed()); 799 ADVANCE_TO(AfterAttributeValueQuotedState); 800 } 801 if (character == '&') { 902 802 m_additionalAllowedCharacter = '"'; 903 HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState); 904 } else if (cc == kEndOfFileMarker) { 905 parseError(); 906 m_token->endAttributeValue(source.numberOfCharactersConsumed()); 907 HTML_RECONSUME_IN(DataState); 908 } else { 909 m_token->appendToAttributeValue(cc); 910 HTML_ADVANCE_TO(AttributeValueDoubleQuotedState); 911 } 912 } 913 END_STATE() 914 915 HTML_BEGIN_STATE(AttributeValueSingleQuotedState) { 916 if (cc == '\'') { 917 m_token->endAttributeValue(source.numberOfCharactersConsumed()); 918 HTML_ADVANCE_TO(AfterAttributeValueQuotedState); 919 } else if (cc == '&') { 803 ADVANCE_TO(CharacterReferenceInAttributeValueState); 804 } 805 if (character == kEndOfFileMarker) { 806 parseError(); 807 m_token.endAttribute(source.numberOfCharactersConsumed()); 808 RECONSUME_IN(DataState); 809 } 810 m_token.appendToAttributeValue(character); 811 ADVANCE_TO(AttributeValueDoubleQuotedState); 812 END_STATE() 813 814 BEGIN_STATE(AttributeValueSingleQuotedState) 815 if (character == '\'') { 816 m_token.endAttribute(source.numberOfCharactersConsumed()); 817 ADVANCE_TO(AfterAttributeValueQuotedState); 818 } 819 if (character == '&') { 920 820 m_additionalAllowedCharacter = '\''; 921 HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState); 922 } else if (cc == 
kEndOfFileMarker) { 923 parseError(); 924 m_token->endAttributeValue(source.numberOfCharactersConsumed()); 925 HTML_RECONSUME_IN(DataState); 926 } else { 927 m_token->appendToAttributeValue(cc); 928 HTML_ADVANCE_TO(AttributeValueSingleQuotedState); 929 } 930 } 931 END_STATE() 932 933 HTML_BEGIN_STATE(AttributeValueUnquotedState) { 934 if (isTokenizerWhitespace(cc)) { 935 m_token->endAttributeValue(source.numberOfCharactersConsumed()); 936 HTML_ADVANCE_TO(BeforeAttributeNameState); 937 } else if (cc == '&') { 821 ADVANCE_TO(CharacterReferenceInAttributeValueState); 822 } 823 if (character == kEndOfFileMarker) { 824 parseError(); 825 m_token.endAttribute(source.numberOfCharactersConsumed()); 826 RECONSUME_IN(DataState); 827 } 828 m_token.appendToAttributeValue(character); 829 ADVANCE_TO(AttributeValueSingleQuotedState); 830 END_STATE() 831 832 BEGIN_STATE(AttributeValueUnquotedState) 833 if (isTokenizerWhitespace(character)) { 834 m_token.endAttribute(source.numberOfCharactersConsumed()); 835 ADVANCE_TO(BeforeAttributeNameState); 836 } 837 if (character == '&') { 938 838 m_additionalAllowedCharacter = '>'; 939 HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState); 940 } else if (cc == '>') { 941 m_token->endAttributeValue(source.numberOfCharactersConsumed()); 942 return emitAndResumeIn(source, HTMLTokenizer::DataState); 943 } else if (cc == kEndOfFileMarker) { 944 parseError(); 945 m_token->endAttributeValue(source.numberOfCharactersConsumed()); 946 HTML_RECONSUME_IN(DataState); 947 } else { 948 if (cc == '"' || cc == '\'' || cc == '<' || cc == '=' || cc == '`') 949 parseError(); 950 m_token->appendToAttributeValue(cc); 951 HTML_ADVANCE_TO(AttributeValueUnquotedState); 952 } 953 } 954 END_STATE() 955 956 HTML_BEGIN_STATE(CharacterReferenceInAttributeValueState) { 839 ADVANCE_TO(CharacterReferenceInAttributeValueState); 840 } 841 if (character == '>') { 842 m_token.endAttribute(source.numberOfCharactersConsumed()); 843 return emitAndResumeInDataState(source); 844 } 845 if (character == 
kEndOfFileMarker) { 846 parseError(); 847 m_token.endAttribute(source.numberOfCharactersConsumed()); 848 RECONSUME_IN(DataState); 849 } 850 if (character == '"' || character == '\'' || character == '<' || character == '=' || character == '`') 851 parseError(); 852 m_token.appendToAttributeValue(character); 853 ADVANCE_TO(AttributeValueUnquotedState); 854 END_STATE() 855 856 BEGIN_STATE(CharacterReferenceInAttributeValueState) 957 857 bool notEnoughCharacters = false; 958 858 StringBuilder decodedEntity; 959 859 bool success = consumeHTMLEntity(source, decodedEntity, notEnoughCharacters, m_additionalAllowedCharacter); 960 860 if (notEnoughCharacters) 961 return haveBufferedCharacterToken(); 861 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken()); 962 862 if (!success) { 963 863 ASSERT(decodedEntity.isEmpty()); 964 m_token->appendToAttributeValue('&'); 864 m_token.appendToAttributeValue('&'); 965 865 } else { 966 866 for (unsigned i = 0; i < decodedEntity.length(); ++i) 967 m_token->appendToAttributeValue(decodedEntity[i]); 867 m_token.appendToAttributeValue(decodedEntity[i]); 968 868 } 969 869 // We're supposed to switch back to the attribute value state that … … 972 872 // state can be determined by m_additionalAllowedCharacter. 
973 873 if (m_additionalAllowedCharacter == '"') 974 HTML_SWITCH_TO(AttributeValueDoubleQuotedState); 975 else if (m_additionalAllowedCharacter == '\'') 976 HTML_SWITCH_TO(AttributeValueSingleQuotedState); 977 else if (m_additionalAllowedCharacter == '>') 978 HTML_SWITCH_TO(AttributeValueUnquotedState); 979 else 980 ASSERT_NOT_REACHED(); 981 } 982 END_STATE() 983 984 HTML_BEGIN_STATE(AfterAttributeValueQuotedState) { 985 if (isTokenizerWhitespace(cc)) 986 HTML_ADVANCE_TO(BeforeAttributeNameState); 987 else if (cc == '/') 988 HTML_ADVANCE_TO(SelfClosingStartTagState); 989 else if (cc == '>') 990 return emitAndResumeIn(source, HTMLTokenizer::DataState); 991 else if (m_options.usePreHTML5ParserQuirks && cc == '<') 992 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 993 else if (cc == kEndOfFileMarker) { 994 parseError(); 995 HTML_RECONSUME_IN(DataState); 996 } else { 997 parseError(); 998 HTML_RECONSUME_IN(BeforeAttributeNameState); 999 } 1000 } 1001 END_STATE() 1002 1003 HTML_BEGIN_STATE(SelfClosingStartTagState) { 1004 if (cc == '>') { 1005 m_token->setSelfClosing(); 1006 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1007 } else if (cc == kEndOfFileMarker) { 1008 parseError(); 1009 HTML_RECONSUME_IN(DataState); 1010 } else { 1011 parseError(); 1012 HTML_RECONSUME_IN(BeforeAttributeNameState); 1013 } 1014 } 1015 END_STATE() 1016 1017 HTML_BEGIN_STATE(BogusCommentState) { 1018 m_token->beginComment(); 1019 HTML_RECONSUME_IN(ContinueBogusCommentState); 1020 } 1021 END_STATE() 1022 1023 HTML_BEGIN_STATE(ContinueBogusCommentState) { 1024 if (cc == '>') 1025 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1026 else if (cc == kEndOfFileMarker) 1027 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1028 else { 1029 m_token->appendToComment(cc); 1030 HTML_ADVANCE_TO(ContinueBogusCommentState); 1031 } 1032 } 1033 END_STATE() 1034 1035 HTML_BEGIN_STATE(MarkupDeclarationOpenState) { 1036 DEPRECATED_DEFINE_STATIC_LOCAL(String, 
dashDashString, (ASCIILiteral("--"))); 1037 DEPRECATED_DEFINE_STATIC_LOCAL(String, doctypeString, (ASCIILiteral("doctype"))); 1038 DEPRECATED_DEFINE_STATIC_LOCAL(String, cdataString, (ASCIILiteral("[CDATA["))); 1039 if (cc == '-') { 1040 SegmentedString::LookAheadResult result = source.lookAhead(dashDashString); 874 SWITCH_TO(AttributeValueDoubleQuotedState); 875 if (m_additionalAllowedCharacter == '\'') 876 SWITCH_TO(AttributeValueSingleQuotedState); 877 ASSERT(m_additionalAllowedCharacter == '>'); 878 SWITCH_TO(AttributeValueUnquotedState); 879 END_STATE() 880 881 BEGIN_STATE(AfterAttributeValueQuotedState) 882 if (isTokenizerWhitespace(character)) 883 ADVANCE_TO(BeforeAttributeNameState); 884 if (character == '/') 885 ADVANCE_TO(SelfClosingStartTagState); 886 if (character == '>') 887 return emitAndResumeInDataState(source); 888 if (m_options.usePreHTML5ParserQuirks && character == '<') 889 return emitAndReconsumeInDataState(); 890 if (character == kEndOfFileMarker) { 891 parseError(); 892 RECONSUME_IN(DataState); 893 } 894 parseError(); 895 RECONSUME_IN(BeforeAttributeNameState); 896 END_STATE() 897 898 BEGIN_STATE(SelfClosingStartTagState) 899 if (character == '>') { 900 m_token.setSelfClosing(); 901 return emitAndResumeInDataState(source); 902 } 903 if (character == kEndOfFileMarker) { 904 parseError(); 905 RECONSUME_IN(DataState); 906 } 907 parseError(); 908 RECONSUME_IN(BeforeAttributeNameState); 909 END_STATE() 910 911 BEGIN_STATE(BogusCommentState) 912 m_token.beginComment(); 913 RECONSUME_IN(ContinueBogusCommentState); 914 END_STATE() 915 916 BEGIN_STATE(ContinueBogusCommentState) 917 if (character == '>') 918 return emitAndResumeInDataState(source); 919 if (character == kEndOfFileMarker) 920 return emitAndReconsumeInDataState(); 921 m_token.appendToComment(character); 922 ADVANCE_TO(ContinueBogusCommentState); 923 END_STATE() 924 925 BEGIN_STATE(MarkupDeclarationOpenState) 926 if (character == '-') { 927 auto result = source.advancePast("--"); 1041 928 
if (result == SegmentedString::DidMatch) { 1042 source.advanceAndASSERT('-'); 1043 source.advanceAndASSERT('-'); 1044 m_token->beginComment(); 1045 HTML_SWITCH_TO(CommentStartState); 1046 } else if (result == SegmentedString::NotEnoughCharacters) 1047 return haveBufferedCharacterToken(); 1048 } else if (cc == 'D' || cc == 'd') { 1049 SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(doctypeString); 1050 if (result == SegmentedString::DidMatch) { 1051 advanceStringAndASSERTIgnoringCase(source, "doctype"); 1052 HTML_SWITCH_TO(DOCTYPEState); 1053 } else if (result == SegmentedString::NotEnoughCharacters) 1054 return haveBufferedCharacterToken(); 1055 } else if (cc == '[' && shouldAllowCDATA()) { 1056 SegmentedString::LookAheadResult result = source.lookAhead(cdataString); 1057 if (result == SegmentedString::DidMatch) { 1058 advanceStringAndASSERT(source, "[CDATA["); 1059 HTML_SWITCH_TO(CDATASectionState); 1060 } else if (result == SegmentedString::NotEnoughCharacters) 1061 return haveBufferedCharacterToken(); 929 m_token.beginComment(); 930 SWITCH_TO(CommentStartState); 931 } 932 if (result == SegmentedString::NotEnoughCharacters) 933 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken()); 934 } else if (isASCIIAlphaCaselessEqual(character, 'd')) { 935 auto result = source.advancePastIgnoringCase("doctype"); 936 if (result == SegmentedString::DidMatch) 937 SWITCH_TO(DOCTYPEState); 938 if (result == SegmentedString::NotEnoughCharacters) 939 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken()); 940 } else if (character == '[' && shouldAllowCDATA()) { 941 auto result = source.advancePast("[CDATA["); 942 if (result == SegmentedString::DidMatch) 943 SWITCH_TO(CDATASectionState); 944 if (result == SegmentedString::NotEnoughCharacters) 945 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken()); 1062 946 } 1063 947 parseError(); 1064 HTML_RECONSUME_IN(BogusCommentState); 1065 } 1066 END_STATE() 1067 1068 HTML_BEGIN_STATE(CommentStartState) { 1069 if (cc == 
'-') 1070 HTML_ADVANCE_TO(CommentStartDashState); 1071 else if (cc == '>') { 1072 parseError(); 1073 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1074 } else if (cc == kEndOfFileMarker) { 1075 parseError(); 1076 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1077 } else { 1078 m_token->appendToComment(cc); 1079 HTML_ADVANCE_TO(CommentState); 1080 } 1081 } 1082 END_STATE() 1083 1084 HTML_BEGIN_STATE(CommentStartDashState) { 1085 if (cc == '-') 1086 HTML_ADVANCE_TO(CommentEndState); 1087 else if (cc == '>') { 1088 parseError(); 1089 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1090 } else if (cc == kEndOfFileMarker) { 1091 parseError(); 1092 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1093 } else { 1094 m_token->appendToComment('-'); 1095 m_token->appendToComment(cc); 1096 HTML_ADVANCE_TO(CommentState); 1097 } 1098 } 1099 END_STATE() 1100 1101 HTML_BEGIN_STATE(CommentState) { 1102 if (cc == '-') 1103 HTML_ADVANCE_TO(CommentEndDashState); 1104 else if (cc == kEndOfFileMarker) { 1105 parseError(); 1106 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1107 } else { 1108 m_token->appendToComment(cc); 1109 HTML_ADVANCE_TO(CommentState); 1110 } 1111 } 1112 END_STATE() 1113 1114 HTML_BEGIN_STATE(CommentEndDashState) { 1115 if (cc == '-') 1116 HTML_ADVANCE_TO(CommentEndState); 1117 else if (cc == kEndOfFileMarker) { 1118 parseError(); 1119 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1120 } else { 1121 m_token->appendToComment('-'); 1122 m_token->appendToComment(cc); 1123 HTML_ADVANCE_TO(CommentState); 1124 } 1125 } 1126 END_STATE() 1127 1128 HTML_BEGIN_STATE(CommentEndState) { 1129 if (cc == '>') 1130 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1131 else if (cc == '!') { 1132 parseError(); 1133 HTML_ADVANCE_TO(CommentEndBangState); 1134 } else if (cc == '-') { 1135 parseError(); 1136 m_token->appendToComment('-'); 1137 HTML_ADVANCE_TO(CommentEndState); 1138 } else if (cc == 
kEndOfFileMarker) { 1139 parseError(); 1140 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1141 } else { 1142 parseError(); 1143 m_token->appendToComment('-'); 1144 m_token->appendToComment('-'); 1145 m_token->appendToComment(cc); 1146 HTML_ADVANCE_TO(CommentState); 1147 } 1148 } 1149 END_STATE() 1150 1151 HTML_BEGIN_STATE(CommentEndBangState) { 1152 if (cc == '-') { 1153 m_token->appendToComment('-'); 1154 m_token->appendToComment('-'); 1155 m_token->appendToComment('!'); 1156 HTML_ADVANCE_TO(CommentEndDashState); 1157 } else if (cc == '>') 1158 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1159 else if (cc == kEndOfFileMarker) { 1160 parseError(); 1161 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1162 } else { 1163 m_token->appendToComment('-'); 1164 m_token->appendToComment('-'); 1165 m_token->appendToComment('!'); 1166 m_token->appendToComment(cc); 1167 HTML_ADVANCE_TO(CommentState); 1168 } 1169 } 1170 END_STATE() 1171 1172 HTML_BEGIN_STATE(DOCTYPEState) { 1173 if (isTokenizerWhitespace(cc)) 1174 HTML_ADVANCE_TO(BeforeDOCTYPENameState); 1175 else if (cc == kEndOfFileMarker) { 1176 parseError(); 1177 m_token->beginDOCTYPE(); 1178 m_token->setForceQuirks(); 1179 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1180 } else { 1181 parseError(); 1182 HTML_RECONSUME_IN(BeforeDOCTYPENameState); 1183 } 1184 } 1185 END_STATE() 1186 1187 HTML_BEGIN_STATE(BeforeDOCTYPENameState) { 1188 if (isTokenizerWhitespace(cc)) 1189 HTML_ADVANCE_TO(BeforeDOCTYPENameState); 1190 else if (isASCIIUpper(cc)) { 1191 m_token->beginDOCTYPE(toLowerCase(cc)); 1192 HTML_ADVANCE_TO(DOCTYPENameState); 1193 } else if (cc == '>') { 1194 parseError(); 1195 m_token->beginDOCTYPE(); 1196 m_token->setForceQuirks(); 1197 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1198 } else if (cc == kEndOfFileMarker) { 1199 parseError(); 1200 m_token->beginDOCTYPE(); 1201 m_token->setForceQuirks(); 1202 return emitAndReconsumeIn(source, 
HTMLTokenizer::DataState); 1203 } else { 1204 m_token->beginDOCTYPE(cc); 1205 HTML_ADVANCE_TO(DOCTYPENameState); 1206 } 1207 } 1208 END_STATE() 1209 1210 HTML_BEGIN_STATE(DOCTYPENameState) { 1211 if (isTokenizerWhitespace(cc)) 1212 HTML_ADVANCE_TO(AfterDOCTYPENameState); 1213 else if (cc == '>') 1214 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1215 else if (isASCIIUpper(cc)) { 1216 m_token->appendToName(toLowerCase(cc)); 1217 HTML_ADVANCE_TO(DOCTYPENameState); 1218 } else if (cc == kEndOfFileMarker) { 1219 parseError(); 1220 m_token->setForceQuirks(); 1221 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1222 } else { 1223 m_token->appendToName(cc); 1224 HTML_ADVANCE_TO(DOCTYPENameState); 1225 } 1226 } 1227 END_STATE() 1228 1229 HTML_BEGIN_STATE(AfterDOCTYPENameState) { 1230 if (isTokenizerWhitespace(cc)) 1231 HTML_ADVANCE_TO(AfterDOCTYPENameState); 1232 if (cc == '>') 1233 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1234 else if (cc == kEndOfFileMarker) { 1235 parseError(); 1236 m_token->setForceQuirks(); 1237 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1238 } else { 1239 DEPRECATED_DEFINE_STATIC_LOCAL(String, publicString, (ASCIILiteral("public"))); 1240 DEPRECATED_DEFINE_STATIC_LOCAL(String, systemString, (ASCIILiteral("system"))); 1241 if (cc == 'P' || cc == 'p') { 1242 SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(publicString); 1243 if (result == SegmentedString::DidMatch) { 1244 advanceStringAndASSERTIgnoringCase(source, "public"); 1245 HTML_SWITCH_TO(AfterDOCTYPEPublicKeywordState); 1246 } else if (result == SegmentedString::NotEnoughCharacters) 1247 return haveBufferedCharacterToken(); 1248 } else if (cc == 'S' || cc == 's') { 1249 SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(systemString); 1250 if (result == SegmentedString::DidMatch) { 1251 advanceStringAndASSERTIgnoringCase(source, "system"); 1252 HTML_SWITCH_TO(AfterDOCTYPESystemKeywordState); 
1253 } else if (result == SegmentedString::NotEnoughCharacters) 1254 return haveBufferedCharacterToken(); 1255 } 1256 parseError(); 1257 m_token->setForceQuirks(); 1258 HTML_ADVANCE_TO(BogusDOCTYPEState); 1259 } 1260 } 1261 END_STATE() 1262 1263 HTML_BEGIN_STATE(AfterDOCTYPEPublicKeywordState) { 1264 if (isTokenizerWhitespace(cc)) 1265 HTML_ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState); 1266 else if (cc == '"') { 1267 parseError(); 1268 m_token->setPublicIdentifierToEmptyString(); 1269 HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState); 1270 } else if (cc == '\'') { 1271 parseError(); 1272 m_token->setPublicIdentifierToEmptyString(); 1273 HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState); 1274 } else if (cc == '>') { 1275 parseError(); 1276 m_token->setForceQuirks(); 1277 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1278 } else if (cc == kEndOfFileMarker) { 1279 parseError(); 1280 m_token->setForceQuirks(); 1281 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1282 } else { 1283 parseError(); 1284 m_token->setForceQuirks(); 1285 HTML_ADVANCE_TO(BogusDOCTYPEState); 1286 } 1287 } 1288 END_STATE() 1289 1290 HTML_BEGIN_STATE(BeforeDOCTYPEPublicIdentifierState) { 1291 if (isTokenizerWhitespace(cc)) 1292 HTML_ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState); 1293 else if (cc == '"') { 1294 m_token->setPublicIdentifierToEmptyString(); 1295 HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState); 1296 } else if (cc == '\'') { 1297 m_token->setPublicIdentifierToEmptyString(); 1298 HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState); 1299 } else if (cc == '>') { 1300 parseError(); 1301 m_token->setForceQuirks(); 1302 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1303 } else if (cc == kEndOfFileMarker) { 1304 parseError(); 1305 m_token->setForceQuirks(); 1306 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1307 } else { 1308 parseError(); 1309 m_token->setForceQuirks(); 1310 
HTML_ADVANCE_TO(BogusDOCTYPEState); 1311 } 1312 } 1313 END_STATE() 1314 1315 HTML_BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState) { 1316 if (cc == '"') 1317 HTML_ADVANCE_TO(AfterDOCTYPEPublicIdentifierState); 1318 else if (cc == '>') { 1319 parseError(); 1320 m_token->setForceQuirks(); 1321 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1322 } else if (cc == kEndOfFileMarker) { 1323 parseError(); 1324 m_token->setForceQuirks(); 1325 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1326 } else { 1327 m_token->appendToPublicIdentifier(cc); 1328 HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState); 1329 } 1330 } 1331 END_STATE() 1332 1333 HTML_BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState) { 1334 if (cc == '\'') 1335 HTML_ADVANCE_TO(AfterDOCTYPEPublicIdentifierState); 1336 else if (cc == '>') { 1337 parseError(); 1338 m_token->setForceQuirks(); 1339 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1340 } else if (cc == kEndOfFileMarker) { 1341 parseError(); 1342 m_token->setForceQuirks(); 1343 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1344 } else { 1345 m_token->appendToPublicIdentifier(cc); 1346 HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState); 1347 } 1348 } 1349 END_STATE() 1350 1351 HTML_BEGIN_STATE(AfterDOCTYPEPublicIdentifierState) { 1352 if (isTokenizerWhitespace(cc)) 1353 HTML_ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState); 1354 else if (cc == '>') 1355 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1356 else if (cc == '"') { 1357 parseError(); 1358 m_token->setSystemIdentifierToEmptyString(); 1359 HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState); 1360 } else if (cc == '\'') { 1361 parseError(); 1362 m_token->setSystemIdentifierToEmptyString(); 1363 HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState); 1364 } else if (cc == kEndOfFileMarker) { 1365 parseError(); 1366 m_token->setForceQuirks(); 1367 return emitAndReconsumeIn(source, 
HTMLTokenizer::DataState); 1368 } else { 1369 parseError(); 1370 m_token->setForceQuirks(); 1371 HTML_ADVANCE_TO(BogusDOCTYPEState); 1372 } 1373 } 1374 END_STATE() 1375 1376 HTML_BEGIN_STATE(BetweenDOCTYPEPublicAndSystemIdentifiersState) { 1377 if (isTokenizerWhitespace(cc)) 1378 HTML_ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState); 1379 else if (cc == '>') 1380 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1381 else if (cc == '"') { 1382 m_token->setSystemIdentifierToEmptyString(); 1383 HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState); 1384 } else if (cc == '\'') { 1385 m_token->setSystemIdentifierToEmptyString(); 1386 HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState); 1387 } else if (cc == kEndOfFileMarker) { 1388 parseError(); 1389 m_token->setForceQuirks(); 1390 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1391 } else { 1392 parseError(); 1393 m_token->setForceQuirks(); 1394 HTML_ADVANCE_TO(BogusDOCTYPEState); 1395 } 1396 } 1397 END_STATE() 1398 1399 HTML_BEGIN_STATE(AfterDOCTYPESystemKeywordState) { 1400 if (isTokenizerWhitespace(cc)) 1401 HTML_ADVANCE_TO(BeforeDOCTYPESystemIdentifierState); 1402 else if (cc == '"') { 1403 parseError(); 1404 m_token->setSystemIdentifierToEmptyString(); 1405 HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState); 1406 } else if (cc == '\'') { 1407 parseError(); 1408 m_token->setSystemIdentifierToEmptyString(); 1409 HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState); 1410 } else if (cc == '>') { 1411 parseError(); 1412 m_token->setForceQuirks(); 1413 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1414 } else if (cc == kEndOfFileMarker) { 1415 parseError(); 1416 m_token->setForceQuirks(); 1417 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1418 } else { 1419 parseError(); 1420 m_token->setForceQuirks(); 1421 HTML_ADVANCE_TO(BogusDOCTYPEState); 1422 } 1423 } 1424 END_STATE() 1425 1426 HTML_BEGIN_STATE(BeforeDOCTYPESystemIdentifierState) { 
1427 if (isTokenizerWhitespace(cc)) 1428 HTML_ADVANCE_TO(BeforeDOCTYPESystemIdentifierState); 1429 if (cc == '"') { 1430 m_token->setSystemIdentifierToEmptyString(); 1431 HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState); 1432 } else if (cc == '\'') { 1433 m_token->setSystemIdentifierToEmptyString(); 1434 HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState); 1435 } else if (cc == '>') { 1436 parseError(); 1437 m_token->setForceQuirks(); 1438 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1439 } else if (cc == kEndOfFileMarker) { 1440 parseError(); 1441 m_token->setForceQuirks(); 1442 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1443 } else { 1444 parseError(); 1445 m_token->setForceQuirks(); 1446 HTML_ADVANCE_TO(BogusDOCTYPEState); 1447 } 1448 } 1449 END_STATE() 1450 1451 HTML_BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState) { 1452 if (cc == '"') 1453 HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState); 1454 else if (cc == '>') { 1455 parseError(); 1456 m_token->setForceQuirks(); 1457 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1458 } else if (cc == kEndOfFileMarker) { 1459 parseError(); 1460 m_token->setForceQuirks(); 1461 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1462 } else { 1463 m_token->appendToSystemIdentifier(cc); 1464 HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState); 1465 } 1466 } 1467 END_STATE() 1468 1469 HTML_BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState) { 1470 if (cc == '\'') 1471 HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState); 1472 else if (cc == '>') { 1473 parseError(); 1474 m_token->setForceQuirks(); 1475 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1476 } else if (cc == kEndOfFileMarker) { 1477 parseError(); 1478 m_token->setForceQuirks(); 1479 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1480 } else { 1481 m_token->appendToSystemIdentifier(cc); 1482 HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState); 1483 } 1484 } 
1485 END_STATE() 1486 1487 HTML_BEGIN_STATE(AfterDOCTYPESystemIdentifierState) { 1488 if (isTokenizerWhitespace(cc)) 1489 HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState); 1490 else if (cc == '>') 1491 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1492 else if (cc == kEndOfFileMarker) { 1493 parseError(); 1494 m_token->setForceQuirks(); 1495 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1496 } else { 1497 parseError(); 1498 HTML_ADVANCE_TO(BogusDOCTYPEState); 1499 } 1500 } 1501 END_STATE() 1502 1503 HTML_BEGIN_STATE(BogusDOCTYPEState) { 1504 if (cc == '>') 1505 return emitAndResumeIn(source, HTMLTokenizer::DataState); 1506 else if (cc == kEndOfFileMarker) 1507 return emitAndReconsumeIn(source, HTMLTokenizer::DataState); 1508 HTML_ADVANCE_TO(BogusDOCTYPEState); 1509 } 1510 END_STATE() 1511 1512 HTML_BEGIN_STATE(CDATASectionState) { 1513 if (cc == ']') 1514 HTML_ADVANCE_TO(CDATASectionRightSquareBracketState); 1515 else if (cc == kEndOfFileMarker) 1516 HTML_RECONSUME_IN(DataState); 1517 else { 1518 bufferCharacter(cc); 1519 HTML_ADVANCE_TO(CDATASectionState); 1520 } 1521 } 1522 END_STATE() 1523 1524 HTML_BEGIN_STATE(CDATASectionRightSquareBracketState) { 1525 if (cc == ']') 1526 HTML_ADVANCE_TO(CDATASectionDoubleRightSquareBracketState); 1527 else { 1528 bufferASCIICharacter(']'); 1529 HTML_RECONSUME_IN(CDATASectionState); 1530 } 1531 } 1532 1533 HTML_BEGIN_STATE(CDATASectionDoubleRightSquareBracketState) { 1534 if (cc == '>') 1535 HTML_ADVANCE_TO(DataState); 1536 else { 1537 bufferASCIICharacter(']'); 1538 bufferASCIICharacter(']'); 1539 HTML_RECONSUME_IN(CDATASectionState); 1540 } 1541 } 948 RECONSUME_IN(BogusCommentState); 949 END_STATE() 950 951 BEGIN_STATE(CommentStartState) 952 if (character == '-') 953 ADVANCE_TO(CommentStartDashState); 954 if (character == '>') { 955 parseError(); 956 return emitAndResumeInDataState(source); 957 } 958 if (character == kEndOfFileMarker) { 959 parseError(); 960 return emitAndReconsumeInDataState(); 
961 } 962 m_token.appendToComment(character); 963 ADVANCE_TO(CommentState); 964 END_STATE() 965 966 BEGIN_STATE(CommentStartDashState) 967 if (character == '-') 968 ADVANCE_TO(CommentEndState); 969 if (character == '>') { 970 parseError(); 971 return emitAndResumeInDataState(source); 972 } 973 if (character == kEndOfFileMarker) { 974 parseError(); 975 return emitAndReconsumeInDataState(); 976 } 977 m_token.appendToComment('-'); 978 m_token.appendToComment(character); 979 ADVANCE_TO(CommentState); 980 END_STATE() 981 982 BEGIN_STATE(CommentState) 983 if (character == '-') 984 ADVANCE_TO(CommentEndDashState); 985 if (character == kEndOfFileMarker) { 986 parseError(); 987 return emitAndReconsumeInDataState(); 988 } 989 m_token.appendToComment(character); 990 ADVANCE_TO(CommentState); 991 END_STATE() 992 993 BEGIN_STATE(CommentEndDashState) 994 if (character == '-') 995 ADVANCE_TO(CommentEndState); 996 if (character == kEndOfFileMarker) { 997 parseError(); 998 return emitAndReconsumeInDataState(); 999 } 1000 m_token.appendToComment('-'); 1001 m_token.appendToComment(character); 1002 ADVANCE_TO(CommentState); 1003 END_STATE() 1004 1005 BEGIN_STATE(CommentEndState) 1006 if (character == '>') 1007 return emitAndResumeInDataState(source); 1008 if (character == '!') { 1009 parseError(); 1010 ADVANCE_TO(CommentEndBangState); 1011 } 1012 if (character == '-') { 1013 parseError(); 1014 m_token.appendToComment('-'); 1015 ADVANCE_TO(CommentEndState); 1016 } 1017 if (character == kEndOfFileMarker) { 1018 parseError(); 1019 return emitAndReconsumeInDataState(); 1020 } 1021 parseError(); 1022 m_token.appendToComment('-'); 1023 m_token.appendToComment('-'); 1024 m_token.appendToComment(character); 1025 ADVANCE_TO(CommentState); 1026 END_STATE() 1027 1028 BEGIN_STATE(CommentEndBangState) 1029 if (character == '-') { 1030 m_token.appendToComment('-'); 1031 m_token.appendToComment('-'); 1032 m_token.appendToComment('!'); 1033 ADVANCE_TO(CommentEndDashState); 1034 } 1035 if (character 
== '>') 1036 return emitAndResumeInDataState(source); 1037 if (character == kEndOfFileMarker) { 1038 parseError(); 1039 return emitAndReconsumeInDataState(); 1040 } 1041 m_token.appendToComment('-'); 1042 m_token.appendToComment('-'); 1043 m_token.appendToComment('!'); 1044 m_token.appendToComment(character); 1045 ADVANCE_TO(CommentState); 1046 END_STATE() 1047 1048 BEGIN_STATE(DOCTYPEState) 1049 if (isTokenizerWhitespace(character)) 1050 ADVANCE_TO(BeforeDOCTYPENameState); 1051 if (character == kEndOfFileMarker) { 1052 parseError(); 1053 m_token.beginDOCTYPE(); 1054 m_token.setForceQuirks(); 1055 return emitAndReconsumeInDataState(); 1056 } 1057 parseError(); 1058 RECONSUME_IN(BeforeDOCTYPENameState); 1059 END_STATE() 1060 1061 BEGIN_STATE(BeforeDOCTYPENameState) 1062 if (isTokenizerWhitespace(character)) 1063 ADVANCE_TO(BeforeDOCTYPENameState); 1064 if (character == '>') { 1065 parseError(); 1066 m_token.beginDOCTYPE(); 1067 m_token.setForceQuirks(); 1068 return emitAndResumeInDataState(source); 1069 } 1070 if (character == kEndOfFileMarker) { 1071 parseError(); 1072 m_token.beginDOCTYPE(); 1073 m_token.setForceQuirks(); 1074 return emitAndReconsumeInDataState(); 1075 } 1076 m_token.beginDOCTYPE(toASCIILower(character)); 1077 ADVANCE_TO(DOCTYPENameState); 1078 END_STATE() 1079 1080 BEGIN_STATE(DOCTYPENameState) 1081 if (isTokenizerWhitespace(character)) 1082 ADVANCE_TO(AfterDOCTYPENameState); 1083 if (character == '>') 1084 return emitAndResumeInDataState(source); 1085 if (character == kEndOfFileMarker) { 1086 parseError(); 1087 m_token.setForceQuirks(); 1088 return emitAndReconsumeInDataState(); 1089 } 1090 m_token.appendToName(toASCIILower(character)); 1091 ADVANCE_TO(DOCTYPENameState); 1092 END_STATE() 1093 1094 BEGIN_STATE(AfterDOCTYPENameState) 1095 if (isTokenizerWhitespace(character)) 1096 ADVANCE_TO(AfterDOCTYPENameState); 1097 if (character == '>') 1098 return emitAndResumeInDataState(source); 1099 if (character == kEndOfFileMarker) { 1100 parseError(); 
1101 m_token.setForceQuirks(); 1102 return emitAndReconsumeInDataState(); 1103 } 1104 if (isASCIIAlphaCaselessEqual(character, 'p')) { 1105 auto result = source.advancePastIgnoringCase("public"); 1106 if (result == SegmentedString::DidMatch) 1107 SWITCH_TO(AfterDOCTYPEPublicKeywordState); 1108 if (result == SegmentedString::NotEnoughCharacters) 1109 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken()); 1110 } else if (isASCIIAlphaCaselessEqual(character, 's')) { 1111 auto result = source.advancePastIgnoringCase("system"); 1112 if (result == SegmentedString::DidMatch) 1113 SWITCH_TO(AfterDOCTYPESystemKeywordState); 1114 if (result == SegmentedString::NotEnoughCharacters) 1115 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken()); 1116 } 1117 parseError(); 1118 m_token.setForceQuirks(); 1119 ADVANCE_TO(BogusDOCTYPEState); 1120 END_STATE() 1121 1122 BEGIN_STATE(AfterDOCTYPEPublicKeywordState) 1123 if (isTokenizerWhitespace(character)) 1124 ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState); 1125 if (character == '"') { 1126 parseError(); 1127 m_token.setPublicIdentifierToEmptyString(); 1128 ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState); 1129 } 1130 if (character == '\'') { 1131 parseError(); 1132 m_token.setPublicIdentifierToEmptyString(); 1133 ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState); 1134 } 1135 if (character == '>') { 1136 parseError(); 1137 m_token.setForceQuirks(); 1138 return emitAndResumeInDataState(source); 1139 } 1140 if (character == kEndOfFileMarker) { 1141 parseError(); 1142 m_token.setForceQuirks(); 1143 return emitAndReconsumeInDataState(); 1144 } 1145 parseError(); 1146 m_token.setForceQuirks(); 1147 ADVANCE_TO(BogusDOCTYPEState); 1148 END_STATE() 1149 1150 BEGIN_STATE(BeforeDOCTYPEPublicIdentifierState) 1151 if (isTokenizerWhitespace(character)) 1152 ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState); 1153 if (character == '"') { 1154 m_token.setPublicIdentifierToEmptyString(); 1155 ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState); 1156 
} 1157 if (character == '\'') { 1158 m_token.setPublicIdentifierToEmptyString(); 1159 ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState); 1160 } 1161 if (character == '>') { 1162 parseError(); 1163 m_token.setForceQuirks(); 1164 return emitAndResumeInDataState(source); 1165 } 1166 if (character == kEndOfFileMarker) { 1167 parseError(); 1168 m_token.setForceQuirks(); 1169 return emitAndReconsumeInDataState(); 1170 } 1171 parseError(); 1172 m_token.setForceQuirks(); 1173 ADVANCE_TO(BogusDOCTYPEState); 1174 END_STATE() 1175 1176 BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState) 1177 if (character == '"') 1178 ADVANCE_TO(AfterDOCTYPEPublicIdentifierState); 1179 if (character == '>') { 1180 parseError(); 1181 m_token.setForceQuirks(); 1182 return emitAndResumeInDataState(source); 1183 } 1184 if (character == kEndOfFileMarker) { 1185 parseError(); 1186 m_token.setForceQuirks(); 1187 return emitAndReconsumeInDataState(); 1188 } 1189 m_token.appendToPublicIdentifier(character); 1190 ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState); 1191 END_STATE() 1192 1193 BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState) 1194 if (character == '\'') 1195 ADVANCE_TO(AfterDOCTYPEPublicIdentifierState); 1196 if (character == '>') { 1197 parseError(); 1198 m_token.setForceQuirks(); 1199 return emitAndResumeInDataState(source); 1200 } 1201 if (character == kEndOfFileMarker) { 1202 parseError(); 1203 m_token.setForceQuirks(); 1204 return emitAndReconsumeInDataState(); 1205 } 1206 m_token.appendToPublicIdentifier(character); 1207 ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState); 1208 END_STATE() 1209 1210 BEGIN_STATE(AfterDOCTYPEPublicIdentifierState) 1211 if (isTokenizerWhitespace(character)) 1212 ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState); 1213 if (character == '>') 1214 return emitAndResumeInDataState(source); 1215 if (character == '"') { 1216 parseError(); 1217 m_token.setSystemIdentifierToEmptyString(); 1218 
ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState); 1219 } 1220 if (character == '\'') { 1221 parseError(); 1222 m_token.setSystemIdentifierToEmptyString(); 1223 ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState); 1224 } 1225 if (character == kEndOfFileMarker) { 1226 parseError(); 1227 m_token.setForceQuirks(); 1228 return emitAndReconsumeInDataState(); 1229 } 1230 parseError(); 1231 m_token.setForceQuirks(); 1232 ADVANCE_TO(BogusDOCTYPEState); 1233 END_STATE() 1234 1235 BEGIN_STATE(BetweenDOCTYPEPublicAndSystemIdentifiersState) 1236 if (isTokenizerWhitespace(character)) 1237 ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState); 1238 if (character == '>') 1239 return emitAndResumeInDataState(source); 1240 if (character == '"') { 1241 m_token.setSystemIdentifierToEmptyString(); 1242 ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState); 1243 } 1244 if (character == '\'') { 1245 m_token.setSystemIdentifierToEmptyString(); 1246 ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState); 1247 } 1248 if (character == kEndOfFileMarker) { 1249 parseError(); 1250 m_token.setForceQuirks(); 1251 return emitAndReconsumeInDataState(); 1252 } 1253 parseError(); 1254 m_token.setForceQuirks(); 1255 ADVANCE_TO(BogusDOCTYPEState); 1256 END_STATE() 1257 1258 BEGIN_STATE(AfterDOCTYPESystemKeywordState) 1259 if (isTokenizerWhitespace(character)) 1260 ADVANCE_TO(BeforeDOCTYPESystemIdentifierState); 1261 if (character == '"') { 1262 parseError(); 1263 m_token.setSystemIdentifierToEmptyString(); 1264 ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState); 1265 } 1266 if (character == '\'') { 1267 parseError(); 1268 m_token.setSystemIdentifierToEmptyString(); 1269 ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState); 1270 } 1271 if (character == '>') { 1272 parseError(); 1273 m_token.setForceQuirks(); 1274 return emitAndResumeInDataState(source); 1275 } 1276 if (character == kEndOfFileMarker) { 1277 parseError(); 1278 m_token.setForceQuirks(); 1279 return emitAndReconsumeInDataState(); 
1280 } 1281 parseError(); 1282 m_token.setForceQuirks(); 1283 ADVANCE_TO(BogusDOCTYPEState); 1284 END_STATE() 1285 1286 BEGIN_STATE(BeforeDOCTYPESystemIdentifierState) 1287 if (isTokenizerWhitespace(character)) 1288 ADVANCE_TO(BeforeDOCTYPESystemIdentifierState); 1289 if (character == '"') { 1290 m_token.setSystemIdentifierToEmptyString(); 1291 ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState); 1292 } 1293 if (character == '\'') { 1294 m_token.setSystemIdentifierToEmptyString(); 1295 ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState); 1296 } 1297 if (character == '>') { 1298 parseError(); 1299 m_token.setForceQuirks(); 1300 return emitAndResumeInDataState(source); 1301 } 1302 if (character == kEndOfFileMarker) { 1303 parseError(); 1304 m_token.setForceQuirks(); 1305 return emitAndReconsumeInDataState(); 1306 } 1307 parseError(); 1308 m_token.setForceQuirks(); 1309 ADVANCE_TO(BogusDOCTYPEState); 1310 END_STATE() 1311 1312 BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState) 1313 if (character == '"') 1314 ADVANCE_TO(AfterDOCTYPESystemIdentifierState); 1315 if (character == '>') { 1316 parseError(); 1317 m_token.setForceQuirks(); 1318 return emitAndResumeInDataState(source); 1319 } 1320 if (character == kEndOfFileMarker) { 1321 parseError(); 1322 m_token.setForceQuirks(); 1323 return emitAndReconsumeInDataState(); 1324 } 1325 m_token.appendToSystemIdentifier(character); 1326 ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState); 1327 END_STATE() 1328 1329 BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState) 1330 if (character == '\'') 1331 ADVANCE_TO(AfterDOCTYPESystemIdentifierState); 1332 if (character == '>') { 1333 parseError(); 1334 m_token.setForceQuirks(); 1335 return emitAndResumeInDataState(source); 1336 } 1337 if (character == kEndOfFileMarker) { 1338 parseError(); 1339 m_token.setForceQuirks(); 1340 return emitAndReconsumeInDataState(); 1341 } 1342 m_token.appendToSystemIdentifier(character); 1343 ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState); 
1344 END_STATE() 1345 1346 BEGIN_STATE(AfterDOCTYPESystemIdentifierState) 1347 if (isTokenizerWhitespace(character)) 1348 ADVANCE_TO(AfterDOCTYPESystemIdentifierState); 1349 if (character == '>') 1350 return emitAndResumeInDataState(source); 1351 if (character == kEndOfFileMarker) { 1352 parseError(); 1353 m_token.setForceQuirks(); 1354 return emitAndReconsumeInDataState(); 1355 } 1356 parseError(); 1357 ADVANCE_TO(BogusDOCTYPEState); 1358 END_STATE() 1359 1360 BEGIN_STATE(BogusDOCTYPEState) 1361 if (character == '>') 1362 return emitAndResumeInDataState(source); 1363 if (character == kEndOfFileMarker) 1364 return emitAndReconsumeInDataState(); 1365 ADVANCE_TO(BogusDOCTYPEState); 1366 END_STATE() 1367 1368 BEGIN_STATE(CDATASectionState) 1369 if (character == ']') 1370 ADVANCE_TO(CDATASectionRightSquareBracketState); 1371 if (character == kEndOfFileMarker) 1372 RECONSUME_IN(DataState); 1373 bufferCharacter(character); 1374 ADVANCE_TO(CDATASectionState); 1375 END_STATE() 1376 1377 BEGIN_STATE(CDATASectionRightSquareBracketState) 1378 if (character == ']') 1379 ADVANCE_TO(CDATASectionDoubleRightSquareBracketState); 1380 bufferASCIICharacter(']'); 1381 RECONSUME_IN(CDATASectionState); 1382 END_STATE() 1383 1384 BEGIN_STATE(CDATASectionDoubleRightSquareBracketState) 1385 if (character == '>') 1386 ADVANCE_TO(DataState); 1387 bufferASCIICharacter(']'); 1388 bufferASCIICharacter(']'); 1389 RECONSUME_IN(CDATASectionState); 1542 1390 END_STATE() 1543 1391 … … 1562 1410 { 1563 1411 if (tagName == textareaTag || tagName == titleTag) 1564 setState(HTMLTokenizer::RCDATAState);1412 m_state = RCDATAState; 1565 1413 else if (tagName == plaintextTag) 1566 setState(HTMLTokenizer::PLAINTEXTState);1414 m_state = PLAINTEXTState; 1567 1415 else if (tagName == scriptTag) 1568 setState(HTMLTokenizer::ScriptDataState);1416 m_state = ScriptDataState; 1569 1417 else if (tagName == styleTag 1570 1418 || tagName == iframeTag … … 1573 1421 || tagName == noframesTag 1574 1422 || (tagName == 
noscriptTag && m_options.scriptEnabled)) 1575 setState(HTMLTokenizer::RAWTEXTState); 1576 } 1577 1578 inline bool HTMLTokenizer::temporaryBufferIs(const String& expectedString) 1423 m_state = RAWTEXTState; 1424 } 1425 1426 inline void HTMLTokenizer::appendToTemporaryBuffer(UChar character) 1427 { 1428 ASSERT(isASCII(character)); 1429 m_temporaryBuffer.append(character); 1430 } 1431 1432 inline bool HTMLTokenizer::temporaryBufferIs(const char* expectedString) 1579 1433 { 1580 1434 return vectorEqualsString(m_temporaryBuffer, expectedString); 1581 1435 } 1582 1436 1583 inline void HTMLTokenizer::a ddToPossibleEndTag(LChar cc)1584 { 1585 ASSERT(is EndTagBufferingState(m_state));1586 m_bufferedEndTagName.append(c c);1587 } 1588 1589 inline bool HTMLTokenizer::isAppropriateEndTag() 1437 inline void HTMLTokenizer::appendToPossibleEndTag(UChar character) 1438 { 1439 ASSERT(isASCII(character)); 1440 m_bufferedEndTagName.append(character); 1441 } 1442 1443 inline bool HTMLTokenizer::isAppropriateEndTag() const 1590 1444 { 1591 1445 if (m_bufferedEndTagName.size() != m_appropriateEndTagName.size()) 1592 1446 return false; 1593 1447 1594 size_t numCharacters= m_bufferedEndTagName.size();1595 1596 for ( size_t i = 0; i < numCharacters; i++) {1448 unsigned size = m_bufferedEndTagName.size(); 1449 1450 for (unsigned i = 0; i < size; i++) { 1597 1451 if (m_bufferedEndTagName[i] != m_appropriateEndTagName[i]) 1598 1452 return false; … … 1604 1458 inline void HTMLTokenizer::parseError() 1605 1459 { 1606 notImplemented(); 1607 } 1608 1609 } 1460 } 1461 1462 } -
trunk/Source/WebCore/html/parser/HTMLTokenizer.h
r178173 r178265 1 1 /* 2 * Copyright (C) 2008 Apple Inc. All Rights Reserved.2 * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved. 3 3 * Copyright (C) 2010 Google, Inc. All Rights Reserved. 4 4 * … … 31 31 #include "HTMLToken.h" 32 32 #include "InputStreamPreprocessor.h" 33 #include "SegmentedString.h"34 33 35 34 namespace WebCore { 36 35 36 class SegmentedString; 37 37 38 class HTMLTokenizer { 38 WTF_MAKE_NONCOPYABLE(HTMLTokenizer);39 WTF_MAKE_FAST_ALLOCATED;40 39 public: 41 explicit HTMLTokenizer(const HTMLParserOptions&); 42 ~HTMLTokenizer(); 43 44 void reset(); 45 40 explicit HTMLTokenizer(const HTMLParserOptions& = HTMLParserOptions()); 41 42 // If we can't parse a whole token, this returns null. 43 class TokenPtr; 44 TokenPtr nextToken(SegmentedString&); 45 46 // Returns a copy of any characters buffered internally by the tokenizer. 47 // The tokenizer buffers characters when searching for the </script> token that terminates a script element. 48 String bufferedCharacters() const; 49 size_t numberOfBufferedCharacters() const; 50 51 // Updates the tokenizer's state according to the given tag name. This is an approximation of how the tree 52 // builder would update the tokenizer's state. This method is useful for approximating HTML tokenization. 53 // To get exactly the correct tokenization, you need the real tree builder. 54 // 55 // The main failures in the approximation are as follows: 56 // 57 // * The first set of character tokens emitted for a <pre> element might contain an extra leading newline. 58 // * The replacement of U+0000 with U+FFFD will not be sensitive to the tree builder's insertion mode. 59 // * CDATA sections in foreign content will be tokenized as bogus comments instead of as character tokens. 60 // 61 // This approximation is also the algorithm called for when parsing an HTML fragment. 
62 // https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments 63 void updateStateFor(const AtomicString& tagName); 64 65 void setForceNullCharacterReplacement(bool); 66 67 bool shouldAllowCDATA() const; 68 void setShouldAllowCDATA(bool); 69 70 bool isInDataState() const; 71 72 void setDataState(); 73 void setPLAINTEXTState(); 74 void setRAWTEXTState(); 75 void setRCDATAState(); 76 void setScriptDataState(); 77 78 bool neverSkipNullCharacters() const; 79 80 private: 46 81 enum State { 47 82 DataState, … … 89 124 SelfClosingStartTagState, 90 125 BogusCommentState, 91 // The ContinueBogusCommentState is not in the HTML5 spec, but we use 92 // it internally to keep track of whether we've started the bogus 93 // comment token yet. 94 ContinueBogusCommentState, 126 ContinueBogusCommentState, // Not in the HTML spec, used internally to track whether we started the bogus comment token. 95 127 MarkupDeclarationOpenState, 96 128 CommentStartState, … … 122 154 }; 123 155 124 // This function returns true if it emits a token. Otherwise, callers 125 // must provide the same (in progress) token on the next call (unless 126 // they call reset() first). 127 bool nextToken(SegmentedString&, HTMLToken&); 128 129 // Returns a copy of any characters buffered internally by the tokenizer. 130 // The tokenizer buffers characters when searching for the </script> token 131 // that terminates a script element. 132 String bufferedCharacters() const; 133 134 size_t numberOfBufferedCharacters() const 135 { 136 // Notice that we add 2 to the length of the m_temporaryBuffer to 137 // account for the "</" characters, which are effecitvely buffered in 138 // the tokenizer's state machine. 139 return m_temporaryBuffer.size() ? m_temporaryBuffer.size() + 2 : 0; 140 } 141 142 // Updates the tokenizer's state according to the given tag name. This is 143 // an approximation of how the tree builder would update the tokenizer's 144 // state. 
This method is useful for approximating HTML tokenization. To 145 // get exactly the correct tokenization, you need the real tree builder. 146 // 147 // The main failures in the approximation are as follows: 148 // 149 // * The first set of character tokens emitted for a <pre> element might 150 // contain an extra leading newline. 151 // * The replacement of U+0000 with U+FFFD will not be sensitive to the 152 // tree builder's insertion mode. 153 // * CDATA sections in foreign content will be tokenized as bogus comments 154 // instead of as character tokens. 155 // 156 void updateStateFor(const AtomicString& tagName); 157 158 bool forceNullCharacterReplacement() const { return m_forceNullCharacterReplacement; } 159 void setForceNullCharacterReplacement(bool value) { m_forceNullCharacterReplacement = value; } 160 161 bool shouldAllowCDATA() const { return m_shouldAllowCDATA; } 162 void setShouldAllowCDATA(bool value) { m_shouldAllowCDATA = value; } 163 164 State state() const { return m_state; } 165 void setState(State state) { m_state = state; } 166 167 inline bool shouldSkipNullCharacters() const 168 { 169 return !m_forceNullCharacterReplacement 170 && (m_state == HTMLTokenizer::DataState 171 || m_state == HTMLTokenizer::RCDATAState 172 || m_state == HTMLTokenizer::RAWTEXTState); 173 } 174 175 private: 176 inline bool processEntity(SegmentedString&); 177 178 inline void parseError(); 179 180 void bufferASCIICharacter(UChar character) 181 { 182 ASSERT(character != kEndOfFileMarker); 183 ASSERT(isASCII(character)); 184 m_token->appendToCharacter(static_cast<LChar>(character)); 185 } 186 187 void bufferCharacter(UChar character) 188 { 189 ASSERT(character != kEndOfFileMarker); 190 m_token->appendToCharacter(character); 191 } 192 void bufferCharacter(char) = delete; 193 void bufferCharacter(LChar) = delete; 194 195 inline bool emitAndResumeIn(SegmentedString& source, State state) 196 { 197 saveEndTagNameIfNeeded(); 198 m_state = state; 199 
source.advanceAndUpdateLineNumber(); 200 return true; 201 } 202 203 inline bool emitAndReconsumeIn(SegmentedString&, State state) 204 { 205 saveEndTagNameIfNeeded(); 206 m_state = state; 207 return true; 208 } 209 210 inline bool emitEndOfFile(SegmentedString& source) 211 { 212 if (haveBufferedCharacterToken()) 213 return true; 214 m_state = HTMLTokenizer::DataState; 215 source.advanceAndUpdateLineNumber(); 216 m_token->clear(); 217 m_token->makeEndOfFile(); 218 return true; 219 } 220 221 inline bool flushEmitAndResumeIn(SegmentedString&, State); 222 223 // Return whether we need to emit a character token before dealing with 224 // the buffered end tag. 225 inline bool flushBufferedEndTag(SegmentedString&); 226 inline bool temporaryBufferIs(const String&); 227 228 // Sometimes we speculatively consume input characters and we don't 229 // know whether they represent end tags or RCDATA, etc. These 230 // functions help manage these state. 231 inline void addToPossibleEndTag(LChar cc); 232 233 inline void saveEndTagNameIfNeeded() 234 { 235 ASSERT(m_token->type() != HTMLToken::Uninitialized); 236 if (m_token->type() == HTMLToken::StartTag) 237 m_appropriateEndTagName = m_token->name(); 238 } 239 inline bool isAppropriateEndTag(); 240 241 242 inline bool haveBufferedCharacterToken() 243 { 244 return m_token->type() == HTMLToken::Character; 245 } 246 247 State m_state; 248 bool m_forceNullCharacterReplacement; 249 bool m_shouldAllowCDATA; 250 251 // m_token is owned by the caller. If nextToken is not on the stack, 252 // this member might be pointing to unallocated memory. 
253 HTMLToken* m_token; 254 255 // http://www.whatwg.org/specs/web-apps/current-work/#additional-allowed-character 256 UChar m_additionalAllowedCharacter; 257 258 // http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream 259 InputStreamPreprocessor<HTMLTokenizer> m_inputStreamPreprocessor; 156 bool processToken(SegmentedString&); 157 bool processEntity(SegmentedString&); 158 159 void parseError(); 160 161 void bufferASCIICharacter(UChar); 162 void bufferCharacter(UChar); 163 164 bool emitAndResumeInDataState(SegmentedString&); 165 bool emitAndReconsumeInDataState(); 166 bool emitEndOfFile(SegmentedString&); 167 168 // Return true if we wil emit a character token before dealing with the buffered end tag. 169 void flushBufferedEndTag(); 170 bool commitToPartialEndTag(SegmentedString&, UChar, State); 171 bool commitToCompleteEndTag(SegmentedString&); 172 173 void appendToTemporaryBuffer(UChar); 174 bool temporaryBufferIs(const char*); 175 176 // Sometimes we speculatively consume input characters and we don't know whether they represent 177 // end tags or RCDATA, etc. These functions help manage these state. 
178 bool inEndTagBufferingState() const; 179 void appendToPossibleEndTag(UChar); 180 void saveEndTagNameIfNeeded(); 181 bool isAppropriateEndTag() const; 182 183 bool haveBufferedCharacterToken() const; 184 185 static bool isNullCharacterSkippingState(State); 186 187 State m_state { DataState }; 188 bool m_forceNullCharacterReplacement { false }; 189 bool m_shouldAllowCDATA { false }; 190 191 mutable HTMLToken m_token; 192 193 // https://html.spec.whatwg.org/#additional-allowed-character 194 UChar m_additionalAllowedCharacter { 0 }; 195 196 // https://html.spec.whatwg.org/#preprocessing-the-input-stream 197 InputStreamPreprocessor<HTMLTokenizer> m_preprocessor; 260 198 261 199 Vector<UChar, 32> m_appropriateEndTagName; 262 200 263 // http ://www.whatwg.org/specs/web-apps/current-work/#temporary-buffer201 // https://html.spec.whatwg.org/#temporary-buffer 264 202 Vector<LChar, 32> m_temporaryBuffer; 265 203 266 // We occa tionally want to emit both a character token and an end tag204 // We occasionally want to emit both a character token and an end tag 267 205 // token (e.g., when lexing script). We buffer the name of the end tag 268 206 // token here so we remember it next time we re-enter the tokenizer. 
269 207 Vector<LChar, 32> m_bufferedEndTagName; 270 208 271 HTMLParserOptions m_options;209 const HTMLParserOptions m_options; 272 210 }; 273 211 212 class HTMLTokenizer::TokenPtr { 213 public: 214 TokenPtr(); 215 ~TokenPtr(); 216 217 TokenPtr(TokenPtr&&); 218 TokenPtr& operator=(TokenPtr&&) = delete; 219 220 void clear(); 221 222 operator bool() const; 223 224 HTMLToken& operator*() const; 225 HTMLToken* operator->() const; 226 227 private: 228 friend class HTMLTokenizer; 229 explicit TokenPtr(HTMLToken*); 230 231 HTMLToken* m_token { nullptr }; 232 }; 233 234 inline HTMLTokenizer::TokenPtr::TokenPtr() 235 { 236 } 237 238 inline HTMLTokenizer::TokenPtr::TokenPtr(HTMLToken* token) 239 : m_token(token) 240 { 241 } 242 243 inline HTMLTokenizer::TokenPtr::~TokenPtr() 244 { 245 if (m_token) 246 m_token->clear(); 247 } 248 249 inline HTMLTokenizer::TokenPtr::TokenPtr(TokenPtr&& other) 250 : m_token(other.m_token) 251 { 252 other.m_token = nullptr; 253 } 254 255 inline void HTMLTokenizer::TokenPtr::clear() 256 { 257 if (m_token) { 258 m_token->clear(); 259 m_token = nullptr; 260 } 261 } 262 263 inline HTMLTokenizer::TokenPtr::operator bool() const 264 { 265 return m_token; 266 } 267 268 inline HTMLToken& HTMLTokenizer::TokenPtr::operator*() const 269 { 270 ASSERT(m_token); 271 return *m_token; 272 } 273 274 inline HTMLToken* HTMLTokenizer::TokenPtr::operator->() const 275 { 276 ASSERT(m_token); 277 return m_token; 278 } 279 280 inline HTMLTokenizer::TokenPtr HTMLTokenizer::nextToken(SegmentedString& source) 281 { 282 return TokenPtr(processToken(source) ? &m_token : nullptr); 283 } 284 285 inline size_t HTMLTokenizer::numberOfBufferedCharacters() const 286 { 287 // Notice that we add 2 to the length of the m_temporaryBuffer to 288 // account for the "</" characters, which are effecitvely buffered in 289 // the tokenizer's state machine. 290 return m_temporaryBuffer.size() ? 
m_temporaryBuffer.size() + 2 : 0; 291 } 292 293 inline void HTMLTokenizer::setForceNullCharacterReplacement(bool value) 294 { 295 m_forceNullCharacterReplacement = value; 296 } 297 298 inline bool HTMLTokenizer::shouldAllowCDATA() const 299 { 300 return m_shouldAllowCDATA; 301 } 302 303 inline void HTMLTokenizer::setShouldAllowCDATA(bool value) 304 { 305 m_shouldAllowCDATA = value; 306 } 307 308 inline bool HTMLTokenizer::isInDataState() const 309 { 310 return m_state == DataState; 311 } 312 313 inline void HTMLTokenizer::setDataState() 314 { 315 m_state = DataState; 316 } 317 318 inline void HTMLTokenizer::setPLAINTEXTState() 319 { 320 m_state = PLAINTEXTState; 321 } 322 323 inline void HTMLTokenizer::setRAWTEXTState() 324 { 325 m_state = RAWTEXTState; 326 } 327 328 inline void HTMLTokenizer::setRCDATAState() 329 { 330 m_state = RCDATAState; 331 } 332 333 inline void HTMLTokenizer::setScriptDataState() 334 { 335 m_state = ScriptDataState; 336 } 337 338 inline bool HTMLTokenizer::isNullCharacterSkippingState(State state) 339 { 340 return state == DataState || state == RCDATAState || state == RAWTEXTState; 341 } 342 343 inline bool HTMLTokenizer::neverSkipNullCharacters() const 344 { 345 return m_forceNullCharacterReplacement; 346 } 347 274 348 } 275 349 -
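The TokenPtr class added to HTMLTokenizer.h in this changeset is a move-only handle: it points at the tokenizer's own token and clears that token when the handle is destroyed or explicitly cleared. A minimal standalone sketch of that pattern follows. `Token`, `TokenHandle`, and `Tokenizer` are hypothetical stand-ins, not the WebKit classes, and the conversion operator is made `explicit` here, which differs from the diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for HTMLToken: just enough state to observe clear().
struct Token {
    std::string data;
    void clear() { data.clear(); }
};

// Move-only handle modeled on HTMLTokenizer::TokenPtr. It does not own the
// token's storage; it only clears the token when the handle goes away.
class TokenHandle {
public:
    TokenHandle() = default;
    explicit TokenHandle(Token* token) : m_token(token) { }
    TokenHandle(TokenHandle&& other) : m_token(other.m_token) { other.m_token = nullptr; }
    TokenHandle& operator=(TokenHandle&&) = delete;
    ~TokenHandle() { if (m_token) m_token->clear(); }

    // Clears the token early and detaches the handle from it.
    void clear()
    {
        if (m_token) {
            m_token->clear();
            m_token = nullptr;
        }
    }

    explicit operator bool() const { return m_token != nullptr; }
    Token& operator*() const { assert(m_token); return *m_token; }
    Token* operator->() const { assert(m_token); return m_token; }

private:
    Token* m_token { nullptr };
};

// Hypothetical tokenizer: returns an empty handle when no token is available.
struct Tokenizer {
    Token token;

    TokenHandle next(const std::string& input)
    {
        if (input.empty())
            return TokenHandle();
        token.data = input;
        return TokenHandle(&token);
    }
};
```

Because the handle clears the token on destruction, a caller cannot forget to reset the tokenizer's token between calls, which is presumably why the old reset() and caller-owned HTMLToken could be dropped.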
trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp
r178173 r178265
 696  696       processFakePEndTagIfPInButtonScope();
 697  697       m_tree.insertHTMLElement(&token);
 698      -     m_parser.tokenizer().setState(HTMLTokenizer::PLAINTEXTState);
      698 +     m_parser.tokenizer().setPLAINTEXTState();
 699  699       return;
 700  700   }
   …    …
 800  800   m_tree.insertHTMLElement(&token);
 801  801   m_shouldSkipLeadingNewline = true;
 802      - m_parser.tokenizer().setState(HTMLTokenizer::RCDATAState);
      802 + m_parser.tokenizer().setRCDATAState();
 803  803   m_originalInsertionMode = m_insertionMode;
 804  804   m_framesetOk = false;
   …    …
2138 2138       // quirks are enabled. We must set the tokenizer's state to
2139 2139       // DataState explicitly if the tokenizer didn't have a chance to.
2140      -     ASSERT(m_parser.tokenizer().state() == HTMLTokenizer::DataState || m_options.usePreHTML5ParserQuirks);
2141      -     m_parser.tokenizer().setState(HTMLTokenizer::DataState);
     2140 +     ASSERT(m_parser.tokenizer().isInDataState() || m_options.usePreHTML5ParserQuirks);
     2141 +     m_parser.tokenizer().setDataState();
2142 2142       return;
2143 2143   }
   …    …
2740 2740   ASSERT(token.type() == HTMLToken::StartTag);
2741 2741   m_tree.insertHTMLElement(&token);
2742      - m_parser.tokenizer().setState(HTMLTokenizer::RCDATAState);
     2742 + m_parser.tokenizer().setRCDATAState();
2743 2743   m_originalInsertionMode = m_insertionMode;
2744 2744   m_insertionMode = InsertionMode::Text;
   …    …
2749 2749   ASSERT(token.type() == HTMLToken::StartTag);
2750 2750   m_tree.insertHTMLElement(&token);
2751      - m_parser.tokenizer().setState(HTMLTokenizer::RAWTEXTState);
     2751 + m_parser.tokenizer().setRAWTEXTState();
2752 2752   m_originalInsertionMode = m_insertionMode;
2753 2753   m_insertionMode = InsertionMode::Text;
   …    …
2758 2758   ASSERT(token.type() == HTMLToken::StartTag);
2759 2759   m_tree.insertScriptElement(&token);
2760      - m_parser.tokenizer().setState(HTMLTokenizer::ScriptDataState);
     2760 + m_parser.tokenizer().setScriptDataState();
2761 2761   m_originalInsertionMode = m_insertionMode;
2762 2762
-
trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h
r178173 r178265
 41  41       WTF_MAKE_NONCOPYABLE(InputStreamPreprocessor);
 42  42   public:
 43     -     InputStreamPreprocessor(Tokenizer* tokenizer)
     43 +     explicit InputStreamPreprocessor(Tokenizer& tokenizer)
 44  44           : m_tokenizer(tokenizer)
 45  45       {
  …   …
 52  52       // The only way we can fail to peek is if there are no more
 53  53       // characters in |source| (after collapsing \r\n, etc).
 54     -     ALWAYS_INLINE bool peek(SegmentedString& source)
     54 +     ALWAYS_INLINE bool peek(SegmentedString& source, bool skipNullCharacters = false)
 55  55       {
     56 +         if (source.isEmpty())
     57 +             return false;
     58 +
 56  59           m_nextInputCharacter = source.currentChar();
 57  60
  …   …
 65  68               return true;
 66  69           }
 67     -         return processNextInputCharacter(source);
     70 +         return processNextInputCharacter(source, skipNullCharacters);
 68  71       }
 69  72
 70  73       // Returns whether there are more characters in |source| after advancing.
 71     -     ALWAYS_INLINE bool advance(SegmentedString& source)
     74 +     ALWAYS_INLINE bool advance(SegmentedString& source, bool skipNullCharacters = false)
 72  75       {
 73  76           source.advanceAndUpdateLineNumber();
 74     -         if (source.isEmpty())
 75     -             return false;
 76     -         return peek(source);
     77 +         return peek(source, skipNullCharacters);
 77  78       }
 78  79
  …   …
 86  87
 87  88   private:
 88     -     bool processNextInputCharacter(SegmentedString& source)
     89 +     bool processNextInputCharacter(SegmentedString& source, bool skipNullCharacters)
 89  90       {
 90  91       ProcessAgain:
  …   …
108 109           // that filtering breaks surrogate pair handling and causes us not to match Minefield.
109 110           if (m_nextInputCharacter == '\0' && !shouldTreatNullAsEndOfFileMarker(source)) {
110     -             if (m_tokenizer->shouldSkipNullCharacters()) {
    111 +             if (skipNullCharacters && !m_tokenizer.neverSkipNullCharacters()) {
111 112                   source.advancePastNonNewline();
112 113                   if (source.isEmpty())
  …   …
126 127       }
127 128
128     -     Tokenizer* m_tokenizer;
    129 +     Tokenizer& m_tokenizer;
129 130
130 131       // http://www.whatwg.org/specs/web-apps/current-work/#next-input-character
-
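The InputStreamPreprocessor.h change above moves the "skip NUL characters" decision from a tokenizer callback to a per-call argument, leaving the tokenizer only a veto through neverSkipNullCharacters(). The sketch below illustrates that shape under stated assumptions: `NullFilter`, `CueTokenizer`, and `MarkupTokenizer` are hypothetical names, the filter works on a whole `std::string` instead of a SegmentedString, and the '?' written for a NUL that is not skipped is only a placeholder, since this part of the diff does not show what the real preprocessor emits.

```cpp
#include <cassert>
#include <string>

// Hypothetical filter modeled on InputStreamPreprocessor<Tokenizer>.
template<typename Tokenizer>
class NullFilter {
public:
    explicit NullFilter(Tokenizer& tokenizer) : m_tokenizer(tokenizer) { }

    // Copies input. A NUL is dropped only when the caller asks for skipping
    // and the tokenizer does not veto it; otherwise a '?' placeholder is kept.
    std::string process(const std::string& input, bool skipNullCharacters = false) const
    {
        std::string output;
        for (char c : input) {
            if (c != '\0') {
                output += c;
                continue;
            }
            if (skipNullCharacters && !m_tokenizer.neverSkipNullCharacters())
                continue;
            output += '?';
        }
        return output;
    }

private:
    Tokenizer& m_tokenizer;
};

// A tokenizer that never vetoes skipping can answer with a static function.
struct CueTokenizer {
    static bool neverSkipNullCharacters() { return false; }
};

// A tokenizer with a runtime switch answers with a member function.
struct MarkupTokenizer {
    bool forceReplacement { false };
    bool neverSkipNullCharacters() const { return forceReplacement; }
};
```

Since the template only requires that `m_tokenizer.neverSkipNullCharacters()` compiles, a static member and a const member function both satisfy it, which matches how the two tokenizers in this changeset declare it.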
trunk/Source/WebCore/html/parser/TextDocumentParser.cpp
r178173 r178265
62 62   // Although Text Documents expose a "pre" element in their DOM, they
63 63   // act like a <plaintext> tag, so we have to force plaintext mode.
64    - tokenizer().setState(HTMLTokenizer::PLAINTEXTState);
   64 + tokenizer().setPLAINTEXTState();
65 65
66 66   m_haveInsertedFakePreElement = true;
-
trunk/Source/WebCore/html/parser/XSSAuditor.cpp
r178173 r178265
567 567   {
568 568       // Grab a fixed number of characters equal to the length of the token's name plus one (to account for the "<").
569     -     return fullyDecodeString(request.sourceTracker.sourceForToken(request.token), m_encoding).substring(0, request.token.name().size() + 1);
    569 +     return fullyDecodeString(request.sourceTracker.source(request.token), m_encoding).substring(0, request.token.name().size() + 1);
570 570   }
571 571
  …   …
576 576   // unquoted input of |name=value |, the snippet is |name=value|.
577 577   // FIXME: We should grab one character before the name also.
578     - unsigned start = attribute.nameRange.start;
579     - unsigned end = attribute.valueRange.end;
580     - String decodedSnippet = fullyDecodeString(request.sourceTracker.sourceForToken(request.token).substring(start, end - start), m_encoding);
    578 + unsigned start = attribute.startOffset;
    579 + unsigned end = attribute.endOffset;
    580 + String decodedSnippet = fullyDecodeString(request.sourceTracker.source(request.token, start, end), m_encoding);
581 581   decodedSnippet.truncate(kMaximumFragmentLengthTarget);
582 582   if (treatment == SrcLikeAttribute) {
  …   …
631 631   String XSSAuditor::decodedSnippetForJavaScript(const FilterTokenRequest& request)
632 632   {
633     -     String string = request.sourceTracker.sourceForToken(request.token);
    633 +     String string = request.sourceTracker.source(request.token);
634 634       size_t startPosition = 0;
635 635       size_t endPosition = string.length();
  …   …
738 738   }
739 739
740     - bool XSSAuditor::isSafeToSendToAnotherThread() const
741     - {
742     -     return m_documentURL.isSafeToSendToAnotherThread()
743     -         && m_decodedURL.isSafeToSendToAnotherThread()
744     -         && m_decodedHTTPBody.isSafeToSendToAnotherThread()
745     -         && m_cachedDecodedSnippet.isSafeToSendToAnotherThread();
746     - }
747     -
748 740   } // namespace WebCore
-
trunk/Source/WebCore/html/parser/XSSAuditor.h
r178173 r178265
62 62
63 63       std::unique_ptr<XSSInfo> filterToken(const FilterTokenRequest&);
64    -     bool isSafeToSendToAnotherThread() const;
65 64
66 65   private:
-
trunk/Source/WebCore/html/track/WebVTTTokenizer.cpp
r178173 r178265 1 1 /* 2 2 * Copyright (C) 2011, 2013 Google Inc. All rights reserved. 3 * Copyright (C) 2014 Apple Inc. All rights reserved.3 * Copyright (C) 2014-2015 Apple Inc. All rights reserved. 4 4 * 5 5 * Redistribution and use in source and binary forms, with or without … … 42 42 namespace WebCore { 43 43 44 #define WEBVTT_BEGIN_STATE(stateName) case stateName: stateName: 45 #define WEBVTT_ADVANCE_TO(stateName) \ 46 do { \ 47 state = stateName; \ 48 ASSERT(!m_input.isEmpty()); \ 49 m_inputStreamPreprocessor.advance(m_input); \ 50 cc = m_inputStreamPreprocessor.nextInputCharacter(); \ 51 goto stateName; \ 44 #define WEBVTT_ADVANCE_TO(stateName) \ 45 do { \ 46 ASSERT(!m_input.isEmpty()); \ 47 m_preprocessor.advance(m_input); \ 48 character = m_preprocessor.nextInputCharacter(); \ 49 goto stateName; \ 52 50 } while (false) 53 54 51 55 template<unsigned charactersCount> 56 ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount]) 52 template<unsigned charactersCount> ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount]) 57 53 { 58 54 return WTF::equal(s, reinterpret_cast<const LChar*>(characters), charactersCount - 1); … … 80 76 WebVTTTokenizer::WebVTTTokenizer(const String& input) 81 77 : m_input(input) 82 , m_ inputStreamPreprocessor(this)78 , m_preprocessor(*this) 83 79 { 84 80 // Append an EOF marker and close the input "stream". 
… … 90 86 bool WebVTTTokenizer::nextToken(WebVTTToken& token) 91 87 { 92 if (m_input.isEmpty() || !m_ inputStreamPreprocessor.peek(m_input))88 if (m_input.isEmpty() || !m_preprocessor.peek(m_input)) 93 89 return false; 94 90 95 UChar c c = m_inputStreamPreprocessor.nextInputCharacter();96 if (c c== kEndOfFileMarker) {97 m_ inputStreamPreprocessor.advance(m_input);91 UChar character = m_preprocessor.nextInputCharacter(); 92 if (character == kEndOfFileMarker) { 93 m_preprocessor.advance(m_input); 98 94 return false; 99 95 } … … 103 99 StringBuilder classes; 104 100 105 enum { 106 DataState, 107 EscapeState, 108 TagState, 109 StartTagState, 110 StartTagClassState, 111 StartTagAnnotationState, 112 EndTagState, 113 TimestampTagState, 114 } state = DataState; 115 116 // 4.8.10.13.4 WebVTT cue text tokenizer 117 switch (state) { 118 WEBVTT_BEGIN_STATE(DataState) { 119 if (cc == '&') { 120 buffer.append(static_cast<LChar>(cc)); 101 // 4.8.10.13.4 WebVTT cue text tokenizer 102 DataState: 103 if (character == '&') { 104 buffer.append('&'); 105 WEBVTT_ADVANCE_TO(EscapeState); 106 } else if (character == '<') { 107 if (result.isEmpty()) 108 WEBVTT_ADVANCE_TO(TagState); 109 else { 110 // We don't want to advance input or perform a state transition - just return a (new) token. 111 // (On the next call to nextToken we will see '<' again, but take the other branch in this if instead.) 
112 return emitToken(token, WebVTTToken::StringToken(result.toString())); 113 } 114 } else if (character == kEndOfFileMarker) 115 return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString())); 116 else { 117 result.append(character); 118 WEBVTT_ADVANCE_TO(DataState); 119 } 120 121 EscapeState: 122 if (character == ';') { 123 if (equalLiteral(buffer, "&")) 124 result.append('&'); 125 else if (equalLiteral(buffer, "<")) 126 result.append('<'); 127 else if (equalLiteral(buffer, ">")) 128 result.append('>'); 129 else if (equalLiteral(buffer, "&lrm")) 130 result.append(leftToRightMark); 131 else if (equalLiteral(buffer, "&rlm")) 132 result.append(rightToLeftMark); 133 else if (equalLiteral(buffer, " ")) 134 result.append(noBreakSpace); 135 else { 136 buffer.append(character); 137 result.append(buffer); 138 } 139 buffer.clear(); 140 WEBVTT_ADVANCE_TO(DataState); 141 } else if (isASCIIAlphanumeric(character)) { 142 buffer.append(character); 143 WEBVTT_ADVANCE_TO(EscapeState); 144 } else if (character == '<') { 145 result.append(buffer); 146 return emitToken(token, WebVTTToken::StringToken(result.toString())); 147 } else if (character == kEndOfFileMarker) { 148 result.append(buffer); 149 return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString())); 150 } else { 151 result.append(buffer); 152 buffer.clear(); 153 154 if (character == '&') { 155 buffer.append('&'); 121 156 WEBVTT_ADVANCE_TO(EscapeState); 122 } else if (cc == '<') {123 if (result.isEmpty())124 WEBVTT_ADVANCE_TO(TagState);125 else {126 // We don't want to advance input or perform a state transition - just return a (new) token.127 // (On the next call to nextToken we will see '<' again, but take the other branch in this if instead.)128 return emitToken(token, WebVTTToken::StringToken(result.toString()));129 }130 } else if (cc == kEndOfFileMarker)131 return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));132 else {133 
result.append(cc);134 WEBVTT_ADVANCE_TO(DataState);135 157 } 136 } 137 END_STATE() 138 139 WEBVTT_BEGIN_STATE(EscapeState) { 140 if (cc == ';') { 141 if (equalLiteral(buffer, "&")) 142 result.append('&'); 143 else if (equalLiteral(buffer, "<")) 144 result.append('<'); 145 else if (equalLiteral(buffer, ">")) 146 result.append('>'); 147 else if (equalLiteral(buffer, "&lrm")) 148 result.append(leftToRightMark); 149 else if (equalLiteral(buffer, "&rlm")) 150 result.append(rightToLeftMark); 151 else if (equalLiteral(buffer, " ")) 152 result.append(noBreakSpace); 153 else { 154 buffer.append(static_cast<LChar>(cc)); 155 result.append(buffer); 156 } 157 buffer.clear(); 158 WEBVTT_ADVANCE_TO(DataState); 159 } else if (isASCIIAlphanumeric(cc)) { 160 buffer.append(static_cast<LChar>(cc)); 161 WEBVTT_ADVANCE_TO(EscapeState); 162 } else if (cc == '<') { 163 result.append(buffer); 164 return emitToken(token, WebVTTToken::StringToken(result.toString())); 165 } else if (cc == kEndOfFileMarker) { 166 result.append(buffer); 167 return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString())); 168 } else { 169 result.append(buffer); 170 buffer.clear(); 171 172 if (cc == '&') { 173 buffer.append(static_cast<LChar>(cc)); 174 WEBVTT_ADVANCE_TO(EscapeState); 175 } 176 result.append(cc); 177 WEBVTT_ADVANCE_TO(DataState); 178 } 179 } 180 END_STATE() 181 182 WEBVTT_BEGIN_STATE(TagState) { 183 if (isTokenizerWhitespace(cc)) { 184 ASSERT(result.isEmpty()); 185 WEBVTT_ADVANCE_TO(StartTagAnnotationState); 186 } else if (cc == '.') { 187 ASSERT(result.isEmpty()); 188 WEBVTT_ADVANCE_TO(StartTagClassState); 189 } else if (cc == '/') { 190 WEBVTT_ADVANCE_TO(EndTagState); 191 } else if (WTF::isASCIIDigit(cc)) { 192 result.append(cc); 193 WEBVTT_ADVANCE_TO(TimestampTagState); 194 } else if (cc == '>' || cc == kEndOfFileMarker) { 195 ASSERT(result.isEmpty()); 196 return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString())); 197 } else { 198 
result.append(cc); 199 WEBVTT_ADVANCE_TO(StartTagState); 200 } 201 } 202 END_STATE() 203 204 WEBVTT_BEGIN_STATE(StartTagState) { 205 if (isTokenizerWhitespace(cc)) 206 WEBVTT_ADVANCE_TO(StartTagAnnotationState); 207 else if (cc == '.') 208 WEBVTT_ADVANCE_TO(StartTagClassState); 209 else if (cc == '>' || cc == kEndOfFileMarker) 210 return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString())); 211 else { 212 result.append(cc); 213 WEBVTT_ADVANCE_TO(StartTagState); 214 } 215 } 216 END_STATE() 217 218 WEBVTT_BEGIN_STATE(StartTagClassState) { 219 if (isTokenizerWhitespace(cc)) { 220 addNewClass(classes, buffer); 221 buffer.clear(); 222 WEBVTT_ADVANCE_TO(StartTagAnnotationState); 223 } else if (cc == '.') { 224 addNewClass(classes, buffer); 225 buffer.clear(); 226 WEBVTT_ADVANCE_TO(StartTagClassState); 227 } else if (cc == '>' || cc == kEndOfFileMarker) { 228 addNewClass(classes, buffer); 229 buffer.clear(); 230 return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString())); 231 } else { 232 buffer.append(cc); 233 WEBVTT_ADVANCE_TO(StartTagClassState); 234 } 235 236 } 237 END_STATE() 238 239 WEBVTT_BEGIN_STATE(StartTagAnnotationState) { 240 if (cc == '>' || cc == kEndOfFileMarker) { 241 return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString(), buffer.toAtomicString())); 242 } 243 buffer.append(cc); 158 result.append(character); 159 WEBVTT_ADVANCE_TO(DataState); 160 } 161 162 TagState: 163 if (isTokenizerWhitespace(character)) { 164 ASSERT(result.isEmpty()); 244 165 WEBVTT_ADVANCE_TO(StartTagAnnotationState); 245 } 246 END_STATE() 247 248 WEBVTT_BEGIN_STATE(EndTagState) { 249 if (cc == '>' || cc == kEndOfFileMarker) 250 return advanceAndEmitToken(m_input, token, WebVTTToken::EndTag(result.toString())); 251 result.append(cc); 166 } else if (character == '.') { 167 ASSERT(result.isEmpty()); 168 WEBVTT_ADVANCE_TO(StartTagClassState); 169 } else if 
(character == '/') { 252 170 WEBVTT_ADVANCE_TO(EndTagState); 253 } 254 END_STATE() 255 256 WEBVTT_BEGIN_STATE(TimestampTagState) { 257 if (cc == '>' || cc == kEndOfFileMarker) 258 return advanceAndEmitToken(m_input, token, WebVTTToken::TimestampTag(result.toString())); 259 result.append(cc); 171 } else if (WTF::isASCIIDigit(character)) { 172 result.append(character); 260 173 WEBVTT_ADVANCE_TO(TimestampTagState); 261 } 262 END_STATE() 263 264 } 265 266 ASSERT_NOT_REACHED(); 267 return false; 174 } else if (character == '>' || character == kEndOfFileMarker) { 175 ASSERT(result.isEmpty()); 176 return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString())); 177 } else { 178 result.append(character); 179 WEBVTT_ADVANCE_TO(StartTagState); 180 } 181 182 StartTagState: 183 if (isTokenizerWhitespace(character)) 184 WEBVTT_ADVANCE_TO(StartTagAnnotationState); 185 else if (character == '.') 186 WEBVTT_ADVANCE_TO(StartTagClassState); 187 else if (character == '>' || character == kEndOfFileMarker) 188 return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString())); 189 else { 190 result.append(character); 191 WEBVTT_ADVANCE_TO(StartTagState); 192 } 193 194 StartTagClassState: 195 if (isTokenizerWhitespace(character)) { 196 addNewClass(classes, buffer); 197 buffer.clear(); 198 WEBVTT_ADVANCE_TO(StartTagAnnotationState); 199 } else if (character == '.') { 200 addNewClass(classes, buffer); 201 buffer.clear(); 202 WEBVTT_ADVANCE_TO(StartTagClassState); 203 } else if (character == '>' || character == kEndOfFileMarker) { 204 addNewClass(classes, buffer); 205 buffer.clear(); 206 return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString())); 207 } else { 208 buffer.append(character); 209 WEBVTT_ADVANCE_TO(StartTagClassState); 210 } 211 212 StartTagAnnotationState: 213 if (character == '>' || character == kEndOfFileMarker) 214 return advanceAndEmitToken(m_input, token, 
WebVTTToken::StartTag(result.toString(), classes.toAtomicString(), buffer.toAtomicString())); 215 buffer.append(character); 216 WEBVTT_ADVANCE_TO(StartTagAnnotationState); 217 218 EndTagState: 219 if (character == '>' || character == kEndOfFileMarker) 220 return advanceAndEmitToken(m_input, token, WebVTTToken::EndTag(result.toString())); 221 result.append(character); 222 WEBVTT_ADVANCE_TO(EndTagState); 223 224 TimestampTagState: 225 if (character == '>' || character == kEndOfFileMarker) 226 return advanceAndEmitToken(m_input, token, WebVTTToken::TimestampTag(result.toString())); 227 result.append(character); 228 WEBVTT_ADVANCE_TO(TimestampTagState); 268 229 } 269 230 -
trunk/Source/WebCore/html/track/WebVTTTokenizer.h
r178173 r178265 41 41 42 42 class WebVTTTokenizer { 43 WTF_MAKE_NONCOPYABLE(WebVTTTokenizer); 44 43 public: 45 44 explicit WebVTTTokenizer(const String&); 46 47 45 bool nextToken(WebVTTToken&); 48 46 49 inline bool shouldSkipNullCharacters() const { return true; } 47 static bool neverSkipNullCharacters() { return false; } 50 48 51 49 private: 52 50 SegmentedString m_input; 53 54 // http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream 55 InputStreamPreprocessor<WebVTTTokenizer> m_inputStreamPreprocessor; 51 InputStreamPreprocessor<WebVTTTokenizer> m_preprocessor; 56 52 }; 57 53 -
trunk/Source/WebCore/platform/text/SegmentedString.cpp
r178173 r178265 20 20 #include "config.h" 21 21 #include "SegmentedString.h" 22 23 #include <wtf/text/TextPosition.h> 22 24 23 25 namespace WebCore { … … 45 47 } 46 48 47 const SegmentedString& SegmentedString::operator=(const SegmentedString& other) 49 SegmentedString& SegmentedString::operator=(const SegmentedString& other) 48 50 { 49 51 m_pushedChar1 = other.m_pushedChar1; … … 131 133 } 132 134 133 void SegmentedString::prepend(const SegmentedSubstring& s) 134 { 135 ASSERT(!escaped()); 135 void SegmentedString::pushBack(const SegmentedSubstring& s) 136 { 137 ASSERT(!m_pushedChar1); 136 138 ASSERT(!s.numberOfCharactersConsumed()); 137 139 if (!s.m_length) 138 140 return; 139 141 140 // FIXME: We're assuming that the prepend were originally consumed by 142 // FIXME: We're assuming that the characters were originally consumed by 141 143 // this SegmentedString. We're also ASSERTing that s is a fresh 142 144 // SegmentedSubstring. These assumptions are sufficient for our … … 167 169 { 168 170 ASSERT(!m_closed); 169 ASSERT(!s.escaped()); 171 ASSERT(!s.m_pushedChar1); 170 172 append(s.m_currentString); 171 173 if (s.isComposite()) { … … 178 180 } 179 181 180 void SegmentedString::prepend(const SegmentedString& s) 181 { 182 ASSERT(!escaped()); 183 ASSERT(!s.escaped()); 182 void SegmentedString::pushBack(const SegmentedString& s) 183 { 184 ASSERT(!m_pushedChar1); 185 ASSERT(!s.m_pushedChar1); 184 186 if (s.isComposite()) { 185 187 Deque<SegmentedSubstring>::const_reverse_iterator it = s.m_substrings.rbegin(); 186 188 Deque<SegmentedSubstring>::const_reverse_iterator e = s.m_substrings.rend(); 187 189 for (; it != e; ++it) 188 prepend(*it); 189 } 190 prepend(s.m_currentString); 190 pushBack(*it); 191 } 192 pushBack(s.m_currentString); 191 193 m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? 
m_currentString.getCurrentChar() : 0); 192 194 } … … 229 231 } 230 232 231 void SegmentedString::advance(unsigned count, UChar* consumedCharacters) 233 void SegmentedString::advancePastNonNewlines(unsigned count, UChar* consumedCharacters) 232 234 { 233 235 ASSERT_WITH_SECURITY_IMPLICATION(count <= length()); 234 236 for (unsigned i = 0; i < count; ++i) { 235 237 consumedCharacters[i] = currentChar(); 236 advance(); 238 advancePastNonNewline(); 237 239 } 238 240 } … … 354 356 OrdinalNumber SegmentedString::currentColumn() const 355 357 { 356 int zeroBasedColumn = numberOfCharactersConsumed() - m_numberOfCharactersConsumedPriorToCurrentLine; 357 return OrdinalNumber::fromZeroBasedInt(zeroBasedColumn); 358 return OrdinalNumber::fromZeroBasedInt(numberOfCharactersConsumed() - m_numberOfCharactersConsumedPriorToCurrentLine); 358 359 } 359 360 … … 364 365 } 365 366 366 } 367 SegmentedString::AdvancePastResult SegmentedString::advancePastSlowCase(const char* literal, bool caseSensitive) 368 { 369 unsigned length = strlen(literal); 370 if (length > this->length()) 371 return NotEnoughCharacters; 372 UChar* consumedCharacters; 373 String consumedString = String::createUninitialized(length, consumedCharacters); 374 advancePastNonNewlines(length, consumedCharacters); 375 if (consumedString.startsWith(literal, caseSensitive)) 376 return DidMatch; 377 pushBack(SegmentedString(consumedString)); 378 return DidNotMatch; 379 } 380 381 } -
trunk/Source/WebCore/platform/text/SegmentedString.h
r178173 r178265 1 1 /* 2 Copyright (C) 2004, 2005, 2006, 2007, 2008 Apple Inc. All rights reserved. 2 Copyright (C) 2004-2008, 2015 Apple Inc. All rights reserved. 3 3 4 4 This library is free software; you can redistribute it and/or … … 23 23 #include <wtf/Deque.h> 24 24 #include <wtf/text/StringBuilder.h> 25 #include <wtf/text/TextPosition.h> 26 #include <wtf/text/WTFString.h> 27 25 28 26 namespace WebCore { … … 171 169 172 170 SegmentedString(const SegmentedString&); 173 174 const SegmentedString& operator=(const SegmentedString&); 171 SegmentedString& operator=(const SegmentedString&); 175 172 176 173 void clear(); … … 178 175 179 176 void append(const SegmentedString&); 180 void prepend(const SegmentedString&); 181 182 bool excludeLineNumbers() const { return m_currentString.excludeLineNumbers(); } 177 void pushBack(const SegmentedString&); 178 183 179 void setExcludeLineNumbers(); 184 180 … … 200 196 bool isClosed() const { return m_closed; } 201 197 202 enum LookAheadResult { 203 DidNotMatch, 204 DidMatch, 205 NotEnoughCharacters, 206 }; 207 208 LookAheadResult lookAhead(const String& string) { return lookAheadInline(string, true); } 209 LookAheadResult lookAheadIgnoringCase(const String& string) { return lookAheadInline(string, false); } 198 enum AdvancePastResult { DidNotMatch, DidMatch, NotEnoughCharacters }; 199 template<unsigned length> AdvancePastResult advancePast(const char (&literal)[length]) { return advancePast(literal, length - 1, true); } 200 template<unsigned length> AdvancePastResult advancePastIgnoringCase(const char (&literal)[length]) { return advancePast(literal, length - 1, false); } 210 201 211 202 void advance() … … 227 218 } 228 219 229 inline void advanceAndUpdateLineNumber() 220 void advanceAndUpdateLineNumber() 230 221 { 231 222 if (m_fastPathFlags & Use8BitAdvance) { … … 252 243 253 244 (this->*m_advanceAndUpdateLineNumberFunc)(); 254 } 255 256 void advanceAndASSERT(UChar expectedCharacter) 257 { 258 ASSERT_UNUSED(expectedCharacter, 
currentChar() == expectedCharacter); 259 advance(); 260 } 261 262 void advanceAndASSERTIgnoringCase(UChar expectedCharacter) 263 { 264 ASSERT_UNUSED(expectedCharacter, u_foldCase(currentChar(), U_FOLD_CASE_DEFAULT) == u_foldCase(expectedCharacter, U_FOLD_CASE_DEFAULT)); 265 advance(); 266 245 } 267 246 … … 287 266 } 288 267 289 // Writes the consumed characters into consumedCharacters, which must 290 // have space for at least |count| characters. 291 void advance(unsigned count, UChar* consumedCharacters); 292 293 bool escaped() const { return m_pushedChar1; } 294 295 268 int numberOfCharactersConsumed() const 296 269 { … … 308 281 UChar currentChar() const { return m_currentChar; } 309 282 310 // The method is moderately slow, comparing to currentLine method. 311 283 OrdinalNumber currentColumn() const; 312 284 OrdinalNumber currentLine() const; 313 // Sets value of line/column variables. Column is specified indirectly by a parameter columnAftreProlog 285 286 // Sets value of line/column variables. Column is specified indirectly by a parameter columnAfterProlog 287 // which is a value of column that we should get after a prolog (first prologLength characters) has been consumed. 
315 void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAftreProlog, int prologLength); 288 void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAfterProlog, int prologLength); 316 289 317 290 private: … … 323 296 324 297 void append(const SegmentedSubstring&); 325 void prepend(const SegmentedSubstring&); 298 void pushBack(const SegmentedSubstring&); 326 299 327 300 void advance8(); … … 375 348 } 376 349 377 inline LookAheadResult lookAheadInline(const String& string, bool caseSensitive) 378 { 379 if (!m_pushedChar1 && string.length() <= static_cast<unsigned>(m_currentString.m_length)) { 380 String currentSubstring = m_currentString.currentSubString(string.length()); 381 if (currentSubstring.startsWith(string, caseSensitive)) 382 return DidMatch; 383 return DidNotMatch; 384 } 385 return lookAheadSlowCase(string, caseSensitive); 386 } 387 388 LookAheadResult lookAheadSlowCase(const String& string, bool caseSensitive) 389 { 390 unsigned count = string.length(); 391 if (count > length()) 392 return NotEnoughCharacters; 393 UChar* consumedCharacters; 394 String consumedString = String::createUninitialized(count, consumedCharacters); 395 advance(count, consumedCharacters); 396 LookAheadResult result = DidNotMatch; 397 if (consumedString.startsWith(string, caseSensitive)) 398 result = DidMatch; 399 prepend(SegmentedString(consumedString)); 400 return result; 401 } 350 // Writes consumed characters into consumedCharacters, which must have space for at least |count| characters. 
351 void advancePastNonNewlines(unsigned count); 352 void advancePastNonNewlines(unsigned count, UChar* consumedCharacters); 353 354 AdvancePastResult advancePast(const char* literal, unsigned length, bool caseSensitive); 355 AdvancePastResult advancePastSlowCase(const char* literal, bool caseSensitive); 402 356 403 357 bool isComposite() const { return !m_substrings.isEmpty(); } … … 418 372 }; 419 373 374 inline void SegmentedString::advancePastNonNewlines(unsigned count) 375 { 376 for (unsigned i = 0; i < count; ++i) 377 advancePastNonNewline(); 420 378 } 421 379 380 inline SegmentedString::AdvancePastResult SegmentedString::advancePast(const char* literal, unsigned length, bool caseSensitive) 381 { 382 ASSERT(strlen(literal) == length); 383 ASSERT(!strchr(literal, '\n')); 384 if (!m_pushedChar1) { 385 if (length <= static_cast<unsigned>(m_currentString.m_length)) { 386 if (!m_currentString.currentSubString(length).startsWith(literal, caseSensitive)) 387 return DidNotMatch; 388 advancePastNonNewlines(length); 389 return DidMatch; 390 } 391 } 392 return advancePastSlowCase(literal, caseSensitive); 393 } 394 395 } 396 422 397 #endif -
trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h
r178173 r178265 32 32 namespace WebCore { 33 33 34 inline bool isHexDigit(UChar cc) 35 { 36 return (cc >= '0' && cc <= '9') || (cc >= 'a' && cc <= 'f') || (cc >= 'A' && cc <= 'F'); 37 } 38 39 34 inline void unconsumeCharacters(SegmentedString& source, const StringBuilder& consumedCharacters) 40 35 { 41 if (consumedCharacters.length() == 1) 42 source.push(consumedCharacters[0]); 43 else if (consumedCharacters.length() == 2) { 44 source.push(consumedCharacters[0]); 45 source.push(consumedCharacters[1]); 46 } else 47 source.prepend(SegmentedString(consumedCharacters.toStringPreserveCapacity())); 36 source.pushBack(SegmentedString(consumedCharacters.toStringPreserveCapacity())); 48 37 } 49 38 … … 55 44 ASSERT(decodedCharacter.isEmpty()); 56 45 57 enum EntityState { 46 enum { 58 47 Initial, 59 48 Number, … … 63 52 Decimal, 64 53 Named 65 }; 66 EntityState entityState = Initial; 54 } state = Initial; 67 55 UChar32 result = 0; 68 56 bool overflow = false; … … 71 59 72 60 while (!source.isEmpty()) { 73 UChar cc = source.currentChar(); 74 switch (entityState) { 75 case Initial: { 76 if (cc == '\x09' || cc == '\x0A' || cc == '\x0C' || cc == ' ' || cc == '<' || cc == '&') 61 UChar character = source.currentChar(); 62 switch (state) { 63 case Initial: 64 if (character == '\x09' || character == '\x0A' || character == '\x0C' || character == ' ' || character == '<' || character == '&') 77 65 return false; 78 if (additionalAllowedCharacter && cc == additionalAllowedCharacter) 66 if (additionalAllowedCharacter && character == additionalAllowedCharacter) 79 67 return false; 80 if (cc == '#') { 81 entityState = Number; 68 if (character == '#') { 69 state = Number; 82 70 break; 83 71 } 84 if ((cc >= 'a' && cc <= 'z') || (cc >= 'A' && cc <= 'Z')) { 85 entityState = Named; 86 continue; 72 if (isASCIIAlpha(character)) { 73 state = Named; 74 goto Named; 87 75 } 88 76 return false; 89 } 90 case Number: { 91 if (cc == 'x') { 92 entityState = MaybeHexLowerCaseX; 77 case Number: 78 if (character == 'x') { 
79 state = MaybeHexLowerCaseX; 93 80 break; 94 81 } 95 if (cc == 'X') { 96 entityState = MaybeHexUpperCaseX; 82 if (character == 'X') { 83 state = MaybeHexUpperCaseX; 97 84 break; 98 85 } 99 if (cc >= '0' && cc <= '9') { 100 entityState = Decimal; 101 continue; 86 if (isASCIIDigit(character)) { 87 state = Decimal; 88 goto Decimal; 102 89 } 103 source.push('#'); 90 source.pushBack(SegmentedString(ASCIILiteral("#"))); 104 91 return false; 105 } 106 case MaybeHexLowerCaseX: { 107 if (isHexDigit(cc)) { 108 entityState = Hex; 109 continue; 92 case MaybeHexLowerCaseX: 93 if (isASCIIHexDigit(character)) { 94 state = Hex; 95 goto Hex; 110 96 } 111 source.push('#'); 112 source.push('x'); 97 source.pushBack(SegmentedString(ASCIILiteral("#x"))); 113 98 return false; 114 } 115 case MaybeHexUpperCaseX: { 116 if (isHexDigit(cc)) { 117 entityState = Hex; 118 continue; 99 case MaybeHexUpperCaseX: 100 if (isASCIIHexDigit(character)) { 101 state = Hex; 102 goto Hex; 119 103 } 120 source.push('#'); 121 source.push('X'); 104 source.pushBack(SegmentedString(ASCIILiteral("#X"))); 122 105 return false; 123 } 124 case Hex: { 125 if (cc >= '0' && cc <= '9') 126 result = result * 16 + cc - '0'; 127 else if (cc >= 'a' && cc <= 'f') 128 result = result * 16 + 10 + cc - 'a'; 129 else if (cc >= 'A' && cc <= 'F') 130 result = result * 16 + 10 + cc - 'A'; 131 else if (cc == ';') { 132 source.advanceAndASSERT(cc); 106 case Hex: 107 Hex: 108 if (isASCIIHexDigit(character)) { 109 result = result * 16 + toASCIIHexValue(character); 110 if (result > highestValidCharacter) 111 overflow = true; 112 break; 113 } 114 if (character == ';') { 115 source.advance(); 133 116 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result)); 134 117 return true; 135 } else if (ParserFunctions::acceptMalformed()) { 118 } 119 if (ParserFunctions::acceptMalformed()) { 136 120 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 
0 : result)); 137 121 return true; 138 } else { 139 unconsumeCharacters(source, consumedCharacters); 140 return false; 141 122 } 142 if (result > highestValidCharacter) 143 overflow = true; 144 break; 145 } 146 case Decimal: { 147 if (cc >= '0' && cc <= '9') 148 result = result * 10 + cc - '0'; 149 else if (cc == ';') { 150 source.advanceAndASSERT(cc); 123 unconsumeCharacters(source, consumedCharacters); 124 return false; 125 case Decimal: 126 Decimal: 127 if (isASCIIDigit(character)) { 128 result = result * 10 + character - '0'; 129 if (result > highestValidCharacter) 130 overflow = true; 131 break; 132 } 133 if (character == ';') { 134 source.advance(); 151 135 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result)); 152 136 return true; 153 } else if (ParserFunctions::acceptMalformed()) { 137 } 138 if (ParserFunctions::acceptMalformed()) { 154 139 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result)); 155 140 return true; 156 } else { 157 unconsumeCharacters(source, consumedCharacters); 158 return false; 159 141 } 160 if (result > highestValidCharacter) 161 overflow = true; 162 break; 142 unconsumeCharacters(source, consumedCharacters); 143 return false; 144 case Named: 145 Named: 146 return ParserFunctions::consumeNamedEntity(source, decodedCharacter, notEnoughCharacters, additionalAllowedCharacter, character); 163 147 } 164 case Named: { 165 return ParserFunctions::consumeNamedEntity(source, decodedCharacter, notEnoughCharacters, additionalAllowedCharacter, cc); 166 } 167 } 168 consumedCharacters.append(cc); 169 source.advanceAndASSERT(cc); 148 consumedCharacters.append(character); 149 source.advance(); 170 150 } 171 151 ASSERT(source.isEmpty()); -
trunk/Source/WebCore/xml/parser/MarkupTokenizerInlines.h
r178173 r178265 1 1 /* 2 * Copyright (C) 2008 Apple Inc. All Rights Reserved. 2 * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved. 3 3 * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/ 4 4 * Copyright (C) 2010 Google, Inc. All Rights Reserved. … … 31 31 #include "SegmentedString.h" 32 32 33 namespace WebCore { 34 35 inline bool isTokenizerWhitespace(UChar cc) 36 { 37 return cc == ' ' || cc == '\x0A' || cc == '\x09' || cc == '\x0C'; 38 } 39 40 inline void advanceStringAndASSERTIgnoringCase(SegmentedString& source, const char* expectedCharacters) 41 { 42 while (*expectedCharacters) 43 source.advanceAndASSERTIgnoringCase(*expectedCharacters++); 44 } 45 46 inline void advanceStringAndASSERT(SegmentedString& source, const char* expectedCharacters) 47 { 48 while (*expectedCharacters) 49 source.advanceAndASSERT(*expectedCharacters++); 50 } 51 52 33 #if COMPILER(MSVC) 53 // We need to disable the "unreachable code" warning because we want to assert 54 // that some code points aren't reached in the state machine. 34 // Disable the "unreachable code" warning so we can compile the ASSERT_NOT_REACHED in the END_STATE macro. 55 35 #pragma warning(disable: 4702) 56 36 #endif 57 37 58 #define BEGIN_STATE(prefix, stateName) case prefix::stateName: stateName: 59 #define END_STATE() ASSERT_NOT_REACHED(); break; 38 namespace WebCore { 60 39 61 // We use this macro when the HTML5 spec says "reconsume the current input 62 // character in the <mumble> state." 
63 #define RECONSUME_IN(prefix, stateName) \ 64 do { \ 65 m_state = prefix::stateName; \ 66 goto stateName; \ 40 inline bool isTokenizerWhitespace(UChar character) 41 { 42 return character == ' ' || character == '\x0A' || character == '\x09' || character == '\x0C'; 43 } 44 45 #define BEGIN_STATE(stateName) \ 46 case stateName: \ 47 stateName: { \ 48 const auto currentState = stateName; \ 49 UNUSED_PARAM(currentState); 50 51 #define END_STATE() \ 52 ASSERT_NOT_REACHED(); \ 53 break; \ 54 } 55 56 #define RETURN_IN_CURRENT_STATE(expression) \ 57 do { \ 58 m_state = currentState; \ 59 return expression; \ 67 60 } while (false) 68 61 69 // We use this macro when the HTML5 spec says "consume the next input 70 // character ... and switch to the <mumble> state." 71 #define ADVANCE_TO(prefix, stateName) \ 72 do { \ 73 m_state = prefix::stateName; \ 74 if (!m_inputStreamPreprocessor.advance(source)) \ 75 return haveBufferedCharacterToken(); \ 76 cc = m_inputStreamPreprocessor.nextInputCharacter(); \ 77 goto stateName; \ 62 // We use this macro when the HTML spec says "reconsume the current input character in the <mumble> state." 63 #define RECONSUME_IN(newState) \ 64 do { \ 65 goto newState; \ 78 66 } while (false) 79 67 80 // Sometimes there's more complicated logic in the spec that separates when 81 // we consume the next input character and when we switch to a particular 82 // state. We handle those cases by advancing the source directly and using 83 // this macro to switch to the indicated state. 84 #define SWITCH_TO(prefix, stateName) \ 85 do { \ 86 m_state = prefix::stateName; \ 87 if (source.isEmpty() || !m_inputStreamPreprocessor.peek(source)) \ 88 return haveBufferedCharacterToken(); \ 89 cc = m_inputStreamPreprocessor.nextInputCharacter(); \ 90 goto stateName; \ 68 // We use this macro when the HTML spec says "consume the next input character ... and switch to the <mumble> state." 
69 #define ADVANCE_TO(newState) \ 70 do { \ 71 if (!m_preprocessor.advance(source, isNullCharacterSkippingState(newState))) { \ 72 m_state = newState; \ 73 return haveBufferedCharacterToken(); \ 74 } \ 75 character = m_preprocessor.nextInputCharacter(); \ 76 goto newState; \ 77 } while (false) 78 79 // For more complex cases, caller consumes the characters first and then uses this macro. 80 #define SWITCH_TO(newState) \ 81 do { \ 82 if (!m_preprocessor.peek(source, isNullCharacterSkippingState(newState))) { \ 83 m_state = newState; \ 84 return haveBufferedCharacterToken(); \ 85 } \ 86 character = m_preprocessor.nextInputCharacter(); \ 87 goto newState; \ 91 88 } while (false) 92 89
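The revised ADVANCE_TO and SWITCH_TO macros above write m_state only when input runs out and otherwise jump straight to the label for the next state. A minimal standalone sketch of that pattern follows. It is an illustration only: MiniTokenizer, nextTag and MINI_ADVANCE_TO are hypothetical names that do not appear in WebKit, and the sketch reads from a std::string rather than a SegmentedString with an InputStreamPreprocessor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical miniature tokenizer; none of these names come from WebKit.
class MiniTokenizer {
public:
    enum State { DataState, TagNameState };

    // Scans input from position. Returns true and fills tagName once a
    // complete "<name>" has been read. Returns false when input runs out;
    // the state is saved so that a later call with more input can resume.
    bool nextTag(const std::string& input, std::size_t& position, std::string& tagName)
    {
        if (position >= input.size())
            return false;
        char character = input[position];

// Consume the current character, then jump directly to newState. The state
// member is written only when input is exhausted and we have to suspend.
#define MINI_ADVANCE_TO(newState) \
    do { \
        if (++position >= input.size()) { \
            m_state = newState; \
            return false; \
        } \
        character = input[position]; \
        goto newState; \
    } while (false)

        switch (m_state) {
        case DataState:
        DataState:
            if (character == '<') {
                m_buffer.clear();
                MINI_ADVANCE_TO(TagNameState);
            }
            MINI_ADVANCE_TO(DataState);

        case TagNameState:
        TagNameState:
            if (character == '>') {
                ++position;
                m_state = DataState;
                tagName = m_buffer;
                return true;
            }
            m_buffer += character;
            MINI_ADVANCE_TO(TagNameState);
        }
#undef MINI_ADVANCE_TO

        assert(false); // Every path above either returns or jumps to a label.
        return false;
    }

private:
    State m_state { DataState };
    std::string m_buffer;
};
```

Each state has both a case label, used when a call resumes from a saved state, and a goto label with the same name, used for direct transitions while input is still available. Labels live in their own name space in C++, so sharing the enumerator's name is legal.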