Changeset 178265 in webkit


Timestamp:
Jan 12, 2015 8:22:50 AM (9 years ago)
Author:
Darin Adler
Message:

Modernize and streamline HTMLTokenizer
https://bugs.webkit.org/show_bug.cgi?id=140166

Reviewed by Sam Weinig.

Source/WebCore:

  • html/parser/AtomicHTMLToken.h:

(WebCore::AtomicHTMLToken::initializeAttributes): Removed unneeded assertions
based on fields I removed.

  • html/parser/HTMLDocumentParser.cpp:

(WebCore::HTMLDocumentParser::HTMLDocumentParser): Changed to use updateStateFor
to set the initial state when parsing a fragment, since it implements the same
rule that the tokenizerStateForContextElement function did.
(WebCore::HTMLDocumentParser::pumpTokenizer): Updated to use the revised
interfaces for HTMLSourceTracker and HTMLTokenizer.
(WebCore::HTMLDocumentParser::constructTreeFromHTMLToken): Changed to take a
TokenPtr instead of an HTMLToken, so we can clear out the TokenPtr earlier
for non-character tokens, and let them get cleared later for character tokens.
(WebCore::HTMLDocumentParser::insert): Pass references.
(WebCore::HTMLDocumentParser::append): Ditto.
(WebCore::HTMLDocumentParser::appendCurrentInputStreamToPreloadScannerAndScan): Ditto.

  • html/parser/HTMLDocumentParser.h: Updated argument type for constructTreeFromHTMLToken
and removed now-unneeded m_token data members.

  • html/parser/HTMLEntityParser.cpp: Removed unneeded uses of the inline keyword.

(WebCore::HTMLEntityParser::consumeNamedEntity): Replaced two uses of
advanceAndASSERT with just plain advance; there's really no need to assert the
character is the one we just got out of the string.

  • html/parser/HTMLInputStream.h: Moved the include of TextPosition.h here from
its old location since this class has two data members that are OrdinalNumber.

  • html/parser/HTMLMetaCharsetParser.cpp:

(WebCore::HTMLMetaCharsetParser::HTMLMetaCharsetParser): Removed most of the
initialization, since it's now done by defaults.
(WebCore::extractCharset): Rewrote this to be a non-member function, and to
use a for loop, and to handle quote marks in a simpler way. Also changed it
to return a StringView so we don't have to allocate a new string.
(WebCore::HTMLMetaCharsetParser::processMeta): Use a modern for loop, and
also take a token argument since it's no longer a data member.
(WebCore::HTMLMetaCharsetParser::encodingFromMetaAttributes): Use a modern for
loop, StringView instead of string, and don't bother naming the local enum.
(WebCore::HTMLMetaCharsetParser::checkForMetaCharset): Updated for the new
way of getting tokens from the tokenizer.

  • html/parser/HTMLMetaCharsetParser.h: Got rid of some data members and
tightened up the formatting a little. Don't bother allocating the tokenizer
on the heap.

  • html/parser/HTMLPreloadScanner.cpp:

(WebCore::TokenPreloadScanner::TokenPreloadScanner): Removed unneeded
initialization.
(WebCore::HTMLPreloadScanner::HTMLPreloadScanner): Ditto.
(WebCore::HTMLPreloadScanner::scan): Changed to take a reference.

  • html/parser/HTMLPreloadScanner.h: Removed unneeded includes, typedefs,
and forward declarations. Removed explicit declaration of the destructor,
since the default one works. Removed unused createCheckpoint and rewindTo
functions. Gave initial values for various data members. Marked the device
scale factor const because it's set in the constructor and never changed.
Also removed the unneeded isSafeToSendToAnotherThread.

  • html/parser/HTMLResourcePreloader.cpp:

(WebCore::PreloadRequest::isSafeToSendToAnotherThread): Deleted.

  • html/parser/HTMLResourcePreloader.h:

(WebCore::PreloadRequest::PreloadRequest): Removed unneeded calls to
isolatedCopy. Also removed isSafeToSendToAnotherThread.

  • html/parser/HTMLSourceTracker.cpp:

(WebCore::HTMLSourceTracker::startToken): Renamed. Changed to keep state
in the source tracker itself, not the token.

(WebCore::HTMLSourceTracker::endToken): Ditto.
(WebCore::HTMLSourceTracker::source): Renamed. Changed to use the state
from the source tracker.

  • html/parser/HTMLSourceTracker.h: Removed unneeded include of HTMLToken.h.
Renamed functions, removed now-unneeded comment.

  • html/parser/HTMLToken.h: Cut down on the fields used by the source tracker.
It only needs to know the start and end of each attribute, not each part of
each attribute. Removed setBaseOffset, setEndOffset, length, addNewAttribute,
beginAttributeName, endAttributeName, beginAttributeValue, endAttributeValue,
m_baseOffset and m_length. Added beginAttribute and endAttribute.
(WebCore::HTMLToken::clear): No need to zero m_length or m_baseOffset any more.
(WebCore::HTMLToken::length): Deleted.
(WebCore::HTMLToken::setBaseOffset): Deleted.
(WebCore::HTMLToken::setEndOffset): Deleted.
(WebCore::HTMLToken::beginStartTag): Only null out m_currentAttribute if we
are compiling in assertions.
(WebCore::HTMLToken::beginEndTag): Ditto.
(WebCore::HTMLToken::addNewAttribute): Deleted.
(WebCore::HTMLToken::beginAttribute): Moved the code from addNewAttribute in
here and set the start offset.
(WebCore::HTMLToken::beginAttributeName): Deleted.
(WebCore::HTMLToken::endAttributeName): Deleted.
(WebCore::HTMLToken::beginAttributeValue): Deleted.
(WebCore::HTMLToken::endAttributeValue): Deleted.

  • html/parser/HTMLTokenizer.cpp:

(WebCore::HTMLToken::endAttribute): Added. Sets the end offset.
(WebCore::HTMLToken::appendToAttributeName): Updated assertion.
(WebCore::HTMLToken::appendToAttributeValue): Ditto.
(WebCore::convertASCIIAlphaToLower): Renamed from toLowerCase and changed
so it's legal to call on lower case letters too.
(WebCore::vectorEqualsString): Changed to take a string literal rather than
a WTF::String.
(WebCore::HTMLTokenizer::inEndTagBufferingState): Made this a member function.
(WebCore::HTMLTokenizer::HTMLTokenizer): Updated for data member changes.
(WebCore::HTMLTokenizer::bufferASCIICharacter): Added. Optimized version of
bufferCharacter for the common case where we know the character is ASCII.
(WebCore::HTMLTokenizer::bufferCharacter): Moved this function here from the
header since it's only used inside the class.
(WebCore::HTMLTokenizer::emitAndResumeInDataState): Moved this here, renamed
it and removed the state argument.
(WebCore::HTMLTokenizer::emitAndReconsumeInDataState): Ditto.
(WebCore::HTMLTokenizer::emitEndOfFile): More of the same.
(WebCore::HTMLTokenizer::saveEndTagNameIfNeeded): Ditto.
(WebCore::HTMLTokenizer::haveBufferedCharacterToken): Ditto.
(WebCore::HTMLTokenizer::flushBufferedEndTag): Updated since m_token is now
the actual token, not just a pointer.
(WebCore::HTMLTokenizer::flushEmitAndResumeInDataState): Renamed this and
removed the state argument.
(WebCore::HTMLTokenizer::processToken): This function, formerly nextToken,
is now the internal function used by nextToken. Updated its contents to use
simpler macros, changed code to set m_state when returning, rather than
constantly setting it when cycling through states, switched style to use
early return/goto rather than lots of else statements, took out unneeded
braces now that BEGIN/END_STATE handles the braces, collapsed upper and
lower case letter handling in many states, changed lookAhead call sites to
use the new advancePast function instead.
(WebCore::HTMLTokenizer::updateStateFor): Set m_state directly instead of
calling the setState function.
(WebCore::HTMLTokenizer::appendToTemporaryBuffer): Moved here from header.
(WebCore::HTMLTokenizer::temporaryBufferIs): Changed argument type to
a literal instead of a WTF::String.
(WebCore::HTMLTokenizer::appendToPossibleEndTag): Renamed and changed type
to be a UChar instead of LChar, although all characters will be ASCII.
(WebCore::HTMLTokenizer::isAppropriateEndTag): Marked const, and changed
type from size_t to unsigned.
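
The convertASCIIAlphaToLower entry above says the helper is now legal to call on lower case letters too. One way to get that property is to set the 0x20 bit instead of adding an offset. The following is a minimal sketch under that assumption; the name lowerASCIIAlphaSketch is hypothetical and this is not presented as the WebKit implementation.

```cpp
#include <cassert>

// Hypothetical illustration, not the WebKit code.
// Precondition: character is an ASCII letter of either case.
inline char16_t lowerASCIIAlphaSketch(char16_t character)
{
    assert((character >= u'A' && character <= u'Z')
        || (character >= u'a' && character <= u'z'));
    // Upper and lower case ASCII letters differ only in bit 0x20, so setting
    // that bit lowers 'A'..'Z' and leaves 'a'..'z' unchanged.
    return static_cast<char16_t>(character | 0x20);
}
```

Because the same expression handles both cases, a tokenizer state can treat upper and lower case letters in one branch, which fits the "collapsed upper and lower case letter handling" note in the processToken entry.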

  • html/parser/HTMLTokenizer.h: Changed interface of nextToken so it returns
a TokenPtr so code doesn't have to understand special rules about when to
work with an HTMLToken and when to clear it. Made most functions private,
and made the State enum private as well. Replaced the state and setState
functions with more specific functions for the few states we need to deal
with outside the class. Moved function bodies outside the class definition
so it's easier to read the class definition.
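
To illustrate why returning a TokenPtr removes the "special rules" about clearing, here is a rough sketch of a scoped handle that clears the underlying token when it goes out of scope unless it was cleared earlier. TokenSketch and TokenPtrSketch are hypothetical names, and the real HTMLTokenizer::TokenPtr may differ in its details.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for HTMLToken: just a type tag and some data.
struct TokenSketch {
    enum Type { Uninitialized, Character, StartTag };
    Type type = Uninitialized;
    std::string data;
    void clear() { type = Uninitialized; data.clear(); }
};

// Hypothetical stand-in for TokenPtr: refers to a token owned elsewhere
// and clears it exactly once, either explicitly or on destruction.
class TokenPtrSketch {
public:
    TokenPtrSketch() = default;
    explicit TokenPtrSketch(TokenSketch* token) : m_token(token) { }
    TokenPtrSketch(TokenPtrSketch&& other) : m_token(other.m_token) { other.m_token = nullptr; }
    TokenPtrSketch(const TokenPtrSketch&) = delete;
    TokenPtrSketch& operator=(const TokenPtrSketch&) = delete;
    ~TokenPtrSketch() { clear(); }

    explicit operator bool() const { return m_token != nullptr; }
    TokenSketch& operator*() const { return *m_token; }
    TokenSketch* operator->() const { return m_token; }

    // Clearing early also drops the pointer, so the destructor will not
    // clear the token a second time.
    void clear()
    {
        if (m_token) {
            m_token->clear();
            m_token = nullptr;
        }
    }

private:
    TokenSketch* m_token = nullptr;
};
```

With a handle like this, a caller only needs to test the handle for null and use it; forgetting to clear the token is no longer possible.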

  • html/parser/HTMLTreeBuilder.cpp:

(WebCore::HTMLTreeBuilder::processStartTagForInBody): Updated to use the
new set state functions instead of setState.
(WebCore::HTMLTreeBuilder::processEndTag): Ditto.
(WebCore::HTMLTreeBuilder::processGenericRCDATAStartTag): Ditto.
(WebCore::HTMLTreeBuilder::processGenericRawTextStartTag): Ditto.
(WebCore::HTMLTreeBuilder::processScriptStartTag): Ditto.

  • html/parser/InputStreamPreprocessor.h: Marked the constructor explicit,
and made it take a reference rather than a pointer.

  • html/parser/TextDocumentParser.cpp:

(WebCore::TextDocumentParser::insertFakePreElement): Updated to use the
new set state functions instead of setState.

  • html/parser/XSSAuditor.cpp:

(WebCore::XSSAuditor::decodedSnippetForName): Updated for name change.
(WebCore::XSSAuditor::decodedSnippetForAttribute): Updated for changes to
attribute range tracking.
(WebCore::XSSAuditor::decodedSnippetForJavaScript): Updated for name change.
(WebCore::XSSAuditor::isSafeToSendToAnotherThread): Deleted.

  • html/parser/XSSAuditor.h: Deleted isSafeToSendToAnotherThread.
  • html/track/WebVTTTokenizer.cpp: Removed the local state variable from
WEBVTT_ADVANCE_TO; there is no need for it.
(WebCore::WebVTTTokenizer::WebVTTTokenizer): Use a reference instead of a
pointer for the preprocessor.
(WebCore::WebVTTTokenizer::nextToken): Ditto. Also removed the state local
variable and the switch statement, replacing with labels instead since we
go between states with goto.

  • platform/text/SegmentedString.cpp:

(WebCore::SegmentedString::operator=): Changed the return type to be non-const
to match normal C++ design rules.
(WebCore::SegmentedString::pushBack): Renamed from prepend since this is not a
general purpose prepend function. Also fixed assertions to not use the strangely
named "escaped" function, since we are deleting it.
(WebCore::SegmentedString::append): Ditto.
(WebCore::SegmentedString::advancePastNonNewlines): Renamed from advance, since
the function only works for non-newlines.
(WebCore::SegmentedString::currentColumn): Got rid of unneeded local variable.
(WebCore::SegmentedString::advancePastSlowCase): Moved here from header and
renamed. This function now consumes the characters if they match.

  • platform/text/SegmentedString.h: Made the changes mentioned above.

(WebCore::SegmentedString::excludeLineNumbers): Deleted.
(WebCore::SegmentedString::advancePast): Renamed from lookAhead. Also changed
behavior so the characters are consumed.
(WebCore::SegmentedString::advancePastIgnoringCase): Ditto.
(WebCore::SegmentedString::advanceAndASSERT): Deleted.
(WebCore::SegmentedString::advanceAndASSERTIgnoringCase): Deleted.
(WebCore::SegmentedString::escaped): Deleted.
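
The difference between the old lookAhead and the new advancePast is that a successful match now also consumes the matched characters. A minimal sketch of that behavior over a plain std::string follows. InputSketch and AdvanceResultSketch are hypothetical names, and the three-way result is an assumption of this sketch; the ChangeLog states only that matching characters are consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical result type for this sketch.
enum class AdvanceResultSketch { DidNotMatch, DidMatch, NotEnoughCharacters };

// Hypothetical stand-in for an input stream with a read position.
class InputSketch {
public:
    explicit InputSketch(std::string text) : m_text(std::move(text)) { }

    // Consumes `literal` if the remaining input starts with it.
    // Leaves the position unchanged in every other case.
    AdvanceResultSketch advancePast(const std::string& literal)
    {
        std::size_t available = m_text.size() - m_position;
        if (available < literal.size()) {
            // The input ended early: report whether it could still match
            // once more characters arrive.
            bool isPrefix = m_text.compare(m_position, available, literal, 0, available) == 0;
            return isPrefix ? AdvanceResultSketch::NotEnoughCharacters
                            : AdvanceResultSketch::DidNotMatch;
        }
        if (m_text.compare(m_position, literal.size(), literal) != 0)
            return AdvanceResultSketch::DidNotMatch;
        m_position += literal.size();
        return AdvanceResultSketch::DidMatch;
    }

    std::string remaining() const { return m_text.substr(m_position); }

private:
    std::string m_text;
    std::size_t m_position = 0;
};
```

Folding the consume step into the match means call sites no longer need a separate "advance and assert" call after a successful look-ahead, which is consistent with the deletion of advanceAndASSERT in this change.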

  • xml/parser/CharacterReferenceParserInlines.h:

(WebCore::isHexDigit): Deleted.
(WebCore::unconsumeCharacters): Updated for name change.
(WebCore::consumeCharacterReference): Removed unneeded name for local enum,
renamed local variable "cc" to character. Changed code to use helpers like
isASCIIAlpha and toASCIIHexValue. Removed unneeded use of advanceAndASSERT,
since we don't really need to assert the character we just extracted.

  • xml/parser/MarkupTokenizerInlines.h:

(WebCore::isTokenizerWhitespace): Renamed argument to character.
(WebCore::advanceStringAndASSERTIgnoringCase): Deleted.
(WebCore::advanceStringAndASSERT): Deleted.
Changed all the macro implementations so they set m_state only when
returning from the function and just use goto inside the state machine.
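
The macro change described above (jump between states with goto, store m_state only when leaving the function) can be sketched with a toy tokenizer. TinyTokenizerSketch, its two states, and nextTag are all hypothetical; the sketch does not use the actual BEGIN_STATE/END_STATE macros and only recognizes "<name>" in a character stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class TinyTokenizerSketch {
public:
    enum State { DataState, TagOpenState };

    // Returns true and fills `tag` once a complete "<name>" has been seen.
    // Returns false when the input runs out; m_state then records where to
    // resume on the next call.
    bool nextTag(const std::string& input, std::size_t& position, std::string& tag)
    {
        // Dispatch once on the saved state, then move between states with goto.
        switch (m_state) {
        case DataState:
            goto Data;
        case TagOpenState:
            goto TagOpen;
        }
        return false;

    Data:
        if (position == input.size()) {
            m_state = DataState; // written only because we are returning
            return false;
        }
        if (input[position++] == '<') {
            m_buffer.clear();
            goto TagOpen; // no m_state write on an internal transition
        }
        goto Data;

    TagOpen:
        if (position == input.size()) {
            m_state = TagOpenState;
            return false;
        }
        {
            char character = input[position++];
            if (character == '>') {
                tag = m_buffer;
                m_state = DataState;
                return true;
            }
            m_buffer += character;
        }
        goto TagOpen;
    }

    State state() const { return m_state; }

private:
    State m_state = DataState;
    std::string m_buffer;
};
```

The member write happens once per call rather than once per transition, and the saved state still lets a tag that is split across two input chunks be completed on the second call.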

Source/WTF:

  • wtf/Forward.h: Removed PassRef, added OrdinalNumber and TextPosition.
Location:
trunk/Source
Files:
30 edited

  • trunk/Source/WTF/ChangeLog

    r178173 r178265  
    @@ -1,2 +1,11 @@
    +2015-01-12  Darin Adler  <darin@apple.com>
    +
    +        Modernize and streamline HTMLTokenizer
    +        https://bugs.webkit.org/show_bug.cgi?id=140166
    +
    +        Reviewed by Sam Weinig.
    +
    +        * wtf/Forward.h: Removed PassRef, added OrdinalNumber and TextPosition.
    +
     2015-01-09  Commit Queue  <commit-queue@webkit.org>
     
  • trunk/Source/WTF/wtf/Forward.h

    r178173 r178265  
    @@ -31,5 +31,4 @@
     template<typename T> class OwnPtr;
     template<typename T> class PassOwnPtr;
    -template<typename T> class PassRef;
     template<typename T> class PassRefPtr;
     template<typename T> class RefPtr;
    @@ -46,4 +45,5 @@
     class Encoder;
     class FunctionDispatcher;
    +class OrdinalNumber;
     class PrintStream;
     class String;
    @@ -51,4 +51,5 @@
     class StringImpl;
     class StringView;
    +class TextPosition;
     
     }
    @@ -64,7 +65,7 @@
     using WTF::LazyNeverDestroyed;
     using WTF::NeverDestroyed;
    +using WTF::OrdinalNumber;
     using WTF::OwnPtr;
     using WTF::PassOwnPtr;
    -using WTF::PassRef;
     using WTF::PassRefPtr;
     using WTF::PrintStream;
    @@ -76,4 +77,5 @@
     using WTF::StringImpl;
     using WTF::StringView;
    +using WTF::TextPosition;
     using WTF::Vector;
     
  • trunk/Source/WebCore/ChangeLog

    r178253 r178265  
    @@ -1,2 +1,223 @@
    +2015-01-12  Darin Adler  <darin@apple.com>
    +
    +        Modernize and streamline HTMLTokenizer
    +        https://bugs.webkit.org/show_bug.cgi?id=140166
    +
    +        Reviewed by Sam Weinig.
    +
    +        * html/parser/AtomicHTMLToken.h:
    +        (WebCore::AtomicHTMLToken::initializeAttributes): Removed unneeded assertions
    +        based on fields I removed.
    +
    +        * html/parser/HTMLDocumentParser.cpp:
    +        (WebCore::HTMLDocumentParser::HTMLDocumentParser): Change to use updateStateFor
    +        to set the initial state when parsing a fragment, since it implements the same
    +        rule taht the tokenizerStateForContextElement function did.
    +        (WebCore::HTMLDocumentParser::pumpTokenizer): Updated to use the revised
    +        interfaces for HTMLSourceTracker and HTMLTokenizer.
    +        (WebCore::HTMLDocumentParser::constructTreeFromHTMLToken): Changed to take a
    +        TokenPtr instead of an HTMLToken, so we can clear out the TokenPtr earlier
    +        for non-character tokens, and let them get cleared later for character tokens.
    +        (WebCore::HTMLDocumentParser::insert): Pass references.
    +        (WebCore::HTMLDocumentParser::append): Ditto.
    +        (WebCore::HTMLDocumentParser::appendCurrentInputStreamToPreloadScannerAndScan): Ditto.
    +
    +        * html/parser/HTMLDocumentParser.h: Updated argument type for constructTreeFromHTMLToken
    +        and removed now-unneeded m_token data members.
    +
    +        * html/parser/HTMLEntityParser.cpp: Removed unneeded uses of the inline keyword.
    +        (WebCore::HTMLEntityParser::consumeNamedEntity): Replaced two uses of
    +        advanceAndASSERT with just plain advance; there's really no need to assert the
    +        character is the one we just got out of the string.
    +
    +        * html/parser/HTMLInputStream.h: Moved the include of TextPosition.h here from
    +        its old location since this class has two data members that are OrdinalNumber.
    +
    +        * html/parser/HTMLMetaCharsetParser.cpp:
    +        (WebCore::HTMLMetaCharsetParser::HTMLMetaCharsetParser): Removed most of the
    +        initialization, since it's now done by defaults.
    +        (WebCore::extractCharset): Rewrote this to be a non-member function, and to
    +        use a for loop, and to handle quote marks in a simpler way. Also changed it
    +        to return a StringView so we don't have to allocate a new string.
    +        (WebCore::HTMLMetaCharsetParser::processMeta): Use a modern for loop, and
    +        also take a token argument since it's no longer a data member.
    +        (WebCore::HTMLMetaCharsetParser::encodingFromMetaAttributes): Use a modern for
    +        loop, StringView instead of string, and don't bother naming the local enum.
    +        (WebCore::HTMLMetaCharsetParser::checkForMetaCharset): Updated for the new
    +        way of getting tokens from the tokenizer.
    +
    +        * html/parser/HTMLMetaCharsetParser.h: Got rid of some data members and
    +        tightened up the formatting a little. Don't bother allocating the tokenizer
    +        on the heap.
    +
    +        * html/parser/HTMLPreloadScanner.cpp:
    +        (WebCore::TokenPreloadScanner::TokenPreloadScanner): Removed unneeded
    +        initialization.
    +        (WebCore::HTMLPreloadScanner::HTMLPreloadScanner): Ditto.
    +        (WebCore::HTMLPreloadScanner::scan): Changed to take a reference.
    +
    +        * html/parser/HTMLPreloadScanner.h: Removed unneeded includes, typedefs,
    +        and forward declarations. Removed explicit declaration of the destructor,
    +        since the default one works. Removed unused createCheckpoint and rewindTo
    +        functions. Gave initial values for various data members. Marked the device
    +        scale factor const beacuse it's set in the constructor and never changed.
    +        Also removed the unneeded isSafeToSendToAnotherThread.
    +
    +        * html/parser/HTMLResourcePreloader.cpp:
    +        (WebCore::PreloadRequest::isSafeToSendToAnotherThread): Deleted.
    +
    +        * html/parser/HTMLResourcePreloader.h:
    +        (WebCore::PreloadRequest::PreloadRequest): Removed unneeded calls to
    +        isolatedCopy. Also removed isSafeToSendToAnotherThread.
    +
    +        * html/parser/HTMLSourceTracker.cpp:
    +        (WebCore::HTMLSourceTracker::startToken): Renamed. Changed to keep state
    +         in the source tracker itself, not the token.
    +        (WebCore::HTMLSourceTracker::endToken): Ditto.
    +        (WebCore::HTMLSourceTracker::source): Renamed. Changed to use the state
    +        from the source tracker.
    +
    +        * html/parser/HTMLSourceTracker.h: Removed unneeded include of HTMLToken.h.
    +        Renamed functions, removed now-unneeded comment.
    +
    +        * html/parser/HTMLToken.h: Cut down on the fields used by the source tracker.
    +        It only needs to know the start and end of each attribute, not each part of
    +        each attribute. Removed setBaseOffset, setEndOffset, length, addNewAttribute,
    +        beginAttributeName, endAttributeName, beginAttributeValue, endAttributeValue,
    +        m_baseOffset and m_length. Added beginAttribute and endAttribute.
    +        (WebCore::HTMLToken::clear): No need to zero m_length or m_baseOffset any more.
    +        (WebCore::HTMLToken::length): Deleted.
    +        (WebCore::HTMLToken::setBaseOffset): Deleted.
    +        (WebCore::HTMLToken::setEndOffset): Deleted.
    +        (WebCore::HTMLToken::beginStartTag): Only null out m_currentAttribute if we
    +        are compiling in assertions.
    +        (WebCore::HTMLToken::beginEndTag): Ditto.
    +        (WebCore::HTMLToken::addNewAttribute): Deleted.
    +        (WebCore::HTMLToken::beginAttribute): Moved the code from addNewAttribute in
    +        here and set the start offset.
    +        (WebCore::HTMLToken::beginAttributeName): Deleted.
    +        (WebCore::HTMLToken::endAttributeName): Deleted.
    +        (WebCore::HTMLToken::beginAttributeValue): Deleted.
    +        (WebCore::HTMLToken::endAttributeValue): Deleted.
    +
    +        * html/parser/HTMLTokenizer.cpp:
    +        (WebCore::HTMLToken::endAttribute): Added. Sets the end offset.
    +        (WebCore::HTMLToken::appendToAttributeName): Updated assertion.
    +        (WebCore::HTMLToken::appendToAttributeValue): Ditto.
    +        (WebCore::convertASCIIAlphaToLower): Renamed from toLowerCase and changed
    +        so it's legal to call on lower case letters too.
    +        (WebCore::vectorEqualsString): Changed to take a string literal rather than
    +        a WTF::String.
    +        (WebCore::HTMLTokenizer::inEndTagBufferingState): Made this a member function.
    +        (WebCore::HTMLTokenizer::HTMLTokenizer): Updated for data member changes.
    +        (WebCore::HTMLTokenizer::bufferASCIICharacter): Added. Optimized version of
    +        bufferCharacter for the common case where we know the character is ASCII.
    +        (WebCore::HTMLTokenizer::bufferCharacter): Moved this function here from the
    +        header since it's only used inside the class.
    +        (WebCore::HTMLTokenizer::emitAndResumeInDataState): Moved this here, renamed
    +        it and removed the state argument.
    +        (WebCore::HTMLTokenizer::emitAndReconsumeInDataState): Ditto.
    +        (WebCore::HTMLTokenizer::emitEndOfFile): More of the same.
    +        (WebCore::HTMLTokenizer::saveEndTagNameIfNeeded): Ditto.
    +        (WebCore::HTMLTokenizer::haveBufferedCharacterToken): Ditto.
    +        (WebCore::HTMLTokenizer::flushBufferedEndTag): Updated since m_token is now
    +        the actual token, not just a pointer.
    +        (WebCore::HTMLTokenizer::flushEmitAndResumeInDataState): Renamed this and
    +        removed the state argument.
    +        (WebCore::HTMLTokenizer::processToken): This function, formerly nextToken,
    +        is now the internal function used by nextToken. Updated its contents to use
    +        simpler macros, changed code to set m_state when returning, rather than
    +        constantly setting it when cycling through states, switched style to use
    +        early return/goto rather than lots of else statements, took out unneeded
    +        braces now that BEGIN/END_STATE handles the braces, collapsed upper and
    +        lower case letter handling in many states, changed lookAhead call sites to
    +        use the new advancePast function instead.
    +        (WebCore::HTMLTokenizer::updateStateFor): Set m_state directly instead of
    +        calling a setstate function.
    +        (WebCore::HTMLTokenizer::appendToTemporaryBuffer): Moved here from header.
    +        (WebCore::HTMLTokenizer::temporaryBufferIs): Changed argument type to
    +        a literal instead of a WTF::String.
    +        (WebCore::HTMLTokenizer::appendToPossibleEndTag): Renamed and changed type
    +        to be a UChar instead of LChar, although all characters will be ASCII.
    +        (WebCore::HTMLTokenizer::isAppropriateEndTag): Marked const, and changed
    +        type from size_t to unsigned.
    +
    +        * html/parser/HTMLTokenizer.h: Changed interface of nextToken so it returns
    +        a TokenPtr so code doesn't have to understand special rules about when to
    +        work with an HTMLToken and when to clear it. Made most functions private,
    +        and made the State enum private as well. Replaced the state and setState
    +        functions with more specific functions for the few states we need to deal
    +        with outside the class. Moved function bodies outside the class definition
    +        so it's easier to read the class definition.
    +
    +        * html/parser/HTMLTreeBuilder.cpp:
    +        (WebCore::HTMLTreeBuilder::processStartTagForInBody): Updated to use the
    +        new set state functions instead of setState.
    +        (WebCore::HTMLTreeBuilder::processEndTag): Ditto.
    +        (WebCore::HTMLTreeBuilder::processGenericRCDATAStartTag): Ditto.
    +        (WebCore::HTMLTreeBuilder::processGenericRawTextStartTag): Ditto.
    +        (WebCore::HTMLTreeBuilder::processScriptStartTag): Ditto.
    +
    +        * html/parser/InputStreamPreprocessor.h: Marked the constructor explicit,
    +        and mde it take a reference rather than a pointer.
    +
    +        * html/parser/TextDocumentParser.cpp:
    +        (WebCore::TextDocumentParser::insertFakePreElement): Updated to use the
    +        new set state functions instead of setState.
    +
    +        * html/parser/XSSAuditor.cpp:
    +        (WebCore::XSSAuditor::decodedSnippetForName): Updated for name change.
    +        (WebCore::XSSAuditor::decodedSnippetForAttribute): Updated for changes to
    +        attribute range tracking.
    +        (WebCore::XSSAuditor::decodedSnippetForJavaScript): Updated for name change.
    +        (WebCore::XSSAuditor::isSafeToSendToAnotherThread): Deleted.
    +
    +        * html/parser/XSSAuditor.h: Deleted isSafeToSendToAnotherThread.
    +
    +        * html/track/WebVTTTokenizer.cpp: Removed the local state variable from
    +        WEBVTT_ADVANCE_TO; there is no need for it.
    +        (WebCore::WebVTTTokenizer::WebVTTTokenizer): Use a reference instead of a
    +        pointer for the preprocessor.
    +        (WebCore::WebVTTTokenizer::nextToken): Ditto. Also removed the state local
    +        variable and the switch statement, replacing with labels instead since we
    +        go between states with goto.
    +
    +        * platform/text/SegmentedString.cpp:
    +        (WebCore::SegmentedString::operator=): Changed the return type to be non-const
    +        to match normal C++ design rules.
    +        (WebCore::SegmentedString::pushBack): Renamed from prepend since this is not a
    +        general purpose prepend function. Also fixed assertions to not use the strangely
    +        named "escaped" function, since we are deleting it.
    +        (WebCore::SegmentedString::append): Ditto.
    +        (WebCore::SegmentedString::advancePastNonNewlines): Renamed from advance, since
    +        the function only works for non-newlines.
    +        (WebCore::SegmentedString::currentColumn): Got rid of unneeded local variable.
    +        (WebCore::SegmentedString::advancePastSlowCase): Moved here from header and
    +        renamed. This function now consumes the characters if they match.
    +
    +        * platform/text/SegmentedString.h: Made the changes mentioned above.
    +        (WebCore::SegmentedString::excludeLineNumbers): Deleted.
    +        (WebCore::SegmentedString::advancePast): Renamed from lookAhead. Also changed
    +        behavior so the characters are consumed.
    +        (WebCore::SegmentedString::advancePastIgnoringCase): Ditto.
    +        (WebCore::SegmentedString::advanceAndASSERT): Deleted.
    +        (WebCore::SegmentedString::advanceAndASSERTIgnoringCase): Deleted.
    +        (WebCore::SegmentedString::escaped): Deleted.
    +
    +        * xml/parser/CharacterReferenceParserInlines.h:
    +        (WebCore::isHexDigit): Deleted.
    +        (WebCore::unconsumeCharacters): Updated for name change.
    +        (WebCore::consumeCharacterReference): Removed unneeded name for local enum,
    +        renamed local variable "cc" to character. Changed code to use helpers like
    +        isASCIIAlpha and toASCIIHexValue. Removed unneeded use of advanceAndASSERT,
    +        since we don't really need to assert the character we just extracted.
    +
    +        * xml/parser/MarkupTokenizerInlines.h:
    +        (WebCore::isTokenizerWhitespace): Renamed argument to character.
    +        (WebCore::advanceStringAndASSERTIgnoringCase): Deleted.
    +        (WebCore::advanceStringAndASSERT): Deleted.
    +        Changed all the macro implementations so they set m_state only when
    +        returning from the function and just use goto inside the state machine.
    +
     2015-01-11  Andreas Kling  <akling@apple.com>
     
  • trunk/Source/WebCore/html/parser/AtomicHTMLToken.h

    r178173 r178265  
    @@ -192,9 +192,4 @@
                 continue;
     
    -        ASSERT(attribute.nameRange.start);
    -        ASSERT(attribute.nameRange.end);
    -        ASSERT(attribute.valueRange.start);
    -        ASSERT(attribute.valueRange.end);
    -
             QualifiedName name(nullAtom, AtomicString(attribute.name), nullAtom);
     
  • trunk/Source/WebCore/html/parser/HTMLDocumentParser.cpp

    r178173 r178265  
    @@ -40,26 +40,4 @@
     using namespace HTMLNames;
     
    -// This is a direct transcription of step 4 from:
    -// https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
    -static HTMLTokenizer::State tokenizerStateForContextElement(Element& contextElement, bool reportErrors, const HTMLParserOptions& options)
    -{
    -    const QualifiedName& contextTag = contextElement.tagQName();
    -
    -    if (contextTag.matches(titleTag) || contextTag.matches(textareaTag))
    -        return HTMLTokenizer::RCDATAState;
    -    if (contextTag.matches(styleTag)
    -        || contextTag.matches(xmpTag)
    -        || contextTag.matches(iframeTag)
    -        || (contextTag.matches(noembedTag) && options.pluginsEnabled)
    -        || (contextTag.matches(noscriptTag) && options.scriptEnabled)
    -        || contextTag.matches(noframesTag))
    -        return reportErrors ? HTMLTokenizer::RAWTEXTState : HTMLTokenizer::PLAINTEXTState;
    -    if (contextTag.matches(scriptTag))
    -        return reportErrors ? HTMLTokenizer::ScriptDataState : HTMLTokenizer::PLAINTEXTState;
    -    if (contextTag.matches(plaintextTag))
    -        return HTMLTokenizer::PLAINTEXTState;
    -    return HTMLTokenizer::DataState;
    -}
    -
     HTMLDocumentParser::HTMLDocumentParser(HTMLDocument& document)
         : ScriptableDocumentParser(document)
    @@ -86,6 +64,7 @@
         , m_xssAuditorDelegate(fragment.document())
     {
    -    bool reportErrors = false; // For now document fragment parsing never reports errors.
    -    m_tokenizer.setState(tokenizerStateForContextElement(contextElement, reportErrors, m_options));
    +    // https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
    +    if (contextElement.isHTMLElement())
    +        m_tokenizer.updateStateFor(contextElement.tagQName().localName());
         m_xssAuditor.initForFragment();
     }
    @@ -280,20 +259,20 @@
         while (canTakeNextToken(mode, session) && !session.needsYield) {
             if (!isParsingFragment())
    -            m_sourceTracker.start(m_input.current(), &m_tokenizer, m_token);
    -
    -        if (!m_tokenizer.nextToken(m_input.current(), m_token))
    +            m_sourceTracker.startToken(m_input.current(), m_tokenizer);
    +
    +        auto token = m_tokenizer.nextToken(m_input.current());
    +        if (!token)
                 break;
     
             if (!isParsingFragment()) {
    -            m_sourceTracker.end(m_input.current(), &m_tokenizer, m_token);
    +            m_sourceTracker.endToken(m_input.current(), m_tokenizer);
     
                 // We do not XSS filter innerHTML, which means we (intentionally) fail
                 // http/tests/security/xssAuditor/dom-write-innerHTML.html
    -            if (auto xssInfo = m_xssAuditor.filterToken(FilterTokenRequest(m_token, m_sourceTracker, m_tokenizer.shouldAllowCDATA())))
    +            if (auto xssInfo = m_xssAuditor.filterToken(FilterTokenRequest(*token, m_sourceTracker, m_tokenizer.shouldAllowCDATA())))
                     m_xssAuditorDelegate.didBlockScript(*xssInfo);
             }
     
    -        constructTreeFromHTMLToken(m_token);
    -        ASSERT(m_token.type() == HTMLToken::Uninitialized);
    +        constructTreeFromHTMLToken(token);
         }
     
    @@ -309,10 +288,10 @@
     
         if (isWaitingForScripts()) {
    -        ASSERT(m_tokenizer.state() == HTMLTokenizer::DataState);
    +        ASSERT(m_tokenizer.isInDataState());
             if (!m_preloadScanner) {
                 m_preloadScanner = std::make_unique<HTMLPreloadScanner>(m_options, document()->url(), document()->deviceScaleFactor());
                 m_preloadScanner->appendToEnd(m_input.current());
             }
    -        m_preloadScanner->scan(m_preloader.get(), *document());
    +        m_preloadScanner->scan(*m_preloader, *document());
         }
     
    @@ -320,7 +299,7 @@
     }
     
    -void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLToken& rawToken)
    -{
    -    AtomicHTMLToken token(rawToken);
    +void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLTokenizer::TokenPtr& rawToken)
    +{
    +    AtomicHTMLToken token(*rawToken);
     
         // We clear the rawToken in case constructTreeFromAtomicToken
    @@ -334,13 +313,11 @@
         // the main thread or once we stop allowing synchronous JavaScript
         // execution from parseAttribute.
    -    if (rawToken.type() != HTMLToken::Character)
    +    if (rawToken->type() != HTMLToken::Character) {
    +        // Clearing the TokenPtr makes sure we don't clear the HTMLToken a second time
    +        // later when the TokenPtr is destroyed.
             rawToken.clear();
    +    }
     
         m_treeBuilder->constructTree(token);
    -
    -    if (rawToken.type() != HTMLToken::Uninitialized) {
    -        ASSERT(rawToken.type() == HTMLToken::Character);
    -        rawToken.clear();
    -    }
     }
     
     
    374351            m_insertionPreloadScanner = std::make_unique<HTMLPreloadScanner>(m_options, document()->url(), document()->deviceScaleFactor());
    375352        m_insertionPreloadScanner->appendToEnd(source);
    376         m_insertionPreloadScanner->scan(m_preloader.get(), *document());
     353        m_insertionPreloadScanner->scan(*m_preloader, *document());
    377354    }
    378355
     
    399376            m_preloadScanner->appendToEnd(source);
    400377            if (isWaitingForScripts())
    401                 m_preloadScanner->scan(m_preloader.get(), *document());
     378                m_preloadScanner->scan(*m_preloader, *document());
    402379        }
    403380    }
     
    534511    ASSERT(m_preloadScanner);
    535512    m_preloadScanner->appendToEnd(m_input.current());
    536     m_preloadScanner->scan(m_preloader.get(), *document());
     513    m_preloadScanner->scan(*m_preloader, *document());
    537514}
    538515
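
Note on the TokenPtr change above: the comment added in constructTreeFromHTMLToken says that clearing the TokenPtr early prevents the HTMLToken from being cleared a second time when the TokenPtr is destroyed. The following is a minimal sketch of that ownership idea, not WebKit's actual implementation. `Token`, `TokenHandle`, `consume`, and the `clearCount` counter are hypothetical names invented for illustration; the real class is HTMLTokenizer::TokenPtr and its details may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for HTMLToken: a type tag plus some data.
struct Token {
    enum Type { Uninitialized, Character, StartTag };
    Type type = Uninitialized;
    std::string data;
    int clearCount = 0; // Instrumentation for this sketch only.

    void clear()
    {
        type = Uninitialized;
        data.clear();
        ++clearCount;
    }
};

// Hypothetical handle in the spirit of TokenPtr: it resets the token when it
// goes out of scope, unless clear() has already been called explicitly.
class TokenHandle {
public:
    TokenHandle() = default;
    explicit TokenHandle(Token* token) : m_token(token) { }
    TokenHandle(const TokenHandle&) = delete;
    TokenHandle& operator=(const TokenHandle&) = delete;
    ~TokenHandle() { clear(); }

    explicit operator bool() const { return m_token != nullptr; }
    Token& operator*() const { return *m_token; }
    Token* operator->() const { return m_token; }

    void clear()
    {
        if (!m_token)
            return; // Already cleared; the destructor then does nothing.
        m_token->clear();
        m_token = nullptr;
    }

private:
    Token* m_token = nullptr;
};

// Mirrors the shape of constructTreeFromHTMLToken: non-character tokens are
// cleared early; character tokens are left for the handle's destructor.
void consume(TokenHandle& handle)
{
    if (handle->type != Token::Character)
        handle.clear();
}
```

In this sketch the token is reset exactly once on either path, which is the property the removed `ASSERT(m_token.type() == HTMLToken::Uninitialized)` used to check by hand.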
  • trunk/Source/WebCore/html/parser/HTMLDocumentParser.h

    r178173 r178265  
    104104    void pumpTokenizer(SynchronousMode);
    105105    void pumpTokenizerIfPossible(SynchronousMode);
    106     void constructTreeFromHTMLToken(HTMLToken&);
     106    void constructTreeFromHTMLToken(HTMLTokenizer::TokenPtr&);
    107107
    108108    void runScriptsForPausedTreeBuilder();
     
    122122    HTMLInputStream m_input;
    123123
    124     HTMLToken m_token;
    125124    HTMLTokenizer m_tokenizer;
    126125    std::unique_ptr<HTMLScriptRunner> m_scriptRunner;
  • trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp

    r178173 r178265  
    6161    }
    6262
    63     inline static bool acceptMalformed() { return true; }
     63    static bool acceptMalformed() { return true; }
    6464
    65     inline static bool consumeNamedEntity(SegmentedString& source, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter, UChar& cc)
     65    static bool consumeNamedEntity(SegmentedString& source, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter, UChar& cc)
    6666    {
    6767        StringBuilder consumedCharacters;
     
    7373                break;
    7474            consumedCharacters.append(cc);
    75             source.advanceAndASSERT(cc);
     75            source.advance();
    7676        }
    7777        notEnoughCharacters = source.isEmpty();
     
    9898                ASSERT_UNUSED(reference, cc == *reference++);
    9999                consumedCharacters.append(cc);
    100                 source.advanceAndASSERT(cc);
     100                source.advance();
    101101                ASSERT(!source.isEmpty());
    102102            }
  • trunk/Source/WebCore/html/parser/HTMLInputStream.h

    r178173 r178265  
    2929#include "InputStreamPreprocessor.h"
    3030#include "SegmentedString.h"
     31#include <wtf/text/TextPosition.h>
    3132
    3233namespace WebCore {
  • trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp

    r178173 r178265  
    11/*
    22 * Copyright (C) 2010 Google Inc. All Rights Reserved.
     3 * Copyright (C) 2015 Apple Inc. All Rights Reserved.
    34 *
    45 * Redistribution and use in source and binary forms, with or without
     
    2930#include "HTMLNames.h"
    3031#include "HTMLParserIdioms.h"
    31 #include "HTMLTokenizer.h"
    32 #include "TextCodec.h"
    3332#include "TextEncodingRegistry.h"
    34 
    35 using namespace WTF;
    3633
    3734namespace WebCore {
     
    4037
    4138HTMLMetaCharsetParser::HTMLMetaCharsetParser()
    42     : m_tokenizer(std::make_unique<HTMLTokenizer>(HTMLParserOptions()))
    43     , m_assumedCodec(newTextCodec(Latin1Encoding()))
    44     , m_inHeadSection(true)
    45     , m_doneChecking(false)
     39    : m_codec(newTextCodec(Latin1Encoding()))
    4640{
    4741}
    4842
    49 HTMLMetaCharsetParser::~HTMLMetaCharsetParser()
     43static StringView extractCharset(const String& value)
    5044{
    51 }
    52 
    53 static const char charsetString[] = "charset";
    54 static const size_t charsetLength = sizeof("charset") - 1;
    55 
    56 String HTMLMetaCharsetParser::extractCharset(const String& value)
    57 {
    58     size_t pos = 0;
    5945    unsigned length = value.length();
    60 
    61     while (pos < length) {
    62         pos = value.find(charsetString, pos, false);
     46    for (size_t pos = 0; pos < length; ) {
     47        pos = value.find("charset", pos, false);
    6348        if (pos == notFound)
    6449            break;
    6550
     51        static const size_t charsetLength = sizeof("charset") - 1;
    6652        pos += charsetLength;
    6753
     
    7864            ++pos;
    7965
    80         char quoteMark = 0;
    81         if (pos < length && (value[pos] == '"' || value[pos] == '\'')) {
    82             quoteMark = static_cast<char>(value[pos++]);
    83             ASSERT(!(quoteMark & 0x80));
    84         }
    85            
     66        UChar quoteMark = 0;
     67        if (pos < length && (value[pos] == '"' || value[pos] == '\''))
     68            quoteMark = value[pos++];
     69
    8670        if (pos == length)
    8771            break;
     
    9478            break; // Close quote not found.
    9579
    96         return value.substring(pos, end - pos);
     80        return StringView(value).substring(pos, end - pos);
    9781    }
    98 
    99     return "";
     82    return StringView();
    10083}
    10184
    102 bool HTMLMetaCharsetParser::processMeta()
     85bool HTMLMetaCharsetParser::processMeta(HTMLToken& token)
    10386{
    104     const HTMLToken::AttributeList& tokenAttributes = m_token.attributes();
    10587    AttributeList attributes;
    106     for (HTMLToken::AttributeList::const_iterator iter = tokenAttributes.begin(); iter != tokenAttributes.end(); ++iter) {
    107         String attributeName = StringImpl::create8BitIfPossible(iter->name);
    108         String attributeValue = StringImpl::create8BitIfPossible(iter->value);
     88    for (auto& attribute : token.attributes()) {
     89        String attributeName = StringImpl::create8BitIfPossible(attribute.name);
     90        String attributeValue = StringImpl::create8BitIfPossible(attribute.value);
    10991        attributes.append(std::make_pair(attributeName, attributeValue));
    11092    }
     
    11799{
    118100    bool gotPragma = false;
    119     Mode mode = None;
    120     String charset;
     101    enum { None, Charset, Pragma } mode = None;
     102    StringView charset;
    121103
    122     for (AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter) {
    123         const AtomicString& attributeName = iter->first;
    124         const String& attributeValue = iter->second;
     104    for (auto& attribute : attributes) {
     105        const String& attributeName = attribute.first;
     106        const String& attributeValue = attribute.second;
    125107
    126108        if (attributeName == http_equivAttr) {
     
    140122
    141123    if (mode == Charset || (mode == Pragma && gotPragma))
    142         return TextEncoding(stripLeadingAndTrailingHTMLSpaces(charset));
     124        return TextEncoding(stripLeadingAndTrailingHTMLSpaces(charset.toStringWithoutCopying()));
    143125
    144126    return TextEncoding();
    145127}
    146 
    147 static const int bytesToCheckUnconditionally = 1024; // That many input bytes will be checked for meta charset even if <head> section is over.
    148128
    149129bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
     
    157137    // The following tags are allowed in <head>:
    158138    // SCRIPT|STYLE|META|LINK|OBJECT|TITLE|BASE
    159 
     139    //
    160140    // We stop scanning when a tag that is not permitted in <head>
    161141    // is seen, rather when </head> is seen, because that more closely
    162142    // matches behavior in other browsers; more details in
    163143    // <http://bugs.webkit.org/show_bug.cgi?id=3590>.
    164 
     144    //
    165145    // Additionally, we ignore things that looks like tags in <title>, <script>
    166146    // and <noscript>; see <http://bugs.webkit.org/show_bug.cgi?id=4560>,
    167147    // <http://bugs.webkit.org/show_bug.cgi?id=12165> and
    168148    // <http://bugs.webkit.org/show_bug.cgi?id=12389>.
    169 
     149    //
    170150    // Since many sites have charset declarations after <body> or other tags
    171151    // that are disallowed in <head>, we don't bail out until we've checked at
    172152    // least bytesToCheckUnconditionally bytes of input.
    173153
    174     m_input.append(SegmentedString(m_assumedCodec->decode(data, length)));
     154    static const int bytesToCheckUnconditionally = 1024;
    175155
    176     while (m_tokenizer->nextToken(m_input, m_token)) {
    177         bool end = m_token.type() == HTMLToken::EndTag;
    178         if (end || m_token.type() == HTMLToken::StartTag) {
    179             AtomicString tagName(m_token.name());
    180             if (!end) {
    181                 m_tokenizer->updateStateFor(tagName);
    182                 if (tagName == metaTag && processMeta()) {
     156    m_input.append(SegmentedString(m_codec->decode(data, length)));
     157
     158    while (auto token = m_tokenizer.nextToken(m_input)) {
     159        bool isEnd = token->type() == HTMLToken::EndTag;
     160        if (isEnd || token->type() == HTMLToken::StartTag) {
     161            AtomicString tagName(token->name());
     162            if (!isEnd) {
     163                m_tokenizer.updateStateFor(tagName);
     164                if (tagName == metaTag && processMeta(*token)) {
    183165                    m_doneChecking = true;
    184166                    return true;
     
    190172                && tagName != metaTag && tagName != objectTag
    191173                && tagName != titleTag && tagName != baseTag
    192                 && (end || tagName != htmlTag) && (end || tagName != headTag)) {
     174                && (isEnd || tagName != htmlTag)
     175                && (isEnd || tagName != headTag)) {
    193176                m_inHeadSection = false;
    194177            }
     
    199182            return true;
    200183        }
    201 
    202         m_token.clear();
    203184    }
    204185
  • trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.h

    r178173 r178265  
    2727#define HTMLMetaCharsetParser_h
    2828
    29 #include "HTMLToken.h"
     29#include "HTMLTokenizer.h"
    3030#include "SegmentedString.h"
    3131#include "TextEncoding.h"
    32 #include <wtf/Noncopyable.h>
    3332
    3433namespace WebCore {
    3534
    36 class HTMLTokenizer;
    3735class TextCodec;
    3836
     
    4139public:
    4240    HTMLMetaCharsetParser();
    43     ~HTMLMetaCharsetParser();
    4441
    4542    // Returns true if done checking, regardless whether an encoding is found.
     
    4845    const TextEncoding& encoding() { return m_encoding; }
    4946
     47    // The returned encoding might not be valid.
    5048    typedef Vector<std::pair<String, String>> AttributeList;
    51     // The returned encoding might not be valid.
    52     static TextEncoding encodingFromMetaAttributes(const AttributeList&
    53 );
     49    static TextEncoding encodingFromMetaAttributes(const AttributeList&);
    5450
    5551private:
    56     bool processMeta();
    57     static String extractCharset(const String&);
     52    bool processMeta(HTMLToken&);
    5853
    59     enum Mode {
    60         None,
    61         Charset,
    62         Pragma,
    63     };
    64 
    65     std::unique_ptr<HTMLTokenizer> m_tokenizer;
    66     std::unique_ptr<TextCodec> m_assumedCodec;
     54    HTMLTokenizer m_tokenizer;
     55    const std::unique_ptr<TextCodec> m_codec;
    6756    SegmentedString m_input;
    68     HTMLToken m_token;
    69     bool m_inHeadSection;
    70 
    71     bool m_doneChecking;
     57    bool m_inHeadSection { true };
     58    bool m_doneChecking { false };
    7259    TextEncoding m_encoding;
    7360};
  • trunk/Source/WebCore/html/parser/HTMLPreloadScanner.cpp

    r178173 r178265  
    243243TokenPreloadScanner::TokenPreloadScanner(const URL& documentURL, float deviceScaleFactor)
    244244    : m_documentURL(documentURL)
    245     , m_inStyle(false)
    246245    , m_deviceScaleFactor(deviceScaleFactor)
    247 #if ENABLE(TEMPLATE_ELEMENT)
    248     , m_templateCount(0)
    249 #endif
    250 {
    251 }
    252 
    253 TokenPreloadScanner::~TokenPreloadScanner()
    254 {
    255 }
    256 
    257 TokenPreloadScannerCheckpoint TokenPreloadScanner::createCheckpoint()
    258 {
    259     TokenPreloadScannerCheckpoint checkpoint = m_checkpoints.size();
    260     m_checkpoints.append(Checkpoint(m_predictedBaseElementURL, m_inStyle
    261 #if ENABLE(TEMPLATE_ELEMENT)
    262                                     , m_templateCount
    263 #endif
    264                                     ));
    265     return checkpoint;
    266 }
    267 
    268 void TokenPreloadScanner::rewindTo(TokenPreloadScannerCheckpoint checkpointIndex)
    269 {
    270     ASSERT(checkpointIndex < m_checkpoints.size()); // If this ASSERT fires, checkpointIndex is invalid.
    271     const Checkpoint& checkpoint = m_checkpoints[checkpointIndex];
    272     m_predictedBaseElementURL = checkpoint.predictedBaseElementURL;
    273     m_inStyle = checkpoint.inStyle;
    274 #if ENABLE(TEMPLATE_ELEMENT)
    275     m_templateCount = checkpoint.templateCount;
    276 #endif
    277     m_cssScanner.reset();
    278     m_checkpoints.clear();
     246{
    279247}
    280248
     
    350318HTMLPreloadScanner::HTMLPreloadScanner(const HTMLParserOptions& options, const URL& documentURL, float deviceScaleFactor)
    351319    : m_scanner(documentURL, deviceScaleFactor)
    352     , m_tokenizer(std::make_unique<HTMLTokenizer>(options))
    353 {
    354 }
    355 
    356 HTMLPreloadScanner::~HTMLPreloadScanner()
     320    , m_tokenizer(options)
    357321{
    358322}
     
    363327}
    364328
    365 void HTMLPreloadScanner::scan(HTMLResourcePreloader* preloader, Document& document)
     329void HTMLPreloadScanner::scan(HTMLResourcePreloader& preloader, Document& document)
    366330{
    367331    ASSERT(isMainThread()); // HTMLTokenizer::updateStateFor only works on the main thread.
     
    375339    PreloadRequestStream requests;
    376340
    377     while (m_tokenizer->nextToken(m_source, m_token)) {
    378         if (m_token.type() == HTMLToken::StartTag)
    379             m_tokenizer->updateStateFor(AtomicString(m_token.name()));
    380         m_scanner.scan(m_token, requests, document);
    381         m_token.clear();
    382     }
    383 
    384     preloader->preload(WTF::move(requests));
    385 }
    386 
    387 }
     341    while (auto token = m_tokenizer.nextToken(m_source)) {
     342        if (token->type() == HTMLToken::StartTag)
     343            m_tokenizer.updateStateFor(AtomicString(token->name()));
     344        m_scanner.scan(*token, requests, document);
     345    }
     346
     347    preloader.preload(WTF::move(requests));
     348}
     349
     350}
  • trunk/Source/WebCore/html/parser/HTMLPreloadScanner.h

    r178173 r178265  
    2929
    3030#include "CSSPreloadScanner.h"
    31 #include "HTMLToken.h"
     31#include "HTMLTokenizer.h"
    3232#include "SegmentedString.h"
    33 #include <wtf/Vector.h>
    3433
    3534namespace WebCore {
    3635
    37 typedef size_t TokenPreloadScannerCheckpoint;
    38 
    39 class HTMLParserOptions;
    40 class HTMLTokenizer;
    41 class SegmentedString;
    42 class Frame;
    43 
    4436class TokenPreloadScanner {
    45     WTF_MAKE_NONCOPYABLE(TokenPreloadScanner); WTF_MAKE_FAST_ALLOCATED;
     37    WTF_MAKE_NONCOPYABLE(TokenPreloadScanner);
    4638public:
    4739    explicit TokenPreloadScanner(const URL& documentURL, float deviceScaleFactor = 1.0);
    48     ~TokenPreloadScanner();
    4940
    50     void scan(const HTMLToken&, PreloadRequestStream& requests, Document&);
     41    void scan(const HTMLToken&, PreloadRequestStream&, Document&);
    5142
    5243    void setPredictedBaseElementURL(const URL& url) { m_predictedBaseElementURL = url; }
    53 
    54     // A TokenPreloadScannerCheckpoint is valid until the next call to rewindTo,
    55     // at which point all outstanding checkpoints are invalidated.
    56     TokenPreloadScannerCheckpoint createCheckpoint();
    57     void rewindTo(TokenPreloadScannerCheckpoint);
    58 
    59     bool isSafeToSendToAnotherThread()
    60     {
    61         return m_documentURL.isSafeToSendToAnotherThread()
    62             && m_predictedBaseElementURL.isSafeToSendToAnotherThread();
    63     }
    6444
    6545private:
     
    8666    void updatePredictedBaseURL(const HTMLToken&);
    8767
    88     struct Checkpoint {
    89         Checkpoint(const URL& predictedBaseElementURL, bool inStyle
    90 #if ENABLE(TEMPLATE_ELEMENT)
    91             , size_t templateCount
    92 #endif
    93             )
    94             : predictedBaseElementURL(predictedBaseElementURL)
    95             , inStyle(inStyle)
    96 #if ENABLE(TEMPLATE_ELEMENT)
    97             , templateCount(templateCount)
    98 #endif
    99         {
    100         }
    101 
    102         URL predictedBaseElementURL;
    103         bool inStyle;
    104 #if ENABLE(TEMPLATE_ELEMENT)
    105         size_t templateCount;
    106 #endif
    107     };
    108 
    10968    CSSPreloadScanner m_cssScanner;
    11069    const URL m_documentURL;
     70    const float m_deviceScaleFactor { 1 };
     71
    11172    URL m_predictedBaseElementURL;
    112     bool m_inStyle;
    113     float m_deviceScaleFactor;
    114 
     73    bool m_inStyle { false };
    11574#if ENABLE(TEMPLATE_ELEMENT)
    116     size_t m_templateCount;
     75    unsigned m_templateCount { 0 };
    11776#endif
    118 
    119     Vector<Checkpoint> m_checkpoints;
    12077};
    12178
    12279class HTMLPreloadScanner {
    123     WTF_MAKE_NONCOPYABLE(HTMLPreloadScanner); WTF_MAKE_FAST_ALLOCATED;
     80    WTF_MAKE_FAST_ALLOCATED;
    12481public:
    12582    HTMLPreloadScanner(const HTMLParserOptions&, const URL& documentURL, float deviceScaleFactor = 1.0);
    126     ~HTMLPreloadScanner();
    12783
    12884    void appendToEnd(const SegmentedString&);
    129     void scan(HTMLResourcePreloader*, Document&);
     85    void scan(HTMLResourcePreloader&, Document&);
    13086
    13187private:
    13288    TokenPreloadScanner m_scanner;
    13389    SegmentedString m_source;
    134     HTMLToken m_token;
    135     std::unique_ptr<HTMLTokenizer> m_tokenizer;
     90    HTMLTokenizer m_tokenizer;
    13691};
    13792
  • trunk/Source/WebCore/html/parser/HTMLResourcePreloader.cpp

    r178173 r178265  
    3535
    3636namespace WebCore {
    37 
    38 bool PreloadRequest::isSafeToSendToAnotherThread() const
    39 {
    40     return m_initiator.isSafeToSendToAnotherThread()
    41         && m_charset.isSafeToSendToAnotherThread()
    42         && m_resourceURL.isSafeToSendToAnotherThread()
    43         && m_mediaAttribute.isSafeToSendToAnotherThread()
    44         && m_baseURL.isSafeToSendToAnotherThread();
    45 }
    4637
    4738URL PreloadRequest::completeURL(Document& document)
  • trunk/Source/WebCore/html/parser/HTMLResourcePreloader.h

    r178173 r178265  
    3636    PreloadRequest(const String& initiator, const String& resourceURL, const URL& baseURL, CachedResource::Type resourceType, const String& mediaAttribute)
    3737        : m_initiator(initiator)
    38         , m_resourceURL(resourceURL.isolatedCopy())
     38        , m_resourceURL(resourceURL)
    3939        , m_baseURL(baseURL.copy())
    4040        , m_resourceType(resourceType)
    41         , m_mediaAttribute(mediaAttribute.isolatedCopy())
     41        , m_mediaAttribute(mediaAttribute)
    4242        , m_crossOriginModeAllowsCookies(false)
    4343    {
    4444    }
    45 
    46     bool isSafeToSendToAnotherThread() const;
    4745
    4846    CachedResourceRequest resourceRequest(Document&);
  • trunk/Source/WebCore/html/parser/HTMLSourceTracker.cpp

    r178173 r178265  
    11/*
    22 * Copyright (C) 2010 Adam Barth. All Rights Reserved.
     3 * Copyright (C) 2015 Apple Inc. All rights reserved.
    34 *
    45 * Redistribution and use in source and binary forms, with or without
     
    2627#include "config.h"
    2728#include "HTMLSourceTracker.h"
     29
    2830#include "HTMLTokenizer.h"
    2931#include <wtf/text/StringBuilder.h>
     
    3537}
    3638
    37 void HTMLSourceTracker::start(SegmentedString& currentInput, HTMLTokenizer* tokenizer, HTMLToken& token)
     39void HTMLSourceTracker::startToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
    3840{
    39     if (token.type() == HTMLToken::Uninitialized) {
    40         m_previousSource.clear();
    41         if (tokenizer->numberOfBufferedCharacters())
    42             m_previousSource = tokenizer->bufferedCharacters();
     41    if (!m_started) {
     42        if (tokenizer.numberOfBufferedCharacters())
     43            m_previousSource = tokenizer.bufferedCharacters();
     44        else
     45            m_previousSource.clear();
     46        m_started = true;
    4347    } else
    4448        m_previousSource.append(m_currentSource);
    4549
    4650    m_currentSource = currentInput;
    47     token.setBaseOffset(m_currentSource.numberOfCharactersConsumed() - m_previousSource.length());
     51    m_tokenStart = m_currentSource.numberOfCharactersConsumed() - m_previousSource.length();
    4852}
    4953
    50 void HTMLSourceTracker::end(SegmentedString& currentInput, HTMLTokenizer* tokenizer, HTMLToken& token)
     54void HTMLSourceTracker::endToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
    5155{
     56    ASSERT(m_started);
     57    m_started = false;
     58
     59    m_tokenEnd = currentInput.numberOfCharactersConsumed() - tokenizer.numberOfBufferedCharacters();
    5260    m_cachedSourceForToken = String();
    53 
    54     // FIXME: This work should really be done by the HTMLTokenizer.
    55     token.setEndOffset(currentInput.numberOfCharactersConsumed() - tokenizer->numberOfBufferedCharacters());
    5661}
    5762
    58 String HTMLSourceTracker::sourceForToken(const HTMLToken& token)
     63String HTMLSourceTracker::source(const HTMLToken& token)
    5964{
     65    ASSERT(!m_started);
     66
    6067    if (token.type() == HTMLToken::EndOfFile)
    6168        return String(); // Hides the null character we use to mark the end of file.
     
    6471        return m_cachedSourceForToken;
    6572
    66     unsigned length = token.length();
     73    unsigned length = m_tokenEnd - m_tokenStart;
    6774
    6875    StringBuilder source;
     
    8491}
    8592
     93String HTMLSourceTracker::source(const HTMLToken& token, unsigned attributeStart, unsigned attributeEnd)
     94{
     95    return source(token).substring(attributeStart - m_tokenStart, attributeEnd - attributeStart);
    8696}
     97
     98}
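
Note on the new source(token, attributeStart, attributeEnd) overload above: attribute offsets now appear to be stored as positions in the input stream rather than relative to the token, so the tracker rebases them against m_tokenStart before slicing. A minimal sketch of that arithmetic follows, using std::string in place of WTF::String. `attributeSource` and its parameter names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: attributeStart and attributeEnd are assumed to be
// absolute stream offsets, so they are rebased against the token's start
// offset before taking the substring of the token's source text.
std::string attributeSource(const std::string& tokenSource, unsigned tokenStart, unsigned attributeStart, unsigned attributeEnd)
{
    assert(tokenStart <= attributeStart && attributeStart <= attributeEnd);
    return tokenSource.substr(attributeStart - tokenStart, attributeEnd - attributeStart);
}
```

For a token whose source text begins at stream offset 10, an attribute spanning offsets 13 to 21 maps to the 8 characters starting at index 3 of the token's source.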
  • trunk/Source/WebCore/html/parser/HTMLSourceTracker.h

    r178173 r178265  
    11/*
    22 * Copyright (C) 2010 Adam Barth. All Rights Reserved.
     3 * Copyright (C) 2015 Apple Inc. All rights reserved.
    34 *
    45 * Redistribution and use in source and binary forms, with or without
     
    2728#define HTMLSourceTracker_h
    2829
    29 #include "HTMLToken.h"
    3030#include "SegmentedString.h"
    3131
    3232namespace WebCore {
    3333
     34class HTMLToken;
    3435class HTMLTokenizer;
    3536
     
    3940    HTMLSourceTracker();
    4041
    41     // FIXME: Once we move "end" into HTMLTokenizer, rename "start" to
    42     // something that makes it obvious that this method can be called multiple
    43     // times.
    44     void start(SegmentedString&, HTMLTokenizer*, HTMLToken&);
    45     void end(SegmentedString&, HTMLTokenizer*, HTMLToken&);
     42    void startToken(SegmentedString&, HTMLTokenizer&);
     43    void endToken(SegmentedString&, HTMLTokenizer&);
    4644
    47     String sourceForToken(const HTMLToken&);
     45    String source(const HTMLToken&);
     46    String source(const HTMLToken&, unsigned attributeStart, unsigned attributeEnd);
    4847
    4948private:
     49    bool m_started { false };
     50
     51    unsigned m_tokenStart;
     52    unsigned m_tokenEnd;
     53
    5054    SegmentedString m_previousSource;
    5155    SegmentedString m_currentSource;
  • trunk/Source/WebCore/html/parser/HTMLToken.h

    r178173 r178265  
    5454
    5555    struct Attribute {
    56         struct Range {
    57             unsigned start;
    58             unsigned end;
    59         };
    60 
    61         Range nameRange;
    62         Range valueRange;
    6356        Vector<UChar, 32> name;
    6457        Vector<UChar, 32> value;
     58
     59        // Used by HTMLSourceTracker.
     60        unsigned startOffset;
     61        unsigned endOffset;
    6562    };
    6663
     
    7471    Type type() const;
    7572
    76     // Used by HTMLSourceTracker.
    77     void setBaseOffset(unsigned); // Base for attribute offsets, and the end of token offset.
    78     void setEndOffset(unsigned);
    79     unsigned length() const;
    80 
    8173    // EndOfFile
    8274
     
    114106    void beginEndTag(const Vector<LChar, 32>&);
    115107
    116     void addNewAttribute();
    117 
    118     void beginAttributeName(unsigned offset);
     108    void beginAttribute(unsigned offset);
    119109    void appendToAttributeName(UChar);
    120     void endAttributeName(unsigned offset);
    121 
    122     void beginAttributeValue(unsigned offset);
    123110    void appendToAttributeValue(UChar);
    124     void endAttributeValue(unsigned offset);
     111    void endAttribute(unsigned offset);
    125112
    126113    void setSelfClosing();
     
    155142    Type m_type;
    156143
    157     unsigned m_baseOffset;
    158     unsigned m_length;
    159 
    160144    DataVector m_data;
    161145    UChar m_data8BitCheck;
     
    173157
    174158inline HTMLToken::HTMLToken()
    175 {
    176     clear();
     159    : m_type(Uninitialized)
     160    , m_data8BitCheck(0)
     161{
    177162}
    178163
     
    182167    m_data.clear();
    183168    m_data8BitCheck = 0;
    184 
    185     m_length = 0;
    186     m_baseOffset = 0;
    187169}
    188170
     
    196178    ASSERT(m_type == Uninitialized);
    197179    m_type = EndOfFile;
    198 }
    199 
    200 inline unsigned HTMLToken::length() const
    201 {
    202     return m_length;
    203 }
    204 
    205 inline void HTMLToken::setBaseOffset(unsigned offset)
    206 {
    207     m_baseOffset = offset;
    208 }
    209 
    210 inline void HTMLToken::setEndOffset(unsigned endOffset)
    211 {
    212     m_length = endOffset - m_baseOffset;
    213180}
    214181
     
    301268    m_type = StartTag;
    302269    m_selfClosing = false;
     270    m_attributes.clear();
     271
     272#if !ASSERT_DISABLED
    303273    m_currentAttribute = nullptr;
    304     m_attributes.clear();
     274#endif
    305275
    306276    m_data.append(character);
     
    313283    m_type = EndTag;
    314284    m_selfClosing = false;
     285    m_attributes.clear();
     286
     287#if !ASSERT_DISABLED
    315288    m_currentAttribute = nullptr;
    316     m_attributes.clear();
     289#endif
    317290
    318291    m_data.append(character);
     
    324297    m_type = EndTag;
    325298    m_selfClosing = false;
     299    m_attributes.clear();
     300
     301#if !ASSERT_DISABLED
    326302    m_currentAttribute = nullptr;
    327     m_attributes.clear();
     303#endif
    328304
    329305    m_data.appendVector(characters);
    330306}
    331307
    332 inline void HTMLToken::addNewAttribute()
    333 {
    334     ASSERT(m_type == StartTag || m_type == EndTag);
     308inline void HTMLToken::beginAttribute(unsigned offset)
     309{
     310    ASSERT(m_type == StartTag || m_type == EndTag);
     311    ASSERT(offset);
     312
    335313    m_attributes.grow(m_attributes.size() + 1);
    336314    m_currentAttribute = &m_attributes.last();
    337315
     316    m_currentAttribute->startOffset = offset;
     317}
     318
     319inline void HTMLToken::endAttribute(unsigned offset)
     320{
     321    ASSERT(offset);
     322    ASSERT(m_currentAttribute);
     323    m_currentAttribute->endOffset = offset;
    338324#if !ASSERT_DISABLED
    339     m_currentAttribute->nameRange.start = 0;
    340     m_currentAttribute->nameRange.end = 0;
    341     m_currentAttribute->valueRange.start = 0;
    342     m_currentAttribute->valueRange.end = 0;
     325    m_currentAttribute = nullptr;
    343326#endif
    344327}
    345328
    346 inline void HTMLToken::beginAttributeName(unsigned offset)
    347 {
    348     ASSERT(offset);
    349     ASSERT(!m_currentAttribute->nameRange.start);
    350     m_currentAttribute->nameRange.start = offset - m_baseOffset;
    351 }
    352 
    353 inline void HTMLToken::endAttributeName(unsigned offset)
    354 {
    355     ASSERT(offset);
    356     ASSERT(m_currentAttribute->nameRange.start);
    357     ASSERT(!m_currentAttribute->nameRange.end);
    358 
    359     unsigned adjustedOffset = offset - m_baseOffset;
    360     m_currentAttribute->nameRange.end = adjustedOffset;
    361 
    362     // FIXME: Is this intentional? Why point the value at the end of the name?
    363     m_currentAttribute->valueRange.start = adjustedOffset;
    364     m_currentAttribute->valueRange.end = adjustedOffset;
    365 }
    366 
    367 inline void HTMLToken::beginAttributeValue(unsigned offset)
    368 {
    369     ASSERT(offset);
    370     m_currentAttribute->valueRange.start = offset - m_baseOffset;
    371 }
    372 
    373 inline void HTMLToken::endAttributeValue(unsigned offset)
    374 {
    375     ASSERT(offset);
    376     m_currentAttribute->valueRange.end = offset - m_baseOffset;
    377 }
    378 
    379329inline void HTMLToken::appendToAttributeName(UChar character)
    380330{
    381331    ASSERT(character);
    382332    ASSERT(m_type == StartTag || m_type == EndTag);
    383     ASSERT(m_currentAttribute->nameRange.start);
     333    ASSERT(m_currentAttribute);
    384334    m_currentAttribute->name.append(character);
    385335}
     
    389339    ASSERT(character);
    390340    ASSERT(m_type == StartTag || m_type == EndTag);
    391     ASSERT(m_currentAttribute->valueRange.start);
     341    ASSERT(m_currentAttribute);
    392342    m_currentAttribute->value.append(character);
    393343}
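
Note on the HTMLToken.h hunks above: the four name/value range setters are replaced by a single beginAttribute/endAttribute pair, and the append functions now assert only that an attribute is currently open. The sketch below illustrates that bookkeeping pattern in isolation. It is not WebKit code: SketchToken, hasOpenAttribute, and the use of std::vector and std::string in place of WTF types are assumptions made for illustration, and the current-attribute pointer is cleared unconditionally here, whereas the changeset clears it only when assertions are enabled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the attribute bookkeeping in HTMLToken.
class SketchToken {
public:
    struct Attribute {
        unsigned startOffset = 0;
        unsigned endOffset = 0;
        std::string name;
        std::string value;
    };

    // Opens a new attribute and records where it starts in the input.
    void beginAttribute(unsigned offset)
    {
        assert(offset);
        m_attributes.emplace_back();
        // Take the pointer only after the vector has grown, so it is valid.
        m_currentAttribute = &m_attributes.back();
        m_currentAttribute->startOffset = offset;
    }

    void appendToAttributeName(char character)
    {
        assert(character);
        assert(m_currentAttribute);
        m_currentAttribute->name.push_back(character);
    }

    void appendToAttributeValue(char character)
    {
        assert(character);
        assert(m_currentAttribute);
        m_currentAttribute->value.push_back(character);
    }

    // Closes the open attribute and records where it ends in the input.
    void endAttribute(unsigned offset)
    {
        assert(offset);
        assert(m_currentAttribute);
        m_currentAttribute->endOffset = offset;
        // Clearing the pointer lets later appends assert if no attribute is open.
        m_currentAttribute = nullptr;
    }

    bool hasOpenAttribute() const { return m_currentAttribute; }
    const std::vector<Attribute>& attributes() const { return m_attributes; }

private:
    std::vector<Attribute> m_attributes;
    Attribute* m_currentAttribute = nullptr;
};
```

With this shape a caller brackets each attribute with one begin and one end call, and the per-character append functions need no range state of their own.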
  • trunk/Source/WebCore/html/parser/HTMLTokenizer.cpp

    r178173 r178265  
    11/*
    2  * Copyright (C) 2008 Apple Inc. All Rights Reserved.
     2 * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
    33 * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/
    44 * Copyright (C) 2010 Google, Inc. All Rights Reserved.
     
    3030
    3131#include "HTMLEntityParser.h"
    32 #include "HTMLTreeBuilder.h"
     32#include "HTMLNames.h"
    3333#include "MarkupTokenizerInlines.h"
    34 #include "NotImplemented.h"
    3534#include <wtf/ASCIICType.h>
    36 #include <wtf/CurrentTime.h>
    37 #include <wtf/text/CString.h>
    3835
    3936using namespace WTF;
     
    4340using namespace HTMLNames;
    4441
    45 static inline UChar toLowerCase(UChar cc)
    46 {
    47     ASSERT(isASCIIUpper(cc));
    48     const int lowerCaseOffset = 0x20;
    49     return cc + lowerCaseOffset;
    50 }
    51 
    52 static inline bool vectorEqualsString(const Vector<LChar, 32>& vector, const String& string)
    53 {
    54     if (vector.size() != string.length())
    55         return false;
    56 
    57     if (!string.length())
    58         return true;
    59 
    60     return equal(string.impl(), vector.data(), vector.size());
    61 }
    62 
    63 static inline bool isEndTagBufferingState(HTMLTokenizer::State state)
    64 {
    65     switch (state) {
    66     case HTMLTokenizer::RCDATAEndTagOpenState:
    67     case HTMLTokenizer::RCDATAEndTagNameState:
    68     case HTMLTokenizer::RAWTEXTEndTagOpenState:
    69     case HTMLTokenizer::RAWTEXTEndTagNameState:
    70     case HTMLTokenizer::ScriptDataEndTagOpenState:
    71     case HTMLTokenizer::ScriptDataEndTagNameState:
    72     case HTMLTokenizer::ScriptDataEscapedEndTagOpenState:
    73     case HTMLTokenizer::ScriptDataEscapedEndTagNameState:
     42static inline LChar convertASCIIAlphaToLower(UChar character)
     43{
     44    ASSERT(isASCIIAlpha(character));
     45    return toASCIILowerUnchecked(character);
     46}
     47
     48static inline bool vectorEqualsString(const Vector<LChar, 32>& vector, const char* string)
     49{
     50    unsigned size = vector.size();
     51    for (unsigned i = 0; i < size; ++i) {
     52        if (!string[i] || vector[i] != string[i])
     53            return false;
     54    }
     55    return !string[size];
     56}
     57
     58inline bool HTMLTokenizer::inEndTagBufferingState() const
     59{
     60    switch (m_state) {
     61    case RCDATAEndTagOpenState:
     62    case RCDATAEndTagNameState:
     63    case RAWTEXTEndTagOpenState:
     64    case RAWTEXTEndTagNameState:
     65    case ScriptDataEndTagOpenState:
     66    case ScriptDataEndTagNameState:
     67    case ScriptDataEscapedEndTagOpenState:
     68    case ScriptDataEscapedEndTagNameState:
    7469        return true;
    7570    default:
     
    7873}
    7974
    80 #define HTML_BEGIN_STATE(stateName) BEGIN_STATE(HTMLTokenizer, stateName)
    81 #define HTML_RECONSUME_IN(stateName) RECONSUME_IN(HTMLTokenizer, stateName)
    82 #define HTML_ADVANCE_TO(stateName) ADVANCE_TO(HTMLTokenizer, stateName)
    83 #define HTML_SWITCH_TO(stateName) SWITCH_TO(HTMLTokenizer, stateName)
    84 
    8575HTMLTokenizer::HTMLTokenizer(const HTMLParserOptions& options)
    86     : m_inputStreamPreprocessor(this)
     76    : m_preprocessor(*this)
    8777    , m_options(options)
    8878{
    89     reset();
    90 }
    91 
    92 HTMLTokenizer::~HTMLTokenizer()
    93 {
    94 }
    95 
    96 void HTMLTokenizer::reset()
    97 {
    98     m_state = HTMLTokenizer::DataState;
    99     m_token = 0;
    100     m_forceNullCharacterReplacement = false;
    101     m_shouldAllowCDATA = false;
    102     m_additionalAllowedCharacter = '\0';
     79}
     80
     81inline void HTMLTokenizer::bufferASCIICharacter(UChar character)
     82{
     83    ASSERT(character != kEndOfFileMarker);
     84    ASSERT(isASCII(character));
     85    LChar narrowedCharacter = character;
     86    m_token.appendToCharacter(narrowedCharacter);
     87}
     88
     89inline void HTMLTokenizer::bufferCharacter(UChar character)
     90{
     91    ASSERT(character != kEndOfFileMarker);
     92    m_token.appendToCharacter(character);
     93}
     94
     95inline bool HTMLTokenizer::emitAndResumeInDataState(SegmentedString& source)
     96{
     97    saveEndTagNameIfNeeded();
     98    m_state = DataState;
     99    source.advanceAndUpdateLineNumber();
     100    return true;
     101}
     102
     103inline bool HTMLTokenizer::emitAndReconsumeInDataState()
     104{
     105    saveEndTagNameIfNeeded();
     106    m_state = DataState;
     107    return true;
     108}
     109
     110inline bool HTMLTokenizer::emitEndOfFile(SegmentedString& source)
     111{
     112    m_state = DataState;
     113    if (haveBufferedCharacterToken())
     114        return true;
     115    source.advance();
     116    m_token.clear();
     117    m_token.makeEndOfFile();
     118    return true;
     119}
     120
     121inline void HTMLTokenizer::saveEndTagNameIfNeeded()
     122{
     123    ASSERT(m_token.type() != HTMLToken::Uninitialized);
     124    if (m_token.type() == HTMLToken::StartTag)
     125        m_appropriateEndTagName = m_token.name();
     126}
     127
     128inline bool HTMLTokenizer::haveBufferedCharacterToken() const
     129{
     130    return m_token.type() == HTMLToken::Character;
    103131}
    104132
     
    120148}
    121149
    122 bool HTMLTokenizer::flushBufferedEndTag(SegmentedString& source)
    123 {
    124     ASSERT(m_token->type() == HTMLToken::Character || m_token->type() == HTMLToken::Uninitialized);
    125     source.advanceAndUpdateLineNumber();
    126     if (m_token->type() == HTMLToken::Character)
    127         return true;
    128     m_token->beginEndTag(m_bufferedEndTagName);
     150void HTMLTokenizer::flushBufferedEndTag()
     151{
     152    m_token.beginEndTag(m_bufferedEndTagName);
    129153    m_bufferedEndTagName.clear();
    130154    m_appropriateEndTagName.clear();
    131155    m_temporaryBuffer.clear();
     156}
     157
     158bool HTMLTokenizer::commitToPartialEndTag(SegmentedString& source, UChar character, State state)
     159{
     160    ASSERT(source.currentChar() == character);
     161    appendToTemporaryBuffer(character);
     162    source.advanceAndUpdateLineNumber();
     163
     164    if (haveBufferedCharacterToken()) {
     165        // Emit the buffered character token.
     166        // The next call to processToken will flush the buffered end tag and continue parsing it.
     167        m_state = state;
     168        return true;
     169    }
     170
     171    flushBufferedEndTag();
    132172    return false;
    133173}
    134174
    135 #define FLUSH_AND_ADVANCE_TO(stateName)                                    \
    136     do {                                                                   \
    137         m_state = HTMLTokenizer::stateName;                           \
    138         if (flushBufferedEndTag(source))                                   \
    139             return true;                                                   \
    140         if (source.isEmpty()                                               \
    141             || !m_inputStreamPreprocessor.peek(source))                    \
    142             return haveBufferedCharacterToken();                           \
    143         cc = m_inputStreamPreprocessor.nextInputCharacter();               \
    144         goto stateName;                                                    \
    145     } while (false)
    146 
    147 bool HTMLTokenizer::flushEmitAndResumeIn(SegmentedString& source, HTMLTokenizer::State state)
    148 {
    149     m_state = state;
    150     flushBufferedEndTag(source);
     175bool HTMLTokenizer::commitToCompleteEndTag(SegmentedString& source)
     176{
     177    ASSERT(source.currentChar() == '>');
     178    appendToTemporaryBuffer('>');
     179    source.advance();
     180
     181    m_state = DataState;
     182
     183    if (haveBufferedCharacterToken()) {
     184        // Emit the character token we already have.
     185        // The next call to processToken will flush the buffered end tag and emit it.
     186        return true;
     187    }
     188
     189    flushBufferedEndTag();
    151190    return true;
    152191}
    153192
    154 bool HTMLTokenizer::nextToken(SegmentedString& source, HTMLToken& token)
    155 {
    156     // If we have a token in progress, then we're supposed to be called back
    157     // with the same token so we can finish it.
    158     ASSERT(!m_token || m_token == &token || token.type() == HTMLToken::Uninitialized);
    159     m_token = &token;
    160 
    161     if (!m_bufferedEndTagName.isEmpty() && !isEndTagBufferingState(m_state)) {
    162         // FIXME: This should call flushBufferedEndTag().
    163         // We started an end tag during our last iteration.
    164         m_token->beginEndTag(m_bufferedEndTagName);
    165         m_bufferedEndTagName.clear();
    166         m_appropriateEndTagName.clear();
    167         m_temporaryBuffer.clear();
    168         if (m_state == HTMLTokenizer::DataState) {
    169             // We're back in the data state, so we must be done with the tag.
     193bool HTMLTokenizer::processToken(SegmentedString& source)
     194{
     195    if (!m_bufferedEndTagName.isEmpty() && !inEndTagBufferingState()) {
     196        // We are back here after emitting a character token that came just before an end tag.
     197        // To continue parsing the end tag we need to move the buffered tag name into the token.
     198        flushBufferedEndTag();
     199
     200        // If we are in the data state, the end tag is already complete and we should emit it
     201        // now; otherwise, we want to resume parsing the partial end tag.
     202        if (m_state == DataState)
    170203            return true;
    171         }
    172204    }
    173205
    174     if (source.isEmpty() || !m_inputStreamPreprocessor.peek(source))
     206    if (!m_preprocessor.peek(source, isNullCharacterSkippingState(m_state)))
    175207        return haveBufferedCharacterToken();
    176     UChar cc = m_inputStreamPreprocessor.nextInputCharacter();
    177 
    178     // Source: http://www.whatwg.org/specs/web-apps/current-work/#tokenisation0
     208    UChar character = m_preprocessor.nextInputCharacter();
     209
     210    // https://html.spec.whatwg.org/#tokenization
    179211    switch (m_state) {
    180     HTML_BEGIN_STATE(DataState) {
    181         if (cc == '&')
    182             HTML_ADVANCE_TO(CharacterReferenceInDataState);
    183         else if (cc == '<') {
    184             if (m_token->type() == HTMLToken::Character) {
    185                 // We have a bunch of character tokens queued up that we
    186                 // are emitting lazily here.
    187                 return true;
    188             }
    189             HTML_ADVANCE_TO(TagOpenState);
    190         } else if (cc == kEndOfFileMarker)
     212
     213    BEGIN_STATE(DataState)
     214        if (character == '&')
     215            ADVANCE_TO(CharacterReferenceInDataState);
     216        if (character == '<') {
     217            if (haveBufferedCharacterToken())
     218                RETURN_IN_CURRENT_STATE(true);
     219            ADVANCE_TO(TagOpenState);
     220        }
     221        if (character == kEndOfFileMarker)
    191222            return emitEndOfFile(source);
    192         else {
    193             bufferCharacter(cc);
    194             HTML_ADVANCE_TO(DataState);
    195         }
    196     }
    197     END_STATE()
    198 
    199     HTML_BEGIN_STATE(CharacterReferenceInDataState) {
     223        bufferCharacter(character);
     224        ADVANCE_TO(DataState);
     225    END_STATE()
     226
     227    BEGIN_STATE(CharacterReferenceInDataState)
    200228        if (!processEntity(source))
    201             return haveBufferedCharacterToken();
    202         HTML_SWITCH_TO(DataState);
    203     }
    204     END_STATE()
    205 
    206     HTML_BEGIN_STATE(RCDATAState) {
    207         if (cc == '&')
    208             HTML_ADVANCE_TO(CharacterReferenceInRCDATAState);
    209         else if (cc == '<')
    210             HTML_ADVANCE_TO(RCDATALessThanSignState);
    211         else if (cc == kEndOfFileMarker)
    212             return emitEndOfFile(source);
    213         else {
    214             bufferCharacter(cc);
    215             HTML_ADVANCE_TO(RCDATAState);
    216         }
    217     }
    218     END_STATE()
    219 
    220     HTML_BEGIN_STATE(CharacterReferenceInRCDATAState) {
     229            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
     230        SWITCH_TO(DataState);
     231    END_STATE()
     232
     233    BEGIN_STATE(RCDATAState)
     234        if (character == '&')
     235            ADVANCE_TO(CharacterReferenceInRCDATAState);
     236        if (character == '<')
     237            ADVANCE_TO(RCDATALessThanSignState);
     238        if (character == kEndOfFileMarker)
     239            RECONSUME_IN(DataState);
     240        bufferCharacter(character);
     241        ADVANCE_TO(RCDATAState);
     242    END_STATE()
     243
     244    BEGIN_STATE(CharacterReferenceInRCDATAState)
    221245        if (!processEntity(source))
    222             return haveBufferedCharacterToken();
    223         HTML_SWITCH_TO(RCDATAState);
    224     }
    225     END_STATE()
    226 
    227     HTML_BEGIN_STATE(RAWTEXTState) {
    228         if (cc == '<')
    229             HTML_ADVANCE_TO(RAWTEXTLessThanSignState);
    230         else if (cc == kEndOfFileMarker)
    231             return emitEndOfFile(source);
    232         else {
    233             bufferCharacter(cc);
    234             HTML_ADVANCE_TO(RAWTEXTState);
    235         }
    236     }
    237     END_STATE()
    238 
    239     HTML_BEGIN_STATE(ScriptDataState) {
    240         if (cc == '<')
    241             HTML_ADVANCE_TO(ScriptDataLessThanSignState);
    242         else if (cc == kEndOfFileMarker)
    243             return emitEndOfFile(source);
    244         else {
    245             bufferCharacter(cc);
    246             HTML_ADVANCE_TO(ScriptDataState);
    247         }
    248     }
    249     END_STATE()
    250 
    251     HTML_BEGIN_STATE(PLAINTEXTState) {
    252         if (cc == kEndOfFileMarker)
    253             return emitEndOfFile(source);
    254         bufferCharacter(cc);
    255         HTML_ADVANCE_TO(PLAINTEXTState);
    256     }
    257     END_STATE()
    258 
    259     HTML_BEGIN_STATE(TagOpenState) {
    260         if (cc == '!')
    261             HTML_ADVANCE_TO(MarkupDeclarationOpenState);
    262         else if (cc == '/')
    263             HTML_ADVANCE_TO(EndTagOpenState);
    264         else if (isASCIIUpper(cc)) {
    265             m_token->beginStartTag(toLowerCase(cc));
    266             HTML_ADVANCE_TO(TagNameState);
    267         } else if (isASCIILower(cc)) {
    268             m_token->beginStartTag(cc);
    269             HTML_ADVANCE_TO(TagNameState);
    270         } else if (cc == '?') {
     246            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
     247        SWITCH_TO(RCDATAState);
     248    END_STATE()
     249
     250    BEGIN_STATE(RAWTEXTState)
     251        if (character == '<')
     252            ADVANCE_TO(RAWTEXTLessThanSignState);
     253        if (character == kEndOfFileMarker)
     254            RECONSUME_IN(DataState);
     255        bufferCharacter(character);
     256        ADVANCE_TO(RAWTEXTState);
     257    END_STATE()
     258
     259    BEGIN_STATE(ScriptDataState)
     260        if (character == '<')
     261            ADVANCE_TO(ScriptDataLessThanSignState);
     262        if (character == kEndOfFileMarker)
     263            RECONSUME_IN(DataState);
     264        bufferCharacter(character);
     265        ADVANCE_TO(ScriptDataState);
     266    END_STATE()
     267
     268    BEGIN_STATE(PLAINTEXTState)
     269        if (character == kEndOfFileMarker)
     270            RECONSUME_IN(DataState);
     271        bufferCharacter(character);
     272        ADVANCE_TO(PLAINTEXTState);
     273    END_STATE()
     274
     275    BEGIN_STATE(TagOpenState)
     276        if (character == '!')
     277            ADVANCE_TO(MarkupDeclarationOpenState);
     278        if (character == '/')
     279            ADVANCE_TO(EndTagOpenState);
     280        if (isASCIIAlpha(character)) {
     281            m_token.beginStartTag(convertASCIIAlphaToLower(character));
     282            ADVANCE_TO(TagNameState);
     283        }
     284        if (character == '?') {
    271285            parseError();
    272286            // The spec consumes the current character before switching
    273287            // to the bogus comment state, but it's easier to implement
    274288            // if we reconsume the current character.
    275             HTML_RECONSUME_IN(BogusCommentState);
    276         } else {
    277             parseError();
    278             bufferASCIICharacter('<');
    279             HTML_RECONSUME_IN(DataState);
    280         }
    281     }
    282     END_STATE()
    283 
    284     HTML_BEGIN_STATE(EndTagOpenState) {
    285         if (isASCIIUpper(cc)) {
    286             m_token->beginEndTag(static_cast<LChar>(toLowerCase(cc)));
     289            RECONSUME_IN(BogusCommentState);
     290        }
     291        parseError();
     292        bufferASCIICharacter('<');
     293        RECONSUME_IN(DataState);
     294    END_STATE()
     295
     296    BEGIN_STATE(EndTagOpenState)
     297        if (isASCIIAlpha(character)) {
     298            m_token.beginEndTag(convertASCIIAlphaToLower(character));
    287299            m_appropriateEndTagName.clear();
    288             HTML_ADVANCE_TO(TagNameState);
    289         } else if (isASCIILower(cc)) {
    290             m_token->beginEndTag(static_cast<LChar>(cc));
    291             m_appropriateEndTagName.clear();
    292             HTML_ADVANCE_TO(TagNameState);
    293         } else if (cc == '>') {
    294             parseError();
    295             HTML_ADVANCE_TO(DataState);
    296         } else if (cc == kEndOfFileMarker) {
     300            ADVANCE_TO(TagNameState);
     301        }
     302        if (character == '>') {
     303            parseError();
     304            ADVANCE_TO(DataState);
     305        }
     306        if (character == kEndOfFileMarker) {
    297307            parseError();
    298308            bufferASCIICharacter('<');
    299309            bufferASCIICharacter('/');
    300             HTML_RECONSUME_IN(DataState);
    301         } else {
    302             parseError();
    303             HTML_RECONSUME_IN(BogusCommentState);
    304         }
    305     }
    306     END_STATE()
    307 
    308     HTML_BEGIN_STATE(TagNameState) {
    309         if (isTokenizerWhitespace(cc))
    310             HTML_ADVANCE_TO(BeforeAttributeNameState);
    311         else if (cc == '/')
    312             HTML_ADVANCE_TO(SelfClosingStartTagState);
    313         else if (cc == '>')
    314             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    315         else if (m_options.usePreHTML5ParserQuirks && cc == '<')
    316             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    317         else if (isASCIIUpper(cc)) {
    318             m_token->appendToName(toLowerCase(cc));
    319             HTML_ADVANCE_TO(TagNameState);
    320         } else if (cc == kEndOfFileMarker) {
    321             parseError();
    322             HTML_RECONSUME_IN(DataState);
    323         } else {
    324             m_token->appendToName(cc);
    325             HTML_ADVANCE_TO(TagNameState);
    326         }
    327     }
    328     END_STATE()
    329 
    330     HTML_BEGIN_STATE(RCDATALessThanSignState) {
    331         if (cc == '/') {
     310            RECONSUME_IN(DataState);
     311        }
     312        parseError();
     313        RECONSUME_IN(BogusCommentState);
     314    END_STATE()
     315
     316    BEGIN_STATE(TagNameState)
     317        if (isTokenizerWhitespace(character))
     318            ADVANCE_TO(BeforeAttributeNameState);
     319        if (character == '/')
     320            ADVANCE_TO(SelfClosingStartTagState);
     321        if (character == '>')
     322            return emitAndResumeInDataState(source);
     323        if (m_options.usePreHTML5ParserQuirks && character == '<')
     324            return emitAndReconsumeInDataState();
     325        if (character == kEndOfFileMarker) {
     326            parseError();
     327            RECONSUME_IN(DataState);
     328        }
     329        m_token.appendToName(toASCIILower(character));
     330        ADVANCE_TO(TagNameState);
     331    END_STATE()
     332
     333    BEGIN_STATE(RCDATALessThanSignState)
     334        if (character == '/') {
    332335            m_temporaryBuffer.clear();
    333336            ASSERT(m_bufferedEndTagName.isEmpty());
    334             HTML_ADVANCE_TO(RCDATAEndTagOpenState);
    335         } else {
    336             bufferASCIICharacter('<');
    337             HTML_RECONSUME_IN(RCDATAState);
    338         }
    339     }
    340     END_STATE()
    341 
    342     HTML_BEGIN_STATE(RCDATAEndTagOpenState) {
    343         if (isASCIIUpper(cc)) {
    344             m_temporaryBuffer.append(static_cast<LChar>(cc));
    345             addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
    346             HTML_ADVANCE_TO(RCDATAEndTagNameState);
    347         } else if (isASCIILower(cc)) {
    348             m_temporaryBuffer.append(static_cast<LChar>(cc));
    349             addToPossibleEndTag(static_cast<LChar>(cc));
    350             HTML_ADVANCE_TO(RCDATAEndTagNameState);
    351         } else {
    352             bufferASCIICharacter('<');
    353             bufferASCIICharacter('/');
    354             HTML_RECONSUME_IN(RCDATAState);
    355         }
    356     }
    357     END_STATE()
    358 
    359     HTML_BEGIN_STATE(RCDATAEndTagNameState) {
    360         if (isASCIIUpper(cc)) {
    361             m_temporaryBuffer.append(static_cast<LChar>(cc));
    362             addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
    363             HTML_ADVANCE_TO(RCDATAEndTagNameState);
    364         } else if (isASCIILower(cc)) {
    365             m_temporaryBuffer.append(static_cast<LChar>(cc));
    366             addToPossibleEndTag(static_cast<LChar>(cc));
    367             HTML_ADVANCE_TO(RCDATAEndTagNameState);
    368         } else {
    369             if (isTokenizerWhitespace(cc)) {
    370                 if (isAppropriateEndTag()) {
    371                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    372                     FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
    373                 }
    374             } else if (cc == '/') {
    375                 if (isAppropriateEndTag()) {
    376                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    377                     FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
    378                 }
    379             } else if (cc == '>') {
    380                 if (isAppropriateEndTag()) {
    381                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    382                     return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
    383                 }
     337            ADVANCE_TO(RCDATAEndTagOpenState);
     338        }
     339        bufferASCIICharacter('<');
     340        RECONSUME_IN(RCDATAState);
     341    END_STATE()
     342
     343    BEGIN_STATE(RCDATAEndTagOpenState)
     344        if (isASCIIAlpha(character)) {
     345            appendToTemporaryBuffer(character);
     346            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
     347            ADVANCE_TO(RCDATAEndTagNameState);
     348        }
     349        bufferASCIICharacter('<');
     350        bufferASCIICharacter('/');
     351        RECONSUME_IN(RCDATAState);
     352    END_STATE()
     353
     354    BEGIN_STATE(RCDATAEndTagNameState)
     355        if (isASCIIAlpha(character)) {
     356            appendToTemporaryBuffer(character);
     357            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
     358            ADVANCE_TO(RCDATAEndTagNameState);
     359        }
     360        if (isTokenizerWhitespace(character)) {
     361            if (isAppropriateEndTag()) {
     362                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
     363                    return true;
     364                SWITCH_TO(BeforeAttributeNameState);
    384365            }
    385             bufferASCIICharacter('<');
    386             bufferASCIICharacter('/');
    387             m_token->appendToCharacter(m_temporaryBuffer);
    388             m_bufferedEndTagName.clear();
    389             m_temporaryBuffer.clear();
    390             HTML_RECONSUME_IN(RCDATAState);
    391         }
    392     }
    393     END_STATE()
    394 
    395     HTML_BEGIN_STATE(RAWTEXTLessThanSignState) {
    396         if (cc == '/') {
     366        } else if (character == '/') {
     367            if (isAppropriateEndTag()) {
     368                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
     369                    return true;
     370                SWITCH_TO(SelfClosingStartTagState);
     371            }
     372        } else if (character == '>') {
     373            if (isAppropriateEndTag())
     374                return commitToCompleteEndTag(source);
     375        }
     376        bufferASCIICharacter('<');
     377        bufferASCIICharacter('/');
     378        m_token.appendToCharacter(m_temporaryBuffer);
     379        m_bufferedEndTagName.clear();
     380        m_temporaryBuffer.clear();
     381        RECONSUME_IN(RCDATAState);
     382    END_STATE()
     383
     384    BEGIN_STATE(RAWTEXTLessThanSignState)
     385        if (character == '/') {
    397386            m_temporaryBuffer.clear();
    398387            ASSERT(m_bufferedEndTagName.isEmpty());
    399             HTML_ADVANCE_TO(RAWTEXTEndTagOpenState);
    400         } else {
    401             bufferASCIICharacter('<');
    402             HTML_RECONSUME_IN(RAWTEXTState);
    403         }
    404     }
    405     END_STATE()
    406 
    407     HTML_BEGIN_STATE(RAWTEXTEndTagOpenState) {
    408         if (isASCIIUpper(cc)) {
    409             m_temporaryBuffer.append(static_cast<LChar>(cc));
    410             addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
    411             HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
    412         } else if (isASCIILower(cc)) {
    413             m_temporaryBuffer.append(static_cast<LChar>(cc));
    414             addToPossibleEndTag(static_cast<LChar>(cc));
    415             HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
    416         } else {
    417             bufferASCIICharacter('<');
    418             bufferASCIICharacter('/');
    419             HTML_RECONSUME_IN(RAWTEXTState);
    420         }
    421     }
    422     END_STATE()
    423 
    424     HTML_BEGIN_STATE(RAWTEXTEndTagNameState) {
    425         if (isASCIIUpper(cc)) {
    426             m_temporaryBuffer.append(static_cast<LChar>(cc));
    427             addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
    428             HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
    429         } else if (isASCIILower(cc)) {
    430             m_temporaryBuffer.append(static_cast<LChar>(cc));
    431             addToPossibleEndTag(static_cast<LChar>(cc));
    432             HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
    433         } else {
    434             if (isTokenizerWhitespace(cc)) {
    435                 if (isAppropriateEndTag()) {
    436                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    437                     FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
    438                 }
    439             } else if (cc == '/') {
    440                 if (isAppropriateEndTag()) {
    441                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    442                     FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
    443                 }
    444             } else if (cc == '>') {
    445                 if (isAppropriateEndTag()) {
    446                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    447                     return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
    448                 }
     388            ADVANCE_TO(RAWTEXTEndTagOpenState);
     389        }
     390        bufferASCIICharacter('<');
     391        RECONSUME_IN(RAWTEXTState);
     392    END_STATE()
     393
     394    BEGIN_STATE(RAWTEXTEndTagOpenState)
     395        if (isASCIIAlpha(character)) {
     396            appendToTemporaryBuffer(character);
     397            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
     398            ADVANCE_TO(RAWTEXTEndTagNameState);
     399        }
     400        bufferASCIICharacter('<');
     401        bufferASCIICharacter('/');
     402        RECONSUME_IN(RAWTEXTState);
     403    END_STATE()
     404
     405    BEGIN_STATE(RAWTEXTEndTagNameState)
     406        if (isASCIIAlpha(character)) {
     407            appendToTemporaryBuffer(character);
     408            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
     409            ADVANCE_TO(RAWTEXTEndTagNameState);
     410        }
     411        if (isTokenizerWhitespace(character)) {
     412            if (isAppropriateEndTag()) {
     413                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
     414                    return true;
     415                SWITCH_TO(BeforeAttributeNameState);
    449416            }
    450             bufferASCIICharacter('<');
    451             bufferASCIICharacter('/');
    452             m_token->appendToCharacter(m_temporaryBuffer);
    453             m_bufferedEndTagName.clear();
    454             m_temporaryBuffer.clear();
    455             HTML_RECONSUME_IN(RAWTEXTState);
    456         }
    457     }
    458     END_STATE()
    459 
    460     HTML_BEGIN_STATE(ScriptDataLessThanSignState) {
    461         if (cc == '/') {
     417        } else if (character == '/') {
     418            if (isAppropriateEndTag()) {
     419                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
     420                    return true;
     421                SWITCH_TO(SelfClosingStartTagState);
     422            }
     423        } else if (character == '>') {
     424            if (isAppropriateEndTag())
     425                return commitToCompleteEndTag(source);
     426        }
     427        bufferASCIICharacter('<');
     428        bufferASCIICharacter('/');
     429        m_token.appendToCharacter(m_temporaryBuffer);
     430        m_bufferedEndTagName.clear();
     431        m_temporaryBuffer.clear();
     432        RECONSUME_IN(RAWTEXTState);
     433    END_STATE()
     434
     435    BEGIN_STATE(ScriptDataLessThanSignState)
     436        if (character == '/') {
     462 437            m_temporaryBuffer.clear();
     463 438            ASSERT(m_bufferedEndTagName.isEmpty());
    464             HTML_ADVANCE_TO(ScriptDataEndTagOpenState);
    465         } else if (cc == '!') {
     439            ADVANCE_TO(ScriptDataEndTagOpenState);
     440        }
     441        if (character == '!') {
     466 442            bufferASCIICharacter('<');
     467 443            bufferASCIICharacter('!');
    468             HTML_ADVANCE_TO(ScriptDataEscapeStartState);
    469         } else {
    470             bufferASCIICharacter('<');
    471             HTML_RECONSUME_IN(ScriptDataState);
    472         }
    473     }
    474     END_STATE()
    475 
    476     HTML_BEGIN_STATE(ScriptDataEndTagOpenState) {
    477         if (isASCIIUpper(cc)) {
    478             m_temporaryBuffer.append(static_cast<LChar>(cc));
    479             addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
    480             HTML_ADVANCE_TO(ScriptDataEndTagNameState);
    481         } else if (isASCIILower(cc)) {
    482             m_temporaryBuffer.append(static_cast<LChar>(cc));
    483             addToPossibleEndTag(static_cast<LChar>(cc));
    484             HTML_ADVANCE_TO(ScriptDataEndTagNameState);
    485         } else {
    486             bufferASCIICharacter('<');
    487             bufferASCIICharacter('/');
    488             HTML_RECONSUME_IN(ScriptDataState);
    489         }
    490     }
    491     END_STATE()
    492 
    493     HTML_BEGIN_STATE(ScriptDataEndTagNameState) {
    494         if (isASCIIUpper(cc)) {
    495             m_temporaryBuffer.append(static_cast<LChar>(cc));
    496             addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
    497             HTML_ADVANCE_TO(ScriptDataEndTagNameState);
    498         } else if (isASCIILower(cc)) {
    499             m_temporaryBuffer.append(static_cast<LChar>(cc));
    500             addToPossibleEndTag(static_cast<LChar>(cc));
    501             HTML_ADVANCE_TO(ScriptDataEndTagNameState);
    502         } else {
    503             if (isTokenizerWhitespace(cc)) {
    504                 if (isAppropriateEndTag()) {
    505                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    506                     FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
    507                 }
    508             } else if (cc == '/') {
    509                 if (isAppropriateEndTag()) {
    510                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    511                     FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
    512                 }
    513             } else if (cc == '>') {
    514                 if (isAppropriateEndTag()) {
    515                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    516                     return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
    517                 }
     444            ADVANCE_TO(ScriptDataEscapeStartState);
     445        }
     446        bufferASCIICharacter('<');
     447        RECONSUME_IN(ScriptDataState);
     448    END_STATE()
     449
     450    BEGIN_STATE(ScriptDataEndTagOpenState)
     451        if (isASCIIAlpha(character)) {
     452            appendToTemporaryBuffer(character);
     453            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
     454            ADVANCE_TO(ScriptDataEndTagNameState);
     455        }
     456        bufferASCIICharacter('<');
     457        bufferASCIICharacter('/');
     458        RECONSUME_IN(ScriptDataState);
     459    END_STATE()
     460
     461    BEGIN_STATE(ScriptDataEndTagNameState)
     462        if (isASCIIAlpha(character)) {
     463            appendToTemporaryBuffer(character);
     464            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
     465            ADVANCE_TO(ScriptDataEndTagNameState);
     466        }
     467        if (isTokenizerWhitespace(character)) {
     468            if (isAppropriateEndTag()) {
     469                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
     470                    return true;
     471                SWITCH_TO(BeforeAttributeNameState);
     518 472            }
    519             bufferASCIICharacter('<');
    520             bufferASCIICharacter('/');
    521             m_token->appendToCharacter(m_temporaryBuffer);
    522             m_bufferedEndTagName.clear();
    523             m_temporaryBuffer.clear();
    524             HTML_RECONSUME_IN(ScriptDataState);
    525         }
    526     }
    527     END_STATE()
    528 
    529     HTML_BEGIN_STATE(ScriptDataEscapeStartState) {
    530         if (cc == '-') {
     473        } else if (character == '/') {
     474            if (isAppropriateEndTag()) {
     475                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
     476                    return true;
     477                SWITCH_TO(SelfClosingStartTagState);
     478            }
     479        } else if (character == '>') {
     480            if (isAppropriateEndTag())
     481                return commitToCompleteEndTag(source);
     482        }
     483        bufferASCIICharacter('<');
     484        bufferASCIICharacter('/');
     485        m_token.appendToCharacter(m_temporaryBuffer);
     486        m_bufferedEndTagName.clear();
     487        m_temporaryBuffer.clear();
     488        RECONSUME_IN(ScriptDataState);
     489    END_STATE()
     490
     491    BEGIN_STATE(ScriptDataEscapeStartState)
     492        if (character == '-') {
     531 493            bufferASCIICharacter('-');
    532             HTML_ADVANCE_TO(ScriptDataEscapeStartDashState);
     494            ADVANCE_TO(ScriptDataEscapeStartDashState);
     533 495        } else
    534             HTML_RECONSUME_IN(ScriptDataState);
    535     }
    536     END_STATE()
    537 
    538     HTML_BEGIN_STATE(ScriptDataEscapeStartDashState) {
    539         if (cc == '-') {
     496            RECONSUME_IN(ScriptDataState);
     497    END_STATE()
     498
     499    BEGIN_STATE(ScriptDataEscapeStartDashState)
     500        if (character == '-') {
     540 501            bufferASCIICharacter('-');
    541             HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
     502            ADVANCE_TO(ScriptDataEscapedDashDashState);
     542 503        } else
    543             HTML_RECONSUME_IN(ScriptDataState);
    544     }
    545     END_STATE()
    546 
    547     HTML_BEGIN_STATE(ScriptDataEscapedState) {
    548         if (cc == '-') {
     504            RECONSUME_IN(ScriptDataState);
     505    END_STATE()
     506
     507    BEGIN_STATE(ScriptDataEscapedState)
     508        if (character == '-') {
     549 509            bufferASCIICharacter('-');
    550             HTML_ADVANCE_TO(ScriptDataEscapedDashState);
    551         } else if (cc == '<')
    552             HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
    553         else if (cc == kEndOfFileMarker) {
    554             parseError();
    555             HTML_RECONSUME_IN(DataState);
    556         } else {
    557             bufferCharacter(cc);
    558             HTML_ADVANCE_TO(ScriptDataEscapedState);
    559         }
    560     }
    561     END_STATE()
    562 
    563     HTML_BEGIN_STATE(ScriptDataEscapedDashState) {
    564         if (cc == '-') {
     510            ADVANCE_TO(ScriptDataEscapedDashState);
     511        }
     512        if (character == '<')
     513            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
     514        if (character == kEndOfFileMarker) {
     515            parseError();
     516            RECONSUME_IN(DataState);
     517        }
     518        bufferCharacter(character);
     519        ADVANCE_TO(ScriptDataEscapedState);
     520    END_STATE()
     521
     522    BEGIN_STATE(ScriptDataEscapedDashState)
     523        if (character == '-') {
     565 524            bufferASCIICharacter('-');
    566             HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
    567         } else if (cc == '<')
    568             HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
    569         else if (cc == kEndOfFileMarker) {
    570             parseError();
    571             HTML_RECONSUME_IN(DataState);
    572         } else {
    573             bufferCharacter(cc);
    574             HTML_ADVANCE_TO(ScriptDataEscapedState);
    575         }
    576     }
    577     END_STATE()
    578 
    579     HTML_BEGIN_STATE(ScriptDataEscapedDashDashState) {
    580         if (cc == '-') {
     525            ADVANCE_TO(ScriptDataEscapedDashDashState);
     526        }
     527        if (character == '<')
     528            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
     529        if (character == kEndOfFileMarker) {
     530            parseError();
     531            RECONSUME_IN(DataState);
     532        }
     533        bufferCharacter(character);
     534        ADVANCE_TO(ScriptDataEscapedState);
     535    END_STATE()
     536
     537    BEGIN_STATE(ScriptDataEscapedDashDashState)
     538        if (character == '-') {
     581 539            bufferASCIICharacter('-');
    582             HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
    583         } else if (cc == '<')
    584             HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
    585         else if (cc == '>') {
     540            ADVANCE_TO(ScriptDataEscapedDashDashState);
     541        }
     542        if (character == '<')
     543            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
     544        if (character == '>') {
     586 545            bufferASCIICharacter('>');
    587             HTML_ADVANCE_TO(ScriptDataState);
    588         } else if (cc == kEndOfFileMarker) {
    589             parseError();
    590             HTML_RECONSUME_IN(DataState);
    591         } else {
    592             bufferCharacter(cc);
    593             HTML_ADVANCE_TO(ScriptDataEscapedState);
    594         }
    595     }
    596     END_STATE()
    597 
    598     HTML_BEGIN_STATE(ScriptDataEscapedLessThanSignState) {
    599         if (cc == '/') {
     546            ADVANCE_TO(ScriptDataState);
     547        }
     548        if (character == kEndOfFileMarker) {
     549            parseError();
     550            RECONSUME_IN(DataState);
     551        }
     552        bufferCharacter(character);
     553        ADVANCE_TO(ScriptDataEscapedState);
     554    END_STATE()
     555
     556    BEGIN_STATE(ScriptDataEscapedLessThanSignState)
     557        if (character == '/') {
     600 558            m_temporaryBuffer.clear();
     601 559            ASSERT(m_bufferedEndTagName.isEmpty());
    602             HTML_ADVANCE_TO(ScriptDataEscapedEndTagOpenState);
    603         } else if (isASCIIUpper(cc)) {
     560            ADVANCE_TO(ScriptDataEscapedEndTagOpenState);
     561        }
     562        if (isASCIIAlpha(character)) {
     604 563            bufferASCIICharacter('<');
    605             bufferASCIICharacter(cc);
     564            bufferASCIICharacter(character);
     606 565            m_temporaryBuffer.clear();
    607             m_temporaryBuffer.append(toLowerCase(cc));
    608             HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
    609         } else if (isASCIILower(cc)) {
     566            appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
     567            ADVANCE_TO(ScriptDataDoubleEscapeStartState);
     568        }
     569        bufferASCIICharacter('<');
     570        RECONSUME_IN(ScriptDataEscapedState);
     571    END_STATE()
     572
     573    BEGIN_STATE(ScriptDataEscapedEndTagOpenState)
     574        if (isASCIIAlpha(character)) {
     575            appendToTemporaryBuffer(character);
     576            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
     577            ADVANCE_TO(ScriptDataEscapedEndTagNameState);
     578        }
     579        bufferASCIICharacter('<');
     580        bufferASCIICharacter('/');
     581        RECONSUME_IN(ScriptDataEscapedState);
     582    END_STATE()
     583
     584    BEGIN_STATE(ScriptDataEscapedEndTagNameState)
     585        if (isASCIIAlpha(character)) {
     586            appendToTemporaryBuffer(character);
     587            appendToPossibleEndTag(convertASCIIAlphaToLower(character));
     588            ADVANCE_TO(ScriptDataEscapedEndTagNameState);
     589        }
     590        if (isTokenizerWhitespace(character)) {
     591            if (isAppropriateEndTag()) {
     592                if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
     593                    return true;
     594                SWITCH_TO(BeforeAttributeNameState);
     595            }
     596        } else if (character == '/') {
     597            if (isAppropriateEndTag()) {
     598                if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
     599                    return true;
     600                SWITCH_TO(SelfClosingStartTagState);
     601            }
     602        } else if (character == '>') {
     603            if (isAppropriateEndTag())
     604                return commitToCompleteEndTag(source);
     605        }
     606        bufferASCIICharacter('<');
     607        bufferASCIICharacter('/');
     608        m_token.appendToCharacter(m_temporaryBuffer);
     609        m_bufferedEndTagName.clear();
     610        m_temporaryBuffer.clear();
     611        RECONSUME_IN(ScriptDataEscapedState);
     612    END_STATE()
     613
     614    BEGIN_STATE(ScriptDataDoubleEscapeStartState)
     615        if (isTokenizerWhitespace(character) || character == '/' || character == '>') {
     616            bufferASCIICharacter(character);
     617            if (temporaryBufferIs("script"))
     618                ADVANCE_TO(ScriptDataDoubleEscapedState);
     619            else
     620                ADVANCE_TO(ScriptDataEscapedState);
     621        }
     622        if (isASCIIAlpha(character)) {
     623            bufferASCIICharacter(character);
     624            appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
     625            ADVANCE_TO(ScriptDataDoubleEscapeStartState);
     626        }
     627        RECONSUME_IN(ScriptDataEscapedState);
     628    END_STATE()
     629
     630    BEGIN_STATE(ScriptDataDoubleEscapedState)
     631        if (character == '-') {
     632            bufferASCIICharacter('-');
     633            ADVANCE_TO(ScriptDataDoubleEscapedDashState);
     634        }
     635        if (character == '<') {
     610 636            bufferASCIICharacter('<');
    611             bufferASCIICharacter(cc);
    612             m_temporaryBuffer.clear();
    613             m_temporaryBuffer.append(static_cast<LChar>(cc));
    614             HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
    615         } else {
     637            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
     638        }
     639        if (character == kEndOfFileMarker) {
     640            parseError();
     641            RECONSUME_IN(DataState);
     642        }
     643        bufferCharacter(character);
     644        ADVANCE_TO(ScriptDataDoubleEscapedState);
     645    END_STATE()
     646
     647    BEGIN_STATE(ScriptDataDoubleEscapedDashState)
     648        if (character == '-') {
     649            bufferASCIICharacter('-');
     650            ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
     651        }
     652        if (character == '<') {
     616 653            bufferASCIICharacter('<');
    617             HTML_RECONSUME_IN(ScriptDataEscapedState);
    618         }
    619     }
    620     END_STATE()
    621 
    622     HTML_BEGIN_STATE(ScriptDataEscapedEndTagOpenState) {
    623         if (isASCIIUpper(cc)) {
    624             m_temporaryBuffer.append(static_cast<LChar>(cc));
    625             addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
    626             HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
    627         } else if (isASCIILower(cc)) {
    628             m_temporaryBuffer.append(static_cast<LChar>(cc));
    629             addToPossibleEndTag(static_cast<LChar>(cc));
    630             HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
    631         } else {
     654            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
     655        }
     656        if (character == kEndOfFileMarker) {
     657            parseError();
     658            RECONSUME_IN(DataState);
     659        }
     660        bufferCharacter(character);
     661        ADVANCE_TO(ScriptDataDoubleEscapedState);
     662    END_STATE()
     663
     664    BEGIN_STATE(ScriptDataDoubleEscapedDashDashState)
     665        if (character == '-') {
     666            bufferASCIICharacter('-');
     667            ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
     668        }
     669        if (character == '<') {
     632 670            bufferASCIICharacter('<');
    633             bufferASCIICharacter('/');
    634             HTML_RECONSUME_IN(ScriptDataEscapedState);
    635         }
    636     }
    637     END_STATE()
    638 
    639     HTML_BEGIN_STATE(ScriptDataEscapedEndTagNameState) {
    640         if (isASCIIUpper(cc)) {
    641             m_temporaryBuffer.append(static_cast<LChar>(cc));
    642             addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
    643             HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
    644         } else if (isASCIILower(cc)) {
    645             m_temporaryBuffer.append(static_cast<LChar>(cc));
    646             addToPossibleEndTag(static_cast<LChar>(cc));
    647             HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
    648         } else {
    649             if (isTokenizerWhitespace(cc)) {
    650                 if (isAppropriateEndTag()) {
    651                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    652                     FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
    653                 }
    654             } else if (cc == '/') {
    655                 if (isAppropriateEndTag()) {
    656                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    657                     FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
    658                 }
    659             } else if (cc == '>') {
    660                 if (isAppropriateEndTag()) {
    661                     m_temporaryBuffer.append(static_cast<LChar>(cc));
    662                     return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
    663                 }
    664             }
    665             bufferASCIICharacter('<');
    666             bufferASCIICharacter('/');
    667             m_token->appendToCharacter(m_temporaryBuffer);
    668             m_bufferedEndTagName.clear();
    669             m_temporaryBuffer.clear();
    670             HTML_RECONSUME_IN(ScriptDataEscapedState);
    671         }
    672     }
    673     END_STATE()
    674 
    675     HTML_BEGIN_STATE(ScriptDataDoubleEscapeStartState) {
    676         if (isTokenizerWhitespace(cc) || cc == '/' || cc == '>') {
    677             bufferASCIICharacter(cc);
    678             if (temporaryBufferIs(scriptTag.localName()))
    679                 HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
    680             else
    681                 HTML_ADVANCE_TO(ScriptDataEscapedState);
    682         } else if (isASCIIUpper(cc)) {
    683             bufferASCIICharacter(cc);
    684             m_temporaryBuffer.append(toLowerCase(cc));
    685             HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
    686         } else if (isASCIILower(cc)) {
    687             bufferASCIICharacter(cc);
    688             m_temporaryBuffer.append(static_cast<LChar>(cc));
    689             HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
    690         } else
    691             HTML_RECONSUME_IN(ScriptDataEscapedState);
    692     }
    693     END_STATE()
    694 
    695     HTML_BEGIN_STATE(ScriptDataDoubleEscapedState) {
    696         if (cc == '-') {
    697             bufferASCIICharacter('-');
    698             HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashState);
    699         } else if (cc == '<') {
    700             bufferASCIICharacter('<');
    701             HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
    702         } else if (cc == kEndOfFileMarker) {
    703             parseError();
    704             HTML_RECONSUME_IN(DataState);
    705         } else {
    706             bufferCharacter(cc);
    707             HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
    708         }
    709     }
    710     END_STATE()
    711 
    712     HTML_BEGIN_STATE(ScriptDataDoubleEscapedDashState) {
    713         if (cc == '-') {
    714             bufferASCIICharacter('-');
    715             HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
    716         } else if (cc == '<') {
    717             bufferASCIICharacter('<');
    718             HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
    719         } else if (cc == kEndOfFileMarker) {
    720             parseError();
    721             HTML_RECONSUME_IN(DataState);
    722         } else {
    723             bufferCharacter(cc);
    724             HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
    725         }
    726     }
    727     END_STATE()
    728 
    729     HTML_BEGIN_STATE(ScriptDataDoubleEscapedDashDashState) {
    730         if (cc == '-') {
    731             bufferASCIICharacter('-');
    732             HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
    733         } else if (cc == '<') {
    734             bufferASCIICharacter('<');
    735             HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
    736         } else if (cc == '>') {
     671            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
     672        }
     673        if (character == '>') {
     737 674            bufferASCIICharacter('>');
    738             HTML_ADVANCE_TO(ScriptDataState);
    739         } else if (cc == kEndOfFileMarker) {
    740             parseError();
    741             HTML_RECONSUME_IN(DataState);
    742         } else {
    743             bufferCharacter(cc);
    744             HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
    745         }
    746     }
    747     END_STATE()
    748 
    749     HTML_BEGIN_STATE(ScriptDataDoubleEscapedLessThanSignState) {
    750         if (cc == '/') {
     675            ADVANCE_TO(ScriptDataState);
     676        }
     677        if (character == kEndOfFileMarker) {
     678            parseError();
     679            RECONSUME_IN(DataState);
     680        }
     681        bufferCharacter(character);
     682        ADVANCE_TO(ScriptDataDoubleEscapedState);
     683    END_STATE()
     684
     685    BEGIN_STATE(ScriptDataDoubleEscapedLessThanSignState)
     686        if (character == '/') {
     751 687            bufferASCIICharacter('/');
     752 688            m_temporaryBuffer.clear();
    753             HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
    754         } else
    755             HTML_RECONSUME_IN(ScriptDataDoubleEscapedState);
    756     }
    757     END_STATE()
    758 
    759     HTML_BEGIN_STATE(ScriptDataDoubleEscapeEndState) {
    760         if (isTokenizerWhitespace(cc) || cc == '/' || cc == '>') {
    761             bufferASCIICharacter(cc);
    762             if (temporaryBufferIs(scriptTag.localName()))
    763                 HTML_ADVANCE_TO(ScriptDataEscapedState);
     689            ADVANCE_TO(ScriptDataDoubleEscapeEndState);
     690        }
     691        RECONSUME_IN(ScriptDataDoubleEscapedState);
     692    END_STATE()
     693
     694    BEGIN_STATE(ScriptDataDoubleEscapeEndState)
     695        if (isTokenizerWhitespace(character) || character == '/' || character == '>') {
     696            bufferASCIICharacter(character);
     697            if (temporaryBufferIs("script"))
     698                ADVANCE_TO(ScriptDataEscapedState);
     764 699            else
    765                 HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
    766         } else if (isASCIIUpper(cc)) {
    767             bufferASCIICharacter(cc);
    768             m_temporaryBuffer.append(toLowerCase(cc));
    769             HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
    770         } else if (isASCIILower(cc)) {
    771             bufferASCIICharacter(cc);
    772             m_temporaryBuffer.append(static_cast<LChar>(cc));
    773             HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
    774         } else
    775             HTML_RECONSUME_IN(ScriptDataDoubleEscapedState);
    776     }
    777     END_STATE()
    778 
    779     HTML_BEGIN_STATE(BeforeAttributeNameState) {
    780         if (isTokenizerWhitespace(cc))
    781             HTML_ADVANCE_TO(BeforeAttributeNameState);
    782         else if (cc == '/')
    783             HTML_ADVANCE_TO(SelfClosingStartTagState);
    784         else if (cc == '>')
    785             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    786         else if (m_options.usePreHTML5ParserQuirks && cc == '<')
    787             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    788         else if (isASCIIUpper(cc)) {
    789             m_token->addNewAttribute();
    790             m_token->beginAttributeName(source.numberOfCharactersConsumed());
    791             m_token->appendToAttributeName(toLowerCase(cc));
    792             HTML_ADVANCE_TO(AttributeNameState);
    793         } else if (cc == kEndOfFileMarker) {
    794             parseError();
    795             HTML_RECONSUME_IN(DataState);
    796         } else {
    797             if (cc == '"' || cc == '\'' || cc == '<' || cc == '=')
    798                 parseError();
    799             m_token->addNewAttribute();
    800             m_token->beginAttributeName(source.numberOfCharactersConsumed());
    801             m_token->appendToAttributeName(cc);
    802             HTML_ADVANCE_TO(AttributeNameState);
    803         }
    804     }
    805     END_STATE()
    806 
    807     HTML_BEGIN_STATE(AttributeNameState) {
    808         if (isTokenizerWhitespace(cc)) {
    809             m_token->endAttributeName(source.numberOfCharactersConsumed());
    810             HTML_ADVANCE_TO(AfterAttributeNameState);
    811         } else if (cc == '/') {
    812             m_token->endAttributeName(source.numberOfCharactersConsumed());
    813             HTML_ADVANCE_TO(SelfClosingStartTagState);
    814         } else if (cc == '=') {
    815             m_token->endAttributeName(source.numberOfCharactersConsumed());
    816             HTML_ADVANCE_TO(BeforeAttributeValueState);
    817         } else if (cc == '>') {
    818             m_token->endAttributeName(source.numberOfCharactersConsumed());
    819             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    820         } else if (m_options.usePreHTML5ParserQuirks && cc == '<') {
    821             m_token->endAttributeName(source.numberOfCharactersConsumed());
    822             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    823         } else if (isASCIIUpper(cc)) {
    824             m_token->appendToAttributeName(toLowerCase(cc));
    825             HTML_ADVANCE_TO(AttributeNameState);
    826         } else if (cc == kEndOfFileMarker) {
    827             parseError();
    828             m_token->endAttributeName(source.numberOfCharactersConsumed());
    829             HTML_RECONSUME_IN(DataState);
    830         } else {
    831             if (cc == '"' || cc == '\'' || cc == '<' || cc == '=')
    832                 parseError();
    833             m_token->appendToAttributeName(cc);
    834             HTML_ADVANCE_TO(AttributeNameState);
    835         }
    836     }
    837     END_STATE()
    838 
    839     HTML_BEGIN_STATE(AfterAttributeNameState) {
    840         if (isTokenizerWhitespace(cc))
    841             HTML_ADVANCE_TO(AfterAttributeNameState);
    842         else if (cc == '/')
    843             HTML_ADVANCE_TO(SelfClosingStartTagState);
    844         else if (cc == '=')
    845             HTML_ADVANCE_TO(BeforeAttributeValueState);
    846         else if (cc == '>')
    847             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    848         else if (m_options.usePreHTML5ParserQuirks && cc == '<')
    849             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    850         else if (isASCIIUpper(cc)) {
    851             m_token->addNewAttribute();
    852             m_token->beginAttributeName(source.numberOfCharactersConsumed());
    853             m_token->appendToAttributeName(toLowerCase(cc));
    854             HTML_ADVANCE_TO(AttributeNameState);
    855         } else if (cc == kEndOfFileMarker) {
    856             parseError();
    857             HTML_RECONSUME_IN(DataState);
    858         } else {
    859             if (cc == '"' || cc == '\'' || cc == '<')
    860                 parseError();
    861             m_token->addNewAttribute();
    862             m_token->beginAttributeName(source.numberOfCharactersConsumed());
    863             m_token->appendToAttributeName(cc);
    864             HTML_ADVANCE_TO(AttributeNameState);
    865         }
    866     }
    867     END_STATE()
    868 
    869     HTML_BEGIN_STATE(BeforeAttributeValueState) {
    870         if (isTokenizerWhitespace(cc))
    871             HTML_ADVANCE_TO(BeforeAttributeValueState);
    872         else if (cc == '"') {
    873             m_token->beginAttributeValue(source.numberOfCharactersConsumed() + 1);
    874             HTML_ADVANCE_TO(AttributeValueDoubleQuotedState);
    875         } else if (cc == '&') {
    876             m_token->beginAttributeValue(source.numberOfCharactersConsumed());
    877             HTML_RECONSUME_IN(AttributeValueUnquotedState);
    878         } else if (cc == '\'') {
    879             m_token->beginAttributeValue(source.numberOfCharactersConsumed() + 1);
    880             HTML_ADVANCE_TO(AttributeValueSingleQuotedState);
    881         } else if (cc == '>') {
    882             parseError();
    883             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    884         } else if (cc == kEndOfFileMarker) {
    885             parseError();
    886             HTML_RECONSUME_IN(DataState);
    887         } else {
    888             if (cc == '<' || cc == '=' || cc == '`')
    889                 parseError();
    890             m_token->beginAttributeValue(source.numberOfCharactersConsumed());
    891             m_token->appendToAttributeValue(cc);
    892             HTML_ADVANCE_TO(AttributeValueUnquotedState);
    893         }
    894     }
    895     END_STATE()
    896 
    897     HTML_BEGIN_STATE(AttributeValueDoubleQuotedState) {
    898         if (cc == '"') {
    899             m_token->endAttributeValue(source.numberOfCharactersConsumed());
    900             HTML_ADVANCE_TO(AfterAttributeValueQuotedState);
    901         } else if (cc == '&') {
     700                ADVANCE_TO(ScriptDataDoubleEscapedState);
     701        }
     702        if (isASCIIAlpha(character)) {
     703            bufferASCIICharacter(character);
     704            appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
     705            ADVANCE_TO(ScriptDataDoubleEscapeEndState);
     706        }
     707        RECONSUME_IN(ScriptDataDoubleEscapedState);
     708    END_STATE()
     709
     710    BEGIN_STATE(BeforeAttributeNameState)
     711        if (isTokenizerWhitespace(character))
     712            ADVANCE_TO(BeforeAttributeNameState);
     713        if (character == '/')
     714            ADVANCE_TO(SelfClosingStartTagState);
     715        if (character == '>')
     716            return emitAndResumeInDataState(source);
     717        if (m_options.usePreHTML5ParserQuirks && character == '<')
     718            return emitAndReconsumeInDataState();
     719        if (character == kEndOfFileMarker) {
     720            parseError();
     721            RECONSUME_IN(DataState);
     722        }
     723        if (character == '"' || character == '\'' || character == '<' || character == '=')
     724            parseError();
     725        m_token.beginAttribute(source.numberOfCharactersConsumed());
     726        m_token.appendToAttributeName(toASCIILower(character));
     727        ADVANCE_TO(AttributeNameState);
     728    END_STATE()
     729
     730    BEGIN_STATE(AttributeNameState)
     731        if (isTokenizerWhitespace(character))
     732            ADVANCE_TO(AfterAttributeNameState);
     733        if (character == '/')
     734            ADVANCE_TO(SelfClosingStartTagState);
     735        if (character == '=')
     736            ADVANCE_TO(BeforeAttributeValueState);
     737        if (character == '>')
     738            return emitAndResumeInDataState(source);
     739        if (m_options.usePreHTML5ParserQuirks && character == '<')
     740            return emitAndReconsumeInDataState();
     741        if (character == kEndOfFileMarker) {
     742            parseError();
     743            RECONSUME_IN(DataState);
     744        }
     745        if (character == '"' || character == '\'' || character == '<' || character == '=')
     746            parseError();
     747        m_token.appendToAttributeName(toASCIILower(character));
     748        ADVANCE_TO(AttributeNameState);
     749    END_STATE()
     750
     751    BEGIN_STATE(AfterAttributeNameState)
     752        if (isTokenizerWhitespace(character))
     753            ADVANCE_TO(AfterAttributeNameState);
     754        if (character == '/')
     755            ADVANCE_TO(SelfClosingStartTagState);
     756        if (character == '=')
     757            ADVANCE_TO(BeforeAttributeValueState);
     758        if (character == '>')
     759            return emitAndResumeInDataState(source);
     760        if (m_options.usePreHTML5ParserQuirks && character == '<')
     761            return emitAndReconsumeInDataState();
     762        if (character == kEndOfFileMarker) {
     763            parseError();
     764            RECONSUME_IN(DataState);
     765        }
     766        if (character == '"' || character == '\'' || character == '<')
     767            parseError();
     768        m_token.beginAttribute(source.numberOfCharactersConsumed());
     769        m_token.appendToAttributeName(toASCIILower(character));
     770        ADVANCE_TO(AttributeNameState);
     771    END_STATE()
     772
     773    BEGIN_STATE(BeforeAttributeValueState)
     774        if (isTokenizerWhitespace(character))
     775            ADVANCE_TO(BeforeAttributeValueState);
     776        if (character == '"')
     777            ADVANCE_TO(AttributeValueDoubleQuotedState);
     778        if (character == '&')
     779            RECONSUME_IN(AttributeValueUnquotedState);
     780        if (character == '\'')
     781            ADVANCE_TO(AttributeValueSingleQuotedState);
     782        if (character == '>') {
     783            parseError();
     784            return emitAndResumeInDataState(source);
     785        }
     786        if (character == kEndOfFileMarker) {
     787            parseError();
     788            RECONSUME_IN(DataState);
     789        }
     790        if (character == '<' || character == '=' || character == '`')
     791            parseError();
     792        m_token.appendToAttributeValue(character);
     793        ADVANCE_TO(AttributeValueUnquotedState);
     794    END_STATE()
     795
     796    BEGIN_STATE(AttributeValueDoubleQuotedState)
     797        if (character == '"') {
     798            m_token.endAttribute(source.numberOfCharactersConsumed());
     799            ADVANCE_TO(AfterAttributeValueQuotedState);
     800        }
     801        if (character == '&') {
    902 802            m_additionalAllowedCharacter = '"';
    903             HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
    904         } else if (cc == kEndOfFileMarker) {
    905             parseError();
    906             m_token->endAttributeValue(source.numberOfCharactersConsumed());
    907             HTML_RECONSUME_IN(DataState);
    908         } else {
    909             m_token->appendToAttributeValue(cc);
    910             HTML_ADVANCE_TO(AttributeValueDoubleQuotedState);
    911         }
    912     }
    913     END_STATE()
    914 
    915     HTML_BEGIN_STATE(AttributeValueSingleQuotedState) {
    916         if (cc == '\'') {
    917             m_token->endAttributeValue(source.numberOfCharactersConsumed());
    918             HTML_ADVANCE_TO(AfterAttributeValueQuotedState);
    919         } else if (cc == '&') {
     803            ADVANCE_TO(CharacterReferenceInAttributeValueState);
     804        }
     805        if (character == kEndOfFileMarker) {
     806            parseError();
     807            m_token.endAttribute(source.numberOfCharactersConsumed());
     808            RECONSUME_IN(DataState);
     809        }
     810        m_token.appendToAttributeValue(character);
     811        ADVANCE_TO(AttributeValueDoubleQuotedState);
     812    END_STATE()
     813
     814    BEGIN_STATE(AttributeValueSingleQuotedState)
     815        if (character == '\'') {
     816            m_token.endAttribute(source.numberOfCharactersConsumed());
     817            ADVANCE_TO(AfterAttributeValueQuotedState);
     818        }
     819        if (character == '&') {
    920 820            m_additionalAllowedCharacter = '\'';
    921             HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
    922         } else if (cc == kEndOfFileMarker) {
    923             parseError();
    924             m_token->endAttributeValue(source.numberOfCharactersConsumed());
    925             HTML_RECONSUME_IN(DataState);
    926         } else {
    927             m_token->appendToAttributeValue(cc);
    928             HTML_ADVANCE_TO(AttributeValueSingleQuotedState);
    929         }
    930     }
    931     END_STATE()
    932 
    933     HTML_BEGIN_STATE(AttributeValueUnquotedState) {
    934         if (isTokenizerWhitespace(cc)) {
    935             m_token->endAttributeValue(source.numberOfCharactersConsumed());
    936             HTML_ADVANCE_TO(BeforeAttributeNameState);
    937         } else if (cc == '&') {
     821            ADVANCE_TO(CharacterReferenceInAttributeValueState);
     822        }
     823        if (character == kEndOfFileMarker) {
     824            parseError();
     825            m_token.endAttribute(source.numberOfCharactersConsumed());
     826            RECONSUME_IN(DataState);
     827        }
     828        m_token.appendToAttributeValue(character);
     829        ADVANCE_TO(AttributeValueSingleQuotedState);
     830    END_STATE()
     831
     832    BEGIN_STATE(AttributeValueUnquotedState)
     833        if (isTokenizerWhitespace(character)) {
     834            m_token.endAttribute(source.numberOfCharactersConsumed());
     835            ADVANCE_TO(BeforeAttributeNameState);
     836        }
     837        if (character == '&') {
    938 838            m_additionalAllowedCharacter = '>';
    939             HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
    940         } else if (cc == '>') {
    941             m_token->endAttributeValue(source.numberOfCharactersConsumed());
    942             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    943         } else if (cc == kEndOfFileMarker) {
    944             parseError();
    945             m_token->endAttributeValue(source.numberOfCharactersConsumed());
    946             HTML_RECONSUME_IN(DataState);
    947         } else {
    948             if (cc == '"' || cc == '\'' || cc == '<' || cc == '=' || cc == '`')
    949                 parseError();
    950             m_token->appendToAttributeValue(cc);
    951             HTML_ADVANCE_TO(AttributeValueUnquotedState);
    952         }
    953     }
    954     END_STATE()
    955 
    956     HTML_BEGIN_STATE(CharacterReferenceInAttributeValueState) {
     839            ADVANCE_TO(CharacterReferenceInAttributeValueState);
     840        }
     841        if (character == '>') {
     842            m_token.endAttribute(source.numberOfCharactersConsumed());
     843            return emitAndResumeInDataState(source);
     844        }
     845        if (character == kEndOfFileMarker) {
     846            parseError();
     847            m_token.endAttribute(source.numberOfCharactersConsumed());
     848            RECONSUME_IN(DataState);
     849        }
     850        if (character == '"' || character == '\'' || character == '<' || character == '=' || character == '`')
     851            parseError();
     852        m_token.appendToAttributeValue(character);
     853        ADVANCE_TO(AttributeValueUnquotedState);
     854    END_STATE()
     855
     856    BEGIN_STATE(CharacterReferenceInAttributeValueState)
    957 857        bool notEnoughCharacters = false;
    958 858        StringBuilder decodedEntity;
    959 859        bool success = consumeHTMLEntity(source, decodedEntity, notEnoughCharacters, m_additionalAllowedCharacter);
    960 860        if (notEnoughCharacters)
    961             return haveBufferedCharacterToken();
     861            RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
    962 862        if (!success) {
    963 863            ASSERT(decodedEntity.isEmpty());
    964             m_token->appendToAttributeValue('&');
     864            m_token.appendToAttributeValue('&');
    965 865        } else {
    966 866            for (unsigned i = 0; i < decodedEntity.length(); ++i)
    967                 m_token->appendToAttributeValue(decodedEntity[i]);
     867                m_token.appendToAttributeValue(decodedEntity[i]);
    968 868        }
    969 869        // We're supposed to switch back to the attribute value state that
     
    972 872        // state can be determined by m_additionalAllowedCharacter.
    973 873        if (m_additionalAllowedCharacter == '"')
    974             HTML_SWITCH_TO(AttributeValueDoubleQuotedState);
    975         else if (m_additionalAllowedCharacter == '\'')
    976             HTML_SWITCH_TO(AttributeValueSingleQuotedState);
    977         else if (m_additionalAllowedCharacter == '>')
    978             HTML_SWITCH_TO(AttributeValueUnquotedState);
    979         else
    980             ASSERT_NOT_REACHED();
    981     }
    982     END_STATE()
    983 
    984     HTML_BEGIN_STATE(AfterAttributeValueQuotedState) {
    985         if (isTokenizerWhitespace(cc))
    986             HTML_ADVANCE_TO(BeforeAttributeNameState);
    987         else if (cc == '/')
    988             HTML_ADVANCE_TO(SelfClosingStartTagState);
    989         else if (cc == '>')
    990             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    991         else if (m_options.usePreHTML5ParserQuirks && cc == '<')
    992             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    993         else if (cc == kEndOfFileMarker) {
    994             parseError();
    995             HTML_RECONSUME_IN(DataState);
    996         } else {
    997             parseError();
    998             HTML_RECONSUME_IN(BeforeAttributeNameState);
    999         }
    1000     }
    1001     END_STATE()
    1002 
    1003     HTML_BEGIN_STATE(SelfClosingStartTagState) {
    1004         if (cc == '>') {
    1005             m_token->setSelfClosing();
    1006             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1007         } else if (cc == kEndOfFileMarker) {
    1008             parseError();
    1009             HTML_RECONSUME_IN(DataState);
    1010         } else {
    1011             parseError();
    1012             HTML_RECONSUME_IN(BeforeAttributeNameState);
    1013         }
    1014     }
    1015     END_STATE()
    1016 
    1017     HTML_BEGIN_STATE(BogusCommentState) {
    1018         m_token->beginComment();
    1019         HTML_RECONSUME_IN(ContinueBogusCommentState);
    1020     }
    1021     END_STATE()
    1022 
    1023     HTML_BEGIN_STATE(ContinueBogusCommentState) {
    1024         if (cc == '>')
    1025             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1026         else if (cc == kEndOfFileMarker)
    1027             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1028         else {
    1029             m_token->appendToComment(cc);
    1030             HTML_ADVANCE_TO(ContinueBogusCommentState);
    1031         }
    1032     }
    1033     END_STATE()
    1034 
    1035     HTML_BEGIN_STATE(MarkupDeclarationOpenState) {
    1036         DEPRECATED_DEFINE_STATIC_LOCAL(String, dashDashString, (ASCIILiteral("--")));
    1037         DEPRECATED_DEFINE_STATIC_LOCAL(String, doctypeString, (ASCIILiteral("doctype")));
    1038         DEPRECATED_DEFINE_STATIC_LOCAL(String, cdataString, (ASCIILiteral("[CDATA[")));
    1039         if (cc == '-') {
    1040             SegmentedString::LookAheadResult result = source.lookAhead(dashDashString);
     874            SWITCH_TO(AttributeValueDoubleQuotedState);
     875        if (m_additionalAllowedCharacter == '\'')
     876            SWITCH_TO(AttributeValueSingleQuotedState);
     877        ASSERT(m_additionalAllowedCharacter == '>');
     878        SWITCH_TO(AttributeValueUnquotedState);
     879    END_STATE()
     880
     881    BEGIN_STATE(AfterAttributeValueQuotedState)
     882        if (isTokenizerWhitespace(character))
     883            ADVANCE_TO(BeforeAttributeNameState);
     884        if (character == '/')
     885            ADVANCE_TO(SelfClosingStartTagState);
     886        if (character == '>')
     887            return emitAndResumeInDataState(source);
     888        if (m_options.usePreHTML5ParserQuirks && character == '<')
     889            return emitAndReconsumeInDataState();
     890        if (character == kEndOfFileMarker) {
     891            parseError();
     892            RECONSUME_IN(DataState);
     893        }
     894        parseError();
     895        RECONSUME_IN(BeforeAttributeNameState);
     896    END_STATE()
     897
     898    BEGIN_STATE(SelfClosingStartTagState)
     899        if (character == '>') {
     900            m_token.setSelfClosing();
     901            return emitAndResumeInDataState(source);
     902        }
     903        if (character == kEndOfFileMarker) {
     904            parseError();
     905            RECONSUME_IN(DataState);
     906        }
     907        parseError();
     908        RECONSUME_IN(BeforeAttributeNameState);
     909    END_STATE()
     910
     911    BEGIN_STATE(BogusCommentState)
     912        m_token.beginComment();
     913        RECONSUME_IN(ContinueBogusCommentState);
     914    END_STATE()
     915
     916    BEGIN_STATE(ContinueBogusCommentState)
     917        if (character == '>')
     918            return emitAndResumeInDataState(source);
     919        if (character == kEndOfFileMarker)
     920            return emitAndReconsumeInDataState();
     921        m_token.appendToComment(character);
     922        ADVANCE_TO(ContinueBogusCommentState);
     923    END_STATE()
     924
     925    BEGIN_STATE(MarkupDeclarationOpenState)
     926        if (character == '-') {
     927            auto result = source.advancePast("--");
    1041 928            if (result == SegmentedString::DidMatch) {
    1042                 source.advanceAndASSERT('-');
    1043                 source.advanceAndASSERT('-');
    1044                 m_token->beginComment();
    1045                 HTML_SWITCH_TO(CommentStartState);
    1046             } else if (result == SegmentedString::NotEnoughCharacters)
    1047                 return haveBufferedCharacterToken();
    1048         } else if (cc == 'D' || cc == 'd') {
    1049             SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(doctypeString);
    1050             if (result == SegmentedString::DidMatch) {
    1051                 advanceStringAndASSERTIgnoringCase(source, "doctype");
    1052                 HTML_SWITCH_TO(DOCTYPEState);
    1053             } else if (result == SegmentedString::NotEnoughCharacters)
    1054                 return haveBufferedCharacterToken();
    1055         } else if (cc == '[' && shouldAllowCDATA()) {
    1056             SegmentedString::LookAheadResult result = source.lookAhead(cdataString);
    1057             if (result == SegmentedString::DidMatch) {
    1058                 advanceStringAndASSERT(source, "[CDATA[");
    1059                 HTML_SWITCH_TO(CDATASectionState);
    1060             } else if (result == SegmentedString::NotEnoughCharacters)
    1061                 return haveBufferedCharacterToken();
     929                m_token.beginComment();
     930                SWITCH_TO(CommentStartState);
     931            }
     932            if (result == SegmentedString::NotEnoughCharacters)
     933                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
     934        } else if (isASCIIAlphaCaselessEqual(character, 'd')) {
     935            auto result = source.advancePastIgnoringCase("doctype");
     936            if (result == SegmentedString::DidMatch)
     937                SWITCH_TO(DOCTYPEState);
     938            if (result == SegmentedString::NotEnoughCharacters)
     939                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
     940        } else if (character == '[' && shouldAllowCDATA()) {
     941            auto result = source.advancePast("[CDATA[");
     942            if (result == SegmentedString::DidMatch)
     943                SWITCH_TO(CDATASectionState);
     944            if (result == SegmentedString::NotEnoughCharacters)
     945                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
    1062 946        }
    1063 947        parseError();
    1064         HTML_RECONSUME_IN(BogusCommentState);
    1065     }
    1066     END_STATE()
    1067 
    1068     HTML_BEGIN_STATE(CommentStartState) {
    1069         if (cc == '-')
    1070             HTML_ADVANCE_TO(CommentStartDashState);
    1071         else if (cc == '>') {
    1072             parseError();
    1073             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1074         } else if (cc == kEndOfFileMarker) {
    1075             parseError();
    1076             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1077         } else {
    1078             m_token->appendToComment(cc);
    1079             HTML_ADVANCE_TO(CommentState);
    1080         }
    1081     }
    1082     END_STATE()
    1083 
    1084     HTML_BEGIN_STATE(CommentStartDashState) {
    1085         if (cc == '-')
    1086             HTML_ADVANCE_TO(CommentEndState);
    1087         else if (cc == '>') {
    1088             parseError();
    1089             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1090         } else if (cc == kEndOfFileMarker) {
    1091             parseError();
    1092             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1093         } else {
    1094             m_token->appendToComment('-');
    1095             m_token->appendToComment(cc);
    1096             HTML_ADVANCE_TO(CommentState);
    1097         }
    1098     }
    1099     END_STATE()
    1100 
    1101     HTML_BEGIN_STATE(CommentState) {
    1102         if (cc == '-')
    1103             HTML_ADVANCE_TO(CommentEndDashState);
    1104         else if (cc == kEndOfFileMarker) {
    1105             parseError();
    1106             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1107         } else {
    1108             m_token->appendToComment(cc);
    1109             HTML_ADVANCE_TO(CommentState);
    1110         }
    1111     }
    1112     END_STATE()
    1113 
    1114     HTML_BEGIN_STATE(CommentEndDashState) {
    1115         if (cc == '-')
    1116             HTML_ADVANCE_TO(CommentEndState);
    1117         else if (cc == kEndOfFileMarker) {
    1118             parseError();
    1119             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1120         } else {
    1121             m_token->appendToComment('-');
    1122             m_token->appendToComment(cc);
    1123             HTML_ADVANCE_TO(CommentState);
    1124         }
    1125     }
    1126     END_STATE()
    1127 
    1128     HTML_BEGIN_STATE(CommentEndState) {
    1129         if (cc == '>')
    1130             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1131         else if (cc == '!') {
    1132             parseError();
    1133             HTML_ADVANCE_TO(CommentEndBangState);
    1134         } else if (cc == '-') {
    1135             parseError();
    1136             m_token->appendToComment('-');
    1137             HTML_ADVANCE_TO(CommentEndState);
    1138         } else if (cc == kEndOfFileMarker) {
    1139             parseError();
    1140             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1141         } else {
    1142             parseError();
    1143             m_token->appendToComment('-');
    1144             m_token->appendToComment('-');
    1145             m_token->appendToComment(cc);
    1146             HTML_ADVANCE_TO(CommentState);
    1147         }
    1148     }
    1149     END_STATE()
    1150 
    1151     HTML_BEGIN_STATE(CommentEndBangState) {
    1152         if (cc == '-') {
    1153             m_token->appendToComment('-');
    1154             m_token->appendToComment('-');
    1155             m_token->appendToComment('!');
    1156             HTML_ADVANCE_TO(CommentEndDashState);
    1157         } else if (cc == '>')
    1158             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1159         else if (cc == kEndOfFileMarker) {
    1160             parseError();
    1161             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1162         } else {
    1163             m_token->appendToComment('-');
    1164             m_token->appendToComment('-');
    1165             m_token->appendToComment('!');
    1166             m_token->appendToComment(cc);
    1167             HTML_ADVANCE_TO(CommentState);
    1168         }
    1169     }
    1170     END_STATE()
    1171 
    1172     HTML_BEGIN_STATE(DOCTYPEState) {
    1173         if (isTokenizerWhitespace(cc))
    1174             HTML_ADVANCE_TO(BeforeDOCTYPENameState);
    1175         else if (cc == kEndOfFileMarker) {
    1176             parseError();
    1177             m_token->beginDOCTYPE();
    1178             m_token->setForceQuirks();
    1179             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1180         } else {
    1181             parseError();
    1182             HTML_RECONSUME_IN(BeforeDOCTYPENameState);
    1183         }
    1184     }
    1185     END_STATE()
    1186 
    1187     HTML_BEGIN_STATE(BeforeDOCTYPENameState) {
    1188         if (isTokenizerWhitespace(cc))
    1189             HTML_ADVANCE_TO(BeforeDOCTYPENameState);
    1190         else if (isASCIIUpper(cc)) {
    1191             m_token->beginDOCTYPE(toLowerCase(cc));
    1192             HTML_ADVANCE_TO(DOCTYPENameState);
    1193         } else if (cc == '>') {
    1194             parseError();
    1195             m_token->beginDOCTYPE();
    1196             m_token->setForceQuirks();
    1197             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1198         } else if (cc == kEndOfFileMarker) {
    1199             parseError();
    1200             m_token->beginDOCTYPE();
    1201             m_token->setForceQuirks();
    1202             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1203         } else {
    1204             m_token->beginDOCTYPE(cc);
    1205             HTML_ADVANCE_TO(DOCTYPENameState);
    1206         }
    1207     }
    1208     END_STATE()
    1209 
    1210     HTML_BEGIN_STATE(DOCTYPENameState) {
    1211         if (isTokenizerWhitespace(cc))
    1212             HTML_ADVANCE_TO(AfterDOCTYPENameState);
    1213         else if (cc == '>')
    1214             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1215         else if (isASCIIUpper(cc)) {
    1216             m_token->appendToName(toLowerCase(cc));
    1217             HTML_ADVANCE_TO(DOCTYPENameState);
    1218         } else if (cc == kEndOfFileMarker) {
    1219             parseError();
    1220             m_token->setForceQuirks();
    1221             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1222         } else {
    1223             m_token->appendToName(cc);
    1224             HTML_ADVANCE_TO(DOCTYPENameState);
    1225         }
    1226     }
    1227     END_STATE()
    1228 
    1229     HTML_BEGIN_STATE(AfterDOCTYPENameState) {
    1230         if (isTokenizerWhitespace(cc))
    1231             HTML_ADVANCE_TO(AfterDOCTYPENameState);
    1232         if (cc == '>')
    1233             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1234         else if (cc == kEndOfFileMarker) {
    1235             parseError();
    1236             m_token->setForceQuirks();
    1237             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1238         } else {
    1239             DEPRECATED_DEFINE_STATIC_LOCAL(String, publicString, (ASCIILiteral("public")));
    1240             DEPRECATED_DEFINE_STATIC_LOCAL(String, systemString, (ASCIILiteral("system")));
    1241             if (cc == 'P' || cc == 'p') {
    1242                 SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(publicString);
    1243                 if (result == SegmentedString::DidMatch) {
    1244                     advanceStringAndASSERTIgnoringCase(source, "public");
    1245                     HTML_SWITCH_TO(AfterDOCTYPEPublicKeywordState);
    1246                 } else if (result == SegmentedString::NotEnoughCharacters)
    1247                     return haveBufferedCharacterToken();
    1248             } else if (cc == 'S' || cc == 's') {
    1249                 SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(systemString);
    1250                 if (result == SegmentedString::DidMatch) {
    1251                     advanceStringAndASSERTIgnoringCase(source, "system");
    1252                     HTML_SWITCH_TO(AfterDOCTYPESystemKeywordState);
    1253                 } else if (result == SegmentedString::NotEnoughCharacters)
    1254                     return haveBufferedCharacterToken();
    1255             }
    1256             parseError();
    1257             m_token->setForceQuirks();
    1258             HTML_ADVANCE_TO(BogusDOCTYPEState);
    1259         }
    1260     }
    1261     END_STATE()
    1262 
    1263     HTML_BEGIN_STATE(AfterDOCTYPEPublicKeywordState) {
    1264         if (isTokenizerWhitespace(cc))
    1265             HTML_ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
    1266         else if (cc == '"') {
    1267             parseError();
    1268             m_token->setPublicIdentifierToEmptyString();
    1269             HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
    1270         } else if (cc == '\'') {
    1271             parseError();
    1272             m_token->setPublicIdentifierToEmptyString();
    1273             HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
    1274         } else if (cc == '>') {
    1275             parseError();
    1276             m_token->setForceQuirks();
    1277             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1278         } else if (cc == kEndOfFileMarker) {
    1279             parseError();
    1280             m_token->setForceQuirks();
    1281             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1282         } else {
    1283             parseError();
    1284             m_token->setForceQuirks();
    1285             HTML_ADVANCE_TO(BogusDOCTYPEState);
    1286         }
    1287     }
    1288     END_STATE()
    1289 
    1290     HTML_BEGIN_STATE(BeforeDOCTYPEPublicIdentifierState) {
    1291         if (isTokenizerWhitespace(cc))
    1292             HTML_ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
    1293         else if (cc == '"') {
    1294             m_token->setPublicIdentifierToEmptyString();
    1295             HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
    1296         } else if (cc == '\'') {
    1297             m_token->setPublicIdentifierToEmptyString();
    1298             HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
    1299         } else if (cc == '>') {
    1300             parseError();
    1301             m_token->setForceQuirks();
    1302             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1303         } else if (cc == kEndOfFileMarker) {
    1304             parseError();
    1305             m_token->setForceQuirks();
    1306             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1307         } else {
    1308             parseError();
    1309             m_token->setForceQuirks();
    1310             HTML_ADVANCE_TO(BogusDOCTYPEState);
    1311         }
    1312     }
    1313     END_STATE()
    1314 
    1315     HTML_BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState) {
    1316         if (cc == '"')
    1317             HTML_ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
    1318         else if (cc == '>') {
    1319             parseError();
    1320             m_token->setForceQuirks();
    1321             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1322         } else if (cc == kEndOfFileMarker) {
    1323             parseError();
    1324             m_token->setForceQuirks();
    1325             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1326         } else {
    1327             m_token->appendToPublicIdentifier(cc);
    1328             HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
    1329         }
    1330     }
    1331     END_STATE()
    1332 
    1333     HTML_BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState) {
    1334         if (cc == '\'')
    1335             HTML_ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
    1336         else if (cc == '>') {
    1337             parseError();
    1338             m_token->setForceQuirks();
    1339             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1340         } else if (cc == kEndOfFileMarker) {
    1341             parseError();
    1342             m_token->setForceQuirks();
    1343             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1344         } else {
    1345             m_token->appendToPublicIdentifier(cc);
    1346             HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
    1347         }
    1348     }
    1349     END_STATE()
    1350 
    1351     HTML_BEGIN_STATE(AfterDOCTYPEPublicIdentifierState) {
    1352         if (isTokenizerWhitespace(cc))
    1353             HTML_ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
    1354         else if (cc == '>')
    1355             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1356         else if (cc == '"') {
    1357             parseError();
    1358             m_token->setSystemIdentifierToEmptyString();
    1359             HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
    1360         } else if (cc == '\'') {
    1361             parseError();
    1362             m_token->setSystemIdentifierToEmptyString();
    1363             HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
    1364         } else if (cc == kEndOfFileMarker) {
    1365             parseError();
    1366             m_token->setForceQuirks();
    1367             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1368         } else {
    1369             parseError();
    1370             m_token->setForceQuirks();
    1371             HTML_ADVANCE_TO(BogusDOCTYPEState);
    1372         }
    1373     }
    1374     END_STATE()
    1375 
    1376     HTML_BEGIN_STATE(BetweenDOCTYPEPublicAndSystemIdentifiersState) {
    1377         if (isTokenizerWhitespace(cc))
    1378             HTML_ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
    1379         else if (cc == '>')
    1380             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1381         else if (cc == '"') {
    1382             m_token->setSystemIdentifierToEmptyString();
    1383             HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
    1384         } else if (cc == '\'') {
    1385             m_token->setSystemIdentifierToEmptyString();
    1386             HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
    1387         } else if (cc == kEndOfFileMarker) {
    1388             parseError();
    1389             m_token->setForceQuirks();
    1390             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1391         } else {
    1392             parseError();
    1393             m_token->setForceQuirks();
    1394             HTML_ADVANCE_TO(BogusDOCTYPEState);
    1395         }
    1396     }
    1397     END_STATE()
    1398 
    1399     HTML_BEGIN_STATE(AfterDOCTYPESystemKeywordState) {
    1400         if (isTokenizerWhitespace(cc))
    1401             HTML_ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
    1402         else if (cc == '"') {
    1403             parseError();
    1404             m_token->setSystemIdentifierToEmptyString();
    1405             HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
    1406         } else if (cc == '\'') {
    1407             parseError();
    1408             m_token->setSystemIdentifierToEmptyString();
    1409             HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
    1410         } else if (cc == '>') {
    1411             parseError();
    1412             m_token->setForceQuirks();
    1413             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1414         } else if (cc == kEndOfFileMarker) {
    1415             parseError();
    1416             m_token->setForceQuirks();
    1417             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1418         } else {
    1419             parseError();
    1420             m_token->setForceQuirks();
    1421             HTML_ADVANCE_TO(BogusDOCTYPEState);
    1422         }
    1423     }
    1424     END_STATE()
    1425 
    1426     HTML_BEGIN_STATE(BeforeDOCTYPESystemIdentifierState) {
    1427         if (isTokenizerWhitespace(cc))
    1428             HTML_ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
    1429         if (cc == '"') {
    1430             m_token->setSystemIdentifierToEmptyString();
    1431             HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
    1432         } else if (cc == '\'') {
    1433             m_token->setSystemIdentifierToEmptyString();
    1434             HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
    1435         } else if (cc == '>') {
    1436             parseError();
    1437             m_token->setForceQuirks();
    1438             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1439         } else if (cc == kEndOfFileMarker) {
    1440             parseError();
    1441             m_token->setForceQuirks();
    1442             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1443         } else {
    1444             parseError();
    1445             m_token->setForceQuirks();
    1446             HTML_ADVANCE_TO(BogusDOCTYPEState);
    1447         }
    1448     }
    1449     END_STATE()
    1450 
    1451     HTML_BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState) {
    1452         if (cc == '"')
    1453             HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
    1454         else if (cc == '>') {
    1455             parseError();
    1456             m_token->setForceQuirks();
    1457             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1458         } else if (cc == kEndOfFileMarker) {
    1459             parseError();
    1460             m_token->setForceQuirks();
    1461             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1462         } else {
    1463             m_token->appendToSystemIdentifier(cc);
    1464             HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
    1465         }
    1466     }
    1467     END_STATE()
    1468 
    1469     HTML_BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState) {
    1470         if (cc == '\'')
    1471             HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
    1472         else if (cc == '>') {
    1473             parseError();
    1474             m_token->setForceQuirks();
    1475             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1476         } else if (cc == kEndOfFileMarker) {
    1477             parseError();
    1478             m_token->setForceQuirks();
    1479             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1480         } else {
    1481             m_token->appendToSystemIdentifier(cc);
    1482             HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
    1483         }
    1484     }
    1485     END_STATE()
    1486 
    1487     HTML_BEGIN_STATE(AfterDOCTYPESystemIdentifierState) {
    1488         if (isTokenizerWhitespace(cc))
    1489             HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
    1490         else if (cc == '>')
    1491             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1492         else if (cc == kEndOfFileMarker) {
    1493             parseError();
    1494             m_token->setForceQuirks();
    1495             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1496         } else {
    1497             parseError();
    1498             HTML_ADVANCE_TO(BogusDOCTYPEState);
    1499         }
    1500     }
    1501     END_STATE()
    1502 
    1503     HTML_BEGIN_STATE(BogusDOCTYPEState) {
    1504         if (cc == '>')
    1505             return emitAndResumeIn(source, HTMLTokenizer::DataState);
    1506         else if (cc == kEndOfFileMarker)
    1507             return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
    1508         HTML_ADVANCE_TO(BogusDOCTYPEState);
    1509     }
    1510     END_STATE()
    1511 
    1512     HTML_BEGIN_STATE(CDATASectionState) {
    1513         if (cc == ']')
    1514             HTML_ADVANCE_TO(CDATASectionRightSquareBracketState);
    1515         else if (cc == kEndOfFileMarker)
    1516             HTML_RECONSUME_IN(DataState);
    1517         else {
    1518             bufferCharacter(cc);
    1519             HTML_ADVANCE_TO(CDATASectionState);
    1520         }
    1521     }
    1522     END_STATE()
    1523 
    1524     HTML_BEGIN_STATE(CDATASectionRightSquareBracketState) {
    1525         if (cc == ']')
    1526             HTML_ADVANCE_TO(CDATASectionDoubleRightSquareBracketState);
    1527         else {
    1528             bufferASCIICharacter(']');
    1529             HTML_RECONSUME_IN(CDATASectionState);
    1530         }
    1531     }
    1532 
    1533     HTML_BEGIN_STATE(CDATASectionDoubleRightSquareBracketState) {
    1534         if (cc == '>')
    1535             HTML_ADVANCE_TO(DataState);
    1536         else {
    1537             bufferASCIICharacter(']');
    1538             bufferASCIICharacter(']');
    1539             HTML_RECONSUME_IN(CDATASectionState);
    1540         }
    1541     }
     948        RECONSUME_IN(BogusCommentState);
     949    END_STATE()
     950
     951    BEGIN_STATE(CommentStartState)
     952        if (character == '-')
     953            ADVANCE_TO(CommentStartDashState);
     954        if (character == '>') {
     955            parseError();
     956            return emitAndResumeInDataState(source);
     957        }
     958        if (character == kEndOfFileMarker) {
     959            parseError();
     960            return emitAndReconsumeInDataState();
     961        }
     962        m_token.appendToComment(character);
     963        ADVANCE_TO(CommentState);
     964    END_STATE()
     965
     966    BEGIN_STATE(CommentStartDashState)
     967        if (character == '-')
     968            ADVANCE_TO(CommentEndState);
     969        if (character == '>') {
     970            parseError();
     971            return emitAndResumeInDataState(source);
     972        }
     973        if (character == kEndOfFileMarker) {
     974            parseError();
     975            return emitAndReconsumeInDataState();
     976        }
     977        m_token.appendToComment('-');
     978        m_token.appendToComment(character);
     979        ADVANCE_TO(CommentState);
     980    END_STATE()
     981
     982    BEGIN_STATE(CommentState)
     983        if (character == '-')
     984            ADVANCE_TO(CommentEndDashState);
     985        if (character == kEndOfFileMarker) {
     986            parseError();
     987            return emitAndReconsumeInDataState();
     988        }
     989        m_token.appendToComment(character);
     990        ADVANCE_TO(CommentState);
     991    END_STATE()
     992
     993    BEGIN_STATE(CommentEndDashState)
     994        if (character == '-')
     995            ADVANCE_TO(CommentEndState);
     996        if (character == kEndOfFileMarker) {
     997            parseError();
     998            return emitAndReconsumeInDataState();
     999        }
     1000        m_token.appendToComment('-');
     1001        m_token.appendToComment(character);
     1002        ADVANCE_TO(CommentState);
     1003    END_STATE()
     1004
     1005    BEGIN_STATE(CommentEndState)
     1006        if (character == '>')
     1007            return emitAndResumeInDataState(source);
     1008        if (character == '!') {
     1009            parseError();
     1010            ADVANCE_TO(CommentEndBangState);
     1011        }
     1012        if (character == '-') {
     1013            parseError();
     1014            m_token.appendToComment('-');
     1015            ADVANCE_TO(CommentEndState);
     1016        }
     1017        if (character == kEndOfFileMarker) {
     1018            parseError();
     1019            return emitAndReconsumeInDataState();
     1020        }
     1021        parseError();
     1022        m_token.appendToComment('-');
     1023        m_token.appendToComment('-');
     1024        m_token.appendToComment(character);
     1025        ADVANCE_TO(CommentState);
     1026    END_STATE()
     1027
     1028    BEGIN_STATE(CommentEndBangState)
     1029        if (character == '-') {
     1030            m_token.appendToComment('-');
     1031            m_token.appendToComment('-');
     1032            m_token.appendToComment('!');
     1033            ADVANCE_TO(CommentEndDashState);
     1034        }
     1035        if (character == '>')
     1036            return emitAndResumeInDataState(source);
     1037        if (character == kEndOfFileMarker) {
     1038            parseError();
     1039            return emitAndReconsumeInDataState();
     1040        }
     1041        m_token.appendToComment('-');
     1042        m_token.appendToComment('-');
     1043        m_token.appendToComment('!');
     1044        m_token.appendToComment(character);
     1045        ADVANCE_TO(CommentState);
     1046    END_STATE()
     1047
     1048    BEGIN_STATE(DOCTYPEState)
     1049        if (isTokenizerWhitespace(character))
     1050            ADVANCE_TO(BeforeDOCTYPENameState);
     1051        if (character == kEndOfFileMarker) {
     1052            parseError();
     1053            m_token.beginDOCTYPE();
     1054            m_token.setForceQuirks();
     1055            return emitAndReconsumeInDataState();
     1056        }
     1057        parseError();
     1058        RECONSUME_IN(BeforeDOCTYPENameState);
     1059    END_STATE()
     1060
     1061    BEGIN_STATE(BeforeDOCTYPENameState)
     1062        if (isTokenizerWhitespace(character))
     1063            ADVANCE_TO(BeforeDOCTYPENameState);
     1064        if (character == '>') {
     1065            parseError();
     1066            m_token.beginDOCTYPE();
     1067            m_token.setForceQuirks();
     1068            return emitAndResumeInDataState(source);
     1069        }
     1070        if (character == kEndOfFileMarker) {
     1071            parseError();
     1072            m_token.beginDOCTYPE();
     1073            m_token.setForceQuirks();
     1074            return emitAndReconsumeInDataState();
     1075        }
     1076        m_token.beginDOCTYPE(toASCIILower(character));
     1077        ADVANCE_TO(DOCTYPENameState);
     1078    END_STATE()
     1079
     1080    BEGIN_STATE(DOCTYPENameState)
     1081        if (isTokenizerWhitespace(character))
     1082            ADVANCE_TO(AfterDOCTYPENameState);
     1083        if (character == '>')
     1084            return emitAndResumeInDataState(source);
     1085        if (character == kEndOfFileMarker) {
     1086            parseError();
     1087            m_token.setForceQuirks();
     1088            return emitAndReconsumeInDataState();
     1089        }
     1090        m_token.appendToName(toASCIILower(character));
     1091        ADVANCE_TO(DOCTYPENameState);
     1092    END_STATE()
     1093
     1094    BEGIN_STATE(AfterDOCTYPENameState)
     1095        if (isTokenizerWhitespace(character))
     1096            ADVANCE_TO(AfterDOCTYPENameState);
     1097        if (character == '>')
     1098            return emitAndResumeInDataState(source);
     1099        if (character == kEndOfFileMarker) {
     1100            parseError();
     1101            m_token.setForceQuirks();
     1102            return emitAndReconsumeInDataState();
     1103        }
     1104        if (isASCIIAlphaCaselessEqual(character, 'p')) {
     1105            auto result = source.advancePastIgnoringCase("public");
     1106            if (result == SegmentedString::DidMatch)
     1107                SWITCH_TO(AfterDOCTYPEPublicKeywordState);
     1108            if (result == SegmentedString::NotEnoughCharacters)
     1109                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
     1110        } else if (isASCIIAlphaCaselessEqual(character, 's')) {
     1111            auto result = source.advancePastIgnoringCase("system");
     1112            if (result == SegmentedString::DidMatch)
     1113                SWITCH_TO(AfterDOCTYPESystemKeywordState);
     1114            if (result == SegmentedString::NotEnoughCharacters)
     1115                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
     1116        }
     1117        parseError();
     1118        m_token.setForceQuirks();
     1119        ADVANCE_TO(BogusDOCTYPEState);
     1120    END_STATE()
     1121
     1122    BEGIN_STATE(AfterDOCTYPEPublicKeywordState)
     1123        if (isTokenizerWhitespace(character))
     1124            ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
     1125        if (character == '"') {
     1126            parseError();
     1127            m_token.setPublicIdentifierToEmptyString();
     1128            ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
     1129        }
     1130        if (character == '\'') {
     1131            parseError();
     1132            m_token.setPublicIdentifierToEmptyString();
     1133            ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
     1134        }
     1135        if (character == '>') {
     1136            parseError();
     1137            m_token.setForceQuirks();
     1138            return emitAndResumeInDataState(source);
     1139        }
     1140        if (character == kEndOfFileMarker) {
     1141            parseError();
     1142            m_token.setForceQuirks();
     1143            return emitAndReconsumeInDataState();
     1144        }
     1145        parseError();
     1146        m_token.setForceQuirks();
     1147        ADVANCE_TO(BogusDOCTYPEState);
     1148    END_STATE()
     1149
     1150    BEGIN_STATE(BeforeDOCTYPEPublicIdentifierState)
     1151        if (isTokenizerWhitespace(character))
     1152            ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
     1153        if (character == '"') {
     1154            m_token.setPublicIdentifierToEmptyString();
     1155            ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
     1156        }
     1157        if (character == '\'') {
     1158            m_token.setPublicIdentifierToEmptyString();
     1159            ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
     1160        }
     1161        if (character == '>') {
     1162            parseError();
     1163            m_token.setForceQuirks();
     1164            return emitAndResumeInDataState(source);
     1165        }
     1166        if (character == kEndOfFileMarker) {
     1167            parseError();
     1168            m_token.setForceQuirks();
     1169            return emitAndReconsumeInDataState();
     1170        }
     1171        parseError();
     1172        m_token.setForceQuirks();
     1173        ADVANCE_TO(BogusDOCTYPEState);
     1174    END_STATE()
     1175
     1176    BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState)
     1177        if (character == '"')
     1178            ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
     1179        if (character == '>') {
     1180            parseError();
     1181            m_token.setForceQuirks();
     1182            return emitAndResumeInDataState(source);
     1183        }
     1184        if (character == kEndOfFileMarker) {
     1185            parseError();
     1186            m_token.setForceQuirks();
     1187            return emitAndReconsumeInDataState();
     1188        }
     1189        m_token.appendToPublicIdentifier(character);
     1190        ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
     1191    END_STATE()
     1192
     1193    BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState)
     1194        if (character == '\'')
     1195            ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
     1196        if (character == '>') {
     1197            parseError();
     1198            m_token.setForceQuirks();
     1199            return emitAndResumeInDataState(source);
     1200        }
     1201        if (character == kEndOfFileMarker) {
     1202            parseError();
     1203            m_token.setForceQuirks();
     1204            return emitAndReconsumeInDataState();
     1205        }
     1206        m_token.appendToPublicIdentifier(character);
     1207        ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
     1208    END_STATE()
     1209
     1210    BEGIN_STATE(AfterDOCTYPEPublicIdentifierState)
     1211        if (isTokenizerWhitespace(character))
     1212            ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
     1213        if (character == '>')
     1214            return emitAndResumeInDataState(source);
     1215        if (character == '"') {
     1216            parseError();
     1217            m_token.setSystemIdentifierToEmptyString();
     1218            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
     1219        }
     1220        if (character == '\'') {
     1221            parseError();
     1222            m_token.setSystemIdentifierToEmptyString();
     1223            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
     1224        }
     1225        if (character == kEndOfFileMarker) {
     1226            parseError();
     1227            m_token.setForceQuirks();
     1228            return emitAndReconsumeInDataState();
     1229        }
     1230        parseError();
     1231        m_token.setForceQuirks();
     1232        ADVANCE_TO(BogusDOCTYPEState);
     1233    END_STATE()
     1234
     1235    BEGIN_STATE(BetweenDOCTYPEPublicAndSystemIdentifiersState)
     1236        if (isTokenizerWhitespace(character))
     1237            ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
     1238        if (character == '>')
     1239            return emitAndResumeInDataState(source);
     1240        if (character == '"') {
     1241            m_token.setSystemIdentifierToEmptyString();
     1242            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
     1243        }
     1244        if (character == '\'') {
     1245            m_token.setSystemIdentifierToEmptyString();
     1246            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
     1247        }
     1248        if (character == kEndOfFileMarker) {
     1249            parseError();
     1250            m_token.setForceQuirks();
     1251            return emitAndReconsumeInDataState();
     1252        }
     1253        parseError();
     1254        m_token.setForceQuirks();
     1255        ADVANCE_TO(BogusDOCTYPEState);
     1256    END_STATE()
     1257
     1258    BEGIN_STATE(AfterDOCTYPESystemKeywordState)
     1259        if (isTokenizerWhitespace(character))
     1260            ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
     1261        if (character == '"') {
     1262            parseError();
     1263            m_token.setSystemIdentifierToEmptyString();
     1264            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
     1265        }
     1266        if (character == '\'') {
     1267            parseError();
     1268            m_token.setSystemIdentifierToEmptyString();
     1269            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
     1270        }
     1271        if (character == '>') {
     1272            parseError();
     1273            m_token.setForceQuirks();
     1274            return emitAndResumeInDataState(source);
     1275        }
     1276        if (character == kEndOfFileMarker) {
     1277            parseError();
     1278            m_token.setForceQuirks();
     1279            return emitAndReconsumeInDataState();
     1280        }
     1281        parseError();
     1282        m_token.setForceQuirks();
     1283        ADVANCE_TO(BogusDOCTYPEState);
     1284    END_STATE()
     1285
     1286    BEGIN_STATE(BeforeDOCTYPESystemIdentifierState)
     1287        if (isTokenizerWhitespace(character))
     1288            ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
     1289        if (character == '"') {
     1290            m_token.setSystemIdentifierToEmptyString();
     1291            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
     1292        }
     1293        if (character == '\'') {
     1294            m_token.setSystemIdentifierToEmptyString();
     1295            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
     1296        }
     1297        if (character == '>') {
     1298            parseError();
     1299            m_token.setForceQuirks();
     1300            return emitAndResumeInDataState(source);
     1301        }
     1302        if (character == kEndOfFileMarker) {
     1303            parseError();
     1304            m_token.setForceQuirks();
     1305            return emitAndReconsumeInDataState();
     1306        }
     1307        parseError();
     1308        m_token.setForceQuirks();
     1309        ADVANCE_TO(BogusDOCTYPEState);
     1310    END_STATE()
     1311
     1312    BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState)
     1313        if (character == '"')
     1314            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
     1315        if (character == '>') {
     1316            parseError();
     1317            m_token.setForceQuirks();
     1318            return emitAndResumeInDataState(source);
     1319        }
     1320        if (character == kEndOfFileMarker) {
     1321            parseError();
     1322            m_token.setForceQuirks();
     1323            return emitAndReconsumeInDataState();
     1324        }
     1325        m_token.appendToSystemIdentifier(character);
     1326        ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
     1327    END_STATE()
     1328
     1329    BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState)
     1330        if (character == '\'')
     1331            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
     1332        if (character == '>') {
     1333            parseError();
     1334            m_token.setForceQuirks();
     1335            return emitAndResumeInDataState(source);
     1336        }
     1337        if (character == kEndOfFileMarker) {
     1338            parseError();
     1339            m_token.setForceQuirks();
     1340            return emitAndReconsumeInDataState();
     1341        }
     1342        m_token.appendToSystemIdentifier(character);
     1343        ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
     1344    END_STATE()
     1345
     1346    BEGIN_STATE(AfterDOCTYPESystemIdentifierState)
     1347        if (isTokenizerWhitespace(character))
     1348            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
     1349        if (character == '>')
     1350            return emitAndResumeInDataState(source);
     1351        if (character == kEndOfFileMarker) {
     1352            parseError();
     1353            m_token.setForceQuirks();
     1354            return emitAndReconsumeInDataState();
     1355        }
     1356        parseError();
     1357        ADVANCE_TO(BogusDOCTYPEState);
     1358    END_STATE()
     1359
     1360    BEGIN_STATE(BogusDOCTYPEState)
     1361        if (character == '>')
     1362            return emitAndResumeInDataState(source);
     1363        if (character == kEndOfFileMarker)
     1364            return emitAndReconsumeInDataState();
     1365        ADVANCE_TO(BogusDOCTYPEState);
     1366    END_STATE()
     1367
     1368    BEGIN_STATE(CDATASectionState)
     1369        if (character == ']')
     1370            ADVANCE_TO(CDATASectionRightSquareBracketState);
     1371        if (character == kEndOfFileMarker)
     1372            RECONSUME_IN(DataState);
     1373        bufferCharacter(character);
     1374        ADVANCE_TO(CDATASectionState);
     1375    END_STATE()
     1376
     1377    BEGIN_STATE(CDATASectionRightSquareBracketState)
     1378        if (character == ']')
     1379            ADVANCE_TO(CDATASectionDoubleRightSquareBracketState);
     1380        bufferASCIICharacter(']');
     1381        RECONSUME_IN(CDATASectionState);
     1382    END_STATE()
     1383
     1384    BEGIN_STATE(CDATASectionDoubleRightSquareBracketState)
     1385        if (character == '>')
     1386            ADVANCE_TO(DataState);
     1387        bufferASCIICharacter(']');
     1388        bufferASCIICharacter(']');
     1389        RECONSUME_IN(CDATASectionState);
    1542 1390    END_STATE()
    1543 1391
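The rewritten states above drop the old `else if` chains in favor of a flat run of `if` statements. That only reads as correct if `ADVANCE_TO` and the related macros leave the current state unconditionally; the macro definitions are not part of this hunk, so treat that as an assumption. A minimal, self-contained sketch of the style follows. The `ADVANCE_TO` macro, the labels, and `scanCommentBody` here are hypothetical stand-ins, not WebKit's definitions, and the comment-end-bang state and parse errors are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: '\0' stands in for the tokenizer's end-of-file marker.
constexpr char kEndOfFileMarker = '\0';

// Scans the text that follows "<!--" and returns the comment data up to
// "-->" (or up to end of input if the comment is unterminated).
std::string scanCommentBody(const std::string& input)
{
    std::string comment;
    std::size_t pos = 0;
    char character = pos < input.size() ? input[pos] : kEndOfFileMarker;

// Hypothetical macro: consume one character, then jump to the named state.
// Because it always jumps, the statements after it need no `else`.
#define ADVANCE_TO(state) \
    do { \
        ++pos; \
        character = pos < input.size() ? input[pos] : kEndOfFileMarker; \
        goto state; \
    } while (false)

CommentState:
    if (character == '-')
        ADVANCE_TO(CommentEndDashState);
    if (character == kEndOfFileMarker)
        return comment;
    comment += character;
    ADVANCE_TO(CommentState);

CommentEndDashState:
    if (character == '-')
        ADVANCE_TO(CommentEndState);
    if (character == kEndOfFileMarker)
        return comment;
    comment += '-';
    comment += character;
    ADVANCE_TO(CommentState);

CommentEndState:
    if (character == '>')
        return comment;
    if (character == kEndOfFileMarker)
        return comment;
    if (character == '-') {
        comment += '-';
        ADVANCE_TO(CommentEndState);
    }
    comment += "--";
    comment += character;
    ADVANCE_TO(CommentState);

#undef ADVANCE_TO
}
```

Each state reads as a list of early exits followed by the default action, which is the shape the new hunk uses throughout.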
     
    1562 1410{
    1563 1411    if (tagName == textareaTag || tagName == titleTag)
    1564         setState(HTMLTokenizer::RCDATAState);
     1412        m_state = RCDATAState;
    1565 1413    else if (tagName == plaintextTag)
    1566         setState(HTMLTokenizer::PLAINTEXTState);
     1414        m_state = PLAINTEXTState;
    1567 1415    else if (tagName == scriptTag)
    1568         setState(HTMLTokenizer::ScriptDataState);
     1416        m_state = ScriptDataState;
    1569 1417    else if (tagName == styleTag
    1570 1418        || tagName == iframeTag
     
    1573 1421        || tagName == noframesTag
    1574 1422        || (tagName == noscriptTag && m_options.scriptEnabled))
    1575         setState(HTMLTokenizer::RAWTEXTState);
    1576 }
    1577 
    1578 inline bool HTMLTokenizer::temporaryBufferIs(const String& expectedString)
     1423        m_state = RAWTEXTState;
     1424}
     1425
     1426inline void HTMLTokenizer::appendToTemporaryBuffer(UChar character)
     1427{
     1428    ASSERT(isASCII(character));
     1429    m_temporaryBuffer.append(character);
     1430}
     1431
     1432inline bool HTMLTokenizer::temporaryBufferIs(const char* expectedString)
    1579 1433{
    1580 1434    return vectorEqualsString(m_temporaryBuffer, expectedString);
    1581 1435}
    1582 1436
    1583 inline void HTMLTokenizer::addToPossibleEndTag(LChar cc)
    1584 {
    1585     ASSERT(isEndTagBufferingState(m_state));
    1586     m_bufferedEndTagName.append(cc);
    1587 }
    1588 
    1589 inline bool HTMLTokenizer::isAppropriateEndTag()
     1437inline void HTMLTokenizer::appendToPossibleEndTag(UChar character)
     1438{
     1439    ASSERT(isASCII(character));
     1440    m_bufferedEndTagName.append(character);
     1441}
     1442
     1443inline bool HTMLTokenizer::isAppropriateEndTag() const
    1590 1444{
    1591 1445    if (m_bufferedEndTagName.size() != m_appropriateEndTagName.size())
    1592 1446        return false;
    1593 1447
    1594     size_t numCharacters = m_bufferedEndTagName.size();
    1595 
    1596     for (size_t i = 0; i < numCharacters; i++) {
     1448    unsigned size = m_bufferedEndTagName.size();
     1449
     1450    for (unsigned i = 0; i < size; i++) {
    1597 1451        if (m_bufferedEndTagName[i] != m_appropriateEndTagName[i])
    1598 1452            return false;
     
    1604 1458inline void HTMLTokenizer::parseError()
    1605 1459{
    1606     notImplemented();
    1607 }
    1608 
    1609 }
     1460}
     1461
     1462}
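The `updateStateFor` hunk above maps a tag name to the tokenizer state that should follow it, now by assigning `m_state` directly. As a rough standalone illustration of that mapping, assuming plain `std::string` tag names in place of WebKit's tag-name constants, the logic can be sketched as below. `SketchTokenizer` and the `State` enum are hypothetical names for this example. The two tag names elided between `iframeTag` and `noframesTag` in the hunk are deliberately not filled in.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the tokenizer's state enum; only the states
// touched by updateStateFor are listed.
enum class State { Data, RCDATA, PLAINTEXT, ScriptData, RAWTEXT };

struct SketchTokenizer {
    State state = State::Data;
    bool scriptEnabled = false; // stands in for m_options.scriptEnabled

    // Mirrors the visible branches of the hunk. Tags that match no branch
    // leave the state untouched.
    void updateStateFor(const std::string& tagName)
    {
        if (tagName == "textarea" || tagName == "title")
            state = State::RCDATA;
        else if (tagName == "plaintext")
            state = State::PLAINTEXT;
        else if (tagName == "script")
            state = State::ScriptData;
        else if (tagName == "style"
            || tagName == "iframe"
            // (the hunk elides two further tag names here)
            || tagName == "noframes"
            || (tagName == "noscript" && scriptEnabled))
            state = State::RAWTEXT;
    }
};
```

Note that `noscript` switches to raw text only when scripting is enabled; with scripting disabled its contents are tokenized as ordinary markup.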
  • trunk/Source/WebCore/html/parser/HTMLTokenizer.h

    r178173 r178265  
    1 1/*
    2  * Copyright (C) 2008 Apple Inc. All Rights Reserved.
     2 * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
    3 3 * Copyright (C) 2010 Google, Inc. All Rights Reserved.
    4 4 *
     
    31 31#include "HTMLToken.h"
    32 32#include "InputStreamPreprocessor.h"
    33 #include "SegmentedString.h"
    34 33
    35 34namespace WebCore {
    36 35
     36class SegmentedString;
     37
    37 38class HTMLTokenizer {
    38     WTF_MAKE_NONCOPYABLE(HTMLTokenizer);
    39     WTF_MAKE_FAST_ALLOCATED;
    40 39public:
    41     explicit HTMLTokenizer(const HTMLParserOptions&);
    42     ~HTMLTokenizer();
    43 
    44     void reset();
    45 
     40    explicit HTMLTokenizer(const HTMLParserOptions& = HTMLParserOptions());
     41
     42    // If we can't parse a whole token, this returns null.
     43    class TokenPtr;
     44    TokenPtr nextToken(SegmentedString&);
     45
     46    // Returns a copy of any characters buffered internally by the tokenizer.
     47    // The tokenizer buffers characters when searching for the </script> token that terminates a script element.
     48    String bufferedCharacters() const;
     49    size_t numberOfBufferedCharacters() const;
     50
     51    // Updates the tokenizer's state according to the given tag name. This is an approximation of how the tree
     52    // builder would update the tokenizer's state. This method is useful for approximating HTML tokenization.
     53    // To get exactly the correct tokenization, you need the real tree builder.
     54    //
     55    // The main failures in the approximation are as follows:
     56    //
     57    //  * The first set of character tokens emitted for a <pre> element might contain an extra leading newline.
     58    //  * The replacement of U+0000 with U+FFFD will not be sensitive to the tree builder's insertion mode.
     59    //  * CDATA sections in foreign content will be tokenized as bogus comments instead of as character tokens.
     60    //
     61    // This approximation is also the algorithm called for when parsing an HTML fragment.
     62    // https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
     63    void updateStateFor(const AtomicString& tagName);
     64
     65    void setForceNullCharacterReplacement(bool);
     66
     67    bool shouldAllowCDATA() const;
     68    void setShouldAllowCDATA(bool);
     69
     70    bool isInDataState() const;
     71
     72    void setDataState();
     73    void setPLAINTEXTState();
     74    void setRAWTEXTState();
     75    void setRCDATAState();
     76    void setScriptDataState();
     77
     78    bool neverSkipNullCharacters() const;
     79
     80private:
    4681    enum State {
    4782        DataState,
     
    89124        SelfClosingStartTagState,
    90125        BogusCommentState,
    91         // The ContinueBogusCommentState is not in the HTML5 spec, but we use
    92         // it internally to keep track of whether we've started the bogus
    93         // comment token yet.
    94         ContinueBogusCommentState,
     126        ContinueBogusCommentState, // Not in the HTML spec, used internally to track whether we started the bogus comment token.
    95127        MarkupDeclarationOpenState,
    96128        CommentStartState,
     
    122154    };
    123155
    124     // This function returns true if it emits a token. Otherwise, callers
    125     // must provide the same (in progress) token on the next call (unless
    126     // they call reset() first).
    127     bool nextToken(SegmentedString&, HTMLToken&);
    128 
    129     // Returns a copy of any characters buffered internally by the tokenizer.
    130     // The tokenizer buffers characters when searching for the </script> token
    131     // that terminates a script element.
    132     String bufferedCharacters() const;
    133 
    134     size_t numberOfBufferedCharacters() const
    135     {
    136         // Notice that we add 2 to the length of the m_temporaryBuffer to
    137         // account for the "</" characters, which are effecitvely buffered in
    138         // the tokenizer's state machine.
    139         return m_temporaryBuffer.size() ? m_temporaryBuffer.size() + 2 : 0;
    140     }
    141 
    142     // Updates the tokenizer's state according to the given tag name. This is
    143     // an approximation of how the tree builder would update the tokenizer's
    144     // state. This method is useful for approximating HTML tokenization. To
    145     // get exactly the correct tokenization, you need the real tree builder.
    146     //
    147     // The main failures in the approximation are as follows:
    148     //
    149     //  * The first set of character tokens emitted for a <pre> element might
    150     //    contain an extra leading newline.
    151     //  * The replacement of U+0000 with U+FFFD will not be sensitive to the
    152     //    tree builder's insertion mode.
    153     //  * CDATA sections in foreign content will be tokenized as bogus comments
    154     //    instead of as character tokens.
    155     //
    156     void updateStateFor(const AtomicString& tagName);
    157 
    158     bool forceNullCharacterReplacement() const { return m_forceNullCharacterReplacement; }
    159     void setForceNullCharacterReplacement(bool value) { m_forceNullCharacterReplacement = value; }
    160 
    161     bool shouldAllowCDATA() const { return m_shouldAllowCDATA; }
    162     void setShouldAllowCDATA(bool value) { m_shouldAllowCDATA = value; }
    163 
    164     State state() const { return m_state; }
    165     void setState(State state) { m_state = state; }
    166 
    167     inline bool shouldSkipNullCharacters() const
    168     {
    169         return !m_forceNullCharacterReplacement
    170             && (m_state == HTMLTokenizer::DataState
    171                 || m_state == HTMLTokenizer::RCDATAState
    172                 || m_state == HTMLTokenizer::RAWTEXTState);
    173     }
    174 
    175 private:
    176     inline bool processEntity(SegmentedString&);
    177 
    178     inline void parseError();
    179 
    180     void bufferASCIICharacter(UChar character)
    181     {
    182         ASSERT(character != kEndOfFileMarker);
    183         ASSERT(isASCII(character));
    184         m_token->appendToCharacter(static_cast<LChar>(character));
    185     }
    186 
    187     void bufferCharacter(UChar character)
    188     {
    189         ASSERT(character != kEndOfFileMarker);
    190         m_token->appendToCharacter(character);
    191     }
    192     void bufferCharacter(char) = delete;
    193     void bufferCharacter(LChar) = delete;
    194 
    195     inline bool emitAndResumeIn(SegmentedString& source, State state)
    196     {
    197         saveEndTagNameIfNeeded();
    198         m_state = state;
    199         source.advanceAndUpdateLineNumber();
    200         return true;
    201     }
    202    
    203     inline bool emitAndReconsumeIn(SegmentedString&, State state)
    204     {
    205         saveEndTagNameIfNeeded();
    206         m_state = state;
    207         return true;
    208     }
    209 
    210     inline bool emitEndOfFile(SegmentedString& source)
    211     {
    212         if (haveBufferedCharacterToken())
    213             return true;
    214         m_state = HTMLTokenizer::DataState;
    215         source.advanceAndUpdateLineNumber();
    216         m_token->clear();
    217         m_token->makeEndOfFile();
    218         return true;
    219     }
    220 
    221     inline bool flushEmitAndResumeIn(SegmentedString&, State);
    222 
    223     // Return whether we need to emit a character token before dealing with
    224     // the buffered end tag.
    225     inline bool flushBufferedEndTag(SegmentedString&);
    226     inline bool temporaryBufferIs(const String&);
    227 
    228     // Sometimes we speculatively consume input characters and we don't
    229     // know whether they represent end tags or RCDATA, etc. These
    230     // functions help manage these state.
    231     inline void addToPossibleEndTag(LChar cc);
    232 
    233     inline void saveEndTagNameIfNeeded()
    234     {
    235         ASSERT(m_token->type() != HTMLToken::Uninitialized);
    236         if (m_token->type() == HTMLToken::StartTag)
    237             m_appropriateEndTagName = m_token->name();
    238     }
    239     inline bool isAppropriateEndTag();
    240 
    241 
    242     inline bool haveBufferedCharacterToken()
    243     {
    244         return m_token->type() == HTMLToken::Character;
    245     }
    246 
    247     State m_state;
    248     bool m_forceNullCharacterReplacement;
    249     bool m_shouldAllowCDATA;
    250 
    251     // m_token is owned by the caller. If nextToken is not on the stack,
    252     // this member might be pointing to unallocated memory.
    253     HTMLToken* m_token;
    254 
    255     // http://www.whatwg.org/specs/web-apps/current-work/#additional-allowed-character
    256     UChar m_additionalAllowedCharacter;
    257 
    258     // http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream
    259     InputStreamPreprocessor<HTMLTokenizer> m_inputStreamPreprocessor;
     156    bool processToken(SegmentedString&);
     157    bool processEntity(SegmentedString&);
     158
     159    void parseError();
     160
     161    void bufferASCIICharacter(UChar);
     162    void bufferCharacter(UChar);
     163
     164    bool emitAndResumeInDataState(SegmentedString&);
     165    bool emitAndReconsumeInDataState();
     166    bool emitEndOfFile(SegmentedString&);
     167
     168    // Return true if we will emit a character token before dealing with the buffered end tag.
     169    void flushBufferedEndTag();
     170    bool commitToPartialEndTag(SegmentedString&, UChar, State);
     171    bool commitToCompleteEndTag(SegmentedString&);
     172
     173    void appendToTemporaryBuffer(UChar);
     174    bool temporaryBufferIs(const char*);
     175
     176    // Sometimes we speculatively consume input characters and we don't know whether they represent
     177    // end tags or RCDATA, etc. These functions help manage this state.
     178    bool inEndTagBufferingState() const;
     179    void appendToPossibleEndTag(UChar);
     180    void saveEndTagNameIfNeeded();
     181    bool isAppropriateEndTag() const;
     182
     183    bool haveBufferedCharacterToken() const;
     184
     185    static bool isNullCharacterSkippingState(State);
     186
     187    State m_state { DataState };
     188    bool m_forceNullCharacterReplacement { false };
     189    bool m_shouldAllowCDATA { false };
     190
     191    mutable HTMLToken m_token;
     192
     193    // https://html.spec.whatwg.org/#additional-allowed-character
     194    UChar m_additionalAllowedCharacter { 0 };
     195
     196    // https://html.spec.whatwg.org/#preprocessing-the-input-stream
     197    InputStreamPreprocessor<HTMLTokenizer> m_preprocessor;
    260198
    261199    Vector<UChar, 32> m_appropriateEndTagName;
    262200
    263     // http://www.whatwg.org/specs/web-apps/current-work/#temporary-buffer
     201    // https://html.spec.whatwg.org/#temporary-buffer
    264202    Vector<LChar, 32> m_temporaryBuffer;
    265203
    266     // We occationally want to emit both a character token and an end tag
     204    // We occasionally want to emit both a character token and an end tag
    267205    // token (e.g., when lexing script). We buffer the name of the end tag
    268206    // token here so we remember it next time we re-enter the tokenizer.
    269207    Vector<LChar, 32> m_bufferedEndTagName;
    270208
    271     HTMLParserOptions m_options;
     209    const HTMLParserOptions m_options;
    272210};
    273211
     212class HTMLTokenizer::TokenPtr {
     213public:
     214    TokenPtr();
     215    ~TokenPtr();
     216
     217    TokenPtr(TokenPtr&&);
     218    TokenPtr& operator=(TokenPtr&&) = delete;
     219
     220    void clear();
     221
     222    operator bool() const;
     223
     224    HTMLToken& operator*() const;
     225    HTMLToken* operator->() const;
     226
     227private:
     228    friend class HTMLTokenizer;
     229    explicit TokenPtr(HTMLToken*);
     230
     231    HTMLToken* m_token { nullptr };
     232};
     233
     234inline HTMLTokenizer::TokenPtr::TokenPtr()
     235{
     236}
     237
     238inline HTMLTokenizer::TokenPtr::TokenPtr(HTMLToken* token)
     239    : m_token(token)
     240{
     241}
     242
     243inline HTMLTokenizer::TokenPtr::~TokenPtr()
     244{
     245    if (m_token)
     246        m_token->clear();
     247}
     248
     249inline HTMLTokenizer::TokenPtr::TokenPtr(TokenPtr&& other)
     250    : m_token(other.m_token)
     251{
     252    other.m_token = nullptr;
     253}
     254
     255inline void HTMLTokenizer::TokenPtr::clear()
     256{
     257    if (m_token) {
     258        m_token->clear();
     259        m_token = nullptr;
     260    }
     261}
     262
     263inline HTMLTokenizer::TokenPtr::operator bool() const
     264{
     265    return m_token;
     266}
     267
     268inline HTMLToken& HTMLTokenizer::TokenPtr::operator*() const
     269{
     270    ASSERT(m_token);
     271    return *m_token;
     272}
     273
     274inline HTMLToken* HTMLTokenizer::TokenPtr::operator->() const
     275{
     276    ASSERT(m_token);
     277    return m_token;
     278}
     279
     280inline HTMLTokenizer::TokenPtr HTMLTokenizer::nextToken(SegmentedString& source)
     281{
     282    return TokenPtr(processToken(source) ? &m_token : nullptr);
     283}
     284
     285inline size_t HTMLTokenizer::numberOfBufferedCharacters() const
     286{
     287    // Notice that we add 2 to the length of the m_temporaryBuffer to
     288    // account for the "</" characters, which are effectively buffered in
     289    // the tokenizer's state machine.
     290    return m_temporaryBuffer.size() ? m_temporaryBuffer.size() + 2 : 0;
     291}
     292
     293inline void HTMLTokenizer::setForceNullCharacterReplacement(bool value)
     294{
     295    m_forceNullCharacterReplacement = value;
     296}
     297
     298inline bool HTMLTokenizer::shouldAllowCDATA() const
     299{
     300    return m_shouldAllowCDATA;
     301}
     302
     303inline void HTMLTokenizer::setShouldAllowCDATA(bool value)
     304{
     305    m_shouldAllowCDATA = value;
     306}
     307
     308inline bool HTMLTokenizer::isInDataState() const
     309{
     310    return m_state == DataState;
     311}
     312
     313inline void HTMLTokenizer::setDataState()
     314{
     315    m_state = DataState;
     316}
     317
     318inline void HTMLTokenizer::setPLAINTEXTState()
     319{
     320    m_state = PLAINTEXTState;
     321}
     322
     323inline void HTMLTokenizer::setRAWTEXTState()
     324{
     325    m_state = RAWTEXTState;
     326}
     327
     328inline void HTMLTokenizer::setRCDATAState()
     329{
     330    m_state = RCDATAState;
     331}
     332
     333inline void HTMLTokenizer::setScriptDataState()
     334{
     335    m_state = ScriptDataState;
     336}
     337
     338inline bool HTMLTokenizer::isNullCharacterSkippingState(State state)
     339{
     340    return state == DataState || state == RCDATAState || state == RAWTEXTState;
     341}
     342
     343inline bool HTMLTokenizer::neverSkipNullCharacters() const
     344{
     345    return m_forceNullCharacterReplacement;
     346}
     347
    274348}
    275349
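Note on the TokenPtr change above: nextToken now returns a move-only handle to the tokenizer's own token, and the handle clears that token when it is destroyed or when clear() is called. The following is a minimal standalone sketch of that ownership pattern, not the WebKit code itself. Token, Tokenizer, the std::string input, and storedName() are simplified stand-ins invented for illustration; only the shape of TokenPtr follows the header in this changeset.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for HTMLToken: a name plus a clear() method.
struct Token {
    std::string name;
    void clear() { name.clear(); }
};

// Hypothetical stand-in for HTMLTokenizer. It owns one token and hands out
// a move-only handle that clears the token when the handle goes away.
class Tokenizer {
public:
    class TokenPtr {
    public:
        TokenPtr() = default;
        TokenPtr(TokenPtr&& other)
            : m_token(other.m_token)
        {
            other.m_token = nullptr; // The moved-from handle no longer clears anything.
        }
        TokenPtr& operator=(TokenPtr&&) = delete;
        ~TokenPtr()
        {
            if (m_token)
                m_token->clear();
        }

        // Clears the token early and detaches the handle from it.
        void clear()
        {
            if (m_token) {
                m_token->clear();
                m_token = nullptr;
            }
        }

        explicit operator bool() const { return m_token != nullptr; }
        Token& operator*() const { assert(m_token); return *m_token; }
        Token* operator->() const { assert(m_token); return m_token; }

    private:
        friend class Tokenizer;
        explicit TokenPtr(Token* token)
            : m_token(token)
        {
        }

        Token* m_token { nullptr };
    };

    // Returns a null handle when no complete token is available
    // (modeled here as empty input, which is an assumption of this sketch).
    TokenPtr nextToken(const std::string& input)
    {
        if (input.empty())
            return TokenPtr();
        m_token.name = input;
        return TokenPtr(&m_token);
    }

    // Test-only accessor for this sketch; the real class has no such method.
    const std::string& storedName() const { return m_token.name; }

private:
    Token m_token;
};
```

With this shape a caller can write `while (auto token = tokenizer.nextToken(source)) { ... }` and never has to reset the token by hand, which is what lets the changeset drop the caller-owned HTMLToken and the reset() function.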
  • trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp

    r178173 r178265  
    696696        processFakePEndTagIfPInButtonScope();
    697697        m_tree.insertHTMLElement(&token);
    698         m_parser.tokenizer().setState(HTMLTokenizer::PLAINTEXTState);
     698        m_parser.tokenizer().setPLAINTEXTState();
    699699        return;
    700700    }
     
    800800        m_tree.insertHTMLElement(&token);
    801801        m_shouldSkipLeadingNewline = true;
    802         m_parser.tokenizer().setState(HTMLTokenizer::RCDATAState);
     802        m_parser.tokenizer().setRCDATAState();
    803803        m_originalInsertionMode = m_insertionMode;
    804804        m_framesetOk = false;
     
    21382138            // quirks are enabled. We must set the tokenizer's state to
    21392139            // DataState explicitly if the tokenizer didn't have a chance to.
    2140             ASSERT(m_parser.tokenizer().state() == HTMLTokenizer::DataState || m_options.usePreHTML5ParserQuirks);
    2141             m_parser.tokenizer().setState(HTMLTokenizer::DataState);
     2140            ASSERT(m_parser.tokenizer().isInDataState() || m_options.usePreHTML5ParserQuirks);
     2141            m_parser.tokenizer().setDataState();
    21422142            return;
    21432143        }
     
    27402740    ASSERT(token.type() == HTMLToken::StartTag);
    27412741    m_tree.insertHTMLElement(&token);
    2742     m_parser.tokenizer().setState(HTMLTokenizer::RCDATAState);
     2742    m_parser.tokenizer().setRCDATAState();
    27432743    m_originalInsertionMode = m_insertionMode;
    27442744    m_insertionMode = InsertionMode::Text;
     
    27492749    ASSERT(token.type() == HTMLToken::StartTag);
    27502750    m_tree.insertHTMLElement(&token);
    2751     m_parser.tokenizer().setState(HTMLTokenizer::RAWTEXTState);
     2751    m_parser.tokenizer().setRAWTEXTState();
    27522752    m_originalInsertionMode = m_insertionMode;
    27532753    m_insertionMode = InsertionMode::Text;
     
    27582758    ASSERT(token.type() == HTMLToken::StartTag);
    27592759    m_tree.insertScriptElement(&token);
    2760     m_parser.tokenizer().setState(HTMLTokenizer::ScriptDataState);
     2760    m_parser.tokenizer().setScriptDataState();
    27612761    m_originalInsertionMode = m_insertionMode;
    27622762
  • trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h

    r178173 r178265  
    4141    WTF_MAKE_NONCOPYABLE(InputStreamPreprocessor);
    4242public:
    43     InputStreamPreprocessor(Tokenizer* tokenizer)
     43    explicit InputStreamPreprocessor(Tokenizer& tokenizer)
    4444        : m_tokenizer(tokenizer)
    4545    {
     
    5252    // The only way we can fail to peek is if there are no more
    5353    // characters in |source| (after collapsing \r\n, etc).
    54     ALWAYS_INLINE bool peek(SegmentedString& source)
     54    ALWAYS_INLINE bool peek(SegmentedString& source, bool skipNullCharacters = false)
    5555    {
     56        if (source.isEmpty())
     57            return false;
     58
    5659        m_nextInputCharacter = source.currentChar();
    5760
     
    6568            return true;
    6669        }
    67         return processNextInputCharacter(source);
     70        return processNextInputCharacter(source, skipNullCharacters);
    6871    }
    6972
    7073    // Returns whether there are more characters in |source| after advancing.
    71     ALWAYS_INLINE bool advance(SegmentedString& source)
     74    ALWAYS_INLINE bool advance(SegmentedString& source, bool skipNullCharacters = false)
    7275    {
    7376        source.advanceAndUpdateLineNumber();
    74         if (source.isEmpty())
    75             return false;
    76         return peek(source);
     77        return peek(source, skipNullCharacters);
    7778    }
    7879
     
    8687
    8788private:
    88     bool processNextInputCharacter(SegmentedString& source)
     89    bool processNextInputCharacter(SegmentedString& source, bool skipNullCharacters)
    8990    {
    9091    ProcessAgain:
     
    108109            // that filtering breaks surrogate pair handling and causes us not to match Minefield.
    109110            if (m_nextInputCharacter == '\0' && !shouldTreatNullAsEndOfFileMarker(source)) {
    110                 if (m_tokenizer->shouldSkipNullCharacters()) {
     111                if (skipNullCharacters && !m_tokenizer.neverSkipNullCharacters()) {
    111112                    source.advancePastNonNewline();
    112113                    if (source.isEmpty())
     
    126127    }
    127128
    128     Tokenizer* m_tokenizer;
     129    Tokenizer& m_tokenizer;
    129130
    130131    // http://www.whatwg.org/specs/web-apps/current-work/#next-input-character
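Note on the InputStreamPreprocessor change above: the decision to skip a null character now comes from two inputs, the skipNullCharacters argument passed by the caller for the current state and the tokenizer's neverSkipNullCharacters() override. The sketch below shows only that decision on a plain string. preprocessNulls is a hypothetical helper, not a WebKit function, and the replacement of unskipped nulls with U+FFFD is an assumption taken from the updateStateFor comment in HTMLTokenizer.h rather than from the lines shown in this diff.

```cpp
#include <string>

// Hypothetical helper, not WebKit API. Drops U+0000 when the caller asks for
// skipping and the tokenizer does not forbid it; otherwise substitutes U+FFFD
// (the substitution is an assumption of this sketch).
inline std::u16string preprocessNulls(const std::u16string& input, bool skipNullCharacters, bool neverSkipNullCharacters)
{
    const char16_t replacementCharacter = char16_t(0xFFFD);
    std::u16string output;
    for (char16_t character : input) {
        if (character != u'\0') {
            output.push_back(character);
            continue;
        }
        // Same condition as the new line 111 in the diff above.
        if (skipNullCharacters && !neverSkipNullCharacters)
            continue;
        output.push_back(replacementCharacter);
    }
    return output;
}
```

Passing the flag as an argument means the preprocessor no longer has to ask the tokenizer which state it is in, so state() and shouldSkipNullCharacters() can leave the tokenizer's public interface.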
  • trunk/Source/WebCore/html/parser/TextDocumentParser.cpp

    r178173 r178265  
    6262    // Although Text Documents expose a "pre" element in their DOM, they
    6363    // act like a <plaintext> tag, so we have to force plaintext mode.
    64     tokenizer().setState(HTMLTokenizer::PLAINTEXTState);
     64    tokenizer().setPLAINTEXTState();
    6565
    6666    m_haveInsertedFakePreElement = true;
  • trunk/Source/WebCore/html/parser/XSSAuditor.cpp

    r178173 r178265  
    567567{
    568568    // Grab a fixed number of characters equal to the length of the token's name plus one (to account for the "<").
    569     return fullyDecodeString(request.sourceTracker.sourceForToken(request.token), m_encoding).substring(0, request.token.name().size() + 1);
     569    return fullyDecodeString(request.sourceTracker.source(request.token), m_encoding).substring(0, request.token.name().size() + 1);
    570570}
    571571
     
    576576    // unquoted input of |name=value |, the snippet is |name=value|.
    577577    // FIXME: We should grab one character before the name also.
    578     unsigned start = attribute.nameRange.start;
    579     unsigned end = attribute.valueRange.end;
    580     String decodedSnippet = fullyDecodeString(request.sourceTracker.sourceForToken(request.token).substring(start, end - start), m_encoding);
     578    unsigned start = attribute.startOffset;
     579    unsigned end = attribute.endOffset;
     580    String decodedSnippet = fullyDecodeString(request.sourceTracker.source(request.token, start, end), m_encoding);
    581581    decodedSnippet.truncate(kMaximumFragmentLengthTarget);
    582582    if (treatment == SrcLikeAttribute) {
     
    631631String XSSAuditor::decodedSnippetForJavaScript(const FilterTokenRequest& request)
    632632{
    633     String string = request.sourceTracker.sourceForToken(request.token);
     633    String string = request.sourceTracker.source(request.token);
    634634    size_t startPosition = 0;
    635635    size_t endPosition = string.length();
     
    738738}
    739739
    740 bool XSSAuditor::isSafeToSendToAnotherThread() const
    741 {
    742     return m_documentURL.isSafeToSendToAnotherThread()
    743         && m_decodedURL.isSafeToSendToAnotherThread()
    744         && m_decodedHTTPBody.isSafeToSendToAnotherThread()
    745         && m_cachedDecodedSnippet.isSafeToSendToAnotherThread();
    746 }
    747 
    748740} // namespace WebCore
  • trunk/Source/WebCore/html/parser/XSSAuditor.h

    r178173 r178265  
    6262
    6363    std::unique_ptr<XSSInfo> filterToken(const FilterTokenRequest&);
    64     bool isSafeToSendToAnotherThread() const;
    6564
    6665private:
  • trunk/Source/WebCore/html/track/WebVTTTokenizer.cpp

    r178173 r178265  
    11/*
    22 * Copyright (C) 2011, 2013 Google Inc.  All rights reserved.
    3  * Copyright (C) 2014 Apple Inc.  All rights reserved.
     3 * Copyright (C) 2014-2015 Apple Inc.  All rights reserved.
    44 *
    55 * Redistribution and use in source and binary forms, with or without
     
    4242namespace WebCore {
    4343
    44 #define WEBVTT_BEGIN_STATE(stateName) case stateName: stateName:
    45 #define WEBVTT_ADVANCE_TO(stateName)                               \
    46     do {                                                           \
    47         state = stateName;                                         \
    48         ASSERT(!m_input.isEmpty());                                \
    49         m_inputStreamPreprocessor.advance(m_input);                \
    50         cc = m_inputStreamPreprocessor.nextInputCharacter();       \
    51         goto stateName;                                            \
     44#define WEBVTT_ADVANCE_TO(stateName)                        \
     45    do {                                                    \
     46        ASSERT(!m_input.isEmpty());                         \
     47        m_preprocessor.advance(m_input);                    \
     48        character = m_preprocessor.nextInputCharacter();    \
     49        goto stateName;                                     \
    5250    } while (false)
    53 
    5451   
    55 template<unsigned charactersCount>
    56 ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount])
     52template<unsigned charactersCount> ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount])
    5753{
    5854    return WTF::equal(s, reinterpret_cast<const LChar*>(characters), charactersCount - 1);
     
    8076WebVTTTokenizer::WebVTTTokenizer(const String& input)
    8177    : m_input(input)
    82     , m_inputStreamPreprocessor(this)
     78    , m_preprocessor(*this)
    8379{
    8480    // Append an EOF marker and close the input "stream".
     
    9086bool WebVTTTokenizer::nextToken(WebVTTToken& token)
    9187{
    92     if (m_input.isEmpty() || !m_inputStreamPreprocessor.peek(m_input))
     88    if (m_input.isEmpty() || !m_preprocessor.peek(m_input))
    9389        return false;
    9490
    95     UChar cc = m_inputStreamPreprocessor.nextInputCharacter();
    96     if (cc == kEndOfFileMarker) {
    97         m_inputStreamPreprocessor.advance(m_input);
     91    UChar character = m_preprocessor.nextInputCharacter();
     92    if (character == kEndOfFileMarker) {
     93        m_preprocessor.advance(m_input);
    9894        return false;
    9995    }
     
    10399    StringBuilder classes;
    104100
    105     enum {
    106         DataState,
    107         EscapeState,
    108         TagState,
    109         StartTagState,
    110         StartTagClassState,
    111         StartTagAnnotationState,
    112         EndTagState,
    113         TimestampTagState,
    114     } state = DataState;
    115 
    116     // 4.8.10.13.4 WebVTT cue text tokenizer
    117     switch (state) {
    118     WEBVTT_BEGIN_STATE(DataState) {
    119         if (cc == '&') {
    120             buffer.append(static_cast<LChar>(cc));
     101// 4.8.10.13.4 WebVTT cue text tokenizer
     102DataState:
     103    if (character == '&') {
     104        buffer.append('&');
     105        WEBVTT_ADVANCE_TO(EscapeState);
     106    } else if (character == '<') {
     107        if (result.isEmpty())
     108            WEBVTT_ADVANCE_TO(TagState);
     109        else {
     110            // We don't want to advance input or perform a state transition - just return a (new) token.
     111            // (On the next call to nextToken we will see '<' again, but take the other branch in this if instead.)
     112            return emitToken(token, WebVTTToken::StringToken(result.toString()));
     113        }
     114    } else if (character == kEndOfFileMarker)
     115        return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
     116    else {
     117        result.append(character);
     118        WEBVTT_ADVANCE_TO(DataState);
     119    }
     120
     121EscapeState:
     122    if (character == ';') {
     123        if (equalLiteral(buffer, "&amp"))
     124            result.append('&');
     125        else if (equalLiteral(buffer, "&lt"))
     126            result.append('<');
     127        else if (equalLiteral(buffer, "&gt"))
     128            result.append('>');
     129        else if (equalLiteral(buffer, "&lrm"))
     130            result.append(leftToRightMark);
     131        else if (equalLiteral(buffer, "&rlm"))
     132            result.append(rightToLeftMark);
     133        else if (equalLiteral(buffer, "&nbsp"))
     134            result.append(noBreakSpace);
     135        else {
     136            buffer.append(character);
     137            result.append(buffer);
     138        }
     139        buffer.clear();
     140        WEBVTT_ADVANCE_TO(DataState);
     141    } else if (isASCIIAlphanumeric(character)) {
     142        buffer.append(character);
     143        WEBVTT_ADVANCE_TO(EscapeState);
     144    } else if (character == '<') {
     145        result.append(buffer);
     146        return emitToken(token, WebVTTToken::StringToken(result.toString()));
     147    } else if (character == kEndOfFileMarker) {
     148        result.append(buffer);
     149        return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
     150    } else {
     151        result.append(buffer);
     152        buffer.clear();
     153
     154        if (character == '&') {
     155            buffer.append('&');
    121156            WEBVTT_ADVANCE_TO(EscapeState);
    122         } else if (cc == '<') {
    123             if (result.isEmpty())
    124                 WEBVTT_ADVANCE_TO(TagState);
    125             else {
    126                 // We don't want to advance input or perform a state transition - just return a (new) token.
    127                 // (On the next call to nextToken we will see '<' again, but take the other branch in this if instead.)
    128                 return emitToken(token, WebVTTToken::StringToken(result.toString()));
    129             }
    130         } else if (cc == kEndOfFileMarker)
    131             return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
    132         else {
    133             result.append(cc);
    134             WEBVTT_ADVANCE_TO(DataState);
    135157        }
    136     }
    137     END_STATE()
    138 
    139     WEBVTT_BEGIN_STATE(EscapeState) {
    140         if (cc == ';') {
    141             if (equalLiteral(buffer, "&amp"))
    142                 result.append('&');
    143             else if (equalLiteral(buffer, "&lt"))
    144                 result.append('<');
    145             else if (equalLiteral(buffer, "&gt"))
    146                 result.append('>');
    147             else if (equalLiteral(buffer, "&lrm"))
    148                 result.append(leftToRightMark);
    149             else if (equalLiteral(buffer, "&rlm"))
    150                 result.append(rightToLeftMark);
    151             else if (equalLiteral(buffer, "&nbsp"))
    152                 result.append(noBreakSpace);
    153             else {
    154                 buffer.append(static_cast<LChar>(cc));
    155                 result.append(buffer);
    156             }
    157             buffer.clear();
    158             WEBVTT_ADVANCE_TO(DataState);
    159         } else if (isASCIIAlphanumeric(cc)) {
    160             buffer.append(static_cast<LChar>(cc));
    161             WEBVTT_ADVANCE_TO(EscapeState);
    162         } else if (cc == '<') {
    163             result.append(buffer);
    164             return emitToken(token, WebVTTToken::StringToken(result.toString()));
    165         } else if (cc == kEndOfFileMarker) {
    166             result.append(buffer);
    167             return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
    168         } else {
    169             result.append(buffer);
    170             buffer.clear();
    171 
    172             if (cc == '&') {
    173                 buffer.append(static_cast<LChar>(cc));
    174                 WEBVTT_ADVANCE_TO(EscapeState);
    175             }
    176             result.append(cc);
    177             WEBVTT_ADVANCE_TO(DataState);
    178         }
    179     }
    180     END_STATE()
    181 
    182     WEBVTT_BEGIN_STATE(TagState) {
    183         if (isTokenizerWhitespace(cc)) {
    184             ASSERT(result.isEmpty());
    185             WEBVTT_ADVANCE_TO(StartTagAnnotationState);
    186         } else if (cc == '.') {
    187             ASSERT(result.isEmpty());
    188             WEBVTT_ADVANCE_TO(StartTagClassState);
    189         } else if (cc == '/') {
    190             WEBVTT_ADVANCE_TO(EndTagState);
    191         } else if (WTF::isASCIIDigit(cc)) {
    192             result.append(cc);
    193             WEBVTT_ADVANCE_TO(TimestampTagState);
    194         } else if (cc == '>' || cc == kEndOfFileMarker) {
    195             ASSERT(result.isEmpty());
    196             return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
    197         } else {
    198             result.append(cc);
    199             WEBVTT_ADVANCE_TO(StartTagState);
    200         }
    201     }
    202     END_STATE()
    203 
    204     WEBVTT_BEGIN_STATE(StartTagState) {
    205         if (isTokenizerWhitespace(cc))
    206             WEBVTT_ADVANCE_TO(StartTagAnnotationState);
    207         else if (cc == '.')
    208             WEBVTT_ADVANCE_TO(StartTagClassState);
    209         else if (cc == '>' || cc == kEndOfFileMarker)
    210             return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
    211         else {
    212             result.append(cc);
    213             WEBVTT_ADVANCE_TO(StartTagState);
    214         }
    215     }
    216     END_STATE()
    217 
    218     WEBVTT_BEGIN_STATE(StartTagClassState) {
    219         if (isTokenizerWhitespace(cc)) {
    220             addNewClass(classes, buffer);
    221             buffer.clear();
    222             WEBVTT_ADVANCE_TO(StartTagAnnotationState);
    223         } else if (cc == '.') {
    224             addNewClass(classes, buffer);
    225             buffer.clear();
    226             WEBVTT_ADVANCE_TO(StartTagClassState);
    227         } else if (cc == '>' || cc == kEndOfFileMarker) {
    228             addNewClass(classes, buffer);
    229             buffer.clear();
    230             return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString()));
    231         } else {
    232             buffer.append(cc);
    233             WEBVTT_ADVANCE_TO(StartTagClassState);
    234         }
    235 
    236     }
    237     END_STATE()
    238 
    239     WEBVTT_BEGIN_STATE(StartTagAnnotationState) {
    240         if (cc == '>' || cc == kEndOfFileMarker) {
    241             return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString(), buffer.toAtomicString()));
    242         }
    243         buffer.append(cc);
     158        result.append(character);
     159        WEBVTT_ADVANCE_TO(DataState);
     160    }
     161
     162TagState:
     163    if (isTokenizerWhitespace(character)) {
     164        ASSERT(result.isEmpty());
    244165        WEBVTT_ADVANCE_TO(StartTagAnnotationState);
    245     }
    246     END_STATE()
    247    
    248     WEBVTT_BEGIN_STATE(EndTagState) {
    249         if (cc == '>' || cc == kEndOfFileMarker)
    250             return advanceAndEmitToken(m_input, token, WebVTTToken::EndTag(result.toString()));
    251         result.append(cc);
     166    } else if (character == '.') {
     167        ASSERT(result.isEmpty());
     168        WEBVTT_ADVANCE_TO(StartTagClassState);
     169    } else if (character == '/') {
    252170        WEBVTT_ADVANCE_TO(EndTagState);
    253     }
    254     END_STATE()
    255 
    256     WEBVTT_BEGIN_STATE(TimestampTagState) {
    257         if (cc == '>' || cc == kEndOfFileMarker)
    258             return advanceAndEmitToken(m_input, token, WebVTTToken::TimestampTag(result.toString()));
    259         result.append(cc);
     171    } else if (WTF::isASCIIDigit(character)) {
     172        result.append(character);
    260173        WEBVTT_ADVANCE_TO(TimestampTagState);
    261     }
    262     END_STATE()
    263 
    264     }
    265 
    266     ASSERT_NOT_REACHED();
    267     return false;
     174    } else if (character == '>' || character == kEndOfFileMarker) {
     175        ASSERT(result.isEmpty());
     176        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
     177    } else {
     178        result.append(character);
     179        WEBVTT_ADVANCE_TO(StartTagState);
     180    }
     181
     182StartTagState:
     183    if (isTokenizerWhitespace(character))
     184        WEBVTT_ADVANCE_TO(StartTagAnnotationState);
     185    else if (character == '.')
     186        WEBVTT_ADVANCE_TO(StartTagClassState);
     187    else if (character == '>' || character == kEndOfFileMarker)
     188        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
     189    else {
     190        result.append(character);
     191        WEBVTT_ADVANCE_TO(StartTagState);
     192    }
     193
     194StartTagClassState:
     195    if (isTokenizerWhitespace(character)) {
     196        addNewClass(classes, buffer);
     197        buffer.clear();
     198        WEBVTT_ADVANCE_TO(StartTagAnnotationState);
     199    } else if (character == '.') {
     200        addNewClass(classes, buffer);
     201        buffer.clear();
     202        WEBVTT_ADVANCE_TO(StartTagClassState);
     203    } else if (character == '>' || character == kEndOfFileMarker) {
     204        addNewClass(classes, buffer);
     205        buffer.clear();
     206        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString()));
     207    } else {
     208        buffer.append(character);
     209        WEBVTT_ADVANCE_TO(StartTagClassState);
     210    }
     211
     212StartTagAnnotationState:
     213    if (character == '>' || character == kEndOfFileMarker)
     214        return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString(), buffer.toAtomicString()));
     215    buffer.append(character);
     216    WEBVTT_ADVANCE_TO(StartTagAnnotationState);
     217
     218EndTagState:
     219    if (character == '>' || character == kEndOfFileMarker)
     220        return advanceAndEmitToken(m_input, token, WebVTTToken::EndTag(result.toString()));
     221    result.append(character);
     222    WEBVTT_ADVANCE_TO(EndTagState);
     223
     224TimestampTagState:
     225    if (character == '>' || character == kEndOfFileMarker)
     226        return advanceAndEmitToken(m_input, token, WebVTTToken::TimestampTag(result.toString()));
     227    result.append(character);
     228    WEBVTT_ADVANCE_TO(TimestampTagState);
    268229}
    269230
  • trunk/Source/WebCore/html/track/WebVTTTokenizer.h

    r178173 r178265  
    4141
    4242class WebVTTTokenizer {
    43     WTF_MAKE_NONCOPYABLE(WebVTTTokenizer);
    4443public:
    4544    explicit WebVTTTokenizer(const String&);
    46 
    4745    bool nextToken(WebVTTToken&);
    4846
    49     inline bool shouldSkipNullCharacters() const { return true; }
     47    static bool neverSkipNullCharacters() { return false; }
    5048
    5149private:
    5250    SegmentedString m_input;
    53 
    54     // http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream
    55     InputStreamPreprocessor<WebVTTTokenizer> m_inputStreamPreprocessor;
     51    InputStreamPreprocessor<WebVTTTokenizer> m_preprocessor;
    5652};
    5753
  • trunk/Source/WebCore/platform/text/SegmentedString.cpp

    r178173 r178265  
    2020#include "config.h"
    2121#include "SegmentedString.h"
     22
     23#include <wtf/text/TextPosition.h>
    2224
    2325namespace WebCore {
     
    4547}
    4648
    47 const SegmentedString& SegmentedString::operator=(const SegmentedString& other)
     49SegmentedString& SegmentedString::operator=(const SegmentedString& other)
    4850{
    4951    m_pushedChar1 = other.m_pushedChar1;
     
    131133}
    132134
    133 void SegmentedString::prepend(const SegmentedSubstring& s)
    134 {
    135     ASSERT(!escaped());
     135void SegmentedString::pushBack(const SegmentedSubstring& s)
     136{
     137    ASSERT(!m_pushedChar1);
    136138    ASSERT(!s.numberOfCharactersConsumed());
    137139    if (!s.m_length)
    138140        return;
    139141
    140     // FIXME: We're assuming that the prepend were originally consumed by
     142    // FIXME: We're assuming that the characters were originally consumed by
    141143    //        this SegmentedString.  We're also ASSERTing that s is a fresh
    142144    //        SegmentedSubstring.  These assumptions are sufficient for our
     
    167169{
    168170    ASSERT(!m_closed);
    169     ASSERT(!s.escaped());
     171    ASSERT(!s.m_pushedChar1);
    170172    append(s.m_currentString);
    171173    if (s.isComposite()) {
     
    178180}
    179181
    180 void SegmentedString::prepend(const SegmentedString& s)
    181 {
    182     ASSERT(!escaped());
    183     ASSERT(!s.escaped());
     182void SegmentedString::pushBack(const SegmentedString& s)
     183{
     184    ASSERT(!m_pushedChar1);
     185    ASSERT(!s.m_pushedChar1);
    184186    if (s.isComposite()) {
    185187        Deque<SegmentedSubstring>::const_reverse_iterator it = s.m_substrings.rbegin();
    186188        Deque<SegmentedSubstring>::const_reverse_iterator e = s.m_substrings.rend();
    187189        for (; it != e; ++it)
    188             prepend(*it);
    189     }
    190     prepend(s.m_currentString);
     190            pushBack(*it);
     191    }
     192    pushBack(s.m_currentString);
    191193    m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
    192194}
     
    229231}
    230232
    231 void SegmentedString::advance(unsigned count, UChar* consumedCharacters)
     233void SegmentedString::advancePastNonNewlines(unsigned count, UChar* consumedCharacters)
    232234{
    233235    ASSERT_WITH_SECURITY_IMPLICATION(count <= length());
    234236    for (unsigned i = 0; i < count; ++i) {
    235237        consumedCharacters[i] = currentChar();
    236         advance();
     238        advancePastNonNewline();
    237239    }
    238240}
     
    354356OrdinalNumber SegmentedString::currentColumn() const
    355357{
    356     int zeroBasedColumn = numberOfCharactersConsumed() - m_numberOfCharactersConsumedPriorToCurrentLine;
    357     return OrdinalNumber::fromZeroBasedInt(zeroBasedColumn);
     358    return OrdinalNumber::fromZeroBasedInt(numberOfCharactersConsumed() - m_numberOfCharactersConsumedPriorToCurrentLine);
    358359}
    359360
     
    364365}
    365366
    366 }
     367SegmentedString::AdvancePastResult SegmentedString::advancePastSlowCase(const char* literal, bool caseSensitive)
     368{
     369    unsigned length = strlen(literal);
     370    if (length > this->length())
     371        return NotEnoughCharacters;
     372    UChar* consumedCharacters;
     373    String consumedString = String::createUninitialized(length, consumedCharacters);
     374    advancePastNonNewlines(length, consumedCharacters);
     375    if (consumedString.startsWith(literal, caseSensitive))
     376        return DidMatch;
     377    pushBack(SegmentedString(consumedString));
     378    return DidNotMatch;
     379}
     380
     381}
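Reviewer note: the new advancePastSlowCase differs from the old lookAheadSlowCase in that a successful match leaves the characters consumed, and only a mismatch pushes them back. The sketch below illustrates that contract outside of WebKit. It is a simplified stand-in, not the WebKit implementation: SegmentedInput, take, and remaining are hypothetical names, std::string replaces WTF::String, and there is no case-insensitive variant or 8-bit fast path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Mirrors the three outcomes of SegmentedString::AdvancePastResult.
enum class AdvancePastResult { DidNotMatch, DidMatch, NotEnoughCharacters };

// Hypothetical stand-in for SegmentedString: input arrives as a queue of segments.
class SegmentedInput {
public:
    void append(const std::string& segment)
    {
        if (!segment.empty())
            m_segments.push_back(segment);
    }

    std::size_t length() const
    {
        std::size_t total = 0;
        for (const auto& segment : m_segments)
            total += segment.size();
        return total;
    }

    // Consumes `literal` if the input starts with it; otherwise leaves the input unchanged.
    AdvancePastResult advancePast(const std::string& literal)
    {
        if (literal.size() > length())
            return AdvancePastResult::NotEnoughCharacters;
        std::string consumed = take(literal.size());
        if (consumed == literal)
            return AdvancePastResult::DidMatch;
        // Mismatch: push the consumed characters back in front of the remaining input.
        m_segments.push_front(consumed);
        return AdvancePastResult::DidNotMatch;
    }

    // Concatenation of everything not yet consumed (for inspection only).
    std::string remaining() const
    {
        std::string out;
        for (const auto& segment : m_segments)
            out += segment;
        return out;
    }

private:
    // Removes and returns the first `count` characters, crossing segment boundaries.
    std::string take(std::size_t count)
    {
        std::string out;
        while (count) {
            std::string& front = m_segments.front();
            std::size_t n = std::min(count, front.size());
            out.append(front, 0, n);
            front.erase(0, n);
            count -= n;
            if (front.empty())
                m_segments.pop_front();
        }
        return out;
    }

    std::deque<std::string> m_segments;
};
```

As in the changeset, the length check comes first, so a literal longer than the available input reports NotEnoughCharacters without inspecting any characters.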
  • trunk/Source/WebCore/platform/text/SegmentedString.h

    r178173 r178265  
    11/*
    2     Copyright (C) 2004, 2005, 2006, 2007, 2008 Apple Inc. All rights reserved.
     2    Copyright (C) 2004-2008, 2015 Apple Inc. All rights reserved.
    33
    44    This library is free software; you can redistribute it and/or
     
    2323#include <wtf/Deque.h>
    2424#include <wtf/text/StringBuilder.h>
    25 #include <wtf/text/TextPosition.h>
    26 #include <wtf/text/WTFString.h>
    2725
    2826namespace WebCore {
     
    171169
    172170    SegmentedString(const SegmentedString&);
    173 
    174     const SegmentedString& operator=(const SegmentedString&);
     171    SegmentedString& operator=(const SegmentedString&);
    175172
    176173    void clear();
     
    178175
    179176    void append(const SegmentedString&);
    180     void prepend(const SegmentedString&);
    181 
    182     bool excludeLineNumbers() const { return m_currentString.excludeLineNumbers(); }
     177    void pushBack(const SegmentedString&);
     178
    183179    void setExcludeLineNumbers();
    184180
     
    200196    bool isClosed() const { return m_closed; }
    201197
    202     enum LookAheadResult {
    203         DidNotMatch,
    204         DidMatch,
    205         NotEnoughCharacters,
    206     };
    207 
    208     LookAheadResult lookAhead(const String& string) { return lookAheadInline(string, true); }
    209     LookAheadResult lookAheadIgnoringCase(const String& string) { return lookAheadInline(string, false); }
     198    enum AdvancePastResult { DidNotMatch, DidMatch, NotEnoughCharacters };
     199    template<unsigned length> AdvancePastResult advancePast(const char (&literal)[length]) { return advancePast(literal, length - 1, true); }
     200    template<unsigned length> AdvancePastResult advancePastIgnoringCase(const char (&literal)[length]) { return advancePast(literal, length - 1, false); }
    210201
    211202    void advance()
     
    227218    }
    228219
    229     inline void advanceAndUpdateLineNumber()
     220    void advanceAndUpdateLineNumber()
    230221    {
    231222        if (m_fastPathFlags & Use8BitAdvance) {
     
    252243
    253244        (this->*m_advanceAndUpdateLineNumberFunc)();
    254     }
    255 
    256     void advanceAndASSERT(UChar expectedCharacter)
    257     {
    258         ASSERT_UNUSED(expectedCharacter, currentChar() == expectedCharacter);
    259         advance();
    260     }
    261 
    262     void advanceAndASSERTIgnoringCase(UChar expectedCharacter)
    263     {
    264         ASSERT_UNUSED(expectedCharacter, u_foldCase(currentChar(), U_FOLD_CASE_DEFAULT) == u_foldCase(expectedCharacter, U_FOLD_CASE_DEFAULT));
    265         advance();
    266245    }
    267246
     
    287266    }
    288267
    289     // Writes the consumed characters into consumedCharacters, which must
    290     // have space for at least |count| characters.
    291     void advance(unsigned count, UChar* consumedCharacters);
    292 
    293     bool escaped() const { return m_pushedChar1; }
    294 
    295268    int numberOfCharactersConsumed() const
    296269    {
     
    308281    UChar currentChar() const { return m_currentChar; }   
    309282
    310     // The method is moderately slow, comparing to currentLine method.
    311283    OrdinalNumber currentColumn() const;
    312284    OrdinalNumber currentLine() const;
    313     // Sets value of line/column variables. Column is specified indirectly by a parameter columnAftreProlog
     285
     286    // Sets value of line/column variables. Column is specified indirectly by a parameter columnAfterProlog
    314287    // which is a value of column that we should get after a prolog (first prologLength characters) has been consumed.
    315     void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAftreProlog, int prologLength);
     288    void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAfterProlog, int prologLength);
    316289
    317290private:
     
    323296
    324297    void append(const SegmentedSubstring&);
    325     void prepend(const SegmentedSubstring&);
     298    void pushBack(const SegmentedSubstring&);
    326299
    327300    void advance8();
     
    375348    }
    376349
    377     inline LookAheadResult lookAheadInline(const String& string, bool caseSensitive)
    378     {
    379         if (!m_pushedChar1 && string.length() <= static_cast<unsigned>(m_currentString.m_length)) {
    380             String currentSubstring = m_currentString.currentSubString(string.length());
    381             if (currentSubstring.startsWith(string, caseSensitive))
    382                 return DidMatch;
    383             return DidNotMatch;
    384         }
    385         return lookAheadSlowCase(string, caseSensitive);
    386     }
    387    
    388     LookAheadResult lookAheadSlowCase(const String& string, bool caseSensitive)
    389     {
    390         unsigned count = string.length();
    391         if (count > length())
    392             return NotEnoughCharacters;
    393         UChar* consumedCharacters;
    394         String consumedString = String::createUninitialized(count, consumedCharacters);
    395         advance(count, consumedCharacters);
    396         LookAheadResult result = DidNotMatch;
    397         if (consumedString.startsWith(string, caseSensitive))
    398             result = DidMatch;
    399         prepend(SegmentedString(consumedString));
    400         return result;
    401     }
     350    // Writes consumed characters into consumedCharacters, which must have space for at least |count| characters.
     351    void advancePastNonNewlines(unsigned count);
     352    void advancePastNonNewlines(unsigned count, UChar* consumedCharacters);
     353
     354    AdvancePastResult advancePast(const char* literal, unsigned length, bool caseSensitive);
     355    AdvancePastResult advancePastSlowCase(const char* literal, bool caseSensitive);
    402356
    403357    bool isComposite() const { return !m_substrings.isEmpty(); }
     
    418372};
    419373
     374inline void SegmentedString::advancePastNonNewlines(unsigned count)
     375{
     376    for (unsigned i = 0; i < count; ++i)
     377        advancePastNonNewline();
    420378}
    421379
     380inline SegmentedString::AdvancePastResult SegmentedString::advancePast(const char* literal, unsigned length, bool caseSensitive)
     381{
     382    ASSERT(strlen(literal) == length);
     383    ASSERT(!strchr(literal, '\n'));
     384    if (!m_pushedChar1) {
     385        if (length <= static_cast<unsigned>(m_currentString.m_length)) {
     386            if (!m_currentString.currentSubString(length).startsWith(literal, caseSensitive))
     387                return DidNotMatch;
     388            advancePastNonNewlines(length);
     389            return DidMatch;
     390        }
     391    }
     392    return advancePastSlowCase(literal, caseSensitive);
     393}
     394
     395}
     396
    422397#endif
  • trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h

    r178173 r178265  
    3232namespace WebCore {
    3333
    34 inline bool isHexDigit(UChar cc)
    35 {
    36     return (cc >= '0' && cc <= '9') || (cc >= 'a' && cc <= 'f') || (cc >= 'A' && cc <= 'F');
    37 }
    38 
    3934inline void unconsumeCharacters(SegmentedString& source, const StringBuilder& consumedCharacters)
    4035{
    41     if (consumedCharacters.length() == 1)
    42         source.push(consumedCharacters[0]);
    43     else if (consumedCharacters.length() == 2) {
    44         source.push(consumedCharacters[0]);
    45         source.push(consumedCharacters[1]);
    46     } else
    47         source.prepend(SegmentedString(consumedCharacters.toStringPreserveCapacity()));
     36    source.pushBack(SegmentedString(consumedCharacters.toStringPreserveCapacity()));
    4837}
    4938
     
    5544    ASSERT(decodedCharacter.isEmpty());
    5645   
    57     enum EntityState {
     46    enum {
    5847        Initial,
    5948        Number,
     
    6352        Decimal,
    6453        Named
    65     };
    66     EntityState entityState = Initial;
     54    } state = Initial;
    6755    UChar32 result = 0;
    6856    bool overflow = false;
     
    7159   
    7260    while (!source.isEmpty()) {
    73         UChar cc = source.currentChar();
    74         switch (entityState) {
    75         case Initial: {
    76             if (cc == '\x09' || cc == '\x0A' || cc == '\x0C' || cc == ' ' || cc == '<' || cc == '&')
     61        UChar character = source.currentChar();
     62        switch (state) {
     63        case Initial:
     64            if (character == '\x09' || character == '\x0A' || character == '\x0C' || character == ' ' || character == '<' || character == '&')
    7765                return false;
    78             if (additionalAllowedCharacter && cc == additionalAllowedCharacter)
     66            if (additionalAllowedCharacter && character == additionalAllowedCharacter)
    7967                return false;
    80             if (cc == '#') {
    81                 entityState = Number;
     68            if (character == '#') {
     69                state = Number;
    8270                break;
    8371            }
    84             if ((cc >= 'a' && cc <= 'z') || (cc >= 'A' && cc <= 'Z')) {
    85                 entityState = Named;
    86                 continue;
     72            if (isASCIIAlpha(character)) {
     73                state = Named;
     74                goto Named;
    8775            }
    8876            return false;
    89         }
    90         case Number: {
    91             if (cc == 'x') {
    92                 entityState = MaybeHexLowerCaseX;
     77        case Number:
     78            if (character == 'x') {
     79                state = MaybeHexLowerCaseX;
    9380                break;
    9481            }
    95             if (cc == 'X') {
    96                 entityState = MaybeHexUpperCaseX;
     82            if (character == 'X') {
     83                state = MaybeHexUpperCaseX;
    9784                break;
    9885            }
    99             if (cc >= '0' && cc <= '9') {
    100                 entityState = Decimal;
    101                 continue;
     86            if (isASCIIDigit(character)) {
     87                state = Decimal;
     88                goto Decimal;
    10289            }
    103             source.push('#');
     90            source.pushBack(SegmentedString(ASCIILiteral("#")));
    10491            return false;
    105         }
    106         case MaybeHexLowerCaseX: {
    107             if (isHexDigit(cc)) {
    108                 entityState = Hex;
    109                 continue;
     92        case MaybeHexLowerCaseX:
     93            if (isASCIIHexDigit(character)) {
     94                state = Hex;
     95                goto Hex;
    11096            }
    111             source.push('#');
    112             source.push('x');
     97            source.pushBack(SegmentedString(ASCIILiteral("#x")));
    11398            return false;
    114         }
    115         case MaybeHexUpperCaseX: {
    116             if (isHexDigit(cc)) {
    117                 entityState = Hex;
    118                 continue;
     99        case MaybeHexUpperCaseX:
     100            if (isASCIIHexDigit(character)) {
     101                state = Hex;
     102                goto Hex;
    119103            }
    120             source.push('#');
    121             source.push('X');
     104            source.pushBack(SegmentedString(ASCIILiteral("#X")));
    122105            return false;
    123         }
    124         case Hex: {
    125             if (cc >= '0' && cc <= '9')
    126                 result = result * 16 + cc - '0';
    127             else if (cc >= 'a' && cc <= 'f')
    128                 result = result * 16 + 10 + cc - 'a';
    129             else if (cc >= 'A' && cc <= 'F')
    130                 result = result * 16 + 10 + cc - 'A';
    131             else if (cc == ';') {
    132                 source.advanceAndASSERT(cc);
     106        case Hex:
     107        Hex:
     108            if (isASCIIHexDigit(character)) {
     109                result = result * 16 + toASCIIHexValue(character);
     110                if (result > highestValidCharacter)
     111                    overflow = true;
     112                break;
     113            }
     114            if (character == ';') {
     115                source.advance();
    133116                decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
    134117                return true;
    135             } else if (ParserFunctions::acceptMalformed()) {
     118            }
     119            if (ParserFunctions::acceptMalformed()) {
    136120                decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
    137121                return true;
    138             } else {
    139                 unconsumeCharacters(source, consumedCharacters);
    140                 return false;
    141122            }
    142             if (result > highestValidCharacter)
    143                 overflow = true;
    144             break;
    145         }
    146         case Decimal: {
    147             if (cc >= '0' && cc <= '9')
    148                 result = result * 10 + cc - '0';
    149             else if (cc == ';') {
    150                 source.advanceAndASSERT(cc);
     123            unconsumeCharacters(source, consumedCharacters);
     124            return false;
     125        case Decimal:
     126        Decimal:
     127            if (isASCIIDigit(character)) {
     128                result = result * 10 + character - '0';
     129                if (result > highestValidCharacter)
     130                    overflow = true;
     131                break;
     132            }
     133            if (character == ';') {
     134                source.advance();
    151135                decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
    152136                return true;
    153             } else if (ParserFunctions::acceptMalformed()) {
     137            }
     138            if (ParserFunctions::acceptMalformed()) {
    154139                decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
    155140                return true;
    156             } else {
    157                 unconsumeCharacters(source, consumedCharacters);
    158                 return false;
    159141            }
    160             if (result > highestValidCharacter)
    161                 overflow = true;
    162             break;
     142            unconsumeCharacters(source, consumedCharacters);
     143            return false;
     144        case Named:
     145        Named:
     146            return ParserFunctions::consumeNamedEntity(source, decodedCharacter, notEnoughCharacters, additionalAllowedCharacter, character);
    163147        }
    164         case Named: {
    165             return ParserFunctions::consumeNamedEntity(source, decodedCharacter, notEnoughCharacters, additionalAllowedCharacter, cc);
    166         }
    167         }
    168         consumedCharacters.append(cc);
    169         source.advanceAndASSERT(cc);
     148        consumedCharacters.append(character);
     149        source.advance();
    170150    }
    171151    ASSERT(source.isEmpty());
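Reviewer note: the rewritten Hex case folds the three range checks into isASCIIHexDigit/toASCIIHexValue and moves the overflow check next to the accumulation. A minimal standalone sketch of that accumulation follows. The names decodeHexReference and hexValue are hypothetical, the input is assumed to be the text after "&#x", and, unlike the changeset, the sketch stops accumulating once overflow is detected so the arithmetic cannot wrap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Largest Unicode code point; plays the role of highestValidCharacter in the diff.
constexpr std::uint32_t highestValidCharacter = 0x10FFFF;

inline bool isHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Precondition: isHexDigit(c).
inline std::uint32_t hexValue(char c)
{
    if (c <= '9')
        return static_cast<std::uint32_t>(c - '0');
    if (c <= 'F')
        return static_cast<std::uint32_t>(c - 'A' + 10);
    return static_cast<std::uint32_t>(c - 'a' + 10);
}

// Decodes hex digits terminated by ';'.
// Returns the code point, 0 if the value overflowed, or nullopt if the text is malformed.
std::optional<std::uint32_t> decodeHexReference(const std::string& text)
{
    std::uint32_t result = 0;
    bool overflow = false;
    std::size_t i = 0;
    for (; i < text.size() && isHexDigit(text[i]); ++i) {
        if (overflow)
            continue; // keep consuming digits, but stop accumulating
        result = result * 16 + hexValue(text[i]);
        if (result > highestValidCharacter)
            overflow = true;
    }
    // Require at least one digit and a terminating semicolon.
    if (!i || i == text.size() || text[i] != ';')
        return std::nullopt;
    return overflow ? 0u : result;
}
```

Returning 0 on overflow corresponds to the `overflow ? 0 : result` argument passed to legalEntityFor in the diff; what legalEntityFor does with that value is outside this sketch.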
  • trunk/Source/WebCore/xml/parser/MarkupTokenizerInlines.h

    r178173 r178265  
    11/*
    2  * Copyright (C) 2008 Apple Inc. All Rights Reserved.
     2 * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
    33 * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/
    44 * Copyright (C) 2010 Google, Inc. All Rights Reserved.
     
    3131#include "SegmentedString.h"
    3232
    33 namespace WebCore {
    34 
    35 inline bool isTokenizerWhitespace(UChar cc)
    36 {
    37     return cc == ' ' || cc == '\x0A' || cc == '\x09' || cc == '\x0C';
    38 }
    39 
    40 inline void advanceStringAndASSERTIgnoringCase(SegmentedString& source, const char* expectedCharacters)
    41 {
    42     while (*expectedCharacters)
    43         source.advanceAndASSERTIgnoringCase(*expectedCharacters++);
    44 }
    45 
    46 inline void advanceStringAndASSERT(SegmentedString& source, const char* expectedCharacters)
    47 {
    48     while (*expectedCharacters)
    49         source.advanceAndASSERT(*expectedCharacters++);
    50 }
    51 
    5233#if COMPILER(MSVC)
    53 // We need to disable the "unreachable code" warning because we want to assert
    54 // that some code points aren't reached in the state machine.
     34// Disable the "unreachable code" warning so we can compile the ASSERT_NOT_REACHED in the END_STATE macro.
    5535#pragma warning(disable: 4702)
    5636#endif
    5737
    58 #define BEGIN_STATE(prefix, stateName) case prefix::stateName: stateName:
    59 #define END_STATE() ASSERT_NOT_REACHED(); break;
     38namespace WebCore {
    6039
    61 // We use this macro when the HTML5 spec says "reconsume the current input
    62 // character in the <mumble> state."
    63 #define RECONSUME_IN(prefix, stateName)                                    \
    64     do {                                                                   \
    65         m_state = prefix::stateName;                                       \
    66         goto stateName;                                                    \
     40inline bool isTokenizerWhitespace(UChar character)
     41{
     42    return character == ' ' || character == '\x0A' || character == '\x09' || character == '\x0C';
     43}
     44
     45#define BEGIN_STATE(stateName)                                  \
     46    case stateName:                                             \
     47    stateName: {                                                \
     48        const auto currentState = stateName;                    \
     49        UNUSED_PARAM(currentState);
     50
     51#define END_STATE()                                             \
     52        ASSERT_NOT_REACHED();                                   \
     53        break;                                                  \
     54    }
     55
     56#define RETURN_IN_CURRENT_STATE(expression)                     \
     57    do {                                                        \
     58        m_state = currentState;                                 \
     59        return expression;                                      \
    6760    } while (false)
    6861
    69 // We use this macro when the HTML5 spec says "consume the next input
    70 // character ... and switch to the <mumble> state."
    71 #define ADVANCE_TO(prefix, stateName)                                      \
    72     do {                                                                   \
    73         m_state = prefix::stateName;                                       \
    74         if (!m_inputStreamPreprocessor.advance(source))                    \
    75             return haveBufferedCharacterToken();                           \
    76         cc = m_inputStreamPreprocessor.nextInputCharacter();               \
    77         goto stateName;                                                    \
     62// We use this macro when the HTML spec says "reconsume the current input character in the <mumble> state."
     63#define RECONSUME_IN(newState)                                  \
     64    do {                                                        \
     65        goto newState;                                          \
    7866    } while (false)
    7967
    80 // Sometimes there's more complicated logic in the spec that separates when
    81 // we consume the next input character and when we switch to a particular
    82 // state. We handle those cases by advancing the source directly and using
    83 // this macro to switch to the indicated state.
    84 #define SWITCH_TO(prefix, stateName)                                       \
    85     do {                                                                   \
    86         m_state = prefix::stateName;                                       \
    87         if (source.isEmpty() || !m_inputStreamPreprocessor.peek(source))   \
    88             return haveBufferedCharacterToken();                           \
    89         cc = m_inputStreamPreprocessor.nextInputCharacter();               \
    90         goto stateName;                                                    \
     68// We use this macro when the HTML spec says "consume the next input character ... and switch to the <mumble> state."
     69#define ADVANCE_TO(newState)                                    \
     70    do {                                                        \
     71        if (!m_preprocessor.advance(source, isNullCharacterSkippingState(newState))) { \
     72            m_state = newState;                                 \
     73            return haveBufferedCharacterToken();                \
     74        }                                                       \
     75        character = m_preprocessor.nextInputCharacter();        \
     76        goto newState;                                          \
     77    } while (false)
     78
     79// For more complex cases, caller consumes the characters first and then uses this macro.
     80#define SWITCH_TO(newState)                                     \
     81    do {                                                        \
     82        if (!m_preprocessor.peek(source, isNullCharacterSkippingState(newState))) { \
     83            m_state = newState;                                 \
     84            return haveBufferedCharacterToken();                \
     85        }                                                       \
     86        character = m_preprocessor.nextInputCharacter();        \
     87        goto newState;                                          \
    9188    } while (false)
    9289