Changeset 209058 in webkit

Timestamp:

Nov 28, 2016, 8:29:55 PM

Author:

Darin Adler

Message:

Streamline and speed up tokenizer and segmented string classes
https://bugs.webkit.org/show_bug.cgi?id=165003

Reviewed by Sam Weinig.

Source/JavaScriptCore:

runtime/JSONObject.cpp:

(JSC::Stringifier::appendStringifiedValue): Use viewWithUnderlyingString when calling
StringBuilder::appendQuotedJSONString, since it now takes a StringView and there is
no benefit in creating a String for that function if one doesn't already exist.

Source/WebCore:

Profiling Speedometer on my iMac showed the tokenizer as one of the
hottest functions. This patch streamlines the segmented string class,
removing various unused features, and also improves some other functions
seen on the Speedometer profile. On my iMac I measured a speedup of
about 3%. Changes include:

Removed m_pushedChar1, m_pushedChar2, and m_empty data members from the SegmentedString class and all the code that used to handle them.

Simplified the SegmentedString advance functions so they are small enough to get inlined in the HTML tokenizer.

Updated callers, in as many cases as possible, to call the simpler SegmentedString advance functions that don't handle newlines.

Cut down on allocations of SegmentedString and made code move the segmented string and the strings that are moved into it rather than copying them whenever possible.

Simplified segmented string functions, removing some branches, mostly from the non-fast paths.

Removed small unused functions and small functions used in only one or two places, made more functions private and renamed for clarity.

bindings/js/JSHTMLDocumentCustom.cpp:

(WebCore::documentWrite): Moved a little more of the common code in here
from the two functions below. Removed obsolete comment saying this was not
following the DOM specification, because it is. Removed unneeded special
cases for 1 argument and no arguments. Take a reference instead of a pointer.
(WebCore::JSHTMLDocument::write): Updated for above.
(WebCore::JSHTMLDocument::writeln): Ditto.

css/parser/CSSTokenizer.cpp: Added now-needed include.
css/parser/CSSTokenizer.h: Removed unneeded include.

css/parser/CSSTokenizerInputStream.h: Added definition of kEndOfFileMarker
here; this is now separate from the use in the HTMLParser. In the long run,
unclear to me whether it is really needed in either.

dom/Document.cpp:

(WebCore::Document::prepareToWrite): Added. Helper function used by the three
different variants of write. Using this may prevent us from having to construct
a SegmentedString just to append one string after future refactoring.
(WebCore::Document::write): Updated to take an rvalue reference and move the
value through.
(WebCore::Document::writeln): Use a single write call instead of two.
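
As a rough illustration of the "take an rvalue reference and move the value through" pattern used by write, insert, and append in this patch, here is a minimal sketch. It is not the WebKit code: Segments, Parser, and Doc are hypothetical stand-ins, std::string replaces WTF::String, and std::move replaces WTFMove. It also shows writeln building one value and making a single write call.

```cpp
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a move-only segmented string.
class Segments {
public:
    Segments() = default;
    explicit Segments(std::string&& first) { append(std::move(first)); }
    Segments(Segments&&) = default;
    Segments& operator=(Segments&&) = default;
    // Copying is deleted so that accidental copies fail to compile.
    Segments(const Segments&) = delete;
    Segments& operator=(const Segments&) = delete;

    // Takes ownership of a single string; empty strings are never stored.
    void append(std::string&& piece)
    {
        if (!piece.empty())
            m_pieces.push_back(std::move(piece));
    }

    // Takes ownership of every piece held by another Segments.
    void append(Segments&& other)
    {
        for (auto& piece : other.m_pieces)
            m_pieces.push_back(std::move(piece));
        other.m_pieces.clear();
    }

    std::string toString() const
    {
        std::string result;
        for (auto& piece : m_pieces)
            result += piece;
        return result;
    }

private:
    std::deque<std::string> m_pieces;
};

// Hypothetical parser: the rvalue reference is moved into the input stream.
class Parser {
public:
    void insert(Segments&& text) { m_input.append(std::move(text)); }
    std::string pending() const { return m_input.toString(); }

private:
    Segments m_input;
};

// Hypothetical document: write forwards ownership; writeln makes one write call.
class Doc {
public:
    void write(Segments&& text) { m_parser.insert(std::move(text)); }

    void writeln(std::string&& text)
    {
        Segments segments(std::move(text));
        segments.append(std::string("\n"));
        write(std::move(segments));
    }

    std::string pending() const { return m_parser.pending(); }

private:
    Parser m_parser;
};
```

With the copy operations deleted, a call site that tries to pass an lvalue Segments fails to compile, which is one way to find the places that still copy.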

dom/Document.h: Changed write to take an rvalue reference to SegmentedString
rather than a const reference.

dom/DocumentParser.h: Changed insert to take an rvalue reference to
SegmentedString. In the future, should probably overload to take a single
string since that is the normal case.

dom/RawDataDocumentParser.h: Updated for change to DocumentParser.

html/FTPDirectoryDocument.cpp:

(WebCore::FTPDirectoryDocumentParser::append): Refactored a bit, just enough
so that we don't need an assignment operator for SegmentedString that can
copy a String.

html/parser/HTMLDocumentParser.cpp:

(WebCore::HTMLDocumentParser::insert): Updated to take an rvalue reference,
and move the value through.

html/parser/HTMLDocumentParser.h: Updated for the above.

html/parser/HTMLEntityParser.cpp:

(WebCore::HTMLEntityParser::consumeNamedEntity): Updated for name changes.
Changed the two calls to advance here to call advancePastNonNewline; no
change in behavior, but asserts what the code was assuming before, that the
character was not a newline.

html/parser/HTMLInputStream.h:

(WebCore::HTMLInputStream::appendToEnd): Updated to take an rvalue reference,
and move the value through.
(WebCore::HTMLInputStream::insertAtCurrentInsertionPoint): Ditto.
(WebCore::HTMLInputStream::markEndOfFile): Removed the code to construct a
SegmentedString, overkill since we can just append an individual string.
(WebCore::HTMLInputStream::splitInto): Rewrote the move idiom here to actually
use move, which will reduce reference count churn and other unneeded work.

html/parser/HTMLMetaCharsetParser.cpp:

(WebCore::HTMLMetaCharsetParser::checkForMetaCharset): Removed unneeded
construction of a SegmentedString, just to append a string.

html/parser/HTMLSourceTracker.cpp:

(WebCore::HTMLSourceTracker::HTMLSourceTracker): Moved to the class definition.
(WebCore::HTMLSourceTracker::source): Updated for function name change.

html/parser/HTMLSourceTracker.h: Updated for above.

html/parser/HTMLTokenizer.cpp: Added now-needed include.

(WebCore::HTMLTokenizer::emitAndResumeInDataState): Use advancePastNonNewline,
since this function is never called in response to a newline character.
(WebCore::HTMLTokenizer::commitToPartialEndTag): Ditto.
(WebCore::HTMLTokenizer::commitToCompleteEndTag): Ditto.
(WebCore::HTMLTokenizer::processToken): Use ADVANCE_PAST_NON_NEWLINE_TO macro
instead of ADVANCE_TO in cases where the character we are advancing past is
known not to be a newline, so we can use the more efficient advance function
that doesn't check for the newline character.
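
The difference between the two advance functions can be sketched as follows. This is a simplified model, not the SegmentedString implementation: Cursor is a hypothetical single-buffer class, and the real code also handles multiple substrings and 8-bit versus 16-bit characters. The point is that the fast path drops the newline branch and asserts the precondition instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical single-buffer cursor that tracks a line number.
class Cursor {
public:
    explicit Cursor(std::string text) : m_text(std::move(text)) { }

    bool atEnd() const { return m_position >= m_text.size(); }
    char current() const { return m_text[m_position]; }
    unsigned line() const { return m_line; }

    // General path: pays for a newline check on every step.
    void advance()
    {
        assert(!atEnd());
        if (current() == '\n')
            ++m_line;
        ++m_position;
    }

    // Fast path: the caller already knows the current character is not a
    // newline, so the check becomes a debug-only assertion.
    void advancePastNonNewline()
    {
        assert(!atEnd());
        assert(current() != '\n');
        ++m_position;
    }

private:
    std::string m_text;
    std::size_t m_position { 0 };
    unsigned m_line { 1 };
};
```

A tokenizer state that has just matched '<' or '>' can use the fast path safely; a state that consumes arbitrary text has to stay on advance.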

html/parser/InputStreamPreprocessor.h: Moved kEndOfFileMarker to
SegmentedString.h; not sure that's a good place for it either. In the long run,
unclear to me whether this is really needed.
(WebCore::InputStreamPreprocessor::peek): Added UNLIKELY for the empty check.
Added LIKELY for the not-special character check.
(WebCore::InputStreamPreprocessor::advance): Updated for the new name of the
advanceAndUpdateLineNumber function.
(WebCore::InputStreamPreprocessor::advancePastNonNewline): Added. More
efficient than advance for cases where the last character is known not to be
a newline character.
(WebCore::InputStreamPreprocessor::skipNextNewLine): Deleted. Was unused.
(WebCore::InputStreamPreprocessor::reset): Deleted. Was unused except in the
constructor; added initial values for the data members to replace.
(WebCore::InputStreamPreprocessor::processNextInputCharacter): Removed long
FIXME comment that didn't really need to be here. Reorganized a bit.
(WebCore::InputStreamPreprocessor::isAtEndOfFile): Renamed and made static.
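
For readers unfamiliar with the LIKELY and UNLIKELY annotations added to peek, the sketch below shows the general idea using the GCC/Clang __builtin_expect builtin. The macro names, the isSpecial predicate, and the choice of NUL and CR as "special" characters are assumptions made for illustration; they are not copied from InputStreamPreprocessor.h.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical macros; on compilers without __builtin_expect they do nothing.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_LIKELY(x) (x)
#define SKETCH_UNLIKELY(x) (x)
#endif

// Assumption for this sketch: only NUL and CR need slow-path handling.
inline bool isSpecial(char character)
{
    return character == '\0' || character == '\r';
}

// Counts the characters that would take the slow path. The hint tells the
// compiler to lay out the code so the common, non-special case falls through.
inline std::size_t countSlowPathCharacters(std::string_view input)
{
    std::size_t slow = 0;
    for (char character : input) {
        if (SKETCH_LIKELY(!isSpecial(character)))
            continue;
        ++slow;
    }
    return slow;
}
```

The hints only affect code layout and branch prediction defaults; they never change the result, so a wrong hint costs speed rather than correctness.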

html/track/BufferedLineReader.cpp:

(WebCore::BufferedLineReader::nextLine): Updated to not use the poorly named
scanCharacter function to advance past a newline. Also renamed from getLine
and changed to return Optional<String> instead of using a boolean to indicate
failure and an out argument.
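
The interface change from "boolean result plus out argument" to an optional return value might look roughly like this. The sketch uses std::optional and std::string as stand-ins for WTF's Optional and String, recognizes only '\n' as a line ending, and the LineReader class and its members are hypothetical rather than the BufferedLineReader code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical line reader that buffers partial input.
class LineReader {
public:
    void append(const std::string& data) { m_buffer += data; }
    void setEndOfStream() { m_endOfStream = true; }

    // Returns the next complete line, or std::nullopt if more data is needed.
    // An empty line is a valid result and is distinct from "no line yet".
    std::optional<std::string> nextLine()
    {
        auto newline = m_buffer.find('\n');
        if (newline != std::string::npos) {
            std::string line = m_buffer.substr(0, newline);
            m_buffer.erase(0, newline + 1);
            return line;
        }
        // At end of stream, whatever remains counts as the final line.
        if (m_endOfStream && !m_buffer.empty()) {
            std::string line = std::move(m_buffer);
            m_buffer.clear();
            return line;
        }
        return std::nullopt;
    }

private:
    std::string m_buffer;
    bool m_endOfStream { false };
};
```

A caller can then write a loop of the form "while (auto line = reader.nextLine())" instead of declaring a String up front and checking a separate boolean.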

html/track/BufferedLineReader.h:

(WebCore::BufferedLineReader::BufferedLineReader): Use the default, putting
initial values on each data member below.
(WebCore::BufferedLineReader::append): Updated to take an rvalue reference,
and move the value through.
(WebCore::BufferedLineReader::scanCharacter): Deleted. Was poorly named,
and easy to replace with two lines of code at its two call sites.
(WebCore::BufferedLineReader::reset): Rewrote to correctly clear all the
data members of the class, not just the segmented string.

html/track/InbandGenericTextTrack.cpp:

(WebCore::InbandGenericTextTrack::parseWebVTTFileHeader): Updated to take
an rvalue reference and move the value through.

html/track/InbandGenericTextTrack.h: Updated for the above.

html/track/InbandTextTrack.h: Updated since parseWebVTTFileHeader now
takes an rvalue reference.

html/track/WebVTTParser.cpp:

(WebCore::WebVTTParser::parseFileHeader): Updated to take an rvalue reference
and move the value through.
(WebCore::WebVTTParser::parseBytes): Updated to pass ownership of the string
in to the line reader append function.
(WebCore::WebVTTParser::parseCueData): Use auto and WTFMove for WebVTTCueData.
(WebCore::WebVTTParser::flush): More of the same.
(WebCore::WebVTTParser::parse): Changed to use nextLine instead of getLine.

html/track/WebVTTParser.h: Updated for the above.

html/track/WebVTTTokenizer.cpp:

(WebCore::advanceAndEmitToken): Use advanceAndUpdateLineNumber by its new
name, just advance. No change in behavior.
(WebCore::WebVTTTokenizer::WebVTTTokenizer): Pass a String, not a
SegmentedString, to add the end of file marker.

platform/graphics/InbandTextTrackPrivateClient.h: Updated since
parseWebVTTFileHeader takes an rvalue reference.

platform/text/SegmentedString.cpp:

(WebCore::SegmentedString::Substring::appendTo): Moved here from the header.
The only caller is SegmentedString::toString, inside this file.
(WebCore::SegmentedString::SegmentedString): Deleted the copy constructor.
No longer needed.
(WebCore::SegmentedString::operator=): Defined a move assignment operator
rather than an ordinary assignment operator, since that's what the call
sites really need.
(WebCore::SegmentedString::length): Simplified since we no longer need to
support pushed characters.
(WebCore::SegmentedString::setExcludeLineNumbers): Simplified, since we
can just iterate m_otherSubstrings without an extra check. Also changed to
write directly to the data member of Substring instead of using a function.
(WebCore::SegmentedString::updateAdvanceFunctionPointersForEmptyString):
Added. Used when we run out of characters.
(WebCore::SegmentedString::clear): Removed code to clear now-deleted members.
Updated for changes to other member names.
(WebCore::SegmentedString::appendSubstring): Renamed from just append to
avoid ambiguity with the public append function. Changed to take an rvalue
reference, and move in, and added code to set m_currentCharacter properly,
so the caller doesn't have to deal with that.
(WebCore::SegmentedString::close): Updated to use m_isClosed by its new name.
Also removed unneeded comment about assertion that fires when trying to close
an already closed string.
(WebCore::SegmentedString::append): Added overloads for rvalue references of
both entire SegmentedString objects and of String. Streamlined to just call
appendSubstring and append to the deque.
(WebCore::SegmentedString::pushBack): Tightened up since we don't allow empty
strings and changed to take just a string, not an entire segmented string.
(WebCore::SegmentedString::advanceSubstring): Moved logic into the
advancePastSingleCharacterSubstringWithoutUpdatingLineNumber function.
(WebCore::SegmentedString::toString): Simplified now that we don't need to
support pushed characters.
(WebCore::SegmentedString::advancePastNonNewlines): Deleted.
(WebCore::SegmentedString::advance8): Deleted.
(WebCore::SegmentedString::advanceWithoutUpdatingLineNumber16): Renamed from
advance16. Simplified now that there are no pushed characters. Also changed to
access data members of m_currentSubstring directly instead of calling a function.
(WebCore::SegmentedString::advanceAndUpdateLineNumber8): Deleted.
(WebCore::SegmentedString::advanceAndUpdateLineNumber16): Ditto.
(WebCore::SegmentedString::advancePastSingleCharacterSubstringWithoutUpdatingLineNumber):
Renamed from advanceSlowCase. Removed unneeded logic to handle pushed characters.
Moved code in here from advanceSubstring.
(WebCore::SegmentedString::advancePastSingleCharacterSubstring): Renamed from
advanceAndUpdateLineNumberSlowCase. Simplified by calling the function above.
(WebCore::SegmentedString::advanceEmpty): Broke assertion up into two.
(WebCore::SegmentedString::updateSlowCaseFunctionPointers): Updated for name changes.
(WebCore::SegmentedString::advancePastSlowCase): Changed name and meaning of
boolean argument. Rewrote to use the String class less; it's now used only when
we fail to match after the first character rather than being used for the actual
comparison with the literal.

platform/text/SegmentedString.h: Moved all non-trivial function bodies out of
the class definition to make things easier to read. Moved the SegmentedSubstring
class inside the SegmentedString class, making it a private struct named Substring.
Removed the m_ prefix from data members of the struct, removed many functions from
the struct and made its union be anonymous instead of naming it m_data. Removed
unneeded StringBuilder.h include.
(WebCore::SegmentedString::isEmpty): Changed to use the length of the substring
instead of a separate boolean. We never create an empty substring, nor leave one
in place as the current substring unless the entire segmented string is empty.
(WebCore::SegmentedString::advancePast): Updated to use the new member function
template instead of a non-template member function. The new member function is
entirely rewritten and does the matching directly rather than allocating a string
just to do prefix matching.
(WebCore::SegmentedString::advancePastLettersIgnoringASCIICase): Renamed to make
it clear that the literal must be all non-letters or lowercase letters as with
the other "letters ignoring ASCII case" functions. The three call sites all fit
the bill. Implement by calling the new function template.
(WebCore::SegmentedString::currentCharacter): Renamed from currentChar.
(WebCore::SegmentedString::Substring::Substring): Use an rvalue reference and
move the string in.
(WebCore::SegmentedString::Substring::currentCharacter): Simplified since this
is never used on an empty substring.
(WebCore::SegmentedString::Substring::incrementAndGetCurrentCharacter): Ditto.
(WebCore::SegmentedString::SegmentedString): Overload to take an rvalue reference.
Simplified since there are now fewer data members.
(WebCore::SegmentedString::advanceWithoutUpdatingLineNumber): Renamed from
advance, since this is only safe to use if there is some reason it is OK to skip
updating the line number.
(WebCore::SegmentedString::advance): Renamed from advanceAndUpdateLineNumber,
since doing that is the normal desired behavior and not worth mentioning in the
public function name.
(WebCore::SegmentedString::advancePastNewline): Renamed from
advancePastNewlineAndUpdateLineNumber.
(WebCore::SegmentedString::numberOfCharactersConsumed): Greatly simplified since
pushed characters are no longer supported.
(WebCore::SegmentedString::characterMismatch): Added. Used by advancePast.
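
The idea behind the rewritten advancePast, matching a literal against the input directly instead of first allocating a string for prefix comparison, can be sketched like this. The function below is a hypothetical free function over a deque of std::string segments; it only tests for a match and does not consume input, and it does not model the real class's 8-bit/16-bit handling or the ASCII-case-insensitive variant.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Returns true if the concatenation of the segments starts with the literal.
// Characters are compared in place, so no temporary string is built.
template<std::size_t arraySize>
bool startsWithLiteral(const std::deque<std::string>& segments, const char (&literal)[arraySize])
{
    constexpr std::size_t length = arraySize - 1; // Exclude the null terminator.
    std::size_t matched = 0;
    for (const auto& segment : segments) {
        for (char character : segment) {
            if (matched == length)
                return true;
            if (character != literal[matched])
                return false;
            ++matched;
        }
    }
    // Input ran out: a match only if the whole literal was seen.
    return matched == length;
}
```

Taking the literal as a reference to an array lets the compiler know its length at compile time, which is presumably part of the appeal of a member function template here.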

xml/parser/CharacterReferenceParserInlines.h:

(WebCore::unconsumeCharacters): Use toString rather than toStringPreserveCapacity
because the SegmentedString is going to take ownership of the string.
(WebCore::consumeCharacterReference): Updated to use the pushBack that takes just
a String, not a SegmentedString. Also use advancePastNonNewline.

xml/parser/MarkupTokenizerInlines.h: Added ADVANCE_PAST_NON_NEWLINE_TO.

xml/parser/XMLDocumentParser.cpp:

(WebCore::XMLDocumentParser::insert): Updated since this takes an rvalue reference.
(WebCore::XMLDocumentParser::append): Removed unnecessary code to create a
SegmentedString.

xml/parser/XMLDocumentParser.h: Updated for above. Also fixed indentation
and initialized most data members.

xml/parser/XMLDocumentParserLibxml2.cpp:

(WebCore::XMLDocumentParser::XMLDocumentParser): Moved most data member
initialization into the class definition.
(WebCore::XMLDocumentParser::resumeParsing): Removed code that copied a
segmented string, but converted the whole thing into a string before using it.
Now we convert to a string right away.

Source/WTF:

wtf/text/StringBuilder.cpp:

(WTF::StringBuilder::bufferCharacters<LChar>): Moved this here from
the header since it is only used inside the class. Also renamed from
getBufferCharacters.
(WTF::StringBuilder::bufferCharacters<UChar>): Ditto.
(WTF::StringBuilder::appendUninitializedUpconvert): Added. Helper
for the upconvert case in the 16-bit overload of StringBuilder::append.
(WTF::StringBuilder::append): Changed to use appendUninitializedUpconvert.
(WTF::quotedJSONStringLength): Added. Used in new appendQuotedJSONString
implementation below that now correctly determines the size of what will
be appended by walking through the string twice.
(WTF::appendQuotedJSONStringInternal): Moved the code that writes the
quote marks in here. Also made a few coding style tweaks.
(WTF::StringBuilder::appendQuotedJSONString): Rewrote to use a much
simpler algorithm that grows the string the same way the append function
does. The old code would use reserveCapacity in a way that was costly when
doing a lot of appends on the same string, and also allocated far too much
memory for normal use cases where characters did not need to be turned
into escape sequences.
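
The two-pass approach described for appendQuotedJSONString (measure the quoted length first, then grow once and write) can be sketched on top of std::string as below. This is an illustration under assumptions, not the StringBuilder code: quotedLength and appendQuoted are hypothetical names, input is treated as 8-bit characters, the overflow checking done in the real patch is omitted, and the input must not alias the output buffer.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// First pass: size of the quoted form, including the two quote marks.
std::size_t quotedLength(std::string_view input)
{
    std::size_t length = 2;
    for (unsigned char character : input) {
        if (character > 0x1F)
            length += (character == '"' || character == '\\') ? 2 : 1;
        else {
            switch (character) {
            case '\t': case '\r': case '\n': case '\f': case '\b':
                length += 2; // Two-character escape such as \n.
                break;
            default:
                length += 6; // Six-character escape of the form \u00XX.
            }
        }
    }
    return length;
}

// Second pass: grow the output exactly once, then write into the new space.
void appendQuoted(std::string& output, std::string_view input)
{
    static const char hexDigits[] = "0123456789abcdef";
    std::size_t position = output.size();
    output.resize(position + quotedLength(input));
    output[position++] = '"';
    for (unsigned char character : input) {
        if (character > 0x1F) {
            if (character == '"' || character == '\\')
                output[position++] = '\\';
            output[position++] = static_cast<char>(character);
            continue;
        }
        output[position++] = '\\';
        switch (character) {
        case '\t': output[position++] = 't'; break;
        case '\r': output[position++] = 'r'; break;
        case '\n': output[position++] = 'n'; break;
        case '\f': output[position++] = 'f'; break;
        case '\b': output[position++] = 'b'; break;
        default:
            output[position++] = 'u';
            output[position++] = '0';
            output[position++] = '0';
            output[position++] = hexDigits[character >> 4];
            output[position++] = hexDigits[character & 0xF];
        }
    }
    output[position++] = '"';
}
```

Walking the input twice costs an extra read pass, but it avoids reserving six times the input length up front, which is the over-allocation the message above describes in the old code.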

wtf/text/StringBuilder.h:

(WTF::StringBuilder::append): Tweaked style a bit, fixed a bug where the
m_is8Bit field wasn't set correctly in one case, optimized the function that
adds substrings for the case where this is the first append and the substring
happens to cover the entire string. Also clarified the assertions and removed
an unneeded check from that substring overload.
(WTF::equal): Reimplemented, using equalCommon.

Location:

trunk/Source

Files:

40 edited

JavaScriptCore/ChangeLog (modified) (1 diff)
JavaScriptCore/runtime/JSONObject.cpp (modified) (2 diffs)
WTF/ChangeLog (modified) (1 diff)
WTF/wtf/text/StringBuilder.cpp (modified) (19 diffs)
WTF/wtf/text/StringBuilder.h (modified) (12 diffs)
WebCore/ChangeLog (modified) (1 diff)
WebCore/bindings/js/JSHTMLDocumentCustom.cpp (modified) (9 diffs)
WebCore/css/parser/CSSTokenizer.cpp (modified) (1 diff)
WebCore/css/parser/CSSTokenizer.h (modified) (1 diff)
WebCore/css/parser/CSSTokenizerInputStream.h (modified) (1 diff)
WebCore/dom/Document.cpp (modified) (3 diffs)
WebCore/dom/Document.h (modified) (1 diff)
WebCore/dom/DocumentParser.h (modified) (1 diff)
WebCore/dom/RawDataDocumentParser.h (modified) (1 diff)
WebCore/html/FTPDirectoryDocument.cpp (modified) (3 diffs)
WebCore/html/parser/HTMLDocumentParser.cpp (modified) (3 diffs)
WebCore/html/parser/HTMLDocumentParser.h (modified) (1 diff)
WebCore/html/parser/HTMLEntityParser.cpp (modified) (2 diffs)
WebCore/html/parser/HTMLInputStream.h (modified) (4 diffs)
WebCore/html/parser/HTMLMetaCharsetParser.cpp (modified) (2 diffs)
WebCore/html/parser/HTMLSourceTracker.cpp (modified) (2 diffs)
WebCore/html/parser/HTMLSourceTracker.h (modified) (1 diff)
WebCore/html/parser/HTMLTokenizer.cpp (modified) (80 diffs)
WebCore/html/parser/InputStreamPreprocessor.h (modified) (6 diffs)
WebCore/html/track/BufferedLineReader.cpp (modified) (6 diffs)
WebCore/html/track/BufferedLineReader.h (modified) (1 diff)
WebCore/html/track/InbandGenericTextTrack.cpp (modified) (1 diff)
WebCore/html/track/InbandGenericTextTrack.h (modified) (1 diff)
WebCore/html/track/InbandTextTrack.h (modified) (1 diff)
WebCore/html/track/WebVTTParser.cpp (modified) (11 diffs)
WebCore/html/track/WebVTTParser.h (modified) (1 diff)
WebCore/html/track/WebVTTTokenizer.cpp (modified) (4 diffs)
WebCore/platform/graphics/InbandTextTrackPrivateClient.h (modified) (1 diff)
WebCore/platform/text/SegmentedString.cpp (modified) (5 diffs)
WebCore/platform/text/SegmentedString.h (modified) (4 diffs)
WebCore/xml/parser/CharacterReferenceParserInlines.h (modified) (8 diffs)
WebCore/xml/parser/MarkupTokenizerInlines.h (modified) (4 diffs)
WebCore/xml/parser/XMLDocumentParser.cpp (modified) (4 diffs)
WebCore/xml/parser/XMLDocumentParser.h (modified) (3 diffs)
WebCore/xml/parser/XMLDocumentParserLibxml2.cpp (modified) (8 diffs)

trunk/Source/JavaScriptCore/ChangeLog

-              r209043
+              r209058
+2016-11-28  Darin Adler  <darin@apple.com>
+        Streamline and speed up tokenizer and segmented string classes
+        https://bugs.webkit.org/show_bug.cgi?id=165003
+        Reviewed by Sam Weinig.
+        * runtime/JSONObject.cpp:
+        (JSC::Stringifier::appendStringifiedValue): Use viewWithUnderlyingString when calling
+        StringBuilder::appendQuotedJSONString, since it now takes a StringView and there is
+        no benefit in creating a String for that function if one doesn't already exist.
 2016-11-21  Mark Lam  <mark.lam@apple.com>

trunk/Source/JavaScriptCore/runtime/JSONObject.cpp

-              r208966
+              r209058
 /*
- * Copyright (C) 2009, 2016 Apple Inc. All rights reserved.
+ * Copyright (C) 2009-2016 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
 …
     if (value.isString()) {
-        builder.appendQuotedJSONString(asString(value)->value(m_exec));
+        builder.appendQuotedJSONString(asString(value)->viewWithUnderlyingString(*m_exec).view);
         return StringifySucceeded;
     }

trunk/Source/WTF/ChangeLog

-              r208985
+              r209058
+2016-11-28  Darin Adler  <darin@apple.com>
+        Streamline and speed up tokenizer and segmented string classes
+        https://bugs.webkit.org/show_bug.cgi?id=165003
+        Reviewed by Sam Weinig.
+        * wtf/text/StringBuilder.cpp:
+        (WTF::StringBuilder::bufferCharacters<LChar>): Moved this here from
+        the header since it is only used inside the class. Also renamed from
+        getBufferCharacters.
+        (WTF::StringBuilder::bufferCharacters<UChar>): Ditto.
+        (WTF::StringBuilder::appendUninitializedUpconvert): Added. Helper
+        for the upconvert case in the 16-bit overload of StrinBuilder::append.
+        (WTF::StringBuilder::append): Changed to use appendUninitializedUpconvert.
+        (WTF::quotedJSONStringLength): Added. Used in new appendQuotedJSONString
+        implementation below that now correctly determines the size of what will
+        be appended by walking thorugh the string twice.
+        (WTF::appendQuotedJSONStringInternal): Moved the code that writes the
+        quote marks in here. Also made a few coding style tweaks.
+        (WTF::StringBuilder::appendQuotedJSONString): Rewrote to use a much
+        simpler algorithm that grows the string the same way the append function
+        does. The old code would use reserveCapacity in a way that was costly when
+        doing a lot of appends on the same string, and also allocated far too much
+        memory for normal use cases where characters did not need to be turned
+        into escape sequences.
+        * wtf/text/StringBuilder.h:
+        (WTF::StringBuilder::append): Tweaked style a bit, fixed a bug where the
+        m_is8Bit field wasn't set correctly in one case, optimized the function that
+        adds substrings for the case where this is the first append and the substring
+        happens to cover the entire string. Also clarified the assertions and removed
+        an unneeded check from that substring overload.
+        (WTF::equal): Reimplemented, using equalCommon.
 2016-11-26  Yusuke Suzuki  <utatane.tea@gmail.com>

trunk/Source/WTF/wtf/text/StringBuilder.cpp

-              r205847
+              r209058
 /*
  * Copyright (C) 2010, 2013, 2016 Apple Inc. All rights reserved.
+ * Copyright (C) 2010-2016 Apple Inc. All rights reserved.
  * Copyright (C) 2012 Google Inc. All rights reserved.
+ *
 …
 #include "IntegerToStringConversion.h"
 #include "MathExtras.h"
-#include "WTFString.h"
 #include <wtf/dtoa.h>
 …
     static const unsigned minimumCapacity = 16;
     return std::max(requiredLength, std::max(minimumCapacity, capacity * 2));
+}
+template<> ALWAYS_INLINE LChar* StringBuilder::bufferCharacters<LChar>()
+{
+    ASSERT(m_is8Bit);
+    return m_bufferCharacters8;
+}
+template<> ALWAYS_INLINE UChar* StringBuilder::bufferCharacters<UChar>()
+{
+    ASSERT(!m_is8Bit);
+    return m_bufferCharacters16;
+}
 …
+{
     ASSERT(m_is8Bit);
     // Copy the existing data into a new buffer, set result to point to the end of the existing data.
     auto buffer = StringImpl::createUninitialized(requiredLength, m_bufferCharacters8);
 …
+{
     ASSERT(!m_is8Bit);
     // Copy the existing data into a new buffer, set result to point to the end of the existing data.
     auto buffer = StringImpl::createUninitialized(requiredLength, m_bufferCharacters16);
 …
 // Allocate a new 16 bit buffer, copying in currentCharacters (which is 8 bit and may come
 // from either m_string or m_buffer, neither will be reassigned until the copy has completed).
 void StringBuilder::allocateBufferUpConvert(const LChar* currentCharacters, unsigned requiredLength)
+void StringBuilder::allocateBufferUpconvert(const LChar* currentCharacters, unsigned requiredLength)
+{
     ASSERT(m_is8Bit);
     ASSERT(requiredLength >= m_length);
     // Copy the existing data into a new buffer, set result to point to the end of the existing data.
     auto buffer = StringImpl::createUninitialized(requiredLength, m_bufferCharacters16);
 …
+}
+template <>
+void StringBuilder::reallocateBuffer<LChar>(unsigned requiredLength)
+template<> void StringBuilder::reallocateBuffer<LChar>(unsigned requiredLength)
+{
     // If the buffer has only one ref (by this StringBuilder), reallocate it,
 …
+}
+template <>
+void StringBuilder::reallocateBuffer<UChar>(unsigned requiredLength)
+template<> void StringBuilder::reallocateBuffer<UChar>(unsigned requiredLength)
+{
     // If the buffer has only one ref (by this StringBuilder), reallocate it,
 …
     if (m_buffer->is8Bit())
         allocateBufferUpConvert(m_buffer->characters8(), requiredLength);
+        allocateBufferUpconvert(m_buffer->characters8(), requiredLength);
     else if (m_buffer->hasOneRef())
         m_buffer = StringImpl::reallocate(m_buffer.releaseNonNull(), requiredLength, m_bufferCharacters16);
 …
         if (newCapacity > m_length) {
             if (!m_length) {
                 LChar* nullPlaceholder = 0;
+                LChar* nullPlaceholder = nullptr;
                 allocateBuffer(nullPlaceholder, newCapacity);
             } else if (m_string.is8Bit())
 …
 // Make 'length' additional capacity be available in m_buffer, update m_string & m_length,
 // return a pointer to the newly allocated storage.
+template <typename CharType>
+ALWAYS_INLINE CharType* StringBuilder::appendUninitialized(unsigned length)
+template<typename CharacterType> ALWAYS_INLINE CharacterType* StringBuilder::appendUninitialized(unsigned length)
+{
     ASSERT(length);
 …
         m_string = String();
         m_length = requiredLength;
         return getBufferCharacters<CharType>() + currentLength;
+    }
     return appendUninitializedSlow<CharType>(requiredLength);
+        return bufferCharacters<CharacterType>() + currentLength;
+    }
+    return appendUninitializedSlow<CharacterType>(requiredLength);
+}
 // Make 'length' additional capacity be available in m_buffer, update m_string & m_length,
 // return a pointer to the newly allocated storage.
+template <typename CharType>
+CharType* StringBuilder::appendUninitializedSlow(unsigned requiredLength)
+template<typename CharacterType> CharacterType* StringBuilder::appendUninitializedSlow(unsigned requiredLength)
+{
     ASSERT(requiredLength);
 …
         // If the buffer is valid it must be at least as long as the current builder contents!
         ASSERT(m_buffer->length() >= m_length);
+        reallocateBuffer<CharType>(expandedCapacity(capacity(), requiredLength));
+        reallocateBuffer<CharacterType>(expandedCapacity(capacity(), requiredLength));
     } else {
         ASSERT(m_string.length() == m_length);
         allocateBuffer(m_length ? m_string.characters<CharType>() : 0, expandedCapacity(capacity(), requiredLength));
+    }
     CharType* result = getBufferCharacters<CharType>() + m_length;
+        allocateBuffer(m_length ? m_string.characters<CharacterType>() : nullptr, expandedCapacity(capacity(), requiredLength));
+    }
+    auto* result = bufferCharacters<CharacterType>() + m_length;
     m_length = requiredLength;
     ASSERT(m_buffer->length() >= m_length);
 …
+}
+inline UChar* StringBuilder::appendUninitializedUpconvert(unsigned length)
+{
+    unsigned requiredLength = length + m_length;
+    if (requiredLength < length)
+        CRASH();
+    if (m_buffer) {
+        // If the buffer is valid it must be at least as long as the current builder contents!
+        ASSERT(m_buffer->length() >= m_length);
+        allocateBufferUpconvert(m_buffer->characters8(), expandedCapacity(capacity(), requiredLength));
+    } else {
+        ASSERT(m_string.length() == m_length);
+        allocateBufferUpconvert(m_string.isNull() ? nullptr : m_string.characters8(), expandedCapacity(capacity(), requiredLength));
+    }
+    auto* result = m_bufferCharacters16 + m_length;
+    m_length = requiredLength;
+    return result;
+}
 void StringBuilder::append(const UChar* characters, unsigned length)
+{
 …
     if (m_is8Bit) {
         if (length == 1 && !(*characters & ~0xff)) {
+        if (length == 1 && !(*characters & ~0xFF)) {
             // Append as 8 bit character
             LChar lChar = static_cast<LChar>(*characters);
 …
             return;
+        }
+        // Calculate the new size of the builder after appending.
+        unsigned requiredLength = length + m_length;
+        if (requiredLength < length)
+            CRASH();
+        if (m_buffer) {
+            // If the buffer is valid it must be at least as long as the current builder contents!
+            ASSERT(m_buffer->length() >= m_length);
+            allocateBufferUpConvert(m_buffer->characters8(), expandedCapacity(capacity(), requiredLength));
+        } else {
+            ASSERT(m_string.length() == m_length);
+            allocateBufferUpConvert(m_string.isNull() ? 0 : m_string.characters8(), expandedCapacity(capacity(), requiredLength));
+        }
+        memcpy(m_bufferCharacters16 + m_length, characters, static_cast<size_t>(length) * sizeof(UChar));
+        m_length = requiredLength;
+        memcpy(appendUninitializedUpconvert(length), characters, static_cast<size_t>(length) * sizeof(UChar));
     } else
         memcpy(appendUninitialized<UChar>(length), characters, static_cast<size_t>(length) * sizeof(UChar));
     ASSERT(m_buffer->length() >= m_length);
+}
 …
     if (m_is8Bit) {
+        LChar* dest = appendUninitialized<LChar>(length);
+        auto* destination = appendUninitialized<LChar>(length);
+        // FIXME: How did we determine a threshold of 8 here was the right one?
+        // Also, this kind of optimization could be useful anywhere else we have a
+        // performance-sensitive code path that calls memcpy.
         if (length > 8)
             memcpy(dest, characters, static_cast<size_t>(length) * sizeof(LChar));
+            memcpy(destination, characters, length);
         else {
             const LChar* end = characters + length;
             while (characters < end)
                 *(dest++) = *(characters++);
+                *destination++ = *characters++;
+        }
     } else {
         UChar* dest = appendUninitialized<UChar>(length);
+        auto* destination = appendUninitialized<UChar>(length);
         const LChar* end = characters + length;
         while (characters < end)
             *(dest++) = *(characters++);
+            *destination++ = *characters++;
+    }
+}
 …
+}
+template <typename OutputCharacterType, typename InputCharacterType>
+static void appendQuotedJSONStringInternal(OutputCharacterType*& output, const InputCharacterType* input, unsigned length)
+{
+    for (const InputCharacterType* end = input + length; input != end; ++input) {
+        if (LIKELY(*input > 0x1F)) {
+            if (*input == '"' || *input == '\\')
+template<typename LengthType, typename CharacterType> static LengthType quotedJSONStringLength(const CharacterType* input, unsigned length)
+{
+    LengthType quotedLength = 2;
+    for (unsigned i = 0; i < length; ++i) {
+        auto character = input[i];
+        if (LIKELY(character > 0x1F)) {
+            switch (character) {
+            case '"':
+            case '\\':
+                quotedLength += 2;
+                break;
+            default:
+                ++quotedLength;
+                break;
+            }
+        } else {
+            switch (character) {
+            case '\t':
+            case '\r':
+            case '\n':
+            case '\f':
+            case '\b':
+                quotedLength += 2;
+                break;
+            default:
+                quotedLength += 6;
+            }
+        }
+    }
+    return quotedLength;
+}
+template<typename CharacterType> static inline unsigned quotedJSONStringLength(const CharacterType* input, unsigned length)
+{
+    constexpr auto maxSafeLength = (std::numeric_limits<unsigned>::max() - 2) / 6;
+    if (length <= maxSafeLength)
+        return quotedJSONStringLength<unsigned>(input, length);
+    return quotedJSONStringLength<Checked<unsigned>>(input, length).unsafeGet();
+}
+template<typename OutputCharacterType, typename InputCharacterType> static inline void appendQuotedJSONStringInternal(OutputCharacterType* output, const InputCharacterType* input, unsigned length)
+{
+    *output++ = '"';
+    for (unsigned i = 0; i < length; ++i) {
+        auto character = input[i];
+        if (LIKELY(character > 0x1F)) {
+            if (UNLIKELY(character == '"' || character == '\\'))
                 *output++ = '\\';
             *output++ = *input;
+            *output++ = character;
             continue;
+        }
         switch (*input) {
+        switch (character) {
         case '\t':
             *output++ = '\\';
 …
             break;
         default:
+            ASSERT((*input & 0xFF00) == 0);
+            static const char hexDigits[] = "0123456789abcdef";
+            ASSERT(!(character & ~0xFF));
             *output++ = '\\';
             *output++ = 'u';
             *output++ = '0';
             *output++ = '0';
             *output++ = static_cast<LChar>(hexDigits[(*input >> 4) & 0xF]);
             *output++ = static_cast<LChar>(hexDigits[*input & 0xF]);
+            *output++ = upperNibbleToLowercaseASCIIHexDigit(character);
+            *output++ = lowerNibbleToLowercaseASCIIHexDigit(character);
             break;
+        }
+    }
+}
+void StringBuilder::appendQuotedJSONString(const String& string)
+{
+    // Make sure we have enough buffer space to append this string without having
+    // to worry about reallocating in the middle.
+    // The 2 is for the '"' quotes on each end.
+    // The 6 is for characters that need to be \uNNNN encoded.
+    Checked<unsigned> stringLength = string.length();
+    Checked<unsigned> maximumCapacityRequired = length();
+    maximumCapacityRequired += 2 + stringLength * 6;
+    unsigned allocationSize = maximumCapacityRequired.unsafeGet();
+    // This max() is here to allow us to allocate sizes between the range [2^31, 2^32 - 2] because roundUpToPowerOfTwo(1<<31 + some int smaller than 1<<31) == 0.
+    allocationSize = std::max(allocationSize, roundUpToPowerOfTwo(allocationSize));
+    if (is8Bit() && !string.is8Bit())
+        allocateBufferUpConvert(m_bufferCharacters8, allocationSize);
+    else
+        reserveCapacity(allocationSize);
+    ASSERT(m_buffer->length() >= allocationSize);
+    if (is8Bit()) {
+        ASSERT(string.is8Bit());
+        LChar* output = m_bufferCharacters8 + m_length;
+        *output++ = '"';
+        appendQuotedJSONStringInternal(output, string.characters8(), string.length());
+        *output++ = '"';
+        m_length = output - m_bufferCharacters8;
+    *output = '"';
+}
+void StringBuilder::appendQuotedJSONString(StringView string)
+{
+    unsigned length = string.length();
+    if (string.is8Bit()) {
+        auto* characters = string.characters8();
+        if (m_is8Bit)
+            appendQuotedJSONStringInternal(appendUninitialized<LChar>(quotedJSONStringLength(characters, length)), characters, length);
+        else
+            appendQuotedJSONStringInternal(appendUninitialized<UChar>(quotedJSONStringLength(characters, length)), characters, length);
     } else {
+        UChar* output = m_bufferCharacters16 + m_length;
+        *output++ = '"';
+        if (string.is8Bit())
+            appendQuotedJSONStringInternal(output, string.characters8(), string.length());
+        auto* characters = string.characters16();
+        if (m_is8Bit)
+            appendQuotedJSONStringInternal(appendUninitializedUpconvert(quotedJSONStringLength(characters, length)), characters, length);
         else
+            appendQuotedJSONStringInternal(output, string.characters16(), string.length());
+        *output++ = '"';
+        m_length = output - m_bufferCharacters16;
+    }
+    ASSERT(m_buffer->length() >= m_length);
+            appendQuotedJSONStringInternal(appendUninitialized<UChar>(quotedJSONStringLength(characters, length)), characters, length);
+    }
+}
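The new `appendQuotedJSONString(StringView)` above measures the exact quoted length first (`quotedJSONStringLength`) and only then writes into space obtained from `appendUninitialized`, rather than reserving the worst case of `2 + 6 * length`. The sketch below shows that two-pass idea in standard C++ only. It is an illustration under assumptions, not WTF code: `quotedJSONLength` and `appendQuotedJSON` are hypothetical names, and `std::string` stands in for `StringBuilder` and its 8-bit buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pass 1: exact length of the quoted form, including the two surrounding quotes.
// (Hypothetical helper; mirrors the counting rules in the diff above.)
std::size_t quotedJSONLength(const std::string& input)
{
    std::size_t quotedLength = 2;
    for (unsigned char character : input) {
        if (character > 0x1F) {
            // Quote and backslash need a one-character escape prefix.
            quotedLength += (character == '"' || character == '\\') ? 2 : 1;
            continue;
        }
        switch (character) {
        case '\t':
        case '\r':
        case '\n':
        case '\f':
        case '\b':
            quotedLength += 2; // short escapes such as \n
            break;
        default:
            quotedLength += 6; // \u00XX
        }
    }
    return quotedLength;
}

// Pass 2: reserve exactly what pass 1 computed, then write the quoted form.
void appendQuotedJSON(std::string& output, const std::string& input)
{
    static const char hexDigits[] = "0123456789abcdef";
    const std::size_t start = output.size();
    const std::size_t expected = quotedJSONLength(input);
    output.reserve(start + expected);
    output += '"';
    for (unsigned char character : input) {
        if (character > 0x1F) {
            if (character == '"' || character == '\\')
                output += '\\';
            output += static_cast<char>(character);
            continue;
        }
        output += '\\';
        switch (character) {
        case '\t': output += 't'; break;
        case '\r': output += 'r'; break;
        case '\n': output += 'n'; break;
        case '\f': output += 'f'; break;
        case '\b': output += 'b'; break;
        default:
            output += "u00";
            output += hexDigits[character >> 4];
            output += hexDigits[character & 0xF];
        }
    }
    output += '"';
    // Both passes must agree, otherwise the reservation would be wrong.
    assert(output.size() - start == expected);
}
```

The sketch omits the overflow guard that the real code applies through `Checked<unsigned>` for very long inputs, because `std::size_t` arithmetic on an in-memory `std::string` is assumed not to overflow here.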

trunk/Source/WTF/wtf/text/StringBuilder.h

-              r205847
+              r209058
 /*
  * Copyright (C) 2009-2010, 2012-2013, 2016 Apple Inc. All rights reserved.
+ * Copyright (C) 2009-2016 Apple Inc. All rights reserved.
  * Copyright (C) 2012 Google Inc. All rights reserved.
+ *
 …
  */
+#ifndef StringBuilder_h
+#define StringBuilder_h
+#include <wtf/text/AtomicString.h>
+#pragma once
 #include <wtf/text/StringView.h>
-#include <wtf/text/WTFString.h>
 namespace WTF {
 class StringBuilder {
     // Disallow copying since it's expensive and we don't want code to do it by accident.
+    // Disallow copying since it's expensive and we don't want anyone to do it by accident.
     WTF_MAKE_NONCOPYABLE(StringBuilder);
 public:
+    StringBuilder()
+        : m_length(0)
+        , m_is8Bit(true)
+        , m_bufferCharacters8(0)
+    {
+    }
+    StringBuilder() = default;
     WTF_EXPORT_PRIVATE void append(const UChar*, unsigned);
 …
     ALWAYS_INLINE void append(const char* characters, unsigned length) { append(reinterpret_cast<const LChar*>(characters), length); }
+    void append(const AtomicString& atomicString)
+    {
+        append(atomicString.string());
+    }
+    void append(const AtomicString& atomicString) { append(atomicString.string()); }
     void append(const String& string)
+    {
+        if (!string.length())
+            return;
+        // If we're appending to an empty string, and there is not a buffer (reserveCapacity has not been called)
+        // then just retain the string.
+        unsigned length = string.length();
+        if (!length)
+            return;
+        // If we're appending to an empty string, and there is not a buffer
+        // (reserveCapacity has not been called) then just retain the string.
         if (!m_length && !m_buffer) {
             m_string = string;
             m_length = string.length();
             m_is8Bit = m_string.is8Bit();
+            m_length = length;
+            m_is8Bit = string.is8Bit();
             return;
+        }
         if (string.is8Bit())
             append(string.characters8(), string.length());
+            append(string.characters8(), length);
         else
             append(string.characters16(), string.length());
+            append(string.characters16(), length);
+    }
 …
             return;
         // If we're appending to an empty string, and there is not a buffer (reserveCapacity has not been called)
         // then just retain the string.
+        // If we're appending to an empty string, and there is not a buffer
+        // (reserveCapacity has not been called) then just retain the string.
         if (!m_length && !m_buffer && !other.m_string.isNull()) {
             m_string = other.m_string;
             m_length = other.m_length;
+            m_is8Bit = other.m_is8Bit;
             return;
+        }
 …
     WTF_EXPORT_PRIVATE void append(CFStringRef);
 #endif
 #if USE(CF) && defined(__OBJC__)
     void append(NSString *string) { append((__bridge CFStringRef)string); }
 …
     void append(const String& string, unsigned offset, unsigned length)
+    {
+        if (!string.length())
+            return;
+        if ((offset + length) > string.length())
+            return;
+        ASSERT(offset <= string.length());
+        ASSERT(offset + length <= string.length());
+        if (!length)
+            return;
+        // If we're appending to an empty string, and there is not a buffer
+        // (reserveCapacity has not been called) then just retain the string.
+        if (!offset && !m_length && !m_buffer && length == string.length()) {
+            m_string = string;
+            m_length = length;
+            m_is8Bit = string.is8Bit();
+            return;
+        }
         if (string.is8Bit())
 …
+    }
     void append(UChar c)
+    void append(UChar character)
+    {
         if (m_buffer && m_length < m_buffer->length() && m_string.isNull()) {
             if (!m_is8Bit) {
                 m_bufferCharacters16[m_length++] = c;
+                m_bufferCharacters16[m_length++] = character;
                 return;
+            }
+            if (!(c & ~0xff)) {
+                m_bufferCharacters8[m_length++] = static_cast<LChar>(c);
+            if (!(character & ~0xFF)) {
+                m_bufferCharacters8[m_length++] = static_cast<LChar>(character);
                 return;
+            }
+        }
         append(&c, 1);
+    }
     void append(LChar c)
+        append(&character, 1);
+    }
+    void append(LChar character)
+    {
         if (m_buffer && m_length < m_buffer->length() && m_string.isNull()) {
             if (m_is8Bit)
                 m_bufferCharacters8[m_length++] = c;
+                m_bufferCharacters8[m_length++] = character;
             else
                 m_bufferCharacters16[m_length++] = c;
+                m_bufferCharacters16[m_length++] = character;
         } else
+            append(&c, 1);
+    }
+    void append(char c)
+    {
+        append(static_cast<LChar>(c));
+    }
+            append(&character, 1);
+    }
+    void append(char character) { append(static_cast<LChar>(character)); }
     void append(UChar32 c)
 …
+    }
+    WTF_EXPORT_PRIVATE void appendQuotedJSONString(const String&);
+    template<unsigned charactersCount>
+    ALWAYS_INLINE void appendLiteral(const char (&characters)[charactersCount]) { append(characters, charactersCount - 1); }
+    WTF_EXPORT_PRIVATE void appendQuotedJSONString(StringView);
+    template<unsigned charactersCount> ALWAYS_INLINE void appendLiteral(const char (&characters)[charactersCount]) { append(characters, charactersCount - 1); }
     WTF_EXPORT_PRIVATE void appendNumber(int);
 …
+    }
+    unsigned length() const
+    {
+        return m_length;
+    }
+    unsigned length() const { return m_length; }
     bool isEmpty() const { return !m_length; }
     WTF_EXPORT_PRIVATE void reserveCapacity(unsigned newCapacity);
+    unsigned capacity() const
+    {
+        return m_buffer ? m_buffer->length() : m_length;
+    }
+    unsigned capacity() const { return m_buffer ? m_buffer->length() : m_length; }
     WTF_EXPORT_PRIVATE void resize(unsigned newSize);
     WTF_EXPORT_PRIVATE bool canShrink() const;
     WTF_EXPORT_PRIVATE void shrinkToFit();
 …
     void allocateBuffer(const LChar* currentCharacters, unsigned requiredLength);
     void allocateBuffer(const UChar* currentCharacters, unsigned requiredLength);
+    void allocateBufferUpConvert(const LChar* currentCharacters, unsigned requiredLength);
+    template <typename CharType>
+    void reallocateBuffer(unsigned requiredLength);
+    template <typename CharType>
+    ALWAYS_INLINE CharType* appendUninitialized(unsigned length);
+    template <typename CharType>
+    CharType* appendUninitializedSlow(unsigned length);
+    template <typename CharType>
+    ALWAYS_INLINE CharType * getBufferCharacters();
+    void allocateBufferUpconvert(const LChar* currentCharacters, unsigned requiredLength);
+    template<typename CharacterType> void reallocateBuffer(unsigned requiredLength);
+    UChar* appendUninitializedUpconvert(unsigned length);
+    template<typename CharacterType> CharacterType* appendUninitialized(unsigned length);
+    template<typename CharacterType> CharacterType* appendUninitializedSlow(unsigned length);
+    template<typename CharacterType> CharacterType* bufferCharacters();
     WTF_EXPORT_PRIVATE void reifyString() const;
     unsigned m_length;
+    unsigned m_length { 0 };
     mutable String m_string;
     RefPtr<StringImpl> m_buffer;
     bool m_is8Bit;
+    bool m_is8Bit { true };
     union {
         LChar* m_bufferCharacters8;
+        LChar* m_bufferCharacters8 { nullptr };
         UChar* m_bufferCharacters16;
     };
 };
+template <>
+ALWAYS_INLINE LChar* StringBuilder::getBufferCharacters<LChar>()
+{
+    ASSERT(m_is8Bit);
+    return m_bufferCharacters8;
+}
+template <>
+ALWAYS_INLINE UChar* StringBuilder::getBufferCharacters<UChar>()
+{
+    ASSERT(!m_is8Bit);
+    return m_bufferCharacters16;
+}
+template <typename CharType>
+bool equal(const StringBuilder& s, const CharType* buffer, unsigned length)
+template<typename StringType> bool equal(const StringBuilder&, const StringType&);
+template<typename CharacterType> bool equal(const StringBuilder&, const CharacterType*, unsigned length);
+bool operator==(const StringBuilder&, const StringBuilder&);
+bool operator!=(const StringBuilder&, const StringBuilder&);
+bool operator==(const StringBuilder&, const String&);
+bool operator!=(const StringBuilder&, const String&);
+bool operator==(const String&, const StringBuilder&);
+bool operator!=(const String&, const StringBuilder&);
+template<typename CharacterType> inline bool equal(const StringBuilder& s, const CharacterType* buffer, unsigned length)
+{
     if (s.length() != length)
 …
+}
+template <typename StringType>
+bool equal(const StringBuilder& a, const StringType& b)
+template<typename StringType> inline bool equal(const StringBuilder& a, const StringType& b)
+{
+    if (a.length() != b.length())
+        return false;
+    if (!a.length())
+        return true;
+    if (a.is8Bit()) {
+        if (b.is8Bit())
+            return equal(a.characters8(), b.characters8(), a.length());
+        return equal(a.characters8(), b.characters16(), a.length());
+    }
+    if (b.is8Bit())
+        return equal(a.characters16(), b.characters8(), a.length());
+    return equal(a.characters16(), b.characters16(), a.length());
+    return equalCommon(a, b);
+}
 …
 using WTF::StringBuilder;
-#endif // StringBuilder_h
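The `append(const String&)` overloads above keep a fast path: when the builder is empty and has no buffer, they retain the incoming string instead of copying its characters. A rough sketch of that idea in standard C++ follows. `TinyBuilder` and its members are hypothetical, `std::shared_ptr<const std::string>` stands in for a reference-counted `String`, and the "reserveCapacity has not been called" condition from the real code is not modeled.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for StringBuilder; not the WTF class.
class TinyBuilder {
public:
    void append(const std::shared_ptr<const std::string>& string)
    {
        if (!string || string->empty())
            return;
        // Empty builder with no buffer yet: retain the string, copy nothing.
        if (!m_retained && m_buffer.empty()) {
            m_retained = string;
            return;
        }
        materialize();
        m_buffer += *string;
    }

    // True while the builder still shares the caller's string.
    bool isRetaining(const std::shared_ptr<const std::string>& string) const
    {
        return m_retained && m_retained == string;
    }

    std::string toString() const { return m_retained ? *m_retained : m_buffer; }

private:
    // Copy the retained string into the private buffer before mutating.
    void materialize()
    {
        if (!m_retained)
            return;
        m_buffer = *m_retained;
        m_retained.reset();
    }

    std::shared_ptr<const std::string> m_retained;
    std::string m_buffer;
};
```

A builder that receives exactly one string never allocates its own buffer in this sketch; the copy is deferred until a second append makes it unavoidable.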

trunk/Source/WebCore/ChangeLog

-              r209050
+              r209058
+-11-28  Darin Adler  <darin@apple.com>
+        Streamline and speed up tokenizer and segmented string classes
+        https://bugs.webkit.org/show_bug.cgi?id=165003
+        Reviewed by Sam Weinig.
+        Profiling Speedometer on my iMac showed the tokenizer as one of the
+        hottest functions. This patch streamlines the segmented string class,
+        removing various unused features, and also improves some other functions
+        seen on the Speedometer profile. On my iMac I measured a speedup of
+        about 3%. Changes include:
+        - Removed m_pushedChar1, m_pushedChar2, and m_empty data members from the
+          SegmentedString class and all the code that used to handle them.
+        - Simplified the SegmentedString advance functions so they are small
+          enough to get inlined in the HTML tokenizer.
+        - Updated callers to call the simpler SegmentedString advance functions
+          that don't handle newlines in as many cases as possible.
+        - Cut down on allocations of SegmentedString and made code move the
+          segmented string and the strings that are moved into it rather than
+          copying them whenever possible.
+        - Simplified segmented string functions, removing some branches, mostly
+          from the non-fast paths.
+        - Removed small unused functions and small functions used in only one
+          or two places, made more functions private and renamed for clarity.
+        * bindings/js/JSHTMLDocumentCustom.cpp:
+        (WebCore::documentWrite): Moved a little more of the common code in here
+        from the two functions below. Removed obsolete comment saying this was not
+        following the DOM specification because it is. Removed unneeded special
+        cases for 1 argument and no arguments. Take a reference instead of a pointer.
+        (WebCore::JSHTMLDocument::write): Updated for above.
+        (WebCore::JSHTMLDocument::writeln): Ditto.
+        * css/parser/CSSTokenizer.cpp: Added now-needed include.
+        * css/parser/CSSTokenizer.h: Removed unneeded include.
+        * css/parser/CSSTokenizerInputStream.h: Added definition of kEndOfFileMarker
+        here; this is now separate from the use in the HTMLParser. In the long run,
+        unclear to me whether it is really needed in either.
+        * dom/Document.cpp:
+        (WebCore::Document::prepareToWrite): Added. Helper function used by the three
+        different variants of write. Using this may prevent us from having to construct
+        a SegmentedString just to append one string after future refactoring.
+        (WebCore::Document::write): Updated to take an rvalue reference and move the
+        value through.
+        (WebCore::Document::writeln): Use a single write call instead of two.
+        * dom/Document.h: Changed write to take an rvalue reference to SegmentedString
+        rather than a const reference.
+        * dom/DocumentParser.h: Changed insert to take an rvalue reference to
+        SegmentedString. In the future, should probably overload to take a single
+        string since that is the normal case.
+        * dom/RawDataDocumentParser.h: Updated for change to DocumentParser.
+        * html/FTPDirectoryDocument.cpp:
+        (WebCore::FTPDirectoryDocumentParser::append): Refactored a bit, just enough
+        so that we don't need an assignment operator for SegmentedString that can
+        copy a String.
+        * html/parser/HTMLDocumentParser.cpp:
+        (WebCore::HTMLDocumentParser::insert): Updated to take an rvalue reference,
+        and move the value through.
+        * html/parser/HTMLDocumentParser.h: Updated for the above.
+        * html/parser/HTMLEntityParser.cpp:
+        (WebCore::HTMLEntityParser::consumeNamedEntity): Updated for name changes.
+        Changed the two calls to advance here to call advancePastNonNewline; no
+        change in behavior, but asserts what the code was assuming before, that the
+        character was not a newline.
+        * html/parser/HTMLInputStream.h:
+        (WebCore::HTMLInputStream::appendToEnd): Updated to take an rvalue reference,
+        and move the value through.
+        (WebCore::HTMLInputStream::insertAtCurrentInsertionPoint): Ditto.
+        (WebCore::HTMLInputStream::markEndOfFile): Removed the code to construct a
+        SegmentedString, overkill since we can just append an individual string.
+        (WebCore::HTMLInputStream::splitInto): Rewrote the move idiom here to actually
+        use move, which will reduce reference count churn and other unneeded work.
+        * html/parser/HTMLMetaCharsetParser.cpp:
+        (WebCore::HTMLMetaCharsetParser::checkForMetaCharset): Removed unneeded
+        construction of a SegmentedString, just to append a string.
+        * html/parser/HTMLSourceTracker.cpp:
+        (WebCore::HTMLSourceTracker::HTMLSourceTracker): Moved to the class definition.
+        (WebCore::HTMLSourceTracker::source): Updated for function name change.
+        * html/parser/HTMLSourceTracker.h: Updated for above.
+        * html/parser/HTMLTokenizer.cpp: Added now-needed include.
+        (WebCore::HTMLTokenizer::emitAndResumeInDataState): Use advancePastNonNewline,
+        since this function is never called in response to a newline character.
+        (WebCore::HTMLTokenizer::commitToPartialEndTag): Ditto.
+        (WebCore::HTMLTokenizer::commitToCompleteEndTag): Ditto.
+        (WebCore::HTMLTokenizer::processToken): Use ADVANCE_PAST_NON_NEWLINE_TO macro
+        instead of ADVANCE_TO in cases where the character we are advancing past is
+        known not to be a newline, so we can use the more efficient advance function
+        that doesn't check for the newline character.
+        * html/parser/InputStreamPreprocessor.h: Moved kEndOfFileMarker to
+        SegmentedString.h; not sure that's a good place for it either. In the long run,
+        unclear to me whether this is really needed.
+        (WebCore::InputStreamPreprocessor::peek): Added UNLIKELY for the empty check.
+        Added LIKELY for the not-special character check.
+        (WebCore::InputStreamPreprocessor::advance): Updated for the new name of the
+        advanceAndUpdateLineNumber function.
+        (WebCore::InputStreamPreprocessor::advancePastNonNewline): Added. More
+        efficient than advance for cases where the last character is known not to be
+        a newline character.
+        (WebCore::InputStreamPreprocessor::skipNextNewLine): Deleted. Was unused.
+        (WebCore::InputStreamPreprocessor::reset): Deleted. Was unused except in the
+        constructor; added initial values for the data members to replace.
+        (WebCore::InputStreamPreprocessor::processNextInputCharacter): Removed long
+        FIXME comment that didn't really need to be here. Reorganized a bit.
+        (WebCore::InputStreamPreprocessor::isAtEndOfFile): Renamed and made static.
+        * html/track/BufferedLineReader.cpp:
+        (WebCore::BufferedLineReader::nextLine): Updated to not use the poorly named
+        scanCharacter function to advance past a newline. Also renamed from getLine
+        and changed to return Optional<String> instead of using a boolean to indicate
+        failure and an out argument.
+        * html/track/BufferedLineReader.h:
+        (WebCore::BufferedLineReader::BufferedLineReader): Use the default, putting
+        initial values on each data member below.
+        (WebCore::BufferedLineReader::append): Updated to take an rvalue reference,
+        and move the value through.
+        (WebCore::BufferedLineReader::scanCharacter): Deleted. Was poorly named,
+        and easy to replace with two lines of code at its two call sites.
+        (WebCore::BufferedLineReader::reset): Rewrote to correctly clear all the
+        data members of the class, not just the segmented string.
+        * html/track/InbandGenericTextTrack.cpp:
+        (WebCore::InbandGenericTextTrack::parseWebVTTFileHeader): Updated to take
+        an rvalue reference and move the value through.
+        * html/track/InbandGenericTextTrack.h: Updated for the above.
+        * html/track/InbandTextTrack.h: Updated since parseWebVTTFileHeader now
+        takes an rvalue reference.
+        * html/track/WebVTTParser.cpp:
+        (WebCore::WebVTTParser::parseFileHeader): Updated to take an rvalue reference
+        and move the value through.
+        (WebCore::WebVTTParser::parseBytes): Updated to pass ownership of the string
+        in to the line reader append function.
+        (WebCore::WebVTTParser::parseCueData): Use auto and WTFMove for WebVTTCueData.
+        (WebCore::WebVTTParser::flush): More of the same.
+        (WebCore::WebVTTParser::parse): Changed to use nextLine instead of getLine.
+        * html/track/WebVTTParser.h: Updated for the above.
+        * html/track/WebVTTTokenizer.cpp:
+        (WebCore::advanceAndEmitToken): Use advanceAndUpdateLineNumber by its new
+        name, just advance. No change in behavior.
+        (WebCore::WebVTTTokenizer::WebVTTTokenizer): Pass a String, not a
+        SegmentedString, to add the end of file marker.
+        * platform/graphics/InbandTextTrackPrivateClient.h: Updated since
+        parseWebVTTFileHeader takes an rvalue reference.
+        * platform/text/SegmentedString.cpp:
+        (WebCore::SegmentedString::Substring::appendTo): Moved here from the header.
+        The only caller is SegmentedString::toString, inside this file.
+        (WebCore::SegmentedString::SegmentedString): Deleted the copy constructor.
+        No longer needed.
+        (WebCore::SegmentedString::operator=): Defined a move assignment operator
+        rather than an ordinary assignment operator, since that's what the call
+        sites really need.
+        (WebCore::SegmentedString::length): Simplified since we no longer need to
+        support pushed characters.
+        (WebCore::SegmentedString::setExcludeLineNumbers): Simplified, since we
+        can just iterate m_otherSubstrings without an extra check. Also changed to
+        write directly to the data member of Substring instead of using a function.
+        (WebCore::SegmentedString::updateAdvanceFunctionPointersForEmptyString):
+        Added. Used when we run out of characters.
+        (WebCore::SegmentedString::clear): Removed code to clear now-deleted members.
+        Updated for changes to other member names.
+        (WebCore::SegmentedString::appendSubstring): Renamed from just append to
+        avoid ambiguity with the public append function. Changed to take an rvalue
+        reference, and move in, and added code to set m_currentCharacter properly,
+        so the caller doesn't have to deal with that.
+        (WebCore::SegmentedString::close): Updated to use m_isClosed by its new name.
+        Also removed unneeded comment about assertion that fires when trying to close
+        an already closed string.
+        (WebCore::SegmentedString::append): Added overloads for rvalue references of
+        both entire SegmentedString objects and of String. Streamlined to just call
+        appendSubstring and append to the deque.
+        (WebCore::SegmentedString::pushBack): Tightened up since we don't allow empty
+        strings and changed to take just a string, not an entire segmented string.
+        (WebCore::SegmentedString::advanceSubstring): Moved logic into the
+        advancePastSingleCharacterSubstringWithoutUpdatingLineNumber function.
+        (WebCore::SegmentedString::toString): Simplified now that we don't need to
+        support pushed characters.
+        (WebCore::SegmentedString::advancePastNonNewlines): Deleted.
+        (WebCore::SegmentedString::advance8): Deleted.
+        (WebCore::SegmentedString::advanceWithoutUpdatingLineNumber16): Renamed from
+        advance16. Simplified now that there are no pushed characters. Also changed to
+        access data members of m_currentSubstring directly instead of calling a function.
+        (WebCore::SegmentedString::advanceAndUpdateLineNumber8): Deleted.
+        (WebCore::SegmentedString::advanceAndUpdateLineNumber16): Ditto.
+        (WebCore::SegmentedString::advancePastSingleCharacterSubstringWithoutUpdatingLineNumber):
+        Renamed from advanceSlowCase. Removed unneeded logic to handle pushed characters.
+        Moved code in here from advanceSubstring.
+        (WebCore::SegmentedString::advancePastSingleCharacterSubstring): Renamed from
+        advanceAndUpdateLineNumberSlowCase. Simplified by calling the function above.
+        (WebCore::SegmentedString::advanceEmpty): Broke assertion up into two.
+        (WebCore::SegmentedString::updateSlowCaseFunctionPointers): Updated for name changes.
+        (WebCore::SegmentedString::advancePastSlowCase): Changed name and meaning of
+        boolean argument. Rewrote to use the String class less; it's now used only when
+        we fail to match after the first character rather than being used for the actual
+        comparison with the literal.
+        * platform/text/SegmentedString.h: Moved all non-trivial function bodies out of
+        the class definition to make things easier to read. Moved the SegmentedSubstring
+        class inside the SegmentedString class, making it a private struct named Substring.
+        Removed the m_ prefix from data members of the struct, removed many functions from
+        the struct and made its union be anonymous instead of naming it m_data. Removed
+        unneeded StringBuilder.h include.
+        (WebCore::SegmentedString::isEmpty): Changed to use the length of the substring
+        instead of a separate boolean. We never create an empty substring, nor leave one
+        in place as the current substring unless the entire segmented string is empty.
+        (WebCore::SegmentedString::advancePast): Updated to use the new member function
+        template instead of a non-template member function. The new member function is
+        entirely rewritten and does the matching directly rather than allocating a string
+        just to do prefix matching.
+        (WebCore::SegmentedString::advancePastLettersIgnoringASCIICase): Renamed to make
+        it clear that the literal must be all non-letters or lowercase letters as with
+        the other "letters ignoring ASCII case" functions. The three call sites all fit
+        the bill. Implement by calling the new function template.
+        (WebCore::SegmentedString::currentCharacter): Renamed from currentChar.
+        (WebCore::SegmentedString::Substring::Substring): Use an rvalue reference and
+        move the string in.
+        (WebCore::SegmentedString::Substring::currentCharacter): Simplified since this
+        is never used on an empty substring.
+        (WebCore::SegmentedString::Substring::incrementAndGetCurrentCharacter): Ditto.
+        (WebCore::SegmentedString::SegmentedString): Overload to take an rvalue reference.
+        Simplified since there are now fewer data members.
+        (WebCore::SegmentedString::advanceWithoutUpdatingLineNumber): Renamed from
+        advance, since this is only safe to use if there is some reason it is OK to skip
+        updating the line number.
+        (WebCore::SegmentedString::advance): Renamed from advanceAndUpdateLineNumber,
+        since doing that is the normal desired behavior and not worth mentioning in the
+        public function name.
+        (WebCore::SegmentedString::advancePastNewline): Renamed from
+        advancePastNewlineAndUpdateLineNumber.
+        (WebCore::SegmentedString::numberOfCharactersConsumed): Greatly simplified since
+        pushed characters are no longer supported.
+        (WebCore::SegmentedString::characterMismatch): Added. Used by advancePast.
+        * xml/parser/CharacterReferenceParserInlines.h:
+        (WebCore::unconsumeCharacters): Use toString rather than toStringPreserveCapacity
+        because the SegmentedString is going to take ownership of the string.
+        (WebCore::consumeCharacterReference): Updated to use the pushBack that takes just
+        a String, not a SegmentedString. Also use advancePastNonNewline.
+        * xml/parser/MarkupTokenizerInlines.h: Added ADVANCE_PAST_NON_NEWLINE_TO.
+        * xml/parser/XMLDocumentParser.cpp:
+        (WebCore::XMLDocumentParser::insert): Updated since this takes an rvalue reference.
+        (WebCore::XMLDocumentParser::append): Removed unnecessary code to create a
+        SegmentedString.
+        * xml/parser/XMLDocumentParser.h: Updated for above. Also fixed indentation
+        and initialized most data members.
+        * xml/parser/XMLDocumentParserLibxml2.cpp:
+        (WebCore::XMLDocumentParser::XMLDocumentParser): Moved most data member
+        initialization into the class definition.
+        (WebCore::XMLDocumentParser::resumeParsing): Removed code that copied a
+        segmented string, but converted the whole thing into a string before using it.
+        Now we convert to a string right away.
 -11-28  Chris Dumez  <cdumez@apple.com>
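Several ChangeLog entries above describe the same few ideas: `append` takes an rvalue reference and moves the string in, `isEmpty` is derived from the substrings rather than a separate boolean, and `advancePastNonNewline` skips the newline check that `advance` performs. The sketch below illustrates those ideas in standard C++. `TinySegmentedString` is a hypothetical toy, not `WebCore::SegmentedString`, and it leaves out the 8-bit/16-bit split, the function-pointer dispatch, and everything else in the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical toy model of a segmented string; not the WebCore class.
class TinySegmentedString {
public:
    // Takes ownership of the segment instead of copying it.
    void append(std::string&& segment)
    {
        if (segment.empty())
            return; // never store an empty substring
        m_segments.push_back(std::move(segment));
    }

    // No separate "empty" flag: emptiness follows from the stored substrings.
    bool isEmpty() const { return m_segments.empty(); }

    char currentCharacter() const
    {
        assert(!isEmpty());
        return m_segments.front()[m_offset];
    }

    // General case: checks for a newline and updates the line number.
    void advance()
    {
        if (currentCharacter() == '\n')
            ++m_line;
        step();
    }

    // Cheaper variant for callers that know the character is not a newline.
    void advancePastNonNewline()
    {
        assert(currentCharacter() != '\n');
        step();
    }

    unsigned line() const { return m_line; }

private:
    void step()
    {
        if (++m_offset == m_segments.front().size()) {
            m_segments.pop_front();
            m_offset = 0;
        }
    }

    std::deque<std::string> m_segments;
    std::size_t m_offset { 0 };
    unsigned m_line { 1 };
};
```

The assertion in `advancePastNonNewline` plays the role the ChangeLog describes for `HTMLEntityParser::consumeNamedEntity`: it documents, and checks in debug builds, what the caller was already assuming.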

trunk/Source/WebCore/bindings/js/JSHTMLDocumentCustom.cpp

-              r208112
+              r209058
 /*
  * Copyright (C) 2007-2009, 2016 Apple Inc. All rights reserved.
+ * Copyright (C) 2007-2016 Apple Inc. All rights reserved.
+ *
  * Redistribution and use in source and binary forms, with or without
 …
 #include "JSHTMLDocument.h"
-#include "Frame.h"
-#include "HTMLCollection.h"
-#include "HTMLDocument.h"
-#include "HTMLElement.h"
 #include "HTMLIFrameElement.h"
-#include "HTMLNames.h"
-#include "JSDOMWindow.h"
 #include "JSDOMWindowCustom.h"
-#include "JSDOMWindowShell.h"
-#include "JSDocumentCustom.h"
 #include "JSHTMLCollection.h"
-#include "JSMainThreadExecState.h"
 #include "SegmentedString.h"
-#include "DocumentParser.h"
-#include <interpreter/StackVisitor.h>
-#include <runtime/Error.h>
-#include <runtime/JSCell.h>
-#include <wtf/unicode/CharacterNames.h>
 using namespace JSC;
 …
+{
     auto& document = passedDocument.get();
-    JSObject* wrapper = createWrapper<HTMLDocument>(globalObject, WTFMove(passedDocument));
+    auto* wrapper = createWrapper<HTMLDocument>(globalObject, WTFMove(passedDocument));
     reportMemoryForDocumentIfFrameless(*state, document);
     return wrapper;
+}
 …
+}
-bool JSHTMLDocument::getOwnPropertySlot(JSObject* object, ExecState* exec, PropertyName propertyName, PropertySlot& slot)
+bool JSHTMLDocument::getOwnPropertySlot(JSObject* object, ExecState* state, PropertyName propertyName, PropertySlot& slot)
+{
-    JSHTMLDocument* thisObject = jsCast<JSHTMLDocument*>(object);
-    ASSERT_GC_OBJECT_INHERITS(thisObject, info());
+    auto& thisObject = *jsCast<JSHTMLDocument*>(object);
+    ASSERT_GC_OBJECT_INHERITS((&thisObject), info());
     if (propertyName == "open") {
-        if (Base::getOwnPropertySlot(thisObject, exec, propertyName, slot))
+        if (Base::getOwnPropertySlot(&thisObject, state, propertyName, slot))
             return true;
-        slot.setCustom(thisObject, ReadOnly | DontDelete | DontEnum, nonCachingStaticFunctionGetter<jsHTMLDocumentPrototypeFunctionOpen, 2>);
+        slot.setCustom(&thisObject, ReadOnly | DontDelete | DontEnum, nonCachingStaticFunctionGetter<jsHTMLDocumentPrototypeFunctionOpen, 2>);
         return true;
+    }
     JSValue value;
-    if (thisObject->nameGetter(exec, propertyName, value)) {
-        slot.setValue(thisObject, ReadOnly | DontDelete | DontEnum, value);
+    if (thisObject.nameGetter(state, propertyName, value)) {
+        slot.setValue(&thisObject, ReadOnly | DontDelete | DontEnum, value);
         return true;
+    }
-    return Base::getOwnPropertySlot(thisObject, exec, propertyName, slot);
+    return Base::getOwnPropertySlot(&thisObject, state, propertyName, slot);
+}
-bool JSHTMLDocument::nameGetter(ExecState* exec, PropertyName propertyName, JSValue& value)
+bool JSHTMLDocument::nameGetter(ExecState* state, PropertyName propertyName, JSValue& value)
+{
     auto& document = wrapped();
-    AtomicStringImpl* atomicPropertyName = propertyName.publicName();
+    auto* atomicPropertyName = propertyName.publicName();
     if (!atomicPropertyName || !document.hasDocumentNamedItem(*atomicPropertyName))
         return false;
     if (UNLIKELY(document.documentNamedItemContainsMultipleElements(*atomicPropertyName))) {
-        Ref<HTMLCollection> collection = document.documentNamedItems(atomicPropertyName);
+        auto collection = document.documentNamedItems(atomicPropertyName);
         ASSERT(collection->length() > 1);
-        value = toJS(exec, globalObject(), collection);
+        value = toJS(state, globalObject(), collection);
         return true;
+    }
-    Element& element = *document.documentNamedItem(*atomicPropertyName);
+    auto& element = *document.documentNamedItem(*atomicPropertyName);
     if (UNLIKELY(is<HTMLIFrameElement>(element))) {
-        if (Frame* frame = downcast<HTMLIFrameElement>(element).contentFrame()) {
-            value = toJS(exec, frame);
+        if (auto* frame = downcast<HTMLIFrameElement>(element).contentFrame()) {
+            value = toJS(state, frame);
             return true;
+        }
+    }
-    value = toJS(exec, globalObject(), element);
+    value = toJS(state, globalObject(), element);
     return true;
+}
 …
+{
     // If "all" has been overwritten, return the overwritten value
-    JSValue v = getDirect(state.vm(), Identifier::fromString(&state, "all"));
-    if (v)
-        return v;
+    if (auto overwrittenValue = getDirect(state.vm(), Identifier::fromString(&state, "all")))
+        return overwrittenValue;
     return toJS(&state, globalObject(), wrapped().all());
 …
+}
-static Document* findCallingDocument(ExecState& state)
+static inline Document* findCallingDocument(ExecState& state)
+{
     CallerFunctor functor;
     state.iterate(functor);
-    CallFrame* callerFrame = functor.callerFrame();
+    auto* callerFrame = functor.callerFrame();
     if (!callerFrame)
         return nullptr;
-    return asJSDOMWindow(functor.callerFrame()->lexicalGlobalObject())->wrapped().document();
+    return asJSDOMWindow(callerFrame->lexicalGlobalObject())->wrapped().document();
+}
 …
     // For compatibility with other browsers, pass open calls with more than 2 parameters to the window.
     if (state.argumentCount() > 2) {
-        if (Frame* frame = wrapped().frame()) {
-            JSDOMWindowShell* wrapper = toJSDOMWindowShell(frame, currentWorld(&state));
-            if (wrapper) {
-                JSValue function = wrapper->get(&state, Identifier::fromString(&state, "open"));
+        if (auto* frame = wrapped().frame()) {
+            if (auto* wrapper = toJSDOMWindowShell(frame, currentWorld(&state))) {
+                auto function = wrapper->get(&state, Identifier::fromString(&state, "open"));
                 CallData callData;
-                CallType callType = ::getCallData(function, callData);
+                auto callType = ::getCallData(function, callData);
                 if (callType == CallType::None)
                     return throwTypeError(&state, scope);
 …
+    }
-    // document.open clobbers the security context of the document and
-    // aliases it with the active security context.
-    Document* activeDocument = asJSDOMWindow(state.lexicalGlobalObject())->wrapped().document();
+    // In the case of two parameters or fewer, do a normal document open.
-    wrapped().open(activeDocument);
+    // Calling document.open clobbers the security context of the document and aliases it with the active security context.
+    // FIXME: Is it correct that this does not use findCallingDocument as the write function below does?
+    wrapped().open(asJSDOMWindow(state.lexicalGlobalObject())->wrapped().document());
+    // FIXME: Why do we return the document instead of returning undefined?
     return this;
+}
 …
 enum NewlineRequirement { DoNotAddNewline, DoAddNewline };
-static inline void documentWrite(ExecState& state, JSHTMLDocument* thisDocument, NewlineRequirement addNewline)
+static inline JSValue documentWrite(ExecState& state, JSHTMLDocument& document, NewlineRequirement addNewline)
+{
-    HTMLDocument* document = &thisDocument->wrapped();
-    // DOM only specifies single string argument, but browsers allow multiple or no arguments.
+    VM& vm = state.vm();
+    auto scope = DECLARE_THROW_SCOPE(vm);
-    size_t size = state.argumentCount();
-    String firstString = state.argument(0).toString(&state)->value(&state);
-    SegmentedString segmentedString = firstString;
-    if (size != 1) {
-        if (!size)
-            segmentedString.clear();
-        else {
-            for (size_t i = 1; i < size; ++i) {
-                String subsequentString = state.uncheckedArgument(i).toString(&state)->value(&state);
-                segmentedString.append(SegmentedString(subsequentString));
-            }
-        }
+    SegmentedString segmentedString;
+    size_t argumentCount = state.argumentCount();
+    for (size_t i = 0; i < argumentCount; ++i) {
+        segmentedString.append(state.uncheckedArgument(i).toWTFString(&state));
+        RETURN_IF_EXCEPTION(scope, { });
+    }
     if (addNewline)
-        segmentedString.append(SegmentedString(String(&newlineCharacter, 1)));
+        segmentedString.append(String { "\n" });
-    Document* activeDocument = findCallingDocument(state);
-    document->write(segmentedString, activeDocument);
+    document.wrapped().write(WTFMove(segmentedString), findCallingDocument(state));
+    return jsUndefined();
+}
 JSValue JSHTMLDocument::write(ExecState& state)
+{
-    documentWrite(state, this, DoNotAddNewline);
-    return jsUndefined();
+    return documentWrite(state, *this, DoNotAddNewline);
+}
 JSValue JSHTMLDocument::writeln(ExecState& state)
+{
-    documentWrite(state, this, DoAddNewline);
-    return jsUndefined();
+    return documentWrite(state, *this, DoAddNewline);
+}
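
To illustrate the reworked documentWrite above: every argument goes through the same loop (no special case for zero or one argument), an optional newline is appended, and the result is moved into the sink. The following is only a minimal standalone sketch. It uses std::string and std::vector in place of WTF::String and SegmentedString, and the names SegmentList, SinkSketch and documentWriteSketch are hypothetical, not WebKit API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SegmentedString: an ordered list of string segments.
struct SegmentList {
    std::vector<std::string> segments;

    // Takes ownership of the string instead of copying it.
    void append(std::string&& segment) { segments.push_back(std::move(segment)); }

    std::string toString() const
    {
        std::string result;
        for (const auto& segment : segments)
            result += segment;
        return result;
    }
};

enum class NewlineRequirement { DoNotAddNewline, DoAddNewline };

// Hypothetical sink standing in for Document::write(SegmentedString&&, ...).
struct SinkSketch {
    std::string received;
    void write(SegmentList&& text) { received += text.toString(); }
};

// One loop handles zero, one, or many arguments uniformly.
inline void documentWriteSketch(SinkSketch& sink, std::vector<std::string> arguments, NewlineRequirement addNewline)
{
    SegmentList text;
    for (auto& argument : arguments)
        text.append(std::move(argument));
    if (addNewline == NewlineRequirement::DoAddNewline)
        text.append("\n");
    sink.write(std::move(text));
}
```

With no arguments the loop body never runs, so the sink receives an empty segment list. That is the behaviour the removed `if (!size)` branch in the old code produced by calling clear().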

trunk/Source/WebCore/css/parser/CSSTokenizer.cpp

r205103	r209058
36	36	#include "CSSTokenizerInputStream.h"
37	37	#include "HTMLParserIdioms.h"
	38	#include <wtf/text/StringBuilder.h>
38	39	#include <wtf/unicode/CharacterNames.h>
39	40

trunk/Source/WebCore/css/parser/CSSTokenizer.h

r208668	r209058
31	31
32	32	#include "CSSParserToken.h"
33		~~#include "InputStreamPreprocessor.h"~~
34	33	#include <climits>
35	34	#include <wtf/text/StringView.h>

trunk/Source/WebCore/css/parser/CSSTokenizerInputStream.h

-              r208668
+              r209058
 #include <wtf/text/StringView.h>
-#include <wtf/text/WTFString.h>
 namespace WebCore {
+constexpr LChar kEndOfFileMarker = 0;
 class CSSTokenizerInputStream {

trunk/Source/WebCore/dom/Document.cpp

r208991	r209058
2792	2792	}
2793	2793
2794		void Document::write(~~const SegmentedString~~& text, Document* ownerDocument)
	2794	void Document::write(SegmentedString&& text, Document* ownerDocument)
2795	2795	{
2796	2796	NestingLevelIncrementer nestingLevelIncrementer(m_writeRecursionDepth);
…	…
2800	2800
2801	2801	if (m_writeRecursionIsTooDeep)
2802		return;
	2802	return;
2803	2803
2804	2804	bool hasInsertionPoint = m_parser && m_parser->hasInsertionPoint();
…	…
2810	2810
2811	2811	ASSERT(m_parser);
2812		m_parser->insert(~~text~~);
	2812	m_parser->insert(WTFMove(text));
2813	2813	}
2814	2814
2815	2815	void Document::write(const String& text, Document* ownerDocument)
2816	2816	{
2817		write(SegmentedString~~(text)~~, ownerDocument);
	2817	write(SegmentedString { text }, ownerDocument);
2818	2818	}
2819	2819
2820	2820	void Document::writeln(const String& text, Document* ownerDocument)
2821	2821	{
2822		write(text, ownerDocument);
2823		write("\n", ownerDocument);
	2822	SegmentedString textWithNewline { text };
	2823	textWithNewline.append(String { "\n" });
	2824	write(WTFMove(textWithNewline), ownerDocument);
2824	2825	}
2825	2826
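
The Document.cpp change above has two parts: write now takes its segmented string by rvalue reference and forwards it with WTFMove, and writeln builds a single string with the newline appended and calls write once instead of twice. The sketch below is a rough model of that shape, not the WebKit implementation. CountedText, ParserSketch and DocumentSketch are hypothetical names, and the copy counter exists only to make the absence of copies observable.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Counts copy operations on CountedText (illustration only).
static int copyCount = 0;

// Hypothetical stand-in for SegmentedString.
struct CountedText {
    std::string data;

    CountedText() = default;
    explicit CountedText(std::string text) : data(std::move(text)) { }
    CountedText(const CountedText& other) : data(other.data) { ++copyCount; }
    CountedText(CountedText&&) = default;
    CountedText& operator=(const CountedText& other) { data = other.data; ++copyCount; return *this; }
    CountedText& operator=(CountedText&&) = default;

    void append(const std::string& text) { data += text; }
};

// Hypothetical stand-in for DocumentParser::insert(SegmentedString&&).
struct ParserSketch {
    std::deque<CountedText> segments;
    int insertCalls = 0;

    void insert(CountedText&& text)
    {
        segments.push_back(std::move(text)); // move-constructs, no copy
        ++insertCalls;
    }
};

struct DocumentSketch {
    ParserSketch parser;

    void write(CountedText&& text) { parser.insert(std::move(text)); }
    void write(const std::string& text) { write(CountedText { text }); }

    // One combined write instead of write(text) followed by write("\n").
    void writeln(const std::string& text)
    {
        CountedText textWithNewline { text };
        textWithNewline.append("\n");
        write(std::move(textWithNewline));
    }
};
```

In this model, writeln reaches the parser through exactly one insert call and CountedText's copy constructor never runs.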

trunk/Source/WebCore/dom/Document.h

r208982	r209058
603	603	void cancelParsing();
604	604
605		void write(~~const SegmentedString~~& text, Document* ownerDocument = nullptr);
	605	void write(SegmentedString&& text, Document* ownerDocument = nullptr);
606	606	WEBCORE_EXPORT void write(const String& text, Document* ownerDocument = nullptr);
607	607	WEBCORE_EXPORT void writeln(const String& text, Document* ownerDocument = nullptr);

trunk/Source/WebCore/dom/DocumentParser.h

r208179	r209058
44	44
45	45	// insert is used by document.write.
46		virtual void insert(~~const SegmentedString~~&) = 0;
	46	virtual void insert(SegmentedString&&) = 0;
47	47
48	48	// appendBytes and flush are used by DocumentWriter (the loader).

trunk/Source/WebCore/dom/RawDataDocumentParser.h

r208179	r209058
50	50	}
51	51
52		void insert(~~const SegmentedString~~&) override
	52	void insert(SegmentedString&&) override
53	53	{
54	54	// <https://bugs.webkit.org/show_bug.cgi?id=25397>: JS code can always call document.write, we need to handle it.

trunk/Source/WebCore/html/FTPDirectoryDocument.cpp

-              r208658
+              r209058
 void FTPDirectoryDocumentParser::append(RefPtr<StringImpl>&& inputSource)
+{
-    String source(WTFMove(inputSource));
     // Make sure we have the table element to append to by loading the template set in the pref, or
     // creating a very basic document with the appropriate table
 …
     m_dest = m_buffer;
-    SegmentedString str = source;
-    while (!str.isEmpty()) {
-        UChar c = str.currentChar();
+    SegmentedString string { String { WTFMove(inputSource) } };
+    while (!string.isEmpty()) {
+        UChar c = string.currentCharacter();
         if (c == '\r') {
 …
+        }
-        str.advance();
+        string.advance();
         // Maybe enlarge the buffer

trunk/Source/WebCore/html/parser/HTMLDocumentParser.cpp

-              r208840
+              r209058
+}
-void HTMLDocumentParser::insert(const SegmentedString& source)
+void HTMLDocumentParser::insert(SegmentedString&& source)
+{
     if (isStopped())
 …
     Ref<HTMLDocumentParser> protectedThis(*this);
-    SegmentedString excludedLineNumberSource(source);
-    excludedLineNumberSource.setExcludeLineNumbers();
-    m_input.insertAtCurrentInsertionPoint(excludedLineNumberSource);
+    source.setExcludeLineNumbers();
+    m_input.insertAtCurrentInsertionPoint(WTFMove(source));
     pumpTokenizerIfPossible(ForceSynchronous);
 …
     Ref<HTMLDocumentParser> protectedThis(*this);
-    String source(WTFMove(inputSource));
+    String source { WTFMove(inputSource) };
     if (m_preloadScanner) {

trunk/Source/WebCore/html/parser/HTMLDocumentParser.h

r208179	r209058
66	66	explicit HTMLDocumentParser(HTMLDocument&);
67	67
68		void insert(~~const SegmentedString~~&) final;
	68	void insert(SegmentedString&&) final;
69	69	void append(RefPtr<StringImpl>&&) override;
70	70	void finish() override;

trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp

-              r183552
+              r209058
         HTMLEntitySearch entitySearch;
         while (!source.isEmpty()) {
-            cc = source.currentChar();
+            cc = source.currentCharacter();
             entitySearch.advance(cc);
             if (!entitySearch.isEntityPrefix())
                 break;
             consumedCharacters.append(cc);
-            source.advance();
+            source.advancePastNonNewline();
+        }
         notEnoughCharacters = source.isEmpty();
 …
             const LChar* reference = entitySearch.mostRecentMatch()->entity;
             for (int i = 0; i < length; ++i) {
-                cc = source.currentChar();
+                cc = source.currentCharacter();
                 ASSERT_UNUSED(reference, cc == *reference++);
                 consumedCharacters.append(cc);
-                source.advance();
+                source.advancePastNonNewline();
                 ASSERT(!source.isEmpty());
+            }
-            cc = source.currentChar();
+            cc = source.currentCharacter();
+        }
         if (entitySearch.mostRecentMatch()->lastCharacter() == ';'
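
The switch from advance() to advancePastNonNewline() in the hunk above applies where the caller already knows the current character cannot be a newline (here, a character that just matched an entity prefix), so the line-number bookkeeping can be skipped. The sketch below shows that idea under simplifying assumptions. CursorSketch and consumeLetters are hypothetical names; the real SegmentedString handles multiple substrings and 8-bit and 16-bit characters, which this does not attempt to model.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical single-buffer cursor standing in for SegmentedString.
class CursorSketch {
public:
    explicit CursorSketch(std::string text) : m_text(std::move(text)) { }

    bool isEmpty() const { return m_position >= m_text.size(); }
    char currentCharacter() const { return isEmpty() ? '\0' : m_text[m_position]; }
    int currentLine() const { return m_line; }

    // General path: checks for a newline and updates the line count.
    void advance()
    {
        if (isEmpty())
            return;
        if (m_text[m_position] == '\n')
            ++m_line;
        ++m_position;
    }

    // Fast path: the caller guarantees the current character is not a newline,
    // so no line bookkeeping is needed.
    void advancePastNonNewline()
    {
        assert(!isEmpty() && m_text[m_position] != '\n');
        ++m_position;
    }

private:
    std::string m_text;
    std::size_t m_position { 0 };
    int m_line { 1 };
};

// Consumes a run of ASCII letters, roughly as an entity-name scan might.
inline std::string consumeLetters(CursorSketch& source)
{
    std::string consumed;
    while (!source.isEmpty()) {
        char c = source.currentCharacter();
        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')))
            break;
        consumed += c;
        source.advancePastNonNewline(); // c is a letter, so it is never '\n'
    }
    return consumed;
}
```

The assertion in the fast path documents the precondition, which is the same role the ASSERT-style checks play in the surrounding diff.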

trunk/Source/WebCore/html/parser/HTMLInputStream.h

r208179	r209058
26	26	#pragma once
27	27
28		~~#include "InputStreamPreprocessor.h"~~
29	28	#include "SegmentedString.h"
30	29	#include <wtf/text/TextPosition.h>
…	…
57	56	}
58	57
59		void appendToEnd(~~const SegmentedString~~& string)
	58	void appendToEnd(SegmentedString&& string)
60	59	{
61		m_last->append(~~string~~);
	60	m_last->append(WTFMove(string));
62	61	}
63	62
64		void insertAtCurrentInsertionPoint(~~const SegmentedString~~& string)
	63	void insertAtCurrentInsertionPoint(SegmentedString&& string)
65	64	{
66		m_first.append(~~string~~);
	65	m_first.append(WTFMove(string));
67	66	}
68	67
…	…
74	73	void markEndOfFile()
75	74	{
76		m_last->append(S~~egmentedString(String(&kEndOfFileMarker, 1))~~);
	75	m_last->append(String { &kEndOfFileMarker, 1 });
77	76	m_last->close();
78	77	}
…	…
93	92	void splitInto(SegmentedString& next)
94	93	{
95		next = m_first;
96		m_first = SegmentedString();
	94	next = WTFMove(m_first);
97	95	if (m_last == &m_first) {
98	96	// We used to only have one SegmentedString in the InputStream

trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp

-              r195452
+              r209058
 /*
  * Copyright (C) 2010 Google Inc. All Rights Reserved.
- * Copyright (C) 2015 Apple Inc. All Rights Reserved.
+ * Copyright (C) 2015-2016 Apple Inc. All Rights Reserved.
+ *
  * Redistribution and use in source and binary forms, with or without
 …
     // least bytesToCheckUnconditionally bytes of input.
-    static const int bytesToCheckUnconditionally = 1024;
+    constexpr int bytesToCheckUnconditionally = 1024;
-    m_input.append(SegmentedString(m_codec->decode(data, length)));
+    m_input.append(m_codec->decode(data, length));
     while (auto token = m_tokenizer.nextToken(m_input)) {

trunk/Source/WebCore/html/parser/HTMLSourceTracker.cpp

-              r207848
+              r209058
 namespace WebCore {
-HTMLSourceTracker::HTMLSourceTracker()
-{
-}
 void HTMLSourceTracker::startToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
 …
     unsigned i = 0;
     for ( ; i < length && !m_previousSource.isEmpty(); ++i) {
-        source.append(m_previousSource.currentChar());
+        source.append(m_previousSource.currentCharacter());
         m_previousSource.advance();
+    }
     for ( ; i < length; ++i) {
         ASSERT(!m_currentSource.isEmpty());
-        source.append(m_currentSource.currentChar());
+        source.append(m_currentSource.currentCharacter());
         m_currentSource.advance();
+    }

trunk/Source/WebCore/html/parser/HTMLSourceTracker.h

r208179	r209058
37	37	WTF_MAKE_NONCOPYABLE(HTMLSourceTracker);
38	38	public:
39		HTMLSourceTracker();
	39	HTMLSourceTracker() = default;
40	40
41	41	void startToken(SegmentedString&, HTMLTokenizer&);

trunk/Source/WebCore/html/parser/HTMLTokenizer.cpp

-              r178265
+              r209058
 /*
- * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
+ * Copyright (C) 2008-2016 Apple Inc. All Rights Reserved.
  * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/
  * Copyright (C) 2010 Google, Inc. All Rights Reserved.
 …
 #include "HTMLNames.h"
 #include "MarkupTokenizerInlines.h"
 #include <wtf/ASCIICType.h>
+#include <wtf/text/StringBuilder.h>
 using namespace WTF;
 …
     saveEndTagNameIfNeeded();
     m_state = DataState;
-    source.advanceAndUpdateLineNumber();
+    source.advancePastNonNewline();
     return true;
+}
 …
 bool HTMLTokenizer::commitToPartialEndTag(SegmentedString& source, UChar character, State state)
+{
-    ASSERT(source.currentChar() == character);
+    ASSERT(source.currentCharacter() == character);
     appendToTemporaryBuffer(character);
-    source.advanceAndUpdateLineNumber();
+    source.advancePastNonNewline();
     if (haveBufferedCharacterToken()) {
 …
 bool HTMLTokenizer::commitToCompleteEndTag(SegmentedString& source)
+{
-    ASSERT(source.currentChar() == '>');
+    ASSERT(source.currentCharacter() == '>');
     appendToTemporaryBuffer('>');
-    source.advance();
+    source.advancePastNonNewline();
     m_state = DataState;
 …
     BEGIN_STATE(DataState)
         if (character == '&')
-            ADVANCE_TO(CharacterReferenceInDataState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInDataState);
         if (character == '<') {
             if (haveBufferedCharacterToken())
                 RETURN_IN_CURRENT_STATE(true);
-            ADVANCE_TO(TagOpenState);
+            ADVANCE_PAST_NON_NEWLINE_TO(TagOpenState);
+        }
         if (character == kEndOfFileMarker)
 …
     BEGIN_STATE(RCDATAState)
         if (character == '&')
-            ADVANCE_TO(CharacterReferenceInRCDATAState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInRCDATAState);
         if (character == '<')
-            ADVANCE_TO(RCDATALessThanSignState);
+            ADVANCE_PAST_NON_NEWLINE_TO(RCDATALessThanSignState);
         if (character == kEndOfFileMarker)
             RECONSUME_IN(DataState);
 …
     BEGIN_STATE(RAWTEXTState)
         if (character == '<')
-            ADVANCE_TO(RAWTEXTLessThanSignState);
+            ADVANCE_PAST_NON_NEWLINE_TO(RAWTEXTLessThanSignState);
         if (character == kEndOfFileMarker)
             RECONSUME_IN(DataState);
 …
     BEGIN_STATE(ScriptDataState)
         if (character == '<')
-            ADVANCE_TO(ScriptDataLessThanSignState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataLessThanSignState);
         if (character == kEndOfFileMarker)
             RECONSUME_IN(DataState);
 …
     BEGIN_STATE(TagOpenState)
         if (character == '!')
-            ADVANCE_TO(MarkupDeclarationOpenState);
+            ADVANCE_PAST_NON_NEWLINE_TO(MarkupDeclarationOpenState);
         if (character == '/')
-            ADVANCE_TO(EndTagOpenState);
+            ADVANCE_PAST_NON_NEWLINE_TO(EndTagOpenState);
         if (isASCIIAlpha(character)) {
             m_token.beginStartTag(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(TagNameState);
+            ADVANCE_PAST_NON_NEWLINE_TO(TagNameState);
+        }
         if (character == '?') {
 …
             m_token.beginEndTag(convertASCIIAlphaToLower(character));
             m_appropriateEndTagName.clear();
-            ADVANCE_TO(TagNameState);
+            ADVANCE_PAST_NON_NEWLINE_TO(TagNameState);
+        }
         if (character == '>') {
             parseError();
-            ADVANCE_TO(DataState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DataState);
+        }
         if (character == kEndOfFileMarker) {
 …
             ADVANCE_TO(BeforeAttributeNameState);
         if (character == '/')
-            ADVANCE_TO(SelfClosingStartTagState);
+            ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
         if (character == '>')
             return emitAndResumeInDataState(source);
 …
+        }
         m_token.appendToName(toASCIILower(character));
-        ADVANCE_TO(TagNameState);
+        ADVANCE_PAST_NON_NEWLINE_TO(TagNameState);
     END_STATE()
 …
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            ADVANCE_TO(RCDATAEndTagOpenState);
+            ADVANCE_PAST_NON_NEWLINE_TO(RCDATAEndTagOpenState);
+        }
         bufferASCIICharacter('<');
 …
             appendToTemporaryBuffer(character);
             appendToPossibleEndTag(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(RCDATAEndTagNameState);
+            ADVANCE_PAST_NON_NEWLINE_TO(RCDATAEndTagNameState);
+        }
         bufferASCIICharacter('<');
 …
             appendToTemporaryBuffer(character);
             appendToPossibleEndTag(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(RCDATAEndTagNameState);
+            ADVANCE_PAST_NON_NEWLINE_TO(RCDATAEndTagNameState);
+        }
         if (isTokenizerWhitespace(character)) {
 …
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            ADVANCE_TO(RAWTEXTEndTagOpenState);
+            ADVANCE_PAST_NON_NEWLINE_TO(RAWTEXTEndTagOpenState);
+        }
         bufferASCIICharacter('<');
 …
             appendToTemporaryBuffer(character);
             appendToPossibleEndTag(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(RAWTEXTEndTagNameState);
+            ADVANCE_PAST_NON_NEWLINE_TO(RAWTEXTEndTagNameState);
+        }
         bufferASCIICharacter('<');
 …
             appendToTemporaryBuffer(character);
             appendToPossibleEndTag(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(RAWTEXTEndTagNameState);
+            ADVANCE_PAST_NON_NEWLINE_TO(RAWTEXTEndTagNameState);
+        }
         if (isTokenizerWhitespace(character)) {
 …
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            ADVANCE_TO(ScriptDataEndTagOpenState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEndTagOpenState);
+        }
         if (character == '!') {
             bufferASCIICharacter('<');
             bufferASCIICharacter('!');
-            ADVANCE_TO(ScriptDataEscapeStartState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapeStartState);
+        }
         bufferASCIICharacter('<');
 …
             appendToTemporaryBuffer(character);
             appendToPossibleEndTag(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(ScriptDataEndTagNameState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEndTagNameState);
+        }
         bufferASCIICharacter('<');
 …
             appendToTemporaryBuffer(character);
             appendToPossibleEndTag(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(ScriptDataEndTagNameState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEndTagNameState);
+        }
         if (isTokenizerWhitespace(character)) {
 …
         if (character == '-') {
             bufferASCIICharacter('-');
-            ADVANCE_TO(ScriptDataEscapeStartDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapeStartDashState);
         } else
             RECONSUME_IN(ScriptDataState);
 …
         if (character == '-') {
             bufferASCIICharacter('-');
-            ADVANCE_TO(ScriptDataEscapedDashDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapedDashDashState);
         } else
             RECONSUME_IN(ScriptDataState);
 …
         if (character == '-') {
             bufferASCIICharacter('-');
-            ADVANCE_TO(ScriptDataEscapedDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapedDashState);
+        }
         if (character == '<')
-            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapedLessThanSignState);
         if (character == kEndOfFileMarker) {
             parseError();
 …
         if (character == '-') {
             bufferASCIICharacter('-');
-            ADVANCE_TO(ScriptDataEscapedDashDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapedDashDashState);
+        }
         if (character == '<')
-            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapedLessThanSignState);
         if (character == kEndOfFileMarker) {
             parseError();
 …
         if (character == '-') {
             bufferASCIICharacter('-');
-            ADVANCE_TO(ScriptDataEscapedDashDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapedDashDashState);
+        }
         if (character == '<')
-            ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapedLessThanSignState);
         if (character == '>') {
             bufferASCIICharacter('>');
-            ADVANCE_TO(ScriptDataState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataState);
+        }
         if (character == kEndOfFileMarker) {
 …
             m_temporaryBuffer.clear();
             ASSERT(m_bufferedEndTagName.isEmpty());
-            ADVANCE_TO(ScriptDataEscapedEndTagOpenState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapedEndTagOpenState);
+        }
         if (isASCIIAlpha(character)) {
 …
             m_temporaryBuffer.clear();
             appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(ScriptDataDoubleEscapeStartState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataDoubleEscapeStartState);
+        }
         bufferASCIICharacter('<');
 …
             appendToTemporaryBuffer(character);
             appendToPossibleEndTag(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(ScriptDataEscapedEndTagNameState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapedEndTagNameState);
+        }
         bufferASCIICharacter('<');
 …
             appendToTemporaryBuffer(character);
             appendToPossibleEndTag(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(ScriptDataEscapedEndTagNameState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataEscapedEndTagNameState);
+        }
         if (isTokenizerWhitespace(character)) {
 …
             bufferASCIICharacter(character);
             appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(ScriptDataDoubleEscapeStartState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataDoubleEscapeStartState);
+        }
         RECONSUME_IN(ScriptDataEscapedState);
 …
         if (character == '-') {
             bufferASCIICharacter('-');
-            ADVANCE_TO(ScriptDataDoubleEscapedDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataDoubleEscapedDashState);
+        }
         if (character == '<') {
             bufferASCIICharacter('<');
-            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataDoubleEscapedLessThanSignState);
+        }
         if (character == kEndOfFileMarker) {
 …
         if (character == '-') {
             bufferASCIICharacter('-');
-            ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataDoubleEscapedDashDashState);
+        }
         if (character == '<') {
             bufferASCIICharacter('<');
-            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataDoubleEscapedLessThanSignState);
+        }
         if (character == kEndOfFileMarker) {
 …
         if (character == '-') {
             bufferASCIICharacter('-');
-            ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataDoubleEscapedDashDashState);
+        }
         if (character == '<') {
             bufferASCIICharacter('<');
-            ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataDoubleEscapedLessThanSignState);
+        }
         if (character == '>') {
             bufferASCIICharacter('>');
-            ADVANCE_TO(ScriptDataState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataState);
+        }
         if (character == kEndOfFileMarker) {
 …
             bufferASCIICharacter('/');
             m_temporaryBuffer.clear();
-            ADVANCE_TO(ScriptDataDoubleEscapeEndState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataDoubleEscapeEndState);
+        }
         RECONSUME_IN(ScriptDataDoubleEscapedState);
 …
             bufferASCIICharacter(character);
             appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
-            ADVANCE_TO(ScriptDataDoubleEscapeEndState);
+            ADVANCE_PAST_NON_NEWLINE_TO(ScriptDataDoubleEscapeEndState);
+        }
         RECONSUME_IN(ScriptDataDoubleEscapedState);
 …
             ADVANCE_TO(BeforeAttributeNameState);
         if (character == '/')
-            ADVANCE_TO(SelfClosingStartTagState);
+            ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
         if (character == '>')
             return emitAndResumeInDataState(source);
 …
         m_token.beginAttribute(source.numberOfCharactersConsumed());
         m_token.appendToAttributeName(toASCIILower(character));
-        ADVANCE_TO(AttributeNameState);
+        ADVANCE_PAST_NON_NEWLINE_TO(AttributeNameState);
     END_STATE()
 …
             ADVANCE_TO(AfterAttributeNameState);
         if (character == '/')
-            ADVANCE_TO(SelfClosingStartTagState);
+            ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
         if (character == '=')
-            ADVANCE_TO(BeforeAttributeValueState);
+            ADVANCE_PAST_NON_NEWLINE_TO(BeforeAttributeValueState);
         if (character == '>')
             return emitAndResumeInDataState(source);
 …
             parseError();
         m_token.appendToAttributeName(toASCIILower(character));
-        ADVANCE_TO(AttributeNameState);
+        ADVANCE_PAST_NON_NEWLINE_TO(AttributeNameState);
     END_STATE()
 …
             ADVANCE_TO(AfterAttributeNameState);
         if (character == '/')
-            ADVANCE_TO(SelfClosingStartTagState);
+            ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
         if (character == '=')
-            ADVANCE_TO(BeforeAttributeValueState);
+            ADVANCE_PAST_NON_NEWLINE_TO(BeforeAttributeValueState);
         if (character == '>')
             return emitAndResumeInDataState(source);
 …
         m_token.beginAttribute(source.numberOfCharactersConsumed());
         m_token.appendToAttributeName(toASCIILower(character));
-        ADVANCE_TO(AttributeNameState);
+        ADVANCE_PAST_NON_NEWLINE_TO(AttributeNameState);
     END_STATE()
 …
             ADVANCE_TO(BeforeAttributeValueState);
         if (character == '"')
             ADVANCE_TO(AttributeValueDoubleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueDoubleQuotedState);
         if (character == '&')
             RECONSUME_IN(AttributeValueUnquotedState);
         if (character == '\'')
             ADVANCE_TO(AttributeValueSingleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueSingleQuotedState);
         if (character == '>') {
             parseError();
 …
             parseError();
         m_token.appendToAttributeValue(character);
         ADVANCE_TO(AttributeValueUnquotedState);
+        ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueUnquotedState);
     END_STATE()
 …
         if (character == '"') {
             m_token.endAttribute(source.numberOfCharactersConsumed());
-            ADVANCE_TO(AfterAttributeValueQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(AfterAttributeValueQuotedState);
         }
         if (character == '&') {
             m_additionalAllowedCharacter = '"';
-            ADVANCE_TO(CharacterReferenceInAttributeValueState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInAttributeValueState);
         }
         if (character == kEndOfFileMarker) {
 …
         if (character == '\'') {
             m_token.endAttribute(source.numberOfCharactersConsumed());
-            ADVANCE_TO(AfterAttributeValueQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(AfterAttributeValueQuotedState);
         }
         if (character == '&') {
             m_additionalAllowedCharacter = '\'';
-            ADVANCE_TO(CharacterReferenceInAttributeValueState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInAttributeValueState);
         }
         if (character == kEndOfFileMarker) {
 …
         if (character == '&') {
             m_additionalAllowedCharacter = '>';
-            ADVANCE_TO(CharacterReferenceInAttributeValueState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInAttributeValueState);
         }
         if (character == '>') {
 …
             parseError();
         m_token.appendToAttributeValue(character);
-        ADVANCE_TO(AttributeValueUnquotedState);
+        ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueUnquotedState);
     END_STATE()
 …
             ADVANCE_TO(BeforeAttributeNameState);
         if (character == '/')
-            ADVANCE_TO(SelfClosingStartTagState);
+            ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
         if (character == '>')
             return emitAndResumeInDataState(source);
 …
                 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
         } else if (isASCIIAlphaCaselessEqual(character, 'd')) {
-            auto result = source.advancePastIgnoringCase("doctype");
+            auto result = source.advancePastLettersIgnoringASCIICase("doctype");
             if (result == SegmentedString::DidMatch)
                 SWITCH_TO(DOCTYPEState);
 …
     BEGIN_STATE(CommentStartState)
         if (character == '-')
-            ADVANCE_TO(CommentStartDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CommentStartDashState);
         if (character == '>') {
             parseError();
 …
     BEGIN_STATE(CommentStartDashState)
         if (character == '-')
-            ADVANCE_TO(CommentEndState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CommentEndState);
         if (character == '>') {
             parseError();
 …
     BEGIN_STATE(CommentState)
         if (character == '-')
-            ADVANCE_TO(CommentEndDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CommentEndDashState);
         if (character == kEndOfFileMarker) {
             parseError();
 …
     BEGIN_STATE(CommentEndDashState)
         if (character == '-')
-            ADVANCE_TO(CommentEndState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CommentEndState);
         if (character == kEndOfFileMarker) {
             parseError();
 …
         if (character == '!') {
             parseError();
-            ADVANCE_TO(CommentEndBangState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CommentEndBangState);
         }
         if (character == '-') {
             parseError();
             m_token.appendToComment('-');
-            ADVANCE_TO(CommentEndState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CommentEndState);
         }
         if (character == kEndOfFileMarker) {
 …
             m_token.appendToComment('-');
             m_token.appendToComment('!');
-            ADVANCE_TO(CommentEndDashState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CommentEndDashState);
         }
         if (character == '>')
 …
         }
         m_token.beginDOCTYPE(toASCIILower(character));
-        ADVANCE_TO(DOCTYPENameState);
+        ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPENameState);
     END_STATE()
 …
         }
         m_token.appendToName(toASCIILower(character));
-        ADVANCE_TO(DOCTYPENameState);
+        ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPENameState);
     END_STATE()
 …
         }
         if (isASCIIAlphaCaselessEqual(character, 'p')) {
-            auto result = source.advancePastIgnoringCase("public");
+            auto result = source.advancePastLettersIgnoringASCIICase("public");
             if (result == SegmentedString::DidMatch)
                 SWITCH_TO(AfterDOCTYPEPublicKeywordState);
 …
                 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
         } else if (isASCIIAlphaCaselessEqual(character, 's')) {
-            auto result = source.advancePastIgnoringCase("system");
+            auto result = source.advancePastLettersIgnoringASCIICase("system");
             if (result == SegmentedString::DidMatch)
                 SWITCH_TO(AfterDOCTYPESystemKeywordState);
 …
         parseError();
         m_token.setForceQuirks();
-        ADVANCE_TO(BogusDOCTYPEState);
+        ADVANCE_PAST_NON_NEWLINE_TO(BogusDOCTYPEState);
     END_STATE()
 …
             parseError();
             m_token.setPublicIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
         }
         if (character == '\'') {
             parseError();
             m_token.setPublicIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
         }
         if (character == '>') {
 …
         parseError();
         m_token.setForceQuirks();
-        ADVANCE_TO(BogusDOCTYPEState);
+        ADVANCE_PAST_NON_NEWLINE_TO(BogusDOCTYPEState);
     END_STATE()
 …
         if (character == '"') {
             m_token.setPublicIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
         }
         if (character == '\'') {
             m_token.setPublicIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
         }
         if (character == '>') {
 …
         parseError();
         m_token.setForceQuirks();
-        ADVANCE_TO(BogusDOCTYPEState);
+        ADVANCE_PAST_NON_NEWLINE_TO(BogusDOCTYPEState);
     END_STATE()
     BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState)
         if (character == '"')
-            ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
+            ADVANCE_PAST_NON_NEWLINE_TO(AfterDOCTYPEPublicIdentifierState);
         if (character == '>') {
             parseError();
 …
     BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState)
         if (character == '\'')
-            ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
+            ADVANCE_PAST_NON_NEWLINE_TO(AfterDOCTYPEPublicIdentifierState);
         if (character == '>') {
             parseError();
 …
             parseError();
             m_token.setSystemIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
         }
         if (character == '\'') {
             parseError();
             m_token.setSystemIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPESystemIdentifierSingleQuotedState);
         }
         if (character == kEndOfFileMarker) {
 …
         parseError();
         m_token.setForceQuirks();
-        ADVANCE_TO(BogusDOCTYPEState);
+        ADVANCE_PAST_NON_NEWLINE_TO(BogusDOCTYPEState);
     END_STATE()
 …
         if (character == '"') {
             m_token.setSystemIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
         }
         if (character == '\'') {
             m_token.setSystemIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPESystemIdentifierSingleQuotedState);
         }
         if (character == kEndOfFileMarker) {
 …
         parseError();
         m_token.setForceQuirks();
-        ADVANCE_TO(BogusDOCTYPEState);
+        ADVANCE_PAST_NON_NEWLINE_TO(BogusDOCTYPEState);
     END_STATE()
 …
             parseError();
             m_token.setSystemIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
         }
         if (character == '\'') {
             parseError();
             m_token.setSystemIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPESystemIdentifierSingleQuotedState);
         }
         if (character == '>') {
 …
         parseError();
         m_token.setForceQuirks();
-        ADVANCE_TO(BogusDOCTYPEState);
+        ADVANCE_PAST_NON_NEWLINE_TO(BogusDOCTYPEState);
     END_STATE()
 …
         if (character == '"') {
             m_token.setSystemIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
         }
         if (character == '\'') {
             m_token.setSystemIdentifierToEmptyString();
-            ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPESystemIdentifierSingleQuotedState);
         }
         if (character == '>') {
 …
         parseError();
         m_token.setForceQuirks();
-        ADVANCE_TO(BogusDOCTYPEState);
+        ADVANCE_PAST_NON_NEWLINE_TO(BogusDOCTYPEState);
     END_STATE()
     BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState)
         if (character == '"')
-            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+            ADVANCE_PAST_NON_NEWLINE_TO(AfterDOCTYPESystemIdentifierState);
         if (character == '>') {
             parseError();
 …
     BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState)
         if (character == '\'')
-            ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+            ADVANCE_PAST_NON_NEWLINE_TO(AfterDOCTYPESystemIdentifierState);
         if (character == '>') {
             parseError();
 …
         }
         parseError();
-        ADVANCE_TO(BogusDOCTYPEState);
+        ADVANCE_PAST_NON_NEWLINE_TO(BogusDOCTYPEState);
     END_STATE()
 …
     BEGIN_STATE(CDATASectionState)
         if (character == ']')
-            ADVANCE_TO(CDATASectionRightSquareBracketState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CDATASectionRightSquareBracketState);
         if (character == kEndOfFileMarker)
             RECONSUME_IN(DataState);
 …
     BEGIN_STATE(CDATASectionRightSquareBracketState)
         if (character == ']')
-            ADVANCE_TO(CDATASectionDoubleRightSquareBracketState);
+            ADVANCE_PAST_NON_NEWLINE_TO(CDATASectionDoubleRightSquareBracketState);
         bufferASCIICharacter(']');
         RECONSUME_IN(CDATASectionState);
 …
     BEGIN_STATE(CDATASectionDoubleRightSquareBracketState)
         if (character == '>')
-            ADVANCE_TO(DataState);
+            ADVANCE_PAST_NON_NEWLINE_TO(DataState);
         bufferASCIICharacter(']');
         bufferASCIICharacter(']');
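
The hunks above swap ADVANCE_TO for ADVANCE_PAST_NON_NEWLINE_TO wherever the state has just compared the current character against something other than a newline. The following is a minimal sketch of why that can save work, using a hypothetical `Cursor` class over `std::string`; it is not the real SegmentedString, and line-number bookkeeping there is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical cursor used only for illustration.
class Cursor {
public:
    explicit Cursor(std::string text) : m_text(std::move(text)) { }

    bool isEmpty() const { return m_offset >= m_text.size(); }
    char currentCharacter() const { return isEmpty() ? '\0' : m_text[m_offset]; }
    unsigned currentLine() const { return m_line; }

    // General path: has to test for '\n' so the line count stays correct.
    void advance()
    {
        if (currentCharacter() == '\n')
            ++m_line;
        ++m_offset;
    }

    // Fast path: the caller already knows the current character is not a
    // newline, so the branch above can be skipped entirely.
    void advancePastNonNewline()
    {
        assert(currentCharacter() != '\n');
        ++m_offset;
    }

private:
    std::string m_text;
    std::size_t m_offset { 0 };
    unsigned m_line { 0 };
};
```

A state that has just matched `'/'` or `'>'` could call `advancePastNonNewline()`, while a state that may be sitting on whitespace would keep calling `advance()`.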

trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h

-              r208179
+              r209058
 #include "SegmentedString.h"
-#include <wtf/Noncopyable.h>
 #include <wtf/unicode/CharacterNames.h>
 namespace WebCore {
-const LChar kEndOfFileMarker = 0;
 // http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream
 template <typename Tokenizer>
 class InputStreamPreprocessor {
-    WTF_MAKE_NONCOPYABLE(InputStreamPreprocessor);
 public:
     explicit InputStreamPreprocessor(Tokenizer& tokenizer)
         : m_tokenizer(tokenizer)
     {
-        reset();
     }
 …
     ALWAYS_INLINE bool peek(SegmentedString& source, bool skipNullCharacters = false)
     {
-        if (source.isEmpty())
+        if (UNLIKELY(source.isEmpty()))
             return false;
-        m_nextInputCharacter = source.currentChar();
+        m_nextInputCharacter = source.currentCharacter();
         // Every branch in this function is expensive, so we have a
 …
         // handling. Please run the parser benchmark whenever you touch
         // this function. It's very hot.
-        static const UChar specialCharacterMask = '\n' | '\r' | '\0';
-        if (m_nextInputCharacter & ~specialCharacterMask) {
+        constexpr UChar specialCharacterMask = '\n' | '\r' | '\0';
+        if (LIKELY(m_nextInputCharacter & ~specialCharacterMask)) {
             m_skipNextNewLine = false;
             return true;
         }
         return processNextInputCharacter(source, skipNullCharacters);
     }
 …
     ALWAYS_INLINE bool advance(SegmentedString& source, bool skipNullCharacters = false)
     {
-        source.advanceAndUpdateLineNumber();
+        source.advance();
         return peek(source, skipNullCharacters);
     }
-    bool skipNextNewLine() const { return m_skipNextNewLine; }
-    void reset(bool skipNextNewLine = false)
+    ALWAYS_INLINE bool advancePastNonNewline(SegmentedString& source, bool skipNullCharacters = false)
     {
-        m_nextInputCharacter = '\0';
-        m_skipNextNewLine = skipNextNewLine;
+        source.advancePastNonNewline();
+        return peek(source, skipNullCharacters);
     }
 …
     {
     ProcessAgain:
-        ASSERT(m_nextInputCharacter == source.currentChar());
+        ASSERT(m_nextInputCharacter == source.currentCharacter());
         if (m_nextInputCharacter == '\n' && m_skipNextNewLine) {
             m_skipNextNewLine = false;
-            source.advancePastNewlineAndUpdateLineNumber();
+            source.advancePastNewline();
             if (source.isEmpty())
                 return false;
-            m_nextInputCharacter = source.currentChar();
+            m_nextInputCharacter = source.currentCharacter();
         }
         if (m_nextInputCharacter == '\r') {
             m_nextInputCharacter = '\n';
             m_skipNextNewLine = true;
-        } else {
-            m_skipNextNewLine = false;
-            // FIXME: The spec indicates that the surrogate pair range as well as
-            // a number of specific character values are parse errors and should be replaced
-            // by the replacement character. We suspect this is a problem with the spec as doing
-            // that filtering breaks surrogate pair handling and causes us not to match Minefield.
-            if (m_nextInputCharacter == '\0' && !shouldTreatNullAsEndOfFileMarker(source)) {
-                if (skipNullCharacters && !m_tokenizer.neverSkipNullCharacters()) {
-                    source.advancePastNonNewline();
-                    if (source.isEmpty())
-                        return false;
-                    m_nextInputCharacter = source.currentChar();
-                    goto ProcessAgain;
-                }
-                m_nextInputCharacter = replacementCharacter;
-            }
+            return true;
         }
+        m_skipNextNewLine = false;
+        if (m_nextInputCharacter || isAtEndOfFile(source))
+            return true;
+        if (skipNullCharacters && !m_tokenizer.neverSkipNullCharacters()) {
+            source.advancePastNonNewline();
+            if (source.isEmpty())
+                return false;
+            m_nextInputCharacter = source.currentCharacter();
+            goto ProcessAgain;
+        }
+        m_nextInputCharacter = replacementCharacter;
         return true;
     }
-    bool shouldTreatNullAsEndOfFileMarker(SegmentedString& source) const
+    static bool isAtEndOfFile(SegmentedString& source)
     {
         return source.isClosed() && source.length() == 1;
 …
     // http://www.whatwg.org/specs/web-apps/current-work/#next-input-character
-    UChar m_nextInputCharacter;
-    bool m_skipNextNewLine;
+    UChar m_nextInputCharacter { 0 };
+    bool m_skipNextNewLine { false };
 };
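
The fast-reject test in `peek()` relies on the bit patterns of the three special characters: `'\n'` is 0x0A, `'\r'` is 0x0D and `'\0'` is 0x00, so their bitwise OR is 0x0F. Any character with a bit set outside those low four bits cannot be one of the three. A small sketch of that check in isolation follows; the function name `takesFastPath` is hypothetical, and `char16_t` stands in for `UChar` as an assumption.

```cpp
#include <cassert>

// Same expression as in the diff: '\n' | '\r' | '\0' evaluates to 0x0F.
constexpr char16_t specialCharacterMask = '\n' | '\r' | '\0';

// Hypothetical helper: true when the character has a bit set outside the
// mask, which means it certainly needs no newline or null handling.
constexpr bool takesFastPath(char16_t character)
{
    return (character & ~specialCharacterMask) != 0;
}
```

Note that the test is conservative: every code unit from 0x00 to 0x0F, including tab (0x09), falls through to the slow path even though only three of them need special handling.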

trunk/Source/WebCore/html/track/BufferedLineReader.cpp

-              r165997
+              r209058
 namespace WebCore {
-bool BufferedLineReader::getLine(String& line)
+std::optional<String> BufferedLineReader::nextLine()
 {
     if (m_maybeSkipLF) {
 …
         // then skip it, and then (unconditionally) return the buffered line.
         if (!m_buffer.isEmpty()) {
-            scanCharacter(newlineCharacter);
+            if (m_buffer.currentCharacter() == newlineCharacter)
+                m_buffer.advancePastNewline();
             m_maybeSkipLF = false;
         }
         // If there was no (new) data available, then keep m_maybeSkipLF set,
-        // and fall through all the way down to the EOS check at the end of
-        // the method.
+        // and fall through all the way down to the EOS check at the end of the function.
     }
 …
     bool checkForLF = false;
     while (!m_buffer.isEmpty()) {
-        UChar c = m_buffer.currentChar();
+        UChar character = m_buffer.currentCharacter();
         m_buffer.advance();
-        if (c == newlineCharacter || c == carriageReturn) {
+        if (character == newlineCharacter || character == carriageReturn) {
             // We found a line ending. Return the accumulated line.
             shouldReturnLine = true;
-            checkForLF = (c == carriageReturn);
+            checkForLF = (character == carriageReturn);
             break;
         }
 …
         // NULs are transformed into U+FFFD (REPLACEMENT CHAR.) in step 1 of
         // the WebVTT parser algorithm.
-        if (c == '\0')
-            c = replacementCharacter;
+        if (character == '\0')
+            character = replacementCharacter;
-        m_lineBuffer.append(c);
+        m_lineBuffer.append(character);
     }
 …
         // May be in the middle of a CRLF pair.
         if (!m_buffer.isEmpty()) {
             // Scan a potential newline character.
-            scanCharacter(newlineCharacter);
+            if (m_buffer.currentCharacter() == newlineCharacter)
+                m_buffer.advancePastNewline();
         } else {
-            // Check for the LF on the next call (unless we reached EOS, in
+            // Check for the newline on the next call (unless we reached EOS, in
             // which case we'll return the contents of the line buffer, and
             // reset state for the next line.)
 …
     if (shouldReturnLine) {
-        line = m_lineBuffer.toString();
+        auto line = m_lineBuffer.toString();
         m_lineBuffer.clear();
-        return true;
+        return WTFMove(line);
     }
     ASSERT(m_buffer.isEmpty());
-    return false;
+    return std::nullopt;
 }
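
The change from `bool getLine(String&)` to `std::optional<String> nextLine()` lets a caller tell "no line available" apart from "an empty line" without an out-parameter. Below is a simplified sketch of that interface shape, assuming a hypothetical `SimpleLineReader` built on `std::string` that splits on `'\n'` only; it deliberately omits the CR, CRLF and NUL handling of the real reader.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified reader used only to illustrate the return type.
class SimpleLineReader {
public:
    void append(std::string&& data) { m_buffer += std::move(data); }
    void appendEndOfStream() { m_endOfStream = true; }

    // Returns the next complete line, or std::nullopt if none is available yet.
    std::optional<std::string> nextLine()
    {
        auto newline = m_buffer.find('\n', m_position);
        if (newline == std::string::npos) {
            // An unterminated final line is only returned once the stream has ended.
            if (!m_endOfStream || m_position >= m_buffer.size())
                return std::nullopt;
            newline = m_buffer.size();
        }
        std::string line = m_buffer.substr(m_position, newline - m_position);
        m_position = newline + 1;
        return line;
    }

private:
    std::string m_buffer;
    std::size_t m_position { 0 };
    bool m_endOfStream { false };
};
```

With this shape a loop such as `while (auto line = reader.nextLine())` keeps running on empty lines, because the condition tests whether the optional holds a value rather than whether the string is non-empty.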

trunk/Source/WebCore/html/track/BufferedLineReader.h

-              r208179
+              r209058
 //
 // Converts a stream of data (== a sequence of Strings) into a set of
-// lines. CR, LR or CRLF are considered linebreaks. Normalizes NULs (U+0000)
-// to 'REPLACEMENT CHARACTER' (U+FFFD) and does not return the linebreaks as
+// lines. CR, LR or CRLF are considered line breaks. Normalizes NULs (U+0000)
+// to 'REPLACEMENT CHARACTER' (U+FFFD) and does not return the line breaks as
 // part of the result.
 class BufferedLineReader {
     WTF_MAKE_NONCOPYABLE(BufferedLineReader);
 public:
-    BufferedLineReader()
-        : m_endOfStream(false)
-        , m_maybeSkipLF(false) { }
+    BufferedLineReader() = default;
+    void reset();
-    // Append data to the internal buffer.
-    void append(const String& data)
+    void append(String&& data)
     {
         ASSERT(!m_endOfStream);
-        m_buffer.append(SegmentedString(data));
+        m_buffer.append(WTFMove(data));
     }
-    // Indicate that no more data will be appended. This will cause any
-    // potentially "unterminated" line to be returned from getLine.
-    void setEndOfStream() { m_endOfStream = true; }
-    // Attempt to read a line from the internal buffer (fed via append).
-    // If successful, true is returned and |line| is set to the line that was
-    // read. If no line could be read false is returned.
-    bool getLine(String& line);
-    // Returns true if EOS has been reached proper.
+    void appendEndOfStream() { m_endOfStream = true; }
     bool isAtEndOfStream() const { return m_endOfStream && m_buffer.isEmpty(); }
-    void reset() { m_buffer.clear(); }
+    std::optional<String> nextLine();
 private:
-    // Consume the next character the buffer if it is the character |c|.
-    void scanCharacter(UChar c)
-    {
-        ASSERT(!m_buffer.isEmpty());
-        if (m_buffer.currentChar() == c)
-            m_buffer.advance();
-    }
     SegmentedString m_buffer;
     StringBuilder m_lineBuffer;
-    bool m_endOfStream;
-    bool m_maybeSkipLF;
+    bool m_endOfStream { false };
+    bool m_maybeSkipLF { false };
 };
+inline void BufferedLineReader::reset()
+{
+    m_buffer.clear();
+    m_lineBuffer.clear();
+    m_endOfStream = false;
+    m_maybeSkipLF = false;
+}
 } // namespace WebCore

trunk/Source/WebCore/html/track/InbandGenericTextTrack.cpp

-              r208658
+              r209058
 }
-void InbandGenericTextTrack::parseWebVTTFileHeader(InbandTextTrackPrivate* trackPrivate, String header)
+void InbandGenericTextTrack::parseWebVTTFileHeader(InbandTextTrackPrivate* trackPrivate, String&& header)
 {
     ASSERT_UNUSED(trackPrivate, trackPrivate == m_private);
-    parser().parseFileHeader(header);
+    parser().parseFileHeader(WTFMove(header));
 }
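
The header string is now passed by rvalue reference at each level and handed on with WTFMove. A named rvalue-reference parameter is itself an lvalue inside the function body, which is why the explicit move is needed at every hop. The sketch below shows the same pattern with `std::string` and `std::move`; the `LineBuffer` and `HeaderParser` classes are hypothetical stand-ins, not WebKit types.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical innermost sink that takes ownership of the text.
class LineBuffer {
public:
    void append(std::string&& data) { m_segments.push_back(std::move(data)); }
    const std::vector<std::string>& segments() const { return m_segments; }

private:
    std::vector<std::string> m_segments;
};

// Hypothetical parser that forwards ownership to its buffer.
class HeaderParser {
public:
    void parseFileHeader(std::string&& data)
    {
        // "data" has a name, so it is an lvalue here; std::move turns it
        // back into an rvalue so append(std::string&&) can accept it.
        m_buffer.append(std::move(data));
    }
    const LineBuffer& buffer() const { return m_buffer; }

private:
    LineBuffer m_buffer;
};
```

A caller holding a named string would write `parser.parseFileHeader(std::move(header))`, after which `header` should be treated as consumed.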

trunk/Source/WebCore/html/track/InbandGenericTextTrack.h

-              r207907
+              r209058
     WebVTTParser& parser();
     void parseWebVTTCueData(InbandTextTrackPrivate*, const ISOWebVTTCue&) final;
-    void parseWebVTTFileHeader(InbandTextTrackPrivate*, String) final;
+    void parseWebVTTFileHeader(InbandTextTrackPrivate*, String&&) final;
     void newCuesParsed() final;

trunk/Source/WebCore/html/track/InbandTextTrack.h

-              r200361
+              r209058
     void removeGenericCue(InbandTextTrackPrivate*, GenericCueData*) override { ASSERT_NOT_REACHED(); }
-    void parseWebVTTFileHeader(InbandTextTrackPrivate*, String) override { ASSERT_NOT_REACHED(); }
+    void parseWebVTTFileHeader(InbandTextTrackPrivate*, String&&) override { ASSERT_NOT_REACHED(); }
     void parseWebVTTCueData(InbandTextTrackPrivate*, const char*, unsigned) override { ASSERT_NOT_REACHED(); }
     void parseWebVTTCueData(InbandTextTrackPrivate*, const ISOWebVTTCue&) override { ASSERT_NOT_REACHED(); }

trunk/Source/WebCore/html/track/WebVTTParser.cpp

-              r203302
+              r209058
 }
-void WebVTTParser::parseFileHeader(const String& data)
+void WebVTTParser::parseFileHeader(String&& data)
 {
     m_state = Initial;
     m_lineReader.reset();
-    m_lineReader.append(data);
+    m_lineReader.append(WTFMove(data));
     parse();
 }
 …
 void WebVTTParser::parseBytes(const char* data, unsigned length)
 {
-    String textData = m_decoder->decode(data, length);
-    m_lineReader.append(textData);
+    m_lineReader.append(m_decoder->decode(data, length));
     parse();
 }
 …
 void WebVTTParser::parseCueData(const ISOWebVTTCue& data)
 {
-    RefPtr<WebVTTCueData> cue = WebVTTCueData::create();
+    auto cue = WebVTTCueData::create();
     MediaTime startTime = data.presentationTime();
 …
         cue->setOriginalStartTime(originalStartTime);
-    m_cuelist.append(cue);
+    m_cuelist.append(WTFMove(cue));
     if (m_client)
         m_client->newCuesParsed();
 …
 void WebVTTParser::flush()
 {
-    String textData = m_decoder->flush();
-    m_lineReader.append(textData);
-    m_lineReader.setEndOfStream();
+    m_lineReader.append(m_decoder->flush());
+    m_lineReader.appendEndOfStream();
     parse();
     flushPendingCue();
 …
     // WebVTT parser algorithm. (5.1 WebVTT file parsing.)
     // Steps 1 - 3 - Initial setup.
-    String line;
-    while (m_lineReader.getLine(line)) {
-        if (line.isNull())
-            return;
+    while (auto line = m_lineReader.nextLine()) {
         switch (m_state) {
         case Initial:
             // Steps 4 - 9 - Check for a valid WebVTT signature.
-            if (!hasRequiredFileIdentifier(line)) {
+            if (!hasRequiredFileIdentifier(*line)) {
                 if (m_client)
                     m_client->fileFailedToParse();
 …
         case Header:
-            collectMetadataHeader(line);
-            if (line.isEmpty()) {
+            collectMetadataHeader(*line);
+            if (line->isEmpty()) {
                 // Steps 10-14 - Allow a header (comment area) under the WEBVTT line.
                 if (m_client && m_regionList.size())
 …
             }
             // Step 15 - Break out of header loop if the line could be a timestamp line.
-            if (line.contains("-->"))
-                m_state = recoverCue(line);
+            if (line->contains("-->"))
+                m_state = recoverCue(*line);
             // Step 16 - Line is not the empty string and does not contain "-->".
 …
         case Id:
             // Steps 17 - 20 - Allow any number of line terminators, then initialize new cue values.
-            if (line.isEmpty())
+            if (line->isEmpty())
                 break;
 …
             // Steps 22 - 25 - Check if this line contains an optional identifier or timing data.
-            m_state = collectCueId(line);
+            m_state = collectCueId(*line);
             break;
         case TimingsAndSettings:
             // Steps 26 - 27 - Discard current cue if the line is empty.
-            if (line.isEmpty()) {
+            if (line->isEmpty()) {
                 m_state = Id;
                 break;
 …
             // Steps 28 - 29 - Collect cue timings and settings.
-            m_state = collectTimingsAndSettings(line);
+            m_state = collectTimingsAndSettings(*line);
             break;
         case CueText:
             // Steps 31 - 41 - Collect the cue text, create a cue, and add it to the output.
-            m_state = collectCueText(line);
+            m_state = collectCueText(*line);
             break;
         case BadCue:
             // Steps 42 - 48 - Discard lines until an empty line or a potential timing line is seen.
-            m_state = ignoreBadCue(line);
+            m_state = ignoreBadCue(*line);
             break;

trunk/Source/WebCore/html/track/WebVTTParser.h

-              r208179
+              r209058
     // Input data to the parser to parse.
     void parseBytes(const char*, unsigned);
-    void parseFileHeader(const String&);
+    void parseFileHeader(String&&);
     void parseCueData(const ISOWebVTTCue&);
     void flush();

trunk/Source/WebCore/html/track/WebVTTTokenizer.cpp

-              r178265
+              r209058
 #include "config.h"
+#include "WebVTTTokenizer.h"
 #if ENABLE(VIDEO_TRACK)
-#include "WebVTTTokenizer.h"
 #include "MarkupTokenizerInlines.h"
 …
         goto stateName;                                     \
     } while (false)
 template<unsigned charactersCount> ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount])
 {
 …
 inline bool advanceAndEmitToken(SegmentedString& source, WebVTTToken& resultToken, const WebVTTToken& token)
 {
-    source.advanceAndUpdateLineNumber();
+    source.advance();
     return emitToken(resultToken, token);
 }
 …
     // Append an EOF marker and close the input "stream".
     ASSERT(!m_input.isClosed());
-    m_input.append(SegmentedString(String(&kEndOfFileMarker, 1)));
+    m_input.append(String { &kEndOfFileMarker, 1 });
     m_input.close();
 }

trunk/Source/WebCore/platform/graphics/InbandTextTrackPrivateClient.h

-              r206538
+              r209058
     virtual void removeGenericCue(InbandTextTrackPrivate*, GenericCueData*) = 0;
-    virtual void parseWebVTTFileHeader(InbandTextTrackPrivate*, String) { ASSERT_NOT_REACHED(); }
+    virtual void parseWebVTTFileHeader(InbandTextTrackPrivate*, String&&) { ASSERT_NOT_REACHED(); }
     virtual void parseWebVTTCueData(InbandTextTrackPrivate*, const char* data, unsigned length) = 0;
     virtual void parseWebVTTCueData(InbandTextTrackPrivate*, const ISOWebVTTCue&) = 0;

trunk/Source/WebCore/platform/text/SegmentedString.cpp

-              r178265
+              r209058
 /*
     Copyright (C) 2004, 2005, 2006, 2007, 2008 Apple Inc. All rights reserved.
+    Copyright (C) 2004-2016 Apple Inc. All rights reserved.
     This library is free software; you can redistribute it and/or
 …
 #include "SegmentedString.h"
+#include <wtf/text/StringBuilder.h>
 #include <wtf/text/TextPosition.h>
 namespace WebCore {
+SegmentedString::SegmentedString(const SegmentedString& other)
+    : m_pushedChar1(other.m_pushedChar1)
+    , m_pushedChar2(other.m_pushedChar2)
+    , m_currentString(other.m_currentString)
+    , m_numberOfCharactersConsumedPriorToCurrentString(other.m_numberOfCharactersConsumedPriorToCurrentString)
+    , m_numberOfCharactersConsumedPriorToCurrentLine(other.m_numberOfCharactersConsumedPriorToCurrentLine)
+    , m_currentLine(other.m_currentLine)
+    , m_substrings(other.m_substrings)
+    , m_closed(other.m_closed)
+    , m_empty(other.m_empty)
+    , m_fastPathFlags(other.m_fastPathFlags)
+    , m_advanceFunc(other.m_advanceFunc)
+    , m_advanceAndUpdateLineNumberFunc(other.m_advanceAndUpdateLineNumberFunc)
+{
+    if (m_pushedChar2)
+        m_currentChar = m_pushedChar2;
+    else if (m_pushedChar1)
+        m_currentChar = m_pushedChar1;
+    else
+        m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
+}
+SegmentedString& SegmentedString::operator=(const SegmentedString& other)
+{
+    m_pushedChar1 = other.m_pushedChar1;
+    m_pushedChar2 = other.m_pushedChar2;
+    m_currentString = other.m_currentString;
+    m_substrings = other.m_substrings;
+    if (m_pushedChar2)
+        m_currentChar = m_pushedChar2;
+    else if (m_pushedChar1)
+        m_currentChar = m_pushedChar1;
+    else
+        m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
+    m_closed = other.m_closed;
+    m_empty = other.m_empty;
+    m_fastPathFlags = other.m_fastPathFlags;
+    m_numberOfCharactersConsumedPriorToCurrentString = other.m_numberOfCharactersConsumedPriorToCurrentString;
+inline void SegmentedString::Substring::appendTo(StringBuilder& builder) const
+{
+    builder.append(string, string.length() - length, length);
+}
+SegmentedString& SegmentedString::operator=(SegmentedString&& other)
+{
+    m_currentSubstring = WTFMove(other.m_currentSubstring);
+    m_otherSubstrings = WTFMove(other.m_otherSubstrings);
+    m_isClosed = other.m_isClosed;
+    m_currentCharacter = other.m_currentCharacter;
+    m_numberOfCharactersConsumedPriorToCurrentSubstring = other.m_numberOfCharactersConsumedPriorToCurrentSubstring;
     m_numberOfCharactersConsumedPriorToCurrentLine = other.m_numberOfCharactersConsumedPriorToCurrentLine;
     m_currentLine = other.m_currentLine;
-    m_advanceFunc = other.m_advanceFunc;
-    m_advanceAndUpdateLineNumberFunc = other.m_advanceAndUpdateLineNumberFunc;
+    m_fastPathFlags = other.m_fastPathFlags;
+    m_advanceWithoutUpdatingLineNumberFunction = other.m_advanceWithoutUpdatingLineNumberFunction;
+    m_advanceAndUpdateLineNumberFunction = other.m_advanceAndUpdateLineNumberFunction;
+    other.clear();
     return *this;
 …
 unsigned SegmentedString::length() const
+{
-    unsigned length = m_currentString.m_length;
-    if (m_pushedChar1) {
-        ++length;
-        if (m_pushedChar2)
-            ++length;
-    }
-    if (isComposite()) {
-        Deque<SegmentedSubstring>::const_iterator it = m_substrings.begin();
-        Deque<SegmentedSubstring>::const_iterator e = m_substrings.end();
-        for (; it != e; ++it)
-            length += it->m_length;
-    }
+    unsigned length = m_currentSubstring.length;
+    for (auto& substring : m_otherSubstrings)
+        length += substring.length;
     return length;
+}
 …
 void SegmentedString::setExcludeLineNumbers()
+{
-    m_currentString.setExcludeLineNumbers();
-    if (isComposite()) {
-        Deque<SegmentedSubstring>::iterator it = m_substrings.begin();
-        Deque<SegmentedSubstring>::iterator e = m_substrings.end();
-        for (; it != e; ++it)
-            it->setExcludeLineNumbers();
-    }
+    if (!m_currentSubstring.doNotExcludeLineNumbers)
+        return;
+    m_currentSubstring.doNotExcludeLineNumbers = false;
+    for (auto& substring : m_otherSubstrings)
+        substring.doNotExcludeLineNumbers = false;
+    updateAdvanceFunctionPointers();
+}
 void SegmentedString::clear()
+{
-    m_pushedChar1 = 0;
-    m_pushedChar2 = 0;
-    m_currentChar = 0;
-    m_currentString.clear();
-    m_numberOfCharactersConsumedPriorToCurrentString = 0;
+    m_currentSubstring.length = 0;
+    m_otherSubstrings.clear();
+    m_isClosed = false;
+    m_currentCharacter = 0;
+    m_numberOfCharactersConsumedPriorToCurrentSubstring = 0;
     m_numberOfCharactersConsumedPriorToCurrentLine = 0;
     m_currentLine = 0;
-    m_substrings.clear();
-    m_closed = false;
-    m_empty = true;
-    m_fastPathFlags = NoFastPath;
-    m_advanceFunc = &SegmentedString::advanceEmpty;
-    m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceEmpty;
-}
-void SegmentedString::append(const SegmentedSubstring& s)
-{
-    ASSERT(!m_closed);
-    if (!s.m_length)
+    updateAdvanceFunctionPointersForEmptyString();
+}
+inline void SegmentedString::appendSubstring(Substring&& substring)
+{
+    ASSERT(!m_isClosed);
+    if (!substring.length)
         return;
-    if (!m_currentString.m_length) {
-        m_numberOfCharactersConsumedPriorToCurrentString += m_currentString.numberOfCharactersConsumed();
-        m_currentString = s;
-        updateAdvanceFunctionPointers();
-    } else
-        m_substrings.append(s);
-    m_empty = false;
-}
-void SegmentedString::pushBack(const SegmentedSubstring& s)
-{
-    ASSERT(!m_pushedChar1);
-    ASSERT(!s.numberOfCharactersConsumed());
-    if (!s.m_length)
-        return;
-    // FIXME: We're assuming that the characters were originally consumed by
-    //        this SegmentedString.  We're also ASSERTing that s is a fresh
-    //        SegmentedSubstring.  These assumptions are sufficient for our
-    //        current use, but we might need to handle the more elaborate
-    //        cases in the future.
-    m_numberOfCharactersConsumedPriorToCurrentString += m_currentString.numberOfCharactersConsumed();
-    m_numberOfCharactersConsumedPriorToCurrentString -= s.m_length;
-    if (!m_currentString.m_length) {
-        m_currentString = s;
-        updateAdvanceFunctionPointers();
-    } else {
-        // Shift our m_currentString into our list.
-        m_substrings.prepend(m_currentString);
-        m_currentString = s;
+    if (m_currentSubstring.length)
+        m_otherSubstrings.append(WTFMove(substring));
+    else {
+        m_numberOfCharactersConsumedPriorToCurrentSubstring += m_currentSubstring.numberOfCharactersConsumed();
+        m_currentSubstring = WTFMove(substring);
+        m_currentCharacter = m_currentSubstring.currentCharacter();
         updateAdvanceFunctionPointers();
+    }
-    m_empty = false;
+}
+void SegmentedString::pushBack(String&& string)
+{
+    // We never create a substring for an empty string.
+    ASSERT(string.length());
+    // The new substring we will create won't have the doNotExcludeLineNumbers set appropriately.
+    // That was lost when the characters were consumed before pushing them back. But this does
+    // not matter, because clients never use this for newlines. Catch that with this assertion.
+    ASSERT(!string.contains('\n'));
+    // The characters in the string must be previously consumed characters from this segmented string.
+    ASSERT(string.length() <= numberOfCharactersConsumed());
+    m_numberOfCharactersConsumedPriorToCurrentSubstring += m_currentSubstring.numberOfCharactersConsumed();
+    if (m_currentSubstring.length)
+        m_otherSubstrings.prepend(WTFMove(m_currentSubstring));
+    m_currentSubstring = WTFMove(string);
+    m_numberOfCharactersConsumedPriorToCurrentSubstring -= m_currentSubstring.length;
+    m_currentCharacter = m_currentSubstring.currentCharacter();
+    updateAdvanceFunctionPointers();
+}
 void SegmentedString::close()
+{
-    // Closing a stream twice is likely a coding mistake.
-    ASSERT(!m_closed);
-    m_closed = true;
-}
-void SegmentedString::append(const SegmentedString& s)
-{
-    ASSERT(!m_closed);
-    ASSERT(!s.m_pushedChar1);
-    append(s.m_currentString);
-    if (s.isComposite()) {
-        Deque<SegmentedSubstring>::const_iterator it = s.m_substrings.begin();
-        Deque<SegmentedSubstring>::const_iterator e = s.m_substrings.end();
-        for (; it != e; ++it)
-            append(*it);
+    ASSERT(!m_isClosed);
+    m_isClosed = true;
+}
+void SegmentedString::append(const SegmentedString& string)
+{
+    appendSubstring(Substring { string.m_currentSubstring });
+    for (auto& substring : string.m_otherSubstrings)
+        m_otherSubstrings.append(substring);
+}
+void SegmentedString::append(SegmentedString&& string)
+{
+    appendSubstring(WTFMove(string.m_currentSubstring));
+    for (auto& substring : string.m_otherSubstrings)
+        m_otherSubstrings.append(WTFMove(substring));
+}
+void SegmentedString::append(String&& string)
+{
+    appendSubstring(WTFMove(string));
+}
+void SegmentedString::append(const String& string)
+{
+    appendSubstring(String { string });
+}
+String SegmentedString::toString() const
+{
+    StringBuilder result;
+    m_currentSubstring.appendTo(result);
+    for (auto& substring : m_otherSubstrings)
+        substring.appendTo(result);
+    return result.toString();
+}
+void SegmentedString::advanceWithoutUpdatingLineNumber16()
+{
+    m_currentCharacter = *++m_currentSubstring.currentCharacter16;
+    decrementAndCheckLength();
+}
+void SegmentedString::advanceAndUpdateLineNumber16()
+{
+    ASSERT(m_currentSubstring.doNotExcludeLineNumbers);
+    processPossibleNewline();
+    m_currentCharacter = *++m_currentSubstring.currentCharacter16;
+    decrementAndCheckLength();
+}
+inline void SegmentedString::advancePastSingleCharacterSubstringWithoutUpdatingLineNumber()
+{
+    ASSERT(m_currentSubstring.length == 1);
+    if (m_otherSubstrings.isEmpty()) {
+        m_currentSubstring.length = 0;
+        m_currentCharacter = 0;
+        updateAdvanceFunctionPointersForEmptyString();
+        return;
+    }
+    m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
+}
+void SegmentedString::pushBack(const SegmentedString& s)
+{
+    ASSERT(!m_pushedChar1);
+    ASSERT(!s.m_pushedChar1);
+    if (s.isComposite()) {
+        Deque<SegmentedSubstring>::const_reverse_iterator it = s.m_substrings.rbegin();
+        Deque<SegmentedSubstring>::const_reverse_iterator e = s.m_substrings.rend();
+        for (; it != e; ++it)
+            pushBack(*it);
+    }
+    pushBack(s.m_currentString);
+    m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
+}
+void SegmentedString::advanceSubstring()
+{
+    if (isComposite()) {
+        m_numberOfCharactersConsumedPriorToCurrentString += m_currentString.numberOfCharactersConsumed();
+        m_currentString = m_substrings.takeFirst();
+        // If we've previously consumed some characters of the non-current
+        // string, we now account for those characters as part of the current
+        // string, not as part of "prior to current string."
+        m_numberOfCharactersConsumedPriorToCurrentString -= m_currentString.numberOfCharactersConsumed();
+        updateAdvanceFunctionPointers();
+    } else {
+        m_currentString.clear();
+        m_empty = true;
+        m_fastPathFlags = NoFastPath;
+        m_advanceFunc = &SegmentedString::advanceEmpty;
+        m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceEmpty;
+    }
+}
+String SegmentedString::toString() const
+{
+    StringBuilder result;
+    if (m_pushedChar1) {
+        result.append(m_pushedChar1);
+        if (m_pushedChar2)
+            result.append(m_pushedChar2);
+    }
+    m_currentString.appendTo(result);
+    if (isComposite()) {
+        Deque<SegmentedSubstring>::const_iterator it = m_substrings.begin();
+        Deque<SegmentedSubstring>::const_iterator e = m_substrings.end();
+        for (; it != e; ++it)
+            it->appendTo(result);
+    }
+    return result.toString();
+}
+void SegmentedString::advancePastNonNewlines(unsigned count, UChar* consumedCharacters)
+{
+    ASSERT_WITH_SECURITY_IMPLICATION(count <= length());
+    for (unsigned i = 0; i < count; ++i) {
+        consumedCharacters[i] = currentChar();
+        advancePastNonNewline();
+    }
+}
+void SegmentedString::advance8()
+{
+    ASSERT(!m_pushedChar1);
+    decrementAndCheckLength();
+    m_currentChar = m_currentString.incrementAndGetCurrentChar8();
+}
+void SegmentedString::advance16()
+{
+    ASSERT(!m_pushedChar1);
+    decrementAndCheckLength();
+    m_currentChar = m_currentString.incrementAndGetCurrentChar16();
+}
+void SegmentedString::advanceAndUpdateLineNumber8()
+{
+    ASSERT(!m_pushedChar1);
+    ASSERT(m_currentString.getCurrentChar() == m_currentChar);
+    if (m_currentChar == '\n') {
+        ++m_currentLine;
+        m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + 1;
+    }
+    decrementAndCheckLength();
+    m_currentChar = m_currentString.incrementAndGetCurrentChar8();
+}
+void SegmentedString::advanceAndUpdateLineNumber16()
+{
+    ASSERT(!m_pushedChar1);
+    ASSERT(m_currentString.getCurrentChar() == m_currentChar);
+    if (m_currentChar == '\n') {
+        ++m_currentLine;
+        m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + 1;
+    }
+    decrementAndCheckLength();
+    m_currentChar = m_currentString.incrementAndGetCurrentChar16();
+}
+void SegmentedString::advanceSlowCase()
+{
+    if (m_pushedChar1) {
+        m_pushedChar1 = m_pushedChar2;
+        m_pushedChar2 = 0;
+        if (m_pushedChar1) {
+            m_currentChar = m_pushedChar1;
+            return;
+        }
+        updateAdvanceFunctionPointers();
+    } else if (m_currentString.m_length) {
+        if (--m_currentString.m_length == 0)
+            advanceSubstring();
+    } else if (!isComposite()) {
+        m_currentString.clear();
+        m_empty = true;
+        m_fastPathFlags = NoFastPath;
+        m_advanceFunc = &SegmentedString::advanceEmpty;
+        m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceEmpty;
+    }
+    m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
+}
+void SegmentedString::advanceAndUpdateLineNumberSlowCase()
+{
+    if (m_pushedChar1) {
+        m_pushedChar1 = m_pushedChar2;
+        m_pushedChar2 = 0;
+        if (m_pushedChar1) {
+            m_currentChar = m_pushedChar1;
+            return;
+        }
+        updateAdvanceFunctionPointers();
+    } else if (m_currentString.m_length) {
+        if (m_currentString.getCurrentChar() == '\n' && m_currentString.doNotExcludeLineNumbers()) {
+            ++m_currentLine;
+            // Plus 1 because numberOfCharactersConsumed value hasn't incremented yet; it does with m_length decrement below.
+            m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + 1;
+        }
+        if (--m_currentString.m_length == 0)
+            advanceSubstring();
+        else
+            m_currentString.incrementAndGetCurrentChar(); // Only need the ++
+    } else if (!isComposite()) {
+        m_currentString.clear();
+        m_empty = true;
+        m_fastPathFlags = NoFastPath;
+        m_advanceFunc = &SegmentedString::advanceEmpty;
+        m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceEmpty;
+    }
+    m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
+    m_numberOfCharactersConsumedPriorToCurrentSubstring += m_currentSubstring.numberOfCharactersConsumed();
+    m_currentSubstring = m_otherSubstrings.takeFirst();
+    // If we've previously consumed some characters of the non-current string, we now account for those
+    // characters as part of the current string, not as part of "prior to current string."
+    m_numberOfCharactersConsumedPriorToCurrentSubstring -= m_currentSubstring.numberOfCharactersConsumed();
+    m_currentCharacter = m_currentSubstring.currentCharacter();
+    updateAdvanceFunctionPointers();
+}
+void SegmentedString::advancePastSingleCharacterSubstring()
+{
+    ASSERT(m_currentSubstring.length == 1);
+    ASSERT(m_currentSubstring.doNotExcludeLineNumbers);
+    processPossibleNewline();
+    advancePastSingleCharacterSubstringWithoutUpdatingLineNumber();
+}
 void SegmentedString::advanceEmpty()
+{
+    ASSERT(!m_currentString.m_length && !isComposite());
+    m_currentChar = 0;
+}
+void SegmentedString::updateSlowCaseFunctionPointers()
+{
+    ASSERT(!m_currentSubstring.length);
+    ASSERT(m_otherSubstrings.isEmpty());
+    ASSERT(!m_currentCharacter);
+}
+void SegmentedString::updateAdvanceFunctionPointersForSingleCharacterSubstring()
+{
+    ASSERT(m_currentSubstring.length == 1);
     m_fastPathFlags = NoFastPath;
+    m_advanceFunc = &SegmentedString::advanceSlowCase;
+    m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceAndUpdateLineNumberSlowCase;
+    m_advanceWithoutUpdatingLineNumberFunction = &SegmentedString::advancePastSingleCharacterSubstringWithoutUpdatingLineNumber;
+    if (m_currentSubstring.doNotExcludeLineNumbers)
+        m_advanceAndUpdateLineNumberFunction = &SegmentedString::advancePastSingleCharacterSubstring;
+    else
+        m_advanceAndUpdateLineNumberFunction = &SegmentedString::advancePastSingleCharacterSubstringWithoutUpdatingLineNumber;
+}
 …
+}
+SegmentedString::AdvancePastResult SegmentedString::advancePastSlowCase(const char* literal, bool caseSensitive)
+{
+    unsigned length = strlen(literal);
+SegmentedString::AdvancePastResult SegmentedString::advancePastSlowCase(const char* literal, bool lettersIgnoringASCIICase)
+{
+    constexpr unsigned maxLength = 10;
+    ASSERT(!strchr(literal, '\n'));
+    auto length = strlen(literal);
+    ASSERT(length <= maxLength);
     if (length > this->length())
         return NotEnoughCharacters;
+    UChar* consumedCharacters;
+    String consumedString = String::createUninitialized(length, consumedCharacters);
+    advancePastNonNewlines(length, consumedCharacters);
+    if (consumedString.startsWith(literal, caseSensitive))
+        return DidMatch;
+    pushBack(SegmentedString(consumedString));
+    return DidNotMatch;
+}
+}
+    UChar consumedCharacters[maxLength];
+    for (unsigned i = 0; i < length; ++i) {
+        auto character = m_currentCharacter;
+        if (characterMismatch(character, literal[i], lettersIgnoringASCIICase)) {
+            if (i)
+                pushBack(String { consumedCharacters, i });
+            return DidNotMatch;
+        }
+        advancePastNonNewline();
+        consumedCharacters[i] = character;
+    }
+    return DidMatch;
+}
+void SegmentedString::updateAdvanceFunctionPointersForEmptyString()
+{
+    ASSERT(!m_currentSubstring.length);
+    ASSERT(m_otherSubstrings.isEmpty());
+    ASSERT(!m_currentCharacter);
+    m_fastPathFlags = NoFastPath;
+    m_advanceWithoutUpdatingLineNumberFunction = &SegmentedString::advanceEmpty;
+    m_advanceAndUpdateLineNumberFunction = &SegmentedString::advanceEmpty;
+}
+}
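
Reviewer's sketch (not part of the changeset): the new advancePastSlowCase above matches the literal one character at a time, records what it has consumed in a small fixed-size buffer, and on a mismatch pushes only those consumed characters back. The standalone model below illustrates that idea under simplifying assumptions. SegmentQueue and all of its members are hypothetical stand-ins built on std::deque and std::string; it handles 8-bit characters and case-sensitive matching only, and it omits line counting and the function-pointer fast paths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <string>
#include <utility>

// Hypothetical, simplified model of a segmented input stream.
class SegmentQueue {
public:
    enum Result { DidNotMatch, DidMatch, NotEnoughCharacters };

    // Add a segment at the end; empty segments are never stored.
    void append(std::string segment)
    {
        if (!segment.empty())
            m_segments.push_back(std::move(segment));
    }

    // Return previously consumed characters to the front of the stream.
    void pushBack(std::string consumed)
    {
        if (!consumed.empty())
            m_segments.push_front(std::move(consumed));
    }

    std::size_t length() const
    {
        std::size_t total = 0;
        for (auto& segment : m_segments)
            total += segment.size();
        return total;
    }

    std::string toString() const
    {
        std::string result;
        for (auto& segment : m_segments)
            result += segment;
        return result;
    }

    // Consume the literal if the stream starts with it; otherwise restore the stream.
    Result advancePast(const char* literal)
    {
        constexpr std::size_t maxLength = 10;
        std::size_t literalLength = std::strlen(literal);
        assert(literalLength <= maxLength);
        if (literalLength > length())
            return NotEnoughCharacters;
        char consumedCharacters[maxLength];
        for (std::size_t i = 0; i < literalLength; ++i) {
            char character = m_segments.front().front();
            if (character != literal[i]) {
                // Only the i characters consumed so far need to go back.
                pushBack(std::string(consumedCharacters, i));
                return DidNotMatch;
            }
            advance();
            consumedCharacters[i] = character;
        }
        return DidMatch;
    }

private:
    // Drop one character, moving to the next segment when the current one runs out.
    void advance()
    {
        m_segments.front().erase(0, 1);
        if (m_segments.front().empty())
            m_segments.pop_front();
    }

    std::deque<std::string> m_segments;
};
```

In this model a literal that straddles two segments still matches, and a failed match leaves the stream contents unchanged, which is the property the stack buffer plus pushBack is meant to preserve without allocating a string on the matching path.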

trunk/Source/WebCore/platform/text/SegmentedString.h

-              r178265
+              r209058
 /*
-    Copyright (C) 2004-2008, 2015 Apple Inc. All rights reserved.
+    Copyright (C) 2004-2016 Apple Inc. All rights reserved.
     This library is free software; you can redistribute it and/or
 …
 */
-#ifndef SegmentedString_h
-#define SegmentedString_h
+#pragma once
 #include <wtf/Deque.h>
 #include <wtf/text/StringBuilder.h>
+#include <wtf/text/WTFString.h>
 namespace WebCore {
+class SegmentedString;
+class SegmentedSubstring {
+public:
+    SegmentedSubstring()
+        : m_length(0)
+        , m_doNotExcludeLineNumbers(true)
+        , m_is8Bit(false)
+    {
+        m_data.string16Ptr = 0;
+    }
+    SegmentedSubstring(const String& str)
+        : m_length(str.length())
+        , m_doNotExcludeLineNumbers(true)
+        , m_string(str)
+    {
+        if (m_length) {
+            if (m_string.is8Bit()) {
+                m_is8Bit = true;
+                m_data.string8Ptr = m_string.characters8();
+            } else {
+                m_is8Bit = false;
+                m_data.string16Ptr = m_string.characters16();
+            }
+        } else
+            m_is8Bit = false;
+    }
+    void clear() { m_length = 0; m_data.string16Ptr = 0; m_is8Bit = false;}
+    bool is8Bit() { return m_is8Bit; }
+    bool excludeLineNumbers() const { return !m_doNotExcludeLineNumbers; }
+    bool doNotExcludeLineNumbers() const { return m_doNotExcludeLineNumbers; }
+    void setExcludeLineNumbers() { m_doNotExcludeLineNumbers = false; }
+    int numberOfCharactersConsumed() const { return m_string.length() - m_length; }
+    void appendTo(StringBuilder& builder) const
+    {
+        int offset = m_string.length() - m_length;
+        if (!offset) {
+            if (m_length)
+                builder.append(m_string);
+        } else
+            builder.append(m_string.substring(offset, m_length));
+    }
+    UChar getCurrentChar8()
+    {
+        return *m_data.string8Ptr;
+    }
+    UChar getCurrentChar16()
+    {
+        return m_data.string16Ptr ? *m_data.string16Ptr : 0;
+    }
+    UChar incrementAndGetCurrentChar8()
+    {
+        ASSERT(m_data.string8Ptr);
+        return *++m_data.string8Ptr;
+    }
+    UChar incrementAndGetCurrentChar16()
+    {
+        ASSERT(m_data.string16Ptr);
+        return *++m_data.string16Ptr;
+    }
+    String currentSubString(unsigned length)
+    {
+        int offset = m_string.length() - m_length;
+        return m_string.substring(offset, length);
+    }
+    ALWAYS_INLINE UChar getCurrentChar()
+    {
+        ASSERT(m_length);
+        if (is8Bit())
+            return getCurrentChar8();
+        return getCurrentChar16();
+    }
+    ALWAYS_INLINE UChar incrementAndGetCurrentChar()
+    {
+        ASSERT(m_length);
+        if (is8Bit())
+            return incrementAndGetCurrentChar8();
+        return incrementAndGetCurrentChar16();
+    }
+public:
+    union {
+        const LChar* string8Ptr;
+        const UChar* string16Ptr;
+    } m_data;
+    int m_length;
+private:
+    bool m_doNotExcludeLineNumbers;
+    bool m_is8Bit;
+    String m_string;
+};
+// FIXME: This should not start with "k".
+// FIXME: This is a shared tokenizer concept, not a SegmentedString concept, but this is the only common header for now.
+constexpr LChar kEndOfFileMarker = 0;
 class SegmentedString {
 public:
-    SegmentedString()
-        : m_pushedChar1(0)
-        , m_pushedChar2(0)
-        , m_currentChar(0)
-        , m_numberOfCharactersConsumedPriorToCurrentString(0)
-        , m_numberOfCharactersConsumedPriorToCurrentLine(0)
-        , m_currentLine(0)
-        , m_closed(false)
-        , m_empty(true)
-        , m_fastPathFlags(NoFastPath)
-        , m_advanceFunc(&SegmentedString::advanceEmpty)
-        , m_advanceAndUpdateLineNumberFunc(&SegmentedString::advanceEmpty)
-    {
-    }
-    SegmentedString(const String& str)
-        : m_pushedChar1(0)
-        , m_pushedChar2(0)
-        , m_currentString(str)
-        , m_currentChar(0)
-        , m_numberOfCharactersConsumedPriorToCurrentString(0)
-        , m_numberOfCharactersConsumedPriorToCurrentLine(0)
-        , m_currentLine(0)
-        , m_closed(false)
-        , m_empty(!str.length())
-        , m_fastPathFlags(NoFastPath)
-    {
-        if (m_currentString.m_length)
-            m_currentChar = m_currentString.getCurrentChar();
-        updateAdvanceFunctionPointers();
-    }
-    SegmentedString(const SegmentedString&);
-    SegmentedString& operator=(const SegmentedString&);
+    SegmentedString() = default;
+    SegmentedString(String&&);
+    SegmentedString(const String&);
+    SegmentedString(SegmentedString&&) = delete;
+    SegmentedString(const SegmentedString&) = delete;
+    SegmentedString& operator=(SegmentedString&&);
+    SegmentedString& operator=(const SegmentedString&) = default;
     void clear();
     void close();
+    void append(SegmentedString&&);
     void append(const SegmentedString&);
+    void pushBack(const SegmentedString&);
+    void append(String&&);
+    void append(const String&);
+    void pushBack(String&&);
     void setExcludeLineNumbers();
-    void push(UChar c)
-    {
-        if (!m_pushedChar1) {
-            m_pushedChar1 = c;
-            m_currentChar = m_pushedChar1 ? m_pushedChar1 : m_currentString.getCurrentChar();
-            updateSlowCaseFunctionPointers();
-        } else {
-            ASSERT(!m_pushedChar2);
-            m_pushedChar2 = c;
-        }
-    }
-    bool isEmpty() const { return m_empty; }
+    bool isEmpty() const { return !m_currentSubstring.length; }
     unsigned length() const;
-    bool isClosed() const { return m_closed; }
+    bool isClosed() const { return m_isClosed; }
+    void advance();
+    void advancePastNonNewline(); // Faster than calling advance when we know the current character is not a newline.
+    void advancePastNewline(); // Faster than calling advance when we know the current character is a newline.
     enum AdvancePastResult { DidNotMatch, DidMatch, NotEnoughCharacters };
+    template<unsigned length> AdvancePastResult advancePast(const char (&literal)[length]) { return advancePast(literal, length - 1, true); }
+    template<unsigned length> AdvancePastResult advancePastIgnoringCase(const char (&literal)[length]) { return advancePast(literal, length - 1, false); }
+    void advance()
+    {
+        if (m_fastPathFlags & Use8BitAdvance) {
+            ASSERT(!m_pushedChar1);
+            bool haveOneCharacterLeft = (--m_currentString.m_length == 1);
+            m_currentChar = m_currentString.incrementAndGetCurrentChar8();
+            if (!haveOneCharacterLeft)
+                return;
+            updateSlowCaseFunctionPointers();
+            return;
+        }
+        (this->*m_advanceFunc)();
+    }
+    void advanceAndUpdateLineNumber()
+    {
+        if (m_fastPathFlags & Use8BitAdvance) {
+            ASSERT(!m_pushedChar1);
+            bool haveNewLine = (m_currentChar == '\n') & !!(m_fastPathFlags & Use8BitAdvanceAndUpdateLineNumbers);
+            bool haveOneCharacterLeft = (--m_currentString.m_length == 1);
+            m_currentChar = m_currentString.incrementAndGetCurrentChar8();
+            if (!(haveNewLine | haveOneCharacterLeft))
+                return;
+            if (haveNewLine) {
+                ++m_currentLine;
+                m_numberOfCharactersConsumedPriorToCurrentLine =  m_numberOfCharactersConsumedPriorToCurrentString + m_currentString.numberOfCharactersConsumed();
+            }
+            if (haveOneCharacterLeft)
+                updateSlowCaseFunctionPointers();
+            return;
+        }
+        (this->*m_advanceAndUpdateLineNumberFunc)();
+    }
+    void advancePastNonNewline()
+    {
+        ASSERT(currentChar() != '\n');
+        advance();
+    }
+    void advancePastNewlineAndUpdateLineNumber()
+    {
+        ASSERT(currentChar() == '\n');
+        if (!m_pushedChar1 && m_currentString.m_length > 1) {
+            int newLineFlag = m_currentString.doNotExcludeLineNumbers();
+            m_currentLine += newLineFlag;
+            if (newLineFlag)
+                m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + 1;
+            decrementAndCheckLength();
+            m_currentChar = m_currentString.incrementAndGetCurrentChar();
+            return;
+        }
+        advanceAndUpdateLineNumberSlowCase();
+    }
+    int numberOfCharactersConsumed() const
+    {
+        int numberOfPushedCharacters = 0;
+        if (m_pushedChar1) {
+            ++numberOfPushedCharacters;
+            if (m_pushedChar2)
+                ++numberOfPushedCharacters;
+        }
+        return m_numberOfCharactersConsumedPriorToCurrentString + m_currentString.numberOfCharactersConsumed() - numberOfPushedCharacters;
+    }
+    template<unsigned length> AdvancePastResult advancePast(const char (&literal)[length]) { return advancePast<length, false>(literal); }
+    template<unsigned length> AdvancePastResult advancePastLettersIgnoringASCIICase(const char (&literal)[length]) { return advancePast<length, true>(literal); }
+    unsigned numberOfCharactersConsumed() const;
     String toString() const;
     UChar currentChar() const { return m_currentChar; }
+    UChar currentCharacter() const { return m_currentCharacter; }
     OrdinalNumber currentColumn() const;
 …
 private:
+    struct Substring {
+        Substring() = default;
+        Substring(String&&);
+        UChar currentCharacter() const;
+        UChar currentCharacterPreIncrement();
+        unsigned numberOfCharactersConsumed() const;
+        void appendTo(StringBuilder&) const;
+        String string;
+        unsigned length { 0 };
+        bool is8Bit;
+        union {
+            const LChar* currentCharacter8;
+            const UChar* currentCharacter16;
+        };
+        bool doNotExcludeLineNumbers { true };
+    };
     enum FastPathFlags {
         NoFastPath = 0,
 …
     };
-    void append(const SegmentedSubstring&);
-    void pushBack(const SegmentedSubstring&);
-    void advance8();
-    void advance16();
-    void advanceAndUpdateLineNumber8();
+    void appendSubstring(Substring&&);
+    void processPossibleNewline();
+    void startNewLine();
+    void advanceWithoutUpdatingLineNumber();
+    void advanceWithoutUpdatingLineNumber16();
     void advanceAndUpdateLineNumber16();
     void advanceSlowCase();
     void advanceAndUpdateLineNumberSlowCase();
+    void advancePastSingleCharacterSubstringWithoutUpdatingLineNumber();
+    void advancePastSingleCharacterSubstring();
     void advanceEmpty();
+    void advanceSubstring();
-    void updateSlowCaseFunctionPointers();
-    void decrementAndCheckLength()
-    {
-        ASSERT(m_currentString.m_length > 1);
-        if (--m_currentString.m_length == 1)
-            updateSlowCaseFunctionPointers();
-    }
-    void updateAdvanceFunctionPointers()
-    {
-        if ((m_currentString.m_length > 1) && !m_pushedChar1) {
-            if (m_currentString.is8Bit()) {
-                m_advanceFunc = &SegmentedString::advance8;
-                m_fastPathFlags = Use8BitAdvance;
-                if (m_currentString.doNotExcludeLineNumbers()) {
-                    m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceAndUpdateLineNumber8;
-                    m_fastPathFlags |= Use8BitAdvanceAndUpdateLineNumbers;
-                } else
-                    m_advanceAndUpdateLineNumberFunc = &SegmentedString::advance8;
-                return;
+    void updateAdvanceFunctionPointers();
+    void updateAdvanceFunctionPointersForEmptyString();
+    void updateAdvanceFunctionPointersForSingleCharacterSubstring();
+    void decrementAndCheckLength();
+    template<typename CharacterType> static bool characterMismatch(CharacterType, char, bool lettersIgnoringASCIICase);
+    template<unsigned length, bool lettersIgnoringASCIICase> AdvancePastResult advancePast(const char (&literal)[length]);
+    AdvancePastResult advancePastSlowCase(const char* literal, bool lettersIgnoringASCIICase);
+    Substring m_currentSubstring;
+    Deque<Substring> m_otherSubstrings;
+    bool m_isClosed { false };
+    UChar m_currentCharacter { 0 };
+    unsigned m_numberOfCharactersConsumedPriorToCurrentSubstring { 0 };
+    unsigned m_numberOfCharactersConsumedPriorToCurrentLine { 0 };
+    int m_currentLine { 0 };
+    unsigned char m_fastPathFlags { NoFastPath };
+    void (SegmentedString::*m_advanceWithoutUpdatingLineNumberFunction)() { &SegmentedString::advanceEmpty };
+    void (SegmentedString::*m_advanceAndUpdateLineNumberFunction)() { &SegmentedString::advanceEmpty };
+};
+inline SegmentedString::Substring::Substring(String&& passedString)
+    : string(WTFMove(passedString))
+    , length(string.length())
+{
+    if (length) {
+        is8Bit = string.impl()->is8Bit();
+        if (is8Bit)
+            currentCharacter8 = string.impl()->characters8();
+        else
+            currentCharacter16 = string.impl()->characters16();
+    }
+}
+inline unsigned SegmentedString::Substring::numberOfCharactersConsumed() const
+{
+    return string.length() - length;
+}
+ALWAYS_INLINE UChar SegmentedString::Substring::currentCharacter() const
+{
+    ASSERT(length);
+    return is8Bit ? *currentCharacter8 : *currentCharacter16;
+}
+ALWAYS_INLINE UChar SegmentedString::Substring::currentCharacterPreIncrement()
+{
+    ASSERT(length);
+    return is8Bit ? *++currentCharacter8 : *++currentCharacter16;
+}
+inline SegmentedString::SegmentedString(String&& string)
+    : m_currentSubstring(WTFMove(string))
+{
+    if (m_currentSubstring.length) {
+        m_currentCharacter = m_currentSubstring.currentCharacter();
+        updateAdvanceFunctionPointers();
+    }
+}
+inline SegmentedString::SegmentedString(const String& string)
+    : SegmentedString(String { string })
+{
+}
+ALWAYS_INLINE void SegmentedString::decrementAndCheckLength()
+{
+    ASSERT(m_currentSubstring.length > 1);
+    if (UNLIKELY(--m_currentSubstring.length == 1))
+        updateAdvanceFunctionPointersForSingleCharacterSubstring();
+}
+ALWAYS_INLINE void SegmentedString::advanceWithoutUpdatingLineNumber()
+{
+    if (LIKELY(m_fastPathFlags & Use8BitAdvance)) {
+        m_currentCharacter = *++m_currentSubstring.currentCharacter8;
+        decrementAndCheckLength();
+        return;
+    }
+    (this->*m_advanceWithoutUpdatingLineNumberFunction)();
+}
+inline void SegmentedString::startNewLine()
+{
+    ++m_currentLine;
+    m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed();
+}
+inline void SegmentedString::processPossibleNewline()
+{
+    if (m_currentCharacter == '\n')
+        startNewLine();
+}
+inline void SegmentedString::advance()
+{
+    if (LIKELY(m_fastPathFlags & Use8BitAdvance)) {
+        ASSERT(m_currentSubstring.length > 1);
+        bool lastCharacterWasNewline = m_currentCharacter == '\n';
+        m_currentCharacter = *++m_currentSubstring.currentCharacter8;
+        bool haveOneCharacterLeft = --m_currentSubstring.length == 1;
+        if (LIKELY(!(lastCharacterWasNewline | haveOneCharacterLeft)))
+            return;
+        if (lastCharacterWasNewline & !!(m_fastPathFlags & Use8BitAdvanceAndUpdateLineNumbers))
+            startNewLine();
+        if (haveOneCharacterLeft)
+            updateAdvanceFunctionPointersForSingleCharacterSubstring();
+        return;
+    }
+    (this->*m_advanceAndUpdateLineNumberFunction)();
+}
+ALWAYS_INLINE void SegmentedString::advancePastNonNewline()
+{
+    ASSERT(m_currentCharacter != '\n');
+    advanceWithoutUpdatingLineNumber();
+}
+inline void SegmentedString::advancePastNewline()
+{
+    ASSERT(m_currentCharacter == '\n');
+    if (m_currentSubstring.length > 1) {
+        if (m_currentSubstring.doNotExcludeLineNumbers)
+            startNewLine();
+        m_currentCharacter = m_currentSubstring.currentCharacterPreIncrement();
+        decrementAndCheckLength();
+        return;
+    }
+    (this->*m_advanceAndUpdateLineNumberFunction)();
+}
+inline unsigned SegmentedString::numberOfCharactersConsumed() const
+{
+    return m_numberOfCharactersConsumedPriorToCurrentSubstring + m_currentSubstring.numberOfCharactersConsumed();
+}
+template<typename CharacterType> ALWAYS_INLINE bool SegmentedString::characterMismatch(CharacterType a, char b, bool lettersIgnoringASCIICase)
+{
+    return lettersIgnoringASCIICase ? !isASCIIAlphaCaselessEqual(a, b) : a != b;
+}
+template<unsigned lengthIncludingTerminator, bool lettersIgnoringASCIICase> SegmentedString::AdvancePastResult SegmentedString::advancePast(const char (&literal)[lengthIncludingTerminator])
+{
+    constexpr unsigned length = lengthIncludingTerminator - 1;
+    ASSERT(!literal[length]);
+    ASSERT(!strchr(literal, '\n'));
+    if (length + 1 < m_currentSubstring.length) {
+        if (m_currentSubstring.is8Bit) {
+            for (unsigned i = 0; i < length; ++i) {
+                if (characterMismatch(m_currentSubstring.currentCharacter8[i], literal[i], lettersIgnoringASCIICase))
+                    return DidNotMatch;
+            }
-            m_advanceFunc = &SegmentedString::advance16;
-            m_fastPathFlags = NoFastPath;
-            if (m_currentString.doNotExcludeLineNumbers())
-                m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceAndUpdateLineNumber16;
-            else
-                m_advanceAndUpdateLineNumberFunc = &SegmentedString::advance16;
+            m_currentSubstring.currentCharacter8 += length;
+            m_currentCharacter = *m_currentSubstring.currentCharacter8;
+        } else {
+            for (unsigned i = 0; i < length; ++i) {
+                if (characterMismatch(m_currentSubstring.currentCharacter16[i], literal[i], lettersIgnoringASCIICase))
+                    return DidNotMatch;
+            }
+            m_currentSubstring.currentCharacter16 += length;
+            m_currentCharacter = *m_currentSubstring.currentCharacter16;
+        }
+        m_currentSubstring.length -= length;
+        return DidMatch;
+    }
+    return advancePastSlowCase(literal, lettersIgnoringASCIICase);
+}
+inline void SegmentedString::updateAdvanceFunctionPointers()
+{
+    if (m_currentSubstring.length > 1) {
+        if (m_currentSubstring.is8Bit) {
+            m_fastPathFlags = Use8BitAdvance;
+            if (m_currentSubstring.doNotExcludeLineNumbers)
+                m_fastPathFlags |= Use8BitAdvanceAndUpdateLineNumbers;
             return;
+        }
-        if (!m_currentString.m_length && !isComposite()) {
-            m_advanceFunc = &SegmentedString::advanceEmpty;
-            m_fastPathFlags = NoFastPath;
-            m_advanceAndUpdateLineNumberFunc = &SegmentedString::advanceEmpty;
-        }
-        updateSlowCaseFunctionPointers();
-    }
-    // Writes consumed characters into consumedCharacters, which must have space for at least |count| characters.
-    void advancePastNonNewlines(unsigned count);
-    void advancePastNonNewlines(unsigned count, UChar* consumedCharacters);
-    AdvancePastResult advancePast(const char* literal, unsigned length, bool caseSensitive);
-    AdvancePastResult advancePastSlowCase(const char* literal, bool caseSensitive);
-    bool isComposite() const { return !m_substrings.isEmpty(); }
-    UChar m_pushedChar1;
-    UChar m_pushedChar2;
-    SegmentedSubstring m_currentString;
-    UChar m_currentChar;
-    int m_numberOfCharactersConsumedPriorToCurrentString;
-    int m_numberOfCharactersConsumedPriorToCurrentLine;
-    int m_currentLine;
-    Deque<SegmentedSubstring> m_substrings;
-    bool m_closed;
-    bool m_empty;
-    unsigned char m_fastPathFlags;
-    void (SegmentedString::*m_advanceFunc)();
-    void (SegmentedString::*m_advanceAndUpdateLineNumberFunc)();
-};
-inline void SegmentedString::advancePastNonNewlines(unsigned count)
-{
-    for (unsigned i = 0; i < count; ++i)
-        advancePastNonNewline();
-}
-inline SegmentedString::AdvancePastResult SegmentedString::advancePast(const char* literal, unsigned length, bool caseSensitive)
-{
-    ASSERT(strlen(literal) == length);
-    ASSERT(!strchr(literal, '\n'));
-    if (!m_pushedChar1) {
-        if (length <= static_cast<unsigned>(m_currentString.m_length)) {
-            if (!m_currentString.currentSubString(length).startsWith(literal, caseSensitive))
-                return DidNotMatch;
-            advancePastNonNewlines(length);
-            return DidMatch;
-        }
-    }
-    return advancePastSlowCase(literal, caseSensitive);
-}
-}
-#endif
+        m_fastPathFlags = NoFastPath;
+        m_advanceWithoutUpdatingLineNumberFunction = &SegmentedString::advanceWithoutUpdatingLineNumber16;
+        if (m_currentSubstring.doNotExcludeLineNumbers)
+            m_advanceAndUpdateLineNumberFunction = &SegmentedString::advanceAndUpdateLineNumber16;
+        else
+            m_advanceAndUpdateLineNumberFunction = &SegmentedString::advanceWithoutUpdatingLineNumber16;
+        return;
+    }
+    if (!m_currentSubstring.length) {
+        updateAdvanceFunctionPointersForEmptyString();
+        return;
+    }
+    updateAdvanceFunctionPointersForSingleCharacterSubstring();
+}
+}
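
Reviewer note: the new advance functions test a fast-path flag inline and fall back to a pointer-to-member call only when that flag is clear. The sketch below shows that dispatch pattern in isolation. `MiniCursor` and all of its members are hypothetical names invented for illustration; this is not WebKit's SegmentedString, and it ignores 16-bit strings, multiple substrings, and line numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical miniature cursor (an assumption for illustration only).
class MiniCursor {
public:
    explicit MiniCursor(std::string text)
        : m_text(std::move(text))
    {
        updateAdvanceFunction();
    }

    bool isEmpty() const { return m_position >= m_text.size(); }
    char current() const { return isEmpty() ? '\0' : m_text[m_position]; }

    void advance()
    {
        if (m_useFastPath) {
            // Fast path: more than one character remains, so a plain
            // increment is enough and no function pointer is called.
            assert(m_text.size() - m_position > 1);
            ++m_position;
            if (m_text.size() - m_position == 1)
                updateAdvanceFunction();
            return;
        }
        // Slow path: dispatch through the member function pointer.
        (this->*m_slowAdvance)();
    }

private:
    void advanceLastCharacter()
    {
        ++m_position;
        updateAdvanceFunction();
    }

    void advanceEmpty() { }

    // Re-derive the flag and the slow-path function from the remaining length.
    void updateAdvanceFunction()
    {
        std::size_t remaining = m_text.size() - m_position;
        m_useFastPath = remaining > 1;
        m_slowAdvance = remaining ? &MiniCursor::advanceLastCharacter : &MiniCursor::advanceEmpty;
    }

    std::string m_text;
    std::size_t m_position { 0 };
    bool m_useFastPath { false };
    void (MiniCursor::*m_slowAdvance)() = nullptr;
};
```

The point of the design is that the common case costs one predictable branch, while the rare cases (last character, empty input) are selected once and reached through the pointer.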

trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h

-              r208646
+              r209058
 namespace WebCore {
-inline void unconsumeCharacters(SegmentedString& source, const StringBuilder& consumedCharacters)
+inline void unconsumeCharacters(SegmentedString& source, StringBuilder& consumedCharacters)
 {
-    source.pushBack(SegmentedString(consumedCharacters.toStringPreserveCapacity()));
+    source.pushBack(consumedCharacters.toString());
 }
 …
     while (!source.isEmpty()) {
-        UChar character = source.currentChar();
+        UChar character = source.currentCharacter();
         switch (state) {
         case Initial:
 …
                 goto Decimal;
             }
-            source.pushBack(SegmentedString(ASCIILiteral("#")));
+            source.pushBack(ASCIILiteral("#"));
             return false;
         case MaybeHexLowerCaseX:
 …
                 goto Hex;
             }
-            source.pushBack(SegmentedString(ASCIILiteral("#x")));
+            source.pushBack(ASCIILiteral("#x"));
             return false;
         case MaybeHexUpperCaseX:
 …
                 goto Hex;
             }
-            source.pushBack(SegmentedString(ASCIILiteral("#X")));
+            source.pushBack(ASCIILiteral("#X"));
             return false;
         case Hex:
 …
             }
             if (character == ';') {
-                source.advance();
+                source.advancePastNonNewline();
                 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
                 return true;
 …
             }
             if (character == ';') {
-                source.advance();
+                source.advancePastNonNewline();
                 decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
                 return true;
 …
         }
         consumedCharacters.append(character);
-        source.advance();
+        source.advancePastNonNewline();
     }
     ASSERT(source.isEmpty());
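
Reviewer note: the hunks above swap `advance()` for `advancePastNonNewline()` at call sites where the current character is already known not to be a newline (for example `;`). A minimal sketch of why that helps, assuming a hypothetical `LineCountingReader` class that is not part of WebKit: the general function must inspect the character it leaves behind, while the specialised one can skip the line bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical reader, invented for illustration; names are assumptions.
class LineCountingReader {
public:
    explicit LineCountingReader(std::string text)
        : m_text(std::move(text))
    {
    }

    bool isEmpty() const { return m_position >= m_text.size(); }
    char currentCharacter() const { return m_text[m_position]; }
    unsigned currentLine() const { return m_currentLine; }

    // General case: checks whether the character being consumed ends a line.
    void advance()
    {
        assert(!isEmpty());
        if (currentCharacter() == '\n')
            ++m_currentLine;
        ++m_position;
    }

    // The caller guarantees the current character is not a newline,
    // so no line-number check is needed.
    void advancePastNonNewline()
    {
        assert(!isEmpty());
        assert(currentCharacter() != '\n');
        ++m_position;
    }

private:
    std::string m_text;
    std::size_t m_position { 0 };
    unsigned m_currentLine { 0 };
};
```

In this sketch the saving is a single comparison per character; the changeset message attributes the real speedup to the simpler functions becoming small enough to inline in the tokenizer.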

trunk/Source/WebCore/xml/parser/MarkupTokenizerInlines.h

-              r208646
+              r209058
 /*
- * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
+ * Copyright (C) 2008-2016 Apple Inc. All Rights Reserved.
  * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/
  * Copyright (C) 2010 Google, Inc. All Rights Reserved.
 …
 #pragma once
-#include "SegmentedString.h"
 #if COMPILER(MSVC)
 // Disable the "unreachable code" warning so we can compile the ASSERT_NOT_REACHED in the END_STATE macro.
 …
     case stateName:                                             \
     stateName: {                                                \
-        const auto currentState = stateName;                    \
+        constexpr auto currentState = stateName;                \
         UNUSED_PARAM(currentState);
 …
         goto newState;                                          \
     } while (false)
+#define ADVANCE_PAST_NON_NEWLINE_TO(newState)                   \
+    do {                                                        \
+        if (!m_preprocessor.advancePastNonNewline(source, isNullCharacterSkippingState(newState))) { \
+            m_state = newState;                                 \
+            return haveBufferedCharacterToken();                \
+        }                                                       \
+        character = m_preprocessor.nextInputCharacter();        \
+        goto newState;                                          \
+    } while (false)
 // For more complex cases, caller consumes the characters first and then uses this macro.

trunk/Source/WebCore/xml/parser/XMLDocumentParser.cpp

-              r208840
+              r209058
 }
 
-void XMLDocumentParser::insert(const SegmentedString&)
+void XMLDocumentParser::insert(SegmentedString&&)
 {
     ASSERT_NOT_REACHED();
 …
 void XMLDocumentParser::append(RefPtr<StringImpl>&& inputSource)
 {
-    SegmentedString source(WTFMove(inputSource));
+    String source { WTFMove(inputSource) };
+
     if (m_sawXSLTransform || !m_sawFirstElement)
         m_originalSourceForTransform.append(source);
 …
     }
 
-    doWrite(source.toString());
+    doWrite(source);
 
     // After parsing, dispatch image beforeload events.
 …
 }
 
-
 bool XMLDocumentParser::updateLeafTextNode()
 {
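
Reviewer note: `insert` now takes `SegmentedString&&`, and elsewhere in this changeset the `const String&` constructor delegates to the `String&&` one. The sketch below shows that sink-parameter pattern with standard library types only. `ChunkQueue` is a hypothetical stand-in, not a WebKit class, and `std::string` is used in place of `WTF::String`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical queue of pending source chunks (an assumption for illustration).
class ChunkQueue {
public:
    // Sink overload: takes over the caller's buffer instead of copying it.
    void append(std::string&& chunk)
    {
        m_chunks.push_back(std::move(chunk));
    }

    // Convenience overload for callers that keep their string: make one
    // explicit copy, then forward to the sink overload.
    void append(const std::string& chunk)
    {
        append(std::string { chunk });
    }

    std::size_t size() const { return m_chunks.size(); }
    const std::string& back() const { return m_chunks.back(); }

private:
    std::deque<std::string> m_chunks;
};
```

Callers that are finished with a string pass it with `std::move` (`WTFMove` in WebKit) and pay for no copy; callers that still need it use the const-reference overload and pay for exactly one.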

trunk/Source/WebCore/xml/parser/XMLDocumentParser.h

-              r208840
+              r209058
 /*
  * Copyright (C) 2000 Peter Kelly (pmk@post.com)
- * Copyright (C) 2005, 2006, 2007 Apple Inc. All rights reserved.
+ * Copyright (C) 2005-2016 Apple Inc. All rights reserved.
  * Copyright (C) 2007 Samuel Weinig (sam@webkit.org)
  * Copyright (C) 2008 Nokia Corporation and/or its subsidiary(-ies)
 …
 #include "SegmentedString.h"
 #include "XMLErrors.h"
+#include <libxml/tree.h>
+#include <libxml/xmlstring.h>
 #include <wtf/HashMap.h>
 #include <wtf/text/AtomicStringHash.h>
 #include <wtf/text/CString.h>
-#include <libxml/tree.h>
-#include <libxml/xmlstring.h>
 namespace WebCore {
 …
 class CachedResourceLoader;
 class DocumentFragment;
-class Document;
 class Element;
 class FrameView;
 class PendingCallbacks;
-class PendingScript;
 class Text;
-    class XMLParserContext : public RefCounted<XMLParserContext> {
-    public:
-        static RefPtr<XMLParserContext> createMemoryParser(xmlSAXHandlerPtr, void* userData, const CString& chunk);
-        static Ref<XMLParserContext> createStringParser(xmlSAXHandlerPtr, void* userData);
-        ~XMLParserContext();
-        xmlParserCtxtPtr context() const { return m_context; }
+class XMLParserContext : public RefCounted<XMLParserContext> {
+public:
+    static RefPtr<XMLParserContext> createMemoryParser(xmlSAXHandlerPtr, void* userData, const CString& chunk);
+    static Ref<XMLParserContext> createStringParser(xmlSAXHandlerPtr, void* userData);
+    ~XMLParserContext();
+    xmlParserCtxtPtr context() const { return m_context; }
-    private:
-        XMLParserContext(xmlParserCtxtPtr context)
-            : m_context(context)
-        {
-        }
-        xmlParserCtxtPtr m_context;
-    };
+private:
+    XMLParserContext(xmlParserCtxtPtr context)
+        : m_context(context)
+    {
+    }
+    xmlParserCtxtPtr m_context;
+};
-    class XMLDocumentParser final : public ScriptableDocumentParser, public PendingScriptClient {
-        WTF_MAKE_FAST_ALLOCATED;
-    public:
-        static Ref<XMLDocumentParser> create(Document& document, FrameView* view)
-        {
-            return adoptRef(*new XMLDocumentParser(document, view));
-        }
-        static Ref<XMLDocumentParser> create(DocumentFragment& fragment, Element* element, ParserContentPolicy parserContentPolicy)
-        {
-            return adoptRef(*new XMLDocumentParser(fragment, element, parserContentPolicy));
-        }
+class XMLDocumentParser final : public ScriptableDocumentParser, public PendingScriptClient {
+    WTF_MAKE_FAST_ALLOCATED;
+public:
+    static Ref<XMLDocumentParser> create(Document& document, FrameView* view)
+    {
+        return adoptRef(*new XMLDocumentParser(document, view));
+    }
+    static Ref<XMLDocumentParser> create(DocumentFragment& fragment, Element* element, ParserContentPolicy parserContentPolicy)
+    {
+        return adoptRef(*new XMLDocumentParser(fragment, element, parserContentPolicy));
+    }
-        ~XMLDocumentParser();
+    ~XMLDocumentParser();
-        // Exposed for callbacks:
-        void handleError(XMLErrors::ErrorType, const char* message, TextPosition);
+    // Exposed for callbacks:
+    void handleError(XMLErrors::ErrorType, const char* message, TextPosition);
-        void setIsXHTMLDocument(bool isXHTML) { m_isXHTMLDocument = isXHTML; }
-        bool isXHTMLDocument() const { return m_isXHTMLDocument; }
+    void setIsXHTMLDocument(bool isXHTML) { m_isXHTMLDocument = isXHTML; }
+    bool isXHTMLDocument() const { return m_isXHTMLDocument; }
-        static bool parseDocumentFragment(const String&, DocumentFragment&, Element* parent = nullptr, ParserContentPolicy = AllowScriptingContent);
+    static bool parseDocumentFragment(const String&, DocumentFragment&, Element* parent = nullptr, ParserContentPolicy = AllowScriptingContent);
-        // Used by the XMLHttpRequest to check if the responseXML was well formed.
-        bool wellFormed() const override { return !m_sawError; }
+    // Used by XMLHttpRequest to check if the responseXML was well formed.
+    bool wellFormed() const final { return !m_sawError; }
-        static bool supportsXMLVersion(const String&);
+    static bool supportsXMLVersion(const String&);
-    private:
-        XMLDocumentParser(Document&, FrameView* = nullptr);
-        XMLDocumentParser(DocumentFragment&, Element*, ParserContentPolicy);
+private:
+    explicit XMLDocumentParser(Document&, FrameView* = nullptr);
+    XMLDocumentParser(DocumentFragment&, Element*, ParserContentPolicy);
-        // From DocumentParser
-        void insert(const SegmentedString&) override;
-        void append(RefPtr<StringImpl>&&) override;
-        void finish() override;
-        bool isWaitingForScripts() const override;
-        void stopParsing() override;
-        void detach() override;
+    void insert(SegmentedString&&) final;
+    void append(RefPtr<StringImpl>&&) final;
+    void finish() final;
+    bool isWaitingForScripts() const final;
+    void stopParsing() final;
+    void detach() final;
-        TextPosition textPosition() const override;
-        bool shouldAssociateConsoleMessagesWithTextPosition() const override;
+    TextPosition textPosition() const final;
+    bool shouldAssociateConsoleMessagesWithTextPosition() const final;
-        void notifyFinished(PendingScript&) final;
+    void notifyFinished(PendingScript&) final;
-        void end();
+    void end();
-        void pauseParsing();
-        void resumeParsing();
+    void pauseParsing();
+    void resumeParsing();
-        bool appendFragmentSource(const String&);
+    bool appendFragmentSource(const String&);
-    public:
-        // callbacks from parser SAX
-        void error(XMLErrors::ErrorType, const char* message, va_list args) WTF_ATTRIBUTE_PRINTF(3, 0);
-        void startElementNs(const xmlChar* xmlLocalName, const xmlChar* xmlPrefix, const xmlChar* xmlURI, int nb_namespaces,
-                            const xmlChar** namespaces, int nb_attributes, int nb_defaulted, const xmlChar** libxmlAttributes);
-        void endElementNs();
-        void characters(const xmlChar* s, int len);
-        void processingInstruction(const xmlChar* target, const xmlChar* data);
-        void cdataBlock(const xmlChar* s, int len);
-        void comment(const xmlChar* s);
-        void startDocument(const xmlChar* version, const xmlChar* encoding, int standalone);
-        void internalSubset(const xmlChar* name, const xmlChar* externalID, const xmlChar* systemID);
-        void endDocument();
+public:
+    // Callbacks from parser SAX, and other functions needed inside
+    // the parser implementation, but outside this class.
-        bool isParsingEntityDeclaration() const { return m_isParsingEntityDeclaration; }
-        void setIsParsingEntityDeclaration(bool value) { m_isParsingEntityDeclaration = value; }
+    void error(XMLErrors::ErrorType, const char* message, va_list args) WTF_ATTRIBUTE_PRINTF(3, 0);
+    void startElementNs(const xmlChar* xmlLocalName, const xmlChar* xmlPrefix, const xmlChar* xmlURI,
+        int numNamespaces, const xmlChar** namespaces,
+        int numAttributes, int numDefaulted, const xmlChar** libxmlAttributes);
+    void endElementNs();
+    void characters(const xmlChar*, int length);
+    void processingInstruction(const xmlChar* target, const xmlChar* data);
+    void cdataBlock(const xmlChar*, int length);
+    void comment(const xmlChar*);
+    void startDocument(const xmlChar* version, const xmlChar* encoding, int standalone);
+    void internalSubset(const xmlChar* name, const xmlChar* externalID, const xmlChar* systemID);
+    void endDocument();
-        int depthTriggeringEntityExpansion() const { return m_depthTriggeringEntityExpansion; }
-        void setDepthTriggeringEntityExpansion(int depth) { m_depthTriggeringEntityExpansion = depth; }
+    bool isParsingEntityDeclaration() const { return m_isParsingEntityDeclaration; }
+    void setIsParsingEntityDeclaration(bool value) { m_isParsingEntityDeclaration = value; }
-    private:
-        void initializeParserContext(const CString& chunk = CString());
+    int depthTriggeringEntityExpansion() const { return m_depthTriggeringEntityExpansion; }
+    void setDepthTriggeringEntityExpansion(int depth) { m_depthTriggeringEntityExpansion = depth; }
-        void pushCurrentNode(ContainerNode*);
-        void popCurrentNode();
-        void clearCurrentNodeStack();
+private:
+    void initializeParserContext(const CString& chunk = CString());
-        void insertErrorMessageBlock();
+    void pushCurrentNode(ContainerNode*);
+    void popCurrentNode();
+    void clearCurrentNodeStack();
-        void createLeafTextNode();
-        bool updateLeafTextNode();
+    void insertErrorMessageBlock();
-        void doWrite(const String&);
-        void doEnd();
+    void createLeafTextNode();
+    bool updateLeafTextNode();
-        FrameView* m_view;
+    void doWrite(const String&);
+    void doEnd();
-        SegmentedString m_originalSourceForTransform;
+    xmlParserCtxtPtr context() const { return m_context ? m_context->context() : nullptr; };
-        xmlParserCtxtPtr context() const { return m_context ? m_context->context() : nullptr; };
-        RefPtr<XMLParserContext> m_context;
-        std::unique_ptr<PendingCallbacks> m_pendingCallbacks;
-        Vector<xmlChar> m_bufferedText;
-        int m_depthTriggeringEntityExpansion;
-        bool m_isParsingEntityDeclaration;
+    FrameView* m_view { nullptr };
-        ContainerNode* m_currentNode;
-        Vector<ContainerNode*> m_currentNodeStack;
+    SegmentedString m_originalSourceForTransform;
-        RefPtr<Text> m_leafTextNode;
+    RefPtr<XMLParserContext> m_context;
+    std::unique_ptr<PendingCallbacks> m_pendingCallbacks;
+    Vector<xmlChar> m_bufferedText;
+    int m_depthTriggeringEntityExpansion { -1 };
+    bool m_isParsingEntityDeclaration { false };
-        bool m_sawError;
-        bool m_sawCSS;
-        bool m_sawXSLTransform;
-        bool m_sawFirstElement;
-        bool m_isXHTMLDocument;
-        bool m_parserPaused;
-        bool m_requestingScript;
-        bool m_finishCalled;
+    ContainerNode* m_currentNode { nullptr };
+    Vector<ContainerNode*> m_currentNodeStack;
-        std::unique_ptr<XMLErrors> m_xmlErrors;
+    RefPtr<Text> m_leafTextNode;
-        RefPtr<PendingScript> m_pendingScript;
-        TextPosition m_scriptStartPosition;
+    bool m_sawError { false };
+    bool m_sawCSS { false };
+    bool m_sawXSLTransform { false };
+    bool m_sawFirstElement { false };
+    bool m_isXHTMLDocument { false };
+    bool m_parserPaused { false };
+    bool m_requestingScript { false };
+    bool m_finishCalled { false };
-        bool m_parsingFragment;
-        AtomicString m_defaultNamespaceURI;
+    std::unique_ptr<XMLErrors> m_xmlErrors;
-        typedef HashMap<AtomicString, AtomicString> PrefixForNamespaceMap;
-        PrefixForNamespaceMap m_prefixToNamespaceMap;
-        SegmentedString m_pendingSrc;
-    };
+    RefPtr<PendingScript> m_pendingScript;
+    TextPosition m_scriptStartPosition;
+    bool m_parsingFragment { false };
+    AtomicString m_defaultNamespaceURI;
+    HashMap<AtomicString, AtomicString> m_prefixToNamespaceMap;
+    SegmentedString m_pendingSrc;
+};
 #if ENABLE(XSLT)
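
Reviewer note: the header hunk above moves default values such as `{ false }` and `{ -1 }` onto the member declarations, which lets the constructors in XMLDocumentParserLibxml2.cpp drop the matching initializer-list entries. A minimal sketch of the same refactoring, using a hypothetical `ParserFlags` struct whose names are assumptions and not WebKit code:

```cpp
#include <cassert>

// Hypothetical flag bundle (invented for illustration).
struct ParserFlags {
    // Every constructor gets these defaults for free.
    ParserFlags() = default;

    // Only the member that differs from its default is mentioned here.
    explicit ParserFlags(bool isFragment)
        : parsingFragment(isFragment)
    {
    }

    int depthTriggeringEntityExpansion { -1 };
    bool sawError { false };
    bool parserPaused { false };
    bool parsingFragment { false };
};
```

With several constructors this keeps the defaults in one place, so adding a member cannot leave it uninitialized in one constructor and set in another.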

trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp

-              r208840
+              r209058
 /*
  * Copyright (C) 2000 Peter Kelly <pmk@post.com>
- * Copyright (C) 2005, 2006, 2008 Apple Inc. All rights reserved.
+ * Copyright (C) 2005-2016 Apple Inc. All rights reserved.
  * Copyright (C) 2006 Alexey Proskuryakov <ap@webkit.org>
  * Copyright (C) 2007 Samuel Weinig <sam@webkit.org>
 …
 #include "DocumentType.h"
 #include "Frame.h"
-#include "FrameLoader.h"
-#include "FrameView.h"
 #include "HTMLEntityParser.h"
 #include "HTMLHtmlElement.h"
-#include "HTMLLinkElement.h"
-#include "HTMLNames.h"
-#include "HTMLStyleElement.h"
 #include "HTMLTemplateElement.h"
-#include "LoadableClassicScript.h"
 #include "Page.h"
 #include "PendingScript.h"
 #include "ProcessingInstruction.h"
 #include "ResourceError.h"
-#include "ResourceRequest.h"
 #include "ResourceResponse.h"
 #include "ScriptElement.h"
 #include "ScriptSourceCode.h"
-#include "SecurityOrigin.h"
 #include "Settings.h"
 #include "StyleScope.h"
-#include "TextResourceDecoder.h"
 #include "TransformSource.h"
 #include "XMLNSNames.h"
 #include "XMLDocumentParserScope.h"
 #include <libxml/parserInternals.h>
-#include <wtf/Ref.h>
 #include <wtf/StringExtras.h>
-#include <wtf/Threading.h>
-#include <wtf/Vector.h>
 #include <wtf/unicode/UTF8.h>
 …
 #if ENABLE(XSLT)
-static inline bool hasNoStyleInformation(Document* document)
-{
-    if (document->sawElementsInKnownNamespaces())
+static inline bool shouldRenderInXMLTreeViewerMode(Document& document)
+{
+    if (document.sawElementsInKnownNamespaces())
         return false;
-    if (document->transformSourceDocument())
+    if (document.transformSourceDocument())
         return false;
-    if (!document->frame() || !document->frame()->page())
+    auto* frame = document.frame();
+    if (!frame)
         return false;
-    if (!document->frame()->page()->settings().developerExtrasEnabled())
+    if (!frame->settings().developerExtrasEnabled())
         return false;
-    if (document->frame()->tree().parent())
+    if (frame->tree().parent())
         return false; // This document is not in a top frame
     return true;
 }
 #endif
 class PendingCallbacks {
-    WTF_MAKE_NONCOPYABLE(PendingCallbacks); WTF_MAKE_FAST_ALLOCATED;
+    WTF_MAKE_FAST_ALLOCATED;
 public:
-    PendingCallbacks() = default;
     void appendStartElementNSCallback(const xmlChar* xmlLocalName, const xmlChar* xmlPrefix, const xmlChar* xmlURI, int numNamespaces, const xmlChar** namespaces, int numAttributes, int numDefaulted, const xmlChar** attributes)
     {
 …
     : ScriptableDocumentParser(document)
     , m_view(frameView)
-    , m_context(nullptr)
     , m_pendingCallbacks(std::make_unique<PendingCallbacks>())
-    , m_depthTriggeringEntityExpansion(-1)
-    , m_isParsingEntityDeclaration(false)
     , m_currentNode(&document)
-    , m_sawError(false)
-    , m_sawCSS(false)
-    , m_sawXSLTransform(false)
-    , m_sawFirstElement(false)
-    , m_isXHTMLDocument(false)
-    , m_parserPaused(false)
-    , m_requestingScript(false)
-    , m_finishCalled(false)
     , m_scriptStartPosition(TextPosition::belowRangePosition())
-    , m_parsingFragment(false)
 {
 }
 …
 XMLDocumentParser::XMLDocumentParser(DocumentFragment& fragment, Element* parentElement, ParserContentPolicy parserContentPolicy)
     : ScriptableDocumentParser(fragment.document(), parserContentPolicy)
-    , m_view(nullptr)
-    , m_context(nullptr)
     , m_pendingCallbacks(std::make_unique<PendingCallbacks>())
-    , m_depthTriggeringEntityExpansion(-1)
-    , m_isParsingEntityDeclaration(false)
     , m_currentNode(&fragment)
-    , m_sawError(false)
-    , m_sawCSS(false)
-    , m_sawXSLTransform(false)
-    , m_sawFirstElement(false)
-    , m_isXHTMLDocument(false)
-    , m_parserPaused(false)
-    , m_requestingScript(false)
-    , m_finishCalled(false)
     , m_scriptStartPosition(TextPosition::belowRangePosition())
     , m_parsingFragment(true)
 …
 {
     const char* originalTarget = target;
-    WTF::Unicode::ConversionResult conversionResult = WTF::Unicode::convertUTF16ToUTF8(&utf16Entity,
-        utf16Entity + numberOfCodeUnits, &target, target + targetSize);
+    auto conversionResult = WTF::Unicode::convertUTF16ToUTF8(&utf16Entity, utf16Entity + numberOfCodeUnits, &target, target + targetSize);
     if (conversionResult != WTF::Unicode::conversionOK)
         return 0;
 …
 #if ENABLE(XSLT)
-    bool xmlViewerMode = !m_sawError && !m_sawCSS && !m_sawXSLTransform && hasNoStyleInformation(document());
+    bool xmlViewerMode = !m_sawError && !m_sawCSS && !m_sawXSLTransform && shouldRenderInXMLTreeViewerMode(*document());
     if (xmlViewerMode) {
         XMLTreeViewer xmlTreeViewer(*document());
 …
     }
-    // Then, write any pending data
-    SegmentedString rest = m_pendingSrc;
-    m_pendingSrc.clear();
     // There is normally only one string left, so toString() shouldn't copy.
     // In any case, the XML parser runs on the main thread and it's OK if
     // the passed string has more than one reference.
-    append(rest.toString().impl());
+    auto rest = m_pendingSrc.toString();
+    m_pendingSrc.clear();
+    append(rest.impl());
     // Finally, if finish() has been called and write() didn't result
