Changeset 60926 in webkit

Timestamp:

Jun 9, 2010 5:49:25 PM

Author:

abarth@webkit.org

Message:

2010-06-09 Adam Barth <abarth@webkit.org>

Reviewed by Eric Seidel.

Fix handling of bytes received from the network while in document.write
https://bugs.webkit.org/show_bug.cgi?id=40356

The old tokenizer has special logic for handling the case of
receiving bytes from the network while in a nested call to
document.write. This patch implements similar logic for the HTML5
tokenizer. Also, this patch abstracts the tricky shuffling of
SegmentedStrings behind a simple API.

I'm not sure how to trigger this case. My guess is we can trigger it
using a nested event loop, e.g., via alert(), but I'm not sure how to
test that in a LayoutTest. There don't appear to be any LayoutTests
that currently test this behavior despite it being present in the old
tokenizer.

* html/HTML5Tokenizer.cpp:
(WebCore::HTML5Tokenizer::pumpLexer):
(WebCore::HTML5Tokenizer::write):
    - Added a branch for the |append| argument.
(WebCore::HTML5Tokenizer::end):
(WebCore::HTML5Tokenizer::finish):
(WebCore::HTML5Tokenizer::executeScript):
    - Switch over to using a RAII pattern for recording and restoring
      insertion points.
* html/HTML5Tokenizer.h:
(WebCore::HTML5Tokenizer::InputStream::InputStream):
(WebCore::HTML5Tokenizer::InputStream::appendToEnd):
(WebCore::HTML5Tokenizer::InputStream::insertAtCurrentInsertionPoint):
(WebCore::HTML5Tokenizer::InputStream::close):
    - Putting the close() method on InputStream makes it much easier to
      handle EOF. We now just close the last buffer in the stream when
      the network says it's done.
(WebCore::HTML5Tokenizer::InputStream::current):
    - This class could be moved to its own file, but it shouldn't be
      used outside of the tokenizer.
(WebCore::HTML5Tokenizer::InsertionPointRecord::InsertionPointRecord):
(WebCore::HTML5Tokenizer::InsertionPointRecord::~InsertionPointRecord):
    - A simple RAII class for managing saved insertion points.
* platform/text/SegmentedString.cpp:
(WebCore::SegmentedString::operator=):
    - Fix a related bug where m_closed was not being copied properly in
      the assignment operator.

Location:

trunk/WebCore

Files:

4 edited

ChangeLog (modified) (1 diff)
html/HTML5Tokenizer.cpp (modified) (6 diffs)
html/HTML5Tokenizer.h (modified) (2 diffs)
platform/text/SegmentedString.cpp (modified) (1 diff)

trunk/WebCore/ChangeLog

-                      r60916
+                      r60926
+2010-06-09  Adam Barth  <abarth@webkit.org>
+        Reviewed by Eric Seidel.
+        Fix handling of bytes received from the network while in document.write
+        https://bugs.webkit.org/show_bug.cgi?id=40356
+        The old tokenizer has special logic for handling the case of
+        receiving bytes from the network while in a nested call to
+        document.write.  This patch implements similar logic for the HTML5
+        tokenizer.  Also, this patch abstracts the tricky shuffling of
+        SegmentedStrings behind a simple API.
+        I'm not sure how to trigger this case.  My guess is we can trigger it
+        using a nested event loop, e.g., via alert(), but I'm not sure how to
+        test that in a LayoutTest.  There don't appear to be any LayoutTests
+        that currently test this behavior despite it being present in the old
+        tokenizer.
+        * html/HTML5Tokenizer.cpp:
+        (WebCore::HTML5Tokenizer::pumpLexer):
+        (WebCore::HTML5Tokenizer::write):
+            - Added a branch for the |append| argument.
+        (WebCore::HTML5Tokenizer::end):
+        (WebCore::HTML5Tokenizer::finish):
+        (WebCore::HTML5Tokenizer::executeScript):
+            - Switch over to using a RAII pattern for recording and restoring
+              insertion points.
+        * html/HTML5Tokenizer.h:
+        (WebCore::HTML5Tokenizer::InputStream::InputStream):
+        (WebCore::HTML5Tokenizer::InputStream::appendToEnd):
+        (WebCore::HTML5Tokenizer::InputStream::insertAtCurrentInsertionPoint):
+        (WebCore::HTML5Tokenizer::InputStream::close):
+            - Putting the close() method on InputStream makes it much easier to
+              handle EOF.  We now just close the last buffer in the stream when
+              the network says it's done.
+        (WebCore::HTML5Tokenizer::InputStream::current):
+            - This class could be moved to its own file, but it shouldn't be
+              used outside of the tokenizer.
+        (WebCore::HTML5Tokenizer::InsertionPointRecord::InsertionPointRecord):
+        (WebCore::HTML5Tokenizer::InsertionPointRecord::~InsertionPointRecord):
+            - A simple RAII class for managing saved insertion points.
+        * platform/text/SegmentedString.cpp:
+        (WebCore::SegmentedString::operator=):
+            - Fix a related bug where m_closed was not being copied properly in
+              the assignment operator.
 2010-06-09  Tony Gentilcore  <tonyg@chromium.org>

trunk/WebCore/html/HTML5Tokenizer.cpp

-                      r60898
+                      r60926
     ASSERT(!m_parserStopped);
     ASSERT(!m_treeBuilder->isPaused());
-    while (!m_parserStopped && m_lexer->nextToken(m_source, m_token)) {
+    while (!m_parserStopped && m_lexer->nextToken(m_input.current(), m_token)) {
         m_treeBuilder->constructTreeFromToken(m_token);
         m_token.clear();
 …
 }
-void HTML5Tokenizer::write(const SegmentedString& source, bool)
+void HTML5Tokenizer::write(const SegmentedString& source, bool appendData)
 {
     if (m_parserStopped)
 …
     NestingLevelIncrementer nestingLevelIncrementer(m_writeNestingLevel);
-    // HTML5Tokenizer::executeScript is responsible for handling saving m_source before re-entry.
-    m_source.append(source);
+    if (appendData) {
+        m_input.appendToEnd(source);
+        if (m_writeNestingLevel > 1) {
+            // We've gotten data off the network in a nested call to write().
+            // We don't want to consume any more of the input stream now.  Do
+            // not worry.  We'll consume this data in a less-nested write().
+            return;
+        }
+    } else
+        m_input.insertAtCurrentInsertionPoint(source);
     pumpLexerIfPossible();
     endIfDelayed();
 …
 void HTML5Tokenizer::end()
 {
-    m_source.close();
     pumpLexerIfPossible();
     // Informs the rest of WebCore that parsing is really finished.
 …
 void HTML5Tokenizer::finish()
 {
     // We can't call m_source.close() yet as we may have a <script> execution
     // pending which will call document.write().  No more data off the network though.
     // end() calls Document::finishedParsing() once we're actually done parsing.
+    // We're not going to get any more data off the network, so we close the
+    // input stream to indicate EOF.
+    m_input.close();
     attemptToEnd();
 }
 …
     if (!m_document->frame())
         return;
-    SegmentedString oldInsertionPoint = m_source;
-    m_source = SegmentedString();
+    InsertionPointRecord savedInsertionPoint(m_input);
     m_document->frame()->script()->executeScript(sourceCode);
-    // Append oldInsertionPoint onto the new (likely empty) m_source instead of
-    // oldInsertionPoint.prepent(m_source) as that would ASSERT if
-    // m_source.escaped() (it had characters pushed back onto it).
-    // If m_source was closed, then the tokenizer was stopped, and we discard
-    // any pending data as though an EOF character was inserted into the stream.
-    if (!m_source.isClosed())
-        m_source.append(oldInsertionPoint);
+}
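
The nested-write check in HTML5Tokenizer::write() can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: ToyTokenizer, onConsume, and runNestedNetworkWriteDemo are hypothetical names, std::string chunks stand in for SegmentedString, and the counter class only mirrors the NestingLevelIncrementer name seen in the diff. The sketch models one behavior only: network data that arrives during a nested write() is queued and consumed later by the outermost write().

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// RAII counter: increments on construction, decrements on destruction.
// Modelled on the NestingLevelIncrementer name in the diff; the real class may differ.
class NestingLevelIncrementer {
public:
    explicit NestingLevelIncrementer(int& level) : m_level(&level) { ++*m_level; }
    ~NestingLevelIncrementer() { --*m_level; }
private:
    int* m_level;
};

// Hypothetical toy tokenizer. "Consuming" a chunk just records it and calls
// a hook, which may re-enter write() (a stand-in for script execution).
class ToyTokenizer {
public:
    std::function<void(const std::string&)> onConsume;
    std::vector<std::string> consumed;

    void write(const std::string& data, bool appendData)
    {
        NestingLevelIncrementer incrementer(m_writeNestingLevel);
        if (appendData) {
            m_pending.push_back(data); // network data goes to the end
            if (m_writeNestingLevel > 1)
                return; // nested call: leave it for a less-nested write()
        } else {
            // Toy simplification: the real patch uses insertion points to
            // bound what a nested pump may consume; that is not modelled here.
            m_pending.push_front(data);
        }
        pump();
    }

private:
    void pump()
    {
        while (!m_pending.empty()) {
            std::string chunk = m_pending.front();
            m_pending.pop_front();
            consumed.push_back(chunk);
            if (onConsume)
                onConsume(chunk);
        }
    }

    std::deque<std::string> m_pending;
    int m_writeNestingLevel = 0;
};

struct DemoResult {
    std::size_t consumedDuringNestedWrite;
    std::vector<std::string> consumed;
};

// Simulates network bytes arriving while the outer write() is still running.
DemoResult runNestedNetworkWriteDemo()
{
    ToyTokenizer tokenizer;
    DemoResult result{0, {}};
    tokenizer.onConsume = [&](const std::string& chunk) {
        if (chunk == "script") {
            tokenizer.write("late-network-bytes", true); // nested network write
            result.consumedDuringNestedWrite = tokenizer.consumed.size();
        }
    };
    tokenizer.write("script", true);
    result.consumed = tokenizer.consumed;
    return result;
}
```

In this sketch the nested call returns without consuming anything, and the outer pump loop picks up the queued chunk afterwards, which is the ordering the comment in write() describes.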

trunk/WebCore/html/HTML5Tokenizer.h

-                      r60898
+                      r60926
 private:
+    // The InputStream is made up of a sequence of SegmentedStrings:
+    //
+    // [--current--][--next--][--next--] ... [--next--]
+    //            /\                         (also called m_last)
+    //            L_ current insertion point
+    //
+    // The current segmented string is stored in InputStream.  Each of the
+    // afterInsertionPoint buffers are stored in InsertionPointRecords on the
+    // stack.
+    //
+    // We remove characters from the "current" string in the InputStream.
+    // document.write() will add characters at the current insertion point,
+    // which appends them to the "current" string.
+    //
+    // m_last is a pointer to the last of the afterInsertionPoint strings.
+    // The network adds data at the end of the InputStream, which appends
+    // them to the "last" string.
+    class InputStream {
+    public:
+        InputStream()
+            : m_last(&m_first)
+        {
+        }
+        void appendToEnd(const SegmentedString& string)
+        {
+            m_last->append(string);
+        }
+        void insertAtCurrentInsertionPoint(const SegmentedString& string)
+        {
+            m_first.append(string);
+        }
+        void close() { m_last->close(); }
+        SegmentedString& current() { return m_first; }
+        void splitInto(SegmentedString& next)
+        {
+            next = m_first;
+            m_first = SegmentedString();
+            if (m_last == &m_first) {
+                // We used to only have one SegmentedString in the InputStream
+                // but now we have two.  That means m_first is no longer also
+                // the m_last string, |next| is now the last one.
+                m_last = &next;
+            }
+        }
+        void mergeFrom(SegmentedString& next)
+        {
+            m_first.append(next);
+            if (m_last == &next) {
+                // The string |next| used to be the last SegmentedString in
+                // the InputStream.  Now that it's been merged into m_first,
+                // that makes m_first the last one.
+                m_last = &m_first;
+            }
+            if (next.isClosed()) {
+                // We also need to merge the "closed" state from next to
+                // m_first.  Arguably, this work could be done in append().
+                m_first.close();
+            }
+        }
+    private:
+        SegmentedString m_first;
+        SegmentedString* m_last;
+    };
+    class InsertionPointRecord {
+    public:
+        InsertionPointRecord(InputStream& inputStream)
+            : m_inputStream(&inputStream)
+        {
+            m_inputStream->splitInto(m_next);
+        }
+        ~InsertionPointRecord()
+        {
+            m_inputStream->mergeFrom(m_next);
+        }
+    private:
+        InputStream* m_inputStream;
+        SegmentedString m_next;
+    };
     void pumpLexer();
     void pumpLexerIfPossible();
 …
     bool inWrite() const { return m_writeNestingLevel > 0; }
     SegmentedString m_source;
+    InputStream m_input;
     // We hold m_token here because it might be partially complete.

trunk/WebCore/platform/text/SegmentedString.cpp

-                      r60683
+                      r60926
     else
         m_currentChar = other.m_currentChar;
+    m_closed = other.m_closed;
     return *this;
 }
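
The class of bug fixed in SegmentedString::operator= is easy to reproduce in miniature: a hand-written assignment operator copies some members and silently skips a flag. The sketch below uses a hypothetical ToyString class (not SegmentedString, whose real members are more involved) and shows the corrected form, with the flag copied alongside the data.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of a string type that carries a "closed" (EOF) flag.
class ToyString {
public:
    ToyString() = default;
    ToyString(const ToyString&) = default;

    // Hand-written assignment: every member must be copied explicitly.
    ToyString& operator=(const ToyString& other)
    {
        m_text = other.m_text;
        m_closed = other.m_closed; // omitting this line loses the EOF state
        return *this;
    }

    void append(const std::string& s) { m_text += s; }
    void close() { m_closed = true; }
    bool isClosed() const { return m_closed; }
    const std::string& text() const { return m_text; }

private:
    std::string m_text;
    bool m_closed = false;
};

// Returns true when the closed flag survives assignment.
bool closedStateSurvivesAssignment()
{
    ToyString source;
    source.append("tail");
    source.close();

    ToyString copy;
    copy = source; // uses operator=, not the copy constructor
    return copy.isClosed() && copy.text() == "tail";
}
```

This matters for the patch because InputStream::splitInto() assigns m_first to |next|; without the flag being copied, a stream closed before the split would appear open afterwards.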
