Changeset 64799 in webkit

Timestamp:

Aug 5, 2010 5:12:15 PM

Author:

abarth@webkit.org

Message:

2010-08-05 Adam Barth <abarth@webkit.org>

Reviewed by Eric Seidel.

U+0000 is turned to U+FFFD (replacement character)
https://bugs.webkit.org/show_bug.cgi?id=42112

Update test results to show null stripping. These changes are mostly
going back to the old results we had before we added the FFFD
replacement.

fast/dom/stripNullFromTextNodes-expected.txt:
fast/tokenizer/null-in-text-expected.txt: Added.
fast/tokenizer/null-in-text.html: Added.
fast/tokenizer/null-xss-expected.txt: Added.
fast/tokenizer/null-xss.html: Added.
- The main risk with stripping null characters is that they'll be used in XSS attacks. This test shows that we don't strip null characters from tag names.
platform/mac/fast/text/stripNullFromText-expected.txt:
svg/dom/fuzz-path-parser-expected.txt:
svg/dom/rgb-color-parser-expected.txt:

2010-08-05 Adam Barth <abarth@webkit.org>

Reviewed by Eric Seidel.

U+0000 is turned to U+FFFD (replacement character)
https://bugs.webkit.org/show_bug.cgi?id=42112

This patch introduces an intentional parsing difference from the HTML5
parsing specification. The spec requires us to convert NULL
characters to U+FFFD, but doing so causes compatibility issues with a
number of sites, including US Bank.

In this patch, we strip the null characters instead in certain cases.
Firefox has made a corresponding change. After gathering compatibility
data, we hope to convince the HTML WG to adopt this change.

Tests: fast/tokenizer/null-in-text.html
       fast/tokenizer/null-xss.html

html/HTMLTokenizer.cpp:
(WebCore::HTMLTokenizer::HTMLTokenizer):
(WebCore::HTMLTokenizer::reset):
html/HTMLTokenizer.h:
(WebCore::HTMLTokenizer::setSkipLeadingNewLineForListing):
(WebCore::HTMLTokenizer::forceNullCharacterReplacement):
(WebCore::HTMLTokenizer::setForceNullCharacterReplacement):
(WebCore::HTMLTokenizer::shouldSkipNullCharacters):
(WebCore::HTMLTokenizer::InputStreamPreprocessor::InputStreamPreprocessor):
(WebCore::HTMLTokenizer::InputStreamPreprocessor::peek):
html/HTMLTreeBuilder.cpp:
(WebCore::HTMLTreeBuilder::passTokenToLegacyParser):
(WebCore::HTMLTreeBuilder::constructTreeFromToken):
(WebCore::HTMLTreeBuilder::processStartTagForInBody):
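
The difference between the two behaviors described in the message can be illustrated with a minimal standalone sketch. This is not WebKit code: `replaceNulls` and `stripNulls` are hypothetical helpers that operate on a whole `std::u16string`, whereas the patch itself works character by character inside the tokenizer and strips only in certain tokenizer states.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: the behavior the message attributes to the spec.
// Every U+0000 becomes U+FFFD (REPLACEMENT CHARACTER).
std::u16string replaceNulls(std::u16string text)
{
    std::replace(text.begin(), text.end(), u'\0', u'\uFFFD');
    return text;
}

// Hypothetical helper: the behavior this changeset adopts for text content.
// Every U+0000 is dropped from the string.
std::u16string stripNulls(std::u16string text)
{
    text.erase(std::remove(text.begin(), text.end(), u'\0'), text.end());
    return text;
}
```

For an input consisting of `hel`, a U+0000, and `lo`, `stripNulls` yields the five-character string `hello`, while `replaceNulls` keeps six characters with U+FFFD in the middle.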

Location:

trunk

Files:

4 added
9 edited

LayoutTests/ChangeLog (modified) (1 diff)
LayoutTests/fast/dom/stripNullFromTextNodes-expected.txt (modified) (1 diff)
LayoutTests/fast/tokenizer/null-in-text-expected.txt (added)
LayoutTests/fast/tokenizer/null-in-text.html (added)
LayoutTests/fast/tokenizer/null-xss-expected.txt (added)
LayoutTests/fast/tokenizer/null-xss.html (added)
LayoutTests/platform/mac/fast/text/stripNullFromText-expected.txt (modified) (1 diff)
LayoutTests/svg/dom/fuzz-path-parser-expected.txt (modified) (1 diff)
LayoutTests/svg/dom/rgb-color-parser-expected.txt (modified) (1 diff)
WebCore/ChangeLog (modified) (1 diff)
WebCore/html/HTMLTokenizer.cpp (modified) (2 diffs)
WebCore/html/HTMLTokenizer.h (modified) (6 diffs)
WebCore/html/HTMLTreeBuilder.cpp (modified) (4 diffs)

trunk/LayoutTests/ChangeLog

-                      r64796
+                      r64799
+2010-08-05  Adam Barth  <abarth@webkit.org>
+        Reviewed by Eric Seidel.
+        U+0000 is turned to U+FFFD (replacement character)
+        https://bugs.webkit.org/show_bug.cgi?id=42112
+        Update test results to show null stripping.  These changes are mostly
+        going back to the old results we had before we added the FFFD
+        replacement.
+        * fast/dom/stripNullFromTextNodes-expected.txt:
+        * fast/tokenizer/null-in-text-expected.txt: Added.
+        * fast/tokenizer/null-in-text.html: Added.
+        * fast/tokenizer/null-xss-expected.txt: Added.
+        * fast/tokenizer/null-xss.html: Added.
+            - The main risk with stripping null characters is that they'll be
+              used in XSS attacks.  This test shows that we don't strip null
+              characters from tag names.
+        * platform/mac/fast/text/stripNullFromText-expected.txt:
+        * svg/dom/fuzz-path-parser-expected.txt:
+        * svg/dom/rgb-color-parser-expected.txt:
 2010-08-05  Adam Barth  <abarth@webkit.org>

trunk/LayoutTests/fast/dom/stripNullFromTextNodes-expected.txt

-                      r61282
+                      r64799
-��hell��o
-The null characters should be stripped out of the string above and it should have a length of 5. And the DOM thinks the length is...19 :-(
+hello
+The null characters should be stripped out of the string above and it should have a length of 5. And the DOM thinks the length is...5!

trunk/LayoutTests/platform/mac/fast/text/stripNullFromText-expected.txt

-                      r61234
+                      r64799
 RenderBlock {HTML} at (0,0) size 800x600
 RenderBody {BODY} at (8,8) size 784x584
-RenderBlock {DIV} at (0,0) size 784x21 [border: (1px solid #FF0000)]
-RenderText {#text} at (1,2) size 16x18
-text run at (1,2) width 16: "\x{FFFD}"
+RenderBlock {DIV} at (0,0) size 784x2 [border: (1px solid #FF0000)]

trunk/LayoutTests/svg/dom/fuzz-path-parser-expected.txt

-                      r61637
+                      r64799
 Could not parse:
 Could not parse: M
-Could not parse: M�
+Could not parse: M
 Parsed as 2 command(s) [MZ]: M1,1Z0
 PASS successfullyParsed is true

trunk/LayoutTests/svg/dom/rgb-color-parser-expected.txt

-                      r61637
+                      r64799
 Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1: rgb(
 Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1:
-Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1: �
-Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1: rgb(�)
+Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1:
+Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1: rgb()
 PASS successfullyParsed is true

trunk/WebCore/ChangeLog

-                      r64798
+                      r64799
+2010-08-05  Adam Barth  <abarth@webkit.org>
+        Reviewed by Eric Seidel.
+        U+0000 is turned to U+FFFD (replacement character)
+        https://bugs.webkit.org/show_bug.cgi?id=42112
+        This patch introduces an intentional parsing difference from the HTML5
+        parsing specification.  The spec requires us to convert NULL
+        characters to U+FFFD, but doing so causes compatibility issues with a
+        number of sites, including US Bank.
+        In this patch, we strip the null characters instead in certain cases.
+        Firefox has made a corresponding change.  After gathering compatibility
+        data, we hope to convince the HTML WG to adopt this change.
+        Tests: fast/tokenizer/null-in-text.html
+               fast/tokenizer/null-xss.html
+        * html/HTMLTokenizer.cpp:
+        (WebCore::HTMLTokenizer::HTMLTokenizer):
+        (WebCore::HTMLTokenizer::reset):
+        * html/HTMLTokenizer.h:
+        (WebCore::HTMLTokenizer::setSkipLeadingNewLineForListing):
+        (WebCore::HTMLTokenizer::forceNullCharacterReplacement):
+        (WebCore::HTMLTokenizer::setForceNullCharacterReplacement):
+        (WebCore::HTMLTokenizer::shouldSkipNullCharacters):
+        (WebCore::HTMLTokenizer::InputStreamPreprocessor::InputStreamPreprocessor):
+        (WebCore::HTMLTokenizer::InputStreamPreprocessor::peek):
+        * html/HTMLTreeBuilder.cpp:
+        (WebCore::HTMLTreeBuilder::passTokenToLegacyParser):
+        (WebCore::HTMLTreeBuilder::constructTreeFromToken):
+        (WebCore::HTMLTreeBuilder::processStartTagForInBody):
 2010-08-05  Andy Estes  <aestes@apple.com>

trunk/WebCore/html/HTMLTokenizer.cpp

-                      r64012
+                      r64799
 HTMLTokenizer::HTMLTokenizer()
+    : m_inputStreamPreprocessor(this)
 {
     reset();
 …
     m_lineNumber = 0;
     m_skipLeadingNewLineForListing = false;
+    m_forceNullCharacterReplacement = false;
     m_additionalAllowedCharacter = '\0';
 }

trunk/WebCore/html/HTMLTokenizer.h

-                      r63987
+                      r64799
     // Hack to skip leading newline in <pre>/<listing> for authoring ease.
     // http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#parsing-main-inbody
-    void skipLeadingNewLineForListing() { m_skipLeadingNewLineForListing = true; }
+    void setSkipLeadingNewLineForListing(bool value) { m_skipLeadingNewLineForListing = value; }
+    bool forceNullCharacterReplacement() const { return m_forceNullCharacterReplacement; }
+    void setForceNullCharacterReplacement(bool value) { m_forceNullCharacterReplacement = value; }
+    bool shouldSkipNullCharacters() const
+    {
+        return !m_forceNullCharacterReplacement
+            && (m_state == DataState
+                || m_state == RCDATAState
+                || m_state == RAWTEXTState
+                || m_state == PLAINTEXTState);
+    }
 private:
 …
     class InputStreamPreprocessor : public Noncopyable {
     public:
-        InputStreamPreprocessor()
-            : m_nextInputCharacter('\0')
+        InputStreamPreprocessor(HTMLTokenizer* tokenizer)
+            : m_tokenizer(tokenizer)
+            , m_nextInputCharacter('\0')
             , m_skipNextNewLine(false)
         {
 …
         ALWAYS_INLINE bool peek(SegmentedString& source, int& lineNumber)
         {
+        PeekAgain:
             m_nextInputCharacter = *source;
 …
                 // by the replacement character. We suspect this is a problem with the spec as doing
                 // that filtering breaks surrogate pair handling and causes us not to match Minefield.
-                if (m_nextInputCharacter == '\0' && !shouldTreatNullAsEndOfFileMarker(source))
+                if (m_nextInputCharacter == '\0' && !shouldTreatNullAsEndOfFileMarker(source)) {
+                    if (m_tokenizer->shouldSkipNullCharacters()) {
+                        source.advancePastNonNewline();
+                        if (source.isEmpty())
+                            return false;
+                        goto PeekAgain;
+                    }
                     m_nextInputCharacter = 0xFFFD;
+                }
             }
             return true;
 …
             return source.isClosed() && source.length() == 1;
         }
+        HTMLTokenizer* m_tokenizer;
         // http://www.whatwg.org/specs/web-apps/current-work/#next-input-character
 …
     bool m_skipLeadingNewLineForListing;
+    bool m_forceNullCharacterReplacement;
     // http://www.whatwg.org/specs/web-apps/current-work/#temporary-buffer
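
The interaction between `shouldSkipNullCharacters()` and the `PeekAgain` loop in `peek()` can be sketched outside WebKit roughly as follows. Everything here is a simplified assumption: `SketchTokenizer`, `State`, and `peekSketch` are hypothetical names, a `std::u16string` plus an index stands in for `SegmentedString`, and the newline handling and `shouldTreatNullAsEndOfFileMarker` check from the real code are left out.

```cpp
#include <cstddef>
#include <string>

// Simplified stand-ins for the four states named in the diff, plus one
// state (TagName) in which nulls are not skipped.
enum class State { Data, RCDATA, RAWTEXT, PLAINTEXT, TagName };

struct SketchTokenizer {
    State state = State::Data;
    bool forceNullCharacterReplacement = false;

    // Same shape as shouldSkipNullCharacters() in the diff.
    bool shouldSkipNullCharacters() const
    {
        return !forceNullCharacterReplacement
            && (state == State::Data || state == State::RCDATA
                || state == State::RAWTEXT || state == State::PLAINTEXT);
    }
};

// Hypothetical helper: finds the next character to hand to the tokenizer,
// starting at `position`. Returns false if the input runs out while nulls
// are being skipped. The while loop plays the role of `goto PeekAgain`.
bool peekSketch(const SketchTokenizer& tokenizer, const std::u16string& source,
                std::size_t& position, char16_t& nextInputCharacter)
{
    while (position < source.size()) {
        nextInputCharacter = source[position];
        if (nextInputCharacter != u'\0')
            return true;
        if (!tokenizer.shouldSkipNullCharacters()) {
            // Outside the text-like states, keep the replacement behavior.
            nextInputCharacter = u'\uFFFD';
            return true;
        }
        ++position; // Skip the null and look at the following character.
    }
    return false;
}
```

In this sketch a null seen in `State::TagName` is still reported as U+FFFD rather than dropped, which corresponds to the point made for `null-xss.html`: null characters are not stripped from tag names.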

trunk/WebCore/html/HTMLTreeBuilder.cpp

-                      r64712
+                      r64799
             m_lastScriptElementStartLine = m_tokenizer->lineNumber();
         } else if (oldStyleToken.tagName == preTag || oldStyleToken.tagName == listingTag)
-            m_tokenizer->skipLeadingNewLineForListing();
+            m_tokenizer->setSkipLeadingNewLineForListing(true);
         else
             m_tokenizer->setState(adjustedLexerState(m_tokenizer->state(), oldStyleToken.tagName, m_document->frame()));
 …
     AtomicHTMLToken token(rawToken);
     processToken(token);
+    // Swallowing U+0000 characters isn't in the HTML5 spec, but turning all
+    // the U+0000 characters into replacement characters has compatibility
+    // problems.
+    m_tokenizer->setForceNullCharacterReplacement(m_insertionMode == TextMode || m_insertionMode == InForeignContentMode);
 }
 …
         processFakePEndTagIfPInButtonScope();
         m_tree.insertHTMLElement(token);
-        m_tokenizer->skipLeadingNewLineForListing();
+        m_tokenizer->setSkipLeadingNewLineForListing(true);
         m_framesetOk = false;
         return;
 …
     if (token.name() == textareaTag) {
         m_tree.insertHTMLElement(token);
-        m_tokenizer->skipLeadingNewLineForListing();
+        m_tokenizer->setSkipLeadingNewLineForListing(true);
         m_tokenizer->setState(HTMLTokenizer::RCDATAState);
         m_originalInsertionMode = m_insertionMode;
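
The condition that `constructTreeFromToken` passes to `setForceNullCharacterReplacement` can be restated as a small standalone predicate. This is only a sketch: the `InsertionMode` enum below is a hypothetical stand-in listing three modes, and `shouldForceNullCharacterReplacement` is not a function in the patch.

```cpp
// Hypothetical stand-in for the tree builder's insertion modes; only the
// two modes named in the diff affect the result.
enum class InsertionMode { InBody, Text, InForeignContent };

// Restates the expression in the diff: after each token is processed, the
// tokenizer is told to keep U+FFFD replacement when the tree builder is in
// the text or foreign-content insertion mode, and is free to strip nulls
// otherwise.
bool shouldForceNullCharacterReplacement(InsertionMode mode)
{
    return mode == InsertionMode::Text
        || mode == InsertionMode::InForeignContent;
}
```

Because the flag is recomputed after every token, the tokenizer's stripping behavior follows the tree builder's current insertion mode rather than being fixed once per document.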
