Changeset 64799 in webkit


Ignore:
Timestamp:
Aug 5, 2010 5:12:15 PM (14 years ago)
Author:
abarth@webkit.org
Message:

2010-08-05 Adam Barth <abarth@webkit.org>

Reviewed by Eric Seidel.

U+0000 is turned to U+FFFD (replacement character)
https://bugs.webkit.org/show_bug.cgi?id=42112

Update test results to show null stripping. These changes are mostly
going back to the old results we had before we added the FFFD
replacement.

  • fast/dom/stripNullFromTextNodes-expected.txt:
  • fast/tokenizer/null-in-text-expected.txt: Added.
  • fast/tokenizer/null-in-text.html: Added.
  • fast/tokenizer/null-xss-expected.txt: Added.
  • fast/tokenizer/null-xss.html: Added.
    • The main risk with stripping null characters is that they'll be used in XSS attacks. This test shows that we don't strip null characters from tag names.
  • platform/mac/fast/text/stripNullFromText-expected.txt:
  • svg/dom/fuzz-path-parser-expected.txt:
  • svg/dom/rgb-color-parser-expected.txt:

2010-08-05 Adam Barth <abarth@webkit.org>

Reviewed by Eric Seidel.

U+0000 is turned to U+FFFD (replacement character)
https://bugs.webkit.org/show_bug.cgi?id=42112

This patch introduces an intentional parsing difference from the HTML5
parsing specificiation. The spec requires us to convert NULL
characters to U+FFFD, but doing so causes compatibility issues with a
number of sites, including US Bank.

In this patch, we strip the null characters instead in certain cases.
Firefox has made a corresponding change. After gathering compatability
data, we hope to convince the HTML WG to adopt this change.

Tests: fast/tokenizer/null-in-text.html

fast/tokenizer/null-xss.html

  • html/HTMLTokenizer.cpp: (WebCore::HTMLTokenizer::HTMLTokenizer): (WebCore::HTMLTokenizer::reset):
  • html/HTMLTokenizer.h: (WebCore::HTMLTokenizer::setSkipLeadingNewLineForListing): (WebCore::HTMLTokenizer::forceNullCharacterReplacement): (WebCore::HTMLTokenizer::setForceNullCharacterReplacement): (WebCore::HTMLTokenizer::shouldSkipNullCharacters): (WebCore::HTMLTokenizer::InputStreamPreprocessor::InputStreamPreprocessor): (WebCore::HTMLTokenizer::InputStreamPreprocessor::peek):
  • html/HTMLTreeBuilder.cpp: (WebCore::HTMLTreeBuilder::passTokenToLegacyParser): (WebCore::HTMLTreeBuilder::constructTreeFromToken): (WebCore::HTMLTreeBuilder::processStartTagForInBody):
Location:
trunk
Files:
4 added
9 edited

Legend:

Unmodified
Added
Removed
  • trunk/LayoutTests/ChangeLog

    r64796 r64799  
     12010-08-05  Adam Barth  <abarth@webkit.org>
     2
     3        Reviewed by Eric Seidel.
     4
     5        U+0000 is turned to U+FFFD (replacement character)
     6        https://bugs.webkit.org/show_bug.cgi?id=42112
     7
     8        Update test results to show null stripping.  These changes are mostly
     9        going back to the old results we had before we added the FFFD
     10        replacement.
     11
     12        * fast/dom/stripNullFromTextNodes-expected.txt:
     13        * fast/tokenizer/null-in-text-expected.txt: Added.
     14        * fast/tokenizer/null-in-text.html: Added.
     15        * fast/tokenizer/null-xss-expected.txt: Added.
     16        * fast/tokenizer/null-xss.html: Added.
     17            - The main risk with stripping null characters is that they'll be
     18              used in XSS attacks.  This test shows that we don't strip null
     19              characters from tag names.
     20        * platform/mac/fast/text/stripNullFromText-expected.txt:
     21        * svg/dom/fuzz-path-parser-expected.txt:
     22        * svg/dom/rgb-color-parser-expected.txt:
     23
    1242010-08-05  Adam Barth  <abarth@webkit.org>
    225
  • trunk/LayoutTests/fast/dom/stripNullFromTextNodes-expected.txt

    r61282 r64799  
    1 ���������hell�����o
    2 The null characters should be stripped out of the string above and it should have a length of 5. And the DOM thinks the length is...19 :-(
     1hello
     2The null characters should be stripped out of the string above and it should have a length of 5. And the DOM thinks the length is...5!
  • trunk/LayoutTests/platform/mac/fast/text/stripNullFromText-expected.txt

    r61234 r64799  
    44  RenderBlock {HTML} at (0,0) size 800x600
    55    RenderBody {BODY} at (8,8) size 784x584
    6       RenderBlock {DIV} at (0,0) size 784x21 [border: (1px solid #FF0000)]
    7         RenderText {#text} at (1,2) size 16x18
    8           text run at (1,2) width 16: "\x{FFFD}"
     6      RenderBlock {DIV} at (0,0) size 784x2 [border: (1px solid #FF0000)]
  • trunk/LayoutTests/svg/dom/fuzz-path-parser-expected.txt

    r61637 r64799  
    460460Could not parse:
    461461Could not parse: M
    462 Could not parse: M
     462Could not parse: M
    463463Parsed as 2 command(s) [MZ]: M1,1Z0
    464464PASS successfullyParsed is true
  • trunk/LayoutTests/svg/dom/rgb-color-parser-expected.txt

    r61637 r64799  
    255255Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1: rgb(
    256256Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1:
    257 Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1:
    258 Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1: rgb()
     257Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1:
     258Threw exception Error: SVG_INVALID_VALUE_ERR: DOM SVG Exception 1: rgb()
    259259PASS successfullyParsed is true
    260260
  • trunk/WebCore/ChangeLog

    r64798 r64799  
     12010-08-05  Adam Barth  <abarth@webkit.org>
     2
     3        Reviewed by Eric Seidel.
     4
     5        U+0000 is turned to U+FFFD (replacement character)
     6        https://bugs.webkit.org/show_bug.cgi?id=42112
     7
     8        This patch introduces an intentional parsing difference from the HTML5
     9        parsing specificiation.  The spec requires us to convert NULL
     10        characters to U+FFFD, but doing so causes compatibility issues with a
     11        number of sites, including US Bank.
     12
     13        In this patch, we strip the null characters instead in certain cases.
     14        Firefox has made a corresponding change.  After gathering compatability
     15        data, we hope to convince the HTML WG to adopt this change.
     16
     17        Tests: fast/tokenizer/null-in-text.html
     18               fast/tokenizer/null-xss.html
     19
     20        * html/HTMLTokenizer.cpp:
     21        (WebCore::HTMLTokenizer::HTMLTokenizer):
     22        (WebCore::HTMLTokenizer::reset):
     23        * html/HTMLTokenizer.h:
     24        (WebCore::HTMLTokenizer::setSkipLeadingNewLineForListing):
     25        (WebCore::HTMLTokenizer::forceNullCharacterReplacement):
     26        (WebCore::HTMLTokenizer::setForceNullCharacterReplacement):
     27        (WebCore::HTMLTokenizer::shouldSkipNullCharacters):
     28        (WebCore::HTMLTokenizer::InputStreamPreprocessor::InputStreamPreprocessor):
     29        (WebCore::HTMLTokenizer::InputStreamPreprocessor::peek):
     30        * html/HTMLTreeBuilder.cpp:
     31        (WebCore::HTMLTreeBuilder::passTokenToLegacyParser):
     32        (WebCore::HTMLTreeBuilder::constructTreeFromToken):
     33        (WebCore::HTMLTreeBuilder::processStartTagForInBody):
     34
    1352010-08-05  Andy Estes  <aestes@apple.com>
    236
  • trunk/WebCore/html/HTMLTokenizer.cpp

    r64012 r64799  
    9898
    9999HTMLTokenizer::HTMLTokenizer()
     100    : m_inputStreamPreprocessor(this)
    100101{
    101102    reset();
     
    112113    m_lineNumber = 0;
    113114    m_skipLeadingNewLineForListing = false;
     115    m_forceNullCharacterReplacement = false;
    114116    m_additionalAllowedCharacter = '\0';
    115117}
  • trunk/WebCore/html/HTMLTokenizer.h

    r63987 r64799  
    133133    // Hack to skip leading newline in <pre>/<listing> for authoring ease.
    134134    // http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#parsing-main-inbody
    135     void skipLeadingNewLineForListing() { m_skipLeadingNewLineForListing = true; }
     135    void setSkipLeadingNewLineForListing(bool value) { m_skipLeadingNewLineForListing = value; }
     136
     137    bool forceNullCharacterReplacement() const { return m_forceNullCharacterReplacement; }
     138    void setForceNullCharacterReplacement(bool value) { m_forceNullCharacterReplacement = value; }
     139
     140    bool shouldSkipNullCharacters() const
     141    {
     142        return !m_forceNullCharacterReplacement
     143            && (m_state == DataState
     144                || m_state == RCDATAState
     145                || m_state == RAWTEXTState
     146                || m_state == PLAINTEXTState);
     147    }
    136148
    137149private:
     
    139151    class InputStreamPreprocessor : public Noncopyable {
    140152    public:
    141         InputStreamPreprocessor()
    142             : m_nextInputCharacter('\0')
     153        InputStreamPreprocessor(HTMLTokenizer* tokenizer)
     154            : m_tokenizer(tokenizer)
     155            , m_nextInputCharacter('\0')
    143156            , m_skipNextNewLine(false)
    144157        {
     
    152165        ALWAYS_INLINE bool peek(SegmentedString& source, int& lineNumber)
    153166        {
     167        PeekAgain:
    154168            m_nextInputCharacter = *source;
    155169
     
    180194                // by the replacement character. We suspect this is a problem with the spec as doing
    181195                // that filtering breaks surrogate pair handling and causes us not to match Minefield.
    182                 if (m_nextInputCharacter == '\0' && !shouldTreatNullAsEndOfFileMarker(source))
     196                if (m_nextInputCharacter == '\0' && !shouldTreatNullAsEndOfFileMarker(source)) {
     197                    if (m_tokenizer->shouldSkipNullCharacters()) {
     198                        source.advancePastNonNewline();
     199                        if (source.isEmpty())
     200                            return false;
     201                        goto PeekAgain;
     202                    }
    183203                    m_nextInputCharacter = 0xFFFD;
     204                }
    184205            }
    185206            return true;
     
    202223            return source.isClosed() && source.length() == 1;
    203224        }
     225
     226        HTMLTokenizer* m_tokenizer;
    204227
    205228        // http://www.whatwg.org/specs/web-apps/current-work/#next-input-character
     
    243266
    244267    bool m_skipLeadingNewLineForListing;
     268    bool m_forceNullCharacterReplacement;
    245269
    246270    // http://www.whatwg.org/specs/web-apps/current-work/#temporary-buffer
  • trunk/WebCore/html/HTMLTreeBuilder.cpp

    r64712 r64799  
    470470            m_lastScriptElementStartLine = m_tokenizer->lineNumber();
    471471        } else if (oldStyleToken.tagName == preTag || oldStyleToken.tagName == listingTag)
    472             m_tokenizer->skipLeadingNewLineForListing();
     472            m_tokenizer->setSkipLeadingNewLineForListing(true);
    473473        else
    474474            m_tokenizer->setState(adjustedLexerState(m_tokenizer->state(), oldStyleToken.tagName, m_document->frame()));
     
    510510    AtomicHTMLToken token(rawToken);
    511511    processToken(token);
     512
     513    // Swallowing U+0000 characters isn't in the HTML5 spec, but turning all
     514    // the U+0000 characters into replacement characters has compatibility
     515    // problems.
     516    m_tokenizer->setForceNullCharacterReplacement(m_insertionMode == TextMode || m_insertionMode == InForeignContentMode);
    512517}
    513518
     
    849854        processFakePEndTagIfPInButtonScope();
    850855        m_tree.insertHTMLElement(token);
    851         m_tokenizer->skipLeadingNewLineForListing();
     856        m_tokenizer->setSkipLeadingNewLineForListing(true);
    852857        m_framesetOk = false;
    853858        return;
     
    969974    if (token.name() == textareaTag) {
    970975        m_tree.insertHTMLElement(token);
    971         m_tokenizer->skipLeadingNewLineForListing();
     976        m_tokenizer->setSkipLeadingNewLineForListing(true);
    972977        m_tokenizer->setState(HTMLTokenizer::RCDATAState);
    973978        m_originalInsertionMode = m_insertionMode;
Note: See TracChangeset for help on using the changeset viewer.