Changeset 267963 in webkit


Timestamp: Oct 5, 2020 7:40:55 AM
Author: achristensen@apple.com
Message:

Fix UTF-8 encoding in URL parsing
https://bugs.webkit.org/show_bug.cgi?id=217289

Reviewed by Darin Adler.

LayoutTests/imported/w3c:

  • web-platform-tests/url/a-element-expected.txt:
  • web-platform-tests/url/a-element-origin-expected.txt:
  • web-platform-tests/url/a-element-origin-xhtml-expected.txt:
  • web-platform-tests/url/a-element-xhtml-expected.txt:
  • web-platform-tests/url/resources/urltestdata.json:
  • web-platform-tests/url/url-constructor-expected.txt:
  • web-platform-tests/url/url-origin-expected.txt:

Source/WTF:

This matches the behavior of Firefox and the Unicode and whatwg encoding specifications.

  • wtf/URLParser.cpp:

(WTF::URLParser::utf8PercentEncode):
(WTF::URLParser::utf8QueryEncode):
(WTF::URLParser::parseHostAndPort):

Tools:

  • TestWebKitAPI/Tests/WTF/URLParser.cpp:

(TestWebKitAPI::TEST_F):

LayoutTests:

  • fast/url/anchor-expected.txt:
  • fast/url/anchor.html:
  • fast/url/path-expected.txt:
  • fast/url/path.html:
Location: trunk
Files: 17 edited

  • trunk/LayoutTests/ChangeLog

    r267958 r267963  
@@ -1,2 +1,14 @@
+2020-10-05  Alex Christensen  <achristensen@webkit.org>
+
+        Fix UTF-8 encoding in URL parsing
+        https://bugs.webkit.org/show_bug.cgi?id=217289
+
+        Reviewed by Darin Adler.
+
+        * fast/url/anchor-expected.txt:
+        * fast/url/anchor.html:
+        * fast/url/path-expected.txt:
+        * fast/url/path.html:
+
 2020-10-04  Antoine Quint  <graouts@webkit.org>
 
  • trunk/LayoutTests/fast/url/anchor-expected.txt

    r266399 r267963  
@@ -9,5 +9,5 @@
 PASS canonicalize('http://www.example.com/#%41%a') is 'http://www.example.com/#%41%a'
 FAIL canonicalize('http://www.example.com/#\ud800\u597d') should be http://www.example.com/#�好. Was http://www.example.com/#%EF%BF%BD%E5%A5%BD.
-FAIL canonicalize('http://www.example.com/#a\uFDD0') should be http://www.example.com/#a﷐. Was http://www.example.com/#a%EF%BF%BD.
+PASS canonicalize('http://www.example.com/#a\uFDD0') is 'http://www.example.com/#a%EF%B7%90'
 PASS canonicalize('http://www.example.com/#asdf#qwer') is 'http://www.example.com/#asdf#qwer'
 PASS canonicalize('http://www.example.com/##asdf') is 'http://www.example.com/##asdf'
  • trunk/LayoutTests/fast/url/anchor.html

    r266399 r267963  
@@ -16,5 +16,5 @@
   ["%41%a", "%41%a"],
   ["\\ud800\\u597d", "\\uFFFD\\u597D"],
-  ["a\\uFDD0", "a\\uFDD0"],
+  ["a\\uFDD0", "a%EF%B7%90"],
   ["asdf#qwer", "asdf#qwer"],
   ["#asdf", "#asdf"],
  • trunk/LayoutTests/fast/url/path-expected.txt

    r208087 r267963  
@@ -38,5 +38,5 @@
 PASS canonicalize('http://example.com/@asdf%40') is 'http://example.com/@asdf%40'
 PASS canonicalize('http://example.com/你好你好') is 'http://example.com/%E4%BD%A0%E5%A5%BD%E4%BD%A0%E5%A5%BD'
-PASS canonicalize('http://example.com/﷐zyx') is 'http://example.com/%EF%BF%BDzyx'
+PASS canonicalize('http://example.com/﷐zyx') is 'http://example.com/%EF%B7%90zyx'
 PASS canonicalize('http://example.com/‥/foo') is 'http://example.com/%E2%80%A5/foo'
 PASS canonicalize('http://example.com/﻿/foo') is 'http://example.com/%EF%BB%BF/foo'
  • trunk/LayoutTests/fast/url/path.html

    r155273 r267963  
@@ -79,5 +79,5 @@
   // Invalid unicode characters should fail. We only do validation on
   // UTF-16 input, so this doesn't happen on 8-bit.
-  ["/\ufdd0zyx", "/%EF%BF%BDzyx"],
+  ["/\ufdd0zyx", "/%EF%B7%90zyx"],
   // U+2025 TWO DOT LEADER should not be normalized to .. in the path
   ["/\u2025/foo", "/%E2%80%A5/foo"],
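The changed expectation for U+FDD0 comes down to which code points are treated as unencodable. The following is a rough sketch of that difference, not the actual ICU macros: the function names are made up for illustration, and the predicates paraphrase what U_IS_UNICODE_CHAR (the old gate, removed in this changeset) accepts versus what has a UTF-8 form at all (the new gate, as far as the test expectations show).

```cpp
#include <cassert>
#include <cstdint>

// Surrogate code points U+D800..U+DFFF have no UTF-8 form.
constexpr bool isSurrogate(uint32_t c)
{
    return c >= 0xD800 && c <= 0xDFFF;
}

// Noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane
// (U+FFFE, U+FFFF, U+1FFFE, ...). They are valid scalar values.
constexpr bool isNoncharacter(uint32_t c)
{
    return c <= 0x10FFFF && ((c >= 0xFDD0 && c <= 0xFDEF) || (c & 0xFFFE) == 0xFFFE);
}

// Assumed paraphrase of the old gate: noncharacters were rejected too, so
// U+FDD0 ended up as %EF%BF%BD.
constexpr bool passesOldCheck(uint32_t c)
{
    return c <= 0x10FFFF && !isSurrogate(c) && !isNoncharacter(c);
}

// Assumed paraphrase of the new gate: only values without a UTF-8 form fail.
constexpr bool passesNewCheck(uint32_t c)
{
    return c <= 0x10FFFF && !isSurrogate(c);
}

static_assert(!passesOldCheck(0xFDD0) && passesNewCheck(0xFDD0), "U+FDD0 is the case these tests cover");
```

U+FDCF and U+FDF0, which sit just outside the noncharacter block, pass both checks; a lone surrogate such as U+D800 fails both.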
  • trunk/LayoutTests/imported/w3c/ChangeLog

    r267959 r267963  
@@ -1,2 +1,17 @@
+2020-10-05  Alex Christensen  <achristensen@webkit.org>
+
+        Fix UTF-8 encoding in URL parsing
+        https://bugs.webkit.org/show_bug.cgi?id=217289
+
+        Reviewed by Darin Adler.
+
+        * web-platform-tests/url/a-element-expected.txt:
+        * web-platform-tests/url/a-element-origin-expected.txt:
+        * web-platform-tests/url/a-element-origin-xhtml-expected.txt:
+        * web-platform-tests/url/a-element-xhtml-expected.txt:
+        * web-platform-tests/url/resources/urltestdata.json:
+        * web-platform-tests/url/url-constructor-expected.txt:
+        * web-platform-tests/url/url-origin-expected.txt:
+
 2020-10-05  Rob Buis  <rbuis@igalia.com>
 
  • trunk/LayoutTests/imported/w3c/web-platform-tests/url/a-element-expected.txt

    r267933 r267963  
@@ -360,5 +360,6 @@
 PASS Parsing: <wow:%NBD> against <about:blank>
 PASS Parsing: <wow:%1G> against <about:blank>
-FAIL Parsing: <wow:￿> against <about:blank> assert_equals: href expected "wow:%EF%BF%BF" but got "wow:%EF%BF%BD"
+PASS Parsing: <wow:￿> against <about:blank>
+PASS Parsing: <http://example.com/U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿?U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿> against <about:blank>
 PASS Parsing: <http://a<b> against <about:blank>
 PASS Parsing: <http://a>b> against <about:blank>
  • trunk/LayoutTests/imported/w3c/web-platform-tests/url/a-element-origin-expected.txt

    r267647 r267963  
@@ -263,4 +263,5 @@
 FAIL Parsing origin: <wow:%1G> against <about:blank> assert_equals: origin expected "null" but got "wow://"
 FAIL Parsing origin: <wow:￿> against <about:blank> assert_equals: origin expected "null" but got "wow://"
+PASS Parsing origin: <http://example.com/U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿?U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿> against <about:blank>
 PASS Parsing origin: <http://!"$&'()*+,-.;=_`{|}~/> against <about:blank>
 FAIL Parsing origin: <sc://!"$&'()*+,-.;=_`{|}~/> against <about:blank> assert_equals: origin expected "null" but got "sc://%1f!\"$&'()*+,-.;=_`{|}~"
  • trunk/LayoutTests/imported/w3c/web-platform-tests/url/a-element-origin-xhtml-expected.txt

    r267647 r267963  
@@ -263,4 +263,5 @@
 FAIL Parsing origin: <wow:%1G> against <about:blank> assert_equals: origin expected "null" but got "wow://"
 FAIL Parsing origin: <wow:￿> against <about:blank> assert_equals: origin expected "null" but got "wow://"
+PASS Parsing origin: <http://example.com/U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿?U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿> against <about:blank>
 PASS Parsing origin: <http://!"$&'()*+,-.;=_`{|}~/> against <about:blank>
 FAIL Parsing origin: <sc://!"$&'()*+,-.;=_`{|}~/> against <about:blank> assert_equals: origin expected "null" but got "sc://%1f!\"$&'()*+,-.;=_`{|}~"
  • trunk/LayoutTests/imported/w3c/web-platform-tests/url/a-element-xhtml-expected.txt

    r267933 r267963  
@@ -360,5 +360,6 @@
 PASS Parsing: <wow:%NBD> against <about:blank>
 PASS Parsing: <wow:%1G> against <about:blank>
-FAIL Parsing: <wow:￿> against <about:blank> assert_equals: href expected "wow:%EF%BF%BF" but got "wow:%EF%BF%BD"
+PASS Parsing: <wow:￿> against <about:blank>
+PASS Parsing: <http://example.com/U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿?U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿> against <about:blank>
 PASS Parsing: <http://a<b> against <about:blank>
 PASS Parsing: <http://a>b> against <about:blank>
  • trunk/LayoutTests/imported/w3c/web-platform-tests/url/resources/urltestdata.json

    r267896 r267963  
@@ -4619,5 +4619,5 @@
     "hash": ""
   },
-  "# unknown scheme with non-URL characters in the path",
+  "# unknown scheme with non-URL characters",
   {
     "input": "wow:\uFFFF",
@@ -4633,4 +4633,19 @@
     "pathname": "%EF%BF%BF",
     "search": "",
+    "hash": ""
+  },
+  {
+    "input": "http://example.com/\uD800\uD801\uDFFE\uDFFF\uFDD0\uFDCF\uFDEF\uFDF0\uFFFE\uFFFF?\uD800\uD801\uDFFE\uDFFF\uFDD0\uFDCF\uFDEF\uFDF0\uFFFE\uFFFF",
+    "base": "about:blank",
+    "href": "http://example.com/%EF%BF%BD%F0%90%9F%BE%EF%BF%BD%EF%B7%90%EF%B7%8F%EF%B7%AF%EF%B7%B0%EF%BF%BE%EF%BF%BF?%EF%BF%BD%F0%90%9F%BE%EF%BF%BD%EF%B7%90%EF%B7%8F%EF%B7%AF%EF%B7%B0%EF%BF%BE%EF%BF%BF",
+    "origin": "http://example.com",
+    "protocol": "http:",
+    "username": "",
+    "password": "",
+    "host": "example.com",
+    "hostname": "example.com",
+    "port": "",
+    "pathname": "/%EF%BF%BD%F0%90%9F%BE%EF%BF%BD%EF%B7%90%EF%B7%8F%EF%B7%AF%EF%B7%B0%EF%BF%BE%EF%BF%BF",
+    "search": "?%EF%BF%BD%F0%90%9F%BE%EF%BF%BD%EF%B7%90%EF%B7%8F%EF%B7%AF%EF%B7%B0%EF%BF%BE%EF%BF%BF",
     "hash": ""
   },
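The new test input mixes lone and paired surrogates, so it helps to see how the UTF-16 code units group into code points before any percent-encoding happens. The sketch below is an illustration only: codePointsFromUTF16 is a hypothetical helper, not a WebKit function, and it assumes that an unpaired surrogate is handed on unchanged so that a later encoding step can replace it with U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: groups UTF-16 code units into code points. A high
// surrogate directly followed by a low surrogate becomes one supplementary
// code point; any other surrogate is passed through as-is.
std::vector<char32_t> codePointsFromUTF16(const std::u16string& input)
{
    std::vector<char32_t> result;
    for (std::size_t i = 0; i < input.size(); ++i) {
        char32_t unit = input[i];
        bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < input.size()) {
            char32_t next = input[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                char32_t combined = 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00);
                assert(combined >= 0x10000 && combined <= 0x10FFFF);
                result.push_back(combined);
                ++i; // The low surrogate has been consumed.
                continue;
            }
        }
        result.push_back(unit);
    }
    return result;
}
```

For the start of the test input, \uD800 \uD801 \uDFFE \uDFFF, this yields a lone U+D800, the pair U+D801 U+DFFE combined into U+107FE, and a lone U+DFFF, which lines up with the expected %EF%BF%BD, %F0%90%9F%BE, %EF%BF%BD at the start of the pathname.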
  • trunk/LayoutTests/imported/w3c/web-platform-tests/url/url-constructor-expected.txt

    r267933 r267963  
@@ -361,5 +361,6 @@
 PASS Parsing: <wow:%NBD> against <about:blank>
 PASS Parsing: <wow:%1G> against <about:blank>
-FAIL Parsing: <wow:￿> against <about:blank> assert_equals: href expected "wow:%EF%BF%BF" but got "wow:%EF%BF%BD"
+PASS Parsing: <wow:￿> against <about:blank>
+PASS Parsing: <http://example.com/U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿?U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿> against <about:blank>
 PASS Parsing: <http://a<b> against <about:blank>
 PASS Parsing: <http://a>b> against <about:blank>
  • trunk/LayoutTests/imported/w3c/web-platform-tests/url/url-origin-expected.txt

    r267647 r267963  
@@ -262,4 +262,5 @@
 FAIL Origin parsing: <wow:%1G> against <about:blank> assert_equals: origin expected "null" but got "wow://"
 FAIL Origin parsing: <wow:￿> against <about:blank> assert_equals: origin expected "null" but got "wow://"
+PASS Origin parsing: <http://example.com/U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿?U+d800�U+dffeU+dfff﷐﷏﷯ﷰ￾￿> against <about:blank>
 PASS Origin parsing: <http://!"$&'()*+,-.;=_`{|}~/> against <about:blank>
 FAIL Origin parsing: <sc://!"$&'()*+,-.;=_`{|}~/> against <about:blank> assert_equals: origin expected "null" but got "sc://%1f!\"$&'()*+,-.;=_`{|}~"
  • trunk/Source/WTF/ChangeLog

    r267938 r267963  
@@ -1,2 +1,16 @@
+2020-10-05  Alex Christensen  <achristensen@webkit.org>
+
+        Fix UTF-8 encoding in URL parsing
+        https://bugs.webkit.org/show_bug.cgi?id=217289
+
+        Reviewed by Darin Adler.
+
+        This matches the behavior of Firefox and the Unicode and whatwg encoding specifications.
+
+        * wtf/URLParser.cpp:
+        (WTF::URLParser::utf8PercentEncode):
+        (WTF::URLParser::utf8QueryEncode):
+        (WTF::URLParser::parseHostAndPort):
+
 2020-10-03  Yusuke Suzuki  <ysuzuki@apple.com>
 
  • trunk/Source/WTF/wtf/URLParser.cpp

    r267933 r267963  
@@ -483,13 +483,13 @@
     ASSERT_WITH_MESSAGE(isInCodeSet(codePoint), "isInCodeSet should always return true for non-ASCII characters");
     syntaxViolation(iterator);
-    
-    if (!U_IS_UNICODE_CHAR(codePoint)) {
+
+    uint8_t buffer[U8_MAX_LENGTH];
+    int32_t offset = 0;
+    UBool isError = false;
+    U8_APPEND(buffer, offset, U8_MAX_LENGTH, codePoint, isError);
+    if (isError) {
         appendToASCIIBuffer(replacementCharacterUTF8PercentEncoded, replacementCharacterUTF8PercentEncodedLength);
         return;
     }
-    
-    uint8_t buffer[U8_MAX_LENGTH];
-    int32_t offset = 0;
-    U8_APPEND_UNSAFE(buffer, offset, codePoint);
     for (int32_t i = 0; i < offset; ++i)
         percentEncodeByte(buffer[i]);
@@ -509,15 +509,15 @@
         return;
     }
-    
+
     syntaxViolation(iterator);
-    
-    if (!U_IS_UNICODE_CHAR(codePoint)) {
+
+    uint8_t buffer[U8_MAX_LENGTH];
+    int32_t offset = 0;
+    UBool isError = false;
+    U8_APPEND(buffer, offset, U8_MAX_LENGTH, codePoint, isError);
+    if (isError) {
         appendToASCIIBuffer(replacementCharacterUTF8PercentEncoded, replacementCharacterUTF8PercentEncodedLength);
         return;
     }
-
-    uint8_t buffer[U8_MAX_LENGTH];
-    int32_t offset = 0;
-    U8_APPEND_UNSAFE(buffer, offset, codePoint);
     for (int32_t i = 0; i < offset; ++i) {
         auto byte = buffer[i];
@@ -2721,9 +2721,10 @@
             syntaxViolation(hostBegin);
 
-        if (!U_IS_UNICODE_CHAR(*iterator))
-            return false;
         uint8_t buffer[U8_MAX_LENGTH];
         int32_t offset = 0;
-        U8_APPEND_UNSAFE(buffer, offset, *iterator);
+        UBool isError = false;
+        U8_APPEND(buffer, offset, U8_MAX_LENGTH, *iterator, isError);
+        if (isError)
+            return false;
         utf8Encoded.append(buffer, offset);
     }
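To make the new control flow concrete without depending on ICU, here is a minimal standalone sketch of the pattern: try to produce UTF-8 bytes, fall back to the percent-encoded replacement character on failure, and percent-encode the bytes otherwise. encodeUTF8 and percentEncodeCodePoint are hypothetical names, not WTF or ICU functions, and the failure condition (surrogates and values above U+10FFFF) is an assumption about when U8_APPEND reports an error, chosen because it agrees with the test expectations in this changeset. Unlike the real parser, the sketch percent-encodes every code point it is given.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: writes the UTF-8 form of codePoint into out and
// returns the byte count, or 0 when the value has no UTF-8 form.
int encodeUTF8(uint32_t codePoint, uint8_t out[4])
{
    if (codePoint <= 0x7F) {
        out[0] = static_cast<uint8_t>(codePoint);
        return 1;
    }
    if (codePoint <= 0x7FF) {
        out[0] = static_cast<uint8_t>(0xC0 | (codePoint >> 6));
        out[1] = static_cast<uint8_t>(0x80 | (codePoint & 0x3F));
        return 2;
    }
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
        return 0; // Surrogates cannot be encoded.
    if (codePoint <= 0xFFFF) {
        out[0] = static_cast<uint8_t>(0xE0 | (codePoint >> 12));
        out[1] = static_cast<uint8_t>(0x80 | ((codePoint >> 6) & 0x3F));
        out[2] = static_cast<uint8_t>(0x80 | (codePoint & 0x3F));
        return 3;
    }
    if (codePoint <= 0x10FFFF) {
        out[0] = static_cast<uint8_t>(0xF0 | (codePoint >> 18));
        out[1] = static_cast<uint8_t>(0x80 | ((codePoint >> 12) & 0x3F));
        out[2] = static_cast<uint8_t>(0x80 | ((codePoint >> 6) & 0x3F));
        out[3] = static_cast<uint8_t>(0x80 | (codePoint & 0x3F));
        return 4;
    }
    return 0; // Beyond the Unicode range.
}

// Percent-encodes the UTF-8 bytes of codePoint, or returns the encoded
// U+FFFD REPLACEMENT CHARACTER when no UTF-8 form exists.
std::string percentEncodeCodePoint(uint32_t codePoint)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    uint8_t buffer[4];
    int length = encodeUTF8(codePoint, buffer);
    if (!length)
        return "%EF%BF%BD";
    assert(length <= 4);
    std::string result;
    for (int i = 0; i < length; ++i) {
        result += '%';
        result += hexDigits[buffer[i] >> 4];
        result += hexDigits[buffer[i] & 0xF];
    }
    return result;
}
```

With this sketch, U+FDD0 comes out as %EF%B7%90 and U+FFFF as %EF%BF%BF, while a lone U+D800 still becomes %EF%BF%BD, matching the updated test expectations.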
  • trunk/Tools/ChangeLog

    r267957 r267963  
@@ -1,2 +1,12 @@
+2020-10-05  Alex Christensen  <achristensen@webkit.org>
+
+        Fix UTF-8 encoding in URL parsing
+        https://bugs.webkit.org/show_bug.cgi?id=217289
+
+        Reviewed by Darin Adler.
+
+        * TestWebKitAPI/Tests/WTF/URLParser.cpp:
+        (TestWebKitAPI::TEST_F):
+
 2020-10-05  Rob Buis  <rbuis@igalia.com>
 
  • trunk/Tools/TestWebKitAPI/Tests/WTF/URLParser.cpp

    r267931 r267963  
@@ -1304,6 +1304,4 @@
         {"http", "", "", "w", 0, "/", "%EF%BF%BD", "", "http://w/?%EF%BF%BD"},
         {"http", "", "", "w", 0, "/", "%ED%A0%80", "", "http://w/?%ED%A0%80"});
-    
-    // FIXME: Write more invalid surrogate pair tests based on feedback from https://bugs.webkit.org/show_bug.cgi?id=162105
 }
 