Changeset 236565 in webkit


Timestamp:
Sep 27, 2018, 1:05:52 PM
Author:
achristensen@apple.com
Message:

URLParser should use TextEncoding through an abstract class
https://bugs.webkit.org/show_bug.cgi?id=190027

Reviewed by Andy Estes.

Source/WebCore:

URLParser uses TextEncoding for one call to encode, which is only used for encoding the query of URLs in documents with non-UTF encodings.
There are 3 call sites that specify the TextEncoding to use from the Document, and even those call sites use a UTF encoding most of the time.
All other URL parsing is done using a well-optimized path which assumes UTF-8 encoding and uses macros from ICU headers, not a TextEncoding.
Moving the logic in this way breaks URL and URLParser's dependency on TextEncoding, which makes it possible to use in a lower-level project
without also moving TextEncoding, TextCodec, TextCodecICU, ThreadGlobalData, and the rest of WebCore and JavaScriptCore.

There is no observable change in behavior. There is now one virtual function call in a code path in URLParser that is not performance-sensitive,
and TextEncodings now have a vtable, which uses a few more bytes of memory total for WebKit.
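
The layering described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the WebKit code: it uses std:: types in place of WTF's Vector and StringView, and Latin1LikeEncoding and appendEncodedQuery are hypothetical names invented for the sketch. Only URLTextEncoding and encodeForURLParsing come from this change. The stand-in encoder replaces unencodable code units with '?' purely for brevity, whereas the real TextEncoding passes UnencodableHandling::URLEncodedEntities to encode.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Lower layer: the only operation the URL parser needs from an encoding.
class URLTextEncoding {
public:
    virtual std::vector<uint8_t> encodeForURLParsing(std::u16string_view) const = 0;
    virtual ~URLTextEncoding() = default;
};

// Lower layer (hypothetical helper): appends an encoded query to a URL prefix
// without knowing the concrete encoding type. The real parser additionally
// percent-encodes the resulting bytes; that step is omitted here.
inline std::string appendEncodedQuery(std::string url, std::u16string_view query, const URLTextEncoding& encoding)
{
    assert(url.find('?') == std::string::npos); // Caller passes everything before the query.
    url += '?';
    for (uint8_t byte : encoding.encodeForURLParsing(query))
        url += static_cast<char>(byte);
    return url;
}

// Higher layer (hypothetical stand-in for TextEncoding): a Latin-1-like encoder.
// Assumption for brevity: code units above 0xFF become '?'.
class Latin1LikeEncoding final : public URLTextEncoding {
public:
    std::vector<uint8_t> encodeForURLParsing(std::u16string_view text) const final
    {
        std::vector<uint8_t> bytes;
        bytes.reserve(text.size());
        for (char16_t unit : text)
            bytes.push_back(static_cast<uint8_t>(unit <= 0xFF ? unit : u'?'));
        return bytes;
    }
};
```

In this arrangement the lower layer compiles without any knowledge of Latin1LikeEncoding, which is the property that lets URL and URLParser move to a lower-level project while TextEncoding stays in WebCore.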

  • css/parser/CSSParserContext.h:

(WebCore::CSSParserContext::completeURL const):

  • css/parser/CSSParserIdioms.cpp:

(WebCore::completeURL):

  • dom/Document.cpp:

(WebCore::Document::completeURL const):

  • html/HTMLBaseElement.cpp:

(WebCore::HTMLBaseElement::href const):
Move the call to encodingForFormSubmission from the URL constructor to the 3 call sites that specify the encoding from the Document.

  • loader/FormSubmission.cpp:

(WebCore::FormSubmission::create):

  • loader/TextResourceDecoder.cpp:

(WebCore::TextResourceDecoder::encodingForURLParsing):

  • loader/TextResourceDecoder.h:
  • platform/URL.cpp:

(WebCore::URL::URL):

  • platform/URL.h:

(WebCore::URLTextEncoding::~URLTextEncoding):

  • platform/URLParser.cpp:

(WebCore::URLParser::encodeNonUTF8Query):
(WebCore::URLParser::copyURLPartsUntil):
(WebCore::URLParser::URLParser):
(WebCore::URLParser::parse):
(WebCore::URLParser::encodeQuery): Deleted.
A pointer replaces the boolean isUTF8Encoding and the TextEncoding& which had a default value of UTF8Encoding.
Now the pointer being null means that we use UTF8, and the pointer being non-null means we use that encoding.

  • platform/URLParser.h:

(WebCore::URLParser::URLParser):

  • platform/text/TextEncoding.cpp:

(WebCore::UTF7Encoding):
(WebCore::TextEncoding::encodingForFormSubmissionOrURLParsing const):
(WebCore::ASCIIEncoding):
(WebCore::Latin1Encoding):
(WebCore::UTF16BigEndianEncoding):
(WebCore::UTF16LittleEndianEncoding):
(WebCore::UTF8Encoding):
(WebCore::WindowsLatin1Encoding):
(WebCore::TextEncoding::encodingForFormSubmission const): Deleted.
Use NeverDestroyed because TextEncoding now has a virtual destructor.

  • platform/text/TextEncoding.h:

Rename encodingForFormSubmission to encodingForFormSubmissionOrURLParsing to make it more clear that we are intentionally using it for both.
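
The null-pointer convention noted for platform/URLParser.cpp can be illustrated with a small sketch. It is an approximation under stated assumptions, not the actual implementation: NamedEncoding, utf8Encoding, QueryState, queryStateFor, and forceUTF8Query are hypothetical names for the illustration, and the name comparison stands in for TextEncoding's equality check. The shape mirrors what the change does: call sites collapse UTF-8 to nullptr, the parser picks its query state with a null check, and schemes that always use UTF-8 reset the pointer.

```cpp
#include <cassert>
#include <string_view>

// Stand-in for the abstract class; only object identity matters in this sketch.
struct URLTextEncoding {
    virtual ~URLTextEncoding() = default;
};

// Hypothetical concrete encoding identified by name.
struct NamedEncoding final : URLTextEncoding {
    explicit NamedEncoding(std::string_view name) : name(name) { }
    std::string_view name;
};

inline const NamedEncoding& utf8Encoding()
{
    static const NamedEncoding encoding("UTF-8");
    return encoding;
}

// Call-site normalization, in the spirit of TextResourceDecoder::encodingForURLParsing:
// UTF-8 collapses to nullptr so the parser never has to compare encodings.
inline const URLTextEncoding* encodingForURLParsing(const NamedEncoding& encoding)
{
    if (encoding.name == utf8Encoding().name)
        return nullptr;
    return &encoding;
}

enum class QueryState { UTF8Query, NonUTF8Query };

// Parser side: a single null check selects the query state.
inline QueryState queryStateFor(const URLTextEncoding* nonUTF8QueryEncoding)
{
    return nonUTF8QueryEncoding ? QueryState::NonUTF8Query : QueryState::UTF8Query;
}

// Parser side: some schemes always use UTF-8 for the query, so the pointer is reset.
inline void forceUTF8Query(const URLTextEncoding*& nonUTF8QueryEncoding)
{
    nonUTF8QueryEncoding = nullptr;
    assert(queryStateFor(nonUTF8QueryEncoding) == QueryState::UTF8Query);
}
```

Folding "is this UTF-8?" into the pointer's nullness means one value carries both pieces of state that the old boolean and reference carried separately.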

Tools:

  • TestWebKitAPI/Tests/WebCore/URLParser.cpp:

(TestWebKitAPI::checkURL):
(TestWebKitAPI::TEST_F):

Location:
trunk
Files:
16 edited

  • trunk/Source/WebCore/ChangeLog

    r236563 r236565  
    @@ -1,2 +1,60 @@
    +2018-09-27  Alex Christensen  <achristensen@webkit.org>
    +
    +        URLParser should use TextEncoding through an abstract class
    +        https://bugs.webkit.org/show_bug.cgi?id=190027
    +
    +        Reviewed by Andy Estes.
    +
    +        URLParser uses TextEncoding for one call to encode, which is only used for encoding the query of URLs in documents with non-UTF encodings.
    +        There are 3 call sites that specify the TextEncoding to use from the Document, and even those call sites use a UTF encoding most of the time.
    +        All other URL parsing is done using a well-optimized path which assumes UTF-8 encoding and uses macros from ICU headers, not a TextEncoding.
    +        Moving the logic in this way breaks URL and URLParser's dependency on TextEncoding, which makes it possible to use in a lower-level project
    +        without also moving TextEncoding, TextCodec, TextCodecICU, ThreadGlobalData, and the rest of WebCore and JavaScriptCore.
    +
    +        There is no observable change in behavior.  There is now one virtual function call in a code path in URLParser that is not performance-sensitive,
    +        and TextEncodings now have a vtable, which uses a few more bytes of memory total for WebKit.
    +
    +        * css/parser/CSSParserContext.h:
    +        (WebCore::CSSParserContext::completeURL const):
    +        * css/parser/CSSParserIdioms.cpp:
    +        (WebCore::completeURL):
    +        * dom/Document.cpp:
    +        (WebCore::Document::completeURL const):
    +        * html/HTMLBaseElement.cpp:
    +        (WebCore::HTMLBaseElement::href const):
    +        Move the call to encodingForFormSubmission from the URL constructor to the 3 call sites that specify the encoding from the Document.
    +        * loader/FormSubmission.cpp:
    +        (WebCore::FormSubmission::create):
    +        * loader/TextResourceDecoder.cpp:
    +        (WebCore::TextResourceDecoder::encodingForURLParsing):
    +        * loader/TextResourceDecoder.h:
    +        * platform/URL.cpp:
    +        (WebCore::URL::URL):
    +        * platform/URL.h:
    +        (WebCore::URLTextEncoding::~URLTextEncoding):
    +        * platform/URLParser.cpp:
    +        (WebCore::URLParser::encodeNonUTF8Query):
    +        (WebCore::URLParser::copyURLPartsUntil):
    +        (WebCore::URLParser::URLParser):
    +        (WebCore::URLParser::parse):
    +        (WebCore::URLParser::encodeQuery): Deleted.
    +        A pointer replaces the boolean isUTF8Encoding and the TextEncoding& which had a default value of UTF8Encoding.
    +        Now the pointer being null means that we use UTF8, and the pointer being non-null means we use that encoding.
    +        * platform/URLParser.h:
    +        (WebCore::URLParser::URLParser):
    +        * platform/text/TextEncoding.cpp:
    +        (WebCore::UTF7Encoding):
    +        (WebCore::TextEncoding::encodingForFormSubmissionOrURLParsing const):
    +        (WebCore::ASCIIEncoding):
    +        (WebCore::Latin1Encoding):
    +        (WebCore::UTF16BigEndianEncoding):
    +        (WebCore::UTF16LittleEndianEncoding):
    +        (WebCore::UTF8Encoding):
    +        (WebCore::WindowsLatin1Encoding):
    +        (WebCore::TextEncoding::encodingForFormSubmission const): Deleted.
    +        Use NeverDestroyed because TextEncoding now has a virtual destructor.
    +        * platform/text/TextEncoding.h:
    +        Rename encodingForFormSubmission to encodingForFormSubmissionOrURLParsing to make it more clear that we are intentionally using it for both.
    +
     2018-09-27  John Wilander  <wilander@apple.com>
     
  • trunk/Source/WebCore/css/parser/CSSParserContext.h

    r234215 r236565  
    @@ -70,5 +70,7 @@
             if (charset.isEmpty())
                 return URL(baseURL, url);
    -        return URL(baseURL, url, TextEncoding(charset));
    +        TextEncoding encoding(charset);
    +        auto& encodingForURLParsing = encoding.encodingForFormSubmissionOrURLParsing();
    +        return URL(baseURL, url, encodingForURLParsing == UTF8Encoding() ? nullptr : &encodingForURLParsing);
         }
     };
  • trunk/Source/WebCore/css/parser/CSSParserIdioms.cpp

    r218890 r236565  
    @@ -48,9 +48,5 @@
     URL completeURL(const CSSParserContext& context, const String& url)
     {
    -    if (url.isNull())
    -        return URL();
    -    if (context.charset.isEmpty())
    -        return URL(context.baseURL, url);
    -    return URL(context.baseURL, url, context.charset);
    +    return context.completeURL(url);
     }
     
  • trunk/Source/WebCore/dom/Document.cpp

    r236560 r236565  
    @@ -4895,5 +4895,5 @@
         if (!m_decoder)
             return URL(baseURL, url);
    -    return URL(baseURL, url, m_decoder->encoding());
    +    return URL(baseURL, url, m_decoder->encodingForURLParsing());
     }
     
  • trunk/Source/WebCore/html/HTMLBaseElement.cpp

    r229694 r236565  
    @@ -90,7 +90,6 @@
             return document().url();
     
    -    URL url = !document().decoder() ?
    -        URL(document().url(), stripLeadingAndTrailingHTMLSpaces(attributeValue)) :
    -        URL(document().url(), stripLeadingAndTrailingHTMLSpaces(attributeValue), document().decoder()->encoding());
    +    auto* encoding = document().decoder() ? document().decoder()->encodingForURLParsing() : nullptr;
    +    URL url(document().url(), stripLeadingAndTrailingHTMLSpaces(attributeValue), encoding);
     
         if (!url.isValid())
  • trunk/Source/WebCore/loader/FormSubmission.cpp

    r234278 r236565  
    @@ -176,5 +176,5 @@
     
         auto dataEncoding = isMailtoForm ? UTF8Encoding() : encodingFromAcceptCharset(copiedAttributes.acceptCharset(), document);
    -    auto domFormData = DOMFormData::create(dataEncoding.encodingForFormSubmission());
    +    auto domFormData = DOMFormData::create(dataEncoding.encodingForFormSubmissionOrURLParsing());
         StringPairVector formValues;
     
  • trunk/Source/WebCore/loader/TextResourceDecoder.cpp

    r228594 r236565  
    @@ -660,3 +660,15 @@
     }
     
    -}
    +const TextEncoding* TextResourceDecoder::encodingForURLParsing()
    +{
    +    // For UTF-{7,16,32}, we want to use UTF-8 for the query part as
    +    // we do when submitting a form. A form with GET method
    +    // has its contents added to a URL as query params and it makes sense
    +    // to be consistent.
    +    auto& encoding = m_encoding.encodingForFormSubmissionOrURLParsing();
    +    if (encoding == UTF8Encoding())
    +        return nullptr;
    +    return &encoding;
    +}
    +
    +}
  • trunk/Source/WebCore/loader/TextResourceDecoder.h

    r225618 r236565  
    @@ -49,4 +49,5 @@
         void setEncoding(const TextEncoding&, EncodingSource);
         const TextEncoding& encoding() const { return m_encoding; }
    +    const TextEncoding* encodingForURLParsing();
     
         bool hasEqualEncodingForCharset(const String& charset) const;
  • trunk/Source/WebCore/platform/URL.cpp

    r235949 r236565  
    @@ -104,17 +104,7 @@
     }
     
    -URL::URL(const URL& base, const String& relative)
    -{
    -    URLParser parser(relative, base);
    -    *this = parser.result();
    -}
    -
    -URL::URL(const URL& base, const String& relative, const TextEncoding& encoding)
    -{
    -    // For UTF-{7,16,32}, we want to use UTF-8 for the query part as
    -    // we do when submitting a form. A form with GET method
    -    // has its contents added to a URL as query params and it makes sense
    -    // to be consistent.
    -    URLParser parser(relative, base, encoding.encodingForFormSubmission());
    +URL::URL(const URL& base, const String& relative, const URLTextEncoding* encoding)
    +{
    +    URLParser parser(relative, base, encoding);
         *this = parser.result();
     }
  • trunk/Source/WebCore/platform/URL.h

    r235949 r236565  
    @@ -48,5 +48,10 @@
     namespace WebCore {
     
    -class TextEncoding;
    +class URLTextEncoding {
    +public:
    +    virtual Vector<uint8_t> encodeForURLParsing(StringView) const = 0;
    +    virtual ~URLTextEncoding() { };
    +};
    +
     struct URLHash;
     
    @@ -66,5 +71,5 @@
     
         // Resolves the relative URL with the given base URL. If provided, the
    -    // TextEncoding is used to encode non-ASCII characers. The base URL can be
    +    // URLTextEncoding is used to encode non-ASCII characers. The base URL can be
         // null or empty, in which case the relative URL will be interpreted as
         // absolute.
    @@ -72,6 +77,5 @@
         // URL. Instead I think it would be better to treat all invalid base URLs
         // the same way we treate null and empty base URLs.
    -    WEBCORE_EXPORT URL(const URL& base, const String& relative);
    -    URL(const URL& base, const String& relative, const TextEncoding&);
    +    WEBCORE_EXPORT URL(const URL& base, const String& relative, const URLTextEncoding* = nullptr);
     
         WEBCORE_EXPORT static URL fakeURLWithRelativePart(const String&);
    @@ -209,5 +213,4 @@
         WEBCORE_EXPORT void invalidate();
         static bool protocolIs(const String&, const char*);
    -    void init(const URL&, const String&, const TextEncoding&);
         void copyToBuffer(Vector<char, 512>& buffer) const;
         unsigned hostStart() const;
    @@ -304,4 +307,5 @@
     // in it, the resulting string will have embedded null characters!
     WEBCORE_EXPORT String decodeURLEscapeSequences(const String&);
    +class TextEncoding;
     String decodeURLEscapeSequences(const String&, const TextEncoding&);
     
  • trunk/Source/WebCore/platform/URLParser.cpp

    r236528 r236565  
    @@ -619,7 +619,7 @@
     
     template<typename CharacterType>
    -void URLParser::encodeQuery(const Vector<UChar>& source, const TextEncoding& encoding, CodePointIterator<CharacterType> iterator)
    -{
    -    auto encoded = encoding.encode(StringView(source.data(), source.size()), UnencodableHandling::URLEncodedEntities);
    +void URLParser::encodeNonUTF8Query(const Vector<UChar>& source, const URLTextEncoding& encoding, CodePointIterator<CharacterType> iterator)
    +{
    +    auto encoded = encoding.encodeForURLParsing(StringView(source.data(), source.size()));
         auto* data = encoded.data();
         size_t length = encoded.size();
    @@ -881,5 +881,5 @@
     
     template<typename CharacterType>
    -void URLParser::copyURLPartsUntil(const URL& base, URLPart part, const CodePointIterator<CharacterType>& iterator, bool& isUTF8Encoding)
    +void URLParser::copyURLPartsUntil(const URL& base, URLPart part, const CodePointIterator<CharacterType>& iterator, const URLTextEncoding*& nonUTF8QueryEncoding)
     {
         syntaxViolation(iterator);
    @@ -920,5 +920,5 @@
         case Scheme::WS:
         case Scheme::WSS:
    -        isUTF8Encoding = true;
    +        nonUTF8QueryEncoding = nullptr;
             m_urlIsSpecial = true;
             return;
    @@ -934,5 +934,5 @@
         case Scheme::NonSpecial:
             m_urlIsSpecial = false;
    -        isUTF8Encoding = true;
    +        nonUTF8QueryEncoding = nullptr;
             return;
         }
    @@ -1153,5 +1153,5 @@
     }
     
    -URLParser::URLParser(const String& input, const URL& base, const TextEncoding& encoding)
    +URLParser::URLParser(const String& input, const URL& base, const URLTextEncoding* nonUTF8QueryEncoding)
         : m_inputString(input)
     {
    @@ -1166,8 +1166,8 @@
         if (input.is8Bit()) {
             m_inputBegin = input.characters8();
    -        parse(input.characters8(), input.length(), base, encoding);
    +        parse(input.characters8(), input.length(), base, nonUTF8QueryEncoding);
         } else {
             m_inputBegin = input.characters16();
    -        parse(input.characters16(), input.length(), base, encoding);
    +        parse(input.characters16(), input.length(), base, nonUTF8QueryEncoding);
         }
     
    @@ -1180,5 +1180,5 @@
         if (!m_didSeeSyntaxViolation) {
             // Force a syntax violation at the beginning to make sure we get the same result.
    -        URLParser parser(makeString(" ", input), base, encoding);
    +        URLParser parser(makeString(" ", input), base, nonUTF8QueryEncoding);
             URL parsed = parser.result();
             if (parsed.isValid())
    @@ -1189,11 +1189,10 @@
     
     template<typename CharacterType>
    -void URLParser::parse(const CharacterType* input, const unsigned length, const URL& base, const TextEncoding& encoding)
    -{
    -    URL_PARSER_LOG("Parsing URL <%s> base <%s> encoding <%s>", String(input, length).utf8().data(), base.string().utf8().data(), encoding.name());
    +void URLParser::parse(const CharacterType* input, const unsigned length, const URL& base, const URLTextEncoding* nonUTF8QueryEncoding)
    +{
    +    URL_PARSER_LOG("Parsing URL <%s> base <%s>", String(input, length).utf8().data(), base.string().utf8().data());
         m_url = { };
         ASSERT(m_asciiBuffer.isEmpty());
    -   
    -    bool isUTF8Encoding = encoding == UTF8Encoding();
    +
         Vector<UChar> queryBuffer;
     
    @@ -1288,5 +1287,5 @@
                     case Scheme::WS:
                     case Scheme::WSS:
    -                    isUTF8Encoding = true;
    +                    nonUTF8QueryEncoding = nullptr;
                         m_urlIsSpecial = true;
                         if (base.protocolIs(urlScheme))
    @@ -1310,5 +1309,5 @@
                         break;
                     case Scheme::NonSpecial:
    -                    isUTF8Encoding = true;
    +                    nonUTF8QueryEncoding = nullptr;
                         auto maybeSlash = c;
                         advance(maybeSlash);
    @@ -1354,5 +1353,5 @@
                 }
                 if (base.m_cannotBeABaseURL && *c == '#') {
    -                copyURLPartsUntil(base, URLPart::QueryEnd, c, isUTF8Encoding);
    +                copyURLPartsUntil(base, URLPart::QueryEnd, c, nonUTF8QueryEncoding);
                     state = State::Fragment;
                     appendToASCIIBuffer('#');
    @@ -1364,5 +1363,5 @@
                     break;
                 }
    -            copyURLPartsUntil(base, URLPart::SchemeEnd, c, isUTF8Encoding);
    +            copyURLPartsUntil(base, URLPart::SchemeEnd, c, nonUTF8QueryEncoding);
                 appendToASCIIBuffer(':');
                 state = State::File;
    @@ -1414,16 +1413,15 @@
                     break;
                 case '?':
    -                copyURLPartsUntil(base, URLPart::PathEnd, c, isUTF8Encoding);
    +                copyURLPartsUntil(base, URLPart::PathEnd, c, nonUTF8QueryEncoding);
                     appendToASCIIBuffer('?');
                     ++c;
    -                if (isUTF8Encoding)
    -                    state = State::UTF8Query;
    -                else {
    +                if (nonUTF8QueryEncoding) {
                         queryBegin = c;
                         state = State::NonUTF8Query;
    -                }
    +                } else
    +                    state = State::UTF8Query;
                     break;
                 case '#':
    -                copyURLPartsUntil(base, URLPart::QueryEnd, c, isUTF8Encoding);
    +                copyURLPartsUntil(base, URLPart::QueryEnd, c, nonUTF8QueryEncoding);
                     appendToASCIIBuffer('#');
                     state = State::Fragment;
    @@ -1431,5 +1429,5 @@
                     break;
                 default:
    -                copyURLPartsUntil(base, URLPart::PathAfterLastSlash, c, isUTF8Encoding);
    +                copyURLPartsUntil(base, URLPart::PathAfterLastSlash, c, nonUTF8QueryEncoding);
                     if (currentPosition(c) && parsedDataView(currentPosition(c) - 1) != '/') {
                         appendToASCIIBuffer('/');
    @@ -1444,5 +1442,5 @@
                 if (*c == '/' || *c == '\\') {
                     ++c;
    -                copyURLPartsUntil(base, URLPart::SchemeEnd, c, isUTF8Encoding);
    +                copyURLPartsUntil(base, URLPart::SchemeEnd, c, nonUTF8QueryEncoding);
                     appendToASCIIBuffer("://", 3);
                     if (m_urlIsSpecial)
    @@ -1454,5 +1452,5 @@
                     }
                 } else {
    -                copyURLPartsUntil(base, URLPart::PortEnd, c, isUTF8Encoding);
    +                copyURLPartsUntil(base, URLPart::PortEnd, c, nonUTF8QueryEncoding);
                     appendToASCIIBuffer('/');
                     m_url.m_pathAfterLastSlash = base.m_hostEnd + base.m_portLength + 1;
    @@ -1585,5 +1583,5 @@
                     syntaxViolation(c);
                     if (base.isValid() && base.protocolIs("file")) {
    -                    copyURLPartsUntil(base, URLPart::PathEnd, c, isUTF8Encoding);
    +                    copyURLPartsUntil(base, URLPart::PathEnd, c, nonUTF8QueryEncoding);
                         appendToASCIIBuffer('?');
                         ++c;
    @@ -1599,15 +1597,14 @@
                         m_url.m_pathEnd = m_url.m_pathAfterLastSlash;
                     }
    -                if (isUTF8Encoding)
    -                    state = State::UTF8Query;
    -                else {
    +                if (nonUTF8QueryEncoding) {
                         queryBegin = c;
                         state = State::NonUTF8Query;
    -                }
    +                } else
    +                    state = State::UTF8Query;
                     break;
                 case '#':
                     syntaxViolation(c);
                     if (base.isValid() && base.protocolIs("file")) {
    -                    copyURLPartsUntil(base, URLPart::QueryEnd, c, isUTF8Encoding);
    +                    copyURLPartsUntil(base, URLPart::QueryEnd, c, nonUTF8QueryEncoding);
                         appendToASCIIBuffer('#');
                     } else {
    @@ -1628,5 +1625,5 @@
                     syntaxViolation(c);
                     if (base.isValid() && base.protocolIs("file") && shouldCopyFileURL(c))
    -                    copyURLPartsUntil(base, URLPart::PathAfterLastSlash, c, isUTF8Encoding);
    +                    copyURLPartsUntil(base, URLPart::PathAfterLastSlash, c, nonUTF8QueryEncoding);
                     else {
                         appendToASCIIBuffer("///", 3);
    @@ -1694,10 +1691,9 @@
                                 appendToASCIIBuffer("/?", 2);
                                 ++c;
    -                            if (isUTF8Encoding)
    -                                state = State::UTF8Query;
    -                            else {
    +                            if (nonUTF8QueryEncoding) {
                                     queryBegin = c;
                                     state = State::NonUTF8Query;
    -                            }
    +                            } else
    +                                state = State::UTF8Query;
                                 m_url.m_pathAfterLastSlash = currentPosition(c) - 1;
                                 m_url.m_pathEnd = m_url.m_pathAfterLastSlash;
    @@ -1772,10 +1768,9 @@
                     appendToASCIIBuffer('?');
                     ++c;
    -                if (isUTF8Encoding)
    -                    state = State::UTF8Query;
    -                else {
    +                if (nonUTF8QueryEncoding) {
                         queryBegin = c;
                         state = State::NonUTF8Query;
    -                }
    +                } else
    +                    state = State::UTF8Query;
                     break;
                 }
    @@ -1795,10 +1790,9 @@
                     appendToASCIIBuffer('?');
                     ++c;
    -                if (isUTF8Encoding)
    -                    state = State::UTF8Query;
    -                else {
    +                if (nonUTF8QueryEncoding) {
                         queryBegin = c;
                         state = State::NonUTF8Query;
    -                }
    +                } else
    +                    state = State::UTF8Query;
                 } else if (*c == '#') {
                     m_url.m_pathEnd = currentPosition(c);
    @@ -1822,8 +1816,6 @@
                     break;
                 }
    -            if (isUTF8Encoding)
    -                utf8QueryEncode(c);
    -            else
    -                appendCodePoint(queryBuffer, *c);
    +            ASSERT(!nonUTF8QueryEncoding);
    +            utf8QueryEncode(c);
                 ++c;
                 break;
    @@ -1833,5 +1825,5 @@
                     ASSERT(queryBegin != CodePointIterator<CharacterType>());
                     if (*c == '#') {
    -                    encodeQuery(queryBuffer, encoding, CodePointIterator<CharacterType>(queryBegin, c));
    +                    encodeNonUTF8Query(queryBuffer, *nonUTF8QueryEncoding, CodePointIterator<CharacterType>(queryBegin, c));
                         m_url.m_queryEnd = currentPosition(c);
                         state = State::Fragment;
    @@ -1869,5 +1861,5 @@
         case State::SpecialRelativeOrAuthority:
             LOG_FINAL_STATE("SpecialRelativeOrAuthority");
    -        copyURLPartsUntil(base, URLPart::QueryEnd, c, isUTF8Encoding);
    +        copyURLPartsUntil(base, URLPart::QueryEnd, c, nonUTF8QueryEncoding);
             break;
         case State::PathOrAuthority:
    @@ -1890,5 +1882,5 @@
         case State::RelativeSlash:
             LOG_FINAL_STATE("RelativeSlash");
    -        copyURLPartsUntil(base, URLPart::PortEnd, c, isUTF8Encoding);
    +        copyURLPartsUntil(base, URLPart::PortEnd, c, nonUTF8QueryEncoding);
             appendToASCIIBuffer('/');
             m_url.m_pathAfterLastSlash = m_url.m_hostEnd + m_url.m_portLength + 1;
    @@ -1953,5 +1945,5 @@
             LOG_FINAL_STATE("File");
             if (base.isValid() && base.protocolIs("file")) {
    -            copyURLPartsUntil(base, URLPart::QueryEnd, c, isUTF8Encoding);
    +            copyURLPartsUntil(base, URLPart::QueryEnd, c, nonUTF8QueryEncoding);
                 break;
             }
    @@ -2048,5 +2040,5 @@
             LOG_FINAL_STATE("NonUTF8Query");
             ASSERT(queryBegin != CodePointIterator<CharacterType>());
    -        encodeQuery(queryBuffer, encoding, CodePointIterator<CharacterType>(queryBegin, c));
    +        encodeNonUTF8Query(queryBuffer, *nonUTF8QueryEncoding, CodePointIterator<CharacterType>(queryBegin, c));
             m_url.m_queryEnd = currentPosition(c);
             break;
  • trunk/Source/WebCore/platform/URLParser.h

    r231337 r236565  
    @@ -26,5 +26,4 @@
     #pragma once
     
    -#include "TextEncoding.h"
     #include "URL.h"
     #include <wtf/Expected.h>
    @@ -39,5 +38,5 @@
     class URLParser {
     public:
    -    WEBCORE_EXPORT URLParser(const String&, const URL& = { }, const TextEncoding& = UTF8Encoding());
    +    WEBCORE_EXPORT URLParser(const String&, const URL& = { }, const URLTextEncoding* = nullptr);
         URL result() { return m_url; }
     
    @@ -71,5 +70,5 @@
         using LCharBuffer = Vector<LChar, defaultInlineBufferSize>;
     
    -    template<typename CharacterType> void parse(const CharacterType*, const unsigned length, const URL&, const TextEncoding&);
    +    template<typename CharacterType> void parse(const CharacterType*, const unsigned length, const URL&, const URLTextEncoding*);
         template<typename CharacterType> void parseAuthority(CodePointIterator<CharacterType>);
         template<typename CharacterType> bool parseHostAndPort(CodePointIterator<CharacterType>);
    @@ -108,5 +107,5 @@
         void appendToASCIIBuffer(const char*, size_t);
         void appendToASCIIBuffer(const LChar* characters, size_t size) { appendToASCIIBuffer(reinterpret_cast<const char*>(characters), size); }
    -    template<typename CharacterType> void encodeQuery(const Vector<UChar>& source, const TextEncoding&, CodePointIterator<CharacterType>);
    +    template<typename CharacterType> void encodeNonUTF8Query(const Vector<UChar>& source, const URLTextEncoding&, CodePointIterator<CharacterType>);
         void copyASCIIStringUntil(const String&, size_t length);
         bool copyBaseWindowsDriveLetter(const URL&);
    @@ -128,5 +127,5 @@
     
         enum class URLPart;
    -    template<typename CharacterType> void copyURLPartsUntil(const URL& base, URLPart, const CodePointIterator<CharacterType>&, bool& isUTF8Encoding);
    +    template<typename CharacterType> void copyURLPartsUntil(const URL& base, URLPart, const CodePointIterator<CharacterType>&, const URLTextEncoding*&);
         static size_t urlLengthUntilPart(const URL&, URLPart);
         void popPath();
  • trunk/Source/WebCore/platform/text/TextEncoding.cpp

    r235935 r236565  
    @@ -32,4 +32,5 @@
     #include "TextEncodingRegistry.h"
     #include <unicode/unorm.h>
    +#include <wtf/NeverDestroyed.h>
     #include <wtf/StdLibExtras.h>
     #include <wtf/text/CString.h>
    @@ -40,5 +41,5 @@
     static const TextEncoding& UTF7Encoding()
     {
    -    static TextEncoding globalUTF7Encoding("UTF-7");
    +    static NeverDestroyed<TextEncoding> globalUTF7Encoding("UTF-7");
         return globalUTF7Encoding;
     }
    @@ -174,5 +175,5 @@
     // should be done for UTF-32. In case of UTF-7, it is a byte-based encoding,
     // but it's fraught with problems and we'd rather steer clear of it.
    -const TextEncoding& TextEncoding::encodingForFormSubmission() const
    +const TextEncoding& TextEncoding::encodingForFormSubmissionOrURLParsing() const
     {
         if (isNonByteBasedEncoding() || isUTF7Encoding())
    @@ -183,5 +184,5 @@
     const TextEncoding& ASCIIEncoding()
     {
    -    static TextEncoding globalASCIIEncoding("ASCII");
    +    static NeverDestroyed<TextEncoding> globalASCIIEncoding("ASCII");
         return globalASCIIEncoding;
     }
    @@ -189,5 +190,5 @@
     const TextEncoding& Latin1Encoding()
     {
    -    static TextEncoding globalLatin1Encoding("latin1");
    +    static NeverDestroyed<TextEncoding> globalLatin1Encoding("latin1");
         return globalLatin1Encoding;
     }
    @@ -195,5 +196,5 @@
     const TextEncoding& UTF16BigEndianEncoding()
     {
    -    static TextEncoding globalUTF16BigEndianEncoding("UTF-16BE");
    +    static NeverDestroyed<TextEncoding> globalUTF16BigEndianEncoding("UTF-16BE");
         return globalUTF16BigEndianEncoding;
     }
    @@ -201,5 +202,5 @@
     const TextEncoding& UTF16LittleEndianEncoding()
     {
    -    static TextEncoding globalUTF16LittleEndianEncoding("UTF-16LE");
    +    static NeverDestroyed<TextEncoding> globalUTF16LittleEndianEncoding("UTF-16LE");
         return globalUTF16LittleEndianEncoding;
     }
    @@ -207,6 +208,6 @@
     const TextEncoding& UTF8Encoding()
     {
    -    static TextEncoding globalUTF8Encoding("UTF-8");
    -    ASSERT(globalUTF8Encoding.isValid());
    +    static NeverDestroyed<TextEncoding> globalUTF8Encoding("UTF-8");
    +    ASSERT(globalUTF8Encoding.get().isValid());
         return globalUTF8Encoding;
     }
    @@ -214,5 +215,5 @@
     const TextEncoding& WindowsLatin1Encoding()
     {
    -    static TextEncoding globalWindowsLatin1Encoding("WinLatin-1");
    +    static NeverDestroyed<TextEncoding> globalWindowsLatin1Encoding("WinLatin-1");
         return globalWindowsLatin1Encoding;
     }
  • trunk/Source/WebCore/platform/text/TextEncoding.h

    r228594 r236565  
    @@ -26,4 +26,5 @@
     #pragma once
     
    +#include "URL.h"
     #include <pal/text/UnencodableHandling.h>
     #include <wtf/text/WTFString.h>
    @@ -31,5 +32,5 @@
     namespace WebCore {
     
    -class TextEncoding {
    +class TextEncoding : public URLTextEncoding {
     public:
         TextEncoding() = default;
    @@ -44,9 +45,10 @@
     
         const TextEncoding& closestByteBasedEquivalent() const;
    -    const TextEncoding& encodingForFormSubmission() const;
    +    const TextEncoding& encodingForFormSubmissionOrURLParsing() const;
     
         WEBCORE_EXPORT String decode(const char*, size_t length, bool stopOnError, bool& sawError) const;
         String decode(const char*, size_t length) const;
    -    Vector<uint8_t> encode(StringView, UnencodableHandling) const;
    +    WEBCORE_EXPORT Vector<uint8_t> encode(StringView, UnencodableHandling) const;
    +    Vector<uint8_t> encodeForURLParsing(StringView string) const final { return encode(string, UnencodableHandling::URLEncodedEntities); }
     
         UChar backslashAsCurrencySymbol() const;
  • trunk/Tools/ChangeLog

    r236562 r236565  
     12018-09-27  Alex Christensen  <achristensen@webkit.org>
     2
     3        URLParser should use TextEncoding through an abstract class
     4        https://bugs.webkit.org/show_bug.cgi?id=190027
     5
     6        Reviewed by Andy Estes.
     7
     8        * TestWebKitAPI/Tests/WebCore/URLParser.cpp:
     9        (TestWebKitAPI::checkURL):
     10        (TestWebKitAPI::TEST_F):
     11
    1122018-09-27  Ryan Haddad  <ryanhaddad@apple.com>
    213
  • trunk/Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp

    r236528 r236565  
    2626#include "config.h"
    2727#include "WTFStringUtilities.h"
     28#include <WebCore/TextEncoding.h>
    2829#include <WebCore/URLParser.h>
    2930#include <wtf/MainThread.h>
     
    211212}
    212213
    213 static void checkURL(const String& urlString, const TextEncoding& encoding, const ExpectedParts& parts, TestTabs testTabs = TestTabs::Yes)
     214static void checkURL(const String& urlString, const TextEncoding* encoding, const ExpectedParts& parts, TestTabs testTabs = TestTabs::Yes)
    214215{
    215216    URLParser parser(urlString, { }, encoding);
     
    236237}
    237238
    238 static void checkURL(const String& urlString, const String& baseURLString, const TextEncoding& encoding, const ExpectedParts& parts, TestTabs testTabs = TestTabs::Yes)
     239static void checkURL(const String& urlString, const String& baseURLString, const TextEncoding* encoding, const ExpectedParts& parts, TestTabs testTabs = TestTabs::Yes)
    239240{
    240241    URLParser baseParser(baseURLString, { }, encoding);
     
    12861287TEST_F(URLParserTest, QueryEncoding)
    12871288{
    1288     checkURL(utf16String(u"http://host?ß😍#ß😍"), UTF8Encoding(), {"http", "", "", "host", 0, "/", "%C3%9F%F0%9F%98%8D", "%C3%9F%F0%9F%98%8D", utf16String(u"http://host/?%C3%9F%F0%9F%98%8D#%C3%9F%F0%9F%98%8D")}, testTabsValueForSurrogatePairs);
     1289    checkURL(utf16String(u"http://host?ß😍#ß😍"), nullptr, {"http", "", "", "host", 0, "/", "%C3%9F%F0%9F%98%8D", "%C3%9F%F0%9F%98%8D", utf16String(u"http://host/?%C3%9F%F0%9F%98%8D#%C3%9F%F0%9F%98%8D")}, testTabsValueForSurrogatePairs);
    12891290
    12901291    TextEncoding latin1(String("latin1"));
    1291     checkURL("http://host/?query with%20spaces", latin1, {"http", "", "", "host", 0, "/", "query%20with%20spaces", "", "http://host/?query%20with%20spaces"});
    1292     checkURL("http://host/?query", latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
    1293     checkURL("http://host/?\tquery", latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
    1294     checkURL("http://host/?q\tuery", latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
    1295     checkURL("http://host/?query with SpAcEs#fragment", latin1, {"http", "", "", "host", 0, "/", "query%20with%20SpAcEs", "fragment", "http://host/?query%20with%20SpAcEs#fragment"});
    1296     checkURL("http://host/?que\rry\t\r\n#fragment", latin1, {"http", "", "", "host", 0, "/", "query", "fragment", "http://host/?query#fragment"});
     1292    checkURL("http://host/?query with%20spaces", &latin1, {"http", "", "", "host", 0, "/", "query%20with%20spaces", "", "http://host/?query%20with%20spaces"});
     1293    checkURL("http://host/?query", &latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
     1294    checkURL("http://host/?\tquery", &latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
     1295    checkURL("http://host/?q\tuery", &latin1, {"http", "", "", "host", 0, "/", "query", "", "http://host/?query"});
     1296    checkURL("http://host/?query with SpAcEs#fragment", &latin1, {"http", "", "", "host", 0, "/", "query%20with%20SpAcEs", "fragment", "http://host/?query%20with%20SpAcEs#fragment"});
     1297    checkURL("http://host/?que\rry\t\r\n#fragment", &latin1, {"http", "", "", "host", 0, "/", "query", "fragment", "http://host/?query#fragment"});
    12971298
    12981299    TextEncoding unrecognized(String("unrecognized invalid encoding name"));
    1299     checkURL("http://host/?query", unrecognized, {"http", "", "", "host", 0, "/", "", "", "http://host/?"});
    1300     checkURL("http://host/?", unrecognized, {"http", "", "", "host", 0, "/", "", "", "http://host/?"});
     1300    checkURL("http://host/?query", &unrecognized, {"http", "", "", "host", 0, "/", "", "", "http://host/?"});
     1301    checkURL("http://host/?", &unrecognized, {"http", "", "", "host", 0, "/", "", "", "http://host/?"});
    13011302
    13021303    TextEncoding iso88591(String("ISO-8859-1"));
    13031304    String withUmlauts = utf16String<4>({0xDC, 0x430, 0x451, '\0'});
    1304     checkURL(makeString("ws://host/path?", withUmlauts), iso88591, {"ws", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "ws://host/path?%C3%9C%D0%B0%D1%91"});
    1305     checkURL(makeString("wss://host/path?", withUmlauts), iso88591, {"wss", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "wss://host/path?%C3%9C%D0%B0%D1%91"});
    1306     checkURL(makeString("asdf://host/path?", withUmlauts), iso88591, {"asdf", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "asdf://host/path?%C3%9C%D0%B0%D1%91"});
    1307     checkURL(makeString("https://host/path?", withUmlauts), iso88591, {"https", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "", "https://host/path?%DC%26%231072%3B%26%231105%3B"});
    1308     checkURL(makeString("gopher://host/path?", withUmlauts), iso88591, {"gopher", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "", "gopher://host/path?%DC%26%231072%3B%26%231105%3B"});
    1309     checkURL(makeString("/path?", withUmlauts, "#fragment"), "ws://example.com/", iso88591, {"ws", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "ws://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
    1310     checkURL(makeString("/path?", withUmlauts, "#fragment"), "wss://example.com/", iso88591, {"wss", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "wss://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
    1311     checkURL(makeString("/path?", withUmlauts, "#fragment"), "asdf://example.com/", iso88591, {"asdf", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "asdf://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
    1312     checkURL(makeString("/path?", withUmlauts, "#fragment"), "https://example.com/", iso88591, {"https", "", "", "example.com", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "https://example.com/path?%DC%26%231072%3B%26%231105%3B#fragment"});
    1313     checkURL(makeString("/path?", withUmlauts, "#fragment"), "gopher://example.com/", iso88591, {"gopher", "", "", "example.com", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "gopher://example.com/path?%DC%26%231072%3B%26%231105%3B#fragment"});
    1314     checkURL(makeString("gopher://host/path?", withUmlauts, "#fragment"), "asdf://example.com/?doesntmatter", iso88591, {"gopher", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "gopher://host/path?%DC%26%231072%3B%26%231105%3B#fragment"});
    1315     checkURL(makeString("asdf://host/path?", withUmlauts, "#fragment"), "http://example.com/?doesntmatter", iso88591, {"asdf", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "asdf://host/path?%C3%9C%D0%B0%D1%91#fragment"});
    1316 
    1317     checkURL("http://host/pa'th?qu'ery#fr'agment", UTF8Encoding(), {"http", "", "", "host", 0, "/pa'th", "qu%27ery", "fr'agment", "http://host/pa'th?qu%27ery#fr'agment"});
    1318     checkURL("asdf://host/pa'th?qu'ery#fr'agment", UTF8Encoding(), {"asdf", "", "", "host", 0, "/pa'th", "qu'ery", "fr'agment", "asdf://host/pa'th?qu'ery#fr'agment"});
     1305    checkURL(makeString("ws://host/path?", withUmlauts), &iso88591, {"ws", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "ws://host/path?%C3%9C%D0%B0%D1%91"});
     1306    checkURL(makeString("wss://host/path?", withUmlauts), &iso88591, {"wss", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "wss://host/path?%C3%9C%D0%B0%D1%91"});
     1307    checkURL(makeString("asdf://host/path?", withUmlauts), &iso88591, {"asdf", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "", "asdf://host/path?%C3%9C%D0%B0%D1%91"});
     1308    checkURL(makeString("https://host/path?", withUmlauts), &iso88591, {"https", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "", "https://host/path?%DC%26%231072%3B%26%231105%3B"});
     1309    checkURL(makeString("gopher://host/path?", withUmlauts), &iso88591, {"gopher", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "", "gopher://host/path?%DC%26%231072%3B%26%231105%3B"});
     1310    checkURL(makeString("/path?", withUmlauts, "#fragment"), "ws://example.com/", &iso88591, {"ws", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "ws://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
     1311    checkURL(makeString("/path?", withUmlauts, "#fragment"), "wss://example.com/", &iso88591, {"wss", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "wss://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
     1312    checkURL(makeString("/path?", withUmlauts, "#fragment"), "asdf://example.com/", &iso88591, {"asdf", "", "", "example.com", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "asdf://example.com/path?%C3%9C%D0%B0%D1%91#fragment"});
     1313    checkURL(makeString("/path?", withUmlauts, "#fragment"), "https://example.com/", &iso88591, {"https", "", "", "example.com", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "https://example.com/path?%DC%26%231072%3B%26%231105%3B#fragment"});
     1314    checkURL(makeString("/path?", withUmlauts, "#fragment"), "gopher://example.com/", &iso88591, {"gopher", "", "", "example.com", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "gopher://example.com/path?%DC%26%231072%3B%26%231105%3B#fragment"});
     1315    checkURL(makeString("gopher://host/path?", withUmlauts, "#fragment"), "asdf://example.com/?doesntmatter", &iso88591, {"gopher", "", "", "host", 0, "/path", "%DC%26%231072%3B%26%231105%3B", "fragment", "gopher://host/path?%DC%26%231072%3B%26%231105%3B#fragment"});
     1316    checkURL(makeString("asdf://host/path?", withUmlauts, "#fragment"), "http://example.com/?doesntmatter", &iso88591, {"asdf", "", "", "host", 0, "/path", "%C3%9C%D0%B0%D1%91", "fragment", "asdf://host/path?%C3%9C%D0%B0%D1%91#fragment"});
     1317
     1318    checkURL("http://host/pa'th?qu'ery#fr'agment", nullptr, {"http", "", "", "host", 0, "/pa'th", "qu%27ery", "fr'agment", "http://host/pa'th?qu%27ery#fr'agment"});
     1319    checkURL("asdf://host/pa'th?qu'ery#fr'agment", nullptr, {"asdf", "", "", "host", 0, "/pa'th", "qu'ery", "fr'agment", "asdf://host/pa'th?qu'ery#fr'agment"});
    13191320    // FIXME: Add more tests with other encodings and things like non-ascii characters, emoji and unmatched surrogate pairs.
    13201321}
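
The pattern this changeset describes (the parser depends only on a small abstract encoder, and a null pointer selects the UTF-8 path) can be sketched as below. This is a minimal illustration, not WebKit's code: `QueryEncoder`, `Latin1QueryEncoder`, `encodeUTF8` and `percentEncodeQuery` are hypothetical names, the percent-encode set is simplified to "everything except ASCII letters and digits", and the scheme-dependent behavior exercised in the tests above (for example `ws` and `asdf` always using UTF-8) is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical abstract interface: the only thing the parser needs to know.
class QueryEncoder {
public:
    virtual ~QueryEncoder() = default;
    virtual std::vector<uint8_t> encodeForURLParsing(const std::u32string&) const = 0;
};

// Hypothetical concrete encoder: Latin-1 bytes where possible, otherwise a
// decimal character reference such as "&#1072;" (assumed fallback behavior).
class Latin1QueryEncoder final : public QueryEncoder {
public:
    std::vector<uint8_t> encodeForURLParsing(const std::u32string& text) const final
    {
        std::vector<uint8_t> bytes;
        for (char32_t c : text) {
            if (c <= 0xFF) {
                bytes.push_back(static_cast<uint8_t>(c));
                continue;
            }
            const std::string entity = "&#" + std::to_string(static_cast<unsigned long>(c)) + ";";
            bytes.insert(bytes.end(), entity.begin(), entity.end());
        }
        return bytes;
    }
};

// Default path used when no encoder is supplied: plain UTF-8.
static std::vector<uint8_t> encodeUTF8(const std::u32string& text)
{
    std::vector<uint8_t> bytes;
    for (char32_t c : text) {
        if (c < 0x80) {
            bytes.push_back(static_cast<uint8_t>(c));
        } else if (c < 0x800) {
            bytes.push_back(static_cast<uint8_t>(0xC0 | (c >> 6)));
            bytes.push_back(static_cast<uint8_t>(0x80 | (c & 0x3F)));
        } else if (c < 0x10000) {
            bytes.push_back(static_cast<uint8_t>(0xE0 | (c >> 12)));
            bytes.push_back(static_cast<uint8_t>(0x80 | ((c >> 6) & 0x3F)));
            bytes.push_back(static_cast<uint8_t>(0x80 | (c & 0x3F)));
        } else {
            bytes.push_back(static_cast<uint8_t>(0xF0 | (c >> 18)));
            bytes.push_back(static_cast<uint8_t>(0x80 | ((c >> 12) & 0x3F)));
            bytes.push_back(static_cast<uint8_t>(0x80 | ((c >> 6) & 0x3F)));
            bytes.push_back(static_cast<uint8_t>(0x80 | (c & 0x3F)));
        }
    }
    return bytes;
}

// Percent-encodes a query. A null encoder means "use UTF-8", mirroring the
// nullptr arguments in the updated tests.
std::string percentEncodeQuery(const std::u32string& query, const QueryEncoder* encoder)
{
    const std::vector<uint8_t> bytes = encoder ? encoder->encodeForURLParsing(query) : encodeUTF8(query);
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (uint8_t b : bytes) {
        const bool alnum = (b >= '0' && b <= '9') || (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z');
        if (alnum) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back('%');
            out.push_back(hex[b >> 4]);
            out.push_back(hex[b & 0xF]);
        }
    }
    return out;
}
```

With the code points U+00DC, U+0430, U+0451 (the `withUmlauts` string in the tests), passing `nullptr` yields `%C3%9C%D0%B0%D1%91` and passing a `Latin1QueryEncoder` yields `%DC%26%231072%3B%26%231105%3B`, the same two query strings the `QueryEncoding` test expects for its UTF-8 and ISO-8859-1 cases.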