Context Navigation

← Previous Changeset
Next Changeset →

Changeset 241863 in webkit

Timestamp:

Feb 21, 2019, 12:39:00 AM (6 years ago)

Author:

commit-queue@webkit.org

Message:

Update MIME type parser
https://bugs.webkit.org/show_bug.cgi?id=180526

Patch by Rob Buis <rbuis@igalia.com> on 2019-02-21
Reviewed by Darin Adler.

LayoutTests/imported/w3c:

Update improved test expectations.

web-platform-tests/xhr/overridemimetype-blob-expected.txt:

Source/WebCore:

Further testing showed the MIME parser needs these fixes:

stripWhitespace is wrong for removing HTTP whitespace, use stripLeadingAndTrailingHTTPSpaces instead.
HTTP Token code points checking for Rfc2045 and Mimesniff were mixed up, use the dedicated isValidHTTPToken for Mimesniff mode.
Quoted Strings were not unescaped/escaped, this seems ok for serializing but is wrong when gettings individual parameter values. Implement [1] and [2] Step 2.4 to properly unescape and escape.

This change also tries to avoid hard to read uses of find.

Test: ParsedContentType.Serialize

[1] https://fetch.spec.whatwg.org/#collect-an-http-quoted-string
[2] https://mimesniff.spec.whatwg.org/#serializing-a-mime-type

platform/network/ParsedContentType.cpp:

(WebCore::skipSpaces):
(WebCore::parseToken):
(WebCore::isNotQuoteOrBackslash):
(WebCore::collectHTTPQuotedString):
(WebCore::containsNonTokenCharacters):
(WebCore::parseQuotedString):
(WebCore::ParsedContentType::parseContentType):
(WebCore::ParsedContentType::create):
(WebCore::ParsedContentType::setContentType):
(WebCore::containsNonQuoteStringTokenCharacters):
(WebCore::ParsedContentType::setContentTypeParameter):
(WebCore::ParsedContentType::serialize const):
(WebCore::substringForRange): Deleted.
(WebCore::isNonTokenCharacter): Deleted.
(WebCore::isNonQuotedStringTokenCharacter): Deleted.

platform/network/ParsedContentType.h:

Tools:

Add tests involving leading and trailing whitespace, non-token
characters and quoted strings.

TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:

(TestWebKitAPI::TEST):

Location:

trunk

Files:

: 7 edited

LayoutTests/imported/w3c/ChangeLog (modified) (1 diff)
LayoutTests/imported/w3c/web-platform-tests/xhr/overridemimetype-blob-expected.txt (modified) (1 diff)
Source/WebCore/ChangeLog (modified) (1 diff)
Source/WebCore/platform/network/ParsedContentType.cpp (modified) (15 diffs)
Source/WebCore/platform/network/ParsedContentType.h (modified) (2 diffs)
Tools/ChangeLog (modified) (1 diff)
Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp (modified) (4 diffs)

Legend:

: Unmodified
: Added
: Removed

trunk/LayoutTests/imported/w3c/ChangeLog

-              r241746
+              r241863
+-02-21  Rob Buis  <rbuis@igalia.com>
+        Update MIME type parser
+        https://bugs.webkit.org/show_bug.cgi?id=180526
+        Reviewed by Darin Adler.
+        Update improved test expectations.
+        * web-platform-tests/xhr/overridemimetype-blob-expected.txt:
 -02-18  Oriol Brufau  <obrufau@igalia.com>

trunk/LayoutTests/imported/w3c/web-platform-tests/xhr/overridemimetype-blob-expected.txt

r240706	r241863
60	60	PASS 54) MIME types need to be parsed and serialized: ÿ/ÿ
61	61	PASS 55) MIME types need to be parsed and serialized: text/html(;doesnot=matter
62		FAIL 56) MIME types need to be parsed and serialized: {/} assert_equals: expected "application/octet-stream" but got "{/}"
	62	PASS 56) MIME types need to be parsed and serialized: {/}
63	63	PASS 57) MIME types need to be parsed and serialized: Ā/Ā
64	64

trunk/Source/WebCore/ChangeLog

-              r241860
+              r241863
+-02-21  Rob Buis  <rbuis@igalia.com>
+        Update MIME type parser
+        https://bugs.webkit.org/show_bug.cgi?id=180526
+        Reviewed by Darin Adler.
+        Further testing showed the MIME parser needs these fixes:
+        - stripWhitespace is wrong for removing HTTP whitespace, use
+          stripLeadingAndTrailingHTTPSpaces instead.
+        - HTTP Token code points checking for Rfc2045 and Mimesniff were
+          mixed up, use the dedicated isValidHTTPToken for Mimesniff mode.
+        - Quoted Strings were not unescaped/escaped, this seems ok for
+          serializing but is wrong when gettings individual parameter values.
+          Implement [1] and [2] Step 2.4 to properly unescape and escape.
+        This change also tries to avoid hard to read uses of find.
+        Test: ParsedContentType.Serialize
+        [1] https://fetch.spec.whatwg.org/#collect-an-http-quoted-string
+        [2] https://mimesniff.spec.whatwg.org/#serializing-a-mime-type
+        * platform/network/ParsedContentType.cpp:
+        (WebCore::skipSpaces):
+        (WebCore::parseToken):
+        (WebCore::isNotQuoteOrBackslash):
+        (WebCore::collectHTTPQuotedString):
+        (WebCore::containsNonTokenCharacters):
+        (WebCore::parseQuotedString):
+        (WebCore::ParsedContentType::parseContentType):
+        (WebCore::ParsedContentType::create):
+        (WebCore::ParsedContentType::setContentType):
+        (WebCore::containsNonQuoteStringTokenCharacters):
+        (WebCore::ParsedContentType::setContentTypeParameter):
+        (WebCore::ParsedContentType::serialize const):
+        (WebCore::substringForRange): Deleted.
+        (WebCore::isNonTokenCharacter): Deleted.
+        (WebCore::isNonQuotedStringTokenCharacter): Deleted.
+        * platform/network/ParsedContentType.h:
 -02-20  Simon Fraser  <simon.fraser@apple.com>

trunk/Source/WebCore/platform/network/ParsedContentType.cpp

-              r241291
+              r241863
 namespace WebCore {
 static void skipSpaces(const String& input, unsigned& startIndex)
+static void skipSpaces(StringView input, unsigned& startIndex)
+{
     while (startIndex < input.length() && isHTTPSpace(input[startIndex]))
 …
 using CharacterMeetsCondition = bool (*)(UChar);
 static Optional<SubstringRange> parseToken(const String& input, unsigned& startIndex, CharacterMeetsCondition characterMeetsCondition, Mode mode, bool skipTrailingWhitespace = false)
+static Optional<StringView> parseToken(StringView input, unsigned& startIndex, CharacterMeetsCondition characterMeetsCondition, Mode mode, bool skipTrailingWhitespace = false)
+{
     unsigned inputLength = input.length();
 …
             --tokenEnd;
+    }
+    return SubstringRange(tokenStart, tokenEnd - tokenStart);
+}
+static bool containsNonTokenCharacters(const String& input, SubstringRange range)
+{
+    for (unsigned index = 0; index < range.second; ++index) {
+        if (!isTokenCharacter(input[range.first + index]))
+    return input.substring(tokenStart, tokenEnd - tokenStart);
+}
+static bool isNotQuoteOrBackslash(UChar ch)
+{
+    return ch != '"' && ch != '\\';
+}
+static String collectHTTPQuotedString(StringView input, unsigned& startIndex)
+{
+    ASSERT(input[startIndex] == '"');
+    unsigned inputLength = input.length();
+    unsigned& position = startIndex;
+    position++;
+    StringBuilder builder;
+    while (true) {
+        unsigned positionStart = position;
+        parseToken(input, position, isNotQuoteOrBackslash, Mode::MimeSniff);
+        builder.append(input.substring(positionStart, position - positionStart));
+        if (position >= inputLength)
+            break;
+        UChar quoteOrBackslash = input[position++];
+        if (quoteOrBackslash == '\\') {
+            if (position >= inputLength) {
+                builder.append(quoteOrBackslash);
+                break;
+            }
+            builder.append(input[position++]);
+        } else {
+            ASSERT(quoteOrBackslash == '"');
+            break;
+        }
+    }
+    return builder.toString();
+}
+static bool containsNonTokenCharacters(StringView input, Mode mode)
+{
+    if (mode == Mode::MimeSniff)
+        return !isValidHTTPToken(input.toStringWithoutCopying());
+    for (unsigned index = 0; index < input.length(); ++index) {
+        if (!isTokenCharacter(input[index]))
             return true;
+    }
 …
+}
 static Optional<SubstringRange> parseQuotedString(const String& input, unsigned& startIndex, Mode mode)
+static Optional<StringView> parseQuotedString(StringView input, unsigned& startIndex)
+{
     unsigned inputLength = input.length();
 …
         return WTF::nullopt;
+    if (input[quotedStringEnd++] != '"' || quotedStringEnd >= inputLength) {
+        if (mode == Mode::Rfc2045)
+            return WTF::nullopt;
+        return SubstringRange(quotedStringStart, quotedStringEnd - quotedStringStart);
+    }
+    if (input[quotedStringEnd++] != '"' || quotedStringEnd >= inputLength)
+        return WTF::nullopt;
     bool lastCharacterWasBackslash = false;
     char currentCharacter;
     while ((currentCharacter = input[quotedStringEnd++]) != '"' || lastCharacterWasBackslash) {
+        if (quotedStringEnd >= inputLength) {
+            if (mode == Mode::Rfc2045)
+                return WTF::nullopt;
+            break;
+        }
+        if (quotedStringEnd >= inputLength)
+            return WTF::nullopt;
         if (currentCharacter == '\\' && !lastCharacterWasBackslash) {
             lastCharacterWasBackslash = true;
 …
+    }
     if (input[quotedStringEnd - 1] == '"')
+        return SubstringRange(quotedStringStart, quotedStringEnd - quotedStringStart - 1);
+    return SubstringRange(quotedStringStart, quotedStringEnd - quotedStringStart);
+}
+static String substringForRange(const String& string, const SubstringRange& range)
+{
+    return string.substring(range.first, range.second);
+        quotedStringEnd++;
+    return input.substring(quotedStringStart, quotedStringEnd - quotedStringStart);
+}
 …
     unsigned contentTypeStart = index;
     auto typeRange = parseToken(m_contentType, index, isNotForwardSlash, mode);
     if (!typeRange || containsNonTokenCharacters(m_contentType, *typeRange)) {
+    if (!typeRange || containsNonTokenCharacters(*typeRange, mode)) {
         LOG_ERROR("Invalid Content-Type, invalid type value.");
         return false;
+    }
     if (m_contentType[index++] != '/') {
+    if (index >= contentTypeLength || m_contentType[index++] != '/') {
         LOG_ERROR("Invalid Content-Type, missing '/'.");
         return false;
 …
     auto subTypeRange = parseToken(m_contentType, index, isNotSemicolon, mode, mode == Mode::MimeSniff);
     if (!subTypeRange || containsNonTokenCharacters(m_contentType, *subTypeRange)) {
+    if (!subTypeRange || containsNonTokenCharacters(*subTypeRange, mode)) {
         LOG_ERROR("Invalid Content-Type, invalid subtype value.");
         return false;
 …
     size_t semiColonIndex = m_contentType.find(';', contentTypeStart);
     if (semiColonIndex == notFound) {
         setContentType(SubstringRange(contentTypeStart, contentTypeLength - contentTypeStart), mode);
+        setContentType(m_contentType.substring(contentTypeStart, contentTypeLength - contentTypeStart), mode);
         return true;
+    }
     setContentType(SubstringRange(contentTypeStart, semiColonIndex - contentTypeStart), mode);
+    setContentType(m_contentType.substring(contentTypeStart, semiColonIndex - contentTypeStart), mode);
     index = semiColonIndex + 1;
     while (true) {
 …
         // Should we tolerate spaces here?
         if (mode == Mode::Rfc2045) {
             if (m_contentType[index++] != '=' || index >= contentTypeLength) {
+            if (index >= contentTypeLength || m_contentType[index++] != '=') {
                 LOG_ERROR("Invalid Content-Type malformed parameter.");
                 return false;
 …
                 continue;
+        }
+        String parameterName = substringForRange(m_contentType, *keyRange);
+        String parameterName = keyRange->toString();
         // Should we tolerate spaces here?
+        Optional<SubstringRange> valueRange;
+        if (m_contentType[index] == '"') {
+            valueRange = parseQuotedString(m_contentType, index, mode);
+            if (mode == Mode::MimeSniff)
+        String parameterValue;
+        Optional<StringView> valueRange;
+        if (index < contentTypeLength && m_contentType[index] == '"') {
+            if (mode == Mode::MimeSniff) {
+                parameterValue = collectHTTPQuotedString(m_contentType, index);
                 parseToken(m_contentType, index, isNotSemicolon, mode);
+            } else
+                valueRange = parseQuotedString(m_contentType, index);
         } else
             valueRange = parseToken(m_contentType, index, isNotSemicolon, mode, mode == Mode::MimeSniff);
+        if (!valueRange) {
+            if (mode == Mode::MimeSniff)
+                continue;
+            LOG_ERROR("Invalid Content-Type, invalid parameter value.");
+            return false;
+        }
+        String parameterValue = substringForRange(m_contentType, *valueRange);
+        if (parameterValue.isNull()) {
+            if (!valueRange) {
+                if (mode == Mode::MimeSniff)
+                    continue;
+                LOG_ERROR("Invalid Content-Type, invalid parameter value.");
+                return false;
+            }
+            parameterValue = valueRange->toString();
+        }
         // Should we tolerate spaces here?
         if (mode == Mode::Rfc2045 && index < contentTypeLength && m_contentType[index++] != ';') {
 …
 Optional<ParsedContentType> ParsedContentType::create(const String& contentType, Mode mode)
+{
     ParsedContentType parsedContentType(mode == Mode::Rfc2045 ? contentType : contentType.stripWhiteSpace());
+    ParsedContentType parsedContentType(mode == Mode::Rfc2045 ? contentType : stripLeadingAndTrailingHTTPSpaces(contentType));
     if (!parsedContentType.parseContentType(mode))
         return WTF::nullopt;
 …
+}
 void ParsedContentType::setContentType(const SubstringRange& contentRange, Mode mode)
+{
     m_mimeType = substringForRange(m_contentType, contentRange).stripWhiteSpace();
+void ParsedContentType::setContentType(StringView contentRange, Mode mode)
+{
+    m_mimeType = contentRange.toString();
     if (mode == Mode::MimeSniff)
+        m_mimeType = m_mimeType.convertToASCIILowercase();
+}
+static bool isNonTokenCharacter(UChar ch)
+{
+    return !isTokenCharacter(ch);
+}
+static bool isNonQuotedStringTokenCharacter(UChar ch)
+{
+    return !isQuotedStringTokenCharacter(ch);
+        m_mimeType = stripLeadingAndTrailingHTTPSpaces(m_mimeType).convertToASCIILowercase();
+    else
+        m_mimeType = m_mimeType.stripWhiteSpace();
+}
+static bool containsNonQuoteStringTokenCharacters(const String& input)
+{
+    for (unsigned index = 0; index < input.length(); ++index) {
+        if (!isQuotedStringTokenCharacter(input[index]))
+            return true;
+    }
+    return false;
+}
 …
     String name = keyName;
     if (mode == Mode::MimeSniff) {
         if (m_parameterValues.contains(keyName) || keyName.find(isNonTokenCharacter) != notFound || keyValue.find(isNonQuotedStringTokenCharacter) != notFound)
+        if (m_parameterValues.contains(name) || !isValidHTTPToken(name) || containsNonQuoteStringTokenCharacters(keyValue))
             return;
         name = name.convertToASCIILowercase();
 …
         builder.append('=');
         String value = m_parameterValues.get(name);
         if (value.isEmpty() || value.find(isNonTokenCharacter) != notFound) {
+        if (value.isEmpty() || !isValidHTTPToken(value)) {
             builder.append('"');
+            builder.append(value);
+            for (unsigned index = 0; index < value.length(); ++index) {
+                auto ch = value[index];
+                if (ch == '\\' || ch =='"')
+                    builder.append('\\');
+                builder.append(ch);
+            }
             builder.append('"');
         } else

trunk/Source/WebCore/platform/network/ParsedContentType.h

-              r241291
+              r241863
     MimeSniff
 };
-// <index, length>
-typedef std::pair<unsigned, unsigned> SubstringRange;
 WEBCORE_EXPORT bool isValidContentType(const String&, Mode = Mode::MimeSniff);
 …
     ParsedContentType& operator=(ParsedContentType const&) = delete;
     bool parseContentType(Mode);
     void setContentType(const SubstringRange&, Mode);
+    void setContentType(StringView, Mode);
     void setContentTypeParameter(const String&, const String&, Mode);

trunk/Tools/ChangeLog

-              r241858
+              r241863
+-02-21  Rob Buis  <rbuis@igalia.com>
+        Update MIME type parser
+        https://bugs.webkit.org/show_bug.cgi?id=180526
+        Reviewed by Darin Adler.
+        Add tests involving leading and trailing whitespace, non-token
+        characters and quoted strings.
+        * TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp:
+        (TestWebKitAPI::TEST):
 -02-20  Don Olmstead  <don.olmstead@sony.com>

trunk/Tools/TestWebKitAPI/Tests/WebCore/ParsedContentType.cpp

-              r241291
+              r241863
     EXPECT_STREQ(serializeIfValid("text/\0"), "NOTVALID");
     EXPECT_STREQ(serializeIfValid("text/ "), "NOTVALID");
+    EXPECT_STREQ(serializeIfValid("{/}"), "NOTVALID");
+    EXPECT_STREQ(serializeIfValid("x/x ;x=x"), "x/x;x=x");
     EXPECT_STREQ(serializeIfValid("text/plain"), "text/plain");
     EXPECT_STREQ(serializeIfValid("text/plain\0"), "text/plain");
 …
     EXPECT_STREQ(serializeIfValid("\rtext/plain"), "text/plain");
     EXPECT_STREQ(serializeIfValid("\btext/plain"), "NOTVALID");
+    EXPECT_STREQ(serializeIfValid("\x0Btext/plain"), "NOTVALID");
+    EXPECT_STREQ(serializeIfValid("\u000Btext/plain"), "NOTVALID");
     EXPECT_STREQ(serializeIfValid("text/plain "), "text/plain");
     EXPECT_STREQ(serializeIfValid("text/plain\t"), "text/plain");
 …
     EXPECT_STREQ(serializeIfValid("text/plain\r"), "text/plain");
     EXPECT_STREQ(serializeIfValid("text/plain\b"), "NOTVALID");
+    EXPECT_STREQ(serializeIfValid("text/plain\x0B"), "NOTVALID");
+    EXPECT_STREQ(serializeIfValid("text/plain\u000B"), "NOTVALID");
     EXPECT_STREQ(serializeIfValid(" text/plain "), "text/plain");
     EXPECT_STREQ(serializeIfValid("\ttext/plain\t"), "text/plain");
 …
     EXPECT_STREQ(serializeIfValid("text/plain;test=\"value\"foo"), "text/plain;test=value");
     EXPECT_STREQ(serializeIfValid("text/plain;test=\"value\"foo;foo=bar"), "text/plain;test=value;foo=bar");
     EXPECT_STREQ(serializeIfValid("text/plain;test=\"val\\ue\""), "text/plain;test=\"val\\ue\"");
+    EXPECT_STREQ(serializeIfValid("text/plain;test=\"val\\ue\""), "text/plain;test=value");
     EXPECT_STREQ(serializeIfValid("text/plain;test=\"val\\\"ue\""), "text/plain;test=\"val\\\"ue\"");
     EXPECT_STREQ(serializeIfValid("text/plain;test='value'"), "text/plain;test='value'");
     EXPECT_STREQ(serializeIfValid("text/plain;test='value"), "text/plain;test='value");
     EXPECT_STREQ(serializeIfValid("text/plain;test=value'"), "text/plain;test=value'");
+    EXPECT_STREQ(serializeIfValid("text/plain;test={value}"), "text/plain;test=\"{value}\"");
+}

Note: See TracChangeset for help on using the changeset viewer.