Changeset 266528 in webkit

Timestamp:

Sep 3, 2020, 9:57:49 AM

Author:

achristensen@apple.com

Message:

TextDecoder should ignore byte-order-mark like other browsers and spec
https://bugs.webkit.org/show_bug.cgi?id=216108

Reviewed by Darin Adler.

LayoutTests/imported/w3c:

* web-platform-tests/encoding/streams/decode-ignore-bom.any-expected.txt:
* web-platform-tests/encoding/textdecoder-ignorebom.any-expected.txt:

Source/WebCore:

Covered by newly passing web platform tests.

* dom/TextDecoder.cpp:
(WebCore::TextDecoder::ignoreBOMIfNecessary):
(WebCore::TextDecoder::decode):
(WebCore::TextDecoder::prependBOMIfNecessary): Deleted.
* dom/TextDecoder.h:
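For reference, the behavior the title refers to is the Encoding Standard's BOM handling for `TextDecoder`: when the encoding is UTF-8, UTF-16BE or UTF-16LE and `ignoreBOM` is false, a leading byte-order mark (`EF BB BF`, `FE FF` or `FF FE`) is dropped instead of being decoded as U+FEFF; when `ignoreBOM` is true, it is kept. The sketch below shows only the byte-level check for a single, non-streamed input, using the same byte values as the constants added to `TextDecoder.cpp`. The `Encoding` enum and `bomLength` function are hypothetical names for illustration, not WebCore API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical tag for the decoder's encoding; only the first three have a BOM to strip.
enum class Encoding { UTF8, UTF16BE, UTF16LE, Other };

// Returns how many leading bytes of a complete (non-streamed) input are a
// byte-order mark that should be skipped before decoding: 3, 2 or 0.
std::size_t bomLength(Encoding encoding, const std::uint8_t* data, std::size_t length, bool ignoreBOM)
{
    assert(data || !length);

    // Same byte values as utf8BOMBytes, utf16BEBOMBytes and utf16LEBOMBytes in the diff.
    static constexpr std::uint8_t utf8BOM[] { 0xEF, 0xBB, 0xBF };
    static constexpr std::uint8_t utf16BEBOM[] { 0xFE, 0xFF };
    static constexpr std::uint8_t utf16LEBOM[] { 0xFF, 0xFE };

    // ignoreBOM == true means "keep the BOM", so nothing is skipped.
    if (ignoreBOM)
        return 0;

    const std::uint8_t* bom = nullptr;
    std::size_t bomSize = 0;
    switch (encoding) {
    case Encoding::UTF8:
        bom = utf8BOM;
        bomSize = sizeof(utf8BOM);
        break;
    case Encoding::UTF16BE:
        bom = utf16BEBOM;
        bomSize = sizeof(utf16BEBOM);
        break;
    case Encoding::UTF16LE:
        bom = utf16LEBOM;
        bomSize = sizeof(utf16LEBOM);
        break;
    case Encoding::Other:
        return 0;
    }

    // Skip the mark only if the input starts with all of its bytes.
    if (bomSize && length >= bomSize && !std::memcmp(data, bom, bomSize))
        return bomSize;
    return 0;
}
```

An input shorter than the mark is never stripped here; deciding what to do with such a short input while streaming is the harder case the patch addresses.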

Location:

trunk

Files:

10 edited

LayoutTests/imported/w3c/ChangeLog (modified) (1 diff)
LayoutTests/imported/w3c/web-platform-tests/encoding/streams/decode-ignore-bom.any-expected.txt (modified) (1 diff)
LayoutTests/imported/w3c/web-platform-tests/encoding/streams/decode-ignore-bom.any.worker-expected.txt (modified) (1 diff)
LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-copy.any-expected.txt (modified) (1 diff)
LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-copy.any.worker-expected.txt (modified) (1 diff)
LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-ignorebom.any-expected.txt (modified) (1 diff)
LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-ignorebom.any.worker-expected.txt (modified) (1 diff)
Source/WebCore/ChangeLog (modified) (1 diff)
Source/WebCore/dom/TextDecoder.cpp (modified) (5 diffs)
Source/WebCore/dom/TextDecoder.h (modified) (1 diff)

trunk/LayoutTests/imported/w3c/ChangeLog

--- r266527
+++ r266528
+2020-09-03  Alex Christensen  <achristensen@webkit.org>
+        TextDecoder should ignore byte-order-mark like other browsers and spec
+        https://bugs.webkit.org/show_bug.cgi?id=216108
+        Reviewed by Darin Adler.
+        * web-platform-tests/encoding/streams/decode-ignore-bom.any-expected.txt:
+        * web-platform-tests/encoding/textdecoder-ignorebom.any-expected.txt:
 2020-09-03  Alex Christensen  <achristensen@webkit.org>

trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/streams/decode-ignore-bom.any-expected.txt

--- r266348
+++ r266528
 PASS ignoreBOM should work for encoding utf-8, split at character 0
-FAIL ignoreBOM should work for encoding utf-8, split at character 1 assert_equals: BOM should be stripped expected "abc" but got "abc"
-FAIL ignoreBOM should work for encoding utf-8, split at character 2 assert_equals: BOM should be stripped expected "abc" but got "abc"
+PASS ignoreBOM should work for encoding utf-8, split at character 1
+PASS ignoreBOM should work for encoding utf-8, split at character 2
 PASS ignoreBOM should work for encoding utf-8, split at character 3
 PASS ignoreBOM should work for encoding utf-16le, split at character 0
-FAIL ignoreBOM should work for encoding utf-16le, split at character 1 assert_equals: BOM should be stripped expected "abc" but got "abc"
+PASS ignoreBOM should work for encoding utf-16le, split at character 1
 PASS ignoreBOM should work for encoding utf-16le, split at character 2
-FAIL ignoreBOM should work for encoding utf-16le, split at character 3 assert_equals: BOM should be preserved expected "abc" but got "abc"
+PASS ignoreBOM should work for encoding utf-16le, split at character 3
 PASS ignoreBOM should work for encoding utf-16be, split at character 0
-FAIL ignoreBOM should work for encoding utf-16be, split at character 1 assert_equals: BOM should be stripped expected "abc" but got "abc"
+PASS ignoreBOM should work for encoding utf-16be, split at character 1
 PASS ignoreBOM should work for encoding utf-16be, split at character 2
-FAIL ignoreBOM should work for encoding utf-16be, split at character 3 assert_equals: BOM should be preserved expected "abc" but got "abc"
+PASS ignoreBOM should work for encoding utf-16be, split at character 3

trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/streams/decode-ignore-bom.any.worker-expected.txt

--- r266348
+++ r266528
 PASS ignoreBOM should work for encoding utf-8, split at character 0
-FAIL ignoreBOM should work for encoding utf-8, split at character 1 assert_equals: BOM should be stripped expected "abc" but got "abc"
-FAIL ignoreBOM should work for encoding utf-8, split at character 2 assert_equals: BOM should be stripped expected "abc" but got "abc"
+PASS ignoreBOM should work for encoding utf-8, split at character 1
+PASS ignoreBOM should work for encoding utf-8, split at character 2
 PASS ignoreBOM should work for encoding utf-8, split at character 3
 PASS ignoreBOM should work for encoding utf-16le, split at character 0
-FAIL ignoreBOM should work for encoding utf-16le, split at character 1 assert_equals: BOM should be stripped expected "abc" but got "abc"
+PASS ignoreBOM should work for encoding utf-16le, split at character 1
 PASS ignoreBOM should work for encoding utf-16le, split at character 2
-FAIL ignoreBOM should work for encoding utf-16le, split at character 3 assert_equals: BOM should be preserved expected "abc" but got "abc"
+PASS ignoreBOM should work for encoding utf-16le, split at character 3
 PASS ignoreBOM should work for encoding utf-16be, split at character 0
-FAIL ignoreBOM should work for encoding utf-16be, split at character 1 assert_equals: BOM should be stripped expected "abc" but got "abc"
+PASS ignoreBOM should work for encoding utf-16be, split at character 1
 PASS ignoreBOM should work for encoding utf-16be, split at character 2
-FAIL ignoreBOM should work for encoding utf-16be, split at character 3 assert_equals: BOM should be preserved expected "abc" but got "abc"
+PASS ignoreBOM should work for encoding utf-16be, split at character 3

trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-copy.any-expected.txt

--- r264561
+++ r266528
-FAIL Modify buffer after passing it in (ArrayBuffer) assert_equals: expected "@" but got "@"
-FAIL Modify buffer after passing it in (SharedArrayBuffer) assert_equals: expected "@" but got "@"
+PASS Modify buffer after passing it in (ArrayBuffer)
+PASS Modify buffer after passing it in (SharedArrayBuffer)

trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-copy.any.worker-expected.txt

--- r264561
+++ r266528
-FAIL Modify buffer after passing it in (ArrayBuffer) assert_equals: expected "@" but got "@"
-FAIL Modify buffer after passing it in (SharedArrayBuffer) assert_equals: expected "@" but got "@"
+PASS Modify buffer after passing it in (ArrayBuffer)
+PASS Modify buffer after passing it in (SharedArrayBuffer)

trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-ignorebom.any-expected.txt

--- r256730
+++ r266528
-FAIL BOM is ignored if ignoreBOM option is specified: utf-8 assert_equals: utf-8: BOM should be present in decoded string if ignored by a reused decoder expected "abc" but got "abc"
-FAIL BOM is ignored if ignoreBOM option is specified: utf-16le assert_equals: utf-16le: BOM should be present in decoded string if ignored by a reused decoder expected "abc" but got "abc"
-FAIL BOM is ignored if ignoreBOM option is specified: utf-16be assert_equals: utf-16be: BOM should be present in decoded string if ignored by a reused decoder expected "abc" but got "abc"
+PASS BOM is ignored if ignoreBOM option is specified: utf-8
+PASS BOM is ignored if ignoreBOM option is specified: utf-16le
+PASS BOM is ignored if ignoreBOM option is specified: utf-16be
 PASS The ignoreBOM attribute of TextDecoder

trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-ignorebom.any.worker-expected.txt

--- r256730
+++ r266528
-FAIL BOM is ignored if ignoreBOM option is specified: utf-8 assert_equals: utf-8: BOM should be present in decoded string if ignored by a reused decoder expected "abc" but got "abc"
-FAIL BOM is ignored if ignoreBOM option is specified: utf-16le assert_equals: utf-16le: BOM should be present in decoded string if ignored by a reused decoder expected "abc" but got "abc"
-FAIL BOM is ignored if ignoreBOM option is specified: utf-16be assert_equals: utf-16be: BOM should be present in decoded string if ignored by a reused decoder expected "abc" but got "abc"
+PASS BOM is ignored if ignoreBOM option is specified: utf-8
+PASS BOM is ignored if ignoreBOM option is specified: utf-16le
+PASS BOM is ignored if ignoreBOM option is specified: utf-16be
 PASS The ignoreBOM attribute of TextDecoder

trunk/Source/WebCore/ChangeLog

--- r266527
+++ r266528
+2020-09-03  Alex Christensen  <achristensen@webkit.org>
+        TextDecoder should ignore byte-order-mark like other browsers and spec
+        https://bugs.webkit.org/show_bug.cgi?id=216108
+        Reviewed by Darin Adler.
+        Covered by newly passing web platform tests.
+        * dom/TextDecoder.cpp:
+        (WebCore::TextDecoder::ignoreBOMIfNecessary):
+        (WebCore::TextDecoder::decode):
+        (WebCore::TextDecoder::prependBOMIfNecessary): Deleted.
+        * dom/TextDecoder.h:
 2020-09-03  Alex Christensen  <achristensen@webkit.org>

trunk/Source/WebCore/dom/TextDecoder.cpp

--- r243163
+++ r266528
+}
-void TextDecoder::ignoreBOMIfNecessary(const uint8_t*& data, size_t& length)
+constexpr uint8_t utf8BOMBytes[3] { 0xEF, 0xBB, 0xBF };
+constexpr uint8_t utf16BEBOMBytes[2] { 0xFE, 0xFF };
+constexpr uint8_t utf16LEBOMBytes[2] { 0xFF, 0xFE };
+size_t TextDecoder::bytesNeededForFullBOMIgnoreCheck() const
+{
-    const uint8_t utf8BOMBytes[3] = {0xEF, 0xBB, 0xBF};
-    const uint8_t utf16BEBOMBytes[2] = {0xFE, 0xFF};
-    const uint8_t utf16LEBOMBytes[2] = {0xFF, 0xFE};
+    if (m_textEncoding == UTF8Encoding())
+        return sizeof(utf8BOMBytes);
+    if (m_textEncoding == UTF16BigEndianEncoding())
+        return sizeof(utf16BEBOMBytes);
+    if (m_textEncoding == UTF16LittleEndianEncoding())
+        return sizeof(utf16LEBOMBytes);
+    return 0;
+}
+bool TextDecoder::isBeginningOfIncompleteBOM(const uint8_t* bytes, size_t length) const
+{
+    if (!length)
+        return true;
+    if (m_textEncoding == UTF8Encoding()) {
+        if (length == 1)
+            return bytes[0] == utf8BOMBytes[0];
+        return length == 2 && bytes[0] == utf8BOMBytes[0] && bytes[1] == utf8BOMBytes[1];
+    }
+    if (m_textEncoding == UTF16BigEndianEncoding())
+        return length == 1 && bytes[0] == utf16BEBOMBytes[0];
+    if (m_textEncoding == UTF16LittleEndianEncoding())
+        return length == 1 && bytes[0] == utf16LEBOMBytes[0];
+    return false;
+}
+auto TextDecoder::ignoreBOMIfNecessary(const uint8_t*& data, size_t& length, bool stream) -> WaitForMoreBOMBytes
+{
+    if (m_bomIgnoredIfNecessary || m_options.ignoreBOM)
+        return WaitForMoreBOMBytes::No;
+    if (stream && length < bytesNeededForFullBOMIgnoreCheck()) {
+        if (isBeginningOfIncompleteBOM(data, length))
+            return WaitForMoreBOMBytes::Yes;
+        m_bomIgnoredIfNecessary = true;
+        return WaitForMoreBOMBytes::No;
+    }
     if (m_textEncoding == UTF8Encoding()
 …
         length -= sizeof(utf16LEBOMBytes);
+    }
-}
-String TextDecoder::prependBOMIfNecessary(const String& decoded)
-{
-    if (m_hasDecoded || !m_options.ignoreBOM)
-        return decoded;
-    const UChar utf16BEBOM[2] = {0xFEFF, '\0'};
-    // FIXME: Make TextCodec::decode take a flag for prepending BOM so we don't need to do this extra allocation and copy.
-    return makeString(utf16BEBOM, decoded);
+    m_bomIgnoredIfNecessary = true;
+    return WaitForMoreBOMBytes::No;
+}
 …
+    }
-    ignoreBOMIfNecessary(data, length);
+    if (!options.stream)
+        m_bomIgnoredIfNecessary = false;
+    bool alreadyBuffered = false;
     if (m_buffer.size()) {
         m_buffer.append(data, length);
         data = m_buffer.data();
         length = m_buffer.size();
+        alreadyBuffered = true;
+    }
+    if (ignoreBOMIfNecessary(data, length, options.stream) == WaitForMoreBOMBytes::Yes) {
+        ASSERT(options.stream);
+        if (!alreadyBuffered)
+            m_buffer.append(data, length);
+        return String();
+    }
 …
     String result;
     if (!sawError)
-        result = prependBOMIfNecessary(m_textEncoding.decode(charData, length, stopOnError, sawError));
+        result = m_textEncoding.decode(charData, length, stopOnError, sawError);
     if (sawError) {
 …
             if (m_options.fatal)
                 return Exception { TypeError };
-            result = prependBOMIfNecessary(m_textEncoding.decode(charData, length));
+            result = m_textEncoding.decode(charData, length);
+        }
     } else
         m_buffer.clear();
-    m_hasDecoded = true;
     return result;
+}
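The new streaming path defers its decision when a chunk is shorter than the BOM for its encoding and could still be the start of one (for example a lone `0xEF` for UTF-8), which is the situation the split-point expectations in `decode-ignore-bom` exercise. The sketch below models only that byte-filtering step, under stated assumptions: `BOMFilter` and `Encoding` are hypothetical names; the class plays the roles of `bytesNeededForFullBOMIgnoreCheck`, `isBeginningOfIncompleteBOM` and `WaitForMoreBOMBytes` but is not WebCore code; and it does no character decoding, so incomplete multi-byte sequences are not handled. It clears its "BOM checked" flag at the end of each non-streaming call so a reused decoder checks every fresh input; the exact point at which WebCore resets `m_bomIgnoredIfNecessary` may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tag for the decoder's encoding; only the first three have a BOM.
enum class Encoding { UTF8, UTF16BE, UTF16LE, Other };

// Hypothetical byte filter: removes at most one leading BOM per stream and
// holds back a streamed chunk that might still turn out to be a BOM.
class BOMFilter {
public:
    BOMFilter(Encoding encoding, bool ignoreBOM)
        : m_encoding(encoding)
        , m_ignoreBOM(ignoreBOM)
    {
    }

    // Returns the bytes that are ready to be handed to a decoder.
    std::vector<std::uint8_t> push(const std::uint8_t* data, std::size_t length, bool stream)
    {
        assert(data || !length);
        if (length)
            m_pending.insert(m_pending.end(), data, data + length);

        if (!m_bomChecked && !m_ignoreBOM) {
            std::size_t bomSize = 0;
            const std::uint8_t* bom = bomBytes(bomSize);
            std::size_t have = m_pending.size();
            // Too few bytes to decide, and they match the start of the BOM: wait.
            if (stream && have < bomSize && std::equal(m_pending.begin(), m_pending.end(), bom))
                return { };
            // A complete BOM at the start of the stream is dropped.
            if (bomSize && have >= bomSize && std::equal(bom, bom + bomSize, m_pending.begin()))
                m_pending.erase(m_pending.begin(), m_pending.begin() + static_cast<std::ptrdiff_t>(bomSize));
            m_bomChecked = true;
        }

        std::vector<std::uint8_t> ready;
        ready.swap(m_pending);
        // A non-streaming call ends the stream, so the next call checks for a BOM again.
        if (!stream)
            m_bomChecked = false;
        return ready;
    }

private:
    // Returns the BOM bytes for m_encoding and stores their count in size (0 if none).
    const std::uint8_t* bomBytes(std::size_t& size) const
    {
        static constexpr std::uint8_t utf8[] { 0xEF, 0xBB, 0xBF };
        static constexpr std::uint8_t utf16BE[] { 0xFE, 0xFF };
        static constexpr std::uint8_t utf16LE[] { 0xFF, 0xFE };
        switch (m_encoding) {
        case Encoding::UTF8:
            size = sizeof(utf8);
            return utf8;
        case Encoding::UTF16BE:
            size = sizeof(utf16BE);
            return utf16BE;
        case Encoding::UTF16LE:
            size = sizeof(utf16LE);
            return utf16LE;
        case Encoding::Other:
            break;
        }
        size = 0;
        return nullptr;
    }

    Encoding m_encoding;
    bool m_ignoreBOM;
    bool m_bomChecked { false };
    std::vector<std::uint8_t> m_pending;
};
```

Feeding UTF-8 bytes `EF` and then `BB BF 61 62 63` with `stream = true` returns nothing for the first call and `61 62 63` for the second, so a stream split inside the BOM still has the mark removed. A chunk that cannot be a BOM prefix (such as `61`) is released immediately, and any `EF BB BF` that arrives after it is passed through unchanged.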

trunk/Source/WebCore/dom/TextDecoder.h

--- r242776
+++ r266528
 private:
-    String prependBOMIfNecessary(const String&);
-    void ignoreBOMIfNecessary(const uint8_t*& data, size_t& length);
     TextDecoder(const char*, Options);
+    enum class WaitForMoreBOMBytes : bool { No, Yes };
+    WaitForMoreBOMBytes ignoreBOMIfNecessary(const uint8_t*& data, size_t& length, bool stream);
+    size_t bytesNeededForFullBOMIgnoreCheck() const;
+    bool isBeginningOfIncompleteBOM(const uint8_t*, size_t) const;
     TextEncoding m_textEncoding;
     Options m_options;
-    bool m_hasDecoded { false };
     Vector<uint8_t> m_buffer;
+    bool m_bomIgnoredIfNecessary { false };
 };
