Changeset 266668 in webkit


Ignore:
Timestamp:
Sep 5, 2020, 12:48:15 PM (4 years ago)
Author:
achristensen@apple.com
Message:

TextDecoder should properly handle streams
https://bugs.webkit.org/show_bug.cgi?id=216202

Reviewed by Darin Adler.

LayoutTests/imported/w3c:

  • web-platform-tests/encoding/streams/decode-non-utf8.any-expected.txt:
  • web-platform-tests/encoding/streams/decode-non-utf8.any.js:
  • web-platform-tests/encoding/streams/decode-non-utf8.any.worker-expected.txt:
  • web-platform-tests/encoding/streams/decode-split-character.any-expected.txt:
  • web-platform-tests/encoding/streams/decode-split-character.any.worker-expected.txt:
  • web-platform-tests/encoding/streams/realms.window-expected.txt:
  • web-platform-tests/encoding/textdecoder-fatal-streaming-expected.txt: Removed.
  • web-platform-tests/encoding/textdecoder-fatal-streaming.any-expected.txt:
  • web-platform-tests/encoding/textdecoder-fatal-streaming.any.worker-expected.txt:

Source/WebCore:

A TextCodec keeps state when it decodes part of valid input, such as the first byte of a multibyte sequence.
TextEncoding::decode makes a new TextCodec and throws away that state.
In order to properly handle streaming, we need to keep the TextCodec and call TextCodec::decode directly.

Covered by newly passing web platform tests. I also added a test that failed in my first implementation attempt
but passes now in WebKit as well as Chromium. Firefox hasn't implemented TextDecoderStream yet, but this test will
hopefully help them not make the same mistake I did.

  • dom/TextDecoder.cpp:

(WebCore::TextDecoder::decode):
(WebCore::codeUnitByteSize): Deleted.

  • dom/TextDecoder.h:
Location:
trunk
Files:
1 deleted
12 edited

Legend:

Unmodified
Added
Removed
  • trunk/LayoutTests/imported/w3c/ChangeLog

    r266660 r266668  
     12020-09-05  Alex Christensen  <achristensen@webkit.org>
     2
     3        TextDecoder should properly handle streams
     4        https://bugs.webkit.org/show_bug.cgi?id=216202
     5
     6        Reviewed by Darin Adler.
     7
     8        * web-platform-tests/encoding/streams/decode-non-utf8.any-expected.txt:
     9        * web-platform-tests/encoding/streams/decode-non-utf8.any.js:
     10        * web-platform-tests/encoding/streams/decode-non-utf8.any.worker-expected.txt:
     11        * web-platform-tests/encoding/streams/decode-split-character.any-expected.txt:
     12        * web-platform-tests/encoding/streams/decode-split-character.any.worker-expected.txt:
     13        * web-platform-tests/encoding/streams/realms.window-expected.txt:
     14        * web-platform-tests/encoding/textdecoder-fatal-streaming-expected.txt: Removed.
     15        * web-platform-tests/encoding/textdecoder-fatal-streaming.any-expected.txt:
     16        * web-platform-tests/encoding/textdecoder-fatal-streaming.any.worker-expected.txt:
     17
    1182020-09-05  Darin Adler  <darin@apple.com>
    219
  • trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/streams/decode-non-utf8.any-expected.txt

    r266620 r266668  
    99PASS TextDecoderStream should be able to decode invalid sequences in Shift_JIS
    1010PASS TextDecoderStream should be able to reject invalid sequences in Shift_JIS
     11PASS TextDecoderStream should be able to decode ISO-2022-JP
     12PASS TextDecoderStream should be able to decode invalid sequences in ISO-2022-JP
     13PASS TextDecoderStream should be able to reject invalid sequences in ISO-2022-JP
    1114PASS TextDecoderStream should be able to decode ISO-8859-14
    1215
  • trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/streams/decode-non-utf8.any.js

    r264561 r266668  
    2525    expected: "\u{6c34}",
    2626    invalid: [255]
     27  },
     28  {
     29    name: 'ISO-2022-JP',
     30    value: [65, 66, 67, 0x1B, 65, 66, 67],
     31    expected: "ABC\u{fffd}ABC",
     32    invalid: [0x0E]
    2733  },
    2834  {
  • trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/streams/decode-non-utf8.any.worker-expected.txt

    r266620 r266668  
    99PASS TextDecoderStream should be able to decode invalid sequences in Shift_JIS
    1010PASS TextDecoderStream should be able to reject invalid sequences in Shift_JIS
     11PASS TextDecoderStream should be able to decode ISO-2022-JP
     12PASS TextDecoderStream should be able to decode invalid sequences in ISO-2022-JP
     13PASS TextDecoderStream should be able to reject invalid sequences in ISO-2022-JP
    1114PASS TextDecoderStream should be able to decode ISO-8859-14
    1215
  • trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/streams/decode-split-character.any-expected.txt

    r266348 r266668  
    11
    22PASS a code point split between chunks should not be emitted until all bytes are available; split point = 2
    3 FAIL a code point split between chunks should not be emitted until all bytes are available; split point = 3 assert_array_equals: the split code point should be in the second chunk of the output lengths differ, expected array ["I ", "💙 streams"] length 2, got ["I 💙 streams"] length 1
    4 FAIL a code point split between chunks should not be emitted until all bytes are available; split point = 4 assert_array_equals: the split code point should be in the second chunk of the output lengths differ, expected array ["I ", "💙 streams"] length 2, got ["I 💙 streams"] length 1
    5 FAIL a code point split between chunks should not be emitted until all bytes are available; split point = 5 assert_array_equals: the split code point should be in the second chunk of the output lengths differ, expected array ["I ", "💙 streams"] length 2, got ["I 💙 streams"] length 1
     3PASS a code point split between chunks should not be emitted until all bytes are available; split point = 3
     4PASS a code point split between chunks should not be emitted until all bytes are available; split point = 4
     5PASS a code point split between chunks should not be emitted until all bytes are available; split point = 5
    66PASS a code point should be emitted as soon as all bytes are available
    77PASS an empty chunk inside a code point split between chunks should not change the output; split point = 1
    88PASS an empty chunk inside a code point split between chunks should not change the output; split point = 2
    9 FAIL an empty chunk inside a code point split between chunks should not change the output; split point = 3 assert_equals: two chunks should be output expected 2 but got 1
    10 FAIL an empty chunk inside a code point split between chunks should not change the output; split point = 4 assert_equals: two chunks should be output expected 2 but got 1
    11 FAIL an empty chunk inside a code point split between chunks should not change the output; split point = 5 assert_equals: two chunks should be output expected 2 but got 1
     9PASS an empty chunk inside a code point split between chunks should not change the output; split point = 3
     10PASS an empty chunk inside a code point split between chunks should not change the output; split point = 4
     11PASS an empty chunk inside a code point split between chunks should not change the output; split point = 5
    1212PASS an empty chunk inside a code point split between chunks should not change the output; split point = 6
    1313
  • trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/streams/decode-split-character.any.worker-expected.txt

    r266348 r266668  
    11
    22PASS a code point split between chunks should not be emitted until all bytes are available; split point = 2
    3 FAIL a code point split between chunks should not be emitted until all bytes are available; split point = 3 assert_array_equals: the split code point should be in the second chunk of the output lengths differ, expected array ["I ", "💙 streams"] length 2, got ["I 💙 streams"] length 1
    4 FAIL a code point split between chunks should not be emitted until all bytes are available; split point = 4 assert_array_equals: the split code point should be in the second chunk of the output lengths differ, expected array ["I ", "💙 streams"] length 2, got ["I 💙 streams"] length 1
    5 FAIL a code point split between chunks should not be emitted until all bytes are available; split point = 5 assert_array_equals: the split code point should be in the second chunk of the output lengths differ, expected array ["I ", "💙 streams"] length 2, got ["I 💙 streams"] length 1
     3PASS a code point split between chunks should not be emitted until all bytes are available; split point = 3
     4PASS a code point split between chunks should not be emitted until all bytes are available; split point = 4
     5PASS a code point split between chunks should not be emitted until all bytes are available; split point = 5
    66PASS a code point should be emitted as soon as all bytes are available
    77PASS an empty chunk inside a code point split between chunks should not change the output; split point = 1
    88PASS an empty chunk inside a code point split between chunks should not change the output; split point = 2
    9 FAIL an empty chunk inside a code point split between chunks should not change the output; split point = 3 assert_equals: two chunks should be output expected 2 but got 1
    10 FAIL an empty chunk inside a code point split between chunks should not change the output; split point = 4 assert_equals: two chunks should be output expected 2 but got 1
    11 FAIL an empty chunk inside a code point split between chunks should not change the output; split point = 5 assert_equals: two chunks should be output expected 2 but got 1
     9PASS an empty chunk inside a code point split between chunks should not change the output; split point = 3
     10PASS an empty chunk inside a code point split between chunks should not change the output; split point = 4
     11PASS an empty chunk inside a code point split between chunks should not change the output; split point = 5
    1212PASS an empty chunk inside a code point split between chunks should not change the output; split point = 6
    1313
  • trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/streams/realms.window-expected.txt

    r266348 r266668  
    1111PASS the result object when write is called with a pending read should come from the same realm as the constructor of TextDecoderStream
    1212PASS TypeError for chunk with the wrong type should come from constructor realm of TextDecoderStream
    13 FAIL TypeError for invalid chunk should come from constructor realm of TextDecoderStream assert_unreached: Should have rejected: write TypeError should come from constructor realm Reached unreachable code
     13PASS TypeError for invalid chunk should come from constructor realm of TextDecoderStream
    1414PASS TypeError for incomplete input should come from constructor realm of TextDecoderStream
    1515
  • trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-fatal-streaming.any-expected.txt

    r266457 r266668  
    11
    22PASS Fatal flag, non-streaming cases
    3 FAIL Fatal flag, streaming cases assert_equals: expected "\0" but got ""
     3PASS Fatal flag, streaming cases
    44
  • trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-fatal-streaming.any.worker-expected.txt

    r266457 r266668  
    11
    22PASS Fatal flag, non-streaming cases
    3 FAIL Fatal flag, streaming cases assert_equals: expected "\0" but got ""
     3PASS Fatal flag, streaming cases
    44
  • trunk/Source/WebCore/ChangeLog

    r266667 r266668  
     12020-09-05  Alex Christensen  <achristensen@webkit.org>
     2
     3        TextDecoder should properly handle streams
     4        https://bugs.webkit.org/show_bug.cgi?id=216202
     5
     6        Reviewed by Darin Adler.
     7
     8        A TextCodec keeps state when it decodes part of valid input, such as the first byte of a multibyte sequence.
     9        TextEncoding::decode makes a new TextCodec and throws away that state.
     10        In order to properly handle streaming, we need to keep the TextCodec and call TextCodec::decode directly.
     11
     12        Covered by newly passing web platform tests.  I also added a test that failed in my first implementation attempt
     13        but passes now in WebKit as well as Chromium.  Firefox hasn't implemented TextDecoderStream yet, but this test will
     14        hopefully help them not make the same mistake I did.
     15
     16        * dom/TextDecoder.cpp:
     17        (WebCore::TextDecoder::decode):
     18        (WebCore::codeUnitByteSize): Deleted.
     19        * dom/TextDecoder.h:
     20
    1212020-09-05  Myles C. Maxfield  <mmaxfield@apple.com>
    222
  • trunk/Source/WebCore/dom/TextDecoder.cpp

    r266528 r266668  
    2727
    2828#include "HTMLParserIdioms.h"
     29#include "TextCodec.h"
     30#include "TextEncodingRegistry.h"
    2931#include <wtf/Optional.h>
    3032
    3133namespace WebCore {
     34
     35TextDecoder::~TextDecoder() = default;
    3236
    3337ExceptionOr<Ref<TextDecoder>> TextDecoder::create(const String& label, Options options)
     
    118122}
    119123
    120 static size_t codeUnitByteSize(const TextEncoding& encoding)
    121 {
    122     return encoding.isByteBasedEncoding() ? 1 : 2;
    123 }
    124 
    125124ExceptionOr<String> TextDecoder::decode(Optional<BufferSource::VariantType> input, DecodeOptions options)
    126125{
     
    152151    }
    153152
    154     const bool stopOnError = true;
     153    auto oldBuffer = std::exchange(m_buffer, { });
     154
     155    if (!m_codec)
     156        m_codec = newTextCodec(m_textEncoding);
     157
    155158    bool sawError = false;
    156     if (length % codeUnitByteSize(m_textEncoding))
    157         sawError = true;
    158     const char* charData = reinterpret_cast<const char*>(data);
    159     String result;
    160     if (!sawError)
    161         result = m_textEncoding.decode(charData, length, stopOnError, sawError);
    162 
    163     if (sawError) {
    164         if (options.stream) {
    165             result = String();
    166             if (!m_buffer.size())
    167                 m_buffer.append(data, length);
    168         } else {
    169             if (m_options.fatal)
    170                 return Exception { TypeError };
    171             result = m_textEncoding.decode(charData, length);
    172         }
    173     } else
    174         m_buffer.clear();
    175 
     159    String result = m_codec->decode(reinterpret_cast<const char*>(data), length, !options.stream, false, sawError);
     160    if (sawError && m_options.fatal)
     161        return Exception { TypeError };
    176162    return result;
    177163}
  • trunk/Source/WebCore/dom/TextDecoder.h

    r266528 r266668  
    3333namespace WebCore {
    3434
     35class TextCodec;
     36
    3537class TextDecoder : public RefCounted<TextDecoder> {
    3638public:
     
    4446   
    4547    static ExceptionOr<Ref<TextDecoder>> create(const String& label, Options);
     48    ~TextDecoder();
    4649
    4750    String encoding() const;
     
    5861    bool isBeginningOfIncompleteBOM(const uint8_t*, size_t) const;
    5962
    60     TextEncoding m_textEncoding;
     63    const TextEncoding m_textEncoding;
     64    std::unique_ptr<TextCodec> m_codec;
    6165    Options m_options;
    6266    Vector<uint8_t> m_buffer;
Note: See TracChangeset for help on using the changeset viewer.