Changeset 202599 in webkit


Timestamp:
Jun 28, 2016 6:04:05 PM (8 years ago)
Author:
jiewen_tan@apple.com
Message:

Implement "replacement" codec
https://bugs.webkit.org/show_bug.cgi?id=159180
<rdar://problem/26015178>

Reviewed by Brent Fulgham.

LayoutTests/imported/w3c:

  • web-platform-tests/dom/nodes/Document-characterSet-normalization-expected.txt:

Source/WebCore:

Test: fast/encoding/charset-replacement.html

Add support for "replacement" codec according to the spec:
https://encoding.spec.whatwg.org/#replacement
According to the spec, encoding labels {"csiso2022kr", "hz-gb-2312", "iso-2022-cn",
"iso-2022-cn-ext", "iso-2022-kr"} are used to conduct certain attacks that abuse
a mismatch between encodings supported on the server and the client. Therefore,
they are grouped under the "replacement" codec, which does the following things
to prevent those attacks.
1) Decode: terminates with a single U+FFFD.
2) Encode: treated as UTF-8.
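
The two behaviours listed above can be illustrated with a minimal, standalone sketch. It is not the code added in TextCodecReplacement.cpp; the names `ReplacementDecoderSketch` and `encodeUtf8` are hypothetical, and the encoder assumes it is handed a valid Unicode scalar value.

```cpp
#include <string>

// Hypothetical sketch; not WebKit's TextCodecReplacement.
class ReplacementDecoderSketch {
public:
    // Emits U+FFFD for the first non-empty chunk and nothing afterwards,
    // regardless of what the input bytes are.
    std::u16string decode(const std::string& bytes)
    {
        if (m_sentReplacement || bytes.empty())
            return {};
        m_sentReplacement = true;
        return std::u16string(1, u'\uFFFD');
    }

private:
    bool m_sentReplacement { false };
};

// Plain UTF-8 encoder for one code point, standing in for "treated as UTF-8".
// Assumes a valid scalar value; surrogates are not checked in this sketch.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

This mirrors the layout tests in this changeset: the bytes %41%42%43%61%62%63%31%32%33%A0 decode to a single U+FFFD, and U+00A0 encodes to %C2%A0.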

Furthermore, the "replacement" codec is a specification convenience to group those
vulnerable encoding labels. Therefore, "replacement" itself should not be usable directly as an encoding name.
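
The rule that the five labels are accepted while the name "replacement" itself is rejected can be sketched as follows. This is a simplified illustration, assuming C++17; `asciiLower` and `resolveLabelSketch` are hypothetical names, and the sketch knows only the five labels from this change rather than the full encoding registry.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: ASCII-lowercases a label for case-insensitive matching.
std::string asciiLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns "replacement" for the five legacy labels, and no value for anything
// else, including the literal name "replacement".
std::optional<std::string> resolveLabelSketch(const std::string& label)
{
    static const std::array<const char*, 5> replacementLabels = {
        "csiso2022kr", "hz-gb-2312", "iso-2022-cn", "iso-2022-cn-ext", "iso-2022-kr"
    };
    const std::string lowered = asciiLower(label);
    for (const char* candidate : replacementLabels) {
        if (lowered == candidate)
            return std::string("replacement");
    }
    return std::nullopt;
}
```

In the actual change the check lives in the TextEncoding constructors, which clear the name when the requested name is "replacement"; the sketch folds that into a single lookup for brevity.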

This change is based on the following Blink changes:
https://codereview.chromium.org/265973003, and
https://codereview.chromium.org/261013007.

  • CMakeLists.txt:
  • WebCore.xcodeproj/project.pbxproj:
  • platform/text/TextAllInOne.cpp:
  • platform/text/TextCodecReplacement.cpp: Added.

(WebCore::TextCodecReplacement::create):
(WebCore::TextCodecReplacement::TextCodecReplacement):
(WebCore::TextCodecReplacement::registerEncodingNames):
(WebCore::TextCodecReplacement::registerCodecs):
(WebCore::TextCodecReplacement::decode):

  • platform/text/TextCodecReplacement.h: Added.
  • platform/text/TextEncoding.cpp:

(WebCore::TextEncoding::TextEncoding):

  • platform/text/TextEncodingRegistry.cpp:

(WebCore::isReplacementEncoding):
(WebCore::extendTextCodecMaps):

  • platform/text/TextEncodingRegistry.h:

LayoutTests:

  • fast/encoding/char-decoding-expected.txt:
  • fast/encoding/char-decoding.html:
  • fast/encoding/char-encoding-expected.txt:
  • fast/encoding/char-encoding.html:
  • fast/encoding/charset-replacement-expected.txt: Added.
  • fast/encoding/charset-replacement.html: Added.
Location:
trunk
Files:
4 added
14 edited

  • trunk/LayoutTests/ChangeLog

    r202597 r202599  
    +2016-06-28  Jiewen Tan  <jiewen_tan@apple.com>
    +
    +        Implement "replacement" codec
    +        https://bugs.webkit.org/show_bug.cgi?id=159180
    +        <rdar://problem/26015178>
    +
    +        Reviewed by Brent Fulgham.
    +
    +        * fast/encoding/char-decoding-expected.txt:
    +        * fast/encoding/char-decoding.html:
    +        * fast/encoding/char-encoding-expected.txt:
    +        * fast/encoding/char-encoding.html:
    +        * fast/encoding/charset-replacement-expected.txt: Added.
    +        * fast/encoding/charset-replacement.html: Added.
    +
     2016-06-28  Michael Saboff  <msaboff@apple.com>
     
  • trunk/LayoutTests/fast/encoding/char-decoding-expected.txt

    r84473 r202599  
     PASS decode('UTF-16BE', '%D8%69%DE%D6') is 'U+D869/U+DED6'
     PASS decode('unicodeFFFE', '%D8%69%DE%D6') is 'U+D869/U+DED6'
    +PASS decode('csiso2022kr', '%41%42%43%61%62%63%31%32%33%A0') is 'U+FFFD'
    +PASS decode('hz-gb-2312', '%41%42%43%61%62%63%31%32%33%A0') is 'U+FFFD'
    +PASS decode('iso-2022-cn', '%41%42%43%61%62%63%31%32%33%A0') is 'U+FFFD'
    +PASS decode('iso-2022-cn-ext', '%41%42%43%61%62%63%31%32%33%A0') is 'U+FFFD'
    +PASS decode('iso-2022-kr', '%41%42%43%61%62%63%31%32%33%A0') is 'U+FFFD'
     PASS successfullyParsed is true
     
  • trunk/LayoutTests/fast/encoding/char-decoding.html

    r155267 r202599  
     testDecode('unicodeFFFE', '%D8%69%DE%D6', 'U+D869/U+DED6');
     
    +// Replacement encodings should decode as replacement (U+FFFD) then EOF
    +testDecode("csiso2022kr", "%41%42%43%61%62%63%31%32%33%A0", "U+FFFD");
    +testDecode("hz-gb-2312", "%41%42%43%61%62%63%31%32%33%A0", "U+FFFD");
    +testDecode("iso-2022-cn", "%41%42%43%61%62%63%31%32%33%A0", "U+FFFD");
    +testDecode("iso-2022-cn-ext", "%41%42%43%61%62%63%31%32%33%A0", "U+FFFD");
    +testDecode("iso-2022-kr", "%41%42%43%61%62%63%31%32%33%A0", "U+FFFD");
    +
     </script>
     <script src="../../resources/js-test-post.js"></script>
  • trunk/LayoutTests/fast/encoding/char-encoding-expected.txt

    r64817 r202599  
     PASS encode('GBK', 'U+22EF') is '%A1%AD'
     PASS encode('GBK', 'U+301C') is '%A1%AB'
    +PASS encode('csiso2022kr', 'U+00A0') is '%C2%A0'
    +PASS encode('hz-gb-2312', 'U+00A0') is '%C2%A0'
    +PASS encode('iso-2022-cn', 'U+00A0') is '%C2%A0'
    +PASS encode('iso-2022-cn-ext', 'U+00A0') is '%C2%A0'
    +PASS encode('iso-2022-kr', 'U+00A0') is '%C2%A0'
     PASS successfullyParsed is true
     
  • trunk/LayoutTests/fast/encoding/char-encoding.html

    r155267 r202599  
     testEncode('GBK', 'U+22EF', '%A1%AD');
     testEncode('GBK', 'U+301C', '%A1%AB');
    +// Replacement encodings - should encode as UTF-8
    +testEncode("csiso2022kr", "U+00A0", "%C2%A0");
    +testEncode("hz-gb-2312", "U+00A0", "%C2%A0");
    +testEncode("iso-2022-cn", "U+00A0", "%C2%A0");
    +testEncode("iso-2022-cn-ext", "U+00A0", "%C2%A0");
    +testEncode("iso-2022-kr", "U+00A0", "%C2%A0");
     
     // Turning on this test causes a download to occur. FIXME: A bug?
  • trunk/LayoutTests/imported/w3c/ChangeLog

    r202542 r202599  
    +2016-06-28  Jiewen Tan  <jiewen_tan@apple.com>
    +
    +        Implement "replacement" codec
    +        https://bugs.webkit.org/show_bug.cgi?id=159180
    +        <rdar://problem/26015178>
    +
    +        Reviewed by Brent Fulgham.
    +
    +        * web-platform-tests/dom/nodes/Document-characterSet-normalization-expected.txt:
    +
     2016-06-27  Youenn Fablet  <youenn@apple.com>
     
  • trunk/LayoutTests/imported/w3c/web-platform-tests/dom/nodes/Document-characterSet-normalization-expected.txt

    r202471 r202599  
     PASS Name "EUC-KR" has label "windows-949" (inputEncoding)
     PASS Name "EUC-KR" has label "windows-949" (charset)
    -FAIL Name "replacement" has label "csiso2022kr" (characterSet) assert_equals: expected "replacement" but got "ISO-2022-KR"
    -FAIL Name "replacement" has label "csiso2022kr" (inputEncoding) assert_equals: expected "replacement" but got "ISO-2022-KR"
    -FAIL Name "replacement" has label "csiso2022kr" (charset) assert_equals: expected "replacement" but got "ISO-2022-KR"
    -FAIL Name "replacement" has label "hz-gb-2312" (characterSet) assert_equals: expected "replacement" but got "HZ-GB-2312"
    -FAIL Name "replacement" has label "hz-gb-2312" (inputEncoding) assert_equals: expected "replacement" but got "HZ-GB-2312"
    -FAIL Name "replacement" has label "hz-gb-2312" (charset) assert_equals: expected "replacement" but got "HZ-GB-2312"
    -FAIL Name "replacement" has label "iso-2022-cn" (characterSet) assert_equals: expected "replacement" but got "ISO-2022-CN"
    -FAIL Name "replacement" has label "iso-2022-cn" (inputEncoding) assert_equals: expected "replacement" but got "ISO-2022-CN"
    -FAIL Name "replacement" has label "iso-2022-cn" (charset) assert_equals: expected "replacement" but got "ISO-2022-CN"
    -FAIL Name "replacement" has label "iso-2022-cn-ext" (characterSet) assert_equals: expected "replacement" but got "ISO-2022-CN-EXT"
    -FAIL Name "replacement" has label "iso-2022-cn-ext" (inputEncoding) assert_equals: expected "replacement" but got "ISO-2022-CN-EXT"
    -FAIL Name "replacement" has label "iso-2022-cn-ext" (charset) assert_equals: expected "replacement" but got "ISO-2022-CN-EXT"
    -FAIL Name "replacement" has label "iso-2022-kr" (characterSet) assert_equals: expected "replacement" but got "ISO-2022-KR"
    -FAIL Name "replacement" has label "iso-2022-kr" (inputEncoding) assert_equals: expected "replacement" but got "ISO-2022-KR"
    -FAIL Name "replacement" has label "iso-2022-kr" (charset) assert_equals: expected "replacement" but got "ISO-2022-KR"
    +PASS Name "replacement" has label "csiso2022kr" (characterSet)
    +PASS Name "replacement" has label "csiso2022kr" (inputEncoding)
    +PASS Name "replacement" has label "csiso2022kr" (charset)
    +PASS Name "replacement" has label "hz-gb-2312" (characterSet)
    +PASS Name "replacement" has label "hz-gb-2312" (inputEncoding)
    +PASS Name "replacement" has label "hz-gb-2312" (charset)
    +PASS Name "replacement" has label "iso-2022-cn" (characterSet)
    +PASS Name "replacement" has label "iso-2022-cn" (inputEncoding)
    +PASS Name "replacement" has label "iso-2022-cn" (charset)
    +PASS Name "replacement" has label "iso-2022-cn-ext" (characterSet)
    +PASS Name "replacement" has label "iso-2022-cn-ext" (inputEncoding)
    +PASS Name "replacement" has label "iso-2022-cn-ext" (charset)
    +PASS Name "replacement" has label "iso-2022-kr" (characterSet)
    +PASS Name "replacement" has label "iso-2022-kr" (inputEncoding)
    +PASS Name "replacement" has label "iso-2022-kr" (charset)
     
  • trunk/Source/WebCore/CMakeLists.txt

    r202408 r202599  
         platform/text/TextCodecICU.cpp
         platform/text/TextCodecLatin1.cpp
    +    platform/text/TextCodecReplacement.cpp
         platform/text/TextCodecUTF16.cpp
         platform/text/TextCodecUTF8.cpp
  • trunk/Source/WebCore/ChangeLog

    r202592 r202599  
    +2016-06-28  Jiewen Tan  <jiewen_tan@apple.com>
    +
    +        Implement "replacement" codec
    +        https://bugs.webkit.org/show_bug.cgi?id=159180
    +        <rdar://problem/26015178>
    +
    +        Reviewed by Brent Fulgham.
    +
    +        Test: fast/encoding/charset-replacement.html
    +
    +        Add support for "replacement" codec according to the spec:
    +        https://encoding.spec.whatwg.org/#replacement
    +        According to the spec, encoding labels {"csiso2022kr", "hz-gb-2312", "iso-2022-cn",
    +        "iso-2022-cn-ext", "iso-2022-kr"} are used to conduct certain attacks that abuse
    +        a mismatch between encodings supported on the server and the client. Therefore,
    +        they are grouped under the "replacement" codec, which does the following things
    +        to prevent those attacks.
    +        1) Decode: terminates with a single U+FFFD.
    +        2) Encode: treated as UTF-8.
    +
    +        Furthermore, the "replacement" codec is a specification convenience to group those
    +        vulnerable encoding labels. Therefore, it should not be able to use directly.
    +
    +        This change is based on the following Blink changes:
    +        https://codereview.chromium.org/265973003, and
    +        https://codereview.chromium.org/261013007.
    +
    +        * CMakeLists.txt:
    +        * WebCore.xcodeproj/project.pbxproj:
    +        * platform/text/TextAllInOne.cpp:
    +        * platform/text/TextCodecReplacement.cpp: Added.
    +        (WebCore::TextCodecReplacement::create):
    +        (WebCore::TextCodecReplacement::TextCodecReplacement):
    +        (WebCore::TextCodecReplacement::registerEncodingNames):
    +        (WebCore::TextCodecReplacement::registerCodecs):
    +        (WebCore::TextCodecReplacement::decode):
    +        * platform/text/TextCodecReplacement.h: Added.
    +        * platform/text/TextEncoding.cpp:
    +        (WebCore::TextEncoding::TextEncoding):
    +        * platform/text/TextEncodingRegistry.cpp:
    +        (WebCore::isReplacementEncoding):
    +        (WebCore::extendTextCodecMaps):
    +        * platform/text/TextEncodingRegistry.h:
    +
     2016-06-28  Dean Jackson  <dino@apple.com>
     
  • trunk/Source/WebCore/WebCore.xcodeproj/project.pbxproj

    r202581 r202599  
                     572A7F211C6E5719009C6149 /* SimulatedClick.h in Headers */ = {isa = PBXBuildFile; fileRef = 572A7F201C6E5719009C6149 /* SimulatedClick.h */; };
                     572A7F231C6E5A66009C6149 /* SimulatedClick.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 572A7F221C6E5A66009C6149 /* SimulatedClick.cpp */; };
    +                57EF5E601D20C83900171E60 /* TextCodecReplacement.h in Headers */ = {isa = PBXBuildFile; fileRef = 57EF5E5F1D20C83900171E60 /* TextCodecReplacement.h */; };
    +                57EF5E621D20D28700171E60 /* TextCodecReplacement.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 57EF5E611D20D28700171E60 /* TextCodecReplacement.cpp */; };
                     580371611A66F00A00BAF519 /* ClipRect.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 5803715F1A66F00A00BAF519 /* ClipRect.cpp */; };
                     580371621A66F00A00BAF519 /* ClipRect.h in Headers */ = {isa = PBXBuildFile; fileRef = 580371601A66F00A00BAF519 /* ClipRect.h */; settings = {ATTRIBUTES = (Private, ); }; };

                     572A7F201C6E5719009C6149 /* SimulatedClick.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = SimulatedClick.h; sourceTree = "<group>"; };
                     572A7F221C6E5A66009C6149 /* SimulatedClick.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = SimulatedClick.cpp; sourceTree = "<group>"; };
    +                57EF5E5F1D20C83900171E60 /* TextCodecReplacement.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = TextCodecReplacement.h; sourceTree = "<group>"; };
    +                57EF5E611D20D28700171E60 /* TextCodecReplacement.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = TextCodecReplacement.cpp; sourceTree = "<group>"; };
                     5803715F1A66F00A00BAF519 /* ClipRect.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = ClipRect.cpp; sourceTree = "<group>"; };
                     580371601A66F00A00BAF519 /* ClipRect.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = ClipRect.h; sourceTree = "<group>"; };

                                     B2C3DA0D0D006C1D00EF6F26 /* TextCodecLatin1.cpp */,
                                     B2C3DA0E0D006C1D00EF6F26 /* TextCodecLatin1.h */,
    +                                57EF5E611D20D28700171E60 /* TextCodecReplacement.cpp */,
    +                                57EF5E5F1D20C83900171E60 /* TextCodecReplacement.h */,
                                     B2C3DA0F0D006C1D00EF6F26 /* TextCodecUserDefined.cpp */,
                                     B2C3DA100D006C1D00EF6F26 /* TextCodecUserDefined.h */,

                                     BC3BE9990E9C1E5D00835588 /* RenderScrollbarTheme.h in Headers */,
                                     458FE40A1589DF0B005609E6 /* RenderSearchField.h in Headers */,
    +                                57EF5E601D20C83900171E60 /* TextCodecReplacement.h in Headers */,
                                     0F11A54F0F39233100C37884 /* RenderSelectionInfo.h in Headers */,
                                     AB247A6D0AFD6383003FA5FD /* RenderSlider.h in Headers */,

                                     D359D789129CA2710006E5D2 /* HTMLDetailsElement.cpp in Sources */,
                                     A8EA79F90A1916DF00A8EF5F /* HTMLDirectoryElement.cpp in Sources */,
    +                                57EF5E621D20D28700171E60 /* TextCodecReplacement.cpp in Sources */,
                                     A8EA7CB10A192B9C00A8EF5F /* HTMLDivElement.cpp in Sources */,
                                     A8EA79F50A1916DF00A8EF5F /* HTMLDListElement.cpp in Sources */,
  • trunk/Source/WebCore/platform/text/TextAllInOne.cpp

    r165676 r202599  
     #include "TextCodecICU.cpp"
     #include "TextCodecLatin1.cpp"
    +#include "TextCodecReplacement.cpp"
     #include "TextCodecUTF16.cpp"
     #include "TextCodecUTF8.cpp"
  • trunk/Source/WebCore/platform/text/TextEncoding.cpp

    r177280 r202599  
         , m_backslashAsCurrencySymbol(backslashAsCurrencySymbol())
     {
    +    // Aliases are valid, but not "replacement" itself.
    +    if (m_name && isReplacementEncoding(name))
    +        m_name = nullptr;
     }
     

         , m_backslashAsCurrencySymbol(backslashAsCurrencySymbol())
     {
    +    // Aliases are valid, but not "replacement" itself.
    +    if (m_name && isReplacementEncoding(name))
    +        m_name = nullptr;
     }
     
  • trunk/Source/WebCore/platform/text/TextEncodingRegistry.cpp

    r195452 r202599  
     #include "TextCodecICU.h"
     #include "TextCodecLatin1.h"
    +#include "TextCodecReplacement.h"
     #include "TextCodecUserDefined.h"
     #include "TextCodecUTF16.h"

     }
     
    +bool isReplacementEncoding(const char* alias)
    +{
    +    if (!alias)
    +        return false;
    +
    +    if (strlen(alias) != 11)
    +        return false;
    +
    +    return !strcasecmp(alias, "replacement");
    +}
    +
    +bool isReplacementEncoding(const String& alias)
    +{
    +    return equalLettersIgnoringASCIICase(alias, "replacement");
    +}
    +
     bool shouldShowBackslashAsCurrencySymbolIn(const char* canonicalEncodingName)
     {

     static void extendTextCodecMaps()
     {
    +    TextCodecReplacement::registerEncodingNames(addToTextEncodingNameMap);
    +    TextCodecReplacement::registerCodecs(addToTextCodecMap);
    +
         TextCodecICU::registerEncodingNames(addToTextEncodingNameMap);
         TextCodecICU::registerCodecs(addToTextCodecMap);
  • trunk/Source/WebCore/platform/text/TextEncodingRegistry.h

    r174757 r202599  
         bool isJapaneseEncoding(const char* canonicalEncodingName);
         bool shouldShowBackslashAsCurrencySymbolIn(const char* canonicalEncodingName);
    +    bool isReplacementEncoding(const char* alias);
    +    bool isReplacementEncoding(const String& alias);
     
         WEBCORE_EXPORT String defaultTextEncodingNameForSystemLanguage();