Changeset 208902 in webkit


Timestamp:
Nov 18, 2016 2:47:24 PM (7 years ago)
Author:
achristensen@apple.com
Message:

Support IDN2008 with UTS #46 instead of IDN2003
https://bugs.webkit.org/show_bug.cgi?id=144194

Reviewed by Darin Adler.

Source/WebCore:

Use uidna_nameToASCII instead of the deprecated uidna_IDNToASCII.
It uses IDN2008 instead of IDN2003, and it uses UTS #46 when used with a UIDNA opened with uidna_openUTS46.
This follows https://url.spec.whatwg.org/#concept-domain-to-ascii except that we do not use Transitional_Processing,
in order to prevent homograph attacks on German domain names containing "ß" and "ss". These are now treated as separate domains.
Firefox also doesn't use Transitional_Processing. Chrome and the current specification use Transitional_Processing,
but https://github.com/whatwg/url/issues/110 might change the spec.
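
To make the difference concrete, here is a minimal sketch of what Transitional_Processing does to the four UTS #46 deviation characters. It is an illustration only and not code from this patch: the function name applyTransitionalDeviations is hypothetical, and the real mapping is done inside ICU by uidna_nameToASCII.

```cpp
#include <string>

// Hypothetical helper (not part of WebKit or ICU): applies only the four
// UTS #46 "deviation" mappings that Transitional_Processing performs.
// The full UTS #46 mapping table is much larger and is not reproduced here.
std::u32string applyTransitionalDeviations(const std::u32string& label)
{
    std::u32string result;
    for (char32_t c : label) {
        switch (c) {
        case U'\u00DF': // LATIN SMALL LETTER SHARP S is mapped to "ss".
            result += U"ss";
            break;
        case U'\u03C2': // GREEK SMALL LETTER FINAL SIGMA is mapped to sigma.
            result += U'\u03C3';
            break;
        case U'\u200C': // ZERO WIDTH NON-JOINER is removed.
        case U'\u200D': // ZERO WIDTH JOINER is removed.
            break;
        default:
            result += c;
        }
    }
    return result;
}
```

With transitional processing, "faß" and "fass" collapse to the same label. Without it (the behavior chosen here), the label is left alone and is later encoded as its own Punycode name.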

In addition, http://unicode.org/reports/tr46/ says:
"implementations are encouraged to apply the Bidi and ContextJ validity criteria"
Bidi checks prevent domain names with bidirectional text, such as Latin and Hebrew characters in the same domain. Chrome and Firefox do this.
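
As a rough sketch of the idea behind the Bidi check, the code below flags a single label that mixes strongly left-to-right and strongly right-to-left code points. It is deliberately coarse and rests on stated assumptions: the names coarseDirection and labelMixesDirections are hypothetical, the direction classes are approximated by code point ranges instead of the real Unicode Bidi_Class property, and the full Bidi rule in RFC 5893 has more conditions than this.

```cpp
#include <string>

enum class Direction { Neutral, LeftToRight, RightToLeft };

// Very coarse stand-in for the Unicode Bidi_Class property (assumption):
// ASCII letters count as left-to-right, and everything in U+0590..U+08FF
// (Hebrew, Arabic, Syriac, Thaana and related blocks) counts as right-to-left.
Direction coarseDirection(char32_t c)
{
    if ((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z'))
        return Direction::LeftToRight;
    if (c >= 0x0590 && c <= 0x08FF)
        return Direction::RightToLeft;
    return Direction::Neutral;
}

// Returns true if one label contains both directions, which the Bidi rule
// of RFC 5893 does not allow within a label.
bool labelMixesDirections(const std::u32string& label)
{
    bool sawLeftToRight = false;
    bool sawRightToLeft = false;
    for (char32_t c : label) {
        switch (coarseDirection(c)) {
        case Direction::LeftToRight:
            sawLeftToRight = true;
            break;
        case Direction::RightToLeft:
            sawRightToLeft = true;
            break;
        case Direction::Neutral:
            break;
        }
    }
    return sawLeftToRight && sawRightToLeft;
}
```

Under these assumptions, a label such as "bidirectional" followed by Thaana letters is flagged, while an all-Hebrew label is not.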

ContextJ checks prevent misuse of code points such as U+200D, a zero-width joiner that users would not see when looking at the domain name.
Firefox currently enables ContextJ checks and UTS #46 suggests them, so we'll do it.
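
The ZWJ part of the ContextJ check can be sketched as follows. This is a simplified illustration of the rule in RFC 5892 Appendix A.2 (U+200D is only valid directly after a virama), not the ICU implementation. The function names are hypothetical, and the virama list is a small hand-picked sample; real code would look up Canonical_Combining_Class in Unicode data.

```cpp
#include <cstddef>
#include <string>

// Small, incomplete sample of code points whose Canonical_Combining_Class is
// Virama (assumption for illustration only).
bool isSampleVirama(char32_t c)
{
    return c == 0x094D  // DEVANAGARI SIGN VIRAMA
        || c == 0x09CD  // BENGALI SIGN VIRAMA
        || c == 0x0DCA; // SINHALA SIGN AL-LAKUNA
}

// Returns false if any U+200D ZERO WIDTH JOINER in the label is not directly
// preceded by a virama, following the ZWJ rule in RFC 5892 Appendix A.2.
bool zeroWidthJoinersAreInContext(const std::u32string& label)
{
    for (std::size_t i = 0; i < label.size(); ++i) {
        if (label[i] != 0x200D)
            continue;
        if (!i || !isSampleVirama(label[i - 1]))
            return false;
    }
    return true;
}
```

With this sketch, "contextj" followed by U+200D is rejected, while a Sinhala label in which U+200D follows U+0DCA is accepted.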

ContextO checks, which neither we nor any other browser nor the spec use, would fail if a domain contains code points such as U+30FB,
which looks somewhat like a dot. We can investigate enabling these checks later.

Covered by new API tests and rebased LayoutTests.
The new API tests verify that we do not use transitional processing and that we apply the Bidi and ContextJ checks but not the ContextO checks.

  • platform/URLParser.cpp:

(WebCore::URLParser::domainToASCII):
(WebCore::URLParser::internationalDomainNameTranscoder):

  • platform/URLParser.h:
  • platform/mac/WebCoreNSURLExtras.mm:

(WebCore::mapHostNameWithRange):

Tools:

  • TestWebKitAPI/Tests/WebCore/URLParser.cpp:

(TestWebKitAPI::TEST_F):
Add some tests from http://unicode.org/faq/idn.html verifying that we follow UTS46's deviations from IDN2008.
Add some tests based on https://tools.ietf.org/html/rfc5893 verifying that we check for bidirectional text.
Add a test based on https://tools.ietf.org/html/rfc5892 verifying that we do not perform ContextO checks.
Add a test for U+321D and U+321E which have particularly interesting punycode encodings. We match Firefox here now.
Also add a test from http://www.unicode.org/reports/tr46/#IDNAComparison verifying we are not using IDN2003.
We should consider importing all of http://www.unicode.org/Public/idna/9.0.0/IdnaTest.txt as URL domain tests.
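
For readers who want to see where host names such as xn--fa-hia.de in these tests come from, here is a minimal sketch of the Punycode encoding step from RFC 3492. It is not code from this patch (ICU does this internally). The function name punycodeEncode is hypothetical, the input is assumed to be an already mapped and validated label, and overflow checks are omitted for brevity.

```cpp
#include <cstdint>
#include <string>

namespace {

// Bootstring parameters for Punycode, as given in RFC 3492.
constexpr uint32_t base = 36, tMin = 1, tMax = 26, skew = 38, damp = 700;
constexpr uint32_t initialBias = 72, initialN = 128;

// Maps a digit value 0..35 to 'a'..'z', '0'..'9'.
char encodeDigit(uint32_t d)
{
    return static_cast<char>(d < 26 ? 'a' + d : '0' + (d - 26));
}

// Bias adaptation function from RFC 3492 section 6.1.
uint32_t adapt(uint32_t delta, uint32_t numPoints, bool firstTime)
{
    delta = firstTime ? delta / damp : delta / 2;
    delta += delta / numPoints;
    uint32_t k = 0;
    while (delta > ((base - tMin) * tMax) / 2) {
        delta /= base - tMin;
        k += base;
    }
    return k + (base - tMin + 1) * delta / (delta + skew);
}

} // namespace

// Encodes one label; the caller would prepend "xn--" to the result.
std::string punycodeEncode(const std::u32string& input)
{
    std::string output;
    for (char32_t c : input) {
        if (c < 0x80)
            output += static_cast<char>(c); // Copy basic (ASCII) code points.
    }
    const uint32_t basicCount = static_cast<uint32_t>(output.size());
    uint32_t handled = basicCount;
    if (basicCount)
        output += '-';

    uint32_t n = initialN, delta = 0, bias = initialBias;
    while (handled < input.size()) {
        // Find the smallest code point not yet handled.
        uint32_t m = UINT32_MAX;
        for (char32_t c : input) {
            if (c >= n && c < m)
                m = c;
        }
        delta += (m - n) * (handled + 1);
        n = m;
        for (char32_t c : input) {
            if (c < n)
                ++delta;
            if (c != n)
                continue;
            // Emit delta as a generalized variable-length integer.
            uint32_t q = delta;
            for (uint32_t k = base;; k += base) {
                uint32_t t = k <= bias ? tMin : (k >= bias + tMax ? tMax : k - bias);
                if (q < t)
                    break;
                output += encodeDigit(t + (q - t) % (base - t));
                q = (q - t) / (base - t);
            }
            output += encodeDigit(q);
            bias = adapt(delta, handled + 1, handled == basicCount);
            delta = 0;
            ++handled;
        }
        ++delta;
        ++n;
    }
    return output;
}
```

For example, encoding the label "faß" with this sketch yields "fa-hia", which with the "xn--" prefix gives the xn--fa-hia.de expected by the new tests.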

LayoutTests:

  • fast/encoding/idn-security.html:

Move some characters with changed IDN encodings to inside the check for old ICU.

  • fast/url/idna2003-expected.txt:
  • fast/url/idna2008-expected.txt:

Update expected results. We are now more compliant with IDN2008.

Location:
trunk
Files:
11 edited

  • trunk/LayoutTests/ChangeLog

    r208900 r208902  
    @@ -1,2 +1,15 @@
    +2016-11-17  Alex Christensen  <achristensen@webkit.org>
    +
    +        Support IDN2008 with UTS #46 instead of IDN2003
    +        https://bugs.webkit.org/show_bug.cgi?id=144194
    +
    +        Reviewed by Darin Adler.
    +
    +        * fast/encoding/idn-security.html:
    +        Move some characters with changed IDN encodings to inside the check for old ICU.
    +        * fast/url/idna2003-expected.txt:
    +        * fast/url/idna2008-expected.txt:
    +        Update expected results.  We are now more compliant with IDN2008.
    +
     2016-11-18  Ryan Haddad  <ryanhaddad@apple.com>
     
  • trunk/LayoutTests/fast/encoding/idn-security-expected.txt

    r45254 r208902  
    @@ -35,8 +35,4 @@
     PASS testIDNRoundTrip(0xa000) is '%uA000'
     PASS testIDNRoundTripNotFirstCharacter(0xa000) is '%uA000'
    -PASS testIDNRoundTrip(0x2024) is '.'
    -PASS testIDNRoundTripNotFirstCharacter(0x2024) is '.'
    -PASS testIDNRoundTrip(0xfe52) is '.'
    -PASS testIDNRoundTripNotFirstCharacter(0xfe52) is '.'
     PASS testIDNRoundTrip(0xff0f) is '/'
     PASS testIDNRoundTripNotFirstCharacter(0xff0f) is '/'
    @@ -87,24 +83,4 @@
     PASS testIDNRoundTrip(0x261) is 'punycode'
     PASS testIDNRoundTripNotFirstCharacter(0x261) is 'punycode'
    -PASS testIDNRoundTrip(0x337) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x337) is 'punycode'
    -PASS testIDNRoundTrip(0x337) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x337) is 'punycode'
    -PASS testIDNRoundTrip(0x338) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x338) is 'punycode'
    -PASS testIDNRoundTrip(0x338) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x338) is 'punycode'
    -PASS testIDNRoundTrip(0x5b4) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x5b4) is 'punycode'
    -PASS testIDNRoundTrip(0x5bc) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x5bc) is 'punycode'
    -PASS testIDNRoundTrip(0x660) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x660) is 'punycode'
    -PASS testIDNRoundTrip(0x6f0) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x6f0) is 'punycode'
    -PASS testIDNRoundTrip(0x115f) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x115f) is 'punycode'
    -PASS testIDNRoundTrip(0x1160) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x1160) is 'punycode'
     PASS testIDNRoundTrip(0x2027) is 'punycode'
     PASS testIDNRoundTripNotFirstCharacter(0x2027) is 'punycode'
    @@ -163,10 +139,4 @@
     PASS testIDNRoundTrip(0x3035) is 'punycode'
     PASS testIDNRoundTripNotFirstCharacter(0x3035) is 'punycode'
    -PASS testIDNRoundTrip(0x3164) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x3164) is 'punycode'
    -PASS testIDNRoundTrip(0x321d) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x321d) is 'punycode'
    -PASS testIDNRoundTrip(0x321e) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0x321e) is 'punycode'
     PASS testIDNRoundTrip(0x33ae) is 'punycode'
     PASS testIDNRoundTripNotFirstCharacter(0x33ae) is 'punycode'
    @@ -177,8 +147,4 @@
     PASS testIDNRoundTrip(0x33df) is 'punycode'
     PASS testIDNRoundTripNotFirstCharacter(0x33df) is 'punycode'
    -PASS testIDNRoundTrip(0xfe14) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0xfe14) is 'punycode'
    -PASS testIDNRoundTrip(0xfe15) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0xfe15) is 'punycode'
     PASS testIDNRoundTrip(0xfe3f) is 'punycode'
     PASS testIDNRoundTripNotFirstCharacter(0xfe3f) is 'punycode'
    @@ -187,6 +153,4 @@
     PASS testIDNRoundTrip(0xfe5e) is 'punycode'
     PASS testIDNRoundTripNotFirstCharacter(0xfe5e) is 'punycode'
    -PASS testIDNRoundTrip(0xffa0) is 'punycode'
    -PASS testIDNRoundTripNotFirstCharacter(0xffa0) is 'punycode'
     PASS testIDNEncode(0x2028) is '%u2028'
     PASS testIDNEncodeNotFirstCharacter(0x2028) is '%u2028'
    @@ -245,3 +209,39 @@
     PASS testIDNEncode(0xfeff) is '%uFEFF'
     PASS testIDNRoundTripNotFirstCharacter(0xfeff) is ''
    +PASS testIDNRoundTrip(0x2024) is '%u2024'
    +PASS testIDNRoundTripNotFirstCharacter(0x2024) is '%u2024'
    +PASS testIDNRoundTrip(0xfe52) is '%uFE52'
    +PASS testIDNRoundTripNotFirstCharacter(0xfe52) is '%uFE52'
    +PASS testIDNRoundTrip(0x337) is '%u0337'
    +PASS testIDNRoundTripNotFirstCharacter(0x337) is 'punycode'
    +PASS testIDNRoundTrip(0x337) is '%u0337'
    +PASS testIDNRoundTripNotFirstCharacter(0x337) is 'punycode'
    +PASS testIDNRoundTrip(0x338) is '%u0338'
    +PASS testIDNRoundTripNotFirstCharacter(0x338) is 'punycode'
    +PASS testIDNRoundTrip(0x338) is '%u0338'
    +PASS testIDNRoundTripNotFirstCharacter(0x338) is 'punycode'
    +PASS testIDNRoundTrip(0x5b4) is '%u05B4'
    +PASS testIDNRoundTripNotFirstCharacter(0x5b4) is 'punycode'
    +PASS testIDNRoundTrip(0x5bc) is '%u05BC'
    +PASS testIDNRoundTripNotFirstCharacter(0x5bc) is 'punycode'
    +PASS testIDNRoundTrip(0x660) is '%u0660'
    +PASS testIDNRoundTripNotFirstCharacter(0x660) is '%u0660'
    +PASS testIDNRoundTrip(0x6f0) is 'punycode'
    +PASS testIDNRoundTripNotFirstCharacter(0x6f0) is 'punycode'
    +PASS testIDNRoundTrip(0x115f) is '%u115F'
    +PASS testIDNRoundTripNotFirstCharacter(0x115f) is '%u115F'
    +PASS testIDNRoundTrip(0x1160) is '%u1160'
    +PASS testIDNRoundTripNotFirstCharacter(0x1160) is '%u1160'
    +PASS testIDNRoundTrip(0x3164) is '%u3164'
    +PASS testIDNRoundTripNotFirstCharacter(0x3164) is '%u3164'
    +PASS testIDNRoundTrip(0x321d) is '%28%uC624%uC804%29'
    +PASS testIDNRoundTripNotFirstCharacter(0x321d) is '%28%uC624%uC804%29'
    +PASS testIDNRoundTrip(0x321e) is '%28%uC624%uD6C4%29'
    +PASS testIDNRoundTripNotFirstCharacter(0x321e) is '%28%uC624%uD6C4%29'
    +PASS testIDNRoundTrip(0xfe14) is '%3B'
    +PASS testIDNRoundTripNotFirstCharacter(0xfe14) is '%3B'
    +PASS testIDNRoundTrip(0xfe15) is '%21'
    +PASS testIDNRoundTripNotFirstCharacter(0xfe15) is '%21'
    +PASS testIDNRoundTrip(0xffa0) is '%uFFA0'
    +PASS testIDNRoundTripNotFirstCharacter(0xffa0) is '%uFFA0'
     
  • trunk/LayoutTests/fast/encoding/idn-security.html

    r155267 r208902  
    @@ -135,6 +135,4 @@
     
     /* ICU converts these to other allowed characters, so the original character can't be used to get to a phishy domain name */
    -testIDNCharacter(0x2024, ".");
    -testIDNCharacter(0xFE52, ".");
     testIDNCharacter(0xFF0F, "/");
     
    @@ -169,14 +167,4 @@
     testIDNCharacter(0x0251, "disallowed");
     testIDNCharacter(0x0261, "disallowed");
    -testIDNCharacter(0x0337, "disallowed");
    -testIDNCharacter(0x0337, "disallowed");
    -testIDNCharacter(0x0338, "disallowed");
    -testIDNCharacter(0x0338, "disallowed");
    -testIDNCharacter(0x05B4, "disallowed");
    -testIDNCharacter(0x05BC, "disallowed");
    -testIDNCharacter(0x0660, "disallowed");
    -testIDNCharacter(0x06F0, "disallowed");
    -testIDNCharacter(0x115F, "disallowed");
    -testIDNCharacter(0x1160, "disallowed");
     testIDNCharacter(0x2027, "disallowed");
     testIDNCharacter(0x2039, "disallowed");
    @@ -207,17 +195,11 @@
     testIDNCharacter(0x3033, "disallowed");
     testIDNCharacter(0x3035, "disallowed");
    -testIDNCharacter(0x3164, "disallowed");
    -testIDNCharacter(0x321D, "disallowed");
    -testIDNCharacter(0x321E, "disallowed");
     testIDNCharacter(0x33AE, "disallowed");
     testIDNCharacter(0x33AF, "disallowed");
     testIDNCharacter(0x33C6, "disallowed");
     testIDNCharacter(0x33DF, "disallowed");
    -testIDNCharacter(0xFE14, "disallowed");
    -testIDNCharacter(0xFE15, "disallowed");
     testIDNCharacter(0xFE3F, "disallowed");
     testIDNCharacter(0xFE5D, "disallowed");
     testIDNCharacter(0xFE5E, "disallowed");
    -testIDNCharacter(0xFFA0, "disallowed");
     
     /* ICU won't encode these characters in IDN, thus we should always get 'host not found'. */
    @@ -259,4 +241,22 @@
         testIDNCharacter(0xFF61, ".");
         testIDNCharacter(0xFEFF, "");
    +    testIDNCharacter(0x2024, ".");
    +    testIDNCharacter(0xFE52, ".");
    +    testIDNCharacter(0x0337, "disallowed");
    +    testIDNCharacter(0x0337, "disallowed");
    +    testIDNCharacter(0x0338, "disallowed");
    +    testIDNCharacter(0x0338, "disallowed");
    +    testIDNCharacter(0x05B4, "disallowed");
    +    testIDNCharacter(0x05BC, "disallowed");
    +    testIDNCharacter(0x0660, "disallowed");
    +    testIDNCharacter(0x06F0, "disallowed");
    +    testIDNCharacter(0x115F, "disallowed");
    +    testIDNCharacter(0x1160, "disallowed");
    +    testIDNCharacter(0x3164, "disallowed");
    +    testIDNCharacter(0x321D, "disallowed");
    +    testIDNCharacter(0x321E, "disallowed");
    +    testIDNCharacter(0xFE14, "disallowed");
    +    testIDNCharacter(0xFE15, "disallowed");
    +    testIDNCharacter(0xFFA0, "disallowed");
     } else {
         testIDNCharacter(0x200B, "does not encode", "");
    @@ -265,4 +265,22 @@
         testIDNCharacter(0xFF61, "does not encode", ".");
         testIDNCharacter(0xFEFF, "does not encode", "");
    +    testIDNCharacter(0x2024, "%u2024");
    +    testIDNCharacter(0xFE52, "%uFE52");
    +    testIDNCharacter(0x0337, "%u0337", "punycode");
    +    testIDNCharacter(0x0337, "%u0337", "punycode");
    +    testIDNCharacter(0x0338, "%u0338", "punycode");
    +    testIDNCharacter(0x0338, "%u0338", "punycode");
    +    testIDNCharacter(0x05B4, "%u05B4", "punycode");
    +    testIDNCharacter(0x05BC, "%u05BC", "punycode");
    +    testIDNCharacter(0x0660, "%u0660");
    +    testIDNCharacter(0x06F0, "disallowed");
    +    testIDNCharacter(0x115F, "%u115F");
    +    testIDNCharacter(0x1160, "%u1160");
    +    testIDNCharacter(0x3164, "%u3164");
    +    testIDNCharacter(0x321D, "%28%uC624%uC804%29");
    +    testIDNCharacter(0x321E, "%28%uC624%uD6C4%29");
    +    testIDNCharacter(0xFE14, "%3B");
    +    testIDNCharacter(0xFE15, "%21");
    +    testIDNCharacter(0xFFA0, "%uFFA0");
     }
     
  • trunk/LayoutTests/fast/url/idna2003-expected.txt

    r207162 r208902  
    @@ -5,13 +5,13 @@
     
     The PASS/FAIL results of this test are set to the behavior in IDNA2003.
    -PASS canonicalize('http://faß.de/') is 'http://fass.de/'
    -PASS canonicalize('http://βόλος.com/') is 'http://xn--nxasmq6b.com/'
    -PASS canonicalize('http://ශ්‍රී.com/') is 'http://xn--10cl1a0b.com/'
    -PASS canonicalize('http://نامه‌ای.com/') is 'http://xn--mgba3gch31f.com/'
    +FAIL canonicalize('http://faß.de/') should be http://fass.de/. Was http://xn--fa-hia.de/.
    +FAIL canonicalize('http://βόλος.com/') should be http://xn--nxasmq6b.com/. Was http://xn--nxasmm1c.com/.
    +FAIL canonicalize('http://ශ්‍රී.com/') should be http://xn--10cl1a0b.com/. Was http://xn--10cl1a0b660p.com/.
    +FAIL canonicalize('http://نامه‌ای.com/') should be http://xn--mgba3gch31f.com/. Was http://xn--mgba3gch31f060k.com/.
     PASS canonicalize('http://www.looĸout.net/') is 'http://www.xn--looout-5bb.net/'
     PASS canonicalize('http://ᗯᗯᗯ.lookout.net/') is 'http://xn--1qeaa.lookout.net/'
     PASS canonicalize('http://www.lookout.сом/') is 'http://www.lookout.xn--l1adi/'
     FAIL canonicalize('http://www.lookout.net:80/') should be http://www.lookout.net:80/. Was http://www.lookout.net:80/.
    -PASS canonicalize('http://www‥lookout.net/') is 'http://www..lookout.net/'
    +FAIL canonicalize('http://www‥lookout.net/') should be http://www..lookout.net/. Was http://www‥lookout.net/.
     PASS canonicalize('http://www.lookout‧net/') is 'http://www.xn--lookoutnet-406e/'
     PASS canonicalize('http://www.looĸout.net/') is 'http://www.xn--looout-5bb.net/'
  • trunk/LayoutTests/fast/url/idna2008-expected.txt

    r104414 r208902  
    @@ -6,18 +6,18 @@
     The PASS/FAIL results of this test are set to the behavior in IDNA2008.
     PASS canonicalize('http://Bücher.de/') is 'http://xn--bcher-kva.de/'
    -FAIL canonicalize('http://faß.de/') should be http://xn--fa-hia.de/. Was http://fass.de/.
    -FAIL canonicalize('http://βόλος.com/') should be http://xn--nxasmm1c.com/. Was http://xn--nxasmq6b.com/.
    -FAIL canonicalize('http://ශ්‍රී.com/') should be http://xn--10cl1a0b660p.com/. Was http://xn--10cl1a0b.com/.
    -FAIL canonicalize('http://نامه‌ای.com/') should be http://xn--mgba3gch31f060k.com/. Was http://xn--mgba3gch31f.com/.
    +PASS canonicalize('http://faß.de/') is 'http://xn--fa-hia.de/'
    +PASS canonicalize('http://βόλος.com/') is 'http://xn--nxasmm1c.com/'
    +PASS canonicalize('http://ශ්‍රී.com/') is 'http://xn--10cl1a0b660p.com/'
    +PASS canonicalize('http://نامه‌ای.com/') is 'http://xn--mgba3gch31f060k.com/'
     FAIL canonicalize('http://♥.net/') should be http://�.net/. Was http://xn--g6h.net/.
    -FAIL canonicalize('http://͸.net/') should be http://�.net/. Was http://xn--zva.net/.
    -FAIL canonicalize('http://Ӏ.com/') should be http://�.com/. Was http://xn--d5a.com/.
    -FAIL canonicalize('http://㛼.com/') should be http://�.com/. Was http://xn--j74i.com/.
    -FAIL canonicalize('http://Ↄ.com/') should be http://�.com/. Was http://xn--q5g.com/.
    +FAIL canonicalize('http://͸.net/') should be http://�.net/. Was http://͸.net/.
    +FAIL canonicalize('http://Ӏ.com/') should be http://�.com/. Was http://Ӏ.com/.
    +FAIL canonicalize('http://㛼.com/') should be http://�.com/. Was http://㛼.com/.
    +FAIL canonicalize('http://Ↄ.com/') should be http://�.com/. Was http://.com/.
     PASS canonicalize('http://look͏out.net/') is 'http://lookout.net/'
     PASS canonicalize('http://gOoGle.com/') is 'http://google.com/'
     FAIL canonicalize('http://ড়.com/') should be http://ড়.com/. Was http://xn--15b8c.com/.
    -FAIL canonicalize('http://ẞ.com/') should be http://ss.com/. Was http://xn--kkg.com/.
    -FAIL canonicalize('http://ẞ.foo.com/') should be http://ss.foo.com/. Was http://xn--kkg.foo.com/.
    +PASS canonicalize('http://ẞ.com/') is 'http://ss.com/'
    +PASS canonicalize('http://ẞ.foo.com/') is 'http://ss.foo.com/'
     FAIL canonicalize('http://-foo.bar.com/') should be http:///. Was http://-foo.bar.com/.
     FAIL canonicalize('http://foo-.bar.com/') should be http:///. Was http://foo-.bar.com/.
  • trunk/Source/WebCore/ChangeLog

    r208899 r208902  
    @@ -1,2 +1,36 @@
    +2016-11-17  Alex Christensen  <achristensen@webkit.org>
    +
    +        Support IDN2008 with UTS #46 instead of IDN2003
    +        https://bugs.webkit.org/show_bug.cgi?id=144194
    +
    +        Reviewed by Darin Adler.
    +
    +        Use uidna_nameToASCII instead of the deprecated uidna_IDNToASCII.
    +        It uses IDN2008 instead of IDN2003, and it uses UTF #46 when used with a UIDNA opened with uidna_openUTS46.
    +        This follows https://url.spec.whatwg.org/#concept-domain-to-ascii except we do not use Transitional_Processing
    +        to prevent homograph attacks on german domain names with "ß" and "ss" in them.  These are now treated as separate domains.
    +        Firefox also doesn't use Transitional_Processing. Chrome and the current specification use Transitional_processing,
    +        but https://github.com/whatwg/url/issues/110 might change the spec.
    +       
    +        In addition, http://unicode.org/reports/tr46/ says:
    +        "implementations are encouraged to apply the Bidi and ContextJ validity criteria"
    +        Bidi checks prevent domain names with bidirectional text, such as latin and hebrew characters in the same domain.  Chrome and Firefox do this.
    +
    +        ContextJ checks prevent code points such as U+200D, which is a zero-width joiner which users would not see when looking at the domain name.
    +        Firefox currently enables ContextJ checks and it is suggested by UTS #46, so we'll do it.
    +
    +        ContextO checks, which we do not use and neither does any other browser nor the spec, would fail if a domain contains code points such as U+30FB,
    +        which looks somewhat like a dot.  We can investigate enabling these checks later.
    +
    +        Covered by new API tests and rebased LayoutTests.
    +        The new API tests verify that we do not use transitional processing, that we do apply the Bidi and ContextJ checks, but not ContextO checks.
    +
    +        * platform/URLParser.cpp:
    +        (WebCore::URLParser::domainToASCII):
    +        (WebCore::URLParser::internationalDomainNameTranscoder):
    +        * platform/URLParser.h:
    +        * platform/mac/WebCoreNSURLExtras.mm:
    +        (WebCore::mapHostNameWithRange):
    +
     2016-11-18  Dean Jackson  <dino@apple.com>
     
  • trunk/Source/WebCore/platform/URLParser.cpp

    r208815 r208902  
    @@ -30,4 +30,5 @@
     #include "RuntimeApplicationChecks.h"
     #include <array>
    +#include <mutex>
     #include <unicode/uidna.h>
     #include <unicode/utypes.h>
    @@ -2480,17 +2481,9 @@
         UChar hostnameBuffer[defaultInlineBufferSize];
         UErrorCode error = U_ZERO_ERROR;
    -
    -#if COMPILER(GCC) || COMPILER(CLANG)
    -#pragma GCC diagnostic push
    -#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
    -#endif
    -    // FIXME: This should use uidna_openUTS46 / uidna_close instead
    -    int32_t numCharactersConverted = uidna_IDNToASCII(StringView(domain).upconvertedCharacters(), domain.length(), hostnameBuffer, defaultInlineBufferSize, UIDNA_ALLOW_UNASSIGNED, nullptr, &error);
    -#if COMPILER(GCC) || COMPILER(CLANG)
    -#pragma GCC diagnostic pop
    -#endif
    +    UIDNAInfo processingDetails = UIDNA_INFO_INITIALIZER;
    +    int32_t numCharactersConverted = uidna_nameToASCII(&internationalDomainNameTranscoder(), StringView(domain).upconvertedCharacters(), domain.length(), hostnameBuffer, defaultInlineBufferSize, &processingDetails, &error);
         ASSERT(numCharactersConverted <= static_cast<int32_t>(defaultInlineBufferSize));
     
    -    if (error == U_ZERO_ERROR) {
    +    if (U_SUCCESS(error) && !processingDetails.errors) {
             for (int32_t i = 0; i < numCharactersConverted; ++i) {
                 ASSERT(isASCII(hostnameBuffer[i]));
    @@ -2759,4 +2752,17 @@
         }
         return String::adopt(WTFMove(output));
    +}
    +
    +const UIDNA& URLParser::internationalDomainNameTranscoder()
    +{
    +    static UIDNA* encoder;
    +    static std::once_flag onceFlag;
    +    std::call_once(onceFlag, [] {
    +        UErrorCode error = U_ZERO_ERROR;
    +        encoder = uidna_openUTS46(UIDNA_CHECK_BIDI | UIDNA_CHECK_CONTEXTJ | UIDNA_NONTRANSITIONAL_TO_UNICODE | UIDNA_NONTRANSITIONAL_TO_ASCII, &error);
    +        RELEASE_ASSERT(U_SUCCESS(error));
    +        RELEASE_ASSERT(encoder);
    +    });
    +    return *encoder;
     }
     
  • trunk/Source/WebCore/platform/URLParser.h

    r208815 r208902  
    @@ -30,4 +30,6 @@
     #include <wtf/Forward.h>
     
    +struct UIDNA;
    +
     namespace WebCore {
     
    @@ -48,4 +50,6 @@
         static URLEncodedForm parseURLEncodedForm(StringView);
         static String serialize(const URLEncodedForm&);
    +
    +    static const UIDNA& internationalDomainNameTranscoder();
     
     private:
  • trunk/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm

    r200163 r208902  
    @@ -28,4 +28,5 @@
     
     #import "config.h"
    +#import "URLParser.h"
     #import "WebCoreObjCExtras.h"
     #import "WebCoreNSStringExtras.h"
    @@ -479,6 +480,7 @@
        
         UErrorCode uerror = U_ZERO_ERROR;
    -    int32_t numCharactersConverted = (encode ? uidna_IDNToASCII : uidna_IDNToUnicode)(sourceBuffer, length, destinationBuffer, HOST_NAME_BUFFER_LENGTH, UIDNA_ALLOW_UNASSIGNED, NULL, &uerror);
    -    if (U_FAILURE(uerror)) {
    +    UIDNAInfo processingDetails = UIDNA_INFO_INITIALIZER;
    +    int32_t numCharactersConverted = (encode ? uidna_nameToASCII : uidna_nameToUnicode)(&URLParser::internationalDomainNameTranscoder(), sourceBuffer, length, destinationBuffer, HOST_NAME_BUFFER_LENGTH, &processingDetails, &uerror);
    +    if (U_FAILURE(uerror) || processingDetails.errors) {
             *error = YES;
             return nil;
  • trunk/Tools/ChangeLog

    r208882 r208902  
    @@ -1,2 +1,18 @@
    +2016-11-17  Alex Christensen  <achristensen@webkit.org>
    +
    +        Support IDN2008 with UTS #46 instead of IDN2003
    +        https://bugs.webkit.org/show_bug.cgi?id=144194
    +
    +        Reviewed by Darin Adler.
    +
    +        * TestWebKitAPI/Tests/WebCore/URLParser.cpp:
    +        (TestWebKitAPI::TEST_F):
    +        Add some tests from http://unicode.org/faq/idn.html verifying that we follow UTS46's deviations from IDN2008.
    +        Add some tests based on https://tools.ietf.org/html/rfc5893 verifying that we check for bidirectional text.
    +        Add a test based on https://tools.ietf.org/html/rfc5892 verifying that we do not do ContextO check.
    +        Add a test for U+321D and U+321E which have particularly interesting punycode encodings.  We match Firefox here now.
    +        Also add a test from http://www.unicode.org/reports/tr46/#IDNAComparison verifying we are not using IDN2003.
    +        We should consider importing all of http://www.unicode.org/Public/idna/9.0.0/IdnaTest.txt as URL domain tests.
    +
     2016-11-17  Carlos Garcia Campos  <cgarcia@igalia.com>
     
  • trunk/Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp

    r208842 r208902  
    @@ -1094,4 +1094,30 @@
             {"a", "", "", "b", 0, "", "", "", "a://b"},
             {"", "", "", "", 0, "", "", "", "a://b"});
    +    checkURL(utf16String(u"http://öbb.at"), {"http", "", "", "xn--bb-eka.at", 0, "/", "", "", "http://xn--bb-eka.at/"});
    +    checkURL(utf16String(u"http://ÖBB.at"), {"http", "", "", "xn--bb-eka.at", 0, "/", "", "", "http://xn--bb-eka.at/"});
    +    checkURL(utf16String(u"http://√.com"), {"http", "", "", "xn--19g.com", 0, "/", "", "", "http://xn--19g.com/"});
    +    checkURLDifferences(utf16String(u"http://faß.de"),
    +        {"http", "", "", "xn--fa-hia.de", 0, "/", "", "", "http://xn--fa-hia.de/"},
    +        {"http", "", "", "fass.de", 0, "/", "", "", "http://fass.de/"});
    +    checkURL(utf16String(u"http://ԛәлп.com"), {"http", "", "", "xn--k1ai47bhi.com", 0, "/", "", "", "http://xn--k1ai47bhi.com/"});
    +    checkURLDifferences(utf16String(u"http://Ⱥbby.com"),
    +        {"http", "", "", "xn--bby-iy0b.com", 0, "/", "", "", "http://xn--bby-iy0b.com/"},
    +        {"http", "", "", "xn--bby-spb.com", 0, "/", "", "", "http://xn--bby-spb.com/"});
    +    checkURLDifferences(utf16String(u"http://\u2132"),
    +        {"", "", "", "", 0, "", "", "", utf16String(u"http://Ⅎ")},
    +        {"http", "", "", "xn--f3g", 0, "/", "", "", "http://xn--f3g/"});
    +    checkURLDifferences(utf16String(u"http://\u05D9\u05B4\u05D5\u05D0\u05B8/"),
    +        {"http", "", "", "xn--cdbi5etas", 0, "/", "", "", "http://xn--cdbi5etas/"},
    +        {"", "", "", "", 0, "", "", "", "about:blank"}, TestTabs::No);
    +    checkURLDifferences(utf16String(u"http://bidirectional\u0786\u07AE\u0782\u07B0\u0795\u07A9\u0793\u07A6\u0783\u07AA/"),
    +        {"", "", "", "", 0, "", "", "", utf16String(u"http://bidirectionalކޮންޕީޓަރު/")},
    +        {"", "", "", "", 0, "", "", "", "about:blank"}, TestTabs::No);
    +    checkURLDifferences(utf16String(u"http://contextj\u200D"),
    +        {"", "", "", "", 0, "", "", "", utf16String(u"http://contextj\u200D")},
    +        {"http", "", "", "contextj", 0, "/", "", "", "http://contextj/"});
    +    checkURL(utf16String(u"http://contexto\u30FB"), {"http", "", "", "xn--contexto-wg5g", 0, "/", "", "", "http://xn--contexto-wg5g/"});
    +    checkURLDifferences(utf16String(u"http://\u321D\u321E/"),
    +        {"http", "", "", "xn--()()-bs0sc174agx4b", 0, "/", "", "", "http://xn--()()-bs0sc174agx4b/"},
    +        {"http", "", "", "xn--5mkc", 0, "/", "", "", "http://xn--5mkc/"});
     }
     