Changeset 65351 in webkit


Timestamp: Aug 13, 2010 8:18:16 PM
Author: eric@webkit.org
Message:

2010-08-12 Adam Barth <abarth@webkit.org>

Reviewed by Eric Seidel.

Add support for MathML entities
https://bugs.webkit.org/show_bug.cgi?id=43949

Test progression for proper entity support.

  • html5lib/runner-expected-html5.txt:
  • html5lib/runner-expected.txt:

2010-08-09 Adam Barth <abarth@webkit.org>

Reviewed by Eric Seidel.

Add support for MathML entities
https://bugs.webkit.org/show_bug.cgi?id=43949

Implementing the HTML5 entity parsing algorithm requires refactoring how
we search for entity names. Instead of using a perfect hash, we now
use a sorted list. As we advance through the input, we walk down a
binary search of the table looking for an entity.

Using this data structure lets us keep track of whether the current
string is a prefix of an existing entity, which we need for the
algorithm. In a future patch, I plan to add some indices to the
table, which should let us narrow down the range of interesting entries
more quickly.

The one nasty piece of the algorithm is if we walk too far down the
input and we need to back up to a previous match. In this patch, we
accomplish this by rewinding the input and consuming a known number of
characters to resync the source.

  • WebCore.xcodeproj/project.pbxproj:
  • html/HTMLEntityParser.cpp: (WebCore::consumeHTMLEntity):
  • html/HTMLEntitySearch.cpp: Added. (WebCore::): (WebCore::HTMLEntitySearch::HTMLEntitySearch): (WebCore::HTMLEntitySearch::compare): (WebCore::HTMLEntitySearch::findStart): (WebCore::HTMLEntitySearch::findEnd): (WebCore::HTMLEntitySearch::advance):
  • html/HTMLEntitySearch.h: Added. (WebCore::HTMLEntitySearch::isEntityPrefix): (WebCore::HTMLEntitySearch::currentValue): (WebCore::HTMLEntitySearch::lastMatch): (WebCore::HTMLEntitySearch::): (WebCore::HTMLEntitySearch::fail):
  • html/HTMLEntityTable.h: Added. (WebCore::HTMLEntityTableEntry::lastCharacter):
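
The search described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in HTMLEntitySearch.cpp: the `Entry` type, the tiny `kTable`, the `PrefixSearch` class, and `longestEntity` are all hypothetical names, and the table holds only a handful of entities. It assumes names are sorted in plain byte order and that the input contains no NUL characters. Each consumed character narrows a range of the sorted table with a binary search, and the most recent exact match is remembered so the caller knows how far to back up.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <string>

// Hypothetical entry type; the real generated table is far larger.
struct Entry {
    const char* name;  // entity name without the leading '&'
    char32_t value;    // code point the entity expands to
};

// Must be sorted by name in plain byte order for the narrowing to work.
static const Entry kTable[] = {
    {"gt", 0x3E},       {"gt;", 0x3E},
    {"not", 0xAC},      {"not;", 0xAC},
    {"notin;", 0x2208}, {"notinva;", 0x2209},
};

class PrefixSearch {
public:
    // Consume one character and shrink [first_, last_) to the entries whose
    // names still agree with everything consumed so far.
    void advance(char c) {
        if (!prefix_)
            return;
        const std::size_t i = length_++;
        const unsigned char k = static_cast<unsigned char>(c);
        auto key = [i](const Entry& e) { return static_cast<unsigned char>(e.name[i]); };
        if (k != 0) {
            first_ = std::lower_bound(first_, last_, k,
                [&](const Entry& e, unsigned char v) { return key(e) < v; });
            last_ = std::upper_bound(first_, last_, k,
                [&](unsigned char v, const Entry& e) { return v < key(e); });
        }
        if (k == 0 || first_ == last_) {
            prefix_ = false;  // no entity starts with the consumed text
            return;
        }
        // An exact match, if any, sorts first among entries sharing the prefix.
        if (first_->name[length_] == '\0')
            lastMatch_ = first_;
    }
    bool isPrefix() const { return prefix_; }
    const Entry* lastMatch() const { return lastMatch_; }

private:
    const Entry* first_ = std::begin(kTable);
    const Entry* last_ = std::end(kTable);
    const Entry* lastMatch_ = nullptr;
    std::size_t length_ = 0;
    bool prefix_ = true;
};

struct Match {
    std::size_t length;  // characters belonging to the entity, 0 if none
    char32_t value;
};

// Walk the input until it stops being a prefix of any entity, then report
// the longest entity seen on the way. A caller would rewind to `length`.
Match longestEntity(const std::string& input) {
    PrefixSearch search;
    for (char c : input) {
        search.advance(c);
        if (!search.isPrefix())
            break;
    }
    const Entry* m = search.lastMatch();
    if (!m)
        return {0, 0};
    return {std::strlen(m->name), m->value};
}
```

With this toy table, the input `notit; I tell you` walks as far as `noti` before failing and reports `not` (3 characters), whereas `notin; x` reports `notin;`. The gap between the characters walked and the reported length is exactly the amount the parser has to rewind.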

2010-08-12 Adam Barth <abarth@webkit.org>

Reviewed by Eric Seidel.

Add support for MathML entities
https://bugs.webkit.org/show_bug.cgi?id=43949

A script for generating the C++ state data structure describing all the
entities from a JSON description.

  • Scripts/create-html-entity-table: Added.
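
The generated file itself is not part of this diff, so the following is only a guess at the general shape of such a table. The field names `entity`, `length`, and `value` and the accessor `lastCharacter()` appear in HTMLEntityParser.cpp in this changeset; everything else (the `EntitySketchEntry` type, the five sample rows, and the `tableIsSorted` check) is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical stand-in for one row of the generated table.
struct EntitySketchEntry {
    const char16_t* entity;  // UTF-16 name, without the leading '&'
    std::size_t length;      // number of code units in `entity`
    char32_t value;          // code point the entity expands to

    // Distinguishes "gt;" from the legacy semicolon-less "gt".
    char16_t lastCharacter() const { return entity[length - 1]; }
};

// Sample rows only; a generator would emit every entity, sorted by name.
static const EntitySketchEntry kSketchTable[] = {
    {u"Kopf;", 5, 0x1D542},
    {u"gt", 2, 0x3E},
    {u"gt;", 3, 0x3E},
    {u"not", 3, 0xAC},
    {u"not;", 4, 0xAC},
};

// A sorted-order check of the kind a generator could run before writing
// its output, since a binary search silently misbehaves on unsorted data.
bool tableIsSorted() {
    return std::is_sorted(std::begin(kSketchTable), std::end(kSketchTable),
        [](const EntitySketchEntry& a, const EntitySketchEntry& b) {
            return std::u16string(a.entity, a.length) < std::u16string(b.entity, b.length);
        });
}
```

Note that uppercase names sort before lowercase ones under code-unit ordering, which is why `Kopf;` comes first here.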
Location: trunk
Files: 4 added, 1 deleted, 16 edited

  • trunk/LayoutTests/ChangeLog

    r65348 r65351  
     12010-08-12  Adam Barth  <abarth@webkit.org>
     2
     3        Reviewed by Eric Seidel.
     4
     5        Add support for MathML entities
     6        https://bugs.webkit.org/show_bug.cgi?id=43949
     7
     8        Test progression for proper entity support.
     9
     10        * html5lib/runner-expected-html5.txt:
     11        * html5lib/runner-expected.txt:
     12
    1132010-08-13  Mihai Parparita  <mihaip@chromium.org>
    214
  • trunk/LayoutTests/html5lib/runner-expected-html5.txt

    r65213 r65351  
    119119resources/scriptdata01.dat: PASS
    120120
    121 resources/html5test-com.dat:
    122 7
    123 9
    124 10
    125 11
    126 
    127 Test 7 of 24 in resources/html5test-com.dat failed. Input:
    128 &lang;&rang;
    129 Got:
    130 | <html>
    131 |   <head>
    132 |   <body>
    133 |     "〈〉"
    134 Expected:
    135 | <html>
    136 |   <head>
    137 |   <body>
    138 |     "⟨⟩"
    139 
    140 Test 9 of 24 in resources/html5test-com.dat failed. Input:
    141 &ImaginaryI;
    142 Got:
    143 | <html>
    144 |   <head>
    145 |   <body>
    146 |     "&ImaginaryI;"
    147 Expected:
    148 | <html>
    149 |   <head>
    150 |   <body>
    151 |     "ⅈ"
    152 
    153 Test 10 of 24 in resources/html5test-com.dat failed. Input:
    154 &Kopf;
    155 Got:
    156 | <html>
    157 |   <head>
    158 |   <body>
    159 |     "&Kopf;"
    160 Expected:
    161 | <html>
    162 |   <head>
    163 |   <body>
    164 |     "𝕂"
    165 
    166 Test 11 of 24 in resources/html5test-com.dat failed. Input:
    167 &notinva;
    168 Got:
    169 | <html>
    170 |   <head>
    171 |   <body>
    172 |     "&notinva;"
    173 Expected:
    174 | <html>
    175 |   <head>
    176 |   <body>
    177 |     "∉"
    178 resources/entities01.dat:
    179 2
    180 5
    181 
    182 Test 2 of 68 in resources/entities01.dat failed. Input:
    183 FOO&gtBAR
    184 Got:
    185 | <html>
    186 |   <head>
    187 |   <body>
    188 |     "FOO&gtBAR"
    189 Expected:
    190 | <html>
    191 |   <head>
    192 |   <body>
    193 |     "FOO>BAR"
    194 
    195 Test 5 of 68 in resources/entities01.dat failed. Input:
    196 I'm &notit; I tell you
    197 Got:
    198 | <html>
    199 |   <head>
    200 |   <body>
    201 |     "I'm &notit; I tell you"
    202 Expected:
    203 | <html>
    204 |   <head>
    205 |   <body>
    206 |     "I'm ¬it; I tell you"
     121resources/html5test-com.dat: PASS
     122
     123resources/entities01.dat: PASS
     124
    207125resources/entities02.dat: PASS
    208126
  • trunk/LayoutTests/html5lib/runner-expected.txt

    r65006 r65351  
    192192resources/scriptdata01.dat: PASS
    193193
    194 resources/html5test-com.dat:
    195 7
    196 9
    197 10
    198 11
    199 
    200 Test 7 of 24 in resources/html5test-com.dat failed. Input:
    201 &lang;&rang;
    202 Got:
    203 | <html>
    204 |   <head>
    205 |   <body>
    206 |     "〈〉"
    207 Expected:
    208 | <html>
    209 |   <head>
    210 |   <body>
    211 |     "⟨⟩"
    212 
    213 Test 9 of 24 in resources/html5test-com.dat failed. Input:
    214 &ImaginaryI;
    215 Got:
    216 | <html>
    217 |   <head>
    218 |   <body>
    219 |     "&ImaginaryI;"
    220 Expected:
    221 | <html>
    222 |   <head>
    223 |   <body>
    224 |     "ⅈ"
    225 
    226 Test 10 of 24 in resources/html5test-com.dat failed. Input:
    227 &Kopf;
    228 Got:
    229 | <html>
    230 |   <head>
    231 |   <body>
    232 |     "&Kopf;"
    233 Expected:
    234 | <html>
    235 |   <head>
    236 |   <body>
    237 |     "𝕂"
    238 
    239 Test 11 of 24 in resources/html5test-com.dat failed. Input:
    240 &notinva;
    241 Got:
    242 | <html>
    243 |   <head>
    244 |   <body>
    245 |     "&notinva;"
    246 Expected:
    247 | <html>
    248 |   <head>
    249 |   <body>
    250 |     "∉"
    251 resources/entities01.dat:
    252 2
    253 5
    254 
    255 Test 2 of 68 in resources/entities01.dat failed. Input:
    256 FOO&gtBAR
    257 Got:
    258 | <html>
    259 |   <head>
    260 |   <body>
    261 |     "FOO&gtBAR"
    262 Expected:
    263 | <html>
    264 |   <head>
    265 |   <body>
    266 |     "FOO>BAR"
    267 
    268 Test 5 of 68 in resources/entities01.dat failed. Input:
    269 I'm &notit; I tell you
    270 Got:
    271 | <html>
    272 |   <head>
    273 |   <body>
    274 |     "I'm &notit; I tell you"
    275 Expected:
    276 | <html>
    277 |   <head>
    278 |   <body>
    279 |     "I'm ¬it; I tell you"
     194resources/html5test-com.dat: PASS
     195
     196resources/entities01.dat: PASS
     197
    280198resources/entities02.dat: PASS
    281199
  • trunk/WebCore/CMakeLists.txt

    r65336 r65351  
    972972    html/HTMLElement.cpp
    973973    html/HTMLElementStack.cpp
     974    html/HTMLEntitySearch.cpp
    974975    html/HTMLEmbedElement.cpp
    975976    html/HTMLFieldSetElement.cpp
  • trunk/WebCore/ChangeLog

    r65350 r65351  
     12010-08-09  Adam Barth  <abarth@webkit.org>
     2
     3        Reviewed by Eric Seidel.
     4
     5        Add support for MathML entities
     6        https://bugs.webkit.org/show_bug.cgi?id=43949
     7
     8        Implementing the HTML5 entity parsing algorithm requires refactoring how
     9        we search for entity names.  Instead of using a perfect hash, we now
     10        use a sorted list.  As we advance through the input, we walk down a
     11        binary search of the table looking for an entity.
     12
     13        Using this data structure lets us keep track of whether the current
     14        string is a prefix of an existing entity, which we need for the
     15        algorithm.  In a future patch, I plan to add some indices to the
     16        table, which should let us narrow down the range of interesting entries
     17        more quickly.
     18
     19        The one nasty piece of the algorithm is if we walk too far down the
     20        input and we need to back up to a previous match.  In this patch, we
     21        accomplish this by rewinding the input and consuming a known number of
     22        characters to resync the source.
     23
     24        * WebCore.xcodeproj/project.pbxproj:
     25        * html/HTMLEntityParser.cpp:
     26        (WebCore::consumeHTMLEntity):
     27        * html/HTMLEntitySearch.cpp: Added.
     28        (WebCore::):
     29        (WebCore::HTMLEntitySearch::HTMLEntitySearch):
     30        (WebCore::HTMLEntitySearch::compare):
     31        (WebCore::HTMLEntitySearch::findStart):
     32        (WebCore::HTMLEntitySearch::findEnd):
     33        (WebCore::HTMLEntitySearch::advance):
     34        * html/HTMLEntitySearch.h: Added.
     35        (WebCore::HTMLEntitySearch::isEntityPrefix):
     36        (WebCore::HTMLEntitySearch::currentValue):
     37        (WebCore::HTMLEntitySearch::lastMatch):
     38        (WebCore::HTMLEntitySearch::):
     39        (WebCore::HTMLEntitySearch::fail):
     40        * html/HTMLEntityTable.h: Added.
     41        (WebCore::HTMLEntityTableEntry::lastCharacter):
     42
    1432010-08-13  Tony Gentilcore  <tonyg@chromium.org>
    244
  • trunk/WebCore/DerivedSources.make

    r65218 r65351  
    506506    DocTypeStrings.cpp \
    507507    HTMLElementFactory.cpp \
    508     HTMLEntityNames.cpp \
     508    HTMLEntityTable.cpp \
    509509    HTMLNames.cpp \
    510510    WMLElementFactory.cpp \
     
    601601# HTML entity names
    602602
    603 HTMLEntityNames.cpp : html/HTMLEntityNames.gperf $(WebCore)/make-hash-tools.pl
    604         perl $(WebCore)/make-hash-tools.pl . $(WebCore)/html/HTMLEntityNames.gperf
     603HTMLEntityTable.cpp : html/HTMLEntityNames.json $(WebCore)/../WebKitTools/Scripts/create-html-entity-table
     604        python $(WebCore)/../WebKitTools/Scripts/create-html-entity-table -o HTMLEntityTable.cpp $(WebCore)/html/HTMLEntityNames.json
    605605
    606606# --------
  • trunk/WebCore/GNUmakefile.am

    r65312 r65351  
    9393        DerivedSources/WebCore/HTMLElementFactory.cpp \
    9494        DerivedSources/WebCore/HTMLElementFactory.h \
    95         DerivedSources/WebCore/HTMLEntityNames.cpp \
     95        DerivedSources/WebCore/HTMLEntityTable.cpp \
    9696        DerivedSources/WebCore/HTMLNames.cpp \
    9797        DerivedSources/WebCore/HTMLNames.h \
     
    14281428        WebCore/html/HTMLElementStack.cpp \
    14291429        WebCore/html/HTMLElementStack.h \
     1430        WebCore/html/HTMLEntitySearch.cpp \
     1431        WebCore/html/HTMLEntitySearch.h \
    14301432        WebCore/html/HTMLEmbedElement.cpp \
    14311433        WebCore/html/HTMLEmbedElement.h \
     
    43964398
    43974399# HTML entity names
    4398 DerivedSources/WebCore/HTMLEntityNames.cpp : $(WebCore)/html/HTMLEntityNames.gperf $(WebCore)/make-hash-tools.pl
    4399         $(PERL) $(WebCore)/make-hash-tools.pl $(GENSOURCES_WEBCORE) $(WebCore)/html/HTMLEntityNames.gperf
     4400DerivedSources/WebCore/HTMLEntityTable.cpp : $(WebCore)/html/HTMLEntityNames.json $(WebCore)/../WebKitTools/Scripts/create-html-entity-table
     4401        $(PYTHON) $(WebCore)/../WebKitTools/Scripts/create-html-entity-table -o $(GENSOURCES_WEBCORE)/HTMLEntityTable.cpp $(WebCore)/html/HTMLEntityNames.json
    44004402
    44014403# color names
  • trunk/WebCore/WebCore.gyp/WebCore.gyp

    r64680 r65351  
    277277        # gperf rule
    278278        '../html/DocTypeStrings.gperf',
    279         '../html/HTMLEntityNames.gperf',
    280279        '../platform/ColorData.gperf',
     280
     281        # json rule
     282        '../html/HTMLEntityNames.json',
    281283
    282284        # idl rules
     
    599601            '<(SHARED_INTERMEDIATE_DIR)/webkit/<(RULE_INPUT_ROOT).cpp',
    600602          ],
    601           'dependencies': [
     603          'inputs': [
    602604            '../make-hash-tools.pl',
    603605          ],
     
    609611          ],
    610612          'process_outputs_as_sources': 0,
     613        },
     614        {
     615          'rule_name': 'json',
     616          'extension': 'json',
     617          #
     618          # json outputs are generated by WebKitTools/Scripts/create-html-entity-table
     619          #
     620          'outputs': [
     621            '<(SHARED_INTERMEDIATE_DIR)/webkit/HTMLEntityTable.cpp',
     622          ],
     623          'inputs': [
     624            '../../WebKitTools/Scripts/create-html-entity-table',
     625          ],
     626          'action': [
     627            'python',
     628            '../../WebKitTools/Scripts/create-html-entity-table',
     629            '-o',
     630            '<(SHARED_INTERMEDIATE_DIR)/webkit/HTMLEntityTable.cpp',
     631            '<(RULE_INPUT_PATH)',
     632          ],
    611633        },
    612634        # Rule to build generated JavaScript (V8) bindings from .idl source.
  • trunk/WebCore/WebCore.gypi

    r65312 r65351  
    15891589            'html/HTMLElementStack.cpp',
    15901590            'html/HTMLElementStack.h',
     1591            'html/HTMLEntitySearch.cpp',
     1592            'html/HTMLEntitySearch.h',
    15911593            'html/HTMLEmbedElement.cpp',
    15921594            'html/HTMLEmbedElement.h',
  • trunk/WebCore/WebCore.pri

    r65070 r65351  
    3030XMLNS_NAMES = $$PWD/xml/xmlnsattrs.in
    3131
    32 ENTITIES_GPERF = $$PWD/html/HTMLEntityNames.gperf
     32HTML_ENTITIES = $$PWD/html/HTMLEntityNames.json
    3333
    3434COLORDATA_GPERF = $$PWD/platform/ColorData.gperf
     
    591591
    592592# GENERATOR 8-A:
    593 entities.output = $${WC_GENERATED_SOURCES_DIR}/HTMLEntityNames.cpp
    594 entities.input = ENTITIES_GPERF
    595 entities.wkScript = $$PWD/make-hash-tools.pl
    596 entities.commands = perl $$entities.wkScript $${WC_GENERATED_SOURCES_DIR} $$ENTITIES_GPERF
     593entities.output = $${WC_GENERATED_SOURCES_DIR}/HTMLEntityTable.cpp
     594entities.input = HTML_ENTITIES
     595entities.wkScript = $$PWD/../WebKitTools/Scripts/create-html-entity-table
     596entities.commands = python $$entities.wkScript -o $${WC_GENERATED_SOURCES_DIR}/HTMLEntityTable.cpp $$HTML_ENTITIES
    597597entities.clean = ${QMAKE_FILE_OUT}
    598 entities.depends = $$PWD/make-hash-tools.pl
     598entities.depends = $$PWD/../WebKitTools/Scripts/create-html-entity-table
    599599addExtraCompiler(entities)
    600600
  • trunk/WebCore/WebCore.pro

    r65321 r65351  
    672672    html/HTMLElement.cpp \
    673673    html/HTMLElementStack.cpp \
     674    html/HTMLEntitySearch.cpp \
    674675    html/HTMLEmbedElement.cpp \
    675676    html/HTMLFieldSetElement.cpp \
  • trunk/WebCore/WebCore.vcproj/WebCore.vcproj

    r65312 r65351  
    3763837638                        </File>
    3763937639                        <File
     37640                                RelativePath="..\html\HTMLEntitySearch.cpp"
     37641                                >
     37642                        </File>
     37643                        <File
     37644                                RelativePath="..\html\HTMLEntitySearch.h"
     37645                                >
     37646                        </File>
     37647                        <File
    3764037648                                RelativePath="..\html\HTMLEmbedElement.cpp"
    3764137649                                >
  • trunk/WebCore/WebCore.xcodeproj/project.pbxproj

    r65349 r65351  
    31843184                A8A909AC0CBCD6B50029B807 /* RenderSVGTransformableContainer.h in Headers */ = {isa = PBXBuildFile; fileRef = A8A909AA0CBCD6B50029B807 /* RenderSVGTransformableContainer.h */; };
    31853185                A8A909AD0CBCD6B50029B807 /* RenderSVGTransformableContainer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A8A909AB0CBCD6B50029B807 /* RenderSVGTransformableContainer.cpp */; };
     3186                A8BC044E1214EB2A00B5F122 /* HTMLEntitySearch.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 970C4FDF1211266200C3D393 /* HTMLEntitySearch.cpp */; };
     3187                A8BC044F1214EB2B00B5F122 /* HTMLEntitySearch.h in Headers */ = {isa = PBXBuildFile; fileRef = 970C4FE01211266200C3D393 /* HTMLEntitySearch.h */; };
     3188                A8BC04921214F69600B5F122 /* HTMLEntityTable.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A8BC04911214F69600B5F122 /* HTMLEntityTable.cpp */; };
    31863189                A8BCFD05120A046100B5F122 /* SVGPathSeg.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A8BCFD04120A046100B5F122 /* SVGPathSeg.cpp */; };
    31873190                A8C2280E11D4A59700D5A7D3 /* DocumentParser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A8C2280D11D4A59700D5A7D3 /* DocumentParser.cpp */; };
     
    84818484                97059975107D975200A50A7C /* PolicyChecker.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = PolicyChecker.cpp; sourceTree = "<group>"; };
    84828485                97059976107D975200A50A7C /* PolicyChecker.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = PolicyChecker.h; sourceTree = "<group>"; };
     8486                970C4FDF1211266200C3D393 /* HTMLEntitySearch.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = HTMLEntitySearch.cpp; sourceTree = "<group>"; };
     8487                970C4FE01211266200C3D393 /* HTMLEntitySearch.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = HTMLEntitySearch.h; sourceTree = "<group>"; };
     8488                970C4FE11211266200C3D393 /* HTMLEntityTable.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = HTMLEntityTable.cpp; sourceTree = "<group>"; };
     8489                970C4FE21211266200C3D393 /* HTMLEntityTable.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = HTMLEntityTable.h; sourceTree = "<group>"; };
    84838490                9719AEFF11D09F2C00D45831 /* HTMLInputStream.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = HTMLInputStream.h; sourceTree = "<group>"; };
    84848491                9738899E116EA9DC00ADF313 /* DocumentWriter.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = DocumentWriter.cpp; sourceTree = "<group>"; };
     
    88668873                A8A909AA0CBCD6B50029B807 /* RenderSVGTransformableContainer.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RenderSVGTransformableContainer.h; sourceTree = "<group>"; };
    88678874                A8A909AB0CBCD6B50029B807 /* RenderSVGTransformableContainer.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = RenderSVGTransformableContainer.cpp; sourceTree = "<group>"; };
     8875                A8BC04911214F69600B5F122 /* HTMLEntityTable.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = HTMLEntityTable.cpp; sourceTree = "<group>"; };
    88688876                A8BCFD04120A046100B5F122 /* SVGPathSeg.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = SVGPathSeg.cpp; sourceTree = "<group>"; };
    88698877                A8C2280D11D4A59700D5A7D3 /* DocumentParser.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = DocumentParser.cpp; sourceTree = "<group>"; };
     
    1094610954                E406F3FA1198304D009D59D6 /* DocTypeStrings.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = DocTypeStrings.cpp; sourceTree = "<group>"; };
    1094710955                E406F3FB1198307D009D59D6 /* ColorData.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = ColorData.cpp; sourceTree = "<group>"; };
    10948                 E406F4021198329A009D59D6 /* HTMLEntityNames.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = HTMLEntityNames.cpp; sourceTree = "<group>"; };
    1094910956                E415F10C0D9A05870033CE97 /* ElementTimeControl.idl */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text; path = ElementTimeControl.idl; sourceTree = "<group>"; };
    1095010957                E415F1680D9A165D0033CE97 /* DOMElementTimeControl.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = DOMElementTimeControl.h; sourceTree = "<group>"; };
     
    1229312300                                A17C81200F2A5CF7005DAAEB /* HTMLElementFactory.cpp */,
    1229412301                                A17C81210F2A5CF7005DAAEB /* HTMLElementFactory.h */,
    12295                                 E406F4021198329A009D59D6 /* HTMLEntityNames.cpp */,
     12302                                A8BC04911214F69600B5F122 /* HTMLEntityTable.cpp */,
    1229612303                                A8D06B380A265DCD005E7203 /* HTMLNames.cpp */,
    1229712304                                A8D06B370A265DCD005E7203 /* HTMLNames.h */,
     
    1399113998                                976E895E11C0CA3A00EA9CA9 /* HTMLEntityParser.cpp */,
    1399213999                                976E895F11C0CA3A00EA9CA9 /* HTMLEntityParser.h */,
     14000                                970C4FDF1211266200C3D393 /* HTMLEntitySearch.cpp */,
     14001                                970C4FE01211266200C3D393 /* HTMLEntitySearch.h */,
     14002                                970C4FE11211266200C3D393 /* HTMLEntityTable.cpp */,
     14003                                970C4FE21211266200C3D393 /* HTMLEntityTable.h */,
    1399314004                                A81369B9097374F500D74463 /* HTMLFieldSetElement.cpp */,
    1399414005                                A81369B8097374F500D74463 /* HTMLFieldSetElement.h */,
     
    2016420175                                CE172E011136E8CE0062A533 /* ZoomMode.h in Headers */,
    2016520176                                2EED57FE1214A9C2007656BB /* ThreadableBlobRegistry.h in Headers */,
     20177                                A8BC044F1214EB2B00B5F122 /* HTMLEntitySearch.h in Headers */,
    2016620178                        );
    2016720179                        runOnlyForDeploymentPostprocessing = 0;
     
    2259122603                                97DD4D860FDF4D6E00ECF9A4 /* XSSAuditor.cpp in Sources */,
    2259222604                                2EED57FD1214A9C2007656BB /* ThreadableBlobRegistry.cpp in Sources */,
     22605                                A8BC044E1214EB2A00B5F122 /* HTMLEntitySearch.cpp in Sources */,
     22606                                A8BC04921214F69600B5F122 /* HTMLEntityTable.cpp in Sources */,
    2259322607                        );
    2259422608                        runOnlyForDeploymentPostprocessing = 0;
  • trunk/WebCore/html/HTMLEntityParser.cpp

    r65171 r65351  
    2929#include "HTMLEntityParser.h"
    3030
     31#include "HTMLEntitySearch.h"
     32#include "HTMLEntityTable.h"
    3133#include <wtf/Vector.h>
    32 
    33 #include "HTMLEntityNames.cpp"
    3434
    3535using namespace WTF;
     
    103103    unsigned result = 0;
    104104    Vector<UChar, 10> consumedCharacters;
    105     Vector<char, 10> entityName;
    106105
    107106    while (!source.isEmpty()) {
     
    167166                source.advancePastNonNewline();
    168167                return legalEntityFor(result);
    169             } else 
     168            } else
    170169                return legalEntityFor(result);
    171170            break;
     
    182181        }
    183182        case Named: {
    184             // FIXME: This code is wrong. We need to find the longest matching entity.
    185             //        The examples from the spec are:
    186             //            I'm &notit; I tell you
    187             //            I'm &notin; I tell you
    188             //        In the first case, "&not" is the entity.  In the second
    189             //        case, "&notin;" is the entity.
    190             // FIXME: Our list of HTML entities is incomplete.
    191             // FIXME: The number 8 below is bogus.
    192             while (!source.isEmpty() && entityName.size() <= 8) {
     183            HTMLEntitySearch entitySearch;
     184            while (!source.isEmpty()) {
    193185                cc = *source;
    194                 if (cc == ';') {
    195                     const Entity* entity = findEntity(entityName.data(), entityName.size());
    196                     if (entity) {
    197                         source.advanceAndASSERT(';');
    198                         return entity->code;
    199                     }
     186                entitySearch.advance(cc);
     187                if (!entitySearch.isEntityPrefix())
    200188                    break;
    201                 }
    202                 if (!isAlphaNumeric(cc)) {
    203                     const Entity* entity = findEntity(entityName.data(), entityName.size());
    204                     if (entity) {
    205                         // HTML5 tells us to ignore this entity, for historical reasons,
    206                         // if the lookhead character is '='.
    207                         if (additionalAllowedCharacter && cc == '=')
    208                             break;
    209                         // Some entities require a terminating semicolon, whereas other
    210                         // entities do not.  The HTML5 spec has a giant list:
    211                         //
    212                         // http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html#named-character-references
    213                         //
    214                         // However, the list seems to boil down to this branch:
    215                         if (entity->code > 255)
    216                             break;
    217                         return entity->code;
    218                     }
    219                     break;
    220                 }
    221                 entityName.append(cc);
    222189                consumedCharacters.append(cc);
    223190                source.advanceAndASSERT(cc);
    224191            }
    225192            notEnoughCharacters = source.isEmpty();
     193            if (notEnoughCharacters) {
     194                // We can't commit to an entity because there might be a longer entity
     195                // that we could match if we had more data.
     196                unconsumeCharacters(source, consumedCharacters);
     197                return 0;
     198            }
     199            if (!entitySearch.lastMatch()) {
     200                ASSERT(!entitySearch.currentValue());
     201                unconsumeCharacters(source, consumedCharacters);
     202                return 0;
     203            }
     204            if (entitySearch.lastMatch()->length != entitySearch.currentLength()) {
     205                // We've consumed too many characters.  We need to walk the
     206                // source back to the point at which we had consumed an
     207                // actual entity.
     208                unconsumeCharacters(source, consumedCharacters);
     209                consumedCharacters.clear();
     210                const int length = entitySearch.lastMatch()->length;
     211                const UChar* reference = entitySearch.lastMatch()->entity;
     212                for (int i = 0; i < length; ++i) {
     213                    cc = *source;
     214                    ASSERT_UNUSED(reference, cc == *reference++);
     215                    consumedCharacters.append(cc);
     216                    source.advanceAndASSERT(cc);
     217                    ASSERT(!source.isEmpty());
     218                }
     219                cc = *source;
     220            }
     221            if (entitySearch.lastMatch()->lastCharacter() == ';')
     222                return entitySearch.lastMatch()->value;
     223            if (!additionalAllowedCharacter || !(isAlphaNumeric(cc) || cc == '='))
     224                return entitySearch.lastMatch()->value;
    226225            unconsumeCharacters(source, consumedCharacters);
    227226            return 0;
     
    239238UChar decodeNamedEntity(const char* name)
    240239{
    241     const Entity* e = findEntity(name, strlen(name));
    242     return e ? e->code : 0;
     240    HTMLEntitySearch search;
     241    while (name && search.isEntityPrefix())
     242        search.advance(*name++);
     243    search.advance(';');
     244    UChar32 entityValue = search.currentValue();
     245    if (U16_LENGTH(entityValue) != 1) {
     246        // Callers need to move off this API if the entity table has values
     247        // which do not fit in a 16 bit UChar!
     248        ASSERT_NOT_REACHED();
     249        return 0;
     250    }
     251    return static_cast<UChar>(entityValue);
    243252}
    244253
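
The `U16_LENGTH` check in `decodeNamedEntity` matters because some of the newly supported entities lie outside the Basic Multilingual Plane and cannot be returned as a single `UChar`. The sketch below restates that check without depending on ICU; `utf16Length` mirrors what ICU's `U16_LENGTH` macro computes, and `toSingleUnitOrZero` is a hypothetical helper name, not a WebKit function.

```cpp
// Stand-in for ICU's U16_LENGTH: the number of UTF-16 code units needed
// to encode a code point (1 inside the BMP, 2 for a surrogate pair).
constexpr int utf16Length(char32_t codePoint) {
    return codePoint <= 0xFFFF ? 1 : 2;
}

// Returns the code point as one UTF-16 unit, or 0 when it would need a
// surrogate pair and therefore cannot fit the 16-bit return type.
constexpr char16_t toSingleUnitOrZero(char32_t codePoint) {
    return utf16Length(codePoint) == 1 ? static_cast<char16_t>(codePoint) : 0;
}
```

For example, `&notinva;` (U+2209) fits in one unit, while `&Kopf;` (U+1D542) needs two and would take the failure path.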
  • trunk/WebCore/make-hash-tools.pl

    r61091 r65351  
    3030switch ($option) {
    3131
    32 case "HTMLEntityNames" {
    33 
    34     my $htmlEntityNamesGenerated   = "$outdir/HTMLEntityNames.cpp";
    35     my $htmlEntityNamesGperf       = $ARGV[0];
    36     shift;
    37 
    38     system("gperf --key-positions=\"*\" -D -s 2 $htmlEntityNamesGperf > $htmlEntityNamesGenerated") == 0 || die "calling gperf failed: $?";
    39 
    40 } # case "HTMLEntityNames"
    41 
    4232case "DocTypeStrings" {
    4333
  • trunk/WebKitTools/ChangeLog

    r65343 r65351  
     12010-08-12  Adam Barth  <abarth@webkit.org>
     2
     3        Reviewed by Eric Seidel.
     4
     5        Add support for MathML entities
     6        https://bugs.webkit.org/show_bug.cgi?id=43949
     7
     8        A script for generating the C++ state data structure describing all the
     9        entities from a JSON description.
     10
     11        * Scripts/create-html-entity-table: Added.
     12
    1132010-08-13  Dirk Pranke  <dpranke@chromium.org>
    214