Changeset 98624 in webkit

Timestamp:

Oct 27, 2011, 1:16:20 PM (14 years ago)

Author:

msaboff@apple.com

Message:

Investigate storing strings in 8-bit buffers when possible
https://bugs.webkit.org/show_bug.cgi?id=66161

Source/JavaScriptCore:

Investigate storing strings in 8-bit buffers when possible
https://bugs.webkit.org/show_bug.cgi?id=66161

Added support for 8 bit string data in StringImpl. Changed
(UChar*) m_data to m_data16. Added char* m_data8 as a union
with m_data16. Added UChar* m_copyData16 to the other union
to store a 16 bit copy of an 8 bit string when needed.
Added characters8() and characters16() accessor methods
that assume the caller has checked the underlying string type
via the new is8Bit() method. The characters() method will
return a UChar* of the string, materializing a 16 bit copy if the
string is an 8 bit string. Added two flags, one for 8 bit buffer
and a second for a 16 bit copy for an 8 bit string.
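
The storage scheme described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual StringImpl: the class name SketchString, the UChar typedef, and the use of std::vector and std::unique_ptr instead of unions and flag bits are all hypothetical. Only the accessor names is8Bit(), characters8(), characters16(), characters() and has16BitShadow() are taken from this changeset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

typedef unsigned char LChar; // 8-bit (Latin-1) code unit
typedef uint16_t UChar;      // 16-bit code unit (stand-in for WTF's UChar)

// Hypothetical, simplified stand-in for StringImpl.
class SketchString {
public:
    explicit SketchString(const std::vector<LChar>& latin1)
        : m_is8Bit(true), m_data8(latin1) { }
    explicit SketchString(const std::vector<UChar>& utf16)
        : m_is8Bit(false), m_data16(utf16) { }

    bool is8Bit() const { return m_is8Bit; }
    size_t length() const { return m_is8Bit ? m_data8.size() : m_data16.size(); }

    // Typed accessors: the caller is expected to have checked is8Bit().
    const LChar* characters8() const { assert(m_is8Bit); return m_data8.data(); }
    const UChar* characters16() const { assert(!m_is8Bit); return m_data16.data(); }

    // Always returns 16-bit data. For an 8-bit string, a widened copy is
    // built on first use and cached for later calls.
    const UChar* characters() const
    {
        if (!m_is8Bit)
            return m_data16.data();
        if (!m_copyData16)
            m_copyData16.reset(new std::vector<UChar>(m_data8.begin(), m_data8.end()));
        return m_copyData16->data();
    }

    bool has16BitShadow() const { return m_copyData16 != nullptr; }

private:
    bool m_is8Bit;
    std::vector<LChar> m_data8;   // used only when m_is8Bit is true
    std::vector<UChar> m_data16;  // used only when m_is8Bit is false
    mutable std::unique_ptr<std::vector<UChar>> m_copyData16; // lazy 16-bit copy
};
```

In this sketch an 8-bit string pays for the 16-bit copy only if some caller asks for characters(), which is presumably why the message aims to move callers to the typed accessors.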

Fixed method name typo (StringHasher::defaultCoverter()).

Over time the goal is to eliminate calls to characters() and
use the characters8() and characters16() accessors.
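
A caller that avoids characters() might look something like the sketch below: one templated routine handles either code unit width, and a thin wrapper picks the instantiation after checking the width. The CharacterSpan struct and the countMatching() helper are hypothetical names invented for this illustration; they are not part of WTF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef unsigned char LChar; // 8-bit (Latin-1) code unit
typedef uint16_t UChar;      // 16-bit code unit (stand-in for WTF's UChar)

// Hypothetical view over string data of either width.
struct CharacterSpan {
    bool is8Bit;
    const void* data;
    size_t length;

    const LChar* characters8() const
    {
        assert(is8Bit);
        return static_cast<const LChar*>(data);
    }
    const UChar* characters16() const
    {
        assert(!is8Bit);
        return static_cast<const UChar*>(data);
    }
};

// Width-generic worker: counts code units equal to target.
template<typename CharType>
size_t countMatching(const CharType* characters, size_t length, UChar target)
{
    size_t count = 0;
    for (size_t i = 0; i < length; ++i) {
        if (characters[i] == target)
            ++count;
    }
    return count;
}

// Dispatch once on the width, then run the matching instantiation.
// No 16-bit copy of an 8-bit string is ever created.
size_t countMatching(const CharacterSpan& string, UChar target)
{
    if (string.is8Bit)
        return countMatching(string.characters8(), string.length, target);
    return countMatching(string.characters16(), string.length, target);
}
```

The width check happens once per call rather than once per character, so the inner loop stays as tight as a single-width implementation.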

This patch does not include changes that actually create 8 bit
strings. This is the first of at least 8 patches. Subsequent
patches will cover JIT changes, the JSC lexer, parser and
literal parser, JavaScript string changes, and then changes in
WebCore to take advantage of the 8 bit strings.

This change is performance neutral for SunSpider and V8 when
run from the command line with "jsc".

Reviewed by Geoffrey Garen.

JavaScriptCore.exp:
JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.def
interpreter/Interpreter.cpp:

(JSC::Interpreter::callEval):

parser/SourceProvider.h:

(JSC::UStringSourceProvider::data):
(JSC::UStringSourceProvider::UStringSourceProvider):

runtime/Identifier.cpp:

(JSC::IdentifierCStringTranslator::hash):
(JSC::IdentifierCStringTranslator::equal):
(JSC::IdentifierCStringTranslator::translate):
(JSC::Identifier::add):
(JSC::Identifier::toUInt32):

runtime/Identifier.h:

(JSC::Identifier::equal):
(JSC::operator==):
(JSC::operator!=):

runtime/JSString.cpp:

(JSC::JSString::resolveRope):
(JSC::JSString::resolveRopeSlowCase):

runtime/RegExp.cpp:

(JSC::RegExp::match):

runtime/StringPrototype.cpp:

(JSC::jsSpliceSubstringsWithSeparators):

runtime/UString.cpp:

(JSC::UString::UString):
(JSC::equalSlowCase):
(JSC::UString::utf8):

runtime/UString.h:

(JSC::UString::characters):
(JSC::UString::characters8):
(JSC::UString::characters16):
(JSC::UString::is8Bit):
(JSC::UString::operator[]):
(JSC::UString::find):
(JSC::operator==):

wtf/StringHasher.h:

(WTF::StringHasher::computeHash):
(WTF::StringHasher::defaultConverter):

wtf/text/AtomicString.cpp:

(WTF::CStringTranslator::hash):
(WTF::CStringTranslator::equal):
(WTF::CStringTranslator::translate):
(WTF::AtomicString::add):

wtf/text/AtomicString.h:

(WTF::AtomicString::AtomicString):
(WTF::AtomicString::contains):
(WTF::AtomicString::find):
(WTF::AtomicString::add):
(WTF::operator==):
(WTF::operator!=):
(WTF::equalIgnoringCase):

wtf/text/StringConcatenate.h:
wtf/text/StringHash.h:

(WTF::StringHash::equal):
(WTF::CaseFoldingHash::hash):

wtf/text/StringImpl.cpp:

(WTF::StringImpl::~StringImpl):
(WTF::StringImpl::createUninitialized):
(WTF::StringImpl::create):
(WTF::StringImpl::getData16SlowCase):
(WTF::StringImpl::containsOnlyWhitespace):
(WTF::StringImpl::substring):
(WTF::StringImpl::characterStartingAt):
(WTF::StringImpl::lower):
(WTF::StringImpl::upper):
(WTF::StringImpl::fill):
(WTF::StringImpl::foldCase):
(WTF::StringImpl::stripMatchedCharacters):
(WTF::StringImpl::removeCharacters):
(WTF::StringImpl::simplifyMatchedCharactersToSpace):
(WTF::StringImpl::toIntStrict):
(WTF::StringImpl::toUIntStrict):
(WTF::StringImpl::toInt64Strict):
(WTF::StringImpl::toUInt64Strict):
(WTF::StringImpl::toIntPtrStrict):
(WTF::StringImpl::toInt):
(WTF::StringImpl::toUInt):
(WTF::StringImpl::toInt64):
(WTF::StringImpl::toUInt64):
(WTF::StringImpl::toIntPtr):
(WTF::StringImpl::toDouble):
(WTF::StringImpl::toFloat):
(WTF::equal):
(WTF::equalIgnoringCase):
(WTF::StringImpl::find):
(WTF::StringImpl::findIgnoringCase):
(WTF::StringImpl::reverseFind):
(WTF::StringImpl::replace):
(WTF::StringImpl::defaultWritingDirection):
(WTF::StringImpl::adopt):
(WTF::StringImpl::createWithTerminatingNullCharacter):

wtf/text/StringImpl.h:

(WTF::StringImpl::StringImpl):
(WTF::StringImpl::create):
(WTF::StringImpl::create8):
(WTF::StringImpl::tryCreateUninitialized):
(WTF::StringImpl::flagsOffset):
(WTF::StringImpl::flagIs8Bit):
(WTF::StringImpl::dataOffset):
(WTF::StringImpl::is8Bit):
(WTF::StringImpl::characters8):
(WTF::StringImpl::characters16):
(WTF::StringImpl::characters):
(WTF::StringImpl::has16BitShadow):
(WTF::StringImpl::setHash):
(WTF::StringImpl::hash):
(WTF::StringImpl::copyChars):
(WTF::StringImpl::operator[]):
(WTF::StringImpl::find):
(WTF::StringImpl::findIgnoringCase):
(WTF::equal):
(WTF::equalIgnoringCase):
(WTF::StringImpl::isolatedCopy):

wtf/text/WTFString.cpp:

(WTF::String::String):
(WTF::String::append):
(WTF::String::format):
(WTF::String::fromUTF8):
(WTF::String::fromUTF8WithLatin1Fallback):

wtf/text/WTFString.h:

(WTF::String::find):
(WTF::String::findIgnoringCase):
(WTF::String::contains):
(WTF::String::append):
(WTF::String::fromUTF8):
(WTF::String::fromUTF8WithLatin1Fallback):
(WTF::operator==):
(WTF::operator!=):
(WTF::equalIgnoringCase):

wtf/unicode/Unicode.h:
yarr/YarrJIT.cpp:

(JSC::Yarr::execute):

yarr/YarrJIT.h:

(JSC::Yarr::YarrCodeBlock::execute):

yarr/YarrParser.h:

(JSC::Yarr::Parser::Parser):

Source/WebCore:

Changes to support 8 bit StringImpl changes.

Reviewed by Geoffrey Garen.

No new tests, refactored StringImpl for 8 bit strings.

platform/text/cf/StringImplCF.cpp:

(WTF::StringImpl::createCFString):

Source/WebKit2:

Added export of StringImpl::getData16SlowCase for linking tests.

Reviewed by Geoffrey Garen.

win/WebKit2.def:

Location:

trunk/Source

Files:

29 edited

JavaScriptCore/ChangeLog (modified) (1 diff)
JavaScriptCore/JavaScriptCore.exp (modified) (8 diffs)
JavaScriptCore/JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.def (modified) (2 diffs)
JavaScriptCore/interpreter/Interpreter.cpp (modified) (1 diff)
JavaScriptCore/parser/SourceProvider.h (modified) (2 diffs)
JavaScriptCore/runtime/Identifier.cpp (modified) (4 diffs)
JavaScriptCore/runtime/Identifier.h (modified) (3 diffs)
JavaScriptCore/runtime/JSString.cpp (modified) (2 diffs)
JavaScriptCore/runtime/RegExp.cpp (modified) (1 diff)
JavaScriptCore/runtime/StringPrototype.cpp (modified) (2 diffs)
JavaScriptCore/runtime/UString.cpp (modified) (4 diffs)
JavaScriptCore/runtime/UString.h (modified) (5 diffs)
JavaScriptCore/wtf/StringHasher.h (modified) (2 diffs)
JavaScriptCore/wtf/text/AtomicString.cpp (modified) (3 diffs)
JavaScriptCore/wtf/text/AtomicString.h (modified) (5 diffs)
JavaScriptCore/wtf/text/StringConcatenate.h (modified) (4 diffs)
JavaScriptCore/wtf/text/StringHash.h (modified) (3 diffs)
JavaScriptCore/wtf/text/StringImpl.cpp (modified) (51 diffs)
JavaScriptCore/wtf/text/StringImpl.h (modified) (22 diffs)
JavaScriptCore/wtf/text/WTFString.cpp (modified) (8 diffs)
JavaScriptCore/wtf/text/WTFString.h (modified) (8 diffs)
JavaScriptCore/wtf/unicode/Unicode.h (modified) (1 diff)
JavaScriptCore/yarr/YarrJIT.cpp (modified) (1 diff)
JavaScriptCore/yarr/YarrJIT.h (modified) (3 diffs)
JavaScriptCore/yarr/YarrParser.h (modified) (2 diffs)
WebCore/ChangeLog (modified) (1 diff)
WebCore/platform/text/cf/StringImplCF.cpp (modified) (2 diffs)
WebKit2/ChangeLog (modified) (1 diff)
WebKit2/win/WebKit2.def (modified) (2 diffs)

trunk/Source/JavaScriptCore/ChangeLog

-              r98606
+              r98624
+2011-10-27  Michael Saboff  <msaboff@apple.com>
+        Investigate storing strings in 8-bit buffers when possible
+        https://bugs.webkit.org/show_bug.cgi?id=66161
+        Investigate storing strings in 8-bit buffers when possible
+        https://bugs.webkit.org/show_bug.cgi?id=66161
+        Added support for 8 bit string data in StringImpl.  Changed
+        (UChar*) m_data to m_data16.  Added char* m_data8 as a union
+        with m_data16.  Added UChar* m_copyData16 to the other union
+        to store a 16 bit copy of an 8 bit string when needed.
+        Added characters8() and characters16() accessor methods
+        that assume the caller has checked the underlying string type
+        via the new is8Bit() method. The characters() method will
+        return a UChar* of the string, materializing a 16 bit copy if the
+        string is an 8 bit string.  Added two flags, one for 8 bit buffer
+        and a second for a 16 bit copy for an 8 bit string.
+        Fixed method name typo (StringHasher::defaultCoverter()).
+        Over time the goal is to eliminate calls to characters() and
+        us the character8() and characters16() accessors.
+        This patch does not include changes that actually create 8 bit
+        strings. This is the first of at least 8 patches.  Subsequent
+        patches will be submitted for JIT changes, making the JSC lexer,
+        parser and literal parser, JavaScript string changes and
+        then changes in webcore to take advantage of the 8 bit strings.
+        This change is performance neutral for SunSpider and V8 when
+        run from the command line with "jsc".
+        Reviewed by Geoffrey Garen.
+        * JavaScriptCore.exp:
+        * JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.def
+        * interpreter/Interpreter.cpp:
+        (JSC::Interpreter::callEval):
+        * parser/SourceProvider.h:
+        (JSC::UStringSourceProvider::data):
+        (JSC::UStringSourceProvider::UStringSourceProvider):
+        * runtime/Identifier.cpp:
+        (JSC::IdentifierCStringTranslator::hash):
+        (JSC::IdentifierCStringTranslator::equal):
+        (JSC::IdentifierCStringTranslator::translate):
+        (JSC::Identifier::add):
+        (JSC::Identifier::toUInt32):
+        * runtime/Identifier.h:
+        (JSC::Identifier::equal):
+        (JSC::operator==):
+        (JSC::operator!=):
+        * runtime/JSString.cpp:
+        (JSC::JSString::resolveRope):
+        (JSC::JSString::resolveRopeSlowCase):
+        * runtime/RegExp.cpp:
+        (JSC::RegExp::match):
+        * runtime/StringPrototype.cpp:
+        (JSC::jsSpliceSubstringsWithSeparators):
+        * runtime/UString.cpp:
+        (JSC::UString::UString):
+        (JSC::equalSlowCase):
+        (JSC::UString::utf8):
+        * runtime/UString.h:
+        (JSC::UString::characters):
+        (JSC::UString::characters8):
+        (JSC::UString::characters16):
+        (JSC::UString::is8Bit):
+        (JSC::UString::operator[]):
+        (JSC::UString::find):
+        (JSC::operator==):
+        * wtf/StringHasher.h:
+        (WTF::StringHasher::computeHash):
+        (WTF::StringHasher::defaultConverter):
+        * wtf/text/AtomicString.cpp:
+        (WTF::CStringTranslator::hash):
+        (WTF::CStringTranslator::equal):
+        (WTF::CStringTranslator::translate):
+        (WTF::AtomicString::add):
+        * wtf/text/AtomicString.h:
+        (WTF::AtomicString::AtomicString):
+        (WTF::AtomicString::contains):
+        (WTF::AtomicString::find):
+        (WTF::AtomicString::add):
+        (WTF::operator==):
+        (WTF::operator!=):
+        (WTF::equalIgnoringCase):
+        * wtf/text/StringConcatenate.h:
+        * wtf/text/StringHash.h:
+        (WTF::StringHash::equal):
+        (WTF::CaseFoldingHash::hash):
+        * wtf/text/StringImpl.cpp:
+        (WTF::StringImpl::~StringImpl):
+        (WTF::StringImpl::createUninitialized):
+        (WTF::StringImpl::create):
+        (WTF::StringImpl::getData16SlowCase):
+        (WTF::StringImpl::containsOnlyWhitespace):
+        (WTF::StringImpl::substring):
+        (WTF::StringImpl::characterStartingAt):
+        (WTF::StringImpl::lower):
+        (WTF::StringImpl::upper):
+        (WTF::StringImpl::fill):
+        (WTF::StringImpl::foldCase):
+        (WTF::StringImpl::stripMatchedCharacters):
+        (WTF::StringImpl::removeCharacters):
+        (WTF::StringImpl::simplifyMatchedCharactersToSpace):
+        (WTF::StringImpl::toIntStrict):
+        (WTF::StringImpl::toUIntStrict):
+        (WTF::StringImpl::toInt64Strict):
+        (WTF::StringImpl::toUInt64Strict):
+        (WTF::StringImpl::toIntPtrStrict):
+        (WTF::StringImpl::toInt):
+        (WTF::StringImpl::toUInt):
+        (WTF::StringImpl::toInt64):
+        (WTF::StringImpl::toUInt64):
+        (WTF::StringImpl::toIntPtr):
+        (WTF::StringImpl::toDouble):
+        (WTF::StringImpl::toFloat):
+        (WTF::equal):
+        (WTF::equalIgnoringCase):
+        (WTF::StringImpl::find):
+        (WTF::StringImpl::findIgnoringCase):
+        (WTF::StringImpl::reverseFind):
+        (WTF::StringImpl::replace):
+        (WTF::StringImpl::defaultWritingDirection):
+        (WTF::StringImpl::adopt):
+        (WTF::StringImpl::createWithTerminatingNullCharacter):
+        * wtf/text/StringImpl.h:
+        (WTF::StringImpl::StringImpl):
+        (WTF::StringImpl::create):
+        (WTF::StringImpl::create8):
+        (WTF::StringImpl::tryCreateUninitialized):
+        (WTF::StringImpl::flagsOffset):
+        (WTF::StringImpl::flagIs8Bit):
+        (WTF::StringImpl::dataOffset):
+        (WTF::StringImpl::is8Bit):
+        (WTF::StringImpl::characters8):
+        (WTF::StringImpl::characters16):
+        (WTF::StringImpl::characters):
+        (WTF::StringImpl::has16BitShadow):
+        (WTF::StringImpl::setHash):
+        (WTF::StringImpl::hash):
+        (WTF::StringImpl::copyChars):
+        (WTF::StringImpl::operator[]):
+        (WTF::StringImpl::find):
+        (WTF::StringImpl::findIgnoringCase):
+        (WTF::equal):
+        (WTF::equalIgnoringCase):
+        (WTF::StringImpl::isolatedCopy):
+        * wtf/text/WTFString.cpp:
+        (WTF::String::String):
+        (WTF::String::append):
+        (WTF::String::format):
+        (WTF::String::fromUTF8):
+        (WTF::String::fromUTF8WithLatin1Fallback):
+        * wtf/text/WTFString.h:
+        (WTF::String::find):
+        (WTF::String::findIgnoringCase):
+        (WTF::String::contains):
+        (WTF::String::append):
+        (WTF::String::fromUTF8):
+        (WTF::String::fromUTF8WithLatin1Fallback):
+        (WTF::operator==):
+        (WTF::operator!=):
+        (WTF::equalIgnoringCase):
+        * wtf/unicode/Unicode.h:
+        * yarr/YarrJIT.cpp:
+        (JSC::Yarr::execute):
+        * yarr/YarrJIT.h:
+        (JSC::Yarr::YarrCodeBlock::execute):
+        * yarr/YarrParser.h:
+        (JSC::Yarr::Parser::Parser):
 2011-10-27  Mark Hahnenberg  <mhahnenberg@apple.com>

trunk/Source/JavaScriptCore/JavaScriptCore.exp

-              r98593
+              r98624
 __ZN3WTF10StringImpl11reverseFindEPS0_j
 __ZN3WTF10StringImpl11reverseFindEtj
-__ZN3WTF10StringImpl16findIgnoringCaseEPKcj
 __ZN3WTF10StringImpl16findIgnoringCaseEPS0_j
 __ZN3WTF10StringImpl18simplifyWhiteSpaceEv
 …
 __ZN3WTF10StringImpl4fillEt
 __ZN3WTF10StringImpl4findEPFbtEj
-__ZN3WTF10StringImpl4findEPKcj
 __ZN3WTF10StringImpl4findEPS0_j
 __ZN3WTF10StringImpl4findEtj
 …
 __ZN3WTF10StringImpl5toIntEPb
 __ZN3WTF10StringImpl5upperEv
+__ZN3WTF10StringImpl6createEPKc
+__ZN3WTF10StringImpl6createEPKcj
+__ZN3WTF10StringImpl6createEPKh
 __ZN3WTF10StringImpl6createEPKtj
 __ZN3WTF10StringImpl7replaceEPS0_S1_
 …
 __ZN3WTF12AtomicString11addSlowCaseEPNS_10StringImplE
 __ZN3WTF12AtomicString16fromUTF8InternalEPKcS2_
 __ZN3WTF12AtomicString3addEPKc
+__ZN3WTF12AtomicString3addEPKh
 __ZN3WTF12AtomicString3addEPKt
 __ZN3WTF12AtomicString3addEPKtj
 …
 __ZN3WTF16fastZeroedMallocEm
 __ZN3WTF17charactersToFloatEPKtmPbS2_
+__ZN3WTF17equalIgnoringCaseEPKtPKcj
+__ZN3WTF17equalIgnoringCaseEPNS_10StringImplEPKc
+__ZN3WTF17equalIgnoringCaseEPKtPKhj
 __ZN3WTF17equalIgnoringCaseEPNS_10StringImplES1_
+__ZN3WTF17equalIgnoringCaseEPNS_10StringImplEPKh
 __ZN3WTF18calculateDSTOffsetEdd
 __ZN3WTF18calculateUTCOffsetEv
 …
 __ZN3WTF5MutexC1Ev
 __ZN3WTF5MutexD1Ev
 __ZN3WTF5equalEPKNS_10StringImplEPKc
+__ZN3WTF5equalEPKNS_10StringImplEPKh
 __ZN3WTF5equalEPKNS_10StringImplEPKtj
 __ZN3WTF5equalEPKNS_10StringImplES2_
 __ZN3WTF5yieldEv
 __ZN3WTF6String26fromUTF8WithLatin1FallbackEPKcm
+__ZN3WTF6String26fromUTF8WithLatin1FallbackEPKhm
 __ZN3WTF6String29charactersWithNullTerminationEv
+__ZN3WTF6StringC1EPKcj
+__ZN3WTF6String6appendEh
 __ZN3WTF6String6appendEPKtj
 __ZN3WTF6String6appendERKS0_
-__ZN3WTF6String6appendEc
 __ZN3WTF6String6appendEt
 __ZN3WTF6String6formatEPKcz
 …
 __ZN3WTF6String6numberEy
 __ZN3WTF6String6removeEji
 __ZN3WTF6String8fromUTF8EPKc
 __ZN3WTF6String8fromUTF8EPKcm
+__ZN3WTF6String8fromUTF8EPKh
+__ZN3WTF6String8fromUTF8EPKhm
 __ZN3WTF6String8truncateEj
 __ZN3WTF6StringC1EPKc
-__ZN3WTF6StringC1EPKcj
 __ZN3WTF6StringC1EPKt
 __ZN3WTF6StringC1EPKtj
 …
 __ZNK3JSC9HashTable11createTableEPNS_12JSGlobalDataE
 __ZNK3JSC9HashTable11deleteTableEv
+__ZNK3WTF10StringImpl17getData16SlowCaseEv
 __ZNK3WTF12AtomicString5lowerEv
 __ZNK3WTF13DecimalNumber15toStringDecimalEPtj

trunk/Source/JavaScriptCore/JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.def

-              r98606
+              r98624
     ?absoluteTimeToWaitTimeoutInterval@WTF@@YAKN@Z
     ?activityCallback@Heap@JSC@@QAEPAVGCActivityCallback@2@XZ
+    ?add@AtomicString@WTF@@CA?AV?$PassRefPtr@VStringImpl@WTF@@@2@PBD@Z
     ?add@Identifier@JSC@@SA?AV?$PassRefPtr@VStringImpl@WTF@@@WTF@@PAVExecState@2@PBD@Z
     ?add@PropertyNameArray@JSC@@QAEXPAVStringImpl@WTF@@@Z
 …
     ?empty@StringImpl@WTF@@SAPAV12@XZ
     ?enumerable@PropertyDescriptor@JSC@@QBE_NXZ
-    ?equal@Identifier@JSC@@SA_NPBVStringImpl@WTF@@PBD@Z
     ?equalUTF16WithUTF8@Unicode@WTF@@YA_NPB_W0PBD1@Z
     ?evaluate@DebuggerCallFrame@JSC@@QBE?AVJSValue@2@ABVUString@2@AAV32@@Z

trunk/Source/JavaScriptCore/interpreter/Interpreter.cpp

-              r98422
+              r98624
     // FIXME: We can use the preparser in strict mode, we just need additional logic
     // to prevent duplicates.
-    LiteralParser preparser(callFrame, programSource.characters(), programSource.length(), LiteralParser::NonStrictJSON);
+    LiteralParser preparser(callFrame, programSource.characters16(), programSource.length(), LiteralParser::NonStrictJSON);
     if (JSValue parsedObject = preparser.tryLiteralParse())
         return parsedObject;

trunk/Source/JavaScriptCore/parser/SourceProvider.h

-              r95901
+              r98624
             return m_source.substringSharingImpl(start, end - start);
         }
-        const UChar* data() const { return m_source.characters(); }
+        const UChar* data() const { return m_data; }
         int length() const { return m_source.length(); }
 …
             : SourceProvider(url)
             , m_source(source)
+            , m_data(m_source.characters16())
         {
         }
         UString m_source;
+        const UChar* m_data;
     };

trunk/Source/JavaScriptCore/runtime/Identifier.cpp

-              r94475
+              r98624
 struct IdentifierCStringTranslator {
-    static unsigned hash(const char* c)
-    {
-        return StringHasher::computeHash<char>(c);
-    }
-    static bool equal(StringImpl* r, const char* s)
+    static unsigned hash(const LChar* c)
+    {
+        return StringHasher::computeHash<LChar>(c);
+    }
+    static bool equal(StringImpl* r, const LChar* s)
     {
         return Identifier::equal(r, s);
     }
-    static void translate(StringImpl*& location, const char* c, unsigned hash)
-    {
-        size_t length = strlen(c);
+    static void translate(StringImpl*& location, const LChar* c, unsigned hash)
+    {
+        size_t length = strlen(reinterpret_cast<const char*>(c));
         UChar* d;
         StringImpl* r = StringImpl::createUninitialized(length, d).leakRef();
         for (size_t i = 0; i != length; i++)
-            d[i] = static_cast<unsigned char>(c[i]); // use unsigned char to zero-extend instead of sign-extend
+            d[i] = c[i];
         r->setHash(hash);
         location = r;
 …
         return StringImpl::empty();
     if (!c[1])
-        return add(globalData, globalData->smallStrings.singleCharacterStringRep(static_cast<unsigned char>(c[0])));
+        return add(globalData, globalData->smallStrings.singleCharacterStringRep(c[0]));
     IdentifierTable& identifierTable = *globalData->identifierTable;
 …
         return iter->second;
-    pair<HashSet<StringImpl*>::iterator, bool> addResult = identifierTable.add<const char*, IdentifierCStringTranslator>(c);
+    pair<HashSet<StringImpl*>::iterator, bool> addResult = identifierTable.add<const LChar*, IdentifierCStringTranslator>(reinterpret_cast<const LChar*>(c));
     // If the string is newly-translated, then we need to adopt it.
 …
     unsigned length = string.length();
-    const UChar* characters = string.characters();
     // An empty string is not a number.
     if (!length)
         return 0;
+    const UChar* characters = string.characters16();
     // Get the first character, turning it into a digit.
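
The translate() hunk above drops the static_cast<unsigned char> because the source pointer is now an LChar*, which is already unsigned. A minimal sketch of the difference between the two widenings follows; the helper names widenSigned() and widenLatin1() are hypothetical, and signed char is used explicitly because the signedness of plain char is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef unsigned char LChar; // 8-bit (Latin-1) code unit
typedef uint16_t UChar;      // 16-bit code unit (stand-in for WTF's UChar)

// Widening a signed 8-bit value sign-extends: byte 0xE9 (-23) becomes 0xFFE9.
inline UChar widenSigned(signed char c)
{
    return static_cast<UChar>(c);
}

// Widening from LChar zero-extends with no cast: byte 0xE9 becomes 0x00E9.
inline void widenLatin1(const LChar* source, UChar* destination, size_t length)
{
    assert(!length || (source && destination));
    for (size_t i = 0; i != length; ++i)
        destination[i] = source[i];
}
```

With an unsigned element type the plain assignment is already the correct Latin-1 to UTF-16 widening, so the per-character cast becomes unnecessary.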

trunk/Source/JavaScriptCore/runtime/Identifier.h

-              r97675
+              r98624
         friend bool operator!=(const Identifier&, const Identifier&);
+        friend bool operator==(const Identifier&, const LChar*);
         friend bool operator==(const Identifier&, const char*);
+        friend bool operator!=(const Identifier&, const LChar*);
         friend bool operator!=(const Identifier&, const char*);
-        static bool equal(const StringImpl*, const char*);
+        static bool equal(const StringImpl*, const LChar*);
+        static inline bool equal(const StringImpl*a, const char*b) { return Identifier::equal(a, reinterpret_cast<const LChar*>(b)); };
         static bool equal(const StringImpl*, const UChar*, unsigned length);
         static bool equal(const StringImpl* a, const StringImpl* b) { return ::equal(a, b); }
 …
         static bool equal(const Identifier& a, const Identifier& b) { return a.m_string.impl() == b.m_string.impl(); }
         static bool equal(const Identifier& a, const char* b) { return equal(a.m_string.impl(), b); }
+        static bool equal(const Identifier& a, const LChar* b) { return equal(a.m_string.impl(), b); }
         static PassRefPtr<StringImpl> add(ExecState*, const UChar*, int length);
 …
     }
-    inline bool operator==(const Identifier& a, const char* b)
+    inline bool operator==(const Identifier& a, const LChar* b)
     {
         return Identifier::equal(a, b);
     }
-    inline bool operator!=(const Identifier& a, const char* b)
+    inline bool operator==(const Identifier& a, const char* b)
+    {
+        return Identifier::equal(a, reinterpret_cast<const LChar*>(b));
+    }
+    inline bool operator!=(const Identifier& a, const LChar* b)
     {
         return !Identifier::equal(a, b);
     }
-    inline bool Identifier::equal(const StringImpl* r, const char* s)
+    inline bool operator!=(const Identifier& a, const char* b)
+    {
+        return !Identifier::equal(a, reinterpret_cast<const LChar*>(b));
+    }
+    inline bool Identifier::equal(const StringImpl* r, const LChar* s)
     {
         return WTF::equal(r, s);

trunk/Source/JavaScriptCore/runtime/JSString.cpp

-              r98593
+              r98624
         StringImpl* string = m_fibers[i]->m_value.impl();
         unsigned length = string->length();
-        StringImpl::copyChars(position, string->characters(), length);
+        StringImpl::copyChars(position, string->characters16(), length);
         position += length;
         m_fibers[i].clear();
 …
         unsigned length = string->length();
         position -= length;
-        StringImpl::copyChars(position, string->characters(), length);
+        StringImpl::copyChars(position, string->characters16(), length);
     }

trunk/Source/JavaScriptCore/runtime/RegExp.cpp

-              r95936
+              r98624
         if (m_state == JITCode) {
             if (s.is8Bit())
-                result = Yarr::execute(m_representation->m_regExpJITCode, s.latin1().data(), startOffset, s.length(), offsetVector);
+                result = Yarr::execute(m_representation->m_regExpJITCode, s.characters8(), startOffset, s.length(), offsetVector);
             else
-                result = Yarr::execute(m_representation->m_regExpJITCode, s.characters(), startOffset, s.length(), offsetVector);
+                result = Yarr::execute(m_representation->m_regExpJITCode, s.characters16(), startOffset, s.length(), offsetVector);
 #if ENABLE(YARR_JIT_DEBUG)
             matchCompareWithInterpreter(s, startOffset, offsetVector, result);

trunk/Source/JavaScriptCore/runtime/StringPrototype.cpp

-              r98501
+              r98624
         if (i < rangeCount) {
             if (int srcLen = substringRanges[i].length) {
-                StringImpl::copyChars(buffer + bufferPos, source.characters() + substringRanges[i].position, srcLen);
+                StringImpl::copyChars(buffer + bufferPos, source.characters16() + substringRanges[i].position, srcLen);
                 bufferPos += srcLen;
             }
 …
         if (i < separatorCount) {
             if (int sepLen = separators[i].length()) {
-                StringImpl::copyChars(buffer + bufferPos, separators[i].characters(), sepLen);
+                StringImpl::copyChars(buffer + bufferPos, separators[i].characters16(), sepLen);
                 bufferPos += sepLen;
             }

trunk/Source/JavaScriptCore/runtime/UString.cpp

-              r94452
+              r98624
 // Construct a string with latin1 data.
+UString::UString(const LChar* characters, unsigned length)
+    : m_impl(characters ? StringImpl::create(characters, length) : 0)
+{
+}
 UString::UString(const char* characters, unsigned length)
-    : m_impl(characters ? StringImpl::create(characters, length) : 0)
+    : m_impl(characters ? StringImpl::create(reinterpret_cast<const LChar*>(characters), length) : 0)
 {
 }
 // Construct a string with latin1 data, from a null-terminated source.
+UString::UString(const LChar* characters)
+    : m_impl(characters ? StringImpl::create(characters) : 0)
+{
+}
 UString::UString(const char* characters)
-    : m_impl(characters ? StringImpl::create(characters) : 0)
+    : m_impl(characters ? StringImpl::create(reinterpret_cast<const LChar*>(characters)) : 0)
 {
 }
 …
 }
+// This method assumes that all simple checks have been performed by
+// the inlined operator==() in the header file.
+bool equalSlowCase(const UString& s1, const UString& s2)
+{
+    StringImpl* rep1 = s1.impl();
+    StringImpl* rep2 = s2.impl();
+    unsigned size1 = rep1->length();
+    // At this point we know
+    //   (a) that the strings are the same length and
+    //   (b) that they are greater than zero length.
+    bool s1Is8Bit = rep1->is8Bit();
+    bool s2Is8Bit = rep2->is8Bit();
+    if (s1Is8Bit) {
+        const LChar* d1 = rep1->characters8();
+        if (s2Is8Bit) {
+            const LChar* d2 = rep2->characters8();
+            if (d1 == d2) // Check to see if the data pointers are the same.
+                return true;
+            // Do quick checks for sizes 1 and 2.
+            switch (size1) {
+            case 1:
+                return d1[0] == d2[0];
+            case 2:
+                return (d1[0] == d2[0]) & (d1[1] == d2[1]);
+            default:
+                return (!memcmp(d1, d2, size1 * sizeof(LChar)));
+            }
+        }
+        const UChar* d2 = rep2->characters16();
+        for (unsigned i = 0; i < size1; i++) {
+            if (d1[i] != d2[i])
+                return false;
+        }
+        return true;
+    }
+    if (s2Is8Bit) {
+        const UChar* d1 = rep1->characters16();
+        const LChar* d2 = rep2->characters8();
+        for (unsigned i = 0; i < size1; i++) {
+            if (d1[i] != d2[i])
+                return false;
+        }
+        return true;
+    }
+    const UChar* d1 = rep1->characters16();
+    const UChar* d2 = rep2->characters16();
+    if (d1 == d2) // Check to see if the data pointers are the same.
+        return true;
+    // Do quick checks for sizes 1 and 2.
+    switch (size1) {
+    case 1:
+        return d1[0] == d2[0];
+    case 2:
+        return (d1[0] == d2[0]) & (d1[1] == d2[1]);
+    default:
+        return (!memcmp(d1, d2, size1 * sizeof(UChar)));
+    }
+}
 bool operator<(const UString& s1, const UString& s2)
 {
 …
 {
     unsigned length = this->length();
-    const UChar* characters = this->characters();
+    if (is8Bit())
+        return CString(reinterpret_cast<const char*>(characters8()), length);
     // Allocate a buffer big enough to hold all the characters
 …
     if (length > numeric_limits<unsigned>::max() / 3)
         return CString();
+    const UChar* characters = this->characters16();
     Vector<char, 1024> bufferVector(length * 3);
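
The equalSlowCase() added above uses memcmp only when both strings have the same width and falls back to a per-character loop when they differ. A minimal sketch of why the two paths are needed is given below; the function names equalSameWidth() and equalMixedWidth() are hypothetical and not part of JavaScriptCore.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef unsigned char LChar; // 8-bit (Latin-1) code unit
typedef uint16_t UChar;      // 16-bit code unit (stand-in for WTF's UChar)

// Same width: the buffers have identical layout, so a byte-wise compare works.
template<typename CharType>
bool equalSameWidth(const CharType* a, const CharType* b, size_t length)
{
    assert(!length || (a && b));
    return !std::memcmp(a, b, length * sizeof(CharType));
}

// Mixed width: the buffers differ in size, so memcmp cannot be used.
// Each 8-bit unit is promoted and compared against the 16-bit unit.
inline bool equalMixedWidth(const LChar* a, const UChar* b, size_t length)
{
    assert(!length || (a && b));
    for (size_t i = 0; i < length; ++i) {
        if (a[i] != b[i])
            return false;
    }
    return true;
}
```

Because the comparison promotes rather than truncates, a 16-bit unit such as 0x0163 never compares equal to the 8-bit unit 0x63.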

trunk/Source/JavaScriptCore/runtime/UString.h

-              r94981
+              r98624
     // Construct a string with latin1 data.
+    UString(const LChar* characters, unsigned length);
     UString(const char* characters, unsigned length);
     // Construct a string with latin1 data, from a null-terminated source.
+    UString(const LChar* characters);
     UString(const char* characters);
 …
         if (!m_impl)
             return 0;
-        return m_impl->characters();
-    }
-    bool is8Bit() const { return false; }
+        return m_impl->characters16();
+    }
+    const LChar* characters8() const
+    {
+        if (!m_impl)
+            return 0;
+        ASSERT(m_impl->is8Bit());
+        return m_impl->characters8();
+    }
+    const UChar* characters16() const
+    {
+        if (!m_impl)
+            return 0;
+        ASSERT(!m_impl->is8Bit());
+        return m_impl->characters16();
+    }
+    bool is8Bit() const { return m_impl->is8Bit(); }
     CString ascii() const;
 …
         if (!m_impl || index >= m_impl->length())
             return 0;
-        return m_impl->characters()[index];
+        if (is8Bit())
+            return m_impl->characters8()[index];
+        return m_impl->characters16()[index];
     }
 …
     size_t find(const UString& str, unsigned start = 0) const
         { return m_impl ? m_impl->find(str.impl(), start) : notFound; }
-    size_t find(const char* str, unsigned start = 0) const
+    size_t find(const LChar* str, unsigned start = 0) const
         { return m_impl ? m_impl->find(str, start) : notFound; }
 …
 };
+NEVER_INLINE bool equalSlowCase(const UString& s1, const UString& s2);
 ALWAYS_INLINE bool operator==(const UString& s1, const UString& s2)
 {
     StringImpl* rep1 = s1.impl();
     StringImpl* rep2 = s2.impl();
+    if (rep1 == rep2) // If they're the same rep, they're equal.
+        return true;
     unsigned size1 = 0;
     unsigned size2 = 0;
-    if (rep1 == rep2) // If they're the same rep, they're equal.
-        return true;
     if (rep1)
         size1 = rep1->length();
     if (rep2)
         size2 = rep2->length();
     if (size1 != size2) // If the lengths are not the same, we're done.
         return false;
     if (!size1)
         return true;
-    // At this point we know
-    //   (a) that the strings are the same length and
-    //   (b) that they are greater than zero length.
-    const UChar* d1 = rep1->characters();
-    const UChar* d2 = rep2->characters();
-    if (d1 == d2) // Check to see if the data pointers are the same.
-        return true;
-    // Do quick checks for sizes 1 and 2.
-    switch (size1) {
-    case 1:
-        return d1[0] == d2[0];
-    case 2:
-        return (d1[0] == d2[0]) & (d1[1] == d2[1]);
-    default:
-        return memcmp(d1, d2, size1 * sizeof(UChar)) == 0;
-    }
+    if (size1 == 1)
+        return (*rep1)[0] == (*rep2)[0];
+    return equalSlowCase(s1, s2);
 }

trunk/Source/JavaScriptCore/wtf/StringHasher.h

r98495	r98624
135	135	template<typename T> static inline unsigned computeHash(const T* data, unsigned length)
136	136	{
137		return computeHash<T, defaultCoverter>(data, length);
	137	return computeHash<T, defaultConverter>(data, length);
138	138	}
139	139
140	140	template<typename T> static inline unsigned computeHash(const T* data)
141	141	{
142		return computeHash<T, defaultCoverter>(data);
	142	return computeHash<T, defaultConverter>(data);
143	143	}
144	144
…	…
156	156
157	157	private:
158		static inline UChar defaultCoverter(UChar ch)
	158	static inline UChar defaultConverter(UChar ch)
159	159	{
160	160	return ch;
161	161	}
162	162
163		static inline UChar defaultCoverter(char ch)
	163	static inline UChar defaultConverter(LChar ch)
164	164	{
165		return static_cast<unsigned char>(ch);
	165	return ch;
166	166	}
167	167
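
The second converter overload drops its static_cast because LChar is an unsigned 8-bit type, so widening it to UChar cannot sign-extend. A small sketch of the difference, assuming hypothetical typedefs SketchUChar and SketchLChar in place of UChar and LChar, and using signed char explicitly so the behaviour does not depend on the platform's plain char:

```cpp
#include <cassert>
#include <cstdint>

typedef uint16_t SketchUChar;      // stand-in for UChar
typedef unsigned char SketchLChar; // stand-in for LChar

// Old shape: a signed 8-bit input has to pass through unsigned char first,
// otherwise values below zero (bytes >= 0x80) sign-extend when widened.
inline SketchUChar widenSigned(signed char ch)
{
    return static_cast<unsigned char>(ch);
}

// What the old code would have produced without the cast.
inline SketchUChar widenSignedNaive(signed char ch)
{
    return static_cast<SketchUChar>(ch);
}

// New shape: the input is already unsigned, so a plain return is enough.
inline SketchUChar widenUnsigned(SketchLChar ch)
{
    return ch;
}
```

With the byte 0xE9 (U+00E9 in Latin-1) stored as the signed value -23, the naive widening yields 0xFFE9, while both other functions yield 0x00E9.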

trunk/Source/JavaScriptCore/wtf/text/AtomicString.cpp

r94475	r98624
86	86
87	87	struct CStringTranslator {
88		static unsigned hash(const char* c)
	88	static unsigned hash(const LChar* c)
89	89	{
90	90	return StringHasher::computeHash(c);
91	91	}
92	92
93		static inline bool equal(StringImpl* r, const char* s)
	93	static inline bool equal(StringImpl* r, const LChar* s)
94	94	{
95	95	return WTF::equal(r, s);
96	96	}
97	97
98		static void translate(StringImpl& location, const char const& c, unsigned hash)
	98	static void translate(StringImpl& location, const LChar const& c, unsigned hash)
99	99	{
100	100	location = StringImpl::create(c).leakRef();
…	…
104	104	};
105	105
106		PassRefPtr<StringImpl> AtomicString::add(const char* c)
	106	PassRefPtr<StringImpl> AtomicString::add(const LChar* c)
107	107	{
108	108	if (!c)
…	…
111	111	return StringImpl::empty();
112	112
113		return addToStringTable<const char*, CStringTranslator>(c);
	113	return addToStringTable<const LChar*, CStringTranslator>(c);
114	114	}
115	115

trunk/Source/JavaScriptCore/wtf/text/AtomicString.h

-              r94475
+              r98624
     AtomicString() { }
+    AtomicString(const LChar* s) : m_string(add(s)) { }
     AtomicString(const char* s) : m_string(add(s)) { }
     AtomicString(const UChar* s, unsigned length) : m_string(add(s, length)) { }
 …
     bool contains(UChar c) const { return m_string.contains(c); }
     bool contains(const char* s, bool caseSensitive = true) const
+    bool contains(const LChar* s, bool caseSensitive = true) const
         { return m_string.contains(s, caseSensitive); }
     bool contains(const String& s, bool caseSensitive = true) const
 …
     size_t find(UChar c, size_t start = 0) const { return m_string.find(c, start); }
     size_t find(const char* s, size_t start = 0, bool caseSentitive = true) const
+    size_t find(const LChar* s, size_t start = 0, bool caseSentitive = true) const
         { return m_string.find(s, start, caseSentitive); }
     size_t find(const String& s, size_t start = 0, bool caseSentitive = true) const
 …
     String m_string;
+    static PassRefPtr<StringImpl> add(const char*);
+    static PassRefPtr<StringImpl> add(const LChar*);
+    ALWAYS_INLINE static PassRefPtr<StringImpl> add(const char* s) { return add(reinterpret_cast<const LChar*>(s)); };
     static PassRefPtr<StringImpl> add(const UChar*, unsigned length);
+    ALWAYS_INLINE static PassRefPtr<StringImpl> add(const char* s, unsigned length) { return add(reinterpret_cast<const char*>(s), length); };
     static PassRefPtr<StringImpl> add(const UChar*, unsigned length, unsigned existingHash);
     static PassRefPtr<StringImpl> add(const UChar*);
 …
 inline bool operator==(const AtomicString& a, const AtomicString& b) { return a.impl() == b.impl(); }
 bool operator==(const AtomicString& a, const char* b);
 inline bool operator==(const AtomicString& a, const char* b) { return WTF::equal(a.impl(), b); }
+bool operator==(const AtomicString&, const LChar*);
+inline bool operator==(const AtomicString& a, const char* b) { return WTF::equal(a.impl(), reinterpret_cast<const LChar*>(b)); }
 inline bool operator==(const AtomicString& a, const Vector<UChar>& b) { return a.impl() && equal(a.impl(), b.data(), b.size()); }
 inline bool operator==(const AtomicString& a, const String& b) { return equal(a.impl(), b.impl()); }
 inline bool operator==(const char* a, const AtomicString& b) { return b == a; }
+inline bool operator==(const LChar* a, const AtomicString& b) { return b == a; }
 inline bool operator==(const String& a, const AtomicString& b) { return equal(a.impl(), b.impl()); }
 inline bool operator==(const Vector<UChar>& a, const AtomicString& b) { return b == a; }
 inline bool operator!=(const AtomicString& a, const AtomicString& b) { return a.impl() != b.impl(); }
+inline bool operator!=(const AtomicString& a, const char *b) { return !(a == b); }
+inline bool operator!=(const AtomicString& a, const LChar* b) { return !(a == b); }
+inline bool operator!=(const AtomicString& a, const char* b) { return !(a == b); }
 inline bool operator!=(const AtomicString& a, const String& b) { return !equal(a.impl(), b.impl()); }
 inline bool operator!=(const AtomicString& a, const Vector<UChar>& b) { return !(a == b); }
 inline bool operator!=(const char* a, const AtomicString& b) { return !(b == a); }
+inline bool operator!=(const LChar* a, const AtomicString& b) { return !(b == a); }
 inline bool operator!=(const String& a, const AtomicString& b) { return !equal(a.impl(), b.impl()); }
 inline bool operator!=(const Vector<UChar>& a, const AtomicString& b) { return !(a == b); }
 inline bool equalIgnoringCase(const AtomicString& a, const AtomicString& b) { return equalIgnoringCase(a.impl(), b.impl()); }
+inline bool equalIgnoringCase(const AtomicString& a, const char* b) { return equalIgnoringCase(a.impl(), b); }
+inline bool equalIgnoringCase(const AtomicString& a, const LChar* b) { return equalIgnoringCase(a.impl(), b); }
+inline bool equalIgnoringCase(const AtomicString& a, const char* b) { return equalIgnoringCase(a.impl(), reinterpret_cast<const LChar*>(b)); }
 inline bool equalIgnoringCase(const AtomicString& a, const String& b) { return equalIgnoringCase(a.impl(), b.impl()); }
+inline bool equalIgnoringCase(const char* a, const AtomicString& b) { return equalIgnoringCase(a, b.impl()); }
+inline bool equalIgnoringCase(const LChar* a, const AtomicString& b) { return equalIgnoringCase(a, b.impl()); }
+inline bool equalIgnoringCase(const char* a, const AtomicString& b) { return equalIgnoringCase(reinterpret_cast<const LChar*>(a), b.impl()); }
 inline bool equalIgnoringCase(const String& a, const AtomicString& b) { return equalIgnoringCase(a.impl(), b.impl()); }

trunk/Source/JavaScriptCore/wtf/text/StringConcatenate.h

-              r90813
+              r98624
 template<>
+class StringTypeAdapter<LChar> {
+public:
+    StringTypeAdapter<LChar>(LChar buffer)
+        : m_buffer(buffer)
+    {
+    }
+    unsigned length() { return 1; }
+    void writeTo(UChar* destination) { *destination = m_buffer; }
+private:
+    LChar m_buffer;
+};
+template<>
 class StringTypeAdapter<UChar> {
 public:
 …
 template<>
+class StringTypeAdapter<LChar*> {
+public:
+    StringTypeAdapter<LChar*>(LChar* buffer)
+    : m_buffer(buffer)
+    , m_length(strlen(reinterpret_cast<char*>(buffer)))
+    {
+    }
+    unsigned length() { return m_length; }
+    void writeTo(UChar* destination)
+    {
+        for (unsigned i = 0; i < m_length; ++i)
+            destination[i] = m_buffer[i];
+    }
+private:
+    const LChar* m_buffer;
+    unsigned m_length;
+};
+template<>
 class StringTypeAdapter<const UChar*> {
 public:
 …
 template<>
+class StringTypeAdapter<const LChar*> {
+public:
+    StringTypeAdapter<const LChar*>(const LChar* buffer)
+        : m_buffer(buffer)
+        , m_length(strlen(reinterpret_cast<const char*>(buffer)))
+    {
+    }
+    unsigned length() { return m_length; }
+    void writeTo(UChar* destination)
+    {
+        for (unsigned i = 0; i < m_length; ++i)
+            destination[i] = m_buffer[i];
+    }
+private:
+    const LChar* m_buffer;
+    unsigned m_length;
+};
+template<>
 class StringTypeAdapter<Vector<char> > {
 public:
 …
 private:
     const Vector<char>& m_buffer;
+};
+template<>
+class StringTypeAdapter<Vector<LChar> > {
+public:
+    StringTypeAdapter<Vector<LChar> >(const Vector<LChar>& buffer)
+        : m_buffer(buffer)
+    {
+    }
+    size_t length() { return m_buffer.size(); }
+    void writeTo(UChar* destination)
+    {
+        for (size_t i = 0; i < m_buffer.size(); ++i)
+            destination[i] = m_buffer[i];
+    }
+private:
+    const Vector<LChar>& m_buffer;
 };
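
Each adapter added above exposes the same two operations, length() and writeTo(UChar*), so a concatenation can size the result once and let every operand write into its own slice. A minimal sketch of that pattern under stated assumptions: SketchLCharAdapter and sketchConcat are hypothetical, and std::vector stands in for the real destination buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

typedef uint16_t SketchUChar;      // stand-in for UChar
typedef unsigned char SketchLChar; // stand-in for LChar

// Same shape as StringTypeAdapter<const LChar*>: measure once, then widen
// each 8-bit unit into the 16-bit destination.
class SketchLCharAdapter {
public:
    explicit SketchLCharAdapter(const SketchLChar* buffer)
        : m_buffer(buffer)
        , m_length(static_cast<unsigned>(std::strlen(reinterpret_cast<const char*>(buffer))))
    {
    }
    unsigned length() const { return m_length; }
    void writeTo(SketchUChar* destination) const
    {
        for (unsigned i = 0; i < m_length; ++i)
            destination[i] = m_buffer[i];
    }
private:
    const SketchLChar* m_buffer;
    unsigned m_length;
};

// Hypothetical two-operand concatenation: allocate the total length, then
// hand each adapter the start of its slice.
inline std::vector<SketchUChar> sketchConcat(const SketchLCharAdapter& a, const SketchLCharAdapter& b)
{
    std::vector<SketchUChar> result(a.length() + b.length());
    if (result.empty())
        return result;
    a.writeTo(result.data());
    b.writeTo(result.data() + a.length());
    return result;
}
```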

trunk/Source/JavaScriptCore/wtf/text/StringHash.h

-              r95090
+              r98624
                 return false;
+            if (a->is8Bit()) {
+                if (b->is8Bit()) {
+                    // Both a & b are 8 bit.
+                    const LChar* aChars = a->characters8();
+                    const LChar* bChars = b->characters8();
+                    unsigned i = 0;
+                    // FIXME: perhaps we should have a more abstract macro that indicates when
+                    // going 4 bytes at a time is unsafe
+#if (CPU(X86) || CPU(X86_64))
+                    const unsigned charsPerInt = sizeof(uint32_t) / sizeof(char);
+                    if (aLength > charsPerInt) {
+                        unsigned stopCount = aLength & ~(charsPerInt - 1);
+                        const uint32_t* aIntCharacters = reinterpret_cast<const uint32_t*>(aChars);
+                        const uint32_t* bIntCharacters = reinterpret_cast<const uint32_t*>(bChars);
+                        for (unsigned j = 0; i < stopCount; i += charsPerInt, ++j) {
+                            if (aIntCharacters[j] != bIntCharacters[j])
+                                return false;
+                        }
+                    }
+#endif
+                    for (; i < aLength; ++i) {
+                        if (aChars[i] != bChars[i])
+                            return false;
+                    }
+                    return true;
+                }
+                // We know that a is 8 bit & b is 16 bit.
+                const LChar* aChars = a->characters8();
+                const UChar* bChars = b->characters16();
+                for (unsigned i = 0; i != aLength; ++i) {
+                    if (*aChars++ != *bChars++)
+                        return false;
+                }
+                return true;
+            }
+            if (b->is8Bit()) {
+                // We know that a is 16 bit and b is 8 bit.
+                const UChar* aChars = a->characters16();
+                const LChar* bChars = b->characters8();
+                for (unsigned i = 0; i != aLength; ++i) {
+                    if (*aChars++ != *bChars++)
+                        return false;
+                }
+                return true;
+            }
+            // Both a & b are 16 bit.
             // FIXME: perhaps we should have a more abstract macro that indicates when
             // going 4 bytes at a time is unsafe
 #if CPU(ARM) || CPU(SH4) || CPU(MIPS) || CPU(SPARC)
-            const UChar* aChars = a->characters();
-            const UChar* bChars = b->characters();
+            const UChar* aChars = a->characters16();
+            const UChar* bChars = b->characters16();
             for (unsigned i = 0; i != aLength; ++i) {
                 if (*aChars++ != *bChars++)
 …
 #else
             /* Do it 4-bytes-at-a-time on architectures where it's safe */
-            const uint32_t* aChars = reinterpret_cast<const uint32_t*>(a->characters());
-            const uint32_t* bChars = reinterpret_cast<const uint32_t*>(b->characters());
+            const uint32_t* aChars = reinterpret_cast<const uint32_t*>(a->characters16());
+            const uint32_t* bChars = reinterpret_cast<const uint32_t*>(b->characters16());
             unsigned halfLength = aLength >> 1;
 …
+        }
+        static unsigned hash(const char* data, unsigned length)
+        {
+            return StringHasher::computeHash<char, foldCase<char> >(data, length);
+        }
+        static unsigned hash(const LChar* data, unsigned length)
+        {
+            return StringHasher::computeHash<LChar, foldCase<LChar> >(data, length);
+        }
+        static inline unsigned hash(const char* data, unsigned length)
+        {
+            return CaseFoldingHash::hash(reinterpret_cast<const LChar*>(data), length);
+        }
         static bool equal(const StringImpl* a, const StringImpl* b)
+        {
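
The 8-bit branch of equal() above compares four characters per step on x86 by casting to uint32_t, then finishes the remaining bytes one at a time. The sketch below shows the same word-then-tail structure, but it is not the code from the patch: it loads each word with memcpy so that it does not rely on unaligned access being safe, and sketchEqual8 is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Compare two 8-bit buffers of equal length, four bytes at a time where
// possible, then byte by byte for the tail.
inline bool sketchEqual8(const unsigned char* a, const unsigned char* b, unsigned length)
{
    const unsigned charsPerInt = sizeof(uint32_t);
    // Largest multiple of charsPerInt that does not exceed length.
    unsigned stopCount = length & ~(charsPerInt - 1);
    unsigned i = 0;
    for (; i < stopCount; i += charsPerInt) {
        uint32_t wordA;
        uint32_t wordB;
        std::memcpy(&wordA, a + i, sizeof(wordA));
        std::memcpy(&wordB, b + i, sizeof(wordB));
        if (wordA != wordB)
            return false;
    }
    for (; i < length; ++i) {
        if (a[i] != b[i])
            return false;
    }
    return true;
}
```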

trunk/Source/JavaScriptCore/wtf/text/StringImpl.cpp

-              r98316
+              r98624
     BufferOwnership ownership = bufferOwnership();
+    if (has16BitShadow()) {
+        ASSERT(m_copyData16);
+        fastFree(m_copyData16);
+    }
     if (ownership == BufferInternal)
         return;
     if (ownership == BufferOwned) {
+        ASSERT(m_data);
+        fastFree(const_cast<UChar*>(m_data));
+        // We use m_data8, but since it is a union with m_data16 this works either way.
+        ASSERT(m_data8);
+        fastFree(const_cast<LChar*>(m_data8));
         return;
+    }
 …
     ASSERT(m_substringBuffer);
     m_substringBuffer->deref();
+}
+PassRefPtr<StringImpl> StringImpl::createUninitialized(unsigned length, LChar*& data)
+{
+    if (!length) {
+        data = 0;
+        return empty();
+    }
+    // Allocate a single buffer large enough to contain the StringImpl
+    // struct as well as the data which it contains. This removes one
+    // heap allocation from this call.
+    if (length > ((std::numeric_limits<unsigned>::max() - sizeof(StringImpl)) / sizeof(LChar)))
+        CRASH();
+    size_t size = sizeof(StringImpl) + length * sizeof(LChar);
+    StringImpl* string = static_cast<StringImpl*>(fastMalloc(size));
+    data = reinterpret_cast<LChar*>(string + 1);
+    return adoptRef(new (string) StringImpl(length, Force8BitConstructor));
+}
 …
+}
 PassRefPtr<StringImpl> StringImpl::create(const char* characters, unsigned length)
+PassRefPtr<StringImpl> StringImpl::create(const LChar* characters, unsigned length)
+{
     if (!characters || !length)
 …
+}
 PassRefPtr<StringImpl> StringImpl::create(const char* string)
+PassRefPtr<StringImpl> StringImpl::create(const LChar* string)
+{
     if (!string)
         return empty();
     size_t length = strlen(string);
+    size_t length = strlen(reinterpret_cast<const char*>(string));
     if (length > numeric_limits<unsigned>::max())
         CRASH();
 …
+}
+const UChar* StringImpl::getData16SlowCase() const
+{
+    if (has16BitShadow())
+        return m_copyData16;
+    if (bufferOwnership() == BufferSubstring) {
+        // If this is a substring, return a pointer into the parent string.
+        // TODO: Consider severing this string from the parent string
+        unsigned offset = m_data8 - m_substringBuffer->characters8();
+        return m_substringBuffer->characters16() + offset;
+    }
+    unsigned len = length();
+    m_copyData16 = static_cast<UChar*>(fastMalloc(len * sizeof(UChar)));
+    for (size_t i = 0; i < len; i++)
+        m_copyData16[i] = m_data8[i];
+    m_hashAndFlags |= s_hashFlagHas16BitShadow;
+    return m_copyData16;
+}
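
getData16SlowCase() above materializes a 16-bit copy of an 8-bit string the first time a caller asks for UChar data, records that with a flag, and returns the cached copy afterwards. A minimal sketch of that lazy shadow idea, assuming a hypothetical SketchNarrowString that owns its storage through std::vector instead of fastMalloc and omits the substring case:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class SketchNarrowString {
public:
    SketchNarrowString(const unsigned char* data, std::size_t length)
        : m_data8(data, data + length)
        , m_hasShadow(false)
    {
    }

    const unsigned char* characters8() const { return m_data8.data(); }

    // Returns 16-bit data, widening on first use and caching the result,
    // in the spirit of characters() falling back to getData16SlowCase().
    const uint16_t* characters() const
    {
        if (!m_hasShadow) {
            m_shadow16.assign(m_data8.begin(), m_data8.end());
            m_hasShadow = true;
        }
        return m_shadow16.data();
    }

    bool hasShadow() const { return m_hasShadow; }
    std::size_t length() const { return m_data8.size(); }

private:
    std::vector<unsigned char> m_data8;
    mutable std::vector<uint16_t> m_shadow16; // filled lazily
    mutable bool m_hasShadow;
};
```

The cost of this design is that a string which has been widened once holds both buffers for the rest of its lifetime, which is why the commit message aims to move callers from characters() to characters8() and characters16().
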
 bool StringImpl::containsOnlyWhitespace()
+{
 …
     // that are not whitespace from the point of view of RenderText; I wonder if
     // that's a problem in practice.
+    for (unsigned i = 0; i < m_length; i++)
+        if (!isASCIISpace(m_data[i]))
+    if (is8Bit()) {
+        for (unsigned i = 0; i < m_length; i++) {
+            UChar c = m_data8[i];
+            if (!isASCIISpace(c))
+                return false;
+        }
+        return true;
+    }
+    for (unsigned i = 0; i < m_length; i++) {
+        UChar c = m_data16[i];
+        if (!isASCIISpace(c))
             return false;
+    }
     return true;
+}
 …
         length = maxLength;
+    }
+    return create(m_data + start, length);
+    if (is8Bit())
+        return create(m_data8 + start, length);
+    return create(m_data16 + start, length);
+}
 UChar32 StringImpl::characterStartingAt(unsigned i)
+{
+    if (U16_IS_SINGLE(m_data[i]))
+        return m_data[i];
+    if (i + 1 < m_length && U16_IS_LEAD(m_data[i]) && U16_IS_TRAIL(m_data[i + 1]))
+        return U16_GET_SUPPLEMENTARY(m_data[i], m_data[i + 1]);
+    if (is8Bit())
+        return m_data8[i];
+    if (U16_IS_SINGLE(m_data16[i]))
+        return m_data16[i];
+    if (i + 1 < m_length && U16_IS_LEAD(m_data16[i]) && U16_IS_TRAIL(m_data16[i + 1]))
+        return U16_GET_SUPPLEMENTARY(m_data16[i], m_data16[i + 1]);
     return 0;
+}
 …
     // First scan the string for uppercase and non-ASCII characters:
+    bool noUpper = true;
     UChar ored = 0;
+    bool noUpper = true;
+    const UChar *end = m_data + m_length;
+    for (const UChar* chp = m_data; chp != end; chp++) {
+    if (is8Bit()) {
+        const LChar* end = m_data8 + m_length;
+        for (const LChar* chp = m_data8; chp != end; chp++) {
+            if (UNLIKELY(isASCIIUpper(*chp)))
+                noUpper = false;
+            ored |= *chp;
+        }
+        // Nothing to do if the string is all ASCII with no uppercase.
+        if (noUpper && !(ored & ~0x7F))
+            return this;
+        if (m_length > static_cast<unsigned>(numeric_limits<int32_t>::max()))
+            CRASH();
+        int32_t length = m_length;
+        LChar* data8;
+        RefPtr<StringImpl> newImpl = createUninitialized(length, data8);
+        if (!(ored & ~0x7F)) {
+            for (int32_t i = 0; i < length; i++)
+                data8[i] = toASCIILower(m_data8[i]);
+            return newImpl.release();
+        }
+        // Do a slower implementation for cases that include non-ASCII Latin-1 characters.
+        for (int32_t i = 0; i < length; i++)
+            data8[i] = static_cast<LChar>(Unicode::toLower(m_data8[i]));
+        return newImpl.release();
+    }
+    const UChar *end = m_data16 + m_length;
+    for (const UChar* chp = m_data16; chp != end; chp++) {
         if (UNLIKELY(isASCIIUpper(*chp)))
             noUpper = false;
         ored |= *chp;
+    }
     // Nothing to do if the string is all ASCII with no uppercase.
     if (noUpper && !(ored & ~0x7F))
 …
     int32_t length = m_length;
-    UChar* data;
-    RefPtr<StringImpl> newImpl = createUninitialized(m_length, data);
     if (!(ored & ~0x7F)) {
+        // Do a faster loop for the case where all the characters are ASCII.
+        for (int i = 0; i < length; i++) {
+            UChar c = m_data[i];
+            data[i] = toASCIILower(c);
+        }
+        return newImpl;
+        UChar* data16;
+        RefPtr<StringImpl> newImpl = createUninitialized(m_length, data16);
+        for (int32_t i = 0; i < length; i++) {
+            UChar c = m_data16[i];
+            data16[i] = toASCIILower(c);
+        }
+        return newImpl.release();
+    }
     // Do a slower implementation for cases that include non-ASCII characters.
+    UChar* data16;
+    RefPtr<StringImpl> newImpl = createUninitialized(m_length, data16);
     bool error;
     int32_t realLength = Unicode::toLower(data, length, m_data, m_length, &error);
+    int32_t realLength = Unicode::toLower(data16, length, m_data16, m_length, &error);
     if (!error && realLength == length)
+        return newImpl;
+    newImpl = createUninitialized(realLength, data);
+    Unicode::toLower(data, realLength, m_data, m_length, &error);
+        return newImpl.release();
+    newImpl = createUninitialized(realLength, data16);
+    Unicode::toLower(data16, realLength, m_data16, m_length, &error);
     if (error)
         return this;
     return newImpl;
+    return newImpl.release();
+}
 …
     // but in empirical testing, few actual calls to upper() are no-ops, so
     // it wouldn't be worth the extra time for pre-scanning.
-    UChar* data;
-    RefPtr<StringImpl> newImpl = createUninitialized(m_length, data);
     if (m_length > static_cast<unsigned>(numeric_limits<int32_t>::max()))
 …
     int32_t length = m_length;
+    if (is8Bit()) {
+        LChar* data8;
+        RefPtr<StringImpl> newImpl = createUninitialized(m_length, data8);
+        // Do a faster loop for the case where all the characters are ASCII.
+        char ored = 0;
+        for (int i = 0; i < length; i++) {
+            char c = m_data8[i];
+            ored |= c;
+            data8[i] = toASCIIUpper(c);
+        }
+        if (!(ored & ~0x7F))
+            return newImpl.release();
+        // Do a slower implementation for cases that include non-ASCII Latin-1 characters.
+        for (int32_t i = 0; i < length; i++)
+            data8[i] = static_cast<LChar>(Unicode::toUpper(m_data8[i]));
+        return newImpl.release();
+    }
+    UChar* data16;
+    RefPtr<StringImpl> newImpl = createUninitialized(m_length, data16);
     // Do a faster loop for the case where all the characters are ASCII.
     UChar ored = 0;
     for (int i = 0; i < length; i++) {
         UChar c = m_data[i];
+        UChar c = m_data16[i];
         ored |= c;
         data[i] = toASCIIUpper(c);
+        data16[i] = toASCIIUpper(c);
+    }
     if (!(ored & ~0x7F))
 …
     // Do a slower implementation for cases that include non-ASCII characters.
     bool error;
+    int32_t realLength = Unicode::toUpper(data, length, m_data, m_length, &error);
+    newImpl = createUninitialized(m_length, data16);
+    int32_t realLength = Unicode::toUpper(data16, length, m_data16, m_length, &error);
     if (!error && realLength == length)
         return newImpl;
     newImpl = createUninitialized(realLength, data);
     Unicode::toUpper(data, realLength, m_data, m_length, &error);
+    newImpl = createUninitialized(realLength, data16);
+    Unicode::toUpper(data16, realLength, m_data16, m_length, &error);
     if (error)
         return this;
 …
         return this;
+    if (!(character & ~0x7F)) {
+        LChar* data;
+        RefPtr<StringImpl> newImpl = createUninitialized(m_length, data);
+        for (unsigned i = 0; i < m_length; ++i)
+            data[i] = character;
+        return newImpl.release();
+    }
     UChar* data;
     RefPtr<StringImpl> newImpl = createUninitialized(m_length, data);
 …
 PassRefPtr<StringImpl> StringImpl::foldCase()
+{
-    UChar* data;
-    RefPtr<StringImpl> newImpl = createUninitialized(m_length, data);
     if (m_length > static_cast<unsigned>(numeric_limits<int32_t>::max()))
         CRASH();
     int32_t length = m_length;
+    if (is8Bit()) {
+        // Do a faster loop for the case where all the characters are ASCII.
+        LChar* data;
+        RefPtr <StringImpl>newImpl = createUninitialized(m_length, data);
+        LChar ored = 0;
+        for (int32_t i = 0; i < length; i++) {
+            LChar c = m_data8[i];
+            data[i] = toASCIILower(c);
+            ored |= c;
+        }
+        if (!(ored & ~0x7F))
+            return newImpl.release();
+        // Do a slower implementation for cases that include non-ASCII Latin-1 characters.
+        for (int32_t i = 0; i < length; i++)
+            data[i] = static_cast<LChar>(Unicode::toLower(m_data8[i]));
+        return newImpl.release();
+    }
     // Do a faster loop for the case where all the characters are ASCII.
+    UChar* data;
+    RefPtr<StringImpl> newImpl = createUninitialized(m_length, data);
     UChar ored = 0;
     for (int32_t i = 0; i < length; i++) {
         UChar c = m_data[i];
+        UChar c = m_data16[i];
         ored |= c;
         data[i] = toASCIILower(c);
 …
     // Do a slower implementation for cases that include non-ASCII characters.
     bool error;
     int32_t realLength = Unicode::foldCase(data, length, m_data, m_length, &error);
+    int32_t realLength = Unicode::foldCase(data, length, m_data16, m_length, &error);
     if (!error && realLength == length)
         return newImpl.release();
     newImpl = createUninitialized(realLength, data);
     Unicode::foldCase(data, realLength, m_data, m_length, &error);
+    Unicode::foldCase(data, realLength, m_data16, m_length, &error);
     if (error)
         return this;
 …
     // skip white space from start
     while (start <= end && predicate(m_data[start]))
+    while (start <= end && predicate(is8Bit() ? m_data8[start] : m_data16[start]))
         start++;
 …
     // skip white space from end
     while (end && predicate(m_data[end]))
+    while (end && predicate(is8Bit() ? m_data8[end] : m_data16[end]))
         end--;
     if (!start && end == m_length - 1)
         return this;
+    return create(m_data + start, end + 1 - start);
+    if (is8Bit())
+        return create(m_data8 + start, end + 1 - start);
+    return create(m_data16 + start, end + 1 - start);
+}
 …
+}
+// FIXME: Add 8-bit path. Likely requires templatized StringBuffer class
 PassRefPtr<StringImpl> StringImpl::removeCharacters(CharacterMatchFunctionPtr findMatch)
+{
     const UChar* from = m_data;
+    const UChar* from = characters();
     const UChar* fromend = from + m_length;
 …
     StringBuffer data(m_length);
     UChar* to = data.characters();
     unsigned outc = from - m_data;
+    unsigned outc = from - characters16();
     if (outc)
         memcpy(to, m_data, outc * sizeof(UChar));
+        memcpy(to, characters16(), outc * sizeof(UChar));
     while (true) {
 …
+}
+// FIXME: Add 8-bit path. Likely requires templatized StringBuffer class
 template <class UCharPredicate>
 inline PassRefPtr<StringImpl> StringImpl::simplifyMatchedCharactersToSpace(UCharPredicate predicate)
 …
     StringBuffer data(m_length);
     const UChar* from = m_data;
+    const UChar* from = characters16();
     const UChar* fromend = from + m_length;
     int outc = 0;
 …
 int StringImpl::toIntStrict(bool* ok, int base)
+{
     return charactersToIntStrict(m_data, m_length, ok, base);
+    return charactersToIntStrict(characters16(), m_length, ok, base);
+}
 unsigned StringImpl::toUIntStrict(bool* ok, int base)
+{
     return charactersToUIntStrict(m_data, m_length, ok, base);
+    return charactersToUIntStrict(characters16(), m_length, ok, base);
+}
 int64_t StringImpl::toInt64Strict(bool* ok, int base)
+{
+    return charactersToInt64Strict(m_data, m_length, ok, base);
+    return charactersToInt64Strict(characters16(), m_length, ok, base);
+}
 uint64_t StringImpl::toUInt64Strict(bool* ok, int base)
+{
     return charactersToUInt64Strict(m_data, m_length, ok, base);
+    return charactersToUInt64Strict(characters16(), m_length, ok, base);
+}
 intptr_t StringImpl::toIntPtrStrict(bool* ok, int base)
+{
     return charactersToIntPtrStrict(m_data, m_length, ok, base);
+    return charactersToIntPtrStrict(characters16(), m_length, ok, base);
+}
 int StringImpl::toInt(bool* ok)
+{
     return charactersToInt(m_data, m_length, ok);
+    return charactersToInt(characters16(), m_length, ok);
+}
 unsigned StringImpl::toUInt(bool* ok)
+{
     return charactersToUInt(m_data, m_length, ok);
+    return charactersToUInt(characters16(), m_length, ok);
+}
 int64_t StringImpl::toInt64(bool* ok)
+{
     return charactersToInt64(m_data, m_length, ok);
+    return charactersToInt64(characters16(), m_length, ok);
+}
 uint64_t StringImpl::toUInt64(bool* ok)
+{
     return charactersToUInt64(m_data, m_length, ok);
+    return charactersToUInt64(characters16(), m_length, ok);
+}
 intptr_t StringImpl::toIntPtr(bool* ok)
+{
+    return charactersToIntPtr(m_data, m_length, ok);
+    return charactersToIntPtr(characters16(), m_length, ok);
+}
 double StringImpl::toDouble(bool* ok, bool* didReadNumber)
+{
     return charactersToDouble(m_data, m_length, ok, didReadNumber);
+    return charactersToDouble(characters16(), m_length, ok, didReadNumber);
+}
 float StringImpl::toFloat(bool* ok, bool* didReadNumber)
+{
     return charactersToFloat(m_data, m_length, ok, didReadNumber);
+}
 static bool equal(const UChar* a, const char* b, int length)
+    return charactersToFloat(characters16(), m_length, ok, didReadNumber);
+}
+static bool equal(const UChar* a, const LChar* b, int length)
+{
     ASSERT(length >= 0);
 …
+}
 bool equalIgnoringCase(const UChar* a, const char* b, unsigned length)
+bool equalIgnoringCase(const UChar* a, const LChar* b, unsigned length)
+{
     while (length--) {
 …
 size_t StringImpl::find(UChar c, unsigned start)
+{
     return WTF::find(m_data, m_length, c, start);
+    return WTF::find(characters16(), m_length, c, start);
+}
 size_t StringImpl::find(CharacterMatchFunctionPtr matchFunction, unsigned start)
+{
     return WTF::find(m_data, m_length, matchFunction, start);
+}
 size_t StringImpl::find(const char* matchString, unsigned index)
+    return WTF::find(characters16(), m_length, matchFunction, start);
+}
+size_t StringImpl::find(const LChar* matchString, unsigned index)
+{
     // Check for null or empty string to match against
     if (!matchString)
         return notFound;
     size_t matchStringLength = strlen(matchString);
+    size_t matchStringLength = strlen(reinterpret_cast<const char*>(matchString));
     if (matchStringLength > numeric_limits<unsigned>::max())
         CRASH();
 …
     // Optimization 1: fast case for strings of length 1.
     if (matchLength == 1)
         return WTF::find(characters(), length(), *(const unsigned char*)matchString, index);
+        return WTF::find(characters16(), length(), *matchString, index);
     // Check index & matchLength are in range.
 …
     const UChar* searchCharacters = characters() + index;
-    const unsigned char* matchCharacters = (const unsigned char*)matchString;
     // Optimization 2: keep a running hash of the strings,
 …
     for (unsigned i = 0; i < matchLength; ++i) {
         searchHash += searchCharacters[i];
         matchHash += matchCharacters[i];
+        matchHash += matchString[i];
+    }
 …
+}
 size_t StringImpl::findIgnoringCase(const char* matchString, unsigned index)
+size_t StringImpl::findIgnoringCase(const LChar* matchString, unsigned index)
+{
     // Check for null or empty string to match against
     if (!matchString)
         return notFound;
     size_t matchStringLength = strlen(matchString);
+    size_t matchStringLength = strlen(reinterpret_cast<const char*>(matchString));
     if (matchStringLength > numeric_limits<unsigned>::max())
         CRASH();
 …
     // Optimization 1: fast case for strings of length 1.
     if (matchLength == 1)
         return WTF::find(characters(), length(), matchString->characters()[0], index);
+        return WTF::find(characters16(), length(), matchString->characters16()[0], index);
     // Check index & matchLength are in range.
 …
 size_t StringImpl::reverseFind(UChar c, unsigned index)
+{
     return WTF::reverseFind(m_data, m_length, c, index);
+    return WTF::reverseFind(characters16(), m_length, c, index);
+}
 …
     // Optimization 1: fast case for strings of length 1.
     if (matchLength == 1)
         return WTF::reverseFind(characters(), length(), matchString->characters()[0], index);
+        return WTF::reverseFind(characters16(), length(), matchString->characters()[0], index);
     // Check index & matchLength are in range.
 …
         return this;
     unsigned i;
+    for (i = 0; i != m_length; ++i)
+        if (m_data[i] == oldC)
+    for (i = 0; i != m_length; ++i) {
+        UChar c = is8Bit() ? m_data8[i] : m_data16[i];
+        if (c == oldC)
             break;
+    }
     if (i == m_length)
         return this;
+    if (is8Bit()) {
+        if (oldC > 0xff)
+            // Looking for a 16 bit char in an 8 bit string, we're done.
+            return this;
+        if (newC <= 0xff) {
+            LChar* data;
+            LChar oldChar = static_cast<LChar>(oldC);
+            LChar newChar = static_cast<LChar>(newC);
+            RefPtr<StringImpl> newImpl = createUninitialized(m_length, data);
+            for (i = 0; i != m_length; ++i) {
+                char ch = m_data8[i];
+                if (ch == oldChar)
+                    ch = newChar;
+                data[i] = ch;
+            }
+            return newImpl.release();
+        }
+        // There is the possibility we need to up convert from 8 to 16 bit,
+        // create a 16 bit string for the result.
+        UChar* data;
+        RefPtr<StringImpl> newImpl = createUninitialized(m_length, data);
+        for (i = 0; i != m_length; ++i) {
+            UChar ch = m_data8[i];
+            if (ch == oldC)
+                ch = newC;
+            data[i] = ch;
+        }
+        return newImpl.release();
+    }
     UChar* data;
     RefPtr<StringImpl> newImpl = createUninitialized(m_length, data);
     for (i = 0; i != m_length; ++i) {
         UChar ch = m_data[i];
+        UChar ch = m_data16[i];
         if (ch == oldC)
             ch = newC;
 …
     if (!lengthToReplace && !lengthToInsert)
         return this;
-    UChar* data;
     if ((length() - lengthToReplace) >= (numeric_limits<unsigned>::max() - lengthToInsert))
         CRASH();
+    if (is8Bit() && (!str || str->is8Bit())) {
+        LChar* data;
+        RefPtr<StringImpl> newImpl =
+        createUninitialized(length() - lengthToReplace + lengthToInsert, data);
+        memcpy(data, m_data8, position * sizeof(LChar));
+        if (str)
+            memcpy(data + position, str->m_data8, lengthToInsert * sizeof(LChar));
+        memcpy(data + position + lengthToInsert, m_data8 + position + lengthToReplace,
+               (length() - position - lengthToReplace) * sizeof(LChar));
+        return newImpl.release();
+    }
+    UChar* data;
     RefPtr<StringImpl> newImpl =
         createUninitialized(length() - lengthToReplace + lengthToInsert, data);
+    memcpy(data, characters(), position * sizeof(UChar));
+    if (str)
+        memcpy(data + position, str->characters(), lengthToInsert * sizeof(UChar));
+    memcpy(data + position + lengthToInsert, characters() + position + lengthToReplace,
+        (length() - position - lengthToReplace) * sizeof(UChar));
+    if (is8Bit())
+        for (unsigned i = 0; i < position; i++)
+            data[i] = m_data8[i];
+    else
+        memcpy(data, m_data16, position * sizeof(UChar));
+    if (str) {
+        if (str->is8Bit())
+            for (unsigned i = 0; i < lengthToInsert; i++)
+                data[i + position] = str->m_data8[i];
+        else
+            memcpy(data + position, str->m_data16, lengthToInsert * sizeof(UChar));
+    }
+    if (is8Bit()) {
+        for (unsigned i = 0; i < length() - position - lengthToReplace; i++)
+            data[i + position + lengthToInsert] = m_data8[i + position + lengthToReplace];
+    } else {
+        memcpy(data + position + lengthToInsert, characters() + position + lengthToReplace,
+            (length() - position - lengthToReplace) * sizeof(UChar));
+    }
     return newImpl.release();
+}
 …
     unsigned matchCount = 0;
     // Count the matches
+    // Count the matches.
     while ((srcSegmentStart = find(pattern, srcSegmentStart)) != notFound) {
         ++matchCount;
 …
+    }
     // If we have 0 matches, we don't have to do any more work
+    // If we have 0 matches then we don't have to do any more work.
     if (!matchCount)
         return this;
 …
     newSize += replaceSize;
+    UChar* data;
+    RefPtr<StringImpl> newImpl = createUninitialized(newSize, data);
+    // Construct the new data
+    // Construct the new data.
     size_t srcSegmentEnd;
     unsigned srcSegmentLength;
     srcSegmentStart = 0;
     unsigned dstOffset = 0;
+    bool srcIs8Bit = is8Bit();
+    bool replacementIs8Bit = replacement->is8Bit();
+    // There are 4 cases:
+    // 1. This and replacement are both 8 bit.
+    // 2. This and replacement are both 16 bit.
+    // 3. This is 8 bit and replacement is 16 bit.
+    // 4. This is 16 bit and replacement is 8 bit.
+    if (srcIs8Bit && replacementIs8Bit) {
+        // Case 1
+        LChar* data;
+        RefPtr<StringImpl> newImpl = createUninitialized(newSize, data);
+        while ((srcSegmentEnd = find(pattern, srcSegmentStart)) != notFound) {
+            srcSegmentLength = srcSegmentEnd - srcSegmentStart;
+            memcpy(data + dstOffset, m_data8 + srcSegmentStart, srcSegmentLength * sizeof(LChar));
+            dstOffset += srcSegmentLength;
+            memcpy(data + dstOffset, replacement->m_data8, repStrLength * sizeof(LChar));
+            dstOffset += repStrLength;
+            srcSegmentStart = srcSegmentEnd + 1;
+        }
+        srcSegmentLength = m_length - srcSegmentStart;
+        memcpy(data + dstOffset, m_data8 + srcSegmentStart, srcSegmentLength * sizeof(LChar));
+        ASSERT(dstOffset + srcSegmentLength == newImpl->length());
+        return newImpl.release();
+    }
+    UChar* data;
+    RefPtr<StringImpl> newImpl = createUninitialized(newSize, data);
     while ((srcSegmentEnd = find(pattern, srcSegmentStart)) != notFound) {
         srcSegmentLength = srcSegmentEnd - srcSegmentStart;
+        memcpy(data + dstOffset, m_data + srcSegmentStart, srcSegmentLength * sizeof(UChar));
+        if (srcIs8Bit) {
+            // Case 3.
+            for (unsigned i = 0; i < srcSegmentLength; i++)
+                data[i + dstOffset] = m_data8[i + srcSegmentStart];
+        } else {
+            // Cases 2 & 4.
+            memcpy(data + dstOffset, m_data16 + srcSegmentStart, srcSegmentLength * sizeof(UChar));
+        }
         dstOffset += srcSegmentLength;
+        memcpy(data + dstOffset, replacement->m_data, repStrLength * sizeof(UChar));
+        if (replacementIs8Bit) {
+            // Case 4.
+            for (unsigned i = 0; i < repStrLength; i++)
+                data[i + dstOffset] = replacement->m_data8[i];
+        } else {
+            // Cases 2 & 3.
+            memcpy(data + dstOffset, replacement->m_data16, repStrLength * sizeof(UChar));
+        }
         dstOffset += repStrLength;
         srcSegmentStart = srcSegmentEnd + 1;
 …
     srcSegmentLength = m_length - srcSegmentStart;
+    memcpy(data + dstOffset, m_data + srcSegmentStart, srcSegmentLength * sizeof(UChar));
+    if (srcIs8Bit) {
+        // Case 3.
+        for (unsigned i = 0; i < srcSegmentLength; i++)
+            data[i + dstOffset] = m_data8[i + srcSegmentStart];
+    } else {
+        // Cases 2 & 4.
+        memcpy(data + dstOffset, m_data16 + srcSegmentStart, srcSegmentLength * sizeof(UChar));
+    }
     ASSERT(dstOffset + srcSegmentLength == newImpl->length());
 …
     unsigned matchCount = 0;
     // Count the matches
+    // Count the matches.
     while ((srcSegmentStart = find(pattern, srcSegmentStart)) != notFound) {
         ++matchCount;
 …
     newSize += matchCount * repStrLength;
-    UChar* data;
-    RefPtr<StringImpl> newImpl = createUninitialized(newSize, data);
     // Construct the new data
 …
     srcSegmentStart = 0;
     unsigned dstOffset = 0;
+    bool srcIs8Bit = is8Bit();
+    bool replacementIs8Bit = replacement->is8Bit();
+    // There are 4 cases:
+    // 1. This and replacement are both 8 bit.
+    // 2. This and replacement are both 16 bit.
+    // 3. This is 8 bit and replacement is 16 bit.
+    // 4. This is 16 bit and replacement is 8 bit.
+    if (srcIs8Bit && replacementIs8Bit) {
+        // Case 1
+        LChar* data;
+        RefPtr<StringImpl> newImpl = createUninitialized(newSize, data);
+        while ((srcSegmentEnd = find(pattern, srcSegmentStart)) != notFound) {
+            srcSegmentLength = srcSegmentEnd - srcSegmentStart;
+            memcpy(data + dstOffset, m_data8 + srcSegmentStart, srcSegmentLength * sizeof(LChar));
+            dstOffset += srcSegmentLength;
+            memcpy(data + dstOffset, replacement->m_data8, repStrLength * sizeof(LChar));
+            dstOffset += repStrLength;
+            srcSegmentStart = srcSegmentEnd + patternLength;
+        }
+        srcSegmentLength = m_length - srcSegmentStart;
+        memcpy(data + dstOffset, m_data8 + srcSegmentStart, srcSegmentLength * sizeof(LChar));
+        ASSERT(dstOffset + srcSegmentLength == newImpl->length());
+        return newImpl.release();
+    }
+    UChar* data;
+    RefPtr<StringImpl> newImpl = createUninitialized(newSize, data);
     while ((srcSegmentEnd = find(pattern, srcSegmentStart)) != notFound) {
         srcSegmentLength = srcSegmentEnd - srcSegmentStart;
+        memcpy(data + dstOffset, m_data + srcSegmentStart, srcSegmentLength * sizeof(UChar));
+        if (srcIs8Bit) {
+            // Case 3.
+            for (unsigned i = 0; i < srcSegmentLength; i++)
+                data[i + dstOffset] = m_data8[i + srcSegmentStart];
+        } else {
+            // Cases 2 & 4.
+            memcpy(data + dstOffset, m_data16 + srcSegmentStart, srcSegmentLength * sizeof(UChar));
+        }
         dstOffset += srcSegmentLength;
+        memcpy(data + dstOffset, replacement->m_data, repStrLength * sizeof(UChar));
+        if (replacementIs8Bit) {
+            // Case 4.
+            for (unsigned i = 0; i < repStrLength; i++)
+                data[i + dstOffset] = replacement->m_data8[i];
+        } else {
+            // Cases 2 & 3.
+            memcpy(data + dstOffset, replacement->m_data16, repStrLength * sizeof(UChar));
+        }
         dstOffset += repStrLength;
         srcSegmentStart = srcSegmentEnd + patternLength;
 …
     srcSegmentLength = m_length - srcSegmentStart;
+    memcpy(data + dstOffset, m_data + srcSegmentStart, srcSegmentLength * sizeof(UChar));
+    if (srcIs8Bit) {
+        // Case 3.
+        for (unsigned i = 0; i < srcSegmentLength; i++)
+            data[i + dstOffset] = m_data8[i + srcSegmentStart];
+    } else {
+        // Cases 2 & 4.
+        memcpy(data + dstOffset, m_data16 + srcSegmentStart, srcSegmentLength * sizeof(UChar));
+    }
     ASSERT(dstOffset + srcSegmentLength == newImpl->length());
 …
+}
 bool equal(const StringImpl* a, const char* b)
+bool equal(const StringImpl* a, const LChar* b, unsigned length)
+{
     if (!a)
 …
         return !a;
+    if (length != a->length())
+        return false;
+    if (a->is8Bit()) {
+        const LChar* aChars = a->characters8();
+        for (unsigned i = 0; i != length; ++i) {
+            unsigned char bc = b[i];
+            unsigned char ac = aChars[i];
+            if (!bc)
+                return false;
+            if (ac != bc)
+                return false;
+        }
+        return true;
+    }
+    const UChar* aChars = a->characters16();
+    for (unsigned i = 0; i != length; ++i) {
+        UChar bc = b[i];
+        UChar ac = aChars[i];
+        if (!bc)
+            return false;
+        if (ac != bc)
+            return false;
+    }
+    return true;
+}
+bool equal(const StringImpl* a, const LChar* b)
+{
+    if (!a)
+        return !b;
+    if (!b)
+        return !a;
     unsigned length = a->length();
+    const UChar* as = a->characters();
+    if (a->is8Bit()) {
+        const LChar* aPtr = a->characters8();
+        for (unsigned i = 0; i != length; ++i) {
+            unsigned char bc = b[i];
+            unsigned char ac = aPtr[i];
+            if (!bc)
+                return false;
+            if (ac != bc)
+                return false;
+        }
+        return !b[length];
+    }
+    const UChar* aPtr = a->characters16();
     for (unsigned i = 0; i != length; ++i) {
         unsigned char bc = b[i];
         if (!bc)
             return false;
         if (as[i] != bc)
+        if (aPtr[i] != bc)
             return false;
+    }
 …
     return true;
 #else
+    /* Do it 4-bytes-at-a-time on architectures where it's safe */
+    const uint32_t* aCharacters = reinterpret_cast<const uint32_t*>(a->characters());
+    if (a->is8Bit()) {
+        const LChar* as = a->characters8();
+        for (unsigned i = 0; i != length; ++i)
+            if (as[i] != b[i])
+                return false;
+        return true;
+    }
+    // Do comparison 4-bytes-at-a-time on architectures where it's safe.
+    const uint32_t* aCharacters = reinterpret_cast<const uint32_t*>(a->characters16());
     const uint32_t* bCharacters = reinterpret_cast<const uint32_t*>(b);
     unsigned halfLength = length >> 1;
     for (unsigned i = 0; i != halfLength; ++i) {
 …
             return false;
+    }
     if (length & 1 &&  *reinterpret_cast<const uint16_t*>(aCharacters) != *reinterpret_cast<const uint16_t*>(bCharacters))
         return false;
     return true;
 #endif
 …
+}
 bool equalIgnoringCase(StringImpl* a, const char* b)
+bool equalIgnoringCase(StringImpl* a, const LChar* b)
+{
     if (!a)
 …
     unsigned length = a->length();
-    const UChar* as = a->characters();
     // Do a faster loop for the case where all the characters are ASCII.
     UChar ored = 0;
     bool equal = true;
+    if (a->is8Bit()) {
+        const LChar* as = a->characters8();
+        for (unsigned i = 0; i != length; ++i) {
+            LChar bc = b[i];
+            if (!bc)
+                return false;
+            UChar ac = as[i];
+            ored |= ac;
+            equal = equal && (toASCIILower(ac) == toASCIILower(bc));
+        }
+        // Do a slower implementation for cases that include non-ASCII characters.
+        if (ored & ~0x7F) {
+            equal = true;
+            for (unsigned i = 0; i != length; ++i)
+                equal = equal && (foldCase(as[i]) == foldCase(b[i]));
+        }
+        return equal && !b[length];
+    }
+    const UChar* as = a->characters16();
     for (unsigned i = 0; i != length; ++i) {
         char bc = b[i];
+        LChar bc = b[i];
         if (!bc)
             return false;
 …
         equal = true;
         for (unsigned i = 0; i != length; ++i) {
+            unsigned char bc = b[i];
+            equal = equal && (foldCase(as[i]) == foldCase(bc));
+            equal = equal && (foldCase(as[i]) == foldCase(b[i]));
+        }
+    }
 …
+{
     for (unsigned i = 0; i < m_length; ++i) {
         WTF::Unicode::Direction charDirection = WTF::Unicode::direction(m_data[i]);
+        WTF::Unicode::Direction charDirection = WTF::Unicode::direction(is8Bit() ? m_data8[i] : m_data16[i]);
         if (charDirection == WTF::Unicode::LeftToRight) {
             if (hasStrongDirectionality)
 …
 PassRefPtr<StringImpl> StringImpl::adopt(StringBuffer& buffer)
+{
+    // FIXME: handle 8-bit StringBuffer when it exists.
     unsigned length = buffer.length();
     if (length == 0)
 …
     // Use createUninitialized instead of 'new StringImpl' so that the string and its buffer
     // get allocated in a single memory block.
-    UChar* data;
     unsigned length = string.m_length;
     if (length >= numeric_limits<unsigned>::max())
         CRASH();
+    RefPtr<StringImpl> terminatedString = createUninitialized(length + 1, data);
+    memcpy(data, string.m_data, length * sizeof(UChar));
+    data[length] = 0;
+    RefPtr<StringImpl> terminatedString;
+    if (string.is8Bit()) {
+        LChar* data;
+        terminatedString = createUninitialized(length + 1, data);
+        memcpy(data, string.m_data8, length * sizeof(LChar));
+        data[length] = 0;
+    } else {
+        UChar* data;
+        terminatedString = createUninitialized(length + 1, data);
+        memcpy(data, string.m_data16, length * sizeof(UChar));
+        data[length] = 0;
+    }
     terminatedString->m_length--;
     terminatedString->m_hashAndFlags = (string.m_hashAndFlags & ~s_flagMask) | s_hashFlagHasTerminatingNullCharacter;
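
The single-character replace path above chooses the width of the result from the source and the replacement character. The following is a minimal standalone sketch of that decision for an 8-bit source only. It does not use StringImpl: `Result` and `replaceIn8BitSource` are hypothetical names, the `LChar` and `UChar` aliases are assumed to mirror the WTF types, and `std::vector` stands in for the real buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-ins for WTF's character types.
using LChar = unsigned char;
using UChar = char16_t;

// Hypothetical result type: exactly one of the two buffers is populated.
struct Result {
    bool is8Bit;
    std::vector<LChar> data8;
    std::vector<UChar> data16;
};

// Replace every oldC with newC in a Latin-1 (8-bit) source buffer.
Result replaceIn8BitSource(const std::vector<LChar>& source, UChar oldC, UChar newC)
{
    Result result { true, {}, {} };

    // Look for oldC first. A value above 0xff can never match 8-bit data.
    bool found = false;
    for (LChar ch : source) {
        if (ch == oldC) {
            found = true;
            break;
        }
    }
    if (!found) {
        result.data8 = source; // Nothing to replace: keep the 8-bit data.
        return result;
    }

    if (newC <= 0xff) {
        // Both characters fit in 8 bits, so the result stays 8-bit.
        result.data8.reserve(source.size());
        for (LChar ch : source)
            result.data8.push_back(ch == oldC ? static_cast<LChar>(newC) : ch);
        return result;
    }

    // newC needs 16 bits, so up-convert the whole result.
    result.is8Bit = false;
    result.data16.reserve(source.size());
    for (LChar ch : source)
        result.data16.push_back(ch == oldC ? newC : static_cast<UChar>(ch));
    return result;
}
```

Reading each source character as an unsigned `LChar` matters here: a plain `char` may be signed, in which case values of 0x80 and above would not compare equal to the requested character.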

trunk/Source/JavaScriptCore/wtf/text/StringImpl.h

-              r98495
+              r98624
 namespace JSC {
 struct IdentifierCStringTranslator;
+struct IdentifierCharBufferTranslator;
 struct IdentifierUCharBufferTranslator;
+}
 …
     WTF_MAKE_NONCOPYABLE(StringImpl); WTF_MAKE_FAST_ALLOCATED;
     friend struct JSC::IdentifierCStringTranslator;
+    friend struct JSC::IdentifierCharBufferTranslator;
     friend struct JSC::IdentifierUCharBufferTranslator;
     friend struct WTF::CStringTranslator;
 …
         : m_refCount(s_refCountFlagIsStaticString)
         , m_length(length)
         , m_data(characters)
+        , m_data16(characters)
         , m_buffer(0)
         , m_hashAndFlags(s_hashFlagIsIdentifier | BufferOwned)
 …
+    }
+    // Create a normal string with internal storage (BufferInternal)
+    // Used to construct static strings, which have a special refCount that can never hit zero.
+    // This means that the static string will never be destroyed, which is important because
+    // static strings will be shared across threads & ref-counted in a non-threadsafe manner.
+    StringImpl(const LChar* characters, unsigned length, ConstructStaticStringTag)
+        : m_refCount(s_refCountFlagIsStaticString)
+        , m_length(length)
+        , m_data8(characters)
+        , m_buffer(0)
+        , m_hashAndFlags(s_hashFlag8BitBuffer | s_hashFlagIsIdentifier | BufferOwned)
+    {
+        // Ensure that the hash is computed so that AtomicStringHash can call existingHash()
+        // with impunity. The empty string is special because it is never entered into
+        // AtomicString's HashKey, but still needs to compare correctly.
+        hash();
+    }
+    // FIXME: there has to be a less hacky way to do this.
+    enum Force8Bit { Force8BitConstructor };
+    // Create a normal 8-bit string with internal storage (BufferInternal)
+    StringImpl(unsigned length, Force8Bit)
+        : m_refCount(s_refCountIncrement)
+        , m_length(length)
+        , m_data8(reinterpret_cast<const LChar*>(this + 1))
+        , m_buffer(0)
+        , m_hashAndFlags(s_hashFlag8BitBuffer | BufferInternal)
+    {
+        ASSERT(m_data8);
+        ASSERT(m_length);
+    }
+    // Create a normal 16-bit string with internal storage (BufferInternal)
     StringImpl(unsigned length)
         : m_refCount(s_refCountIncrement)
         , m_length(length)
         , m_data(reinterpret_cast<const UChar*>(this + 1))
+        , m_data16(reinterpret_cast<const UChar*>(this + 1))
         , m_buffer(0)
         , m_hashAndFlags(BufferInternal)
+    {
         ASSERT(m_data);
+        ASSERT(m_data16);
         ASSERT(m_length);
+    }
 …
         : m_refCount(s_refCountIncrement)
         , m_length(length)
         , m_data(characters)
+        , m_data16(characters)
         , m_buffer(0)
         , m_hashAndFlags(BufferOwned)
+    {
         ASSERT(m_data);
+        ASSERT(m_data16);
         ASSERT(m_length);
+    }
+    // Used to create new strings that are a substring of an existing StringImpl (BufferSubstring)
+    // Used to create new strings that are a substring of an existing 8-bit StringImpl (BufferSubstring)
+    StringImpl(const LChar* characters, unsigned length, PassRefPtr<StringImpl> base)
+        : m_refCount(s_refCountIncrement)
+        , m_length(length)
+        , m_data8(characters)
+        , m_substringBuffer(base.leakRef())
+        , m_hashAndFlags(s_hashFlag8BitBuffer | BufferSubstring)
+    {
+        ASSERT(is8Bit());
+        ASSERT(m_data8);
+        ASSERT(m_length);
+        ASSERT(m_substringBuffer->bufferOwnership() != BufferSubstring);
+    }
+    // Used to create new strings that are a substring of an existing 16-bit StringImpl (BufferSubstring)
     StringImpl(const UChar* characters, unsigned length, PassRefPtr<StringImpl> base)
         : m_refCount(s_refCountIncrement)
         , m_length(length)
         , m_data(characters)
+        , m_data16(characters)
         , m_substringBuffer(base.leakRef())
         , m_hashAndFlags(BufferSubstring)
+    {
+        ASSERT(m_data);
+        ASSERT(!is8Bit());
+        ASSERT(m_data16);
         ASSERT(m_length);
         ASSERT(m_substringBuffer->bufferOwnership() != BufferSubstring);
 …
     static PassRefPtr<StringImpl> create(const UChar*, unsigned length);
+    static PassRefPtr<StringImpl> create(const char*, unsigned length);
+    static PassRefPtr<StringImpl> create(const char*);
+    static ALWAYS_INLINE PassRefPtr<StringImpl> create(PassRefPtr<StringImpl> rep, unsigned offset, unsigned length)
+    static PassRefPtr<StringImpl> create(const LChar*, unsigned length);
+    ALWAYS_INLINE static PassRefPtr<StringImpl> create(const char* s, unsigned length) { return create(reinterpret_cast<const LChar*>(s), length); };
+    static PassRefPtr<StringImpl> create(const LChar*);
+    ALWAYS_INLINE static PassRefPtr<StringImpl> create(const char* s) { return create(reinterpret_cast<const LChar*>(s)); };
+    static ALWAYS_INLINE PassRefPtr<StringImpl> create8(PassRefPtr<StringImpl> rep, unsigned offset, unsigned length)
+    {
         ASSERT(rep);
 …
             return empty();
+        ASSERT(rep->is8Bit());
         StringImpl* ownerRep = (rep->bufferOwnership() == BufferSubstring) ? rep->m_substringBuffer : rep.get();
+        return adoptRef(new StringImpl(rep->m_data + offset, length, ownerRep));
+    }
+        return adoptRef(new StringImpl(rep->m_data8 + offset, length, ownerRep));
+    }
+    static ALWAYS_INLINE PassRefPtr<StringImpl> create(PassRefPtr<StringImpl> rep, unsigned offset, unsigned length)
+    {
+        ASSERT(rep);
+        ASSERT(length <= rep->length());
+        if (!length)
+            return empty();
+        StringImpl* ownerRep = (rep->bufferOwnership() == BufferSubstring) ? rep->m_substringBuffer : rep.get();
+        if (rep->is8Bit())
+            return adoptRef(new StringImpl(rep->m_data8 + offset, length, ownerRep));
+        return adoptRef(new StringImpl(rep->m_data16 + offset, length, ownerRep));
+    }
+    static PassRefPtr<StringImpl> createUninitialized(unsigned length, LChar*& data);
     static PassRefPtr<StringImpl> createUninitialized(unsigned length, UChar*& data);
     static ALWAYS_INLINE PassRefPtr<StringImpl> tryCreateUninitialized(unsigned length, UChar*& output)
+    template <typename T> static ALWAYS_INLINE PassRefPtr<StringImpl> tryCreateUninitialized(unsigned length, T*& output)
+    {
         if (!length) {
 …
+        }
         if (length > ((std::numeric_limits<unsigned>::max() - sizeof(StringImpl)) / sizeof(UChar))) {
+        if (length > ((std::numeric_limits<unsigned>::max() - sizeof(StringImpl)) / sizeof(T))) {
             output = 0;
             return 0;
+        }
         StringImpl* resultImpl;
         if (!tryFastMalloc(sizeof(UChar) * length + sizeof(StringImpl)).getValue(resultImpl)) {
+        if (!tryFastMalloc(sizeof(T) * length + sizeof(StringImpl)).getValue(resultImpl)) {
             output = 0;
             return 0;
+        }
+        output = reinterpret_cast<UChar*>(resultImpl + 1);
+        output = reinterpret_cast<T*>(resultImpl + 1);
+        if (sizeof(T) == sizeof(char))
+            return adoptRef(new(resultImpl) StringImpl(length, Force8BitConstructor));
         return adoptRef(new(resultImpl) StringImpl(length));
+    }
 …
     static PassRefPtr<StringImpl> reallocate(PassRefPtr<StringImpl> originalString, unsigned length, UChar*& data);
+    static unsigned dataOffset() { return OBJECT_OFFSETOF(StringImpl, m_data); }
+    static unsigned flagsOffset() { return OBJECT_OFFSETOF(StringImpl, m_hashAndFlags); }
+    static unsigned flagIs8Bit() { return s_hashFlag8BitBuffer; }
+    static unsigned dataOffset() { return OBJECT_OFFSETOF(StringImpl, m_data8); }
     static PassRefPtr<StringImpl> createWithTerminatingNullCharacter(const StringImpl&);
 …
     unsigned length() const { return m_length; }
+    const UChar* characters() const { return m_data; }
+    bool is8Bit() const { return m_hashAndFlags & s_hashFlag8BitBuffer; }
+    // FIXME: Remove all unnecessary usages of characters()
+    ALWAYS_INLINE const LChar* characters8() const { ASSERT(is8Bit()); return m_data8; }
+    ALWAYS_INLINE const UChar* characters16() const { ASSERT(!is8Bit()); return m_data16; }
+    ALWAYS_INLINE const UChar* characters() const
+    {
+        if (!is8Bit())
+            return m_data16;
+        return getData16SlowCase();
+    }
     size_t cost()
 …
+    }
+    bool has16BitShadow() const { return m_hashAndFlags & s_hashFlagHas16BitShadow; }
     bool isIdentifier() const { return m_hashAndFlags & s_hashFlagIsIdentifier; }
     void setIsIdentifier(bool isIdentifier)
 …
+    {
         ASSERT(!hasHash());
+        ASSERT(hash == StringHasher::computeHash(m_data, m_length)); // Multiple clients assume that StringHasher is the canonical string hash function.
+        // Multiple clients assume that StringHasher is the canonical string hash function.
+        ASSERT(hash == (is8Bit() ? StringHasher::computeHash(m_data8, m_length) : StringHasher::computeHash(m_data16, m_length)));
         ASSERT(!(hash & (s_flagMask << (8 * sizeof(hash) - s_flagCount)))); // Verify that enough high bits are empty.
 …
     unsigned hash() const
+    {
+        if (!hasHash())
+            setHash(StringHasher::computeHash(m_data, m_length));
+        if (!hasHash()) {
+            if (is8Bit())
+                setHash(StringHasher::computeHash(m_data8, m_length));
+            else
+                setHash(StringHasher::computeHash(m_data16, m_length));
+        }
         return existingHash();
+    }
 …
     static StringImpl* empty();
+    static void copyChars(UChar* destination, const UChar* source, unsigned numCharacters)
+    {
+    // FIXME: Does this really belong in StringImpl?
+    template <typename T> static void copyChars(T* destination, const T* source, unsigned numCharacters)
+    {
+        if (numCharacters == 1) {
+            *destination = *source;
+            return;
+        }
         if (numCharacters <= s_copyCharsInlineCutOff) {
+            for (unsigned i = 0; i < numCharacters; ++i)
+            unsigned i = 0;
+#if (CPU(X86) || CPU(X86_64))
+            const unsigned charsPerInt = sizeof(uint32_t) / sizeof(T);
+            if (numCharacters > charsPerInt) {
+                unsigned stopCount = numCharacters & ~(charsPerInt - 1);
+                const uint32_t* srcCharacters = reinterpret_cast<const uint32_t*>(source);
+                uint32_t* destCharacters = reinterpret_cast<uint32_t*>(destination);
+                for (unsigned j = 0; i < stopCount; i += charsPerInt, ++j)
+                    destCharacters[j] = srcCharacters[j];
+            }
+#endif
+            for (; i < numCharacters; ++i)
                 destination[i] = source[i];
         } else
             memcpy(destination, source, numCharacters * sizeof(UChar));
+            memcpy(destination, source, numCharacters * sizeof(T));
+    }
 …
     PassRefPtr<StringImpl> substring(unsigned pos, unsigned len = UINT_MAX);
+    UChar operator[](unsigned i) { ASSERT(i < m_length); return m_data[i]; }
+    UChar operator[](unsigned i) const
+    {
+        ASSERT(i < m_length);
+        if (is8Bit())
+            return m_data8[i];
+        return m_data16[i];
+    }
     UChar32 characterStartingAt(unsigned);
 …
     PassRefPtr<StringImpl> fill(UChar);
+    // FIXME: Do we need fill(char) or can we just do the right thing if UChar is ASCII?
     PassRefPtr<StringImpl> foldCase();
 …
     PassRefPtr<StringImpl> removeCharacters(CharacterMatchFunctionPtr);
+    // FIXME: Do we need char version, or is it okay to just pass in an ASCII char for 8-bit? Same for reverseFind, replace
     size_t find(UChar, unsigned index = 0);
     size_t find(CharacterMatchFunctionPtr, unsigned index = 0);
+    size_t find(const char*, unsigned index = 0);
+    size_t find(const LChar*, unsigned index = 0);
+    ALWAYS_INLINE size_t find(const char* s, unsigned index = 0) { return find(reinterpret_cast<const LChar*>(s), index); };
     size_t find(StringImpl*, unsigned index = 0);
+    size_t findIgnoringCase(const char*, unsigned index = 0);
+    size_t findIgnoringCase(const LChar*, unsigned index = 0);
+    ALWAYS_INLINE size_t findIgnoringCase(const char* s, unsigned index = 0) { return findIgnoringCase(reinterpret_cast<const LChar*>(s), index); };
     size_t findIgnoringCase(StringImpl*, unsigned index = 0);
 …
     template <class UCharPredicate> PassRefPtr<StringImpl> stripMatchedCharacters(UCharPredicate);
     template <class UCharPredicate> PassRefPtr<StringImpl> simplifyMatchedCharactersToSpace(UCharPredicate);
+    NEVER_INLINE const UChar* getData16SlowCase() const;
     // The bottom bit in the ref count indicates a static (immortal) string.
 …
     COMPILE_ASSERT(s_flagCount == StringHasher::flagCount, StringHasher_reserves_enough_bits_for_StringImpl_flags);
+    static const unsigned s_hashFlagHas16BitShadow = 1u << 7;
+    static const unsigned s_hashFlag8BitBuffer = 1u << 6;
     static const unsigned s_hashFlagHasTerminatingNullCharacter = 1u << 5;
     static const unsigned s_hashFlagIsAtomic = 1u << 4;
 …
     unsigned m_refCount;
     unsigned m_length;
+    const UChar* m_data;
+    union {
+        const LChar* m_data8;
+        const UChar* m_data16;
+    };
     union {
         void* m_buffer;
         StringImpl* m_substringBuffer;
+        mutable UChar* m_copyData16;
     };
     mutable unsigned m_hashAndFlags;
 …
 bool equal(const StringImpl*, const StringImpl*);
+bool equal(const StringImpl*, const char*);
+inline bool equal(const char* a, StringImpl* b) { return equal(b, a); }
+bool equal(const StringImpl*, const LChar*);
+inline bool equal(const StringImpl* a, const char* b) { return equal(a, reinterpret_cast<const LChar*>(b)); }
+bool equal(const StringImpl*, const LChar*, unsigned);
+inline bool equal(const LChar* a, StringImpl* b) { return equal(b, a); }
+inline bool equal(const char* a, StringImpl* b) { return equal(b, reinterpret_cast<const LChar*>(a)); }
 bool equal(const StringImpl*, const UChar*, unsigned);
 bool equalIgnoringCase(StringImpl*, StringImpl*);
+bool equalIgnoringCase(StringImpl*, const char*);
+inline bool equalIgnoringCase(const char* a, StringImpl* b) { return equalIgnoringCase(b, a); }
+bool equalIgnoringCase(const UChar* a, const char* b, unsigned length);
+inline bool equalIgnoringCase(const char* a, const UChar* b, unsigned length) { return equalIgnoringCase(b, a, length); }
+bool equalIgnoringCase(StringImpl*, const LChar*);
+inline bool equalIgnoringCase(const LChar* a, StringImpl* b) { return equalIgnoringCase(b, a); }
+bool equalIgnoringCase(const UChar*, const LChar*, unsigned);
+inline bool equalIgnoringCase(const UChar* a, const char* b, unsigned length) { return equalIgnoringCase(a, reinterpret_cast<const LChar*>(b), length); }
+inline bool equalIgnoringCase(const LChar* a, const UChar* b, unsigned length) { return equalIgnoringCase(b, a, length); }
+inline bool equalIgnoringCase(const char* a, const UChar* b, unsigned length) { return equalIgnoringCase(b, reinterpret_cast<const LChar*>(a), length); }
 bool equalIgnoringNullity(StringImpl*, StringImpl*);
 …
 inline PassRefPtr<StringImpl> StringImpl::isolatedCopy() const
+{
+    return create(m_data, m_length);
+    if (is8Bit())
+        return create(m_data8, m_length);
+    return create(m_data16, m_length);
+}
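
The header changes above keep one data pointer in a union, record the width in a flag, and let `characters()` build a 16-bit copy of an 8-bit string on demand. The following is a minimal sketch of that layout, not the StringImpl implementation. `TwoWidthString` and its members are hypothetical, and `std::vector` stands in for the ref-counted buffers and flag bits of the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using LChar = unsigned char; // assumed to mirror WTF's LChar
using UChar = char16_t;      // assumed to mirror WTF's UChar

// Hypothetical two-width string: 8-bit or 16-bit storage behind one pointer union.
class TwoWidthString {
public:
    explicit TwoWidthString(std::vector<LChar> latin1)
        : m_is8Bit(true), m_storage8(std::move(latin1))
    {
        m_data8 = m_storage8.data();
    }

    explicit TwoWidthString(std::vector<UChar> utf16)
        : m_is8Bit(false), m_storage16(std::move(utf16))
    {
        m_data16 = m_storage16.data();
    }

    // The union pointer refers to this object's own storage, so forbid copies.
    TwoWidthString(const TwoWidthString&) = delete;
    TwoWidthString& operator=(const TwoWidthString&) = delete;

    std::size_t length() const { return m_is8Bit ? m_storage8.size() : m_storage16.size(); }
    bool is8Bit() const { return m_is8Bit; }
    bool has16BitShadow() const { return m_hasShadow; }

    // Callers are expected to check is8Bit() before choosing an accessor.
    const LChar* characters8() const { assert(m_is8Bit); return m_data8; }
    const UChar* characters16() const { assert(!m_is8Bit); return m_data16; }

    // Always yields 16-bit data; widens an 8-bit string once and caches the copy.
    const UChar* characters() const
    {
        if (!m_is8Bit)
            return m_data16;
        if (!m_hasShadow) {
            m_shadow16.assign(m_storage8.begin(), m_storage8.end());
            m_hasShadow = true;
        }
        return m_shadow16.data();
    }

    // Width-independent element access, widening 8-bit characters on the fly.
    UChar operator[](std::size_t i) const
    {
        assert(i < length());
        if (m_is8Bit)
            return m_data8[i];
        return m_data16[i];
    }

private:
    bool m_is8Bit;
    mutable bool m_hasShadow = false;
    std::vector<LChar> m_storage8;
    std::vector<UChar> m_storage16;
    mutable std::vector<UChar> m_shadow16;
    union {
        const LChar* m_data8;
        const UChar* m_data16;
    };
};
```

In this sketch a caller that checks `is8Bit()` and uses `characters8()` or `operator[]` never pays for the 16-bit copy. Only `characters()` allocates it, which is presumably why the commit message aims to retire that accessor over time.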

trunk/Source/JavaScriptCore/wtf/text/WTFString.cpp

-              r98316
+              r98624
 // Construct a string with latin1 data.
+String::String(const LChar* characters, unsigned length)
+    : m_impl(characters ? StringImpl::create(characters, length) : 0)
+{
+}
 String::String(const char* characters, unsigned length)
     : m_impl(characters ? StringImpl::create(characters, length) : 0)
+    : m_impl(characters ? StringImpl::create(reinterpret_cast<const LChar*>(characters), length) : 0)
+{
+}
 // Construct a string with latin1 data, from a null-terminated source.
+String::String(const LChar* characters)
+    : m_impl(characters ? StringImpl::create(characters) : 0)
+{
+}
 String::String(const char* characters)
     : m_impl(characters ? StringImpl::create(characters) : 0)
+    : m_impl(characters ? StringImpl::create(reinterpret_cast<const LChar*>(characters)) : 0)
+{
+}
 …
+}
 void String::append(char c)
+void String::append(LChar c)
+{
     // FIXME: This is extremely inefficient. So much so that we might want to take this
 …
     QByteArray ba = buffer.toUtf8();
     return StringImpl::create(ba.constData(), ba.length());
+    return StringImpl::create(reinterpret_cast<const LChar*>(ba.constData()), ba.length());
 #elif OS(WINCE)
 …
             return String("");
         if (written > 0)
             return StringImpl::create(buffer.data(), written);
+            return StringImpl::create(reinterpret_cast<const LChar*>(buffer.data()), written);
         bufferSize <<= 1;
 …
     va_end(args);
-    return StringImpl::create(buffer.data(), len);
+    return StringImpl::create(reinterpret_cast<const LChar*>(buffer.data()), len);
 #endif
+}
 …
+}
-String String::fromUTF8(const char* stringStart, size_t length)
+String String::fromUTF8(const LChar* stringStart, size_t length)
 {
     if (length > numeric_limits<unsigned>::max())
 …
     // Try converting into the buffer.
-    const char* stringCurrent = stringStart;
-    if (convertUTF8ToUTF16(&stringCurrent, stringStart + length, &buffer, bufferEnd) != conversionOK)
+    const char* stringCurrent = reinterpret_cast<const char*>(stringStart);
+    if (convertUTF8ToUTF16(&stringCurrent, reinterpret_cast<const char *>(stringStart + length), &buffer, bufferEnd) != conversionOK)
         return String();
 …
+}
-String String::fromUTF8(const char* string)
+String String::fromUTF8(const LChar* string)
 {
     if (!string)
         return String();
-    return fromUTF8(string, strlen(string));
+    return fromUTF8(string, strlen(reinterpret_cast<const char*>(string)));
 }
-String String::fromUTF8WithLatin1Fallback(const char* string, size_t size)
+String String::fromUTF8WithLatin1Fallback(const LChar* string, size_t size)
 {
     String utf8 = fromUTF8(string, size);

trunk/Source/JavaScriptCore/wtf/text/WTFString.h

-              r98316
+              r98624
     // Construct a string with latin1 data.
+    WTF_EXPORT_PRIVATE String(const LChar* characters, unsigned length);
     WTF_EXPORT_PRIVATE String(const char* characters, unsigned length);
     // Construct a string with latin1 data, from a null-terminated source.
+    WTF_EXPORT_PRIVATE String(const LChar* characters);
     WTF_EXPORT_PRIVATE String(const char* characters);
 …
     size_t find(CharacterMatchFunctionPtr matchFunction, unsigned start = 0) const
         { return m_impl ? m_impl->find(matchFunction, start) : notFound; }
-    size_t find(const char* str, unsigned start = 0) const
+    size_t find(const LChar* str, unsigned start = 0) const
         { return m_impl ? m_impl->find(str, start) : notFound; }
 …
     // Case insensitive string matching.
-    WTF_EXPORT_PRIVATE size_t findIgnoringCase(const char* str, unsigned start = 0) const
+    WTF_EXPORT_PRIVATE size_t findIgnoringCase(const LChar* str, unsigned start = 0) const
         { return m_impl ? m_impl->findIgnoringCase(str, start) : notFound; }
     WTF_EXPORT_PRIVATE size_t findIgnoringCase(const String& str, unsigned start = 0) const
 …
     // Wrappers for find & reverseFind adding dynamic sensitivity check.
-    size_t find(const char* str, unsigned start, bool caseSensitive) const
+    size_t find(const LChar* str, unsigned start, bool caseSensitive) const
         { return caseSensitive ? find(str, start) : findIgnoringCase(str, start); }
     size_t find(const String& str, unsigned start, bool caseSensitive) const
 …
     bool contains(UChar c) const { return find(c) != notFound; }
-    bool contains(const char* str, bool caseSensitive = true) const { return find(str, 0, caseSensitive) != notFound; }
+    bool contains(const LChar* str, bool caseSensitive = true) const { return find(str, 0, caseSensitive) != notFound; }
     bool contains(const String& str, bool caseSensitive = true) const { return find(str, 0, caseSensitive) != notFound; }
 …
     WTF_EXPORT_PRIVATE void append(const String&);
-    WTF_EXPORT_PRIVATE void append(char);
+    WTF_EXPORT_PRIVATE void append(LChar);
+    inline WTF_EXPORT_PRIVATE void append(char c) { append(static_cast<LChar>(c)); };
     WTF_EXPORT_PRIVATE void append(UChar);
     WTF_EXPORT_PRIVATE void append(const UChar*, unsigned length);
 …
     // String::fromUTF8 will return a null string if
     // the input data contains invalid UTF-8 sequences.
-    WTF_EXPORT_PRIVATE static String fromUTF8(const char*, size_t);
-    WTF_EXPORT_PRIVATE static String fromUTF8(const char*);
+    WTF_EXPORT_PRIVATE static String fromUTF8(const LChar*, size_t);
+    WTF_EXPORT_PRIVATE static String fromUTF8(const LChar*);
+    inline WTF_EXPORT_PRIVATE static String fromUTF8(const char* s, size_t length) { return fromUTF8(reinterpret_cast<const LChar*>(s), length); };
+    inline WTF_EXPORT_PRIVATE static String fromUTF8(const char* s) { return fromUTF8(reinterpret_cast<const LChar*>(s)); };
     // Tries to convert the passed in string to UTF-8, but will fall back to Latin-1 if the string is not valid UTF-8.
-    WTF_EXPORT_PRIVATE static String fromUTF8WithLatin1Fallback(const char*, size_t);
+    WTF_EXPORT_PRIVATE static String fromUTF8WithLatin1Fallback(const LChar*, size_t);
+    inline WTF_EXPORT_PRIVATE static String fromUTF8WithLatin1Fallback(const char* s, size_t length) { return fromUTF8WithLatin1Fallback(reinterpret_cast<const LChar*>(s), length); };
     // Determines the writing direction using the Unicode Bidi Algorithm rules P2 and P3.
 …
 inline bool operator==(const String& a, const String& b) { return equal(a.impl(), b.impl()); }
-inline bool operator==(const String& a, const char* b) { return equal(a.impl(), b); }
-inline bool operator==(const char* a, const String& b) { return equal(a, b.impl()); }
+inline bool operator==(const String& a, const LChar* b) { return equal(a.impl(), b); }
+inline bool operator==(const String& a, const char* b) { return equal(a.impl(), reinterpret_cast<const LChar*>(b)); }
+inline bool operator==(const LChar* a, const String& b) { return equal(a, b.impl()); }
+inline bool operator==(const char* a, const String& b) { return equal(reinterpret_cast<const LChar*>(a), b.impl()); }
 inline bool operator!=(const String& a, const String& b) { return !equal(a.impl(), b.impl()); }
-inline bool operator!=(const String& a, const char* b) { return !equal(a.impl(), b); }
-inline bool operator!=(const char* a, const String& b) { return !equal(a, b.impl()); }
+inline bool operator!=(const String& a, const LChar* b) { return !equal(a.impl(), b); }
+inline bool operator!=(const String& a, const char* b) { return !equal(a.impl(), reinterpret_cast<const LChar*>(b)); }
+inline bool operator!=(const LChar* a, const String& b) { return !equal(a, b.impl()); }
+inline bool operator!=(const char* a, const String& b) { return !equal(reinterpret_cast<const LChar*>(a), b.impl()); }
 inline bool equalIgnoringCase(const String& a, const String& b) { return equalIgnoringCase(a.impl(), b.impl()); }
-inline bool equalIgnoringCase(const String& a, const char* b) { return equalIgnoringCase(a.impl(), b); }
-inline bool equalIgnoringCase(const char* a, const String& b) { return equalIgnoringCase(a, b.impl()); }
+inline bool equalIgnoringCase(const String& a, const LChar* b) { return equalIgnoringCase(a.impl(), b); }
+inline bool equalIgnoringCase(const String& a, const char* b) { return equalIgnoringCase(a.impl(), reinterpret_cast<const LChar*>(b)); }
+inline bool equalIgnoringCase(const LChar* a, const String& b) { return equalIgnoringCase(a, b.impl()); }
+inline bool equalIgnoringCase(const char* a, const String& b) { return equalIgnoringCase(reinterpret_cast<const LChar*>(a), b.impl()); }
 inline bool equalPossiblyIgnoringCase(const String& a, const String& b, bool ignoreCase)

trunk/Source/JavaScriptCore/wtf/unicode/Unicode.h

-              r95555
+              r98624
 COMPILE_ASSERT(sizeof(UChar) == 2, UCharIsTwoBytes);
+// Define platform neutral 8 bit character type (L is for Latin-1).
+typedef unsigned char LChar;
 #endif // WTF_UNICODE_H
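
The choice of `unsigned char` for `LChar` matters when 8-bit data is widened to 16-bit code units: plain `char` may be signed on some platforms, so a Latin-1 byte such as 0xE9 could sign-extend instead of mapping to U+00E9. The following is a minimal sketch of that idea, not code from this changeset. `UChar16` and `widenLatin1` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the WTF types; the real definitions live in wtf/unicode/Unicode.h.
typedef unsigned char LChar;   // 8-bit Latin-1 code unit, as in the hunk above
typedef std::uint16_t UChar16; // 16-bit code unit, illustrative name only

// Widen Latin-1 data to 16-bit code units. Because LChar is unsigned,
// each byte zero-extends to the matching code point U+0000..U+00FF.
inline void widenLatin1(const LChar* source, std::size_t length, UChar16* destination)
{
    assert(source && destination);
    for (std::size_t i = 0; i < length; ++i)
        destination[i] = source[i];
}

// Same operation for callers that hold plain char data. The cast mirrors the
// reinterpret_cast<const LChar*> wrappers added throughout this changeset.
inline void widenLatin1(const char* source, std::size_t length, UChar16* destination)
{
    widenLatin1(reinterpret_cast<const LChar*>(source), length, destination);
}
```

Routing the `char` overload through the `LChar` one keeps a single implementation, which appears to be the same pattern the inline `char` wrappers in WTFString.h follow.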

trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp

-              r96169
+              r98624
 }
-int execute(YarrCodeBlock& jitObject, const char* input, unsigned start, unsigned length, int* output)
+int execute(YarrCodeBlock& jitObject, const LChar* input, unsigned start, unsigned length, int* output)
 {
     return jitObject.execute(input, start, length, output);

trunk/Source/JavaScriptCore/yarr/YarrJIT.h

-              r95901
+              r98624
 class YarrCodeBlock {
-    typedef int (*YarrJITCode8)(const char* input, unsigned start, unsigned length, int* output) YARR_CALL;
+    typedef int (*YarrJITCode8)(const LChar* input, unsigned start, unsigned length, int* output) YARR_CALL;
     typedef int (*YarrJITCode16)(const UChar* input, unsigned start, unsigned length, int* output) YARR_CALL;
 …
     void set16BitCode(MacroAssembler::CodeRef ref) { m_ref16 = ref; }
-    int execute(const char* input, unsigned start, unsigned length, int* output)
+    int execute(const LChar* input, unsigned start, unsigned length, int* output)
     {
         ASSERT(has8BitCode());
 …
 void jitCompile(YarrPattern&, YarrCharSize, JSGlobalData*, YarrCodeBlock& jitObject);
 int execute(YarrCodeBlock& jitObject, const UChar* input, unsigned start, unsigned length, int* output);
-int execute(YarrCodeBlock& jitObject, const char* input, unsigned start, unsigned length, int* output);
+int execute(YarrCodeBlock& jitObject, const LChar* input, unsigned start, unsigned length, int* output);
 } } // namespace JSC::Yarr

trunk/Source/JavaScriptCore/yarr/YarrParser.h

-              r95901
+              r98624
         , m_backReferenceLimit(backReferenceLimit)
         , m_err(NoError)
-        , m_data(pattern)
+        , m_data(pattern.characters16())
         , m_size(pattern.length())
         , m_index(0)
 …
     unsigned m_backReferenceLimit;
     ErrorCode m_err;
-    const UString& m_data;
+    const UChar* m_data;
     unsigned m_size;
     unsigned m_index;

trunk/Source/WebCore/ChangeLog

-              r98618
+              r98624
+2011-10-27  Michael Saboff  <msaboff@apple.com>
+        Investigate storing strings in 8-bit buffers when possible
+        https://bugs.webkit.org/show_bug.cgi?id=66161
+        Changes to support 8 bit StringImpl changes.
+        Reviewed by Geoffrey Garen.
+        No new tests, refactored StringImpl for 8 bit strings.
+        * platform/text/cf/StringImplCF.cpp:
+        (WTF::StringImpl::createCFString):
 2011-10-27  Nat Duca  <nduca@chromium.org>

trunk/Source/WebCore/platform/text/cf/StringImplCF.cpp

-              r93012
+              r98624
     CFAllocatorRef allocator = (m_length && isMainThread()) ? StringWrapperCFAllocator::allocator() : 0;
     if (!allocator)
-        return CFStringCreateWithCharacters(0, reinterpret_cast<const UniChar*>(m_data), m_length);
+        return CFStringCreateWithCharacters(0, reinterpret_cast<const UniChar*>(characters()), m_length);
     // Put pointer to the StringImpl in a global so the allocator can store it with the CFString.
 …
     StringWrapperCFAllocator::currentString = this;
-    CFStringRef string = CFStringCreateWithCharactersNoCopy(allocator, reinterpret_cast<const UniChar*>(m_data), m_length, kCFAllocatorNull);
+    CFStringRef string = CFStringCreateWithCharactersNoCopy(allocator, reinterpret_cast<const UniChar*>(characters()), m_length, kCFAllocatorNull);
     // The allocator cleared the global when it read it, but also clear it here just in case.

trunk/Source/WebKit2/ChangeLog

-              r98622
+              r98624
+2011-10-27  Michael Saboff  <msaboff@apple.com>
+        Investigate storing strings in 8-bit buffers when possible
+        https://bugs.webkit.org/show_bug.cgi?id=66161
+        Added export of StringImpl::getData16SlowCase for linking tests.
+        Reviewed by Geoffrey Garen.
+        * win/WebKit2.def:
 2011-10-27  Sam Weinig  <sam@webkit.org>

trunk/Source/WebKit2/win/WebKit2.def

-              r98598
+              r98624
         ?absoluteBoundingBoxRectIgnoringTransforms@RenderObject@WebCore@@QBE?AVIntRect@2@XZ
         ?add@AtomicString@WTF@@CA?AV?$PassRefPtr@VStringImpl@WTF@@@2@PBD@Z
+        ?add@AtomicString@WTF@@CA?AV?$PassRefPtr@VStringImpl@WTF@@@2@PBE@Z
         ?addSlowCase@AtomicString@WTF@@CA?AV?$PassRefPtr@VStringImpl@WTF@@@2@PAVStringImpl@2@@Z
         ?cacheDOMStructure@WebCore@@YAPAVStructure@JSC@@PAVJSDOMGlobalObject@1@PAV23@PBUClassInfo@3@@Z
 …
         ?createWrapper@WebCore@@YA?AVJSValue@JSC@@PAVExecState@3@PAVJSDOMGlobalObject@1@PAVNode@1@@Z
         ?ensureShadowRoot@Element@WebCore@@QAEPAVShadowRoot@2@XZ
         ?equal@WTF@@YA_NPBVStringImpl@1@PBD@Z
+        ?equal@WTF@@YA_NPBVStringImpl@1@PBE@Z
         ?externalRepresentation@WebCore@@YA?AVString@WTF@@PAVElement@1@I@Z
         ?getCachedDOMStructure@WebCore@@YAPAVStructure@JSC@@PAVJSDOMGlobalObject@1@PBUClassInfo@3@@Z
+        ?getData16SlowCase@StringImpl@WTF@@ABEPB_WXZ
         ?getElementById@TreeScope@WebCore@@QBEPAVElement@2@ABVAtomicString@WTF@@@Z
         ?getLocationAndLengthFromRange@TextIterator@WebCore@@SA_NPAVElement@2@PBVRange@2@AAI2@Z
