wiki:EfficientStrings

Construction

Null and empty

Passing a null string is very efficient. Whenever possible, use a null String instead of an empty String.

From literal

Each String type has an efficient constructor to initialize the string from a literal:

  • String foo = ASCIILiteral("bar");
  • String foo("bar", String::ConstructFromLiteral);

The safest option is ASCIILiteral("Foo"). It produces the same code size as String("Foo") while being faster.

The difference between the two versions is whether the length of the string is passed to the constructor. Passing the length makes the constructor faster, but it also makes the code bigger, which is a problem when the code is executed infrequently.

In general, use ASCIILiteral unless you can show improvement on a benchmark by using ConstructFromLiteral.
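
The trade-off can be illustrated in standard C++. The sketch below is not WTF code: LiteralView, fromLiteralRuntimeLength, and fromLiteralCompileTimeLength are hypothetical names. The first function measures the literal at run time. The second is a template on the array size, so the length is a compile-time constant, but one instantiation is generated per literal size, which is one plausible source of extra code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a string that merely points at literal data.
struct LiteralView {
    const char* characters;
    std::size_t length;
};

// Length is discovered at run time by scanning for the terminator.
inline LiteralView fromLiteralRuntimeLength(const char* literal)
{
    return { literal, std::strlen(literal) };
}

// Length comes from the array type, so no scan is needed.
// N counts the trailing '\0', hence N - 1.
template<std::size_t N>
constexpr LiteralView fromLiteralCompileTimeLength(const char (&literal)[N])
{
    return { literal, N - 1 };
}
```

Both functions return the same view for the same literal; only where the length is computed differs.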

AtomicString from literal

AtomicString should always use the full template version:

  • AtomicString foo("bar", AtomicString::ConstructFromLiteral);

The reason is that this version makes it possible to compute the hash at compile time in the future.
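
To illustrate what hashing a literal at compile time could look like, here is a sketch in standard C++14. It uses FNV-1a purely as an example, which is an assumption and not the hash function WTF uses; hashLiteral and barHash are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// FNV-1a over a literal whose size is part of its type.
// Chosen only for illustration; not WTF's string hash.
template<std::size_t N>
constexpr std::uint32_t hashLiteral(const char (&literal)[N])
{
    std::uint32_t hash = 2166136261u;          // FNV offset basis
    for (std::size_t i = 0; i + 1 < N; ++i) {  // skip the trailing '\0'
        hash ^= static_cast<unsigned char>(literal[i]);
        hash *= 16777619u;                     // FNV prime
    }
    return hash;
}

// The static_assert only compiles if the hash is a compile-time constant.
static_assert(hashLiteral("") == 2166136261u, "empty literal hashes to the offset basis");
constexpr std::uint32_t barHash = hashLiteral("bar");
```

Because the array size is a template parameter, the compiler sees both the characters and the length, which is what allows the whole computation to be folded into a constant.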

Not creating a string

Many operations can be more efficient with a literal. Do not create a String when it is not needed.

E.g.:

  • foo.startsWith("bar")
  • foo.startsWith(ASCIILiteral("bar"))
  • foo.startsWith(String("bar"))

The first version is the fastest.
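
The same idea can be sketched in standard C++17, as an analogy rather than WTF code: comparing against the literal's characters in place avoids building a temporary string object only to discard it. startsWithLiteral and startsWithString are hypothetical helpers.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Compares the prefix in place; no temporary std::string is constructed.
template<std::size_t N>
bool startsWithLiteral(std::string_view text, const char (&prefix)[N])
{
    constexpr std::size_t prefixLength = N - 1; // drop the trailing '\0'
    return text.size() >= prefixLength
        && text.compare(0, prefixLength, prefix, prefixLength) == 0;
}

// For contrast: a caller holding a literal must first build a
// std::string for the prefix, which may allocate.
inline bool startsWithString(const std::string& text, const std::string& prefix)
{
    return text.compare(0, prefix.size(), prefix) == 0;
}
```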

Concatenation

There are two efficient ways to concatenate strings: StringBuilder and StringOperators. Anything else is less efficient when doing more than one operation.

E.g.:

str = text;
str.append("a"); // == str.append(String("a"));
str.append(foo);
str += bar;

Should be (StringOperators):

str = text + 'a' + foo + bar;

Note the use of 'a' here instead of "a" as it is more efficient.
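
One way to see why a single concatenation expression can beat a chain of appends: when all the pieces are known up front, the total length can be computed first and the result allocated once. The sketch below shows that idea in standard C++17 with std::string. It is an analogy, not how StringOperators is implemented, and concatenate, pieceLength, and appendPiece are hypothetical helpers.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// A single character has a known length of 1; nothing to measure.
inline std::size_t pieceLength(char) { return 1; }
inline std::size_t pieceLength(std::string_view piece) { return piece.size(); }

inline void appendPiece(std::string& out, char c) { out.push_back(c); }
inline void appendPiece(std::string& out, std::string_view piece) { out.append(piece); }

// Sums the lengths first, reserves once, then copies each piece once.
template<typename... Pieces>
std::string concatenate(const Pieces&... pieces)
{
    std::string result;
    result.reserve((std::size_t{0} + ... + pieceLength(pieces)));
    (appendPiece(result, pieces), ...);
    return result;
}
```

In this sketch, concatenate(text, 'a', foo, bar) plays the role of text + 'a' + foo + bar. The char overloads also hint at why 'a' is cheaper than "a": a character needs no length scan.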


E.g.:

str = foo;
for (size_t i = 0; i < foobars; ++i)
   str += "bar";

Should be (StringBuilder):

StringBuilder builder;
builder.append(foo);
for (size_t i = 0; i < foobars; ++i)
   builder.appendLiteral("bar");
str = builder.toString();

Note: To append a single character, builder.append('c') is more efficient than builder.appendLiteral("c").
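
A rough cost model shows why the loop matters. It assumes that each += on an immutable string copies the existing contents plus the new piece into a fresh buffer; that is an assumption made for illustration, not a measurement of WTF. The builder-style alternative uses std::string as the buffer, and all names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Characters copied if every append builds a fresh buffer holding
// the old contents plus the new piece (assumed model).
inline std::size_t charactersCopiedByRepeatedAppend(std::size_t initialLength, std::size_t pieceLength, std::size_t count)
{
    std::size_t copied = 0;
    std::size_t length = initialLength;
    for (std::size_t i = 0; i < count; ++i) {
        length += pieceLength;
        copied += length;
    }
    return copied;
}

// Builder-style alternative: reserve the final size once,
// then append in place so each character is copied once.
inline std::string buildRepeated(const std::string& first, const std::string& piece, std::size_t count)
{
    std::string buffer;
    buffer.reserve(first.size() + piece.size() * count);
    buffer.append(first);
    for (std::size_t i = 0; i < count; ++i)
        buffer.append(piece);
    return buffer;
}
```

Under this model, starting from a 3-character string and appending "bar" 100 times copies 15450 characters in total, while the builder-style version copies 303.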

Memory

Every string class allocates a StringImpl on the heap. The only way to avoid allocating new memory is to use the methods that take a constant literal.

On 64-bit platforms, the memory used for a StringImpl varies from 28 bytes to (28 + length + length * 2) bytes, the latter for a string created by copy that has since been converted to 16 bits.

For example, a 10-character string created by copy and then converted to 16 bits would, once the allocator's alignment is included, typically take:

  • 28 + 10 = 38 bytes, typically allocated as 64 bytes
  • 10 * 2 = 20 bytes, typically allocated as 32 bytes

That is 96 bytes in total. Be careful when allocating strings.
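
The arithmetic above can be written as a small function. It is a sketch under two assumptions: the 28-byte header quoted above, and an allocator that rounds each request up to the next power of two. The second assumption reproduces the 64-byte and 32-byte figures in the example but will not match every allocator. All names are hypothetical.

```cpp
#include <cstddef>

// Assumed allocator behavior: round each request up to a power of two.
constexpr std::size_t roundUpToPowerOfTwo(std::size_t size)
{
    if (size == 0)
        return 0; // nothing requested, nothing allocated
    std::size_t bucket = 1;
    while (bucket < size)
        bucket *= 2;
    return bucket;
}

// Estimated heap use for a string copied as 8-bit data and later
// converted to 16-bit, using the 28-byte header from the text.
constexpr std::size_t estimateConvertedStringBytes(std::size_t length)
{
    constexpr std::size_t headerBytes = 28;
    return roundUpToPowerOfTwo(headerBytes + length)  // header + 8-bit characters
        + roundUpToPowerOfTwo(length * 2);            // separate 16-bit copy
}

static_assert(estimateConvertedStringBytes(10) == 96, "matches the worked example");
```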

AtomicString VS String

WTF::AtomicString is a class that has four differences from the normal WTF::String class:

  • It’s more expensive to create a new atomic string than a non-atomic string; doing so requires a lookup in a per-thread atomic string hash table.
  • It’s very inexpensive to compare one atomic string with another. The cost is just a pointer comparison. The actual string length and data don’t need to be compared, because on any one thread no two distinct atomic strings can have equal contents.
  • If a particular string already exists in the atomic string table, allocating another string that is equal to it does not cost any additional memory. The atomic string is shared and the cost is looking it up in the per-thread atomic string hash table and incrementing its reference count.
  • There are special considerations if you want to use an atomic string on a thread other than the one it was created on, since each thread has its own atomic string hash table. In particular, if the internal StringImpl object is destroyed on a different thread, it will try to remove the string from the wrong AtomicStringTable. Also, if String::isolatedCopy() is invoked on an AtomicString, it always creates a copy, and the internal reference counter is not thread safe.
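
The points above can be illustrated with a minimal interning table in standard C++. This is a sketch of the general technique, not WTF's AtomicStringTable: InternedString, intern, threadTable, and the use of std::unordered_set are assumptions, and unlike the real class it has no reference counting and never removes entries.

```cpp
#include <string>
#include <unordered_set>

// One table per thread, mirroring the per-thread table described above.
inline std::unordered_set<std::string>& threadTable()
{
    thread_local std::unordered_set<std::string> table;
    return table;
}

// Handle to an interned string; only meaningful on the creating thread.
struct InternedString {
    const std::string* impl;
};

// Creation pays for a hash lookup, plus an insertion if the string is new.
// Pointers to unordered_set elements stay valid when the table rehashes.
inline InternedString intern(const std::string& text)
{
    return { &*threadTable().insert(text).first };
}

// Equal contents share one entry, so comparing addresses is enough.
inline bool operator==(InternedString a, InternedString b)
{
    return a.impl == b.impl;
}
```

Interning "class" twice on the same thread yields two handles that point at the same entry, so the comparison never touches the characters. A handle passed to another thread would point into a table that thread does not own, which is the hazard the last bullet describes.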

We use AtomicString to make string comparisons fast and to save memory when many equal strings are likely to be allocated. For example, we use AtomicString for HTML attribute names so we can compare them quickly, and for both HTML attribute names and values because it’s common to have many identical ones, so sharing them saves memory.

We shouldn't use AtomicString if the string we're about to create won't be shared across multiple AtomicStrings. For example, if we used AtomicString for the strings inside Text nodes, we could end up filling the atomic string table with very long strings that typically appear only once. That would also slow down hash table lookups for all other atomic strings.

(this topic is a summary of the thread "[webkit-dev] When should I use AtomicString vs String?")

Last modified on Dec 18, 2018, 1:57:24 PM