wiki:WhitespaceCollapsing

Version 1 (modified by mmaxfield@apple.com, 9 years ago) (diff)

--

Whitespace collapsing occurs in a collection of disparate pieces, which all collude to produce the intended effect. The highest level function in this process is RenderBlockFlow::layoutRunsAndFloatsInRange(); everything occurs inside there. Note that this entire discussion occurs on RenderObjects; the DOM is not touched at all. RenderTexts have their own String objects, which often point to the same data as the DOM node’s String.

The first step happens when we break up the string into lines, with a call to lineBreaker.nextLineBreak(). As we scan through the string looking for the next line breaking opportunity (LineBreaker::nextLineBreak()), we keep track of any contiguous sequences of whitespace we encounter. Whenever we hit a boundary at the beginning or end of these whitespace ranges, we call a callback with an Iterator pointing to the position of the boundary. (Note that these ranges actually start on the second whitespace character, because we don’t want to ignore the first whitespace character.)

This callback is MidpointState::startIgnoringSpaces(const Iterator&) or MidpointState::stopIgnoringSpaces(const Iterator&). There is one MidpointState object at any time during line layout, and it is owned by the BidiResolver (of which there is also only one at any point during layout, and it is created on the stack in RenderBlockFlow::layoutRunsAndFloats()).

These callbacks simply append their argument iterator into a vector. Therefore, when we are done with line breaking, we have a bunch of pairs of iterators, where each pair tightly straddles a range of whitespace which should be ignored. These iterators are in logical order (the order of the codepoints in the string). The number of iterators in the MidpointState might actually be odd; that just means that we should ignore from the last iterator until the end of the line. From here on, our goal is to ignore everything between these iterator pairs in the MidpointState.

Now that we have found the end of the line, we pop back up to RenderBlockFlow::layoutRunsAndFloatsInRange(), and call constructBidiRunsForSegment(). This is responsible for running the BiDi algorithm on the line, as implemented by BidiResolver. The BiDi algorithm receives a string and will partition the string into runs where each run is associated with a directionality (left to right or right to left). These runs are then placed one by one in the inline-direction of the line (sort of, more on this later).

These runs are emitted one by one by a callback: InlineBidiResolver::appendRun() (which is a template specialization which is used in this scenario). Because the BiDi algorithm doesn’t have any concept of individual HTML elements, but only operates on a stream of text, it may emit runs which contain multiple elements. However, we currently have a requirement that our InlineBoxes may only be created for a single renderer. Therefore, InlineBidiResolver::appendRun() will iterate across the RenderObjects in each BiDi run, and call a callback on each of them.

This callback is RenderBlockFlow::appendRunsForObject(), which is where whitespace collapsing really happens. The goal of this function is to subtract out all the MidpointState ranges from its input, and create BidiRun objects for each part remaining. It performs this operation in linear time by using the fact that the BiDi algorithm will emit runs in the order they occur in the input, which is the same as the order of whitespace sequences was added to the MidpointState. By walking a cursor in MidpointState which persists beyond each individual invocation of RenderBlockFlow::appendRunsForObject(), the function can perform a logical “zip” operation between the ranges.

These BidiRun objects which RenderBlockFlow::appendRunsForObject() creates get appended to an internal vector inside BidiResolver. Execution then pops all the way back up to RenderBlockFlow::layoutRunsAndFloatsInRange() again, which also knows about the BidiResolver, and thus has access to these new BidiRuns. We then call createLineBoxesFromBidiRuns(), which will both create the InlineBoxes as well as position them correctly. It does this by iterating through the BidiRuns, constructing InlineBoxes, and setting a pointer inside the BidiRun to point to the new InlineBox. It then runs a horizontal and vertical layout pass on the newly created InlineBoxes. When everything is complete, the BidiRuns themselves are discarded, and the renderers are associated with their constituent InlineBoxes.

I mentioned before that the BiDI algorithm’s callbacks only sort-of occur in string order. In the general case, they do occur in string order, however, the BiDi algorithm has a concept of “isolates.” When the algorithm encounters a region of the text which is isolated, the algorithm simply ignores it and treats it as atomic. Then, when the algorithm is done, it goes back and recursively processes any isolates it previously found. This means that, if there are no isolates, the callbacks are in string order, but the act of encountering an isolate means that the algorithm has to run another pass through the string.

InlineBidiResolver::appendRun() represents isolates as a run which encompasses the entire isolate. However, addPlaceholderRunForIsolatedInline() records the fact that these runs are fake by appending them to a special vector inside BidiResolver. Then, when execution pops back to constructBidiRunsForSegment(), it pulls a run off the vector and starts the BiDi algorithm anew on it. Restarting the BiDi algorithm requires restoring the cursor into the MidpointState, which was saved back when the fake run was recorded in the first place.