WebKit Coordinated Graphics System
This wiki page describes the coordinated graphics architecture currently used by the WebKit2 ports of Qt, EFL and GTK+.
Recommended background reading, from Chromium: http://www.chromium.org/developers/design-documents/gpu-accelerated-compositing-in-chrome/
It explains in detail the difference between standard rendering in WebCore and the accelerated compositing code path. Without an understanding of that difference, the rest of this document might not be very useful.
Other blog posts that can help:
Qt scenegraph: http://blog.qt.digia.com/blog/2010/05/18/a-qt-scenegraph/
Reasons for going with TextureMapper rather than QGraphicsView etc., which could also help in understanding our progress on the subject in the last 2.5 years:
This document is here to serve as a snapshot of current architecture for hardware-acceleration in the Qt port. Though all of the components described here are used by the Qt port, most of them are not used exclusively by the Qt port. Every port has a different blend of components it uses.
The main components described in this document:
TextureMapper - a set of drawing primitives and a scenegraph implementation tuned specifically for WebKit accelerated compositing. Used by Qt, EFL and GTK+.
Coordinated Compositing - an architecture for WebKit2 that allows using accelerated compositing in a dual process environment. EFL is, at the time of writing this document, in the process of switching to use coordinated compositing.
Coordinated Backing Store - How the compositing environment deals with software-rendered buffers or WebGL buffers, e.g. from the point of view of memory and pixel-formats. This includes several components; some are used only together with coordinated compositing, and some are also used elsewhere.
QtScenegraph integration - How the WebKit way of rendering is glued with the QtQuick scenegraph way of rendering. This is naturally used by the Qt port only.
The following diagram shows the different components and classes, and how they all work together. More explanation follows.
TextureMapper is a light-weight scenegraph implementation that is specially attuned for efficient GPU or software rendering of CSS3 composited content, such as 3D transforms, animations and canvas. It is a combination of a specialized accelerated drawing context (TextureMapper) and a scenegraph (TextureMapperLayer).
TextureMapper is pretty well-contained, and apart from the backing-store integration, it is thread safe.
TextureMapper includes/uses the following classes. Note that the last two (without TextureMapper in the name) do not depend on TextureMapper, but are currently used only by it.
TextureMapper is an abstract class that provides the necessary drawing primitives for the scenegraph. Unlike GraphicsContext or QPainter, it does not try to provide all possible drawing primitives, nor does it try to be a public API. Its only purpose is to abstract different implementations of the drawing primitives from the scenegraph.
It is called TextureMapper because its main function is to figure out how to texture-map, shade, transform and blend a texture into a drawing buffer. TextureMapper also includes a class called BitmapTexture, which is a drawing buffer that can either be manipulated by a backing store, or become a drawing target for TextureMapper. BitmapTexture is roughly equivalent to an ImageBuffer, or to a combination of a texture and an FBO.
TextureMapperImageBuffer is a software implementation of the TextureMapper drawing primitives.
It is currently used by Qt in WebKit1 when using a QWebView, and also by the GTK+ port as a fallback when OpenGL is not available. TextureMapperImageBuffer draws with normal GraphicsContext operations, so it does not make effective use of the GPU. However, unlike regular drawing in WebCore, it does use backing stores for the layers, which could provide some soft acceleration.
TextureMapperGL is the GPU-accelerated implementation of the drawing primitives. It is currently used by Qt-WebKit2, GTK+ and EFL(?). TextureMapperGL uses shaders compatible with GL ES 2.0, though we’re in the process of converting it to use the WebGL infrastructure directly, which would help with shader compatibility.
TextureMapperGL tries to use scissors for clipping whenever it can; if the clip is not rectangular, the stencil buffer is used instead.
Other things that are handled specially in TextureMapperGL are masks, which are rendered by multi-texturing the content texture with the mask texture, and filters, which are handled with special shaders. CSS shaders are not yet implemented in TextureMapperGL, but are in the pipeline.
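As a rough illustration of the scissor-versus-stencil decision described above, the sketch below checks whether a 2D affine transform keeps a rectangle axis-aligned. The AffineTransform struct, the ClipMode enum and the chooseClipMode function are hypothetical names made up for this example, not TextureMapperGL API; the real code works with full 3D transformation matrices.

```cpp
#include <cmath>

// Hypothetical 2D affine transform: x' = a*x + c*y + e, y' = b*x + d*y + f.
struct AffineTransform {
    double a, b, c, d, e, f;
};

enum class ClipMode { Scissor, Stencil };

// A rectangle stays axis-aligned only if the transform is a scale/translate
// (b == c == 0) or a rotation by a multiple of 90 degrees (a == d == 0).
inline bool preservesAxisAlignment(const AffineTransform& t)
{
    const double epsilon = 1e-9;
    bool scaleOnly = std::fabs(t.b) < epsilon && std::fabs(t.c) < epsilon;
    bool quarterTurn = std::fabs(t.a) < epsilon && std::fabs(t.d) < epsilon;
    return scaleOnly || quarterTurn;
}

// The scissor test only handles axis-aligned rectangles in window space;
// every other clip shape falls back to the stencil buffer.
inline ClipMode chooseClipMode(const AffineTransform& clipTransform, bool clipIsRectangle)
{
    if (clipIsRectangle && preservesAxisAlignment(clipTransform))
        return ClipMode::Scissor;
    return ClipMode::Stencil;
}
```

A translated or scaled rectangular clip would take the cheap scissor path here, while a clip rotated by 45 degrees would go through the stencil path.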
TextureMapperLayer is the class that represents a node in the GPU-renderable layer tree. It maintains its own tree hierarchy, which is equivalent to the GraphicsLayer tree hierarchy, though unlike GraphicsLayers, its hierarchy is thread safe.
The main function of TextureMapperLayer is to determine the render order and the use of intermediate surfaces. For example, opacity is to be applied on the rendering results of a subtree, and not separately for each layer.
The correct order of render phases for a subtree of layers is as follows:
However, rendering each of these phases into an intermediate surface would be costly on some GPUs. Therefore, TextureMapperLayer does its best to avoid using intermediate surfaces unless absolutely needed. For example, content that has reflections and no masks or opacity would not need intermediate surfaces at all, and the same goes for content with mask/opacity that does not have overlapping sub layers.
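The rule of thumb above can be sketched as a small predicate. This is only an illustration under simplifying assumptions: LayerSketch, RectSketch and needsIntermediateSurface are hypothetical names, all bounds are assumed to be in one shared coordinate space, and overlap is tested on plain bounding rectangles rather than on transformed content as a real implementation would need.

```cpp
#include <cstddef>
#include <vector>

struct RectSketch { float x, y, width, height; };

inline bool intersects(const RectSketch& a, const RectSketch& b)
{
    return a.x < b.x + b.width && b.x < a.x + a.width
        && a.y < b.y + b.height && b.y < a.y + a.height;
}

// Hypothetical stand-in for a layer; not the real TextureMapperLayer.
struct LayerSketch {
    float opacity = 1;
    bool hasMask = false;
    bool hasReflection = false;
    RectSketch bounds { 0, 0, 0, 0 };    // this layer's own content
    std::vector<LayerSketch> children;   // sub layers, same coordinate space
};

inline void collectBounds(const LayerSketch& layer, std::vector<RectSketch>& out)
{
    out.push_back(layer.bounds);
    for (const LayerSketch& child : layer.children)
        collectBounds(child, out);
}

// True if any two layers in the subtree cover a common area.
inline bool hasOverlappingContent(const LayerSketch& layer)
{
    std::vector<RectSketch> rects;
    collectBounds(layer, rects);
    for (std::size_t i = 0; i < rects.size(); ++i) {
        for (std::size_t j = i + 1; j < rects.size(); ++j) {
            if (intersects(rects[i], rects[j]))
                return true;
        }
    }
    return false;
}

inline bool needsIntermediateSurface(const LayerSketch& layer)
{
    // Reflections alone never force a surface in this sketch.
    bool hasGroupEffect = layer.hasMask || layer.opacity < 1;
    if (!hasGroupEffect)
        return false;
    // Mask/opacity can be applied per layer when nothing overlaps.
    return hasOverlappingContent(layer);
}
```

With this predicate, a half-transparent layer whose children sit side by side is drawn directly, and a surface is only allocated once two pieces of content in the subtree overlap.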
Another use of TextureMapperLayer is to tie together TextureMapper, GraphicsLayerTextureMapper, TextureMapperBackingStore, GraphicsLayerAnimation and GraphicsLayerTransform. It includes all the synchronization phases, which are responsible for timing when changes coming from different sources actually take effect. Without proper synchronization, flickering or artifact bugs often occur.
GraphicsLayerTextureMapper is the glue between GraphicsLayer and TextureMapperLayer. Since TextureMapperLayer is thread-safe and GraphicsLayer is not, GraphicsLayerTextureMapper provides the necessary glue. GraphicsLayerTextureMapper does not do anything interesting in particular apart from synchronization.
TextureMapperBackingStore is what glues TextureMapperLayer with different backing stores. It allows the type of backing store to be abstracted away from TextureMapperLayer, allowing TextureMapperLayer to be used with a standard tiled backing store for WebKit1 (TextureMapperTiledBackingStore), a GraphicsSurface-backed backing store for WebGL (TextureMapperSurfaceBackingStore) or a backing store for WebKit2 (WebKit::CoordinatedBackingStore).
The implementation in TextureMapperTiledBackingStore is pretty simple, and is used only to work around the 2000x2000 texture size limitation in OpenGL. It is very different from the dynamic tiled backing store, discussed later.
GraphicsLayerAnimation is a class that replicates the AnimationController logic in WebCore, but does so without being coupled with particular WebCore primitives like the document.
GraphicsLayerAnimation can receive an accelerated animation “spec” (WebCore::Animation, KeyframeValueList), and process the animation at different points in time, interpolating the opacity, transforms (and filters in the future), sending them back to the animation client, in this case being TextureMapperLayer.
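To make the "process the animation at different points in time" step more concrete, here is a minimal sketch of interpolating opacity between keyframes. It assumes linear timing only and ignores timing functions, iteration counts and direction; OpacityKeyframe and opacityAt are hypothetical names, not the GraphicsLayerAnimation interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical keyframe: offset is the position in the animation, in [0, 1].
struct OpacityKeyframe {
    double offset;
    double opacity;
};

// Returns the opacity at `elapsed` seconds into an animation lasting
// `duration` seconds. Keyframes must be non-empty and sorted by offset.
inline double opacityAt(const std::vector<OpacityKeyframe>& keyframes, double elapsed, double duration)
{
    assert(!keyframes.empty() && duration > 0);
    double progress = elapsed / duration;

    // Clamp before the first and after the last keyframe.
    if (progress <= keyframes.front().offset)
        return keyframes.front().opacity;
    if (progress >= keyframes.back().offset)
        return keyframes.back().opacity;

    for (std::size_t i = 1; i < keyframes.size(); ++i) {
        const OpacityKeyframe& from = keyframes[i - 1];
        const OpacityKeyframe& to = keyframes[i];
        if (progress <= to.offset) {
            double span = to.offset - from.offset;
            double local = span > 0 ? (progress - from.offset) / span : 1;
            return from.opacity + (to.opacity - from.opacity) * local;
        }
    }
    return keyframes.back().opacity;
}
```

The animation client (TextureMapperLayer in this document) would receive the returned value for each frame it paints; transforms would be interpolated along the same lines but per matrix operation.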
GraphicsLayerTransform is responsible for the transformation matrix calculations based on the rules defined by GraphicsLayer. There are several properties that determine a node’s transformation matrix: its parent matrix, the anchor point, the local transformation, position, scrolling adjustment, perspective and the preserves-3d flag. These properties are separated due to the fact that the transformation might be animated using GraphicsLayerAnimation, which would make it impossible to multiply in advance.
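The following sketch shows why the properties are kept separate and only combined at paint time: when the local transformation is animated, the product has to be recomputed from its parts for each frame. It is a 2D simplification that covers only the parent matrix, position, anchor point and local transformation. Scrolling adjustment, perspective and preserves-3d are left out, the multiplication order is an assumption for illustration, and Matrix2D, LayerGeometry and combinedTransform are hypothetical names.

```cpp
// Hypothetical 2D affine matrix: x' = a*x + c*y + e, y' = b*x + d*y + f.
struct Matrix2D { double a, b, c, d, e, f; };
struct Point2D { double x, y; };

inline Matrix2D translation(double tx, double ty) { return { 1, 0, 0, 1, tx, ty }; }

// Returns the matrix that applies n first and m second.
inline Matrix2D multiply(const Matrix2D& m, const Matrix2D& n)
{
    return {
        m.a * n.a + m.c * n.b,
        m.b * n.a + m.d * n.b,
        m.a * n.c + m.c * n.d,
        m.b * n.c + m.d * n.d,
        m.a * n.e + m.c * n.f + m.e,
        m.b * n.e + m.d * n.f + m.f
    };
}

inline Point2D apply(const Matrix2D& m, Point2D p)
{
    return { m.a * p.x + m.c * p.y + m.e, m.b * p.x + m.d * p.y + m.f };
}

struct LayerGeometry {
    double x, y;              // position in the parent
    double width, height;     // layer size
    double anchorX, anchorY;  // anchor point as a fraction of the size
    Matrix2D local;           // local transformation, possibly animated
};

// parent * translate(position + anchor) * local * translate(-anchor)
inline Matrix2D combinedTransform(const Matrix2D& parent, const LayerGeometry& g)
{
    double originX = g.anchorX * g.width;
    double originY = g.anchorY * g.height;
    Matrix2D result = multiply(parent, translation(g.x + originX, g.y + originY));
    result = multiply(result, g.local);
    return multiply(result, translation(-originX, -originY));
}
```

For a 100x100 layer at (10, 20) with a centered anchor and a local scale of 2, the anchor point stays at (60, 70) in parent space while the layer's top-left corner moves to (-40, -30).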
What’s missing from TextureMapper
- CSS shaders
- Support for mask/reflection edge cases
- Better readability in TextureMapperShaderManager
- Support for dynamic TiledBackingStore per layer in WebKit1
- Make it better and faster :)
Coordinated compositing, guarded by USE(UI_SIDE_COMPOSITING), is a WebKit2 implementation of accelerated compositing. It synchronizes the layer tree provided by WebCore in the web process with a proxied layer tree in the UI process that is responsible for the actual GPU rendering to the screen.
The coordinated compositing code might be a bit tricky to read, because of its asynchronous nature on top of the WebKit2 IPC mechanism.
An important concept to understand is the roles of the UI and the web processes: The web process owns the content and the UI process owns the viewport. Thus, the UI process would know about viewport and user interactions before the web process, and the web process would know about content changes (e.g. a CSS layer getting added) before the UI process.
The web process owns the frame and the content. It decides when to render a new frame, which information to include in the layer tree, how the backing stores behave, and also how the animations are timed. The UI process is responsible for rendering the latest frame from the web process, compensating for user interaction not yet committed by the web process.
An example of that is scrolling adjustment: the web process tells the UI process which layers are fixed to the viewport, and the UI process knows the exact scroll position for each paint. It is the UI process's job to account for the difference. Another role of the UI process is to periodically notify the web process of viewport changes (e.g. panning), to allow the web process to decide whether or not to create/destroy tiles, as well as to move content into view when writing in text fields.
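The scroll compensation can be pictured with a few lines of code. This is a sketch of the idea only, assuming layer positions are expressed in content coordinates; ScrollPoint, FixedLayerSketch and adjustedPosition are hypothetical names rather than the actual WebKit2 classes.

```cpp
struct ScrollPoint { float x, y; };

// Hypothetical per-layer state sent by the web process.
struct FixedLayerSketch {
    ScrollPoint position;   // position in content coordinates
    bool fixedToViewport;   // flag provided by the web process
};

// committedScroll: scroll position the web process used for the last frame.
// currentScroll:   scroll position the UI process is painting with right now.
inline ScrollPoint adjustedPosition(const FixedLayerSketch& layer, ScrollPoint committedScroll, ScrollPoint currentScroll)
{
    if (!layer.fixedToViewport)
        return layer.position;
    // Shift the fixed layer by the scroll delta the web process has not seen
    // yet, so that it stays put on screen.
    return { layer.position.x + (currentScroll.x - committedScroll.x),
             layer.position.y + (currentScroll.y - committedScroll.y) };
}
```

Ordinary layers are left alone and simply scroll with the content; only layers flagged as fixed are moved by the uncommitted delta.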
CoordinatedGraphicsLayer is what glues together the composited content which WebCore creates as GraphicsLayers, and the coordinated graphics system. CoordinatedGraphicsLayer does not deal with IPC directly. Instead, it saves the information gathered from the compositor, and prepares it for serialization. It maintains an ID for each layer, which allows naming each layer, links a layer ID with its backing store, and deals with directly composited images and how they are serialized.
CoordinatedLayerTreeHost is the “boss” for synchronizing the content changes and the viewport changes. It acts on a frame-by-frame basis. For each frame it retrieves the up-to-date layer information from the different CoordinatedGraphicsLayers, makes sure the backing-store content is ready to be serialized, prepares the information needed for scroll adjustment, and passes all that info to the UI process in a series of messages for each frame.
CoordinatedLayerTreeHost maintains a map of “directly composited images” - images that are rendered once and used multiple times in the same texture. CoordinatedLayerTreeHost maintains the ref-counted lifecycle of such an image, signaling to the UI process when an image is no longer in use and its backing texture can be destroyed.
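A minimal sketch of such a ref-counted image map is shown below. The class name DirectImageRegistry, its methods and the integer image key are assumptions made for the example; the real CoordinatedLayerTreeHost bookkeeping is tied to WebCore image objects and IPC messages.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical registry counting how many layers use each image.
class DirectImageRegistry {
public:
    // Returns true for the first user of an image, i.e. when the image
    // has to be sent to the UI process.
    bool adopt(std::uint64_t imageKey)
    {
        return ++m_useCounts[imageKey] == 1;
    }

    // Returns true when the last user is gone, i.e. when the UI process
    // can be told to destroy the backing texture.
    bool release(std::uint64_t imageKey)
    {
        auto it = m_useCounts.find(imageKey);
        if (it == m_useCounts.end())
            return false;
        if (--it->second > 0)
            return false;
        m_useCounts.erase(it);
        return true;
    }

    std::size_t imageCount() const { return m_useCounts.size(); }

private:
    std::unordered_map<std::uint64_t, unsigned> m_useCounts;
};
```

Two layers showing the same image would trigger one upload on the first adopt and one destroy notification on the second release.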
CoordinatedLayerTreeHost is also responsible for two special layers that are not recognized as “composited” by WebCore. The main web content, known as non-composited content, is treated as a layer. The same is true for the overlay layer, which paints things such as the scrollbar and the tap indicators.
Another important role of CoordinatedLayerTreeHost is to receive the viewport information from the UI process, and to propagate that information to the different tiled backing-stores so that they can prepare to create/destroy tiles.
CoordinatedLayerTreeHostProxy is what binds together the viewport, the GPU renderer and the contents. It doesn't have functionality of its own; instead, it acts as a message hub between those components.
LayerTreeRenderer is the class that knows how to turn serialized information from the web process and viewport information from the view classes into a GPU-renderable tree of TextureMapperLayers. It maintains a map of layer IDs to layers,
What’s missing from Coordinated Compositing
- Coordinating animations separately from normal rendering, e.g. in a thread. See CSS Animations.
- Using ScrollingCoordinator instead of the home-brewed fixed-position adjustments, also use it to support UI-side overflow:scroll. There is still a lot to do around scrolling.
- Serializing and coordinating CSS shaders, once they're implemented in TextureMapper.
Unlike coordinated compositing, which includes mainly light-weight rendering information about each layer, backing-stores contain pixel data, and thus are both memory-hungry and expensive to copy and serialize. Backing-stores are drawn into by software or by hardware, depending on the scenario, and thus require binding to platform/OS/GPU-specific code.
OpenGL has texture size limitations, so we need to restrict what we keep in GPU memory (i.e. we cannot paint and store all of the contents in the back-buffer). We also want to avoid repainting what didn't change, so we cannot use the viewport rect as the texture either, as that would invalidate the back-buffer at every scroll.
The solution is to use a tiled backing-store where each layer is split up into tiles. The backing store is dynamic in the sense that it doesn't paint all tiles, but only the tiles visible (tiles intersecting the visible rect) plus some area outside (pre-painted tiles; cover rect). In order to save memory it also keeps a rect of tiles to keep in memory (keep rect) and removes those outside.
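The relationship between the visible rect, cover rect and keep rect can be sketched as follows. The fixed margins used here stand in for the real heuristics, which this sketch does not attempt to reproduce, and all names (IntRectSketch, TileRange, tilesCovering, classifyTile) are hypothetical. Coordinates are assumed to be non-negative content coordinates.

```cpp
#include <algorithm>
#include <cassert>

struct IntRectSketch { int x, y, width, height; };
struct TileRange { int firstColumn, firstRow, lastColumn, lastRow; };  // inclusive

// Grows a rect by `margin` on every side and clamps it to the contents.
inline IntRectSketch inflatedAndClamped(const IntRectSketch& r, int margin, const IntRectSketch& contents)
{
    int left = std::max(contents.x, r.x - margin);
    int top = std::max(contents.y, r.y - margin);
    int right = std::min(contents.x + contents.width, r.x + r.width + margin);
    int bottom = std::min(contents.y + contents.height, r.y + r.height + margin);
    return { left, top, std::max(0, right - left), std::max(0, bottom - top) };
}

// Tiles intersecting a rect, for square tiles of `tileSize` pixels.
inline TileRange tilesCovering(const IntRectSketch& r, int tileSize)
{
    assert(tileSize > 0);
    return { r.x / tileSize, r.y / tileSize,
             (r.x + r.width - 1) / tileSize, (r.y + r.height - 1) / tileSize };
}

inline bool contains(const TileRange& range, int column, int row)
{
    return column >= range.firstColumn && column <= range.lastColumn
        && row >= range.firstRow && row <= range.lastRow;
}

enum class TileAction { Paint, Keep, Drop };

// Tiles in the cover range get painted, tiles only in the keep range stay
// in memory as they are, and everything else is removed.
inline TileAction classifyTile(const TileRange& cover, const TileRange& keep, int column, int row)
{
    if (contains(cover, column, row))
        return TileAction::Paint;
    if (contains(keep, column, row))
        return TileAction::Keep;
    return TileAction::Drop;
}
```

For an 800x600 viewport scrolled to y = 512 over 256-pixel tiles, the visible rows are 2 to 4; a cover margin of 256 pixels extends painting to rows 1 to 5, and a keep margin of 512 pixels retains rows 0 to 6.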
The idea behind tile pre-painting is to have tiles ready when the user scrolls the page. There are some heuristics to calculate where to pre-paint, depending on the panning trajectory, scroll position and contents scale.
Tiling textures gives the benefit of texture streaming. When a large layer has to be uploaded to the GPU, it can be done several tiles at a time, conserving memory bandwidth and preventing the GPU from stalling during long texture uploads.
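One way to picture texture streaming is a queue of pending tile updates that is drained a few tiles per frame. This is a hypothetical sketch, not how the coordinated graphics code schedules uploads; TileUpdateSketch, UploadQueue and the per-frame budget are assumptions for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical record for one tile whose pixels are waiting for upload.
struct TileUpdateSketch { int tileID; };

class UploadQueue {
public:
    void enqueue(TileUpdateSketch update) { m_pending.push_back(update); }

    // Called once per frame: hands out at most `maxTilesPerFrame` updates,
    // in FIFO order, and leaves the rest for later frames.
    std::vector<TileUpdateSketch> takeBatch(std::size_t maxTilesPerFrame)
    {
        std::vector<TileUpdateSketch> batch;
        while (!m_pending.empty() && batch.size() < maxTilesPerFrame) {
            batch.push_back(m_pending.front());
            m_pending.pop_front();
        }
        return batch;
    }

    std::size_t pendingCount() const { return m_pending.size(); }

private:
    std::deque<TileUpdateSketch> m_pending;
};
```

Spreading five tile uploads over frames with a budget of two keeps each frame's upload cost bounded, at the price of the layer becoming fully up to date a couple of frames later.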
TiledBackingStore is the class that makes the decisions about tiles, creating or destroying them, dealing with the cover-rect heuristics that are based on the scroll position, contents size and panning trajectory. TiledBackingStore relies on an abstract WebCore::Tile implementation, allowing for different decisions around drawing the tiles synchronously or asynchronously.
CoordinatedTile is a web-process backend for WebCore::Tile used by the TiledBackingStore, allowing software rendering of tiles into the coordinated graphics system. It's the tile-equivalent of CoordinatedGraphicsLayer, maintaining a tile-ID map. It uses a special client (a superclass of LayerTreeCoordinator) to allocate content surfaces and send the messages to the UI process.
ShareableBitmap is used by Apple and other ports, but in this context it is used as a software backing-store for content updates. When there is no platform-specific GraphicsSurface implementation, ShareableBitmap acts as a fallback that uses standard shared memory as a backing store for the update, and then updates the TextureMapper GPU backing-stores (BitmapTextures) with the contents from that shared memory.
ShareableBitmaps are also used as a backing-store for uploading “directly composited images”, images that are rendered once in software and then reused as a texture multiple times.
GraphicsSurface is an abstraction for a platform-specific buffer that can share graphics data across processes. It is used in two ways, probably more in the future:
- “fast texture uploads”: by painting into a graphics surface in the web process, and then copying into a texture in the UI process, we can avoid expensive texture uploads with color conversions, and instead paint directly to a GPU-friendly (probably DMA) memory buffer.
- WebGL: to allow compositing WebGL content with the rest of the layers, we need to render WebGL content into a surface in the web process and then use the result in the UI process, all that without deep copying of the pixel data.
GraphicsSurfaces are currently supported only on Qt-Mac and GLX, and are currently enabled only for WebGL and not for fast texture uploads. This is work in progress.
ShareableSurface is a glue class that abstracts away the differences between GraphicsSurface and ShareableBitmap for the purposes of serialization. This is a convenience, to allow us to avoid adding #ifdefs or special code-paths for GraphicsSurface in too many places in WebKit2.
CoordinatedBackingStore synchronizes remote tiles (created by the TiledBackingStore) and ShareableSurfaces with TextureMapper in the UI process. It is responsible for copying or swapping the data in the BitmapTexture with new data or updates from ShareableSurface. It also knows how to paint the different tiles with relation to the parameters received from TextureMapperLayer.
UpdateAtlas is a graphics-memory optimization, designed to avoid fragmented allocations of small update buffers. Instead of allocating small buffers as it goes along, UpdateAtlas allocates large graphics buffers, and manages the update-patch allocations by itself, using square shapes inside the large buffers as sub-images to store temporary pixel data.
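The sub-allocation idea can be illustrated with a simple row-based ("shelf") allocator. This is a swapped-in technique chosen for brevity and is not claimed to be the strategy UpdateAtlas uses; AtlasSketch, AtlasRect and their methods are hypothetical names.

```cpp
#include <algorithm>

struct AtlasRect { int x, y, width, height; };

// Hands out sub-rectangles of one large buffer, filling it row by row.
class AtlasSketch {
public:
    AtlasSketch(int width, int height)
        : m_width(width), m_height(height) { }

    // Returns true and fills `out` on success, false when the patch does
    // not fit and the caller needs another atlas.
    bool allocate(int width, int height, AtlasRect& out)
    {
        if (width <= 0 || height <= 0 || width > m_width)
            return false;
        if (m_cursorX + width > m_width) {
            // Current row is full: start a new row below it.
            m_cursorY += m_rowHeight;
            m_cursorX = 0;
            m_rowHeight = 0;
        }
        if (m_cursorY + height > m_height)
            return false;
        out = { m_cursorX, m_cursorY, width, height };
        m_cursorX += width;
        m_rowHeight = std::max(m_rowHeight, height);
        return true;
    }

    // Called once all patches have been consumed, making the whole buffer
    // available again without reallocating it.
    void reset()
    {
        m_cursorX = 0;
        m_cursorY = 0;
        m_rowHeight = 0;
    }

private:
    int m_width;
    int m_height;
    int m_cursorX = 0;
    int m_cursorY = 0;
    int m_rowHeight = 0;
};
```

The large buffer is allocated once and reused across updates; only the bookkeeping of where each temporary patch lives changes from frame to frame.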
What’s missing from Coordinated Backing Stores
- More platform support for GraphicsSurfaces – e.g. on some embedded/mobile systems.
- Graceful handling of low memory situations, e.g. by visible incremental updates to tiles using a smaller UpdateAtlas, and adjustment to how the TiledBackingStore computes the keep and cover rects.
- Allow different pixel formats for tiles, e.g. 16-bit for opaque tiles and 8-bit for masks.
- Use GraphicsSurfaces for fast texture uploads.
To integrate WebKit coordinated graphics with the QtScenegraph, the following main things were necessary:
- Make LayerTreeRenderer thread-safe, so it can run in QtScenegraph's rendering thread.
- Use a private API from QtScenegraph (QSGRenderNode) which allows us to render WebKit's layer tree without rendering into an intermediate FBO.
- Synchronize the remote content coming from the UI process at a time that is right for QtScenegraph.
- Propagate the clipping, transform and opacity attributes from QtScenegraph to LayerTreeRenderer.
- Render the view's background.
The class that handles most of this is QtWebPageSGNode. Its main function, “render”, is used to translate QtScenegraph parameters to LayerTreeRenderer parameters. The SG node contains 3 nodes - a root node that controls the viewport transform, a background node, and a custom contents node which renders the actual web contents using LayerTreeRenderer. Some other functionality is in QQuickWebPage::updatePaintNode.
Note that vsync and triple-buffering are dealt with outside of the scope of this system. The coordinated graphics system renders the compositing scene to the provided buffer, and it's up to the host system (e.g. QWindow) to decide on the buffering strategy.
View the debug border & repaint count
You can see the debug visuals by setting the WEBKIT_SHOW_COMPOSITING_DEBUG_VISUALS environment variable to 1, as follows.
> WEBKIT_SHOW_COMPOSITING_DEBUG_VISUALS=1 ./MiniBrowser
Currently, in WebKit2, CSS animation frames are handled together with all other frames – the web process re-layouts for each frame, and sends the new transform/opacity/filter information together with the rest of the frame. The requestAnimationFrame feature, which allows synchronizing animations with the display refresh, is synchronized with CoordinatedLayerTreeHost to make sure those animation frames are interpolated as accurately and smoothly as possible, avoiding choppiness and throttling.
One thing that is missing is threaded animations: allowing animations to continue to produce frames while lengthy operations in the web process are taking place. This will allow animations to appear smooth while elements that are not related to that animation are being rendered into the backing store.
This is a somewhat tricky thing to achieve, mainly because animations still need to sometimes sync with the non-animated contents.
There are two possible approaches to WebGL. The current approach uses GraphicsSurfaces, allowing the web process to render with the GPU into a platform-specific buffer, which is later composited in the UI process. This approach is somewhat easier to implement, but might not be efficient on some GPU systems, and can also create security issues if the web process has direct access to the GPU.
The other option is to serialize the WebGL display list to the UI process, making the actual GL calls there, into an FBO. This is a safer and potentially more cross-platform approach, however it's yet to be seen how much of an undertaking it is, and how well it scales.
Comments and Questions
This section contains comments and questions. Feel free to add more!
It would be great to have more information about how vsync and triple buffering fit into all of this (http://hardforum.com/showthread.php?t=928593)
NR: Vsync/triple-buffering is done in the driver/QtScenegraph level. Added a comment.
FOLLOW-UP: How does that work in practice? When the driver vsyncs we draw (or swap buffers => blit) on each sync. If we are not ready (no new back-buffer), we will paint the old front-buffer and end up with a frame rate of vsync/N, where N is some positive number. Triple buffering fixes this. But if we are just a node in the QtSceneGraph which draws (swap => blit) on vsync, I don't see how we become triple buffered.
It is not so clear how TextureMapper, TextureMapperLayer etc. fit together. It sounds as if TextureMapper is a scene-graph and also a drawing context. Maybe this could be a bit more clear.
NR: TextureMapper is a drawing context, TextureMapperLayer is a scenegraph. Updated.
Can UpdateAtlas be used with canvas like in http://www.dorothybrowser.com/accelerating-html-2d-canvas-texture-atlas-technique ?
NR: yes, we can use a texture atlas for accelerated 2D canvas. However, accelerated 2D canvas is not yet supported and not discussed in this document :)
How to make thumbnails etc. Using the software implementation. Does the software implementation have limitations?
NR: I'd rather make thumbnails with GL and then glReadPixels. The software implementation doesn't have inherent limitations, but it renders slightly differently.
You say that layer tree coordinator sends viewport information for creating, deleting tiles etc. That is only when using the TiledBackingStore. Maybe that could be made more clear
NR: Not sure what comment to add.
How does HW accelerated 2D canvas come into place?
NR: It needs the same things that WebGL needs, e.g. GraphicsSurface or serialized commands. But it's not baked enough to write anything about.
How does the Chrome threaded compositor relate to this? Could/should we do something similar?
Are we optimizing occluded tiles (those not visible due to some opaque layer fully covering them), ie. not painting them until needed?
What needs to be changed to this system if/when WebKit2 goes multi-process instead of the current dual process model?