GPU Process

by Wenson Hsieh (Apple) Slide Deck

High-level summary of the goals of the GPU process, a basic walkthrough of the architecture, and the current status of the project.

Hello, and welcome! Mostly covers the what, why, and how, and will discuss active areas of development at the end.

— What?

Current: [architecture diagram] 1 UI process, 1 network process, multiple web content processes

Add a new process for graphics and media playback

— Why?

Security. The Web Process has privileges to talk to the GPU and kernel. If the Web Process is compromised, the attacker gains access to those privileges

Move it out of process so we can vet privileged access and terminate the web process if it tries to do anything suspicious

— How?

ImageBuffer — currently wraps e.g. CGContext

In the GPUP world, it instead owns a DisplayList-recording WebCore GraphicsContext

ImageBuffer now has a GPUP counterpart, which owns the real CGContext

<many class names>

— How to draw a path (for example)

something in Web Process calls e.g. GraphicsContext::strokePath

we don’t have a platformContext(), but instead a GraphicsContextImpl (m_impl), which is a DisplayListRecorder

this builds up a DisplayList: items that represent GraphicsContext operations that we’ll later apply to the real context in the GPU process

in the strokePath case, we record all the things you need to replay the strokePath() in the GPU process

(the path, etc.)

then, in the GPU process, we’re back at the beginning (GraphicsContext::strokePath); this time, we don’t have an m_impl, but we DO have a PlatformContext (CGContextRef), so we go back to the platform implementation

(and stroke the path on the CGContext)

send all sorts of display list items (corresponding to GraphicsContext items)
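
The record-then-replay flow above can be sketched roughly as follows. This is a toy model, not WebKit code: Point, Path, SetLineWidthItem, StrokePathItem, Recorder, FakePlatformContext and replay are all hypothetical stand-ins (for the path data, the display list items, the DisplayListRecorder, and the real CGContext), and the sketch assumes each item simply carries a copy of its arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins; none of these are real WebKit types.
struct Point { float x; float y; };
struct Path { std::vector<Point> points; };

// Each display list item captures everything needed to replay one operation.
struct SetLineWidthItem { float width; };
struct StrokePathItem { Path path; };
using DisplayListItem = std::variant<SetLineWidthItem, StrokePathItem>;
using DisplayList = std::vector<DisplayListItem>;

// Web process side: there is no platform context, so calls are only recorded.
class Recorder {
public:
    void setLineWidth(float width) { m_list.push_back(SetLineWidthItem { width }); }
    void strokePath(const Path& path) { m_list.push_back(StrokePathItem { path }); }
    const DisplayList& displayList() const { return m_list; }

private:
    DisplayList m_list;
};

// GPU process side: stands in for the real platform context.
// It logs what it would draw instead of touching any graphics API.
class FakePlatformContext {
public:
    void setLineWidth(float width) { m_lineWidth = width; }
    void strokePath(const Path& path)
    {
        m_log.push_back("stroke " + std::to_string(path.points.size())
            + " points, width " + std::to_string(static_cast<int>(m_lineWidth)));
    }
    const std::vector<std::string>& log() const { return m_log; }

private:
    float m_lineWidth { 1 };
    std::vector<std::string> m_log;
};

// Replays every recorded item against the platform context, in order.
inline void replay(const DisplayList& list, FakePlatformContext& context)
{
    for (const auto& item : list) {
        if (const auto* lineWidth = std::get_if<SetLineWidthItem>(&item))
            context.setLineWidth(lineWidth->width);
        else if (const auto* stroke = std::get_if<StrokePathItem>(&item))
            context.strokePath(stroke->path);
    }
}
```

In this model the web process only ever touches Recorder, and the GPU process only ever calls replay on a list it received; serializing the list across the process boundary is left out.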


No slides, going to talk about what we need to do to move Media out of process.

Normal media playback is already out of process (straight files, HLS playlists)

Next few weeks, same thing for MediaSource backed media elements.

Will continue to keep in-process model working for ports that haven’t adopted the GPU process.

Moving the parts of MSE that deal with samples into SourceBufferPrivate

Benefits are like Wenson mentioned: limiting the attack surface available to anything that takes over the Web Content process

Limiting risk + severity of RCE in the WP

Completed early stages, performance seems to be (generally) on par (small regression)

Work is straightforward, just keep pushing through and do in a way that other ports can adopt in the future


Remaining challenges and ongoing work


Measuring with MotionMark

Much of the regression is overhead from serializing and coordinating graphics commands between the WP + GPU process

When we first enabled GPUP for canvas about a year ago, we were 40-60% regressed on the canvas subtests

landed some recent changes in trunk that mitigate this by storing DisplayList items in shared memory (in a segmented ring buffer) so the web process can write while the GPU process is reading
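
To illustrate how a writer can keep appending while a reader drains, here is a minimal single-producer/single-consumer ring sketch. It is not the segmented ring buffer that landed in trunk: SpscRing, tryWrite and tryRead are hypothetical names, items are fixed-size, and the ring is an ordinary object rather than a shared memory mapping (in a real two-process setup the slots and both indices would have to live in the shared region).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical single-producer / single-consumer ring of fixed-size items.
template<typename Item, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0, "ring needs at least one slot");

public:
    // Writer side (the web process in the analogy). Returns false when full.
    bool tryWrite(const Item& item)
    {
        const auto tail = m_tail.load(std::memory_order_relaxed);
        const auto head = m_head.load(std::memory_order_acquire);
        assert(tail - head <= Capacity);
        if (tail - head == Capacity)
            return false;
        m_slots[tail % Capacity] = item;
        // Publish the slot only after its contents are written.
        m_tail.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Reader side (the GPU process in the analogy). Returns nullopt when empty.
    std::optional<Item> tryRead()
    {
        const auto head = m_head.load(std::memory_order_relaxed);
        const auto tail = m_tail.load(std::memory_order_acquire);
        if (head == tail)
            return std::nullopt;
        Item item = m_slots[head % Capacity];
        // Release the slot back to the writer only after copying it out.
        m_head.store(head + 1, std::memory_order_release);
        return item;
    }

private:
    std::array<Item, Capacity> m_slots {};
    std::atomic<std::uint64_t> m_head { 0 }; // next slot to read
    std::atomic<std::uint64_t> m_tail { 0 }; // next slot to write
};
```

The writer only ever stores to m_tail and the reader only ever stores to m_head, which is what lets the two sides make progress without a lock.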

Four main areas to complete: 2D canvas, DOM rendering, WebGL, media

Have separate switches for each

Once we turn them all on, we can eliminate IOKit access overall

And get the security benefits

Security benefits only hold if the code in the GPU process is robust as well

Suppose we have a compromised WP

Should never be able to crash the GPU process by sending e.g. malicious IPC

Need to be robust, even terminating the Web Content process if we see anything fishy (OOB access, etc.)
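
As an illustration of "never crash, terminate the sender instead", here is a small sketch of a receiver that treats every field of an incoming message as attacker-controlled. FillMessage, Result and handleFill are hypothetical; the real GPU process message handling is not shown in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical message: "set `length` bytes of the buffer, starting at `offset`, to `value`".
struct FillMessage {
    std::uint64_t offset;
    std::uint64_t length;
    std::uint8_t value;
};

// What the receiver decides to do with the message.
enum class Result { Applied, TerminateSender };

// Receiver-side handler. Bounds are checked before any memory is touched,
// and a bad message is reported to the caller instead of crashing the receiver.
inline Result handleFill(std::vector<std::uint8_t>& buffer, const FillMessage& message)
{
    const std::uint64_t size = buffer.size();
    // Phrased to avoid integer overflow: offset + length is never computed
    // until both values are known to be in range.
    if (message.offset > size || message.length > size - message.offset)
        return Result::TerminateSender;

    for (std::uint64_t i = 0; i < message.length; ++i)
        buffer[static_cast<std::size_t>(message.offset + i)] = message.value;
    return Result::Applied;
}
```

A caller that gets TerminateSender back would be the place to kill the offending Web Content process; the buffer is left untouched in that case.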

Questions & Comments

Ken R.: Can I ask about synchronization? Have you thought about synchronization for, e.g., media painting into canvas?

Wenson: not fleshed out yet (have the ability to paint the current frame, but need to e.g. be able to reference the correct time when the painting was requested)

Simon: There are cases right now where you have to synchronously IPC (e.g. readPixels). And canvas-to-canvas painting needs to be sure that you’ve flushed the source before copying

When you know that a canvas is going to paint into another canvas, you push a flush identifier into the source and know you have to wait for the dest to ask for it

We know we have to not break any existing behavior
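
One way to picture the flush identifier idea for canvas-to-canvas painting is the single-threaded sketch below. It is an assumption-laden toy: RemoteBuffer, FlushMarker, DrawCommand and processUntilFlushed are hypothetical names, identifiers are assumed to start at 1 and only increase, and "waiting" is modeled as a false return value rather than a blocked thread.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using FlushIdentifier = std::uint64_t;

struct DrawCommand { std::string name; };
struct FlushMarker { FlushIdentifier identifier; };
using Command = std::variant<DrawCommand, FlushMarker>;

// Hypothetical GPU-side backing store for one canvas.
class RemoteBuffer {
public:
    void enqueue(Command command) { m_pending.push_back(std::move(command)); }

    // Applies pending commands until the marker `identifier` has been reached.
    // Returns false if the marker has not arrived yet, meaning the caller
    // (the destination canvas) must not copy from this buffer yet.
    bool processUntilFlushed(FlushIdentifier identifier)
    {
        while (m_lastFlushed < identifier && !m_pending.empty()) {
            Command command = std::move(m_pending.front());
            m_pending.pop_front();
            if (auto* draw = std::get_if<DrawCommand>(&command))
                m_applied.push_back(draw->name);
            else
                m_lastFlushed = std::get<FlushMarker>(command).identifier;
        }
        return m_lastFlushed >= identifier;
    }

    const std::vector<std::string>& applied() const { return m_applied; }

private:
    std::deque<Command> m_pending;
    std::vector<std::string> m_applied;
    FlushIdentifier m_lastFlushed { 0 };
};
```

The destination would carry the same identifier in its own "draw source canvas" item and only perform the copy once processUntilFlushed on the source returns true; commands queued after the marker stay pending.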

Wenson: In the GPUP, display list playback is currently on the main thread, which makes synchronization much easier. Once we move things to separate threads, it will be trickier and need more logic

Ken: One real gotcha: Google Maps; 2D canvas on top of WebGL. HTML spec guarantees that all work done in a requestAnimationFrame (rAF) callback shows up on screen at the same time. Make sure to give that thought

Simon: Pretty sure we already aren’t synchronized here. Kimmo has been doing some changes where the dest buffer management in WebGL is going to be more similar to how we manage front/back buffers for DOM rendering

Think we’ll end up doing a better job of synchronization with the GPUP

Hopefully that won’t be an issue

(everyone concurs that there probably aren’t WPTs that will catch this)

We’ve been focusing our testing on MotionMark, but it would be great if people have examples of complex, feature-rich 2D canvases that can be looked at for performance

Ken: Anything about the progress for remoting WebGL?

Simon: Kimmo is making progress; WebGLLayer is going to turn into one of our PlatformLayer things that doesn’t actually have a CALayer, and front/back buffer management will move down into GraphicsContextGL. Very active development

No attempt yet to optimize sending GL commands (Wenson’s mechanism is all for DisplayList, not sure if we will share that for GL yet)

Wenson: Performance will be a huge hurdle

Think Blink has a similar setup, but abstracts at the GL layer

Some advantages to having our DisplayList items as an abstraction (can encapsulate multiple GL commands for normal drawing), but needs to be powerful enough to handle both

Simon: perf going to be very sensitive to the type of content; will be in a good place when all of the content is flowing Web->GPU

(will take a hit for getError, etc.)

(have to stall the world and sync IPC to the GPUP to get the pixels for getPixels, etc.)

Ken: Blink’s started as GL and then added a more terse representation

Having both certainly adds value

Ken: a big gotcha, since you’re writing into shared memory; very important to only read that shared memory once in the GPU process

Wenson: not sure what you mean; no mechanical guards, but we do intend to only do it once

Security is tricky, conservative approach now: we only need it right when we’re applying. Once it’s validated, it’s copied, and applied from there
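
The "read once, then validate the copy" point can be sketched as below. This is only an illustration under stated assumptions: ItemHeader and copyThenValidate are hypothetical, the item layout (a 32-bit size followed by the payload) is invented for the example, and the "shared" region here is ordinary memory, so the sketch ignores the extra care real concurrent shared memory reads need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical header the writer places at the start of the region.
struct ItemHeader {
    std::uint32_t payloadSize;
};

// Copies one item out of `shared` exactly once, then validates the private copy.
// `shared` models memory that the untrusted writer can still modify at any time,
// so nothing is ever re-read from it after being checked.
inline std::optional<std::vector<std::uint8_t>> copyThenValidate(const std::uint8_t* shared, std::size_t sharedSize)
{
    if (sharedSize < sizeof(ItemHeader))
        return std::nullopt;

    // Single read of the header into private memory.
    ItemHeader header;
    std::memcpy(&header, shared, sizeof(header));

    // Validate the private copy; the shared header is not consulted again.
    if (header.payloadSize > sharedSize - sizeof(ItemHeader))
        return std::nullopt;

    // Single read of the payload, sized by the already-validated private value.
    std::vector<std::uint8_t> payload(header.payloadSize);
    if (header.payloadSize)
        std::memcpy(payload.data(), shared + sizeof(ItemHeader), header.payloadSize);
    return payload;
}
```

The hazard being avoided is checking a size in shared memory and then reading it again when applying: a compromised writer could change the value between the two reads. Applying from the returned private vector closes that window.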

Last modified on Jan 5, 2021 3:20:25 PM