GPU Process
by Wenson Hsieh (Apple), slide deck
High-level summary of the goals of the GPU process, a basic walkthrough of its architecture, and the current status of the project.
Hello, and welcome! Mostly covers the what, why, and how, and will discuss active areas of development at the end.
— What?
Current: [architecture diagram] 1 UI process, 1 network process, multiple web content processes
Add a new process for graphics and media playback
— Why?
Security. The Web Process has privileges to talk to the GPU and the kernel. If the Web Process is compromised, an attacker gets access to these privileges
Move that access out of process so we can vet privileged access and terminate the Web Process if it tries to do anything suspicious
— How?
ImageBuffer — currently wraps e.g. CGContext
In the GPUP world, it instead owns a WebCore GraphicsContext that records into a DisplayList
ImageBuffer now has a GPUP counterpart, which owns the real CGContext
<many class names>
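A minimal sketch of that split, using hypothetical names (WebProcessImageBuffer, GPUProcessImageBuffer, GPUProcessBackendMap, FakePlatformContext) rather than WebKit’s real classes: the Web Process half has only an identifier and a list of recorded items, and the GPU process half, looked up by that identifier, owns the context that actually draws. A direct function call stands in for IPC, and strings stand in for display list items.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a real platform context such as a CGContext:
// it only logs the operations it is asked to perform.
struct FakePlatformContext {
    std::vector<std::string> performed;
};

using ImageBufferID = std::uint64_t;

// GPU process half: owns the context that actually draws.
class GPUProcessImageBuffer {
public:
    FakePlatformContext& context() { return m_context; }
private:
    FakePlatformContext m_context;
};

// GPU process registry: maps identifiers received over IPC to backends.
class GPUProcessBackendMap {
public:
    void create(ImageBufferID id) { m_buffers[id] = std::make_unique<GPUProcessImageBuffer>(); }

    // Unknown identifiers yield nullptr instead of crashing the GPU process.
    GPUProcessImageBuffer* find(ImageBufferID id)
    {
        auto it = m_buffers.find(id);
        return it == m_buffers.end() ? nullptr : it->second.get();
    }

private:
    std::map<ImageBufferID, std::unique_ptr<GPUProcessImageBuffer>> m_buffers;
};

// Web Process half: no platform context, only an identifier and recorded items.
class WebProcessImageBuffer {
public:
    WebProcessImageBuffer(ImageBufferID id, GPUProcessBackendMap& gpu)
        : m_id(id)
        , m_gpu(gpu)
    {
        m_gpu.create(m_id); // would be an IPC message in a real implementation
    }

    void record(std::string item) { m_pending.push_back(std::move(item)); }

    // Sends pending items to the counterpart; a direct call stands in for IPC.
    bool flush()
    {
        GPUProcessImageBuffer* backend = m_gpu.find(m_id);
        if (!backend)
            return false;
        for (const std::string& item : m_pending)
            backend->context().performed.push_back(item);
        m_pending.clear();
        return true;
    }

private:
    ImageBufferID m_id;
    GPUProcessBackendMap& m_gpu;
    std::vector<std::string> m_pending;
};
```

Usage: construct WebProcessImageBuffer(1, map), record a couple of items, and call flush(); only then do the items appear in map.find(1)->context().performed. Returning nullptr for an unknown identifier, rather than asserting, is in the spirit of the later point that a compromised Web Process must not be able to crash the GPU process.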
— How to draw a path (for example)
something in the Web Process calls, e.g., GraphicsContext::strokePath
we don’t have a platformContext(), but instead a GraphicsContextImpl (m_impl), which is a DisplayListRecorder
this builds up a DisplayList: items that represent GraphicsContext operations that we’ll later apply to the real context in the GPU process
in the strokePath case, we record all the things you need to replay the strokePath() in the GPU process
(the path, etc.)
then, in the GPU process, we’re back at the beginning (GraphicsContext::strokePath); this time, we don’t have an m_impl, but we DO have a PlatformContext (CGContextRef), so we go back to the platform implementation
(and stroke the path on the CGContext)
we send all sorts of display list items this way (corresponding to GraphicsContext operations)
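The strokePath walkthrough can be condensed into a small C++ sketch. The types here (Point, Path, StrokePathItem, Recorder, PlatformContext) are simplified, hypothetical stand-ins, not WebCore’s actual classes or signatures; the sketch only shows the dispatch: the same GraphicsContext::strokePath either records an item (Web Process, m_impl set) or draws on the platform context (GPU process), and replay calls back into strokePath.

```cpp
#include <cassert>
#include <variant>
#include <vector>

// Hypothetical, simplified types; real WebCore classes are far richer.
struct Point { float x; float y; };
using Path = std::vector<Point>;

// One recorded operation; only strokePath is modeled in this sketch.
struct StrokePathItem { Path path; };
using DisplayListItem = std::variant<StrokePathItem>;
using DisplayList = std::vector<DisplayListItem>;

// Stand-in for a CGContextRef: counts strokes and remembers the last path.
struct PlatformContext {
    int strokes = 0;
    Path lastPath;
    void strokePath(const Path& path) { ++strokes; lastPath = path; }
};

// Plays the role of the GraphicsContextImpl (m_impl) that records items.
class Recorder {
public:
    explicit Recorder(DisplayList& list) : m_list(list) { }
    void strokePath(const Path& path) { m_list.push_back(StrokePathItem { path }); }
private:
    DisplayList& m_list;
};

class GraphicsContext {
public:
    // Web Process: has a recorder (m_impl) but no platform context.
    explicit GraphicsContext(Recorder& impl) : m_impl(&impl) { }
    // GPU process: has a real platform context and no recorder.
    explicit GraphicsContext(PlatformContext& platform) : m_platform(&platform) { }

    void strokePath(const Path& path)
    {
        if (m_impl) {
            m_impl->strokePath(path); // record now, replay later
            return;
        }
        m_platform->strokePath(path); // draw for real
    }

private:
    Recorder* m_impl = nullptr;
    PlatformContext* m_platform = nullptr;
};

// GPU process: apply each recorded item by calling back into GraphicsContext.
void replay(const DisplayList& list, GraphicsContext& context)
{
    for (const DisplayListItem& item : list)
        std::visit([&](const StrokePathItem& stroke) { context.strokePath(stroke.path); }, item);
}
```

In the Web Process, a GraphicsContext built around a Recorder only appends a StrokePathItem; in the GPU process, replay() walks the list and the same strokePath call lands on the platform context.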
JER---
No slides; going to talk about what we need to do to move media out of process.
Normal media playback is already out of process (straight files, HLS playlists)
In the next few weeks, the same will happen for MediaSource-backed media elements.
Will continue to keep in-process model working for ports that haven’t adopted the GPU process.
Moving the parts of MSE that deal with samples into SourceBufferPrivate
Benefits are as Wenson mentioned: limiting the attack surface available to anything that takes over the Web Content process
Limiting the risk and severity of remote code execution (RCE) in the WP
Early stages are complete; performance seems to be generally on par (a small regression)
Work is straightforward: just keep pushing through, and do it in a way that other ports can adopt in the future
WENSON---
Remaining challenges and ongoing work
PERFORMANCE
Measuring with MotionMark
Much of the regression is overhead from serializing and coordinating graphics commands between the WP and the GPU process
When we first enabled GPUP for canvas about a year ago, we were 40-60% regressed on the canvas subtests
Landed some recent changes in trunk that mitigate this by storing DisplayList items in shared memory (in a segmented ring buffer), so the Web Process can write while the GPU process is reading
Four main areas to complete: 2D canvas, DOM rendering, WebGL, media
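What lets the Web Process write while the GPU process reads is the single-producer/single-consumer pattern. A minimal sketch, with two simplifying assumptions: the storage is an ordinary array in one process rather than shared memory, and it is a single fixed-size ring of typed slots rather than WebKit’s segmented buffer of variable-length DisplayList items. SPSCRingBuffer is a hypothetical name. The producer publishes an item with a release store of the head index and the consumer observes it with an acquire load, so it never reads a half-written slot.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Single-producer/single-consumer ring buffer. In the real design the storage
// would live in shared memory mapped into both processes; here it is a plain
// member array so the sketch runs in one process.
template<typename T, std::size_t Capacity>
class SPSCRingBuffer {
    static_assert(Capacity >= 2, "need at least two slots");
public:
    // Producer (Web Process) side. Returns false when the buffer is full.
    bool tryWrite(const T& value)
    {
        std::size_t head = m_head.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % Capacity;
        if (next == m_tail.load(std::memory_order_acquire))
            return false; // full: one slot stays empty to tell full from empty
        m_slots[head] = value;
        m_head.store(next, std::memory_order_release); // publish the item
        return true;
    }

    // Consumer (GPU process) side. Returns std::nullopt when empty.
    std::optional<T> tryRead()
    {
        std::size_t tail = m_tail.load(std::memory_order_relaxed);
        if (tail == m_head.load(std::memory_order_acquire))
            return std::nullopt; // nothing published yet
        T value = m_slots[tail];
        m_tail.store((tail + 1) % Capacity, std::memory_order_release); // free the slot
        return value;
    }

private:
    std::array<T, Capacity> m_slots {};
    std::atomic<std::size_t> m_head { 0 }; // next slot the producer writes
    std::atomic<std::size_t> m_tail { 0 }; // next slot the consumer reads
};
```

Exactly one thread calls tryWrite() and exactly one other calls tryRead(); neither takes a lock, so the writer only stalls when the ring is full. Keeping one slot empty is a common way to distinguish a full ring from an empty one without a shared counter.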
Have separate switches for each
Once we turn them all on, we can eliminate IOKit access from the Web Process altogether
And get the security benefits
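The gating rule can be stated in a few lines. A sketch assuming one boolean per area; the struct and member names are hypothetical and are not WebKit’s actual preference or feature-flag names. Each switch can be flipped independently while work lands, but IOKit access can only be removed once all four are on.

```cpp
#include <cassert>

// Hypothetical per-area switches; not WebKit's actual preference names.
struct GPUProcessSwitches {
    bool canvas2D = false;
    bool domRendering = false;
    bool webGL = false;
    bool media = false;

    // Each area can be remoted independently, but the Web Process can only
    // lose IOKit access once nothing in it still needs to talk to the GPU.
    bool canRemoveIOKitAccessFromWebProcess() const
    {
        return canvas2D && domRendering && webGL && media;
    }
};
```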
Security benefits only hold if the code in the GPU process is robust as well
Suppose we have a compromised WP
Should never be able to crash the GPU process by sending e.g. malicious IPC
Need to be robust, even terminating the Web Content process if we see anything fishy (OOB access, etc.)
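A sketch of the "never crash, terminate instead" rule for one incoming item. The wire format (a uint32 buffer index followed by two floats), DrawPointItem, encodeItem, decodeItem, handleItem, and the Verdict enum are all hypothetical. Every length and index coming from the Web Process is checked before use, and any violation produces a request to terminate the Web Process instead of an out-of-bounds access in the GPU process.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical wire format for one item: [uint32 bufferIndex][float x][float y].
struct DrawPointItem {
    std::uint32_t bufferIndex;
    float x;
    float y;
};

constexpr std::size_t kXOffset = sizeof(std::uint32_t);
constexpr std::size_t kYOffset = kXOffset + sizeof(float);
constexpr std::size_t kItemSize = kYOffset + sizeof(float);

// Web Process side: serialize an item.
std::vector<std::uint8_t> encodeItem(const DrawPointItem& item)
{
    std::vector<std::uint8_t> bytes(kItemSize);
    std::memcpy(bytes.data(), &item.bufferIndex, sizeof item.bufferIndex);
    std::memcpy(bytes.data() + kXOffset, &item.x, sizeof item.x);
    std::memcpy(bytes.data() + kYOffset, &item.y, sizeof item.y);
    return bytes;
}

// GPU process side: decode untrusted bytes, rejecting anything malformed.
std::optional<DrawPointItem> decodeItem(const std::vector<std::uint8_t>& bytes)
{
    if (bytes.size() != kItemSize)
        return std::nullopt; // truncated or padded message
    DrawPointItem item;
    std::memcpy(&item.bufferIndex, bytes.data(), sizeof item.bufferIndex);
    std::memcpy(&item.x, bytes.data() + kXOffset, sizeof item.x);
    std::memcpy(&item.y, bytes.data() + kYOffset, sizeof item.y);
    if (!std::isfinite(item.x) || !std::isfinite(item.y))
        return std::nullopt; // NaN or infinity
    return item;
}

enum class Verdict { Applied, TerminateWebProcess };

// `drawCounts` stands in for GPU-side resources indexed by the Web Process.
Verdict handleItem(const std::vector<std::uint8_t>& bytes, std::vector<int>& drawCounts)
{
    std::optional<DrawPointItem> item = decodeItem(bytes);
    if (!item || item->bufferIndex >= drawCounts.size())
        return Verdict::TerminateWebProcess; // never index out of bounds
    ++drawCounts[item->bufferIndex];
    return Verdict::Applied;
}
```

The design choice is that decoding failures are values, not crashes: the caller decides to kill the Web Process, and the GPU process keeps serving every other client.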
Questions & Comments
Ken R.: Can I ask about synchronization? Have you thought about synchronization for cases like media painting into a canvas?
Wenson: Not fleshed out yet (we can paint the current frame, but need to be able to, e.g., reference the correct time at which the painting was requested)
Simon: There are cases right now where you have to do a synchronous IPC (e.g. readPixels). And canvas-to-canvas painting needs to be sure that you’ve flushed the source before copying
When you know that a canvas is going to paint into another canvas, you push a flush identifier into the source, and you know you have to wait for that flush when the destination asks for it
We know we must not break any existing behavior
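Simon’s flush-identifier scheme can be sketched with a counter and a condition variable. FlushTracker and its methods are hypothetical names; the Web Process and GPU process sides are collapsed into one object in one process here, whereas in the real design the identifier would travel over IPC. The destination waits, with a timeout, until the GPU process reports that the source has been replayed up to that identifier.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

using FlushID = std::uint64_t;

// Hypothetical per-canvas flush tracker.
class FlushTracker {
public:
    // Web Process side: mark a point in the source canvas's command stream.
    FlushID pushFlush()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return ++m_lastPushed;
    }

    // GPU process side: called after replaying everything up to `id`.
    void didProcessFlush(FlushID id)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (id > m_lastProcessed)
                m_lastProcessed = id;
        }
        m_condition.notify_all();
    }

    // Destination side: block until the source has been flushed up to `id`.
    // Returns false if the timeout expires first.
    bool waitForFlush(FlushID id, std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        return m_condition.wait_for(lock, timeout, [&] { return m_lastProcessed >= id; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_condition;
    FlushID m_lastPushed = 0;
    FlushID m_lastProcessed = 0;
};
```

The timeout is a deliberate choice in this sketch: a stalled or misbehaving producer shows up as a failed wait rather than a hang.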
Wenson: In the GPUP, display list playback currently happens on the main thread, which makes synchronization much easier. Once we move things to separate threads, it will be trickier and need more logic
Ken: One real gotcha is Google Maps, which draws a 2D canvas on top of WebGL. The HTML spec guarantees that all work done in a requestAnimationFrame (rAF) callback shows up on screen at the same time. Make sure to give that some thought
Simon: Pretty sure we already aren’t synchronized here. Kimmo has been making changes so that destination buffer management in WebGL will be more similar to how we manage front/back buffers for DOM rendering
Think we’ll end up doing a better job of synchronization with the GPUP
Hopefully that won’t be an issue
(everyone concurs that there probably aren’t WPTs that will catch this)
We’ve been focusing our testing on MotionMark, but it would be great if people have examples of complex, feature-rich 2D canvases that we can look at for performance
Ken: Any update on the progress of remoting WebGL?
Simon: Kimmo is making progress; WebGLLayer is going to turn into one of our PlatformLayer things that doesn’t actually have a CALayer, and front/back buffer management will move down into GraphicsContextGL. Very active development
No attempt yet to optimize sending GL commands (Wenson’s mechanism is entirely for DisplayLists; not sure yet whether we will share it for GL)
Wenson: Performance will be a huge hurdle
Think Blink has a similar setup, but abstracts at the GL layer
There are some advantages to having our DisplayList items as the abstraction (a single item can encapsulate multiple GL commands for normal drawing), but it needs to be powerful enough to handle both
Simon: Performance is going to be very sensitive to the type of content; we’ll be in a good place when all of the content is flowing from the Web Process to the GPU process
(will take a hit for getError, etc.)
(have to stall the world and do a synchronous IPC to the GPUP to get the pixels for readPixels, etc.)
Ken: Blink’s started as GL and then added a more terse representation
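Why getError and readPixels are costly can be shown with a toy proxy. RemoteGLProxy and FakeGPUProcess are hypothetical, commands are plain strings, and a direct call stands in for IPC. Ordinary commands are queued and sent in batches, but a query has to observe the effect of every earlier command, so it flushes the queue and then waits for a reply: one full round trip per call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the GPU process end of the connection.
struct FakeGPUProcess {
    std::vector<std::string> executed;
    unsigned error = 0;
    void execute(const std::string& command)
    {
        executed.push_back(command);
        if (command == "invalid")
            error = 0x0502; // GL_INVALID_OPERATION
    }
};

// Web Process side proxy: async commands are batched; queries are synchronous.
class RemoteGLProxy {
public:
    explicit RemoteGLProxy(FakeGPUProcess& gpu) : m_gpu(gpu) { }

    void enqueue(std::string command) { m_pending.push_back(std::move(command)); }

    void flush()
    {
        for (const std::string& command : m_pending)
            m_gpu.execute(command);
        m_pending.clear();
    }

    // getError must reflect every earlier command, so it flushes the queue and
    // then blocks for the reply: a full round trip on every call.
    unsigned getError()
    {
        flush();
        ++m_roundTrips;
        unsigned error = m_gpu.error;
        m_gpu.error = 0; // GL clears the error flag once it is read
        return error;
    }

    std::size_t roundTrips() const { return m_roundTrips; }
    std::size_t pending() const { return m_pending.size(); }

private:
    FakeGPUProcess& m_gpu;
    std::vector<std::string> m_pending;
    std::size_t m_roundTrips = 0;
};
```

Content that calls getError once per frame pays one round trip per frame; content that calls it after every draw pays one per draw, which is why performance depends so heavily on the type of content.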
Having both certainly adds value
Ken: A big gotcha, since you’re writing into shared memory: it is very important to read that shared memory only once in the GPU process
Wenson: Not sure what you mean; there are no mechanical guards, but we do intend to read it only once
Security is tricky, so the approach is conservative for now: we only need the data right when we’re applying it. Once it’s validated, it’s copied and applied from there
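Ken’s read-once rule is the usual defense against double-fetch (time-of-check to time-of-use) bugs: if the GPU process validates a length in shared memory and then reads it again to use it, a compromised Web Process can change it in between. The sketch below assumes a hypothetical item layout (a uint32 count followed by up to 16 payload bytes) and orders the steps as copy, validate, apply, so that validation only ever sees private memory. That ordering is the sketch’s choice for illustration, not a description of WebKit’s code; Item and readItemOnce are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical item layout in shared memory: [uint32 count][16 payload bytes],
// of which only the first `count` payload bytes are meaningful.
constexpr std::size_t kMaxPayload = 16;
constexpr std::size_t kItemSize = sizeof(std::uint32_t) + kMaxPayload;

struct Item {
    std::uint32_t count = 0;
    std::array<std::uint8_t, kMaxPayload> payload {};
};

// `shared` points at memory the (possibly compromised) Web Process can still
// write to. Validating a field there and then reading it again to use it would
// be a double fetch; instead, every byte is read from shared memory once.
std::optional<Item> readItemOnce(const std::uint8_t* shared, std::size_t sharedSize)
{
    if (sharedSize < kItemSize)
        return std::nullopt;

    // 1. Copy: one read of the shared bytes into private memory.
    std::array<std::uint8_t, kItemSize> snapshot;
    std::memcpy(snapshot.data(), shared, kItemSize);

    // 2. Validate the private copy, never the shared original.
    Item item;
    std::memcpy(&item.count, snapshot.data(), sizeof item.count);
    if (item.count > kMaxPayload)
        return std::nullopt;
    std::memcpy(item.payload.data(), snapshot.data() + sizeof item.count, item.count);

    // 3. The caller applies `item`; later writes to shared memory cannot change it.
    // (Production code must also keep the compiler from re-reading shared memory,
    // which this sketch does not address.)
    return item;
}
```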