Version 7 (modified by mihaip@chromium.org, 4 years ago)

Rebaseline Server

The "Rebaseline Server" is a tool for doing "mass" rebaselines (updating pixel or text baselines for dozens or hundreds of layout tests). It lets you review changes to baselines and update existing ones (and optionally move them, to handle OS transitions). It is called a server because it runs a local HTTP server, which allows a cross-platform GUI for reviewing that is backed by Python code performing the SCM (SVN/Git) operations on the filesystem.

To use the tool, first run new-run-webkit-tests, then run: webkit-patch rebaseline-server WebKitBuild/{Debug|Release}/layout-test-results. Once the GUI launches at http://localhost:8127, you can inspect results and add the tests that you wish to rebaseline to the queue. Once you have enqueued enough tests, you can process the queue, which performs the necessary Git/SVN operations. You can then submit those changes as usual with webkit-patch.

See the UI walkthrough and sample usages below for more details.

UI Walkthrough

1. Test selectors

These menus let you choose which test(s) you want to rebaseline. The first menu breaks down tests by failure type. Within that failure type, the second menu filters by directory. The third menu lists individual tests. You can use the « and » buttons to move through the tests in the chosen directory (there are also keyboard shortcuts). The View test link shows the test's source on trac.webkit.org.

2. Image results

Shows expected/actual/diff of the image results (if the test failed because of image diffs). The diff displays pixels that are the same at 25% opacity, and pixels that differ in red. You can click on any image to use the loupe to magnify that area. In the case of matching images but mismatching checksums, the diff area displays the checksums.
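
The diff rendering described above can be sketched roughly as follows. This is only an illustration in Python, not the tool's actual code (the GUI does its rendering in the browser). The function name render_diff and the representation of an image as rows of (r, g, b, a) tuples are assumptions made for the example.

```python
# Hypothetical sketch: an image is a list of rows, each row a list of
# (r, g, b, a) tuples with channel values in 0..255.

RED = (255, 0, 0, 255)


def render_diff(expected, actual):
    """Return a diff image: matching pixels are faded to 25% opacity,
    differing pixels are painted solid red."""
    if len(expected) != len(actual) or any(
        len(e_row) != len(a_row) for e_row, a_row in zip(expected, actual)
    ):
        raise ValueError("expected and actual images must have the same size")

    diff = []
    for e_row, a_row in zip(expected, actual):
        row = []
        for e_px, a_px in zip(e_row, a_row):
            if e_px == a_px:
                r, g, b, a = e_px
                # Keep the color, scale the alpha channel down to 25%.
                row.append((r, g, b, a * 25 // 100))
            else:
                row.append(RED)
        diff.append(row)
    return diff
```

Raising an error on mismatched sizes is a simplification for the sketch; how the real tool handles images of different dimensions is not described here.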

3. Text results

Similarly to image results, this area displays expected/actual/diffs of text results if the test failed because of text differences.

4. Loupe

The loupe is triggered by clicking in any of the image outputs. It shows an enlarged 20x20 pixel area around the click point for each of the expected, actual and diff images. The color values at the click point are also displayed. You can also click within the enlarged pixels in the loupe to re-center it on that pixel.
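
As a rough illustration of the loupe idea (the real implementation draws into a <canvas> in the browser), the following Python sketch extracts a 20x20 window around a click point and enlarges it with nearest-neighbor scaling. The function names, the image representation as rows of pixel values, and the choice to shift the window so it stays inside the image near the edges are all assumptions for the example.

```python
def loupe_region(image, cx, cy, size=20):
    """Return the size x size window of `image` around the click point (cx, cy).

    `image` is a list of rows of pixel values. Near the edges the window is
    shifted so that it stays inside the image (an assumption of this sketch).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    left = max(0, min(cx - size // 2, width - size))
    top = max(0, min(cy - size // 2, height - size))
    return [row[left:left + size] for row in image[top:top + size]]


def enlarge(region, factor):
    """Scale `region` up by an integer factor: each source pixel becomes a
    factor x factor block (nearest-neighbor enlargement)."""
    enlarged = []
    for row in region:
        wide_row = [px for px in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged
```

Re-centering the loupe on an enlarged pixel then amounts to mapping that pixel back to its source coordinates and calling loupe_region again with the new click point.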

5. Test state

A test is in one of four states: it needs rebaselining, it is in the rebaseline queue, its rebaseline attempt succeeded, or its rebaseline attempt failed. The current state is shown in the lower left part of the footer. The list of tests in the test selector is also grouped by test state.

6. Current baselines

The currently checked-in baselines (for all platforms) are listed in the footer as well. The baselines that were actually used for the test are shown in brighter blue (and in bold).

7. Baseline target (and what to do with existing baselines)

The first platform drop-down lets you choose which platform you want to update baselines for (e.g. mac or chromium-linux). The second one lets you choose what to do with existing baselines (if any) for that platform. This is useful when updating baselines because of OS changes, in which case existing baselines need to be moved to a more version-specific directory (e.g. once Snow Leopard was released, the mac directory should contain Snow Leopard baselines, and the existing ones were moved to mac-leopard).
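
The "move, then replace" behavior can be sketched as below. This is a simplified illustration that uses plain filesystem operations, whereas the real tool performs the equivalent operations through its SCM wrapper so that moves and additions are recorded in Git/SVN. The function name update_baseline, its parameters, and the directory layout (one directory per platform under a common root) are assumptions for the example.

```python
import shutil
from pathlib import Path


def update_baseline(platform_root, target, relative_name, new_result, move_to=None):
    """Install `new_result` as the baseline `relative_name` for platform `target`.

    If `move_to` names another platform directory, an existing baseline is
    moved there first; otherwise the existing baseline is simply replaced.
    """
    platform_root = Path(platform_root)
    destination = platform_root / target / relative_name

    if move_to is not None and destination.exists():
        archived = platform_root / move_to / relative_name
        archived.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(destination), str(archived))

    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(str(new_result), str(destination))
    return destination
```

For the Snow Leopard example, target would be "mac" and move_to would be "mac-leopard"; leaving move_to unset corresponds to replacing the baseline in place.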

8. Queue

Rebaselining is not done immediately (since the SCM operations can take a while). Instead, tests to be rebaselined are added to a queue (using the big button in the lower right). The queue can be shown and managed by using the "Queue" link next to it. To actually rebaseline the queued tests, use the button under it.
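
The relationship between the queue and the per-test states can be modeled with a small sketch like the one below. It is written in Python for illustration only (the real queue lives in the GUI), and the class, method, and state names are hypothetical.

```python
from enum import Enum


class State(Enum):
    NEEDS_REBASELINE = "needs rebaseline"
    IN_QUEUE = "in queue"
    SUCCEEDED = "rebaseline succeeded"
    FAILED = "rebaseline failed"


class RebaselineQueue:
    """Tracks which tests are queued and the state of every known test."""

    def __init__(self, tests):
        self.states = {test: State.NEEDS_REBASELINE for test in tests}
        self.queue = []

    def add(self, test):
        if test not in self.queue:
            self.queue.append(test)
            self.states[test] = State.IN_QUEUE

    def remove(self, test):
        if test in self.queue:
            self.queue.remove(test)
            self.states[test] = State.NEEDS_REBASELINE

    def process(self, rebaseline):
        """Run `rebaseline(test)` for each queued test, in order.

        `rebaseline` is a callable that returns a truthy value on success;
        an exception is treated as a failure.
        """
        pending, self.queue = self.queue, []
        for test in pending:
            try:
                succeeded = bool(rebaseline(test))
            except Exception:
                succeeded = False
            self.states[test] = State.SUCCEEDED if succeeded else State.FAILED
```

Deferring the slow work to process() is what lets the reviewer move quickly through many tests before any SCM operation runs.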

The "Log" link in the upper right lets you see log output from the server during a rebaseline. This may be helpful in understanding what operations are run as part of a rebaseline. Next to it is an "Exit" link that shuts down the local HTTP server.

Keyboard shortcuts

The following keyboard shortcuts are supported:

  • left and right arrow: move through tests
  • q: add the current test to the rebaseline queue
  • x: remove the current test from the rebaseline queue
  • r: rebaseline all the tests in the queue

Sample Usages

Leopard to Snow Leopard Transition

Snow Leopard has a different text anti-aliasing algorithm (and other small rendering tweaks), which requires many pixel baselines to be updated. The flow for doing this is:

  1. (On a Snow Leopard machine) run new-run-webkit-tests --tolerance 0 --pixel-tests [path/to/tests] (or omit the path to run all tests)
  2. Observe that many tests failed with image differences
  3. Run webkit-patch rebaseline-server WebKitBuild/{Debug|Release}/layout-test-results (or wherever your layout test results are located)
  4. http://localhost:8127/ should be launched in your browser; navigate to it if not
  5. From the footer of the UI, set the baseline target to mac (since we want the default mac port baselines to reflect the newest shipping OS)
  6. Similarly, set the Move current baselines to: dropdown to mac-leopard, so that the existing baselines are still used for bots that are on Leopard
  7. Refer to the UI walkthrough for more details on how to use the GUI
  8. Commit the resulting patch with webkit-patch as usual

If working on the Chromium port, the above steps can be used there too, except the baseline target is chromium-mac and existing baselines should be moved to chromium-mac-leopard.

Skia changes

Skia is the graphics library used by the Chromium port on Windows and Linux; changes to it often require updates to pixel baselines.

  1. (On a Windows or Linux machine) run new-run-webkit-tests --tolerance 0 --pixel-tests [path/to/tests] (or omit the path to run all tests)
  2. Observe that many tests failed with image differences
  3. Run webkit-patch rebaseline-server WebKitBuild/{Debug|Release}/layout-test-results (or wherever your layout test results are located)
  4. http://localhost:8127/ should be launched in your browser; navigate to it if not
  5. From the footer of the UI, set the baseline target to chromium-win or chromium-linux as appropriate.
  6. Leave the Move current baselines to: dropdown set to Nowhere (replace)
  7. Refer to the UI walkthrough for more details on how to use the GUI
  8. Commit the resulting patch with webkit-patch as usual

Note that if you'd like to rebaseline both Windows and Linux in the same patch, you'll need to run the tests on both platforms by hand and copy the results file to the machine that's running the rebaseline server (see the caveats section).

Code location and design

The Python server lives at WebKitTools/Scripts/webkitpy/tool/commands/rebaselineserver.py; the GUI is at WebKitTools/Scripts/webkitpy/tool/commands/data/rebaselineserver/.

The Python server parses the unexpected_results.json file generated by new-run-webkit-tests. It then starts an HTTP server. That server can return the list of failing tests and the expected/actual output for a test (read from the test results directory), and can handle commands to rebaseline tests. The latter is done through the scm.py wrapper, which makes the server agnostic to Git vs. SVN checkouts.
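
The overall shape of such a server (parse a results file, then serve its contents over HTTP) can be sketched with Python's standard library as below. This is not the code in rebaselineserver.py. The /results.json endpoint name and the helper functions make_handler and make_server are hypothetical, and the sketch makes no assumption about the actual structure of unexpected_results.json beyond it being valid JSON.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(results):
    """Build a request handler class that serves `results` as JSON."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/results.json":  # hypothetical endpoint name
                body = json.dumps(results).encode("utf-8")
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

        def log_message(self, format, *args):
            pass  # keep the example quiet

    return Handler


def make_server(results_text, port=8127):
    """Parse the JSON text of a results file and return a server for it.

    Call serve_forever() on the returned object to start handling requests.
    """
    results = json.loads(results_text)
    return HTTPServer(("localhost", port), make_handler(results))
```

A real server would add further routes for per-test expected/actual output and for the rebaseline commands; they are omitted here to keep the sketch short.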

The UI populates various <select>s based on the test output, grouping results by failure type, directory and test. The "loupe" functionality is accomplished by rendering pixel tests into a <canvas> and then drawing the enlarged pixels into another <canvas> (see loupe.js). The queue is maintained as a <select> as well (see queue.js).
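
The three-level grouping behind the test selectors (failure type, then directory, then test) can be illustrated with the following sketch. It is written in Python rather than the GUI's JavaScript. The input shape (a mapping from test path to failure type) and the failure type names used in the usage example are assumptions.

```python
import posixpath
from collections import defaultdict


def group_results(failures):
    """Group tests as {failure_type: {directory: [test file names]}}.

    `failures` maps a test path such as "fast/css/a.html" to a failure
    type string. Keys and test names are returned in sorted order.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for test, failure_type in failures.items():
        directory, name = posixpath.split(test)
        grouped[failure_type][directory].append(name)
    return {
        failure_type: {d: sorted(names) for d, names in sorted(dirs.items())}
        for failure_type, dirs in sorted(grouped.items())
    }
```

For example, given {"fast/css/a.html": "IMAGE", "fast/dom/c.html": "TEXT"}, the first menu would list IMAGE and TEXT, the second the directories under the chosen type, and the third the tests in the chosen directory.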

Caveats/Limitations

  • new-run-webkit-tests must be used (the tool currently relies on the JSON output that only NRWT produces)
  • The tool can currently only use local test results; it cannot pull from bots
  • The UI works best on a 30 inch monitor
