Triaging Test Failures

Introduction

The build bots are most useful when they are "green" (i.e., the build isn't broken and there are no unexpected test failures). When the bots are green and a regression is introduced, the bots go from green to red, which is very easy to notice and respond to. When the bots are red and a regression is introduced, they stay red, so it's hard to notice the regression occurred at all.

This guide walks you through getting red bots green again: triaging test failures, filing bugs on them, and then either checking in new results or skipping tests.

Find out what is failing

There are two main ways to do this:

  • Browse build.webkit.org (you should probably start with this one)
    1. Go to http://build.webkit.org/builders.
    2. Click on the name of a builder you're interested in to see a summary of its recent builds.
      • The Info column will tell you if any tests failed for that build.
      • To see the test output for a particular build, click the link in the Build # column, then "view results", then results.html.
  • Use webkit-patch
    1. Run this command:
      webkit-patch failure-reason
      
    2. When prompted, specify which builder you're interested in.
    3. Press Enter to continue. webkit-patch will look back through the recent builds for that builder until it has found when all current failures were introduced.
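The look-back described in step 3 can be modeled roughly as follows. This is a minimal sketch, not webkit-patch's actual implementation: it assumes the build history is already available as a newest-first list of (build number, set of failing test paths) pairs, and it ignores flaky tests and builds that produced no results. The helper name find_first_failing_builds is hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed input shape: (build number, set of failing test paths), newest first.
BuildHistory = List[Tuple[int, Set[str]]]


def find_first_failing_builds(history: BuildHistory) -> Dict[str, Optional[int]]:
    """Map each test failing in the newest build to the build where it began failing.

    None means the test was failing in every build in `history`, so the
    history is too short to say when the failure was introduced.
    """
    if not history:
        return {}
    newest_build, current_failures = history[0]
    first_failing: Dict[str, Optional[int]] = {}
    for test in current_failures:
        earliest = newest_build
        saw_pass = False
        # Walk back through older builds until the test stops failing.
        for build_number, failures in history[1:]:
            if test not in failures:
                saw_pass = True
                break
            earliest = build_number
        first_failing[test] = earliest if saw_pass else None
    return first_failing
```

Under this model, the regression lies somewhere in the revisions between the returned build and the passing build just before it.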

Find out whether the failures are known

Bugs that are known to be making the bots red are marked with the MakingBotsRed keyword. Look through the list of bugs with that keyword to see whether the failures are already known. If they are, you might be able to skip ahead to the "Get the bots green again" section.
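Once you have the list of MakingBotsRed bugs, separating known failures from new ones is a simple lookup. The sketch below assumes you have collected, by hand, a mapping from each bug URL to the tests that bug covers; the function name split_known_failures is hypothetical and does not query Bugzilla.

```python
from typing import Dict, Iterable, List, Set, Tuple


def split_known_failures(
    failing_tests: Iterable[str], known_bugs: Dict[str, Set[str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split failures into ones covered by a known bug and ones that are new.

    `known_bugs` maps a bug URL to the tests that bug says are failing
    (assumed to be gathered by hand from the MakingBotsRed bug list).
    """
    known: Dict[str, List[str]] = {}
    unknown: List[str] = []
    for test in sorted(set(failing_tests)):
        bugs = [url for url, tests in known_bugs.items() if test in tests]
        if bugs:
            known[test] = sorted(bugs)
        else:
            unknown.append(test)
    return known, unknown
```

Only the tests in the second returned list need the triage steps that follow.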

Find out when each test started failing

You can either:

Try to figure out why each test is failing

(You probably won't be able to figure out exactly why every test is failing, but the more information you can get now, the better.)

Look at the revision range where the failure was introduced. If you find that:

  • The test and/or its expected output was modified
    • The test might need new results for the failing platform(s).
    • Are the test's results platform-specific (i.e., are they beneath LayoutTests/platform/)?
      • Yes: the failing platforms might just need new results checked in. You'll have to verify that the current output from those platforms is correct.
      • No: the failing platforms might have some missing functionality in WebKit or DumpRenderTree.
  • Related areas of WebKit were modified
    • Were the modifications platform-specific?
      • Yes: the failing platforms might need similar modifications made.
      • No: there might be some existing platform-specific code that is responsible for the different results.
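The first questions in the checklist above come down to sorting the files changed in the revision range. This is a minimal sketch assuming expected results follow the "<test name>-expected.<ext>" naming (e.g. some-cool-test-expected.txt) and that platform-specific results live under LayoutTests/platform/<platform>/; the helper classify_changes is hypothetical. Files that land in "other" still need a human to decide whether they are related, platform-specific WebKit changes.

```python
from typing import Dict, List

LAYOUT_TESTS = "LayoutTests/"


def classify_changes(test: str, changed_files: List[str]) -> Dict[str, List[str]]:
    """Bucket files changed in a revision range relative to one failing test.

    `test` is relative to LayoutTests/, e.g. "fast/js/some-cool-test.html".
    """
    stem = test.rsplit(".", 1)[0]  # drop the extension
    buckets: Dict[str, List[str]] = {
        "test": [],              # the test file itself
        "shared_results": [],    # results used by every platform
        "platform_results": [],  # results beneath LayoutTests/platform/
        "other": [],             # e.g. WebKit or DumpRenderTree sources
    }
    for path in changed_files:
        if not path.startswith(LAYOUT_TESTS):
            buckets["other"].append(path)
            continue
        relative = path[len(LAYOUT_TESTS):]
        if relative == test:
            buckets["test"].append(path)
        elif relative.startswith(stem + "-expected."):
            buckets["shared_results"].append(path)
        elif relative.startswith("platform/"):
            # Expected shape: platform/<name>/<test stem>-expected.<ext>
            parts = relative.split("/", 2)
            if len(parts) == 3 and parts[2].startswith(stem + "-expected."):
                buckets["platform_results"].append(path)
            else:
                buckets["other"].append(path)
        else:
            buckets["other"].append(path)
    return buckets
```

A non-empty "platform_results" bucket points toward the "Yes" branch (new results may just need checking in); changes only in "test" or "shared_results" point toward the "No" branch.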

File bugs for the failures

If multiple tests are failing for the same reason, group them into a single bug, even if you aren't 100% certain the reason is the same. If you're wrong, whoever investigates the failures later will figure this out and file new bugs as needed.

If a test fails on multiple platforms and those platforms will need separate fixes, you should file one bug for each failing platform.

  1. Go to http://webkit.org/new-bug
  2. Include in your report:
  3. Apply keywords
    • MakingBotsRed
    • LayoutTestFailure
    • Regression, if the failure is due to a regression in WebKit
    • PlatformOnly, if the test only fails on one platform
  4. CC appropriate people
    • Experts in the relevant area of WebKit
    • The author of the test
    • The author and reviewer of the change that caused the regression, if known
  5. If the failure affects one of Apple's ports, and you work for Apple, you should migrate the bug into Radar.
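The grouping and keyword rules above can be summarized in a few lines. This sketch assumes you have already decided which tests fail for the same reason, on which platforms, whether it is a regression, and whether the platforms need separate fixes; BugPlan and plan_bugs are hypothetical names, and nothing here talks to Bugzilla.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class BugPlan:
    """Hypothetical description of one bug to file."""
    tests: List[str]
    platforms: List[str]
    keywords: List[str]


def plan_bugs(
    tests: Iterable[str],
    failing_platforms: Set[str],
    is_regression: bool,
    needs_separate_fixes: bool,
) -> List[BugPlan]:
    """Turn one group of tests (probably) failing for the same reason into bugs."""
    keywords = ["MakingBotsRed", "LayoutTestFailure"]
    if is_regression:
        keywords.append("Regression")
    if len(failing_platforms) == 1:
        keywords.append("PlatformOnly")
    grouped = sorted(tests)
    platforms = sorted(failing_platforms)
    if needs_separate_fixes and len(platforms) > 1:
        # One bug per failing platform; keywords still describe the whole failure.
        return [BugPlan(list(grouped), [p], list(keywords)) for p in platforms]
    return [BugPlan(grouped, platforms, keywords)]
```

Note that splitting per platform does not add PlatformOnly, because the test itself still fails on more than one platform.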

Get the bots green again

  • If you know why the test is failing, and know how to fix it (e.g., by making a change to WebKit or checking in new correct results), then fix it and close the bug!
  • Otherwise, do one of the following and note what you did in the bug you filed, then remove the MakingBotsRed keyword.
    • If the test fails every time and the test's output is the same every time, check in new results for the test and include the bug URL in your ChangeLog.
      • You should do this even if the test's output includes failure messages or incorrect rendering. By running the test against these "expected failure" results rather than skipping the test entirely, we can discover when new regressions are introduced. (See more discussion of this policy here.)
    • If the test fails intermittently, or crashes, or hangs, add the test to the appropriate Skipped files (e.g., LayoutTests/platform/mac-leopard/Skipped). Include a comment in the Skipped file with the bug URL and a brief description of how it fails, e.g.:
      # Sometimes times out http://webkit.org/b/12345
      fast/js/some-cool-test.html
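The choice between checking in new results and skipping, plus the Skipped-file comment format shown above, can be sketched as follows. The run statuses ("pass", "fail", "crash", "timeout") and the helper names are assumptions for illustration, not part of run-webkit-tests. Treating a test that fails every time with varying output as intermittent is also an assumption, since such a test has no stable result to check in.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """One run of a test; the status names are assumptions for this sketch."""
    status: str  # "pass", "fail", "crash" or "timeout"
    output: str = ""


def choose_action(runs: List[Run]) -> str:
    """Return "new-results", "skip" or "no-action" following the rules above."""
    if not runs:
        raise ValueError("need at least one run")
    statuses = {run.status for run in runs}
    if statuses == {"pass"}:
        return "no-action"  # the test is not failing
    if statuses & {"crash", "timeout"}:
        return "skip"  # crashes or hangs
    if "pass" in statuses:
        return "skip"  # fails intermittently
    if len({run.output for run in runs}) == 1:
        return "new-results"  # fails every time with identical output
    return "skip"  # assumption: varying output is treated as intermittent


def skipped_entry(test: str, bug_url: str, description: str) -> str:
    """Format a Skipped-file entry preceded by a comment naming the bug."""
    return f"# {description} {bug_url}\n{test}\n"
```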
      

Watch the bots to make sure it worked

Once you're finished addressing test failures, you should watch the bots to make sure what you did worked. Common problems include:

  • Misspelling a test name in the Skipped file
    • You'll see output like:
      Skipped list contained 'editing/spelling/spelling-contenteditable.html', but no file of that name could be found
      
  • New tests start failing after you skipped an earlier test
    • This can mean that a test earlier than the one you skipped was the actual cause of the trouble (e.g., an earlier test causing a later test to hang). Track down the test that is the root cause and skip it instead of the test(s) it affects.
  • Putting new results in the wrong directory
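You can catch a misspelled Skipped entry before committing by checking that every entry names an existing file or directory. This is a minimal sketch, assuming comments in a Skipped file occupy whole lines starting with "#" and that entries are paths relative to the LayoutTests directory; missing_skipped_entries is a hypothetical helper, not part of the WebKit tools.

```python
import os
from typing import List


def missing_skipped_entries(skipped_path: str, layout_tests_dir: str) -> List[str]:
    """Return Skipped-file entries that name no existing test file or directory."""
    missing: List[str] = []
    with open(skipped_path, encoding="utf-8") as f:
        for line in f:
            entry = line.strip()
            if not entry or entry.startswith("#"):
                continue  # blank line or whole-line comment
            if not os.path.exists(os.path.join(layout_tests_dir, entry)):
                missing.append(entry)
    return missing
```

For example, calling it with LayoutTests/platform/mac-leopard/Skipped and your LayoutTests checkout lists any entries that would trigger the "no file of that name could be found" message shown above.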
Last modified on Mar 1, 2011, 2:47:18 PM