
Keeping the bots green

Commits that break the build
Commits that break tests
Flaky tests
Script bugs
Flaky machines, OS bugs
- Kernel panics, overloaded system components
- Limited disk space on build.webkit.org (currently being addressed)

Build/test cycle is too damn long (4- or 5-hour delay to see if you broke anything)
Tester bots get behind
Testers don't test the latest build
Waterfall page doesn’t show enough info
- Need a lot of usability improvements to make it easier for developers to use the waterfall/console to determine if they broke things
If the bot is already red, it is hard to tell if your change made it worse
- Perhaps we should add (back?) colors for the degree of breakage (e.g. orange means only a few tests are failing, red means lots of tests are failing)
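The color idea above could be sketched roughly as follows. This is only an illustration: the function name `breakage_color` and the 1% cutoff between "a few" and "lots" are assumptions, not anything the waterfall implements today.

```python
def breakage_color(failing, total, orange_threshold=0.01):
    """Map one test run to a status color by degree of breakage.

    failing / total are test counts for the run. orange_threshold is a
    hypothetical cutoff: at or below this fraction of failures the run
    is "orange" (only a few tests failing), above it "red".
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if failing == 0:
        return "green"
    if failing / total <= orange_threshold:
        return "orange"
    return "red"
```

With a scheme like this, a bot that goes from orange to red signals that a change made an already-broken bot worse.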

Apple’s
- 2 people watching the bots concurrently every week (just during regular work hours)
- Builds hover close to green, but still go red multiple times a day
Chromium’s
- 3 sets of bots
  - chromium.org - ToT Chromium with a fixed WebKit
    - Very strict green policy, aggressively rolling out changes that make the bots red
    - Mostly green with spots of red
  - build.webkit.org - ToT WebKit with a fixed version of Chromium
    - Ad hoc maintenance
    - Red most of the time
  - Canaries - ToT Chromium with ToT WebKit
    - Rotating shifts of 1 person for 4 days at a time
    - Around the clock coverage
    - Most of the time spent updating baselines because the Chromium bots run the pixel tests
    - Also red most of the time
    - This is what the gardeners look at first
Qt has Ossy
- Leading a group of 6 people
- They have found that it requires experience to identify which checkins caused which failures
GTK
- Makes use of TestFailures + build.webkit.org/waterfall
- For a while they had one person per day gardening, but right now it is up to individual contributors

Ideally we should all use the same status page to determine if something broke
Darin's ideal tool
- Identifies when something is broken that wasn't broken before
- Presents the suspect set of revisions
- Once the gardener determines which revision is most likely the cause, provides a quick way to notify the relevant people
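A minimal sketch of the first two points, assuming each build is reduced to the last revision it included plus its set of failing tests. The function name `find_regressions` and the data shape are hypothetical and do not correspond to any existing tool.

```python
def find_regressions(builds):
    """Find newly broken tests and the revisions that could have broken them.

    builds: list of (revision, failing_tests) tuples, oldest first, where
    revision is the last revision included in that build (an integer) and
    failing_tests is a set of test names.

    Returns {test: (first_suspect, last_suspect)} for every test that fails
    in the latest build but passed in some earlier build. Tests that fail
    in every known build are skipped: they were already broken.
    """
    regressions = {}
    if not builds:
        return regressions
    latest_revision, latest_failures = builds[-1]
    for test in latest_failures:
        first_bad = latest_revision
        last_good = None
        # Walk backwards until we find a build where the test passed.
        for revision, failures in reversed(builds[:-1]):
            if test in failures:
                first_bad = revision
            else:
                last_good = revision
                break
        if last_good is not None:
            regressions[test] = (last_good + 1, first_bad)
    return regressions
```

The returned range is the suspect set a gardener would narrow down by hand. The sketch treats the most recent passing run as "good", so a flaky test can produce a misleadingly narrow range.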
Garden-O-Matic
- Built on top of code in webkitpy in the WebKit tree, designed to be used by any port
  - Currently only works for the Chromium port - needs adjustment for different result formats and URLs used by different ports
  - FIXME: bugs for these?
- Client-side tool, runs a local webserver
- Lets you browse failing tests and apply a rebaseline to your local tree with one click
We should sit down and merge changes made to buildbot for build.chromium.org and build.webkit.org
- Full Buildbot is not checked into WebKit, only some configuration files
  - For build.webkit.org, changing config files automatically restarts master
  - The Chromium infrastructure team tracks Buildbot ToT pretty closely
- For build.webkit.org, you can go back 25, 50, 100, etc. on the waterfall views
- FIXME: Can we get the improvements that Ojan made to the console view for build.chromium.org added for build.webkit.org?
Qt is working on a tool to run just the tests relevant to your change, based on code coverage
- Still doesn't deal with flaky tests well
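The general idea of coverage-based test selection can be sketched as below. This is not Qt's tool: the function name `select_tests` and the shape of the coverage map are assumptions made for illustration.

```python
def select_tests(changed_files, coverage):
    """Pick the tests whose recorded coverage touches any changed file.

    changed_files: iterable of source paths modified by the change.
    coverage: dict mapping test name -> set of source paths the test
              executed in a previous instrumented run (hypothetical data).
    """
    changed = set(changed_files)
    return sorted(test for test, files in coverage.items() if files & changed)
```

A selection like this is only as good as the recorded coverage, and it does nothing about tests that fail intermittently regardless of the change.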

More EWS bots?
- FIXME: Get the Mac EWS bots running the tests
Easier way to identify a commit that caused a test failure
Automatic notifications of build/test failure
- The issue here is that when there were infrastructure problems, everyone got a notification
- FIXME: We should bring back the automatic notifications from sheriffbot

Chromium does some distributed compilation, speeds up builds quite a lot
Chromium sometimes splits up tests - runs half on one machine and half on another
- dpranke looked at doing a master / gather, but it might not be worth the effort needed to keep the bots in sync
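Splitting a test list across machines can be as simple as the sketch below. The function name `shard_tests` is hypothetical, and this is not how the Chromium bots are actually configured; it only illustrates a deterministic split that every machine can compute independently.

```python
def shard_tests(tests, shard_count):
    """Split tests into shard_count lists of near-equal size.

    The list is sorted first so that every machine computes the same
    split from the same input, without talking to a coordinator.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    ordered = sorted(tests)
    return [ordered[i::shard_count] for i in range(shard_count)]
```

Machine `i` would run `shard_tests(all_tests, n)[i]`. Gathering the results back into one report is the part that needs the bots to stay in sync.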

Would need to be actively enforced by tools
Chromium has this policy today, but:
- there aren't that many flaky tests (most already marked as flaky in TestExpectations files)
- they have been able to isolate machine-related flakiness
- they have tryjobs, unit tests, all required to be run before checking in
Probably not feasible yet for WebKit anyway due to tooling
- Need a way to grab new baselines for all ports
- Need EWS support for tests for all ports

Lots of tests don't work well in parallel due to reliance on system components that get overloaded
- dpranke may have something for this soon
State left over from tests that ran before
- e.g. setTimeout that gets applied to the next test
- Seems like we could improve our tools for that
Memory corruption

dpranke is working on having NRWT restart DRT on directory boundaries, found that it does reduce flakiness
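Restarting on directory boundaries amounts to batching the test list by directory and giving each batch a fresh process. The sketch below shows only the batching step; it is not NRWT's code, and the function name `batches_by_directory` is an assumption.

```python
import itertools
import posixpath


def batches_by_directory(tests):
    """Group test paths into one batch per directory.

    A runner could start a fresh DRT process for each batch, so state
    leaked by one directory's tests cannot affect the next directory.
    """
    ordered = sorted(tests, key=lambda t: (posixpath.dirname(t), t))
    return [
        list(group)
        for _, group in itertools.groupby(ordered, key=posixpath.dirname)
    ]
```

Leaks between tests in the same directory are still possible with this scheme; it only limits how far they can spread.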

More emphasis on running tests before committing
Easier to generate platform-specific test results
Standard process
- If a test is failing, there should be one standard way of dealing with it
- We don't currently have a good understanding of what is the best practice (do you roll it out? skip? land failing results? how long do you wait after notifying the committer?)

Last modified on Jun 3, 2012, 9:16:45 PM
