Keeping the bots green
----------------------

Things that cause bots to go red
Commits that break the build
Commits that break tests
Flaky tests
Script bugs
Flaky machines, OS bugs
Kernel panics, overloaded system components
Limited disk space on build.webkit.org (currently being addressed)


Improve build state feedback for committers
The build/test cycle is too damn long (a 4- or 5-hour delay before you see whether you broke anything)
Tester bots get behind
Testers don't test the latest build
The waterfall page doesn't show enough info
The waterfall/console needs a lot of usability improvements to make it easier for developers to determine whether they broke things
If the bot is already red, it is hard to tell whether your change made it worse
Perhaps we should add (back?) colors for the degree of breakage (e.g. orange means only a few tests are failing, red means lots of tests are failing)
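The degree-of-breakage coloring could be as simple as a threshold on the failing-test count. A minimal sketch in Python; the thresholds and function name are invented for illustration, not part of any existing waterfall code:

```python
# Hypothetical sketch: map the number of failing tests on a builder to a
# waterfall cell color, so "a few failures" (orange) is distinguishable
# from "lots of failures" (red). The threshold value is made up.

def status_color(failing_tests, few_threshold=5):
    """Return a status color for a builder's latest test run."""
    if failing_tests == 0:
        return "green"
    if failing_tests <= few_threshold:
        return "orange"   # only a few tests failing
    return "red"          # lots of tests failing
```

A gardener scanning the waterfall could then triage red cells first and treat orange ones as likely single-patch breakage.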


Gardening approaches
Apple's
2 people watching the bots concurrently each week (just during regular work hours)
Builds hover close to green, but still go red multiple times a day
Chromium's
3 sets of bots
chromium.org - ToT Chromium with a fixed WebKit
Very strict green policy, aggressively reverting changes that make the bots red
Mostly green with spots of red
build.webkit.org - ToT WebKit with a fixed version of Chromium
Ad hoc maintenance
Red most of the time
Canaries - ToT Chromium with ToT WebKit
Rotating shifts of 1 person for 4 days at a time
Around-the-clock coverage
Most of the time is spent updating baselines, because the Chromium bots run the pixel tests
Also red most of the time
This is what the gardeners look at first
Qt has Ossy
Leading a group of 6 people
They have found that it requires experience to identify which checkins caused which failures
GTK
Makes use of TestFailures + build.webkit.org/waterfall
For a while they had one person gardening per day, but right now it is up to individual contributors


Can we share more tools?
Ideally we should all use the same status page to determine if something broke
Darin's ideal tool
Identifies when something is broken that wasn't broken before
Presents the suspect set of revisions
Once the gardener determines which revision is most likely the cause, provides a quick way to notify the relevant people
Garden-O-Matic
Built on top of webkitpy code in the WebKit tree, designed to be usable by any port
Currently only works for the Chromium port - needs adjustment for the different result formats and URLs used by different ports
FIXME: bugs for these?
Client-side tool, runs a local webserver
Lets you browse failing tests and apply a rebaseline to your local tree with one click
We should sit down and merge the changes made to Buildbot for build.chromium.org and build.webkit.org
Full Buildbot is not checked into WebKit, only some configuration files
For build.webkit.org, changing config files automatically restarts the master
The Chromium infrastructure team tracks Buildbot ToT pretty closely
For build.webkit.org, you can go back 25, 50, 100, etc. builds on the waterfall views
FIXME: Can we get the improvements that Ojan made to the console view for build.chromium.org added for build.webkit.org?
Qt is working on a tool that runs just the tests relevant to your change, based on code coverage
Still doesn't deal well with flaky tests


Infrastructure/waterfall improvements
More EWS bots?
FIXME: Get the Mac EWS bots running the tests
Easier way to identify the commit that caused a test failure
Automatic notifications of build/test failures
The issue here is that when there are infrastructure problems, everything generates a notification
FIXME: We should bring back the automatic notifications from sheriffbot


Distributing builds / testing (e.g. distcc)
Chromium does some distributed compilation, which speeds up builds quite a lot
Chromium sometimes splits up tests - runs half on one machine and half on another
dpranke looked at doing a master / gather, but it might not be worth the effort needed to keep the bots in sync
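Splitting a suite across machines, as Chromium does, amounts to partitioning the test list into shards. A minimal sketch; the function name and round-robin policy are illustrative, not how Chromium's infrastructure actually assigns tests:

```python
# Hypothetical sketch of sharding a test list across N machines.
# Round-robin assignment keeps shard sizes within one test of each other.

def shard_tests(tests, num_shards):
    """Partition tests into num_shards lists, round-robin."""
    shards = [[] for _ in range(num_shards)]
    for i, test in enumerate(tests):
        shards[i % num_shards].append(test)
    return shards
```

The sync cost mentioned above comes from the other half of the problem: every shard's machine must be on the same build, and the results must be gathered back into one report.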


Pixel tests
Could try to convert more of them to ref tests
Garden-O-Matic will help with this
Are pixel tests worth it?
Every time Apple has tried to start running the pixel tests, it has been a struggle to maintain them
Neither Chromium nor Apple has run a cost/benefit analysis of them
One gardener said that about 60% of the pixel test failures he saw while gardening pointed to real bugs
For a lot of the tests, it is difficult for the gardener to tell whether a change is actually a regression
This results in a lot of skipping

Discussion of the No Commits Until It Is Green policy
Would need to be actively enforced by tools
Chromium has this policy today, but
there aren't that many flaky tests (most are already marked as flaky in test_expectations.txt)
they have been able to isolate machine-related flakiness
they have try jobs and unit tests, all required to be run before checking in
Probably not yet feasible for WebKit anyway, due to tooling
Need a way to grab new baselines for all ports
Need EWS support for tests for all ports
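Marking a test as flaky in test_expectations.txt means listing more than one acceptable outcome for it, so intermittent failures don't turn the bot red. An illustrative fragment, roughly in the format the Chromium port used at the time; the bug numbers, paths, and modifiers here are made up:

```
// Hypothetical test_expectations.txt entries (illustrative only).
// A test with multiple expected outcomes is treated as known-flaky.
BUGWK12345 WIN DEBUG : fast/dom/some-timer-test.html = PASS TIMEOUT
BUGCR67890 MAC : fast/canvas/some-pixel-test.html = PASS IMAGE
```

This is why the green policy is enforceable there: a genuinely new failure is distinguishable from an already-catalogued flaky one.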


Discussion of causes of flaky tests
1. Lots of tests don't work well in parallel, due to reliance on system components that get overloaded
dpranke may have something for this soon
2. State left over from tests that ran before
e.g. a setTimeout that fires during the next test
Seems like we could improve our tools for that
3. Memory corruption
dpranke is working on having NRWT restart DRT on directory boundaries, and has found that it does reduce flakiness
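The directory-boundary restart dpranke describes can be sketched as: group the (sorted) test list by directory, then run each group in a fresh DumpRenderTree process so corruption from one directory's tests cannot bleed into the next. This is a hedged sketch of the idea, not NRWT's actual implementation; the function names are invented and DRT is stood in for by a generic subprocess:

```python
# Hypothetical sketch: one DRT process per test directory.

import itertools
import os
import subprocess

def directory_groups(tests):
    """Group a sorted test list into consecutive runs sharing a directory."""
    return [list(group) for _, group in
            itertools.groupby(tests, key=os.path.dirname)]

def run_tests(tests, drt_command):
    for group in directory_groups(tests):
        # Fresh process per directory: leaked state and heap corruption
        # from the previous directory's tests dies with the old process.
        drt = subprocess.Popen(drt_command, stdin=subprocess.PIPE)
        # ... feed each test in `group` to DRT and collect results here ...
        drt.communicate()  # close stdin and wait before starting the next
```

The trade-off is process-startup overhead per directory, which the notes suggest is worth it for the reduction in flakiness.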


Workflow improvements
More emphasis on running tests before committing
Make it easier to generate platform-specific test results
Standard process
If a test is failing, there should be one standard way of dealing with it
We don't currently have a good understanding of what the best practice is (do you roll it out? skip it? land failing results? how long do you wait after notifying the committer?)


Skipped tests are technical debt
More emphasis on unskipping?