Changes between Initial Version and Version 1 of SelectiveTestBuildBot


Timestamp:
Feb 13, 2012 3:36:13 AM
Author:
beszedes@inf.u-szeged.hu
Comment:

Adding regression test improvement pages

  • SelectiveTestBuildBot

    v1 v1  
     1= Selective Regression Testing =
     2
     3== Goal ==
     4The purpose of this project is to create and maintain a test system that executes Layout Tests selectively, based on code changes: only those tests that are affected by the changes made to the source code are (automatically) selected for execution.
     5
     6== Motivation ==
     7The selection is based on function-level code coverage: for a revision x, only those tests are selected that traverse at least one of the functions changed at x; all other tests are skipped. Based on experiments performed on WebKit revisions from October 2011,
     8on average less than 0.1% of the test cases were failing, while on average 5 functions were changed per revision in this period. Moreover, the number of test cases affected by a specific change was usually very low. All the tests together covered about 70-75% of the total number of functions, but if we perform the tests selectively we can still achieve the same coverage with respect to the changes while executing only a fraction of all the tests.
     9
     10In the same experiment, we got a high percentage of inclusiveness: over 95% of the failing test cases in the full test run could be identified by the function change coverage method, while the number of tests performed was reduced significantly. Namely, we got this result by executing less than 1% of the test cases on average.
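The two figures quoted above can be computed per revision as follows. This is a minimal sketch rather than the scripts used in the experiment: it assumes inclusiveness means the share of the full run's failing tests that the selection contains, and the function names and test names are hypothetical.

```python
def inclusiveness(selected, failing):
    """Share of the failing tests (from the full run) that the selection contains."""
    failing = set(failing)
    if not failing:
        return 1.0  # nothing to catch, so nothing was missed
    return len(failing & set(selected)) / len(failing)


def executed_fraction(selected, all_tests):
    """Share of the whole test suite that the selection actually runs."""
    return len(set(selected)) / len(set(all_tests))


# Hypothetical revision: 1000 tests in total, 8 selected, 2 failing in the full run.
all_tests = [f"test_{i}.html" for i in range(1000)]
selected = all_tests[:8]
failing = ["test_3.html", "test_500.html"]

print(inclusiveness(selected, failing))        # 0.5
print(executed_fraction(selected, all_tests))  # 0.008
```

Averaging these two values over many revisions would give numbers comparable to the "over 95%" and "less than 1%" reported above.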
     11
     12== Technicalities ==
     13The method for test selection is the following. We instrument the source code of the methods and functions to log entry and exit events during execution, and produce an instrumented build of the system. Then all tests are executed to produce the initial coverage information, which relates each test to the set of functions it executes. This information is stored in a relational database. Next, the set of changed functions is extracted from the revision under test, and this set is used to query the database for a list of tests to execute.
     14The technical details and source code of the scripts can be found at [wiki:"RegSelectionDetails"].
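The database lookup described above might look roughly like the following sketch. The actual schema and scripts are on the linked page; the `coverage` table, the `select_tests` helper, and the test and function names below are all invented for illustration, and an in-memory SQLite database stands in for whatever relational database the project uses.

```python
import sqlite3

# Hypothetical schema: one row per (test, function) pair seen in the instrumented run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coverage (test TEXT NOT NULL, function TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO coverage (test, function) VALUES (?, ?)",
    [
        ("fast/css/color.html", "CSSParser::parseColor"),
        ("fast/css/color.html", "RenderStyle::setColor"),
        ("fast/dom/title.html", "Document::setTitle"),
        ("fast/forms/input.html", "HTMLInputElement::setValue"),
    ],
)


def select_tests(conn, changed_functions):
    """Return the tests that execute at least one of the changed functions."""
    changed = sorted(set(changed_functions))
    if not changed:
        return []
    # One "?" placeholder per changed function keeps the query parameterized.
    placeholders = ", ".join("?" for _ in changed)
    rows = conn.execute(
        f"SELECT DISTINCT test FROM coverage "
        f"WHERE function IN ({placeholders}) ORDER BY test",
        changed,
    )
    return [test for (test,) in rows]


print(select_tests(conn, ["CSSParser::parseColor", "Document::setTitle"]))
# ['fast/css/color.html', 'fast/dom/title.html']
```

A test that is missing from the table can never be selected, which is why the freshness of the coverage data matters for the results.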
     15
     16== Usage ==
     17The selective regression tests will be used in two ways:
     18 * As a special build bot (currently for the Qt port). This provides a faster alternative to the build bot performing the full test run: it saves more than 77% of the total build time, including the overhead required for the selection.
     19 * As part of the Chromium Early Warning System. This will allow developers and reviewers to check patches in Bugzilla before they land in the repository. Thanks to the selection, the tests will take much less time to run. A detailed description is at [wiki:"SelectiveTestEWS"].
     20
     21== Build bot ==
     22The selective test build bot for Qt was started in November 2011, and several hundred builds have been performed since then. Technical details can be found at [wiki:"SelectiveTestTechnical"].
     23
     24== Initial analysis of results ==
     25This bot runs in parallel with the full Qt test bot, but since both bots usually have to skip revisions, there were only 64 revisions that both of them tested.
     26We manually compared the full and selective tests on these revisions for the inclusiveness of the selection:
     27 * We excluded continuously failing tests because the selective test looks only for changes and does not deal with recurring failures.
     28 * There were 5 crashes among these revisions
     29 * In 9 cases exactly the same test cases were failing in the two bots
     30 * In 24 cases there were additional failures in both bots
     31 * In 3 cases there were failing tests that the selective test did not find
     32 * In 23 cases the selective test found more failures
     33
     34Overall, in the cases where the failures found overlapped, about 77% of the failures in the full test were also found by the selective test. The majority of the additional failures not found by the full test turned out to be flakes or timeouts.
     35Of the failures missed by the selective test, 92% were due to the outdated coverage database (some new tests were not in it yet) and the rest were due to flakes and timeouts.
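The comparison above was done manually. A per-revision classification of this kind could be sketched as below, assuming each bot's new failures are available as a set of test names. The category labels, the `compare_revision` helper, and the sample data are hypothetical, crashes are not modeled, and the reading of "additional failures in both bots" as "each bot reported failures the other did not" is an assumption.

```python
from collections import Counter


def compare_revision(full_failures, selective_failures):
    """Classify one revision by how the two bots' new failures relate."""
    full, selective = set(full_failures), set(selective_failures)
    only_full = full - selective
    only_selective = selective - full
    if not only_full and not only_selective:
        return "same"
    if only_full and only_selective:
        return "additional in both"
    return "missed by selective" if only_full else "more in selective"


# Hypothetical results: revision -> (full bot failures, selective bot failures)
results = {
    101: ({"a.html"}, {"a.html"}),
    102: ({"a.html", "b.html"}, {"a.html"}),
    103: ({"a.html"}, {"a.html", "c.html"}),
    104: ({"b.html"}, {"c.html"}),
}

summary = Counter(compare_revision(full, sel) for full, sel in results.values())
for category, count in summary.items():
    print(f"{category}: {count}")
```

Failures in the "more in selective" and "additional in both" groups would still need a manual check, since most of them turned out to be flakes or timeouts.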
     36
     37Since computing the coverage is a resource-intensive operation, we plan to update the coverage database on a regular basis but not for each revision. This will then eliminate most of the remaining 23% of failures that are currently missed. We are currently working on starting a continuous (instrumented) build bot to provide mostly up-to-date coverage information.