Changes between Version 2 and Version 3 of SelectiveTestBuildBot


Timestamp:
Apr 16, 2012, 6:09:38 AM (13 years ago)
Author:
beszedes@inf.u-szeged.hu
Comment:

--

  • SelectiveTestBuildBot

    v2 v3  
    11= Selective Regression Testing =
    22
    3 == Goal ==
     3== Goal and motivation ==
    44The purpose of this project is to create and maintain a test system that executes Layout Tests based on code changes, that is, only those tests are selected for execution (automatically) that are affected by the changes made to the source code. The method can be used either in a build bot or as part of an Early Warning System (see [wiki:"SelectiveTestEWS"]).
    55
    6 == Motivation ==
    76The selection is made based on function-level code coverage, which means that for a revision x only those tests are selected which traverse any of the functions changed at x; all other tests are skipped. Based on experiments performed on WebKit revisions from October 2011,
    87on average less than 0.1% of the test cases were failing while on average 5 functions were changed per revision in this period. Moreover, the number of test cases affected by a specific change was usually very low. The coverage of all the tests was altogether about 70-75% of the total number of functions but if we perform the tests selectively we can still achieve the same coverage with respect to the changes but by executing only a fraction of all the tests.
     
    109In the same experiment, we got a high percentage of inclusiveness: over 95% of the failing test cases in the full test could be identified by using the function change coverage method, and at the same time a significant reduction in the number of tests performed was achieved. Namely, we got this result by executing less than 1% of the test cases on average.
    1110
    12 == Technicalities ==
    13 The method for test selection is the following. We instrument the source code of the methods and functions to log entry and exit events during execution, and make such an instrumented build of the system. Then all tests are executed to produce the initial coverage information, which relates each test to a set of functions it executes. This information is stored in a relational database. Next, a set of changed functions is extracted from a revision under test and this set is used to query the database for a list of tests to execute.
    14 The technical details and source code of the scripts can be found at [wiki:"RegSelectionDetails"].
    15 
    1611== Usage ==
    1712There will be two types of use of the selective regression tests:
    18  * As a special build bot (currently for the Qt port). This provides a faster alternative to the build bot performing full test: it saves more than 77% of the total build time including the overhead required for the selection.
     13 * As a special build bot (currently for the Qt port, which started in November 2011). This provides a faster alternative to the build bot performing the full test: it saves more than 77% of the total build time, including the overhead required for the selection. Embedded platforms like ARM could probably benefit most from this method. The Qt build bot can be found here: http://build.webkit.sed.hu/builders/x86-64%20Linux%20Qt%20Release%20QuickTest
    1914 * As part of the Chromium Early Warning System. This will allow developers and reviewers to check the patches in Bugzilla before landing in the repository. Due to the selection the time required to perform the tests will be much less. Detailed description is at [wiki:"SelectiveTestEWS"].
    2015
    21 == Build bot ==
    22 The selective test build bot for Qt was started in November 2011, and since then several hundred builds have been performed. Technical details can be found at [wiki:"SelectiveTestTechnical"].
     16== Technicalities and how to get it ==
     17The method for test selection is the following. We instrument the source code of the methods and functions to log entry and exit events during execution, and make such an instrumented build of the system (using the GCC instrumentation feature and our custom C++ instrumentation code). Then all tests are executed to produce the initial coverage information, which relates each test to a set of functions it executes. This information is stored in a relational database. Next, a set of changed functions is extracted from a revision under test (using an extended PrepareChangeLog script) and this set is used to query the database for a list of tests to execute.
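The selection step described above can be sketched as follows. This is only a minimal illustration, not the project's actual scripts: it assumes the coverage information is a single table of (test, function) pairs, uses an in-memory SQLite database as a stand-in for the relational database, and the names `build_coverage_db`, `select_tests`, the table layout, and the sample test and function names are all hypothetical.

```python
import sqlite3


def build_coverage_db(coverage):
    """Store coverage as (test, function) rows.

    `coverage` maps a test name to the function names it executed
    (hypothetical layout; the real schema is not given in this page).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE coverage (test TEXT NOT NULL, function TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_coverage_function ON coverage (function)")
    rows = [(test, func) for test, funcs in coverage.items() for func in funcs]
    conn.executemany("INSERT INTO coverage (test, function) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def select_tests(conn, changed_functions):
    """Return the sorted tests that execute at least one changed function."""
    changed = list(changed_functions)
    if not changed:
        return []  # nothing changed, so nothing needs to be re-run
    placeholders = ",".join("?" for _ in changed)
    query = (
        "SELECT DISTINCT test FROM coverage "
        f"WHERE function IN ({placeholders}) ORDER BY test"
    )
    return [row[0] for row in conn.execute(query, changed)]


if __name__ == "__main__":
    # Made-up coverage data for illustration only.
    conn = build_coverage_db({
        "fast/dom/a.html": ["Node::insertBefore", "Node::appendChild"],
        "fast/dom/b.html": ["Node::appendChild"],
        "fast/css/c.html": ["StyleResolver::styleForElement"],
    })
    print(select_tests(conn, ["Node::appendChild"]))
    # ['fast/dom/a.html', 'fast/dom/b.html']
```

Tests added after the coverage run have no rows in such a table and would never be selected, so the database has to be refreshed periodically.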
    2318
    24 == Initial analysis of results ==
    25 This bot is running in parallel to the full Qt test, but since both bots usually have to skip revisions, there were only 64 revisions that were common.
    26 We manually compared the full and selective test for the inclusiveness of the selection:
    27  * We excluded contiguous failing tests because the selective test looks only for changes and does not deal with recurring failures.
    28  * There were 5 crashes among these revisions
    29  * In 9 cases exactly the same test cases were failing in the two bots
    30  * There were additional failures in both bots in 24 cases
    31  * In 3 cases there were tests that the selective test did not find
    32  * In 23 cases selective test found more failures
     19The meta-bug describing the method can be found here: https://bugs.webkit.org/show_bug.cgi?id=78699
    3320
    34 Overall, in the cases where there was an overlap between the failures found, about 77% of the failures in the full test were found by the selective test. The majority of the additional failures not found by the full test turned out to be flakes or timeouts.
    35 Among the failures missed by the selective test, 92% were due to the outdated coverage database (some new tests were not in it yet) and the rest were due to flakes and timeouts.
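As a rough illustration of how the inclusiveness figure above can be computed for a single revision, the sketch below takes the share of the full run's failures that the selective run also reported. The function name and the sample test names are hypothetical, and the manual comparison described above may have differed in details, such as how recurring failures were excluded.

```python
def inclusiveness(full_failures, selective_failures):
    """Fraction of the full run's failing tests also reported by the selective run.

    Returns None when the full run had no failures, since the ratio is
    undefined in that case.
    """
    full = set(full_failures)
    if not full:
        return None
    found = full & set(selective_failures)
    return len(found) / len(full)


if __name__ == "__main__":
    # Made-up failure lists for one revision, for illustration only.
    full = ["a.html", "b.html", "c.html", "d.html"]
    selective = ["a.html", "b.html", "c.html", "x.html"]
    print(inclusiveness(full, selective))  # 0.75
```

Failures reported only by the selective run (`x.html` here) do not affect this ratio. They correspond to the additional failures mentioned above, most of which turned out to be flakes or timeouts.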
     21The preliminary patch is (will be) here.
    3622
    37 Since the coverage computation is a resource-intensive operation, we plan to update the coverage database on a regular basis but not for each revision. This will then eliminate most of the remaining 23% of the missed failures. Currently we are working on starting a continuous (instrumented) build bot to provide mostly up-to-date coverage information.
     23== Some results ==
     24