Changes between Version 6 and Version 7 of SelectiveTestBuildBot

Apr 19, 2012 4:36:26 AM



  • SelectiveTestBuildBot

    v6 v7  
    2727== Some results ==
     29Between November 16, 2011 and April 11, 2012, the Selective Test configuration performed the analysis of 9690 revisions.
     30This corresponds to about 72% of the commits between the revisions r100422 and r113914
     31(the full test configuration for the same platform analyzed only 6005 revisions in the same period, which means that about 61% more revisions were checked by the Selective Test builder).
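The percentages above can be re-derived from the quoted figures. The following is only a minimal sketch: it assumes that every revision number in the inclusive range r100422 to r113914 counts as one commit, and the variable names are ours, not part of the build bot.

```python
# Figures quoted in the text above.
first_rev, last_rev = 100422, 113914
selective_checked = 9690   # revisions analyzed by the Selective Test builder
full_checked = 6005        # revisions analyzed by the full test builder

# Assumption: each revision number in the inclusive range is one commit.
total_commits = last_rev - first_rev + 1

selective_share = selective_checked / total_commits   # share tested with selection
full_share = full_checked / total_commits             # share tested without selection
relative_gain = selective_checked / full_checked - 1  # extra revisions checked

print(f"total commits:   {total_commits}")
print(f"selective share: {selective_share:.1%}")
print(f"full share:      {full_share:.1%}")
print(f"relative gain:   {relative_gain:.1%}")
```

Under this assumption the selective share rounds to 72% and the relative gain to 61%, matching the numbers in the text.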
     33The average total time including compilation, test selection and test execution was 1339 seconds for the full test and 287 seconds for the selective test (on an Intel Xeon E5450 3.00GHz machine with 8 cores and 32GiB memory).
     34This means that regression failures can be found in less than a quarter of the time, and a larger share of revisions gets tested: without selection only about every second revision is tested, while with selection about 2 of every 3 are.
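As a quick check of the timing claim, the two averages can be compared directly. This is a sketch using only the two figures quoted above; the variable names are illustrative.

```python
# Average total time per revision (compilation, selection, execution), in seconds.
full_seconds = 1339
selective_seconds = 287

time_ratio = selective_seconds / full_seconds  # fraction of the full-test time
speedup = full_seconds / selective_seconds     # how many times faster

print(f"selective run takes {time_ratio:.0%} of the full run")
print(f"speedup factor: {speedup:.1f}x")
```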
     36A manual investigation of a relevant part of the 9690 revisions was performed; for this
     37we needed a set of revisions analyzed by both the full and the selective build bots. Because the build queues for the full test and the selective test are not synchronized, only 1665 revisions were suitable for comparison.
     38Of these, 119 revisions had new failures (876 individual failures in total).
     39Since we limit our analysis to C++ code, we also classified the revisions according to whether they contained C++, non-C++ or mixed changes.
     40Of the 119, 5 revisions contained only C++ changes, 90 were mixed, and the remaining 24 contained no C++ changes.
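The classification into C++, non-C++ and mixed revisions could be sketched as follows. The page does not say how the classification was actually done, so the extension list and the function `classify_revision` are hypothetical, chosen only for illustration.

```python
# Assumption: a file counts as C++ if its path ends with one of these suffixes.
CPP_EXTENSIONS = (".cpp", ".cc", ".h", ".hpp")


def classify_revision(changed_paths):
    """Return 'cpp-only', 'mixed' or 'non-cpp' for a list of changed file paths."""
    has_cpp = any(path.endswith(CPP_EXTENSIONS) for path in changed_paths)
    has_other = any(not path.endswith(CPP_EXTENSIONS) for path in changed_paths)
    if has_cpp and has_other:
        return "mixed"
    if has_cpp:
        return "cpp-only"
    return "non-cpp"


# Made-up change lists, not taken from real revisions.
examples = {
    "rev A": ["Source/core/Node.cpp", "Source/core/Node.h"],
    "rev B": ["Source/core/Node.cpp", "tests/expected.txt"],
    "rev C": ["tests/expected.txt"],
}
for name, paths in examples.items():
    print(name, classify_revision(paths))
```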
     42One of the C++-only revisions had a build problem, but in the remaining 4 revisions the selective test captured all failures of the full test (30 altogether), i.e. inclusiveness was 100%.
     43In the mixed revisions, 181 of the 302 failures in total (60%) were correctly identified.
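Inclusiveness, as used above, is the share of the full test's new failures that the selective test also captured. A minimal sketch of that ratio follows; the function name and the test identifiers are made up for illustration, and only the counts 30, 181 and 302 come from the text.

```python
def inclusiveness(full_failures, selected_tests):
    """Fraction of the full test's failures that are among the selected tests.

    Returns None when the full test reported no failures, since the ratio
    is undefined in that case.
    """
    full_failures = set(full_failures)
    if not full_failures:
        return None
    captured = full_failures & set(selected_tests)
    return len(captured) / len(full_failures)


# Toy example with made-up test names: 2 of 3 failures were selected.
toy = inclusiveness({"t1", "t2", "t3"}, {"t1", "t3", "t9"})

# The same ratio computed from the counts quoted in the text.
cpp_only = 30 / 30    # C++-only revisions: all 30 failures captured
mixed = 181 / 302     # mixed revisions: 181 of 302 failures captured

print(f"toy: {toy:.0%}, C++-only: {cpp_only:.0%}, mixed: {mixed:.0%}")
```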
     45The selection capabilities for the remaining ca. 8000 revisions were checked by performing the selection offline in a batch process specifically created for this purpose.
     46For this, we used the same coverage database, list of changes and set of tools as the Selective Test bot uses.
     47We also limited the analysis to those revisions that contained changes to C++ code only, and looked only at new failures in the revisions.
     48The overall inclusiveness in this case was 75.38%.
     49Most of the failures missed by the selective test could be attributed to changes in non-C++ code, which we cannot currently handle, to the slightly outdated coverage database, and to imperfections of the change determination method.
     51The selection size (number of selected tests) varied: there were many small (0 or 1) but also complete (almost all tests) selections.
     52The following graph shows the relationship between the inclusiveness and selection size.
     53As can be seen, there are several cases in which the selection size is large. We are currently working on prioritization algorithms to further reduce the selection size.