Between November 16, 2011 and April 11, 2012, the Selective Test configuration analyzed 9690 revisions.
This corresponds to about 72% of the commits between revisions r100422 and r113914.
(The full test configuration for the same platform analyzed only 6005 revisions in the same period, which means the Selective Test builder checked about 61% more revisions.)

The average total time, including compilation, test selection, and test execution, was 1339 seconds for the full test and 287 seconds for the selective test (on an Intel Xeon E5450 3.00 GHz machine with 8 cores and 32 GiB of memory).
This means that regression errors can be found in about a fifth of the time, and a higher rate of tested revisions can be achieved: without selection only about every second revision is tested, while with selection roughly two out of every three are.

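As a quick sanity check of the figures above, the reported ratios can be reproduced as follows. This is illustrative only; it assumes the difference of the two revision numbers approximates the number of commits in the range.

```python
# Revision range r100422..r113914; the difference of the revision
# numbers is used here as an approximation of the commit count.
total_revisions = 113914 - 100422          # 13492
selective_done = 9690                      # revisions analyzed by the selective bot
full_done = 6005                           # revisions analyzed by the full bot

selective_share = selective_done / total_revisions  # ~0.72 ("about 72%")
full_share = full_done / total_revisions            # ~0.45 (about every second revision)
extra_checked = selective_done / full_done - 1      # ~0.61 ("61% more revisions")
time_ratio = 287 / 1339                             # ~0.21 (selective vs. full test time)
```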
We manually investigated the relevant parts of the 9690 revisions, but for comparison we needed a set of revisions processed by both the full and the selective build bots. Because the build queues of the full test and the selective test are not synchronized, only 1665 revisions were suitable for comparison.
Out of these, 119 revisions had new failures (876 individual failures in total).
Since we limit our analysis to C++ code, we also classified the revisions according to whether they contained C++-only, non-C++, or mixed changes.
Of the 119 revisions, 5 contained only C++ changes, 90 were mixed, and the remaining 24 contained no C++ changes.

Of the 5 C++-only revisions, one had a build problem; in the remaining 4, the selective test captured all 30 failures reported by the full test, i.e. inclusiveness was 100%.
In the mixed revisions, 60% (181) of the total 302 failures were correctly identified.

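Inclusiveness, as used above, is simply the fraction of the full build's new failures that the selective build also reports. A minimal sketch (the function name is ours, not from the tooling):

```python
def inclusiveness(captured_failures, all_failures):
    """Fraction of the full test run's new failures also caught by the selection.

    A revision with no failures is treated as fully covered.
    """
    return captured_failures / all_failures if all_failures else 1.0

# Figures from the comparison above:
cpp_only = inclusiveness(30, 30)   # 1.0   -> 100% for the 4 C++-only revisions
mixed = inclusiveness(181, 302)    # ~0.60 for the mixed revisions
```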
The selection capability for the remaining roughly 8000 revisions was checked by performing the selection offline, in a batch process created specifically for this purpose.
For this, we used the same coverage database, list of changes, and set of tools as the Selective Test bot.
We again limited the analysis to revisions that contained changes to C++ code only, and looked only at new failures in the revisions.
The overall inclusiveness in this case was 75.38%.
Most of the failures missed by the selective test can be attributed to changes in non-C++ code (which we currently cannot handle), to the slightly outdated coverage database, and to imperfections of the change determination method.

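The offline batch evaluation can be thought of along the following lines. This is a hypothetical sketch: the function name, test names, and the set-based coverage representation are illustrative, not the actual tool interfaces.

```python
def evaluate_revision(changed_functions, coverage, full_failures):
    """Select every test covering at least one changed function, then check
    which of the full build's newly failing tests the selection captured.

    coverage maps each test name to the set of functions it covers;
    full_failures is the set of tests that newly failed in the full build.
    """
    selected = {test for test, covered in coverage.items()
                if covered & changed_functions}
    captured = full_failures & selected
    inclusiveness = len(captured) / len(full_failures) if full_failures else 1.0
    return selected, inclusiveness

# Illustrative data: a change to f2 selects the two tests that cover it.
coverage = {"t1": {"f1", "f2"}, "t2": {"f3"}, "t3": {"f2"}}
selected, inc = evaluate_revision({"f2"}, coverage, {"t1"})
# selected == {"t1", "t3"}, inc == 1.0
```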
The selection size (the number of selected tests) varied widely: there were many small selections (0 or 1 tests) but also complete ones (almost all tests).
The following graph shows the relationship between inclusiveness and selection size.
As can be seen, the selection size is large in several cases. We are currently working on prioritization algorithms to further reduce it.

[[Image(selectivetest.png)]]