Changeset 221474 in webkit


Ignore:
Timestamp:
Aug 31, 2017 11:36:32 PM (7 years ago)
Author:
Carlos Garcia Campos
Message:

[GTK] Improve the way unit test are run and the results reported
https://bugs.webkit.org/show_bug.cgi?id=176104

Reviewed by Carlos Alberto Lopez Perez.

There are several issues with the way unit tests are run by run-gtk-tests and also with the way results are
reported:

  • The results summary only mentions the test binaries, not the actual test cases, so you always have to scroll up to find the actual test cases failing.
  • The number of reported failures is the number of test binaries that failed, so if a new test case fails for the same binary in a new revision, we won't notice it just looking at the number of failures.
  • We show detailed information about skipped test in the results summary, which is just noise.
  • In the case of glib tests, when a test case times out, we finish the test suite, instead of continuing with the rest of the test cases like we do for normal failures or crashes. If a new test case fails after a test case that timed out we will not notice it until we fix or skip the test cases timing out.
  • In the case of glib tests, the timeout is applied to the whole suite, instead of per test case, we have a hack to make it longer only for that. It has worked so far, but it doesn't scale, and it's an ugly hack.
  • It's not currently possible to detect flaky tests, because again, we know the binaries/suites that failed but not the actual test cases.

This patch fixes all these issues and makes it possible to add support for flaky tests in a follow up patch.

  • BuildSlaveSupport/build.webkit.org-config/master.cfg:

(RunGtkAPITests.commandComplete): Update the RunGtkAPITests step to parse the new output.

  • Scripts/run-gtk-tests:

(TestRunner._start_timeout): Helper to start the timeout if needed.
(TestRunner._start_timeout._alarm_handler): Raise timeout exception.
(TestRunner._stop_timeout): Helper to stop the timeout if needed.
(TestRunner._waitpid): Merged waitpid and return_code_from_exit_status.
(TestRunner._run_test_glib): Do not double the timeout anymore, we now start/stop the timeout for every test
case. Return a dictionary where keys are test cases and values the results only in case of failures.
(TestRunner._run_test_glib.parse_line): Update the parser to also detect test cases and the results.
(TestRunner._run_test_glib.parse_line.set_test_result): Helper to set the result of a test case.
(TestRunner._run_google_test): Return a dictionary where key is the test case and value is the result only in
case of failure.
(TestRunner._run_google_test_suite): Updated now that _run_google_test returns a dictionary.
(TestRunner.run_tests): Handle the results dictionary and show the results with information about test cases
failing and grouped by test binaries.

Location:
trunk/Tools
Files:
3 edited

Legend:

Unmodified
Added
Removed
  • trunk/Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg

    r220238 r221474  
    659659        self.statusLine = []
    660660
    661         foundItems = re.findall("Tests failed \((\d+)\):", logText)
     661        foundItems = re.findall("Unexpected failures \((\d+)\):", logText)
    662662        if (foundItems):
    663663            self.incorrectTests = int(foundItems[0])
    664664
    665         foundItems = re.findall("Tests that crashed \((\d+)\):", logText)
     665        foundItems = re.findall("Unexpected crashes \((\d+)\):", logText)
    666666        if (foundItems):
    667667            self.crashedTests = int(foundItems[0])
    668668
    669         foundItems = re.findall("Tests that timed out \((\d+)\):", logText)
     669        foundItems = re.findall("Unexpected timeouts \((\d+)\):", logText)
    670670        if (foundItems):
    671671            self.timedOutTests = int(foundItems[0])
    672672
    673         foundItems = re.findall("Tests skipped \((\d+)\):", logText)
    674         if (foundItems):
    675             self.skippedTests = int(foundItems[0])
    676 
    677673        self.totalFailedTests = self.incorrectTests + self.crashedTests + self.timedOutTests
    678674
    679675        if self.totalFailedTests > 0:
    680676            self.statusLine = [
    681                 "%d API tests failed, %d crashed, %d timed out, %d skipped" %
    682                 (self.incorrectTests, self.crashedTests, self.timedOutTests, self.skippedTests)
     677                "%d API tests failed, %d crashed, %d timed out" % (self.incorrectTests, self.crashedTests, self.timedOutTests)
    683678            ]
    684679
  • trunk/Tools/ChangeLog

    r221468 r221474  
     12017-08-31  Carlos Garcia Campos  <cgarcia@igalia.com>
     2
     3        [GTK] Improve the way unit test are run and the results reported
     4        https://bugs.webkit.org/show_bug.cgi?id=176104
     5
     6        Reviewed by Carlos Alberto Lopez Perez.
     7
     8        There are several issues with the way unit tests are run by run-gtk-tests and also with the way results are
     9        reported:
     10
     11         - The results summary only mentions the test binaries, not the actual test cases, so you always have to scroll
     12           up to find the actual test cases failing.
     13         - The number of reported failures is the number of test binaries that failed, so if a new test case fails for
     14           the same binary in a new revision, we won't notice it just looking at the number of failures.
     15         - We show detailed information about skipped test in the results summary, which is just noise.
     16         - In the case of glib tests, when a test case times out, we finish the test suite, instead of continuing with the
     17           rest of the test cases like we do for normal failures or crashes. If a new test case fails after a test case that
     18           timed out we will not notice it until we fix or skip the test cases timing out.
     19         - In the case of glib tests, the timeout is applied to the whole suite, instead of per test case, we have a hack
     20           to make it longer only for that. It has worked so far, but it doesn't scale, and it's an ugly hack.
     21         - It's not currently possible to detect flaky tests, because again, we know the binaries/suites that failed but
     22           not the actual test cases.
     23
     24        This patch fixes all these issues and makes it possible to add support for flaky tests in a follow up patch.
     25
     26        * BuildSlaveSupport/build.webkit.org-config/master.cfg:
     27        (RunGtkAPITests.commandComplete): Update the RunGtkAPITests step to parse the new output.
     28        * Scripts/run-gtk-tests:
     29        (TestRunner._start_timeout): Helper to start the timeout if needed.
     30        (TestRunner._start_timeout._alarm_handler): Raise timeout exception.
     31        (TestRunner._stop_timeout): Helper to stop the timeout if needed.
     32        (TestRunner._waitpid): Merged waitpid and return_code_from_exit_status.
     33        (TestRunner._run_test_glib): Do not double the timeout anymore, we now start/stop the timeout for every test
     34        case. Return a dictionary where keys are test cases and values the results only in case of failures.
     35        (TestRunner._run_test_glib.parse_line): Update the parser to also detect test cases and the results.
     36        (TestRunner._run_test_glib.parse_line.set_test_result): Helper to set the result of a test case.
     37        (TestRunner._run_google_test): Return a dictionary where key is the test case and value is the result only in
     38        case of failure.
     39        (TestRunner._run_google_test_suite): Updated now that _run_google_test returns a dictionary.
     40        (TestRunner.run_tests): Handle the results dictionary and show the results with information about test cases
     41        failing and grouped by test binaries.
     42
    1432017-08-31  Filip Pizlo  <fpizlo@apple.com>
    244
  • trunk/Tools/Scripts/run-gtk-tests

    r221295 r221474  
    259259        return True
    260260
    261     def _get_child_pid_from_test_output(self, output):
    262         if not output:
    263             return -1
    264         match = re.search(r'\(pid=(?P<child_pid>[0-9]+)\)', output)
    265         if not match:
    266             return -1
    267         return int(match.group('child_pid'))
    268 
    269261    def _kill_process(self, pid):
    270262        try:
     
    274266            pass
    275267
    276     def _run_test_command(self, command, timeout=-1):
    277         def alarm_handler(signum, frame):
     268    @staticmethod
     269    def _start_timeout(timeout):
     270        if timeout <= 0:
     271            return
     272
     273        def _alarm_handler(signum, frame):
    278274            raise TestTimeout
    279275
    280         child_pid = [-1]
    281         def parse_line(line, child_pid = child_pid):
    282             if child_pid[0] == -1:
    283                 child_pid[0] = self._get_child_pid_from_test_output(line)
    284 
    285             sys.stdout.write(line)
    286 
    287         def waitpid(pid):
    288             while True:
    289                 try:
    290                     return os.waitpid(pid, 0)
    291                 except (OSError, IOError) as e:
    292                     if e.errno == errno.EINTR:
    293                         continue
    294                     raise
    295 
    296         def return_code_from_exit_status(status):
    297             if os.WIFSIGNALED(status):
    298                 return -os.WTERMSIG(status)
    299             elif os.WIFEXITED(status):
    300                 return os.WEXITSTATUS(status)
    301             else:
     276        signal(SIGALRM, _alarm_handler)
     277        alarm(timeout)
     278
     279    @staticmethod
     280    def _stop_timeout(timeout):
     281        if timeout <= 0:
     282            return
     283
     284        alarm(0)
     285
     286    def _waitpid(self, pid):
     287        while True:
     288            try:
     289                dummy, status = os.waitpid(pid, 0)
     290                if os.WIFSIGNALED(status):
     291                    return -os.WTERMSIG(status)
     292                if os.WIFEXITED(status):
     293                    return os.WEXITSTATUS(status)
     294
    302295                # Should never happen
    303296                raise RuntimeError("Unknown child exit status!")
     297            except (OSError, IOError) as e:
     298                if e.errno == errno.EINTR:
     299                    continue
     300                if e.errno == errno.ECHILD:
     301                    # This happens if SIGCLD is set to be ignored or waiting
     302                    # for child processes has otherwise been disabled for our
     303                    # process.  This child is dead, we can't get the status.
     304                    return 0
     305                raise
     306
     307    def _run_test_glib(self, test_program):
     308        command = ['gtester', '-k']
     309        if self._options.verbose:
     310            command.append('--verbose')
     311        for test_case in self._test_cases_to_skip(test_program):
     312            command.extend(['-s', test_case])
     313        command.append(test_program)
     314
     315        timeout = self._options.timeout
     316        test = os.path.join(os.path.basename(os.path.dirname(test_program)), os.path.basename(test_program))
     317        if test in TestRunner.SLOW:
     318            timeout *= 5
     319
     320        test_context = { "child-pid" : -1, "did-timeout" : False, "current_test" : None }
     321        def parse_line(line, test_context = test_context):
     322            if not line:
     323                return
     324
     325            match = re.search(r'\(pid=(?P<child_pid>[0-9]+)\)', line)
     326            if match:
     327                test_context["child-pid"] = int(match.group('child_pid'))
     328                sys.stdout.write(line)
     329                return
     330
     331            def set_test_result(test, result):
     332                if result == "FAIL":
     333                    if test_context["did-timeout"] and result == "FAIL":
     334                        test_context[test] = "TIMEOUT"
     335                    else:
     336                        test_context[test] = result
     337                test_context["did-timeout"] = False
     338                test_context["current_test"] = None
     339                self._stop_timeout(timeout)
     340                self._start_timeout(timeout)
     341
     342            normalized_line = line.strip().replace(' ', '')
     343            if not normalized_line:
     344                return
     345
     346            if normalized_line[0] == '/':
     347                test, result = normalized_line.split(':', 1)
     348                if result in ["OK", "FAIL"]:
     349                    set_test_result(test, result)
     350                else:
     351                    test_context["current_test"] = test
     352            elif normalized_line in ["OK", "FAIL"]:
     353                set_test_result(test_context["current_test"], normalized_line)
     354
     355            sys.stdout.write(line)
    304356
    305357        pid, fd = os.forkpty()
     
    308360            sys.exit(0)
    309361
    310         if timeout > 0:
    311             signal(SIGALRM, alarm_handler)
    312             alarm(timeout)
    313 
    314         try:
    315             common.parse_output_lines(fd, parse_line)
    316             if timeout > 0:
    317                 alarm(0)
    318         except TestTimeout:
    319             self._kill_process(pid)
    320             if child_pid[0] > 0:
    321                 self._kill_process(child_pid[0])
    322             raise
    323 
    324         try:
    325             dummy, status = waitpid(pid)
    326         except OSError as e:
    327             if e.errno != errno.ECHILD:
    328                 raise
    329             # This happens if SIGCLD is set to be ignored or waiting
    330             # for child processes has otherwise been disabled for our
    331             # process.  This child is dead, we can't get the status.
    332             status = 0
    333 
    334         return return_code_from_exit_status(status)
    335 
    336     def _run_test_glib(self, test_program):
    337         tester_command = ['gtester', '-k']
    338         if self._options.verbose:
    339             tester_command.append('--verbose')
    340         for test_case in self._test_cases_to_skip(test_program):
    341             tester_command.extend(['-s', test_case])
    342         tester_command.append(test_program)
    343         # This timeout is supposed to be per test case, but in the case of GLib tests it affects all the tests cases of
    344         # the same test program. Some test programs like TestLoaderClient, that have a lot of test cases, often time out
    345         # in the bots because the timeout is not enough to run all the tests cases. So, we use a longer timeout for GLib
    346         # tests (timeout * 2).
    347         timeout = self._options.timeout * 2
    348         test = os.path.join(os.path.basename(os.path.dirname(test_program)), os.path.basename(test_program))
    349         if test in TestRunner.SLOW:
    350             timeout *= 5
    351 
    352         return self._run_test_command(tester_command, timeout)
     362        self._start_timeout(timeout)
     363
     364        while (True):
     365            try:
     366                common.parse_output_lines(fd, parse_line)
     367                break
     368            except TestTimeout:
     369                assert test_context["child-pid"] > 0
     370                self._kill_process(test_context["child-pid"])
     371                test_context["child-pid"] = -1
     372                test_context["did-timeout"] = True
     373
     374        self._stop_timeout(timeout)
     375        del test_context["child-pid"]
     376        del test_context["did-timeout"]
     377        del test_context["current_test"]
     378
     379        self._waitpid(pid)
     380        return test_context
    353381
    354382    def _get_tests_from_google_test_suite(self, test_program):
     
    375403
    376404    def _run_google_test(self, test_program, subtest):
    377         test_command = [test_program, '--gtest_filter=%s' % (subtest)]
     405        command = [test_program, '--gtest_filter=%s' % (subtest)]
    378406        timeout = self._options.timeout
    379407        if subtest in TestRunner.SLOW:
    380408            timeout *= 5
    381409
    382         status = self._run_test_command(test_command, timeout)
     410        pid, fd = os.forkpty()
     411        if pid == 0:
     412            os.execvpe(command[0], command, self._test_env)
     413            sys.exit(0)
     414
     415        self._start_timeout(timeout)
     416        try:
     417            common.parse_output_lines(fd, sys.stdout.write)
     418            status = self._waitpid(pid)
     419        except TestTimeout:
     420            self._kill_process(pid)
     421            return { subtest : "TIMEOUT" }
     422
     423        self._stop_timeout(timeout)
     424
    383425        if status == -SIGSEGV:
    384426            sys.stdout.write("**CRASH** %s\n" % subtest)
    385427            sys.stdout.flush()
    386         return status
     428            return { subtest : "CRASH" }
     429
     430        if status != 0:
     431            return { subtest : "FAIL" }
     432
     433        return { }
    387434
    388435    def _run_google_test_suite(self, test_program):
    389         retcode = 0
     436        result = { }
    390437        for subtest in self._get_tests_from_google_test_suite(test_program):
    391             if self._run_google_test(test_program, subtest):
    392                 retcode = 1
    393         return retcode
     438            result.update(self._run_google_test(test_program, subtest))
     439        return result
    394440
    395441    def _run_test(self, test_program):
     
    416462        self._tests = [test for test in self._tests if self._should_run_test_program(test)]
    417463
    418         crashed_tests = []
    419         failed_tests = []
    420         timed_out_tests = []
     464        crashed_tests = {}
     465        failed_tests = {}
     466        timed_out_tests = {}
    421467        try:
    422468            for test in self._tests:
    423                 exit_status_code = 0
    424                 try:
    425                     exit_status_code = self._run_test(test)
    426                 except TestTimeout:
    427                     sys.stdout.write("TEST: %s: TIMEOUT\n" % test)
    428                     sys.stdout.flush()
    429                     timed_out_tests.append(test)
    430 
    431                 if exit_status_code == -SIGSEGV:
    432                     sys.stdout.write("TEST: %s: CRASHED\n" % test)
    433                     sys.stdout.flush()
    434                     crashed_tests.append(test)
    435                 elif exit_status_code != 0:
    436                     failed_tests.append(test)
     469                results = self._run_test(test)
     470                for test_case, result in results.iteritems():
     471                    if result == "FAIL":
     472                        failed_tests.setdefault(test, []).append(test_case)
     473                    elif result == "TIMEOUT":
     474                        timed_out_tests.setdefault(test, []).append(test_case)
     475                    elif result == "CRASH":
     476                        crashed_tests.setdefault(test, []).append(test_case)
    437477        finally:
    438478            self._tear_down_testing_environment()
    439479
    440480        if failed_tests:
    441             names = [test.replace(self._test_programs_base_dir(), '', 1) for test in failed_tests]
    442             sys.stdout.write("Tests failed (%d): %s\n" % (len(names), ", ".join(names)))
     481            sys.stdout.write("\nUnexpected failures (%d)\n" % (sum(len(value) for value in failed_tests.itervalues())))
     482            for test in failed_tests:
     483                sys.stdout.write("    %s\n" % (test.replace(self._test_programs_base_dir(), '', 1)))
     484                for test_case in failed_tests[test]:
     485                    sys.stdout.write("        %s\n" % (test_case))
    443486            sys.stdout.flush()
    444487
    445488        if crashed_tests:
    446             names = [test.replace(self._test_programs_base_dir(), '', 1) for test in crashed_tests]
    447             sys.stdout.write("Tests that crashed (%d): %s\n" % (len(names), ", ".join(names)))
     489            sys.stdout.write("\nUnexpected crashes (%d)\n" % (sum(len(value) for value in crashed_tests.itervalues())))
     490            for test in crashed_tests:
     491                sys.stdout.write("    %s\n" % (test.replace(self._test_programs_base_dir(), '', 1)))
     492                for test_case in crashed_tests[test]:
     493                    sys.stdout.write("        %s\n" % (test_case))
    448494            sys.stdout.flush()
    449495
    450496        if timed_out_tests:
    451             names = [test.replace(self._test_programs_base_dir(), '', 1) for test in timed_out_tests]
    452             sys.stdout.write("Tests that timed out (%d): %s\n" % (len(names), ", ".join(names)))
    453             sys.stdout.flush()
    454 
    455         if self._skipped_tests and self._options.skipped_action == 'skip':
    456             sys.stdout.write("Tests skipped (%d):\n%s\n" %
    457                              (len(self._skipped_tests),
    458                               "\n".join([str(skipped) for skipped in self._skipped_tests])))
     497            sys.stdout.write("\nUnexpected timeouts (%d)\n" % (sum(len(value) for value in timed_out_tests.itervalues())))
     498            for test in timed_out_tests:
     499                sys.stdout.write("    %s\n" % (test.replace(self._test_programs_base_dir(), '', 1)))
     500                for test_case in timed_out_tests[test]:
     501                    sys.stdout.write("        %s\n" % (test_case))
    459502            sys.stdout.flush()
    460503
Note: See TracChangeset for help on using the changeset viewer.