Changeset 221659 in webkit


Timestamp: Sep 5, 2017 7:37:41 PM
Author: rniwa@webkit.org
Message:

Compute the final score using geometric mean in Speedometer 2.0
https://bugs.webkit.org/show_bug.cgi?id=172968

Reviewed by Saam Barati.

Make Speedometer 2.0 use the geometric mean of each test suite's subtotal instead of the total.

In Speedometer 1.0, we used the total time to compute the final score because we wanted to make
the slowest frameworks and libraries faster. The fastest suite (FlightJS) still accounted for ~6%
of the total and the slowest (React) for ~25%, so we felt that the total time, or equivalently the
arithmetic mean scaled by a constant factor, was a good metric to track.

In the latest version of Speedometer 2.0, however, the fastest suite (Preact) runs in ~55ms whereas
the slowest suite (Inferno) takes ~1.5s on Safari. Since the total time is ~6.5s, Preact's suite
accounts for only ~0.8% of the total score while Inferno's suite accounts for ~23%. Because the goal
of Speedometer is to approximate different kinds of DOM API use patterns on the Web, we want each
framework and library to have a meaningful impact on the overall benchmark score.
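
A minimal sketch of the difference (the subtotals below are made up; only the ~55ms and ~1.5s
figures come from the measurements above):

    // Hypothetical per-suite subtotals in ms; 55 and 1500 are the approximate Preact and
    // Inferno timings above, 4945 stands in for the remaining suites so that the total is ~6.5s.
    var subtotals = [55, 1500, 4945];

    function total(values) { return values.reduce(function (a, b) { return a + b; }); }
    function geomean(values) {
        var product = values.reduce(function (a, b) { return a * b; });
        return Math.pow(product, 1 / values.length);
    }

    // Doubling the fastest suite barely moves the total (and hence a total-time-based score)...
    console.log(total([110, 1500, 4945]) / total(subtotals));     // ~1.008, i.e. +0.8%
    // ...but moves the geometric mean by the same factor as doubling any other suite would.
    console.log(geomean([110, 1500, 4945]) / geomean(subtotals)); // 2^(1/3) ≈ 1.26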

Furthermore, after r221205, we test both the debug and the release builds of Ember.js. Since the
debug build is ~4x slower, using the total time, or the arithmetic mean thereof, effectively gives
the debug build of Ember.js 4x as much weight as the release build. Given that only ~5% of websites
that deploy Ember.js use the debug build, this weighting is clearly not right.

This patch therefore replaces the arithmetic mean with the geometric mean when computing the final
score. It also moves the score computation into BenchmarkRunner so that it can be shared between
main.js and InteractiveRunner.html.
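
In terms of formulas, the scoring changes from the computeScore helper that main.js applied to the
total time (deleted in this patch; see the main.js diff below) to a score derived from the geometric
mean, as added to BenchmarkRunner._finalize. A minimal side-by-side sketch, with suiteTotals
standing in for the per-suite subtotals:

    var suiteTotals = [55, 420, 1500];  // hypothetical per-suite subtotals in ms
    var suitesCount = suiteTotals.length;
    var total = suiteTotals.reduce(function (a, b) { return a + b; }, 0);
    var geomean = Math.pow(suiteTotals.reduce(function (a, b) { return a * b; }, 1), 1 / suitesCount);

    // Before: runs/min derived from the total time (equivalently, from the arithmetic mean).
    var oldScore = 60 * 1000 * suitesCount / total;

    // After: runs/min derived from the geometric mean, with a correction factor of 3
    // chosen so that scores land roughly in the 0 to 140 range.
    var correctionFactor = 3;
    var newScore = 60 * 1000 / geomean / correctionFactor;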

  • Speedometer/InteractiveRunner.html:

(.didRunSuites): Show the geometric mean, arithmetic mean, and total as well as the score, for
completeness, since this is a debugging page for developers.

  • Speedometer/resources/benchmark-runner.js:

(BenchmarkRunner.prototype.step): Added mean, geomean, and score as properties of measuredValues.
(BenchmarkRunner.prototype._runTestAndRecordResults): Removed dead code.
(BenchmarkRunner.prototype._finalize): Compute the total, the arithmetic mean (mean in the code),
the geometric mean (geomean), and the score, and add them to measuredValues.

  • Speedometer/resources/main.js:

(window.benchmarkClient): Replaced testsCount with stepCount and _timeValues with _measuredValuesList.
(window.benchmarkClient.willRunTest):
(window.benchmarkClient.didRunTest):
(window.benchmarkClient.didRunSuites): Store measuredValues object instead of just the total time.
(window.benchmarkClient.didFinishLastIteration):
(window.benchmarkClient._computeResults):
(window.benchmarkClient._computeResults.valueForUnit): Renamed from totalTimeInDisplayUnit. Now simply
retrieves the values computed by BenchmarkRunner's _finalize.
(startBenchmark):
(computeScore): Deleted.

Location: trunk/PerformanceTests
Files: 4 edited

  • trunk/PerformanceTests/ChangeLog

    r221636 → r221659

    (The new ChangeLog entry is identical to the commit message above.)

  • trunk/PerformanceTests/Speedometer/InteractiveRunner.html

    r221039 → r221659

                     results += suiteName + ' : ' + suiteResults.total + ' ms\n';
                 }
    +            results += 'Arithmetic Mean : ' + measuredValues.mean + ' ms\n';
    +            results += 'Geometric Mean : ' + measuredValues.geomean + ' ms\n';
                 results += 'Total : ' + measuredValues.total + ' ms\n';
    +            results += 'Score : ' + measuredValues.score + ' rpm\n';

                 if (!results)
  • trunk/PerformanceTests/Speedometer/resources/benchmark-runner.js

    r221106 → r221659

         if (!state) {
             state = new BenchmarkState(this._suites);
    -        this._measuredValues = {tests: {}, total: 0};
    +        this._measuredValues = {tests: {}, total: 0, mean: NaN, geomean: NaN, score: NaN};
         }

    …

         if (state.isFirstTest()) {
             this._removeFrame();
    -        this._masuredValuesForCurrentSuite = {};
             var self = this;
             return state.prepareCurrentSuite(this, this._appendFrame()).then(function (prepareReturnValue) {

    …

                 suiteResults.tests[test.name] = {tests: {'Sync': syncTime, 'Async': asyncTime}, total: total};
                 suiteResults.total += total;
    -            self._measuredValues.total += total;

                 if (self._client && self._client.didRunTest)

    …

         this._removeFrame();

    -    if (this._client && this._client.didRunSuites)
    +    if (this._client && this._client.didRunSuites) {
    +        var product = 1;
    +        var values = [];
    +        for (var suiteName in this._measuredValues.tests) {
    +            var suiteTotal = this._measuredValues.tests[suiteName].total;
    +            product *= suiteTotal;
    +            values.push(suiteTotal);
    +        }
    +
    +        values.sort(function (a, b) { return a - b; }); // Sort ascending to avoid loss of significance in the sum.
    +        var total = values.reduce(function (a, b) { return a + b; });
    +        var geomean = Math.pow(product, 1 / values.length);
    +
    +        var correctionFactor = 3; // This factor makes the test score fit reasonably within 0 to 140.
    +        this._measuredValues.total = total;
    +        this._measuredValues.mean = total / values.length;
    +        this._measuredValues.geomean = geomean;
    +        this._measuredValues.score = 60 * 1000 / geomean / correctionFactor;
             this._client.didRunSuites(this._measuredValues);
    +    }

         if (this._runNextIteration)
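
With this change, a didRunSuites client receives the fully populated measuredValues object rather
than a bare running total. For a rough sense of its shape and magnitudes (the numbers below are
illustrative only, assuming 16 suites, not actual measurements):

    // Illustrative shape of the object now passed to didRunSuites.
    var measuredValues = {
        tests: { /* per-suite results keyed by suite name */ },
        total: 6500,      // sum of the per-suite subtotals, in ms
        mean: 406.25,     // arithmetic mean: 6500 / 16
        geomean: 250,     // geometric mean of the subtotals (illustrative)
        score: 80         // 60 * 1000 / 250 / 3
    };
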
  • trunk/PerformanceTests/Speedometer/resources/main.js

    r221118 → r221659

         displayUnit: 'runs/min',
         iterationCount: 10,
    -    testsCount: null,
    +    stepCount: null,
         suitesCount: null,
    -    _timeValues: [],
    +    _measuredValuesList: [],
         _finishedTestCount: 0,
         _progressCompleted: null,

    …

         },
         willRunTest: function (suite, test) {
    -        document.getElementById('info').textContent = suite.name + ' ( ' + this._finishedTestCount + ' / ' + this.testsCount + ' )';
    +        document.getElementById('info').textContent = suite.name + ' ( ' + this._finishedTestCount + ' / ' + this.stepCount + ' )';
         },
         didRunTest: function () {
             this._finishedTestCount++;
    -        this._progressCompleted.style.width = (this._finishedTestCount * 100 / this.testsCount) + '%';
    +        this._progressCompleted.style.width = (this._finishedTestCount * 100 / this.stepCount) + '%';
         },
         didRunSuites: function (measuredValues) {
    -        this._timeValues.push(measuredValues.total);
    +        this._measuredValuesList.push(measuredValues);
         },
         willStartFirstIteration: function () {
    -        this._timeValues = [];
    +        this._measuredValuesList = [];
             this._finishedTestCount = 0;
             this._progressCompleted = document.getElementById('progress-completed');

    …

             document.getElementById('logo-link').onclick = null;

    -        var results = this._computeResults(this._timeValues, this.displayUnit);
    +        var results = this._computeResults(this._measuredValuesList, this.displayUnit);

             this._updateGaugeNeedle(results.mean);

    …

                 showResultsSummary();
         },
    -    _computeResults: function (timeValues, displayUnit) {
    +    _computeResults: function (measuredValuesList, displayUnit) {
             var suitesCount = this.suitesCount;
    -        function totalTimeInDisplayUnit(time) {
    +        function valueForUnit(measuredValues) {
                 if (displayUnit == 'ms')
    -                return time;
    -            return computeScore(time);
    +                return measuredValues.geomean;
    +            return measuredValues.score;
             }

    …

             }

    -        var values = timeValues.map(totalTimeInDisplayUnit);
    +        var values = measuredValuesList.map(valueForUnit);
             var sum = values.reduce(function (a, b) { return a + b; }, 0);
             var arithmeticMean = sum / values.length;

    …

             return {
    -            formattedValues: timeValues.map(function (time) {
    -                return toSigFigPrecision(totalTimeInDisplayUnit(time), 4) + ' ' + displayUnit;
    +            formattedValues: values.map(function (value) {
    +                return toSigFigPrecision(value, 4) + ' ' + displayUnit;
                 }),
                 mean: arithmeticMean,

    …

         var enabledSuites = Suites.filter(function (suite) { return !suite.disabled; });
    -    var totalSubtestCount = enabledSuites.reduce(function (testsCount, suite) { return testsCount + suite.tests.length; }, 0);
    -    benchmarkClient.testsCount = benchmarkClient.iterationCount * totalSubtestCount;
    +    var totalSubtestsCount = enabledSuites.reduce(function (testsCount, suite) { return testsCount + suite.tests.length; }, 0);
    +    benchmarkClient.stepCount = benchmarkClient.iterationCount * totalSubtestsCount;
         benchmarkClient.suitesCount = enabledSuites.length;
         var runner = new BenchmarkRunner(Suites, benchmarkClient);

    …

         return true;
    -}
    -
    -function computeScore(time) {
    -    return 60 * 1000 * benchmarkClient.suitesCount / time;
     }