google / benchmark / 1605 / 1
92%
master: 92%

Ran 29 May 2018 10:29AM UTC

Files 36

Run time 2s

Committed 29 May 2018 10:13AM UTC coverage: 87.181%. Remained the same

Job # COMPILER=g++ C_COMPILER=gcc BUILD_TYPE=Coverage

Build Type

push

travis-ci

Commit Message

Benchmarking is hard. Making sense of the benchmarking results is even harder. (#593)

The first problem you have to solve yourself. The second one can be aided.
The benchmark library can compute some statistics over the repetitions,
which helps with grasping the results somewhat.
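As a rough illustration of what such per-repetition aggregates look like, here is a minimal Python sketch using only the standard library. The timing values and the `summarize` helper are hypothetical; this is not the library's output format or API.

```python
import statistics

def summarize(times):
    """Return mean, median and sample standard deviation of repetition timings."""
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stddev": statistics.stdev(times),
    }

# Hypothetical wall-clock times (ns) from five repetitions of one benchmark.
repetitions = [102.0, 98.0, 101.0, 99.0, 100.0]
summary = summarize(repetitions)
print(summary)
```

These numbers describe the spread within one run only; they say nothing yet about how a second run compares.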

But that covers only a single set of results. It does not really help with comparing
two benchmark results, which is the interesting bit. Thankfully, there are
the bundled `tools/compare.py` and `tools/compare_bench.py` scripts.

They can provide a diff between two benchmarking results. Yay!
Except not really: it's just a diff. While it is very informative and better than
nothing, it does not really help answer The Question: am I just looking at noise?
It's like not having these per-benchmark statistics...

Roughly, we can formulate the question as:
> Are these two benchmarks the same?
> Did my change actually change anything, or is the difference below the noise level?

Well, this really sounds like a [null hypothesis](https://en.wikipedia.org/wiki/Null_hypothesis), does it not?
So maybe we can use statistics here, and solve all our problems?
lol, no, it won't solve all the problems. But maybe it will act as a tool,
to better understand the output, just like the usual statistics on the repetitions...

I'm making an assumption here that most people care about the change
in the average value, not the standard deviation. Thus I believe we can use a t-test,
be it either [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) or [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test).
**EDIT**: however, after @dominichamon's review, it was decided that it is better
to use the more robust [Mann–Whitney U test](https://en.wikipedia.org/wiki/Mann–Whitney_U_test).
I'm using [scipy.stats.mannwhitneyu](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html#scipy.stats.mannwhitneyu).
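To make the idea concrete, here is a minimal, standard-library-only sketch of a two-sided Mann–Whitney U test. It is a stand-in for illustration, not the code in this commit (which calls `scipy.stats.mannwhitneyu`): the `mann_whitney_u` helper and the timings are hypothetical, and the p-value uses the plain normal approximation with no tie or continuity correction, so it is only a rough guide for small repetition counts.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, two-sided p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    # Rank the pooled samples; tied values share their average rank.
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * (n1 + n2)
    pos = 0
    while pos < len(pooled):
        end = pos
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[pooled[k][1]] = avg_rank
        pos = end + 1
    # U for the first sample from its rank sum; the other U follows.
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    # Under the null hypothesis, U is approximately normal.
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p

# Hypothetical repetition timings (ns) for the same benchmark, before and after a change.
baseline = [100.0, 101.0, 99.0, 102.0, 98.0]
contender = [110.0, 111.0, 109.0, 112.0, 108.0]
u, p = mann_whitney_u(baseline, contender)
print(f"U = {u}, p = {p:.4f}")
```

A small p-value suggests the two sets of repetitions are unlikely to come from the same distribution, i.e. the difference is probably not just noise; a large one means the diff should be read with suspicion.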

There are two new use... (continued)

Run Details

1503 of 1724 relevant lines covered (87.18%)

5006333.93 hits per line
