google / benchmark / 1605 / 1
master: 92%

Build:
DEFAULT BRANCH: master
Ran 29 May 2018 10:29AM UTC
Files 36
Run time 2s

29 May 2018 10:13AM UTC coverage: 87.181%. Remained the same
COMPILER=g++ C_COMPILER=gcc BUILD_TYPE=Coverage

push · travis-ci · dominichamon

Benchmarking is hard. Making sense of the benchmarking results is even harder. (#593)

The first problem you have to solve yourself. The second one can be aided.
The benchmark library can compute some statistics over the repetitions,
which helps with grasping the results somewhat.
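
As a rough illustration of those per-repetition statistics, here is a minimal Python sketch that reduces a list of repetition timings to a mean, median and standard deviation. It assumes the sample (n - 1) standard deviation. The helper name `summarize_repetitions` and the timings are made up for illustration and are not taken from the library.

```python
import statistics

def summarize_repetitions(times_ns):
    """Hypothetical helper: aggregate per-repetition timings (e.g. in ns)."""
    return {
        "mean": statistics.mean(times_ns),
        "median": statistics.median(times_ns),
        # Sample standard deviation (divides by n - 1); an assumption here.
        "stddev": statistics.stdev(times_ns),
    }

# Five made-up repetitions of the same benchmark.
summary = summarize_repetitions([10.0, 12.0, 11.0, 13.0, 9.0])
print(summary)
```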

But that only covers a single set of results. It does not really help compare
two benchmark results, which is the interesting bit. Thankfully, there are
the bundled `tools/compare.py` and `tools/compare_bench.py` scripts.

They can provide a diff between two benchmark results. Yay!
Except not really: it's just a diff. While it is very informative and better than
nothing, it does not really help answer The Question: am I just looking at noise?
It's like not having those per-benchmark statistics...
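
To make "it's just a diff" concrete, the comparison boils down to a per-benchmark relative change. The sketch below assumes the change is computed as `(new - old) / |old|`. The helper name `relative_change` is hypothetical and not necessarily how `tools/compare.py` spells it.

```python
def relative_change(old, new):
    """Hypothetical helper: relative change of `new` versus `old`.

    Assumes the (new - old) / |old| convention; a negative value means
    the new run is faster.
    """
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / abs(old)

# Two runs of the *same* code can still differ by a few percent.
print(f"{relative_change(100.0, 97.0):+.2%}")  # prints -3.00%
```

The number alone does not say whether -3% is a real speedup or run-to-run noise. Answering that needs the spread of the repetitions as well.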

Roughly, we can formulate the question as:
> Are these two benchmarks the same?
> Did my change actually change anything, or is the difference below the noise level?

Well, this really sounds like a [null hypothesis](https://en.wikipedia.org/wiki/Null_hypothesis) test, does it not?
So maybe we can use statistics here and solve all our problems?
lol, no, it won't solve all the problems. But maybe it can act as a tool
to better understand the output, just like the usual statistics on the repetitions...
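
Stated a bit more formally, this is a generic sketch of a two-sample significance test. The threshold $\alpha$ is the experimenter's choice; 0.05 is only a common convention.

$$
\begin{aligned}
H_0 &: \text{the old and new repetition timings are drawn from the same distribution} \\
p &= \Pr\left(\text{a test statistic at least as extreme as the observed one} \mid H_0\right) \\
&\text{reject } H_0 \text{ (the change looks real) if } p < \alpha\text{, otherwise treat the difference as noise}
\end{aligned}
$$

Failing to reject $H_0$ does not prove the two benchmarks are identical. It only means this data cannot tell them apart.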

I'm assuming here that most people care about a change in the average value,
not in the standard deviation. Thus I believe we can use a t-test,
either [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) or [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test).
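
For reference, here is roughly what the t-test route would compute: a minimal Python sketch of Welch's t statistic with the Welch–Satterthwaite degrees of freedom. It compares the two means, scaled by their standard errors, which is why it fits the "change of average value" assumption. The name `welch_t` is hypothetical. Turning `t` and `df` into a p-value needs the Student t distribution (e.g. from SciPy), which is left out here.

```python
import math
import statistics

def welch_t(a, b):
    """Hypothetical helper: Welch's t statistic and degrees of freedom.

    `a` and `b` are per-repetition timings of the old and new benchmark.
    Each needs at least two samples, and they must not all be identical.
    """
    n1, n2 = len(a), len(b)
    # Squared standard errors of each mean (sample variance / n).
    s1 = statistics.variance(a) / n1
    s2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(s1 + s2)
    # Welch–Satterthwaite approximation of the degrees of freedom.
    df = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
print(t, df)  # prints -1.0 8.0
```

A single outlier repetition inflates both the mean and the variance here. That sensitivity is one reason a rank-based test can be the more robust choice.
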

**EDIT**: however, after @dominichamon's review, it was decided that it is better
to use the more robust [Mann–Whitney U test](https://en.wikipedia.org/wiki/Mann–Whitney_U_test),
which compares the rank ordering of the two samples rather than their means.
I'm using [scipy.stats.mannwhitneyu](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html#scipy.stats.mannwhitneyu).
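
Here is a dependency-free sketch of what the Mann–Whitney U test computes, for readers without SciPy at hand. It ranks both samples together, derives U from the rank sum of the first sample, and gets a two-sided p-value from the normal approximation with tie and continuity corrections. This is an illustration, not the code in `tools/`. The names `mann_whitney_u` and `_average_ranks` are hypothetical, and the normal approximation is only rough for very few repetitions.

```python
import math

def _average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Hypothetical helper: two-sided Mann-Whitney U test (normal approx.).

    Both samples must be non-empty. Returns (U for `x`, p-value).
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    n = n1 + n2
    ranks = _average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    # Use the larger U with a continuity correction, as in the usual asymptotic test.
    z = (max(u1, u2) - mu - 0.5) / sigma
    p = math.erfc(z / math.sqrt(2))  # equals 2 * P(Z > z) for standard normal Z
    return u1, min(p, 1.0)


old = [10.1, 10.3, 9.9, 10.2, 10.0]  # made-up timings, ms
new = [9.1, 9.3, 9.0, 9.2, 9.4]
u, p = mann_whitney_u(old, new)
print(u, p, "different" if p < 0.05 else "noise")
```

With SciPy, the comparable call would be `scipy.stats.mannwhitneyu(old, new, alternative='two-sided')`. Since SciPy 1.7 it may pick an exact method for small samples without ties, so passing `method='asymptotic'` should give the same normal-approximation p-value as this sketch.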

There are two new use... (continued)

1503 of 1724 relevant lines covered (87.18%)

5006333.93 hits per line

Source Files on job 1605.1 (COMPILER=g++ C_COMPILER=gcc BUILD_TYPE=Coverage)
Commit: a6a1b0d7 on GitHub