
google / benchmark / build 1605
Coverage: 92%

Build on default branch: master
Ran 29 May 2018 10:29AM UTC · Jobs: 1 · Files: 36 · Run time: 3s
Build #1605 · push · travis-ci · dominichamon · pending completion

Benchmarking is hard. Making sense of the benchmarking results is even harder. (#593)

The first problem you have to solve yourself. The second one can be aided.
The benchmark library can compute some statistics over the repetitions,
which helps with grasping the results somewhat.
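
For concreteness, the aggregates in question are just summary statistics over the per-repetition timings. A tiny sketch of the same math in Python, on made-up numbers (the library itself computes these internally, in C++):

```python
# Illustrative only: hypothetical per-repetition real_time samples, in ns.
# The benchmark library reports these same aggregates over N repetitions.
import statistics

repetition_times_ns = [102.1, 99.8, 101.5, 100.2, 103.0]

print("mean:  ", statistics.mean(repetition_times_ns))
print("median:", statistics.median(repetition_times_ns))
print("stddev:", statistics.stdev(repetition_times_ns))
```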

But that only covers a single set of results. It does not really help with comparing
two benchmark results, which is the interesting bit. Thankfully, there are
the bundled `tools/compare.py` and `tools/compare_bench.py` scripts.
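
Roughly, they are invoked like so (the paths and the mode name are illustrative, from memory; check each script's `--help` for the exact interface):

```
tools/compare_bench.py ./old_results.json ./new_results.json
tools/compare.py benchmarks ./old_results.json ./new_results.json
```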

They can provide a diff between two benchmarking results. Yay!
Except not really: it's just a diff. While that is very informative and better than
nothing, it does not really help answer The Question - am I just looking at noise?
It's like not having the per-benchmark statistics at all...

Roughly, we can formulate the question as:
> Are these two benchmarks the same?
> Did my change actually change anything, or is the difference below the noise level?

Well, this really sounds like a [null hypothesis](https://en.wikipedia.org/wiki/Null_hypothesis), does it not?
So maybe we can use statistics here and solve all our problems?
lol, no, it won't solve all the problems. But maybe it can act as a tool
to better understand the output, just like the usual statistics on the repetitions...
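
To spell out the framing (this formulation is mine, not from any docs):

```python
# Hypothesis-testing framing, spelled out. alpha = 0.05 is just the usual
# convention, not something mandated by the benchmark tooling.
#   H0: both runs sample the same timing distribution (difference is noise)
#   H1: they do not (the change had a real effect)
ALPHA = 0.05

def looks_like_a_real_change(p_value: float, alpha: float = ALPHA) -> bool:
    # Reject H0 only if results at least this extreme would occur
    # less than `alpha` of the time when H0 is true.
    return p_value < alpha
```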

I'm making an assumption here that most people care about the change in the
average value, not in the standard deviation. Thus I believe we can use a t-test,
be it either [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) or [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test).
**EDIT**: however, after @dominichamon's review, it was decided that it is better
to use the more robust [Mann–Whitney U test](https://en.wikipedia.org/wiki/Mann–Whitney_U_test).
I'm using [scipy.stats.mannwhitneyu](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html#scipy.stats.mannwhitneyu).
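
A minimal sketch of how that call looks (the sample data is made up; in the real tool the inputs are the measured per-repetition timings):

```python
from scipy import stats

# Hypothetical per-repetition timings for the baseline and the contender.
old_times = [10.2, 10.4, 10.3, 10.5, 10.1, 10.3, 10.4, 10.2, 10.3]
new_times = [ 9.8,  9.9, 10.0,  9.7,  9.9, 10.1,  9.8, 10.0,  9.9]

# Mann-Whitney U is nonparametric: it does not assume the timings are
# normally distributed, which benchmark noise rarely is.
u_stat, p_value = stats.mannwhitneyu(old_times, new_times, alternative='two-sided')
print("Mann-Whitney p-value:", p_value)

# For contrast, Welch's t-test (the parametric option mentioned above):
t_stat, t_pvalue = stats.ttest_ind(old_times, new_times, equal_var=False)
print("Welch's t-test p-value:", t_pvalue)
```

A small p-value then suggests the observed difference is unlikely to be noise alone.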

There are two new use... (continued)

1503 of 1724 relevant lines covered (87.18%)

5006333.93 hits per line

Jobs

ID  Job ID                                                      Ran                      Files  Coverage
1   1605.1 (COMPILER=g++ C_COMPILER=gcc BUILD_TYPE=Coverage)    29 May 2018 10:29AM UTC  0      87.18%

Travis Job 1605.1

Source Files on build 1605
Detailed source file information is not available for this build.
  • Travis Build #1605
  • Commit a6a1b0d7 on GitHub
  • Prev Build on master (#1601)
  • Next Build on master (#1613)