jakirkham / dask-distance / 530

Coverage: 100%
master: 100%

LAST BUILD BRANCH: fix_license_typo
DEFAULT BRANCH: master
Ran: 08 Oct 2017 12:35AM UTC
Jobs: 4
Files: 5
Run time: 54s
Build 530 · push · travis-ci · pending completion

jakirkham
Optimize pdist's custom metrics on diagonal chunks

When handling a custom metric with `pdist`, we can optimize this case
further to avoid unneeded computations in chunks along the diagonal. In
particular, we can keep track of the global indices corresponding to the
larger Dask Array. Through these indices, we know where individual points
in our chunk map to in the larger Dask Array.
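
As a rough sketch of the idea (the array and variable names here are
illustrative, not dask-distance's actual internals), each chunk's global
starting row can be recovered from the Dask Array's chunk sizes:

```python
import numpy as np
import dask.array as da

# Illustrative only: recover each chunk's global starting row from the
# Dask Array's chunk sizes, so a point inside a chunk can be mapped back
# to its position in the larger array.
X = da.random.random((8, 3), chunks=(3, 3))

row_chunk_sizes = X.chunks[0]                       # (3, 3, 2)
row_offsets = np.concatenate(([0], np.cumsum(row_chunk_sizes)[:-1]))
# row_offsets -> array([0, 3, 6]); local row i of chunk k corresponds to
# global row row_offsets[k] + i in the full Dask Array.
```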

Using this knowledge of the global indices, we can check where individual
points in each chunk lie relative to the diagonal. If a point lies above
the diagonal in the Dask Array, we simply compute it as normal. However,
if the point lies on or below the diagonal, we can skip it, as it will be
dropped from the final result anyway. Thus, when using `pdist` with a
custom metric, we only compute the entries at indices we will keep.
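
A minimal sketch of this check, assuming hypothetical names
(`_pdist_block`, `i_offset`, `j_offset`) rather than dask-distance's real
helpers:

```python
import numpy as np

def _pdist_block(metric, X_block, Y_block, i_offset, j_offset):
    """Hypothetical helper: compute one block of pairwise distances with a
    custom metric, skipping entries on or below the diagonal of the full
    array since they are dropped from the condensed `pdist` result."""
    result = np.zeros((len(X_block), len(Y_block)), dtype=float)
    for i in range(len(X_block)):
        for j in range(len(Y_block)):
            # Map local block indices back to global indices.
            gi = i_offset + i
            gj = j_offset + j
            if gj > gi:
                # Strictly above the diagonal: compute as normal.
                result[i, j] = metric(X_block[i], Y_block[j])
            # On or below the diagonal: skip; this entry would be
            # tossed from the final result anyway.
    return result
```

For a block lying strictly above the diagonal, every pair passes the
`gj > gi` test, so the check only does any skipping on diagonal blocks.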

If for some reason Dask tries to compute a chunk that we intend to toss
completely, adding this check means we simply allocate the memory needed
for that chunk and then verify that nothing from it will be kept. So this
is a nice failsafe optimization to have. That said, Dask should not be
computing those chunks at all.

There is still some extra cost from allocating a full NumPy array for
chunks on the diagonal. After all, we technically don't need any of the
entries that lie on or below the diagonal, as they will simply be tossed.
However, it is difficult to optimize away this memory cost without
significantly redesigning how computations are structured in `pdist`.
Since this is just the cost of allocating memory and not extra
computation, it is hard to justify such an optimization against the
likely greater maintenance cost and chance for errors.

28 of 28 new or added lines in 2 files covered. (100.0%)

310 of 310 relevant lines covered (100.0%)

3.97 hits per line

Jobs
ID | Job ID              | Ran                     | Files | Coverage
1  | 530.1 (PYVER="3.6") | 08 Oct 2017 12:36AM UTC | 0     | 100.0%
2  | 530.2 (PYVER="3.5") | 08 Oct 2017 12:35AM UTC | 0     | 100.0%
3  | 530.3 (PYVER="3.4") | 08 Oct 2017 12:36AM UTC | 0     | 100.0%
4  | 530.4 (PYVER="2.7") | 08 Oct 2017 12:36AM UTC | 0     | 97.42%
  • Travis Build #530
  • Commit 1af8daad on GitHub