• Home
  • Features
  • Pricing
  • Docs
  • Announcements
  • Sign In

dask / dask / 15784 / 3
53%
master: 53%

Build:
DEFAULT BRANCH: master
Ran 27 Oct 2020 07:18AM UTC
Files 113
Run time 15s
Badge
Embed ▾
README BADGES
x

If you need to use a raster PNG badge, change the '.svg' to '.png' in the link

Markdown

Textile

RDoc

HTML

Rst

26 Oct 2020 07:39PM UTC coverage: 94.003%. Remained the same
PYTHON_VERSION=3.8 ENV_FILE=continuous_integration/environment-3.8.yaml TEST='true' COVERAGE='true' PARALLEL='false' XTRATESTARGS= TEST_IMPORTS='true' NUMPY_EXPERIMENTAL_ARRAY_FUNCTION='1'

cron

travis-ci

web-flow
Begin experimenting with parallel prefix scan for cumsum and cumprod (#6675)

* Begin experimenting with parallel prefix scan for cumsum and cumprod in dask.array

This is a WIP and needs benchmarked.  I think it's interesting, though, and want to share.
It's been a while since I've worked on dask.array, so feedback is most welcome.

This is a work-efficient parallel prefix scan.  It uses a Brent-Kung construction and
is known as the Blelloch algorithm.  We adapt it to work on chunks.

Previously, to do a cumsum across N chunks would require N levels of dependencies.
This PR takes approximately 2 * lg(N) levels of dependencies.  It exposes parallelism.
It is work-efficient and only requires a third more tasks than the previous method.
Scans on floating point values should also be more accurate.

A parallel cumsum works by first taking the sum of each block, then do a binary tree
merge followed by a fan-out (i.e., the Brent-Kung pattern).  We then take the cumsum
of each block and add the sum of the previous blocks.

NumPy calculates cumsum and cumprod very fast, but it calculates sum and prod
significantly faster.  This is why I think this approach will be faster.
Exposing parallelism and an efficient communication pattern is another reason I think
this should be faster (especially when communication costs are significant).

I also think this will be an interesting test for `dask.order` and the scheduler.

Q: Should we allow users to choose which method to use (i.e., prev or new in this PR)?
Does the answer to this depend on benchmarks?

Benchmarks and graph diagrams are forthcoming :)

* Choose cumsum/cumprod with `method=` keyword argument.

Current choices are "sequential", "blelloch", and "blelloch-split".
Default is "sequential".  I need to document these.

* black

* Add docstrings for "blelloch" method for cumsum/cumprod

21241 of 22596 relevant lines covered (94.0%)

0.94 hits per line

Source Files on job 15784.3 (PYTHON_VERSION=3.8 ENV_FILE=continuous_integration/environment-3.8.yaml TEST='true' COVERAGE='true' PARALLEL='false' XTRATESTARGS= TEST_IMPORTS='true' NUMPY_EXPERIMENTAL_ARRAY_FUNCTION='1')
  • Tree
  • List 0
  • Changed 0
  • Source Changed 0
  • Coverage Changed 0
Coverage ∆ File Lines Relevant Covered Missed Hits/Line
  • Back to Build 13380
  • Travis Job 15784.3
  • a8329c20 on github
  • Prev Job for PYTHON_VERSION=3.8 ENV_FILE=continuous_integration/environment-3.8.yaml TEST='true' COVERAGE='true' PARALLEL='false' XTRATESTARGS= TEST_IMPORTS='true' NUMPY_EXPERIMENTAL_ARRAY_FUNCTION='1' on master (#15781.3)
  • Next Job for PYTHON_VERSION=3.8 ENV_FILE=continuous_integration/environment-3.8.yaml TEST='true' COVERAGE='true' PARALLEL='false' XTRATESTARGS= TEST_IMPORTS='true' NUMPY_EXPERIMENTAL_ARRAY_FUNCTION='1' on master (#15785.3)
STATUS · Troubleshooting · Open an Issue · Sales · Support · CAREERS · ENTERPRISE · START FREE · SCHEDULE DEMO
ANNOUNCEMENTS · TWITTER · TOS & SLA · Supported CI Services · What's a CI service? · Automated Testing

© 2026 Coveralls, Inc