dask / dask / 15784 / 3

Ran 27 Oct 2020 07:18AM UTC

Files 113

Run time 15s

Committed 26 Oct 2020 07:39PM UTC coverage: 94.003%. Remained the same

Job # PYTHON_VERSION=3.8 ENV_FILE=continuous_integration/environment-3.8.yaml TEST='true' COVERAGE='true' PARALLEL='false' XTRATESTARGS= TEST_IMPORTS='true' NUMPY_EXPERIMENTAL_ARRAY_FUNCTION='1'

Build Type

cron

travis-ci

Committed by

Commit Message

Begin experimenting with parallel prefix scan for cumsum and cumprod (#6675)

* Begin experimenting with parallel prefix scan for cumsum and cumprod in dask.array

This is a WIP and needs to be benchmarked. I think it's interesting, though, and want to share.
It's been a while since I've worked on dask.array, so feedback is most welcome.

This is a work-efficient parallel prefix scan. It uses a Brent-Kung construction and
is known as the Blelloch algorithm. We adapt it to work on chunks.

Previously, a cumsum across N chunks required N levels of dependencies.
This PR needs approximately 2 * lg(N) levels of dependencies, which exposes parallelism.
It is work-efficient and requires only a third more tasks than the previous method.
Scans on floating-point values should also be more accurate, since a tree-shaped
reduction accumulates less rounding error than one long sequential chain.
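
To give a feel for the depth figures above, here is a rough sketch that compares the two
dependency depths. The function names are hypothetical and not part of dask, and the
tree depth uses the approximate 2 * lg(N) figure quoted above, not a count taken from
the actual task graph.

```python
def sequential_depth(n_chunks):
    # Each chunk waits on the one before it, so the chain is N levels deep.
    return n_chunks


def tree_scan_depth(n_chunks):
    # Approximation only: one lg(N) pass up the tree and one back down.
    # (n - 1).bit_length() is an integer-exact ceil(log2(n)) for n >= 1.
    return 2 * (n_chunks - 1).bit_length()


for n in (8, 64, 1024):
    print(n, sequential_depth(n), tree_scan_depth(n))
```

With 1024 chunks the sequential chain is 1024 levels deep, while the approximate tree
depth is 20.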

A parallel cumsum works by first taking the sum of each block, then doing a binary tree
merge followed by a fan-out (i.e., the Brent-Kung pattern). We then take the cumsum
of each block and add the sum of all the previous blocks.
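
The steps above can be sketched in plain Python. This is a minimal illustration on
lists, not the code in this PR: the names `blelloch_exclusive_scan` and
`blockwise_cumsum` are hypothetical, it runs serially, and it pads the block totals to
a power of two for simplicity, which the real implementation may not do.

```python
import operator
from itertools import accumulate


def blelloch_exclusive_scan(values, op=operator.add, identity=0):
    """Exclusive prefix scan: a tree merge (up-sweep) then a fan-out (down-sweep)."""
    n = len(values)
    if n == 0:
        return []
    size = 1
    while size < n:
        size *= 2
    # Pad with the identity so the tree is a complete binary tree.
    tree = list(values) + [identity] * (size - n)

    # Up-sweep: each level combines pairs of partial totals.
    step = 1
    while step < size:
        for i in range(2 * step - 1, size, 2 * step):
            tree[i] = op(tree[i - step], tree[i])
        step *= 2

    # Down-sweep: push the "everything to my left" prefix back down the tree.
    tree[size - 1] = identity
    step = size // 2
    while step >= 1:
        for i in range(2 * step - 1, size, 2 * step):
            left_total = tree[i - step]
            tree[i - step] = tree[i]            # left child inherits the prefix
            tree[i] = op(tree[i], left_total)   # right child adds the left total
        step //= 2
    return tree[:n]


def blockwise_cumsum(blocks):
    # 1. Sum each block (independent, so parallel in a real task graph).
    totals = [sum(block) for block in blocks]
    # 2. Scan the totals to get the sum of all previous blocks for each block.
    offsets = blelloch_exclusive_scan(totals)
    # 3. Take the cumsum of each block and add its offset.
    return [
        [offset + x for x in accumulate(block)]
        for block, offset in zip(blocks, offsets)
    ]


result = blockwise_cumsum([[1, 2], [3, 4, 5], [6]])
```

Within each level of the up-sweep and down-sweep the loop iterations touch disjoint
entries, which is what lets a scheduler run them concurrently.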

NumPy calculates cumsum and cumprod very fast, but it calculates sum and prod
significantly faster. This is why I think this approach will be faster.
Exposing parallelism and an efficient communication pattern is another reason I think
this should be faster (especially when communication costs are significant).

I also think this will be an interesting test for `dask.order` and the scheduler.

Q: Should we allow users to choose which method to use (i.e., the previous one or the new
one in this PR)? Does the answer depend on benchmarks?

Benchmarks and graph diagrams are forthcoming :)

* Choose cumsum/cumprod with `method=` keyword argument.

Current choices are "sequential", "blelloch", and "blelloch-split".
Default is "sequential". I need to document these.

* Add docstrings for "blelloch" method for cumsum/cumprod

Coverage Stats

21241 of 22596 relevant lines covered (94.0%)

0.94 hits per line

Source Files on job 15784.3 (PYTHON_VERSION=3.8 ENV_FILE=continuous_integration/environment-3.8.yaml TEST='true' COVERAGE='true' PARALLEL='false' XTRATESTARGS= TEST_IMPORTS='true' NUMPY_EXPERIMENTAL_ARRAY_FUNCTION='1')