dask / dask / 15781
53%

Ran 27 Oct 2020 04:16AM UTC

Jobs 1

Files 113

Run time 17s

Badge

pending completion

Build # 15781

Build Type

push

travis-ci

Committed by

Commit Message

Begin experimenting with parallel prefix scan for cumsum and cumprod (#6675)

* Begin experimenting with parallel prefix scan for cumsum and cumprod in dask.array

This is a WIP and needs benchmarked. I think it's interesting, though, and want to share.
It's been a while since I've worked on dask.array, so feedback is most welcome.

This is a work-efficient parallel prefix scan. It uses a Brent-Kung construction and
is known as the Blelloch algorithm. We adapt it to work on chunks.

Previously, to do a cumsum across N chunks would require N levels of dependencies.
This PR takes approximately 2 * lg(N) levels of dependencies. It exposes parallelism.
It is work-efficient and only requires a third more tasks than the previous method.
Scans on floating point values should also be more accurate.

A parallel cumsum works by first taking the sum of each block, then do a binary tree
merge followed by a fan-out (i.e., the Brent-Kung pattern). We then take the cumsum
of each block and add the sum of the previous blocks.

NumPy calculates cumsum and cumprod very fast, but it calculates sum and prod
significantly faster. This is why I think this approach will be faster.
Exposing parallelism and an efficient communication pattern is another reason I think
this should be faster (especially when communication costs are significant).

I also think this will be an interesting test for `dask.order` and the scheduler.

Q: Should we allow users to choose which method to use (i.e., prev or new in this PR)?
Does the answer to this depend on benchmarks?

Benchmarks and graph diagrams are forthcoming :)

* Choose cumsum/cumprod with `method=` keyword argument.

Current choices are "sequential", "blelloch", and "blelloch-split".
Default is "sequential". I need to document these.

* Add docstrings for "blelloch" method for cumsum/cumprod

Coverage Stats

80 of 80 new or added lines in 2 files covered. (100.0%)

21241 of 22596 relevant lines covered (94.0%)

0.94 hits per line

Jobs

ID	Job ID	Ran	Files	Coverage
3	15781.3 (PYTHON_VERSION=3.8 ENV_FILE=continuous_integration/environment-3.8.yaml TEST='true' COVERAGE='true' PARALLEL='false' XTRATESTARGS= TEST_IMPORTS='true' NUMPY_EXPERIMENTAL_ARRAY_FUNCTION='1')	27 Oct 2020 04:16AM UTC	0	94.0	Travis Job 15781.3

Source Files on build 15781

Detailed source file information is not available for this build.