SciCrunch / sparc-curation / 552 / 2
Build (default branch: master)
Ran 22 May 2020 06:04AM UTC
Files: 33
Run time: 3s

22 May 2020 05:59AM UTC coverage: 41.06% (+21.2%) from 19.902%
Env: SCIGRAPH_API=https://scicrunch.org/api/1/sparc-scigraph SCICRUNCH_API_KEY=[secure]
Trigger: push (travis-ci)
Author: tgbugs
Commit: 08210c70

massive improvements in spc clone time, setup.py ver and dep bumps

A fresh pull of all 200 remote datasets now takes about 3 minutes.

NOTE: `spc pull` should NOT BE USED unless you know exactly what
you are doing. In the future this functionality will be restored
with better performance, but for now it is almost always faster to
delete the contents of the dataset folder and express `ds.rchildren`.
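
A minimal sketch of that workaround, assuming `ds` is a dataset cache
object whose `rchildren` generator pulls each remote child as it is
consumed; the helper name and path handling are illustrative, not part
of `spc`:

```python
import shutil
from pathlib import Path

def refresh_dataset(ds, dataset_path):
    # clear the existing local contents of the dataset folder
    for child in Path(dataset_path).iterdir():
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
    # expressing the rchildren generator repopulates the tree
    list(ds.rchildren)
```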

It only took me about 9 months to finally figure out that I had
actually fixed many of the pulling performance bottlenecks and that we
can almost entirely get rid of the current implementation of pull.

As it turns out I got almost everything sorted out so that it is
possible to just call `list(dataset_cache.rchildren)` and the entire
tree will populate itself. When we fix the cache constructor
this becomes `[rc.materialize() for rc in d.rchildren]` or similar,
depending on exactly what we name that method. Better yet, if we do
it using a bare for loop then the memory overhead will be zero.
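
A rough sketch of the two forms, where `dataset_cache` stands in for a
dataset cache instance and `materialize` is only the provisional method
name discussed above:

```python
def populate(dataset_cache):
    # bare for loop: nothing is retained, so memory overhead stays near zero
    for rc in dataset_cache.rchildren:
        pass  # each step pulls one remote node into the local tree

def populate_with_handles(dataset_cache):
    # comprehension over the (provisionally named) materialize method;
    # keeps a handle to every cache node, trading memory for direct access
    return [rc.materialize() for rc in dataset_cache.rchildren]
```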

The other piece that makes this faster is the completed sparse pull
implementation. We now use the remote package count, with a default
cutoff of 10k packages, to mark a dataset as sparse, meaning that
only its metadata files and their parent directories are pulled. The
implementation of that is a bit slow, but still about 2 orders of
magnitude faster than the alternative. The approach for implementing
is_sparse also points the way toward being able to mark folders with
additional operational information, e.g. that they should not be
exported or that they should not be pulled at all.
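
A hedged sketch of that cutoff logic; the attribute and helper names
here are assumptions, not the actual sparc-curation API:

```python
SPARSE_CUTOFF = 10_000  # default remote package count above which a dataset is sparse

def is_sparse(dataset_remote, cutoff=SPARSE_CUTOFF):
    # a dataset over the cutoff only pulls metadata files and their parent directories
    return dataset_remote.package_count > cutoff

def children_to_pull(dataset_remote):
    if is_sparse(dataset_remote):
        # hypothetical accessor for just the metadata files and their parents
        return dataset_remote.metadata_children()
    return dataset_remote.rchildren
```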

Some tweaks to how `spc rmeta` works were also made so that existing
metadata will not be repulled in a bulk clone. This work also makes
the BlackfynnCache aware of the dataset metadata pulled from rmeta,
so we should be able to start comparing ttl file and bf:internal
metadata in the near future.
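
An illustrative sketch of the skip-if-present behavior; the file layout
and function names are assumptions, not how `spc rmeta` is actually
implemented:

```python
from pathlib import Path

def pull_rmeta(dataset_id, rmeta_dir, fetch):
    """Pull dataset metadata unless it is already present locally."""
    target = Path(rmeta_dir) / f'{dataset_id}.json'
    if target.exists():
        return target  # existing metadata is not repulled during a bulk clone
    target.write_text(fetch(dataset_id))  # fetch retrieves the remote metadata as text
    return target
```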

3534 of 8607 relevant lines covered (41.06%)

0.41 hits per line
