
SciCrunch / sparc-curation / 550
2%

Build:
DEFAULT BRANCH: master
Ran 22 May 2020 05:43AM UTC
Jobs 1
Files 33
Run time 6s

Build 550 · push · travis-ci · pending completion

tgbugs
massive improvements in spc clone time, setup.py ver and dep bumps

A fresh pull of all 200 remote datasets now takes about 3 minutes.

NOTE: `spc pull` should NOT BE USED unless you know exactly what
you are doing. In the future this functionality will be restored
with better performance, but for now it is almost always faster to
delete the contents of the dataset folder and express ds.rchildren.
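
A minimal sketch of that workaround, assuming `ds` is a dataset cache
whose `local` attribute is a `pathlib.Path` and whose `rchildren`
generator writes each remote child to disk as it is consumed; the
attribute names are assumptions, not the exact API:

    import shutil
    from pathlib import Path

    def refresh_dataset(ds):
        """Stand-in for `spc pull` on one dataset: wipe the folder, then re-express rchildren."""
        local = Path(ds.local)              # assumed: local path of the dataset folder
        for child in local.iterdir():       # delete the contents, keep the folder itself
            if child.is_dir():
                shutil.rmtree(child)
            else:
                child.unlink()
        for rc in ds.rchildren:             # consuming the generator repopulates the tree
            pass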

It only took me about 9 months to finally figure out that I had
actually fixed many of the pulling performance bottlenecks and that we
can almost entirely get rid of the current implementation of pull.

As it turns out I got almost everything sorted out so that it is
possible to just call `list(dataset_cache.rchildren)` and the entire
tree will populate itself. When we fix the cache constructor
this becomes `[rc.materialize() for rc in d.rchildren]` or similar,
depending on exactly what we name that method. Better yet, if we do
it using a bare for loop then the memory overhead will be zero.
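
For illustration, the three forms discussed above side by side;
`dataset_cache` and `d` are the objects named in this message, and
`materialize` stands in for whatever the method ends up being called:

    # builds the whole tree but also keeps every cache object in a list
    children = list(dataset_cache.rchildren)

    # once the cache constructor is fixed, something like this (method name still undecided)
    children = [rc.materialize() for rc in d.rchildren]

    # bare for loop: each child is handled and then dropped, so nothing accumulates in memory
    for rc in d.rchildren:
        rc.materialize()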

The other piece that makes this faster is the completed sparse pull
implementation. We now use the remote package count, with a default
cutoff of 10k packages, to mark a dataset as sparse, meaning that
only its metadata files and their parent directories are pulled. The
implementation of that is a bit slow, but still about 2 orders of
magnitude faster than the alternative. The approach for implementing
is_sparse also points the way toward being able to mark folders with
additional operational information, e.g. that they should not be
exported or that they should not be pulled at all.
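
A rough sketch of the sparse decision and the resulting path filter,
using the 10k default from above; `METADATA_STEMS` and the function
names here are placeholders for illustration, not the actual
implementation:

    from pathlib import PurePath

    SPARSE_CUTOFF = 10_000          # default: datasets with >= 10k remote packages go sparse
    METADATA_STEMS = {'dataset_description', 'subjects', 'samples', 'submission'}

    def is_sparse(package_count, cutoff=SPARSE_CUTOFF):
        # the real check reads the package count from the remote
        return package_count >= cutoff

    def sparse_members(remote_paths):
        """Keep only metadata files and the directories leading to them."""
        keep = set()
        for p in map(PurePath, remote_paths):
            if p.stem.lower() in METADATA_STEMS:
                keep.add(p)
                keep.update(p.parents)      # parent directories are pulled as well
        return keep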

Some tweaks to how spc rmeta works were also made so that existing
metadata will not be repulled in a bulk clone. This work also makes
the BlackfynnCache aware of the dataset metadata pulled from rmeta,
so we should be able to start comparing ttl file and bf:internal
metadata in the near future.
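
A hedged sketch of the skip-existing behaviour for rmeta during a bulk
clone; the on-disk layout and the fetch callable are assumptions made
for the example:

    import json
    from pathlib import Path

    def pull_rmeta(dataset_id, rmeta_dir, fetch_remote_metadata):
        """Fetch dataset metadata once; existing blobs are not repulled on a bulk clone.

        rmeta_dir              assumed directory holding per-dataset metadata blobs
        fetch_remote_metadata  placeholder callable that hits the remote API
        """
        target = Path(rmeta_dir) / f'{dataset_id}.json'
        if target.exists():
            return json.loads(target.read_text())   # already pulled, skip the network
        blob = fetch_remote_metadata(dataset_id)
        target.write_text(json.dumps(blob))
        return blob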

1713 of 8608 relevant lines covered (19.9%)

0.2 hits per line

Jobs
ID: 2
Job ID: 550.2 (SCIGRAPH_API=https://scicrunch.org/api/1/sparc-scigraph SCICRUNCH_API_KEY=[secure])
Ran: 22 May 2020 05:43AM UTC
Files: 0
Coverage: 19.9%
Travis Job 550.2
Source Files on build 550
Detailed source file information is not available for this build.
  • Travis Build #550
  • 113f6327 on github
  • Prev Build on master (#549)
  • Next Build on master (#551)