SciCrunch / sparc-curation / 552 / 2
Build (default branch: master)
Ran 22 May 2020 06:04AM UTC
Files: 33
Run time: 3s

22 May 2020 05:59AM UTC coverage: 41.06% (+21.2%) from 19.902%
Env: SCIGRAPH_API=https://scicrunch.org/api/1/sparc-scigraph SCICRUNCH_API_KEY=[secure]
Trigger: push (travis-ci)
Author: tgbugs
Commit: 08210c70

massive improvements in spc clone time, setup.py ver and dep bumps

A fresh pull of all 200 remote datasets now takes about 3 minutes.

NOTE: `spc pull` should NOT BE USED unless you know exactly what
you are doing. In the future this functionality will be restored
with better performance, but for now it is almost always faster to
delete the contents of the dataset folder and express `ds.rchildren`.
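
A minimal sketch of that workaround, assuming `ds` is a dataset cache
object whose `rchildren` generator pulls each remote child as it is
consumed; the helper name and path handling are illustrative, not part
of `spc`:

```python
import shutil
from pathlib import Path

def refresh_dataset(ds, dataset_path):
    # clear the existing local contents of the dataset folder
    for child in Path(dataset_path).iterdir():
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
    # expressing the rchildren generator repopulates the tree
    list(ds.rchildren)
```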

It only took me about 9 months to finally figure out that I had
actually fixed many of the pulling performance bottlenecks and that we
can almost entirely get rid of the current implementation of pull.

As it turns out I got almost everything sorted out so that it is
possible to just call `list(dataset_cache.rchildren)` and the entire
tree will populate itself. When we fix the cache constructor
this becomes `[rc.materialize() for rc in d.rchildren]` or similar,
depending on exactly what we name that method. Better yet, if we do
it using a bare for loop then the memory overhead will be zero.
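
A rough sketch of the two forms, where `dataset_cache` stands in for a
dataset cache instance and `materialize` is only the provisional method
name discussed above:

```python
def populate(dataset_cache):
    # bare for loop: nothing is retained, so memory overhead stays near zero
    for rc in dataset_cache.rchildren:
        pass  # each step pulls one remote node into the local tree

def populate_with_handles(dataset_cache):
    # comprehension over the (provisionally named) materialize method;
    # keeps a handle to every cache node, trading memory for direct access
    return [rc.materialize() for rc in dataset_cache.rchildren]
```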

The other piece that makes this faster is the completed sparse pull
implementation. We now use the remote package count, with a default
cutoff of 10k packages, to mark a dataset as sparse, meaning that
only its metadata files and their parent directories are pulled. The
implementation of that is a bit slow, but still about 2 orders of
magnitude faster than the alternative. The approach for implementing
is_sparse also points the way toward being able to mark folders with
additional operational information, e.g. that they should not be
exported or that they should not be pulled at all.
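
A hedged sketch of that cutoff logic; the attribute and helper names
here are assumptions, not the actual sparc-curation API:

```python
SPARSE_CUTOFF = 10_000  # default remote package count above which a dataset is sparse

def is_sparse(dataset_remote, cutoff=SPARSE_CUTOFF):
    # a dataset over the cutoff only pulls metadata files and their parent directories
    return dataset_remote.package_count > cutoff

def children_to_pull(dataset_remote):
    if is_sparse(dataset_remote):
        # hypothetical accessor for just the metadata files and their parents
        return dataset_remote.metadata_children()
    return dataset_remote.rchildren
```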

Some tweaks to how `spc rmeta` works were also made so that existing
metadata will not be repulled in a bulk clone. This work also makes
the BlackfynnCache aware of the dataset metadata pulled from rmeta,
so we should be able to start comparing ttl file and bf:internal
metadata in the near future.
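
An illustrative sketch of the skip-if-present behavior; the file layout
and function names are assumptions, not how `spc rmeta` is actually
implemented:

```python
from pathlib import Path

def pull_rmeta(dataset_id, rmeta_dir, fetch):
    """Pull dataset metadata unless it is already present locally."""
    target = Path(rmeta_dir) / f'{dataset_id}.json'
    if target.exists():
        return target  # existing metadata is not repulled during a bulk clone
    target.write_text(fetch(dataset_id))  # fetch retrieves the remote metadata as text
    return target
```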

3534 of 8607 relevant lines covered (41.06%)

0.41 hits per line
