MITLibraries / timdex-dataset-api / 16815561643 / 1
95%
main: 93%

LAST BUILD BRANCH: USE-306-handle-missing-metadata-or-embeddings
DEFAULT BRANCH: main
Ran 07 Aug 2025 08:55PM UTC
Files 7
Run time 0s

07 Aug 2025 08:52PM UTC coverage: 94.828% (+0.3%) from 94.558%
16815561643.1

Pull #160

github

ghukill
Load pyarrow dataset on TIMDEXDataset init

Why these changes are being introduced:

As the TIMDEXDatasetMetadata becomes more integrated, there is
less need to be explicit about how we load the pyarrow dataset.

Formerly, the method .load() had to be called manually and
supported options like 'current_records' or 'include_parquet_files'.
The name 'TIMDEXDataset.load()' also dates from a time when the
pyarrow dataset was the only thing to load.  With the introduction of
metadata, it is better to be specific that we are loading a pyarrow
dataset, which is only one of many assets associated with a
TIMDEXDataset instance.

How this addresses that need:

Renames .load() to .load_pyarrow_dataset() to be explicit about
what is happening.

We no longer store the pyarrow dataset filesystem or paths on self,
as they are only used briefly during this dataset load.  We can get
them anytime via .dataset.

Most importantly, the root 'location' used to init a TIMDEXDataset
instance must now be a single string, the root of the dataset.
Because a list of strings is no longer allowed at that level, we can
trust self.location to be a string that points to the root of the
TIMDEX dataset.

Side effects of this change:
* TIMDEXDataset and TIMDEXDatasetMetadata can only be initialized
with a string, which is the root of the TIMDEX dataset.  From there,
both know where their assets can be found.
* You cannot "pre-filter" the pyarrow dataset when loading, which had
confusing overlap with the read methods; the read methods themselves
may change somewhat dramatically now that we have metadata to use.
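
A minimal sketch of the init-time behavior described above, in Python.
It is not the timdex-dataset-api code: the class name DatasetSketch, the
'loader' parameter, and the 'data/records' sub-path are assumptions made
for illustration.  Only the string-only 'location', the method name
load_pyarrow_dataset(), and the .dataset attribute come from the commit
message.

```python
from typing import Any, Callable, Optional


def _default_loader(path: str) -> Any:
    """Load a parquet dataset with pyarrow (imported lazily)."""
    import pyarrow.dataset as ds

    return ds.dataset(path, format="parquet")


class DatasetSketch:
    """Hypothetical stand-in for a dataset rooted at a single location."""

    def __init__(self, location: str, loader: Optional[Callable[[str], Any]] = None):
        # Only a single string is accepted: the root of the dataset.
        if not isinstance(location, str):
            raise TypeError("location must be a string, the root of the dataset")
        self.location = location.rstrip("/")
        self._loader = loader or _default_loader
        # The pyarrow dataset is loaded at init; no manual load call is needed.
        self.dataset = self.load_pyarrow_dataset()

    @property
    def records_location(self) -> str:
        # Assumed layout: records live under "<root>/data/records".
        return f"{self.location}/data/records"

    def load_pyarrow_dataset(self) -> Any:
        # Filesystem and paths are not stored on self; they stay reachable
        # through the returned dataset object.
        return self._loader(self.records_location)
```

With this shape, DatasetSketch("s3://bucket/dataset") loads on init, while
DatasetSketch(["a", "b"]) raises TypeError.  The injectable 'loader' exists
only so the sketch can be exercised without pyarrow installed.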

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-533
Pull Request #160: TIMX 533 - Load pyarrow dataset on TIMDEXDataset init

385 of 406 relevant lines covered (94.83%)

0.95 hits per line
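
The coverage figures above can be cross-checked with a short Python
sketch.  The function name coverage_percent is an assumption made for
illustration; the inputs are the numbers reported on this page.

```python
def coverage_percent(covered: int, relevant: int) -> float:
    """Return line coverage as a percentage of relevant lines."""
    return 100 * covered / relevant


current = coverage_percent(385, 406)
previous = 94.558  # prior coverage reported on this page

print(f"{current:.3f}%")                 # 94.828%
print(f"{current - previous:+.1f}%")     # +0.3%
```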

  • 4412f0f7 on github
  • Prev Job for on TIMX-533-rework-dataset-load (#16806100415.1)
  • Next Job for on TIMX-533-rework-dataset-load (#16815676743.1)