
MITLibraries / timdex-dataset-api / 16815676743
95%
main: 93%

Build:
LAST BUILD BRANCH: USE-306-handle-missing-metadata-or-embeddings
DEFAULT BRANCH: main
Ran 07 Aug 2025 09:01PM UTC
Jobs 1
Files 7
Run time 1min

07 Aug 2025 08:58PM UTC coverage: 94.828% (+0.3%) from 94.558%
16815676743

Pull #160

github

ghukill
Load pyarrow dataset on TIMDEXDataset init

Why these changes are being introduced:

As the TIMDEXDatasetMetadata becomes more integrated, there is
less need to be explicit about how we load the pyarrow dataset.

Formerly, the .load() method had to be called manually and
supported options like 'current_records' or 'include_parquet_files'.
This also reflected a time when 'TIMDEXDataset.load()' suggested that
"loading" meant the pyarrow dataset only.  With the introduction of
metadata, it is better to be specific that we are loading a pyarrow
dataset, which is only one of many assets associated with a
TIMDEXDataset instance.

How this addresses that need:

Renames .load() to .load_pyarrow_dataset() to be explicit about
what is happening.

We no longer store the pyarrow dataset filesystem or paths on self,
as they are only used briefly during this dataset load.  We can get
them anytime via .dataset.

Most importantly, we limit the root 'location' used to init
a TIMDEXDataset instance to a string only: the root of the dataset.
Now that we don't allow a list of strings at that level, we can trust
self.location to be a string, and the root of the TIMDEX
dataset.
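
A minimal sketch of the behavior described above, not the actual
timdex-dataset-api code. The class name DatasetSketch, the
'data/records' sub-path, and the dict standing in for a pyarrow dataset
are all hypothetical and used for illustration only:

```python
class DatasetSketch:
    """Hypothetical stand-in for the described behavior; not the real TIMDEXDataset."""

    def __init__(self, location: str) -> None:
        # Only a single string is accepted: the root of the dataset.
        if not isinstance(location, str):
            raise TypeError(
                "location must be a string (the dataset root), "
                f"got {type(location).__name__}"
            )
        self.location = location.rstrip("/")
        # The dataset is loaded during init; no manual .load() call is needed.
        self.dataset = self.load_pyarrow_dataset()

    @property
    def data_records_root(self) -> str:
        # Assumed layout: records live under '<root>/data/records'.
        return f"{self.location}/data/records"

    def load_pyarrow_dataset(self) -> dict:
        # Placeholder: a real implementation would build a pyarrow dataset here.
        # Paths are local to this method and are not stored on self.
        paths = [self.data_records_root]
        return {"paths": paths}


sketch = DatasetSketch("s3://bucket/dataset/")
print(sketch.location)
print(sketch.dataset["paths"])
```

Passing a list such as ["a", "b"] raises TypeError in this sketch, which
mirrors the "string only" restriction on the root location.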

Side effects of this change:
* TIMDEXDataset and TIMDEXDatasetMetadata can only be initialized
with a string, which is the root of the TIMDEX dataset.  From there,
both know where their assets can be found.
* You cannot "pre-filter" the pyarrow dataset when loading, which had
confusing overlap with the read methods; the read methods themselves
may change somewhat dramatically now that we have metadata to use.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-533
Pull Request #160: TIMX 533 - Load pyarrow dataset on TIMDEXDataset init

38 of 38 new or added lines in 1 file covered. (100.0%)

1 existing line in 1 file now uncovered.

385 of 406 relevant lines covered (94.83%)

0.95 hits per line
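
The coverage figures on this page can be cross-checked with a few lines
of arithmetic. This is only a sketch using the numbers reported above;
the variable names are illustrative:

```python
covered, relevant = 385, 406   # "385 of 406 relevant lines covered"
previous_pct = 94.558          # coverage reported for the base build

coverage_pct = covered / relevant * 100
delta = coverage_pct - previous_pct
missed = relevant - covered

print(f"coverage: {coverage_pct:.3f}%")  # matches the reported 94.828%
print(f"change:   {delta:+.1f}%")        # matches the reported +0.3%
print(f"missed:   {missed} lines")
```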

Uncovered Existing Lines

Lines  Coverage  ∆        File
1      0.0       -100.0%  timdex_dataset_api/exceptions.py
Jobs
ID  Job ID         Ran                      Files  Coverage
1   16815676743.1  07 Aug 2025 09:01PM UTC  7      94.83
GitHub Action Run
  • Github Actions Build #16815676743
  • Pull Request #160
  • PR Base - TIMX-526-projected-views (#16806100415)