MITLibraries / timdex-dataset-api
Coverage: 95% (main: 94%)

LAST BUILD BRANCH: USE-142-dataset-embedding-imports
DEFAULT BRANCH: main
Repo Added 26 Nov 2024 09:20PM UTC
Last build: 318
Files: 8

LAST BUILD ON BRANCH TIMX-533-rework-dataset-load

08 Aug 2025 09:05PM UTC coverage: 94.828% (+0.3%) from 94.558%
Build 16840527470 · Pull #160 · via github · committer: ghukill

Load pyarrow dataset on TIMDEXDataset init

Why these changes are being introduced:

As the TIMDEXDatasetMetadata becomes more integrated, there is
less need to be explicit about how we load the pyarrow dataset.

Formerly, the .load() method had to be called manually and
supported options like 'current_records' or 'include_parquet_files'.
The name 'TIMDEXDataset.load()' also dates from a time when "loading"
meant loading only the pyarrow dataset.  Now that metadata has been
introduced, it is better to be explicit that we are loading a pyarrow
dataset, which is only one of many assets associated with a
TIMDEXDataset instance.

How this addresses that need:

Renames .load() to .load_pyarrow_dataset() to be explicit about
what is happening, and calls it automatically when a TIMDEXDataset
is initialized.
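A minimal sketch of the eager-load pattern described above, assuming a stand-in loader in place of pyarrow.dataset.dataset(). The class name TIMDEXDatasetSketch, the stub_loader function and the example S3 location are hypothetical and are not the library's actual code:

```python
from typing import Any, Callable


# Hypothetical stand-in for pyarrow.dataset.dataset(location, ...).
def stub_loader(location: str) -> dict[str, Any]:
    return {"location": location}


class TIMDEXDatasetSketch:
    """Hypothetical sketch: the dataset is loaded eagerly during __init__."""

    def __init__(
        self, location: str, loader: Callable[[str], Any] = stub_loader
    ) -> None:
        self.location = location
        self._loader = loader
        self.load_count = 0
        # Formerly a caller had to invoke .load() manually; here init does it.
        self.dataset = self.load_pyarrow_dataset()

    def load_pyarrow_dataset(self) -> Any:
        """Explicitly named: loads only the pyarrow dataset asset."""
        self.load_count += 1
        return self._loader(self.location)


ds = TIMDEXDatasetSketch("s3://example-bucket/dataset")
print(ds.dataset)     # {'location': 's3://example-bucket/dataset'}
print(ds.load_count)  # 1
```

Because the method is still public, a caller could call ds.load_pyarrow_dataset() again to reload, but no manual call is needed before first use.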

We no longer store the pyarrow dataset filesystem or paths on self,
as they are only used briefly during this dataset load.  We can get
them anytime via .dataset.
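A sketch of reading the filesystem and file paths from the dataset object instead of storing them on self. In pyarrow, a FileSystemDataset exposes these as its .files and .filesystem attributes; the dataclass below is a hypothetical stand-in so the example runs without pyarrow, and the class names and paths are illustrative only:

```python
from dataclasses import dataclass


@dataclass
class StubFileSystemDataset:
    # Hypothetical stand-in for pyarrow.dataset.FileSystemDataset, which
    # exposes the same two attributes (there, .filesystem is a FileSystem
    # object rather than a string).
    files: list[str]
    filesystem: str


class TIMDEXDatasetSketch:
    def __init__(self, location: str, dataset: StubFileSystemDataset) -> None:
        self.location = location
        # Only the dataset is kept; no separate filesystem/paths attributes.
        self.dataset = dataset


ds = TIMDEXDatasetSketch(
    "s3://example-bucket/dataset",
    StubFileSystemDataset(
        files=["example-bucket/dataset/part-0.parquet"],
        filesystem="S3FileSystem",
    ),
)
print(ds.dataset.files)
print(ds.dataset.filesystem)
print(hasattr(ds, "paths"))  # False
```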

Most importantly, we restrict the 'location' used to initialize a
TIMDEXDataset instance to a single string: the root of the dataset.
Now that a list of strings is no longer allowed at that level, we can
rely on self.location being a string that points to the root of the
TIMDEX dataset.
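A hypothetical sketch of the string-only check on the root location. validate_location is an illustrative helper and not necessarily how TIMDEXDataset enforces this; the S3 paths are examples:

```python
def validate_location(location: object) -> str:
    """Hypothetical check: accept only a single string naming the dataset root."""
    if not isinstance(location, str):
        raise TypeError(
            f"location must be a str (the dataset root), got {type(location).__name__}"
        )
    return location


print(validate_location("s3://example-bucket/dataset"))
try:
    # A list of file paths, formerly accepted at this level, is now rejected.
    validate_location(["s3://example-bucket/a.parquet", "s3://example-bucket/b.parquet"])
except TypeError as exc:
    print(exc)  # location must be a str (the dataset root), got list
```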

Side effects of this change:
* TIMDEXDataset and TIMDEXDatasetMetadata can only be initialized
with a string, which is the root of the TIMDEX dataset.  From there,
both know where their assets can be found.
* You can no longer "pre-filter" the pyarrow dataset when loading; this
option overlapped confusingly with the read methods.  The read methods
themselves may change significantly now that metadata is available.
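With pre-filtering gone from the load step, any filtering happens when records are read. The generator below is a hypothetical illustration of read-time equality filtering over in-memory rows; read_records, the column names and the row values are assumptions, not the library's read methods:

```python
from typing import Any, Iterable, Iterator

# Hypothetical rows standing in for records already loaded from the dataset.
ROWS: list[dict[str, Any]] = [
    {"timdex_record_id": "alma:1", "source": "alma", "action": "index"},
    {"timdex_record_id": "alma:2", "source": "alma", "action": "delete"},
    {"timdex_record_id": "dspace:1", "source": "dspace", "action": "index"},
]


def read_records(
    rows: Iterable[dict[str, Any]], **filters: Any
) -> Iterator[dict[str, Any]]:
    """Yield rows whose columns equal every given filter value (hypothetical)."""
    for row in rows:
        if all(row.get(column) == value for column, value in filters.items()):
            yield row


matches = [
    row["timdex_record_id"]
    for row in read_records(ROWS, source="alma", action="index")
]
print(matches)  # ['alma:1']
```

Keeping the loaded dataset unfiltered means one instance can serve many differently filtered reads.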

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-533
Pull Request #160: TIMX 533 - Load pyarrow dataset on TIMDEXDataset init

38 of 38 new or added lines in 1 file covered. (100.0%)

1 existing line in 1 file now uncovered.

385 of 406 relevant lines covered (94.83%)

0.95 hits per line
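The reported percentages follow from the line counts above. A quick check, assuming the +0.3% change is computed from the unrounded coverage against the previous build's 94.558%:

```python
covered_lines = 385
relevant_lines = 406
previous_coverage = 94.558  # percent, from the previous build

coverage = 100 * covered_lines / relevant_lines
print(f"{coverage:.3f}%")                       # 94.828%
print(f"{coverage:.2f}%")                       # 94.83%
print(f"{coverage - previous_coverage:+.1f}%")  # +0.3%
```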

Source Files on TIMX-533-rework-dataset-load: 7 files listed, 2 changed (2 with source changes, 2 with coverage changes).

Recent builds

All three builds below are on branch TIMX-533-rework-dataset-load, commit "Load pyarrow dataset on TIMDEXDataset init" (Pull #160), committed by ghukill via github:
  • 16840527470 · 08 Aug 2025 09:08PM UTC · coverage 94.83%
  • 16815676743 · 07 Aug 2025 09:01PM UTC · coverage 94.83%
  • 16815561643 · 07 Aug 2025 08:55PM UTC · coverage 94.83%

Total builds: 318

© 2025 Coveralls, Inc