MITLibraries / timdex-dataset-api / 12243904605
99%
main: 93%

Build:
LAST BUILD BRANCH: USE-306-handle-missing-metadata-or-embeddings
DEFAULT BRANCH: main
Ran 09 Dec 2024 08:40PM UTC
Jobs 1
Files 5
Run time 1min

09 Dec 2024 08:31PM UTC coverage: 98.013% (-0.6%) from 98.592%
12243904605

push

github

jonavellecuerdo
Rework dataset partitions to only year, month, day

Why these changes are being introduced:
* These changes simplify the partitioning schema for the TIMDEXDataset,
allowing the app to take advantage of PyArrow's memory-efficient
processes for reading and writing Parquet datasets. Furthermore, the
new partitioning schema will result in a more efficient, coherent
folder structure when writing datasets. For more details, see:
https://mitlibraries.atlassian.net/wiki/spaces/IN/pages/4094296066/Engineering+Plan+Parquet+Datasets+for+TIMDEX+ETL#Rework-Dataset-Partitions-to-use-only-Year-%2F-Month-%2F-Day.

How this addresses that need:
* Update TIMDEX_DATASET_SCHEMA to include [year, month, day]
* Update DatasetRecord attrs to include [year, month, day] and
  set [source, run_date, run_type, run_id, action] as primary columns
* Add post_init method to DatasetRecord to derive partition values
  from 'run_date'
* Remove 'partition' values from DatasetRecord.to_dict
* Remove 'partition_values' mixin from TIMDEXDataset.write to reduce
  complexity and have write method utilize DatasetRecord partition
  columns instead.
* Update unit tests to use new partitions and remove deprecated tests
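
The derivation described in the list above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes DatasetRecord is a dataclass, that 'run_date' is an ISO "YYYY-MM-DD" string, and that the partition values are stored as zero-padded strings. The field list is trimmed to the columns named in this message.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetRecord:
    # Primary columns named in the commit message (the real class may have more).
    source: str
    run_date: str  # assumption: ISO "YYYY-MM-DD" string
    run_type: str
    run_id: str
    action: str
    # Partition columns: derived from run_date, never passed by the caller.
    year: str = field(init=False)
    month: str = field(init=False)
    day: str = field(init=False)

    def __post_init__(self) -> None:
        # Parse once, then fill the three partition columns.
        parsed = date.fromisoformat(self.run_date)
        self.year = parsed.strftime("%Y")
        self.month = parsed.strftime("%m")
        self.day = parsed.strftime("%d")
```

Because the partition columns are excluded from the constructor, a caller cannot supply a year, month, or day that disagrees with 'run_date'.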

Side effects of this change:
* The new partitioning schema introduces a 3-level folder structure
within TIMDEXDataset.location (i.e. the base path of the dataset)
for [year, month, day], where the leaf node will contain parquet files
for every source run.
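
To make the resulting layout concrete, the sketch below builds the leaf directory for a given run date. It assumes Hive-style `key=value` directory names and zero-padded values, neither of which this message confirms. `partition_dir` is a hypothetical helper written only for illustration.

```python
from datetime import date
from pathlib import PurePosixPath


def partition_dir(location: str, run_date: str) -> PurePosixPath:
    """Return the [year, month, day] leaf directory under the dataset base path.

    Hypothetical helper: directory naming is assumed to be Hive-style.
    """
    parsed = date.fromisoformat(run_date)
    return (
        PurePosixPath(location)
        / f"year={parsed:%Y}"
        / f"month={parsed:%m}"
        / f"day={parsed:%d}"
    )


# Every source run on the same date writes its parquet files to the same leaf.
print(partition_dir("dataset", "2024-12-09"))
```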

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-432

18 of 18 new or added lines in 1 file covered. (100.0%)

1 existing line in 1 file now uncovered.

148 of 151 relevant lines covered (98.01%)

0.98 hits per line

Uncovered Existing Lines

Lines  Coverage  ∆       File
1      97.06     -2.94%  timdex_dataset_api/record.py
Jobs

ID  Job ID         Ran                      Files  Coverage
1   12243904605.1  09 Dec 2024 08:40PM UTC  5      98.01
  • Github Actions Build #12243904605
  • 76347b1b on github
  • Prev Build on main (#12237422707)
  • Next Build on TIMX-432-rework-dataset-partitions (#12282750069)