MITLibraries / timdex-dataset-api / 12282750069
99%
main: 93%

Build:
LAST BUILD BRANCH: USE-306-handle-missing-metadata-or-embeddings
DEFAULT BRANCH: main
Ran 11 Dec 2024 06:38PM UTC
Jobs 1
Files 5
Run time 1min
11 Dec 2024 06:19PM UTC coverage: 98.013% (-0.6%) from 98.592%
Pull #15

github

jonavellecuerdo
Update TIMDEXDataset.write method to only overwrite similarly named parquet files

Why these changes are being introduced:
* Since the TIMDEXDataset partitions are now the [year, month, day]
of the 'run_date', parquet files from different source runs
will be written to the same partition. The previous setting,
existing_data_behavior="delete_matching", would delete any
existing parquet files from the partition directory
with every source run, which is not the desired outcome.
To support the new partitions, this change sets
existing_data_behavior="overwrite_or_ignore", which
ignores any existing data and only overwrites files with the
same filename.
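
The difference between the two settings can be illustrated with a small stdlib-only model. This is a sketch of the semantics as the commit message describes them, not pyarrow's implementation: the `write_files` helper, the Hive-style `year=/month=/day=` directory layout, and the file names are all hypothetical.

```python
import tempfile
from datetime import date
from pathlib import Path


def partition_dir(root: Path, run_date: date) -> Path:
    """Build a year/month/day partition directory (layout is an assumption)."""
    return (
        root
        / f"year={run_date.year}"
        / f"month={run_date.month:02d}"
        / f"day={run_date.day:02d}"
    )


def write_files(root: Path, run_date: date, files: dict, existing_data_behavior: str) -> list:
    """Toy model of a partitioned write; returns the file names left in the partition."""
    target = partition_dir(root, run_date)
    target.mkdir(parents=True, exist_ok=True)
    if existing_data_behavior == "delete_matching":
        # Everything already in the targeted partition is removed first.
        for existing in target.glob("*.parquet"):
            existing.unlink()
    elif existing_data_behavior != "overwrite_or_ignore":
        raise ValueError(f"unsupported behavior: {existing_data_behavior}")
    # Under "overwrite_or_ignore", only files with the same name are replaced.
    for name, payload in files.items():
        (target / name).write_bytes(payload)
    return sorted(p.name for p in target.iterdir())


def demo(existing_data_behavior: str) -> list:
    """Write two hypothetical source runs that share one run_date."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        run_date = date(2024, 12, 11)
        write_files(root, run_date, {"alma-0.parquet": b"a"}, existing_data_behavior)
        return write_files(root, run_date, {"dspace-0.parquet": b"d"}, existing_data_behavior)
```

In this model, `demo("delete_matching")` leaves only the second run's file in the partition, while `demo("overwrite_or_ignore")` leaves both.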

How this addresses that need:
* Set existing_data_behavior="overwrite_or_ignore" in the ds.write_dataset method call
* Add unit tests to demonstrate updated existing_data_behavior

Side effects of this change:
* In the event that multiple runs are performed for the same 'source' and 'run_date',
which is unlikely to occur, parquet files from both runs will exist in the
partitioned directory. DatasetRecords can still be uniquely identified via the
'run_id' column.
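
As a rough illustration of this side effect, the sketch below groups records that share a 'source' and 'run_date' by their 'run_id'. The `record_id` field, the sample values, and the `group_by_run` helper are hypothetical; only the 'source', 'run_date', and 'run_id' names come from the commit message.

```python
from collections import defaultdict

# Hypothetical records from two runs that landed in the same partition.
records = [
    {"record_id": "alma:1", "source": "alma", "run_date": "2024-12-11", "run_id": "run-a"},
    {"record_id": "alma:2", "source": "alma", "run_date": "2024-12-11", "run_id": "run-a"},
    {"record_id": "alma:1", "source": "alma", "run_date": "2024-12-11", "run_id": "run-b"},
]


def group_by_run(rows: list) -> dict:
    """Group record ids by run_id so overlapping runs stay distinguishable."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["run_id"]].append(row["record_id"])
    return dict(grouped)
```

Even though `alma:1` appears twice with the same source and run date, each copy belongs to a different `run_id` group.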

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-432
Pull Request #15: Rework dataset partitions to only year, month, day

21 of 21 new or added lines in 2 files covered. (100.0%)

1 existing line in 1 file now uncovered.

148 of 151 relevant lines covered (98.01%)

0.98 hits per line
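
The headline percentage and its change can be re-derived from the counts reported on this page. The sketch below assumes coverage is simply covered lines divided by relevant lines; the function name is illustrative.

```python
def coverage_percent(covered: int, relevant: int) -> float:
    """Line coverage as a percentage of relevant lines."""
    return 100 * covered / relevant


current = coverage_percent(148, 151)  # counts reported for this build
previous = 98.592                     # coverage reported for the base build
delta = current - previous            # change relative to the base build
```

Rounded to three decimals `current` is 98.013, and `delta` rounds to -0.6, which agrees with the figures in the build summary.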

Uncovered Existing Lines

Lines  Coverage  ∆       File
1      97.06     -2.94%  timdex_dataset_api/record.py
Jobs
ID  Job ID         Ran                      Files  Coverage
1   12282750069.1  11 Dec 2024 06:38PM UTC  5      98.01
GitHub Action Run
  • Github Actions Build #12282750069
  • Pull Request #15
  • PR Base - main (#12237422707)