MITLibraries / timdex-dataset-api
97%
main: 93%

LAST BUILD BRANCH: v3.4
DEFAULT BRANCH: main
Repo Added 26 Nov 2024 09:20PM UTC
Build 304 Last
Files 7
LAST BUILD ON BRANCH TIMX-465-run-record-offset-column
15 Jan 2025 06:17PM UTC coverage: 97.423% (+0.03%) from 97.396%
Build 12794648113 (push via github, committer ghukill)

Add run_record_offset column to dataset

Why these changes are being introduced:

Bulk reading and writing from the TIMDEX dataset is a primary responsibility,
but occasional random access (e.g. locating a single record row) will also be
helpful, for instance when inspecting the original source record for a
problematic record.

Each TIMDEX JSON record in OpenSearch will contain a "provenance" object that
includes fields such as run_date, run_id, and now run_record_offset. Given the
information in the provenance object, this offset allows faster retrieval of a
single record while reading less data.

How this addresses that need:

Parquet files embed metadata that describes which values can be found
in subsets of the file (row groups), but this is only helpful when the min/max
values in that metadata can tell query engines whether a desired record may be
present. Unfortunately, timdex_record_id values are a) not lexicographically
sortable (at least not easily), and b) not ordered during write.
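
As a rough illustration of the problem described above, the following sketch
models row-group min/max statistics in plain Python. It is not the library's
code: the "rec-NNNNN" ID format, the row-group size of 1,000, and both helper
functions are assumptions made up for the example. It shows that when IDs are
written in an unsorted order, every row group's min/max range can contain the
target, so no row group can be skipped.

```python
def row_group_stats(values, group_size):
    """Split values into consecutive row groups and record each group's (min, max)."""
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    return [(min(group), max(group)) for group in groups]


def candidate_groups(stats, target):
    """Return indexes of row groups whose [min, max] range could contain target."""
    return [i for i, (low, high) in enumerate(stats) if low <= target <= high]


# Hypothetical record IDs written in an interleaved (unsorted) order:
# row group g holds the IDs whose number ends in digit g.
record_ids = [f"rec-{k * 10 + g:05d}" for g in range(10) for k in range(1_000)]

stats = row_group_stats(record_ids, group_size=1_000)
hits = candidate_groups(stats, "rec-04321")

# Every row group spans nearly the full ID range, so all must be scanned.
print(f"{len(hits)} of {len(stats)} row groups must be scanned")
```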

By adding this offset, effectively an incrementing counter as records are
yielded for writing, we have a value that is pre-sorted and provides clean,
non-overlapping ranges in the Parquet file metadata. Query engines can use
these ranges to dramatically improve random access reads. By including this
offset integer in the TIMDEX record "provenance" section we close the loop and
provide enough information in the OpenSearch record to efficiently retrieve it
from the Parquet dataset.
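
The effect of an incrementing offset can be sketched the same way. This is a
minimal model under stated assumptions, not the project's implementation: the
helper name with_run_record_offset, the zero-based start of the counter, the
"rec-NNNNN" IDs, and the row-group size of 1,000 are all hypothetical. Because
the offset increases as records are yielded, each row group covers a disjoint
range and only one row group can contain a given offset.

```python
from typing import Iterable, Iterator


def with_run_record_offset(records: Iterable[dict]) -> Iterator[dict]:
    """Attach an incrementing counter to each record as it is yielded for writing.

    Starting the counter at 0 is an assumption of this sketch.
    """
    for offset, record in enumerate(records):
        yield {**record, "run_record_offset": offset}


# Hypothetical records arriving in an unsorted ID order.
records = (
    {"timdex_record_id": f"rec-{k * 10 + g:05d}"}
    for g in range(10)
    for k in range(1_000)
)
rows = list(with_run_record_offset(records))

# Model per-row-group (min, max) statistics for the offset column.
group_size = 1_000
offsets = [row["run_record_offset"] for row in rows]
stats = [
    (min(chunk), max(chunk))
    for chunk in (offsets[i:i + group_size] for i in range(0, len(offsets), group_size))
]

# An offset taken from a record's provenance object narrows the search
# to the single row group whose range contains it.
target_offset = 4_321
hits = [i for i, (low, high) in enumerate(stats) if low <= target_offset <= high]
print(f"{len(hits)} of {len(stats)} row groups must be scanned")
```

In a real Parquet reader the same pruning would be driven by a filter on the
run_record_offset column rather than by hand-built statistics.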

Side effects of this change:
* Dataset will now include a new column 'run_record_offset'

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-465

5 of 5 new or added lines in 3 files covered. (100.0%)

189 of 194 relevant lines covered (97.42%)

0.97 hits per line


Recent builds

Build: 12794648113
Branch: TIMX-465-run-record-offset-column
Commit: Add run_record_offset column to dataset
Type: push
Ran: 15 Jan 2025 06:26PM UTC
Committer: ghukill
Via: github
Coverage: 97.42