MITLibraries / timdex-dataset-api
Coverage: 93% (main: 93%)

LAST BUILD BRANCH: v3.4
DEFAULT BRANCH: main
Repo Added 26 Nov 2024 09:20PM UTC
Builds: 304
Files 7

LAST BUILD ON BRANCH TIMX-543-keyset-pagination-for-reading

29 Aug 2025 08:22PM UTC coverage: 93.011% (+0.02%) from 92.989%
Build 17333521896

Pull #169 · via github · committer: ghukill
Use keyset pagination for meta and data read methods

Why these changes are being introduced:

Previously, every read method ran a single metadata query, held the
entire result set in memory, and then looped through chunks of that
metadata to build the SQL queries for data retrieval.  This worked even
for metadata queries returning 3-4 million rows, but because memory use
grows with the size of the result set, there is an upper limit.

Ideally, all of our queries (metadata and data) would run in chunks to
ease memory pressure.  In some cases, this can also improve
performance.

How this addresses that need:

This reworks the base read_batches_iter() method to perform smaller,
chunked metadata queries.  To paginate the results, we use keyset
pagination instead of the slow LIMIT / OFFSET approach: rows are ordered
by a tuple of columns, and each query asks only for rows whose tuple is
greater than the last tuple returned by the previous query.  Because the
database does not have to scan and discard the rows an OFFSET would skip,
this is often the preferred way to paginate when the ordering columns
give a stable, unique ordering.
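A minimal sketch of keyset pagination, assuming a hypothetical `records_meta` table ordered by `filename`, `run_id`, and `run_record_offset` (illustrative column names, not necessarily the library's schema). It uses Python's built-in `sqlite3` as a stand-in for the database engine the library actually uses, and relies on SQLite's row-value comparison (SQLite 3.15+) to express "greater than the last tuple seen".

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator

# Hypothetical ordering key; a real schema may use different columns.
KEY_COLUMNS = "filename, run_id, run_record_offset"


def keyset_pages(
    conn: sqlite3.Connection, page_size: int
) -> Iterator[list[tuple]]:
    """Yield pages of metadata rows, ordered by KEY_COLUMNS, via keyset pagination."""
    last_key: tuple | None = None
    while True:
        if last_key is None:
            # First page: there is no lower bound yet.
            sql = f"SELECT {KEY_COLUMNS} FROM records_meta ORDER BY {KEY_COLUMNS} LIMIT ?"
            params: tuple = (page_size,)
        else:
            # Later pages: seek strictly past the last key already returned,
            # instead of skipping an ever-growing OFFSET.
            sql = (
                f"SELECT {KEY_COLUMNS} FROM records_meta "
                f"WHERE ({KEY_COLUMNS}) > (?, ?, ?) "
                f"ORDER BY {KEY_COLUMNS} LIMIT ?"
            )
            params = (*last_key, page_size)
        rows = conn.execute(sql, params).fetchall()
        if not rows:
            return
        yield rows
        # The last row of this page becomes the lower bound for the next one.
        last_key = rows[-1]
```

Only one page is held in memory at a time. The ordering tuple must be unique per row; if two rows shared the boundary key, the strict `>` comparison would skip one of them.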

In support of this, we also begin hashing the filename and run_id
columns and ordering on the hashes, which provides almost an order of
magnitude speedup.  The cost of computing the hashes is offset by the
speedup of ordering integers instead of very long strings.
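The hashing idea can be sketched as follows. This assumes a hypothetical Python-side `hash64()` helper built on `hashlib.blake2b`; the library may compute its hashes inside the database engine instead. Pagination only needs the hashed order to be deterministic: it is not alphabetical, but it is the same on every query.

```python
from __future__ import annotations

import hashlib


def hash64(value: str) -> int:
    """Hypothetical helper: map a string to a deterministic signed 64-bit int.

    Python's built-in hash() is salted per process, so it is not stable
    across runs; a cryptographic digest truncated to 8 bytes is.
    """
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    # Signed so the value fits a typical 64-bit integer column.
    return int.from_bytes(digest, "big", signed=True)


def order_key(filename: str, run_id: str, run_record_offset: int) -> tuple[int, int, int]:
    """Build a fixed-width integer sort key in place of two long strings."""
    return (hash64(filename), hash64(run_id), run_record_offset)


# Illustrative records with long, path-like filenames.
records = [
    ("s3://example-bucket/dataset/run-2/part-0001.parquet", "run-b", 0),
    ("s3://example-bucket/dataset/run-2/part-0001.parquet", "run-b", 1),
    ("s3://example-bucket/dataset/run-1/part-0000.parquet", "run-a", 0),
]
# Sorting now compares integers instead of long strings.
ordered = sorted(records, key=lambda r: order_key(*r))
```

Keyset pagination needs the full key tuple to be unique per row, so a 64-bit hash carries a small collision risk that a real implementation would need to accept or guard against.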

The net effect is that the input/output signatures of the read methods
are unchanged, while memory usage and performance improve.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-543
Pull Request #169: TIMX 543 - keyset pagination for read methods

51 of 55 new or added lines in 3 files covered (92.73%).

519 of 558 relevant lines covered (93.01%)

0.93 hits per line
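The percentages above are ratios of covered to relevant lines; a quick check, assuming they are rounded to two decimals for display:

```python
def coverage_pct(covered: int, relevant: int) -> float:
    """Percentage of relevant lines that are covered, rounded to two decimals."""
    return round(100 * covered / relevant, 2)


new_lines_pct = coverage_pct(51, 55)    # 92.73, new or added lines in this PR
overall_pct = coverage_pct(519, 558)    # 93.01, all relevant lines in the build
delta = round(93.011 - 92.989, 2)       # 0.02, change from the previous build
```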

Source files on TIMX-543-keyset-pagination-for-reading: 7 in total, 3 changed (3 with source changes, 2 with coverage changes)

Recent builds (of 304 in total)

  • 17333521896 · TIMX-543-keyset-pagination-for-reading · "Use keyset pagination for meta and data read methods" · Pull #169 · 29 Aug 2025 08:25PM UTC · ghukill via github · coverage 93.01%
  • 17324732994 · TIMX-543-keyset-pagination-for-reading · "Use keyset pagination for meta and data read methods" · push · 29 Aug 2025 01:13PM UTC · ghukill via github · coverage 93.25%

© 2025 Coveralls, Inc