
MITLibraries / timdex-dataset-api / 17324732994 / 1
93%
main: 93%

Build:
LAST BUILD BRANCH: USE-306-handle-missing-metadata-or-embeddings
DEFAULT BRANCH: main
Ran 29 Aug 2025 01:13PM UTC
Files 7
Run time 0s
29 Aug 2025 01:05PM UTC coverage: 93.248% (+0.3%) from 92.989%
17324732994.1

push

github

ghukill
Use keyset pagination for meta and data read methods

Why these changes are being introduced:

For all read methods, the former approach was to perform a metadata query
and hold the entire result set in memory, then loop through chunks of
that metadata and build SQL queries to perform data retrieval.  This
worked even for metadata queries that may bring back 3-4 million rows,
but there is an upper limit.

Ideally, we would perform all of our queries -- metadata and data -- in
chunks to ease memory pressure.  In some cases, this can also improve
performance.

How this addresses that need:

This reworks the base read_batches_iter() method to perform smaller,
chunked metadata queries.  To paginate the results, instead of using
the slow LIMIT / OFFSET approach, we use keyset pagination: each query
asks for rows whose ordered key columns are greater than the key tuple
of the last row from the previous page.  This is often the preferred
way to perform paginated querying when you have nicely ordered columns.
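
As a rough illustration of the keyset idea (not the library's actual
code), the sketch below pages through a table with Python's built-in
sqlite3 module.  The table name "records", the "row_index" column, and
the helper names are assumptions made up for this example; only the
filename and run_id columns are mentioned in this commit.

```python
import sqlite3
from typing import Iterator

# Hypothetical ordering columns, chosen only for this sketch.
ORDER_COLS = "filename, run_id, row_index"


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory table to paginate over (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (filename TEXT, run_id TEXT, row_index INTEGER)"
    )
    rows = [(f"file-{f}.parquet", f"run-{f}", i) for f in range(2) for i in range(4)]
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    return conn


def iter_keyset_pages(
    conn: sqlite3.Connection, page_size: int = 3
) -> Iterator[list[tuple]]:
    """Yield pages of rows, resuming each query after the last key seen."""
    last_key = None
    while True:
        if last_key is None:
            # First page: no lower bound yet.
            sql = f"SELECT {ORDER_COLS} FROM records ORDER BY {ORDER_COLS} LIMIT ?"
            params: tuple = (page_size,)
        else:
            # Later pages: row-value comparison against the previous page's
            # final key, so no rows are scanned and discarded as with OFFSET.
            sql = (
                f"SELECT {ORDER_COLS} FROM records "
                f"WHERE ({ORDER_COLS}) > (?, ?, ?) "
                f"ORDER BY {ORDER_COLS} LIMIT ?"
            )
            params = (*last_key, page_size)
        rows = conn.execute(sql, params).fetchall()
        if not rows:
            return
        yield rows
        last_key = rows[-1]
```

Only one page of keys is held in memory at a time, and the key tuple
must uniquely identify a row for the pages to neither skip nor repeat
rows.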

In support of this, we also begin hashing the filename and run_id
columns for ordering, providing almost an order of magnitude speedup.
The performance penalty for creating the hash is offset by the speedup
of ordering integers versus very long strings.
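
The hashing step can be pictured with the minimal sketch below.  It is
an assumption-laden illustration, not the project's implementation: the
hash function (blake2b truncated to 64 bits), the helper names, and the
"row_index" tie-breaker are all hypothetical.  The point is only that
pagination needs a stable total order, not a human-meaningful one, so
long strings can be replaced by integers in the ordering key.

```python
import hashlib


def stable_hash64(value: str) -> int:
    """Map a string to a deterministic unsigned 64-bit integer."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def ordering_key(filename: str, run_id: str, row_index: int) -> tuple[int, int, int]:
    """Build an all-integer sort key from two string columns plus an index.

    The resulting order differs from alphabetical order, but it is the same
    on every run, which is all keyset pagination requires.
    """
    return (stable_hash64(filename), stable_hash64(run_id), row_index)


def sort_rows(rows: list[tuple[str, str, int]]) -> list[tuple[str, str, int]]:
    """Order rows by their integer key instead of comparing long strings."""
    return sorted(rows, key=lambda row: ordering_key(*row))
```

A hash collision between two distinct strings would make their keys
compare equal on that column, so a real implementation would need the
remaining key columns to keep rows distinguishable.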

The net effect is no change to the input/output signatures of the read
methods, but improved memory usage and performance.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-543

511 of 548 relevant lines covered (93.25%)

0.93 hits per line

  • efb3973d on github