
pirl-unc / hitlist / 25711122998
76%

Build:
DEFAULT BRANCH: main
Ran 12 May 2026 03:18AM UTC
Jobs 1
Files 28
Run time 1min
12 May 2026 03:16AM UTC coverage: 75.329% (+0.3%) from 75.035%
25711122998

push

github

web-flow
WIP plan: on-disk cache for proteome k-mer indexes (closes #246) (#247)

* add scripts/profile_build.py for build pipeline profiling (#176, #246)

Wraps build_observations(force=True) under cProfile + tracks wall time
and peak RSS, with a stage-by-stage breakdown by intercepting the
builder's print() checkpoints.

Used to gather the data attached to issue #246: peptide_mappings is
67% of build wall time, dominated by ~159 _build calls (40 species x
4 lengths) that each rebuild the k-mer index from scratch.
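A minimal sketch of that profiling approach, assuming nothing about the real scripts/profile_build.py beyond the description above. The names `stage_timer` and `profile_call` are hypothetical, and `resource` is only available on Unix-like systems:

```python
import builtins
import cProfile
import pstats
import resource
import time
from contextlib import contextmanager


@contextmanager
def stage_timer(stages):
    """Record (elapsed_seconds, message) for every print() inside the block."""
    real_print = builtins.print
    start = time.perf_counter()

    def recording_print(*args, **kwargs):
        stages.append((time.perf_counter() - start, " ".join(map(str, args))))
        real_print(*args, **kwargs)

    builtins.print = recording_print
    try:
        yield
    finally:
        # Always restore the real print, even if the build raises.
        builtins.print = real_print


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return result, wall time, peak RSS, stages, stats."""
    stages = []
    profiler = cProfile.Profile()
    t0 = time.perf_counter()
    with stage_timer(stages):
        result = profiler.runcall(func, *args, **kwargs)
    wall = time.perf_counter() - t0
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, wall, peak_rss, stages, pstats.Stats(profiler)
```

In this sketch the builder would be passed in as `func`, for example `profile_call(build_observations, force=True)`, and the differences between consecutive `stages` timestamps give a rough per-stage breakdown.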

* on-disk cache for built ProteomeIndexes (closes #246)

Implements the plan from PR #247 description.

The peptide_mappings stage of `hitlist data build` accounts for 67% of
total wall time (cProfile, see #246). It is dominated by 159
ProteomeIndex._build calls (40 species x 4 peptide lengths) that each
rebuild the k-mer index from scratch on every run, even when the source
FASTAs haven't changed.

This commit pickles built indexes to ~/.hitlist/proteome_index_cache/
keyed by the same (path, size, mtime, lengths, gene_name, gene_id)
tuple the existing in-memory `_FASTA_INDEX_CACHE` uses, plus a format
version prefix.  When IEDB CSVs change but FASTAs don't (the common
deploy pattern), subsequent rebuilds skip the per-(species, length)
_build calls entirely and load each index from a ~1-3s pickle read
instead of a ~30-120s rebuild.
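As an illustration of how such a key could map to a cache filename, here is a hedged sketch. The helper names, the `CACHE_FORMAT_VERSION` constant, the exact filename layout, and the use of nanosecond mtime are assumptions, not the code in proteome.py:

```python
import hashlib
import os

CACHE_FORMAT_VERSION = 1  # hypothetical; bump to invalidate old pickles


def cache_key(fasta_path, lengths, gene_name=None, gene_id=None):
    """Build a (path, size, mtime, lengths, gene_name, gene_id) key."""
    st = os.stat(fasta_path)
    return (
        os.path.abspath(fasta_path),
        st.st_size,
        st.st_mtime_ns,  # assumption: nanosecond mtime; the source only says "mtime"
        tuple(sorted(lengths)),
        gene_name,
        gene_id,
    )


def cache_filename(key):
    """Version-prefixed name: readable basename and lengths, plus a short digest."""
    path, _size, _mtime, lengths, _gene_name, _gene_id = key
    digest = hashlib.sha256(repr(key).encode("utf-8")).hexdigest()[:16]
    lens = "-".join(str(n) for n in lengths)
    return f"v{CACHE_FORMAT_VERSION}_{os.path.basename(path)}_k{lens}_{digest}.pkl"
```

Because size and mtime are part of the hashed key, editing a FASTA produces a different filename and the stale pickle is simply never looked up again.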

New helpers in proteome.py:
- _disk_cache_filename(): version-prefixed, basename+lengths+sha256
  filename for human debuggability
- _load_index_from_disk(): returns None on miss / corrupt / wrong
  format; touches mtime on hit for LRU tracking
- _write_index_to_disk(): atomic .tmp + rename; failures warn instead
  of crashing the build
- _evict_disk_cache_if_over_cap(): mtime-LRU eviction triggered after
  every successful write
- set_disk_cache_dir() / clear_disk_cache(): public for tests

Cache cap: 50 GB default, configurable via
HITLIST_PROTEOME_INDEX_CACHE_GB env var.  Set to 0 to disable.
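A hedged sketch of how the cap and the mtime-LRU eviction might fit together. Only the environment variable name and the 50 GB default come from the text above; the function names, the `.pkl` suffix, the fallback on unparsable values, and treating 1 GB as 1024**3 bytes are assumptions:

```python
import os

DEFAULT_CAP_GB = 50.0


def cache_cap_bytes(environ=os.environ):
    """Read the cap from HITLIST_PROTEOME_INDEX_CACHE_GB; 0 means caching is off."""
    raw = environ.get("HITLIST_PROTEOME_INDEX_CACHE_GB")
    try:
        gb = DEFAULT_CAP_GB if raw is None else float(raw)
    except ValueError:
        gb = DEFAULT_CAP_GB  # assumption: fall back to the default on bad input
    return int(max(gb, 0.0) * 1024**3)


def evict_if_over_cap(cache_dir, cap_bytes):
    """Delete least-recently-touched pickles until the total size fits the cap."""
    entries = []
    with os.scandir(cache_dir) as it:
        for entry in it:
            if entry.is_file() and entry.name.endswith(".pkl"):
                st = entry.stat()
                entries.append((st.st_mtime_ns, st.st_size, entry.path))
    total = sum(size for _, size, _ in entries)
    removed = []
    for _mtime, size, path in sorted(entries):  # oldest mtime first
        if total <= cap_bytes:
            break
        try:
            os.remove(path)
        except OSError:
            continue  # another process may have removed it already
        total -= size
        removed.append(path)
    return removed
```

In this sketch a caller would skip both reads and writes when `cache_cap_bytes()` returns 0, and otherwise call `evict_if_over_cap` after each successful write.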

Wired into ProteomeIndex.from_fasta between... (continued)

4348 of 5772 relevant lines covered (75.33%)

0.75 hits per line

Coverage Regressions

Lines  Coverage  ∆      File
38     89.02     1.21%  proteome.py
Jobs
ID  Job ID         Ran                      Files  Coverage
1   25711122998.1  12 May 2026 03:18AM UTC  28     75.33
  • f954507b on github