25687406882
77%

Ran 11 May 2026 07:11PM UTC

Jobs 1

Files 35

Run time 1min

Committed 11 May 2026 05:51PM UTC coverage: 70.727% (+0.3%) from 70.378%

Build # 25687406882

Build Type

push

github

Committed by

web-flow

Commit Message

Cache ProteomeIndex; drop dead legacy parquet fallback (#71)

* Cache ProteomeIndex; drop dead legacy parquet fallback (#69)

Two small reductions of the secondary memory hot paths identified in
#69:

1. `tsarina.indexing.load_ms_evidence` always pushes `peptide=` down
   to hitlist's `load_observations`. The TypeError fallback for
   hitlist < 1.6.0 was dead code under the `hitlist>=1.15.1` pyproject
   pin, and the matching regression test
   (`test_load_ms_evidence_falls_back_when_peptide_kwarg_unsupported`)
   is removed alongside.

2. `tsarina.cli_hits._enumerate_gene_peptides` now resolves its
   `ProteomeIndex` through an `lru_cache(maxsize=1)` keyed by
   `(release, lengths)`. The niche `--skip-ms-evidence` /
   `--iedb` / `--cedar` paths build a ~8-15 GB peak full-proteome
   index; back-to-back gene queries in a single process now reuse it.
   New test pins the call count via mock.
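
   The caching pattern in item 2 can be sketched roughly as follows. This is only an illustration, not the code in `tsarina.cli_hits`: `FakeProteomeIndex`, `cached_proteome_index`, `enumerate_gene_peptides`, the release number `112`, and the default lengths are all hypothetical stand-ins, and `BUILD_CALLS` exists only to make the number of expensive builds visible.

   ```python
   from __future__ import annotations

   from functools import lru_cache

   # Records every (release, lengths) pair that triggered a real build.
   BUILD_CALLS: list[tuple[int, tuple[int, ...]]] = []


   class FakeProteomeIndex:
       """Hypothetical stand-in for the large full-proteome index."""

       def __init__(self, release: int, lengths: tuple[int, ...]) -> None:
           self.release = release
           self.lengths = lengths


   @lru_cache(maxsize=1)
   def cached_proteome_index(release: int, lengths: tuple[int, ...]) -> FakeProteomeIndex:
       # The arguments must be hashable, so lengths is a tuple, not a list.
       BUILD_CALLS.append((release, lengths))
       return FakeProteomeIndex(release, lengths)


   def enumerate_gene_peptides(gene: str, release: int = 112, lengths=(8, 9, 10, 11)) -> dict:
       # Normalise lengths to a tuple so equal inputs share one cache key.
       index = cached_proteome_index(release, tuple(lengths))
       return {"gene": gene, "index": index}
   ```

   With `maxsize=1`, two back-to-back queries that share `(release, lengths)` reuse one index object, while a query with a different key evicts the old entry instead of holding two indexes in memory at once.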

Skipping cross-process disk caching for `proteome_kmer_set` — already
cached process-wide by hitlist, and persistent caching is a bigger
feature tracked separately.

Refs #69.

* Address review feedback on #71

- Rewrite the comment on `_cached_proteome_index` so it reflects the
  actual reason maxsize=1 is the right cap: only one (release, lengths)
  is ever passed within a process, and the index itself is ~8-15 GB
  resident so a second cache entry would double memory without buying
  any hit-rate.
- Extend the cache regression test to also assert "peptide" is among
  the returned columns, catching any future regression that breaks the
  enumeration's column shape when results come from the cache.
- Update the `load_ms_evidence` docstring to describe what the function
  now does: a single pushdown-filtered parquet read with the peptide
  set as one of the pushdown keys. The prior wording still referenced
  the in-memory fallback that this PR removed.
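
A regression test of the kind described in the second bullet might look like the sketch below. It is not the test from the PR: `make_cached_index`, `enumerate_gene_peptides`, the gene names, and the dict-of-columns return shape are hypothetical, chosen so the example runs with only the standard library.

```python
from functools import lru_cache
from unittest import mock


def make_cached_index(builder):
    """Wrap a (hypothetical) index builder in a one-slot cache."""

    @lru_cache(maxsize=1)
    def cached(release, lengths):
        return builder(release, lengths)

    return cached


def enumerate_gene_peptides(cached_index, gene, release, lengths):
    """Toy enumeration returning a mapping of column name to values."""
    index = cached_index(release, tuple(lengths))
    peptides = index.get(gene, [])
    return {"gene_name": [gene] * len(peptides), "peptide": list(peptides)}


def test_index_built_once_and_columns_stable():
    # The mock replaces the expensive build and counts how often it runs.
    builder = mock.Mock(
        return_value={"GENE_A": ["SIINFEKL"], "GENE_B": ["GILGFVFTL"]}
    )
    cached = make_cached_index(builder)

    first = enumerate_gene_peptides(cached, "GENE_A", 112, (8, 9))
    second = enumerate_gene_peptides(cached, "GENE_B", 112, (8, 9))

    # Pin the call count: the second query must be served from the cache.
    assert builder.call_count == 1
    builder.assert_called_once_with(112, (8, 9))
    # The column shape must hold for cached results too.
    for result in (first, second):
        assert "peptide" in result
```

Injecting the builder as a `Mock` avoids patching a module attribute, so the sketch does not depend on any particular import path.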

* Fix load_ms_evidence docstring on gene_name pushdown

The prior revision dropped the hedge from the main desc... (continued)

Coverage Stats

2392 of 3382 relevant lines covered (70.73%)

0.71 hits per line
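
As a quick check, the headline percentage and the reported change follow directly from the line counts above and the previous figure of 70.378%. The variable names in this sketch are illustrative only.

```python
covered, relevant = 2392, 3382   # from "2392 of 3382 relevant lines covered"
previous_pct = 70.378            # coverage of the previous build

pct = 100 * covered / relevant   # current coverage in percent
delta = pct - previous_pct       # change relative to the previous build

print(f"{pct:.3f}% ({delta:+.1f}%)")  # prints "70.727% (+0.3%)"
```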

Coverage Regressions

Lines	Coverage	∆	File
80	55.02	6.58%	cli_hits.py
1	95.65	-1.12%	indexing.py

Jobs

ID	Job ID	Ran	Files	Coverage
1	25687406882.1	11 May 2026 07:11PM UTC	35	70.73	GitHub Action Run

pirl-unc / tsarina / 25687406882