
pirl-unc / hitlist / 24757161437
80%

Build:
DEFAULT BRANCH: main
Ran 22 Apr 2026 02:34AM UTC
Jobs 1
Files 21
Run time 1min
22 Apr 2026 02:33AM UTC coverage: 49.069% (+0.2%) from 48.854%
push

github

web-flow
v1.14.2: expose proteome_kmer_set primitive for cross-package caching (#99) (#114)

Adds a shared primitive for "every protein-coding k-mer at lengths X,
optionally restricted to a gene subset" — the common recipe for
self-peptide subtraction in neoantigen / vaccine candidate filtering.
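As a rough illustration of the self-peptide subtraction recipe described above, the sketch below drops candidate peptides that occur verbatim in a precomputed set of self k-mers. The helper name `subtract_self` and the toy data are hypothetical and are not part of hitlist.

```python
def subtract_self(candidates, self_kmers):
    """Keep only candidates that do not appear verbatim in the self k-mer set."""
    return [pep for pep in candidates if pep not in self_kmers]


# Toy data, assumed for illustration only.
self_kmers = frozenset({"SIINFEKL", "AAAAAAAA"})
candidates = ["SIINFEKL", "GILGFVFTL", "AAAAAAAA", "NLVPMVATV"]

kept = subtract_self(candidates, self_kmers)
print(kept)
```

Because membership tests on a frozenset are hash lookups, the filter stays cheap even when the self set is very large; the cost is concentrated in building that set once.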

## What ships

**ProteomeIndex methods:**
- ``all_kmers: frozenset[str]`` — cached property returning every
  unique k-mer at ``self.lengths`` across all indexed proteins.
- ``kmers_for_genes(gene_ids: frozenset[str]) -> frozenset[str]`` —
  walks the proteins table (not the k-mer index) to re-extract k-mers
  from proteins whose ``gene_id`` is in ``gene_ids``. Returns an
  independent frozenset per call; cache yourself if needed.
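A minimal sketch of what these two methods compute, assuming proteins are available as plain `(gene_id, sequence)` pairs. The function names and the toy sequences are hypothetical; the real ProteomeIndex storage and internals may differ.

```python
def extract_kmers(sequence, lengths):
    """Return every unique substring of `sequence` at each length in `lengths`."""
    return {
        sequence[i : i + k]
        for k in lengths
        for i in range(len(sequence) - k + 1)  # empty range if k > len(sequence)
    }


def all_kmers_sketch(proteins, lengths):
    """Union of k-mers over every protein (the analogue of `all_kmers`)."""
    out = set()
    for _gene_id, seq in proteins:
        out |= extract_kmers(seq, lengths)
    return frozenset(out)


def kmers_for_genes_sketch(proteins, lengths, gene_ids):
    """Re-extract k-mers only from proteins whose gene_id is in `gene_ids`."""
    out = set()
    for gene_id, seq in proteins:
        if gene_id in gene_ids:
            out |= extract_kmers(seq, lengths)
    return frozenset(out)


# Toy proteins table, assumed for illustration only.
proteins = [("G1", "ABCDE"), ("G2", "CDEFG"), ("G1", "XYZ")]
everything = all_kmers_sketch(proteins, (3, 4))
subset = kmers_for_genes_sketch(proteins, (3, 4), frozenset({"G1"}))
```

Walking the proteins rather than a k-mer index keeps the gene filter simple: a k-mer shared by two genes (here `CDE`) is still returned for whichever gene is requested.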

**Module-level primitive:**
- ``hitlist.proteome.proteome_kmer_set(release, lengths, gene_ids=None,
  species="human") -> frozenset[str]`` — builds the ProteomeIndex on
  demand via ``from_ensembl`` (reuses any in-process cache from #86)
  and caches the returned frozenset via ``@lru_cache(maxsize=8)`` so
  repeat calls with identical args are <1 ms.
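The caching behavior can be sketched with a stand-in for the expensive build. Everything below is a hypothetical illustration: `_build_kmers`, `cached_kmer_set`, and the release number `112` are assumptions, and only the `@lru_cache(maxsize=8)` decorator is taken from the description above.

```python
from functools import lru_cache

BUILD_CALLS = 0  # counts how often the expensive path actually runs


def _build_kmers(release, lengths, gene_ids, species):
    """Stand-in for the slow index build; returns a small fake k-mer set."""
    global BUILD_CALLS
    BUILD_CALLS += 1
    return frozenset(f"{species}-{release}-{k}" for k in lengths)


@lru_cache(maxsize=8)
def cached_kmer_set(release, lengths, gene_ids=None, species="human"):
    # lru_cache hashes the arguments, so `lengths` must be a tuple and
    # `gene_ids` a frozenset (or None); lists and sets raise TypeError.
    return _build_kmers(release, lengths, gene_ids, species)


first = cached_kmer_set(112, (8, 9), frozenset({"G1"}))
second = cached_kmer_set(112, (8, 9), frozenset({"G1"}))  # served from cache
```

In this sketch the second call returns the very same frozenset object without rebuilding. One caveat of `functools.lru_cache` worth keeping in mind: the same values passed positionally in one call and by keyword in another may be cached as separate entries, so callers benefit from a consistent calling style.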

## Why

tsarina (v0.10.1, PRs #18 / #19) ships its own ``_non_cta_proteome_kmers``
and ``_human_proteome_kmers`` helpers that walk ``ensembl.genes()`` and
build ``frozenset[str]`` objects. They're correctly cached per-process,
but every future sibling package (perseus, topiary, notebook scripts)
would have to re-implement the same walk. The primitive belongs in
hitlist, which already owns ``ProteomeIndex.from_ensembl``.

Tsarina's forthcoming PR #19 collapses its local helpers into a
delegate call: ``proteome_kmer_set(release, lengths, gene_ids=...)``.
Downstream packages can follow the same pattern.

## Cost / caching

- First call (~10-60 s): builds the ProteomeIndex and, when ``gene_ids``
  is given, also iterates the protein sequences to apply the filter.
- Subsequent calls with identical args: <1 ms — cached frozenset.
- Memory: full human at lengths (8,9,10,11) is ~1 GB for the frozenset.
  v1.14.0's compact ProteomeInde... (continued)

1554 of 3167 relevant lines covered (49.07%)

0.49 hits per line

Coverage Regressions

Lines  Coverage  ∆     File
19     75.98     0.0%  proteome.py
Jobs
ID  Job ID         Ran                      Files  Coverage
1   24757161437.1  22 Apr 2026 02:34AM UTC  21     49.07
GitHub Action Run
Source Files on build 24757161437: 21 files listed, 1 changed (0 source changed, 1 coverage changed)
  • Back to Repo
  • Github Actions Build #24757161437
  • 7c108bc6 on github
  • Prev Build on main (#24757065298)
  • Next Build on main (#24782354308)

© 2026 Coveralls, Inc