pirl-unc / hitlist / 24754572661
80%

Build:
DEFAULT BRANCH: main
Ran 22 Apr 2026 01:01AM UTC
Jobs 1
Files 21
Run time 1min
22 Apr 2026 01:00AM UTC coverage: 48.541% (+0.1%) from 48.394%
24754572661

push

github

web-flow
v1.14.0: shrink ProteomeIndex + length-on-demand — full build now runs on 8-16 GB machines (#110) (#111)

Cuts peak RSS during ``hitlist data build`` from ~13 GB to ~3 GB through
two independent changes:

## (1) Packed int64 postings in ``ProteomeIndex.index``

Old: ``dict[str, list[tuple[str, int]]]``  — ~195 bytes per k-mer entry.
New: ``dict[str, int | np.ndarray]``       — scalar int for single-hit
     k-mers (39.7M out of 41.7M for human), np.int64 array for multi-hit.

Postings are packed as ``(prot_idx << 32) | pos`` where ``prot_idx``
refers to a new ``_protein_ids`` list populated in ``_build``. Public
API is unchanged — ``lookup()`` and ``map_peptides()`` decode the
packed form transparently; callers that were reading ``self.index``
values directly (there were none in the codebase) would need to
migrate.
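
The packing scheme described above can be sketched as follows. This is a minimal illustration, not the hitlist implementation: ``pack``, ``unpack``, ``build_index`` and ``lookup`` are hypothetical free functions, and the standard-library ``array('q')`` (signed 64-bit) stands in for the ``np.int64`` array so the sketch has no third-party dependency. The temporary per-k-mer lists used during construction are also a simplification and would not give the memory savings of the real build.

```python
from array import array

POS_BITS = 32
POS_MASK = (1 << POS_BITS) - 1


def pack(prot_idx: int, pos: int) -> int:
    """Pack a (protein index, position) pair into one integer."""
    return (prot_idx << POS_BITS) | pos


def unpack(packed: int) -> tuple[int, int]:
    """Inverse of pack(): recover (protein index, position)."""
    return packed >> POS_BITS, packed & POS_MASK


def build_index(proteins: dict[str, str], k: int):
    """Return (protein_ids, index) for all k-mers in `proteins`.

    Single-hit k-mers are stored as a bare int, multi-hit k-mers as a
    64-bit integer array (stand-in for an np.int64 ndarray).
    """
    protein_ids = list(proteins)
    hits: dict[str, list[int]] = {}
    for prot_idx, pid in enumerate(protein_ids):
        seq = proteins[pid]
        for pos in range(len(seq) - k + 1):
            hits.setdefault(seq[pos:pos + k], []).append(pack(prot_idx, pos))
    index = {
        kmer: (p[0] if len(p) == 1 else array("q", p))
        for kmer, p in hits.items()
    }
    return protein_ids, index


def lookup(protein_ids, index, kmer):
    """Decode postings back to (protein_id, position) tuples."""
    value = index.get(kmer)
    if value is None:
        return []
    packed = [value] if isinstance(value, int) else value.tolist()
    return [(protein_ids[p >> POS_BITS], p & POS_MASK) for p in packed]


protein_ids, index = build_index({"P1": "ABCDE", "P2": "XBCDY"}, k=3)
single = lookup(protein_ids, index, "ABC")  # one hit, stored as a scalar
multi = lookup(protein_ids, index, "BCD")   # two hits, stored as an array
```

Because decoding happens inside ``lookup``, callers still see ``(protein_id, position)`` tuples and never touch the packed integers.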

Human (8/9/10/11-mers): index drops from ~8 GB to ~6 GB in isolation.

## (2) Length-on-demand in build_peptide_mappings

The mapping pass in ``hitlist.mappings.build_peptide_mappings`` was
holding a single ProteomeIndex covering all four MHC-I lengths
simultaneously (~10 GB for human). Switched to iterating per-length:
build (9,)-only index, query all 9-mers, drop, build (10,)-only, etc.

Added ``lengths=`` kwarg to ``_build_species_index``. Each
single-length human index is ~3.2 GB; sequential build + drop keeps
peak bounded at that.
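
The build, query, drop loop can be sketched as below. This is an assumption-laden illustration: ``build_index`` is a hypothetical stand-in for ``_build_species_index(..., lengths=...)`` and uses plain tuples rather than packed postings, and ``map_peptides_by_length`` is an invented name, not the actual ``build_peptide_mappings`` code.

```python
from collections import defaultdict


def build_index(proteins, lengths):
    """Hypothetical stand-in for _build_species_index(..., lengths=...)."""
    index = defaultdict(list)
    for pid, seq in proteins.items():
        for k in lengths:
            for pos in range(len(seq) - k + 1):
                index[seq[pos:pos + k]].append((pid, pos))
    return index


def map_peptides_by_length(proteins, peptides, lengths=(8, 9, 10, 11)):
    """Map peptides to proteins while holding one single-length index at a time."""
    by_length = defaultdict(list)
    for pep in peptides:
        by_length[len(pep)].append(pep)

    mappings = {}
    for k in lengths:
        batch = by_length.get(k)
        if not batch:
            continue  # no peptides of this length, so skip the build
        index = build_index(proteins, lengths=(k,))
        for pep in batch:
            mappings[pep] = list(index.get(pep, []))
        del index  # release this length's index before building the next
    return mappings


proteins = {"P1": "ACDEFGHIKLMN"}
result = map_peptides_by_length(
    proteins, ["ACDEFGHI", "CDEFGHIKL", "WWWWWWWWWW"]
)
```

Only one single-length index is alive at any point in the loop, which is what bounds the peak. In this sketch, peptides whose length is not in ``lengths`` are simply left out of the result.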

Peak RSS during mapping pass: ~13 GB → ~3.3 GB. MCF7-sized laptop
machines no longer get OOM-killed mid-flanking.

## Merge behavior

``ProteomeIndex.merge`` now rebuilds the compact representation from
the union of protein dicts rather than copying list-of-tuple postings.
This is simpler and stays correct under the new int-indexed posting
representation. It is a few percent slower for human + small-viral
merges, which is negligible in the typical case.
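
One way to see why rebuilding is the safe choice: a packed posting stores an index into that particular index's own ``_protein_ids`` list, so copying postings between indexes would point them at the wrong proteins. The toy class below sketches this under stated assumptions; ``TinyIndex`` is hypothetical, stores postings as plain lists of packed ints, and is not the real ``ProteomeIndex``.

```python
class TinyIndex:
    """Toy single-length index with packed (prot_idx << 32) | pos postings."""

    def __init__(self, proteins, k):
        self.proteins = dict(proteins)
        self.k = k
        self._protein_ids = list(self.proteins)
        self.index = {}
        for prot_idx, pid in enumerate(self._protein_ids):
            seq = self.proteins[pid]
            for pos in range(len(seq) - k + 1):
                packed = (prot_idx << 32) | pos
                self.index.setdefault(seq[pos:pos + k], []).append(packed)

    def lookup(self, kmer):
        """Decode postings into (protein_id, position) tuples."""
        return [
            (self._protein_ids[p >> 32], p & 0xFFFFFFFF)
            for p in self.index.get(kmer, [])
        ]

    @classmethod
    def merge(cls, a, b):
        """Rebuild from the union of protein dicts instead of copying postings."""
        if a.k != b.k:
            raise ValueError("indexes must share the same k")
        return cls({**a.proteins, **b.proteins}, a.k)


human = TinyIndex({"H1": "ABCD"}, k=3)
viral = TinyIndex({"V1": "BCDE"}, k=3)
merged = TinyIndex.merge(human, viral)
```

In ``viral`` the posting for ``"BCD"`` is the integer 0 (protein 0, position 0). Copied verbatim into ``human`` it would decode as ``("H1", 0)``, which is wrong. After the rebuild, ``V1`` gets index 1 and the posting decodes correctly.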

## Tests

All 14 existing ProteomeIndex tests + 111 related tests pass unchanged,
which indicates the packed representation is transparent at the public API.
Empirical benchmar... (continued)

1514 of 3119 relevant lines covered (48.54%)

0.49 hits per line

Coverage Regressions

Lines  Coverage  ∆       File
49     16.67     -1.17%  mappings.py
38     75.98     2.9%    proteome.py
Jobs
ID  Job ID         Ran                      Files  Coverage
1   24754572661.1  22 Apr 2026 01:01AM UTC  21     48.54
  • Github Actions Build #24754572661
  • dcca9af8 on github
  • Prev Build on main (#24700180143)
  • Next Build on main (#24757065298)