
pirl-unc / hitlist / 24791370603
80%

Build:
DEFAULT BRANCH: main
Ran 22 Apr 2026 04:59PM UTC
Jobs 1
Files 21
Run time 1min

22 Apr 2026 04:57PM UTC coverage: 50.2% (+0.06%) from 50.139%
24791370603

push

github

web-flow
v1.15.1: normalize mhc_restriction on ingest + length bounds on observations/binding (#121, #118) (#123)

## #121 — normalize_allele at ingest

Before this fix, the scanner and supplement ingest paths wrote raw
``mhc_restriction`` strings to the parquet without normalization.
Supplementary data (Gomez-Zepeda 2024 SK-MEL-37, PMID 38480730)
contained ``A*02:01`` without the ``HLA-`` prefix; the scanner
passes IEDB strings through verbatim, so it occasionally carries over
typos or unusual formats.

Effect: ``load_observations(mhc_restriction="HLA-A*02:01")`` would
miss rows stored as ``A*02:01``. The CLI ``--mhc-allele`` path
normalizes at query time so it hits both, but raw parquet inspection
and aggregators that group on the raw column silently split one allele
into separate groups.

Fix: apply ``hitlist.curation.normalize_allele`` to
``mhc_restriction`` in both ingest paths. The function is already
``@cache``d on the unique vocabulary (~100k strings), so per-row cost
is ~100 ns after the first hit.
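
The split described above, and the effect of normalizing at ingest, can be sketched in a few lines. This is an illustration only: ``normalize_allele_sketch`` is a hypothetical stand-in that merely adds a missing ``HLA-`` prefix. The real ``hitlist.curation.normalize_allele`` is not shown in this commit message and presumably handles many more formats. ``functools.cache`` stands in for the ``@cache`` decorator mentioned above.

```python
from collections import Counter
from functools import cache


@cache
def normalize_allele_sketch(raw: str) -> str:
    """Hypothetical stand-in for hitlist.curation.normalize_allele.

    Only strips whitespace and adds a missing "HLA-" prefix; the real
    function is assumed to do considerably more.
    """
    allele = raw.strip()
    if not allele.startswith("HLA-"):
        allele = "HLA-" + allele
    return allele


# Raw mhc_restriction values as they might arrive from two ingest paths.
raw_values = ["HLA-A*02:01", "A*02:01", "HLA-A*02:01", "A*02:01"]

# Grouping on the raw column splits one allele into two groups.
raw_counts = Counter(raw_values)

# Normalizing at ingest collapses them into a single group.
normalized_counts = Counter(normalize_allele_sketch(v) for v in raw_values)

print(raw_counts)
print(normalized_counts)
# Each unique string is computed once; repeats are served from the cache.
print(normalize_allele_sketch.cache_info())
```

Because the cache is keyed on the input string, the cost of the normalization logic is paid once per unique value rather than once per row, which is the property the fix relies on.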

## #118 — length_min / length_max on load_observations / load_binding

``load_bulk_peptides`` shipped ``length_min`` / ``length_max`` in
v1.14.3 (#108). The MHC peptide loaders needed the same — every
training-set script was post-filtering ``df[df["peptide"].str.len()
.between(8, 11)]`` after load.

- ``load_observations(length_min=8, length_max=11)`` → MHC-I window
- ``load_observations(length_min=12, length_max=25)`` → MHC-II window
- ``load_binding(length_min=9, length_max=9)`` → strictly 9-mers
- ``load_all_evidence(length_min=..., length_max=...)`` passes
  through to both underlying loaders

Implementation detail: observations.parquet / binding.parquet don't
carry an explicit ``length`` column (unlike the bulk parquet), so the
bound is applied post-read via ``peptide.str.len().between(lo, hi)``.
For the full 4.4M-row observations parquet this is ~100 ms, small
relative to the read. A future PR can add ``length`` as a stored Int64
column at build time and push the filter down; for now the sim... (continued)
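
A minimal sketch of the post-read length bound, written against plain Python rows rather than a DataFrame so it runs without extra dependencies. ``filter_by_length`` is a hypothetical helper, not a hitlist function. Its bounds are inclusive, mirroring the ``between(lo, hi)`` call quoted above, and treating ``None`` as "no limit" is an assumption, since the commit message does not state the defaults.

```python
def filter_by_length(rows, length_min=None, length_max=None):
    """Keep rows whose peptide length lies in [length_min, length_max].

    Hypothetical helper, not part of hitlist. A bound of None is treated
    as "no limit" (an assumption; the source does not state the default).
    """
    lo = 0 if length_min is None else length_min
    hi = float("inf") if length_max is None else length_max
    return [row for row in rows if lo <= len(row["peptide"]) <= hi]


# Toy rows standing in for records read from a peptide table.
rows = [
    {"peptide": "SIINFEKL"},       # 8 residues
    {"peptide": "GILGFVFTL"},      # 9 residues
    {"peptide": "PKYVKQNTLKLAT"},  # 13 residues
]

# The three windows listed in the commit message.
mhc_i = filter_by_length(rows, length_min=8, length_max=11)
mhc_ii = filter_by_length(rows, length_min=12, length_max=25)
nine_mers = filter_by_length(rows, length_min=9, length_max=9)

print([r["peptide"] for r in mhc_i])
print([r["peptide"] for r in mhc_ii])
print([r["peptide"] for r in nine_mers])
```

Filtering after the read keeps the loaders independent of the stored schema, at the cost of materializing rows that are then discarded.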

1629 of 3245 relevant lines covered (50.2%)

0.5 hits per line

Coverage Regressions

| Lines | Coverage | ∆ | File |
|-------|----------|--------|-----------------|
| 21 | 27.67 | -0.18% | scanner.py |
| 8 | 91.0 | -0.49% | observations.py |
| 4 | 86.57 | 0.2% | supplement.py |

Jobs
| ID | Job ID | Ran | Files | Coverage |
|----|---------------|-------------------------|-------|----------|
| 1 | 24791370603.1 | 22 Apr 2026 04:59PM UTC | 21 | 50.2 |

GitHub Action Run
Source Files on build 24791370603
  • Tree
  • List 21
  • Changed 3
  • Source Changed 0
  • Coverage Changed 3
  • Github Actions Build #24791370603
  • e72573c8 on github
  • Prev Build on main (#24789976505)
  • Next Build on main (#24793421865)