25165443155

Committed 30 Apr 2026 12:30PM UTC coverage: 70.099% (+0.09%) from 70.012%

Build # 25165443155

Build Type

push

github

Committed by

web-flow

Commit Message

v1.30.9: hitlist qc discrepancies — per-PMID curation triage report (#193)

Adds a fourth QC subcommand that scans observations.parquet for
biologically suspicious patterns and surfaces per-PMID rates. One row
per (pmid, mhc_class) bucket, sorted descending by suspect-row score
so a curator picks targets from the top.
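
A minimal sketch of the rollup described above, not the project's actual code. The function name `rollup_suspects`, the input row shape (dicts with `pmid`, `mhc_class`, `suspect`), and the use of the raw suspect count as the sort score are all assumptions made for illustration. The `min_rows` cutoff is assumed to drop small buckets, mirroring the `--min-rows` default of 50.

```python
from collections import defaultdict


def rollup_suspects(rows, min_rows=50):
    """Collapse rows into one record per (pmid, mhc_class) bucket.

    `rows` is an iterable of dicts with hypothetical keys
    "pmid", "mhc_class", and "suspect" (truthy when the row is flagged).
    """
    totals = defaultdict(int)
    suspects = defaultdict(int)
    for row in rows:
        key = (row["pmid"], row["mhc_class"])
        totals[key] += 1
        suspects[key] += bool(row["suspect"])

    report = [
        {
            "pmid": pmid,
            "mhc_class": mhc_class,
            "n_rows": n,
            "suspect_n": suspects[(pmid, mhc_class)],
            "suspect_rate": suspects[(pmid, mhc_class)] / n,
        }
        for (pmid, mhc_class), n in totals.items()
        if n >= min_rows  # assumed meaning of the row-count cutoff
    ]
    # Highest suspect count first, so the top of the list is the best target.
    report.sort(key=lambda r: r["suspect_n"], reverse=True)
    return report
```

Sorting on the count rather than the rate keeps large, mostly clean papers from being buried under tiny buckets with a handful of bad rows; the real score may weigh these differently.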

Detected patterns:
- suspect_class_label_n / _rate (#182): per-PMID count of rows where
  bimodal length disagrees with curated class — the v1.30.0
  mhc_class_label_suspect flag rolled up.
- length_p50 / _p99: median + 99th percentile peptide length per
  bucket. Class I should sit at p50≈9, p99≤12; class II at p50≈14-15.
- monoallelic_class_only_n (#45): mono-allelic rows that carry the
  "HLA class I/II" sentinel instead of a 4-digit allele — paper knows
  the allele, IEDB lost it.
- class_pool_n / _rate (#37): rows that came down the
  pmid_class_pool fallback (no per-peptide allele resolution).
- nonstandard_aa_n: peptides containing X / B / Z / U / O / lowercase
  / digits — ambiguous PSM IDs or upstream string corruption.
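
As an illustration of two of the patterns above, the following sketch computes the length percentiles and the nonstandard-residue count for one bucket of peptides. It is not taken from the repository: `has_nonstandard_aa`, `nearest_rank_percentile`, and `length_summary` are hypothetical helpers, and the nearest-rank percentile definition is an assumption (the real code may interpolate). The character class is taken directly from the list in the commit message.

```python
import math
import re
from statistics import median

# Characters listed as suspicious: X, B, Z, U, O, any lowercase
# letter, or any digit.
NONSTANDARD_AA = re.compile(r"[XBZUOa-z0-9]")


def has_nonstandard_aa(peptide: str) -> bool:
    """True if the peptide contains at least one flagged character."""
    return NONSTANDARD_AA.search(peptide) is not None


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def length_summary(peptides):
    """Per-bucket summary; expects a non-empty list of peptide strings."""
    lengths = [len(p) for p in peptides]
    return {
        "length_p50": median(lengths),
        "length_p99": nearest_rank_percentile(lengths, 99),
        "nonstandard_aa_n": sum(has_nonstandard_aa(p) for p in peptides),
    }
```

Under these assumptions, a class I bucket whose `length_p50` lands well above 9 or whose `length_p99` exceeds 12 would be worth a curator's attention.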

Also wired into hitlist/cli.py as `hitlist qc discrepancies` with
--class, --min-rows (default 50), --top, --output, and added to
qc.run_all() so `hitlist qc` (no subcommand) prints the new report.
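
One way such a subcommand could be wired up with `argparse` is sketched below. This is an assumption-laden illustration, not the contents of hitlist/cli.py: `build_parser` is a hypothetical name, the project may use a different CLI library, and only the flag names and the `--min-rows` default of 50 come from the commit message. Argument types and the other defaults are guesses.

```python
import argparse


def build_parser():
    """Hypothetical parser layout for `hitlist qc discrepancies`."""
    parser = argparse.ArgumentParser(prog="hitlist")
    commands = parser.add_subparsers(dest="command")

    qc = commands.add_parser("qc")
    # Leaving qc_command unset (None) corresponds to plain `hitlist qc`,
    # which the commit says runs every report.
    qc_commands = qc.add_subparsers(dest="qc_command")

    disc = qc_commands.add_parser("discrepancies")
    # `class` is a Python keyword, so store the value under another name.
    disc.add_argument("--class", dest="mhc_class", default=None)
    disc.add_argument("--min-rows", type=int, default=50)
    disc.add_argument("--top", type=int, default=None)  # assumed: row limit
    disc.add_argument("--output", default=None)  # assumed: output path
    return parser
```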

Coverage Stats

3486 of 4973 relevant lines covered (70.1%)

0.7 hits per line

Source File
55.24

/cli.py

pirl-unc / hitlist / 25165443155
