24782354308
80%

Ran 22 Apr 2026 01:57PM UTC

Jobs 1

Files 21

Run time 1min

Committed 22 Apr 2026 01:55PM UTC coverage: 49.673% (+0.6%) from 49.069%

Build # 24782354308

Build Type

push

github

Committed by

web-flow

Commit Message

v1.14.3: in-silico digest + length/percentile bounds on bulk loaders (#104, #108) (#115)

## #104 — hitlist.proteome.digest()

Centralizes in-silico cleavage rules so every consumer building
theoretical-negative peptide sets (MS-detectability training, vaccine
candidate pre-screening, peptide identity round-tripping) doesn't
re-derive them by hand:

    >>> from hitlist.proteome import digest
    >>> digest(prame_seq, enzyme="Trypsin/P", min_len=7, max_len=30, max_missed=2)
    >>> digest(prame_seq, enzyme="GluC")     # bicarbonate: cleaves E AND D
    >>> digest(prame_seq, enzyme="LysC")     # allows K-P cleavage
    >>> digest(prame_seq, enzyme="Chymotrypsin")  # MaxQuant's + variant (F/W/Y/L/M)

Dispatch key matches ``sources.yaml::digestion_enzyme`` exactly, so
results are directly comparable to ``load_bulk_peptides(digestion_enzyme=...)``
output. Short aliases (``"Trypsin"``, ``"trypsin"``, ``"chymo"``, …)
resolve to the canonical form for ergonomics.
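The alias handling described above might look roughly like the following. This is a minimal sketch, not hitlist's implementation: the function name `resolve_enzyme`, the alias table, and the assumption that the rule-table names (``Trypsin/P``, ``Chymotrypsin+``, ``GluC;D.P``, ``LysC/P``) are the canonical keys are all illustrative.

```python
# Assumed canonical dispatch keys, taken from the rule names in this commit message.
CANONICAL_ENZYMES = ("Trypsin/P", "Chymotrypsin+", "GluC;D.P", "LysC/P")

# Hypothetical short aliases (lower-cased); the real alias set may differ.
_ALIASES = {
    "trypsin": "Trypsin/P",
    "chymo": "Chymotrypsin+",
    "chymotrypsin": "Chymotrypsin+",
    "gluc": "GluC;D.P",
    "lysc": "LysC/P",
}


def resolve_enzyme(name: str) -> str:
    """Map a user-supplied enzyme name to its canonical dispatch key."""
    # Exact canonical names pass through unchanged.
    if name in CANONICAL_ENZYMES:
        return name
    key = name.strip().lower()
    # Accept case variants of a canonical name, e.g. "trypsin/p".
    for canonical in CANONICAL_ENZYMES:
        if canonical.lower() == key:
            return canonical
    # Fall back to the alias table; fail loudly on unknown names.
    if key in _ALIASES:
        return _ALIASES[key]
    raise ValueError(f"unknown enzyme: {name!r}")
```

Raising on unknown names, rather than defaulting to trypsin, keeps a typo from silently producing a peptide set under the wrong rule.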

Subtle corners captured in the rules table:
- **Trypsin/P**: not before P (classical KP/RP rule)
- **Chymotrypsin+**: F/W/Y/**L/M**, not before P (MaxQuant's permissive
  variant — matches the Bekker-Jensen ingest)
- **GluC;D.P**: E **and D**, not before P — the bicarbonate-buffer
  variant the paper used; D-cleavage is where phosphate-GluC users
  silently get wrong results when they re-derive
- **LysC/P**: K, **allowed** before P (unlike trypsin)
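The rules listed above can be sketched as a small table-driven digest. This is an illustration under stated assumptions, not the code in ``hitlist.proteome``: the name `digest_sketch`, the `RULES` layout, inclusive length bounds, and the default arguments are hypothetical. Only the cleavage residues and the proline behaviour come from the list.

```python
# (cleavage residues, is cleavage blocked when the next residue is P?)
RULES = {
    "Trypsin/P": ("KR", True),
    "Chymotrypsin+": ("FWYLM", True),
    "GluC;D.P": ("ED", True),
    "LysC/P": ("K", False),  # K-P cleavage is allowed
}


def digest_sketch(seq, enzyme, min_len=7, max_len=30, max_missed=2):
    """Return peptides from C-terminal cleavage, allowing missed cleavages."""
    residues, blocked_by_p = RULES[enzyme]

    # Collect cut positions (index just after each cleavable residue).
    cuts = [0]
    for i, aa in enumerate(seq[:-1]):
        if aa in residues and not (blocked_by_p and seq[i + 1] == "P"):
            cuts.append(i + 1)
    cuts.append(len(seq))

    # Join up to max_missed + 1 adjacent fragments, then apply length bounds.
    peptides = []
    for start in range(len(cuts) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(cuts):
                break
            pep = seq[cuts[start]:cuts[end]]
            if min_len <= len(pep) <= max_len:
                peptides.append(pep)
    return peptides
```

On a toy sequence such as ``"AAKPAARAAKAAA"``, the two K-based rules diverge at the K-P site: the trypsin rule skips it, while the LysC rule cuts there.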

## #108 — length / abundance_percentile bounds on loaders

    >>> load_bulk_peptides(length_min=8, length_max=11)          # MHC-I window
    >>> load_bulk_peptides(length_min=7, length_max=30)          # detectability input
    >>> load_bulk_proteomics(cell_line="HeLa",
    ...                      abundance_percentile_min=0.9)       # top-decile abundant

Covers the exact filters the training-set scripts were re-applying
after load. ``abundance_percentile`` bounds explicitly drop NaN-rank
rows (peptide-level rows carry NaN there — only protein rows... (continued)
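The bound semantics described here, including the NaN-rank drop, can be illustrated in plain Python. This is a sketch only: `filter_rows`, the row keys ``"peptide"`` and ``"abundance_percentile"``, and the inclusive treatment of bounds are assumptions for illustration, not the loaders' actual code.

```python
import math


def filter_rows(rows, length_min=None, length_max=None,
                abundance_percentile_min=None, abundance_percentile_max=None):
    """Keep rows whose peptide length and percentile rank fall within bounds."""
    percentile_bounded = (abundance_percentile_min is not None
                          or abundance_percentile_max is not None)
    kept = []
    for row in rows:
        n = len(row["peptide"])
        if length_min is not None and n < length_min:
            continue
        if length_max is not None and n > length_max:
            continue
        if percentile_bounded:
            rank = row.get("abundance_percentile")
            # A missing or NaN rank cannot satisfy a percentile bound: drop the row.
            if rank is None or math.isnan(rank):
                continue
            if abundance_percentile_min is not None and rank < abundance_percentile_min:
                continue
            if abundance_percentile_max is not None and rank > abundance_percentile_max:
                continue
        kept.append(row)
    return kept
```

Dropping NaN ranks explicitly matters because every comparison against NaN is false, so a row without a rank would otherwise pass both the minimum and maximum checks.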

Coverage Stats

1594 of 3209 relevant lines covered (49.67%)

0.5 hits per line

Coverage Regressions

Lines	Coverage	∆	File
2	97.47	-1.09%	bulk_proteomics.py

Jobs

ID	Job ID	Ran	Files	Coverage
1	24782354308.1	22 Apr 2026 01:57PM UTC	21	49.67	GitHub Action Run

pirl-unc / hitlist / 24782354308
80%
