
pirl-unc / hitlist, build 25613983293: 75% coverage

Build: default branch main, ran 09 May 2026 11:01PM UTC; 1 job, 28 files, run time 1 min.

09 May 2026 10:59PM UTC: coverage 74.898% (+0.4% from 74.531%)
Build 25613983293, triggered by a push from GitHub (web-flow)
v1.30.46: split gene/protein columns out of observations.parquet (closes #238 partial) (#243)

Drops ``gene_names``, ``gene_ids``, ``protein_ids``, and
``n_source_proteins`` from the observations.parquet schema.  The same
information has always been in ``peptide_mappings.parquet`` (one row
per peptide x protein); we were storing it in TWO places and merging
the long-form mapping back onto every observation row at build time.
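A minimal pandas sketch of the redundancy. It assumes a hypothetical long-form layout with ``peptide``, ``protein_id``, ``gene_id`` and ``gene_name`` columns and a ``;`` separator for multi-valued cells (neither is taken from the real peptide_mappings.parquet schema). It collapses the mapping to one row per peptide and broadcasts the result onto observations, roughly what the removed build-time merge did.

```python
import pandas as pd

# Hypothetical long-form mapping: one row per (peptide, protein) pair.
# Column names and values are illustrative, not hitlist's real schema.
mappings = pd.DataFrame(
    {
        "peptide": ["pepA", "pepA", "pepB"],
        "protein_id": ["prot1", "prot2", "prot3"],
        "gene_id": ["g1", "g2", "g3"],
        "gene_name": ["GENE1", "GENE2", "GENE3"],
    }
)
observations = pd.DataFrame(
    {"peptide": ["pepA", "pepA", "pepB"], "allele": ["A*02:01", "B*07:02", "A*02:01"]}
)


def join_unique(values: pd.Series) -> str:
    # Assumed encoding for multi-valued cells: sorted, de-duplicated, ';'-joined.
    return ";".join(sorted(set(values)))


# Collapse to one row per peptide: the four columns v1.30.45 stored per observation.
per_peptide = (
    mappings.groupby("peptide")
    .agg(
        gene_names=("gene_name", join_unique),
        gene_ids=("gene_id", join_unique),
        protein_ids=("protein_id", join_unique),
        n_source_proteins=("protein_id", "nunique"),
    )
    .reset_index()
)

# Roughly the removed build-time step: the per-peptide strings are repeated on
# every observation row, so the same annotation is stored once per observation.
annotated = observations.merge(per_peptide, on="peptide", how="left")
print(annotated[["peptide", "gene_names", "n_source_proteins"]])
```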

## Wins (measured against the v1.30.45 build of the same source data)

| Metric | v1.30.45 | v1.30.46 | Delta |
|---|---|---|---|
| observations.parquet size | 192.2 MB | **117.6 MB** | **-39%, -74.5 MB** |
| Gene/protein columns combined size | 71.1 MB (38.8% of obs) | 0 MB | -71 MB |
| Gene-annotation merge step in build | full-frame merge | skipped | (eliminates the largest single transient memory step in build_observations) |
| ``hitlist pmhc --gene PRAME`` rows loaded | 4.4M (full corpus) | 257 (matched peptides only) | ~17,000x reduction |
| ``hitlist pmhc --gene PRAME`` parquet load time | ~6s | 0.8s | ~7.5x faster |
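The small PRAME load is consistent with resolving the gene to its peptides first and only then touching observations. Below is a rough pandas sketch of that order of operations, using a hypothetical helper ``observations_for_gene`` and assumed column names; it is not hitlist's actual code. With Parquet on disk, the peptide predicate could instead be handed to the reader (for example pyarrow's ``filters`` argument to ``pyarrow.parquet.read_table``), so only matching rows come back and row groups ruled out by their statistics can be skipped.

```python
import pandas as pd


def observations_for_gene(
    observations: pd.DataFrame, mappings: pd.DataFrame, gene: str
) -> pd.DataFrame:
    """Hypothetical helper: observation rows whose peptide maps to ``gene``."""
    # Step 1: resolve the gene against the (small) long-form mapping table.
    peptides = mappings.loc[mappings["gene_name"] == gene, "peptide"].unique()
    # Step 2: keep only observations for those peptides. On disk, this predicate
    # could be pushed into the Parquet reader instead of filtering in memory.
    return observations[observations["peptide"].isin(peptides)].reset_index(drop=True)


# Illustrative data; column names are assumptions.
mappings = pd.DataFrame(
    {"peptide": ["pepA", "pepC", "pepC"], "gene_name": ["PRAME", "PRAME", "OTHER"]}
)
observations = pd.DataFrame(
    {"peptide": ["pepA", "pepB", "pepC", "pepA"], "sample": ["s1", "s2", "s3", "s4"]}
)
prame_rows = observations_for_gene(observations, mappings, "PRAME")
```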

Build-side memory wins are harder to capture as a single number: the
v1.30.45 build held the full peptide_mappings (~65 MB) AND obs (~190 MB)
in pandas form simultaneously inside ``annotate_observations_with_genes``,
the largest transient step in the pipeline.  Skipping that step removes
the single biggest memory spike in the build.

## What changed

### Build path (``builder.py``)

Skips ``annotate_observations_with_genes()``.  The peptide_mappings.parquet
sidecar is still built independently, as before.

### Reader path (``observations.py``)

- ``load_observations`` now AUTO-ATTACHES gene/protein columns from
  peptide_mappings.parquet when the caller requests them but the
  parquet doesn't carry them (v1.30.46 and later).  The join covers
  only the matched-peptides slice: cheap on filtered loads, and
  expensive only on full-corpus loads, where it has the same complexity
  as the old build-time merge.
- New entries in ``_DERI... (continued)
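The auto-attach behaviour of ``load_observations`` can be sketched as follows. Everything here is an assumption for illustration: the helper name ``attach_gene_columns``, the ``GENE_COLUMNS`` tuple, the mapping column names and the ``;`` join are hypothetical, and the real ``load_observations`` signature is not shown in this message.

```python
from typing import Sequence

import pandas as pd

# Hypothetical set of derived columns that v1.30.46 no longer stores on disk.
GENE_COLUMNS = ("gene_names", "gene_ids", "protein_ids", "n_source_proteins")


def _join_unique(values: pd.Series) -> str:
    # Assumed multi-value encoding: sorted, de-duplicated, ';'-joined.
    return ";".join(sorted(set(values)))


def attach_gene_columns(
    obs: pd.DataFrame, mappings: pd.DataFrame, requested: Sequence[str]
) -> pd.DataFrame:
    """Hypothetical helper: add requested gene/protein columns missing from ``obs``."""
    missing = [c for c in requested if c in GENE_COLUMNS and c not in obs.columns]
    if not missing:
        # Not requested, or an older parquet already carries them: nothing to do.
        return obs
    # Join only the mapping rows for peptides present in this (possibly filtered) load.
    mapping_slice = mappings[mappings["peptide"].isin(obs["peptide"].unique())]
    per_peptide = (
        mapping_slice.groupby("peptide")
        .agg(
            gene_names=("gene_name", _join_unique),
            gene_ids=("gene_id", _join_unique),
            protein_ids=("protein_id", _join_unique),
            n_source_proteins=("protein_id", "nunique"),
        )
        .reset_index()
    )
    # Left join keeps every observation row in order and adds only the missing columns.
    return obs.merge(per_peptide[["peptide", *missing]], on="peptide", how="left")


# Illustrative usage with assumed column names.
mappings = pd.DataFrame(
    {
        "peptide": ["pepA", "pepA", "pepB", "pepZ"],
        "protein_id": ["prot1", "prot2", "prot3", "prot9"],
        "gene_id": ["g1", "g2", "g3", "g9"],
        "gene_name": ["GENE1", "GENE2", "GENE3", "GENE9"],
    }
)
obs = pd.DataFrame({"peptide": ["pepA", "pepB"], "allele": ["A*02:01", "B*07:02"]})
loaded = attach_gene_columns(obs, mappings, ["allele", "gene_names", "n_source_proteins"])
```

Slicing the mapping table before aggregating keeps the cost proportional to the peptides actually loaded, which is why a filtered load stays cheap while a full-corpus load pays about what the old build-time merge did.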

4234 of 5653 relevant lines covered (74.9%)

0.75 hits per line
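As a check, the headline coverage is the covered-line fraction, and the reported +0.4% is the rounded difference from the previous build's 74.531%:

$$
\frac{4234}{5653} \approx 0.74898 = 74.898\%, \qquad 74.898\% - 74.531\% = 0.367 \text{ percentage points} \approx +0.4\%
$$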

Coverage Regressions

| Lines | Coverage (%) | ∆ | File |
|---|---|---|---|
| 102 | 63.08 | 0.85% | builder.py |
| 19 | 89.21 | 0.64% | pmhc_query.py |
| 11 | 94.53 | -0.63% | observations.py |
Jobs

| ID | Job ID | Ran | Files | Coverage (%) |
|---|---|---|---|---|
| 1 | 25613983293.1 | 09 May 2026 11:01PM UTC | 28 | 74.9 |
Source files on build 25613983293: 28 in total, 4 changed (0 with source changes, 4 with coverage changes).
Commit 7e9c10de on GitHub (GitHub Actions build #25613983293); previous build on main: #25601432313.

© 2026 Coveralls, Inc