
pirl-unc / hitlist / 26973114525
77%

Build:
DEFAULT BRANCH: main
Ran 04 Jun 2026 07:02PM UTC
Jobs 1
Files 29
Run time 1min

04 Jun 2026 07:00PM UTC coverage: 76.817% (+0.03%) from 76.789%

push

github

web-flow
v1.30.58: categorical dtypes for low-cardinality obs metadata (#263) (#269)

* v1.30.58: categorical dtypes for low-cardinality obs metadata (#263)

generate_observations_table() brings per-sample metadata in via the
ms_samples join and PMID dict-lookups, so those columns arrive as plain
object/string dtype (hundreds of MB of per-worker Python str overhead on
a ~4.4M-row table) instead of via the dictionary-encoded parquet path.
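
To illustrate the overhead described above, here is a minimal sketch that compares a plain object-dtype column with its categorical equivalent. The values and row count are made up for the illustration and are not taken from the real observations table.

```python
import pandas as pd

# Hypothetical low-cardinality column: 3 distinct values repeated many times.
values = ["class I", "class II", "unknown"] * 100_000

as_object = pd.Series(values, dtype=object)    # one Python str reference per row
as_category = as_object.astype("category")     # small integer codes + 3 categories

# deep=True includes the size of the Python string objects themselves.
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
```

The exact numbers depend on the pandas version and platform; the point is only that the categorical form scales with the number of distinct values rather than the number of rows.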

- Downcast an audited allowlist of low-cardinality join/derived columns
  to category at the enrichment boundary (sample_mhc, mhc_restriction,
  mhc_class_label_severity, instrument_type, cell_line_name,
  condition_category, ...). Biggest wins: sample_mhc 479MB,
  mhc_class_label_severity 227MB, mhc_restriction 224MB -> ~9MB each.
  Companion to #262: these round-trip through Arrow IPC as dictionary
  columns that xdist workers mmap-share zero-copy.
- sample_label deliberately NOT categoricalized: it's compared
  element-wise against cell_name (already categorical), and two
  categoricals with different category sets can't be compared.
- Add cell_line_name + cell_type to builder._CATEGORICAL_BUILD_COLUMNS
  (the parquet write path); both are low-cardinality and were missing.
- _apply_training_defaults now fills via _fillna_scalar_safe (widens the
  category set for out-of-category sentinels like 'not_applicable')
  instead of a raw fillna that raised on categorical columns.
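
The fill behaviour in the last bullet can be sketched as follows. This is not the project's _fillna_scalar_safe; the helper name fillna_scalar_safe, its signature, and the sample values are assumptions made for illustration.

```python
import pandas as pd


def fillna_scalar_safe(series: pd.Series, value) -> pd.Series:
    """Fill missing values, adding `value` as a category first if needed.

    Hypothetical stand-in for the helper named in the commit message.
    """
    if isinstance(series.dtype, pd.CategoricalDtype):
        if value not in series.cat.categories:
            # Widen the category set so the sentinel becomes a legal value.
            series = series.cat.add_categories([value])
    return series.fillna(value)


condition = pd.Series(["tumor", None, "healthy"], dtype="category")

# A plain fillna with an out-of-category value is rejected by pandas.
try:
    condition.fillna("not_applicable")
    raw_fillna_raised = False
except (TypeError, ValueError):
    raw_fillna_raised = True

filled = fillna_scalar_safe(condition, "not_applicable")
```

The result keeps the categorical dtype, so the memory savings survive the defaulting step.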

Excludes peptide / *_iri / free-text / semicolon multi-value columns,
mirroring the builder's existing exclusions.
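
A minimal sketch of the allowlist approach, and of why sample_label is left out, might look like this. The allowlist contents, the function name downcast_allowlisted, and the sample rows are illustrative assumptions; the audited list lives in the project.

```python
import pandas as pd

# Hypothetical subset of the allowlist; peptide and sample_label are
# intentionally absent.
CATEGORICAL_ALLOWLIST = ("sample_mhc", "mhc_restriction", "instrument_type")


def downcast_allowlisted(df: pd.DataFrame, columns=CATEGORICAL_ALLOWLIST) -> pd.DataFrame:
    """Return a copy with the allowlisted columns that exist cast to category."""
    present = [c for c in columns if c in df.columns]
    return df.astype({c: "category" for c in present})


# Synthetic rows for illustration only.
df = pd.DataFrame({
    "peptide": ["SIINFEKL", "GILGFVFTL", "NLVPMVATV"],
    "sample_mhc": ["HLA-A*02:01", "HLA-A*02:01", "HLA-B*07:02"],
    "cell_name": pd.Categorical(["Line-1", "Line-2", "Line-1"]),
    "sample_label": ["Line-1", "Line-2", "Line-3"],
})
out = downcast_allowlisted(df)

# sample_label stays a plain string column, so an element-wise comparison works.
matches = (out["sample_label"] == out["cell_name"].astype(object)).tolist()

# Had it been categoricalized, its category set would differ from cell_name's
# and pandas would refuse the comparison.
try:
    out["sample_label"].astype("category") == out["cell_name"]
    mismatch_raises = False
except TypeError:
    mismatch_raises = True
```

Using an explicit allowlist rather than a cardinality heuristic keeps high-cardinality columns such as peptide from being converted by accident.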

* Document the three cell_* fields in the categorical test fixture

Clarify cell_name (raw IEDB catch-all) vs cell_line_name (line part,
hybrid suffix stripped) vs cell_type (tissue/type part), and make the
synthetic values coherent (a 'Line-1-B cell' hybrid -> line 'Line-1' +
type 'B cell').
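
As a rough picture of the fixture values described above, the three fields for the hybrid example could be laid out like this. The DataFrame layout is an assumption, and the hyphen-joined check only reflects how this synthetic value is constructed, not the project's parsing rule.

```python
import pandas as pd

# One synthetic row using the values quoted in the commit message.
fixture = pd.DataFrame({
    "cell_name": ["Line-1-B cell"],   # raw IEDB catch-all
    "cell_line_name": ["Line-1"],     # line part, hybrid suffix stripped
    "cell_type": ["B cell"],          # tissue/type part
}).astype("category")

row = fixture.iloc[0]
# For this synthetic value, the raw name is the line and type joined by "-".
coherent = row["cell_name"] == f"{row['cell_line_name']}-{row['cell_type']}"
```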

4725 of 6151 relevant lines covered (76.82%)

0.77 hits per line

Coverage Regressions

Lines  Coverage  ∆      File
157    81.47     0.11%  export.py
109    63.69     0.0%   builder.py

Jobs

ID  Job ID         Ran                      Files  Coverage
1   26973114525.1  04 Jun 2026 07:02PM UTC  29     76.82

Source Files on build 26973114525: 29 listed, 2 changed (0 source changed, 2 coverage changed)
Commit: 09f17e42 on github