pirl-unc / hitlist / 25517476149

07 May 2026 07:29PM UTC coverage: 74.071% (+1.9%) from 72.217%

v1.30.41: fix categorical fillna crash + obliterate legacy index cache (#234)

* v1.30.41: fix categorical fillna crash + obliterate legacy index cache

Two fixes in one PR:

1. **Categorical ``fillna`` crash on load.**  Surfaced by
   ``tsarina hits --gene PRAME`` against the freshly-rebuilt v1.30.40
   corpus:

       TypeError: Cannot setitem on a Categorical with a new category
       (), set the categories first

   Triggered by ``df["mhc_class"].fillna("")`` and the per-load
   ``mhc_restriction`` normalization path.  Post-#137 the columns are
   categorical (``{"I", "II", "non classical"}`` for mhc_class), and
   pandas refuses to ``fillna`` with a value outside the category set.

   Fix: cast to ``StringDtype`` before ``fillna`` in the three affected
   spots: ``observations.py:617`` (mhc_class severity check),
   ``observations.py:587`` (mhc_restriction normalization), and
   ``supplement.py:291`` (mhc_species fallback).  StringDtype accepts
   arbitrary string fills.

   Regression tests:
   - ``test_load_observations_handles_categorical_mhc_class_with_nan``
   - ``test_load_observations_normalizes_categorical_mhc_restriction``

2. **Obliterate the legacy ``~/.hitlist/index/`` cache.**  ``hitlist
   data list`` showed an "Index" column whose ``cached`` / ``stale``
   status was based on the OLD CSV-scan fallback path in
   ``indexer.py``.  When ``observations.parquet`` is built, the actual
   ``get_index()`` API derives counts from the parquet directly — the
   per-source cache is never read.  The dated cache files just made
   ``hitlist data list`` look misleadingly stale.

   Changes:
   - ``hitlist/indexer.py`` rewritten: only the parquet-derived
     ``get_index()`` and ``validate_alleles_from_index`` survive.
     The CSV-scan fallback (``_index_from_csv``, ``_scan_single``,
     ``_cache_dir``, ``_cache_is_valid``, ``_resolve_source_paths``,
     ``_cache_key``) is gone.  ``get_index()`` now raises
     ``FileNotFoundError`` when the parquet isn't built ... (continued)
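
For reference, the categorical ``fillna`` failure from item 1 and the cast-first workaround can be reproduced in isolation.  This is a minimal sketch, not the code in ``observations.py`` or ``supplement.py``: the helper name ``fill_missing_as_blank`` and the sample values are assumptions made for illustration; only the category set and the ``StringDtype`` cast come from the description above.

```python
import pandas as pd

# Category set quoted in the commit message for mhc_class.
MHC_CLASS_DTYPE = pd.CategoricalDtype(["I", "II", "non classical"])

# Illustrative column with one missing value (sample data, not from the corpus).
mhc_class = pd.Series(["I", None, "II"], dtype=MHC_CLASS_DTYPE)

# Filling with "" fails because "" is not one of the declared categories.
# Recent pandas raises TypeError; some older releases raised ValueError.
raised = False
try:
    mhc_class.fillna("")
except (TypeError, ValueError) as exc:
    raised = True
    print(f"categorical fillna failed: {exc}")


def fill_missing_as_blank(col: pd.Series) -> pd.Series:
    """Hypothetical helper: cast to StringDtype first, then fill with ""."""
    return col.astype("string").fillna("")


filled = fill_missing_as_blank(mhc_class)
print(filled.tolist())
```

Casting before filling trades the categorical dtype for a string column, so a caller that still needs categories afterwards would have to re-cast explicitly.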

4108 of 5546 relevant lines covered (74.07%)

0.74 hits per line

Source file: /cli.py (56.44% coverage)