
pirl-unc / hitlist / 25493184752

07 May 2026 11:31AM UTC coverage: 72.217% (-0.04%) from 72.255%

push

github

web-flow
v1.30.40: fix build crash — normalize pmid to Int64 before Arrow conversion (#233)

v1.30.39 introduced ``pa.Table.from_pandas`` for the per-source
partition conversion, which infers a single type per column.  The
scanner emits ``pmid`` as object dtype with ``""`` for missing rows
and integer-like strings for present ones, so the conversion fails:

    pyarrow.lib.ArrowInvalid: Could not convert '' with type str:
    tried to convert to int64

The full-frame ``pmid → Int64`` cast at the bottom of
``build_observations`` runs AFTER the per-source Arrow conversion,
so it runs too late to prevent the error.

Fix: normalize ``pmid`` to ``Int64`` per-partition immediately after
``is_binding_assay`` partitioning, BEFORE
``_compress_categoricals`` and ``pa.Table.from_pandas``.  The
``pd.to_numeric(..., errors="coerce")`` cast converts the whole column
to ``Int64``, with ``pd.NA`` for missing values, which pyarrow accepts.

Regression test
``test_pyarrow_from_pandas_handles_mixed_pmid_after_normalization``
exercises a mixed-shape pmid column (string ``"12345"``, empty
``""``, Python int ``67890``) through the same normalization and
Arrow conversion the builder does, asserting that the round trip
preserves Int64 and NA.

The full-frame cast at the bottom of ``build_observations`` is
preserved as defensive scaffolding.  It is now a no-op, since the
per-partition cast runs first, but it is kept to guard against future
code paths that bypass the partition step.

4107 of 5687 relevant lines covered (72.22%)

0.72 hits per line

Source File

61.92
/builder.py

