24906948592
80%

Ran 24 Apr 2026 07:05PM UTC

Jobs 1

Files 23

Run time 1min

Committed 24 Apr 2026 07:03PM UTC

Coverage: 58.642% (+1.9%) from 56.771%

Build # 24906948592

Build Type: push (github)

Committed by: web-flow

Commit Message

v1.17.0: Store assay_iri as stable evidence_row_id (#146) (#152)

Issue #146: scan() deduplicates source rows by assay IRI, but the built
indexes only preserve reference_iri (paper-level for IEDB/CEDAR).  The
training export then builds evidence_row_id from reference_iri, which
collapses every row from the same paper into one identifier: catastrophic
aliasing for any consumer that needs to regroup exploded mapping rows
back into a single assay observation.
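A minimal illustration of the aliasing, assuming plain dicts in place of the real index rows. The IRI strings and the `example` evidence kind are hypothetical placeholders, not real IEDB/CEDAR data:

```python
# Hypothetical rows: three assay observations reported by one paper.
# The IRI strings and the evidence kind are placeholders, not real data.
EVIDENCE_KIND = "example"

rows = [
    {"assay_iri": "assay/1", "reference_iri": "reference/A"},
    {"assay_iri": "assay/2", "reference_iri": "reference/A"},
    {"assay_iri": "assay/3", "reference_iri": "reference/A"},
]


def row_ids(rows, field):
    """Build {evidence_kind}:{value} identifiers from a single column."""
    return [f"{EVIDENCE_KIND}:{row[field]}" for row in rows]


paper_level = row_ids(rows, "reference_iri")  # pre-#146 behaviour
assay_level = row_ids(rows, "assay_iri")      # keyed per assay observation

print(len(set(paper_level)))  # 1: all three rows alias to one ID
print(len(set(assay_level)))  # 3: one ID per assay observation
```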

Fix:

- hitlist/scanner.py: add assay_iri to every record emitted by scan().
  The value already exists (line 299 reads it for dedupe) — it just
  wasn't being carried onto the record.
- hitlist/supplement.py: synthesize an assay_iri equal to the existing
  row-unique reference_iri string (supplement:PMID:peptide:mhc).  The
  export layer can then treat assay_iri as the canonical identifier
  regardless of source.
- hitlist/builder.py: cross-source dedup was documented as 'by assay
  IRI' but implemented against reference_iri.  Use assay_iri primarily
  with a reference_iri fallback so partial rebuilds / older
  intermediates still work.
- hitlist/export.py: _apply_training_defaults now builds
  evidence_row_id = {evidence_kind}:{assay_iri} when assay_iri is
  populated, falling back to {evidence_kind}:{reference_iri} for
  pre-#146 parquets, and finally to {evidence_kind}:row:{idx} when
  both are empty.
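The fallback order described above can be sketched per row as follows. This is a hedged illustration using plain dicts, not the actual builder.py/export.py code (which presumably works on DataFrames). The helper names `_clean`, `dedup_key`, and `make_evidence_row_id`, the treatment of NaN and blank strings as empty, and the use of a positional `idx` are all assumptions:

```python
import math
from typing import Any, Mapping, Optional


def _clean(value: Any) -> Optional[str]:
    """Return a stripped non-empty string, or None for None/NaN/blank."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    text = str(value).strip()
    return text or None


def dedup_key(record: Mapping[str, Any]) -> Optional[str]:
    """Cross-source dedup key: assay_iri first, reference_iri as fallback."""
    return _clean(record.get("assay_iri")) or _clean(record.get("reference_iri"))


def make_evidence_row_id(evidence_kind: str, record: Mapping[str, Any], idx: int) -> str:
    """{kind}:{assay_iri}, else {kind}:{reference_iri}, else {kind}:row:{idx}."""
    for field in ("assay_iri", "reference_iri"):
        value = _clean(record.get(field))
        if value is not None:
            return f"{evidence_kind}:{value}"
    return f"{evidence_kind}:row:{idx}"  # sentinel when both IRIs are empty


rows = [
    {"assay_iri": "assay/1", "reference_iri": "reference/A"},  # post-#146 row
    {"reference_iri": "reference/B"},                          # pre-#146 parquet: no assay_iri
    {"assay_iri": "", "reference_iri": float("nan")},          # both empty
]
ids = [make_evidence_row_id("example", row, idx) for idx, row in enumerate(rows)]
print(ids)  # ['example:assay/1', 'example:reference/B', 'example:row:2']
```

Because supplement.py copies its row-unique reference_iri string into assay_iri, supplementary rows take the first branch in both helpers, so assay_iri can serve as the canonical identifier regardless of source.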

Tests:

- tests/test_scanner.py: synthetic IEDB CSV with three distinct assay
  IRIs sharing one reference IRI — all three survive the dedupe AND
  the output carries distinct assay_iri values.  Duplicate-assay_iri
  rows are collapsed to one.
- tests/test_export.py: three row-id tests — assay_iri preferred
  (3 unique IDs from 3 rows that share a paper), reference_iri
  fallback for older parquets, row:{idx} sentinel when both are
  empty.
- tests/test_supplement.py: assert assay_iri is present and non-empty
  on every supplementary row.
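The dedupe behaviour pinned down by the scanner tests can be sketched in memory. `dedupe_by_assay_iri` is a hypothetical stand-in for the dedupe step inside scan(); the real test drives scan() with a synthetic IEDB CSV, whose format is not reproduced here:

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def dedupe_by_assay_iri(records: Iterable[Record]) -> List[Record]:
    """Keep the first record seen for each assay_iri, preserving order."""
    seen = set()
    kept = []
    for record in records:
        key = record["assay_iri"]
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


def test_distinct_assays_sharing_a_reference_survive():
    records = [
        {"assay_iri": f"assay/{i}", "reference_iri": "reference/A"}
        for i in (1, 2, 3)
    ]
    out = dedupe_by_assay_iri(records)
    assert len(out) == 3
    assert len({r["assay_iri"] for r in out}) == 3


def test_duplicate_assay_iri_rows_collapse():
    row = {"assay_iri": "assay/1", "reference_iri": "reference/A"}
    out = dedupe_by_assay_iri([row, dict(row)])
    assert out == [row]


if __name__ == "__main__":
    test_distinct_assays_sharing_a_reference_survive()
    test_duplicate_assay_iri_rows_collapse()
    print("ok")
```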

Version: 1.16.0 → 1.17.0 (minor; new parquet column, export...

Coverage Stats

2375 of 4050 relevant lines covered (58.64%)

0.59 hits per line
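As a quick arithmetic check of the summary figures: coverage is covered lines divided by relevant lines, and the +1.9% in the build header is the difference in percentage points from the previous build's 56.771%:

```python
covered, relevant = 2375, 4050
previous = 56.771  # previous build's coverage, from the header

current = 100 * covered / relevant
print(f"{current:.2f}%")             # 58.64%
print(f"{current - previous:+.1f}")  # +1.9 (percentage points)
```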

Coverage Regressions

Lines	Coverage (%)	∆	File
140	58.82	-0.72%	builder.py
22	78.99	0.4%	export.py
2	75.47	47.8%	scanner.py
2	86.76	0.2%	supplement.py

Jobs

ID	Job ID	Ran	Files	Coverage (%)	Link
1	24906948592.1	24 Apr 2026 07:05PM UTC	23	58.64	GitHub Action Run

pirl-unc / hitlist / 24906948592