2
85%
main: 85%

Ran 13 May 2026 10:16PM UTC

Files 25

Run time 2s

Committed 13 May 2026 10:15PM UTC

coverage: 71.45% (+0.09%) from 71.362%

Job # 25829560221.2

Build Type

push

github

Committed by

web-flow

Commit Message

Use IEDB epitope_full_v3 table for canonical antigen/species names (#54) (#59)

Adds an optional second IEDB download (epitope_full_v3.csv from the
epitope_full_v3.zip endpoint) and joins it against the receptor file
to replace the long/inconsistent Source Molecule / Source Organism
strings with the epitope table's shorter, more publication-canonical
equivalents.

Findings against the real cache:

  - Original #54 premise (epitope table fills receptor blanks) is
    barely true: 0 of 6,785 empty Source Molecule values are
    recoverable, and only 202 of 6,491 empty Source Organism values.
  - But the epitope table provides *shorter* names for ~80K rows
    where receptor and epitope disagree. Examples:
      "transcriptional activator Tax"
        → "Protein Tax-1"
      "HLA class I histocompatibility antigen, Cw-3 alpha chain..."
        → "MHC class I protein"
      "Melanocyte protein PMEL"
        → "melanocyte-specific secreted glycoprotein, partial"
  - Decision (with user): pivot from "fill blanks" to "override with
    shorter canonical form when present." This is also what #55 will
    consume — reducing synonym sprawl up front.
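The difference between the two policies can be sketched in a few lines of plain Python. This is an illustration only, not the project's code: the function names `fill_blank` and `override_when_present` are hypothetical, and the toy rows reuse names from the examples above with invented blanks (`None`).

```python
from typing import Optional


def fill_blank(receptor: Optional[str], epitope: Optional[str]) -> Optional[str]:
    """Original #54 idea: use the epitope value only when the receptor value is empty."""
    return receptor if receptor else epitope


def override_when_present(receptor: Optional[str], epitope: Optional[str]) -> Optional[str]:
    """Adopted policy: prefer the epitope value whenever it exists, else keep the receptor value."""
    return epitope if epitope else receptor


# Toy (receptor, epitope) pairs; the None cells are invented for illustration.
pairs = [
    ("transcriptional activator Tax", "Protein Tax-1"),  # both present, they disagree
    (None, "MHC class I protein"),                       # receptor blank
    ("Melanocyte protein PMEL", None),                   # epitope blank
]

filled = [fill_blank(r, e) for r, e in pairs]
overridden = [override_when_present(r, e) for r, e in pairs]
```

The two policies differ only on the first row, where both values are present and disagree. That is the case the findings above report for roughly 80K rows.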

Changes:

- `datacache.py`: new `iedb_epitope` DatabaseSpec pointing at IEDB's
  `epitope_full_v3.zip`. `tcrsift data download --db iedb_epitope`
  now works.
- `_load_iedb_v3` also captures `epitope_iri` (the IEDB URL ID),
  needed as the join key.
- `_normalize_iedb_iri` turns the receptor's `https://www.iedb.org/...`
  into the `http://www.iedb.org/...` scheme used by the epitope file.
  Without this, the join finds 0 rows.
- `load_iedb_epitope_lookup` parses the hierarchical-header epitope
  CSV and returns a dedup'd lookup indexed by normalized IRI with
  `antigen_gene` and `species` columns.
- `_apply_iedb_epitope_overrides` replaces receptor `antigen_gene`
  / `species` with epitope-table values where present (non-NaN),
  keeping the receptor value otherwise.
- `load_iedb` gains an `epitope_path` kwarg. Thr... (continued)
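Why the scheme normalization matters for the join can be shown with a small stdlib-only sketch. Everything here is an assumption made for illustration: `normalize_iri_sketch` and `count_matches` are hypothetical helpers rather than the real `_normalize_iedb_iri`, the `/epitope/1` paths are placeholders for the elided part of the IRI, and a plain dict stands in for the lookup table.

```python
from typing import Callable, Dict, List, Optional


def normalize_iri_sketch(iri: str) -> str:
    """Rewrite an https:// IEDB IRI to the http:// scheme; leave anything else unchanged."""
    https_prefix = "https://www.iedb.org/"
    if iri.startswith(https_prefix):
        return "http://www.iedb.org/" + iri[len(https_prefix):]
    return iri


# Placeholder receptor rows (https scheme) and an epitope lookup keyed by http IRIs.
receptor_rows: List[Dict[str, str]] = [
    {"epitope_iri": "https://www.iedb.org/epitope/1"},
    {"epitope_iri": "https://www.iedb.org/epitope/2"},
]
epitope_lookup: Dict[str, Dict[str, str]] = {
    "http://www.iedb.org/epitope/1": {"antigen_gene": "Protein Tax-1"},
}


def count_matches(
    rows: List[Dict[str, str]],
    lookup: Dict[str, Dict[str, str]],
    normalize: Optional[Callable[[str], str]] = None,
) -> int:
    """Count rows whose (optionally normalized) IRI is a key in the lookup."""
    hits = 0
    for row in rows:
        key = normalize(row["epitope_iri"]) if normalize else row["epitope_iri"]
        if key in lookup:
            hits += 1
    return hits


raw_hits = count_matches(receptor_rows, epitope_lookup)
normalized_hits = count_matches(receptor_rows, epitope_lookup, normalize_iri_sketch)
```

On this toy data the un-normalized join matches nothing, mirroring the "join finds 0 rows" remark, while the normalized one matches the single row that has a lookup entry.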

Coverage Stats

4795 of 6711 relevant lines covered (71.45%)

0.71 hits per line
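The reported percentage and delta can be re-derived from the line counts above. This is a quick arithmetic check, assuming the page rounds to two decimal places; the variable names are illustrative.

```python
covered_lines = 4795
relevant_lines = 6711
previous_pct = 71.362  # the "from" value reported on the commit line

# Coverage is covered lines divided by relevant lines, as a percentage.
coverage_pct = round(100 * covered_lines / relevant_lines, 2)

# Change relative to the previous build, rounded the same way.
delta_pct = round(coverage_pct - previous_pct, 2)
```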

pirl-unc / tcrsift / 25829560221 / 2
85%
main: 85%

Source Files on job python-3.12 - 25829560221.2
