26642287264
85%

Ran 29 May 2026 02:14PM UTC

Jobs 4

Files 31

Run time 1min

Committed 29 May 2026 02:12PM UTC coverage: 78.367% (+0.1%) from 78.237%

Build # 26642287264

Build Type

push

github

Committed by

web-flow

Commit Message

2.1.0: fix #57 — Levenshtein-1 fuzzy CDR3 matching (#112)

Adds a fuzzy fallback for β-only matching in ``match_clonotypes``.
Pre-2.1 only exact CDR3 string equality recorded a hit; in real TIL
cohorts exact β matches are <10% of clones, so most true hits to
known antigen-specific TCRs were being missed.

New parameters on ``match_clonotypes`` and ``annotate_clonotypes``:

  match_mode: "exact" (default; pre-2.1 behaviour) | "levenshtein"
  max_distance: edit-distance threshold for fuzzy mode (currently
                clamped to 1; values >1 warn and clamp)
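
The warn-and-clamp behaviour described for ``max_distance`` could look roughly like the sketch below. The helper name ``resolve_max_distance`` and the constant are hypothetical, invented for illustration; the commit does not show how tcrsift implements the check, and the handling of negative values here is an assumption.

```python
import warnings

# Assumption: the only supported fuzzy threshold is 1, as the commit states.
MAX_SUPPORTED_DISTANCE = 1


def resolve_max_distance(match_mode: str, max_distance: int = 1) -> int:
    """Hypothetical helper: return the effective edit-distance threshold."""
    if match_mode == "exact":
        # Exact mode ignores max_distance entirely.
        return 0
    if match_mode != "levenshtein":
        raise ValueError(f"unknown match_mode: {match_mode!r}")
    if max_distance < 0:
        raise ValueError("max_distance must be non-negative")
    if max_distance > MAX_SUPPORTED_DISTANCE:
        # Values above the supported threshold warn and are clamped.
        warnings.warn(
            f"max_distance={max_distance} is not supported; "
            f"clamping to {MAX_SUPPORTED_DISTANCE}",
            UserWarning,
            stacklevel=2,
        )
        return MAX_SUPPORTED_DISTANCE
    return max_distance
```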

Implementation: deletion-canonical neighbor index over the database
β CDR3 column. Each DB entry contributes up to ``len(cdr3) + 1``
canonical variants (the string itself plus a one-character deletion
at each position; deletions inside a run of repeated residues
coincide). For a query CDR3 of length L, lookup takes O(L) index
probes: generate its canonical variants and union the indexed sets
to get the candidate set. A pairwise Lev-distance check on the
candidates then filters out canonical-set collisions at Lev
distance >1 (two strings at distance 2 can share a deletion
variant). Tractable at the actual VDJdb/IEDB scale (~300K β entries).
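
A minimal sketch of the deletion-variant index described above, not the tcrsift implementation. All function names are hypothetical, the CDR3 strings are made up for illustration, and the verification step uses a plain two-row dynamic-programming Levenshtein rather than whatever bounded check the library uses.

```python
from collections import defaultdict


def deletion_variants(cdr3: str) -> set[str]:
    """Return the string itself plus every one-character deletion."""
    variants = {cdr3}
    for i in range(len(cdr3)):
        variants.add(cdr3[:i] + cdr3[i + 1:])
    return variants


def build_index(db_cdr3s) -> dict[str, set[str]]:
    """Map each canonical variant to the DB CDR3s that produce it."""
    index = defaultdict(set)
    for cdr3 in db_cdr3s:
        for variant in deletion_variants(cdr3):
            index[variant].add(cdr3)
    return index


def candidates(index, query: str) -> set[str]:
    """Union the indexed sets for every canonical variant of the query."""
    found = set()
    for variant in deletion_variants(query):
        found |= index.get(variant, set())
    return found


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def fuzzy_hits(index, query: str) -> dict[str, int]:
    """Keep only candidates whose true edit distance is at most 1."""
    hits = {}
    for cand in candidates(index, query):
        dist = levenshtein(query, cand)
        if dist <= 1:
            hits[cand] = dist
    return hits


# Illustrative (made-up) database entries.
db = ["CASSLGQYF", "CASSLGQFF", "CASSLGYF", "ASSLGQYFA"]
index = build_index(db)
hits = fuzzy_hits(index, "CASSLGQYF")
```

Here ``ASSLGQYFA`` lands in the candidate set because it shares the variant ``ASSLGQYF`` with the query, but it sits at edit distance 2 and is dropped by the verification step, which is why the pairwise check is needed.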

αβ matching stays strict-exact in both modes. Fuzzy αβ is too noisy
biologically — the paired-chain prior is the strong signal and we
don't want a Lev-1 hit on β to dragnet a wrong α.

Output:
  - db_match_distance: int — 0 for exact β, 1 for Levenshtein-1
    fuzzy β, None when unmatched.
  - db_match_strength: extended with ``b_only_near`` (and
    ``b_only_near_cross`` when host is non-human, per #83 + #54).

TCRdist is intentionally out-of-scope here — it would pull an
optional dependency tree (tcrdist3 or numpy-only tcrdist-rs port).
Worth a follow-up issue if the Levenshtein-1 hit rate isn't enough.

Tests: 31 new in tests/test_annotate_fuzzy.py covering:
  - Direct ``_levenshtein_distance_at_most_1`` unit cases:
    identical / substitution / insertion / deletion / two-edits /
    substitution-then-deletion / length-diff-too-large / empties.
  - Canonical-variant generator: self-inclusion, e... (continued)
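
For orientation, a bounded edit-distance check of the kind these unit cases exercise might be sketched as follows. The name ``distance_at_most_1`` and its return convention (0, 1, or ``None``) are assumptions; the actual signature of ``_levenshtein_distance_at_most_1`` is not shown in this commit message.

```python
from typing import Optional


def distance_at_most_1(a: str, b: str) -> Optional[int]:
    """Return 0 or 1 if the edit distance is at most 1, else None."""
    if a == b:
        return 0
    if abs(len(a) - len(b)) > 1:
        # A length difference above 1 needs at least two edits.
        return None
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    # Advance past the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Same length: a single substitution at position i.
        return 1 if a[i + 1:] == b[i + 1:] else None
    # b is one longer: a single insertion into a at position i.
    return 1 if a[i:] == b[i + 1:] else None
```

The check runs in linear time and never builds a distance matrix, which suits the candidate-filtering step where only the at-most-1 question matters.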

Coverage Stats

6720 of 8575 relevant lines covered (78.37%)

3.13 hits per line

Coverage Regressions

Lines	Coverage	∆	File
34	93.51	-0.19%	annotate.py

Jobs

ID	Job ID	Ran	Files	Coverage (%)	Source
1	python-3.10 - 26642287264.1	29 May 2026 02:15PM UTC	31	78.36	GitHub Action Run
2	python-3.9 - 26642287264.2	29 May 2026 02:15PM UTC	31	78.33	GitHub Action Run
3	python-3.11 - 26642287264.3	29 May 2026 02:14PM UTC	31	78.36	GitHub Action Run
4	python-3.12 - 26642287264.4	29 May 2026 02:15PM UTC	31	78.36	GitHub Action Run

pirl-unc / tcrsift / 26642287264