pirl-unc / tcrsift / 25830014136 / 2
84%
main: 84%

Build:
DEFAULT BRANCH: main
Ran 13 May 2026 10:28PM UTC
Files 25
Run time 1s
13 May 2026 10:26PM UTC coverage: 71.552% (+0.1%) from 71.45%
25830014136.2

push

github

web-flow
Canonical antigen short-symbol table + db_protein_canonical column (#55) (#60)

Adds an authoritative synonym table (`CANONICAL_ANTIGEN_ALIASES`) that
collapses the many free-text Source-Molecule strings that VDJdb / IEDB
emit into the short symbols used in papers. The mapping is
applied during match_clonotypes to produce a `db_protein_canonical`
column alongside the existing `db_protein`.

Examples (from the issue):
  "Melanoma antigen recognized by T-cells 1"  →  MART-1
  "MLANA"                                      →  MART-1
  "Melan-A"                                    →  MART-1
  "Cancer/testis antigen 1"                    →  NY-ESO-1
  "CTAG1B"                                     →  NY-ESO-1
  "Spike glycoprotein"                         →  SARS-CoV-2 Spike
  "surface glycoprotein [SARS-CoV-2]"          →  SARS-CoV-2 Spike
  "Matrix protein 1"                           →  Flu M1
  "Replicase polyprotein 1ab"                  →  SARS-CoV-2 ORF1ab
  "transcriptional activator Tax"              →  HTLV-1 Tax
  "Protein Tax-1"                              →  HTLV-1 Tax  (also #54 form)

Design:
- Ordered list of (pattern, canonical) pairs; first matching pattern
  wins. Patterns are case-insensitive substrings (more forgiving than
  regex word boundaries against the IEDB/VDJdb free-text fields,
  which routinely include `[organism]` suffixes and "protein"
  descriptors that should be ignored).
- Specificity rule: longer / more specific patterns precede shorter
  ones (e.g. "Spike glycoprotein" before any future bare "Spike").
- Short tokens that collide across species are disambiguated by
  baking the organism prefix into the canonical form: `CMV pp65`,
  `Flu M1`, `HTLV-1 Tax`, `SARS-CoV-2 Spike`.
- Canonical names are assigned once on the database (alongside
  db_category), so the per-clone match path takes the mode of this
  field like any other.
- Unknown antigens pass through unchanged.
- Seed list covers tumor-associated self antigens + the common
  viral/bact... (continued)
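
The design points above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `ALIASES_SKETCH`, `canonicalize`, `annotate` and `mode` are hypothetical names, the entries are only the examples quoted in this message, and the real `CANONICAL_ANTIGEN_ALIASES` table and `match_clonotypes` logic may be organized differently. It assumes "picks it as a mode" means taking the most frequent value among a clone's matched database rows.

```python
from collections import Counter

# Hypothetical seed entries copied from the examples in this commit message.
# Order matters: the first matching pattern wins, so longer / more specific
# patterns should be listed before shorter ones.
ALIASES_SKETCH = [
    ("melanoma antigen recognized by t-cells 1", "MART-1"),
    ("melan-a", "MART-1"),
    ("mlana", "MART-1"),
    ("cancer/testis antigen 1", "NY-ESO-1"),
    ("ctag1b", "NY-ESO-1"),
    ("spike glycoprotein", "SARS-CoV-2 Spike"),
    ("surface glycoprotein [sars-cov-2]", "SARS-CoV-2 Spike"),
    ("matrix protein 1", "Flu M1"),
    ("replicase polyprotein 1ab", "SARS-CoV-2 ORF1ab"),
    ("transcriptional activator tax", "HTLV-1 Tax"),
    ("protein tax-1", "HTLV-1 Tax"),
]


def canonicalize(name):
    """Return the canonical short symbol, or the input unchanged if unknown."""
    lowered = name.lower()
    for pattern, canonical in ALIASES_SKETCH:
        # Case-insensitive substring test; tolerates "[organism]" suffixes.
        if pattern in lowered:
            return canonical
    return name  # unknown antigens pass through unchanged


def annotate(rows):
    """Classify every database row once, up front."""
    for row in rows:
        row["db_protein_canonical"] = canonicalize(row["db_protein"])
    return rows


def mode(values):
    """Most frequent value among a clone's matched rows."""
    return Counter(values).most_common(1)[0][0]


# Toy database rows (illustrative only).
db_rows = annotate([
    {"db_protein": "MLANA"},
    {"db_protein": "Melan-A"},
    {"db_protein": "Matrix protein 1"},
])
clone_canonical = mode(row["db_protein_canonical"] for row in db_rows)
```

With these toy rows, the two MART-1 synonyms collapse to one symbol, so the mode reflects the antigen rather than whichever spelling the source database happened to use.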

4819 of 6735 relevant lines covered (71.55%)

0.72 hits per line

Source Files on job python-3.12 - 25830014136.2
  • cb94a36c on github