
openvax / topiary / 24434611335 / 3
88%
master: 90%

Build:
LAST BUILD BRANCH: v5.16.2
DEFAULT BRANCH: master
Ran 15 Apr 2026 03:26AM UTC
Files 33
Run time 1s

15 Apr 2026 03:18AM UTC coverage: 87.99% (-0.3%) from 88.246%
24434611335.3

Pull #135

github

iskandr
Add SelfProteome — nearest-self lookup architecture (part A of #124)

SelfProteome holds a species-tagged, scope-filtered reference protein
corpus indexed by peptide length and answers per-query nearest-
neighbor lookups.  Plugs into TopiaryPredictor via a new
self_proteome= kwarg; result DataFrame gains self_nearest_peptide /
_peptide_length / _edit_distance / _gene_id / _transcript_id /
_reference_offset / _reference_version columns.  Columns are joined
before filter_by / sort_by so they can participate in DSL expressions.
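The join order matters because a filter can only reference columns that already exist. The sketch below illustrates this with plain pandas rather than topiary itself. The frame contents, the `affinity` column, and the boolean mask standing in for a `filter_by` expression are all invented for the example; the two `self_nearest_*` column names assume the shorthand above expands with a common `self_nearest` prefix.

```python
import pandas as pd

# Hypothetical prediction rows; values are invented for illustration.
predictions = pd.DataFrame({
    "peptide": ["SIINFEKL", "AAAAAAAA", "GILGFVFTL"],
    "affinity": [120.0, 45.0, 300.0],
})

# Hypothetical nearest-self results, one row per query peptide.
nearest = pd.DataFrame({
    "peptide": ["SIINFEKL", "AAAAAAAA", "GILGFVFTL"],
    "self_nearest_peptide": ["SIINFEKM", "AAAAAAAA", "GILGFVFSL"],
    "self_nearest_edit_distance": [1, 0, 1],
})

# Join first, so later filtering and sorting can see the new columns.
joined = predictions.merge(nearest, on="peptide", how="left")

# Stand-in for a filter expression: drop peptides identical to a self peptide.
filtered = joined[joined["self_nearest_edit_distance"] > 0]
result = filtered.sort_values("affinity").reset_index(drop=True)
```

Had the filter run before the merge, the `self_nearest_edit_distance` column would not exist yet and the expression could not be evaluated.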

This PR ships the architecture with one axis (sequence-nearest) and
substitutions only.  Follow-ups queued on #124:
  - scope="protected_tissues" + HPA/GTEx tissue filter
  - 1aa insertion / deletion candidates
  - self_nearest_by_binding / self_strongest_nearby second + third axes
  - self_nearest_candidates structured column
  - Seed-and-extend algorithm once the benchmark decides

## Surface

- SelfProteome.from_peptides(dict, peptide_lengths=...) — test helper
  for in-memory reference sets.
- SelfProteome.from_fasta(path, scope=...) — FASTA loader, scope
  limited to "all" or a callable (no gene metadata in FASTA).
- SelfProteome.from_ensembl(species, release, scope=..., cta_source=...)
  — pyensembl-backed loader.  Defaults to scope="non_cta" for human,
  stripping CTA genes via the existing topiary.sources pirlygenes
  integration.  Non-human species must pass cta_source explicitly
  when using "non_cta"; a clear ValueError fires otherwise.
- reference_version property composes an "ensembl-{species}-{release}+
  scope-{scope}+..." string that gets stamped on every output row.
  Custom filters (sets or callables) hash into the string so
  reproducibility holds even without a stable label.
- nearest(peptides) returns a DataFrame with one row per query,
  preserving input order.  Queries whose length isn't represented in
  the reference get None rows rather than raising.
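One way to picture the `reference_version` composition is the minimal sketch below. It is not the PR's code: `filter_label`, the `custom-` prefix, the 12-character SHA-256 digest, and the release number used in the usage note are assumptions. It covers only the two components spelled out above and only set-valued filters, since the description does not say how callables are hashed.

```python
import hashlib


def filter_label(scope):
    """Return a stable label for a scope (hypothetical helper).

    Named scopes pass through unchanged; sets of gene IDs are hashed so
    the label stays reproducible without a human-readable name.
    """
    if isinstance(scope, str):
        return scope
    if isinstance(scope, (set, frozenset)):
        # Sort first so the digest does not depend on set iteration order.
        payload = "\n".join(sorted(scope)).encode("utf-8")
        return "custom-" + hashlib.sha256(payload).hexdigest()[:12]
    raise TypeError(f"unsupported scope type: {type(scope).__name__}")


def reference_version(species, release, scope):
    """Compose the leading components of the version string."""
    return f"ensembl-{species}-{release}+scope-{filter_label(scope)}"
```

For example, `reference_version("human", 112, "non_cta")` yields `"ensembl-human-112+scope-non_cta"`, and two sets with the same members produce the same string regardless of insertion order.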

## Algorithm

SIMD-vectorized Hamming distance against... (continued)
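The description is cut off here, so the following is only a rough sketch of a length-indexed, substitution-only nearest lookup using NumPy array comparisons. It is not the PR's implementation. The names `encode`, `build_index`, and `nearest_by_hamming` are hypothetical, and the result is a list of tuples rather than a DataFrame to keep the example small.

```python
import numpy as np


def encode(peptides):
    """Pack equal-length peptides into a 2D uint8 array of ASCII codes."""
    return np.array([list(p.encode("ascii")) for p in peptides], dtype=np.uint8)


def build_index(peptides):
    """Group reference peptides by length and pre-encode each group."""
    by_length = {}
    for p in peptides:
        by_length.setdefault(len(p), []).append(p)
    return {n: (group, encode(group)) for n, group in by_length.items()}


def nearest_by_hamming(queries, index):
    """Return (query, nearest, distance) per query, in input order.

    Queries whose length has no reference peptides get (query, None, None).
    """
    results = []
    for q in queries:
        entry = index.get(len(q))
        if entry is None:
            results.append((q, None, None))
            continue
        refs, matrix = entry
        q_vec = encode([q])[0]
        # Broadcast the query across all rows and count mismatched positions.
        distances = (matrix != q_vec).sum(axis=1)
        best = int(distances.argmin())
        results.append((q, refs[best], int(distances[best])))
    return results
```

Comparing a whole reference matrix against one query at a time keeps the inner loop inside NumPy, which is the general idea behind a vectorized Hamming scan; ties resolve to the first reference peptide with the minimum distance.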
Pull Request #135: Add SelfProteome — nearest-self lookup architecture (part A of #124)

3480 of 3955 relevant lines covered (87.99%)

0.88 hits per line

Source Files on job python-3.12 - 24434611335.3
  • 6117a712 on github