broadinstitute / catch / 296 / 1

Build:
DEFAULT BRANCH: master
Ran 04 Mar 2019 11:06PM UTC
Files 65
Run time 15s
04 Mar 2019 10:35PM UTC coverage: 95.127% (+0.02%) from 95.106%
296.1

push

travis-ci-com

web-flow
Merge pull request #25 from broadinstitute/cluster-genomes

Add options to cluster input sequences and design on each cluster

This PR addresses #24.

Issue #24 gives background and reasons for this feature. In short, [`design.py`](https://github.com/broadinstitute/catch/blob/master/bin/design.py) can be slow when given a large number of highly divergent sequences (e.g., all sequences for all eight segments of influenza A virus). One solution is to cluster input sequences (alignment-free), solve a separate set cover instance on each cluster, and then merge the output probes from each cluster.

This PR adds the argument `--cluster-and-design-separately`. When provided, it produces a signature (or "sketch") of each input sequence using MinHash (similar to what is done in [Mash](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x)) and clusters the sequences by comparing their signatures. Clustering itself can be slow and memory-intensive, but using signatures enables fast pairwise comparison of sequences. Then, it both generates candidate probes independently on each cluster and runs a collection of filters on those candidate probes (typically including [`set_cover_filter`](https://github.com/broadinstitute/catch/blob/master/catch/filter/set_cover_filter.py)) independently on each cluster. It merges the resulting probes (removing exact duplicates), and runs final filters (e.g., [`adapter_filter`](https://github.com/broadinstitute/catch/blob/master/catch/filter/adapter_filter.py)) on the merged set of probes.
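As a rough illustration of the signature-and-cluster step described above, the following is a minimal sketch and not the code added by this PR. It assumes a bottom-k MinHash signature over k-mers and single-linkage clustering at a fixed similarity threshold; the function names, `k=10`, `size=100`, and `threshold=0.5` are all hypothetical choices made for the example, and the PR text does not say which clustering method or parameters are used.

```python
import hashlib
from itertools import combinations


def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (built-in hash() is salted per run)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def signature(seq, k=10, size=100):
    """Bottom-k MinHash signature: the `size` smallest distinct k-mer hashes."""
    hashes = {kmer_hash(km) for km in kmers(seq, k)}
    return sorted(hashes)[:size]


def estimate_jaccard(sig_a, sig_b, size=100):
    """Estimate the Jaccard similarity of two k-mer sets from their signatures."""
    union_bottom = sorted(set(sig_a) | set(sig_b))[:size]
    if not union_bottom:
        return 0.0
    shared = set(sig_a) & set(sig_b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


def cluster(seqs, threshold=0.5, k=10, size=100):
    """Single-linkage clustering (an assumption): link any two sequences whose
    estimated similarity is at least `threshold`. Returns lists of indices."""
    sigs = [signature(s, k, size) for s in seqs]
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if estimate_jaccard(sigs[i], sigs[j], size) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Comparing two signatures touches at most `2 * size` hash values regardless of genome length, which is what makes the all-pairs comparison cheap relative to comparing full sequences.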

Provided that clustering itself is not too expensive, this generally improves overall runtime and memory usage, because solving several independent, smaller set cover instances requires fewer resources than solving the single complete instance. One downside is that it can increase the size of the resulting probe set (e.g., if there is homology between input sequences that are placed into different clusters).
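The probe-set size tradeoff can be seen in a toy example. The sketch below is only an illustration under stated assumptions: it uses a plain greedy set cover over abstract "regions" rather than CATCH's `set_cover_filter`, and the probe names (`P1` to `P3`), region names, and both helper functions are hypothetical.

```python
def greedy_set_cover(universe, candidates):
    """Greedy set cover: repeatedly pick the candidate that covers the most
    still-uncovered regions. `candidates` maps probe name -> set of regions."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (first name wins).
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & uncovered))
        if not candidates[best] & uncovered:
            raise ValueError("some regions cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


def design_separately(clusters):
    """Solve one set cover per cluster, then merge, dropping exact duplicates.
    `clusters` is a list of (regions, candidates) pairs, one per cluster."""
    merged = []
    for regions, candidates in clusters:
        for probe in greedy_set_cover(regions, candidates):
            if probe not in merged:
                merged.append(probe)
    return merged


# Joint design: P1 comes from cluster A but also covers b1 through homology.
joint = greedy_set_cover(
    {"a1", "a2", "b1", "b2"},
    {"P1": {"a1", "a2", "b1"}, "P2": {"b1"}, "P3": {"b2"}},
)

# Separate design: each cluster only sees candidates from its own sequences.
separate = design_separately([
    ({"a1", "a2"}, {"P1": {"a1", "a2"}}),
    ({"b1", "b2"}, {"P2": {"b1"}, "P3": {"b2"}}),
])

print(joint)     # ['P1', 'P3']
print(separate)  # ['P1', 'P2', 'P3']
```

Here the joint instance needs two probes, while the per-cluster instances together need three, because cluster B cannot reuse `P1` for the homologous region `b1`. Each per-cluster instance is smaller, though, which is where the runtime and memory savings come from.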

This PR also adds the ... (continued)

1683 of 1876 branches covered (89.71%)

5193 of 5459 relevant lines covered (95.13%)

0.95 hits per line

  • Travis Job 296.1
  • bf97305e on github