
broadinstitute / catch / 296
94%

Build:
DEFAULT BRANCH: master
Ran 04 Mar 2019 11:05PM UTC
Jobs 3
Files 65
Run time 1min
pending completion
296

push

travis-ci-com

web-flow
Merge pull request #25 from broadinstitute/cluster-genomes

Add options to cluster input sequences and design on each cluster

This PR addresses #24.

Issue #24 gives the background and motivation for this feature. In short, [`design.py`](https://github.com/broadinstitute/catch/blob/master/bin/design.py) can be slow when given a large number of highly divergent sequences (e.g., all sequences for all eight segments of influenza A virus). One solution is to cluster the input sequences with an alignment-free method, solve a separate set cover instance on each cluster, and then merge the output probes from all clusters.

This PR adds the argument `--cluster-and-design-separately`. When provided, it produces a signature (or "sketch") of each input sequence using MinHash (similar to what is done in [Mash](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x)) and clusters the sequences by comparing their signatures. Clustering itself can be slow and memory-intensive, but using signatures enables fast pairwise comparison of sequences. Then, independently on each cluster, it generates candidate probes and runs a collection of filters on them (typically including [`set_cover_filter`](https://github.com/broadinstitute/catch/blob/master/catch/filter/set_cover_filter.py)). Finally, it merges the resulting probes (removing exact duplicates) and runs the final filters (e.g., [`adapter_filter`](https://github.com/broadinstitute/catch/blob/master/catch/filter/adapter_filter.py)) on the merged set of probes.
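The signature-and-cluster step described above can be sketched roughly as follows. This is a minimal illustration and not the code added by this PR: the function names, the k-mer length, the sketch size, the similarity threshold, and the use of single-linkage grouping are all assumptions chosen to keep the example short.

```python
import hashlib
from itertools import combinations


def minhash_signature(seq, k=4, size=8):
    """Bottom-`size` sketch: the smallest hash values over the k-mers of `seq`.

    k and size are illustrative values only (hypothetical defaults).
    """
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers
    }
    return sorted(hashes)[:size]


def signature_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity of two sequences from their sketches."""
    size = min(len(sig_a), len(sig_b))
    # Bottom values of the merged sketches approximate a sketch of the union.
    union_bottom = sorted(set(sig_a) | set(sig_b))[:size]
    if not union_bottom:
        return 0.0
    shared = set(sig_a) & set(sig_b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


def cluster_by_signature(seqs, threshold=0.5, k=4, size=8):
    """Group sequence indices whose sketches are similar (single linkage).

    Single linkage with a fixed threshold is a stand-in for whatever
    clustering method the real implementation uses.
    """
    sigs = [minhash_signature(s, k, size) for s in seqs]
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if signature_similarity(sigs[i], sigs[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


seqs = ["ACGTACGTTGCAACGT", "ACGTACGTTGCAACGT", "GGGGGGGGCCCCCCCC"]
print(cluster_by_signature(seqs))  # [[0, 1], [2]]
```

Only the short signatures are compared pairwise, never the full sequences, which is why this kind of comparison stays cheap even for long genomes.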

Provided that clustering itself is not too costly, this generally improves overall runtime and memory usage, because solving several independent, smaller set cover instances requires fewer resources than solving the complete one. One downside is that this can increase the size of the resulting probe set (e.g., if there is homology between input sequences that are placed into different clusters).
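The probe-set-size downside can be seen in a toy set cover instance. The sketch below uses a simple greedy solver and made-up data. The probe names, target names, and the `origin` mapping are hypothetical, and the real `set_cover_filter` is considerably more involved.

```python
def greedy_set_cover(universe, candidates):
    """Greedily pick candidates until every element of `universe` is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(candidates), key=lambda p: len(candidates[p] & uncovered))
        if not candidates[best] & uncovered:
            raise ValueError("remaining targets cannot be covered")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


def design_per_cluster(clusters, origin, coverage):
    """Solve one set cover per cluster, then merge and drop exact duplicates."""
    merged = []
    for name in sorted(clusters):
        targets = clusters[name]
        # Each cluster only sees candidates generated from its own sequences.
        local = {p: coverage[p] & targets for p in coverage if name in origin[p]}
        for probe in greedy_set_cover(targets, local):
            if probe not in merged:
                merged.append(probe)
    return merged


# Hypothetical data: t1 and t3 are homologous regions in different clusters.
clusters = {"c1": {"t1", "t2"}, "c2": {"t3", "t4"}}
coverage = {
    "PROBE_A": {"t1", "t3"},  # generated from c1, also matches t3
    "PROBE_B": {"t1", "t3"},  # near-identical probe generated from c2
    "PROBE_S": {"t2", "t4"},  # identical sequence generated in both clusters
}
origin = {"PROBE_A": {"c1"}, "PROBE_B": {"c2"}, "PROBE_S": {"c1", "c2"}}

all_targets = set().union(*clusters.values())
print(greedy_set_cover(all_targets, coverage))        # ['PROBE_A', 'PROBE_S']
print(design_per_cluster(clusters, origin, coverage)) # ['PROBE_A', 'PROBE_S', 'PROBE_B']
```

Here the exact duplicate (`PROBE_S`) is removed on merging, but the two near-identical probes for the homologous region both survive, so the per-cluster design ends up with three probes where the single global instance needs two.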

This PR also adds the ... (continued)

1516 of 1709 branches covered (88.71%)

5193 of 5459 relevant lines covered (95.13%)

2.85 hits per line
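The percentages above follow directly from the raw counts. The short check below also shows that pooling branches and lines gives a figure that rounds to the 94% shown at the top of the page. That the headline number is computed this way is an assumption, not something this page states.

```python
def percent(covered, total):
    """Coverage as a percentage, rounded to two decimals."""
    return round(100 * covered / total, 2)


branch_pct = percent(1516, 1709)   # branches covered
line_pct = percent(5193, 5459)     # relevant lines covered
# Assumption: the headline figure pools branches and lines together.
combined_pct = percent(1516 + 5193, 1709 + 5459)

print(branch_pct, line_pct, round(combined_pct))  # 88.71 95.13 94
```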

Jobs
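| ID | Job ID | Ran | Files | Coverage | CI job |
|----|--------|-----|-------|----------|--------|
| 1 | 296.1 | 04 Mar 2019 11:06PM UTC | 0 | 95.13 | Travis Job 296.1 |
| 2 | 296.2 | 04 Mar 2019 11:06PM UTC | 0 | 95.13 | Travis Job 296.2 |
| 3 | 296.3 | 04 Mar 2019 11:05PM UTC | 0 | 95.13 | Travis Job 296.3 |
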
Source Files on build 296
Detailed source file information is not available for this build.
  • bf97305e on github