ContentMine / thresher
83%
master: 51%

LAST BUILD BRANCH: fix-hyphen-urls
DEFAULT BRANCH: master
Repo Added 29 Jun 2014 04:23PM UTC
Files 14

LAST BUILD ON BRANCH follow-on
Branches and tags:
  • follow-on
  • depfree
  • fix-hyphen-urls
  • master
  • v0.0.1
  • v0.0.2
  • v0.0.3
  • v0.0.5
  • v0.0.6
  • v0.0.7
  • v0.0.9
  • v0.1.10
  • v0.1.11

Build: 43
Status: pending completion
Type: push
Via: travis-ci
Committer: Blahah

Complete overhaul of thresher architecture

This is the first and largest step in a complete overhaul of thresher.
Its purpose is to support the current and near-future needs of
scraperJSON. It is based on a revisited design that incorporates
substantial user feedback.

Major changes:

- all scraping functionality has been moved to the Scraper class
- the Thresher class now only handles selecting a scraper by URL, and running it
- ScraperBox class holds a collection of scrapers and can match them to URLs
- all logging has been removed and the entire module now operates using events
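
The division of responsibilities listed above can be sketched roughly as follows. This is an illustrative sketch in Python, not thresher's actual code: only the class names Scraper, ScraperBox and Thresher come from the commit message. The Emitter helper, every method name, the event names "result" and "error", and the idea that a scraper is matched by a URL regular expression are assumptions made for the example.

```python
import re
from collections import defaultdict


class Emitter:
    """Minimal event emitter used in place of logging (hypothetical helper)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event, *args):
        for handler in self._handlers[event]:
            handler(*args)


class Scraper:
    """Owns all scraping logic for one scraper definition."""

    def __init__(self, name, url_pattern):
        self.name = name
        self.url_pattern = re.compile(url_pattern)  # assumed matching rule

    def matches(self, url):
        return self.url_pattern.search(url) is not None

    def scrape(self, url):
        # Real extraction is out of scope here; return a placeholder result.
        return {"scraper": self.name, "url": url}


class ScraperBox:
    """Holds a collection of scrapers and matches them to URLs."""

    def __init__(self):
        self.scrapers = []

    def add(self, scraper):
        self.scrapers.append(scraper)

    def match(self, url):
        # First scraper whose pattern matches, or None.
        return next((s for s in self.scrapers if s.matches(url)), None)


class Thresher(Emitter):
    """Only selects a scraper by URL and runs it, reporting through events."""

    def __init__(self, box):
        super().__init__()
        self.box = box

    def scrape(self, url):
        scraper = self.box.match(url)
        if scraper is None:
            self.emit("error", f"no scraper matches {url}")
            return None
        result = scraper.scrape(url)
        self.emit("result", result)
        return result


box = ScraperBox()
box.add(Scraper("example", r"^https?://example\.org/"))
thresher = Thresher(box)
thresher.on("result", print)
thresher.on("error", print)
thresher.scrape("http://example.org/article/1")
thresher.scrape("http://other.test/")
```

Because callers subscribe to events rather than reading log output, they decide what to print, store or ignore, which is one plausible reason for replacing logging with events.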

scraperJSON features implemented:

- elements can be nested (fixes #2 and ContentMine/scraperJSON#3)
- elements can depend on 'following' the captured URLs from other elements (fixes #6)
- URLs are resolved (and all redirects followed) before scraping (fixes #10)
- headless pre-rendering is no longer the default (for a large gain in speed and efficiency)
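
To make the nesting and 'following' features above concrete, here is a small sketch in Python. It is not thresher's implementation and does not use real scraperJSON syntax: the keys "selector", "elements" and "follow", the in-memory `pages` dictionary that stands in for fetched documents, and the function name `extract` are all assumptions chosen for illustration. URL resolution and redirects are not modelled.

```python
def extract(elements, node, pages):
    """Apply element definitions to one node and return the captured values.

    `node` maps selector strings to lists of matches. Each match is a dict
    with a "value" and, optionally, selector keys of its own for nested
    elements. `pages` maps URLs to such nodes (a stand-in for fetching).
    """
    results = {}
    # Handle elements without "follow" first, so that the URLs they capture
    # are available to the elements that depend on them.
    ordered = sorted(elements.items(), key=lambda kv: "follow" in kv[1])
    for name, spec in ordered:
        if "follow" in spec:
            # Scrape from each page whose URL was captured by another element.
            scopes = [pages[m["value"]] for m in results[spec["follow"]]]
        else:
            scopes = [node]
        captured = []
        for scope in scopes:
            for match in scope.get(spec["selector"], []):
                item = {"value": match["value"]}
                if "elements" in spec:
                    # Nested elements are evaluated relative to their parent.
                    item["children"] = extract(spec["elements"], match, pages)
                captured.append(item)
        results[name] = captured
    return results


# Hypothetical pages and definition, for illustration only.
pages = {
    "http://example.org/article": {
        "//a[@class='fulltext']": [{"value": "http://example.org/full"}],
        "//div[@class='author']": [
            {"value": "A. Author",
             ".//span[@class='inst']": [{"value": "Inst A"}]},
        ],
    },
    "http://example.org/full": {
        "//img": [{"value": "fig1.png"}, {"value": "fig2.png"}],
    },
}

definition = {
    "figure": {"selector": "//img", "follow": "fulltext"},
    "fulltext": {"selector": "//a[@class='fulltext']"},
    "author": {
        "selector": "//div[@class='author']",
        "elements": {"institution": {"selector": ".//span[@class='inst']"}},
    },
}

results = extract(definition, pages["http://example.org/article"], pages)
print(results["figure"])
```

The sketch only supports one level of following; an element that follows another followed element would need a proper dependency ordering.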

388 of 470 relevant lines covered (82.55%)

10.15 hits per line
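
The percentage above follows directly from the two line counts, as this short Python check shows. The variable names are illustrative. The last step assumes that "hits per line" is the mean number of hits over relevant lines, which the page does not state.

```python
relevant_lines = 470
covered_lines = 388

# Coverage as a percentage of relevant lines, to two decimal places.
coverage_percent = round(100 * covered_lines / relevant_lines, 2)

# The headline figure is the same value rounded to a whole percent.
headline_percent = round(100 * covered_lines / relevant_lines)

# Assumption: hits per line = total hits / relevant lines.
approx_total_hits = 10.15 * relevant_lines

print(coverage_percent, headline_percent, approx_total_hits)
```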

Recent builds

Builds | Branch | Commit | Type | Ran | Committer | Via | Coverage
43 | follow-on | Complete overhaul of thresher architecture | push | 07 Sep 2014 11:09PM UTC | Blahah | travis-ci | pending completion
See All Builds (57)

© 2025 Coveralls, Inc