
wikimedia / parsoid / build 2656 (85% coverage)

Default branch: master
Ran: 28 May 2019 04:39PM UTC · Jobs: 4 · Files: 152 · Run time: 6min

Build #2656 · push · travis-ci · pending completion

Commit by jenkins-bot:
Implement the parsing pipeline

* All pipeline stages extend the abstract PipelineStage class.

* The pipeline is modelled as a simple array of stages through which
  input is pushed synchronously.
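As a rough sketch of the two bullets above (Parsoid itself is PHP; this is an illustrative Python analogue, and the `Tokenize`/`Uppercase` stage names are made up for the example):

```python
from abc import ABC, abstractmethod

class PipelineStage(ABC):
    """Abstract base that every pipeline stage extends."""
    @abstractmethod
    def process(self, data):
        """Transform the input and return the result for the next stage."""

class Tokenize(PipelineStage):
    def process(self, data):
        return data.split()

class Uppercase(PipelineStage):
    def process(self, data):
        return [tok.upper() for tok in data]

def run_pipeline(stages, data):
    # The pipeline is just an array of stages; input is pushed
    # through each stage synchronously.
    for stage in stages:
        data = stage.process(data)
    return data

print(run_pipeline([Tokenize(), Uppercase()], "hello world"))
# -> ['HELLO', 'WORLD']
```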

* The input can be processed all at once or in chunks, depending on
  how the stages chunk it. The only scenario currently supported is
  for the tokenizer to use wikipeg's native streaming interface (which
  uses a PHP generator underneath) to yield chunks from the tokenizer.

  There are memory-saving benefits to this mode, since the PEG cache
  can be freed up after every top-level block. Additionally, if those
  tokens are pushed all the way to the token builder, the memory
  allocated for them can be freed right away (as long as we null out
  the variables holding references to them).
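The per-block memory benefit can be sketched like this (a Python stand-in for the PHP generator; the `cache` dict is a hypothetical stand-in for the PEG memoization table):

```python
def tokenize_stream(blocks):
    # Yield one chunk of tokens per top-level block. Any per-block
    # state (here, a stand-in for the PEG cache) is dropped after
    # each yield instead of growing for the whole document.
    for block in blocks:
        cache = {}           # hypothetical PEG cache for this block
        yield block.split()
        cache.clear()        # freed once the block's tokens are emitted

for chunk in tokenize_stream(["a b c", "d e"]):
    print(chunk)
# -> ['a', 'b', 'c']
# -> ['d', 'e']
```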

* In chunk-based parsing mode, every stage requests its previous stage
  to yield chunks. At the top level, the pipeline requests its last
  stage to yield its result. Doing it backward lets every stage
  figure out how to consume the input from its preceding (generator)
  stage. The most tangible benefit is where the DOMPostProcessor
  consumes a DOM from the HTML5TreeBuilder, and the HTML5TreeBuilder
  has an implicit end-of-stream signal when the yielding for-loop
  terminates. This consumer <- generator pull eliminates the need for
  a separate end signal, which would be required in a generator ->
  consumer push design.
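The pull-based composition above can be illustrated in miniature (again a Python analogue, not Parsoid's PHP; the three function names only loosely mirror the real tokenizer, HTML5TreeBuilder, and DOMPostProcessor stages):

```python
def tokenizer(text):
    # Generator stage: yields one token chunk per blank-line block.
    for block in text.split("\n\n"):
        yield block.split()

def tree_builder(chunks):
    # Consumer pulls from its predecessor; when the for-loop
    # terminates, the stream has ended -- no explicit end-of-stream
    # signal is needed.
    tree = []
    for chunk in chunks:
        tree.append(chunk)
    return tree

def post_processor(tree):
    return [chunk for chunk in tree if chunk]

# The pipeline asks its *last* stage for the result; each stage in
# turn pulls input from the stage before it.
result = post_processor(tree_builder(tokenizer("a b\n\nc d")))
print(result)
# -> [['a', 'b'], ['c', 'd']]
```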

* But, unlike the JS code, I haven't pushed this chunked parsing all
  the way yet. So, if a nested pipeline (most commonly when expanding
  templates or extensions) also processes chunks, the current
  implementation aggregates all those chunks into an array before
  returning it. That is, the top-level pipeline is not yet a
  generator. To fully benefit from chunking, it would make sense for
  it to be one, but that would require more invasive changes to the
  token handlers.

  However, from a performance POV, this pa... (continued)

9198 of 12006 branches covered (76.61%)

14597 of 17628 relevant lines covered (82.81%)

28646.07 hits per line

Jobs
ID  Job ID  Ran                       Files  Coverage
1   2656.1  28 May 2019 04:40PM UTC   0      82.78%
2   2656.2  28 May 2019 04:39PM UTC   0      82.78%
3   2656.3  28 May 2019 04:43PM UTC   0      82.81%
4   2656.4  28 May 2019 04:45PM UTC   0      82.81%
Source Files on build 2656
Detailed source file information is not available for this build.
  • Back to Repo
  • Travis Build #2656
  • c6c8b294 on github
  • Prev Build on master (#2655)
  • Next Build on master (#2657)