
wikimedia / parsoid / build 2656 (85% coverage)

Default branch: master
Ran: 28 May 2019 04:39PM UTC · Jobs: 4 · Files: 152 · Run time: 6min

Build #2656 · push · travis-ci · pending completion

Commit by jenkins-bot:
Implement the parsing pipeline

* All pipeline stages extend the abstract PipelineStage class.

* The pipeline is modelled as a simple array of stages through which
  input is pushed synchronously.
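As a rough sketch of the two bullets above (Parsoid itself is PHP; this is an illustrative Python analogue, and the `Tokenize`/`Uppercase` stage names are made up for the example):

```python
from abc import ABC, abstractmethod

class PipelineStage(ABC):
    """Abstract base that every pipeline stage extends."""
    @abstractmethod
    def process(self, data):
        """Transform the input and return the result for the next stage."""

class Tokenize(PipelineStage):
    def process(self, data):
        return data.split()

class Uppercase(PipelineStage):
    def process(self, data):
        return [tok.upper() for tok in data]

def run_pipeline(stages, data):
    # The pipeline is just an array of stages; input is pushed
    # through each stage synchronously.
    for stage in stages:
        data = stage.process(data)
    return data

print(run_pipeline([Tokenize(), Uppercase()], "hello world"))
# -> ['HELLO', 'WORLD']
```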

* The input can be processed all at once or in chunks, depending on
  how the stages chunk it. The only scenario currently supported is
  for the tokenizer to use wikipeg's native streaming interface (which
  uses a PHP generator underneath) to yield chunks from the tokenizer.

  There are memory-saving benefits to this mode, since the PEG cache
  can be freed up after every top-level block. Additionally, if those
  tokens are pushed all the way to the token builder, the memory
  allocated for them can be freed right away (as long as we null out
  the variables holding references to them).
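The per-block memory benefit can be sketched like this (a Python stand-in for the PHP generator; the `cache` dict is a hypothetical stand-in for the PEG memoization table):

```python
def tokenize_stream(blocks):
    # Yield one chunk of tokens per top-level block. Any per-block
    # state (here, a stand-in for the PEG cache) is dropped after
    # each yield instead of growing for the whole document.
    for block in blocks:
        cache = {}           # hypothetical PEG cache for this block
        yield block.split()
        cache.clear()        # freed once the block's tokens are emitted

for chunk in tokenize_stream(["a b c", "d e"]):
    print(chunk)
# -> ['a', 'b', 'c']
# -> ['d', 'e']
```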

* In chunk-based parsing mode, every stage requests its previous stage
  to yield chunks. At the top level, the pipeline requests its last
  stage to yield its result. Doing it backward lets every stage
  figure out how to consume the input from its preceding (generator)
  stage. The most tangible benefit is where the DOMPostProcessor
  consumes a DOM from the HTML5TreeBuilder, and the HTML5TreeBuilder
  has an implicit end-of-stream signal when the yielding for-loop
  terminates. This consumer <- generator pull eliminates the need for
  a separate end signal, which would be required in a generator ->
  consumer push design.
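The pull-based composition above can be illustrated in miniature (again a Python analogue, not Parsoid's PHP; the three function names only loosely mirror the real tokenizer, HTML5TreeBuilder, and DOMPostProcessor stages):

```python
def tokenizer(text):
    # Generator stage: yields one token chunk per blank-line block.
    for block in text.split("\n\n"):
        yield block.split()

def tree_builder(chunks):
    # Consumer pulls from its predecessor; when the for-loop
    # terminates, the stream has ended -- no explicit end-of-stream
    # signal is needed.
    tree = []
    for chunk in chunks:
        tree.append(chunk)
    return tree

def post_processor(tree):
    return [chunk for chunk in tree if chunk]

# The pipeline asks its *last* stage for the result; each stage in
# turn pulls input from the stage before it.
result = post_processor(tree_builder(tokenizer("a b\n\nc d")))
print(result)
# -> [['a', 'b'], ['c', 'd']]
```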

* But, unlike the JS code, I haven't pushed this chunked parsing all
  the way yet. So, if a nested pipeline (most commonly when expanding
  templates or extensions) also processes chunks, the current
  implementation aggregates all those chunks into an array before
  returning it. That is, the top-level pipeline is not yet a
  generator. To fully benefit from chunking, it would make sense for
  it to be one, but that would require more invasive changes to the
  token handlers.

  However, from a performance POV, this pa... (continued)

9198 of 12006 branches covered (76.61%)

14597 of 17628 relevant lines covered (82.81%)

28646.07 hits per line

Jobs
ID  Job ID  Ran                       Files  Coverage
1   2656.1  28 May 2019 04:40PM UTC   0      82.78%
2   2656.2  28 May 2019 04:39PM UTC   0      82.78%
3   2656.3  28 May 2019 04:43PM UTC   0      82.81%
4   2656.4  28 May 2019 04:45PM UTC   0      82.81%
Source Files on build 2656
Detailed source file information is not available for this build.
  • Back to Repo
  • Travis Build #2656
  • c6c8b294 on github
  • Prev Build on master (#2655)
  • Next Build on master (#2657)