| Jobs | Files | Run time |
|---|---|---|
| 4 | 106 | 4min |
Event: push (travis-ci)
Commits: [461ed0ab1](https://github.com/wikimedia/parsoid/commit/461ed0ab1) · [93c872a93](https://github.com/wikimedia/parsoid/commit/93c872a93)

LanguageConverter: switch to byte-oriented state machines

For Latin-script languages, the maximum degree of a state machine node is roughly 50, since the alphabet (plus digits, punctuation, and spaces) is small. For CJK languages like Mandarin, however, the maximum degree of a node can be much higher, since there are thousands of possible characters. The foma tools appear to contain algorithms that are roughly quadratic (in space and time) in the degree of a node, so they exhaust their space and time bounds and fail for Mandarin when the FST is expressed in terms of Unicode code points.

Building the machine to operate on UTF-8 octets ("bytes") instead of Unicode code points ensures that the maximum degree of an FST node is 184 (for the first octet), and more commonly 64 (for continuation bytes, which are 10xxxxxx in binary). This refactoring lets us operate directly on strings in their native UTF-8 encoding, but more importantly it distributes the out-degree among nodes so that the foma algorithms execute in reasonable amounts of space and time.

Change-Id: I
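The commit message argues that moving from a code-point alphabet to a UTF-8 byte alphabet caps the out-degree of every node. The Python sketch below illustrates that effect under stated assumptions: a plain trie (an acyclic finite-state acceptor) stands in for foma's FST, the vocabulary (every ideograph in U+4E00..U+9FFF as a one-character word) is hypothetical rather than Parsoid's Mandarin conversion table, and the helpers `build_trie` and `max_out_degree` are illustrative names, not Parsoid or foma APIs.

```python
# Minimal sketch, not Parsoid or foma code: a nested-dict trie stands in for
# the LanguageConverter FST so we can compare node out-degrees over a
# code-point alphabet versus a UTF-8 byte alphabet.

def build_trie(sequences):
    """Insert each symbol sequence into a nested-dict trie; return the root."""
    root = {}
    for seq in sequences:
        node = root
        for sym in seq:  # str yields characters, bytes yields ints 0..255
            node = node.setdefault(sym, {})
    return root


def max_out_degree(root):
    """Return the largest number of outgoing edges on any trie node."""
    best = 0
    stack = [root]
    while stack:
        node = stack.pop()
        best = max(best, len(node))
        stack.extend(node.values())
    return best


# Hypothetical vocabulary: each CJK Unified Ideograph (U+4E00..U+9FFF)
# treated as a one-character word.
words = [chr(cp) for cp in range(0x4E00, 0xA000)]

codepoint_degree = max_out_degree(build_trie(words))
byte_degree = max_out_degree(build_trie(w.encode("utf-8") for w in words))

# Byte values matching the lead-byte bit patterns 0xxxxxxx, 110xxxxx,
# 1110xxxx, 11110xxx, and the continuation pattern 10xxxxxx.
lead_bytes = [b for b in range(256) if b < 0x80 or 0xC0 <= b <= 0xF7]
continuation_bytes = [b for b in range(256) if (b & 0xC0) == 0x80]

print(codepoint_degree, byte_degree)             # 20992 64
print(len(lead_bytes), len(continuation_bytes))  # 184 64
```

Over code points the root needs one edge per character (20,992 here); over UTF-8 bytes no node exceeds 64 edges, because every ideograph in this range is a lead byte 0xE4 to 0xE9 followed by two continuation bytes. The 184 figure corresponds to the 128 + 32 + 16 + 8 values permitted by the lead-byte bit patterns; strict UTF-8 (RFC 3629) additionally never uses 0xC0, 0xC1, or 0xF5 to 0xF7.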
- Branches: 8582 of 10601 covered (80.95%)
- Relevant lines: 13714 of 15928 covered (86.1%)
- Hits per line: 114635.55
| ID | Job ID | Ran | Files | Coverage | CI job |
|---|---|---|---|---|---|
| 1 | 2014.1 | | 0 | 84.64% | Travis Job 2014.1 |
| 2 | 2014.2 | | 0 | 84.64% | Travis Job 2014.2 |
| 3 | 2014.3 | | 0 | 84.64% | Travis Job 2014.3 |
| 4 | 2014.4 | | 0 | 84.64% | Travis Job 2014.4 |