rust-lang / regex
Coverage: 100%
master: 93%

LAST BUILD BRANCH: ag/misc-fixes
DEFAULT BRANCH: master
Repo Added 13 Mar 2016 05:50PM UTC
Files 20

LAST BUILD ON BRANCH word

Build 507 · push · via travis-ci · pending completion

Committed by BurntSushi:
Add ASCII word boundaries to the lazy DFA.

In other words, `\b` in a `bytes::Regex` can now be used in the DFA.
This leads to a big performance boost:

```
name                                     old ns/iter           new ns/iter             diff ns/iter  diff %
sherlock::word_ending_n                  115,465,261 (5 MB/s)  3,038,621 (195 MB/s)    -112,426,640  -97.37%
```
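The ASCII definition of `\b` only ever needs one byte of context on each side, which is what makes it a fit for a byte-at-a-time DFA. The Rust sketch below spells out those semantics over a byte slice. The function names are hypothetical, and the sketch illustrates the rule rather than reproducing the lazy DFA's code. One plausible way for a DFA to apply the rule is to record in its state whether the previous byte was a word byte; whether the lazy DFA does exactly this is an assumption here.

```rust
/// ASCII `\w`: [0-9A-Za-z_]. Every byte >= 0x80 counts as a non-word byte.
fn is_ascii_word_byte(b: u8) -> bool {
    b == b'_' || b.is_ascii_alphanumeric()
}

/// Returns true if byte offset `at` (0..=hay.len()) is an ASCII word boundary:
/// exactly one of the bytes on either side is a word byte. The start and end
/// of the haystack count as non-word.
fn is_ascii_word_boundary(hay: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word_byte(hay[at - 1]);
    let after = at < hay.len() && is_ascii_word_byte(hay[at]);
    before != after
}

fn main() {
    let hay = b"Sherlock Holmes";
    let bounds: Vec<usize> = (0..=hay.len())
        .filter(|&i| is_ascii_word_boundary(hay, i))
        .collect();
    println!("{:?}", bounds); // [0, 8, 9, 15]
}
```

Because any byte at or above 0x80 is simply treated as non-word, this check never has to decode UTF-8.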

Unfortunately, Unicode word boundaries continue to elude the DFA. This
state of affairs is lamentable, but after a lot of thought, I've
concluded there are only two ways to speed up Unicode word boundaries:

1. Come up with a harebrained scheme to add multi-byte look-behind/ahead
   to the lazy DFA. (The theory says it's possible. Figuring out how to
   do this without combinatorial state explosion is not within my grasp
   at the moment.)
2. Build a second lazy DFA with transitions on Unicode codepoints
   instead of bytes. (The looming inevitability of this makes me queasy
   for a number of reasons.)
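A concrete case shows why one byte of look-ahead is not enough for a Unicode `\b`: different characters share a UTF-8 lead byte. Below, é (U+00E9) and × (U+00D7) both begin with 0xC3, yet only é is a word character. The sketch approximates Unicode `\w` with `char::is_alphanumeric` plus `_`. That is an assumption: the real class also covers marks and connector punctuation, among others, and treats some numeric characters differently. The helper names are hypothetical.

```rust
/// Assumed approximation of Unicode `\w`: alphanumeric or `_`.
fn is_unicode_word_char(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}

/// Classifies the character that starts at byte offset `at`.
/// Deciding this requires decoding the whole UTF-8 sequence (1 to 4 bytes),
/// not just the byte at `at`. Returns `None` at the end of the string or
/// when `at` is not on a character boundary.
fn word_char_starting_at(hay: &str, at: usize) -> Option<bool> {
    hay.get(at..)?.chars().next().map(is_unicode_word_char)
}

fn main() {
    let a = "caf\u{e9}"; // "café": U+00E9 encodes as C3 A9
    let b = "caf\u{d7}"; // "caf×": U+00D7 encodes as C3 97

    // The byte right after "caf" is the same lead byte in both strings...
    assert_eq!(a.as_bytes()[3], 0xC3);
    assert_eq!(b.as_bytes()[3], 0xC3);

    // ...but only the full sequence says whether it is a word character,
    // so a Unicode `\b` at offset 3 is absent in one and present in the other.
    assert_eq!(word_char_starting_at(a, 3), Some(true));
    assert_eq!(word_char_starting_at(b, 3), Some(false));
    println!("one byte of look-ahead cannot decide a Unicode word boundary");
}
```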

To ameliorate this state of affairs, it is now possible to disable
Unicode support in `Regex::new` with `(?-u)`. In other words, one can
now use an ASCII word boundary with `(?-u:\b)`.
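The two definitions disagree on non-ASCII text. The sketch below lists where each places a boundary in "café". The ASCII rule sees the bytes of é as non-word and reports a boundary between "caf" and "é"; the Unicode rule does not. By that logic one would expect `(?-u:\b)caf(?-u:\b)` to find a match inside "café" where `\bcaf\b` finds none. `char::is_alphanumeric` plus `_` is an assumed stand-in for Unicode `\w`, and all function names are hypothetical.

```rust
use std::iter;

/// ASCII `\w` is [0-9A-Za-z_]; every byte >= 0x80 is a non-word byte.
fn is_ascii_word_byte(b: u8) -> bool {
    b == b'_' || b.is_ascii_alphanumeric()
}

/// Assumed stand-in for Unicode `\w`: alphanumeric or `_`.
fn is_unicode_word_char(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}

/// `(?-u:\b)` semantics: look at one byte on each side.
fn ascii_boundary(hay: &str, at: usize) -> bool {
    let h = hay.as_bytes();
    let before = at > 0 && is_ascii_word_byte(h[at - 1]);
    let after = at < h.len() && is_ascii_word_byte(h[at]);
    before != after
}

/// Unicode `\b` semantics: look at one character on each side.
/// `at` must be a char boundary.
fn unicode_boundary(hay: &str, at: usize) -> bool {
    let before = hay[..at].chars().next_back().map_or(false, is_unicode_word_char);
    let after = hay[at..].chars().next().map_or(false, is_unicode_word_char);
    before != after
}

/// Collects the positions `is_boundary` accepts, checking only char
/// boundaries (including the end of the string).
fn boundaries(hay: &str, is_boundary: fn(&str, usize) -> bool) -> Vec<usize> {
    hay.char_indices()
        .map(|(i, _)| i)
        .chain(iter::once(hay.len()))
        .filter(|&i| is_boundary(hay, i))
        .collect()
}

fn main() {
    let hay = "caf\u{e9}"; // "café"; é occupies bytes 3..5
    println!("ASCII:   {:?}", boundaries(hay, ascii_boundary)); // [0, 3]
    println!("Unicode: {:?}", boundaries(hay, unicode_boundary)); // [0, 5]
}
```

This is the trade-off the flag exposes: the ASCII rule is cheap to evaluate, at the cost of treating every non-ASCII character as a non-word character.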

Disabling Unicode support does not violate any invariants around UTF-8.
In particular, if the regular expression could lead to a match of
invalid UTF-8, then the parser will return an error. (This only happens
for `Regex::new`. `bytes::Regex::new` still of course allows matching
arbitrary bytes.)
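A related property, offered as an illustration rather than as the parser's actual check: on valid UTF-8, `(?-u:\b)` can never match in the middle of a multi-byte character. Word bytes are all ASCII, every byte of a multi-byte sequence is at least 0x80 (so non-word), and in valid UTF-8 the byte after an ASCII byte always starts a new character. By contrast, a pattern such as `(?-u:\xFF)` can only match a byte that never appears in valid UTF-8, which is presumably the kind of pattern `Regex::new` now rejects. The sketch checks the boundary property over a few sample strings; the helper names are hypothetical.

```rust
/// ASCII `\w`: [0-9A-Za-z_]. Every byte >= 0x80 counts as a non-word byte.
fn is_ascii_word_byte(b: u8) -> bool {
    b == b'_' || b.is_ascii_alphanumeric()
}

/// True if byte offset `at` (0..=hay.len()) is an ASCII word boundary.
fn is_ascii_word_boundary(hay: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word_byte(hay[at - 1]);
    let after = at < hay.len() && is_ascii_word_byte(hay[at]);
    before != after
}

/// Hypothetical helper: true if every ASCII word boundary in `hay`
/// falls on a UTF-8 character boundary.
fn ascii_boundaries_respect_utf8(hay: &str) -> bool {
    (0..=hay.len())
        .filter(|&i| is_ascii_word_boundary(hay.as_bytes(), i))
        .all(|i| hay.is_char_boundary(i))
}

fn main() {
    let samples = ["café", "naïve résumé", "Ωmega_1 × 2", "日本語 text", "emoji 🦀 crab"];
    for hay in samples.iter() {
        assert!(ascii_boundaries_respect_utf8(hay), "split inside a codepoint: {:?}", hay);
    }
    println!("no ASCII word boundary fell inside a multi-byte character");
}
```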

Finally, a new `PERFORMANCE.md` guide was written.

1048 of 1048 relevant lines covered (100.0%)

1.0 hits per line

Source Files on word
Detailed source file information is not available for this build.

Recent builds

Build 507 · branch word · "Add ASCII word boundaries to the lazy DFA." · push · 01 May 2018 11:09AM UTC · committed by BurntSushi · via travis-ci · coverage pending completion
All builds: 912

© 2025 Coveralls, Inc