freeeve / roaringrange / 28541272305
84%

Build:
DEFAULT BRANCH: main
Ran 01 Jul 2026 07:08PM UTC
Jobs 1
Files 16
Run time 1min

01 Jul 2026 07:07PM UTC coverage: 84.284% (-0.03%) from 84.318%
28541272305 · push · github · freeeve
feat(terms)!: per-language stop words + decouple stemming from language

Stop-word removal is now keyed on the index language (18 Snowball languages)
instead of a fixed English list, and the RRTI stem filter is decoupled from the
language so an index can strip a language's stop words without stemming.

Header semantics (RRTI): the `language` byte is meaningful when bit0 (stemmed)
OR bit1 (stop-words) is set. The reader builds the stemmer only under bit0 but
reads the language under bit0 | bit1. Enabling either filter requires a language
-- a filter set with no language is a build error (there is no implicit
"language == 0 means English" fallback). Defaults (no filter) stay byte-identical; the English list is the
unchanged 31-word set, so existing English-stopword indexes are unaffected.
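
The header rules above can be sketched as follows. This is a minimal illustration, not the project's code: `FLAG_STEMMED` is the only name taken from this message, while `FLAG_STOPWORDS`, `NO_LANGUAGE`, `check_build`, and `read_header` are hypothetical, and treating a `language` byte of 0 as "no language" is an assumption.

```python
FLAG_STEMMED = 0b01    # bit0, named in the commit message
FLAG_STOPWORDS = 0b10  # bit1; hypothetical name
NO_LANGUAGE = 0        # assumption: 0 encodes "no language"


def check_build(flags: int, language: int) -> None:
    """Writer side: reject a filter that has no language."""
    if flags & (FLAG_STEMMED | FLAG_STOPWORDS) and language == NO_LANGUAGE:
        raise ValueError("stem/stop-word filter set without a language")


def read_header(flags: int, language_byte: int):
    """Reader side: return (language or None, build_stemmer)."""
    # The language byte is meaningful under bit0 | bit1 ...
    uses_language = bool(flags & (FLAG_STEMMED | FLAG_STOPWORDS))
    language = language_byte if uses_language else None
    # ... but the stemmer is built only under bit0.
    build_stemmer = bool(flags & FLAG_STEMMED)
    return language, build_stemmer
```

With these definitions, a stop-words-only header yields a language and no stemmer, which is the decoupling this change introduces.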

Per-language lists live once in stopwords/<lang>.txt at the repo root (sorted,
lowercased, de-duplicated). Rust embeds them with include_str! and Go with
//go:embed -- the same physical files, so the two ports' lists are byte-identical
by construction. English is the fixed list; the other 17 are from NLTK, except
Tamil, which is from spaCy.
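
A check for the "sorted, lowercased, de-duplicated" invariant might look like the sketch below. It assumes one word per line with a trailing newline, which this message does not state; `normalize_stopwords` and `is_normalized` are hypothetical helpers, not part of the repo.

```python
def normalize_stopwords(text: str) -> str:
    """Lowercase, de-duplicate, and sort a one-word-per-line list."""
    words = {line.strip().lower() for line in text.splitlines() if line.strip()}
    if not words:
        return ""
    return "\n".join(sorted(words)) + "\n"


def is_normalized(text: str) -> bool:
    """True when the file content is already in canonical form."""
    return text == normalize_stopwords(text)
```

Running `is_normalized` over each stopwords/<lang>.txt in CI would be one way to keep the shared files canonical for both ports.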

API:
- Rust: stop_words(lang) / is_stop_word(t, lang); Tokenizer::with(language, stem,
  stopwords, case_fold) with the old new(..) kept as a stem = language.is_some()
  shim; spec() widened to (language, stem, stopwords, case_fold); from_header
  reads the language under bit0 | bit1; TermIndexConfig / TermSplitBuildConfig
  gain a stem field; the stream writer sets FLAG_STEMMED from stem and errors on
  a filter with no language.
- Go: all 18 TermLanguage constants + a stopwordFile map; termStopWordList /
  isTermStopWord(t, lang); TermTokenizer.language; NewTermTokenizerFull and
  WriteTermIndexFull, with the old *With funcs kept as shims;
  TermSplitBuildConfig.Stem.
- Python: TermBuilder / TermSplitSetBuilder take stem=None (defaults to
  "a language was given"); a ValueError when a filter has no language.

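The stem=None default described for the Python builders can be sketched like this. `resolve_filters` is a hypothetical helper written for illustration; the real TermBuilder / TermSplitSetBuilder signatures are not shown in this message and may differ.

```python
def resolve_filters(language=None, stem=None, stopwords=False):
    """Resolve the stem default and validate the filter/language pairing."""
    # stem=None means "stem if a language was given".
    if stem is None:
        stem = language is not None
    # Either filter without a language is an error.
    if (stem or stopwords) and language is None:
        raise ValueError("a stem or stop-word filter requires a language")
    return language, stem, stopwords
```

Under this sketch, resolve_filters("german", stem=False, stopwords=True) strips German stop words without stemming, while resolve_filters(stopwords=True) raises ValueError.
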
Go multilingual stemming stays out of scope: only English ste... (continued)

33 of 38 new or added lines in 2 files covered. (86.84%)

1 existing line in 1 file now uncovered.

1507 of 1788 relevant lines covered (84.28%)

33.25 hits per line
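
The percentages reported here follow directly from the line counts. A small sketch that recomputes them (the `percent` helper is hypothetical; the inputs are the figures shown on this page):

```python
def percent(covered: int, relevant: int) -> float:
    """Coverage as a percentage of relevant lines."""
    return 100.0 * covered / relevant


overall = percent(1507, 1788)          # all relevant lines in the build
changed = percent(33, 38)              # new or added lines only
delta = round(overall, 3) - 84.318     # change from the previous build
```

Rounded, these give 84.284% overall, 86.84% on changed lines, and a delta of -0.03 percentage points, matching the summary.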

Uncovered Changes

Lines  Coverage  ∆      File
5      85.25     -0.4%  terms.go

Coverage Regressions

Lines  Coverage  ∆      File
1      85.25     -0.4%  terms.go

Jobs

ID  Job ID         Ran                      Files  Coverage
1   28541272305.1  01 Jul 2026 07:08PM UTC  16     84.28
  • 849f9c23 on github
  • Prev Build on main (#28279688357)
© 2026 Coveralls, Inc