
benwbrum / fromthepage / 25654994539 / 1
70%
development: 70%

Build:
DEFAULT BRANCH: development
Ran 11 May 2026 07:11AM UTC
Files 263
Run time 9s

11 May 2026 06:53AM UTC coverage: 69.829% (+0.005%) from 69.824%
25654994539.1

push

github

web-flow
Reduce false positives in subject duplicate detection (#5475)

* Initial plan

* Fix subject duplicate algorithm to reduce false positives

Two improvements to possible_duplicates:
1. Extract only 4+ character alphabetic words (was 2+ word chars).
   This filters out short initials/abbreviations like "Jr", "Fr", "R."
2. Use REGEXP whole-word matching instead of LIKE substring matching.
   This prevents "Fr" from matching "Frederick" and "Smith" from
   matching "Smithson".
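The two changes above can be sketched in plain Ruby. This is an illustration only, not the project's code: the helper names `significant_words` and `whole_word_match?` and the constant `MIN_WORD_LENGTH` are hypothetical, and the matching here runs in Ruby rather than in a SQL REGEXP clause.

```ruby
# Hypothetical sketch; names are illustrative, not taken from the commit.
MIN_WORD_LENGTH = 4

# Step 1: keep only runs of 4 or more alphabetic characters,
# so initials and short abbreviations are dropped.
def significant_words(title)
  title.scan(/[[:alpha:]]{#{MIN_WORD_LENGTH},}/)
end

# Step 2: require the word to appear as a whole word in the candidate,
# instead of as an arbitrary substring (the old LIKE behaviour).
def whole_word_match?(word, candidate)
  candidate.match?(/\b#{Regexp.escape(word)}\b/i)
end

significant_words("F. R. Calvert")          # => ["Calvert"]
significant_words("Fr. John Smith Jr")      # => ["John", "Smith"]
whole_word_match?("Smith", "John Smith")    # => true
whole_word_match?("Smith", "Smithson")      # => false
```

A substring test such as `"Smithson".include?("Smith")` returns true, which is the kind of false positive the whole-word check avoids.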

Add tests covering:
- Short initials (F. R. Calvert) don't cause false positives
- 2-char abbreviations (Fr.) don't match unrelated articles
- Whole-word matching works correctly (Smith != Smithson)
- Empty word list returns no duplicates
- Collection scoping is preserved
- Multi-word articles prioritize matches sharing more words
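The last case in the list, ranking candidates by how many words they share, might look roughly like the following. This is a sketch under assumptions: `rank_candidates` is a hypothetical helper, candidates are plain strings rather than database records, and the tie-break on title is an arbitrary choice made here to keep the ordering deterministic.

```ruby
# Hypothetical sketch; the real method works on records and SQL, not strings.
def rank_candidates(title, candidates)
  words = title.scan(/[[:alpha:]]{4,}/)
  # An empty word list yields no duplicates at all.
  return [] if words.empty?

  scored = candidates.map do |candidate|
    shared = words.count { |w| candidate.match?(/\b#{Regexp.escape(w)}\b/i) }
    [candidate, shared]
  end

  # Drop candidates sharing no words, then put the most overlap first.
  scored.select { |_, shared| shared > 0 }
        .sort_by { |candidate, shared| [-shared, candidate] }
        .map(&:first)
end

rank_candidates("John Calvert Smith", ["Smithson", "Mary Smith", "John Smith"])
# => ["John Smith", "Mary Smith"]
```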

Agent-Logs-Url: https://github.com/benwbrum/fromthepage/sessions/9b227348-a656-40ce-a73c-f7be08b69d71

Co-authored-by: benwbrum <199961+benwbrum@users.noreply.github.com>

* Address code review: add constant and defensive REGEXP escaping

- Extract magic number 4 into MINIMUM_DUPLICATE_WORD_LENGTH constant
  with explanatory comment at the class level
- Use Regexp.escape(word) defensively before interpolating into the
  REGEXP pattern (words from scan only contain [[:alpha:]] chars and
  have no metacharacters, but escaping is good practice)
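As an illustration of the escaping point, here is one way a whole-word pattern string could be assembled for a SQL REGEXP clause. Only `MINIMUM_DUPLICATE_WORD_LENGTH` and `Regexp.escape` come from the commit message; the helper `whole_word_sql_pattern` and the pattern shape (non-alphabetic characters or string ends on both sides) are assumptions, not the pattern the project actually uses.

```ruby
# Value taken from the commit message ("magic number 4").
MINIMUM_DUPLICATE_WORD_LENGTH = 4

# Hypothetical helper: builds a pattern string meant for a REGEXP clause.
# Regexp.escape is a no-op for purely alphabetic words, but it keeps the
# pattern safe if the extraction rule is ever loosened.
def whole_word_sql_pattern(word)
  "(^|[^[:alpha:]])#{Regexp.escape(word)}([^[:alpha:]]|$)"
end

whole_word_sql_pattern("Smith")
# => "(^|[^[:alpha:]])Smith([^[:alpha:]]|$)"

# Escaping matters as soon as a word carries a metacharacter:
Regexp.escape("St.")   # => "St\\."
```

The resulting string would be passed as a bound parameter rather than interpolated into the SQL text, so escaping here guards the regex syntax only, not SQL injection.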

Agent-Logs-Url: https://github.com/benwbrum/fromthepage/sessions/9b227348-a656-40ce-a73c-f7be08b69d71

Co-authored-by: benwbrum <199961+benwbrum@users.noreply.github.com>

* Address remaining code review: frozen regex constant, MySQL compat comment, safer index test

- Move DUPLICATE_WORD_REGEX to a frozen class constant to avoid runtime
  interpolation concerns; MINIMUM_DUPLICATE_WORD_LENGTH is still
  referenced once at class load time when the constant is defined
- Expand MySQL compatibility comment explaining [[:alpha:]] is supported
  in both MySQL 5.x POSIX ERE and MySQL 8.0+ ICU regex engines
- Extract index(... (continued)

2440 of 4001 branches covered (60.98%)

Branch coverage included in aggregate %.

10005 of 13821 relevant lines covered (72.39%)

156.84 hits per line

  • b5c8a99f on github

© 2026 Coveralls, Inc