grobidOrg / grobid
master: 39%

LAST BUILD BRANCH: fix/pdfalto-memory-limit-macos
DEFAULT BRANCH: master
Repo Added 19 Jan 2026 12:58PM UTC
Build 414 Last
Files 328
LAST BUILD ON BRANCH bugfix/fix-728

22 Jun 2026 10:05PM UTC coverage: 38.587% (+0.01%) from 38.577%
Build 27987196011 · Pull #1461 · via github · committer lfoppiano
fix: stop folding the letters æ/œ to "ae"/"oe" in clean() #728

TextUtilities.clean() expands typographic ligatures (the fi/fl/ff family,
genuine PDF rendering artifacts) but also folded the distinct Unicode
letters æ/Æ and œ/Œ to ASCII. Those are real letters used deliberately
in Danish, Norwegian, Icelandic, French, etc., so folding them was data
loss in the extracted text. They now pass through unchanged; the ligature
expansion is unaffected.

Add regression tests covering both the preserved letters and the
still-expanded typographic ligatures.
Pull Request #1461: Fix 728 - Avoid converting Danish letters when not necessary
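
The behaviour described in the commit message can be illustrated with a minimal sketch. This is not GROBID's Java implementation of TextUtilities.clean(): the function name `clean_ligatures` and the `LIGATURES` table are hypothetical, and the exact set of ligatures the real method expands is assumed here to be the Unicode "ff/fi/fl" presentation forms. The point is only that typographic ligatures are expanded while æ/Æ and œ/Œ are left alone.

```python
# Hypothetical table of typographic ligatures (Unicode presentation forms)
# that are safe to expand because they are rendering artifacts, not letters.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
}

# Build the translation table once; keys are single characters,
# values may be multi-character replacement strings.
_TABLE = str.maketrans(LIGATURES)


def clean_ligatures(text: str) -> str:
    """Expand typographic ligatures; leave real letters such as æ and œ untouched."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(clean_ligatures("e\ufb03cient \ufb01nding"))  # efficient finding
    print(clean_ligatures("Ærø, œuvre, Cæsar"))         # Ærø, œuvre, Cæsar
```

Keeping æ and œ out of the table, rather than folding them and trying to restore them later, is what makes the change lossless for Danish, Norwegian, Icelandic and French text.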

8636 of 24856 branches covered (34.74%)

Branch coverage included in aggregate %.

18390 of 45184 relevant lines covered (40.7%)

4.94 hits per line
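
The figures above are consistent with an aggregate computed as covered lines plus covered branches over relevant lines plus total branches. The short check below reproduces the reported percentages from the counts on this page. The formula is an inference from the numbers shown here, not a statement of how Coveralls is implemented, and the variable names are illustrative.

```python
# Counts taken from this build page.
covered_lines, relevant_lines = 18390, 45184
covered_branches, total_branches = 8636, 24856

# Per-metric coverage, in percent.
line_pct = 100 * covered_lines / relevant_lines
branch_pct = 100 * covered_branches / total_branches

# Assumed aggregate: lines and branches pooled into a single ratio.
aggregate_pct = 100 * (covered_lines + covered_branches) / (
    relevant_lines + total_branches
)

print(f"line coverage:      {line_pct:.2f}%")       # 40.70%
print(f"branch coverage:    {branch_pct:.2f}%")     # 34.74%
print(f"aggregate coverage: {aggregate_pct:.3f}%")  # 38.587%
```

Because branches are pooled with lines, the aggregate (38.587%) sits between the line figure (40.7%) and the branch figure (34.74%).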

Source Files on bugfix/fix-728
  • Tree
  • List 325
  • Changed 2
  • Source Changed 0
  • Coverage Changed 2

Recent builds

Build | Branch | Commit | Type | Ran | Committer | Via | Coverage
27987196011 | bugfix/fix-728 | fix: stop folding the letters æ/œ to "ae"/"oe" in clean() #728 TextUtilities.clean() expands typographic ligatures (the fi/fl/ff family, genuine PDF rendering artifacts) but also folded the distinct Unicode letters æ/Æ and œ/Œ to ASCII. Tho... | Pull #1461 | 22 Jun 2026 10:17PM UTC | lfoppiano | github | 38.59
27977370222 | bugfix/fix-728 | fix: stop folding the letters æ/œ to "ae"/"oe" in clean() #728 TextUtilities.clean() expands typographic ligatures (the fi/fl/ff family, genuine PDF rendering artifacts) but also folded the distinct Unicode letters æ/Æ and œ/Œ to ASCII. Tho... | Pull #1461 | 22 Jun 2026 07:23PM UTC | lfoppiano | github | 38.58


© 2026 Coveralls, Inc