grobidOrg / grobid
Coverage (master): 39%

LAST BUILD BRANCH: fix/pdfalto-memory-limit-macos
DEFAULT BRANCH: master
Repo Added 19 Jan 2026 12:58PM UTC
Last build: 414
Files: 328

LAST BUILD ON BRANCH bugfix/fix-671
Branches:
  • bugfix/fix-671
  • 0.9.0
  • bugfix/add-priority-content-type
  • bugfix/avoid-IOBE-in-orcid-search
  • bugfix/codespell-round-two
  • bugfix/correct-git-revision
  • bugfix/docker-arm
  • bugfix/docker-full-build
  • bugfix/fix-149
  • bugfix/fix-15
  • bugfix/fix-728
  • bugfix/fix-849
  • bugfix/fix-classification-delft
  • bugfix/fix-doi-search
  • bugfix/improve-regexes
  • bugfix/issue-1024
  • bugfix/issue-1355
  • bugfix/issue-271
  • bugfix/issue-465
  • bugfix/issue-512
  • bugfix/issue-716
  • bugfix/issue-849
  • bugfix/issue-920
  • bugfix/jep-env-uv
  • bugfix/model-selection-flavours
  • bugfix/patch-lmdb
  • bugfix/reference-segmenter-createTraining
  • bugfix/sentence-segmentation-detection
  • bugfix/spotless
  • bugfix/support-missing-dir
  • bugfixes/docker-retag
  • bugix/correct-revision
  • chore/remove-powermock
  • chore/remove-unused-models
  • ci/reduce-coveralls
  • ci/support-forks
  • display-version-in-ui
  • feature/add-api-status-in-ui
  • feature/add-code-cleaning
  • feature/add-ignore-areas-api
  • feature/add-pdf-tei-editor
  • feature/auto-update-hf-space
  • feature/blingfire
  • feature/code-formatting
  • feature/codeql
  • feature/codespell
  • feature/coi-ac
  • feature/coi-ac-update-light-models
  • feature/community-page
  • feature/development-api
  • feature/docker-summary
  • feature/documentation
  • feature/extract-any-url-in-fulltext
  • feature/fix-bibtex-index
  • feature/improve-documentation
  • feature/issue-to-documentation
  • feature/literature-law-flavour
  • feature/literature-law-flavour-eval
  • feature/literature-law-flavour-experiment
  • feature/literature-law-flavour-experiments
  • feature/middle-name-bibtex
  • feature/onnx-models
  • feature/onnx-models-perf
  • feature/onnx-models-perfs
  • feature/otlp-metrics-push
  • feature/prepare-release
  • feature/refactor-tei-formatter
  • feature/refactor-trainers
  • feature/remove-unused-lexicons
  • feature/rename-ui-items
  • feature/revise-consolidation
  • feature/rewrite-overloaded
  • feature/sync-crf-manual-build
  • feature/unpin-lmdb-delft
  • feature/update-benchmarks
  • feature/update-copyright
  • feature/update-delft
  • feature/update-dependencies
  • feature/update-doc-flavor-dev
  • feature/update-editorconfig
  • feature/update-lingua
  • feture/update-documentation
  • fix/pdfalto-memory-limit-macos
  • hotfix/0.9.0-docker
  • improvement/affiliation-author
  • improvement/fix-lexicon
  • improvements/fails-on-failures
  • improvements/speed-up-ci
  • master
  • multi-arch-docker-image
  • publish/0.9.0-central
  • release/0.9.0
  • rootless
  • security/cmdinjection-threadleak
  • security/jline-cmdinjection-threadleak
  • security/update-dependencies
  • tensorflow-2.16
  • tensorflow-2.16-docker-full

22 Jun 2026 09:10PM UTC coverage: 38.584% (-0.002%) from 38.586%
Build 27984234598

Pull #1460 · via github · committer: lfoppiano

fix: detect document language for segmentation training data #671

createTrainingSegmentation() and createBlankTrainingData() hardcoded
xml:lang="en" in the generated training TEI regardless of the actual
document language. Detect the language from the extracted raw text via
the existing LanguageUtilities, falling back to "en" when detection is
unavailable or inconclusive (detection failures are non-fatal).

Pull Request #1460: Fix #671
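A minimal Java sketch of the fallback behaviour described in the commit message, assuming a detector that may be missing, return no answer, or throw. `LanguageGuesser`, `detectLang` and `langAttribute` are hypothetical names introduced for illustration; GROBID's actual `LanguageUtilities` API is not reproduced here, and the real change in #1460 may be structured differently.

```java
import java.util.Optional;

public class TrainingLangSketch {

    /** Hypothetical stand-in for a language detector; NOT GROBID's LanguageUtilities API. */
    interface LanguageGuesser {
        /** Returns a language code, or Optional.empty() when the guess is inconclusive. */
        Optional<String> guess(String text) throws Exception;
    }

    /** Fallback used whenever detection cannot give an answer. */
    static final String DEFAULT_LANG = "en";

    /** Detects the language of the raw text; any failure falls back to "en". */
    static String detectLang(LanguageGuesser guesser, String rawText) {
        // No detector configured, or no text to analyse: detection is unavailable.
        if (guesser == null || rawText == null || rawText.isBlank()) {
            return DEFAULT_LANG;
        }
        try {
            // An empty or blank result counts as inconclusive.
            return guesser.guess(rawText)
                    .filter(code -> !code.isBlank())
                    .orElse(DEFAULT_LANG);
        } catch (Exception e) {
            // Detection failures are non-fatal: keep generating training data.
            return DEFAULT_LANG;
        }
    }

    /** Formats the attribute for the training TEI (which element carries it is left open here). */
    static String langAttribute(LanguageGuesser guesser, String rawText) {
        return "xml:lang=\"" + detectLang(guesser, rawText) + "\"";
    }

    public static void main(String[] args) {
        // Toy detector for the demo: recognises French only by the word " le ".
        LanguageGuesser toy = text -> text.contains(" le ") ? Optional.of("fr") : Optional.empty();
        LanguageGuesser broken = text -> { throw new IllegalStateException("model not loaded"); };

        System.out.println(langAttribute(toy, "Ceci est le texte."));   // xml:lang="fr"
        System.out.println(langAttribute(toy, "Plain English text."));  // xml:lang="en"
        System.out.println(langAttribute(broken, "Anything at all"));   // xml:lang="en"
        System.out.println(langAttribute(null, "No detector"));         // xml:lang="en"
    }
}
```

The checks for a missing detector and blank text are choices made in this sketch; the commit message itself only states that unavailable or inconclusive detection, and detection failures, fall back to "en".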

8637 of 24860 branches covered (34.74%)

Branch coverage included in aggregate %.

18392 of 45192 relevant lines covered (40.7%)

4.94 hits per line
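As a worked check, assuming Coveralls pools covered branches and covered lines into a single ratio (as the note "Branch coverage included in aggregate %" indicates), the counts above reproduce the headline 38.584%:

```latex
\text{coverage}
  = \frac{\text{covered lines} + \text{covered branches}}{\text{relevant lines} + \text{branches}}
  = \frac{18392 + 8637}{45192 + 24860}
  = \frac{27029}{70052}
  \approx 38.584\%
```

Because this is a pooled ratio, the aggregate necessarily falls between the line-only figure (40.7%) and the branch-only figure (34.74%).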

Source files on bugfix/fix-671: 325 listed, 2 changed (source changed: 0, coverage changed: 2)

Recent builds

Build | Branch | Commit | Type | Ran | Committer | Via | Coverage
27984234598 | bugfix/fix-671 | fix: detect document language for segmentation training data #671 | Pull #1460 | 22 Jun 2026 09:23PM UTC | lfoppiano | github | 38.58%
27977366707 | bugfix/fix-671 | fix: detect document language for segmentation training data #671 | Pull #1460 | 22 Jun 2026 07:24PM UTC | lfoppiano | github | 38.58%
© 2026 Coveralls, Inc