grobidOrg / grobid
Coverage on master: 39%

LAST BUILD BRANCH: copilot/possibility-to-list-ongoing-trainings
DEFAULT BRANCH: master
Repo Added 19 Jan 2026 12:58PM UTC
Last build: 427
Files: 328
Last build on branch: copilot/fix-tokenizer-language-sensitivity

28 Jun 2026 05:57AM UTC: coverage 38.634% (+0.004% from 38.63%)
Build 28313157230

Pull #1480 · via github · committer: web-flow
Make tokenizer language-sensitive in PDFALTOSaxHandler

- Add Language field initialized from doc.getLanguage() in constructor
- Add setLanguage()/getLanguage() methods for external configuration
- Pass language to analyzer.tokenize() call instead of using default
- Add tests for language initialization and configuration
- Fixes #376
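The commit message describes a common pattern: the handler takes its language from the document in the constructor, exposes a setter/getter so callers can override it, and forwards it to the tokenizer instead of relying on a default. The Java sketch below illustrates that pattern only. It does not use GROBID's real classes: `Document`, `Analyzer`, `SaxHandlerSketch` and the toy French apostrophe rule in `toyTokenize` are hypothetical stand-ins, the language is modeled as a plain `String` code rather than the `Language` type the commit mentions, and the actual `PDFALTOSaxHandler` and `analyzer.tokenize()` signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the pattern described in PR #1480; none of these types are GROBID's real API.
class LanguageSensitiveHandlerSketch {

    // Hypothetical stand-in for a parsed document carrying a detected language code.
    static final class Document {
        private final String language;

        Document(String language) {
            this.language = language;
        }

        String getLanguage() {
            return language;
        }
    }

    // Hypothetical tokenizer contract: the language code selects the tokenization rules.
    interface Analyzer {
        List<String> tokenize(String text, String language);
    }

    // Hypothetical handler: the language defaults to the document's and can be overridden.
    static final class SaxHandlerSketch {
        private final Analyzer analyzer;
        private String language;

        SaxHandlerSketch(Document doc, Analyzer analyzer) {
            this.analyzer = analyzer;
            this.language = doc.getLanguage(); // initialized from the document in the constructor
        }

        void setLanguage(String language) {
            this.language = language;
        }

        String getLanguage() {
            return language;
        }

        // Forwards the stored language instead of letting the analyzer fall back to a default.
        List<String> tokenizeText(String text) {
            return analyzer.tokenize(text, language);
        }
    }

    // Toy analyzer (hypothetical rule): only for "fr" are elided forms like "l'article" split into "l'" + "article".
    static List<String> toyTokenize(String text, String language) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // skip the single empty token produced by blank input
            }
            int apostrophe = word.indexOf('\'');
            if ("fr".equals(language) && apostrophe > 0 && apostrophe < word.length() - 1) {
                tokens.add(word.substring(0, apostrophe + 1));
                tokens.add(word.substring(apostrophe + 1));
            } else {
                tokens.add(word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        SaxHandlerSketch handler =
                new SaxHandlerSketch(new Document("fr"), LanguageSensitiveHandlerSketch::toyTokenize);
        System.out.println(handler.tokenizeText("l'article")); // [l', article]
        handler.setLanguage("en");
        System.out.println(handler.tokenizeText("l'article")); // [l'article]
    }
}
```

The same input tokenizes differently once the language changes, which is the behavior a language-sensitive handler is meant to enable; with a hard-coded default language, both calls would return the same tokens.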

8677 of 24982 branches covered (34.73%)

Branch coverage included in aggregate %.

18494 of 45348 relevant lines covered (40.78%)

1.65 hits per line
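The three percentages on this page are consistent with branches and lines being pooled into one numerator and one denominator when branch coverage is included in the aggregate. Assuming that formula, (covered lines + covered branches) / (relevant lines + total branches), the Java sketch below recomputes the reported figures from the raw counts; the class and method names are illustrative, not part of any Coveralls API.

```java
import java.util.Locale;

// Sketch: recomputes this page's coverage percentages from its raw counts.
class CoverageArithmetic {

    // Percentage of covered items out of a total.
    static double percent(long covered, long total) {
        return 100.0 * covered / total;
    }

    // Assumed aggregate formula: lines and branches pooled into one numerator and denominator.
    static double aggregate(long coveredLines, long relevantLines, long coveredBranches, long totalBranches) {
        return percent(coveredLines + coveredBranches, relevantLines + totalBranches);
    }

    public static void main(String[] args) {
        long coveredBranches = 8677, totalBranches = 24982;
        long coveredLines = 18494, relevantLines = 45348;

        System.out.printf(Locale.ROOT, "branches:  %.2f%%%n", percent(coveredBranches, totalBranches));
        System.out.printf(Locale.ROOT, "lines:     %.2f%%%n", percent(coveredLines, relevantLines));
        System.out.printf(Locale.ROOT, "aggregate: %.3f%%%n",
                aggregate(coveredLines, relevantLines, coveredBranches, totalBranches));
    }
}
```

Running it prints 34.73%, 40.78% and 38.634%: 27171 covered items out of 70330 reproduces the headline figure, which is why the aggregate sits between the branch and line percentages.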

Source files on copilot/fix-tokenizer-language-sensitivity: 328 in total, 2 changed (0 with source changes, 2 with coverage changes)

Recent builds

Build 28313157230 · branch copilot/fix-tokenizer-language-sensitivity · Pull #1480 · ran 30 Jun 2026 08:19AM UTC · committer web-flow via github · coverage 38.63%
Commit: Make tokenizer language-sensitive in PDFALTOSaxHandler - Add Language field initialized from doc.getLanguage() in constructor - Add setLanguage()/getLanguage() methods for external configuration - Pass language to analyzer.tokenize() call instead...

© 2026 Coveralls, Inc