bramp / build-along / 19850699447
89%

Build:
DEFAULT BRANCH: main
Ran 02 Dec 2025 07:30AM UTC
Jobs 1
Files 149
Run time 1min

02 Dec 2025 07:27AM UTC coverage: 90.421% (-0.1%) from 90.525%
19850699447

push

github

bramp
feat: Implement rule-based classifier system and refactor key classifiers

Introduces a flexible, rule-based architecture for PDF element classification,
significantly reducing boilerplate and improving maintainability.

Key changes include:
- **New `RuleBasedClassifier` base class**: Provides a declarative way to define
  classifier logic using a list of `Rule` objects.
- **Generic and specific `Rule` implementations**:
  - `Filter` base class for pass/fail rules (`IsInstanceFilter`, `InBottomBandFilter`).
  - Scoring rules like `RegexMatch`, `FontSizeMatch`, `CornerDistanceScore`,
    `PageNumberValueMatch`.
  - Aggregation rules like `MaxScoreRule` for combining multiple scoring criteria.
  - Domain-specific rules: `PageNumberTextRule`, `PartCountTextRule`,
    `PartNumberTextRule`, `BagNumberTextRule`, `TopLeftPositionScore`,
    `BagNumberFontSizeRule`, and `StepNumberTextRule`.
- **Refactored Classifiers**: Converted `PageNumberClassifier`,
  `PartCountClassifier`, `StepNumberClassifier`, `PartNumberClassifier`, and
  `BagNumberClassifier` to use the new rule-based system.
- **Centralized Configuration**: `min_score` and weights for `BagNumberClassifier`
  are now configurable via `BagNumberConfig`.
- **Improved Test Fixtures**: Updated `conftest.py` to support the new `RuleScore`.
- **Golden File Updates**: Golden test fixtures were regenerated to match the
  new (and in some cases, improved) classification behavior. Notably, some pages
  previously misclassified as "instruction" are now correctly identified as "catalog"
  due to enhanced `StepNumberClassifier` logic.
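
The rule-based design described in the list above can be sketched as follows. This is a minimal illustration, not the code from this commit: the class names `Filter`, `RegexMatch`, `FontSizeMatch`, and `RuleBasedClassifier` come from the commit message, but every signature, the `Element` type, `NonEmptyTextFilter`, `PageNumberSketch`, and the weighted-average scoring are assumptions made for the example.

```python
from __future__ import annotations

import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for an extracted PDF text element."""
    text: str
    font_size: float


class Rule(ABC):
    """Scores an element in [0, 1], or returns None to reject it outright."""
    weight: float = 1.0

    @abstractmethod
    def evaluate(self, element: Element) -> float | None: ...


class Filter(Rule):
    """Pass/fail rule: carries no weight, but can veto a candidate."""
    weight = 0.0

    @abstractmethod
    def passes(self, element: Element) -> bool: ...

    def evaluate(self, element: Element) -> float | None:
        return 1.0 if self.passes(element) else None


class NonEmptyTextFilter(Filter):
    """Hypothetical filter that rejects whitespace-only elements."""
    def passes(self, element: Element) -> bool:
        return bool(element.text.strip())


class RegexMatch(Rule):
    """Scores 1.0 when the whole (stripped) text matches the pattern."""
    def __init__(self, pattern: str, weight: float = 1.0) -> None:
        self.pattern = re.compile(pattern)
        self.weight = weight

    def evaluate(self, element: Element) -> float | None:
        return 1.0 if self.pattern.fullmatch(element.text.strip()) else 0.0


class FontSizeMatch(Rule):
    """Score falls linearly from 1.0 at `target` to 0.0 at `tolerance` away."""
    def __init__(self, target: float, tolerance: float, weight: float = 1.0) -> None:
        self.target, self.tolerance, self.weight = target, tolerance, weight

    def evaluate(self, element: Element) -> float | None:
        diff = abs(element.font_size - self.target)
        return max(0.0, 1.0 - diff / self.tolerance)


class RuleBasedClassifier:
    """Subclasses declare `rules` and `min_score`; the scoring loop is shared."""
    rules: list[Rule] = []
    min_score: float = 0.5

    def score(self, element: Element) -> float | None:
        total = weight_sum = 0.0
        for rule in self.rules:
            result = rule.evaluate(element)
            if result is None:  # a rule vetoed this candidate
                return None
            total += rule.weight * result
            weight_sum += rule.weight
        return total / weight_sum if weight_sum else 0.0

    def accepts(self, element: Element) -> bool:
        score = self.score(element)
        return score is not None and score >= self.min_score


class PageNumberSketch(RuleBasedClassifier):
    """Hypothetical classifier defined purely by its rule list."""
    rules = [
        NonEmptyTextFilter(),
        RegexMatch(r"\d{1,3}", weight=0.7),
        FontSizeMatch(target=8.0, tolerance=4.0, weight=0.3),
    ]
    min_score = 0.6


classifier = PageNumberSketch()
print(classifier.accepts(Element("12", 8.0)))      # True
print(classifier.accepts(Element("Step 3", 8.0)))  # False
```

In this sketch, adding a heuristic means appending one `Rule` to the list, which is the kind of boilerplate reduction the commit message describes.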

This refactoring makes classification heuristics easier to add and modify, and
the declarative rule lists make the system more robust and easier to debug.
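
The centralized configuration mentioned above (a `min_score` and weights held in `BagNumberConfig`) might look roughly like the sketch below. The field names, default values, and the normalized weighted sum are all assumptions chosen for illustration; the commit message only states that `min_score` and the weights are configurable.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BagNumberConfigSketch:
    """Hypothetical config; field names and defaults are illustrative only."""
    min_score: float = 0.6
    text_weight: float = 0.5
    position_weight: float = 0.3
    font_size_weight: float = 0.2


def combined_score(
    config: BagNumberConfigSketch,
    text_score: float,
    position_score: float,
    font_size_score: float,
) -> float:
    """Weighted average of the per-rule scores, normalized by total weight."""
    total_weight = (
        config.text_weight + config.position_weight + config.font_size_weight
    )
    if total_weight == 0:
        return 0.0
    weighted = (
        config.text_weight * text_score
        + config.position_weight * position_score
        + config.font_size_weight * font_size_score
    )
    return weighted / total_weight


def is_bag_number(config: BagNumberConfigSketch, *scores: float) -> bool:
    """Accept a candidate when its combined score reaches the threshold."""
    return combined_score(config, *scores) >= config.min_score


default = BagNumberConfigSketch()
lenient = replace(default, min_score=0.4)  # tune the threshold without code changes

print(is_bag_number(default, 1.0, 0.0, 0.0))  # text match alone is not enough
print(is_bag_number(lenient, 1.0, 0.0, 0.0))  # but passes a lower threshold
```

Keeping thresholds and weights in one immutable object lets tests and callers vary them without touching the rules themselves.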

341 of 365 new or added lines in 12 files covered. (93.42%)

11 existing lines in 2 files now uncovered.

10298 of 11389 relevant lines covered (90.42%)

0.9 hits per line
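
The percentages above follow directly from the line counts. The short check below reproduces them; the helper name `coverage_percent` is made up for this example and is not part of any Coveralls API.

```python
def coverage_percent(covered: int, relevant: int) -> float:
    """Return covered/relevant as a percentage rounded to two decimals."""
    if relevant <= 0:
        raise ValueError("relevant must be positive")
    return round(100 * covered / relevant, 2)


# Figures taken from the build report.
diff_coverage = coverage_percent(341, 365)         # new or added lines
total_coverage = coverage_percent(10298, 11389)    # all relevant lines
missed_in_diff = 365 - 341                         # new lines left uncovered

print(diff_coverage)   # 93.42
print(total_coverage)  # 90.42
print(missed_in_diff)  # 24
```

The 24 missed lines match the per-file breakdown in the report (5 in `rule_based_classifier.py` plus 19 in `rules.py`).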

New Missed Lines in Diff

| Lines | Coverage | ∆ | File |
|-------|----------|---|------|
| 5 | 90.74 | | src/build_a_long/pdf_extract/classifier/rule_based_classifier.py |
| 19 | 91.0 | | src/build_a_long/pdf_extract/classifier/rules.py |

Uncovered Existing Lines

| Lines | Coverage | ∆ | File |
|-------|----------|---|------|
| 1 | 85.37 | -2.44% | src/build_a_long/pdf_extract/classifier/label_classifier.py |
| 10 | 88.37 | -1.1% | src/build_a_long/pdf_extract/classifier/steps/diagram_classifier.py |

Jobs

| ID | Job ID | Ran | Files | Coverage |
|----|--------|-----|-------|----------|
| 1 | 19850699447.1 | 02 Dec 2025 07:30AM UTC | 149 | 90.42 |

GitHub Action Run
  • Back to Repo
  • Github Actions Build #19850699447
  • b3cecc78 on github
  • Prev Build on main (#19843950690)
  • Next Build on main (#19995046189)