
MITLibraries / transmogrifier / 20113036624
99%
main: 99%

Build:
LAST BUILD BRANCH: v3.8
DEFAULT BRANCH: main
Ran 10 Dec 2025 08:58PM UTC
Jobs 1
Files 20
Run time 1min

10 Dec 2025 08:53PM UTC coverage: 98.789% (-0.04%) from 98.831%
20113036624

Pull #266

github

ghukill
Parse full HTML from mitlibwebsite source records

Why these changes are being introduced:

Now that browsertrix-harvester is including full HTML + response headers in the
source record available to Transmogrifier, we can do two things:

1. Parse metadata for mitlibwebsite TIMDEX records from the original, full HTML
in a more opinionated fashion than we could in browsertrix-harvester.

2. Extract good, meaningful full-text from the full HTML to use for the new
`fulltext` field.

How this addresses that need:

Expects a new `html_base64` field in the browsertrix-harvester source records.
Uses this to extract metadata and full-text for the record.
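As a rough illustration of the approach described above, the sketch below decodes an `html_base64` value and pulls a title and visible text out of the HTML using only the Python standard library. It is not the Transmogrifier implementation: `parse_source_record` and `_TextExtractor` are hypothetical names, UTF-8 encoding is assumed, and the real transformer may use a different parser and different extraction rules.

```python
import base64
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects <title> and visible text, skipping script/style."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style bodies
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def parse_source_record(source_record):
    """Decode `html_base64` (assumed UTF-8) and return a title plus full-text."""
    html = base64.b64decode(source_record["html_base64"]).decode("utf-8")
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "fulltext": " ".join(parser.chunks)}
```

Keeping the decode step separate from the parsing step means a missing or malformed `html_base64` value fails early, before any metadata is derived from it.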

Side effects of this change:
* Full-text is now available in the TIMDEX record for the mitlibwebsite
source.
* If needed, this HTML parsing could be used to extract more granular,
source-specific metadata in the future.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-259
Pull Request #266: USE 259 - parse HTML for mitlibwebsite source

23 of 24 new or added lines in 1 file covered. (95.83%)

1795 of 1817 relevant lines covered (98.79%)
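The percentages above follow from dividing covered lines by relevant lines. A minimal sketch of that arithmetic, assuming plain rounding to two decimal places (the `coverage_percent` helper is hypothetical and not part of Coveralls):

```python
def coverage_percent(covered, relevant, digits=2):
    """Return covered/relevant as a percentage rounded to `digits` places."""
    if relevant == 0:
        raise ValueError("no relevant lines to measure")
    return round(100 * covered / relevant, digits)


# Figures taken from this build report.
diff_coverage = coverage_percent(23, 24)        # new or added lines in the diff
total_coverage = coverage_percent(1795, 1817)   # all relevant lines in the build
```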

0.99 hits per line

New Missed Lines in Diff

Lines  Coverage  ∆       File
1      96.77     -0.79%  transmogrifier/sources/json/mitlibwebsite.py
Jobs
ID  Job ID         Ran                      Files  Coverage
1   20113036624.1  10 Dec 2025 08:58PM UTC  20     98.79
GitHub Action Run
  • Back to Repo
  • Github Actions Build #20113036624
  • Pull Request #266
  • PR Base - main (#20043212059)