MITLibraries / transmogrifier
Coverage on main: 99%

LAST BUILD BRANCH: v2.6
DEFAULT BRANCH: main
Repo Added 30 Mar 2022 05:29PM UTC
Files 19

LAST BUILD ON BRANCH TIMX-406-add-provenance-data

16 Jan 2025 09:05PM UTC coverage: 98.765% (+0.008%) from 98.757%
Build 12817776883 · push via GitHub · committer: ghukill

Add TIMDEX provenance object to transformed records

Why these changes are being introduced:

Transitioning TIMDEX ETL to a parquet dataset architecture makes
additional data available for each transformed record, stored in that
record's row in the dataset. But this data is only helpful if the record
you encounter in OpenSearch can be tied to its row in the dataset.

Related to, but not dependent on, the parquet dataset change was the
desire for more information about each record in TIMDEX, e.g., when it
was transformed and indexed.

This information can be thought of as "provenance" for the TIMDEX record
as encountered in OpenSearch and/or the TIMDEX API.

How this addresses that need:

A new "timdex_provenance" field is added to the TIMDEX data model to
capture information about the record's origins. For the parquet dataset,
this provenance data includes fields like "run_id" and "run_record_offset",
which pinpoint the record's row in the dataset. With this linkage, the
original source record for any transformed record can be retrieved quickly.

In addition to supporting random-access reads of the dataset, this
provenance data provides metadata about the TIMDEX record that is
immediately informative, like "run_date".
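A minimal sketch of how the new field could tie an indexed record back to its dataset row. Only `timdex_provenance`, `run_id`, `run_record_offset`, and `run_date` come from this commit; the `TimdexProvenance` class, the helper functions, the value types, and the `source_record` column are hypothetical, and an in-memory list of dicts stands in for the parquet dataset:

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TimdexProvenance:
    """Hypothetical shape of the "timdex_provenance" field.

    Field names come from the commit message; the types are assumptions.
    """

    run_date: str  # assumed ISO 8601 date string
    run_id: str
    run_record_offset: int


def build_row_index(rows):
    """Index dataset rows by (run_id, run_record_offset).

    `rows` is a hypothetical stand-in for rows read from the parquet dataset.
    """
    return {(row["run_id"], row["run_record_offset"]): row for row in rows}


def find_source_record(index, provenance):
    """Return the source record for a transformed record, or None if absent."""
    row = index.get((provenance.run_id, provenance.run_record_offset))
    return None if row is None else row["source_record"]


# Hypothetical dataset rows; "source_record" is an illustrative column name.
rows = [
    {"run_id": "run-a", "run_record_offset": 0, "source_record": "<record>A</record>"},
    {"run_id": "run-a", "run_record_offset": 1, "source_record": "<record>B</record>"},
]

# A transformed record carrying the new field (all other fields omitted).
transformed = {
    "timdex_provenance": asdict(
        TimdexProvenance(run_date="2025-01-16", run_id="run-a", run_record_offset=1)
    )
}

provenance = TimdexProvenance(**transformed["timdex_provenance"])
index = build_row_index(rows)
print(find_source_record(index, provenance))  # <record>B</record>
```

Keying on the (run_id, run_record_offset) pair keeps each lookup constant-time, in the spirit of the random-access reads described above; against a real parquet dataset, the same pair would more likely be applied as a filter during the read rather than by building an in-memory index.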

Side effects of this change:
* None, really. TIM will need to be updated to include this new field
in the OpenSearch mapping; until then, it is simply extra data in the
transformed record.
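For illustration only, the TIM change noted under "Side effects" might amount to adding a fragment like the following to the index mapping. This is a hypothetical sketch, not TIM's actual mapping: the field types are guesses, and how TIM applies mapping changes is not described in this commit:

```python
import json

# Hypothetical OpenSearch mapping fragment for the new field. The field
# names come from the commit message; the types chosen here are assumptions,
# not taken from TIM or the TIMDEX JSON schema.
TIMDEX_PROVENANCE_MAPPING = {
    "properties": {
        "timdex_provenance": {
            "type": "object",
            "properties": {
                "run_date": {"type": "date"},
                "run_id": {"type": "keyword"},
                "run_record_offset": {"type": "integer"},
            },
        }
    }
}

# Serialize to the JSON body an OpenSearch "put mapping" request would carry.
print(json.dumps(TIMDEX_PROVENANCE_MAPPING, indent=2))
```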

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-406

12 of 12 new or added lines in 2 files covered. (100.0%)

1760 of 1782 relevant lines covered (98.77%)

0.99 hits per line
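The 98.765% in the build summary and the 98.77% figure are the same ratio of covered to relevant lines shown at different precisions; a quick check of the arithmetic:

```python
# Coverage figures reported for this build.
covered, relevant = 1760, 1782
new_covered, new_total = 12, 12

pct = covered / relevant * 100  # 98.7654...

print(f"{pct:.3f}%")  # 98.765%
print(f"{pct:.2f}%")  # 98.77%
print(f"{new_covered / new_total * 100:.1f}%")  # 100.0%
```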

Recent builds
Build 12817776883 · TIMX-406-add-provenance-data · push · ran 16 Jan 2025 09:19PM UTC · ghukill via GitHub · coverage 98.77%
Commit: Add TIMDEX provenance object to transformed records
Total builds: 691