
MITLibraries / transmogrifier
main: 99%

LAST BUILD BRANCH: v2.6
DEFAULT BRANCH: main
Repo Added 30 Mar 2022 05:29PM UTC
Files 19

LAST BUILD ON BRANCH TIMX-403-inputs-support-parquet-writing

19 Nov 2024 08:21PM UTC coverage: 99.464% (+0.004%) from 99.46%
11921190435

push

github

ghukill
Add run_id CLI argument and Transformer attribute

Why these changes are being introduced:

As we move into Transmogrifier writing to a parquet dataset, one
important bit of information it will need is the concept of a
"run id".  This correlates directly to an "Execution UUID" that
every StepFunction invocation produces.  This identifier is then
used when writing the records to the parquet dataset, allowing
for quick and easy access to records associated with that identifier.

There is a small many-to-one relationship that makes naming a bit
awkward: each StepFunction invocation may run Transmogrifier multiple
times (e.g. multiple input files).  Each time it invokes Transmogrifier,
the same "run_id" would be passed.  This effectively groups the outputs
of all Transmogrifier invocations in the same location in the parquet
dataset.  The language of this new "run_id" in Transmogrifier is
intentionally somewhat high level, indicating it's just an identifier
to associate with that invocation of the run.
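The many-to-one relationship described above can be sketched in a few lines. This is only an illustration: `transform_file` and `group_by_run_id` are hypothetical helpers, the file names are made up, and the grouping is done in memory rather than in a parquet dataset, whose actual layout is not described here.

```python
import uuid
from collections import defaultdict


def transform_file(input_file, run_id):
    """Hypothetical stand-in for one Transmogrifier invocation on one input file."""
    # Each output record carries the run id it was produced under.
    return [
        {"run_id": run_id, "source_file": input_file, "record_id": f"{input_file}:{i}"}
        for i in range(2)
    ]


def group_by_run_id(records):
    """Group records by run id, mimicking a lookup of all records for one run."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["run_id"]].append(record)
    return dict(grouped)


# Stands in for the "Execution UUID" of a single StepFunction invocation.
execution_uuid = str(uuid.uuid4())

# One StepFunction invocation, several Transmogrifier invocations, same run id.
records = []
for input_file in ["input-a.xml", "input-b.xml"]:
    records.extend(transform_file(input_file, execution_uuid))

grouped = group_by_run_id(records)
print(len(grouped), len(grouped[execution_uuid]))
```

Because both invocations receive the same identifier, all four records end up under a single key.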

How this addresses that need:
* Adds new CLI argument -r / --run-id
* Transformer gets new attribute 'run_id'
* Transformer mints a UUID if a run id is not passed, making
the change backwards compatible and inconsequential for callers
that do not supply one
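A minimal sketch of the behaviour listed above, assuming a simplified `Transformer` class and using the standard library's `argparse` so the example is self-contained. The project's actual CLI code is not shown in this commit message; the class body, `build_parser`, and `main` here are illustrative assumptions, and only the `-r` / `--run-id` flag and the `run_id` attribute come from the text.

```python
import argparse
import uuid
from typing import List, Optional


class Transformer:
    """Hypothetical, heavily simplified stand-in for the Transformer class."""

    def __init__(self, run_id: Optional[str] = None) -> None:
        # Mint a UUID when no run id is passed, so existing callers keep working.
        self.run_id = run_id or str(uuid.uuid4())


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the run id option")
    parser.add_argument(
        "-r",
        "--run-id",
        default=None,
        help="Identifier shared by all invocations that belong to one run",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> Transformer:
    args = build_parser().parse_args(argv)
    return Transformer(run_id=args.run_id)


if __name__ == "__main__":
    print(main().run_id)
```

Calling `main(["-r", "abc-123"])` keeps the supplied value, while `main([])` falls back to a freshly minted UUID.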

Side effects of this change:
* Going forward, invocations of Transmogrifier can use the run id
as part of the parquet record writing.  Until that writing is
implemented, the run id has no effect.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-403

13 of 13 new or added lines in 2 files covered. (100.0%)

1669 of 1678 relevant lines covered (99.46%)

0.99 hits per line
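The coverage figure follows directly from the two line counts reported above. A small check, with `coverage_percent` as a hypothetical helper name:

```python
def coverage_percent(covered: int, relevant: int) -> float:
    """Return covered lines as a percentage of relevant lines."""
    return 100.0 * covered / relevant


pct = coverage_percent(1669, 1678)
print(f"{pct:.3f}%")  # 99.464%
print(f"{pct:.2f}%")  # 99.46%
```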


Recent builds

Build: 11921190435
Branch: TIMX-403-inputs-support-parquet-writing
Commit: Add run_id CLI argument and Transformer attribute
Type: push
Ran: 19 Nov 2024 08:27PM UTC
Committer: ghukill
Via: github
Coverage: 99.46