MITLibraries / geo-harvester
98%
main: 100%

LAST BUILD BRANCH: IN-1246-pip-audit
DEFAULT BRANCH: main
Repo Added 21 Dec 2023 09:08PM UTC
Files 26

LAST BUILD ON BRANCH GDT-241-post-normalize-data-cleanup

26 Mar 2024 02:56PM UTC coverage: 98.369% (+0.01%) from 98.358%
Build: 8438177369
Type: push
Via: github
Committer: ghukill
Post normalize data quality hooks

Why these changes are being introduced:

It was discovered that some TIMDEX records had empty strings in the 'subjects' field that originated
from this harvester.  While this could be addressed whack-a-mole style in each metadata format's
normalization logic, a more holistic approach is to perform data cleanup after normalization
has taken place, removing values that should never end up in the final MITAardvark record.

How this addresses that need:
* adds post normalization data cleanup method _remove_none_and_blank_strings()
* adds post normalization data cleanup method _dedupe_list_fields()
* both methods are run for all fields and values, for all source record normalizations

Side effects of this change:
* None values and empty strings removed from final MITAardvark record

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-241
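The two cleanup steps described in the commit message can be illustrated with a minimal Python sketch. This is not the harvester's actual code: the standalone functions `remove_none_and_blank_strings` and `dedupe_list_fields`, the helper `_is_blank`, and the sample field names are hypothetical and only mirror the behaviour the message describes. The sketch assumes a record is a flat dict of scalars and lists, that whitespace-only strings count as blank, that blank scalar fields are dropped entirely, and that list values are hashable.

```python
from typing import Any


def _is_blank(value: Any) -> bool:
    """True for None and for empty or whitespace-only strings (assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def remove_none_and_blank_strings(record: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical cleanup: drop blank scalars and blank list members."""
    cleaned: dict[str, Any] = {}
    for field, value in record.items():
        if isinstance(value, list):
            # Keep the field, but filter blank members out of the list.
            cleaned[field] = [item for item in value if not _is_blank(item)]
        elif not _is_blank(value):
            cleaned[field] = value
    return cleaned


def dedupe_list_fields(record: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical cleanup: remove duplicate list members, keeping first-seen order."""
    return {
        # dict.fromkeys preserves insertion order; assumes hashable members.
        field: list(dict.fromkeys(value)) if isinstance(value, list) else value
        for field, value in record.items()
    }


# Illustrative record; field names are examples only.
record = {
    "title": "Example layer",
    "subjects": ["Geology", "", None, "Geology", "Maps"],
    "description": [""],
    "identifier": None,
}
cleaned = dedupe_list_fields(remove_none_and_blank_strings(record))
print(cleaned)
```

Running blank removal before deduplication means the dedupe step never has to consider `None` or empty strings as candidate values.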

20 of 20 new or added lines in 1 file covered. (100.0%)

1870 of 1901 relevant lines covered (98.37%)

0.98 hits per line

Source Files on GDT-241-post-normalize-data-cleanup
  • Tree
  • List 29
  • Changed 1
  • Source Changed 0
  • Coverage Changed 1

Recent builds

Build: 8438177369
Branch: GDT-241-post-normalize-data-cleanup
Commit: Post normalize data quality hooks Why these changes are being introduced: It was discovered that some TIMDEX records had empty strings for 'subjects' field that originated from this harvester. While it could be addressed whack-a-mole style at t...
Type: push
Ran: 26 Mar 2024 03:01PM UTC
Committer: ghukill
Via: github
Coverage: 98.37%
See All Builds (389)