12907877087
52%
master: 52%

Ran 22 Jan 2025 12:22PM UTC

Jobs 1

Files 160

Run time 3min

Committed 22 Jan 2025 10:50AM UTC coverage: 51.904% (-0.02%) from 51.927%

Build # 12907877087

Build Type

push

github

Committed by

pondzix

Commit Message

Use full type name when checking max schema key

Scenario:

* The input batch contains data using a schema, say link_click
* The link_click schema is used both as a context AND as a self-describing event
* We have multiple versions of the link_click schema

Translated into the content of the shredding_complete JSON message, this produces lines like these:
```
      "types": [
        {
          "schemaKey": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-0",
          "snowplowEntity": "SELF_DESCRIBING_EVENT"
        },
        ....
        {
          "schemaKey": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1",
          "snowplowEntity": "CONTEXT"
        }
      ]
```

In this scenario, it looks like we skip the necessary warehouse migration for the self-describing event column. We only execute the migration for the context:

```
INFO Migration: Migrating contexts_com_snowplowanalytics_snowplow_link_click_1 AddColumn(Fragment("ALTER TABLE atomic.events ADD COLUMN contexts_com_snowplowanalytics_snowplow_link_click_1 ARRAY"),List()) (pre-transaction)
```
but never for unstruct_event_com_snowplowanalytics_snowplow_link_click_1, which results in an error when inserting data into the table:

```
ERROR Error executing transaction. Sleeping for 30 seconds for the first time
net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error: error line 1 at position 1,779
invalid identifier 'UNSTRUCT_EVENT_COM_SNOWPLOWANALYTICS_SNOWPLOW_LINK_CLICK_1'
```
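To make the two column names above easier to relate to the `types` entries, here is a minimal Python sketch of how a column name could be derived from a `schemaKey` and a `snowplowEntity`. The `column_name` function and the prefix table are hypothetical illustrations, not the loader's actual code, and the sketch ignores any further normalization the loader may apply to vendor or schema names.

```python
import re

# Hypothetical mapping from entity kind to column prefix, inferred from the
# two column names that appear in the logs above.
PREFIXES = {
    "SELF_DESCRIBING_EVENT": "unstruct_event",
    "CONTEXT": "contexts",
}

# iglu:<vendor>/<name>/<format>/<model>-<revision>-<addition>
SCHEMA_KEY = re.compile(r"iglu:([^/]+)/([^/]+)/([^/]+)/(\d+)-(\d+)-(\d+)")


def column_name(schema_key, entity):
    """Build a column name from an Iglu schema key and an entity kind."""
    match = SCHEMA_KEY.fullmatch(schema_key)
    if match is None:
        raise ValueError(f"not a valid schema key: {schema_key}")
    vendor, name, _format, model = match.group(1, 2, 3, 4)
    # Only the model (major version) is part of the column name, so 1-0-0 and
    # 1-0-1 map to the same column for a given entity kind.
    return f"{PREFIXES[entity]}_{vendor.replace('.', '_')}_{name}_{model}"


key = "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-0"
print(column_name(key, "SELF_DESCRIBING_EVENT"))
# unstruct_event_com_snowplowanalytics_snowplow_link_click_1
print(column_name(key, "CONTEXT"))
# contexts_com_snowplowanalytics_snowplow_link_click_1
```

The same schema therefore maps to two distinct columns depending on the entity kind, and each column needs its own migration.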

The skipped migration seems to be caused by this [line](https://github.com/snowplow/snowplow-rdb-loader/blob/fffcbe460/modules/loader/src/main/scala/com/snowplowanalytics/snowplow/rdbloader/discovery/DataDiscovery.scala#L213), where we group incoming types by name and find the max schema key per group. But the name doesn’t contain the unstruct/context prefix, so it’s possible to “lose” a type when a schema is used as both a context and an unstruct event and one of the two uses an older version.
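The effect of the grouping key can be reproduced with a small Python sketch. This is not the loader's Scala code: `max_per_group`, `schema_name`, and `version` are hypothetical helpers, and keying on the entity kind plus the name is only one reading of "full type name" from the commit title.

```python
def schema_name(schema_key):
    """Return the schema name, e.g. 'link_click'."""
    return schema_key.split("/")[1]


def version(schema_key):
    """Return the version as a comparable tuple, e.g. (1, 0, 1)."""
    return tuple(int(part) for part in schema_key.rsplit("/", 1)[1].split("-"))


def max_per_group(types, group_key):
    """Keep only the entry with the highest schema version in each group."""
    best = {}
    for entry in types:
        key = group_key(entry)
        current = best.get(key)
        if current is None or version(entry["schemaKey"]) > version(current["schemaKey"]):
            best[key] = entry
    return list(best.values())


types = [
    {"schemaKey": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-0",
     "snowplowEntity": "SELF_DESCRIBING_EVENT"},
    {"schemaKey": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1",
     "snowplowEntity": "CONTEXT"},
]

# Grouping by bare name: both entries share the group 'link_click', so the
# older self-describing event entry is dropped in favour of the 1-0-1 context.
by_name = max_per_group(types, lambda t: schema_name(t["schemaKey"]))

# Grouping by entity kind plus name: each kind keeps its own max version.
by_full_name = max_per_group(
    types, lambda t: (t["snowplowEntity"], schema_name(t["schemaKey"]))
)

print([t["snowplowEntity"] for t in by_name])
# ['CONTEXT']
print([t["snowplowEntity"] for t in by_full_name])
# ['SELF_DESCRIBING_EVENT', 'CONTEXT']
```

With the bare name as the key, only the context survives, which matches the logs: a migration for the contexts column and none for the unstruct_event column.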

I think the solution here cou... (continued)

Run Details

1 of 1 new or added line in 1 file covered. (100.0%)

2 existing lines in 2 files now uncovered.

2317 of 4464 relevant lines covered (51.9%)

0.83 hits per line

Jobs

ID	Job ID	Ran	Files	Coverage
1	12907877087.1	22 Jan 2025 12:22PM UTC	160	51.9	GitHub Action Run

snowplow / snowplow-rdb-loader / 12907877087