[BUG] Satellite inserting duplicates when granularity of source table and target satellite differ

**Describe the bug**

We are currently facing the issue that in satellites where we deviate from a direct 1:1 mapping of source table columns to satellite columns, the sat_v0 macro will keep inserting on incremental loads the same records with every load even if there is no change in the hashdiff.


**Environment**

- dbt version: core 1.11.2
- datavault4dbt version:  1.12.2
- Database/Platform: Snowflake

**To Reproduce**

Example source table:

| product_hk | payload_col_sat1 | payload_col_sat2 | ldts | 
|----------|----------|----------|----------|
| A  | Value B  | Value C  | 22-02-26 09:00:14:00 | 
| A  | Value B  | Value F  | 22-02-26 09:00:14:00 | 
| A  | Value B  | Value G  | 22-02-26 09:00:14:00 | 

This is a simplified example input where the granularity of the source table doesn't match the granularity of the target satellite (in this case the problem is in sat1). The HK and HD and ldts for all three records in this example will be the same on every load.

The behavior happens because the deduplication step in the macro has the logic for incremental loads to assign a row number that will later be used in the WHERE NOT EXISTS clause to only compare the latest record per HK of the batch with the target satellite.

The problem is that this row number is not aligned with the ordering of the deduplication function since on similar ldts the ordering of the two window functions will be non-deterministic.



    SELECT
    {{ parent_hashkey }},
    {{ ns.hdiff_alias }},
    {{- "\n\n    " ~ datavault4dbt.print_list(datavault4dbt.escape_column_names(source_cols)) if source_cols else " *" }}
    {% if is_incremental() -%}
    , ROW_NUMBER() OVER(PARTITION BY {{ parent_hashkey }} ORDER BY {{ src_ldts }}) as rn
    {%- endif %}
    FROM source_data
    QUALIFY
        CASE
            WHEN {{ ns.hdiff_alias }} = LAG({{ ns.hdiff_alias }}) OVER(PARTITION BY {{ parent_hashkey|lower }} ORDER BY {{ src_ldts }}) THEN FALSE
            ELSE TRUE
        END

The result is that whenever there are complete duplicates (HK, HD, LDTS all same, as is the case when reading records from a table with a higher granularity than the attributes for the satellite) row number 1 will not match with the QUALIFY=TRUE, and therefore records will be inserted again.

This could be easily fixed by breaking the CTE into two parts, the first one assigning the row number and then the second one deduplicating with the QUALIFY statement but ordering on the row number from the first CTE. This way both steps would be aligned and would not end up inserting the same records again.

**Expected behavior**

The macro should be able to handle cases when the source table has a higher granularity than the attributes tracked in the satellite and therefore will be presented with duplicates in each load. It shouldn't insert the same values with each when there is no change in the HD.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Satellite inserting duplicates when granularity of source table and target satellite differ #427

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

product_hk	payload_col_sat1	payload_col_sat2	ldts
A	Value B	Value C	22-02-26 09:00:14:00
A	Value B	Value F	22-02-26 09:00:14:00
A	Value B	Value G	22-02-26 09:00:14:00

[BUG] Satellite inserting duplicates when granularity of source table and target satellite differ #427

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions