Background
Currently, we filter out _fivetran_deleted rows at the staging layer here. However, this approach prevents deleted rows from flowing downstream, which can cause issues with incremental models recognizing these rows for proper deletion handling.
For example, in cases where a transaction is deleted in the source, the deletion does not propagate to the final models because the _fivetran_deleted rows are removed early in the pipeline.
Proposed Solution
Update the _fivetran_deleted filtering strategy to defer the removal of these rows to downstream transformations. This change would allow incremental models to process deletions correctly while preserving the ability to exclude deleted rows in the final outputs.
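To illustrate the difference, here is a minimal Python sketch that simulates an incremental model as an in-memory upsert. The `stage`, `incremental_merge`, and `run` helpers and the `id`/`amount` columns are hypothetical stand-ins, not code from this package; only the `_fivetran_deleted` flag comes from the source data.

```python
from typing import Dict, List

Row = Dict[str, object]


def stage(source_rows: List[Row], filter_deleted: bool) -> List[Row]:
    """Hypothetical staging step: optionally drops soft-deleted rows."""
    if filter_deleted:
        return [r for r in source_rows if not r["_fivetran_deleted"]]
    return list(source_rows)


def incremental_merge(target: Dict[int, int], staged_rows: List[Row]) -> Dict[int, int]:
    """Upsert staged rows into the target by id; drop rows flagged as deleted."""
    for row in staged_rows:
        if row["_fivetran_deleted"]:
            target.pop(row["id"], None)
        else:
            target[row["id"]] = row["amount"]
    return target


# First sync: two transactions arrive.
initial = [
    {"id": 1, "amount": 100, "_fivetran_deleted": False},
    {"id": 2, "amount": 250, "_fivetran_deleted": False},
]
# Later sync: transaction 2 is soft-deleted in the source.
update = [{"id": 2, "amount": 250, "_fivetran_deleted": True}]


def run(filter_in_staging: bool) -> Dict[int, int]:
    """Run both syncs through staging and the incremental merge."""
    target: Dict[int, int] = {}
    incremental_merge(target, stage(initial, filter_in_staging))
    incremental_merge(target, stage(update, filter_in_staging))
    return target


current_behavior = run(filter_in_staging=True)    # deletion never reaches the merge
proposed_behavior = run(filter_in_staging=False)  # deletion is applied downstream
```

With the filter in staging, transaction 2 stays in the final output because the merge never sees the deletion. With the filter deferred, the merge removes it.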
To do
- Requires updating all models downstream of staging to ensure _fivetran_deleted rows are handled appropriately.
- Needs validation to ensure no unintended side effects, such as retaining deleted rows in final outputs.
Steps to Implement
- Remove the _fivetran_deleted filtering logic from staging models.
- Update downstream models to explicitly filter _fivetran_deleted rows where necessary.
- Write tests to confirm that deleted rows are processed correctly in incremental and full-refresh scenarios.
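One possible shape for the tests in the last step, sketched in Python rather than as dbt tests: apply a series of sync batches incrementally, rebuild the same output from scratch, and check that both agree and that no deleted row survives. The helper names and the `id`/`amount` columns are assumptions made for illustration only.

```python
from typing import Dict, List

Row = Dict[str, object]


def full_refresh(source_rows: List[Row]) -> Dict[int, int]:
    """Rebuild the final output from scratch, excluding soft-deleted rows."""
    return {r["id"]: r["amount"] for r in source_rows if not r["_fivetran_deleted"]}


def incremental_run(target: Dict[int, int], batch: List[Row]) -> Dict[int, int]:
    """Apply one batch to an existing target, honoring deletions."""
    result = dict(target)
    for r in batch:
        if r["_fivetran_deleted"]:
            result.pop(r["id"], None)
        else:
            result[r["id"]] = r["amount"]
    return result


def latest_state(batches: List[List[Row]]) -> List[Row]:
    """Collapse all batches to the latest version of each row."""
    state: Dict[int, Row] = {}
    for batch in batches:
        for r in batch:
            state[r["id"]] = r
    return list(state.values())


batches = [
    [
        {"id": 1, "amount": 100, "_fivetran_deleted": False},
        {"id": 2, "amount": 250, "_fivetran_deleted": False},
    ],
    [
        {"id": 2, "amount": 250, "_fivetran_deleted": True},  # soft delete
        {"id": 3, "amount": 75, "_fivetran_deleted": False},
    ],
    [
        {"id": 1, "amount": 120, "_fivetran_deleted": False},  # update
    ],
]

incremental: Dict[int, int] = {}
for batch in batches:
    incremental = incremental_run(incremental, batch)

refreshed = full_refresh(latest_state(batches))

# Both scenarios should converge, and the deleted row should be absent.
assert incremental == refreshed
assert 2 not in incremental
```

In the package itself, the equivalent check would compare the result of an incremental run against a full-refresh build of the same model.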
Additional Context
This change is proposed as an alternative solution to address incremental data quality issues, particularly for users who cannot schedule full-refreshes.
Open Questions
- What performance implications might this change introduce in larger datasets?