NOTICKET (feat): Add performance optimization for snowflake external tables #339

Open
Squisher23 wants to merge 4 commits into ScalefreeCOM:main from Squisher23:snowflake_external_table_performance_max_days_late

@Squisher23
Add a performance optimization for Snowflake external tables by limiting the partitions based on ldts

Description

Snowflake's external table implementation has some weaknesses: it only performs partition pruning when literals are used as the filter on the partition column (see https://community.snowflake.com/s/article/Pruning-is-not-happening-subquery for details).

Issue #335 already solved our most critical problem, the runtime of satellites. However, links, hubs, and tracking satellites have also been getting slower and slower (from seconds to several minutes) as the number of Parquet files in our data lake increases.

This PR tries to solve the problem by adding a new parameter, datavault4dbt.max_days_for_late_arriving_data, and filtering all staging tables to the configured number of most recent days. Adding this feature sped up our dbt run from 30 minutes to just 5 minutes. It should have no other consequences as long as two conditions hold: you run dbt more often than every max_days_for_late_arriving_data days, and none of your data sources deliver data later than max_days_for_late_arriving_data days. To recreate the whole vault, the parameter needs to be deleted or set to a sufficiently high value.
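
The idea can be illustrated with a minimal Python sketch. It is not the macro code from this PR; the function name `ldts_partition_filter` and the use of `ldts` as the partition column are assumptions made for illustration. It shows how a cutoff date computed at compile time can be rendered into the query as a literal, which is the form of filter that the linked Snowflake article describes as eligible for partition pruning.

```python
from datetime import date, timedelta
from typing import Optional


def ldts_partition_filter(
    max_days: Optional[int],
    today: date,
    partition_column: str = "ldts",  # assumed partition column name
) -> str:
    """Return a WHERE predicate with a literal cutoff date, or "" for no limit."""
    if max_days is None:
        # Parameter not set: no filter, so the whole vault can be rebuilt.
        return ""
    if max_days < 0:
        raise ValueError("max_days must be non-negative")
    cutoff = today - timedelta(days=max_days)
    # The cutoff is embedded as a date literal, not as a subquery.
    return f"WHERE {partition_column} >= '{cutoff.isoformat()}'"


print(ldts_partition_filter(7, date(2025, 1, 15)))
# prints: WHERE ldts >= '2025-01-08'
```

Passing `None` returns an empty string, which mirrors the behaviour described above where deleting the parameter removes the limit.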

Type of change

Please delete options that are not relevant.

  • [x] Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.

  • Tests you ran

Test Configuration:

  • datavault4dbt-Version: 1.98
  • dbt-Version: core, 1.92
  • dbt-adapter-Version: dbt-snowflake 1.91

Checklist:

  • [x] I have performed a self-review of my code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation or included information that needs updates (e.g. in the Wiki) -> I don't know how to do that


3 participants