[Bug] Performance degradation by clustering/sorting #127

@rwang-lyra

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

We have observed much longer runtimes since `cluster_by = ['transaction_id']` was introduced in the v0.13.0 release. The code reference is here: https://github.com/fivetran/dbt_netsuite/pull/117/files#diff-144d46b313d4b1851f6b2a20d16c25a6f41758af76cb008d874c1f61530383f3R7 and the setting is applied to 3 major models.

The sorting seems to take about 40% of the runtime, and Snowflake warns that the high cardinality of transaction_id is causing the long runtime.
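
For anyone wanting to check the cardinality concern on their own data, here is a rough sketch of two diagnostic queries. The relation name `my_db.my_schema.balance_sheet` is a placeholder (an assumption, not the package's actual output name), so substitute whatever relation the model builds in your warehouse. `approx_count_distinct` and `system$clustering_information` are standard Snowflake functions; the exact wording of any warning in the result may differ from what we saw.

```sql
-- Placeholder relation name: replace my_db.my_schema.balance_sheet with
-- the relation the model actually builds in your warehouse.

-- 1. Estimate how close to unique transaction_id is. A distinct_ratio
--    near 1 means the column has very high cardinality.
select
    count(*) as row_count,
    approx_count_distinct(transaction_id) as approx_distinct_transaction_ids,
    approx_count_distinct(transaction_id) / nullif(count(*), 0) as distinct_ratio
from my_db.my_schema.balance_sheet;

-- 2. Ask Snowflake to evaluate transaction_id as a clustering key.
--    The function returns JSON; cardinality warnings, if any, typically
--    appear in its "notes" field.
select system$clustering_information(
    'my_db.my_schema.balance_sheet',
    '(transaction_id)'
);
```

If the first query shows a ratio close to 1 and the second reports a high-cardinality note, that would be consistent with the sorting overhead described above.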

We compared locally on the balance_sheet model: 25 minutes with the current code (clustering enabled) vs. 14 minutes with clustering removed.

Relevant error log or model output

No response

Expected behavior

shorter execution time

dbt Project configurations

dbt-core==1.7.14
dbt-snowflake==1.7.5

Package versions

  • package: dbt-labs/dbt_utils
    version: 1.1.1
  • package: brooklyn-data/dbt_artifacts
    version: 2.3.0
  • package: dbt-labs/dbt_project_evaluator
    version: 0.8.0
  • package: dbt-labs/dbt_external_tables
    version: 0.8.7

What database are you using dbt with?

snowflake

dbt Version

dbt-core==1.7.14
dbt-snowflake==1.7.5

Additional Context

No response

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance and will schedule time during our office hours for guidance
  • No.

Labels

  • error:unforced
  • status:stale (Issue was blocked or had no user response for more than 30 days)
  • type:bug (Something is broken or incorrect)
