SNOW-3266242: Support TRY_CAST with user-provided schema in DataFrameReader#4138

Open
sfc-gh-yuwang wants to merge 4 commits into main from SNOW-3266242

@sfc-gh-yuwang
Collaborator

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-3266242

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    Previously, try_cast was not supported when a user-provided schema was used. This PR closes that gap.
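
    For context, Snowflake's TRY_CAST returns NULL when a conversion fails instead of raising an error. A minimal Python sketch of that behavior follows; the helper name `try_cast_int` is hypothetical and is not part of Snowpark, and the sketch only emulates the idea for string-to-integer conversion.

```python
from typing import List, Optional


def try_cast_int(value: Optional[str]) -> Optional[int]:
    """Emulate the idea of TRY_CAST(value AS INT): None on failure, no exception."""
    if value is None:
        # NULL input stays NULL.
        return None
    try:
        return int(value)
    except ValueError:
        # A plain cast would raise here; a try-cast yields NULL (None) instead.
        return None


# Hypothetical raw CSV values for a column the user declared as an integer.
raw_values: List[Optional[str]] = ["1", "42", "abc", None]
casted = [try_cast_int(v) for v in raw_values]
```

    With a plain cast, the "abc" row would fail the whole read; with try-cast semantics it becomes a null value and the remaining rows still load.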

self._file_type = "CSV"

schema_to_cast, transformations = None, None
use_user_schema = False
Contributor

Why do we need to introduce this new variable?
Was there a bug before?

Collaborator Author

@sfc-gh-yuwang Mar 27, 2026

Technically, there was a bug before.
In the analyzer: https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/_internal/analyzer/snowflake_plan.py#L1960
We need this boolean to actually apply try_cast to the result; otherwise try_cast is ignored.
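
To illustrate the point, here is a rough sketch of how a boolean like `use_user_schema` could gate the cast when building the column projection. This is not the actual analyzer code: `build_projection`, its parameters, and the exact SQL shape are assumptions made for illustration only.

```python
from typing import List, Tuple


def build_projection(
    schema: List[Tuple[str, str]],  # (column name, Snowflake type name), hypothetical shape
    use_user_schema: bool,
    try_cast: bool,
) -> List[str]:
    """Build one projection expression per column of a staged CSV file."""
    expressions = []
    for position, (name, sf_type) in enumerate(schema, start=1):
        column = f"${position}"  # positional reference to the staged file column
        if use_user_schema and try_cast:
            # Only when the flag is set does the try_cast option take effect.
            expr = f"TRY_CAST({column} AS {sf_type})"
        else:
            # Without the flag, try_cast is silently ignored and a plain cast is used.
            expr = f"{column}::{sf_type}"
        expressions.append(f"{expr} AS {name}")
    return expressions


with_flag = build_projection([("id", "INT"), ("name", "STRING")], True, True)
without_flag = build_projection([("id", "INT"), ("name", "STRING")], False, True)
```

In this sketch, `without_flag` still contains plain casts even though `try_cast` is true, which mirrors the behavior described above.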

schema,
schema_to_cast,
transformations,
) = self._get_schema_from_csv_user_input(
Contributor

Can you check with @sfc-gh-skumbham to understand the scope? Is this requirement just for CSV?
Do we need to generalize it to other formats?

Collaborator Author

I have confirmed with him that this PR's scope is limited to CSV.
