Fix AssertionError crash when INFER_SCHEMA fails for JSON/ORC/AVRO files #4129

Merged
sfc-gh-helmeleegy merged 6 commits into main from fix/infer-schema-fallback-json on Mar 26, 2026

@sfc-gh-dyadav sfc-gh-dyadav commented Mar 20, 2026

Fix AssertionError crash when INFER_SCHEMA fails for JSON/ORC/AVRO files

Summary

  • When INFER_SCHEMA fails for semi-structured file formats (JSON, ORC, AVRO), the reader now handles the error gracefully instead of crashing with an empty AssertionError
  • Mirrors the existing error handling that CSV already has: raises FileNotFoundError directly, logs a warning with the actual error for other failures, and disables INFER_SCHEMA to fall back to $1 VARIANT schema

Problem

In _read_semi_structured_file, when _infer_schema_for_file_format() failed (e.g., malformed JSON returning Snowflake error 100069), the exception was silently discarded and schema_to_cast remained None. However, INFER_SCHEMA stayed True in options, so SnowflakePlanBuilder.read_file() later hit assert schema_to_cast is not None — producing an empty AssertionError with no useful context for the user.

The CSV reader (lines 810-822) already handled this correctly by checking the exception, setting a fallback schema_to_cast, and logging a warning. The JSON/ORC/AVRO path had no equivalent handling.
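The failure mode described above can be reproduced in isolation. This is only a sketch: `read_file` here is a simplified, hypothetical stand-in for `SnowflakePlanBuilder.read_file()`, not the real method, and it assumes Python is run with assertions enabled.

```python
def read_file(options, schema_to_cast):
    # Hypothetical, simplified stand-in for the downstream plan builder.
    # With INFER_SCHEMA still enabled, it expects inference to have
    # produced a schema.
    if options.get("INFER_SCHEMA"):
        assert schema_to_cast is not None
    return schema_to_cast


message = None
try:
    # Inference failed upstream, so schema_to_cast is None while
    # INFER_SCHEMA is still True.
    read_file({"INFER_SCHEMA": True}, None)
except AssertionError as err:
    # A bare assert carries no message, so the user sees an empty error.
    message = str(err)
```

Because the assert has no message argument, `str(err)` is an empty string, which matches the empty AssertionError reported by users.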

Before / After

| | Before | After |
|---|---|---|
| Error type | AssertionError | SnowparkSQLException (actual Snowflake error) or logged warning + $1 VARIANT fallback |
| Error message | (empty) | 100069: Error parsing JSON: ... or warning with full exception details |
| User action | None: no indication of what went wrong | Clear guidance: check file validity or provide explicit .schema() |

Customer Impact

Vertex Pharmaceuticals (account 3813899, va4) hit this bug in session 16380571490304122 when reading malformed JSON files from an S3 stage via a Snowflake Notebook. They saw empty AssertionError exceptions (SCOS error code 5001) with no indication that the underlying issue was 100069: Error parsing JSON.

Changes

  • src/snowflake/snowpark/dataframe_reader.py: In _read_semi_structured_file, when _infer_schema_for_file_format() returns an error for JSON/ORC/AVRO:
    1. Raise FileNotFoundError directly (same as CSV)
    2. For other errors, log a warning with the actual exception and disable INFER_SCHEMA so the downstream assert schema_to_cast is not None doesn't fire
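The two steps above can be sketched as a small standalone function. The name `resolve_schema`, the logger name, and the `(schema_to_cast, exception)` pair are assumptions made for illustration; the actual return shape of `_infer_schema_for_file_format()` and the exact warning text in `dataframe_reader.py` may differ.

```python
import logging

logger = logging.getLogger("infer_schema_sketch")  # hypothetical logger name


def resolve_schema(infer_result, options):
    """Hypothetical sketch of the fallback for JSON/ORC/AVRO reads.

    infer_result is an assumed (schema_to_cast, exception) pair in which
    at most one element is set. options is mutated in place.
    """
    schema_to_cast, exc = infer_result
    if exc is None:
        # Inference succeeded: keep INFER_SCHEMA on and use the schema.
        return schema_to_cast
    if isinstance(exc, FileNotFoundError):
        # Step 1: a missing file is surfaced directly, as in the CSV path.
        raise exc
    # Step 2: log the real cause and disable inference so the downstream
    # assert on schema_to_cast is never reached.
    logger.warning(
        "Could not infer schema; falling back to a single VARIANT column. "
        "Cause: %s",
        exc,
    )
    options["INFER_SCHEMA"] = False
    return None
```

With inference disabled, the read proceeds against the `$1` VARIANT column, and the underlying Snowflake error appears when the data is actually queried.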

Companion PR

This fix has a companion change in the SAS repo (map_read_partitioned_file.py, map_read_json.py) that adds defensive AssertionError handling at the SCOS layer, converting empty assertions into descriptive ValueError messages with error code 5001. This ensures graceful behavior even with older Snowpark Python versions.
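A defensive wrapper of the kind described for the companion change might look roughly like the following. This is a sketch under assumptions: `call_with_assertion_guard` and `failing_read` are hypothetical names, the message wording is invented, and the real code in map_read_partitioned_file.py and map_read_json.py is not shown in this PR.

```python
def call_with_assertion_guard(read_fn, error_code=5001):
    """Hypothetical guard: turn an empty AssertionError into a ValueError."""
    try:
        return read_fn()
    except AssertionError as err:
        # An assertion without a message stringifies to "", so supply one.
        detail = str(err) or "schema inference failed and no schema was provided"
        raise ValueError(
            f"[{error_code}] {detail}; check that the files are valid "
            "or pass an explicit schema"
        ) from err


def failing_read():
    # Simulates the bare assertion reached by older Snowpark Python versions.
    raise AssertionError()
```

Chaining with `from err` keeps the original assertion in the traceback while giving the caller a descriptive message.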

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-3201780

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

When INFER_SCHEMA fails for JSON/ORC/AVRO files (e.g. malformed content
triggering error 100069), the exception was silently swallowed, leaving
INFER_SCHEMA=True with no schema_to_cast. This caused a downstream
AssertionError with no message, hiding the actual failure from users.

Now when inference fails: FileNotFoundError is re-raised directly, other
errors log a warning and set INFER_SCHEMA=False so the read falls back
to a single VARIANT column (matching existing CSV fallback behavior).
The real Snowflake error surfaces when the data is actually queried.

Made-with: Cursor
…ndling

Cover three scenarios for JSON, AVRO, ORC, and Parquet formats:
- Generic error: logs warning and falls back to VARIANT schema
- FileNotFoundError: re-raised directly to the caller
- Successful inference: no warning logged, normal behavior
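The three scenarios listed above could be exercised along these lines. This is not the test code from the PR: `infer_or_fallback` is a hypothetical stand-in for the reader path, and the real tests presumably run against the Snowpark session and reader.

```python
import logging
import unittest
from unittest import mock

logger = logging.getLogger("reader_sketch")  # hypothetical logger name


def infer_or_fallback(fmt, exc, options):
    # Hypothetical stand-in for the reader path under test.
    if isinstance(exc, FileNotFoundError):
        raise exc
    if exc is not None:
        logger.warning("INFER_SCHEMA failed for %s: %s", fmt, exc)
        options["INFER_SCHEMA"] = False


class InferSchemaFallbackTest(unittest.TestCase):
    FORMATS = ("JSON", "AVRO", "ORC", "PARQUET")

    def test_generic_error_warns_and_disables_inference(self):
        for fmt in self.FORMATS:
            with self.subTest(fmt=fmt):
                options = {"INFER_SCHEMA": True}
                with self.assertLogs(logger, level="WARNING") as captured:
                    infer_or_fallback(fmt, RuntimeError("100069"), options)
                self.assertFalse(options["INFER_SCHEMA"])
                self.assertIn("100069", captured.output[0])

    def test_file_not_found_is_reraised(self):
        for fmt in self.FORMATS:
            with self.subTest(fmt=fmt):
                with self.assertRaises(FileNotFoundError):
                    infer_or_fallback(fmt, FileNotFoundError("missing"), {})

    def test_success_logs_no_warning(self):
        for fmt in self.FORMATS:
            with self.subTest(fmt=fmt):
                options = {"INFER_SCHEMA": True}
                with mock.patch.object(logger, "warning") as warn:
                    infer_or_fallback(fmt, None, options)
                warn.assert_not_called()
                self.assertTrue(options["INFER_SCHEMA"])
```

Using `subTest` per format keeps one test method per scenario while still reporting which format failed.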

Made-with: Cursor
@sfc-gh-helmeleegy sfc-gh-helmeleegy merged commit 2243ab8 into main Mar 26, 2026
32 of 35 checks passed
@sfc-gh-helmeleegy sfc-gh-helmeleegy deleted the fix/infer-schema-fallback-json branch March 26, 2026 02:40
@github-actions github-actions bot locked and limited conversation to collaborators Mar 26, 2026