fix: add configurable file encoding to resolve #2759 (UnicodeDecodeError on JP Windows) by sfc-gh-moczko · Pull Request #2791 · snowflakedb/snowflake-cli

sfc-gh-moczko · 2026-02-27T16:45:30Z

Summary

Adds a configuration-driven encoding system (SNOWFLAKE_CLI_ENCODING_FILE_IO env var or [cli.encoding] file_io in config.toml) to allow users on non-UTF-8 locales (e.g., Japanese Windows with CP932) to specify the file I/O encoding
Passes configured encoding through SecurePath.read_text() and SecurePath.open() at all file-reading callsites: SQL statement reader, Jinja rendering, Cortex commands, Snowpark requirements, and entity script templates
Returns None (platform default) when unconfigured, preserving existing behavior for all current users

Changes

New: src/snowflake/cli/api/encoding.py -- get_file_io_encoding(), get_subprocess_encoding(), detect_encoding_environment()
Modified: statement_reader.py -- from_file() and files_reader() now pass encoding (root cause of SNOW-3066557: UnicodeDecodeError when executing SQL files with UTF-8 encoding on Japanese Windows #2759)
Modified: rendering/jinja.py -- read_file_content() and procedure_from_js_file()
Modified: entities/utils.py -- render_script_template()
Modified: cortex/commands.py and cortex/manager.py -- 5 read_text() callsites
Modified: snowpark/commands.py -- _read_snowflake_requirements_file()
Modified: api/config.py -- added ENCODING_SECTION_PATH constant

Test plan

12 unit tests for encoding.py (env var, config, precedence, detection, mismatch warning)
2 new tests in test_statement_reader.py verifying UTF-8 Japanese SQL files work through both files_reader() and ParsedStatement.from_file()
2 new tests in test_secure_path.py verifying read_text(encoding=) and open(encoding=) pass-through
Bug reproduction confirmed: CP932 locale + UTF-8 Japanese SQL -> UnicodeDecodeError without fix, succeeds with SNOWFLAKE_CLI_ENCODING_FILE_IO=utf-8
All pre-commit hooks pass (ruff, black, mypy, check-ast, codespell, license headers)

Closes #2759

.... Generated with Cortex Code

…ror on JP Windows) On non-UTF-8 Windows locales (e.g., Japanese CP932), `snow sql -f` fails with UnicodeDecodeError because file I/O operations use the platform default encoding instead of UTF-8. This adds a configuration- driven encoding system that lets users set their preferred encoding via SNOWFLAKE_CLI_ENCODING_FILE_IO env var or config.toml without changing behavior for existing users on UTF-8 systems. Closes #2759 .... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code) Co-Authored-By: Cortex Code <noreply@snowflake.com>

tests/sql/test_statement_reader.py writes UTF-8 files then reads them without configuring an encoding. On Windows the platform default is charmap (cp1252), causing a UnicodeDecodeError. Use monkeypatch to set SNOWFLAKE_CLI_ENCODING_FILE_IO=utf-8 so the tests verify the intended configured behaviour on all platforms. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sfc-gh-olorek

Thank you for the contibution! Left some comments

sfc-gh-olorek · 2026-03-19T16:18:36Z

src/snowflake/cli/_plugins/cortex/commands.py

+        prompt = SecurePath(file).read_text(
+            file_size_limit_mb=DEFAULT_SIZE_LIMIT_MB,
+            encoding=get_file_io_encoding(),
+        )


Thanks for adding this! Just to be complete I noticed that nativeapp/artifacts.py and spcs/services/manager.py still uses the old version. Both would still fail on JP Windows. Could you also change it there?

Done — added encoding=get_file_io_encoding() to SecurePath.open() in both nativeapp/artifacts.py and spcs/services/manager.py.

sfc-gh-olorek · 2026-03-19T16:24:37Z

src/snowflake/cli/api/encoding.py

+def get_file_io_encoding() -> Optional[str]:
+    """Get configured file I/O encoding, or None for platform default.
+
+    Resolution order:
+    1. SNOWFLAKE_CLI_ENCODING_FILE_IO environment variable
+    2. config.toml [cli.encoding] file_io value
+    3. None (use platform default)
+    """
+    env_encoding = os.environ.get("SNOWFLAKE_CLI_ENCODING_FILE_IO")
+    if env_encoding:
+        return env_encoding
+
+    try:
+        from snowflake.cli.api.config import get_config_value
+
+        return get_config_value("cli", "encoding", key="file_io")
+    except Exception:
+        return None


In case of invalid encoding for exmpale: SNOWFLAKE_CLI_ENCODING_FILE_IO=not-a-codec the error will be generated late during file read. Could you add as a last step a verfication that the encoding is valid?:

try: codecs.lookup(encoding) except LookupError: raise CliError( f"Unknown encoding '{encoding}'. Check SNOWFLAKE_CLI_ENCODING_FILE_IO " f"or [cli.encoding] file_io in config.toml." )

Done — added _validate_encoding() which calls codecs.lookup() and raises CliError for unknown codec names. Validation fires immediately when the env var or config value is read, not deferred to file open time.

sfc-gh-olorek · 2026-03-19T16:26:35Z

src/snowflake/cli/api/encoding.py

+def get_subprocess_encoding() -> Optional[str]:
+    """Get configured subprocess encoding, or None for platform default.
+
+    Resolution order:
+    1. SNOWFLAKE_CLI_ENCODING_SUBPROCESS environment variable
+    2. config.toml [cli.encoding] subprocess value
+    3. None (use platform default)
+    """
+    env_encoding = os.environ.get("SNOWFLAKE_CLI_ENCODING_SUBPROCESS")
+    if env_encoding:
+        return env_encoding
+
+    try:
+        from snowflake.cli.api.config import get_config_value
+
+        return get_config_value("cli", "encoding", key="subprocess")
+    except Exception:
+        return None


This method is never used in the non-test codebase. Can we move it to the test package if it is a helper method or delete it if it not needed at all?

Done — detect_encoding_environment() is moved to the test file as a local helper. It is no longer in encoding.py.

sfc-gh-olorek · 2026-03-19T16:28:21Z

src/snowflake/cli/api/config.py

 PLUGINS_SECTION_PATH = [CLI_SECTION, PLUGINS_SECTION]
 PLUGIN_ENABLED_KEY = "enabled"
 FEATURE_FLAGS_SECTION_PATH = [CLI_SECTION, "features"]
+ENCODING_SECTION_PATH = [CLI_SECTION, "encoding"]


This variable is never used. Could we delete it or acutally use it? : see comment about: get_config_value

Done — ENCODING_SECTION_PATH is now imported and used in encoding.py, removing the duplicate hardcoded path.

sfc-gh-olorek · 2026-03-19T16:28:44Z

src/snowflake/cli/api/encoding.py

+    try:
+        from snowflake.cli.api.config import get_config_value
+
+        return get_config_value("cli", "encoding", key="file_io")


We can use ENCODING_SECTION_PATH for the hardcoded constants.

Done — both get_file_io_encoding() and get_subprocess_encoding() now call get_config_value(*ENCODING_SECTION_PATH, key=...).

sfc-gh-olorek · 2026-03-19T16:31:22Z

src/snowflake/cli/api/encoding.py

+    except Exception:
+        return None


except Exception with return None swallows all exceptions. What do you think about making the catch more granural (for example: KeyError, MissingConfiguration) and log the problem?

Done — replaced except Exception with _MISSING_ENCODING_EXCEPTIONS = (KeyError, NonExistentKey, MissingConfigOptionError, ConfigSourceError), matching exactly the exceptions get_config_value raises when a key is absent. Added a log.debug() call on the not-configured path.

sfc-gh-olorek · 2026-03-19T16:35:19Z

src/snowflake/cli/api/encoding.py

+        "locale": locale.getpreferredencoding(),
+    }
+
+    encodings = {v.lower().replace("-", "") for v in env_info.values()}


Nit: codec name can also contain underscore for exmple: utf_8. Could you use Python's codecs.lookup(name).name instead of replacing the -?

Done — detect_encoding_environment() (now in the test file) uses codecs.lookup(v).name for canonical comparison instead of string replacement.

- Wire ENCODING_SECTION_PATH constant into get_file_io_encoding() and get_subprocess_encoding() instead of hard-coded string literals - Add _validate_encoding() helper: calls codecs.lookup() and raises CliError with a clear message on invalid codec names (e.g. bad env var) - Narrow exception handling from broad `except Exception` to explicit (KeyError, NonExistentKey, MissingConfigOptionError, ConfigSourceError) with a log.debug() call so nothing is silently swallowed - Remove detect_encoding_environment() from production code; move it to tests/api/test_encoding.py as a test-local helper. Fix codec name comparison to use codecs.lookup(v).name instead of fragile .replace() - Add encoding=get_file_io_encoding() to SecurePath.open() in nativeapp/artifacts.py and spcs/services/manager.py so JP Windows users are covered in all file-reading paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…id encoding - Simplify _validate_encoding error message to "Check the value set in {source}" avoiding the redundant 'or [cli.encoding]...' clause when source is already a config path - Add test_raises_on_invalid_encoding to TestGetFileIoEncoding and TestGetSubprocessEncoding so the CliError raised on bad codec names is covered Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sfc-gh-moczko requested a review from a team as a code owner February 27, 2026 16:45

sfc-gh-moczko force-pushed the feature/encoding-fix branch from 0e10769 to 97c6e56 Compare March 18, 2026 17:09

sfc-gh-olorek reviewed Mar 19, 2026

View reviewed changes

sfc-gh-moczko and others added 2 commits March 23, 2026 17:14

Conversation

sfc-gh-moczko commented Feb 27, 2026

Summary

Changes

Test plan

Uh oh!

sfc-gh-olorek left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants