fix: render Jinja blocks before splitting on semicolons (#2650) by sfc-gh-moczko · Pull Request #2782 · snowflakedb/snowflake-cli

sfc-gh-moczko · 2026-02-24T22:19:56Z

Summary

Fixes SNOW-2397013: Jinja IF-statements are not interpreted correctly #2650: Jinja block statements ({% if %}, {% for %}, etc.) containing semicolons were broken because split_statements() splits on ; before Jinja rendering, producing invalid template fragments.
Moves Jinja rendering to a pre-render step on the whole content before split_statements(), while legacy (&{ }) and standard (<% %>) syntax continues to render per-statement (they are variable-only, no blocks).
Adds 7 new tests covering: if-blocks, for-loops, if/else, mixed Jinja+standard syntax, false conditions, file-based input, and undefined variable errors.

Root Cause

The SQL execution pipeline called split_statements() then per-statement template rendering. Since split_statements is unaware of Jinja block syntax, a query like:

{% if var == 'Jinja' %}select 1;{% endif %}

was split into two broken fragments:

{% if var == 'Jinja' %}select 1; -- unclosed if block --> TemplateSyntaxError
{% endif %} -- orphaned endif --> TemplateSyntaxError

Fix

Jinja block rendering now happens on the whole content before splitting. The per-statement pipeline still handles legacy/standard variable substitution (which never had block syntax). This is safe because Jinja {{ }}/{% %} syntax is orthogonal to <% %>/&{ } -- rendering one leaves the other untouched.

Changes

File	Change
`statement_reader.py`	Add `pre_render` parameter to `query_reader()` and `files_reader()`
`manager.py`	Split Jinja into pre-render step (whole content) + per-statement step (legacy/standard only)
`tests/sql/test_sql.py`	Add 7 test cases for Jinja block scenarios

Test plan

sfc-gh-olorek

Left some comments, I am a bit scarred on changing the order of the rendering pipeline, as it may produce some unwanted side effects.

sfc-gh-olorek · 2026-03-19T19:32:15Z

+            content = f.read()
+            if pre_render:
+                content = pre_render(content)
+            stmts = split_statements(io.StringIO(content), remove_comments)


This runs before split_statements removes comments, so Jinja-like syntax inside SQL comments (e.g. -- {% if debug %}) would be resolved and could cause template errors.

I think sth like this will cause errors:

@mock.patch("snowflake.cli._plugins.sql.manager.SqlManager._execute_string") def test_jinja_syntax_in_sql_comment_causes_error(mock_execute_query): """Jinja-like syntax in SQL comments is resolved by pre-render because pre-render runs before comment removal.""" manager = SqlManager() query = "-- {% if debug %}enable tracing{% endif %}\nselect 1;" with pytest.raises(Exception, match="undefined"): _, results = manager.execute( query=query, files=None, std_in=False, data={}, template_syntax_config=SQLTemplateSyntaxConfig( enable_legacy_syntax=False, enable_standard_syntax=False, enable_jinja_syntax=True, ), ) list(results)

Fixed. Added _strip_sql_comments() — a state machine that strips -- and /* */ comments while preserving single-quoted strings and dollar-quoted strings. It runs as the first step inside _jinja_pre_render, before Jinja sees the content. test_jinja_template_syntax_in_sql_comment_is_ignored verifies that -- {{ undefined_var }} does not cause an UndefinedError.

Thanks for the fix. I have a concern about the approach: _strip_sql_comments permanently removes all SQL comments when Jinja is enabled, even when --retain-comments is used. This silently changes behavior for users who rely on comments reaching Snowflake.

I, for now, don't know a clean way around it, as for example escaping approach by preprocessing jinja semicolons by replacing ; inside {% %} with some placeholder also have some corner cases, and has a lot of unnecessary complexity.

This may result with a different approach for the whole pr, that I don't know yet. Maybe you have some ideas?

Good catch, and agreed this was a real regression. I've pushed a fix in the latest commit.

The approach: instead of stripping comments before Jinja renders, we now conditionally wrap them in {% raw %}...{% endraw %}. Jinja passes {% raw %} blocks through unchanged, so all comment content (including any {{ }} or {% %} syntax inside) is neutralised without actually removing the text.

Concretely, in _jinja_pre_render:

--retain-comments → _wrap_comments_in_jinja_raw() — comments become {% raw %}-- hint{% endraw %} before Jinja runs, then the raw tags are consumed and the original comment text survives into the output SQL.

default (strip) → _strip_sql_comments() — existing behaviour unchanged.

Both functions share a new _tokenize_sql() generator that handles string literal detection once, so the logic can't drift between the two paths.

The one theoretical edge case: a comment containing the literal string {% endraw %} would break the wrapping. In practice this can't appear in any real SQL (it's a Jinja-specific string with no SQL meaning), and the token is short enough that it's implausible as part of an optimizer hint. If it ever becomes a concern, the placeholder/nonce approach is the robust fallback.

Thanks for the improvement. I still think that it has some corner cases:

you acknowledged that a comment containing {% endraw %} is a limitation itself, but you said that it is practically impossible to hit. I disagree, as someone can document jinja escaping itself in a SQL command. I think we cannot proceed with this limitation forward

Snowflake supports nested comments but tokenizer ends preemptively. for example /* outer /* inner */ {{ boom }} */ SELECT 1; should be interpreted but with the current approach it fails.

_tokenize_sql - is very hacky and impossible to review by a human eye. Making the code not maintainable

can be resolved by escaping {% endraw %}, but this increases the complexity of the change even more

can be cleanly resolved by counting the comment depth, and only ending the when the depth reaches 0
to resolve 3. I think we need completely another approach to the problem. relying on a regex.

I have played with the code for a little while and found the following outline to be the simplest and solving all corner cases that I could identify:

# === statement_reader.py — additions (after ASYNC_SUFFIX) === _SQL_TOKEN_RE = re.compile( r""" '(?:[^'\\]|\\.| '')*' # single-quoted string (skip) | \$\$.*?\$\$ # $$-dollar-quoted string (skip) | (?P<block_start> /\* ) # block comment opening (nested scan) | (?P<line_comment> --[^\n]* ) # line comment (capture) """, re.VERBOSE | re.DOTALL, ) @dataclass class SavedComments: _comments: list = field(default_factory=list) _nonce: str = field(default_factory=lambda: uuid.uuid4().hex[:12]) def add(self, text: str) -> str: idx = len(self._comments) self._comments.append(text) return f"__SQLC_{self._nonce}_{idx}__" def restore(self, sql: str) -> str: for i, comment in enumerate(self._comments): sql = sql.replace(f"__SQLC_{self._nonce}_{i}__", comment) return sql def _protect_sql_comments(sql: str) -> tuple[str, SavedComments]: saved = SavedComments() result: list[str] = [] pos = 0 while pos < len(sql): m = _SQL_TOKEN_RE.search(sql, pos) if m is None: break if m.group("line_comment"): result.append(sql[pos : m.start()]) result.append(saved.add(m.group())) pos = m.end() elif m.group("block_start"): depth = 1 j = m.end() while j < len(sql) and depth > 0: if sql[j : j + 2] == "/*": depth += 1; j += 2 elif sql[j : j + 2] == "*/": depth -= 1; j += 2 else: j += 1 result.append(sql[pos : m.start()]) result.append(saved.add(sql[m.start() : j])) pos = j else: # String literal — emit as-is, advance past it result.append(sql[pos : m.end()]) pos = m.end() result.append(sql[pos:]) return "".join(result), saved # === statement_reader.py — signature changes only === def query_reader(source, operators, remove_comments=False, pre_render=None): if pre_render: source = pre_render(source) stmts = split_statements(io.StringIO(source), remove_comments) yield from recursive_statement_reader(stmts, [], operators, remove_comments, pre_render) def files_reader(paths, operators, remove_comments=False, pre_render=None): for path in paths: with path.open(read_file_limit_mb=UNLIMITED) as f: content = f.read() if pre_render: content = pre_render(content) stmts = split_statements(io.StringIO(content), remove_comments) yield from recursive_statement_reader( stmts, [path.as_posix()], operators, remove_comments, pre_render, ) def recursive_statement_reader(source, seen_files, operators, remove_comments, pre_render=None): # ... same logic, but in the !source/!load branch: content = parsed_source.statement.read() if pre_render: content = pre_render(content) yield from recursive_statement_reader( split_statements(io.StringIO(content), remove_comments), seen_files, operators, remove_comments, pre_render, ) # === manager.py — execute() body === stmt_operators = [] jinja_pre_render = None if template_syntax_config.enable_jinja_syntax: def _jinja_pre_render(content: str) -> str: content, saved = _protect_sql_comments(content) if template_syntax_config.enable_legacy_syntax: content = transpile_snowsql_templates(content) content = snowflake_sql_jinja_render( content, template_syntax_config=SQLTemplateSyntaxConfig( enable_legacy_syntax=template_syntax_config.enable_legacy_syntax, enable_standard_syntax=template_syntax_config.enable_standard_syntax, enable_jinja_syntax=True, ), data=data, ) return saved.restore(content) jinja_pre_render = _jinja_pre_render else: if template_syntax_config.enable_legacy_syntax: stmt_operators.append(transpile_snowsql_templates) stmt_operators.append(partial(snowflake_sql_jinja_render, ...)) # Pass jinja_pre_render to readers: stmt_reader = query_reader(query, stmt_operators, remove_comments, jinja_pre_render)

@sfc-gh-moczko what do you think about changing the PR with the following regex based aproach?

All three points are valid — switching to the regex placeholder approach as you outlined.

The implementation follows your sketch closely:

_SQL_TOKEN_RE regex skips single-quoted strings and $$-dollar-quoted strings, matches line comments (captured as a group), and matches block comment openings (handled separately).

_SavedComments dataclass stores original comment text keyed by nonce-prefixed index; restore() does a plain string replace.

Block comments use a depth-counting loop (not the regex) so nested /* ... /* ... */ ... */ is captured as a single placeholder.

_jinja_pre_render always protects then restores — no retain_comments conditional needed there. split_statements(remove_comments=…) remains the sole decision point for stripping vs. keeping.

{% raw %}, _tokenize_sql, _strip_sql_comments, and _wrap_comments_in_jinja_raw are all gone. Tests updated accordingly, including an explicit case for {% endraw %} inside a comment and for nested block comments.

Two minor follow-up cleanups pushed alongside the implementation:

test_jinja_template_syntax_in_sql_comment_is_ignored had a stale inline comment saying "SQL comments should be stripped first" — updated to say they are replaced with inert placeholders before Jinja runs.

test_retain_comments_with_jinja_enabled docstring still referenced _strip_sql_comments which no longer exists — updated to describe the actual mechanism.

No functional changes.

Thanks! lgtm!

sfc-gh-olorek

lgtm

sfc-gh-olorek · 2026-04-03T13:03:45Z

@sfc-gh-moczko lgtm please rebase on top of 0fd2247 to fix the integration tests

`split_statements()` splits SQL on `;` before template rendering, breaking Jinja block statements (`{% if %}`, `{% for %}`, etc.) that contain semicolons. This moves Jinja rendering to a pre-render step on the whole content before splitting, while legacy/standard syntax continues to render per-statement. Closes #2650 .... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code) Co-Authored-By: Cortex Code <noreply@snowflake.com>

The for-loop renders to 3 separate SQL statements, which correctly produces nested result sets consistent with all other multi-statement output in the CLI. .... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code) Co-Authored-By: Cortex Code <noreply@snowflake.com>

… to !source files - Add _strip_sql_comments() state-machine to statement_reader.py that preserves string literals and dollar-quoted strings, so template-like syntax inside SQL comments is not evaluated by Jinja. - When enable_jinja_syntax=True, consolidate all rendering (legacy transpile, standard/<% %> variables, Jinja blocks) into a single pre-render pass applied before split_statements. This preserves the correct standard->Jinja ordering and fixes Jinja blocks that span multiple semicolon-separated statements. - Propagate pre_render through recursive_statement_reader so files loaded via !source receive the same pre-render treatment as the top-level query. - Improve the misleading empty-output error message from "Use either query, filename or input option." to "No SQL statements found to execute." - Add unit tests for _strip_sql_comments, Jinja comment safety, updated error message, and !source pre_render propagation. Fixes: #2650 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…enabled When Jinja syntax is enabled, _jinja_pre_render was calling _strip_sql_comments unconditionally, silently discarding all comments regardless of the --retain-comments flag. Fix by introducing _wrap_comments_in_jinja_raw(): instead of stripping comments, it wraps them in {% raw %}...{% endraw %} so Jinja passes them through unchanged. _jinja_pre_render now chooses the right strategy based on retain_comments. Both functions share a new _tokenize_sql() generator that handles string literal detection in one place, eliminating the duplication risk.

…protection Three issues with the previous approach were identified in review: 1. A comment containing {% endraw %} (e.g. documenting Jinja escaping) would break the wrapping — not a theoretical edge case. 2. Snowflake supports nested block comments; the character-by-character tokenizer exited at the first */ leaving outer comment content exposed to Jinja. 3. _tokenize_sql was considered too complex and unmaintainable. New approach (_protect_sql_comments): - Regex finds string literals (skip), line comments, and block comment openings. - Line comments are replaced with nonce-keyed placeholders. - Block comments use a depth-counting loop to handle nesting correctly. - Placeholders are restored after Jinja renders, then split_statements handles the retain/strip decision via remove_comments as before. - No dependency on Jinja internals ({% raw %} removed entirely).

…roach

sfc-gh-moczko · 2026-04-08T09:03:34Z

CI update: all checks pass except the 4 integration-trusted jobs (ubuntu/windows × tests-trusted/tests-trusted-ng), which are timing out. This is the known SNOWCLI_IT warehouse credit limit issue — not related to this PR's changes.

sfc-gh-moczko requested a review from a team as a code owner February 24, 2026 22:19

sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch 2 times, most recently from de27892 to f6d865b Compare March 3, 2026 09:06

sfc-gh-moczko self-assigned this Mar 3, 2026

sfc-gh-moczko marked this pull request as draft March 3, 2026 09:55

sfc-gh-moczko marked this pull request as ready for review March 3, 2026 10:34

sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch from 225511d to 1f5d863 Compare March 5, 2026 14:01

sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch from 1f5d863 to 4f42dd4 Compare March 18, 2026 17:43

sfc-gh-olorek reviewed Mar 19, 2026

View reviewed changes

sfc-gh-olorek mentioned this pull request Mar 24, 2026

fix: unify Jinja pre-render, strip SQL comments before Jinja, propagate to !source files #2831

Closed

2 tasks

sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch 2 times, most recently from 8ff37ef to 2d28299 Compare March 25, 2026 14:09

sfc-gh-olorek approved these changes Apr 1, 2026

View reviewed changes

sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch from 93bbb4c to 0669a17 Compare April 1, 2026 13:29

Maciej Oczko and others added 8 commits April 7, 2026 08:17

test: add test verifying standard-then-Jinja rendering order

c383175

fix: simplify assert to avoid Black 22.12 ast.Str crash on Python 3.12

16376fd

fix: update stale test comments to reflect placeholder protection app…

fb1a519

…roach

sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch from 0669a17 to fb1a519 Compare April 7, 2026 08:18

sfc-gh-moczko merged commit 4632461 into main Apr 8, 2026
35 of 47 checks passed

sfc-gh-moczko deleted the fix/2650-jinja-block-semicolons branch April 8, 2026 12:25

Conversation

sfc-gh-moczko commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Fix

Changes

Test plan

Uh oh!

sfc-gh-olorek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sfc-gh-olorek left a comment

Choose a reason for hiding this comment

Uh oh!

sfc-gh-olorek commented Apr 3, 2026

Uh oh!

sfc-gh-moczko commented Apr 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sfc-gh-moczko commented Feb 24, 2026 •

edited

Loading