Skip to content

fix: render Jinja blocks before splitting on semicolons (#2650)#2782

Merged
sfc-gh-moczko merged 8 commits intomainfrom
fix/2650-jinja-block-semicolons
Apr 8, 2026
Merged

fix: render Jinja blocks before splitting on semicolons (#2650)#2782
sfc-gh-moczko merged 8 commits intomainfrom
fix/2650-jinja-block-semicolons

Conversation

@sfc-gh-moczko
Copy link
Copy Markdown
Collaborator

@sfc-gh-moczko sfc-gh-moczko commented Feb 24, 2026

Summary

  • Fixes SNOW-2397013: Jinja IF-statements are not interpreted correctly #2650: Jinja block statements ({% if %}, {% for %}, etc.) containing semicolons were broken because split_statements() splits on ; before Jinja rendering, producing invalid template fragments.
  • Moves Jinja rendering to a pre-render step on the whole content before split_statements(), while legacy (&{ }) and standard (<% %>) syntax continues to render per-statement (they are variable-only, no blocks).
  • Adds 7 new tests covering: if-blocks, for-loops, if/else, mixed Jinja+standard syntax, false conditions, file-based input, and undefined variable errors.

Root Cause

The SQL execution pipeline called split_statements() then per-statement template rendering. Since split_statements is unaware of Jinja block syntax, a query like:

{% if var == 'Jinja' %}select 1;{% endif %}

was split into two broken fragments:

  1. {% if var == 'Jinja' %}select 1; -- unclosed if block --> TemplateSyntaxError
  2. {% endif %} -- orphaned endif --> TemplateSyntaxError

Fix

Jinja block rendering now happens on the whole content before splitting. The per-statement pipeline still handles legacy/standard variable substitution (which never had block syntax). This is safe because Jinja {{ }}/{% %} syntax is orthogonal to <% %>/&{ } -- rendering one leaves the other untouched.

Changes

File Change
statement_reader.py Add pre_render parameter to query_reader() and files_reader()
manager.py Split Jinja into pre-render step (whole content) + per-statement step (legacy/standard only)
tests/sql/test_sql.py Add 7 test cases for Jinja block scenarios

Test plan

  • Reproduction test passes
  • For-loop test produces 3 statements
  • If/else test selects correct branch
  • Mixed Jinja + standard syntax works
  • False condition raises CliArgumentError (no statements to execute)
  • File-based Jinja blocks via -f flag work
  • Undefined variable in block raises clear error
  • All 83 tests in test_sql.py pass (including 10 existing parametrized syntax config tests)
  • All 33 test_statement_reader.py tests pass
  • All pre-commit hooks pass (ruff, black, mypy, codespell)
  • Live Snowflake smoke test: 3 scenarios pass end-to-end

@sfc-gh-moczko sfc-gh-moczko requested a review from a team as a code owner February 24, 2026 22:19
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch 2 times, most recently from de27892 to f6d865b Compare March 3, 2026 09:06
@sfc-gh-moczko sfc-gh-moczko self-assigned this Mar 3, 2026
@sfc-gh-moczko sfc-gh-moczko marked this pull request as draft March 3, 2026 09:55
@sfc-gh-moczko sfc-gh-moczko marked this pull request as ready for review March 3, 2026 10:34
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch from 225511d to 1f5d863 Compare March 5, 2026 14:01
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch from 1f5d863 to 4f42dd4 Compare March 18, 2026 17:43
Copy link
Copy Markdown
Contributor

@sfc-gh-olorek sfc-gh-olorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, I am a bit scarred on changing the order of the rendering pipeline, as it may produce some unwanted side effects.

Comment thread src/snowflake/cli/_plugins/sql/manager.py
Comment thread src/snowflake/cli/_plugins/sql/statement_reader.py
content = f.read()
if pre_render:
content = pre_render(content)
stmts = split_statements(io.StringIO(content), remove_comments)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runs before split_statements removes comments, so Jinja-like syntax inside SQL comments (e.g. -- {% if debug %}) would be resolved and could cause template errors.

I think sth like this will cause errors:

    @mock.patch("snowflake.cli._plugins.sql.manager.SqlManager._execute_string")
    def test_jinja_syntax_in_sql_comment_causes_error(mock_execute_query):
        """Jinja-like syntax in SQL comments is resolved by pre-render
        because pre-render runs before comment removal."""
        manager = SqlManager()
        query = "-- {% if debug %}enable tracing{% endif %}\nselect 1;"
        with pytest.raises(Exception, match="undefined"):
            _, results = manager.execute(
                query=query,
                files=None,
                std_in=False,
                data={},
                template_syntax_config=SQLTemplateSyntaxConfig(
                    enable_legacy_syntax=False,
                    enable_standard_syntax=False,
                    enable_jinja_syntax=True,
                ),
            )
            list(results)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added _strip_sql_comments() — a state machine that strips -- and /* */ comments while preserving single-quoted strings and dollar-quoted strings. It runs as the first step inside _jinja_pre_render, before Jinja sees the content. test_jinja_template_syntax_in_sql_comment_is_ignored verifies that -- {{ undefined_var }} does not cause an UndefinedError.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. I have a concern about the approach: _strip_sql_comments permanently removes all SQL comments when Jinja is enabled, even when --retain-comments is used. This silently changes behavior for users who rely on comments reaching Snowflake.

I, for now, don't know a clean way around it, as for example escaping approach by preprocessing jinja semicolons by replacing ; inside {% %} with some placeholder also have some corner cases, and has a lot of unnecessary complexity.

This may result with a different approach for the whole pr, that I don't know yet. Maybe you have some ideas?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, and agreed this was a real regression. I've pushed a fix in the latest commit.

The approach: instead of stripping comments before Jinja renders, we now conditionally wrap them in {% raw %}...{% endraw %}. Jinja passes {% raw %} blocks through unchanged, so all comment content (including any {{ }} or {% %} syntax inside) is neutralised without actually removing the text.

Concretely, in _jinja_pre_render:

  • --retain-comments_wrap_comments_in_jinja_raw() — comments become {% raw %}-- hint{% endraw %} before Jinja runs, then the raw tags are consumed and the original comment text survives into the output SQL.
  • default (strip) → _strip_sql_comments() — existing behaviour unchanged.

Both functions share a new _tokenize_sql() generator that handles string literal detection once, so the logic can't drift between the two paths.

The one theoretical edge case: a comment containing the literal string {% endraw %} would break the wrapping. In practice this can't appear in any real SQL (it's a Jinja-specific string with no SQL meaning), and the token is short enough that it's implausible as part of an optimizer hint. If it ever becomes a concern, the placeholder/nonce approach is the robust fallback.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement. I still think that it has some corner cases:

  1. you acknowledged that a comment containing {% endraw %} is a limitation itself, but you said that it is practically impossible to hit. I disagree, as someone can document jinja escaping itself in a SQL command. I think we cannot proceed with this limitation forward

  2. Snowflake supports nested comments but tokenizer ends preemptively. for example /* outer /* inner */ {{ boom }} */ SELECT 1; should be interpreted but with the current approach it fails.

  3. _tokenize_sql - is very hacky and impossible to review by a human eye. Making the code not maintainable

  4. can be resolved by escaping {% endraw %}, but this increases the complexity of the change even more

  5. can be cleanly resolved by counting the comment depth, and only ending the when the depth reaches 0
    to resolve 3. I think we need completely another approach to the problem. relying on a regex.

I have played with the code for a little while and found the following outline to be the simplest and solving all corner cases that I could identify:

 # === statement_reader.py — additions (after ASYNC_SUFFIX) ===

    _SQL_TOKEN_RE = re.compile(
        r"""
          '(?:[^'\\]|\\.| '')*'             # single-quoted string (skip)
        | \$\$.*?\$\$                        # $$-dollar-quoted string (skip)
        | (?P<block_start>  /\*  )           # block comment opening (nested scan)
        | (?P<line_comment>  --[^\n]*  )     # line comment (capture)
        """,
        re.VERBOSE | re.DOTALL,
    )


    @dataclass
    class SavedComments:
        _comments: list = field(default_factory=list)
        _nonce: str = field(default_factory=lambda: uuid.uuid4().hex[:12])

        def add(self, text: str) -> str:
            idx = len(self._comments)
            self._comments.append(text)
            return f"__SQLC_{self._nonce}_{idx}__"

        def restore(self, sql: str) -> str:
            for i, comment in enumerate(self._comments):
                sql = sql.replace(f"__SQLC_{self._nonce}_{i}__", comment)
            return sql


    def _protect_sql_comments(sql: str) -> tuple[str, SavedComments]:
        saved = SavedComments()
        result: list[str] = []
        pos = 0

        while pos < len(sql):
            m = _SQL_TOKEN_RE.search(sql, pos)
            if m is None:
                break

            if m.group("line_comment"):
                result.append(sql[pos : m.start()])
                result.append(saved.add(m.group()))
                pos = m.end()

            elif m.group("block_start"):
                depth = 1
                j = m.end()
                while j < len(sql) and depth > 0:
                    if sql[j : j + 2] == "/*":
                        depth += 1; j += 2
                    elif sql[j : j + 2] == "*/":
                        depth -= 1; j += 2
                    else:
                        j += 1
                result.append(sql[pos : m.start()])
                result.append(saved.add(sql[m.start() : j]))
                pos = j

            else:
                # String literal — emit as-is, advance past it
                result.append(sql[pos : m.end()])
                pos = m.end()

        result.append(sql[pos:])
        return "".join(result), saved

    # === statement_reader.py — signature changes only ===

    def query_reader(source, operators, remove_comments=False, pre_render=None):
        if pre_render:
            source = pre_render(source)
        stmts = split_statements(io.StringIO(source), remove_comments)
        yield from recursive_statement_reader(stmts, [], operators, remove_comments, pre_render)

    def files_reader(paths, operators, remove_comments=False, pre_render=None):
        for path in paths:
            with path.open(read_file_limit_mb=UNLIMITED) as f:
                content = f.read()
                if pre_render:
                    content = pre_render(content)
                stmts = split_statements(io.StringIO(content), remove_comments)
                yield from recursive_statement_reader(
                    stmts, [path.as_posix()], operators, remove_comments, pre_render,
                )

    def recursive_statement_reader(source, seen_files, operators, remove_comments, pre_render=None):
        # ... same logic, but in the !source/!load branch:
        content = parsed_source.statement.read()
        if pre_render:
            content = pre_render(content)
        yield from recursive_statement_reader(
            split_statements(io.StringIO(content), remove_comments),
            seen_files, operators, remove_comments, pre_render,
        )

    # === manager.py — execute() body ===

    stmt_operators = []
    jinja_pre_render = None

    if template_syntax_config.enable_jinja_syntax:

        def _jinja_pre_render(content: str) -> str:
            content, saved = _protect_sql_comments(content)
            if template_syntax_config.enable_legacy_syntax:
                content = transpile_snowsql_templates(content)
            content = snowflake_sql_jinja_render(
                content,
                template_syntax_config=SQLTemplateSyntaxConfig(
                    enable_legacy_syntax=template_syntax_config.enable_legacy_syntax,
                    enable_standard_syntax=template_syntax_config.enable_standard_syntax,
                    enable_jinja_syntax=True,
                ),
                data=data,
            )
            return saved.restore(content)

        jinja_pre_render = _jinja_pre_render
    else:
        if template_syntax_config.enable_legacy_syntax:
            stmt_operators.append(transpile_snowsql_templates)
        stmt_operators.append(partial(snowflake_sql_jinja_render, ...))

    # Pass jinja_pre_render to readers:
    stmt_reader = query_reader(query, stmt_operators, remove_comments, jinja_pre_render)

@sfc-gh-moczko what do you think about changing the PR with the following regex based aproach?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three points are valid — switching to the regex placeholder approach as you outlined.

The implementation follows your sketch closely:

  • _SQL_TOKEN_RE regex skips single-quoted strings and $$-dollar-quoted strings, matches line comments (captured as a group), and matches block comment openings (handled separately).
  • _SavedComments dataclass stores original comment text keyed by nonce-prefixed index; restore() does a plain string replace.
  • Block comments use a depth-counting loop (not the regex) so nested /* ... /* ... */ ... */ is captured as a single placeholder.
  • _jinja_pre_render always protects then restores — no retain_comments conditional needed there. split_statements(remove_comments=…) remains the sole decision point for stripping vs. keeping.

{% raw %}, _tokenize_sql, _strip_sql_comments, and _wrap_comments_in_jinja_raw are all gone. Tests updated accordingly, including an explicit case for {% endraw %} inside a comment and for nested block comments.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor follow-up cleanups pushed alongside the implementation:

  • test_jinja_template_syntax_in_sql_comment_is_ignored had a stale inline comment saying "SQL comments should be stripped first" — updated to say they are replaced with inert placeholders before Jinja runs.
  • test_retain_comments_with_jinja_enabled docstring still referenced _strip_sql_comments which no longer exists — updated to describe the actual mechanism.

No functional changes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! lgtm!

Comment thread src/snowflake/cli/_plugins/sql/manager.py Outdated
Copy link
Copy Markdown
Contributor

@sfc-gh-olorek sfc-gh-olorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch from 93bbb4c to 0669a17 Compare April 1, 2026 13:29
@sfc-gh-olorek
Copy link
Copy Markdown
Contributor

@sfc-gh-moczko lgtm please rebase on top of 0fd2247 to fix the integration tests

Maciej Oczko and others added 8 commits April 7, 2026 08:17
`split_statements()` splits SQL on `;` before template rendering,
breaking Jinja block statements (`{% if %}`, `{% for %}`, etc.) that
contain semicolons. This moves Jinja rendering to a pre-render step
on the whole content before splitting, while legacy/standard syntax
continues to render per-statement.

Closes #2650

.... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code)

Co-Authored-By: Cortex Code <noreply@snowflake.com>
The for-loop renders to 3 separate SQL statements, which correctly
produces nested result sets consistent with all other multi-statement
output in the CLI.

.... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code)

Co-Authored-By: Cortex Code <noreply@snowflake.com>
… to !source files

- Add _strip_sql_comments() state-machine to statement_reader.py that
  preserves string literals and dollar-quoted strings, so template-like
  syntax inside SQL comments is not evaluated by Jinja.

- When enable_jinja_syntax=True, consolidate all rendering (legacy
  transpile, standard/<% %> variables, Jinja blocks) into a single
  pre-render pass applied before split_statements. This preserves the
  correct standard->Jinja ordering and fixes Jinja blocks that span
  multiple semicolon-separated statements.

- Propagate pre_render through recursive_statement_reader so files
  loaded via !source receive the same pre-render treatment as the
  top-level query.

- Improve the misleading empty-output error message from
  "Use either query, filename or input option." to
  "No SQL statements found to execute."

- Add unit tests for _strip_sql_comments, Jinja comment safety,
  updated error message, and !source pre_render propagation.

Fixes: #2650

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…enabled

When Jinja syntax is enabled, _jinja_pre_render was calling _strip_sql_comments
unconditionally, silently discarding all comments regardless of the
--retain-comments flag.

Fix by introducing _wrap_comments_in_jinja_raw(): instead of stripping comments,
it wraps them in {% raw %}...{% endraw %} so Jinja passes them through unchanged.
_jinja_pre_render now chooses the right strategy based on retain_comments.

Both functions share a new _tokenize_sql() generator that handles string literal
detection in one place, eliminating the duplication risk.
…protection

Three issues with the previous approach were identified in review:
1. A comment containing {% endraw %} (e.g. documenting Jinja escaping) would
   break the wrapping — not a theoretical edge case.
2. Snowflake supports nested block comments; the character-by-character tokenizer
   exited at the first */ leaving outer comment content exposed to Jinja.
3. _tokenize_sql was considered too complex and unmaintainable.

New approach (_protect_sql_comments):
- Regex finds string literals (skip), line comments, and block comment openings.
- Line comments are replaced with nonce-keyed placeholders.
- Block comments use a depth-counting loop to handle nesting correctly.
- Placeholders are restored after Jinja renders, then split_statements handles
  the retain/strip decision via remove_comments as before.
- No dependency on Jinja internals ({% raw %} removed entirely).
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/2650-jinja-block-semicolons branch from 0669a17 to fb1a519 Compare April 7, 2026 08:18
@sfc-gh-moczko
Copy link
Copy Markdown
Collaborator Author

CI update: all checks pass except the 4 integration-trusted jobs (ubuntu/windows × tests-trusted/tests-trusted-ng), which are timing out. This is the known SNOWCLI_IT warehouse credit limit issue — not related to this PR's changes.

@sfc-gh-moczko sfc-gh-moczko merged commit 4632461 into main Apr 8, 2026
35 of 47 checks passed
@sfc-gh-moczko sfc-gh-moczko deleted the fix/2650-jinja-block-semicolons branch April 8, 2026 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SNOW-2397013: Jinja IF-statements are not interpreted correctly

2 participants