
fix(sql): cap parser input length via SQL_MAX_PARSE_LENGTH config#14

Open
hbrooks wants to merge 2 commits into master from demo/pr-40499

hbrooks commented May 28, 2026

sha174n and others added 2 commits May 28, 2026 13:33
Adds a configurable upper bound on the size of SQL scripts accepted by
the SQL parser. Scripts longer than SQL_MAX_PARSE_LENGTH (default
1,000,000 characters) are rejected before being passed to sqlglot. The
check sits in SQLStatement._parse, so it applies to every code path that
goes through SQLScript, including SQL Lab execute, format, RLS rewriting,
dataset SQL, and database engine spec helpers. Set SQL_MAX_PARSE_LENGTH
to None to disable.
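
For operators, the setting is an ordinary config value. Assuming this is Apache Superset (the PR mentions SQL Lab, RLS rewriting and database engine specs), a deployment override would normally live in `superset_config.py`; that file name is Superset's usual override module, not something this PR shows, and the values below are illustrative.

```python
# superset_config.py: deployment-level override (illustrative values).
# Per the PR, the default is 1,000,000; the follow-up commit measures it
# in UTF-8 bytes rather than characters.
SQL_MAX_PARSE_LENGTH = 1_000_000

# Uncomment to disable the parse-length check entirely:
# SQL_MAX_PARSE_LENGTH = None
```
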

Follow-up on the parse-length gate, closing three blocking gaps:

1. The gate at the top of SQLStatement._parse missed three other
   call sites in the same module that hand strings directly to
   sqlglot.parse_one: SQLStatement.parse_predicate, the
   extract_tables_from_statement helper that builds a pseudo SELECT
   from an exp.Command literal, and the standalone
   transpile_to_dialect entry point. Any of these could be reached
   without going through SQLStatement._parse, so the previous
   single-site check was bypassable.

   Pulled the check out of SQLStatement and into a module-level
   helper, then called it from all four sqlglot.parse/parse_one
   sites so the bound cannot be bypassed by a direct caller.

2. The cap counted Unicode code points, not bytes. A string at the
   1M-code-point cap made up of characters that take four bytes in
   UTF-8 is 4MB of payload that the parser still has to ingest.
   Switched to UTF-8 byte length so the bound tracks parser memory
   and CPU exposure more directly.

3. Reading `current_app.config.get` and catching RuntimeError when
   there is no app context is not the codebase idiom for
   "config with a fallback outside the app". Replaced it with a
   `has_app_context()` check, which matches the pattern already used
   in sql_lab.py, models/core.py, and others.

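The `has_app_context()` idiom from item 3 can be sketched as follows. Touching `current_app` outside an application context raises RuntimeError, which is what the old try/except caught; checking `has_app_context()` first avoids using an exception for control flow. `DEFAULT_SQL_MAX_PARSE_LENGTH` and `get_sql_max_parse_length` are hypothetical names, not the ones in the patch.

```python
from typing import Optional

from flask import current_app, has_app_context

# Hypothetical module-level fallback, used when no Flask app context is active
# (for example, when the parser is imported directly by a script or unit test).
DEFAULT_SQL_MAX_PARSE_LENGTH: Optional[int] = 1_000_000


def get_sql_max_parse_length() -> Optional[int]:
    """Return the app's SQL_MAX_PARSE_LENGTH, or the fallback outside an app."""
    if has_app_context():
        # A key explicitly set to None is returned as None, disabling the gate;
        # a missing key falls back to the module default.
        return current_app.config.get(
            "SQL_MAX_PARSE_LENGTH", DEFAULT_SQL_MAX_PARSE_LENGTH
        )
    return DEFAULT_SQL_MAX_PARSE_LENGTH
```
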
Tests added in tests/unit_tests/sql/parse_tests.py:
- accept exactly at the cap (boundary)
- reject one byte over the cap
- reject when codepoint count is under the cap but byte count is over
- SQL_MAX_PARSE_LENGTH=None disables the gate
- app-config value overrides the module fallback
- SQLScript short-circuits sqlglot.parse on over-cap input (spy asserts
  zero calls, covers the MySQL-backtick double-parse path)
- SQLStatement.parse_predicate is gated
- transpile_to_dialect is gated
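
Several of the listed cases can be mirrored in a self-contained `unittest` sketch. It does not reproduce tests/unit_tests/sql/parse_tests.py: `parse_script` and `fake_parse` are hypothetical stand-ins, the app-config override case is left out because it needs a Flask app, and a 10-byte cap keeps the inputs short.

```python
import unittest
from typing import List, Optional

# Hypothetical stand-ins; only the behaviour listed in the PR is mirrored,
# not Superset's actual function names or signatures.
parser_calls: List[str] = []


def fake_parse(sql: str) -> List[str]:
    parser_calls.append(sql)  # spy: record each call that reaches the parser
    return [part.strip() for part in sql.split(";") if part.strip()]


def parse_script(sql: str, max_bytes: Optional[int]) -> List[str]:
    # Gate on UTF-8 byte length before the parser is ever invoked.
    if max_bytes is not None and len(sql.encode("utf-8")) > max_bytes:
        raise ValueError("SQL exceeds SQL_MAX_PARSE_LENGTH")
    return fake_parse(sql)


class ParseLengthTests(unittest.TestCase):
    def setUp(self) -> None:
        parser_calls.clear()

    def test_accepts_exactly_at_cap(self) -> None:
        self.assertEqual(parse_script("a" * 10, 10), ["a" * 10])

    def test_rejects_one_byte_over_cap(self) -> None:
        with self.assertRaises(ValueError):
            parse_script("a" * 11, 10)

    def test_rejects_when_bytes_over_but_code_points_under(self) -> None:
        sql = "\u00e9" * 6  # 6 code points, 12 UTF-8 bytes
        self.assertLess(len(sql), 10)
        with self.assertRaises(ValueError):
            parse_script(sql, 10)

    def test_none_disables_gate(self) -> None:
        self.assertEqual(parse_script("a" * 10_000, None), ["a" * 10_000])

    def test_parser_not_called_when_over_cap(self) -> None:
        with self.assertRaises(ValueError):
            parse_script("SELECT 1; SELECT 2", 5)
        self.assertEqual(parser_calls, [])
```

Any unittest-compatible runner, pytest included, will collect `ParseLengthTests`.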

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>