fix(sql): cap parser input length via SQL_MAX_PARSE_LENGTH config #14
Open
hbrooks wants to merge 2 commits into
Adds a configurable upper bound on the size of SQL scripts accepted by the SQL parser. Scripts longer than SQL_MAX_PARSE_LENGTH (default 1,000,000 characters) are rejected before being passed to sqlglot. The check sits in SQLStatement._parse, so it applies to every code path that goes through SQLScript, including SQL Lab execute, format, RLS rewriting, dataset SQL, and database engine spec helpers. Set SQL_MAX_PARSE_LENGTH to None to disable.
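As a rough illustration of the configuration described above, an override might look like the following. This is only a sketch: it assumes the setting is read from the usual `superset_config.py` override file like other Superset settings, and the value 500,000 is an arbitrary example rather than a recommendation.

```python
# superset_config.py (assumed location for the override)

# Reject SQL scripts larger than this bound before they reach sqlglot.
# 500_000 is an illustrative value; the PR states a default of 1,000,000.
SQL_MAX_PARSE_LENGTH = 500_000

# To disable the check entirely, per the PR description:
# SQL_MAX_PARSE_LENGTH = None
```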
Follow-up on the parse-length gate. Three blocking gaps:

1. The gate at the top of SQLStatement._parse missed three other call sites in the same module that hand strings directly to sqlglot.parse_one: SQLStatement.parse_predicate, the extract_tables_from_statement helper that builds a pseudo SELECT from an exp.Command literal, and the standalone transpile_to_dialect entry point. Any of these could be reached without going through SQLStatement._parse, so the previous single-site check was bypassable. Pulled the check out of SQLStatement and into a module-level helper, then called it from all four sqlglot.parse/parse_one sites so the bound cannot be bypassed by a direct caller.

2. The cap was in Unicode code points, not bytes. A 1M-codepoint string of four-byte characters is up to 4MB of payload that the parser still has to ingest. Switched to UTF-8 byte length so the bound directly reflects parser memory and CPU exposure.

3. The "current_app.config.get + except RuntimeError" pattern is not the codebase idiom for "config-with-fallback-outside-app". Replaced with `has_app_context()`, which matches the pattern already used in sql_lab.py, models/core.py, and others.

Tests added in tests/unit_tests/sql/parse_tests.py:

- accept exactly at the cap (boundary)
- reject one byte over the cap
- reject when codepoint count is under the cap but byte count is over
- SQL_MAX_PARSE_LENGTH=None disables the gate
- app-config value overrides the module fallback
- SQLScript short-circuits sqlglot.parse on over-cap input (spy asserts zero calls, covers the MySQL-backtick double-parse path)
- SQLStatement.parse_predicate is gated
- transpile_to_dialect is gated

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
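The byte-length gate described in this commit can be sketched roughly as below. This is not the PR's actual code: `check_parse_length`, `SQLTooLongError`, and `DEFAULT_MAX_PARSE_LENGTH` are hypothetical names, and the limit is passed in as a parameter instead of being read from the Flask app config so that the sketch stays self-contained.

```python
from typing import Optional

# Assumed fallback, mirroring the default stated in the PR description.
DEFAULT_MAX_PARSE_LENGTH = 1_000_000


class SQLTooLongError(ValueError):
    """Hypothetical error type; the PR's real exception is not shown here."""


def check_parse_length(
    sql: str,
    max_bytes: Optional[int] = DEFAULT_MAX_PARSE_LENGTH,
) -> None:
    """Raise if `sql` exceeds `max_bytes` when encoded as UTF-8.

    Passing None disables the check, matching SQL_MAX_PARSE_LENGTH=None.
    """
    if max_bytes is None:
        return
    # Measure bytes, not code points: a four-byte character counts as 4.
    size = len(sql.encode("utf-8"))
    if size > max_bytes:
        raise SQLTooLongError(
            f"SQL is {size} bytes, which exceeds the limit of {max_bytes} bytes"
        )


# Usage: call the helper before every hand-off to the parser.
check_parse_length("SELECT 1")  # well under the default cap, so no error
```

Keeping the check in one module-level function means each parser entry point only needs a single call before handing the string to sqlglot, which is the property the commit message is after.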
Mirror of apache/superset#40499 by @sha174n