Skip to content

fix(sql): broaden mutating-statement detection in SQL Lab parser#20

Open
hbrooks wants to merge 7 commits into
masterfrom
demo/pr-40421
Open

fix(sql): broaden mutating-statement detection in SQL Lab parser#20
hbrooks wants to merge 7 commits into
masterfrom
demo/pr-40421

Conversation

@hbrooks

@hbrooks hbrooks commented May 28, 2026

Copy link
Copy Markdown

sha174n and others added 7 commits May 25, 2026 15:25
is_mutating() and check_functions_present() in superset/sql/parse.py
previously missed several construct shapes that share a meta cause:
AST nodes that are not subclasses of the expected types, opaque
exp.Command nodes for PostgreSQL constructs that sqlglot cannot fully
structure, and function-name patterns that mutate server state without
a wrapping DML node.

This change addresses:

1. PostgreSQL opaque exp.Command constructs now classified as mutating
   alongside the existing DO and EXPLAIN ANALYZE handling: PREPARE,
   EXECUTE, CALL, COPY, GRANT, REVOKE, SET, REFRESH, REINDEX, VACUUM.

2. Structured nodes added to the mutating-node tuple: exp.Copy,
   exp.Grant, exp.Revoke. Some dialects parse these into structured
   nodes (e.g. COPY in PostgreSQL becomes exp.Copy, not exp.Command),
   so node-type matching is needed in addition to command-name
   matching.

3. PostgreSQL large-object writer functions classified as mutating by
   name: lo_from_bytea, lo_export, lo_import, lo_put, lo_create,
   lowrite. These appear as plain function calls inside an exp.Select
   wrapper and have no enclosing mutating AST node. The full lo_*
   family (writers and readers) is also added to the PostgreSQL entry
   in DISALLOWED_SQL_FUNCTIONS for defense in depth.

4. SELECT ... INTO new_table FROM ... parses as exp.Select with an
   `into` arg (PostgreSQL CTAS variant). It creates a new relation
   and is now treated as mutating.

5. check_functions_present() now visits exp.SessionParameter so that
   MySQL's @@<name> syntax matches denylist entries like `version`
   or `hostname`. exp.SessionParameter is not a subclass of exp.Func,
   so the existing exp.Func walk skipped it.

Tests
-----
- New: test_is_mutating_postgres_function_and_select_into covers the
  lo_* writers (positive), lo_* readers (negative, blocked separately
  via denylist), case-insensitivity, and SELECT INTO.
- New: test_is_mutating_postgres_command_constructs covers
  PREPARE/EXECUTE/CALL/COPY/GRANT/REVOKE/SET/REFRESH/REINDEX/VACUUM,
  with the pre-existing DO and EXPLAIN ANALYZE controls included.
- New: test_check_functions_present_session_parameter covers MySQL
  @@Version / @@hostname etc., plus lo_* function denylist matching.
- All 562 tests in tests/unit_tests/sql/parse_tests.py pass.
- All 117 tests in tests/unit_tests/sql/execution/ + sql_lab_test.py
  pass; no regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Follow-up to the initial broadening of is_mutating() and
check_functions_present(). This change closes the remaining bypass
surfaces that the SQL Lab read-only gate did not catch:

1. PostgreSQL information_schema views. Adds
   information_schema.tables, .columns, .schemata, .views,
   .routines, .role_table_grants, .role_column_grants,
   .role_routine_grants, .table_privileges, .column_privileges,
   .usage_privileges, .key_column_usage, .table_constraints,
   .referential_constraints, .view_table_usage, .applicable_roles,
   .enabled_roles to DISALLOWED_SQL_TABLES['postgresql'].

   These entries are schema-qualified, so they require the new
   schema-aware matching in check_tables_present (see below).
   Without this change, a user table happening to be named `tables`
   or `columns` would have been falsely blocked alongside the
   information_schema views.

2. check_tables_present schema-qualified matching.

   The previous matcher built `present = {t.table.lower() ...}`
   from parsed table references, stripping the schema. A denylist
   entry of `information_schema.tables` therefore could not match
   without also matching every user table named `tables`. The new
   matcher distinguishes bare entries (no dot in the denylist
   string) from schema-qualified entries (with a dot):

     - Bare entries match by table name regardless of schema
       (existing behavior preserved for `pg_stat_activity` etc.).
     - Schema-qualified entries match only when both schema and
       table match.

3. SHOW commands. Adds SHOW to _POSTGRES_MUTATING_COMMAND_NAMES so
   it is rejected by the allow_dml=False gate, mirroring the
   read-side blocks already in DISALLOWED_SQL_FUNCTIONS for
   pg_read_file, version(), current_setting, etc. SHOW reads
   server-configuration state (server version, hba_file, ssl,
   search_path) which has the same information-disclosure profile
   as those functions.

4. PostgreSQL setval() sequence mutator. Adds SETVAL to
   _MUTATING_FUNCTION_NAMES. setval() looks like a read but
   advances sequence state for every subsequent nextval caller.

Tests
-----
- New test_check_tables_present_schema_qualified covers bare vs
  schema-qualified matching, including the must-not-match case
  for user tables sharing a name with an information_schema view.
- test_is_mutating_postgres_function_and_select_into extended
  with setval positive and case-insensitivity cases.
- test_is_mutating_postgres_command_constructs extended with
  SHOW positive cases (search_path, all, server_version).
- Pre-existing test_has_mutation expectation for
  `SHOW search_path` flipped from False to True to reflect the
  intentional behavior change; `SET search_path TO public`
  remains non-mutating because it parses as exp.Set, not
  exp.Command.
- All 576 tests in tests/unit_tests/sql/parse_tests.py pass.
- All 117 tests in tests/unit_tests/sql/execution/ and
  sql_lab_test.py pass; no regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous parametrization for test_check_tables_present_schema_qualified
exercised only PostgreSQL. The shipped DISALLOWED_SQL_TABLES entries
for MySQL and MSSQL are also schema-qualified (mysql.user,
performance_schema.threads, performance_schema.processlist,
sys.sql_logins, sys.server_principals, sys.configurations) and rely
on the same matcher. Without explicit dialect coverage a future
refactor of check_tables_present could silently regress one engine
while leaving the others working.

Adds parametrized cases for:

- mysql: mysql.user, performance_schema.threads,
  performance_schema.processlist (positive); mydb.user (negative,
  user-authored table sharing the leaf name must not be blocked).
- mssql: sys.sql_logins, sys.server_principals, sys.configurations
  (positive); mydb.sql_logins (negative).

All 17 parameter cases pass; no regressions in the broader suite
(584 tests in tests/unit_tests/sql/parse_tests.py).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Master moved the duckdb pin but never regenerated the pinned
requirements file, so every PR's check-python-deps job has been
failing on this single-line diff. Regen via the uv-pip-compile path
that CI uses (`uv pip compile pyproject.toml requirements/base.in`).
Coverage was at 99 (fail-under=100). Both concrete subclasses override
is_destructive(), so the base-class `raise NotImplementedError()` line
was never executed in tests. Added a direct test that calls the unbound
base method on a sentinel object, matching the pattern used elsewhere
in the codebase for abstract stubs.
check_tables_present had `if not bare: continue` after
`bare = (t.table or "").lower()`. sqlglot Table.table is always
non-empty for any parseable table reference, so the guard never
triggered and broke the 100% coverage gate.

Use `t.table.lower()` directly. If a future sqlglot version ever
returns None, the AttributeError surfaces immediately rather than
silently polluting the present-tables set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants