Conversation
Add 12 critical user journey test scenarios covering table lifecycle, session management, stored procedures, UDFs/UDTFs, cursor reuse, MERGE INTO, COPY INTO, CREATE VIEW/CTAS, object discovery, metadata/catalog operations, transaction control, and error handling. Each journey includes a language-agnostic Gherkin feature definition under tests/definitions/shared/ and a Python E2E implementation under python/tests/e2e/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Adds a new set of “critical user journey” scenarios (Gherkin) and corresponding Python e2e pytest implementations to cover common Snowflake workflows (DDL/DML lifecycle, object discovery/metadata, session management, COPY/PUT, MERGE, UDF/UDTF, stored procedures).
Changes:
- Introduces new shared Gherkin feature specs under tests/definitions/shared/critical_user_journeys/.
- Adds Python e2e test modules under python/tests/e2e/critical_user_journeys/ implementing those scenarios.
- Establishes a Python package marker for the new test area via __init__.py.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/definitions/shared/critical_user_journeys/udf_udtf_lifecycle.feature | Defines UDF/UDTF lifecycle scenarios for Python e2e coverage. |
| tests/definitions/shared/critical_user_journeys/table_lifecycle.feature | Defines table DDL + filtering scenarios. |
| tests/definitions/shared/critical_user_journeys/stored_procedure_lifecycle.feature | Defines stored procedure create/call/drop scenario. |
| tests/definitions/shared/critical_user_journeys/session_management.feature | Defines role switching and restoration scenario. |
| tests/definitions/shared/critical_user_journeys/object_discovery.feature | Defines SHOW/DESCRIBE/DROP discovery scenarios. |
| tests/definitions/shared/critical_user_journeys/metadata_catalog.feature | Defines SHOW TABLES / DESCRIBE TABLE metadata scenarios. |
| tests/definitions/shared/critical_user_journeys/merge_into.feature | Defines MERGE INTO upsert scenario. |
| tests/definitions/shared/critical_user_journeys/create_view_ctas.feature | Defines view creation and CTAS scenarios. |
| tests/definitions/shared/critical_user_journeys/copy_into.feature | Defines PUT + COPY INTO bulk load scenario. |
| python/tests/e2e/critical_user_journeys/test_udf_udtf_lifecycle.py | Implements UDF/UDTF lifecycle e2e tests. |
| python/tests/e2e/critical_user_journeys/test_table_lifecycle.py | Implements table create/describe and WHERE filtering e2e tests. |
| python/tests/e2e/critical_user_journeys/test_stored_procedure_lifecycle.py | Implements stored procedure lifecycle e2e test. |
| python/tests/e2e/critical_user_journeys/test_session_management.py | Implements role switch/restore e2e test. |
| python/tests/e2e/critical_user_journeys/test_object_discovery.py | Implements object discovery and drop verification e2e tests. |
| python/tests/e2e/critical_user_journeys/test_metadata_catalog.py | Implements catalog inspection e2e tests. |
| python/tests/e2e/critical_user_journeys/test_merge_into.py | Implements MERGE INTO upsert e2e test. |
| python/tests/e2e/critical_user_journeys/test_create_view_ctas.py | Implements CREATE VIEW and CTAS e2e tests. |
| python/tests/e2e/critical_user_journeys/test_copy_into.py | Implements PUT + COPY INTO bulk load e2e test. |
| python/tests/e2e/critical_user_journeys/__init__.py | Marks the new critical journey e2e tests package. |
```python
cursor.execute(
    f"CREATE OR REPLACE TABLE {table_name} ("
    f"id INT NOT NULL, "
    f"name VARCHAR(100), "
    f"active BOOLEAN, "
    f"score FLOAT, "
    f"amount NUMBER(10,2))"
)
```
The scenario/step text calls this a temporary table, but the test creates a non-temporary table. In e2e, tmp_schema is shared across the session and the repo guidance recommends CREATE OR REPLACE TEMPORARY TABLE to avoid name conflicts and cleanup overhead (see python/tests/e2e/conftest.py).
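A minimal sketch of the temporary variant this comment suggests. `table_name` is a placeholder here; in the real test it comes from the e2e fixtures, and `cursor.execute` runs against the live session:

```python
# Placeholder name; the real test gets this from the e2e fixtures.
table_name = "journey_table"

# TEMPORARY scopes the table to the session, so reruns in the shared
# tmp_schema neither collide nor leave objects behind.
ddl = (
    f"CREATE OR REPLACE TEMPORARY TABLE {table_name} ("
    f"id INT NOT NULL, "
    f"name VARCHAR(100), "
    f"active BOOLEAN, "
    f"score FLOAT, "
    f"amount NUMBER(10,2))"
)
# cursor.execute(ddl)  # executed against the e2e session in the real test
```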
```python
cursor.execute(
    f"CREATE OR REPLACE TABLE {table_name} ("
    f"id INT NOT NULL, "
    f"name VARCHAR(100), "
    f"active BOOLEAN, "
    f"score FLOAT, "
    f"amount NUMBER(10,2))"
)
```
Same issue as above: the step says a temporary table exists, but this creates a non-temporary table in the session-scoped tmp_schema. Prefer CREATE OR REPLACE TEMPORARY TABLE here to match the feature and to avoid persistent objects across the e2e session (per python/tests/e2e/conftest.py).
```python
cursor.execute(f"CREATE STAGE {stage_name}")

# And A temporary table "copy_test" with columns (id INT, name VARCHAR, val FLOAT) exists
cursor.execute(f"CREATE TABLE {table_name} (id INT, name VARCHAR, val FLOAT)")
```
This creates a non-temporary stage/table with fixed names and without OR REPLACE / IF NOT EXISTS, which can make reruns flaky within the session-scoped tmp_schema. Prefer a temporary stage/table (or unique names) and use CREATE OR REPLACE / IF NOT EXISTS; there are existing helpers in python/tests/e2e/put_get/put_get_helper.py (e.g., create_temporary_stage).
Suggested change:

```diff
-cursor.execute(f"CREATE STAGE {stage_name}")
-# And A temporary table "copy_test" with columns (id INT, name VARCHAR, val FLOAT) exists
-cursor.execute(f"CREATE TABLE {table_name} (id INT, name VARCHAR, val FLOAT)")
+cursor.execute(f"CREATE OR REPLACE TEMPORARY STAGE {stage_name}")
+# And A temporary table "copy_test" with columns (id INT, name VARCHAR, val FLOAT) exists
+cursor.execute(
+    f"CREATE OR REPLACE TEMPORARY TABLE {table_name} (id INT, name VARCHAR, val FLOAT)"
+)
```
```python
csv_path.write_text("1,Alice,3.14\n2,Bob,2.72\n3,Charlie,1.41\n")

# When The CSV file is PUT to the stage
cursor.execute(f"PUT 'file://{csv_path}' @{stage_name} AUTO_COMPRESS=FALSE OVERWRITE=TRUE")
```
Building the PUT 'file://...' URI from a raw Path may break on Windows due to backslashes. Use the existing as_file_uri(Path) helper (or csv_path.as_posix()) when formatting the PUT command to keep the test cross-platform.
Suggested change:

```diff
-cursor.execute(f"PUT 'file://{csv_path}' @{stage_name} AUTO_COMPRESS=FALSE OVERWRITE=TRUE")
+cursor.execute(
+    f"PUT 'file://{csv_path.as_posix()}' @{stage_name} AUTO_COMPRESS=FALSE OVERWRITE=TRUE"
+)
```
```python
cursor.execute(f"CREATE TABLE {target_table} (id INT, name VARCHAR, amount INT)")
cursor.execute(f"INSERT INTO {target_table} VALUES (1, 'original_1', 100), (2, 'original_2', 200)")

# And A source table with rows (2, 'updated_2', 250) and (3, 'new_3', 300) exists
cursor.execute(f"CREATE TABLE {source_table} (id INT, name VARCHAR, amount INT)")
cursor.execute(f"INSERT INTO {source_table} VALUES (2, 'updated_2', 250), (3, 'new_3', 300)")
```
These tables are created without OR REPLACE / IF NOT EXISTS and are non-temporary in the session-scoped tmp_schema, which can cause flakiness on reruns. Prefer CREATE OR REPLACE TEMPORARY TABLE (or at least CREATE OR REPLACE TABLE) for both target/source to match the e2e fixture guidance (python/tests/e2e/conftest.py).
Suggested change:

```diff
-cursor.execute(f"CREATE TABLE {target_table} (id INT, name VARCHAR, amount INT)")
-cursor.execute(f"INSERT INTO {target_table} VALUES (1, 'original_1', 100), (2, 'original_2', 200)")
-# And A source table with rows (2, 'updated_2', 250) and (3, 'new_3', 300) exists
-cursor.execute(f"CREATE TABLE {source_table} (id INT, name VARCHAR, amount INT)")
-cursor.execute(f"INSERT INTO {source_table} VALUES (2, 'updated_2', 250), (3, 'new_3', 300)")
+cursor.execute(
+    f"CREATE OR REPLACE TEMPORARY TABLE {target_table} (id INT, name VARCHAR, amount INT)"
+)
+cursor.execute(
+    f"INSERT INTO {target_table} VALUES (1, 'original_1', 100), (2, 'original_2', 200)"
+)
+# And A source table with rows (2, 'updated_2', 250) and (3, 'new_3', 300) exists
+cursor.execute(
+    f"CREATE OR REPLACE TEMPORARY TABLE {source_table} (id INT, name VARCHAR, amount INT)"
+)
+cursor.execute(
+    f"INSERT INTO {source_table} VALUES (2, 'updated_2', 250), (3, 'new_3', 300)"
+)
```
```python
cursor.execute(f"INSERT INTO {source_table} VALUES (1, 'first', 1.5), (2, 'second', 2.5), (3, 'third', 3.5)")

# When CREATE TABLE AS SELECT is executed filtering val > 2.0
cursor.execute(f"CREATE TABLE {new_table} AS SELECT * FROM {source_table} WHERE val > 2.0")
```
CREATE TABLE ... AS SELECT will fail on reruns if the table already exists in the session-scoped tmp_schema. Prefer CREATE OR REPLACE TABLE (or explicitly DROP TABLE IF EXISTS first) to keep the e2e test idempotent.
Suggested change:

```diff
-cursor.execute(f"CREATE TABLE {new_table} AS SELECT * FROM {source_table} WHERE val > 2.0")
+cursor.execute(f"CREATE OR REPLACE TABLE {new_table} AS SELECT * FROM {source_table} WHERE val > 2.0")
```
```gherkin
And A table with columns (id INT NOT NULL, name VARCHAR(100), val NUMBER(10,2)) exists
When SHOW TABLES LIKE 'disc_show_describe' is executed
Then 1 row should be returned
When DESCRIBE TABLE e2e_discovery_test is executed
```
The scenario creates a table named disc_show_describe, but the step text says DESCRIBE TABLE e2e_discovery_test which implies a different object. Update the step wording (and matching test comment) to reference the table that is actually created so the feature remains an accurate executable spec.
Suggested change:

```diff
-When DESCRIBE TABLE e2e_discovery_test is executed
+When DESCRIBE TABLE disc_show_describe is executed
```
```gherkin
Then SHOW TABLES should return 0 rows
And SHOW VIEWS should return 0 rows
```
SHOW TABLES / SHOW VIEWS returning 0 rows is not deterministic in a shared schema (other tests may create objects). Consider scoping these steps to a specific object name (e.g., SHOW TABLES LIKE ... IN SCHEMA ...) to make the scenario reliably true and aligned with the Python implementation.
Suggested change:

```diff
-Then SHOW TABLES should return 0 rows
-And SHOW VIEWS should return 0 rows
+Then SHOW TABLES LIKE 'disc_show_describe' should return 0 rows
+And SHOW VIEWS LIKE 'disc_view_show' should return 0 rows
```
sfc-gh-turbaszek left a comment:
Imo we need to properly define the aim of these tests. Currently the majority of them test Snowflake behavior, not the driver. In my opinion we should focus on scenarios where the driver does something with the result. For example, the MERGE statements, where rowcount / rows affected are crucial for processing. We should look for similar scenarios where numDmlDuplicates is used.
```python
csv_path = tmp_path / "data.csv"
csv_path.write_text("1,Alice,3.14\n2,Bob,2.72\n3,Charlie,1.41\n")
```
Should we be creating the test file inline here? I think we already have a pattern for this in other tests.
```python
# Then LS @copy_test_stage should show 1 file
cursor.execute(f"LS @{stage_name}")
ls_results = cursor.fetchall()
assert len(ls_results) == 1
```
Let's assert this is the expected file.
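For instance, a small helper could pull file names out of the LS output. This assumes the usual LS shape where the first column of each row is the stage-relative path; the helper name is ours:

```python
def stage_file_names(ls_results):
    # Each LS row's first column is the stage-qualified path,
    # e.g. "copy_test_stage/data.csv"; keep only the bare file name.
    return [row[0].rsplit("/", 1)[-1] for row in ls_results]


# In the test, instead of only counting rows:
#   assert stage_file_names(ls_results) == ["data.csv"]
# (with AUTO_COMPRESS=FALSE the staged name keeps its original extension)
```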
```python
assert rows[0][0] == 1
assert rows[0][1] == "Alice"
assert rows[0][2] == pytest.approx(3.14)
assert rows[1][0] == 2
assert rows[1][1] == "Bob"
assert rows[1][2] == pytest.approx(2.72)
assert rows[2][0] == 3
assert rows[2][1] == "Charlie"
assert rows[2][2] == pytest.approx(1.41)
```
I would opt for `assert rows == [...]` so a failure shows what changed in the whole output.
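A sketch of the whole-list form, using `pytest.approx` elementwise so the float column still compares tolerantly. The literal rows mirror the inserted test data; `rows` stands in for the cursor result:

```python
import pytest

# Stand-in for cursor.fetchall() in the test above.
rows = [(1, "Alice", 3.14), (2, "Bob", 2.72), (3, "Charlie", 1.41)]

# One equality over the full result: a failure prints the entire
# actual-vs-expected diff instead of stopping at the first field.
expected = [
    (1, "Alice", pytest.approx(3.14)),
    (2, "Bob", pytest.approx(2.72)),
    (3, "Charlie", pytest.approx(1.41)),
]
assert rows == expected
```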
```python
def test_should_create_view_and_query_filtered_data(self, execute_query, cursor, tmp_schema):
    """Test CREATE VIEW and query filtered data."""
    # Given Snowflake client is logged in
    assert_connection_is_open(execute_query)
```
Is it necessary? I think @sfc-gh-fpawlowski was getting rid of it.
```python
cursor.execute(f"INSERT INTO {source_table} VALUES (1, 'first', 1.0), (2, 'second', 2.0), (3, 'third', 3.0)")

# When A view is created that filters rows where id > 1
cursor.execute(f"CREATE OR REPLACE VIEW {view_name} AS SELECT * FROM {source_table} WHERE id > 1")
```
What is special about a view here? Which CUJ do we check with this?
```python
)

# When SHOW TABLES LIKE 'meta_show' is executed
cursor.execute(f"SHOW TABLES LIKE 'meta_show' IN SCHEMA {tmp_schema}")
```
Seems to be a copy of https://github.com/snowflakedb/universal-driver/blob/main/python/tests/e2e/query/test_cuj.py; let's unify the CUJ testing.
```python
assert len(results) == 1, f"Expected 1 table, got {len(results)}"
# The table name appears in the result (typically in column index 1)
row = results[0]
assert "META_SHOW" in str(row).upper(), f"Table name not found in result: {row}"
```
Should we assert the proper schema of the result?
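One hedged way to pin the result shape is to assert on `cursor.description`, which per DB-API 2.0 exposes the column name as the first element of each entry. The helper name is ours, not repo code:

```python
def column_names(description):
    # DB-API 2.0: cursor.description is a sequence of 7-item sequences,
    # with the column name at index 0.
    return [col[0] for col in description]


# e.g. after SHOW TABLES:
#   assert "name" in [c.lower() for c in column_names(cursor.description)]
```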
```python
class TestObjectDiscovery:
    """Tests for object discovery operations."""

    def test_should_discover_table_via_show_and_describe(self, execute_query, cursor, tmp_schema):
```
How does this differ from TestMetadataCatalog?
```python
# Then 1 row should be returned
assert len(show_results) == 1, f"Expected 1 view, got {len(show_results)}"

def test_should_verify_objects_are_gone_after_drop(self, execute_query, cursor, tmp_schema):
```
This is testing Snowflake, not the driver.
```python
# Then SELECT CURRENT_ROLE() should return "PUBLIC"
cursor.execute("SELECT CURRENT_ROLE()")
result = cursor.fetchone()
assert result[0] == "PUBLIC"
```
How do we know the role has really changed?
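One way to make the assertion actually prove a change: capture the starting role and guard against the session already being on the target. The helper, its precondition, and the fake cursor below are our assumptions for illustration, not repo code:

```python
def switch_role_and_verify(cursor, target_role="PUBLIC"):
    # Record the starting role; if it already equals the target, the
    # final assertion would pass without any switch having happened.
    cursor.execute("SELECT CURRENT_ROLE()")
    original = cursor.fetchone()[0]
    assert original != target_role, "precondition: already using target role"

    cursor.execute(f"USE ROLE {target_role}")
    cursor.execute("SELECT CURRENT_ROLE()")
    assert cursor.fetchone()[0] == target_role
    return original  # so the test can restore it afterwards


# Minimal in-memory stand-in to show the flow without a live session:
class _FakeCursor:
    def __init__(self):
        self.role = "SYSADMIN"
        self._last = None

    def execute(self, sql):
        if sql.startswith("USE ROLE "):
            self.role = sql[len("USE ROLE "):]
        self._last = (self.role,)

    def fetchone(self):
        return self._last


fake = _FakeCursor()
restored_from = switch_role_and_verify(fake)
```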
|
Closing as outdated |
Adds test scenarios representing critical user journeys not covered by existing e2e tests. The scenarios are the result of LLM analysis of first-party clients (Snowflake CLI, Snowfort, Snowpark, SQLAlchemy) and of existing driver test cases.
The idea is to test the scenarios used by Python early adopters in other drivers as well.