Commit 30b1e5e

Feat: improve unit test failure reporting when rows are missing (#2338)
* Feat: improve unit test failure reporting when rows are missing
* Update docs
* PR feedback
1 parent 17c2c35 commit 30b1e5e

6 files changed

Lines changed: 218 additions & 98 deletions
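
The commit title refers to reporting missing rows more clearly. The implementation itself is not part of this excerpt, so the following is only a minimal sketch of the general idea in plain Python, not SQLMesh's code: expected and actual rows are compared as multisets, and rows that are missing or unexpected are listed. The names `diff_rows` and `format_report` are hypothetical, and the "Data mismatch" wording is borrowed from the updated docs output.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def _key(row: Row) -> Tuple[Tuple[str, Any], ...]:
    # Hashable representation of a row that ignores column order.
    return tuple(sorted(row.items()))


def diff_rows(expected: List[Row], actual: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (missing, unexpected) rows, counting duplicates."""
    exp = Counter(_key(r) for r in expected)
    act = Counter(_key(r) for r in actual)
    # Counter subtraction keeps only positive counts.
    missing = [dict(k) for k in (exp - act).elements()]
    unexpected = [dict(k) for k in (act - exp).elements()]
    return missing, unexpected


def format_report(expected: List[Row], actual: List[Row]) -> str:
    """Build a short, human-readable report (hypothetical format)."""
    missing, unexpected = diff_rows(expected, actual)
    if not missing and not unexpected:
        return "OK"
    lines = ["Data mismatch"]
    lines += [f"  missing row:    {row}" for row in missing]
    lines += [f"  unexpected row: {row}" for row in unexpected]
    return "\n".join(lines)
```

Treating rows as a multiset means a duplicated expected row that appears only once in the actual output is still reported as missing.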

docs/concepts/tests.md

Lines changed: 40 additions & 41 deletions
@@ -14,6 +14,7 @@ Tests within a suite file contain the following attributes:
 
 * The unique name of a test
 * The name of the model targeted by this test
+* [Optional] The test's description
 * Test inputs, which are defined per upstream model or external table referenced by the target model. Each test input consists of the following:
 * The name of an upstream model or external table
 * The list of rows defined as a mapping from a column name to a value associated with it
@@ -28,6 +29,7 @@ The YAML format is defined as follows:
 ```yaml linenums="1"
 <unique_test_name>:
 model: <target_model_name>
+description: <description> # Optional
 inputs:
 <upstream_model_or_external_table_name>:
 rows:
@@ -49,7 +51,7 @@ The YAML format is defined as follows:
 
 The `rows` key is optional in the above format, so the following would also be valid:
 
-```
+```yaml linenums="1"
 <unique_test_name>:
 model: <target_model_name>
 inputs:
@@ -97,7 +99,6 @@ SELECT
 FROM
 sqlmesh_example.incremental_model
 GROUP BY item_id
-ORDER BY item_id
 ```
 
 Notice how the query of the model definition above references one upstream model: `sqlmesh_example.incremental_model`.
@@ -109,16 +110,16 @@ test_example_full_model:
 model: sqlmesh_example.full_model
 inputs:
 sqlmesh_example.incremental_model:
-rows:
-- id: 1
-item_id: 1
-ds: '2020-01-01'
-- id: 2
-item_id: 1
-ds: '2020-01-02'
-- id: 3
-item_id: 2
-ds: '2020-01-03'
+rows:
+- id: 1
+item_id: 1
+event_date: '2020-01-01'
+- id: 2
+item_id: 1
+event_date: '2020-01-02'
+- id: 3
+item_id: 2
+event_date: '2020-01-03'
 outputs:
 query:
 rows:
@@ -128,7 +129,7 @@ test_example_full_model:
 num_orders: 1
 ```
 
-The `ds` column is not needed in the above test, since it is not referenced in `full_model`, so it may be omitted.
+The `event_date` column is not needed in the above test, since it is not referenced in `full_model`, so it may be omitted.
 
 If we were only interested in testing the `num_orders` column, we could only specify input values for the `id` column of `sqlmesh_example.incremental_model`, thus rewriting the above test more compactly as follows:
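
As a sanity check on the `test_example_full_model` rows in the preceding hunk, the aggregation can be emulated in plain Python. This is a sketch under the assumption that `full_model` selects `item_id` and `COUNT(DISTINCT id) AS num_orders` grouped by `item_id`, as the example project template in this commit suggests. The helper name `count_distinct_orders` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Input rows copied from the test; event_date is omitted because the
# query under test is assumed not to reference it.
input_rows = [
    {"id": 1, "item_id": 1},
    {"id": 2, "item_id": 1},
    {"id": 3, "item_id": 2},
]


def count_distinct_orders(rows: List[Dict[str, Any]]) -> List[Dict[str, int]]:
    """Emulate COUNT(DISTINCT id) ... GROUP BY item_id over in-memory rows."""
    ids_by_item = defaultdict(set)
    for row in rows:
        ids_by_item[row["item_id"]].add(row["id"])
    # Sorted only to make the result deterministic for comparison.
    return [
        {"item_id": item_id, "num_orders": len(ids)}
        for item_id, ids in sorted(ids_by_item.items())
    ]


expected_output = count_distinct_orders(input_rows)
print(expected_output)
```

The last row produced has `num_orders` equal to 1 for `item_id` 2, which agrees with the final expected output line visible in the diff.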
@@ -165,11 +166,10 @@ WITH filtered_orders_cte AS (
 )
 SELECT
 item_id,
-COUNT(distinct id) AS num_orders,
+COUNT(DISTINCT id) AS num_orders,
 FROM
 filtered_orders_cte
 GROUP BY item_id
-ORDER BY item_id
 ```
 
 Below is the example of a test that verifies individual rows returned by the `filtered_orders_cte` CTE before aggregation takes place:
@@ -182,13 +182,13 @@ test_example_full_model:
 rows:
 - id: 1
 item_id: 1
-ds: '2020-01-01'
+event_date: '2020-01-01'
 - id: 2
 item_id: 1
-ds: '2020-01-02'
+event_date: '2020-01-02'
 - id: 3
 item_id: 2
-ds: '2020-01-03'
+event_date: '2020-01-03'
 outputs:
 ctes:
 filtered_orders_cte:
@@ -217,22 +217,21 @@ In this example, we'll show how to generate a test for `sqlmesh_example.incremen
 MODEL (
 name sqlmesh_example.incremental_model,
 kind INCREMENTAL_BY_TIME_RANGE (
-time_column ds
+time_column event_date
 ),
 start '2020-01-01',
 cron '@daily',
-grain (id, ds)
+grain (id, event_date)
 );
 
 SELECT
 id,
 item_id,
-ds,
+event_date,
 FROM
 sqlmesh_example.seed_model
 WHERE
-ds between @start_ds and @end_ds
-
+event_date BETWEEN @start_date AND @end_date
 ```
 
 Firstly, we need to specify the input data for the upstream model `sqlmesh_example.seed_model`. The `create_test` command starts by executing a user-supplied query against the project's data warehouse and uses the returned data to produce the test's input rows.
@@ -243,22 +242,22 @@ For instance, the following query will return three rows from the table correspo
 SELECT * FROM sqlmesh_example.seed_model LIMIT 3
 ```
 
-Next, notice that `sqlmesh_example.incremental_model` contains a filter which references the `@start_ds` and `@end_ds` [macro variables](macros/macro_variables.md).
+Next, notice that `sqlmesh_example.incremental_model` contains a filter which references the `@start_date` and `@end_date` [macro variables](macros/macro_variables.md).
 
-To make the generated test deterministic and thus ensure that it will always succeed, we need to define these variables and modify the above query to constrain `ds` accordingly.
+To make the generated test deterministic and thus ensure that it will always succeed, we need to define these variables and modify the above query to constrain `event_date` accordingly.
 
-If we set `@start_ds` to `'2020-01-01'` and `@end_ds` to `'2020-01-04'`, the above query needs to be changed to:
+If we set `@start_date` to `'2020-01-01'` and `@end_date` to `'2020-01-04'`, the above query needs to be changed to:
 
 ```sql linenums="1"
-SELECT * FROM sqlmesh_example.seed_model WHERE ds BETWEEN '2020-01-01' AND '2020-01-04' LIMIT 3
+SELECT * FROM sqlmesh_example.seed_model WHERE event_date BETWEEN '2020-01-01' AND '2020-01-04' LIMIT 3
 ```
 
 Finally, combining this query with the proper macro variable definitions, we can compute the expected output for the model's query in order to generate the complete test.
 
 This can be achieved using the following command:
 
-```bash
-$ sqlmesh create_test sqlmesh_example.incremental_model --query sqlmesh_example.seed_model "select * from sqlmesh_example.seed_model where ds between '2020-01-01' and '2020-01-04' limit 3" --var start '2020-01-01' --var end '2020-01-04'
+```
+$ sqlmesh create_test sqlmesh_example.incremental_model --query sqlmesh_example.seed_model "SELECT * FROM sqlmesh_example.seed_model WHERE event_date BETWEEN '2020-01-01' AND '2020-01-04' LIMIT 3" --var start '2020-01-01' --var end '2020-01-04'
 ```
 
 Running this creates the following new test, located at `tests/test_incremental_model.yaml`:
@@ -270,32 +269,32 @@ test_incremental_model:
 sqlmesh_example.seed_model:
 - id: 1
 item_id: 2
-ds: '2020-01-01'
+event_date: 2020-01-01
 - id: 2
 item_id: 1
-ds: '2020-01-01'
+event_date: 2020-01-01
 - id: 3
 item_id: 3
-ds: '2020-01-03'
+event_date: 2020-01-03
 outputs:
 query:
 - id: 1
 item_id: 2
-ds: '2020-01-01'
+event_date: 2020-01-01
 - id: 2
 item_id: 1
-ds: '2020-01-01'
+event_date: 2020-01-01
 - id: 3
 item_id: 3
-ds: '2020-01-03'
+event_date: 2020-01-03
 vars:
 start: '2020-01-01'
 end: '2020-01-04'
 ```
 
 As shown below, we now have two passing tests. Hooray!
 
-```bash
+```
 $ sqlmesh test
 .
 ----------------------------------------------------------------------
@@ -314,7 +313,7 @@ Tests run automatically every time a new [plan](plans.md) is created.
 
 You can execute tests on demand using the `sqlmesh test` command as follows:
 
-```bash
+```
 $ sqlmesh test
 .
 ----------------------------------------------------------------------
@@ -325,13 +324,13 @@ OK
 
 The command returns a non-zero exit code if there are any failures, and reports them in the standard error stream:
 
-```bash
+```
 $ sqlmesh test
 F
 ======================================================================
 FAIL: test_example_full_model (test/tests/test_full_model.yaml)
 ----------------------------------------------------------------------
-AssertionError: Data differs (exp: expected, act: actual)
+AssertionError: Data mismatch (exp: expected, act: actual)
 
 num_orders
 exp act
@@ -349,12 +348,12 @@ Note: when there are many differing columns, the corresponding DataFrame will be
 
 To run a specific model test, pass in the suite file name followed by `::` and the name of the test:
 
-```bash
+```
 $ sqlmesh test tests/test_full_model.yaml::test_example_full_model
 ```
 
 You can also run tests that match a pattern or substring using a glob pathname expansion syntax:
 
-```bash
+```
 $ sqlmesh test tests/test_*
 ```
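
The docs in this diff state that `sqlmesh test` returns a non-zero exit code on failure and writes the report to standard error. A script could rely on that roughly as follows. This is a sketch, not part of SQLMesh: the helper `run_and_check` is a hypothetical name, and it works with any command line.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_check(cmd: List[str]) -> Tuple[bool, str]:
    """Run a command and return (passed, captured stderr text)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    # A zero exit code is treated as success; anything else as failure.
    return result.returncode == 0, result.stderr


# Demonstration with a stand-in command that fails and writes to stderr.
passed, report = run_and_check(
    [sys.executable, "-c", "import sys; sys.stderr.write('FAIL'); sys.exit(1)"]
)
print(passed, report)
```

Assuming `sqlmesh` is on the `PATH`, calling `run_and_check(["sqlmesh", "test", "tests/test_full_model.yaml::test_example_full_model"])` would run the single test shown above and return its failure report, if any.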

sqlmesh/cli/example_project.py

Lines changed: 2 additions & 2 deletions
@@ -70,7 +70,7 @@ def _gen_config(dialect: t.Optional[str], template: ProjectTemplate) -> str:
 
 SELECT
 item_id,
-count(distinct id) AS num_orders,
+COUNT(DISTINCT id) AS num_orders,
 FROM
 {EXAMPLE_INCREMENTAL_MODEL_NAME}
 GROUP BY item_id
@@ -93,7 +93,7 @@ def _gen_config(dialect: t.Optional[str], template: ProjectTemplate) -> str:
 FROM
 {EXAMPLE_SEED_MODEL_NAME}
 WHERE
-event_date between @start_date and @end_date
+event_date BETWEEN @start_date AND @end_date
 """
 
 EXAMPLE_SEED_MODEL_DEF = f"""MODEL (
