Commit 1b79ef6

Feat: Support name inference from schema directories (#2697)
1 parent cba8735 commit 1b79ef6

10 files changed: +201 -33 lines changed

docs/concepts/models/overview.md

Lines changed: 1 addition & 1 deletion
@@ -199,7 +199,7 @@ Learn more about these properties and their default values in the [model configu
199199
### name
200200
- `name` specifies the name of the model. This name represents the production view name that the model outputs, so it generally takes the form of `"schema"."view_name"`. The name of a model must be unique in a SQLMesh project.<br /><br />
201201
When models are used in non-production environments, SQLMesh automatically prefixes the names. For example, consider a model named `"sushi"."customers"`. In production its view is named `"sushi"."customers"`, and in dev its view is named `"sushi__dev"."customers"`.<br /><br />
202-
Name is ***required*** and must be ***unique***.
202+
Name is ***required*** and must be ***unique***, unless [name inference](../../reference/model_configuration.md#model-naming) is enabled.
203203

204204
### kind
205205
- Kind specifies what [kind](model_kinds.md) a model is. A model's kind determines how it is computed and stored. The default kind is `VIEW`, which means a view is created and your query is run each time that view is accessed. See [below](#incremental-model-properties) for properties that apply to incremental model kinds.

docs/guides/configuration.md

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -944,6 +944,35 @@ This example demonstrates how to specify an incremental by time range model kind
944944

945945
Learn more about specifying Python models at the [Python models concepts page](../concepts/models/python_models.md#model-specification).
946946

947+
948+
#### Model Naming
949+
950+
The `model_naming` configuration controls if model names are inferred based on the project's directory structure. If `model_naming` is not defined or `infer_names` is set to false, the model names must be provided explicitly.
951+
952+
With `infer_names` set to true, model names are inferred based on their path. For example, a model located at `models/catalog/schema/model.sql` would be named `catalog.schema.model`. However, if a name is provided in the model definition, it will take precedence over the inferred name.
953+
954+
Example enabling name inference:
955+
956+
=== "YAML"
957+
958+
```yaml linenums="1"
959+
model_naming:
960+
infer_names: true
961+
```
962+
963+
=== "Python"
964+
965+
```python linenums="1"
966+
from sqlmesh.core.config import Config, NameInferenceConfig
967+
968+
config = Config(
969+
model_naming=NameInferenceConfig(
970+
infer_names=True
971+
)
972+
)
973+
```
974+
975+
947976
### Debug mode
948977

949978
To enable debug mode set the `SQLMESH_DEBUG` environment variable to one of the following values: "1", "true", "t", "yes" or "y".
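
For the Model Naming section added to this guide, the precedence between an explicit name and an inferred one can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SQLMesh's loader: `resolve_model_name` and `infer_name_from_path` are hypothetical names, and the path slicing mirrors the `get_model_name` helper introduced in this commit.

```python
from pathlib import Path
from typing import Optional


def infer_name_from_path(path: Path) -> str:
    """Hypothetical stand-in for path-based inference (mirrors get_model_name)."""
    # Keep the directories below `models/` plus the file stem, at most three parts.
    parts = list(path.parts[path.parts.index("models") + 1 : -1]) + [path.stem]
    return ".".join(parts[-3:])


def resolve_model_name(
    explicit_name: Optional[str], path: Path, infer_names: bool = False
) -> str:
    """Hypothetical helper: an explicit name wins; otherwise infer if enabled."""
    name = explicit_name or ""
    if not name and infer_names:
        name = infer_name_from_path(path)
    if not name:
        # With inference disabled (the default), a missing name is an error.
        raise ValueError("Model must have a name")
    return name


model_path = Path("models/catalog/schema/model.sql")
print(resolve_model_name(None, model_path, infer_names=True))
print(resolve_model_name("sushi.customers", model_path, infer_names=True))
```

With `infer_names=True` the first call yields `catalog.schema.model`, matching the example in the guide, while the second call returns the explicit `sushi.customers` unchanged.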

docs/reference/model_configuration.md

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ Configuration options for SQLMesh model properties. Supported by all model kinds
1010

1111
| Option | Description | Type | Required |
1212
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :---------------: | :------: |
13-
| `name` | The model name. Must include at least a qualifying schema (`<schema>.<model>`) and may include a catalog (`<catalog>.<schema>.<model>`). | str | Y |
13+
| `name` | The model name. Must include at least a qualifying schema (`<schema>.<model>`) and may include a catalog (`<catalog>.<schema>.<model>`). Can be omitted if [infer_names](#model-naming) is set to true. | str | N |
1414
| `kind` | The model kind ([Additional Details](#model-kind-properties)). (Default: `VIEW`) | str \| dict | N |
1515
| `audits` | SQLMesh [audits](../concepts/audits.md) that should run against the model's output | array[str] | N |
1616
| `dialect` | The SQL dialect in which the model's query is written. All SQL dialects [supported by the SQLGlot library](https://github.com/tobymao/sqlglot/blob/main/sqlglot/dialects/dialect.py) are allowed. | str | N |
@@ -49,6 +49,16 @@ The SQLMesh project-level `model_defaults` key supports the following options, d
4949
- session_properties (on per key basis)
5050
- on_destructive_change (described [below](#incremental-models))
5151

52+
53+
### Model Naming
54+
55+
Configuration option for name inference. Learn more in the [model naming guide](../guides/configuration.md#model-naming).
56+
57+
| Option | Description | Type | Required |
58+
| --------------- | --------------------------------------------------------------------------------------- | :-----: | :------: |
59+
| `infer_names` | Whether to automatically infer model names based on the directory structure (Default: `False`) | bool | N |
60+
61+
5262
## Model kind properties
5363

5464
Configuration options for kind-specific SQLMesh model properties, in addition to the [general model properties](#general-model-properties) listed above.
@@ -155,7 +165,7 @@ Top-level options inside the MODEL DDL:
155165

156166
| Option | Description | Type | Required |
157167
| ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------: | :------: |
158-
| `name` | The model name. Must include at least a qualifying schema (`<schema>.<model>`) and may include a catalog (`<catalog>.<schema>.<model>`). | str | Y |
168+
| `name` | The model name. Must include at least a qualifying schema (`<schema>.<model>`) and may include a catalog (`<catalog>.<schema>.<model>`). Can be omitted if [infer_names](#model-naming) is set to true. | str | N |
159169
| `kind` | The model kind. Must be `SEED`. | str | Y |
160170
| `columns` | The column names and data types in the CSV file. Disables automatic inference of column names and types by the pandas CSV reader. NOTE: order of columns overrides the order specified in the CSV header row (if present). | array[str] | N |
161171
| `audits` | SQLMesh [audits](../concepts/audits.md) that should run against the model's output | array[str] | N |

sqlmesh/core/config/__init__.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@
2626
)
2727
from sqlmesh.core.config.migration import MigrationConfig as MigrationConfig
2828
from sqlmesh.core.config.model import ModelDefaultsConfig as ModelDefaultsConfig
29+
from sqlmesh.core.config.naming import NameInferenceConfig as NameInferenceConfig
2930
from sqlmesh.core.config.plan import PlanConfig as PlanConfig
3031
from sqlmesh.core.config.root import Config as Config
3132
from sqlmesh.core.config.run import RunConfig as RunConfig

sqlmesh/core/config/naming.py

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
from __future__ import annotations
2+
3+
from sqlmesh.core.config.base import BaseConfig
4+
5+
6+
class NameInferenceConfig(BaseConfig):
7+
"""Configuration for name inference of models from directory structure.
8+
9+
Args:
10+
infer_names: A flag indicating whether name inference is enabled.
11+
12+
"""
13+
14+
infer_names: bool = False

sqlmesh/core/config/root.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@
2626
from sqlmesh.core.config.gateway import GatewayConfig
2727
from sqlmesh.core.config.migration import MigrationConfig
2828
from sqlmesh.core.config.model import ModelDefaultsConfig
29+
from sqlmesh.core.config.naming import NameInferenceConfig as NameInferenceConfig
2930
from sqlmesh.core.config.plan import PlanConfig
3031
from sqlmesh.core.config.run import RunConfig
3132
from sqlmesh.core.config.scheduler import BuiltInSchedulerConfig, SchedulerConfig
@@ -111,6 +112,7 @@ class Config(BaseConfig):
111112
feature_flags: FeatureFlag = FeatureFlag()
112113
plan: PlanConfig = PlanConfig()
113114
migration: MigrationConfig = MigrationConfig()
115+
model_naming: NameInferenceConfig = NameInferenceConfig()
114116
variables: t.Dict[str, t.Any] = {}
115117
disable_anonymized_analytics: bool = False
116118

sqlmesh/core/loader.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -311,6 +311,7 @@ def _load() -> Model:
311311
project=config.project,
312312
default_catalog=self._context.default_catalog,
313313
variables=variables,
314+
infer_names=config.model_naming.infer_names,
314315
)
315316

316317
model = cache.get_or_load_model(path, _load)
@@ -354,6 +355,7 @@ def _load_python_models(self) -> UniqueKeyDict[str, Model]:
354355
project=config.project,
355356
default_catalog=self._context.default_catalog,
356357
variables=variables,
358+
infer_names=config.model_naming.infer_names,
357359
)
358360
models[model.fqn] = model
359361
finally:

sqlmesh/core/model/decorator.py

Lines changed: 28 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -3,13 +3,19 @@
33
import logging
44
import typing as t
55
from pathlib import Path
6+
import inspect
67

78
from sqlglot import exp
89
from sqlglot.dialects.dialect import DialectType
910

1011
from sqlmesh.core import constants as c
1112
from sqlmesh.core.dialect import MacroFunc
12-
from sqlmesh.core.model.definition import Model, create_python_model, create_sql_model
13+
from sqlmesh.core.model.definition import (
14+
Model,
15+
create_python_model,
16+
create_sql_model,
17+
get_model_name,
18+
)
1319
from sqlmesh.core.model.kind import ModelKindName, _ModelKind
1420
from sqlmesh.utils import registry_decorator
1521
from sqlmesh.utils.errors import ConfigError
@@ -24,26 +30,11 @@ class model(registry_decorator):
2430
registry_name = "python_models"
2531
_dialect: DialectType = None
2632

27-
def __init__(self, name: str, is_sql: bool = False, **kwargs: t.Any) -> None:
28-
if not name:
29-
raise ConfigError("Python model must have a name.")
30-
33+
def __init__(self, name: t.Optional[str] = None, is_sql: bool = False, **kwargs: t.Any) -> None:
3134
if not is_sql and "columns" not in kwargs:
3235
raise ConfigError("Python model must define column schema.")
3336

34-
kind = kwargs.get("kind", None)
35-
if kind is not None:
36-
if isinstance(kind, _ModelKind):
37-
logger.warning(
38-
f"""Python model "{name}"'s `kind` argument was passed a SQLMesh `{type(kind).__name__}` object. This may result in unexpected behavior - provide a dictionary instead."""
39-
)
40-
elif isinstance(kind, dict):
41-
if "name" not in kind or not isinstance(kind.get("name"), ModelKindName):
42-
raise ConfigError(
43-
f"""Python model "{name}"'s `kind` dictionary must contain a `name` key with a valid ModelKindName enum value."""
44-
)
45-
46-
self.name = name
37+
self.name = name or ""
4738
self.is_sql = is_sql
4839
self.kwargs = kwargs
4940

@@ -90,11 +81,30 @@ def model(
9081
project: str = "",
9182
default_catalog: t.Optional[str] = None,
9283
variables: t.Optional[t.Dict[str, t.Any]] = None,
84+
infer_names: t.Optional[bool] = False,
9385
) -> Model:
9486
"""Get the model registered by this function."""
9587
env: t.Dict[str, t.Any] = {}
9688
entrypoint = self.func.__name__
9789

90+
if not self.name and infer_names:
91+
self.name = get_model_name(Path(inspect.getfile(self.func)))
92+
93+
if not self.name:
94+
raise ConfigError("Python model must have a name.")
95+
96+
kind = self.kwargs.get("kind", None)
97+
if kind is not None:
98+
if isinstance(kind, _ModelKind):
99+
logger.warning(
100+
f"""Python model "{self.name}"'s `kind` argument was passed a SQLMesh `{type(kind).__name__}` object. This may result in unexpected behavior - provide a dictionary instead."""
101+
)
102+
elif isinstance(kind, dict):
103+
if "name" not in kind or not isinstance(kind.get("name"), ModelKindName):
104+
raise ConfigError(
105+
f"""Python model "{self.name}"'s `kind` dictionary must contain a `name` key with a valid ModelKindName enum value."""
106+
)
107+
98108
build_env(self.func, env=env, name=entrypoint, path=module_path)
99109

100110
common_kwargs = dict(
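
The decorator change above moves the name check out of `__init__` and into `model()`, because the inferred name is only known once the loader passes `infer_names`. A simplified sketch of that deferred validation, assuming a hypothetical `LazyNamedModel` class and `build` method that are not part of SQLMesh:

```python
from typing import Any, Callable, Optional


class LazyNamedModel:
    """Hypothetical, simplified stand-in for the `model` decorator."""

    def __init__(self, name: Optional[str] = None, **kwargs: Any) -> None:
        # No validation here: the name may still be inferred later.
        self.name = name or ""
        self.kwargs = kwargs
        self.func: Optional[Callable[..., Any]] = None

    def __call__(self, func: Callable[..., Any]) -> "LazyNamedModel":
        self.func = func
        return self

    def build(self, inferred_name: str = "", infer_names: bool = False) -> str:
        # Validation happens at build time, after inference has had a chance to run.
        if not self.name and infer_names:
            self.name = inferred_name
        if not self.name:
            raise ValueError("Python model must have a name.")
        return self.name


@LazyNamedModel(columns={"col": "int"})
def my_model(context: Any) -> None:
    pass


print(my_model.build(inferred_name="test_schema.test_model", infer_names=True))
```

Decorating a function without a name no longer fails immediately. The error surfaces only when the model is built with inference disabled, which is why the tests in this commit now trigger the `kind` checks through `.model(...)` rather than at decoration time.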

sqlmesh/core/model/definition.py

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1404,6 +1404,7 @@ def load_sql_based_model(
14041404
physical_schema_override: t.Optional[t.Dict[str, str]] = None,
14051405
default_catalog: t.Optional[str] = None,
14061406
variables: t.Optional[t.Dict[str, t.Any]] = None,
1407+
infer_names: t.Optional[bool] = False,
14071408
**kwargs: t.Any,
14081409
) -> Model:
14091410
"""Load a model from a parsed SQLMesh model SQL file.
@@ -1491,7 +1492,11 @@ def load_sql_based_model(
14911492
if isinstance(meta_fields.get("dialect"), exp.Expression):
14921493
meta_fields["dialect"] = meta_fields["dialect"].name
14931494

1495+
# The name of the model will be inferred from its path relative to `models/`, if it's not explicitly specified
14941496
name = meta_fields.pop("name", "")
1497+
if not name and infer_names:
1498+
name = get_model_name(path)
1499+
14951500
if not name:
14961501
raise_config_error("Model must have a name", path)
14971502
if "default_catalog" in meta_fields:
@@ -2122,3 +2127,8 @@ def _refs_to_sql(values: t.Any) -> exp.Expression:
21222127
"allow_partials": exp.convert,
21232128
"signals": lambda values: exp.Tuple(expressions=values),
21242129
}
2130+
2131+
2132+
def get_model_name(path: Path) -> str:
2133+
path_parts = list(path.parts[path.parts.index("models") + 1 : -1]) + [path.stem]
2134+
return ".".join(path_parts[-3:])
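
To make the slicing in `get_model_name` concrete, the standalone copy below runs the helper against the paths used in this commit's tests. The function body is taken from the diff; the `examples` list and the loop are illustrative additions.

```python
from pathlib import Path


def get_model_name(path: Path) -> str:
    # Directories after the first `models` component, plus the file stem.
    path_parts = list(path.parts[path.parts.index("models") + 1 : -1]) + [path.stem]
    # Keep at most the last three parts: catalog, schema, model.
    return ".".join(path_parts[-3:])


# Illustrative inputs, taken from the parametrized tests in this commit.
examples = [
    Path("models/test_model.sql"),                            # test_model
    Path("models/test_schema/test_model.sql"),                # test_schema.test_model
    Path("models/inventory/db/test_schema/test_model.sql"),   # db.test_schema.test_model
]

for example in examples:
    print(f"{example} -> {get_model_name(example)}")
```

Two behaviors follow from the code as written: directories nested more than three levels below `models/` are truncated to the last three parts, and a path with no `models` component raises `ValueError` from `tuple.index`.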

tests/core/test_model.py

Lines changed: 102 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -11,15 +11,20 @@
1111
from pytest_mock.plugin import MockerFixture
1212
from sqlglot import exp, parse_one
1313
from sqlglot.schema import MappingSchema
14+
from sqlmesh.cli.example_project import init_example_project
1415

1516
from sqlmesh.core import constants as c
1617
from sqlmesh.core import dialect as d
17-
from sqlmesh.core.config import Config
18-
from sqlmesh.core.config.model import ModelDefaultsConfig
18+
from sqlmesh.core.config import (
19+
Config,
20+
NameInferenceConfig,
21+
ModelDefaultsConfig,
22+
)
1923
from sqlmesh.core.context import Context, ExecutionContext
2024
from sqlmesh.core.dialect import parse
2125
from sqlmesh.core.macros import MacroEvaluator, macro
2226
from sqlmesh.core.model import (
27+
PythonModel,
2328
FullKind,
2429
IncrementalByTimeRangeKind,
2530
IncrementalUnmanagedKind,
@@ -1692,26 +1697,38 @@ def b_model(context):
16921697

16931698
assert isinstance(python_model.kind, FullKind)
16941699

1700+
@model("kind_empty_dict", kind=dict(), columns={'"COL"': "int"})
1701+
def my_model(context):
1702+
pass
1703+
16951704
# error if kind dict with no `name` key
16961705
with pytest.raises(ConfigError, match="`kind` dictionary must contain a `name` key"):
1706+
python_model = model.get_registry()["kind_empty_dict"].model(
1707+
module_path=Path("."),
1708+
path=Path("."),
1709+
)
16971710

1698-
@model("kind_empty_dict", kind=dict(), columns={'"COL"': "int"})
1699-
def my_model(context):
1700-
pass
1711+
@model("kind_dict_badname", kind=dict(name="test"), columns={'"COL"': "int"})
1712+
def my_model_1(context):
1713+
pass
17011714

17021715
# error if kind dict with `name` key whose type is not a ModelKindName enum
17031716
with pytest.raises(ConfigError, match="with a valid ModelKindName enum value"):
1717+
python_model = model.get_registry()["kind_dict_badname"].model(
1718+
module_path=Path("."),
1719+
path=Path("."),
1720+
)
17041721

1705-
@model("kind_dict_badname", kind=dict(name="test"), columns={'"COL"': "int"})
1706-
def my_model(context):
1707-
pass
1722+
@model("kind_instance", kind=FullKind(), columns={'"COL"': "int"})
1723+
def my_model_2(context):
1724+
pass
17081725

17091726
# warning if kind is ModelKind instance
17101727
with patch.object(logger, "warning") as mock_logger:
1711-
1712-
@model("kind_instance", kind=FullKind(), columns={'"COL"': "int"})
1713-
def my_model(context):
1714-
pass
1728+
python_model = model.get_registry()["kind_instance"].model(
1729+
module_path=Path("."),
1730+
path=Path("."),
1731+
)
17151732

17161733
assert (
17171734
mock_logger.call_args[0][0]
@@ -4450,3 +4467,76 @@ def test_incremental_by_partition(sushi_context, assert_exp_eq):
44504467
"""
44514468
)
44524469
load_sql_based_model(expressions)
4470+
4471+
4472+
@pytest.mark.parametrize(
4473+
["model_def", "path", "expected_name"],
4474+
[
4475+
[
4476+
"""dialect duckdb,""",
4477+
"""models/test_schema/test_model.sql,""",
4478+
"test_schema.test_model",
4479+
],
4480+
[
4481+
"""dialect duckdb,""",
4482+
"""models/test_model.sql,""",
4483+
"test_model",
4484+
],
4485+
[
4486+
"""dialect duckdb,""",
4487+
"""models/inventory/db/test_schema/test_model.sql,""",
4488+
"db.test_schema.test_model",
4489+
],
4490+
["""name test_model,""", """models/schema/test_model.sql,""", "test_model"],
4491+
],
4492+
)
4493+
def test_model_table_name_inference(
4494+
sushi_context: Context, model_def: str, path: str, expected_name: str
4495+
):
4496+
model = load_sql_based_model(
4497+
d.parse(
4498+
f"""
4499+
MODEL (
4500+
{model_def}
4501+
);
4502+
SELECT a FROM tbl;
4503+
""",
4504+
default_dialect="duckdb",
4505+
),
4506+
path=Path(f"$root/{path}"),
4507+
infer_names=True,
4508+
)
4509+
assert model.name == expected_name
4510+
4511+
4512+
@pytest.mark.parametrize(
4513+
["path", "expected_name"],
4514+
[
4515+
[
4516+
"""models/test_schema/test_model.py""",
4517+
"test_schema.test_model",
4518+
],
4519+
[
4520+
"""models/inventory/db/test_schema/test_model.py""",
4521+
"db.test_schema.test_model",
4522+
],
4523+
],
4524+
)
4525+
def test_python_model_name_inference(tmp_path: Path, path: str, expected_name: str) -> None:
4526+
init_example_project(tmp_path, dialect="duckdb")
4527+
config = Config(
4528+
model_defaults=ModelDefaultsConfig(dialect="duckdb"),
4529+
model_naming=NameInferenceConfig(infer_names=True),
4530+
)
4531+
4532+
foo_py_file = tmp_path / path
4533+
foo_py_file.parent.mkdir(parents=True, exist_ok=True)
4534+
foo_py_file.write_text("""from sqlmesh import model
4535+
@model(
4536+
columns={'"COL"': "int"},
4537+
)
4538+
def my_model(context, **kwargs):
4539+
pass""")
4540+
context = Context(paths=tmp_path, config=config)
4541+
assert context.get_model(expected_name).name == expected_name
4542+
assert isinstance(context.get_model(expected_name), PythonModel)
