Feat: Support name inference from schema directories (#2697)

themisvaltinos · web-flow · commit 1b79ef6c7d37 · 2024-06-04T13:25:08.000+03:00
diff --git a/docs/concepts/models/overview.md b/docs/concepts/models/overview.md
@@ -199,7 +199,7 @@ Learn more about these properties and their default values in the [model configu
 ### name
 - `name` specifies the name of the model. This name represents the production view name that the model outputs, so it generally takes the form of `"schema"."view_name"`. The name of a model must be unique in a SQLMesh project.<br /><br />
 When models are used in non-production environments, SQLMesh automatically prefixes the names. For example, consider a model named `"sushi"."customers"`. In production its view is named `"sushi"."customers"`, and in dev its view is named `"sushi__dev"."customers"`.<br /><br />
-Name is ***required*** and must be ***unique***.
+Name must be ***unique***. It is ***required*** unless [name inference](../../reference/model_configuration.md#model-naming) is enabled.
 
 ### kind
 - Kind specifies what [kind](model_kinds.md) a model is. A model's kind determines how it is computed and stored. The default kind is `VIEW`, which means a view is created and your query is run each time that view is accessed. See [below](#incremental-model-properties) for properties that apply to incremental model kinds.
diff --git a/docs/guides/configuration.md b/docs/guides/configuration.md
@@ -944,6 +944,35 @@ This example demonstrates how to specify an incremental by time range model kind
 
 Learn more about specifying Python models at the [Python models concepts page](../concepts/models/python_models.md#model-specification).
 
+
+#### Model Naming
+
+The `model_naming` configuration controls whether model names are inferred from the project's directory structure. If `model_naming` is not defined or `infer_names` is set to false, model names must be provided explicitly.
+
+With `infer_names` set to true, model names are inferred from their path relative to the `models/` directory. For example, a model located at `models/catalog/schema/model.sql` is named `catalog.schema.model`. Only the last three path components are used, so any deeper parent directories are ignored. If a name is provided in the model definition, it takes precedence over the inferred name.
+
+Example enabling name inference:
+
+=== "YAML"
+
+    ```yaml linenums="1"
+    model_naming:
+      infer_names: true
+    ```
+
+=== "Python"
+
+    ```python linenums="1"
+    from sqlmesh.core.config import Config, NameInferenceConfig
+
+    config = Config(
+        model_naming=NameInferenceConfig(
+            infer_names=True
+        )
+    )
+    ```
+
+
 ### Debug mode
 
 To enable debug mode set the `SQLMESH_DEBUG` environment variable to one of the following values: "1", "true", "t", "yes" or "y".
diff --git a/docs/reference/model_configuration.md b/docs/reference/model_configuration.md
@@ -10,7 +10,7 @@ Configuration options for SQLMesh model properties. Supported by all model kinds
 
 | Option                | Description                                                                                                                                                                                                                                                                                      |       Type        | Required |
 | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :---------------: | :------: |
-| `name`                | The model name. Must include at least a qualifying schema (`<schema>.<model>`) and may include a catalog (`<catalog>.<schema>.<model>`).                                                                                                                                                         |        str        |    Y     |
+| `name`                | The model name. Must include at least a qualifying schema (`<schema>.<model>`) and may include a catalog (`<catalog>.<schema>.<model>`). Can be omitted if [infer_names](#model-naming) is set to true.                                                                                                                                                        |        str        |    N     |
 | `kind`                | The model kind ([Additional Details](#model-kind-properties)). (Default: `VIEW`)                                                                                                                                                                                                                 |    str \| dict    |    N     |
 | `audits`              | SQLMesh [audits](../concepts/audits.md) that should run against the model's output                                                                                                                                                                                                               |    array[str]     |    N     |
 | `dialect`             | The SQL dialect in which the model's query is written. All SQL dialects [supported by the SQLGlot library](https://github.com/tobymao/sqlglot/blob/main/sqlglot/dialects/dialect.py) are allowed.                                                                                                |        str        |    N     |
@@ -49,6 +49,16 @@ The SQLMesh project-level `model_defaults` key supports the following options, d
 - session_properties (on per key basis)
 - on_destructive_change (described [below](#incremental-models))
 
+
+### Model Naming
+
+Configuration option for name inference. Learn more in the [model naming guide](../guides/configuration.md#model-naming).
+
+| Option          | Description                                                                             |  Type   | Required |
+| --------------- | --------------------------------------------------------------------------------------- | :-----: | :------: |
+| `infer_names`   | Whether to automatically infer model names based on the directory structure (Default: `False`) | bool |    N     |
+
+
 ## Model kind properties
 
 Configuration options for kind-specific SQLMesh model properties, in addition to the [general model properties](#general-model-properties) listed above.
@@ -155,7 +165,7 @@ Top-level options inside the MODEL DDL:
 
 | Option        | Description                                                                                                                                                                                                                |    Type    | Required |
 | ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------: | :------: |
-| `name`        | The model name. Must include at least a qualifying schema (`<schema>.<model>`) and may include a catalog (`<catalog>.<schema>.<model>`).                                                                                   |    str     |    Y     |
+| `name`        | The model name. Must include at least a qualifying schema (`<schema>.<model>`) and may include a catalog (`<catalog>.<schema>.<model>`). Can be omitted if [infer_names](#model-naming) is set to true.                                                                                |    str     |    N     |
 | `kind`        | The model kind. Must be `SEED`.                                                                                                                                                                                            |    str     |    Y     |
 | `columns`     | The column names and data types in the CSV file. Disables automatic inference of column names and types by the pandas CSV reader. NOTE: order of columns overrides the order specified in the CSV header row (if present). | array[str] |    N     |
 | `audits`      | SQLMesh [audits](../concepts/audits.md) that should run against the model's output                                                                                                                                         | array[str] |    N     |
diff --git a/sqlmesh/core/config/__init__.py b/sqlmesh/core/config/__init__.py
@@ -26,6 +26,7 @@
 )
 from sqlmesh.core.config.migration import MigrationConfig as MigrationConfig
 from sqlmesh.core.config.model import ModelDefaultsConfig as ModelDefaultsConfig
+from sqlmesh.core.config.naming import NameInferenceConfig as NameInferenceConfig
 from sqlmesh.core.config.plan import PlanConfig as PlanConfig
 from sqlmesh.core.config.root import Config as Config
 from sqlmesh.core.config.run import RunConfig as RunConfig
diff --git a/sqlmesh/core/config/naming.py b/sqlmesh/core/config/naming.py
@@ -0,0 +1,14 @@
+from __future__ import annotations
+
+from sqlmesh.core.config.base import BaseConfig
+
+
+class NameInferenceConfig(BaseConfig):
+    """Configuration for name inference of models from directory structure.
+
+    Args:
+        infer_names: A flag indicating whether name inference is enabled.
+
+    """
+
+    infer_names: bool = False
diff --git a/sqlmesh/core/config/root.py b/sqlmesh/core/config/root.py
@@ -26,6 +26,7 @@
 from sqlmesh.core.config.gateway import GatewayConfig
 from sqlmesh.core.config.migration import MigrationConfig
 from sqlmesh.core.config.model import ModelDefaultsConfig
+from sqlmesh.core.config.naming import NameInferenceConfig
 from sqlmesh.core.config.plan import PlanConfig
 from sqlmesh.core.config.run import RunConfig
 from sqlmesh.core.config.scheduler import BuiltInSchedulerConfig, SchedulerConfig
@@ -111,6 +112,7 @@ class Config(BaseConfig):
     feature_flags: FeatureFlag = FeatureFlag()
     plan: PlanConfig = PlanConfig()
     migration: MigrationConfig = MigrationConfig()
+    model_naming: NameInferenceConfig = NameInferenceConfig()
     variables: t.Dict[str, t.Any] = {}
     disable_anonymized_analytics: bool = False
 
diff --git a/sqlmesh/core/loader.py b/sqlmesh/core/loader.py
@@ -311,6 +311,7 @@ def _load() -> Model:
                         project=config.project,
                         default_catalog=self._context.default_catalog,
                         variables=variables,
+                        infer_names=config.model_naming.infer_names,
                     )
 
                 model = cache.get_or_load_model(path, _load)
@@ -354,6 +355,7 @@ def _load_python_models(self) -> UniqueKeyDict[str, Model]:
                             project=config.project,
                             default_catalog=self._context.default_catalog,
                             variables=variables,
+                            infer_names=config.model_naming.infer_names,
                         )
                         models[model.fqn] = model
             finally:
diff --git a/sqlmesh/core/model/decorator.py b/sqlmesh/core/model/decorator.py
@@ -3,13 +3,19 @@
 import logging
 import typing as t
 from pathlib import Path
+import inspect
 
 from sqlglot import exp
 from sqlglot.dialects.dialect import DialectType
 
 from sqlmesh.core import constants as c
 from sqlmesh.core.dialect import MacroFunc
-from sqlmesh.core.model.definition import Model, create_python_model, create_sql_model
+from sqlmesh.core.model.definition import (
+    Model,
+    create_python_model,
+    create_sql_model,
+    get_model_name,
+)
 from sqlmesh.core.model.kind import ModelKindName, _ModelKind
 from sqlmesh.utils import registry_decorator
 from sqlmesh.utils.errors import ConfigError
@@ -24,26 +30,11 @@ class model(registry_decorator):
     registry_name = "python_models"
     _dialect: DialectType = None
 
-    def __init__(self, name: str, is_sql: bool = False, **kwargs: t.Any) -> None:
-        if not name:
-            raise ConfigError("Python model must have a name.")
-
+    def __init__(self, name: t.Optional[str] = None, is_sql: bool = False, **kwargs: t.Any) -> None:
         if not is_sql and "columns" not in kwargs:
             raise ConfigError("Python model must define column schema.")
 
-        kind = kwargs.get("kind", None)
-        if kind is not None:
-            if isinstance(kind, _ModelKind):
-                logger.warning(
-                    f"""Python model "{name}"'s `kind` argument was passed a SQLMesh `{type(kind).__name__}` object. This may result in unexpected behavior - provide a dictionary instead."""
-                )
-            elif isinstance(kind, dict):
-                if "name" not in kind or not isinstance(kind.get("name"), ModelKindName):
-                    raise ConfigError(
-                        f"""Python model "{name}"'s `kind` dictionary must contain a `name` key with a valid ModelKindName enum value."""
-                    )
-
-        self.name = name
+        self.name = name or ""
         self.is_sql = is_sql
         self.kwargs = kwargs
 
@@ -90,11 +81,30 @@ def model(
         project: str = "",
         default_catalog: t.Optional[str] = None,
         variables: t.Optional[t.Dict[str, t.Any]] = None,
+        infer_names: t.Optional[bool] = False,
     ) -> Model:
         """Get the model registered by this function."""
         env: t.Dict[str, t.Any] = {}
         entrypoint = self.func.__name__
 
+        if not self.name and infer_names:
+            self.name = get_model_name(Path(inspect.getfile(self.func)))
+
+        if not self.name:
+            raise ConfigError("Python model must have a name.")
+
+        kind = self.kwargs.get("kind", None)
+        if kind is not None:
+            if isinstance(kind, _ModelKind):
+                logger.warning(
+                    f"""Python model "{self.name}"'s `kind` argument was passed a SQLMesh `{type(kind).__name__}` object. This may result in unexpected behavior - provide a dictionary instead."""
+                )
+            elif isinstance(kind, dict):
+                if "name" not in kind or not isinstance(kind.get("name"), ModelKindName):
+                    raise ConfigError(
+                        f"""Python model "{self.name}"'s `kind` dictionary must contain a `name` key with a valid ModelKindName enum value."""
+                    )
+
         build_env(self.func, env=env, name=entrypoint, path=module_path)
 
         common_kwargs = dict(
diff --git a/sqlmesh/core/model/definition.py b/sqlmesh/core/model/definition.py
@@ -1404,6 +1404,7 @@ def load_sql_based_model(
     physical_schema_override: t.Optional[t.Dict[str, str]] = None,
     default_catalog: t.Optional[str] = None,
     variables: t.Optional[t.Dict[str, t.Any]] = None,
+    infer_names: t.Optional[bool] = False,
     **kwargs: t.Any,
 ) -> Model:
     """Load a model from a parsed SQLMesh model SQL file.
@@ -1491,7 +1492,11 @@ def load_sql_based_model(
     if isinstance(meta_fields.get("dialect"), exp.Expression):
         meta_fields["dialect"] = meta_fields["dialect"].name
 
+    # The name of the model will be inferred from its path relative to `models/`, if it's not explicitly specified
     name = meta_fields.pop("name", "")
+    if not name and infer_names:
+        name = get_model_name(path)
+
     if not name:
         raise_config_error("Model must have a name", path)
     if "default_catalog" in meta_fields:
@@ -2122,3 +2127,9 @@ def _refs_to_sql(values: t.Any) -> exp.Expression:
     "allow_partials": exp.convert,
     "signals": lambda values: exp.Tuple(expressions=values),
 }
+
+
+def get_model_name(path: Path) -> str:
+    """Infer a model name from the last (up to three) path components after `models/`."""
+    path_parts = list(path.parts[path.parts.index("models") + 1 : -1]) + [path.stem]
+    return ".".join(path_parts[-3:])
diff --git a/tests/core/test_model.py b/tests/core/test_model.py
@@ -11,15 +11,20 @@
 from pytest_mock.plugin import MockerFixture
 from sqlglot import exp, parse_one
 from sqlglot.schema import MappingSchema
+from sqlmesh.cli.example_project import init_example_project
 
 from sqlmesh.core import constants as c
 from sqlmesh.core import dialect as d
-from sqlmesh.core.config import Config
-from sqlmesh.core.config.model import ModelDefaultsConfig
+from sqlmesh.core.config import (
+    Config,
+    NameInferenceConfig,
+    ModelDefaultsConfig,
+)
 from sqlmesh.core.context import Context, ExecutionContext
 from sqlmesh.core.dialect import parse
 from sqlmesh.core.macros import MacroEvaluator, macro
 from sqlmesh.core.model import (
+    PythonModel,
     FullKind,
     IncrementalByTimeRangeKind,
     IncrementalUnmanagedKind,
@@ -1692,26 +1697,38 @@ def b_model(context):
 
     assert isinstance(python_model.kind, FullKind)
 
+    @model("kind_empty_dict", kind=dict(), columns={'"COL"': "int"})
+    def my_model(context):
+        pass
+
     # error if kind dict with no `name` key
     with pytest.raises(ConfigError, match="`kind` dictionary must contain a `name` key"):
+        python_model = model.get_registry()["kind_empty_dict"].model(
+            module_path=Path("."),
+            path=Path("."),
+        )
 
-        @model("kind_empty_dict", kind=dict(), columns={'"COL"': "int"})
-        def my_model(context):
-            pass
+    @model("kind_dict_badname", kind=dict(name="test"), columns={'"COL"': "int"})
+    def my_model_1(context):
+        pass
 
     # error if kind dict with `name` key whose type is not a ModelKindName enum
     with pytest.raises(ConfigError, match="with a valid ModelKindName enum value"):
+        python_model = model.get_registry()["kind_dict_badname"].model(
+            module_path=Path("."),
+            path=Path("."),
+        )
 
-        @model("kind_dict_badname", kind=dict(name="test"), columns={'"COL"': "int"})
-        def my_model(context):
-            pass
+    @model("kind_instance", kind=FullKind(), columns={'"COL"': "int"})
+    def my_model_2(context):
+        pass
 
     # warning if kind is ModelKind instance
     with patch.object(logger, "warning") as mock_logger:
-
-        @model("kind_instance", kind=FullKind(), columns={'"COL"': "int"})
-        def my_model(context):
-            pass
+        python_model = model.get_registry()["kind_instance"].model(
+            module_path=Path("."),
+            path=Path("."),
+        )
 
         assert (
             mock_logger.call_args[0][0]
@@ -4450,3 +4467,76 @@ def test_incremental_by_partition(sushi_context, assert_exp_eq):
             """
         )
         load_sql_based_model(expressions)
+
+
+@pytest.mark.parametrize(
+    ["model_def", "path", "expected_name"],
+    [
+        [
+            """dialect duckdb,""",
+            """models/test_schema/test_model.sql""",
+            "test_schema.test_model",
+        ],
+        [
+            """dialect duckdb,""",
+            """models/test_model.sql""",
+            "test_model",
+        ],
+        [
+            """dialect duckdb,""",
+            """models/inventory/db/test_schema/test_model.sql""",
+            "db.test_schema.test_model",
+        ],
+        ["""name test_model,""", """models/schema/test_model.sql""", "test_model"],
+    ],
+)
+def test_model_table_name_inference(
+    sushi_context: Context, model_def: str, path: str, expected_name: str
+):
+    model = load_sql_based_model(
+        d.parse(
+            f"""
+        MODEL (
+            {model_def}
+        );
+        SELECT a FROM tbl;
+        """,
+            default_dialect="duckdb",
+        ),
+        path=Path(f"$root/{path}"),
+        infer_names=True,
+    )
+    assert model.name == expected_name
+
+
+@pytest.mark.parametrize(
+    ["path", "expected_name"],
+    [
+        [
+            """models/test_schema/test_model.py""",
+            "test_schema.test_model",
+        ],
+        [
+            """models/inventory/db/test_schema/test_model.py""",
+            "db.test_schema.test_model",
+        ],
+    ],
+)
+def test_python_model_name_inference(tmp_path: Path, path: str, expected_name: str) -> None:
+    init_example_project(tmp_path, dialect="duckdb")
+    config = Config(
+        model_defaults=ModelDefaultsConfig(dialect="duckdb"),
+        model_naming=NameInferenceConfig(infer_names=True),
+    )
+
+    foo_py_file = tmp_path / path
+    foo_py_file.parent.mkdir(parents=True, exist_ok=True)
+    foo_py_file.write_text("""from sqlmesh import model
+@model(
+    columns={'"COL"': "int"},
+)
+def my_model(context, **kwargs):
+    pass""")
+    context = Context(paths=tmp_path, config=config)
+    assert context.get_model(expected_name).name == expected_name
+    assert isinstance(context.get_model(expected_name), PythonModel)
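
The inference rule in this change can be tried outside SQLMesh with a small standalone sketch. It mirrors the slicing logic of `get_model_name` and the "explicit name wins" check from `load_sql_based_model`, but the function names `infer_model_name` and `resolve_name`, the `models_dir` parameter, and the explicit error messages are assumptions made for illustration. They are not part of the SQLMesh API.

```python
from pathlib import Path


def infer_model_name(path: Path, models_dir: str = "models") -> str:
    """Hypothetical stand-in for the diff's `get_model_name`.

    Takes the directories after `models_dir`, appends the file stem,
    and joins at most the last three components with dots.
    """
    parts = path.parts
    if models_dir not in parts:
        # Assumption: fail with a readable message. The diff's version would
        # also raise ValueError here, via tuple.index.
        raise ValueError(f"'{models_dir}' is not a component of {path}")
    start = parts.index(models_dir) + 1
    name_parts = list(parts[start:-1]) + [path.stem]
    return ".".join(name_parts[-3:])


def resolve_name(explicit_name: str, path: Path, infer_names: bool) -> str:
    """Hypothetical sketch of the precedence: explicit name, then inferred name, else error."""
    name = explicit_name
    if not name and infer_names:
        name = infer_model_name(path)
    if not name:
        raise ValueError("Model must have a name")
    return name


if __name__ == "__main__":
    # Same paths as the parametrized test cases in the diff.
    print(infer_model_name(Path("models/test_schema/test_model.sql")))
    print(infer_model_name(Path("models/inventory/db/test_schema/test_model.sql")))
    print(resolve_name("test_model", Path("models/schema/test_model.sql"), True))
```

Because only the last three components are kept, a file nested more than two directories below `models/` loses its outermost directories, which matches the `models/inventory/db/test_schema/test_model.sql` test case above.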
