From 40ade4626401b52cfa55884e133a1ab9c29b6c83 Mon Sep 17 00:00:00 2001 From: Seth Fitzsimmons Date: Wed, 11 Feb 2026 14:37:25 -0800 Subject: [PATCH 1/3] Add draft tags design Draft design for replacing the hardcoded namespace concept with extensible tags declared by package authors and derived by tag providers. --- docs/designs/tags.md | 378 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 378 insertions(+) create mode 100644 docs/designs/tags.md diff --git a/docs/designs/tags.md b/docs/designs/tags.md new file mode 100644 index 000000000..744048be2 --- /dev/null +++ b/docs/designs/tags.md @@ -0,0 +1,378 @@ +# Tags: Extensible Model Grouping via Entry Points + +Replace the hardcoded `namespace` concept (`"overture"`, `"annex"`) with tags -- +string labels declared by package authors and derived by tag providers. Tags +become the filtering and grouping mechanism for model discovery, driven by +package authors rather than central coordination. + +Theme remains a first-class structural field. It maps to data partitioning and +entry point naming, distinct from tags which are descriptive metadata. + +## Phase 1: Basic Tagging + +Replace namespace with tags, move discovery to `system`, update CLI. + +### Data Model + +`ModelKey` drops `namespace`, gains `tags`: + +```python +@dataclass(frozen=True, slots=True) +class ModelKey: + theme: str | None + type: str + tags: frozenset[str] + class_name: str +``` + +### Entry Point Format + +Names change from `namespace:theme:type` to `theme:type` (or just `type` for +non-themed models), with optional `#tag1,tag2` suffix. The group remains +`overture.models`. + +```toml +[project] +keywords = ["some-org-tag"] + +[project.entry-points."overture.models"] +"buildings:building" = "overture.schema.buildings:Building" +"buildings:building_part#draft" = "overture.schema.buildings:BuildingPart" +``` + +A model's tags are the union of: + +- `[project].keywords` from the distribution metadata (read via + `entry_point.dist.metadata["Keywords"]`) +- Per-model `#` tags from the entry point name + +Note: `[project].keywords` is also used for PyPI search, so schema tags and +PyPI keywords share a namespace. This was considered and accepted -- in +practice, schema-relevant keywords and PyPI discoverability keywords overlap +naturally, and the `#` per-model mechanism covers cases where finer control is +needed. + +### Tag Prefix Reservation + +The `system:` prefix is reserved for tag providers. Discovery rejects +author-declared tags (keywords and `#`) that start with `system:`. Static +sources containing `system:*` tags produce an error (or warning + discard). + +### Discovery + +`discover_models` and `ModelKey` move to `overture.schema.system.discovery`: + +```python +def discover_models(tags: set[str] | None = None) -> dict[ModelKey, type[BaseModel]]: +``` + +When `tags` is provided, any-match semantics apply: models whose effective tags +intersect the filter set are included. When `None`, all models are returned. + +Implementation: + +1. Iterate `overture.models` entry points +2. Split name on `#` to separate `theme:type` from per-model tags +3. Split the prefix on `:` -- one part = type only (theme is `None`), two + parts = theme + type +4. Load the model class via `entry_point.load()` +5. Read `entry_point.dist.metadata["Keywords"]` for package-level tags +6. Union package tags and per-model tags, rejecting any `system:*` tags +7. Build `ModelKey` with `frozenset` tags + +### CLI + +`--namespace` and `--overture-types` are removed. `--tag` (repeatable) replaces +them. `--theme` and `--type` remain. + +Repeated `--tag` flags use OR: `--tag foo --tag bar` matches models with +either tag. Repeated `--theme` flags also use OR. Filters compose as AND +across dimensions: `--tag foo --theme buildings` takes the OR result from tags +and intersects it with the OR result from themes. + +`list-types` groups by theme and displays tags per model: + +``` +buildings + building overture + building_part overture, draft +places + place overture +transportation + connector overture + segment overture +(unthemed) + sources overture +``` + +### Migration + +Existing `namespace:theme:type` entry points become `theme:type` (or `type`). +The `namespace` field is removed from `ModelKey`. The `overture` namespace +does not need replacement as a keyword or `#` tag -- Phase 2 introduces a tag +provider in `core` that derives the `overture` tag from `OvertureFeature` +inheritance. The `annex` namespace has no equivalent; annex packages either +declare their own keywords or rely on tag providers. + +### What Moves + +- `discover_models`, `ModelKey` move to `overture.schema.system.discovery` +- CLI imports from `system` directly + +--- + +## Phase 2: Tag Providers + +Derived tags from model introspection. The `overture` tag (currently a hardcoded +namespace) becomes a derived tag produced by a provider in `core`. + +### Provider Mechanism + +A new entry point group `overture.tag_providers` registers callables. The entry +point key is informational (describes the provider's purpose) and not +interpreted by discovery. + +Provider signature: + +```python +(model_class: type[BaseModel], key: ModelKey, tags: set[str]) -> set[str] +``` + +Providers receive the accumulated tag set and return a tag set (either the +same object mutated or a new set). Tags accumulate across providers: each +provider sees the result of all prior providers plus the static tags. The +caller copies the set before passing it to each provider and diffs the copy +against the return value to detect additions and removals. Providers can +remove static tags (from keywords and `#`) and tags added by earlier +providers. + +**Unresolved: provider ordering.** Since providers chain, execution order +affects results. Leading candidate: alphabetical by entry point name with +rc.d-style numeric prefixes (`10_extensions`, `50_overture`, `90_approved`). +Since the entry point key is informational, providers self-register their +ordering. Published guidance would define what the priority ranges mean (e.g., +0-19 for structural tags, 20-49 for derived tags, 50-79 for classification, +80-99 for approval/policy). Ranges are illustrative -- actual boundaries TBD. + +Providers run after static tag resolution, before filtering. Only tag +providers registered by `overture-schema-system` may add `system:`-prefixed +tags. Discovery enforces this by checking `entry_point.dist.name` before +accepting `system:` additions. This is a convention boundary, not a security +mechanism -- any package could name itself `overture-schema-system`. It +prevents accidental `system:` claims from third-party providers and +ecosystem-specific packages (like `core`). + +### Two-Phase Discovery + +With tag providers, `discover_models` becomes: + +1. **Load phase**: load models, resolve static tags (keywords + `#`) +2. **Enrich phase**: load all `overture.tag_providers` entry points, run each + provider (passing a copy of the current tags), diff input/output to detect + changes, enforce `system:` prefix rules +3. Freeze final `set[str]` tags into `frozenset[str]` on `ModelKey`, apply + filter, return + +### The `overture` Tag Provider + +Lives in `core`, registered as an entry point: + +```toml +# packages/overture-schema-core/pyproject.toml +[project.entry-points."overture.tag_providers"] +overture = "overture.schema.core.tags:overture_provider" +``` + +```python +from overture.schema.core import OvertureFeature + +def overture_provider( + model_class: type[BaseModel], key: ModelKey, tags: set[str] +) -> set[str]: + if isinstance(model_class, type) and issubclass(model_class, OvertureFeature): + tags.add("overture") + return tags +``` + +Theme packages do not need `"overture"` in `[project].keywords`. The tag is +derived from `OvertureFeature` inheritance by this provider. + +--- + +## Phase 3: Extension Support + +Layer extension registration and discovery onto the tagging system. The +extension mechanism itself (how extensions modify or compose with base models) +is TBD and designed separately. This phase builds on Roel's proposal but that +design is still subject to change. What follows covers registration, discovery, +and CLI display only. + +### Extension Primitives + +`@extends` decorator and `Extends()` annotation move from `core` to +`overture.schema.system`. These declare that a model augments one or more base +models: + +```python +@extends(Building, Place) +class OperatingHours(BaseModel): + hours: ... + timezone: ... +``` + +Or as a `NewType` with annotation: + +```python +OperatingHours = Annotated[OperatingHoursModel, Extends(Building, Place)] +``` + +### Extension Tag Provider + +Lives in `system`, registered as an entry point: + +```toml +# packages/overture-schema-system/pyproject.toml +[project.entry-points."overture.tag_providers"] +extensions = "overture.schema.system.tags:extension_provider" +``` + +```python +def extension_provider( + model_class: type[BaseModel], key: ModelKey, tags: set[str] +) -> set[str]: + if extends_classes(model_class): + tags.add("system:extension") + return tags +``` + +Package authors register extensions as regular `overture.models` entry points. +Extensions follow the same naming rules as other models: `theme:type` for +themed extensions, or just `type` for unthemed ones. The `system:extension` +tag is derived automatically; authors never declare it. + +### Extension Discovery Helper + +Extensions declare which base models they augment (via `@extends` / +`Extends()`), but callers typically need the reverse: given a base model, which +extensions apply to it? A convenience function in `system` builds this reverse +mapping: + +```python +def extension_graph( + models: dict[ModelKey, type[BaseModel]], +) -> dict[type[BaseModel], list[type[BaseModel]]]: + """Map base models to the extensions that apply to them.""" +``` + +Filters `models` for the `system:extension` tag, calls `extends_classes()` on +each to get its declared base models, and inverts the relationship. The CLI +uses this to show registered extensions alongside the models they extend +without reimplementing the traversal. + +### CLI Display + +The CLI treats `system:extension` as a presentation hint: + +**`list-types`**: extensions appear in a separate section listing which models +they extend. Core models show a compact cross-reference to available extensions. + +``` +buildings + building overture + + capacity, operating-hours + building_part overture, draft + +places + place overture + + operating-hours + +extensions + capacity system:extension + extends: building + operating-hours system:extension + extends: building, place +``` + +**`describe-type` on a core model**: includes a "Registered extensions" section +after the model's own fields. + +``` +building (buildings) overture + + Fields: + geometry Geometry + subtype BuildingSubtype | None + ... + + Registered extensions: + capacity max_occupancy, floor_area, ... + operating-hours hours, timezone, ... +``` + +**`describe-type` on an extension**: shows which models it extends and full +field definitions. + +--- + +## Phase 4: Manifest-Driven Approval (Sketch) + +A tag provider that checks models against a curated manifest, enabling +organizations to certify which models (and extensions) they endorse. + +### Concept + +A tag provider reads a manifest of approved entries and adds an approval tag +to matching models. This lets an organization certify models without requiring +changes to the model packages themselves. + +### Shape + +The manifest lives in a dedicated package (e.g., +`overture-schema-official-extensions`) that depends on nothing but `system`. +It registers a tag provider: + +```toml +# packages/overture-schema-official-extensions/pyproject.toml +[project.entry-points."overture.tag_providers"] +approved = "overture.schema.official_extensions:approved_provider" +``` + +The manifest can match against class names (entry point values), entry point +names, and/or package names with optional version specifiers. Package names +are harder to collide than class names, making them a stronger identifier for +third-party extensions. + +```python +APPROVED = { + # by class name + "overture.schema.buildings:Building", + "overture.schema.buildings:BuildingPart", + # by package name (with optional version specifier) + "acme-schema-parcels>=1.0", +} + +def approved_provider( + model_class: type[BaseModel], key: ModelKey, tags: set[str] +) -> set[str]: + if _matches_manifest(key): + tags.add("approved") + return tags +``` + +Note: `approved` is an unprefixed tag (not `system:approved`) since this +provider lives outside `system`. + +Note: `_matches_manifest` needs more than `ModelKey` to match against package +names. `ModelKey` carries `class_name` (the entry point value) but not the +distribution/package name. The provider will need access to additional context +(e.g., the entry point object itself, or an enriched key). The right mechanism +is TBD. + +### Use Cases + +- **Overture release certification**: only models in the manifest are tagged + `approved` for a given release. The manifest package is versioned alongside + the release. +- **Organizational policy**: a company publishes their own manifest package + with their own approved set (including approved extensions). +- **CLI filtering**: `--tag approved` shows only certified models. From f37fdefaa57c9d32aab9cbf8388baf06d1b61a01 Mon Sep 17 00:00:00 2001 From: Seth Fitzsimmons Date: Thu, 12 Feb 2026 15:05:41 -0800 Subject: [PATCH 2/3] Replace theme with machine tags in design ModelKey loses theme and type fields; gains name (from entry point key) and class_name (entry point value). Theme becomes overture:theme=buildings declared in [project].keywords rather than parsed from entry point naming. Introduces machine tag format [namespace:]key[=value] with tags_by_key/tags_by_namespace helpers in system. CLI replaces --theme with generic --group-by that works for any structured tag dimension. Adds codegen path generation appendix: feature models use tags_by_key for theme directory, supplementary types use schema_root tag + module prefix stripping. --- docs/designs/tags.md | 210 ++++++++++++++++++++++++++++++++++--------- 1 file changed, 167 insertions(+), 43 deletions(-) diff --git a/docs/designs/tags.md b/docs/designs/tags.md index 744048be2..040eb12b7 100644 --- a/docs/designs/tags.md +++ b/docs/designs/tags.md @@ -5,39 +5,57 @@ string labels declared by package authors and derived by tag providers. Tags become the filtering and grouping mechanism for model discovery, driven by package authors rather than central coordination. -Theme remains a first-class structural field. It maps to data partitioning and -entry point naming, distinct from tags which are descriptive metadata. +Theme becomes a tag (`overture:theme=buildings`), not a structural field. +`system` provides generic key-based grouping without understanding what +"theme" means. Any package can declare theme tags (or any other structured +tag) without special support in the discovery layer. -## Phase 1: Basic Tagging +## Phase 1: Structured Tags Replace namespace with tags, move discovery to `system`, update CLI. +### Tag Format + +Tags are strings following the pattern `[namespace:]key[=value]`: + +- **Plain**: `overture`, `draft` +- **Namespaced**: `system:extension` -- `:` separates ownership/reservation +- **Namespaced k/v**: `overture:theme=buildings`, `codegen:schema_root=overture.schema` + +`:` signals ownership/reservation. `=` signals a dimension with a value +(groupable via `--group-by`). One level of each -- no nested colons or +multiple `=` signs. + ### Data Model -`ModelKey` drops `namespace`, gains `tags`: +`ModelKey` drops `namespace` and `theme`, gains `name`: ```python @dataclass(frozen=True, slots=True) class ModelKey: - theme: str | None - type: str - tags: frozenset[str] - class_name: str + name: str # friendly name from entry point key + class_name: str # entry point value (module:Class) + tags: frozenset[str] # plain and structured tags ``` +No `namespace` field, no `theme` field. Both are tags. + +`name` is the entry point key (before any `#` suffix). It serves as the +model's friendly identifier in CLI output and codegen path generation. + ### Entry Point Format -Names change from `namespace:theme:type` to `theme:type` (or just `type` for -non-themed models), with optional `#tag1,tag2` suffix. The group remains +Entry point keys are `name` or `name#tag1,tag2`. No `theme:type` prefix -- +theme is a package-level keyword. The entry point group remains `overture.models`. ```toml [project] -keywords = ["some-org-tag"] +keywords = ["overture:theme=buildings"] [project.entry-points."overture.models"] -"buildings:building" = "overture.schema.buildings:Building" -"buildings:building_part#draft" = "overture.schema.buildings:BuildingPart" +building = "overture.schema.buildings:Building" +"building_part#draft" = "overture.schema.buildings:BuildingPart" ``` A model's tags are the union of: @@ -45,6 +63,7 @@ A model's tags are the union of: - `[project].keywords` from the distribution metadata (read via `entry_point.dist.metadata["Keywords"]`) - Per-model `#` tags from the entry point name +- Tag providers (Phase 2) Note: `[project].keywords` is also used for PyPI search, so schema tags and PyPI keywords share a namespace. This was considered and accepted -- in @@ -54,9 +73,35 @@ needed. ### Tag Prefix Reservation -The `system:` prefix is reserved for tag providers. Discovery rejects -author-declared tags (keywords and `#`) that start with `system:`. Static -sources containing `system:*` tags produce an error (or warning + discard). +The `system:` prefix is reserved for tag providers registered by +`overture-schema-system`. Discovery rejects author-declared tags (keywords +and `#`) that start with `system:`. Static sources containing `system:*` tags +produce an error (or warning + discard). + +All other prefixes are convention-based with no enforcement. + +### Tag Parsing Helpers + +`system` provides utilities for working with structured tags: + +```python +def tags_by_key(tags: frozenset[str], key: str) -> set[str]: + """Extract values for k/v tags with the given key. + + tags_by_key(frozenset({"overture:theme=buildings", "overture", "draft"}), "overture:theme") + -> {"buildings"} + """ + +def tags_by_namespace(tags: frozenset[str], namespace: str) -> set[str]: + """Extract tag bodies within a namespace. + + tags_by_namespace(frozenset({"system:extension", "overture"}), "system") + -> {"extension"} + """ +``` + +`tags_by_key` is the primitive that CLI `--group-by` and codegen path +generation build on. Both live in `system` alongside `ModelKey`. ### Discovery @@ -72,51 +117,61 @@ intersect the filter set are included. When `None`, all models are returned. Implementation: 1. Iterate `overture.models` entry points -2. Split name on `#` to separate `theme:type` from per-model tags -3. Split the prefix on `:` -- one part = type only (theme is `None`), two - parts = theme + type -4. Load the model class via `entry_point.load()` -5. Read `entry_point.dist.metadata["Keywords"]` for package-level tags -6. Union package tags and per-model tags, rejecting any `system:*` tags -7. Build `ModelKey` with `frozenset` tags +2. Split name on `#` to separate `name` from per-model tags +3. Load the model class via `entry_point.load()` +4. Read `entry_point.dist.metadata["Keywords"]` for package-level tags +5. Union package tags and per-model tags, rejecting any `system:*` tags +6. Run tag providers (Phase 2) +7. Build `ModelKey(name=name, class_name=entry_point.value, tags=frozenset(tags))` ### CLI -`--namespace` and `--overture-types` are removed. `--tag` (repeatable) replaces -them. `--theme` and `--type` remain. +`--namespace`, `--overture-types`, and `--theme` are removed. Replaced by: + +- `--tag ` (repeatable, OR within tags, AND across other dimensions) +- `--group-by ` -- group output by values of matching `[ns:]key=*` tags Repeated `--tag` flags use OR: `--tag foo --tag bar` matches models with -either tag. Repeated `--theme` flags also use OR. Filters compose as AND -across dimensions: `--tag foo --theme buildings` takes the OR result from tags -and intersects it with the OR result from themes. +either tag. -`list-types` groups by theme and displays tags per model: +`list-types` with `--group-by` groups by the specified tag key and displays +tags per model: ``` +$ overture-schema list-types --group-by overture:theme + buildings building overture building_part overture, draft + places place overture -transportation - connector overture - segment overture -(unthemed) + +(ungrouped) sources overture ``` +Models without a matching tag appear in "(ungrouped)". + +`--group-by` is generic: `--group-by acme:category` works if packages use +`acme:category=widgets` tags. Default behavior (no `--group-by`): flat list +with tags column. + ### Migration -Existing `namespace:theme:type` entry points become `theme:type` (or `type`). -The `namespace` field is removed from `ModelKey`. The `overture` namespace -does not need replacement as a keyword or `#` tag -- Phase 2 introduces a tag -provider in `core` that derives the `overture` tag from `OvertureFeature` -inheritance. The `annex` namespace has no equivalent; annex packages either -declare their own keywords or rely on tag providers. +Existing `namespace:theme:type` entry points become `name` (just the type +name). The `namespace` and `theme` fields are removed from `ModelKey`. Theme +moves to `overture:theme=X` in `[project].keywords` on each theme package. + +The `overture` namespace does not need replacement as a keyword or `#` tag -- +Phase 2 introduces a tag provider in `core` that derives the `overture` tag +from `OvertureFeature` inheritance. The `annex` namespace has no equivalent; +annex packages either declare their own keywords or rely on tag providers. ### What Moves - `discover_models`, `ModelKey` move to `overture.schema.system.discovery` +- Tag parsing helpers (`tags_by_key`, `tags_by_namespace`) live in `system` - CLI imports from `system` directly --- @@ -194,6 +249,10 @@ def overture_provider( return tags ``` +This provider does NOT add `overture:theme=X` -- theme comes from package +keywords. The provider adds only the flat `overture` tag based on +inheritance. + Theme packages do not need `"overture"` in `[project].keywords`. The tag is derived from `OvertureFeature` inheritance by this provider. @@ -246,9 +305,9 @@ def extension_provider( ``` Package authors register extensions as regular `overture.models` entry points. -Extensions follow the same naming rules as other models: `theme:type` for -themed extensions, or just `type` for unthemed ones. The `system:extension` -tag is derived automatically; authors never declare it. +Extensions follow the same naming rules as other models: `name` or +`name#tag1,tag2`. The `system:extension` tag is derived automatically; authors +never declare it. ### Extension Discovery Helper @@ -277,6 +336,8 @@ The CLI treats `system:extension` as a presentation hint: they extend. Core models show a compact cross-reference to available extensions. ``` +$ overture-schema list-types --group-by overture:theme + buildings building overture + capacity, operating-hours @@ -297,7 +358,7 @@ extensions after the model's own fields. ``` -building (buildings) overture +building overture, overture:theme=buildings Fields: geometry Geometry @@ -376,3 +437,66 @@ is TBD. - **Organizational policy**: a company publishes their own manifest package with their own approved set (including approved extensions). - **CLI filtering**: `--tag approved` shows only certified models. + +--- + +## Appendix: Codegen Path Generation + +Two strategies depending on whether a type has an entry point. + +### Feature Models (have entry points and tags) + +`tags_by_key(key.tags, "overture:theme")` provides the theme directory +(omitted if absent). `to_snake_case(spec.name)` provides the subdirectory and +filename. + +``` +{theme}/{slug}/{slug}.md # with theme tag +{slug}/{slug}.md # without theme tag +``` + +No module path involved for features. + +### Supplementary Types (no entry points) + +Module prefix stripping replaces theme-tag matching for path generation. + +**`schema_root` resolution** (checked in order): + +1. **Explicit tag**: `codegen:schema_root=overture.schema` in package keywords +2. **Tag provider**: `core` registers a provider that adds + `codegen:schema_root=overture.schema` to models from Overture packages +3. **Heuristic**: compute common prefix of module paths across the entry + point group (the `compute_schema_root()` fallback) + +**Path algorithm** for any type: + +1. Resolve `schema_root` for the type's distribution +2. Get `__module__`, strip `{schema_root}.` prefix +3. Walk remaining components, keep only packages + (`hasattr(mod, '__path__')`) -- drops file-level modules +4. Result is the directory path + +| `__module__` | Root | After strip | Pkg-filtered | Path | +|---|---|---|---|---| +| `o.s.divisions.division.models` | `overture.schema` | `divisions.division.models` | `divisions.division` | `divisions/division/` | +| `acme.widgets.places.models` | `acme.widgets` | `places.models` | `places` | `places/` | +| `acme.widgets.places.restaurants.types` | `acme.widgets` | `places.restaurants.types` | `places.restaurants` | `places/restaurants/` | + +Then the existing nesting logic decides: single-feature reference nests under +that feature; multi-feature reference stays at theme/intermediate level. + +### Core/System Types + +Uses the same `schema_root` + package-filter algorithm. With +`schema_root=overture.schema`, a type in `overture.schema.core.cartography` +yields path `core/` (if `cartography` is a file) or `core/cartography/` +(if promoted to a package). + +The hardcoded `_SUBSYSTEM_MAP` exists because file-level modules lose +path significance when filtered. Promoting meaningful modules (like +`cartography.py`) to packages eliminates the map -- a code organization +change noted as follow-up work. + +`compute_schema_root()` and `_SCHEMA_PREFIX` are replaced by the +`schema_root` tag. From 34f3952d52b60d65a1ed6a842a0363b76d3ff3ff Mon Sep 17 00:00:00 2001 From: Seth Fitzsimmons Date: Tue, 17 Feb 2026 16:38:12 -0800 Subject: [PATCH 3/3] Incorporate meeting feedback into tags design Restructure around meeting outcomes: tags generalize the ModelKind classifier from pydantic-extensions, with tag providers as the mechanism from Phase 1 (not deferred). - Add Purpose, Generalizing the Classifier, Privileged Packages, Security Roadmap, and Deferred Keywords sections - Merge old Phase 1 (static tags) and Phase 2 (tag providers) into a single phase; tag providers are the only tag source - Add privilege table for prefix reservation (overture:* owned by core, system:* owned by system) over flat reservation, with rationale: structured tags encode relationships, prevent multi-ecosystem collision, and avoid tag-pairing ambiguity - Add feature provider (Feature in system) separate from overture provider (OvertureFeature in core) - Move extension provider into Phase 2 (Extension Support) - Defer package keywords to a future tag provider; drop #tag syntax from entry point format (names are just identifiers) - Renumber phases: extensions become Phase 2, manifest-driven approval becomes Phase 3 --- docs/designs/tags.md | 589 +++++++++++++++++++++++++++---------------- 1 file changed, 377 insertions(+), 212 deletions(-) diff --git a/docs/designs/tags.md b/docs/designs/tags.md index 040eb12b7..b30f35eaf 100644 --- a/docs/designs/tags.md +++ b/docs/designs/tags.md @@ -1,30 +1,204 @@ # Tags: Extensible Model Grouping via Entry Points -Replace the hardcoded `namespace` concept (`"overture"`, `"annex"`) with tags -- -string labels declared by package authors and derived by tag providers. Tags -become the filtering and grouping mechanism for model discovery, driven by -package authors rather than central coordination. +Move model discovery into `system`, eliminating assumptions about Overture in +the process. Replace the hardcoded `namespace` concept (`"overture"`, `"annex"`) +and the `ModelKind` classifier with tags -- string labels derived by tag +providers. Tags become the filtering, grouping, and classification mechanism +for model discovery, driven by introspection and package metadata rather than +central coordination. -Theme becomes a tag (`overture:theme=buildings`), not a structural field. -`system` provides generic key-based grouping without understanding what -"theme" means. Any package can declare theme tags (or any other structured -tag) without special support in the discovery layer. +`system` provides generic tag-based grouping without understanding what any +particular tag means. Any package can register tag providers that classify +models without special support in the discovery layer. -## Phase 1: Structured Tags +## Purpose -Replace namespace with tags, move discovery to `system`, update CLI. +Tags serve three roles: -### Tag Format +- **CLI filtering**: select subsets of models for output and codegen + (`--tag overture`, `--tag draft`) +- **Classification and endorsement**: distinguish features from extensions, + mark models as vetted or approved by an authority +- **Marketplace taxonomy**: browse and classify models and extensions in a + future extension catalog -Tags are strings following the pattern `[namespace:]key[=value]`: +These roles overlap -- a tag like `overture:theme=buildings` serves both +filtering and taxonomy. The design accommodates this overlap through +structured tags that encode both ownership and dimension (see Tag Format). -- **Plain**: `overture`, `draft` -- **Namespaced**: `system:extension` -- `:` separates ownership/reservation -- **Namespaced k/v**: `overture:theme=buildings`, `codegen:schema_root=overture.schema` +## Generalizing the Classifier -`:` signals ownership/reservation. `=` signals a dimension with a value -(groupable via `--group-by`). One level of each -- no nested colons or -multiple `=` signs. +The `pydantic-extensions` branch introduces `ModelKind`, an enum that +classifies models by introspecting their type hierarchy: + +```python +class ModelKind(str, Enum): + OVERTURE_MODEL = "overture_model" + EXTENSION_MODEL = "extension_model" + BASE_MODEL = "base_model" +``` + +`ModelKind.of()` inspects a model and returns a fixed classification. +`discover_models` then hardcodes how each kind maps to namespace and theme: + +```python +match ModelKind.of(model_class): + case ModelKind.OVERTURE_MODEL: + ns = "overture" + theme = get_args(first_class.model_fields["theme"].annotation)[0] + case ModelKind.EXTENSION_MODEL: + ns = "extensions" + case ModelKind.BASE_MODEL: + ns = "annex" +``` + +This works for the current set of three categories but embeds assumptions +that limit extensibility: + +- **Closed classification set.** Adding a new kind requires changing the enum + and every `match` statement that consumes it. Third-party packages cannot + introduce new classifications. +- **Overture-specific logic in discovery.** `OVERTURE_MODEL` detection depends + on `OvertureFeature`, which lives in `core`. Discovery in `system` should + not import from `core`. +- **Classification and discovery are coupled.** The `match` block maps kinds + to structural fields (`namespace`, `theme`). The mapping is implicit and + changes when the data model changes. + +Tags generalize this. A tag provider does the same introspection work as +`ModelKind.of()` but returns tags instead of enum values: + +| `ModelKind` | Equivalent tags | Provider location | +|---|---|---| +| `OVERTURE_MODEL` | `feature`, `overture` | `system`, `core` | +| `EXTENSION_MODEL` | `system:extension` | `system` | +| `BASE_MODEL` | *(no tag -- it's the default)* | -- | + +The `feature` tag comes from `Feature` inheritance (checked in `system`). +The `overture` tag comes from `OvertureFeature` inheritance (checked in +`core`). These are separate providers because the concepts live in separate +packages -- `system` knows about `Feature` but not `OvertureFeature`. + +The classification becomes open: any package can register a tag provider that +adds new classifications without modifying `system` or `core`. Discovery +consumes tags generically -- it does not understand what any tag means. + +Theme stops being a structural field derived from model annotations inside +discovery. It becomes a tag (`overture:theme=buildings`) produced by a +provider in `core` that reads the model's `theme` field. Discovery sees +theme tags the same way it sees any other tag. + +--- + +## Tag Format + +Tags are strings following the pattern `[prefix:]key[=value]`: + +- **Plain**: `overture`, `draft`, `feature` +- **Prefixed**: `system:extension` -- `:` separates ownership +- **Prefixed k/v**: `overture:theme=buildings` + +`:` signals ownership and enables prefix reservation (see Privileged Packages +and Tag Reservation). `=` signals a dimension with a value (groupable via +`--group-by`). One level of each -- no nested colons or multiple `=` signs. + +### Tag Limits + +To prevent unbounded tag proliferation, tag providers should add O(1) tags +per model -- classifications and dimensional values, not open-ended lists. +Discovery warns when a model accumulates more than 16 tags after all +providers run. This is a diagnostic guard, not a hard limit; it catches +runaway providers without constraining legitimate use. + +--- + +## Privileged Packages and Tag Reservation + +### Prefix Reservation + +Prefixed tags (`prefix:*`) are owned by specific packages. A privilege table +in `system` maps prefixes to the packages authorized to produce them: + +```python +# overture.schema.system.privileges +RESERVED_PREFIXES: dict[str, set[str]] = { + "system": {"overture-schema-system"}, + "overture": {"overture-schema-core"}, +} +``` + +Discovery enforces this: when a tag provider adds a prefixed tag, the +provider's `entry_point.dist.name` must appear in the authorized set for +that prefix. Violations produce a warning and the tag is discarded. + +The table is maintained in `system` and updated as the ecosystem grows. New +packages claim prefixes by submitting changes to this table. + +### Plain Tag Reservation + +Specific plain tags (no colon) can also be reserved: + +```python +RESERVED_TAGS: dict[str, set[str]] = { + "overture": {"overture-schema-core"}, + "feature": {"overture-schema-system"}, +} +``` + +Same enforcement: only authorized packages may produce these tags via tag +providers. Unauthorized providers that attempt to add reserved tags see a +warning and discard. + +This is the less-preferred mechanism. Reserving individual strings does not +scale -- every new reserved word requires an update to the table. Prefix +reservation (`overture:*` owned by `overture-schema-core`) covers open-ended +families of tags without per-tag coordination. Plain tag reservation handles +the small set of broad classifications where a prefix would add noise +without information. + +### Why Prefix-Based Reservation + +The alternative is flat reservation only -- reserve specific strings +(`overture`, `feature`, `extension`, `buildings`, `theme`) without prefix +semantics. Flat tags keep the `key=value` dimension, and they're shorter. +But making the prefix opaque has costs: + +**Structured tags encode relationships that flat tags cannot.** With +`overture:theme=buildings`, `--group-by overture:theme` extracts the +dimension and groups by its values. With flat tags `theme` and `buildings`, +grouping requires sniffing for pairings -- "does this model have both +`theme` and `buildings`?" -- and the pairing is implicit. A model tagged +`theme`, `buildings`, `residential` is ambiguous: are `buildings` and +`residential` both theme values, or is `residential` a separate +classification? + +**Tags serve both endorsement and discovery.** A flat `buildings` tag in a +marketplace context is ambiguous: does it mean "this package relates to +buildings" (taxonomy) or "this package is the endorsed buildings schema" +(endorsement)? `overture:theme=buildings` unambiguously signals taxonomy; +endorsement lives in a separate tag (`approved`). + +**Multi-ecosystem support requires disambiguation.** When multiple +ecosystems share the discovery mechanism, prefixes prevent collision: +`overture:theme=buildings` and `acme:theme=industrial` coexist cleanly. +Flat `buildings` and `industrial` need external context to distinguish +their purpose. + +**`system` benefits from prefixed tags internally.** `system:extension` +encodes both the authority (`system`) and the classification (`extension`). +If `system` uses prefixed tags but the external convention is flat, you have +two tag systems. + +Prefix-based reservation is the primary mechanism. Flat reservation handles +the small set of unambiguous, high-frequency classifications where a prefix +would be pure noise (`overture`, `feature`). + +--- + +## Phase 1: Tag Providers and Discovery + +Tag providers are the mechanism from the start, not a later addition. +Discovery loads models, runs tag providers to derive tags, then filters. ### Data Model @@ -33,52 +207,119 @@ multiple `=` signs. ```python @dataclass(frozen=True, slots=True) class ModelKey: - name: str # friendly name from entry point key + name: str # entry point name class_name: str # entry point value (module:Class) tags: frozenset[str] # plain and structured tags ``` -No `namespace` field, no `theme` field. Both are tags. +No `namespace` field, no `theme` field. Both are expressed as tags. -`name` is the entry point key (before any `#` suffix). It serves as the -model's friendly identifier in CLI output and codegen path generation. +`name` is the `overture.models` entry point name. It serves as an identifier +for the package maintainer. CLI output and codegen paths use the class name +and module path (introspectable from the entry point value), not the entry +point key. Since `overture.models` keys have no functional role in the +system, the `name#tag1,tag2` syntax discussed in Deferred: Package Keywords +as Tag Source remains viable -- the `#` suffix would not affect any consumer. + +(By contrast, `overture.tag_providers` entry point names carry rc.d-style +numeric prefixes that determine provider execution order -- see Provider +Mechanism.) ### Entry Point Format -Entry point keys are `name` or `name#tag1,tag2`. No `theme:type` prefix -- -theme is a package-level keyword. The entry point group remains +Entry point keys are the model's name. The entry point group remains `overture.models`. ```toml -[project] -keywords = ["overture:theme=buildings"] - [project.entry-points."overture.models"] building = "overture.schema.buildings:Building" -"building_part#draft" = "overture.schema.buildings:BuildingPart" +building_part = "overture.schema.buildings:BuildingPart" ``` -A model's tags are the union of: +Tags come from tag providers. Providers can introspect model classes, +package metadata, and entry point attributes to derive tags (see Deferred: +Package Keywords as Tag Source for an example). -- `[project].keywords` from the distribution metadata (read via - `entry_point.dist.metadata["Keywords"]`) -- Per-model `#` tags from the entry point name -- Tag providers (Phase 2) +### Provider Mechanism -Note: `[project].keywords` is also used for PyPI search, so schema tags and -PyPI keywords share a namespace. This was considered and accepted -- in -practice, schema-relevant keywords and PyPI discoverability keywords overlap -naturally, and the `#` per-model mechanism covers cases where finer control is -needed. +A new entry point group `overture.tag_providers` registers callables. The +entry point name determines execution order using rc.d-style numeric prefixes +(`10_feature`, `50_overture`, `90_approved`). + +Provider signature: + +```python +(model_class: type[BaseModel], key: ModelKey, tags: set[str]) -> set[str] +``` + +Providers receive the accumulated tag set and return a (potentially modified) +set. Tags accumulate across providers: each provider sees the result of all +prior providers. The caller copies the set before passing it to each provider +and diffs the copy against the return value to detect additions and removals. +Providers can remove tags added by earlier providers. + +Only providers from packages authorized in the privilege table may add +reserved tags or prefixed tags. Discovery enforces this by checking +`entry_point.dist.name` against the privilege table after each provider runs. + +**Ordering**: numeric prefixes on entry point names determine execution order. +Published guidance defines priority ranges: + +- 0-19: structural tags (`feature`, `system:extension` in Phase 2) +- 20-49: derived classifications (`overture`) +- 50-79: taxonomy tags (`overture:theme=buildings`) +- 80-99: approval/policy tags (`approved`) -### Tag Prefix Reservation +Ranges are illustrative -- actual boundaries TBD. -The `system:` prefix is reserved for tag providers registered by -`overture-schema-system`. Discovery rejects author-declared tags (keywords -and `#`) that start with `system:`. Static sources containing `system:*` tags -produce an error (or warning + discard). +### Built-in Tag Providers -All other prefixes are convention-based with no enforcement. +**`feature` provider** (in `system`, priority 10): + +Classifies models that derive from `Feature` (`overture.schema.system.feature`). +This is the generic base class for all feature models; `OvertureFeature` +extends it in `core`. + +```toml +# packages/overture-schema-system/pyproject.toml +[project.entry-points."overture.tag_providers"] +"10_feature" = "overture.schema.system.tags:feature_provider" +``` + +```python +from overture.schema.system.feature import Feature + +def feature_provider( + model_class: type[BaseModel], key: ModelKey, tags: set[str] +) -> set[str]: + if isinstance(model_class, type) and issubclass(model_class, Feature): + tags.add("feature") + return tags +``` + +**`overture` provider** (in `core`, priority 50): + +Classifies models that derive from `OvertureFeature`. + +```toml +# packages/overture-schema-core/pyproject.toml +[project.entry-points."overture.tag_providers"] +"50_overture" = "overture.schema.core.tags:overture_provider" +``` + +```python +from overture.schema.core import OvertureFeature + +def overture_provider( + model_class: type[BaseModel], key: ModelKey, tags: set[str] +) -> set[str]: + if isinstance(model_class, type) and issubclass(model_class, OvertureFeature): + tags.add("overture") + return tags +``` + +This provider does NOT add `overture:theme=X` -- theme is a separate +provider (also in `core`) that reads the model's `theme` field annotation. ### Tag Parsing Helpers @@ -111,25 +352,28 @@ generation build on. Both live in `system` alongside `ModelKey`. def discover_models(tags: set[str] | None = None) -> dict[ModelKey, type[BaseModel]]: ``` -When `tags` is provided, any-match semantics apply: models whose effective tags -intersect the filter set are included. When `None`, all models are returned. +When `tags` is provided, any-match semantics apply: models whose effective +tags intersect the filter set are included. When `None`, all models are +returned. Implementation: 1. Iterate `overture.models` entry points -2. Split name on `#` to separate `name` from per-model tags -3. Load the model class via `entry_point.load()` -4. Read `entry_point.dist.metadata["Keywords"]` for package-level tags -5. Union package tags and per-model tags, rejecting any `system:*` tags -6. Run tag providers (Phase 2) -7. Build `ModelKey(name=name, class_name=entry_point.value, tags=frozenset(tags))` +2. Load the model class via `entry_point.load()` +3. Load all `overture.tag_providers` entry points, sorted by key +4. For each model, run each provider in order: copy the current tag set, + call the provider, diff the result, enforce the privilege table +5. Warn if a model exceeds 16 tags +6. Freeze final `set[str]` into `frozenset[str]` on `ModelKey`, apply + filter, return ### CLI `--namespace`, `--overture-types`, and `--theme` are removed. Replaced by: -- `--tag ` (repeatable, OR within tags, AND across other dimensions) -- `--group-by ` -- group output by values of matching `[ns:]key=*` tags +- `--tag ` (repeatable, OR semantics) +- `--group-by ` -- group output by values of matching + `[prefix:]key=*` tags Repeated `--tag` flags use OR: `--tag foo --tag bar` matches models with either tag. @@ -141,11 +385,11 @@ tags per model: $ overture-schema list-types --group-by overture:theme buildings - building overture - building_part overture, draft + building overture, feature + building_part overture, feature, draft places - place overture + place overture, feature (ungrouped) sources overture @@ -153,112 +397,31 @@ places Models without a matching tag appear in "(ungrouped)". -`--group-by` is generic: `--group-by acme:category` works if packages use -`acme:category=widgets` tags. Default behavior (no `--group-by`): flat list -with tags column. +`--group-by` is generic: `--group-by acme:category` works if packages +register tag providers that produce `acme:category=widgets` tags. Default +behavior (no `--group-by`): flat list with tags column. ### Migration -Existing `namespace:theme:type` entry points become `name` (just the type -name). The `namespace` and `theme` fields are removed from `ModelKey`. Theme -moves to `overture:theme=X` in `[project].keywords` on each theme package. - -The `overture` namespace does not need replacement as a keyword or `#` tag -- -Phase 2 introduces a tag provider in `core` that derives the `overture` tag -from `OvertureFeature` inheritance. The `annex` namespace has no equivalent; -annex packages either declare their own keywords or rely on tag providers. +Existing `namespace:theme:type` entry points become just the type name. The +`namespace` and `theme` fields are removed from `ModelKey`. Theme moves to a +tag provider in `core` that reads model field annotations. The `overture` +namespace becomes a derived tag from `OvertureFeature` inheritance. The +`annex` namespace has no equivalent; annex packages either register their +own tag providers or rely on generic classification. ### What Moves - `discover_models`, `ModelKey` move to `overture.schema.system.discovery` +- Tag provider infrastructure, privilege table live in `system` +- `feature` provider lives in `system` (`extension` provider added in Phase 2) +- `overture` and theme providers live in `core` - Tag parsing helpers (`tags_by_key`, `tags_by_namespace`) live in `system` - CLI imports from `system` directly --- -## Phase 2: Tag Providers - -Derived tags from model introspection. The `overture` tag (currently a hardcoded -namespace) becomes a derived tag produced by a provider in `core`. - -### Provider Mechanism - -A new entry point group `overture.tag_providers` registers callables. The entry -point key is informational (describes the provider's purpose) and not -interpreted by discovery. - -Provider signature: - -```python -(model_class: type[BaseModel], key: ModelKey, tags: set[str]) -> set[str] -``` - -Providers receive the accumulated tag set and return a tag set (either the -same object mutated or a new set). Tags accumulate across providers: each -provider sees the result of all prior providers plus the static tags. The -caller copies the set before passing it to each provider and diffs the copy -against the return value to detect additions and removals. Providers can -remove static tags (from keywords and `#`) and tags added by earlier -providers. - -**Unresolved: provider ordering.** Since providers chain, execution order -affects results. Leading candidate: alphabetical by entry point name with -rc.d-style numeric prefixes (`10_extensions`, `50_overture`, `90_approved`). -Since the entry point key is informational, providers self-register their -ordering. Published guidance would define what the priority ranges mean (e.g., -0-19 for structural tags, 20-49 for derived tags, 50-79 for classification, -80-99 for approval/policy). Ranges are illustrative -- actual boundaries TBD. - -Providers run after static tag resolution, before filtering. Only tag -providers registered by `overture-schema-system` may add `system:`-prefixed -tags. Discovery enforces this by checking `entry_point.dist.name` before -accepting `system:` additions. This is a convention boundary, not a security -mechanism -- any package could name itself `overture-schema-system`. It -prevents accidental `system:` claims from third-party providers and -ecosystem-specific packages (like `core`). - -### Two-Phase Discovery - -With tag providers, `discover_models` becomes: - -1. **Load phase**: load models, resolve static tags (keywords + `#`) -2. **Enrich phase**: load all `overture.tag_providers` entry points, run each - provider (passing a copy of the current tags), diff input/output to detect - changes, enforce `system:` prefix rules -3. Freeze final `set[str]` tags into `frozenset[str]` on `ModelKey`, apply - filter, return - -### The `overture` Tag Provider - -Lives in `core`, registered as an entry point: - -```toml -# packages/overture-schema-core/pyproject.toml -[project.entry-points."overture.tag_providers"] -overture = "overture.schema.core.tags:overture_provider" -``` - -```python -from overture.schema.core import OvertureFeature - -def overture_provider( - model_class: type[BaseModel], key: ModelKey, tags: set[str] -) -> set[str]: - if isinstance(model_class, type) and issubclass(model_class, OvertureFeature): - tags.add("overture") - return tags -``` - -This provider does NOT add `overture:theme=X` -- theme comes from package -keywords. The provider adds only the flat `overture` tag based on -inheritance. - -Theme packages do not need `"overture"` in `[project].keywords`. The tag is -derived from `OvertureFeature` inheritance by this provider. - ---- - -## Phase 3: Extension Support +## Phase 2: Extension Support Layer extension registration and discovery onto the tagging system. The extension mechanism itself (how extensions modify or compose with base models) @@ -292,7 +455,7 @@ Lives in `system`, registered as an entry point: ```toml # packages/overture-schema-system/pyproject.toml [project.entry-points."overture.tag_providers"] -extensions = "overture.schema.system.tags:extension_provider" +"15_extensions" = "overture.schema.system.tags:extension_provider" ``` ```python @@ -305,9 +468,7 @@ def extension_provider( ``` Package authors register extensions as regular `overture.models` entry points. -Extensions follow the same naming rules as other models: `name` or -`name#tag1,tag2`. The `system:extension` tag is derived automatically; authors -never declare it. +The `system:extension` tag is derived automatically; authors never declare it. ### Extension Discovery Helper @@ -339,12 +500,12 @@ they extend. Core models show a compact cross-reference to available extensions. $ overture-schema list-types --group-by overture:theme buildings - building overture + building overture, feature + capacity, operating-hours - building_part overture, draft + building_part overture, feature, draft places - place overture + place overture, feature + operating-hours extensions @@ -358,7 +519,7 @@ extensions after the model's own fields. ``` -building overture, overture:theme=buildings +building overture, feature, overture:theme=buildings Fields: geometry Geometry @@ -375,7 +536,7 @@ field definitions. --- -## Phase 4: Manifest-Driven Approval (Sketch) +## Phase 3: Manifest-Driven Approval (Sketch) A tag provider that checks models against a curated manifest, enabling organizations to certify which models (and extensions) they endorse. @@ -395,7 +556,7 @@ It registers a tag provider: ```toml # packages/overture-schema-official-extensions/pyproject.toml [project.entry-points."overture.tag_providers"] -approved = "overture.schema.official_extensions:approved_provider" +"90_approved" = "overture.schema.official_extensions:approved_provider" ``` The manifest can match against class names (entry point values), entry point @@ -440,63 +601,67 @@ is TBD. --- -## Appendix: Codegen Path Generation - -Two strategies depending on whether a type has an entry point. - -### Feature Models (have entry points and tags) +## Deferred: Package Keywords as Tag Source -`tags_by_key(key.tags, "overture:theme")` provides the theme directory -(omitted if absent). `to_snake_case(spec.name)` provides the subdirectory and -filename. +Package keywords (`[project].keywords` in `pyproject.toml`) are a natural +source of tags -- they're part of Python distribution metadata and appear in +PyPI search. The initial design derives all tags from tag providers, but a +provider could surface keywords as tags: +```python +def keyword_provider( + model_class: type[BaseModel], key: ModelKey, tags: set[str] +) -> set[str]: + # Requires access to distribution metadata via entry point or enriched key + dist = _get_distribution(key) + for kw in dist.metadata.get_all("Keyword", []): + kw = kw.strip() + if kw and not _is_reserved(kw): + tags.add(kw) + return tags ``` -{theme}/{slug}/{slug}.md # with theme tag -{slug}/{slug}.md # without theme tag -``` - -No module path involved for features. -### Supplementary Types (no entry points) +This is deferred because tag providers handle the initial use cases +(classification, theme, extensions) without requiring package authors to +declare keywords. Keywords become useful when: -Module prefix stripping replaces theme-tag matching for path generation. +- Third-party packages want to self-declare tags without writing a provider +- A catalog needs author-supplied taxonomy tags beyond what introspection + can derive -**`schema_root` resolution** (checked in order): +The provider above needs access to distribution metadata through `ModelKey` +or a separate context object -- the current `ModelKey` does not carry this. +Adding it is straightforward when the need arises. -1. **Explicit tag**: `codegen:schema_root=overture.schema` in package keywords -2. **Tag provider**: `core` registers a provider that adds - `codegen:schema_root=overture.schema` to models from Overture packages -3. **Heuristic**: compute common prefix of module paths across the entry - point group (the `compute_schema_root()` fallback) - -**Path algorithm** for any type: - -1. Resolve `schema_root` for the type's distribution -2. Get `__module__`, strip `{schema_root}.` prefix -3. Walk remaining components, keep only packages - (`hasattr(mod, '__path__')`) -- drops file-level modules -4. Result is the directory path - -| `__module__` | Root | After strip | Pkg-filtered | Path | -|---|---|---|---|---| -| `o.s.divisions.division.models` | `overture.schema` | `divisions.division.models` | `divisions.division` | `divisions/division/` | -| `acme.widgets.places.models` | `acme.widgets` | `places.models` | `places` | `places/` | -| `acme.widgets.places.restaurants.types` | `acme.widgets` | `places.restaurants.types` | `places.restaurants` | `places/restaurants/` | - -Then the existing nesting logic decides: single-feature reference nests under -that feature; multi-feature reference stays at theme/intermediate level. - -### Core/System Types - -Uses the same `schema_root` + package-filter algorithm. With -`schema_root=overture.schema`, a type in `overture.schema.core.cartography` -yields path `core/` (if `cartography` is a file) or `core/cartography/` -(if promoted to a package). +Note: `[project].keywords` is also used for PyPI search, so schema tags and +PyPI keywords share a namespace. In practice, schema-relevant keywords and +PyPI discoverability keywords overlap naturally. -The hardcoded `_SUBSYSTEM_MAP` exists because file-level modules lose -path significance when filtered. Promoting meaningful modules (like -`cartography.py`) to packages eliminates the map -- a code organization -change noted as follow-up work. +--- -`compute_schema_root()` and `_SCHEMA_PREFIX` are replaced by the -`schema_root` tag. +## Security Roadmap + +The privilege model is convention-based. The privilege table checks +`entry_point.dist.name`, which any package can spoof by naming itself +`overture-schema-system`. This prevents accidental claims, not adversarial +ones. + +This is acceptable for the current stage. The Python packaging ecosystem +provides a path to stronger guarantees when needed: + +- **Trusted Publishers** (PyPI): ties package uploads to specific CI/CD + identities (GitHub Actions OIDC), preventing unauthorized releases of + known package names. Already available on PyPI. +- **Provenance attestations** (PEP 740): Sigstore-based build provenance + lets consumers verify that a package was built from a specific source + repository by a specific workflow. PyPI supports this today for packages + using Trusted Publishers. +- **Repository-based classification**: `[project.urls]` metadata carries + source repository URLs. A future tag provider could use verified + provenance to classify packages by origin repository -- e.g., packages + built from the `OvertureMaps/schema` repo get `core` status, packages + from `OvertureMaps/schema-extensions` get `vetted` status. + +The privilege table's interface stays the same as enforcement strengthens; +only the verification backend changes. Convention-based enforcement upgrades +to cryptographic enforcement without changing tag semantics.