pytest-subtests merged into pytest core as of pytest 9. Update test imports from pytest_subtests.SubTests to _pytest.subtests.Subtests.
- Add `-q`, `--tb=short` to `make test` for compact output
- Set `verbosity_subtests=0` to suppress per-subtest progress characters (the `u`/`,`/`-` markers from pytest's built-in subtests support)
Bare triple-quoted strings after NewType assignments are expression statements that Python never attaches to the NewType object, leaving __doc__ as None. Convert each to an explicit __doc__ assignment so codegen and introspection tools can read them at runtime. Same pattern DocumentedEnum uses for enum member docs.
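The pattern in a minimal sketch (the NewType and docstring here are hypothetical, not from the schema):

```python
from typing import NewType

Longitude = NewType("Longitude", float)

# A bare string literal on the next line would be an expression statement
# that Python evaluates and discards, so the NewType's docstring would
# never be attached. Assigning explicitly makes it visible at runtime:
Longitude.__doc__ = "Degrees east of the prime meridian."
```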
OvertureFeature validator error message had two continuation
lines missing the f-prefix, so {self.__class__.__name__} was
rendered literally. Also add missing space before "and".
Also fix "supserset" typo in docstring.
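The bug pattern generalizes to any implicit string concatenation; a minimal illustration (hypothetical class, not the actual validator):

```python
class Demo:
    def broken(self) -> str:
        # Only the first literal is an f-string; the continuation line
        # keeps "{self.__class__.__name__}" as literal text.
        return (
            f"Validation failed "
            "in {self.__class__.__name__}"
        )

    def fixed(self) -> str:
        # Every literal in the implicit concatenation carries the f-prefix.
        return (
            f"Validation failed "
            f"in {self.__class__.__name__}"
        )
```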
Replace hardcoded discriminator_fields tuple ("type", "theme",
"subtype") in _process_union_member with the discriminator field
name extracted from the union's Annotated metadata.
introspect_union already extracted the discriminator field name
but didn't pass it through to member processing. Now it does,
so unions using any field name as discriminator work correctly.
For nested unions, parent discriminator values are extracted from
nested leaf models to preserve structural tuple classification.
Feature.field_discriminator now attaches _field_name to the
callable, and _extract_discriminator_name reads it. This handles
the Discriminator-wrapping-a-callable case that str(disc) got
wrong silently.
Make _extract_literal_value return str directly instead of object, eliminating implicit str() conversions at call sites. Add comment explaining nested union re-indexing under the parent discriminator. Remove redundant test covered by TestDiscriminatorDiscovery and debugging print() calls from TestStructuralTuples.
The field holds the entry point value in "module:Class" format, not a class name. The old name required callers to know this (codegen's cli.py had a comment explaining it, and assigned to a local `entry_point` variable to compensate).
Empty package with build config, namespace packages, and py.typed marker. Declares click, jinja2, tomli, and overture-schema-core/system as dependencies.
Type analyzer (analyze_type) handles all type unwrapping in a single iterative function: NewType → Annotated → Union → list → terminal classification. Constraints accumulate from Annotated metadata with source tracking via ConstraintSource. Data structures: TypeInfo (type representation), FieldSpec (model field), ModelSpec (model), EnumSpec, NewTypeSpec, PrimitiveSpec. Type registry maps type names to per-target string representations via TypeMapping. is_semantic_newtype() distinguishes meaningful NewTypes from pass-through aliases. Utilities: case_conversion (snake_case), docstring (cleaning and custom-docstring detection).
Domain-specific extractors that consume analyze_type() and produce specs:
- model_extraction: extract_model() for Pydantic models with MRO-aware field ordering, alias resolution, and recursive sub-model expansion via expand_model_tree()
- enum_extraction: extract_enum() for DocumentedEnum classes
- newtype_extraction: extract_newtype() for semantic NewTypes
- primitive_extraction: extract_primitives() for numeric types with range and precision introspection
- union_extraction: extract_union() with field merging across discriminated union variants

Shared test fixtures in codegen_test_support.py.
Generate prose from extracted constraint data:
- field_constraint_description: describe field-level constraints (ranges, patterns, unique items, hex colors) as human-readable notes with NewType source attribution
- model_constraint_description: describe model-level constraints (@require_any_of, @radio_group, @min_fields_set, @require_if, @forbid_if) as prose, with consolidation of same-field conditional constraints
Determine what artifacts to generate and where they go:
- module_layout: compute output directories for entry points,
map Python module paths to filesystem output paths via
compute_output_dir
- path_assignment: build_placement_registry maps types to
output file paths. Feature models get {theme}/{slug}/,
shared types get types/{subsystem}/, theme-local types
nest under their feature or sit flat at theme level
- type_collection: discover supplementary types (enums,
NewTypes, sub-models) by walking expanded feature trees
- link_computation: relative_link() computes cross-page
links, LinkContext holds page path + registry for
resolving links during rendering
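A minimal sketch of how a relative_link() helper like the one above could work (the signature and path conventions are assumed, not the actual API):

```python
import os
from pathlib import PurePosixPath

def relative_link(from_page: str, to_page: str) -> str:
    """Compute a relative link from one output page to another.
    Both paths are POSIX-style and relative to the docs root."""
    start = str(PurePosixPath(from_page).parent)
    return os.path.relpath(to_page, start=start).replace(os.sep, "/")
```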
Embed JSON example features in [tool.overture-schema.examples] sections. Each example is a complete GeoJSON Feature matching the theme's Pydantic model, used by the codegen example_loader to render example tables in documentation.
Jinja2 templates and rendering logic for documentation pages:
- markdown_renderer: orchestrates page rendering for features, enums, NewTypes, primitives, and geometry. Recursively expands MODEL-kind fields inline with dot-notation.
- markdown_type_format: type string formatting with link-aware rendering via LinkContext
- example_loader: loads examples from theme pyproject.toml, validates against Pydantic models, flattens to dot-notation
- reverse_references: computes "Used By" cross-references between types and the features that reference them

Templates: feature, enum, newtype, primitives, geometry pages. Golden-file snapshot tests verify rendered output stability. Adds renderer-specific fixtures to conftest.py (cli_runner, primitives_markdown, geometry_markdown).
Click-based CLI entry point (overture-codegen generate) that wires discovery → extraction → output layout → rendering:
- Discovers models via discover_models() entry points
- Filters themes, extracts specs, builds placement registry
- Renders markdown pages with field tables, examples, cross-references, and sidebar metadata
- Supports --theme filtering and --output-dir targeting

Integration tests verify extraction against real Overture models (Building, Division, Segment, etc.) to catch schema drift. CLI tests verify end-to-end generation, output structure, and link integrity.
Design doc covers the four-layer architecture, analyze_type(), domain-specific extractors, and extension points for new output targets. Walkthrough traces Segment through the full pipeline module-by-module in dependency order, with FeatureVersion as a secondary example for constraint provenance in the type analyzer. README describes the problem (Pydantic flattens domain vocabulary), the "unwrap once, render many" approach, CLI usage, architecture overview, and programmatic API.
TypeInfo.literal_value discarded multi-value Literals entirely (Literal["a", "b"] got None). Renamed to literal_values as a tuple of all args so consumers decide presentation. single_literal_value() preserves its contract: returns the value for single-arg Literals, None otherwise. Callers (example_loader, union_extraction) are unchanged. Multi-value Literals render as pipe-separated quoted values in markdown tables: `"a"` \| `"b"`.
```python
raise TypeError("Bare list without type argument is not supported")
state.is_list = True
annotation = args[0]
continue
```
Haven't reviewed everything yet, but I found an issue in here while testing the parquet generator: this does not properly unpack nested lists. I don't think it's a problem for the markdown, but it surfaces in Divisions where we have list[NewType("Hierarchy", list[HierarchyItem])]. You can see the diff in the resulting arrow schemas:
Generated:

```
list<element: struct<division_id: string not null, subtype: string not null, name: string not null>>
```

Release Data (2026-02-18.0):

```
list<element: list<element: struct<division_id: string, subtype: string, name: string>>>
```
This should instead use something like a list_depth or a recursive unwrap.
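One way to implement the suggested fix, as a hedged sketch (a list_depth counter instead of a boolean; names are illustrative):

```python
from typing import Any, NewType, get_args, get_origin

def unwrap_lists(annotation: Any) -> tuple[Any, int]:
    """Peel NewType and list layers while counting list depth, so that
    list[NewType("Hierarchy", list[Item])] reports (Item, 2) rather
    than collapsing both levels into a single is_list flag."""
    depth = 0
    while True:
        if hasattr(annotation, "__supertype__"):  # NewType wrapper
            annotation = annotation.__supertype__
        elif get_origin(annotation) is list:
            args = get_args(annotation)
            if not args:
                raise TypeError("Bare list without type argument is not supported")
            depth += 1
            annotation = args[0]
        else:
            return annotation, depth
```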
Summary
Add `overture-schema-codegen`, a code generator that produces documentation from Pydantic schema models.

Pydantic's `model_json_schema()` flattens the schema's domain vocabulary into JSON Schema primitives. NewType names, constraint provenance, and custom constraint classes disappear. Navigating Python's type annotation machinery -- NewType chains, nested `Annotated` wrappers, union filtering, generic resolution -- is complex. The codegen does it once. `analyze_type()` unwraps annotations into `TypeInfo`, a flat, target-independent representation that renderers consume without re-entering the type system.
Architecture
Four layers with strict downward imports:
`analyze_type()` is the central function. A single iterative loop peels NewType, `Annotated`, Union, and container wrappers in fixed order, accumulating constraints tagged with the NewType that contributed them. The result is a `TypeInfo` dataclass that downstream modules consume without re-entering the type system.
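A simplified sketch of that loop (Union, dict, and constraint source tagging omitted; names are illustrative, not the real API):

```python
from dataclasses import dataclass, field
from typing import Annotated, Any, NewType, get_args, get_origin

@dataclass
class TypeInfoSketch:
    terminal: Any = None
    is_list: bool = False
    constraints: list[Any] = field(default_factory=list)

def analyze_type_sketch(annotation: Any) -> TypeInfoSketch:
    info = TypeInfoSketch()
    while True:  # peel one wrapper per iteration; no recursion, no stack growth
        origin = get_origin(annotation)
        if origin is Annotated:
            base, *metadata = get_args(annotation)
            info.constraints.extend(metadata)  # constraints accumulate as layers peel
            annotation = base
        elif hasattr(annotation, "__supertype__"):  # NewType
            annotation = annotation.__supertype__
        elif origin is list:
            info.is_list = True
            annotation = get_args(annotation)[0]
        else:
            info.terminal = annotation  # terminal classification
            return info
```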
Both concrete `BaseModel` subclasses and discriminated union type aliases (like `Segment = Annotated[Union[RoadSegment, ...], ...]`) satisfy the `FeatureSpec` protocol and flow through the same pipeline. Union extraction finds the common base class, partitions fields into shared and variant-specific, and extracts the discriminator mapping.
`markdown_pipeline.py` orchestrates the full pipeline without I/O: tree expansion, supplementary type collection, path assignment, reverse references, and rendering. Returns `list[RenderedPage]`. The CLI writes files to disk with Docusaurus frontmatter.

Design doc: `packages/overture-schema-codegen/docs/design.md`

Changes outside the codegen package
Preparatory fixes and refactors in core/system/CLI packages:
- `ModelKey.class_name` renamed to `entry_point` (carries the module:Class path, not just the class name)
- `resolve_discriminator_field_name()` moved to the system feature module
- `dict` instead of `Mapping` in system test util type hints
- Example (real) data added to theme `pyproject.toml` files (addresses, base, buildings, divisions, places) under `[examples.ModelName]` sections

What's in the package
Source:
- `type_analyzer.py` -- `TypeInfo`
- `specs.py`
- `type_registry.py`
- `model_extraction.py` -- `ModelSpec`, tree expansion
- `union_extraction.py` -- `UnionSpec`, discriminator mapping
- `enum_extraction.py` -- `EnumSpec`
- `newtype_extraction.py` -- `NewTypeSpec`
- `primitive_extraction.py`
- `field_constraint_description.py`
- `model_constraint_description.py`
- `module_layout.py`
- `type_collection.py`
- `path_assignment.py`
- `link_computation.py`
- `reverse_references.py`
- `markdown_type_format.py` -- `TypeInfo` → markdown type strings with links
- `markdown_renderer.py`
- `example_loader.py`
- `markdown_pipeline.py`
- `cli.py` -- `generate` and `list` commands
- `case_conversion.py`
- `docstring.py`

Tests: unit tests per module, golden file tests for rendered markdown, integration tests against real schema models.
Design decisions worth reviewing
`analyze_type` is iterative, not recursive. The `while True` loop handles arbitrary nesting depth (NewType wrapping Annotated wrapping NewType wrapping Annotated...) without stack growth. Dict key/value types are the one exception where it recurses.
Cache insertion before recursion in `expand_model_tree`. The sub-model's `ModelSpec` enters the cache before its fields are expanded. A back-edge encounter finds the cached entry and marks `starts_cycle=True` rather than infinite-looping.

`FeatureSpec` is a Protocol, not a base class. `ModelSpec` and `UnionSpec` have different field structures (flat list vs. annotated-field list with variant provenance). A protocol lets them share a pipeline interface without forcing inheritance.
Constraint provenance via `ConstraintSource`. Each constraint records which NewType contributed it. Field-level constraints with `source=None` render on the field; constraints with a named source render on the NewType's own page. This prevents duplication.
Test plan
- `make check` passes (pytest + doctests + ruff + mypy)
- `make install && overture-codegen generate --format markdown --output-dir /tmp/schema-docs` produces output
- Reviewed generated pages per feature (e.g., Building) -- field tables, links, constraint descriptions, examples
- Verified cross-links resolve both ways (shared types link back to features, features link to shared types)
The live schema reference contains Markdown produced by these changes (modulo some improvements from today).