schema: hoist inline definitions into named $defs for codegen #726

Merged
hughcars merged 7 commits into main from ad-cqc/schema-codegen
Jun 10, 2026

@ad-cqc ad-cqc commented May 7, 2026

Contributor

Motivation

The current schema is correct for validation, but its pervasive use of inline definitions makes it hostile to code generation (Python, Julia, or any target that turns a JSON Schema into a set of named types in a single module):

  • Inline object schemas inherit the surrounding key name. Two unrelated inline objects under different parents but with the same key (e.g. Attributes, Direction, CoordinateSystem) collide as soon as the generated types share a namespace.
  • Inline schemas inside arrays produce synthetic …Items types. SurfaceCurrent.items was an inline object, which generators render as SurfaceCurrentItems. The same was true for Elements, producing SurfaceCurrentElementsItems — a name with no relationship to anything in the source of truth.
  • oneOf-by-required discriminators are anonymous. LumpedPort and SurfaceCurrent items used oneOf: [{required:[Attributes]}, {required:[Elements]}] over a single inline object. Generators can't see two distinct shapes, so they emit one type with every field optional and lose the discriminator.
  • Inline string enums collide. Repeated literals like ["Cartesian", "Cylindrical"] or ["MGS", "CGS", "CGS2"] show up at multiple call sites; without a named definition, each call site produces a separate, name-clashing enum.
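
The first bullet can be made concrete with a small sketch. It assumes a naive generator that names each inline object type after its property key, which is a simplification of what real generators do. The helpers `inline_object_names` and `find_collisions` and the toy schema are hypothetical and are not taken from the Palace schema files.

```python
from collections import defaultdict


def inline_object_names(schema, path="#"):
    """Yield (key, json_path) for every inline object schema under `properties`.

    Assumes a naive generator that names each inline type after its key.
    """
    if not isinstance(schema, dict):
        return
    for key, sub in schema.get("properties", {}).items():
        sub_path = f"{path}/properties/{key}"
        if isinstance(sub, dict) and sub.get("type") == "object" and "$ref" not in sub:
            yield key, sub_path
        yield from inline_object_names(sub, sub_path)
    # Array items can also hold inline objects with nested properties.
    items = schema.get("items")
    if isinstance(items, dict):
        yield from inline_object_names(items, f"{path}/items")


def find_collisions(schema):
    """Return {name: [paths]} for names that more than one inline object would claim."""
    seen = defaultdict(list)
    for key, where in inline_object_names(schema):
        seen[key].append(where)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


# Toy schema (hypothetical): two unrelated inline objects both keyed "Direction".
toy = {
    "type": "object",
    "properties": {
        "Port": {
            "type": "object",
            "properties": {"Direction": {"type": "object", "properties": {}}},
        },
        "Dipole": {
            "type": "object",
            "properties": {"Direction": {"type": "object", "properties": {}}},
        },
    },
}

collisions = find_collisions(toy)
```

Here `collisions` maps `"Direction"` to the two JSON paths that would both generate a type with that name once the generated types share a namespace.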

What this PR does

Commit 1: schema refactor (scripts/schema/config/*.json)

Pure refactor — no semantic changes. The same documents that validated before still validate, and the same documents that failed still fail.

For each of boundaries.json, domains.json, model.json, problem.json, solver.json:

  1. Hoist inline object schemas into $defs with unique, source-traceable names. Examples:

    • boundaries: PEC, PMC, Absorbing, Conductivity, WavePort, WavePortPEC, Ground, ZeroCharge, Terminal, Periodic, BoundaryPostprocessing
    • domains: Material, DomainPostprocessing, DomainEnergy, Probe, CurrentDipole
    • model: Box, Sphere
  2. Split anonymous oneOf discriminators into named variants:

    • LumpedPort.items → oneOf: [LumpedPortAttributes, LumpedPortElements]
    • SurfaceCurrent.items → oneOf: [SurfaceCurrentAttributes, SurfaceCurrentElements]

    This gives codegen two distinct, well-named types plus a tagged union, instead of one type with everything optional.

  3. Promote inline string enums to named $defs: CoordinateSystem, ProblemType, GSOrthogonalization, AdaptiveGSOrthogonalization, AdaptiveCircuitSynthesisDomainOrthogonalization, MaterialAxes, Direction, DipoleDirection.

  4. Name polymorphic value shapes that previously appeared as anonymous oneOf fragments:

    • Samples → SamplesPoint | SamplesLinear | SamplesLog
    • Port Direction → PortDirection | Vector3
    • Port Excitation integer → ExcitationIndex
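
The hoisting step in item 1 can be sketched as a mechanical transformation. The helper `hoist` below is hypothetical, as are the fields of the toy `Probe` object. It moves one inline subschema into `$defs` unchanged and leaves a `$ref` in its place.

```python
import copy


def hoist(schema, parent_path, key, def_name):
    """Return a copy of `schema` with the inline subschema at `key` moved into $defs.

    `parent_path` is the list of dict keys leading to the mapping that holds `key`.
    """
    out = copy.deepcopy(schema)
    parent = out
    for step in parent_path:
        parent = parent[step]
    defs = out.setdefault("$defs", {})
    if def_name in defs:
        raise ValueError(f"$defs name already taken: {def_name}")
    defs[def_name] = parent[key]  # the body is moved as-is, not rewritten
    parent[key] = {"$ref": f"#/$defs/{def_name}"}
    return out


# Toy input (field names are illustrative only).
before = {
    "type": "object",
    "properties": {
        "Probe": {
            "type": "object",
            "properties": {"Index": {"type": "integer"}},
            "required": ["Index"],
            "additionalProperties": False,
        }
    },
}

after = hoist(before, ["properties"], "Probe", "Probe")
```

After the call, `after["properties"]["Probe"]` is a `$ref` to `#/$defs/Probe`, and the definition body equals the original inline object. Refusing to overwrite an existing `$defs` name is one way to enforce the "unique names" requirement.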

Commit 2: validator follow-ups (palace/utils/jsonschema.cpp, test/unit/test-schema.cpp)

Caught by running the schema unit tests locally; required to keep behavior consistent under the named-$ref schema.

  1. FindAllSchemasByKey now resolves internal #/$defs/ refs instead of skipping them. Previously the function continued past any $ref into $defs, which meant schemas only reachable through a named ref (e.g. BoundaryPostprocessing.FarField) were no longer findable via ValidateConfig(data, "FarField"). Adds a depth guard against pathological cycles.
  2. SchemaErrorHandler enhances type-mismatch and enum errors using substring matching. With LumpedPort.items now a oneOf, the validator wraps simple errors as [combination: oneOf / case#0] unexpected instance type. The exact-string check stopped matching, so the (got string) / valid-enum hints disappeared. Switching to find() makes the enhancement fire regardless of oneOf/anyOf wrapping.
  3. Two exact-string assertions in test-schema.cpp relaxed to substring checks. The validator's behavior is unchanged, but the wording for Index failures shifted from a flat single-message form to a oneOf-enumerated form. Asserting on the underlying message keeps the test resilient to validator-output formatting.
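
The lookup change in item 1 can be illustrated with a Python analogue. This is a minimal sketch and not the C++ implementation in jsonschema.cpp: the function names, the `max_depth` default, and the toy schema contents are all assumptions made for illustration.

```python
REF_PREFIX = "#/$defs/"


def resolve(node, root):
    """Return the $defs entry an internal $ref points at, or `node` itself."""
    ref = node.get("$ref") if isinstance(node, dict) else None
    if isinstance(ref, str) and ref.startswith(REF_PREFIX):
        return root.get("$defs", {}).get(ref[len(REF_PREFIX):], node)
    return node


def find_all_schemas_by_key(node, key, root, depth=0, max_depth=32):
    """Collect every subschema stored under the name `key`, following internal refs."""
    if depth > max_depth:
        return []  # depth guard: stop instead of recursing forever on a cycle
    node = resolve(node, root)
    if isinstance(node, list):
        children = [(None, child) for child in node]
    elif isinstance(node, dict):
        # $defs entries are only visited when a $ref leads to them.
        children = [(k, v) for k, v in node.items() if k != "$defs"]
    else:
        return []
    found = []
    for name, child in children:
        if name == key and isinstance(child, dict):
            found.append(resolve(child, root))
        found += find_all_schemas_by_key(child, key, root, depth + 1, max_depth)
    return found


# Toy schema: FarField is reachable only through two named refs.
schema = {
    "type": "object",
    "properties": {"Postprocessing": {"$ref": "#/$defs/BoundaryPostprocessing"}},
    "$defs": {
        "BoundaryPostprocessing": {
            "type": "object",
            "properties": {"FarField": {"$ref": "#/$defs/FarField"}},
        },
        "FarField": {"type": "object", "properties": {"NSample": {"type": "integer"}}},
    },
}

hits = find_all_schemas_by_key(schema, "FarField", schema)
```

A lookup that skipped every `$ref` would return nothing for `"FarField"` here. On a self-referential definition this sketch keeps descending until it reaches the cap, so it returns repeated hits; the cap only guarantees termination.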

Why this is safe

  • Every schema change replaces an inline schema with a $ref to a $defs entry whose body is byte-equivalent (modulo whitespace) to the inlined version.
  • additionalProperties: false, required, and oneOf/allOf constraints are preserved at the same nesting level.
  • For LumpedPort / SurfaceCurrent, the two split variants carry the full property set; only the required differs, so no document that was valid before is rejected now.
  • FindAllSchemasByKey continues to return the same results for top-level property lookups; it now also finds keys reachable only through internal $defs refs (strictly more permissive).
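
The LumpedPort / SurfaceCurrent claim can be checked on a toy example. The sketch below uses a deliberately tiny hand-written validator that only understands `$ref`, `required`, `properties`, `additionalProperties: false`, and `oneOf`. The property set (`Index`, `Attributes`, `Elements`, `R`) is a simplified stand-in and not the real LumpedPort schema.

```python
def valid(doc, schema, root):
    """Tiny validator: $ref, required, additionalProperties: false, oneOf only.

    Property values are not type-checked; that is enough for this comparison.
    """
    if "$ref" in schema:
        name = schema["$ref"].rsplit("/", 1)[-1]
        return valid(doc, root["$defs"][name], root)
    if not isinstance(doc, dict):
        return False
    props = schema.get("properties", {})
    if any(k not in doc for k in schema.get("required", [])):
        return False
    if schema.get("additionalProperties") is False and any(k not in props for k in doc):
        return False
    if "oneOf" in schema:
        # oneOf requires exactly one arm to accept the document.
        if sum(valid(doc, arm, root) for arm in schema["oneOf"]) != 1:
            return False
    return True


PROPS = {"Index": {}, "Attributes": {}, "Elements": {}, "R": {}}

# Before: one inline object with an anonymous oneOf-by-required discriminator.
old = {
    "type": "object",
    "properties": PROPS,
    "additionalProperties": False,
    "required": ["Index"],
    "oneOf": [{"required": ["Attributes"]}, {"required": ["Elements"]}],
}


def variant(extra_required):
    """Named variant carrying the full property set; only `required` differs."""
    return {
        "type": "object",
        "properties": PROPS,
        "additionalProperties": False,
        "required": ["Index", extra_required],
    }


# After: two named variants referenced from a oneOf.
new = {
    "oneOf": [
        {"$ref": "#/$defs/LumpedPortAttributes"},
        {"$ref": "#/$defs/LumpedPortElements"},
    ],
    "$defs": {
        "LumpedPortAttributes": variant("Attributes"),
        "LumpedPortElements": variant("Elements"),
    },
}

docs = [
    {"Index": 1, "Attributes": [1]},
    {"Index": 1, "Elements": []},
    {"Index": 1},
    {"Index": 1, "Attributes": [1], "Elements": []},
    {"Attributes": [1]},
    {"Index": 1, "Attributes": [1], "Bogus": 0},
    {"Index": 1, "Attributes": [1], "R": 50},
]

old_results = [valid(d, old, old) for d in docs]
new_results = [valid(d, new, new) for d in docs]
```

Both forms accept and reject the same sample documents. A document carrying both Attributes and Elements is rejected either way, because both arms of the `oneOf` match it.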

Test plan

  • Local: cmake --build build/palace-build --target unit-tests && build/palace-build/test/unit/palace-unit-tests "[schema]" — 10 passed, 1 skipped (the Embedded Schema Matches Source SECTION skips by design when its source path isn't present), 0 failed, 155/155 assertions.
    • Sub-schema lookup by key for LumpedPort / WavePort / SurfaceCurrent / CurrentDipole / Materials / FarField.
    • LumpedPort / SurfaceCurrent oneOf: [Attributes, Elements] discrimination.
    • PEC/Ground and PMC/ZeroCharge mutual exclusion.
    • All 24 example configs validate.
    • Problem.Type → Solver requirements (Driven/Eigenmode/Transient reject if matching solver section absent; Electrostatic/Magnetostatic accept defaults).
    • Error-message format: enum hints, type-mismatch hints, oneOf-wrapped messages, additional-property failures.
  • CI style.yml "Check JSON Schema" — scripts/validate-config runs over every examples/**/*.json against the refactored schema.
  • CI unit-test job (test-schema.cpp).

@ad-cqc ad-cqc requested a review from hughcars May 7, 2026 22:01
@ad-cqc ad-cqc force-pushed the ad-cqc/schema-codegen branch 2 times, most recently from aba4829 to 3ac9de4 May 13, 2026 16:18
@ad-cqc ad-cqc force-pushed the ad-cqc/schema-codegen branch 2 times, most recently from d4f2b7e to b070c77 May 20, 2026 17:30

@hughcars hughcars left a comment

Collaborator

Generally LGTM, one small request on error/warning handling. I think we should probably hold off merging this until #719 and #716 go in.

Comment thread palace/utils/jsonschema.cpp
ad-cqc added 7 commits June 9, 2026 13:19
…iness

Promote every anonymous object/array-item schema and inline string enum
into top-level $defs entries with unique names. Replace inline schemas
with $ref pointers. Validator behavior is unchanged.

LumpedPort and SurfaceCurrent items are split into named oneOf variants
(LumpedPortAttributes / LumpedPortElements, SurfaceCurrentAttributes /
SurfaceCurrentElements) so a downstream codegen sees two distinct
shapes plus a tagged union, instead of one shape with everything
optional.

Resolve internal $defs references during recursive schema key lookup
so that nested-via-$ref schemas (e.g. BoundaryPostprocessing.FarField)
are discoverable. Add a depth guard to prevent infinite recursion from
pathological self-referential $defs cycles.

Change error message enhancement from exact equality checks to
substring matching so type mismatch and enum hints also fire for
messages wrapped in oneOf/anyOf combination contexts.
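
A minimal sketch of the difference between the two matching strategies, written in Python rather than the C++ of SchemaErrorHandler. The type-mismatch wording and the wrapped message are quoted from the PR description. The enum wording, the hint formats, and the `add_hint` helper are assumptions for illustration.

```python
TYPE_MISMATCH = "unexpected instance type"  # wording quoted in the PR description
ENUM_MISMATCH = "instance not found in required enum"  # assumed wording

JSON_TYPE_NAMES = {
    str: "string", bool: "boolean", int: "integer", float: "number",
    list: "array", dict: "object", type(None): "null",
}


def add_hint(message, instance, allowed=None, exact=False):
    """Append a hint to a validator message.

    exact=True mimics an equality check; exact=False uses substring matching.
    """
    def matches(needle):
        return message == needle if exact else needle in message

    if matches(TYPE_MISMATCH):
        return f"{message} (got {JSON_TYPE_NAMES.get(type(instance), 'unknown')})"
    if matches(ENUM_MISMATCH) and allowed:
        return f"{message} (valid values: {', '.join(allowed)})"
    return message


wrapped = "[combination: oneOf / case#0] unexpected instance type"
old_style = add_hint(wrapped, "abc", exact=True)
new_style = add_hint(wrapped, "abc")
```

With the equality check, the wrapped message passes through without a hint. With substring matching, the `(got string)` hint is appended regardless of the oneOf prefix.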

Update unit tests to use substring assertions instead of exact string
comparisons to accommodate the richer validator output.

Rename DielectricType to InterfaceDielectric in boundaries.json for
clarity, and merge duplicate AdaptiveGSOrthogonalization and
GSOrthogonalization enums into a single Orthogonalization definition
in solver.json.

…ath $def

VoltagePath and CurrentPath share the same array-of-2D/3D-points shape;
hoist into a single $defs/CoordinatePath and reference it from all
WavePort, PostprocessingImpedance, and PostprocessingVoltage uses.

Separate the max-depth guard from the is_object check in
FindAllSchemasByKey and emit an Mpi::Warning when the depth cap
is hit. This surfaces potential self-referential $defs cycles in
the schema as a developer-visible diagnostic instead of silently
truncating the search.

Update SchemaCoverageGaps to resolve $ref pointers against $defs and
handle oneOf/anyOf discriminated unions introduced by schema hoisting.
Properties are collected as the union across arms while required fields
are computed as the intersection, matching pre-hoist single-object
semantics.
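
The union/intersection rule in the last commit message can be sketched as follows. The helper `merge_arms` is hypothetical and takes arms whose `$ref` pointers have already been resolved. It is not the SchemaCoverageGaps code itself.

```python
def merge_arms(arms):
    """Combine oneOf/anyOf arms: properties as a union, required as an intersection."""
    properties = {}
    required = None
    for arm in arms:
        properties.update(arm.get("properties", {}))
        arm_required = set(arm.get("required", []))
        # A field counts as required only if every arm requires it.
        required = arm_required if required is None else required & arm_required
    return properties, sorted(required or [])


# Toy arms modelled on the Attributes / Elements split (fields are illustrative).
arms = [
    {"properties": {"Index": {}, "Attributes": {}}, "required": ["Index", "Attributes"]},
    {"properties": {"Index": {}, "Elements": {}}, "required": ["Index", "Elements"]},
]

properties, required = merge_arms(arms)
```

For these arms the merged view has the properties Index, Attributes, and Elements, with only Index required. That is the shape a single object with a oneOf-by-required discriminator would have reported.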
@ad-cqc ad-cqc force-pushed the ad-cqc/schema-codegen branch from 4c28172 to f526f19 June 9, 2026 20:19
@hughcars hughcars merged commit 64e69e9 into main Jun 10, 2026
67 of 69 checks passed
@hughcars hughcars deleted the ad-cqc/schema-codegen branch June 10, 2026 15:46

2 participants