[Generator][Experimental] Add opt-in dependency-aware sharded Types generation#866

Draft
jpsim wants to merge 2 commits into apple:main from jpsim:jp/sharding

jpsim commented Feb 22, 2026

Summary

This PR adds an experimental, fully opt-in sharded Types generation mode intended for very large OpenAPI specs.

Default behavior is unchanged. This is an additive path for advanced users who need better generated-code compile times and are willing to adopt custom build integration.

Why experimental

This feature targets a narrow, high-scale use case and introduces additional generation topology (layering + sharding + inter-file imports). Marking it experimental lets us:

  • gather real-world feedback before committing to long-term API stability,
  • iterate on naming/layout heuristics with lower compatibility pressure,
  • keep default maintenance burden low for the core project.

Real-world motivation / observed impact

In our production use case (large spec, filtered subset), generated output compilation was a major bottleneck.
With dependency-aware layering + sharding, we observed >2x faster compilation of generated output.

Example environment and workload:

  • OpenAPI source: ~18MB (filtered to ~300 operations for generation)
  • Baseline: ~45s generation + ~6m30s release compile of generated output
  • Sharded mode: generated output compile time reduced by more than half (to ~3m in this workload)

These results are workload- and build-system-dependent, but they motivated proposing this upstream as an optional path for similarly large deployments.

Benchmark snapshot (real-world workload)

The chart below shows compile time of generated output in our production project (not generator runtime), measured on the same filtered spec input.

  • Baseline: single-file Types output
  • Variant: dependency-aware sharded output
  • Observed: ~6m30s -> ~3m (more than 2x faster)

Note: this is one workload and build setup; absolute gains will vary by project and toolchain.

[Figure: ramp benchmark comparing before vs. after compile time for generated swift-openapi output (single-file vs. dependency-aware sharded output)]

Scope of this PR

  • Add ShardingConfig to Config (opt-in only), with validation.
  • Add dependency-graph-based partitioning primitives used for sharding.
  • Add sharded Types output path (root + component/type/operation shard files).
  • Add CLI/config plumbing for explicit opt-in sharding.
  • Add tests for algorithms, config validation, and sharded generation behavior.
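
To make the "opt-in only, with validation" item concrete, here is a minimal sketch of what such a config type could look like. The field names `typeShardCounts`, `operationLayerShardCounts`, and `modulePrefix` come from the commit description in this PR. The field types, the `ShardingConfigSketch` and `ValidationError` names, and the validation rules are assumptions for illustration, not the PR's actual code.

```swift
// Hypothetical sketch: field names are from the PR description,
// but the types and validation rules below are assumptions.
struct ShardingConfigSketch {
    var typeShardCounts: [Int]            // assumed: one shard count per type layer
    var operationLayerShardCounts: [Int]  // assumed: one shard count per operation layer
    var modulePrefix: String?             // optional prefix for deterministic file names

    enum ValidationError: Error, Equatable {
        case emptyShardCounts(field: String)
        case nonPositiveShardCount(field: String, value: Int)
        case emptyModulePrefix
    }

    /// Rejects configurations that could not produce at least one shard per layer.
    func validate() throws {
        let fields: [(String, [Int])] = [
            ("typeShardCounts", typeShardCounts),
            ("operationLayerShardCounts", operationLayerShardCounts),
        ]
        for (field, counts) in fields {
            if counts.isEmpty {
                throw ValidationError.emptyShardCounts(field: field)
            }
            if let bad = counts.first(where: { $0 < 1 }) {
                throw ValidationError.nonPositiveShardCount(field: field, value: bad)
            }
        }
        if let prefix = modulePrefix, prefix.isEmpty {
            throw ValidationError.emptyModulePrefix
        }
    }
}
```

Keeping the whole config optional on the parent `Config` (nil by default) is one way to get the "behavior is unchanged unless set" property the PR describes.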

Non-goals

  • No change to default single-file output behavior.
  • No auto-sharding heuristics in this PR.
  • No promise of stable file naming/layout semantics yet (experimental).

Stability / compatibility

  • If sharding is not set, behavior is unchanged.
  • Experimental sharded output is explicitly subject to iteration.
  • Existing users are unaffected unless they opt in.

Maintenance posture

  • Isolated implementation and tests to limit impact on the main generation path.
  • Happy to adjust scope, naming, or API shape to align with maintainer preference.
  • If preferred, this can be additionally gated/documented under an explicit “Experimental” section.

Add foundational infrastructure for dependency-aware sharded
code generation without enabling the feature.

* `GraphAlgorithms.swift`: iterative Tarjan SCC, topological
  sort via min-heap, condensation DAG, longest-path layering,
  and LPT bin-packing
* `ShardingConfig` in `Config.swift` with `typeShardCounts`,
  `operationLayerShardCounts`, validation, and optional
  `modulePrefix` for deterministic file naming
* `StructuredSwiftRepresentation.file` replaced with `.files`
  array for multi-file output (backward-compatible via
  `init(file:)`)
* `ImportDescription` gains `exported` flag; renderer emits
  `@_exported` prefix when set
* `HeapModule` added as a dependency from swift-collections
* Comprehensive tests for all graph algorithms
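
For readers unfamiliar with LPT (longest-processing-time-first) bin-packing, the following is a minimal standalone sketch of the general technique, not the code in `GraphAlgorithms.swift`. The `Item` type, its `weight` field, and the `packLPT` function are hypothetical names, and the weight is assumed to be some per-type cost estimate.

```swift
/// Hypothetical item: `weight` stands in for an estimated cost,
/// for example the size of one generated type.
struct Item {
    let name: String
    let weight: Int
}

/// Greedy LPT bin-packing: sort items heaviest first, then place each
/// item into the bin with the smallest current load.
func packLPT(_ items: [Item], binCount: Int) -> [[Item]] {
    precondition(binCount > 0, "binCount must be positive")
    var bins = Array(repeating: [Item](), count: binCount)
    var loads = Array(repeating: 0, count: binCount)

    // Break weight ties by name so the result is deterministic.
    let sorted = items.sorted { lhs, rhs in
        if lhs.weight != rhs.weight { return lhs.weight > rhs.weight }
        return lhs.name < rhs.name
    }

    for item in sorted {
        // Find the lightest bin; on ties the lowest index wins.
        var target = 0
        for index in 1..<binCount where loads[index] < loads[target] {
            target = index
        }
        bins[target].append(item)
        loads[target] += item.weight
    }
    return bins
}
```

With weights A=7, B=5, C=4, D=3, E=1 and two bins, this places A and D in the first bin and B, C, E in the second, for a load of 10 each. The deterministic tie-breaking matters here because the PR lists determinism among its tested invariants.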

Add opt-in sharding that splits generated Types and Operations
output across multiple files based on schema dependency layers,
enabling parallel compilation in consuming build systems.

* `SchemaDependencyGraph`: builds a dependency graph from
  `OpenAPI.ComponentDictionary<JSONSchema>`, computes SCC
  layers, and extracts per-operation schema references
* `TypesFileTranslator` gains `translateFileSharded`,
  `translateSchemasSharded`, `translateOperationsSharded`,
  `ShardNamingStrategy` (default and prefixed), and
  `ShardImportResolver` for cross-shard `@_exported` imports
* `runShardedGenerator()` in `GeneratorPipeline.swift` for
  multi-file output
* `--sharding` CLI flag and `sharding` config file key
* Remove verbose per-file log messages from `replaceFileContents`
* Tests for sharding invariants, determinism, naming contracts,
  and end-to-end generation
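
The "dependency layers" mentioned above can be illustrated with a small longest-path layering sketch. This is not the PR's `SchemaDependencyGraph`: the `layerByLongestPath` function and the string-keyed graph are hypothetical, and the sketch assumes cycles have already been collapsed (the PR describes an SCC pass for that), returning nil if one remains.

```swift
/// Longest-path layering over a dependency DAG.
/// `dependencies[x]` lists the nodes that `x` depends on.
/// A node with no dependencies gets layer 0; every other node gets
/// 1 + the highest layer among its dependencies.
func layerByLongestPath(_ dependencies: [String: [String]]) -> [String: Int]? {
    // Collect every node, including ones that only appear as a dependency.
    var nodes = Set(dependencies.keys)
    for deps in dependencies.values { nodes.formUnion(deps) }

    var remaining: [String: Int] = [:]        // unresolved dependency count
    var dependents: [String: [String]] = [:]  // reverse edges
    for node in nodes {
        let deps = Set(dependencies[node] ?? [])
        remaining[node] = deps.count
        for dep in deps { dependents[dep, default: []].append(node) }
    }

    var layer: [String: Int] = [:]
    // Kahn-style traversal; sorting keeps the visit order deterministic.
    var ready = nodes.filter { remaining[$0] == 0 }.sorted()
    for node in ready { layer[node] = 0 }
    var index = 0
    while index < ready.count {
        let node = ready[index]
        index += 1
        for dependent in (dependents[node] ?? []).sorted() {
            layer[dependent] = max(layer[dependent] ?? 0, layer[node]! + 1)
            remaining[dependent]! -= 1
            if remaining[dependent] == 0 { ready.append(dependent) }
        }
    }
    // If some node never became ready, the input contained a cycle.
    return ready.count == nodes.count ? layer : nil
}
```

For example, if `Pet` depends on `Category` and `Tag`, and `Order` depends on `Pet`, then `Category` and `Tag` land in layer 0, `Pet` in layer 1, and `Order` in layer 2. Types in the same layer do not depend on each other, which is what would let a build system compile their shards in parallel.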