Skip to content

SortingCatalog: Shared Topological Sort Abstraction #115

@avallete

Description

@avallete

SortingCatalog: Shared Topological Sort Abstraction

Current State

Two independent topological sort implementations exist:

  • pg-topo ([src/graph/topo-sort.ts](packages/pg-topo/src/graph/topo-sort.ts)): Kahn's algorithm with rich tie-breaking (phase, statementClass, filePath, statementIndex) + Tarjan SCC cycle detection. Operates on StatementNode[] + Map<number, Set<number>> edges.
  • pg-delta ([src/core/sort/topological-sort.ts](packages/pg-delta/src/core/sort/topological-sort.ts)): Kahn's algorithm with simple index-based tie-breaking + DFS cycle detection. Operates on nodeCount + Array<[number, number]> edges.

Both packages also have separate graph-building logic:

  • pg-topo [build-graph.ts](packages/pg-topo/src/graph/build-graph.ts): Resolves ObjectRef provides/requires with kind compatibility + signature matching + external providers.
  • pg-delta [graph-builder.ts](packages/pg-delta/src/core/sort/graph-builder.ts) + [sort-changes.ts](packages/pg-delta/src/core/sort/sort-changes.ts): Resolves string-based stable IDs from Change.creates/requires + PgDependRow[] from pg_depend.

pg-delta already uses pg-topo for the declarative apply path (SQL files via analyzeAndSort), but uses its own sort for the imperative path (diff-based changes).

Design: SortingCatalog

The SortingCatalog is a string-key-based intermediate representation that sits between domain-specific dependency extraction and topological sorting. String keys are the right level of abstraction because:

  • pg-topo already has objectRefKey() producing kind:schema:name:signature strings
  • pg-delta already uses string-based stable IDs (type:schema.name)
  • It avoids coupling the sort to ObjectRef semantics (kind compatibility, signature matching)
// New types in pg-topo

type SortingItem = {
  /** Canonical string keys this item provides/creates */
  provides: string[];
  /** Canonical string keys this item depends on */
  requires: string[];
  /** Primary tie-breaking priority (lower = earlier). Maps to phase weight. */
  priority: number;
  /** Secondary tie-breaking weight (lower = earlier). Maps to type/class weight. */
  weight: number;
  /** Tertiary tie-breaking key for deterministic ordering (e.g., filePath or stableId). */
  groupKey: string;
  /** Final tie-breaking: original position in input. */
  orderKey: number;
  /** Human-readable label for diagnostics and error messages. */
  label: string;
};

type SortingCatalog = {
  items: SortingItem[];
  /** Keys provided externally (no edge needed when an item requires one of these). */
  externalProviders?: Set<string>;
  /** Direct edges to add beyond provides/requires resolution (for custom constraints). */
  additionalEdges?: Array<{ from: number; to: number; label?: string }>;
};

type SortCatalogResult = {
  orderedIndices: number[];
  cycleGroups: number[][];
  diagnostics: SortDiagnostic[];
};

Architecture: Two-Layer Pipeline

flowchart TB
  subgraph pgTopo_sql [pg-topo SQL Path]
    SQL["SQL strings"] --> Parse["Parse + Classify + Extract"]
    Parse --> SN["StatementNode[] with ObjectRef provides/requires"]
    SN --> Resolve["Resolve ObjectRefs: kind/signature compat matching"]
    Resolve --> SC1["SortingCatalog with string keys"]
  end

  subgraph pgDelta_path [pg-delta Change Path]
    Changes["Change[] + PgDependRow[]"] --> Convert["Convert stable IDs to provides/requires"]
    Convert --> SC2["SortingCatalog with string keys"]
  end

  SC1 --> Sort["sortCatalog() — shared in pg-topo"]
  SC2 --> Sort
  Sort --> Result["SortCatalogResult: orderedIndices + cycles + diagnostics"]
Loading

Layer 1 (domain-specific): Produce a SortingCatalog from domain objects. Each domain handles its own matching/resolution:

  • pg-topo SQL path: ObjectRef compatibility matching happens BEFORE building the SortingCatalog. The current buildGraph is split into a "resolve" step (ObjectRef matching -> canonical string keys) and the generic sort.
  • pg-delta path: Stable ID matching + pg_depend row resolution happens during SortingCatalog construction. Custom constraints become additionalEdges.

Layer 2 (shared): sortCatalog(catalog: SortingCatalog) -> SortCatalogResult. This does:

  1. Build edges from provides/requires (simple string matching)
  2. Add additionalEdges
  3. Run Kahn's algorithm with priority > weight > groupKey > orderKey tie-breaking
  4. SCC cycle detection for remaining nodes
  5. Return ordered indices + cycle groups + diagnostics (unresolved deps, duplicate producers)

Key Changes by Package

pg-topo

  1. New types: SortingItem, SortingCatalog, SortCatalogResult in [src/model/types.ts](packages/pg-topo/src/model/types.ts).
  2. New sortCatalog() function: Generic sort operating on SortingCatalog. Lives in a new file src/graph/sort-catalog.ts. Replaces the current topoSort + buildGraph combination for string-key-based sorting.
  3. Split buildGraph: The current [build-graph.ts](packages/pg-topo/src/graph/build-graph.ts) is refactored into:
  • A resolve step (resolve-dependencies.ts): Takes StatementNode[] + externalProviders, does ObjectRef matching (kind compat, signature compat), produces canonical string keys for each node's provides/requires. This is where isKindCompatible, signaturesCompatible, producerIndicesForRequirement, etc. live.
  • A catalog builder (build-sorting-catalog.ts): Takes resolved StatementNodes, converts to SortingCatalog (objectRefKey for provides/requires, phase weights, statement class weights).
  • The generic sortCatalog() handles the rest.
  1. Refactor analyzeAndSort: [analyze-and-sort.ts](packages/pg-topo/src/analyze-and-sort.ts) becomes:
  • Parse SQL -> StatementNode[]
  • Resolve ObjectRefs (new resolve step) -> resolved deps + diagnostics
  • Build SortingCatalog
  • Call sortCatalog()
  • Map results back to StatementNode ordering
  1. New public API: Export sortCatalog, SortingCatalog, SortingItem, SortCatalogResult from [src/index.ts](packages/pg-topo/src/index.ts). Also export a buildSortingCatalogFromStatements() helper for the SQL path.
  2. Shuffle testing utility: A test helper that takes a SortingCatalog, shuffles items, sorts, and validates the output order is consistent with the dependency constraints. This enables detecting missing dependencies.

pg-delta

  1. New adapter: A function in sort/ (e.g., build-sorting-catalog.ts) that converts Change[] + PgDependRow[] into a SortingCatalog:
  • provides = Change.creates (and Change.drops in DROP phase)
  • requires = Change.requires
  • pg_depend rows mapped to requires or additionalEdges
  • Custom constraints become additionalEdges
  • priority from execution phase (drop vs create_alter)
  • weight from logical sort weight (object type ordering)
  • groupKey from stable ID
  • label from change description
  1. Refactor sort-changes.ts: Replace calls to performStableTopologicalSort + findCycle with calls to pg-topo's sortCatalog(). The cycle-breaking loop in sortPhaseChanges would remain in pg-delta (it's specific to pg-delta's domain: sequence ownership, etc.), operating on the SortingCatalog by modifying edges before re-sorting.
  2. Remove topological-sort.ts: Delete performStableTopologicalSort, findCycle. Keep formatCycleError (or move it to a shared location) since pg-delta needs cycle error formatting with Change-specific context.

Handling pg-delta's Specific Concerns

  • Two-phase sort (DROP reversed): pg-delta continues to partition changes into DROP and CREATE_ALTER phases, building a separate SortingCatalog for each. Edge inversion for DROP phase happens during SortingCatalog construction (swap provides/requires or swap source/target in additionalEdges).
  • Cycle-breaking: Remains in pg-delta. The loop: build SortingCatalog -> sortCatalog() -> if cycles, filter edges -> rebuild catalog -> re-sort. pg-topo's sortCatalog reports cycles but doesn't break them (reports cycle groups like it does today).
  • Custom constraints: Expressed as additionalEdges in the SortingCatalog. These are direct ordering rules (e.g., "default privileges before CREATEs") that bypass provides/requires resolution.
  • Debug visualization: pg-delta's [debug-visualization.ts](packages/pg-delta/src/core/sort/debug-visualization.ts) can continue to work with the SortingCatalog result + Change metadata for Mermaid output.

Migration Strategy

This is a non-trivial refactoring. Recommended order:

  1. Phase A (pg-topo only): Add SortingCatalog types + sortCatalog() alongside existing code. Verify by testing that analyzeAndSort produces identical results when routed through the new path.
  2. Phase B (pg-topo refactor): Split buildGraph into resolve + catalog builder. Refactor analyzeAndSort to use the new pipeline. All existing pg-topo tests must pass.
  3. Phase C (pg-delta integration): Add the Change -> SortingCatalog adapter. Wire sort-changes.ts to use sortCatalog(). All existing pg-delta tests must pass.
  4. Phase D (cleanup): Remove topological-sort.ts from pg-delta. Remove old topoSort from pg-topo if fully superseded.
  5. Phase E (shuffle testing): Add shuffle test utility to pg-topo. Use it in pg-topo's test suite to validate dependency extraction completeness.

Metadata

Metadata

Assignees

No one assigned

    Labels

    🛠️ ChoreInternal maintenance, refactoring, tooling, dependencies, or performance work

    Type

    No fields configured for Task.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions