adr: purpose of deterministic UUIDs (compiler reproducibility vs. offline minting) #15

@maehr

Context

The CanonicalReference UUID seed includes normalization_version, which is fixed at minting time (specification.md:207, identifier-syntax.md:37). The citation system's current normalization_version may differ from any given reference's minting-time value — that's the whole point of the version field.

Consequence: a third party with only

(work_key, citation_system_key, locator)

cannot reliably compute the registry UUID, because they don't know which normalization_version was in force when that specific reference was minted. They have to look it up in the registry first — at which point they already have the IRI.
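A minimal sketch of why the triple alone is not enough, assuming a name-based (version 5) UUID over a delimited seed. The namespace, the seed layout, the `mint` helper, and the sample keys are all hypothetical; the real seed is defined in specification.md.

```python
import uuid

# Hypothetical namespace; the real one is defined by the spec.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/textrefs")


def mint(work_key, citation_system_key, locator, normalization_version):
    """Derive a UUID from the three semantic fields plus the minting-time version."""
    # Assumed seed layout: fields joined by a unit-separator character.
    seed = "\x1f".join(
        [work_key, citation_system_key, locator, normalization_version]
    )
    return uuid.uuid5(NAMESPACE, seed)


# Hypothetical sample values.
fields = ("bible", "chapter-verse", "john.3.16")

# Same semantic triple, two candidate minting-time versions.
id_v1 = mint(*fields, "1")
id_v2 = mint(*fields, "2")

# The IDs differ, so a client that does not know the minting-time
# version cannot tell which one the registry holds.
print(id_v1 == id_v2)
```

Under these assumptions the function is deterministic for a full four-field input, but the triple alone maps to one candidate ID per possible version.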

This means deterministic UUIDs in TextRefs do not support naive client-side offline minting. What they do support is:

  • compiler / mirror / export reproducibility — anyone with the registry can reproduce identical IDs;
  • detection of accidental ID drift in the compile pipeline;
  • a deterministic round-trip from authored data → published artifact.
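The reproducibility and drift-detection uses listed above can be sketched as a recompute-and-compare pass over registry records. This is an illustration only: the record shape, the namespace, the seed layout, and the `mint` and `find_drift` helpers are assumptions, not part of the spec.

```python
import uuid

# Hypothetical namespace; the real one is defined by the spec.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/textrefs")

# Assumed seed field order.
SEED_FIELDS = ("work_key", "citation_system_key", "locator", "normalization_version")


def mint(record):
    """Recompute the ID from a record that carries its minting-time version."""
    seed = "\x1f".join(record[name] for name in SEED_FIELDS)
    return str(uuid.uuid5(NAMESPACE, seed))


def find_drift(records):
    """Return the stored IDs that no longer match their recomputed value."""
    return [r["id"] for r in records if mint(r) != r["id"]]


# Hypothetical registry export: one consistent record, one whose locator
# was edited after minting without re-deriving the ID.
good = {
    "work_key": "bible",
    "citation_system_key": "chapter-verse",
    "locator": "john.3.16",
    "normalization_version": "1",
}
good["id"] = mint(good)

drifted = dict(good, locator="john.3.17")  # keeps the old "id"

drift = find_drift([good, drifted])
```

Because every record stores its own `normalization_version`, a mirror or compiler holding the registry can run this check; a client holding only the triple cannot.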

The spec is currently silent on which of these is the intended design goal of determinism, and the asymmetry has bitten implementers thinking about offline tools.

Options considered

Option 1 — keep normalization_version in the seed; declare determinism's purpose as compiler/mirror reproducibility. (Recommended.)

  • Seed unchanged. IDs already minted stay valid.
  • /standard/identifier-syntax/ explicitly states that clients resolve by lookup; determinism is for registry/mirror integrity, not offline computation.
  • Pros: zero migration; matches the de-facto behaviour; easy to explain.
  • Cons: requires acknowledging that "deterministic" doesn't mean "offline-computable from semantic fields alone" — some readers will be disappointed.

Option 2 — remove normalization_version from the seed.

  • IDs become computable from (work_key, citation_system_key, locator) alone, given an agreed normalization.
  • Pros: true offline-computable IDs; simpler third-party tooling.
  • Cons: every existing reference would be re-minted under a new ID (a breaking, registry-wide change); a normalization change would silently change IDs unless some other discriminator is added; and it opens the door to the John.3.16 / john.3.16 identity-split class of bug (overlaps with #13, "standard: require canonical ASCII digit and case forms in citation-system profiles").
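To illustrate the identity-split risk under Option 2, here is a sketch with a version-free seed. The namespace, the seed layout, the sample keys, and the lowercase-only `normalize` stand-in are hypothetical; a real profile would define its own canonical forms.

```python
import uuid

# Hypothetical namespace; the real one is defined by the spec.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/textrefs")


def mint_without_version(work_key, citation_system_key, locator):
    """Option 2 style seed: semantic fields only, no normalization_version."""
    seed = "\x1f".join([work_key, citation_system_key, locator])
    return uuid.uuid5(NAMESPACE, seed)


def normalize(locator):
    """Stand-in normalization: lowercase only (an assumption for this sketch)."""
    return locator.lower()


# Two spellings of the same locator, minted without an agreed normalization.
raw_a = mint_without_version("bible", "chapter-verse", "John.3.16")
raw_b = mint_without_version("bible", "chapter-verse", "john.3.16")

# The same two spellings after both parties apply the same normalization.
norm_a = mint_without_version("bible", "chapter-verse", normalize("John.3.16"))
norm_b = mint_without_version("bible", "chapter-verse", normalize("john.3.16"))

print(raw_a == raw_b, norm_a == norm_b)
```

Offline computability therefore holds only if every client applies exactly the same normalization, which is the dependency the "given an agreed normalization" qualifier refers to.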

Option 3 — keep current seed; tell clients to "compute with current version, fall back to lookup on miss".

  • No spec/data change.
  • Pros: zero cost up front.
  • Cons: produces a false sense of offline computability; clients will silently compute IDs that do not exist in the registry and then fall through to the lookup path; the failure mode is hard to detect.
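The Option 3 flow can be sketched as follows to show where the silent miss happens. Everything here is hypothetical: the in-memory registry, the `CURRENT_VERSION` value, the seed layout, and the `resolve` helper.

```python
import uuid

# Hypothetical namespace; the real one is defined by the spec.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/textrefs")
CURRENT_VERSION = "2"  # assumed current normalization_version of the citation system


def mint(work_key, citation_system_key, locator, normalization_version):
    # Assumed seed layout: fields joined by a unit-separator character.
    seed = "\x1f".join(
        [work_key, citation_system_key, locator, normalization_version]
    )
    return str(uuid.uuid5(NAMESPACE, seed))


# Toy registry holding one reference minted while version "1" was in force.
fields = ("bible", "chapter-verse", "john.3.16")
registry_by_id = {mint(*fields, "1"): fields}
registry_by_fields = {fields: mint(*fields, "1")}


def resolve(work_key, citation_system_key, locator):
    """Compute with the current version, fall back to lookup on a miss."""
    guess = mint(work_key, citation_system_key, locator, CURRENT_VERSION)
    if guess in registry_by_id:
        return guess, "computed"
    # The computed ID was wrong; only the lookup yields the registry ID.
    key = (work_key, citation_system_key, locator)
    return registry_by_fields.get(key), "lookup"


resolved_id, path = resolve(*fields)
print(path)
```

In this sketch the caller still gets the right ID, which is why the problem is easy to miss: the computed guess is discarded without any error, and an offline client with no lookup path would simply keep the wrong ID.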

Recommendation

Option 1. Adopt unless the project explicitly wants client-side offline minting as a stated design goal — and the review found no evidence it does.

This is fundamentally a documentation/intent decision, not a data change. Land as an ADR (decisions/ADR-NNNN-deterministic-uuid-purpose.md) so future implementers don't have to re-derive the asymmetry from the spec.

Expected consequences

  • /standard/identifier-syntax/ gains a "Purpose of determinism" subsection clarifying scope (registry/mirror reproducibility, not offline minting).
  • A new ADR captures the rationale and the options not taken.
  • Future questions of the form "why can't I compute IDs from (work, system, locator) alone?" point at the ADR.
  • No data migration. No code change beyond docs.
  • If Option 2 is later chosen instead, the ADR becomes the explicit superseded record and provides the historical context for the breaking change.
