Affected spec section
/standard/system-profiles/ — locator validation contract for citation-system profiles.
Motivation
Two registry issues filed today both tweak locator_regex to fix concrete bugs, but they share a deeper question worth surfacing at the spec level before we keep patching individual profiles:
The question: is locator_regex + examples the right validation contract for a citation-system profile at all?
Reasons to revisit:
- The deterministic UUID seed uses the exact normalized locator bytes, so any regex-valid spelling difference (leading zeros, case, abbreviation form) mints a distinct permanent identity. A regex alone is a weak fence against identity splits.
- Some axes that profiles actually need to validate cannot be expressed in a regex at all — most obviously Bible book vocabulary, which is an enumerated set, not a character pattern.
- ECMAScript regex is backtracking; allowing arbitrary contributor regexes is a latent DoS/CI-stall risk (see review notes on RE2-compatible subsets).
- Tightening the regex to enforce canonical forms (no leading zeros, canonical case) tangles syntactic validation with normalization rules in a way that is hard to read and easy to get wrong.
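To make the identity-split point concrete, here is a minimal sketch. It assumes the seed feeds a name-based UUID (uuid5 is used here as a stand-in; this issue does not say which UUID scheme the spec uses). The namespace, the permissive regex, and the mint() helper are all hypothetical and not taken from any registered profile.

```python
import re
import uuid

# Hypothetical namespace for one citation system (illustration only).
HYPOTHETICAL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/hypothetical-system")

# Hypothetical permissive locator_regex: "chapter.section", any digit run.
LOCATOR_REGEX = re.compile(r"[0-9]+\.[0-9]+")


def mint(locator: str) -> uuid.UUID:
    """Validate with the regex, then derive a deterministic UUID from the locator."""
    if LOCATOR_REGEX.fullmatch(locator) is None:
        raise ValueError(f"locator rejected by regex: {locator!r}")
    return uuid.uuid5(HYPOTHETICAL_NAMESPACE, locator)


canonical = mint("1.1")
padded = mint("01.1")  # regex-valid spelling of the same passage

# Both spellings pass validation, yet they mint different identities.
identities_split = canonical != padded
```

Under these assumptions both spellings are accepted and identities_split is True, so the regex has validated the syntax without protecting the identity.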
This isn't a request to ship a new design today — it's a request to decide what we want the contract to be before more profiles land.
Proposed change
Open the design question for discussion. Sketch options without picking one:
- Status quo, tightened. Keep locator_regex but constrain it to an RE2-compatible / linear-time subset, and require profiles to declare canonical digit and case forms enforced by rejection rather than folding.
- Regex + structured fields. Keep locator_regex for the grammar around named capture groups, and add structured fields beside it for things regex can't express, e.g. vocabulary: for enumerated tokens (Bible books, Stephanus columns), canonical_case:, leading_zeros: forbid.
- Declarative validator schema. Replace the single regex with a small declarative shape: named components with per-component types and constraints (integer min=1 no-leading-zeros, enum: [Genesis, …], regex: [ab]). A compiler derives the regex.
- Drop syntactic validation from the profile. Rely on minting-time review and on the resolver. Cheapest spec, riskiest data.
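As a rough illustration of the declarative option, the sketch below compiles a list of named components into a regex and validates against it. The class names, the shape format, and the two-book vocabulary are hypothetical, invented only to show what "compiler derives the regex" could look like. This is not a proposed schema.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IntegerPart:
    name: str
    minimum: int = 1  # this sketch assumes minimum >= 1


@dataclass(frozen=True)
class EnumPart:
    name: str
    values: Tuple[str, ...]


@dataclass(frozen=True)
class PatternPart:
    name: str
    pattern: str  # small hand-written fragment, e.g. "[ab]"


def part_regex(part) -> str:
    """Derive a named capture group for one component."""
    if isinstance(part, IntegerPart):
        body = "[1-9][0-9]*"  # rejects leading zeros instead of folding them
    elif isinstance(part, EnumPart):
        # Longest first, so a token that prefixes another cannot shadow it.
        ordered = sorted(part.values, key=len, reverse=True)
        body = "|".join(re.escape(v) for v in ordered)
    else:
        body = part.pattern
    return f"(?P<{part.name}>{body})"


def compile_shape(shape) -> "re.Pattern[str]":
    """Plain strings in the shape are literal separators; the rest are components."""
    pieces = [re.escape(p) if isinstance(p, str) else part_regex(p) for p in shape]
    return re.compile("".join(pieces))


def validate(shape, locator: str) -> bool:
    match = compile_shape(shape).fullmatch(locator)
    if match is None:
        return False
    # Range constraints are checked after matching, not encoded in the regex.
    return all(
        int(match.group(p.name)) >= p.minimum
        for p in shape
        if isinstance(p, IntegerPart)
    )


# Hypothetical shape for a "Book chapter:verse" locator with a two-book vocabulary.
SHAPE = [
    EnumPart("book", ("Genesis", "Exodus")),
    " ",
    IntegerPart("chapter"),
    ":",
    IntegerPart("verse"),
]
```

With this shape, validate(SHAPE, "Genesis 1:1") accepts, while "Genesis 01:1" (leading zero), "genesis 1:1" (non-canonical case), and "Genisis 1:1" (not in the vocabulary) are rejected. Because contributors never write the full pattern themselves, the derived regex also stays within whatever subset the compiler chooses to emit.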
Alternatives considered
The four options above are the alternatives. Reasonable hybrids exist (e.g. option 2 with option 1's RE2 constraint).
Compatibility impact
additive (no breaking change)
(The discussion is additive. A chosen outcome might later require breaking changes to existing profiles; that would be handled in a follow-up proposal.)
Target spec version
v0.2.0-draft