
data: Bible book-chapter-verse regex rejects numbered books; vocabulary unpinned #3

Description

@maehr

Affected file

systems/bible-book-chapter-verse.yaml

Citation system

Bible book-chapter-verse (currently labelled "OSIS-style")

What is wrong or missing?

The current locator_regex requires the book component to start with a letter:

locator_regex: '^(?<book>[A-Za-z][A-Za-z0-9_]*)\.(?<chapter>[1-9][0-9]*)\.(?<verse>[1-9][0-9]*)$'

This rejects every digit-initial numbered book: both the OSIS abbreviations (1Cor, 2Sam, 3John) and the full-name forms (1Corinthians, 2Samuel, 3John). The registry will hit this as soon as it adds a reference to any numbered book, which is likely once coverage extends beyond the four Gospels, the Pentateuch, and Psalms.
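A minimal sketch to reproduce the rejection, assuming Python's `re` module as the test harness (the project's actual validator is not shown in this issue). The pattern string is copied from the YAML above; `compile_locator` is a hypothetical helper that only translates `(?<name>...)` into Python's `(?P<name>...)` spelling for named groups.

```python
import re

# Pattern copied verbatim from systems/bible-book-chapter-verse.yaml as quoted above.
CURRENT_PATTERN = (
    r'^(?<book>[A-Za-z][A-Za-z0-9_]*)'
    r'\.(?<chapter>[1-9][0-9]*)'
    r'\.(?<verse>[1-9][0-9]*)$'
)


def compile_locator(pattern: str) -> re.Pattern:
    """Hypothetical helper: Python's re needs (?P<name>...) for named groups."""
    return re.compile(pattern.replace("(?<", "(?P<"))


current = compile_locator(CURRENT_PATTERN)

samples = ["Genesis.1.1", "Matthew.5.3", "1Cor.13.4", "2Samuel.7.12", "3John.1.4"]
results = {s: current.match(s) is not None for s in samples}

for locator, ok in results.items():
    print(f"{locator}: {'match' if ok else 'no match'}")
```

Only the two letter-initial examples match; the three numbered books fail at the first character of the book group.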

Two related problems sit beside the regex:

  1. Vocabulary is unpinned. The profile is labelled "OSIS-style" but the examples (Genesis.1.1, Matthew.5.3) use full names, not OSIS codes (which would be Gen.1.1, Matt.5.3). It is currently unclear whether the canonical vocabulary is OSIS codes or full names, and the regex cannot enforce either choice.
  2. Case folding is open. Today John.3.16, john.3.16, and JOHN.3.16 all match, so the three spellings would mint three distinct permanent identities for the same verse. The canonical case needs to be declared.
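The case problem can be illustrated with a short sketch, again assuming Python's `re` as a stand-in validator and treating the raw locator string as the identity (an assumption about how identities are minted, not something this issue confirms). The pattern is the current one, rewritten with Python's `(?P<name>...)` group syntax.

```python
import re

# Current locator_regex, with named groups in Python's (?P<name>...) spelling.
CURRENT = re.compile(
    r'^(?P<book>[A-Za-z][A-Za-z0-9_]*)'
    r'\.(?P<chapter>[1-9][0-9]*)'
    r'\.(?P<verse>[1-9][0-9]*)$'
)

variants = ["John.3.16", "john.3.16", "JOHN.3.16"]

# Every variant passes validation...
accepted = [v for v in variants if CURRENT.match(v)]

# ...and, compared as plain strings, they are three different identifiers.
distinct_ids = set(accepted)

# Under any case-insensitive comparison they collapse to a single verse.
case_folded = {v.casefold() for v in accepted}

print(len(accepted), len(distinct_ids), len(case_folded))
```

The regex has no way to prefer one spelling over the others, which is why the canonical case has to be declared separately.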

Suggested correction or new value

Two parts:

A. Regex (mechanical, low-risk): widen to allow one optional leading digit (1-4) on the book component while still requiring at least one letter (so 1.1.1 stays invalid):

locator_regex: '^(?<book>[1-4]?[A-Za-z][A-Za-z0-9_]*)\.(?<chapter>[1-9][0-9]*)\.(?<verse>[1-9][0-9]*)$'
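A quick check of the proposed pattern, sketched with Python's `re` (named groups rewritten as `(?P<name>...)`). The sample locators are illustrative inputs chosen for this sketch, not entries from the registry.

```python
import re

# Proposed locator_regex, with named groups in Python's (?P<name>...) spelling.
PROPOSED = re.compile(
    r'^(?P<book>[1-4]?[A-Za-z][A-Za-z0-9_]*)'
    r'\.(?P<chapter>[1-9][0-9]*)'
    r'\.(?P<verse>[1-9][0-9]*)$'
)

should_match = ["Genesis.1.1", "1Cor.13.4", "2Samuel.7.12", "3John.1.4"]
should_fail = [
    "1.1.1",      # book has no letter
    "12Cor.1.1",  # more than one leading digit
    "5Cor.1.1",   # leading digit outside 1-4
    "1Cor.0.1",   # chapter may not start with 0
    "1Cor.13",    # verse component missing
]

matched = {s: PROPOSED.match(s) is not None for s in should_match + should_fail}

# The named groups still split the locator as before.
parts = PROPOSED.match("1Cor.13.4").groupdict()
print(parts)  # {'book': '1Cor', 'chapter': '13', 'verse': '4'}

# Syntactically valid but not a real book: the regex alone cannot catch this.
print(PROPOSED.match("4Gen.1.1") is not None)  # True
```

The last line shows the limit of part A: the widened pattern fixes the false rejections but still accepts nonexistent books, which is what part B addresses.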

B. Vocabulary + case (decision needed): before adjusting examples and prose, the project needs to pick:

  • OSIS codes (Gen, Matt, 1Cor) — the literature standard, and what the current label advertises; or
  • Full English names (Genesis, Matthew, 1Corinthians) — what the current examples and (per agent context) resolver comments suggest.

…and pick canonical case. After that decision, examples and the preferred_label should be updated together, and case-variant spellings added to invalid.
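To make the decision concrete, here is a sketch of what enforcement could look like once a vocabulary and case are pinned. It assumes the OSIS option purely for illustration (the choice is still open), and `CANONICAL_BOOKS` and `is_canonical` are hypothetical names with a deliberately tiny book list, not part of the project.

```python
import re

# Proposed locator_regex from part A, in Python's (?P<name>...) spelling.
LOCATOR = re.compile(
    r'^(?P<book>[1-4]?[A-Za-z][A-Za-z0-9_]*)'
    r'\.(?P<chapter>[1-9][0-9]*)'
    r'\.(?P<verse>[1-9][0-9]*)$'
)

# Hypothetical, incomplete allow-list: a handful of OSIS codes in their
# canonical case. A real list would cover every book in scope.
CANONICAL_BOOKS = frozenset({"Gen", "Matt", "John", "1Cor", "2Sam", "3John"})


def is_canonical(locator: str) -> bool:
    """Shape check via regex, then exact (case-sensitive) vocabulary check."""
    m = LOCATOR.match(locator)
    return m is not None and m.group("book") in CANONICAL_BOOKS


candidates = [
    "John.3.16", "john.3.16", "JOHN.3.16",
    "1Cor.13.4", "Genesis.1.1", "4Gen.1.1",
]
checks = {loc: is_canonical(loc) for loc in candidates}

for loc, ok in checks.items():
    print(f"{loc}: {'valid' if ok else 'invalid'}")
```

With an exact-match vocabulary, the case variants and the off-vocabulary spellings are rejected without any change to the regex. If full English names were chosen instead, only the contents of the allow-list would change.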

Happy to open a small PR for part A once a maintainer signs off on the direction; I would prefer not to push part B unilaterally.

Evidence and sources

Rights or licence note

The OSIS specification is open (CrossWire). The SBL Handbook is copyrighted, but the abbreviation list itself is factual.

Related

This issue and the Bekker issue (filed separately) together motivate a broader spec-level question: whether locator_regex + examples is the right validation contract in the first place. Bible is the strongest motivating case, because book vocabulary cannot be captured by regex alone. That discussion belongs in the standard repo and is tracked in textrefs/textrefs.org#9.
