data: Bekker locator regex rejects valid pages 1–99 #2

Description

@maehr

Affected file

systems/bekker.yaml

Citation system

Bekker numbering (Aristotelian corpus)

What is wrong or missing?

The current locator_regex requires the page component to be 3–4 digits:

locator_regex: '^(?<page>[0-9]{3,4})(?<column>[ab])(?<line>[0-9]{1,2})$'

Bekker pagination, however, begins at page 1. Valid early Aristotelian references such as 1a1, 16a1, 24b10, and 99a5 therefore fail validation. The issue is currently latent because the only registered Aristotle work (Nicomachean Ethics) sits at four-digit pages (1094a–1181b), but it would block registering anything from the opening of the corpus (e.g. Categories, De Interpretatione).
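
The rejection can be reproduced with a minimal Python sketch. It assumes only the pattern quoted above; the named groups are respelled as `(?P<name>...)` because Python's `re` module does not accept the `(?<name>...)` spelling, and the variable names are illustrative rather than taken from the registry code.

```python
import re

# Current pattern from systems/bekker.yaml, with named groups respelled
# for Python's re module (an adaptation, not the registry's own code).
CURRENT = re.compile(
    r"^(?P<page>[0-9]{3,4})(?P<column>[ab])(?P<line>[0-9]{1,2})$"
)

# Early-corpus references cited in this issue.
early_refs = ["1a1", "16a1", "24b10", "99a5"]

# Collect every reference the current pattern fails to match.
rejected = [ref for ref in early_refs if CURRENT.match(ref) is None]
print(rejected)

# A four-digit page from the Nicomachean Ethics range still matches.
print(CURRENT.match("1094a1") is not None)
```

All four early references land in `rejected`, while the four-digit page passes, which is why the problem stays hidden as long as only the Nicomachean Ethics is registered.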

The current regex also accepts non-canonical forms:

  • leading-zero pages such as 0983b10
  • line zero such as 983b0
  • leading-zero lines such as 983b01

Because the deterministic UUID seed uses the exact normalized locator, each of these spellings would mint a distinct permanent identity for the same passage.
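
To illustrate the identity problem, here is a hedged sketch. The registry's actual seed scheme is not shown in this issue, so the namespace URL and the `mint` helper below are hypothetical stand-ins: they simply hash the exact locator string with `uuid.uuid5`.

```python
import re
import uuid

# Current pattern, respelled with (?P<name>...) for Python's re module.
CURRENT = re.compile(
    r"^(?P<page>[0-9]{3,4})(?P<column>[ab])(?P<line>[0-9]{1,2})$"
)

# Hypothetical namespace; the real seed scheme is an assumption here.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/bekker")


def mint(locator: str) -> uuid.UUID:
    """Derive a deterministic ID from the exact locator string (assumed scheme)."""
    if CURRENT.match(locator) is None:
        raise ValueError(f"invalid locator: {locator}")
    return uuid.uuid5(NAMESPACE, locator)


# Two spellings of the same passage both validate under the current pattern...
canonical = mint("983b10")
padded = mint("0983b10")

# ...but hash to different identifiers.
print(canonical != padded)

# Line zero and a zero-padded line are accepted as well.
other_ids = [mint("983b0"), mint("983b01")]
```

Under any scheme that hashes the literal locator, validation is the only place where spelling variants can be stopped before they become separate identities.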

Suggested correction or new value

Change the regex to accept pages 1–9999 and lines 1–99, both without leading zeros:

locator_regex: '^(?<page>[1-9][0-9]{0,3})(?<column>[ab])(?<line>[1-9][0-9]?)$'
examples:
  valid:
    - '1a1'
    - '16a1'
    - '24b10'
    - '1094a1'
    - '1462b10'
  invalid:
    - '983'
    - '983c10'
    - '983b'
    - '983b100'
    - '0983b10'
    - '983b0'
    - '983b01'
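
The proposed pattern can be checked against the example lists above with a short Python sketch. As before, the named groups are respelled for Python's `re` module, and the variable names are illustrative. `fullmatch` is used so that a trailing newline cannot slip past `$`.

```python
import re

# Proposed pattern, with named groups respelled for Python's re module.
PROPOSED = re.compile(
    r"^(?P<page>[1-9][0-9]{0,3})(?P<column>[ab])(?P<line>[1-9][0-9]?)$"
)

valid = ["1a1", "16a1", "24b10", "1094a1", "1462b10"]
invalid = ["983", "983c10", "983b", "983b100", "0983b10", "983b0", "983b01"]

# Any entry in these two lists indicates a disagreement with the examples.
wrongly_rejected = [ref for ref in valid if PROPOSED.fullmatch(ref) is None]
wrongly_accepted = [ref for ref in invalid if PROPOSED.fullmatch(ref) is not None]

print(wrongly_rejected, wrongly_accepted)

# Named groups expose the parsed components of a locator.
parts = PROPOSED.fullmatch("24b10").groupdict()
print(parts)
```

Both lists come back empty, so the pattern agrees with every listed example. The same loop could serve as a regression check if the examples block is extended.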

Happy to open a small PR if that is useful.

Evidence and sources

  • Bekker, August Immanuel (ed.), Aristotelis Opera, Berlin 1831 — pagination runs from p. 1.
  • Examples of early-corpus references: Aristotle, Categories 1a1 (https://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.01.0051).
  • Current registry profile: systems/bekker.yaml.

Rights or licence note

Open. Bekker 1831 is in the public domain; the linked Perseus material is CC BY-SA.

Related

This is one of two concrete cases (the other is bible-book-chapter-verse, filed separately) that motivate a broader spec-level question about whether locator_regex + examples is the right validation contract for citation-system profiles. That meta-discussion belongs in the standard repo; this issue is intentionally narrow.


Related (meta-discussion): textrefs/textrefs.org#9 — whether locator_regex + examples is the right validation contract in the first place.
