docs: link-check the unified site (multi-Astro hierarchy) #75

@lex00

Internal links on the published docs site break regularly. Root cause: the docs are 13 separately-built Astro/Starlight sites (docs/ + 12 lexicons) that get stitched into one tree at .docs-dist/chant/ by scripts/build-docs.sh. Each Starlight build only validates intra-site slug links — cross-site links written as raw paths (link: '/lexicons/k8s/gke-composites/' in docs/astro.config.mjs, hand-written [text](/chant/foo) in MDX) are opaque href strings to the per-site builds. Nothing catches typos, renames, or base-prefix mistakes until users hit a 404 in prod.

The unified .docs-dist/chant/ tree is the only place where cross-site links resolve against real files, so any validation must run against that assembled output — not against individual lexicon builds.

Part 1: Link checker (done)

just docs-check-links runs lychee in offline mode against the unified output. Lychee crawls every emitted HTML file and resolves both relative and root-relative links against .docs-dist/, matching how GitHub Pages serves the site. Asset references (CSS/JS/fonts/etc.) and Starlight's pagefind/ index are excluded — focus is on page-to-page navigation.
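To make "resolves root-relative links against .docs-dist/" concrete, here is a minimal sketch of how a root-relative href might map to files on disk. It is not lychee's actual algorithm; the function name `candidateFiles` and the `.html` / `index.html` fallbacks are assumptions chosen to approximate how a static host serves a directory tree.

```typescript
// Hypothetical helper (not part of lychee or this repo): list the files a
// static host could serve for a root-relative href under a dist root.
function candidateFiles(distRoot: string, href: string): string[] {
  // Drop query string and fragment; only the path decides which file is served.
  const pathOnly = href.split(/[?#]/)[0];
  if (!pathOnly.startsWith("/")) {
    throw new Error(`expected a root-relative href, got: ${href}`);
  }
  const root = distRoot.replace(/\/+$/, ""); // tolerate a trailing slash
  const target = root + pathOnly;
  // A trailing slash means "directory", so only its index page qualifies.
  if (target.endsWith("/")) return [target + "index.html"];
  // Otherwise try the exact file, then the assumed fallbacks.
  return [target, target + ".html", target + "/index.html"];
}
```

Under these assumptions, `/chant/lexicons/k8s/gke-composites/` maps to `.docs-dist/chant/lexicons/k8s/gke-composites/index.html`, while an unprefixed `/serialization/output-formats` only yields candidates directly under `.docs-dist/serialization/`, outside the `.docs-dist/chant/` tree where the site is assembled.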

Part 2: Systemic base-prefix bug (root cause of most breakage)

First lychee run reported ~603 broken links. Initial guess (lexicon sidebars missing the /chant/ base prefix) was wrong. Audit of docs/src/content/docs/** reveals the actual breakdown:

  • 485 broken root-relative links across 120 source files — the dominant class
    • 483 are /api/* references in docs/src/content/docs/api/* — auto-generated by TypeDoc on every build, so hand-fixes are useless
    • 2 are hand-written /lexicons/* link locations: the Introduction page (getting-started/introduction.mdx:16, which carries 11 lexicon links on one line) and lexicon-authoring/observation.mdx:260
  • 1 is a hardcoded literal in packages/core/src/codegen/docs-sections.ts:153 that gets embedded in every generated lexicon overview page ([Serialization](/serialization/output-formats))
  • 262 links already use the correct /chant/... form elsewhere — style is mixed across the codebase today
  • The remaining ~115 of the 603 lychee errors are cross-doc bugs unrelated to base-prefixing (e.g. chant/guide/multi-lexicon from two tutorials, mis-prefixed temporal links, the temporal ops/worker-profiles confusion)

Root cause: the classic Astro/Starlight gotcha — root-relative markdown links inside .md/.mdx content do not get the configured base: '/chant' prepended at build time. Astro only base-prefixes its own internal navigation (sidebar link: entries, Starlight slug: entries) — never link hrefs that appear inside markdown body content. The TypeDoc-generated API docs and any hand-written [foo](/bar/) link both fall through this gap.

Fix: shared rehype-base-url plugin

Add a small rehype plugin (packages/core/src/codegen/rehype-base-url.ts) that walks the HAST tree and rewrites <a href> attributes starting with / to start with <base>/ instead. Register it in all 13 Astro configs (main + 12 lexicons). The lexicon configs are emitted from a shared template in packages/core/src/codegen/docs.ts:225-241, so the codegen patch propagates the wiring to all 12 lexicons on next regenerate. The hardcoded /serialization/... literal in docs-sections.ts:153 gets fixed at source (write /chant/serialization/output-formats) since the plugin in a lexicon site would otherwise mis-prefix it to /chant/lexicons/<name>/serialization/....

The plugin honors a projectBase: '/chant' option: on all 13 sites it skips links that already start with /chant/, so it is idempotent with respect to the 262 already-correct links.
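A minimal sketch of what such a plugin could look like, not the final implementation. The option name `base` (the base of the site being built), the hand-rolled `HastNode` type, and the recursive walk are assumptions made to keep the example dependency-free; a real version would likely use the `hast` types and a tree-visiting utility.

```typescript
// Minimal HAST shape so the sketch is self-contained (assumed, simplified).
interface HastNode {
  type: string;
  tagName?: string;
  properties?: Record<string, unknown>;
  children?: HastNode[];
}

// `projectBase` comes from the issue text; `base` is a hypothetical option name.
interface RehypeBaseUrlOptions {
  base: string;        // base of the site being built, e.g. '/chant'
  projectBase: string; // shared prefix that marks a link as already correct
}

function stripTrailingSlash(s: string): string {
  return s.endsWith("/") ? s.slice(0, -1) : s;
}

export function rehypeBaseUrl(options: RehypeBaseUrlOptions) {
  const base = stripTrailingSlash(options.base);
  const projectBase = stripTrailingSlash(options.projectBase);

  const rewrite = (href: string): string => {
    // Leave relative, absolute (https://...), and protocol-relative (//host) links alone.
    if (!href.startsWith("/") || href.startsWith("//")) return href;
    // Already under the project base: skip, which keeps the rewrite idempotent.
    if (href === projectBase || href.startsWith(projectBase + "/")) return href;
    return base + href;
  };

  const walk = (node: HastNode): void => {
    if (node.type === "element" && node.tagName === "a") {
      const href = node.properties?.href;
      if (typeof href === "string" && node.properties) {
        node.properties.href = rewrite(href);
      }
    }
    node.children?.forEach(walk);
  };

  // unified-style plugin: called with options, returns a tree transformer.
  return (tree: HastNode): void => walk(tree);
}
```

In an Astro config this would presumably be registered through the `markdown.rehypePlugins` array, e.g. `[rehypeBaseUrl, { base: '/chant', projectBase: '/chant' }]`. Running the transformer twice on the same tree leaves `/chant/...` links unchanged, which is the idempotency property described above.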

Tasks

  • Add just docs-check-links target (lychee, offline mode, excludes assets + pagefind)
  • Implement rehype-base-url plugin + unit test
  • Wire plugin into main docs/astro.config.mjs + docs/package.json
  • Patch packages/core/src/codegen/docs.ts template to emit plugin wiring in all 12 lexicon configs
  • Fix hardcoded /serialization/output-formats link in docs-sections.ts:153
  • Regenerate the 12 lexicon docs and commit
  • Investigate the ~115 residual errors (cross-doc target bugs unrelated to base-prefixing) — separate cleanup
  • Add lychee check to CI workflow — run on PRs touching docs paths
  • Optional: short note in lexicon-authoring/docs-site.mdx documenting that cross-site links must use the full /chant/... path

Done when

  • just docs-check-links exits 0 on a clean main (or with only residual cross-doc bugs tracked separately)
  • CI blocks PRs that introduce broken internal doc links
  • The systemic base-prefix bug is fixed at the plugin + codegen level so adding a new lexicon or regenerating TypeDoc API docs doesn't re-introduce it

Out of scope

  • External link checking (HTTP/HTTPS) — separate, flakier concern
  • Migrating to a single mono-Astro site or a typed URL module — possible follow-up
