
Write-time broken-link validation in the Open Knowledge MCP.… #350

Merged: inkeep-oss-sync[bot] merged 1 commit into main from copybara/sync on Jun 27, 2026

Write-time broken-link validation in the Open Knowledge MCP. `write`/`edit` now return `brokenLinks` in the same response, so outbound links that don't resolve are surfaced at write time. The check is report-only: the write still lands, and you can author a doc before its link target exists. Validation covers every local link, not just docs:

- `no-such-doc`: the `./`-onto-a-content-root-path doubling footgun, and missing `[[wiki]]` / markdown doc targets.
- `unresolvable`: root-escaping paths from one `../` too many.
- `no-such-file`: links to assets or source files (`[src](../../foo.py)`) that don't exist on disk at the resolved path.

That last reason closes the gap a real codebase-wiki run hit: wrong-depth source-file links that 404 silently because the doc-only link graph never tracked them. The platform and pack skills are updated to point agents at `brokenLinks` as the primary write-time check and to clarify that a same-pass forward reference reports as `no-such-doc` until its target lands (the `links({ kind: "dead" })` audit is the authoritative end-state check).
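To make the response shape concrete, here is a minimal sketch of how a caller might group `brokenLinks` by reason. Only `brokenLinks`, `resolvedTo`, and the three reason strings come from this PR; the `href` field, the `WriteResponse` name, and `groupByReason` are assumptions made for illustration and may not match the real wire shape.

```typescript
// Hypothetical wire shape: `href` and `WriteResponse` are illustrative names.
type BrokenLinkReason = "no-such-doc" | "no-such-file" | "unresolvable";

interface BrokenLink {
  href: string; // assumed: the link target as authored
  reason: BrokenLinkReason;
  resolvedTo?: string; // content-root path the link resolved to, when there is one
}

interface WriteResponse {
  brokenLinks: BrokenLink[]; // always present; [] means every link resolved
}

// Bucket the authored hrefs by reason so each class of fix can be handled together.
function groupByReason(res: WriteResponse): Record<BrokenLinkReason, string[]> {
  const out: Record<BrokenLinkReason, string[]> = {
    "no-such-doc": [],
    "no-such-file": [],
    "unresolvable": [],
  };
  for (const link of res.brokenLinks) {
    out[link.reason].push(link.href);
  }
  return out;
}

// Made-up response for a doc with one missing wiki target and one missing file.
const example: WriteResponse = {
  brokenLinks: [
    { href: "[[Roadmap]]", reason: "no-such-doc" },
    { href: "../../foo.py", reason: "no-such-file", resolvedTo: "foo.py" },
  ],
};
const grouped = groupByReason(example);
```

Because the field is always present, a caller can treat an empty array as "all links resolve" without a separate existence check.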

…2064)

* feat(open-knowledge): correct-by-construction link authoring (PRD-7147)

Doc-naming MCP tools now emit a paste-ready canonical `linkTo` and write/edit
return write-time `brokenLinks` validation, so agents never hand-build the
broken relative/absolute hybrid and broken outbound links surface in the same
response instead of needing a separate dead-link round-trip.

- R1: linkTo {href,form,docName} on write/edit/search/exec/links/move/
  restore_version; search gains an optional `fromDoc` input -> relative form.
  New core builders buildLinkTo/buildAbsoluteMarkdownHref.
- R2: brokenLinks computed synchronously from the just-written bytes (NOT the
  100ms-debounced index), report-only, always present ([] = all resolve).
  New computeBrokenOutboundLinks in backlink-index.ts.
- R3: precedent #56 (canonical link contract) + AGENTS.md jump-index.
- R4: bundled SKILL.md + docs core-concepts.md rewritten to the coherent
  contract (relative-default, no-hybrid rule, brokenLinks as primary check);
  landed after R2 verified, per the spec sequencing rule. Guard test enforces
  SKILL.md self-containment.
- R5: wiki-link compat preserved; broken [[Page]] flows through brokenLinks.

Spec: specs/2026-06-23-prd-7147-link-authoring-contract/

Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B

* test(open-knowledge): update fixtures for new linkTo/brokenLinks fields

Three server tests asserted exact shapes that the link-authoring contract
extends:
- search registration: inputSchema now also exposes `fromDoc`.
- agent-patch: the flat success body now always carries `brokenLinks` ([] for
  a link-free patch).

(Local turbo caching of `server#test` masked these; CI's fresh run caught them.)

Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B

* refactor(open-knowledge): address PR review on link-authoring contract

- Unify BrokenLinkReason: one `BROKEN_LINK_REASONS` const in core feeds both
  the Zod `z.enum` and the server-side type, closing the silent-drop drift the
  reviewer flagged (extractor reason ↔ parser enum can no longer diverge).
- Thread the target's real on-disk extension into `linkTo` for edit, move,
  links, and restore_version (extracted `docExtensionOnDisk` to shared.ts), so
  `.mdx` docs get `.mdx` links from every tool — correct-by-construction, not
  just write/search/exec.
- Batch `documents` output describe now lists `brokenLinks` + `linkTo`.
- search DESCRIPTION documents the new `fromDoc` param; exec/links document the
  per-row `linkTo` (exec via the enrichedPaths field describe to respect the
  2 KB tool-description cap).
- New advisory-warnings unit tests for parseBrokenLinks / formatBrokenLinkLines
  / formatBrokenLinkBrief; integration test now covers an `.mdx` edit's linkTo
  + documents the deliberate local-interface (wire-shape) decoupling.
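
The single-const pattern from the first bullet can be sketched as follows. This is an illustration rather than the code in core: it uses the three reasons the PR ends up with, and it substitutes a hand-written type guard for the Zod `z.enum` so the example has no dependencies. `isBrokenLinkReason` and `parseReasons` are hypothetical names.

```typescript
// Single source of truth for the reason strings.
const BROKEN_LINK_REASONS = ["no-such-doc", "no-such-file", "unresolvable"] as const;

// The type is derived from the const, so it cannot drift from the runtime list.
type BrokenLinkReason = (typeof BROKEN_LINK_REASONS)[number];

// Stand-in for a schema enum built from the same const: a runtime guard.
function isBrokenLinkReason(value: unknown): value is BrokenLinkReason {
  return (
    typeof value === "string" &&
    (BROKEN_LINK_REASONS as readonly string[]).indexOf(value) !== -1
  );
}

// Hypothetical parser: keeps recognised reasons and drops everything else.
function parseReasons(raw: unknown[]): BrokenLinkReason[] {
  return raw.filter(isBrokenLinkReason);
}

const parsed = parseReasons(["no-such-doc", "no-such-file", "typo-reason", 42]);
```

With one list feeding both the parser and the type, adding a reason is a one-line change, and an extractor cannot emit a value the parser silently drops.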

Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B

* feat(open-knowledge): validate file links (assets + source files) in brokenLinks, not just docs

Extends R2 (write-time `brokenLinks`) to cover every local link target, not
only `.md`/`.mdx` docs. This closes the gap a real codebase-wiki run hit: a
wrong-depth `[src](../../../foo.py)` overshoots the content root and 404s
silently — invisible to both the editor red-underline and the doc-only link
graph.

- New reason `no-such-file` (sibling to `no-such-doc`/`unresolvable`), derived
  from the single `BROKEN_LINK_REASONS` const so the wire enum + server type
  stay in lockstep.
- `computeBrokenOutboundLinks` gains an injected `fileExists` oracle: doc links
  resolve against the in-memory admitted set (unchanged); file links resolve via
  the existing root-confining `resolveAssetProjectPath` then check disk. An
  overshoot → `unresolvable`; an in-root miss → `no-such-file`. The oracle is
  injected (not called inside the extractor) so the function stays pure and
  unit-testable without a filesystem. Wiki-link asset embeds (`![[x.pdf]]`) are
  out of scope — basename-resolved, not path-resolved.
- The 3 api-extension handlers (write/patch/frontmatter-patch) pass a shared
  `linkedFileExists` predicate (`existsSync(resolve(contentDir, …))`).
- Tests: backlink-index file-oracle cases (clean / overshoot / missing /
  absolute / external-skip), schema third-reason, advisory formatter, and an
  integration test that writes a real `.py` on disk and asserts correct-depth is
  clean while over-deep + missing surface in the same write response.
- Spec R2/AC2.7 + D9, SKILL.md reason list, and the changeset updated.
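
The injected-oracle design described above can be sketched like this. It is a simplified stand-in, not the real `computeBrokenOutboundLinks` or `resolveAssetProjectPath`: `resolveWithinRoot`, `checkFileLink`, and the result type are hypothetical, and paths are assumed to be content-root-relative with `/` separators.

```typescript
type FileLinkResult =
  | { ok: true; resolvedTo: string }
  | { ok: false; reason: "no-such-file" | "unresolvable"; resolvedTo?: string };

// Resolve `href` against the folder of `sourceDoc`. Returns null when the
// path climbs above the content root.
function resolveWithinRoot(sourceDoc: string, href: string): string | null {
  const stack: string[] =
    href.charAt(0) === "/" ? [] : sourceDoc.split("/").slice(0, -1);
  for (const part of href.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (stack.length === 0) return null; // one `../` too many
      stack.pop();
    } else {
      stack.push(part);
    }
  }
  return stack.join("/");
}

// The existence check is passed in, so this function never touches the disk.
function checkFileLink(
  sourceDoc: string,
  href: string,
  fileExists: (contentRootPath: string) => boolean,
): FileLinkResult {
  const resolved = resolveWithinRoot(sourceDoc, href);
  if (resolved === null) return { ok: false, reason: "unresolvable" };
  if (!fileExists(resolved)) {
    return { ok: false, reason: "no-such-file", resolvedTo: resolved };
  }
  return { ok: true, resolvedTo: resolved };
}

// In a unit test the oracle can be a plain list lookup.
const onDisk = ["src/foo.py"];
const exists = (p: string): boolean => onDisk.indexOf(p) !== -1;

const page = "wiki/arch/overview.md";
const clean = checkFileLink(page, "../../src/foo.py", exists);
const overshoot = checkFileLink(page, "../../../src/foo.py", exists);
const missing = checkFileLink(page, "../src/foo.py", exists);
```

In a handler, the same function would receive a disk-backed predicate along the lines of the `linkedFileExists` described above, while tests keep the in-memory lookup.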

Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B

* docs(open-knowledge): reconcile codebase-wiki source-link guidance with brokenLinks file validation

The codebase-wiki pack (#1921) told the wiki agent that source-file links
produce "no dead-link noise." That is true of the link *graph* (the `links`
tool tracks only .md/.mdx edges) but became misleading once this branch's
brokenLinks check started validating source/asset file targets too: a
wrong-depth source link now surfaces as no-such-file (or unresolvable if it
overshoots the content root) in the write/edit response.

Update both the pack SKILL.md and the workflow({ kind: "wiki" }) body to keep
the graph caveat while making clear source links ARE validated at write time —
count the ../ hops from the page's own folder.
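
Counting the `../` hops by hand is where depth errors tend to creep in, so here is a small sketch that derives the relative href from the page's own folder. `relativeHref` is a hypothetical helper, not part of this PR, and the `wiki/` and `src/` layout is an assumed example with both paths relative to the content root.

```typescript
// Build the relative href from a page to a target file. Both paths are
// content-root-relative and use "/" separators.
function relativeHref(fromPage: string, toFile: string): string {
  const fromDir = fromPage.split("/").slice(0, -1); // the page's own folder
  const to = toFile.split("/");

  // Count the leading folders that the page and the target have in common.
  let shared = 0;
  while (
    shared < fromDir.length &&
    shared < to.length - 1 &&
    fromDir[shared] === to[shared]
  ) {
    shared++;
  }

  // One "../" for every folder level of the page that is not shared.
  let prefix = "";
  for (let hop = shared; hop < fromDir.length; hop++) prefix += "../";

  return (prefix === "" ? "./" : prefix) + to.slice(shared).join("/");
}

const deep = relativeHref("wiki/arch/overview.md", "src/foo.py"); // "../../src/foo.py"
const shallow = relativeHref("wiki/index.md", "src/foo.py"); // "../src/foo.py"
const sibling = relativeHref("wiki/arch/overview.md", "wiki/arch/diagram.png"); // "./diagram.png"
```

The same target needs a different number of hops depending on how deep the linking page sits, which is the wrong-depth mistake that `no-such-file` and `unresolvable` report.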

Claude-Session: https://claude.ai/code/session_01AGgjz3TLMKGVwHR8pG4w4s

* docs(open-knowledge): align entity-vault link guidance with brokenLinks + GBrain interop

The entity-vault pack recommended path-qualified wikilinks as the preferred
form. Verified how brokenLinks resolves links and reconciled the guidance with
both OK's resolver and GBrain's:

- brokenLinks DOES handle path-qualified wikilinks: `[[people/alice|Alice]]`
  strips the alias/anchor and resolves the `folder/slug` target vault-root
  (not source-dir-relative), validated against the admitted set. Added a
  regression test — the existing wiki-link coverage only had bare `[[Page]]`.
- Three footguns the verification surfaced, now documented in the pack:
  - a bare markdown `[x](people/alice.md)` from a subfolder resolves
    source-dir-relative in OK (`meetings/people/alice`) and false-reports
    broken — use the `../`-correct relative form (paste linkTo.href + fromDoc).
  - OK's leading-slash root-absolute form is valid in OK but GBrain rejects it
    as an absolute filesystem path.
  - wikilinks must stay extensionless; `[[….md]]` resolves to a missing doc.

Pack now prefers standard markdown relative links (GitHub + GBrain + OK all
agree) while keeping path-qualified wikilinks first-class for vault-root
addressing.
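
The resolution rules above can be illustrated with a simplified sketch. It assumes docs are identified by extensionless, vault-root-relative paths (as in the `meetings/people/alice` example); every helper name here is hypothetical, and the real resolver may differ in details.

```typescript
// Admitted docs, keyed by extensionless vault-root path (illustrative data).
const admitted = ["people/alice", "meetings/2026-06-01"];
const isAdmitted = (doc: string): boolean => admitted.indexOf(doc) !== -1;

// Wikilink: drop the alias (`|...`) or anchor (`#...`), then treat the rest
// as a vault-root path. No extension is stripped.
function resolveWikiTarget(inner: string): string {
  const cut = inner.search(/[|#]/);
  return (cut === -1 ? inner : inner.slice(0, cut)).trim();
}

// Markdown link: resolve against the source doc's folder (or the vault root
// for a leading slash), then drop the .md/.mdx extension.
function resolveMarkdownTarget(sourceDoc: string, href: string): string | null {
  const stack: string[] =
    href.charAt(0) === "/" ? [] : sourceDoc.split("/").slice(0, -1);
  for (const part of href.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (stack.length === 0) return null; // escapes the vault root
      stack.pop();
    } else {
      stack.push(part);
    }
  }
  return stack.join("/").replace(/\.mdx?$/, "");
}

const source = "meetings/2026-06-01.md";
const viaWiki = resolveWikiTarget("people/alice|Alice"); // "people/alice"
const viaBare = resolveMarkdownTarget(source, "people/alice.md"); // "meetings/people/alice"
const viaRelative = resolveMarkdownTarget(source, "../people/alice.md"); // "people/alice"
const viaWikiExt = resolveWikiTarget("people/alice.md"); // "people/alice.md"
```

Under these assumptions `viaWiki` and `viaRelative` land on an admitted doc, while `viaBare` and `viaWikiExt` do not, matching the first and third footguns listed above.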

Claude-Session: https://claude.ai/code/session_01AGgjz3TLMKGVwHR8pG4w4s

* Remove linkTo from MCP responses, keep brokenLinks write-time validation

An ablation of the link-authoring features against codebase-wiki generation
(opus and sonnet, on the microreservoir repo) found brokenLinks is the
load-bearing change. It reliably drives residual broken links to zero by
catching wrong-depth source links and forward-reference doc links at write
time. linkTo showed no measurable benefit for wiki generation: it only covers
links to docs that already exist, not the source-file depth or forward-reference
errors that actually break wiki output. This cuts linkTo to keep the response
lean and keeps brokenLinks.

Removed:
- linkTo field from write, edit, search, exec, links, move, restore_version
- buildLinkTo, LinkTo, linkToOutputField, docExtensionFromPath helpers
- search fromDoc input (existed only to make linkTo relative)
- linkTo prose from the platform and entity-vault skills

Kept and clarified:
- brokenLinks (no-such-doc, no-such-file, unresolvable) on write and edit
- Reconciled the platform skill forward-reference guidance: a same-pass
  no-such-doc is an expected transient forward reference, and
  links({ kind: "dead" }) is the authoritative end-state audit
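
One way an agent might act on the forward-reference guidance is to separate expected transient reports from links that need fixing now. The sketch below is an assumption-heavy illustration: `triage`, `plannedDocs`, and the `href` field are hypothetical, and it assumes a `no-such-doc` entry carries a `resolvedTo` doc path.

```typescript
type BrokenLinkReason = "no-such-doc" | "no-such-file" | "unresolvable";

interface BrokenLink {
  href: string; // assumed field: the link as authored
  reason: BrokenLinkReason;
  resolvedTo?: string; // assumed to hold the doc path for no-such-doc
}

// Split a write response's brokenLinks into links to fix immediately and
// forward references to docs the agent still plans to write in this pass.
function triage(
  brokenLinks: BrokenLink[],
  plannedDocs: string[],
): { fixNow: BrokenLink[]; pending: BrokenLink[] } {
  const fixNow: BrokenLink[] = [];
  const pending: BrokenLink[] = [];
  for (const link of brokenLinks) {
    const isForwardRef =
      link.reason === "no-such-doc" &&
      link.resolvedTo !== undefined &&
      plannedDocs.indexOf(link.resolvedTo) !== -1;
    (isForwardRef ? pending : fixNow).push(link);
  }
  return { fixNow, pending };
}

const result = triage(
  [
    { href: "./setup.md", reason: "no-such-doc", resolvedTo: "guide/setup" },
    { href: "./tpyo.md", reason: "no-such-doc", resolvedTo: "guide/tpyo" },
    { href: "../../../foo.py", reason: "unresolvable" },
  ],
  ["guide/setup"],
);
```

Entries in `pending` are expected to clear once their targets land; the final `links({ kind: "dead" })` audit remains the check that they actually did.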

* Reframe precedent #56 and spec around brokenLinks after cutting linkTo

Precedent #56 (canonical link contract) drops the linkTo emission affordance
and now centers on write-time brokenLinks validation, documenting the three
reasons (no-such-doc, no-such-file, unresolvable) and the same-pass
forward-reference clarification. Added a pre-merge corrigendum to the PRD-7147
spec recording that R1 (linkTo) was removed after the ablation, keeping the
original design prose as a historical record.

* Address review: document no-such-file + remove linkTo-cut leftovers

- shared.ts / backlink-index.ts: the brokenLinks describe string and the
  BrokenOutboundLink JSDoc now list the no-such-file reason and note that
  resolvedTo can be a content-root file path.
- link-authoring-contract.test.ts: the local BrokenLink wire-shape interface
  gains the no-such-file reason it already asserts.
- advisory-warnings.test.ts: parseBrokenLinks now exercises all three reasons.
- restore-version.ts: drop the dead resolveContentDir call left after the
  linkTo removal.
- md-audit registry: drop the stale linkTo mention from the exclusion comment.

---------

GitOrigin-RevId: 2534b12d5687b1d0a8e34b659a9fa53eee963add

@inkeep-internal-ci inkeep-internal-ci Bot left a comment

Automated approval from agents-private public-mirror-sync (run: https://github.com/inkeep/agents-private/actions/runs/28293765944). Source of truth is the monorepo; direct edits on inkeep/open-knowledge are overwritten on next sync.

@inkeep-oss-sync inkeep-oss-sync Bot merged commit 2bb7b41 into main Jun 27, 2026
1 check passed
@inkeep-oss-sync inkeep-oss-sync Bot deleted the copybara/sync branch June 27, 2026 15:45