Write-time broken-link validation in the Open Knowledge MCP.…#350
Merged
…2064)
* feat(open-knowledge): correct-by-construction link authoring (PRD-7147)
Doc-naming MCP tools now emit a paste-ready canonical `linkTo` and write/edit
return write-time `brokenLinks` validation, so agents never hand-build the
broken relative/absolute hybrid and broken outbound links surface in the same
response instead of needing a separate dead-link round-trip.
- R1: linkTo {href,form,docName} on write/edit/search/exec/links/move/
restore_version; search gains an optional `fromDoc` input -> relative form.
New core builders buildLinkTo/buildAbsoluteMarkdownHref.
- R2: brokenLinks computed synchronously from the just-written bytes (NOT the
100ms-debounced index), report-only, always present ([] = all resolve).
New computeBrokenOutboundLinks in backlink-index.ts.
- R3: precedent #56 (canonical link contract) + AGENTS.md jump-index.
- R4: bundled SKILL.md + docs core-concepts.md rewritten to the coherent
contract (relative-default, no-hybrid rule, brokenLinks as primary check);
landed after R2 verified, per the spec sequencing rule. Guard test enforces
SKILL.md self-containment.
- R5: wiki-link compat preserved; broken [[Page]] flows through brokenLinks.
Spec: specs/2026-06-23-prd-7147-link-authoring-contract/
Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B
* test(open-knowledge): update fixtures for new linkTo/brokenLinks fields
Three server tests asserted exact shapes that the link-authoring contract
extends:
- search registration: inputSchema now also exposes `fromDoc`.
- agent-patch: the flat success body now always carries `brokenLinks` ([] for
a link-free patch).
(Local turbo caching of `server#test` masked these; CI's fresh run caught them.)
Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B
* refactor(open-knowledge): address PR review on link-authoring contract
- Unify BrokenLinkReason: one `BROKEN_LINK_REASONS` const in core feeds both
the Zod `z.enum` and the server-side type, closing the silent-drop drift the
reviewer flagged (extractor reason ↔ parser enum can no longer diverge).
- Thread the target's real on-disk extension into `linkTo` for edit, move,
links, and restore_version (extracted `docExtensionOnDisk` to shared.ts), so
`.mdx` docs get `.mdx` links from every tool — correct-by-construction, not
just write/search/exec.
- Batch `documents` output describe now lists `brokenLinks` + `linkTo`.
- search DESCRIPTION documents the new `fromDoc` param; exec/links document the
per-row `linkTo` (exec via the enrichedPaths field describe to respect the
2 KB tool-description cap).
- New advisory-warnings unit tests for parseBrokenLinks / formatBrokenLinkLines
/ formatBrokenLinkBrief; integration test now covers an `.mdx` edit's linkTo
+ documents the deliberate local-interface (wire-shape) decoupling.
Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B
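The single-source-of-truth pattern from the first bullet above can be sketched roughly as follows. This is an illustration, not the repository's code. The reason strings are the three named elsewhere in this PR, and `isBrokenLinkReason` is a hypothetical guard standing in for the schema-side check (the commit describes the same tuple feeding a Zod `z.enum`, which is omitted here to keep the sketch dependency-free).

```typescript
// The reasons are spelled out in exactly one place.
const BROKEN_LINK_REASONS = ["no-such-doc", "no-such-file", "unresolvable"] as const;

// The server-side type is derived from the tuple, so it cannot drift from it.
type BrokenLinkReason = (typeof BROKEN_LINK_REASONS)[number];

// Hypothetical runtime guard (assumption: stands in for the wire-schema enum).
function isBrokenLinkReason(value: unknown): value is BrokenLinkReason {
  return (
    typeof value === "string" &&
    (BROKEN_LINK_REASONS as readonly string[]).indexOf(value) !== -1
  );
}
```

Adding a reason to the tuple then widens both the type and the runtime check in one edit, which is the drift the review comment was about.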
* feat(open-knowledge): validate file links (assets + source files) in brokenLinks, not just docs
Extends R2 (write-time `brokenLinks`) to cover every local link target, not
only `.md`/`.mdx` docs. This closes the gap a real codebase-wiki run hit: a
wrong-depth `[src](../../../foo.py)` overshoots the content root and 404s
silently — invisible to both the editor red-underline and the doc-only link
graph.
- New reason `no-such-file` (sibling to `no-such-doc`/`unresolvable`), derived
from the single `BROKEN_LINK_REASONS` const so the wire enum + server type
stay in lockstep.
- `computeBrokenOutboundLinks` gains an injected `fileExists` oracle: doc links
resolve against the in-memory admitted set (unchanged); file links resolve via
the existing root-confining `resolveAssetProjectPath` then check disk. An
overshoot → `unresolvable`; an in-root miss → `no-such-file`. The oracle is
injected (not called inside the extractor) so the function stays pure and
unit-testable without a filesystem. Wiki-link asset embeds (`![[x.pdf]]`) are
out of scope — basename-resolved, not path-resolved.
- The 3 api-extension handlers (write/patch/frontmatter-patch) pass a shared
`linkedFileExists` predicate (`existsSync(resolve(contentDir, …))`).
- Tests: backlink-index file-oracle cases (clean / overshoot / missing /
absolute / external-skip), schema third-reason, advisory formatter, and an
integration test that writes a real `.py` on disk and asserts correct-depth is
clean while over-deep + missing surface in the same write response.
- Spec R2/AC2.7 + D9, SKILL.md reason list, and the changeset updated.
Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B
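A minimal sketch of the injected-oracle idea described in this commit, under stated assumptions: `checkFileLinks` and `resolveWithinRoot` are hypothetical stand-ins (not `computeBrokenOutboundLinks` or `resolveAssetProjectPath`), the `href` field name is invented for illustration, and query strings, anchors, and doc links are ignored. It only shows how an overshoot maps to `unresolvable` and an in-root miss maps to `no-such-file` without touching a filesystem.

```typescript
type BrokenLinkReason = "no-such-doc" | "no-such-file" | "unresolvable";

interface BrokenLink {
  href: string; // hypothetical field name for the link as authored
  reason: BrokenLinkReason;
  resolvedTo?: string;
}

// Resolve `href` against the folder of `fromDoc` (both content-root-relative).
// Returns null when the path climbs above the content root.
function resolveWithinRoot(fromDoc: string, href: string): string | null {
  const segments: string[] =
    href.charAt(0) === "/" ? [] : fromDoc.split("/").slice(0, -1);
  for (const part of href.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (segments.length === 0) return null; // one `../` too many
      segments.pop();
    } else {
      segments.push(part);
    }
  }
  return segments.join("/");
}

function checkFileLinks(
  fromDoc: string,
  hrefs: string[],
  fileExists: (rootRelativePath: string) => boolean, // injected oracle
): BrokenLink[] {
  const broken: BrokenLink[] = [];
  for (const href of hrefs) {
    if (/^[a-z][a-z0-9+.-]*:/i.test(href)) continue; // external URL, skipped
    const resolved = resolveWithinRoot(fromDoc, href);
    if (resolved === null) {
      broken.push({ href, reason: "unresolvable" });
    } else if (!fileExists(resolved)) {
      broken.push({ href, reason: "no-such-file", resolvedTo: resolved });
    }
  }
  return broken;
}

// Usage with an in-memory oracle instead of a disk check.
const onDisk = new Set(["src/foo.py"]);
const report = checkFileLinks(
  "wiki/modules/foo.md",
  ["../../src/foo.py", "../../../src/foo.py", "../../src/bar.py", "https://example.com"],
  (p) => onDisk.has(p),
);
```

Because the existence check is passed in, a unit test can supply a `Set` lookup like the one above, while a handler could pass a disk-backed predicate such as the `linkedFileExists` one the commit mentions.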
* docs(open-knowledge): reconcile codebase-wiki source-link guidance with brokenLinks file validation
The codebase-wiki pack (#1921) told the wiki agent that source-file links
produce "no dead-link noise." That is true of the link *graph* (the `links`
tool tracks only .md/.mdx edges) but became misleading once this branch's
brokenLinks check started validating source/asset file targets too: a
wrong-depth source link now surfaces as no-such-file (or unresolvable if it
overshoots the content root) in the write/edit response.
Update both the pack SKILL.md and the workflow({ kind: "wiki" }) body to keep
the graph caveat while making clear source links ARE validated at write time —
count the ../ hops from the page's own folder.
Claude-Session: https://claude.ai/code/session_01AGgjz3TLMKGVwHR8pG4w4s
* docs(open-knowledge): align entity-vault link guidance with brokenLinks + GBrain interop
The entity-vault pack recommended path-qualified wikilinks as the preferred
form. Verified how brokenLinks resolves links and reconciled the guidance with
both OK's resolver and GBrain's:
- brokenLinks DOES handle path-qualified wikilinks: `[[people/alice|Alice]]`
strips the alias/anchor and resolves the `folder/slug` target vault-root
(not source-dir-relative), validated against the admitted set. Added a
regression test — the existing wiki-link coverage only had bare `[[Page]]`.
- Two footguns the verification surfaced, now documented in the pack:
- a bare markdown `[x](people/alice.md)` from a subfolder resolves
source-dir-relative in OK (`meetings/people/alice`) and false-reports
broken — use the `../`-correct relative form (paste linkTo.href + fromDoc).
- OK's leading-slash root-absolute form is valid in OK but GBrain rejects it
as an absolute filesystem path.
- wikilinks must stay extensionless; `[[…\.md]]` resolves to a missing doc.
Pack now prefers standard markdown relative links (GitHub + GBrain + OK all
agree) while keeping path-qualified wikilinks first-class for vault-root
addressing.
Claude-Session: https://claude.ai/code/session_01AGgjz3TLMKGVwHR8pG4w4s
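The path-qualified wikilink behaviour described above (strip the alias and anchor, then look the `folder/slug` target up from the vault root) might look roughly like this. It is a sketch only: `wikiLinkTarget`, `cutAt`, and `isBrokenWikiLink` are hypothetical names, the admitted set is assumed to hold extensionless vault-root names, and bare `[[Page]]` resolution is not modelled.

```typescript
// Return `text` up to (not including) the first `marker`, or all of it.
function cutAt(text: string, marker: string): string {
  const i = text.indexOf(marker);
  return i === -1 ? text : text.slice(0, i);
}

// Hypothetical helper: extract the vault-root target from a wikilink.
function wikiLinkTarget(raw: string): string {
  const inner = raw.replace(/^!?\[\[/, "").replace(/\]\]$/, "");
  // Drop the display alias first, then any heading anchor.
  return cutAt(cutAt(inner, "|"), "#").trim();
}

// Assumption: admitted docs are keyed by extensionless vault-root name.
function isBrokenWikiLink(raw: string, admittedDocs: ReadonlySet<string>): boolean {
  return !admittedDocs.has(wikiLinkTarget(raw));
}

const admitted = new Set(["people/alice"]);
const aliasOk = isBrokenWikiLink("[[people/alice|Alice]]", admitted);
const withExtension = isBrokenWikiLink("[[people/alice.md]]", admitted);
```

Under these assumptions `aliasOk` is `false` and `withExtension` is `true`, matching the note that wikilinks must stay extensionless.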
* Remove linkTo from MCP responses, keep brokenLinks write-time validation
An ablation of the link-authoring features against codebase-wiki generation
(opus and sonnet, on the microreservoir repo) found brokenLinks is the
load-bearing change. It reliably drives residual broken links to zero by
catching wrong-depth source links and forward-reference doc links at write
time. linkTo showed no measurable benefit for wiki generation: it only covers
links to docs that already exist, not the source-file depth or forward-reference
errors that actually break wiki output. This cuts linkTo to keep the response
lean and keeps brokenLinks.
Removed:
- linkTo field from write, edit, search, exec, links, move, restore_version
- buildLinkTo, LinkTo, linkToOutputField, docExtensionFromPath helpers
- search fromDoc input (existed only to make linkTo relative)
- linkTo prose from the platform and entity-vault skills
Kept and clarified:
- brokenLinks (no-such-doc, no-such-file, unresolvable) on write and edit
- Reconciled the platform skill forward-reference guidance: a same-pass
no-such-doc is an expected transient forward reference, and
links({ kind: "dead" }) is the authoritative end-state audit
* Reframe precedent #56 and spec around brokenLinks after cutting linkTo
Precedent #56 (canonical link contract) drops the linkTo emission affordance
and now centers on write-time brokenLinks validation, documenting the three
reasons (no-such-doc, no-such-file, unresolvable) and the same-pass
forward-reference clarification. Added a pre-merge corrigendum to the PRD-7147
spec recording that R1 (linkTo) was removed after the ablation, keeping the
original design prose as a historical record.
* Address review: document no-such-file + remove linkTo-cut leftovers
- shared.ts / backlink-index.ts: the brokenLinks describe string and the
BrokenOutboundLink JSDoc now list the no-such-file reason and note that
resolvedTo can be a content-root file path.
- link-authoring-contract.test.ts: the local BrokenLink wire-shape interface
gains the no-such-file reason it already asserts.
- advisory-warnings.test.ts: parseBrokenLinks now exercises all three reasons.
- restore-version.ts: drop the dead resolveContentDir call left after the
linkTo removal.
- md-audit registry: drop the stale linkTo mention from the exclusion comment.
---------
GitOrigin-RevId: 2534b12d5687b1d0a8e34b659a9fa53eee963add
Automated approval from agents-private public-mirror-sync (run: https://github.com/inkeep/agents-private/actions/runs/28293765944). Source of truth is the monorepo; direct edits on inkeep/open-knowledge are overwritten on next sync.
Write-time broken-link validation in the Open Knowledge MCP.
`write`/`edit` now return `brokenLinks` in the same response: outbound links that don't resolve are surfaced at write time, report-only, so the write still lands and you can author a doc before its link target exists. Validation covers every local link, not just docs: the `./`-onto-a-content-root-path doubling footgun and missing `[[wiki]]` / markdown doc targets (`no-such-doc`), root-escaping paths from one `../` too many (`unresolvable`), and links to assets or source files (`[src](../../foo.py)`) that don't exist on disk at the resolved path (`no-such-file`). That last reason closes the gap a real codebase-wiki run hit: wrong-depth source-file links that 404 silently because the doc-only link graph never tracked them. The platform and pack skills are updated to point agents at `brokenLinks` as the primary write-time check and to clarify that a same-pass forward reference reports as `no-such-doc` until its target lands (the `links({ kind: "dead" })` audit is the authoritative end-state check).
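One way a client could act on the report, sketched under assumptions: the `reason` and `resolvedTo` names come from this PR, while `triage`, `plannedDocs`, and the idea that `resolvedTo` is directly comparable to a planned doc name are hypothetical. The point is only that a same-pass `no-such-doc` for a doc still queued to be written can be deferred, while everything else needs a fix now.

```typescript
type BrokenLinkReason = "no-such-doc" | "no-such-file" | "unresolvable";

interface BrokenLink {
  reason: BrokenLinkReason;
  resolvedTo?: string;
}

interface Triage {
  transient: BrokenLink[]; // expected forward references
  actionable: BrokenLink[]; // fix before moving on
}

// Hypothetical helper: split a write response's brokenLinks by urgency.
function triage(brokenLinks: BrokenLink[], plannedDocs: ReadonlySet<string>): Triage {
  const result: Triage = { transient: [], actionable: [] };
  for (const link of brokenLinks) {
    const isForwardRef =
      link.reason === "no-such-doc" &&
      link.resolvedTo !== undefined &&
      plannedDocs.has(link.resolvedTo);
    (isForwardRef ? result.transient : result.actionable).push(link);
  }
  return result;
}

const triaged = triage(
  [
    { reason: "no-such-doc", resolvedTo: "guides/setup" },
    { reason: "no-such-file", resolvedTo: "src/foo.py" },
    { reason: "unresolvable" },
  ],
  new Set(["guides/setup"]),
);
```

Deferred entries would still be confirmed by the end-state `links({ kind: "dead" })` audit once every planned doc has been written.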