From 8e44a756f31471497a1eb5edc175c91e1b7c0ec0 Mon Sep 17 00:00:00 2001 From: mike-inkeep Date: Sat, 27 Jun 2026 11:26:56 -0400 Subject: [PATCH] feat(open-knowledge): write-time broken-link validation (PRD-7147) (#2064) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat(open-knowledge): correct-by-construction link authoring (PRD-7147) Doc-naming MCP tools now emit a paste-ready canonical `linkTo` and write/edit return write-time `brokenLinks` validation, so agents never hand-build the broken relative/absolute hybrid and broken outbound links surface in the same response instead of needing a separate dead-link round-trip. - R1: linkTo {href,form,docName} on write/edit/search/exec/links/move/ restore_version; search gains an optional `fromDoc` input -> relative form. New core builders buildLinkTo/buildAbsoluteMarkdownHref. - R2: brokenLinks computed synchronously from the just-written bytes (NOT the 100ms-debounced index), report-only, always present ([] = all resolve). New computeBrokenOutboundLinks in backlink-index.ts. - R3: precedent #56 (canonical link contract) + AGENTS.md jump-index. - R4: bundled SKILL.md + docs core-concepts.md rewritten to the coherent contract (relative-default, no-hybrid rule, brokenLinks as primary check); landed after R2 verified, per the spec sequencing rule. Guard test enforces SKILL.md self-containment. - R5: wiki-link compat preserved; broken [[Page]] flows through brokenLinks. Spec: specs/2026-06-23-prd-7147-link-authoring-contract/ Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B * test(open-knowledge): update fixtures for new linkTo/brokenLinks fields Three server tests asserted exact shapes that the link-authoring contract extends: - search registration: inputSchema now also exposes `fromDoc`. - agent-patch: the flat success body now always carries `brokenLinks` ([] for a link-free patch). (Local turbo caching of `server#test` masked these; CI's fresh run caught them.) Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B * refactor(open-knowledge): address PR review on link-authoring contract - Unify BrokenLinkReason: one `BROKEN_LINK_REASONS` const in core feeds both the Zod `z.enum` and the server-side type, closing the silent-drop drift the reviewer flagged (extractor reason ↔ parser enum can no longer diverge). - Thread the target's real on-disk extension into `linkTo` for edit, move, links, and restore_version (extracted `docExtensionOnDisk` to shared.ts), so `.mdx` docs get `.mdx` links from every tool — correct-by-construction, not just write/search/exec. - Batch `documents` output describe now lists `brokenLinks` + `linkTo`. - search DESCRIPTION documents the new `fromDoc` param; exec/links document the per-row `linkTo` (exec via the enrichedPaths field describe to respect the 2 KB tool-description cap). - New advisory-warnings unit tests for parseBrokenLinks / formatBrokenLinkLines / formatBrokenLinkBrief; integration test now covers an `.mdx` edit's linkTo + documents the deliberate local-interface (wire-shape) decoupling. Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B * feat(open-knowledge): validate file links (assets + source files) in brokenLinks, not just docs Extends R2 (write-time `brokenLinks`) to cover every local link target, not only `.md`/`.mdx` docs. This closes the gap a real codebase-wiki run hit: a wrong-depth `[src](../../../foo.py)` overshoots the content root and 404s silently — invisible to both the editor red-underline and the doc-only link graph. - New reason `no-such-file` (sibling to `no-such-doc`/`unresolvable`), derived from the single `BROKEN_LINK_REASONS` const so the wire enum + server type stay in lockstep. - `computeBrokenOutboundLinks` gains an injected `fileExists` oracle: doc links resolve against the in-memory admitted set (unchanged); file links resolve via the existing root-confining `resolveAssetProjectPath` then check disk. An overshoot → `unresolvable`; an in-root miss → `no-such-file`. The oracle is injected (not called inside the extractor) so the function stays pure and unit-testable without a filesystem. Wiki-link asset embeds (`![[x.pdf]]`) are out of scope — basename-resolved, not path-resolved. - The 3 api-extension handlers (write/patch/frontmatter-patch) pass a shared `linkedFileExists` predicate (`existsSync(resolve(contentDir, …))`). - Tests: backlink-index file-oracle cases (clean / overshoot / missing / absolute / external-skip), schema third-reason, advisory formatter, and an integration test that writes a real `.py` on disk and asserts correct-depth is clean while over-deep + missing surface in the same write response. - Spec R2/AC2.7 + D9, SKILL.md reason list, and the changeset updated. Claude-Session: https://claude.ai/code/session_01ETC1KfQABzdFw94NAkZm6B * docs(open-knowledge): reconcile codebase-wiki source-link guidance with brokenLinks file validation The codebase-wiki pack (#1921) told the wiki agent that source-file links produce "no dead-link noise." That is true of the link *graph* (the `links` tool tracks only .md/.mdx edges) but became misleading once this branch's brokenLinks check started validating source/asset file targets too: a wrong-depth source link now surfaces as no-such-file (or unresolvable if it overshoots the content root) in the write/edit response. Update both the pack SKILL.md and the workflow({ kind: "wiki" }) body to keep the graph caveat while making clear source links ARE validated at write time — count the ../ hops from the page's own folder. Claude-Session: https://claude.ai/code/session_01AGgjz3TLMKGVwHR8pG4w4s * docs(open-knowledge): align entity-vault link guidance with brokenLinks + GBrain interop The entity-vault pack recommended path-qualified wikilinks as the preferred form. Verified how brokenLinks resolves links and reconciled the guidance with both OK's resolver and GBrain's: - brokenLinks DOES handle path-qualified wikilinks: `[[people/alice|Alice]]` strips the alias/anchor and resolves the `folder/slug` target vault-root (not source-dir-relative), validated against the admitted set. Added a regression test — the existing wiki-link coverage only had bare `[[Page]]`. - Two footguns the verification surfaced, now documented in the pack: - a bare markdown `[x](people/alice.md)` from a subfolder resolves source-dir-relative in OK (`meetings/people/alice`) and false-reports broken — use the `../`-correct relative form (paste linkTo.href + fromDoc). - OK's leading-slash root-absolute form is valid in OK but GBrain rejects it as an absolute filesystem path. - wikilinks must stay extensionless; `[[…\.md]]` resolves to a missing doc. Pack now prefers standard markdown relative links (GitHub + GBrain + OK all agree) while keeping path-qualified wikilinks first-class for vault-root addressing. Claude-Session: https://claude.ai/code/session_01AGgjz3TLMKGVwHR8pG4w4s * Remove linkTo from MCP responses, keep brokenLinks write-time validation An ablation of the link-authoring features against codebase-wiki generation (opus and sonnet, on the microreservoir repo) found brokenLinks is the load-bearing change. It reliably drives residual broken links to zero by catching wrong-depth source links and forward-reference doc links at write time. linkTo showed no measurable benefit for wiki generation: it only covers links to docs that already exist, not the source-file depth or forward-reference errors that actually break wiki output. This cuts linkTo to keep the response lean and keeps brokenLinks. Removed: - linkTo field from write, edit, search, exec, links, move, restore_version - buildLinkTo, LinkTo, linkToOutputField, docExtensionFromPath helpers - search fromDoc input (existed only to make linkTo relative) - linkTo prose from the platform and entity-vault skills Kept and clarified: - brokenLinks (no-such-doc, no-such-file, unresolvable) on write and edit - Reconciled the platform skill forward-reference guidance: a same-pass no-such-doc is an expected transient forward reference, and links({ kind: dead }) is the authoritative end-state audit * Reframe precedent #56 and spec around brokenLinks after cutting linkTo Precedent #56 (canonical link contract) drops the linkTo emission affordance and now centers on write-time brokenLinks validation, documenting the three reasons (no-such-doc, no-such-file, unresolvable) and the same-pass forward-reference clarification. Added a pre-merge corrigendum to the PRD-7147 spec recording that R1 (linkTo) was removed after the ablation, keeping the original design prose as a historical record. * Address review: document no-such-file + remove linkTo-cut leftovers - shared.ts / backlink-index.ts: the brokenLinks describe string and the BrokenOutboundLink JSDoc now list the no-such-file reason and note that resolvedTo can be a content-root file path. - link-authoring-contract.test.ts: the local BrokenLink wire-shape interface gains the no-such-file reason it already asserts. - advisory-warnings.test.ts: parseBrokenLinks now exercises all three reasons. - restore-version.ts: drop the dead resolveContentDir call left after the linkTo removal. - md-audit registry: drop the stale linkTo mention from the exclusion comment. --------- GitOrigin-RevId: 2534b12d5687b1d0a8e34b659a9fa53eee963add --- .changeset/link-authoring-contract.md | 5 + docs/content/reference/core-concepts.md | 2 +- .../link-authoring-contract.test.ts | 286 ++++++++++++++++++ .../mcp-move-real-roundtrip.test.ts | 12 +- packages/core/src/index.ts | 6 + .../core/src/schemas/api/agent-write.test.ts | 38 +++ packages/core/src/schemas/api/agent-write.ts | 17 ++ packages/core/src/utils/link-targets.test.ts | 21 ++ packages/core/src/utils/link-targets.ts | 12 +- .../skills/packs/codebase-wiki/SKILL.md | 2 +- .../assets/skills/packs/entity-vault/SKILL.md | 7 +- .../server/assets/skills/project/SKILL.md | 2 +- .../skills/project/references/linking.md | 10 +- packages/server/src/api-agent-patch.test.ts | 2 + packages/server/src/api-extension.ts | 41 ++- packages/server/src/backlink-index.test.ts | 167 ++++++++++ packages/server/src/backlink-index.ts | 118 ++++++++ .../src/mcp/tools/advisory-warnings.test.ts | 98 +++++- .../server/src/mcp/tools/advisory-warnings.ts | 31 ++ packages/server/src/mcp/tools/edit.ts | 34 ++- packages/server/src/mcp/tools/move.ts | 7 +- packages/server/src/mcp/tools/shared.ts | 27 +- packages/server/src/mcp/tools/wiki-body.ts | 4 +- packages/server/src/mcp/tools/write.ts | 49 ++- .../server/src/skill-linking-contract.test.ts | 46 +++ 25 files changed, 981 insertions(+), 63 deletions(-) create mode 100644 .changeset/link-authoring-contract.md create mode 100644 packages/app/tests/integration/link-authoring-contract.test.ts create mode 100644 packages/server/src/skill-linking-contract.test.ts diff --git a/.changeset/link-authoring-contract.md b/.changeset/link-authoring-contract.md new file mode 100644 index 00000000..e6239762 --- /dev/null +++ b/.changeset/link-authoring-contract.md @@ -0,0 +1,5 @@ +--- +"@inkeep/open-knowledge": minor +--- + +Write-time broken-link validation in the Open Knowledge MCP. `write`/`edit` now return `brokenLinks` in the same response — outbound links that don't resolve are surfaced at write time, report-only so the write still lands and you can author a doc before its link target exists. Validation covers **every** local link, not just docs: the `./`-onto-a-content-root-path doubling footgun and missing `[[wiki]]` / markdown doc targets (`no-such-doc`), root-escaping paths from one `../` too many (`unresolvable`), and links to assets or source files (`[src](../../foo.py)`) that don't exist on disk at the resolved path (`no-such-file`). That last reason closes the gap a real codebase-wiki run hit: wrong-depth source-file links that 404 silently because the doc-only link graph never tracked them. The platform and pack skills are updated to point agents at `brokenLinks` as the primary write-time check and to clarify that a same-pass forward-reference reports as `no-such-doc` until its target lands (the `links({ kind: "dead" })` audit is the authoritative end-state check). diff --git a/docs/content/reference/core-concepts.md b/docs/content/reference/core-concepts.md index 7cde726b..b7f0e606 100644 --- a/docs/content/reference/core-concepts.md +++ b/docs/content/reference/core-concepts.md @@ -39,7 +39,7 @@ The set of files the engine treats as your knowledge base is the configured cont ## Links and backlinks -Internal cross-references are written with **standard markdown links**: `[text](./relative/path.md)` or `[text](/absolute/from/content-root.md)`. Whenever document A links to document B, OpenKnowledge automatically records the inverse on B: a **backlink** from B back to A. +Internal cross-references are written with **standard markdown links**. The recommended form is **relative** — `[text](./sibling.md)`, `[text](../folder/doc.md)` — which stays portable across GitHub, Obsidian, VS Code, and published sites. A **root-absolute** form (`[text](/folder/doc.md)`, where the leading slash means the content root) is equally valid and convenient for cross-folder links. The two never mix: never glue `./` onto a content-root path, since `./folder/doc.md` written from a doc already inside `folder/` resolves to the doubled, broken `folder/folder/doc.md` — `write`/`edit` flag exactly this in their `brokenLinks` response. Whenever document A links to document B, OpenKnowledge automatically records the inverse on B: a **backlink** from B back to A. You never write backlinks by hand. They are computed from the links you already write, and together they form the **link graph**: the network of relationships across your knowledge base. diff --git a/packages/app/tests/integration/link-authoring-contract.test.ts b/packages/app/tests/integration/link-authoring-contract.test.ts new file mode 100644 index 00000000..6f172c25 --- /dev/null +++ b/packages/app/tests/integration/link-authoring-contract.test.ts @@ -0,0 +1,286 @@ +import { afterAll, beforeAll, expect, test } from 'bun:test'; +import { randomUUID } from 'node:crypto'; +import { mkdirSync, readFileSync, writeFileSync } from 'node:fs'; +import { join } from 'node:path'; +import { HARNESS_BOOT_TIMEOUT_MS } from './harness-boot-timeout'; +import { awaitFileWatcherIndexed, createTestServer, type TestServer } from './test-harness.ts'; + +const MCP_PROTOCOL_VERSION = '2025-06-18'; + +interface InitializedSession { + sessionId: string; + protocolVersion: string; +} + +interface BrokenLink { + href: string; + resolvedTo: string | null; + reason: 'no-such-doc' | 'no-such-file' | 'unresolvable'; +} + +async function openMcpSession(port: number): Promise { + const init = await fetch(`http://127.0.0.1:${port}/mcp`, { + method: 'POST', + headers: { accept: 'application/json, text/event-stream', 'content-type': 'application/json' }, + body: JSON.stringify({ + jsonrpc: '2.0', + id: 1, + method: 'initialize', + params: { + protocolVersion: MCP_PROTOCOL_VERSION, + capabilities: {}, + clientInfo: { name: 'Claude', version: '1.0.0' }, + }, + }), + }); + expect(init.status).toBe(200); + const sessionId = init.headers.get('mcp-session-id'); + expect(sessionId).toBeTruthy(); + const initBody = (await init.json()) as { result?: { protocolVersion?: string } }; + const protocolVersion = initBody.result?.protocolVersion ?? MCP_PROTOCOL_VERSION; + + const initialized = await fetch(`http://127.0.0.1:${port}/mcp`, { + method: 'POST', + headers: { + accept: 'application/json, text/event-stream', + 'content-type': 'application/json', + 'mcp-session-id': sessionId as string, + 'mcp-protocol-version': protocolVersion, + }, + body: JSON.stringify({ jsonrpc: '2.0', method: 'notifications/initialized' }), + }); + expect(initialized.status).toBe(202); + return { sessionId: sessionId as string, protocolVersion }; +} + +let nextId = 100; + +async function callTool( + port: number, + session: InitializedSession, + name: string, + args: Record, + cwd: string, +): Promise<{ isError?: boolean; structuredContent?: Record }> { + nextId += 1; + const res = await fetch(`http://127.0.0.1:${port}/mcp`, { + method: 'POST', + headers: { + accept: 'application/json, text/event-stream', + 'content-type': 'application/json', + 'mcp-session-id': session.sessionId, + 'mcp-protocol-version': session.protocolVersion, + }, + body: JSON.stringify({ + jsonrpc: '2.0', + id: nextId, + method: 'tools/call', + params: { name, arguments: { ...args, cwd } }, + }), + }); + expect(res.status).toBe(200); + const body = (await res.json()) as { + result?: { isError?: boolean; structuredContent?: Record }; + error?: unknown; + }; + if (body.error) throw new Error(`tools/call error: ${JSON.stringify(body.error)}`); + return body.result ?? {}; +} + +function docResult(structured: Record | undefined): Record { + expect(structured).toBeDefined(); + const doc = structured?.document as Record | undefined; + expect(doc).toBeDefined(); + return doc as Record; +} + +let server: TestServer; + +beforeAll(async () => { + server = await createTestServer({ debounce: 50, maxDebounce: 200 }); +}, HARNESS_BOOT_TIMEOUT_MS); + +afterAll(async () => { + await server.cleanup(); +}); + +test('write surfaces broken outbound links (doubling + escape-root + broken wiki) in the same response', async () => { + const session = await openMcpSession(server.port); + const folder = `wiki-${randomUUID().slice(0, 8)}`; + const docName = `${folder}/OVERVIEW`; + const content = [ + '# Wiki Overview', + '', + `See [tasks](./${folder}/modules/tasks) for the task module.`, + 'A bad [escape](../../../way-out.md) link.', + 'And a [[Ghost Page]] wiki reference.', + '', + ].join('\n'); + + const result = await callTool( + server.port, + session, + 'write', + { document: { path: docName, content, position: 'replace' } }, + server.contentDir, + ); + expect(result.isError ?? false).toBe(false); + + const doc = docResult(result.structuredContent); + const broken = doc.brokenLinks as BrokenLink[]; + expect(broken).toEqual([ + { + href: `./${folder}/modules/tasks`, + resolvedTo: `${folder}/${folder}/modules/tasks`, + reason: 'no-such-doc', + }, + { href: '../../../way-out.md', resolvedTo: null, reason: 'unresolvable' }, + { href: '[[Ghost Page]]', resolvedTo: 'Ghost Page', reason: 'no-such-doc' }, + ]); + + const stored = readFileSync(join(server.contentDir, `${docName}.md`), 'utf-8'); + expect(stored).toContain(`[tasks](./${folder}/modules/tasks)`); + expect(stored).toContain('[escape](../../../way-out.md)'); + expect(stored).toContain('[[Ghost Page]]'); +}); + +test('a write whose links all resolve returns brokenLinks: [] (positive confirmation)', async () => { + const session = await openMcpSession(server.port); + const docName = `clean-${randomUUID().slice(0, 8)}`; + const content = [ + '# Clean', + '', + `Back to [self](./${docName.split('/').pop()}.md).`, + 'An [external](https://example.com) site and an [anchor](#clean).', + '', + ].join('\n'); + + const result = await callTool( + server.port, + session, + 'write', + { document: { path: docName, content, position: 'replace' } }, + server.contentDir, + ); + expect(result.isError ?? false).toBe(false); + const doc = docResult(result.structuredContent); + expect(doc.brokenLinks).toEqual([]); +}); + +test('write validates links to any file on disk, not just docs (source-file depth)', async () => { + const session = await openMcpSession(server.port); + const uid = randomUUID().slice(0, 8); + const relFile = `src/probe-${uid}.py`; + mkdirSync(join(server.contentDir, 'src'), { recursive: true }); + writeFileSync(join(server.contentDir, relFile), 'def probe(): ...\n'); + + const docName = `wiki-${uid}/modules/m`; + const content = [ + '# Module', + '', + `Correct depth: [probe](../../${relFile}).`, + `Over-deep: [probe again](../../../${relFile}).`, + `Missing: [gone](../../src/missing-${uid}.py).`, + '', + ].join('\n'); + + const result = await callTool( + server.port, + session, + 'write', + { document: { path: docName, content, position: 'replace' } }, + server.contentDir, + ); + expect(result.isError ?? false).toBe(false); + const broken = docResult(result.structuredContent).brokenLinks as BrokenLink[]; + expect(broken).toEqual([ + { href: `../../../${relFile}`, resolvedTo: null, reason: 'unresolvable' }, + { + href: `../../src/missing-${uid}.py`, + resolvedTo: `src/missing-${uid}.py`, + reason: 'no-such-file', + }, + ]); +}); + +test('edit (body find/replace) reports a broken link introduced by the edit', async () => { + const session = await openMcpSession(server.port); + const docName = `edited-${randomUUID().slice(0, 8)}`; + await callTool( + server.port, + session, + 'write', + { document: { path: docName, content: '# Edited\n\nPlaceholder.\n', position: 'replace' } }, + server.contentDir, + ); + + const edited = await callTool( + server.port, + session, + 'edit', + { + document: { + path: docName, + find: 'Placeholder.', + replace: 'See [gone](./does-not-exist.md).', + }, + }, + server.contentDir, + ); + expect(edited.isError ?? false).toBe(false); + const doc = docResult(edited.structuredContent); + expect(doc.brokenLinks).toEqual([ + { href: './does-not-exist.md', resolvedTo: 'does-not-exist', reason: 'no-such-doc' }, + ]); +}); + +test('the HTTP /api/agent-write-md response carries brokenLinks directly', async () => { + const docName = `http-${randomUUID().slice(0, 8)}`; + const res = await fetch(`http://127.0.0.1:${server.port}/api/agent-write-md`, { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ + docName, + position: 'replace', + markdown: 'Broken [ref](./nope.md) here.\n', + }), + }); + expect(res.status).toBe(200); + const body = (await res.json()) as { brokenLinks?: BrokenLink[] }; + expect(body.brokenLinks).toEqual([ + { href: './nope.md', resolvedTo: 'nope', reason: 'no-such-doc' }, + ]); +}); + +test('a link to a doc that actually exists is not flagged (admitted-set membership)', async () => { + const session = await openMcpSession(server.port); + const suffix = randomUUID().slice(0, 8); + const target = `guides-${suffix}/install`; + const sourceDoc = `guides-${suffix}/index`; + + await callTool( + server.port, + session, + 'write', + { document: { path: target, content: '# Install\n\nSteps.\n', position: 'replace' } }, + server.contentDir, + ); + await awaitFileWatcherIndexed(server, target); + + const result = await callTool( + server.port, + session, + 'write', + { + document: { + path: sourceDoc, + content: `# Index\n\nSee [install](./install.md).\n`, + position: 'replace', + }, + }, + server.contentDir, + ); + expect(result.isError ?? false).toBe(false); + const doc = docResult(result.structuredContent); + expect(doc.brokenLinks).toEqual([]); +}); diff --git a/packages/app/tests/integration/mcp-move-real-roundtrip.test.ts b/packages/app/tests/integration/mcp-move-real-roundtrip.test.ts index 7f7010ab..2729c3a2 100644 --- a/packages/app/tests/integration/mcp-move-real-roundtrip.test.ts +++ b/packages/app/tests/integration/mcp-move-real-roundtrip.test.ts @@ -135,11 +135,19 @@ describe('MCP move tool — real roundtrip against live OK server (QA-004 / QA-0 expect(result.isError).toBeUndefined(); const structured = result.structuredContent as { ok: boolean; - renamed: Array<{ fromDocName: string; toDocName: string }>; + renamed: Array<{ + fromDocName: string; + toDocName: string; + }>; rewrittenDocs: Array<{ docName: string; rewrites: number }>; }; expect(structured.ok).toBe(true); - expect(structured.renamed).toEqual([{ fromDocName: 'auth', toDocName: 'sso' }]); + expect(structured.renamed).toEqual([ + { + fromDocName: 'auth', + toDocName: 'sso', + }, + ]); const rewrittenNames = structured.rewrittenDocs.map((d) => d.docName); expect(rewrittenNames).toContain('index'); diff --git a/packages/core/src/index.ts b/packages/core/src/index.ts index 2f408bf6..4dba21a4 100644 --- a/packages/core/src/index.ts +++ b/packages/core/src/index.ts @@ -554,8 +554,13 @@ export { BacklinkEntrySchema, type BacklinksSuccess, BacklinksSuccessSchema, + BROKEN_LINK_REASONS, type BranchInfoResponse, BranchInfoResponseSchema, + type BrokenLink, + type BrokenLinkReason, + BrokenLinkSchema, + BrokenLinksSchema, type CheckoutFailureReason, CheckoutFailureReasonSchema, type CheckoutRequest, @@ -1083,6 +1088,7 @@ export { type AnchorLinkTarget, type AssetLinkTarget, assertNeverLinkTarget, + buildAbsoluteMarkdownHref, buildRelativeMarkdownHref, type ClassifiedLinkTarget, classifyMarkdownHref, diff --git a/packages/core/src/schemas/api/agent-write.test.ts b/packages/core/src/schemas/api/agent-write.test.ts index 062d4ebf..4c3d5ef1 100644 --- a/packages/core/src/schemas/api/agent-write.test.ts +++ b/packages/core/src/schemas/api/agent-write.test.ts @@ -254,6 +254,7 @@ describe('AgentWriteMdSuccessSchema', () => { timestamp: '2026-04-30T00:00:00.000Z', subscriberCount: 0, systemSubscriberCount: 0, + brokenLinks: [], }); expect(result.success).toBe(true); }); @@ -270,15 +271,50 @@ describe('AgentWriteMdSuccessSchema', () => { message: 'No backlinks; consider linking from [[folder/README]].', }, ], + brokenLinks: [], }); expect(result.success).toBe(true); }); + test('parses populated brokenLinks across all three reason kinds', () => { + const result = AgentWriteMdSuccessSchema.safeParse({ + timestamp: '2026-04-30T00:00:00.000Z', + subscriberCount: 0, + systemSubscriberCount: 0, + brokenLinks: [ + { href: './wiki/x', resolvedTo: 'wiki/wiki/x', reason: 'no-such-doc' }, + { href: '../src/foo.py', resolvedTo: 'src/foo.py', reason: 'no-such-file' }, + { href: '../../escape.md', resolvedTo: null, reason: 'unresolvable' }, + ], + }); + expect(result.success).toBe(true); + }); + + test('requires brokenLinks (always-present confirmation field)', () => { + const result = AgentWriteMdSuccessSchema.safeParse({ + timestamp: '2026-04-30T00:00:00.000Z', + subscriberCount: 0, + systemSubscriberCount: 0, + }); + expect(result.success).toBe(false); + }); + + test('rejects an invalid brokenLinks reason', () => { + const result = AgentWriteMdSuccessSchema.safeParse({ + timestamp: '2026-04-30T00:00:00.000Z', + subscriberCount: 0, + systemSubscriberCount: 0, + brokenLinks: [{ href: './x', resolvedTo: null, reason: 'broken-anchor' }], + }); + expect(result.success).toBe(false); + }); + test('rejects negative subscriberCount', () => { const result = AgentWriteMdSuccessSchema.safeParse({ timestamp: '2026-04-30T00:00:00.000Z', subscriberCount: -1, systemSubscriberCount: 0, + brokenLinks: [], }); expect(result.success).toBe(false); }); @@ -289,6 +325,7 @@ describe('AgentWriteMdSuccessSchema', () => { subscriberCount: 0, systemSubscriberCount: 0, hints: [{ type: 'something-else', parentCandidates: [], message: '' }], + brokenLinks: [], }); expect(result.success).toBe(false); }); @@ -300,6 +337,7 @@ describe('AgentPatchSuccessSchema', () => { timestamp: '2026-04-30T00:00:00.000Z', subscriberCount: 0, systemSubscriberCount: 0, + brokenLinks: [], }); expect(result.success).toBe(true); }); diff --git a/packages/core/src/schemas/api/agent-write.ts b/packages/core/src/schemas/api/agent-write.ts index 67b4ed5e..b5a4aff1 100644 --- a/packages/core/src/schemas/api/agent-write.ts +++ b/packages/core/src/schemas/api/agent-write.ts @@ -128,6 +128,20 @@ export type AdvisoryWarning = z.infer; export const AdvisoryWarningsSchema = z.array(AdvisoryWarningSchema).min(1); +export const BROKEN_LINK_REASONS = ['no-such-doc', 'no-such-file', 'unresolvable'] as const; +export type BrokenLinkReason = (typeof BROKEN_LINK_REASONS)[number]; + +export const BrokenLinkSchema = z + .object({ + href: z.string(), + resolvedTo: z.string().nullable(), + reason: z.enum(BROKEN_LINK_REASONS), + }) + .loose() satisfies StandardSchemaV1; +export type BrokenLink = z.infer; + +export const BrokenLinksSchema = z.array(BrokenLinkSchema); + export const AgentWriteSuccessSchema = z .object({ timestamp: z.string().min(1), @@ -147,6 +161,7 @@ export const AgentWriteMdSuccessSchema = z summary: SummaryResponseFieldSchema.optional(), warning: WriteWarningSchema.optional(), warnings: AdvisoryWarningsSchema.optional(), + brokenLinks: BrokenLinksSchema, }) .loose() satisfies StandardSchemaV1; export type AgentWriteMdSuccess = z.infer; @@ -159,6 +174,7 @@ export const AgentPatchSuccessSchema = z summary: SummaryResponseFieldSchema.optional(), warning: WriteWarningSchema.optional(), warnings: AdvisoryWarningsSchema.optional(), + brokenLinks: BrokenLinksSchema, }) .loose() satisfies StandardSchemaV1; export type AgentPatchSuccess = z.infer; @@ -192,6 +208,7 @@ export const FrontmatterPatchSuccessSchema = z summary: SummaryResponseFieldSchema.optional(), warning: WriteWarningSchema.optional(), warnings: AdvisoryWarningsSchema.optional(), + brokenLinks: BrokenLinksSchema, }) .loose() satisfies StandardSchemaV1; export type FrontmatterPatchSuccess = z.infer; diff --git a/packages/core/src/utils/link-targets.test.ts b/packages/core/src/utils/link-targets.test.ts index b36393af..46e63b0a 100644 --- a/packages/core/src/utils/link-targets.test.ts +++ b/packages/core/src/utils/link-targets.test.ts @@ -1,5 +1,6 @@ import { describe, expect, test } from 'bun:test'; import { + buildAbsoluteMarkdownHref, buildRelativeMarkdownHref, classifyMarkdownHref, classifyWikiLinkTarget, @@ -174,4 +175,24 @@ describe('buildRelativeMarkdownHref', () => { '../install.md', ); }); + + test('honors a non-default extension for the target', () => { + expect(buildRelativeMarkdownHref('docs/index', 'docs/guide', null, '.mdx')).toBe('./guide.mdx'); + }); +}); + +describe('buildAbsoluteMarkdownHref', () => { + test('builds a root-absolute href from an extension-less docName', () => { + expect(buildAbsoluteMarkdownHref('wiki/modules/tasks')).toBe('/wiki/modules/tasks.md'); + }); + + test('appends an anchor when given', () => { + expect(buildAbsoluteMarkdownHref('docs/guide', '.md', 'install')).toBe( + '/docs/guide.md#install', + ); + }); + + test('honors a non-default extension', () => { + expect(buildAbsoluteMarkdownHref('guides/widget', '.mdx')).toBe('/guides/widget.mdx'); + }); }); diff --git a/packages/core/src/utils/link-targets.ts b/packages/core/src/utils/link-targets.ts index 622105cd..a206ff45 100644 --- a/packages/core/src/utils/link-targets.ts +++ b/packages/core/src/utils/link-targets.ts @@ -1,3 +1,4 @@ +import { DEFAULT_DOC_EXTENSION } from '../constants/doc-extensions.ts'; import { type ResolvedInternalHref, resolveInternalHref } from './resolve-internal-href.ts'; export interface DocLinkTarget extends ResolvedInternalHref { @@ -144,6 +145,7 @@ export function buildRelativeMarkdownHref( sourceDocName: string, targetDocName: string, anchor: string | null = null, + ext: string = DEFAULT_DOC_EXTENSION, ): string { const sourceDirSegments = splitDocNameSegments(sourceDocName); sourceDirSegments.pop(); @@ -169,5 +171,13 @@ export function buildRelativeMarkdownHref( relativePath = `./${relativePath}`; } - return `${relativePath}.md${anchor ? `#${anchor}` : ''}`; + return `${relativePath}${ext}${anchor ? `#${anchor}` : ''}`; +} + +export function buildAbsoluteMarkdownHref( + docName: string, + ext: string = DEFAULT_DOC_EXTENSION, + anchor: string | null = null, +): string { + return `/${docName}${ext}${anchor ? `#${anchor}` : ''}`; } diff --git a/packages/server/assets/skills/packs/codebase-wiki/SKILL.md b/packages/server/assets/skills/packs/codebase-wiki/SKILL.md index 5c0ce427..80344d3f 100644 --- a/packages/server/assets/skills/packs/codebase-wiki/SKILL.md +++ b/packages/server/assets/skills/packs/codebase-wiki/SKILL.md @@ -46,7 +46,7 @@ The `workflow({ kind: "wiki" })` guide is the authoritative source for exactly h ## Source-reference convention - **Intra-wiki navigation** → OK doc links — they build the backlink / hub / orphan graph, so link liberally; density is how the wiki stays navigable. -- **Code references** → relative links + symbol code-spans (`internal`) or GitHub blob URLs (`public`). Source-file links produce **no dead-link noise** (the graph only tracks `.md`/`.mdx` edges). Never invent paths — reference only files you actually read. +- **Code references** → relative links + symbol code-spans (`internal`) or GitHub blob URLs (`public`). Source-file links stay out of the navigation graph (`links` tracks only `.md`/`.mdx` edges, so they never show as graph dead-links or orphans) — but a wrong-depth path still surfaces in the write/edit `brokenLinks` response (`no-such-file`, or `unresolvable` if it overshoots the content root), so count the `../` hops from the page's folder. Never invent paths — reference only files you actually read. The full rules — the GitHub-URL / relative fallback, the `#Lxx` caveat, and the exact code-span shape — live in the `workflow({ kind: "wiki" })` guide. diff --git a/packages/server/assets/skills/packs/entity-vault/SKILL.md b/packages/server/assets/skills/packs/entity-vault/SKILL.md index 340d549d..ad0db0ce 100644 --- a/packages/server/assets/skills/packs/entity-vault/SKILL.md +++ b/packages/server/assets/skills/packs/entity-vault/SKILL.md @@ -32,7 +32,12 @@ When a new fact arrives, route it: update **compiled truth** if it changes curre ## Links -Prefer path-qualified wikilinks when entity identity matters: `[[companies/acme|Acme]]`, `[[people/jane-founder|Jane Founder]]`, `[[concepts/agent-runtime-observability|agent-runtime observability]]`. Path-qualified links resolve to the right dossier under GBrain's typed extraction. +Per the platform skill, prefer **standard markdown links** for new dossiers, and count the relative depth carefully for the cross-folder links this vault is full of — e.g. `[Alice Chen](../people/alice-chen.md)` from a `meetings/` note, `[Acme AI](../companies/acme-ai.md)` from a `people/` dossier. The write/edit `brokenLinks` response catches a wrong-depth path before you walk away. New brain pages render on GitHub, satisfy OK's `brokenLinks` check, and import cleanly into GBrain. + +Two deltas from the platform default, both GBrain-specific: + +- **Don't use the leading-slash root-absolute form** (`[Alice](/people/alice-chen.md)`). OK resolves it fine, but GBrain reads a leading `/` as an absolute filesystem path and rejects it — use a relative path. (GBrain also accepts a bare `[Alice](people/alice-chen.md)`, but from a subfolder OK resolves that *source-dir-relative* — `meetings/people/alice-chen` — and reports it broken, so stick to the `../`-correct relative form, which both accept.) +- **Path-qualified wikilinks are first-class here**, not legacy: `[[people/alice-chen|Alice Chen]]` resolves vault-root from any folder (no `../` to count), is validated by `brokenLinks` like any link, and imports into GBrain. Keep them **extensionless** (`[[people/alice-chen]]`, never `[[people/alice-chen.md]]` — the `.md` makes it resolve to a nonexistent doc). A *bare* `[[note-name]]` with no folder is ambiguous (resolves only if GBrain's basename-globbing is on) — reserve it for Obsidian migration. ## Agent behaviors diff --git a/packages/server/assets/skills/project/SKILL.md b/packages/server/assets/skills/project/SKILL.md index 62dd6198..bb7cf3ec 100644 --- a/packages/server/assets/skills/project/SKILL.md +++ b/packages/server/assets/skills/project/SKILL.md @@ -82,7 +82,7 @@ Knowledge-base docs are factual artifacts. Every claim must be traceable, and ** ## Linking — standard markdown links (MUST) -Link every noun-phrase that names another document — `[text](./relative/path.md)` — and link liberally. **Every link must resolve to a doc that exists** (never link a doc not yet written; create it the same pass or leave the mention as plain prose + a tracked task). Never backtick a link (`` `[text](./foo.md)` `` is a bug) and never use HTML ``. **After writing a doc, call `links({ kind: "dead", sourceDocuments: ["your/doc"] })` and fix every broken reference** — a dead link is debt, not a to-do marker; the editor's red-underline is slug-tolerant and lies, so trust the tool. External web sources are NOT inline body links (see Grounding). Full rule set + the `[[Page]]` legacy note: `references/linking.md`. +Link every noun-phrase that names another document — `[text](./relative/path.md)` — and link liberally. **Every link must resolve to a doc that exists by the time you're done** (a same-pass forward-reference you create later in the pass is fine; for one that genuinely won't exist, leave the mention as plain prose + a tracked task). Never backtick a link (`` `[text](./foo.md)` `` is a bug) and never use HTML ``. **Read `brokenLinks` on every `write`/`edit` response: `[]` means all links resolve; a populated list names each broken `href` + `reason` (`no-such-doc` / `no-such-file` / `unresolvable`) — fix them in a follow-up `edit`.** `links({ kind: "dead" })` is the authoritative end-state audit (the editor's red-underline is slug-tolerant and lies, so trust the tool). External web sources are NOT inline body links (see Grounding). Full rule set + the `[[Page]]` legacy note: `references/linking.md`. ## Folders, frontmatter, templates diff --git a/packages/server/assets/skills/project/references/linking.md b/packages/server/assets/skills/project/references/linking.md index 1b695444..89190ccf 100644 --- a/packages/server/assets/skills/project/references/linking.md +++ b/packages/server/assets/skills/project/references/linking.md @@ -1,14 +1,16 @@ # Linking — mechanics -(Core carries the MUST: link noun-phrases with standard markdown links, every link resolves, verify with `links({ kind: "dead" })`. This file carries the full rule set.) +(Core carries the MUST: link noun-phrases with standard markdown links, every link resolves, read `brokenLinks` on each write/edit. This file carries the full rule set.) -- **Every noun-phrase that names another document should be linked** using standard markdown link syntax: `[text](./relative/path.md)` or `[text](/absolute/from/content-root.md)`. +- **Every noun-phrase that names another document should be linked** using standard markdown link syntax. Two forms are valid: **relative** (`[text](./sibling.md)`, `[text](../folder/doc.md)`) — the recommended default, native in GitHub / Obsidian / VS Code and what published sites expect — and **root-absolute** (`[text](/folder/doc.md)`, leading slash = content root), handy for a cross-folder link. +- **Never glue `./` onto a content-root path.** `./wiki/modules/tasks`, written from a doc already inside `wiki/`, resolves to the doubled garbage `wiki/wiki/modules/tasks` — correct resolution of a malformed path, i.e. a silently broken link. Pick ONE form: a relative path from your doc (`./tasks.md`, `../modules/tasks.md`), or a `/`-rooted absolute path (`/wiki/modules/tasks.md`). The `./` prefix and a content-root path never combine. - **External web sources are NOT inline body links.** Per the Grounding rule, web URLs live in the `source_url:` frontmatter of an ingested doc under `external-sources/` (or the project's equivalent raw-sources folder); the body cites the local path: `[source name](./external-sources/source-slug.md)`. A raw `[source](https://...)` inline in the body is a TODO, not a citation — see Grounding for the closed-loop contract. - **Internal cross-refs between OK docs** → `[text](./other-doc.md)` — link liberally to aid navigation. -- **Every link must resolve to a doc that exists.** Never link to a doc that isn't written yet. If you want to reference something that should have its own page but doesn't: create that page in the same pass, or record it as a tracked task (`TaskCreate`, or your host's task tool — if the host has none, tell the user) and leave the mention as plain prose. A broken link is debt, not a to-do marker. +- **Every link must resolve to a doc that exists by the time you're done.** Within a single multi-doc authoring pass, linking a page you'll create later in the same pass is fine — `brokenLinks` reports it as `no-such-doc` until the target lands, an expected transient forward-reference rather than a wrong-path error. What's not acceptable is *leaving* a dead link behind: create the target in the same pass, or record it as a tracked task (`TaskCreate`, or your host's task tool — if the host has none, tell the user) and leave the mention as plain prose. - **Never wrap a link in backticks.** `` `[text](./foo.md)` `` is a bug — the backticks make it render as literal code rather than a link. - **Never use HTML anchors** (``). Markdown link syntax only. -- **Verify before walking away.** After writing a doc, call `links({ kind: "dead", sourceDocuments: ["your/doc"] })` to find broken references. Fix or remove every one — a dead link is never acceptable to leave behind. Companion `links` kinds: `backlinks` (incoming), `forward` (outgoing), `orphans` (no incoming), `hubs` (high-incoming), `suggest` (untextualized mentions worth linking). +- **`brokenLinks` on the write/edit response is your primary check — read it before walking away.** Every `write`/`edit` returns `brokenLinks` in the SAME payload: `[]` means every outbound link resolves; a populated list names each broken `href` with its `resolvedTo` + `reason` (`no-such-doc` = resolved to a doc that doesn't exist; `no-such-file` = resolved to a linked asset / source file that isn't on disk at that path; `unresolvable` = the path escapes the content root, usually one `../` too many). This validates **every** local link, not just doc links: a wrong-depth `[src](../../foo.py)` to a source file is caught too. It is report-only — the write landed regardless; fix or remove every one in a follow-up `edit`. +- **`links({ kind: "dead" })` is the authoritative end-state audit.** For a corpus-wide sweep across many docs: `links({ kind: "dead", sourceDocuments: ["your/doc"] })`. Companion `links` kinds: `backlinks` (incoming), `forward` (outgoing), `orphans` (no incoming), `hubs` (high-incoming), `suggest` (untextualized mentions worth linking). - **The editor's red-underline visual lies.** Its dead-link detection tolerates slug-fallback (e.g., `foo` may appear resolved because `foo.md` exists at root). `links({ kind: "dead" })` is strict-exact — trust the tool, not the visual. **Note on wiki-link syntax (`[[Page]]`):** the parser still handles it for legacy content, but it's NO LONGER the recommended default. Write new content with standard markdown links per above. Seed-pack templates (`ok seed --pack `) may still emit `[[Page]]` placeholders inside template body text — those are legacy. When you instantiate a seed-pack template, replace the legacy placeholders with standard markdown links during the `{shape}`-fill pass. diff --git a/packages/server/src/api-agent-patch.test.ts b/packages/server/src/api-agent-patch.test.ts index c371c46d..ffaad657 100644 --- a/packages/server/src/api-agent-patch.test.ts +++ b/packages/server/src/api-agent-patch.test.ts @@ -97,6 +97,7 @@ describe('POST /api/agent-patch', () => { timestamp: expect.any(String), subscriberCount: expect.any(Number), systemSubscriberCount: expect.any(Number), + brokenLinks: [], }); expect(ytext.toString()).toBe( '# Notes\n\nProject Alpha appears first. Later, Project Alpha (linked) appears second.\n', @@ -172,6 +173,7 @@ describe('POST /api/agent-patch', () => { timestamp: expect.any(String), subscriberCount: expect.any(Number), systemSubscriberCount: expect.any(Number), + brokenLinks: [], }); expect(ytext.toString()).toBe( '# Notes\n\nProject Alpha (linked) appears first. Later, Project Alpha appears second.\n', diff --git a/packages/server/src/api-extension.ts b/packages/server/src/api-extension.ts index d21ff941..6ad6263e 100644 --- a/packages/server/src/api-extension.ts +++ b/packages/server/src/api-extension.ts @@ -399,6 +399,7 @@ import { } from './apply-managed-rename.ts'; import { type BacklinkIndex, + computeBrokenOutboundLinks, type GraphNode as IndexedGraphNode, isOrphanMode, } from './backlink-index.ts'; @@ -2198,6 +2199,9 @@ export function createApiExtension(options: ApiExtensionOptions): Extension { return admitted; } + const linkedFileExists = (contentRootRelativePath: string): boolean => + existsSync(resolve(contentDir, contentRootRelativePath)); + function createSerializedRunner() { let pending = Promise.resolve(); return async function runSerialized(task: () => Promise): Promise { @@ -3718,9 +3722,17 @@ export function createApiExtension(options: ApiExtensionOptions): Extension { const hints = computeOrphanHints(resolvedDocName); - const renderWarnings = await validateMermaidFences( - session.dc.document.getText('source').toString(), + const writtenSource = session.dc.document.getText('source').toString(); + + const renderWarnings = await validateMermaidFences(writtenSource, resolvedDocName); + + const admittedForLinks = collectAdmittedDocNames(); + admittedForLinks.add(resolvedDocName); + const brokenLinks = computeBrokenOutboundLinks( + writtenSource, resolvedDocName, + admittedForLinks, + linkedFileExists, ); const subscriberCount = getSubscriberCount(resolvedDocName); @@ -3757,6 +3769,7 @@ export function createApiExtension(options: ApiExtensionOptions): Extension { ? { warning: writeMdWarning } : {}), ...(writeMdAdvisories.length > 0 ? { warnings: writeMdAdvisories } : {}), + brokenLinks, }, { handler: 'agent-write-md' }, ); @@ -3985,6 +3998,16 @@ export function createApiExtension(options: ApiExtensionOptions): Extension { } const fmWarning = buildReconcileWarning(fmReconcile); + + const admittedForLinks = collectAdmittedDocNames(); + admittedForLinks.add(resolvedDocName); + const brokenLinks = computeBrokenOutboundLinks( + session.dc.document.getText('source').toString(), + resolvedDocName, + admittedForLinks, + linkedFileExists, + ); + successResponse( res, 200, @@ -3996,6 +4019,7 @@ export function createApiExtension(options: ApiExtensionOptions): Extension { appliedKeys, ...(summaryResponse ? { summary: summaryResponse } : {}), ...(fmWarning ? { warning: fmWarning, warnings: [fmWarning] } : {}), + brokenLinks, }, { handler: 'frontmatter-patch' }, ); @@ -5168,9 +5192,17 @@ export function createApiExtension(options: ApiExtensionOptions): Extension { const { response: summaryResponse } = summaryResponseFields(normalizedSummary); - const renderWarnings = await validateMermaidFences( - session.dc.document.getText('source').toString(), + const patchedSource = session.dc.document.getText('source').toString(); + + const renderWarnings = await validateMermaidFences(patchedSource, docName); + + const admittedForLinks = collectAdmittedDocNames(); + admittedForLinks.add(docName); + const brokenLinks = computeBrokenOutboundLinks( + patchedSource, docName, + admittedForLinks, + linkedFileExists, ); const patchWarning = buildReconcileWarning(patchReconcile); @@ -5196,6 +5228,7 @@ export function createApiExtension(options: ApiExtensionOptions): Extension { ? { warning: patchWarning } : {}), ...(patchAdvisories.length > 0 ? { warnings: patchAdvisories } : {}), + brokenLinks, }, { handler: 'agent-patch' }, ); diff --git a/packages/server/src/backlink-index.test.ts b/packages/server/src/backlink-index.test.ts index 68b0ad9e..49977109 100644 --- a/packages/server/src/backlink-index.test.ts +++ b/packages/server/src/backlink-index.test.ts @@ -13,6 +13,8 @@ import { join } from 'node:path'; import { LOCAL_DIR } from '@inkeep/open-knowledge-core'; import { BacklinkIndex, + type BrokenOutboundLink, + computeBrokenOutboundLinks, type ExtractedWikiLink, extractMarkdownLinksFromMarkdown, extractWikiLinksFromMarkdown, @@ -1209,3 +1211,168 @@ describe('reconcileWithDisk', () => { } }); }); + +describe('computeBrokenOutboundLinks', () => { + test('returns [] when every outbound link resolves (AC2.1)', () => { + const md = 'See [sibling](./real.md) and [root](/docs/guide.md) and [[Existing]].'; + const admitted = new Set(['notes/real', 'docs/guide', 'Existing']); + expect(computeBrokenOutboundLinks(md, 'notes/a', admitted)).toEqual([]); + }); + + test('flags the `./`-onto-content-root doubling footgun as no-such-doc (AC2.2)', () => { + const md = 'See [tasks](./wiki/modules/tasks).'; + expect(computeBrokenOutboundLinks(md, 'wiki/OVERVIEW', new Set())).toEqual< + BrokenOutboundLink[] + >([ + { + href: './wiki/modules/tasks', + resolvedTo: 'wiki/wiki/modules/tasks', + reason: 'no-such-doc', + }, + ]); + }); + + test('flags a root-escaping relative link as unresolvable (AC2.3)', () => { + const md = 'Bad [escape](../escape.md).'; + expect(computeBrokenOutboundLinks(md, 'readme', new Set())).toEqual([ + { href: '../escape.md', resolvedTo: null, reason: 'unresolvable' }, + ]); + }); + + test('flags a relative path that pops past the content root as unresolvable', () => { + const md = 'Deep [out](../../way-out.md).'; + expect(computeBrokenOutboundLinks(md, 'a/b', new Set())).toEqual([ + { href: '../../way-out.md', resolvedTo: null, reason: 'unresolvable' }, + ]); + }); + + test('an empty-href markdown construct `[x]()` is not a link (mirrors the indexer)', () => { + expect(computeBrokenOutboundLinks('See [x]() here.', 'notes/a', new Set())).toEqual([]); + }); + + test('flags a broken wiki-link with the reconstructed [[…]] href (AC2.4)', () => { + const md = 'Missing [[Ghost Page]] reference, and an [[Existing]] one.'; + const admitted = new Set(['Existing']); + expect(computeBrokenOutboundLinks(md, 'notes/a', admitted)).toEqual([ + { href: '[[Ghost Page]]', resolvedTo: 'Ghost Page', reason: 'no-such-doc' }, + ]); + }); + + test('flags a broken `![[doc]]` embed (validated like the index does)', () => { + const md = 'Embed: ![[missing-doc]] here.'; + expect(computeBrokenOutboundLinks(md, 'notes/a', new Set())).toEqual([ + { href: '[[missing-doc]]', resolvedTo: 'missing-doc', reason: 'no-such-doc' }, + ]); + }); + + test('resolves a path-qualified wiki-link (`[[folder/slug|Alias]]`) vault-root, not source-dir-relative', () => { + const md = 'Met [[people/alice-chen|Alice Chen]]; stub [[people/bob-jones|Bob]].'; + const admitted = new Set(['people/alice-chen']); + expect(computeBrokenOutboundLinks(md, 'meetings/2026-01-01', admitted)).toEqual< + BrokenOutboundLink[] + >([{ href: '[[people/bob-jones]]', resolvedTo: 'people/bob-jones', reason: 'no-such-doc' }]); + }); + + test('skips external URLs, image embeds, and anchors; file links skipped when no oracle is passed', () => { + const md = [ + 'Web [site](https://example.com/missing).', + 'Mail [me](mailto:a@b.com).', + 'Asset [pdf](./missing.pdf) and ![alt](./missing.png).', + 'Image embed ![[missing.png]].', + 'Anchor [top](#section).', + ].join('\n'); + expect(computeBrokenOutboundLinks(md, 'notes/a', new Set())).toEqual([]); + }); + + test('does not scan links inside fenced or inline code', () => { + const md = [ + 'Inline `[x](./missing.md)` stays code.', + '```', + '[fenced](./also-missing.md)', + '[[FencedWiki]]', + '```', + ].join('\n'); + expect(computeBrokenOutboundLinks(md, 'notes/a', new Set())).toEqual([]); + }); + + test('does not scan the frontmatter region', () => { + const md = ['---', 'title: Has a [fake](./missing.md) in YAML', '---', 'Body only.'].join('\n'); + expect(computeBrokenOutboundLinks(md, 'notes/a', new Set())).toEqual([]); + }); + + test('dedupes repeated identical broken hrefs', () => { + const md = 'First [a](./missing.md), again [b](./missing.md).'; + expect(computeBrokenOutboundLinks(md, 'notes/a', new Set())).toEqual([ + { href: './missing.md', resolvedTo: 'notes/missing', reason: 'no-such-doc' }, + ]); + }); + + test('treats a self-link to the admitted source doc as valid', () => { + const md = 'See [self](./a.md).'; + expect(computeBrokenOutboundLinks(md, 'notes/a', new Set(['notes/a']))).toEqual([]); + }); + + const fileOracle = (existing: string[]) => { + const set = new Set(existing); + return (p: string) => set.has(p); + }; + + test('a correct-depth source-file link that exists on disk is clean', () => { + const md = 'Probe in [jacobian.py](../../microreservoir/entk/jacobian.py).'; + expect( + computeBrokenOutboundLinks( + md, + 'wiki/modules/entk', + new Set(), + fileOracle(['microreservoir/entk/jacobian.py']), + ), + ).toEqual([]); + }); + + test('an over-deep source-file link (one extra `../`) is unresolvable — the wiki bug', () => { + const md = 'Probe in [jacobian.py](../../../microreservoir/entk/jacobian.py).'; + expect( + computeBrokenOutboundLinks( + md, + 'wiki/modules/entk', + new Set(), + fileOracle(['microreservoir/entk/jacobian.py']), + ), + ).toEqual([ + { + href: '../../../microreservoir/entk/jacobian.py', + resolvedTo: null, + reason: 'unresolvable', + }, + ]); + }); + + test('an in-root file link to a missing file is no-such-file (resolvedTo = the path)', () => { + const md = 'See [data](../data/missing.json).'; + expect( + computeBrokenOutboundLinks(md, 'wiki/OVERVIEW', new Set(), fileOracle([]))[0], + ).toEqual({ + href: '../data/missing.json', + resolvedTo: 'data/missing.json', + reason: 'no-such-file', + }); + }); + + test('a content-root-absolute file link resolves from the root', () => { + const md = 'Config at [pkg](/package.json) and [gone](/nope.json).'; + expect( + computeBrokenOutboundLinks(md, 'wiki/modules/cli', new Set(), fileOracle(['package.json'])), + ).toEqual([ + { href: '/nope.json', resolvedTo: 'nope.json', reason: 'no-such-file' }, + ]); + }); + + test('external URLs and wiki image embeds are not file-validated even with an oracle', () => { + const md = [ + 'Web [pdf](https://example.com/x.pdf).', + 'Embed ![[diagram.png]].', + 'Image ![alt](./local.png).', + ].join('\n'); + expect(computeBrokenOutboundLinks(md, 'notes/a', new Set(), fileOracle([]))).toEqual([]); + }); +}); diff --git a/packages/server/src/backlink-index.ts b/packages/server/src/backlink-index.ts index 1ad8313d..0d1af869 100644 --- a/packages/server/src/backlink-index.ts +++ b/packages/server/src/backlink-index.ts @@ -2,6 +2,7 @@ import { type Dirent, existsSync, mkdirSync } from 'node:fs'; import { readdir, readFile, stat, writeFile } from 'node:fs/promises'; import { dirname, join, relative, resolve } from 'node:path'; import { + type BrokenLinkReason, classifyMarkdownHref, classifyWikiLinkTarget, getWikiLinkText, @@ -12,6 +13,7 @@ import { type OrphanMode, parseGlobalSkillBundleDoc, parseProjectSkillBundleDoc, + resolveAssetProjectPath, resolveInternalHref, resolveSkillBundleWikiTarget, skillLiveDocName, @@ -659,6 +661,122 @@ function extractExternalMarkdownLinksFromMarkdown( return links; } +export interface BrokenOutboundLink { + href: string; + resolvedTo: string | null; + reason: BrokenLinkReason; +} + +export function computeBrokenOutboundLinks( + markdown: string, + sourceDocName: string, + admittedDocs: Iterable, + fileExists?: (contentRootRelativePath: string) => boolean, +): BrokenOutboundLink[] { + const admitted = admittedDocs instanceof Set ? admittedDocs : new Set(admittedDocs); + + let body: string; + try { + ({ body } = stripFrontmatter(markdown)); + } catch { + body = markdown; + } + + const source = body.replaceAll('\r\n', '\n').replaceAll('\r', '\n'); + const lines = source.split('\n'); + const broken: BrokenOutboundLink[] = []; + const seen = new Set(); + let fence: FenceState | null = null; + + const record = (href: string, resolvedTo: string | null, reason: BrokenLinkReason): void => { + if (seen.has(href)) return; + seen.add(href); + broken.push({ href, resolvedTo, reason }); + }; + + const recordMarkdownLink = (rawHref: string): void => { + const trimmed = rawHref.trim(); + if (trimmed.startsWith('#')) return; + const classified = classifyMarkdownHref(trimmed, sourceDocName); + if (!classified) { + record(trimmed, null, 'unresolvable'); + return; + } + if (classified.kind === 'doc') { + if (!admitted.has(classified.docName)) { + record(trimmed, classified.docName, 'no-such-doc'); + } + return; + } + if (classified.kind === 'asset') { + if (!fileExists) return; + const filePath = resolveAssetProjectPath(classified.url, sourceDocName); + if (filePath === null) { + record(trimmed, null, 'unresolvable'); + return; + } + if (!fileExists(filePath)) { + record(trimmed, filePath, 'no-such-file'); + } + return; + } + }; + + const recordWikiLink = (target: string, anchor: string | null): void => { + const classified = classifyWikiLinkTarget(target, anchor); + if (!classified || classified.kind !== 'doc') return; + if (!admitted.has(classified.docName)) { + record(`[[${target}${anchor ? `#${anchor}` : ''}]]`, classified.docName, 'no-such-doc'); + } + }; + + for (const line of lines) { + if (fence) { + if (isFenceClose(line, fence)) fence = null; + continue; + } + const nextFence = matchFence(line); + if (nextFence) { + fence = nextFence; + continue; + } + + let idx = leadingMarkdownPrefixLength(line); + while (idx < line.length) { + if (line[idx] === '\\' && idx + 1 < line.length) { + idx += 2; + continue; + } + if (line[idx] === '`') { + const inlineCode = readInlineCode(line, idx); + if (inlineCode) { + idx = inlineCode.nextIndex; + continue; + } + } + if (line[idx] === '[' && line[idx + 1] === '[') { + const wikiLink = readWikiLink(line, idx); + if (wikiLink) { + recordWikiLink(wikiLink.target, wikiLink.anchor); + idx = wikiLink.nextIndex; + continue; + } + } + if (line[idx] === '[' && line[idx - 1] !== '!') { + const mdLink = readMarkdownLink(line, idx); + if (mdLink) { + recordMarkdownLink(mdLink.href); + idx = mdLink.nextIndex; + continue; + } + } + idx++; + } + } + + return broken; +} + function serializeState(state: BranchGraphState): SerializedBranchGraphState { return { backward: Object.fromEntries( diff --git a/packages/server/src/mcp/tools/advisory-warnings.test.ts b/packages/server/src/mcp/tools/advisory-warnings.test.ts index 1e3ca985..e968f387 100644 --- a/packages/server/src/mcp/tools/advisory-warnings.test.ts +++ b/packages/server/src/mcp/tools/advisory-warnings.test.ts @@ -1,11 +1,14 @@ import { describe, expect, test } from 'bun:test'; -import type { RenderWarning, WriteWarning } from '@inkeep/open-knowledge-core'; +import type { BrokenLink, RenderWarning, WriteWarning } from '@inkeep/open-knowledge-core'; import { formatAdvisoryBriefs, formatAdvisoryLines, + formatBrokenLinkBrief, + formatBrokenLinkLines, formatRenderWarningsBrief, formatRenderWarningsLine, parseAdvisoryWarnings, + parseBrokenLinks, } from './advisory-warnings.ts'; function mermaidWarning(overrides: Partial = {}): RenderWarning { @@ -118,3 +121,96 @@ describe('render-family bounds phrasing', () => { expect(formatRenderWarningsBrief(warnings)).toContain('10+'); }); }); + +const noSuchDoc: BrokenLink = { + href: './wiki/x', + resolvedTo: 'wiki/wiki/x', + reason: 'no-such-doc', +}; +const unresolvable: BrokenLink = { + href: '../../escape.md', + resolvedTo: null, + reason: 'unresolvable', +}; +const noSuchFile: BrokenLink = { + href: '../src/foo.py', + resolvedTo: 'src/foo.py', + reason: 'no-such-file', +}; + +describe('parseBrokenLinks', () => { + test('parses a well-formed array (all three reasons)', () => { + expect(parseBrokenLinks([noSuchDoc, noSuchFile, unresolvable])).toEqual([ + noSuchDoc, + noSuchFile, + unresolvable, + ]); + }); + + test('drops malformed entries but keeps valid ones', () => { + const mixed = [ + noSuchDoc, + { href: 'x', resolvedTo: null, reason: 'broken-anchor' }, // invalid reason + { href: 42 }, // wrong types + unresolvable, + ]; + expect(parseBrokenLinks(mixed)).toEqual([noSuchDoc, unresolvable]); + }); + + test('returns [] for a non-array (absent / wrong-typed field)', () => { + expect(parseBrokenLinks(undefined)).toEqual([]); + expect(parseBrokenLinks(null)).toEqual([]); + expect(parseBrokenLinks('nope')).toEqual([]); + expect(parseBrokenLinks({})).toEqual([]); + }); + + test('returns [] for an empty array (the all-resolve confirmation)', () => { + expect(parseBrokenLinks([])).toEqual([]); + }); +}); + +describe('formatBrokenLinkLines', () => { + test('no links → no lines (clean write stays quiet)', () => { + expect(formatBrokenLinkLines([])).toEqual([]); + }); + + test('one link → singular header + a bullet with resolvedTo', () => { + const lines = formatBrokenLinkLines([noSuchDoc]); + expect(lines[0]).toContain('1 broken outbound link —'); + expect(lines[0]).not.toContain('links —'); + expect(lines[1]).toBe(' • ./wiki/x → wiki/wiki/x (no-such-doc)'); + }); + + test('null resolvedTo omits the arrow', () => { + const lines = formatBrokenLinkLines([unresolvable]); + expect(lines[1]).toBe(' • ../../escape.md (unresolvable)'); + expect(lines[1]).not.toContain('→'); + }); + + test('a no-such-file entry renders the resolved path + reason', () => { + const lines = formatBrokenLinkLines([noSuchFile]); + expect(lines[1]).toBe(' • ../src/foo.py → src/foo.py (no-such-file)'); + }); + + test('N links → plural header + one bullet each', () => { + const lines = formatBrokenLinkLines([noSuchDoc, unresolvable]); + expect(lines[0]).toContain('2 broken outbound links —'); + expect(lines).toHaveLength(3); + }); +}); + +describe('formatBrokenLinkBrief', () => { + test('no links → null (nothing appended to the batch line)', () => { + expect(formatBrokenLinkBrief([])).toBeNull(); + }); + + test('one link → singular brief', () => { + expect(formatBrokenLinkBrief([noSuchDoc])).toBe('⚠ 1 broken outbound link (see brokenLinks).'); + }); + + test('N links → plural brief', () => { + expect(formatBrokenLinkBrief([noSuchDoc, unresolvable])).toBe( + '⚠ 2 broken outbound links (see brokenLinks).', + ); + }); +}); diff --git a/packages/server/src/mcp/tools/advisory-warnings.ts b/packages/server/src/mcp/tools/advisory-warnings.ts index d9138b3e..94f960c9 100644 --- a/packages/server/src/mcp/tools/advisory-warnings.ts +++ b/packages/server/src/mcp/tools/advisory-warnings.ts @@ -1,6 +1,8 @@ import { type AdvisoryWarning, AdvisoryWarningSchema, + type BrokenLink, + BrokenLinkSchema, type RenderWarning, type WriteWarning, } from '@inkeep/open-knowledge-core'; @@ -14,6 +16,35 @@ export function parseAdvisoryWarnings(value: unknown): AdvisoryWarning[] | undef return warnings.length > 0 ? warnings : undefined; } +export function parseBrokenLinks(value: unknown): BrokenLink[] { + if (!Array.isArray(value)) return []; + return value.flatMap((entry) => { + const parsed = BrokenLinkSchema.safeParse(entry); + return parsed.success ? [parsed.data] : []; + }); +} + +export function formatBrokenLinkLines(links: BrokenLink[]): string[] { + if (links.length === 0) return []; + const header = `⚠ ${links.length} broken outbound link${ + links.length === 1 ? '' : 's' + } — fix or remove (the write still landed):`; + return [header, ...links.map((l) => ` • ${formatBrokenLink(l)}`)]; +} + +export function formatBrokenLinkBrief(links: BrokenLink[]): string | null { + if (links.length === 0) return null; + return `⚠ ${links.length} broken outbound link${ + links.length === 1 ? '' : 's' + } (see brokenLinks).`; +} + +function formatBrokenLink(link: BrokenLink): string { + return link.resolvedTo + ? `${link.href} → ${link.resolvedTo} (${link.reason})` + : `${link.href} (${link.reason})`; +} + function integrityEntries(warnings: AdvisoryWarning[]): WriteWarning[] { return warnings.filter( (w): w is WriteWarning => w.kind === 'content-divergence' || w.kind === 'disk-edit-reconciled', diff --git a/packages/server/src/mcp/tools/edit.ts b/packages/server/src/mcp/tools/edit.ts index 03d7a051..ed2d8636 100644 --- a/packages/server/src/mcp/tools/edit.ts +++ b/packages/server/src/mcp/tools/edit.ts @@ -12,7 +12,12 @@ import { mergePatch } from '../../content/frontmatter-merge.ts'; import type { TemplateFrontmatter } from '../../content/templates-write.ts'; import { SUPPORTED_DOC_EXTENSIONS } from '../../doc-extensions.ts'; import type { AgentIdentity } from '../agent-identity.ts'; -import { formatAdvisoryLines, parseAdvisoryWarnings } from './advisory-warnings.ts'; +import { + formatAdvisoryLines, + formatBrokenLinkLines, + parseAdvisoryWarnings, + parseBrokenLinks, +} from './advisory-warnings.ts'; import { resolveWithinRoot } from './path-safety.ts'; import { buildPreviewAttachWarning, resolvePreviewUrl, START_UI_TEXT_HINT } from './preview-url.ts'; import type { ConfigOrResolver, ServerInstance, ServerUrlOrResolver } from './shared.ts'; @@ -190,6 +195,7 @@ async function handleDocBody( result, normalized.docName, cwd, + contentDir, autoOpen, 'Edit applied successfully.', ); @@ -199,6 +205,7 @@ async function handleDocFrontmatter( doc: { path: string; frontmatter: FrontmatterPatch }, args: { summary?: string }, cwd: string, + contentDir: string, url: string, deps: EditDeps, autoOpen: boolean, @@ -240,6 +247,7 @@ async function handleDocFrontmatter( result, normalized.docName, cwd, + contentDir, autoOpen, `Frontmatter patched (${opSummary || `${Object.keys(doc.frontmatter).length} key(s)`}).`, ); @@ -249,23 +257,22 @@ function composeWritePreviewResult( result: Awaited>, docName: string, cwd: string, + _contentDir: string, autoOpen: boolean, leadLine: string, ) { const lockDir = resolveLockDir(cwd); const preview = resolvePreviewUrl(docName, { lockDir }); - const subscriberCount = - typeof result.subscriberCount === 'number' ? result.subscriberCount : undefined; const systemSubscriberCount = typeof result.systemSubscriberCount === 'number' ? result.systemSubscriberCount : undefined; const noPreviewAnywhere = systemSubscriberCount === 0; - const noPreviewOnThisDoc = subscriberCount === 0; const summaryResult = result.summary && typeof result.summary === 'object' ? (result.summary as { value: string; truncatedFrom?: number; hint?: string }) : undefined; const summaryHint = typeof summaryResult?.hint === 'string' ? summaryResult.hint : undefined; const advisoryWarnings = parseAdvisoryWarnings(result.warnings); + const brokenLinks = parseBrokenLinks(result.brokenLinks); const lines: string[] = [leadLine]; if (noPreviewAnywhere && !preview) lines.push(START_UI_TEXT_HINT); @@ -273,17 +280,11 @@ function composeWritePreviewResult( if (advisoryWarnings) { lines.push(...formatAdvisoryLines(advisoryWarnings)); } + lines.push(...formatBrokenLinkLines(brokenLinks)); const text = lines.join('\n'); - if ( - !preview && - !noPreviewAnywhere && - !noPreviewOnThisDoc && - !summaryResult && - !advisoryWarnings - ) { - return textResult(text); - } - const document: Record = {}; + const document: Record = { + brokenLinks, + }; if (summaryResult) document.summary = summaryResult; if (advisoryWarnings) document.warnings = advisoryWarnings; const warning = noPreviewAnywhere ? buildPreviewAttachWarning(preview, autoOpen) : undefined; @@ -649,7 +650,9 @@ export function register(server: ServerInstance, deps: EditDeps): void { document: z .object(documentResultBaseShape) .optional() - .describe('Document edit result (present when there is a doc-specific signal).'), + .describe( + 'Document edit result. Always present on a successful document edit (body or frontmatter) — it carries `brokenLinks` (possibly `[]`) plus any `summary`/`warnings`. Absent only for folder/template edits.', + ), folder: z .object({ ok: z.boolean(), @@ -726,6 +729,7 @@ export function register(server: ServerInstance, deps: EditDeps): void { { path: args.document.path, frontmatter: args.document.frontmatter }, args, cwd, + contentDir, url, deps, autoOpen, diff --git a/packages/server/src/mcp/tools/move.ts b/packages/server/src/mcp/tools/move.ts index 99983ce7..206e6781 100644 --- a/packages/server/src/mcp/tools/move.ts +++ b/packages/server/src/mcp/tools/move.ts @@ -195,7 +195,12 @@ export function register(server: ServerInstance, deps: MoveDeps): void { 'Template move only: `true` = tracked `git mv` (history preserved), `false` = plain disk rename (untracked / local-only `.ok/`).', ), renamed: z - .array(z.object({ fromDocName: z.string(), toDocName: z.string() })) + .array( + z.object({ + fromDocName: z.string(), + toDocName: z.string(), + }), + ) .optional() .describe('docName remappings performed.'), rewrittenDocs: looseObjectArray diff --git a/packages/server/src/mcp/tools/shared.ts b/packages/server/src/mcp/tools/shared.ts index 105b0cdd..d9462d9d 100644 --- a/packages/server/src/mcp/tools/shared.ts +++ b/packages/server/src/mcp/tools/shared.ts @@ -1,9 +1,16 @@ +import { existsSync } from 'node:fs'; import { resolve } from 'node:path'; -import { AdvisoryWarningSchema, validateDocName } from '@inkeep/open-knowledge-core'; +import { + AdvisoryWarningSchema, + BrokenLinkSchema, + validateDocName, +} from '@inkeep/open-knowledge-core'; import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'; import { z } from 'zod'; import type { Config } from '../../config/schema.ts'; +import { SUPPORTED_DOC_EXTENSIONS } from '../../doc-extensions.ts'; import type { AgentIdentity } from '../agent-identity.ts'; +import { resolveWithinRoot } from './path-safety.ts'; export type ServerInstance = McpServer; export type ConfigOrResolver = Config | ((cwd?: string) => Promise); @@ -70,6 +77,23 @@ export const previewAttachWarningField = z .optional() .describe('Preview-attach hint (`{ action, previewUrl?, message? }`) when relevant.'); +const brokenLinksOutputField = z + .array(BrokenLinkSchema) + .describe( + 'Outbound internal links in the just-written doc that do not resolve. Always present — `[]` means every link resolves. Each: `{ href (as written), resolvedTo (the docName or content-root file path it pointed at, or null), reason: "no-such-doc" | "no-such-file" | "unresolvable" }`. Report-only — the write landed regardless; fix in a follow-up edit.', + ); + +export function docExtensionOnDisk( + contentDir: string, + docName: string, +): (typeof SUPPORTED_DOC_EXTENSIONS)[number] | undefined { + for (const ext of SUPPORTED_DOC_EXTENSIONS) { + const contained = resolveWithinRoot(contentDir, `${docName}${ext}`); + if (contained.ok && existsSync(contained.abs)) return ext; + } + return undefined; +} + export const documentResultBaseShape = { summary: summaryOutputSchema.optional(), warnings: z @@ -79,6 +103,7 @@ export const documentResultBaseShape = { .describe( "Advisory entries discriminated by `kind`. Write-integrity kinds — `content-divergence` (converged Y.Text didn't byte-match what you composed) and `disk-edit-reconciled` (an out-of-band disk edit was folded in before your write) — mean re-read the doc. The renderability kind `mermaid-parse-error` means the write landed but that fence will not render — fix it and re-edit.", ), + brokenLinks: brokenLinksOutputField, } as const; export function nestDocResult( diff --git a/packages/server/src/mcp/tools/wiki-body.ts b/packages/server/src/mcp/tools/wiki-body.ts index c09d8fcb..6d9bcde4 100644 --- a/packages/server/src/mcp/tools/wiki-body.ts +++ b/packages/server/src/mcp/tools/wiki-body.ts @@ -29,7 +29,7 @@ Default when the user says nothing: \`internal\` / \`standard\`. ## Source-reference convention (how wiki pages point at code) - **Intra-wiki navigation** → OK doc links (\`[Auth flow](../flows/auth.md)\`). These build the full backlink / hub / orphan graph. Link liberally — density is how the wiki stays navigable. -- **Code references, \`internal\`** → a relative markdown link to the source file plus an inline code-span for the symbol — e.g. the \`bootServer()\` symbol in [boot.ts](../../packages/server/src/boot.ts). Relative links click-open in the asset preview and produce **no dead-link noise** (the link graph only tracks \`.md\`/\`.mdx\` edges, so source links are never reported dead). A cosmetic \`#Lxx\` is fine but not navigable. +- **Code references, \`internal\`** → a relative markdown link to the source file plus an inline code-span for the symbol — e.g. the \`bootServer()\` symbol in [boot.ts](../../packages/server/src/boot.ts). Relative source links click-open in the asset preview and stay out of the navigation graph (the \`links\` graph tracks only \`.md\`/\`.mdx\` edges, so source links never show as graph dead-links or orphans) — but the write/edit \`brokenLinks\` response DOES validate them: a wrong-depth path resolving to a missing file is reported \`no-such-file\`, and one that overshoots the content root is \`unresolvable\`. So count the \`../\` hops from the page's own folder, not the wiki root. A cosmetic \`#Lxx\` is fine but not navigable. - **Code references, \`public\`** → GitHub blob URLs (\`https://github.com///blob//path/to/file.ts\`) so a reader without the repo can follow them. Detect the remote with \`exec\` (or native \`git remote get-url origin\`); if there is none, fall back to relative links. Never invent paths — every source reference must point at a file you actually read. @@ -105,7 +105,7 @@ Atomic pages for domain terms and core abstractions (one term each): definition, 2. Run \`links({ kind: ["orphans", "hubs", "dead"] })\`: - **orphans** — pages nothing links to. Adopt each by linking it from OVERVIEW or a relevant section page (or, rarely, justify it as intentionally standalone). - **hubs** — ranks by *inbound* links. Confirm your **concept/module pages** show up here — that's the signal cross-linking actually happened. Don't expect \`OVERVIEW\`: a freshly authored nav page has almost no inbound links, so it won't appear (and that's correct — its coverage was checked in step 1). - - **dead** — fix or remove every dead link. (Source-file links never appear here — only \`.md\`/\`.mdx\` edges are tracked.) + - **dead** — fix or remove every dead link. (Source-file links don't appear in this graph — only \`.md\`/\`.mdx\` edges are tracked — but they ARE validated by \`brokenLinks\` on each write/edit, per the code-reference rule above.) 3. Append a \`wiki/log.md\` entry (see *Log discipline*). 4. Tell the user the wiki is ready and surface the OVERVIEW preview URL. diff --git a/packages/server/src/mcp/tools/write.ts b/packages/server/src/mcp/tools/write.ts index f63b78f6..d94ded9a 100644 --- a/packages/server/src/mcp/tools/write.ts +++ b/packages/server/src/mcp/tools/write.ts @@ -1,4 +1,4 @@ -import { existsSync, readFileSync } from 'node:fs'; +import { readFileSync } from 'node:fs'; import { resolve as resolvePath } from 'node:path'; import { type DocExtension, @@ -19,18 +19,20 @@ import { mergePatch } from '../../content/frontmatter-merge.ts'; import { parentFolderOf } from '../../content/nested-folder-rules.ts'; import { applySubstitution, todayIsoUtc } from '../../content/substitution.ts'; import { resolveTemplatesAvailable } from '../../content/templates-resolver.ts'; -import { SUPPORTED_DOC_EXTENSIONS } from '../../doc-extensions.ts'; import type { AgentIdentity } from '../agent-identity.ts'; import { formatAdvisoryBriefs, formatAdvisoryLines, + formatBrokenLinkBrief, + formatBrokenLinkLines, parseAdvisoryWarnings, + parseBrokenLinks, } from './advisory-warnings.ts'; -import { resolveWithinRoot } from './path-safety.ts'; import { buildPreviewAttachWarning, resolvePreviewUrl, START_UI_TEXT_HINT } from './preview-url.ts'; import type { ConfigOrResolver, ServerInstance, ServerUrlOrResolver } from './shared.ts'; import { agentIdentityFields, + docExtensionOnDisk, documentResultBaseShape, HOCUSPOCUS_NOT_RUNNING_ERROR, httpPost, @@ -125,14 +127,6 @@ function emptyAppendNoOpNote(position: string, markdown: string | undefined): st return `No content to ${position} — document unchanged. To clear a document, use \`position: "replace"\` with empty \`content\`.`; } -function docExtensionOnDisk(contentDir: string, docName: string): '.md' | '.mdx' | null { - for (const ext of SUPPORTED_DOC_EXTENSIONS) { - const contained = resolveWithinRoot(contentDir, `${docName}${ext}`); - if (contained.ok && existsSync(contained.abs)) return ext; - } - return null; -} - function requestedDocExtension(rawDocName: string): '.md' | '.mdx' | null { const lower = rawDocName.toLowerCase(); if (lower.endsWith('.mdx')) return '.mdx'; @@ -211,7 +205,7 @@ async function writeOneDoc( let effectiveMarkdown = spec.content ?? ''; const hasFrontmatter = spec.frontmatter !== undefined && Object.keys(spec.frontmatter).length > 0; - const existingExt = docExtensionOnDisk(contentDir, docName); + const existingExt = docExtensionOnDisk(contentDir, docName) ?? null; const docExists = existingExt !== null; const requestedExt = spec.extension ?? requestedDocExtension(spec.path); @@ -600,12 +594,14 @@ async function handleBatch( if (!r.ok) return { docName: r.docName, ok: false as const, error: r.error }; const preview = resolvePreviewUrl(r.docName, { lockDir }); const warnings = parseAdvisoryWarnings(r.raw.warnings); + const brokenLinks = parseBrokenLinks(r.raw.brokenLinks); return { docName: r.docName, ok: true as const, position: r.position, ...(preview ? { previewUrl: preview.url } : {}), ...(warnings ? { warnings } : {}), + brokenLinks, }; }); const okCount = docOut.filter((d) => d.ok).length; @@ -621,6 +617,10 @@ async function handleBatch( if (d?.ok && d.warnings) { baseParts.push(...formatAdvisoryBriefs(d.warnings)); } + if (d?.ok) { + const brokenBrief = formatBrokenLinkBrief(d.brokenLinks); + if (brokenBrief) baseParts.push(brokenBrief); + } return baseParts.join(' '); }); const perDocNotes = documents.flatMap((spec, i) => { @@ -654,12 +654,9 @@ async function handleSingleDoc( const result = w.raw; const preview = resolvePreviewUrl(w.docName, { lockDir }); - const subscriberCount = - typeof result.subscriberCount === 'number' ? result.subscriberCount : undefined; const systemSubscriberCount = typeof result.systemSubscriberCount === 'number' ? result.systemSubscriberCount : undefined; const noPreviewAnywhere = systemSubscriberCount === 0; - const noPreviewOnThisDoc = subscriberCount === 0; const hints = Array.isArray(result.hints) ? result.hints : undefined; const summaryResult = result.summary && typeof result.summary === 'object' @@ -667,6 +664,7 @@ async function handleSingleDoc( : undefined; const summaryHint = typeof summaryResult?.hint === 'string' ? summaryResult.hint : undefined; const advisoryWarnings = parseAdvisoryWarnings(result.warnings); + const brokenLinks = parseBrokenLinks(result.brokenLinks); const noOpNote = emptyAppendNoOpNote(w.position, spec.content); const lines: string[] = [ @@ -688,19 +686,12 @@ async function handleSingleDoc( if (advisoryWarnings) { lines.push(...formatAdvisoryLines(advisoryWarnings)); } + lines.push(...formatBrokenLinkLines(brokenLinks)); const text = lines.join('\n'); - if ( - !preview && - !noPreviewAnywhere && - !noPreviewOnThisDoc && - !hints && - !summaryResult && - !advisoryWarnings - ) { - return textResult(text); - } - const document: Record = {}; + const document: Record = { + brokenLinks, + }; if (hints) document.hints = hints; if (summaryResult) document.summary = summaryResult; if (advisoryWarnings) document.warnings = advisoryWarnings; @@ -842,7 +833,9 @@ export function register(server: ServerInstance, deps: WriteDeps): void { ...documentResultBaseShape, }) .optional() - .describe('Single-document write result (present when there is a doc-specific signal).'), + .describe( + 'Single-document write result. Always present on a successful single-doc write — it carries `brokenLinks` (possibly `[]`) plus any `summary`/`hints`/`warnings`.', + ), folder: z .object({ ok: z.boolean(), @@ -895,7 +888,7 @@ export function register(server: ServerInstance, deps: WriteDeps): void { documents: looseObjectArray .optional() .describe( - 'Batch write: per-doc result `{ docName, ok, position?, previewUrl?, warnings?, error? }`.', + 'Batch write: per-doc result `{ docName, ok, position?, previewUrl?, warnings?, brokenLinks, error? }`. `brokenLinks` (possibly `[]`) is present on each successful entry, same as a single-doc write.', ), previewUrl: previewUrlOutputField.optional(), previewUrlSource: previewUrlSourceField, diff --git a/packages/server/src/skill-linking-contract.test.ts b/packages/server/src/skill-linking-contract.test.ts new file mode 100644 index 00000000..fcc33568 --- /dev/null +++ b/packages/server/src/skill-linking-contract.test.ts @@ -0,0 +1,46 @@ +import { describe, expect, test } from 'bun:test'; +import { readFileSync } from 'node:fs'; +import { join } from 'node:path'; + +const SKILL_PATH = join(import.meta.dir, '../assets/skills/project/SKILL.md'); +const LINKING_PATH = join(import.meta.dir, '../assets/skills/project/references/linking.md'); +const CORE_CONCEPTS_PATH = join( + import.meta.dir, + '../../../docs/content/reference/core-concepts.md', +); + +describe('bundled project skill — link-authoring contract', () => { + const skill = readFileSync(SKILL_PATH, 'utf-8'); + const linking = readFileSync(LINKING_PATH, 'utf-8'); + + test('core + linking reference stay self-contained: no precedent citation, no PRECEDENTS.md link', () => { + for (const text of [skill, linking]) { + expect(text).not.toMatch(/precedent #/i); + expect(text).not.toContain('PRECEDENTS.md'); + } + }); + + test('core points at brokenLinks and the linking reference', () => { + expect(skill).toContain('brokenLinks'); + expect(skill).toContain('references/linking.md'); + }); + + test('linking reference states relative is the recommended default + the no-hybrid rule', () => { + expect(linking).toContain('the recommended default'); + expect(linking).toContain('Never glue `./` onto a content-root path'); + }); + + test('linking reference makes brokenLinks the primary check + keeps the dead-link sweep as end-state audit', () => { + expect(linking).toMatch(/`brokenLinks`[^\n]*primary check/); + expect(linking).toContain('authoritative end-state audit'); + }); +}); + +describe('docs core-concepts.md — link form guidance', () => { + const doc = readFileSync(CORE_CONCEPTS_PATH, 'utf-8'); + + test('states relative is recommended + the no-hybrid rule', () => { + expect(doc).toContain('The recommended form is **relative**'); + expect(doc).toContain('never glue `./` onto a content-root path'); + }); +});