83 changes: 82 additions & 1 deletion CHANGELOG.md
@@ -9,7 +9,88 @@ Changes are tracked via git tags. Each release tag corresponds to an entry here.

## [Unreleased]

_No changes yet._
### Added — Documenter output auto-ingested into Clio

Closes a long-standing inconsistency the user spotted post-v0.24.3:
cfcf auto-ingests almost every workspace artifact into Clio
(iteration logs, judge assessments, reflection analyses, plan.md,
decision-log, architect-review, problem-pack, context-pack…),
yet until now it explicitly **excluded** the documenter's
`docs/*.md` output.

The documenter agent template even called this out: *"cf² doesn't
auto-ingest the documenter output (the `docs/` tree is the
canonical surface)."* The carve-out's stated rationale was that
`docs/` is canonical, so Clio is redundant. But that same argument
applies to `plan.md` — which IS auto-ingested. The carve-out was
inconsistent and cost cross-workspace discoverability of the
*most polished, integrative* artifact a workspace produces.

**Behaviour**:

- After the documenter completes (both the auto-document path inside
the iteration loop and standalone `cfcf document`), walk
`<repo>/docs/` recursively and ingest every `*.md` file as a
separate Clio document.
- Stable per-file title: `<workspace>: docs/<relative-path>`
(e.g. `gmbot: docs/architecture.md`, `gmbot: docs/api/auth.md`).
- `updateIfExists: true` — re-running the documenter overwrites
in place, never produces duplicates. sha256 dedup means
unchanged content is a no-op.
- Author stamp: `documenter|<adapter>|<model>` (matches the
existing actor convention).
- Metadata: `{role: "documenter", artifact_type:
"documenter-output", file_path: "<rel>", tier: "semantic",
ingest_trigger: "loop-auto" | "manual", …}`. The
`documenter-output` artifact_type makes the new docs filterable
in `cfcf clio search --metadata`.
- Non-`.md` files (images, JSON config, etc.) are skipped. Dot-
directories (`.git`, `.vscode`, …) under `docs/` are skipped.
- Empty / whitespace-only files are skipped.
- Per-file errors are logged + counted but never fail the rest
of the batch (same best-effort policy as the other auto-ingest
hooks).
- Respects `clio.ingestPolicy` (per-workspace or global): `"off"`
→ no-op; `"summaries-only"` and `"all"` → runs. Documenter
output is treated as a summary — it's the cleanest
cross-workspace artifact a workspace produces.
- Pre-existing user-authored files in `docs/` are also ingested.
Intentional: they're authoritative workspace content; surfacing
them in cross-workspace Clio search is a feature, not a leak.
(Different directory than `cfcf-docs/`, so no overlap with
existing ingests.)
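
A minimal sketch of how the per-file rules above could combine into one ingest request. The type names, the `buildDocumenterRequest` function, and the flat `ingestPolicy` field are hypothetical; the real `WorkspaceConfig` and Clio request shapes are not shown here, and the extra metadata fields elided by `…` above are left out.

```typescript
// Hypothetical shapes for illustration only; cfcf's real WorkspaceConfig
// and Clio ingest request types are not shown in this changelog.
type IngestPolicy = "off" | "summaries-only" | "all";
type IngestTrigger = "loop-auto" | "manual";

interface WorkspaceSketch {
  name: string;
  documenterAgent: { adapter: string; model: string };
  ingestPolicy: IngestPolicy;
}

interface IngestRequestSketch {
  title: string;
  author: string;
  content: string;
  updateIfExists: true;
  metadata: Record<string, string>;
}

// Returns null when the file should be skipped, otherwise the request
// that would be handed to the Clio backend.
function buildDocumenterRequest(
  ws: WorkspaceSketch,
  relPath: string, // assumed repo-relative, e.g. "docs/api/auth.md"
  content: string,
  trigger: IngestTrigger,
): IngestRequestSketch | null {
  if (ws.ingestPolicy === "off") return null; // policy gate: "off" is a no-op
  if (content.trim() === "") return null; // empty or whitespace-only file
  const { adapter, model } = ws.documenterAgent;
  return {
    title: `${ws.name}: ${relPath}`, // stable per-file title
    author: `documenter|${adapter}|${model}`,
    content,
    updateIfExists: true,
    metadata: {
      role: "documenter",
      artifact_type: "documenter-output",
      file_path: relPath,
      tier: "semantic",
      ingest_trigger: trigger,
    },
  };
}

const gmbot: WorkspaceSketch = {
  name: "gmbot",
  documenterAgent: { adapter: "codex", model: "gpt-5" },
  ingestPolicy: "summaries-only",
};
const request = buildDocumenterRequest(gmbot, "docs/api/auth.md", "# Auth API\n", "manual");
console.log(request?.title, request?.author);
```

Keeping the policy gate and the empty-file check in the same pure function makes both skip rules easy to test without a Clio backend.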

**Implementation** (~165 LoC + tests):

- `packages/core/src/clio/loop-ingest.ts` — new
`ingestDocumenterOutput(backend, workspace, trigger)` helper +
`walkMarkdownFiles(dir)` internal helper (recursive walk, skips
dot-dirs).
- `packages/core/src/iteration-loop.ts` — call site inside the
auto-document branch, after the commit + history-event update.
- `packages/core/src/documenter-runner.ts` — call site after the
standalone `cfcf document` run completes successfully. (Failed
runs don't ingest — partial / broken output shouldn't pollute
cross-workspace search.)
- `packages/core/src/templates/cfcf-documenter-instructions.md` —
prose updated to match reality. Agent no longer needs to be
asked to push to Clio; the harness handles it.
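
The recursive walk described for `walkMarkdownFiles(dir)` might look roughly like the following. This is a synchronous sketch for brevity (the real helper is not shown and may well be async); the name `walkMarkdownFilesSketch` and the temp-directory demo are illustrative assumptions.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join, relative, sep } from "node:path";

// Collects every *.md file under `dir`, recursing into subdirectories
// but never into dot-directories. A missing `dir` yields an empty list.
function walkMarkdownFilesSketch(dir: string): string[] {
  if (!existsSync(dir)) return [];
  const found: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name.startsWith(".")) continue; // skip .git, .vscode, ...
      found.push(...walkMarkdownFilesSketch(full));
    } else if (entry.isFile() && entry.name.endsWith(".md")) {
      found.push(full);
    }
  }
  return found.sort();
}

// Demo: seed a throwaway tree that mirrors the cases in the changelog entry.
const root = mkdtempSync(join(tmpdir(), "docs-walk-"));
const seed: Record<string, string> = {
  "docs/architecture.md": "# Architecture\n",
  "docs/api/auth.md": "# Auth API\n",
  "docs/diagram.png": "fake binary",
  "docs/.cache/build.md": "should be skipped\n",
};
for (const [rel, content] of Object.entries(seed)) {
  const full = join(root, rel);
  mkdirSync(dirname(full), { recursive: true });
  writeFileSync(full, content, "utf-8");
}

// Report repo-relative paths with forward slashes, as used in the titles.
const found = walkMarkdownFilesSketch(join(root, "docs")).map((file) =>
  relative(root, file).split(sep).join("/"),
);
console.log(found);
```

Returning an empty list for a missing directory is what lets the caller treat "no `docs/` yet" as a safe no-op rather than an error.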

**Test coverage** (11 new tests in `loop-ingest.test.ts`; all 1049
tests in the suite pass):

- Multi-file ingest with per-file titles
- Recursive walk through nested `docs/` subdirectories
- `updateIfExists` round-trip (one doc per file, content updates
in place across re-runs)
- Non-`.md` files ignored
- Dot-directories skipped
- Missing `docs/` returns `{ingested: 0, errors: 0}` (no-op safe)
- Whitespace-only files skipped
- Author stamp = `documenter|<adapter>|<model>`
- Trigger captured in metadata (`loop-auto` vs `manual`)
- `clio.ingestPolicy: "off"` → no-op
- `clio.ingestPolicy: "summaries-only"` → runs (documenter output
IS a summary)

## [0.24.3] -- 2026-05-13

186 changes: 186 additions & 0 deletions packages/core/src/clio/loop-ingest.test.ts
@@ -22,6 +22,7 @@ import {
  ingestPlanMd,
  ingestDevIterationArtifacts,
  ingestJudgeArtifact,
  ingestDocumenterOutput,
  PROBLEM_PACK_FILES,
} from "./loop-ingest.js";
import type { WorkspaceConfig } from "../types.js";
@@ -1029,3 +1030,188 @@ describe("ingestContextPack", () => {
    expect(r.perFile).toEqual([]);
  });
});

// ── ingestDocumenterOutput (v0.24.4) ──────────────────────────────────────

describe("ingestDocumenterOutput", () => {
  async function seedDocs(files: Record<string, string>): Promise<void> {
    for (const [rel, content] of Object.entries(files)) {
      const full = join(repoDir, rel);
      const dir = full.substring(0, full.lastIndexOf("/"));
      await mkdir(dir, { recursive: true });
      await writeFile(full, content, "utf-8");
    }
  }

  it("ingests every *.md under docs/ as a separate Clio doc with stable per-file titles", async () => {
    const ws = makeWorkspace();
    await seedDocs({
      "docs/architecture.md": "# Architecture\n\nSystem overview.\n",
      "docs/api.md": "# API\n\nEndpoints.\n",
      "docs/deployment.md": "# Deployment\n\nDeploy guide.\n",
    });

    const result = await ingestDocumenterOutput(clio, ws, "loop-auto");
    expect(result.ingested).toBe(3);
    expect(result.errors).toBe(0);

    const docs = await clio.listDocuments({ project: "test-project" });
    const docOutputs = docs.filter(
      (d) => (d.metadata as { artifact_type?: string })?.artifact_type === "documenter-output",
    );
    expect(docOutputs).toHaveLength(3);

    // Each file gets `<workspace>: docs/<relative-path>` as its title.
    const titles = docOutputs.map((d) => d.title).sort();
    expect(titles).toEqual([
      `${ws.name}: docs/api.md`,
      `${ws.name}: docs/architecture.md`,
      `${ws.name}: docs/deployment.md`,
    ]);
  });

  it("walks nested docs/ subdirectories", async () => {
    const ws = makeWorkspace();
    await seedDocs({
      "docs/architecture.md": "# Top-level\n",
      "docs/api/auth.md": "# Auth API\n",
      "docs/api/users.md": "# Users API\n",
      "docs/guides/quickstart.md": "# Quickstart\n",
    });

    const result = await ingestDocumenterOutput(clio, ws, "loop-auto");
    expect(result.ingested).toBe(4);

    const docs = await clio.listDocuments({ project: "test-project" });
    const titles = docs
      .filter((d) => (d.metadata as { artifact_type?: string })?.artifact_type === "documenter-output")
      .map((d) => d.title)
      .sort();
    expect(titles).toEqual([
      `${ws.name}: docs/api/auth.md`,
      `${ws.name}: docs/api/users.md`,
      `${ws.name}: docs/architecture.md`,
      `${ws.name}: docs/guides/quickstart.md`,
    ]);
  });

  it("updates docs in place across re-runs (updateIfExists: one doc per file, NOT one per call)", async () => {
    const ws = makeWorkspace();
    await seedDocs({ "docs/architecture.md": "# Architecture v1\n\nInitial.\n" });

    const r1 = await ingestDocumenterOutput(clio, ws, "loop-auto");
    expect(r1.ingested).toBe(1);
    const docsAfterRun1 = await clio.listDocuments({ project: "test-project" });
    expect(docsAfterRun1).toHaveLength(1);
    const docId1 = docsAfterRun1[0].id;

    // Second documenter run with revised content. The previously
    // ingested doc should be UPDATED, not duplicated.
    await seedDocs({ "docs/architecture.md": "# Architecture v2\n\nUpdated after a loop.\n" });
    const r2 = await ingestDocumenterOutput(clio, ws, "manual");
    expect(r2.ingested).toBe(1);

    const docsAfterRun2 = await clio.listDocuments({ project: "test-project" });
    expect(docsAfterRun2).toHaveLength(1);
    expect(docsAfterRun2[0].id).toBe(docId1); // same doc id, updated in place

    // Content reflects the second pass.
    const fetched = await clio.getDocumentContent(docId1);
    expect(fetched?.content).toContain("v2");
    expect(fetched?.content).toContain("Updated after a loop");
  });

  it("ignores non-.md files in docs/", async () => {
    const ws = makeWorkspace();
    await seedDocs({
      "docs/architecture.md": "# Architecture\n",
      "docs/diagram.png": "fake binary", // .png: should NOT be ingested
      "docs/config.json": "{}", // .json: should NOT be ingested
    });

    const result = await ingestDocumenterOutput(clio, ws, "loop-auto");
    expect(result.ingested).toBe(1);
    const docs = await clio.listDocuments({ project: "test-project" });
    const docOutputs = docs.filter(
      (d) => (d.metadata as { artifact_type?: string })?.artifact_type === "documenter-output",
    );
    expect(docOutputs).toHaveLength(1);
  });

  it("skips dot-directories (.git, .vscode, etc.)", async () => {
    const ws = makeWorkspace();
    await seedDocs({
      "docs/architecture.md": "# Real doc\n",
      "docs/.git/HEAD": "ref: refs/heads/main\n", // unlikely but defensive
      "docs/.cache/build.md": "should be skipped\n",
    });

    const result = await ingestDocumenterOutput(clio, ws, "loop-auto");
    expect(result.ingested).toBe(1);
  });

  it("returns {ingested: 0, errors: 0} when docs/ doesn't exist (no-op, safe)", async () => {
    const ws = makeWorkspace();
    // No docs/ created. This can happen if the documenter agent
    // failed early or the workspace has no docs phase yet.
    const result = await ingestDocumenterOutput(clio, ws, "loop-auto");
    expect(result.ingested).toBe(0);
    expect(result.errors).toBe(0);
  });

  it("skips empty files (no ingest for whitespace-only content)", async () => {
    const ws = makeWorkspace();
    await seedDocs({
      "docs/empty.md": " \n\n \n",
      "docs/real.md": "# Real content\n",
    });

    const result = await ingestDocumenterOutput(clio, ws, "loop-auto");
    expect(result.ingested).toBe(1); // only real.md
  });

  it("stamps author as documenter|<adapter>|<model> for audit-log attribution", async () => {
    const ws = makeWorkspace({
      documenterAgent: { adapter: "codex", model: "gpt-5" },
    });
    await seedDocs({ "docs/architecture.md": "# Architecture\n" });

    await ingestDocumenterOutput(clio, ws, "manual");
    const docs = await clio.listDocuments({ project: "test-project" });
    const doc = docs.find((d) => d.title === `${ws.name}: docs/architecture.md`);
    expect(doc?.author).toBe("documenter|codex|gpt-5");
  });

  it("captures the trigger (loop-auto vs manual) in metadata for audit", async () => {
    const ws = makeWorkspace();
    await seedDocs({ "docs/architecture.md": "# Architecture\n" });

    await ingestDocumenterOutput(clio, ws, "loop-auto");
    const docs1 = await clio.listDocuments({ project: "test-project" });
    const doc1 = docs1.find((d) => d.title === `${ws.name}: docs/architecture.md`);
    expect((doc1?.metadata as { ingest_trigger?: string })?.ingest_trigger).toBe("loop-auto");

    // Re-run via standalone documenter (manual trigger), which overrides
    // the previous trigger stamp.
    await ingestDocumenterOutput(clio, ws, "manual");
    const docs2 = await clio.listDocuments({ project: "test-project" });
    const doc2 = docs2.find((d) => d.title === `${ws.name}: docs/architecture.md`);
    expect((doc2?.metadata as { ingest_trigger?: string })?.ingest_trigger).toBe("manual");
  });

  it("respects clio.ingestPolicy = 'off' (no-op)", async () => {
    const ws = makeWorkspace({ clio: { ingestPolicy: "off" } });
    await seedDocs({ "docs/architecture.md": "# Architecture\n" });

    const result = await ingestDocumenterOutput(clio, ws, "loop-auto");
    expect(result.ingested).toBe(0);
    expect(result.errors).toBe(0);
    const docs = await clio.listDocuments({ project: "test-project" });
    expect(docs).toHaveLength(0);
  });

  it("runs on policy 'summaries-only' (documenter output IS a summary)", async () => {
    const ws = makeWorkspace({ clio: { ingestPolicy: "summaries-only" } });
    await seedDocs({ "docs/architecture.md": "# Architecture\n" });

    const result = await ingestDocumenterOutput(clio, ws, "loop-auto");
    expect(result.ingested).toBe(1);
  });
});