From fa95630db10f29ebbbb14898ecb3c6951c1fb592 Mon Sep 17 00:00:00 2001 From: ilkhombek Date: Thu, 18 Jun 2026 17:34:38 +0500 Subject: [PATCH] Add fablize procedure layer (memory + method, one product) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pairs the codebase-memory-mcp engine (what the code IS) with fablize (how to work on it): clarify-first, multi-story verification gate with bounded self-correction, investigation protocol, verification grounding, destructive-action guard, and observability. Integration is prompt-level only — the disciplines call the MCP tools (get_architecture, search_graph, trace_path, detect_changes) at the points they help most (see INTEGRATION.md). The C core is unmodified upstream, so git pull upstream merges cleanly; fablize lives entirely in fablize/ as dependency-free stdlib Python + plain-text packs. - fablize/: packs, engines (goals/spec/metrics/bundle), guard hook, 15 tests - install-combined.sh: build engine -> register MCP -> apply disciplines - INTEGRATION.md, NOTICE (dual-MIT), fablize-ci.yml (path-scoped) - README: additive callout for the combined distribution Co-Authored-By: Claude Opus 4.8 --- .github/workflows/fablize-ci.yml | 27 +++ INTEGRATION.md | 37 ++++ NOTICE | 19 ++ README.md | 16 ++ fablize/.gitignore | 4 + fablize/AGENTS.md | 123 ++++++++++++ fablize/README.md | 36 ++++ fablize/hooks/destructive_guard.py | 69 +++++++ fablize/install.sh | 60 ++++++ fablize/packs/clarify-pack.txt | 20 ++ fablize/packs/investigation-protocol.txt | 25 +++ fablize/packs/orient-pack.txt | 18 ++ fablize/packs/verification-grounding-pack.txt | 20 ++ fablize/scripts/bundle.py | 141 +++++++++++++ fablize/scripts/goals.py | 186 ++++++++++++++++++ fablize/scripts/metrics.py | 83 ++++++++ fablize/scripts/spec.py | 101 ++++++++++ fablize/tests/test_fablize.py | 158 +++++++++++++++ install-combined.sh | 34 ++++ 19 files changed, 1177 insertions(+) create mode 100644 .github/workflows/fablize-ci.yml 
create mode 100644 INTEGRATION.md create mode 100644 NOTICE create mode 100644 fablize/.gitignore create mode 100644 fablize/AGENTS.md create mode 100644 fablize/README.md create mode 100644 fablize/hooks/destructive_guard.py create mode 100755 fablize/install.sh create mode 100644 fablize/packs/clarify-pack.txt create mode 100644 fablize/packs/investigation-protocol.txt create mode 100644 fablize/packs/orient-pack.txt create mode 100644 fablize/packs/verification-grounding-pack.txt create mode 100644 fablize/scripts/bundle.py create mode 100644 fablize/scripts/goals.py create mode 100644 fablize/scripts/metrics.py create mode 100644 fablize/scripts/spec.py create mode 100644 fablize/tests/test_fablize.py create mode 100755 install-combined.sh diff --git a/.github/workflows/fablize-ci.yml b/.github/workflows/fablize-ci.yml new file mode 100644 index 00000000..48ee36cf --- /dev/null +++ b/.github/workflows/fablize-ci.yml @@ -0,0 +1,27 @@ +name: fablize-ci +# Procedure-layer CI. Independent of the upstream C workflows (separate filename, scoped +# paths) so it never clashes on `git pull upstream`. 
+on: + push: + paths: + - 'fablize/**' + - '.github/workflows/fablize-ci.yml' + pull_request: + paths: + - 'fablize/**' + - '.github/workflows/fablize-ci.yml' + +jobs: + test: + runs-on: ubuntu-latest + strategy: + matrix: + python-version: ["3.9", "3.11", "3.13"] + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-python@v5 + with: + python-version: ${{ matrix.python-version }} + - name: fablize test suite (stdlib only, no deps) + working-directory: fablize + run: python -m unittest discover -s tests -p 'test_*.py' -v diff --git a/INTEGRATION.md b/INTEGRATION.md new file mode 100644 index 00000000..434065df --- /dev/null +++ b/INTEGRATION.md @@ -0,0 +1,37 @@ +# How the two layers compose + +This project is one product made of two complementary layers: + +| Layer | Folder | Answers | Form | +|-------|--------|---------|------| +| **Memory** | `src/`, `internal/`, … (the C core) | *What is the code?* — definitions, callers, data flow, architecture | MCP server, 14 tools, SQLite graph | +| **Procedure** | `fablize/` | *How do I work on it?* — clarify, complete, investigate, verify, escalate | stdlib Python + plain-text packs | + +The memory layer gives the agent a **map**; the procedure layer gives it a **method**. Neither +replaces the other — a map without a method wanders, a method without a map crawls file by file. + +## Where the procedure calls the memory + +The fablize disciplines invoke the MCP tools at the exact points they help most: + +| Discipline (`fablize/packs/…`) | Calls these memory tools | Why | +|---|---|---| +| **orient-pack** | `index_repository`, `get_architecture`, `search_graph`, `get_code_snippet`, `trace_path` | Build the map before editing — know the seams and the blast radius. | +| **clarify-pack** (step 0) | `get_architecture`, `search_graph`, `search_code` | Answer unknowns from the code before asking the user — cheaper than a question. 
| +| **investigation-protocol** (steps 3–4) | `search_graph`, `trace_path` (data_flow), `get_code_snippet`, `query_graph`, `ingest_traces` | `trace_path` *is* the causal chain; `query_graph` exposes hot-path signals. | +| **verification-grounding** | `detect_changes`, `trace_path` (inbound) | Confirm the structural effect of a change and catch a forgotten caller. | +| **spec-lock decisions** (`spec.py`) | `manage_adr` (optional) | A locked architectural decision can be recorded as an ADR in the graph. | + +All of this is **prompt-level wiring** — plain text and tool calls. No C was modified; the C +core stays byte-for-byte upstream, so `git pull upstream` merges cleanly. The procedure layer +also degrades gracefully: if the memory tools are absent, every discipline still applies by +reading files directly. + +## Design boundary (deliberate) + +fablize is **not** reimplemented as MCP tools inside the C server. Its engines stay as +dependency-free Python the agent drives from a shell — the same shell every agent that +codebase-memory-mcp configures already has. This keeps the procedure layer portable, testable +in isolation (`fablize/tests/`), and independent of the C build. + +See `fablize/AGENTS.md` for the operating block and `fablize/README.md` for the layer's contents. diff --git a/NOTICE b/NOTICE new file mode 100644 index 00000000..83607619 --- /dev/null +++ b/NOTICE @@ -0,0 +1,19 @@ +codebase-memory-mcp + fablize +============================== + +This distribution combines two independently MIT-licensed components. + +1. codebase-memory-mcp (the memory layer — C core, src/, internal/, vendored/, …) + Copyright (c) 2025 DeusData + Upstream: https://github.com/DeusData/codebase-memory-mcp + Licensed under the MIT License (see LICENSE). + The C core in this fork is unmodified upstream. + +2. 
fablize (the procedure layer — the fablize/ directory, plus INTEGRATION.md and + install-combined.sh at the repo root) + Copyright (c) 2025 fivetaku + Upstream: https://github.com/fivetaku/fablize + Licensed under the MIT License. + +Both components are distributed under the MIT License. See LICENSE for the full text. +Each component retains its own copyright; this NOTICE documents their combination. diff --git a/README.md b/README.md index b48a297f..bc7917ae 100644 --- a/README.md +++ b/README.md @@ -18,6 +18,22 @@ High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across all 158 languages, enhanced with [**Hybrid LSP** semantic type resolution](#hybrid-lsp) for Python, TypeScript / JavaScript / JSX / TSX, PHP, C#, Go, C, C++, Java, Kotlin, and Rust — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 14 MCP tools. Zero dependencies. Plug and play across 11 coding agents. +> ### 🧭 This distribution: codebase-memory-mcp **+ fablize** +> +> This fork pairs the memory engine with **[fablize](fablize/)** — a procedure layer that +> makes an agent *work* well, not just *see* well. The memory layer answers **what the code +> is**; fablize answers **how to work on it**: clarify before building, complete with +> evidence, investigate systematically (using `trace_path` as the literal causal chain), +> verify the structural effect of a change, and escalate honestly at the model's ceiling. +> Two complementary layers, one install — see **[INTEGRATION.md](INTEGRATION.md)**. +> +> ```bash +> bash install-combined.sh # builds the engine, registers MCP, applies the disciplines +> ``` +> +> The C core below is **unmodified upstream** — fablize lives entirely in `fablize/` (pure +> stdlib Python + plain-text packs), so updates from [DeusData/codebase-memory-mcp](https://github.com/DeusData/codebase-memory-mcp) merge cleanly. 
+ > **Research** — The design and benchmarks behind this project are described in the preprint [*Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP*](https://arxiv.org/abs/2603.27277) (arXiv:2603.27277). Evaluated across 31 real-world repositories: 83% answer quality, 10× fewer tokens, 2.1× fewer tool calls vs. file-by-file exploration. > **Security & Trust** — This tool reads your codebase and writes to your agent configuration files. That is what it is designed to do. If you prefer to audit before running, the [full source is here](https://github.com/DeusData/codebase-memory-mcp) — every release binary is signed, checksummed, and scanned by 70+ antivirus engines. All processing happens 100% locally; your code never leaves your machine. Found a security issue? We want to know — see [SECURITY.md](SECURITY.md). Security is Priority #1 for us. diff --git a/fablize/.gitignore b/fablize/.gitignore new file mode 100644 index 00000000..23b07abc --- /dev/null +++ b/fablize/.gitignore @@ -0,0 +1,4 @@ +__pycache__/ +*.pyc +.fablize/ +dist/ diff --git a/fablize/AGENTS.md b/fablize/AGENTS.md new file mode 100644 index 00000000..ea33d2a6 --- /dev/null +++ b/fablize/AGENTS.md @@ -0,0 +1,123 @@ +# fablize — operating disciplines for any AI coding agent + +> This is the tool-agnostic version of fablize. `AGENTS.md` is read by Cursor, GitHub +> Copilot, Gemini CLI, Aider, Codex, and other agents the same way `CLAUDE.md` is read by +> Claude Code. Drop this file (and the `packs/` + `scripts/` it references) into a project +> and any agent gains the same completion / verification / investigation discipline. +> +> Principle: a harness cannot raise a model's ceiling. It makes the model reach its *own* +> ceiling by enforcing verification, completion, and investigation as procedure. When the +> ceiling itself is the blocker (open-ended creative detail, self-driven discovery), +> escalate — don't pretend. 
+ +Apply only what the task signals — the smallest matching discipline. Overlap only when the +task is genuinely multi-category. With no signal, just follow the baseline. + +## The two layers of this project + +This project pairs a **memory layer** with this **procedure layer**: + +- **Memory (codebase-memory-mcp)** — a structural knowledge graph of the code, exposed as MCP + tools: `get_architecture`, `search_graph`, `search_code`, `trace_path`, `query_graph`, + `get_code_snippet`, `detect_changes`, `ingest_traces`, `manage_adr`, `index_repository`, … + It answers *what the code is* — definitions, callers, data flow, architecture — in + sub-millisecond queries. Prefer `search_graph` / `search_code` / `trace_path` **instead of + grep/glob** for finding code, callers, dependencies, and impact. +- **Procedure (fablize, below)** — answers *how to work*: clarify, complete with evidence, + investigate, verify, escalate. + +The disciplines below call the memory tools at the points where they help most (see +`INTEGRATION.md`). When the memory tools are not present, the disciplines still apply — they +degrade gracefully to reading files directly. + +## [always] Baseline + +- Lead with the outcome. Stay within the requested scope — no incidental refactors. +- Ground every "done" claim in a command you actually ran this session (paste the result). +- Confirm before destructive or hard-to-reverse actions. + +## [unfamiliar / multi-file change] Orient first + +Before editing code you have not read this session, build the map: follow +`packs/orient-pack.txt` — `get_architecture` for the seams → `search_graph` to locate the +symbols → `trace_path` (inbound) for the blast radius before changing a shared symbol. +Skip for a self-contained edit in a file already in front of you. + +- Lead with the outcome. Stay within the requested scope — no incidental refactors. +- Ground every "done" claim in a command you actually ran this session (paste the result). 
+- Confirm before destructive or hard-to-reverse actions. + +## [ambiguous / expensive build] Clarify first + +Before building something underspecified (open-ended, multi-file, design/UI, unstated +scope), follow `packs/clarify-pack.txt`: surface the genuine unknowns → ask ONE batched +round of 1–4 targeted questions → lock the agreed spec → then build against it. Persist it: + +```bash +python3 scripts/spec.py lock --brief "" --req "" \ + --constraint "" --decision "question::answer" +python3 scripts/spec.py show # run first when resuming an ambiguous build +``` + +Skip entirely if the request is already specific — asking on a clear task is its own waste. +First resolve what you can from the code (`get_architecture` / `search_graph`) — a question +the graph already answers is not a question for the user. + +## [2+ sequential stories] Multi-story loop with a verification gate + +Decompose into sequential stories, complete one at a time, produce evidence as you go. +State persists in `./.fablize/` (resume across sessions with `status`). + +```bash +python3 scripts/goals.py create --brief "" \ + --goal "title::verifiable objective" --goal "..." # the LAST goal must be a verification story +python3 scripts/goals.py next # activate the next story + handoff +# ...work that story only... +python3 scripts/goals.py checkpoint --id G001 --status complete --evidence "" +# final story is a gate: --verify-cmd "" --verify-evidence "" are required +python3 scripts/goals.py retry --id G001 # reopen a blocked story for another attempt +python3 scripts/goals.py status # run first when resuming +``` + +Rules: `complete` requires non-empty evidence; the final goal cannot complete without a +verify command + its result. A story that is `blocked` twice trips the escalation gate +(see below) — bounded self-correction, never an infinite retry loop. 
+ +## [debugging / test failure / unknown cause / review] Investigation protocol + +Follow `packs/investigation-protocol.txt`: reproduce first → form 3+ competing hypotheses → +gather evidence per hypothesis → trace the full causal chain (removing the symptom is not +removing the defect) → verify before and after → report the hypotheses you rejected. +The memory tools make this concrete: `trace_path` (mode:"data_flow") *is* the causal chain; +`query_graph` exposes hot-path signals for performance defects; `ingest_traces` folds a +reproduction back into the graph. + +## [render / executable artifact: HTML, SVG, game, UI, chart] Verification grounding + +Follow `packs/verification-grounding-pack.txt`: run it in the real renderer → observe the +actual output → fix what the observation reveals → re-run. A static parse confirms +well-formed, not correct. For a code change, the analogue is `detect_changes` + `trace_path` +(inbound) to confirm the structural effect and catch a caller you forgot. + +## [at the capability ceiling] Escalate + +Signals: stuck on the same problem 2+ times (the goals engine trips this automatically), +open-ended creation where detail itself is the value, deep review needing out-of-spec +discovery. These are capability, not procedure. In order: (1) raise the model's thinking +budget / reasoning effort to its maximum; (2) hand off to a stronger model in a fresh +session with an evidence package (symptoms, attempts, failure point, repro); (3) otherwise +report the limit honestly and name where a human must step in. + +## Observability + +The engines log every event to `~/.fablize/events.jsonl`. Summarize real usage with: + +```bash +python3 scripts/metrics.py # completion rate, escalations, specs locked +``` + +--- + +The `scripts/` are pure-Python stdlib (no dependencies) — any agent with a shell can run +them. The `packs/` are plain text — any agent can read them. That is what makes these +disciplines portable across tools. 
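The gate rules stated in the multi-story section of `fablize/AGENTS.md` (non-empty evidence for `complete`, a verify command plus its result for the final story, escalation once a story is blocked twice) can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of `scripts/goals.py`, which this excerpt does not show: `Story`, `checkpoint`, `MAX_BLOCKS`, and the status strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Assumption: "blocked twice" means the second block trips escalation.
MAX_BLOCKS = 2


@dataclass
class Story:
    """Hypothetical in-memory stand-in for one goal in the loop."""
    id: str
    objective: str
    status: str = "pending"
    evidence: str = ""
    blocked_count: int = 0


def checkpoint(story, status, evidence="", is_final=False,
               verify_cmd="", verify_evidence=""):
    """Apply one checkpoint and return the story's resulting status."""
    if status == "complete":
        if not evidence.strip():
            raise ValueError("complete requires non-empty evidence")
        if is_final and not (verify_cmd.strip() and verify_evidence.strip()):
            raise ValueError("final story needs a verify command and its result")
        story.status = "complete"
        story.evidence = evidence
    elif status == "blocked":
        story.blocked_count += 1
        # Bounded self-correction: escalate instead of retrying forever.
        if story.blocked_count >= MAX_BLOCKS:
            story.status = "escalated"
        else:
            story.status = "blocked"
    else:
        raise ValueError(f"unknown status: {status}")
    return story.status
```

Keeping the block counter on the story itself is one way to make the retry bound survive a session restart, provided the record is persisted; how the real engine stores it is not shown here.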
diff --git a/fablize/README.md b/fablize/README.md new file mode 100644 index 00000000..c1fc721a --- /dev/null +++ b/fablize/README.md @@ -0,0 +1,36 @@ +# fablize — the discipline layer + +This folder is the **procedure layer** of this project. While the C core +(`codebase-memory-mcp`) gives an agent a *map* of the code, fablize gives it a +*method* of working: clarify before building, complete with evidence, investigate +systematically, verify what was rendered, and escalate honestly at the capability ceiling. + +It is self-contained and dependency-free (pure-Python stdlib + plain-text packs), so it +works with **any** agent that has a shell — exactly the agents `codebase-memory-mcp` +already configures. + +## Contents + +| Path | What | +|------|------| +| `AGENTS.md` | the operating block, wired to this project's MCP tools | +| `packs/` | the verified discipline packs (orient, clarify, investigation, verification grounding) | +| `scripts/goals.py` | multi-story loop with an evidence/verification gate + bounded self-correction | +| `scripts/spec.py` | locked-spec store so a clarified spec survives compaction/restart | +| `scripts/metrics.py` | observability over `~/.fablize/events.jsonl` | +| `scripts/bundle.py` | build a portable, tool-agnostic bundle of the disciplines | +| `hooks/destructive_guard.py` | PreToolUse guard that asks before hard-to-reverse commands | +| `tests/` | stdlib unittest suite (no deps) | + +## Run the tests + +```bash +python3 -m unittest discover -s tests -v +``` + +## How it composes with the memory layer + +See [`../INTEGRATION.md`](../INTEGRATION.md) for how the disciplines call the MCP tools +(`get_architecture`, `search_graph`, `trace_path`, `detect_changes`, …). + +MIT licensed. 
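The `hooks/destructive_guard.py` entry in the Contents table describes a PreToolUse guard that asks before hard-to-reverse commands. The exchange it implies (a JSON payload in, either silence or an "ask" decision out) might look like the sketch below. The payload and output field names follow the hook added in this patch; the `decide` function and the reduced two-rule list are hypothetical simplifications for illustration, not the hook's full rule set.

```python
import json
import re

# Hypothetical two-rule subset; the real hook carries a longer list.
RULES = [
    (re.compile(r"\bgit\s+push\b.*(--force\b|-f\b)", re.I), "git force-push"),
    (re.compile(r"\b(terraform|tofu)\s+destroy\b", re.I), "infrastructure teardown"),
]


def decide(payload_text):
    """Return the JSON the hook would print, or "" to stay silent."""
    try:
        payload = json.loads(payload_text)
    except ValueError:
        return ""  # malformed input: do not interfere
    if not isinstance(payload, dict) or payload.get("tool_name") != "Bash":
        return ""  # only shell commands are inspected
    command = (payload.get("tool_input") or {}).get("command", "")
    for pattern, why in RULES:
        if pattern.search(command):
            return json.dumps({
                "hookSpecificOutput": {
                    "hookEventName": "PreToolUse",
                    "permissionDecision": "ask",
                    "permissionDecisionReason": f"guard: {why}",
                }
            })
    return ""
```

Returning an empty string for anything unrecognized mirrors the design stated in the hook's docstring: the guard inserts a checkpoint for risky commands and otherwise stays out of the way.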
diff --git a/fablize/hooks/destructive_guard.py b/fablize/hooks/destructive_guard.py new file mode 100644 index 00000000..c57c9d0d --- /dev/null +++ b/fablize/hooks/destructive_guard.py @@ -0,0 +1,69 @@ +#!/usr/bin/env python3 +"""fablize destructive-action guard — a deterministic PreToolUse hook. + +The "confirm before destructive or hard-to-reverse actions" rule lives in the operating +block as text, which a model can skip. This hook makes it a *preventive control*: it +inspects Bash commands before they run and forces a human approval prompt for the +genuinely dangerous, hard-to-reverse ones (recursive force-delete, force-push, history +rewrite, disk wipe, destructive SQL, etc.). + +Protocol: reads the PreToolUse payload on stdin, emits a permission decision on stdout. + - "ask" → Claude Code prompts the user to approve before running (default for matches). + - silent → exit 0 with no output lets the command proceed normally. +It never hard-blocks (deny) — the user stays in control; it only inserts a checkpoint. +""" +import json +import re +import sys + +# (compiled pattern, human reason). Order doesn't matter; first match wins for the message. 
+RULES = [ + (r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*\s+)*-[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r", "recursive/forced file deletion (rm -rf)"), + (r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*\s+(/|~|\$HOME|\.)\s*$", "recursive delete of a top-level path"), + (r"\bgit\s+push\b.*(--force\b|-f\b)", "git force-push (rewrites remote history)"), + (r"\bgit\s+(reset\s+--hard|clean\s+-[a-zA-Z]*f[a-zA-Z]*|filter-branch|filter-repo)\b", "git history/working-tree destruction"), + (r"\bgit\s+branch\s+-D\b", "force-delete of a git branch"), + (r"\b(drop|truncate)\s+(table|database|schema)\b", "destructive SQL (DROP/TRUNCATE)"), + (r"\b(mkfs|shred|wipefs)\b|\bdd\s+if=", "disk/partition wipe"), + (r"\b(kubectl|helm)\s+delete\b", "Kubernetes resource deletion"), + (r"\b(terraform|tofu)\s+destroy\b", "infrastructure teardown (terraform destroy)"), + (r":\(\)\s*\{\s*:\|:&\s*\}", "fork bomb"), + (r"\bchmod\s+-R\b|\bchown\s+-R\b", "recursive permission/ownership change"), + (r">\s*/dev/sd[a-z]", "raw write to a block device"), +] +COMPILED = [(re.compile(p, re.I), why) for p, why in RULES] + + +def match(command): + for rx, why in COMPILED: + if rx.search(command): + return why + return None + + +def main(): + try: + payload = json.load(sys.stdin) + except (ValueError, OSError): + sys.exit(0) # malformed input — do not interfere + if payload.get("tool_name") != "Bash": + sys.exit(0) + command = (payload.get("tool_input") or {}).get("command", "") + if not command: + sys.exit(0) + why = match(command) + if not why: + sys.exit(0) + out = { + "hookSpecificOutput": { + "hookEventName": "PreToolUse", + "permissionDecision": "ask", + "permissionDecisionReason": f"fablize guard: {why}. 
Confirm this hard-to-reverse action before running.", + } + } + print(json.dumps(out)) + sys.exit(0) + + +if __name__ == "__main__": + main() diff --git a/fablize/install.sh b/fablize/install.sh new file mode 100755 index 00000000..94f3c60c --- /dev/null +++ b/fablize/install.sh @@ -0,0 +1,60 @@ +#!/usr/bin/env bash +# fablize install — apply the procedure layer to a project (any agent). +# Companion to the codebase-memory-mcp (memory layer) install. Additive and idempotent: +# copies packs+scripts in, appends the operating block to whatever instruction file the +# agent reads, and registers the destructive-action guard for Claude Code if present. +# Usage: bash fablize/install.sh [target-project-dir] (default: current directory) +set -euo pipefail +HERE="$(cd "$(dirname "$0")" && pwd)" +TARGET="${1:-$PWD}" +echo "fablize (procedure layer) → $TARGET" + +mkdir -p "$TARGET/.fablize-disciplines/packs" "$TARGET/.fablize-disciplines/scripts" +cp "$HERE/packs/"*.txt "$TARGET/.fablize-disciplines/packs/" +cp "$HERE/scripts/"*.py "$TARGET/.fablize-disciplines/scripts/" +cp "$HERE/hooks/destructive_guard.py" "$TARGET/.fablize-disciplines/" 2>/dev/null || true +echo " ✓ packs + scripts → .fablize-disciplines/" + +# Append the operating block to any instruction file the agent already uses, else AGENTS.md. +block="$HERE/AGENTS.md" +wrote=0 +for f in AGENTS.md CLAUDE.md .cursorrules .github/copilot-instructions.md GEMINI.md; do + path="$TARGET/$f" + if [ -f "$path" ]; then + if ! grep -q "fablize — operating disciplines" "$path" 2>/dev/null; then + mkdir -p "$(dirname "$path")" + { printf '\n\n'; cat "$block"; } >> "$path" + echo " ✓ appended disciplines to $f" + else + echo " = $f already has fablize disciplines" + fi + wrote=1 + fi +done +if [ "$wrote" -eq 0 ]; then + cp "$block" "$TARGET/AGENTS.md" + echo " ✓ created AGENTS.md" +fi + +# Register the destructive-action guard for Claude Code, if its settings file is present. 
+SETTINGS="$HOME/.claude/settings.json" +if command -v python3 >/dev/null 2>&1 && [ -f "$SETTINGS" ]; then + python3 - "$SETTINGS" "$TARGET/.fablize-disciplines/destructive_guard.py" <<'PY' || true +import json, os, sys +settings, guard = sys.argv[1], sys.argv[2] +try: + data = json.load(open(settings, encoding="utf-8")) +except (OSError, ValueError): + raise SystemExit(0) +cmd = f'python3 "{guard}"' +hooks = data.setdefault("hooks", {}).setdefault("PreToolUse", []) +blob = json.dumps(hooks) +if "destructive_guard.py" in blob: + print(" = destructive guard already registered (Claude Code)"); raise SystemExit(0) +hooks.append({"matcher": "Bash", "hooks": [{"type": "command", "command": cmd, "timeout": 10}]}) +json.dump(data, open(settings, "w", encoding="utf-8"), indent=2) +print(" ✓ destructive guard registered (Claude Code PreToolUse)") +PY +fi + +echo "Done. The agent now has the fablize disciplines wired to the memory tools (see INTEGRATION.md)." diff --git a/fablize/packs/clarify-pack.txt b/fablize/packs/clarify-pack.txt new file mode 100644 index 00000000..cb78980e --- /dev/null +++ b/fablize/packs/clarify-pack.txt @@ -0,0 +1,20 @@ + + +The most expensive token waste is rework: starting an ambiguous task, building the wrong thing, and rebuilding it after the user says "that's not what I meant." One round of targeted questions costs almost nothing against a full rebuild. Close the START of the task, not just its end. + +Apply this ONLY when getting it wrong would be expensive to redo — an open-ended or multi-file build, a design/UI artifact, a new feature with unstated scope, anything where you are about to commit real work on underspecified requirements. If the request is already specific (exact files named, concrete requirements, a closed answer), SKIP this entirely and just do it. This is not "always ask" — asking on an already-clear task is its own waste. + +The discipline, before you write anything: + +0. 
RESOLVE FROM THE CODE FIRST (when the codebase-memory tools are available — this project ships them). An unknown you can answer from the codebase is not a question for the user. Before drafting questions, call `get_architecture` for the lay of the land and `search_graph` / `search_code` to check how the relevant area is already built (existing patterns, naming, data shapes, call sites). Asking a question the graph already answers is its own waste: a graph query is cheaper than an AskUserQuestion round. Only genuinely undecided, judgment-call unknowns survive to step 1. + +1. SURFACE THE UNKNOWNS. List what is genuinely undecided and would change what you build: scope (how far does this go?), inputs/outputs, tech stack / dependencies, visual or API style, data shape, and — most important — the "done" criterion (how will we both know it's correct?). + +2. ASK ONCE, BATCHED. Put 1–4 of the highest-leverage unknowns into a SINGLE AskUserQuestion call — not one question per turn, drip by drip. Each question: a clear prompt, 2–4 concrete options, the recommended default as the first option. Only ask what actually changes the work; do not ask what you can safely default or infer from context. + +3. LOCK THE SPEC. Treat the answers (plus anything already explicit in the request) as the agreed specification. State it back in one or two lines and build against it. If a spec ledger is in use, record it so a later session does not re-ask — see scripts/spec.py (lock / show). + +4. THEN BUILD. Once the spec is locked, proceed without re-litigating decisions the user already made. If a genuinely new ambiguity appears mid-build, fold it into the next natural checkpoint rather than stopping for every micro-question. + +The trigger test: "Could building the wrong thing here cost a full redo?" If yes, clarify before starting. If no, just start. 
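Step 3 of the clarify pack (LOCK THE SPEC) can be sketched as a small record builder that also produces the one-line restatement. This is an illustration under stated assumptions rather than the behavior of `scripts/spec.py`: the `question::answer` form is taken from the `--decision` flag shown in `fablize/AGENTS.md`, while `parse_decision`, `lock_spec`, `restate`, and the record layout are hypothetical.

```python
def parse_decision(raw):
    """Split one 'question::answer' string into a decision record."""
    question, sep, answer = raw.partition("::")
    if not sep or not question.strip() or not answer.strip():
        raise ValueError(f"expected 'question::answer', got: {raw!r}")
    return {"question": question.strip(), "answer": answer.strip()}


def lock_spec(brief, requirements, constraints, decisions):
    """Build the locked-spec record (hypothetical layout)."""
    return {
        "brief": brief,
        "requirements": list(requirements),
        "constraints": list(constraints),
        "decisions": [parse_decision(d) for d in decisions],
    }


def restate(spec):
    """One-line restatement to read back to the user before building."""
    answers = "; ".join(
        f"{d['question']} -> {d['answer']}" for d in spec["decisions"]
    )
    return f"{spec['brief']} ({answers})"
```

Rejecting a decision with no separator or an empty side keeps a half-answered question from being recorded as settled, which is the failure the lock step is meant to prevent.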
+ diff --git a/fablize/packs/investigation-protocol.txt b/fablize/packs/investigation-protocol.txt new file mode 100644 index 00000000..df388775 --- /dev/null +++ b/fablize/packs/investigation-protocol.txt @@ -0,0 +1,25 @@ + + +When debugging, follow this discipline: + +1. Reproduce first. Run the failing case and read the actual output before forming any hypothesis. + +2. Develop several competing hypotheses — at least three — before investigating any single one. A symptom that pattern-matches to a known failure may have a different cause. The most visible signal in the logs is not necessarily the root cause; treat it as one hypothesis among several, not the conclusion. + +3. For each hypothesis, identify what evidence would confirm or refute it, then gather that evidence by reading the relevant code paths end to end. Track your confidence per hypothesis as evidence accumulates. + +4. Trace the full causal chain. Do not stop at the first plausible cause: ask what allowed that cause to produce this symptom, and whether removing only the visible trigger would leave the defect latent. A fix that makes the test pass is not necessarily a fix that removes the defect. + +5. Verify before and after. Confirm the root cause with evidence before changing code. After the fix, demonstrate that the failure mode itself is gone — not merely that the triggering condition no longer occurs in this environment. + +6. In your report, state the hypotheses you rejected and the evidence that rejected them. + +MEMORY-GRAPH AUGMENTATION (when the codebase-memory tools are available — this project ships them): +Steps 3 and 4 are exactly what the structural graph does faster and more completely than grep: +- To find a definition or its relationships, call `search_graph` (BM25 query, name_pattern, or semantic_query) INSTEAD OF grep/glob — it returns the qualified_name and structural links, not just text hits. 
+- To trace the causal chain (step 4), call `trace_path` on the suspect function: direction:"inbound" for "who can reach this symptom" (callers/impact), direction:"outbound" for "what this touches", and mode:"data_flow" to follow a bad value hop by hop with the arg expression at each step. The chain IS the path the tool returns — step 4 becomes literal instead of manual. +- To read a suspect's source, call `get_code_snippet` with the qualified_name from search_graph. +- For performance/complexity defects, `query_graph` exposes per-function hot-path signals (transitive_loop_depth, linear_scan_in_loop, recursion_in_loop) — query them to locate the real bottleneck instead of guessing. +- After a runtime reproduction, `ingest_traces` folds the observed trace back into the graph so the next query reflects what actually executed. +The discipline (compete hypotheses, trace the WHOLE chain, verify before/after) is unchanged — the tools just make each step cheaper and more complete. + diff --git a/fablize/packs/orient-pack.txt b/fablize/packs/orient-pack.txt new file mode 100644 index 00000000..2ab9340d --- /dev/null +++ b/fablize/packs/orient-pack.txt @@ -0,0 +1,18 @@ + + +Before working a non-trivial change in an unfamiliar area, build the map before you touch the territory. Editing code you have only grepped is how an agent breaks a caller it never saw. This project ships a structural knowledge graph of the codebase — use it to orient first. + +Apply this when: the task touches code you have not already read this session, spans more than one file, or asks "where/how does X work?". SKIP it for a self-contained one-liner in a file already in front of you — orienting a known area is its own waste. + +The orientation loop, before you edit: + +1. ENSURE THE GRAPH EXISTS. If this repo is not indexed yet, call `index_repository` (check with `list_projects` / `index_status`). One-time per repo; incremental after. + +2. GET THE LAY OF THE LAND. 
Call `get_architecture` for packages, services, dependencies, and the Leiden clusters — the de-facto modules and architectural seams, which often cut across the folder layout. This tells you which seam your change lands in. + +3. LOCATE THE WORK. Call `search_graph` (natural-language query, name_pattern, or semantic_query) INSTEAD OF grep/glob to find the exact symbols you will touch — you get qualified_names and structural links, not just text hits. Read the target with `get_code_snippet`. + +4. KNOW THE BLAST RADIUS. Before changing a shared symbol, call `trace_path` (direction:"inbound") to see every caller you must not break. This is the difference between a local edit and a silent regression. + +Then proceed into the normal loop (clarify / multi-story / build) already knowing the map. The graph is the cheapest orientation available — sub-millisecond queries instead of a file-by-file crawl that burns tokens and still misses cross-file edges. + diff --git a/fablize/packs/verification-grounding-pack.txt b/fablize/packs/verification-grounding-pack.txt new file mode 100644 index 00000000..085c3aa5 --- /dev/null +++ b/fablize/packs/verification-grounding-pack.txt @@ -0,0 +1,20 @@ + +When you produce an artifact whose correctness can only be confirmed by running or rendering it — an HTML page, an SVG, a game, a UI, a chart, a script with observable output, an animation — do not stop at writing the file and telling the user to open it. Before you declare the work done, run it in its natural execution environment and observe the actual output yourself. + +This is a verification MODALITY, not extra testing for its own sake. The point is not "write more tests"; it is "see the thing actually behave." A static parse (xmllint, node --check, HTMLParser, minidom) confirms the file is well-formed — it does NOT confirm the artifact looks or behaves correctly. Well-formed and correct are different claims. + +The grounding loop, before completion: + +1. 
RUN IT in the real renderer. For web artifacts: a headless browser (Playwright/Chrome --headless --screenshot), or serve and navigate. For SVG: render to PNG. For scripts: execute and capture stdout/stderr. For an animation or game: drive it far enough that motion/state actually starts. + +2. OBSERVE THE OUTPUT. Read the screenshot back. Read the console for errors. Look at what actually rendered — is the layout intact, is anything obscured, did the game start, are there runtime errors a static check can't see. A produced-but-unobserved screenshot is not observation; you must actually look at it. + +3. FIX WHAT THE OBSERVATION REVEALS, then re-run. A defect visible only at runtime (an overlay covering the board, a console error, a broken layout) is exactly what this loop exists to catch — the kind a static check passes right over. + +STRUCTURAL GROUNDING (when the codebase-memory tools are available — this project ships them). For a code change (not a rendered artifact), the analogue of "observe the output" is confirming the change had the structural effect you intended and no unintended one: after editing, call `detect_changes` to see the changed symbols and their impact radius, and `trace_path` (direction:"inbound") on anything you altered to confirm you accounted for every caller. This catches the silent breakage a passing local test misses — a caller you forgot exists. It complements, rather than replaces, running the artifact as described above. + +Apply this only to artifacts with an observable execution result. Pure text, prose, configuration, or plain logic that has its own test suite does not need rendering — for those, the relevant grounding is running the tests, which you already do. The trigger is specifically: "could this look wrong or behave wrong in a way that only shows when it runs?" If yes, run it and look before you finish. + +Stop when you have actually looked, not after a fixed number of checks. 
One clean observation of the rendered output is enough — if the first render shows the artifact behaving and looking correct, you are done; do not re-render the same unchanged state to accumulate confidence. Re-render only after you change something: each defect the observation reveals gets one fix and one re-check, and you stop again once that check is clean. The goal is "I saw it work," not "I checked it N times." Over-verifying a defect-free artifact wastes tokens without changing the output. + diff --git a/fablize/scripts/bundle.py b/fablize/scripts/bundle.py new file mode 100644 index 00000000..dfe6f857 --- /dev/null +++ b/fablize/scripts/bundle.py @@ -0,0 +1,141 @@ +#!/usr/bin/env python3 +"""fablize bundle — build a portable, tool-agnostic package of the disciplines. + +Produces `dist/fablize-portable/` (and a `.zip`) containing everything needed to apply +fablize to ANY AI agent — Claude Code, Cursor, Copilot, Gemini, Aider, Codex — with no +plugin install and no dependencies. Send the zip to anyone; they unzip and run apply.sh +in their project. + +What goes in the bundle: + AGENTS.md - the universal operating block (read by most agents) + packs/ - the verified discipline packs (plain text) + scripts/ - goals.py, spec.py, metrics.py (stdlib-only Python) + apply.sh - drops AGENTS.md + packs/ + scripts/ into a target project + QUICKSTART.md - per-tool wiring instructions + +Usage: + python3 scripts/bundle.py # build dist/fablize-portable + .zip + python3 scripts/bundle.py --out /tmp/x +""" +import argparse +import shutil +import zipfile +from pathlib import Path + +ROOT = Path(__file__).resolve().parent.parent + +APPLY_SH = """#!/usr/bin/env bash +# Apply fablize disciplines to a project (any AI agent). 
Usage: bash apply.sh [target-dir] +set -euo pipefail +HERE="$(cd "$(dirname "$0")" && pwd)" +TARGET="${1:-$PWD}" +echo "fablize → $TARGET" +mkdir -p "$TARGET/packs" "$TARGET/scripts" +cp "$HERE/packs/"*.txt "$TARGET/packs/" +cp "$HERE/scripts/"*.py "$TARGET/scripts/" +# Append the operating block to the agent instruction file(s) present, else create AGENTS.md. +block="$HERE/AGENTS.md" +wrote=0 +for f in AGENTS.md CLAUDE.md .cursorrules .github/copilot-instructions.md GEMINI.md; do + path="$TARGET/$f" + if [ -f "$path" ]; then + if ! grep -q "fablize — operating disciplines" "$path" 2>/dev/null; then + { printf '\\n\\n'; cat "$block"; } >> "$path" + echo " ✓ appended disciplines to $f" + else + echo " = $f already has fablize disciplines" + fi + wrote=1 + fi +done +if [ "$wrote" -eq 0 ]; then + cp "$block" "$TARGET/AGENTS.md" + echo " ✓ created AGENTS.md" +fi +echo "Done. Your agent now has the fablize disciplines. See QUICKSTART.md for per-tool notes." +""" + +QUICKSTART = """# fablize portable — quickstart + +These are the fablize operating disciplines packaged for **any** AI coding agent. +No plugin, no install, no dependencies (the scripts are pure-Python stdlib). + +## Apply to a project + +```bash +bash apply.sh /path/to/your/project # or just `bash apply.sh` inside the project +``` + +This copies `packs/` + `scripts/` into the project and adds the operating block to +whichever instruction file your agent reads. + +## How each tool picks it up + +| Agent | Reads | +|------------------|------------------------------------| +| Claude Code | `CLAUDE.md` / `AGENTS.md` | +| Cursor | `.cursorrules` / `AGENTS.md` | +| GitHub Copilot | `.github/copilot-instructions.md` | +| Gemini CLI | `GEMINI.md` / `AGENTS.md` | +| Aider / Codex / | `AGENTS.md` | +| others | | + +`apply.sh` appends to any of these that already exist, otherwise it creates `AGENTS.md` +(the emerging cross-tool standard). + +## Use it + +The agent now follows the disciplines automatically by task type. 
You can also drive the +engines yourself from a shell: + +```bash +python3 scripts/spec.py lock --req "..." --decision "q::a" # lock a clarified spec +python3 scripts/goals.py create --brief "..." --goal "a::x" --goal "verify::y" +python3 scripts/goals.py next +python3 scripts/metrics.py # observability summary +``` + +MIT licensed. Source: https://github.com/fivetaku/fablize +""" + + +def build(out_dir): + pkg = out_dir / "fablize-portable" + if pkg.exists(): + shutil.rmtree(pkg) + (pkg / "packs").mkdir(parents=True) + (pkg / "scripts").mkdir(parents=True) + + shutil.copy2(ROOT / "AGENTS.md", pkg / "AGENTS.md") + for f in (ROOT / "packs").glob("*.txt"): + shutil.copy2(f, pkg / "packs" / f.name) + for name in ("goals.py", "spec.py", "metrics.py"): + shutil.copy2(ROOT / "scripts" / name, pkg / "scripts" / name) + (pkg / "apply.sh").write_text(APPLY_SH, encoding="utf-8") + (pkg / "apply.sh").chmod(0o755) + (pkg / "QUICKSTART.md").write_text(QUICKSTART, encoding="utf-8") + shutil.copy2(ROOT / "README.md", pkg / "README.md") + + zip_path = out_dir / "fablize-portable.zip" + with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as z: + for p in pkg.rglob("*"): + z.write(p, p.relative_to(out_dir)) + return pkg, zip_path + + +def main(): + ap = argparse.ArgumentParser(prog="bundle.py") + ap.add_argument("--out", default=str(ROOT / "dist")) + a = ap.parse_args() + out = Path(a.out) + out.mkdir(parents=True, exist_ok=True) + pkg, zip_path = build(out) + n_files = sum(1 for _ in pkg.rglob("*") if _.is_file()) + print(f"fablize: portable bundle built — {n_files} files") + print(f" dir: {pkg}") + print(f" zip: {zip_path} (send this to anyone)") + print(" apply: unzip → bash apply.sh /path/to/project") + + +if __name__ == "__main__": + main() diff --git a/fablize/scripts/goals.py b/fablize/scripts/goals.py new file mode 100644 index 00000000..d0483275 --- /dev/null +++ b/fablize/scripts/goals.py @@ -0,0 +1,186 @@ +#!/usr/bin/env python3 +"""fablize goal engine — a 
self-contained, stdlib-only multi-story loop with a verification gate. + +Design (behavior only): + - Decompose a task into sequential stories, persisted to a ledger (.fablize/) — survives session death. + - A story can be checkpointed only after `next` activates it. + - A `complete` checkpoint requires non-empty evidence. + - The final story cannot complete without a verify command + result (the verification gate). + +Usage: + goals.py create --brief "..." --goal "title::objective" [--goal ...] + goals.py next # activate the next story + print a handoff + goals.py checkpoint --id G001 --status complete|failed|blocked --evidence "..." + [--verify-cmd "" --verify-evidence ""] # required on the final story + goals.py status +State directory: ./.fablize/ (run from the repo root) +""" +import argparse +import json +import sys +from datetime import datetime, timezone +from pathlib import Path + +DIR = Path(".fablize") +GOALS = DIR / "goals.json" +LEDGER = DIR / "ledger.jsonl" +# Global, cross-project event stream for observability (metrics.py reads this). 
+GLOBAL_LOG = Path.home() / ".fablize" / "events.jsonl" +ESCALATE_AFTER = 2 # blocked attempts on one story before the engine forces escalation + + +def now(): + return datetime.now(timezone.utc).isoformat() + + +def log(event, **kw): + DIR.mkdir(exist_ok=True) + rec = {"ts": now(), "event": event, **kw} + with open(LEDGER, "a", encoding="utf-8") as f: + f.write(json.dumps(rec, ensure_ascii=False) + "\n") + try: + GLOBAL_LOG.parent.mkdir(exist_ok=True) + with open(GLOBAL_LOG, "a", encoding="utf-8") as f: + f.write(json.dumps({**rec, "tool": "goals", "cwd": str(Path.cwd())}, ensure_ascii=False) + "\n") + except OSError: + pass # never let observability break the engine + + +def load(): + if not GOALS.exists(): + sys.exit("fablize: no plan — run `create` from the repo root first.") + return json.loads(GOALS.read_text(encoding="utf-8")) + + +def save(plan): + DIR.mkdir(exist_ok=True) + GOALS.write_text(json.dumps(plan, ensure_ascii=False, indent=1), encoding="utf-8") + + +def cmd_create(a): + if GOALS.exists() and not a.force: + sys.exit("fablize: a plan already exists. 
Check it with `status`, or replace it with --force.") + goals = [] + for i, g in enumerate(a.goal, 1): + if "::" not in g: + sys.exit(f"fablize: --goal format is 'title::objective' — invalid: {g}") + title, obj = g.split("::", 1) + goals.append({"id": f"G{i:03d}", "title": title.strip(), "objective": obj.strip(), + "status": "pending", "evidence": None, "attempts": 0}) + if not goals: + sys.exit("fablize: at least one --goal is required.") + save({"brief": a.brief, "created": now(), "goals": goals}) + log("plan_created", brief=a.brief, count=len(goals)) + print(f"fablize: plan created — {len(goals)} stories") + for g in goals: + print(f" {g['id']} {g['title']}: {g['objective']}") + + +def cmd_next(a): + plan = load() + active = [g for g in plan["goals"] if g["status"] == "in_progress"] + if active: + g = active[0] + else: + pending = [g for g in plan["goals"] if g["status"] == "pending"] + if not pending: + blocked = [g for g in plan["goals"] if g["status"] in ("blocked", "failed")] + if blocked: + print(f"fablize: no pending stories, but {len(blocked)} blocked or failed — reopen one with " + f"`retry --id {blocked[0]['id']}` or report the blocker.") + else: + print("fablize: all stories complete ✓") + return + g = pending[0] + g["status"] = "in_progress" + save(plan); log("story_started", id=g["id"], title=g["title"]) + is_final = g["id"] == plan["goals"][-1]["id"] + print(f"=== fablize handoff — {g['id']} {g['title']}") + print(f"Objective: {g['objective']}") + print("Rule: work this story only. 
Produce evidence as you go.") + if is_final: + print("★ Final story — the complete checkpoint requires --verify-cmd and --verify-evidence (verification gate).") + print(f"On completion: goals.py checkpoint --id {g['id']} --status complete --evidence \"\"" + + (" --verify-cmd \"\" --verify-evidence \"\"" if is_final else "")) + + +def cmd_retry(a): + plan = load() + g = next((x for x in plan["goals"] if x["id"] == a.id), None) + if not g: + sys.exit(f"fablize: {a.id} not found.") + if g["status"] not in ("blocked", "failed"): + sys.exit(f"fablize: {a.id} is {g['status']} — only a blocked/failed story can be retried.") + g["status"] = "in_progress" + save(plan) + attempt = g.get("attempts", 0) + 1 + log("story_started", id=g["id"], title=g["title"], retry=True, attempt=attempt) + print(f"↻ fablize retry — {g['id']} (attempt {attempt}); escalates at {ESCALATE_AFTER} failures.") + print(f"Objective: {g['objective']}") + + +def cmd_checkpoint(a): + plan = load() + g = next((x for x in plan["goals"] if x["id"] == a.id), None) + if not g: + sys.exit(f"fablize: {a.id} not found.") + if g["status"] != "in_progress": + sys.exit(f"fablize: {a.id} is not active ({g['status']}) — activate it with `next` first.") + if a.status == "complete": + if not (a.evidence and a.evidence.strip()): + sys.exit("fablize: a complete checkpoint requires non-empty --evidence.") + if g["id"] == plan["goals"][-1]["id"]: + if not (a.verify_cmd and a.verify_cmd.strip() and a.verify_evidence and a.verify_evidence.strip()): + sys.exit("fablize: the final story cannot complete without --verify-cmd and --verify-evidence (verification gate).") + # Self-correction counter: a blocked/failed checkpoint is an attempt. After ESCALATE_AFTER + # of them on the same story, the engine stops the retry spiral and prints an escalation handoff + # (best-practice: bounded self-correction, then escalate — never loop forever). 
+ if a.status in ("blocked", "failed"): + g["attempts"] = g.get("attempts", 0) + 1 + g["status"] = a.status + g["evidence"] = a.evidence + save(plan) + log("checkpoint", id=g["id"], status=a.status, evidence=a.evidence, + attempts=g.get("attempts", 0), verify_cmd=a.verify_cmd, verify_evidence=a.verify_evidence) + print(f"fablize: {g['id']} → {a.status}") + if a.status in ("blocked", "failed") and g.get("attempts", 0) >= ESCALATE_AFTER: + log("escalation_triggered", id=g["id"], attempts=g["attempts"]) + print(f"★ fablize escalation gate — {g['id']} has failed {g['attempts']}× (≥{ESCALATE_AFTER}).") + print(" This is likely the model's capability ceiling, not a procedure gap. In order:") + print(" 1) recommend `/effort xhigh` to push the current model to its ceiling;") + print(" 2) hand off to a stronger model in a fresh session with an evidence package") + print(" (symptoms, attempts, failure point, repro);") + print(" 3) otherwise report the limit honestly and name where a human must step in.") + return + remaining = [x for x in plan["goals"] if x["status"] != "complete"] + print("fablize: all stories complete ✓" if not remaining else f"fablize: {len(remaining)} stories left — continue with `next`.") + + +def cmd_status(a): + plan = load() + done = sum(1 for g in plan["goals"] if g["status"] == "complete") + print(f"fablize: {done}/{len(plan['goals'])} complete — {plan['brief']}") + mark = {"complete": "✓", "in_progress": "▶", "pending": "·", "failed": "✗", "blocked": "■"} + for g in plan["goals"]: + print(f" {mark.get(g['status'],'?')} {g['id']} [{g['status']}] {g['title']}") + + +def main(): + p = argparse.ArgumentParser(prog="goals.py") + sub = p.add_subparsers(dest="cmd", required=True) + c = sub.add_parser("create"); c.add_argument("--brief", required=True) + c.add_argument("--goal", action="append", default=[]); c.add_argument("--force", action="store_true") + sub.add_parser("next") + k = sub.add_parser("checkpoint"); k.add_argument("--id", 
required=True) + k.add_argument("--status", required=True, choices=["complete", "failed", "blocked"]) + k.add_argument("--evidence", default=""); k.add_argument("--verify-cmd", dest="verify_cmd", default="") + k.add_argument("--verify-evidence", dest="verify_evidence", default="") + rt = sub.add_parser("retry"); rt.add_argument("--id", required=True) + sub.add_parser("status") + a = p.parse_args() + {"create": cmd_create, "next": cmd_next, "checkpoint": cmd_checkpoint, + "retry": cmd_retry, "status": cmd_status}[a.cmd](a) + + +if __name__ == "__main__": + main() diff --git a/fablize/scripts/metrics.py b/fablize/scripts/metrics.py new file mode 100644 index 00000000..448b6261 --- /dev/null +++ b/fablize/scripts/metrics.py @@ -0,0 +1,83 @@ +#!/usr/bin/env python3 +"""fablize metrics — summarize the cross-project event stream (~/.fablize/events.jsonl). + +This is the observability layer: it turns the raw event log written by goals.py / spec.py +into real, queryable numbers (how many plans, completion rate, how often work hit the +escalation gate, how many specs were locked). It gives the "verified-only" philosophy +actual data to decide on, instead of self-assessment. 
+ +Usage: + metrics.py # human-readable summary + metrics.py --json # machine-readable + metrics.py --since 2026-06-01 # only events on/after this ISO date +""" +import argparse +import json +from collections import Counter +from pathlib import Path + +GLOBAL_LOG = Path.home() / ".fablize" / "events.jsonl" + + +def read_events(since=""): + if not GLOBAL_LOG.exists(): + return [] + out = [] + for line in GLOBAL_LOG.read_text(encoding="utf-8").splitlines(): + line = line.strip() + if not line: + continue + try: + rec = json.loads(line) + except ValueError: + continue + if since and rec.get("ts", "") < since: + continue + out.append(rec) + return out + + +def summarize(events): + ev = Counter(e.get("event") for e in events) + checkpoints = [e for e in events if e.get("event") == "checkpoint"] + statuses = Counter(c.get("status") for c in checkpoints) + completed = statuses.get("complete", 0) + total_ck = len(checkpoints) + projects = {e.get("cwd") for e in events if e.get("cwd")} + return { + "events_total": len(events), + "plans_created": ev.get("plan_created", 0), + "stories_started": ev.get("story_started", 0), + "checkpoints": total_ck, + "checkpoint_status": dict(statuses), + "completion_rate": round(completed / total_ck, 3) if total_ck else None, + "escalations": ev.get("escalation_triggered", 0), + "specs_locked": ev.get("spec_locked", 0), + "projects": len(projects), + } + + +def main(): + p = argparse.ArgumentParser(prog="metrics.py") + p.add_argument("--json", action="store_true") + p.add_argument("--since", default="") + a = p.parse_args() + s = summarize(read_events(a.since)) + if a.json: + print(json.dumps(s, ensure_ascii=False, indent=2)) + return + if not s["events_total"]: + print("fablize: no events yet (~/.fablize/events.jsonl is empty). 
Run a goals/spec flow first.") + return + print(f"fablize metrics{(' since ' + a.since) if a.since else ''} — {s['events_total']} events across {s['projects']} project(s)") + print(f" plans created : {s['plans_created']}") + print(f" stories started : {s['stories_started']}") + print(f" checkpoints : {s['checkpoints']} {s['checkpoint_status']}") + rate = f"{s['completion_rate']*100:.1f}%" if s["completion_rate"] is not None else "n/a" + print(f" completion rate : {rate}") + print(f" escalation gate : {s['escalations']} hit(s)") + print(f" specs locked : {s['specs_locked']}") + + +if __name__ == "__main__": + main() diff --git a/fablize/scripts/spec.py b/fablize/scripts/spec.py new file mode 100644 index 00000000..1eef6c97 --- /dev/null +++ b/fablize/scripts/spec.py @@ -0,0 +1,101 @@ +#!/usr/bin/env python3 +"""fablize spec ledger — a self-contained, stdlib-only locked-spec store. + +Purpose (behavior only): + - After clarifying an ambiguous task, lock the agreed spec to a ledger (.fablize/) so a + later session (after compaction or restart) reads it instead of re-asking the user. + - Prevents doing the same clarification — and the same work — two or three times. + +Usage: + spec.py lock --req "..." [--req ...] 
[--constraint "..."] [--decision "question::answer"] [--brief "..."] + spec.py show # first command when resuming — prints the locked spec +State directory: ./.fablize/ (run from the repo root) +""" +import argparse +import json +import sys +from datetime import datetime, timezone +from pathlib import Path + +DIR = Path(".fablize") +SPEC = DIR / "spec.json" +LEDGER = DIR / "ledger.jsonl" +GLOBAL_LOG = Path.home() / ".fablize" / "events.jsonl" + + +def now(): + return datetime.now(timezone.utc).isoformat() + + +def log(event, **kw): + DIR.mkdir(exist_ok=True) + rec = {"ts": now(), "event": event, **kw} + with open(LEDGER, "a", encoding="utf-8") as f: + f.write(json.dumps(rec, ensure_ascii=False) + "\n") + try: + GLOBAL_LOG.parent.mkdir(exist_ok=True) + with open(GLOBAL_LOG, "a", encoding="utf-8") as f: + f.write(json.dumps({**rec, "tool": "spec", "cwd": str(Path.cwd())}, ensure_ascii=False) + "\n") + except OSError: + pass + + +def cmd_lock(a): + reqs = [r.strip() for r in a.req if r.strip()] + if not reqs and not a.decision and not a.constraint: + sys.exit("fablize: lock needs at least one --req, --constraint, or --decision.") + decisions = [] + for d in a.decision: + if "::" not in d: + sys.exit(f"fablize: --decision format is 'question::answer' — invalid: {d}") + q, ans = d.split("::", 1) + decisions.append({"question": q.strip(), "answer": ans.strip()}) + spec = { + "brief": a.brief, + "locked": now(), + "requirements": reqs, + "constraints": [c.strip() for c in a.constraint if c.strip()], + "decisions": decisions, + } + DIR.mkdir(exist_ok=True) + SPEC.write_text(json.dumps(spec, ensure_ascii=False, indent=1), encoding="utf-8") + log("spec_locked", reqs=len(reqs), constraints=len(spec["constraints"]), decisions=len(decisions)) + print(f"fablize: spec locked — {len(reqs)} requirement(s), {len(decisions)} decision(s) → {SPEC}") + print("fablize: build against this; do not re-ask the user what is recorded here.") + + +def cmd_show(a): + if not SPEC.exists(): + 
print("fablize: no locked spec yet. After clarifying, record it with `spec.py lock`.") + return + spec = json.loads(SPEC.read_text(encoding="utf-8")) + print(f"fablize: locked spec — {spec.get('brief') or '(no brief)'} [locked {spec.get('locked','?')}]") + if spec.get("requirements"): + print("Requirements:") + for r in spec["requirements"]: + print(f" • {r}") + if spec.get("constraints"): + print("Constraints:") + for c in spec["constraints"]: + print(f" • {c}") + if spec.get("decisions"): + print("Decisions:") + for d in spec["decisions"]: + print(f" • {d['question']} → {d['answer']}") + + +def main(): + p = argparse.ArgumentParser(prog="spec.py") + sub = p.add_subparsers(dest="cmd", required=True) + lk = sub.add_parser("lock") + lk.add_argument("--brief", default="") + lk.add_argument("--req", action="append", default=[]) + lk.add_argument("--constraint", action="append", default=[]) + lk.add_argument("--decision", action="append", default=[]) + sub.add_parser("show") + a = p.parse_args() + {"lock": cmd_lock, "show": cmd_show}[a.cmd](a) + + +if __name__ == "__main__": + main() diff --git a/fablize/tests/test_fablize.py b/fablize/tests/test_fablize.py new file mode 100644 index 00000000..b7f9abc7 --- /dev/null +++ b/fablize/tests/test_fablize.py @@ -0,0 +1,158 @@ +#!/usr/bin/env python3 +"""fablize test suite — stdlib unittest, no third-party deps (portable everywhere). + +Runs the engines as real subprocesses in an isolated temp HOME/CWD so the suite never +touches the developer's real ~/.fablize or repo state. Covers the invariants that ARE +the product: evidence-gated completion, the final verification gate, the bounded +self-correction → escalation counter, and the metrics summary. 
+""" +import json +import os +import subprocess +import sys +import tempfile +import unittest +from pathlib import Path + +ROOT = Path(__file__).resolve().parent.parent +GOALS = str(ROOT / "scripts" / "goals.py") +SPEC = str(ROOT / "scripts" / "spec.py") +METRICS = str(ROOT / "scripts" / "metrics.py") +GUARD = str(ROOT / "hooks" / "destructive_guard.py") + + +class Base(unittest.TestCase): + def setUp(self): + self.tmp = tempfile.mkdtemp() + self.env = dict(os.environ, HOME=self.tmp) + + def run_script(self, script, *args, stdin=None): + return subprocess.run( + [sys.executable, script, *args], + cwd=self.tmp, env=self.env, input=stdin, + capture_output=True, text=True, + ) + + +class GoalsTests(Base): + def _create(self): + return self.run_script(GOALS, "create", "--brief", "demo", + "--goal", "build::do the thing", + "--goal", "verify::prove it works") + + def test_create_and_status(self): + r = self._create() + self.assertEqual(r.returncode, 0, r.stderr) + self.assertIn("2 stories", r.stdout) + s = self.run_script(GOALS, "status") + self.assertIn("0/2 complete", s.stdout) + + def test_complete_requires_evidence(self): + self._create() + self.run_script(GOALS, "next") + r = self.run_script(GOALS, "checkpoint", "--id", "G001", "--status", "complete", "--evidence", "") + self.assertNotEqual(r.returncode, 0) + self.assertIn("non-empty --evidence", r.stderr) + + def test_checkpoint_requires_active(self): + self._create() + # G001 never activated via `next` + r = self.run_script(GOALS, "checkpoint", "--id", "G001", "--status", "complete", "--evidence", "x") + self.assertNotEqual(r.returncode, 0) + self.assertIn("not active", r.stderr) + + def test_final_story_verification_gate(self): + self._create() + self.run_script(GOALS, "next") + self.run_script(GOALS, "checkpoint", "--id", "G001", "--status", "complete", "--evidence", "built") + self.run_script(GOALS, "next") # activates final G002 + # final without verify args must fail + r = self.run_script(GOALS, 
"checkpoint", "--id", "G002", "--status", "complete", "--evidence", "done") + self.assertNotEqual(r.returncode, 0) + self.assertIn("verification gate", r.stderr) + # with verify args it succeeds + ok = self.run_script(GOALS, "checkpoint", "--id", "G002", "--status", "complete", + "--evidence", "done", "--verify-cmd", "pytest", "--verify-evidence", "12 passed") + self.assertEqual(ok.returncode, 0, ok.stderr) + self.assertIn("all stories complete", ok.stdout) + + def test_bounded_escalation(self): + self._create() + # attempt 1 + self.run_script(GOALS, "next") + r1 = self.run_script(GOALS, "checkpoint", "--id", "G001", "--status", "blocked", "--evidence", "stuck") + self.assertNotIn("escalation gate", r1.stdout) + # retry → attempt 2 → escalation + rt = self.run_script(GOALS, "retry", "--id", "G001") + self.assertIn("attempt 2", rt.stdout) + r2 = self.run_script(GOALS, "checkpoint", "--id", "G001", "--status", "blocked", "--evidence", "still stuck") + self.assertIn("escalation gate", r2.stdout) + self.assertIn("effort xhigh", r2.stdout) + + def test_global_event_log_written(self): + self._create() + log = Path(self.tmp) / ".fablize" / "events.jsonl" + self.assertTrue(log.exists()) + lines = [json.loads(x) for x in log.read_text().splitlines() if x.strip()] + self.assertTrue(any(e["event"] == "plan_created" and e["tool"] == "goals" for e in lines)) + + +class SpecTests(Base): + def test_lock_needs_something(self): + r = self.run_script(SPEC, "lock", "--brief", "x") + self.assertNotEqual(r.returncode, 0) + self.assertIn("at least one", r.stderr) + + def test_lock_and_show(self): + r = self.run_script(SPEC, "lock", "--brief", "auth", "--req", "use OAuth", + "--decision", "db::postgres") + self.assertEqual(r.returncode, 0, r.stderr) + s = self.run_script(SPEC, "show") + self.assertIn("use OAuth", s.stdout) + self.assertIn("postgres", s.stdout) + + def test_show_empty(self): + s = self.run_script(SPEC, "show") + self.assertIn("no locked spec", s.stdout) + + +class 
MetricsTests(Base): + def test_summary_after_flow(self): + self.run_script(GOALS, "create", "--brief", "m", "--goal", "a::x", "--goal", "v::y") + self.run_script(SPEC, "lock", "--req", "r1") + r = self.run_script(METRICS, "--json") + data = json.loads(r.stdout) + self.assertEqual(data["plans_created"], 1) + self.assertEqual(data["specs_locked"], 1) + + def test_empty_metrics(self): + r = self.run_script(METRICS) + self.assertIn("no events yet", r.stdout) + + +class GuardTests(Base): + def _check(self, command): + payload = json.dumps({"tool_name": "Bash", "tool_input": {"command": command}}) + return self.run_script(GUARD, stdin=payload) + + def test_blocks_rm_rf(self): + r = self._check("rm -rf /tmp/stuff") + self.assertIn("permissionDecision", r.stdout) + self.assertIn("ask", r.stdout) + + def test_blocks_force_push(self): + r = self._check("git push origin main --force") + self.assertIn("ask", r.stdout) + + def test_allows_safe_command(self): + r = self._check("ls -la && git status") + self.assertEqual(r.stdout.strip(), "") + + def test_ignores_non_bash(self): + payload = json.dumps({"tool_name": "Read", "tool_input": {"file_path": "/x"}}) + r = self.run_script(GUARD, stdin=payload) + self.assertEqual(r.stdout.strip(), "") + + +if __name__ == "__main__": + unittest.main(verbosity=2) diff --git a/install-combined.sh b/install-combined.sh new file mode 100755 index 00000000..b9503499 --- /dev/null +++ b/install-combined.sh @@ -0,0 +1,34 @@ +#!/usr/bin/env bash +# Combined installer: codebase-memory-mcp (memory layer) + fablize (procedure layer). +# One command sets up both. The C core is built if needed, registered as an MCP server for +# your agents, then the fablize disciplines are applied to the current project. 
+# Usage: bash install-combined.sh [target-project-dir] (default: current directory) +set -euo pipefail +ROOT="$(cd "$(dirname "$0")" && pwd)" +TARGET="${1:-$PWD}" +BIN="$ROOT/build/c/codebase-memory-mcp" + +echo "=== codebase-memory-mcp + fablize — combined install ===" + +# 1. Memory layer: build the binary if it isn't there yet. +if [ ! -x "$BIN" ]; then + echo "[1/3] Building the memory engine (codebase-memory-mcp)..." + "$ROOT/scripts/build.sh" +else + echo "[1/3] Memory engine already built: $BIN" +fi + +# 2. Memory layer: register the MCP server + agent instruction files. +echo "[2/3] Registering the MCP server with your agents..." +"$BIN" install -y || { + echo " ! 'install' returned non-zero — configure the MCP server manually (see README)."; } + +# 3. Procedure layer: apply the fablize disciplines to the target project. +echo "[3/3] Applying the fablize procedure layer..." +bash "$ROOT/fablize/install.sh" "$TARGET" + +echo +echo "=== Done. Both layers installed. ===" +echo " Memory : codebase-memory-mcp MCP tools (search_graph, trace_path, get_architecture, …)" +echo " Method : fablize disciplines in $TARGET (see INTEGRATION.md)" +echo " Re-run 'bash fablize/install.sh ' to add the disciplines to another project."
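
The checkpoint rules that goals.py enforces in this patch (evidence-gated completion, the final verification gate, and escalation after ESCALATE_AFTER blocked or failed attempts) can be illustrated in isolation. What follows is a minimal in-memory sketch, assuming a simplified model with no ledger, no CLI, and no event log; `Story`, `checkpoint`, and `retry` are hypothetical names chosen for illustration and are not the shipped functions.

```python
from dataclasses import dataclass

ESCALATE_AFTER = 2  # same threshold value as goals.py in this patch


@dataclass
class Story:
    """Hypothetical stand-in for one entry of goals.json."""
    id: str
    status: str = "in_progress"  # assumption: `next` has already activated it
    attempts: int = 0


def checkpoint(story, status, evidence="", is_final=False,
               verify_cmd="", verify_evidence=""):
    """Record one checkpoint; return True when the escalation gate fires."""
    if story.status != "in_progress":
        raise ValueError(f"{story.id} is not active ({story.status})")
    if status == "complete":
        # Evidence gate: a completion claim must carry non-empty evidence.
        if not evidence.strip():
            raise ValueError("a complete checkpoint requires non-empty evidence")
        # Verification gate: the final story also needs a command and its result.
        if is_final and not (verify_cmd.strip() and verify_evidence.strip()):
            raise ValueError("the final story needs a verify command and result")
    if status in ("blocked", "failed"):
        story.attempts += 1  # every blocked/failed checkpoint counts as an attempt
    story.status = status
    return status in ("blocked", "failed") and story.attempts >= ESCALATE_AFTER


def retry(story):
    """Reopen a blocked or failed story so it can be checkpointed again."""
    if story.status not in ("blocked", "failed"):
        raise ValueError(f"{story.id} is {story.status}, nothing to retry")
    story.status = "in_progress"


if __name__ == "__main__":
    s = Story("G001")
    first = checkpoint(s, "blocked", "stuck")
    retry(s)
    second = checkpoint(s, "blocked", "still stuck")
    print(first, second)
```

Run as a script, this prints `False True`: the first blocked checkpoint stays under the threshold and the second reaches it, which is the same sequence test_bounded_escalation drives through the real CLI.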