diff --git a/.claude/scheduled_tasks.lock b/.claude/scheduled_tasks.lock
deleted file mode 100644
index a9fcb25..0000000
--- a/.claude/scheduled_tasks.lock
+++ /dev/null
@@ -1 +0,0 @@
-{"sessionId":"3c40b50c-1a7d-4957-a8df-6dd80f691ee9","pid":29655,"acquiredAt":1776712915943}
\ No newline at end of file
diff --git a/.gitignore b/.gitignore
index 82b5388..82c14ff 100644
--- a/.gitignore
+++ b/.gitignore
@@ -18,4 +18,10 @@ coverage/
 /_archive/
 .grepai/
 
-graphify-out/
\ No newline at end of file
+graphify-out/
+
+# AI coding agent local state (per-developer, not for git)
+.claude/
+.agents/
+CLAUDE.md
+skills-lock.json
\ No newline at end of file
diff --git a/README.md b/README.md
index 4c61660..a958853 100644
--- a/README.md
+++ b/README.md
@@ -222,13 +222,19 @@ When `ANTHROPIC_API_KEY` isn't set, the dashboard renders a read-only banner —
 
 ## Agent Skills
 
-Designing an AgentForge workflow from scratch is a lot of YAML. The repo ships an [agent skill](https://skills.sh) that walks any Claude Code / Cursor / Codex session through the design — agents, phases, gates, loops, parallelism, wiring, nodes — and emits a working `.agentforge/` directory.
+The repo ships [agent skills](https://skills.sh) that drop into any Claude Code / Cursor / Codex session — installable in one command:
 
 ```bash
-npx skills add mandarnilange/agentforge/agentforge-workflow
+npx skills add mandarnilange/agentforge
 ```
 
-Then ask the agent something like *"help me design an AgentForge pipeline for PR triage"* and the skill kicks in. Catalog and authoring docs: [`skills/`](skills/).
+| Skill | Audience | What it does |
+|---|---|---|
+| `agentforge-workflow` | End user | Walks through designing an AgentForge workflow and emits a complete `.agentforge/` directory. |
+| `agentforge-template-author` | Contributor | Guides adding a new shipped template under `packages/{core,platform}/src/templates/`. |
+| `agentforge-debug` | Operator | Triages stuck or failing pipeline runs from symptom to fix path. |
+
+Trigger phrases live in each skill's frontmatter — e.g. *"help me design an AgentForge pipeline for PR triage"* fires `agentforge-workflow`; *"why is pipeline X stuck?"* fires `agentforge-debug`. Catalog, changelog, and authoring docs: [`skills/`](skills/).
 
 ---
 
diff --git a/skills/CHANGELOG.md b/skills/CHANGELOG.md
new file mode 100644
index 0000000..a9f770b
--- /dev/null
+++ b/skills/CHANGELOG.md
@@ -0,0 +1,56 @@
+# Skills changelog
+
+Top-level history for every skill under [`skills/`](.). Each skill also
+keeps a per-skill `CHANGELOG.md` with finer-grained notes — link to that
+file from the entry below.
+
+## Versioning policy
+
+Skills are versioned independently via `metadata.version` in
+`SKILL.md`'s frontmatter. Use semver:
+
+| Bump | Trigger |
+|---|---|
+| **major** | Trigger conditions changed (the skill fires on different prompts now), required tool surface changed, file structure renamed, or an existing reference doc removed. Anything a user has muscle memory for. |
+| **minor** | New reference docs, new sections in `SKILL.md`, additive support for new prompts. Existing behaviour preserved. |
+| **patch** | Prose tightening, typo fixes, schema cheat-sheet updates that mirror upstream changes, broken-link repairs. |
+
+**Bump on every PR that lands a skill change**, even a tiny one — the
+version is what distinguishes one published skill from the next, and
+users may pin to it. CI's frontmatter validator enforces presence; the
+contributor enforces honesty.
+
+## Release flow
+
+1. Land your change locally and run `npm run skills:validate`.
+2. Bump the affected skill's `metadata.version` in its `SKILL.md`.
+3. Add an entry to that skill's `CHANGELOG.md` (create the file if it
+   does not exist).
+4. Add a one-line entry here under the next "Unreleased" or current
+   version block, linking to the per-skill changelog.
+5. Open a PR. The `Publish Skills` workflow validates frontmatter on
+   every push.
+
+There is no separate "publish" command. Once the PR merges into `main`,
+`npx skills add mandarnilange/agentforge` picks up the new version on
+next install.
+
+## History
+
+### Unreleased
+
+- _add entries here as they merge_
+
+### 2026-05-02
+
+- **`agentforge-template-author` 0.1.0** — initial release. Guides
+  contributors through adding a new shipped template under
+  `packages/{core,platform}/src/templates/`. See
+  [`agentforge-template-author/CHANGELOG.md`](agentforge-template-author/CHANGELOG.md).
+- **`agentforge-debug` 0.1.0** — initial release. Triages stuck or
+  failing pipeline runs from symptom to fix path. See
+  [`agentforge-debug/CHANGELOG.md`](agentforge-debug/CHANGELOG.md).
+- **`agentforge-workflow` 0.1.0** — initial release. Walks users through
+  designing an AgentForge workflow and emits a complete `.agentforge/`
+  directory. See
+  [`agentforge-workflow/CHANGELOG.md`](agentforge-workflow/CHANGELOG.md).
diff --git a/skills/README.md b/skills/README.md
index a0377ab..dfe98c7 100644
--- a/skills/README.md
+++ b/skills/README.md
@@ -11,17 +11,21 @@ Install all skills from this repo into your agent runtime:
 npx skills add mandarnilange/agentforge
 ```
 
-Or install a single skill by path:
+The Vercel skills CLI scans the repo's `skills/` directory and installs every skill it finds. Want a single skill instead? Use the full GitHub tree URL:
 
 ```bash
-npx skills add mandarnilange/agentforge/agentforge-workflow
+npx skills add https://github.com/mandarnilange/agentforge/tree/main/skills/agentforge-workflow
 ```
 
 ## Available skills
 
-| Skill | What it does |
-|---|---|
-| [`agentforge-workflow`](./agentforge-workflow/SKILL.md) | Walks a user through designing an AgentForge workflow — agents, pipeline, gates, loops, parallelism, wiring, nodes — and emits a complete `.agentforge/` directory. |
+| Skill | Audience | What it does |
+|---|---|---|
+| [`agentforge-workflow`](./agentforge-workflow/SKILL.md) | End user | Walks through designing an AgentForge workflow — agents, pipeline, gates, loops, parallelism, wiring, nodes — and emits a complete `.agentforge/` directory. |
+| [`agentforge-template-author`](./agentforge-template-author/SKILL.md) | Contributor | Guides adding a new shipped template under `packages/{core,platform}/src/templates/`. Scope check → package selection → manifest → tests → PR checklist. |
+| [`agentforge-debug`](./agentforge-debug/SKILL.md) | Operator | Triages a stuck or failing pipeline run from symptom → state inspection → root-cause classification → fix path. |
+
+Versions, release notes, and the bump policy live in [`CHANGELOG.md`](./CHANGELOG.md).
 
 ## Layout
 
diff --git a/skills/agentforge-debug/CHANGELOG.md b/skills/agentforge-debug/CHANGELOG.md
new file mode 100644
index 0000000..f923793
--- /dev/null
+++ b/skills/agentforge-debug/CHANGELOG.md
@@ -0,0 +1,15 @@
+# `agentforge-debug` changelog
+
+## 0.1.0 — 2026-05-02
+
+Initial release.
+
+- Triages a stuck or failing AgentForge pipeline run: symptom →
+  state inspection → root-cause classification → fix path.
+- Read-first discipline: never propose state-mutating actions until run
+  state has been read from the state store / dashboard.
+- Reference docs: symptom tree (symptom → cause → first command),
+  state model (every status enum and transition), fix paths (concrete
+  recipes per root cause).
+- Hard rules: no SQLite-file deletes, no gate bypass without
+  authorisation, no validation skip flags, one fix at a time.
diff --git a/skills/agentforge-debug/SKILL.md b/skills/agentforge-debug/SKILL.md
new file mode 100644
index 0000000..b5316e4
--- /dev/null
+++ b/skills/agentforge-debug/SKILL.md
@@ -0,0 +1,167 @@
+---
+name: agentforge-debug
+description: >
+  Triages a stuck or failing AgentForge pipeline run. Walks the user from
+  symptom → state inspection → root-cause classification → fix path. Trigger
+  when the user reports an AgentForge run that is "stuck", "failing",
+  "looping", "blowing its budget", "not picking up at the gate", "schema
+  invalid", "won't dispatch", "node not picking it up", or asks "why is
+  pipeline X not finishing?". Do NOT trigger for general AgentForge design
+  questions (use `agentforge-workflow`) or for debugging unrelated AI
+  pipelines.
+license: MIT
+metadata:
+  author: mandarnilange
+  version: "0.1.0"
+---
+
+# AgentForge Pipeline Debug
+
+You are helping a user triage an AgentForge pipeline run that is not
+behaving. Run state lives in the SQLite/Postgres state store; every agent
+step, gate decision, and node heartbeat is recorded there. The dashboard
+visualises the same data. This skill walks from the symptom the user
+reports to a concrete fix path.
+
+The bias is **read first, change last**. Never propose mutations (re-run,
+abort, force-approve gate) until you have read the current run state and
+classified the root cause.
+
+## When to use this skill
+
+Trigger on any of:
+- "Pipeline X is stuck / not finishing / hung"
+- "My run is looping / hitting max iterations / failing schema validation"
+- "Gate is not picking up / no one can approve"
+- "Agent blew its budget / cost ceiling hit"
+- "Node not picking up the job / scheduler isn't dispatching"
+- "Dashboard says paused but nothing is happening"
+- "Why is pipeline `<id>` failing?"
+
+Do **not** trigger for:
+- Workflow design questions (use `agentforge-workflow`).
+- General LLM-app debugging unrelated to AgentForge.
+
+## Reference material
+
+Read on demand:
+
+- `references/symptom-tree.md` — symptom → likely cause → first command to run
+- `references/state-model.md` — pipeline / agent run / gate status enums and
+  what each transition means
+- `references/fix-paths.md` — concrete recipes for each root cause
+  (re-run agent, edit prompt, swap node, raise budget, force gate, etc.)
+
+The authoritative status enums live at:
+- `packages/core/src/domain/models/pipeline-run.model.ts` — `PipelineRunStatus`
+- `packages/core/src/domain/models/agent-run.model.ts` — `AgentRunRecordStatus`, `AgentRunExitReason`
+- `packages/core/src/domain/models/gate.model.ts` — `GateStatus`
+
+Read the status enum from the source if you need the exact list — do not
+rely on memory.
+
+## The flow
+
+### 1. Capture the symptom precisely
+
+Restate what you heard in one or two lines. Resist jumping to fixes.
+
+Ask only what you need:
+- The pipeline run ID (or pipeline name + project name).
+- What the user expected to happen.
+- What they observe (dashboard state, error message, last log line).
+- When it started — was it working before? What changed?
+
+Skip questions whose answers you can infer from the dashboard / state.
+
+### 2. Read the run state
+
+Run these in order. Stop reading the moment you have enough to classify:
+
+```bash
+# High-level run state — status, current phase, last update
+agentforge get pipeline <run-id>
+
+# Per-agent run records — which one is stuck/failed
+agentforge get pipeline <run-id> --agents          # if available; otherwise:
+# Inspect via dashboard at http://localhost:3001/runs/<run-id>
+
+# Logs for the suspect agent run
+agentforge logs <run-id> --agent <agent-name>
+
+# Node pool health — relevant if the symptom is "not dispatching"
+agentforge get nodes
+```
+
+If the user does not have CLI access (e.g. on a worker host), point them at
+the dashboard pages: `/runs/<id>`, `/runs/<id>/agents/<agent-name>`,
+`/nodes`.
+
+### 3. Classify the root cause
+
+Use `references/symptom-tree.md`. The classifications:
+
+| Class | One-line tell |
+|---|---|
+| **Gate hung** | `pipeline_run.status = paused_at_gate`, no recent gate decision |
+| **Agent failed (LLM error)** | `agent_run.status = failed`, `error` field non-empty, `exitReason = error` |
+| **Agent failed (schema invalid)** | `failed` with `error` mentioning Zod or `validate` step |
+| **Agent failed (script step)** | `failed` with last step type `script` and non-zero `exitCode` |
+| **Loop spinning** | Agent `running` with high `retryCount` or `loop.iteration ≈ maxIterations` |
+| **Budget hit** | `failed` with `exitReason = budget-tokens` or `budget-cost` |
+| **Timeout** | `failed` with `exitReason = timeout` |
+| **Stuck dispatching** | `agent_run.status = pending`/`scheduled` for long, no node claim |
+| **Node down** | `nodes` table shows the targeted node `offline` or stale heartbeat |
+| **Node affinity miss** | `pending` forever; no node satisfies `nodeAffinity.required` |
+| **Cancelled** | `pipeline_run.status = cancelled` — was someone else's `agentforge cancel <run-id>`? |
+
+Tell the user the class in one sentence and cite the evidence ("agent
+`developer` is `failed` with `exitReason: budget-cost`, run cost was
+$0.62 vs the $0.60 ceiling").
+
+### 4. Propose the fix path
+
+Use `references/fix-paths.md` for the matched class. Present **the smallest
+reversible action first**, then escalating options.
+
+For example, on a schema-invalid failure:
+1. Inspect the agent's last LLM output via dashboard or logs.
+2. If the LLM hallucinated a missing field, the prompt likely needs a
+   tightening — propose a one-line addition to `prompts/<agent>.system.md`.
+3. If the schema itself is wrong, point at the schema file and propose the
+   minimum change.
+4. Re-run only the failed agent: `agentforge run --continue <run-id>`.
+
+Do not skip to "abort and re-start the pipeline" unless the run is
+unrecoverable.
+
+### 5. Confirm before mutating state
+
+Any action that changes shared state — approving a gate, cancelling a run,
+re-running an agent, force-clearing a stuck claim — requires explicit user
+confirmation. State your understanding, the proposed action, and the
+expected outcome. Wait.
+
+Read-only investigation does not need confirmation.
+
+## Hard rules
+
+- **Read before write.** Always inspect run state before proposing a fix.
+- **Never tell the user to delete the SQLite file** or `TRUNCATE pipeline_runs`
+  to "clear stuck state". That is data loss. Use the proper CLI.
+- **Never bypass a gate** without explicit user authorisation. Gates are the
+  audit trail.
+- **Do not propose `--no-verify` or skip-validation flags** to bypass
+  schema failures. Fix the schema or the prompt.
+- **Do not assume the dashboard is wrong.** When the dashboard and CLI
+  disagree, the state store is the source of truth — read it directly.
+- **Propose one fix at a time.** Avoid stacked changes that make it
+  impossible to know which one solved the problem.
+
+## What success looks like
+
+- The user knows exactly which agent / phase / gate is the failure point.
+- The user knows the root-cause class with cited evidence.
+- The user has a concrete next action (CLI command, file edit, gate
+  decision) and an idea of what will tell them whether it worked.
+- No state has been mutated without their explicit go-ahead.
diff --git a/skills/agentforge-debug/references/fix-paths.md b/skills/agentforge-debug/references/fix-paths.md
new file mode 100644
index 0000000..ebbbedb
--- /dev/null
+++ b/skills/agentforge-debug/references/fix-paths.md
@@ -0,0 +1,199 @@
+# Fix paths — concrete recipes per root cause
+
+Each section: smallest reversible action first, then escalations. Always
+get explicit user confirmation before any state-mutating action.
+
+## Gate hung
+
+**Smallest action — make the decision via dashboard or CLI:**
+```bash
+agentforge gate approve <gate-id>                              # accept
+agentforge gate reject  <gate-id> --reason "..."               # reject
+agentforge gate revise  <gate-id> --notes "Tighten section 3"  # ask for revision
+```
+
+**If approver is unavailable:** check `gateDefaults.timeout` in the pipeline.
+Expired gates auto-reject. Adjust the timeout in the pipeline YAML for
+future runs.
+
+**If the gate should never have existed:** edit the pipeline to drop the
+`gate:` block on that phase. This requires a re-run; cannot retroactively
+remove a gate from an in-flight run.
+
+## Agent failed — LLM error
+
+**Smallest action — read the error field, retry once if transient:**
+```bash
+agentforge run --continue <run-id>
+```
+Anthropic 529 (overloaded) and transient network errors are retried 3× by
+the runtime already. A failure surfaced to the agent run record means
+those retries were exhausted.
+
+**If the LLM keeps producing the same wrong output:** the prompt is the
+problem. Read the agent's `prompts/<name>.system.md`, identify the
+ambiguity, and tighten with a one-line constraint. Re-run the failed
+agent only:
+```bash
+agentforge run --continue <run-id>
+```
+
+**If the model is the problem (capability gap):** swap `spec.model.name`
+to a stronger model on that agent. Note the cost impact; warn the user.
+
+## Agent failed — schema invalid
+
+**Smallest action — inspect the failing artifact:**
+- Dashboard: `/runs/<run-id>/agents/<agent-name>` shows the LLM's last
+  output and the validation error.
+- CLI: `agentforge logs <run-id> --agent <agent>` (then look for the
+  `validate` step).
+
+**Branch on root cause:**
+
+| Cause | Fix |
+|---|---|
+| LLM hallucinated a missing field | Tighten the prompt to require it |
+| LLM emitted the right field with the wrong type | Add an example to the prompt |
+| Schema is too strict for the use case | Loosen the schema — minimum change wins |
+| Schema is wrong (typo, missing optional) | Fix the schema |
+
+**Re-run only the failed agent:**
+```bash
+agentforge run --continue <run-id>
+```
+
+**Do not** disable validation to make the run pass. The validate step
+exists exactly to catch this; bypassing it pollutes downstream phases.
+
+## Agent failed — script step
+
+**Smallest action — read the error field for stderr + exit code.**
+
+**Most common causes and fixes:**
+- Missing tool on the node (`eslint`, `pytest`, `gofmt`):
+  ```bash
+  # On the node, install it. Then re-run.
+  npm i -g eslint  # or whatever
+  agentforge run --continue <run-id>
+  ```
+- Wrong working directory: the script `cd {{run.workdir}}`s but `workdir`
+  is empty because a prior step did not create it. Add `mkdir -p` to the
+  setup step.
+- Path / env var missing on the worker: the local node has it but the
+  Docker / SSH worker does not. Bake the dep into the worker image, or
+  declare it as a `nodeAffinity.required` capability and run the agent
+  on a different node.
+
+## Loop spinning at maxIterations
+
+**First — read what the LLM is doing each iteration.** The dashboard's
+agent timeline shows every iteration's input and output. If the LLM is
+making no progress, the prompt is the problem; if each iteration improves,
+the predicate is wrong.
+
+**Fixes:**
+- Predicate wrong (script always emits the same value): rewrite the
+  predicate's `run` block. Re-run only the agent.
+- Prompt too vague: add a "what the previous iteration got wrong" hint
+  using `{{steps.run-tests.output}}`.
+- Genuinely too hard: raise `maxIterations` (with caution — costs
+  multiply) or break the work into two agents.
+
+**Never raise `maxIterations` past 10** without explicit user buy-in.
+Cost and wall-clock both compound.
+
+## Budget hit (`budget-tokens` / `budget-cost`)
+
+**Read `tokenUsage` and `cost_usd` to see how close to the ceiling the run
+came.**
+
+**Three honest fixes:**
+1. **Raise the budget** in `spec.resources.budget`. Justified when the
+   task is genuinely larger than the original estimate.
+2. **Shorten the prompt** — system prompts that have grown over time
+   often duplicate context. Read for redundancy.
+3. **Pick a cheaper model** — `claude-haiku-4-5` for cost-sensitive
+   agents that do not need deep reasoning.
+
+Do not silently disable budgets. They exist to bound runaway spend.
+
+## Timeout
+
+**Smallest action — bump the timeout for that agent only:**
+```yaml
+spec:
+  resources:
+    timeoutSeconds: 1800   # was 600 (default)
+```
+
+`0` disables. Use sparingly — disabled timeouts mean a hung LLM call
+holds the worker forever.
+
+**If the agent has multiple steps:** add per-step timeouts in
+`spec.definitions.<step>.timeoutSeconds` rather than raising the
+agent-level one. The script step is usually the one that needs more
+time, not the LLM call.
+
+## Stuck dispatching / node affinity miss
+
+**First — confirm a node satisfies the agent's required capabilities:**
+```bash
+agentforge get nodes
+grep -A 5 "nodeAffinity" .agentforge/agents/<agent>.agent.yaml
+```
+
+**Two fix paths:**
+- **Relax the agent**: drop a `required:` capability the user does not
+  actually need.
+- **Strengthen the pool**: start a worker with the missing capability:
+  ```bash
+  NODE_NAME=worker-gpu \
+  NODE_CAPABILITIES=llm-access,docker,gpu \
+  CONTROL_PLANE_URL=http://cp:3001 \
+    docker compose -f packages/platform/docker-compose.worker.yml up -d
+  ```
+
+If all nodes are at `maxConcurrentRuns`: wait, raise the limit on a node,
+or scale the pool horizontally.
+
+## Node down
+
+**Verify:**
+```bash
+agentforge get nodes      # is the node `offline` / heartbeat stale?
+```
+
+**Fixes:**
+- Restart the worker process. The reconciler reclaims its in-flight runs
+  within `claim_ttl` (default ~60s) and re-dispatches.
+- If the node is permanently gone, drop it from the pool. Existing
+  agent runs reassigned automatically.
+
+## Cancelled by accident
+
+`cancelled` is terminal. There is no "uncancel."
+
+**Recovery path:**
+- Read what artifacts the cancelled run produced via dashboard or CLI.
+- Start a fresh pipeline. If you want to skip phases that already
+  completed, manually feed those artifacts as inputs to the new run.
+
+Make sure the user understands the cost / time implications before
+re-running the full pipeline.
+
+---
+
+## When to escalate to a code change
+
+If the same class of failure has happened more than twice on different
+runs of the same template / agent, the root cause is in the YAML or the
+framework, not the run. Examples:
+- Repeated schema-invalid failures from the same agent → prompt or schema
+  bug; fix and PR.
+- Repeated timeouts on the same agent → unrealistic budget; raise default.
+- Repeated stuck-dispatch on the same affinity → template author should
+  loosen the `required:` to `preferred:`.
+
+In these cases, file an issue or a PR rather than re-running the same
+broken pipeline.
diff --git a/skills/agentforge-debug/references/state-model.md b/skills/agentforge-debug/references/state-model.md
new file mode 100644
index 0000000..9d100fc
--- /dev/null
+++ b/skills/agentforge-debug/references/state-model.md
@@ -0,0 +1,114 @@
+# State model — what every status means
+
+Authoritative source files (read directly when in doubt):
+
+- `packages/core/src/domain/models/pipeline-run.model.ts`
+- `packages/core/src/domain/models/agent-run.model.ts`
+- `packages/core/src/domain/models/gate.model.ts`
+- `packages/core/src/domain/models/node.model.ts`
+
+## `PipelineRunStatus`
+
+```
+running           → at least one agent is executing or pending
+paused_at_gate    → all phase-N agents done; waiting for human gate decision
+completed         → all phases finished, every gate approved
+failed            → an agent's failure was unrecoverable; pipeline halted
+cancelled         → manually aborted via CLI / dashboard
+```
+
+Transitions:
+- `running` → `paused_at_gate` after a gated phase's agents complete
+- `paused_at_gate` → `running` after gate `approved`
+- `paused_at_gate` → `failed` after gate `rejected` or timeout
+- `running` → `failed` when an agent run fails and there are no retries left
+- any → `cancelled` via explicit cancel
+
+`completed` and `failed` are terminal except via `agentforge run --continue`
+(only valid when `paused_at_gate` or in the middle of a recoverable failure).
+
+## `AgentRunRecordStatus`
+
+```
+pending           → created but not yet claimed by a node
+scheduled         → assigned to a node, awaiting execution slot
+running           → executing on a node right now
+succeeded         → finished, output validated, artifacts persisted
+failed            → run did not produce a valid output
+```
+
+A `pending` agent run with no matching node hangs the pipeline. A
+`scheduled` agent run that never advances to `running` suggests the worker
+crashed between claim and start — the reconciler should reclaim within
+`claim_ttl`.
+
+## `AgentRunExitReason`
+
+Populated only when non-obvious. ADR-0003 has the rationale.
+
+```
+timeout           → wall-clock budget hit (LLM call or step)
+budget-tokens     → token ceiling exceeded
+budget-cost       → USD ceiling exceeded
+cancelled         → user cancelled mid-run
+error             → catch-all (LLM error, script non-zero exit, schema invalid)
+```
+
+Absent `exitReason` on `succeeded` is normal. Absent on `failed` means the
+failure path is unspecified — read the `error` field for the message.
+
+## `GateStatus`
+
+```
+pending             → awaiting decision
+approved            → human approved; pipeline proceeds to next phase
+rejected            → human rejected; pipeline marked failed
+revision_requested  → human asked for revision; phase agents re-run with revisionNotes
+```
+
+`revision_requested` re-runs the phase's agents with the revision note
+inlined into their input. The new agent runs are full executions —
+budgets and timeouts apply afresh.
+
+## `NodeStatus`
+
+```
+online            → recent heartbeat, accepting work
+offline            → no recent heartbeat (default ~60s threshold)
+unknown            → never heartbeated since registration
+degraded          → online but reporting partial capability loss
+```
+
+`offline` for longer than the heartbeat threshold triggers reclaim of any
+agent runs claimed by that node. `degraded` is informational — the node
+still receives work that fits its remaining capabilities.
+
+## How transitions are persisted
+
+Every status change writes a row to the state store:
+
+- `pipeline_runs` — one row per run, `version` column for optimistic
+  concurrency.
+- `agent_runs` — one row per agent execution, includes `tokenUsage`,
+  `cost_usd`, `duration_ms`, `error`, `exitReason`, `recoveryToken`,
+  `lastStatusAt`, `statusMessage`.
+- `gates` — one row per gate, status updates atomic.
+
+Reading the state store directly (read-only) is fine for diagnostics.
+Mutating it directly is not — always go through the CLI / API.
+
+## Reading the state store directly (last resort)
+
+```bash
+sqlite3 ./output/.state.db "SELECT id, status, current_phase, started_at FROM pipeline_runs WHERE id LIKE '<prefix>%' ORDER BY created_at DESC LIMIT 10;"
+
+sqlite3 ./output/.state.db "SELECT agent_name, phase, status, exit_reason, error, cost_usd FROM agent_runs WHERE pipeline_run_id = '<run-id>' ORDER BY phase, started_at;"
+
+sqlite3 ./output/.state.db "SELECT id, phase_completed, status, reviewer, decided_at FROM gates WHERE pipeline_run_id = '<run-id>';"
+```
+
+For Postgres deployments swap to `psql $AGENTFORGE_POSTGRES_URL`. Same
+schema.
+
+Treat raw SQL output as a debugging aid — present findings to the user
+in plain English, not as table dumps.
diff --git a/skills/agentforge-debug/references/symptom-tree.md b/skills/agentforge-debug/references/symptom-tree.md
new file mode 100644
index 0000000..7776ce3
--- /dev/null
+++ b/skills/agentforge-debug/references/symptom-tree.md
@@ -0,0 +1,161 @@
+# Symptom → cause → first command
+
+Walk the user from observable symptom to a single CLI command that will
+either confirm or eliminate a suspected cause.
+
+## "The run is stuck — nothing is happening"
+
+**Step 1.** Read pipeline state:
+```bash
+agentforge get pipeline <run-id>
+```
+
+Branch on `status`:
+
+| `status` | Likely cause | Next |
+|---|---|---|
+| `running` | Agent stuck inside an LLM call or loop | Read agent run records |
+| `paused_at_gate` | Gate awaiting approval | List pending gates |
+| `failed` | Latest agent failed, run halted | Inspect last agent run |
+| `completed` | The run is done — user is looking at the wrong run | Confirm run-id |
+| `cancelled` | Someone aborted it | Check audit log |
+
+**Step 2 — `running` branch.** Find the active agent:
+- Dashboard: `/runs/<run-id>` shows the live phase + agent.
+- CLI: `agentforge logs <run-id>` and look at the most recent agent name.
+
+If the agent has been `running` for more than its `timeoutSeconds`,
+suspect a timeout that has not yet propagated. Check the worker node:
+```bash
+agentforge get nodes
+```
+Stale heartbeat → node crashed mid-run; reconciliation should mark the
+agent `failed` within ~60s.
+
+**Step 2 — `paused_at_gate` branch.** Pending gate is waiting:
+```bash
+# List pending gates for the run (via dashboard or DB query)
+# Dashboard: /runs/<run-id>/gates
+```
+Gate status `pending` with a non-empty `phaseCompleted` → the user (or
+their reviewer) needs to take action. Gate timeout (`gateDefaults.timeout`
+or per-gate) automatically rejects when expired.
+
+## "Agent X is failing"
+
+**Step 1.** Read agent run record:
+```bash
+agentforge logs <run-id> --agent <agent-name>     # if your CLI supports it
+# or via dashboard /runs/<run-id>/agents/<agent-name>
+```
+
+Branch on `exitReason`:
+
+| `exitReason` | Cause | First fix path |
+|---|---|---|
+| `error` | LLM returned bad output, script step failed, or unhandled error | Read `error` field — message tells you which |
+| `budget-tokens` | Hit `resources.budget.maxTotalTokens` | Inspect `tokenUsage`; raise budget or shorten prompt |
+| `budget-cost` | Hit `resources.budget.maxCostUsd` | Inspect `cost_usd`; raise budget or pick cheaper model |
+| `timeout` | Wall-clock budget exhausted | Raise `timeoutSeconds` or split work |
+| `cancelled` | Manually cancelled via CLI / dashboard | Confirm intent |
+
+If `exitReason` is missing, the agent succeeded or failed for an
+unspecified reason — read the `error` field directly.
+
+**Step 2 — Schema validation failures** (within `error: ...`):
+- Look for `ZodError` or `validate` step output.
+- Read the failing step's `instructions` (LLM prompt) and the schema file.
+- 90% of the time: the prompt does not constrain the LLM toward the schema.
+  Fix is a one-line prompt addition stating the required fields.
+
+**Step 2 — Script step failures** (within `error: ...`):
+- The error contains the step's stderr / exit code.
+- Read the agent's `definitions.<step-name>.run` block.
+- Common: missing tool on the node (`eslint`, `pytest`, `gofmt`), missing
+  env var (`PATH`), or a `cd {{run.workdir}}` referring to a path the
+  agent did not create. Fix in the YAML.
+
+## "It's looping — same error over and over"
+
+This is a `loop` block hitting `maxIterations` without the `until`
+predicate going truthy.
+
+**Step 1.** Read the agent definition's loop:
+```bash
+# Pull the YAML — could be in .agentforge/ or under packages/<pkg>/src/templates/
+cat .agentforge/agents/<agent>.agent.yaml
+```
+
+**Step 2.** Inspect the predicate step's output across iterations via the
+dashboard timeline or logs. The predicate (`test-gate`, etc.) should emit a
+truthy sentinel when work is done. If it never does, either:
+- The work genuinely cannot be completed by the LLM with current
+  constraints (raise `maxIterations`, change model, refine prompt).
+- The predicate is wrong (logic error in the script that decides done-ness).
+
+Do **not** raise `maxIterations` past a sane ceiling (5–10) without first
+reading what the LLM is doing — silent infinite-loop on the user's bill.
+
+## "Node not picking up the job"
+
+**Step 1.**:
+```bash
+agentforge get nodes
+```
+
+Branch:
+
+| State | Cause | Fix |
+|---|---|---|
+| Node `offline` / stale heartbeat | Worker process crashed | Restart the worker; reconciler will re-dispatch |
+| All nodes online but agent `pending` | `nodeAffinity.required` not satisfied | Inspect agent's required capabilities vs node capabilities |
+| All nodes at max capacity | `maxConcurrentRuns` reached | Wait, scale the pool, or raise per-node concurrency |
+| No nodes at all | No worker registered | Start one — `agentforge node start --control-plane-url ...` |
+
+**Step 2.** For `nodeAffinity` mismatches:
+```bash
+# What the agent requires
+grep -A 5 "nodeAffinity" .agentforge/agents/<agent>.agent.yaml
+
+# What the nodes provide
+agentforge get nodes
+```
+Resolve by either relaxing the agent's requirements or starting a node
+with the missing capability.
+
+## "Cost ceiling hit / budget blown"
+
+**Step 1.** The agent's `exitReason` will be `budget-tokens` or
+`budget-cost`. Read its `tokenUsage` and `cost_usd` from the run record.
+
+**Step 2.** Decide path:
+- **The agent's task is genuinely larger than the budget** → raise
+  `resources.budget.maxTotalTokens` / `maxCostUsd` in the agent YAML.
+- **The prompt is leaky** (LLM rambling, stuffed system prompt) → tighten
+  the prompt, lower `maxTokens`, lower `thinking` from `high` to `medium`.
+- **A loop is consuming budget** → lower `maxIterations` or fix the
+  predicate (see "looping" above).
+
+## "Pipeline stuck dispatching — never reaches an agent"
+
+This is the scheduler not finding a node, OR a worker not claiming the
+pending job. Order of investigation:
+
+1. `agentforge get nodes` — any nodes online?
+2. Inspect `nodeAffinity` requirements (above).
+3. If nodes online and capabilities match, suspect the event bus or job
+   queue. Check the control-plane logs for "no eligible node" messages
+   and the reconciler logs for claim races.
+4. Worst case: restart the control plane. The state store survives;
+   dispatch resumes.
+
+## "I cancelled by accident — can I resume?"
+
+`pipeline_run.status = cancelled` is terminal. The state store keeps the
+record but `agentforge run --continue <run-id>` will not work. Use:
+- `agentforge get pipeline <run-id>` to read the artifacts produced so far.
+- Start a fresh pipeline that wires those artifacts as inputs (manual
+  inputs to the new run).
+
+There is no automatic "uncancel". Confirm with the user before they
+re-run from scratch (cost, time).
diff --git a/skills/agentforge-template-author/CHANGELOG.md b/skills/agentforge-template-author/CHANGELOG.md
new file mode 100644
index 0000000..9599a2f
--- /dev/null
+++ b/skills/agentforge-template-author/CHANGELOG.md
@@ -0,0 +1,17 @@
+# `agentforge-template-author` changelog
+
+## 0.1.0 — 2026-05-02
+
+Initial release.
+
+- Guides contributors adding a new shipped AgentForge template under
+  `packages/{core,platform}/src/templates/<name>/`.
+- Walks scope check → package selection → domain & inputs → agent set
+  → pipeline shape → prompts → schemas → manifest, README, and tests
+  → wire-up → emit.
+- Reference docs: template anatomy (directory layout + `template.json`
+  manifest contract), core-vs-platform decision matrix, test-and-publish
+  PR checklist.
+- Hard rules: no invented registry fields, one package only, no
+  platform-only features in core templates, every output type has a
+  schema, parse tests required.
diff --git a/skills/agentforge-template-author/SKILL.md b/skills/agentforge-template-author/SKILL.md
new file mode 100644
index 0000000..b8ebf2c
--- /dev/null
+++ b/skills/agentforge-template-author/SKILL.md
@@ -0,0 +1,231 @@
+---
+name: agentforge-template-author
+description: >
+  Guides a contributor through adding a new shipped AgentForge template under
+  `packages/core/src/templates/<name>/` (or platform). Walks the user through
+  scope check, agent set, prompts, schemas, `template.json` metadata, README,
+  and tests, then emits the complete template directory ready for PR. Trigger
+  when the user asks to "add a new template", "ship a template for X",
+  "contribute a template", "create a starter for Y workflow", or wants to
+  publish a reusable pipeline shape that other AgentForge users can scaffold
+  via `agentforge init --template <name>`. Do NOT trigger for
+  end-user `.agentforge/` projects (use `agentforge-workflow` for those).
+license: MIT
+metadata:
+  author: mandarnilange
+  version: "0.1.0"
+---
+
+# AgentForge Template Author
+
+You are helping a contributor add a new **shipped template** to AgentForge.
+Templates live under `packages/core/src/templates/<name>/` (core, no platform
+deps) or `packages/platform/src/templates/<name>/` (platform, can use the
+multi-provider middleware, Postgres, Docker / SSH executors). They are
+auto-discovered by the template registry — no registration step.
+
+End users scaffold a copy into their project with:
+
+```bash
+npx @mandarnilange/agentforge init --template <name>
+```
+
+A new template is the right contribution shape when a useful pipeline pattern
+is missing from the catalog. It is **not** the right shape when a user is
+building something for their own project — point them at
+`agentforge-workflow` instead.
+
+## When to use this skill
+
+Trigger on any of:
+- "Add / contribute / ship a new AgentForge template"
+- "Create a starter template for `<domain>`"
+- "Modify the existing `<template>` template" (after confirming it should be
+  forked or replaced rather than upstreamed in place)
+- A contributor asking how the templates registry works.
+
+Do **not** trigger for end-user workflow design (use `agentforge-workflow`).
+
+## Reference material
+
+Read on demand — do not load upfront:
+
+- `references/template-anatomy.md` — full directory layout, `template.json`
+  manifest contract, and the registry's discovery rules
+- `references/core-vs-platform.md` — how to pick which package the template
+  belongs in and what each gives you
+- `references/test-and-publish.md` — required tests, lint, and the PR
+  checklist before shipping
+
+The shipped templates themselves are the strongest reference. Read at least
+one before authoring:
+- `packages/core/src/templates/simple-sdlc/` — minimal 3-agent example
+- `packages/platform/src/templates/api-builder/` — parallel codegen + tests
+- `packages/platform/src/templates/code-review/` — diff-input pipeline
+
+## The flow
+
+### 1. Scope check
+
+Ask: "Could a small modification of an existing template do the job?"
+
+| If the goal is | Modifying the existing template usually wins |
+|---|---|
+| Different model / provider on one agent | yes |
+| Extra script step in one agent's flow | yes |
+| Drop or add a single gate | yes |
+| Different output schema for the same artifact type | yes |
+| Net-new pipeline shape (different agent set, different domain) | no — new template |
+
+If the answer is "yes, modification wins", do NOT ship a new template — ask
+the user to PR the change to the existing one (or contribute a doc / cookbook
+entry showing how to override).
+
+If "no, new template is justified", continue.
+
+### 2. Package selection
+
+Use `references/core-vs-platform.md`. Quick rule:
+
+- **core** if every agent uses Anthropic, the executor is `pi-ai` or
+  `pi-coding-agent` only, no Postgres, no SSH workers, no multi-provider.
+  Anyone with `npm install @mandarnilange/agentforge-core` can use it.
+- **platform** otherwise. Multi-provider, Docker executor, SSH workers,
+  Postgres-only state, ECS — these all push the template to platform.
+
+When in doubt, prefer **core**. Cross-package dependencies are not allowed.
+
+### 3. Domain & inputs
+
+Pin the answers:
+
+- **What does the template *produce*?** Concrete artifacts the user gets at
+  the end (code, docs, dashboards, reports).
+- **What does it *take in*?** A brief, a URL, a code repo, a dataset, a
+  ticket. Define the `pipeline.spec.input` shape.
+- **What is the human role?** Where does the user need to approve / revise?
+  These become gates.
+
+If the inputs are ambiguous (e.g. "any kind of brief"), narrow before going
+further. A vague input begets vague agents and bad templates.
+
+### 4. Agent set
+
+Apply the same rules as `agentforge-workflow`:
+
+- One agent = one well-named human role doing one phase.
+- Every output `type` is a contract — pick names you will not regret.
+- `pi-coding-agent` only when the agent must touch files / run shell.
+- `pi-ai` for everything else.
+- Every agent declares a `resources.budget`.
+
+Aim for **3–5 agents** for a typical template. More than 6 is a sign the
+template is doing too much; consider splitting or removing optional phases.
+
+### 5. Pipeline shape
+
+- **Phase order** — strict topological order on artifact dependencies.
+- **Gates** — between expensive / one-way phases (after architecture, before
+  code-gen, before destructive ops). Skip gates for cheap reversible steps.
+- **Parallel phases** — when agents share inputs and have no inter-deps.
+- **Loops** — for self-correcting flows (test-fix, validate-fix).
+
+Cross-cutting agents (security, compliance) are unusual in templates because
+they push complexity onto every adopter. Only include if the template's
+domain genuinely demands it (e.g. a `regulated-content` template).
+
+### 6. Prompts
+
+Every agent's `spec.systemPrompt.file` points at `prompts/<agent>.system.md`.
+Write each prompt with:
+
+- A clear role statement and one-sentence mission.
+- A description of the inputs the agent will receive.
+- An explicit output contract referencing the schema file.
+- 2–4 quality-bar bullets the agent must satisfy.
+- Hard constraints (out-of-scope behaviour to refuse).
+
+Treat prompts as the public surface of the template — they are what
+end-users will read and edit when adapting it. Prefer clear over clever.
+
+### 7. Schemas
+
+Every output `type` needs a schema file under `schemas/<type>.schema.yaml`.
+Reuse the 45 shipped schemas in `packages/core/src/schemas/` whenever
+possible — `frd`, `architecture`, `test-suite`, etc. — so end-users get
+familiar artifact types across templates.
+
+For new artifact types, follow the structure shown in
+`references/template-anatomy.md`. Don't over-specify; the agent fills it in.
+
+### 8. Manifest, README, and tests
+
+Required files at the top of the template directory:
+
+- `template.json` — manifest with `name`, `displayName`, `description`,
+  `tags[]`, `agents` (number), `executor`. Schema in
+  `references/template-anatomy.md`.
+- `README.md` — one-page user-facing doc showing the pipeline shape, what to
+  pass in, what comes out, and how to extend. Mirror the structure of
+  `packages/core/src/templates/simple-sdlc/README.md`.
+- A test that asserts the template's pipeline + agents + nodes parse against
+  the Zod schemas in `packages/core/src/definitions/parser.ts`. See
+  `references/test-and-publish.md` for the test pattern.
+
+### 9. Wire-up & documentation
+
+- The template registry **auto-discovers** anything with a valid
+  `template.json` — no code change needed for `agentforge templates list` to
+  pick it up.
+- Add an entry to `docs/templates.md` (table at the top + dedicated section
+  with pipeline diagram and example run command).
+- For platform templates, also update
+  `packages/platform/src/templates/registry.ts` only if it does anything
+  beyond auto-discovery (most don't).
+- Run the full test suite locally: `npm test` from the repo root.
+
+### 10. Emit and stop
+
+Write the directory tree per `references/template-anatomy.md`. Show the
+contributor:
+
+1. The exact file tree you wrote.
+2. The validate command:
+   ```bash
+   npm test -- packages/core/tests/templates  # or platform/
+   ```
+3. The local-scaffold command they can run to dogfood the template:
+   ```bash
+   npx tsx packages/core/src/cli/index.ts init --template <name>
+   ```
+4. The PR checklist from `references/test-and-publish.md`.
+
+Do **not** open the PR. Do **not** push. Stop here unless the contributor
+explicitly asks for the next step.
+
+## Hard rules
+
+- **Do not invent registry fields.** `template.json` must match the schema in
+  `packages/core/src/templates/registry.ts` (`isValidManifest`).
+- **One package only.** A template lives in core OR platform, never both.
+  Cross-package imports are forbidden.
+- **No platform-only features in core templates.** No multi-provider models,
+  no Postgres, no Docker executor, no SSH affinity. The CI build will fail.
+- **Every output `type` has a schema.** Either reuse a shipped one from
+  `packages/core/src/schemas/` or add a new one under the template's
+  `schemas/` directory.
+- **System prompts go in files**, never inline `text:`. End-users will edit
+  these.
+- **Tests required.** A template without a parse test should not merge. See
+  `references/test-and-publish.md`.
+
+## What success looks like
+
+- A new directory under `packages/{core,platform}/src/templates/<name>/`
+  with `agents/`, `pipelines/`, `nodes/`, `prompts/`, `schemas/`,
+  `template.json`, and `README.md`.
+- `npm test` passes — including the new template's parse test.
+- `npx tsx packages/core/src/cli/index.ts templates list` shows the new
+  template (or the platform CLI for platform templates).
+- `docs/templates.md` updated with the catalog entry.
+- The contributor knows the exact PR checklist and can ship.
diff --git a/skills/agentforge-template-author/references/core-vs-platform.md b/skills/agentforge-template-author/references/core-vs-platform.md
new file mode 100644
index 0000000..fe468af
--- /dev/null
+++ b/skills/agentforge-template-author/references/core-vs-platform.md
@@ -0,0 +1,59 @@
+# Core vs platform — picking the right package
+
+A template lives in **one** package. Cross-package imports are forbidden.
+Pick wrong and CI rejects the PR.
+
+## Decision matrix
+
+| Need | Core | Platform |
+|---|---|---|
+| Anthropic only | ✓ | ✓ |
+| OpenAI / Gemini / Ollama | — | ✓ |
+| `pi-ai` executor | ✓ | ✓ |
+| `pi-coding-agent` executor | ✓ | ✓ |
+| Local node | ✓ | ✓ |
+| Docker executor | — | ✓ |
+| SSH / remote workers | — | ✓ |
+| SQLite state store | ✓ | ✓ |
+| Postgres state store | — | ✓ |
+| OTel API only (no SDK) | ✓ | ✓ |
+| Full OTel SDK + Jaeger / Grafana | — | ✓ |
+| Crash recovery + reconciliation | — | ✓ |
+| Per-pipeline rate limiting | — | ✓ |
+
+If every cell on the row is `core`, ship to **core**. Even one platform-only
+need pushes the template to **platform**.
+
+## Why this matters
+
+- Core is the smallest install (`@mandarnilange/agentforge-core`). Every
+  template in core is usable without the platform binary.
+- Platform adds runtime extras but is heavier. Templates here are only
+  reachable via `@mandarnilange/agentforge`.
+- A platform-only feature in a core template breaks the core build (CI will
+  fail on the cross-package import).
+
+## Quick smell test
+
+Read your draft `template.json` + agent files and ask:
+
+1. Do any agents have `model.provider: openai | google | ollama`? → platform
+2. Do any nodes use `type: docker | ssh`? → platform
+3. Does a script step rely on Postgres? → platform
+4. Does any agent require multi-host scheduling? → platform
+
+If you said no to all four, you have a **core** template. Most starter
+templates land in core.
+
+## Cross-package patterns to avoid
+
+- ❌ `import { ... } from "@mandarnilange/agentforge-core/..."` inside a
+  platform template's tests.
+- ❌ A core template's agent referencing a platform-only schema.
+- ❌ A core template's pipeline using a `crossCutting` agent that lives in
+  platform.
+- ❌ Conditional logic in the registry to "promote" a core template into
+  platform under certain conditions.
+
+If you find yourself reaching for any of these, ship the template to
+platform instead — it is allowed to import from core, the reverse is not.
diff --git a/skills/agentforge-template-author/references/template-anatomy.md b/skills/agentforge-template-author/references/template-anatomy.md
new file mode 100644
index 0000000..b35882f
--- /dev/null
+++ b/skills/agentforge-template-author/references/template-anatomy.md
@@ -0,0 +1,147 @@
+# Template anatomy
+
+Authoritative source: `packages/core/src/templates/registry.ts` and the
+shipped templates under `packages/core/src/templates/simple-sdlc/`.
+
+## Directory layout
+
+```
+packages/<core|platform>/src/templates/<name>/
+├── template.json                         # required manifest
+├── README.md                             # required user-facing doc
+├── agents/
+│   ├── <agent-1>.agent.yaml
+│   └── ...
+├── pipelines/
+│   └── <name>.pipeline.yaml              # one pipeline file per template
+├── nodes/
+│   ├── local.node.yaml                   # at least one node profile
+│   └── docker.node.yaml                  # if any pi-coding-agent
+├── prompts/
+│   ├── <agent-1>.system.md
+│   └── ...
+└── schemas/
+    └── <new-output-type>.schema.yaml     # only for output types not in
+                                           # packages/core/src/schemas/
+```
+
+## `template.json` manifest
+
+The registry validates this object via `isValidManifest`. Required fields:
+
+```json
+{
+  "name": "<kebab-case>",
+  "displayName": "Human Friendly Name",
+  "description": "One-line description; shown in `agentforge templates list`.",
+  "tags": ["sdlc", "starter"],
+  "agents": 3,
+  "executor": "pi-ai"
+}
+```
+
+| Field | Type | Notes |
+|---|---|---|
+| `name` | string | Must equal the directory name. Lowercase + dashes. |
+| `displayName` | string | Title-cased label shown in lists. |
+| `description` | string | One sentence. Surfaces in CLI + dashboard. |
+| `tags` | string[] | At least one tag. Conventional tags: `starter`, `sdlc`, `code`, `content`, `data`, `ops`, `regulated`. |
+| `agents` | number | Total agent count. Used for sorting + display. |
+| `executor` | string | `pi-ai`, `pi-coding-agent`, or `mixed`. |
+
+Any extra fields are ignored. Missing or wrong-typed fields cause the
+registry to skip the template with a console warning at startup — silent in
+production but visible in tests.
+
+## Discovery rules
+
+The core registry (`packages/core/src/templates/registry.ts`):
+
+1. Scans `packages/core/src/templates/*/` at runtime.
+2. Reads each subdirectory's `template.json`.
+3. Validates via `isValidManifest`.
+4. Caches the result for the process lifetime.
+
+The platform registry (`packages/platform/src/templates/registry.ts`)
+does the same for `packages/platform/src/templates/*/` and merges with
+the core list at runtime. Platform templates take precedence on name
+collision — keep template names globally unique to avoid surprises.
+
+There is no central "register this template" call. **Drop the directory
+in, ship the manifest, you're listed.** Tests will fail if the manifest
+is malformed.
+
+## Pipeline file naming
+
+```
+pipelines/<template-name>.pipeline.yaml
+```
+
+The pipeline's `metadata.name` should match the template's directory name.
+This is what end-users pass to `agentforge run-pipeline <name>` after
+running `init --template <name>`.
+
+## Node profiles
+
+Templates ship at least one `local.node.yaml`. Any template with a
+`pi-coding-agent` should also ship `docker.node.yaml` so the user can
+opt into sandboxed runs without writing their own node file.
+
+Capabilities to declare:
+
+- `local.node.yaml`: `[git, llm-access]`
+- `docker.node.yaml`: `[docker, git, llm-access]` (and `local-fs` if the
+  agent needs to write files outside the container)
+
+## Prompt file template
+
+```markdown
+# <Agent Display Name> — System Prompt
+
+You are <role>. Your job is to <one-sentence mission>.
+
+## Inputs
+
+You will receive:
+- `<input-type>`: <what it contains>
+
+## Output contract
+
+Produce a single artifact of type `<output-type>` matching the schema at
+`schemas/<output-type>.schema.yaml`. Do not invent fields outside the schema.
+
+## Quality bar
+
+- <2-4 specific bullets>
+
+## Hard constraints
+
+- <out-of-scope behaviours>
+```
+
+Keep prompts under ~150 lines. End-users will read and edit them.
+
+## Schema file template
+
+Reuse from `packages/core/src/schemas/` whenever possible. For genuinely
+new artifact types:
+
+```yaml
+$schema: https://json-schema.org/draft/2020-12/schema
+title: <ArtifactName>
+description: <one line>
+type: object
+required: [<field-1>, <field-2>]
+properties:
+  <field-1>:
+    type: string
+    description: <what this is>
+  <field-2>:
+    type: array
+    items:
+      type: object
+      required: [name]
+      properties:
+        name: { type: string }
+additionalProperties: false
+```
diff --git a/skills/agentforge-template-author/references/test-and-publish.md b/skills/agentforge-template-author/references/test-and-publish.md
new file mode 100644
index 0000000..8d4246a
--- /dev/null
+++ b/skills/agentforge-template-author/references/test-and-publish.md
@@ -0,0 +1,137 @@
+# Test and publish a template
+
+Every shipped template needs a test that proves the YAML files parse, plus
+the standard PR review pass. Untested templates do not merge.
+
+## Required tests
+
+### 1. Parse test
+
+Asserts every `.agent.yaml`, `.pipeline.yaml`, and `.node.yaml` in the
+template parses against the Zod schemas in
+`packages/core/src/definitions/parser.ts`. Pattern from existing templates:
+
+```typescript
+// packages/<core|platform>/tests/templates/<name>.test.ts
+import { describe, expect, it } from "vitest";
+import { readFileSync } from "node:fs";
+import { glob } from "node:fs/promises";
+import { join } from "node:path";
+import { parse as parseYaml } from "yaml";
+import {
+  AgentDefinitionSchema,
+  PipelineDefinitionSchema,
+  NodeDefinitionSchema,
+} from "../../src/definitions/parser.js";
+
+const TEMPLATE_DIR = join(import.meta.dirname, "..", "..", "src", "templates", "<name>");
+
+describe("<name> template", () => {
+  it("agents parse", async () => {
+    for await (const file of glob(`${TEMPLATE_DIR}/agents/*.agent.yaml`)) {
+      const raw = parseYaml(readFileSync(file, "utf8"));
+      expect(() => AgentDefinitionSchema.parse(raw)).not.toThrow();
+    }
+  });
+
+  it("pipeline parses", () => {
+    const file = `${TEMPLATE_DIR}/pipelines/<name>.pipeline.yaml`;
+    const raw = parseYaml(readFileSync(file, "utf8"));
+    expect(() => PipelineDefinitionSchema.parse(raw)).not.toThrow();
+  });
+
+  it("nodes parse", async () => {
+    for await (const file of glob(`${TEMPLATE_DIR}/nodes/*.node.yaml`)) {
+      const raw = parseYaml(readFileSync(file, "utf8"));
+      expect(() => NodeDefinitionSchema.parse(raw)).not.toThrow();
+    }
+  });
+
+  it("registry discovers it", async () => {
+    const { getCoreTemplates } = await import("../../src/templates/registry.js");
+    const t = getCoreTemplates().find((x) => x.name === "<name>");
+    expect(t).toBeDefined();
+    expect(t?.agents).toBeGreaterThan(0);
+  });
+});
+```
+
+For platform templates, swap the import to
+`@mandarnilange/agentforge-core` (read-only) for the schemas, and the
+registry import to the platform registry.
+
+### 2. Manifest validation
+
+The registry's `isValidManifest` runs at startup. Make sure the test
+exercises a fresh registry load via `clearCoreTemplatesCache()`:
+
+```typescript
+import { clearCoreTemplatesCache, getCoreTemplates } from "../../src/templates/registry.js";
+
+beforeEach(() => clearCoreTemplatesCache());
+```
+
+Otherwise a stale cache from earlier tests can mask manifest errors.
+
+### 3. End-to-end smoke (optional, recommended)
+
+Run the template through `init` → `validate` → first agent execution
+against a mocked LLM. This catches schema/wiring mismatches that pass the
+parse test but break at runtime. See
+`packages/core/tests/cli/init.test.ts` for the init pattern and
+`packages/core/tests/runner/agent-runner.test.ts` for mocked-LLM execution.
+
+## Local checks before opening the PR
+
+```bash
+# 1. Lint and format
+npm run lint
+
+# 2. Type check
+npm run typecheck
+
+# 3. Run all tests (template + everything else)
+npm test
+
+# 4. Smoke-test the scaffold
+mkdir -p /tmp/agentforge-template-test && cd /tmp/agentforge-template-test
+npx tsx <repo>/packages/core/src/cli/index.ts init --template <name>
+ls -R .agentforge/
+
+# 5. Validate the scaffolded definitions
+npx tsx <repo>/packages/core/src/cli/index.ts list
+```
+
+If `list` shows the template's agents and `init` produced a complete
+`.agentforge/` directory, the publish flow is working.
+
+## PR checklist
+
+Copy this into the PR description:
+
+- [ ] New directory `packages/<core|platform>/src/templates/<name>/`
+- [ ] `template.json` validates against `isValidManifest`
+- [ ] `agents/`, `pipelines/`, `nodes/`, `prompts/`, `schemas/`, `README.md`
+- [ ] Parse test added at `packages/<core|platform>/tests/templates/<name>.test.ts`
+- [ ] `npm test` passes locally
+- [ ] `npm run lint` clean
+- [ ] `docs/templates.md` updated with the catalog entry (table row + dedicated section)
+- [ ] Smoke-tested: `npx tsx packages/core/src/cli/index.ts init --template <name>` produces a complete `.agentforge/` directory
+- [ ] No cross-package imports (core → platform, or platform → end-user code)
+- [ ] All output `type`s either reuse a shipped schema or have a new schema file in the template's `schemas/` directory
+
+## Versioning
+
+Templates do not have their own version field — they ship with the package
+version. Breaking changes to a shipped template (renamed agent, changed
+output type) require a major package bump. Keep that bar high.
+
+For evolving an existing template:
+- Adding an optional new agent to an existing phase: minor bump.
+- Adding a new output `type`: minor bump.
+- Renaming an agent or removing one: major bump.
+- Changing the `template.json` `name`: major bump (the install command
+  changes for end users).
+
+If you are adding a *brand new* template, you do not need a version bump —
+new template = additive change.
diff --git a/skills/agentforge-workflow/CHANGELOG.md b/skills/agentforge-workflow/CHANGELOG.md
new file mode 100644
index 0000000..e8b4fcc
--- /dev/null
+++ b/skills/agentforge-workflow/CHANGELOG.md
@@ -0,0 +1,13 @@
+# `agentforge-workflow` changelog
+
+## 0.1.0 — 2026-05-02
+
+Initial release.
+
+- Walks the user from goal → template-first check → agent decomposition
+  → pipeline shape → per-agent flow → nodes → schemas → scaffold emit.
+- On-demand reference cheat sheets for AgentDefinition, PipelineDefinition,
+  NodeDefinition schemas plus a template catalog and the `.agentforge/`
+  scaffold layout.
+- Hard rules: no invented schema fields, one producer per artifact type,
+  budgets required, prompts in files, gates default to required.