Merged
34 changes: 34 additions & 0 deletions .github/workflows/publish-skills.yml
@@ -0,0 +1,34 @@
name: Publish Skills

# Validates SKILL.md frontmatter under `skills/`. The `skills/` path is
# what Vercel's skills CLI scans by default — keeping it on the default
# branch IS the publication step (no upload / no registry submission).

on:
push:
branches: [main]
pull_request:
branches: [main]
paths:
- 'skills/**'
- 'scripts/validate-skills.mjs'
- '.github/workflows/publish-skills.yml'

concurrency:
group: publish-skills-${{ github.ref }}
cancel-in-progress: true

jobs:
validate:
name: validate skill frontmatter
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5

- name: Setup Node.js
uses: actions/setup-node@v5
with:
node-version: '22'

- name: Validate skill frontmatter
run: node scripts/validate-skills.mjs
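
The diff doesn't include `scripts/validate-skills.mjs` itself. A minimal sketch of the frontmatter check such a script might perform (the file-walking is omitted, and the required `name`/`description` keys are assumptions, not the script's actual rules):

```javascript
// Hypothetical sketch of the validation logic in scripts/validate-skills.mjs —
// the real script is not shown in this diff. Assumes each SKILL.md carries
// YAML frontmatter between leading '---' fences.
export function parseFrontmatter(markdown) {
  const match = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return null;
  const fields = {};
  for (const line of match[1].split('\n')) {
    const idx = line.indexOf(':');
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

export function validateSkill(markdown) {
  const fm = parseFrontmatter(markdown);
  if (!fm) return ['missing frontmatter'];
  // Assumed required keys; adjust to whatever the real script enforces.
  return ['name', 'description'].filter((k) => !fm[k]).map((k) => `missing ${k}`);
}
```

A skill with both keys passes; one without frontmatter fails the workflow run.
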
561 changes: 112 additions & 449 deletions README.md

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions docs/artifacts.md
@@ -0,0 +1,42 @@
# Artifact Typing & Validation

> Part of the [AgentForge documentation](README.md).

Every agent declares typed inputs and outputs. Artifacts are validated against Zod / JSON Schemas at every pipeline boundary — invalid output fails the agent run before it reaches the next phase.

```
Agent YAML Zod Schema Runtime
┌─────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ outputs: │ │ RequirementsSchema │ │ Agent produces JSON │
│ - type: │───▶│ = z.object({ │───▶│ → safeParse(output) │
│ requirements│ │ epics: [...], │ │ → pass ✓ or fail ✗ │
│ schema: ... │ │ ... │ └──────────────────────┘
└─────────────────┘ │ }) │
└──────────────────────┘
```
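
The boundary check in the third box can be sketched without the Zod dependency (the `epics` field is taken from the diagram; everything else here is illustrative, not the shipped `RequirementsSchema`):

```javascript
// Illustrative stand-in for the Zod safeParse boundary check, not the shipped
// RequirementsSchema. An agent's JSON output either passes or fails the run.
function safeParseRequirements(output) {
  const errors = [];
  if (typeof output !== 'object' || output === null) errors.push('not an object');
  else if (!Array.isArray(output.epics)) errors.push('epics: expected array');
  return errors.length === 0
    ? { success: true, data: output }
    : { success: false, errors };
}

// At the pipeline boundary: invalid output aborts before the next phase sees it.
function gate(output) {
  const result = safeParseRequirements(output);
  if (!result.success) throw new Error(`artifact rejected: ${result.errors.join(', ')}`);
  return result.data;
}
```
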

## What ships

45 built-in schemas covering requirements, architecture, code, data, testing, security, and DevOps — see `packages/core/src/schemas/`. Every shipped template references them so you can compose pipelines without inventing new artifact types.

## Defining your own

Add your own schema and reference it from agent YAML by file path:

```yaml
# .agentforge/agents/my-agent.agent.yaml
spec:
outputs:
- type: my-artifact
schema: schemas/my-artifact.schema.yaml
```

The schema file can be either Zod-shaped TypeScript (loaded via the schema registry) or a JSON Schema YAML — both validate at the same boundary.

Architectural details and the artifact flow through phases: [`docs/architecture.md`](architecture.md#artifact-flow).

## Why this matters

- **Malformed LLM output is caught early.** Bad JSON, missing fields, or wrong types abort the agent run before downstream agents consume it.
- **Wiring is explicit.** Each agent's `inputs[].type` and `outputs[].type` form a contract. Two agents producing the same type is a configuration error you catch at `agentforge validate` time, not at runtime.
- **Schemas double as docs.** New team members understand what each phase produces by reading one file.
132 changes: 132 additions & 0 deletions docs/harness-model.md
@@ -0,0 +1,132 @@
# The Harness Model

> Part of the [AgentForge documentation](README.md).

Most agent frameworks treat an "agent" as one LLM call wrapped in a few tools. AgentForge treats an agent as a **harness** — a named flow of steps where the LLM is just one step type. Your existing tools (linters, test runners, security scanners, custom CLIs) sit alongside the LLM and *gate its output* on every run.

The result: bad LLM output never leaks to the next phase, and you customise behaviour by editing YAML — not by forking the framework.

---

## Step types

Each agent declares a flow of named steps from this set:

| Type | What it does |
|---|---|
| `llm` | Invokes the agent's model with the system prompt + inputs. The normal LLM call. |
| `script` | Runs a shell command on the node. Has access to template variables (`{{run.workdir}}`, `{{pipeline.id}}`, `{{steps.<name>.output}}`, `{{steps.<name>.exitCode}}`). |
| `validate` | Runs a Zod / JSON Schema check against a named artifact or the last LLM output. Fails the run by default; set `continueOnError: true` to log and continue. |
| `transform` | Pure data reshape between steps (no side effects). |

Plus two control-flow constructs usable anywhere in a flow:

- **`loop`** — retry a block until a predicate step outputs a success sentinel, with a `maxIterations` ceiling.
- **`condition`** — skip a step when a referenced step's output doesn't match.
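
A rough sketch of those loop semantics (hypothetical: the `"PASS"` sentinel and the skip-on-match behaviour of `condition` are assumptions drawn from the bundled `developer` agent; the authoritative shape is the Zod schema in `packages/core`):

```javascript
// Hypothetical executor for a `loop` block: re-run the steps until the named
// predicate step emits the success sentinel, capped at maxIterations.
function runLoop({ until, maxIterations, steps }, runStep) {
  const outputs = {};
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    for (const step of steps) {
      // `condition`: skip this step once the referenced step has passed.
      if (step.condition && outputs[step.condition] === 'PASS') continue;
      outputs[step.name] = runStep(step, { iteration, maxIterations });
    }
    if (outputs[until] === 'PASS') break; // success sentinel observed
  }
  return outputs;
}
```

With a test suite that fails once and then passes, the loop runs twice and skips the fix step on the second pass.
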

---

## Real example — the bundled `developer` agent

This is from `packages/core/src/templates/simple-sdlc/agents/developer.agent.yaml`. It shows the *generate → lint → test → fix-until-passing* pattern that `script` + `loop` unlock together:

```yaml
spec:
executor: pi-coding-agent
tools: [read, write, edit, bash, grep, find]

definitions:
generate-code:
type: llm
instructions: |
Generate the full implementation based on the requirements and architecture plan.

lint-and-format:
type: script
run: |
cd {{run.workdir}}
# Auto-detect + run the project's linter/formatter
if [ -f package.json ]; then npx eslint src/ --fix; npx prettier --write "src/**/*.{ts,js}"
elif [ -f pyproject.toml ]; then python -m black .; python -m ruff check --fix .
elif [ -f go.mod ]; then gofmt -w .
fi
continueOnError: true

run-tests:
type: script
run: |
cd {{run.workdir}}
if [ -f package.json ]; then npm test
elif [ -f pyproject.toml ]; then python -m pytest -v
elif [ -f go.mod ]; then go test ./...
fi
captureOutput: true
continueOnError: true

test-gate:
type: script
run: |
if [ "{{steps.run-tests.exitCode}}" = "0" ]; then echo "PASS"; else echo "false"; fi

fix-code:
type: llm
instructions: |
Fix attempt {{loop.iteration}} of {{loop.maxIterations}}.
Failing tests:
{{steps.run-tests.output}}
Fix the source code — don't modify tests unless they have a genuine bug.

validate-output:
type: validate
schema: code-output

git-commit:
type: script
run: |
cd {{run.workdir}}
git add -A && git commit -m "feat(developer): pipeline {{pipeline.id}}"
continueOnError: true

flow:
- step: generate-code
- step: lint-and-format
- loop:
until: "{{steps.test-gate.output}}" # exits when test-gate emits "PASS"
maxIterations: 3
do:
- step: run-tests
- step: test-gate
- step: fix-code
condition: "{{steps.test-gate.output}}" # skip fix if tests passed
- step: validate-output
- step: git-commit
```

---

## Why this matters

- **Your existing tools stay in charge of correctness.** The LLM proposes; `eslint`, `pytest`, `go vet`, `trivy`, `semgrep`, whatever you already trust, decide whether the output is acceptable. Bad LLM output doesn't leak into the next phase.
- **Customise without forking.** Want a different linter, a stricter security scan, a different commit convention? It's YAML — edit the `run:` block. No framework recompile.
- **Domain-agnostic.** The same mechanics build a content agent (generate → SEO audit → Grammarly → publish), a data agent (generate SQL → explain-plan → dry-run → apply), an ops agent (generate runbook → shellcheck → render to PDF). Scripts are the universal glue.
- **Observable.** Every step — LLM and script — lands in the state store with output, exit code, duration, and a span in your OTel trace. The dashboard timeline shows the whole harness, not just the LLM turn.

---

## Template variables

Every `script.run`, `llm.instructions`, `condition`, and `loop.until` field is a template. Available bindings:

- `{{run.workdir}}` — agent's working directory on the node
- `{{run.id}}`, `{{pipeline.id}}` — IDs for logging / commits
- `{{inputs.<type>}}` — content of a declared input artifact
- `{{steps.<name>.output}}` / `.exitCode` — last result of a named step
- `{{loop.iteration}}` / `{{loop.maxIterations}}` — current loop position

Full grammar and resolution semantics: [`docs/architecture.md`](architecture.md).
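
Resolution itself is a substitution over those bindings. A sketch (the nested-context shape and the leave-unknowns-intact behaviour are assumptions, not the engine's documented semantics):

```javascript
// Hypothetical template resolver: replaces {{dotted.path}} with values looked
// up in a nested context object; unknown variables are left untouched.
function render(template, context) {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (match, path) => {
    const value = path.split('.').reduce((obj, key) => obj?.[key], context);
    return value === undefined ? match : String(value);
  });
}
```
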

---

## Step grammar reference

For the authoritative shape of `step`, `loop`, `parallel`, and `condition` blocks, see the Zod schema in `packages/core/src/definitions/parser.ts` (`AgentDefinitionSchema`). Pipeline execution and artifact flow: [`docs/pipeline-execution-flows.md`](pipeline-execution-flows.md).
49 changes: 49 additions & 0 deletions docs/packages.md
@@ -0,0 +1,49 @@
# `@mandarnilange/agentforge` vs `@mandarnilange/agentforge-core`

> Part of the [AgentForge documentation](README.md).

Two npm packages ship from this repo. Most users want **`@mandarnilange/agentforge`** (the platform binary). Pick `@mandarnilange/agentforge-core` only if you're embedding the engine into your own CLI / service or you specifically don't want the platform extras.

## Feature comparison

| | **`@mandarnilange/agentforge-core`** | **`@mandarnilange/agentforge`** (platform) |
|---|---|---|
| **Install** | `npm install @mandarnilange/agentforge-core` | `npm install @mandarnilange/agentforge` (pulls in core) |
| **Binary** | `agentforge-core` | `agentforge` |
| **Intended for** | Local dev, evaluation, library embed | Production, teams, multi-host |
| **LLM providers** | Anthropic | Anthropic + OpenAI + Gemini + Ollama |
| **Executors** | Local (in-process) | Local + **Docker container** + **Remote HTTP** |
| **Node types** | `local` | `local` + `ssh` + remote workers |
| **State store** | SQLite (file) | SQLite **or** PostgreSQL |
| **Persistent definitions** | YAML on disk, loaded per run | YAML on disk **or** `apply` to DB (versioned, hot-reload) |
| **Observability** | OTel API (no-op without SDK) | Full OTel SDK + Jaeger / Grafana export |
| **Crash recovery** | — | Pipeline rehydration + reconciliation loop |
| **Rate limiting** | — | Token / cost / concurrency per pipeline |
| **Multi-host deploy** | — | Control-plane + worker Docker Compose files |
| **Docker image** | `ghcr.io/mandarnilange/agentforge-core` (~289 MB) | `ghcr.io/mandarnilange/agentforge-platform` (~336 MB) |

Defaults are **identical** for local dev (SQLite, local executor, Anthropic). Installing the platform package up front means you won't have to migrate when you need a production feature.

## Using just the framework (`@mandarnilange/agentforge-core`)

If you're embedding the engine into your own CLI or service — or you want the framework without the platform binary, multi-provider middleware, or Postgres — install `@mandarnilange/agentforge-core` directly:

```bash
npm install @mandarnilange/agentforge-core
npx @mandarnilange/agentforge-core init --template simple-sdlc
```

Same YAML schema, same executors, same control plane. You wire your own entry point. Package-level docs: [`packages/core/README.md`](../packages/core/README.md).

## Multi-provider setup

Mixing Anthropic, OpenAI, Gemini, and Ollama (one provider per agent in the same pipeline): [`docs/multi-provider.md`](multi-provider.md).

## Docker images

```bash
docker build --target core -t agentforge-core . # ~289 MB
docker build --target platform -t agentforge-platform . # ~336 MB
```

Both image targets share the same `Dockerfile`; the `platform` target adds the platform-only entry points and environment variables.
56 changes: 55 additions & 1 deletion docs/platform-architecture.md
@@ -2070,7 +2070,61 @@ interface AgentJobIdentity {

---

## 15. Glossary
## 15. Deploying heterogeneous worker pools

The execution plane scales horizontally by adding worker hosts. Workers register with the control plane, heartbeat, and receive dispatched agent jobs. Two workers with **different capabilities** on different hosts let the scheduler route each agent to the right node via `nodeAffinity`.

### Spinning up two specialised workers

```bash
# Worker A — beefy, Docker-isolated, GPU
NODE_NAME=worker-gpu \
NODE_CAPABILITIES=llm-access,docker,high-memory,gpu \
NODE_MAX_CONCURRENT_RUNS=4 \
CONTROL_PLANE_URL=http://cp:3001 \
docker compose -f packages/platform/docker-compose.worker.yml up -d

# Worker B — lightweight, llm-calls only
NODE_NAME=worker-light \
NODE_CAPABILITIES=llm-access \
NODE_MAX_CONCURRENT_RUNS=10 \
CONTROL_PLANE_URL=http://cp:3001 \
docker compose -f packages/platform/docker-compose.worker.yml up -d
```

### Matching agent affinity

The `developer` agent demands Docker isolation and benefits from GPU, so it routes to `worker-gpu`. The `analyst` agent only needs LLM access, so it lands on `worker-light`:

```yaml
# .agentforge/agents/developer.agent.yaml
spec:
nodeAffinity:
required: [{ capability: llm-access }, { capability: docker }]
preferred: [{ capability: gpu }, { capability: high-memory }]
```

```yaml
# .agentforge/agents/analyst.agent.yaml
spec:
nodeAffinity:
required: [{ capability: llm-access }]
```

### Verifying the pool

```bash
agentforge get nodes
# NAME STATUS CAPABILITIES ACTIVE/MAX
# worker-gpu online llm-access, docker, high-memory, gpu 0/4
# worker-light online llm-access 0/10
```

The scheduler picks the highest-scoring node whose capabilities satisfy each agent's required set; soft preferences break ties, and active-run counts cap concurrency per node.
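
That selection rule reduces to roughly the following (the scoring function and tie-breaking here are assumptions, not the shipped scheduler):

```javascript
// Hypothetical node-selection sketch: filter out nodes that lack a required
// capability or a free run slot, then prefer the node matching the most
// soft preferences. Ties keep the first candidate.
function pickNode(nodes, { required = [], preferred = [] }) {
  const candidates = nodes.filter(
    (n) =>
      n.activeRuns < n.maxRuns &&
      required.every((c) => n.capabilities.includes(c.capability))
  );
  if (candidates.length === 0) return null;
  const score = (n) =>
    preferred.filter((c) => n.capabilities.includes(c.capability)).length;
  return candidates.reduce((best, n) => (score(n) > score(best) ? n : best));
}
```

With the two workers above, the `developer` affinity lands on `worker-gpu` and an unsatisfiable requirement returns no node.
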

---

## 16. Glossary

| Term | Definition |
|------|-----------|
41 changes: 41 additions & 0 deletions docs/who-uses-it.md
@@ -0,0 +1,41 @@
# Who Uses AgentForge

> Part of the [AgentForge documentation](README.md).

AgentForge is a YAML-and-CLI framework. Engineers and platform teams author it; the *artifacts and gates* it produces are consumed across an organisation.

## Roles, concretely

### Platform / DevOps engineers
**Stand it up once, the rest of the org inherits the substrate.**

- Run AgentForge as a control plane + worker pool for the company.
- Configure node pools, secrets, cost ceilings, OTel export.
- Add new pipelines as `git push` — no per-team glue code to maintain.
- Pair with [`docs/platform-architecture.md`](platform-architecture.md).

### Software engineers
**Build the agents your domain needs; reuse the harness.**

- Author `.agent.yaml`, `.pipeline.yaml`, and step pipelines.
- Wire your linter, tests, and security scanners as `script` steps so they gate the LLM. See the harness model: [`docs/harness-model.md`](harness-model.md).
- Ship AI-assisted features without giving up code-review discipline — every step lands in the OTel trace and the dashboard timeline.

### Product / domain owners (marketing, sales, HR, ops, legal, finance)
**Don't write YAML. Drive runs from the dashboard.**

- Kick off pipelines via the dashboard or CLI ("run `seo-review` on this URL").
- Approve / reject / revise at human gates between phases — plain-English revision notes, no code.
- Read and download the typed artifacts the pipeline produces.

The artifact-typing model means revision notes steer the next LLM call: gates are a two-way conversation, not a rubber stamp. Every decision is signed, timestamped, and survives restarts.

## What everyone gets

One binary, one control plane, one audit trail:

- **Cost guardrails at every layer.** Each agent declares its own token + dollar ceiling; pipelines carry org-wide limits. The dashboard shows spend in real time. Runaway LLM calls abort cleanly *before* they bill you.
- **Typed artifacts.** 45 built-in Zod / JSON Schemas for SDLC outputs (and you define your own). Malformed LLM output fails the run before it poisons the next phase. See [`docs/artifacts.md`](artifacts.md).
- **Humans in the loop.** Plain-English approvals between phases — the LLM proposes; the human decides.
- **Real-time dashboard.** Pipeline timeline, live agent conversation, artifact viewer, PDF export, cost tracking — same binary, no extra install.
- **Open source, MIT.** No paid tier, no cloud dependency, no telemetry.
1 change: 1 addition & 0 deletions package.json
@@ -20,6 +20,7 @@
"test:watch": "vitest",
"lint": "biome check --write .",
"typecheck": "tsc --build",
"skills:validate": "node scripts/validate-skills.mjs",
"clean": "rm -rf packages/core/dist packages/core/tsconfig.tsbuildinfo packages/core/src/dashboard/dist packages/platform/dist packages/platform/tsconfig.tsbuildinfo"
},
"engines": {