From da455a1daa13877d927ae0886b3ec4f994dd9f4a Mon Sep 17 00:00:00 2001
From: Mandar Nilange
Date: Sat, 2 May 2026 09:23:54 +0530
Subject: [PATCH 1/2] feat(skills): add agentforge-workflow agent skill +
 Vercel publish flow
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- skills/agentforge-workflow/ — guided AgentForge workflow author with
  on-demand schema cheat sheets (agent / pipeline / node / templates / scaffold)
- scripts/validate-skills.mjs + .github/workflows/publish-skills.yml —
  frontmatter validator (name, description, license, metadata.author,
  metadata.version) enforced in CI; agentforge- prefix required
- README restructure: harness model surfaced right after Quick Start, audience
  pitch / package comparison / harness deep-dive moved to dedicated docs, all
  doc links consolidated into one grouped "Learn more" launchpad
- New docs: who-uses-it, harness-model, packages, artifacts;
  platform-architecture.md gains heterogeneous worker pool deployment example
---
 .github/workflows/publish-skills.yml               |  34 ++
 README.md                                          | 561 ++++--------
 docs/artifacts.md                                  |  42 ++
 docs/harness-model.md                              | 132 +++++
 docs/packages.md                                   |  49 ++
 docs/platform-architecture.md                      |  56 +-
 docs/who-uses-it.md                                |  41 ++
 package.json                                       |   1 +
 scripts/validate-skills.mjs                        | 129 ++++
 skills/README.md                                   |  75 +++
 skills/agentforge-workflow/SKILL.md                | 198 +++++++
 .../references/agent-schema.md                     | 148 +++++
 .../references/node-schema.md                      |  68 +++
 .../references/pipeline-schema.md                  | 153 +++++
 .../references/scaffold.md                         | 109 ++++
 .../references/template-catalog.md                 |  65 ++
 16 files changed, 1411 insertions(+), 450 deletions(-)
 create mode 100644 .github/workflows/publish-skills.yml
 create mode 100644 docs/artifacts.md
 create mode 100644 docs/harness-model.md
 create mode 100644 docs/packages.md
 create mode 100644 docs/who-uses-it.md
 create mode 100644 scripts/validate-skills.mjs
 create mode 100644 skills/README.md
 create mode 100644 skills/agentforge-workflow/SKILL.md
 create mode 100644 skills/agentforge-workflow/references/agent-schema.md
 create mode 100644 skills/agentforge-workflow/references/node-schema.md
 create mode 100644 skills/agentforge-workflow/references/pipeline-schema.md
 create mode 100644 skills/agentforge-workflow/references/scaffold.md
 create mode 100644 skills/agentforge-workflow/references/template-catalog.md

diff --git a/.github/workflows/publish-skills.yml b/.github/workflows/publish-skills.yml
new file mode 100644
index 0000000..7628c88
--- /dev/null
+++ b/.github/workflows/publish-skills.yml
@@ -0,0 +1,34 @@
+name: Publish Skills
+
+# Validates SKILL.md frontmatter under `skills/`. The `skills/` path is
+# what Vercel's skills CLI scans by default — keeping it on the default
+# branch IS the publication step (no upload / no registry submission).
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+    paths:
+      - 'skills/**'
+      - 'scripts/validate-skills.mjs'
+      - '.github/workflows/publish-skills.yml'
+
+concurrency:
+  group: publish-skills-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  validate:
+    name: validate skill frontmatter
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v5
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v5
+        with:
+          node-version: '22'
+
+      - name: Validate skill frontmatter
+        run: node scripts/validate-skills.mjs
diff --git a/README.md b/README.md
index 81146d5..1a41939 100644
--- a/README.md
+++ b/README.md
@@ -14,44 +14,11 @@
 - **Run anywhere** — local, Docker, or remote workers; SQLite or Postgres; OTel-native.
 - **Scale like infra** — multi-worker scheduling, approval gates, cost ceilings, live dashboard.
-Ships with a reference SDLC template — runnable end-to-end in minutes. Domain-agnostic: point it at code review, content generation, ops runbooks, data pipelines — anywhere multiple LLM calls need to be coordinated with humans in the loop.
-
-> **For platform teams:** think of it as Kubernetes-style orchestration for agentic workloads — control plane, execution plane, declarative specs — without the cluster.
-
-> **Status:** v0.2.0 release candidate (`v0.2.0-rc.2`) — early-feedback build. API surface is stabilising but may still shift; `npm install @mandarnilange/agentforge` pulls the RC. Please [open an issue](https://github.com/mandarnilange/agentforge/issues) for anything that looks rough, or use [Discussions](https://github.com/mandarnilange/agentforge/discussions) for usage questions.
-
----
-
-## Agent orchestration platform for every team
-
-Any team running LLM-driven work — marketing, sales, HR, operations, legal, product, engineering — gets the same platform: cost control, observability, human-in-the-loop gates, typed artifacts, scheduling, recovery. You build the agents your domain needs; AgentForge handles the orchestration around them.
-
-- **Marketing, sales, HR, ops, legal, finance** — run LLM workflows through the same dashboard the engineers use. Plain-English revision notes, no code, no engineering escalations. Domain expertise stays with the domain experts.
-- **Product & engineering teams** — ship AI-assisted features without giving up code-review discipline. Your linter, tests, and security scanners gate the LLM; the dashboard gives a full audit trail for every decision.
-- **Platform teams** — stand up AgentForge once; every team in the org inherits cost tracking, audit logs, sandboxing, and observability. Zero per-team glue code to maintain. Adding a new team is a `git push`.
-
-One binary, one control plane, one audit trail. You focus on the systems; AgentForge handles the harness.
+Ships with a reference SDLC template — runnable end-to-end in minutes. Domain-agnostic: code review, content generation, ops runbooks, data pipelines — anywhere multiple LLM calls need to be coordinated with humans in the loop.
--- -## Key Features - -- **Declarative YAML for everything.** Agents, pipelines, nodes, and artifact schemas are data, not code. Version-control them, diff them, generate them, `apply` them to a persistent store. -- **Deterministic + LLM harness per agent.** Each agent is a named flow of `llm`, `script`, `validate`, and `transform` steps — wrapped in loops and conditionals. Wire in your linter, test runner, or security scanner and the LLM's output is checked by *your tools* on every run. The framework becomes the harness you customize; your tools stay in charge of correctness. -- **Multiple execution targets — nodes.** `local` (in-process), Docker containers (per-agent isolation), and remote workers over SSH / HTTP. Agents declare `nodeAffinity`; the scheduler matches them to nodes by capabilities + load. Run heterogeneous worker pools (GPU, high-memory, lightweight) from the same image. -- **Bring your own LLM provider.** Anthropic, OpenAI (GPT-4o, o1), Google Gemini, and Ollama (local). Mix providers *per agent* in the same pipeline — `model.provider` is agent-level, not global. -- **Bring your own coding agent.** The `IExecutionBackend` port is the plug-in point for any coding-agent runtime. Ships today with `@mariozechner/pi-coding-agent` (`read`, `write`, `edit`, `bash`, `grep`, `find`). Adapt Codex, OpenCode, Goose, Gemini CLI, Aider, or your internal tool by implementing one interface. Add per-agent custom tools — query your database, file a Jira ticket, call an internal API — through a tool-definition API that doesn't require a framework recompile. Open source and extensible by design. -- **Typed artifacts with schema validation.** 45 built-in Zod / JSON Schemas for SDLC outputs; define your own in a single file. Malformed LLM output fails the run before it poisons the next phase. -- **Humans in the loop, not just observers.** Between phases, a reviewer accepts, rejects, or redirects with a note in plain English — no YAML, no code. 
The LLM proposes; the human decides. Revision notes steer the next LLM call, so gates are a two-way conversation, not a rubber stamp. Every decision is signed, timestamped, and survives restarts. -- **Cost guardrails at every layer.** Each agent declares its own token + dollar ceiling; pipelines carry org-wide limits set by the platform team; the dashboard shows spend accumulating in real time. A runaway LLM call aborts cleanly before it bills you — not after. -- **Real-time dashboard in the binary.** React SPA + Server-Sent Events: pipeline timeline, live agent conversation, gate actions, artifact viewer, PDF export, cost tracking. No extra install. -- **Batteries-included templates.** `simple-sdlc` starter (3 agents) in core. Platform binary ships `api-builder`, `code-review`, `content-generation`, `data-pipeline`, `seo-review` — real, runnable pipelines, not demos. -- **Laptop-to-production continuum.** SQLite → PostgreSQL, local → Docker → remote workers, no-op OTel → full SDK with Jaeger/Grafana. Same YAML, same binary, env-var upgrades only. No migration. -- **Open source, MIT.** No paid tier, no cloud dependency, no telemetry. You own the stack. - ---- - -## Quick Start (5 minutes) +## Quick Start ```bash # 1. Install @@ -63,65 +30,76 @@ export ANTHROPIC_API_KEY=sk-ant-... # 3. Scaffold the reference template into .agentforge/ npx @mandarnilange/agentforge init --template simple-sdlc -# 4. Run a single agent against a brief -npx @mandarnilange/agentforge exec analyst --input "Build a freelance invoicing SaaS" - -# 5. Run the full pipeline (approval gates between phases) +# 4. Run the full pipeline (approval gates between phases) npx @mandarnilange/agentforge run --project my-app --input "brief=Build a freelance invoicing SaaS" -# 6. Open the dashboard to watch it live -npx @mandarnilange/agentforge dashboard -# → http://localhost:3001 +# 5. 
Watch it live
+npx @mandarnilange/agentforge dashboard   # → http://localhost:3001
+```
+
+---
+
+## The harness model — what makes AgentForge different
+
+Other frameworks treat an "agent" as one LLM call wrapped in tools. AgentForge treats an agent as a **harness** — a named flow of steps where your existing tools are first-class:
+
+- `llm` — call the model with the system prompt + inputs.
+- `script` — run any shell command (linter, test runner, security scanner, your custom CLI).
+- `validate` — Zod / JSON Schema check against an artifact. Fails the run by default.
+- `transform` — pure data reshape between steps.
+
+Wrap any of these in `loop` (with an `until` predicate + `maxIterations`) or `condition` blocks. The LLM proposes; *your tools* decide whether the output is acceptable. Bad output never leaks into the next phase.
+
+A real example — the bundled `developer` agent's *generate → lint → test → fix-until-passing* flow:
+
+```yaml
+spec:
+  flow:
+    - step: generate-code
+    - step: lint-and-format
+    - loop:
+        until: "{{steps.test-gate.output}}"   # exits when test-gate emits "PASS"
+        maxIterations: 3
+        do:
+          - step: run-tests
+          - step: test-gate
+          - step: fix-code
+            condition: "{{steps.test-gate.output}}"   # skip fix if tests passed
+    - step: validate-output
+    - step: git-commit
- ``` ┌─────────────────────────────────────────────┐ │ CONTROL PLANE │ - │ │ - │ ┌──────────┐ ┌──────────┐ ┌────────┐ │ - │ │Dashboard │ │Scheduler │ │ Gates │ │ - │ │ + HTTP │ │ │ │ │ │ - │ └──────────┘ └──────────┘ └────────┘ │ - │ ┌────────────────┐ ┌──────────────────┐ │ - │ │ Definition │ │ State store │ │ - │ │ store (YAML / │ │ SQLite / Postgres│ │ - │ │ DB-backed) │ │ │ │ - │ └────────────────┘ └──────────────────┘ │ + │ Dashboard · Scheduler · Gates │ + │ Definition store · State store · Events │ └──────────────────┬──────────────────────────┘ │ dispatch jobs │ report results ▼ ┌─────────────────────────────────────────────┐ │ EXECUTION PLANE │ - │ │ - │ ┌───────────┐ ┌───────────┐ ┌────────┐ │ - │ │ node: │ │ node: │ │ node: │ │ - │ │ local │ │ docker │ │worker-1│ │ - │ │ (in-proc) │ │(container)│ │(ssh/ht)│ │ - │ └───────────┘ └───────────┘ └────────┘ │ + │ node: local · docker · ssh / http worker │ │ │ │ Agents run here — file system, LLM calls, │ │ shell, tools all live on the node. │ └─────────────────────────────────────────────┘ ``` -- **Control plane** — pipeline controller, gate controller, scheduler, definition store, state store, event bus, dashboard server. On a laptop it's a single Node.js process; in production it's one or more control-plane containers backed by Postgres. -- **Execution plane — nodes.** A node is anywhere an agent can run: the local process, a Docker container, a remote worker reached over SSH or HTTP. Nodes advertise **capabilities** (`llm-access`, `docker`, `local-fs`, `high-memory`, `git`, …) and the scheduler matches each agent's `nodeAffinity` to the pool. -- **Same binary.** Both planes are in the `agentforge` binary. On a laptop, one process hosts both. In distributed deployments, you run control-plane and worker containers from the same image — just different CLI invocations. 
- -Deep dive: [`docs/platform-architecture.md`](docs/platform-architecture.md) · [`docs/pipeline-execution-flows.md`](docs/pipeline-execution-flows.md). +- **Control plane** — pipeline / gate controllers, scheduler, definition store, state store, event bus, dashboard server. +- **Execution plane (nodes)** — the local process, a Docker container, or a remote worker over SSH / HTTP. Nodes advertise **capabilities** (`llm-access`, `docker`, `local-fs`, `git`, `gpu`, …) and the scheduler matches each agent's `nodeAffinity` to the pool. +- **Same binary.** On a laptop, one process hosts both. In production, run control-plane and worker containers from the same image with different CLI invocations. --- -## Core Concepts +## Core concepts | Concept | What it is | Defined in | |---|---|---| @@ -131,44 +109,7 @@ Deep dive: [`docs/platform-architecture.md`](docs/platform-architecture.md) · [ | **Artifact** | A typed, validated JSON document passed between agents. | `.agentforge/schemas/*.schema.yaml` | | **Gate** | A pause point between phases for human review (approve / reject / revise). 
| Inline in the pipeline | -### Nodes in more detail - -Nodes are declarative like everything else: - -```yaml -# .agentforge/nodes/local.node.yaml -apiVersion: agentforge/v1 -kind: NodeDefinition -metadata: - name: local - type: local -spec: - connection: - type: local - capabilities: [llm-access, local-fs, git] - resources: - maxConcurrentRuns: 3 -``` - -```yaml -# .agentforge/nodes/worker-1.node.yaml (platform binary) -apiVersion: agentforge/v1 -kind: NodeDefinition -metadata: - name: worker-1 - type: ssh -spec: - connection: - type: ssh - host: worker1.internal - user: ci - keyFile: ~/.ssh/deploy-key - capabilities: [llm-access, docker, high-memory] - resources: - maxConcurrentRuns: 10 -``` - -Agents ask for capabilities they need: +Agents declare which capabilities they need: ```yaml # .agentforge/agents/developer.agent.yaml (excerpt) @@ -176,20 +117,18 @@ spec: nodeAffinity: required: - capability: llm-access - - capability: docker # agent writes files + runs shell — needs isolation + - capability: docker # writes files + runs shell — needs isolation preferred: - - capability: high-memory # prefer a beefy node if one's available + - capability: high-memory # prefer a beefy node if available ``` -The scheduler picks the highest-scoring node whose capabilities satisfy the required set. Soft preferences break ties. - -Deeper architectural tour: [`docs/architecture.md`](docs/architecture.md). +The scheduler picks the highest-scoring node whose capabilities satisfy the required set; soft preferences break ties. --- ## Reference template — `simple-sdlc` -Three agents wired into a classic requirements → architecture → implementation flow. Use it to learn the mechanics; customise for your own domain. +Three agents wired into a classic requirements → architecture → implementation flow. 
``` Brief ──► analyst ─► [gate] ─► architect ─► [gate] ─► developer ─► done @@ -203,198 +142,32 @@ metadata: name: simple-sdlc spec: input: - - name: brief - type: raw-brief - required: true + - { name: brief, type: raw-brief, required: true } phases: - - name: requirements - phase: 1 - agents: [analyst] - gate: { required: true } - - name: architecture - phase: 2 - agents: [architect] - gate: { required: true } - - name: implementation - phase: 3 - agents: [developer] -``` - -```yaml -# .agentforge/agents/analyst.agent.yaml (excerpt) -apiVersion: agentforge/v1 -kind: AgentDefinition -metadata: - name: analyst - role: Requirements Analyst -spec: - executor: pi-ai - model: - provider: anthropic - name: claude-sonnet-4-20250514 - systemPrompt: - file: prompts/analyst.system.md - inputs: - - type: raw-brief - required: true - outputs: - - type: requirements - schema: schemas/requirements.schema.yaml -``` - -More templates — `api-builder`, `code-review`, `content-generation`, `data-pipeline`, `seo-review` — ship with the platform binary. Catalog: [`docs/templates.md`](docs/templates.md). - ---- - -## Agents are mini-pipelines — the harness model - -An agent isn't forced to be "one LLM call". Each agent can declare a **flow** of named steps — LLM calls, shell scripts, schema validation, transforms — and wrap them in conditionals and loops. The LLM produces code or content; your own tools validate it, your linter fixes it, your test runner verifies it, your security scanner flags it. Agents become a harness around the LLM, not a thin wrapper over it. - -### Step types - -| Type | What it does | -|---|---| -| `llm` | Invokes the agent's model with the system prompt + inputs. The normal LLM call. | -| `script` | Runs a shell command on the node. Has access to template variables (`{{run.workdir}}`, `{{pipeline.id}}`, `{{steps..output}}`, `{{steps..exitCode}}`). | -| `validate` | Runs a Zod / JSON Schema check against a named artifact or the last LLM output. 
Fails the run by default; set `continueOnError: true` to log and continue. | -| `transform` | Pure data reshape between steps (no side effects). | - -Plus two control-flow constructs usable anywhere in a flow: - -- **`loop`** — retry a block until a predicate step outputs a success sentinel, with a `maxIterations` ceiling. -- **`condition`** — skip a step when a referenced step's output doesn't match. - -### Real example — the bundled `developer` agent - -This is from `packages/core/src/templates/simple-sdlc/agents/developer.agent.yaml`. It shows the *generate → lint → test → fix-until-passing* pattern that `script` + `loop` unlock together: - -```yaml -spec: - executor: pi-coding-agent - tools: [read, write, edit, bash, grep, find] - - definitions: - generate-code: - type: llm - instructions: | - Generate the full implementation based on the requirements and architecture plan. - - lint-and-format: - type: script - run: | - cd {{run.workdir}} - # Auto-detect + run the project's linter/formatter - if [ -f package.json ]; then npx eslint src/ --fix; npx prettier --write "src/**/*.{ts,js}" - elif [ -f pyproject.toml ]; then python -m black .; python -m ruff check --fix . - elif [ -f go.mod ]; then gofmt -w . - fi - continueOnError: true - - run-tests: - type: script - run: | - cd {{run.workdir}} - if [ -f package.json ]; then npm test - elif [ -f pyproject.toml ]; then python -m pytest -v - elif [ -f go.mod ]; then go test ./... - fi - captureOutput: true - continueOnError: true - - test-gate: - type: script - run: | - if [ "{{steps.run-tests.exitCode}}" = "0" ]; then echo "PASS"; else echo "false"; fi - - fix-code: - type: llm - instructions: | - Fix attempt {{loop.iteration}} of {{loop.maxIterations}}. - Failing tests: - {{steps.run-tests.output}} - Fix the source code — don't modify tests unless they have a genuine bug. 
- - validate-output: - type: validate - schema: code-output - - git-commit: - type: script - run: | - cd {{run.workdir}} - git add -A && git commit -m "feat(developer): pipeline {{pipeline.id}}" - continueOnError: true - - flow: - - step: generate-code - - step: lint-and-format - - loop: - until: "{{steps.test-gate.output}}" # exits when test-gate emits "PASS" - maxIterations: 3 - do: - - step: run-tests - - step: test-gate - - step: fix-code - condition: "{{steps.test-gate.output}}" # skip fix if tests passed - - step: validate-output - - step: git-commit + - { name: requirements, phase: 1, agents: [analyst], gate: { required: true } } + - { name: architecture, phase: 2, agents: [architect], gate: { required: true } } + - { name: implementation, phase: 3, agents: [developer] } ``` -### Why this matters - -- **Your existing tools stay in charge of correctness.** The LLM proposes; `eslint`, `pytest`, `go vet`, `trivy`, `semgrep`, whatever you already trust, decide whether it's acceptable. Bad LLM output doesn't leak into the next phase. -- **Customize without forking.** Want a different linter, a stricter security scan, a different commit convention? It's YAML — edit the `run:` block. No framework recompile. -- **Domain-agnostic.** The same mechanics build a content agent (generate → SEO audit → Grammarly → publish), a data agent (generate SQL → explain-plan → dry-run → apply), an ops agent (generate runbook → shellcheck → render to PDF). Scripts are the universal glue. -- **Observable.** Every step — LLM and script — lands in the state store with output, exit code, duration, and a span in your OTel trace. The dashboard timeline shows the whole harness, not just the LLM turn. - -Deeper dive on step pipelines, template variables, and loop semantics: [`docs/architecture.md`](docs/architecture.md) and [`docs/pipeline-execution-flows.md`](docs/pipeline-execution-flows.md). 
- ---- - -## `@mandarnilange/agentforge` vs `@mandarnilange/agentforge-core` - -Two npm packages ship from this repo. Pick based on your target environment. - -| | **`@mandarnilange/agentforge-core`** | **`@mandarnilange/agentforge`** (platform) | -|---|---|---| -| **Install** | `npm install @mandarnilange/agentforge-core` | `npm install @mandarnilange/agentforge` (pulls in core) | -| **Binary** | `agentforge-core` | `agentforge` | -| **Intended for** | Local dev, evaluation, library embed | Production, teams, multi-host | -| **LLM providers** | Anthropic | Anthropic + OpenAI + Gemini + Ollama | -| **Executors** | Local (in-process) | Local + **Docker container** + **Remote HTTP** | -| **Node types** | `local` | `local` + `ssh` + remote workers | -| **State store** | SQLite (file) | SQLite **or** PostgreSQL | -| **Persistent definitions** | YAML on disk, loaded per run | YAML on disk **or** `apply` to DB (versioned, hot-reload) | -| **Observability** | OTel API (no-op without SDK) | Full OTel SDK + Jaeger / Grafana export | -| **Crash recovery** | — | Pipeline rehydration + reconciliation loop | -| **Rate limiting** | — | Token / cost / concurrency per pipeline | -| **Multi-host deploy** | — | Control-plane + worker Docker Compose files | -| **Docker image** | `ghcr.io/mandarnilange/agentforge-core` (~289 MB) | `ghcr.io/mandarnilange/agentforge-platform` (~336 MB) | - -**Rule of thumb:** start with `@mandarnilange/agentforge-core` if you want the smallest surface for experimentation or you're embedding AgentForge inside your own CLI. Otherwise install `@mandarnilange/agentforge` — defaults are identical for local dev (SQLite, local executor, Anthropic), and every production feature is available the day you need it. *You won't have to migrate.* - -Multi-provider setup (OpenAI, Gemini, Ollama): [`docs/multi-provider.md`](docs/multi-provider.md). 
+More templates — `api-builder`, `code-review`, `content-generation`, `data-pipeline`, `seo-review` — ship with the platform binary. --- ## Deployment topologies -Three ways to run AgentForge, smallest to largest. Same YAML, same binary — only the deployment shape changes. +Same YAML, same binary — three shapes from smallest to largest. ### 1. Laptop — single process ``` ┌─────────────────────────────────────────────────┐ │ agentforge (one Node.js process) │ -│ │ │ control plane ──dispatch──► local node │ -│ │ -│ SQLite state (./output/.state.db) │ -│ Anthropic LLM (ANTHROPIC_API_KEY) │ +│ SQLite state · Anthropic LLM │ └─────────────────────────────────────────────────┘ ``` -For evaluation, demos, and most small projects. `npx @mandarnilange/agentforge dashboard` starts everything. If you prefer running it in Docker without Postgres or OTel: +`npx @mandarnilange/agentforge dashboard` starts everything. Dockerized variant available without Postgres / OTel: ```bash docker compose up -d # Dashboard at :3001 @@ -404,247 +177,137 @@ PROJECT=my-app BRIEF="Build a todo app" \ ### 2. Single host — production on one box -``` -┌─────────────────────────────────────────────────────┐ -│ Docker host │ -│ │ -│ ┌──────────┐ ┌────────────┐ ┌──────────┐ │ -│ │ Postgres │◄───►│ agentforge │────►│ Docker │ │ -│ │ │ │ (control + │ │ executor │ │ -│ └──────────┘ │ local node)│ │ (node) │ │ -│ └────────────┘ └──────────┘ │ -│ │ │ -│ ▼ │ -│ Jaeger + Grafana │ -└─────────────────────────────────────────────────────┘ -``` +Postgres durability, OTel tracing, Docker-isolated agent runs. ```bash docker compose -f packages/platform/docker-compose.prod.yml up -d ``` -One-box setup for a small team. Postgres durability, OTel tracing, Docker-isolated agent runs, dashboard at `:3001`. - ### 3. 
Distributed — control plane + worker pool -``` - ┌────────────────────────────────┐ ┌─────────────────────────────┐ - │ Control-plane host │ │ Worker host #1 │ - │ │ │ │ - │ ┌──────────────────────────┐ │ │ ┌───────────────────────┐ │ - │ │ agentforge │ │ HTTP │ │ agentforge node start │ │ - │ │ scheduler · dashboard │◄─┼────────┼─►│ │ │ - │ │ gates · state · events │ │ │ │ Docker executor │ │ - │ └──────────────────────────┘ │ │ │ local-fs · git │ │ - │ ┌──────────────────────────┐ │ │ └───────────────────────┘ │ - │ │ Postgres │ │ └─────────────────────────────┘ - │ └──────────────────────────┘ │ ▲ - └────────────────────────────────┘ │ - ┌───────────────┴─────────────┐ - │ Worker host #2 … N │ - │ (same image, more pods) │ - └─────────────────────────────┘ -``` - ```bash -# Control plane host +# Control-plane host docker compose -f packages/platform/docker-compose.control-plane.yml up -d -# Every worker host +# Each worker host CONTROL_PLANE_URL=http://cp-host:3001 \ docker compose -f packages/platform/docker-compose.worker.yml up -d ``` -Workers register with the control plane, heartbeat, and receive dispatched agent jobs. Scale horizontally by adding worker hosts; the scheduler routes work using node capabilities + current load. - -> **Current limitation — control plane is single-replica.** The execution plane scales horizontally to many worker hosts, but the control plane itself should be run as a single instance today. The pending-job queue, scheduler state, and event bus are process-local, so running two control-plane replicas will split-brain (lost dispatches, halved SSE updates, racing reconcilers). This is tracked as a roadmap item with a concrete path to fix — see [`ROADMAP.md`](ROADMAP.md#horizontal-scaling-of-the-control-plane). +Workers register, heartbeat, and receive dispatched jobs. Heterogeneous pools (GPU vs lightweight) are routed by `nodeAffinity`. 
-#### Heterogeneous worker pools +> **Current limitation — control plane is single-replica.** The execution plane scales horizontally to many worker hosts, but the control plane itself should be run as a single instance today. Pending-job queue, scheduler state, and event bus are process-local; running two replicas will split-brain. Tracked with a concrete fix path — see [`ROADMAP.md`](ROADMAP.md#horizontal-scaling-of-the-control-plane). -Two workers with different capabilities on different hosts — the scheduler picks the right one for each agent via `nodeAffinity`. +--- -```bash -# Worker A — beefy, Docker-isolated, GPU -NODE_NAME=worker-gpu \ -NODE_CAPABILITIES=llm-access,docker,high-memory,gpu \ -NODE_MAX_CONCURRENT_RUNS=4 \ -CONTROL_PLANE_URL=http://cp:3001 \ - docker compose -f packages/platform/docker-compose.worker.yml up -d +## Two packages — which one to install -# Worker B — lightweight, llm-calls only -NODE_NAME=worker-light \ -NODE_CAPABILITIES=llm-access \ -NODE_MAX_CONCURRENT_RUNS=10 \ -CONTROL_PLANE_URL=http://cp:3001 \ - docker compose -f packages/platform/docker-compose.worker.yml up -d -``` +Install **`@mandarnilange/agentforge`** unless you have a specific reason not to. Defaults are identical for local dev (SQLite, local executor, Anthropic), and every production feature is available the day you need it — no migration. -Matching agent — the `developer` agent demands Docker isolation and benefits from GPU, so it routes to `worker-gpu`. The `analyst` agent only needs LLM access, so it lands on `worker-light`: +Install **`@mandarnilange/agentforge-core`** if you want the framework primitives without multi-provider middleware, Postgres, or the Docker / SSH executors — typically when embedding AgentForge in your own CLI. 
-```yaml -# .agentforge/agents/developer.agent.yaml -spec: - nodeAffinity: - required: [{ capability: llm-access }, { capability: docker }] - preferred: [{ capability: gpu }, { capability: high-memory }] -``` +--- -```yaml -# .agentforge/agents/analyst.agent.yaml -spec: - nodeAffinity: - required: [{ capability: llm-access }] -``` +## Dashboard -Verify the pool: +A React SPA served by the same binary. Real-time pipeline view via Server-Sent Events: run list with status / cost / progress, phase-by-phase timeline with live agent conversations, gate management (approve / reject / revise in-browser), artifact viewer with type-aware renderers, PDF export. ```bash -agentforge get nodes -# NAME STATUS CAPABILITIES ACTIVE/MAX -# worker-gpu online llm-access, docker, high-memory, gpu 0/4 -# worker-light online llm-access 0/10 +npx @mandarnilange/agentforge dashboard --port 3001 ``` -Docker image build commands: - -```bash -docker build --target core -t agentforge-core . # ~289 MB -docker build --target platform -t agentforge-platform . # ~336 MB -``` +When `ANTHROPIC_API_KEY` isn't set, the dashboard renders a read-only banner — useful for browsing completed runs. --- -## Dashboard - -A React SPA served by the same binary. Real-time pipeline view via Server-Sent Events: +## Agent Skills -- Pipeline run list with status, cost, and progress -- Phase-by-phase timeline with live agent conversations -- Gate management (approve / reject / revise in-browser) -- Artifact viewer with type-aware renderers -- PDF export of a completed run +Designing an AgentForge workflow from scratch is a lot of YAML. The repo ships an [agent skill](https://skills.sh) that walks any Claude Code / Cursor / Codex session through the design — agents, phases, gates, loops, parallelism, wiring, nodes — and emits a working `.agentforge/` directory. 
```bash -npx @mandarnilange/agentforge dashboard --port 3001 +npx skills add mandarnilange/agentforce_public/agentforge-workflow ``` -When `ANTHROPIC_API_KEY` isn't set, the dashboard renders a read-only banner — useful for browsing completed runs. +Then ask the agent something like *"help me design an AgentForge pipeline for PR triage"* and the skill kicks in. Catalog and authoring docs: [`skills/`](skills/). --- -## CLI Reference +## CLI reference ```bash agentforge init --template # Scaffold .agentforge/ from a template agentforge templates list # Show bundled templates -agentforge list # List agents in the current project -agentforge info # Agent details agentforge exec [options] # Run a single agent agentforge run --project # Start a pipeline agentforge run --continue # Resume a paused pipeline agentforge dashboard # Start the web dashboard agentforge get pipelines # List pipeline runs -agentforge get pipeline # Inspect a run -agentforge gate approve # Approve a gate -agentforge gate reject # Reject a gate -agentforge gate revise # Request revision +agentforge gate {approve,reject,revise} # Gate actions agentforge logs # View agent run logs agentforge apply -f # Apply persistent YAML definitions (platform) agentforge get nodes # List registered worker nodes (platform) agentforge node start --control-plane-url # Run as a worker (platform) ``` -Full command semantics, flag reference, and resume flow: [`docs/getting-started.md`](docs/getting-started.md). - ---- - -## Artifact Typing & Validation - -Every agent declares typed inputs and outputs. Artifacts are validated against Zod / JSON Schemas at every pipeline boundary — invalid output fails the agent run before it reaches the next phase. 
- -``` -Agent YAML Zod Schema Runtime -┌─────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐ -│ outputs: │ │ RequirementsSchema │ │ Agent produces JSON │ -│ - type: │───▶│ = z.object({ │───▶│ → safeParse(output) │ -│ requirements│ │ epics: [...], │ │ → pass ✓ or fail ✗ │ -│ schema: ... │ │ ... │ └──────────────────────┘ -└─────────────────┘ │ }) │ - └──────────────────────┘ -``` - -Ships with 45 built-in schemas covering requirements, architecture, code, data, testing, security, and DevOps. Define your own in TypeScript with Zod and reference them from agent YAML. Details: [`docs/architecture.md`](docs/architecture.md#artifact-flow). - --- -## Environment Variables +## Environment variables | Variable | Required | Default | Description | |---|---|---|---| -| `ANTHROPIC_API_KEY` | Yes | — | Anthropic API key. Missing key prints a friendly error with a link to the console. | -| `OPENAI_API_KEY` | If using OpenAI | — | OpenAI API key. | -| `GOOGLE_API_KEY` | If using Gemini | — | Google AI API key. | +| `ANTHROPIC_API_KEY` | Yes | — | Anthropic API key. | +| `OPENAI_API_KEY` / `GOOGLE_API_KEY` | If using | — | Other providers. | | `OLLAMA_BASE_URL` | No | `http://localhost:11434` | Ollama server URL. | | `AGENTFORGE_DEFAULT_MODEL` | No | `claude-sonnet-4-20250514` | Default model. | -| `AGENTFORGE_MAX_TOKENS` | No | `64000` | Max output tokens. | -| `AGENTFORGE_LLM_TIMEOUT_SECONDS` | No | `600` | Wall-clock timeout per agent LLM call. Set `0` to disable. | -| `AGENTFORGE_OUTPUT_DIR` | No | `./output` | Artifact output directory. | -| `AGENTFORGE_DIR` | No | `./.agentforge` | Path to definitions directory. | -| `AGENTFORGE_LOG_LEVEL` | No | `info` | Log level. | +| `AGENTFORGE_LLM_TIMEOUT_SECONDS` | No | `600` | Wall-clock timeout per LLM call (`0` disables). | +| `AGENTFORGE_OUTPUT_DIR` / `AGENTFORGE_DIR` | No | `./output` / `./.agentforge` | Output and definitions paths. | | `AGENTFORGE_STATE_STORE` | No | `sqlite` | `sqlite` or `postgres`. 
| -| `AGENTFORGE_POSTGRES_URL` | If `postgres` | — | `postgres://user:pass@host:port/db` — masked in logs. | +| `AGENTFORGE_POSTGRES_URL` | If `postgres` | — | Connection URL — masked in logs. | | `OTEL_EXPORTER_OTLP_ENDPOINT` | No | — | Enables OTel tracing export. | -### Reliability - -- **LLM timeouts.** Every agent LLM call is bounded (default 600s, per-agent override via `spec.resources.timeoutSeconds`). Timeouts abort in-flight HTTP and fail with an actionable error. -- **Retry on `overloaded_error`.** Anthropic HTTP 529 is retried 3× with exponential backoff (2s, 4s, 8s). Caller aborts take precedence. -- **Secret masking.** API keys and the Postgres URL are registered at startup and replaced with `***` in logs, errors, and conversation transcripts. +**Reliability:** every LLM call is bounded by `timeoutSeconds` (per-agent override available). Anthropic HTTP 529 (`overloaded_error`) is retried 3× with exponential backoff. API keys and the Postgres URL are masked in logs, errors, and conversation transcripts. --- -## Documentation +## Learn more -Deep-dive guides live in [`docs/`](docs/): +Every deep-dive lives in [`docs/`](docs/). Pick a track: -- **[Getting Started](docs/getting-started.md)** — full walkthrough from install to running a pipeline. -- **[Architecture](docs/architecture.md)** — control plane, domain model, ports & adapters. -- **[Platform Architecture](docs/platform-architecture.md)** — distributed execution, schedulers, recovery. -- **[Pipeline Execution Flows](docs/pipeline-execution-flows.md)** — how a run actually moves through the system. -- **[Multi-Provider Execution](docs/multi-provider.md)** — using OpenAI, Gemini, and Ollama alongside Anthropic. +**Get started** +- **[Getting Started](docs/getting-started.md)** — install to first pipeline run, full CLI walkthrough, resume flow. +- **[Who Uses It](docs/who-uses-it.md)** — what platform engineers, software engineers, and domain owners each get out of AgentForge. 
- **[Templates](docs/templates.md)** — catalog of bundled pipeline templates. -- **[pi-coding-agent Extensions](docs/pi-coding-agent-extensions.md)** — adding custom tools and lifecycle hooks. -- **[Testing Guide](docs/testing-guide.md)** — how to run tests, dry-runs, and real pipelines. ---- - -## Contributing +**Concepts** +- **[Harness Model](docs/harness-model.md)** — full step-grammar walkthrough and the bundled `developer` agent's test-fix loop. +- **[Architecture](docs/architecture.md)** — control plane, domain model, ports & adapters, step grammar. +- **[Pipeline Execution Flows](docs/pipeline-execution-flows.md)** — how a run moves through the system. +- **[Artifact Typing](docs/artifacts.md)** — typed inputs/outputs, schema validation, why malformed LLM output fails fast. -Contributions are welcome — bug reports, feature ideas, documentation fixes, and code. +**Operate** +- **[Platform Architecture](docs/platform-architecture.md)** — distributed execution, schedulers, recovery, heterogeneous worker pools. +- **[Packages — core vs platform](docs/packages.md)** — feature comparison and embedding the framework. +- **[Multi-Provider Execution](docs/multi-provider.md)** — OpenAI, Gemini, and Ollama alongside Anthropic. -- **Bug reports / feature requests:** [GitHub issues](https://github.com/mandarnilange/agentforge/issues). -- **Development setup and conventions:** [`CONTRIBUTING.md`](CONTRIBUTING.md). -- **Testing workflow:** [`docs/testing-guide.md`](docs/testing-guide.md). -- **Larger architectural work / deferred items:** [`ROADMAP.md`](ROADMAP.md) — each entry is issue-ready, pick one up. -- **Pull requests:** small, focused, with tests. Conventional-commit messages preferred. - -Everything is MIT-licensed. Contributions land under the same licence. +**Extend & test** +- **[pi-coding-agent Extensions](docs/pi-coding-agent-extensions.md)** — adding custom tools and lifecycle hooks. 
+- **[Testing Guide](docs/testing-guide.md)** — running tests, dry-runs, and real pipelines. --- -## Using just the framework (`@mandarnilange/agentforge-core`) +## Stability -If you're embedding the engine into your own CLI or service — or you want the framework without the platform binary, multi-provider middleware, or Postgres — install `@mandarnilange/agentforge-core` directly: +v0.2.0 release candidate (`v0.2.0-rc.2`) — early-feedback build. API surface is stabilising but may still shift. `npm install @mandarnilange/agentforge` pulls the RC. [Open an issue](https://github.com/mandarnilange/agentforge/issues) for anything that looks rough, or use [Discussions](https://github.com/mandarnilange/agentforge/discussions) for usage questions. -```bash -npm install @mandarnilange/agentforge-core -npx @mandarnilange/agentforge-core init --template simple-sdlc -``` +--- -Same YAML schema, same executors, same control plane. You wire your own entry point. Package-level docs: [`packages/core/README.md`](packages/core/README.md). +## Contributing ---- +Bug reports, feature ideas, doc fixes, and code are all welcome. -## License +- **Issues / requests:** [GitHub issues](https://github.com/mandarnilange/agentforge/issues). +- **Dev setup and conventions:** [`CONTRIBUTING.md`](CONTRIBUTING.md). +- **Larger architectural work:** [`ROADMAP.md`](ROADMAP.md) — every entry is issue-ready. +- **Pull requests:** small, focused, with tests. Conventional-commit messages preferred. -MIT — see [`LICENSE`](LICENSE). +MIT-licensed — see [`LICENSE`](LICENSE). diff --git a/docs/artifacts.md b/docs/artifacts.md new file mode 100644 index 0000000..0c2c4a5 --- /dev/null +++ b/docs/artifacts.md @@ -0,0 +1,42 @@ +# Artifact Typing & Validation + +> Part of the [AgentForge documentation](README.md). + +Every agent declares typed inputs and outputs. Artifacts are validated against Zod / JSON Schemas at every pipeline boundary — invalid output fails the agent run before it reaches the next phase. 
+ +``` +Agent YAML Zod Schema Runtime +┌─────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐ +│ outputs: │ │ RequirementsSchema │ │ Agent produces JSON │ +│ - type: │───▶│ = z.object({ │───▶│ → safeParse(output) │ +│ requirements│ │ epics: [...], │ │ → pass ✓ or fail ✗ │ +│ schema: ... │ │ ... │ └──────────────────────┘ +└─────────────────┘ │ }) │ + └──────────────────────┘ +``` + +## What ships + +45 built-in schemas covering requirements, architecture, code, data, testing, security, and DevOps — see `packages/core/src/schemas/`. Every shipped template references them so you can compose pipelines without inventing new artifact types. + +## Defining your own + +Add a Zod schema in TypeScript and reference it from agent YAML by file path: + +```yaml +# .agentforge/agents/my-agent.agent.yaml +spec: + outputs: + - type: my-artifact + schema: schemas/my-artifact.schema.yaml +``` + +The schema file can be either Zod-shaped TypeScript (loaded via the schema registry) or a JSON Schema YAML — both validate at the same boundary. + +Architectural details and the artifact flow through phases: [`docs/architecture.md`](architecture.md#artifact-flow). + +## Why this matters + +- **Malformed LLM output is caught early.** Bad JSON, missing fields, or wrong types abort the agent run before downstream agents consume it. +- **Wiring is explicit.** Each agent's `inputs[].type` and `outputs[].type` form a contract. Two agents producing the same type is a configuration error you catch at `agentforge validate` time, not at runtime. +- **Schemas double as docs.** New team members understand what each phase produces by reading one file. diff --git a/docs/harness-model.md b/docs/harness-model.md new file mode 100644 index 0000000..426ad4c --- /dev/null +++ b/docs/harness-model.md @@ -0,0 +1,132 @@ +# The Harness Model + +> Part of the [AgentForge documentation](README.md). + +Most agent frameworks treat an "agent" as one LLM call wrapped in a few tools. 
AgentForge treats an agent as a **harness** — a named flow of steps where the LLM is just one step type. Your existing tools (linters, test runners, security scanners, custom CLIs) sit alongside the LLM and *gate its output* on every run.
+
+The result: bad LLM output never leaks to the next phase, and you customise behaviour by editing YAML — not by forking the framework.
+
+---
+
+## Step types
+
+Each agent declares a flow of named steps from this set:
+
+| Type | What it does |
+|---|---|
+| `llm` | Invokes the agent's model with the system prompt + inputs. The normal LLM call. |
+| `script` | Runs a shell command on the node. Has access to template variables (`{{run.workdir}}`, `{{pipeline.id}}`, `{{steps.<step>.output}}`, `{{steps.<step>.exitCode}}`). |
+| `validate` | Runs a Zod / JSON Schema check against a named artifact or the last LLM output. Fails the run by default; set `continueOnError: true` to log and continue. |
+| `transform` | Pure data reshape between steps (no side effects). |
+
+Plus two control-flow constructs usable anywhere in a flow:
+
+- **`loop`** — retry a block until a predicate step outputs a success sentinel, with a `maxIterations` ceiling.
+- **`condition`** — skip a step when a referenced step's output doesn't match.
+
+---
+
+## Real example — the bundled `developer` agent
+
+This is from `packages/core/src/templates/simple-sdlc/agents/developer.agent.yaml`. It shows the *generate → lint → test → fix-until-passing* pattern that `script` + `loop` unlock together:
+
+```yaml
+spec:
+  executor: pi-coding-agent
+  tools: [read, write, edit, bash, grep, find]
+
+  definitions:
+    generate-code:
+      type: llm
+      instructions: |
+        Generate the full implementation based on the requirements and architecture plan.
+ + lint-and-format: + type: script + run: | + cd {{run.workdir}} + # Auto-detect + run the project's linter/formatter + if [ -f package.json ]; then npx eslint src/ --fix; npx prettier --write "src/**/*.{ts,js}" + elif [ -f pyproject.toml ]; then python -m black .; python -m ruff check --fix . + elif [ -f go.mod ]; then gofmt -w . + fi + continueOnError: true + + run-tests: + type: script + run: | + cd {{run.workdir}} + if [ -f package.json ]; then npm test + elif [ -f pyproject.toml ]; then python -m pytest -v + elif [ -f go.mod ]; then go test ./... + fi + captureOutput: true + continueOnError: true + + test-gate: + type: script + run: | + if [ "{{steps.run-tests.exitCode}}" = "0" ]; then echo "PASS"; else echo "false"; fi + + fix-code: + type: llm + instructions: | + Fix attempt {{loop.iteration}} of {{loop.maxIterations}}. + Failing tests: + {{steps.run-tests.output}} + Fix the source code — don't modify tests unless they have a genuine bug. + + validate-output: + type: validate + schema: code-output + + git-commit: + type: script + run: | + cd {{run.workdir}} + git add -A && git commit -m "feat(developer): pipeline {{pipeline.id}}" + continueOnError: true + + flow: + - step: generate-code + - step: lint-and-format + - loop: + until: "{{steps.test-gate.output}}" # exits when test-gate emits "PASS" + maxIterations: 3 + do: + - step: run-tests + - step: test-gate + - step: fix-code + condition: "{{steps.test-gate.output}}" # skip fix if tests passed + - step: validate-output + - step: git-commit +``` + +--- + +## Why this matters + +- **Your existing tools stay in charge of correctness.** The LLM proposes; `eslint`, `pytest`, `go vet`, `trivy`, `semgrep`, whatever you already trust, decide whether the output is acceptable. Bad LLM output doesn't leak into the next phase. +- **Customise without forking.** Want a different linter, a stricter security scan, a different commit convention? It's YAML — edit the `run:` block. No framework recompile. 
+- **Domain-agnostic.** The same mechanics build a content agent (generate → SEO audit → Grammarly → publish), a data agent (generate SQL → explain-plan → dry-run → apply), an ops agent (generate runbook → shellcheck → render to PDF). Scripts are the universal glue.
+- **Observable.** Every step — LLM and script — lands in the state store with output, exit code, duration, and a span in your OTel trace. The dashboard timeline shows the whole harness, not just the LLM turn.
+
+---
+
+## Template variables
+
+Every `script.run`, `llm.instructions`, `condition`, and `loop.until` field is a template. Available bindings:
+
+- `{{run.workdir}}` — agent's working directory on the node
+- `{{run.id}}`, `{{pipeline.id}}` — IDs for logging / commits
+- `{{inputs.<type>}}` — content of a declared input artifact
+- `{{steps.<step>.output}}` / `.exitCode` — last result of a named step
+- `{{loop.iteration}}` / `{{loop.maxIterations}}` — current loop position
+
+Full grammar and resolution semantics: [`docs/architecture.md`](architecture.md).
+
+---
+
+## Step grammar reference
+
+For the authoritative shape of `step`, `loop`, `parallel`, and `condition` blocks, see the Zod schema in `packages/core/src/definitions/parser.ts` (`AgentDefinitionSchema`). Pipeline execution and artifact flow: [`docs/pipeline-execution-flows.md`](pipeline-execution-flows.md).
diff --git a/docs/packages.md b/docs/packages.md
new file mode 100644
index 0000000..978c31a
--- /dev/null
+++ b/docs/packages.md
@@ -0,0 +1,49 @@
+# `@mandarnilange/agentforge` vs `@mandarnilange/agentforge-core`
+
+> Part of the [AgentForge documentation](README.md).
+
+Two npm packages ship from this repo. Most users want **`@mandarnilange/agentforge`** (the platform binary). Pick `@mandarnilange/agentforge-core` only if you're embedding the engine into your own CLI / service or you specifically don't want the platform extras.
+ +## Feature comparison + +| | **`@mandarnilange/agentforge-core`** | **`@mandarnilange/agentforge`** (platform) | +|---|---|---| +| **Install** | `npm install @mandarnilange/agentforge-core` | `npm install @mandarnilange/agentforge` (pulls in core) | +| **Binary** | `agentforge-core` | `agentforge` | +| **Intended for** | Local dev, evaluation, library embed | Production, teams, multi-host | +| **LLM providers** | Anthropic | Anthropic + OpenAI + Gemini + Ollama | +| **Executors** | Local (in-process) | Local + **Docker container** + **Remote HTTP** | +| **Node types** | `local` | `local` + `ssh` + remote workers | +| **State store** | SQLite (file) | SQLite **or** PostgreSQL | +| **Persistent definitions** | YAML on disk, loaded per run | YAML on disk **or** `apply` to DB (versioned, hot-reload) | +| **Observability** | OTel API (no-op without SDK) | Full OTel SDK + Jaeger / Grafana export | +| **Crash recovery** | — | Pipeline rehydration + reconciliation loop | +| **Rate limiting** | — | Token / cost / concurrency per pipeline | +| **Multi-host deploy** | — | Control-plane + worker Docker Compose files | +| **Docker image** | `ghcr.io/mandarnilange/agentforge-core` (~289 MB) | `ghcr.io/mandarnilange/agentforge-platform` (~336 MB) | + +Defaults are **identical** for local dev (SQLite, local executor, Anthropic). Installing the platform package up front means you won't have to migrate when you need a production feature. + +## Using just the framework (`@mandarnilange/agentforge-core`) + +If you're embedding the engine into your own CLI or service — or you want the framework without the platform binary, multi-provider middleware, or Postgres — install `@mandarnilange/agentforge-core` directly: + +```bash +npm install @mandarnilange/agentforge-core +npx @mandarnilange/agentforge-core init --template simple-sdlc +``` + +Same YAML schema, same executors, same control plane. You wire your own entry point. 
Package-level docs: [`packages/core/README.md`](../packages/core/README.md). + +## Multi-provider setup + +Mixing Anthropic, OpenAI, Gemini, and Ollama (one provider per agent in the same pipeline): [`docs/multi-provider.md`](multi-provider.md). + +## Docker images + +```bash +docker build --target core -t agentforge-core . # ~289 MB +docker build --target platform -t agentforge-platform . # ~336 MB +``` + +Image targets share the same `Dockerfile` — the `platform` target adds the platform-only entry points and env variables. diff --git a/docs/platform-architecture.md b/docs/platform-architecture.md index 02eeaab..8676d5d 100644 --- a/docs/platform-architecture.md +++ b/docs/platform-architecture.md @@ -2070,7 +2070,61 @@ interface AgentJobIdentity { --- -## 15. Glossary +## 15. Deploying heterogeneous worker pools + +The execution plane scales horizontally by adding worker hosts. Workers register with the control plane, heartbeat, and receive dispatched agent jobs. Two workers with **different capabilities** on different hosts let the scheduler route each agent to the right node via `nodeAffinity`. + +### Spinning up two specialised workers + +```bash +# Worker A — beefy, Docker-isolated, GPU +NODE_NAME=worker-gpu \ +NODE_CAPABILITIES=llm-access,docker,high-memory,gpu \ +NODE_MAX_CONCURRENT_RUNS=4 \ +CONTROL_PLANE_URL=http://cp:3001 \ + docker compose -f packages/platform/docker-compose.worker.yml up -d + +# Worker B — lightweight, llm-calls only +NODE_NAME=worker-light \ +NODE_CAPABILITIES=llm-access \ +NODE_MAX_CONCURRENT_RUNS=10 \ +CONTROL_PLANE_URL=http://cp:3001 \ + docker compose -f packages/platform/docker-compose.worker.yml up -d +``` + +### Matching agent affinity + +The `developer` agent demands Docker isolation and benefits from GPU, so it routes to `worker-gpu`. 
The `analyst` agent only needs LLM access, so it lands on `worker-light`: + +```yaml +# .agentforge/agents/developer.agent.yaml +spec: + nodeAffinity: + required: [{ capability: llm-access }, { capability: docker }] + preferred: [{ capability: gpu }, { capability: high-memory }] +``` + +```yaml +# .agentforge/agents/analyst.agent.yaml +spec: + nodeAffinity: + required: [{ capability: llm-access }] +``` + +### Verifying the pool + +```bash +agentforge get nodes +# NAME STATUS CAPABILITIES ACTIVE/MAX +# worker-gpu online llm-access, docker, high-memory, gpu 0/4 +# worker-light online llm-access 0/10 +``` + +The scheduler picks the highest-scoring node whose capabilities satisfy each agent's required set; soft preferences break ties and active-run counts cap concurrency per node. + +--- + +## 16. Glossary | Term | Definition | |------|-----------| diff --git a/docs/who-uses-it.md b/docs/who-uses-it.md new file mode 100644 index 0000000..9910915 --- /dev/null +++ b/docs/who-uses-it.md @@ -0,0 +1,41 @@ +# Who Uses AgentForge + +> Part of the [AgentForge documentation](README.md). + +AgentForge is a YAML-and-CLI framework. Engineers and platform teams author it; the *artifacts and gates* it produces are consumed across an organisation. + +## Roles, concretely + +### Platform / DevOps engineers +**Stand it up once, the rest of the org inherits the substrate.** + +- Run AgentForge as a control plane + worker pool for the company. +- Configure node pools, secrets, cost ceilings, OTel export. +- Add new pipelines as `git push` — no per-team glue code to maintain. +- Pair with [`docs/platform-architecture.md`](platform-architecture.md). + +### Software engineers +**Build the agents your domain needs; reuse the harness.** + +- Author `.agent.yaml`, `.pipeline.yaml`, and step pipelines. +- Wire your linter, tests, and security scanners as `script` steps so they gate the LLM. See the harness model: [`docs/harness-model.md`](harness-model.md). 
+- Ship AI-assisted features without giving up code-review discipline — every step lands in the OTel trace and the dashboard timeline. + +### Product / domain owners (marketing, sales, HR, ops, legal, finance) +**Don't write YAML. Drive runs from the dashboard.** + +- Kick off pipelines via the dashboard or CLI ("run `seo-review` on this URL"). +- Approve / reject / revise at human gates between phases — plain-English revision notes, no code. +- Read and download the typed artifacts the pipeline produces. + +The artifact-typing model means revision notes steer the next LLM call: gates are a two-way conversation, not a rubber stamp. Every decision is signed, timestamped, and survives restarts. + +## What everyone gets + +One binary, one control plane, one audit trail: + +- **Cost guardrails at every layer.** Each agent declares its own token + dollar ceiling; pipelines carry org-wide limits. The dashboard shows spend in real time. Runaway LLM calls abort cleanly *before* they bill you. +- **Typed artifacts.** 45 built-in Zod / JSON Schemas for SDLC outputs (and you define your own). Malformed LLM output fails the run before it poisons the next phase. See [`docs/artifacts.md`](artifacts.md). +- **Humans in the loop.** Plain-English approvals between phases — the LLM proposes; the human decides. +- **Real-time dashboard.** Pipeline timeline, live agent conversation, artifact viewer, PDF export, cost tracking — same binary, no extra install. +- **Open source, MIT.** No paid tier, no cloud dependency, no telemetry. 
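+
+The per-agent cost guardrails above can be sketched in agent YAML. A minimal illustration only — `timeoutSeconds` is the documented per-agent override, while the ceiling field names (`maxTokens`, `maxCostUsd`) and the `copywriter` agent are hypothetical placeholders, not the authoritative schema:
+
+```yaml
+# .agentforge/agents/copywriter.agent.yaml — illustrative sketch, field names assumed
+spec:
+  resources:
+    timeoutSeconds: 300   # documented per-agent wall-clock override for LLM calls
+    maxTokens: 32000      # hypothetical per-agent token ceiling
+    maxCostUsd: 2.50      # hypothetical per-agent dollar ceiling
+```
+
+When a ceiling is hit, the run aborts cleanly rather than billing past the limit — the behaviour described above.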
diff --git a/package.json b/package.json index 20f229c..36e3071 100644 --- a/package.json +++ b/package.json @@ -20,6 +20,7 @@ "test:watch": "vitest", "lint": "biome check --write .", "typecheck": "tsc --build", + "skills:validate": "node scripts/validate-skills.mjs", "clean": "rm -rf packages/core/dist packages/core/tsconfig.tsbuildinfo packages/core/src/dashboard/dist packages/platform/dist packages/platform/tsconfig.tsbuildinfo" }, "engines": { diff --git a/scripts/validate-skills.mjs b/scripts/validate-skills.mjs new file mode 100644 index 0000000..dd2caf6 --- /dev/null +++ b/scripts/validate-skills.mjs @@ -0,0 +1,129 @@ +#!/usr/bin/env node +// Validates skill SKILL.md frontmatter under `skills/`. +// +// Vercel skills format: name, description, license. Optional `metadata` +// (we accept it but only require version when present). +// +// House rules: +// - skill folder name must equal frontmatter `name` +// - skill name must start with PREFIX (default: agentforge-) + +import { readdirSync, readFileSync, statSync, existsSync } from "node:fs"; +import { join, relative, dirname } from "node:path"; +import { fileURLToPath } from "node:url"; + +const ROOT = join(dirname(fileURLToPath(import.meta.url)), ".."); +const SKILLS = join(ROOT, "skills"); +const PREFIX = process.env.SKILL_PREFIX ?? 
"agentforge-"; + +const errors = []; +const err = (msg) => errors.push(msg); + +function parseFrontmatter(md, filePath) { + if (!md.startsWith("---\n")) { + err(`${filePath}: missing YAML frontmatter (must start with '---')`); + return null; + } + const end = md.indexOf("\n---", 4); + if (end === -1) { + err(`${filePath}: unterminated frontmatter (no closing '---')`); + return null; + } + const block = md.slice(4, end); + const out = {}; + let currentKey = null; + let currentObj = null; + for (const rawLine of block.split("\n")) { + const line = rawLine.replace(/\s+$/, ""); + if (!line.trim() || line.trim().startsWith("#")) continue; + const indented = /^\s/.test(line); + const m = line.match(/^(\s*)([\w-]+):\s*(.*)$/); + if (!m) { + if (currentKey === "description" && indented) { + out.description = `${out.description ?? ""} ${line.trim()}`.trim(); + } + continue; + } + const [, indent, key, value] = m; + if (indent === "") { + currentKey = key; + currentObj = null; + if (value === "" || value === ">") { + out[key] = key === "metadata" ? {} : ""; + if (key === "metadata") currentObj = out[key]; + } else { + out[key] = stripQuotes(value); + } + } else if (currentObj && currentKey === "metadata") { + currentObj[key] = stripQuotes(value); + } else if (currentKey === "description") { + out.description = `${out.description ?? 
""} ${line.trim()}`.replace(/\s+/g, " ").trim(); + } + } + return out; +} + +function stripQuotes(v) { + const t = v.trim(); + if ((t.startsWith('"') && t.endsWith('"')) || (t.startsWith("'") && t.endsWith("'"))) { + return t.slice(1, -1); + } + return t; +} + +function listSkills() { + if (!existsSync(SKILLS)) return []; + return readdirSync(SKILLS).filter((entry) => { + const full = join(SKILLS, entry); + return statSync(full).isDirectory() && !entry.startsWith("."); + }); +} + +function validateSkill(skillName) { + const skillDir = join(SKILLS, skillName); + const skillMd = join(skillDir, "SKILL.md"); + if (!existsSync(skillMd)) { + err(`${skillName}: missing SKILL.md`); + return null; + } + const md = readFileSync(skillMd, "utf8"); + const fm = parseFrontmatter(md, relative(ROOT, skillMd)); + if (!fm) return null; + + for (const k of ["name", "description", "license"]) { + if (!fm[k]) err(`${skillName}: frontmatter missing '${k}'`); + } + if (!fm.metadata || typeof fm.metadata !== "object") { + err(`${skillName}: frontmatter missing 'metadata' object`); + } else { + if (!fm.metadata.author) err(`${skillName}: 'metadata.author' missing`); + if (!fm.metadata.version) err(`${skillName}: 'metadata.version' missing`); + } + if (fm.name && fm.name !== skillName) { + err(`${skillName}: folder name does not match frontmatter name '${fm.name}'`); + } + if (fm.name && !fm.name.startsWith(PREFIX)) { + err(`${skillName}: name '${fm.name}' must start with '${PREFIX}'`); + } + return fm; +} + +const skills = listSkills(); +if (skills.length === 0) { + console.log(`No skills found in ${relative(ROOT, SKILLS)}/`); + process.exit(0); +} + +console.log(`Found ${skills.length} skill(s) in ${relative(ROOT, SKILLS)}/:`); +for (const name of skills) { + const fm = validateSkill(name); + const tag = fm?.metadata?.version ? 
` (${fm.metadata.version})` : "";
+  console.log(`  - ${name}${tag}`);
+}
+
+if (errors.length > 0) {
+  console.error(`\n✗ ${errors.length} error(s):`);
+  for (const e of errors) console.error(`  - ${e}`);
+  process.exit(1);
+}
+console.log("\n✓ skills OK");
diff --git a/skills/README.md b/skills/README.md
new file mode 100644
index 0000000..7d64d01
--- /dev/null
+++ b/skills/README.md
@@ -0,0 +1,75 @@
+# AgentForge Skills
+
+Agent skills for AgentForge, published via the
+[Vercel skills ecosystem](https://skills.sh).
+
+## Installation
+
+Install all skills from this repo into your agent runtime:
+
+```bash
+npx skills add mandarnilange/agentforce_public
+```
+
+Or install a single skill by path:
+
+```bash
+npx skills add mandarnilange/agentforce_public/agentforge-workflow
+```
+
+## Available skills
+
+| Skill | What it does |
+|---|---|
+| [`agentforge-workflow`](./agentforge-workflow/SKILL.md) | Walks a user through designing an AgentForge workflow — agents, pipeline, gates, loops, parallelism, wiring, nodes — and emits a complete `.agentforge/` directory. |
+
+## Layout
+
+```
+skills/
+└── <skill-name>/
+    ├── SKILL.md        <- entry point with YAML frontmatter
+    └── references/     <- on-demand reading (cheat sheets, examples)
+```
+
+Skill folder names and `name:` frontmatter must match and **must start
+with `agentforge-`**. `npm run skills:validate` enforces both.
+
+## Frontmatter
+
+```yaml
+---
+name: agentforge-<topic>
+description: >
+  Both *what* the skill does and *when* to trigger it.
+license: MIT
+metadata:
+  author: <github-handle>
+  version: "0.1.0"
+---
+```
+
+Required: `name`, `description`, `license`, `metadata.author`,
+`metadata.version`.
+
+## Authoring a new skill
+
+1. `mkdir -p skills/agentforge-<topic>/references`
+2. Write `SKILL.md` with the frontmatter above and clear *trigger
+   conditions* in the description.
+3. Add reference docs under `references/` for anything the skill should
+   load on demand. Keep `SKILL.md` short — it loads on every trigger.
+4. `npm run skills:validate`
+5. 
Open a PR. CI runs the same validator on every push.
+
+## Publishing
+
+There is no upload step. Once a skill lands on the default branch under
+`skills/<skill-name>/`, Vercel's CLI discovers it on install. It will appear on
+[skills.sh](https://skills.sh) automatically once it accrues install
+telemetry.
+
+## CI
+
+`.github/workflows/publish-skills.yml` validates frontmatter on every PR
+and push to `main`.
diff --git a/skills/agentforge-workflow/SKILL.md b/skills/agentforge-workflow/SKILL.md
new file mode 100644
index 0000000..2a2fcbf
--- /dev/null
+++ b/skills/agentforge-workflow/SKILL.md
@@ -0,0 +1,198 @@
+---
+name: agentforge-workflow
+description: >
+  Guides a user through designing an AgentForge workflow — the agents, pipeline,
+  gates, loops, parallelism, wiring, and node placement — and emits a complete,
+  schema-valid `.agentforge/` directory. Trigger when the user asks to author,
+  define, scaffold, design, or modify an AgentForge pipeline / agent / node, or
+  says things like "help me set up a workflow", "I want to build an agent
+  pipeline", "what agents do I need for X", or "turn this template into ...".
+  Do NOT trigger for unrelated AI-agent frameworks (LangGraph, CrewAI, etc.).
+license: MIT
+metadata:
+  author: mandarnilange
+  version: "0.1.0"
+---
+
+# AgentForge Workflow
+
+You are helping the user define an **AgentForge** workflow. AgentForge is a
+Kubernetes-style control plane for AI agents: agents are declarative YAML, run
+in phased pipelines, and gate on human approval between phases. This skill
+walks the user through the design decisions and produces a working
+`.agentforge/` directory.
+
+## When to use this skill
+
+Trigger on any of:
+- "Help me define / author / design / scaffold an AgentForge workflow"
+- "What agents do I need for <task>?"
+- "Turn this brief into a pipeline"
+- "Modify the `