An open framework for agentic workflows. Bring your process, LLMs, scripts, agents, and infra — we handle the orchestration.
- Compose agent harnesses in YAML — LLM calls, scripts, validators, transforms, loops, and conditionals.
- Gate the LLM with your tools — deterministic steps (linters, tests, schemas) wrap non-deterministic LLM calls, so output is checked on every run.
- Plug in your stack — Anthropic, OpenAI, Gemini, Ollama, coding-agent runtimes. Every layer has an extension point where the built-in doesn't fit.
- Run anywhere — local, Docker, or remote workers; SQLite or Postgres; OTel-native.
- Scale like infra — multi-worker scheduling, approval gates, cost ceilings, live dashboard.
Ships with a reference SDLC template — runnable end-to-end in minutes. Domain-agnostic: code review, content generation, ops runbooks, data pipelines — anywhere multiple LLM calls need to be coordinated with humans in the loop.
# 1. Install
npm install @mandarnilange/agentforge
# 2. Set your Anthropic API key
export ANTHROPIC_API_KEY=sk-ant-...
# 3. Scaffold the reference template into .agentforge/
npx @mandarnilange/agentforge init --template simple-sdlc
# 4. Run the full pipeline (approval gates between phases)
npx @mandarnilange/agentforge run --project my-app --input "brief=Build a freelance invoicing SaaS"
# 5. Watch it live
npx @mandarnilange/agentforge dashboard # → http://localhost:3001Other frameworks treat an "agent" as one LLM call wrapped in tools. AgentForge treats an agent as a harness — a named flow of steps where your existing tools are first-class:
llm— call the model with the system prompt + inputs.script— run any shell command (linter, test runner, security scanner, your custom CLI).validate— Zod / JSON Schema check against an artifact. Fails the run by default.transform— pure data reshape between steps.
Wrap any of these in loop (with a until predicate + maxIterations) or condition blocks. The LLM proposes; your tools decide whether the output is acceptable. Bad output never leaks into the next phase.
A real example — the bundled developer agent's generate → lint → test → fix-until-passing flow:
spec:
flow:
- step: generate-code
- step: lint-and-format
- loop:
until: "{{steps.test-gate.output}}" # exits when test-gate emits "PASS"
maxIterations: 3
do:
- step: run-tests
- step: test-gate
- step: fix-code
condition: "{{steps.test-gate.output}}" # skip fix if tests passed
- step: validate-output
- step: git-commitEach step's output, exit code, duration, and OTel span land in the state store. The dashboard shows the whole harness, not just the LLM turn.
┌─────────────────────────────────────────────┐
│ CONTROL PLANE │
│ Dashboard · Scheduler · Gates │
│ Definition store · State store · Events │
└──────────────────┬──────────────────────────┘
│
dispatch jobs │ report results
▼
┌─────────────────────────────────────────────┐
│ EXECUTION PLANE │
│ node: local · docker · ssh / http worker │
│ │
│ Agents run here — file system, LLM calls, │
│ shell, tools all live on the node. │
└─────────────────────────────────────────────┘
- Control plane — pipeline / gate controllers, scheduler, definition store, state store, event bus, dashboard server.
- Execution plane (nodes) — the local process, a Docker container, or a remote worker over SSH / HTTP. Nodes advertise capabilities (
llm-access,docker,local-fs,git,gpu, …) and the scheduler matches each agent'snodeAffinityto the pool. - Same binary. On a laptop, one process hosts both. In production, run control-plane and worker containers from the same image with different CLI invocations.
| Concept | What it is | Defined in |
|---|---|---|
| Agent | A system prompt + typed I/O + optional step pipeline. Runs on a node. | .agentforge/agents/*.agent.yaml |
| Pipeline | A sequence of phases; each phase runs one or more agents and may end with an approval gate. | .agentforge/pipelines/*.pipeline.yaml |
| Node | An execution target — local, Docker, or remote SSH — with declared capabilities. |
.agentforge/nodes/*.node.yaml |
| Artifact | A typed, validated JSON document passed between agents. | .agentforge/schemas/*.schema.yaml |
| Gate | A pause point between phases for human review (approve / reject / revise). | Inline in the pipeline |
Agents declare which capabilities they need:
# .agentforge/agents/developer.agent.yaml (excerpt)
spec:
nodeAffinity:
required:
- capability: llm-access
- capability: docker # writes files + runs shell — needs isolation
preferred:
- capability: high-memory # prefer a beefy node if availableThe scheduler picks the highest-scoring node whose capabilities satisfy the required set; soft preferences break ties.
Three agents wired into a classic requirements → architecture → implementation flow.
Brief ──► analyst ─► [gate] ─► architect ─► [gate] ─► developer ─► done
# .agentforge/pipelines/simple-sdlc.pipeline.yaml
apiVersion: agentforge/v1
kind: PipelineDefinition
metadata:
name: simple-sdlc
spec:
input:
- { name: brief, type: raw-brief, required: true }
phases:
- { name: requirements, phase: 1, agents: [analyst], gate: { required: true } }
- { name: architecture, phase: 2, agents: [architect], gate: { required: true } }
- { name: implementation, phase: 3, agents: [developer] }More templates — api-builder, code-review, content-generation, data-pipeline, seo-review — ship with the platform binary.
Same YAML, same binary — three shapes from smallest to largest.
┌─────────────────────────────────────────────────┐
│ agentforge (one Node.js process) │
│ control plane ──dispatch──► local node │
│ SQLite state · Anthropic LLM │
└─────────────────────────────────────────────────┘
npx @mandarnilange/agentforge dashboard starts everything. Dockerized variant available without Postgres / OTel:
docker compose up -d # Dashboard at :3001
PROJECT=my-app BRIEF="Build a todo app" \
docker compose run --rm runner # One-shot pipelinePostgres durability, OTel tracing, Docker-isolated agent runs.
docker compose -f packages/platform/docker-compose.prod.yml up -d# Control-plane host
docker compose -f packages/platform/docker-compose.control-plane.yml up -d
# Each worker host
CONTROL_PLANE_URL=http://cp-host:3001 \
docker compose -f packages/platform/docker-compose.worker.yml up -dWorkers register, heartbeat, and receive dispatched jobs. Heterogeneous pools (GPU vs lightweight) are routed by nodeAffinity.
Current limitation — control plane is single-replica. The execution plane scales horizontally to many worker hosts, but the control plane itself should be run as a single instance today. Pending-job queue, scheduler state, and event bus are process-local; running two replicas will split-brain. Tracked with a concrete fix path — see
ROADMAP.md.
Install @mandarnilange/agentforge unless you have a specific reason not to. Defaults are identical for local dev (SQLite, local executor, Anthropic), and every production feature is available the day you need it — no migration.
Install @mandarnilange/agentforge-core if you want the framework primitives without multi-provider middleware, Postgres, or the Docker / SSH executors — typically when embedding AgentForge in your own CLI.
A React SPA served by the same binary. Real-time pipeline view via Server-Sent Events: run list with status / cost / progress, phase-by-phase timeline with live agent conversations, gate management (approve / reject / revise in-browser), artifact viewer with type-aware renderers, PDF export.
npx @mandarnilange/agentforge dashboard --port 3001When ANTHROPIC_API_KEY isn't set, the dashboard renders a read-only banner — useful for browsing completed runs.
The repo ships agent skills that drop into any Claude Code / Cursor / Codex session — installable in one command:
npx skills add mandarnilange/agentforge| Skill | Audience | What it does |
|---|---|---|
agentforge-workflow |
End user | Walks through designing an AgentForge workflow and emits a complete .agentforge/ directory. |
agentforge-cli |
Operator | Conversational interface to the AgentForge CLI — run, monitor, approve gates, apply YAML. |
agentforge-debug |
Operator | Triages stuck or failing pipeline runs from symptom to fix path. |
agentforge-template-author |
Contributor | Guides adding a new shipped template under packages/{core,platform}/src/templates/. |
Trigger phrases live in each skill's frontmatter — e.g. "help me design a PR-triage pipeline" fires agentforge-workflow, "approve the architect gate" fires agentforge-cli, "why is pipeline X stuck?" fires agentforge-debug.
New here? The Skill Quickstart is a 5-minute, fill-in-the-blanks markdown you edit, paste into your AI agent, and run. Catalog, changelog, and authoring docs: skills/.
agentforge init --template <name> # Scaffold .agentforge/ from a template
agentforge templates list # Show bundled templates
agentforge exec <agent> [options] # Run a single agent
agentforge run --project <name> # Start a pipeline
agentforge run --continue <run-id> # Resume a paused pipeline
agentforge dashboard # Start the web dashboard
agentforge get pipelines # List pipeline runs
agentforge gate {approve,reject,revise} # Gate actions
agentforge logs <run-id> # View agent run logs
agentforge apply -f <path> # Apply persistent YAML definitions (platform)
agentforge get nodes # List registered worker nodes (platform)
agentforge node start --control-plane-url <url> # Run as a worker (platform)| Variable | Required | Default | Description |
|---|---|---|---|
ANTHROPIC_API_KEY |
Yes | — | Anthropic API key. |
OPENAI_API_KEY / GOOGLE_API_KEY |
If using | — | Other providers. |
OLLAMA_BASE_URL |
No | http://localhost:11434 |
Ollama server URL. |
AGENTFORGE_DEFAULT_MODEL |
No | claude-sonnet-4-20250514 |
Default model. |
AGENTFORGE_LLM_TIMEOUT_SECONDS |
No | 600 |
Wall-clock timeout per LLM call (0 disables). |
AGENTFORGE_OUTPUT_DIR / AGENTFORGE_DIR |
No | ./output / ./.agentforge |
Output and definitions paths. |
AGENTFORGE_STATE_STORE |
No | sqlite |
sqlite or postgres. |
AGENTFORGE_POSTGRES_URL |
If postgres |
— | Connection URL — masked in logs. |
OTEL_EXPORTER_OTLP_ENDPOINT |
No | — | Enables OTel tracing export. |
Reliability: every LLM call is bounded by timeoutSeconds (per-agent override available). Anthropic HTTP 529 (overloaded_error) is retried 3× with exponential backoff. API keys and the Postgres URL are masked in logs, errors, and conversation transcripts.
Every deep-dive lives in docs/. Pick a track:
Get started
- Skill Quickstart — 5-minute, fill-in-the-blanks path from zero to your first run via the
agentforge-workflowskill. - Getting Started — install to first pipeline run, full CLI walkthrough, resume flow.
- Who Uses It — what platform engineers, software engineers, and domain owners each get out of AgentForge.
- Templates — catalog of bundled pipeline templates.
Concepts
- Harness Model — full step-grammar walkthrough and the bundled
developeragent's test-fix loop. - Architecture — control plane, domain model, ports & adapters, step grammar.
- Pipeline Execution Flows — how a run moves through the system.
- Artifact Typing — typed inputs/outputs, schema validation, why malformed LLM output fails fast.
Operate
- Platform Architecture — distributed execution, schedulers, recovery, heterogeneous worker pools.
- Packages — core vs platform — feature comparison and embedding the framework.
- Multi-Provider Execution — OpenAI, Gemini, and Ollama alongside Anthropic.
Extend & test
- pi-coding-agent Extensions — adding custom tools and lifecycle hooks.
- Testing Guide — running tests, dry-runs, and real pipelines.
v0.2.0 release candidate (v0.2.0-rc.2) — early-feedback build. API surface is stabilising but may still shift. npm install @mandarnilange/agentforge pulls the RC. Open an issue for anything that looks rough, or use Discussions for usage questions.
Bug reports, feature ideas, doc fixes, and code are all welcome.
- Issues / requests: GitHub issues.
- Dev setup and conventions:
CONTRIBUTING.md. - Larger architectural work:
ROADMAP.md— every entry is issue-ready. - Pull requests: small, focused, with tests. Conventional-commit messages preferred.
MIT-licensed — see LICENSE.