Merged
34 changes: 34 additions & 0 deletions .github/workflows/publish-skills.yml
@@ -0,0 +1,34 @@
name: Publish Skills

# Validates SKILL.md frontmatter under `skills/`. The `skills/` path is
# what Vercel's skills CLI scans by default — keeping it on the default
# branch IS the publication step (no upload / no registry submission).

on:
push:
branches: [main]
pull_request:
branches: [main]
paths:
- 'skills/**'
- 'scripts/validate-skills.mjs'
- '.github/workflows/publish-skills.yml'

concurrency:
group: publish-skills-${{ github.ref }}
cancel-in-progress: true

jobs:
validate:
name: validate skill frontmatter
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5

- name: Setup Node.js
uses: actions/setup-node@v5
with:
node-version: '22'

- name: Validate skill frontmatter
run: node scripts/validate-skills.mjs
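
The diff doesn't include `scripts/validate-skills.mjs` itself. A minimal sketch of the frontmatter check such a script might perform (the file-walking is omitted, and the required `name`/`description` keys are assumptions, not the script's actual rules):

```javascript
// Hypothetical sketch of the validation logic in scripts/validate-skills.mjs —
// the real script is not shown in this diff. Assumes each SKILL.md carries
// YAML frontmatter between leading '---' fences.
export function parseFrontmatter(markdown) {
  const match = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return null;
  const fields = {};
  for (const line of match[1].split('\n')) {
    const idx = line.indexOf(':');
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

export function validateSkill(markdown) {
  const fm = parseFrontmatter(markdown);
  if (!fm) return ['missing frontmatter'];
  // Assumed required keys; adjust to whatever the real script enforces.
  return ['name', 'description'].filter((k) => !fm[k]).map((k) => `missing ${k}`);
}
```

A skill with both keys passes; one without frontmatter fails the workflow run.
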
561 changes: 112 additions & 449 deletions README.md

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions docs/artifacts.md
@@ -0,0 +1,42 @@
# Artifact Typing & Validation

> Part of the [AgentForge documentation](README.md).

Every agent declares typed inputs and outputs. Artifacts are validated against Zod / JSON Schemas at every pipeline boundary — invalid output fails the agent run before it reaches the next phase.

```
Agent YAML Zod Schema Runtime
┌─────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ outputs: │ │ RequirementsSchema │ │ Agent produces JSON │
│ - type: │───▶│ = z.object({ │───▶│ → safeParse(output) │
│ requirements│ │ epics: [...], │ │ → pass ✓ or fail ✗ │
│ schema: ... │ │ ... │ └──────────────────────┘
└─────────────────┘ │ }) │
└──────────────────────┘
```
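
The boundary check in the third box can be sketched without the Zod dependency (the `epics` field is taken from the diagram; everything else here is illustrative, not the shipped `RequirementsSchema`):

```javascript
// Illustrative stand-in for the Zod safeParse boundary check, not the shipped
// RequirementsSchema. An agent's JSON output either passes or fails the run.
function safeParseRequirements(output) {
  const errors = [];
  if (typeof output !== 'object' || output === null) errors.push('not an object');
  else if (!Array.isArray(output.epics)) errors.push('epics: expected array');
  return errors.length === 0
    ? { success: true, data: output }
    : { success: false, errors };
}

// At the pipeline boundary: invalid output aborts before the next phase sees it.
function gate(output) {
  const result = safeParseRequirements(output);
  if (!result.success) throw new Error(`artifact rejected: ${result.errors.join(', ')}`);
  return result.data;
}
```
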

## What ships

45 built-in schemas covering requirements, architecture, code, data, testing, security, and DevOps — see `packages/core/src/schemas/`. Every shipped template references them so you can compose pipelines without inventing new artifact types.

## Defining your own

Add your own schema and reference it from agent YAML by file path:

```yaml
# .agentforge/agents/my-agent.agent.yaml
spec:
outputs:
- type: my-artifact
schema: schemas/my-artifact.schema.yaml
```

The schema file can be either Zod-shaped TypeScript (loaded via the schema registry) or a JSON Schema YAML — both validate at the same boundary.

Architectural details and the artifact flow through phases: [`docs/architecture.md`](architecture.md#artifact-flow).

## Why this matters

- **Malformed LLM output is caught early.** Bad JSON, missing fields, or wrong types abort the agent run before downstream agents consume it.
- **Wiring is explicit.** Each agent's `inputs[].type` and `outputs[].type` form a contract. Two agents producing the same type is a configuration error you catch at `agentforge validate` time, not at runtime.
- **Schemas double as docs.** New team members understand what each phase produces by reading one file.
132 changes: 132 additions & 0 deletions docs/harness-model.md
@@ -0,0 +1,132 @@
# The Harness Model

> Part of the [AgentForge documentation](README.md).

Most agent frameworks treat an "agent" as one LLM call wrapped in a few tools. AgentForge treats an agent as a **harness** — a named flow of steps where the LLM is just one step type. Your existing tools (linters, test runners, security scanners, custom CLIs) sit alongside the LLM and *gate its output* on every run.

The result: bad LLM output never leaks to the next phase, and you customise behaviour by editing YAML — not by forking the framework.

---

## Step types

Each agent declares a flow of named steps from this set:

| Type | What it does |
|---|---|
| `llm` | Invokes the agent's model with the system prompt + inputs. The normal LLM call. |
| `script` | Runs a shell command on the node. Has access to template variables (`{{run.workdir}}`, `{{pipeline.id}}`, `{{steps.<name>.output}}`, `{{steps.<name>.exitCode}}`). |
| `validate` | Runs a Zod / JSON Schema check against a named artifact or the last LLM output. Fails the run by default; set `continueOnError: true` to log and continue. |
| `transform` | Pure data reshape between steps (no side effects). |

Plus two control-flow constructs usable anywhere in a flow:

- **`loop`** — retry a block until a predicate step outputs a success sentinel, with a `maxIterations` ceiling.
- **`condition`** — skip a step when a referenced step's output doesn't match.
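
A rough sketch of those loop semantics (hypothetical: the `"PASS"` sentinel and the skip-on-match behaviour of `condition` are assumptions drawn from the bundled `developer` agent; the authoritative shape is the Zod schema in `packages/core`):

```javascript
// Hypothetical executor for a `loop` block: re-run the steps until the named
// predicate step emits the success sentinel, capped at maxIterations.
function runLoop({ until, maxIterations, steps }, runStep) {
  const outputs = {};
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    for (const step of steps) {
      // `condition`: skip this step once the referenced step has passed.
      if (step.condition && outputs[step.condition] === 'PASS') continue;
      outputs[step.name] = runStep(step, { iteration, maxIterations });
    }
    if (outputs[until] === 'PASS') break; // success sentinel observed
  }
  return outputs;
}
```

With a test suite that fails once and then passes, the loop runs twice and skips the fix step on the second pass.
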

---

## Real example — the bundled `developer` agent

This is from `packages/core/src/templates/simple-sdlc/agents/developer.agent.yaml`. It shows the *generate → lint → test → fix-until-passing* pattern that `script` + `loop` unlock together:

```yaml
spec:
executor: pi-coding-agent
tools: [read, write, edit, bash, grep, find]

definitions:
generate-code:
type: llm
instructions: |
Generate the full implementation based on the requirements and architecture plan.

lint-and-format:
type: script
run: |
cd {{run.workdir}}
# Auto-detect + run the project's linter/formatter
if [ -f package.json ]; then npx eslint src/ --fix; npx prettier --write "src/**/*.{ts,js}"
elif [ -f pyproject.toml ]; then python -m black .; python -m ruff check --fix .
elif [ -f go.mod ]; then gofmt -w .
fi
continueOnError: true

run-tests:
type: script
run: |
cd {{run.workdir}}
if [ -f package.json ]; then npm test
elif [ -f pyproject.toml ]; then python -m pytest -v
elif [ -f go.mod ]; then go test ./...
fi
captureOutput: true
continueOnError: true

test-gate:
type: script
run: |
if [ "{{steps.run-tests.exitCode}}" = "0" ]; then echo "PASS"; else echo "false"; fi

fix-code:
type: llm
instructions: |
Fix attempt {{loop.iteration}} of {{loop.maxIterations}}.
Failing tests:
{{steps.run-tests.output}}
Fix the source code — don't modify tests unless they have a genuine bug.

validate-output:
type: validate
schema: code-output

git-commit:
type: script
run: |
cd {{run.workdir}}
git add -A && git commit -m "feat(developer): pipeline {{pipeline.id}}"
continueOnError: true

flow:
- step: generate-code
- step: lint-and-format
- loop:
until: "{{steps.test-gate.output}}" # exits when test-gate emits "PASS"
maxIterations: 3
do:
- step: run-tests
- step: test-gate
- step: fix-code
condition: "{{steps.test-gate.output}}" # skip fix if tests passed
- step: validate-output
- step: git-commit
```

---

## Why this matters

- **Your existing tools stay in charge of correctness.** The LLM proposes; `eslint`, `pytest`, `go vet`, `trivy`, `semgrep`, whatever you already trust, decide whether the output is acceptable. Bad LLM output doesn't leak into the next phase.
- **Customise without forking.** Want a different linter, a stricter security scan, a different commit convention? It's YAML — edit the `run:` block. No framework recompile.
- **Domain-agnostic.** The same mechanics build a content agent (generate → SEO audit → Grammarly → publish), a data agent (generate SQL → explain-plan → dry-run → apply), an ops agent (generate runbook → shellcheck → render to PDF). Scripts are the universal glue.
- **Observable.** Every step — LLM and script — lands in the state store with output, exit code, duration, and a span in your OTel trace. The dashboard timeline shows the whole harness, not just the LLM turn.

---

## Template variables

Every `script.run`, `llm.instructions`, `condition`, and `loop.until` field is a template. Available bindings:

- `{{run.workdir}}` — agent's working directory on the node
- `{{run.id}}`, `{{pipeline.id}}` — IDs for logging / commits
- `{{inputs.<type>}}` — content of a declared input artifact
- `{{steps.<name>.output}}` / `.exitCode` — last result of a named step
- `{{loop.iteration}}` / `{{loop.maxIterations}}` — current loop position

Full grammar and resolution semantics: [`docs/architecture.md`](architecture.md).
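
Resolution itself is a substitution over those bindings. A sketch (the nested-context shape and the leave-unknowns-intact behaviour are assumptions, not the engine's documented semantics):

```javascript
// Hypothetical template resolver: replaces {{dotted.path}} with values looked
// up in a nested context object; unknown variables are left untouched.
function render(template, context) {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (match, path) => {
    const value = path.split('.').reduce((obj, key) => obj?.[key], context);
    return value === undefined ? match : String(value);
  });
}
```
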

---

## Step grammar reference

For the authoritative shape of `step`, `loop`, `parallel`, and `condition` blocks, see the Zod schema in `packages/core/src/definitions/parser.ts` (`AgentDefinitionSchema`). Pipeline execution and artifact flow: [`docs/pipeline-execution-flows.md`](pipeline-execution-flows.md).
49 changes: 49 additions & 0 deletions docs/packages.md
@@ -0,0 +1,49 @@
# `@mandarnilange/agentforge` vs `@mandarnilange/agentforge-core`

> Part of the [AgentForge documentation](README.md).

Two npm packages ship from this repo. Most users want **`@mandarnilange/agentforge`** (the platform binary). Pick `@mandarnilange/agentforge-core` only if you're embedding the engine into your own CLI / service or you specifically don't want the platform extras.

## Feature comparison

| | **`@mandarnilange/agentforge-core`** | **`@mandarnilange/agentforge`** (platform) |
|---|---|---|
| **Install** | `npm install @mandarnilange/agentforge-core` | `npm install @mandarnilange/agentforge` (pulls in core) |
| **Binary** | `agentforge-core` | `agentforge` |
| **Intended for** | Local dev, evaluation, library embed | Production, teams, multi-host |
| **LLM providers** | Anthropic | Anthropic + OpenAI + Gemini + Ollama |
| **Executors** | Local (in-process) | Local + **Docker container** + **Remote HTTP** |
| **Node types** | `local` | `local` + `ssh` + remote workers |
| **State store** | SQLite (file) | SQLite **or** PostgreSQL |
| **Persistent definitions** | YAML on disk, loaded per run | YAML on disk **or** `apply` to DB (versioned, hot-reload) |
| **Observability** | OTel API (no-op without SDK) | Full OTel SDK + Jaeger / Grafana export |
| **Crash recovery** | — | Pipeline rehydration + reconciliation loop |
| **Rate limiting** | — | Token / cost / concurrency per pipeline |
| **Multi-host deploy** | — | Control-plane + worker Docker Compose files |
| **Docker image** | `ghcr.io/mandarnilange/agentforge-core` (~289 MB) | `ghcr.io/mandarnilange/agentforge-platform` (~336 MB) |

Defaults are **identical** for local dev (SQLite, local executor, Anthropic). Installing the platform package up front means you won't have to migrate when you need a production feature.

## Using just the framework (`@mandarnilange/agentforge-core`)

If you're embedding the engine into your own CLI or service — or you want the framework without the platform binary, multi-provider middleware, or Postgres — install `@mandarnilange/agentforge-core` directly:

```bash
npm install @mandarnilange/agentforge-core
npx @mandarnilange/agentforge-core init --template simple-sdlc
```

Same YAML schema, same executors, same control plane. You wire your own entry point. Package-level docs: [`packages/core/README.md`](../packages/core/README.md).

## Multi-provider setup

Mixing Anthropic, OpenAI, Gemini, and Ollama (one provider per agent in the same pipeline): [`docs/multi-provider.md`](multi-provider.md).

## Docker images

```bash
docker build --target core -t agentforge-core . # ~289 MB
docker build --target platform -t agentforge-platform . # ~336 MB
```

Both image targets share the same `Dockerfile`; the `platform` target adds the platform-only entry points and environment variables.
56 changes: 55 additions & 1 deletion docs/platform-architecture.md
@@ -2070,7 +2070,61 @@ interface AgentJobIdentity {

---

## 15. Glossary
## 15. Deploying heterogeneous worker pools

The execution plane scales horizontally by adding worker hosts. Workers register with the control plane, heartbeat, and receive dispatched agent jobs. Two workers with **different capabilities** on different hosts let the scheduler route each agent to the right node via `nodeAffinity`.

### Spinning up two specialised workers

```bash
# Worker A — beefy, Docker-isolated, GPU
NODE_NAME=worker-gpu \
NODE_CAPABILITIES=llm-access,docker,high-memory,gpu \
NODE_MAX_CONCURRENT_RUNS=4 \
CONTROL_PLANE_URL=http://cp:3001 \
docker compose -f packages/platform/docker-compose.worker.yml up -d

# Worker B — lightweight, llm-calls only
NODE_NAME=worker-light \
NODE_CAPABILITIES=llm-access \
NODE_MAX_CONCURRENT_RUNS=10 \
CONTROL_PLANE_URL=http://cp:3001 \
docker compose -f packages/platform/docker-compose.worker.yml up -d
```

### Matching agent affinity

The `developer` agent demands Docker isolation and benefits from GPU, so it routes to `worker-gpu`. The `analyst` agent only needs LLM access, so it lands on `worker-light`:

```yaml
# .agentforge/agents/developer.agent.yaml
spec:
nodeAffinity:
required: [{ capability: llm-access }, { capability: docker }]
preferred: [{ capability: gpu }, { capability: high-memory }]
```

```yaml
# .agentforge/agents/analyst.agent.yaml
spec:
nodeAffinity:
required: [{ capability: llm-access }]
```

### Verifying the pool

```bash
agentforge get nodes
# NAME STATUS CAPABILITIES ACTIVE/MAX
# worker-gpu online llm-access, docker, high-memory, gpu 0/4
# worker-light online llm-access 0/10
```

The scheduler picks the highest-scoring node whose capabilities satisfy each agent's required set; soft preferences break ties, and active-run counts cap concurrency per node.
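
That selection rule reduces to roughly the following (the scoring function and tie-breaking here are assumptions, not the shipped scheduler):

```javascript
// Hypothetical node-selection sketch: filter out nodes that lack a required
// capability or a free run slot, then prefer the node matching the most
// soft preferences. Ties keep the first candidate.
function pickNode(nodes, { required = [], preferred = [] }) {
  const candidates = nodes.filter(
    (n) =>
      n.activeRuns < n.maxRuns &&
      required.every((c) => n.capabilities.includes(c.capability))
  );
  if (candidates.length === 0) return null;
  const score = (n) =>
    preferred.filter((c) => n.capabilities.includes(c.capability)).length;
  return candidates.reduce((best, n) => (score(n) > score(best) ? n : best));
}
```

With the two workers above, the `developer` affinity lands on `worker-gpu` and an unsatisfiable requirement returns no node.
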

---

## 16. Glossary

| Term | Definition |
|------|-----------|
41 changes: 41 additions & 0 deletions docs/who-uses-it.md
@@ -0,0 +1,41 @@
# Who Uses AgentForge

> Part of the [AgentForge documentation](README.md).

AgentForge is a YAML-and-CLI framework. Engineers and platform teams author it; the *artifacts and gates* it produces are consumed across an organisation.

## Roles, concretely

### Platform / DevOps engineers
**Stand it up once, the rest of the org inherits the substrate.**

- Run AgentForge as a control plane + worker pool for the company.
- Configure node pools, secrets, cost ceilings, OTel export.
- Add new pipelines as `git push` — no per-team glue code to maintain.
- Pair with [`docs/platform-architecture.md`](platform-architecture.md).

### Software engineers
**Build the agents your domain needs; reuse the harness.**

- Author `.agent.yaml`, `.pipeline.yaml`, and step pipelines.
- Wire your linter, tests, and security scanners as `script` steps so they gate the LLM. See the harness model: [`docs/harness-model.md`](harness-model.md).
- Ship AI-assisted features without giving up code-review discipline — every step lands in the OTel trace and the dashboard timeline.

### Product / domain owners (marketing, sales, HR, ops, legal, finance)
**Don't write YAML. Drive runs from the dashboard.**

- Kick off pipelines via the dashboard or CLI ("run `seo-review` on this URL").
- Approve / reject / revise at human gates between phases — plain-English revision notes, no code.
- Read and download the typed artifacts the pipeline produces.

The artifact-typing model means revision notes steer the next LLM call: gates are a two-way conversation, not a rubber stamp. Every decision is signed, timestamped, and survives restarts.

## What everyone gets

One binary, one control plane, one audit trail:

- **Cost guardrails at every layer.** Each agent declares its own token + dollar ceiling; pipelines carry org-wide limits. The dashboard shows spend in real time. Runaway LLM calls abort cleanly *before* they bill you.
- **Typed artifacts.** 45 built-in Zod / JSON Schemas for SDLC outputs (and you define your own). Malformed LLM output fails the run before it poisons the next phase. See [`docs/artifacts.md`](artifacts.md).
- **Humans in the loop.** Plain-English approvals between phases — the LLM proposes; the human decides.
- **Real-time dashboard.** Pipeline timeline, live agent conversation, artifact viewer, PDF export, cost tracking — same binary, no extra install.
- **Open source, MIT.** No paid tier, no cloud dependency, no telemetry.
1 change: 1 addition & 0 deletions package.json
@@ -20,6 +20,7 @@
"test:watch": "vitest",
"lint": "biome check --write .",
"typecheck": "tsc --build",
"skills:validate": "node scripts/validate-skills.mjs",
"clean": "rm -rf packages/core/dist packages/core/tsconfig.tsbuildinfo packages/core/src/dashboard/dist packages/platform/dist packages/platform/tsconfig.tsbuildinfo"
},
"engines": {