A self-hosted platform for designing software on a visual board and having LLM agents build it — turning architecture blocks into real, reviewed pull requests, with the whole pipeline observable in real time.
You sketch a system as a board of services → modules → tasks, attach requirements (PRDs, RFCs, tracker issues), and run agent pipelines against each block. Coding agents clone the linked repo, implement the work, open a PR, and push live progress back to the board. Reviewer, tester and acceptance agents sharpen the result; humans stay in the loop through decision prompts, PR review and a hard spend cap.
- What it is
- What it supports
- How it works
- Repository layout
- Feature guide
- Documentation index
- Deployment
cat-factory is a software-development agent management platform. It is self-hosted and runs end-to-end on Cloudflare: a Nuxt single-page app talks to a Cloudflare Worker (Hono + D1), and the heavy coding work runs in per-run Cloudflare Containers (or your own runner pool). It pairs a spatial planning surface (a Vue Flow canvas) with a durable, server-side execution engine so runs make progress whether or not a browser is open.
Two ideas anchor the model:
- The board is the plan. A "service" is a
Blockwithlevel: 'frame'; modules are sub-frames, tasks are leaves. Dependencies are edges. The board is both the design artifact and the unit of work agents act on. - Agents do real work through pull requests. The implementation phases run a coding agent on an actual checkout; "done" means a PR exists and its CI is green, not merely that text was generated.
| Capability | What you get |
|---|---|
| Visual architecture boards | A pannable/zoomable canvas of frames (services), modules and tasks with dependency edges, drag-drop reparenting, and semantic level-of-detail. |
| Accounts & workspaces | A signed-in user switches between a personal account and any orgs they belong to; an account owns many workspaces (boards). Visibility is by membership. |
| Agent pipelines | Reusable, ordered chains of agent steps (architect → coder → blueprints → reviewer → tester → acceptance, plus mocker/playwright/deployer/custom kinds) applied per block. |
| Durable, observable execution | Runs are driven by Cloudflare Workflows and stream live step/subtask progress, decision prompts, and failures to the board over WebSockets. |
| Real code changes via PRs | Coding agents (coder, mocker, playwright) run in a per-run container, clone the repo, implement, and open a PR; merge flips the block to done. |
| Requirements review | A stateless reviewer agent raises gaps/clarifications/assumptions/risks on a block; a human answers each, then the agent folds the answers back into the description. |
| Service blueprints | A Blueprinter agent decomposes a repo into a service → modules → features map stored in the repo (blueprints/) and reconciles it onto the board. |
| Repo bootstrap | Adapt a reference architecture (or scaffold from scratch) into a pre-created empty repo and force-push the result, materialising a new service frame on the board. |
| On-demand board scan | Decompose an existing repo into a board structure / reusable blueprint anchored to file references. |
| GitHub integration | Connect an account to GitHub via a GitHub App for repo/PR/issue read & write plus webhooks, with local D1 projections kept fresh. |
| Document & task sources | Link Confluence/Notion docs and Jira/Linear/GitHub issues to a board: import, expand into structure, or attach as agent context. |
| Ephemeral environments | Register your own preview-environment tooling via a declarative HTTP manifest so deployer/tester agents provision and run against it. |
| Prompt-fragment library | A tenant-scoped, versioned catalog of best-practice guidelines (built-in ∪ account ∪ workspace), optionally sourced from a repo, selected per run. |
| Bring-your-own runner pool | Route coding jobs to your own Kubernetes/Nomad/scheduler pool instead of Cloudflare Containers, described by a manifest. |
| Spend safeguards | Every LLM call is metered into an org-wide monthly budget; runs pause at the cap and resume when the period rolls over (or on an explicit override). |
| Model picker | Per-block model selection; each model runs on Cloudflare Workers AI by default and upgrades to its direct provider API when a key is set. |
| Benchmarking | A headless harness (cat-bench) that scores agents (requirement review / code review / implementation) across models and prompt versions. |
┌──────────────┐ WebSocket events ┌───────────────────────────┐
│ Nuxt SPA │ ◀──── push, not ──── │ Cloudflare Worker │
│ (frontend/app)│ polling │ Hono controllers + D1 │
│ Vue Flow │ ───── REST ─────────▶ │ (runtimes/cloudflare) │
└──────────────┘ └────────────┬──────────────┘
│ ports (DI)
┌──────────▼──────────┐
│ domain packages │
│ kernel + services │
└──────────┬──────────┘
│ dispatch coding jobs
┌──────────────────────▼───────────────────────┐
│ per-run Cloudflare Container (or runner pool) │
│ executor-harness → Pi coding agent → PR │
└───────────────────────────────────────────────┘
The canonical pattern is async + durable + observable: a service starts a run,
a Cloudflare Workflows instance drives it one checkpointed step at a time, a
container executes the long-running agent work asynchronously, and every
persisted transition is pushed to the browser through a per-workspace Durable
Object. The same shape is reused by execution, bootstrap and blueprints. The
end-to-end flows are written up in CLAUDE.md.
The domain + the HTTP layer are runtime-neutral, so the same backend serves two
deployment targets: the Cloudflare Worker above and a Node.js service
(backend/runtimes/node, Postgres via Drizzle + pg-boss for durable jobs). Each
facade supplies only its differentiators; a shared conformance suite runs the same
assertions against both to keep them from drifting.
One pnpm workspace, split into reusable libraries (published to npm + a public
runner image on GHCR and Docker Hub) and example deployments that depend on them. Other
organizations copy deploy/*, point the config at their own resources, and
deploy both halves on their end.
Libraries (published):
| Path | Package | Role |
|---|---|---|
frontend/app |
@cat-factory/app |
Reusable Nuxt layer (ssr: false) — the board UI, Pinia stores, composables, the WebSocket stream. Consumed via extends. |
backend/packages/contracts |
@cat-factory/contracts |
Valibot wire contracts shared by SPA + the backends. |
backend/packages/kernel |
@cat-factory/kernel |
Shared vocabulary: domain types, pure logic + constants, and all repository/port interfaces. |
backend/packages/orchestration |
@cat-factory/orchestration |
The delivery-workflow engine + domain composition root (createCore()): module services for execution, bootstrap, pipelines, board, requirements, merge, … |
backend/packages/integrations |
@cat-factory/integrations |
Opt-in integration services (GitHub, documents, tasks, environments, runner pools) behind kernel ports. |
backend/packages/agents |
@cat-factory/agents |
Agent catalog + prompt composition (systemPromptFor/userPromptFor, the per-kind roles, prompt-version registry) and the AI provisioning facade (CompositeModelProvider + the neutral resolvers). |
backend/packages/provider-bedrock |
@cat-factory/provider-bedrock |
Opt-in AWS Bedrock model resolver (@ai-sdk/amazon-bedrock) with a supported-model allow-list; mixed into a facade's registry when configured. |
backend/packages/spend |
@cat-factory/spend |
The spend safeguard: pricing tables + spend metering/gating. |
backend/packages/workspaces |
@cat-factory/workspaces |
Workspace + account services. |
backend/packages/server |
@cat-factory/server |
Runtime-neutral HTTP layer shared by every facade: all Hono controllers, middleware (auth/authz/CORS/error), request helpers, the gateway seams, the AppConfig contract, and the shared row↔domain mappers. |
backend/packages/prompt-fragments |
@cat-factory/prompt-fragments |
The built-in tier of best-practice prompt fragments. See its README. |
Runtime facades (one per deployment target; serve the same @cat-factory/server app):
| Path | Package | Role |
|---|---|---|
backend/runtimes/cloudflare |
@cat-factory/worker |
Cloudflare Worker facade: D1 repos, Durable Objects, Workflows, per-run Containers, queues/cron, the CF gateway impls. Thin createApp()/buildContainer() over @cat-factory/server; ships the D1 migrations/. |
backend/runtimes/node |
@cat-factory/node-server |
Node.js service facade: serves the shared app via @hono/node-server with Drizzle/Postgres repos + pg-boss durable execution. start() / createServer(); DATABASE_URL selects the database. |
Internal (private; not published to npm):
| Path | Package | Role |
|---|---|---|
backend/internal/executor-harness |
@cat-factory/executor-harness |
The payload that runs inside each per-run container (the Pi coding-agent harness). Published as a public multi-arch Docker image to GHCR + Docker Hub (not npm). See its README. |
backend/internal/benchmark-harness |
@cat-factory/benchmark-harness |
Headless agent benchmarking (cat-bench); internal. See its README. |
backend/internal/conformance |
@cat-factory/conformance |
Cross-runtime conformance suite + the canonical deterministic FakeAgentExecutor; run by both runtime facades' test suites to mandate feature parity. |
Deployments (examples; copy these to deploy on your own infra):
| Path | Package | Role |
|---|---|---|
deploy/backend |
@cat-factory/deploy-backend |
Cloudflare Worker deployment: re-exports @cat-factory/worker + the production wrangler.toml. See its README. |
deploy/node |
@cat-factory/deploy-node |
Node.js service deployment: calls @cat-factory/node-server's start() (Postgres + pg-boss); ships a Dockerfile + .env.example. See its README. |
deploy/frontend |
@cat-factory/deploy-frontend |
Pages deployment: a thin Nuxt app that extends @cat-factory/app + the Pages wrangler.toml. See its README. |
In this repo the deployments depend on the libraries via workspace:*; in your
own copy you swap that for the published npm version. The backend is a hexagonal
monorepo — controllers (worker) → services (core) → ports, with infra adapters
wired in container.ts. The full breakdown is in the
backend overview. Releases use changesets — see
CONTRIBUTING.md.
Each capability has a deeper write-up; start here and follow the link.
- Boards, services & repo linkage — the
frame → module → taskmodel, how a repo is resolved for a block at runtime, and drag-drop reparenting.CLAUDE.md→ Board / service / repo-linkage model. - Execution & real-time events — the durable run engine, decision prompts, failure/retry surface, and the push-not-poll event hub. Backend → Execution & real-time events.
- Model support — per-block model selection, the Cloudflare → direct →
subscription fallback ladder ("subscriptions always win"), the Pi / Claude Code /
Codex harnesses, flat-rate quota vs the spend budget, and the individual-only
(Claude-on-org) rule.
docs/model-support.md. - Requirements review — the stateless, synchronous reviewer agent.
CLAUDE.md→ Requirements review flow. - Service blueprints — the in-repo
blueprints/map and board reconciliation.CLAUDE.md→ Service blueprints flow. - Repo bootstrap — create a repo from a reference architecture.
CLAUDE.md→ Repo bootstrap flow. - Authentication — "Login with GitHub"; GitHub accounts are the identity
provider, so there's no separate user store.
docs/auth.md. - GitHub integration — connect an account via a GitHub App for repo/PR/issue read & write plus webhooks; the installation is shared across the account's workspaces, and each workspace explicitly links the repos it tracks. Design · Operations runbook · Two-app provisioning (ADR 0005).
- Document sources — link requirements, RFCs and PRDs from Confluence/Notion
and expand them into structure.
docs/document-sources.md. - Ephemeral environments — plug in your own preview-environment tooling via a
declarative manifest, or a hand-written native adapter.
docs/environments-integration.md· native adapters. - Prompt-fragment library — tenant-scoped, repo-sourced guidelines selected per run. ADR 0006.
- Self-hosted runner pool — run coding jobs on your own infra. Operator guide · ADR 0004.
- Storage & retention — the D1 data model's retention sweeps.
docs/storage-and-retention.md. - Container reaping — how per-run containers get reclaimed, and the current
gaps.
docs/container-reaping.md. - Benchmarking — score agents across models and prompt versions.
benchmark-harnessREADME.
Start here
- Backend overview — the Worker + D1 monorepo and its layering.
frontend/app/README.md— the Nuxt SPA layer.CLAUDE.md— the cross-cutting runtime flows (execution + events, bootstrap, blueprints, requirements review, the board/repo-linkage model) in one place for quick lookup.
Integrations & features
- Model support — selection, fallbacks, harnesses & provisioning
- Authentication
- GitHub integration — design · operations runbook · App Manifest
- Document sources
- Ephemeral environments · native adapters
- Self-hosted runner pool
Operations
Architecture decisions (ADRs)
- 0001 — GitHub integration via a GitHub App
- 0002 — Cloudflare as the runtime platform
- 0003 — Pluggable ephemeral-environment providers
- 0004 — Self-hosted runner pool
- 0005 — Two-app tiering for repository creation
- 0006 — Tenant-scoped prompt-fragment library
The two halves are deployed from the example packages under deploy/. Each
carries its own config: the backend Worker in
deploy/backend/ and the frontend Pages
project in deploy/frontend/. The backend can
alternatively run as a long-running Node.js service (Postgres + pg-boss) from
deploy/node/ — same HTTP API, different runtime. To deploy on
your own infrastructure, copy those directories and swap the workspace:*
dependency for the published npm version — see each package's README. The
reference deployment below runs on Cloudflare under the iselwin@gmail.com
account (wrangler whoami must show fe0047c6e869c8cb875ca425a9c341af).
| Piece | Cloudflare resource | Production URL |
|---|---|---|
| Backend | Worker cat-factory-backend |
https://catfactory-api.kiberion.com |
| Frontend | Pages project cat-factory |
https://catfactory.kiberion.com |
| Data | D1 database cat_factory |
(bound to the Worker as DB) |
Deploy the backend first so any schema the new frontend expects is already
live, then the frontend. Migrations run before the Worker deploy. The runner
container image is published independently to GHCR + Docker Hub (see
backend/internal/executor-harness
and .github/workflows/docker-publish.yml); the backend wrangler.toml
references it by tag.
cd deploy/backend
# 1. apply any new migrations to the PRODUCTION D1 (review the pending list first)
wrangler d1 migrations list cat_factory --remote
wrangler d1 migrations apply cat_factory --remote # == pnpm db:migrate:remote
# 2. deploy the Worker (also rolls the container image, workflows, cron triggers).
# `pnpm deploy` builds @cat-factory/worker first, then `wrangler deploy`.
pnpm deployThe migrations ship with the @cat-factory/worker library, so migrations_dir
points at node_modules/@cat-factory/worker/migrations (see the comment in
deploy/backend/wrangler.toml if your tooling can't follow the symlink). The
Worker prints its *.workers.dev URL; production traffic reaches it through the
catfactory-api.kiberion.com custom domain (configured in the Cloudflare
dashboard, not in wrangler.toml). First-time setup (auth, provider, GitHub-App
and container secrets) is in backend/README.md
— auth is required or the API fails closed.
Instead of the Worker, run the same backend as a long-running Node.js service over
Postgres (durable jobs on pg-boss). It needs only DATABASE_URL (the schema
migrates on boot); all other config is environment-driven and documented in
deploy/node/.env.example.
cd deploy/node
cp .env.example .env # set DATABASE_URL, auth, model keys, …
pnpm start # builds @cat-factory/node-server, then runs the service
# or as a container (build from the repo root):
docker build -f deploy/node/Dockerfile -t cat-factory-node .
docker run --rm -p 8787:8787 --env-file deploy/node/.env cat-factory-nodeRequires Node 24 or 26 (the entry runs via built-in type stripping; the scripts
load .env with Node's native --env-file). See
deploy/node/README.md.
The SPA is ssr: false, so the backend URL is baked in at build time from
NUXT_PUBLIC_API_BASE — it is not a Pages runtime var. Build with the prod
API base, then deploy the static output:
cd deploy/frontend
NUXT_PUBLIC_API_BASE=https://catfactory-api.kiberion.com pnpm generate
pnpm deploy # wrangler pages deploy; project + dir from wrangler.tomlPowerShell equivalent for the build step:
$env:NUXT_PUBLIC_API_BASE = "https://catfactory-api.kiberion.com"; pnpm generatepnpm generate writes the static site to .output/public; wrangler pages deploy (no args) reads the project name cat-factory and that output dir from
deploy/frontend/wrangler.toml. main is the Pages production branch, so the
deploy updates the catfactory.kiberion.com alias. Sanity-check after deploying:
curl -s https://catfactory-api.kiberion.com/health # {"status":"ok"}
curl -s https://catfactory.kiberion.com | grep -o catfactory-api.kiberion.com # baked API basebackend/scripts/teardown-production.sh
deletes the Worker (and its containers/workflows/crons), optionally the Pages
project (--include-pages), and always preserves the D1 data.
Re-deploying brings production back.