kibertoad/cat-factory

cat-factory

A self-hosted platform for designing software on a visual board and having LLM agents build it — turning architecture blocks into real, reviewed pull requests, with the whole pipeline observable in real time.

You sketch a system as a board of services → modules → tasks, attach requirements (PRDs, RFCs, tracker issues), and run agent pipelines against each block. Coding agents clone the linked repo, implement the work, open a PR, and push live progress back to the board. Reviewer, tester and acceptance agents sharpen the result; humans stay in the loop through decision prompts, PR review and a hard spend cap.

What it is

cat-factory is a software-development agent management platform. It is self-hosted and runs end-to-end on Cloudflare: a Nuxt single-page app talks to a Cloudflare Worker (Hono + D1), and the heavy coding work runs in per-run Cloudflare Containers (or your own runner pool). It pairs a spatial planning surface (a Vue Flow canvas) with a durable, server-side execution engine so runs make progress whether or not a browser is open.

Two ideas anchor the model:

  • The board is the plan. A "service" is a Block with level: 'frame'; modules are sub-frames, tasks are leaves. Dependencies are edges. The board is both the design artifact and the unit of work agents act on.
  • Agents do real work through pull requests. The implementation phases run a coding agent on an actual checkout; "done" means a PR exists and its CI is green, not merely that text was generated.
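A minimal sketch of the "board is the plan" idea, assuming a flat list of blocks plus dependency edges. Only `level: 'frame'` comes from the text above; the other field names (`parentId`, `status`, `from`, `to`) and the `readyTasks` helper are hypothetical, chosen for illustration.

```typescript
// Hypothetical shapes: only `level: 'frame'` is taken from the README.
type Level = "frame" | "module" | "task";
type Status = "todo" | "in_progress" | "done";

interface Block {
  id: string;
  level: Level;
  parentId: string | null; // frames sit at the top and have no parent
  status: Status;
}

interface Edge {
  from: string; // the block that depends...
  to: string; // ...on this block
}

// A task is ready when it is still "todo" and every block it depends on is "done".
function readyTasks(blocks: Block[], edges: Edge[]): Block[] {
  const byId = new Map(blocks.map((b) => [b.id, b] as const));
  return blocks.filter((block) => {
    if (block.level !== "task" || block.status !== "todo") return false;
    return edges
      .filter((e) => e.from === block.id)
      .every((e) => byId.get(e.to)?.status === "done");
  });
}

const blocks: Block[] = [
  { id: "billing", level: "frame", parentId: null, status: "todo" },
  { id: "invoices", level: "module", parentId: "billing", status: "todo" },
  { id: "schema", level: "task", parentId: "invoices", status: "done" },
  { id: "api", level: "task", parentId: "invoices", status: "todo" },
  { id: "ui", level: "task", parentId: "invoices", status: "todo" },
];
const edges: Edge[] = [
  { from: "api", to: "schema" },
  { from: "ui", to: "api" },
];

// Only "api" is ready: "schema" is already done and "ui" still waits on "api".
console.log(readyTasks(blocks, edges).map((b) => b.id));
```

Keeping the hierarchy (`parentId`) separate from the dependency edges lets the same data serve both as the drawing and as the work queue.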

What it supports

| Capability | What you get |
| --- | --- |
| Visual architecture boards | A pannable/zoomable canvas of frames (services), modules and tasks with dependency edges, drag-drop reparenting, and semantic level-of-detail. |
| Accounts & workspaces | A signed-in user switches between a personal account and any orgs they belong to; an account owns many workspaces (boards). Visibility is by membership. |
| Agent pipelines | Reusable, ordered chains of agent steps (architect → coder → blueprints → reviewer → tester → acceptance, plus mocker/playwright/deployer/custom kinds) applied per block. |
| Durable, observable execution | Runs are driven by Cloudflare Workflows and stream live step/subtask progress, decision prompts, and failures to the board over WebSockets. |
| Real code changes via PRs | Coding agents (coder, mocker, playwright) run in a per-run container, clone the repo, implement, and open a PR; merge flips the block to done. |
| Requirements review | A stateless reviewer agent raises gaps/clarifications/assumptions/risks on a block; a human answers each, then the agent folds the answers back into the description. |
| Service blueprints | A Blueprinter agent decomposes a repo into a service → modules → features map stored in the repo (blueprints/) and reconciles it onto the board. |
| Repo bootstrap | Adapt a reference architecture (or scaffold from scratch) into a pre-created empty repo and force-push the result, materialising a new service frame on the board. |
| On-demand board scan | Decompose an existing repo into a board structure / reusable blueprint anchored to file references. |
| GitHub integration | Connect an account to GitHub via a GitHub App for repo/PR/issue read & write plus webhooks, with local D1 projections kept fresh. |
| Document & task sources | Link Confluence/Notion docs and Jira/Linear/GitHub issues to a board: import, expand into structure, or attach as agent context. |
| Ephemeral environments | Register your own preview-environment tooling via a declarative HTTP manifest so deployer/tester agents provision and run against it. |
| Prompt-fragment library | A tenant-scoped, versioned catalog of best-practice guidelines (built-in ∪ account ∪ workspace), optionally sourced from a repo, selected per run. |
| Bring-your-own runner pool | Route coding jobs to your own Kubernetes/Nomad/scheduler pool instead of Cloudflare Containers, described by a manifest. |
| Spend safeguards | Every LLM call is metered into an org-wide monthly budget; runs pause at the cap and resume when the period rolls over (or on an explicit override). |
| Model picker | Per-block model selection; each model runs on Cloudflare Workers AI by default and upgrades to its direct provider API when a key is set. |
| Benchmarking | A headless harness (cat-bench) that scores agents (requirement review / code review / implementation) across models and prompt versions. |
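The spend safeguard row describes a small state machine: meter every call, pause at the cap, and resume on rollover or override. The sketch below illustrates that behaviour only. Every name in it (`Budget`, `meter`, `gate`, `rollOver`) is hypothetical, and it assumes that an override is cleared when the month changes, which the text above does not specify.

```typescript
// Hypothetical names throughout; the README states only the behaviour.
interface Budget {
  period: string; // UTC year-month, e.g. "2025-01"
  capUsd: number;
  spentUsd: number;
  override: boolean; // explicit operator override for the current period
}

type Gate = "proceed" | "paused";

const periodOf = (date: Date): string => date.toISOString().slice(0, 7);

// Start a fresh period when the month changes (assumption: this also clears the override).
function rollOver(budget: Budget, now: Date): Budget {
  const period = periodOf(now);
  return budget.period === period
    ? budget
    : { ...budget, period, spentUsd: 0, override: false };
}

// Record the cost of one LLM call against the current period.
function meter(budget: Budget, costUsd: number, now: Date): Budget {
  const current = rollOver(budget, now);
  return { ...current, spentUsd: current.spentUsd + costUsd };
}

// Decide whether a run may make its next LLM call.
function gate(budget: Budget, now: Date): Gate {
  const current = rollOver(budget, now);
  return current.override || current.spentUsd < current.capUsd ? "proceed" : "paused";
}

const january = new Date("2025-01-15T00:00:00Z");
const february = new Date("2025-02-01T00:00:00Z");

let budget: Budget = { period: "2025-01", capUsd: 10, spentUsd: 0, override: false };
budget = meter(budget, 6, january);
console.log(gate(budget, january)); // proceed (6 < 10)
budget = meter(budget, 5, january);
console.log(gate(budget, january)); // paused (11 >= 10)
console.log(gate({ ...budget, override: true }, january)); // proceed (override)
console.log(gate(budget, february)); // proceed (new period)
```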

How it works

┌──────────────┐   WebSocket events    ┌───────────────────────────┐
│  Nuxt SPA    │ ◀──── push, not ────  │  Cloudflare Worker        │
│ (frontend/app)│      polling         │  Hono controllers + D1    │
│  Vue Flow    │ ───── REST ─────────▶ │  (runtimes/cloudflare)    │
└──────────────┘                       └────────────┬──────────────┘
                                                     │ ports (DI)
                                          ┌──────────▼──────────┐
                                          │   domain packages   │
                                          │  kernel + services  │
                                          └──────────┬──────────┘
                                                     │ dispatch coding jobs
                              ┌──────────────────────▼───────────────────────┐
                              │ per-run Cloudflare Container (or runner pool) │
                              │ executor-harness → Pi coding agent → PR    │
                              └───────────────────────────────────────────────┘

The canonical pattern is async + durable + observable: a service starts a run, a Cloudflare Workflows instance drives it one checkpointed step at a time, a container executes the long-running agent work asynchronously, and every persisted transition is pushed to the browser through a per-workspace Durable Object. The same shape is reused by execution, bootstrap and blueprints. The end-to-end flows are written up in CLAUDE.md.
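As a rough illustration of "one checkpointed step at a time", the sketch below replays a run after a failure and skips the steps that were already persisted. It is an in-memory stand-in, not the Cloudflare Workflows API: `RunStore` plays the database, `publish` plays the per-workspace event channel, and all names are hypothetical.

```typescript
// Hypothetical in-memory stand-ins for the durable store and the event channel.
interface RunState {
  completed: string[];
}

interface RunEvent {
  runId: string;
  step: string;
  status: "completed";
}

class RunStore {
  private runs = new Map<string, RunState>();
  load(runId: string): RunState {
    return this.runs.get(runId) ?? { completed: [] };
  }
  save(runId: string, state: RunState): void {
    this.runs.set(runId, state);
  }
}

type Step = { name: string; run: () => Promise<void> };

// Drive a run step by step: skip checkpointed steps, persist each
// transition, and only then publish it to observers.
async function drive(
  runId: string,
  steps: Step[],
  store: RunStore,
  publish: (event: RunEvent) => void,
): Promise<void> {
  for (const step of steps) {
    const state = store.load(runId);
    if (state.completed.includes(step.name)) continue; // checkpoint hit
    await step.run();
    store.save(runId, { completed: [...state.completed, step.name] });
    publish({ runId, step: step.name, status: "completed" });
  }
}

async function demo(): Promise<string[]> {
  const store = new RunStore();
  const log: string[] = [];
  let coderAttempts = 0;
  const steps: Step[] = [
    { name: "architect", run: async () => { log.push("ran architect"); } },
    {
      name: "coder",
      run: async () => {
        coderAttempts += 1;
        if (coderAttempts === 1) throw new Error("container crashed");
        log.push("ran coder");
      },
    },
  ];
  const publish = (e: RunEvent): void => { log.push(`event ${e.step}`); };
  await drive("run-1", steps, store, publish).catch(() => { log.push("interrupted"); });
  await drive("run-1", steps, store, publish); // resumes after the checkpoint
  return log;
}

demo().then((log) => console.log(log));
```

On the second pass the architect step is not re-run, because its completion was saved before the failure. That property is what lets a run make progress with no browser attached.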

The domain + the HTTP layer are runtime-neutral, so the same backend serves two deployment targets: the Cloudflare Worker above and a Node.js service (backend/runtimes/node, Postgres via Drizzle + pg-boss for durable jobs). Each facade supplies only its differentiators; a shared conformance suite runs the same assertions against both to keep them from drifting.
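The conformance idea can be sketched as one set of assertions parameterised by an adapter factory. The port and both adapters below are hypothetical stand-ins (the real port interfaces live in @cat-factory/kernel and are not reproduced here); the two classes only mimic "same contract, different storage".

```typescript
// Hypothetical port; not the actual kernel interface.
interface BlockRepository {
  put(id: string, title: string): Promise<void>;
  get(id: string): Promise<string | undefined>;
}

// Stand-in for one runtime's adapter (e.g. a D1-backed repo).
class MapRepository implements BlockRepository {
  private rows = new Map<string, string>();
  async put(id: string, title: string): Promise<void> {
    this.rows.set(id, title);
  }
  async get(id: string): Promise<string | undefined> {
    return this.rows.get(id);
  }
}

// Stand-in for the other runtime's adapter, with a different internal layout.
class ArrayRepository implements BlockRepository {
  private rows: Array<{ id: string; title: string }> = [];
  async put(id: string, title: string): Promise<void> {
    this.rows = this.rows.filter((r) => r.id !== id).concat({ id, title });
  }
  async get(id: string): Promise<string | undefined> {
    return this.rows.find((r) => r.id === id)?.title;
  }
}

// The shared suite: identical checks, whichever adapter the factory builds.
async function conformance(name: string, make: () => BlockRepository): Promise<string> {
  const repo = make();
  await repo.put("b1", "first");
  await repo.put("b1", "second"); // overwrite the same id
  const checks: Array<[string, boolean]> = [
    ["returns the latest write", (await repo.get("b1")) === "second"],
    ["returns undefined for unknown ids", (await repo.get("missing")) === undefined],
  ];
  const failed = checks.filter(([, ok]) => !ok).map(([label]) => label);
  if (failed.length > 0) throw new Error(`${name}: ${failed.join(", ")}`);
  return `${name}: ${checks.length} checks passed`;
}

Promise.all([
  conformance("map", () => new MapRepository()),
  conformance("array", () => new ArrayRepository()),
]).then((results) => console.log(results));
```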

Repository layout

One pnpm workspace, split into reusable libraries (published to npm + a public runner image on GHCR and Docker Hub) and example deployments that depend on them. Other organizations copy deploy/*, point the config at their own resources, and deploy both halves on their end.

Libraries (published):

| Path | Package | Role |
| --- | --- | --- |
| frontend/app | @cat-factory/app | Reusable Nuxt layer (ssr: false): the board UI, Pinia stores, composables, the WebSocket stream. Consumed via extends. |
| backend/packages/contracts | @cat-factory/contracts | Valibot wire contracts shared by SPA + the backends. |
| backend/packages/kernel | @cat-factory/kernel | Shared vocabulary: domain types, pure logic + constants, and all repository/port interfaces. |
| backend/packages/orchestration | @cat-factory/orchestration | The delivery-workflow engine + domain composition root (createCore()): module services for execution, bootstrap, pipelines, board, requirements, merge, … |
| backend/packages/integrations | @cat-factory/integrations | Opt-in integration services (GitHub, documents, tasks, environments, runner pools) behind kernel ports. |
| backend/packages/agents | @cat-factory/agents | Agent catalog + prompt composition (systemPromptFor/userPromptFor, the per-kind roles, prompt-version registry) and the AI provisioning facade (CompositeModelProvider + the neutral resolvers). |
| backend/packages/provider-bedrock | @cat-factory/provider-bedrock | Opt-in AWS Bedrock model resolver (@ai-sdk/amazon-bedrock) with a supported-model allow-list; mixed into a facade's registry when configured. |
| backend/packages/spend | @cat-factory/spend | The spend safeguard: pricing tables + spend metering/gating. |
| backend/packages/workspaces | @cat-factory/workspaces | Workspace + account services. |
| backend/packages/server | @cat-factory/server | Runtime-neutral HTTP layer shared by every facade: all Hono controllers, middleware (auth/authz/CORS/error), request helpers, the gateway seams, the AppConfig contract, and the shared row↔domain mappers. |
| backend/packages/prompt-fragments | @cat-factory/prompt-fragments | The built-in tier of best-practice prompt fragments. See its README. |

Runtime facades (one per deployment target; serve the same @cat-factory/server app):

| Path | Package | Role |
| --- | --- | --- |
| backend/runtimes/cloudflare | @cat-factory/worker | Cloudflare Worker facade: D1 repos, Durable Objects, Workflows, per-run Containers, queues/cron, the CF gateway impls. Thin createApp()/buildContainer() over @cat-factory/server; ships the D1 migrations/. |
| backend/runtimes/node | @cat-factory/node-server | Node.js service facade: serves the shared app via @hono/node-server with Drizzle/Postgres repos + pg-boss durable execution. start() / createServer(); DATABASE_URL selects the database. |

Internal (private; not published to npm):

| Path | Package | Role |
| --- | --- | --- |
| backend/internal/executor-harness | @cat-factory/executor-harness | The payload that runs inside each per-run container (the Pi coding-agent harness). Published as a public multi-arch Docker image to GHCR + Docker Hub (not npm). See its README. |
| backend/internal/benchmark-harness | @cat-factory/benchmark-harness | Headless agent benchmarking (cat-bench); internal. See its README. |
| backend/internal/conformance | @cat-factory/conformance | Cross-runtime conformance suite + the canonical deterministic FakeAgentExecutor; run by both runtime facades' test suites to mandate feature parity. |

Deployments (examples; copy these to deploy on your own infra):

| Path | Package | Role |
| --- | --- | --- |
| deploy/backend | @cat-factory/deploy-backend | Cloudflare Worker deployment: re-exports @cat-factory/worker + the production wrangler.toml. See its README. |
| deploy/node | @cat-factory/deploy-node | Node.js service deployment: calls @cat-factory/node-server's start() (Postgres + pg-boss); ships a Dockerfile + .env.example. See its README. |
| deploy/frontend | @cat-factory/deploy-frontend | Pages deployment: a thin Nuxt app that extends @cat-factory/app + the Pages wrangler.toml. See its README. |

In this repo the deployments depend on the libraries via workspace:*; in your own copy you swap that for the published npm version. The backend is a hexagonal monorepo — controllers (worker) → services (core) → ports, with infra adapters wired in container.ts. The full breakdown is in the backend overview. Releases use changesets — see CONTRIBUTING.md.

Feature guide

Each capability has a deeper write-up; start here and follow the link.

Documentation index

Start here

  • Backend overview — the Worker + D1 monorepo and its layering.
  • frontend/app/README.md — the Nuxt SPA layer.
  • CLAUDE.md — the cross-cutting runtime flows (execution + events, bootstrap, blueprints, requirements review, the board/repo-linkage model) in one place for quick lookup.

Deployment

The two halves are deployed from the example packages under deploy/. Each carries its own config: the backend Worker in deploy/backend/ and the frontend Pages project in deploy/frontend/. The backend can alternatively run as a long-running Node.js service (Postgres + pg-boss) from deploy/node/ — same HTTP API, different runtime. To deploy on your own infrastructure, copy those directories and swap the workspace:* dependency for the published npm version — see each package's README. The reference deployment below runs on Cloudflare under the iselwin@gmail.com account (wrangler whoami must show fe0047c6e869c8cb875ca425a9c341af).

| Piece | Cloudflare resource | Production URL |
| --- | --- | --- |
| Backend | Worker cat-factory-backend | https://catfactory-api.kiberion.com |
| Frontend | Pages project cat-factory | https://catfactory.kiberion.com |
| Data | D1 database cat_factory (bound to the Worker as DB) | |

Deploy the backend first so any schema the new frontend expects is already live, then the frontend. Migrations run before the Worker deploy. The runner container image is published independently to GHCR + Docker Hub (see backend/internal/executor-harness and .github/workflows/docker-publish.yml); the backend wrangler.toml references it by tag.

Backend (Worker + D1)

cd deploy/backend

# 1. apply any new migrations to the PRODUCTION D1 (review the pending list first)
wrangler d1 migrations list  cat_factory --remote
wrangler d1 migrations apply cat_factory --remote     # == pnpm db:migrate:remote

# 2. deploy the Worker (also rolls the container image, workflows, cron triggers).
#    `pnpm deploy` builds @cat-factory/worker first, then `wrangler deploy`.
pnpm deploy

The migrations ship with the @cat-factory/worker library, so migrations_dir points at node_modules/@cat-factory/worker/migrations (see the comment in deploy/backend/wrangler.toml if your tooling can't follow the symlink). The Worker prints its *.workers.dev URL; production traffic reaches it through the catfactory-api.kiberion.com custom domain (configured in the Cloudflare dashboard, not in wrangler.toml). First-time setup (auth, provider, GitHub-App and container secrets) is in backend/README.md; auth is required, or the API fails closed.

Backend (Node.js service — alternative to the Worker)

Instead of the Worker, run the same backend as a long-running Node.js service over Postgres (durable jobs on pg-boss). It needs only DATABASE_URL (the schema migrates on boot); all other config is environment-driven and documented in deploy/node/.env.example.

cd deploy/node
cp .env.example .env          # set DATABASE_URL, auth, model keys, …
pnpm start                    # builds @cat-factory/node-server, then runs the service

# or as a container (build from the repo root):
docker build -f deploy/node/Dockerfile -t cat-factory-node .
docker run --rm -p 8787:8787 --env-file deploy/node/.env cat-factory-node

Requires Node 24 or 26 (the entry runs via built-in type stripping; the scripts load .env with Node's native --env-file). See deploy/node/README.md.

Frontend (Nuxt SPA → Pages)

The SPA is ssr: false, so the backend URL is baked in at build time from NUXT_PUBLIC_API_BASE — it is not a Pages runtime var. Build with the prod API base, then deploy the static output:

cd deploy/frontend
NUXT_PUBLIC_API_BASE=https://catfactory-api.kiberion.com pnpm generate
pnpm deploy                            # wrangler pages deploy; project + dir from wrangler.toml

PowerShell equivalent for the build step:

$env:NUXT_PUBLIC_API_BASE = "https://catfactory-api.kiberion.com"; pnpm generate

pnpm generate writes the static site to .output/public; wrangler pages deploy (no args) reads the project name cat-factory and that output dir from deploy/frontend/wrangler.toml. main is the Pages production branch, so the deploy updates the catfactory.kiberion.com alias. Sanity-check after deploying:

curl -s https://catfactory-api.kiberion.com/health        # {"status":"ok"}
curl -s https://catfactory.kiberion.com | grep -o catfactory-api.kiberion.com   # baked API base
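If the two curl checks need to run from a script (for example in CI), they could be sketched roughly as below. Everything here is hypothetical: the helper names, the injected `Get` fetcher, and the canned HTML, whose shape is invented purely to show a stale API base. The real generated markup will look different.

```typescript
// Hypothetical helpers; `Get` is an injected fetcher so the checks can run offline.
type Get = (url: string) => Promise<string>;

// Mirrors `curl .../health`, which should return {"status":"ok"}.
function healthIsOk(body: string): boolean {
  try {
    const parsed: unknown = JSON.parse(body);
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { status?: unknown }).status === "ok"
    );
  } catch {
    return false; // the body was not JSON at all
  }
}

// Mirrors the grep: the generated HTML should mention the API host.
function hasBakedApiBase(html: string, apiBase: string): boolean {
  const host = apiBase.replace(/^https?:\/\//, "");
  return html.includes(host);
}

// Returns the names of the checks that failed (empty means healthy).
async function smokeTest(get: Get, apiBase: string, siteUrl: string): Promise<string[]> {
  const failures: string[] = [];
  if (!healthIsOk(await get(`${apiBase}/health`))) failures.push("backend health");
  if (!hasBakedApiBase(await get(siteUrl), apiBase)) failures.push("baked API base");
  return failures;
}

// Offline demo with canned responses; the HTML deliberately carries a stale base.
const canned: Record<string, string> = {
  "https://catfactory-api.kiberion.com/health": '{"status":"ok"}',
  "https://catfactory.kiberion.com": '<script>config={apiBase:"https://staging.example"}</script>',
};
const fakeGet: Get = async (url) => canned[url] ?? "";

smokeTest(fakeGet, "https://catfactory-api.kiberion.com", "https://catfactory.kiberion.com")
  .then((failures) => console.log(failures)); // reports the stale baked API base
```

Against a live deployment, `get` could be any function that fetches a URL and resolves to its body text. A build that forgot NUXT_PUBLIC_API_BASE would then show up as a "baked API base" failure.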

Emergency takedown

backend/scripts/teardown-production.sh deletes the Worker (and its containers/workflows/crons), optionally the Pages project (--include-pages), and always preserves the D1 data. Re-deploying brings production back.
