ganbuild

Every AI builder ships the same look. Coral gradients, centered text, emoji everywhere. You've seen it. Your users have definitely seen it.

The problem is simple: one agent builds the thing and decides it's good. ganbuild splits those jobs. The Generator writes code. A different Evaluator opens the app in a real browser, pulls up an award-winning site next to it, and compares the two. Below 8/10? Code gets thrown out, Generator tries again. The Evaluator doesn't know which attempt it's seeing, so it can't go easy on round three.

You write a few sentences about what you want. ganbuild plans it, finds real sites to pull design direction from, builds it in sprints, and checks each one in a live browser. Hover states, copy tone, the works. You approve the plan once. After that, it runs on its own.

Inspired by:

Anthropic's harness design for long-running apps — separated Generator/Evaluator architecture
Karpathy's autoresearch — experiment tracking, git-based keep/discard, autonomous loops

Quick start

Install ganbuild (see below)
Run /ganbuild "a personal portfolio site" — describe what you're building
Approve the plan when prompted
Watch it build, evaluate, and polish autonomously
Run /ganbuild:review to annotate and refine

Stop there. You'll know if this is for you.

Tip: ganbuild runs autonomously for a while (planning, building, evaluating, retrying). It works best with permissions pre-approved so it doesn't pause for confirmation on every file edit:
claude --dangerously-skip-permissions
# or
claude --enable-auto-mode

Install

npx skills add johnlim5847/ganbuild

ganbuild works on any agent that supports the SKILL.md standard (Claude Code, Codex, Gemini CLI, Cursor, Windsurf, etc.). Skills are discovered automatically.

Works well with gstack

ganbuild uses a browser tool for design research and evaluation. If you have gstack installed, ganbuild will automatically use its /browse skill (~100ms per command, persistent sessions). Otherwise it falls back to chrome MCP tools or Playwright.

gstack also provides complementary skills you can run on your ganbuild output:

/qa — find and fix bugs with real browser testing
/design-review — visual QA with before/after screenshots
/review — pre-landing code review
/ship — create a PR with changelog

Usage

/ganbuild "a retro arcade game collection with leaderboards"

Command	Description
`/ganbuild "brief"`	Full build from a short description
`/ganbuild --interactive "brief"`	Build with human annotations between sprints
`/ganbuild --no-review "brief"`	Build without human review phase
`/ganbuild:resume`	Resume from the last completed sprint
`/ganbuild:review`	Launch human review on an existing build

How it works

Brief --> Planner --> spec.md + sprint-contracts.md
                          |
                     [User confirms]
                          |
              Design Research Agent
              Browses award-winning sites, screenshots them,
              extracts real CSS tokens (colors, fonts, layout)
              Writes design-system/MASTER.md
                          |
            +--- Build Loop (autonomous) ---+
            |                               |
            |  Generator -- implements sprint|
            |  Evaluator -- browses live app |
            |    + visits reference site     |
            |    + side-by-side comparison   |
            |    + scores blind (no attempt #)|
            |  Score >= 8 -- git merge (keep)|
            |  Score < 8 -- git reset (discard)|
            |  Max 3 retries per sprint     |
            |  All logged to experiments.jsonl|
            +-------------------------------+
                          |
            Human Review (agentation toolbar)
                          |
            Polish --> Feel Pass --> Humanizer
                          |
                     build-report.md

Design Pipeline

The design system is not generated from keyword catalogs. Instead:

Design Research Agent browses curated design galleries for award-winning sites matching the product domain
Visits 3 reference sites live, screenshots heroes and inner sections
Extracts real colors, fonts, and layout patterns from computed CSS
Writes design-system/MASTER.md with concrete references and tokens
Reference screenshots saved to design-system/references/

The Planner writes product specs only (features, user flows, data model). It never specifies colors, fonts, or visual direction. Design comes from research, not imagination.

Evaluator Rigor

Side-by-side comparison: Evaluator visits a reference site during each evaluation and compares it to the app
Score anchoring: Reference sites = 8-9 on the scale. The app is scored relative to them.
Blind evaluation: The Evaluator never knows which attempt number it is, preventing score inflation on retries
Design research verification: If MASTER.md or reference screenshots are missing, design scores are auto-capped at 4

Optional: Human Review with Agentation

For the human review phase, install agentation-mcp:

npm install -g agentation-mcp
npx agentation-mcp init

This lets you annotate the live app directly in your browser. The agent watches for annotations and fixes them in real-time.

Experiment Tracking

Every build attempt is logged to experiments.jsonl:

{"sprint":1,"attempt":1,"scores":{"functionality":8,"design":5,"craft":5,"originality":4},"avg":5.5,"status":"discard","description":"basic grid layout"}
{"sprint":1,"attempt":2,"scores":{"functionality":9,"design":8,"craft":8,"originality":8},"avg":8.25,"status":"keep","description":"editorial layout with extracted design tokens"}

The Generator reads this before each attempt to avoid repeating failed approaches.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
resume		resume
review		review
.gitignore		.gitignore
README.md		README.md
SKILL.md		SKILL.md
install.sh		install.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ganbuild

Quick start

Install

Works well with gstack

Usage

How it works

Design Pipeline

Evaluator Rigor

Optional: Human Review with Agentation

Experiment Tracking

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ganbuild

Quick start

Install

Works well with gstack

Usage

How it works

Design Pipeline

Evaluator Rigor

Optional: Human Review with Agentation

Experiment Tracking

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages