Skills with teeth. 51 opinionated rules for AI coding agents, 33 with a verifier that fails when ignored.
−72% verifier-visible slop on Sonnet, −58% on Haiku, 20 adversarial prompts × 3 runs per arm. Per 1000 LOC: −85% Sonnet, −71% Haiku. Same model both arms, only the skills change.
You give Claude / Cursor / Codex a prompt. It writes code. Most "skill packs" stop there - a long markdown file the agent is told to read. Forge ships executable verifiers that run on the output and fail when the agent ignored the rule. Same verifiers run in four places:
| Entry point | When it fires |
|---|---|
| `hooks/` - Claude Code post-edit hook | Every Edit/Write/MultiEdit, blocking with feedback the model sees |
| `vscode-extension/` - VS Code & Cursor | On save, inline Diagnostic warnings with line/column |
| `mcp-server/` - MCP server | Any agent that speaks MCP - call `verify_snippet` / `verify_file` on demand |
| `.github/actions/forge-verify/` - GitHub Action | Every PR; sticky comment, optional merge gate |
51 skills across 14 domains. 33 ship a verifier (shell, AST, or both). The remaining 18 are style registers (brutalist / minimalist / soft / redesign), image-direction (brandkit / imagegen-), and methodology / orchestration (rag / evals / citation / research / agent-) - guidance by nature, not mechanically checkable, marked as such.
The numbers are in BENCHMARKS.md. 35 prompts total (20 adversarial + 15 neutral) × 3 runs per arm, same model both arms:
| Sonnet 4.6 | Baseline | Forge | Δ | Violations / 1000 LOC |
|---|---|---|---|---|
| Adversarial | 115 | 32 | −72.2% | 19.87 → 2.98 (−85%) |
| Neutral | 54 | 9 | −83.3% | 9.57 → 1.05 (−89%) |
| Combined | 169 | 41 | −75.7% | 14.79 → 2.12 (−86%) |

| Haiku 4.5 | Baseline | Forge | Δ | Violations / 1000 LOC |
|---|---|---|---|---|
| Adversarial | 127 | 54 | −57.5% | 28.53 → 8.33 (−71%) |
Per skill on Sonnet: forge-api-design 20→1 (−95%), forge-error-handling 61→20 (−67%), six skills zeroed (kubernetes, migrations, logging, frontend, github-actions, prompt-engineering).
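
The Δ columns and per-skill percentages above are plain relative changes against the baseline arm. A minimal sketch of that arithmetic, using counts from the tables (the function names `relativeChange` and `fmt` are illustrative and not part of the benchmark harness):

```typescript
// Relative change from the baseline arm to the Forge arm, as a signed percentage.
function relativeChange(baseline: number, treated: number): number {
  if (baseline === 0) throw new RangeError("baseline must be non-zero");
  return ((treated - baseline) / baseline) * 100;
}

// Format with a fixed number of decimals for display.
function fmt(pct: number, decimals = 1): string {
  return `${pct.toFixed(decimals)}%`;
}

console.log(fmt(relativeChange(115, 32)));        // Sonnet adversarial, raw counts: -72.2%
console.log(fmt(relativeChange(19.87, 2.98), 0)); // Sonnet adversarial, per 1000 LOC: -85%
console.log(fmt(relativeChange(127, 54)));        // Haiku adversarial, raw counts: -57.5%
```

The per-1000-LOC figure drops further than the raw count whenever the Forge arm also produces more lines of code for the same prompts.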
Reproduce: `cd benchmarks && npm install && BENCH_N_RUNS=3 BENCH_CORPUS=adv npm run all`. No API key needed - it uses your local `claude` CLI.
Drop any SKILL.md into your project. The Claude Code post-edit hook auto-discovers them:

    git clone https://github.com/f4rkh4d/forge-skill
    cd forge-skill
    ./hooks/install.sh            # one-shot install at the user level
    ./hooks/install.sh --project  # or per-project (this repo only)

After install, every file Claude Code edits is checked against the applicable forge verifiers automatically. If a verifier flags a violation, the hook exits with code 2 and Claude sees the violation text on its next turn and fixes it without you in the loop.
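
The exit-code contract described here can be pictured as a small mapping from verifier findings to a hook result. This is only a sketch under stated assumptions: `HookResult`, `toHookResult`, and the `feedback` field are hypothetical names, and the actual hook in `hooks/` may be structured differently.

```typescript
// Hypothetical result shape: what the hook exits with and what text the model gets back.
interface HookResult {
  exitCode: 0 | 2;
  feedback: string;
}

// Map the violation lines collected from all applicable verifiers to a hook result.
// No violations: exit 0 and stay silent. Any violation: exit 2 and surface the text.
function toHookResult(violationLines: string[]): HookResult {
  if (violationLines.length === 0) {
    return { exitCode: 0, feedback: "" };
  }
  return { exitCode: 2, feedback: violationLines.join("\n") };
}

// Example with a made-up violation line (the real wording comes from each verifier).
const result = toHookResult(["VIOLATION src/routes/orders.ts:41 request body used without validation"]);
console.log(result.exitCode);
console.log(result.feedback);
```

Keeping the mapping this small means the blocking behaviour depends only on whether any verifier reported something, not on which verifier it was.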
For other agents, see the MCP server, GitHub Action, or VS Code extension.
51 skills across 14 domains. Each is a folder with a SKILL.md (the rules for the model to read) and optionally verify/check_*.sh (the script that runs on the output and fails when the rules were ignored).
| Domain | Skills | Verified |
|---|---|---|
| Design | 9 | 4 |
| Backend | 9 | 7 |
| Data | 2 | 2 |
| Infra | 5 | 5 |
| Security | 1 | 1 |
| Testing | 1 | 1 |
| Output | 1 | 1 |
| Docs | 1 | 1 |
| MCP | 3 | 2 |
| Multi-agent | 3 | 0 |
| LLM apps | 5 | 4 |
| Dev workflow | 5 | 3 |
| Image direction | 3 | 0 |
| Research | 2 | 1 |
Browse skills/ for the full list. AST-grade verifiers (real TypeScript AST traversal, not regex) cover the top 8: forge-frontend, forge-typescript, forge-api-design, forge-error-handling, forge-validation, forge-react-hooks, forge-tests, forge-naming.

    skills/<domain>/<skill>/
    ├── SKILL.md          # rules + BAD/GOOD examples (the model reads this)
    └── verify/
        └── check_*.sh    # shell script, exits non-zero with VIOLATION lines

A verifier returns exit code 0 on clean output, non-zero with a list of violations otherwise. Eight skills delegate to verify/lib/ts-ast.mjs, which parses the actual TypeScript AST. Card-in-card is detected even when extracted to a variable. c.req.json() is flagged when consumed without .parse() on the same flow. Hooks are flagged when called inside if / loop / ternary / && or after an early return.
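
A caller that wraps a verifier only needs the exit code and the violation lines. The sketch below shows one way to interpret a finished run; it assumes violations are printed one per line with a `VIOLATION` prefix, and the `VerifierRun`, `Verdict`, and `interpret` names are hypothetical rather than taken from the repo.

```typescript
// Captured result of running one check_*.sh script (hypothetical shape).
interface VerifierRun {
  exitCode: number;
  output: string;
}

interface Verdict {
  clean: boolean;
  violations: string[];
}

// Exit code 0 means clean; otherwise collect the lines that start with "VIOLATION".
function interpret(run: VerifierRun): Verdict {
  const violations = run.output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.startsWith("VIOLATION"));
  return { clean: run.exitCode === 0, violations };
}

// Made-up sample output; the real message wording is defined by each verifier.
const verdict = interpret({
  exitCode: 1,
  output: "checking src/App.tsx\nVIOLATION src/App.tsx:12 hook called inside if\n",
});
console.log(verdict.clean);             // false
console.log(verdict.violations.length); // 1
```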
Run npm install at the repo root to enable AST mode; without Node, the verifiers fall back to grep heuristics. The verifiers themselves are covered by a test corpus of 22 fixtures that runs in CI - regressions in verifier logic don't ship silently.
examples/orders-api/ is a small but real Hono + Postgres service built by dogfooding ten skills together. ~1100 lines of TypeScript across 17 files. Read it to see what the kit looks like applied end-to-end on real code.
f4rkh4d.github.io/forge-skill/playground/ loads the actual TypeScript compiler in your browser and runs the six AST checks. Paste TypeScript or TSX, hit Run, see violations with line/column. Nothing leaves your browser. Five "Try this" presets included.
The skill format is intentionally lightweight. To add forge-rust or forge-python-fastapi:
- `mkdir skills/<domain>/forge-<name>` with a `SKILL.md` (frontmatter + Quick reference + Hard rules + BAD/GOOD examples, see existing skills for shape).
- If the rules are grep- or AST-checkable, add `verify/check_*.sh`. The CLI auto-discovers it.
- Add a fixture to `tests/bad/` and `tests/good/` so the verifier is regression-tested.
- Open a PR. CI gates: shellcheck on the verifier, frontmatter present, the corpus passes.
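
To make the "frontmatter present" gate concrete, here is a rough sketch of what such a check could look like. It assumes YAML-style frontmatter delimited by `---` lines at the top of `SKILL.md`; the function name `hasFrontmatter` and the `name` / `description` fields in the sample are illustrative guesses, and the real CI check may be stricter or implemented differently.

```typescript
// Returns true when the file opens with a '---' line and has a closing '---'
// with at least one line of content between the two delimiters.
function hasFrontmatter(markdown: string): boolean {
  const lines = markdown.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return false;
  const close = lines.findIndex((line, i) => i > 1 && line.trim() === "---");
  return close !== -1;
}

// Hypothetical SKILL.md content for a new skill.
const skill = [
  "---",
  "name: forge-rust",
  "description: opinionated rules for Rust services",
  "---",
  "## Quick reference",
].join("\n");

console.log(hasFrontmatter(skill));                  // true
console.log(hasFrontmatter("## Quick reference\n")); // false
```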
MIT. Built by @f4rkh4d.
