Forge Skill

Skills with teeth. 51 opinionated skills for AI coding agents, 33 of which ship a verifier that fails when the rules are ignored.

−72% verifier-visible slop on Sonnet and −58% on Haiku, measured over 20 adversarial prompts × 3 runs per arm. Per 1000 LOC: −85% on Sonnet, −71% on Haiku. Both arms use the same model; only the skills change.

Live playground · Install · Skills · Benchmarks · Worked example


What it does

You give Claude / Cursor / Codex a prompt. It writes code. Most "skill packs" stop there: they are a long markdown file the agent is told to read. Forge ships executable verifiers that run on the output and fail when the agent ignored the rule. The same verifiers run in four places:

| Entry point | When it fires |
| --- | --- |
| hooks/ - Claude Code post-edit hook | Every Edit/Write/MultiEdit, blocking with feedback the model sees |
| vscode-extension/ - VS Code & Cursor | On save, inline Diagnostic warnings with line/column |
| mcp-server/ - MCP server | Any agent that speaks MCP - call verify_snippet / verify_file on demand |
| .github/actions/forge-verify/ - GitHub Action | Every PR; sticky comment, optional merge gate |

51 skills across 14 domains. 33 ship a verifier (shell, AST, or both). The remaining 18 are style registers (brutalist / minimalist / soft / redesign), image-direction (brandkit / imagegen-), and methodology / orchestration (rag / evals / citation / research / agent-) - guidance by nature, not mechanically checkable, marked as such.

Quick proof

The full numbers are in BENCHMARKS.md: 35 prompts total (20 adversarial + 15 neutral) × 3 runs per arm, same model in both arms.

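| Sonnet 4.6 | Baseline | Forge | Δ | Violations / 1000 LOC |
| --- | --- | --- | --- | --- |
| Adversarial | 115 | 32 | −72.2% | 19.87 → 2.98 (−85%) |
| Neutral | 54 | 9 | −83.3% | 9.57 → 1.05 (−89%) |
| Combined | 169 | 41 | −75.7% | 14.79 → 2.12 (−86%) |

| Haiku 4.5 | Baseline | Forge | Δ | Violations / 1000 LOC |
| --- | --- | --- | --- | --- |
| Adversarial | 127 | 54 | −57.5% | 28.53 → 8.33 (−71%) |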
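
The Δ column is the relative change in violation count from the baseline arm to the Forge arm. A small sketch of that arithmetic follows; `pct_change` is a hypothetical helper written for this illustration, not part of the benchmark scripts.

```shell
# Percent change from a baseline violation count to a Forge count,
# rounded to one decimal place. Hypothetical helper, for illustration only.
pct_change() {
  awk -v base="$1" -v forge="$2" \
    'BEGIN { printf "%.1f\n", (forge - base) / base * 100 }'
}

pct_change 115 32    # Sonnet 4.6, adversarial: prints -72.2
pct_change 127 54    # Haiku 4.5, adversarial:  prints -57.5
```

The same formula applied to the per-1000-LOC rates gives the percentages in the last column.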

Per skill on Sonnet: forge-api-design 20→1 (−95%), forge-error-handling 61→20 (−67%), six skills zeroed (kubernetes, migrations, logging, frontend, github-actions, prompt-engineering).

Reproduce: cd benchmarks && npm install && BENCH_N_RUNS=3 BENCH_CORPUS=adv npm run all. No API key is needed; the benchmark uses your local claude CLI.

Install

Drop any SKILL.md into your project; the Claude Code post-edit hook auto-discovers it. To install the hook:

git clone https://github.com/f4rkh4d/forge-skill
cd forge-skill
./hooks/install.sh                 # one-shot install at the user level
./hooks/install.sh --project       # or per-project (this repo only)

After install, every file Claude Code edits is checked against the applicable forge verifiers automatically. If a verifier flags a violation, the hook exits with code 2 and Claude sees the violation text on its next turn, so it can fix the violation without you in the loop.
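
A minimal sketch of that blocking behaviour, assuming each verifier is a plain script that takes a file path as its only argument. `run_verifiers` is a hypothetical helper, not the repository's actual hook, and how the real hook discovers verifiers is not shown here. Exit code 2 with the message on stderr follows the Claude Code hook convention for feedback the model should see.

```shell
# Hypothetical helper: run each verifier script against one edited file.
# Writes the output of every failing verifier to stderr and returns 2
# (the blocking exit code) if any verifier failed, 0 otherwise.
run_verifiers() {
  file="$1"
  shift
  failed=0
  for verifier in "$@"; do
    # The assignment's status is the verifier's exit status.
    # Assumption: verifiers are runnable with sh.
    if ! output=$(sh "$verifier" "$file" 2>&1); then
      printf '%s\n' "$output" >&2
      failed=1
    fi
  done
  if [ "$failed" -eq 1 ]; then
    return 2
  fi
  return 0
}
```

A hook script built on this would call `run_verifiers "$edited_file" <verifier paths>` and exit with its return value.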

For other agents, see the MCP server, GitHub Action, or VS Code extension.

Skills

51 skills across 14 domains. Each is a folder with a SKILL.md (the rules for the model to read) and optionally verify/check_*.sh (the script that runs on the output and fails when the rules were ignored).

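| Domain | Skills | Verified |
| --- | --- | --- |
| Design | 9 | 4 |
| Backend | 9 | 7 |
| Data | 2 | 2 |
| Infra | 5 | 5 |
| Security | 1 | 1 |
| Testing | 1 | 1 |
| Output | 1 | 1 |
| Docs | 1 | 1 |
| MCP | 3 | 2 |
| Multi-agent | 3 | 0 |
| LLM apps | 5 | 4 |
| Dev workflow | 5 | 3 |
| Image direction | 3 | 0 |
| Research | 2 | 1 |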

Browse skills/ for the full list. AST-grade verifiers (real TypeScript AST traversal, not regex) cover the top 8: forge-frontend, forge-typescript, forge-api-design, forge-error-handling, forge-validation, forge-react-hooks, forge-tests, forge-naming.

How the verifiers work

skills/<domain>/<skill>/
├── SKILL.md            # rules + BAD/GOOD examples (the model reads this)
└── verify/
    └── check_*.sh      # shell script, exits non-zero with VIOLATION lines

A verifier returns exit code 0 on clean output, and non-zero with a list of violations otherwise. Eight skills delegate to verify/lib/ts-ast.mjs, which parses the actual TypeScript AST. That lets the checks follow structure rather than text: card-in-card is detected even when extracted to a variable, c.req.json() is flagged when consumed without .parse() on the same flow, and hooks are flagged when called inside if / loop / ternary / && or after an early return.
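
The exit-code contract can be sketched with a grep-level check. This is an illustration only: the rule (no TODO markers), the function name `check_file`, and the exact VIOLATION line format are assumptions, not taken from any shipped verifier.

```shell
#!/bin/sh
# Hypothetical verifier sketch. The rule (no TODO markers) and the exact
# VIOLATION line format are assumptions made for illustration.

check_file() {
  file="$1"
  # grep -n prefixes each matching line with "<line number>:".
  matches=$(grep -n 'TODO' "$file" || true)
  if [ -z "$matches" ]; then
    return 0    # clean output: exit code 0
  fi
  printf '%s\n' "$matches" | while IFS=: read -r lineno _rest; do
    printf 'VIOLATION %s:%s: TODO marker left in code\n' "$file" "$lineno"
  done
  return 1      # non-zero: at least one violation was printed
}

# Run as a script only when a file argument is given.
if [ "$#" -gt 0 ]; then
  check_file "$1"
fi
```

Saved as a script and run against a file, it prints one VIOLATION line per match and exits non-zero, or prints nothing and exits 0.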

Run npm install at the repo root to enable AST mode; without Node, the verifiers fall back to grep heuristics. The verifiers themselves are covered by a test corpus of 22 fixtures that runs in CI, so regressions in verifier logic don't ship silently.

Worked example

examples/orders-api/ is a small but real Hono + Postgres service built by dogfooding ten skills together. ~1100 lines of TypeScript across 17 files. Read it to see what the kit looks like applied end-to-end on real code.

Try it live

f4rkh4d.github.io/forge-skill/playground/ loads the actual TypeScript compiler in your browser and runs the six AST checks. Paste TypeScript or TSX, hit Run, see violations with line/column. Nothing leaves your browser. Five "Try this" presets included.

Contributing

The skill format is intentionally lightweight. To add forge-rust or forge-python-fastapi:

  1. mkdir skills/<domain>/forge-<name> with a SKILL.md (frontmatter + Quick reference + Hard rules + BAD/GOOD examples, see existing skills for shape).
  2. If the rules are grep- or AST-checkable, add verify/check_*.sh. The CLI auto-discovers it.
  3. Add a fixture to tests/bad/ and tests/good/ so the verifier is regression-tested.
  4. Open a PR. CI gates: shellcheck on the verifier, frontmatter present, the corpus passes.
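
Step 3 amounts to a two-sided check: every fixture under bad/ should make the verifier fail, and every fixture under good/ should pass. The sketch below shows that idea under stated assumptions. `run_corpus` and the MISSED / FALSE POSITIVE labels are hypothetical, and the repository's real test runner may be organised differently.

```shell
# Hypothetical corpus runner. Arguments: a verifier script and a directory
# containing bad/ and good/ fixture folders. Prints one line per mismatch
# and returns non-zero if there was any.
run_corpus() {
  verifier="$1"
  corpus="$2"
  mismatches=0
  for fixture in "$corpus"/bad/*; do
    [ -e "$fixture" ] || continue          # skip when the glob matched nothing
    if sh "$verifier" "$fixture" >/dev/null 2>&1; then
      echo "MISSED $fixture"               # bad fixture was not flagged
      mismatches=$((mismatches + 1))
    fi
  done
  for fixture in "$corpus"/good/*; do
    [ -e "$fixture" ] || continue
    if ! sh "$verifier" "$fixture" >/dev/null 2>&1; then
      echo "FALSE POSITIVE $fixture"       # good fixture was flagged
      mismatches=$((mismatches + 1))
    fi
  done
  [ "$mismatches" -eq 0 ]
}
```

Checking both directions matters: a verifier that flags everything passes the bad/ half, and one that flags nothing passes the good/ half.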

License

MIT. Built by @f4rkh4d.
