# GitClaw

A senior engineer embedded in your git workflow. Reviews pull requests for security, quality, and compliance, with full observability, drift detection, and a 5-provider LLM fallback chain. Everything lives in git.
- What It Does
- How It Works
- Architecture
- Prerequisites
- Installation
- Configuration
- Usage
- CI/CD Integration
- Serverless Deployment
- Audit Dashboard
- Example Output
- Severity Levels
- Risk Score
- Audit Log
- Cryptographic Signatures
- Memory System
- Human-in-the-Loop
- Skills Reference
- Environment Variables
- Troubleshooting
- Built With
## What It Does

GitClaw is a provider-agnostic AI agent that runs on every pull request and:
- Narrates the diff in plain English before reviewing — so human reviewers know exactly where to focus
- Reviews for security — hardcoded secrets, SQL injection, broken auth, unsafe dependencies
- Scores code quality across 4 dimensions: complexity, test coverage gap, duplication, maintainability (0–100 each)
- Cites authoritative references — OWASP, CVE, ESLint, NIST — for every HIGH/CRITICAL finding
- Posts a structured GitHub comment with verdict, risk score, and actionable fixes
- Emits OTel trace spans (latency, token count, cost per run) to `.gitagent/traces.jsonl`
- Writes structured daily logs at DEBUG/INFO/WARN/ERROR to `.gitagent/logs/YYYY-MM-DD.ndjson`
- Detects behavioral drift: compares rolling reviews against a baseline and fires alerts if the agent becomes lenient
- Monitors its own health — p50/p95/p99 latency, cost trends, escalation rate — every 5 reviews
- Learns your codebase — memory system tracks recurring patterns and hot paths across PRs
- Signs every audit entry with Ed25519 — tamper-evident compliance artifact
- Escalates to humans when PRs touch auth, payments, DB migrations, or secrets
- Falls back across 5 LLM providers — Anthropic → OpenAI → Groq → NVIDIA NIM → Gemini
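To make the two observability outputs concrete, here is a minimal sketch of writing one daily log line and one trace span. It assumes Node 18+ and only the built-in `node:fs`, `node:path`, and `node:crypto` modules; the helper names (`emitLog`, `recordSpan`) and field names (`ts`, `trace_id`, `start_time_unix_nano`, ...) are hypothetical and not taken from GitClaw's `index.js`. The span only loosely follows OpenTelemetry conventions (16-byte trace ID and 8-byte span ID as hex).

```javascript
const fs = require("node:fs");
const path = require("node:path");
const crypto = require("node:crypto");

// Severity order used for filtering (lowest to highest).
const LEVELS = ["DEBUG", "INFO", "WARN", "ERROR"];

// One file per UTC day, e.g. <root>/logs/2025-09-15.ndjson
function dailyLogPath(root, now = new Date()) {
  return path.join(root, "logs", `${now.toISOString().slice(0, 10)}.ndjson`);
}

// Append one NDJSON line; returns false when the level is below minLevel.
function emitLog(root, level, message, fields = {}, { minLevel = "INFO", now = new Date() } = {}) {
  if (!LEVELS.includes(level)) throw new Error(`Unknown log level: ${level}`);
  if (LEVELS.indexOf(level) < LEVELS.indexOf(minLevel)) return false;
  const file = dailyLogPath(root, now);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const line = { ts: now.toISOString(), level, message, ...fields };
  fs.appendFileSync(file, JSON.stringify(line) + "\n", "utf8");
  return true;
}

// Append one OTel-style span (hypothetical field names) to <root>/traces.jsonl.
function recordSpan(root, name, startMs, endMs, attributes = {}) {
  const span = {
    trace_id: crypto.randomBytes(16).toString("hex"), // 32 hex chars
    span_id: crypto.randomBytes(8).toString("hex"), // 16 hex chars
    name,
    start_time_unix_nano: (BigInt(startMs) * 1_000_000n).toString(),
    end_time_unix_nano: (BigInt(endMs) * 1_000_000n).toString(),
    duration_ms: endMs - startMs,
    attributes, // e.g. token count and cost for the run
  };
  fs.mkdirSync(root, { recursive: true });
  fs.appendFileSync(path.join(root, "traces.jsonl"), JSON.stringify(span) + "\n", "utf8");
  return span;
}
```

Keeping one JSON object per line means both files can be appended to safely and queried with line-oriented tools such as `jq`.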
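## How It Works

When triggered (manually via the CLI, from CI, or by a GitHub webhook), the agent runs seven skills in sequence on every PR. The remaining two, `health-check` and `detect-drift`, run every 5 and every 10 reviews respectively: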
```
PR opened / updated
         │
         ▼
┌──────────────────┐
│ narrate-diff     │ ← Plain-English summary, identifies highest-risk file:line
└────────┬─────────┘
         ▼
┌──────────────────┐
│ review-pr        │ ← Fetches diff, scans CRITICAL→INFO, formats comment
└────────┬─────────┘
         ▼
┌──────────────────┐
│ quality-score    │ ← Complexity, test gap, duplication, maintainability (0–100)
└────────┬─────────┘
         ▼
┌──────────────────┐
│ justify-decision │ ← HIGH/CRITICAL only: OWASP/CVE/ESLint citations
└────────┬─────────┘
         ▼
┌──────────────────┐
│ audit-log        │ ← Signed JSON → .gitagent/audit.jsonl
└────────┬─────────┘
         ▼
┌──────────────────┐
│ emit-log         │ ← Structured NDJSON → .gitagent/logs/YYYY-MM-DD.ndjson
└────────┬─────────┘
         ▼
┌──────────────────┐
│ observe-trace    │ ← OTel span → .gitagent/traces.jsonl
└────────┬─────────┘
         ▼
Post GitHub comment · escalate if CRITICAL · run health/drift checks
```
Each step is logged. Nothing is auto-merged. The agent recommends; humans decide.
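The periodic checks in the last step reduce to simple aggregates. A minimal sketch of the health side, assuming each review record has hypothetical `latency_ms`, `cost_usd`, and `human_escalated` fields and using the nearest-rank percentile method (the method GitClaw itself uses is not documented):

```javascript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted, p) {
  if (sorted.length === 0) return null;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Cadence from the skills list: health every 5th review, drift every 10th.
const shouldRunHealthCheck = (reviewCount) => reviewCount > 0 && reviewCount % 5 === 0;
const shouldRunDriftCheck = (reviewCount) => reviewCount > 0 && reviewCount % 10 === 0;

// Summarize recent reviews into a snapshot (hypothetical shape for metrics/health.json).
function healthSnapshot(reviews) {
  const n = reviews.length;
  const latencies = reviews.map((r) => r.latency_ms).sort((a, b) => a - b);
  const totalCost = reviews.reduce((sum, r) => sum + r.cost_usd, 0);
  const escalations = reviews.filter((r) => r.human_escalated).length;
  return {
    reviews: n,
    latency_ms: {
      p50: percentile(latencies, 50),
      p95: percentile(latencies, 95),
      p99: percentile(latencies, 99),
    },
    avg_cost_usd: n ? totalCost / n : 0,
    escalation_rate: n ? escalations / n : 0,
  };
}
```

With small review windows, p95 and p99 often land on the same (slowest) sample, which is expected for nearest-rank percentiles.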
## Architecture

```
GitClaw/
├── agent.yaml                     # Manifest: model, 9 skills, human-in-the-loop, compliance
├── SOUL.md                        # Agent identity and communication style
├── RULES.md                       # Hard constraints (must/must-never)
├── index.js                       # Orchestrator: provider selection, tracing, memory, health
├── providers.js                   # 5-provider fallback chain (Anthropic→OpenAI→Groq→NIM→Gemini)
├── test.js                        # Provider connectivity + review smoke tests
├── clawless.config.js             # Serverless deployment (webhooks, secrets, volumes)
├── skills/
│   ├── narrate-diff/SKILL.md      # Plain-English PR summary, identifies risk focus area
│   ├── review-pr/SKILL.md         # CRITICAL→INFO security & quality scan
│   ├── quality-score/SKILL.md     # 4-dimension code quality scorer (0–100 each)
│   ├── justify-decision/SKILL.md  # OWASP/CVE/ESLint citations for HIGH+ findings
│   ├── audit-log/SKILL.md         # Append-only compliance trail
│   ├── emit-log/SKILL.md          # Structured daily NDJSON logs
│   ├── observe-trace/SKILL.md     # OTel-compatible trace spans
│   ├── health-check/SKILL.md      # Agent health metrics, every 5 reviews
│   └── detect-drift/SKILL.md      # Behavioral drift detection, every 10 reviews
├── tools/
│   ├── github-pr.yaml             # Tool schema: get_diff, post_comment, get_files
│   └── github-pr.js               # Implementation using fetch(), WebContainer-safe
├── .github/
│   └── workflows/review.yml       # GitHub Actions trigger on PR open/update
├── dashboard/
│   └── index.html                 # Standalone audit dashboard (no server, drag-drop)
├── memory/
│   └── patterns.json              # Codebase memory: hot paths, recurring issues
└── metrics/
    ├── health.json                # Current agent health snapshot
    ├── baseline.json              # Drift detection baseline (written at review #10)
    └── drift.json                 # Latest drift signal check result
```
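`detect-drift` compares recent reviews against `metrics/baseline.json` and alerts if the agent becomes lenient. Which metrics and thresholds it uses is not specified here, so the following is only a sketch under stated assumptions: it computes block rate and average findings per review from audit-log entries, and flags drift when either falls below a hypothetical `minRatio` of 0.5 times its baseline value.

```javascript
// Aggregate metrics over audit entries (shape as in .gitagent/audit.jsonl).
function reviewMetrics(entries) {
  const n = entries.length;
  const blocked = entries.filter((e) => e.verdict === "BLOCKED").length;
  const findings = entries.reduce(
    (sum, e) => sum + Object.values(e.findings ?? {}).reduce((a, b) => a + b, 0),
    0,
  );
  return { block_rate: n ? blocked / n : 0, avg_findings: n ? findings / n : 0 };
}

// Flag leniency drift: a metric dropping below minRatio × its baseline value.
function detectDrift(baseline, recentEntries, minRatio = 0.5) {
  const current = reviewMetrics(recentEntries);
  const signals = [];
  for (const key of ["block_rate", "avg_findings"]) {
    if (baseline[key] > 0 && current[key] < baseline[key] * minRatio) {
      signals.push({ metric: key, baseline: baseline[key], current: current[key] });
    }
  }
  return { drifted: signals.length > 0, current, signals };
}
```

Only downward movement is flagged, matching the stated goal of catching an agent that grows more lenient; a stricter agent would not trigger an alert here.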
GitClaw is provider-agnostic. Set any combination of keys and it automatically picks the first available provider in tier order:

| Tier | Provider | Model | Env Var | Free tier? |
|---|---|---|---|---|
| 1 | Anthropic Claude | `claude-sonnet-4-5` | `ANTHROPIC_API_KEY` | No |
| 2 | OpenAI | `gpt-4.1` | `OPENAI_API_KEY` | No |
| 3 | Groq Llama | `llama-3.3-70b-versatile` | `GROQ_API_KEY` | Yes |
| 4 | NVIDIA NIM | `llama-3.1-70b-instruct` | `NVIDIA_API_KEY` | Yes (limited) |
| 5 | Google Gemini | `gemini-1.5-pro` | `GEMINI_API_KEY` | Yes |
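A minimal sketch of tier-ordered selection with fall-through on errors. It assumes a hypothetical `callProvider(provider, prompt)` function that performs the actual API request, and it illustrates the idea rather than reproducing `providers.js`. In this sketch an empty string counts as unset, consistent with the `ANTHROPIC_API_KEY=""` examples that follow.

```javascript
// Tier order from the table; env var names match the README.
const PROVIDER_TIERS = [
  { name: "anthropic", envVar: "ANTHROPIC_API_KEY" },
  { name: "openai", envVar: "OPENAI_API_KEY" },
  { name: "groq", envVar: "GROQ_API_KEY" },
  { name: "nvidia-nim", envVar: "NVIDIA_API_KEY" },
  { name: "gemini", envVar: "GEMINI_API_KEY" },
];

// Providers whose key is set to a non-empty value, in tier order.
function availableProviders(env = process.env) {
  return PROVIDER_TIERS.filter((p) => Boolean(env[p.envVar]));
}

// Try each available provider in turn; on an error, fall through to the next tier.
// `callProvider(provider, prompt)` is hypothetical and performs the real API call.
async function completeWithFallback(callProvider, prompt, env = process.env) {
  const errors = [];
  for (const provider of availableProviders(env)) {
    try {
      const result = await callProvider(provider, prompt);
      return { provider: provider.name, result };
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`No provider succeeded (none configured or all failed): ${errors.join("; ")}`);
}
```

Returning the provider name alongside the result makes it easy to record which tier actually served a review (for example in the `reviewer` field of the audit entry).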
```bash
# Force a specific tier by unsetting higher-priority keys
ANTHROPIC_API_KEY="" node index.js 42 owner/repo                    # uses OpenAI
ANTHROPIC_API_KEY="" OPENAI_API_KEY="" node index.js 42 owner/repo  # uses Groq
```

## Prerequisites

- Node.js v18 or higher (`node --version`)
- npm v9 or higher
- At least one LLM API key (see the table above; Groq has a free tier)
- A GitHub Personal Access Token with `repo` scope (optional for `--dry-run` on public repos)
## Installation

```bash
# Clone the repo
git clone https://github.com/your-org/gitclaw.git
cd gitclaw

# Install dependencies
npm install

# Set up credentials
cp .env.example .env
```

Open `.env` and fill in:
```bash
GITHUB_TOKEN=
ANTHROPIC_API_KEY=   # or any other provider key from the table above
```

## Configuration

### `agent.yaml`

Controls the model, which skills are active, and when to escalate to a human:
```yaml
model:
  preferred: claude-sonnet-4-5-20250929  # swap to claude-opus for deeper reviews

human_in_the_loop:
  enabled: true
  trigger: "when PR touches auth, secrets, DB migrations, or billing logic"
```

### `RULES.md`

Hard behavioral constraints that the agent reads on every run. Edit it to add project-specific rules (e.g. "always flag usage of our deprecated internal SDK").
### `SOUL.md`

Defines the agent's tone and expertise. You can tune it to match your team's review culture.
## Usage

```bash
# Review PR #42 in owner/repo
node index.js 42 owner/repo
```

Or pass the target through environment variables:

```bash
PR_NUMBER=42 GITHUB_REPO=owner/repo npm start
```

Dry-run mode fetches the real diff and runs the full review, but prints the comment instead of posting it:

```bash
node index.js 123 vercel/next.js --dry-run
node index.js 42 expressjs/express --dry-run
```

Available npm scripts:

```bash
npm start               # node index.js (reads PR_NUMBER + GITHUB_REPO from env)
npm test                # provider connectivity + review smoke test (no repo needed)
npm run test:quick      # ping all configured providers only
npm run test:anthropic  # test Anthropic provider specifically
npm run test:groq       # test Groq specifically
npm run validate        # validates agent.yaml and skill manifests
```

A typical run prints progress like this:

```
▶ Running skill: review-pr
🔧 Tool: github-pr({"action":"get_diff","repo":"owner/repo","pr_number":42})
🔧 Tool: github-pr({"action":"post_comment","repo":"owner/repo","pr_number":42,...})
▶ Running skill: justify-decision
▶ Running skill: audit-log
🔧 Tool: Write({".gitagent/audit.jsonl"})
🚨 Escalating to human — CRITICAL finding: hardcoded secret in src/auth.js:42
✅ Done. Audit written to .gitagent/audit.jsonl
```
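The two ways of passing the target (positional arguments or `PR_NUMBER`/`GITHUB_REPO`) plus the `--dry-run` flag can be resolved with a small helper. This is a hypothetical sketch (`resolveTarget` is not a documented GitClaw function) that assumes positional arguments take precedence over environment variables:

```javascript
// Resolve the review target from CLI args first, then environment variables.
function resolveTarget(argv = process.argv.slice(2), env = process.env) {
  const dryRun = argv.includes("--dry-run");
  const positional = argv.filter((arg) => !arg.startsWith("--"));
  const prNumber = Number.parseInt(positional[0] ?? env.PR_NUMBER ?? "", 10);
  const repo = positional[1] ?? env.GITHUB_REPO;

  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new Error("Missing PR number: pass it as the first argument or set PR_NUMBER");
  }
  // Expect exactly owner/repo, e.g. "vercel/next.js".
  if (!repo || !/^[\w.-]+\/[\w.-]+$/.test(repo)) {
    throw new Error("Missing repository: pass owner/repo as the second argument or set GITHUB_REPO");
  }
  return { prNumber, repo, dryRun };
}
```

Failing fast with a message that names both input routes saves a round trip when the CI job forgets to export one of the variables.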
Query the audit log with `jq`:

```bash
# Pretty-print all entries
jq '.' .gitagent/audit.jsonl

# Show only blocked PRs
jq 'select(.verdict == "BLOCKED")' .gitagent/audit.jsonl

# Count reviews per day
jq -r '.timestamp[:10]' .gitagent/audit.jsonl | sort | uniq -c
```

## CI/CD Integration

### GitHub Actions

Add this workflow to trigger GitClaw automatically on every PR:
```yaml
# .github/workflows/gitclaw-review.yml
name: GitClaw PR Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read        # raise to write only if your setup pushes the audit-log commit
  pull-requests: write  # lets GITHUB_TOKEN post the review comment

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm install
      - name: Run GitClaw review
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          GITHUB_REPO: ${{ github.repository }}
        run: node index.js
```

Add `ANTHROPIC_API_KEY` to your repo's Actions secrets under **Settings → Secrets and variables → Actions**. `GITHUB_TOKEN` is provided automatically by GitHub Actions.
### GitLab CI

For GitLab, add a job like this to `.gitlab-ci.yml`:

```yaml
gitclaw-review:
  image: node:20
  script:
    - npm install
    - node index.js $CI_MERGE_REQUEST_IID $CI_PROJECT_PATH
  variables:
    GITHUB_TOKEN: $GITHUB_TOKEN
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
```

## Audit Dashboard

`dashboard/index.html` is a self-contained, zero-dependency dashboard that runs entirely in the browser: no server, no build step.
To open it:

```bash
open dashboard/index.html      # macOS
xdg-open dashboard/index.html  # Linux
```

Or just double-click the file. To load data, drag and drop your `.gitagent/audit.jsonl` file onto the page, or click to browse.
You'll see:
- Summary stat cards — total reviews, blocked PRs, escalations, critical findings, average risk score
- Full review history table with verdict badges, risk scores, finding counts, escalation flags, and signature previews
- Color-coded risk levels (🔴 ≥60, 🟡 ≥30, 🟢 <30)
All data is processed locally — nothing leaves your machine.
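The stat cards are plain aggregates over the JSONL lines. Here is a sketch of the same computation in plain JavaScript (it runs in a browser or Node), assuming entries follow the audit-log schema; `summarizeAudit` and `riskColor` are hypothetical names, and entries without a numeric `risk_score` are skipped when averaging:

```javascript
// Parse .gitagent/audit.jsonl text and compute the dashboard's summary numbers.
function summarizeAudit(jsonlText) {
  const entries = jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));

  const scores = entries.map((e) => e.risk_score).filter((s) => typeof s === "number");
  return {
    totalReviews: entries.length,
    blocked: entries.filter((e) => e.verdict === "BLOCKED").length,
    escalations: entries.filter((e) => e.human_escalated === true).length,
    criticalFindings: entries.reduce((sum, e) => sum + (e.findings?.CRITICAL ?? 0), 0),
    averageRisk: scores.length ? scores.reduce((a, b) => a + b, 0) / scores.length : null,
  };
}

// Same thresholds as the dashboard legend: red ≥60, yellow ≥30, green otherwise.
function riskColor(score) {
  if (score >= 60) return "red";
  if (score >= 30) return "yellow";
  return "green";
}
```

In the browser, the text would come from the dropped file (for example via `await file.text()`), so nothing has to be uploaded.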
## Serverless Deployment

Deploy to Clawless for zero-infra, webhook-triggered reviews with no server to run:

```bash
# Deploy
npx clawless deploy

# View deployment status
npx clawless status

# Tail live logs
npx clawless logs --follow
```

After deploying, register the Clawless webhook URL in your GitHub repo under **Settings → Webhooks** and select the **Pull requests** event. GitClaw then runs automatically whenever a PR is opened or updated.
`clawless.config.js` handles:
- Webhook payload mapping (PR number, repo, commit SHA → env vars)
- Secret injection from Clawless secrets store
- Persistent volume for the audit log across invocations
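How `clawless.config.js` expresses the payload mapping is not shown here, so the following is only a sketch of the idea as a plain function. It reads fields that GitHub's `pull_request` webhook payload does include (`action`, `number`, `repository.full_name`, `pull_request.head.sha`); the `COMMIT_SHA` variable name and the `mapPullRequestPayload` helper are hypothetical.

```javascript
// Webhook actions that correspond to "PR opened/updated".
const REVIEW_ACTIONS = new Set(["opened", "synchronize", "reopened"]);

// Map a GitHub pull_request webhook payload to the env vars the agent reads.
// Returns null for events that should not trigger a review.
function mapPullRequestPayload(payload) {
  if (!payload || !payload.pull_request || !REVIEW_ACTIONS.has(payload.action)) return null;
  return {
    PR_NUMBER: String(payload.number),
    GITHUB_REPO: payload.repository.full_name,
    COMMIT_SHA: payload.pull_request.head.sha, // hypothetical variable name
  };
}
```

Filtering on `action` keeps events such as `closed` or `labeled` from spending a review run.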
## Example Output

Posted directly as a GitHub PR comment:

```markdown
## ReviewAgent Report 🤖

**PR:** #42 · **Files changed:** 3 · **Verdict:** 🚫 BLOCKED

---

### 🚫 Critical (must fix before merge)

- **[CRITICAL]** `src/auth.js:42` — Hardcoded API key detected. Move to environment variable.
  > Rule: OWASP A02:2021 Cryptographic Failures
  > Fix: `const key = process.env.API_KEY`
  > Ref: https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

### ⚠️ High

- **[HIGH]** `src/db.js:17` — Raw SQL string concatenation. SQL injection risk.
  > Rule: OWASP A03:2021 Injection
  > Fix: Use parameterized queries — `db.query('SELECT * FROM users WHERE id = ?', [id])`
  > Ref: https://owasp.org/Top10/A03_2021-Injection/

### 🔶 Medium

- **[MEDIUM]** `src/api.js:88` — `console.log` with user data left in production path.
  > Fix: Remove or replace with a structured logger that respects log levels.

### 💡 Suggestions

- **[LOW]** `src/utils.js:12` — Unused import `lodash`. Remove to reduce bundle size.
- **[INFO]** `src/api.js:34` — Consider extracting this 40-line function for testability.

---

*Reviewed by ReviewAgent v1.0.0 · Audit entry written to `.gitagent/audit.jsonl`*
```
## Severity Levels

| Level | Badge | Meaning | Blocks merge? |
|---|---|---|---|
| CRITICAL | 🚫 | Security vulnerability, hardcoded secret, data exposure | Yes |
| HIGH | ⚠️ | Injection risk, broken auth, unsafe dependency | Yes |
| MEDIUM | 🔶 | Missing tests, debug code in prod, deprecated API | Recommended fix |
| LOW | 💬 | Style, naming, unused imports | No |
| INFO | 💡 | Minor refactor suggestions | No |
## Risk Score

Every review computes a weighted risk score (0–100):

```
score = min(100, CRITICAL×40 + HIGH×15 + MEDIUM×5 + LOW×1)
```

| Score | Badge | Label |
|---|---|---|
| 60–100 | 🔴 | CRITICAL RISK: human escalation triggered |
| 30–59 | 🟡 | ELEVATED RISK: changes requested |
| 0–29 | 🟢 | LOW RISK: likely approvable |

The score is included in:

- The PR comment header (visible to all reviewers)
- The audit log entry (`risk_score` field)
- The git commit message (`audit: PR #42 — BLOCKED (risk: 80/100)`)
- The dashboard stat cards
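The formula and bands translate directly into code. A minimal sketch with hypothetical `riskScore`/`riskBand` names; INFO findings carry no weight, exactly as in the formula:

```javascript
// Weights from the risk formula; INFO is intentionally absent.
const WEIGHTS = { CRITICAL: 40, HIGH: 15, MEDIUM: 5, LOW: 1 };

// Weighted sum of finding counts, capped at 100.
function riskScore(findings) {
  let raw = 0;
  for (const [level, weight] of Object.entries(WEIGHTS)) {
    raw += (findings[level] ?? 0) * weight;
  }
  return Math.min(100, raw);
}

// Map a score to the labels in the table.
function riskBand(score) {
  if (score >= 60) return "CRITICAL RISK";
  if (score >= 30) return "ELEVATED RISK";
  return "LOW RISK";
}
```

For the sample audit entry below (one finding each at CRITICAL, HIGH, MEDIUM, and LOW) this gives 40 + 15 + 5 + 1 = 61, i.e. CRITICAL RISK. Note that a single CRITICAL finding alone (40) lands in the ELEVATED band; escalation for it comes from the CRITICAL-finding rule rather than from the score.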
## Audit Log

Every review appends a structured entry to `.gitagent/audit.jsonl`:

```json
{
  "timestamp": "2025-09-15T14:32:00Z",
  "agent": "pr-review-agent",
  "version": "1.0.0",
  "event": "pr_reviewed",
  "pr_number": 42,
  "repo": "owner/repo",
  "verdict": "BLOCKED",
  "findings": {
    "CRITICAL": 1,
    "HIGH": 1,
    "MEDIUM": 1,
    "LOW": 1,
    "INFO": 0
  },
  "risk_score": 61,
  "human_escalated": true,
  "skill_invoked": "review-pr",
  "commit_sha": "abc123def456",
  "reviewer": "ReviewAgent/claude-sonnet-4-5"
}
```

Key fields:
| Field | Type | Description |
|---|---|---|
| `timestamp` | ISO 8601 UTC | When the review ran |
| `verdict` | string | `APPROVED`, `CHANGES_REQUESTED`, or `BLOCKED` |
| `findings` | object | Count of findings per severity level |
| `risk_score` | integer (0–100) | Weighted risk score (see Risk Score) |
| `human_escalated` | boolean | Whether a human reviewer was paged |
| `commit_sha` | string | Head commit of the PR at review time |
| `reviewer` | string | Agent + model that produced the review |
The log is append-only and version-controlled: it travels with every clone, its history is diff-able in git, and it can serve as a compliance artifact for SOC 2 / ISO 27001 audits.
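Append-only JSONL means writing exactly one serialized object plus a newline per review and never rewriting earlier lines. A minimal Node sketch using only `node:fs` and `node:path`, with hypothetical helper names (`appendAuditEntry`, `readAuditEntries`); committing the file to git is left out:

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Append one entry as a single JSON line; existing lines are never touched.
function appendAuditEntry(file, entry) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(entry) + "\n", "utf8");
}

// Read all entries back, skipping blank lines; a missing file means no reviews yet.
function readAuditEntries(file) {
  if (!fs.existsSync(file)) return [];
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}
```

Because each review only adds a line at the end, the git diff for a review commit is a single added line, which keeps the history easy to audit.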
## Cryptographic Signatures

Every audit entry is signed with Ed25519 using `gitclaw identity sign` before it is written to disk. The signed entry includes two extra fields:

```json
{
  "...": "...",
  "signature": "ed25519:base64encodedSignatureHere==",
  "public_key": "SHA256:fingerprint"
}
```

To verify an entry:

```bash
npx gitclaw identity verify --entry "$(tail -1 .gitagent/audit.jsonl)"
```

If `gitclaw identity` is unavailable (e.g. in a minimal CI environment), the agent falls back to a deterministic `UNSIGNED:<hash>` placeholder so the schema stays consistent and the `signature` field is always present. Grep for `UNSIGNED:` to detect unverified entries.
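`gitclaw identity sign` is what the agent actually uses; the sketch below only illustrates the same idea with Node's built-in `crypto` module (Ed25519 keys from `generateKeyPairSync`, `crypto.sign`/`crypto.verify` with a `null` algorithm). The canonical payload (the entry minus its `signature` and `public_key` fields, serialized with `JSON.stringify`), the SPKI-based fingerprint, and SHA-256 as the hash behind `UNSIGNED:<hash>` are all assumptions, so signatures made this way should not be expected to verify with `gitclaw identity verify`.

```javascript
const crypto = require("node:crypto");

// Hypothetical canonical payload: the entry without its signature fields.
function signingPayload(entry) {
  const { signature, public_key, ...rest } = entry;
  return Buffer.from(JSON.stringify(rest), "utf8");
}

// Assumed fingerprint format: SHA-256 over the SPKI DER encoding, base64.
function fingerprint(publicKey) {
  const der = publicKey.export({ type: "spki", format: "der" });
  return `SHA256:${crypto.createHash("sha256").update(der).digest("base64")}`;
}

// Sign with an Ed25519 key pair, or fall back to a deterministic placeholder.
function signEntry(entry, keys) {
  const payload = signingPayload(entry);
  if (!keys) {
    const hash = crypto.createHash("sha256").update(payload).digest("hex");
    return { ...entry, signature: `UNSIGNED:${hash}`, public_key: null };
  }
  const sig = crypto.sign(null, payload, keys.privateKey); // Ed25519 takes a null algorithm
  return { ...entry, signature: `ed25519:${sig.toString("base64")}`, public_key: fingerprint(keys.publicKey) };
}

// Placeholders and malformed signatures never verify.
function verifyEntry(entry, publicKey) {
  if (typeof entry.signature !== "string" || !entry.signature.startsWith("ed25519:")) return false;
  const sig = Buffer.from(entry.signature.slice("ed25519:".length), "base64");
  return crypto.verify(null, signingPayload(entry), publicKey, sig);
}
```

Usage: generate keys once with `crypto.generateKeyPairSync("ed25519")`, then `appendFileSync(file, JSON.stringify(signEntry(entry, keys)) + "\n")`. Changing any field of a signed line (for example flipping `verdict`) makes `verifyEntry` return `false`.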
## Memory System

`memory/patterns.json` is a local learning file that grows with each PR review. It stores:

- `hot_paths`: directories with the most frequent findings (top 10), e.g. `src/auth`, `db/migrations`
- `recurring_issues`: issue patterns seen more than once, with counts and last-seen timestamps (top 20)
- `version`: increments on every update so you can track how the memory changes over time
Before each review, the agent reads this file and injects the context into its task prompt, for example:

> Recurring issues to watch: "sql injection in db.js" (seen 4x, HIGH); "hardcoded token" (seen 2x, CRITICAL)
This means the agent gets progressively more focused on your codebase's specific weaknesses over time.
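Here is a sketch of how the three fields and the prompt line above could be derived. It rebuilds memory from a flat list of past findings, each with hypothetical `file`, `pattern`, `severity`, and `timestamp` fields; GitClaw's real update logic and file layout may differ.

```javascript
const path = require("node:path");

// Rebuild memory from past findings: top-10 hot directories, top-20 recurring issues.
function buildMemory(findings, previousVersion = 0) {
  const dirCounts = new Map();
  const issues = new Map();
  for (const f of findings) {
    const dir = path.posix.dirname(f.file);
    dirCounts.set(dir, (dirCounts.get(dir) ?? 0) + 1);
    const stat = issues.get(f.pattern) ?? { pattern: f.pattern, severity: f.severity, count: 0, last_seen: f.timestamp };
    stat.count += 1;
    // Same-format UTC ISO 8601 strings compare correctly as plain strings.
    if (f.timestamp > stat.last_seen) stat.last_seen = f.timestamp;
    issues.set(f.pattern, stat);
  }
  const hot_paths = [...dirCounts]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 10)
    .map(([dir, count]) => ({ path: dir, count }));
  const recurring_issues = [...issues.values()]
    .filter((s) => s.count > 1)
    .sort((a, b) => b.count - a.count)
    .slice(0, 20);
  return { version: previousVersion + 1, hot_paths, recurring_issues };
}

// Render the context line injected into the task prompt.
function memoryPromptLine(memory) {
  const parts = memory.recurring_issues.map((i) => `"${i.pattern}" (seen ${i.count}x, ${i.severity})`);
  return `Recurring issues to watch: ${parts.join("; ")}`;
}
```

Capping both lists keeps the injected context short, so the prompt does not grow without bound as the history accumulates.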
To reset memory:

```bash
rm memory/patterns.json
```

To inspect it:

```bash
jq '.recurring_issues[:5]' memory/patterns.json
```

## Human-in-the-Loop

GitClaw escalates automatically when a PR touches:

- Authentication or session logic
- Payment or billing code
- Database migrations
- Cryptographic primitives
- Environment secrets or `.env` files
When escalated, the agent logs `human_escalated: true` in the audit entry and outputs a 🚨 line to the console (or triggers a Clawless notification if deployed). It never auto-merges; it only recommends.
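GitClaw's actual trigger is the natural-language rule in `agent.yaml`, interpreted by the model. As a rough illustration only, a deterministic pre-check over changed file paths might look like this; the keyword patterns and the `escalationReasons` helper are hypothetical, and the CRITICAL check mirrors the escalation shown in the console output under Usage.

```javascript
// Hypothetical path heuristics for the sensitive areas listed above.
const ESCALATION_RULES = [
  { reason: "authentication/session", pattern: /auth|session|login/i },
  { reason: "payments/billing", pattern: /payment|billing|invoice/i },
  { reason: "database migration", pattern: /(^|\/)migrations?\//i },
  { reason: "cryptography", pattern: /crypto|cipher/i },
  { reason: "secrets/.env", pattern: /(^|\/)\.env(\.|$)|secret/i },
];

// Return the distinct reasons a PR should be escalated (empty array = no escalation).
function escalationReasons(changedFiles, criticalCount = 0) {
  const reasons = new Set();
  if (criticalCount > 0) reasons.add("CRITICAL finding");
  for (const file of changedFiles) {
    for (const rule of ESCALATION_RULES) {
      if (rule.pattern.test(file)) reasons.add(rule.reason);
    }
  }
  return [...reasons];
}
```

Keyword matching over paths errs on the side of escalating (a file named `author.js` would match), which is usually the safer failure mode for a human-in-the-loop gate.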
## Skills Reference

### `review-pr`

The core skill. Fetches the PR diff via the `github-pr` tool, scans across five severity categories, and formats a structured Markdown comment. Runs on every PR, immediately after `narrate-diff`.
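`tools/github-pr.js` is described as a `fetch()`-based implementation. Here is a comparable sketch against the public GitHub REST API: the diff comes from `GET /repos/{owner}/{repo}/pulls/{number}` with the `application/vnd.github.diff` media type, and PR conversation comments are posted through the issues comments endpoint. Function names and error handling are assumptions, not the tool's real code, and it needs Node 18+ for the global `fetch`.

```javascript
const GITHUB_API = "https://api.github.com";

// Common headers; the Accept media type selects JSON or a raw diff.
function githubHeaders(token, accept = "application/vnd.github+json") {
  const headers = {
    Accept: accept,
    "X-GitHub-Api-Version": "2022-11-28",
    "User-Agent": "gitclaw-sketch",
  };
  if (token) headers.Authorization = `Bearer ${token}`;
  return headers;
}

// get_diff: the pulls endpoint returns a unified diff for this media type.
async function getDiff(repo, prNumber, token = process.env.GITHUB_TOKEN) {
  const res = await fetch(`${GITHUB_API}/repos/${repo}/pulls/${prNumber}`, {
    headers: githubHeaders(token, "application/vnd.github.diff"),
  });
  if (!res.ok) throw new Error(`get_diff failed: ${res.status} ${res.statusText}`);
  return res.text();
}

// post_comment: comments on the PR conversation go through the issues API.
async function postComment(repo, prNumber, body, token = process.env.GITHUB_TOKEN) {
  const res = await fetch(`${GITHUB_API}/repos/${repo}/issues/${prNumber}/comments`, {
    method: "POST",
    headers: { ...githubHeaders(token), "Content-Type": "application/json" },
    body: JSON.stringify({ body }),
  });
  if (!res.ok) throw new Error(`post_comment failed: ${res.status} ${res.statusText}`);
  return res.json();
}
```

In a dry run you would print the comment body instead of calling `postComment`; a 403 or 404 from that call is the usual symptom of a token without write access.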
### `justify-decision`

Runs after `review-pr` for any HIGH or CRITICAL finding. Maps the finding to an authoritative source (OWASP Top 10, CVE database, NIST, CWE, ESLint docs) and appends a one-line citation, so reviewers can check each finding against its source.
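The skill asks the model to choose the citation. If you want a deterministic fallback for well-known categories, a lookup table is enough; the category keys and the `attachCitation` helper below are hypothetical, and only the two OWASP entries from the example PR comment are filled in.

```javascript
// Hypothetical category → reference table (entries taken from the example PR comment).
const REFERENCES = {
  hardcoded_secret: {
    rule: "OWASP A02:2021 Cryptographic Failures",
    url: "https://owasp.org/Top10/A02_2021-Cryptographic_Failures/",
  },
  sql_injection: {
    rule: "OWASP A03:2021 Injection",
    url: "https://owasp.org/Top10/A03_2021-Injection/",
  },
};

// Only HIGH and CRITICAL findings get a citation; others pass through unchanged.
function attachCitation(finding) {
  if (finding.severity !== "HIGH" && finding.severity !== "CRITICAL") return finding;
  const ref = REFERENCES[finding.category];
  return ref ? { ...finding, rule: ref.rule, ref: ref.url } : finding;
}
```

Unknown categories are left uncited rather than guessed, which keeps every citation that does appear traceable.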
### `audit-log`

Runs after `justify-decision`. Appends a structured JSON entry to `.gitagent/audit.jsonl`; it never overwrites, only appends. The file is committed to the repo on each run so the trail is version-controlled.
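## Environment Variables

| Variable | Required | Description |
|---|---|---|
| `GITHUB_TOKEN` | Yes (optional for `--dry-run` on public repos) | GitHub PAT with `repo` scope |
| `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GROQ_API_KEY`, `NVIDIA_API_KEY`, `GEMINI_API_KEY` | At least one | LLM provider keys, tried in tier order (see the provider table) |
| `PR_NUMBER` | Yes* | PR number to review (*or first CLI arg) |
| `GITHUB_REPO` | Yes* | `owner/repo` format (*or second CLI arg) |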
## Troubleshooting

**`gitclaw: command not found` / `Cannot find package 'gitclaw'`**

Run `npm install` first. The `gitclaw` and `clawless` packages must be installed.

**`Error: GITHUB_TOKEN is not set`**

Copy `.env.example` to `.env` and fill in your token. Make sure it has the `repo` scope.

**`npm run validate` fails**

Check that `agent.yaml` references skill names that exactly match the `name:` field in each `SKILL.md` frontmatter.
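The check behind this error is a name comparison. Here is a hypothetical sketch (not the actual `npm run validate` script) that pulls `name:` out of each `SKILL.md` YAML frontmatter with a regular expression rather than a full YAML parser, and reports declared skills that no file matches:

```javascript
// Extract `name:` from a SKILL.md YAML frontmatter block (simple regex, not a YAML parser).
function frontmatterName(markdown) {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!match) return null;
  const line = match[1].split(/\r?\n/).find((l) => /^name:\s*/.test(l));
  if (!line) return null;
  return line.replace(/^name:\s*/, "").trim().replace(/^["']|["']$/g, "");
}

// Return skill names declared in agent.yaml that no SKILL.md frontmatter matches.
// `skillFiles` maps a path (e.g. "skills/review-pr/SKILL.md") to its file contents.
function findSkillMismatches(declaredSkills, skillFiles) {
  const actual = new Set(Object.values(skillFiles).map(frontmatterName).filter(Boolean));
  return declaredSkills.filter((name) => !actual.has(name));
}
```

A typical mismatch is a hyphen/underscore difference (`audit-log` vs `audit_log`), which an exact string comparison surfaces immediately.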
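**Agent posts no comment on the PR**

Verify that your `GITHUB_TOKEN` has write access to the target repo. For pull requests opened from forks, GitHub Actions gives the workflow a read-only `GITHUB_TOKEN` and no repository secrets, so by default the agent cannot post on the upstream PR. In GitHub Actions, also check that the workflow grants `pull-requests: write`.

**Audit log not persisting between Clawless runs**

Confirm that the `audit-trail` volume is configured in `clawless.config.js` and that its `mountPath` covers `.gitagent`.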
## Built With

- Claude Sonnet: core reasoning model
- GitClaw: agent runtime and skill orchestration
- Clawless: serverless deployment and webhook triggers
## License

MIT