
bhavsec/autopentest-ai



AutoPentest

An agentic pentesting MCP server that automates web application penetration testing using the full OWASP Web Security Testing Guide and PortSwigger Web Security Academy technique references.

Point it at a target — it crawls your app, maps every endpoint, then spawns role-specialized agents (Scout, Analyzer, Exploiter, Reporter) to test for XSS, SQLi, SSRF, SSTI, IDOR and more. No false positives — every finding is backed by real, reproducible evidence with quality gates enforcing proof at every phase. Includes 31 PortSwigger technique guides, adaptive WAF evasion for 12 vendors, cross-phase vulnerability chaining, and risk-weighted endpoint prioritization. Run it with Claude Code, the API, or go fully offline using Ollama models.

Think of it as: A senior pentester's methodology encoded into an MCP server — 109 OWASP tests, 31 PortSwigger attack technique guides, 68+ MCP tools, 27 security tools, 4 specialized agent roles, 7 structured phases, automated quality assurance, and a zero-context final review.


AutoPentest CLI Output

Why AutoPentest?

Manual penetration testing is thorough but slow. Automated scanners are fast but shallow. AutoPentest bridges the gap:

Capability Manual Pentest Automated Scanner AutoPentest
Full OWASP WSTG coverage Depends on tester Partial 109 tests
Business logic testing Yes No Yes
Multi-step exploitation Yes Limited Yes
Vulnerability chaining Yes No Yes
Evidence-based findings Yes Template output Reproducible curl commands
Consistent quality Varies Yes Phase gates + Final Judge
Speed Days Minutes Hours
Cross-domain auth (SSO/OIDC) Manual setup Usually fails Automated handling

Architecture

┌─────────────────────────────────────────────────────────────┐
│                  LLM Orchestrator (Claude)                  │
│                                                             │
│  Reads CLAUDE.md workflow, manages phases,                  │
│  spawns role-specialized subagents                          │
└──────────┬──────────┬──────────┬──────────┬─────────────────┘
           │          │          │          │
     ┌─────▼────┐ ┌───▼─────┐ ┌──▼───────┐ ┌▼─────────┐
     │  Scout   │ │Analyzer │ │Exploiter │ │ Reporter │
     │  (recon) │ │ (vuln   │ │ (proof)  │ │ (QA /    │
     │          │ │  disc.) │ │          │ │  judge)  │
     └──────────┘ └─────────┘ └──────────┘ └──────────┘
           │          │          │          │
           │     MCP  │          │     MCP  │
           ▼          ▼          ▼          ▼
┌──────────────────────────┐  ┌──────────────────────┐
│  WSTG MCP Server         │  │  Playwright MCP      │
│  (68+ tools)             │  │  (Browser Testing)   │
│                          │  │                      │
│  ◦ 109 WSTG tests        │  │  ◦ DOM XSS proof     │
│  ◦ 31 technique guides   │  │  ◦ Clickjacking      │
│  ◦ Task tree             │  │  ◦ JS-rendered auth  │
│  ◦ Knowledge graph       │  └──────────────────────┘
│  ◦ WAF evasion           │
│  ◦ Tool output parser    │
│  ◦ Results verification  │  docker exec
│  ◦ Context compression   │       │
│  ◦ Endpoint priority     │       ▼
│  ◦ Quality gates         │  ┌──────────────────────┐
│  ◦ Report generation     │  │  autopentest-tools   │
└──────────────────────────┘  │  (Docker Container)  │
                              │                      │
                              │  27 security tools:  │
                              │  nuclei, sqlmap,     │
                              │  dalfox, katana,     │
                              │  ffuf, nmap ...      │
                              │                      │
                              │  Burp proxy          │
                              │  passthrough         │
                              └──────────────────────┘

How it works:

  1. Claude Code reads CLAUDE.md for the complete pentest methodology and orchestrates the 7-phase workflow
  2. Role-specialized subagents (Scout, Analyzer, Exploiter, Reporter) execute focused tasks with dedicated prompt templates, tool guidance, and anti-patterns
  3. WSTG MCP Server (68+ tools) provides OWASP test procedures, 31 PortSwigger technique guides, hierarchical task tree, knowledge graph, WAF evasion, endpoint prioritization, results verification, context compression, quality gates, and report generation
  4. Docker Container runs all 27 security tools — traffic optionally routes through Burp Suite for passive monitoring
  5. Playwright MCP handles browser-based testing (DOM XSS, clickjacking, JS-rendered login pages)

Features

Comprehensive OWASP Coverage

  • 109 WSTG test cases across 12 categories — from information gathering to API testing
  • Each test includes step-by-step CLI procedures, context-specific payloads, detection criteria, and severity rubrics
  • Tests are prioritized (MUST/SHOULD) with conditional triggers so nothing relevant is skipped

31 PortSwigger Attack Technique Guides

  • Sourced from PortSwigger Web Security Academy — detection methods, exploitation techniques, payloads, cheat sheets, and WAF bypass patterns
  • Organized by vulnerability class (SQLi, XSS, SSRF, JWT, OAuth, etc.) for direct use during testing
  • Integrated into every testing phase — agents automatically load the relevant technique guide before testing each vulnerability class
  • Database/platform-specific payload tables (Oracle vs MySQL vs PostgreSQL vs MSSQL for SQLi, Jinja2 vs Twig vs Freemarker for SSTI, etc.)
  • WAF bypass patterns organized by bypass level (basic → intermediate → advanced)

27 Pre-Configured Security Tools

  • All tools pre-installed in a single Docker image — make setup and you're ready
  • Tools organized by phase: discovery, injection testing, authentication, cryptography, API testing
  • Automatic Burp Suite proxy integration for passive traffic monitoring

Structured 7-Phase Workflow

  • Phase 0: Application Discovery & Mapping
  • Phase 1: Information Gathering & Reconnaissance
  • Phase 2: Configuration & Deployment Testing
  • Phase 3: Identity, Authentication, Authorization & Session Management
  • Phase 4: Input Validation Testing (parallel XSS/SQLi/SSRF pipelines)
  • Phase 5: Error Handling, Cryptography, Business Logic, Client-Side & API Testing
  • Phase 6: Coverage Verification & Reporting
  • Phase 7: Final Judge Review & Remediation

Quality Assurance System

  • Automated phase gates — each phase must pass quality checks before proceeding
  • Quality Reviewer subagent at every phase transition identifies gaps and suggests improvements
  • Final Judge — a zero-context agent reviews the entire engagement cold, like an external QA reviewer
  • Exhaustion gates — "not vulnerable" requires proof of sufficient testing effort (minimum techniques and bypass attempts)

Evidence-Based Findings

  • Every finding requires reproducible curl commands and full request/response evidence
  • Three-tier classification: EXPLOITED (proven impact), POTENTIAL (blocked by control), FALSE_POSITIVE (control holds)
  • Anti-hallucination framework — "no exploit = no finding" enforced at every level
  • Evidence checklists per vulnerability class verified before any finding is logged

Role-Specialized Subagents

  • 4 dedicated roles with focused prompt templates, tool guidance, and anti-patterns:
    • Scout — reconnaissance only, maps attack surface without sending payloads (Phase 0-1)
    • Analyzer — identifies potential sinks with canary/witness payloads, builds exploitation queues (Phase 2-5 analysis)
    • Exploiter — consumes Analyzer output, proves exploitation with evidence, logs confirmed findings (Phase 4 exploitation)
    • Reporter — quality review and Final Judge, reviews data without sending requests (QA + post-report)
  • Validation checkpoint between analysis and exploitation prevents wasted effort
  • Each role has explicit allowed/restricted tool lists and input/output contracts

Pipelined Exploitation (Phase 4)

  • 3 independent two-stage pipelines run in parallel: XSS, Injection (SQLi/CMDi), SSRF/SSTI
  • Each pipeline: Analyzer (discover → analyze → queue) → validation checkpoint → Exploiter (exploit → log)
  • Each pipeline loads its PortSwigger technique guide for detection methods, cheat sheets, and WAF bypass patterns
  • WAF intelligence shared across all pipelines
  • Context-aware witness payloads for 13 sink types

Adaptive WAF Evasion

  • Automatic WAF fingerprinting from response headers, body, and status codes — identifies 12 WAF vendors (Cloudflare, AWS WAF, Akamai, Imperva, ModSecurity, F5, FortiWeb, Sucuri, Barracuda, Wordfence, NAXSI, Citrix)
  • Vendor-specific bypass payloads organized by complexity level (basic → intermediate → advanced)
  • WAF intelligence shared across all agents via deliverable system
  • Agents automatically identify WAF on first block response and switch to tailored bypass payloads

Cross-Phase Knowledge Graph

  • Entity-relationship graph tracks endpoints, parameters, technologies, findings, cookies, domains, and user roles
  • Automated vulnerability chaining via BFS path finding with 7 predefined chain patterns:
    • XSS + missing CSP, XSS + weak cookie (no HttpOnly), Open redirect + OAuth callback
    • IDOR + admin role, SSRF + cloud metadata, No lockout + no MFA, CORS + sensitive endpoint
  • Severity upgrades when chaining materially increases impact
  • Populated throughout testing, queried after Phase 4 for chain discovery

Hierarchical Task Tree

  • Persistent tree structure (phases as branches, tests as leaves) prevents LLM depth-first bias and context loss
  • Main agent maintains strategic macro view; subagents update only their assigned leaf nodes
  • Auto-propagation: when all children complete, parent auto-completes
  • Phase-level completion percentages for informed decision-making
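The auto-propagation and completion-percentage rules can be sketched with a small tree class. Class and method names are illustrative, not the server's API:

```python
# Minimal task-tree sketch: a parent is complete exactly when all of
# its children are, and completion percentage is computed over leaves.
class TaskNode:
    def __init__(self, name, children=None):
        self.name, self.done = name, False
        self.children = children or []

    def mark_done(self):
        self.done = True

    def is_complete(self) -> bool:
        if self.children:
            return all(c.is_complete() for c in self.children)
        return self.done

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

    def completion_pct(self) -> float:
        leaves = self.leaves()
        return 100.0 * sum(leaf.done for leaf in leaves) / len(leaves)
```

With this structure a subagent only ever calls mark_done() on its own leaf, while the orchestrator reads completion_pct() per phase branch.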

Endpoint Risk Prioritization

  • Score and sort endpoints by risk for prioritized testing — highest risk tested first
  • Scoring factors: parameter count, technology risk indicators, taint chain confidence, tool convergence, auth requirements, injectable parameter names
  • Integrated into Phase 0 endpoint map generation
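The scoring factors listed above suggest a weighted-sum model. The weights, field names, and parameter-name hints below are assumptions sketched for illustration only:

```python
# Hypothetical risk-scoring sketch; weights and factors are illustrative,
# not the project's actual scoring model.
WEIGHTS = {"param_count": 2.0, "tech_risk": 3.0,
           "tool_convergence": 1.5, "injectable_name": 4.0}
INJECTABLE_HINTS = ("id", "q", "search", "file", "url", "redirect")

def score_endpoint(ep: dict) -> float:
    score = WEIGHTS["param_count"] * len(ep.get("params", []))
    score += WEIGHTS["tech_risk"] * ep.get("tech_risk", 0)
    score += WEIGHTS["tool_convergence"] * ep.get("reported_by", 1)
    score += WEIGHTS["injectable_name"] * sum(
        any(h in p.lower() for h in INJECTABLE_HINTS) for p in ep.get("params", []))
    return score

endpoints = [
    {"path": "/search", "params": ["q"], "tech_risk": 1, "reported_by": 3},
    {"path": "/about", "params": [], "tech_risk": 0, "reported_by": 1},
]
ranked = sorted(endpoints, key=score_endpoint, reverse=True)
```

Sorting descending puts the parameter-rich, multi-tool-confirmed /search endpoint ahead of the static /about page, which is the "highest risk tested first" behavior described above.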

Tool Output Parsing

  • 13 built-in parsers for common CLI tools (nmap, nuclei, sqlmap, ffuf, httpx, whatweb, testssl, nikto, dalfox, katana, gau, wapiti, commix)
  • Condenses raw tool output 3-5x while preserving key findings, endpoints, and errors
  • Configurable verbosity: summary (~15 lines), detailed (~50 lines), full (complete parsed output)
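As a rough sketch of what one of these parsers does, the fragment below condenses nmap output down to its open-port lines. The real server ships 13 purpose-built parsers; this pattern and function name are illustrative only:

```python
import re

# Illustrative condenser sketch: keep only nmap lines reporting open
# ports, discarding banner and progress noise.
def condense_nmap(raw: str, verbosity: str = "summary") -> list[str]:
    open_ports = [line.strip() for line in raw.splitlines()
                  if re.match(r"^\d+/(tcp|udp)\s+open", line.strip())]
    # Summary mode caps output at roughly 15 lines, per the doc above.
    return open_ports[:15] if verbosity == "summary" else open_ports

raw = """Starting Nmap 7.94 ...
PORT     STATE  SERVICE VERSION
22/tcp   open   ssh     OpenSSH 9.6
80/tcp   open   http    nginx 1.25
443/tcp  closed https
"""
```

Here condense_nmap(raw) keeps the two open-port lines and drops everything else, which is where the 3-5x condensation comes from on real scans.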

CLI Tool Results Verification

  • Automatic validation of CLI tool output quality — detects empty output, proxy errors, permission issues, and suspicious results
  • 10 per-tool validators (nmap, nuclei, sqlmap, ffuf, feroxbuster, testssl, dalfox, wapiti, katana, httpx) with corrected command suggestions
  • When a tool produces empty or suspicious output, the validator suggests fixes (e.g., add -Pn for nmap, remove proxy env vars, try different flags)
  • Integrated into the tool execution workflow — agents call verify_tool_result() after every CLI tool run
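A validator in the spirit of verify_tool_result() can be sketched as a series of output-quality checks with suggested fixes. The checks, return shape, and suggestions below are assumptions, not the project's exact rules:

```python
# Hedged sketch of per-tool output validation; issue strings and fix
# suggestions are illustrative.
def verify_tool_result(tool: str, output: str) -> dict:
    if not output.strip():
        fix = ("add -Pn (skip host discovery)" if tool == "nmap"
               else "re-run without proxy env vars")
        return {"ok": False, "issue": "empty output", "suggestion": fix}
    if "connection refused" in output.lower() or "proxyerror" in output.lower():
        return {"ok": False, "issue": "proxy/connectivity error",
                "suggestion": "verify the Burp proxy is reachable or unset HTTP_PROXY"}
    return {"ok": True, "issue": None, "suggestion": None}
```

The value of the real tool is the suggestion field: instead of the agent silently accepting an empty scan, it gets a corrected command to retry.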

Progressive Context Compression

  • Phase summaries (~500-800 words) auto-generated when phase gates pass — capturing findings, coverage, tool results, and attack surface in compressed form
  • Prevents context degradation in long-running engagements by replacing raw historical data with structured summaries
  • get_engagement_summary() combines all phase summaries into a single overview for injecting into new subagent prompts
  • Summaries stored as deliverables — accessible by any downstream agent without requiring full engagement history

Counterfactual Analysis (Second-Pass Discovery)

  • After an Analyzer completes with vulnerabilities found, a second Analyzer is spawned with instructions to "assume those vulns are patched"
  • The counterfactual Analyzer searches for additional vulnerabilities: different endpoints, different parameters, different injection contexts, logic flaws
  • Results are appended to the existing exploitation queue (automatic merge with deduplication by endpoint+parameter and auto-incrementing IDs)
  • Based on PenHeal ablation research showing +71% vulnerability coverage with counterfactual prompting
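The merge-with-deduplication step described above can be sketched directly from its description: dedupe on (endpoint, parameter) and hand new entries auto-incrementing IDs. Field names are assumptions:

```python
# Sketch of the described queue merge: skip entries the first Analyzer
# already queued, assign fresh IDs to genuinely new ones.
def merge_queues(existing: list[dict], new: list[dict]) -> list[dict]:
    seen = {(e["endpoint"], e["parameter"]) for e in existing}
    next_id = max((e["id"] for e in existing), default=0) + 1
    merged = list(existing)
    for entry in new:
        key = (entry["endpoint"], entry["parameter"])
        if key in seen:
            continue
        seen.add(key)
        merged.append({**entry, "id": next_id})
        next_id += 1
    return merged
```

Deduplicating on endpoint plus parameter (rather than endpoint alone) matters because the counterfactual pass is explicitly hunting for different parameters on endpoints already tested.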

Multi-Domain Support

  • Automatic SSO/OAuth/OIDC/SAML detection and handling
  • Per-domain scope registration, crawling, and testing
  • Cookie jar management for cross-domain session persistence
  • 6-level authentication failure escalation (alternative grants → PKCE → headless browser → token extraction → user provision → unauthenticated)

Crash-Safe Engagement Management

  • Append-only findings.md and progress.log survive crashes
  • Git workspace checkpointing with rollback capability
  • Auto-resume on interruption: resume-prompt.md is auto-generated at every checkpoint with full context (target, credentials, current phase, remaining tests, scope). Paste it into a new session to continue exactly where you left off
  • Mid-phase checkpoint granularity — tracks which tests within a phase are completed, not just phase-level state
  • Full audit trail of every MCP tool call with timestamps

Professional Reporting

  • Markdown reports with executive summary, findings by severity, test coverage matrix, and tool coverage
  • Per-category coverage percentages and gap analysis
  • Vulnerability chaining analysis documented
  • Final Judge observations and quality notes included

Agent Role System

AutoPentest uses 4 specialized agent roles instead of generic subagents. Each role has a dedicated prompt template with focused tool guidance, input/output contracts, and anti-patterns.

Role Template Purpose Phases
Scout templates/agent-roles/scout.md Reconnaissance and attack surface mapping Phase 0-1, source code discovery
Analyzer templates/agent-roles/analyzer.md Vulnerability discovery with canary/witness payloads Phase 2-5 analysis
Exploiter templates/agent-roles/exploiter.md Exploitation proof with evidence Phase 4 exploitation
Reporter templates/agent-roles/reporter.md Quality review and Final Judge Phase transitions, post-report

How the Pipeline Works

Phase 4 (highest-impact testing) uses a two-stage pipeline per vulnerability class:

┌──────────────────────────────────────────────────────────────┐
│                    Pipeline 1: XSS                           │
│                                                              │
│  Analyzer (75 turns)          Exploiter (75 turns)           │
│  ┌─────────────────────┐      ┌─────────────────────┐        │
│  │ Discover endpoints  │      │ Load Analyzer queue │        │
│  │ Send canary payloads│─────▶│ Attempt exploitation│        │
│  │ Build exploit queue │ gate │ Prove impact        │        │
│  │ Save deliverable    │      │ Log findings        │        │
│  └─────────────────────┘      └─────────────────────┘        │
│                          ▲                                   │
│               validate_exploitation_queue()                  │
└──────────────────────────────────────────────────────────────┘

Three pipelines (XSS, Injection, SSRF/SSTI) run in parallel. The validation checkpoint between Analyzer and Exploiter ensures only well-formed exploitation queues proceed.

Role Boundaries

Each role has explicit tool restrictions enforced through prompts:

  • Scouts cannot call log_finding() or send attack payloads
  • Analyzers can log configuration findings (missing headers, weak cookies) but not injection-class findings
  • Exploiters cannot create new queues — they consume what the Analyzer produced
  • Reporters cannot send HTTP requests to the target — they review data only

For CTF challenges and small apps (<3 input endpoints), a legacy monolithic pipeline is available as a fallback.


Quick Start

Prerequisites

Installation

# 1. Clone the repository
git clone https://github.com/bhavsec/autopentest-ai.git
cd autopentest-ai

# 2. Install Python dependencies for the MCP server
cd server && uv sync && cd ..

# 3. Build Docker image and start the tools container
make setup

That's it. All 27 security tools are now installed and ready inside the Docker container.

Verify Installation

# Check all tools are installed
make verify-tools

# Expected output:
# [+] nuclei: installed
# [+] httpx: installed
# [+] katana: installed
# ... (27 tools total)

Start Testing

# Launch Claude Code in the project directory
claude

Then tell Claude what to test:

Run a full WSTG assessment against https://target.example.com

Usage

Option A: Interactive Mode

Launch Claude Code and provide the target:

Run a full pentest against https://app.example.com

Credentials: admin / P@ssw0rd123

Claude will ask for any missing information (like credentials) and begin the 7-phase workflow.

Option B: Config-Driven Mode (Recommended)

Create a YAML config file for repeatable, consistent assessments:

# configs/my-target.yaml
target:
  url: https://app.example.com
  scope:
    - app.example.com
    - api.example.com
  exclude:
    - cdn.example.com

authentication:
  login_type: form
  login_url: https://app.example.com/login
  credentials:
    username: testuser@example.com
    password: secret123
  login_flow:
    - "Type $username into the email field"
    - "Type $password into the password field"
    - "Click the 'Sign In' button"
  success_condition:
    type: url_contains
    value: "/dashboard"

rules:
  avoid:
    - description: "Do not test logout"
      type: path
      url_path: "/logout"
  focus:
    - description: "Prioritize API endpoints"
      type: path
      url_path: "/api"

reporting:
  tester_name: "Security Team"

Then in Claude Code:

Load the config from configs/my-target.yaml and run the pentest

Option C: Targeted Testing

Run specific WSTG tests against specific endpoints:

Run WSTG-INPV-05 (SQL Injection) against https://app.example.com/search?q=
Test https://app.example.com for CORS misconfiguration (WSTG-CONF-13)
Run all authentication tests (WSTG-ATHN) against https://app.example.com

Option D: Resume an Interrupted Engagement

Resume engagement pentest-2026-02-11-myapp

Testing Phases

Phase 0: Application Discovery & Mapping

The critical foundation phase. Claude autonomously:

  1. Pre-flight checks — verifies target reachability, detects redirects and cross-domain auth
  2. Launches 10+ background tools in parallel (katana, ffuf, nuclei, whatweb, gau, nmap, feroxbuster, wapiti, httpx)
  3. Recursive crawling — follows links to depth 2-3, parses HTML/JS for endpoints
  4. Directory brute-forcing — common paths + technology-specific wordlists
  5. Tool result ingestion — reads all background tool outputs and merges into unified endpoint map
  6. Builds structured endpoint inventory with parameters, auth requirements, and priority rankings

Output: A complete endpoint map organized by domain, ready for systematic testing.

Phase 1-2: Reconnaissance & Configuration

  • Server fingerprinting, technology detection, metadata review
  • Security header analysis (HSTS, CSP, CORS, X-Frame-Options)
  • TLS configuration testing, admin interface discovery
  • HTTP methods testing, file extension handling

Phase 3: Authentication, Authorization & Session Management

  • Role/privilege lattice built before testing (maps guards, middleware, and bypass tests)
  • IDOR testing with multiple alternate IDs per endpoint
  • CSRF testing on every state-changing endpoint
  • Session fixation, hijacking, and token analysis
  • JWT vulnerability testing (if applicable)
  • OAuth/OIDC weakness testing (if applicable)

Phase 4: Input Validation (Highest Impact)

Three independent two-stage pipelines run in parallel, each using the Analyzer→Exploiter role split:

Pipeline Vulnerability Classes Tools Technique Guides
XSS Pipeline Reflected XSS, Stored XSS, DOM XSS dalfox, Playwright XSS, DOM
Injection Pipeline SQL Injection, Command Injection, NoSQL Injection sqlmap, commix, nosqli SQLI, CMDI, NOSQLI
SSRF/SSTI Pipeline SSRF, SSTI, Path Traversal sstimap, ssrfmap SSRF, SSTI, PTRAV

Each pipeline: Analyzer (discover → analyze → build exploitation queue) → validation checkpoint → Exploiter (attempt exploitation → prove impact → log findings). WAF evasion intelligence is shared across all pipelines.

Phase 5: Error Handling, Crypto, Business Logic, Client-Side & APIs

  • Stack trace and error message disclosure
  • TLS/SSL testing via testssl.sh
  • Business logic bypass (workflow circumvention, request forgery)
  • Client-side testing (clickjacking, open redirects, DOM manipulation)
  • GraphQL and REST API testing
  • Vulnerability chaining analysis across all findings

Phase 6: Reporting

  • Coverage verification (test coverage + tool coverage)
  • Finding deduplication and severity calibration
  • Markdown report generation with executive summary, findings, coverage matrices

Phase 7: Final Judge Review

A zero-context agent reviews the entire engagement cold — no knowledge of testing decisions or difficulties. It examines:

  • Coverage integrity — rubber-stamped tests, missing endpoints
  • N/A cascade detection — categories with excessive "not applicable" markings
  • Finding quality — evidence completeness, severity consistency, chaining opportunities
  • Tool utilization — tools run but output never reviewed, lazy skip reasons
  • Missed attack surface — untested endpoints, untested parameters, untested domains

The verdict (PASS/CONDITIONAL_PASS/FAIL) triggers specific remediation actions before the report is delivered.


Security Tools

Discovery & Reconnaissance (Phase 0)

Tool Purpose Key Flags
katana Web crawler with JS rendering -jc for JavaScript crawling
httpx HTTP probing, tech detection -tech-detect -status-code -title
ffuf Directory/parameter fuzzing -w wordlist -mc all -fc 404
feroxbuster Recursive directory enumeration --smart --auto-tune
nuclei Template-based vuln scanner -t cves/ -t misconfigurations/
nikto Web server misconfiguration -Tuning 1234567890
whatweb Technology fingerprinting --aggression 3
nmap Port and service scanning -sV -sC --top-ports 1000
gau Historical URL discovery --blacklist png,jpg,gif
subfinder Subdomain enumeration -silent -all

Injection Testing (Phase 4)

Tool Purpose Key Flags
sqlmap SQL injection (all techniques) --batch --risk 3 --level 5
dalfox XSS scanning & exploitation --skip-bav --deep-domxss
commix Command injection --batch --all
sstimap Server-Side Template Injection -u <url>
ssrfmap SSRF exploitation -r request.txt
nosqli NoSQL injection -u <url>
crlfuzz CRLF injection / HTTP splitting -u <url>
smuggler HTTP request smuggling -u <url>

Authentication & Session (Phase 3)

Tool Purpose Key Flags
hydra Credential brute-force -L users.txt -P pass.txt
jwt_tool JWT token analysis & exploitation -t <token> -M at

Cryptography & APIs (Phase 5)

Tool Purpose Key Flags
testssl.sh TLS/SSL configuration testing --severity HIGH --sneaky
graphql-cop GraphQL security testing -t <url>
websocat WebSocket testing ws://<url>

Infrastructure (Phase 2)

Tool Purpose
corscanner CORS misconfiguration scanning
dnsreaper Subdomain takeover detection

Browser Automation

Tool Purpose
Playwright DOM XSS proof, clickjacking, JS-rendered login, client-side storage inspection

WSTG Knowledge Base

109 test cases across 12 OWASP categories, each with CLI-specific procedures:

Code Category Tests Examples
INFO Information Gathering 10 Search engine discovery, server fingerprinting, metadata review
CONF Configuration & Deployment 14 Security headers, CORS, CSP, HSTS, admin interfaces
IDNT Identity Management 5 Role definitions, registration, account enumeration
ATHN Authentication 11 Default creds, lockout, auth bypass, MFA, password policy
ATHZ Authorization 5 Directory traversal, auth bypass, privilege escalation, IDOR
SESS Session Management 11 Cookie attributes, CSRF, session fixation/hijacking, JWT
INPV Input Validation 20 XSS, SQLi, CMDi, SSTI, SSRF, path traversal, XXE, LDAP
ERRH Error Handling 2 Error messages, stack traces
CRYP Cryptography 4 TLS config, padding oracle, weak encryption
BUSL Business Logic 10 Workflow bypass, request forgery, file upload, rate limits
CLNT Client-Side 14 DOM XSS, clickjacking, open redirects, WebSockets, storage
APIT API Testing 3 GraphQL, REST, SOAP

Each test file includes:

  • Step-by-step CLI procedures (curl commands, tool invocations)
  • Payloads organized by bypass level (basic, intermediate, advanced)
  • Detection criteria with severity assessment rubrics
  • Remediation guidance with references

PortSwigger Technique Guides

31 attack technique reference guides sourced from PortSwigger Web Security Academy, organized by vulnerability class for direct use during real pentesting engagements.

What's Included

Code Category WSTG Mapping Key Content
SQLI SQL Injection INPV-05 UNION/blind/error/time-based/OOB techniques, database-specific cheat sheets (Oracle, MySQL, PostgreSQL, MSSQL), WAF bypass
XSS Cross-Site Scripting INPV-01, INPV-02, CLNT-01 Reflected/stored/DOM contexts, tag & event handler payloads, CSP bypass, filter evasion
CMDI OS Command Injection INPV-12 Separator characters, blind techniques (time-delay, OOB), OS-specific payloads
SSTI Server-Side Template Injection INPV-18 Jinja2/Twig/Freemarker/Velocity/ERB detection & exploitation, sandbox escapes
SSRF Server-Side Request Forgery INPV-19 URL scheme tricks, IP obfuscation, DNS rebinding, cloud metadata, filter bypass
PTRAV Path Traversal INPV-04 Encoding variations, null byte injection, wrapper bypass
XXE XML External Entities INPV-07 File retrieval, SSRF via XXE, blind XXE with OOB, parameter entities
AUTHN Authentication ATHN-01 to ATHN-07 Brute force, 2FA bypass, password reset poisoning, credential stuffing
AUTHZ Access Control ATHZ-01 to ATHZ-04 IDOR, privilege escalation, horizontal/vertical bypass, referer-based controls
JWT JSON Web Tokens SESS-10 Algorithm confusion (none/HS256→RS256), kid injection, JWK/JKU exploitation
OAUTH OAuth 2.0 ATHZ-05 Authorization code theft, open redirect, scope upgrade, CSRF on OAuth flows
CSRF Cross-Site Request Forgery SESS-05 Token bypass, SameSite bypass, referer validation bypass
SMUGGLE HTTP Request Smuggling INPV-15 CL.TE, TE.CL, TE.TE, HTTP/2 downgrade, request tunneling
DOM DOM-Based Vulnerabilities CLNT-01 Sources/sinks, DOM clobbering, prototype pollution gadgets
CORS Cross-Origin Resource Sharing CONF-13, CLNT-07 Origin reflection, null origin, subdomain trust exploitation
NOSQLI NoSQL Injection INPV-05 MongoDB operator injection, JavaScript injection, blind extraction
GRAPHQL GraphQL APIT-01 Introspection, field suggestion, batching attacks, authorization bypass
RACE Race Conditions BUSL-04 Limit overrun, TOCTOU, single-endpoint races, last-frame sync
UPLOAD File Upload BUSL-08, BUSL-09 Extension bypass, content-type manipulation, web shells, polyglot files
HOST Host Header Injection INPV-17 Password reset poisoning, cache poisoning, routing-based SSRF

Plus 11 more: CLICK, WS, CACHEPOIS, CACHEDEC, DESER, INFO, BUSL, PROTO, API, LLM, SKILLS.

How They're Used

Technique guides are integrated into every testing phase via the get_technique_guide() MCP tool:

Phase 2 → CORS guide for CONF-13 testing
Phase 3 → AUTHN, AUTHZ, CSRF, JWT, OAUTH guides for auth/session testing
Phase 4 → SQLI, XSS, CMDI, SSTI, SSRF, PTRAV, XXE guides for input validation
Phase 5 → DOM, CLICK, GRAPHQL, RACE, UPLOAD guides for client-side & business logic

Each parallel testing agent automatically loads its relevant technique guide before testing, providing:

  • Detection payloads — what to inject to identify the vulnerability
  • Exploitation techniques — organized by attack method with step-by-step procedures
  • Cheat sheets — database/platform-specific syntax tables for quick reference
  • WAF bypass patterns — encoding, obfuscation, and filter evasion strategies

Adding Custom Guides

See docs/adding-knowledge-base-resources.md for instructions on adding new technique guides to the knowledge base.


Quality Assurance System

AutoPentest has a multi-layered QA system that prevents shallow testing:

1. Phase Gates (Automated)

After each phase, phase_gate_check() validates:

  • All MUST-priority tests were executed
  • Minimum coverage thresholds are met
  • Tool coverage is adequate
  • No critical gaps exist

Blocked phases cannot proceed until all issues are resolved.
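A gate in the spirit of phase_gate_check() can be sketched as a pair of checks over the phase's test list. Thresholds, field names, and the return shape below are illustrative assumptions:

```python
# Hedged sketch of a phase gate: all MUST tests executed, and overall
# coverage above a minimum threshold.
def phase_gate_check(phase: dict, min_coverage: float = 0.8) -> dict:
    issues = []
    missed_must = [t["id"] for t in phase["tests"]
                   if t["priority"] == "MUST" and not t["executed"]]
    if missed_must:
        issues.append(f"MUST tests not executed: {', '.join(missed_must)}")
    coverage = sum(t["executed"] for t in phase["tests"]) / len(phase["tests"])
    if coverage < min_coverage:
        issues.append(f"coverage {coverage:.0%} below threshold {min_coverage:.0%}")
    return {"passed": not issues, "issues": issues}
```

An empty issues list is what lets the orchestrator advance; anything else blocks the phase until each listed gap is closed.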

2. Quality Reviewer (Per-Phase)

A subagent spawned at every phase transition that:

  • Checks for 16 known anti-patterns (rubber-stamping, N/A cascades, finding inflation)
  • Identifies untested endpoints and parameters
  • Suggests vulnerability chaining opportunities
  • Recommends alternative approaches for blocked tests

3. Final Judge (Post-Report)

A zero-context agent that reviews the completed engagement with fresh eyes:

  • Analyzes coverage integrity across all domains
  • Detects N/A cascades and their root causes
  • Validates finding quality and evidence completeness
  • Identifies missed attack surface
  • Issues a verdict: PASS, CONDITIONAL_PASS, or FAIL

4. Exhaustion Gates

Marking a vulnerability as "not exploitable" requires proof of effort:

Vuln Class Min Techniques Min Bypass Attempts
XSS 3 5
SQL Injection 3 5
Command Injection 3 5
SSTI 2 3
SSRF 3 5
Path Traversal 3 5
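The table above translates directly into a minimum-effort check. The function name and call shape are illustrative; the thresholds are the ones documented:

```python
# Exhaustion-gate sketch: "not exploitable" is only accepted once the
# documented minimum techniques and bypass attempts have been tried.
EXHAUSTION_MINIMUMS = {
    "xss": (3, 5), "sqli": (3, 5), "cmdi": (3, 5),
    "ssti": (2, 3), "ssrf": (3, 5), "ptrav": (3, 5),
}

def may_mark_not_exploitable(vuln_class: str, techniques: int, bypasses: int) -> bool:
    min_techniques, min_bypasses = EXHAUSTION_MINIMUMS[vuln_class]
    return techniques >= min_techniques and bypasses >= min_bypasses
```

So an agent that tried only two XSS techniques cannot close the test as "not vulnerable"; it is forced back to try more before the verdict is accepted.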

5. Evidence Checklists

Before logging any finding, evidence requirements are verified:

  • Reproducible curl command
  • Full HTTP request and response
  • Proof of actual exploitation (not theoretical impact)
  • Correct classification tier (EXPLOITED vs POTENTIAL)

6. Live Engagement Logging

Every MCP tool call is automatically logged to engagements/<eid>/logs.txt with full arguments, results, and execution duration. Run tail -f logs.txt in a separate terminal to watch all agent activity in real time. 100% coverage via automatic tool wrapper — no manual instrumentation needed.
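The "automatic tool wrapper" idea can be sketched as a logging decorator. The real server applies this transparently to every MCP tool; the decorator below is only an illustration of the mechanism, with assumed field names:

```python
import functools
import json
import time

# Illustrative logging wrapper: record tool name, arguments, and
# execution duration for every wrapped call.
def logged(log: list):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            result = fn(*args, **kwargs)
            log.append(json.dumps({
                "tool": fn.__name__,
                "kwargs": kwargs,
                "duration_s": round(time.monotonic() - start, 3),
            }))
            return result
        return wrapper
    return decorator
```

Because the wrapper sits outside the tool functions, coverage is total by construction: no tool author has to remember to add instrumentation.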

7. Phase Gate Timing

Phase gates enforce minimum 60-second intervals between calls (15s in CTF mode), preventing premature phase completion. Inter-gate work verification warns if fewer than 3 work events occur between consecutive gates.


Benchmarking

AutoPentest includes integration with the XBOW Validation Benchmarks — 104 CTF-style Docker challenges widely used for benchmarking AI pentest agents.

Benchmark Scores (Reference)

Agent Score Source
Shannon 96.2% KeygraphHQ (2024)
PentestGPT 86.5% USENIX Sec 2024

Usage

# Setup (one-time)
cd benchmarks/xbow && make setup

# Solve with AutoPentest (MCP server + CLAUDE.md + CTF mode)
make solve ID=XBEN-001-24

# Solve with raw Claude (baseline — no MCP, no methodology)
make solve ID=XBEN-001-24 RAW=1

# Solve by vulnerability tag
make solve-tag TAG=sqli

# Solve all 104 challenges
make solve-all

# Full baseline run for comparison
make solve-all RAW=1

# Score the latest run
make score

# Compare autopentest vs raw runs side-by-side
make compare

The solver has two modes:

  • autopentest (default): Runs Claude Code from the project root, loading .mcp.json (MCP server with 68+ tools) and CLAUDE.md (pentest methodology). Measures AutoPentest's full capability.
  • raw (RAW=1): Runs bare Claude Code with no MCP server or methodology. Baseline for measuring AutoPentest's value-add over raw LLM capability.

Each challenge is a Docker Compose app with a flag injected at build time. Flag extraction from Claude's output determines pass/fail. Results are scored per-challenge, per-tag, and per-difficulty-level.

CTF Mode

For CTF challenges and small apps, enable CTF mode for relaxed quality gates:

mode: ctf
target:
  url: https://target.com

CTF mode reduces phase gate timing (15s vs 60s), skips QA Reviewer requirements, and halves completion thresholds — while maintaining finding quality and evidence standards.


Example Report

A complete example report from a pentest against PortSwigger's Gin & Juice Shop (a deliberately vulnerable application) is included in the repository:

View Full Report

What the Report Includes

The report demonstrates AutoPentest's output against a real target with 23 findings across all severity levels:

Severity Count Examples
Critical 2 UNION-based SQL injection with full data extraction, access control bypass via X-Original-URL header
High 5 Reflected XSS via JS string escape bypass, IDOR on order details, XXE with local file read, DOM XSS via prototype pollution
Medium 6 Missing security headers, no account lockout, missing CSP, CRLF injection, DOM-based open redirect
Low 5 Infrastructure info disclosure, EOL AngularJS, insecure ALB cookies, weak TLS config
Informational 5 Consolidated duplicates and secondary evidence for primary findings

Report Structure

1. Executive Summary         — Target scope, finding summary, domain architecture
2. Detailed Findings         — Each finding with description, evidence (curl commands), and remediation
3. Vulnerability Chaining    — Cross-finding analysis (e.g., XSS + no CSP = severity upgrade)
4. Test Coverage Matrix      — Per-category WSTG coverage (100% across 12 categories)
5. Tool Coverage Matrix      — 27/27 tools tracked, 8 actively run

Sample Finding (SQL Injection)

From the report — a Critical SQL injection finding with full exploitation evidence:

FINDING-017: SQL Injection in /catalog category parameter — Full Data Extraction

Severity: Critical
WSTG Reference: WSTG-INPV-05

The category parameter is vulnerable to UNION-based SQL injection.
The attacker can:
  1. Inject a single quote to cause a 500 error (confirming injection)
  2. Use UNION SELECT with 8 columns to extract arbitrary data
  3. Enumerate tables: PRODUCTS, TRACKING, USERS
  4. Extract credentials from the USERS table

Evidence (reproducible curl command):
  curl -sk "https://ginandjuice.shop/catalog?category='+UNION+SELECT+1,USERNAME,PASSWORD,
  1,1,USERNAME,1,USERNAME+FROM+USERS+LIMIT+10--"

Every finding includes reproducible curl commands, full request/response evidence, and actionable remediation guidance.


Configuration

Engagement Config (YAML)

Config-driven pentests skip interactive questions and ensure consistency:

target:
  url: https://app.example.com
  scope: [app.example.com, api.example.com]

authentication:
  login_type: sso                    # form | sso | api | manual | none
  login_url: https://app.example.com/login
  credentials:
    username: testuser
    password: secret123
  sso:
    provider: keycloak               # keycloak | auth0 | okta | azure_ad
    auth_domain: auth.example.com
    realm: myrealm
    client_id: my-app

rules:
  avoid:
    - { type: path, url_path: "/logout", description: "Skip logout" }
    - { type: endpoint, method: DELETE, url_path: "/api/admin/*", description: "No destructive admin ops" }
  focus:
    - { type: path, url_path: "/api", description: "Prioritize API" }

reporting:
  tester_name: "Security Team"
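The avoid/focus rules above use glob-style paths (e.g. /api/admin/*). Matching a request against them can be sketched with Python's fnmatch — helper names here are hypothetical, not the server's actual code:

```python
from fnmatch import fnmatch

# Hypothetical rule structures mirroring the YAML example above.
AVOID_RULES = [
    {"type": "path", "url_path": "/logout"},
    {"type": "endpoint", "method": "DELETE", "url_path": "/api/admin/*"},
]

def is_avoided(method: str, path: str, rules=AVOID_RULES) -> bool:
    """True when a request matches any avoid rule and must be skipped."""
    for rule in rules:
        # Endpoint rules constrain both method and path; path rules only path.
        if rule["type"] == "endpoint" and rule.get("method") != method:
            continue
        if fnmatch(path, rule["url_path"]):
            return True
    return False
```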

MCP Server Configuration

The .mcp.json file registers two MCP servers:

{
  "mcpServers": {
    "wstg-pentest": {
      "command": "uv",
      "args": ["--directory", "./server", "run", "server.py"]
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp"]
    }
  }
}

Burp Suite Integration (Optional)

For passive traffic monitoring through Burp Suite Professional:

  1. Start Burp Suite and enable the proxy on all interfaces (0.0.0.0:8080)
  2. The Docker container automatically routes traffic through host.docker.internal:8080
  3. All HTTP requests appear in Burp's proxy history for manual review

Multi-Domain Testing

AutoPentest has first-class support for applications with multiple domains (e.g., a SPA frontend + API backend + SSO provider):

Automatic Detection

During Phase 0, AutoPentest detects cross-domain authentication by following login redirects:

app.example.com → redirects to → auth.example.com/login
                 → after login → app.example.com/callback

All domains are automatically registered in scope with their type (app, auth_provider, api, cdn).
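Conceptually, the detection step reduces to collecting distinct hostnames along the login redirect chain — a simplified offline sketch (the real implementation also classifies each domain by type):

```python
from urllib.parse import urlparse

def domains_in_redirect_chain(chain: list[str]) -> list[str]:
    """Return each distinct hostname seen along a login redirect chain,
    in order of first appearance — every one becomes an in-scope domain."""
    seen: list[str] = []
    for url in chain:
        host = urlparse(url).hostname
        if host and host not in seen:
            seen.append(host)
    return seen
```

Feeding it the example chain above yields both the app domain and the auth provider.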

Per-Domain Testing

Every WSTG test is evaluated per domain — not just the primary:

  • Discovery tools (katana, ffuf, nuclei) run against all domains
  • Input validation tools (sqlmap, dalfox) target endpoints on every domain with server-side processing
  • A test is "not applicable" only when no domain has the tested feature

Cross-Domain Authentication

Supported SSO protocols:

  • OAuth 2.0 / OIDC (Authorization Code, PKCE, Password Grant, Client Credentials)
  • SAML (SP-initiated flow)
  • Keycloak, Auth0, Okta, Azure AD
  • Custom SSO (redirect chain following with cookie jar)

A 6-level authentication escalation procedure ensures testing can proceed even through complex auth flows.
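The PKCE support (automated by scripts/pkce-auth.py) rests on the standard RFC 7636 verifier/challenge pair: the challenge is the unpadded base64url encoding of the SHA-256 of the verifier. A minimal sketch of that derivation:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate an RFC 7636 code_verifier and its S256 code_challenge."""
    # 32 random bytes -> 43-char base64url verifier (padding stripped).
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```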


Crash Recovery

AutoPentest is designed to survive interruptions:

Automatic Checkpointing

  • Phase gates auto-save checkpoints on PASS
  • git_checkpoint() creates git snapshots of the engagement workspace
  • Append-only logs (findings.md, progress.log) survive crashes

Auto-Resume via resume-prompt.md (Recommended)

Every checkpoint and phase gate automatically generates engagements/<eid>/resume-prompt.md — a complete, self-contained prompt with everything a fresh session needs:

  • Target URL, authentication credentials, and scope domains
  • Current phase and which specific tests remain (mid-phase precision)
  • Cookie jar status and re-authentication instructions
  • Avoid/focus rules and endpoint map references

To resume after an interruption:

  1. Open a new Claude Code session
  2. Paste the contents of engagements/<eid>/resume-prompt.md
  3. Claude picks up exactly where it left off — no manual context needed

Resume from Checkpoint (Alternative)

Resume engagement pentest-2026-02-11-myapp

This restores:

  • All findings and test tracking data
  • Coverage statistics and phase gate results
  • Scope registrations and deliverables
  • Mid-phase remaining tests (not just phase-level state)
  • Instructions for what to do next

Manual Checkpoints

Save at any time:

Save a checkpoint before starting Phase 4 exploitation

Rollback on Failure

If a phase produces bad results, roll back to the previous checkpoint:

Roll back the engagement to the last checkpoint

Project Structure

autopentest-ai/
├── CLAUDE.md                          # Master pentest workflow (drives Claude Code)
├── .mcp.json                          # MCP server configuration
├── Dockerfile                         # Multi-stage Docker build (27 tools)
├── docker-compose.yml                 # Docker Compose alternative
├── Makefile                           # setup, start, stop, verify-tools, shell
│
├── server/
│   ├── server.py                      # FastMCP server (68+ MCP tools)
│   ├── task_tree.py                   # Hierarchical task tree (6 MCP tools)
│   ├── tool_parsers.py                # Tool output parsing (2 MCP tools, 13 parsers)
│   ├── endpoint_priority.py           # Endpoint risk prioritization (2 MCP tools)
│   ├── waf_evasion.py                 # Adaptive WAF evasion (3 MCP tools, 12 vendors)
│   ├── knowledge_graph.py             # Cross-phase knowledge graph (5 MCP tools)
│   ├── tool_verification.py           # CLI tool results verification (1 MCP tool, 10 validators)
│   ├── context_compression.py         # Progressive context compression (2 MCP tools)
│   └── pyproject.toml                 # Python dependencies
│
├── knowledge-base/
│   ├── web-security-testing-guide/    # OWASP WSTG knowledge base (109 test procedures)
│   │   ├── 01-information-gathering/  # 10 tests (WSTG-INFO-01 → 10)
│   │   ├── 02-configuration/          # 14 tests (WSTG-CONF-01 → 14)
│   │   ├── 03-identity-management/    # 5 tests  (WSTG-IDNT-01 → 05)
│   │   ├── 04-authentication/         # 11 tests (WSTG-ATHN-01 → 11)
│   │   ├── 05-authorization/          # 5 tests  (WSTG-ATHZ-01 → 05)
│   │   ├── 06-session-management/     # 11 tests (WSTG-SESS-01 → 11)
│   │   ├── 07-input-validation/       # 20 tests (WSTG-INPV-01 → 20)
│   │   ├── 08-error-handling/         # 2 tests  (WSTG-ERRH-01 → 02)
│   │   ├── 09-cryptography/           # 4 tests  (WSTG-CRYP-01 → 04)
│   │   ├── 10-business-logic/         # 10 tests (WSTG-BUSL-01 → 10)
│   │   ├── 11-client-side/            # 14 tests (WSTG-CLNT-01 → 14)
│   │   └── 12-api-testing/            # 3 tests  (WSTG-APIT-01 → 03)
│   └── portswigger-academy/           # 31 PortSwigger attack technique guides
│       ├── sql-injection.md           # UNION, blind, error-based, OOB, WAF bypass
│       ├── cross-site-scripting.md    # Reflected, stored, DOM, CSP bypass, filter evasion
│       ├── ssrf.md                    # URL schemes, cloud metadata, DNS rebinding
│       ├── ssti.md                    # Jinja2, Twig, Freemarker sandbox escapes
│       ├── jwt.md                     # Algorithm confusion, kid injection, JWK exploitation
│       ├── oauth.md                   # Auth code theft, redirect exploitation, scope upgrade
│       └── ... (31 total)             # One per vulnerability class
│
├── templates/                         # Testing guides and procedures
│   ├── input-validation-guide.md      # Phase 4 step-by-step procedures
│   ├── testing-strategies.md          # Test matrices, chaining, parallel strategy
│   ├── cli-tools-guide.md             # Tool setup and Docker management
│   ├── tools.md                       # Per-tool command reference
│   ├── quality-gates.md               # Phase quality checklists and anti-patterns
│   ├── cross-domain-auth-guide.md     # SSO/OIDC/SAML procedures
│   ├── source-code-analysis.md        # Security-focused code review template
│   ├── pipelined-testing.md           # Phase 4 pipelined exploitation strategy
│   ├── agent-roles/                   # Role-specialized subagent templates
│   │   ├── README.md                  # Role index and selection guide
│   │   ├── scout.md                   # Reconnaissance role (Phase 0-1)
│   │   ├── analyzer.md                # Vulnerability discovery role (Phase 2-5)
│   │   ├── exploiter.md               # Exploitation proof role (Phase 4)
│   │   └── reporter.md                # QA review + Final Judge role
│   ├── shared/
│   │   ├── honesty-framework.md       # Anti-hallucination guardrails
│   │   ├── exploit-classification.md  # Three-tier finding classification
│   │   ├── reproducibility.md         # Evidence format requirements
│   │   └── scope-rules.md             # Avoid/focus rule templates
│   └── wordlists/                     # Tech-specific fuzzing wordlists
│
├── benchmarks/
│   └── xbow/                              # XBOW benchmark suite (104 CTF challenges)
│       ├── runner.py                      # Challenge orchestration
│       ├── solver.py                      # Automated solver (Claude Code CLI)
│       ├── Makefile                       # solve, solve-all, score, compare
│       └── results/                       # Run reports
│
├── docs/
│   ├── ROADMAP.md                         # Competitive analysis + improvement roadmap
│   └── adding-knowledge-base-resources.md # Guide for adding new technique guides
│
├── configs/
│   ├── example-config.yaml            # Example engagement configuration
│   └── config-schema.md               # YAML schema documentation
│
├── scripts/
│   ├── install-tools.sh               # Docker build + container start
│   ├── browser-auth.py                # Headless Chromium auth (JS-rendered logins)
│   ├── pkce-auth.py                   # OAuth 2.0 PKCE flow automation
│   └── status.sh                      # Engagement status dashboard
│
└── engagements/                       # Runtime output (git-ignored)
    └── <engagement-id>/
        ├── logs.txt                   # Live engagement log (tail -f to watch)
        ├── findings.md                # Append-only findings log
        ├── progress.log               # Timestamped event log
        ├── resume-prompt.md           # Auto-resume prompt (paste into new session)
        ├── report.md                  # Final pentest report
        ├── cookies.txt                # Cross-domain cookie jar
        └── tool-output/               # Raw CLI tool outputs

Requirements

Requirement Version Notes
Docker 20.10+ Docker Desktop on macOS/Windows
Claude Code Latest npm install -g @anthropic-ai/claude-code
uv 0.1+ curl -LsSf https://astral.sh/uv/install.sh | sh
Node.js 18+ For Playwright MCP server
Python 3.10+ Managed by uv (no manual install needed)
Burp Suite Pro Latest Optional — for passive traffic monitoring

Supported platforms: macOS (Apple Silicon & Intel), Linux (x86_64 & ARM64)


FAQ

Q: Does this replace a human penetration tester?

No. AutoPentest automates the systematic, methodology-driven parts of a pentest. It excels at coverage (ensuring nothing is missed) and consistency (every test follows the same procedure). However, complex business logic, creative exploitation chains, and context-dependent risk assessment still benefit from human expertise. Think of it as a force multiplier.

Q: How long does a full assessment take?

It depends on the application's size and complexity. A typical medium-sized web app (50-100 endpoints) takes a few hours. Multi-domain applications with SSO take longer. The pipelined Phase 4 architecture parallelizes the most time-intensive testing.

Q: Can I run this without Burp Suite?

Yes. Burp Suite is optional and used only for passive traffic monitoring. All HTTP requests go through docker exec curl and all security tools run inside the Docker container. Without Burp, you lose the ability to review traffic in Burp's proxy history, but all testing functionality works.

Q: What are the PortSwigger technique guides?

31 attack reference guides covering detection, exploitation techniques, payloads, cheat sheets, and WAF bypass patterns — sourced from PortSwigger Web Security Academy. During testing, agents automatically load the relevant guide (e.g., the SQLi guide when testing for SQL injection) for comprehensive technique and payload reference. See docs/adding-knowledge-base-resources.md to add your own guides.

Q: How do I add custom wordlists or payloads?

Place wordlists in templates/wordlists/ and they'll be available inside the Docker container via the volume mount. The WSTG test files in knowledge-base/ can also be customized with additional payloads. To add new attack technique guides, follow the instructions in docs/adding-knowledge-base-resources.md.

Q: Can I test applications behind a VPN?

Yes. The Docker container inherits your host's network (on Linux with --network host) or reaches the host via host.docker.internal (on macOS/Windows). If your VPN is running on the host, the container can reach VPN-protected targets.

Q: What happens if a pentest is interrupted (crash, usage limit, timeout)?

AutoPentest automatically generates a resume-prompt.md file at every checkpoint with everything needed to continue. Open a new Claude Code session, paste the contents of engagements/<eid>/resume-prompt.md, and testing resumes exactly where it left off — including mid-phase progress, credentials, scope, and remaining tests.

Q: What about rate limiting?

AutoPentest includes three-tier error classification (Transient/Rate Limit/Permanent) with automatic backoff. If the target rate-limits requests, tools automatically slow down. You can also set avoid rules in the config to skip specific endpoints.
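A stripped-down version of that classification and backoff might look like this — the status-code buckets and delays are illustrative; the real classifier also inspects response bodies and tool-specific error strings:

```python
# Illustrative status-code buckets only.
TRANSIENT = {502, 503, 504}
RATE_LIMIT = {429}

def classify(status: int) -> str:
    """Bucket an HTTP status into the three-tier error model."""
    if status in RATE_LIMIT:
        return "rate_limit"
    if status in TRANSIENT:
        return "transient"
    return "permanent"

def backoff_seconds(kind: str, attempt: int) -> float:
    """Exponential backoff for retryable errors; no retry for permanent ones."""
    if kind == "permanent":
        return 0.0
    base = 10.0 if kind == "rate_limit" else 2.0
    return min(base * (2 ** attempt), 300.0)
```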

Q: What are the agent roles?

AutoPentest uses 4 specialized roles (Scout, Analyzer, Exploiter, Reporter) instead of generic subagents. Each role has a dedicated prompt template with focused tool guidance, restricted tool lists, and anti-patterns. This prevents agents from conflating reconnaissance, analysis, exploitation, and reporting — improving focus and failure isolation. See templates/agent-roles/README.md for the full role index.

Q: How does WAF evasion work?

When a payload gets blocked (403, block page), AutoPentest automatically fingerprints the WAF vendor from response characteristics, then loads vendor-specific bypass payloads organized by complexity level. 12 WAF vendors are supported (Cloudflare, AWS WAF, Akamai, Imperva, ModSecurity, F5, and more). WAF intelligence is shared across all agents via the deliverable system.

Q: What is counterfactual analysis?

After the first analysis pass finds vulnerabilities, AutoPentest can spawn a second Analyzer that assumes all known vulnerabilities are patched. This forces the agent to look for different attack vectors — different endpoints, parameters, injection contexts, and logic flaws. The results are merged into the existing exploitation queue with automatic deduplication. The technique is based on academic research (the PenHeal ablation study), which reported a +71% improvement in vulnerability coverage.

Q: How does results verification work?

When CLI tools (nmap, nuclei, sqlmap, etc.) produce empty or suspicious output, the verify_tool_result() tool detects common issues (proxy errors, permission denied, wrong flags) and suggests corrected commands. This prevents agents from silently counting broken tool runs as "completed" — a common failure mode in automated pentesting.

Q: How does vulnerability chaining work?

The knowledge graph tracks entities (endpoints, parameters, findings, cookies, domains) and relationships discovered during testing. After Phase 4, find_chains() uses BFS to discover multi-hop attack paths and checks 7 predefined chain patterns (e.g., XSS + missing CSP, SSRF + cloud metadata, IDOR + admin role). Chains that increase impact trigger automatic severity upgrades.
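The multi-hop search reduces to a bounded BFS over the finding graph. A toy sketch with hypothetical node names (the real graph carries typed entities and relationships):

```python
from collections import deque

def find_chains(graph: dict[str, list[str]], start: str, goal: str,
                max_hops: int = 4) -> list[list[str]]:
    """Return every acyclic path from start to goal up to max_hops edges long."""
    chains = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            chains.append(path)
            continue
        if len(path) > max_hops:
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # skip cycles
                queue.append(path + [nxt])
    return chains
```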


Disclaimer

This tool is intended for authorized security testing only. Only use AutoPentest against applications you have explicit permission to test. Unauthorized access to computer systems is illegal. The authors are not responsible for any misuse of this tool.

Always ensure you have:

  • Written authorization from the application owner
  • A clearly defined scope of what can and cannot be tested
  • An understanding of the testing environment (production vs staging)
  • Appropriate avoid rules configured for destructive or sensitive endpoints

Built with Model Context Protocol
