command: /pentest — multi-turn vulnerability discovery loop (MVP) by 0oyun · Pull Request #103 · berabuddies/puffer

0oyun · 2026-05-10T14:47:57Z

Add a TUI-managed /pentest command for authorized web targets. The core command layer registers and validates URL/max-iteration input, then hands execution to TUI-owned queued prompts and state.

Key behavior:

- render a single pentest prompt schema with bounded curl/browser guidance and root-cause reporting
- track a fixed baseline taxonomy in code, derive pending categories, and ignore model-provided pending values
- count unique root_cause_id values instead of category coverage, so same-category defects stay distinct
- keep provider/tool failures from advancing successful iterations or no-new streaks; queue recovery prompts for unavailable Browser/provider failures
- stop after computed taxonomy coverage plus two successful no-new passes, or at the safety cap
- add focused tests for parsing, lifecycle, provider recovery, tool failure retry, dedupe, and stopping
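The stopping and dedupe rules above can be sketched roughly as follows. This is an illustration only, not the PR's `PentestState`: the type names, field names, and the choice to count the safety cap in successful iterations are all assumptions.

```rust
use std::collections::BTreeSet;

/// Result of one coordinator iteration (hypothetical shape for this sketch).
pub enum IterationOutcome {
    /// The turn completed and produced a parseable report.
    Success {
        root_cause_ids: Vec<String>,
        covered_categories: Vec<String>,
    },
    /// Provider or tool failure: the run retries without advancing counters.
    Failure,
}

pub struct RunState {
    baseline: BTreeSet<String>,
    covered: BTreeSet<String>,
    root_causes: BTreeSet<String>,
    successful_iterations: u32,
    no_new_streak: u32,
    max_iterations: u32,
}

impl RunState {
    pub fn new(baseline: &[&str], max_iterations: u32) -> Self {
        RunState {
            baseline: baseline.iter().map(|c| c.to_string()).collect(),
            covered: BTreeSet::new(),
            root_causes: BTreeSet::new(),
            successful_iterations: 0,
            no_new_streak: 0,
            max_iterations,
        }
    }

    /// Pending categories are derived in code, never taken from the model.
    pub fn pending(&self) -> Vec<String> {
        self.baseline.difference(&self.covered).cloned().collect()
    }

    /// Number of distinct root causes seen so far.
    pub fn root_cause_count(&self) -> usize {
        self.root_causes.len()
    }

    pub fn record(&mut self, outcome: IterationOutcome) {
        let (ids, categories) = match outcome {
            // Failures leave both the iteration count and the streak untouched.
            IterationOutcome::Failure => return,
            IterationOutcome::Success { root_cause_ids, covered_categories } => {
                (root_cause_ids, covered_categories)
            }
        };
        self.successful_iterations += 1;
        let mut found_new = false;
        for id in ids {
            // `insert` returns true only for ids that were not seen before.
            found_new |= self.root_causes.insert(id);
        }
        for category in categories {
            // Categories outside the fixed baseline are ignored.
            if self.baseline.contains(&category) {
                self.covered.insert(category);
            }
        }
        self.no_new_streak = if found_new { 0 } else { self.no_new_streak + 1 };
    }

    /// Stop on full coverage plus two no-new passes, or at the safety cap.
    pub fn should_stop(&self) -> bool {
        let coverage_done = self.pending().is_empty();
        (coverage_done && self.no_new_streak >= 2)
            || self.successful_iterations >= self.max_iterations
    }
}
```

Because root causes are stored by id rather than by category, two defects in the same category count as two findings, while a repeated id only extends the no-new streak.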


Add a /pentest resume path that reloads persisted run state for the current session and queues the next follow-up prompt. Persist active run state and structured report artifacts as session sidecars so interrupted multi-iteration assessments can continue with prior root causes, coverage, and reports intact. Cover resume and report persistence with pentest-specific tests.

Add /pentest-scoped tool enforcement through prompt allowed_tools metadata while keeping the filter tied to active /pentest turns. Tune the /pentest prompt for real authorized targets: evidence-first probing, inventory-driven gap passes, report-only tool/taxonomy suggestions, and stricter blocked probe handling. Harden /pentest report parsing and state updates so coverage and root-cause counts come from evidence-backed probes/findings rather than model self-report.
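A request-scoped filter of this kind might look like the sketch below. The function name, its signature, and the boolean flag are hypothetical; the PR's actual entry point in `puffer-core` is not reproduced here. The tool names in the usage note are examples only.

```rust
/// Returns the tools a turn may use.
///
/// Assumption for this sketch: the filter applies only while a /pentest turn
/// is active and the prompt carries an `allowed_tools` list; every other turn
/// sees the full tool set unchanged.
pub fn filter_tools<'a>(
    available: &[&'a str],
    allowed_tools: Option<&[&str]>,
    pentest_turn_active: bool,
) -> Vec<&'a str> {
    match (pentest_turn_active, allowed_tools) {
        (true, Some(allowed)) => available
            .iter()
            .copied()
            // Keep a tool only if the prompt metadata names it explicitly.
            .filter(|tool| allowed.iter().any(|a| *a == *tool))
            .collect(),
        // Not a /pentest turn, or no allowlist in the prompt: no filtering.
        _ => available.to_vec(),
    }
}
```

With `available = ["Agent", "Read", "Bash"]` and `allowed_tools = ["Agent", "Read"]`, an active /pentest turn would see only `Agent` and `Read`, while any other turn would still see all three.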

Restructures the /pentest slash-command from a single-session model with direct probe tools into a three-role swarm where the orchestrator turn is dispatch-only and probes happen in nested subagent sessions.

Roles:

- Coordinator (orchestrator turn)
  - allowed_tools = Agent | SendMessage | Read | Grep
  - no Bash / Browser / WebFetch: direct probe tools are intentionally absent; the slash-command request filter blocks them on the orchestrator turn so every probe goes through a subagent.
- Sandbox layer: five specialists, each with its own per-yaml tools[] allowlist, max_turns=3, effort=medium:
  - pentest-recon: surface mapping (headers, robots, sitemap, OpenAPI, source maps, backup/config files, banners)
  - pentest-access: auth / session / authorization (login, JWT, IDOR, mass assignment, business logic)
  - pentest-injection: parsers / interpreters (SQL, NoSQL, cmd, SSTI, SSRF, deser, path traversal, upload)
  - pentest-client: browser-rendered behavior (XSS, CSP, CSRF, CORS, JS secret leak)
  - pentest-infra: deployment misconfig (admin/debug/metrics exposure, verbose errors, dep versions, rate limit)
- Validator: independent reproduction agent, dispatched conditionally (see gating rule below):
  - pentest-validator: re-runs a sandbox candidate from a clean context, emits one of confirmed | inconclusive | refuted plus the verbatim reproduction it ran. Does not invent new findings.

Hard dispatch budget (non-negotiable): At most 2 Agent tool calls per coordinator iteration. Iteration 1 is pentest-recon plus one other specialist in parallel (2 dispatches); steady state is 1 sandbox plus 0-1 validator (1-2 dispatches); the final REPORT iteration is 0 dispatches. The cap prevents the unbounded fan-out that stalled an earlier draft of this architecture on iter 1.
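As a rough illustration of the dispatch budget, a check along these lines could reject over-budget iterations. The enum, constant, and function names are invented for this sketch, and whether enforcement lives in the prompt, the runtime, or both is not something this commit message states.

```rust
/// Cap taken from the commit message: at most 2 Agent calls per iteration.
pub const MAX_AGENT_CALLS_PER_ITERATION: usize = 2;

/// Hypothetical classification of a coordinator iteration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IterationKind {
    First,
    SteadyState,
    FinalReport,
}

/// Upper bound on Agent calls for a given kind of iteration.
pub fn dispatch_limit(kind: IterationKind) -> usize {
    match kind {
        // The final REPORT iteration dispatches nothing.
        IterationKind::FinalReport => 0,
        IterationKind::First | IterationKind::SteadyState => MAX_AGENT_CALLS_PER_ITERATION,
    }
}

/// Rejects a batch of requested subagent dispatches that exceeds the limit.
pub fn check_dispatches(kind: IterationKind, requested: &[&str]) -> Result<(), String> {
    let limit = dispatch_limit(kind);
    if requested.len() > limit {
        return Err(format!(
            "{} Agent calls requested, limit is {}",
            requested.len(),
            limit
        ));
    }
    Ok(())
}
```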
Conditional validator gating (cost control):

- Lane A, inline promote (no validator dispatch): confidence=high AND evidence shows a clear server-side state change or sink-confirmed impact AND the sandbox quotes a verbatim request plus response excerpt. Set validated_by=coordinator-inline.
- Lane B, dispatch validator (consumes one slot): confidence is medium or lower, OR evidence is purely client-side without server-side state, OR the sandbox omitted the reproduction request, OR the impact claim depends on chained state the sandbox could not re-prove. Set validated_by=pentest-validator after the validator returns confirmed.
- Validator verdicts:
  - confirmed → promote to findings[]
  - inconclusive → move to blocked_probes with a concrete next_attempt
  - refuted → drop; one line in coordinator_notes only, never enters findings[] or blocked_probes

Per-dispatch scoping rules require a concrete narrow scope per Agent call (specific endpoint / object family / parser / role pair / browser sink), pass safety constraints (same-origin, reversible, no DoS, no exfiltration), and rely on the baked-in max_turns=3 in each specialist yaml rather than passing max_turns through the dispatch input.

Aggregation discipline: subagent replies are quoted verbatim into findings[].evidence and probes[].evidence (no paraphrase, no invented fields), each finding and probe is tagged with the via_subagent that produced it, and subagent_dispatches[] carries one entry per Agent call this iteration (length ≤ 2).
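The gating rule reads naturally as a small decision function. The sketch below is one interpretation, not code from the PR: the struct fields are invented, and it assumes that any candidate failing a Lane A condition (including a missing response excerpt or an unproven chain) falls through to Lane B.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confidence {
    Low,
    Medium,
    High,
}

/// What the coordinator knows about a sandbox candidate (hypothetical fields).
pub struct Candidate {
    pub confidence: Confidence,
    /// Clear server-side state change or sink-confirmed impact.
    pub server_side_impact: bool,
    pub has_verbatim_request: bool,
    pub has_response_excerpt: bool,
    /// Impact depends on chained state the sandbox could not re-prove.
    pub depends_on_unproven_chain: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Lane {
    /// Lane A: validated_by = coordinator-inline, no validator slot used.
    InlinePromote,
    /// Lane B: consumes one dispatch slot for pentest-validator.
    DispatchValidator,
}

pub fn choose_lane(c: &Candidate) -> Lane {
    let lane_a = c.confidence == Confidence::High
        && c.server_side_impact
        && c.has_verbatim_request
        && c.has_response_excerpt
        && !c.depends_on_unproven_chain;
    if lane_a {
        Lane::InlinePromote
    } else {
        Lane::DispatchValidator
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Confirmed,
    Inconclusive,
    Refuted,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    PromoteToFindings,
    MoveToBlockedProbes,
    /// One line in coordinator_notes; never enters findings or blocked_probes.
    NoteOnly,
}

pub fn apply_verdict(v: Verdict) -> Disposition {
    match v {
        Verdict::Confirmed => Disposition::PromoteToFindings,
        Verdict::Inconclusive => Disposition::MoveToBlockedProbes,
        Verdict::Refuted => Disposition::NoteOnly,
    }
}
```

Defaulting to Lane B whenever Lane A is not fully satisfied errs on the side of spending a validator slot, which matches the intent that inline promotion be reserved for well-evidenced candidates.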
Output schema additions over the previous single-session /pentest:

- findings[].via_subagent: which specialist produced the candidate
- findings[].validator_evidence: verbatim validator stdout (empty when inline-promoted)
- findings[].validated_by: pentest-validator | coordinator-inline
- probes[].via_subagent
- subagent_dispatches[]: length ≤ 2, one entry per Agent call
- coordinator_notes: one line per refuted candidate / frontier hypothesis

The 17-category coverage taxonomy (recon, auth, session_token, authz, injection_sql, injection_nosql, injection_other, xss, csrf_cors, ssrf, deserialization, file_handling, crypto_secrets, business_logic, misconfiguration, rate_limit_dos, supply_chain) is unchanged. The runtime computes pending taxonomy and the no-new streak from evidence-backed probes/findings; the coordinator's status field is advisory.
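To make "computed from evidence-backed probes" concrete, here is a minimal sketch of deriving the pending taxonomy. The category list is the one quoted above; the `Probe` struct and the rule that a probe counts only when its evidence is non-empty are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// The 17 coverage categories listed in the commit message.
pub const TAXONOMY: [&str; 17] = [
    "recon",
    "auth",
    "session_token",
    "authz",
    "injection_sql",
    "injection_nosql",
    "injection_other",
    "xss",
    "csrf_cors",
    "ssrf",
    "deserialization",
    "file_handling",
    "crypto_secrets",
    "business_logic",
    "misconfiguration",
    "rate_limit_dos",
    "supply_chain",
];

/// Hypothetical, trimmed-down view of one probes[] entry.
pub struct Probe {
    pub category: String,
    /// Verbatim subagent evidence; empty when the model only claimed coverage.
    pub evidence: String,
}

/// Categories still pending, in taxonomy order.
/// A probe counts toward coverage only if it carries non-empty evidence.
pub fn pending_categories(probes: &[Probe]) -> Vec<&'static str> {
    let covered: BTreeSet<&str> = probes
        .iter()
        .filter(|p| !p.evidence.trim().is_empty())
        .map(|p| p.category.as_str())
        .collect();
    TAXONOMY
        .iter()
        .copied()
        .filter(|category| !covered.contains(*category))
        .collect()
}
```

A probe that names a category but quotes no evidence leaves that category pending, so a model-reported "covered" status alone cannot end the run.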

The AGENTS.md rule "Every public Rust function must have a docstring" was violated by `command_helpers/pentest.rs`: `prepare_pentest_command` shipped without a docstring, and four sibling public items (constants, enum, struct) had no docs either, leaving the entire public surface of the slash command undocumented.

Adds docstrings to:

- DEFAULT_PENTEST_MAX_ITERATIONS: the iteration safety cap
- PENTEST_USAGE: the help string surfaced on bad input
- PentestCommand enum and its Start / Resume / Stop variants
- PentestStart struct and its url / max_iterations / initial_prompt fields
- prepare_pentest_command: accepted argument shapes, normalization, defaults, and error contract

No behavior change.

Copilot

Pull request overview

Adds an MVP, TUI-managed multi-turn /pentest workflow that drives an “authorized web pentest” coordinator prompt + specialized subagents, with persistence, iteration/stop logic, and tool scoping.

Changes:

- Introduces the pentest prompt schema plus recon/access/injection/client/infra/validator agent definitions.
- Implements a PentestState state machine in puffer-tui (start/resume/stop, provider-failure recovery, dedupe + stop conditions, JSON artifact persistence) and wires it into the TUI submit/advance loop.
- Adds request-scoped tool filtering in puffer-core so pentest turns are constrained by the prompt’s allowed_tools, plus focused tests for parsing and lifecycle behavior.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
resources/prompts/pentest.yaml	Defines the coordinator prompt template and report JSON schema for the pentest loop.
resources/agents/pentest-validator.yaml	Adds an independent reproduction validator subagent definition.
resources/agents/pentest-recon.yaml	Adds a recon/surface-mapping subagent definition.
resources/agents/pentest-injection.yaml	Adds an injection-focused subagent definition.
resources/agents/pentest-infra.yaml	Adds an infra/misconfiguration subagent definition.
resources/agents/pentest-client.yaml	Adds a client-side/browser-focused subagent definition.
resources/agents/pentest-access.yaml	Adds an auth/session/authz-focused subagent definition.
crates/puffer-tui/src/state.rs	Extends `TuiState` with `active_pentest` storage.
crates/puffer-tui/src/pentest.rs	Implements the pentest lifecycle state machine, transcript parsing, persistence, and stop logic.
crates/puffer-tui/src/pentest_tests.rs	Adds unit tests for pentest parsing/lifecycle/recovery/dedupe/stopping behavior.
crates/puffer-tui/src/pentest_command.rs	Bridges `/pentest` slash command handling into the TUI submit path and tool-scope selection.
crates/puffer-tui/src/lib.rs	Wires pentest advancement into the main TUI loop and Ctrl+C stop behavior.
crates/puffer-tui/src/flow.rs	Intercepts `/pentest`, manages interaction with `/loop`, and switches provider execution to prompt-scoped tool filtering.
crates/puffer-core/runtime.rs	Adds `execute_user_prompt_streaming_with_prompt_tools_and_cancel` to enforce prompt `allowed_tools` as a request-scoped tool filter.
crates/puffer-core/lib.rs	Re-exports pentest helpers and the new prompt-scoped streaming execution function.
crates/puffer-core/command/tests/basics.rs	Adds a registration test asserting `/pentest` is a local command with an argument hint.
crates/puffer-core/command.rs	Registers `/pentest` as a local command (TUI-only) and adds a TUI-required message for non-TUI runs.
crates/puffer-core/command_helpers/pentest.rs	Adds `/pentest` argument parsing, URL normalization, and initial prompt rendering.
crates/puffer-core/command_helpers/mod.rs	Registers the new pentest command helper module and exports its API.


1. The /pentest argument hint exposed via supported_commands() was "<url> [max-iterations] | stop" but the parser also accepts `resume` (and `cancel` as a stop alias). /help and command pickers showed an incomplete usage. Updates the hint to "<url> [max-iterations] | resume | stop" and the matching unit test in command/tests/basics.rs. `cancel` is intentionally omitted from the hint because it is an alias of `stop` rather than its own subcommand.
2. The docstring on prepare_pentest_command claimed "nothing is appended to the session transcript on parse failure", which mixed the helper's contract with caller behavior. Reworded to scope the guarantee to the helper itself: side-effect free on parse failure, does not append anything to the session transcript.

No behavior change in the parser itself; only the user-visible hint and the docstring change.
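The accepted argument shapes can be illustrated with a small pure parser. This is a sketch under assumptions, not `prepare_pentest_command` itself: the names are invented, the default of 10 is a placeholder (the real `DEFAULT_PENTEST_MAX_ITERATIONS` value is not stated here), and the scheme check stands in for whatever URL normalization the PR performs.

```rust
/// Hint string quoted in the commit message.
pub const USAGE_HINT: &str = "<url> [max-iterations] | resume | stop";

/// Placeholder default; the PR's actual value is not given in this thread.
const ASSUMED_DEFAULT_MAX_ITERATIONS: u32 = 10;

#[derive(Debug, PartialEq, Eq)]
pub enum ParsedPentest {
    Start { url: String, max_iterations: u32 },
    Resume,
    Stop,
}

/// Parses the text after `/pentest`. Pure function: on failure it only
/// returns an error string and has no other effect.
pub fn parse_pentest_args(args: &str) -> Result<ParsedPentest, String> {
    let usage = || format!("usage: /pentest {}", USAGE_HINT);
    let mut parts = args.split_whitespace();
    let first = parts.next().ok_or_else(usage)?;
    match first {
        "resume" => Ok(ParsedPentest::Resume),
        // `cancel` is accepted as an alias of `stop` but not advertised.
        "stop" | "cancel" => Ok(ParsedPentest::Stop),
        url => {
            // Assumed validation: require an explicit http(s) scheme.
            if !(url.starts_with("http://") || url.starts_with("https://")) {
                return Err(usage());
            }
            let max_iterations = match parts.next() {
                Some(raw) => raw.parse::<u32>().map_err(|_| usage())?,
                None => ASSUMED_DEFAULT_MAX_ITERATIONS,
            };
            if max_iterations == 0 {
                return Err(usage());
            }
            Ok(ParsedPentest::Start {
                url: url.to_string(),
                max_iterations,
            })
        }
    }
}
```

Keeping the parser free of side effects is what lets the docstring guarantee be stated about the helper alone, leaving transcript handling to the caller.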

…ions

The merge of master into feature/pentest-command brought in the lightweight_context field on TurnRequestOptions (added by the /recap side-turn work on master). The /pentest streaming entrypoint introduced on this branch constructs its own TurnRequestOptions in runtime.rs and was missing the new field after the merge. /pentest is the main user-facing prompt loop, not a lightweight side turn, so the field is set to false to match the other primary-turn constructors (execute_user_prompt, execute_user_prompt_streaming, …).

Merge remote-tracking branch 'upstream/master' into feature/pentest-command.

Conflicts resolved:

- crates/puffer-core/lib.rs
- crates/puffer-tui/src/flow.rs

Plus `actor` field added to TranscriptEvent constructions in crates/puffer-tui/src/pentest.rs and pentest_command.rs to match the new signature introduced upstream.

…-recon prompt

Replaces the prior fixed five-specialist pentest workflow with a Coordinator-Defined Subagents (CDS) architecture:

- One generic `pentest-specialist` yaml with no built-in specialty; the coordinator names the role inline per dispatch via structured fields ([ROLE]/[OBJECTIVE]/[METHODOLOGY]/[TOOLS-ALLOWED]/[TURNS]/[SUCCESS]/[SAFETY]/[SCOPE]). Specialist max_turns bumped 3 → 5.
- `pentest-validator` yaml gains PoC-style verification methodology per finding class (XSS → real browser, SQLi → control payload, IDOR → two-leg replay, JWT → forged-token replay, deserialization → side-effect, open redirect/SSRF → host/body confirmation, default creds → log in and access protected endpoint).
- Coordinator prompt is rewritten around async dispatch: every Agent call returns `async_launched` immediately; results are reaped by the runtime on a later turn (TaskOutput, SendMessage resume, or auto-reap) and inlined into the next coordinator prompt. The 16-agent global concurrent cap is enforced by the runtime.
- Coordinator prompt now carries a Deep recon playbook (SPA bundle route extraction, OpenAPI/Swagger/GraphQL contract discovery, WordPress/Drupal/Joomla/Actuator/Django/FastAPI/Express surface enumeration) and a vulnerability-class methodology reference covering XSS, crypto/JWT, vulnerable-components, deserialization, anti-automation, misconfig/hidden paths, open redirect/SSRF, broken auth + REST authz, and business logic chains.

Runtime additions to support the architecture:

- `Agent` is added to `is_parallel_safe_tool`. `tool_batch` pre-stages `Arc<AppState>` / `Arc<ProviderRegistry>` + a clonable `AuthStore` once before `thread::scope`, then re-clones per-thread inside the spawn loop. A `tool_id == "Agent"` branch in the scoped closure routes to `super::agents::execute_agent_tool` while every other parallel-safe tool keeps its existing `claude_tools::execute_parallel_tool` path.
- TUI `pentest` module gains an auto-reap path: it tracks `reaped_agent_ids`, scans the workspace `runtime/agent_outputs/` files for terminal-status agents listed in the latest report's `in_flight_agents`, collapses provider-failure noise to one line, and splices completed replies into the next coordinator prompt under "Specialist completions auto-reaped this turn".

`command_helpers/pentest` now exports `DEFAULT_PENTEST_MAX_DISPATCHES_PER_ITER` alongside `DEFAULT_PENTEST_MAX_ITERATIONS` and parses an optional third positional CLI arg to override the per-iteration dispatch budget.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
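The bookkeeping behind the auto-reap path might be sketched as below. The status enum and the in-memory status map are stand-ins chosen for this example; the PR reads agent state from files under `runtime/agent_outputs/`, and that file format is not reproduced here.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical agent status; terminal means the agent will produce no more output.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentStatus {
    Running,
    Completed,
    Failed,
}

impl AgentStatus {
    fn is_terminal(self) -> bool {
        !matches!(self, AgentStatus::Running)
    }
}

/// Returns the ids that became reapable this turn and records them in
/// `reaped_agent_ids` so each completion is spliced into a prompt only once.
pub fn reap_completed(
    in_flight_agents: &[String],
    statuses: &BTreeMap<String, AgentStatus>,
    reaped_agent_ids: &mut BTreeSet<String>,
) -> Vec<String> {
    let mut newly_reaped = Vec::new();
    for id in in_flight_agents {
        // Agents with no recorded status are treated as still running.
        let terminal = statuses.get(id).map_or(false, |s| s.is_terminal());
        // `insert` returns false for ids reaped on an earlier turn.
        if terminal && reaped_agent_ids.insert(id.clone()) {
            newly_reaped.push(id.clone());
        }
    }
    newly_reaped
}
```

Tracking reaped ids separately from the in-flight list means a coordinator report that keeps listing an already-finished agent does not cause its reply to be inlined twice.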

0oyun added 5 commits May 10, 2026 22:38

0oyun marked this pull request as ready for review May 18, 2026 06:50

Copilot AI review requested due to automatic review settings May 18, 2026 06:50

Copilot started reviewing on behalf of 0oyun May 18, 2026 06:50 View session

Copilot AI reviewed May 18, 2026


Comment thread crates/puffer-tui/src/pentest.rs

Comment thread crates/puffer-core/command.rs Outdated

Comment thread crates/puffer-core/command_helpers/pentest.rs Outdated

0oyun and others added 5 commits May 18, 2026 15:09

Merge branch 'master' into feature/pentest-command

2a38c72

0oyun wants to merge 10 commits into berabuddies:master from 0oyun:feature/pentest-command


2 participants
