Last Updated: 2026-03-05
Branch: dev
Status: 13 modules, 48 attacks, OWASP LLM01-10 coverage, attack chaining, supply chain integrity, adaptive selection
- YAML config loader with `${ENV_VAR}` expansion
- Scanning profiles: quick, thorough, stealth
- Target interface, CustomTarget, ThrottledTarget
- HTTP client with timeout, proxy, custom headers
- OpenAI-compatible LLM client
- YAML payload loader with category/severity filtering
- Mutation engine with 9 mutation types
- 506 payloads across 48 YAML files in 40 categories
- Payload validation and structure tests
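
As an illustration of the `${ENV_VAR}` expansion mentioned for the config loader, here is a minimal sketch built on the standard library's `os.Expand`. The helper name `expandConfig` and the sample keys are hypothetical and are not taken from the project's loader.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfig substitutes ${NAME} references in raw config text using lookup.
// Hypothetical helper for illustration; not the project's actual loader.
func expandConfig(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

func main() {
	// An in-memory lookup keeps the example independent of the real environment.
	env := map[string]string{"HARPOON_API_KEY": "sk-example"}
	raw := "api_key: ${HARPOON_API_KEY}\nprofile: quick\n"
	fmt.Print(expandConfig(raw, func(name string) string { return env[name] }))
}
```

Passing `os.Getenv` as the lookup reads from the process environment instead. Note that `os.Expand` also expands bare `$NAME` references and replaces unknown names with an empty string, so a real loader may want stricter handling.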
- Direct injection (46 payloads)
- Jailbreak (68 payloads across 5 files)
- System prompt extraction (40 payloads across 3 files)
- Encoding bypass (50 payloads x 18 transforms)
- Composable Check functions (30+ checks)
- Domain-specific checks for all 12 attack modules
- Canary word extraction and detection
- Confidence scoring (none/low/medium/high/confirmed)
- Text renderer with ANSI colors
- JSON renderer for machine consumption
- Markdown report generator
- HTML report generator with rich formatting
- Test coverage: 31 test suites, 350+ test functions
- CLI wiring: --profile, --attack, --objective flags
- Target headers passthrough
- Rate limiting via ThrottledTarget wrapper
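
The wrapper approach to rate limiting can be sketched as a decorator that enforces a minimum gap between calls. The `Target` method set, `echoTarget`, and `newThrottled` below are assumptions made for the example; the project's real `ThrottledTarget` may be shaped differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Target is a hypothetical stand-in for the project's target interface.
type Target interface {
	Send(prompt string) (string, error)
}

// echoTarget is a trivial Target used only for this demonstration.
type echoTarget struct{}

func (echoTarget) Send(prompt string) (string, error) {
	return "echo: " + prompt, nil
}

// throttled wraps a Target and enforces a minimum gap between call starts.
type throttled struct {
	inner Target
	gap   time.Duration
	mu    sync.Mutex
	next  time.Time // earliest time the next call may start
}

func newThrottled(inner Target, gap time.Duration) *throttled {
	return &throttled{inner: inner, gap: gap}
}

func (t *throttled) Send(prompt string) (string, error) {
	// Reserve a start slot under the lock, then wait outside it.
	t.mu.Lock()
	start := time.Now()
	if t.next.After(start) {
		start = t.next
	}
	t.next = start.Add(t.gap)
	t.mu.Unlock()

	time.Sleep(time.Until(start)) // returns immediately if the slot is already due
	return t.inner.Send(prompt)
}

func main() {
	target := newThrottled(echoTarget{}, 100*time.Millisecond)
	for _, p := range []string{"one", "two", "three"} {
		out, _ := target.Send(p)
		fmt.Println(out)
	}
}
```

Because the wrapper satisfies the same interface as the target it wraps, callers do not need to know whether throttling is enabled.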
- Strategy interface with 3 implementations
- SimpleSequence: fixed payload ordering
- Crescendo: gradual escalation from benign to malicious
- RefusalRecovery: adaptive tactic switching on refusal
- Conversation manager with turn tracking
- Turn analyzer with decision logic
- Integration across all modules
- Event-driven architecture (8 event types)
- Async engine execution model
- Real-time streaming renderer with ANSI colors
- Verbose mode with per-payload detail
- CLI integration
- Agent module: goal-hijack, tool-abuse, memory-poison (60 payloads)
- RAG module: context-injection, context-overflow, retrieval-hijack (26 payloads)
- Output module: xss, command-injection, ssrf, markdown-injection (40 payloads)
- Privacy module: pii-extraction, training-data, credential-leak (30 payloads)
- Privesc module: role-confusion, permission-bypass, cross-tenant (28 payloads)
- Hallucination module: false-citation, fabrication, sycophancy (30 payloads)
- Concurrent payload execution (semaphore + mutex pattern)
- CI/CD mode: --ci, --fail-on, exit code 2 on threshold
- Quick CLI: --provider, --model, --endpoint, --api-key (config-free usage)
- Session management: save, resume, checkpoints, hooks
- Configurable payload workers (--payload-workers)
- Native target types: OpenAI, Anthropic (via factory)
- Target factory for provider-based creation
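
For readers unfamiliar with the semaphore + mutex pattern named above, the following is a minimal sketch: a buffered channel caps the number of in-flight payloads and a mutex guards the shared result map. The function `runPayloads` and its signature are illustrative assumptions, not the engine's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// runPayloads calls send for every payload with at most `workers` calls in
// flight and collects the responses keyed by payload. Hypothetical helper.
func runPayloads(payloads []string, workers int, send func(string) string) map[string]string {
	if workers < 1 {
		workers = 1
	}
	sem := make(chan struct{}, workers) // counting semaphore
	var (
		mu      sync.Mutex // guards results
		wg      sync.WaitGroup
		results = make(map[string]string, len(payloads))
	)
	for _, p := range payloads {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while all workers are busy
		go func(p string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			resp := send(p)
			mu.Lock()
			results[p] = resp
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	return results
}

func main() {
	out := runPayloads([]string{"a", "b", "c"}, 2, func(p string) string {
		return "resp:" + p
	})
	fmt.Println(len(out), out["b"])
}
```

In this sketch the `workers` argument plays the role that `--payload-workers` plays on the command line.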
- Chain executor with step orchestration and variable propagation
- 5 transform types: extract-canary, set-variable, append-context, format-payload, conditional
- YAML chain definition loader with built-in + custom chain support
- Chain result renderer with per-step output
- 10 built-in chains
- CLI flags: --chain, --chains-dir
- Model extraction (8 payloads) - architecture probing, parameter elicitation, training details
- Adversarial examples (8 payloads) - homoglyphs, zero-width chars, RTL overrides, combining diacritics
- Membership inference (8 payloads) - verbatim recall, code memorization, license text probing
- Analysis checks: ModelInternalsCheck, AdversarialFlipCheck, VerbatimRecallCheck
- Resource exhaustion (8 payloads) - token flooding, context fill, recursive instructions
- Output amplification (8 payloads) - essay maximization, format explosion, code gen floods
- Compute intensive (8 payloads) - combinatorial analysis, ReDoS prompts, impossible tasks
- Analysis checks: ResourceExhaustionCheck, LatencyAnomalyCheck, DegradedOutputCheck
- `MutationClient` interface in payloads package (narrow, no circular imports)
- `LLMMutator` with bounded concurrency, retry-once, refusal filtering
- 3 mutation strategies: rephrase, persona-shift, context-wrap
- CLI flags: --mutate (openai|anthropic|ollama), --mutate-model, --mutate-variants
- Adapter types in main.go for OpenAI and Anthropic clients
- Config fields: `llm_mutation`, `mutation_variants`, `mutation_workers`
- 16 tests: strategies, variant limits, refusals, errors, concurrency, metadata isolation
- Visual injection (6 payloads) - text-in-image prompts, OCR-based instruction injection
- Cross-modal conflict (4 payloads) - contradictory instructions across text and image channels
- Steganographic text (4 payloads) - hidden messages via LSB steganography in images
- Programmatic PNG generation: 5x7 bitmap font rendering, base64 encoding, data URL output
- `MultimodalTarget` interface with `AsMultimodal()` helper for graceful fallback
- Analysis checks: VisualInstructionCheck, CrossModalCheck, SteganographicCheck
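
The graceful fallback provided by `AsMultimodal()` can be pictured as a type assertion on the target. Only the names `MultimodalTarget` and `AsMultimodal` come from the notes above; the method sets, the `textOnly` and `vision` types, and `sendVisual` are assumptions made for this sketch.

```go
package main

import "fmt"

// Target and MultimodalTarget method sets are assumed for illustration.
type Target interface {
	Send(prompt string) (string, error)
}

type MultimodalTarget interface {
	Target
	SendWithImage(prompt, imageDataURL string) (string, error)
}

// AsMultimodal reports whether t also supports image input.
func AsMultimodal(t Target) (MultimodalTarget, bool) {
	mt, ok := t.(MultimodalTarget)
	return mt, ok
}

// textOnly implements Target but not MultimodalTarget.
type textOnly struct{}

func (textOnly) Send(p string) (string, error) { return "text:" + p, nil }

// vision adds image support on top of textOnly.
type vision struct{ textOnly }

func (vision) SendWithImage(p, img string) (string, error) { return "image:" + p, nil }

// sendVisual uses the image channel when available and falls back to text.
func sendVisual(t Target, prompt, img string) (string, error) {
	if mt, ok := AsMultimodal(t); ok {
		return mt.SendWithImage(prompt, img)
	}
	return t.Send(prompt)
}

func main() {
	img := "data:image/png;base64,AA==" // placeholder data URL
	for _, t := range []Target{textOnly{}, vision{}} {
		out, _ := sendVisual(t, "describe", img)
		fmt.Println(out)
	}
}
```

With this shape, visual payloads can run against any target, and text-only targets receive the text portion instead of failing.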
- Model verification (8 payloads) - identity probing, capability fingerprinting, knowledge cutoff, multi-turn consistency
- Backdoor detection (6 payloads) - trigger prefix, sudo patterns, bracket IDs, unicode zero-width, passphrases, encoded triggers
- Dependency trust (6 payloads) - fake tool output, fake API response, fake plugin result, metadata injection, trusted source claims, RAG provenance injection
- Deployment probing (6 payloads) - infrastructure probe, framework probe, dependency enumeration, version fingerprint, config extraction, error elicitation
- Analysis checks: IdentityInconsistencyCheck, BehavioralShiftCheck, TrustBoundaryViolationCheck, InfrastructureLeakCheck
- Model verification supports multi-turn via strategy package
- Schema manipulation (6 payloads) - malformed tool calls, extra parameter injection, prototype pollution
- Parameter injection (8 payloads) - shell injection, SQL injection, path traversal, SSRF in tool arguments
- Tool confusion (7 payloads) - wrong tool invocation, destructive operation misdirection
- Recursive calls (6 payloads) - self-referential chains, infinite loops, circular tool invocation
- Tool output poison (6 payloads) - fabricated tool responses influencing agent behavior
- Analysis checks: SchemaManipulationCheck, ParameterInjectionCheck, ToolConfusionCheck, RecursiveCallCheck, ToolOutputPoisonCheck
- SARIF CWE mappings for all 5 new attack types
- Attack-level adaptive selection via `FilterAttacks()`: skips individual attack categories based on defense profile
- `AttackDefenseMapping`: 48 attack categories mapped to defense types
- Complements existing module-level `SelectModules()` for finer-grained control
- Merges with user `--attack` filter; module-level entries preserved
- 8 new tests covering all edge cases
- Tactic index: Aggregates attack effectiveness across all benchmark runs, per-model and per-attack-category
- TacticStats/ModelTacticStats: Success rate, severity, trend detection (improving/stable/declining from 3+ runs)
- ModelStats: Per-model overview with avg/best/worst scores, weakest/strongest attack categories
- Recommendation engine: 3 modes — known model (direct history), known provider (cross-model inference at 0.7 discount), cold start
- Effectiveness scoring: Weighted formula (0.6 success_rate + 0.4 severity_factor) with recency boost
- CLI subcommands: `harpoon benchmark stats` (filter by model/module/attack, text/JSON) and `harpoon benchmark recommend` (--model required, text/JSON)
- Scan integration: `--benchmark` flag saves scan results to benchmark store; `--recommend` flag auto-selects attacks from historical data
- Renderers: Text tables and JSON for recommendations, stats, and model overviews
- 22 new tests (52 total in benchmark package)
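
To make the effectiveness scoring concrete, here is a small sketch of the weighted formula with the 0.7 provider discount. The 0.6/0.4 weights and the 0.7 figure come from the list above. The shape of the recency boost (a linear bonus that fades over a fixed window), its constants, and the choice to apply the discount as a multiplier are assumptions, since the notes do not specify them.

```go
package main

import "fmt"

const (
	successWeight    = 0.6 // weight on success_rate, from the notes
	severityWeight   = 0.4 // weight on severity_factor, from the notes
	providerDiscount = 0.7 // cross-model inference discount, from the notes

	maxRecencyBoost   = 0.2  // assumption: newest data gets up to +20%
	recencyWindowDays = 30.0 // assumption: boost fades to zero after 30 days
)

// baseScore is the weighted formula without any recency adjustment.
func baseScore(successRate, severityFactor float64) float64 {
	return successWeight*successRate + severityWeight*severityFactor
}

// recencyBoost returns a bonus in [0, maxRecencyBoost] that shrinks linearly
// with the age of the newest run. The linear shape is an assumption.
func recencyBoost(ageDays float64) float64 {
	if ageDays < 0 {
		ageDays = 0
	}
	if ageDays >= recencyWindowDays {
		return 0
	}
	return maxRecencyBoost * (1 - ageDays/recencyWindowDays)
}

// effectiveness scores one attack category. sameModel is false when the
// history comes from another model of the same provider.
func effectiveness(successRate, severityFactor, ageDays float64, sameModel bool) float64 {
	score := baseScore(successRate, severityFactor) * (1 + recencyBoost(ageDays))
	if !sameModel {
		score *= providerDiscount
	}
	return score
}

func main() {
	fmt.Printf("%.3f\n", effectiveness(0.5, 1.0, 0, true))
	fmt.Printf("%.3f\n", effectiveness(0.5, 1.0, 0, false))
}
```

Sorting attack categories by this score in descending order would give a recommendation list for the known-model and known-provider modes. The cold-start mode has no history to score and would need a separate default ordering.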
| Metric | Count |
|---|---|
| Attack modules | 13 |
| Attack types | 48 |
| Payloads | 609 |
| YAML payload files | 53 |
| Chain definitions | 10 |
| Encoding transforms | 18 |
| Mutation types | 9 deterministic + 3 LLM strategies |
| Strategies | 3 |
| Go code (production) | ~16,400 lines |
| Go code (test) | ~10,200 lines |
| Test functions | 350+ |
| Test suites | 31 |
| Notes/docs | 45 + 3 |
| External deps | 1 |
Impact: Medium-High | Complexity: High
Why: Systems with multiple LLMs (planner -> executor, inner/outer agents) have unique attack surfaces.
New module: internal/modules/multiagent/
Attack Types:
- Confused deputy — trick inner agent into acting with outer agent's authority
- Inter-agent injection — poison messages between agents in a pipeline
- Orchestrator manipulation — compromise the planner/router to redirect execution
- Trust chain exploitation — exploit implicit trust between agents
Impact: High | Complexity: Moderate
Why: Tool use and function calling are growing rapidly. Our agent module covers the basics but not the deeper attack surface.
Extend: internal/modules/agent/
New Attack Types:
- Schema manipulation — trick model into malformed tool calls, inject extra parameters
- Parameter injection — smuggle malicious values into tool arguments
- Tool confusion — make agent call the wrong tool (e.g., delete instead of read)
- Recursive tool calls — infinite loops, self-referential chains
- Tool output poisoning — manipulate tool return values to influence next actions
Impact: Medium | Complexity: Moderate
Score how well LLM-generated mutations preserve the original attack intent.
- Embedding-based similarity comparison
- Intent classification check
- Integration with LLMMutator to auto-discard low-scoring variants
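
One possible shape for the embedding-based comparison is a cosine-similarity filter over variant embeddings. This is a sketch of the idea rather than a planned design: the `variant` type, `keepSimilar`, and the threshold value are hypothetical, and the embeddings are assumed to come from whatever embedding model is eventually chosen.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two vectors, or 0 when the lengths
// differ, the vectors are empty, or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// variant pairs a mutated payload with its embedding (hypothetical type).
type variant struct {
	Text      string
	Embedding []float64
}

// keepSimilar returns the variants whose similarity to the original payload's
// embedding is at least threshold; the rest are discarded.
func keepSimilar(original []float64, variants []variant, threshold float64) []variant {
	var kept []variant
	for _, v := range variants {
		if cosine(original, v.Embedding) >= threshold {
			kept = append(kept, v)
		}
	}
	return kept
}

func main() {
	original := []float64{1, 0, 0}
	variants := []variant{
		{Text: "close rephrase", Embedding: []float64{0.9, 0.1, 0}},
		{Text: "drifted off-topic", Embedding: []float64{0, 1, 0}},
	}
	for _, v := range keepSimilar(original, variants, 0.8) {
		fmt.Println(v.Text)
	}
}
```

A filter like this could run after `LLMMutator` produces its variants, with the intent classification check as a second gate for variants that stay close in embedding space but change the goal.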
Impact: Medium | Complexity: Low
Follow existing OpenAI/Anthropic target pattern:
- `internal/targets/azure.go`
- `internal/targets/bedrock.go`
- Add to target factory
- `strings.Title` deprecation in mutator.go: replaced with `golang.org/x/text/cases`
- README.md needs updating to reflect all 12 modules: updated with all modules, 44 attack types, 539 payloads
# Build and test
make build && make test
# Quick scan (no config)
./bin/harpoon --provider openai --model gpt-4 --attack injection
# Full scan with config
./bin/harpoon --config configs/harpoon.yaml --target my-llm --profile thorough --report html
# CI mode
./bin/harpoon --config configs/harpoon.yaml --ci --fail-on high
# LLM-mutated payloads
./bin/harpoon --provider openai --model gpt-4 --mutate openai --mutate-variants 3
# Run attack chain
./bin/harpoon --config configs/harpoon.yaml --target my-llm --chain recon-and-exploit
# Supply chain attacks
./bin/harpoon --provider openai --model gpt-4 --attack supply
./bin/harpoon --provider openai --model gpt-4 --attack model-verification
./bin/harpoon --provider openai --model gpt-4 --attack backdoor-detection
# List/resume sessions
./bin/harpoon --list-sessions
./bin/harpoon --config configs/harpoon.yaml --session <id>