Harpoon Development Progress

Last Updated: 2026-03-05 Branch: dev Status: 13 modules, 48 attacks, OWASP LLM01-10 coverage, attack chaining, supply chain integrity, adaptive selection

Completed Work

Phase 1: Foundation

YAML config loader with ${ENV_VAR} expansion
Scanning profiles: quick, thorough, stealth
Target interface, CustomTarget, ThrottledTarget
HTTP client with timeout, proxy, custom headers
OpenAI-compatible LLM client

Phase 2: Payloads

YAML payload loader with category/severity filtering
Mutation engine with 9 mutation types
506 payloads across 48 YAML files in 40 categories
Payload validation and structure tests

Phase 3: Prompt Module

Direct injection (46 payloads)
Jailbreak (68 payloads across 5 files)
System prompt extraction (40 payloads across 3 files)
Encoding bypass (50 payloads x 18 transforms)

Phase 4: Analysis

Composable Check functions (30+ checks)
Domain-specific checks for all 12 attack modules
Canary word extraction and detection
Confidence scoring (none/low/medium/high/confirmed)

Phase 5: Reporting

Text renderer with ANSI colors
JSON renderer for machine consumption
Markdown report generator
HTML report generator with rich formatting

Phase 6: Hardening

Test coverage: 31 test suites, 350+ test functions
CLI wiring: --profile, --attack, --objective flags
Target headers passthrough
Rate limiting via ThrottledTarget wrapper

Phase 6.5: Multi-Turn Strategies

Strategy interface with 3 implementations
- SimpleSequence: fixed payload ordering
- Crescendo: gradual escalation from benign to malicious
- RefusalRecovery: adaptive tactic switching on refusal
Conversation manager with turn tracking
Turn analyzer with decision logic
Integration across all modules

Phase 7: Progress Streaming

Event-driven architecture (8 event types)
Async engine execution model
Real-time streaming renderer with ANSI colors
Verbose mode with per-payload detail
CLI integration

Phase 8: Extended Modules

Agent module: goal-hijack, tool-abuse, memory-poison (60 payloads)
RAG module: context-injection, context-overflow, retrieval-hijack (26 payloads)
Output module: xss, command-injection, ssrf, markdown-injection (40 payloads)
Privacy module: pii-extraction, training-data, credential-leak (30 payloads)
Privesc module: role-confusion, permission-bypass, cross-tenant (28 payloads)
Hallucination module: false-citation, fabrication, sycophancy (30 payloads)

Phase 9: Production Features

Concurrent payload execution (semaphore + mutex pattern)
CI/CD mode: --ci, --fail-on, exit code 2 on threshold
Quick CLI: --provider, --model, --endpoint, --api-key (config-free usage)
Session management: save, resume, checkpoints, hooks
Configurable payload workers (--payload-workers)
Native target types: OpenAI, Anthropic (via factory)
Target factory for provider-based creation

Phase 10: Attack Chaining

Chain executor with step orchestration and variable propagation
5 transform types: extract-canary, set-variable, append-context, format-payload, conditional
YAML chain definition loader with built-in + custom chain support
Chain result renderer with per-step output
10 built-in chains
CLI flags: --chain, --chains-dir

Phase 11: Model Module

Model extraction (8 payloads) - architecture probing, parameter elicitation, training details
Adversarial examples (8 payloads) - homoglyphs, zero-width chars, RTL overrides, combining diacritics
Membership inference (8 payloads) - verbatim recall, code memorization, license text probing
Analysis checks: ModelInternalsCheck, AdversarialFlipCheck, VerbatimRecallCheck

Phase 12: Denial of Service Module

Resource exhaustion (8 payloads) - token flooding, context fill, recursive instructions
Output amplification (8 payloads) - essay maximization, format explosion, code gen floods
Compute intensive (8 payloads) - combinatorial analysis, ReDoS prompts, impossible tasks
Analysis checks: ResourceExhaustionCheck, LatencyAnomalyCheck, DegradedOutputCheck

Phase 13: LLM-Based Mutations

MutationClient interface in payloads package (narrow, no circular imports)
LLMMutator with bounded concurrency, retry-once, refusal filtering
3 mutation strategies: rephrase, persona-shift, context-wrap
CLI flags: --mutate (openai|anthropic|ollama), --mutate-model, --mutate-variants
Adapter types in main.go for OpenAI and Anthropic clients
Config fields: llm_mutation, mutation_variants, mutation_workers
16 tests: strategies, variant limits, refusals, errors, concurrency, metadata isolation

Phase 14: Multimodal Module

Visual injection (6 payloads) - text-in-image prompts, OCR-based instruction injection
Cross-modal conflict (4 payloads) - contradictory instructions across text and image channels
Steganographic text (4 payloads) - hidden messages via LSB steganography in images
Programmatic PNG generation: 5x7 bitmap font rendering, base64 encoding, data URL output
MultimodalTarget interface with AsMultimodal() helper for graceful fallback
Analysis checks: VisualInstructionCheck, CrossModalCheck, SteganographicCheck

Phase 15: Supply Chain Module

Model verification (8 payloads) - identity probing, capability fingerprinting, knowledge cutoff, multi-turn consistency
Backdoor detection (6 payloads) - trigger prefix, sudo patterns, bracket IDs, unicode zero-width, passphrases, encoded triggers
Dependency trust (6 payloads) - fake tool output, fake API response, fake plugin result, metadata injection, trusted source claims, RAG provenance injection
Deployment probing (6 payloads) - infrastructure probe, framework probe, dependency enumeration, version fingerprint, config extraction, error elicitation
Analysis checks: IdentityInconsistencyCheck, BehavioralShiftCheck, TrustBoundaryViolationCheck, InfrastructureLeakCheck
Model verification supports multi-turn via strategy package

Phase 16: Function Calling Attacks

Schema manipulation (6 payloads) - malformed tool calls, extra parameter injection, prototype pollution
Parameter injection (8 payloads) - shell injection, SQL injection, path traversal, SSRF in tool arguments
Tool confusion (7 payloads) - wrong tool invocation, destructive operation misdirection
Recursive calls (6 payloads) - self-referential chains, infinite loops, circular tool invocation
Tool output poison (6 payloads) - fabricated tool responses influencing agent behavior
Analysis checks: SchemaManipulationCheck, ParameterInjectionCheck, ToolConfusionCheck, RecursiveCallCheck, ToolOutputPoisonCheck
SARIF CWE mappings for all 5 new attack types

Phase 17: Adaptive Attack Selection

Attack-level adaptive selection via FilterAttacks() — skips individual attack categories based on defense profile
AttackDefenseMapping: 48 attack categories mapped to defense types
Complements existing module-level SelectModules() for finer-grained control
Merges with user --attack filter; module-level entries preserved
8 new tests covering all edge cases

Phase 18: Benchmark & Scoring System

Tactic index: Aggregates attack effectiveness across all benchmark runs, per-model and per-attack-category
TacticStats/ModelTacticStats: Success rate, severity, trend detection (improving/stable/declining from 3+ runs)
ModelStats: Per-model overview with avg/best/worst scores, weakest/strongest attack categories
Recommendation engine: 3 modes — known model (direct history), known provider (cross-model inference at 0.7 discount), cold start
Effectiveness scoring: Weighted formula (0.6 success_rate + 0.4 severity_factor) with recency boost
CLI subcommands: harpoon benchmark stats (filter by model/module/attack, text/JSON) and harpoon benchmark recommend (--model required, text/JSON)
Scan integration: --benchmark flag saves scan results to benchmark store; --recommend flag auto-selects attacks from historical data
Renderers: Text tables and JSON for recommendations, stats, and model overviews
22 new tests (52 total in benchmark package)

Current Metrics

Metric	Count
Attack modules	13
Attack types	48
Payloads	609
YAML payload files	53
Chain definitions	10
Encoding transforms	18
Mutation types	9 deterministic + 3 LLM strategies
Strategies	3
Go code (production)	~16,400 lines
Go code (test)	~10,200 lines
Test functions	350+
Test suites	31
Notes/docs	45 + 3
External deps	1

Roadmap

Multi-Agent Attacks

Impact: Medium-High | Complexity: High Why: Systems with multiple LLMs (planner -> executor, inner/outer agents) have unique attack surfaces.

New module: internal/modules/multiagent/

Attack Types:

Confused deputy — trick inner agent into acting with outer agent's authority
Inter-agent injection — poison messages between agents in a pipeline
Orchestrator manipulation — compromise the planner/router to redirect execution
Trust chain exploitation — exploit implicit trust between agents

Function Calling Attacks (Extend Agent Module)

Impact: High | Complexity: Moderate Why: Tool use / function calling is exploding. Our agent module covers basics but not the deeper attack surface.

Extend: internal/modules/agent/

New Attack Types:

Schema manipulation — trick model into malformed tool calls, inject extra parameters
Parameter injection — smuggle malicious values into tool arguments
Tool confusion — make agent call the wrong tool (e.g., delete instead of read)
Recursive tool calls — infinite loops, self-referential chains
Tool output poisoning — manipulate tool return values to influence next actions

Lower Priority

Semantic Preservation Scoring

Impact: Medium | Complexity: Moderate

Score how well LLM-generated mutations preserve the original attack intent.

Embedding-based similarity comparison
Intent classification check
Integration with LLMMutator to auto-discard low-scoring variants

Azure/Bedrock Targets

Impact: Medium | Complexity: Low

Follow existing OpenAI/Anthropic target pattern:

internal/targets/azure.go
internal/targets/bedrock.go
Add to target factory

Technical Debt

strings.Title deprecation in mutator.go — replaced with golang.org/x/text/cases
README.md needs updating to reflect all 12 modules — updated with all modules, 44 attack types, 539 payloads

Quick Reference

# Build and test
make build && make test

# Quick scan (no config)
./bin/harpoon --provider openai --model gpt-4 --attack injection

# Full scan with config
./bin/harpoon --config configs/harpoon.yaml --target my-llm --profile thorough --report html

# CI mode
./bin/harpoon --config configs/harpoon.yaml --ci --fail-on high

# LLM-mutated payloads
./bin/harpoon --provider openai --model gpt-4 --mutate openai --mutate-variants 3

# Run attack chain
./bin/harpoon --config configs/harpoon.yaml --target my-llm --chain recon-and-exploit

# Supply chain attacks
./bin/harpoon --provider openai --model gpt-4 --attack supply
./bin/harpoon --provider openai --model gpt-4 --attack model-verification
./bin/harpoon --provider openai --model gpt-4 --attack backdoor-detection

# List/resume sessions
./bin/harpoon --list-sessions
./bin/harpoon --config configs/harpoon.yaml --session <id>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harpoon Development Progress

Completed Work

Phase 1: Foundation

Phase 2: Payloads

Phase 3: Prompt Module

Phase 4: Analysis

Phase 5: Reporting

Phase 6: Hardening

Phase 6.5: Multi-Turn Strategies

Phase 7: Progress Streaming

Phase 8: Extended Modules

Phase 9: Production Features

Phase 10: Attack Chaining

Phase 11: Model Module

Phase 12: Denial of Service Module

Phase 13: LLM-Based Mutations

Phase 14: Multimodal Module

Phase 15: Supply Chain Module

Phase 16: Function Calling Attacks

Phase 17: Adaptive Attack Selection

Phase 18: Benchmark & Scoring System

Current Metrics

Roadmap

Multi-Agent Attacks

Function Calling Attacks (Extend Agent Module)

Lower Priority

Semantic Preservation Scoring

Azure/Bedrock Targets

Technical Debt

Quick Reference

FilesExpand file tree

PROGRESS.md

Latest commit

History

PROGRESS.md

File metadata and controls

Harpoon Development Progress

Completed Work

Phase 1: Foundation

Phase 2: Payloads

Phase 3: Prompt Module

Phase 4: Analysis

Phase 5: Reporting

Phase 6: Hardening

Phase 6.5: Multi-Turn Strategies

Phase 7: Progress Streaming

Phase 8: Extended Modules

Phase 9: Production Features

Phase 10: Attack Chaining

Phase 11: Model Module

Phase 12: Denial of Service Module

Phase 13: LLM-Based Mutations

Phase 14: Multimodal Module

Phase 15: Supply Chain Module

Phase 16: Function Calling Attacks

Phase 17: Adaptive Attack Selection

Phase 18: Benchmark & Scoring System

Current Metrics

Roadmap

Multi-Agent Attacks

Function Calling Attacks (Extend Agent Module)

Lower Priority

Semantic Preservation Scoring

Azure/Bedrock Targets

Technical Debt

Quick Reference