Harpoon Development Progress

Last Updated: 2026-03-05
Branch: dev
Status: 13 modules, 48 attacks, OWASP LLM01-10 coverage, attack chaining, supply chain integrity, adaptive selection


Completed Work

Phase 1: Foundation

  • YAML config loader with ${ENV_VAR} expansion
  • Scanning profiles: quick, thorough, stealth
  • Target interface, CustomTarget, ThrottledTarget
  • HTTP client with timeout, proxy, custom headers
  • OpenAI-compatible LLM client

Phase 2: Payloads

  • YAML payload loader with category/severity filtering
  • Mutation engine with 9 mutation types
  • 506 payloads across 48 YAML files in 40 categories
  • Payload validation and structure tests

Phase 3: Prompt Module

  • Direct injection (46 payloads)
  • Jailbreak (68 payloads across 5 files)
  • System prompt extraction (40 payloads across 3 files)
  • Encoding bypass (50 payloads x 18 transforms)
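
The "payloads x transforms" idea can be sketched as a cross product of payload strings and named encoding functions. The `Transform` and `Variant` types and the three transforms shown are assumptions for illustration; they are not the project's actual 18 transforms.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strings"
)

// Transform is a named, pure string-to-string encoding (hypothetical type).
type Transform struct {
	Name  string
	Apply func(string) string
}

// rot13 rotates ASCII letters by 13 places and leaves other runes untouched.
func rot13(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z':
			return 'a' + (r-'a'+13)%26
		case r >= 'A' && r <= 'Z':
			return 'A' + (r-'A'+13)%26
		}
		return r
	}, s)
}

// defaultTransforms returns a small illustrative set of encodings.
func defaultTransforms() []Transform {
	return []Transform{
		{Name: "base64", Apply: func(s string) string { return base64.StdEncoding.EncodeToString([]byte(s)) }},
		{Name: "hex", Apply: func(s string) string { return hex.EncodeToString([]byte(s)) }},
		{Name: "rot13", Apply: rot13},
	}
}

// Variant pairs an encoded payload with the transform that produced it.
type Variant struct {
	Transform string
	Text      string
}

// expandPayloads applies every transform to every payload, payload-major.
func expandPayloads(payloads []string, ts []Transform) []Variant {
	out := make([]Variant, 0, len(payloads)*len(ts))
	for _, p := range payloads {
		for _, t := range ts {
			out = append(out, Variant{Transform: t.Name, Text: t.Apply(p)})
		}
	}
	return out
}

func main() {
	for _, v := range expandPayloads([]string{"placeholder payload"}, defaultTransforms()) {
		fmt.Printf("%-6s %s\n", v.Transform, v.Text)
	}
}
```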

Phase 4: Analysis

  • Composable Check functions (30+ checks)
  • Domain-specific checks for all 12 attack modules
  • Canary word extraction and detection
  • Confidence scoring (none/low/medium/high/confirmed)
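
One way composable checks and the five confidence levels could fit together is shown below. This is a minimal sketch: the `Check` signature, the `Contains` and `AnyOf` helpers, and the canary value are assumptions, not the analysis package's real API.

```go
package main

import (
	"fmt"
	"strings"
)

// Confidence mirrors the five levels listed above, ordered weakest to strongest.
type Confidence int

const (
	None Confidence = iota
	Low
	Medium
	High
	Confirmed
)

func (c Confidence) String() string {
	return [...]string{"none", "low", "medium", "high", "confirmed"}[c]
}

// Check inspects a model response and reports how confident it is that
// the attack succeeded (hypothetical signature).
type Check func(response string) Confidence

// Contains returns a Check that reports level when needle appears in the
// response, ignoring case.
func Contains(needle string, level Confidence) Check {
	needle = strings.ToLower(needle)
	return func(response string) Confidence {
		if strings.Contains(strings.ToLower(response), needle) {
			return level
		}
		return None
	}
}

// AnyOf composes checks and returns the highest confidence any of them reports.
func AnyOf(checks ...Check) Check {
	return func(response string) Confidence {
		best := None
		for _, c := range checks {
			if got := c(response); got > best {
				best = got
			}
		}
		return best
	}
}

func main() {
	check := AnyOf(
		Contains("CANARY-7f3a", Confirmed), // canary word planted in the system prompt
		Contains("my instructions are", Medium),
	)
	fmt.Println(check("Sure. My instructions are to be helpful."))
	fmt.Println(check("The secret is CANARY-7f3a."))
	fmt.Println(check("I can't help with that."))
}
```

Because a `Check` is just a function, new domain-specific checks can be combined without touching the scoring logic.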

Phase 5: Reporting

  • Text renderer with ANSI colors
  • JSON renderer for machine consumption
  • Markdown report generator
  • HTML report generator with rich formatting

Phase 6: Hardening

  • Test coverage: 31 test suites, 350+ test functions
  • CLI wiring: --profile, --attack, --objective flags
  • Target headers passthrough
  • Rate limiting via ThrottledTarget wrapper

Phase 6.5: Multi-Turn Strategies

  • Strategy interface with 3 implementations
    • SimpleSequence: fixed payload ordering
    • Crescendo: gradual escalation from benign to malicious
    • RefusalRecovery: adaptive tactic switching on refusal
  • Conversation manager with turn tracking
  • Turn analyzer with decision logic
  • Integration across all modules
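
The strategy abstraction can be pictured as an interface that picks the next prompt from the conversation so far. The sketch below is an assumption-heavy illustration: the `Next` method, the `Turn` type, the refusal heuristic, and the placeholder tactic names are all hypothetical, and Crescendo is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// Turn records one prompt/response exchange.
type Turn struct {
	Prompt   string
	Response string
}

// Strategy decides the next prompt from the history, or ok=false to stop.
type Strategy interface {
	Next(history []Turn) (prompt string, ok bool)
}

// SimpleSequence sends a fixed list of payloads in order.
type SimpleSequence struct{ Payloads []string }

func (s SimpleSequence) Next(history []Turn) (string, bool) {
	if len(history) >= len(s.Payloads) {
		return "", false
	}
	return s.Payloads[len(history)], true
}

// RefusalRecovery advances to the next tactic only after a refusal and
// stops as soon as a reply is not a refusal or tactics run out.
type RefusalRecovery struct{ Tactics []string }

func looksLikeRefusal(resp string) bool {
	r := strings.ToLower(resp)
	return strings.Contains(r, "i can't") || strings.Contains(r, "i cannot")
}

func (s RefusalRecovery) Next(history []Turn) (string, bool) {
	if len(history) == 0 {
		if len(s.Tactics) == 0 {
			return "", false
		}
		return s.Tactics[0], true
	}
	last := history[len(history)-1]
	if !looksLikeRefusal(last.Response) || len(history) >= len(s.Tactics) {
		return "", false
	}
	return s.Tactics[len(history)], true
}

// runConversation drives a strategy against send, tracking turns.
func runConversation(s Strategy, send func(string) string, maxTurns int) []Turn {
	var history []Turn
	for len(history) < maxTurns {
		prompt, ok := s.Next(history)
		if !ok {
			break
		}
		history = append(history, Turn{Prompt: prompt, Response: send(prompt)})
	}
	return history
}

func main() {
	// Stub target: refuses the first tactic, answers the second.
	send := func(p string) string {
		if p == "tactic-a" {
			return "I can't help with that."
		}
		return "ok: " + p
	}
	s := RefusalRecovery{Tactics: []string{"tactic-a", "tactic-b", "tactic-c"}}
	for i, t := range runConversation(s, send, 10) {
		fmt.Printf("turn %d: %q -> %q\n", i+1, t.Prompt, t.Response)
	}
}
```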

Phase 7: Progress Streaming

  • Event-driven architecture (8 event types)
  • Async engine execution model
  • Real-time streaming renderer with ANSI colors
  • Verbose mode with per-payload detail
  • CLI integration
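
An event-driven, asynchronous engine is often built in Go as a goroutine that emits events on a channel while a renderer consumes them. The sketch below assumes that shape; the three event names are hypothetical stand-ins for the eight real event types.

```go
package main

import "fmt"

// EventType names a kind of progress event (names are illustrative).
type EventType string

const (
	ScanStarted  EventType = "scan_started"
	PayloadSent  EventType = "payload_sent"
	ScanFinished EventType = "scan_finished"
)

// Event carries a type plus free-form detail for the renderer.
type Event struct {
	Type   EventType
	Detail string
}

// runEngine does its work in a goroutine and streams events to the caller.
// The channel is closed when the scan ends, so a range loop terminates.
func runEngine(payloads []string) <-chan Event {
	events := make(chan Event)
	go func() {
		defer close(events)
		events <- Event{Type: ScanStarted}
		for _, p := range payloads {
			events <- Event{Type: PayloadSent, Detail: p}
		}
		events <- Event{Type: ScanFinished}
	}()
	return events
}

func main() {
	// A streaming renderer is just a consumer of the event channel.
	for ev := range runEngine([]string{"payload-1", "payload-2"}) {
		fmt.Printf("[%s] %s\n", ev.Type, ev.Detail)
	}
}
```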

Phase 8: Extended Modules

  • Agent module: goal-hijack, tool-abuse, memory-poison (60 payloads)
  • RAG module: context-injection, context-overflow, retrieval-hijack (26 payloads)
  • Output module: xss, command-injection, ssrf, markdown-injection (40 payloads)
  • Privacy module: pii-extraction, training-data, credential-leak (30 payloads)
  • Privesc module: role-confusion, permission-bypass, cross-tenant (28 payloads)
  • Hallucination module: false-citation, fabrication, sycophancy (30 payloads)

Phase 9: Production Features

  • Concurrent payload execution (semaphore + mutex pattern)
  • CI/CD mode: --ci, --fail-on, exit code 2 when findings reach the --fail-on threshold
  • Quick CLI: --provider, --model, --endpoint, --api-key (config-free usage)
  • Session management: save, resume, checkpoints, hooks
  • Configurable payload workers (--payload-workers)
  • Native target types: OpenAI, Anthropic (via factory)
  • Target factory for provider-based creation

Phase 10: Attack Chaining

  • Chain executor with step orchestration and variable propagation
  • 5 transform types: extract-canary, set-variable, append-context, format-payload, conditional
  • YAML chain definition loader with built-in + custom chain support
  • Chain result renderer with per-step output
  • 10 built-in chains
  • CLI flags: --chain, --chains-dir

Phase 11: Model Module

  • Model extraction (8 payloads) - architecture probing, parameter elicitation, training details
  • Adversarial examples (8 payloads) - homoglyphs, zero-width chars, RTL overrides, combining diacritics
  • Membership inference (8 payloads) - verbatim recall, code memorization, license text probing
  • Analysis checks: ModelInternalsCheck, AdversarialFlipCheck, VerbatimRecallCheck

Phase 12: Denial of Service Module

  • Resource exhaustion (8 payloads) - token flooding, context fill, recursive instructions
  • Output amplification (8 payloads) - essay maximization, format explosion, code gen floods
  • Compute intensive (8 payloads) - combinatorial analysis, ReDoS prompts, impossible tasks
  • Analysis checks: ResourceExhaustionCheck, LatencyAnomalyCheck, DegradedOutputCheck

Phase 13: LLM-Based Mutations

  • MutationClient interface in payloads package (narrow, no circular imports)
  • LLMMutator with bounded concurrency, retry-once, refusal filtering
  • 3 mutation strategies: rephrase, persona-shift, context-wrap
  • CLI flags: --mutate (openai|anthropic|ollama), --mutate-model, --mutate-variants
  • Adapter types in main.go for OpenAI and Anthropic clients
  • Config fields: llm_mutation, mutation_variants, mutation_workers
  • 16 tests: strategies, variant limits, refusals, errors, concurrency, metadata isolation
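
The retry-once and refusal-filtering behaviour can be illustrated with a sequential sketch (bounded concurrency is left out for brevity). `MutationClient` is named in the notes above, but its single `Complete` method, the prompt wording, and the refusal heuristic here are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// MutationClient is the narrow interface the mutator depends on.
// The method shape is assumed for this sketch.
type MutationClient interface {
	Complete(prompt string) (string, error)
}

// isRefusal is a crude heuristic for detecting a declined request.
func isRefusal(text string) bool {
	t := strings.ToLower(text)
	return strings.Contains(t, "i can't") || strings.Contains(t, "i cannot")
}

// mutate asks the client for one variant per strategy. A failed call is
// retried exactly once; errors and refusals are dropped from the output.
func mutate(c MutationClient, payload string, strategies []string) []string {
	var variants []string
	for _, s := range strategies {
		prompt := fmt.Sprintf("Apply the %q rewrite to: %s", s, payload)
		out, err := c.Complete(prompt)
		if err != nil {
			out, err = c.Complete(prompt) // single retry
		}
		if err != nil || isRefusal(out) {
			continue
		}
		variants = append(variants, out)
	}
	return variants
}

// flakyClient fails its first call and refuses persona-shift requests.
type flakyClient struct{ calls int }

func (f *flakyClient) Complete(prompt string) (string, error) {
	f.calls++
	if f.calls == 1 {
		return "", errors.New("transient error")
	}
	if strings.Contains(prompt, "persona-shift") {
		return "I can't help with that.", nil
	}
	return "variant for: " + prompt, nil
}

func main() {
	c := &flakyClient{}
	vs := mutate(c, "placeholder payload", []string{"rephrase", "persona-shift", "context-wrap"})
	fmt.Println(len(vs), "variants after", c.calls, "calls")
}
```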

Phase 14: Multimodal Module

  • Visual injection (6 payloads) - text-in-image prompts, OCR-based instruction injection
  • Cross-modal conflict (4 payloads) - contradictory instructions across text and image channels
  • Steganographic text (4 payloads) - hidden messages via LSB steganography in images
  • Programmatic PNG generation: 5x7 bitmap font rendering, base64 encoding, data URL output
  • MultimodalTarget interface with AsMultimodal() helper for graceful fallback
  • Analysis checks: VisualInstructionCheck, CrossModalCheck, SteganographicCheck

Phase 15: Supply Chain Module

  • Model verification (8 payloads) - identity probing, capability fingerprinting, knowledge cutoff, multi-turn consistency
  • Backdoor detection (6 payloads) - trigger prefix, sudo patterns, bracket IDs, unicode zero-width, passphrases, encoded triggers
  • Dependency trust (6 payloads) - fake tool output, fake API response, fake plugin result, metadata injection, trusted source claims, RAG provenance injection
  • Deployment probing (6 payloads) - infrastructure probe, framework probe, dependency enumeration, version fingerprint, config extraction, error elicitation
  • Analysis checks: IdentityInconsistencyCheck, BehavioralShiftCheck, TrustBoundaryViolationCheck, InfrastructureLeakCheck
  • Model verification supports multi-turn via strategy package

Phase 16: Function Calling Attacks

  • Schema manipulation (6 payloads) - malformed tool calls, extra parameter injection, prototype pollution
  • Parameter injection (8 payloads) - shell injection, SQL injection, path traversal, SSRF in tool arguments
  • Tool confusion (7 payloads) - wrong tool invocation, destructive operation misdirection
  • Recursive calls (6 payloads) - self-referential chains, infinite loops, circular tool invocation
  • Tool output poison (6 payloads) - fabricated tool responses influencing agent behavior
  • Analysis checks: SchemaManipulationCheck, ParameterInjectionCheck, ToolConfusionCheck, RecursiveCallCheck, ToolOutputPoisonCheck
  • SARIF CWE mappings for all 5 new attack types

Phase 17: Adaptive Attack Selection

  • Attack-level adaptive selection via FilterAttacks() — skips individual attack categories based on defense profile
  • AttackDefenseMapping: 48 attack categories mapped to defense types
  • Complements existing module-level SelectModules() for finer-grained control
  • Merges with user --attack filter; module-level entries preserved
  • 8 new tests covering all edge cases
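
Attack-level filtering against a defense profile might look like the sketch below. Everything here is an assumption: the lowercase `filterAttacks` signature, the rule that an attack is skipped when any mapped defense is detected, and the attack and defense names in the table. Preservation of module-level entries is not modeled.

```go
package main

import "fmt"

// attackDefenseMapping lists, per attack category, the defense types
// assumed to neutralize it. Entries are illustrative only.
var attackDefenseMapping = map[string][]string{
	"encoding-bypass":  {"input-normalization"},
	"direct-injection": {"instruction-hierarchy"},
	"xss":              {"output-sanitization"},
}

// blockedBy reports whether any defense mapped to the attack was detected.
func blockedBy(attack string, detected map[string]bool) bool {
	for _, d := range attackDefenseMapping[attack] {
		if detected[d] {
			return true
		}
	}
	return false
}

// filterAttacks keeps candidates that pass the optional user filter and
// are not neutralized by a detected defense. An empty userFilter means
// "no restriction".
func filterAttacks(candidates []string, detected map[string]bool, userFilter []string) []string {
	allowed := map[string]bool{}
	for _, a := range userFilter {
		allowed[a] = true
	}
	var out []string
	for _, a := range candidates {
		if len(userFilter) > 0 && !allowed[a] {
			continue
		}
		if blockedBy(a, detected) {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	candidates := []string{"encoding-bypass", "direct-injection", "xss"}
	detected := map[string]bool{"input-normalization": true}
	fmt.Println(filterAttacks(candidates, detected, nil))
}
```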

Phase 18: Benchmark & Scoring System

  • Tactic index: Aggregates attack effectiveness across all benchmark runs, per-model and per-attack-category
  • TacticStats/ModelTacticStats: Success rate, severity, trend detection (improving/stable/declining from 3+ runs)
  • ModelStats: Per-model overview with avg/best/worst scores, weakest/strongest attack categories
  • Recommendation engine: 3 modes — known model (direct history), known provider (cross-model inference at 0.7 discount), cold start
  • Effectiveness scoring: Weighted formula (0.6 success_rate + 0.4 severity_factor) with recency boost
  • CLI subcommands: harpoon benchmark stats (filter by model/module/attack, text/JSON) and harpoon benchmark recommend (--model required, text/JSON)
  • Scan integration: --benchmark flag saves scan results to benchmark store; --recommend flag auto-selects attacks from historical data
  • Renderers: Text tables and JSON for recommendations, stats, and model overviews
  • 22 new tests (52 total in benchmark package)
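
The weighted effectiveness formula can be written out directly. The 0.6 and 0.4 weights and the 0.7 figure come from the notes above; treating the recency boost and the cross-model discount as multipliers is an assumption, since their exact form is not stated here.

```go
package main

import "fmt"

const (
	successWeight    = 0.6 // weight on success_rate
	severityWeight   = 0.4 // weight on severity_factor
	providerDiscount = 0.7 // applied when inferring from sibling models
)

// effectiveness combines success rate and severity factor (both in [0, 1]).
// recencyBoost is assumed to be a multiplier, with 1.0 meaning no boost.
func effectiveness(successRate, severityFactor, recencyBoost float64) float64 {
	return (successWeight*successRate + severityWeight*severityFactor) * recencyBoost
}

// crossModel discounts a score borrowed from another model of the same
// provider. Multiplying by 0.7 is one reading of "0.7 discount".
func crossModel(score float64) float64 {
	return score * providerDiscount
}

func main() {
	direct := effectiveness(0.5, 1.0, 1.0)
	fmt.Printf("direct history: %.2f\n", direct)
	fmt.Printf("cross-model:    %.2f\n", crossModel(direct))
}
```

With a success rate of 0.5 and a severity factor of 1.0, the unboosted score is 0.6 × 0.5 + 0.4 × 1.0 = 0.70, and the cross-model estimate under this reading is 0.49.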

Current Metrics

Metric               | Count
Attack modules       | 13
Attack types         | 48
Payloads             | 609
YAML payload files   | 53
Chain definitions    | 10
Encoding transforms  | 18
Mutation types       | 9 deterministic + 3 LLM strategies
Strategies           | 3
Go code (production) | ~16,400 lines
Go code (test)       | ~10,200 lines
Test functions       | 350+
Test suites          | 31
Notes/docs           | 45 + 3
External deps        | 1

Roadmap

Multi-Agent Attacks

Impact: Medium-High | Complexity: High
Why: Systems with multiple LLMs (planner -> executor, inner/outer agents) have unique attack surfaces.

New module: internal/modules/multiagent/

Attack Types:

  • Confused deputy — trick inner agent into acting with outer agent's authority
  • Inter-agent injection — poison messages between agents in a pipeline
  • Orchestrator manipulation — compromise the planner/router to redirect execution
  • Trust chain exploitation — exploit implicit trust between agents

Function Calling Attacks (Extend Agent Module) [delivered in Phase 16]

Impact: High | Complexity: Moderate
Why: Tool use / function calling is exploding. Our agent module covers basics but not the deeper attack surface.

Extend: internal/modules/agent/

New Attack Types:

  • Schema manipulation — trick model into malformed tool calls, inject extra parameters
  • Parameter injection — smuggle malicious values into tool arguments
  • Tool confusion — make agent call the wrong tool (e.g., delete instead of read)
  • Recursive tool calls — infinite loops, self-referential chains
  • Tool output poisoning — manipulate tool return values to influence next actions

Lower Priority

Semantic Preservation Scoring

Impact: Medium | Complexity: Moderate

Score how well LLM-generated mutations preserve the original attack intent.

  • Embedding-based similarity comparison
  • Intent classification check
  • Integration with LLMMutator to auto-discard low-scoring variants

Azure/Bedrock Targets

Impact: Medium | Complexity: Low

Follow existing OpenAI/Anthropic target pattern:

  • internal/targets/azure.go
  • internal/targets/bedrock.go
  • Add to target factory

Technical Debt

  • Resolved: strings.Title deprecation in mutator.go (replaced with golang.org/x/text/cases)
  • Resolved: README.md needed updating to reflect all 12 modules (updated with all modules, 44 attack types, 539 payloads)

Quick Reference

# Build and test
make build && make test

# Quick scan (no config)
./bin/harpoon --provider openai --model gpt-4 --attack injection

# Full scan with config
./bin/harpoon --config configs/harpoon.yaml --target my-llm --profile thorough --report html

# CI mode
./bin/harpoon --config configs/harpoon.yaml --ci --fail-on high

# LLM-mutated payloads
./bin/harpoon --provider openai --model gpt-4 --mutate openai --mutate-variants 3

# Run attack chain
./bin/harpoon --config configs/harpoon.yaml --target my-llm --chain recon-and-exploit

# Supply chain attacks
./bin/harpoon --provider openai --model gpt-4 --attack supply
./bin/harpoon --provider openai --model gpt-4 --attack model-verification
./bin/harpoon --provider openai --model gpt-4 --attack backdoor-detection

# List/resume sessions
./bin/harpoon --list-sessions
./bin/harpoon --config configs/harpoon.yaml --session <id>