Add prompt-injection multimodal and gateway evidence gates #1604

Closed
wowsofine wants to merge 1 commit into UnitOneAI:main from wowsofine:improve/prompt-injection-multimodal-gateway

Conversation

@wowsofine

Skill Improvement ($50-150 Bounty)

Skill Modified

Skill name: prompt-injection
Skill path: skills/ai-security/prompt-injection/SKILL.md

What Was Wrong

Issue #1437 identifies three coverage gaps: multimodal prompt injection through vision/audio/media parsing, LLM gateway or AI firewall evidence, and cross-agent prompt injection. The existing skill focused on direct and indirect text paths, so reviewers could miss instructions extracted from images, OCR, speech-to-text, video frames, or delegated agent handoffs.

What This PR Fixes

  • Adds multimodal input channels to the interaction-surface map.
  • Adds Multimodal Injection and Cross-Agent Prompt Injection test categories.
  • Adds LLM Gateway / AI Firewall, multimodal parsing, and cross-agent trust-boundary evidence gates.
  • Extends finding output fields with vector, source modality, and control-gap details.
  • Adds MITRE ATLAS direct and indirect prompt-injection references and common pitfalls.
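As a rough illustration of the extended finding output, the new fields could be carried in a record like the one below. This is a sketch only: the class name `Finding`, the field names, and the `Modality` enum are assumptions made for illustration and are not taken from SKILL.md. The technique ID is one of the two that appear in the verification command later in this description.

```python
from dataclasses import dataclass, asdict
from enum import Enum


class Modality(str, Enum):
    """Hypothetical set of input modalities a finding can originate from."""
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names are illustrative only."""
    title: str
    vector: str               # e.g. "direct", "indirect", "multimodal", "cross-agent"
    source_modality: Modality
    control_gap: str          # which control was missing or bypassed
    atlas_technique: str      # MITRE ATLAS technique ID


finding = Finding(
    title="OCR text reaches the model unfiltered",
    vector="multimodal",
    source_modality=Modality.IMAGE,
    control_gap="text filter runs before OCR, not after",
    atlas_technique="AML.T0051.001",
)

# Plain dict form, convenient for a report or a JSON export.
record = asdict(finding)
```

Keeping the vector and the source modality as separate fields lets a reviewer distinguish, say, an indirect injection arriving as text from one arriving through an image.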

Evidence

Before (skill misses this / false positive on this):

A user uploads a screenshot or audio clip containing instructions. OCR, captioning, or transcription output is passed to the LLM after the text-only filters have already run.
A worker agent summarizes poisoned web content, and the orchestrator treats that summary as privileged instructions.
A product says it has an AI firewall, but reviewers do not capture which routes, modalities, tools, and failure modes it enforces.
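The first scenario above comes down to ordering: the text filter runs before the parser, so parser output is never inspected. A minimal sketch of the alternative, assuming a hypothetical `ParsedSegment` type and a deliberately tiny deny pattern (neither comes from the skill), labels every segment with its source and applies the check after parsing:

```python
import re
from dataclasses import dataclass

# Hypothetical deny pattern for illustration; a real filter would be far broader.
SUSPICIOUS = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)


@dataclass(frozen=True)
class ParsedSegment:
    """Text handed to the model, labeled with where it came from."""
    text: str
    source: str  # e.g. "user_text", "ocr", "speech_to_text"


def looks_like_injection(text: str) -> bool:
    return SUSPICIOUS.search(text) is not None


def gate_segments(segments: list[ParsedSegment]) -> list[ParsedSegment]:
    """Check every segment after parsing, regardless of its source modality."""
    blocked = [s for s in segments if looks_like_injection(s.text)]
    if blocked:
        sources = ", ".join(sorted({s.source for s in blocked}))
        # Refuse the whole request rather than forwarding partial content.
        raise ValueError(f"possible injection in parsed content from: {sources}")
    return segments
```

Because the source label travels with the text, a rejected request can report which parser produced the suspicious content, which is the kind of evidence a reviewer would want to capture.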

After (now correctly handled):

The skill requires media modality inventory, parser/source labels, post-parser policy checks, gateway coverage boundaries, fail-closed behavior, and cross-agent capability verification.
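Cross-agent capability verification with fail-closed behavior might look roughly like the sketch below. The role names, the `ALLOWED_TOOLS` allowlist, and the `Handoff` type are all hypothetical; the point is only that the orchestrator decides capabilities from the sender's role, not from anything the handoff content says.

```python
from dataclasses import dataclass

# Hypothetical allowlist: which tools each agent role may request.
ALLOWED_TOOLS = {
    "orchestrator": {"search", "send_email"},
    "web_summarizer": {"search"},
}


@dataclass(frozen=True)
class Handoff:
    """A message passed from one agent to another."""
    sender_role: str
    content: str                     # untrusted data, never treated as instructions
    requested_tools: frozenset[str]


def authorize(handoff: Handoff) -> frozenset[str]:
    """Return the tools that may run on behalf of this handoff."""
    allowed = ALLOWED_TOOLS.get(handoff.sender_role)
    if allowed is None:
        # Fail closed: unknown senders get no capabilities.
        return frozenset()
    if handoff.requested_tools - allowed:
        # Fail closed on any over-reach instead of granting a subset.
        return frozenset()
    return handoff.requested_tools
```

For example, a `web_summarizer` handoff that requests `send_email` gets an empty capability set, so a poisoned summary cannot escalate into an outbound action even if the orchestrator's model is persuaded by its content.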

Test Cases Added/Updated

  • Added vulnerable test cases (tests/vulnerable/)
  • Added benign test cases (tests/benign/)
  • Existing tests still pass / not applicable: documentation-only skill guidance update.

Bounty Tier

  • Minor ($50) - Doc update, small logic tweak, typo fix
  • Moderate ($100) - New edge case coverage, FP reduction with evidence
  • Substantial ($150) - Rewritten detection logic, major coverage expansion

Verification

  • git diff --check
  • Required frontmatter field check across skills/ and roles/
  • Prompt-injection pattern scan equivalent to .github/workflows/injection-scan.yml
  • rg -n "Multimodal Injection|LLM Gateway / AI Firewall Evidence|Cross-Agent Prompt Injection|Multimodal Parsing Constraints|AML.T0051.000|AML.T0051.001|version: \"1.0.3\"" skills/ai-security/prompt-injection/SKILL.md

Bounty Info

  • I have read and agree to the CONTRIBUTING.md bounty terms
  • Preferred payment method: GitHub Sponsors, PayPal, or crypto; details can be provided privately after maintainer acceptance.

/claim #1437

@wowsofine
Author

Withdrawing this PR to avoid duplicate implementation noise. Earlier PRs #1461 and #1549 already target #1437, and #1461 includes a direct /claim plus fixture coverage. I will not pursue bounty consideration for this duplicate PR.

@wowsofine closed this on Jun 7, 2026