Add prompt-injection multimodal and gateway evidence gates by wowsofine · Pull Request #1604 · UnitOneAI/SecuritySkills

wowsofine · 2026-06-07T13:32:07Z

Skill Improvement ($50-150 Bounty)

Skill Modified

Skill name: prompt-injection
Skill path: skills/ai-security/prompt-injection/SKILL.md

What Was Wrong

Issue #1437 identifies three coverage gaps: multimodal prompt injection through vision/audio/media parsing, LLM gateway or AI firewall evidence, and cross-agent prompt injection. The existing skill focused on direct and indirect text paths, so reviewers could miss instructions extracted from images, OCR, speech-to-text, video frames, or delegated agent handoffs.

What This PR Fixes

Adds multimodal input channels to the interaction-surface map.
Adds Multimodal Injection and Cross-Agent Prompt Injection test categories.
Adds LLM Gateway / AI Firewall, multimodal parsing, and cross-agent trust-boundary evidence gates.
Extends finding output fields with vector, source modality, and control-gap details.
Adds MITRE ATLAS direct and indirect prompt-injection references and common pitfalls.
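The extended finding fields listed above could be captured in a record like the following. This is a minimal sketch, not the schema in SKILL.md: the class name `Finding`, the `Vector` enum, and every field name are hypothetical, and mapping the OCR case to the indirect ATLAS sub-technique is an assumption made for illustration.

```python
from dataclasses import dataclass, asdict
from enum import Enum


class Vector(str, Enum):
    """Hypothetical injection vectors, mirroring the categories named in this PR."""
    DIRECT = "direct"
    INDIRECT = "indirect"
    MULTIMODAL = "multimodal"
    CROSS_AGENT = "cross-agent"


@dataclass(frozen=True)
class Finding:
    """Illustrative finding record; field names are assumptions, not the skill's schema."""
    title: str
    vector: Vector
    source_modality: str   # e.g. "text", "image-ocr", "audio-transcript"
    control_gap: str       # which control was missing or bypassed
    atlas_technique: str = ""


finding = Finding(
    title="OCR text reaches the model unfiltered",
    vector=Vector.MULTIMODAL,
    source_modality="image-ocr",
    control_gap="Text filter runs before OCR; no post-parser check",
    atlas_technique="AML.T0051.001",  # assumed mapping to the indirect sub-technique
)

# Convert to a plain dict, e.g. for a report row.
record = asdict(finding)
```

Keeping the vector and the source modality as separate fields lets a reviewer record, for example, a cross-agent finding whose payload originated in an image.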

Evidence

Before (skill misses this / false positive on this):

A user uploads a screenshot or audio clip containing instructions. OCR, captioning, or transcription output is passed to the LLM after the text-only filters have already run, so the extracted text is never checked.
A worker agent summarizes poisoned web content, and the orchestrator treats that summary as privileged instructions.
A product says it has an AI firewall, but reviewers do not capture which routes, modalities, tools, and failure modes it enforces.

After (now correctly handled):

The skill requires media modality inventory, parser/source labels, post-parser policy checks, gateway coverage boundaries, fail-closed behavior, and cross-agent capability verification.
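One way to picture the post-parser policy check, the parser/source labels, and the fail-closed behavior is the sketch below. It is illustrative only and not taken from the skill: `ParsedContent`, `policy_check`, `admit`, and the single regular expression are hypothetical stand-ins, and a real gateway would apply far richer policy than one pattern.

```python
import re
from dataclasses import dataclass

# Hypothetical deny pattern; a stand-in for whatever policy the gateway enforces.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE
)


@dataclass(frozen=True)
class ParsedContent:
    """Text extracted from media or handed over by another agent."""
    text: str
    modality: str  # e.g. "image-ocr", "audio-transcript", "agent-summary"
    source: str    # parser or agent identifier


def policy_check(text: str) -> bool:
    """Return True when the text passes the (toy) policy."""
    return SUSPICIOUS.search(text) is None


def admit(content: ParsedContent) -> str:
    """Check content after parsing and return it wrapped as labeled data.

    Fails closed: a policy violation or an error inside the check both block
    the content instead of letting it through.
    """
    try:
        allowed = policy_check(content.text)
    except Exception:
        allowed = False
    if not allowed:
        raise PermissionError(
            f"blocked {content.modality} content from {content.source}"
        )
    # Label the text so downstream prompts can treat it as data, not instructions.
    return (
        f"<untrusted modality={content.modality!r} source={content.source!r}>\n"
        f"{content.text}\n"
        f"</untrusted>"
    )
```

The same function can sit on the cross-agent path: passing a worker agent's summary through `admit` with `modality="agent-summary"` keeps the orchestrator from treating it as privileged instructions.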

Test Cases Added/Updated

Added vulnerable test cases (tests/vulnerable/)
Added benign test cases (tests/benign/)
Existing tests still pass / not applicable: documentation-only skill guidance update.

Bounty Tier

Minor ($50) - Doc update, small logic tweak, typo fix
Moderate ($100) - New edge case coverage, FP reduction with evidence
Substantial ($150) - Rewritten detection logic, major coverage expansion

Verification

git diff --check
Required frontmatter field check across skills/ and roles/
Prompt-injection pattern scan equivalent to .github/workflows/injection-scan.yml
rg -n "Multimodal Injection|LLM Gateway / AI Firewall Evidence|Cross-Agent Prompt Injection|Multimodal Parsing Constraints|AML.T0051.000|AML.T0051.001|version: \"1.0.3\"" skills/ai-security/prompt-injection/SKILL.md

Bounty Info

I have read and agree to the CONTRIBUTING.md bounty terms
Preferred payment method: GitHub Sponsors, PayPal, or crypto; details can be provided privately after maintainer acceptance.

/claim #1437

wowsofine · 2026-06-07T14:05:53Z

Withdrawing this PR to avoid duplicate implementation noise. Earlier PRs #1461 and #1549 already target #1437, and #1461 includes a direct /claim plus fixture coverage. I will not pursue bounty consideration for this duplicate PR.

Improve prompt injection multimodal gates

428df67

wowsofine closed this Jun 7, 2026

wowsofine wants to merge 1 commit into UnitOneAI:main from wowsofine:improve/prompt-injection-multimodal-gateway
