A practical reference for securing autonomous AI agents against real-world attacks.
Autonomous AI agents — systems that can read data, use tools, and take action without constant human supervision — are increasingly deployed in production. They also represent a new and largely unsolved attack surface.
This repository documents the threat landscape and provides actionable defense patterns for builders, operators, and security researchers working with agentic AI systems.
Important: Prompt injection is not a solved problem. Input filtering alone fails against adaptive attacks with >85% success rates. The defenses here are about reducing risk, not eliminating it.
- Agent builders: Developers implementing autonomous systems (tool-using LLMs, multi-agent frameworks, coding assistants)
- Security engineers: Teams responsible for securing AI deployments
- Operators: People running agents in production who need to understand what can go wrong
- Researchers: Anyone studying adversarial robustness in agentic AI systems
Start with the threat model, then review defenses for your risk profile:
-
THREAT-MODEL.md — Comprehensive catalog of attacks against autonomous agents, organized by attack surface (Input, Tools, Memory, Credentials, Communication, Supply Chain). Read this first to understand what you're defending against.
-
DEFENSES.md — Practical mitigations mapped to each threat. Includes implementation guidance, effectiveness assessments, and honest trade-offs.
-
ARCHITECTURE.md — Zero-trust reference architecture for autonomous agents. Shows how to structure privilege separation, credential isolation, and monitoring.
-
CHECKLIST.md — Copy-paste security checklist. Use this to audit your deployment before going to production.
This documentation is built on several foundational concepts:
An agent should satisfy no more than two of these three properties:
- (A) Processes untrustworthy inputs
- (B) Accesses sensitive systems or data
- (C) Changes state or communicates externally
If all three are required, require human approval for sensitive operations.
The combination of:
- Access to private data
- Exposure to untrusted content
- Ability to communicate externally
...creates a direct path to data exfiltration. Avoid this combination without strong controls.
The OWASP Agentic AI Security Working Group (Dec 2025) identified these critical risks:
- Agent Goal Hijack (prompt injection → exfiltration)
- Tool Misuse (legitimate tools used destructively)
- Identity & Privilege Abuse (credential leakage)
- Agentic Supply Chain Vulnerabilities (poisoned skills/MCP)
- Unexpected Code Execution (arbitrary code via natural language)
- Memory & Context Poisoning (persistent manipulation)
- Insecure Inter-Agent Communication (spoofed messages)
- Cascading Failures (automated error amplification)
- Human-Agent Trust Exploitation (confident hallucinations)
- Rogue Agents (misalignment and concealment)
All threats in this repository map to one or more of these categories.
The attacks documented here are not theoretical:
- GitHub Copilot Chat (CVSS 9.6): Indirect prompt injection via repository files
- Gemini memory attack: Persistent context poisoning across sessions
- Replit agent meltdown: Cascading tool misuse leading to resource exhaustion
- Amazon Q: Unintended code execution via natural language
- AutoGPT RCE: Remote code execution through compromised plugins
These incidents inform the threat model and defense recommendations.
This documentation follows these principles:
Honest about limitations: We don't oversell defenses. Prompt injection isn't solved. Filtering is unreliable. Zero-day risks exist. We say so clearly.
Practical over theoretical: Every defense can be implemented today. We provide concrete examples, not abstract principles.
Engineering mindset: Written for practitioners, not compliance checkboxes. We document trade-offs, not perfect solutions.
Evidence-based: References to real incidents, published research, and production experience.
This work synthesizes research and insights from:
- OWASP Agentic AI Security Working Group (December 2025 Top 10)
- Meta AI Research: "Rule of Two" privilege separation framework
- Simon Willison: Prompt injection research and the "Lethal Trifecta" concept
- Adversarial robustness research: Papers on adaptive attacks against AI systems
- Real-world incidents: GitHub, Google, Replit, Amazon, and open-source agent frameworks
See individual documents for detailed citations.
This is a living document. Contributions welcome:
- New threats: Documented attacks we've missed
- Defense improvements: Better mitigations or implementation details
- Real-world incidents: Case studies from production deployments
- Corrections: Errors, outdated info, or unclear guidance
Submit issues or pull requests. Please include:
- Concrete examples where applicable
- References to source material
- Honest assessment of effectiveness/limitations
MIT License - see LICENSE for details.
This documentation is provided as-is for educational and operational purposes. No warranty of security is implied by following these patterns.
Disclaimer: Autonomous agents operate in a rapidly evolving threat landscape. These patterns reduce risk but cannot eliminate it. Security is a continuous process, not a destination. Treat this as a starting point, not a complete solution.