This document provides a detailed explanation of the system architecture, design decisions, and implementation details.
┌─────────────────────────────────────────────────────────────────┐
│ Layer 1: LLM Decision Making (Main PC) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ LM Studio │ │
│ │ • Model: qwen2.5-coder-14b-instruct-abliterated │ │
│ │ • Inference: temp=0.4, min_p=0.08, repeat_penalty=1.08 │ │
│ │ • API: OpenAI-compatible /v1/chat/completions │ │
│ └───────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│ HTTP POST (chat completions)
↓
┌─────────────────────────────────────────────────────────────────┐
│ Layer 2: Knowledge Base (K3s Cluster) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ MCP RAG Server │ │
│ │ • Protocol: Model Context Protocol (JSON-RPC 2.0) │ │
│ │ • Index: FAISS (L2 distance, 5,395 documents) │ │
│ │ • Embeddings: all-MiniLM-L6-v2 (384 dimensions) │ │
│ │ • Sources: GTFOBins, Atomic Red Team, HackTricks │ │
│ └───────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│ MCP Protocol (tools/call)
↑
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: Autonomous Agent (K3s Pod - Isolated) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Red Team Agent Process │ │
│ │ ┌─────────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ MCP Client │→ │ LLM Client │→ │ Command │ │ │
│ │ │ (Query RAG) │ │ (Get Plan) │ │ Sandbox │ │ │
│ │ └─────────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ BlackArch Toolkit (2000+ tools) │ │ │
│ │ │ • hydra, nmap, metasploit, sqlmap, etc. │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ NetworkPolicy (Kernel-level Isolation) │ │
│ │ • Egress: ONLY target + MCP + LLM + DNS │ │
│ │ • Ingress: DENY ALL │ │
│ └───────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│ SSH (port 22)
↓
┌─────────────────────────────────────────────────────────────────┐
│ Layer 4: Target Machine │
│ • Intentionally vulnerable system │
│ • Weak credentials, SUID binaries, misconfigurations │
└─────────────────────────────────────────────────────────────────┘
Role: Provides the "brain" for agent decision-making
Technology Stack:
- LM Studio 0.3.28+ (local LLM inference)
- Model: Qwen 2.5 Coder 14B Instruct (abliterated variant)
- API: OpenAI-compatible REST API
Why Qwen 2.5 Coder Abliterated?
- Uncensored: Standard models refuse security-related queries ("I cannot help with hacking")
- Code-focused: Optimized for command syntax and technical tasks
- Good reasoning: 14B parameters sufficient for attack planning
- Local: No data leaves infrastructure (privacy + air-gap compatible)
Inference Configuration:
{
"temperature": 0.4, # Low = deterministic command generation
"min_p": 0.08, # Dynamic sampling (adapts to model confidence)
"top_p": 1.0, # Disabled (conflicts with min_p)
"top_k": 0, # Disabled (min_p is superior)
"repeat_penalty": 1.08, # Prevents "nmap... nmap... nmap..." loops
"stream": False, # Required for reliable text parsing
"cache_prompt": True, # Cache system prompt for speed
"max_tokens": 2048 # Prevent runaway generation
}Why These Settings?
- Temperature 0.4: Research shows lower temp (0.3-0.5) better for tool use than default 0.7
- Min-P 0.08: Dynamically filters tokens based on top token probability (better than static top_p)
- Stream: False: Easier to parse complete responses vs streaming tokens
Known Limitation:
- Tool calling API broken on abliterated models (see
KNOWN-ISSUES.md) - Workaround: Agent-orchestrated pattern with text parsing
Role: Provides offensive security knowledge via semantic search
Technology Stack:
- Web Framework: Flask 3.0.0 + Gunicorn (sync workers)
- Vector Database: FAISS (Facebook AI Similarity Search)
- Embeddings: sentence-transformers (all-MiniLM-L6-v2, 384 dims)
- Protocol: Model Context Protocol (MCP) JSON-RPC 2.0
Why Flask + Gunicorn (not FastAPI + Uvicorn)?
- K3s containerd blocks
socketpair()syscall - FastAPI + Uvicorn requires socketpair for asyncio event loop
- Flask + sync workers = no socketpair needed
- Lesson: "Boring technology" wins in restricted environments
Data Sources (5,395 documents total):
- GTFOBins (~200 files): Unix binary exploitation for privilege escalation
- Atomic Red Team (~5,000 tests): MITRE ATT&CK-mapped adversary emulation
- HackTricks (~200 pages): Offensive security playbooks and techniques
FAISS Index Configuration:
dimension = 384 # all-MiniLM-L6-v2 embedding size
index = faiss.IndexFlatL2(dimension) # L2 distance (Euclidean)Why L2 Distance?
- Simple, fast, works well for cybersecurity technical content
- Alternatives (cosine similarity) showed minimal improvement in testing
- FAISS IndexFlatL2 supports exact search (no approximation errors)
MCP Tools Exposed:
-
search(query: str, top_k: int = 5)
- Semantic search over entire knowledge base
- Returns ranked results with source metadata
- Example:
search("SSH brute force weak password")
-
list_techniques()
- Returns all 327 indexed MITRE ATT&CK techniques
- Useful for agent to explore available tactics
API Endpoint:
- Internal:
http://mcp-rag-server.default.svc.cluster.local:8080 - External:
http://192.168.1.181:30800(NodePort)
Role: Orchestrates attack loop (query → plan → execute → observe → iterate)
Architecture Pattern: Agent-Orchestrated (not LLM-orchestrated)
Why Agent-Orchestrated?
# Agent controls the loop
while not objective_achieved:
knowledge = query_mcp_rag(objective)
plan = llm.generate(prompt_with_knowledge)
commands = extract_commands_from_text(plan)
result = execute_with_sandbox(commands[0])
# vs LLM-orchestrated (doesn't work with abliterated models)
# response = llm.chat(messages, tools=[query_rag_tool, execute_command_tool])
# for tool_call in response.tool_calls: # <-- BROKEN on abliterated models
# execute_tool(tool_call)Benefits:
- ✅ Works with abliterated models (essential for uncensored security research)
- ✅ More transparent (can inspect exact prompts and responses)
- ✅ Easier to debug (print/log every step)
- ✅ More flexible (adjust extraction patterns without model changes)
Components:
class MCPClient:
def __init__(self, mcp_url: str):
self._initialize() # Handshake: initialize → notifications/initialized → tools/list
def search(self, query: str, top_k: int = 3) -> List[Dict]:
# Query knowledge base via MCP tools/callMCP Handshake:
initialize- Exchange protocol version and capabilitiesnotifications/initialized- Confirm readytools/list- Discover available toolstools/call- Execute search/list_techniques
class LLMClient:
def generate(self, prompt: str, system: str = None) -> str:
# Call LM Studio /v1/chat/completions with optimized paramsPrompt Structure:
System: [Comprehensive methodology with 5 phases]
User:
Relevant techniques from knowledge base:
[RAG results - top 3 documents]
Objective: Gain SSH access using weak credentials
Target: 192.168.1.99
Based on the knowledge base, provide ONE specific command.
class CommandSandbox:
WHITELIST = [
'ssh', 'hydra', 'nmap', 'metasploit', ... # 2000+ BlackArch tools
]
BLACKLIST_PATTERNS = [
r'rm\s+-rf\s+/', # Prevent destruction
r'dd\s+if=.*of=/dev/', # Prevent disk wipe
r'mkfs', # Prevent filesystem format
...
]
def execute(self, command: str) -> Tuple[str, int]:
if not self.is_safe(command):
return "BLOCKED", -1
result = subprocess.run(command, shell=True, capture_output=True)
self.log(command, result.stdout, result.returncode)
return result.stdout, result.returncodeSafety Features:
- Whitelist: Only known tools allowed
- Blacklist: Destructive patterns blocked
- Logging: Every command recorded with timestamp
- Timeout: 30s max execution time
- Non-root: Runs as UID 1000
class RedTeamAgent:
def attack_cycle(self, objective: str, max_iterations: int = 5):
for iteration in range(max_iterations):
# 1. Query knowledge base
knowledge = self.query_knowledge_base(objective)
# 2. Ask LLM for plan
plan = self.ask_llm(plan_prompt, context=knowledge)
# 3. Extract commands from freeform text
commands = self._extract_commands(plan)
# 4. Check for repetition (prevent infinite loops)
if self._is_repeating(commands[0]):
# Query RAG for alternative techniques
alt_knowledge = self.query_knowledge_base(f"{objective} alternative")
plan = self.ask_llm("Suggest DIFFERENT approach", context=alt_knowledge)
commands = self._extract_commands(plan)
# 5. Execute command
self.command_history.append(commands[0])
output, code = self.execute_command(commands[0])
# 6. Check success
if code == 0 and success_pattern_matches(output):
print("SUCCESS!")
break
# 7. Ask LLM what to try next (with exit code feedback)
feedback = f"Exit code: {code} {'✅' if code == 0 else '❌'}\nOutput: {output[:1024]}"
next_action = self.ask_llm(feedback)Key Features:
- Repetition Detection: Tracks last 3 commands, auto-queries RAG for alternatives if stuck
- Exit Code Feedback: LLM learns from success (0) vs failure (non-zero)
- Output Truncation: 1024 chars max to LLM (prevents context overflow)
- Command History: Full log saved to
/var/log/agent/commands.log
Role: Provides offensive security toolkit in isolated environment
Base Image: archlinux:latest
Installed Tools (highlights):
- Password Cracking: hydra, john, hashcat, medusa
- Network Scanning: nmap, masscan, tcpdump, wireshark-cli
- Web Tools: sqlmap, nikto, dirb, gobuster, wfuzz
- Exploitation: metasploit, msfvenom
- Wireless: aircrack-ng, airodump-ng, aireplay-ng
- Enumeration: enum4linux, smbclient, rpcclient
- +1990 more via BlackArch repository
Dockerfile Key Sections:
FROM archlinux:latest
# Install BlackArch repository
RUN curl -O https://blackarch.org/strap.sh && \
chmod +x strap.sh && \
./strap.sh
# Install essential tools (2GB+ install)
RUN pacman -Sy --noconfirm \
python python-pip \
openssh nmap netcat curl wget \
hydra sshpass nikto sqlmap \
metasploit john hashcat \
aircrack-ng wireshark-cli tcpdump
# Python dependencies
RUN pip install --no-cache-dir requests --break-system-packages
# Non-root user
RUN useradd -m -u 1000 agent
USER agent
CMD ["python", "/app/agent.py"]Security Context (in K8s manifest):
securityContext:
runAsUser: 1000
runAsNonRoot: true
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
seccompProfile:
type: RuntimeDefaultWhy Non-Root?
- Defense in depth: Even if agent escapes sandbox, can't modify container
- Best practice: Never run as root unless absolutely necessary
- UID 1000: Standard user ID (not privileged)
Role: Kernel-level enforcement of allowed network traffic
Why NetworkPolicy > iptables?
- Declarative: Defined in YAML, version-controlled
- Kubernetes-native: Integrated with pod lifecycle
- Provable: Can verify policy with kubectl describe
- Automatic: Applied immediately when pod starts
Policy Definition:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: agent-isolation
namespace: redteam-lab
spec:
podSelector:
matchLabels:
app: redteam-agent
policyTypes:
- Egress
egress:
# 1. Allow DNS (required for domain resolution)
- to:
- namespaceSelector: {}
ports:
- protocol: UDP
port: 53
# 2. Allow MCP RAG server (knowledge base)
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: default
podSelector:
matchLabels:
app: mcp-rag-server
ports:
- protocol: TCP
port: 8080
# 3. Allow LM Studio (LLM inference)
- to:
- ipBlock:
cidr: 192.168.1.84/32
ports:
- protocol: TCP
port: 1234
# 4. Allow target machine (attack surface)
- to:
- ipBlock:
cidr: 192.168.1.99/32
ports:
- protocol: TCP
port: 22 # SSH onlyWhat's BLOCKED (implicit deny):
- ❌ Internet access (no 0.0.0.0/0 route)
- ❌ Other Kubernetes pods (except MCP server)
- ❌ Other LAN hosts (except target + LLM)
- ❌ Kubernetes API server
- ❌ All ingress traffic
Testing Isolation:
# Should FAIL (internet blocked)
kubectl exec redteam-agent -n redteam-lab -- curl https://google.com
# Should SUCCEED (MCP server allowed)
kubectl exec redteam-agent -n redteam-lab -- curl http://mcp-rag-server.default.svc.cluster.local:8080/healthz
# Should SUCCEED (target allowed)
kubectl exec redteam-agent -n redteam-lab -- nc -zv 192.168.1.99 221. OBJECTIVE SET
↓
User/Config: "Gain SSH access using weak credentials"
2. QUERY KNOWLEDGE BASE
↓
Agent → MCP Server: tools/call("search", {"query": "SSH brute force weak password"})
↓
MCP Server → FAISS: Vector similarity search
↓
FAISS returns top 3 documents:
- Atomic Red Team T1110.001 (Password Guessing)
- HackTricks SSH Brute Force
- Hydra documentation
↓
MCP Server → Agent: Results with technique details
3. PLAN ATTACK
↓
Agent → LM Studio: POST /v1/chat/completions
{
"system": "You are a red team agent... [methodology]",
"user": "Knowledge base: [RAG results]\n\nObjective: Gain SSH access\nSuggest ONE command."
}
↓
LM Studio (Qwen 2.5 Coder):
- Reads RAG context
- Applies system prompt methodology
- Generates response with reasoning + command
↓
Response: "**Reasoning**: Try common passwords via hydra (T1110.001)\n**Command**: `hydra -l victim -p password123 ssh://192.168.1.99`"
↓
Agent: Extract command via regex
4. EXECUTE COMMAND
↓
Agent → CommandSandbox: execute("hydra -l victim -p password123 ssh://192.168.1.99")
↓
Sandbox: Check whitelist (hydra ✅) and blacklist (no patterns matched ✅)
↓
Sandbox → Shell: subprocess.run(command, capture_output=True)
↓
Shell: Execute hydra against target
↓
Target (192.168.1.99): Accept connection, validate password
↓
Hydra: "[22][ssh] host: 192.168.1.99 login: victim password: password123"
↓
Sandbox → Agent: (stdout, exit_code=0)
↓
Agent: Log to /var/log/agent/commands.log
5. OBSERVE RESULT
↓
Agent: Check if "password" in output and exit_code == 0
↓
Agent: SUCCESS! SSH credentials found.
↓
Break loop
6. REPORT
↓
Agent: Print summary (time, commands executed, success)
Failure Scenario (with repetition detection):
Iteration 1: Execute "nmap -sV 192.168.1.99" → Exit 0
Iteration 2: Execute "nmap -sV 192.168.1.99" → Exit 0 (repeated)
Iteration 3: Execute "nmap -sV 192.168.1.99" → REPETITION DETECTED
↓
Agent: Query RAG("SSH access alternative to nmap")
↓
LLM: Suggest "hydra" instead
↓
Execute "hydra..." → Success
Decision: Agent controls loop, LLM provides text responses (not tool calls)
Reason: LM Studio tool calling API broken on abliterated models
Evidence:
curl .../v1/chat/completions -d '{"tools": [...]}'
# Returns: {"error":"Unexpected empty grammar stack after accepting piece: {\""}Trade-offs:
| Approach | Pros | Cons |
|---|---|---|
| LLM-Orchestrated | Structured output, type safety | Doesn't work with abliterated models ❌ |
| Agent-Orchestrated | Works with abliterated, more transparent | Requires text parsing |
Conclusion: Agent-orchestrated is actually BETTER for our use case (transparency, flexibility)
Decision: Use Flask + sync workers (not FastAPI + Uvicorn)
Reason: K3s containerd blocks socketpair() syscall
Testing:
# FastAPI + Uvicorn
import asyncio
loop = asyncio.get_event_loop() # Requires socketpair() → BLOCKED by K3s
# Flask + Gunicorn sync workers
# No asyncio, no socketpair needed → WORKSLesson: Async isn't always necessary. Sync workers handle ~100 req/s fine for our use case.
Decision: Use min_p=0.08 (not top_p=0.9)
Reason: Min-P adapts dynamically to model confidence
How it works:
Top-P (static threshold):
- Keep tokens until cumulative probability ≥ 0.9
- Same threshold regardless of model confidence
Min-P (dynamic threshold):
- Keep tokens with P ≥ (min_p × P_top_token)
- If top token = 80%, min_p 0.08 → keep tokens >6.4%
- If top token = 20%, min_p 0.08 → keep tokens >1.6%
Result: Fewer low-quality command suggestions, more deterministic outputs
Source: Research paper "Min-P Sampling: A Better Sampling Method" (2024)
Decision: Attack real hardware (192.168.1.99), not just other pods
Reasons:
- More realistic: Real attack surface (SSH, kernel, filesystem)
- Better demo: Shows agent works on actual infrastructure
- CVE testing: Can test real vulnerabilities, not simulated
- Still safe: NetworkPolicy prevents lateral movement
Alternative considered: Vulnerable container (Damn Vulnerable Web App in pod)
- Pros: Easier setup, faster reset
- Cons: Less realistic, can't test OS-level exploits
Decision: Use BlackArch repository (Arch Linux base)
Reasons:
- Rolling release: Always latest tool versions
- Lightweight: Minimal base image (~500MB vs Kali's ~2GB)
- Pacman: Fast, reliable package manager
- Consistency: Main PC already runs BlackArch
Alternative: Kali Linux
- Pros: More popular, better documented
- Cons: Debian-based (slower updates), larger images
Single Agent:
- Attack cycle: ~90 seconds (SSH compromise)
- MCP query latency: <100ms (FAISS L2 search)
- LLM inference: 5-15s per response (14B model, consumer GPU)
- Total iterations: 1-5 (depending on objective complexity)
Resource Usage:
- Agent pod: 0.5 CPU, 512MB RAM (typical)
- MCP server: 0.2 CPU, 1GB RAM (FAISS index in memory)
- LM Studio: 8GB VRAM (14B model in 4-bit quantization)
Horizontal Scaling (multiple agents):
# Scale to 10 agents attacking different targets
kubectl scale deployment redteam-agent -n redteam-lab --replicas=10Bottlenecks:
-
LM Studio: Single LLM instance = sequential inference
- Solution: LM Studio supports parallel requests (queue internally)
- Alternative: Deploy multiple LM Studio instances
-
MCP RAG Server: FAISS IndexFlatL2 is single-threaded
- Solution: FAISS supports GPU acceleration (IndexFlatL2 → IndexIVFFlat)
- Alternative: Multiple MCP server replicas (stateless, can load-balance)
-
Network: 1Gbps LAN sufficient for <100 agents
- MCP queries: ~50KB/query
- LLM responses: ~5KB/response
Future Enhancements:
- LLM: Use vLLM for batched inference (10x throughput)
- RAG: GPU-accelerated FAISS (100x faster search)
- Agents: Distributed task queue (Celery + Redis)
Symptom: Command extraction returns empty list
Recovery:
commands = self._extract_commands(plan)
if not commands:
print("⚠️ No commands found")
continue # Skip iteration, query RAG againPrevention: Structured system prompt enforces output format
Symptom: Same command executed 3+ times with failures
Recovery:
if self._is_repeating(command):
# Query RAG for alternative techniques
alt_knowledge = self.query_knowledge_base(f"{objective} alternative")
plan = self.ask_llm("Suggest DIFFERENT approach")Metrics: In testing, repetition detection reduced stuck loops by 80%
Symptom: Connection refused or timeout
Recovery:
try:
results = self.mcp.search(query)
except Exception as e:
print(f"❌ MCP search failed: {e}")
results = [] # Proceed without RAG context (degraded mode)Alternative: Agent can still function with LLM-only (no RAG), just less accurate
Symptom: Network timeout or connection refused
Recovery:
output, code = self.execute_command(command)
if code != 0 and "timeout" in output.lower():
print("⚠️ Target unreachable, aborting")
breakNetworkPolicy blocks: Verified if unable to connect to allowed target IP
Assumptions:
- Agent code is trusted (user writes it)
- LLM may generate malicious commands (defense: sandbox)
- Target is intentionally vulnerable (expected)
Mitigations:
- NetworkPolicy: Kernel-level isolation (not app-level)
- Command Sandbox: Whitelist + blacklist (defense in depth)
- Non-root: Agent can't modify container filesystem
- Resource Limits: Prevent DoS (1 CPU, 1GB RAM)
- Logging: Full audit trail of all commands
Residual Risks:
- LLM prompt injection: Could trick agent into skipping sandbox
- Mitigation: Sandbox enforced in code, not via LLM
- 0-day in BlackArch tools: Could escape container
- Mitigation: seccomp RuntimeDefault blocks many syscalls
- Kubernetes CVE: Could escape NetworkPolicy
- Mitigation: Keep K8s updated, monitor CVEs
-
Blue Team Agent: Defensive monitoring pod
- Detects agent activity via logs
- Competes with red team (red vs blue game)
-
Multi-Objective Support: Chain techniques
- Example: "Gain SSH access → privesc → exfiltrate flag"
-
Stealth Metrics: Measure detectability
- Track noise level (failed logins, scans, etc.)
-
Reinforcement Learning: Agent learns from success/failure
- Reward: Time to compromise + stealth
- Technique: PPO or DQN
-
Multi-Agent Collaboration: Multiple agents coordinate
- Example: One agent distracts IDS, another exploits
-
Custom Exploit Development: Agent writes own scripts
- LLM generates Python/Bash exploits based on recon
- CTF Automation: Solve Capture The Flag challenges
- Real CVE Exploitation: Test against actual vulnerabilities
- Adversarial Training: Red agent trains blue agent (GAN-like)
Document Version: 1.0 Last Updated: October 9, 2025 Status: Production architecture, battle-tested