Architecture Deep Dive

Adversarial Security Agents System Design

This document provides a detailed explanation of the system architecture, design decisions, and implementation details.

System Overview
Component Details
Data Flow
Design Decisions
Scalability & Performance

System Overview

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Layer 1: LLM Decision Making (Main PC)                         │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  LM Studio                                                 │  │
│  │  • Model: qwen2.5-coder-14b-instruct-abliterated         │  │
│  │  • Inference: temp=0.4, min_p=0.08, repeat_penalty=1.08  │  │
│  │  • API: OpenAI-compatible /v1/chat/completions           │  │
│  └───────────────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────────────┘
                             │ HTTP POST (chat completions)
                             ↓
┌─────────────────────────────────────────────────────────────────┐
│  Layer 2: Knowledge Base (K3s Cluster)                          │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  MCP RAG Server                                            │  │
│  │  • Protocol: Model Context Protocol (JSON-RPC 2.0)        │  │
│  │  • Index: FAISS (L2 distance, 5,395 documents)            │  │
│  │  • Embeddings: all-MiniLM-L6-v2 (384 dimensions)          │  │
│  │  • Sources: GTFOBins, Atomic Red Team, HackTricks         │  │
│  └───────────────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────────────┘
                             │ MCP Protocol (tools/call)
                             ↑
┌─────────────────────────────────────────────────────────────────┐
│  Layer 3: Autonomous Agent (K3s Pod - Isolated)                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Red Team Agent Process                                    │  │
│  │  ┌─────────────────┐  ┌──────────────┐  ┌──────────────┐ │  │
│  │  │ MCP Client      │→ │ LLM Client   │→ │ Command      │ │  │
│  │  │ (Query RAG)     │  │ (Get Plan)   │  │ Sandbox      │ │  │
│  │  └─────────────────┘  └──────────────┘  └──────────────┘ │  │
│  │                                               │            │  │
│  │  ┌─────────────────────────────────────────────┐          │  │
│  │  │ BlackArch Toolkit (2000+ tools)            │          │  │
│  │  │ • hydra, nmap, metasploit, sqlmap, etc.   │          │  │
│  │  └─────────────────────────────────────────────┘          │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  NetworkPolicy (Kernel-level Isolation)                    │  │
│  │  • Egress: ONLY target + MCP + LLM + DNS                  │  │
│  │  • Ingress: DENY ALL                                      │  │
│  └───────────────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────────────┘
                             │ SSH (port 22)
                             ↓
┌─────────────────────────────────────────────────────────────────┐
│  Layer 4: Target Machine                                        │
│  • Intentionally vulnerable system                              │
│  • Weak credentials, SUID binaries, misconfigurations           │
└─────────────────────────────────────────────────────────────────┘

Component Details

1. LM Studio (LLM Inference Engine)

Role: Provides the "brain" for agent decision-making

Technology Stack:

LM Studio 0.3.28+ (local LLM inference)
Model: Qwen 2.5 Coder 14B Instruct (abliterated variant)
API: OpenAI-compatible REST API

Why Qwen 2.5 Coder Abliterated?

Uncensored: Standard models refuse security-related queries ("I cannot help with hacking")
Code-focused: Optimized for command syntax and technical tasks
Good reasoning: 14B parameters sufficient for attack planning
Local: No data leaves infrastructure (privacy + air-gap compatible)

Inference Configuration:

{
    "temperature": 0.4,        # Low = deterministic command generation
    "min_p": 0.08,             # Dynamic sampling (adapts to model confidence)
    "top_p": 1.0,              # Disabled (conflicts with min_p)
    "top_k": 0,                # Disabled (min_p is superior)
    "repeat_penalty": 1.08,    # Prevents "nmap... nmap... nmap..." loops
    "stream": False,           # Required for reliable text parsing
    "cache_prompt": True,      # Cache system prompt for speed
    "max_tokens": 2048         # Prevent runaway generation
}

Why These Settings?

Temperature 0.4: Research shows lower temp (0.3-0.5) better for tool use than default 0.7
Min-P 0.08: Dynamically filters tokens based on top token probability (better than static top_p)
Stream: False: Easier to parse complete responses vs streaming tokens

Known Limitation:

Tool calling API broken on abliterated models (see KNOWN-ISSUES.md)
Workaround: Agent-orchestrated pattern with text parsing

2. MCP RAG Server (Knowledge Base)

Role: Provides offensive security knowledge via semantic search

Technology Stack:

Web Framework: Flask 3.0.0 + Gunicorn (sync workers)
Vector Database: FAISS (Facebook AI Similarity Search)
Embeddings: sentence-transformers (all-MiniLM-L6-v2, 384 dims)
Protocol: Model Context Protocol (MCP) JSON-RPC 2.0

Why Flask + Gunicorn (not FastAPI + Uvicorn)?

K3s containerd blocks socketpair() syscall
FastAPI + Uvicorn requires socketpair for asyncio event loop
Flask + sync workers = no socketpair needed
Lesson: "Boring technology" wins in restricted environments

Data Sources (5,395 documents total):

GTFOBins (~200 files): Unix binary exploitation for privilege escalation
Atomic Red Team (~5,000 tests): MITRE ATT&CK-mapped adversary emulation
HackTricks (~200 pages): Offensive security playbooks and techniques

FAISS Index Configuration:

dimension = 384  # all-MiniLM-L6-v2 embedding size
index = faiss.IndexFlatL2(dimension)  # L2 distance (Euclidean)

Why L2 Distance?

Simple, fast, works well for cybersecurity technical content
Alternatives (cosine similarity) showed minimal improvement in testing
FAISS IndexFlatL2 supports exact search (no approximation errors)

MCP Tools Exposed:

search(query: str, top_k: int = 5)
- Semantic search over entire knowledge base
- Returns ranked results with source metadata
- Example: search("SSH brute force weak password")
list_techniques()
- Returns all 327 indexed MITRE ATT&CK techniques
- Useful for agent to explore available tactics

API Endpoint:

Internal: http://mcp-rag-server.default.svc.cluster.local:8080
External: http://192.168.1.181:30800 (NodePort)

3. Red Team Agent (Autonomous Executor)

Role: Orchestrates attack loop (query → plan → execute → observe → iterate)

Architecture Pattern: Agent-Orchestrated (not LLM-orchestrated)

Why Agent-Orchestrated?

# Agent controls the loop
while not objective_achieved:
    knowledge = query_mcp_rag(objective)
    plan = llm.generate(prompt_with_knowledge)
    commands = extract_commands_from_text(plan)
    result = execute_with_sandbox(commands[0])

# vs LLM-orchestrated (doesn't work with abliterated models)
# response = llm.chat(messages, tools=[query_rag_tool, execute_command_tool])
# for tool_call in response.tool_calls:  # <-- BROKEN on abliterated models
#     execute_tool(tool_call)

Benefits:

✅ Works with abliterated models (essential for uncensored security research)
✅ More transparent (can inspect exact prompts and responses)
✅ Easier to debug (print/log every step)
✅ More flexible (adjust extraction patterns without model changes)

Components:

a. MCPClient

class MCPClient:
    def __init__(self, mcp_url: str):
        self._initialize()  # Handshake: initialize → notifications/initialized → tools/list

    def search(self, query: str, top_k: int = 3) -> List[Dict]:
        # Query knowledge base via MCP tools/call

MCP Handshake:

initialize - Exchange protocol version and capabilities
notifications/initialized - Confirm ready
tools/list - Discover available tools
tools/call - Execute search/list_techniques

b. LLMClient

class LLMClient:
    def generate(self, prompt: str, system: str = None) -> str:
        # Call LM Studio /v1/chat/completions with optimized params

Prompt Structure:

System: [Comprehensive methodology with 5 phases]

User:
Relevant techniques from knowledge base:
[RAG results - top 3 documents]

Objective: Gain SSH access using weak credentials
Target: 192.168.1.99

Based on the knowledge base, provide ONE specific command.

c. CommandSandbox

class CommandSandbox:
    WHITELIST = [
        'ssh', 'hydra', 'nmap', 'metasploit', ...  # 2000+ BlackArch tools
    ]

    BLACKLIST_PATTERNS = [
        r'rm\s+-rf\s+/',    # Prevent destruction
        r'dd\s+if=.*of=/dev/',  # Prevent disk wipe
        r'mkfs',            # Prevent filesystem format
        ...
    ]

    def execute(self, command: str) -> Tuple[str, int]:
        if not self.is_safe(command):
            return "BLOCKED", -1

        result = subprocess.run(command, shell=True, capture_output=True)
        self.log(command, result.stdout, result.returncode)
        return result.stdout, result.returncode

Safety Features:

Whitelist: Only known tools allowed
Blacklist: Destructive patterns blocked
Logging: Every command recorded with timestamp
Timeout: 30s max execution time
Non-root: Runs as UID 1000

d. RedTeamAgent (Main Loop)

class RedTeamAgent:
    def attack_cycle(self, objective: str, max_iterations: int = 5):
        for iteration in range(max_iterations):
            # 1. Query knowledge base
            knowledge = self.query_knowledge_base(objective)

            # 2. Ask LLM for plan
            plan = self.ask_llm(plan_prompt, context=knowledge)

            # 3. Extract commands from freeform text
            commands = self._extract_commands(plan)

            # 4. Check for repetition (prevent infinite loops)
            if self._is_repeating(commands[0]):
                # Query RAG for alternative techniques
                alt_knowledge = self.query_knowledge_base(f"{objective} alternative")
                plan = self.ask_llm("Suggest DIFFERENT approach", context=alt_knowledge)
                commands = self._extract_commands(plan)

            # 5. Execute command
            self.command_history.append(commands[0])
            output, code = self.execute_command(commands[0])

            # 6. Check success
            if code == 0 and success_pattern_matches(output):
                print("SUCCESS!")
                break

            # 7. Ask LLM what to try next (with exit code feedback)
            feedback = f"Exit code: {code} {'✅' if code == 0 else '❌'}\nOutput: {output[:1024]}"
            next_action = self.ask_llm(feedback)

Key Features:

Repetition Detection: Tracks last 3 commands, auto-queries RAG for alternatives if stuck
Exit Code Feedback: LLM learns from success (0) vs failure (non-zero)
Output Truncation: 1024 chars max to LLM (prevents context overflow)
Command History: Full log saved to /var/log/agent/commands.log

4. BlackArch Container

Role: Provides offensive security toolkit in isolated environment

Base Image: archlinux:latest

Installed Tools (highlights):

Password Cracking: hydra, john, hashcat, medusa
Network Scanning: nmap, masscan, tcpdump, wireshark-cli
Web Tools: sqlmap, nikto, dirb, gobuster, wfuzz
Exploitation: metasploit, msfvenom
Wireless: aircrack-ng, airodump-ng, aireplay-ng
Enumeration: enum4linux, smbclient, rpcclient
+1990 more via BlackArch repository

Dockerfile Key Sections:

FROM archlinux:latest

# Install BlackArch repository
RUN curl -O https://blackarch.org/strap.sh && \
    chmod +x strap.sh && \
    ./strap.sh

# Install essential tools (2GB+ install)
RUN pacman -Sy --noconfirm \
    python python-pip \
    openssh nmap netcat curl wget \
    hydra sshpass nikto sqlmap \
    metasploit john hashcat \
    aircrack-ng wireshark-cli tcpdump

# Python dependencies
RUN pip install --no-cache-dir requests --break-system-packages

# Non-root user
RUN useradd -m -u 1000 agent
USER agent

CMD ["python", "/app/agent.py"]

Security Context (in K8s manifest):

securityContext:
  runAsUser: 1000
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault

Why Non-Root?

Defense in depth: Even if agent escapes sandbox, can't modify container
Best practice: Never run as root unless absolutely necessary
UID 1000: Standard user ID (not privileged)

5. Network Isolation (Kubernetes NetworkPolicy)

Role: Kernel-level enforcement of allowed network traffic

Why NetworkPolicy > iptables?

Declarative: Defined in YAML, version-controlled
Kubernetes-native: Integrated with pod lifecycle
Provable: Can verify policy with kubectl describe
Automatic: Applied immediately when pod starts

Policy Definition:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-isolation
  namespace: redteam-lab
spec:
  podSelector:
    matchLabels:
      app: redteam-agent
  policyTypes:
    - Egress
  egress:
    # 1. Allow DNS (required for domain resolution)
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53

    # 2. Allow MCP RAG server (knowledge base)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: default
          podSelector:
            matchLabels:
              app: mcp-rag-server
      ports:
        - protocol: TCP
          port: 8080

    # 3. Allow LM Studio (LLM inference)
    - to:
        - ipBlock:
            cidr: 192.168.1.84/32
      ports:
        - protocol: TCP
          port: 1234

    # 4. Allow target machine (attack surface)
    - to:
        - ipBlock:
            cidr: 192.168.1.99/32
      ports:
        - protocol: TCP
          port: 22  # SSH only

What's BLOCKED (implicit deny):

❌ Internet access (no 0.0.0.0/0 route)
❌ Other Kubernetes pods (except MCP server)
❌ Other LAN hosts (except target + LLM)
❌ Kubernetes API server
❌ All ingress traffic

Testing Isolation:

# Should FAIL (internet blocked)
kubectl exec redteam-agent -n redteam-lab -- curl https://google.com

# Should SUCCEED (MCP server allowed)
kubectl exec redteam-agent -n redteam-lab -- curl http://mcp-rag-server.default.svc.cluster.local:8080/healthz

# Should SUCCEED (target allowed)
kubectl exec redteam-agent -n redteam-lab -- nc -zv 192.168.1.99 22

Data Flow

Typical Attack Iteration

1. OBJECTIVE SET
   ↓
   User/Config: "Gain SSH access using weak credentials"

2. QUERY KNOWLEDGE BASE
   ↓
   Agent → MCP Server: tools/call("search", {"query": "SSH brute force weak password"})
   ↓
   MCP Server → FAISS: Vector similarity search
   ↓
   FAISS returns top 3 documents:
   - Atomic Red Team T1110.001 (Password Guessing)
   - HackTricks SSH Brute Force
   - Hydra documentation
   ↓
   MCP Server → Agent: Results with technique details

3. PLAN ATTACK
   ↓
   Agent → LM Studio: POST /v1/chat/completions
   {
     "system": "You are a red team agent... [methodology]",
     "user": "Knowledge base: [RAG results]\n\nObjective: Gain SSH access\nSuggest ONE command."
   }
   ↓
   LM Studio (Qwen 2.5 Coder):
   - Reads RAG context
   - Applies system prompt methodology
   - Generates response with reasoning + command
   ↓
   Response: "**Reasoning**: Try common passwords via hydra (T1110.001)\n**Command**: `hydra -l victim -p password123 ssh://192.168.1.99`"
   ↓
   Agent: Extract command via regex

4. EXECUTE COMMAND
   ↓
   Agent → CommandSandbox: execute("hydra -l victim -p password123 ssh://192.168.1.99")
   ↓
   Sandbox: Check whitelist (hydra ✅) and blacklist (no patterns matched ✅)
   ↓
   Sandbox → Shell: subprocess.run(command, capture_output=True)
   ↓
   Shell: Execute hydra against target
   ↓
   Target (192.168.1.99): Accept connection, validate password
   ↓
   Hydra: "[22][ssh] host: 192.168.1.99   login: victim   password: password123"
   ↓
   Sandbox → Agent: (stdout, exit_code=0)
   ↓
   Agent: Log to /var/log/agent/commands.log

5. OBSERVE RESULT
   ↓
   Agent: Check if "password" in output and exit_code == 0
   ↓
   Agent: SUCCESS! SSH credentials found.
   ↓
   Break loop

6. REPORT
   ↓
   Agent: Print summary (time, commands executed, success)

Failure Scenario (with repetition detection):

Iteration 1: Execute "nmap -sV 192.168.1.99" → Exit 0
Iteration 2: Execute "nmap -sV 192.168.1.99" → Exit 0 (repeated)
Iteration 3: Execute "nmap -sV 192.168.1.99" → REPETITION DETECTED
   ↓
   Agent: Query RAG("SSH access alternative to nmap")
   ↓
   LLM: Suggest "hydra" instead
   ↓
   Execute "hydra..." → Success

Design Decisions

1. Why Agent-Orchestrated Instead of LLM-Orchestrated?

Decision: Agent controls loop, LLM provides text responses (not tool calls)

Reason: LM Studio tool calling API broken on abliterated models

Evidence:

curl .../v1/chat/completions -d '{"tools": [...]}'
# Returns: {"error":"Unexpected empty grammar stack after accepting piece: {\""}

Trade-offs:

Approach	Pros	Cons
LLM-Orchestrated	Structured output, type safety	Doesn't work with abliterated models ❌
Agent-Orchestrated	Works with abliterated, more transparent	Requires text parsing

Conclusion: Agent-orchestrated is actually BETTER for our use case (transparency, flexibility)

2. Why Flask + Gunicorn for MCP Server?

Decision: Use Flask + sync workers (not FastAPI + Uvicorn)

Reason: K3s containerd blocks socketpair() syscall

Testing:

# FastAPI + Uvicorn
import asyncio
loop = asyncio.get_event_loop()  # Requires socketpair() → BLOCKED by K3s

# Flask + Gunicorn sync workers
# No asyncio, no socketpair needed → WORKS

Lesson: Async isn't always necessary. Sync workers handle ~100 req/s fine for our use case.

3. Why Min-P Sampling Instead of Top-P?

Decision: Use min_p=0.08 (not top_p=0.9)

Reason: Min-P adapts dynamically to model confidence

How it works:

Top-P (static threshold):
- Keep tokens until cumulative probability ≥ 0.9
- Same threshold regardless of model confidence

Min-P (dynamic threshold):
- Keep tokens with P ≥ (min_p × P_top_token)
- If top token = 80%, min_p 0.08 → keep tokens >6.4%
- If top token = 20%, min_p 0.08 → keep tokens >1.6%

Result: Fewer low-quality command suggestions, more deterministic outputs

Source: Research paper "Min-P Sampling: A Better Sampling Method" (2024)

4. Why Physical Target Instead of Container?

Decision: Attack real hardware (192.168.1.99), not just other pods

Reasons:

More realistic: Real attack surface (SSH, kernel, filesystem)
Better demo: Shows agent works on actual infrastructure
CVE testing: Can test real vulnerabilities, not simulated
Still safe: NetworkPolicy prevents lateral movement

Alternative considered: Vulnerable container (Damn Vulnerable Web App in pod)

Pros: Easier setup, faster reset
Cons: Less realistic, can't test OS-level exploits

5. Why BlackArch Instead of Kali Linux?

Decision: Use BlackArch repository (Arch Linux base)

Reasons:

Rolling release: Always latest tool versions
Lightweight: Minimal base image (~500MB vs Kali's ~2GB)
Pacman: Fast, reliable package manager
Consistency: Main PC already runs BlackArch

Alternative: Kali Linux

Pros: More popular, better documented
Cons: Debian-based (slower updates), larger images

Scalability & Performance

Current Performance

Single Agent:

Attack cycle: ~90 seconds (SSH compromise)
MCP query latency: <100ms (FAISS L2 search)
LLM inference: 5-15s per response (14B model, consumer GPU)
Total iterations: 1-5 (depending on objective complexity)

Resource Usage:

Agent pod: 0.5 CPU, 512MB RAM (typical)
MCP server: 0.2 CPU, 1GB RAM (FAISS index in memory)
LM Studio: 8GB VRAM (14B model in 4-bit quantization)

Scaling Considerations

Horizontal Scaling (multiple agents):

# Scale to 10 agents attacking different targets
kubectl scale deployment redteam-agent -n redteam-lab --replicas=10

Bottlenecks:

LM Studio: Single LLM instance = sequential inference
- Solution: LM Studio supports parallel requests (queue internally)
- Alternative: Deploy multiple LM Studio instances
MCP RAG Server: FAISS IndexFlatL2 is single-threaded
- Solution: FAISS supports GPU acceleration (IndexFlatL2 → IndexIVFFlat)
- Alternative: Multiple MCP server replicas (stateless, can load-balance)
Network: 1Gbps LAN sufficient for <100 agents
- MCP queries: ~50KB/query
- LLM responses: ~5KB/response

Future Enhancements:

LLM: Use vLLM for batched inference (10x throughput)
RAG: GPU-accelerated FAISS (100x faster search)
Agents: Distributed task queue (Celery + Redis)

Failure Modes & Recovery

1. LLM Generates Invalid Command

Symptom: Command extraction returns empty list

Recovery:

commands = self._extract_commands(plan)
if not commands:
    print("⚠️  No commands found")
    continue  # Skip iteration, query RAG again

Prevention: Structured system prompt enforces output format

2. Command Fails 3+ Times (Repetition)

Symptom: Same command executed 3+ times with failures

Recovery:

if self._is_repeating(command):
    # Query RAG for alternative techniques
    alt_knowledge = self.query_knowledge_base(f"{objective} alternative")
    plan = self.ask_llm("Suggest DIFFERENT approach")

Metrics: In testing, repetition detection reduced stuck loops by 80%

3. MCP Server Unavailable

Symptom: Connection refused or timeout

Recovery:

try:
    results = self.mcp.search(query)
except Exception as e:
    print(f"❌ MCP search failed: {e}")
    results = []  # Proceed without RAG context (degraded mode)

Alternative: Agent can still function with LLM-only (no RAG), just less accurate

4. Target Unreachable

Symptom: Network timeout or connection refused

Recovery:

output, code = self.execute_command(command)
if code != 0 and "timeout" in output.lower():
    print("⚠️  Target unreachable, aborting")
    break

NetworkPolicy blocks: Verified if unable to connect to allowed target IP

Security Considerations

Threat Model

Assumptions:

Agent code is trusted (user writes it)
LLM may generate malicious commands (defense: sandbox)
Target is intentionally vulnerable (expected)

Mitigations:

NetworkPolicy: Kernel-level isolation (not app-level)
Command Sandbox: Whitelist + blacklist (defense in depth)
Non-root: Agent can't modify container filesystem
Resource Limits: Prevent DoS (1 CPU, 1GB RAM)
Logging: Full audit trail of all commands

Residual Risks:

LLM prompt injection: Could trick agent into skipping sandbox
- Mitigation: Sandbox enforced in code, not via LLM
0-day in BlackArch tools: Could escape container
- Mitigation: seccomp RuntimeDefault blocks many syscalls
Kubernetes CVE: Could escape NetworkPolicy
- Mitigation: Keep K8s updated, monitor CVEs

Future Architecture Evolution

Short-Term (Next 3 Months)

Blue Team Agent: Defensive monitoring pod
- Detects agent activity via logs
- Competes with red team (red vs blue game)
Multi-Objective Support: Chain techniques
- Example: "Gain SSH access → privesc → exfiltrate flag"
Stealth Metrics: Measure detectability
- Track noise level (failed logins, scans, etc.)

Mid-Term (6-12 Months)

Reinforcement Learning: Agent learns from success/failure
- Reward: Time to compromise + stealth
- Technique: PPO or DQN
Multi-Agent Collaboration: Multiple agents coordinate
- Example: One agent distracts IDS, another exploits
Custom Exploit Development: Agent writes own scripts
- LLM generates Python/Bash exploits based on recon

Long-Term (12+ Months)

CTF Automation: Solve Capture The Flag challenges
Real CVE Exploitation: Test against actual vulnerabilities
Adversarial Training: Red agent trains blue agent (GAN-like)

Document Version: 1.0 Last Updated: October 9, 2025 Status: Production architecture, battle-tested

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Architecture Deep Dive

Adversarial Security Agents System Design

Table of Contents

System Overview

High-Level Architecture

Component Details

1. LM Studio (LLM Inference Engine)

2. MCP RAG Server (Knowledge Base)

3. Red Team Agent (Autonomous Executor)

a. MCPClient

b. LLMClient

c. CommandSandbox

d. RedTeamAgent (Main Loop)

4. BlackArch Container

5. Network Isolation (Kubernetes NetworkPolicy)

Data Flow

Typical Attack Iteration

Design Decisions

1. Why Agent-Orchestrated Instead of LLM-Orchestrated?

2. Why Flask + Gunicorn for MCP Server?

3. Why Min-P Sampling Instead of Top-P?

4. Why Physical Target Instead of Container?

5. Why BlackArch Instead of Kali Linux?

Scalability & Performance

Current Performance

Scaling Considerations

Failure Modes & Recovery

1. LLM Generates Invalid Command

2. Command Fails 3+ Times (Repetition)

3. MCP Server Unavailable

4. Target Unreachable

Security Considerations

Threat Model

Future Architecture Evolution

Short-Term (Next 3 Months)

Mid-Term (6-12 Months)

Long-Term (12+ Months)

FilesExpand file tree

ARCHITECTURE.md

Latest commit

History

ARCHITECTURE.md

File metadata and controls

Architecture Deep Dive

Adversarial Security Agents System Design

Table of Contents

System Overview

High-Level Architecture

Component Details

1. LM Studio (LLM Inference Engine)

2. MCP RAG Server (Knowledge Base)

3. Red Team Agent (Autonomous Executor)

a. MCPClient

b. LLMClient

c. CommandSandbox

d. RedTeamAgent (Main Loop)

4. BlackArch Container

5. Network Isolation (Kubernetes NetworkPolicy)

Data Flow

Typical Attack Iteration

Design Decisions

1. Why Agent-Orchestrated Instead of LLM-Orchestrated?

2. Why Flask + Gunicorn for MCP Server?

3. Why Min-P Sampling Instead of Top-P?

4. Why Physical Target Instead of Container?

5. Why BlackArch Instead of Kali Linux?

Scalability & Performance

Current Performance

Scaling Considerations

Failure Modes & Recovery

1. LLM Generates Invalid Command

2. Command Fails 3+ Times (Repetition)

3. MCP Server Unavailable

4. Target Unreachable

Security Considerations

Threat Model

Future Architecture Evolution

Short-Term (Next 3 Months)

Mid-Term (6-12 Months)

Long-Term (12+ Months)