feat: introduce multi-agent red team orchestration framework #36

Merged
l50 merged 9 commits into main from
jayson/cap-847-implement-multi-agent-red-team-architecture-in-kubernetes
Jan 13, 2026

l50 commented Jan 13, 2026

Key Changes:

  • Implemented distributed multi-agent orchestration with Redis and Kubernetes
  • Added robust configuration management and agent role specialization
  • Enabled Redis-based cross-pod task queues for agent communication
  • Provided detailed role-specific agent instructions and workflow automation

Added:

  • Multi-agent operation configuration:
    • New config/multi-agent-production.yaml defines operational parameters, agent
      roles, timeouts, recovery, priorities, and security for Kubernetes deployments
  • Multi-agent orchestration core:
    • src/ares/core/config.py: YAML/env config loader with agent and operation
      schemas, supporting overrides and caching
    • src/ares/core/dispatcher.py: Central RedTeamDispatcher for agent
      registration, message routing, task management, and shared state
    • src/ares/core/messages.py: Typed inter-agent protocol for tasks, discovery,
      and coordination
    • src/ares/core/task_queue.py: Redis-based cross-pod task queue for
      orchestrator <-> worker communication with heartbeat and queue stats
    • src/ares/core/worker.py: Worker agent loop for polling tasks from Redis or
      dispatcher and reporting results
    • src/ares/core/workflows.py: Automated credential expansion and exploitation
      workflows for recursive attack loops
    • src/ares/core/orchestrator.py: Main entrypoint for running multi-agent
      operations and coordination logic
    • src/ares/core/k8s_executor.py: (Deprecated) Kubernetes pod executor for
      direct pod command execution (retained for debugging/logging)
    • src/ares/core/recovery.py: OperationRecoveryManager for checkpointing,
      restoring, and cleaning up operation state in Redis
  • Agent factories and role specializations:
    • src/ares/core/factories/red_agents.py: Factories for creating role-specific
      agents, hooks, and toolsets with detailed orchestration logic
    • Role-specific agent instruction templates in
      src/ares/templates/redteam/agents/
  • Orchestrator and callback tools:
    • src/ares/tools/red/orchestrator.py: Tools for dispatching tasks, monitoring
      state, and reporting results for orchestrator, cracker, and lateral agents
  • Integration and end-to-end tests:
    • tests/integration/test_multi_agent_workflow.py,
      tests/integration/test_redis_task_queue_integration.py, and
      tests/test_task_queue.py for orchestrator, queue, and workflow validation

Changed:

  • pyproject.toml, poetry.lock, and uv.lock:
    • Added redis, kubernetes, and importlib_resources dependencies for
      distributed coordination and resource loading
    • Updated build includes to package templates and YAML configs in the wheel
  • Taskfile.yaml:
    • Added tasks for running, resuming, cleaning up, and monitoring multi-agent
      operations in Kubernetes, plus infrastructure checks
  • src/ares/core/__init__.py, src/ares/core/factories/__init__.py:
    • Re-exported new multi-agent orchestration, dispatcher, and workflow modules
  • src/ares/core/engines.py, src/ares/core/templates.py:
    • Refactored template path resolution to use importlib_resources for reliable
      access in installed packages and containers
  • src/ares/core/models.py:
    • Added multi-agent models: AgentInfo, AgentRole, TaskInfo, TaskResult,
      SharedRedTeamState, and cross-agent compatibility with single-agent state
  • src/ares/core/remote.py:
    • Unified execution logic to support "k8s", "local", and "ssm" modes for
      orchestrator/worker/EC2 with Redis-based command dispatch in K8s
  • src/ares/tools/red/network.py:
    • Expanded with low-hanging fruit credential discovery tools, LAPS, new ADCS,
      ACL, and delegation attack methods, and improved BloodHound host parsing
  • src/ares/core/factories/red_factory.py:
    • Enhanced hooks and workflow logic for vulnerability/attack tracking and
      reporting, including new low-hanging fruit tool prioritization
  • src/ares/main.py:
    • Added multi-agent and worker CLI commands for launching orchestrator and
      specialized agent pods with config-driven options

Removed:

  • Static template file locations:
    • All template and agent instruction paths are now accessed via package
      resources for portability and containerization
  • Redundant or single-agent-only code paths in factories and workflow logic
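
The move from static template paths to package resources can be illustrated with a minimal sketch. It uses the standard-library `importlib.resources` (the PR depends on the `importlib_resources` backport, which offers the same `files()` entry point). The helper name `read_package_text` is hypothetical and is not taken from the PR.

```python
from importlib.resources import files


def read_package_text(package: str, *parts: str) -> str:
    """Read a text resource that ships inside an installed package.

    Resolving through the package (rather than a path relative to the
    repository root) keeps working when the code runs from a wheel or
    inside a container image.
    """
    resource = files(package)
    for part in parts:
        # Walk one path segment at a time for broad version compatibility.
        resource = resource.joinpath(part)
    return resource.read_text(encoding="utf-8")


# Demonstration against a standard-library package, so the sketch runs anywhere.
source = read_package_text("json", "__init__.py")
print(len(source) > 0)
```

Under the assumption that the templates are importable as `ares.templates`, a role template would be read with something like `read_package_text("ares.templates", "redteam", "agents", "enum.md.jinja")`.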

l50 added 8 commits January 12, 2026 15:48
…am capabilities

**Added:**

- RedTeamDispatcher class for centralized multi-agent task coordination and message
  routing (src/ares/core/dispatcher.py)
- KubernetesPodExecutor for executing commands in ephemeral Kubernetes pods, with pod
  discovery and retry (src/ares/core/k8s_executor.py)
- Inter-agent message protocol with Pydantic models and enums for agent communication
  (src/ares/core/messages.py)
- OperationRecoveryManager for checkpoint/restore of cluster-wide state in Redis to
  support pod crash recovery (src/ares/core/recovery.py)
- Multi-agent models: AgentInfo, AgentRole, SharedRedTeamState, TaskInfo, TaskResult,
  VulnerabilityInfo, AgentLocalState (src/ares/core/models.py)
- Factories for creating specialized red team agents by role, including agent
  registration and ensemble creation (src/ares/core/factories/red_agents.py)
- Orchestrator, Cracker, and Lateral callback toolsets for agent-specific multi-agent
  workflows (src/ares/tools/red/orchestrator.py)
- Dedicated agent instruction templates for orchestrator, cracker, lateral, privesc,
  acl_exploiter, poisoner, atomic roles (templates/redteam/agents/*.md.jinja)
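
As a rough illustration of a typed inter-agent protocol, the sketch below models two message kinds and a routing table. The PR uses Pydantic models; plain dataclasses are swapped in here so the example runs with the standard library only. The message names `CredentialDiscovered` and `CrackRequest` come from the linked Linear issue, while every field name and the `ROUTES` table are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class MessageType(str, Enum):
    CREDENTIAL_DISCOVERED = "credential_discovered"
    CRACK_REQUEST = "crack_request"


@dataclass(frozen=True)
class CredentialDiscovered:
    sender: str      # role of the agent that found the credential (assumed field)
    username: str
    domain: str
    type: MessageType = MessageType.CREDENTIAL_DISCOVERED


@dataclass(frozen=True)
class CrackRequest:
    sender: str
    hash_value: str
    hash_type: str
    type: MessageType = MessageType.CRACK_REQUEST


# Hypothetical routing table: which roles receive each message type.
ROUTES = {
    MessageType.CREDENTIAL_DISCOVERED: ["lateral", "privesc"],
    MessageType.CRACK_REQUEST: ["cracker"],
}


def route(message) -> list:
    """Return the roles that should receive this message."""
    return ROUTES[message.type]


print(route(CrackRequest(sender="enum", hash_value="abc123", hash_type="ntlm")))
```

Keying the routing on a `type` field carried by every message lets a dispatcher forward messages without inspecting their payloads.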

**Changed:**

- __init__ files in core and factories updated to expose new dispatcher, recovery,
  Kubernetes executor, multi-agent factories, and models
- RedFactory and RedTeamState expanded to support multi-agent features and new attack
  chains
- Red team toolsets extended with CredentialDiscoveryTools, expanded ACL, ADCS,
  delegation, and MSSQL attack support (src/ares/tools/red/network.py)
- System instructions template updated with multi-agent, low-hanging fruit, and
  advanced attack path guidance (templates/redteam/agents/system_instructions.md.jinja)
- Tests for red_factory updated for new event/message model, more robust event
  handling, and ToolEnd checks (tests/test_red_factory.py)
- Exposed new orchestration and callback toolsets in tools/red/__init__.py

**Removed:**

- No files removed in this change.

**Changed:**

- Renamed `AgentRole.ORCHESTRATOR` to `AgentRole.ENUM` across all logic, including
  dispatcher, agent creation, and configuration to better reflect enumeration role
- Renamed `AgentRole.ACL_EXPLOITER` to `AgentRole.ACL` for brevity and clarity
- Renamed `AgentRole.POISONER` to `AgentRole.POISONING` for naming consistency
- Updated all role-based configuration dictionaries, instruction templates,
  capabilities, and factory logic to use new role names
- Changed default multi-agent ensemble roles to use new names
- Updated all references and logic in dispatcher to use new role names for
  subscriptions and routing
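
A sketch of what the renamed roles might look like, with a small compatibility shim for configuration that still uses the old names. The member names follow this commit message and its renamed template files; the string values, the `LEGACY_ROLE_NAMES` mapping, and both helper functions are assumptions rather than code from the PR.

```python
from enum import Enum


class AgentRole(str, Enum):
    ENUM = "enum"            # formerly ORCHESTRATOR, per this commit
    CRACKER = "cracker"
    LATERAL = "lateral"
    PRIVESC = "privesc"
    ACL = "acl"              # formerly ACL_EXPLOITER
    POISONING = "poisoning"  # formerly POISONER
    ATOMIC = "atomic"


# Hypothetical shim so old role names in configs keep resolving.
LEGACY_ROLE_NAMES = {
    "orchestrator": AgentRole.ENUM,
    "acl_exploiter": AgentRole.ACL,
    "poisoner": AgentRole.POISONING,
}


def resolve_role(name: str) -> AgentRole:
    """Map a current or legacy role name to an AgentRole member."""
    key = name.strip().lower()
    if key in LEGACY_ROLE_NAMES:
        return LEGACY_ROLE_NAMES[key]
    return AgentRole(key)  # raises ValueError for unknown names


def template_name(role: AgentRole) -> str:
    """Template file name for a role, matching the renamed templates."""
    return f"{role.value}.md.jinja"


print(resolve_role("acl_exploiter"), template_name(AgentRole.ACL))
```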

**Removed:**

- Deprecated old template files and replaced them by renaming:
  - `templates/redteam/agents/orchestrator.md.jinja` → `enum.md.jinja`
  - `templates/redteam/agents/acl_exploiter.md.jinja` → `acl.md.jinja`
  - `templates/redteam/agents/poisoner.md.jinja` → `poisoning.md.jinja`

**Changed:**

- Relocated all template files from top-level templates directory to
  src/ares/templates to improve project organization and align with standard
  source code structure. No content changes were made to the templates.

**Added:**

- Introduced multi-agent operation orchestration via new `ares.core.orchestrator` with
  workflow automation, agent ensemble creation, and dispatcher integration
- Implemented worker agent loop (`ares.core.worker`) for specialized agent task
  processing, heartbeat monitoring, and dispatcher task completion reporting
- Added workflow automation utilities (`ares.core.workflows`) for credential
  expansion and exploitation coordination in multi-agent operations
- Provided production YAML configuration for multi-agent operations
  (`config/multi-agent-production.yaml`) supporting agent roles, timeouts,
  priorities, and resource/security settings
- Added integration tests for end-to-end multi-agent workflow orchestration,
  vulnerability queue, credential expansion, and dispatcher message flow
  (`tests/integration/test_multi_agent_workflow.py`)

**Changed:**

- Updated `Taskfile.yaml` with new multi-agent red team tasks, status, checkpoint,
  and Kubernetes infrastructure checks for streamlined multi-agent management
- Extended `pyproject.toml` to include YAML, Jinja, and Markdown files in build
  artifacts and added conditional dependency for `importlib_resources`
- Updated `src/ares/core/__init__.py` to expose new orchestration, config,
  worker, and workflow modules in package exports
- Refactored dispatcher (`src/ares/core/dispatcher.py`) to add priority-based
  vulnerability queue, exploitation tracking, and async task completion handling
- Enhanced orchestrator toolset (`src/ares/tools/red/orchestrator.py`) with
  credential expansion, vulnerability queueing, and queue status reporting tools
- Improved template resource loading (`src/ares/core/templates.py`) for
  compatibility with package installations using `importlib_resources`
- Updated red team engines (`src/ares/core/engines.py`) to use new template
  resource loading for attack chain and detection recipe YAMLs
- Updated main CLI entrypoint (`src/ares/main.py`) to support multi-agent
  orchestration and worker agent invocation with config-driven argument parsing
- Extended recovery manager (`src/ares/core/recovery.py`) with periodic
  checkpointing using dispatcher state
- Improved remote execution module (`src/ares/core/remote.py`) to support both
  Kubernetes subprocess and AWS SSM execution modes
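
One way a priority-based vulnerability queue with exploitation tracking could work is sketched below using `heapq`. The class name, method names, and the convention that a lower number means higher priority are all assumptions; the PR does not show the dispatcher's actual data structures.

```python
import heapq
import itertools


class VulnerabilityQueue:
    """Min-heap of vulnerabilities; lower priority number is served first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order
        self._exploited = set()

    def push(self, vuln_id: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), vuln_id))

    def mark_exploited(self, vuln_id: str) -> None:
        self._exploited.add(vuln_id)

    def pop_next(self):
        """Return the best not-yet-exploited vulnerability, or None."""
        while self._heap:
            _, _, vuln_id = heapq.heappop(self._heap)
            if vuln_id not in self._exploited:
                return vuln_id
        return None


queue = VulnerabilityQueue()
queue.push("smb-signing", 5)
queue.push("adcs-esc1", 1)
queue.push("kerberoast", 2)
queue.mark_exploited("kerberoast")  # already handled, so it is skipped
first = queue.pop_next()
second = queue.pop_next()
third = queue.pop_next()
```

Skipping exploited entries lazily at pop time avoids having to search the heap when a vulnerability is marked as done.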

**Added:**

- Added new configuration module (`src/ares/core/config.py`) for YAML-driven,
  environment-variable-overrideable multi-agent operation settings
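
The layering of file values and environment overrides might look roughly like the following. To stay standard-library only, a plain dict stands in for the parsed YAML file. The setting names, the `ARES_` prefix, and `load_config` itself are hypothetical and not taken from `config.py`.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OperationConfig:
    # Hypothetical settings with built-in defaults.
    redis_url: str = "redis://localhost:6379/0"
    task_timeout_seconds: int = 300
    max_agents: int = 7


def load_config(file_values: dict, env=None, prefix: str = "ARES_") -> OperationConfig:
    """Merge defaults < file values < environment variables."""
    env = os.environ if env is None else env
    merged = {}
    for f in fields(OperationConfig):
        if f.name in file_values:
            merged[f.name] = file_values[f.name]
        raw = env.get(prefix + f.name.upper())
        if raw is not None:
            # Coerce using the default's type; this sketch covers str and int only.
            merged[f.name] = type(f.default)(raw)
    return OperationConfig(**merged)


cfg = load_config(
    {"task_timeout_seconds": 600},   # stands in for values parsed from YAML
    env={"ARES_MAX_AGENTS": "3"},    # stands in for the process environment
)
print(cfg)
```

Passing `env` explicitly keeps the function easy to test; omitting it falls back to `os.environ`.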

**Changed:**

- Updated README generator to use the new template directory location

**Removed:**

- No removals in this change set; all changes are additive or enhancements to
  support multi-agent red team workflows and orchestration
…rkers

**Added:**

- Added `redis` as a required dependency for worker operation coordination
- Implemented `discover_active_operation` async utility to scan Redis for the
  most recently checkpointed operation, enabling workers to auto-discover
  which operation to join if not specified
- Enhanced CLI and worker startup to support optional operation ID with
  auto-discovery logic

**Changed:**

- Updated worker launch flow to allow empty or missing operation IDs; workers
  now attempt Redis-based discovery before failing
- Improved CLI documentation and parameter handling to reflect new
  auto-discovery behavior, including updated usage examples and argument descriptions
- Adjusted handling of empty string operation IDs (e.g., from k8s configmaps)
  to trigger auto-discovery logic rather than error
- Updated lock and dependency files to include `redis` and the correct marker
  for `importlib-resources` based on Python version

**Removed:**

- Removed strict requirement for an explicit operation ID when starting a
  worker; this is now optional due to discovery logic
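
The discovery fallback described in this commit can be sketched as follows. A dict mapping operation IDs to checkpoint timestamps stands in for the Redis scan, so the function signatures here are assumptions; only the name `discover_active_operation` and the "empty ID triggers discovery" behavior come from the commit message.

```python
import asyncio


async def discover_active_operation(checkpoints: dict):
    """Return the most recently checkpointed operation ID, or None.

    `checkpoints` maps operation ID to last checkpoint time. A real
    implementation would await a Redis scan instead of reading a dict.
    """
    if not checkpoints:
        return None
    return max(checkpoints, key=checkpoints.get)


async def resolve_operation_id(operation_id, checkpoints: dict) -> str:
    """Use the explicit ID when given; otherwise fall back to discovery."""
    if operation_id and operation_id.strip():
        return operation_id
    # None and "" (for example from an unset ConfigMap value) both land here.
    found = await discover_active_operation(checkpoints)
    if found is None:
        raise RuntimeError("no operation ID given and none discovered")
    return found


known = {"op-a": 100.0, "op-b": 200.0}
print(asyncio.run(resolve_operation_id("", known)))
```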
…oss-pod messaging

**Added:**

- Introduced `RedisTaskQueue` in `src/ares/core/task_queue.py` for cross-pod task and result messaging via Redis, supporting multi-agent workflows in Kubernetes
- Implemented `TaskMessage` and `TaskResult` Pydantic models for structured task/result exchange
- Added `RedisWorkerAgent` to `src/ares/core/worker.py` for polling Redis and reporting results in Kubernetes deployments
- Added `kubernetes>=29.0.0` as a dependency in `pyproject.toml` and `uv.lock` for direct K8s pod interactions
- Created `tests/integration/test_redis_task_queue_integration.py` for end-to-end Redis queue testing
- Created `tests/test_task_queue.py` for unit testing `RedisTaskQueue` behavior and models
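
To make the task and result exchange concrete, here is a minimal stand-in for a per-role task queue. It keeps serialized tasks in in-memory deques where the PR's `RedisTaskQueue` would use Redis, and it uses a dataclass where the PR uses a Pydantic `TaskMessage`. The field names and the `submit`, `poll`, and `stats` methods are assumptions for illustration.

```python
import json
from collections import deque
from dataclasses import asdict, dataclass, field


@dataclass
class TaskMessage:
    task_id: str
    role: str        # which worker role should pick this up (assumed field)
    task_type: str
    payload: dict = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "TaskMessage":
        return cls(**json.loads(raw))


class InMemoryTaskQueue:
    """FIFO queue per role; tasks cross the boundary as JSON strings."""

    def __init__(self):
        self._queues = {}

    def submit(self, task: TaskMessage) -> None:
        self._queues.setdefault(task.role, deque()).append(task.to_json())

    def poll(self, role: str):
        """Return the oldest task for a role, or None if nothing is waiting."""
        pending = self._queues.get(role)
        if not pending:
            return None
        return TaskMessage.from_json(pending.popleft())

    def stats(self) -> dict:
        return {role: len(pending) for role, pending in self._queues.items()}


tasks = InMemoryTaskQueue()
tasks.submit(TaskMessage("t1", "cracker", "crack_hash", {"hash_type": "ntlm"}))
received = tasks.poll("cracker")
```

Serializing to JSON at the queue boundary mirrors what a cross-pod transport requires, since orchestrator and workers share no Python objects.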

**Changed:**

- Updated `RedTeamDispatcher` to use `RedisTaskQueue` when `redis_url` is set, falling back to in-memory queues otherwise
- Refactored all major dispatcher task routing methods (`request_crack`, `request_lateral_movement`, etc.) to support Redis queueing for cross-pod communication
- Extended dispatcher with `dispatch_and_wait` and `wait_for_redis_result` for synchronous-style orchestration via Redis
- Updated worker startup to prefer Redis-based polling in Kubernetes deployments and fall back to in-memory dispatcher in single-process mode
- Improved prompt generation for Redis task consumption in `generate_prompt_from_task`
- Updated orchestrator tools to support `wait_for_result` and timeout parameters, enabling synchronous workflows with Redis-backed workers
- Enhanced BloodHound output parsing and host registration for improved state sharing between agents
- Marked `KubernetesPodExecutor` as deprecated for task dispatch, recommending Redis-based task queue usage
- Updated integration tests and fixtures to mock or patch Redis for reliable test execution
- Updated `uv.lock` to reflect new dependencies: `kubernetes`, `durationpy`, `google-auth`, `pyasn1`, `pyasn1-modules`, `requests-oauthlib`, `rsa`, `websocket-client`

**Removed:**

- Removed direct K8s port-forward and local subprocess-based task routing from main orchestrator workflow in favor of in-cluster execution and Redis-based coordination
- Deprecated direct in-process pod execution for agent communication in favor of Redis queue mechanisms
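
The backend selection and the synchronous-style wait described in this commit can be approximated with `asyncio` futures. This is a sketch under stated assumptions: `SketchDispatcher`, its `complete` method, and the return of `None` on timeout are invented for illustration, and only the names `redis_url` and `dispatch_and_wait` appear in the commit message.

```python
import asyncio


class SketchDispatcher:
    def __init__(self, redis_url=None):
        # Redis-backed queueing when a URL is configured, else in-memory.
        self.backend = "redis" if redis_url else "memory"
        self._pending = {}

    async def dispatch_and_wait(self, task_id: str, timeout: float):
        """Register a task, then block until its result arrives or time runs out."""
        future = asyncio.get_running_loop().create_future()
        self._pending[task_id] = future
        try:
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            return None  # assumed convention: None signals a timeout
        finally:
            self._pending.pop(task_id, None)

    def complete(self, task_id: str, result) -> None:
        """Called when a worker reports a result for a pending task."""
        future = self._pending.get(task_id)
        if future is not None and not future.done():
            future.set_result(result)


async def demo():
    dispatcher = SketchDispatcher()
    loop = asyncio.get_running_loop()
    # Simulate a worker finishing task "t1" shortly after dispatch.
    loop.call_later(0.01, dispatcher.complete, "t1", {"ok": True})
    done = await dispatcher.dispatch_and_wait("t1", timeout=1.0)
    missed = await dispatcher.dispatch_and_wait("t2", timeout=0.01)
    return done, missed


results = asyncio.run(demo())
```

A timeout parameter matters here because a worker pod may crash or never pick up the task, and the orchestrator should not wait indefinitely.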

linear Bot commented Jan 13, 2026

CAP-847 Implement Multi-Agent Red Team Architecture in Kubernetes

Description:
Transition the Ares red team framework from a monolithic agent to a modular, multi-agent architecture, where each specialized agent runs in its own Kubernetes pod. This change enables parallel, coordinated penetration testing with role-based expertise and improved resiliency.


Objective:

Design and implement a Kubernetes-based multi-agent red team system with specialized agents, centralized task dispatch, and resilient shared state, supporting parallelized attack workflows and coordinated operations.


Scope of Work:

  • Finalize and review the RED-AGENTS-PROPOSAL.md architecture document
  • Implement RedTeamDispatcher for centralized task routing and credential broadcasting (pub/sub)
  • Develop KubernetesPodExecutor to manage ephemeral agent pods
  • Build SharedRedTeamState with Redis for state sharing and recovery
  • Define AgentRole enum and map roles to toolsets (ROLE_TOOLSETS)
  • Create a factory for spawning agents by role
  • Implement inter-agent communication protocol (message types: CredentialDiscovered, CrackRequest, etc.)
  • Update red_factory.py to track vulnerabilities, prioritize exploitation, and enforce completion criteria
  • Enhance network.py with credential discovery utilities (LDAP search, LAPS, spraying)
  • Draft and update role-specific system instructions (system_instructions.md.jinja)
  • Develop and expand the test suite for tracking logic, event handling, and pod recovery
  • Integrate Redis/etcd and Kubernetes into development and CI environments

Dependencies:

  • Redis/etcd deployed for shared state management
  • Access to a Kubernetes cluster (test and production)
  • Python Kubernetes client package available
  • Dreadnode Agent SDK with multi-agent support
  • Stakeholder review and approval of the architecture proposal

Acceptance Criteria:

  1. Each agent role (Orchestrator, Cracker, ACL Exploiter, PrivEsc, Lateral Mover, Poisoner, Atomic Red Team) is implemented as a separate, deployable Kubernetes pod.
  2. Central RedTeamDispatcher coordinates task assignment, credential sharing, and agent orchestration.
  3. Shared state is reliably persisted in Redis/etcd and supports recovery from individual pod failures.
  4. Inter-agent communication uses a defined message protocol and enables credential and task sharing.
  5. Vulnerability and credential discovery utilities are operational and integrated into the workflow.
  6. End-to-end tests demonstrate parallel operation, pod recovery, and successful exploitation loops.
  7. Documentation (RED-AGENTS-PROPOSAL.md and system instructions) is complete and reflects the implemented system.

Additional Notes:

  • Success metrics: Domain Admin access in under 2 hours, >90% credential discovery, 100% exploitation of found vulnerabilities, >95% pod recovery, >3x parallel efficiency over baseline.
  • Address open questions (LLM cost, T-code validation, blue team evasion, resource contention, rollback) during implementation.
  • Reference architecture diagrams and workflow examples in RED-AGENTS-PROPOSAL.md.
  • Consider performance and resource usage in Kubernetes during development.

dreadnode-renovate-bot Bot added the area/python and area/pre-commit labels Jan 13, 2026

**Changed:**

- Adjust temporary project structure in test to create the templates directory at
  `src/ares/templates` with parent directories, aligning with the path expected
  by the script in `test_generate_readme.py`

l50 merged commit 721f394 into main Jan 13, 2026
8 checks passed
l50 deleted the jayson/cap-847-implement-multi-agent-red-team-architecture-in-kubernetes branch January 13, 2026 22:30