feat: introduce multi-agent red team orchestration framework by l50 · Pull Request #36 · dreadnode/ares

l50 · 2026-01-13T22:11:24Z

Key Changes:

Implemented distributed multi-agent orchestration with Redis and Kubernetes
Added robust configuration management and agent role specialization
Enabled Redis-based cross-pod task queues for agent communication
Provided detailed role-specific agent instructions and workflow automation

Added:

Multi-agent operation configuration:
- New config/multi-agent-production.yaml defines operational parameters, agent
  roles, timeouts, recovery, priorities, and security for Kubernetes deployments
Multi-agent orchestration core:
- src/ares/core/config.py: YAML/env config loader with agent and operation
  schemas, supporting overrides and caching
- src/ares/core/dispatcher.py: Central RedTeamDispatcher for agent
  registration, message routing, task management, and shared state
- src/ares/core/messages.py: Typed inter-agent protocol for tasks, discovery,
  and coordination
- src/ares/core/task_queue.py: Redis-based cross-pod task queue for
  orchestrator <-> worker communication with heartbeat and queue stats
- src/ares/core/worker.py: Worker agent loop for polling tasks from Redis or
  dispatcher and reporting results
- src/ares/core/workflows.py: Automated credential expansion and exploitation
  workflows for recursive attack loops
- src/ares/core/orchestrator.py: Main entrypoint for running multi-agent
  operations and coordination logic
- src/ares/core/k8s_executor.py: (Deprecated) Kubernetes pod executor for
  direct pod command execution (retained for debugging/logging)
- src/ares/core/recovery.py: OperationRecoveryManager for checkpointing,
  restoring, and cleaning up operation state in Redis
Agent factories and role specializations:
- src/ares/core/factories/red_agents.py: Factories for creating role-specific
  agents, hooks, and toolsets with detailed orchestration logic
- Role-specific agent instruction templates in
  src/ares/templates/redteam/agents/
Orchestrator and callback tools:
- src/ares/tools/red/orchestrator.py: Tools for dispatching tasks, monitoring
  state, and reporting results for orchestrator, cracker, and lateral agents
Integration and end-to-end tests:
- tests/integration/test_multi_agent_workflow.py,
  tests/integration/test_redis_task_queue_integration.py, and
  tests/test_task_queue.py for orchestrator, queue, and workflow validation

Changed:

pyproject.toml, poetry.lock, and uv.lock:
- Added redis, kubernetes, and importlib_resources dependencies for
  distributed coordination and resource loading
- Updated build includes to package templates and YAML configs in the wheel
Taskfile.yaml:
- Added tasks for running, resuming, cleaning up, and monitoring multi-agent
  operations in Kubernetes, plus infrastructure checks
src/ares/core/__init__.py, src/ares/core/factories/__init__.py:
- Re-exported new multi-agent orchestration, dispatcher, and workflow modules
src/ares/core/engines.py, src/ares/core/templates.py:
- Refactored template path resolution to use importlib_resources for reliable
  access in installed packages and containers
src/ares/core/models.py:
- Added multi-agent models: AgentInfo, AgentRole, TaskInfo, TaskResult,
  SharedRedTeamState, and cross-agent compatibility with single-agent state
src/ares/core/remote.py:
- Unified execution logic to support "k8s", "local", and "ssm" modes for
  orchestrator/worker/EC2 with Redis-based command dispatch in K8s
src/ares/tools/red/network.py:
- Expanded with low-hanging fruit credential discovery tools, LAPS, new ADCS,
  ACL, and delegation attack methods, and improved BloodHound host parsing
src/ares/core/factories/red_factory.py:
- Enhanced hooks and workflow logic for vulnerability/attack tracking and
  reporting, including new low-hanging fruit tool prioritization
src/ares/main.py:
- Added multi-agent and worker CLI commands for launching orchestrator and
  specialized agent pods with config-driven options
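For reviewers unfamiliar with the package-resource approach used in engines.py and templates.py, the idea can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `read_package_text` is a hypothetical helper, the Python 3.9 cutoff for the backport is an assumption about the dependency marker, and the stdlib `json` package stands in for the real template package so the snippet runs anywhere.

```python
import sys

# files() is in the stdlib from Python 3.9; older interpreters would need
# the importlib_resources backport (assumed cutoff, see lead-in).
if sys.version_info >= (3, 9):
    from importlib.resources import files
else:
    from importlib_resources import files


def read_package_text(package: str, *parts: str) -> str:
    """Read a text resource shipped inside an installed package.

    Hypothetical helper: resolves the resource relative to the package
    itself instead of relative to the current working directory.
    """
    resource = files(package)
    for part in parts:
        # Join one segment at a time; this works on every Traversable.
        resource = resource.joinpath(part)
    return resource.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Stand-in package; in ares this would point at the templates package.
    print(len(read_package_text("json", "__init__.py")))
```

Resolving paths through the package rather than the filesystem is what lets the same lookup work from a source checkout, an installed wheel, and a container image.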

Removed:

Static template file locations:
- All template and agent instruction paths are now accessed via package
  resources for portability and containerization
Redundant or single-agent-only code paths in factories and workflow logic

…am capabilities

**Added:**

- RedTeamDispatcher class for centralized multi-agent task coordination and message routing (src/ares/core/dispatcher.py)
- KubernetesPodExecutor for executing commands in ephemeral Kubernetes pods, with pod discovery and retry (src/ares/core/k8s_executor.py)
- Inter-agent message protocol with Pydantic models and enums for agent communication (src/ares/core/messages.py)
- OperationRecoveryManager for checkpoint/restore of cluster-wide state in Redis to support pod crash recovery (src/ares/core/recovery.py)
- Multi-agent models: AgentInfo, AgentRole, SharedRedTeamState, TaskInfo, TaskResult, VulnerabilityInfo, AgentLocalState (src/ares/core/models.py)
- Factories for creating specialized red team agents by role, including agent registration and ensemble creation (src/ares/core/factories/red_agents.py)
- Orchestrator, Cracker, and Lateral callback toolsets for agent-specific multi-agent workflows (src/ares/tools/red/orchestrator.py)
- Dedicated agent instruction templates for orchestrator, cracker, lateral, privesc, acl_exploiter, poisoner, atomic roles (templates/redteam/agents/*.md.jinja)

**Changed:**

- `__init__` files in core and factories updated to expose new dispatcher, recovery, Kubernetes executor, multi-agent factories, and models
- RedFactory and RedTeamState expanded to support multi-agent features and new attack chains
- Red team toolsets extended with CredentialDiscoveryTools, expanded ACL, ADCS, delegation, and MSSQL attack support (src/ares/tools/red/network.py)
- System instructions template updated with multi-agent, low-hanging fruit, and advanced attack path guidance (templates/redteam/agents/system_instructions.md.jinja)
- Tests for red_factory updated for new event/message model, more robust event handling, and ToolEnd checks (tests/test_red_factory.py)
- Exposed new orchestration and callback toolsets in tools/red/__init__.py

**Removed:**

- No files removed in this change.

**Changed:**

- Renamed `AgentRole.ORCHESTRATOR` to `AgentRole.ENUM` across all logic, including dispatcher, agent creation, and configuration to better reflect enumeration role
- Renamed `AgentRole.ACL_EXPLOITER` to `AgentRole.ACL` for brevity and clarity
- Renamed `AgentRole.POISONER` to `AgentRole.POISONING` for naming consistency
- Updated all role-based configuration dictionaries, instruction templates, capabilities, and factory logic to use new role names
- Changed default multi-agent ensemble roles to use new names
- Updated all references and logic in dispatcher to use new role names for subscriptions and routing

**Removed:**

- Deprecated old template files and replaced them by renaming:
  - `templates/redteam/agents/orchestrator.md.jinja` → `enum.md.jinja`
  - `templates/redteam/agents/acl_exploiter.md.jinja` → `acl.md.jinja`
  - `templates/redteam/agents/poisoner.md.jinja` → `poisoning.md.jinja`
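To make the rename concrete, here is a minimal sketch of what the role enum and a lookup that still accepts the old names might look like. Only the member names `ENUM`, `ACL`, and `POISONING` come from this commit; the string values, the remaining members, `LEGACY_ROLE_NAMES`, and `parse_role` are assumptions made for illustration and may not match the code in src/ares/core/models.py.

```python
from enum import Enum


class AgentRole(str, Enum):
    # ENUM, ACL and POISONING are the renamed members from this commit.
    # The string values and the other members are assumptions.
    ENUM = "enum"
    ACL = "acl"
    POISONING = "poisoning"
    CRACKER = "cracker"
    LATERAL = "lateral"
    PRIVESC = "privesc"
    ATOMIC = "atomic"


# Hypothetical compatibility map from pre-rename names to current roles.
LEGACY_ROLE_NAMES = {
    "orchestrator": AgentRole.ENUM,
    "acl_exploiter": AgentRole.ACL,
    "poisoner": AgentRole.POISONING,
}


def parse_role(name: str) -> AgentRole:
    """Resolve a role name from config or CLI, accepting old spellings."""
    key = name.strip().lower()
    if key in LEGACY_ROLE_NAMES:
        return LEGACY_ROLE_NAMES[key]
    # Enum lookup by value raises ValueError for unknown role names.
    return AgentRole(key)
```

A map like this would let existing configuration files keep working during the transition; whether the PR keeps any such shim is not stated here.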

**Changed:**

- Relocated all template files from top-level templates directory to src/ares/templates to improve project organization and align with standard source code structure. No content changes were made to the templates.

**Added:**

- Introduced multi-agent operation orchestration via new `ares.core.orchestrator` with workflow automation, agent ensemble creation, and dispatcher integration
- Implemented worker agent loop (`ares.core.worker`) for specialized agent task processing, heartbeat monitoring, and dispatcher task completion reporting
- Added workflow automation utilities (`ares.core.workflows`) for credential expansion and exploitation coordination in multi-agent operations
- Provided production YAML configuration for multi-agent operations (`config/multi-agent-production.yaml`) supporting agent roles, timeouts, priorities, and resource/security settings
- Added integration tests for end-to-end multi-agent workflow orchestration, vulnerability queue, credential expansion, and dispatcher message flow (`tests/integration/test_multi_agent_workflow.py`)

**Changed:**

- Updated `Taskfile.yaml` with new multi-agent red team tasks, status, checkpoint, and Kubernetes infrastructure checks for streamlined multi-agent management
- Extended `pyproject.toml` to include YAML, Jinja, and Markdown files in build artifacts and added conditional dependency for `importlib_resources`
- Updated `src/ares/core/__init__.py` to expose new orchestration, config, worker, and workflow modules in package exports
- Refactored dispatcher (`src/ares/core/dispatcher.py`) to add priority-based vulnerability queue, exploitation tracking, and async task completion handling
- Enhanced orchestrator toolset (`src/ares/tools/red/orchestrator.py`) with credential expansion, vulnerability queueing, and queue status reporting tools
- Improved template resource loading (`src/ares/core/templates.py`) for compatibility with package installations using `importlib_resources`
- Updated red team engines (`src/ares/core/engines.py`) to use new template resource loading for attack chain and detection recipe YAMLs
- Updated main CLI entrypoint (`src/ares/main.py`) to support multi-agent orchestration and worker agent invocation with config-driven argument parsing
- Extended recovery manager (`src/ares/core/recovery.py`) with periodic checkpointing using dispatcher state
- Improved remote execution module (`src/ares/core/remote.py`) to support both Kubernetes subprocess and AWS SSM execution modes

**Added:**

- Added new configuration module (`src/ares/core/config.py`) for YAML-driven, environment-variable-overrideable multi-agent operation settings

**Changed:**

- Updated README generator to use the new template directory location

**Removed:**

- No removals in this change set; all changes are additive or enhancements to support multi-agent red team workflows and orchestration
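The priority-based vulnerability queue with exploitation tracking mentioned for the dispatcher could be structured along these lines. This is a sketch under stated assumptions, not the dispatcher's code: `VulnerabilityQueue`, its method names, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(order=True)
class _Entry:
    priority: int                       # assumed: lower value is served first
    seq: int                            # insertion counter keeps ties in FIFO order
    vuln_id: str = field(compare=False)


class VulnerabilityQueue:
    """Hypothetical priority queue that skips already-exploited findings."""

    def __init__(self) -> None:
        self._heap: List[_Entry] = []
        self._seq = itertools.count()
        self._exploited: Set[str] = set()

    def push(self, vuln_id: str, priority: int) -> bool:
        """Queue a finding; returns False if it was already exploited."""
        if vuln_id in self._exploited:
            return False
        heapq.heappush(self._heap, _Entry(priority, next(self._seq), vuln_id))
        return True

    def pop(self) -> Optional[str]:
        """Return the most urgent pending finding, or None if empty."""
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry.vuln_id not in self._exploited:
                return entry.vuln_id
        return None

    def mark_exploited(self, vuln_id: str) -> None:
        """Record a finished exploitation so stale entries are dropped."""
        self._exploited.add(vuln_id)
```

Filtering exploited entries lazily in `pop` avoids having to search and rebuild the heap when a finding is completed by another agent.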

…rkers

**Added:**

- Added `redis` as a required dependency for worker operation coordination
- Implemented `discover_active_operation` async utility to scan Redis for the most recently checkpointed operation, enabling workers to auto-discover which operation to join if not specified
- Enhanced CLI and worker startup to support optional operation ID with auto-discovery logic

**Changed:**

- Updated worker launch flow to allow empty or missing operation IDs; workers now attempt Redis-based discovery before failing
- Improved CLI documentation and parameter handling to reflect new auto-discovery behavior, including updated usage examples and argument descriptions
- Adjusted handling of empty string operation IDs (e.g., from k8s configmaps) to trigger auto-discovery logic rather than error
- Updated lock and dependency files to include `redis` and the correct marker for `importlib-resources` based on Python version

**Removed:**

- Removed strict requirement for an explicit operation ID when starting a worker; this is now optional due to discovery logic
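The resolution order described above (explicit ID first, then discovery, then failure) can be sketched as follows. This is a simplified illustration: the real `discover_active_operation` is async and scans Redis, whereas here a plain mapping of operation ID to checkpoint timestamp stands in for that scan, and both function names below are hypothetical.

```python
from typing import Mapping, Optional


def pick_latest_operation(checkpoints: Mapping[str, float]) -> Optional[str]:
    """Return the operation ID with the newest checkpoint timestamp."""
    if not checkpoints:
        return None
    return max(checkpoints, key=checkpoints.__getitem__)


def resolve_operation_id(
    explicit: Optional[str], checkpoints: Mapping[str, float]
) -> str:
    """Prefer an explicit ID; treat None, '' and whitespace as missing."""
    if explicit and explicit.strip():
        return explicit.strip()
    discovered = pick_latest_operation(checkpoints)
    if discovered is None:
        # Mirrors "attempt discovery before failing".
        raise RuntimeError("no operation ID given and none discovered")
    return discovered
```

Treating an empty string the same as a missing value matters for the Kubernetes case, where an unset configmap key typically arrives as `""` rather than being absent.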

…oss-pod messaging

**Added:**

- Introduced `RedisTaskQueue` in `src/ares/core/task_queue.py` for cross-pod task and result messaging via Redis, supporting multi-agent workflows in Kubernetes
- Implemented `TaskMessage` and `TaskResult` Pydantic models for structured task/result exchange
- Added `RedisWorkerAgent` to `src/ares/core/worker.py` for polling Redis and reporting results in Kubernetes deployments
- Added `kubernetes>=29.0.0` as a dependency in `pyproject.toml` and `uv.lock` for direct K8s pod interactions
- Created `tests/integration/test_redis_task_queue_integration.py` for end-to-end Redis queue testing
- Created `tests/test_task_queue.py` for unit testing `RedisTaskQueue` behavior and models

**Changed:**

- Updated `RedTeamDispatcher` to use `RedisTaskQueue` when `redis_url` is set, falling back to in-memory queues otherwise
- Refactored all major dispatcher task routing methods (`request_crack`, `request_lateral_movement`, etc.) to support Redis queueing for cross-pod communication
- Extended dispatcher with `dispatch_and_wait` and `wait_for_redis_result` for synchronous-style orchestration via Redis
- Updated worker startup to prefer Redis-based polling in Kubernetes deployments and fall back to in-memory dispatcher in single-process mode
- Improved prompt generation for Redis task consumption in `generate_prompt_from_task`
- Updated orchestrator tools to support `wait_for_result` and timeout parameters, enabling synchronous workflows with Redis-backed workers
- Enhanced BloodHound output parsing and host registration for improved state sharing between agents
- Marked `KubernetesPodExecutor` as deprecated for task dispatch, recommending Redis-based task queue usage
- Updated integration tests and fixtures to mock or patch Redis for reliable test execution
- Updated `uv.lock` to reflect new dependencies: `kubernetes`, `durationpy`, `google-auth`, `pyasn1`, `pyasn1-modules`, `requests-oauthlib`, `rsa`, `websocket-client`

**Removed:**

- Removed direct K8s port-forward and local subprocess-based task routing from main orchestrator workflow in favor of in-cluster execution and Redis-based coordination
- Deprecated direct in-process pod execution for agent communication in favor of Redis queue mechanisms
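The "Redis when `redis_url` is set, in-memory otherwise" selection can be pictured with the sketch below. It is not the `RedisTaskQueue` from this PR: the class names, the `tasks:{role}` key layout, and the JSON payload encoding are assumptions chosen for illustration. Only the redis-py calls `Redis.from_url`, `lpush`, and `brpop` are real library APIs.

```python
import json
import queue
from typing import Dict, Optional


class InMemoryTaskQueue:
    """Single-process fallback: one FIFO queue per agent role."""

    def __init__(self) -> None:
        self._queues: Dict[str, queue.Queue] = {}

    def _q(self, role: str) -> queue.Queue:
        return self._queues.setdefault(role, queue.Queue())

    def push(self, role: str, task: dict) -> None:
        self._q(role).put(task)

    def pop(self, role: str, timeout: float) -> Optional[dict]:
        try:
            return self._q(role).get(timeout=timeout)
        except queue.Empty:
            return None


class RedisQueueSketch:
    """Cross-pod variant backed by a Redis list (hypothetical key layout)."""

    def __init__(self, redis_url: str) -> None:
        import redis  # lazy import: single-process runs need no Redis client

        self._client = redis.Redis.from_url(redis_url)

    def push(self, role: str, task: dict) -> None:
        # LPUSH + BRPOP gives FIFO ordering across pods.
        self._client.lpush(f"tasks:{role}", json.dumps(task))

    def pop(self, role: str, timeout: float) -> Optional[dict]:
        # Note: in Redis a timeout of 0 blocks indefinitely.
        item = self._client.brpop(f"tasks:{role}", timeout=timeout)
        if item is None:
            return None
        _key, payload = item
        return json.loads(payload)


def make_task_queue(redis_url: Optional[str]):
    """Pick the backend the same way the dispatcher change describes."""
    return RedisQueueSketch(redis_url) if redis_url else InMemoryTaskQueue()
```

Because both classes expose the same `push`/`pop` shape, calling code does not need to know which backend it is talking to, which is what makes the single-process fallback useful for tests.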

linear · 2026-01-13T22:11:27Z

CAP-847 Implement Multi-Agent Red Team Architecture in Kubernetes

Description:
Transition the Ares red team framework from a monolithic agent to a modular, multi-agent architecture, where each specialized agent runs in its own Kubernetes pod. This change enables parallel, coordinated penetration testing with role-based expertise and improved resiliency.

Objective:

Design and implement a Kubernetes-based multi-agent red team system with specialized agents, centralized task dispatch, and resilient shared state, supporting parallelized attack workflows and coordinated operations.

Scope of Work:

Dependencies:

Redis/etcd deployed for shared state management
Access to a Kubernetes cluster (test and production)
Python Kubernetes client package available
Dreadnode Agent SDK with multi-agent support
Stakeholder review and approval of the architecture proposal

Acceptance Criteria:

Each agent role (Orchestrator, Cracker, ACL Exploiter, PrivEsc, Lateral Mover, Poisoner, Atomic Red Team) is implemented as a separate, deployable Kubernetes pod.
Central RedTeamDispatcher coordinates task assignment, credential sharing, and agent orchestration.
Shared state is reliably persisted in Redis/etcd and supports recovery from individual pod failures.
Inter-agent communication uses a defined message protocol and enables credential and task sharing.
Vulnerability and credential discovery utilities are operational and integrated into the workflow.
End-to-end tests demonstrate parallel operation, pod recovery, and successful exploitation loops.
Documentation (RED-AGENTS-PROPOSAL.md and system instructions) is complete and reflects the implemented system.

Additional Notes:

Success metrics: Domain Admin access in under 2 hours, >90% credential discovery, 100% exploitation of found vulnerabilities, >95% pod recovery, >3x parallel efficiency over baseline.
Address open questions (LLM cost, T-code validation, blue team evasion, resource contention, rollback) during implementation.
Reference architecture diagrams and workflow examples in RED-AGENTS-PROPOSAL.md.
Consider performance and resource usage in Kubernetes during development.

l50 added 8 commits January 12, 2026 15:48

fix: add enum role to valid roles list

3b58313

fix: add enum to role_mapping dictionary

8d94157

dreadnode-renovate-bot added the area/python and area/pre-commit labels Jan 13, 2026

test: update test project structure to match script expectations

55bab1d

**Changed:**

- Adjust temporary project structure in test to create the templates directory at `src/ares/templates` with parent directories, aligning with the path expected by the script in `test_generate_readme.py`

l50 merged commit 721f394 into main Jan 13, 2026
8 checks passed

l50 deleted the jayson/cap-847-implement-multi-agent-red-team-architecture-in-kubernetes branch January 13, 2026 22:30

l50 merged 9 commits into main from jayson/cap-847-implement-multi-agent-red-team-architecture-in-kubernetes
