The Ares Blue Agent is an autonomous SOC investigation system. It picks up Grafana alerts, queries Loki logs and Prometheus metrics for evidence, maps findings to MITRE ATT&CK, and writes investigation reports.
Key Capabilities:
- Alert triage and multi-stage investigation (triage → causation → lateral → synthesis)
- LogQL/PromQL query optimization with rate limiting and retry
- Evidence extraction using the Pyramid of Pain framework
- MITRE ATT&CK technique mapping and gap analysis
- Lateral movement detection and scope expansion
- Attack precursor identification (root cause analysis)
- Historical investigation store for pattern matching and false-positive tracking
- Red-Blue correlation to surface detection gaps
- Markdown report generation with timeline, evidence inventory, and recommendations
Location: ares-cli/src/orchestrator/blue/
The investigation orchestrator manages the full investigation lifecycle:
- Coordinates LLM-powered investigation agents for Grafana alerts
- Dispatches tasks to specialized sub-agents (triage, threat hunter, lateral analyst, escalation)
- Chains follow-up investigations based on discovered evidence types
- Enforces hard timeout watchdog (1 min/step + 2 min buffer)
- Generates partial reports on timeout
- Handles investigation state persistence via Redis (task queues run on NATS JetStream)
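
The watchdog budget above can be read as a simple formula. The sketch below assumes the hard deadline is the per-step allowance multiplied by the step count, plus the fixed buffer; `hard_timeout` and both constants are hypothetical names, not identifiers from the Ares codebase.

```rust
use std::time::Duration;

// Figures from the list above: 1 minute per step plus a 2 minute buffer.
const STEP_TIMEOUT: Duration = Duration::from_secs(60);
const TIMEOUT_BUFFER: Duration = Duration::from_secs(120);

/// Hard deadline for an investigation that may take up to `max_steps` steps.
/// Assumption: the per-step allowance is multiplied by the step count.
fn hard_timeout(max_steps: u32) -> Duration {
    STEP_TIMEOUT * max_steps + TIMEOUT_BUFFER
}

fn main() {
    // 15 steps: 15 * 60s + 120s = 1020s
    println!("{:?}", hard_timeout(15));
}
```

Under this reading, a 15-step investigation gets 17 minutes before the watchdog fires and a partial report is generated.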
Location: ares-cli/src/worker/blue_task_loop.rs
Runs the worker-side investigation loop with:
- Adaptive query limits based on alert severity and stage
- Query optimization and duplicate detection
- Rate limiting to prevent resource abuse
- Automatic retry with exponential backoff
- Resilience mechanisms for failed queries
Location: ares-core/src/models/blue.rs
The SharedBlueTeamState model tracks:
- Investigation ID, alert context, current stage
- Evidence inventory with pyramid level classification
- Timeline of events with MITRE technique mappings
- Investigative questions from question engines
- Query execution log
- Identified MITRE techniques and tactics
- Queried hosts/users for scope tracking
- Lateral movement graph
- Attack synopsis and recommendations
- Escalation status
Triage:
- Initial alert analysis
- First-level evidence gathering
- IOC extraction (IPs, domains, hashes, processes)
- Basic timeline construction
- Query limit: 8 queries (12 for critical alerts)

Causation:
- Root cause analysis
- Precursor attack identification
- Attack chain reconstruction
- Evidence validation and correlation
- Query limit: 14 queries

Lateral:
- Lateral movement detection
- Impact assessment across hosts/users
- Scope expansion to compromised assets
- Connection graph construction
- Query limit: 20 queries

Synthesis:
- Evidence consolidation
- MITRE ATT&CK mapping
- Pyramid of Pain assessment
- Recommendations generation
- Markdown report creation
- Query limit: 20 queries
Alert Detected
↓
TRIAGE (query observability data)
↓
CAUSATION (find root cause)
↓
LATERAL (assess scope)
↓
SYNTHESIS (generate report)
↓
Report Delivered
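
The stage flow above maps naturally onto a small state machine. The following is a minimal sketch, not the actual Ares model: the `Stage` enum and its methods are hypothetical, and the query limits are the per-stage figures documented in this README.

```rust
/// Investigation stages in the order shown in the flow above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Triage,
    Causation,
    Lateral,
    Synthesis,
}

impl Stage {
    /// The stage that follows this one, or None once the report is due.
    fn next(self) -> Option<Stage> {
        match self {
            Stage::Triage => Some(Stage::Causation),
            Stage::Causation => Some(Stage::Lateral),
            Stage::Lateral => Some(Stage::Synthesis),
            Stage::Synthesis => None,
        }
    }

    /// Documented per-stage query limit; only triage varies with severity.
    fn query_limit(self, critical: bool) -> u32 {
        match self {
            Stage::Triage => {
                if critical {
                    12
                } else {
                    8
                }
            }
            Stage::Causation => 14,
            Stage::Lateral | Stage::Synthesis => 20,
        }
    }
}

fn main() {
    // Walk the pipeline from triage to synthesis.
    let mut stage = Some(Stage::Triage);
    while let Some(s) = stage {
        println!("{:?}: up to {} queries", s, s.query_limit(false));
        stage = s.next();
    }
}
```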
Location: ares-tools/src/ (blue feature)
record_evidence(
evidence_type: EvidenceType, // ip, domain, hash, process, file, user, etc.
value: String,
pyramid_level: i32, // 1=Hash Values, 6=TTPs
mitre_techniques: Vec<String>,
confidence: f64, // 0.0-1.0
description: String,
source_query: Option<String>
)
Evidence Types:
- ip - IP addresses
- domain - Domain names
- hash - File hashes
- process - Process names/paths
- file - File paths
- user - User accounts
- service - Services/daemons
- tool - Attack tools
- malware - Malware families
- technique - MITRE techniques
- behavior - Attack behaviors
Pyramid of Pain Levels:
1. Hash Values (trivial to change)
2. IP Addresses
3. Domain Names
4. Network/Host Artifacts
5. Tools
6. TTPs (hard to change)
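
The two numeric arguments of record_evidence have documented ranges: pyramid_level runs from 1 (Hash Values) to 6 (TTPs), and confidence from 0.0 to 1.0. A minimal sketch of checking them before recording is shown below; `PyramidLevel` and `validate_evidence` are hypothetical names for illustration, and the real tool may validate differently or not at all.

```rust
/// Pyramid of Pain levels in the order listed above (1 = Hash Values, 6 = TTPs).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum PyramidLevel {
    HashValues = 1,
    IpAddresses = 2,
    DomainNames = 3,
    Artifacts = 4,
    Tools = 5,
    Ttps = 6,
}

impl PyramidLevel {
    /// Converts the raw integer used by the tool call into a typed level.
    fn from_i32(level: i32) -> Option<PyramidLevel> {
        match level {
            1 => Some(PyramidLevel::HashValues),
            2 => Some(PyramidLevel::IpAddresses),
            3 => Some(PyramidLevel::DomainNames),
            4 => Some(PyramidLevel::Artifacts),
            5 => Some(PyramidLevel::Tools),
            6 => Some(PyramidLevel::Ttps),
            _ => None,
        }
    }
}

/// Rejects out-of-range levels and confidences (including NaN).
fn validate_evidence(pyramid_level: i32, confidence: f64) -> Result<PyramidLevel, String> {
    let level = PyramidLevel::from_i32(pyramid_level)
        .ok_or_else(|| format!("pyramid_level {} is outside 1..=6", pyramid_level))?;
    if !(0.0..=1.0).contains(&confidence) {
        return Err(format!("confidence {} is outside 0.0..=1.0", confidence));
    }
    Ok(level)
}

fn main() {
    println!("{:?}", validate_evidence(6, 0.9));
    println!("{:?}", validate_evidence(7, 0.9));
}
```

Deriving `Ord` on the enum means "higher on the pyramid" can be expressed as a plain comparison, for example `level >= PyramidLevel::Artifacts`.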
add_timeline_event(
timestamp: String,
description: String,
mitre_technique: Option<String>,
evidence_ids: Vec<String>,
severity: String // info, low, medium, high, critical
)
track_host_investigation(hostname: String)
track_user_investigation(username: String)
complete_investigation(
attack_synopsis: String,
recommendations: Vec<String>,
should_escalate: bool,
escalation_reason: Option<String>
)
Finalizes investigation with:
- Attack summary and recommendations
- Automatic response guidance extraction from alert annotations
- Fallback synopsis generation from collected evidence
- Investigation report generation trigger
get_firing_alerts() -> Vec<Alert>
get_alert_history(alert_name, lookback_hours) -> Vec<Alert>
post_investigation_started(investigation_id, alert_name)
post_investigation_completed(investigation_id, report_url)
Features:
- MCP connection management (60s timeout with fallback)
- Multi-endpoint support for different Grafana versions
- Automatic annotation creation on Grafana dashboards
query_loki(
logql: String,
start_time: String,
end_time: String,
limit: i32 = 100
) -> Vec<LogLine>
Features:
- Query validation and optimization
- Regex error detection (catches empty-compatible patterns like .*)
- Support for label matchers, line filters, and parsers
- Result streaming with configurable line limits
- Automatic time range adjustment on timeout
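
The last feature, automatic time range adjustment on timeout, could work along the following lines. This sketch assumes the window is halved toward the most recent data after each timeout. The actual shrink policy is not documented here, and `QueryError`, `shrink_window`, and `query_with_shrinking_window` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum QueryError {
    Timeout,
    Other(String),
}

/// Keeps the most recent half of a [start, end] window (epoch seconds).
fn shrink_window(start: u64, end: u64) -> (u64, u64) {
    let span = end.saturating_sub(start);
    (end - span / 2, end)
}

/// Runs `run` and halves the window after each timeout, up to `max_attempts` tries.
/// Non-timeout errors and successes are returned immediately.
fn query_with_shrinking_window<F>(
    start: u64,
    end: u64,
    max_attempts: u32,
    mut run: F,
) -> Result<Vec<String>, QueryError>
where
    F: FnMut(u64, u64) -> Result<Vec<String>, QueryError>,
{
    let mut window = (start, end);
    for _ in 0..max_attempts {
        match run(window.0, window.1) {
            Err(QueryError::Timeout) => window = shrink_window(window.0, window.1),
            other => return other,
        }
    }
    Err(QueryError::Timeout)
}

fn main() {
    // Fake backend for illustration: windows longer than 15 minutes time out.
    let result = query_with_shrinking_window(0, 3600, 4, |s, e| {
        if e - s > 900 {
            Err(QueryError::Timeout)
        } else {
            Ok(vec![format!("{}..{}", s, e)])
        }
    });
    println!("{:?}", result);
}
```

Shrinking toward the end of the window favors recent events, which fits alert-driven investigation, but it can hide precursor activity, so a real implementation might instead split the window and query both halves.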
query_prometheus_instant(query: String, time: String)
query_prometheus_range(query: String, start: String, end: String, step: String)
get_metric_metadata(metric: String)
Pre-built LogQL queries optimized for detecting red team attack patterns:
- Windows Event ID detection templates
- Pattern-based filters for common attack techniques
- Performance optimization (prefer |= over |~)
- Optimized selectors to prevent Loki timeouts
Example templates:
- Lateral movement detection (RDP, SMB, WMI, PSExec)
- Privilege escalation events
- Credential dumping patterns
- Suspicious process execution
- Network reconnaissance
get_combined_questions() -> Vec<InvestigativeQuestion>
Generates investigative questions from three engines:
- MITRE Navigator Engine
  - Maps evidence to MITRE techniques
  - Predicts follow-on techniques in attack chains
  - Identifies tactic gaps in coverage
- Pyramid Climber Engine
  - Pushes investigation from IOCs toward TTPs
  - Encourages evidence at higher pyramid levels
  - Guides analysts toward actionable intelligence
- Detection Recipes Engine
  - Windows Security Event patterns
  - Structured investigation workflows
  - Event ID correlation patterns
find_similar_investigations(
alert_name: String,
mitre_techniques: Vec<String>,
severity: String
) -> Vec<Investigation>
Features:
- Historical investigation lookup
- Query effectiveness statistics
- False positive pattern learning
- Investigation pattern matching
- Technique name resolution
- Tactic mapping (Reconnaissance, Initial Access, Execution, etc.)
- Attack lifecycle coverage analysis
- Technique relationship mapping
Location: ares-core/src/correlation/
The AlertCluster type groups related alerts using similarity scoring:
Similarity Factors:
- Common hosts (40% weight)
- Common users (30% weight)
- Common IPs (20% weight)
- Shared MITRE techniques (10% weight)
Features:
- Time-window clustering
- Extracts hosts, users, IPs, techniques from alert labels/annotations
- Identifies campaign patterns across multiple alerts
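
The weights above say how much each factor contributes, but not how overlap within a factor is measured. The sketch below assumes Jaccard overlap per factor. `AlertFeatures`, `jaccard`, and `similarity` are hypothetical names, not the AlertCluster API.

```rust
use std::collections::HashSet;

/// Entities extracted from one alert's labels and annotations.
#[derive(Default)]
struct AlertFeatures {
    hosts: HashSet<String>,
    users: HashSet<String>,
    ips: HashSet<String>,
    techniques: HashSet<String>,
}

/// Jaccard overlap: shared items divided by all distinct items; 0.0 if both sets are empty.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Weighted similarity using the documented weights (40/30/20/10).
fn similarity(a: &AlertFeatures, b: &AlertFeatures) -> f64 {
    0.4 * jaccard(&a.hosts, &b.hosts)
        + 0.3 * jaccard(&a.users, &b.users)
        + 0.2 * jaccard(&a.ips, &b.ips)
        + 0.1 * jaccard(&a.techniques, &b.techniques)
}

/// Small helper to build a set from string literals.
fn set(items: &[&str]) -> HashSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

fn main() {
    let a = AlertFeatures { hosts: set(&["dc01"]), users: set(&["alice"]), ..Default::default() };
    let b = AlertFeatures { hosts: set(&["dc01"]), users: set(&["bob"]), ..Default::default() };
    // Same host, different users, nothing else known: only the host weight counts.
    println!("{:.2}", similarity(&a, &b));
}
```

A cluster would then admit an alert when its similarity to existing members exceeds some threshold within the time window; that threshold is not specified in this document.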
Location: ares-core/src/state/
The LateralGraph tracks host-to-host connections and attack spread:
Connection Types:
- SMB (file shares)
- RDP (remote desktop)
- WMI (Windows Management Instrumentation)
- PSExec (remote execution)
- SSH (secure shell)
- WinRM (Windows Remote Management)
- DCOM (Distributed COM)
Features:
- Investigated vs pending hosts tracking
- Pivot suggestions for scope expansion
- Evidence linkage to connections
- MITRE technique associations
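
To illustrate investigated-versus-pending tracking and pivot suggestions, here is a minimal sketch of such a graph. It is not the real LateralGraph: the type `LateralGraphSketch`, its methods, and the rule "suggest any uninvestigated host reachable from an investigated one" are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Connection types listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ConnectionType {
    Smb,
    Rdp,
    Wmi,
    PsExec,
    Ssh,
    WinRm,
    Dcom,
}

#[derive(Default)]
struct LateralGraphSketch {
    /// Outgoing connections per source host.
    edges: BTreeMap<String, BTreeSet<(String, ConnectionType)>>,
    /// Hosts that have already been investigated.
    investigated: BTreeSet<String>,
}

impl LateralGraphSketch {
    fn add_connection(&mut self, src: &str, dst: &str, kind: ConnectionType) {
        self.edges
            .entry(src.to_string())
            .or_default()
            .insert((dst.to_string(), kind));
    }

    fn mark_investigated(&mut self, host: &str) {
        self.investigated.insert(host.to_string());
    }

    /// Hosts reachable in one hop from an investigated host that are still pending.
    fn pivot_suggestions(&self) -> BTreeSet<String> {
        self.investigated
            .iter()
            .filter_map(|host| self.edges.get(host))
            .flatten()
            .map(|(dst, _)| dst.clone())
            .filter(|dst| !self.investigated.contains(dst))
            .collect()
    }
}

fn main() {
    let mut graph = LateralGraphSketch::default();
    graph.add_connection("ws01", "dc01", ConnectionType::Smb);
    graph.add_connection("ws01", "fs01", ConnectionType::Rdp);
    graph.add_connection("dc01", "ws01", ConnectionType::WinRm);
    graph.mark_investigated("ws01");
    println!("{:?}", graph.pivot_suggestions());
}
```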
Location: ares-core/src/correlation/
Correlates red team activities with blue team detections to identify gaps:
Components:
- RedTeamActivity - Captures red team attack actions
- BlueTeamDetection - Records blue team alert/investigation results
- CorrelationMatch - Links activities to detections
- DetectionGap - Identifies undetected red team activities
- CorrelationReport - Full correlation analysis
Match Quality Levels:
- STRONG - Direct correlation with high confidence
- GOOD - Clear correlation with supporting evidence
- WEAK - Possible correlation with limited evidence
- TENUOUS - Low confidence correlation
Location: ares-core/src/
Automatic validation of recorded evidence:
- IOC extraction from query results
- Validation against recent query results
- Confidence adjustment based on validation status
- Suggested IOCs from query data
- Source query tracking for provenance
Location: ares-core/src/
Ensures reliable query execution:
- Automatic retry with exponential backoff
- Timeout handling with time range reduction
- Query result caching
- Connection pooling
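
Retry with exponential backoff, as listed above, is commonly implemented by doubling a base delay per attempt up to a cap. The sketch below shows that generic pattern using only the standard library. The function names, the doubling factor, and the cap are assumptions, not values taken from Ares.

```rust
use std::thread;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Calls `op` until it succeeds or `max_retries` retries have been used.
fn retry<T, E, F>(max_retries: u32, base: Duration, max: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Fake query for illustration: fails twice, then succeeds.
    let mut calls = 0;
    let result: Result<u32, &str> = retry(
        3,
        Duration::from_millis(10),
        Duration::from_millis(100),
        || {
            calls += 1;
            if calls < 3 { Err("timeout") } else { Ok(calls) }
        },
    );
    println!("{:?}", result);
}
```

Production implementations usually add random jitter to the delay so that many workers retrying at once do not hit Loki or Prometheus in lockstep.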
Query limits scale based on alert severity and investigation stage:
Base Limits:
- Normal alerts: 8 queries per investigation
- Critical alerts: 12 queries per investigation
Stage-Based Limits:
- Triage: 8 queries
- Causation: 14 queries
- Lateral: 20 queries
- Synthesis: 20 queries
Bonus Queries:
- +3 for finding evidence
- +2 for reaching Pyramid level 4+ (Tools/TTPs)
Hard Limits:
- Maximum 25 total queries
- Maximum 2 runs of identical query (duplicate detection)
- Free retries for queries returning 0 results
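
The rules above leave some interactions open, so the following sketch makes its assumptions explicit: each bonus is granted at most once, the pyramid bonus follows the numeric threshold (level 4 or higher), queries returning zero results do not consume budget but still count toward duplicate detection, and the limit never exceeds the hard cap of 25. `QueryBudget` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

const HARD_CAP: u32 = 25;
const MAX_IDENTICAL_RUNS: u32 = 2;

struct QueryBudget {
    limit: u32,
    used: u32,
    runs: HashMap<String, u32>,
    evidence_bonus_given: bool,
    pyramid_bonus_given: bool,
}

impl QueryBudget {
    /// Base limit: 8 for normal alerts, 12 for critical ones.
    fn new(critical: bool) -> QueryBudget {
        QueryBudget {
            limit: if critical { 12 } else { 8 },
            used: 0,
            runs: HashMap::new(),
            evidence_bonus_given: false,
            pyramid_bonus_given: false,
        }
    }

    /// True if `query` may run: budget remains and it has not already run twice.
    fn allows(&self, query: &str) -> bool {
        self.used < self.limit
            && self.runs.get(query).copied().unwrap_or(0) < MAX_IDENTICAL_RUNS
    }

    /// Records a finished query; empty results are free (assumption, see above).
    fn record(&mut self, query: &str, result_count: usize) {
        *self.runs.entry(query.to_string()).or_insert(0) += 1;
        if result_count > 0 {
            self.used += 1;
        }
    }

    /// Applies the +3 evidence bonus and the +2 pyramid bonus, once each.
    fn on_evidence(&mut self, pyramid_level: u8) {
        if !self.evidence_bonus_given {
            self.evidence_bonus_given = true;
            self.grant(3);
        }
        if pyramid_level >= 4 && !self.pyramid_bonus_given {
            self.pyramid_bonus_given = true;
            self.grant(2);
        }
    }

    fn grant(&mut self, extra: u32) {
        self.limit = (self.limit + extra).min(HARD_CAP);
    }
}

fn main() {
    let mut budget = QueryBudget::new(false);
    budget.record("{job=\"eventlog\"} |= \"4624\"", 12);
    budget.on_evidence(5);
    println!("used {} of {}", budget.used, budget.limit);
}
```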
Prevents Broad Selectors:
# BAD - Too broad, causes timeouts
{job=~".+"}
{deployment=~".+"}
# GOOD - Specific labels
{job="eventlog"}
{deployment="windows-hosts"}
Filter Recommendations:
# PREFER: Fast string contains
{job="eventlog"} |= "4624"
# AVOID: Slow regex when unnecessary
{job="eventlog"} |~ "4624"
Best Practices:
- Use specific label selectors (job, deployment, namespace)
- Apply line filters (|=) before regex patterns (|~)
- Limit time ranges for large datasets
- Use streaming aggregations when possible
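
The selector and filter guidance above can be checked mechanically before a query is sent. The sketch below uses plain string inspection rather than a LogQL parser, so it only catches the exact patterns shown in this section. Both function names are hypothetical.

```rust
/// Flags catch-all regex label matchers such as {job=~".+"} or {job=~".*"}.
/// Whitespace is removed first so that `job =~ ".+"` is caught as well.
fn has_broad_selector(logql: &str) -> bool {
    let compact: String = logql.chars().filter(|c| !c.is_whitespace()).collect();
    [r#"=~".+""#, r#"=~".*""#]
        .iter()
        .any(|pattern| compact.contains(*pattern))
}

/// True when a |~ pattern has no regex metacharacters, in which case the
/// cheaper |= substring filter matches the same lines.
fn regex_is_plain_literal(pattern: &str) -> bool {
    const META: [char; 14] = [
        '.', '^', '$', '*', '+', '?', '(', ')', '[', ']', '{', '}', '|', '\\',
    ];
    !pattern.chars().any(|c| META.contains(&c))
}

fn main() {
    println!("{}", has_broad_selector(r#"{job=~".+"}"#));
    println!("{}", has_broad_selector(r#"{job="eventlog"} |= "4624""#));
    println!("{}", regex_is_plain_literal("4624"));
}
```

A stricter validator would parse the stream selector properly, which also allows rejecting selectors that contain only regex matchers even when each pattern looks narrow.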
The blue agent uses MCP to connect to Grafana and access observability data:
Capabilities:
- Grafana datasource discovery
- Loki label name and value enumeration
- Prometheus metric discovery
- Alert rule management
- Dashboard and panel access
- Annotation creation and management
- Multi-architecture image rendering
Setup: See Grafana MCP Setup for MCP server installation instructions.
Location: ares-core/src/reports/
Investigation reports include:
- Executive Summary
  - High-level findings
  - Alert context and severity
  - Key evidence summary
- Timeline of Events
  - Chronological attack progression
  - Pyramid level indicators
  - MITRE technique mappings
- MITRE ATT&CK Mapping
  - Identified techniques and tactics
  - Tactical coverage analysis
  - Attack lifecycle visualization
- Pyramid of Pain Assessment
  - IOC type distribution
  - Progression toward TTPs
  - Actionable intelligence rating
- Evidence Inventory
  - Complete evidence list with sources
  - Confidence ratings
  - Validation status
- Scope Analysis
  - Affected hosts and users
  - Impacted services
  - Lateral movement paths
- Recommendations
  - Immediate response actions
  - Remediation steps
  - Detection improvements
- Appendix
  - Raw query data
  - Investigation metadata
  - JSON export
Completed investigations are stored for learning and reference:
- Investigation store for historical lookup
- Query effectiveness statistics
- Pattern matching for similar cases
- False positive tracking
The blue agent uses four mandatory question engines to guide investigations:
Identifies what came BEFORE the detected technique:
- Analyzes MITRE attack phases
- Identifies likely precursor techniques
- Builds complete attack chains
- Focuses on root cause analysis
Maps techniques and predicts progression:
- Maps evidence to MITRE techniques
- Predicts follow-on techniques
- Identifies tactical gaps in coverage
- Suggests techniques commonly seen together
Pushes investigation toward actionable intelligence:
- Guides from IOCs (hashes, IPs) toward TTPs
- Encourages evidence at higher pyramid levels
- Focuses on attacker behaviors vs artifacts
- Prioritizes hard-to-change indicators
Provides structured investigation workflows:
- Windows Event ID patterns
- Event correlation sequences
- Investigation checklists
- Known attack patterns
Critical Focus Areas:
- Query efficiency: query → record evidence → complete (minimize query loops)
- Use current time values (not stale alert timestamps)
- Mandatory datasource discovery workflow
- Label value enumeration to prevent timeouts
- Immediate evidence recording after queries
- Precursor investigation emphasis (root cause)
- Lateral scope expansion for high/critical alerts
Anti-Patterns to Avoid:
- Multiple queries without recording evidence
- Broad regex patterns in label selectors
- Long time ranges on high-cardinality data
- Duplicate or redundant queries
- Investigation without following question engines
- Ignoring query result validation
| Component | Path |
|---|---|
| Blue Orchestrator | ares-cli/src/orchestrator/blue/ |
| Blue Worker Task Loop | ares-cli/src/worker/blue_task_loop.rs |
| Blue CLI Commands | ares-cli/src/blue/ |
| Core Models | ares-core/src/models/ |
| State Management | ares-core/src/state/ |
| Correlation Engine | ares-core/src/correlation/ |
| Report Generation | ares-core/src/reports/ |
| Tool Dispatch | ares-tools/src/ |
| Configuration | config/ares.yaml |
Blue agent configuration lives in the config/ files (see config/ares.yaml):

blue_team:
  investigation:
    max_queries: 25 # Hard query limit
    timeout_per_step: 60 # Seconds per investigation step
    timeout_buffer: 120 # Extra seconds before hard timeout
    query_cache_ttl: 300 # Query cache TTL in seconds
  observability:
    loki_timeout: 30 # Loki query timeout
    prometheus_timeout: 30 # Prometheus query timeout
    default_log_limit: 100 # Default log line limit
  reporting:
    format: markdown # Report format
    include_raw_data: true # Include appendix with raw data
    export_json: true # Export JSON alongside markdown

- API keys in .env or 1Password: ANTHROPIC_API_KEY, OPENAI_API_KEY, GRAFANA_SERVICE_ACCOUNT_TOKEN, DREADNODE_API_KEY
- Grafana MCP configured (see Grafana MCP Usage)
- Redis accessible (K8s in-cluster, or port-forwarded for local/EC2)
- ares binary built (cargo build --release)
# 1. Start a blue investigation from the latest red team operation
task blue:once LATEST=true
# 2. Monitor progress
task blue:multi:status LATEST=true
# 3. View results
task blue:multi:evidence LATEST=true
task blue:multi:techniques LATEST=true
task blue:reports:consolidate LATEST=true

All blue team tasks are invoked via task blue:<command>. Most accept
OPERATION_ID=op-xxx or LATEST=true to identify the target.
# Single investigation from a red team operation (local execution)
task blue:once OPERATION_ID=op-xxx
task blue:once LATEST=true
# Single investigation from a red team operation (K8s remote)
task blue:once:remote LATEST=true
# Submit a specific alert JSON file
task blue:investigate ALERT=alert.json
# Continuous poll mode (re-checks every POLL_INTERVAL seconds)
task blue:poll
# Multi-agent investigation via K8s orchestrator
task blue:multi ALERT=alert.json
task blue:multi ALERT=alert.json INVESTIGATION_ID=inv-xxx MULTI_AGENT=true
# Multi-agent from red team operation (K8s remote)
task blue:multi:remote LATEST=true
task blue:multi:remote OPERATION_ID=op-xxx

# Investigation status
task blue:multi:status LATEST=true
task blue:multi:status INVESTIGATION_ID=inv-xxx
# Aggregate status for all investigations in an operation
task blue:multi:operation-status LATEST=true
task blue:multi:operation-status LATEST=true WATCH=10 # auto-refresh
# List all investigations
task blue:multi:list
# Runtime info
task blue:multi:runtime LATEST=true
# Triage decision audit trail
task blue:multi:triage-status LATEST=true
# Follow logs
task blue:multi:logs # orchestrator only
task blue:multi:logs ALL=true # all blue pods
task blue:multi:logs ROLE=threat-hunter # specific role

# Evidence collected (Pyramid of Pain items)
task blue:multi:evidence LATEST=true
task blue:multi:evidence LATEST=true JSON=true # machine-readable
# MITRE ATT&CK techniques identified
task blue:multi:techniques LATEST=true

# Generate consolidated report from Redis state
task blue:reports:consolidate LATEST=true
task blue:reports:consolidate OPERATION_ID=op-xxx OUTPUT_DIR=./reports
# Export detection playbook (runs on red orchestrator pod)
task blue:playbook LATEST=true
task blue:playbook OPERATION_ID=op-xxx JSON=true
# List / view local reports
task blue:reports:list
task blue:reports:latest
# Clean up reports
task blue:reports:clean

# Delete a single investigation
task blue:multi:delete INVESTIGATION_ID=inv-xxx
# Delete an operation and all its investigations
task blue:multi:delete-operation OPERATION_ID=op-xxx
# Clean up investigations older than N hours
task blue:multi:cleanup MAX_AGE_HOURS=24
task blue:multi:cleanup ALL=true DRY_RUN=true # preview before deleting

For environments without Taskfile, or when you need more control, use
ares directly. Add --k8s <NAMESPACE> for K8s or --ec2 <NAME> for
EC2 transport.
# Submit from red team operation alerts
ares blue from-operation --latest
ares --k8s attack-simulation blue from-operation op-xxx
# Submit a single alert
ares blue submit '{"alert_title":"Suspicious LSASS","severity":"high"}'
# Continuous poll mode
ares blue watch --poll-interval 30 --max-steps 50
# Investigation status and results
ares blue list
ares blue status --latest
ares blue evidence --latest
ares blue evidence --latest --json
ares blue techniques --latest
ares blue runtime --latest
ares blue triage-status --latest
ares blue operation-status --latest --watch 10
# Report generation
ares blue report --latest --output-dir ./reports
ares blue report --operation-id op-xxx --regenerate
# Cleanup
ares blue delete inv-xxx --force
ares blue delete-operation op-xxx --force
ares blue cleanup --max-age-hours 24 --all --force
ares blue cleanup --dry-run

When running on EC2 instead of K8s, port-forward Redis first:
# Start SSM port-forward (Redis on localhost:16379)
task ec2:redis:forward EC2_NAME=ares-tools
# In another terminal, run blue commands with the forwarded Redis
ARES_REDIS_URL=redis://localhost:16379 ares blue from-operation --latest

Set BLUE_ENABLED=1 to start blue team investigations automatically when
a red team operation runs:
task red:ec2:multi TARGET=dreadgoad DOMAIN=contoso.local BLUE_ENABLED=1

| Variable | Default | Description |
|---|---|---|
| MODEL | config file | LLM model override |
| POLL_INTERVAL | 30 | Seconds between poll cycles |
| MAX_STEPS_BLUE | 50 | Max agent steps (watch/poll mode) |
| MAX_STEPS_BLUE_ONCE | 15 | Max agent steps (once/investigate mode) |
| GRAFANA_URL | (none - must be set) | Grafana instance |
| K8S_NAMESPACE | attack-simulation | K8s namespace for remote commands |
| REPORT_DIR | ./reports | Report output directory |
| LOG_DIR | ./logs | Log output directory |
The Ares Blue Agent handles autonomous SOC investigation:
- Picks up alerts from Grafana
- Queries Loki and Prometheus with rate limiting and retry
- Extracts evidence using the Pyramid of Pain framework
- Maps to MITRE ATT&CK for tactical context and gap analysis
- Identifies attack precursors to build complete attack chains
- Detects lateral movement and expands investigation scope
- Correlates related alerts to identify campaign patterns
- Learns from past investigations
- Generates reports with timelines, recommendations, and evidence
- Posts annotations back to Grafana
The blue agent cuts investigation time by automating the triage-to-report pipeline. The Red-Blue correlation loop surfaces detection gaps that manual review tends to miss.