CodeWeaver System

An autonomous SRE agent system with chaos engineering testing.

🎯 Problem Statement

In modern cloud-native environments, Site Reliability Engineers (SREs) face several critical challenges:

24/7 Incident Response: Production systems fail at any time, requiring constant human monitoring and immediate response, leading to burnout and high operational costs.
Manual Diagnostics: Engineers spend significant time reading through logs, correlating events, and identifying root causes during incidents - time that could be better spent on prevention and innovation.
Slow Mean Time to Recovery (MTTR): Even with runbooks and documentation, the time between incident detection and resolution remains high due to human involvement in the diagnostic and remediation loop.
Alert Fatigue: Teams are overwhelmed with alerts, many of which could be resolved automatically with the right context and decision-making capabilities.
Inconsistent Response: Different engineers may handle the same incident differently, leading to varying resolution times and outcomes.
Lack of Proactive Testing: Chaos engineering and failure injection are often manual processes, making it difficult to validate system resilience continuously.

💡 Our Solution: CodeWeaver

CodeWeaver is an autonomous, AI-powered SRE agent that combines log analysis, self-healing, and chaos engineering to handle common incidents without human intervention.

Key Features

🤖 Autonomous Incident Response

Receives alerts via webhooks and immediately springs into action
No human intervention required for common failure scenarios
Continuous learning from incident patterns

🔍 AI-Powered Diagnostics

Uses an LLM served through the Groq API to analyze logs and identify root causes
Correlates errors across distributed systems
Provides intelligent remediation recommendations

🛠️ Self-Healing Execution

Generates and executes Python scripts to resolve issues autonomously
Interacts with service APIs to trigger recovery actions
Validates successful remediation
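
One way to run a generated remediation script is in a separate interpreter process with a timeout, so a hanging or crashing script cannot take the agent down with it. The sketch below is an illustration under that assumption, not the project's executor; `run_generated_script` and `ScriptResult` are hypothetical names.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ScriptResult:
    returncode: int
    stdout: str
    stderr: str
    timed_out: bool = False


def run_generated_script(source: str, timeout_seconds: float = 30.0) -> ScriptResult:
    """Run Python source in a child interpreter and capture its output."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],  # same interpreter as the agent
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return ScriptResult(returncode=-1, stdout="", stderr="timed out", timed_out=True)
    return ScriptResult(proc.returncode, proc.stdout, proc.stderr)
```

A non-zero `returncode` or `timed_out=True` tells the caller that remediation failed and the incident should be escalated rather than marked resolved.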

🧪 Built-in Chaos Engineering

Integrated chaos-app for continuous resilience testing
Simulates real-world failures (network issues, service crashes, etc.)
Validates that the autonomous agent can handle failures before they occur in production

📊 Full Observability

Shared log volumes for seamless log access
Real-time monitoring through structured logging
Transparent decision-making process

How It Works

Detection: CodeWeaver receives an alert via webhook when a service degrades
Diagnosis: AI agent reads logs from shared volumes and analyzes error patterns
Planning: LLM generates an execution plan with Python scripts to resolve the issue
Execution: Agent automatically executes the remediation scripts
Validation: Verifies that the service has been restored to a healthy state
Learning: Logs the entire process for future reference and improvement
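
The steps above can be sketched as a single function. This is a minimal illustration, not the code in `Core/src`: `handle_alert`, `IncidentRecord`, and the injected callables are hypothetical names, and in the real system the diagnosis and planning steps are backed by the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class IncidentRecord:
    """Everything the agent did for one alert, kept for the Learning step."""
    alert: dict
    diagnosis: str = ""
    actions: List[str] = field(default_factory=list)
    recovered: bool = False


def handle_alert(
    alert: dict,
    read_logs: Callable[[], str],
    diagnose: Callable[[str], str],
    plan: Callable[[str], List[str]],
    execute: Callable[[str], None],
    is_healthy: Callable[[], bool],
) -> IncidentRecord:
    record = IncidentRecord(alert=alert)       # Detection: an alert arrived
    record.diagnosis = diagnose(read_logs())   # Diagnosis: analyze the logs
    record.actions = plan(record.diagnosis)    # Planning: decide what to run
    for action in record.actions:              # Execution: run each action
        execute(action)
    record.recovered = is_healthy()            # Validation: check service health
    return record                              # Learning: caller stores the record
```

Passing the steps in as callables keeps the control flow testable without a running service or an LLM.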

Impact

⚡ Reduced MTTR: From minutes or hours to seconds for failures the agent can resolve on its own
💰 Lower Operational Costs: Reduces need for 24/7 on-call rotations
🎯 Consistent Response: Same high-quality resolution every time
🛡️ Proactive Resilience: Continuous chaos testing ensures readiness
😌 Reduced Burnout: Engineers focus on innovation, not firefighting

🚀 Quick Start with Docker Compose

Prerequisites

Docker and Docker Compose installed
Groq API key (get from https://console.groq.com/keys)

1. Set up environment variables

Create a .env file in the Core directory:

GROQ_API_KEY=your_groq_api_key_here

2. Start the system

docker compose up --build

This will start:

chaos-app on port 8000 - Service that can simulate failures
codeweaver-agent on port 8001 - Autonomous SRE agent

3. Test the system

Trigger a failure:

curl -X POST http://localhost:8000/chaos/trigger

Send alert to trigger autonomous recovery:

curl -X POST http://localhost:8001/webhook/alert \
  -H "Content-Type: application/json" \
  -d '{"data": {"message": "Service Down", "severity": "critical"}}'

Watch the magic happen:

CodeWeaver reads logs from shared volume
AI analyzes the error (ConnectionRefusedError)
Plans a restart action
Executes POST /chaos/resolve
Service recovers automatically! ✨
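
The alert from step 3 can also be sent from Python using only the standard library. The sketch below builds the same request as the curl command; `build_alert_request` is a hypothetical helper name, and the URL and payload are taken from the command above.

```python
import json
import urllib.request


def build_alert_request(
    base_url: str, message: str, severity: str = "critical"
) -> urllib.request.Request:
    """Build the POST /webhook/alert request without sending it."""
    payload = {"data": {"message": message, "severity": severity}}
    return urllib.request.Request(
        url=f"{base_url}/webhook/alert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result of `build_alert_request("http://localhost:8001", "Service Down")` to `urllib.request.urlopen`.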

🏗️ Architecture

┌─────────────────────┐         ┌─────────────────────┐
│   Chaos App         │         │  CodeWeaver Agent   │
│   Port: 8000        │◄────────│   Port: 8001        │
│                     │         │                     │
│  Simulates failures │         │  Monitors & Fixes   │
│  Writes logs        │         │  Reads logs via     │
│  /var/log/chaos-app │────────►│  shared volume      │
└─────────────────────┘         └─────────────────────┘
         │                               │
         └───────────┬───────────────────┘
                     │
              Shared Volume
            (shared-logs)

📁 Project Structure

codeweaver/
├── docker-compose.yml       # Orchestration config
├── Core/                    # SRE Agent
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── .env                # API keys
│   └── src/
│       ├── main.py         # FastAPI app
│       ├── diagnoser.py    # AI log analysis
│       ├── planner.py      # Action planning
│       └── executor.py     # Action execution
└── chaos-app/              # Test service
    ├── Dockerfile
    ├── requirements.txt
    └── main.py

🔧 Services

Chaos App

Endpoints:
- GET / - Health check
- GET /buy - Payment endpoint (fails when broken)
- GET /status - Check if in chaos mode
- POST /chaos/trigger - Activate chaos mode
- POST /chaos/resolve - Deactivate chaos mode
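
The behavior of these endpoints can be modeled as a single boolean flag. The class below is an in-memory sketch of that behavior, not the code in `chaos-app/main.py`; the class name, method names, and response bodies are assumptions, and only the 500 status for `/buy` in chaos mode comes from this README.

```python
from typing import Tuple


class ChaosService:
    """In-memory model of the chaos-app endpoints; response bodies are illustrative."""

    def __init__(self) -> None:
        self.chaos_mode = False

    def trigger(self) -> Tuple[int, dict]:   # POST /chaos/trigger
        self.chaos_mode = True
        return 200, {"chaos_mode": True}

    def resolve(self) -> Tuple[int, dict]:   # POST /chaos/resolve
        self.chaos_mode = False
        return 200, {"chaos_mode": False}

    def status(self) -> Tuple[int, dict]:    # GET /status
        return 200, {"chaos_mode": self.chaos_mode}

    def buy(self) -> Tuple[int, dict]:       # GET /buy
        if self.chaos_mode:
            return 500, {"error": "payment failed"}
        return 200, {"result": "payment accepted"}
```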

CodeWeaver Agent

Endpoints:
- GET / - Health check
- POST /webhook/alert - Receive alerts and trigger autonomous recovery
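
The example alerts in this README wrap their fields in a `data` object, with `message` always present and `severity` sometimes omitted. A sketch of validating that shape follows; `Alert`, `parse_alert`, and the `"unknown"` default are hypothetical, and the agent's actual schema may accept more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    message: str
    severity: str


def parse_alert(body: dict) -> Alert:
    """Extract the alert fields from a decoded JSON body."""
    data = body.get("data")
    if not isinstance(data, dict) or "message" not in data:
        raise ValueError("alert body must contain data.message")
    return Alert(
        message=str(data["message"]),
        severity=str(data.get("severity", "unknown")),  # severity is optional
    )
```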

🐳 Docker Compose Features

Shared Logs: The shared-logs volume allows the agent to read chaos-app logs
Custom Network: codeweaver-net bridge for service communication
Health Checks: chaos-app must be healthy before the agent starts
Environment: GROQ_API_KEY is passed from the .env file

🛠️ Local Development

CodeWeaver Agent

cd Core
python -m venv venv
.\venv\Scripts\Activate.ps1  # Windows
source venv/bin/activate      # Linux/Mac
pip install -r requirements.txt
uvicorn src.main:app --host 0.0.0.0 --port 8001 --reload

Chaos App

cd chaos-app
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

📊 Monitoring

View logs:

# All services
docker compose logs -f

# Specific service
docker compose logs -f codeweaver-agent
docker compose logs -f chaos-app

# Shared logs volume
docker exec -it codeweaver-agent cat /logs/chaos-app/service.log

🧪 Full Integration Test

Start the system:
```
docker compose up -d
```

Verify services are running:

curl http://localhost:8000/
curl http://localhost:8001/

Trigger chaos:

curl -X POST http://localhost:8000/chaos/trigger

Verify service is broken:

curl http://localhost:8000/buy
# Should return 500 error

Send alert to CodeWeaver:

curl -X POST http://localhost:8001/webhook/alert \
  -H "Content-Type: application/json" \
  -d '{"data": {"message": "Critical failure"}}'

CodeWeaver will automatically:
- Read logs from /logs/chaos-app/service.log
- Detect ConnectionRefusedError
- Plan restart action
- Execute POST http://chaos-app:8000/chaos/resolve
- Service recovers!
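
The diagnose-and-plan part of this sequence can be approximated with a rule-based stand-in for the LLM. The sketch below is illustrative only: the `REMEDIATIONS` table, the function names, and the assumption that exception names appear verbatim in the log are not taken from the project.

```python
import re
from typing import Optional, Tuple

# Hypothetical lookup from a detected error to a remediation request.
REMEDIATIONS = {
    "ConnectionRefusedError": ("POST", "http://chaos-app:8000/chaos/resolve"),
}

# Matches Python-style exception names such as ConnectionRefusedError.
ERROR_NAME = re.compile(r"\b([A-Z][A-Za-z]*Error)\b")


def diagnose(log_text: str) -> Optional[str]:
    """Return the last exception name found in the log, or None."""
    matches = ERROR_NAME.findall(log_text)
    return matches[-1] if matches else None


def plan(error_name: Optional[str]) -> Optional[Tuple[str, str]]:
    """Map an error name to an (HTTP method, URL) pair, or None if unknown."""
    if error_name is None:
        return None
    return REMEDIATIONS.get(error_name)
```

An LLM-backed planner replaces the fixed table with generated actions, but the interface (log text in, action out) can stay the same.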

Verify recovery:

curl http://localhost:8000/buy
# Should return success
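
The manual steps above can be scripted. The sketch below uses only the standard library and takes the HTTP function as a parameter so it can be exercised against a fake. It assumes that "success" means HTTP 200 and that recovery may take a moment after the alert is sent, so it polls `/buy`; the function names and polling parameters are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

CHAOS = "http://localhost:8000"
AGENT = "http://localhost:8001"


def http(method: str, url: str, body: Optional[dict] = None) -> int:
    """Send one request and return the HTTP status code."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; we only need the code


def run_integration_test(
    request: Callable[..., int] = http,
    attempts: int = 10,
    delay_seconds: float = 1.0,
) -> None:
    # Both services answer their health checks.
    assert request("GET", f"{CHAOS}/") == 200
    assert request("GET", f"{AGENT}/") == 200
    # Break the service and confirm /buy fails.
    request("POST", f"{CHAOS}/chaos/trigger")
    assert request("GET", f"{CHAOS}/buy") == 500
    # Alert the agent, then wait for /buy to succeed again.
    request("POST", f"{AGENT}/webhook/alert", {"data": {"message": "Critical failure"}})
    for _ in range(attempts):
        if request("GET", f"{CHAOS}/buy") == 200:
            return
        time.sleep(delay_seconds)
    raise AssertionError("service did not recover")
```

With both containers running, calling `run_integration_test()` with no arguments exercises the real services.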

📝 License

Built for autonomous SRE operations with AI-powered incident response.
