Production-ready, full-stack Chaos Engineering platform built entirely on AWS
Test your system's resilience by injecting controlled failures into your infrastructure.
Quick Start • Documentation • Architecture • Examples • Contributing
Modern React dashboard for managing and monitoring chaos experiments
In today's complex distributed systems, failure is inevitable. The question isn't if your system will fail, but when. Chaos Engineering helps you:
- Build Resilience - Discover weaknesses before customers do
- Validate Assumptions - Test if your failover actually works
- Improve Monitoring - Find blind spots in observability
- Increase Confidence - Deploy with certainty your system can handle failures
- Reduce Downtime - Prevent costly outages through proactive testing
Inspired by Netflix's battle-tested Chaos Monkey, this platform brings enterprise-grade chaos engineering to your AWS environment.
- 5 CloudFormation Templates - Complete infrastructure automation
- Multi-AZ VPC - High-availability networking
- Auto Scaling Groups - Dynamic capacity management
- Application Load Balancer - Intelligent traffic distribution
- DynamoDB - Serverless database for experiments and results
- 18-State Step Functions Workflow - Sophisticated orchestration
- CloudWatch Integration - Detailed metrics and logging
- Results Analytics - Comprehensive experiment analysis
- One-Command Deployment - Deploy the entire stack in about 15 minutes
- AWS account with admin access
- AWS CLI configured
- Node.js 18+ and npm
- Git

# 1. Clone the repository
git clone https://github.com/ARAVIND281/Chaos-Engineering-Platform.git
cd Chaos-Engineering-Platform
# 2. Deploy everything (takes ~15 minutes)
./scripts/deploy-fullstack-complete.sh dev
# 3. Access your dashboard
# URL will be displayed after deployment completes

That's it! Your chaos engineering platform is live.
Watch Quick Start Video
Coming soon: Step-by-step video walkthrough
graph TB
subgraph "User Interface"
A[React Dashboard<br/>S3 + CloudFront]
end
subgraph "API Layer"
B[API Gateway]
C[Lambda Functions<br/>TypeScript]
D[DynamoDB<br/>Experiments & Results]
end
subgraph "Chaos Engine"
E[Step Functions<br/>18-State Workflow]
F1[Get Target<br/>Lambda]
F2[Inject Failure<br/>Lambda]
F3[Validate Health<br/>Lambda]
end
subgraph "Target Infrastructure"
G[VPC<br/>Multi-AZ]
H[Auto Scaling Group]
I[Load Balancer]
J[EC2 Instances]
end
A --> B
B --> C
C --> D
C --> E
E --> F1
E --> F2
E --> F3
F1 --> H
F2 --> J
F3 --> I
G --> H
H --> J
I --> J
| Component | Technology | Purpose |
|---|---|---|
| Frontend | React 18 + TypeScript + Vite | Modern dashboard for experiment management |
| Backend API | Lambda + API Gateway | RESTful API for CRUD operations |
| Database | DynamoDB | Serverless data persistence |
| Orchestration | Step Functions | 18-state chaos workflow |
| Chaos Functions | Python Lambda | Failure injection logic |
| Networking | VPC + ALB | Multi-AZ infrastructure |
| Target App | Auto Scaling Group | Sample application for testing |
| Monitoring | CloudWatch | Metrics and logging |
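The API layer hands each experiment to Step Functions as a JSON input. As a rough sketch of what assembling that input might look like, the helper below checks a few fields before serializing them. The field names follow the usage examples in this README; the `ExperimentInput` type, the helper names, and the validation rules are assumptions for illustration, not the platform's actual backend code.

```typescript
// Hypothetical helper: field names follow this README's usage examples,
// but the type and the validation rules are assumptions.
type FailureType = "INSTANCE_TERMINATION" | "CPU_STRESS";

interface ExperimentInput {
  experimentId: string;
  targetType: "ASG";
  targetId: string;
  failureType: FailureType;
  dryRun: boolean;
  configuration?: { expectedHealthyInstances: number };
}

// Returns a list of problems; an empty list means the input looks usable.
function validateInput(input: ExperimentInput): string[] {
  const problems: string[] = [];
  if (input.experimentId.trim() === "") {
    problems.push("experimentId must not be empty");
  }
  if (input.targetId.trim() === "") {
    problems.push("targetId must not be empty");
  }
  const expected = input.configuration?.expectedHealthyInstances;
  if (expected !== undefined && (Math.floor(expected) !== expected || expected < 1)) {
    problems.push("expectedHealthyInstances must be a positive integer");
  }
  return problems;
}

// Step Functions takes the execution input as a JSON string.
function toExecutionInput(input: ExperimentInput): string {
  const problems = validateInput(input);
  if (problems.length > 0) {
    throw new Error(`Invalid experiment input: ${problems.join("; ")}`);
  }
  return JSON.stringify(input);
}

const firstRun: ExperimentInput = {
  experimentId: "exp-cli-001",
  targetType: "ASG",
  targetId: "chaos-platform-asg",
  failureType: "INSTANCE_TERMINATION",
  dryRun: true, // a dry run is the suggested setting for a first test
  configuration: { expectedHealthyInstances: 2 },
};

console.log(toExecutionInput(firstRun));
```

Validating before serializing keeps malformed experiments from ever reaching the workflow, which is cheaper than failing inside a running execution.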
1. Through the Dashboard (Recommended)
- Access your dashboard at the provided URL
- Log in with admin@chaos-platform.com / any-password
- Click "New Experiment"
- Configure the experiment:
  Target: Auto Scaling Group
  Failure Type: Instance Termination
  Dry Run: Enabled (for first test)
- Click "Start Experiment"
- Monitor in real time as the platform:
  - Validates system health
  - Injects controlled failure
  - Monitors system response
  - Validates recovery
  - Generates detailed report
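The sequence above (health check, injection, monitoring, recovery check, report) can be sketched as a small function. This is a simplified, synchronous illustration under stated assumptions: `runExperiment`, the `Hooks` interface, and the simulated target are hypothetical names, and the real platform runs these phases as a Step Functions workflow with waits between checks.

```typescript
// Hypothetical sketch: phase names mirror the list above; the hooks stand in
// for the platform's real health checks and failure injection.
interface Hooks {
  isHealthy: () => boolean;  // e.g. "are all expected instances in service?"
  injectFailure: () => void; // e.g. terminate one instance
}

interface ExperimentReport {
  phases: string[];
  injected: boolean;
  recovered: boolean;
}

function runExperiment(hooks: Hooks, dryRun: boolean, maxRecoveryChecks: number): ExperimentReport {
  const phases: string[] = [];

  // 1. Never inject a failure into a system that is already unhealthy.
  phases.push("pre-check");
  if (!hooks.isHealthy()) {
    phases.push("aborted");
    return { phases, injected: false, recovered: false };
  }

  // 2. A dry run walks the same path but skips the destructive step.
  phases.push(dryRun ? "inject (skipped: dry run)" : "inject");
  if (!dryRun) {
    hooks.injectFailure();
  }

  // 3. Poll until the system is healthy again or the check budget runs out.
  phases.push("monitor");
  let recovered = false;
  for (let check = 0; check < maxRecoveryChecks && !recovered; check++) {
    recovered = hooks.isHealthy();
  }

  // 4. Record the outcome for the report.
  phases.push(recovered ? "recovered" : "not recovered");
  return { phases, injected: !dryRun, recovered };
}

// Simulated target with two instances. After a termination, the replacement
// becomes healthy on the third health check.
let healthy = 2;
let checksSinceFailure = 0;
const simulated: Hooks = {
  isHealthy: () => {
    if (healthy < 2 && ++checksSinceFailure >= 3) {
      healthy = 2; // pretend Auto Scaling launched a replacement
    }
    return healthy === 2;
  },
  injectFailure: () => {
    healthy -= 1;
  },
};

console.log(runExperiment(simulated, false, 5));
```

Aborting when the pre-check fails is the key safety property: an experiment should only ever start from a known-good state, so any failure observed afterwards can be attributed to the injected fault.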
2. Through the AWS CLI
aws stepfunctions start-execution \
--state-machine-arn arn:aws:states:us-east-1:ACCOUNT:stateMachine:chaos-platform-chaos-experiment \
--input '{
"experimentId": "exp-cli-001",
"targetType": "ASG",
"targetId": "chaos-platform-asg",
"failureType": "INSTANCE_TERMINATION",
"dryRun": false,
"configuration": {
"expectedHealthyInstances": 2
}
}'

3. Programmatically (TypeScript)
// AWS SDK for JavaScript v3 exports the aggregated Step Functions client as SFN
import { SFN } from '@aws-sdk/client-sfn';
const stepfunctions = new SFN({ region: 'us-east-1' });
await stepfunctions.startExecution({
  stateMachineArn: 'arn:aws:states:...:stateMachine:chaos-platform-chaos-experiment',
  // The execution input must be passed as a JSON string
  input: JSON.stringify({
    experimentId: 'exp-programmatic-001',
    targetType: 'ASG',
    failureType: 'CPU_STRESS',
    dryRun: false
  })
});

Experiment: exp-2025-01-27-abc123
Target: chaos-platform-asg (2 instances)
Failure: Instance Termination
Duration: 5m 32s
Results:
Pre-check: System healthy (2/2 instances)
Chaos: Terminated i-0abc123
Recovery: Auto Scaling launched replacement
Post-check: System recovered (2/2 instances)
Availability: 99.8% maintained during test
Learnings:
• Auto Scaling Group successfully replaced failed instance
• Load Balancer detected unhealthy instance in 30s
• Application remained available throughout experiment
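To make the figures in a report like this concrete, here is a minimal sketch of how an availability percentage and a recovery verdict could be derived from periodic samples. The `HealthSample` shape, the `summarize` function, and the sample numbers are all illustrative assumptions; they are not taken from the platform's analytics code or from a real run.

```typescript
// Hypothetical sketch: the sample shape and the rounding are assumptions.
interface HealthSample {
  healthyInstances: number; // instances passing health checks at this sample
  requestsOk: number;       // successful requests since the previous sample
  requestsFailed: number;   // failed requests since the previous sample
}

interface Summary {
  availabilityPercent: number; // share of successful requests, 0 to 100
  lowestHealthy: number;       // worst instance count seen during the run
  recovered: boolean;          // last sample meets the expected count
}

function summarize(samples: HealthSample[], expectedHealthyInstances: number): Summary {
  if (samples.length === 0) {
    throw new Error("at least one sample is required");
  }
  let ok = 0;
  let failed = 0;
  let lowestHealthy = Infinity;
  let lastHealthy = 0;
  for (const s of samples) {
    ok += s.requestsOk;
    failed += s.requestsFailed;
    lowestHealthy = Math.min(lowestHealthy, s.healthyInstances);
    lastHealthy = s.healthyInstances;
  }
  const total = ok + failed;
  // With no traffic at all there is nothing to measure, so report 100%.
  const availability = total === 0 ? 100 : (ok / total) * 100;
  return {
    availabilityPercent: Math.round(availability * 10) / 10, // one decimal place
    lowestHealthy,
    recovered: lastHealthy >= expectedHealthyInstances,
  };
}

// Made-up samples for illustration only.
const run: HealthSample[] = [
  { healthyInstances: 2, requestsOk: 400, requestsFailed: 0 }, // pre-check
  { healthyInstances: 1, requestsOk: 198, requestsFailed: 2 }, // instance terminated
  { healthyInstances: 2, requestsOk: 400, requestsFailed: 0 }, // replacement in service
];

console.log(summarize(run, 2));
```

With these made-up samples, 998 of 1,000 requests succeed, so the summary reports 99.8% availability, a lowest healthy count of 1, and a recovered system.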
| Document | Description |
|---|---|
| Quick Deployment | Get started in 5 minutes |
| Step-by-Step Guide | Detailed walkthrough |
| AWS IAM Setup | Required permissions |
| Architecture Design | System architecture |
| Project Summary | Component overview |
- Weekly Tutorials - Step-by-step implementation guides
- API Documentation - Complete API reference (coming soon)
- Troubleshooting Guide - Common issues (coming soon)
Chaos-Engineering-Platform/
├── infrastructure/                 # CloudFormation templates
│   ├── vpc-infrastructure.yaml
│   ├── target-application.yaml
│   ├── chaos-lambda-functions.yaml
│   ├── chaos-step-functions.yaml
│   └── fullstack-database.yaml
│
├── lambda-functions/               # Chaos injection logic
│   ├── get-target-instance/        # Instance selection
│   ├── inject-failure/             # Failure injection
│   └── validate-system-health/     # Health validation
│
├── backend/                        # TypeScript API
│   ├── src/handlers/               # API endpoints
│   ├── src/services/               # Business logic
│   └── src/types/                  # TypeScript types
│
├── frontend/                       # React Dashboard
│   ├── src/components/             # UI components
│   ├── src/pages/                  # Application pages
│   └── src/lib/                    # Utilities
│
├── scripts/                        # Automation scripts
│   ├── deploy-fullstack-complete.sh
│   └── cleanup.sh
│
└── docs/                           # Documentation
    ├── deployment/
    ├── fullstack/
    └── archive/
First 12 months with AWS Free Tier:
Running Tests
# Backend tests (the subshell keeps the working directory at the repository root)
(cd backend && npm test)
# Frontend tests
(cd frontend && npm test)
# End-to-end tests
./scripts/test-end-to-end.sh

GitHub Actions (Coming Soon)
- Automated testing on PR
- Infrastructure validation
- Security scanning
- Deployment automation
We love contributions!
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
See our Contribution Guidelines for more details.
- Full-stack React dashboard
- Step Functions orchestration
- Multi-AZ VPC infrastructure
- Auto Scaling target application
- Chaos Lambda functions
- DynamoDB persistence
- One-command deployment
- Advanced analytics dashboard
- Experiment scheduling
- Slack/Teams notifications
- API authentication (JWT)
- Kubernetes chaos experiments
- Multi-region testing
- Custom failure plugins
- Team collaboration features
- Experiment templates library
- Cost optimization recommendations
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License - you can use, modify, and distribute this code freely.
- Inspired by: Netflix Chaos Monkey
- Built with: shadcn/ui
- Powered by: AWS
- Icons by: Lucide
- Repository: github.com/ARAVIND281/Chaos-Engineering-Platform
- Issues: Report a bug or request a feature
- Documentation: Full documentation
If this project helped you, please give it a star!
Made by ARAVIND281 | Licensed under MIT | Contributions welcome!