DevOps Experiment - Production-Grade AWS EKS Platform

A production-ready DevOps platform on AWS EKS, demonstrating Infrastructure as Code (Terraform, Terragrunt), GitOps (Flux), and CI/CD (GitHub Actions).

Last Updated: December 2024

🏗️ Architecture Overview

CI/CD Pipeline

GitOps Workflow

📁 Project Structure

.
├── terraform/                    # Terraform modules
│   ├── modules/
│   │   ├── vpc/                 # VPC, subnets, NAT, IGW
│   │   ├── eks/                 # EKS cluster + node groups
│   │   ├── eks-addons/          # EKS add-ons (CSI, CNI, etc.)
│   │   └── irsa/                # IAM Roles for Service Accounts
│   └── providers.tf
│
├── terragrunt/                   # Terragrunt environment configs
│   ├── terragrunt.hcl           # Root configuration
│   ├── dev/
│   │   ├── env.hcl
│   │   ├── vpc/
│   │   ├── eks/
│   │   └── eks-addons/
│   ├── staging/
│   └── prod/
│
├── kubernetes/                   # Kubernetes manifests
│   ├── flux-system/             # Flux bootstrap configuration
│   ├── infrastructure/          # Cluster-wide infrastructure
│   │   ├── sources/             # Helm repositories
│   │   ├── monitoring/          # Prometheus, Grafana
│   │   ├── nvidia/              # NVIDIA device plugin
│   │   └── ingress/             # Ingress controller
│   └── apps/                    # Application deployments
│       └── sample-gpu-app/
│
├── .github/
│   └── workflows/
│       ├── terraform-ci.yaml    # TF validate, plan, apply
│       ├── container-build.yaml # Build & push containers
│       └── flux-diff.yaml       # Preview Flux changes
│
├── docker/                       # Dockerfiles
│   └── sample-gpu-app/
│
└── docs/                         # Additional documentation
    ├── SETUP.md
    ├── GPU-WORKLOADS.md
    └── TROUBLESHOOTING.md

🚀 Features

Infrastructure as Code

Terraform Modules: Reusable, versioned modules for VPC, EKS, EKS add-ons, and IRSA
Terragrunt: DRY configuration management across environments
State Management: Remote state with S3 + DynamoDB locking
GPU Support: Pre-configured node groups for NVIDIA GPU instances

GitOps with Flux

Automated Deployments: Git as single source of truth
Helm Controller: Declarative Helm release management
Kustomize Integration: Environment-specific overlays
Image Automation: Automatic image updates (optional)

Monitoring & Observability

Prometheus: Metrics collection with GPU metrics support
Grafana: Pre-configured dashboards for K8s and GPU monitoring
Alertmanager: Alert routing and notification

CI/CD with GitHub Actions

Infrastructure Pipeline: Validate → Plan → Apply workflow
Container Pipeline: Build, scan, and push to ECR
Security Scanning: Trivy for container vulnerability scanning
Cost Estimation: Infracost integration for PR cost preview

🛠️ Prerequisites

AWS CLI v2 configured with appropriate credentials
Terraform >= 1.5.0
Terragrunt >= 0.50.0
kubectl >= 1.28
Flux CLI >= 2.0
Docker (for building containers)
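Before starting, it can help to confirm that the tools listed above are installed. The following is a minimal sketch, not part of the repository: the function names `version_ge` and `check_tool` are hypothetical, the binary names are the conventional ones for each tool, and `sort -V` is assumed to be available (GNU coreutils).

```shell
#!/usr/bin/env bash
# Sketch: report which prerequisite CLIs are on PATH.
# Binary names are the conventional ones (an assumption; adjust if yours differ).

# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on `sort -V` for version ordering (assumed available).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME: prints "ok: NAME" or "missing: NAME" for one binary.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

for tool in aws terraform terragrunt kubectl flux docker; do
  check_tool "$tool"
done
```

Once you have a tool's version string, `version_ge` can compare it against the minimum, for example `version_ge 1.6.2 1.5.0 && echo "Terraform is new enough"`.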

🏁 Quick Start

1. Clone and Configure

git clone https://github.com/mateenali66/devops-experiment.git
cd devops-experiment

# Set your AWS profile
export AWS_PROFILE=personal

2. Initialize Backend (First Time Only)

cd terragrunt/dev
terragrunt run-all init

3. Deploy Infrastructure

# Review the plan
terragrunt run-all plan

# Apply infrastructure
terragrunt run-all apply
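The steps above target dev. Assuming staging and prod mirror the dev layout under terragrunt/ (the project structure lists the directories but not their contents), a small helper can guard against typos when switching environments. The function name `env_dir` is hypothetical and the paths are relative to the repository root.

```shell
#!/usr/bin/env bash
# Sketch: map an environment name to its Terragrunt directory.
# Assumes staging/ and prod/ follow the same layout as dev/.

# env_dir ENV: print the Terragrunt directory for a known environment,
# or fail with a message on stderr for anything else.
env_dir() {
  case "$1" in
    dev|staging|prod) echo "terragrunt/$1" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}
```

From the repository root, usage could look like `cd "$(env_dir staging)" && terragrunt run-all plan`.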

4. Bootstrap Flux

# Configure kubectl
aws eks update-kubeconfig --name eks-dev-cluster --region us-west-2

# Bootstrap Flux (reads a GitHub personal access token from GITHUB_TOKEN)
flux bootstrap github \
  --owner=mateenali66 \
  --repository=devops-experiment \
  --branch=main \
  --path=kubernetes/clusters/dev \
  --personal

5. Access Grafana

# Port forward Grafana, then open http://localhost:3000
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80

# Default credentials: admin / prom-operator
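If the default password has been overridden, it can usually be read back from the cluster. This is a sketch under assumptions: the Secret name `kube-prometheus-stack-grafana` and the key `admin-password` follow the chart's usual naming for this release name and are not confirmed by this repository. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read the Grafana admin password from the cluster.
# Secret name and key are assumptions based on the chart's usual naming.

# decode_b64: decode base64 from stdin (Secret values are base64-encoded).
decode_b64() {
  base64 --decode
}

# grafana_admin_password: print the admin password stored in the Secret.
# Requires kubectl to be pointed at the cluster.
grafana_admin_password() {
  kubectl get secret kube-prometheus-stack-grafana \
    --namespace monitoring \
    --output jsonpath='{.data.admin-password}' | decode_b64
}
```

After `aws eks update-kubeconfig` has run, calling `grafana_admin_password` prints the current value.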

🎮 GPU Workloads

This platform supports NVIDIA GPU workloads out of the box:

# Example GPU pod request
resources:
  limits:
    nvidia.com/gpu: 1

See docs/GPU-WORKLOADS.md for detailed GPU configuration.
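For a quick check that GPU scheduling works, the resource limit above can be wrapped in a complete Pod. The sketch below writes such a manifest to a temporary file. The pod name, image tag, and output path are illustrative assumptions, not values taken from this repository.

```shell
#!/usr/bin/env bash
# Sketch: write a complete Pod manifest around the GPU limit shown above.
# Pod name, image tag, and output path are illustrative assumptions.

manifest="${TMPDIR:-/tmp}/gpu-smoke-test.yaml"

cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

echo "Wrote $manifest"
# Apply it once kubectl points at the cluster:
#   kubectl apply -f "$manifest"
#   kubectl logs gpu-smoke-test
```

If the GPU node group is tainted, the pod would also need a matching toleration.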

📊 Monitoring Dashboards

Pre-configured Grafana dashboards:

Kubernetes Cluster Overview
Node Exporter / Node Metrics
NVIDIA GPU Metrics (DCGM)
Flux GitOps Status
Container Resource Usage

🔐 Security Considerations

Private EKS endpoint (configurable)
IRSA for pod-level AWS permissions
Network policies for pod isolation
Secrets management via External Secrets Operator
Container image scanning in CI/CD

💰 Cost Optimization

Spot instances for non-GPU workloads
Cluster autoscaler for dynamic scaling
Karpenter support (optional)
Right-sizing recommendations via Grafana

📚 Documentation

Additional documentation is in docs/:

docs/SETUP.md
docs/GPU-WORKLOADS.md
docs/TROUBLESHOOTING.md

🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Submit a pull request

📜 License

MIT License - see LICENSE
