
DevOps Experiment - Production-Grade AWS EKS Platform

Terraform Kubernetes AWS GPU GitOps

A comprehensive, production-ready DevOps platform demonstrating Infrastructure as Code, GitOps, and modern cloud-native practices.

Last Updated: December 2024

๐Ÿ—๏ธ Architecture Overview

EKS Architecture

CI/CD Pipeline

CI/CD Pipeline

GitOps Workflow

GitOps Flow

๐Ÿ“ Project Structure

.
โ”œโ”€โ”€ terraform/                    # Terraform modules
โ”‚   โ”œโ”€โ”€ modules/
โ”‚   โ”‚   โ”œโ”€โ”€ vpc/                 # VPC, subnets, NAT, IGW
โ”‚   โ”‚   โ”œโ”€โ”€ eks/                 # EKS cluster + node groups
โ”‚   โ”‚   โ”œโ”€โ”€ eks-addons/          # EKS add-ons (CSI, CNI, etc.)
โ”‚   โ”‚   โ””โ”€โ”€ irsa/                # IAM Roles for Service Accounts
โ”‚   โ””โ”€โ”€ providers.tf
โ”‚
โ”œโ”€โ”€ terragrunt/                   # Terragrunt environment configs
โ”‚   โ”œโ”€โ”€ terragrunt.hcl           # Root configuration
โ”‚   โ”œโ”€โ”€ dev/
โ”‚   โ”‚   โ”œโ”€โ”€ env.hcl
โ”‚   โ”‚   โ”œโ”€โ”€ vpc/
โ”‚   โ”‚   โ”œโ”€โ”€ eks/
โ”‚   โ”‚   โ””โ”€โ”€ eks-addons/
โ”‚   โ”œโ”€โ”€ staging/
โ”‚   โ””โ”€โ”€ prod/
โ”‚
โ”œโ”€โ”€ kubernetes/                   # Kubernetes manifests
โ”‚   โ”œโ”€โ”€ flux-system/             # Flux bootstrap configuration
โ”‚   โ”œโ”€โ”€ infrastructure/          # Cluster-wide infrastructure
โ”‚   โ”‚   โ”œโ”€โ”€ sources/             # Helm repositories
โ”‚   โ”‚   โ”œโ”€โ”€ monitoring/          # Prometheus, Grafana
โ”‚   โ”‚   โ”œโ”€โ”€ nvidia/              # NVIDIA device plugin
โ”‚   โ”‚   โ””โ”€โ”€ ingress/             # Ingress controller
โ”‚   โ””โ”€โ”€ apps/                    # Application deployments
โ”‚       โ””โ”€โ”€ sample-gpu-app/
โ”‚
โ”œโ”€โ”€ .github/
โ”‚   โ””โ”€โ”€ workflows/
โ”‚       โ”œโ”€โ”€ terraform-ci.yaml    # TF validate, plan, apply
โ”‚       โ”œโ”€โ”€ container-build.yaml # Build & push containers
โ”‚       โ””โ”€โ”€ flux-diff.yaml       # Preview Flux changes
โ”‚
โ”œโ”€โ”€ docker/                       # Dockerfiles
โ”‚   โ””โ”€โ”€ sample-gpu-app/
โ”‚
โ””โ”€โ”€ docs/                         # Additional documentation
    โ”œโ”€โ”€ SETUP.md
    โ”œโ”€โ”€ GPU-WORKLOADS.md
    โ””โ”€โ”€ TROUBLESHOOTING.md

🚀 Features

Infrastructure as Code

  • Terraform Modules: Reusable, versioned modules for VPC, EKS, EKS add-ons, and IRSA
  • Terragrunt: DRY configuration management across environments
  • State Management: Remote state with S3 + DynamoDB locking
  • GPU Support: Pre-configured node groups for NVIDIA GPU instances

GitOps with Flux

  • Automated Deployments: Git as single source of truth
  • Helm Controller: Declarative Helm release management
  • Kustomize Integration: Environment-specific overlays
  • Image Automation: Automatic image updates (optional)

Monitoring & Observability

  • Prometheus: Metrics collection with GPU metrics support
  • Grafana: Pre-configured dashboards for K8s and GPU monitoring
  • Alertmanager: Alert routing and notification

CI/CD with GitHub Actions

  • Infrastructure Pipeline: Validate → Plan → Apply workflow
  • Container Pipeline: Build, scan, and push to ECR
  • Security Scanning: Trivy for container vulnerability scanning
  • Cost Estimation: Infracost integration for PR cost preview
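
As a rough sketch of the container pipeline listed above (build, scan, push), the same steps might look like this when run by hand. The helper names `ecr_image_uri` and `build_scan_push`, the account ID, and the tag are placeholders and do not come from this repository; the real workflow lives in .github/workflows/container-build.yaml and may differ.

```shell
# Hypothetical helper: compose an ECR image reference from its parts.
# ECR registry hostnames follow <account>.dkr.ecr.<region>.amazonaws.com.
ecr_image_uri() {
  # $1 = AWS account ID, $2 = region, $3 = repository, $4 = tag
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical wrapper: build, scan with Trivy, and push only if the scan passes.
build_scan_push() {
  # $1 = image reference, $2 = Docker build context
  docker build -t "$1" "$2" || return 1
  # --exit-code 1 makes Trivy return non-zero when matching findings exist.
  trivy image --severity HIGH,CRITICAL --exit-code 1 "$1" || return 1
  docker push "$1"
}
```

A possible invocation, assuming Docker is already logged in to the registry (for example via `aws ecr get-login-password` piped into `docker login --username AWS --password-stdin`): `build_scan_push "$(ecr_image_uri 123456789012 us-west-2 sample-gpu-app abc123)" docker/sample-gpu-app`.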

๐Ÿ› ๏ธ Prerequisites

  • AWS CLI v2 configured with appropriate credentials
  • Terraform >= 1.5.0
  • Terragrunt >= 0.50.0
  • kubectl >= 1.28
  • Flux CLI >= 2.0
  • Docker (for building containers)
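
The list above can be checked before starting. The following is a minimal sketch, not part of the repository: `version_ge` and `missing_tools` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (GNU coreutils and recent BSD versions do).

```shell
# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort) to order the two strings.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# missing_tools: print each required CLI that is not found on PATH.
missing_tools() {
  for tool in aws terraform terragrunt kubectl flux docker; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `version_ge 1.6.2 1.5.0 && echo ok` prints `ok`, and `missing_tools` prints nothing when all six CLIs are installed. Extracting the version string from each tool's own output is left out here because the output formats differ between tools.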

๐Ÿ Quick Start

1. Clone and Configure

git clone https://github.com/mateenali66/devops-experiment.git
cd devops-experiment

# Set your AWS profile (replace "personal" with your own profile name)
export AWS_PROFILE=personal

2. Initialize Backend (First Time Only)

cd terragrunt/dev
terragrunt run-all init

3. Deploy Infrastructure

# Review the plan
terragrunt run-all plan

# Apply infrastructure
terragrunt run-all apply
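
Steps 2 and 3 run from terragrunt/dev; the project layout also contains staging and prod. A small wrapper along the following lines could make the target environment explicit. This is a sketch under assumptions: `env_dir` and `deploy_env` are hypothetical names, and the wrapper expects to be run from the repository root.

```shell
# Hypothetical helper: map an environment name to its Terragrunt directory.
# The three names mirror the terragrunt/ layout shown in Project Structure.
env_dir() {
  case "$1" in
    dev|staging|prod) printf 'terragrunt/%s\n' "$1" ;;
    *) printf 'unknown environment: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: plan first, then apply only if the plan succeeds.
# The body runs in a subshell so the caller's working directory is unchanged.
deploy_env() (
  dir=$(env_dir "$1") || exit 1
  cd "$dir" || exit 1
  terragrunt run-all plan && terragrunt run-all apply
)
```

With this in place, `deploy_env dev` would behave like the commands above, while a mistyped name such as `deploy_env prd` stops before any Terragrunt command runs.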

4. Bootstrap Flux

# Configure kubectl
aws eks update-kubeconfig --name eks-dev-cluster --region us-west-2

# Bootstrap Flux
flux bootstrap github \
  --owner=mateenali66 \
  --repository=devops-experiment \
  --branch=main \
  --path=kubernetes/clusters/dev \
  --personal
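
`flux bootstrap github` authenticates with a GitHub personal access token, which it reads from the `GITHUB_TOKEN` environment variable. The sketch below adds a pre-check for that variable and a post-check of the installation; `require_github_token` and `verify_flux` are hypothetical helper names, not part of Flux or this repository.

```shell
# Hypothetical pre-check: fail early when no GitHub token is exported.
require_github_token() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    printf 'GITHUB_TOKEN is not set\n' >&2
    return 1
  fi
}

# Hypothetical post-check: confirm that the controllers are healthy and
# that the Kustomizations have been reconciled.
verify_flux() {
  flux check &&
    flux get kustomizations --all-namespaces &&
    kubectl get pods -n flux-system
}
```

One way to use these is to call `require_github_token` before the bootstrap command and `verify_flux` after it completes.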

5. Access Grafana

# Port-forward Grafana to http://localhost:3000
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80

# Default credentials (kube-prometheus-stack chart default): admin / prom-operator
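
If the default password has been overridden, it can usually be read back from the Secret that the Grafana chart creates. The sketch below assumes the chart's default Secret name (`kube-prometheus-stack-grafana`) and key (`admin-password`); both helper names are hypothetical, and `base64 -d` may need to be `base64 -D` on older macOS versions.

```shell
# Hypothetical helper: decode a base64 value such as a Kubernetes Secret field.
decode_b64() {
  printf '%s' "$1" | base64 -d
}

# Hypothetical helper: read the Grafana admin password from its Secret.
# Assumes the chart's default Secret name and key in the monitoring namespace.
grafana_admin_password() {
  decode_b64 "$(kubectl get secret kube-prometheus-stack-grafana \
    -n monitoring -o jsonpath='{.data.admin-password}')"
}
```

Running `grafana_admin_password` against the cluster prints the current password; `decode_b64` on its own can be tried without a cluster.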

🎮 GPU Workloads

This platform supports NVIDIA GPU workloads out of the box:

# Example GPU pod request
resources:
  limits:
    nvidia.com/gpu: 1

See docs/GPU-WORKLOADS.md for detailed GPU configuration.
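
To try the resource limit shown above end to end, a one-shot pod that runs `nvidia-smi` is a common smoke test. The following is a sketch under assumptions: the pod name, the `nvidia/cuda` image tag, and both helper names are illustrative and not taken from this repository. If the GPU nodes are tainted, the pod would also need a matching toleration.

```shell
# Hypothetical helper: print a one-shot Pod manifest that requests a single GPU.
# The pod name and image are illustrative assumptions.
gpu_smoke_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}

# Hypothetical helper: list allocatable GPUs per node, as advertised by the
# NVIDIA device plugin. Nodes without GPUs show <none>.
list_gpu_nodes() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

A possible sequence is `list_gpu_nodes`, then `gpu_smoke_pod | kubectl apply -f -`, then `kubectl logs gpu-smoke-test` once the pod has completed.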

📊 Monitoring Dashboards

Pre-configured Grafana dashboards:

  • Kubernetes Cluster Overview
  • Node Exporter / Node Metrics
  • NVIDIA GPU Metrics (DCGM)
  • Flux GitOps Status
  • Container Resource Usage

๐Ÿ” Security Considerations

  • Private EKS endpoint (configurable)
  • IRSA for pod-level AWS permissions
  • Network policies for pod isolation
  • Secrets management via External Secrets Operator
  • Container image scanning in CI/CD

💰 Cost Optimization

  • Spot instances for non-GPU workloads
  • Cluster autoscaler for dynamic scaling
  • Karpenter support (optional)
  • Right-sizing recommendations via Grafana

📚 Documentation

  • docs/SETUP.md
  • docs/GPU-WORKLOADS.md
  • docs/TROUBLESHOOTING.md

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

📜 License

MIT License - see LICENSE
