Production-Grade EKS Platform with Autoscaling & Observability

Designed and built a scalable Kubernetes platform on AWS focusing on cost optimization, observability, and production-grade deployment practices.

Key Highlights

Area	What Was Built
Autoscaling	Dynamic node provisioning with Karpenter — cost-optimized over static node groups
Observability	Full Prometheus + Grafana stack with real-time cluster and app metrics
TLS Automation	cert-manager with self-signed certs (dev) — extensible to Let's Encrypt in production
CI/CD	GitHub Actions pipeline: build → tag → push → deploy via Helm
Ingress	NGINX Ingress Controller with AWS ALB for external traffic routing
High Availability	Multi-AZ EKS architecture with decoupled frontend/backend deployments

Key Design Decisions

Karpenter
- Karpenter provisions nodes quickly and dynamically and is cost-optimized, but it adds complexity and requires additional configuration.
- Consider Cluster Autoscaler if your workload is predictable or you want simplicity.
Helm for application deployment
- Helm keeps deployments portable and configurable across environments.
Self-signed ClusterIssuer
- A self-signed ClusterIssuer is currently used for simplicity.
- Use a trusted issuer such as Let's Encrypt in production.
Prometheus + Grafana stack
- Used for vendor-independent Kubernetes monitoring with deep cluster-level visibility.
- Widely adopted in the Kubernetes ecosystem.
- AWS CloudWatch is an alternative when tighter AWS integration matters.

Quick Start

Prerequisites: git, Terraform, the AWS CLI with configured credentials, kubectl, and Helm.

# 1. Clone the repo
git clone https://github.com/your-username/eks-platform.git
cd eks-platform

# 2. Provision infrastructure
cd envs/dev
mv backend.tf.example.hcp backend.tf   # update with your HCP account values
mv terraform.tfvars.example terraform.tfvars
terraform init
terraform apply -auto-approve

# 3. Configure kubectl
aws eks update-kubeconfig --name eks-dev-cluster --region us-east-1

# 4. Deploy application stack
helm upgrade --install app ./helm -f helm/values.yaml

Full setup guide → docs/installation.md

CI/CD Pipeline

  Push to main
      │
      ▼
  Build Docker images (frontend + backend)
      │
      ▼
  Tag with commit SHA (e.g., sha-a3f9c12)
      │
      ▼
  Push to Container Registry (ECR / GHCR)
      │
      ▼
  helm upgrade --install → EKS

- Zero-downtime deployments via the rolling update strategy
- Commit SHA tagging gives full traceability: every deployed image is pinned to a specific commit
- Failed deployments roll back automatically via Helm revision history
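The tag and deploy stages can be sketched as two small shell functions. This is a minimal sketch, not the workflow file itself: the release name and chart path are taken from the Quick Start, the `frontend.image.tag` and `backend.image.tag` value keys are hypothetical and depend on the chart's values.yaml, and Helm 3's `--atomic` flag is shown as one way to get rollback on failure (whether the actual pipeline uses it is not stated here).

```shell
#!/bin/sh
# Sketch only: the image.tag value keys are hypothetical placeholders.

# Build the image tag from a full commit SHA: "sha-" plus its first 7 characters.
image_tag() {
  printf 'sha-%.7s\n' "$1"
}

# Upgrade (or install) the release with both images pinned to one tag.
# --atomic rolls back to the previous revision if the upgrade fails;
# --wait blocks until the workloads are ready or --timeout expires.
deploy() {
  helm upgrade --install app ./helm \
    -f helm/values.yaml \
    --set "frontend.image.tag=$1" \
    --set "backend.image.tag=$1" \
    --atomic --wait --timeout 5m
}

# In a GitHub Actions step, GITHUB_SHA holds the full commit SHA:
#   deploy "$(image_tag "$GITHUB_SHA")"
```

Because the tag is derived only from the commit SHA, re-running the pipeline on the same commit produces the same tag, which is what makes a running image traceable to its source.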

Pipeline details → docs/cicd.md

Production Considerations

These are known gaps intentionally deferred for dev environment simplicity:

Area	Current State	Production Target
TLS	Self-signed cert-manager	Let's Encrypt via ACME
API Access	0.0.0.0/0	CIDR-restricted per team
Terraform State	S3 backend with locking	S3 backend with locking
Secrets	Helm values	AWS Secrets Manager / ESO
HPA	Not configured	CPU/memory-based autoscaling per workload
Karpenter	Aggressive consolidation to reduce cost	Tune carefully to avoid disruption
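
Two of the gaps above (HPA and API access) can be closed from the command line. The following is a rough sketch under stated assumptions: the deployment name, replica bounds, CPU target, and CIDR are placeholders rather than values from this repository, and the HPA needs a metrics source such as metrics-server plus CPU requests on the target pods.

```shell
#!/bin/sh
# Sketch only: all names and numbers passed to these functions are placeholders.

# Create a CPU-based HorizontalPodAutoscaler for one deployment.
# Usage: add_cpu_hpa <deployment> <min> <max> <cpu-percent>
add_cpu_hpa() {
  kubectl autoscale deployment "$1" --min="$2" --max="$3" --cpu-percent="$4"
}

# Build the --resources-vpc-config argument that limits the public
# API endpoint to a single CIDR block.
public_access_config() {
  printf 'endpointPublicAccess=true,publicAccessCidrs=%s\n' "$1"
}

# Apply the restriction to the dev cluster named in the Quick Start.
restrict_api_access() {
  aws eks update-cluster-config \
    --region us-east-1 \
    --name eks-dev-cluster \
    --resources-vpc-config "$(public_access_config "$1")"
}

# Example (hypothetical values):
#   add_cpu_hpa backend 2 6 70
#   restrict_api_access 203.0.113.0/24
```

If the cluster is managed by Terraform, the same settings would normally be made in the EKS module instead, so that the next apply does not revert them.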

Security

- IAM roles are scoped per node pool, with no wildcard permissions
- TLS terminates at the NGINX ingress, and backend services are never exposed directly
- Private container registries are supported via Kubernetes image pull secrets
- Security group rules limit node-to-node and pod-to-pod traffic

Project Structure

├── envs/
│   └── dev/
│       ├── main.tf
│       ├── backend.tf
│       ├── provider.tf
│       └── variables.tf

├── modules/
│   ├── eks/
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   └── vpc/
│       ├── main.tf
│       ├── outputs.tf
│       └── variables.tf

├── helm/
│   ├── charts/
│   ├── Chart.yaml
│   ├── values.yaml
│   └── templates/
│       ├── backend-deployment.yaml
│       ├── backend-service.yaml
│       ├── frontend-deployment.yaml
│       ├── frontend-service.yaml
│       ├── ingress.yaml
│       ├── postgres-deployment.yaml
│       ├── postgres-service.yaml
│       ├── redis-deployment.yaml
│       ├── redis-service.yaml

├── scripts/
│   └── deploy.sh

├── selfsigned.yaml
├── inflate-deployment.yaml
├── .pre-commit-config.yaml
├── .terraform-version

Autoscaling with Karpenter

The example inflate-deployment.yaml simulates high resource demand by requesting CPU resources:

- Pods are forced into the Pending state
- Karpenter provisions new nodes automatically
- Pods get scheduled without manual intervention

Apply it with: kubectl apply -f inflate-deployment.yaml
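
To watch the sequence above happen, the test deployment can be scaled up while pods and nodes are observed. This is a minimal sketch that assumes the manifest creates a Deployment named `inflate` in the current namespace; the name is an assumption and should be adjusted to match inflate-deployment.yaml.

```shell
#!/bin/sh
# Sketch only: the deployment name "inflate" is assumed, not confirmed.

# Scale the test deployment to the given replica count.
scale_inflate() {
  kubectl scale deployment inflate --replicas="$1"
}

# List pods that the scheduler has not been able to place yet.
pending_pods() {
  kubectl get pods --field-selector=status.phase=Pending
}

# Stream node changes; new nodes show up here as Karpenter provisions them.
watch_nodes() {
  kubectl get nodes --watch
}

# Typical sequence:
#   scale_inflate 5 && pending_pods && watch_nodes
# Scale back to zero afterwards so the extra capacity can be reclaimed:
#   scale_inflate 0
```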

License

MIT License

Contributing

Pull requests and enhancements are always welcome!
