A production-grade AI full-stack application with CI/CD, observability, security scanning, infrastructure-as-code, and GitOps-ready deployment, built as a portfolio demonstration of platform engineering practices.
| Area | What a Hiring Manager Can Inspect |
|---|---|
| Full-Stack Development | React + TypeScript frontend, FastAPI backend, AI endpoint with pluggable providers |
| Containerization | Multi-stage Dockerfiles, docker-compose orchestration, minimal base images |
| CI/CD | GitHub Actions with test → scan → build → push pipeline |
| Observability | OpenTelemetry traces, Prometheus metrics, Grafana dashboards, structured logging |
| Security / DevSecOps | Trivy container scanning, dependency audit, CodeQL, OpenSSF Scorecard, least-privilege CI |
| Infrastructure as Code | Terraform/OpenTofu skeleton for AWS, ready for real provisioning |
| GitOps | Kubernetes manifests, Kustomize overlays, Argo CD application definitions |
| Testing | Unit tests, API contract tests, smoke tests, security validation tests |
graph TB
subgraph "Client"
FE[React + Vite Frontend]
end
subgraph "Backend"
API[FastAPI Service]
AI[AI Endpoint - Mock/Pluggable]
HEALTH[Health Check]
METRICS[/metrics - Prometheus/]
end
subgraph "Observability"
OTEL[OpenTelemetry Collector]
PROM[Prometheus]
GRAF[Grafana]
LOGS[Structured Logs - JSON]
end
subgraph "Infrastructure"
K8S[Kubernetes Cluster]
ARGO[Argo CD]
TF[Terraform/OpenTofu]
end
FE -->|HTTP + Correlation ID| API
API --> AI
API --> HEALTH
API --> METRICS
API -->|OTLP| OTEL
OTEL --> PROM
PROM --> GRAF
API --> LOGS
ARGO -->|GitOps Sync| K8S
TF -->|Provision| K8S
| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite |
| Backend | Python 3.12, FastAPI, Uvicorn |
| AI | Mock provider (pluggable to OpenAI, Anthropic, Ollama) |
| Containers | Docker, docker-compose |
| CI/CD | GitHub Actions |
| Observability | OpenTelemetry, Prometheus, Grafana |
| Security | Trivy, CodeQL, OpenSSF Scorecard, pip-audit |
| IaC | Terraform / OpenTofu |
| Orchestration | Kubernetes, Kustomize |
| GitOps | Argo CD |
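The AI layer above is described as a mock provider that can be swapped for OpenAI, Anthropic, or Ollama. A minimal sketch of how such a pluggable design might look follows. The names `AIProvider`, `MockProvider`, `get_provider`, and the `AI_PROVIDER` environment variable are hypothetical and are not taken from this repository.

```python
import os
from typing import Callable, Dict, Optional, Protocol


class AIProvider(Protocol):
    """Hypothetical interface that every provider would implement."""

    def complete(self, prompt: str) -> str: ...


class MockProvider:
    """Deterministic stand-in that needs no network access or API key."""

    def complete(self, prompt: str) -> str:
        return f"[mock] {prompt}"


# Registry mapping a provider name to a factory. Real providers
# (OpenAI, Anthropic, Ollama) would be registered here next to the mock.
_PROVIDERS: Dict[str, Callable[[], AIProvider]] = {"mock": MockProvider}


def get_provider(name: Optional[str] = None) -> AIProvider:
    """Select a provider by name, else by the assumed AI_PROVIDER env var."""
    key = (name or os.getenv("AI_PROVIDER", "mock")).lower()
    try:
        return _PROVIDERS[key]()
    except KeyError:
        raise ValueError(f"Unknown AI provider: {key!r}") from None
```

With this shape, the endpoint code would depend only on `complete()`, so adding a real provider would mean registering one more factory rather than changing the handler.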
ai-devops-platform-lab/
├── apps/
│   ├── web/                # React + TypeScript + Vite frontend
│   └── api/                # FastAPI backend
├── infra/
│   ├── terraform/          # Cloud infrastructure skeleton
│   ├── k8s/                # Kubernetes manifests + Kustomize
│   └── argocd/             # Argo CD application definitions
├── observability/
│   ├── grafana/            # Dashboard JSON + provisioning
│   ├── prometheus/         # Prometheus config
│   └── otel/               # OpenTelemetry Collector config
├── .github/
│   └── workflows/          # CI/CD pipelines
├── docs/                   # Architecture docs, ADRs
├── docker-compose.yml      # Local development orchestration
├── Makefile                # Task runner
├── .env.example            # Environment template
├── SECURITY.md             # Vulnerability reporting policy
└── README.md
- Docker & Docker Compose
- Node.js 20+ (for frontend dev)
- Python 3.12+ (for backend dev)
- Make
# Clone and enter
git clone https://github.com/hubertlim/ai-devops-platform-lab.git
cd ai-devops-platform-lab
# Copy environment template
cp .env.example .env
# Start all services
make up
# Frontend: http://localhost:5173
# Backend: http://localhost:8000
# API Docs: http://localhost:8000/docs
# Grafana: http://localhost:3001
# Prometheus: http://localhost:9090
make build # Build all containers
make test # Run all tests
make lint # Lint frontend + backend
make scan # Run Trivy container scan
make down # Stop all services
make clean # Remove containers and volumes
The GitHub Actions pipeline runs on every push and PR:
┌─────────┐    ┌──────────┐    ┌─────────┐    ┌──────────┐
│  Lint   │───▶│   Test   │───▶│  Scan   │───▶│  Build   │
└─────────┘    └──────────┘    └─────────┘    └──────────┘
                                                    │
                                             ┌──────▼──────┐
                                             │ Push Image  │
                                             │ (main only) │
                                             └─────────────┘
- Lint: ESLint + Ruff
- Test: pytest + Vitest
- Scan: Trivy (container + filesystem), pip-audit, npm audit
- Build: Multi-stage Docker builds
- Push: GHCR (GitHub Container Registry) on main branch only
Security: All jobs run with minimal permissions, use pinned action versions, and keep secrets out of logs.
| Control | Implementation |
|---|---|
| Dependency scanning | pip-audit, npm audit in CI |
| Container scanning | Trivy on built images |
| Static analysis | CodeQL (scheduled) |
| Supply chain | OpenSSF Scorecard workflow |
| Secrets | No real secrets committed; .env.example with placeholders |
| CI permissions | Least-privilege `permissions:` blocks |
| Vulnerability reporting | SECURITY.md with responsible disclosure process |
Request (with X-Correlation-ID header)
    │
    ▼
Frontend ──▶ Backend (FastAPI)
                 │
                 ├── Structured JSON log (correlation_id, method, path, status, duration)
                 ├── OpenTelemetry span (trace_id linked to correlation_id)
                 └── Prometheus counter/histogram update
                          │
                          ▼
                 Prometheus scrapes /metrics
                          │
                          ▼
                 Grafana dashboard (request rate, latency, error rate, AI endpoint usage)
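The first branch of the flow above, a structured JSON log keyed by correlation ID, could be sketched with the standard library alone. This is an illustration under stated assumptions, not the repository's middleware: the helper names `resolve_correlation_id` and `handle_with_logging`, the `duration_ms` field name, and the choice to mint a UUID when the header is missing are all hypothetical.

```python
import json
import time
import uuid
from typing import Callable, Mapping

HEADER = "x-correlation-id"  # compared case-insensitively


def resolve_correlation_id(headers: Mapping[str, str]) -> str:
    """Reuse the caller's correlation ID, or mint one if the header is absent."""
    for name, value in headers.items():
        if name.lower() == HEADER and value:
            return value
    return uuid.uuid4().hex  # assumption: fall back to a random 32-char hex ID


def handle_with_logging(
    method: str,
    path: str,
    headers: Mapping[str, str],
    handler: Callable[[], int],
    emit: Callable[[str], None] = print,
) -> int:
    """Run the handler, then emit one JSON log line describing the request."""
    correlation_id = resolve_correlation_id(headers)
    start = time.perf_counter()
    status = handler()  # the handler returns an HTTP status code
    duration_ms = round((time.perf_counter() - start) * 1000, 3)
    emit(json.dumps({
        "correlation_id": correlation_id,
        "method": method,
        "path": path,
        "status": status,
        "duration_ms": duration_ms,  # hypothetical name for the duration field
    }))
    return status
```

A call such as `handle_with_logging("GET", "/docs", {"X-Correlation-ID": "abc-123"}, lambda: 200)` prints a single JSON line carrying `abc-123`, which is the value a trace span or a log search would then be joined on.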
| Environment | Method | Config |
|---|---|---|
| Local | docker-compose | .env + docker-compose.yml |
| Staging | Kubernetes + Argo CD | infra/k8s/overlays/staging/ |
| Production | Kubernetes + Argo CD | infra/k8s/overlays/production/ |
Argo CD watches this repository and auto-syncs Kubernetes manifests when changes merge to main.
- Core full-stack app with AI endpoint
- Docker + docker-compose local dev
- GitHub Actions CI pipeline
- OpenTelemetry + Prometheus + Grafana observability
- Trivy + CodeQL + OpenSSF security scanning
- Terraform skeleton
- Kubernetes manifests + Kustomize
- Argo CD GitOps definitions
- Real LLM provider integration (OpenAI, Anthropic, Ollama)
- End-to-end tests with Playwright
- Chaos engineering with Litmus
- Cost monitoring dashboard
- Multi-region deployment example
Screenshots will be added once the project is deployed to a live environment. The local stack (Frontend, API docs at /docs, Grafana, Prometheus) is fully functional via make up.
MIT — see LICENSE.