- Tooling: Docker Compose
- Purpose: feature development and integration validation
- Characteristics:
  - direct host port mapping
  - local `.env`-driven config
  - fast iteration
- Tooling: Kubernetes namespace (`miniredis-staging`)
- Purpose: pre-production integration and release validation
- Characteristics:
  - production-like routing and policies
  - synthetic load and smoke tests
  - rollback rehearsal
- Tooling: Kubernetes namespace (`miniredis-prod`)
- Purpose: live workloads
- Characteristics:
  - managed TLS, secret management, observability stack
  - autoscaling and resilience controls
  - progressive delivery
- One service per image.
- Multi-stage Docker builds.
- Immutable tags: `service:<git-sha>`
  - optional semver tags for releases.
- Registry push from CI after tests pass.
- Runtime images kept minimal for attack-surface reduction.
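The tagging rule above can be sketched as a small helper that CI might call before pushing. This is a minimal illustration, not the project's actual tooling: the function name `image_tags` and the validation patterns are assumptions; only the `service:<git-sha>` format and the optional semver tag come from the list above.

```python
import re
from typing import List, Optional

# Assumed validation rules: a 7-40 character lowercase hex SHA and a plain
# MAJOR.MINOR.PATCH version. The source does not specify either pattern.
_SHA_RE = re.compile(r"^[0-9a-f]{7,40}$")
_SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")


def image_tags(service: str, git_sha: str, semver: Optional[str] = None) -> List[str]:
    """Return the tags to push for one service image (hypothetical helper)."""
    if not _SHA_RE.match(git_sha):
        # Rejects mutable tags such as "latest" by construction.
        raise ValueError(f"not a git SHA: {git_sha!r}")
    tags = [f"{service}:{git_sha}"]
    if semver is not None:
        if not _SEMVER_RE.match(semver):
            raise ValueError(f"not a MAJOR.MINOR.PATCH version: {semver!r}")
        tags.append(f"{service}:{semver}")
    return tags
```

Deriving every tag from the commit SHA keeps a deployed image traceable to one commit; the semver tag is an additional alias for releases only.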

`Deployment`:
- frontend
- api-gateway
- backend
- node-manager
- monitoring-service
- auth-service

`StatefulSet`:
- postgres-main (if self-hosted)
- postgres-auth (if self-hosted)

Supporting resources:
- `Service` (ClusterIP) for internal communication.
- `Ingress` for external access.
- `ConfigMap` for non-sensitive config.
- `Secret` for credentials/tokens/keys.
- `PVC` for stateful storage.

| Logical Service | App Port | K8s Service Port | Public Exposure |
|---|---|---|---|
| frontend | 5173 | 80/443 via ingress | public |
| api-gateway | 8080 | 8080 | public/internal |
| backend | 5500 | 5500 | internal |
| node-manager | 7000 | 7000 | internal |
| monitoring-service | 9000 | 9000 | internal via gateway or restricted public |
| auth-service | 8000 | 8000 | internal via gateway |
| postgres-main | 5432 | 5432 | internal only |
| postgres-auth | 5432 | 5432 | internal only |
Tenant node port pool:
- controlled by env (`REDIS_PORT_START`, `REDIS_PORT_END`)
- verify no conflict with node/host networking strategy.
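One way to catch port conflicts early is to validate the pool at startup against the service ports listed in the table above. The sketch below is an assumption about how such a check could look: the helper name `load_port_pool` is hypothetical, and it assumes `REDIS_PORT_END` is inclusive, which the source does not state.

```python
import os
from typing import Mapping

# App ports taken from the service port table above.
RESERVED_PORTS = {5173, 8080, 5500, 7000, 9000, 8000, 5432}


def load_port_pool(env: Mapping[str, str] = os.environ) -> range:
    """Read the tenant port pool from the environment and reject overlaps."""
    start = int(env["REDIS_PORT_START"])
    end = int(env["REDIS_PORT_END"])
    if not (1 <= start <= end <= 65535):
        raise ValueError(f"invalid port range: {start}-{end}")
    pool = range(start, end + 1)  # assumption: the end port is inclusive
    clashes = sorted(p for p in RESERVED_PORTS if p in pool)
    if clashes:
        raise ValueError(f"port pool overlaps service ports: {clashes}")
    return pool
```

This only covers the ports this document lists. Conflicts with the cluster's own node port range or host networking still need a manual check against the chosen networking strategy.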
Each service should have:
- readiness probe,
- liveness probe,
- CPU/memory requests and limits,
- rolling update strategy.
Critical services should also have:
- PodDisruptionBudget,
- anti-affinity rules,
- HPA (api-gateway, backend, monitoring-service).
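The per-service baseline above lends itself to an automated check in CI. The following is a rough sketch, assuming each manifest has already been parsed into a Python dict (for example from rendered Helm output); the function name `missing_baseline` is hypothetical. The field names (`readinessProbe`, `livenessProbe`, `resources`, `strategy`) are the standard Kubernetes `Deployment` fields.

```python
from typing import List


def missing_baseline(deployment: dict) -> List[str]:
    """List baseline items that a parsed Deployment manifest does not define."""
    missing: List[str] = []
    spec = deployment.get("spec", {})
    # Kubernetes defaults to RollingUpdate when no strategy is given.
    if spec.get("strategy", {}).get("type", "RollingUpdate") != "RollingUpdate":
        missing.append("rolling update strategy")
    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        for probe in ("readinessProbe", "livenessProbe"):
            if probe not in container:
                missing.append(f"{name}: {probe}")
        resources = container.get("resources", {})
        for kind in ("requests", "limits"):
            for resource in ("cpu", "memory"):
                if resource not in resources.get(kind, {}):
                    missing.append(f"{name}: resources.{kind}.{resource}")
    return missing
```

An empty result means the manifest meets the baseline. PodDisruptionBudget, anti-affinity, and HPA checks for critical services are left out of this sketch.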
- Namespace segmentation by environment.
- NetworkPolicy default deny + explicit allow.
- TLS managed with cert-manager.
- Secrets from secret manager (or sealed secrets).
- Non-root container execution where possible.
- CI image and dependency scanning.
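The "default deny + explicit allow" pattern can be illustrated by generating the two manifests as plain dicts and serializing them to JSON, which `kubectl apply -f` accepts alongside YAML. This is a sketch under assumptions: the function names, the policy names, and the `app` label key are hypothetical and would need to match the labels the charts actually use.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_from_gateway(namespace: str, app: str, port: int) -> dict:
    """Explicit allow: api-gateway pods may reach `app` on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-gateway-to-{app}", "namespace": namespace},
        "spec": {
            # Assumption: pods carry an `app=<service name>` label.
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": "api-gateway"}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    policies = [
        default_deny_policy("miniredis-staging"),
        allow_from_gateway("miniredis-staging", "backend", 5500),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```

Because the default-deny policy also covers egress, DNS lookups and the gateway's own outbound calls would need matching egress allow rules before this is usable.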
- Metrics: Prometheus
- Dashboards: Grafana
- Logs: Loki
- Traces: OpenTelemetry collector (planned)
Minimum dashboards:
- gateway latency/error rate
- backend request/error trends
- node-manager active node count + memory
- monitoring freshness and scrape health
- database health and connection saturation
- Lint and static checks.
- Unit tests.
- Integration tests (Docker Compose or an ephemeral Kubernetes environment).
- Build and scan images.
- Push versioned images.
- Update Helm values/image tags.
- Argo CD sync to staging.
- Smoke test gate.
- Promote to production.
- Monitor rollback triggers.
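The ordering matters here: each stage gates the next, so a failed smoke test must stop promotion. A minimal sketch of that gating logic follows; `run_pipeline` is a hypothetical name, and each stage is modelled as a callable returning `True` on success, which is an assumption rather than how the CI system is actually wired.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: Iterable[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first failure.

    Returns the names of completed stages and the failed stage name,
    or None if every stage passed.
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            return completed, name  # later stages never run
        completed.append(name)
    return completed, None
```

For example, if the stage named "smoke test gate" returns `False`, the function returns before "promote to production" is ever called.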
- Default: rolling update.
- Sensitive services: canary rollout.
- Auto rollback triggers:
- readiness failures,
- elevated 5xx rates,
- health-check failure budget breach.
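The three triggers above could be evaluated from a periodic sample of rollout metrics. The sketch below is illustrative only: the `RolloutSample` fields and both thresholds (`max_5xx_rate`, `health_failure_budget`) are assumed values, since the source does not define what counts as "elevated" or how large the failure budget is.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RolloutSample:
    """One observation window during a rollout (hypothetical shape)."""
    ready_replicas: int
    desired_replicas: int
    requests: int
    errors_5xx: int
    failed_health_checks: int


def rollback_reasons(
    sample: RolloutSample,
    max_5xx_rate: float = 0.05,      # assumed threshold, not from the source
    health_failure_budget: int = 3,  # assumed budget, not from the source
) -> List[str]:
    """Return the triggers that fired; an empty list means keep rolling out."""
    reasons: List[str] = []
    if sample.ready_replicas < sample.desired_replicas:
        reasons.append("readiness failures")
    if sample.requests > 0 and sample.errors_5xx / sample.requests > max_5xx_rate:
        reasons.append("elevated 5xx rate")
    if sample.failed_health_checks > health_failure_budget:
        reasons.append("health-check failure budget breached")
    return reasons
```

Returning the list of reasons rather than a bare boolean makes it easier to record why a rollback happened.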
- Daily full backups for DB.
- Periodic restore test in staging.
- RPO/RTO documented for production.
- Versioned migration policy for schema changes.
- Helm charts for all current services.
- Staging namespace with ingress + TLS.
- Observability baseline in cluster.
- GitOps promotion workflow.
- Production hardening and runbooks.