dmazhukov/README.md

Dmitrii Zhukov — Senior DevOps / Platform / SRE Engineer

Infrastructure engineer focused on Kubernetes reliability, observability stacks, and PostgreSQL HA operations. 20+ years in tech — 14 as a .NET developer and tech lead, 6+ in DevOps and cloud infrastructure.

Based in Vietnam 🇻🇳 · Remote worldwide · GMT+7


What I work on

  • Kubernetes — Production clusters on GKE, kubeadm, and Rancher, in the cloud and on-premises, including offline-tolerant shipboard environments that depend on satellite connectivity (not your typical infra problem).
  • Observability — Prometheus, Grafana, Istio service mesh with distributed tracing.
  • PostgreSQL — HA clusters with Patroni, streaming replication, failover testing, performance tuning.
  • IaC — Terraform, Ansible, Helm. Infrastructure treated the same as application code: PR reviews, tested pipelines, no manual snowflakes.
  • CI/CD — Jenkins, GitLab CI, GitHub Actions, Codefresh. Reduced release failure rates by 90% through pre-deployment validation and staged rollouts.

Featured

Production-grade Prometheus alerting rules for Kubernetes, PostgreSQL/Patroni, and SLO burn rates, with runbooks.

Covers:

  • Pod crash-loop, OOM, PVC fill-up, deployment rollout stuck
  • Patroni cluster health, replication lag, XID wraparound
  • Multi-window SLO burn rate (Google SRE method)
  • Node disk, network, clock skew
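
To illustrate the multi-window burn-rate idea from the list above, here is a minimal Python sketch. It is not taken from the featured repository, whose actual rules are not shown here. The names `BurnRateWindow`, `PAGE_WINDOWS`, `burn_rate`, `should_alert`, and `firing_windows` are hypothetical, and the window pairs and thresholds are the ones commonly cited from the Google SRE Workbook for a 30-day SLO, assumed here rather than confirmed by the source.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class BurnRateWindow:
    """One long/short window pair with its burn-rate threshold (hypothetical type)."""
    long_window: str
    short_window: str
    threshold: float


# Assumption: paging thresholds commonly cited from the Google SRE Workbook
# for a 30-day SLO period. The featured rules may use different values.
PAGE_WINDOWS = (
    BurnRateWindow("1h", "5m", 14.4),
    BurnRateWindow("6h", "30m", 6.0),
)


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / error_budget


def should_alert(error_ratios: Mapping[str, float], slo_target: float,
                 window: BurnRateWindow) -> bool:
    """Fire only when BOTH the long and the short window exceed the threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, so the alert resets quickly after recovery.
    """
    long_burn = burn_rate(error_ratios[window.long_window], slo_target)
    short_burn = burn_rate(error_ratios[window.short_window], slo_target)
    return long_burn > window.threshold and short_burn > window.threshold


def firing_windows(error_ratios: Mapping[str, float],
                   slo_target: float) -> List[BurnRateWindow]:
    """Return every window pair whose alert condition currently holds."""
    return [w for w in PAGE_WINDOWS if should_alert(error_ratios, slo_target, w)]


if __name__ == "__main__":
    # Illustrative error ratios per window, not real measurements.
    ratios = {"1h": 0.02, "5m": 0.02, "6h": 0.004, "30m": 0.004}
    for w in firing_windows(ratios, slo_target=0.999):
        print(f"page: burn rate above {w.threshold} over {w.long_window} and {w.short_window}")
```

In a real Prometheus setup the per-window error ratios would typically come from recording rules, and the same two-window condition would be written as an alerting rule expression rather than evaluated in Python.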

Stack

Orchestration   Kubernetes (GKE · kubeadm · Rancher) · Docker · Helm
Cloud           GCP · AWS · DigitalOcean · Yandex.Cloud · Alibaba Cloud
Observability   Prometheus · Grafana · Istio · ELK · Dynatrace
Databases       PostgreSQL · Patroni · MS SQL · Oracle
IaC             Terraform · Ansible
CI/CD           Jenkins · GitLab CI · GitHub Actions · Codefresh
Scripting       Python · Bash · Go

By the numbers

Achievement                  Result
Production release failures  −90% (from ~10 to ~1/year)
System uptime                99.8% for cruise operations
Cloud migration              Zero downtime · −30% cost
CI/CD speed                  −75% deployment time
IAM security incidents       −60% after RBAC reorganization
PostgreSQL HA                99.95% uptime · <30s failover

Pinned

  1. build-harness (Public)

    Forked from cloudposse/build-harness

    🤖 Collection of Makefiles to facilitate building Golang projects, Dockerfiles, Helm charts, and more

    Makefile 1

  2. typhoon (Public)

    Forked from poseidon/typhoon

    Minimal and free Kubernetes distribution

    HCL 1

  3. zalenium (Public)

    Forked from zalando/zalenium

    A flexible and scalable container based Selenium Grid with video recording, live preview, basic auth & dashboard.

    Java 1

  4. cloudshell-homebrew (Public)

    Google Cloud Shell custom image with Homebrew (Linuxbrew) installed

    Dockerfile 1