Design Spec: k8s-victoriametrics-healthcheck
Parent: #93
Target: rw-cli-codecollection
Spec
# CodeBundle Design Spec (VictoriaMetrics Kubernetes)
codebundle_name: "k8s-victoriametrics-healthcheck"
target_collection: "rw-cli-codecollection"
display_name: "Kubernetes VictoriaMetrics Health Check"
author: "rw-codebundle-agent"
purpose: |
  Validates VictoriaMetrics workloads on Kubernetes against operator expectations:
  pods are Ready and schedule cleanly, vmstorage persistent storage is healthy,
  and HTTP health endpoints respond. Surfaces recent error signatures from
  component logs so operators can catch ingestion or query failures early.
tasks:
  - name: "Verify VictoriaMetrics workload pod readiness"
    description: |
      Lists Deployments, StatefulSets, and DaemonSets in the target namespace
      that match VictoriaMetrics labels (single-node, vmagent, or the cluster
      components vmselect, vminsert, and vmstorage). Reports pods that are
      not Ready or are in CrashLoopBackOff or ImagePullBackOff, as well as
      failed rollout conditions.
    script_name: "check-vm-workload-readiness.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "logs-config"
  - name: "Check VictoriaMetrics storage PVCs bound and healthy"
    description: |
      Lists PVCs used by VictoriaMetrics StatefulSets (especially vmstorage).
      Flags PVCs in the Pending or Lost phase, bound PVs in the Failed phase,
      volume binding failures, and capacity pressure when usage data is
      available.
    script_name: "check-vm-storage-pvcs.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "logs-config"
  - name: "Probe VictoriaMetrics HTTP health endpoints"
    description: |
      For each discovered component pod, curls localhost:<port>/health (and
      the readiness endpoint where applicable) via kubectl exec. Uses the
      documented default port for each component (single-node VM, vmselect,
      vminsert, vmstorage, vmagent). Raises an issue on a non-2xx response or
      a connection failure.
    script_name: "check-vm-http-health.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "metrics"
  - name: "Check VictoriaMetrics cluster status API (vmselect)"
    description: |
      When cluster mode is detected, requests the vmselect cluster status or
      overview JSON (e.g. /api/v1/status/cluster). Validates that storage
      nodes and the insert/select paths are reported healthy, per the
      VictoriaMetrics documentation.
    script_name: "check-vm-cluster-status.sh"
    expected_issue_severity: [3, 4]
    access_level: "read-only"
    data_type: "metrics"
  - name: "Scan VictoriaMetrics recent logs for errors"
    description: |
      Tails or greps the recent container logs of VictoriaMetrics-labeled pods
      for ERROR/panic/fatal patterns. This catches runtime failures that are
      not visible from pod phase alone.
    script_name: "check-vm-recent-error-logs.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "logs-regexp"
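The readiness and log-scan tasks both reduce to simple classification over `kubectl` output. The sketch below shows that logic in Python and is only a minimal illustration. It assumes pod objects shaped like the `items` of `kubectl get pods -o json`. The helpers `pod_issues` and `error_lines` are hypothetical, and the severity mapping (2 for CrashLoopBackOff/ImagePullBackOff, 3 for plain not-Ready) is an assumed split of the spec's `[2, 3]` range.

```python
import re
from typing import Any

# Waiting reasons treated as hard failures (assumed severity 2).
FAILURE_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"}

# Case-insensitive so both "ERROR" and "error" level strings match; the \b
# word boundaries keep words such as "errors" from matching.
ERROR_PATTERN = re.compile(r"\b(error|panic|fatal)\b", re.IGNORECASE)


def pod_issues(pod: dict[str, Any]) -> list[tuple[int, str]]:
    """Return (severity, message) pairs for one item of `kubectl get pods -o json`."""
    name = pod["metadata"]["name"]
    status = pod.get("status") or {}
    issues: list[tuple[int, str]] = []
    for cs in status.get("containerStatuses") or []:
        waiting = (cs.get("state") or {}).get("waiting") or {}
        reason = waiting.get("reason")
        if reason in FAILURE_REASONS:
            issues.append((2, f"{name}/{cs['name']}: {reason}"))
    ready = any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions") or []
    )
    # Report a plain not-Ready pod (assumed severity 3) only when no harder
    # failure already explains it.
    if not ready and not issues:
        issues.append((3, f"{name}: pod is not Ready"))
    return issues


def error_lines(log_text: str) -> list[str]:
    """Return the log lines that contain an error, panic, or fatal token."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.search(line)]


if __name__ == "__main__":
    crashing = {
        "metadata": {"name": "vmselect-0"},
        "status": {
            "conditions": [{"type": "Ready", "status": "False"}],
            "containerStatuses": [
                {"name": "vmselect", "state": {"waiting": {"reason": "CrashLoopBackOff"}}}
            ],
        },
    }
    print(pod_issues(crashing))  # [(2, 'vmselect-0/vmselect: CrashLoopBackOff')]
```

A bare word match also hits benign phrases such as "no error". Anchoring the pattern on the level field of the installed log format would be stricter.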
scope:
  level: "Project"
  qualifiers:
    - CONTEXT
    - NAMESPACE
  iteration_pattern: |
    One SLX per namespace where VictoriaMetrics workloads are deployed. The user
    configures CONTEXT and NAMESPACE; an optional label selector narrows the
    scope to a specific Helm release or operator instance.
  resource_types:
    - "kubernetes_namespace"
  generation_strategy: |
    Match namespaces containing workloads that follow VictoriaMetrics label
    conventions (configurable selector). Default discovery: Deployments or
    StatefulSets with labels such as app.kubernetes.io/name matching
    victoria-metrics-*, vmselect, vminsert, vmstorage, or vmagent, or the label
    defaults of the VictoriaMetrics Helm charts (including
    victoria-metrics-k8s-stack). Resource qualifier: namespace name.
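The default discovery rule amounts to a glob match on the `app.kubernetes.io/name` label. Below is a minimal Python sketch of that rule. It assumes workload objects shaped like the items of `kubectl get deployments,statefulsets -A -o json`. The names `is_vm_workload`, `vm_namespaces`, and `VM_NAME_PATTERNS` are hypothetical, and chart-specific label defaults are not included.

```python
from fnmatch import fnmatchcase
from typing import Any

NAME_LABEL = "app.kubernetes.io/name"
# Patterns copied from the generation strategy text; chart-specific label
# defaults would have to be appended here.
VM_NAME_PATTERNS = ("victoria-metrics-*", "vmselect", "vminsert", "vmstorage", "vmagent")


def is_vm_workload(obj: dict[str, Any]) -> bool:
    """True if the object's app.kubernetes.io/name label matches a VM pattern."""
    labels = (obj.get("metadata") or {}).get("labels") or {}
    name = labels.get(NAME_LABEL, "")
    # fnmatchcase is case-sensitive on every OS, matching Kubernetes label values.
    return any(fnmatchcase(name, pattern) for pattern in VM_NAME_PATTERNS)


def vm_namespaces(workloads: list[dict[str, Any]]) -> list[str]:
    """Sorted namespaces that contain at least one matching workload (one SLX each)."""
    return sorted({w["metadata"]["namespace"] for w in workloads if is_vm_workload(w)})


if __name__ == "__main__":
    sample = [
        {"metadata": {"namespace": "monitoring", "labels": {NAME_LABEL: "vmstorage"}}},
        {"metadata": {"namespace": "metrics", "labels": {NAME_LABEL: "victoria-metrics-single"}}},
        {"metadata": {"namespace": "web", "labels": {NAME_LABEL: "nginx"}}},
    ]
    print(vm_namespaces(sample))  # ['metrics', 'monitoring']
```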
env_vars:
  - name: CONTEXT
    description: "Kubernetes context to use"
    required: true
  - name: NAMESPACE
    description: "Namespace where VictoriaMetrics workloads run"
    required: true
  - name: KUBERNETES_DISTRIBUTION_BINARY
    description: "kubectl-compatible binary"
    required: false
    default: "kubectl"
  - name: VM_LABEL_SELECTOR
    description: |
      Optional label selector to scope pods (e.g. app.kubernetes.io/instance=my-vm).
      If empty, scripts discover common VictoriaMetrics component labels.
    required: false
    default: ""
  - name: VM_DEPLOYMENT_MODE
    description: |
      One of single, cluster, or auto. With auto, the scripts try to detect
      cluster vs. single-node mode from workload kinds and labels.
    required: false
    default: "auto"
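These environment variables drive two decisions: the `kubectl` listing command and the deployment mode. The minimal Python sketch below covers both. The helpers `list_workloads_argv` and `resolve_mode` are hypothetical. The auto heuristic is an assumption, not a documented rule: any vmselect, vminsert, or vmstorage component name selects cluster mode, and anything else (including vmagent alone) selects single.

```python
from collections.abc import Mapping

VALID_MODES = ("single", "cluster", "auto")
CLUSTER_COMPONENTS = {"vmselect", "vminsert", "vmstorage"}


def list_workloads_argv(env: Mapping[str, str]) -> list[str]:
    """Build the kubectl command that lists candidate workloads as JSON."""
    argv = [
        env.get("KUBERNETES_DISTRIBUTION_BINARY") or "kubectl",
        "--context", env["CONTEXT"],
        "-n", env["NAMESPACE"],
        "get", "deployments,statefulsets,daemonsets",
        "-o", "json",
    ]
    selector = env.get("VM_LABEL_SELECTOR", "").strip()
    if selector:
        # Pass the selector through so kubectl applies its own selector syntax.
        argv += ["-l", selector]
    return argv


def resolve_mode(requested: str, component_names: set[str]) -> str:
    """Resolve VM_DEPLOYMENT_MODE; 'auto' uses an assumed name-based heuristic."""
    mode = (requested or "").strip().lower() or "auto"
    if mode not in VALID_MODES:
        raise ValueError(f"VM_DEPLOYMENT_MODE must be one of {VALID_MODES}, got {requested!r}")
    if mode != "auto":
        return mode
    return "cluster" if component_names & CLUSTER_COMPONENTS else "single"


if __name__ == "__main__":
    env = {"CONTEXT": "my-context", "NAMESPACE": "monitoring"}
    print(list_workloads_argv(env))
    print(resolve_mode("auto", {"vmselect", "vmagent"}))  # cluster
```

Passing `-l` straight through to kubectl keeps selector semantics identical to what operators type by hand. It also avoids reimplementing selector parsing.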
secrets:
  - name: kubeconfig
    description: "Kubeconfig secret for cluster access"
    format: "Standard kubeconfig file"
platform:
  name: "kubernetes"
  cli_tools:
    - "kubectl"
    - "jq"
    - "curl"
  auth_methods:
    - "kubeconfig (cluster RBAC)"
  api_docs: "https://docs.victoriametrics.com/"
related_bundles:
  - name: "k8s-loki-healthcheck"
    relationship: "complements"
    notes: "Same kubectl + HTTP health pattern for observability backends in rw-cli-codecollection."
  - name: "k8s-prometheus-healthcheck"
    relationship: "complements"
    notes: "Prometheus operator triage; VictoriaMetrics is a separate metrics stack."
  - name: "k8s-pvc-healthcheck"
    relationship: "complements"
    notes: "Generic PVC checks; this bundle specializes vmstorage PVC signals."
  - name: "k8s-cortexmetrics-ingestor-health"
    relationship: "overlaps"
    notes: "Cortex ring health in rw-public-codecollection; different product but similar distributed-metrics ops mental model."
test_scenarios:
  - name: "healthy_single_node_vm"
    description: "Single-node VM or vmagent with all pods Ready and /health OK"
    expected_issues: 0
  - name: "vmstorage_pvc_pending"
    description: "PVC stuck Pending for vmstorage StatefulSet"
    expected_issues: 1
    expected_severities: [3]
  - name: "cluster_component_unhealthy"
    description: "vmselect cluster status reports an unhealthy storage node"
    expected_issues: 1
    expected_severities: [3]
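The `vmstorage_pvc_pending` scenario can be exercised against a small PVC classifier. Kubernetes PVC phases are Pending, Bound, and Lost, so this minimal Python sketch flags Pending with severity 3, as the scenario expects. It flags Lost with severity 2, which is an assumed mapping. The sketch assumes PVC objects shaped like the items of `kubectl get pvc -o json`. The helper `pvc_issues` and the PVC names in the example are hypothetical.

```python
from typing import Any

# Severity 3 for Pending follows the vmstorage_pvc_pending scenario; the
# severity for Lost is an assumption. Bound PVCs produce no issue.
PVC_PHASE_SEVERITY = {"Pending": 3, "Lost": 2}


def pvc_issues(pvcs: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (severity, message) pairs for PVCs in a problem phase."""
    issues: list[tuple[int, str]] = []
    for pvc in pvcs:
        phase = (pvc.get("status") or {}).get("phase", "Unknown")
        severity = PVC_PHASE_SEVERITY.get(phase)
        if severity is not None:
            issues.append((severity, f"PVC {pvc['metadata']['name']} is {phase}"))
    return issues


if __name__ == "__main__":
    # Shape of the vmstorage_pvc_pending scenario (hypothetical PVC names).
    scenario = [
        {"metadata": {"name": "vmstorage-db-vmstorage-0"}, "status": {"phase": "Pending"}},
        {"metadata": {"name": "vmstorage-db-vmstorage-1"}, "status": {"phase": "Bound"}},
    ]
    issues = pvc_issues(scenario)
    print(len(issues), [sev for sev, _ in issues])  # 1 [3]
```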
notes: |
  VictoriaMetrics exposes /health on each binary, and default HTTP ports differ
  by component (e.g. 8481 vmselect, 8480 vminsert, 8482 vmstorage, 8428
  single-node, 8429 vmagent); confirm them against the installed version and
  chart. Prefer exec into the pod over NodePort unless Service names are
  parameterized. Helm charts (victoria-metrics-k8s-stack) may use different
  labels, so the VM_LABEL_SELECTOR description must include example
  selectors. If the cluster status API path differs by version, the
  implementer should follow the version pinned in the deployment. Align Robot
  tasks with rw-cli patterns: RW.CLI.Run Bash File, RW.Core.Add Issue,
  timeout_seconds, and tags access:read-only and data:*. Include .runwhen
  generation rules and .test/Taskfile per farm standards so the scorer passes.