A custom Kubernetes controller that automatically manages PodDisruptionBudgets (PDBs) for Deployments and StatefulSets using flexible, annotation-driven configuration.
This controller enables safe, automated disruption management with per-workload overrides and cluster-wide defaults.
- Automatic PDB Creation/Update: Ensures every eligible Deployment/StatefulSet has a PDB, using defaults or per-workload overrides.
- Annotation-Based Customization: Users can set `minAvailable`, `maxUnavailable`, or opt out of PDB management with annotations.
- Dynamic Default Configuration: Cluster-wide default PDB values are set via a ConfigMap and can be changed at runtime without redeploying the controller.
- Automatic Detection and Handling of Poor PDB Configurations: The controller detects overly restrictive or ineffective PodDisruptionBudgets (such as `minAvailable` equal to replicas, `minAvailable: 100%`, or `maxUnavailable: 0`/`0%`) that could block voluntary disruptions or cause operational issues. Through the `FixPoorPDBs` option in the `castai-pdb-controller-config` ConfigMap, you can choose whether the controller should only warn about these poor configurations (default) or automatically delete and recreate them with safe defaults. This ensures cluster upgrades, node drains, and scaling operations are not blocked by problematic PDBs.
- Live Reconciliation: If annotations or ConfigMap values change, existing PDBs are updated automatically to reflect new requirements.
- Bypass Support: Workloads can opt out of automatic PDB management at any time by adding a bypass annotation.
- Exclusion Rules: Configure regex-based exclusion rules to automatically skip PDB creation for specific workloads based on namespace, name, and label patterns. Useful for system workloads, temporary deployments, or critical services.
- Garbage Collection: Orphaned PDBs are cleaned up when workloads are deleted or change state.
- Leader Election: Supports safe, highly available operation in multi-replica controller deployments.
- Configurable log levels: Set `logLevel` in the `castai-pdb-controller-config` ConfigMap to `debug`, `info`, `warn`, or `error` (default `info`) to control how much the controller writes to stderr.
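To make the "poor PDB" conditions listed above concrete, here is a minimal Python sketch of such a check. It is not the controller's code: the function name `is_poor_pdb` and its parameters are hypothetical, and treating a `minAvailable` above the replica count or above 100% as equally blocking is an assumption.

```python
from typing import Optional


def _is_percent(value: str) -> bool:
    """True for values written as a percentage, e.g. "50%"."""
    return value.strip().endswith("%")


def _to_number(value: str) -> int:
    """Parse "3" or "25%" into its integer part."""
    return int(value.strip().rstrip("%"))


def is_poor_pdb(replicas: int,
                min_available: Optional[str] = None,
                max_unavailable: Optional[str] = None) -> bool:
    """Return True when the budget leaves no room for a voluntary eviction."""
    if min_available is not None:
        n = _to_number(min_available)
        if _is_percent(min_available):
            return n >= 100       # minAvailable: 100%
        return n >= replicas      # minAvailable equal to (or above) replicas
    if max_unavailable is not None:
        return _to_number(max_unavailable) == 0  # maxUnavailable: 0 or 0%
    return False


print(is_poor_pdb(3, min_available="3"))      # True
print(is_poor_pdb(3, min_available="2"))      # False
print(is_poor_pdb(4, max_unavailable="0%"))   # True
```

A budget flagged this way would block every node drain for that workload, which is why the controller either warns about it or replaces it.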
If no annotation is set, the controller uses the values from the castai-pdb-controller-config ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: castai-pdb-controller-config
  namespace: castai-agent
data:
  minAvailable: "1"
  logLevel: "info" # debug | info | warn | error; the default, info, suppresses DEBUG trace lines
  # maxUnavailable: "50%" # Optional, use one or the other
  FixPoorPDBs: "true" # Set to "true" to auto-fix poor PDBs, "false" to only warn
  exclusions: |
    - namespaceRegex: "^kube-system$"
      nameRegex: ""
      labels: {}
    - namespaceRegex: ""
      nameRegex: ".*-temp$"
      labels:
        app: "^db-.*"
        env: "staging|dev"
```

Configure exclusion rules to automatically skip PDB creation for specific workloads:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: castai-pdb-controller-config
  namespace: castai-agent
data:
  exclusions: |
    - namespaceRegex: "^kube-system$"  # Exclude all workloads in kube-system
      nameRegex: ""
      labels: {}
    - namespaceRegex: ""               # Exclude workloads with names ending in -temp
      nameRegex: ".*-temp$"
      labels:
        app: "^db-.*"                  # AND app label starts with db-
        env: "staging|dev"             # AND env is staging or dev
    - namespaceRegex: "monitoring"     # Exclude workloads in monitoring namespace
      nameRegex: ""
      labels:
        role: "critical"               # AND role label is critical
```

Exclusion Rule Logic:
- Each rule is evaluated independently
- If a workload matches ANY rule, no PDB is created
- Within a single rule, all specified criteria must match (AND logic)
- Empty strings for `namespaceRegex` or `nameRegex` mean "no filter"
- Empty object `{}` for `labels` means "no label filter"
- Regular expressions are supported for flexible matching
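The rule logic above (OR across rules, AND within a rule, empty values meaning "no filter") can be sketched in Python as follows. This is an illustration rather than the controller's implementation: `ExclusionRule`, `rule_matches`, and `is_excluded` are hypothetical names, and the use of unanchored matching (`re.search`) is an assumption, so a pattern such as `staging|dev` would also match a value like `devops` unless it is anchored with `^` and `$`.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExclusionRule:
    namespace_regex: str = ""   # "" means no namespace filter
    name_regex: str = ""        # "" means no name filter
    labels: Dict[str, str] = field(default_factory=dict)  # {} means no label filter


def rule_matches(rule: ExclusionRule, namespace: str, name: str,
                 labels: Dict[str, str]) -> bool:
    """All criteria that are set on the rule must match (AND logic)."""
    if rule.namespace_regex and not re.search(rule.namespace_regex, namespace):
        return False
    if rule.name_regex and not re.search(rule.name_regex, name):
        return False
    for key, pattern in rule.labels.items():
        value = labels.get(key)
        if value is None or not re.search(pattern, value):
            return False
    return True


def is_excluded(rules: List[ExclusionRule], namespace: str, name: str,
                labels: Dict[str, str]) -> bool:
    """A workload is skipped as soon as ANY rule matches (OR logic)."""
    return any(rule_matches(r, namespace, name, labels) for r in rules)


# The three rules from the ConfigMap example above.
rules = [
    ExclusionRule(namespace_regex="^kube-system$"),
    ExclusionRule(name_regex=".*-temp$",
                  labels={"app": "^db-.*", "env": "staging|dev"}),
    ExclusionRule(namespace_regex="monitoring", labels={"role": "critical"}),
]

print(is_excluded(rules, "kube-system", "coredns", {}))   # True
print(is_excluded(rules, "default", "db-sync-temp",
                  {"app": "db-sync", "env": "prod"}))     # False
```

Note that under this reading a rule with every field left empty would match all workloads, so each rule should set at least one criterion.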
Add annotations to your Deployment or StatefulSet to override the defaults:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    workloads.cast.ai/pdb-minAvailable: "2"
spec:
  replicas: 3
  # ...
```

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-db
  namespace: my-namespace
  annotations:
    workloads.cast.ai/pdb-maxUnavailable: "25%"
spec:
  replicas: 4
  # ...
```

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: no-pdb-app
  namespace: my-namespace
  annotations:
    workloads.cast.ai/bypass-default-pdb: "true"
spec:
  replicas: 5
  # ...
```

| Annotation | Description | Example Value |
|---|---|---|
| `workloads.cast.ai/pdb-minAvailable` | Minimum pods that must be available (int or percent, one only) | `"2"`, `"50%"` |
| `workloads.cast.ai/pdb-maxUnavailable` | Maximum pods that can be unavailable (int or percent, one only) | `"1"`, `"25%"` |
| `workloads.cast.ai/bypass-default-pdb` | Opt out of automatic PDB management | `"true"` |
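As a rough illustration of how the annotations in the table could combine with the ConfigMap defaults, the Python sketch below resolves a single effective setting for a workload. The function name `resolve_pdb_setting` is hypothetical. Two behaviours are assumptions, not documented facts: rejecting a workload that sets both annotations, and preferring a default `minAvailable` over a default `maxUnavailable` when both are present.

```python
from typing import Dict, Optional, Tuple

PREFIX = "workloads.cast.ai/"


def resolve_pdb_setting(annotations: Dict[str, str],
                        defaults: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (field, value) for the PDB spec, or None when no PDB is wanted."""
    if annotations.get(PREFIX + "bypass-default-pdb") == "true":
        return None  # workload opted out of automatic PDB management

    min_available = annotations.get(PREFIX + "pdb-minAvailable")
    max_unavailable = annotations.get(PREFIX + "pdb-maxUnavailable")
    if min_available and max_unavailable:
        # Assumption: the sketch treats "both set" as a configuration error.
        raise ValueError("set only one of pdb-minAvailable or pdb-maxUnavailable")
    if min_available:
        return ("minAvailable", min_available)
    if max_unavailable:
        return ("maxUnavailable", max_unavailable)

    # No override annotation: fall back to the cluster-wide ConfigMap values.
    if defaults.get("minAvailable"):
        return ("minAvailable", defaults["minAvailable"])
    if defaults.get("maxUnavailable"):
        return ("maxUnavailable", defaults["maxUnavailable"])
    return None


defaults = {"minAvailable": "1"}
print(resolve_pdb_setting({PREFIX + "pdb-maxUnavailable": "25%"}, defaults))
# ('maxUnavailable', '25%')
print(resolve_pdb_setting({}, defaults))
# ('minAvailable', '1')
```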
- On workload creation or update: The controller checks for annotations and creates/updates a PDB accordingly.
- On annotation or ConfigMap change: The controller reconciles and updates existing PDBs to match new settings.
- On workload deletion or bypass: The controller deletes the associated PDB.
- On ConfigMap update: All workloads using the default config are updated to the new values.
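The four cases above boil down to comparing the desired PDB setting with the one that currently exists. The Python sketch below shows that decision in isolation. It is a simplification under stated assumptions: `decide_action` and its parameters are hypothetical names, and the real controller may weigh further conditions (replica count, exclusion rules) that are left out here.

```python
from typing import Optional, Tuple

Setting = Tuple[str, str]  # e.g. ("minAvailable", "2")


def decide_action(workload_exists: bool, bypassed: bool,
                  desired: Optional[Setting],
                  current: Optional[Setting]) -> str:
    """Return "create", "update", "delete", or "none" for one workload."""
    if not workload_exists or bypassed:
        # Workload deleted or opted out: remove the PDB if one is left over.
        return "delete" if current is not None else "none"
    if desired is None:
        return "none"
    if current is None:
        return "create"
    if current != desired:
        # Annotation or ConfigMap values changed since the PDB was written.
        return "update"
    return "none"


print(decide_action(True, False, ("minAvailable", "2"), None))                   # create
print(decide_action(True, False, ("minAvailable", "2"), ("minAvailable", "1")))  # update
print(decide_action(False, False, None, ("minAvailable", "1")))                  # delete
```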
Verbosity is controlled by the `logLevel` key in the `castai-pdb-controller-config` ConfigMap (no redeploy required). The `CASTAI_PDB_CONTROLLER_LOG_LEVEL` environment variable on the controller pod is used only when `logLevel` is omitted from the ConfigMap.
| Level | Meaning |
|---|---|
| error | Failures only (API errors, invalid exclusion regexes, failed creates/deletes). |
| warn | Errors plus warnings (invalid durations in the ConfigMap, invalid selectors, poor-PDB / multi-PDB warnings). Hides routine success lines. |
| info | Default. Normal operations (PDB create/update/delete, skips, leader messages, config summary). Does not emit DEBUG: trace lines. |
| debug | Everything including high-volume DEBUG: traces (per-workload exclusion checks, reconciliation steps). Use only while troubleshooting. |
Aliases (case-insensitive): d, i, w, e, warning, fatal (same as error). Unknown values fall back to info with a warning.
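The precedence and alias rules above can be summarised in a short Python sketch. It is illustrative only: `normalize_level` and `effective_level` are hypothetical names, the mapping of the single-letter aliases to the level with the same initial is an assumption, and so is treating an empty `logLevel` value the same as an omitted one.

```python
import sys
from typing import Mapping

# Assumed alias table: single letters map to the level with that initial.
_ALIASES = {
    "debug": "debug", "d": "debug",
    "info": "info", "i": "info",
    "warn": "warn", "w": "warn", "warning": "warn",
    "error": "error", "e": "error", "fatal": "error",
}


def normalize_level(raw: str) -> str:
    """Map a case-insensitive level or alias to one of the four levels."""
    level = _ALIASES.get(raw.strip().lower())
    if level is None:
        print(f"warning: unknown log level {raw!r}, using info", file=sys.stderr)
        return "info"
    return level


def effective_level(config_data: Mapping[str, str],
                    env: Mapping[str, str]) -> str:
    """ConfigMap value first; the environment variable only as a fallback."""
    raw = config_data.get("logLevel")
    if not raw:
        raw = env.get("CASTAI_PDB_CONTROLLER_LOG_LEVEL")
    if not raw:
        return "info"
    return normalize_level(raw)


print(effective_level({"logLevel": "WARNING"}, {}))                      # warn
print(effective_level({}, {"CASTAI_PDB_CONTROLLER_LOG_LEVEL": "d"}))     # debug
print(effective_level({"logLevel": "error"},
                      {"CASTAI_PDB_CONTROLLER_LOG_LEVEL": "debug"}))     # error
```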
Helm: set config.logLevel in values.yaml (rendered into the ConfigMap).
- Kubernetes 1.21+
- Permissions to manage Deployments, StatefulSets, PDBs, and ConfigMaps in your cluster.
- RBAC rules that allow listing namespaces and managing PDBs at the cluster scope.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    workloads.cast.ai/pdb-minAvailable: "2"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
```

- Duplicate logs: Usually caused by log collector configuration, not the controller itself.
- No PDB created: Ensure your workload has at least 2 replicas and is not opted out with the bypass annotation.
- RBAC errors: Make sure your controller has permissions to list namespaces and manage PDBs.
For advanced usage, deployment via Helm, or troubleshooting, see the controller source code and your cluster’s RBAC configuration.
If you remove the castai-pdb-controller from your cluster and also want to delete the PDBs it created, run the following clean-up command:
```shell
kubectl get poddisruptionbudget --all-namespaces -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name" \
  | awk '$2 ~ /^castai-.*-pdb$/ {print "kubectl delete poddisruptionbudget -n " $1 " " $2}' \
  | sh
```
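Before piping to `sh`, it can help to see which names the `awk` filter selects. The Python sketch below applies the same regular expression to a few sample rows. The sample namespaces and PDB names are made up for illustration, and `pdbs_to_delete` is a hypothetical helper, not part of the controller.

```python
import re
from typing import List, Tuple

# Same pattern as the awk filter in the clean-up command.
PDB_NAME = re.compile(r"^castai-.*-pdb$")


def pdbs_to_delete(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only (namespace, name) rows whose name matches the pattern."""
    return [(ns, name) for ns, name in rows if PDB_NAME.search(name)]


# Made-up sample of `kubectl get` output, including the header row.
rows = [
    ("NAMESPACE", "NAME"),
    ("my-namespace", "castai-my-app-pdb"),
    ("my-namespace", "my-app-pdb"),
    ("castai-agent", "castai-agent-pdb"),
]

for ns, name in pdbs_to_delete(rows):
    print(f"kubectl delete poddisruptionbudget -n {ns} {name}")
```

The filter works on names alone, so any PDB whose name happens to fit `castai-...-pdb` is selected whether or not the controller created it. Dropping the final `| sh` from the command prints the delete commands for review without running them.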