From de98fa6bc4d2ce0a03ae33967084c0e404122627 Mon Sep 17 00:00:00 2001
From: David Yu <david.yu@redpanda.com>
Date: Tue, 23 Jun 2026 13:54:18 -0700
Subject: [PATCH 1/4] manage/k8s: document decommission timing settings
 (--decommission-wait-interval, RequeueAfter)

Add a "Tune automatic decommission timing" section to the Kubernetes
decommission guide explaining the re-check/requeue interval settings for
both deployment modes:

- Operator: --decommission-wait-interval (default 8s), passed via the
  operator chart's additionalCmdFlags, which sets the Decommission
  controller's RequeueAfter (surfaced as the "next run in" log line).
- Helm sidecar: decommissionRequeueTimeout (default 10s) and
  decommissionAfter (default 60s).

Includes defaults, a worked helm example for the operator, how to read the
interval from operator logs, and guidance for adjusting the values (re-check
interval versus debounce window; reallocation throughput is tuned separately).

Ref: DOC-2270

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
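One way to confirm the configured interval after applying the operator
example in this patch is to pull the `next run in` value out of the
controller logs. The following is a minimal sketch, assuming the log message
keeps the format shown in the sample log line below; `next_run_interval` is a
hypothetical helper name, not part of any Redpanda tooling.

```bash
# Hypothetical helper: print the duration that follows "next run in" on each
# matching log line read from stdin. Assumes the message format shown in the
# sample log line in this patch.
next_run_interval() {
  sed -n 's/.*next run in \([0-9a-z.]*\).*/\1/p'
}

# Example (assumes kubectl access to the operator namespace):
#   kubectl logs deployment/redpanda-controller --namespace <operator-namespace> \
#     | next_run_interval | tail -n 1
```

With `--decommission-wait-interval=30s`, the most recent value is expected to
match the configured interval.
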
 .../kubernetes/k-decommission-brokers.adoc    | 56 +++++++++++++++++++
 1 file changed, 56 insertions(+)
diff --git a/modules/manage/pages/kubernetes/k-decommission-brokers.adoc b/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
index ee3310794a..f0042b7ff1 100644
--- a/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
+++ b/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
@@ -482,6 +482,8 @@ helm upgrade --install redpanda-controller redpanda/operator \
 +
 - `--additional-controllers="decommission"`: Enables the Decommission controller.
 - `rbac.createAdditionalControllerCRs=true`: Creates the required RBAC rules for the Redpanda Operator to monitor the StatefulSet and update PVCs and PVs.
++
+TIP: To change how often the Decommission controller re-checks the cluster for brokers that need decommissioning, pass the `--decommission-wait-interval` flag through `additionalCmdFlags`. See <<decommission-timing>>.
 
 .. Configure a Redpanda resource with seven Redpanda brokers:
 +
@@ -660,6 +662,60 @@ kubectl logs deployment/redpanda-controller --namespace <operator-namespace>
 
 You can repeat this procedure to continue to scale down.
 
+[[decommission-timing]]
+== Tune automatic decommission timing
+
+The <<Automated,automatic decommissioner>> polls the cluster on a regular interval to detect brokers that need to be decommissioned. The setting that controls this interval, and any debounce window before the decommissioner acts, depends on how the controller is deployed: as the Decommission controller inside the Redpanda Operator, or as the broker decommissioner sidecar in a Helm-only deployment.
+
+[cols="2,1,4"]
+|===
+| Setting | Default | Description
+
+| `--decommission-wait-interval` (Operator; set through `additionalCmdFlags`)
+| `8s`
+| Requeue interval (`RequeueAfter`) for the operator's Decommission controller: how often the controller re-checks the cluster for brokers that need decommissioning when a reconcile did not already schedule a sooner re-check.
+
+| `decommissionRequeueTimeout` (Helm sidecar; under `statefulset.sideCars.brokerDecommissioner`)
+| `10s`
+| How often the sidecar re-checks a cluster that already has a broker flagged for decommissioning.
+
+| `decommissionAfter` (Helm sidecar; under `statefulset.sideCars.brokerDecommissioner`)
+| `60s`
+| How long a broker must continuously meet the decommission conditions before the sidecar acts. This debounce window prevents acting on transient conditions, such as a broker that is briefly unreachable during a restart.
+|===
+
+=== Set the interval for the Operator
+
+The operator's Decommission controller does not expose its interval as a dedicated Helm value. Instead, pass the `--decommission-wait-interval` flag through `additionalCmdFlags` when you install or upgrade the operator:
+
+[,bash,subs="attributes+"]
+----
+helm upgrade --install redpanda-controller redpanda/operator \
+  --namespace <namespace> \
+  --create-namespace \
+  --set image.tag={latest-operator-version} \
+  --set "additionalCmdFlags={--additional-controllers=decommission,--decommission-wait-interval=30s}" \
+  --set rbac.createAdditionalControllerCRs=true
+----
+
+The flag accepts any Go duration string, such as `8s`, `30s`, or `2m`. The default is `8s`. After each reconcile, the controller logs the next scheduled run, and the `next run in` value reflects the configured interval:
+
+[.no-copy]
+----
+{"level":"info","logger":"DecommissionReconciler.Reconcile","msg":"successful reconciliation finished in 1m0s, next run in 30s","controller":"statefulset", ...}
+----
+
+=== Set the intervals for Helm
+
+For a Helm-only deployment, set the sidecar values directly under `statefulset.sideCars.brokerDecommissioner`. For a full example, see <<Automated,Use the BrokerDecommissioner>>.
+
+=== Guidance for adjusting the intervals
+
+* These settings control only how often the decommissioner *re-checks* for work and how long it waits before acting. They do not change how fast partition data is reallocated once a decommission begins. Reallocation throughput is governed by xref:reference:cluster-properties.adoc#raft_learner_recovery_rate[`raft_learner_recovery_rate`] and xref:reference:tunable-properties.adoc#partition_autobalancing_concurrent_moves[`partition_autobalancing_concurrent_moves`].
+* Increase the re-check interval to reduce reconcile frequency, and the associated log and Admin API traffic, on large or stable clusters. Decrease it for faster detection of brokers that need decommissioning.
+* For Helm (sidecar) deployments, keep `decommissionRequeueTimeout` smaller than `decommissionAfter` -- ideally well below it -- so the sidecar re-evaluates the cluster at least once within the debounce window. If the re-check interval is close to or larger than `decommissionAfter`, the decommissioner may wait up to one additional interval before acting. The Kubernetes controller-runtime work queue also adds a small amount of jitter.
+* A single operator reconcile can take up to about a minute because the Decommission controller verifies that cluster health is stable before it commits to a decommission. This is expected, and is independent of the `--decommission-wait-interval` value.
+
 == Troubleshooting
 
 If the decommissioning process is not making progress, investigate the following potential issues:

From 9c813d26e3d29f7afda26a2c154eb22cd7182d49 Mon Sep 17 00:00:00 2001
From: David Yu <david.yu@redpanda.com>
Date: Tue, 23 Jun 2026 14:21:13 -0700
Subject: [PATCH 2/4] manage/k8s: clarify decommission interval is periodic
 re-check, not scale-in gate

End-to-end testing on EKS showed that a user-initiated scale-in (reducing
statefulset.replicas) is detected from a StatefulSet watch event and acted
on within seconds, regardless of --decommission-wait-interval. The interval
governs the periodic re-check cadence for conditions that arise without a
triggering event (for example, a broker that becomes unreachable), so
raising it does not delay routine scale-ins.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 modules/manage/pages/kubernetes/k-decommission-brokers.adoc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/modules/manage/pages/kubernetes/k-decommission-brokers.adoc b/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
index f0042b7ff1..a2a1615606 100644
--- a/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
+++ b/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
@@ -712,6 +712,7 @@ For a Helm-only deployment, set the sidecar values directly under `statefulset.s
 === Guidance for adjusting the intervals
 
 * These settings control only how often the decommissioner *re-checks* for work and how long it waits before acting. They do not change how fast partition data is reallocated once a decommission begins. Reallocation throughput is governed by xref:reference:cluster-properties.adoc#raft_learner_recovery_rate[`raft_learner_recovery_rate`] and xref:reference:tunable-properties.adoc#partition_autobalancing_concurrent_moves[`partition_autobalancing_concurrent_moves`].
+* This interval is the *periodic* re-check cadence. A scale-in that you initiate by reducing `statefulset.replicas` is detected from a StatefulSet watch event and acted on promptly, so raising the interval does not delay a routine scale-in. The interval primarily determines how quickly the controller notices conditions that arise without a triggering event, such as a broker that becomes unreachable.
 * Increase the re-check interval to reduce reconcile frequency, and the associated log and Admin API traffic, on large or stable clusters. Decrease it for faster detection of brokers that need decommissioning.
 * For Helm (sidecar) deployments, keep `decommissionRequeueTimeout` smaller than `decommissionAfter` -- ideally well below it -- so the sidecar re-evaluates the cluster at least once within the debounce window. If the re-check interval is close to or larger than `decommissionAfter`, the decommissioner may wait up to one additional interval before acting. The Kubernetes controller-runtime work queue also adds a small amount of jitter.
 * A single operator reconcile can take up to about a minute because the Decommission controller verifies that cluster health is stable before it commits to a decommission. This is expected, and is independent of the `--decommission-wait-interval` value.

From 54eae0d4d1c31c675805df24b26da4f706c6ba5c Mon Sep 17 00:00:00 2001
From: David Yu <david.yu@redpanda.com>
Date: Tue, 23 Jun 2026 22:07:48 -0700
Subject: [PATCH 3/4] =?UTF-8?q?manage/k8s:=20address=20review=20=E2=80=94?=
 =?UTF-8?q?=20shell-correct=20helm=20--set,=20lowercase=20operator,=20re-c?=
 =?UTF-8?q?heck=20wording?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Align both additionalCmdFlags examples to the shell-correct form
  (--set "additionalCmdFlags={...}"): quote the whole argument so the shell
  does not brace-expand the comma-separated list, and drop the redundant
  inner quotes. Verified the rendered list with `helm template`:
  ["--additional-controllers=decommission","--decommission-wait-interval=30s"].
- Lowercase bare-noun "operator" (heading + table label) per docs convention.
- Intro: "polls ... to detect" -> "re-checks ... for" to match the event-driven note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .../pages/kubernetes/k-decommission-brokers.adoc     | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/modules/manage/pages/kubernetes/k-decommission-brokers.adoc b/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
index a2a1615606..f39514a8c5 100644
--- a/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
+++ b/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
@@ -476,11 +476,11 @@ helm upgrade --install redpanda-controller redpanda/operator \
   --namespace <namespace> \
   --set image.tag={latest-operator-version} \
   --create-namespace \
-  --set additionalCmdFlags={--additional-controllers="decommission"} \
+  --set "additionalCmdFlags={--additional-controllers=decommission}" \
   --set rbac.createAdditionalControllerCRs=true
 ----
 +
-- `--additional-controllers="decommission"`: Enables the Decommission controller.
+- `--additional-controllers=decommission`: Enables the Decommission controller.
 - `rbac.createAdditionalControllerCRs=true`: Creates the required RBAC rules for the Redpanda Operator to monitor the StatefulSet and update PVCs and PVs.
 +
 TIP: To change how often the Decommission controller re-checks the cluster for brokers that need decommissioning, pass the `--decommission-wait-interval` flag through `additionalCmdFlags`. See <<decommission-timing>>.
@@ -582,7 +582,7 @@ When scaling in (removing brokers), remove only one broker at a time. If you red
 Operator::
 +
 --
-The Decommission controller is already running in the Redpanda Operator (enabled in the earlier `additionalCmdFlags={--additional-controllers="decommission"}` step). To trigger a decommission, change only the StatefulSet replica count on the Redpanda resource. Do not add `sideCars.brokerDecommissioner` here, as that field is not part of the Redpanda CRD and is silently dropped when the resource is applied.
+The Decommission controller is already running in the Redpanda Operator (enabled in the earlier `additionalCmdFlags={--additional-controllers=decommission}` step). To trigger a decommission, change only the StatefulSet replica count on the Redpanda resource. Do not add `sideCars.brokerDecommissioner` here, as that field is not part of the Redpanda CRD and is silently dropped when the resource is applied.
 
 .`redpanda-cluster.yaml`
 [,yaml,lines=9]
@@ -665,13 +665,13 @@ You can repeat this procedure to continue to scale down.
 [[decommission-timing]]
 == Tune automatic decommission timing
 
-The <<Automated,automatic decommissioner>> polls the cluster on a regular interval to detect brokers that need to be decommissioned. The setting that controls this interval, and any debounce window before the decommissioner acts, depends on how the controller is deployed: as the Decommission controller inside the Redpanda Operator, or as the broker decommissioner sidecar in a Helm-only deployment.
+The <<Automated,automatic decommissioner>> re-checks the cluster on a regular interval for brokers that need to be decommissioned. The setting that controls this interval, and any debounce window before the decommissioner acts, depends on how the controller is deployed: as the Decommission controller inside the Redpanda Operator, or as the broker decommissioner sidecar in a Helm-only deployment.
 
 [cols="2,1,4"]
 |===
 | Setting | Default | Description
 
-| `--decommission-wait-interval` (Operator; set through `additionalCmdFlags`)
+| `--decommission-wait-interval` (operator; set through `additionalCmdFlags`)
 | `8s`
 | Requeue interval (`RequeueAfter`) for the operator's Decommission controller: how often the controller re-checks the cluster for brokers that need decommissioning when a reconcile did not already schedule a sooner re-check.
 
@@ -684,7 +684,7 @@ The <<Automated,automatic decommissioner>> polls the cluster on a regular interv
 | How long a broker must continuously meet the decommission conditions before the sidecar acts. This debounce window prevents acting on transient conditions, such as a broker that is briefly unreachable during a restart.
 |===
 
-=== Set the interval for the Operator
+=== Set the interval for the operator
 
 The operator's Decommission controller does not expose its interval as a dedicated Helm value. Instead, pass the `--decommission-wait-interval` flag through `additionalCmdFlags` when you install or upgrade the operator:
 

From 8052d9cecd0c1f1be9d43d047ae8c2c7b36208c7 Mon Sep 17 00:00:00 2001
From: David Yu <david.yu@redpanda.com>
Date: Tue, 23 Jun 2026 22:10:46 -0700
Subject: [PATCH 4/4] manage/k8s: use capital "Operator" (Kubernetes
 convention) for the bare noun
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per maintainer preference, capitalize bare-noun "Operator" page-wide (heading,
table label, prose) — reverts the earlier lowercasing. Chart path
`redpanda/operator` and the `{latest-operator-version}` attribute stay lowercase.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .../pages/kubernetes/k-decommission-brokers.adoc     | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/modules/manage/pages/kubernetes/k-decommission-brokers.adoc b/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
index f39514a8c5..1a29891f2d 100644
--- a/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
+++ b/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
@@ -598,7 +598,7 @@ spec:
       replicas: 6 <1>
 ----
 +
-<1> `statefulset.replicas`: Reduce by one. The Decommission controller in the operator detects the change and decommissions the broker on the highest-ordinal Pod.
+<1> `statefulset.replicas`: Reduce by one. The Decommission controller in the Operator detects the change and decommissions the broker on the highest-ordinal Pod.
 
 ```bash
 kubectl apply -f redpanda-cluster.yaml --namespace <namespace>
@@ -671,9 +671,9 @@ The <<Automated,automatic decommissioner>> re-checks the cluster on a regular in
 |===
 | Setting | Default | Description
 
-| `--decommission-wait-interval` (operator; set through `additionalCmdFlags`)
+| `--decommission-wait-interval` (Operator; set through `additionalCmdFlags`)
 | `8s`
-| Requeue interval (`RequeueAfter`) for the operator's Decommission controller: how often the controller re-checks the cluster for brokers that need decommissioning when a reconcile did not already schedule a sooner re-check.
+| Requeue interval (`RequeueAfter`) for the Operator's Decommission controller: how often the controller re-checks the cluster for brokers that need decommissioning when a reconcile did not already schedule a sooner re-check.
 
 | `decommissionRequeueTimeout` (Helm sidecar; under `statefulset.sideCars.brokerDecommissioner`)
 | `10s`
@@ -684,9 +684,9 @@ The <<Automated,automatic decommissioner>> re-checks the cluster on a regular in
 | How long a broker must continuously meet the decommission conditions before the sidecar acts. This debounce window prevents acting on transient conditions, such as a broker that is briefly unreachable during a restart.
 |===
 
-=== Set the interval for the operator
+=== Set the interval for the Operator
 
-The operator's Decommission controller does not expose its interval as a dedicated Helm value. Instead, pass the `--decommission-wait-interval` flag through `additionalCmdFlags` when you install or upgrade the operator:
+The Operator's Decommission controller does not expose its interval as a dedicated Helm value. Instead, pass the `--decommission-wait-interval` flag through `additionalCmdFlags` when you install or upgrade the Operator:
 
 [,bash,subs="attributes+"]
 ----
@@ -715,7 +715,7 @@ For a Helm-only deployment, set the sidecar values directly under `statefulset.s
 * This interval is the *periodic* re-check cadence. A scale-in that you initiate by reducing `statefulset.replicas` is detected from a StatefulSet watch event and acted on promptly, so raising the interval does not delay a routine scale-in. The interval primarily determines how quickly the controller notices conditions that arise without a triggering event, such as a broker that becomes unreachable.
 * Increase the re-check interval to reduce reconcile frequency, and the associated log and Admin API traffic, on large or stable clusters. Decrease it for faster detection of brokers that need decommissioning.
 * For Helm (sidecar) deployments, keep `decommissionRequeueTimeout` smaller than `decommissionAfter` -- ideally well below it -- so the sidecar re-evaluates the cluster at least once within the debounce window. If the re-check interval is close to or larger than `decommissionAfter`, the decommissioner may wait up to one additional interval before acting. The Kubernetes controller-runtime work queue also adds a small amount of jitter.
-* A single operator reconcile can take up to about a minute because the Decommission controller verifies that cluster health is stable before it commits to a decommission. This is expected, and is independent of the `--decommission-wait-interval` value.
+* A single Operator reconcile can take up to about a minute because the Decommission controller verifies that cluster health is stable before it commits to a decommission. This is expected, and is independent of the `--decommission-wait-interval` value.
 
 == Troubleshooting