fix(helm): upgrade-safe deployments — PVC for data, zero-downtime RollingUpdate (#157)
Conversation
…date strategy

- Add pvc.yaml for /app/data with `helm.sh/resource-policy: keep` annotation (persistence.enabled=false by default; emptyDir is preserved for stateless installs)
- Update deployment to use the PVC when persistence.enabled=true, else emptyDir
- Add explicit strategy: RollingUpdate with maxSurge: 1, maxUnavailable: 0 for zero-downtime upgrades
- Add persistence config block to values.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR enhances the Helm chart with persistent storage support and explicit rolling update configuration for safer deployments. It adds opt-in PVC support for the /app/data directory (used for downloaded plugins and runtime state) and configures zero-downtime rolling updates.
Changes:
- Add `persistence.*` configuration options in values.yaml (disabled by default)
- Add pvc.yaml template with `helm.sh/resource-policy: keep` annotation
- Configure RollingUpdate strategy with `maxSurge: 1`, `maxUnavailable: 0`
- Conditionally mount PVC or emptyDir based on the persistence.enabled flag
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| deploy/helm/workflow/values.yaml | Adds persistence configuration section with enabled, storageClass, accessMode, and size options |
| deploy/helm/workflow/templates/pvc.yaml | New template creating PVC with keep policy when persistence is enabled |
| deploy/helm/workflow/templates/deployment.yaml | Adds explicit RollingUpdate strategy and conditional volume mounting (PVC vs emptyDir) |
Comments suppressed due to low confidence (1)
deploy/helm/workflow/values.yaml:97
- The comment states "With emptyDir (disabled), data is lost on pod restart" but this is slightly misleading. With emptyDir, data is lost when the pod is deleted or rescheduled to a different node, but it survives container restarts within the same pod. A container restart (e.g., due to a crash) does not lose emptyDir data.
Consider clarifying: "With emptyDir (disabled), data is lost when the pod is deleted or rescheduled — plugins must be re-fetched."
```yaml
# With emptyDir (disabled), data is lost on pod restart — plugins must be re-fetched.
```
```yaml
{{- if .Values.persistence.storageClass }}
storageClassName: {{ .Values.persistence.storageClass }}
```
The storageClassName condition uses a simple truthiness check, so when storageClass is an empty string (""), the storageClassName field will not be rendered. In Kubernetes, however, explicitly setting storageClassName: "" has a special meaning: it requests a PV with no storage class (i.e., it disables dynamic provisioning).
The current logic treats the empty string as "not set", which may not be the intended behavior. Consider using hasKey to distinguish "field omitted" from "field set to empty string", or document that an empty string should not be used when the user wants to disable dynamic provisioning.
Suggested change:

```diff
-{{- if .Values.persistence.storageClass }}
-storageClassName: {{ .Values.persistence.storageClass }}
+{{- if hasKey .Values.persistence "storageClass" }}
+storageClassName: {{ .Values.persistence.storageClass | quote }}
```
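Another option, seen in several widely used public charts, is a `-` sentinel value instead of `hasKey`. The following is a hypothetical sketch of that convention, not code from this PR: `-` requests an explicit empty class (disabling dynamic provisioning), any other non-empty value names a class, and an empty string omits the field entirely.

```yaml
# Hypothetical sketch of the "-" sentinel convention for storageClass:
#   storageClass: "-"        -> storageClassName: "" (no dynamic provisioning)
#   storageClass: "fast-ssd" -> storageClassName: "fast-ssd"
#   storageClass: ""         -> field omitted (use the cluster default class)
{{- if .Values.persistence.storageClass }}
  {{- if eq "-" .Values.persistence.storageClass }}
storageClassName: ""
  {{- else }}
storageClassName: {{ .Values.persistence.storageClass | quote }}
  {{- end }}
{{- end }}
```

This keeps all three Kubernetes behaviors reachable from values.yaml without relying on whether the key is present.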
```yaml
# Plugin and runtime data persistence
# With emptyDir (disabled), data is lost on pod restart — plugins must be re-fetched.
# Enable to retain downloaded plugins and runtime state across restarts.
```
The documentation should explicitly warn about the compatibility issues between persistence and autoscaling/multiple replicas. The comment mentions data loss on pod restart and plugin re-fetching, but doesn't mention the critical limitation that ReadWriteOnce access mode is incompatible with:
- Autoscaling (autoscaling.enabled=true)
- Multiple replicas (replicaCount > 1)
- Rolling updates across different nodes
Suggest adding a warning like: "Note: Default ReadWriteOnce mode supports only single replica. For multiple replicas or autoscaling, use ReadWriteMany access mode (requires compatible storage class) or disable persistence."
Suggested change:

```diff
 # Enable to retain downloaded plugins and runtime state across restarts.
+# Note: Default ReadWriteOnce mode supports only a single replica. For autoscaling (autoscaling.enabled=true),
+# multiple replicas (replicaCount > 1), or rolling updates across different nodes, use ReadWriteMany
+# (requires a compatible StorageClass) or disable persistence.
```
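As a sketch of what a multi-replica configuration might look like under this warning (the StorageClass name is a placeholder for whatever RWX-capable class exists in the target cluster; the size and replica count are illustrative, not from this PR):

```yaml
# Hypothetical values override for a multi-replica install.
# "nfs-client" is a placeholder and must match an RWX-capable
# StorageClass actually available in the cluster.
persistence:
  enabled: true
  storageClass: "nfs-client"
  accessMode: ReadWriteMany
  size: 5Gi
replicaCount: 3
```

With ReadWriteMany, all replicas (and both old and new pods during a rolling update) can mount the same volume regardless of which node they land on.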
```yaml
{{- if .Values.persistence.enabled }}
persistentVolumeClaim:
  claimName: {{ include "workflow.fullname" . }}-data
{{- else }}
emptyDir: {}
{{- end }}
```
When persistence is enabled with ReadWriteOnce access mode, the deployment should include podAffinity or topologySpreadConstraints to ensure all pods are scheduled on the same node. Without this, the default scheduler may place pods on different nodes, causing mount failures during rolling updates or scaling operations.
Consider adding a conditional block in the deployment that sets podAffinity when persistence.enabled=true and persistence.accessMode=ReadWriteOnce. For example:
```yaml
{{- if and .Values.persistence.enabled (eq .Values.persistence.accessMode "ReadWriteOnce") }}
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            {{- include "workflow.selectorLabels" . | nindent 12 }}
        topologyKey: kubernetes.io/hostname
{{- end }}
```

This ensures all replicas are co-located on the same node, allowing them to share the RWO volume.
```yaml
labels:
  {{- include "workflow.labels" . | nindent 4 }}
annotations:
  # Keep PVC on helm uninstall to prevent data loss
```
The helm.sh/resource-policy: keep annotation means the PVC will persist after helm uninstall, which is the intended behavior for data retention. However, this has operational implications that should be documented:
- When reinstalling with a different release name, a new PVC will be created (old data not reused)
- When reinstalling with the same release name, the old PVC will be reused (data preserved)
- Manual PVC cleanup is required after helm uninstall if data is no longer needed
- This could lead to orphaned PVCs accumulating over time
Consider adding a note in the comment or values.yaml documentation about PVC lifecycle management and cleanup procedures.
Suggested change:

```diff
-# Keep PVC on helm uninstall to prevent data loss
+# Keep PVC on helm uninstall to prevent data loss.
+# Note:
+# - This PVC will persist after `helm uninstall` and must be deleted manually if no longer needed.
+# - Reinstalling the chart with the SAME release name will reuse this existing PVC (data preserved).
+# - Reinstalling the chart with a DIFFERENT release name will create a new PVC (old data not reused).
+# - Over time, this can lead to orphaned PVCs accumulating; ensure you have a cleanup process in place.
```
```yaml
persistence:
  enabled: false
  storageClass: ""
  accessMode: ReadWriteOnce
```
The default accessMode ReadWriteOnce conflicts with the rolling update strategy and potential multi-replica scenarios. With RWO, the PVC can only be mounted by pods on a single node. During a rolling update with maxSurge: 1, if the new pod is scheduled on a different node than the old pod, it cannot mount the PVC, causing the deployment to fail.
Consider:
- Document this limitation prominently in the comment above (e.g., "ReadWriteOnce requires all pods on same node; scaling beyond 1 replica requires ReadWriteMany")
- Consider defaulting to ReadWriteMany if the storage backend supports it
- Add validation or warnings when persistence.enabled=true with replicaCount > 1 or autoscaling.enabled=true
This is particularly critical because autoscaling is supported in this chart (can scale to 10 replicas), but would be incompatible with RWO persistence.
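The validation suggested above could be implemented with Helm's built-in `fail` function. A minimal sketch, assuming the chart's existing `autoscaling.enabled` and `replicaCount` values (the exact placement and message wording are illustrative):

```yaml
# Hypothetical guard near the top of deployment.yaml: abort template
# rendering when RWO persistence is combined with settings that imply
# more than one pod, instead of deploying a configuration that will
# fail at mount time.
{{- if and .Values.persistence.enabled (eq .Values.persistence.accessMode "ReadWriteOnce") }}
  {{- if or .Values.autoscaling.enabled (gt (int .Values.replicaCount) 1) }}
    {{- fail "persistence.accessMode=ReadWriteOnce requires replicaCount=1 and autoscaling.enabled=false; use ReadWriteMany or disable persistence" }}
  {{- end }}
{{- end }}
```

Failing at `helm install`/`helm upgrade` time surfaces the conflict immediately, rather than as a scheduling or volume-attach error after rollout has begun.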
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
```
The RollingUpdate strategy with maxSurge: 1 and maxUnavailable: 0 can cause deployment failures when combined with persistence.enabled=true and the default ReadWriteOnce access mode. During the rolling update, both old and new pods may need to mount the PVC simultaneously, but RWO only allows one pod to mount it at a time (and only on the same node).
Consider adding a note in the deployment that this strategy requires either:
- Persistence disabled (emptyDir)
- ReadWriteMany access mode for persistence
- Node affinity to ensure pods stay on the same node
- Or use Recreate strategy when persistence is enabled with RWO
Alternatively, the chart could dynamically adjust the strategy based on persistence.enabled and persistence.accessMode values.
Suggested change:

```diff
-strategy:
-  type: RollingUpdate
-  rollingUpdate:
-    maxSurge: 1
-    maxUnavailable: 0
+strategy:
+  {{- if and .Values.persistence.enabled (eq (default "ReadWriteOnce" .Values.persistence.accessMode) "ReadWriteOnce") }}
+  type: Recreate
+  {{- else }}
+  type: RollingUpdate
+  rollingUpdate:
+    maxSurge: 1
+    maxUnavailable: 0
+  {{- end }}
```
Summary
- `pvc.yaml` for `/app/data` with `helm.sh/resource-policy: keep` — the PVC survives `helm uninstall`, preventing loss of downloaded plugins and runtime state. Opt-in via `persistence.enabled=true` (default: `false` / emptyDir for stateless installs).
- `RollingUpdate` strategy with `maxSurge: 1, maxUnavailable: 0` — ensures old pods keep serving traffic until the new pod is fully healthy (passes the readiness probe at `/healthz`).

Test plan

- `helm upgrade --install workflow ./deploy/helm/workflow` — verify no data loss
- With `persistence.enabled=true`: confirm the PVC is created and survives `helm uninstall`

🤖 Generated with Claude Code