fix(helm): upgrade-safe deployments — PVC for data, zero-downtime RollingUpdate #157
```diff
@@ -8,6 +8,11 @@ spec:
   {{- if not .Values.autoscaling.enabled }}
   replicas: {{ .Values.replicaCount }}
   {{- end }}
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 1
+      maxUnavailable: 0
   selector:
     matchLabels:
       {{- include "workflow.selectorLabels" . | nindent 6 }}
```
```diff
@@ -90,7 +95,12 @@ spec:
       {{- end }}
       volumes:
         - name: data
-          emptyDir: {}
+          {{- if .Values.persistence.enabled }}
+          persistentVolumeClaim:
+            claimName: {{ include "workflow.fullname" . }}-data
+          {{- else }}
+          emptyDir: {}
+          {{- end }}
         {{- if .Values.config.inline }}
         - name: config
           configMap:
```
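For reference, a minimal sketch of what the data volume renders to in each mode (the release name `myrelease` and the resulting claim name are illustrative; the actual name comes from the chart's fullname helper):

```yaml
# persistence.enabled=true: data survives pod restarts via the PVC
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: myrelease-workflow-data
---
# persistence.enabled=false: scratch space, wiped on pod restart
volumes:
  - name: data
    emptyDir: {}
```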
New file:

```diff
@@ -0,0 +1,20 @@
+{{- if .Values.persistence.enabled }}
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: {{ include "workflow.fullname" . }}-data
+  labels:
+    {{- include "workflow.labels" . | nindent 4 }}
+  annotations:
+    # Keep PVC on helm uninstall to prevent data loss
```
Suggested change:

```diff
-    # Keep PVC on helm uninstall to prevent data loss
+    # Keep PVC on helm uninstall to prevent data loss.
+    # Note:
+    # - This PVC will persist after `helm uninstall` and must be deleted manually if no longer needed.
+    # - Reinstalling the chart with the SAME release name will reuse this existing PVC (data preserved).
+    # - Reinstalling the chart with a DIFFERENT release name will create a new PVC (old data not reused).
+    # - Over time, this can lead to orphaned PVCs accumulating; ensure you have a cleanup process in place.
```
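The truncated annotation above presumably uses Helm's built-in resource policy to survive uninstall; a minimal sketch, assuming that is the mechanism intended:

```yaml
annotations:
  # Tells Helm to skip this resource during `helm uninstall`,
  # leaving the PVC (and the data bound to it) in the cluster.
  "helm.sh/resource-policy": keep
```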
Copilot AI commented on Feb 25, 2026:
The `storageClassName` condition uses a simple truthiness check, so when `storageClass` is an empty string (`""`) the `storageClassName` field is not rendered. In Kubernetes, however, explicitly setting `storageClassName: ""` has a special meaning: it requests a PV with no storage class, disabling dynamic provisioning.
The current logic treats the empty string as "not set", which may not be the intended behavior. Consider using `hasKey` (or `ne .Values.persistence.storageClass ""`) to distinguish an explicitly empty value, or document that the empty string should not be used if the user wants to disable dynamic provisioning.
Suggested change:

```diff
-  {{- if .Values.persistence.storageClass }}
-  storageClassName: {{ .Values.persistence.storageClass }}
+  {{- if hasKey .Values.persistence "storageClass" }}
+  storageClassName: {{ .Values.persistence.storageClass | quote }}
```
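For illustration, the three `storageClass` cases the `hasKey` variant distinguishes (values and the class name `fast-ssd` are hypothetical):

```yaml
# Case 1: key absent — storageClassName not rendered;
# the cluster's default StorageClass applies.
persistence:
  enabled: true
---
# Case 2: explicit empty string — renders storageClassName: ""
# and disables dynamic provisioning (needs a pre-provisioned PV).
persistence:
  enabled: true
  storageClass: ""
---
# Case 3: named class — renders storageClassName: "fast-ssd".
persistence:
  enabled: true
  storageClass: fast-ssd
```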
```diff
@@ -93,6 +93,15 @@ env:
 # Sensitive values from Kubernetes secrets
 envFromSecret: ""
 
+# Plugin and runtime data persistence
+# With emptyDir (disabled), data is lost on pod restart — plugins must be re-fetched.
+# Enable to retain downloaded plugins and runtime state across restarts.
```
Suggested change:

```diff
 # Enable to retain downloaded plugins and runtime state across restarts.
+# Note: Default ReadWriteOnce mode supports only a single replica. For autoscaling (autoscaling.enabled=true),
+# multiple replicas (replicaCount > 1), or rolling updates across different nodes, use ReadWriteMany
+# (requires a compatible StorageClass) or disable persistence.
```
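A hedged example of a values override for multi-replica use, assuming the chart exposes `accessMode` as in the suggestion above and the cluster offers an RWX-capable StorageClass (the class name is hypothetical):

```yaml
persistence:
  enabled: true
  accessMode: ReadWriteMany  # allows pods on different nodes to mount the PVC
  storageClass: nfs-client   # hypothetical RWX-capable class
replicaCount: 3
```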
Copilot AI commented on Feb 25, 2026:
The default `accessMode` of `ReadWriteOnce` conflicts with the rolling update strategy and potential multi-replica scenarios. With RWO, the PVC can be mounted only by pods on a single node. During a rolling update with `maxSurge: 1`, if the new pod is scheduled on a different node than the old pod, it cannot mount the PVC and the deployment fails.
Consider:
- Documenting this limitation prominently in the comment above (e.g., "ReadWriteOnce requires all pods on the same node; scaling beyond 1 replica requires ReadWriteMany").
- Defaulting to ReadWriteMany if the storage backend supports it.
- Adding validation or warnings when `persistence.enabled=true` is combined with `replicaCount > 1` or `autoscaling.enabled=true`.
This is particularly critical because the chart supports autoscaling (up to 10 replicas), which would be incompatible with RWO persistence.
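The validation idea could be sketched with Helm's `fail` template function, placed at the top of the deployment or PVC template (the value paths are assumed from this chart):

```yaml
{{- /* Fail fast on a configuration that cannot work with RWO persistence. */}}
{{- if and .Values.persistence.enabled (eq .Values.persistence.accessMode "ReadWriteOnce") }}
{{- if or .Values.autoscaling.enabled (gt (int .Values.replicaCount) 1) }}
{{- fail "persistence with ReadWriteOnce supports only one replica; use ReadWriteMany or disable persistence/autoscaling" }}
{{- end }}
{{- end }}
```

With this guard, `helm install`/`helm upgrade` aborts with the message above instead of deploying pods that would hang in `ContainerCreating`.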
The RollingUpdate strategy with `maxSurge: 1` and `maxUnavailable: 0` can cause deployment failures when combined with `persistence.enabled=true` and the default `ReadWriteOnce` access mode. During the rolling update, both the old and new pods may need to mount the PVC simultaneously, but RWO allows only one pod (on one node) to mount it at a time.
Consider adding a note in the deployment that this strategy requires either a `ReadWriteMany`-capable access mode or persistence disabled.
Alternatively, the chart could dynamically adjust the strategy based on the `persistence.enabled` and `persistence.accessMode` values.
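One way that dynamic adjustment might look (a sketch, not part of this PR; assumes the `persistence.accessMode` value discussed above):

```yaml
strategy:
  {{- if and .Values.persistence.enabled (eq .Values.persistence.accessMode "ReadWriteOnce") }}
  # RWO volume: the old pod must release the volume before the new one starts,
  # trading a brief outage for a rollout that cannot deadlock on the mount.
  type: Recreate
  {{- else }}
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
  {{- end }}
```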