Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions deploy/helm/workflow/templates/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,11 @@ spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
Comment on lines +11 to +15
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RollingUpdate strategy with maxSurge: 1 and maxUnavailable: 0 can cause deployment failures when combined with persistence.enabled=true and the default ReadWriteOnce access mode. During the rolling update, both old and new pods may need to mount the PVC simultaneously, but RWO only allows one pod to mount it at a time (and only on the same node).

Consider adding a note in the deployment that this strategy requires either:

  1. Persistence disabled (emptyDir)
  2. ReadWriteMany access mode for persistence
  3. Node affinity to ensure pods stay on the same node
  4. Or use Recreate strategy when persistence is enabled with RWO

Alternatively, the chart could dynamically adjust the strategy based on persistence.enabled and persistence.accessMode values.

Suggested change
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
strategy:
{{- if and .Values.persistence.enabled (eq (default "ReadWriteOnce" .Values.persistence.accessMode) "ReadWriteOnce") }}
type: Recreate
{{- else }}
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
{{- end }}

Copilot uses AI. Check for mistakes.
selector:
matchLabels:
{{- include "workflow.selectorLabels" . | nindent 6 }}
Expand Down Expand Up @@ -90,7 +95,12 @@ spec:
{{- end }}
volumes:
- name: data
{{- if .Values.persistence.enabled }}
persistentVolumeClaim:
claimName: {{ include "workflow.fullname" . }}-data
{{- else }}
emptyDir: {}
{{- end }}
Comment on lines +98 to +103
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When persistence is enabled with ReadWriteOnce access mode, the deployment should include podAffinity or topologySpreadConstraints to ensure all pods are scheduled on the same node. Without this, the default scheduler may place pods on different nodes, causing mount failures during rolling updates or scaling operations.

Consider adding a conditional block in the deployment that sets podAffinity when persistence.enabled=true and persistence.accessMode=ReadWriteOnce. For example:

{{- if and .Values.persistence.enabled (eq .Values.persistence.accessMode "ReadWriteOnce") }}
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          {{- include "workflow.selectorLabels" . | nindent 12 }}
      topologyKey: kubernetes.io/hostname
{{- end }}

This ensures all replicas are co-located on the same node, allowing them to share the RWO volume.

Copilot uses AI. Check for mistakes.
{{- if .Values.config.inline }}
- name: config
configMap:
Expand Down
20 changes: 20 additions & 0 deletions deploy/helm/workflow/templates/pvc.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
{{- if .Values.persistence.enabled }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ include "workflow.fullname" . }}-data
labels:
{{- include "workflow.labels" . | nindent 4 }}
annotations:
# Keep PVC on helm uninstall to prevent data loss
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The helm.sh/resource-policy: keep annotation means the PVC will persist after helm uninstall, which is the intended behavior for data retention. However, this has operational implications that should be documented:

  1. When reinstalling with a different release name, a new PVC will be created (old data not reused)
  2. When reinstalling with the same release name, the old PVC will be reused (data preserved)
  3. Manual PVC cleanup is required after helm uninstall if data is no longer needed
  4. This could lead to orphaned PVCs accumulating over time

Consider adding a note in the comment or values.yaml documentation about PVC lifecycle management and cleanup procedures.

Suggested change
# Keep PVC on helm uninstall to prevent data loss
# Keep PVC on helm uninstall to prevent data loss.
# Note:
# - This PVC will persist after `helm uninstall` and must be deleted manually if no longer needed.
# - Reinstalling the chart with the SAME release name will reuse this existing PVC (data preserved).
# - Reinstalling the chart with a DIFFERENT release name will create a new PVC (old data not reused).
# - Over time, this can lead to orphaned PVCs accumulating; ensure you have a cleanup process in place.

Copilot uses AI. Check for mistakes.
helm.sh/resource-policy: keep
spec:
accessModes:
- {{ .Values.persistence.accessMode }}
{{- if .Values.persistence.storageClass }}
storageClassName: {{ .Values.persistence.storageClass }}
Comment on lines +14 to +15
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The storageClassName condition uses a simple if check, but when storageClass is an empty string (""), it will not render the storageClassName field. However, in Kubernetes, explicitly setting storageClassName: "" has special meaning - it requests a PV with no storage class (disables dynamic provisioning).

The current logic treats empty string as "not set", which may not be the intended behavior. Consider using if ne .Values.persistence.storageClass "" to explicitly check for non-empty string, or document that empty string should not be used if the user wants to disable dynamic provisioning.

Suggested change
{{- if .Values.persistence.storageClass }}
storageClassName: {{ .Values.persistence.storageClass }}
{{- if hasKey .Values.persistence "storageClass" }}
storageClassName: {{ .Values.persistence.storageClass | quote }}

Copilot uses AI. Check for mistakes.
{{- end }}
resources:
requests:
storage: {{ .Values.persistence.size }}
{{- end }}
9 changes: 9 additions & 0 deletions deploy/helm/workflow/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,15 @@ env:
# Sensitive values from Kubernetes secrets
envFromSecret: ""

# Plugin and runtime data persistence
# With emptyDir (disabled), data is lost on pod restart — plugins must be re-fetched.
# Enable to retain downloaded plugins and runtime state across restarts.
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation should explicitly warn about the compatibility issues between persistence and autoscaling/multiple replicas. The comment mentions data loss on pod restart and plugin re-fetching, but doesn't mention the critical limitation that ReadWriteOnce access mode is incompatible with:

  • Autoscaling (autoscaling.enabled=true)
  • Multiple replicas (replicaCount > 1)
  • Rolling updates across different nodes

Suggest adding a warning like: "Note: Default ReadWriteOnce mode supports only single replica. For multiple replicas or autoscaling, use ReadWriteMany access mode (requires compatible storage class) or disable persistence."

Suggested change
# Enable to retain downloaded plugins and runtime state across restarts.
# Enable to retain downloaded plugins and runtime state across restarts.
# Note: Default ReadWriteOnce mode supports only a single replica. For autoscaling (autoscaling.enabled=true),
# multiple replicas (replicaCount > 1), or rolling updates across different nodes, use ReadWriteMany
# (requires a compatible StorageClass) or disable persistence.

Copilot uses AI. Check for mistakes.
persistence:
enabled: false
storageClass: ""
accessMode: ReadWriteOnce
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default accessMode ReadWriteOnce conflicts with the rolling update strategy and potential multi-replica scenarios. With RWO, the PVC can only be mounted by pods on a single node. During a rolling update with maxSurge: 1, if the new pod is scheduled on a different node than the old pod, it cannot mount the PVC, causing the deployment to fail.

Consider:

  1. Document this limitation prominently in the comment above (e.g., "ReadWriteOnce requires all pods on same node; scaling beyond 1 replica requires ReadWriteMany")
  2. Consider defaulting to ReadWriteMany if the storage backend supports it
  3. Add validation or warnings when persistence.enabled=true with replicaCount > 1 or autoscaling.enabled=true

This is particularly critical because autoscaling is supported in this chart (can scale to 10 replicas), but would be incompatible with RWO persistence.

Copilot uses AI. Check for mistakes.
size: 1Gi

# Prometheus monitoring
monitoring:
enabled: false
Expand Down
Loading