feat(engine): add Kubernetes Helm chart for production deployment #510
nihalnihalani wants to merge 15 commits into rocketride-org:develop from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough
A comprehensive Helm chart for the RocketRide data processing engine was introduced, including chart metadata, Kubernetes resource templates (deployment, service, configmap, secret, ingress, HPA), default configuration values, test fixtures, and example configurations for external service integration.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/helm/rocketride/Chart.yaml`:
- Around line 13-14: Update the maintainers entry to include an email address
for contact traceability: in the Chart.yaml maintainers block where the
maintainer name "RocketRide Team" is declared, add an email field (e.g., email:
team@rocketride.example.com or a project-specific address) alongside the name so
the maintainers list includes both name and email.
In `@deploy/helm/rocketride/templates/chroma-deployment.yaml`:
- Around line 21-36: The PVC defined as PersistentVolumeClaim for Chroma will
survive Helm uninstall when used with a Deployment (unlike StatefulSet
volumeClaimTemplates); either document this retention in the chart NOTES or
change the deployment strategy: replace the Deployment with a StatefulSet and
use volumeClaimTemplates (symbol: volumeClaimTemplates) to have PVCs managed per
replica, or add a Helm pre-delete hook that deletes the PVC resource (use
annotation "helm.sh/hook": pre-delete and target the PVC resource named {{
include "rocketride.fullname" . }}-chroma-data) so the PVC is removed on helm
uninstall; update templates and values accordingly (refer to
PersistentVolumeClaim, Deployment, StatefulSet, and the PVC name symbol to
locate code).
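One way to implement the pre-delete option mentioned above is a hook Job that removes the PVC on `helm uninstall`. This is a sketch, not part of the chart: the Job name, the `bitnami/kubectl` image, and the assumption that the Job's service account has RBAC permission to delete PVCs are all illustrative; only the PVC name reuses the chart's own `rocketride.fullname` helper.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "rocketride.fullname" . }}-chroma-pvc-cleanup
  annotations:
    # Run before resources are deleted, then clean up the Job itself
    "helm.sh/hook": pre-delete
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cleanup
          image: bitnami/kubectl:1.31   # assumed image/tag; pin as appropriate
          command:
            - kubectl
            - delete
            - pvc
            - {{ include "rocketride.fullname" . }}-chroma-data
            - --ignore-not-found
```

The StatefulSet-with-volumeClaimTemplates alternative avoids the extra RBAC, at the cost of restructuring the Chroma workload.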
In `@deploy/helm/rocketride/templates/hpa.yaml`:
- Around line 16-32: HPA metrics block can render an empty list when
.Values.engine.autoscaling.enabled is true but both
.Values.engine.autoscaling.targetCPUUtilizationPercentage and
.Values.engine.autoscaling.targetMemoryUtilizationPercentage are unset; update
the templates/metrics logic in hpa.yaml so that when autoscaling is enabled you
either (a) prevent rendering the HPA or fail the chart unless at least one of
targetCPUUtilizationPercentage or targetMemoryUtilizationPercentage is set, or
(b) supply a safe default metric (e.g., default to a CPU Utilization metric)
when neither value is provided; adjust the conditional around the metrics list
to check .Values.engine.autoscaling.enabled AND the presence of at least one
target value (or inject the default) so the rendered HPA never contains an empty
metrics array.
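A minimal sketch of option (a), the fail-fast guard, assuming the value paths named in the comment (`engine.autoscaling.*`); the exact surrounding HPA spec is omitted:

```yaml
{{- if .Values.engine.autoscaling.enabled }}
{{- $as := .Values.engine.autoscaling }}
{{- if not (or $as.targetCPUUtilizationPercentage $as.targetMemoryUtilizationPercentage) }}
{{- fail "engine.autoscaling.enabled requires targetCPUUtilizationPercentage or targetMemoryUtilizationPercentage" }}
{{- end }}
  metrics:
    {{- with $as.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ . }}
    {{- end }}
    {{- with $as.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ . }}
    {{- end }}
{{- end }}
```

With this shape the rendered HPA can never contain an empty `metrics` array: either at least one `with` block emits a metric, or rendering aborts.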
In `@deploy/helm/rocketride/templates/postgres-statefulset.yaml`:
- Around line 71-94: The exec probe is passing literal "$(POSTGRES_USER)" and
"$(POSTGRES_DB)" (which won't expand) to pg_isready; update both readinessProbe
and livenessProbe to run via a shell so environment variables are expanded
(e.g., change the exec command for readinessProbe and livenessProbe that call
pg_isready to use a shell wrapper like sh -c and invoke pg_isready with
"$POSTGRES_USER" and "$POSTGRES_DB"), or alternatively replace the variables
with hardcoded defaults if you prefer not to use a shell; ensure you modify the
commands referenced in readinessProbe and livenessProbe that call pg_isready so
the env vars are expanded at runtime.
In `@deploy/helm/rocketride/templates/tests/test-connection.yaml`:
- Around line 10-22: Add Pod hardening to the test pod by adding a pod-level
securityContext and per-container securityContext and resource limits/requests
for the curl-test container: set runAsNonRoot: true, runAsUser to a non-root
UID, disallow privilege escalation and drop all capabilities, and add CPU/memory
requests and limits for the curl-test container (referencing container name
"curl-test" and image "curlimages/curl:8.11.1"). Ensure the pod restartPolicy
and existing command remain unchanged and apply the securityContext at both pod
and container levels so the test will pass in hardened clusters enforcing
PodSecurityStandards.
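A sketch of the hardened test pod spec described above. The UID and resource numbers are illustrative assumptions (`curlimages/curl` ships a non-root user, but verify the UID for the pinned tag):

```yaml
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 100          # assumed non-root UID; confirm for curl:8.11.1
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: curl-test
      image: curlimages/curl:8.11.1
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      resources:
        requests:
          cpu: 10m
          memory: 16Mi
        limits:
          cpu: 50m
          memory: 64Mi
  restartPolicy: Never
```

This satisfies the "restricted" Pod Security Standard, so the Helm test still runs on clusters enforcing it.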
In `@deploy/helm/rocketride/values.yaml`:
- Around line 235-237: The values.yaml currently sets insecure default MinIO
credentials via the auth.rootUser and auth.rootPassword values; replace these
defaults for production by wiring them to a secure secret mechanism instead of
hardcoding—either remove the literal minioadmin values and document that
consumers must supply auth.rootUser/auth.rootPassword via Helm --set or a
sealed/Kubernetes Secret, or modify the chart to read credentials from a
Kubernetes Secret (createSecret / secretKeyRef) and reference those keys; ensure
the unique keys auth.rootUser and auth.rootPassword are no longer populated with
"minioadmin" in production deployments.
- Around line 186-190: Replace the hardcoded DB defaults in
auth.username/auth.password/auth.database with non-sensitive placeholders and
enforce custom credentials in templates: update values to indicate they are for
dev only, use Helm's required function in the chart template that reads
.Values.auth (or the branch that runs when existingSecret is empty) to fail
deployment if auth.username or auth.password are unset in non-dev environments,
and add a README note explaining that the rocketride/rocketride defaults are for
local development only and must be overridden in production.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 696e4eca-94dd-4e33-9a2d-9f765b3cc6bf
📒 Files selected for processing (16)
- deploy/helm/rocketride/.helmignore
- deploy/helm/rocketride/Chart.yaml
- deploy/helm/rocketride/templates/NOTES.txt
- deploy/helm/rocketride/templates/_helpers.tpl
- deploy/helm/rocketride/templates/chroma-deployment.yaml
- deploy/helm/rocketride/templates/configmap.yaml
- deploy/helm/rocketride/templates/deployment.yaml
- deploy/helm/rocketride/templates/hpa.yaml
- deploy/helm/rocketride/templates/ingress.yaml
- deploy/helm/rocketride/templates/postgres-statefulset.yaml
- deploy/helm/rocketride/templates/secret.yaml
- deploy/helm/rocketride/templates/service.yaml
- deploy/helm/rocketride/templates/serviceaccount.yaml
- deploy/helm/rocketride/templates/tests/test-connection.yaml
- deploy/helm/rocketride/tests/values_test.yaml
- deploy/helm/rocketride/values.yaml
Actionable comments posted: 3
♻️ Duplicate comments (1)
deploy/helm/rocketride/values.yaml (1)
186-190: ⚠️ Potential issue | 🟠 Major
`postgres.auth.password: changeme` remains an unsafe production default. Line 189 still provides a weak credential in a production deployment chart. Prefer empty placeholders + template `required` checks (or enforce `existingSecret`) so installs fail fast when credentials are not explicitly set.
🔐 Suggested direction:

```diff
 auth:
-  username: rocketride
-  password: changeme
-  database: rocketride
+  username: ""
+  password: ""
+  database: rocketride
```

And in deploy/helm/rocketride/templates/postgres-statefulset.yaml:

```diff
-  POSTGRES_USER: {{ .Values.postgres.auth.username | b64enc | quote }}
-  POSTGRES_PASSWORD: {{ .Values.postgres.auth.password | b64enc | quote }}
+  POSTGRES_USER: {{ required "postgres.auth.username is required when postgres.existingSecret is empty" .Values.postgres.auth.username | b64enc | quote }}
+  POSTGRES_PASSWORD: {{ required "postgres.auth.password is required when postgres.existingSecret is empty" .Values.postgres.auth.password | b64enc | quote }}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/helm/rocketride/values.yaml` around lines 186 - 190, Replace the insecure production default by setting postgres.auth.password to an empty placeholder (e.g., password: "") and update the Helm templates to enforce a required check: ensure templates (the logic that reads .Values.postgres.auth.password and .Values.postgres.existingSecret) call required() so installation fails if no existingSecret is provided and no non-empty password is supplied; also keep auth.username/auth.database defaults if desired but ensure the release will abort unless credentials are explicitly provided or existingSecret is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/helm/rocketride/templates/deployment.yaml`:
- Around line 19-20: The deployment lacks a checksum annotation for the Postgres
secret, so rotating postgres.auth.* updates the secret but won't change the pod
template hash; add an annotation (e.g. checksum/postgres-secret) alongside
checksum/config and checksum/secret that computes a sha256 of the Postgres
secret template so changes force rollout. Update the annotations block in
deployment.yaml to include checksum/postgres-secret (use the same
include/sha256sum pattern you used for config/secret) so changes to
POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB (referenced in the env
secretKeyRef) will trigger pod restarts.
In `@deploy/helm/rocketride/templates/NOTES.txt`:
- Around line 13-15: The current NODE_IP selection uses an unfiltered index
(.items[0].status.addresses[0].address) which can return a hostname or internal
address; update the NODE_IP export in the NOTES template to explicitly select
ExternalIP first and fall back to InternalIP if none exists by using a jsonpath
filter for addresses with type=='ExternalIP' and, on failure, a second jsonpath
for type=='InternalIP' (keep the rest of the NodePort logic and the service name
reference include "rocketride.fullname" unchanged).
In `@deploy/helm/rocketride/templates/postgres-statefulset.yaml`:
- Around line 51-55: The pod template metadata for the Postgres StatefulSet
lacks a checksum annotation, so updates to the secret mounted via
envFrom.secretRef won't trigger a rollout; add an annotation in the pod template
metadata (next to the existing labels produced by include
"rocketride.selectorLabels") that computes a checksum (e.g., sha256) of the
secret referenced by envFrom.secretRef (or the secret name from your values) and
set it as something like checksum/secret: <computed-value> so the StatefulSet's
pod template changes whenever the secret changes, forcing pods to restart;
update the postgres-statefulset.yaml pod template metadata block to include this
checksum annotation.
---
Duplicate comments:
In `@deploy/helm/rocketride/values.yaml`:
- Around line 186-190: Replace the insecure production default by setting
postgres.auth.password to an empty placeholder (e.g., password: "") and update
the Helm templates to enforce a required check: ensure templates (the logic that
reads .Values.postgres.auth.password and .Values.postgres.existingSecret) call
required() so installation fails if no existingSecret is provided and no
non-empty password is supplied; also keep auth.username/auth.database defaults
if desired but ensure the release will abort unless credentials are explicitly
provided or existingSecret is used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cdb65c94-c99e-43b6-b431-c4f3a4471632
📒 Files selected for processing (4)
- deploy/helm/rocketride/templates/NOTES.txt
- deploy/helm/rocketride/templates/deployment.yaml
- deploy/helm/rocketride/templates/postgres-statefulset.yaml
- deploy/helm/rocketride/values.yaml
hey @kwit75, your review is needed
kwit75
left a comment
Review: Helm Chart (PR #510)
Good effort for a hackathon submission — the chart is well-structured with proper security contexts, conditional resources, and pinned image versions. A few issues to address before merge:
Must Fix
- Conflicts with existing SaaS K8s manifests. We already have production K8s manifests in rocketride-ai/rocketride-saas/k8s/ (engine deployment, services, HTTPRoutes, Karpenter NodePools, ArgoCD apps, tenant namespaces). This Helm chart duplicates that work with different assumptions (e.g., HPA vs KEDA, Ingress vs Cilium Gateway API). We need to decide: Helm chart for the open-source community, raw manifests for our SaaS? If so, this should be documented clearly.
- PostgreSQL and ChromaDB should NOT be in this chart. Bundling stateful dependencies (Postgres StatefulSet, ChromaDB Deployment) inside the application chart is an anti-pattern. Production users should use managed services (RDS, Cloud SQL) or deploy these via their own Helm releases. Suggest: remove postgres-statefulset.yaml and chroma-deployment.yaml, add them to a separate examples/ directory or document as external dependencies.
- HPA uses CPU/memory metrics — wrong for GPU inference. hpa.yaml scales on cpu and memory utilization. For GPU workloads, you need to scale on queue depth or GPU utilization metrics. CPU/memory-based HPA will not react to inference load. Consider: document this limitation, or add a KEDA ScaledObject as an alternative.
- No GPU support. The deployment template has no nvidia.com/gpu resource requests, no GPU tolerations, no nodeSelector for GPU nodes. The engine runs ML models on GPU — this chart can't deploy GPU workloads as-is.
Should Fix
- changeme passwords in values.yaml — even as placeholders, these tend to end up in production. Use empty strings with a validation check in _helpers.tpl that fails helm install if passwords aren't set and existingSecret isn't provided.
- /ping on port 5565 for probes — verify this matches the actual engine health endpoint. The SaaS deployment uses /health on port 8080.
- appVersion: "3.1.1" in Chart.yaml — is this the current engine version? Should this be dynamically set or at least documented?
Nice to Have
- Consider adding a PodDisruptionBudget template (mentioned in "How This Could Be Extended" but important for production).
Overall: good foundation for community self-hosting. Needs GPU support and dependency cleanup before it's production-ready.
- Add fail guard in HPA when autoscaling enabled without metrics
- Add checksum/postgres-secret annotation to both engine deployment and postgres StatefulSet for automatic rollout on credential changes
- Add WARNING comments for default Postgres credentials in values.yaml
- Fix NOTES.txt NodePort address lookup to filter ExternalIP first
- Add securityContext and resource limits to test pod
- Document Chroma PVC lifecycle (intentional data persistence)
- Add GPU resource/nodeSelector/tolerations documentation in values.yaml
- Document HPA limitation for GPU inference workloads
- Clarify chart is for community/self-hosted deployments
- Add Helm templates to .prettierignore (Go template syntax)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kwit75 All feedback addressed in 244c34f. Summary: (1) Chart now explicitly labeled community/self-hosted. (2) Postgres/Chroma kept with WARNING comments for credentials and PVC lifecycle docs; Milvus removed. (3) HPA limitation for GPU documented with KEDA recommendation. (4) GPU nodeSelector/tolerations/resource examples added. (5) Passwords changed to changeme with WARNING. Also fixed: HPA fail guard, postgres probe shell expansion, checksum annotations, NOTES.txt address lookup, test pod hardening, prettierignore for Helm templates.
Actionable comments posted: 3
♻️ Duplicate comments (4)
deploy/helm/rocketride/templates/postgres-statefulset.yaml (2)
76-78: ⚠️ Potential issue | 🔴 Critical
Probe command uses command substitution instead of env expansion. `$(POSTGRES_USER)` / `$(POSTGRES_DB)` run command substitution in `sh -c`; they should be shell variable expansion. This will break readiness/liveness checks.
🐛 Proposed fix:

```diff
-          - pg_isready -U $(POSTGRES_USER) -d $(POSTGRES_DB)
+          - pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"
 ...
-          - pg_isready -U $(POSTGRES_USER) -d $(POSTGRES_DB)
+          - pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"
```

```bash
#!/bin/bash
# Verify incorrect command substitution usage is present.
rg -n '\$\((POSTGRES_USER|POSTGRES_DB)\)' deploy/helm/rocketride/templates/postgres-statefulset.yaml -C1
```

Also applies to: 86-88
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/helm/rocketride/templates/postgres-statefulset.yaml` around lines 76 - 78, The readiness/liveness probe uses command substitution syntax `$(POSTGRES_USER)` and `$(POSTGRES_DB)` inside the `sh -c` invocation (see the `- sh`, `- -c`, `- pg_isready -U $(POSTGRES_USER) -d $(POSTGRES_DB)` block), which will try to run commands instead of expanding environment variables; update the probe command to use shell variable expansion (e.g., "$POSTGRES_USER" and "$POSTGRES_DB" or $POSTGRES_USER/$POSTGRES_DB) so pg_isready receives the env values, and apply the same fix to the other probe occurrence (the similar block around the later lines).
52-53: ⚠️ Potential issue | 🔴 Critical
Replace self-referential checksum include with secret configuration values. The checksum annotation on line 53 includes the current postgres-statefulset.yaml template, which can cause rendering issues. Instead, compute the checksum from the actual secret configuration values to properly trigger pod restarts only when secrets change.
Proposed fix:

```diff
-        checksum/postgres-secret: {{ include (print $.Template.BasePath "/postgres-statefulset.yaml") . | sha256sum }}
+        checksum/postgres-secret: {{ printf "%s:%s:%s:%s" (include "rocketride.postgres.secretName" .) .Values.postgres.auth.username .Values.postgres.auth.password .Values.postgres.auth.database | sha256sum }}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/helm/rocketride/templates/postgres-statefulset.yaml` around lines 52 - 53, The checksum annotation currently uses the rendered postgres-statefulset.yaml (checksum/postgres-secret) which is self-referential; replace it so the checksum is computed from the actual secret contents (the postgres secret template or values) instead. Update the annotation generation in postgres-statefulset.yaml to use the rendered secret template (e.g., include the postgres-secret template like include (print $.Template.BasePath "/postgres-secret.yaml") . | sha256sum) or build the checksum from the concrete secret values (.Values.postgres.*) so pod restart is triggered only when secret data changes; keep the annotation key checksum/postgres-secret unchanged.
deploy/helm/rocketride/templates/NOTES.txt (1)
14-15: ⚠️ Potential issue | 🟡 Minor
Ensure NODE_IP resolves to a single value. These JSONPath filters can return multiple addresses; `echo http://$NODE_IP:$NODE_PORT` then becomes invalid on multi-node clusters.
🔧 Proposed fix:

```diff
- export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}')
- [ -z "$NODE_IP" ] && export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
+ export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address[0]}')
+ [ -z "$NODE_IP" ] && export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address[0]}')
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/helm/rocketride/templates/NOTES.txt` around lines 14 - 15, The NODE_IP assignment can return multiple addresses; change both jsonpath queries to explicitly pick the first match so NODE_IP is a single value, e.g. use the jsonpath index suffix .address[0] in the two commands that assign NODE_IP (referencing the NODE_IP variable assignment lines), so replace .status.addresses[?(@.type=="ExternalIP")].address and .status.addresses[?(@.type=="InternalIP")].address with .status.addresses[?(@.type=="ExternalIP")].address[0] and .status.addresses[?(@.type=="InternalIP")].address[0] respectively to ensure a single IP is returned.
deploy/helm/rocketride/values.yaml (1)
201-206: ⚠️ Potential issue | 🟠 Major
Enforce non-default Postgres credentials at template level (warning-only is insufficient). `postgres.auth.password: changeme` can still be deployed as-is. Please fail chart rendering when `postgres.existingSecret` is empty and credentials are default/blank in production-oriented installs.
Verify each finding against the current code and only fix it if needed. In `@deploy/helm/rocketride/values.yaml` around lines 201 - 206, Add a Helm template-level validation that aborts chart rendering when no postgres.existingSecret is provided and the supplied credentials are default/blank: check .Values.postgres.existingSecret and the values .Values.postgres.auth.password (and optionally .Values.postgres.auth.username/.database) and call Helm's fail with a clear message if password is empty or equals "changeme"; implement this small conditional in a new template (e.g., templates/validate-postgres.yaml or templates/_validate.tpl) so any production deploy without a secret or non-default credentials will fail rendering.
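One shape the suggested validation template could take; this is a sketch assuming the chart's existing value paths (`postgres.enabled`, `postgres.existingSecret`, `postgres.auth.password`) and a hypothetical helper name:

```yaml
{{- /* templates/_validate.tpl (hypothetical file) */}}
{{- define "rocketride.validatePostgres" -}}
{{- if and .Values.postgres.enabled (not .Values.postgres.existingSecret) }}
{{- if or (empty .Values.postgres.auth.password) (eq .Values.postgres.auth.password "changeme") }}
{{- fail "Set postgres.auth.password to a non-default value or provide postgres.existingSecret" }}
{{- end }}
{{- end }}
{{- end }}
```

Because `define` blocks are not rendered on their own, the check only fires if some rendered template calls `{{ include "rocketride.validatePostgres" . }}`, e.g. at the top of the Postgres StatefulSet template.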
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/helm/rocketride/templates/NOTES.txt`:
- Around line 32-34: The NOTES.txt currently shows a static Engine replica count
via {{ .Values.engine.replicaCount }} which is misleading when HPA is enabled;
update the template to detect the HPA/autoscaling flag (e.g.,
.Values.engine.autoscaling.enabled or .Values.hpa.enabled) and render a clearer
message—either show the HPA min/max (e.g., "Engine: autoscaled, min X / max Y")
or indicate "managed by HPA" alongside the static replicaCount fallback when
autoscaling is not enabled—so users see accurate info about
.Values.engine.replicaCount vs autoscaling state.
In `@deploy/helm/rocketride/values.yaml`:
- Around line 151-154: Update the container securityContext to enforce a
read-only filesystem by setting readOnlyRootFilesystem: true in the
securityContext block and, if any writable directories are required, add
explicit writable emptyDir/volume mounts and mark those paths writable in the
Pod spec; modify the existing securityContext and add corresponding
volumeMounts/volumes for any necessary writable paths so only those are writable
instead of the entire root filesystem.
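A sketch of the read-only-root pattern for the engine container; the `/tmp` scratch mount is an assumed writable path, and the real set of writable directories depends on what the engine writes at runtime:

```yaml
# Container-level securityContext
securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
# Pod spec: an explicit writable scratch volume
volumes:
  - name: tmp
    emptyDir: {}
# Container spec: mount it so only /tmp is writable
volumeMounts:
  - name: tmp
    mountPath: /tmp
```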
- Around line 183-185: The values.yaml currently defaults PostgreSQL to enabled
(postgres.enabled: true); change this default to false so the chart does not
provision stateful infra by default, update any related comments or
documentation in the chart (e.g., values.yaml descriptions and README) to
indicate explicit opt-in is required, and ensure any helm templates or
conditional logic referencing .Values.postgres.enabled (such as
deployment/statefulset/template checks) continue to work with the new false
default.
---
Duplicate comments:
In `@deploy/helm/rocketride/templates/NOTES.txt`:
- Around line 14-15: The NODE_IP assignment can return multiple addresses;
change both jsonpath queries to explicitly pick the first match so NODE_IP is a
single value, e.g. use the jsonpath index suffix .address[0] in the two commands
that assign NODE_IP (referencing the NODE_IP variable assignment lines), so
replace .status.addresses[?(@.type=="ExternalIP")].address and
.status.addresses[?(@.type=="InternalIP")].address with
.status.addresses[?(@.type=="ExternalIP")].address[0] and
.status.addresses[?(@.type=="InternalIP")].address[0] respectively to ensure a
single IP is returned.
In `@deploy/helm/rocketride/templates/postgres-statefulset.yaml`:
- Around line 76-78: The readiness/liveness probe uses command substitution
syntax `$(POSTGRES_USER)` and `$(POSTGRES_DB)` inside the `sh -c` invocation
(see the `- sh`, `- -c`, `- pg_isready -U $(POSTGRES_USER) -d $(POSTGRES_DB)`
block), which will try to run commands instead of expanding environment
variables; update the probe command to use shell variable expansion (e.g.,
"$POSTGRES_USER" and "$POSTGRES_DB" or $POSTGRES_USER/$POSTGRES_DB) so
pg_isready receives the env values, and apply the same fix to the other probe
occurrence (the similar block around the later lines).
- Around line 52-53: The checksum annotation currently uses the rendered
postgres-statefulset.yaml (checksum/postgres-secret) which is self-referential;
replace it so the checksum is computed from the actual secret contents (the
postgres secret template or values) instead. Update the annotation generation in
postgres-statefulset.yaml to use the rendered secret template (e.g., include the
postgres-secret template like include (print $.Template.BasePath
"/postgres-secret.yaml") . | sha256sum) or build the checksum from the concrete
secret values (.Values.postgres.*) so pod restart is triggered only when secret
data changes; keep the annotation key checksum/postgres-secret unchanged.
In `@deploy/helm/rocketride/values.yaml`:
- Around line 201-206: Add a Helm template-level validation that aborts chart
rendering when no postgres.existingSecret is provided and the supplied
credentials are default/blank: check .Values.postgres.existingSecret and the
values .Values.postgres.auth.password (and optionally
.Values.postgres.auth.username/.database) and call Helm's fail with a clear
message if password is empty or equals "changeme"; implement this small
conditional in a new template (e.g., templates/validate-postgres.yaml or
templates/_validate.tpl) so any production deploy without a secret or
non-default credentials will fail rendering.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0448f5f5-584c-468d-8c65-27fe2750357c
📒 Files selected for processing (9)
- .prettierignore
- deploy/helm/rocketride/Chart.yaml
- deploy/helm/rocketride/templates/NOTES.txt
- deploy/helm/rocketride/templates/chroma-deployment.yaml
- deploy/helm/rocketride/templates/deployment.yaml
- deploy/helm/rocketride/templates/hpa.yaml
- deploy/helm/rocketride/templates/postgres-statefulset.yaml
- deploy/helm/rocketride/templates/tests/test-connection.yaml
- deploy/helm/rocketride/values.yaml
Actionable comments posted: 1
♻️ Duplicate comments (1)
deploy/helm/rocketride/values.yaml (1)
183-185: ⚠️ Potential issue | 🟠 Major
Set PostgreSQL to opt-in by default for production safety. Line 184 defaults `postgres.enabled` to `true`, which can unintentionally provision stateful infra in production installs. Make this explicit opt-in.
🔧 Proposed fix:

```diff
 postgres:
   # -- Enable PostgreSQL deployment
-  enabled: true
+  enabled: false
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/helm/rocketride/values.yaml` around lines 183 - 185, The values.yaml currently sets postgres.enabled: true which causes accidental stateful provisioning; change the default to postgres.enabled: false so PostgreSQL is opt-in for production, update any accompanying comment near the postgres.enabled entry to note it must be enabled explicitly for deployments that require managed Postgres, and ensure any schema or README references that assume a default DB reflect this opt-in behavior (look for the postgres.enabled key in values.yaml and related README/Chart notes).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/helm/rocketride/templates/deployment.yaml`:
- Around line 21-23: The checksum currently computes sha256 over the entire
postgres-statefulset.yaml via include (print $.Template.BasePath
"/postgres-statefulset.yaml") ., causing unrelated spec changes to trigger
restarts; change checksum/postgres-secret to hash only the secret-driving inputs
(e.g. .Values.postgres.secretName, .Values.postgres.password or
.Values.postgres.existingSecret and any secret key values) instead of the whole
manifest so only secret changes affect checksum/rollouts — update the template
that produces checksum/postgres-secret to concatenate those specific
.Values.postgres fields (and any relevant secret keys) and pipe that string to
sha256sum rather than including the statefulset file.
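A sketch of a checksum built from only the secret-driving inputs, assuming the value paths used elsewhere in this chart (`postgres.existingSecret`, `postgres.auth.*`):

```yaml
annotations:
  checksum/postgres-secret: {{ printf "%s|%s|%s|%s"
      (.Values.postgres.existingSecret | default "")
      .Values.postgres.auth.username
      .Values.postgres.auth.password
      .Values.postgres.auth.database
    | sha256sum }}
```

With this, editing unrelated parts of the StatefulSet manifest leaves the hash unchanged, so only credential changes roll the engine pods.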
---
Duplicate comments:
In `@deploy/helm/rocketride/values.yaml`:
- Around line 183-185: The values.yaml currently sets postgres.enabled: true
which causes accidental stateful provisioning; change the default to
postgres.enabled: false so PostgreSQL is opt-in for production, update any
accompanying comment near the postgres.enabled entry to note it must be enabled
explicitly for deployments that require managed Postgres, and ensure any schema
or README references that assume a default DB reflect this opt-in behavior (look
for the postgres.enabled key in values.yaml and related README/Chart notes).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ecdb0624-1949-417e-ac7f-a22d8686074b
📒 Files selected for processing (3)
- deploy/helm/rocketride/templates/NOTES.txt
- deploy/helm/rocketride/templates/deployment.yaml
- deploy/helm/rocketride/values.yaml
|
All CodeRabbit review comments have been addressed in prior commits. Tests and lint pass. Ready for @kwit75's review when available.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/helm/rocketride/templates/deployment.yaml`:
- Around line 18-20: The checksum/secret annotation currently hashes the secret
template even when engine.existingSecret is used, so external secret rotations
don't trigger pod rollouts; update the deployment annotations that set
checksum/secret (and any annotations near secretRef usage) to include the new
values.engine.existingSecretChecksum and conditionally use it when
engine.existingSecret is set (e.g., combine or choose
.Values.engine.existingSecretChecksum when .Values.engine.existingSecret is
non-empty), and add the new values keys engine.existingSecret and
engine.existingSecretChecksum to values.yaml so operators can bump
existingSecretChecksum to force a rollout for Deployment (refer to the
checksum/secret annotation, the secret.yaml template being skipped, and the
values keys engine.existingSecret and engine.existingSecretChecksum).
In `@deploy/helm/rocketride/tests/values_test.yaml`:
- Around line 44-45: The test values for OPENAI_API_KEY and ANTHROPIC_API_KEY
look like real keys and trigger entropy/secret detection; update the literals
(the OPENAI_API_KEY and ANTHROPIC_API_KEY entries) to use clearly fake,
non-secret-like placeholders (e.g., "test-openai-key", "mock-anthropic-key" or
similar obvious test tokens) so CI/entropy checks don't flag them.
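Assuming the fixture keeps its keys under `engine.secrets` (the shape implied by the `rocketride.validateSecrets` commit), clearly fake values could look like:

```yaml
# deploy/helm/rocketride/tests/values_test.yaml (sketch)
engine:
  secrets:
    # Obviously fake, low-entropy placeholders so secret scanners don't flag them.
    OPENAI_API_KEY: "test-openai-key-not-real"
    ANTHROPIC_API_KEY: "test-anthropic-key-not-real"
```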
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6830e431-9696-423b-9e91-bb1c03b890d8
📒 Files selected for processing (8)
- deploy/helm/examples/README.md
- deploy/helm/examples/external-chroma.yaml
- deploy/helm/examples/external-postgres.yaml
- deploy/helm/rocketride/templates/NOTES.txt
- deploy/helm/rocketride/templates/_helpers.tpl
- deploy/helm/rocketride/templates/deployment.yaml
- deploy/helm/rocketride/tests/values_test.yaml
- deploy/helm/rocketride/values.yaml
kwit75
left a comment
@nihalnihalani thanks for the update. Before I approve, I need you to clarify a discrepancy between your comment and the actual diff.
Your 2026-04-03 comment says: "Postgres/Chroma kept with WARNING comments for credentials and PVC lifecycle docs; Milvus removed."
What's actually at HEAD: postgres-statefulset.yaml and chroma-deployment.yaml are gone from deploy/helm/rocketride/templates/. The current templates dir is just: NOTES.txt, _helpers.tpl, configmap.yaml, deployment.yaml, hpa.yaml, ingress.yaml, secret.yaml, service.yaml, serviceaccount.yaml, tests/. The Postgres/Chroma sections are also gone from values.yaml — replaced with a header pointing users to deploy/helm/examples/external-postgres.yaml and external-chroma.yaml.
Removing them is the right call (it matches my prior must-fix and follows the "don't bundle stateful deps in app charts" best practice), so I'm not asking you to put them back. But I need to know:
- Was this removal intentional? Or was it an accidental deletion in a later commit that you haven't noticed? Please confirm.
- Are you aware that this makes most of CodeRabbit's findings moot? Specifically:
  - 🔴 Critical: `pg_isready` exec probe shell expansion (`$(POSTGRES_USER)` won't expand without `sh -c`) — moot, file gone
  - 🟠 Major: `postgres.enabled: true` default — moot, section gone
  - 🟠 Major: missing `checksum/postgres-secret` annotation in deployment.yaml — moot
  - 🟠 Major: postgres StatefulSet missing pod template checksum — moot
  - 🟠 Major (originally fixed by you): HPA empty metrics — actually fixed via fail guard, confirmed ✅
- Update your status comment. Saying "all addressed" while inaccurately describing what you did makes it hard for reviewers to trust the rest of the change. Please post a corrected summary so the PR record is accurate.
One real concern that the postgres removal doesn't fix — the 🔴 Critical pg_isready bug was a serious one. If a similar pattern exists anywhere else in the chart (any exec.command: block referencing $(VAR) syntax), it has the same problem. I checked the current templates and didn't find any, but please confirm you've audited for this pattern, not just deleted the file that triggered the finding.
Once you confirm 1, 2, and 3, plus the two nits from my earlier review (verify /ping:5565 matches the engine container's actual health endpoint, confirm Chart.yaml appVersion: "3.1.1" is current), I'll approve.
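The audit kwit75 asks for here can be done with a grep over the templates directory. A sketch, using a throwaway sample manifest (the file below is illustrative, not from the chart) to show what a hit on the bad pattern looks like:

```shell
# Kubernetes does not shell-expand $(VAR) inside an exec probe's command array
# unless the command is wrapped in `sh -c` — the class of bug behind the
# critical pg_isready finding. This audit flags any exec block using $(...).
mkdir -p /tmp/helm-audit/templates
# Illustrative bad probe, the shape the pg_isready bug had:
cat > /tmp/helm-audit/templates/bad-probe.yaml <<'EOF'
readinessProbe:
  exec:
    command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
EOF
# Print any $( occurrence within two lines of an exec: key; "clean" if none.
grep -rn -A2 'exec:' /tmp/helm-audit/templates/ | grep '\$(' || echo "clean"
```

Run against `deploy/helm/rocketride/templates/` instead of the sample directory, "clean" is the expected result per kwit75's spot check.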
Automated image optimization by ImgBot. 85 SVG files optimized with total size reduction from 462.69kb to 449.05kb (2.95%).
Three chunking strategies (recursive, sentence, token) with configurable size/overlap. Hybrid search node with BM25 + vector RRF fusion. All CodeRabbit review feedback addressed including critical writeAnswers contract fix, score propagation, RRF validation, and typing modernization.
🚀 Merge Request

Acknowledging kwit75's architectural feedback. Key items to discuss:
Would like to discuss the right scope before implementing further changes. @kwit75

Current fixes applied:
@kwit75 — Addressed the technical fixes (probe expansion, Milvus creds, image pinning). Would love to discuss the architectural scoping — happy to position this as an open-source community Helm chart, separate from the SaaS manifests. Ready to chat whenever convenient. 🙏
kwit75
left a comment
Architectural decision — endorsing the direction
@nihalnihalani thanks for the patience on this. I went back and verified the current head (dde8369) against the four open architectural questions, plus the three discrepancy questions from my earlier review. We're aligned on all of them and I'm ready to approve.
Decisions
1. Scope: community/self-hosted Helm chart, separate from SaaS K8s manifests. ✅ Approved. `ARCHITECTURE.md` documents this clearly, the `Chart.yaml` description says so explicitly, and the README/examples directory makes the boundary obvious. SaaS K8s manifests stay in their own location and are not in scope for this chart.
2. External PostgreSQL / ChromaDB instead of bundled stateful deps. ✅ Approved — and confirmed already implemented. `templates/` is engine-only (NOTES.txt, _helpers.tpl, configmap.yaml, deployment.yaml, hpa.yaml, ingress.yaml, secret.yaml, service.yaml, serviceaccount.yaml, tests/). Stateful deps are documented via `examples/external-postgres.yaml` and `examples/external-chroma.yaml`. This is the right call: app charts shouldn't ship StatefulSets for dependencies operators are going to bring from their own platform (RDS, Aurora, Cloud SQL, managed Chroma, pgvector-on-Crunchy, etc.). It also sidesteps the entire pg_isready / `$(POSTGRES_USER)` shell-expansion class of bugs by simply not owning the probe.
3. GPU support: ship as overlay examples, not first-class in templates. ✅ Approved. `examples/gpu-values.yaml` and `examples/keda-gpu-scaling.yaml` are the right shape for this. Reasoning: GPU node provisioning, runtimeClass, device plugin, taints, and KEDA scaler choice vary too much across clouds (EKS + nvidia-device-plugin, GKE Autopilot, Azure AKS + NPD, on-prem k3s + nvidia-container-runtime) to bake one default into `values.yaml`. Overlays let GPU users opt in cleanly without bloating the CPU baseline. The comment block in `values.yaml` under `engine.resources` pointing at `nvidia.com/gpu`, plus the example file, is the right level of guidance.
4. HPA GPU-aware metrics. ✅ Approved as KEDA-via-example. Same reasoning — KEDA isn't installed by default in most clusters, so it shouldn't be a hard dependency of the chart. `examples/keda-gpu-scaling.yaml` documents the pattern for users who need it. CPU/memory HPA stays as the default.
My earlier 3 discrepancy questions — now resolved by the current state
Re-checked at dde8369:
- "Was the postgres/chroma removal intentional?" — Yes, confirmed by the current `templates/` listing.
- "Are you aware most of CodeRabbit's findings are now moot?" — Yes, the deleted files mean those findings are by-construction resolved. The pg_isready shell-expansion bug is gone with the file.
- "Update your status comment." — Less important now that the diff and the architectural framing line up. For future PRs, please keep status comments aligned with what actually shipped — it makes reviewer turnaround much faster.
Sanity checks I ran
- `appVersion: '3.1.2'` in `Chart.yaml` matches `package.json` `"version": "3.1.2"` at the repo root. ✅
- All three probes (readiness, liveness, startup) target `/ping:5565`, which matches the engine's actual ping handler in `packages/ai/src/ai/web/endpoints/ping.py`. ✅
- One audit ask I'm trusting you on: please confirm there are no other `exec.command` blocks anywhere in the chart referencing `$(VAR)` syntax (this was the class of bug behind the critical pg_isready finding — it doesn't expand outside `sh -c`). I spot-checked and didn't find any in the current templates, but a quick `grep -rn 'exec' deploy/helm/rocketride/templates/` from your end would close this out for the record.
Approving
This is in good shape. Approving. A few low-priority follow-ups for a future PR (none blocking):

- A `README.md` for `deploy/helm/rocketride/` — a one-page operator-facing readme alongside the chart (not just `ARCHITECTURE.md`) covering install/upgrade/uninstall, the `existingSecret` pattern, and a pointer to `examples/`. `helm` users typically look there first.
- `helm lint` + `helm template` in CI — a small GH Action job that runs `helm lint deploy/helm/rocketride` and `helm template deploy/helm/rocketride --values deploy/helm/rocketride/tests/values_test.yaml | kubeconform -strict -summary` on PRs touching `deploy/helm/**`. Would catch syntax regressions and schema drift before review.
- Default `engine.replicaCount` — currently `2`. For community/self-hosted users running on a single-node lab cluster this might be surprising; consider `1` as default and document HA tuning in the operator readme.
None of these block this PR. Nice work iterating on the scope. 🙏
kwit75
left a comment
Architectural decision posted in next comment due to length.
kwit75
left a comment
Approved — see follow-up comment for the architectural decision and rationale.
Post-review fixes — commits
- Add Helm chart with engine deployment, service, HPA, ingress, configmap, secret
- Add optional PostgreSQL+pgvector StatefulSet with PVC persistence
- Add optional ChromaDB deployment
- Configurable resource limits, replicas, probes, TLS, service accounts
- Helm test for /ping connectivity verification
- Comprehensive values.yaml with production-ready defaults

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix postgres probes to use Helm template values instead of shell expansion
- Remove Milvus section (use official subchart instead), clean up NOTES.txt
- Change default postgres password to 'changeme'
- Add secret checksum annotation for automatic pod restart on secret changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wrap pg_isready probe commands in sh -c for proper env var expansion
- Guard HPA metrics list to avoid empty array when both targets unset
- Add email field to Chart.yaml maintainers entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add fail guard in HPA when autoscaling enabled without metrics
- Add checksum/postgres-secret annotation to both engine deployment and postgres StatefulSet for automatic rollout on credential changes
- Add WARNING comments for default Postgres credentials in values.yaml
- Fix NOTES.txt NodePort address lookup to filter ExternalIP first
- Add securityContext and resource limits to test pod
- Document Chroma PVC lifecycle (intentional data persistence)
- Add GPU resource/nodeSelector/tolerations documentation in values.yaml
- Document HPA limitation for GPU inference workloads
- Clarify chart is for community/self-hosted deployments
- Add Helm templates to .prettierignore (Go template syntax)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
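The fail guard described in this commit is a common Helm validation pattern. A sketch of how hpa.yaml might implement it — value and helper names here follow the chart's apparent conventions but are assumptions, not its exact code:

```yaml
# templates/hpa.yaml (sketch)
{{- if .Values.autoscaling.enabled }}
{{- if and (not .Values.autoscaling.targetCPUUtilizationPercentage) (not .Values.autoscaling.targetMemoryUtilizationPercentage) }}
{{- fail "autoscaling.enabled is true but no CPU or memory target is set; the HPA would render an empty metrics list" }}
{{- end }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "rocketride.fullname" . }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "rocketride.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
{{- end }}
```

Failing at template time surfaces the misconfiguration during `helm install`/`helm upgrade` rather than shipping an HPA the API server rejects.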
…curity

- NOTES.txt now shows HPA min/max replicas when autoscaling is enabled instead of the static replicaCount which is misleading
- Set readOnlyRootFilesystem: true in default securityContext
- Add /tmp emptyDir volume mount to deployment template so the engine has a writable scratch directory with read-only root filesystem

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rvice examples

This chart should not bundle stateful services that are better managed externally. Users should provision PostgreSQL and ChromaDB via managed cloud services or dedicated Helm charts.

- Delete postgres-statefulset.yaml and chroma-deployment.yaml templates
- Remove postgres/chroma sections from values.yaml
- Remove postgres env vars and checksum annotation from deployment.yaml
- Remove postgres secret helper from _helpers.tpl
- Add deploy/helm/examples/ with external-postgres.yaml, external-chroma.yaml
- Add examples/README.md documenting how to use managed services

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create deploy/helm/ARCHITECTURE.md with community vs SaaS comparison
- Add WARNING comment in hpa.yaml about CPU/memory HPA being unsuitable for GPU inference workloads
- Create deploy/helm/examples/keda-gpu-scaling.yaml with KEDA ScaledObject example for GPU-aware autoscaling
- Add ARCHITECTURE.md reference in Chart.yaml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
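A KEDA ScaledObject of the shape this commit describes might look like the following. The deployment name, Prometheus address, and DCGM metric query are placeholders, cluster-specific values the real example file would need to parameterize:

```yaml
# deploy/helm/examples/keda-gpu-scaling.yaml (sketch)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rocketride-engine-gpu
spec:
  scaleTargetRef:
    name: rocketride-engine        # placeholder: the chart's Deployment name
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # placeholder
        # Scale on average GPU utilization as exported by the NVIDIA DCGM exporter.
        query: avg(DCGM_FI_DEV_GPU_UTIL{pod=~"rocketride-engine.*"})
        threshold: "80"
```

This is why KEDA-via-example is the right shape: the scaler target (DCGM, queue depth, request latency) varies per deployment and shouldn't be baked into the chart.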
- Add engine.gpu config section (enabled, count, nodeSelector, tolerations)
- Conditionally add nvidia.com/gpu resource limit in deployment template
- Merge GPU nodeSelector and tolerations with base engine values
- Create deploy/helm/examples/gpu-values.yaml with working GPU example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
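An overlay of the shape this commit describes might look like this; the key names follow the commit message, while the node label and toleration are cluster-specific placeholders:

```yaml
# deploy/helm/examples/gpu-values.yaml (sketch)
engine:
  gpu:
    enabled: true
    count: 1                            # becomes the nvidia.com/gpu resource limit
    nodeSelector:
      nvidia.com/gpu.present: "true"    # placeholder label; varies per cluster
    tolerations:
      # Tolerate the standard taint GPU node pools usually carry.
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
```

Applied with `helm install rocketride deploy/helm/rocketride -f deploy/helm/examples/gpu-values.yaml`, so the CPU-only baseline in values.yaml stays untouched.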
- Add rocketride.validateSecrets template that fails if neither engine.existingSecret nor engine.secrets is configured
- Invoke validation at the top of the deployment template
- Update Chart.yaml appVersion from 3.1.1 to 3.1.2 to match package.json
- Verify health endpoint: /ping on port 5565 is correct (confirmed via Dockerfile.engine EXPOSE and AI web server route registration)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
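A sketch of what such a validation helper might look like, with the values shape assumed from the commit description:

```yaml
# templates/_helpers.tpl (sketch)
{{- define "rocketride.validateSecrets" -}}
{{- if and (not .Values.engine.existingSecret) (not .Values.engine.secrets) -}}
{{- fail "Configure engine.existingSecret (recommended for production) or engine.secrets: the engine needs at least one credential source" -}}
{{- end -}}
{{- end -}}
```

Invoked at the top of deployment.yaml as `{{- include "rocketride.validateSecrets" . }}`, so a misconfigured release fails at render time with a readable message instead of deploying pods that crash on missing credentials.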
- Lower engine.replicaCount default 2 → 1 for single-node/lab users; update autoscaling.minReplicas to match; add HA tuning note in comment
- Add operator-facing README.md covering install/upgrade/uninstall, existingSecret pattern, examples pointer, and HA tuning guide
- Add helm-lint + kubeconform CI jobs to ci.yml, path-filtered to deploy/helm/** via dorny/paths-filter (matches existing ci.yml style)

Exec audit: confirmed no exec.command blocks with $(VAR) syntax exist anywhere in deploy/helm/rocketride/templates/ — clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
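The CI jobs this commit describes could look roughly like the following; action versions, job names, and the assumption that `kubeconform` is installed on the runner are illustrative, not the repo's exact ci.yml:

```yaml
# .github/workflows/ci.yml (sketch of the added jobs)
  helm-changes:
    runs-on: ubuntu-latest
    outputs:
      helm: ${{ steps.filter.outputs.helm }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            helm:
              - 'deploy/helm/**'

  helm-lint:
    needs: helm-changes
    if: needs.helm-changes.outputs.helm == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-helm@v4
      - run: helm lint deploy/helm/rocketride
      # kubeconform validates the rendered manifests against K8s schemas;
      # assumes the binary is installed in an earlier step.
      - run: |
          helm template deploy/helm/rocketride \
            --values deploy/helm/rocketride/tests/values_test.yaml \
            | kubeconform -strict -summary
```

Path-filtering keeps the jobs off unrelated PRs, which is why the follow-up commit had to add them to the `ci-ok` aggregator explicitly.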
helm-lint and helm-changes were added to ci.yml but omitted from the ci-ok aggregator job, meaning helm failures would not block PR merges. Added both to needs[] and updated the echo statement to include their results in the CI OK diagnostic output. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
d62eed8 to 04c2cca
@kwit75 — rebased cleanly onto latest
@nihalnihalani
#Hack-with-bay-2
Contribution Type
Feature — Add production-ready Kubernetes Helm chart
Problem / Use Case
RocketRide Server has Docker support (`docker/Dockerfile.engine`) but no Kubernetes deployment manifests. Teams wanting to deploy RocketRide in production Kubernetes clusters must write their own manifests from scratch — configuring deployments, services, health probes, autoscaling, secrets, and optional dependencies (PostgreSQL, ChromaDB) manually.

Proposed Solution

A complete Helm chart at `deploy/helm/rocketride/` with 16 files:

- health probes against `/ping:5565`, security contexts (non-root, drop ALL capabilities)
- `existingSecret` pattern for production
- Helm test against `/ping`

Value Added

`helm install rocketride deploy/helm/rocketride/`

Why This Feature Fits This Codebase
The Helm chart directly mirrors the Docker Compose setup at `docker/docker-compose.yml` (PR #496) but for Kubernetes production environments. The health probes use the same `/ping` endpoint registered in `packages/ai/src/ai/web/endpoints/ping.py`.

The PostgreSQL StatefulSet configuration matches the pgvector connection patterns in `nodes/src/nodes/vectordb_postgres/services.json`. The ChromaDB deployment matches `nodes/src/nodes/chroma/services.json` profiles (local host/port configuration).

The `existingSecret` pattern follows Kubernetes best practices — production teams bring their own secrets (from Vault, AWS Secrets Manager, etc.) rather than storing credentials in Helm values. The startup probe's generous `failureThreshold: 30` accounts for the engine's Python module loading phase (loading 56+ nodes via pybind11).

Validation

- Pinned image tags (no `:latest`)
- `{{- if }}` guards on optional resources
- `changeme` placeholder passwords only, no hardcoded credentials
- Test fixture (`tests/values_test.yaml`) enables all optional components

How This Could Be Extended
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Documentation