feat(helm): observability support (ArcadeDB 26.7.1, #4463)#12
Conversation
Adds the brainstorming design for exposing ArcadeDB 26.7.1 observability (health probes, OTLP metrics, structured logging, tracing) in the Helm chart. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive, opt-in observability support for the ArcadeDB Helm chart (bumping the version to 26.7.1), including structured JSON logging, OTLP metrics, distributed tracing, and a new Prometheus ServiceMonitor template. It also updates the default liveness probe to use the dependency-free '/api/v1/health' endpoint. The review feedback highlights critical opportunities to prevent nil pointer dereference errors during template rendering by safely defaulting nested maps under '.Values.observability' in the helper templates and the ServiceMonitor manifest.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| {{- define "arcadedb.observability.args" -}} | ||
| {{- $o := .Values.observability -}} | ||
| {{- if eq $o.logging.format "json" }} | ||
| - -Darcadedb.server.logFormat=json | ||
| {{- end }} | ||
| {{- if $o.logging.includeTrace }} | ||
| - -Darcadedb.server.logIncludeTrace=true | ||
| {{- end }} | ||
| {{- if $o.metrics.otlp.enabled }} | ||
| - -Darcadedb.serverMetrics.otlp.enabled=true | ||
| - -Darcadedb.serverMetrics.otlp.endpoint={{ $o.metrics.otlp.endpoint }} | ||
| {{- end }} | ||
| {{- if $o.tracing.enabled }} | ||
| - -Darcadedb.serverMetrics.tracing.enabled=true | ||
| - -Darcadedb.serverMetrics.tracing.endpoint={{ $o.tracing.endpoint }} | ||
| - -Darcadedb.serverMetrics.tracing.samplingRate={{ $o.tracing.samplingRate }} | ||
| {{- end }} | ||
| {{- if $o.health.readinessRequiresHA }} | ||
| - -Darcadedb.server.readinessRequiresHA=true | ||
| {{- end }} | ||
| {{- end -}} |
There was a problem hiding this comment.
The helper arcadedb.observability.args directly accesses nested keys under .Values.observability (such as .Values.observability.logging.format and .Values.observability.metrics.otlp.enabled) without checking if the parent keys exist. If a user provides a custom values file that partially overrides or completely omits the observability section, Helm will throw a nil pointer evaluation error (e.g., nil pointer evaluating interface {}). Using default dict to safely navigate these nested maps prevents rendering crashes.
{{- define "arcadedb.observability.args" -}}
{{- $o := .Values.observability | default dict -}}
{{- $logging := $o.logging | default dict -}}
{{- $metrics := $o.metrics | default dict -}}
{{- $otlp := $metrics.otlp | default dict -}}
{{- $tracing := $o.tracing | default dict -}}
{{- $health := $o.health | default dict -}}
{{- if eq $logging.format "json" }}
- -Darcadedb.server.logFormat=json
{{- end }}
{{- if $logging.includeTrace }}
- -Darcadedb.server.logIncludeTrace=true
{{- end }}
{{- if $otlp.enabled }}
- -Darcadedb.serverMetrics.otlp.enabled=true
- -Darcadedb.serverMetrics.otlp.endpoint={{ $otlp.endpoint }}
{{- end }}
{{- if $tracing.enabled }}
- -Darcadedb.serverMetrics.tracing.enabled=true
- -Darcadedb.serverMetrics.tracing.endpoint={{ $tracing.endpoint }}
- -Darcadedb.serverMetrics.tracing.samplingRate={{ $tracing.samplingRate }}
{{- end }}
{{- if $health.readinessRequiresHA }}
- -Darcadedb.server.readinessRequiresHA=true
{{- end }}
{{- end -}}
There was a problem hiding this comment.
Verified this empirically before applying. The stated trigger — partially overriding or omitting the observability section — does not panic: Helm coalesces user values over the chart's values.yaml defaults, so omitted/partially-set keys retain their defaults (helm template -f with only observability.metrics.otlp.enabled: true renders fine, exit 0; the unit suite also sets partial observability keys and passes).
The real failure mode is explicitly nulling a map, e.g. --set observability=null or --set observability.logging=null, which does dereference nil. That's a genuine crash, so I've applied nil-safe navigation (| default dict) to this helper and the other three sites, plus regression tests covering the explicit-null case.
Fixed in 9e17bbb.
| {{- define "arcadedb.observability.validate" -}} | ||
| {{- $p := .Values.observability.metrics.prometheus -}} | ||
| {{- if or $p.serviceMonitor.enabled $p.podAnnotations.enabled -}} | ||
| {{- $promEnabled := false -}} | ||
| {{- with .Values.arcadedb.plugins.prometheus -}} | ||
| {{- if .enabled -}}{{- $promEnabled = true -}}{{- end -}} | ||
| {{- end -}} | ||
| {{- if not $promEnabled -}} | ||
| {{- fail "observability.metrics.prometheus serviceMonitor/podAnnotations require arcadedb.plugins.prometheus.enabled=true (the /prometheus endpoint must be served)" -}} | ||
| {{- end -}} | ||
| {{- end -}} | ||
| {{- end -}} |
There was a problem hiding this comment.
The validation helper arcadedb.observability.validate accesses .Values.observability.metrics.prometheus directly. If observability or metrics is omitted or overridden partially, this will cause a nil pointer dereference error during rendering. Safely defaulting the nested maps ensures robust template rendering.
{{- define "arcadedb.observability.validate" -}}
{{- $o := .Values.observability | default dict -}}
{{- $metrics := $o.metrics | default dict -}}
{{- $p := $metrics.prometheus | default dict -}}
{{- $serviceMonitor := $p.serviceMonitor | default dict -}}
{{- $podAnnotations := $p.podAnnotations | default dict -}}
{{- if or $serviceMonitor.enabled $podAnnotations.enabled -}}
{{- $promEnabled := false -}}
{{- with .Values.arcadedb.plugins.prometheus -}}
{{- if .enabled -}}{{- $promEnabled = true -}}{{- end -}}
{{- end -}}
{{- if not $promEnabled -}}
{{- fail "observability.metrics.prometheus serviceMonitor/podAnnotations require arcadedb.plugins.prometheus.enabled=true (the /prometheus endpoint must be served)" -}}
{{- end -}}
{{- end -}}
{{- end -}}
There was a problem hiding this comment.
Addressed in 9e17bbb — arcadedb.observability.validate now safely defaults the nested maps. (See the thread above: normal partial overrides were already safe via Helm coalescing; the real crash was explicit nulling, now guarded with a regression test.)
| {{- define "arcadedb.podAnnotations" -}} | ||
| {{- $annotations := deepCopy (default dict .Values.podAnnotations) -}} | ||
| {{- $pa := .Values.observability.metrics.prometheus.podAnnotations -}} | ||
| {{- if $pa.enabled -}} | ||
| {{- $port := int .Values.service.http.port -}} | ||
| {{- if $pa.port -}}{{- $port = int $pa.port -}}{{- end -}} | ||
| {{- $_ := set $annotations "prometheus.io/scrape" "true" -}} | ||
| {{- $_ := set $annotations "prometheus.io/port" (printf "%d" $port) -}} | ||
| {{- $_ := set $annotations "prometheus.io/path" (default "/prometheus" $pa.path) -}} | ||
| {{- end -}} | ||
| {{- if $annotations -}} | ||
| {{- toYaml $annotations -}} | ||
| {{- end -}} | ||
| {{- end -}} |
There was a problem hiding this comment.
The helper arcadedb.podAnnotations accesses .Values.observability.metrics.prometheus.podAnnotations directly. If any of the parent keys are omitted in custom values, this will cause a nil pointer dereference error. Defaulting the nested maps safely avoids this issue.
{{- define "arcadedb.podAnnotations" -}}
{{- $annotations := deepCopy (default dict .Values.podAnnotations) -}}
{{- $o := .Values.observability | default dict -}}
{{- $metrics := $o.metrics | default dict -}}
{{- $prometheus := $metrics.prometheus | default dict -}}
{{- $pa := $prometheus.podAnnotations | default dict -}}
{{- if $pa.enabled -}}
{{- $port := int .Values.service.http.port -}}
{{- if $pa.port -}}{{- $port = int $pa.port -}}
{{- end -}}
{{- $_ := set $annotations "prometheus.io/scrape" "true" -}}
{{- $_ := set $annotations "prometheus.io/port" (printf "%d" $port) -}}
{{- $_ := set $annotations "prometheus.io/path" (default "/prometheus" $pa.path) -}}
{{- end -}}
{{- if $annotations -}}
{{- toYaml $annotations -}}
{{- end -}}
{{- end -}}
There was a problem hiding this comment.
Addressed in 9e17bbb — arcadedb.podAnnotations now safely navigates the nested maps with | default dict.
| {{- if .Values.observability.metrics.prometheus.serviceMonitor.enabled }} | ||
| {{- include "arcadedb.observability.validate" . -}} | ||
| {{- $sm := .Values.observability.metrics.prometheus.serviceMonitor -}} | ||
| {{- if and $sm.basicAuth.enabled (not $sm.basicAuth.secretName) -}} | ||
| {{- fail "serviceMonitor.basicAuth.enabled requires basicAuth.secretName (a secret with username + password keys)" -}} | ||
| {{- end -}} |
There was a problem hiding this comment.
In servicemonitor.yaml, accessing .Values.observability.metrics.prometheus.serviceMonitor.enabled and $sm.basicAuth.enabled directly can cause nil pointer dereference errors if observability or basicAuth is omitted or partially overridden in custom values. Safely defaulting these nested maps prevents rendering failures.
{{- $o := .Values.observability | default dict -}}
{{- $metrics := $o.metrics | default dict -}}
{{- $prometheus := $metrics.prometheus | default dict -}}
{{- $sm := $prometheus.serviceMonitor | default dict -}}
{{- $basicAuth := $sm.basicAuth | default dict -}}
{{- if $sm.enabled }}
{{- include "arcadedb.observability.validate" . -}}
{{- if and $basicAuth.enabled (not $basicAuth.secretName) -}}
{{- fail "serviceMonitor.basicAuth.enabled requires basicAuth.secretName (a secret with username + password keys)" -}}
{{- end -}}There was a problem hiding this comment.
Addressed in 9e17bbb — servicemonitor.yaml now resolves $sm and $basicAuth via | default dict before use, so an explicitly nulled observability/serviceMonitor/basicAuth renders cleanly instead of panicking. Regression test added.
Guards the observability helpers and ServiceMonitor template against an explicitly nulled observability map/sub-map (e.g. --set observability=null), which panicked with a nil pointer dereference. Normal partial overrides and omissions were already safe via Helm's value coalescing; this covers the explicit-null edge. Adds regression tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chart appVersion is bumped to 26.7.1 ahead of the image publish, so the default image.tag=appVersion is not pullable yet and the integration job's helm install --wait times out. Point the PR integration job at the rolling `latest` tag (== 26.7.1 content) until arcadedata/arcadedb:26.7.1 is published. REVERT this `with:` override at release so the job tests the pinned appVersion again (see f3db108). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document that appVersion can be bumped ahead of the image publish, that the PR integration job is then temporarily pinned to `latest`, and that the override must be removed once the pinned image ships. Adds a pending callout for the current 26.7.1 state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds opt-in, behavior-preserving observability support for ArcadeDB 26.7.1 (issue ArcadeData/arcadedb#4463), covering all four pillars: health probes (#4464), OTLP metrics export (#4465), structured JSON logging (#4466), and distributed tracing (#4467).
A new feature-grouped
observability:values section is translated by helpers into-DJVM args, plus Kubernetes-native Prometheus discovery the raw args can't provide.observability.metrics.otlp.{enabled,endpoint}pushes metrics to an OTel collector alongside the existing/prometheusendpoint.ServiceMonitor(Prometheus Operator) andprometheus.io/*scrape pod annotations. A newrequireAuthenticationknob on the prometheus plugin enables unauthenticated scraping. A template guard fails fast if scrape discovery is enabled without the prometheus plugin.observability.tracing.{enabled,endpoint,samplingRate}.observability.logging.{format,includeTrace}(text|json)./api/v1/health(no DB I/O, never 503); readiness stays on/api/v1/ready. Optionalobservability.health.readinessRequiresHA.version/appVersionbumped to26.7.1(the release that serves/api/v1/health), so the default image and the new liveness default move together.The
ServiceMonitorselects only the{fullname}-httpService via a newapp.kubernetes.io/component: httplabel, avoiding double-scraping the headless Service.Every knob defaults off / behavior-preserving except the liveness path, which is coupled to the 26.7.1 bump.
Test Plan
make test-unit— 159/159 pass across 13 suites (newobservability_test.yaml,servicemonitor_test.yaml; updatedstatefulset_test.yaml,helpers_test.yaml,service_test.yaml)make lint— cleanbash -n ci/integration-test.sh— clean; adds a/api/v1/health→ 204 smoke phase (runs green under thelatest-image guard)helm templatefully-enabled and at-defaults renders verifiedappVersionimage — do not release until thearcadedb:26.7.1image is published)🤖 Generated with Claude Code