11 changes: 6 additions & 5 deletions .agents/skills/debug-openshell-cluster/SKILL.md
@@ -138,13 +138,14 @@ kubectl -n openshell rollout status statefulset/openshell

 Look for failed installs, unexpected values, missing namespace, wrong image tag, TLS settings that do not match the registered endpoint, and scheduling failures.
 
-For HA or PostgreSQL-backed installs, also check the service-binding Secret and
-bundled PostgreSQL workload:
+For HA or PostgreSQL-backed installs, also check the external database Secret
+referenced by `server.externalDbSecret` and the PostgreSQL workload if the test
+or operator deployed one in-cluster:
 
 ```bash
-kubectl -n openshell get secret -l app.kubernetes.io/instance=openshell
-kubectl -n openshell get statefulset,pod,pvc -l app.kubernetes.io/instance=openshell
-kubectl -n openshell logs statefulset/openshell-postgres --tail=200
+kubectl -n openshell get secret openshell-ha-pg -o yaml
+kubectl -n openshell get deployment,service,pod -l app.kubernetes.io/name=openshell-e2e-postgres
+kubectl -n openshell logs deployment/openshell-e2e-postgres --tail=200
 ```
 
 Check required Helm deployment secrets:
7 changes: 4 additions & 3 deletions .agents/skills/helm-dev-environment/SKILL.md
@@ -66,9 +66,10 @@ generates mTLS secrets on first install. Envoy Gateway opt-in; see the Optional

 The gateway Service uses ClusterIP. Access is via Envoy Gateway (port `8080`) or `kubectl port-forward`.
 
-**HA test deploy** (two gateway replicas + bundled PostgreSQL): uncomment
+**HA test deploy** (two gateway replicas + external PostgreSQL Secret): uncomment
 `#- ci/values-high-availability.yaml` in `deploy/helm/openshell/skaffold.yaml`,
-then run `mise run helm:skaffold:run` or `mise run helm:skaffold:dev`.
+create the Secret named `openshell-ha-pg` with a `uri` key, then run
+`mise run helm:skaffold:run` or `mise run helm:skaffold:dev`.
 
 ### TLS behaviour
 
@@ -203,7 +204,7 @@ mise run helm:k3s:status
 | `deploy/helm/openshell/ci/values-skaffold.yaml` | Dev overrides (image pull policy, TLS disabled for local Skaffold) |
 | `deploy/helm/openshell/ci/values-cert-manager.yaml` | cert-manager PKI overlay (opt-in; disables pkiInitJob) |
 | `deploy/helm/openshell/ci/values-gateway.yaml` | Envoy Gateway GRPCRoute + Gateway overlay |
-| `deploy/helm/openshell/ci/values-high-availability.yaml` | HA test overlay (`replicaCount: 2` with bundled PostgreSQL) |
+| `deploy/helm/openshell/ci/values-high-availability.yaml` | HA test overlay (`replicaCount: 2` with external PostgreSQL Secret) |
 | `deploy/helm/openshell/ci/values-keycloak.yaml` | Keycloak OIDC overlay |
 | `deploy/helm/openshell/ci/values-tls-disabled.yaml` | Lint-only: TLS + auth disabled (reverse-proxy edge termination) |
 | `deploy/kube/manifests/envoy-gateway-openshell.yaml` | GatewayClass for Envoy Gateway (`mise run helm:gateway:apply`) |
8 changes: 0 additions & 8 deletions .github/actions/release-helm-oci/action.yml
@@ -71,14 +71,6 @@ runs:
 exit 1
 fi
 
-- name: Build chart dependencies
-env:
-CHART_DIR: ${{ steps.prep.outputs.chart_dir }}
-shell: bash
-run: |
-set -euo pipefail
-helm dependency build "${CHART_DIR}"
-
 - name: Package Helm chart
 env:
 CHART_DIR: ${{ steps.prep.outputs.chart_dir }}
1 change: 1 addition & 0 deletions .github/workflows/branch-e2e.yml
@@ -122,6 +122,7 @@ jobs:
 image-tag: ${{ github.sha }}
 job-name: Kubernetes HA E2E (Rust smoke)
 extra-helm-values: deploy/helm/openshell/ci/values-high-availability.yaml
+external-postgres-secret: openshell-ha-pg
 
 core-e2e-result:
 name: Core E2E result
6 changes: 6 additions & 0 deletions .github/workflows/e2e-kubernetes-test.yml
@@ -27,6 +27,11 @@ on:
 required: false
 type: string
 default: ""
+external-postgres-secret:
+description: "Create an ephemeral external PostgreSQL fixture and write its URI to this Secret"
+required: false
+type: string
+default: ""
 mise-version:
 description: "mise version to install on the bare Kubernetes e2e runner"
 required: false
@@ -111,6 +116,7 @@ jobs:
 env:
 OPENSHELL_E2E_KUBE_CONTEXT: kind-${{ env.KIND_CLUSTER_NAME }}
 OPENSHELL_E2E_KUBE_EXTRA_VALUES: ${{ inputs.extra-helm-values }}
+OPENSHELL_E2E_KUBE_EXTERNAL_POSTGRES_SECRET: ${{ inputs.external-postgres-secret }}
 IMAGE_TAG: ${{ inputs.image-tag }}
 OPENSHELL_REGISTRY: ghcr.io/nvidia/openshell
 run: mise run --no-deps --skip-deps e2e:kubernetes
4 changes: 3 additions & 1 deletion architecture/compute-runtimes.md
@@ -86,7 +86,9 @@ runtime still owns GPU device injection.

 ## Deployment Shape
 
-Kubernetes deployments use the Helm chart under `deploy/helm/openshell`.
+Kubernetes deployments use the Helm chart under `deploy/helm/openshell`. The
+chart deploys the gateway and sandbox runtime integration, but HA deployments
+must point `server.externalDbSecret` at an operator-managed PostgreSQL database.
 Standalone local deployments start the gateway with a selected runtime such as
 Docker, Podman, or VM. The CLI can register multiple gateways and switch between
 them without changing the sandbox architecture.
6 changes: 0 additions & 6 deletions deploy/helm/openshell/Chart.lock

This file was deleted.

6 changes: 0 additions & 6 deletions deploy/helm/openshell/Chart.yaml
@@ -11,9 +11,3 @@ type: application
 # empty), so a released chart automatically pulls the matching gateway and supervisor images.
 version: 0.0.0
 appVersion: "0.0.0"
-dependencies:
-  - name: postgresql
-    version: 18.6.7
-    repository: oci://registry-1.docker.io/bitnamicharts
-    condition: postgres.enabled
-    alias: postgres
29 changes: 7 additions & 22 deletions deploy/helm/openshell/README.md
@@ -56,7 +56,7 @@ See [`values.yaml`](values.yaml) for source defaults. Selected overlays:
 - [`ci/values-gateway.yaml`](ci/values-gateway.yaml) - gateway-only configuration
 - [`ci/values-cert-manager.yaml`](ci/values-cert-manager.yaml) - cert-manager integration
 - [`ci/values-keycloak.yaml`](ci/values-keycloak.yaml) - Keycloak OIDC integration
-- [`ci/values-high-availability.yaml`](ci/values-high-availability.yaml) - HA gateway test overlay with bundled PostgreSQL
+- [`ci/values-high-availability.yaml`](ci/values-high-availability.yaml) - CI overlay for multi-replica external PostgreSQL testing
 
 ### Database backend
 
@@ -65,12 +65,15 @@ By default, OpenShell uses SQLite:
 ```yaml
 server:
   dbUrl: "sqlite:/var/openshell/openshell.db"
-postgres:
-  enabled: false
 ```
 
 #### External PostgreSQL
 
+Use external PostgreSQL when the gateway should connect to a database managed
+outside this chart. The OpenShell chart does not deploy a database; install
+PostgreSQL separately using the chart, operator, or managed service that fits
+your environment, then pass the connection URI through a Secret.
+
 Create a Secret containing the PostgreSQL connection URI if one does not
 already exist:
 
@@ -87,18 +90,6 @@ helm install openshell oci://ghcr.io/nvidia/openshell/helm-chart --version <vers
 --set server.externalDbSecret=my-pg-credentials
 ```
 
-#### Bundled PostgreSQL
-
-Deploy a PostgreSQL instance alongside the gateway using the bundled
-Bitnami subchart. A random password is generated automatically:
-
-```bash
-helm install openshell oci://ghcr.io/nvidia/openshell/helm-chart --version <version> \
---set postgres.enabled=true
-```
-
-To set an explicit password, add `--set postgres.auth.password=my-secret-password`.
-
 #### OpenShift
 
 Append these flags to any of the PostgreSQL commands above for OpenShift:
@@ -159,12 +150,6 @@ JWT signing Secret.
 | podLabels | object | `{}` | Extra labels to add to the gateway pod. |
 | podLifecycle.terminationGracePeriodSeconds | int | `5` | Grace period, in seconds, before Kubernetes terminates the gateway pod. |
 | podSecurityContext.fsGroup | int | `1000` | fsGroup assigned to the gateway pod. |
-| postgres.auth.database | string | `"openshell"` | |
-| postgres.auth.password | string | `""` | |
-| postgres.auth.username | string | `"openshell"` | |
-| postgres.enabled | bool | `false` | Deploy the bundled Bitnami PostgreSQL subchart. |
-| postgres.primary.persistence.enabled | bool | `true` | |
-| postgres.serviceBindings.enabled | bool | `true` | |
 | probes.liveness.failureThreshold | int | `3` | Liveness probe failure threshold before the container is restarted. |
 | probes.liveness.initialDelaySeconds | int | `2` | Liveness probe initial delay, in seconds. |
 | probes.liveness.periodSeconds | int | `5` | Liveness probe period, in seconds. |
@@ -176,7 +161,7 @@ JWT signing Secret.
 | probes.startup.failureThreshold | int | `30` | Startup probe failure threshold before the container is killed. |
 | probes.startup.periodSeconds | int | `2` | Startup probe period, in seconds. |
 | probes.startup.timeoutSeconds | int | `1` | Startup probe timeout, in seconds. |
-| replicaCount | int | `1` | Number of OpenShell gateway replicas. |
+| replicaCount | int | `1` | Number of OpenShell gateway replicas. Values greater than 1 require server.externalDbSecret because the default SQLite backend is per pod. |
 | resources | object | `{}` | Gateway pod resource requests and limits. |
 | sandboxServiceAccount.annotations | object | `{}` | Annotations to add to the generated sandbox service account. |
 | sandboxServiceAccount.create | bool | `true` | Create a service account for sandbox pods. |
21 changes: 6 additions & 15 deletions deploy/helm/openshell/README.md.gotmpl
@@ -56,7 +56,7 @@ See [`values.yaml`](values.yaml) for source defaults. Selected overlays:
 - [`ci/values-gateway.yaml`](ci/values-gateway.yaml) - gateway-only configuration
 - [`ci/values-cert-manager.yaml`](ci/values-cert-manager.yaml) - cert-manager integration
 - [`ci/values-keycloak.yaml`](ci/values-keycloak.yaml) - Keycloak OIDC integration
-- [`ci/values-high-availability.yaml`](ci/values-high-availability.yaml) - HA gateway test overlay with bundled PostgreSQL
+- [`ci/values-high-availability.yaml`](ci/values-high-availability.yaml) - CI overlay for multi-replica external PostgreSQL testing
 
 ### Database backend
 
@@ -65,12 +65,15 @@ By default, OpenShell uses SQLite:
 ```yaml
 server:
   dbUrl: "sqlite:/var/openshell/openshell.db"
-postgres:
-  enabled: false
 ```
 
 #### External PostgreSQL
 
+Use external PostgreSQL when the gateway should connect to a database managed
+outside this chart. The OpenShell chart does not deploy a database; install
+PostgreSQL separately using the chart, operator, or managed service that fits
+your environment, then pass the connection URI through a Secret.
+
 Create a Secret containing the PostgreSQL connection URI if one does not
 already exist:
 
@@ -87,18 +90,6 @@ helm install openshell oci://ghcr.io/nvidia/openshell/helm-chart --version <vers
 --set server.externalDbSecret=my-pg-credentials
 ```
 
-#### Bundled PostgreSQL
-
-Deploy a PostgreSQL instance alongside the gateway using the bundled
-Bitnami subchart. A random password is generated automatically:
-
-```bash
-helm install openshell oci://ghcr.io/nvidia/openshell/helm-chart --version <version> \
---set postgres.enabled=true
-```
-
-To set an explicit password, add `--set postgres.auth.password=my-secret-password`.
-
 #### OpenShift
 
 Append these flags to any of the PostgreSQL commands above for OpenShift:
24 changes: 5 additions & 19 deletions deploy/helm/openshell/ci/values-high-availability.yaml
@@ -1,24 +1,10 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 
-# CI/dev overlay for exercising the gateway with more than one replica.
-# SQLite is not suitable for HA because each replica has its own pod volume, so
-# this overlay enables the bundled PostgreSQL dependency added by the chart.
+# CI/dev overlay for exercising the gateway with more than one replica. SQLite
+# is not suitable for HA because each replica has its own pod volume, so this
+# overlay expects the caller to provide a PostgreSQL Secret named openshell-ha-pg.
 replicaCount: 2
 
-global:
-  security:
-    # The mirror serves the same pinned Bitnami PostgreSQL digest, but Bitnami's
-    # chart verification treats non-Docker-Hub registries as unrecognized.
-    allowInsecureImages: true
-
-postgres:
-  enabled: true
-  # Keep the HA CI/dev overlay off Docker Hub's unauthenticated pull path.
-  # The Bitnami subchart defaults to registry-1.docker.io/bitnami/postgresql:latest.
-  image:
-    registry: mirror.gcr.io
-    repository: bitnami/postgresql
-    digest: sha256:7651d7f24aad83fe68a222f7f20eded10d325c96ebee285ca5bf8162eddcba64
-  auth:
-    password: openshell-ha-ci
+server:
+  externalDbSecret: openshell-ha-pg
2 changes: 1 addition & 1 deletion deploy/helm/openshell/skaffold.yaml
@@ -97,7 +97,7 @@ deploy:
 #- ci/values-keycloak.yaml
 # To enable the Gateway API HTTPRoute (requires Envoy Gateway above):
 #- ci/values-gateway.yaml
-# To test HA gateway behavior with bundled PostgreSQL:
+# To test multi-replica external PostgreSQL behavior:
 #- ci/values-high-availability.yaml
 setValueTemplates:
 image.repository: '{{.IMAGE_REPO_openshell_gateway}}'
43 changes: 12 additions & 31 deletions deploy/helm/openshell/templates/_helpers.tpl
@@ -102,37 +102,6 @@ Namespace where sandbox pods are created. An explicit
 {{- .Values.server.sandboxNamespace | default .Release.Namespace -}}
 {{- end }}
 
-{{/*
-Fully qualified name of the PostgreSQL subchart, mirroring the Bitnami
-common.names.fullname template so we stay in sync when users set
-postgres.fullnameOverride or postgres.nameOverride.
-*/}}
-{{- define "openshell.postgresFullname" -}}
-{{- if .Values.postgres.fullnameOverride -}}
-{{- .Values.postgres.fullnameOverride | trunc 63 | trimSuffix "-" -}}
-{{- else -}}
-{{- $name := default "postgres" .Values.postgres.nameOverride -}}
-{{- if contains $name .Release.Name -}}
-{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
-{{- else -}}
-{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
-{{- end -}}
-{{- end -}}
-{{- end }}
-
-{{/*
-Name of the Secret holding the PostgreSQL connection URI.
-- server.externalDbSecret set: use it verbatim (always wins)
-- postgres.enabled=true: derive from Bitnami service-binding naming convention
-*/}}
-{{- define "openshell.dbSecretName" -}}
-{{- if .Values.server.externalDbSecret -}}
-{{- .Values.server.externalDbSecret -}}
-{{- else -}}
-{{- printf "%s-svcbind-custom-user" (include "openshell.postgresFullname" .) -}}
-{{- end -}}
-{{- end }}
-
 {{/*
 Name of the Secret holding gateway-minted sandbox JWT signing material.
 */}}
@@ -174,3 +143,15 @@ init-container
 {{- printf "%s://%s.%s.svc.cluster.local:%d" $scheme (include "openshell.fullname" .) .Release.Namespace (int .Values.service.port) -}}
 {{- end -}}
 {{- end }}
+
+{{/*
+Validate chart values that Helm would otherwise accept silently.
+*/}}
+{{- define "openshell.validateValues" -}}
+{{- if and (hasKey .Values "postgres") (kindIs "map" .Values.postgres) (hasKey .Values.postgres "enabled") -}}
+{{- fail "postgres.enabled was removed; the OpenShell chart no longer deploys PostgreSQL. Provision PostgreSQL separately and set server.externalDbSecret to a Secret containing a PostgreSQL URI." -}}
+{{- end -}}
+{{- if and (gt (int (default 1 .Values.replicaCount)) 1) (not .Values.server.externalDbSecret) -}}
+{{- fail "replicaCount > 1 requires server.externalDbSecret; multiple gateway replicas cannot share the default per-pod SQLite database." -}}
+{{- end -}}
+{{- end }}
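The new `openshell.validateValues` helper can be exercised outside Helm, since Helm templates run on Go's `text/template`. The sketch below executes the helper body against a few hypothetical values maps. `hasKey`, `kindIs`, `default`, `int`, and `fail` are simplified stand-ins for the Sprig/Helm functions of the same names, not their real implementations. The fail messages are shortened, and `missingkey=zero` is an assumption meant to approximate Helm's non-strict rendering, where an absent key evaluates to nil. Requires Go 1.18 or later.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"reflect"
	"text/template"
)

// Body of the chart's "openshell.validateValues" helper with shortened
// fail messages (the define/end wrapper is dropped).
const validateValues = `
{{- if and (hasKey .Values "postgres") (kindIs "map" .Values.postgres) (hasKey .Values.postgres "enabled") -}}
{{- fail "postgres.enabled was removed" -}}
{{- end -}}
{{- if and (gt (int (default 1 .Values.replicaCount)) 1) (not .Values.server.externalDbSecret) -}}
{{- fail "replicaCount > 1 requires server.externalDbSecret" -}}
{{- end -}}`

// Simplified stand-ins for the Sprig/Helm functions the helper calls.
var funcs = template.FuncMap{
	"hasKey": func(m map[string]any, key string) bool { _, ok := m[key]; return ok },
	"kindIs": func(kind string, v any) bool { return v != nil && reflect.TypeOf(v).Kind().String() == kind },
	// default returns d when the given value is absent or a zero value.
	"default": func(d any, given ...any) any {
		if len(given) == 0 || given[0] == nil || reflect.ValueOf(given[0]).IsZero() {
			return d
		}
		return given[0]
	},
	"int": func(v any) int {
		switch n := v.(type) {
		case int:
			return n
		case float64:
			return int(n)
		}
		return 0
	},
	// A non-nil error from a template function aborts rendering, like Helm's fail.
	"fail": func(msg string) (string, error) { return "", errors.New(msg) },
}

// validate renders the helper against a values map; missingkey=zero makes an
// absent key evaluate to nil instead of an invalid value.
func validate(values map[string]any) error {
	t := template.Must(template.New("validate").Option("missingkey=zero").Funcs(funcs).Parse(validateValues))
	return t.Execute(io.Discard, map[string]any{"Values": values})
}

func main() {
	cases := []struct {
		name   string
		values map[string]any
	}{
		{"default single replica", map[string]any{"server": map[string]any{}}},
		{"HA overlay", map[string]any{"replicaCount": 2, "server": map[string]any{"externalDbSecret": "openshell-ha-pg"}}},
		{"HA without Secret", map[string]any{"replicaCount": 2, "server": map[string]any{}}},
		{"legacy postgres.enabled", map[string]any{"postgres": map[string]any{"enabled": false}, "server": map[string]any{}}},
	}
	for _, c := range cases {
		fmt.Printf("%-24s err=%v\n", c.name, validate(c.values))
	}
}
```

In this sketch `and` evaluates its arguments lazily (Go 1.18+), so `hasKey .Values.postgres "enabled"` is only reached after `kindIs` has confirmed that `postgres` is a map. Any `postgres.enabled` key fails, even `false`, because the check tests for the key's presence rather than its value.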
4 changes: 2 additions & 2 deletions deploy/helm/openshell/templates/gateway-config.yaml
@@ -9,8 +9,8 @@ still override anything in this file.

 One value is intentionally NOT rendered here:
 - server.dbUrl → passed via OPENSHELL_DB_URL env var (from Secret)
-when postgres.enabled=true or server.externalDbSecret
-is set, otherwise --db-url arg for SQLite
+when server.externalDbSecret is set, otherwise
+--db-url arg for SQLite
 */}}
 apiVersion: v1
 kind: ConfigMap
10 changes: 4 additions & 6 deletions deploy/helm/openshell/templates/statefulset.yaml
@@ -1,8 +1,6 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
-{{- if and .Values.postgres.enabled (not .Values.postgres.serviceBindings.enabled) (not .Values.server.externalDbSecret) }}
-{{- fail "postgres.serviceBindings.enabled must be true when using bundled PostgreSQL" }}
-{{- end }}
+{{- include "openshell.validateValues" . }}
 apiVersion: apps/v1
 kind: StatefulSet
 metadata:
@@ -56,16 +54,16 @@ spec:
 args:
 - --config
 - /etc/openshell/gateway.toml
-{{- if not (or .Values.postgres.enabled .Values.server.externalDbSecret) }}
+{{- if not .Values.server.externalDbSecret }}
 - --db-url
 - {{ .Values.server.dbUrl | quote }}
 {{- end }}
 env:
-{{- if or .Values.postgres.enabled .Values.server.externalDbSecret }}
+{{- if .Values.server.externalDbSecret }}
 - name: OPENSHELL_DB_URL
 valueFrom:
 secretKeyRef:
-name: {{ include "openshell.dbSecretName" . }}
+name: {{ .Values.server.externalDbSecret }}
 key: uri
 {{- end }}
 # All gateway settings live in the ConfigMap-backed TOML file