diff --git a/.agents/skills/debug-openshell-cluster/SKILL.md b/.agents/skills/debug-openshell-cluster/SKILL.md
index cb9bd060c..fae8af123 100644
--- a/.agents/skills/debug-openshell-cluster/SKILL.md
+++ b/.agents/skills/debug-openshell-cluster/SKILL.md
@@ -132,7 +132,7 @@ Common findings:
 helm -n openshell status openshell
 helm -n openshell get values openshell
 kubectl -n openshell get statefulset,pod,svc,pvc
-kubectl -n openshell logs statefulset/openshell --tail=200
+kubectl -n openshell logs statefulset/openshell -c openshell-gateway --tail=200
 kubectl -n openshell rollout status statefulset/openshell
 ```
 
@@ -238,7 +238,7 @@ If the gateway is healthy but sandbox creation fails:
 ```bash
 kubectl -n openshell get pods
 kubectl -n openshell get events --sort-by=.lastTimestamp | tail -n 50
-kubectl -n openshell logs statefulset/openshell --tail=200
+kubectl -n openshell logs statefulset/openshell -c openshell-gateway --tail=200
 ```
 
 Check the configured sandbox namespace:
@@ -286,7 +286,7 @@ openshell logs
 | Docker or Podman sandbox never registers | Wrong callback endpoint or supervisor startup failure | Gateway logs and sandbox container logs |
 | Docker GPU e2e fails before GPU sandbox comparison | NVIDIA CDI specs are missing or Docker has not discovered them | `docker info --format '{{json .DiscoveredDevices}}'`, `/etc/cdi`, `/var/run/cdi`, `nvidia-cdi-refresh.service` |
 | Kubernetes gateway pod pending | PVC unbound, taint, selector, or insufficient resources | `kubectl -n openshell describe pod ` |
-| Kubernetes gateway pod crash loops | Missing secret, bad DB URL, bad TLS config | `kubectl -n openshell logs statefulset/openshell` |
+| Kubernetes gateway pod crash loops | Missing secret, bad DB URL, bad TLS config | `kubectl -n openshell logs statefulset/openshell -c openshell-gateway` |
 | CLI TLS error | Local mTLS bundle does not match server cert/CA | Check `~/.config/openshell/gateways//mtls/` |
 | Image pull failure | Gateway or sandbox image cannot be pulled | Runtime events and image pull credentials |
 | `K8s namespace not ready` with `envoy-gateway-openshell.yaml: the server could not find the requested resource` | Optional Gateway API manifest was applied without Envoy Gateway CRDs, or k3s Helm controller startup exceeded the namespace wait | Apply `deploy/kube/manifests/envoy-gateway-openshell.yaml` manually only after Envoy Gateway is installed and `grpcRoute` is enabled |
diff --git a/deploy/helm/openshell/templates/statefulset.yaml b/deploy/helm/openshell/templates/statefulset.yaml
index e6cb1037e..3c0bd2cd3 100644
--- a/deploy/helm/openshell/templates/statefulset.yaml
+++ b/deploy/helm/openshell/templates/statefulset.yaml
@@ -46,7 +46,7 @@ spec:
       securityContext:
         {{- toYaml .Values.podSecurityContext | nindent 8 }}
       containers:
-        - name: {{ .Chart.Name }}
+        - name: openshell-gateway
           securityContext:
             {{- toYaml .Values.securityContext | nindent 12 }}
           image: {{ include "openshell.image" . | quote }}
diff --git a/deploy/helm/openshell/tests/gateway_config_test.yaml b/deploy/helm/openshell/tests/gateway_config_test.yaml
index e4edb5612..e9ca8014f 100644
--- a/deploy/helm/openshell/tests/gateway_config_test.yaml
+++ b/deploy/helm/openshell/tests/gateway_config_test.yaml
@@ -18,6 +18,13 @@ tests:
       - exists:
           path: spec.template.metadata.annotations["checksum/gateway-config"]
 
+  - it: uses a stable gateway container name
+    template: templates/statefulset.yaml
+    asserts:
+      - equal:
+          path: spec.template.spec.containers[0].name
+          value: openshell-gateway
+
   - it: mounts the OIDC CA bundle when TLS is disabled
     template: templates/statefulset.yaml
     set: