Run code-executor unprivileged with seccomp on k8s >= 1.33 #311

Draft
mertbozfakioglu wants to merge 5 commits into r2 from mertbozfakioglu/plat-1012-ce-less-privileged-seccomp

@mertbozfakioglu

Summary

Makes the code-executor (ce) run unprivileged on Kubernetes >= 1.33, mirroring how the JS executor (jsExecutor) already sandboxes itself, while keeping the existing privileged behavior on older clusters. Addresses PLAT-1012.

When rendered against k8s >= 1.33, the code-executor now uses:

  • a localhost seccomp profile (nsjail-seccomp.json, the same one jsExecutor uses)
  • capabilities.add: [NET_ADMIN]
  • procMount: Unmasked
  • hostUsers: false (user namespaces, required for procMount: Unmasked)
  • an install-seccomp init container that installs the profile onto the node (same pattern as deployment_js_executor.yaml)

On k8s < 1.33 it falls back to the previous privileged: true, so the chart still installs on clusters older than 1.33 (the ProcMountType and UserNamespacesSupport feature gates are only on by default from 1.33 onward).
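
For orientation, the fields listed above would land in the rendered pod spec roughly as follows. This is a sketch, not the chart's literal output: the container name is illustrative, surrounding Deployment fields are omitted, and the profile path assumes the default value of codeExecutor.seccompLocalhostProfile (profiles/nsjail-seccomp.json).

```yaml
# Approximate rendered fragment on k8s >= 1.33 (sketch; most fields omitted).
spec:
  template:
    spec:
      hostUsers: false              # run the pod in a user namespace; needed for procMount: Unmasked
      containers:
        - name: code-executor       # illustrative name, not taken from the chart
          securityContext:
            capabilities:
              add: ["NET_ADMIN"]
            procMount: Unmasked
            seccompProfile:
              type: Localhost
              # Resolved relative to the kubelet's seccomp root,
              # /var/lib/kubelet/seccomp by default.
              localhostProfile: profiles/nsjail-seccomp.json
```

On k8s < 1.33 the same securityContext block would instead contain only privileged: true, and hostUsers would not be rendered.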

Setting codeExecutor.securityContext explicitly continues to override the auto behavior in either direction.

Why 1.33

procMount: Unmasked depends on the ProcMountType feature gate, which (along with UserNamespacesSupport that it relies on) is only enabled by default starting in Kubernetes 1.33.
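
The version gate can be pictured as a single template variable. The sketch below is an assumption about its shape, pieced together from the review's description (a semverCompare ">=1.33-0" check plus the absence of an explicit codeExecutor.securityContext); the actual expression in the chart may differ.

```yaml
{{- /* Sketch only: true when the target cluster is >= 1.33 and no explicit
       securityContext was supplied in values. */ -}}
{{- $useSecComp := and (semverCompare ">=1.33-0" .Capabilities.KubeVersion.Version) (not .Values.codeExecutor.securityContext) -}}
```

The -0 suffix in the constraint lets versions carrying a pre-release-style suffix still satisfy the comparison. .Capabilities.KubeVersion.Version is the value that helm template --kube-version overrides, which is why the test plan can exercise both paths offline.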

Behavior change

codeExecutor.securityContext previously defaulted to privileged: true; it is now commented out so version auto-detection is the default. On upgrade, clusters on 1.33+ flip from privileged to the seccomp sandbox automatically; clusters < 1.33 are unchanged. Users can pin codeExecutor.securityContext to force a mode.
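
As an illustration of pinning a mode, a values override along these lines would keep the legacy behavior on any cluster version. This is a minimal sketch that assumes the key layout described in this PR.

```yaml
# values.yaml override (sketch): force the legacy privileged mode,
# bypassing version auto-detection even on k8s >= 1.33.
codeExecutor:
  securityContext:
    privileged: true
```

Any non-empty codeExecutor.securityContext takes the explicit branch, so setting privileged: false (as in the test plan) likewise skips the seccomp scaffolding and renders exactly what was supplied.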

Changes

  • charts/retool/templates/deployment_code_executor.yaml - version-gated securityContext, install-seccomp init container, hostUsers: false, seccomp checksum annotation, seccomp volumes
  • charts/retool/templates/configmap_code_executor.yaml (new) - embeds the seccomp profile into a ConfigMap (rendered only on the seccomp path)
  • charts/retool/values.yaml - comment out the default securityContext, add codeExecutor.seccompLocalhostProfile
  • charts/retool/Chart.yaml - bump 6.11.0 -> 6.12.0

Test plan

  • helm template --kube-version 1.34.3 renders seccomp config + init container + ConfigMap + hostUsers: false
  • helm template --kube-version 1.32.3 renders privileged: true and none of the seccomp scaffolding
  • helm template --kube-version 1.34.3 --set codeExecutor.securityContext.privileged=false honors the explicit override and skips scaffolding
  • helm lint passes
  • Validate on a live 1.33+ cluster that the code-executor starts and the seccomp profile is installed on the node
  • Validate nsjail sandboxing still functions under the seccomp profile (no denied syscalls)

Made with Cursor

mertbozfakioglu and others added 3 commits June 8, 2026 13:38
On Kubernetes 1.33+ (where the ProcMountType and UserNamespacesSupport
feature gates are on by default), the code-executor now runs unprivileged
using a localhost seccomp profile, NET_ADMIN, an unmasked /proc, and user
namespaces - mirroring how the JS executor sandboxes itself. The nsjail
seccomp profile is installed onto the node by an install-seccomp init
container. On older clusters it falls back to the existing privileged mode,
so the chart still installs without requiring 1.33+.

Setting codeExecutor.securityContext explicitly continues to override this
behavior for either mode.

Co-authored-by: Cursor <cursoragent@cursor.com>
greptile-apps Bot commented Jun 8, 2026

Greptile Summary

This PR adds an unprivileged execution mode for the code-executor on Kubernetes ≥ 1.33, gating on semverCompare ">=1.33-0" and the absence of an explicit codeExecutor.securityContext. Clusters on older versions continue to use privileged: true unchanged.

  • Seccomp scaffolding: a new configmap_code_executor.yaml embeds nsjail-seccomp.json; the deployment grows an install-seccomp init container (identical to the jsExecutor pattern), hostUsers: false, procMount: Unmasked, NET_ADMIN capability, and a checksum/seccomp rollout annotation.
  • values.yaml: the default securityContext: privileged: true is commented out; codeExecutor.seccompLocalhostProfile (defaulting to profiles/nsjail-seccomp.json) is added for operators who need a custom path.
  • Key divergence from jsExecutor: unlike the jsExecutor, the code-executor requires procMount: Unmasked, which in turn requires hostUsers: false (user namespaces). This means the install-seccomp init container writes to a hostPath volume while running inside a user namespace — a scenario that depends on the node having idmap-mount support in the kernel (≥ 5.12 partial, ≥ 6.3 full for hostPath); without it the init container fails and the pod never reaches Running state.

Confidence Score: 3/5

Safe to merge for clusters where every node's kernel supports idmap mounts (Linux ≥ 5.12, with full hostPath support from 6.3). On 1.33+ clusters whose node kernels pre-date idmap-mount support, the install-seccomp init container fails and the pod never starts.

The core design is sound and mirrors the established jsExecutor pattern. However, adding hostUsers: false to the whole pod while also needing the init container to write to a hostPath directory creates a real runtime dependency on idmap-mount kernel support. The two live-cluster validation checkboxes in the test plan are explicitly unchecked, so this failure mode has not been exercised yet. The allowPrivilegeEscalation omission in the seccomp security context is a hardening gap worth closing before wide rollout.

charts/retool/templates/deployment_code_executor.yaml — the hostUsers: false + hostPath-write interaction and the missing allowPrivilegeEscalation: false in the seccomp security context branch.

Important Files Changed

| Filename | Overview |
| --- | --- |
| charts/retool/templates/deployment_code_executor.yaml | Adds version-gated seccomp scaffolding (hostUsers, init container, volumes, security context); the init container write to /var/lib/kubelet/seccomp depends on idmap-mount kernel support when hostUsers: false is active, and allowPrivilegeEscalation is not explicitly disabled in the seccomp branch. |
| charts/retool/templates/configmap_code_executor.yaml | New ConfigMap rendering the nsjail-seccomp.json profile; condition mirrors the deployment's $useSecComp logic exactly and follows the existing jsExecutor ConfigMap pattern. |
| charts/retool/values.yaml | Comments out the default privileged securityContext and adds seccompLocalhostProfile with a sensible default; updated comments clearly document the auto-detection behavior and escape hatches. |
| charts/retool/Chart.yaml | Bumps chart version from 6.11.0 to 6.12.0, appropriate for a behavior-changing release. |

Sequence Diagram

sequenceDiagram
    participant Helm as Helm Template Engine
    participant CM as ConfigMap (nsjail-seccomp)
    participant Init as install-seccomp (init)
    participant Node as Node /var/lib/kubelet/seccomp
    participant CE as code-executor container

    Helm->>CM: "Render ConfigMap with nsjail-seccomp.json (k8s >=1.33 only)"
    Helm->>Init: Render install-seccomp init container
    Helm->>CE: Render securityContext (NET_ADMIN + procMount:Unmasked + Localhost seccomp)

    Note over Helm,CE: At runtime (k8s >= 1.33, hostUsers:false)

    Init->>CM: Mount seccomp-profile volume
    Init->>Node: Copy nsjail-seccomp.json via host-seccomp hostPath
    Note over Init,Node: Requires idmap mounts (kernel >= 5.12)
    Init-->>CE: Init completes, main container starts

    CE->>CE: nsjail reads seccomp profile from /proc (procMount:Unmasked)
    CE-->>Node: Seccomp profile applied from Localhost path

Reviews (1): Last reviewed commit: "Drop codeExecutor securityContext commen..."

Comment on lines +51 to +58
+      {{- if $useSecComp }}
+      hostUsers: false
+      {{- end }}
+      {{- if or $useSecComp .Values.initContainers }}
+      initContainers:
+      {{- if $useSecComp }}
+        - name: install-seccomp
+          image: busybox:1.37.0@sha256:b3255e7dfbcd10cb367af0d409747d511aeb66dfac98cf30e97e87e4207dd76f

P1 hostUsers: false creates a write-permission dependency on idmap mounts

The pod-level hostUsers: false enables user namespaces for the entire pod, including the install-seccomp init container. With user namespaces active, UID 0 inside the container maps to a non-privileged UID on the host (e.g. 65536), so the /var/lib/kubelet/seccomp directory — typically owned by host root — is only writable if the kubelet is using idmap mounts (Linux kernel ≥ 5.12 for partial support, ≥ 6.3 for full hostPath support). The jsExecutor's identical init container works without issue because it does not set hostUsers: false.

If the node kernel is older (even on a k8s 1.33 cluster), the init container will fail with a permission-denied error and the pod will never reach the running state. The two live-cluster validation items in the test plan remain unchecked — this is the exact failure mode they would surface.
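
To make the interaction concrete, the sketch below shows the general shape of an init container that copies a profile from a ConfigMap onto the node through a hostPath volume. Only the container name, the volume names (seccomp-profile, host-seccomp), and the /var/lib/kubelet/seccomp path come from this PR's discussion; the image tag without digest, the ConfigMap name, the mount paths, and the copy command are hypothetical and may not match the chart.

```yaml
# Sketch of the hostUsers: false + hostPath write pattern (hypothetical details marked).
spec:
  hostUsers: false                  # applies to init containers too, not just the main container
  initContainers:
    - name: install-seccomp
      image: busybox:1.37.0
      # Hypothetical command: copy the profile into the kubelet's seccomp root.
      command:
        - sh
        - -c
        - mkdir -p /host-seccomp/profiles && cp /seccomp-src/nsjail-seccomp.json /host-seccomp/profiles/
      volumeMounts:
        - name: seccomp-profile
          mountPath: /seccomp-src   # hypothetical mount path
          readOnly: true
        - name: host-seccomp
          mountPath: /host-seccomp  # hypothetical mount path
  volumes:
    - name: seccomp-profile
      configMap:
        name: code-executor-seccomp # hypothetical ConfigMap name
    - name: host-seccomp
      hostPath:
        path: /var/lib/kubelet/seccomp
```

With hostUsers: false, the cp above runs as a UID that is root only inside the pod's user namespace, so whether the write into the root-owned host directory succeeds is exactly the question the unchecked live-cluster items need to answer.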

Comment on lines 94 to +106
         securityContext:
-{{ if .Values.codeExecutor.securityContext }}
+{{- if .Values.codeExecutor.securityContext }}
 {{ toYaml .Values.codeExecutor.securityContext | indent 10 }}
-{{ else }}
+{{- else if $useSecComp }}
+          capabilities:
+            add: ["NET_ADMIN"]
+          procMount: Unmasked
+          seccompProfile:
+            type: Localhost
+            localhostProfile: {{ .Values.codeExecutor.seccompLocalhostProfile }}
+{{- else }}
           privileged: true
-{{ end }}
+{{- end }}

P2 allowPrivilegeEscalation not explicitly disabled in the seccomp branch

The seccomp security context adds NET_ADMIN but does not set allowPrivilegeEscalation: false. Without that field the kubelet leaves the no_new_privs bit unset, which means a setuid binary inside the container could still gain elevated privileges despite the seccomp filter. The jsExecutor security context has the same gap, but NET_ADMIN is a more powerful capability than what jsExecutor holds, making the omission slightly more impactful here. Consider adding allowPrivilegeEscalation: false to close this vector under the seccomp path.
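
A minimal sketch of what the suggested hardening could look like in the rendered seccomp branch follows. The profile path assumes the default codeExecutor.seccompLocalhostProfile value, and whether nsjail still functions with no_new_privs set has not been verified here, so this would need the same live-cluster validation as the rest of the change.

```yaml
# Rendered securityContext for the seccomp path with the suggested field added (sketch).
securityContext:
  allowPrivilegeEscalation: false   # suggested addition; makes the runtime set no_new_privs
  capabilities:
    add: ["NET_ADMIN"]
  procMount: Unmasked
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/nsjail-seccomp.json
```

Kubernetes accepts allowPrivilegeEscalation: false alongside an added NET_ADMIN capability; the field is only forced to true when the container is privileged or holds CAP_SYS_ADMIN.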

mertbozfakioglu and others added 2 commits June 8, 2026 13:51
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>