feat(specs): add custom runner image specification by jbpratt · Pull Request #1563 · ambient-code/platform

jbpratt · 2026-05-12T13:18:33Z

Summary

- Adds `specs/agents/runner-image.spec.md` defining the stable runner contract and a workspace-level custom image override
- Custom images are built via Dockerfile `FROM` on a published base image; no init hooks
- New `runner_image` and `runner_image_pull_secret` fields on ProjectSettings let workspace admins configure a custom runner per project
- Defines stable interfaces: AG-UI HTTP endpoints, filesystem layout, entrypoint contract, environment variables, security constraints, and Python runtime requirements
- Includes image selection precedence (ProjectSettings > agent registry > operator default), registry allowlist validation, RBAC, and failure mode scenarios
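The selection precedence can be sketched as a small resolver. This is a minimal illustration, not the control plane's code: the function name `resolve_runner_image` and its parameters are hypothetical, and treating an empty string as "not set" is an assumption. Only the ordering (ProjectSettings > agent registry > operator default) comes from the spec.

```python
from typing import Optional


def resolve_runner_image(
    project_runner_image: Optional[str],
    registry_image: Optional[str],
    operator_default: str,
) -> str:
    """Pick the runner image using the spec's precedence order.

    Hypothetical helper: ProjectSettings.runner_image wins, then the agent
    registry entry's image, then the operator-level RUNNER_IMAGE default.
    Assumption: None and "" are both treated as "not configured".
    """
    for candidate in (project_runner_image, registry_image):
        if candidate:  # skips both None and ""
            return candidate
    return operator_default
```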

Details

The spec establishes the boundary between "what the platform guarantees" and "what custom images can change." Key design decisions:

- Dockerfile `FROM` only: init hooks were rejected due to non-reproducibility, startup latency, network dependency, and OpenShift SCC conflicts
- ProjectSettings, not Session: image trust is an admin concern, so all sessions in a project use the same vetted image
- Agent registry is orthogonal: a custom image overrides the container image but preserves `RUNNER_TYPE`, resources, and sandbox config from the registry
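The "orthogonal" rule means the override swaps one field and leaves the rest of the registry entry alone. A minimal sketch, assuming a hypothetical `AgentRegistryEntry` shape (field names are illustrative, not the platform's schema):

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class AgentRegistryEntry:
    """Hypothetical agent registry entry; field names are assumptions."""
    image: str
    runner_type: str
    resources: Dict[str, str] = field(default_factory=dict)
    sandbox: Dict[str, str] = field(default_factory=dict)


def apply_custom_image(entry: AgentRegistryEntry,
                       runner_image: Optional[str]) -> AgentRegistryEntry:
    """Replace only the container image; RUNNER_TYPE, resources, and sandbox stay as-is."""
    if not runner_image:
        return entry  # no ProjectSettings override configured
    return replace(entry, image=runner_image)
```

Using `dataclasses.replace` on a frozen dataclass makes it explicit that nothing except `image` changes.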

Test plan

- Review the spec for completeness against `runner.spec.md` and `control-plane.spec.md`
- Verify GIVEN/WHEN/THEN scenarios are testable
- Confirm the implementation touchpoints table is accurate
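One way to check that a GIVEN/WHEN/THEN scenario is testable is to write it directly as a test. A minimal sketch, assuming a toy state machine: the `Phase` values other than `Failed`, the event names, and `next_phase` are hypothetical; only "readiness timeout or startup crash leads to `Failed`" is taken from the spec's failure modes.

```python
from enum import Enum


class Phase(str, Enum):
    CREATING = "Creating"  # hypothetical phase name
    RUNNING = "Running"    # hypothetical phase name
    FAILED = "Failed"      # phase named in the spec's failure modes


# Hypothetical event names; the spec maps both outcomes to Failed.
FAILURE_EVENTS = {"readiness_timeout", "startup_crash"}


def next_phase(current: Phase, event: str) -> Phase:
    """Toy transition: any listed failure event moves the session to Failed."""
    if event in FAILURE_EVENTS:
        return Phase.FAILED
    if event == "ready" and current is Phase.CREATING:
        return Phase.RUNNING
    return current


def test_readiness_timeout_fails_session() -> None:
    # GIVEN a session whose custom runner pod is still starting
    phase = Phase.CREATING
    # WHEN the health/readiness check times out
    phase = next_phase(phase, "readiness_timeout")
    # THEN the session is marked Failed
    assert phase is Phase.FAILED
```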

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Added a comprehensive spec for custom runner images: required AG-UI HTTP endpoints/port, Python 3.12+ requirement, preserved Python minor version, and required filesystem paths.
- Clarified runtime constraints: no CMD/ENTRYPOINT overrides, required non-root UID 1001, forbidden overrides of specific injected env vars, and graceful shutdown behavior.
- Added ProjectSettings options for runner_image and runner_image_pull_secret with precedence, validation, pull-policy rules, RBAC guidance, failure modes, and security/isolation expectations.
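The "forbidden overrides of injected env vars" rule can be linted by comparing a custom image's ENV against its base image's ENV. A minimal sketch, assuming both are available as name/value maps; the helper name is hypothetical, and the reserved list is partial (the backend/gRPC/token variable names are not reproduced here). Comparing against the base avoids flagging values the custom image merely inherits.

```python
from typing import Iterable, Mapping, Set

# Partial list from the spec summary; backend/gRPC/token variables are also
# reserved, but their exact names are not listed here.
RESERVED_ENV_VARS = frozenset({
    "SESSION_ID", "PROJECT_NAME", "WORKSPACE_PATH", "AGUI_PORT",
    "INITIAL_PROMPT", "IS_RESUME", "CREDENTIAL_IDS", "RUNNER_TYPE",
})


def overridden_reserved_vars(base_env: Mapping[str, str],
                             custom_env: Mapping[str, str],
                             reserved: Iterable[str] = RESERVED_ENV_VARS) -> Set[str]:
    """Reserved names the custom image adds or changes relative to the base image."""
    return {
        name for name in reserved
        if name in custom_env and custom_env.get(name) != base_env.get(name)
    }
```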

netlify · 2026-05-12T13:18:48Z

✅ Deploy Preview for cheerful-kitten-f556a0 canceled.

Name	Link
🔨 Latest commit	`54f94db`
🔍 Latest deploy log	https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/6a0388fe8232c10009b70e0c

coderabbitai · 2026-05-12T13:18:50Z

📝 Walkthrough

Adds a stable "custom runner image" specification describing required AG-UI HTTP endpoints, Python/runtime and filesystem constraints, non-root and exec/signal expectations, ProjectSettings-based runner image override with validation and pull-secret support, failure modes, security assumptions, and base-image contract labeling.
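A small preflight can exercise two parts of the contract: the AG-UI listener address (from `AGUI_PORT`, default 8001) and the Python version rule (3.12+, same major.minor as the base image). This is a sketch under stated assumptions: the helper names are hypothetical, the listener is assumed reachable on `127.0.0.1` inside the pod, and it does not check HTTP methods or response formats.

```python
import os
from typing import List, Mapping, Optional, Tuple

DEFAULT_AGUI_PORT = 8001
REQUIRED_ENDPOINTS = ("/", "/interrupt", "/health", "/capabilities", "/events/{thread_id}")


def agui_base_url(env: Optional[Mapping[str, str]] = None, host: str = "127.0.0.1") -> str:
    """Base URL of the AG-UI listener, honoring AGUI_PORT when set."""
    env = os.environ if env is None else env
    port = int(env.get("AGUI_PORT", DEFAULT_AGUI_PORT))
    return f"http://{host}:{port}"


def endpoint_urls(base_url: str, thread_id: str) -> List[str]:
    """Concrete URLs for the required endpoints (thread_id fills the events path)."""
    return [base_url + path.format(thread_id=thread_id) for path in REQUIRED_ENDPOINTS]


def python_compatible(runtime: Tuple[int, int], base: Tuple[int, int]) -> bool:
    """Python 3.12+ and the same major.minor as the base image."""
    return runtime >= (3, 12) and runtime == base
```

For the running interpreter, pass `tuple(sys.version_info[:2])` as `runtime`.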

Changes

Runner image contract & ProjectSettings override

Layer / File(s)	Summary
AG-UI HTTP contract and required endpoints `specs/agents/runner-image.spec.md` (lines 1–173)	Defines AG-UI interface on `AGUI_PORT` (default 8001) and required endpoints: `/`, `/interrupt`, `/health`, `/capabilities`, `/events/{thread_id}`; response formats must not be removed/changed; remaining endpoints inherit from `ambient_runner`.
Runtime, filesystem, and packaging constraints `specs/agents/runner-image.spec.md` (lines 1–173)	Requires Python 3.12+ and `ambient_runner` package; runner process must preserve base image Python major.minor; enforces preservation of paths (`/workspace`, `/app`, `/app/ambient-runner`, `/app/vertex`, `/tmp`) and forbids removing/relocating them.
Entrypoint, signal handling, and startup behavior `specs/agents/runner-image.spec.md` (lines 1–173)	Custom images must not override CMD/ENTRYPOINT; wrappers allowed only if they `exec` the runner, ensure listener on `AGUI_PORT`, propagate SIGTERM for graceful shutdown (PID 1 or child of PID 1), and start within configured startup timeout.
Control-plane injected env vars and UID/security constraints `specs/agents/runner-image.spec.md` (lines 1–173)	Lists env vars that must not be overridden (e.g., `SESSION_ID`, `PROJECT_NAME`, `WORKSPACE_PATH`, `AGUI_PORT`, backend/grpc/token vars, `INITIAL_PROMPT`, `IS_RESUME`, `CREDENTIAL_IDS`, `RUNNER_TYPE`); requires UID 1001/non-root execution, `allowPrivilegeEscalation: false`, capability drops; root allowed in build stages only.
ProjectSettings fields and selection precedence `specs/agents/runner-image.spec.md` (lines 175–305)	Adds `runner_image` and `runner_image_pull_secret` to ProjectSettings; defines selection precedence (ProjectSettings > agent registry defaults > operator `RUNNER_IMAGE`), without altering agent-type-specific settings like `RUNNER_TYPE` or resource limits; changes apply only to new sessions.
Image validation, allowlist, and pull policy `specs/agents/runner-image.spec.md` (lines 175–305)	Specifies image syntax/host validation and optional allowlist via `RUNNER_IMAGE_ALLOWED_REGISTRIES`; injects `imagePullSecrets` from `runner_image_pull_secret`; imagePullPolicy = `IfNotPresent` for digests and `localhost/` references, `Always` otherwise.
RBAC and operational constraints `specs/agents/runner-image.spec.md` (lines 175–305)	Documents RBAC requirement: only principals with `project_settings:update` may modify `runner_image`/`runner_image_pull_secret`; notes that updates affect new sessions only.
Failure modes and session state transitions `specs/agents/runner-image.spec.md` (lines 306–346)	Enumerates failure cases and outcomes: health/readiness timeout → session `Failed`; startup crashes → `Failed`; missing bridge for `RUNNER_TYPE` → session error; image pull failures → error/backoff and `Failed` as applicable.
Security boundary and isolation expectations `specs/agents/runner-image.spec.md` (lines 347–356)	States assumed controls: NetworkPolicy isolation, per-turn credential fetching, per-session ServiceAccount isolation; clarifies these are platform responsibilities around the image contract.
Base image publishing and contract labeling `specs/agents/runner-image.spec.md` (lines 359–379)	Requires base image to include OCI label `io.ambient-code.runner-contract-version="1"`; mismatch emits a warning (non-blocking) on pod creation; includes example publishing guidance.
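The validation and pull-policy rules in the table can be sketched as pure functions. Assumptions, labeled: helper names are hypothetical; `RUNNER_IMAGE_ALLOWED_REGISTRIES` is assumed to be a comma-separated host list where "unset" means no restriction; and registry extraction follows the common Docker reference convention (the first path component is a host only if it contains `.` or `:` or is `localhost`, otherwise `docker.io`). The pull-policy rule itself comes from the spec.

```python
import os
from typing import Mapping, Optional


def image_registry(ref: str) -> str:
    """Registry host for an image reference, Docker-style (defaults to docker.io)."""
    first, sep, _ = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def pull_policy(ref: str) -> str:
    """IfNotPresent for digest-pinned or localhost/ images, Always otherwise."""
    if "@" in ref or ref.startswith("localhost/"):  # "@" only appears in digest refs
        return "IfNotPresent"
    return "Always"


def registry_allowed(ref: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Check the optional allowlist (assumed comma-separated hosts)."""
    env = os.environ if env is None else env
    raw = env.get("RUNNER_IMAGE_ALLOWED_REGISTRIES", "")
    allowed = {host.strip() for host in raw.split(",") if host.strip()}
    if not allowed:  # assumption: no allowlist configured means no restriction
        return True
    return image_registry(ref) in allowed
```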

🚥 Pre-merge checks | ✅ 8

✅ Passed checks (8 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	Title follows Conventional Commits format (feat(specs): description) and accurately describes the main change: adding a new runner image specification file.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Performance And Algorithmic Complexity	✅ Passed	Documentation-only PR (379-line spec file added). No code modifications, no algorithmic implementations, no data structures, loops, caching, or API patterns. Performance check not applicable.
Security And Secret Handling	✅ Passed	No hardcoded secrets, tokens, or injection vulnerabilities. Spec mandates CP-injected env vars, per-turn credentials, per-session SA isolation, RBAC enforcement, and network isolation.
Kubernetes Resource Safety	✅ Passed	PR is documentation-only (spec file, no actual Kubernetes resources). Spec documents security context, resource limits, RBAC, and namespace scoping requirements that implementations must follow.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

specs/agents/runner-image.spec.md (2)
93-93: 💤 Low value

Clarify path description to avoid confusion.

The phrase "MUST contain installed ambient_runner package" could be misread to mean the pip package must be installed at /app/ambient-runner, when it actually means this directory contains the application code (main.py) that imports the package installed elsewhere in site-packages.
📝 Clearer phrasing
-| `/app/ambient-runner` | Runner package source and working directory | MUST contain installed `ambient_runner` package |
+| `/app/ambient-runner` | Runner application root and working directory | MUST contain main.py and application code; requires `ambient_runner` package installed via pip |
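The distinction in this nitpick (application code under `/app/ambient-runner`, package resolved from site-packages) can be checked from inside the image with `importlib.util.find_spec`. A minimal sketch; the helper name is hypothetical, and it assumes a regular (non-editable) pip install as the comment describes; an editable install would resolve elsewhere.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """File a top-level module would be loaded from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None
```

Inside the runner image, `"site-packages" in (module_origin("ambient_runner") or "")` would be expected to hold under that assumption.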
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/agents/runner-image.spec.md` at line 93, The spec line for
`/app/ambient-runner` is ambiguous about where the pip-installed ambient_runner
resides; update the wording so it clearly states that `/app/ambient-runner`
contains the application source (e.g., main.py) which imports the
`ambient_runner` package installed in site-packages, not that the pip package
itself is installed at that path; reference the `/app/ambient-runner` directory,
the application entrypoint `main.py`, and the `ambient_runner` package in the
revised sentence to make this distinction explicit.
461-461: ⚡ Quick win

Consider blocking contract version mismatches by default.

The spec makes version checking advisory-only (CP logs warning but creates pod anyway). However, if a custom image uses contract v2 with breaking changes and the CP expects v1, the session will fail unpredictably at runtime rather than being rejected upfront.
💡 Alternative design

Make blocking the default with operator opt-in for mismatches:
-The CP MAY read this label at pod creation time and log a warning if the contract version does not match the expected version. This is advisory — the CP SHALL NOT block pod creation based on contract version mismatch.
+The CP SHALL read this label at pod creation time. If the contract version does not match the expected version, the CP SHALL transition the session to `Failed` with a condition describing the mismatch UNLESS the operator has set `ALLOW_CONTRACT_VERSION_MISMATCH=true`.
This preserves flexibility for operators who explicitly opt in while preventing accidental incompatibilities.
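The proposed behavior can be sketched as a decision function over image labels. Assumptions, labeled: `check_contract` and `ContractDecision` are hypothetical names, a missing label is treated as a mismatch, and `allow_mismatch` stands in for the suggested `ALLOW_CONTRACT_VERSION_MISMATCH=true` opt-in. The label key and expected value `"1"` come from the spec.

```python
from typing import Mapping, NamedTuple, Optional

CONTRACT_LABEL = "io.ambient-code.runner-contract-version"


class ContractDecision(NamedTuple):
    allow_pod: bool
    warning: Optional[str]


def check_contract(labels: Mapping[str, str], expected: str = "1",
                   allow_mismatch: bool = False) -> ContractDecision:
    """Block on mismatch by default; the operator opt-in restores advisory mode."""
    found = labels.get(CONTRACT_LABEL)  # assumption: missing label counts as mismatch
    if found == expected:
        return ContractDecision(True, None)
    msg = f"runner contract version {found!r} does not match expected {expected!r}"
    return ContractDecision(allow_mismatch, msg)
```

Setting `allow_mismatch=True` reproduces the spec's current advisory behavior (warn, still create the pod).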
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/agents/runner-image.spec.md` at line 461, Update the sentence about
contract-version handling so the Control Plane (CP) SHALL by default reject pod
creation on a contract version mismatch instead of merely warning; add a clear
operator-configurable override (e.g., an "allowContractMismatch" opt-in flag)
that, when enabled, permits the previous advisory behavior and logs a warning;
ensure the wording references the "contract version" label and the CP's behavior
at "pod creation" so readers can locate and implement the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specs/agents/runner-image.spec.md`:
- Around line 274-276: Document that ProjectSettings.runner_image can override
image but not agent-type config (RUNNER_TYPE, resource limits, state dir) and
add a Failure Modes entry describing the cryptic Python import error when a
custom image lacks the required bridge implementation (e.g., ClaudeBridge,
GeminiCLIBridge, LangGraphBridge) for the session's runner type; update the
recommendations to advise building custom images FROM the standard base to
inherit all bridges and add a runtime validation step in the session creation
flow (where ProjectSettings.runner_image is applied) that inspects the image or
performs a quick probe to confirm the presence of the required bridge for the
requested RUNNER_TYPE and surface a clear, actionable error if missing.
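The suggested bridge probe could run inside the custom image and turn a cryptic import failure into an actionable message. A sketch under loud assumptions: the `RUNNER_TYPE` keys and module paths in `BRIDGES` are hypothetical (the review names only the classes `ClaudeBridge`, `GeminiCLIBridge`, and `LangGraphBridge`), and `missing_bridge_error` is an illustrative name.

```python
import importlib
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical RUNNER_TYPE -> (module, class) mapping; only the class names
# come from the review comment.
BRIDGES: Dict[str, Tuple[str, str]] = {
    "claude": ("ambient_runner.bridges.claude", "ClaudeBridge"),
    "gemini-cli": ("ambient_runner.bridges.gemini_cli", "GeminiCLIBridge"),
    "langgraph": ("ambient_runner.bridges.langgraph", "LangGraphBridge"),
}


def missing_bridge_error(runner_type: str,
                         bridges: Mapping[str, Tuple[str, str]] = BRIDGES) -> Optional[str]:
    """Return a clear error if this image cannot provide the bridge, else None."""
    if runner_type not in bridges:
        return f"unknown RUNNER_TYPE {runner_type!r}"
    module_name, class_name = bridges[runner_type]
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:  # includes ModuleNotFoundError
        return f"image lacks bridge {class_name} for RUNNER_TYPE={runner_type!r}: {exc}"
    if not hasattr(module, class_name):
        return f"module {module_name!r} has no {class_name} for RUNNER_TYPE={runner_type!r}"
    return None
```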

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 34cd54f5-c174-486c-a499-0113c9af9cf5

📥 Commits

Reviewing files that changed from the base of the PR and between 28874a9 and 0add287.

📒 Files selected for processing (1)

specs/agents/runner-image.spec.md

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specs/agents/runner-image.spec.md`:
- Around line 154-164: The spec enforces a contradictory UID requirement: it
mandates a fixed UID 1001 via Dockerfile `USER 1001` while also recommending
OpenShift arbitrary-UID compatibility (e.g. `chmod -R g=u`), which conflicts
under restrictive SCCs; change the normative contract to require non-root
runtime behavior (`runAsNonRoot: true`, `allowPrivilegeEscalation: false`,
`drop: ["ALL"]` and no root at runtime) and demote `UID 1001`/`Dockerfile USER
1001` to a base-image default or recommendation, keeping the OpenShift
compatibility guidance (`chmod -R g=u` on writable paths) as a SHOULD rather
than a SHALL so implementations can satisfy `runAsNonRoot` without a fixed UID.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7fd1e84c-8a85-4262-a291-b54a4719e4c9

📥 Commits

Reviewing files that changed from the base of the PR and between 0add287 and e576f72.

📒 Files selected for processing (1)

specs/agents/runner-image.spec.md

coderabbitai · 2026-05-12T15:32:46Z

+A custom runner image SHALL run as UID 1001 with no root privileges.
+
+| Constraint | Enforced by |
+|------------|-------------|
+| UID 1001 | Dockerfile `USER 1001` |
+| `runAsNonRoot: true` | Pod SecurityContext |
+| `allowPrivilegeEscalation: false` | Pod SecurityContext |
+| `drop: ["ALL"]` capabilities | Pod SecurityContext |
+
+Custom images MAY use `USER 0` during build stages for installing system packages, provided the final `USER` directive sets UID 1001. Custom images SHOULD include OpenShift arbitrary-UID compatibility (`chmod -R g=u` on writeable paths).
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve UID contract contradiction for OpenShift compatibility

Line 154 mandates a fixed UID (1001), but Line 163 simultaneously recommends OpenShift arbitrary-UID compatibility. These are mutually inconsistent as a normative contract and can lead to incompatible implementations under restricted SCC.

Use a non-root contract as normative (runAsNonRoot, no privilege escalation, dropped caps), and make 1001 a base-image default rather than a hard runtime requirement.

Proposed spec wording change

-A custom runner image SHALL run as UID 1001 with no root privileges.
+A custom runner image SHALL run as non-root with no root privileges. The base image default runtime user is UID 1001, but deployments MAY run with an arbitrary non-root UID (e.g., OpenShift restricted SCC).
 | Constraint | Enforced by |
 |------------|-------------|
-| UID 1001 | Dockerfile `USER 1001` |
+| Non-root runtime user (default: UID 1001) | Dockerfile `USER 1001` + Pod SecurityContext |
 | `runAsNonRoot: true` | Pod SecurityContext |
 | `allowPrivilegeEscalation: false` | Pod SecurityContext |
 | `drop: ["ALL"]` capabilities | Pod SecurityContext |
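Under the non-root contract proposed here, a container `securityContext` can be validated without pinning UID 1001. A minimal sketch; the helper name is hypothetical, while the field names (`runAsNonRoot`, `allowPrivilegeEscalation`, `capabilities.drop`, `runAsUser`) are the standard Kubernetes ones. Any non-zero UID, such as an OpenShift-assigned one, passes.

```python
from typing import Any, List, Mapping


def security_context_problems(ctx: Mapping[str, Any]) -> List[str]:
    """List violations of the non-root contract in a container securityContext dict."""
    problems: List[str] = []
    if ctx.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot must be true")
    if ctx.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation must be false")
    drops = (ctx.get("capabilities") or {}).get("drop") or []
    if "ALL" not in drops:
        problems.append('capabilities.drop must include "ALL"')
    # No fixed UID is required; only root (0) is rejected.
    if ctx.get("runAsUser") == 0:
        problems.append("runAsUser must not be 0")
    return problems
```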

coderabbitai Bot reviewed May 12, 2026

Comment thread specs/agents/runner-image.spec.md Outdated

feat(specs): add custom runner image specification

2308ab4

Define the stable runner contract and a ProjectSettings-driven image override so workspace admins can layer tools onto the base runner via Dockerfile FROM. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jbpratt force-pushed the spec/custom-runner-image branch from b49e8eb to 2308ab4 on May 12, 2026 13:50

Merge branch 'main' into spec/custom-runner-image

e576f72

coderabbitai Bot reviewed May 12, 2026

mergify Bot added 5 commits May 12, 2026 15:50

Merge branch 'main' into spec/custom-runner-image

25d29a8

Merge branch 'main' into spec/custom-runner-image

89d7e4e

Merge branch 'main' into spec/custom-runner-image

7b710c2

Merge branch 'main' into spec/custom-runner-image

e4c9e26

Merge branch 'main' into spec/custom-runner-image

54f94db

jbpratt wants to merge 7 commits into ambient-code:main from jbpratt:spec/custom-runner-image
