
Delegate urls flag #1591

Open
maknop wants to merge 78 commits into ambient-code:main from RedHatInsights:delegate_urls_flag

Conversation

Contributor

@maknop maknop commented May 15, 2026

Summary of Changes

File modified: components/manifests/templates/template-services.yaml:692

Change:

  • --scope=user:full
  • --openshift-delegate-urls={"/":{"resource":"projects","verb":"list"}}
  • --upstream-timeout=5m
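
In context, the sidecar's argument list would look roughly like this (a sketch showing only the flags discussed here; the real container spec in template-services.yaml carries many more args and fields):

```yaml
# Excerpt-style sketch of the oauth-proxy sidecar args; everything except the
# three flags themselves is illustrative.
containers:
  - name: oauth-proxy
    args:
      - --scope=user:full
      - --openshift-delegate-urls={"/":{"resource":"projects","verb":"list"}}
      - --upstream-timeout=5m
```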

What This Fixes

The --openshift-delegate-urls flag makes the OAuth proxy validate the user's access token on every request by checking if they have permission to list projects. This ensures:

  1. Expired tokens are caught immediately at the proxy level instead of being forwarded to the backend
  2. Better error messages - users get redirected to re-authenticate instead of seeing "Token expired or invalid"
  3. Consistent behavior with the production overlay (which already has this flag)

Why This Solves the ~2 Minute Issue

Without this flag:

  • OAuth proxy only validates authentication (logged in?) but not authorization
  • Expired tokens are passed through to the backend
  • Backend discovers the expired token only when SSAR cache expires (30s intervals)
  • Creates the cascading error pattern you were seeing

With this flag:

  • OAuth proxy validates both authentication AND authorization on every request
  • Expired/invalid tokens are rejected at the proxy layer
  • User is automatically redirected to re-authenticate
  • No cascading errors

Summary by CodeRabbit

  • New Features

    • Added OAuth authentication proxy for frontend secure access.
    • Introduced comprehensive CI/CD pipeline configurations for automated component builds.
    • Added deployment templates for operator and application services on Kubernetes.
  • Chores

    • Upgraded frontend Node.js runtime to version 24.
    • Standardized database secret naming across deployments.
    • Updated database configuration for external RDS connectivity with SSL.

red-hat-konflux and others added 30 commits April 22, 2026 08:44
Signed-off-by: red-hat-konflux <konflux@no-reply.konflux-ci.dev>
Creates kustomize overlay for deploying to hcmais01ue1 via app-interface:
- Uses Konflux images from redhat-services-prod/hcm-eng-prod-tenant
- Scales down in-cluster databases (using external RDS from app-interface Phase 2)
- Scales down MinIO (using external S3 from app-interface Phase 2)
- Includes CRDs, RBAC, routes, and all application components
- Patches operator to use Konflux runner image

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Convert kustomize overlay to OpenShift Template format for app-interface
SaaS deployment. Split into two templates:

1. template-operator.yaml (CRDs, ClusterRoles, operator deployment)
   - Operator and ambient-runner images
   - Cluster-scoped resources (CRDs, RBAC)
   - Operator deployment and its ConfigMaps

2. template-services.yaml (Application services)
   - Backend, frontend, public-api, ambient-api-server images
   - All deployments, services, routes, configmaps
   - Scales in-cluster services to 0 (minio, postgresql, unleash)

Both templates use IMAGE_TAG parameter (auto-generated from git commit SHA)
and support Konflux image gating through app-interface.

This allows app-interface to use provider: openshift-template with
proper parameter substitution instead of the directory provider which
doesn't run kustomize build.
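The templates can be exercised locally the same way the openshift-template provider does, via `oc process` (a sketch; IMAGE_TAG is the parameter named above, and the value shown here is a placeholder):

```shell
# Sketch: render the services template locally with oc process.
# IMAGE_TAG is normally auto-generated from the git commit SHA.
oc process -f components/manifests/templates/template-services.yaml \
  -p IMAGE_TAG=deadbee --local -o yaml

# Dry-run apply of the rendered objects (drop --dry-run to deploy):
oc process -f components/manifests/templates/template-services.yaml \
  -p IMAGE_TAG=deadbee --local -o yaml | oc apply --dry-run=client -f -
```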
The objects field must be a YAML array with proper list indicators.
Previous version was missing the '-' prefix on array items, causing:
'unable to decode STDIN: json: cannot unmarshal object into Go struct
field Template.objects of type []runtime.RawExtension'

Changes:
- Rebuild templates using Python yaml library for correct formatting
- Objects now properly formatted as YAML array with '- apiVersion:'
- Add validate.sh script for testing with oc process
- Both templates validated successfully

Generated from kustomize overlay output with proper YAML structure.
Remove minio, postgresql, unleash, ambient-api-server-db.
Using external RDS and S3 from app-interface.

Removed 12 resources (4 Deployments, 4 Services, 3 PVCs, 1 Secret)
Remaining: ambient-api-server, backend-api, frontend, public-api
Disables OTEL metrics export by commenting out OTEL_EXPORTER_OTLP_ENDPOINT
environment variable in operator deployment manifests.

The operator was configured to send metrics to otel-collector.ambient-code.svc:4317,
but this service does not exist in the cluster, causing repeated gRPC connection
failures every 30 seconds with error:
"failed to upload metrics: context deadline exceeded: rpc error: code = Unavailable
desc = name resolver error: produced zero addresses"

With OTEL_EXPORTER_OTLP_ENDPOINT unset, InitMetrics() will skip metrics export
and log "metrics export disabled" instead of throwing connection errors.

Changes:
- Comment out OTEL_EXPORTER_OTLP_ENDPOINT in base operator deployment
- Comment out OTEL_EXPORTER_OTLP_ENDPOINT in OpenShift template
- Add clarifying comment about re-enabling when collector is deployed
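
A minimal sketch of what the commented-out variable looks like in the manifest (illustrative; the exact placement in the deployment spec may differ):

```yaml
env:
  # Re-enable once an OTLP collector is deployed in the cluster:
  # - name: OTEL_EXPORTER_OTLP_ENDPOINT
  #   value: otel-collector.ambient-code.svc:4317
```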

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
- Add oauth-proxy component to frontend deployment (dashboard-ui port on 8443)
- Enable SSL for ambient-api-server RDS connection (db-sslmode=require)
- Set AMBIENT_ENV to 'stage' for ambient-api-server
- Enable OpenShift service-ca for ambient-api-server TLS cert provisioning
- Regenerate templates with new oauth-proxy and api-server patches

This enables:
- Authenticated access to frontend via OpenShift OAuth
- Secure connections to external RDS database
- Automatic TLS certificate rotation for ambient-api-server

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove postgresql, minio, unleash, and ambient-api-server-db resources
from the services template. These services are scaled to 0 via kustomize
patches because we use external RDS and S3 instead.

Including them in the template causes app-interface to try deploying
them, which fails imagePattern validation and wastes resources.

Excluded resources:
- Deployment/postgresql, Service/postgresql
- Deployment/minio, Service/minio, PVC/minio-data
- Deployment/unleash, Service/unleash
- Deployment/ambient-api-server-db, Service/ambient-api-server-db

Template now has 21 service resources (down from 30).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Switch from custom vault secrets to OpenShift service account-based OAuth:
- Use Red Hat's official ose-oauth-proxy-rhel9 image
- Use service account token for cookie secret (no vault needed)
- Enable HTTPS on OAuth proxy with OpenShift service-ca auto-generated certs
- Add system:auth-delegator ClusterRoleBinding for OAuth delegation
- Add OAuth redirect reference annotation to frontend ServiceAccount
- Fix service account reference from 'nginx' to 'frontend'
- Add missing NAMESPACE and UPSTREAM_TIMEOUT parameters

Benefits:
- No manual vault secret management
- Automatic TLS cert rotation via service-ca
- Standard OpenShift OAuth integration pattern
- Follows app-interface team recommendations

Files changed:
- frontend-rbac.yaml: Added OAuth annotations and auth-delegator binding
- oauth-proxy component patches: Updated to new configuration
- Templates: Regenerated with OAuth fixes (27 operator, 21 service resources)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The RDS credentials secret should not be in the OpenShift template - it's
provided by the external resource provider (terraform) in app-interface.

The namespace's externalResources section already defines:
  - provider: rds
    output_resource_name: ambient-code-rds

This automatically creates the secret with the correct RDS credentials.
Including the secret in the template with VAULT_INJECTED placeholders
caused deployment failures.

Changes:
- Excluded ambient-code-rds secret from template generation
- Template now has 20 service resources (down from 21)
- Deployment still references the secret via volumeMount (correct)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Chris Mitchell <cmitchel@redhat.com>
Signed-off-by: Chris Mitchell <cmitchel@redhat.com>
Changes GCP service account configuration to align with app-interface
deployment where credentials are provided via Vault.

Changes:
- template-services.yaml: Update backend vertex-credentials secret name
  from 'ambient-vertex' to 'stage-gcp-creds' (matches Vault secret)
- template-operator.yaml: Update GOOGLE_APPLICATION_CREDENTIALS path
  to match Vault secret key name 'itpc-gcp-hcm-pe-eng.json'

The secret is provided by app-interface via:
  path: engineering-productivity/ambient-code/stage-gcp-creds

This allows the backend and operator to use Vertex AI for Claude and
Gemini API calls with the service account configured with
roles/aiplatform.user permissions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Signed-off-by: Chris Mitchell <cmitchel@redhat.com>
Configure OAuth proxy sidecar to inject authentication token into
forwarded requests, fixing 401 errors on /api/projects endpoints.

Changes:
- Add --pass-access-token=true flag to inject X-Forwarded-Access-Token header
- Change upstream from frontend-service:3000 to localhost:3000 (correct sidecar pattern)
- Remove --request-logging to reduce log noise

Backend logs showed:
  tokenSource=none hasAuthHeader=false hasFwdToken=false

The backend expects the X-Forwarded-Access-Token header, which is now
injected by the OAuth proxy for all authenticated requests.
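
A sketch of the relevant sidecar args after this change (only the flags named above; the full list lives in the oauth-proxy patches):

```yaml
args:
  - --pass-access-token=true          # injects X-Forwarded-Access-Token upstream
  - --upstream=http://localhost:3000  # sidecar pattern: same-pod frontend
```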

Flow:
1. User authenticates via OpenShift OAuth ✓
2. OAuth proxy injects token header ✓ (new)
3. Frontend forwards token to backend API ✓ (fixed)

This resolves the 401 authentication errors while maintaining the
working OpenShift OAuth integration.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Removed the '--set-authorization-header=true' option from the configuration.
wcmitchell and others added 27 commits May 4, 2026 17:06
…sterrole

fix: add MLflow permissions to agentic-operator ClusterRole
The operator needs to create NetworkPolicies in user namespaces to
isolate runner pods. Without this permission, session creation fails
with:

  networkpolicies.networking.k8s.io is forbidden:
  User "system:serviceaccount:ambient-code:agentic-operator"
  cannot create resource "networkpolicies" in API group
  "networking.k8s.io" in the namespace "mknop-ws"

This adds create/delete/get/list permissions for NetworkPolicies
to the agentic-operator ClusterRole.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Configure oauth-proxy to route /api/* requests to backend-service instead
of the Next.js frontend. Without this routing, all requests including /api/*
go to localhost:3000, causing 503 errors because Next.js doesn't handle
backend API routes.

Changes:
- Add --upstream=http://backend-service:8080/api/ before default upstream
- Requests to /api/* now route to backend-service:8080
- All other requests continue to Next.js frontend at localhost:3000

OAuth2-proxy processes upstreams in order and uses the path portion as a
matching key. The /api/ path in the upstream URL matches any request
starting with /api/, and the full request path is forwarded to the backend.

Request flow example:
  Browser: GET https://ambient.corp.stage.redhat.com/api/projects/foo/sessions/bar
  → OAuth-proxy checks auth via --openshift-delegate-urls
  → Matches --upstream=http://backend-service:8080/api/ (longest match)
  → Forwards to: http://backend-service:8080/api/projects/foo/sessions/bar
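
The longest-prefix matching described above can be sketched in a few lines (illustrative only, not oauth2-proxy's actual implementation; the upstream URLs are the ones from this change, the function name is made up):

```python
# Ordered upstreams keyed by path prefix, mirroring the two --upstream flags.
UPSTREAMS = [
    ("/api/", "http://backend-service:8080"),  # --upstream=http://backend-service:8080/api/
    ("/", "http://localhost:3000"),            # default Next.js frontend upstream
]

def route(path: str) -> str:
    """Pick the upstream whose path prefix is the longest match, then forward
    the full request path unchanged."""
    best = max((p for p, _ in UPSTREAMS if path.startswith(p)), key=len)
    return dict(UPSTREAMS)[best] + path
```

With this, `/api/projects/...` lands on backend-service while everything else stays on the frontend.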

Fixes browser console errors:
  GET /api/projects/.../git/status [503 Service Unavailable]
  AG-UI stream error: Connection error
  The connection to .../agui/events was interrupted

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Signed-off-by: Chris Mitchell <cmitchel@redhat.com>
fix: add backend API routing to oauth-proxy upstream
Remove --openshift-delegate-urls parameter from oauth-proxy that was
blocking /api/* requests with "no resource mapped path" errors.

Issue:
- openshift-delegate-urls={"/api":{"resource":"projects","verb":"list"}}
  only matches /api exactly, not /api/* subpaths
- All /api/* requests were returning 503 even though backend received
  and processed them successfully (200 OK in backend logs)
- oauth-proxy logs showed: "no resource mapped path"

Solution:
OAuth-proxy still provides authentication (OAuth login required for all
requests) and passes the access token to the backend via --pass-access-token.
The backend handles its own fine-grained authorization based on the token,
so the blanket openshift-delegate-urls check is redundant and overly
restrictive.

Authorization flow after this change:
1. User authenticates via OAuth (enforced by oauth-proxy)
2. oauth-proxy passes access token to backend
3. Backend validates token and checks user permissions per endpoint
4. Backend returns appropriate response (200, 403, 404, etc.)

This matches the backend's existing authorization model where different
API endpoints have different permission requirements that can't be
expressed in a single openshift-delegate-urls pattern.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…urls

fix: remove overly restrictive openshift-delegate-urls check
increased initial prompt delay to 10 seconds (INITIAL_PROMPT_DELAY_SECONDS=10)
Kubernetes Health Probes:
- Added readiness probe (3s initial delay, 5s period)
- Added liveness probe (20s initial delay, 30s period)
- Prevents Service routing traffic before FastAPI is ready
- Reduces 503 "runner unavailable" errors
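
With the stated delays and periods, the probe stanza would look roughly like this (the /health path appears elsewhere in this PR's operator changes; the port is an assumption):

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080        # assumed runner port
  initialDelaySeconds: 3
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 20
  periodSeconds: 30
```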

Error Logging Improvements:
- Enhanced retry error logging to include exception type
- Previously logged empty strings for exceptions like asyncio.TimeoutError
- Now logs: "error: TimeoutError: <details>" instead of "error: "
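
The logging change boils down to including the exception's type name, since exceptions like asyncio.TimeoutError stringify to an empty message (a minimal sketch; the helper name is made up):

```python
import asyncio

def format_error(exc: BaseException) -> str:
    """Render an exception for logs as 'TypeName: message' so that exceptions
    with empty messages still produce a useful log line."""
    return f"{type(exc).__name__}: {exc}"

# Before: str(asyncio.TimeoutError()) is "" -> the log line read 'error: '
# After:  'error: TimeoutError: '
```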

Benefits:
- Prevents premature traffic routing to starting pods
- More informative error logs for debugging
- Better system resilience through health probes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Allows overriding the GCP project used for Vertex AI API calls.
Defaults to itpc-gcp-hcm-pe-eng-claude where service account
has proper aiplatform.user permissions.

Signed-off-by: Chris Mitchell <cmitchel@redhat.com>
…ertex_id

feat: parameterize ANTHROPIC_VERTEX_PROJECT_ID
…improve-logging

fix(runner): add health probes and improve INITIAL_PROMPT error logging
Add NetworkPolicy to deploy templates allowing ingress traffic from
runner pods to backend-api. This resolves connectivity issues where
runner pods in user namespaces cannot reach backend-service due to
default-deny NetworkPolicies.

The NetworkPolicy:
- Targets backend-api pods specifically (vs. all pods)
- Allows ingress from ambient-code-runner pods across all namespaces
- Uses ${NAMESPACE} template parameter for proper scoping
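
A hedged sketch of such a NetworkPolicy (the policy name and selector label keys are assumptions; the real manifest may use different labels):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-runner-to-backend   # illustrative name
  namespace: ${NAMESPACE}
spec:
  podSelector:
    matchLabels:
      app: backend-api            # targets backend-api pods specifically
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector: {}   # runner pods live in user namespaces
          podSelector:
            matchLabels:
              app: ambient-code-runner
```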

Based on upstream PR: ambient-code#1553

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…emplate

feat: add NetworkPolicy to allow runner pod ingress
updated resource requests for operator/runner
Configure OAuth proxy to proactively refresh authentication cookies
before token expiration to prevent "Token expired or invalid" errors
during long-running sessions.

Changes:
- Added --cookie-expire=24h to set session lifetime
- Added --cookie-refresh=1h to refresh tokens every hour
- Ensures tokens are refreshed before the typical 24h expiration

This prevents users from encountering authentication errors mid-session
when the Kubernetes access token expires. The 1-hour refresh interval
provides a safety margin while maintaining security best practices.

Fixes token expiration issue where users get "Token expired or invalid"
error after chatting with Claude for extended periods.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…e-refresh

fix: add OAuth proxy cookie refresh to prevent token expiration
Update the OAuth proxy upstream configuration in the frontend
deployment from http://backend-service:8080/api/ to
http://localhost:3000 to align with the local frontend service.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…onfig

fix: update oauth proxy upstream to localhost:3000
Updates 11 Tekton task bundles to resolve enterprise contract warnings:
- apply-tags, build-image-index, buildah-oci-ta, clamav-scan
- coverity-availability-check, ecosystem-cert-preflight-checks
- git-clone-oci-ta, init, prefetch-dependencies-oci-ta
- push-dockerfile-oci-ta, rpms-signature-scan

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
chore(konflux): update task bundle SHAs to latest versions
Konflux is flagging our frontend image as deprecated, likely because
there isn't a current Node.js 20 image anymore. Updating to 24.

Signed-off-by: Chris Mitchell <cmitchel@redhat.com>
Signed-off-by: Chris Mitchell <cmitchel@redhat.com>

netlify Bot commented May 15, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

🔨 Latest commit: a9ac6d9
🔍 Latest deploy log: https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/6a076b63170f6700089d0397


coderabbitai Bot commented May 15, 2026

📝 Walkthrough


This PR consolidates infrastructure and deployment configuration by introducing ten Tekton pipeline manifests for building five components (ambient-api-server, ambient-runner, backend, frontend, operator, public-api), migrating database secrets to a unified ambient-code-rds reference across all overlays, defining full OpenShift templates for operator and runtime services, upgrading frontend Node.js from 20 to 24, configuring OAuth proxy RBAC, and applying minor operator runtime adjustments.

Changes

Infrastructure & Deployment Configuration Consolidation

Layer / File(s) Summary
Tekton Build Pipelines for Five Components
.tekton/ambient-code-*-main-{pull-request,push}.yaml (10 files)
Introduces comprehensive Tekton PipelineRuns for building ambient-api-server, ambient-runner, backend, frontend, operator, and public-api. Each pipeline initializes proxy/cache settings, clones into OCI storage, optionally prefetches dependencies, builds via buildah with configurable arguments, builds image index and optional source image, conditionally runs security checks (Clair, ecosystem cert, SAST/Snyk, ClamAV, Coverity availability gating Coverity SAST, shell/unicode checks), applies tags, pushes dockerfile artifacts, and runs RPM signature scanning when checks enabled.
Database Secret Consolidation
components/manifests/base/platform/ambient-api-server-*.yml, components/manifests/components/ambient-api-server-db/*, components/manifests/overlays/*/, components/ambient-api-server/templates/db-template.yml
Renames ambient-api-server-db → ambient-code-rds across all overlays (base, local-dev, production, kind, app-interface). Updates volume references, init container env vars, and PostgreSQL Deployment secret keys. Adds app-interface RDS secret patch for Vault-backed external RDS connection.
OpenShift Templates: Operator & Services
components/manifests/templates/template-operator.yaml, components/manifests/templates/template-services.yaml
Defines operator template with AgenticSession/ProjectSettings CRDs, RBAC (ServiceAccounts, ClusterRoles, ClusterRoleBindings), ConfigMaps (agent registry, auth, features, models), NetworkPolicy, and agentic-operator Deployment. Defines services template with Namespace, Secrets, Services, LimitRange, PVCs, four Deployments (api-server with migrations, backend-api with OAuth/credential wiring, frontend with oauth-proxy sidecar, public-api with hardened security), PodDisruptionBudget, and Routes with TLS termination.
Frontend Node.js 20→24 & OAuth RBAC
components/frontend/Dockerfile, components/frontend/package.json, components/manifests/base/rbac/frontend-rbac.yaml, components/manifests/components/oauth-proxy/frontend-oauth-*.yaml
Upgrades frontend Dockerfile deps/builder/runner stages and @types/node to Node.js 24. Adds OAuth redirect annotation to frontend ServiceAccount and replaces custom ambient-frontend-auth ClusterRoleBinding with system:auth-delegator. Updates oauth-proxy sidecar image, args (client-id, secret paths, cookie/upstream settings), HTTPS probes, and TLS volume mounting frontend-proxy-tls.
App-Interface Overlay
components/manifests/overlays/app-interface/*, components/manifests/overlays/app-interface/kustomization.yaml
Configures app-interface overlay with Routes (ambient-api-server REST/gRPC, backend, frontend, public-api), Kustomization patching operator image to Konflux runner, scaling internal deps to zero, enabling RDS SSL/service-ca TLS, overriding component images to quay.io/redhat-services-prod, Namespace/labels, and Vertex AI ConfigMap.
Operator OTel & Session Configuration
components/operator/internal/controller/otel_metrics.go, components/operator/internal/handlers/sessions.go
Initializes no-op meter instruments when OTLP disabled and logs no-op state. Simplifies runner container readiness/liveness probes by removing timeout/failure-threshold. Adds INITIAL_PROMPT_DELAY_SECONDS=10 env var to runner.
Production & Local Overlays
components/manifests/overlays/production/ambient-api-server-*-patch.yaml, components/manifests/overlays/production/kustomization.yaml, components/manifests/overlays/local-dev/*, components/manifests/overlays/kind/*
Adds production RDS SSL patches (jwt-args and migration-ssl) with --db-sslmode=require. Updates kustomization to apply migration SSL patch. Updates local-dev and kind overlays to reference ambient-code-rds secret.
Documentation & Validation
components/manifests/README.md, components/manifests/templates/validate.sh
Updates README with ambient-code-rds secret descriptions. Adds validate.sh script to validate OpenShift templates via oc process.

Suggested labels

ambient-code:needs-human


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

  • Security And Secret Handling: ❌ Error. K8s Secret missing OwnerReferences in ambient-api-server-db-secret-patch.yaml. The ambient-code-rds Secret lacks metadata.ownerReferences, risking orphaning and lifecycle drift per criterion #6. Resolution: add ownerReferences to the Secret resource to establish lifecycle ownership and prevent orphaning per coding guidelines.
  • Title check: ⚠️ Warning. The title 'Delegate urls flag' does not follow Conventional Commits format (type(scope): description); it is missing a type prefix and proper scope/description structure. Resolution: reformat the title, e.g., 'ci(manifests): delegate URLs flag for OAuth proxy validation' or 'feat(oauth-proxy): enable openshift-delegate-urls for per-request token validation'.
  • Kubernetes Resource Safety: ⚠️ Warning. Multiple resource safety violations: (1) Secret missing ownerReferences; (2) 4 containers lack securityContext entirely; (3) insecure securityContext on ambient-api-server. Resolution: add ownerReferences to Secret ambient-code-rds, add securityContext to the backend-api, frontend, oauth-proxy, and agentic-operator containers, and fix the ambient-api-server securityContext settings (readOnlyRootFilesystem, runAsNonRoot).
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient (required threshold: 80.00%).
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Performance And Algorithmic Complexity: ✅ Passed. No performance regressions found. Metrics init runs once at startup, probe tuning is config-only, the Node.js upgrade is infrastructure, and the manifests are declarative YAML.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
components/operator/internal/handlers/sessions.go (1)

1047-1094: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🚨 Duplicate ReadinessProbe and LivenessProbe fields prevent compilation.

The corev1.Container struct literal declares ReadinessProbe at lines 1047 and 1075, and LivenessProbe at lines 1059 and 1085. Go rejects duplicate field names in struct literals—the operator binary won't build.

Remove the first block (lines 1047–1072) and keep the simplified probes (lines 1075–1094).

Proposed fix
-			ReadinessProbe: &corev1.Probe{
-				ProbeHandler: corev1.ProbeHandler{
-					HTTPGet: &corev1.HTTPGetAction{
-						Path: "/health",
-						Port: intstr.FromInt32(runnerPort),
-					},
-				},
-				InitialDelaySeconds: 3,
-				PeriodSeconds:       5,
-				TimeoutSeconds:      2,
-				FailureThreshold:    3,
-			},
-			LivenessProbe: &corev1.Probe{
-				ProbeHandler: corev1.ProbeHandler{
-					HTTPGet: &corev1.HTTPGetAction{
-						Path: "/health",
-						Port: intstr.FromInt32(runnerPort),
-					},
-				},
-				InitialDelaySeconds: 20,
-				PeriodSeconds:       30,
-				TimeoutSeconds:      5,
-				FailureThreshold:    3,
-			},
-
 			VolumeMounts: runnerVolumeMounts,

 			// Health probes to prevent Service routing before FastAPI is ready
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/operator/internal/handlers/sessions.go` around lines 1047 - 1094,
The container spec in the corev1.Container struct literal contains duplicate
ReadinessProbe and LivenessProbe fields which causes a compile error; in the
block that sets up the runner container (the corev1.Container literal that
references runnerPort and runnerVolumeMounts) remove the first
ReadinessProbe/LivenessProbe pair (the one with InitialDelaySeconds:3/20,
PeriodSeconds:5/30, TimeoutSeconds and FailureThreshold values) and keep the
later simplified probe definitions (the second ReadinessProbe/LivenessProbe
pair); ensure only a single ReadinessProbe and a single LivenessProbe remain in
that container spec.
components/manifests/base/platform/ambient-api-server-db.yml (2)

22-33: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

PVC missing OwnerReferences.

Child resources like PVCs must have ownerReferences set to enable proper lifecycle management and garbage collection. As per coding guidelines, all child resources (Jobs, Secrets, PVCs) must have OwnerReferences with controller owner refs.

🔗 Recommended fix
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
   name: ambient-api-server-db-data
   labels:
     app: ambient-api-server
     component: database
+  ownerReferences:
+  - apiVersion: apps/v1
+    kind: Deployment
+    name: ambient-api-server-db
+    uid: <set-by-controller>
+    controller: true
+    blockOwnerDeletion: true
 spec:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/base/platform/ambient-api-server-db.yml` around lines 22
- 33, The PVC manifest for PersistentVolumeClaim named
"ambient-api-server-db-data" is missing metadata.ownerReferences; add a
metadata.ownerReferences array on that resource containing the owning
controller's apiVersion, kind, name and uid and set controller: true and
blockOwnerDeletion: true so the PVC is garbage-collected with its owner (use the
actual owner resource's values—e.g., the Database/StatefulSet/Deployment that
owns the PVC—for apiVersion, kind, name and uid). Ensure the ownerReference is
placed under metadata in the manifest for the resource
"ambient-api-server-db-data" so Jobs/Secrets/PVCs follow the project's
controller owner ref convention.

58-101: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

PostgreSQL container missing SecurityContext and resource limits.

Container violates mandatory security requirements:

  1. No securityContext (must have runAsNonRoot: true, drop ALL capabilities, readOnlyRootFilesystem)
  2. No resource requests/limits (unbounded resource usage)

As per coding guidelines, all containers in manifests must have restricted SecurityContext and resource limits/requests.

🛡️ Recommended fix
       containers:
         - name: postgresql
           image: postgres:16
+          securityContext:
+            runAsNonRoot: true
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop:
+              - ALL
+          resources:
+            requests:
+              cpu: 100m
+              memory: 256Mi
+            limits:
+              cpu: 500m
+              memory: 512Mi
           ports:

Note: Setting readOnlyRootFilesystem: true requires mounting a writable emptyDir volume at /var/lib/postgresql/data and /var/run/postgresql for PostgreSQL operation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/base/platform/ambient-api-server-db.yml` around lines 58
- 101, Add a restrictive securityContext and resource requests/limits to the
PostgreSQL container named "postgresql": set securityContext.runAsNonRoot: true,
securityContext.capabilities.drop: ["ALL"], and
securityContext.readOnlyRootFilesystem: true; add CPU/memory resource requests
and limits under resources (both requests and limits). Because
readOnlyRootFilesystem requires writable runtime/data mounts, change the
existing volumeMount "ambient-api-server-db-data" to use an emptyDir for
/var/lib/postgresql/data and add a second emptyDir volume mount for
/var/run/postgresql (ensure corresponding volume entries exist), leaving env
vars (POSTGRES_USER/POSTGRES_PASSWORD/POSTGRES_DB/PGDATA) and probes unchanged.
Ensure names match the container "postgresql" and volumeMount
"ambient-api-server-db-data".
components/manifests/base/platform/ambient-api-server-secrets.yml (1)

2-15: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Secrets missing OwnerReferences.

Both Secrets (ambient-code-rds and ambient-api-server) lack ownerReferences. This prevents proper garbage collection and lifecycle management. As per coding guidelines, all Kubernetes Secrets in manifests must have OwnerReferences.

🔗 Recommended fix
 apiVersion: v1
 kind: Secret
 metadata:
   name: ambient-code-rds
   labels:
     app: ambient-api-server
     component: database
+  ownerReferences:
+  - apiVersion: apps/v1
+    kind: Deployment
+    name: ambient-api-server-db
+    uid: <set-by-controller>
+    controller: true
+    blockOwnerDeletion: true
 type: Opaque

Apply similar pattern to the ambient-api-server Secret with appropriate owner (likely the ambient-api-server Deployment).

Also applies to: 18-27

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/base/platform/ambient-api-server-secrets.yml` around
lines 2 - 15, Add Kubernetes ownerReferences to both Secret manifests
(ambient-code-rds and ambient-api-server) so they are garbage-collected with
their owning Deployment; update each Secret to include an ownerReferences array
containing the owner's apiVersion, kind (Deployment), name (ambient-api-server),
uid (the Deployment's UID), and controller/blockOwnerDeletion flags; ensure the
uid is sourced from the actual ambient-api-server Deployment (or
templated/filled during deployment) rather than a hardcoded value so owner
linkage is correct.
components/manifests/base/rbac/frontend-rbac.yaml (1)

9-32: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove legacy ambient-frontend-auth ClusterRole/Binding from both manifests.

The ambient-frontend-auth role (lines 9–32) duplicates permissions already granted by system:auth-delegator (lines 34–45) to the frontend ServiceAccount. The legacy role exists in two separate manifests: components/manifests/base/rbac/frontend-rbac.yaml (shown here) and components/manifests/templates/template-operator.yaml. Cleaning up requires removing both to fully retire the redundant RBAC surface.

♻️ Cleanup scope

Remove from frontend-rbac.yaml (lines 9–32) and from template-operator.yaml (the ClusterRole at ~line 577 and ClusterRoleBinding at ~line 1198).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/base/rbac/frontend-rbac.yaml` around lines 9 - 32,
Delete the legacy ClusterRole and ClusterRoleBinding resources named
"ambient-frontend-auth" (the ClusterRole granting
tokenreviews/subjectaccessreviews and its ClusterRoleBinding that binds
ServiceAccount "frontend" in namespace "ambient-code") from all manifests where
they appear; ensure you also remove any remaining roleRef or subject entries
that reference "ambient-frontend-auth" so the frontend SA instead relies on the
existing "system:auth-delegator" binding and no duplicate RBAC objects remain.
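For context, the binding the frontend ServiceAccount would continue to rely on after the cleanup looks roughly like this; the binding's metadata name is illustrative, while the roleRef and subject are taken from the comment above:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: frontend-auth-delegator   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: frontend
  namespace: ambient-code
```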
🧹 Nitpick comments (1)
components/manifests/overlays/app-interface/namespace.yaml (1)

1-12: ⚡ Quick win

Consolidate duplicate namespace definitions.

Both namespace-patch.yaml and namespace.yaml define the ambient-code Namespace. While Kustomize will merge them, maintaining two files for the same resource creates confusion and merge complexity.

♻️ Consolidate into single namespace.yaml

Remove namespace-patch.yaml and keep this file with all metadata, or vice versa. Choose one canonical source for the namespace definition.
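A consolidated manifest might look roughly like this; the concrete label and annotation values below are placeholders to be copied from the two existing files, not authoritative:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ambient-code
  labels:
    name: ambient-code
    app: ambient-code                                  # value assumed
    environment: stage                                 # value assumed
    service: ambient-code                              # value assumed
  annotations:
    app.kubernetes.io/name: ambient-code               # value assumed
    app.kubernetes.io/part-of: ambient-code-platform   # value assumed
```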

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/overlays/app-interface/namespace.yaml` around lines 1 -
12, There are duplicate Namespace resources for ambient-code (kind: Namespace,
metadata.name: ambient-code) across namespace.yaml and namespace-patch.yaml;
pick a single canonical file and consolidate all labels/annotations there
(ensure metadata.labels: environment, service, name, app and
metadata.annotations: app.kubernetes.io/name and app.kubernetes.io/part-of are
preserved) then remove the other file (or convert it to a deliberate kustomize
patch if needed) so only one manifest declares the Namespace resource.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/frontend/Dockerfile`:
- Line 1: Update stale Node.js 20 comments to Node.js 24: replace the comment
string "# Use Red Hat UBI Node.js 20 minimal image for dependencies" and any
other occurrences of "Node.js 20", "nodejs-20" or "nodejs-20-minimal" in this
Dockerfile (notably the comment blocks around the base image references) with
"Node.js 24" and the correct tags "nodejs-24" / "nodejs-24-minimal" so the
comments match the actual image tags.

In
`@components/manifests/components/oauth-proxy/frontend-oauth-deployment-patch.yaml`:
- Around line 31-35: The patch currently replaces the container's
volumeMounts/volumes and drops the oauth-client-secret and oauth-cookie-secret
mounts referenced by the oauth-proxy args (paths /etc/oauth-client/client_secret
and /etc/oauth-cookie/cookie_secret), causing startup failures; fix by either
adding the missing volumeMounts (oauth-client-secret, oauth-cookie-secret) and
corresponding volumes back into the patch alongside frontend-proxy-tls, or
convert this manifest patch to a patchesJson6902 with merge directives so
existing mounts are preserved; also add the securityContext block to the
oauth-proxy container with runAsNonRoot: true, allowPrivilegeEscalation: false,
capabilities.drop: [ALL], and readOnlyRootFilesystem: true.

In
`@components/manifests/components/oauth-proxy/frontend-oauth-service-patch.yaml`:
- Line 9: The annotation key used in this manifest is the alpha variant
"service.alpha.openshift.io/serving-cert-secret-name: frontend-proxy-tls" which
is inconsistent with the repo convention; update that annotation to the beta
variant "service.beta.openshift.io/serving-cert-secret-name" while preserving
the secret name value (frontend-proxy-tls) so the annotation reads
service.beta.openshift.io/serving-cert-secret-name: frontend-proxy-tls; locate
the entry by the exact annotation key and the value "frontend-proxy-tls" in this
manifest (frontend-oauth-service-patch.yaml) or other similar manifests and
replace alpha with beta.

In
`@components/manifests/overlays/app-interface/ambient-api-server-db-secret-patch.yaml`:
- Around line 4-14: Add a metadata.ownerReferences entry to the Secret named
ambient-code-rds so it is owned by its managing controller; populate
ownerReferences with the owning resource's apiVersion, kind, name and uid and
set controller: true and blockOwnerDeletion: true. Locate the Secret resource
(metadata.name: ambient-code-rds, labels app: ambient-api-server / component:
database) and add the ownerReferences array referencing the correct controller
object (fill in the controller's apiVersion/kind/name/uid from the controller
resource) to ensure proper lifecycle and garbage collection.

In `@components/manifests/overlays/app-interface/kustomization.yaml`:
- Around line 85-121: The kustomization images block currently uses newTag:
latest for all overrides and includes duplicate entries for each image (names
with and without the :latest suffix); replace each newTag: latest with an
immutable identifier (preferably the image digest "sha256:..." from your
CI/Konflux build or a semantic version tag) for these images (vteam_operator,
vteam_backend, vteam_frontend, vteam_public_api, vteam_api_server,
vteam_claude_runner) by updating the corresponding newTag fields, and remove the
redundant duplicate name entries (keep only one mapping per image using the
canonical name without the :latest suffix) so every image override in the images
list has a single entry and an immutable tag.

In `@components/manifests/templates/template-operator.yaml`:
- Around line 1295-1437: Add a restricted securityContext to the
agentic-operator container spec: under the container with name
"agentic-operator" add securityContext with runAsNonRoot: true,
readOnlyRootFilesystem: true and capabilities.drop containing "ALL" (matching
the pattern used by the public-api container in template-services.yaml); ensure
this securityContext is placed alongside env/image/ports so the container no
longer inherits cluster defaults and complies with the manifests guideline.

In `@components/manifests/templates/template-services.yaml`:
- Around line 242-243: The template currently sets a wildcard CORS flag
(--cors-allowed-origins=*) which is too permissive; change the service flags to
read allowed origins from a parameter/env var (e.g., CORS_ALLOWED_ORIGINS)
instead of using '*' and default that var to a concrete allow-list containing
the frontend/public-api Route hostnames; update the manifest where the flags are
defined (look for --cors-allowed-origins and
--cors-allowed-headers=X-Ambient-Project) to reference the env var (and keep the
header flag) and ensure the chart/template values or Deployment env section
documents and populates the default explicit hostnames for production.
- Around line 678-694: The args list still contains stale OAuth proxy flags
--scope=user:full and --upstream-timeout=5m which contradict the PR objective of
replacing them with --openshift-delegate-urls; either remove those two flags
from the args block (keep --openshift-delegate-urls={"..."} only) or, if you
intend to keep all three, update the PR description to state you are augmenting
rather than replacing; locate and edit the args array that includes
--http-address, --openshift-delegate-urls, --scope=..., and
--upstream-timeout=... and remove the two old flags (--scope=user:full and
--upstream-timeout=5m) to align code with the stated change.
- Around line 220-810: Several containers are missing or have incomplete
SecurityContext; update the containers named backend-api, frontend, and
oauth-proxy (and fix ambient-api-server and migration to be fully compliant) to
include a restricted securityContext: set runAsNonRoot: true,
allowPrivilegeEscalation: false, capabilities.drop: [ALL], and
readOnlyRootFilesystem: true. If a container truly requires writable filesystem
space, mount an emptyDir at the writable path and keep the pod’s root filesystem
read-only while restricting privileges as above. Ensure the same restricted
settings are applied to initContainers (migration) and verify public-api remains
unchanged.
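The CORS item above could be parameterized along these lines in an OpenShift Template; the parameter name CORS_ALLOWED_ORIGINS and the default hostnames are assumptions:

```yaml
# Sketch only: replace the wildcard with a parameter-driven allow-list.
parameters:
- name: CORS_ALLOWED_ORIGINS
  description: Comma-separated list of allowed origins
  value: "https://frontend.apps.example.com,https://public-api.apps.example.com"
# ...and in the container args:
#   - --cors-allowed-origins=${CORS_ALLOWED_ORIGINS}
#   - --cors-allowed-headers=X-Ambient-Project
```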

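If the intent of the delegate-urls item above is to replace rather than augment, the resulting oauth-proxy args would end up roughly as follows; the bind address is illustrative, and only the delegate-urls flag is taken from the PR description:

```yaml
        args:
        - --http-address=0.0.0.0:8080        # illustrative
        - --openshift-delegate-urls={"/":{"resource":"projects","verb":"list"}}
        # removed: --scope=user:full
        # removed: --upstream-timeout=5m
```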

---

Outside diff comments:
In `@components/manifests/base/platform/ambient-api-server-db.yml`:
- Around line 22-33: The PVC manifest for PersistentVolumeClaim named
"ambient-api-server-db-data" is missing metadata.ownerReferences; add a
metadata.ownerReferences array on that resource containing the owning
controller's apiVersion, kind, name and uid and set controller: true and
blockOwnerDeletion: true so the PVC is garbage-collected with its owner (use the
actual owner resource's values—e.g., the Database/StatefulSet/Deployment that
owns the PVC—for apiVersion, kind, name and uid). Ensure the ownerReference is
placed under metadata in the manifest for the resource
"ambient-api-server-db-data" so Jobs/Secrets/PVCs follow the project's
controller owner ref convention.
- Around line 58-101: Add a restrictive securityContext and resource
requests/limits to the PostgreSQL container named "postgresql": set
securityContext.runAsNonRoot: true, securityContext.capabilities.drop: ["ALL"],
and securityContext.readOnlyRootFilesystem: true; add CPU/memory resource
requests and limits under resources (both requests and limits). Because
readOnlyRootFilesystem requires writable runtime/data mounts, change the
existing volumeMount "ambient-api-server-db-data" to use an emptyDir for
/var/lib/postgresql/data and add a second emptyDir volume mount for
/var/run/postgresql (ensure corresponding volume entries exist), leaving env
vars (POSTGRES_USER/POSTGRES_PASSWORD/POSTGRES_DB/PGDATA) and probes unchanged.
Ensure names match the container "postgresql" and volumeMount
"ambient-api-server-db-data".

In `@components/manifests/base/platform/ambient-api-server-secrets.yml`:
- Around line 2-15: Add Kubernetes ownerReferences to both Secret manifests
(ambient-code-rds and ambient-api-server) so they are garbage-collected with
their owning Deployment; update each Secret to include an ownerReferences array
containing the owner's apiVersion, kind (Deployment), name (ambient-api-server),
uid (the Deployment's UID), and controller/blockOwnerDeletion flags; ensure the
uid is sourced from the actual ambient-api-server Deployment (or
templated/filled during deployment) rather than a hardcoded value so owner
linkage is correct.

In `@components/manifests/base/rbac/frontend-rbac.yaml`:
- Around line 9-32: Delete the legacy ClusterRole and ClusterRoleBinding
resources named "ambient-frontend-auth" (the ClusterRole granting
tokenreviews/subjectaccessreviews and its ClusterRoleBinding that binds
ServiceAccount "frontend" in namespace "ambient-code") from all manifests where
they appear; ensure you also remove any remaining roleRef or subject entries
that reference "ambient-frontend-auth" so the frontend SA instead relies on the
existing "system:auth-delegator" binding and no duplicate RBAC objects remain.

In `@components/operator/internal/handlers/sessions.go`:
- Around line 1047-1094: The container spec in the corev1.Container struct
literal contains duplicate ReadinessProbe and LivenessProbe fields which causes
a compile error; in the block that sets up the runner container (the
corev1.Container literal that references runnerPort and runnerVolumeMounts)
remove the first ReadinessProbe/LivenessProbe pair (the one with
InitialDelaySeconds:3/20, PeriodSeconds:5/30, TimeoutSeconds and
FailureThreshold values) and keep the later simplified probe definitions (the
second ReadinessProbe/LivenessProbe pair); ensure only a single ReadinessProbe
and a single LivenessProbe remain in that container spec.

---

Nitpick comments:
In `@components/manifests/overlays/app-interface/namespace.yaml`:
- Around line 1-12: There are duplicate Namespace resources for ambient-code
(kind: Namespace, metadata.name: ambient-code) across namespace.yaml and
namespace-patch.yaml; pick a single canonical file and consolidate all
labels/annotations there (ensure metadata.labels: environment, service, name,
app and metadata.annotations: app.kubernetes.io/name and
app.kubernetes.io/part-of are preserved) then remove the other file (or convert
it to a deliberate kustomize patch if needed) so only one manifest declares the
Namespace resource.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 71ea2ffe-098c-4ed4-8119-69d27d22e509

📥 Commits

Reviewing files that changed from the base of the PR and between 63545c7 and a9ac6d9.

⛔ Files ignored due to path filters (1)
  • components/frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (52)
  • .tekton/ambient-code-ambient-api-server-main-pull-request.yaml
  • .tekton/ambient-code-ambient-api-server-main-push.yaml
  • .tekton/ambient-code-ambient-runner-main-pull-request.yaml
  • .tekton/ambient-code-ambient-runner-main-push.yaml
  • .tekton/ambient-code-backend-main-pull-request.yaml
  • .tekton/ambient-code-backend-main-push.yaml
  • .tekton/ambient-code-frontend-main-pull-request.yaml
  • .tekton/ambient-code-frontend-main-push.yaml
  • .tekton/ambient-code-operator-main-pull-request.yaml
  • .tekton/ambient-code-operator-main-push.yaml
  • .tekton/ambient-code-public-api-main-pull-request.yaml
  • .tekton/ambient-code-public-api-main-push.yaml
  • components/ambient-api-server/templates/db-template.yml
  • components/frontend/Dockerfile
  • components/frontend/package.json
  • components/manifests/README.md
  • components/manifests/base/core/ambient-api-server-service.yml
  • components/manifests/base/core/operator-deployment.yaml
  • components/manifests/base/platform/ambient-api-server-db.yml
  • components/manifests/base/platform/ambient-api-server-secrets.yml
  • components/manifests/base/rbac/frontend-rbac.yaml
  • components/manifests/components/ambient-api-server-db/ambient-api-server-db-json-patch.yaml
  • components/manifests/components/ambient-api-server-db/ambient-api-server-init-db-patch.yaml
  • components/manifests/components/ambient-api-server-db/kustomization.yaml
  • components/manifests/components/oauth-proxy/frontend-oauth-deployment-patch.yaml
  • components/manifests/components/oauth-proxy/frontend-oauth-service-patch.yaml
  • components/manifests/overlays/app-interface/ambient-api-server-db-secret-patch.yaml
  • components/manifests/overlays/app-interface/ambient-api-server-env-patch.yaml
  • components/manifests/overlays/app-interface/ambient-api-server-route.yaml
  • components/manifests/overlays/app-interface/ambient-api-server-service-ca-patch.yaml
  • components/manifests/overlays/app-interface/ambient-api-server-ssl-patch.yaml
  • components/manifests/overlays/app-interface/backend-route.yaml
  • components/manifests/overlays/app-interface/kustomization.yaml
  • components/manifests/overlays/app-interface/namespace-patch.yaml
  • components/manifests/overlays/app-interface/namespace.yaml
  • components/manifests/overlays/app-interface/operator-config-openshift.yaml
  • components/manifests/overlays/app-interface/operator-runner-image-patch.yaml
  • components/manifests/overlays/app-interface/public-api-route.yaml
  • components/manifests/overlays/app-interface/route.yaml
  • components/manifests/overlays/kind/api-server-db-security-patch.yaml
  • components/manifests/overlays/kind/api-server-no-jwt-patch.yaml
  • components/manifests/overlays/local-dev/ambient-api-server-db-credentials-patch.yaml
  • components/manifests/overlays/local-dev/ambient-api-server-db-json-patch.yaml
  • components/manifests/overlays/local-dev/ambient-api-server-init-db-patch.yaml
  • components/manifests/overlays/production/ambient-api-server-jwt-args-patch.yaml
  • components/manifests/overlays/production/ambient-api-server-migration-ssl-patch.yaml
  • components/manifests/overlays/production/kustomization.yaml
  • components/manifests/templates/template-operator.yaml
  • components/manifests/templates/template-services.yaml
  • components/manifests/templates/validate.sh
  • components/operator/internal/controller/otel_metrics.go
  • components/operator/internal/handlers/sessions.go

@@ -1,5 +1,5 @@
# Use Red Hat UBI Node.js 20 minimal image for dependencies

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale Node.js 20 references in comments.

The image tags are now nodejs-24/nodejs-24-minimal but the surrounding comments still say "Node.js 20" / "nodejs-20". Quick cleanup to keep the docs honest.

📝 Proposed fix
-# Use Red Hat UBI Node.js 20 minimal image for dependencies
+# Use Red Hat UBI Node.js 24 minimal image for dependencies
 FROM registry.access.redhat.com/ubi9/nodejs-24-minimal AS deps
@@
 # Rebuild the source code only when needed
-# Use the full nodejs-20 image (not minimal) for the build stage because
+# Use the full nodejs-24 image (not minimal) for the build stage because
 # Next.js 16 Turbopack requires native SWC binaries that depend on glibc.
 FROM registry.access.redhat.com/ubi9/nodejs-24 AS builder

Also applies to: 13-15

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/frontend/Dockerfile` at line 1, Update stale Node.js 20 comments
to Node.js 24: replace the comment string "# Use Red Hat UBI Node.js 20 minimal
image for dependencies" and any other occurrences of "Node.js 20", "nodejs-20"
or "nodejs-20-minimal" in this Dockerfile (notably the comment blocks around the
base image references) with "Node.js 24" and the correct tags "nodejs-24" /
"nodejs-24-minimal" so the comments match the actual image tags.

Comment on lines +31 to +35
- --client-secret-file=/etc/oauth-client/client_secret
- --upstream=http://localhost:3000
- --client-id=ambient-frontend
- --client-secret-file=/etc/oauth/config/client-secret
- --cookie-secret-file=/etc/oauth/config/cookie_secret
- --cookie-expire=23h0m0s
- --tls-cert=/etc/tls/private/tls.crt
- --tls-key=/etc/tls/private/tls.key
- --cookie-secret-file=/etc/oauth-cookie/cookie_secret

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift


Strategic merge patch will lose oauth-client and oauth-cookie secret mounts, causing oauth-proxy startup failure.

The patch replaces the entire volumeMounts and volumes lists instead of merging them. This removes the oauth-cookie-secret and oauth-client-secret mounts (lines 731-734 in the base), leaving only frontend-proxy-tls. Since the oauth-proxy args reference /etc/oauth-client/client_secret and /etc/oauth-cookie/cookie_secret, the container will crash with "file not found" at startup.

Either:

  • Add the missing volumeMounts/volumes to the patch (lines 72-78), or
  • Switch to a patchesJson6902 patch with merge directives to preserve existing mounts

Additionally, oauth-proxy lacks a securityContext. Per coding guidelines, add:

securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true
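Restoring the dropped mounts in the patch might look like this; the mount paths follow the oauth-proxy args, while the secret names are assumptions based on the volume names:

```yaml
        volumeMounts:
        - name: frontend-proxy-tls
          mountPath: /etc/tls/private
        - name: oauth-client-secret
          mountPath: /etc/oauth-client
        - name: oauth-cookie-secret
          mountPath: /etc/oauth-cookie
      volumes:
      - name: frontend-proxy-tls
        secret:
          secretName: frontend-proxy-tls
      - name: oauth-client-secret
        secret:
          secretName: oauth-client-secret     # assumed secret name
      - name: oauth-cookie-secret
        secret:
          secretName: oauth-cookie-secret     # assumed secret name
```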
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@components/manifests/components/oauth-proxy/frontend-oauth-deployment-patch.yaml`
around lines 31 - 35, The patch currently replaces the container's
volumeMounts/volumes and drops the oauth-client-secret and oauth-cookie-secret
mounts referenced by the oauth-proxy args (paths /etc/oauth-client/client_secret
and /etc/oauth-cookie/cookie_secret), causing startup failures; fix by either
adding the missing volumeMounts (oauth-client-secret, oauth-cookie-secret) and
corresponding volumes back into the patch alongside frontend-proxy-tls, or
convert this manifest patch to a patchesJson6902 with merge directives so
existing mounts are preserved; also add the securityContext block to the
oauth-proxy container with runAsNonRoot: true, allowPrivilegeEscalation: false,
capabilities.drop: [ALL], and readOnlyRootFilesystem: true.

  name: frontend-service
  annotations:
    service.beta.openshift.io/serving-cert-secret-name: dashboard-proxy-tls
    service.alpha.openshift.io/serving-cert-secret-name: frontend-proxy-tls

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win


Use service.beta.openshift.io/... for consistency with codebase convention.

The file uses service.alpha.openshift.io/serving-cert-secret-name, but beta is the standard across the repository (10+ instances in manifests and templates). Change to service.beta.openshift.io/serving-cert-secret-name: frontend-proxy-tls.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@components/manifests/components/oauth-proxy/frontend-oauth-service-patch.yaml`
at line 9, The annotation key used in this manifest is the alpha variant
"service.alpha.openshift.io/serving-cert-secret-name: frontend-proxy-tls" which
is inconsistent with the repo convention; update that annotation to the beta
variant "service.beta.openshift.io/serving-cert-secret-name" while preserving
the secret name value (frontend-proxy-tls) so the annotation reads
service.beta.openshift.io/serving-cert-secret-name: frontend-proxy-tls; locate
the entry by the exact annotation key and the value "frontend-proxy-tls" in this
manifest (frontend-oauth-service-patch.yaml) or other similar manifests and
replace alpha with beta.

Comment on lines +4 to +14
metadata:
  name: ambient-code-rds
  labels:
    app: ambient-api-server
    component: database
  annotations:
    # External RDS connection managed via Vault secrets from app-interface Phase 2
    # These values will be injected by vault-secret-manager from Vault path:
    # app-interface/data/ambient-code-platform/stage/rds-credentials
    qontract.recycle: "true"
type: Opaque

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add ownerReferences to this Secret resource.

This new Secret is missing metadata.ownerReferences, so it can become orphaned and drift from its controller lifecycle.

As per coding guidelines **/{k8s,kubernetes,manifests,deploy,config}/**/*secret*.{yaml,yml,json}: Flag Kubernetes Secrets missing OwnerReferences.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@components/manifests/overlays/app-interface/ambient-api-server-db-secret-patch.yaml`
around lines 4 - 14, Add a metadata.ownerReferences entry to the Secret named
ambient-code-rds so it is owned by its managing controller; populate
ownerReferences with the owning resource's apiVersion, kind, name and uid and
set controller: true and blockOwnerDeletion: true. Locate the Secret resource
(metadata.name: ambient-code-rds, labels app: ambient-api-server / component:
database) and add the ownerReferences array referencing the correct controller
object (fill in the controller's apiVersion/kind/name/uid from the controller
resource) to ensure proper lifecycle and garbage collection.

Comment on lines +85 to +121
images:
- name: quay.io/ambient_code/vteam_operator
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-operator-main
  newTag: latest
- name: quay.io/ambient_code/vteam_operator:latest
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-operator-main
  newTag: latest
- name: quay.io/ambient_code/vteam_backend
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-backend-main
  newTag: latest
- name: quay.io/ambient_code/vteam_backend:latest
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-backend-main
  newTag: latest
- name: quay.io/ambient_code/vteam_frontend
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-frontend-main
  newTag: latest
- name: quay.io/ambient_code/vteam_frontend:latest
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-frontend-main
  newTag: latest
- name: quay.io/ambient_code/vteam_public_api
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-public-api-main
  newTag: latest
- name: quay.io/ambient_code/vteam_public_api:latest
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-public-api-main
  newTag: latest
- name: quay.io/ambient_code/vteam_api_server
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-ambient-api-server-main
  newTag: latest
- name: quay.io/ambient_code/vteam_api_server:latest
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-ambient-api-server-main
  newTag: latest
- name: quay.io/ambient_code/vteam_claude_runner
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-ambient-runner-main
  newTag: latest
- name: quay.io/ambient_code/vteam_claude_runner:latest
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-ambient-runner-main
  newTag: latest

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Use immutable image tags instead of latest in production.

All image overrides use newTag: latest, which violates immutable deployment best practices for production environments. This prevents:

  • Deterministic deployments (can't guarantee what code is running)
  • Rollback capability (can't revert to known-good versions)
  • Audit trails (can't track which commit is deployed)

Additionally, duplicate entries exist for each image (with and without :latest suffix in the name field, lines 86-91, 92-97, 98-103, 104-109, 110-115, 116-121), which is redundant.

📌 Proposed fix: Use digest-based or semantic version tags

Option 1 (recommended): Use image digests from Konflux builds:

 images:
 - name: quay.io/ambient_code/vteam_operator
   newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-operator-main
-  newTag: latest
+  newTag: sha256:abc123...  # digest from Konflux build
-- name: quay.io/ambient_code/vteam_operator:latest
-  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-operator-main
-  newTag: latest

Option 2: Use semantic versioning:

-  newTag: latest
+  newTag: v1.2.3  # from git tag

Apply pattern to all 6 image overrides and remove duplicate entries.
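
Putting the recommendation together, a cleaned-up images block might look like the following sketch. The digest values are placeholders to be replaced with the digests emitted by the Konflux/CI build, and note that kustomize's images transformer pins digests via the `digest` field rather than `newTag`:

```yaml
# Sketch only: digests are placeholders; substitute the values from your
# Konflux/CI build output. One entry per image; the :latest-suffixed
# duplicate name entries are dropped.
images:
- name: quay.io/ambient_code/vteam_operator
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-operator-main
  digest: sha256:0000000000000000000000000000000000000000000000000000000000000000
- name: quay.io/ambient_code/vteam_api_server
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-ambient-api-server-main
  digest: sha256:0000000000000000000000000000000000000000000000000000000000000000
- name: quay.io/ambient_code/vteam_claude_runner
  newName: quay.io/redhat-services-prod/hcm-eng-prod-tenant/ambient-code-main/ambient-code-ambient-runner-main
  digest: sha256:0000000000000000000000000000000000000000000000000000000000000000
```

With `digest` set, kustomize rewrites references to `newName@sha256:...`, which is immutable and auditable.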

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/overlays/app-interface/kustomization.yaml` around lines
85 - 121, The kustomization images block currently uses newTag: latest for all
overrides and includes duplicate entries for each image (names with and without
the :latest suffix); replace each newTag: latest with an immutable identifier
(preferably the image digest "sha256:..." from your CI/Konflux build or a
semantic version tag) for these images (vteam_operator, vteam_backend,
vteam_frontend, vteam_public_api, vteam_api_server, vteam_claude_runner) by
updating the corresponding newTag fields, and remove the redundant duplicate
name entries (keep only one mapping per image using the canonical name without
the :latest suffix) so every image override in the images list has a single
entry and an immutable tag.

Comment on lines +1295 to +1437
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: agentic-operator
    name: agentic-operator
    namespace: ambient-code
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: agentic-operator
    template:
      metadata:
        labels:
          app: agentic-operator
      spec:
        containers:
        - args:
          - --max-concurrent-reconciles=10
          - --health-probe-bind-address=:8081
          - --leader-elect=false
          env:
          - name: AMBIENT_CODE_RUNNER_IMAGE
            value: ${IMAGE_AMBIENT_RUNNER}:${IMAGE_TAG}
          - name: MAX_CONCURRENT_RECONCILES
            value: '10'
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: BACKEND_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: BACKEND_API_URL
            value: http://backend-service:8080/api
          - name: IMAGE_PULL_POLICY
            value: IfNotPresent
          - name: USE_VERTEX
            valueFrom:
              configMapKeyRef:
                key: USE_VERTEX
                name: operator-config
                optional: true
          - name: CLOUD_ML_REGION
            valueFrom:
              configMapKeyRef:
                key: CLOUD_ML_REGION
                name: operator-config
                optional: true
          - name: ANTHROPIC_VERTEX_PROJECT_ID
            valueFrom:
              configMapKeyRef:
                key: ANTHROPIC_VERTEX_PROJECT_ID
                name: operator-config
                optional: true
          - name: GOOGLE_APPLICATION_CREDENTIALS
            valueFrom:
              configMapKeyRef:
                key: GOOGLE_APPLICATION_CREDENTIALS
                name: operator-config
                optional: true
          - name: LANGFUSE_ENABLED
            valueFrom:
              secretKeyRef:
                key: LANGFUSE_ENABLED
                name: ambient-admin-langfuse-secret
                optional: true
          - name: LANGFUSE_HOST
            valueFrom:
              secretKeyRef:
                key: LANGFUSE_HOST
                name: ambient-admin-langfuse-secret
                optional: true
          - name: LANGFUSE_PUBLIC_KEY
            valueFrom:
              secretKeyRef:
                key: LANGFUSE_PUBLIC_KEY
                name: ambient-admin-langfuse-secret
                optional: true
          - name: LANGFUSE_SECRET_KEY
            valueFrom:
              secretKeyRef:
                key: LANGFUSE_SECRET_KEY
                name: ambient-admin-langfuse-secret
                optional: true
          - name: GOOGLE_OAUTH_CLIENT_ID
            valueFrom:
              secretKeyRef:
                key: GOOGLE_OAUTH_CLIENT_ID
                name: google-workflow-app-secret
                optional: true
          - name: GOOGLE_OAUTH_CLIENT_SECRET
            valueFrom:
              secretKeyRef:
                key: GOOGLE_OAUTH_CLIENT_SECRET
                name: google-workflow-app-secret
                optional: true
          - name: STATE_SYNC_IMAGE
            value: quay.io/ambient_code/vteam_state_sync:latest
          - name: S3_ENDPOINT
            value: http://minio.ambient-code.svc:9000
          - name: S3_BUCKET
            value: ambient-sessions
          - name: DEPLOYMENT_ENV
            value: production
          - name: VERSION
            value: latest
          image: ${IMAGE_OPERATOR}:${IMAGE_TAG}
          imagePullPolicy: Always
          livenessProbe:
            httpGet:
              path: /healthz
              port: health
            initialDelaySeconds: 15
            periodSeconds: 20
          name: agentic-operator
          ports:
          - containerPort: 8081
            name: health
            protocol: TCP
          readinessProbe:
            httpGet:
              path: /readyz
              port: health
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            limits:
              cpu: 2
              memory: 4Gi
            requests:
              cpu: 100m
              memory: 512Mi
          volumeMounts:
          - mountPath: /config/models
            name: model-manifest
            readOnly: true
          - mountPath: /config/registry
            name: agent-registry
            readOnly: true
        restartPolicy: Always
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing restricted SecurityContext on agentic-operator container.

The container spec defines no securityContext, so it inherits cluster defaults (likely running with default capabilities and a writable root filesystem). Per coding guidelines, containers under **/manifests/**/*.yaml must set runAsNonRoot, drop ALL capabilities, and readOnlyRootFilesystem: true. The public-api container in template-services.yaml (lines 803-809) is a good reference.

🛡️ Proposed fix
           image: ${IMAGE_OPERATOR}:${IMAGE_TAG}
           imagePullPolicy: Always
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop:
+              - ALL
+            readOnlyRootFilesystem: true
+            runAsNonRoot: true
           livenessProbe:

As per coding guidelines: "All containers must have restricted SecurityContext: runAsNonRoot, drop ALL capabilities, readOnlyRootFilesystem".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/templates/template-operator.yaml` around lines 1295 -
1437, Add a restricted securityContext to the agentic-operator container spec:
under the container with name "agentic-operator" add securityContext with
runAsNonRoot: true, readOnlyRootFilesystem: true and capabilities.drop
containing "ALL" (matching the pattern used by the public-api container in
template-services.yaml); ensure this securityContext is placed alongside
env/image/ports so the container no longer inherits cluster defaults and
complies with the manifests guideline.

Comment on lines +220 to +810
        containers:
        - command:
          - /usr/local/bin/ambient-api-server
          - serve
          - --db-host-file=/secrets/db/db.host
          - --db-port-file=/secrets/db/db.port
          - --db-user-file=/secrets/db/db.user
          - --db-password-file=/secrets/db/db.password
          - --db-name-file=/secrets/db/db.name
          - --enable-jwt=true
          - --enable-authz=false
          - --jwk-cert-file=/configs/authentication/jwks.json
          - --enable-https=false
          - --api-server-bindaddress=:8000
          - --metrics-server-bindaddress=:4433
          - --health-check-server-bindaddress=:4434
          - --db-sslmode=require
          - --db-max-open-connections=50
          - --enable-db-debug=false
          - --enable-metrics-https=false
          - --http-read-timeout=5s
          - --http-write-timeout=30s
          - --cors-allowed-origins=*
          - --cors-allowed-headers=X-Ambient-Project
          - --enable-grpc=true
          - --grpc-server-bindaddress=:9000
          - --alsologtostderr
          - -v=4
          env:
          - name: AMBIENT_ENV
            value: production
          image: ${IMAGE_AMBIENT_API_SERVER}:${IMAGE_TAG}
          imagePullPolicy: Always
          livenessProbe:
            httpGet:
              path: /api/ambient
              port: 8000
              scheme: HTTP
            initialDelaySeconds: 15
            periodSeconds: 5
          name: api-server
          ports:
          - containerPort: 8000
            name: api
            protocol: TCP
          - containerPort: 4433
            name: metrics
            protocol: TCP
          - containerPort: 4434
            name: health
            protocol: TCP
          - containerPort: 9000
            name: grpc
            protocol: TCP
          readinessProbe:
            httpGet:
              httpHeaders:
              - name: User-Agent
                value: Probe
              path: /healthcheck
              port: 4434
              scheme: HTTP
            initialDelaySeconds: 20
            periodSeconds: 10
          resources:
            limits:
              cpu: 1
              memory: 1Gi
            requests:
              cpu: 200m
              memory: 512Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: false
          volumeMounts:
          - mountPath: /secrets/db
            name: db-secrets
          - mountPath: /secrets/service
            name: app-secrets
          - mountPath: /configs/authentication
            name: auth-config
        initContainers:
        - command:
          - /usr/local/bin/ambient-api-server
          - migrate
          - --db-host-file=/secrets/db/db.host
          - --db-port-file=/secrets/db/db.port
          - --db-user-file=/secrets/db/db.user
          - --db-password-file=/secrets/db/db.password
          - --db-name-file=/secrets/db/db.name
          - --db-sslmode=require
          - --alsologtostderr
          - -v=4
          image: ${IMAGE_AMBIENT_API_SERVER}:${IMAGE_TAG}
          imagePullPolicy: Always
          name: migration
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: false
          volumeMounts:
          - mountPath: /secrets/db
            name: db-secrets
        serviceAccountName: ambient-api-server
        volumes:
        - name: db-secrets
          secret:
            secretName: ambient-code-rds
        - name: app-secrets
          secret:
            secretName: ambient-api-server
        - configMap:
            name: ambient-api-server-auth
          name: auth-config
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: backend-api
    name: backend-api
    namespace: ambient-code
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: backend-api
    strategy:
      type: Recreate
    template:
      metadata:
        labels:
          app: backend-api
          role: backend
      spec:
        containers:
        - env:
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: PORT
            value: '8080'
          - name: STATE_BASE_DIR
            value: /workspace
          - name: SPEC_KIT_REPO
            value: ambient-code/spec-kit-rh
          - name: SPEC_KIT_VERSION
            value: main
          - name: SPEC_KIT_TEMPLATE
            value: spec-kit-template-claude-sh
          - name: IMAGE_PULL_POLICY
            valueFrom:
              configMapKeyRef:
                key: IMAGE_PULL_POLICY
                name: operator-config
                optional: true
          - name: GITHUB_APP_ID
            valueFrom:
              secretKeyRef:
                key: GITHUB_APP_ID
                name: github-app-secret
                optional: true
          - name: GITHUB_PRIVATE_KEY
            valueFrom:
              secretKeyRef:
                key: GITHUB_PRIVATE_KEY
                name: github-app-secret
                optional: true
          - name: GITHUB_CLIENT_ID
            valueFrom:
              secretKeyRef:
                key: GITHUB_CLIENT_ID
                name: github-app-secret
                optional: true
          - name: GITHUB_CLIENT_SECRET
            valueFrom:
              secretKeyRef:
                key: GITHUB_CLIENT_SECRET
                name: github-app-secret
                optional: true
          - name: GITHUB_STATE_SECRET
            valueFrom:
              secretKeyRef:
                key: GITHUB_STATE_SECRET
                name: github-app-secret
                optional: true
          - name: GOOGLE_OAUTH_CLIENT_ID
            valueFrom:
              secretKeyRef:
                key: GOOGLE_OAUTH_CLIENT_ID
                name: google-workflow-app-secret
                optional: true
          - name: GOOGLE_OAUTH_CLIENT_SECRET
            valueFrom:
              secretKeyRef:
                key: GOOGLE_OAUTH_CLIENT_SECRET
                name: google-workflow-app-secret
                optional: true
          - name: OAUTH_STATE_SECRET
            valueFrom:
              secretKeyRef:
                key: OAUTH_STATE_SECRET
                name: google-workflow-app-secret
                optional: true
          - name: BACKEND_URL
            valueFrom:
              secretKeyRef:
                key: BACKEND_URL
                name: google-workflow-app-secret
                optional: true
          - name: OPERATOR_IMAGE
            valueFrom:
              configMapKeyRef:
                key: OPERATOR_IMAGE
                name: operator-config
                optional: true
          - name: OOTB_WORKFLOWS_REPO
            value: https://github.com/ambient-code/workflows.git
          - name: OOTB_WORKFLOWS_BRANCH
            value: main
          - name: OOTB_WORKFLOWS_PATH
            value: workflows
          - name: USE_VERTEX
            valueFrom:
              configMapKeyRef:
                key: USE_VERTEX
                name: operator-config
                optional: true
          - name: CLOUD_ML_REGION
            valueFrom:
              configMapKeyRef:
                key: CLOUD_ML_REGION
                name: operator-config
                optional: true
          - name: ANTHROPIC_VERTEX_PROJECT_ID
            valueFrom:
              configMapKeyRef:
                key: ANTHROPIC_VERTEX_PROJECT_ID
                name: operator-config
                optional: true
          - name: GOOGLE_APPLICATION_CREDENTIALS
            valueFrom:
              configMapKeyRef:
                key: GOOGLE_APPLICATION_CREDENTIALS
                name: operator-config
                optional: true
          - name: LDAP_SRV_DOMAIN
            valueFrom:
              configMapKeyRef:
                key: LDAP_SRV_DOMAIN
                name: ldap-config
                optional: true
          - name: LDAP_URL
            valueFrom:
              configMapKeyRef:
                key: LDAP_URL
                name: ldap-config
                optional: true
          - name: LDAP_BASE_DN
            valueFrom:
              configMapKeyRef:
                key: LDAP_BASE_DN
                name: ldap-config
                optional: true
          - name: LDAP_GROUP_BASE_DN
            valueFrom:
              configMapKeyRef:
                key: LDAP_GROUP_BASE_DN
                name: ldap-config
                optional: true
          - name: LDAP_BIND_DN
            valueFrom:
              configMapKeyRef:
                key: LDAP_BIND_DN
                name: ldap-config
                optional: true
          - name: LDAP_BIND_PASSWORD
            valueFrom:
              secretKeyRef:
                key: LDAP_BIND_PASSWORD
                name: ldap-credentials
                optional: true
          - name: LDAP_CA_CERT_PATH
            valueFrom:
              configMapKeyRef:
                key: LDAP_CA_CERT_PATH
                name: ldap-config
                optional: true
          - name: UNLEASH_URL
            valueFrom:
              secretKeyRef:
                key: unleash-url
                name: unleash-credentials
                optional: true
          - name: UNLEASH_CLIENT_KEY
            valueFrom:
              secretKeyRef:
                key: client-api-token
                name: unleash-credentials
                optional: true
          - name: UNLEASH_ADMIN_URL
            valueFrom:
              secretKeyRef:
                key: unleash-admin-url
                name: unleash-credentials
                optional: true
          - name: UNLEASH_ADMIN_TOKEN
            valueFrom:
              secretKeyRef:
                key: admin-api-token
                name: unleash-credentials
                optional: true
          - name: UNLEASH_PROJECT
            valueFrom:
              secretKeyRef:
                key: unleash-project
                name: unleash-credentials
                optional: true
          - name: UNLEASH_ENVIRONMENT
            valueFrom:
              secretKeyRef:
                key: unleash-environment
                name: unleash-credentials
                optional: true
          image: ${IMAGE_BACKEND}:${IMAGE_TAG}
          imagePullPolicy: Always
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          name: backend-api
          ports:
          - containerPort: 8080
            name: http
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            limits:
              cpu: '1'
              memory: 1536Mi
            requests:
              cpu: 200m
              memory: 512Mi
          volumeMounts:
          - mountPath: /workspace
            name: backend-state
          - mountPath: /app/vertex
            name: vertex-credentials
            readOnly: true
          - mountPath: /config/models
            name: model-manifest
            readOnly: true
          - mountPath: /config/flags
            name: flags-config
            readOnly: true
          - mountPath: /config/registry
            name: agent-registry
            readOnly: true
          - mountPath: /etc/pki/custom-ca
            name: ldap-ca-cert
            readOnly: true
        serviceAccountName: backend-api
        volumes:
        - name: backend-state
          persistentVolumeClaim:
            claimName: backend-state-pvc
        - name: vertex-credentials
          secret:
            optional: true
            secretName: stage-gcp-creds
        - configMap:
            name: ambient-models
            optional: true
          name: model-manifest
        - configMap:
            name: ambient-flags
            optional: true
          name: flags-config
        - configMap:
            name: ambient-agent-registry
            optional: true
          name: agent-registry
        - configMap:
            name: ldap-ca-cert
            optional: true
          name: ldap-ca-cert
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: frontend
    name: frontend
    namespace: ambient-code
  spec:
    selector:
      matchLabels:
        app: frontend
    template:
      metadata:
        labels:
          app: frontend
      spec:
        containers:
        - env:
          - name: BACKEND_URL
            value: http://backend-service:8080/api
          - name: NODE_ENV
            value: production
          - name: GITHUB_APP_SLUG
            value: ambient-code
          - name: UNLEASH_URL
            valueFrom:
              secretKeyRef:
                key: unleash-url
                name: unleash-credentials
                optional: true
          - name: UNLEASH_CLIENT_KEY
            valueFrom:
              secretKeyRef:
                key: client-api-token
                name: unleash-credentials
                optional: true
          image: ${IMAGE_FRONTEND}:${IMAGE_TAG}
          imagePullPolicy: Always
          livenessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          name: frontend
          ports:
          - containerPort: 3000
            name: http
          readinessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            limits:
              cpu: 1000m
              memory: 2Gi
            requests:
              cpu: 200m
              memory: 512Mi
        - args:
          - --http-address=:8443
          - --https-address=
          - --provider=openshift
          - --client-id=ambient-code
          - --client-secret-file=/etc/oauth-client/client_secret
          - --upstream=http://localhost:3000
          - --tls-cert=/etc/tls/private/tls.crt
          - --tls-key=/etc/tls/private/tls.key
          - --cookie-secret-file=/etc/oauth-cookie/cookie_secret
          - --cookie-expire=24h
          - --cookie-refresh=1h
          - --pass-access-token
          - --scope=user:full
          - --openshift-delegate-urls={"/":{"resource":"projects","verb":"list"}}
          - --upstream-timeout=5m
          - --skip-auth-regex=^/metrics
          image: ${OAUTH_PROXY_IMAGE_NAME}:${OAUTH_PROXY_IMAGE_TAG}
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /oauth/healthz
              port: dashboard-ui
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: oauth-proxy
          ports:
          - containerPort: 8443
            name: dashboard-ui
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /oauth/healthz
              port: dashboard-ui
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 50Mi
          volumeMounts:
          - mountPath: /etc/tls/private
            name: frontend-proxy-tls
          - mountPath: /etc/oauth-cookie
            name: oauth-cookie-secret
          - mountPath: /etc/oauth-client
            name: oauth-client-secret
        serviceAccountName: frontend
        volumes:
        - name: frontend-proxy-tls
          secret:
            secretName: frontend-proxy-tls
        - name: oauth-cookie-secret
          secret:
            secretName: stage-cookie-secret
        - name: oauth-client-secret
          secret:
            secretName: stage-sso-client
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: public-api
    name: public-api
    namespace: ambient-code
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: public-api
    template:
      metadata:
        labels:
          app: public-api
          role: api-gateway
      spec:
        containers:
        - env:
          - name: PORT
            value: '8081'
          - name: BACKEND_URL
            value: http://backend-service:8080
          - name: GIN_MODE
            value: release
          - name: BACKEND_TIMEOUT
            value: 30s
          - name: RATE_LIMIT_RPS
            value: '100'
          - name: RATE_LIMIT_BURST
            value: '200'
          image: ${IMAGE_PUBLIC_API}:${IMAGE_TAG}
          imagePullPolicy: Always
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
          name: public-api
          ports:
          - containerPort: 8081
            name: http
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            limits:
              cpu: 200m
              memory: 256Mi
            requests:
              cpu: 50m
              memory: 128Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: true
            runAsNonRoot: true
        serviceAccountName: public-api
⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Missing restricted SecurityContext on backend-api, frontend, and oauth-proxy containers.

Several containers in this template lack a securityContext block entirely (backend-api lines 339-592, frontend lines 633-677, oauth-proxy sidecar lines 678-735). The ambient-api-server container (line 291) and its migration init container (line 319) declare a partial context but set readOnlyRootFilesystem: false and omit runAsNonRoot: true. Only public-api (lines 803-809) is fully compliant.

Add the standard restricted context to each container; if writes are genuinely required (e.g., for SQLite/local cache), prefer mounting an emptyDir for the writable path and keep root FS read-only.

🛡️ Reference (oauth-proxy sidecar)
           image: ${OAUTH_PROXY_IMAGE_NAME}:${OAUTH_PROXY_IMAGE_TAG}
           imagePullPolicy: IfNotPresent
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop:
+              - ALL
+            readOnlyRootFilesystem: true
+            runAsNonRoot: true

As per coding guidelines: "All containers must have restricted SecurityContext: runAsNonRoot, drop ALL capabilities, readOnlyRootFilesystem" and "Flag missing namespace scoping or pod security context in Kubernetes resources".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/templates/template-services.yaml` around lines 220 -
810, Several containers are missing or have incomplete SecurityContext; update
the containers named backend-api, frontend, and oauth-proxy (and fix
ambient-api-server and migration to be fully compliant) to include a restricted
securityContext: set runAsNonRoot: true, allowPrivilegeEscalation: false,
capabilities.drop: [ALL], and readOnlyRootFilesystem: true. If a container truly
requires writable filesystem space, mount an emptyDir at the writable path and
keep the pod’s root filesystem read-only while restricting privileges as above.
Ensure the same restricted settings are applied to initContainers (migration)
and verify public-api remains unchanged.
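
If a container genuinely needs scratch space (temp files, local caches), the guideline-compatible pattern is an emptyDir mount for the writable path while the root filesystem stays read-only. A minimal sketch, where the /tmp mount path is an assumption and should be adjusted to whatever the binary actually writes to:

```yaml
# Sketch: restricted securityContext plus an emptyDir for writable scratch space.
# The /tmp path is illustrative; match it to the container's real write location.
containers:
- name: backend-api
  image: ${IMAGE_BACKEND}:${IMAGE_TAG}
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    readOnlyRootFilesystem: true
    runAsNonRoot: true
  volumeMounts:
  - mountPath: /tmp
    name: tmp-scratch
volumes:
- name: tmp-scratch
  emptyDir: {}
```

This keeps the immutable-filesystem guarantee while still giving the process a writable directory scoped to the pod's lifetime.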

Comment on lines +242 to +243
- --cors-allowed-origins=*
- --cors-allowed-headers=X-Ambient-Project
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wildcard CORS in a production template.

--cors-allowed-origins=* lets any origin invoke the API server. For a Template shipped as the production deployment artifact, this is far too permissive and effectively disables CORS-based protections. Parameterize it (e.g., CORS_ALLOWED_ORIGINS) and default to an explicit allow-list of the frontend/public-api Route hostnames.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/templates/template-services.yaml` around lines 242 -
243, The template currently sets a wildcard CORS flag (--cors-allowed-origins=*)
which is too permissive; change the service flags to read allowed origins from a
parameter/env var (e.g., CORS_ALLOWED_ORIGINS) instead of using '*' and default
that var to a concrete allow-list containing the frontend/public-api Route
hostnames; update the manifest where the flags are defined (look for
--cors-allowed-origins and --cors-allowed-headers=X-Ambient-Project) to
reference the env var (and keep the header flag) and ensure the chart/template
values or Deployment env section documents and populates the default explicit
hostnames for production.
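
In an OpenShift Template this is naturally expressed as a parameter with an explicit default. A sketch, where the parameter name follows the suggestion above and the hostnames are placeholders:

```yaml
# Sketch: Template parameter replacing the wildcard (hostnames are placeholders).
parameters:
- description: Comma-separated allow-list of CORS origins
  name: CORS_ALLOWED_ORIGINS
  value: https://frontend.apps.example.com,https://public-api.apps.example.com
# ...then reference it from the api-server args in place of the wildcard:
# - --cors-allowed-origins=${CORS_ALLOWED_ORIGINS}
```

Operators can still widen the list per environment by overriding the parameter, but the shipped default no longer disables CORS protections.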

Comment on lines +678 to +694
- args:
  - --http-address=:8443
  - --https-address=
  - --provider=openshift
  - --client-id=ambient-code
  - --client-secret-file=/etc/oauth-client/client_secret
  - --upstream=http://localhost:3000
  - --tls-cert=/etc/tls/private/tls.crt
  - --tls-key=/etc/tls/private/tls.key
  - --cookie-secret-file=/etc/oauth-cookie/cookie_secret
  - --cookie-expire=24h
  - --cookie-refresh=1h
  - --pass-access-token
  - --scope=user:full
  - --openshift-delegate-urls={"/":{"resource":"projects","verb":"list"}}
  - --upstream-timeout=5m
  - --skip-auth-regex=^/metrics
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm PR title/description vs final code state
gh pr view 1591 --json title,body 2>/dev/null | head -100 || true
rg -nP -C2 '(--scope=user:full|--upstream-timeout=5m|--openshift-delegate-urls)' components/manifests/

Repository: ambient-code/platform

Length of output: 4205


Remove stale OAuth proxy flags: code diverges from PR objective.

PR description specifies replacing --scope=user:full and --upstream-timeout=5m with --openshift-delegate-urls, but lines 691 and 693 still contain the old flags. The delegate-urls flag (line 692) is present, but the scope and timeout flags should have been removed per the stated change.

If all three flags are intentional, update the PR description to state that they augment rather than replace the existing configuration. Otherwise, remove lines 691 and 693 to align with the documented objective. The broader --scope=user:full weakens the access-control posture if per-request delegation is the new design.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/templates/template-services.yaml` around lines 678 -
694, The args list still contains stale OAuth proxy flags --scope=user:full and
--upstream-timeout=5m which contradict the PR objective of replacing them with
--openshift-delegate-urls; either remove those two flags from the args block
(keep --openshift-delegate-urls={"..."} only) or, if you intend to keep all
three, update the PR description to state you are augmenting rather than
replacing; locate and edit the args array that includes --http-address,
--openshift-delegate-urls, --scope=..., and --upstream-timeout=... and remove
the two old flags (--scope=user:full and --upstream-timeout=5m) to align code
with the stated change.
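
For context on what the flag does: each key in --openshift-delegate-urls is a path prefix, and its value holds SubjectAccessReview attributes that the proxy checks with the requesting user's token before forwarding. The mapping in this template is roughly equivalent to the proxy issuing a review like the following sketch on every request under "/" (the user value is illustrative):

```yaml
# Approximate SubjectAccessReview performed by oauth-proxy for requests under "/".
# The user is taken from the presented access token; "alice" is a placeholder.
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: alice
  resourceAttributes:
    resource: projects
    verb: list
```

An expired or revoked token fails this check at the proxy, which is why the backend never sees the stale credential.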
