
Koshi Runtime

Koshi Runtime is a workload-scoped governance plane for AI systems. It deploys as a Kubernetes sidecar that enforces deterministic policy at the workload boundary — token budgets, per-request guards, and tiered enforcement decisions — using reservation-first accounting. The sidecar supports three operating shapes: listener audit (shadow-only, default), enforcement with built-in policy presets, and operator-authored custom policy via namespace-local ConfigMap (works in both listener and enforcement modes). A standalone deployment model is available for centralized enforcement with header-based identity.

Discover your governance posture before enforcing it. Koshi ships in listener mode by default: the full enforcement pipeline — identity resolution, policy lookup, guard evaluation, budget accounting — executes on every request, but no traffic is blocked. Shadow decisions (would_reject, would_throttle, would_kill) reveal exactly where your policies would intervene. When the posture matches your intent, activate enforcement on the same sidecar — choose built-in policy presets via the runtime.getkoshi.ai/policy annotation, or deliver custom policy via a namespace-local ConfigMap (runtime.getkoshi.ai/configmap). Both work with a single annotation change and a pod restart. See From Audit to Enforcement.

No repo clone required. Install Koshi directly from the published OCI Helm chart and container image.

Quick Start: Kubernetes (Listener Mode)

# 1. Install into koshi-system
helm install koshi oci://ghcr.io/koshihq/charts/koshi \
  --version 0.2.12 \
  --namespace koshi-system --create-namespace

# 2. Opt namespaces in — only labeled namespaces get the sidecar
kubectl label namespace my-namespace runtime.getkoshi.ai/inject=true

# 3. Restart workloads to pick up the sidecar
kubectl rollout restart deployment -n my-namespace

# 4. Observe shadow events
kubectl logs -n my-namespace deploy/my-app -c koshi-listener --tail=100 | \
  jq 'select(.stream == "event")'

# 5. Check metrics (default sidecar port; adjust if you changed sidecar.port)
kubectl port-forward -n my-namespace deploy/my-app 15080:15080
curl http://localhost:15080/metrics | grep koshi_listener

What You Can Do on Day One

  • Install Koshi in listener mode — one Helm command, no repo clone or config files required
  • Label any namespace and restart workloads to get sidecars injected
  • Collect structured JSON events from koshi-listener container logs
  • Scrape Prometheus metrics from /metrics on the sidecar port (default 15080, configurable via sidecar.port)
  • Observe real shadow decisions (allow, would_throttle, would_kill, would_reject) on live traffic without blocking anything

What Traffic Produces Signal

Workloads produce governance signal when they send OpenAI- or Anthropic-compatible API requests through the sidecar. The webhook injects OPENAI_BASE_URL and ANTHROPIC_BASE_URL env vars pointing at the sidecar (only if the container does not already set them). The sidecar evaluates the request against policy, emits a shadow event, and proxies the request to the real upstream transparently.

Prerequisites for signal:

  • The workload's SDK or HTTP client must honor OPENAI_BASE_URL / ANTHROPIC_BASE_URL env vars. The official OpenAI and Anthropic SDKs do this by default.
  • The workload must not already set these env vars in its pod spec — if present, the webhook will not overwrite them.
  • The workload must not hardcode provider URLs in application code or config files, bypassing the env vars entirely.
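The prerequisites above hinge on the SDK's base-URL lookup order. A minimal sketch of that lookup (illustrative Python, not SDK source; the env var names are the ones the webhook injects, and the port assumes the default sidecar.port of 15080):

```python
import os

def resolve_base_url(env_var, provider_default):
    """Prefer the injected env var, as env-honoring SDKs do at client
    construction; otherwise fall back to the provider's public URL."""
    return os.environ.get(env_var) or provider_default

# With the sidecar injected, the webhook sets OPENAI_BASE_URL, so a
# compliant client transparently targets the sidecar.
os.environ["OPENAI_BASE_URL"] = "http://localhost:15080"
print(resolve_base_url("OPENAI_BASE_URL", "https://api.openai.com"))
# A workload that hardcodes the provider URL skips this lookup entirely,
# bypasses the sidecar, and therefore produces no signal.
```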

No signal? Check these first:

  1. Verify the sidecar container exists: kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}' — look for koshi-listener
  2. Verify the env vars were injected: kubectl get pod <pod> -o jsonpath='{.spec.containers[0].env[*].name}' — look for OPENAI_BASE_URL / ANTHROPIC_BASE_URL
  3. If the env vars are missing, the workload's pod spec likely already defines them — check the Deployment manifest
  4. If the env vars are present but no events appear, the SDK may not be honoring them — check whether the app uses a custom HTTP client or hardcoded base URL

What Gets Installed

| Component | Namespace | Purpose |
| --- | --- | --- |
| Injector Deployment | koshi-system | Mutating admission webhook — injects sidecar into labeled namespaces |
| MutatingWebhookConfiguration | cluster-scoped | Intercepts pod CREATE in namespaces with runtime.getkoshi.ai/inject: "true" |
| ConfigMap | koshi-system | Runtime config for the control-plane deployment (mode, upstreams, default policy). Injected sidecars use built-in sidecar config by default (mode and policy selected via pod annotations); they do not read this ConfigMap. For custom sidecar policy, operators create a separate namespace-local ConfigMap and reference it via runtime.getkoshi.ai/configmap. |
| TLS Secret | koshi-system | Webhook serving certificate (self-signed by default) |
| NetworkPolicy | koshi-system | Restricts injector ingress to apiserver, sidecar egress to upstreams |

The sidecar (koshi-listener) is injected into workload pods automatically. It:

  • Listens on :15080 by default (configurable via sidecar.port Helm value)
  • Exposes /metrics, /healthz, /readyz, /status
  • Receives traffic via OPENAI_BASE_URL / ANTHROPIC_BASE_URL env vars injected into app containers

How It Works

Request → Identify workload (pod metadata) → Resolve policy → Extract max_tokens →
  Per-request guard check → Reserve tokens (rolling window) → Tier decision →
  Emit shadow event (listener) or enforce (enforcement) → Proxy to upstream →
  Record actual usage on response
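The "reserve, then record actual usage" steps in the pipeline above can be sketched as a rolling-window tracker. This is hypothetical code, not Koshi's implementation; the field names mirror the config schema (window_seconds, limit_tokens) and the decision semantics mirror the throttle / would_throttle outcomes described below:

```python
import time
from collections import deque

class RollingTokenBudget:
    """Sketch of reservation-first accounting over a rolling window."""

    def __init__(self, window_seconds, limit_tokens):
        self.window = window_seconds
        self.limit = limit_tokens
        self.entries = deque()  # (timestamp, tokens) pairs inside the window
        self.used = 0

    def _expire(self, now):
        # Drop reservations that have aged out of the rolling window.
        while self.entries and now - self.entries[0][0] >= self.window:
            _, tokens = self.entries.popleft()
            self.used -= tokens

    def reserve(self, max_tokens, now=None):
        """Charge the request's max_tokens before proxying; refuse if it
        would push the window over the limit (throttle / would_throttle)."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        if self.used + max_tokens > self.limit:
            return False
        self.entries.append((now, max_tokens))
        self.used += max_tokens
        return True

    def reconcile(self, reserved, actual):
        """After the response, replace the pessimistic reservation with
        the actual usage reported by the provider."""
        delta = actual - reserved
        self.used += delta
        if self.entries:
            ts, tokens = self.entries[-1]
            self.entries[-1] = (ts, tokens + delta)

b = RollingTokenBudget(window_seconds=300, limit_tokens=100_000)
print(b.reserve(4096, now=0.0))  # True: reservation fits the budget
b.reconcile(reserved=4096, actual=812)
print(b.used)  # 812: the window now reflects actual usage
```

Reserving max_tokens up front is pessimistic by design: a request can never overdraw the window, and reconciliation returns the unused headroom once the response arrives.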

Listener vs Enforcement Mode

Both modes execute the same enforcement pipeline. Listener mode surfaces governance posture without affecting traffic; enforcement mode acts on it.

| Behavior | Listener | Enforcement |
| --- | --- | --- |
| Identity failure | Emit would_reject, proxy through | Return 403 |
| Budget exceeded | Emit would_throttle, proxy through | Return 429 with Retry-After |
| Kill decision | Emit would_kill, proxy through | Return 503 |
| Metrics | koshi_listener_* series | koshi_enforcement_* series |
| Default listen addr | :15080 | :8080 |
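For a concrete sense of the enforcement-mode behavior, a budget-exceeded response might look like the following. The status code and Retry-After header are documented above, and every error response carries a reason_code; the exact body shape shown here is an illustrative assumption, not a spec:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{"reason_code": "budget_exhausted_throttle"}
```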

Workload Identity

In Kubernetes, identity is derived from pod metadata injected by the webhook at admission time:

| Source | Env Var | Example |
| --- | --- | --- |
| Pod namespace | KOSHI_POD_NAMESPACE | production |
| Owner kind (normalized) | KOSHI_WORKLOAD_KIND | Deployment |
| Owner name (normalized) | KOSHI_WORKLOAD_NAME | my-service |
| Pod name | KOSHI_POD_NAME | my-service-abc123-xyz |

Normalization rules: ReplicaSet owners with pod-template-hash are normalized to Deployment. StatefulSet, DaemonSet, Job, and CronJob owners are used directly. Pods with no owner resolve as Pod/<name>.
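A compact sketch of these rules (hypothetical helper, not Koshi source; the ReplicaSet branch assumes the standard Deployment naming convention in which the ReplicaSet is named <deployment>-<pod-template-hash>):

```python
def normalize_owner(owner_kind, owner_name, pod_name, has_pod_template_hash):
    """Apply the normalization rules: Deployment-owned ReplicaSets fold
    back to Deployment, workload controllers pass through, ownerless
    pods resolve as Pod/<name>."""
    if owner_kind == "ReplicaSet" and has_pod_template_hash:
        # Strip the trailing '-<pod-template-hash>' segment to recover
        # the Deployment name.
        return "Deployment", owner_name.rsplit("-", 1)[0]
    if owner_kind in ("StatefulSet", "DaemonSet", "Job", "CronJob"):
        return owner_kind, owner_name  # used directly
    return "Pod", pod_name             # no owner

print(normalize_owner("ReplicaSet", "my-service-abc123",
                      "my-service-abc123-xyz", True))
# ('Deployment', 'my-service')
```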

In standalone (non-Kubernetes) mode, identity comes from an HTTP header (default: x-genops-workload-id).

Configuration

Config is loaded from KOSHI_CONFIG_PATH when set. If KOSHI_CONFIG_PATH is unset, injected sidecars use built-in sidecar config; KOSHI_MODE and KOSHI_POLICY_OVERRIDE (set via pod annotations) determine listener vs enforcement mode and built-in policy selection.

Sidecar config behavior: The control-plane deployment (main runtime and injector) uses the charted ConfigMap via KOSHI_CONFIG_PATH. Injected sidecars use built-in sidecar config by default — listener mode by default, enforcement mode when annotated — with selectable built-in policies. For arbitrary custom policy, annotate the pod with runtime.getkoshi.ai/configmap to mount a namespace-local ConfigMap containing custom policies, and runtime.getkoshi.ai/policy to select which policy to use. See Path C: Sidecar custom config.

Listener Mode (recommended starting point)

mode:
  type: "listener"

upstreams:
  openai: "https://api.openai.com"
  anthropic: "https://api.anthropic.com"

default_policy:
  id: "_listener_default"
  budgets:
    rolling_tokens:
      window_seconds: 3600
      limit_tokens: 1000000
      burst_tokens: 0
  guards:
    max_tokens_per_request: 32768
  decision_tiers:
    tier1_auto: { action: "throttle" }

See examples/listener-config.yaml for a fully annotated reference.

Enforcement Mode

upstreams:
  openai: "https://api.openai.com"
  anthropic: "https://api.anthropic.com"

workloads:
  - id: "my-agent"
    identity: { mode: "header", key: "x-genops-workload-id" }
    policy_refs: ["standard"]

policies:
  - id: "standard"
    budgets:
      rolling_tokens:
        window_seconds: 300
        limit_tokens: 100000
        burst_tokens: 10000
    guards:
      max_tokens_per_request: 4096
    decision_tiers:
      tier1_auto: { action: "throttle" }
      tier3_platform: { action: "kill_workload" }

See examples/config.yaml for a fully annotated reference.

Settings Reference

| Field | Default | Description |
| --- | --- | --- |
| mode.type | "enforcement" | "listener" or "enforcement" |
| default_policy | none | Policy applied to unknown workloads |
| strict_mode | false | Reject unknown workloads even if default_policy is set |
| sse_extraction | true | Extract actual token usage from SSE streams |
| listen_addr | :8080 (enforcement) / :15080 (listener) | Server listen address |

Observability

Structured Events

Events are emitted as JSON to stdout with "stream": "event". Filter with:

kubectl logs -c koshi-listener | jq 'select(.stream == "event")'

Listener Metrics

| Metric | Labels | Description |
| --- | --- | --- |
| koshi_listener_decisions_total | namespace, decision_shadow, reason_code | Shadow decision counter |
| koshi_listener_tokens_total | namespace, provider, phase | Token reservation/actual counter |
| koshi_listener_latency_seconds | (none) | Enforcement pipeline latency histogram |
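As a hedged starting point for dashboards (assuming a standard Prometheus scrape of the sidecar's /metrics endpoint), queries like these aggregate the counters above; see the observability guide for the project's own sample queries:

```promql
# Shadow decision rate by namespace and outcome over the last 5 minutes
sum by (namespace, decision_shadow) (rate(koshi_listener_decisions_total[5m]))

# Top reason codes among non-allow shadow decisions in the last hour
topk(5, sum by (reason_code) (increase(koshi_listener_decisions_total{decision_shadow!="allow"}[1h])))
```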

Reason Codes

All decisions and error responses include a stable reason_code:

| Code | Meaning |
| --- | --- |
| identity_missing | Could not resolve workload identity and no default policy fallback available |
| policy_not_found | Identity resolved but no explicit or default policy available for evaluation |
| guard_max_tokens | Request max_tokens exceeds the resolved policy's per-request guard |
| budget_exhausted_throttle | Resolved policy's rolling window budget exceeded → throttle |
| budget_exhausted_kill | Resolved policy's rolling window budget exceeded → kill |
| upstream_not_configured | No upstream URL for detected provider |
| upstream_timeout | Upstream did not respond in time |
| system_degraded | Runtime entered degraded mode |
| budget_config_error | Budget tracker misconfiguration |
See docs/kubernetes-observability.md for detailed observability guidance including sample Prometheus queries and event field reference.

How Shadow Decisions Relate to Policy

Shadow decisions are computed against the policy context available to the sidecar at request time. If listener mode is started with a default_policy, requests are evaluated against that policy even when no explicit workload-to-policy mappings are defined. In that case, expect allow, would_throttle, or would_kill outcomes — not would_reject. A would_reject shadow decision only appears when Koshi cannot resolve a usable policy context for the request.

| Situation | Policy context | Expected shadow outcomes |
| --- | --- | --- |
| Listener with default_policy only | default_policy | allow, would_throttle, would_kill |
| Listener with explicit workload-to-policy mapping | matched policy | allow, would_throttle, would_kill |
| Listener with policy override annotation | override policy | allow, would_throttle, would_kill |
| Identity missing, no default policy | none | would_reject (identity_missing) |
| Identity resolved, no matching or default policy | none | would_reject (policy_not_found) |

Deployment

Docker

make docker
# Produces koshi:latest — distroless nonroot image

Kubernetes (Helm)

helm install koshi oci://ghcr.io/koshihq/charts/koshi \
  --version 0.2.12 \
  --namespace koshi-system --create-namespace

Version pinning: Always pin --version in production to avoid unexpected upgrades. The appVersion field in the chart metadata determines the default image tag when image.tag is unset.

Docker Hub mirror: If you prefer Docker Hub, add --set image.repository=docker.io/koshihq/koshi-runtime. The chart and configuration are identical.

Key Helm values:

| Value | Default | Description |
| --- | --- | --- |
| mode | listener | Runtime mode |
| injector.enabled | true | Deploy the admission webhook |
| webhook.failurePolicy | Ignore | Webhook down → pods still create without sidecar |
| sidecar.port | 15080 | Sidecar listen port |
| namespaceSelector.matchLabels | runtime.getkoshi.ai/inject: "true" | Which namespaces get injection |
| networkPolicy.enabled | true | Deploy NetworkPolicy for injector |

Annotations

| Annotation | Values | Description |
| --- | --- | --- |
| runtime.getkoshi.ai/inject | "false" | Opt out a specific pod from injection |
| runtime.getkoshi.ai/mode | "enforcement" | Enable sidecar enforcement mode. When set, the sidecar actively blocks requests that violate policy (429/503). Omit or set to "listener" for shadow-only audit mode (default). |
| runtime.getkoshi.ai/policy | policy ID | Select a built-in sidecar policy: sidecar-baseline (default), sidecar-strict, sidecar-high-throughput. Works in both listener and enforcement sidecar modes. Required when runtime.getkoshi.ai/configmap is set. Unknown IDs fail at sidecar startup. |
| runtime.getkoshi.ai/configmap | ConfigMap name | Mount a namespace-local ConfigMap containing custom sidecar policy. The ConfigMap must contain a config.yaml data key. Must be paired with runtime.getkoshi.ai/policy to select which policy from the ConfigMap to use. Works in both listener and enforcement modes. |

Health Endpoints

| Endpoint | Method | Response |
| --- | --- | --- |
| /healthz | GET | 200 OK / 503 degraded |
| /readyz | GET | 200 ready / 503 degraded |
| /status | GET | Runtime diagnostics and budget state (JSON) |
| /metrics | GET | Prometheus metrics |

Deployment Models

Koshi supports two deployment models today. They serve different purposes and have different operational characteristics.

| | Injected sidecar | Standalone deployment |
| --- | --- | --- |
| Purpose | Governance audit (listener mode, default) or live enforcement with per-pod blast radius (enforcement mode via annotation) | Centralized enforcement with full custom policy config |
| Identity | Pod metadata (injected by webhook at admission) | HTTP header (x-genops-workload-id default) |
| Config source | Built-in sidecar config by default (mode and policy via pod annotations); namespace-local ConfigMap when runtime.getkoshi.ai/configmap is set | File-based runtime config (KOSHI_CONFIG_PATH / ConfigMap) |
| Policy | Built-in policy catalog (sidecar-baseline, sidecar-strict, sidecar-high-throughput) selectable via annotation; or arbitrary custom policy via namespace-local ConfigMap | Named policies with per-workload binding; fully operator-defined |
| Traffic effect | Shadow-only by default; active enforcement (403 / 429 / 503) when mode annotation is set | Active — 403 / 429 / 503 on policy violations |
| Blast radius | Per pod (each sidecar is independent) | Centralized (all routed traffic shares one deployment) |
| Availability | Sidecar lifecycle follows the workload pod | Operator-managed; default chart runs a single replica |
| Metrics | koshi_listener_* or koshi_enforcement_* series (depending on mode) | koshi_enforcement_* series |

Sidecar Operating Shapes

The injected sidecar supports three operating shapes, controlled by pod annotations:

| | Listener (default) | Enforcement + built-in policy | Custom ConfigMap sidecar |
| --- | --- | --- | --- |
| Mode annotation | none (or listener) | enforcement | optional — omit for shadow, enforcement for blocking |
| Policy annotation | optional (selects built-in for shadow eval) | optional (defaults to sidecar-baseline) | required (selects from ConfigMap) |
| ConfigMap annotation | none | none | required |
| Traffic effect | Shadow only | Active blocking (429/503) | Shadow if no mode annotation; active blocking (429/503) if enforcement |
| Policy source | Built-in defaults | Built-in catalog | Operator-authored ConfigMap |

Current Adoption Path

  1. Audit — install Koshi in listener mode, inject sidecars, collect shadow decisions on live traffic. This reveals which workloads generate AI API traffic, what their token patterns look like, and where the default policy boundary sits.

  2. Validate — use shadow outcomes, identity coverage, and token pressure to finalize policy intent. See Posture Discovery and the pre-enforcement checklist.

  3. Enforce — three paths are available:

    • Sidecar enforcement with built-in policy (in-place): add runtime.getkoshi.ai/mode: enforcement to pod annotations and optionally select a built-in policy via runtime.getkoshi.ai/policy. No routing change, no identity change, per-pod blast radius preserved. See Path A.
    • Sidecar custom config via ConfigMap: mount a namespace-local ConfigMap with custom policies via runtime.getkoshi.ai/configmap and runtime.getkoshi.ai/policy annotations. Works in both listener and enforcement modes. See Path C.
    • Standalone enforcement (deployment handoff): deploy Koshi as a self-hosted standalone runtime for centralized enforcement and header-based identity. This is a deployment-model handoff. See Path B and the onboarding guide.

Most teams should start with Path A or Path C. Both preserve per-pod blast radius and require no routing or identity changes. Choose Path A (built-in presets) when standard limits fit, or Path C (custom ConfigMap) when you need operator-authored budgets and guards. Path B (standalone) is for teams that need centralized enforcement across workloads, header-based identity, or a shared enforcement point.

From Audit to Enforcement

There are three paths from listener audit to live enforcement. Choose based on your policy requirements.

| | Path A: Built-in sidecar | Path C: Custom ConfigMap | Path B: Standalone |
| --- | --- | --- | --- |
| Policy | Built-in catalog (3 presets) | Operator-authored (any budgets/guards/tiers) | Operator-authored (full config file) |
| Identity | Pod metadata (automatic) | Pod metadata (automatic) | HTTP header (manual) |
| Traffic change | None | None | Reroute to standalone Service |
| Blast radius | Per pod | Per pod | All routed workloads |
| Config delivery | Pod annotation | Namespace-local ConfigMap | KOSHI_CONFIG_PATH |
| Best for | Quick adoption, standard limits | Team-specific budgets and guards | Centralized coordination, header identity |

Path A: Sidecar enforcement (in-place)

When to choose: You want enforcement with minimal effort and the built-in policy presets fit your workload's token patterns. This is the fastest path from listener audit to live enforcement.

The simplest path. Add pod annotations and enforcement is active on the next pod restart. No routing change, no identity change, no config file.

  1. Add runtime.getkoshi.ai/mode: "enforcement" to the pod template annotations
  2. Optionally select a built-in policy: runtime.getkoshi.ai/policy: "sidecar-strict" (defaults to sidecar-baseline)
  3. Restart the workload — the sidecar now actively blocks requests that violate the selected policy

What you get: live enforcement with per-pod blast radius, pod-derived identity, and built-in policy selection (sidecar-baseline, sidecar-strict, sidecar-high-throughput).

What you don't get: arbitrary custom policy (custom budgets, guards, tier configs) — for that, use Path C (sidecar custom config via ConfigMap). For centralized budget coordination or header-based identity, use standalone enforcement.

See examples/enforcement-sidecar-deployment.yaml for a complete example.

Path B: Standalone enforcement (deployment handoff)

When to choose: You need centralized budget coordination across workloads, header-based identity, or a shared enforcement point. Most teams should start with sidecar enforcement (Path A or C) and only move to standalone if these specific requirements emerge.

Moving to standalone enforcement is not a config change — it is a deployment-model handoff involving three distinct transitions (policy, identity, and traffic) plus rollout risks that should be planned for explicitly.

Policy transition

Listener audit results do not automatically become production policies. The handoff requires manual policy operationalization:

  • Observed workloads from sidecar events (namespace, workload_kind, workload_name tuples) must be mapped into explicit workloads entries with an id, identity.mode: "header", and policy_refs.
  • Operators must define named policies with budget limits, guards, and tier actions informed by shadow outcomes from the audit — but the translation is manual, not automated.
  • policy_refs must be attached to each workload entry to bind it to the appropriate policies.
  • Shadow outcomes like would_throttle and would_kill inform what limits are appropriate, but they do not generate policy config.

Identity transition

Sidecar listener audits resolve identity from pod metadata injected by the webhook at admission time (namespace, workload_kind, workload_name). Standalone enforcement uses a different identity model:

  • Standalone enforcement uses HeaderResolver — identity comes from an HTTP header, not pod metadata.
  • Operators must choose a deployment-wide identity header key (default: x-genops-workload-id).
  • Application code, SDK wrapper, API gateway, or service mesh must send that header on every request routed through the standalone deployment.
  • In v1, all header-mode workloads share the same identity key — this is a real implementation constraint, not a configuration default.

Traffic transition

Sidecar listener mode works because the webhook redirects app traffic locally to the sidecar on localhost. Standalone enforcement routes traffic through a self-hosted Koshi runtime (in Kubernetes, exposed via a Service) — a fundamentally different traffic path:

  • Application HTTP clients must be pointed at the standalone Koshi runtime (e.g., via a Kubernetes Service) instead of directly at AI provider APIs.
  • For workloads being moved to standalone enforcement, the injected sidecar path should be removed or bypassed — this typically means removing the namespace label and restarting workloads.
  • For standalone enforcement specifically, this is a traffic-path change from sidecar-local routing to standalone routing. (For sidecar enforcement, no traffic change is needed — see Path A and Path C.)

Rollout considerations

This handoff is a traffic-path change, not just a policy or config change. Operators should plan for:

  • Shared enforcement path: all routed traffic flows through a self-hosted Koshi runtime, unlike per-pod sidecars where each workload has its own independent sidecar instance.
  • Broader-scope traffic change: a misconfiguration in the standalone runtime or its routing (e.g., Kubernetes Service) affects all workloads routed through it, so testing with a narrow subset first is worthwhile.
  • Traffic cutover coordination: incorrect routing targets, missing identity headers, or DNS issues can misroute traffic. Validate connectivity before shifting production workloads.
  • Rollback path: restoring the previous state means re-enabling sidecar injection and restarting workloads, not changing a config value. Plan this path before cutting over.

Recommendation: start with a small number of workloads. The sidecar listener namespace label can be re-applied and workloads restarted to restore the audit-only posture at any time.

Worked example: one audited workload to standalone enforcement

Listener audit observed:

namespace:        "prod"
workload_kind:    "Deployment"
workload_name:    "payments-api"
provider:         "openai"
decision_shadow:  "would_throttle"
reason_code:      "guard_max_tokens"

Standalone enforcement config:

mode:
  type: "enforcement"

upstreams:
  openai: "https://api.openai.com"

workloads:
  - id: "prod/payments-api"
    type: "service"
    owner_team: "payments"
    environment: "production"
    identity:
      mode: "header"
      key: "x-genops-workload-id"
    model_targets:
      - provider: "openai"
        model: "gpt-4"
    policy_refs:
      - "payments-standard"

policies:
  - id: "payments-standard"
    budgets:
      rolling_tokens:
        window_seconds: 300
        limit_tokens: 250000
        burst_tokens: 10000
    guards:
      max_tokens_per_request: 8192
    decision_tiers:
      tier1_auto:
        action: "throttle"
      tier3_platform:
        action: "kill_workload"

Traffic and identity change:

# Before: sidecar listener audit (webhook-injected env var)
OPENAI_BASE_URL=http://localhost:15080

# After: standalone enforcement (operator-configured)
OPENAI_BASE_URL=http://koshi-koshi.koshi-system.svc.cluster.local:8080
# Identity header sent on every request:
X-GenOps-Workload-Id: prod/payments-api

What happened in this example:

Fields derived from the listener audit:

  • namespace, workload_kind, workload_name, provider — observed directly in structured events
  • decision_shadow: "would_throttle" + reason_code: "guard_max_tokens" — informed the operator that per-request token limits needed attention

Fields chosen by the operator (not in audit output):

  • Standalone workload ID convention (prod/payments-api) — operator decision
  • type, owner_team, environment — organizational metadata, not present in audit events
  • Identity header key (x-genops-workload-id) — operator choice (in v1, all header-mode workloads must share the same key, enforced by config validation)
  • Policy budget and guard values (limit_tokens: 250000, max_tokens_per_request: 8192) — operator decision informed by audit pressure, not a direct translation from built-in listener defaults

Traffic change:

  • Operator rerouted traffic from the sidecar-local localhost:15080 path to the standalone Koshi Service on port 8080 and ensured the identity header is sent on every request

Standalone availability considerations

Standalone enforcement introduces a shared traffic path through a self-hosted Koshi runtime that differs from sidecar modes:

  • The default chart runs a single runtime replica. Operators should evaluate replica count, resource allocation, and disruption budget for their availability requirements.
  • Scaling replicas improves availability, but does not provide globally shared budget state or cross-replica coordination in v1. Each replica maintains independent in-memory accounting.
  • Application traffic must be routed to the standalone deployment (e.g., via a Kubernetes Service), and callers must send the configured workload identity header.

These are standard operational concerns — not blockers — but they should be considered as part of the enforcement rollout design.

Path C: Sidecar custom config via ConfigMap

When to choose: The built-in policy presets don't fit your workload — you need operator-authored budgets, guards, or tier configurations. Custom config works in both listener and enforcement modes, so you can shadow-test custom policy before activating blocking.

A sidecar with configmap + policy annotations and no mode annotation runs in listener mode with the custom policy (shadow decisions against custom budgets/guards). Adding mode: "enforcement" activates blocking.

# Pod template annotations for custom ConfigMap sidecar
annotations:
  runtime.getkoshi.ai/configmap: "my-team-policy"       # required — mounts the ConfigMap
  runtime.getkoshi.ai/policy: "team-standard"            # required — selects policy from ConfigMap
  # runtime.getkoshi.ai/mode: "enforcement"              # optional — uncomment to activate blocking

Required annotations:

  • runtime.getkoshi.ai/configmap: <configmap-name> — mounts the namespace-local ConfigMap
  • runtime.getkoshi.ai/policy: <policy-id> — selects which policy from the ConfigMap to use (required when configmap is set)

Optional:

  • runtime.getkoshi.ai/mode: "enforcement" — activates blocking; omit for listener mode (default)

ConfigMap contract:

  • The ConfigMap must contain a config.yaml data key — the sidecar loads from /etc/koshi-sidecar/config.yaml
  • Do not define workloads in the ConfigMap config — the sidecar synthesizes its own workload from pod identity at startup
  • mode.type in the config file is ignored — mode comes from the annotation only
  • Pod restart is required after ConfigMap content changes or annotation changes
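Following that contract, a namespace-local ConfigMap might look like this sketch. The policy fields mirror the enforcement config schema shown earlier in this README; the names and limit values are illustrative placeholders, not recommended settings:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-team-policy        # referenced by runtime.getkoshi.ai/configmap
  namespace: my-namespace     # must live in the workload's namespace
data:
  config.yaml: |
    # No `workloads:` section — the sidecar synthesizes its own workload
    # from pod identity at startup. `mode.type` would be ignored here;
    # mode comes from the pod annotation only.
    policies:
      - id: "team-standard"   # selected via runtime.getkoshi.ai/policy
        budgets:
          rolling_tokens:
            window_seconds: 600
            limit_tokens: 200000
            burst_tokens: 5000
        guards:
          max_tokens_per_request: 8192
        decision_tiers:
          tier1_auto: { action: "throttle" }
```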

What you get: arbitrary custom policy (operator-authored budgets, guards, tier configs) with per-pod blast radius and pod-derived identity. No routing change, no identity change.

What you don't get: centralized budget coordination or header-based identity. For those, use standalone enforcement.

See examples/sidecar-custom-configmap.yaml and examples/sidecar-custom-deployment.yaml for complete examples.

See the enforcement mode config reference and the pre-enforcement checklist before switching to standalone enforcement.

Posture Discovery in Listener Mode

Listener mode is a policy design sketchpad at the execution boundary. The full enforcement pipeline runs on every AI API request, but decisions are shadow-only — no traffic is blocked. This lets operators observe how specific policy constructs would interact with real traffic before committing to enforcement.

How shadow decisions relate to policy design

Each shadow outcome maps to a specific policy construct that would fire in enforcement mode:

| Shadow outcome | Policy construct tested | What to refine |
| --- | --- | --- |
| allow | All checks passed | Baseline posture is acceptable for this traffic pattern |
| would_throttle + guard_max_tokens | guards.max_tokens_per_request | Per-request token guard is tighter than this workload's actual request sizes |
| would_throttle + budget_exhausted_throttle | budgets.rolling_tokens.limit_tokens / window_seconds | Rolling budget is tighter than this workload's sustained consumption rate |
| would_kill + budget_exhausted_kill | decision_tiers.tier3_platform.action: "kill_workload" | Severe budget pressure — review whether consumption is expected or the budget needs widening |
| would_reject + identity_missing | Identity resolution (webhook injection) | Sidecar couldn't resolve workload identity — check that the webhook is injecting env vars |
| would_reject + policy_not_found | Policy lookup | No usable policy context — relevant when explicit workload mappings are configured without a default fallback |

Iterative refinement workflow

  1. Observe — collect shadow decisions on live traffic. Start with the built-in default listener policy.
  2. Identify pressure points — which reason_code values appear most? Which namespaces or workloads generate would_throttle or would_kill?
  3. Refine policy intent — use the shadow outcomes to decide what guard limits, budget windows, and tier actions are appropriate for each workload class.
  4. Repeat — continue observing until the shadow posture matches your intended enforcement posture. The goal is to reach a state where the shadow outcomes are what you want enforcement to produce.
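Step 2 amounts to tallying collected events offline. A hedged Python sketch (field names stream, decision_shadow, and reason_code follow the events shown in this README; piping kubectl logs through jq works equally well):

```python
import json
from collections import Counter

def tally_pressure(log_lines):
    """Count (decision_shadow, reason_code) pairs from structured event
    lines, skipping non-event streams and non-JSON log lines."""
    counts = Counter()
    for line in log_lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # plain-text startup/debug log line
        if rec.get("stream") != "event":
            continue
        counts[(rec.get("decision_shadow"), rec.get("reason_code"))] += 1
    return counts

sample = [
    '{"stream": "event", "decision_shadow": "allow"}',
    '{"stream": "event", "decision_shadow": "would_throttle", "reason_code": "guard_max_tokens"}',
    '{"stream": "event", "decision_shadow": "would_throttle", "reason_code": "guard_max_tokens"}',
    'plain text startup log line',
]
for (decision, reason), n in tally_pressure(sample).most_common():
    print(n, decision, reason)
```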

This observe-refine-repeat loop is the primary value of listener mode. Shadow decisions are not just audit data — they are the feedback signal for designing production policy.

Built-in per-pod policy selection is supported via the runtime.getkoshi.ai/policy annotation — choose from sidecar-baseline, sidecar-strict, or sidecar-high-throughput in both listener and enforcement sidecar modes. For arbitrary custom policy, use the runtime.getkoshi.ai/configmap annotation to mount a namespace-local ConfigMap with operator-authored budgets, guards, and tier configurations. See Path C: Sidecar custom config via ConfigMap.

Architecture

Koshi Runtime implements the GenOps Governance Specification — an open standard for AI workload governance semantics. GenOps defines the event attributes, lifecycle events, and status metadata that Koshi emits. Operators don't need to learn GenOps to use Koshi; it matters when integrating governance telemetry into broader tooling. See GenOps Compatibility.

  • One binary, two roles. KOSHI_ROLE=injector starts the admission webhook server. Default starts the proxy.
  • No Kubernetes API calls on the request path. Pod identity is normalized at admission time by the webhook and read from env vars by the sidecar.
  • Webhook failurePolicy: Ignore. If the injector is down, pods still create — they just don't get the sidecar. Verify sidecar presence after deployment: kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}' — look for koshi-listener.
  • Base-URL injection is safe. OPENAI_BASE_URL / ANTHROPIC_BASE_URL are only set on app containers if not already present. See What Traffic Produces Signal for implications when these vars are already defined.
  • Reservation-first accounting. Tokens are reserved before the request and reconciled with actual usage after the response.
  • Fail open on infrastructure, fail closed on policy. A panic triggers degraded pass-through mode. In enforcement mode, an unknown workload gets 403. In listener mode, unknown workloads emit would_reject and traffic proxies through.

Uninstall / Rollback

# Remove Koshi entirely
helm uninstall koshi -n koshi-system

# Remove the auto-generated TLS secret (created by cert-gen hook, not managed by Helm release)
kubectl delete secret koshi-koshi-webhook-tls -n koshi-system 2>/dev/null || true

# Remove namespace label (stops new pods from getting sidecars)
kubectl label namespace my-namespace runtime.getkoshi.ai/inject-

# Restart workloads to remove existing sidecars
kubectl rollout restart deployment -n my-namespace

Existing pods with sidecars continue to function until restarted. Removing the namespace label only affects future pod creation.

Documentation

For setup, evaluation, contribution, and architectural context:

Start here

Project

  • Roadmap — public product direction for the open runtime
  • Contributing — contribution guide
  • Security — vulnerability reporting and disclosure policy
  • License — Apache 2.0

Design

GenOps Compatibility

Koshi is the runtime operators deploy. GenOps is the open governance specification Koshi implements — it defines the event semantics, required attributes, and interoperability surfaces that make governance data portable across tools and platforms.

You do not need to read the GenOps spec to use Koshi. The spec shows up in three places:

| Surface | What it provides |
| --- | --- |
| Structured events | genops.spec.version attribute on every event; required attribute names (genops.accounting.*, genops.policy.*, genops.team, etc.) |
| GET /status | genops_spec_version field in runtime diagnostics |
| Standalone header defaults | Default identity header name x-genops-workload-id follows GenOps naming conventions |

Day-one operators can ignore GenOps entirely — Koshi handles compliance internally. Platform teams integrating governance telemetry into broader observability or compliance pipelines benefit from the stable, spec-defined attribute names and event structure.

Spec version: 0.1.0. Built against the GenOps Governance Specification. See Koshi and GenOps for the full relationship.

Development

make build        # Build binary
make test-race    # Run tests with race detector
make lint         # Run linter
make docker       # Build Docker image

Known v1 Limitations

  • In-memory budget state. Each sidecar maintains its own budget. State is lost on restart.
  • No cross-replica coordination. Budget enforcement is per-replica.
  • Single policy per workload. Only the first policy_ref is used.
  • Listener mode accounting is policy-scoped per sidecar. Shadow accounting uses a shared policy key (_default or listener_policy/<id>) rather than a per-workload tracker key. This keeps accounting bounded, but does not provide cross-replica or cluster-wide budget simulation. Listener events still include namespace, workload kind, and workload name for observability; only the in-memory accounting key is policy-scoped.