Koshi Runtime is a workload-scoped governance plane for AI systems. It deploys as a Kubernetes sidecar that enforces deterministic policy at the workload boundary — token budgets, per-request guards, and tiered enforcement decisions — using reservation-first accounting. The sidecar supports three operating shapes: listener audit (shadow-only, default), enforcement with built-in policy presets, and operator-authored custom policy via namespace-local ConfigMap (works in both listener and enforcement modes). A standalone deployment model is available for centralized enforcement with header-based identity.
Discover your governance posture before enforcing it. Koshi ships in listener mode by default: the full enforcement pipeline — identity resolution, policy lookup, guard evaluation, budget accounting — executes on every request, but no traffic is blocked. Shadow decisions (would_reject, would_throttle, would_kill) reveal exactly where your policies would intervene. When the posture matches your intent, activate enforcement on the same sidecar — choose built-in policy presets via the runtime.getkoshi.ai/policy annotation, or deliver custom policy via a namespace-local ConfigMap (runtime.getkoshi.ai/configmap). Both work with a single annotation change and a pod restart. See From Audit to Enforcement.
No repo clone required. Install Koshi directly from the published OCI Helm chart and container image.
```bash
# 1. Install into koshi-system
helm install koshi oci://ghcr.io/koshihq/charts/koshi \
  --version 0.2.12 \
  --namespace koshi-system --create-namespace

# 2. Opt namespaces in — only labeled namespaces get the sidecar
kubectl label namespace my-namespace runtime.getkoshi.ai/inject=true

# 3. Restart workloads to pick up the sidecar
kubectl rollout restart deployment -n my-namespace

# 4. Observe shadow events
kubectl logs -n my-namespace deploy/my-app -c koshi-listener --tail=100 | \
  jq 'select(.stream == "event")'

# 5. Check metrics (default sidecar port; adjust if you changed sidecar.port)
kubectl port-forward -n my-namespace deploy/my-app 15080:15080
curl http://localhost:15080/metrics | grep koshi_listener
```

- Install Koshi in listener mode — one Helm command, no repo clone or config files required
- Label any namespace and restart workloads to get sidecars injected
- Collect structured JSON events from `koshi-listener` container logs
- Scrape Prometheus metrics from `/metrics` on the sidecar port (default `15080`, configurable via `sidecar.port`)
- Observe real shadow decisions (`allow`, `would_throttle`, `would_kill`, `would_reject`) on live traffic without blocking anything
Workloads produce governance signal when they send OpenAI- or Anthropic-compatible API requests through the sidecar. The webhook injects OPENAI_BASE_URL and ANTHROPIC_BASE_URL env vars pointing at the sidecar (only if the container does not already set them). The sidecar evaluates the request against policy, emits a shadow event, and proxies the request to the real upstream transparently.
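The redirection mechanism is plain environment-variable precedence, which the official SDKs already implement. A minimal Python sketch of that lookup (`resolve_base_url` is illustrative, not an SDK function):

```python
import os

def resolve_base_url(env_var: str, default: str) -> str:
    # The env var, when set by the webhook, redirects traffic to the
    # sidecar; otherwise the provider's public endpoint is used.
    return os.environ.get(env_var) or default

# Without the webhook-injected var, traffic goes straight to the provider:
os.environ.pop("OPENAI_BASE_URL", None)
assert resolve_base_url("OPENAI_BASE_URL", "https://api.openai.com") == "https://api.openai.com"

# With it, traffic flows through the sidecar on localhost:
os.environ["OPENAI_BASE_URL"] = "http://localhost:15080"
assert resolve_base_url("OPENAI_BASE_URL", "https://api.openai.com") == "http://localhost:15080"
```

This is why a hardcoded base URL in application code bypasses the sidecar entirely: the env var is never consulted.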
Prerequisites for signal:
- The workload's SDK or HTTP client must honor `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` env vars. The official OpenAI and Anthropic SDKs do this by default.
- The workload must not already set these env vars in its pod spec — if present, the webhook will not overwrite them.
- The workload must not hardcode provider URLs in application code or config files, bypassing the env vars entirely.
No signal? Check these first:
- Verify the sidecar container exists: `kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}'` — look for `koshi-listener`
- Verify the env vars were injected: `kubectl get pod <pod> -o jsonpath='{.spec.containers[0].env[*].name}'` — look for `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL`
- If the env vars are missing, the workload's pod spec likely already defines them — check the Deployment manifest
- If the env vars are present but no events appear, the SDK may not be honoring them — check whether the app uses a custom HTTP client or hardcoded base URL
| Component | Namespace | Purpose |
|---|---|---|
| Injector Deployment | `koshi-system` | Mutating admission webhook — injects sidecar into labeled namespaces |
| MutatingWebhookConfiguration | cluster-scoped | Intercepts pod CREATE in namespaces with `runtime.getkoshi.ai/inject: "true"` |
| ConfigMap | `koshi-system` | Runtime config for the control-plane deployment (mode, upstreams, default policy). Injected sidecars use built-in sidecar config by default (mode and policy selected via pod annotations); they do not read this ConfigMap. For custom sidecar policy, operators create a separate namespace-local ConfigMap and reference it via `runtime.getkoshi.ai/configmap`. |
| TLS Secret | `koshi-system` | Webhook serving certificate (self-signed by default) |
| NetworkPolicy | `koshi-system` | Restricts injector ingress to apiserver, sidecar egress to upstreams |
The sidecar (koshi-listener) is injected into workload pods automatically. It:
- Listens on `:15080` by default (configurable via `sidecar.port` Helm value)
- Exposes `/metrics`, `/healthz`, `/readyz`, `/status`
- Receives traffic via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` env vars injected into app containers
```
Request → Identify workload (pod metadata) → Resolve policy → Extract max_tokens →
Per-request guard check → Reserve tokens (rolling window) → Tier decision →
Emit shadow event (listener) or enforce (enforcement) → Proxy to upstream →
Record actual usage on response
```
Both modes execute the same enforcement pipeline. Listener mode surfaces governance posture without affecting traffic; enforcement mode acts on it.
| Behavior | Listener | Enforcement |
|---|---|---|
| Identity failure | Emit `would_reject`, proxy through | Return 403 |
| Budget exceeded | Emit `would_throttle`, proxy through | Return 429 with `Retry-After` |
| Kill decision | Emit `would_kill`, proxy through | Return 503 |
| Metrics | `koshi_listener_*` series | `koshi_enforcement_*` series |
| Default listen addr | `:15080` | `:8080` |
In Kubernetes, identity is derived from pod metadata injected by the webhook at admission time:
| Source | Env Var | Example |
|---|---|---|
| Pod namespace | `KOSHI_POD_NAMESPACE` | `production` |
| Owner kind (normalized) | `KOSHI_WORKLOAD_KIND` | `Deployment` |
| Owner name (normalized) | `KOSHI_WORKLOAD_NAME` | `my-service` |
| Pod name | `KOSHI_POD_NAME` | `my-service-abc123-xyz` |
Normalization rules: ReplicaSet owners with pod-template-hash are normalized to Deployment. StatefulSet, DaemonSet, Job, and CronJob owners are used directly. Pods with no owner resolve as Pod/<name>.
In standalone (non-Kubernetes) mode, identity comes from an HTTP header (default: x-genops-workload-id).
Config is loaded from KOSHI_CONFIG_PATH when set. If KOSHI_CONFIG_PATH is unset, injected sidecars use built-in sidecar config; KOSHI_MODE and KOSHI_POLICY_OVERRIDE (set via pod annotations) determine listener vs enforcement mode and built-in policy selection.
Sidecar config behavior: The control-plane deployment (main runtime and injector) uses the charted ConfigMap via KOSHI_CONFIG_PATH. Injected sidecars use built-in sidecar config by default — listener mode by default, enforcement mode when annotated — with selectable built-in policies. For arbitrary custom policy, annotate the pod with runtime.getkoshi.ai/configmap to mount a namespace-local ConfigMap containing custom policies, and runtime.getkoshi.ai/policy to select which policy to use. See Path C: Sidecar custom config.
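A sketch of this precedence (illustrative; `select_config_source` and its return strings are invented for this example, and the `sidecar-baseline` default mirrors the documented policy annotation default):

```python
def select_config_source(env: dict) -> str:
    """Illustrative sketch of the config-loading precedence above."""
    # KOSHI_CONFIG_PATH wins when set (control-plane deployment, or a
    # sidecar with a mounted namespace-local ConfigMap).
    if env.get("KOSHI_CONFIG_PATH"):
        return "file:" + env["KOSHI_CONFIG_PATH"]
    # Otherwise injected sidecars fall back to built-in config, selected
    # by KOSHI_MODE and KOSHI_POLICY_OVERRIDE (set via pod annotations).
    mode = env.get("KOSHI_MODE", "listener")
    policy = env.get("KOSHI_POLICY_OVERRIDE", "sidecar-baseline")
    return f"builtin:{mode}/{policy}"

assert select_config_source({"KOSHI_CONFIG_PATH": "/etc/koshi/config.yaml"}) == "file:/etc/koshi/config.yaml"
assert select_config_source({}) == "builtin:listener/sidecar-baseline"
assert select_config_source({"KOSHI_MODE": "enforcement"}) == "builtin:enforcement/sidecar-baseline"
```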
```yaml
mode:
  type: "listener"
upstreams:
  openai: "https://api.openai.com"
  anthropic: "https://api.anthropic.com"
default_policy:
  id: "_listener_default"
  budgets:
    rolling_tokens:
      window_seconds: 3600
      limit_tokens: 1000000
      burst_tokens: 0
  guards:
    max_tokens_per_request: 32768
  decision_tiers:
    tier1_auto: { action: "throttle" }
```

See examples/listener-config.yaml for a fully annotated reference.
```yaml
upstreams:
  openai: "https://api.openai.com"
  anthropic: "https://api.anthropic.com"
workloads:
  - id: "my-agent"
    identity: { mode: "header", key: "x-genops-workload-id" }
    policy_refs: ["standard"]
policies:
  - id: "standard"
    budgets:
      rolling_tokens:
        window_seconds: 300
        limit_tokens: 100000
        burst_tokens: 10000
    guards:
      max_tokens_per_request: 4096
    decision_tiers:
      tier1_auto: { action: "throttle" }
      tier3_platform: { action: "kill_workload" }
```

See examples/config.yaml for a fully annotated reference.
| Field | Default | Description |
|---|---|---|
| `mode.type` | `"enforcement"` | `"listener"` or `"enforcement"` |
| `default_policy` | none | Policy applied to unknown workloads |
| `strict_mode` | `false` | Reject unknown workloads even if `default_policy` is set |
| `sse_extraction` | `true` | Extract actual token usage from SSE streams |
| `listen_addr` | `:8080` (enforcement) / `:15080` (listener) | Server listen address |
Events are emitted as JSON to stdout with `"stream": "event"`. Filter with:

```bash
kubectl logs -c koshi-listener | jq 'select(.stream == "event")'
```

| Metric | Labels | Description |
|---|---|---|
| `koshi_listener_decisions_total` | `namespace`, `decision_shadow`, `reason_code` | Shadow decision counter |
| `koshi_listener_tokens_total` | `namespace`, `provider`, `phase` | Token reservation/actual counter |
| `koshi_listener_latency_seconds` | (none) | Enforcement pipeline latency histogram |
All decisions and error responses include a stable `reason_code`:

| Code | Meaning |
|---|---|
| `identity_missing` | Could not resolve workload identity and no default policy fallback available |
| `policy_not_found` | Identity resolved but no explicit or default policy available for evaluation |
| `guard_max_tokens` | Request `max_tokens` exceeds the resolved policy's per-request guard |
| `budget_exhausted_throttle` | Resolved policy's rolling window budget exceeded → throttle |
| `budget_exhausted_kill` | Resolved policy's rolling window budget exceeded → kill |
| `upstream_not_configured` | No upstream URL for detected provider |
| `upstream_timeout` | Upstream did not respond in time |
| `system_degraded` | Runtime entered degraded mode |
| `budget_config_error` | Budget tracker misconfiguration |
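In enforcement mode, the policy-related reason codes map to the HTTP statuses shown in the listener/enforcement behavior table. A sketch of that mapping (illustrative; the `would_reject` family is assumed to surface as 403, and codes whose enforcement status is not documented here are omitted):

```python
# Illustrative mapping only, derived from the behavior table in this
# document, not from Koshi's source.
ENFORCEMENT_STATUS = {
    "identity_missing": 403,           # would_reject → reject
    "policy_not_found": 403,           # would_reject → reject
    "guard_max_tokens": 429,           # surfaces as a throttle decision
    "budget_exhausted_throttle": 429,  # includes a Retry-After header
    "budget_exhausted_kill": 503,      # kill decision
}

assert ENFORCEMENT_STATUS["budget_exhausted_throttle"] == 429
assert ENFORCEMENT_STATUS["budget_exhausted_kill"] == 503
```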
See docs/kubernetes-observability.md for detailed observability guidance including sample Prometheus queries and event field reference.
Shadow decisions are computed against the policy context available to the sidecar at request time. If listener mode is started with a default_policy, requests are evaluated against that policy even when no explicit workload-to-policy mappings are defined. In that case, expect allow, would_throttle, or would_kill outcomes — not would_reject. A would_reject shadow decision only appears when Koshi cannot resolve a usable policy context for the request.
| Situation | Policy context | Expected shadow outcomes |
|---|---|---|
| Listener with `default_policy` only | `default_policy` | `allow`, `would_throttle`, `would_kill` |
| Listener with explicit workload-to-policy mapping | matched policy | `allow`, `would_throttle`, `would_kill` |
| Listener with policy override annotation | override policy | `allow`, `would_throttle`, `would_kill` |
| Identity missing, no default policy | none | `would_reject` (`identity_missing`) |
| Identity resolved, no matching or default policy | none | `would_reject` (`policy_not_found`) |
```bash
make docker
# Produces koshi:latest — distroless nonroot image
```

```bash
helm install koshi oci://ghcr.io/koshihq/charts/koshi \
  --version 0.2.12 \
  --namespace koshi-system --create-namespace
```

Version pinning: always pin `--version` in production to avoid unexpected upgrades. The `appVersion` field in the chart metadata determines the default image tag when `image.tag` is unset.

Docker Hub mirror: if you prefer Docker Hub, add `--set image.repository=docker.io/koshihq/koshi-runtime`. The chart and configuration are identical.
Key Helm values:
| Value | Default | Description |
|---|---|---|
| `mode` | `listener` | Runtime mode |
| `injector.enabled` | `true` | Deploy the admission webhook |
| `webhook.failurePolicy` | `Ignore` | Webhook down → pods still create without sidecar |
| `sidecar.port` | `15080` | Sidecar listen port |
| `namespaceSelector.matchLabels` | `runtime.getkoshi.ai/inject: "true"` | Which namespaces get injection |
| `networkPolicy.enabled` | `true` | Deploy NetworkPolicy for injector |
| Annotation | Values | Description |
|---|---|---|
| `runtime.getkoshi.ai/inject` | `"false"` | Opt out a specific pod from injection |
| `runtime.getkoshi.ai/mode` | `"enforcement"` | Enable sidecar enforcement mode. When set, the sidecar actively blocks requests that violate policy (429/503). Omit or set to `"listener"` for shadow-only audit mode (default). |
| `runtime.getkoshi.ai/policy` | policy ID | Select a built-in sidecar policy: `sidecar-baseline` (default), `sidecar-strict`, `sidecar-high-throughput`. Works in both listener and enforcement sidecar modes. Required when `runtime.getkoshi.ai/configmap` is set. Unknown IDs fail at sidecar startup. |
| `runtime.getkoshi.ai/configmap` | ConfigMap name | Mount a namespace-local ConfigMap containing custom sidecar policy. The ConfigMap must contain a `config.yaml` data key. Must be paired with `runtime.getkoshi.ai/policy` to select which policy from the ConfigMap to use. Works in both listener and enforcement modes. |
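The annotation rules above can be expressed as a small validation sketch (illustrative Python; `validate_annotations` and its return strings are invented for this example, not Koshi's code):

```python
BUILTIN_POLICIES = {"sidecar-baseline", "sidecar-strict", "sidecar-high-throughput"}

def validate_annotations(ann: dict) -> str:
    """Illustrative sketch of the annotation pairing rules above."""
    # ConfigMap-based custom policy: the policy annotation is mandatory.
    if "runtime.getkoshi.ai/configmap" in ann:
        if "runtime.getkoshi.ai/policy" not in ann:
            raise ValueError("configmap annotation requires a policy annotation")
        return "custom-configmap"
    # Built-in catalog: unknown policy IDs fail at sidecar startup.
    policy = ann.get("runtime.getkoshi.ai/policy", "sidecar-baseline")
    if policy not in BUILTIN_POLICIES:
        raise ValueError(f"unknown built-in policy: {policy}")
    return f"builtin:{policy}"

assert validate_annotations({}) == "builtin:sidecar-baseline"
assert validate_annotations({"runtime.getkoshi.ai/policy": "sidecar-strict"}) == "builtin:sidecar-strict"
assert validate_annotations({
    "runtime.getkoshi.ai/configmap": "my-team-policy",
    "runtime.getkoshi.ai/policy": "team-standard",
}) == "custom-configmap"
```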
| Endpoint | Method | Response |
|---|---|---|
| `/healthz` | GET | 200 OK / 503 degraded |
| `/readyz` | GET | 200 ready / 503 degraded |
| `/status` | GET | Runtime diagnostics and budget state (JSON) |
| `/metrics` | GET | Prometheus metrics |
Koshi supports two deployment models today. They serve different purposes and have different operational characteristics.
| | Injected sidecar | Standalone deployment |
|---|---|---|
| Purpose | Governance audit (listener mode, default) or live enforcement with per-pod blast radius (enforcement mode via annotation) | Centralized enforcement with full custom policy config |
| Identity | Pod metadata (injected by webhook at admission) | HTTP header (`x-genops-workload-id` default) |
| Config source | Built-in sidecar config by default (mode and policy via pod annotations); namespace-local ConfigMap when `runtime.getkoshi.ai/configmap` is set | File-based runtime config (`KOSHI_CONFIG_PATH` / ConfigMap) |
| Policy | Built-in policy catalog (`sidecar-baseline`, `sidecar-strict`, `sidecar-high-throughput`) selectable via annotation; or arbitrary custom policy via namespace-local ConfigMap | Named policies with per-workload binding; fully operator-defined |
| Traffic effect | Shadow-only by default; active enforcement (403 / 429 / 503) when mode annotation is set | Active — 403 / 429 / 503 on policy violations |
| Blast radius | Per pod (each sidecar is independent) | Centralized (all routed traffic shares one deployment) |
| Availability | Sidecar lifecycle follows the workload pod | Operator-managed; default chart runs a single replica |
| Metrics | `koshi_listener_*` or `koshi_enforcement_*` series (depending on mode) | `koshi_enforcement_*` series |
The injected sidecar supports three operating shapes, controlled by pod annotations:
| | Listener (default) | Enforcement + built-in policy | Custom ConfigMap sidecar |
|---|---|---|---|
| Mode annotation | none (or `listener`) | `enforcement` | optional — omit for shadow, `enforcement` for blocking |
| Policy annotation | optional (selects built-in for shadow eval) | optional (defaults to `sidecar-baseline`) | required (selects from ConfigMap) |
| ConfigMap annotation | none | none | required |
| Traffic effect | Shadow only | Active blocking (429/503) | Shadow if no mode annotation; active blocking (429/503) if `enforcement` |
| Policy source | Built-in defaults | Built-in catalog | Operator-authored ConfigMap |
1. Audit — install Koshi in listener mode, inject sidecars, collect shadow decisions on live traffic. This reveals which workloads generate AI API traffic, what their token patterns look like, and where the default policy boundary sits.
2. Validate — use shadow outcomes, identity coverage, and token pressure to finalize policy intent. See Posture Discovery and the pre-enforcement checklist.
3. Enforce — three paths are available:
   - Sidecar enforcement with built-in policy (in-place): add `runtime.getkoshi.ai/mode: enforcement` to pod annotations and optionally select a built-in policy via `runtime.getkoshi.ai/policy`. No routing change, no identity change, per-pod blast radius preserved. See Path A.
   - Sidecar custom config via ConfigMap: mount a namespace-local ConfigMap with custom policies via `runtime.getkoshi.ai/configmap` and `runtime.getkoshi.ai/policy` annotations. Works in both listener and enforcement modes. See Path C.
   - Standalone enforcement (deployment handoff): deploy Koshi as a self-hosted standalone runtime for centralized enforcement and header-based identity. This is a deployment-model handoff. See Path B and the onboarding guide.
Most teams should start with Path A or Path C. Both preserve per-pod blast radius and require no routing or identity changes. Choose Path A (built-in presets) when standard limits fit, or Path C (custom ConfigMap) when you need operator-authored budgets and guards. Path B (standalone) is for teams that need centralized enforcement across workloads, header-based identity, or a shared enforcement point.
There are three paths from listener audit to live enforcement. Choose based on your policy requirements.
| | Path A: Built-in sidecar | Path C: Custom ConfigMap | Path B: Standalone |
|---|---|---|---|
| Policy | Built-in catalog (3 presets) | Operator-authored (any budgets/guards/tiers) | Operator-authored (full config file) |
| Identity | Pod metadata (automatic) | Pod metadata (automatic) | HTTP header (manual) |
| Traffic change | None | None | Reroute to standalone Service |
| Blast radius | Per pod | Per pod | All routed workloads |
| Config delivery | Pod annotation | Namespace-local ConfigMap | `KOSHI_CONFIG_PATH` |
| Best for | Quick adoption, standard limits | Team-specific budgets and guards | Centralized coordination, header identity |
When to choose: You want enforcement with minimal effort and the built-in policy presets fit your workload's token patterns. This is the fastest path from listener audit to live enforcement.
The simplest path. Add pod annotations and enforcement is active on the next pod restart. No routing change, no identity change, no config file.
- Add `runtime.getkoshi.ai/mode: "enforcement"` to the pod template annotations
- Optionally select a built-in policy: `runtime.getkoshi.ai/policy: "sidecar-strict"` (defaults to `sidecar-baseline`)
- Restart the workload — the sidecar now actively blocks requests that violate the selected policy
What you get: live enforcement with per-pod blast radius, pod-derived identity, and built-in policy selection (sidecar-baseline, sidecar-strict, sidecar-high-throughput).
What you don't get: arbitrary custom policy (custom budgets, guards, tier configs) — for that, use Path C (sidecar custom config via ConfigMap). For centralized budget coordination or header-based identity, use standalone enforcement.
See examples/enforcement-sidecar-deployment.yaml for a complete example.
When to choose: You need centralized budget coordination across workloads, header-based identity, or a shared enforcement point. Most teams should start with sidecar enforcement (Path A or C) and only move to standalone if these specific requirements emerge.
Moving to standalone enforcement is not a config change — it is a deployment-model handoff involving three distinct transitions and one operational risk that should be planned for explicitly.
Listener audit results do not automatically become production policies. The handoff requires manual policy operationalization:
- Observed workloads from sidecar events (`namespace`, `workload_kind`, `workload_name` tuples) must be mapped into explicit `workloads` entries with an `id`, `identity.mode: "header"`, and `policy_refs`.
- Operators must define named `policies` with budget limits, guards, and tier actions informed by shadow outcomes from the audit — but the translation is manual, not automated.
- `policy_refs` must be attached to each workload entry to bind it to the appropriate policies.
- Shadow outcomes like `would_throttle` and `would_kill` inform what limits are appropriate, but they do not generate policy config.
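The mapping step can be sketched as follows (illustrative Python; the `namespace/name` ID convention is an operator choice, as in the worked example later in this section, and `workload_entry_from_event` is invented for this sketch):

```python
def workload_entry_from_event(event: dict, policy_ref: str) -> dict:
    """Illustrative sketch: turn an observed audit tuple into an explicit
    standalone workload entry. Budget values in the referenced policy
    still have to be chosen by the operator."""
    return {
        # Operator-chosen ID convention, not generated by Koshi:
        "id": f"{event['namespace']}/{event['workload_name']}",
        "identity": {"mode": "header", "key": "x-genops-workload-id"},
        "policy_refs": [policy_ref],
    }

entry = workload_entry_from_event(
    {"namespace": "prod", "workload_kind": "Deployment", "workload_name": "payments-api"},
    "payments-standard",
)
assert entry["id"] == "prod/payments-api"
assert entry["policy_refs"] == ["payments-standard"]
```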
Sidecar listener audits resolve identity from pod metadata injected by the webhook at admission time (namespace, workload_kind, workload_name). Standalone enforcement uses a different identity model:
- Standalone enforcement uses `HeaderResolver` — identity comes from an HTTP header, not pod metadata.
- Operators must choose a deployment-wide identity header key (default: `x-genops-workload-id`).
- Application code, SDK wrapper, API gateway, or service mesh must send that header on every request routed through the standalone deployment.
- In v1, all header-mode workloads share the same identity key — this is a real implementation constraint, not a configuration default.
Sidecar listener mode works because the webhook redirects app traffic locally to the sidecar on localhost. Standalone enforcement routes traffic through a self-hosted Koshi runtime (in Kubernetes, exposed via a Service) — a fundamentally different traffic path:
- Application HTTP clients must be pointed at the standalone Koshi runtime (e.g., via a Kubernetes Service) instead of directly at AI provider APIs.
- For workloads being moved to standalone enforcement, the injected sidecar path should be removed or bypassed — this typically means removing the namespace label and restarting workloads.
- For standalone enforcement specifically, this is a traffic-path change from sidecar-local routing to standalone routing. (For sidecar enforcement, no traffic change is needed — see Path A and Path C.)
This handoff is a traffic-path change, not just a policy or config change. Operators should plan for:
- Shared enforcement path: all routed traffic flows through a self-hosted Koshi runtime, unlike per-pod sidecars where each workload has its own independent sidecar instance.
- Broader-scope traffic change: a misconfiguration in the standalone runtime or its routing (e.g., Kubernetes Service) affects all workloads routed through it, so testing with a narrow subset first is worthwhile.
- Traffic cutover coordination: incorrect routing targets, missing identity headers, or DNS issues can misroute traffic. Validate connectivity before shifting production workloads.
- Rollback path: restoring the previous state means re-enabling sidecar injection and restarting workloads, not changing a config value. Plan this path before cutting over.
Recommendation: start with a small number of workloads. The sidecar listener namespace label can be re-applied and workloads restarted to restore the audit-only posture at any time.
Listener audit observed:
```yaml
namespace: "prod"
workload_kind: "Deployment"
workload_name: "payments-api"
provider: "openai"
decision_shadow: "would_throttle"
reason_code: "guard_max_tokens"
```
Standalone enforcement config:
```yaml
mode:
  type: "enforcement"
upstreams:
  openai: "https://api.openai.com"
workloads:
  - id: "prod/payments-api"
    type: "service"
    owner_team: "payments"
    environment: "production"
    identity:
      mode: "header"
      key: "x-genops-workload-id"
    model_targets:
      - provider: "openai"
        model: "gpt-4"
    policy_refs:
      - "payments-standard"
policies:
  - id: "payments-standard"
    budgets:
      rolling_tokens:
        window_seconds: 300
        limit_tokens: 250000
        burst_tokens: 10000
    guards:
      max_tokens_per_request: 8192
    decision_tiers:
      tier1_auto:
        action: "throttle"
      tier3_platform:
        action: "kill_workload"
```

Traffic and identity change:

```bash
# Before: sidecar listener audit (webhook-injected env var)
OPENAI_BASE_URL=http://localhost:15080

# After: standalone enforcement (operator-configured)
OPENAI_BASE_URL=http://koshi-koshi.koshi-system.svc.cluster.local:8080

# Identity header sent on every request:
X-GenOps-Workload-Id: prod/payments-api
```

What happened in this example:
Fields derived from the listener audit:

- `namespace`, `workload_kind`, `workload_name`, `provider` — observed directly in structured events
- `decision_shadow: "would_throttle"` + `reason_code: "guard_max_tokens"` — informed the operator that per-request token limits needed attention

Fields chosen by the operator (not in audit output):

- Standalone workload ID convention (`prod/payments-api`) — operator decision
- `type`, `owner_team`, `environment` — organizational metadata, not present in audit events
- Identity header key (`x-genops-workload-id`) — operator choice (in v1, all header-mode workloads must share the same key, enforced by config validation)
- Policy budget and guard values (`limit_tokens: 250000`, `max_tokens_per_request: 8192`) — operator decision informed by audit pressure, not a direct translation from built-in listener defaults
Traffic change:
- Operator rerouted traffic from the sidecar-local `localhost:15080` path to the standalone Koshi Service on port 8080 and ensured the identity header is sent on every request
Standalone enforcement introduces a shared traffic path through a self-hosted Koshi runtime that differs from sidecar modes:
- The default chart runs a single runtime replica. Operators should evaluate replica count, resource allocation, and disruption budget for their availability requirements.
- Scaling replicas improves availability, but does not provide globally shared budget state or cross-replica coordination in v1. Each replica maintains independent in-memory accounting.
- Application traffic must be routed to the standalone deployment (e.g., via a Kubernetes Service), and callers must send the configured workload identity header.
These are standard operational concerns — not blockers — but they should be considered as part of the enforcement rollout design.
When to choose: The built-in policy presets don't fit your workload — you need operator-authored budgets, guards, or tier configurations. Custom config works in both listener and enforcement modes, so you can shadow-test custom policy before activating blocking.
A sidecar with configmap + policy annotations and no mode annotation runs in listener mode with the custom policy (shadow decisions against custom budgets/guards). Adding mode: "enforcement" activates blocking.
```yaml
# Pod template annotations for custom ConfigMap sidecar
annotations:
  runtime.getkoshi.ai/configmap: "my-team-policy"  # required — mounts the ConfigMap
  runtime.getkoshi.ai/policy: "team-standard"      # required — selects policy from ConfigMap
  # runtime.getkoshi.ai/mode: "enforcement"        # optional — uncomment to activate blocking
```

Required annotations:

- `runtime.getkoshi.ai/configmap: <configmap-name>` — mounts the namespace-local ConfigMap
- `runtime.getkoshi.ai/policy: <policy-id>` — selects which policy from the ConfigMap to use (required when configmap is set)

Optional:

- `runtime.getkoshi.ai/mode: "enforcement"` — activates blocking; omit for listener mode (default)
ConfigMap contract:
- The ConfigMap must contain a `config.yaml` data key — the sidecar loads from `/etc/koshi-sidecar/config.yaml`
- Do not define `workloads` in the ConfigMap config — the sidecar synthesizes its own workload from pod identity at startup
- `mode.type` in the config file is ignored — mode comes from the annotation only
- Pod restart is required after ConfigMap content changes or annotation changes
What you get: arbitrary custom policy (operator-authored budgets, guards, tier configs) with per-pod blast radius and pod-derived identity. No routing change, no identity change.
What you don't get: centralized budget coordination or header-based identity. For those, use standalone enforcement.
See examples/sidecar-custom-configmap.yaml and examples/sidecar-custom-deployment.yaml for complete examples.
See the enforcement mode config reference and the pre-enforcement checklist before switching to standalone enforcement.
Listener mode is a policy design sketchpad at the execution boundary. The full enforcement pipeline runs on every AI API request, but decisions are shadow-only — no traffic is blocked. This lets operators observe how specific policy constructs would interact with real traffic before committing to enforcement.
Each shadow outcome maps to a specific policy construct that would fire in enforcement mode:
| Shadow outcome | Policy construct tested | What to refine |
|---|---|---|
| `allow` | All checks passed | Baseline posture is acceptable for this traffic pattern |
| `would_throttle` + `guard_max_tokens` | `guards.max_tokens_per_request` | Per-request token guard is tighter than this workload's actual request sizes |
| `would_throttle` + `budget_exhausted_throttle` | `budgets.rolling_tokens.limit_tokens` / `window_seconds` | Rolling budget is tighter than this workload's sustained consumption rate |
| `would_kill` + `budget_exhausted_kill` | `decision_tiers.tier3_platform.action: "kill_workload"` | Severe budget pressure — review whether consumption is expected or the budget needs widening |
| `would_reject` + `identity_missing` | Identity resolution (webhook injection) | Sidecar couldn't resolve workload identity — check that the webhook is injecting env vars |
| `would_reject` + `policy_not_found` | Policy lookup | No usable policy context — relevant when explicit workload mappings are configured without a default fallback |
- Observe — collect shadow decisions on live traffic. Start with the built-in default listener policy.
- Identify pressure points — which `reason_code` values appear most? Which namespaces or workloads generate `would_throttle` or `would_kill`?
- Refine policy intent — use the shadow outcomes to decide what guard limits, budget windows, and tier actions are appropriate for each workload class.
- Repeat — continue observing until the shadow posture matches your intended enforcement posture. The goal is to reach a state where the shadow outcomes are what you want enforcement to produce.
This observe-refine-repeat loop is the primary value of listener mode. Shadow decisions are not just audit data — they are the feedback signal for designing production policy.
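The pressure-point step can be sketched as a small aggregation over the structured event stream (illustrative Python; the field names match the event fields shown in this document, and `pressure_points` is invented for this sketch):

```python
import json
from collections import Counter

def pressure_points(log_lines: list[str]) -> Counter:
    """Illustrative sketch: count shadow interventions by reason_code
    from the koshi-listener structured event stream."""
    counts = Counter()
    for line in log_lines:
        evt = json.loads(line)
        # Only event-stream records with a would_* shadow decision count
        # as pressure; plain allows are the desired steady state.
        if evt.get("stream") == "event" and evt.get("decision_shadow", "").startswith("would_"):
            counts[evt.get("reason_code", "unknown")] += 1
    return counts

lines = [
    '{"stream":"event","decision_shadow":"allow"}',
    '{"stream":"event","decision_shadow":"would_throttle","reason_code":"guard_max_tokens"}',
    '{"stream":"event","decision_shadow":"would_throttle","reason_code":"guard_max_tokens"}',
]
assert pressure_points(lines) == Counter({"guard_max_tokens": 2})
```

In practice the same counts are available without log parsing from the `koshi_listener_decisions_total` metric, labeled by `decision_shadow` and `reason_code`.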
Built-in per-pod policy selection is supported via the runtime.getkoshi.ai/policy annotation — choose from sidecar-baseline, sidecar-strict, or sidecar-high-throughput in both listener and enforcement sidecar modes. For arbitrary custom policy, use the runtime.getkoshi.ai/configmap annotation to mount a namespace-local ConfigMap with operator-authored budgets, guards, and tier configurations. See Path C: Sidecar custom config via ConfigMap.
Koshi Runtime implements the GenOps Governance Specification — an open standard for AI workload governance semantics. GenOps defines the event attributes, lifecycle events, and status metadata that Koshi emits. Operators don't need to learn GenOps to use Koshi; it matters when integrating governance telemetry into broader tooling. See GenOps Compatibility.
- One binary, two roles. `KOSHI_ROLE=injector` starts the admission webhook server. Default starts the proxy.
- No Kubernetes API calls on the request path. Pod identity is normalized at admission time by the webhook and read from env vars by the sidecar.
- Webhook `failurePolicy: Ignore`. If the injector is down, pods still create — they just don't get the sidecar. Verify sidecar presence after deployment: `kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}'` — look for `koshi-listener`.
- Base-URL injection is safe. `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` are only set on app containers if not already present. See What Traffic Produces Signal for implications when these vars are already defined.
- Reservation-first accounting. Tokens are reserved before the request and reconciled with actual usage after the response.
- Fail open on infrastructure, fail closed on policy. A panic triggers degraded pass-through mode. In enforcement mode, an unknown workload gets 403. In listener mode, unknown workloads emit `would_reject` and traffic proxies through.
```bash
# Remove Koshi entirely
helm uninstall koshi -n koshi-system

# Remove the auto-generated TLS secret (created by cert-gen hook, not managed by Helm release)
kubectl delete secret koshi-koshi-webhook-tls -n koshi-system 2>/dev/null || true

# Remove namespace label (stops new pods from getting sidecars)
kubectl label namespace my-namespace runtime.getkoshi.ai/inject-

# Restart workloads to remove existing sidecars
kubectl rollout restart deployment -n my-namespace
```

Existing pods with sidecars continue to function until restarted. Removing the namespace label only affects future pod creation.
For setup, evaluation, contribution, and architectural context:
Start here
- Koshi onboarding — install, verify, collect signal, interpret shadow outcomes
- Local demo walkthrough — kind cluster setup, synthetic traffic, event and metric validation
- Kubernetes observability guide — structured events, Prometheus queries, Grafana patterns
Project
- Roadmap — public product direction for the open runtime
- Contributing — contribution guide
- Security — vulnerability reporting and disclosure policy
- License — Apache 2.0
Design
- Deterministic Accounting Invariants
- Enforcement Boundary
- Operator Trust Guarantees
- Why Koshi Exists
- Koshi and GenOps
Koshi is the runtime operators deploy. GenOps is the open governance specification Koshi implements — it defines the event semantics, required attributes, and interoperability surfaces that make governance data portable across tools and platforms.
You do not need to read the GenOps spec to use Koshi. The spec shows up in three places:
| Surface | What it provides |
|---|---|
| Structured events | `genops.spec.version` attribute on every event; required attribute names (`genops.accounting.*`, `genops.policy.*`, `genops.team`, etc.) |
| `GET /status` | `genops_spec_version` field in runtime diagnostics |
| Standalone header defaults | Default identity header name `x-genops-workload-id` follows GenOps naming conventions |
Day-one operators can ignore GenOps entirely — Koshi handles compliance internally. Platform teams integrating governance telemetry into broader observability or compliance pipelines benefit from the stable, spec-defined attribute names and event structure.
Spec version: 0.1.0. Built against the GenOps Governance Specification. See Koshi and GenOps for the full relationship.
```bash
make build       # Build binary
make test-race   # Run tests with race detector
make lint        # Run linter
make docker      # Build Docker image
```

- In-memory budget state. Each sidecar maintains its own budget. State is lost on restart.
- No cross-replica coordination. Budget enforcement is per-replica.
- Single policy per workload. Only the first `policy_ref` is used.
- Listener mode accounting is policy-scoped per sidecar. Shadow accounting uses a shared policy key (`_default` or `listener_policy/<id>`) rather than a per-workload tracker key. This keeps accounting bounded, but does not provide cross-replica or cluster-wide budget simulation. Listener events still include namespace, workload kind, and workload name for observability; only the in-memory accounting key is policy-scoped.