
Koshi Runtime

Koshi Runtime is a workload-scoped governance plane for AI systems. It deploys as a Kubernetes sidecar that enforces deterministic policy at the workload boundary — token budgets, per-request guards, and tiered enforcement decisions — using reservation-first accounting. The sidecar supports three operating shapes: listener audit (shadow-only, default), enforcement with built-in policy presets, and operator-authored custom policy via namespace-local ConfigMap (works in both listener and enforcement modes). A standalone deployment model is available for centralized enforcement with header-based identity.

Discover your governance posture before enforcing it. Koshi ships in listener mode by default: the full enforcement pipeline — identity resolution, policy lookup, guard evaluation, budget accounting — executes on every request, but no traffic is blocked. Shadow decisions (would_reject, would_throttle, would_kill) reveal exactly where your policies would intervene. When the posture matches your intent, activate enforcement on the same sidecar — choose built-in policy presets via the runtime.getkoshi.ai/policy annotation, or deliver custom policy via a namespace-local ConfigMap (runtime.getkoshi.ai/configmap). Both work with a single annotation change and a pod restart. See From Audit to Enforcement.

No repo clone required. Install Koshi directly from the published OCI Helm chart and container image.

Quick Start: Kubernetes (Listener Mode)

# 1. Install into koshi-system
helm install koshi oci://ghcr.io/koshihq/charts/koshi \
  --version 0.2.12 \
  --namespace koshi-system --create-namespace

# 2. Opt namespaces in — only labeled namespaces get the sidecar
kubectl label namespace my-namespace runtime.getkoshi.ai/inject=true

# 3. Restart workloads to pick up the sidecar
kubectl rollout restart deployment -n my-namespace

# 4. Observe shadow events
kubectl logs -n my-namespace deploy/my-app -c koshi-listener --tail=100 | \
  jq 'select(.stream == "event")'

# 5. Check metrics (default sidecar port; adjust if you changed sidecar.port)
kubectl port-forward -n my-namespace deploy/my-app 15080:15080
curl http://localhost:15080/metrics | grep koshi_listener

What You Can Do on Day One

  • Install Koshi in listener mode — one Helm command, no repo clone or config files required
  • Label any namespace and restart workloads to get sidecars injected
  • Collect structured JSON events from koshi-listener container logs
  • Scrape Prometheus metrics from /metrics on the sidecar port (default 15080, configurable via sidecar.port)
  • Observe real shadow decisions (allow, would_throttle, would_kill, would_reject) on live traffic without blocking anything

What Traffic Produces Signal

Workloads produce governance signal when they send OpenAI- or Anthropic-compatible API requests through the sidecar. The webhook injects OPENAI_BASE_URL and ANTHROPIC_BASE_URL env vars pointing at the sidecar (only if the container does not already set them). The sidecar evaluates the request against policy, emits a shadow event, and proxies the request to the real upstream transparently.

Prerequisites for signal:

  • The workload's SDK or HTTP client must honor OPENAI_BASE_URL / ANTHROPIC_BASE_URL env vars. The official OpenAI and Anthropic SDKs do this by default.
  • The workload must not already set these env vars in its pod spec — if present, the webhook will not overwrite them.
  • The workload must not hardcode provider URLs in application code or config files, bypassing the env vars entirely.
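The prerequisites above hinge on the SDK's base-URL lookup order. A minimal sketch of that lookup (illustrative Python, not SDK source; the env var names are the ones the webhook injects, and the port assumes the default sidecar.port of 15080):

```python
import os

def resolve_base_url(env_var, provider_default):
    """Prefer the injected env var, as env-honoring SDKs do at client
    construction; otherwise fall back to the provider's public URL."""
    return os.environ.get(env_var) or provider_default

# With the sidecar injected, the webhook sets OPENAI_BASE_URL, so a
# compliant client transparently targets the sidecar.
os.environ["OPENAI_BASE_URL"] = "http://localhost:15080"
print(resolve_base_url("OPENAI_BASE_URL", "https://api.openai.com"))
# A workload that hardcodes the provider URL skips this lookup entirely,
# bypasses the sidecar, and therefore produces no signal.
```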

No signal? Check these first:

  1. Verify the sidecar container exists: kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}' — look for koshi-listener
  2. Verify the env vars were injected: kubectl get pod <pod> -o jsonpath='{.spec.containers[0].env[*].name}' — look for OPENAI_BASE_URL / ANTHROPIC_BASE_URL
  3. If the env vars are missing, the workload's pod spec likely already defines them — check the Deployment manifest
  4. If the env vars are present but no events appear, the SDK may not be honoring them — check whether the app uses a custom HTTP client or hardcoded base URL

What Gets Installed

| Component | Namespace | Purpose |
| --- | --- | --- |
| Injector Deployment | koshi-system | Mutating admission webhook — injects sidecar into labeled namespaces |
| MutatingWebhookConfiguration | cluster-scoped | Intercepts pod CREATE in namespaces with runtime.getkoshi.ai/inject: "true" |
| ConfigMap | koshi-system | Runtime config for the control-plane deployment (mode, upstreams, default policy). Injected sidecars use built-in sidecar config by default (mode and policy selected via pod annotations); they do not read this ConfigMap. For custom sidecar policy, operators create a separate namespace-local ConfigMap and reference it via runtime.getkoshi.ai/configmap. |
| TLS Secret | koshi-system | Webhook serving certificate (self-signed by default) |
| NetworkPolicy | koshi-system | Restricts injector ingress to apiserver, sidecar egress to upstreams |

The sidecar (koshi-listener) is injected into workload pods automatically. It:

  • Listens on :15080 by default (configurable via sidecar.port Helm value)
  • Exposes /metrics, /healthz, /readyz, /status
  • Receives traffic via OPENAI_BASE_URL / ANTHROPIC_BASE_URL env vars injected into app containers

How It Works

Request → Identify workload (pod metadata) → Resolve policy → Extract max_tokens →
  Per-request guard check → Reserve tokens (rolling window) → Tier decision →
  Emit shadow event (listener) or enforce (enforcement) → Proxy to upstream →
  Record actual usage on response
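The "reserve, then record actual usage" steps in the pipeline above can be sketched as a rolling-window tracker. This is hypothetical code, not Koshi's implementation; the field names mirror the config schema (window_seconds, limit_tokens) and the decision semantics mirror the throttle / would_throttle outcomes described below:

```python
import time
from collections import deque

class RollingTokenBudget:
    """Sketch of reservation-first accounting over a rolling window."""

    def __init__(self, window_seconds, limit_tokens):
        self.window = window_seconds
        self.limit = limit_tokens
        self.entries = deque()  # (timestamp, tokens) pairs inside the window
        self.used = 0

    def _expire(self, now):
        # Drop reservations that have aged out of the rolling window.
        while self.entries and now - self.entries[0][0] >= self.window:
            _, tokens = self.entries.popleft()
            self.used -= tokens

    def reserve(self, max_tokens, now=None):
        """Charge the request's max_tokens before proxying; refuse if it
        would push the window over the limit (throttle / would_throttle)."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        if self.used + max_tokens > self.limit:
            return False
        self.entries.append((now, max_tokens))
        self.used += max_tokens
        return True

    def reconcile(self, reserved, actual):
        """After the response, replace the pessimistic reservation with
        the actual usage reported by the provider."""
        delta = actual - reserved
        self.used += delta
        if self.entries:
            ts, tokens = self.entries[-1]
            self.entries[-1] = (ts, tokens + delta)

b = RollingTokenBudget(window_seconds=300, limit_tokens=100_000)
print(b.reserve(4096, now=0.0))  # True: reservation fits the budget
b.reconcile(reserved=4096, actual=812)
print(b.used)  # 812: the window now reflects actual usage
```

Reserving max_tokens up front is pessimistic by design: a request can never overdraw the window, and reconciliation returns the unused headroom once the response arrives.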

Listener vs Enforcement Mode

Both modes execute the same enforcement pipeline. Listener mode surfaces governance posture without affecting traffic; enforcement mode acts on it.

| Behavior | Listener | Enforcement |
| --- | --- | --- |
| Identity failure | Emit would_reject, proxy through | Return 403 |
| Budget exceeded | Emit would_throttle, proxy through | Return 429 with Retry-After |
| Kill decision | Emit would_kill, proxy through | Return 503 |
| Metrics | koshi_listener_* series | koshi_enforcement_* series |
| Default listen addr | :15080 | :8080 |
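For a concrete sense of the enforcement-mode behavior, a budget-exceeded response might look like the following. The status code and Retry-After header are documented above, and every error response carries a reason_code; the exact body shape shown here is an illustrative assumption, not a spec:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{"reason_code": "budget_exhausted_throttle"}
```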

Workload Identity

In Kubernetes, identity is derived from pod metadata injected by the webhook at admission time:

| Source | Env Var | Example |
| --- | --- | --- |
| Pod namespace | KOSHI_POD_NAMESPACE | production |
| Owner kind (normalized) | KOSHI_WORKLOAD_KIND | Deployment |
| Owner name (normalized) | KOSHI_WORKLOAD_NAME | my-service |
| Pod name | KOSHI_POD_NAME | my-service-abc123-xyz |

Normalization rules: ReplicaSet owners with pod-template-hash are normalized to Deployment. StatefulSet, DaemonSet, Job, and CronJob owners are used directly. Pods with no owner resolve as Pod/<name>.
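A compact sketch of these rules (hypothetical helper, not Koshi source; the ReplicaSet branch assumes the standard Deployment naming convention in which the ReplicaSet is named <deployment>-<pod-template-hash>):

```python
def normalize_owner(owner_kind, owner_name, pod_name, has_pod_template_hash):
    """Apply the normalization rules: Deployment-owned ReplicaSets fold
    back to Deployment, workload controllers pass through, ownerless
    pods resolve as Pod/<name>."""
    if owner_kind == "ReplicaSet" and has_pod_template_hash:
        # Strip the trailing '-<pod-template-hash>' segment to recover
        # the Deployment name.
        return "Deployment", owner_name.rsplit("-", 1)[0]
    if owner_kind in ("StatefulSet", "DaemonSet", "Job", "CronJob"):
        return owner_kind, owner_name  # used directly
    return "Pod", pod_name             # no owner

print(normalize_owner("ReplicaSet", "my-service-abc123",
                      "my-service-abc123-xyz", True))
# ('Deployment', 'my-service')
```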

In standalone (non-Kubernetes) mode, identity comes from an HTTP header (default: x-genops-workload-id).

Configuration

Config is loaded from KOSHI_CONFIG_PATH when set. If KOSHI_CONFIG_PATH is unset, injected sidecars use built-in sidecar config; KOSHI_MODE and KOSHI_POLICY_OVERRIDE (set via pod annotations) determine listener vs enforcement mode and built-in policy selection.

Sidecar config behavior: The control-plane deployment (main runtime and injector) uses the charted ConfigMap via KOSHI_CONFIG_PATH. Injected sidecars use built-in sidecar config by default — listener mode by default, enforcement mode when annotated — with selectable built-in policies. For arbitrary custom policy, annotate the pod with runtime.getkoshi.ai/configmap to mount a namespace-local ConfigMap containing custom policies, and runtime.getkoshi.ai/policy to select which policy to use. See Path C: Sidecar custom config.

Listener Mode (recommended starting point)

mode:
  type: "listener"

upstreams:
  openai: "https://api.openai.com"
  anthropic: "https://api.anthropic.com"

default_policy:
  id: "_listener_default"
  budgets:
    rolling_tokens:
      window_seconds: 3600
      limit_tokens: 1000000
      burst_tokens: 0
  guards:
    max_tokens_per_request: 32768
  decision_tiers:
    tier1_auto: { action: "throttle" }

See examples/listener-config.yaml for a fully annotated reference.

Enforcement Mode

upstreams:
  openai: "https://api.openai.com"
  anthropic: "https://api.anthropic.com"

workloads:
  - id: "my-agent"
    identity: { mode: "header", key: "x-genops-workload-id" }
    policy_refs: ["standard"]

policies:
  - id: "standard"
    budgets:
      rolling_tokens:
        window_seconds: 300
        limit_tokens: 100000
        burst_tokens: 10000
    guards:
      max_tokens_per_request: 4096
    decision_tiers:
      tier1_auto: { action: "throttle" }
      tier3_platform: { action: "kill_workload" }

See examples/config.yaml for a fully annotated reference.

Settings Reference

| Field | Default | Description |
| --- | --- | --- |
| mode.type | "enforcement" | "listener" or "enforcement" |
| default_policy | none | Policy applied to unknown workloads |
| strict_mode | false | Reject unknown workloads even if default_policy is set |
| sse_extraction | true | Extract actual token usage from SSE streams |
| listen_addr | :8080 (enforcement) / :15080 (listener) | Server listen address |

Observability

Structured Events

Events are emitted as JSON to stdout with "stream": "event". Filter with:

kubectl logs -c koshi-listener | jq 'select(.stream == "event")'

Listener Metrics

| Metric | Labels | Description |
| --- | --- | --- |
| koshi_listener_decisions_total | namespace, decision_shadow, reason_code | Shadow decision counter |
| koshi_listener_tokens_total | namespace, provider, phase | Token reservation/actual counter |
| koshi_listener_latency_seconds | (none) | Enforcement pipeline latency histogram |
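As a hedged starting point for dashboards (assuming a standard Prometheus scrape of the sidecar's /metrics endpoint), queries like these aggregate the counters above; see the observability guide for the project's own sample queries:

```promql
# Shadow decision rate by namespace and outcome over the last 5 minutes
sum by (namespace, decision_shadow) (rate(koshi_listener_decisions_total[5m]))

# Top reason codes among non-allow shadow decisions in the last hour
topk(5, sum by (reason_code) (increase(koshi_listener_decisions_total{decision_shadow!="allow"}[1h])))
```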

Reason Codes

All decisions and error responses include a stable reason_code:

| Code | Meaning |
| --- | --- |
| identity_missing | Could not resolve workload identity and no default policy fallback available |
| policy_not_found | Identity resolved but no explicit or default policy available for evaluation |
| guard_max_tokens | Request max_tokens exceeds the resolved policy's per-request guard |
| budget_exhausted_throttle | Resolved policy's rolling window budget exceeded → throttle |
| budget_exhausted_kill | Resolved policy's rolling window budget exceeded → kill |
| upstream_not_configured | No upstream URL for detected provider |
| upstream_timeout | Upstream did not respond in time |
| system_degraded | Runtime entered degraded mode |
| budget_config_error | Budget tracker misconfiguration |
See docs/kubernetes-observability.md for detailed observability guidance including sample Prometheus queries and event field reference.

How Shadow Decisions Relate to Policy

Shadow decisions are computed against the policy context available to the sidecar at request time. If listener mode is started with a default_policy, requests are evaluated against that policy even when no explicit workload-to-policy mappings are defined. In that case, expect allow, would_throttle, or would_kill outcomes — not would_reject. A would_reject shadow decision only appears when Koshi cannot resolve a usable policy context for the request.

| Situation | Policy context | Expected shadow outcomes |
| --- | --- | --- |
| Listener with default_policy only | default_policy | allow, would_throttle, would_kill |
| Listener with explicit workload-to-policy mapping | matched policy | allow, would_throttle, would_kill |
| Listener with policy override annotation | override policy | allow, would_throttle, would_kill |
| Identity missing, no default policy | none | would_reject (identity_missing) |
| Identity resolved, no matching or default policy | none | would_reject (policy_not_found) |

Deployment

Docker

make docker
# Produces koshi:latest — distroless nonroot image

Kubernetes (Helm)

helm install koshi oci://ghcr.io/koshihq/charts/koshi \
  --version 0.2.12 \
  --namespace koshi-system --create-namespace

Version pinning: Always pin --version in production to avoid unexpected upgrades. The appVersion field in the chart metadata determines the default image tag when image.tag is unset.

Docker Hub mirror: If you prefer Docker Hub, add --set image.repository=docker.io/koshihq/koshi-runtime. The chart and configuration are identical.

Key Helm values:

| Value | Default | Description |
| --- | --- | --- |
| mode | listener | Runtime mode |
| injector.enabled | true | Deploy the admission webhook |
| webhook.failurePolicy | Ignore | Webhook down → pods still create without sidecar |
| sidecar.port | 15080 | Sidecar listen port |
| namespaceSelector.matchLabels | runtime.getkoshi.ai/inject: "true" | Which namespaces get injection |
| networkPolicy.enabled | true | Deploy NetworkPolicy for injector |

Annotations

| Annotation | Values | Description |
| --- | --- | --- |
| runtime.getkoshi.ai/inject | "false" | Opt out a specific pod from injection |
| runtime.getkoshi.ai/mode | "enforcement" | Enable sidecar enforcement mode. When set, the sidecar actively blocks requests that violate policy (429/503). Omit or set to "listener" for shadow-only audit mode (default). |
| runtime.getkoshi.ai/policy | policy ID | Select a built-in sidecar policy: sidecar-baseline (default), sidecar-strict, sidecar-high-throughput. Works in both listener and enforcement sidecar modes. Required when runtime.getkoshi.ai/configmap is set. Unknown IDs fail at sidecar startup. |
| runtime.getkoshi.ai/configmap | ConfigMap name | Mount a namespace-local ConfigMap containing custom sidecar policy. The ConfigMap must contain a config.yaml data key. Must be paired with runtime.getkoshi.ai/policy to select which policy from the ConfigMap to use. Works in both listener and enforcement modes. |

Health Endpoints

| Endpoint | Method | Response |
| --- | --- | --- |
| /healthz | GET | 200 OK / 503 degraded |
| /readyz | GET | 200 ready / 503 degraded |
| /status | GET | Runtime diagnostics and budget state (JSON) |
| /metrics | GET | Prometheus metrics |

Deployment Models

Koshi supports two deployment models today. They serve different purposes and have different operational characteristics.

| | Injected sidecar | Standalone deployment |
| --- | --- | --- |
| Purpose | Governance audit (listener mode, default) or live enforcement with per-pod blast radius (enforcement mode via annotation) | Centralized enforcement with full custom policy config |
| Identity | Pod metadata (injected by webhook at admission) | HTTP header (x-genops-workload-id default) |
| Config source | Built-in sidecar config by default (mode and policy via pod annotations); namespace-local ConfigMap when runtime.getkoshi.ai/configmap is set | File-based runtime config (KOSHI_CONFIG_PATH / ConfigMap) |
| Policy | Built-in policy catalog (sidecar-baseline, sidecar-strict, sidecar-high-throughput) selectable via annotation; or arbitrary custom policy via namespace-local ConfigMap | Named policies with per-workload binding; fully operator-defined |
| Traffic effect | Shadow-only by default; active enforcement (403 / 429 / 503) when mode annotation is set | Active — 403 / 429 / 503 on policy violations |
| Blast radius | Per pod (each sidecar is independent) | Centralized (all routed traffic shares one deployment) |
| Availability | Sidecar lifecycle follows the workload pod | Operator-managed; default chart runs a single replica |
| Metrics | koshi_listener_* or koshi_enforcement_* series (depending on mode) | koshi_enforcement_* series |

Sidecar Operating Shapes

The injected sidecar supports three operating shapes, controlled by pod annotations:

| | Listener (default) | Enforcement + built-in policy | Custom ConfigMap sidecar |
| --- | --- | --- | --- |
| Mode annotation | none (or listener) | enforcement | optional — omit for shadow, enforcement for blocking |
| Policy annotation | optional (selects built-in for shadow eval) | optional (defaults to sidecar-baseline) | required (selects from ConfigMap) |
| ConfigMap annotation | none | none | required |
| Traffic effect | Shadow only | Active blocking (429/503) | Shadow if no mode annotation; active blocking (429/503) if enforcement |
| Policy source | Built-in defaults | Built-in catalog | Operator-authored ConfigMap |

Current Adoption Path

  1. Audit — install Koshi in listener mode, inject sidecars, collect shadow decisions on live traffic. This reveals which workloads generate AI API traffic, what their token patterns look like, and where the default policy boundary sits.

  2. Validate — use shadow outcomes, identity coverage, and token pressure to finalize policy intent. See Posture Discovery and the pre-enforcement checklist.

  3. Enforce — three paths are available:

    • Sidecar enforcement with built-in policy (in-place): add runtime.getkoshi.ai/mode: enforcement to pod annotations and optionally select a built-in policy via runtime.getkoshi.ai/policy. No routing change, no identity change, per-pod blast radius preserved. See Path A.
    • Sidecar custom config via ConfigMap: mount a namespace-local ConfigMap with custom policies via runtime.getkoshi.ai/configmap and runtime.getkoshi.ai/policy annotations. Works in both listener and enforcement modes. See Path C.
    • Standalone enforcement (deployment handoff): deploy Koshi as a self-hosted standalone runtime for centralized enforcement and header-based identity. This is a deployment-model handoff. See Path B and the onboarding guide.

Most teams should start with Path A or Path C. Both preserve per-pod blast radius and require no routing or identity changes. Choose Path A (built-in presets) when standard limits fit, or Path C (custom ConfigMap) when you need operator-authored budgets and guards. Path B (standalone) is for teams that need centralized enforcement across workloads, header-based identity, or a shared enforcement point.

From Audit to Enforcement

There are three paths from listener audit to live enforcement. Choose based on your policy requirements.

| | Path A: Built-in sidecar | Path C: Custom ConfigMap | Path B: Standalone |
| --- | --- | --- | --- |
| Policy | Built-in catalog (3 presets) | Operator-authored (any budgets/guards/tiers) | Operator-authored (full config file) |
| Identity | Pod metadata (automatic) | Pod metadata (automatic) | HTTP header (manual) |
| Traffic change | None | None | Reroute to standalone Service |
| Blast radius | Per pod | Per pod | All routed workloads |
| Config delivery | Pod annotation | Namespace-local ConfigMap | KOSHI_CONFIG_PATH |
| Best for | Quick adoption, standard limits | Team-specific budgets and guards | Centralized coordination, header identity |

Path A: Sidecar enforcement (in-place)

When to choose: You want enforcement with minimal effort and the built-in policy presets fit your workload's token patterns. This is the fastest path from listener audit to live enforcement.

The simplest path. Add pod annotations and enforcement is active on the next pod restart. No routing change, no identity change, no config file.

  1. Add runtime.getkoshi.ai/mode: "enforcement" to the pod template annotations
  2. Optionally select a built-in policy: runtime.getkoshi.ai/policy: "sidecar-strict" (defaults to sidecar-baseline)
  3. Restart the workload — the sidecar now actively blocks requests that violate the selected policy

What you get: live enforcement with per-pod blast radius, pod-derived identity, and built-in policy selection (sidecar-baseline, sidecar-strict, sidecar-high-throughput).

What you don't get: arbitrary custom policy (custom budgets, guards, tier configs) — for that, use Path C (sidecar custom config via ConfigMap). For centralized budget coordination or header-based identity, use standalone enforcement.

See examples/enforcement-sidecar-deployment.yaml for a complete example.

Path B: Standalone enforcement (deployment handoff)

When to choose: You need centralized budget coordination across workloads, header-based identity, or a shared enforcement point. Most teams should start with sidecar enforcement (Path A or C) and only move to standalone if these specific requirements emerge.

Moving to standalone enforcement is not a config change — it is a deployment-model handoff involving three distinct transitions (policy, identity, and traffic) plus rollout risks that should be planned for explicitly.

Policy transition

Listener audit results do not automatically become production policies. The handoff requires manual policy operationalization:

  • Observed workloads from sidecar events (namespace, workload_kind, workload_name tuples) must be mapped into explicit workloads entries with an id, identity.mode: "header", and policy_refs.
  • Operators must define named policies with budget limits, guards, and tier actions informed by shadow outcomes from the audit — but the translation is manual, not automated.
  • policy_refs must be attached to each workload entry to bind it to the appropriate policies.
  • Shadow outcomes like would_throttle and would_kill inform what limits are appropriate, but they do not generate policy config.

Identity transition

Sidecar listener audits resolve identity from pod metadata injected by the webhook at admission time (namespace, workload_kind, workload_name). Standalone enforcement uses a different identity model:

  • Standalone enforcement uses HeaderResolver — identity comes from an HTTP header, not pod metadata.
  • Operators must choose a deployment-wide identity header key (default: x-genops-workload-id).
  • Application code, SDK wrapper, API gateway, or service mesh must send that header on every request routed through the standalone deployment.
  • In v1, all header-mode workloads share the same identity key — this is a real implementation constraint, not a configuration default.

Traffic transition

Sidecar listener mode works because the webhook redirects app traffic locally to the sidecar on localhost. Standalone enforcement routes traffic through a self-hosted Koshi runtime (in Kubernetes, exposed via a Service) — a fundamentally different traffic path:

  • Application HTTP clients must be pointed at the standalone Koshi runtime (e.g., via a Kubernetes Service) instead of directly at AI provider APIs.
  • For workloads being moved to standalone enforcement, the injected sidecar path should be removed or bypassed — this typically means removing the namespace label and restarting workloads.
  • For standalone enforcement specifically, this is a traffic-path change from sidecar-local routing to standalone routing. (For sidecar enforcement, no traffic change is needed — see Path A and Path C.)

Rollout considerations

This handoff is a traffic-path change, not just a policy or config change. Operators should plan for:

  • Shared enforcement path: all routed traffic flows through a self-hosted Koshi runtime, unlike per-pod sidecars where each workload has its own independent sidecar instance.
  • Broader-scope traffic change: a misconfiguration in the standalone runtime or its routing (e.g., Kubernetes Service) affects all workloads routed through it, so testing with a narrow subset first is worthwhile.
  • Traffic cutover coordination: incorrect routing targets, missing identity headers, or DNS issues can misroute traffic. Validate connectivity before shifting production workloads.
  • Rollback path: restoring the previous state means re-enabling sidecar injection and restarting workloads, not changing a config value. Plan this path before cutting over.

Recommendation: start with a small number of workloads. The sidecar listener namespace label can be re-applied and workloads restarted to restore the audit-only posture at any time.

Worked example: one audited workload to standalone enforcement

Listener audit observed:

namespace:        "prod"
workload_kind:    "Deployment"
workload_name:    "payments-api"
provider:         "openai"
decision_shadow:  "would_throttle"
reason_code:      "guard_max_tokens"

Standalone enforcement config:

mode:
  type: "enforcement"

upstreams:
  openai: "https://api.openai.com"

workloads:
  - id: "prod/payments-api"
    type: "service"
    owner_team: "payments"
    environment: "production"
    identity:
      mode: "header"
      key: "x-genops-workload-id"
    model_targets:
      - provider: "openai"
        model: "gpt-4"
    policy_refs:
      - "payments-standard"

policies:
  - id: "payments-standard"
    budgets:
      rolling_tokens:
        window_seconds: 300
        limit_tokens: 250000
        burst_tokens: 10000
    guards:
      max_tokens_per_request: 8192
    decision_tiers:
      tier1_auto:
        action: "throttle"
      tier3_platform:
        action: "kill_workload"

Traffic and identity change:

# Before: sidecar listener audit (webhook-injected env var)
OPENAI_BASE_URL=http://localhost:15080

# After: standalone enforcement (operator-configured)
OPENAI_BASE_URL=http://koshi-koshi.koshi-system.svc.cluster.local:8080
# Identity header sent on every request:
X-GenOps-Workload-Id: prod/payments-api

What happened in this example:

Fields derived from the listener audit:

  • namespace, workload_kind, workload_name, provider — observed directly in structured events
  • decision_shadow: "would_throttle" + reason_code: "guard_max_tokens" — informed the operator that per-request token limits needed attention

Fields chosen by the operator (not in audit output):

  • Standalone workload ID convention (prod/payments-api) — operator decision
  • type, owner_team, environment — organizational metadata, not present in audit events
  • Identity header key (x-genops-workload-id) — operator choice (in v1, all header-mode workloads must share the same key, enforced by config validation)
  • Policy budget and guard values (limit_tokens: 250000, max_tokens_per_request: 8192) — operator decision informed by audit pressure, not a direct translation from built-in listener defaults

Traffic change:

  • Operator rerouted traffic from the sidecar-local localhost:15080 path to the standalone Koshi Service on port 8080 and ensured the identity header is sent on every request

Standalone availability considerations

Standalone enforcement introduces a shared traffic path through a self-hosted Koshi runtime that differs from sidecar modes:

  • The default chart runs a single runtime replica. Operators should evaluate replica count, resource allocation, and disruption budget for their availability requirements.
  • Scaling replicas improves availability, but does not provide globally shared budget state or cross-replica coordination in v1. Each replica maintains independent in-memory accounting.
  • Application traffic must be routed to the standalone deployment (e.g., via a Kubernetes Service), and callers must send the configured workload identity header.

These are standard operational concerns — not blockers — but they should be considered as part of the enforcement rollout design.

Path C: Sidecar custom config via ConfigMap

When to choose: The built-in policy presets don't fit your workload — you need operator-authored budgets, guards, or tier configurations. Custom config works in both listener and enforcement modes, so you can shadow-test custom policy before activating blocking.

A sidecar with configmap + policy annotations and no mode annotation runs in listener mode with the custom policy (shadow decisions against custom budgets/guards). Adding mode: "enforcement" activates blocking.

# Pod template annotations for custom ConfigMap sidecar
annotations:
  runtime.getkoshi.ai/configmap: "my-team-policy"       # required — mounts the ConfigMap
  runtime.getkoshi.ai/policy: "team-standard"            # required — selects policy from ConfigMap
  # runtime.getkoshi.ai/mode: "enforcement"              # optional — uncomment to activate blocking

Required annotations:

  • runtime.getkoshi.ai/configmap: <configmap-name> — mounts the namespace-local ConfigMap
  • runtime.getkoshi.ai/policy: <policy-id> — selects which policy from the ConfigMap to use (required when configmap is set)

Optional:

  • runtime.getkoshi.ai/mode: "enforcement" — activates blocking; omit for listener mode (default)

ConfigMap contract:

  • The ConfigMap must contain a config.yaml data key — the sidecar loads from /etc/koshi-sidecar/config.yaml
  • Do not define workloads in the ConfigMap config — the sidecar synthesizes its own workload from pod identity at startup
  • mode.type in the config file is ignored — mode comes from the annotation only
  • Pod restart is required after ConfigMap content changes or annotation changes
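Following that contract, a namespace-local ConfigMap might look like this sketch. The policy fields mirror the enforcement config schema shown earlier in this README; the names and limit values are illustrative placeholders, not recommended settings:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-team-policy        # referenced by runtime.getkoshi.ai/configmap
  namespace: my-namespace     # must live in the workload's namespace
data:
  config.yaml: |
    # No `workloads:` section — the sidecar synthesizes its own workload
    # from pod identity at startup. `mode.type` would be ignored here;
    # mode comes from the pod annotation only.
    policies:
      - id: "team-standard"   # selected via runtime.getkoshi.ai/policy
        budgets:
          rolling_tokens:
            window_seconds: 600
            limit_tokens: 200000
            burst_tokens: 5000
        guards:
          max_tokens_per_request: 8192
        decision_tiers:
          tier1_auto: { action: "throttle" }
```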

What you get: arbitrary custom policy (operator-authored budgets, guards, tier configs) with per-pod blast radius and pod-derived identity. No routing change, no identity change.

What you don't get: centralized budget coordination or header-based identity. For those, use standalone enforcement.

See examples/sidecar-custom-configmap.yaml and examples/sidecar-custom-deployment.yaml for complete examples.

See the enforcement mode config reference and the pre-enforcement checklist before switching to standalone enforcement.

Posture Discovery in Listener Mode

Listener mode is a policy design sketchpad at the execution boundary. The full enforcement pipeline runs on every AI API request, but decisions are shadow-only — no traffic is blocked. This lets operators observe how specific policy constructs would interact with real traffic before committing to enforcement.

How shadow decisions relate to policy design

Each shadow outcome maps to a specific policy construct that would fire in enforcement mode:

| Shadow outcome | Policy construct tested | What to refine |
| --- | --- | --- |
| allow | All checks passed | Baseline posture is acceptable for this traffic pattern |
| would_throttle + guard_max_tokens | guards.max_tokens_per_request | Per-request token guard is tighter than this workload's actual request sizes |
| would_throttle + budget_exhausted_throttle | budgets.rolling_tokens.limit_tokens / window_seconds | Rolling budget is tighter than this workload's sustained consumption rate |
| would_kill + budget_exhausted_kill | decision_tiers.tier3_platform.action: "kill_workload" | Severe budget pressure — review whether consumption is expected or the budget needs widening |
| would_reject + identity_missing | Identity resolution (webhook injection) | Sidecar couldn't resolve workload identity — check that the webhook is injecting env vars |
| would_reject + policy_not_found | Policy lookup | No usable policy context — relevant when explicit workload mappings are configured without a default fallback |

Iterative refinement workflow

  1. Observe — collect shadow decisions on live traffic. Start with the built-in default listener policy.
  2. Identify pressure points — which reason_code values appear most? Which namespaces or workloads generate would_throttle or would_kill?
  3. Refine policy intent — use the shadow outcomes to decide what guard limits, budget windows, and tier actions are appropriate for each workload class.
  4. Repeat — continue observing until the shadow posture matches your intended enforcement posture. The goal is to reach a state where the shadow outcomes are what you want enforcement to produce.
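Step 2 amounts to tallying collected events offline. A hedged Python sketch (field names stream, decision_shadow, and reason_code follow the events shown in this README; piping kubectl logs through jq works equally well):

```python
import json
from collections import Counter

def tally_pressure(log_lines):
    """Count (decision_shadow, reason_code) pairs from structured event
    lines, skipping non-event streams and non-JSON log lines."""
    counts = Counter()
    for line in log_lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # plain-text startup/debug log line
        if rec.get("stream") != "event":
            continue
        counts[(rec.get("decision_shadow"), rec.get("reason_code"))] += 1
    return counts

sample = [
    '{"stream": "event", "decision_shadow": "allow"}',
    '{"stream": "event", "decision_shadow": "would_throttle", "reason_code": "guard_max_tokens"}',
    '{"stream": "event", "decision_shadow": "would_throttle", "reason_code": "guard_max_tokens"}',
    'plain text startup log line',
]
for (decision, reason), n in tally_pressure(sample).most_common():
    print(n, decision, reason)
```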

This observe-refine-repeat loop is the primary value of listener mode. Shadow decisions are not just audit data — they are the feedback signal for designing production policy.

Built-in per-pod policy selection is supported via the runtime.getkoshi.ai/policy annotation — choose from sidecar-baseline, sidecar-strict, or sidecar-high-throughput in both listener and enforcement sidecar modes. For arbitrary custom policy, use the runtime.getkoshi.ai/configmap annotation to mount a namespace-local ConfigMap with operator-authored budgets, guards, and tier configurations. See Path C: Sidecar custom config via ConfigMap.

Architecture

Koshi Runtime implements the GenOps Governance Specification — an open standard for AI workload governance semantics. GenOps defines the event attributes, lifecycle events, and status metadata that Koshi emits. Operators don't need to learn GenOps to use Koshi; it matters when integrating governance telemetry into broader tooling. See GenOps Compatibility.

  • One binary, two roles. KOSHI_ROLE=injector starts the admission webhook server. Default starts the proxy.
  • No Kubernetes API calls on the request path. Pod identity is normalized at admission time by the webhook and read from env vars by the sidecar.
  • Webhook failurePolicy: Ignore. If the injector is down, pods still create — they just don't get the sidecar. Verify sidecar presence after deployment: kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}' — look for koshi-listener.
  • Base-URL injection is safe. OPENAI_BASE_URL / ANTHROPIC_BASE_URL are only set on app containers if not already present. See What Traffic Produces Signal for implications when these vars are already defined.
  • Reservation-first accounting. Tokens are reserved before the request and reconciled with actual usage after the response.
  • Fail open on infrastructure, fail closed on policy. A panic triggers degraded pass-through mode. In enforcement mode, an unknown workload gets 403. In listener mode, unknown workloads emit would_reject and traffic proxies through.

Uninstall / Rollback

# Remove Koshi entirely
helm uninstall koshi -n koshi-system

# Remove the auto-generated TLS secret (created by cert-gen hook, not managed by Helm release)
kubectl delete secret koshi-koshi-webhook-tls -n koshi-system 2>/dev/null || true

# Remove namespace label (stops new pods from getting sidecars)
kubectl label namespace my-namespace runtime.getkoshi.ai/inject-

# Restart workloads to remove existing sidecars
kubectl rollout restart deployment -n my-namespace

Existing pods with sidecars continue to function until restarted. Removing the namespace label only affects future pod creation.

Documentation

For setup, evaluation, contribution, and architectural context:

Start here

Project

  • Roadmap — public product direction for the open runtime
  • Contributing — contribution guide
  • Security — vulnerability reporting and disclosure policy
  • License — Apache 2.0

Design

GenOps Compatibility

Koshi is the runtime operators deploy. GenOps is the open governance specification Koshi implements — it defines the event semantics, required attributes, and interoperability surfaces that make governance data portable across tools and platforms.

You do not need to read the GenOps spec to use Koshi. The spec shows up in three places:

| Surface | What it provides |
| --- | --- |
| Structured events | genops.spec.version attribute on every event; required attribute names (genops.accounting.*, genops.policy.*, genops.team, etc.) |
| GET /status | genops_spec_version field in runtime diagnostics |
| Standalone header defaults | Default identity header name x-genops-workload-id follows GenOps naming conventions |

Day-one operators can ignore GenOps entirely — Koshi handles compliance internally. Platform teams integrating governance telemetry into broader observability or compliance pipelines benefit from the stable, spec-defined attribute names and event structure.

Spec version: 0.1.0. Built against the GenOps Governance Specification. See Koshi and GenOps for the full relationship.

Development

make build        # Build binary
make test-race    # Run tests with race detector
make lint         # Run linter
make docker       # Build Docker image

Known v1 Limitations

  • In-memory budget state. Each sidecar maintains its own budget. State is lost on restart.
  • No cross-replica coordination. Budget enforcement is per-replica.
  • Single policy per workload. Only the first policy_ref is used.
  • Listener mode accounting is policy-scoped per sidecar. Shadow accounting uses a shared policy key (_default or listener_policy/<id>) rather than a per-workload tracker key. This keeps accounting bounded, but does not provide cross-replica or cluster-wide budget simulation. Listener events still include namespace, workload kind, and workload name for observability; only the in-memory accounting key is policy-scoped.