
feat(warm): Warm Agents — deterministic orchestration with blast-radius enforcement#502

Open
Mnehmos wants to merge 9 commits into Kilo-Org:dev from Mnehmos:feat/warm-agents

Mnehmos commented Feb 20, 2026

Kilo League Challenge #3 — "For Devs, By Devs"
Build an open source developer tool — or contribute something that improves Kilo itself.
This PR is our submission for the DeveloperWeek 2026 hackathon track.

Closes #455


Summary

Warm Agents adds a deterministic, stateful orchestration layer to Kilo that answers three questions current agent loops cannot:

  1. What was the agent allowed to change? — Every task declares a scope (paths, operations, MCP tools). Tool calls outside scope are blocked before execution.
  2. What did the agent actually do? — An append-only JSONL audit trail logs every state transition, invariant check, and scope decision with timestamps and task IDs.
  3. Can sub-agents exceed their parent's permissions? — No. Hierarchical scope enforcement validates that child scope is always a subset of parent scope.
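The third guarantee amounts to a subset check. The following is a minimal sketch, not the code in this PR: the `Scope` shape and the `isWithin` / `isSubScope` names are hypothetical, and paths are treated as plain directory prefixes rather than globs.

```typescript
// Hypothetical scope shape; field names are assumptions for illustration.
interface Scope {
  paths: string[]       // directory prefixes the task may touch
  operations: string[]  // e.g. "read", "write", "execute"
  mcpTools: string[]    // MCP tool names the task may call
}

// True when `path` equals `root` or lies underneath it.
function isWithin(path: string, root: string): boolean {
  const base = root.endsWith("/") ? root.slice(0, -1) : root
  return path === base || path.startsWith(base + "/")
}

// A child scope is valid only if every path, operation, and tool it
// declares is already covered by the parent.
function isSubScope(child: Scope, parent: Scope): boolean {
  const pathsOk = child.paths.every((p) => parent.paths.some((root) => isWithin(p, root)))
  const opsOk = child.operations.every((op) => parent.operations.includes(op))
  const toolsOk = child.mcpTools.every((t) => parent.mcpTools.includes(t))
  return pathsOk && opsOk && toolsOk
}

const parent: Scope = { paths: ["/projects/myapp/src"], operations: ["read", "write"], mcpTools: [] }
const child: Scope = { paths: ["/projects/myapp/src/auth"], operations: ["read"], mcpTools: [] }
const escalated: Scope = { paths: ["/projects/myapp/src/auth"], operations: ["read", "execute"], mcpTools: [] }
```

Here `isSubScope(child, parent)` holds, while `escalated` is rejected because it asks for `execute`, which the parent never declared.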

Key capabilities

  • Warmness scoring — Agents accumulate context familiarity scores based on files loaded, tools used, and task success. The scheduler routes tasks to the "warmest" qualified agent instead of cold-spawning.
  • Scoped permissions — Every tool call is pre-checked against the active task's declared scope. Paths outside the allowed set are blocked. Operations not declared (e.g., execute when only read/write allowed) are rejected.
  • Hierarchical sub-tasks — When the Task tool spawns a sub-agent, scope is automatically inferred from the task description (regex extracts file paths) and narrowed within the parent's bounds. The child can never exceed the parent's permissions.
  • Append-only audit trail — State transitions, invariant checks (passed + blocked), scope narrowing decisions, and MCP health events are logged to $XDG_DATA_HOME/kilo/warm/audit/{sessionID}.jsonl.
  • Durable state — All agent and task state is externalized to disk. Process crashes don't lose orchestration context. Incomplete tasks are recoverable.
  • MCP health awareness — Tool schema drift detection, server health tracking, graceful degradation when MCP servers change or disconnect.
  • Rollback & replay — Snapshot-based rollback for failed tasks. Replay engine can re-execute audit trails for CI/CD verification.
  • Zero overhead when off — All integration is behind --warm flag checks and lazy import(). No performance impact on normal usage.
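To make the audit trail bullet concrete, here is a small sketch of how a per-session JSONL log could be addressed and serialized. Only the path pattern comes from the description above; the `AuditEntry` fields, the helper names, and the fallback to `~/.local/share` (the XDG default) are assumptions.

```typescript
// Hypothetical entry shape; the real audit schema is not shown in this description.
interface AuditEntry {
  type: "state_transition" | "invariant_check" | "dispatch_decision"
  id: string
  taskID: string
  timestamp: number
  detail: Record<string, unknown>
}

// Builds the per-session log path. Falling back to $HOME/.local/share when
// XDG_DATA_HOME is unset or empty is an assumption based on the XDG default.
function auditPath(env: Record<string, string | undefined>, sessionID: string): string {
  const dataHome = env.XDG_DATA_HOME || `${env.HOME ?? ""}/.local/share`
  return `${dataHome}/kilo/warm/audit/${sessionID}.jsonl`
}

// JSONL means one JSON document per line, so appending never rewrites
// earlier entries.
function toJsonlLine(entry: AuditEntry): string {
  return JSON.stringify(entry) + "\n"
}

// Parses a JSONL blob back into entries, skipping blank lines.
function parseJsonl(blob: string): AuditEntry[] {
  return blob
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as AuditEntry)
}
```

Because each entry is a complete line, a reader can process the log incrementally and a crash mid-write can corrupt at most the final line.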

Architecture

┌──────────────────────────────────────────────────────┐
│                   Warm Agents System                  │
├──────────────────────────────────────────────────────┤
│  Scheduler <-> Capability Registry <-> Invariant Engine │
│       |              |                     |          │
│  State Store    Warmness Scorer    Replay / Rollback  │
│       |                                    |          │
│            Audit Log (append-only JSONL)               │
└──────────────────────────────────────────────────────┘
         |                |                  |
  SessionPrompt.loop()   MCP Client    Snapshot System
  (existing)             (existing)    (existing)

Integration touchpoints (4 existing files, surgical changes)

| File | Change | Guard |
|---|---|---|
| session/prompt.ts | Scope pre-check on registry + MCP tool execution | __warmContext?.enabled or KILO_WARM=1 |
| tool/task.ts | Sub-task creation/completion when Task tool spawns sub-agents | Same |
| cli/cmd/run.ts | --warm flag, warm init, status display, task completion | args.warm |
| cli/cmd/tui/thread.ts | --warm flag forwarded to worker via KILO_WARM=1 env | args.warm |

All changes are:

  • Guarded behind opt-in flags (zero impact when --warm is not passed)
  • Lazy-loaded via dynamic import() — no new imports in hot paths
  • Marked with // kilocode_change comments following codebase convention
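The guard-plus-lazy-load pattern in the bullets above can be sketched roughly as follows. This is an illustration under assumptions: `lazyWhenEnabled` is a hypothetical helper, and `fakeLoader` stands in for a real `() => import("../warm/integration")` call.

```typescript
// Wraps a loader so the module is only fetched when the feature is on,
// and at most once. `loader` stands in for a dynamic import() call.
function lazyWhenEnabled<T>(isEnabled: () => boolean, loader: () => Promise<T>) {
  let cached: Promise<T> | undefined
  return (): Promise<T | undefined> => {
    if (!isEnabled()) return Promise.resolve(undefined) // off: loader never runs
    if (!cached) cached = loader()
    return cached
  }
}

// Demo with a fake loader that counts how often it is invoked.
let loads = 0
const fakeLoader = () => {
  loads += 1
  return Promise.resolve({ isEnabled: () => true })
}

let warm = false
const getIntegration = lazyWhenEnabled(() => warm, fakeLoader)
```

While `warm` is false, calling `getIntegration()` never touches the loader; once it is true, repeated calls share a single load.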

Demo output

The standalone audit demo (bun test/warm/demo-audit.ts) exercises the full API surface without an LLM:

━━━ Phase 3: Tool Pre-Checks (Parent Scope) ━━━
  ✓ read /projects/myapp/src/auth/login.ts    (within scope)
  ✓ write /projects/myapp/src/auth/utils.ts   (within scope)
  ✗ read /etc/passwd                          (BLOCKED — outside scope)
  ✗ write /tmp/malicious.sh                   (BLOCKED — outside scope)
  ✗ bash rm -rf /                             (BLOCKED — operation not declared)
  ✗ webfetch https://evil.com                 (BLOCKED — operation not declared)

━━━ Phase 5: Tool Pre-Checks (Sub-Task Scope) ━━━
  ✓ read /projects/myapp/src/auth/login.ts    (within sub-scope)
  ✗ write /projects/myapp/src/ui/dashboard.ts (BLOCKED — outside sub-task scope)
  ✗ read /projects/myapp/config/settings.json (BLOCKED — outside sub-task scope)

━━━ Summary ━━━
  Audit entries: 19
  Transitions: 7
  Invariant checks: 12 (✓ 6 passed, ✗ 6 blocked)
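The decisions in the demo output can be approximated with a two-step pre-check: reject undeclared operations first, then test the target against the allowed roots. This is a sketch only; `TaskScope`, `preCheck`, and the example roots are hypothetical and not taken from the PR's invariant engine.

```typescript
type Decision = { allowed: true } | { allowed: false; reason: string }

// Hypothetical, simplified scope: directory roots plus declared operations.
interface TaskScope {
  roots: string[]
  operations: string[]
}

function preCheck(scope: TaskScope, operation: string, target: string): Decision {
  // Operation check comes first: undeclared operations are rejected
  // regardless of the target.
  if (!scope.operations.includes(operation)) {
    return { allowed: false, reason: "operation not declared" }
  }
  // Path check respects segment boundaries, so "/a/b" does not cover "/a/bc".
  const inScope = scope.roots.some((root) => target === root || target.startsWith(root + "/"))
  return inScope ? { allowed: true } : { allowed: false, reason: "outside scope" }
}

// Example parent scope (assumed for illustration).
const demoScope: TaskScope = { roots: ["/projects/myapp"], operations: ["read", "write"] }
```

With `demoScope`, a `read` of `/projects/myapp/src/auth/login.ts` is allowed, a `read` of `/etc/passwd` is blocked as outside scope, and any `bash` call is blocked because the operation was never declared.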

Test plan

  • 154 tests passing across 12 test files (349 assertions)
  • Typecheck clean (bun run typecheck in packages/opencode)
  • Standalone audit demo generates valid JSONL audit trail
  • All integration points are no-op when --warm is not active
  • Hierarchical scope enforcement: child can never exceed parent scope
  • Postcondition violations correctly fail tasks (files outside scope)
  • Live testing with production-quality model (blocked by free model limitations in headless mode)
  • Extended soak testing with real workloads

Files added

| Directory | Files | Purpose |
|---|---|---|
| src/warm/ | 17 modules | Agent state, task state, scorer, invariant engine, scheduler, policy, capability registry, MCP health, rollback, replay, state store, audit, integration bridge, bus events, failure reports |
| test/warm/ | 12 test files + 1 demo | Full coverage of all warm subsystems |
| docs/ | 2 architecture docs | Design rationale and implementation plan |

Total: ~6,200 lines added, ~160 lines modified across 4 existing files


"For any execution, a senior developer can answer what the agent was trying to do, what it was allowed to change, and what state survives process death — by reading the audit log and durable state alone."

🤖 Generated with Claude Code

Mnehmos and others added 8 commits February 18, 2026 21:31
…riant, audit

Introduces the Warm Agents orchestration subsystem in packages/opencode/src/warm/.
This adds deterministic dispatch primitives, warmness scoring, blast-radius
enforcement, and an append-only audit log — all as additive code with zero
changes to existing files.

Modules:
- agent-state: typed 5-state lifecycle (cold→warming→warm→executing→cooling)
- task-state: typed 7-state lifecycle (pending→claimed→...→completed|rolled_back)
- scorer: 4-dimension warmness scoring (recency, familiarity, toolMatch, continuity)
- invariant: blast-radius enforcement for tool calls + pre/postcondition checks
- audit: JSONL append-only audit log for dispatch decisions and state transitions
- state-store: durable persistence for agent/task state via Bun.file + Lock
- failure-report: structured failure reports answering the 3 quality-bar questions
- bus-events: Bus event definitions for warm agent lifecycle

All 66 tests passing.

Ref: Kilo-Org#455

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
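
The typed agent lifecycle named in the commit above can be pictured as a small transition table. This is a sketch rather than the contents of agent-state: the forward chain comes from the commit message, while the `cooling -> cold` edge and the function names are assumptions.

```typescript
type AgentStatus = "cold" | "warming" | "warm" | "executing" | "cooling"

// Allowed next states. The forward chain comes from the commit message;
// the cooling -> cold edge is an assumption so the cycle can close.
const NEXT: Record<AgentStatus, AgentStatus[]> = {
  cold: ["warming"],
  warming: ["warm"],
  warm: ["executing"],
  executing: ["cooling"],
  cooling: ["cold"],
}

function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return NEXT[from].includes(to)
}

// Throws on an illegal move instead of silently accepting it, so a bad
// transition surfaces at the point where it happens.
function transition(from: AgentStatus, to: AgentStatus): AgentStatus {
  if (!canTransition(from, to)) throw new Error(`illegal transition: ${from} -> ${to}`)
  return to
}
```

Keeping the table as data makes it straightforward to log every accepted or rejected transition to the audit trail.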
…CLI flag

Adds the dispatch and integration layer for Warm Agents:

- policy: last-wins rule evaluation for dispatch decisions (deny, allow,
  require_approval, pin_agent), blast-radius ceiling, capability deny-list
- scheduler: routes tasks to warmest qualified agent via scorer, with
  pinned → warmest → cold-spawn fallback chain, writes audit entries
- capability-registry: in-memory agent capability index with MCP server
  tracking, qualified-agent queries, and unhealthy-server detection
- warm-session: high-level orchestration API (submitTask, completeTask,
  toolPreCheck, registerAgent) bridging all subsystems
- --warm CLI flag on `kilo run` to opt-in (zero behavior change without it)

Seam changes:
- cli/cmd/run.ts: additive --warm option + warm context initialization
  (marked with kilocode_change comments)

85 tests passing (19 new for policy + capability-registry).

Ref: Kilo-Org#455

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
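
Last-wins rule evaluation, as mentioned for the policy module above, could look like the sketch below. The `Rule` shape, the `evaluate` name, and the default of `"allow"` when nothing matches are assumptions; only the four action names come from the commit message.

```typescript
type Action = "deny" | "allow" | "require_approval" | "pin_agent"

// Hypothetical rule shape: a predicate over the task plus the action to apply.
interface Rule {
  matches: (task: { intent: string }) => boolean
  action: Action
}

// Last-wins: every matching rule overwrites the previous result, so rules
// later in the list take precedence. With no match, the fallback applies.
function evaluate(rules: Rule[], task: { intent: string }, fallback: Action = "allow"): Action {
  let result = fallback
  for (const rule of rules) {
    if (rule.matches(task)) result = rule.action
  }
  return result
}

const rules: Rule[] = [
  { matches: () => true, action: "allow" },
  { matches: (t) => t.intent.includes("deploy"), action: "require_approval" },
  { matches: (t) => t.intent.includes("delete"), action: "deny" },
]
```

With these rules, an intent mentioning both "deploy" and "delete" resolves to `deny`, because the deny rule is listed last.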
Completes the Warm Agents prototype with hardening subsystems:

- mcp-health: server registration, success/failure tracking with 3-strike
  threshold, tool schema drift detection (added/removed), recovery marking,
  healthy/unhealthy queries integrated with CapabilityRegistry
- rollback: task-level rollback protocol extending Snapshot contract,
  reversibility checks, blast-radius-scoped file restoration, failure
  report generation with full quality-bar coverage
- replay: audit log trace builder, structural verification (dispatch
  determinism, lifecycle integrity, invariant coverage), summary output

Test suite: 119 tests across 10 files (34 new):
- mcp-health: 14 tests (register, success, drift, failure threshold, recovery)
- rollback: 6 tests (non-reversible skip, missing snapshot, file restore, reports)
- replay: 7 tests (trace counts, determinism, lifecycle chain, coverage, summary)
- integration: 7 tests (full lifecycle, policy denial, blast ceiling, MCP health,
  postcondition violation, replay verification, cold spawn fallback)

Ref: Kilo-Org#455

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
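
The two mcp-health ideas from the commit above, schema drift detection and a 3-strike failure threshold, can be sketched as follows. Names are hypothetical, drift is reduced to comparing tool names, and resetting the failure count on success is an assumption about how recovery works.

```typescript
// Compares two snapshots of a server's tool names and reports drift.
function detectDrift(previous: string[], current: string[]): { added: string[]; removed: string[] } {
  const before = new Set(previous)
  const after = new Set(current)
  return {
    added: current.filter((name) => !before.has(name)),
    removed: previous.filter((name) => !after.has(name)),
  }
}

// Tracks consecutive failures; a server counts as unhealthy once it reaches
// the threshold. Resetting on success is an assumption.
class ServerHealth {
  private failures = 0
  private readonly threshold: number

  constructor(threshold = 3) {
    this.threshold = threshold
  }

  recordFailure(): void {
    this.failures += 1
  }

  recordSuccess(): void {
    this.failures = 0
  }

  get healthy(): boolean {
    return this.failures < this.threshold
  }
}
```

A scheduler could consult `healthy` before routing a task that depends on the server, and treat a non-empty `removed` list as a reason to re-validate task scopes.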
- Add integration bridge module for safe globalThis context access
- Wire blast-radius invariant pre-checks into prompt.ts tool execution
  (both registry tools and MCP tools)
- Register warm agent and create default task on --warm CLI startup
- Add warm status output to run.ts event loop (tool completion + session end)
- Add createDefaultTask for CLI integration with working-directory scope
- 136 tests passing (17 new integration bridge tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add --warm option to TUI command ($0/default)
- Pass KILO_WARM=1 env var to worker thread
- Add ensureContext() lazy initialization in integration bridge
  (auto-creates warm context on first tool call when KILO_WARM=1)
- Works seamlessly in both TUI and headless run modes
- 136 tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cular deps

The dynamic import("../warm") loaded the barrel index.ts which triggered
circular dependencies, causing WarmIntegration to be undefined at runtime.
Switch all dynamic imports to target specific modules directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Orchestrators can now spawn sub-agents with narrower scope:
- Scope inference from task descriptions (extracts file paths/dirs)
- Child scope validated to be within parent's blast-radius
- Sub-task tool calls enforced against narrower scope
- Parent task restored after sub-agent completes
- Full audit trail for scope narrowing decisions
- 18 new tests for hierarchical scope (154 total)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
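
Scope inference from a task description, as described in the commit above, might work along these lines. The regular expression, `extractPaths`, and `narrowScope` are assumptions for illustration; the PR's actual pattern is not shown on this page.

```typescript
// Pulls path-like tokens out of free text. The pattern is an assumption:
// it accepts tokens with at least one "/" made of path-safe characters.
function extractPaths(description: string): string[] {
  const matches = description.match(/(?:\.{0,2}\/)?[\w.-]+(?:\/[\w.-]+)+\/?/g) ?? []
  return Array.from(new Set(matches))
}

// Keeps only inferred paths that fall inside the parent's roots. If nothing
// survives, the child inherits the parent's roots unchanged (not narrowed).
function narrowScope(parentRoots: string[], inferred: string[]): { roots: string[]; narrowed: boolean } {
  const inside = inferred.filter((p) =>
    parentRoots.some((root) => p === root || p.startsWith(root + "/")),
  )
  return inside.length > 0
    ? { roots: inside, narrowed: true }
    : { roots: parentRoots, narrowed: false }
}
```

Because inferred paths outside the parent are dropped rather than added, narrowing can only shrink the child's scope. A pattern this loose will also pick up fragments of URLs and similar tokens, which is another reason to filter against the parent roots.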
Exercises the full warm API surface without requiring an LLM:
agent lifecycle, blast-radius enforcement, hierarchical sub-tasks,
scope narrowing, postcondition violations, and audit log generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions (Contributor)

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

export function matchesGlob(filePath: string, patterns: string[]): boolean {
  for (const pattern of patterns) {
    if (pattern === "**" || pattern === "**/*") return true
    if (filePath.startsWith(pattern.replace("/**", "").replace("/*", ""))) return true

WARNING: matchesGlob uses startsWith after stripping glob suffixes, which causes false positives.

For example, the pattern src/routes/** becomes src/routes after .replace("/**", ""), so filePath.startsWith("src/routes") will incorrectly match src/routesExtra/file.ts.

This is a blast-radius enforcement bypass — a tool operating on src/routesExtra/ would pass the check when it shouldn't.

Fix: append a / separator before the startsWith check:

Suggested change
- if (filePath.startsWith(pattern.replace("/**", "").replace("/*", ""))) return true
+ if (filePath.startsWith(pattern.replace("/**", "/").replace("/*", "/"))) return true
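
One possible variant of the fix is a prefix check that only accepts whole path segments and still matches the directory itself. This is a sketch, not a drop-in replacement: `patternBase` and `matchesPrefix` are hypothetical names, and only the `dir/**` and `dir/*` pattern shapes from the snippet are handled, not general globs.

```typescript
// Strips a trailing "/**" or "/*" and returns the directory prefix.
function patternBase(pattern: string): string {
  return pattern.replace(/\/\*\*?$/, "")
}

// Prefix match that only accepts whole path segments. Like the reviewed
// code, it treats "dir/*" as recursive.
function matchesPrefix(filePath: string, patterns: string[]): boolean {
  for (const pattern of patterns) {
    if (pattern === "**" || pattern === "**/*") return true
    const base = patternBase(pattern)
    if (filePath === base || filePath.startsWith(base + "/")) return true
  }
  return false
}
```

With this version, `src/routes/**` matches `src/routes/index.ts` and `src/routes` but not `src/routesExtra/file.ts`. Neither this sketch nor the suggested change normalizes the path first, so a target such as `src/routes/../secrets` would still pass; resolving paths before the check is probably worth considering.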

tool(part)
// kilocode_change start - warm agent status after tool completion
if (args.warm && args.format !== "json") {
  const warmCtx = (globalThis as any).__warmContext

WARNING: Direct (globalThis as any).__warmContext access bypasses the centralized WarmIntegration.getContext() that was created specifically to encapsulate this.

Since WarmIntegration is already imported on line 543, consider using WarmIntegration.getContext() here instead for consistency and type safety. The same applies to line 540.

  using _ = await Lock.read(filePath)
  const data = await Bun.file(filePath).json()
  return AgentState.Info.parse(data)
} catch {

WARNING: Empty catch block silently swallows all errors, including unexpected ones (e.g., permission denied, corrupt JSON). Per AGENTS.md: "Never leave a catch block empty."

For getAgent/getTask, ENOENT (file not found) returning undefined is expected, but other errors should be logged:

} catch (e) {
  if ((e as NodeJS.ErrnoException).code !== "ENOENT") {
    log.warn("getAgent failed", { agentID, error: e })
  }
  return undefined
}

Same issue at lines 68, 87, and 119.

): Promise<void> {
  await Audit.append(task.sessionID, {
    type: "dispatch_decision",
    id: `audit_dispatch_${Date.now()}`,

WARNING: Audit ID audit_dispatch_${Date.now()} can collide when multiple dispatches happen within the same millisecond (e.g., batch task submission).

Other audit sites in integration.ts already use a random suffix pattern: ${Date.now()}_${Math.random().toString(36).slice(2, 8)}. Consider using the same pattern here for consistency.

Suggested change
- id: `audit_dispatch_${Date.now()}`,
+ id: `audit_dispatch_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`,
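
A random suffix makes collisions unlikely rather than impossible. If strict uniqueness within one process matters, a sequence counter is one alternative. The sketch below is an assumption-laden illustration: `createIdGenerator` is a hypothetical helper, and it does nothing for IDs generated by separate processes.

```typescript
// Returns an ID generator whose output is unique within one process even
// when the clock does not advance between calls. `now` is injectable so
// the behaviour can be tested with a frozen clock.
function createIdGenerator(now: () => number = Date.now) {
  let lastMs = 0
  let seq = 0
  return (prefix: string): string => {
    const ms = now()
    // Reset the sequence when the clock moves forward; otherwise bump it.
    if (ms > lastMs) {
      lastMs = ms
      seq = 0
    } else {
      seq += 1
    }
    return `${prefix}_${lastMs}_${seq}`
  }
}

// Usage: two IDs requested within the same millisecond still differ.
const nextId = createIdGenerator(() => 1000)
const first = nextId("audit_dispatch")  // "audit_dispatch_1000_0"
const second = nextId("audit_dispatch") // "audit_dispatch_1000_1"
```

Sharing one generator across the scheduler, warm-session, and mcp-health call sites would also keep the ID format consistent.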

): Promise<void> {
  await Audit.append(sessionID, {
    type: "state_transition",
    id: `audit_transition_${Date.now()}`,

WARNING: Same audit ID collision risk as in scheduler.ts: audit_transition_${Date.now()} can produce duplicates for rapid state transitions.

Suggested change
- id: `audit_transition_${Date.now()}`,
+ id: `audit_transition_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`,


// kilocode_change start - warm agents: create scoped sub-task for sub-agent
let warmSubTask: { taskID: string; parentTaskID: string; narrowed: boolean; scope: string[]; previousTask?: any } | undefined
if ((globalThis as any).__warmContext?.enabled || process.env.KILO_WARM === "1") {

SUGGESTION: This check duplicates the logic in WarmIntegration.isEnabled(). Consider using the centralized helper instead:

const { WarmIntegration } = await import("../warm/integration")
if (WarmIntegration.isEnabled()) {

This keeps the warm-context detection logic in one place and avoids direct globalThis access in upstream files.

kiloconnect bot (Contributor) commented Feb 20, 2026

Code Review Summary

Status: 11 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
|---|---|
| CRITICAL | 0 |
| WARNING | 10 |
| SUGGESTION | 1 |

Issue Details

WARNING

| File | Line | Issue |
|---|---|---|
| packages/opencode/src/warm/invariant.ts | 35 | matchesGlob uses startsWith after stripping glob suffixes; not a true glob match |
| packages/opencode/src/cli/cmd/run.ts | 461 | Direct (globalThis as any).__warmContext access instead of using WarmIntegration.getContext() |
| packages/opencode/src/warm/state-store.ts | 36 | Empty catch block silently swallows all errors in getAgent |
| packages/opencode/src/warm/scheduler.ts | 110 | Audit ID audit_dispatch_${Date.now()} can collide under concurrent calls |
| packages/opencode/src/warm/warm-session.ts | 341 | Same audit ID collision risk as in scheduler.ts |
| packages/opencode/src/warm/integration.ts | 40 | Race condition in ensureContext: concurrent tool calls can double-initialize warm context |
| packages/opencode/src/warm/integration.ts | 130 | durationMs parameter is accepted but never used (dead code) |
| packages/opencode/src/warm/integration.ts | 139 | Audit entry uses phase: "tool_pre" for a post-execution log (semantically wrong) |
| packages/opencode/src/warm/rollback.ts | 80 | Empty .catch(() => {}) silently swallows Bus publish errors (violates AGENTS.md) |
| packages/opencode/src/session/prompt.ts | 851 | Empty .catch(() => {}) and hardcoded durationMs: 0 |

SUGGESTION

| File | Line | Issue |
|---|---|---|
| packages/opencode/src/tool/task.ts | 130 | Warm context check duplicates the logic in WarmIntegration.isEnabled() |

Other Observations (not in diff)

Issues found in unchanged code or patterns that span multiple locations:

| File | Line | Issue |
|---|---|---|
| packages/opencode/src/warm/state-store.ts | 63, 68, 88, 114, 119, 149 | Multiple empty catch blocks throughout StateStore, all of which silently swallow errors |
| packages/opencode/src/warm/mcp-health.ts | 135 | Same Date.now() audit ID collision pattern as scheduler.ts and warm-session.ts |
| packages/opencode/src/warm/warm-session.ts | 214, 258 | Task IDs warm_task_${Date.now()} and warm_subtask_${Date.now()} can collide if called rapidly |

Files Reviewed (35 files)
  • .gitignore - 0 issues
  • docs/warm-agents-architect-prompt.md - 0 issues
  • docs/warm-agents-architecture.md - 0 issues
  • packages/opencode/src/cli/cmd/run.ts - 1 issue
  • packages/opencode/src/cli/cmd/tui/thread.ts - 0 issues
  • packages/opencode/src/session/prompt.ts - 1 issue
  • packages/opencode/src/tool/task.ts - 1 issue
  • packages/opencode/src/warm/agent-state.ts - 0 issues
  • packages/opencode/src/warm/audit.ts - 0 issues
  • packages/opencode/src/warm/bus-events.ts - 0 issues
  • packages/opencode/src/warm/capability-registry.ts - 0 issues
  • packages/opencode/src/warm/failure-report.ts - 0 issues
  • packages/opencode/src/warm/index.ts - 0 issues
  • packages/opencode/src/warm/integration.ts - 3 issues
  • packages/opencode/src/warm/invariant.ts - 1 issue
  • packages/opencode/src/warm/mcp-health.ts - 0 issues (pattern noted in Other Observations)
  • packages/opencode/src/warm/policy.ts - 0 issues
  • packages/opencode/src/warm/replay.ts - 0 issues
  • packages/opencode/src/warm/rollback.ts - 1 issue
  • packages/opencode/src/warm/scheduler.ts - 1 issue
  • packages/opencode/src/warm/scorer.ts - 0 issues
  • packages/opencode/src/warm/state-store.ts - 1 issue
  • packages/opencode/src/warm/task-state.ts - 0 issues
  • packages/opencode/src/warm/warm-session.ts - 1 issue
  • packages/opencode/test/warm/agent-state.test.ts - 0 issues
  • packages/opencode/test/warm/capability-registry.test.ts - 0 issues
  • packages/opencode/test/warm/demo-audit.ts - 0 issues
  • packages/opencode/test/warm/hierarchical-scope.test.ts - 0 issues
  • packages/opencode/test/warm/integration-bridge.test.ts - 0 issues
  • packages/opencode/test/warm/integration.test.ts - 0 issues
  • packages/opencode/test/warm/invariant.test.ts - 0 issues
  • packages/opencode/test/warm/mcp-health.test.ts - 0 issues
  • packages/opencode/test/warm/policy.test.ts - 0 issues
  • packages/opencode/test/warm/replay.test.ts - 0 issues
  • packages/opencode/test/warm/rollback.test.ts - 0 issues
  • packages/opencode/test/warm/scorer.test.ts - 0 issues
  • packages/opencode/test/warm/task-state.test.ts - 0 issues


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* but no context exists yet. This handles the TUI case where the
* worker thread has the env var but warm init hasn't happened yet.
*/
export async function ensureContext(sessionID: string): Promise<WarmSessionType.WarmContext | undefined> {

[WARNING]: Race condition in ensureContext — concurrent tool calls can double-initialize

If two tool calls arrive simultaneously (e.g., parallel tool execution), both can pass the existing?.enabled check on line 42 before either call reaches setContext(ctx) on line 51. This would create two separate warm contexts, two agents, and two tasks — with only the last one surviving on globalThis.

Consider adding a guard (e.g., a pending promise) to serialize initialization:

let pending: Promise<WarmSessionType.WarmContext | undefined> | undefined

export function ensureContext(sessionID: string): Promise<WarmSessionType.WarmContext | undefined> {
  const existing = getContext()
  if (existing?.enabled) return Promise.resolve(existing)
  if (process.env.KILO_WARM !== "1") return Promise.resolve(undefined)
  if (!pending) pending = doInit(sessionID).finally(() => { pending = undefined })
  return pending
}

sessionID: string,
toolName: string,
args: Record<string, unknown>,
durationMs: number,

[WARNING]: durationMs parameter is accepted but never used in the function body

The durationMs parameter on line 130 is dead code — it's never referenced inside logToolExecution. The caller in prompt.ts always passes 0. Either measure and pass the actual duration, or remove the parameter.
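
If the duration is worth keeping, one way to measure it is to wrap the tool call in a small timing helper. This is a sketch under assumptions: `timed` is a hypothetical helper, `Date.now` gives millisecond resolution only, and how the result is passed on to logToolExecution is left to the caller.

```typescript
// Runs `fn` and reports how long it took. `now` is injectable so the
// helper can be exercised with a fake clock.
async function timed<T>(
  fn: () => Promise<T>,
  now: () => number = Date.now,
): Promise<{ result: T; durationMs: number }> {
  const start = now()
  const result = await fn()
  return { result, durationMs: now() - start }
}

// Fake clock for illustration: each reading is 50 ms after the previous one.
let tick = 0
const fakeClock = () => {
  tick += 50
  return tick
}
```

The caller would then pass `durationMs` through instead of the hardcoded `0`.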

type: "invariant_check",
id: `audit_exec_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`,
taskID: ctx.activeTask.id,
phase: "tool_pre",

[WARNING]: Audit entry uses phase: "tool_pre" for a post-execution log

This function is called after tool execution (from prompt.ts line 851), but the audit entry records phase: "tool_pre". This makes the audit log misleading — replay/analysis would interpret this as a pre-check rather than a post-execution record. Should be "tool_post" to match the actual semantics.

sessionID: task.sessionID,
reason: "postcondition failure",
filesRestored: restored,
}).catch(() => {})

[WARNING]: Empty .catch(() => {}) silently swallows Bus publish errors

Per the project's style guide (AGENTS.md): "Never leave a catch block empty." While the comment explains Bus may not have Instance context, at minimum log the error so failures are visible:

Suggested change
- }).catch(() => {})
+ }).catch((e) => log.warn("bus publish failed", { error: e }))

// kilocode_change start - warm agent audit logging
if (warmCheck?.logged) {
  const { WarmIntegration } = await import("../warm/integration")
  await WarmIntegration.logToolExecution(ctx.sessionID, item.id, args, 0).catch(() => {})

[WARNING]: Empty .catch(() => {}) silently swallows audit logging errors, and durationMs is hardcoded to 0

Two issues:

  1. The empty catch violates the project's "no empty catch blocks" rule (AGENTS.md). At minimum, log the error.
  2. durationMs: 0 is always passed — if duration tracking isn't needed, remove the parameter from logToolExecution. If it is needed, measure the actual tool execution time.
Suggested change
- await WarmIntegration.logToolExecution(ctx.sessionID, item.id, args, 0).catch(() => {})
+ await WarmIntegration.logToolExecution(ctx.sessionID, item.id, args, 0).catch((e) => log.warn("warm audit failed", { error: e }))

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Warm Agents — deterministic orchestration with stateful agent reuse
