Releases: iampantherr/SecureContext

v0.18.0 — Sprint 2 Baseline: Self-Improving Skill Engine

29 Apr 02:02

[0.18.0] — 2026-04-29 — Sprint 2 baseline: skill mutation engine + replay + agentskills.io interop

The self-improving skill loop. Skills become first-class hash-protected
artifacts; replay against synthetic fixtures produces composite outcome
scores; mutators propose candidate variants; winners promote atomically.
Per-project skills override global at resolve time. Cross-project
promotion candidates surface via findGlobalPromotionCandidates.

This is the Sprint 2 baseline — verified end-to-end with both unit
tests and a live cross-project demo against Postgres. v0.18.1 (next)
adds the CLI-based runtime mutator + outcome-trigger guardrails + operator-
gated global promotion queue, all without requiring an Anthropic API key.
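
The per-project-over-global resolve rule can be sketched as a pure lookup. The shapes and names below are illustrative, not the actual `src/skills/types.ts` definitions:

```typescript
// Hypothetical row shape -- the real types live in src/skills/types.ts.
interface SkillRow {
  name: string;
  scope: "project" | "global";
  active: boolean;
  body: string;
}

// Per-project skills shadow global ones of the same name at resolve time.
function resolveSkill(rows: SkillRow[], name: string): SkillRow | undefined {
  const candidates = rows.filter((r) => r.active && r.name === name);
  return (
    candidates.find((r) => r.scope === "project") ??
    candidates.find((r) => r.scope === "global")
  );
}
```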

Added — skill subsystem (src/skills/)

  • types.ts (192 lines) — Skill, SkillRun, SkillMutation, MutationContext type graph
  • loader.ts (323 lines) — markdown frontmatter parser + HMAC-SHA256 body sign
  • storage.ts (259 lines) — SQLite CRUD + tamper detection (SkillTamperedError)
  • storage_pg.ts (248 lines) — Postgres mirror for skills_pg / skill_runs_pg / skill_mutations_pg
  • storage_dual.ts (146 lines) — backend-aware dispatch (sqlite | postgres | dual)
  • scoring.ts (246 lines) — composite outcome score (accuracy + cost + speed) + acceptance
  • replay.ts (234 lines) — synthetic-fixture replay harness with HMAC-verify gate
  • mutator.ts (228 lines) — pluggable Mutator interface + helpers
  • mutators/local_mock.ts (71 lines) — deterministic test mutator
  • mutators/realtime_sonnet.ts (125 lines) — Anthropic Messages API direct
  • mutators/batch_sonnet.ts (159 lines) — Anthropic Batch API (50% discount)
  • orchestrator.ts (256 lines) — full select→mutate→replay→promote cycle
  • format/agentskills_io.ts (144 lines) — agentskills.io interop import/export
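
The hash protection follows the usual HMAC sign-then-verify pattern. A minimal sketch with `node:crypto` (function names are illustrative; the real logic lives in loader.ts and storage.ts):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign a skill body with HMAC-SHA256; verify before any read or replay.
function signBody(body: string, key: Buffer): string {
  return createHmac("sha256", key).update(body, "utf8").digest("hex");
}

// Constant-time comparison; a mismatch is what storage.ts surfaces
// as SkillTamperedError.
function verifyBody(body: string, sig: string, key: Buffer): boolean {
  const expected = Buffer.from(signBody(body, key), "hex");
  const given = Buffer.from(sig, "hex");
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```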

Added — cron primitive (src/cron/)

  • scheduler.ts (190 lines) — in-process scheduler with persistence, daily/interval triggers, history bound
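
A minimal next-fire computation for daily/interval triggers might look like the following. The trigger shapes are assumptions for illustration, not the scheduler.ts API:

```typescript
// Hypothetical trigger shapes for an in-process scheduler.
type Trigger =
  | { kind: "interval"; everyMs: number }
  | { kind: "daily"; hourUtc: number };

// Next fire time (ms since epoch) strictly after `now`.
function nextFire(trigger: Trigger, now: number, lastRun?: number): number {
  if (trigger.kind === "interval") {
    // Interval triggers fire relative to the last run (or now, if never run).
    const base = lastRun ?? now;
    return base + trigger.everyMs;
  }
  // Daily triggers fire at a fixed UTC hour: today if still ahead, else tomorrow.
  const d = new Date(now);
  const todayAt = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate(), trigger.hourUtc);
  return todayAt > now ? todayAt : todayAt + 24 * 60 * 60 * 1000;
}
```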

Added — 3 SQLite migrations (20-22) and 3 PG migrations (6-8)

  • skills / skills_pg — versioned hash-protected skill registry (UNIQUE active per name+scope)
  • skill_runs / skill_runs_pg — execution telemetry with composite outcome score
  • skill_mutations / skill_mutations_pg — proposal + replay + promotion ledger

Added — 7 new MCP tools

| Tool | Purpose |
| --- | --- |
| zc_skill_list | List active skills with recent score |
| zc_skill_show | Full skill detail (HMAC-verified) |
| zc_skill_score | Aggregate score + acceptance check |
| zc_skill_run_replay | Replay against fixtures via LocalDeterministicExecutor |
| zc_skill_propose_mutation | Run one mutation cycle on demand |
| zc_skill_export | Export as agentskills.io markdown |
| zc_skill_import | Accept agentskills.io markdown → store as skill |

Added — entrypoint scripts

  • scripts/run-nightly-mutations.mjs — OS cron entrypoint (Linux cron / Windows Task Scheduler)
  • scripts/sprint2-cross-project-demo.mjs — live cross-project promotion demo (verified)
  • scripts/sprint2-live-demo.mjs — single-project mutation cycle demo (verified)

Added — RT-S2-* security tests

  • RT-S2-05: ZC_MUTATOR_MODEL allowlist falls back to local-mock on unknown values
  • RT-S2-07: pre-submission secret_scanner rejects API-key / AWS-key payloads
  • RT-S2-08: skill body HMAC mismatch → SkillTamperedError on storage read
  • RT-S2-09: candidate body HMAC verified before replay; mismatch → marked failed

Documentation

  • docs/SKILLS_WALKTHROUGH.md (~250 lines) — comprehensive usage guide

Test suite: 786/786 (was 645)

  • 132 new Sprint 2 unit tests
  • 9 new PG-mirror integration tests (require live PG)
  • All quality gates green: ESLint 0 errors, env-pinning linter 0 unclassified
  • Live cross-project demo: 9/9 steps pass against real Postgres

Migration notes

  • 3 new SQLite migrations (20-22) auto-apply on first run
  • 3 new PG migrations (6-8) require ZC_TELEMETRY_BACKEND=postgres|dual for activation
  • New env var ZC_MUTATOR_MODEL (allowlist-enforced; defaults to local-mock)
  • No breaking changes — Sprint 2 additions are additive

Architectural decisions ratified (D1-D6)

  • D1: Storage = dual (SQLite per-project default + PG centralized; both supported in this release)
  • D2: Skill scope = hierarchical (per-project overrides global at resolve time)
  • D3: Replay benchmark source = synthetic fixtures first (real-historical replay deferred to Sprint 2.5)
  • D4: Mutation engine = Sonnet 4.6 batch primary + realtime fallback + LocalMock for tests
  • D5: Per-tool-call cost storage (skill_runs.total_cost rolls up)
  • D6: Existing learnings/ JSONL kept; auto-feedback loop from v0.17.2 preserved

Sprint 2.5 deferrals

Tracked in C:\Users\Amit\AI_projects\.harness-planning\ARCHITECTURAL_LESSONS.md:

  • S2.5-1 Subprocess sandbox executor (RT-S2-03/04)
  • S2.5-2 Real-historical replay
  • S2.5-3 Override confirmation prompt (RT-S2-06)
  • S2.5-4 Cross-project auto-promotion
  • S2.5-5 Compacted-segment HMAC (RT-S2-08 for compaction)
  • S2.5-7 zc_unredact tool
  • S2.5-8 Skill injection scanner (RT-S2-01 hardening)

v0.17.2 — Architectural Lints (L1+L3) + Learning-Loop Closure (L4)

20 Apr 14:13

[0.17.2] — 2026-04-20 — Architectural lints (L1+L3) + learning-loop closure (L4)

Pre-Sprint-2 hardening round. Closes three classes of bugs identified by
the v0.17.1 verification retrospective before the mutation-engine build
begins. All three follow the same principle: catch future regressions
automatically so we don't keep rediscovering the same class of bug by luck.

Added — L1: env-pinning linter (scripts/check-env-pinning.mjs)

Static analysis script that walks src/**/*.ts for every process.env.ZC_*
reference, classifies each as CRITICAL / SHARED_PROPAGATED / OPERATIONAL,
and verifies CRITICAL vars are explicitly pinned in BOTH orchestrator +
worker launcher heredocs of A2A_dispatcher/start-agents.ps1.

Would have caught the v0.17.0 ZC_AGENT_ID pollution bug that silently
mis-attributed 16 consecutive tool_calls to the wrong agent_id (breaking
per-agent HKDF subkey isolation + RLS + log scoping).

  • 14-case self-test (scripts/check-env-pinning.test.mjs) covering happy
    path, missing pin, unclassified var, shared-propagation warnings,
    bracket-notation refs, missing dispatcher path.
  • Run via npm run check:env (production) or npm run check:env:test (selftest).
  • Exit 0 = all green, exit 1 = new var unclassified OR critical missing.
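
The scan step can be sketched as a regex pass over source text. The real linter also classifies each var and checks the dispatcher heredocs, which this sketch omits:

```typescript
// Find every process.env.ZC_* reference in a source string, covering both
// dot access and bracket notation (the cases the self-test exercises).
const ENV_REF =
  /process\.env(?:\.(ZC_[A-Z0-9_]+)|\[\s*['"](ZC_[A-Z0-9_]+)['"]\s*\])/g;

function findZcEnvRefs(source: string): string[] {
  const refs = new Set<string>();
  for (const m of source.matchAll(ENV_REF)) {
    refs.add(m[1] ?? m[2]); // group 1: dot access, group 2: bracket access
  }
  return [...refs].sort();
}
```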

Added — L3: ESLint flat config with @typescript-eslint/no-floating-promises

Installed eslint@9 + typescript-eslint@8 with a minimal config focused
on the single most-load-bearing rule: no-floating-promises. When the
outcomes.ts module became async in v0.12.0, the posttool-outcomes.mjs
hook kept calling resolveGitCommitOutcome(...) without await — the
process exited before the async DB write completed. 9 months of
undetected outcome-data loss. The lint would have caught it on the
first write.

  • Scanned src/ on install: found 3 real floating-promise violations
    (2 recordToolCall in server.ts, 1 reader.cancel in fetcher.ts).
    All fixed with explicit void operator + comments documenting intent.
  • Self-test (scripts/test-lint-catches-floating-promise.mjs) creates
    a synthetic TS file with an unawaited call, confirms ESLint fails on
    it, and confirms void + await both silence the rule. 5/5 pass.
  • Run via npm run lint or npm run lint:test.
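
A minimal flat config for this setup might look like the following. The file name, tsconfig path, and glob are illustrative; the repo's actual config may differ:

```javascript
// eslint.config.mjs -- minimal flat config carrying the one load-bearing rule.
// no-floating-promises needs type information, so parserOptions.project
// must point at a tsconfig (path here is an assumption).
import tseslint from "typescript-eslint";

export default tseslint.config({
  files: ["src/**/*.ts"],
  languageOptions: {
    parserOptions: { project: "./tsconfig.json" },
  },
  rules: {
    "@typescript-eslint/no-floating-promises": "error",
  },
});
```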

Added — L4: outcome → learnings JSONL auto-feedback (src/outcome_feedback.ts)

Closes the learning loop. Previously, a failure becoming a learning
required agent discipline: (1) notice failure, (2) write to
failures.jsonl, (3) remember the format, (4) let the hook mirror. Four
points of failure, all behavioral.

Now: recordOutcome({outcomeKind: 'rejected' | 'failed' | 'insufficient' | 'errored' | 'reverted'}) atomically appends a structured JSON line
to <projectPath>/learnings/failures.jsonl. Successful outcomes
(shipped, accepted) with confidence ≥ 0.9 append to
learnings/experiments.jsonl. Future sessions retrieve via zc_search
without any agent discipline required.

Features:

  • Best-effort; swallows errors (never affects the primary outcome row).
  • Auto-creates learnings/ dir if missing (guard: projectPath must exist).
  • Symlink-escape guard: target must resolve inside <projectPath>/learnings/.
  • Payload capped at 64 KB per line; oversized evidence → dropped with a marker.
  • Concurrent writers don't corrupt — single appendFileSync per line.
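
The capped-append behavior can be sketched as follows. Function names and the record shape are illustrative, not the `src/outcome_feedback.ts` API:

```typescript
import { appendFileSync, existsSync, mkdirSync } from "node:fs";
import { join } from "node:path";

const MAX_LINE_BYTES = 64 * 1024;

// Serialize one record; oversized evidence is dropped with a marker.
function encodeLearningLine(record: Record<string, unknown>): string {
  const line = JSON.stringify(record);
  if (Buffer.byteLength(line, "utf8") <= MAX_LINE_BYTES) return line;
  return JSON.stringify({ ...record, evidence: "[dropped: >64KB]" });
}

// Best-effort append: errors are swallowed so feedback can never affect
// the primary outcome row. One appendFileSync per line keeps concurrent
// writers from interleaving partial lines.
function appendLearning(dir: string, record: Record<string, unknown>): void {
  try {
    if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
    appendFileSync(join(dir, "failures.jsonl"), encodeLearningLine(record) + "\n", "utf8");
  } catch {
    /* intentionally swallowed */
  }
}
```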

16 unit tests covering every outcome-kind branch, security guards
(symlink escape, ghost projectPath), large-evidence truncation, rapid
concurrent appends, and downstream-consumer format (learnings-indexer
can mirror these rows into PG).

Live verified end-to-end: calling recordOutcome with kind='rejected'
added one structured line to failures.jsonl tagged
"source":"auto-feedback-v0.17.1". A low-confidence accepted outcome was
correctly skipped; a high-confidence shipped outcome landed in
experiments.jsonl.

Test suite: 645/645 (+16 from v0.17.1)

  • New: src/outcome_feedback.test.ts (16 tests)
  • New: scripts/check-env-pinning.test.mjs (14 cases)
  • New: scripts/test-lint-catches-floating-promise.mjs (5 cases)

Migration

  • No schema changes. No behavior changes for existing outcomes — the
    feedback module is additive. Projects with no learnings/ dir get one
    auto-created on the first failure/success outcome.
  • Operators running CI should add npm run check:env + npm run lint
    to the pipeline.

v0.17.1 — Agent-Idle Fixes + Recall Cache + Cost Correctness

20 Apr 13:56

[0.17.1] — 2026-04-20 — Agent-idle fixes (A+B+C+D) + recall cache + cost-correctness (Tier 1+2)

Hotfix round addressing five issues found in live verification of v0.17.0:
(a) agents going idle after zc_summarize_session instead of draining the
task queue, (b) zc_recall_context dominating session cost at ~82% on Opus,
(c) tool-call cost accounting billed at the wrong rate (5× over-reported on
Opus), (d) infra-tool noise polluting the orchestrator's "do it myself vs.
delegate to Sonnet developer" cost comparisons, and (e) seven
architectural bugs surfaced by end-to-end data-flow tracing.

Added — src/recall_cache.ts (60s TTL + change-detection)

  • In-memory cache for zc_recall_context keyed by (project_path, agent_id).
    TTL 60s; cache miss on any new working_memory / broadcasts /
    session_events row. Repeat calls inside the window return the prior
    response prefixed with (cached Xs ago) — saves ~800 output tokens per hit.
    Estimated savings: ~$0.06/call on Opus, ~$0.012/call on Sonnet.
  • force: true arg bypasses the cache when an agent explicitly wants fresh data.
  • Cache is scoped per (project_hash, agent_id) — no cross-agent leakage.
  • Process-lifetime only; max 64 entries with FIFO prune.
  • 11 unit tests.
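
A minimal version of this cache, with an injectable clock so TTL behavior is testable (names illustrative; real code in src/recall_cache.ts):

```typescript
interface CacheEntry { value: string; storedAt: number }

// TTL cache keyed by (projectHash, agentId); process-lifetime only.
class RecallCache {
  private entries = new Map<string, CacheEntry>();
  constructor(
    private ttlMs = 60_000,
    private maxEntries = 64,
    private clock: () => number = Date.now,
  ) {}

  get(projectHash: string, agentId: string): string | undefined {
    const hit = this.entries.get(`${projectHash}::${agentId}`);
    if (!hit) return undefined;
    if (this.clock() - hit.storedAt > this.ttlMs) return undefined; // stale
    return hit.value;
  }

  set(projectHash: string, agentId: string, value: string): void {
    const key = `${projectHash}::${agentId}`;
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      // FIFO prune: Map preserves insertion order, so first key is oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, storedAt: this.clock() });
  }
}
```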

Added — Tier 1 pricing: computeToolCallCost() in src/pricing.ts

Tool calls now billed from the LLM's perspective:

  • Tool call args (what the LLM generated to invoke) → billed at model's output rate
  • Tool response (what the LLM reads on its next turn) → billed at model's input rate

The naive computeCost() inverted these, over-reporting cost by ~5× on Opus
(output $75/Mtok vs. input $15/Mtok). For zc_recall_context:

  • Before: 798 × $75/Mtok = $0.060 (treated as Opus output)
  • After: 798 × $15/Mtok = $0.012 (Opus reads as input on next turn)

Matters because the Opus orchestrator uses cost tracking to decide "do I
handle this myself vs. delegate to the Sonnet developer" — inflated
numbers nudge toward unnecessary delegation.
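
The corrected formula can be sketched as follows, with rates taken from the figures above; the real function in src/pricing.ts may differ in shape:

```typescript
// Per-model rates in USD per million tokens.
interface ModelRates { inputPerMtok: number; outputPerMtok: number }

const OPUS: ModelRates = { inputPerMtok: 15, outputPerMtok: 75 };

// From the LLM's perspective: call arguments were *generated* by the model
// (output rate); the tool's response is *read* on the next turn (input rate).
function computeToolCallCost(
  rates: ModelRates, argTokens: number, responseTokens: number,
): number {
  return (
    (argTokens * rates.outputPerMtok + responseTokens * rates.inputPerMtok) /
    1_000_000
  );
}
```

The 798-token zc_recall_context response from the example above lands at `computeToolCallCost(OPUS, 0, 798)` ≈ $0.012 rather than the naive $0.060.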

Added — Tier 2 infra-tool zero-cost (INFRA_TOOLS set)

DB-assembly tools (zc_recall_context, zc_file_summary, zc_project_card,
zc_status) now return cost_usd=0. Rationale: their responses are
deterministic from DB state — no LLM, no Ollama, no external service — so
per-call work is negligible. Token counts still accurate so audits can
recompute via computeToolCallCost.

Override: set ZC_DISABLE_INFRA_ZERO_COST=1 when you want full cost
reconciliation against Anthropic invoices.

Added — HTTP endpoint GET /api/v1/queue/stats-by-role

Returns { role: { queued, claimed, done, failed } } for task_queue_pg.
Used by the A2A dispatcher's new checkWorkerWake (see A2A_dispatcher
v0.17.1) to poke idle workers when their role has claimable work.

Fixed — outcomes resolver pipeline (3 latent bugs from v0.12.0+)

  1. getMostRecentToolCallForSession was SQLite-only. In Postgres mode
    session lookups returned null → resolveGitCommitOutcome +
    resolveFollowUpOutcomes silently no-op'd. Result: every outcome row
    since v0.12.0 (when the function became async) failed to persist.
  2. posttool-outcomes.mjs hook had the same SQLite-only query for session
    id discovery. Fixed with the same PG lookup + SQLite fallback pattern.
  3. Hook called resolveGitCommitOutcome(...) without await. Process
    exited before the async resolver's DB write completed. 9 months of
    undetected outcome-data loss
    (L3 in the architectural-lessons doc).

Fixed — learnings-indexer.mjs hook coverage gaps

  1. Previously matched only Write|Edit|MultiEdit|NotebookEdit. Agents
    using echo ... >> learnings/X.jsonl via Bash silently bypassed the
    hook. Now matches Bash too and parses >> / > redirection
    targets from the command.
  2. Hook only wrote to SQLite; Postgres learnings_pg populated only via
    manual scripts/backfill-learnings.mjs. Now mirrors to PG when
    ZC_TELEMETRY_BACKEND=postgres|dual. Module-resolution handles running
    from ~/.claude/hooks/ with no node_modules via file:// fallback
    to SC repo's node_modules/pg.
  3. projectPath hashing normalized via realpathSync so forward-slash /
    backslash variants on Windows hash consistently.

Test suite: 629/629 (+12 from v0.17.0)

  • Added src/recall_cache.test.ts (11 tests: cold-miss, hit, staleness,
    cross-agent/project isolation, TTL, undefined-agent bucketing).
  • Added telemetry non-infra-tool cost test.
  • Updated postgres_backend.test.ts RT-S3-06 + sprint1_integration.test.ts
    for new cost formula.

Migration

  • Pure code fixes — no schema changes.
  • Historical tool_calls_pg rows retain their old cost_usd values; new
    rows use corrected formula.
  • To use -WorkerCount N with PG backend, ensure sc-api is rebuilt from
    v0.17.1 source (adds /api/v1/queue/stats-by-role endpoint).

v0.17.0 — Work-Stealing Queue + Model Router + Ownership Guard + Multi-Worker Pools

20 Apr 03:07

[0.17.0] — 2026-04-20 — Sprint 3 Phase 3: Work-Stealing Queue + Model Router + Ownership Guard + Multi-Worker Pools

Sprint 3 Phase 3 — the pieces that let multiple workers in the same role share one task queue without stepping on each other. Closes the "single worker per role" limit that v0.15.0/v0.16.0 left in place.

Added — Postgres work-stealing queue (§8.2)

  • task_queue_pg table (migration id=5) with state CHECK constraint + routing index (project_hash, role, state, ts) + partial heartbeat index WHERE state='claimed'.
  • src/task_queue.ts — seven operations backed by FOR UPDATE SKIP LOCKED so N workers can race-claim atomically without blocking each other:
    • enqueueTask() — idempotent (ON CONFLICT DO NOTHING)
    • claimTask() — atomic primitive (UPDATE ... WHERE task_id = (SELECT ... FOR UPDATE SKIP LOCKED LIMIT 1))
    • heartbeatTask() — workers must call every 30s
    • completeTask() / failTask() — terminal states (fail bumps retries)
    • reclaimStaleTasks(staleAfterSeconds=300) — sweep dead claims back to queue
    • getQueueStats() — counts by state
  • 13 unit tests (src/task_queue.test.ts) including:
    • RT-S4-01: 50 concurrent workers × 100 tasks → each task claimed EXACTLY once (no double-claim; core correctness property of SKIP LOCKED)
    • RT-S4-02: 600s-stale heartbeat → reclaim back to queued + retries++
    • RT-S4-03: failTask bumps retries + persists failure_reason
    • RT-S4-04: cross-role + cross-project scope isolation
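
The claim primitive's essence, sketched against a minimal client interface; exact SQL and column names in src/task_queue.ts may differ:

```typescript
// Minimal PG-client shape so the sketch stays dependency-free; in the
// real code this is a pg Pool.
interface PgLike {
  query(sql: string, params: unknown[]): Promise<{ rows: any[] }>;
}

// FOR UPDATE SKIP LOCKED is the core of the work-stealing queue: a row
// another transaction has already locked is skipped instead of waited on,
// so N racing workers each claim a different task, exactly once.
const CLAIM_SQL = `
  UPDATE task_queue_pg
     SET state = 'claimed', claimed_by = $3, claimed_at = now()
   WHERE task_id = (
           SELECT task_id FROM task_queue_pg
            WHERE project_hash = $1 AND role = $2 AND state = 'queued'
            ORDER BY ts
            FOR UPDATE SKIP LOCKED
            LIMIT 1
         )
  RETURNING *`;

async function claimTask(
  pool: PgLike, projectHash: string, role: string, workerId: string,
): Promise<Record<string, unknown> | null> {
  const res = await pool.query(CLAIM_SQL, [projectHash, role, workerId]);
  return res.rows[0] ?? null; // null => nothing claimable right now
}
```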

Added — 6 MCP tools exposing the queue

  • zc_enqueue_task (orchestrator) · zc_claim_task (worker) · zc_heartbeat_task · zc_complete_task · zc_fail_task · zc_queue_stats
  • Worker agent_id is sourced from ZC_AGENT_ID env var so a multi-worker pool (e.g. developer-1/2/3 all role=developer) shares one queue keyed by (project_hash, role) and claims atomically.
  • 5 MCP integration tests (src/task_queue_mcp.test.ts) covering end-to-end lifecycle, 3-worker race, fail path, stats aggregation, cross-project isolation.

Added — Complexity-based model router (§8.5)

  • src/indexing/model_router.ts — chooseModel(complexity 1-5) returns {model, tier, reason, estimatedInputCostPerMtok, inputClamped}:
    • 1-2 → Haiku 4.5 (trivial tasks, $0.25/Mtok)
    • 3-4 → Sonnet 4.6 (standard work, $3.00/Mtok — cost/quality sweet spot)
    • 5 → Opus 4.7 (hard reasoning, $15.00/Mtok)
  • Env overrides: ZC_MODEL_TIER_{HAIKU,SONNET,OPUS} resolved per call so operators can flip at runtime.
  • Safe defaults: null / undefined / NaN / Infinity / out-of-range → Sonnet with inputClamped=true.
  • 19 unit tests covering tier mapping, rounding, clamping edges, env overrides, result shape.
  • zc_choose_model MCP tool wraps it.
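
A sketch of the mapping and its safe defaults (model ids and result fields are illustrative; the `reason` field is omitted here):

```typescript
interface RouteResult {
  model: string;
  tier: "haiku" | "sonnet" | "opus";
  estimatedInputCostPerMtok: number;
  inputClamped: boolean;
}

function chooseModel(complexity: unknown): RouteResult {
  const n = typeof complexity === "number" && Number.isFinite(complexity)
    ? Math.round(complexity)
    : NaN;
  // Safe default: null / undefined / NaN / Infinity / out-of-range -> Sonnet.
  if (!(n >= 1 && n <= 5)) {
    return { model: "sonnet", tier: "sonnet", estimatedInputCostPerMtok: 3, inputClamped: true };
  }
  if (n <= 2) return { model: "haiku", tier: "haiku", estimatedInputCostPerMtok: 0.25, inputClamped: false };
  if (n <= 4) return { model: "sonnet", tier: "sonnet", estimatedInputCostPerMtok: 3, inputClamped: false };
  return { model: "opus", tier: "opus", estimatedInputCostPerMtok: 15, inputClamped: false };
}
```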

Added — File-ownership overlap guard at /api/v1/broadcast (§8.2)

  • HTTP API rejects ASSIGN whose file_ownership_exclusive overlaps any in-flight (unmerged) ASSIGN's exclusive set → HTTP 409 Conflict with overlapping_files + conflicting_broadcast_id. Prevents two workers being assigned the same file.
  • "In-flight" = ASSIGN whose task has no subsequent MERGE in the last 200 broadcasts.
  • 5 integration tests (src/ownership_guard.test.ts):
    • RT-S4-05: overlapping exclusive → 409
    • RT-S4-06: disjoint exclusive → 200
    • RT-S4-07: re-ASSIGN allowed after MERGE of the prior task
    • Plus back-compat (no excl set) + non-ASSIGN types bypass guard

Fixed — recallSharedChannel was silently dropping v0.15.0 §8.1 structured columns

SQLite-path recallSharedChannel only projected legacy columns. All downstream consumers saw file_ownership_exclusive=undefined even when the DB column was populated — the ownership-guard work surfaced this hidden v0.15.0 gap. Now projects all 7 v0.15.0 §8.1 columns with NULL → undefined semantics.

Added — -WorkerCount N on start-agents.ps1 + role-tagged registration (A2A_dispatcher side)

  • New -WorkerCount param (1-20, default 1). When > 1, expands each -Roles entry into N numbered workers suffixed -1..-N:
    start-agents.ps1 -Roles developer -WorkerCount 3
    # → spawns developer-1, developer-2, developer-3
    #   each with its own WT window, worktree, registration
    #   all sharing role="developer" — one work-stealing queue
  • Get-AgentRole helper strips -N suffix so $roleMeta + roles.json deep-prompt lookups still work.
  • register.mjs accepts --role flag / ZC_AGENT_ROLE env → writes _agent_roles[agentId] sidecar so dispatcher can route by role without breaking the existing agentId → pane string map.
  • Back-compat: WorkerCount=1 (default) preserves legacy plain names ("developer" not "developer-1").
  • Env propagation fix: worker/orchestrator launch scripts now also propagate ZC_POSTGRES_* + ZC_TELEMETRY_BACKEND so the agent's MCP server can reach task_queue_pg (closes the longstanding v0.10.4 env-propagation follow-up).

Added — scripts/backfill-learnings.mjs (close the learning loop)

  • The PostToolUse learnings-indexer.mjs hook only mirrors NEW Write/Edit events — prior <project>/learnings/*.jsonl rows never get indexed into learnings / learnings_pg. So agents couldn't zc_search past decisions/failures from earlier sessions.
  • New script scans <project>/learnings/*.jsonl, categorizes by filename stem, idempotently upserts (via UNIQUE), mirrors to PG when ZC_TELEMETRY_BACKEND=postgres|dual.
  • Verified on Test_Agent_Coordination: 6 rows backfilled (3 decisions + 3 metrics). Previously both SQLite and PG had 0 learnings rows despite JSONL content existing.

Test Suite

  • 617/617 unit+integration tests pass (was 575 pre-v0.17.0; +42 new: 13 task_queue + 19 model_router + 5 ownership guard + 5 task_queue MCP).
  • Live E2E on Test_Agent_Coordination with -WorkerCount 3: the agent called zc_choose_model (verified the 2→haiku, 4→sonnet, 5→opus tier mapping), enqueued 3 disjoint-ownership tasks via zc_enqueue_task, and workers atomically claimed them via zc_claim_task and committed actual file hardening (e.g. checkRequest(req) in src/rate-limiter.js now throws TypeError: rate-limiter: req argument is required; "harden: validate argv in index", commit f25acf5a).

Migration

  • Schema: migration id=5 (task_queue_pg) is idempotent + additive — Postgres-only feature (no SQLite companion).
  • API: zero breaking changes. All new MCP tools are additive.
  • Env for workers: if you run in HTTP/Postgres mode, restart agents via start-agents.ps1 so they pick up the updated launch scripts that propagate ZC_POSTGRES_*. Until then, zc_enqueue_task/zc_claim_task return Postgres pool unavailable.

v0.16.0 — Sprint 3 Phase 2: Postgres Backend + T3.1 SET LOCAL ROLE + T3.2 RLS

19 Apr 01:40

Sprint 3 Phase 2 — Postgres backend (deferred since v0.12.x) + both Tier 3 access-control fixes from §8.6 of the canonical plan. Closes the v0.15.0 limitation where structured ASSIGN fields were silently dropped in HTTP API mode.

Three major adds

Postgres backend for telemetry/outcomes

`ChainedTablePostgres` mirrors `ChainedTableSqlite` using `BEGIN; SELECT row_hash ... FOR UPDATE; INSERT; COMMIT` (Postgres analog of SQLite's BEGIN IMMEDIATE). Same chain content (HKDF-keyed HMAC) — rows byte-identical across backends, migration is a SQL copy.

Wired in via existing `ZC_TELEMETRY_BACKEND=sqlite|postgres|dual` env switch.

Tier 3 fix T3.1 — per-query SET LOCAL ROLE

Each agent now writes telemetry under a per-agent Postgres role (`zc_agent_<agent_id>`), lazily provisioned with minimum INSERT/SELECT/UPDATE grants. Each chained INSERT runs inside `BEGIN; SET LOCAL ROLE <agent_role>; INSERT; COMMIT` — Postgres' `current_user` reflects the actual writing agent, not the pool's user.

Tier 3 fix T3.2 — Row-Level Security on outcomes_pg

4 RLS policies enforce read tiers (Chin & Older 2011 Ch5+13, Bell-LaPadula confidentiality):

  • `public/internal` → any role
  • `confidential` → registered agent
  • `restricted` → ONLY `created_by_agent_id` (matched against `current_setting('zc.current_agent')`)

This is enforced INSIDE Postgres, not in app code. Even a compromised agent process with valid DB credentials cannot read other agents' restricted outcomes.

HTTP API forwards structured ASSIGN columns

Closes the v0.15.0 known limitation. `POST /api/v1/broadcast` now accepts and forwards all 7 v0.15.0 structured fields.

Tests

  • 575/575 pass (565 + 10 new Postgres tests)
  • Postgres tests run against real local Docker container, auto-skip when no PG reachable
  • RT-S3-05 verified live: cross-agent read of `'restricted'` row blocked by Postgres RLS even with shared DB credentials
  • RT-S3-06 verified live: chain hashes byte-identical across SQLite + Postgres (rows migrate without rehashing)

Bugs found + fixed during integration

  1. `provisionAgentRole` originally inside writer txn → grants invisible to SET LOCAL ROLE. Fixed via separate-connection provisioning.
  2. `SELECT FOR UPDATE` needs `UPDATE` privilege on most PG versions — added explicit GRANT.
  3. Missing `GRANT USAGE ON SCHEMA public` — required for table access.

Known limitations

  • Existing `securecontext-api` Docker container is v0.8.0 — needs rebuild (`docker compose build sc-api && docker compose up -d sc-api`) to pick up v0.16.0 endpoints
  • Live multi-agent test through `start-agents.ps1` with `ZC_TELEMETRY_BACKEND=postgres` requires that container rebuild — functionally validated via 10 unit tests against real Postgres + RT-S3-05 cross-agent RLS test

Upgrade notes

Backward-compatible by default. Don't set `ZC_TELEMETRY_BACKEND` and SQLite continues exactly as v0.15.0.

To enable Postgres backend:

  1. Set `ZC_POSTGRES_PASSWORD` (or `ZC_POSTGRES_URL`)
  2. Set `ZC_TELEMETRY_BACKEND=postgres` (or `=dual` for parity verification)
  3. Pool's owning role needs `CREATEROLE` privilege (bundled `scuser` already has it)
  4. Rebuild + redeploy the Docker `securecontext-api` container

What's next

v0.17.0 — §8.2-8.5 work-stealing queue + worker pool spawning + file-ownership enforcement + complexity-based model routing. Uses the Postgres backend shipped here.


See CHANGELOG.md for full details.

v0.15.0 — Sprint 3 Phase 1: Structured ASSIGN + MAC Classification (Tier 3 part)

19 Apr 00:52

First slice of Sprint 3. Foundation pieces that don't require Postgres backend.

Two features

§8.1 Structured ASSIGN broadcast schema (additive, backward-compatible)

7 new optional fields on `zc_broadcast` for type=ASSIGN:

  • `acceptance_criteria` (testable assertions)
  • `complexity_estimate` (1-5)
  • `file_ownership_exclusive` + `file_ownership_read_only` (path-traversal-filtered)
  • `task_dependencies` (broadcast IDs that must MERGE first)
  • `required_skills`
  • `estimated_tokens`

Existing ASSIGN broadcasts work unchanged (backward-compat). Dispatcher in v0.17.0 will consume these for tier routing + file-ownership enforcement.

§8.6 T3.2 MAC-style classification on outcomes (Chin & Older 2011 Ch5+Ch13)

Classification labels: `public` / `internal` / `confidential` / `restricted` with read-filter:

  • `'restricted'` rows readable ONLY by `created_by_agent_id` — closes the cross-agent leak gate from §8.6 T3.2

`resolveUserPromptOutcome` now auto-tags `'restricted'` with the agent's identity (sentiment about user messages belongs to the originating agent only).

Tests

  • 565/565 pass (541 baseline + 24 new)
  • RT-S3-02: cross-agent read of `'restricted'` row blocked
  • RT-S3-03: legacy rows get `'internal'` default; CHECK blocks NULL
  • RT-S3-04: SQL injection via classification value blocked by CHECK constraint
  • Edge cases: complexity clamping, oversize cap, path traversal, integer-only deps, downgrade of restricted-without-creator

Live verification

Real Claude CLI agent on Test_Agent_Coordination processed broadcast #1037 (4 tool_calls). Local-mode broadcastFact verified all 7 structured fields round-trip through SQLite.

Known limitations (deferred to v0.16.0)

  • HTTP API mode: existing api-server (Docker container) doesn't yet know about structured ASSIGN columns. Local mode works fully.
  • T3.1 per-agent Postgres role: deferred since it depends on Postgres backend landing first (per §8.6 acceptance criteria).
  • v0.17.0 will land §8.2-8.5: work-stealing queue, worker pool spawning, file-ownership enforcement, complexity-based routing.

See CHANGELOG.md for full details.

v0.14.0 — Native AST + Provenance Tagging + Louvain Community Detection

18 Apr 22:36

The "deeper internal capabilities" release. Three features that complement v0.13.0's graphify integration — bringing similar structural-understanding capabilities natively to SC's KB even when graphify isn't available.

Three features

Phase A — Provenance tagging

Every `working_memory` and `source_meta` row now carries a `provenance` flag (Chin & Older 2011 Ch6+Ch7 'speaks-for' formalism — every claim carries its trust chain):

  • EXTRACTED — read directly from a primary source
  • INFERRED — produced by an LLM
  • AMBIGUOUS — multiple plausible readings
  • UNKNOWN — legacy default

API additive (backward compat). Promotion/downgrade via re-assert. Migrations 16+17 with CHECK constraint. RT-S3-01 verifies SQL injection blocked.

Phase B — AST extractor (TS/JS/Python)

Regex-based deterministic L0/L1 for code files without an LLM call. ~80% LLM cost reduction on indexing for code-heavy projects.

Live samples from the agent run:

  • `rate-limiter.js` → "REST API Rate Limiter Middleware. Contains 1 class, 1 function."
  • `search.js` → "Task Search — Fuzzy Matching... Contains 2 functions, 1 import."

Why regex first, tree-sitter later: tree-sitter requires per-language WASM grammars (~500KB each) that aren't bundled. Regex covers 80/20 case at zero install friction. Interface designed for v0.15.0 swap with no breaking change.
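
A toy version of the regex pass, reproducing the summary format of the live samples above (heuristic only; the shipped extractor covers far more cases and languages):

```typescript
// Count top-level JS/TS constructs without parsing -- a deterministic,
// LLM-free L0 summary. Regexes are deliberately simple (80/20 coverage).
function summarizeJs(source: string): string {
  const classes = source.match(/^\s*(?:export\s+)?class\s+\w+/gm)?.length ?? 0;
  const functions =
    source.match(/^\s*(?:export\s+)?(?:async\s+)?function\s+\w+/gm)?.length ?? 0;
  const imports =
    source.match(/^\s*(?:import\s.+from\s|const\s+\w+\s*=\s*require\()/gm)?.length ?? 0;
  const parts: string[] = [];
  if (classes) parts.push(`${classes} class${classes > 1 ? "es" : ""}`);
  if (functions) parts.push(`${functions} function${functions > 1 ? "s" : ""}`);
  if (imports) parts.push(`${imports} import${imports > 1 ? "s" : ""}`);
  return parts.length
    ? `Contains ${parts.join(", ")}.`
    : "No top-level constructs found.";
}
```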

Phase C — Louvain community detection

`zc_kb_cluster` + `zc_kb_community_for` MCP tools cluster KB sources by graph topology (no embeddings needed). For "what's related to X" questions, two files that import each other are obviously related — no embedding call needed.

Live verification: clustered 26 sources from Test_Agent_Coordination into 5 communities (sizes 6+6+5+2+1).

(Algorithm note: Louvain not Leiden — Leiden isn't published as npm package. Same family, similar quality.)

Test summary

  • 541/541 tests pass (470 baseline + 71 new)
  • Live agent run: All three features fired correctly with real Claude CLI agents on Test_Agent_Coordination
  • Edge cases covered: empty files, syntax-broken files, very-large >5MB files, comments-only, abstract classes, generator functions, default exports, Python `__all__`, async def, decorators

Two new MCP tools

| Tool | Purpose |
| --- | --- |
| `zc_kb_cluster()` | Run Louvain over KB; persist communities |
| `zc_kb_community_for(source)` | Look up a source's community + community-mates |

Backward compatible

All existing code paths unchanged:

  • `rememberFact` and `indexContent` keep old signatures (provenance is new optional last arg)
  • AST is automatic for code extensions (no API change)
  • Migrations 16+17 are defensive (idempotent)

Recommended workflow for agents

| Question | Right tool |
| --- | --- |
| "What's the architecture of this project?" | `zc_kb_cluster` first, drill in with `zc_kb_community_for` |
| "What's related to file X?" | `zc_kb_community_for("file:src/X.ts")` |
| "Summarize this code file" | `zc_file_summary` (now AST-extracted if TS/JS/Python) |

What's next

Sprint 3 picks up Tier 3 access-control fixes — see `HARNESS_EVOLUTION_PLAN.md §8.6` (locked with hard "DO NOT START" gate).


See CHANGELOG.md for full details.

v0.13.0 — graphify Integration: Structural Knowledge Graph as a First-Class SC Capability

18 Apr 21:24

SC + graphify stacked. SC now proxies to graphify (29.7k★, AI coding assistant skill) so agents can navigate the structural knowledge graph alongside SC's persistent state + telemetry. They solve different problems and stack multiplicatively for token savings on architectural questions.

Three new MCP tools

  • `zc_graph_query(query)` — natural-language query over the structural graph (god nodes, communities, relationships)
  • `zc_graph_path(from, to)` — shortest path between two named nodes
  • `zc_graph_neighbors(node)` — immediate neighbors of a named node

All three return helpful hints when graphify isn't set up — they're inert until `pip install graphifyy && /graphify .` is run.

Auto-index `GRAPH_REPORT.md`

`zc_index_project` now auto-detects `graphify-out/GRAPH_REPORT.md` and indexes it into SC's KB so agents discover it via normal `zc_search` without needing to know graphify exists.

Token savings (combined SC + graphify)

| Question | Without either | SC alone | graphify alone | Both stacked |
| --- | --- | --- | --- | --- |
| Architectural ("how does auth work") | ~25k | ~2k | ~500 (orient) | ~1.5k |
| State / history | N/A | ~1.5k | N/A | ~1.5k |
| Specific implementation | N/A | ~800 | N/A | ~800 |

Tests

  • 470/470 pass (459 baseline + 11 new graph_proxy tests)
  • Live subprocess path not unit-tested (requires Python + graphifyy in CI; covered by manual integration)

How to enable

```bash
# One-time
pip install graphifyy && graphify install

# Per project
/graphify .

# In your AI assistant
zc_graph_query "how does the auth flow connect to the database?"
```

If graphify isn't installed, SC works exactly as before. The new tools just return hints.

Recommended workflow

| Question type | Right tool |
| --- | --- |
| Architectural / structural | `zc_graph_query` first, then `zc_search` for precise content |
| State / history | `zc_recall_context` |
| Specific implementation | `zc_search` |
| What's connected to X | `zc_graph_neighbors` |

Deferred to v0.14.0

The deeper internal capabilities (these complement graphify rather than replacing it):

  • Native AST tree-sitter pre-pass for code files (LLM-free L0 — ~50% indexing cost reduction)
  • EXTRACTED / INFERRED / AMBIGUOUS provenance tagging (Chin & Older 2011 "speaks-for" formalism — every claim carries its trust chain)
  • Leiden community detection over SC's KB (graph topology beats vector similarity for some queries)

Then Sprint 3 picks up Tier 3 access-control fixes — see `HARNESS_EVOLUTION_PLAN.md §8.6`.


See CHANGELOG.md for full details.

v0.12.1 — Tier 2: Reference Monitor + session_token Binding for Telemetry

18 Apr 21:17

Choose a tag to compare

Closes the two largest remaining access-control gaps from the v0.12.0 design review. Telemetry writes now have a single bypass-proof enforcement point that authenticates the writer's identity rather than merely verifying row integrity.

Highlights

  • HTTP API Reference Monitor — `POST /api/v1/telemetry/tool_call` and `/outcome` enforce session_token binding before any DB write. Pattern from Chin & Older 2011 Ch12 ("Reference Monitor" — exactly one enforcement point per protected resource, tamper-proof + always invoked + verifiable).
  • session_token binding — every telemetry write requires `Authorization: Bearer <session_token>`. Server asserts the token's bound `agent_id` matches the row's claimed `agent_id` (HTTP 403 on mismatch).
  • `ZC_TELEMETRY_MODE` env switch — `local` (default, unchanged), `api` (route through Reference Monitor), `dual` (write to both for migration).
  • Token cache + lifecycle — fetched lazily, cached 1 hour, re-fetched on 401, falls back to local mode if unreachable.
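The binding check at the enforcement point can be sketched as follows. This is a hypothetical, minimal version: the `SessionToken` shape, the `lookupToken` helper, and the function name are illustrative assumptions, not SC's actual API; only the `aid` field, the Bearer scheme, and the 401/403 outcomes come from the notes above.

```typescript
// Hypothetical sketch of the session_token binding check. The SessionToken
// shape and lookupToken helper are assumptions, not SC's actual API.
type SessionToken = { aid: string; projectId: string };

function authorizeTelemetryWrite(
  authHeader: string | undefined,
  body: { agentId: string },
  lookupToken: (raw: string) => SessionToken | undefined
): number {
  if (!authHeader || !authHeader.startsWith("Bearer ")) {
    return 401; // missing or malformed Authorization header
  }
  const token = lookupToken(authHeader.slice("Bearer ".length));
  if (!token) {
    return 401; // unknown or revoked token
  }
  if (token.aid !== body.agentId) {
    return 403; // row claims a different agent than the token is bound to
  }
  return 200; // identity verified; the DB write may proceed
}
```

The point of the pattern is that this is the only code path that can reach the DB write, so the check cannot be skipped by a misbehaving agent process.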

Security closes Tier 2 gaps

| Gap | Before v0.12.1 | After v0.12.1 |
| --- | --- | --- |
| #1 No bypass-proof enforcement | Each agent's MCP server opened the project DB directly | All writes route through the API; only the API process holds DB write authority |
| #2 `agent_id` was an unauthenticated string | Agent A could write rows claiming to be agent B | API verifies `body.agentId === token.aid`; forgery blocked with HTTP 403 |

Combined with v0.12.0's per-agent HMAC subkey (Tier 1 #1), telemetry rows are now integrity-protected (chain) AND authenticated (token-bound writer).

Red-team tests

  • RT-S2-02: alice's token cannot write a row claiming bob → 403
  • RT-S2-03: missing/malformed/empty Authorization header → 401
  • RT-S2-04: revoked token → 401
  • RT-S2-05: project-A token used against project-B → 401 (project-scoped capability per Ch11)
  • RT-S2-06: end-to-end via `recordToolCallViaApi` succeeds with valid token

Test summary

  • 459/459 tests pass (449 baseline + 10 new Reference Monitor tests)
  • Stress test still passes: chain ✓ OK under 10 concurrent writers × 100 calls (458 writes/sec)

Upgrade notes

Backward-compatible by default. Existing deployments continue using local-mode SQLite unless they set `ZC_TELEMETRY_MODE=api`.

For multi-agent production deployments:

  1. Set `ZC_API_KEY` (already required for v0.9.0+ broadcast RBAC)
  2. Set `ZC_TELEMETRY_MODE=api` in agent environments
  3. Set `ZC_AGENT_ID` + `ZC_AGENT_ROLE` per agent (used for session_token issuance)
  4. Rebuild the SC HTTP API Docker image — the new `/api/v1/telemetry/*` endpoints require the v0.12.1 code, and the currently published image predates it
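Taken together, an agent's environment for API-mode telemetry might look like this. All four variable names come from the steps above; the values (including the `worker` role) are placeholders, not values SC prescribes.

```shell
# Illustrative agent environment for API-mode telemetry (values are placeholders)
export ZC_API_KEY="your-api-key"   # required since v0.9.0 for broadcast RBAC
export ZC_TELEMETRY_MODE=api       # route writes through the Reference Monitor
export ZC_AGENT_ID=alice           # identity used for session_token issuance
export ZC_AGENT_ROLE=worker        # role used for session_token issuance
```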

Deferred to v0.12.2

  • Postgres backend (`ChainedTablePostgres`) — second `ChainedTable` implementation
  • Tier 1 fix #2 (POSIX 0700/0600 hardening)
  • Tier 1 fix #3 (per-agent Postgres role with INSERT-only grant)
  • Cross-backend stress test
  • Docker image rebuild + publish

Sprint 3 then picks up Tier 3 — see `HARNESS_EVOLUTION_PLAN.md §8.6` (locked in with hard "DO NOT START Sprint 3 until..." gate).


See CHANGELOG.md for full details.

v0.12.0 — Sprint 2 Prep: ChainedTable Abstraction + Per-Agent HMAC Subkey (Tier 1 #1)

18 Apr 21:08

Choose a tag to compare

Foundation release for the dual-backend telemetry roadmap. Ships the storage abstraction layer that v0.12.1 will plug Postgres into, and closes the largest pre-existing access-control gap in v0.11.0's hash-chain design.

Highlights

  • ChainedTable backend-agnostic abstraction with HKDF-derived per-agent HMAC subkey
  • Tier 1 access-control fix #1 closed: per-agent HMAC subkey blocks cross-agent row forgery (RT-S2-01 verifies)
  • Async public API (Option 4): recordToolCall, recordOutcome, and the 3 resolvers are all async — SQLite path stays sync internally; future backends drop in without API change
  • Removed _lastHashCache from v0.11.0 (was redundant with BEGIN IMMEDIATE, added a Heisenbug surface)
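As an illustration of what a backend-agnostic chained table with an async public API looks like (not SC's actual code — the class, its key handling, and the single-key HMAC are simplified assumptions; SC uses per-agent subkeys):

```typescript
import { createHmac } from "node:crypto";

// Minimal in-memory sketch of the ChainedTable idea: each row's HMAC covers
// the previous row's hash plus its own payload, so any edit breaks the chain.
type Row = { payload: string; prevHash: string; hash: string };

class InMemoryChainedTable {
  private rows: Row[] = [];
  constructor(private key: Buffer) {}

  async recordToolCall(payload: string): Promise<void> {
    const prevHash = this.rows.length > 0 ? this.rows[this.rows.length - 1].hash : "genesis";
    const hash = createHmac("sha256", this.key).update(prevHash + payload).digest("hex");
    this.rows.push({ payload, prevHash, hash });
  }

  async verifyChain(): Promise<{ ok: boolean; brokenAt?: number }> {
    let prev = "genesis";
    for (let i = 0; i < this.rows.length; i++) {
      const r = this.rows[i];
      const expected = createHmac("sha256", this.key).update(prev + r.payload).digest("hex");
      if (r.prevHash !== prev || r.hash !== expected) {
        return { ok: false, brokenAt: i };
      }
      prev = r.hash;
    }
    return { ok: true };
  }
}
```

Keeping the public surface async even though the SQLite path is synchronous underneath is what lets a Postgres (or dual) backend slot in later without touching callers.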

Security closes Tier 1 Gap #5 (Chin & Older 2011, Ch6+Ch7)

v0.11.0 used the raw machine secret as the HMAC key, making chains integrity-only. An insider with the machine secret could compute valid HMACs for any agent_id.

v0.12.0 derives per-agent subkeys: `HKDF-Expand(machine_secret, "zc-chain:" || agent_id, 32)`. Verifier reads each row's stored agent_id and derives the matching subkey — a row claiming the wrong identity fails HMAC verification.

Combined with v0.12.1's session_token binding, telemetry rows become authenticated, not just integrity-protected.

⚠️ BREAKING — chain verification

Existing v0.11.0 chains will fail to verify under v0.12.0. The HMAC key derivation changed (raw secret → HKDF subkey). `verifyToolCallChain` reports `brokenAt: 0, brokenKind: "hash-mismatch"` for any pre-upgrade row.

Migration: non-production deployments can truncate and restart. Production deployments should wait for v0.12.1's `scripts/migrate-v011-to-v012-chains.mjs` re-hash helper.

Test summary

  • 449/449 tests pass (433 baseline + 16 new chained_table tests + RT-S2-01)
  • Stress test (10 writers × 100 calls) still chain ✓ OK (regression check against v0.11.0 + a7ed9a1 passes)
  • All 22 prior test files updated for the async call cascade; no test logic changes

What ships next (v0.12.1)

  • Tier 2 fix #1: Reference Monitor (telemetry routes through HTTP API, single bypass-proof enforcement point per Chin & Older Ch12)
  • Tier 2 fix #2: session_token binding for telemetry writes
  • Postgres backend (`ChainedTablePostgres`) with single-statement INSERT + FOR UPDATE
  • `ZC_TELEMETRY_BACKEND=sqlite|postgres|dual` env selection
  • Remaining Tier 1 fixes (POSIX hardening + per-agent Postgres role)
  • Cross-backend stress test

Sprint 3 then picks up Tier 3 — see `HARNESS_EVOLUTION_PLAN.md §8.6`.


See CHANGELOG.md for full details.