feat(vigil): plugin-hook-liveness canary #558
Expose provenance_context on process_agent_update for model-facing S22 metadata. Recover S22 fields mangled into recent_tool_results without promoting operational args or creating bogus tool evidence. Verified with focused pytest suite, compileall, ruff undefined-name checks, diff check, static scan, and independent review.
On 2026-06-02 the unitares-governance-plugin hook chain was found dark for
~2.5 weeks (single-quoted ${CLAUDE_PLUGIN_ROOT} in hooks.json suppressed
shell expansion; every hook died "No such file"). checkins.log had frozen
at 2026-05-16 and nothing surfaced it — no canary asked "are my own hooks
firing?".
This adds that canary as a Vigil check. The load-bearing constraint: it is
EXTERNAL to the hook chain (a heartbeat written by the hooks can't detect the
hooks being un-dispatchable — the exact failure was the wrapper never running).
It's a divergence test, not a staleness threshold: it compares Claude Code's
own activity log (~/.claude/history.jsonl, which the plugin can't suppress)
against the newest hook artifact (checkins.log / hook-skips.log), and fires
only when there is recent CC activity AND every hook artifact is stale — so
"hooks dark" is distinguished from "operator idle". Taking the newest artifact
across send+skip logs means a healthy-but-gated chain still reads alive
(signal = "did a hook dispatch", not "did a check-in record").
Severity warning (advisory to verify, not a critical page). Thresholds are
env-configurable (VIGIL_HOOK_ACTIVITY_HOURS / VIGIL_HOOK_STALE_HOURS).
Registered as a built-in; 9 unit tests on the pure assess() incl. the
2026-06-02 reproduction case.
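
The env-configurable thresholds mentioned above might be read along these lines. This is a minimal sketch, not the PR's code: the variable names and the 12h/24h defaults come from the PR description, while `HookCanaryConfig`, `load_config`, the `os.pathsep` separator for `VIGIL_HOOK_ARTIFACT_PATHS`, and the location of `hook-skips.log` next to `checkins.log` are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import List, Mapping, NamedTuple


class HookCanaryConfig(NamedTuple):
    """Hypothetical container for the canary's settings."""
    activity_hours: float
    stale_hours: float
    cc_history_path: Path
    artifact_paths: List[Path]


def load_config(env: Mapping[str, str] = os.environ) -> HookCanaryConfig:
    """Read the VIGIL_* variables, falling back to the documented defaults."""
    home = Path.home()

    raw_paths = env.get("VIGIL_HOOK_ARTIFACT_PATHS", "")
    if raw_paths:
        # ASSUMPTION: multiple paths are joined with os.pathsep (":" on POSIX).
        artifact_paths = [Path(p).expanduser() for p in raw_paths.split(os.pathsep) if p]
    else:
        # ASSUMPTION: hook-skips.log lives beside ~/.unitares/checkins.log.
        artifact_paths = [
            home / ".unitares" / "checkins.log",
            home / ".unitares" / "hook-skips.log",
        ]

    history = env.get("VIGIL_CC_HISTORY_PATH")
    return HookCanaryConfig(
        activity_hours=float(env.get("VIGIL_HOOK_ACTIVITY_HOURS", "12")),
        stale_hours=float(env.get("VIGIL_HOOK_STALE_HOURS", "24")),
        cc_history_path=Path(history).expanduser() if history else home / ".claude" / "history.jsonl",
        artifact_paths=artifact_paths,
    )
```

Passing the environment in as a mapping keeps the loader testable without mutating `os.environ`.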
✅ Documentation Validation Passed. Tool Count: 7 tools. All documentation is synchronized with the codebase.
Correction to the "needs a consumer" caveat above — verified, it's already wired. I flagged in the description that routing
Same path
Why
On 2026-06-02 the `unitares-governance-plugin` hook chain was found dark for ~2.5 weeks. Every command in the plugin's `hooks.json` wrapped `${CLAUDE_PLUGIN_ROOT}` in single quotes, which `/bin/sh` won't expand, so every hook resolved to a literal path and died "No such file". `~/.unitares/checkins.log` had frozen at 2026-05-16 and nothing surfaced it. In a project whose thesis is observability of agent behavior, there was no canary asking "are my own hooks even firing?". (Plugin fix: `cirwel/unitares-governance-plugin` 0.4.5.)

This adds that canary.
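
The quoting failure described above can be reproduced in isolation. The sketch below is illustrative only: `sh_echo`, the install path, and the `hooks/run.sh` suffix are hypothetical, and it shows only how `/bin/sh` treats single versus double quotes, not the plugin's actual `hooks.json` contents.

```python
import os
import subprocess


def sh_echo(command: str, plugin_root: str) -> str:
    """Run `command` under /bin/sh with CLAUDE_PLUGIN_ROOT set and return stdout."""
    env = dict(os.environ, CLAUDE_PLUGIN_ROOT=plugin_root)
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


root = "/opt/example-plugin"  # hypothetical install path

# Single quotes: the shell passes the text through literally, no expansion.
single = sh_echo("echo '${CLAUDE_PLUGIN_ROOT}/hooks/run.sh'", root)
# Double quotes: the variable is expanded before the command runs.
double = sh_echo('echo "${CLAUDE_PLUGIN_ROOT}/hooks/run.sh"', root)

print(single)  # ${CLAUDE_PLUGIN_ROOT}/hooks/run.sh
print(double)  # /opt/example-plugin/hooks/run.sh
```

A command whose path stays as the literal `${CLAUDE_PLUGIN_ROOT}/...` string points at a file that does not exist, which matches the "No such file" symptom.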
Design
The load-bearing constraint: the canary is external to the hook chain. A heartbeat written by the hooks can't detect the hooks being un-dispatchable — the exact failure was that the wrapper never ran, so anything it was meant to touch never got touched. A self-reported signal is blind to its own death.
So `PluginHookLiveness` (running inside Vigil, a separate launchd process) compares two independent signals:

- Claude Code activity: `~/.claude/history.jsonl` (advances on every prompt; the plugin can't suppress it).
- Newest hook artifact: `checkins.log` (send) + `hook-skips.log` (gated skip). Newest-wins means a healthy-but-gated chain still reads alive: the signal is "did a hook dispatch", not "did a check-in record".

It's a divergence test, not a staleness threshold: it fires only when there's recent CC activity and every hook artifact is stale, distinguishing "hooks dark" from "operator idle" so a quiet weekend doesn't page anyone.
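
The divergence rule can be sketched as a pure function over timestamps. This is an illustration under stated assumptions, not the PR's `assess()`: the name `assess_sketch`, its signature, and the `"idle"`/`"live"`/`"dark"` return labels are hypothetical; only the 12h/24h defaults and the newest-artifact-wins rule come from the description.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def assess_sketch(
    now: datetime,
    last_cc_activity: Optional[datetime],
    artifact_times: Iterable[datetime],
    activity_hours: float = 12.0,
    stale_hours: float = 24.0,
) -> str:
    """Return "idle", "live", or "dark" (hypothetical labels)."""
    # No recent Claude Code activity: the operator is idle, so stale
    # hook artifacts prove nothing and the check stays quiet.
    if last_cc_activity is None or now - last_cc_activity > timedelta(hours=activity_hours):
        return "idle"

    # Newest artifact wins: a skip-log entry counts as a dispatch just
    # like a check-in does.
    newest = max(artifact_times, default=None)
    if newest is not None and now - newest <= timedelta(hours=stale_hours):
        return "live"

    # Recent activity AND every artifact stale (or missing): hooks are dark.
    return "dark"


now = datetime(2026, 6, 2, 12, 0, tzinfo=timezone.utc)
cc_recent = now - timedelta(hours=1)
checkins_frozen = now - timedelta(days=17)

print(assess_sketch(now, cc_recent, [checkins_frozen]))
print(assess_sketch(now, cc_recent, [checkins_frozen, now - timedelta(hours=2)]))
```

Because the clock and both signals are passed in as arguments, the function can be exercised without touching the filesystem, which is the property the description attributes to the real `assess()`.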
Honest scope (no silent caps)
- […] `~/.claude/hooks`) governance hooks → it answers "is the governance hook layer alive", slightly broader than "is the plugin chain alive". That's the right default.
- […] `stale_hours` with no dispatch could read dark. Default window (24h) makes this rare; severity is `warning` (advisory to verify), not critical.
- […] `fingerprint_key=plugin_hook_chain_dark`): route it to the Discord governance bridge / Sentinel where it'll actually be seen. (Follow-up, not in this PR.)
Config
`VIGIL_HOOK_ACTIVITY_HOURS` (default 12), `VIGIL_HOOK_STALE_HOURS` (default 24), `VIGIL_CC_HISTORY_PATH`, `VIGIL_HOOK_ARTIFACT_PATHS`.
Tests
9 unit tests on the pure `assess()` (no clock/fs coupling), including the 2026-06-02 reproduction (CC active + `checkins.log` 17 days stale → `plugin_hook_chain_dark`) and the newest-artifact-wins gated-chain case. Full Vigil suite: 100 passed.

Verified live: against this machine the check reads "live" off real `turn_stop`/`auto_edit` rows the now-fixed plugin hooks wrote this session (UUID `cc447979`); `checkins.log` went from frozen-May-16 to actively writing.
Note
Pre-existing unrelated failure in `test_lease_plane_canonicalize.py::...on_macos` (sandbox `TMPDIR` is `/private/tmp` not `/private/var`). It reproduces with this branch's changes stashed; not introduced here.

🤖 Generated with Claude Code