Date: 2026-02-13
Status: Accepted
The hypergumbo project uses AI agents for autonomous development. These agents are governed by a "stop hook" system (see ADR-0008) that decides whether an agent is allowed to stop working or must continue. The hook counts open work items and invariant violations to determine if the agent has finished.
Currently, the stop hook's input is two markdown files:
- `.agent/invariant-ledger.md` (455 lines) — Tracks discovered code invariants (e.g., "every `calls` edge has a non-null caller symbol"), their root causes, fixes, and whether they've been fully generalized across all languages.
- `~/hypergumbo_lab_notebook/guidance_log/work_items.md` (35 lines) — A categorized backlog of non-invariant work (developer experience, linkers, CI, etc.).
The stop hook (stop_logic.sh) reads these via grep patterns:
```sh
grep -c '^\s*- \*\*TODO!\*\*' "$LEDGER_FILE"     # hard TODOs (block stopping)
grep -c '^\s*- \*\*TODO\*\*[^!]' "$LEDGER_FILE"  # soft TODOs (block stopping)
```

This works but is fragile and limited:
- No schema enforcement — Nothing prevents malformed entries, status typos ("FIXD" instead of "FIXED"), or vocabulary drift. Agents can write whatever they want.
- No deduplication — Agents can create redundant items with no detection.
- No human interface — Only agents read/write these files. Humans have no ergonomic way to browse, filter, triage, or override agent decisions.
- No access control — The human cannot lock a field to prevent agent modification, nor can they have a private conversation with the agent about a specific item.
- Fragile parsing — The grep patterns break if the markdown format drifts even slightly.
- SQLite as primary storage — Binary file, opaque to git diffs, can't be meaningfully reviewed in PRs. (SQLite is used as an out-of-tree read cache — see Read Cache — but the source of truth is always the append-only op log files.)
- git-bug — Mature (6+ years, ~100k LOC Go) but rigid: status is hardcoded to Open/Closed, no parent-child, extending requires Go compilation. Wrong language for a Python project. However, git-bug's core insight — storing immutable operations rather than mutable snapshots — directly inspired our operation-log storage model (see Item Schema). git-bug's "entity ID = SHA-256 hash of the first operation" also inspired our hash-based ID scheme (see Key Design Decisions).
- beads — Feature-rich (~250k LOC Go) but overengineered for our needs. Beads' per-field resolution strategies (terminal-status-wins, timestamp tiebreakers) inspired the compile rules in Compile Rules. Beads' hash-based IDs (UUID → truncated SHA-256) validated the collision-free distributed ID approach we adopt in Key Design Decisions.
- Separate git repo — Unnecessary complexity. The YAML files are small and merge cleanly in the same repo.
Replace both markdown files with a YAML-backed structured tracker that provides:
- Schema-enforced controlled vocabulary — Statuses, kinds, and other fields are validated against a config file. Invalid values are rejected.
- CLI for agents (`scripts/tracker`) — Replaces grep patterns. The stop hook calls `scripts/tracker count-todos --hard` instead of grepping markdown.
- TUI for humans (`scripts/tracker tui`) — A Textual-based terminal UI for browsing, editing, filtering, and discussing items.
- Append-only operation log — Each item's history is stored as a hidden YAML file (`.ops` extension, dotfile naming) in a dotdir, containing an ordered list of immutable operations. Current state is derived by replaying ops. The append-only format means git can always auto-merge concurrent edits — no custom merge driver needed. The op log files are deliberately hidden from agents via dotdir + dotfile + explicit AGENTS.md rules (see Agent Context Protection).
- Three-tier visibility — Items live in one of three tiers: canonical (committed, shared with upstream), workspace (committed, backed up to fork remote, excluded from upstream PRs), or stealth (gitignored, never leaves the machine). Visibility only moves up via `promote` (workspace → canonical) or down via `demote`/`stealth`. This directory-level separation cleanly handles the fork workflow (see Three-Tier Visibility).
- Fork-safe by design — Contributors fork the repo and get upstream's canonical tracker as read-only context. Their agent writes to workspace (committed to the fork, backed up to the fork's remote). `scripts/contribute` automatically excludes workspace from upstream PRs. Canonical items can be promoted as separate, intentional PRs (see Three-Tier Visibility).
- Field locking — Humans can lock any field to prevent agent modification.
- Discussion threads — Each item has an async discussion field where human and agent exchange messages.
- Parent-child relationships — Items can form trees via an optional `parent` field.
- Configurable kinds — Item types are defined in `config.yaml`, not hardcoded. Users can add new kinds (with custom ID prefixes) without changing code.
- SQLite read cache — An out-of-tree SQLite database (in `$XDG_CACHE_HOME`) caches compiled snapshots for fast queries. The op log files remain the source of truth; the cache accelerates the read path (`list`, `ready`, `count-todos`) so agents can query the tracker on every task-selection cycle without parsing hundreds of op log files.
Instead of hardcoding separate types like invariants, work items, issues, bugs, etc., all items share a single universal schema. The `kind` field determines what type of item it is, and valid kinds are defined in `config.yaml`. To add a new kind (say, "latke"), you edit the config — no code changes. Each kind can optionally declare a `fields_schema` that names the known fields for its `fields` dict, their types, and whether they're required (see Key Design Decisions). Kinds without a schema have fully open-ended fields.
.agent/
├── tracker/ # canonical tier (committed, shared with upstream)
│ ├── config.yaml.template # tracked governance rules (kinds, statuses, field schemas)
│ ├── config.yaml # gitignored, generated by `init`, human-owned (mode 644)
│ └── .ops/ # committed op logs (dotdir — agents should not read)
│ ├── .INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit.ops
│ ├── .META-dabop-firuz-hadol-jikam-losib-mufad-nokap-pidul.ops
│ └── .WI-fodak-humit-kobap-linud-rasib-sufag-tohim-vukad.ops
└── tracker-workspace/ # workspace tier (committed, fork-local)
├── config.yaml.template # tracked template for per-fork overrides
├── config.yaml # gitignored, generated by `init` or `fork-setup`
├── .ops/ # committed op logs (backed up to fork remote)
│ ├── .WI-gutob-kinap-sifad-tuhom-badol-fikam-gusib-hilap.ops
│ └── .INV-hamoj-libud-mifog-nakip-rosab-sudol-tifag-vukim.ops
└── stealth/ # stealth tier (gitignored, never leaves machine)
└── .WI-julad-mifog-vakob-zikap-bomud-diral-fusob-gihap.ops
$XDG_CACHE_HOME/hypergumbo-tracker/<repo-fingerprint>/
├── canonical.cache.db # SQLite read cache for canonical tier
├── canonical.last_list # positional alias stash for canonical
├── workspace.cache.db # SQLite read cache for workspace tier
├── workspace.last_list # positional alias stash for workspace
├── stealth.cache.db # SQLite read cache for stealth tier
└── stealth.last_list # positional alias stash for stealth
Two tracker directories, each containing op log files in a .ops/ dotdir. One file per item, flat within each dotdir. Each file is a dotfile (.INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit.ops) containing an append-only operation log (see Item Schema). The kind is inside the YAML content, not encoded in the path. The .ops extension and dotfile naming are deliberate — they prevent agents from casually reading the raw operation history (see Agent Context Protection).
Cache and ephemeral files (.cache.db, .last_list) live outside the repo tree in $XDG_CACHE_HOME/hypergumbo-tracker/<repo-fingerprint>/ (see Read Cache). The repo-fingerprint key (hash of remote URL + first commit SHA) allows multiple checkouts of the same repo to share a cache, and avoids ownership/permission conflicts when two OS users share a checkout.
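The fingerprint construction is specified only by its inputs (remote URL + first commit SHA); a minimal sketch, assuming SHA-256 and a truncated hex digest (both illustrative choices):

```python
import hashlib

def repo_fingerprint(remote_url: str, first_commit_sha: str) -> str:
    """Stable cache key: identical for every checkout of the same repo,
    regardless of checkout path or OS user. Hash shape is illustrative."""
    material = f"{remote_url}\n{first_commit_sha}".encode()
    return hashlib.sha256(material).hexdigest()[:16]
```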
Canonical (.agent/tracker/) is the repo's institutional memory — committed, shared across forks, included in upstream PRs. Workspace (.agent/tracker-workspace/) is the agent's personal working memory — committed and pushed to the fork's remote for backup, but excluded from upstream PRs by scripts/contribute. Stealth (.agent/tracker-workspace/stealth/) is gitignored and never leaves the machine.
The store reads all three tiers transparently via TrackerSet — agents and humans see a unified merged view. Items are tagged [C], [W], or [S] in CLI output to indicate their tier. The agent writes to workspace by default; items are promoted to canonical explicitly (see Three-Tier Visibility). Agents should never read op log files directly — they use scripts/tracker show <ID> or scripts/tracker show <ID> --json to get compiled current state (see Agent Context Protection).
The repo tracks a template (config.yaml.template). The actual config (config.yaml) is gitignored and human-owned via OS file permissions — the standard .env.example → .env pattern (see Security Model).
scripts/tracker init (run by the human user) copies the full template to config.yaml, then performs a YAML-aware merge of the per-deployment fields (see below) into the copy — for example, stop_hook.scope is merged into the existing stop_hook block alongside blocking_statuses and resolved_statuses, not appended as a duplicate top-level key. Finally, it sets ownership: chown <human_user> config.yaml && chmod 644 config.yaml. The result is a complete config file — not just overrides. The agent can read config (needs to, for validation) but cannot write it — the OS enforces this.
The validation code loads config from a chain: config.yaml if it exists, otherwise config.yaml.template (fallback, not merge). CI uses the template directly (which contains all governance rules but no per-deployment fields — actor_resolution and stop_hook.scope use built-in defaults when absent). validate warns in both directions: when config.yaml contains kinds or statuses not present in config.yaml.template (local-only additions that would fail in CI), and when config.yaml.template has been updated with new kinds or statuses that config.yaml doesn't have (stale local config — re-run init to regenerate).
config.yaml.template (tracked, shared governance rules):
```yaml
kinds:
  invariant:
    prefix: INV
    description: "Discovered invariant with root cause analysis"
    fields_schema:
      statement:
        type: text
        required: true
        description: "The invariant being tracked"
      root_cause:
        type: text
        required: true
        description: "Why the invariant was violated"
      fix:
        type: text
        description: "How the root cause was addressed"
      verification:
        type: text
        description: "How the fix was verified across languages/constructs"
      regression_tests:
        type: list
        description: "Test cases that guard against recurrence"
      scope:
        type: text
        description: "Which languages/constructs are affected"
      progress_pct:
        type: integer
        min: 0
        max: 100
        description: "Percentage of affected scope addressed"
  meta_invariant:
    prefix: META
    description: "Cross-cutting invariant tracking multi-language coverage"
    fields_schema:
      statement:
        type: text
        required: true
        description: "The cross-cutting invariant"
      languages_done:
        type: list
        description: "Languages where the invariant holds"
      languages_remaining:
        type: list
        description: "Languages not yet checked or fixed"
      progress_pct:
        type: integer
        min: 0
        max: 100
        description: "Percentage of languages addressed"
  work_item:
    prefix: WI
    description: "Backlog item for non-invariant work"
    # No fields_schema — work items use title/description only.
    # Omitting fields_schema means: no known fields, no validation,
    # no warnings on arbitrary keys.
  # Add new kinds freely — no code changes needed:
  # latke:
  #   prefix: LTK
  #   description: "Jews for a free Palestine"
  #   fields_schema:
  #     filling:
  #       type: text
  #       required: true

# Status vocabulary. All statuses are config-defined — no Python enum.
# With OS-permission-protected config, this is harder for the agent to
# modify than source code (see [Security Model](#security-model)).
statuses:
  - todo_hard    # investigate deeply, assume structural
  - todo_soft    # address or defer freely
  - in_progress  # actively being worked on
  - done         # completed
  - deferred     # explicitly deferred
  - wont_do      # decided against

# Stop hook semantics. blocking_statuses are what count-todos counts.
# resolved_statuses are what the `before` soft-blocking filter treats
# as "done" (predecessor resolved). These sets must not overlap, and
# blocking_statuses must be non-empty (otherwise the stop hook is toothless).
# Statuses in neither set (like in_progress) are "neutral": they don't
# block stopping and don't satisfy `before` soft-blocking. This is
# intentional — in_progress items are actively being worked on, so
# they shouldn't block the stop hook (the agent is already on it),
# but they also aren't "done" for dependency purposes.
stop_hook:
  blocking_statuses: [todo_hard, todo_soft]
  resolved_statuses: [done, deferred, wont_do]

# Freeform tags for categorization. The config lists "well-known" tags for
# autocomplete in the TUI. Validation does NOT reject unknown tags (open vocabulary).
well_known_tags:
  - developer_experience
  - cross_language_linkers
  - analysis_quality
  - language_additions
  - ci_infrastructure
  - framework_patterns
```

Per-deployment fields (merged into the template copy by `init` via YAML-aware merge — `config.yaml` is a complete file, not just overrides). These fields have built-in defaults when absent, so the template works standalone in CI:
```yaml
# Actor resolution. Usernames matching these patterns are resolved as "agent".
# All other usernames resolve as "human". Default (when absent): ["*_agent"].
# See [Security Model](#security-model).
actor_resolution:
  agent_usernames: ["*_agent"]

# Stop hook scoping (merged into the stop_hook block from the template).
# On upstream repos, "all" counts canonical + workspace + stealth.
# On forks, "workspace" counts workspace + stealth — the fork agent isn't blocked by
# upstream's canonical items, which it can read but not close.
# Default (when absent): "all".
# Detected automatically by scripts/tracker fork-setup.
stop_hook:
  scope: all  # "all" | "workspace"

# Lamport clock branch set. The clock peeks at these branches (plus HEAD
# and any unmerged branches) to compute cross-branch causal ordering.
# Default (when absent): ["dev", "main"].
# Override for repos using different branch conventions (e.g., ["master"]).
lamport_branches: ["dev", "main"]
```

Validation enforces:

- All referenced statuses exist in the `statuses` list.
- `blocking_statuses` and `resolved_statuses` don't overlap.
- `blocking_statuses` is non-empty (otherwise the stop hook is toothless).
- `actor_resolution.agent_usernames` is a non-empty list of glob patterns.
- `lamport_branches` is a non-empty list of branch name strings.
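Actor resolution itself is a small glob match on the OS username; a sketch, assuming Python `fnmatch` semantics for the patterns (the function name is illustrative):

```python
from fnmatch import fnmatch

def resolve_actor(username: str, agent_patterns: tuple[str, ...] = ("*_agent",)) -> str:
    """Map an OS username to "agent" or "human" per actor_resolution config."""
    return "agent" if any(fnmatch(username, p) for p in agent_patterns) else "human"
```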
Each op log file (.ops) is an append-only list of operations. The store never mutates existing ops — it only appends new ones. Current state is derived by compile(), a pure function that replays all ops in Lamport clock order (see Compile Rules).
```yaml
# .INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit.ops — append-only operation log (in .agent/tracker/.ops/)
- op: create                    # f7a2
  at: "2026-02-11T18:00:00Z"    # f7a2
  by: agent                     # f7a2
  actor: jgstern_agent          # f7a2
  clock: 1                      # f7a2
  nonce: f7a2                   # f7a2
  data:                         # f7a2
    kind: invariant             # f7a2
    title: "Call Attribution Completeness"  # f7a2
    status: todo_hard           # f7a2
    priority: 2                 # f7a2
    parent: null                # f7a2
    tags: [analysis_quality]    # f7a2
    before: []                  # f7a2
    duplicate_of: []            # f7a2
    not_duplicate_of: []        # f7a2
    pr_ref: null                # f7a2
    description: ""             # f7a2
    fields:                     # f7a2
      statement: "Every emitted `calls` edge has a non-null caller symbol"  # f7a2
      root_cause: "JS/TS arrow function early-return in _get_enclosing_function()"  # f7a2
      fix: "Position-based lookup for arrow functions"  # f7a2
      verification: "Kotlin and Scala lambdas work correctly..."  # f7a2
      regression_tests:         # f7a2
        - "test_js_ts.py::TestCallbackCallAttribution"  # f7a2
        - "test_kotlin.py::TestKotlinLambdaCallAttribution"  # f7a2
      scope: null               # f7a2
      progress_pct: null        # f7a2
- op: discuss                   # b3c1
  at: "2026-02-11T18:30:00Z"    # b3c1
  by: human                     # b3c1
  actor: jgstern                # b3c1
  clock: 2                      # b3c1
  nonce: b3c1                   # b3c1
  message: "I think this should be higher priority because it affects CI."  # b3c1
- op: update                    # d4e5
  at: "2026-02-11T18:31:00Z"    # d4e5
  by: agent                     # d4e5
  actor: jgstern_agent          # d4e5
  clock: 3                      # d4e5
  nonce: d4e5                   # d4e5
  set:                          # d4e5
    priority: 0                 # d4e5
- op: discuss                   # a1b2
  at: "2026-02-11T18:32:00Z"    # a1b2
  by: agent                     # a1b2
  actor: jgstern_agent          # a1b2
  clock: 4                      # a1b2
  nonce: a1b2                   # a1b2
  message: "Agreed. Bumping to P0."  # a1b2
- op: lock                      # c8d9
  at: "2026-02-11T18:33:00Z"    # c8d9
  by: human                     # c8d9
  actor: jgstern                # c8d9
  clock: 5                      # c8d9
  nonce: c8d9                   # c8d9
  lock: [priority]              # c8d9
- op: update                    # e6f7
  at: "2026-02-11T19:30:00Z"    # e6f7
  by: agent                     # e6f7
  actor: jgstern_agent          # e6f7
  clock: 6                      # e6f7
  nonce: e6f7                   # e6f7
  set:                          # e6f7
    status: done                # e6f7
```

Operation types:
| Op type | Fields | Effect |
|---|---|---|
| `create` | `data: {kind, title, status, priority, ...}` | Initialize item with all fields |
| `update` | `set: {field: value, ...}`, optional `add: {field: [value, ...]}`, optional `remove: {field: [value, ...]}` | `set`: overwrite scalar fields (LWW). `add`/`remove`: incremental modification of set-valued fields (tags, before, duplicate_of, not_duplicate_of) — see Compile Rules. |
| `discuss` | `message: "..."` | Append a discussion entry |
| `discuss_clear` | (none) | Clear all previous discussion entries |
| `discuss_summarize` | `message: "..."` | Replace discussion with a single summary |
| `lock` | `lock: [field, ...]` | Add fields to locked set |
| `unlock` | `unlock: [field, ...]` | Remove fields from locked set |
| `promote` | (none) | Record promotion (workspace → canonical; file also moves) |
| `demote` | (none) | Record demotion (canonical → workspace; file also moves) |
| `stealth` | (none) | Record move to stealth (workspace → stealth; file moves to gitignored dir) |
| `unstealth` | (none) | Record move from stealth (stealth → workspace; file moves back) |
| `reconcile` | `from_tier: "...", reason: "..."` | Record automated cross-tier duplicate resolution (see Self-Healing Reconciliation) |
Every op carries at (ISO 8601 UTC timestamp), by (agent or human), actor (the OS username that performed the operation, e.g., jgstern_agent — preserved for audit trail and multi-agent debugging; see Security Model), clock (Lamport clock — monotonically increasing integer per op log file), and nonce (4 random hex chars). The nonce appears as an inline # <nonce> comment on every line of each op — not just the first line. This is load-bearing for merge=union correctness: it makes every line globally unique, preventing git's line-level union driver from deduplicating or stripping shared lines across ops. See Compile Rules.
The operation log IS the audit trail. There is no separate audit_trail field — the file itself is a complete, ordered record of every change. scripts/tracker log <ID> prints the raw ops; scripts/tracker show <ID> prints the compiled current state.
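What compile() produces can be sketched as a pure replay over a subset of op types. The lock behavior shown (agent writes to locked fields are dropped) and the actor-name tiebreaker are simplifying assumptions standing in for the real rules in Compile Rules:

```python
def compile_item(ops: list[dict]) -> dict:
    """Derive current state by replaying ops in (clock, at, actor) order.
    Sketch: covers create/update/discuss/lock/unlock only."""
    state: dict = {}
    discussion: list[dict] = []
    locked: set[str] = set()
    for op in sorted(ops, key=lambda o: (o["clock"], o["at"], o["actor"])):
        if op["op"] == "create":
            state.update(op["data"])
        elif op["op"] == "update":
            for k, v in op.get("set", {}).items():
                if k in locked and op["by"] == "agent":
                    continue  # locked fields reject agent writes (assumed rule)
                state[k] = v  # last-writer-wins for scalars
            for k, vs in op.get("add", {}).items():       # set-valued add
                state[k] = sorted(set(state.get(k, [])) | set(vs))
            for k, vs in op.get("remove", {}).items():    # set-valued remove
                state[k] = sorted(set(state.get(k, [])) - set(vs))
        elif op["op"] == "discuss":
            discussion.append({"by": op["by"], "message": op["message"]})
        elif op["op"] == "lock":
            locked |= set(op["lock"])
        elif op["op"] == "unlock":
            locked -= set(op["unlock"])
    return {**state, "discussion": discussion, "locked": sorted(locked)}
```

Replaying the example op log above would leave `priority` at 0 (the later agent write is rejected by the lock) while `status` becomes `done`.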
Append-only operation log. Instead of storing a mutable snapshot (read-modify-write), each change appends an immutable op to the file. This is a simplified version of git-bug's operation-sourced model, adapted to plain YAML files instead of git objects. The key benefit: concurrent edits to the same item never produce git conflicts. Op log files are marked merge=union in .gitattributes (see .gitattributes), which tells git to keep lines from both sides on conflict. Combined with the nonce-on-every-line serialization format (see Compile Rules), this guarantees all ops are preserved as distinct YAML list items without conflict markers or data loss. The compile() function sorts ops by Lamport clock and applies them deterministically, regardless of the order they appear in the file (see Compile Rules).
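The referenced .gitattributes entries are not reproduced in this section; a plausible form, assuming the dotdir layout above (the exact patterns are an assumption):

```
# .gitattributes — union-merge op logs so concurrent appends never conflict
.agent/tracker/.ops/*.ops merge=union
.agent/tracker-workspace/.ops/*.ops merge=union
```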
Lamport clock for causal ordering. Each op carries a clock field — an integer that captures causal ordering across branches. When appending an op, the store peeks at the op log file on a scoped set of branches (via git cat-file --batch), computes max(clock) across all of them, and sets the new op's clock to max + 1. This is a genuine Lamport clock: the cross-branch peek is the "message receive" step in the classic algorithm (clock = max(local, received) + 1). All reads are from the local git object store — no network calls, sub-millisecond per branch, works fully offline.
Implementation: git cat-file --batch. Rather than spawning one git show <branch>:<path> subprocess per branch (which costs ~1ms per call), the store pipes all <branch>:<path> refs into a single git cat-file --batch subprocess. Benchmarking shows this is ~14× faster than serial git show: 257 branches resolves in ~19ms (batch) vs. ~265ms (serial). This makes the Lamport clock negligible overhead at any realistic branch count — it is not a scaling bottleneck.
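A sketch of the batched peek. The subprocess plumbing follows the description above; the clock extraction assumes `clock: N` lines in the op log YAML (the regex is illustrative). `git cat-file --batch` emits `<sha> <type> <size>` followed by the blob for each resolved ref, or `<ref> missing` for refs that don't resolve:

```python
import re
import subprocess

def peek_batch(repo: str, relpath: str, branches: list[str]) -> str:
    """Resolve <branch>:<relpath> for every branch in one subprocess."""
    refs = "".join(f"{b}:{relpath}\n" for b in branches).encode()
    return subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch"],
        input=refs, capture_output=True, check=True,
    ).stdout.decode(errors="replace")

def max_clock(batch_output: str) -> int:
    """Largest clock value across all resolved blobs; '<ref> missing'
    lines contribute nothing."""
    clocks = [int(m) for m in re.findall(r"^\s*clock:\s*(\d+)", batch_output, re.M)]
    return max(clocks, default=0)

# Lamport receive step: the next op's clock is max(visible clocks) + 1.
```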
Scoped branch set. The peek only scans branches that could contain unmerged tracker ops: dev, main, and HEAD (the current working branch). Already-merged feature branches are redundant — their ops are already reachable via dev. In hypergumbo's sequential workflow (auto-pr blocks new work while CI runs), there's rarely even a second active feature branch. As a safety margin, branches with unmerged commits (git branch --no-merged dev) are also included, but this set is typically empty. Stale feature branches are excluded entirely — see "Branch hygiene" below.
This ensures causal ordering across the active branch frontier: if agent B can see agent A's ops on any active branch (even without merging), B's next op gets a strictly higher clock — correctly ordered regardless of wall-clock skew between machines. For truly concurrent ops (written on branches not locally visible to each other — e.g., one agent hasn't fetched the other's remote), both sides may produce the same clock value; the tiebreaker (clock, timestamp, actor_rank) resolves these deterministically. The guarantee boundary is honest: causally ordered relative to everything locally visible, tiebreaker for everything else — the strongest guarantee possible without a centralized server.
Inspired by git-bug's Lamport clock system (util/lamport/), but requiring no separate clock files or git object storage — just one integer field per op and a cross-branch max() call on append.
Branch assumptions and fallbacks. The scoped branch set assumes dev and main exist. For repos using different conventions or degraded environments:
- If `dev` is missing, fall back to `main`; if both are missing, use `HEAD` only.
- Shallow clones (`git clone --depth N`): `git cat-file --batch` may fail to resolve objects for branches referencing commits outside the shallow history. The clock still works but loses cross-branch ordering; the `(clock, timestamp, actor_rank)` tiebreaker applies — same as the cross-branch concurrency guarantee, honestly degraded.
- The branch names (`dev`, `main`) are documented constants in the store. For repos using `master` or trunk-based development without `dev`, override via a `lamport_branches` list in `config.yaml` (default: `[dev, main]`).
- Performance expectation: the scoped set should be ≤5 branches. `git branch --no-merged dev` is the only potentially expensive call; on repos with hundreds of stale branches, this could take tens of milliseconds. Branch hygiene (below) keeps this small in practice.
Branch hygiene. Stale feature branches (already merged into dev) are useless for the Lamport clock — they contain no ops that aren't already on dev. To prevent accumulation, scripts/auto-pr deletes feature branches (local and remote) after successful merge. For manual PRs, AGENTS.md documents the expectation: delete your feature branch after merge. This keeps the scoped branch set small (typically 2–3 branches) and eliminates the risk of degraded performance from branch accumulation.
Rebase-safe by design. The nonce-on-every-line serialization format (see Compile Rules) makes merge=union safe under both merge and rebase. Because every line carries a unique # <nonce> suffix, git's line-level union driver cannot match or deduplicate lines across different ops — even when two ops share the same structure (e.g., both are update ops setting status: done). This was validated empirically: 9/9 adversarial scenarios (including identical ops, cascade diamonds, and 8-way concurrent ops) produce correct results with nonce-on-every-line, for both merge and rebase strategies. See ~/hypergumbo_lab_notebook/adr-0013-prototyping-scripts/rebase_nonce_every_line.py for the reproduction scripts. The tracker imposes no constraints on git workflow — teams can freely use merge, rebase, squash-merge, or any combination.
`status` field:

- UNFIXED → `todo_hard`
- PARTIALLY ADDRESSED → `in_progress`
- FIXED → `done`
- TODO! → `todo_hard`
- TODO → `todo_soft`
- DONE → `done`
- DEFERRED → `deferred`
- WON'T DO → `wont_do`
The stop hook counts items whose status is in blocking_statuses (see Config File). The before soft-blocking filter treats items whose status is in resolved_statuses as "done" (predecessor resolved).
Integer priority tiers (0–4). Priority is an integer:
| Value | Meaning |
|---|---|
| 0 | P0: critical / drop everything |
| 1 | P1: high |
| 2 | P2: medium (default) |
| 3 | P3: low |
| 4 | P4: backlog |
If --priority is omitted on add, the CLI assigns a default of 2 (P2). Items are sorted by (priority, before-ordering, created_at) — see below.
before field for enforced ordering. To express "this item should be worked on before that one" without changing priority tiers, an item can declare before: [<ID>, ...]. Read before: [Y] as "I block Y — finish me before starting Y":
```yaml
- op: update                    # b2c3
  at: "2026-02-12T10:00:00Z"    # b2c3
  by: human                     # b2c3
  actor: jgstern                # b2c3
  clock: 7                      # b2c3
  nonce: b2c3                   # b2c3
  add:                          # b2c3
    before: [INV-dabop-firuz-hadol-jikam-losib-mufad-nokap-pidul]  # b2c3
```

Unlike a display-order hint, `before` is enforced as a soft-blocking relationship. If item X has `before: [Y]`, then Y is not ready until X is resolved (status is in `resolved_statuses` — see Config File). This is transitive: if X has `before: [Y]` and Y has `before: [Z]`, then Z is blocked until both X and Y are resolved. The `scripts/tracker ready` command (see CLI) returns only items that are actionable (`todo_hard` or `todo_soft`) and unblocked — this is what agents use for task selection.
Within the ready set, items are sorted by (priority, created_at). For display purposes (list), all items are shown, sorted by (priority, topological order of before links, created_at). Validation warns on (but does not reject) before links pointing to items in a different tier or to closed items — the ready filter simply ignores stale links. Cycles in before links are rejected by validate — a cycle would deadlock the agent.
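The ready computation can be sketched as a transitive-blocker walk. Statuses are hardcoded here for brevity (in reality they come from config), and stale `before` links are simply ignored, per the validation rule above:

```python
ACTIONABLE = {"todo_hard", "todo_soft"}
RESOLVED = {"done", "deferred", "wont_do"}

def ready(items: dict[str, dict]) -> list[str]:
    """items: {id: {"status": ..., "before": [ids this item blocks]}}.
    An item is ready iff actionable and no transitive blocker is unresolved."""
    blockers: dict[str, set[str]] = {i: set() for i in items}
    for iid, item in items.items():
        for target in item.get("before", []):
            if target in items:  # links to other tiers / unknown IDs ignored
                blockers[target].add(iid)

    def blocked(iid: str, seen: set[str]) -> bool:
        for b in blockers[iid] - seen:
            seen.add(b)  # guard against revisiting (validate rejects cycles)
            if items[b]["status"] not in RESOLVED or blocked(b, seen):
                return True
        return False

    return [i for i, it in items.items()
            if it["status"] in ACTIONABLE and not blocked(i, set())]
```

With X blocking Y and Y blocking Z, only X is ready; resolving X unblocks Y but Z stays blocked until Y is resolved too.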
Timestamps. created_at is the timestamp of the first op (create). "Last updated" is derived from the timestamp of the last op in the file — computed by compile(), never stored as a separate field. Both are ISO 8601 UTC. Staleness detection ("this todo_hard item hasn't been touched in 14 days") uses the last op's timestamp.
Config-defined statuses. Statuses, blocking_statuses, and resolved_statuses are all defined in config.yaml (see Config File) — not hardcoded as a Python enum. "Hardcoded in code" provides no additional protection over "defined in config" — the agent can edit and auto-PR Python source just as easily as YAML. With the OS-permission-protected config model (see Security Model), config-defined statuses are actually harder for the agent to modify than source code — the agent literally cannot write() to a file owned by the human user with mode 644. The Python code loads the status vocabulary from config at startup and validates ops against it. Adding a new status (e.g., blocked_external) is a config change made by the human — no code PR needed. kind is also validated against config at runtime — adding a new kind is purely a config change.
Hash-based IDs with kind prefix and proquint encoding. IDs are <kind prefix>-<proquint> (e.g., INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit), where the proquint suffix is a proquint encoding of the first 128 bits of the SHA-256 hash of the canonicalized create op content — specifically the data dict, serialized with sorted keys. The hash input excludes at, by, clock, and nonce — so the same logical item created at different times by different actors produces the same ID.
Proquint encoding maps 16 bits to a 5-letter pronounceable syllable (CVCVC pattern: 16 consonants × 4 vowels × 16 consonants × 4 vowels × 16 consonants = 2^16). Eight syllables encode 128 bits ≈ 3.4 × 10^38 values; birthday collision probability is negligible even at planetary scale (8 billion users × 256 agents × 10 items/day × 100 years ≈ 7.5 × 10^17 items yields <0.001 expected collisions). IDs are long but rarely typed in full — prefix matching means INV-lusab suffices in practice. The proquint Python package is pure Python with no dependencies (~30 lines of encode/decode logic; can be vendored).
This gives natural deduplication: if two agents independently discover the same invariant with the same title, description, and fields, they get the same ID. Inspired by both git-bug's entity ID scheme (SHA-256 of the first operation) and beads' hash-based short IDs (UUID → truncated SHA-256).
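A sketch of the ID derivation. The section above specifies SHA-256 over the sorted-key `data` dict; the JSON canonicalization used here is an assumed serialization (the real store may canonicalize differently), while the proquint alphabet and bit layout follow the standard scheme:

```python
import hashlib
import json

CONS = "bdfghjklmnprstvz"  # 16 consonants -> 4 bits each
VOWS = "aiou"              # 4 vowels     -> 2 bits each

def proquint16(word: int) -> str:
    """Encode one 16-bit value as a pronounceable CVCVC syllable."""
    return (CONS[(word >> 12) & 0xF] + VOWS[(word >> 10) & 0x3]
            + CONS[(word >> 6) & 0xF] + VOWS[(word >> 4) & 0x3]
            + CONS[word & 0xF])

def item_id(prefix: str, data: dict) -> str:
    """<kind prefix>-<8 proquint syllables> from the first 128 bits of
    SHA-256 over the canonicalized create-op data (at/by/clock/nonce excluded)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).digest()[:16]  # 128 bits
    words = [int.from_bytes(digest[i:i + 2], "big") for i in range(0, 16, 2)]
    return prefix + "-" + "-".join(proquint16(w) for w in words)
```

Because the hash input excludes timestamps and actors, two agents creating the same logical item get the same ID, which is exactly the deduplication property described above.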
Same-branch existence check. When add() computes an ID, it checks whether a file with that ID already exists in the target tier. If the existing item has identical data, the item has already been created — add() refuses and reports the existing item: "INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit already exists (title: 'Call Attribution Completeness'). Use 'update' to modify it." This prevents silent overwrites when two agents on the same branch independently discover the same invariant. The agent learns the item exists and can update it instead (e.g., to add fields the first agent didn't fill in). If the existing item has different data (a hash collision), add() appends a salt to the hash input and recomputes — the item is created under a different ID transparently. Cross-branch duplicate creation (two agents on different branches creating the same-ID item before merging) is handled differently — see Compile Rules.
No sequential counter, no lockfile, no single-writer assumption.
Prefix matching and positional aliases. Full proquint IDs are pronounceable but long (INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit), so the CLI accepts any unambiguous prefix for convenience:
```sh
scripts/tracker update INV-lus --status done   # resolves if unique
scripts/tracker update lus --status done       # even without the kind prefix
scripts/tracker show INV-lusab                 # longer prefix if ambiguous
```

Ambiguous prefix → error listing matches:

```
error: INV-lus is ambiguous. Did you mean:
  INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit  Call Attribution Completeness
  INV-lusod-fikam-gobad-hilun-jomab-kifud-losip-murad  Symbol Resolution Consistency
```
The list and ready commands display the shortest unambiguous prefix per item (computed at render time, never stored). Additionally, output rows are numbered, and the CLI accepts positional aliases via :N syntax:
```sh
scripts/tracker ready
#    ID                                                   Status     Title
# 1  INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit  todo_hard  Call Attribution Completeness
# 2  WI-fodak-humit-kobap-linud-rasib-sufag-tohim-vukad   todo_soft  Add Dart framework patterns

scripts/tracker update :1 --status done   # "item #1 from last list output"
```

The last-displayed ID list is stashed in `$XDG_CACHE_HOME/hypergumbo-tracker/<repo-fingerprint>/` (see Storage Layout). The `:N` syntax is unambiguous (colon distinguishes it from an ID prefix). Positional aliases are ephemeral — never stored in op log files, just a CLI convenience.
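Prefix and positional-alias resolution can be sketched as follows (the function name and error behavior are illustrative, not the real CLI internals):

```python
def resolve(ref: str, ids: list[str], last_list: list[str]) -> str:
    """Resolve a CLI item reference: ':N' positional alias, a full ID,
    or any unambiguous prefix (with or without the kind prefix)."""
    if ref.startswith(":"):                 # alias into the last list/ready output
        return last_list[int(ref[1:]) - 1]
    matches = [i for i in ids
               if i.startswith(ref) or i.split("-", 1)[1].startswith(ref)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no item matches {ref!r}")
    raise KeyError(f"{ref!r} is ambiguous: {', '.join(sorted(matches))}")
```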
Advisory file locking (flock()) around appends. Even append-only files can get corrupted by concurrent writes from multiple processes (agent CLI + human TUI, or two agent tasks running in parallel). The store wraps the critical section — acquire lock, compute Lamport clock, serialize op, append, fsync, release lock — in an advisory file lock, then updates the cache outside the lock (the cache update is idempotent and reads the file, so it's safe to run unlocked). The Lamport clock computation is inside the lock to prevent two concurrent processes from reading the same max clock and producing duplicate clock values on the same branch:
import fcntl
import os
from pathlib import Path
from typing import Callable

def _append_op(filepath: Path, build_op: Callable[[Path], bytes], cache: Cache) -> None:
    with open(filepath, "a") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            # Clock computation inside the lock: read current max clock
            # from the file (and cross-branch peek), then build the op
            # with clock = max + 1. This prevents two concurrent processes
            # from computing the same clock value on the same branch.
            op_bytes = build_op(filepath)
            f.write(op_bytes.decode())
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    cache.upsert_from_file(filepath)  # outside lock — idempotent, reads file
Contention is expected to be rare (two processes appending to the same item at the same instant), but when it happens, correctness matters more than performance. The lock scope is per-file, so appends to different items never block each other. Note: flock() is advisory on Linux — a process that doesn't call it can still write to the file. This is fine because the store is the sole writer of op log files (see YAML Serialization Rules); the lock protects against concurrent tracker processes, not against arbitrary file writes.
Two-tier near-duplicate detection. Hash-based IDs catch verbatim duplicates but not semantic near-duplicates. Two agents discovering the same invariant but phrasing it differently ("Every calls edge has a non-null caller" vs. "All calls edges must have non-null callers") produce different hashes and different IDs. A two-tier similarity detection system addresses this:
Tier 1 — SimHash (fast, always runs on add). SimHash computes a locality-sensitive fingerprint over tokenized title + description + fields text. The algorithm is ~30 lines of pure Python (hash each token, accumulate bit-position votes, threshold), runs in microseconds, and requires no external dependencies. SimHash has a useful formal guarantee: for inputs with cosine similarity S, the probability of a k-bit fingerprint collision is (1 - arccos(S)/π)^k. At 64 bits, unrelated items (cosine similarity ~0) have a collision probability of ~10⁻¹⁹ — identical to a random 64-bit hash. The "cost" of locality sensitivity appears only in the moderate-similarity zone (cosine ~0.5: ~10⁻¹³ collision probability at 64 bits), which is precisely where you want detection.
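The algorithm described above fits in a few lines of pure Python. A minimal sketch (illustrative — the store's actual tokenizer and hash choice may differ):

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    """64-bit SimHash: hash each token, let each token vote on each
    bit position, keep the majority bit."""
    votes = [0] * bits
    for token in text.lower().split():
        # Stable 64-bit hash per token (blake2b with an 8-byte digest).
        h = int.from_bytes(
            hashlib.blake2b(token.encode(), digest_size=8).digest(), "big"
        )
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if votes[i] > 0)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Near-duplicate texts share most token hashes, so their fingerprints differ in few bit positions; unrelated texts land around 32 bits apart on average.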
On add, the store computes the new item's SimHash and compares it (by Hamming distance) against existing items' cached SimHash fingerprints. If the distance is below a configurable threshold, a warning is emitted. The store uses a threshold of 13 bits (~20% of 64-bit width) for add-time warnings; validation uses a tighter threshold of 8 bits for validate --similar. These empirical thresholds were tuned during implementation — the illustrative "≤3 bits" from early design proved too aggressive (high false-positive rate on real items). Example warning:
WARNING: INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit is similar to existing INV-fodak-humit-kobap-linud-rasib-sufag-tohim-vukad
(SimHash distance: 3 bits, title overlap: 82%)
Creating anyway. Run `scripts/tracker show INV-fodak-humit-kobap-linud-rasib-sufag-tohim-vukad` to compare.
To mark as duplicate: scripts/tracker update INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit --duplicate-of INV-fodak-humit-kobap-linud-rasib-sufag-tohim-vukad
To suppress: scripts/tracker update INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit --not-duplicate-of INV-fodak-humit-kobap-linud-rasib-sufag-tohim-vukad
The item is created regardless — no blocking. validate --similar resurfaces unflagged pairs on demand.
Tier 2 — Embedding-based LSH with semantic tags (lazy, on demand). When the embedding model is available (optional dependency), validate --deep-similar computes dense embeddings for items, derives semantic tags from the embedding space (e.g., "call-graph-integrity", "symbol-resolution", "edge-attribution"), and applies a second LSH over the tag vectors. This tier discriminates between items that share vocabulary but are about different things — the zone where SimHash produces false positives.
Model choice: nomic-ai/modernbert-embed-base (ONNX, model_q4f16.onnx, 140 MB). ModernBERT is a recent embedding model with strong retrieval performance. The q4f16 variant uses 4-bit weight quantization with fp16 activations — the smallest available variant (140 MB vs. 596 MB for fp32) with negligible quality loss for this coarse-grained task (distinguishing topics, not fine-grained ranking). CPU-friendly via ONNX Runtime, no GPU required, consistent with hypergumbo's local-first philosophy. The ONNX models are hosted at https://huggingface.co/nomic-ai/modernbert-embed-base/tree/main/onnx. Runtime dependencies: onnxruntime (CPU) + tokenizers (for the model's tokenizer). These are lighter than sentence-transformers + PyTorch and avoid pulling in a full deep learning framework.
The semantic tags provide an interpretability layer: when validate --deep-similar flags a pair, it explains why ("both tagged call-graph-integrity, edge-attribution") rather than just reporting a distance. The human or agent makes a faster triage decision.
This tier degrades gracefully: if onnxruntime isn't installed or the model hasn't been downloaded, validate --deep-similar emits a warning and falls back to SimHash-only results. No hard dependency.
Human correction loop. Two list fields on every item support dedup triage (see Duplicate Detection):
- `duplicate_of: [<ID>, ...]` — marks this item as a duplicate of one or more others. Items with non-empty `duplicate_of` are excluded from `ready` and `count-todos`.
- `not_duplicate_of: [<ID>, ...]` — records explicit "I've reviewed this pair, they're distinct" judgments. Suppresses future similarity warnings for those specific pairs.
Both are set via scripts/tracker update or from the TUI. The human can lock duplicate_of to prevent agent override. validate --similar skips pairs listed in not_duplicate_of.
Per-kind fields_schema (open schema pattern). The fields dict is an open-ended key-value store, but each kind can optionally declare a fields_schema in config.yaml that names the known fields, their types, and whether they're required. Three rules govern validation:
- Known fields are validated strictly. If the schema declares `progress_pct` as `type: integer, min: 0, max: 100`, then `validate` rejects `progress_pct: "half done"`. Required fields (e.g., `statement` for invariants) must be present in the `create` op's `data.fields`.
- Unknown fields produce a warning, not an error. An invariant with `fields.rout_cause` passes validation but emits: `WARNING: INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit has unknown field 'rout_cause' (did you mean 'root_cause'?)` — an edit-distance suggestion against the declared field names. The pre-commit hook shows the warning; the agent or human can fix it or ignore it.
- No `fields_schema` means anything goes. Work items have no structured fields — they use `title` and `description` only. Omitting `fields_schema` (or setting it to `{}`) disables field validation for that kind entirely.
Supported types are minimal: text (string), integer (with optional min/max), list (of strings), boolean. No nested objects, no foreign-key references. If a kind needs more complex validation, add it as a custom check in validation.py — don't extend the type system.
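The three rules can be sketched as follows (a minimal illustration, not the actual `validation.py` — the function name is hypothetical, and only the `integer` type check is shown; the edit-distance suggestion uses the standard library's `difflib.get_close_matches`):

```python
import difflib

def check_fields(fields: dict, schema: dict) -> list[str]:
    """Validate an item's fields dict against a kind's fields_schema.

    Returns messages: ERROR entries fail validation, WARNING entries
    are advisory (unknown fields never block).
    """
    msgs = []
    # Rule: required fields must be present.
    for name, spec in schema.items():
        if spec.get("required") and name not in fields:
            msgs.append(f"ERROR: required field '{name}' is missing")
    for name, value in fields.items():
        spec = schema.get(name)
        if spec is None:
            # Rule: unknown fields warn, with a did-you-mean suggestion.
            hint = difflib.get_close_matches(name, schema, n=1)
            did = f" (did you mean '{hint[0]}'?)" if hint else ""
            msgs.append(f"WARNING: unknown field '{name}'{did}")
        elif spec["type"] == "integer":
            # Rule: known fields are validated strictly (type + bounds).
            if (not isinstance(value, int)
                    or value < spec.get("min", value)
                    or value > spec.get("max", value)):
                msgs.append(f"ERROR: '{name}' must be an integer "
                            f"in [{spec.get('min')}, {spec.get('max')}]")
    return msgs
```

With no schema (an empty dict), every field falls through as "unknown" with no close match, which matches the anything-goes rule when warnings are suppressed for schemaless kinds.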
This gives the TUI concrete improvements: with a schema, the detail panel renders known fields in declared order with their description as tooltip/label, and unknown fields appear in a separate "Other" section below. The edit form presents known fields as named inputs with type-appropriate widgets (text area for text, spinner for integer with min/max, multi-line list editor for list). Without a schema, the TUI falls back to a generic key-value editor.
Any item can point to a parent via parent: <ID>, forming a tree:
INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit (parent: null) ← root invariant
├── INV-hamoj-libud-mifog-nakip-rosab-sudol-tifag-vukim (parent: INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit) ← child: a specific generalization
└── INV-kipod-nafug-posab-ridol-safim-tuhob-vikad-zulip (parent: INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit) ← child: another generalization
This replaces the old pending_generalizations embedded list — each generalization becomes a first-class item with its own ID, status, priority, and history.
The store provides children(id) and ancestors(id) traversal methods. The TUI toggles between tree-view (indented by hierarchy) and flat table-view (filterable, sortable).
The human can lock any field (or the discussion channel) on any item to prevent agent modification. Locks are enforced at write time: the store compiles current state, checks locked_fields, and refuses to append update or discuss ops from agents that touch locked fields. The error message is clear: "Field 'priority' on INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit is locked. Ask the human to unlock it."
Locks are toggled from the TUI (l key) or via CLI (scripts/tracker lock <ID> <field>). The agent can always read locked fields — it just can't write them.
Human-authority operations. The following ops require human authority (see Security Model) — the store refuses to append them when the resolved actor is "agent": lock, unlock, discuss_clear, stealth, unstealth. These ops override agent behavior or control visibility, so they must come from the human. promote, demote, and discuss_summarize remain available to both actors — they are workflow operations the agent legitimately needs.
Cross-branch lock enforcement. Lock enforcement uses the same scoped cross-branch peek as the Lamport clock (Key Design Decisions). Before appending an agent update op, the store peeks at the op log file on the scoped branch set (dev, main, HEAD, plus any unmerged branches) via git cat-file --batch, compiles each branch's version of the item, and unions their locked_fields sets. If the field being updated is locked on any branch in the scoped set, the write is rejected — even if the lock hasn't been merged into the current branch yet. This gives locks the same guarantee boundary as the Lamport clock: enforced against the active branch frontier, with validate warnings as a backstop for ops written on branches that weren't locally visible at write time (e.g., a remote branch not yet fetched). The cross-branch peek adds negligible overhead — it's the same git cat-file --batch calls the Lamport clock already makes, just extracting lock state from the compiled result.
Residual edge case. For truly concurrent ops on branches not yet fetched (neither side has visibility of the other), lock violations can survive a merge. validate detects these and emits a warning. The human corrects with a new op if needed. This is the same honest guarantee boundary documented for the Lamport clock — strongest possible without a central server.
Each item has an optional discussion, composed from discuss, discuss_clear, and discuss_summarize ops in the operation log. The discussion is async: the agent doesn't watch the tracker in a loop. It checks during stop hook reflection, between tasks, or when instructed. Discussions develop over hours or days as the agent works on other things. Many items can have active discussions simultaneously.
The clear-then-lock pattern gives the human a decisive override:
- `discuss <ID> --clear` — appends a `discuss_clear` op (compile ignores all prior discussion). Human-authority only (see Security Model).
- `discuss <ID> "Priority stays at P0. Non-negotiable."` — appends a `discuss` op (actor resolved from `os.getuid()`, no `--as` flag)
- `lock <ID> discussion` — appends a `lock` op so the agent can't respond
This doesn't remove the old discussion from the agent's context window (if it was already read), but it's an unambiguous signal in the persisted state: "I've decided, stop arguing."
Soft cap and summarization. The store emits a warning to stderr when a compiled discussion exceeds 20 entries. This is a soft cap — no hard limit is enforced, and humans can always override. To manage long-running discussions, use the --summarize flag:
scripts/tracker discuss <ID> --summarize "Summary text here"
This appends a `discuss_summarize` op. When compiled, all prior discussion entries are replaced with a single summary entry marked `is_summary: true`. The TUI shows a warning badge (e.g., `[20+ msgs]`) next to items with oversized discussions.
Discussion rate limit (runaway loop guard). The soft cap catches normal excess; a rate limit catches degenerate cases. The store tracks discussion volume per item per calendar day using a simple heuristic: len(message) / 4.4 as token estimate (no tokenizer dependency). A generous daily cap — 200,000 tokens per item — is hardcoded in the store. Legitimate multi-day threads never hit this; an agent appending in a tight loop hits it within minutes. When the limit is reached:
warning: Discussion rate limit reached on INV-lusab (200,000 tokens today).
Further discussion deferred until tomorrow, or run --summarize to reset.
This is a circuit breaker, not a policy tool. It catches the degenerate case without ever restricting a human who wants a 60-entry thread on a thorny invariant. The limit is hardcoded (not in config) because it's a blast radius control, not a governance knob — see Safety Model.
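The guard reduces to a per-item, per-day token budget. A minimal sketch of the mechanism (class and method names are hypothetical — the store's actual bookkeeping lives alongside the op log, not in memory):

```python
import datetime
from collections import defaultdict
from typing import Optional

DAILY_TOKEN_CAP = 200_000  # hardcoded blast-radius control, not config

class DiscussionRateLimiter:
    """Per-item, per-calendar-day token budget using the len/4.4
    chars-per-token heuristic (no tokenizer dependency)."""

    def __init__(self) -> None:
        self._used = defaultdict(float)  # (item_id, date) -> tokens used

    def try_append(self, item_id: str, message: str,
                   today: Optional[datetime.date] = None) -> bool:
        day = today or datetime.date.today()
        estimate = len(message) / 4.4
        if self._used[(item_id, day)] + estimate > DAILY_TOKEN_CAP:
            return False  # caller defers to tomorrow or runs --summarize
        self._used[(item_id, day)] += estimate
        return True
```

Keying on the calendar date means the budget resets naturally at midnight with no timer state; a `discuss_summarize` would additionally reset the counter.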
File readability. Because discussion ops live in the same op log file as data ops, items with long discussions produce large files that are noisy in git log -p and git diff. Five mitigations: (1) op log files are in a dotdir (.ops/) with dotfile names, so they don't appear in casual browsing; (2) linguist-generated in .gitattributes collapses these in PR diffs; (3) a textconv diff driver (textconv) declutters local git log -p and git diff by showing compiled item state instead of raw ops; (4) discuss_summarize replaces accumulated entries with a single summary, which should be used proactively when discussions exceed the 20-entry soft cap; (5) the tracker: commit prefix convention (Commit Convention) lets developers filter tracker changes from git log entirely. If this proves insufficient, a future revision could split discussions into separate files — but this adds complexity to the merge story and is deferred unless needed.
Items exist in one of three visibility tiers, determined by which directory the file physically lives in:
| Tier | Directory | In git? | Pushed to fork remote? | In upstream PRs? | Use case |
|---|---|---|---|---|---|
| Canonical | `.agent/tracker/.ops/` | yes | yes | yes | Repo's institutional memory |
| Workspace | `.agent/tracker-workspace/.ops/` | yes | yes | no | Agent's working memory, backed up |
| Stealth | `.agent/tracker-workspace/stealth/` | no | no | no | Truly private, local-only |
Canonical is the shared truth — committed, included in PRs, visible to everyone. This is where confirmed invariants, validated work items, and reviewed findings live.
Workspace is the agent's scratch space — committed to git and pushed to the fork's remote (for backup and continuity across machines), but excluded from upstream PRs by scripts/contribute. The agent writes here by default. Items are promoted to canonical explicitly when they're confirmed and worth sharing.
Stealth is fully private — gitignored, never leaves the machine. For draft items, sensitive priority overrides, or human-agent discussions the human doesn't want in any git history.
Promotion and demotion. Visibility moves between tiers via promote and demote commands. Each transition appends an op to the op log file (for audit trail) and physically moves the file between .ops/ directories:
- `scripts/tracker promote <ID>` — workspace → canonical (file moves from `tracker-workspace/.ops/` to `tracker/.ops/`)
- `scripts/tracker demote <ID>` — canonical → workspace (reverse)
- `scripts/tracker stealth <ID>` — workspace → stealth (file moves to `tracker-workspace/stealth/`)
- `scripts/tracker unstealth <ID>` — stealth → workspace (reverse)
- TUI: the `m` key opens a tier-move dialog on the selected item
The ID does not change on promotion/demotion — it's content-derived, not path-derived.
Tier is location, not compiled state. Tier movement ops (promote, demote, stealth, unstealth) are audit-only in compile() — they record that movement occurred but do not affect the compiled item's fields. The authoritative tier is determined by the item's physical directory location, not by replaying ops. This is a deliberate architectural choice: git-level visibility controls (.gitignore for stealth, path exclusion in scripts/contribute for workspace) enforce tier boundaries without requiring YAML parsing. If tier were a compiled field in a single shared directory, .gitignore could not selectively exclude stealth items, and scripts/contribute would need to parse every ops file to determine which items to exclude from upstream PRs — a fragile and error-prone approach. The directory-based model provides defense in depth: even if compile() has a bug, workspace items physically cannot leak to upstream because they live in a different directory that contribute excludes by path. The tradeoff is that shutil.move() is not append-only, so interrupted moves can produce cross-tier duplicates — handled by the Self-Healing Reconciliation layer.
Fork workflow. When a contributor forks and clones, they get canonical (it's committed) and an empty workspace. Their agent:
- Reads canonical items for context (upstream's priorities and institutional knowledge)
- Writes new items to workspace (committed to the fork, backed up to the fork's remote)
- Never modifies canonical directly (by convention — canonical is upstream's, not the fork's)
When the contributor runs scripts/contribute, workspace changes are automatically excluded from the PR. The PR contains only code changes. If the contributor discovers an invariant worth sharing with upstream:
- `scripts/tracker promote INV-lusab` — moves the item to canonical (prefix matching)
- Creates a separate `tracker:` PR with just the promoted item
- Upstream maintainer reviews it independently from the code PR
This separates "here's my code contribution" from "here's an invariant I discovered." The upstream maintainer can evaluate each on its own merits.
Stop hook scoping on forks. If count-todos aggregated all tiers, the fork agent would be permanently blocked by upstream's open canonical items (which it can read but not close). The fix: config.yaml has a stop_hook.scope field (see Config File). On forks, this is set to workspace so the stop hook only counts the fork agent's own work. The ready command still shows canonical items (the fork agent should be aware of upstream's priorities), but canonical items don't block the fork agent from stopping.
| Context | count-todos scope | ready scope |
|---|---|---|
| Upstream | canonical + workspace + stealth (all) | canonical + workspace + stealth |
| Fork | workspace + stealth (workspace) | canonical + workspace + stealth |
Stealth items are always counted regardless of scope — they're local to the machine and always relevant to the local agent's stopping decision.
Fork detection and scope configuration are performed by scripts/tracker fork-setup (human-only — see CLI), which checks for the presence of an upstream remote and sets stop_hook.scope: workspace in the workspace config.yaml (gitignored, human-owned — see Config File). scripts/contribute checks whether fork-setup has been run (by reading stop_hook.scope from config); if not, it prints a reminder and exits rather than proceeding with a misconfigured scope.
Workspace starts empty on forks. When a contributor forks and clones, workspace has no items. The fork's agent creates items as it works. It doesn't get copies of canonical items — that would create duplicates in the merged read view and diverge immediately. Canonical is read-only context, not a starting point to be cloned.
Cross-tier references. An item in workspace can reference a canonical item via parent or before — for example, a workspace work item tracking progress on fixing a canonical invariant. The TrackerSet merged read view resolves references across tiers transparently. Validation checks cross-tier refs against the full merged set.
scripts/contribute workspace exclusion. The contribute script already handles fork-specific PR creation. Adding workspace exclusion is ~15 lines:
# Before creating the PR branch, exclude workspace changes
WORKSPACE_CHANGES=$(git diff --name-only "$UPSTREAM_DEV"...HEAD -- '.agent/tracker-workspace/')
if [ -n "$WORKSPACE_CHANGES" ]; then
echo "Excluding $(echo "$WORKSPACE_CHANGES" | wc -l) workspace tracker files from PR"
# Create clean branch without workspace-only commits
fi
Leak mitigation. If a contributor uses raw `git push` instead of `contribute`, workspace items appear in the PR. Mitigations: (1) documented convention in AGENTS.md, (2) a pre-push hook warns when workspace items are in a push to upstream, (3) `linguist-generated` collapses tracker diffs in the PR view so the noise is at least hidden. The consequence of a leak is just noise, not data loss — the maintainer can ignore workspace items in the diff.
Cross-tier duplicates — the same item ID existing in multiple tier directories — can arise from interrupted tier moves (promote/demote appends the op but crashes before or after the file move) or merge artifacts (one branch promotes an item while another branch continues editing it in workspace). TrackerSet handles these automatically on the read path — the tracker's Python code self-heals without agent involvement (see Safety Model).
When tier-movement ops exist: TrackerSet concatenates all ops from both files, compiles normally (Lamport clock ordering resolves everything), and looks at the last tier-movement op (promote, demote, stealth, unstealth) in the compiled history. That op determines the intended tier. The store merges the ops into a single file in the correct tier directory, deletes the other copy, and appends a reconcile op recording what happened (from_tier, reason). This is deterministic — no judgment needed, no agent involvement.
When no tier-movement ops exist: The item was independently created in different tiers on different branches (same content hash, different directories). This is genuinely ambiguous — the store cannot determine the correct tier. TrackerSet flags the item with a derived cross_tier_conflict field in the compiled snapshot (like created_at and updated_at — computed on read, not stored). The CLI surfaces this prominently:
⚠ INV-lusab exists in both canonical and workspace with no tier-movement history.
Resolve: scripts/tracker promote INV-lusab (workspace → canonical)
or: scripts/tracker demote INV-lusab (canonical → workspace)
Until resolved, the item appears in list and show with a conflict indicator but is excluded from ready (the agent should not work on items in ambiguous state). However, count_todos intentionally includes cross-tier conflict items — they represent real data integrity issues that should not be silently ignored. The circuit breaker (5 identical stop attempts with no progress) prevents the agent from getting stuck indefinitely on genuinely unresolvable conflicts. validate warns on cross-tier duplicates.
Self-healing is append-only. The reconcile op is the audit trail — the store never silently deletes or rewrites ops. The reconciled file contains the full combined history from both tier copies.
Self-healing attempt cap. If the same item triggers reconciliation more than 3 times (tracked via reconcile op count in the compiled history), the store stops attempting automatic repair, flags the item with a persistent error in the compiled snapshot, and surfaces it to the human via validate and TUI. This prevents repair loops from degenerate merge scenarios.
Recovery from capped items. When an item hits the reconciliation cap, the human resolves it via scripts/tracker reconcile-reset <ID>, which: (1) presents the current state of all tier copies, (2) asks the human to choose the surviving tier, (3) merges all ops into a single file in the chosen tier, (4) deletes the other copies, and (5) appends a reconcile op with reason: "manual-reset" that resets the reconciliation counter. This is a human-authority operation (see Security Model).
The compile() function is a pure function: it takes a list of ops from an op log file, sorts them by Lamport clock, and folds them into a snapshot. This is where concurrent edits are resolved — not in a merge driver, but in the read path.
Sort order: Ops are sorted by (clock, timestamp, actor_rank) where human ranks higher than agent. The Lamport clock captures causal ordering via cross-branch peek (see Key Design Decisions): if agent B could see agent A's ops on any local branch before writing, B's clock is strictly higher. For truly concurrent ops (same clock value, from branches not locally visible to each other), timestamp breaks ties. For same-clock-same-timestamp ops, human wins. This ensures deterministic output regardless of the order ops appear in the file after a git merge.
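A sketch of the ordering rule (illustrative — op field names follow the op schema in YAML Serialization Rules; `ordered` is a hypothetical helper). Because ISO-8601 timestamps with a fixed format sort lexicographically, plain string comparison suffices for the tiebreak. "Human wins" here means the human's op sorts last, so it is applied last in the fold and takes effect under last-write-wins:

```python
ACTOR_RANK = {"agent": 0, "human": 1}  # human sorts last, so human wins LWW

def sort_key(op: dict) -> tuple:
    """(Lamport clock, wall-clock timestamp, actor rank)."""
    return (op["clock"], op["at"], ACTOR_RANK[op["by"]])

def ordered(ops: list[dict]) -> list[dict]:
    # Deterministic regardless of file order after a git merge=union.
    return sorted(ops, key=sort_key)
```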
Duplicate create ops (cross-branch merge of independently-created items). Content-hash IDs (Key Design Decisions) mean that two agents on different branches who independently create the same item produce the same ID and write to the same .ops dotfile. On the same branch, add() detects the existing file and refuses (see Key Design Decisions). But across branches, both files exist independently until merge. After merge=union, the merged file contains two create ops (with different nonces and clocks, since each agent generated its own). compile() handles this gracefully: it takes the create op with the lowest clock (the causally earliest creation) as the canonical creation event and ignores subsequent create ops with identical data. The created_at derived field uses the earliest create op's timestamp. All subsequent update, discuss, lock, etc. ops from both branches are folded normally — the item's compiled state reflects the combined work of both agents. validate emits an informational notice ("INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit has 2 create ops — likely independent creation on separate branches, merged cleanly") but does not error.
Per-field resolution during fold:
| Field / op type | Compile behavior |
|---|---|
| Scalar fields (`status`, `priority`, `title`, `parent`, `description`, `pr_ref`) | Last write wins |
| `tags` | Accumulated: `add` ops union into the set, `remove` ops subtract. `set` replaces wholesale (use sparingly — concurrent `set` ops lose one side's intent). |
| `before` | Accumulated: `add` ops union, `remove` ops subtract. `set` replaces wholesale. |
| `duplicate_of` | Accumulated: `add` ops union, `remove` ops subtract. `set` replaces wholesale. |
| `not_duplicate_of` | Accumulated: `add` ops union, `remove` ops subtract. `set` replaces wholesale. |
| `fields` (dict) | Per-key last write wins (merge, not replace — updating `fields.root_cause` does not clobber `fields.statement`) |
| `locked_fields` | Accumulated: `lock` ops add to the set, `unlock` ops remove |
| Discussion | Accumulated: `discuss` ops append, `discuss_clear` resets to empty, `discuss_summarize` replaces with a single summary |
| Tier movement (`promote`, `demote`, `stealth`, `unstealth`) | Audit-only: recorded in the op log for reconciliation and history, but does not affect compiled state. Tier is determined by directory location, not by op replay (see Three-Tier Visibility). |
| `reconcile` | Audit-only: records cross-tier duplicate resolution (see Self-Healing Reconciliation). Does not affect compiled state. |
Derived fields (computed by compile(), never stored):
- `created_at`: timestamp of the `create` op
- `updated_at`: timestamp of the last op in the file
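The per-field rules above reduce to a small fold. A minimal sketch for `update` ops only, assuming the `set`/`add`/`remove` op shapes described in this table (`create`, `discuss`, `lock`, and tier ops are omitted for brevity; this is not the tracker's actual `compile()`):

```python
SCALARS = {"status", "priority", "title", "parent", "description", "pr_ref"}
SETS = {"tags", "before", "duplicate_of", "not_duplicate_of"}

def fold(ops: list[dict]) -> dict:
    """Fold clock-sorted update ops into a snapshot."""
    item = {"fields": {}, **{k: set() for k in SETS}}
    for op in ops:
        for key, value in op.get("set", {}).items():
            if key in SCALARS:
                item[key] = value             # last write wins
            elif key in SETS:
                item[key] = set(value)        # wholesale replace
            elif key == "fields":
                item["fields"].update(value)  # per-key LWW: merge, not replace
        for key, value in op.get("add", {}).items():
            item[key] |= set(value)           # union into the set
        for key, value in op.get("remove", {}).items():
            item[key] -= set(value)           # subtract from the set
    return item
```

Because the ops are pre-sorted, a later `set` on `fields.root_cause` overwrites only that key, while `statement` from an earlier op survives the merge.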
Why merge=union and not git's default merge. When two branches both append multi-line ops to the end of the same file, git's default (ort) merge strategy produces a conflict — even when the appended ops are completely different. Simulation confirms this: default merge conflicts in 7 of 8 tested scenarios, including the trivial case of appending one update op on branch A and one discuss op on branch B. Conflicts block the autonomous agent workflow (the agent can't resolve <<<<<<< markers in YAML), so default merge is not viable for concurrent append-only files.
merge=union eliminates these conflicts by keeping lines from both sides. However, merge=union operates at the line level, not the YAML-block level. This creates two hazards:
1. Op fusion. When two branches append ops that share the same first line (e.g., both start with `- op: update`), the union driver deduplicates that shared line, fusing two ops into one malformed YAML mapping with duplicate keys. Simulation confirms this: without mitigation, `merge=union` garbles the result in 4 of 8 tested scenarios (every case where both branches append the same op type).
2. Line stripping. Even when first lines differ (e.g., different nonces on the `- op:` line), ops that share identical internal lines (e.g., `set:` / `status: done`) can have those shared lines deduplicated by the union driver's diff algorithm. This silently strips payload from earlier ops, producing valid YAML with correct op counts but empty op bodies — silent data loss. Empirical testing showed this affects 10/12 adversarial scenarios when only the first line carries a nonce, including a merge-only control (proving this is a `merge=union` property, not rebase-specific). See `~/hypergumbo_lab_notebook/adr-0013-prototyping-scripts/rebase_duplication_v2.py`.
Nonce-on-every-line: the fix. Every line of every op carries the nonce as an inline # <nonce> comment:
- op: update # d4e5
at: "2026-02-11T18:31:00Z" # d4e5
by: agent # d4e5
actor: jgstern_agent # d4e5
clock: 3 # d4e5
nonce: d4e5 # d4e5
set: # d4e5
priority: 0 # d4e5
- op: update # e6f7
at: "2026-02-11T19:30:00Z" # e6f7
by: agent # e6f7
actor: jgstern_agent # e6f7
clock: 6 # e6f7
nonce: e6f7 # e6f7
set: # e6f7
status: done # e6f7
Since each op has a unique nonce and every line carries it, every line in the file is globally unique. The union driver cannot match or deduplicate any line across ops — neither first lines (preventing fusion) nor internal lines (preventing stripping). The nonce also appears as a regular field (`nonce: d4e5`) for programmatic access — the comments are purely for merge correctness and are stripped by YAML parsers. ruamel.yaml's comment-preservation support handles nonce-on-every-line naturally on the write path.
Simulation results. With nonce-on-every-line, merge=union passes all tested scenarios — 9/9 adversarial cases including: identical ops (same set: block), cascade diamonds, 8-way concurrent ops, three-way merges with all-update ops, and post-rebase merges with pre-rebase lineages. Every scenario that failed with nonce-on-first-line (10/12 in the adversarial suite) passes cleanly with nonce-on-every-line. See ~/hypergumbo_lab_notebook/adr-0013-prototyping-scripts/rebase_nonce_every_line.py for the reproduction scripts. The format is more verbose than nonce-on-first-line, but buys complete merge-strategy independence: the tracker is safe under merge, rebase, squash-merge, or any combination.
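The core invariant — no two lines in an op log file are ever identical — can be illustrated with plain string manipulation (a sketch of the idea only; the real writer emits the comments through ruamel.yaml, not by post-processing text):

```python
def with_nonce_comments(op_yaml: str, nonce: str) -> str:
    """Append ' # <nonce>' to every line of a serialized op so no line
    can collide with a line from any other op under merge=union."""
    return "\n".join(
        f"{line} # {nonce}" for line in op_yaml.rstrip("\n").split("\n")
    ) + "\n"
```

Two ops with identical bodies but distinct nonces share zero lines, so git's line-level union merge has nothing to deduplicate.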
YAML's implicit type coercion (yes→true, 3.0→float, bare null) makes agent-written content error-prone. To prevent data corruption, all YAML I/O flows through the store using strict serialization conventions.
Libraries. The store uses two YAML libraries, each for a distinct purpose:
- `ruamel.yaml` (~0.18) for writes — round-trip-safe serialization with preserved quoting, comment retention, and canonical field ordering. All YAML output flows through ruamel.yaml.
- `PyYAML` with `CSafeLoader` for reads — wraps LibYAML (a C extension), roughly 10× faster than ruamel.yaml's pure-Python parser. Since the read path doesn't need to preserve quoting or comments (it just needs Python dicts), the C loader is safe here. A dedicated test (`test_yaml_roundtrip.py`) verifies that `CSafeLoader` and `ruamel.yaml` produce identical parsed output for all op types, including adversarial inputs.
This split means the hot path — compile(), list, ready, count-todos — benefits from C-speed parsing, while the write path retains ruamel.yaml's strict serialization guarantees. The cache layer (Read Cache) further reduces how often even the C loader is invoked.
Benchmark-confirmed performance gap. Testing with realistic op log files (180–12,000 ops) shows CSafeLoader is consistently 3–10× faster than both SafeLoader and ruamel.yaml's parser. At 3,000 ops (~750KB file), CSafeLoader parses in ~200ms vs. ~1,000ms for SafeLoader and ~1,170ms for ruamel.yaml. Overall, the dual-library split speeds up the hot path by roughly 5×.
Quoting rules:
- String fields that could be misinterpreted are always double-quoted: title, description, message (in discuss ops), all fields.* string values.
- op, status, kind, by are unquoted (controlled vocabulary, known-safe values).
- Multiline strings use block scalar (|) style.
- List-valued fields in update ops (add/remove dicts): always use YAML flow-style (e.g., tags: [ci_infrastructure, analysis_quality]). Flow-style keeps the entire list on one line, making it atomic under merge=union — git cannot interleave lines from different ops within a single-line value. Lists inside create ops' data.fields (e.g., regression_tests) may use block-style safely, since each create op has a unique nonce namespace and interleaving across ops cannot occur.
Canonical op field order. Each op is serialized with fields in this order:
op (with nonce comment), at, by, actor, clock, nonce, [op-specific fields: data/set/add/remove/message/lock/unlock]
Every line of every op carries an inline # <nonce> comment that duplicates the nonce field value. This is load-bearing for merge=union correctness (see Compile Rules): it makes every line globally unique, preventing the union driver from fusing same-type ops (first-line deduplication) or stripping shared internal lines (payload deduplication). The comments are invisible to YAML parsers but visible to git's line-level merge. ruamel.yaml's comment-preservation support handles nonce-on-every-line naturally on the write path.
Within the create op's data, fields are ordered:
kind, title, status, priority, parent, tags, before, duplicate_of, not_duplicate_of, pr_ref, description, fields
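Putting the canonical ordering, nonce-comment, and flow-style rules together, an update op might look like the following fragment. This is purely illustrative — the field values, status vocabulary, and nonce here are invented, not taken from a real op log:

```yaml
- op: update                      # f3a9
  at: "2026-02-13T10:15:00Z"      # f3a9
  by: agent                       # f3a9
  actor: agent                    # f3a9
  clock: 17                       # f3a9
  nonce: f3a9                     # f3a9
  set:                            # f3a9
    status: in_progress           # f3a9
  add:                            # f3a9
    tags: [ci_infrastructure]     # f3a9
```

Note the flow-style tags list (atomic under merge=union) and the `# f3a9` comment on every line, including continuation lines of nested mappings.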
Sole-writer invariant. The store is the sole writer of op log files. Agents and humans use the CLI/TUI — they never edit .ops files directly. Agents should never read op log files either — they use scripts/tracker show <ID> for compiled state (see Agent Context Protection).
Enforcement. The pre-commit hook (Pre-Commit Validation) validates all staged tracker files, catching malformed YAML regardless of how it was written. Additionally, validate checks for:
- Ops missing the nonce field (the CLI always generates one)
- Ops with lines missing the # <nonce> inline comment — nonce-on-every-line is load-bearing for merge=union correctness (Compile Rules)
- Nonce comments not matching the nonce field value on any line (detects copy-paste errors)
- Non-canonical field ordering (the CLI always serializes in canonical order)
These pre-commit hook checks make it difficult for invalid YAML to get committed.
Round-trip invariant. load(dump(load(file))) == load(file) — enforced by a dedicated test (test_yaml_roundtrip.py) with adversarial inputs ("yes", "null", "3.0", "*bold*", strings with colons, leading whitespace, emoji, etc.).
Trailing newline. The store always writes a trailing newline. This ensures that when two branches both append ops, git's merge sees clean line boundaries and can concatenate without garbling.
Op log files grow monotonically (append-only). An item with 50 updates and 30 discussion entries has 80+ ops in a single .ops file. For the current scale (<500 items, <100 ops per item), this is fine — compile() is linear in op count and fast.
If file sizes become problematic, the compaction strategy is:
- Snapshot op: A new compact op type containing the full compiled state at a point in time. When present, compile() starts from the snapshot and only replays ops with higher clocks.
- Op pruning: A scripts/tracker compact <ID> command that (a) compiles current state, (b) rewrites the file as a single compact op followed by only the post-snapshot ops. This is a destructive rewrite (not append-only), so it requires human confirmation and produces a normal git diff (not a merge-safe append).
- Discussion summarization already exists (discuss_summarize) and addresses the most likely source of file bloat.
This is explicitly deferred — the current design handles the expected scale. The compaction mechanism is documented here so the extension point is clear when needed.
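The deferred snapshot replay could look roughly like this sketch. The op and field shapes are assumptions based on this section, not implemented code, and non-data op types (discuss, lock, etc.) are elided:

```python
def compile_item(ops: list[dict]) -> dict:
    """Replay ops into current state, starting from the latest
    'compact' snapshot if one is present (sketch; op shapes assumed)."""
    state, start_clock = {}, -1
    # Pass 1: find the highest-clock snapshot, if any.
    for op in ops:
        if op["op"] == "compact" and op["clock"] > start_clock:
            state, start_clock = dict(op["data"]), op["clock"]
    # Pass 2: replay only ops newer than the snapshot.
    for op in ops:
        if op["clock"] <= start_clock:
            continue  # already folded into the snapshot
        if op["op"] == "create":
            state.update(op["data"])
        elif op["op"] == "update":
            state.update(op.get("set", {}))
        # (discuss/lock/unlock/etc. elided in this sketch)
    return state
```

The key property is that pruning (rewriting the file as one compact op plus post-snapshot ops) leaves compile() output unchanged.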
The read path (list, ready, count-todos) must load and compile all items. Even with PyYAML's CSafeLoader (YAML Serialization Rules), parsing 500 op log files with 50+ ops each on every invocation adds latency that compounds when agents call ready frequently. An out-of-tree SQLite cache (in $XDG_CACHE_HOME) eliminates this cost for all read operations. The cache also means agents never need to read .ops files directly — scripts/tracker show/list/ready all query the cache (see Agent Context Protection).
Location: $XDG_CACHE_HOME/hypergumbo-tracker/<repo-fingerprint>/ (see Storage Layout). One cache database per tier (canonical.cache.db, workspace.cache.db, stealth.cache.db). The repo-fingerprint key (hash of remote URL + first commit SHA) allows multiple checkouts of the same repo to share a cache. Created automatically on first read; deleted and rebuilt by scripts/tracker cache-rebuild.
Why XDG, not in-repo. Two OS users sharing a checkout (see Security Model) cannot share a single SQLite database safely — every write changes file ownership, causing permission errors for the other user. XDG gives each user their own cache directory (/home/jgstern/.cache/... vs. /home/jgstern_agent/.cache/...). This is consistent with hypergumbo's existing cache strategy (~/.cache/hypergumbo/). The TRACKER_CACHE_DIR environment variable overrides the default — useful when the cache directory is on a network filesystem where SQLite locking is unreliable (e.g., NFS).
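A minimal sketch of the fingerprint computation, assuming SHA-256 over the two inputs and a truncated hex key (the real derivation may differ):

```python
import hashlib

def repo_fingerprint(remote_url: str, first_commit_sha: str) -> str:
    """Stable key for the cache directory: hash of remote URL plus
    first commit SHA, so multiple checkouts of the same repo share
    a cache (sketch; exact hash and truncation are assumptions)."""
    digest = hashlib.sha256(
        f"{remote_url}\n{first_commit_sha}".encode()
    ).hexdigest()
    return digest[:16]  # short, filesystem-friendly directory name

# In practice the inputs would come from:
#   git remote get-url origin
#   git rev-list --max-parents=0 HEAD
```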
Schema:
CREATE TABLE items (
id TEXT PRIMARY KEY, -- e.g., "INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit"
kind TEXT NOT NULL,
title TEXT NOT NULL,
status TEXT NOT NULL,
priority INTEGER NOT NULL,
parent TEXT,
tags TEXT, -- JSON array
before_ids TEXT, -- JSON array of IDs
duplicate_of TEXT, -- JSON array of IDs
not_duplicate_of TEXT, -- JSON array of IDs
pr_ref TEXT,
description TEXT,
fields TEXT, -- JSON dict
locked_fields TEXT, -- JSON array
discussion TEXT, -- JSON array of {by, message, is_summary}
simhash INTEGER, -- 64-bit SimHash fingerprint (for fast similarity queries)
tier TEXT NOT NULL, -- "canonical", "workspace", or "stealth"
created_at TEXT NOT NULL, -- ISO 8601
updated_at TEXT NOT NULL, -- ISO 8601
source_mtime REAL NOT NULL, -- file mtime at last cache update
source_size INTEGER NOT NULL -- file size in bytes at last cache update
);
CREATE INDEX idx_status ON items(status);
CREATE INDEX idx_kind ON items(kind);
CREATE INDEX idx_priority ON items(priority);

Cache invalidation: incremental via byte-offset tracking. The naive approach — re-parse the entire file whenever mtime changes — is expensive when discussion is heavy: benchmarking shows that re-parsing 150 stale items with 1,000 ops each costs ~7.7 seconds, even though the data fields (status, priority, tags, etc.) haven't changed. The count-todos and ready queries don't need discussion content at all, so this is entirely wasted work.
The fix exploits the append-only invariant: old bytes are never modified, new ops are always appended at the end. The cache stores both source_mtime and source_size per item. On potential invalidation:
1. stat() the file — get current mtime and size.
2. If mtime unchanged → cache hit. Return cached data. (The common case.)
3. If mtime changed and current size ≥ stored source_size → likely append-only:
   a. Seek to source_size, read only the new bytes.
   b. Parse the new bytes as a YAML fragment (they are complete op list items because the store always writes trailing newlines — see YAML Serialization Rules).
   c. If all new ops are discuss, discuss_clear, or discuss_summarize → discussion-only change: update only the discussion, updated_at, source_mtime, and source_size columns in the cache. Skip data re-compile entirely.
   d. Otherwise (any new op is not a discussion op — update, lock, unlock, promote, demote, create, stealth, unstealth, reconcile) → data change: full re-parse and re-compile.
4. If mtime changed and current size < stored source_size → file was rewritten (compaction, manual edit, or unusual merge): full re-parse and re-compile.
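The stat-and-seek core of this algorithm is stdlib-only and can be sketched as follows. The function name and return shape are invented for illustration, and the YAML-fragment parse of the tail bytes is elided:

```python
import os

def read_new_ops_bytes(path: str, cached_size: int, cached_mtime: float):
    """Append-only invalidation sketch.
    Returns (status, tail) where status is 'hit', 'append', or 'rewrite'."""
    st = os.stat(path)
    if st.st_mtime == cached_mtime:
        return "hit", b""            # common case: nothing changed
    if st.st_size >= cached_size:
        with open(path, "rb") as f:
            f.seek(cached_size)      # old bytes never change (append-only)
            return "append", f.read()  # only the newly appended ops
    return "rewrite", b""            # file shrank: full re-parse needed
```

In the real design, the returned tail bytes would then be parsed as a YAML fragment and classified as discussion-only vs. data-changing ops.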
This works reliably after merge=union merges: when two branches both append ops, merge=union preserves the original bytes verbatim and appends both sides' new lines at the end. The first source_size bytes are unchanged, so seeking to source_size produces exactly the new ops from both branches.
Benchmark impact: With incremental invalidation, the cost of a discussion-only cache miss drops from ~50ms per item (full re-parse of a 1,000-op file) to <1ms (seek + parse one small YAML op). For the 150-stale-item scenario above, this reduces invalidation cost from ~7.7 seconds to ~50ms — a ~150× improvement. This eliminates the strongest performance argument for separating discussion into separate files, while preserving the structural simplicity of one file per item.
Write-through updates: When the store appends an op (via add, update, discuss, etc.), it re-compiles that single item and upserts the cache row (including updated source_mtime and source_size) immediately. This means the cache is always current after a local write — no deferred rebuild needed. Other processes on the same machine sharing the same .cache.db benefit from write-through done by any process: once process A appends a discuss op and updates the cache, process B's next read sees a fresh source_mtime and gets a cache hit. Redundant re-parsing only occurs during the narrow window when multiple processes simultaneously detect a stale mtime before any finishes updating the cache — this is idempotent and harmless (all processes compute the same result).
Cold start: On first run (or after cache-rebuild), the store parses all YAML files, compiles each, and populates the cache. Cold-start time scales with total op volume, not just item count: 200 items × 50 ops each (~1.6 MB total YAML) takes ~340ms; 500 items × 500 ops each (~39 MB) takes ~11 seconds; 500 items × 3,000 ops each (~234 MB) takes ~53 seconds. Proactive use of discuss_summarize and compaction (Op Log Compaction) directly reduces cold-start time by keeping per-item op counts manageable. Subsequent reads after cold start are sub-millisecond (cache hit path).
After git pull or git merge: File mtimes change for any items modified on the incoming branch. The next read detects stale mtimes and applies incremental invalidation — for items where only discussion ops were appended (the common case when multiple agents are actively discussing), only the new bytes are parsed. Items with data changes get a full re-compile. This is the common case: a pull brings 5–10 changed items; the store incrementally processes 5–10 files, not all 500.
Robustness. The cache is strictly derived data — deleting .cache.db and re-running any command rebuilds it from the YAML source of truth. The cache is never consulted during writes (writes always go to YAML). If the cache is corrupt or out of date, the worst case is a one-time cold rebuild, not data loss.
Why SQLite. It's in Python's standard library (sqlite3), requires no additional dependency, supports indexed queries for filtered listing (SELECT ... WHERE status IN (...blocking_statuses...) AND ... ORDER BY priority, created_at), and handles concurrent reads safely (WAL mode). The .cache.db file lives outside the repo tree (XDG cache) and is disposable — it never enters the merge story.
Why not /tmp. The performance bottleneck is YAML parsing, not SQLite I/O. Cache hits already take ~0.02ms (a SQL query on indexed columns). Moving the database from XDG cache to a tmpfs-backed /tmp/ would save microseconds on an operation that takes microseconds — not meaningful. XDG cache survives reboots (unlike /tmp on many distros), so cold starts only happen on first-ever run or after explicit cache-rebuild.
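As a sketch of the query path the cache serves, the following uses a trimmed version of the schema above with an in-memory database. The status vocabulary and data are illustrative — the real blocking statuses live in config.yaml:

```python
import sqlite3

# In-memory stand-in for a .cache.db with a trimmed schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE items (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    priority INTEGER NOT NULL,
    created_at TEXT NOT NULL)""")
conn.execute("CREATE INDEX idx_status ON items(status)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    ("INV-aaa", "todo_hard", 1, "2026-02-01"),
    ("INV-bbb", "done",      0, "2026-02-02"),
    ("INV-ccc", "todo_soft", 0, "2026-02-03"),
])

# 'ready'-style query: blocking items only, ordered for triage.
blocking = ("todo_hard", "todo_soft")
placeholders = ",".join("?" * len(blocking))
ready = conn.execute(
    f"SELECT id FROM items WHERE status IN ({placeholders}) "
    "ORDER BY priority, created_at",
    blocking,
).fetchall()
```

With the indexes in place, this is the ~0.02ms cache-hit path mentioned above: a single indexed SELECT, no YAML parsing at all.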
Items carry two list fields for dedup triage:
- duplicate_of: [<ID>, ...] — Marks this item as a duplicate of one or more others. Set automatically by agent confirmation of a similarity warning, or manually by the human. Items with non-empty duplicate_of are excluded from ready and count-todos but remain in the store for audit. Multiple IDs allow marking an item as a dupe of several others (e.g., three agents independently discovered the same invariant — pick one survivor, mark the other two).
- not_duplicate_of: [<ID>, ...] — Records explicit "I've reviewed this pair, they're distinct" judgments. Suppresses future similarity warnings between those specific pairs. Accumulated over time as the human or agent triages flagged pairs. validate --similar skips pairs where either item lists the other in not_duplicate_of.
Both fields are modified via update ops (last-write-wins, replaced wholesale — same as tags and before). The human can lock duplicate_of on an item to prevent agent override of a triage decision.
Detection flow:
- On add: SimHash (tier 1) compares the new item against all cached fingerprints. If Hamming distance is below threshold, a warning is emitted with actionable commands. The item is created regardless — no blocking.
- On validate --similar: SimHash comparison across all items, skipping not_duplicate_of pairs. Reports unflagged near-duplicate pairs.
- On validate --deep-similar: Additionally runs embedding-based semantic tag LSH (tier 2) if onnxruntime is available. Discriminates between items that share vocabulary but are about different things. Falls back to SimHash-only if embeddings are unavailable.
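A tier-1 SimHash check can be sketched in a few lines of stdlib Python. The token hashing scheme and the distance threshold here are illustrative assumptions, not the tracker's actual implementation:

```python
import hashlib

def simhash64(text: str) -> int:
    """64-bit SimHash over whitespace tokens (sketch)."""
    weights = [0] * 64
    for token in text.lower().split():
        h = int.from_bytes(
            hashlib.blake2b(token.encode(), digest_size=8).digest(), "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    # Majority vote per bit position yields the fingerprint.
    return sum(1 << bit for bit in range(64) if weights[bit] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def looks_similar(a: str, b: str, threshold: int = 10) -> bool:
    # Threshold is illustrative; tuning would come from real triage data.
    return hamming(simhash64(a), simhash64(b)) <= threshold
```

Because fingerprints are 64-bit ints, they fit the cache's simhash INTEGER column, and an on-add check is a linear scan of cached fingerprints with cheap XOR/popcount comparisons.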
TUI integration (MVP): Items with non-empty duplicate_of are displayed dimmed or struck-through in the table/tree view. Post-MVP: TUI groups items sharing a duplicate_of target, with a "merge" button that consolidates op logs, picks the surviving item's title/fields, and closes the others. Merge semantics are designed when real usage informs the needs.
Op log files contain the full operation history for each item — every create, update, discuss, lock, and unlock op ever applied. An agent reading the raw op log wastes context window on historical intermediate states and may act on stale data (e.g., seeing status: todo_hard in the create op rather than the final compiled status: done). The compiled current state is what agents need; the op log is an implementation detail.
Three layers of defense prevent agents from reading op logs directly:
- Dotdir (.ops/) — Agents are trained to skip dotdirs by convention. The op log directory is hidden from ls, file explorers, and casual glob patterns.
- Dotfile (.INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit.ops) — Each op log file is itself a dotfile, doubly hidden. Even if an agent navigates into .ops/, the files don't appear in standard directory listings.
- Explicit instruction in AGENTS.md — The following rules are added:
  - "Always use scripts/tracker show <ID> or scripts/tracker show <ID> --json to read tracker item state."
  - "Always refuse to read files ending in .ops. These are internal operation logs that will pollute your context window with historical data you don't need."
The .ops file extension (rather than .yaml) is also a signal — agents scanning for readable config or data files by extension will not match .ops files.
Why not just use the CLI? Agents with file access (Claude Code, Cursor, Copilot) default to reading files directly — it's faster than subprocess calls and is what they're trained to do. The dotdir/dotfile defense works with that behavior: agents will read files at obvious paths but skip hidden ones. If an agent is looking for tracker data, the most obvious path (.agent/tracker/) contains only config.yaml and hidden directories — nothing tempting to read.
Comparison with alternatives:
- git-bug stores data as git objects in refs/bugs/* — not files at all, so agents can't read them. This is the strongest protection but requires custom git plumbing and a Go toolchain.
- beads stores data in .beads/issues.jsonl — a visible file containing current state (not history), but as a single massive JSON line per issue that wastes context window differently.
- The tracker plan's approach is a pragmatic middle ground: the data is in files (for git portability and merge=union), but hidden behind conventions that agents respect.
OS-level tripwire (inotifywait). As a fourth layer, scripts/install-hooks can optionally start an inotifywait watcher on the .ops/ directories:
inotifywait -m -r -e access,open .agent/tracker/.ops/ .agent/tracker-workspace/.ops/ \
  --format '%T %w%f %e' --timefmt '%H:%M:%S' >> ~/.ops-access.log &

The tracker CLI's access pattern is a predictable burst (many files opened in rapid succession during compile()). Any isolated .ops file read without a temporally correlated CLI invocation indicates a direct agent read. This is a monitoring/awareness mechanism — it does not block access, just logs it for the human to review. inotifywait requires no root privileges (it uses Linux's inotify kernel API) and is available in the inotify-tools package on all major distributions.
What already went wrong. The agent changed work item statuses in the markdown files using formatting that didn't match the stop hook's grep regex. Items silently fell out of governance view. The agent wasn't being adversarial — it was being helpful in a way that happened to be invisible to the tooling. This is the actual incident that motivated ADR-0013.
The primary security win of the tracker is schema enforcement. The agent cannot write a status that the governance tooling can't see, because status is validated against a controlled vocabulary on every write, and the pre-commit hook validates on every commit. There is no format to accidentally drift from.
What hasn't gone wrong. The agent has never attempted to forge human authority, bypass the stop hook, or deliberately evade governance. The agent's "motivation" comes from its system prompt. If you can't trust the system prompt to say "don't forge human ops," you can't trust it to say "don't rm -rf .agent/tracker" either.
Layered recovery (the real backstop). The deployment has multiple layers of recovery external to the tracker:
- VM snapshots — any mistake is reversible by rolling back the VM.
- Limited forge permissions — the agent has a Codeberg contributor account, not an owner account. It can create PRs but cannot force-push main, delete the repo, or change repository settings.
- Read-only mirror — the Codeberg repo is mirrored to GitHub, which the agent has no access to.
The tracker doesn't need to be a cryptographic fortress. It operates within a system where catastrophic mistakes are recoverable. Its job is to be rigid enough that the agent can't accidentally undermine governance — not to prevent a determined adversary with file-level write access.
The deployment has two OS users:
| Account | UID | sudo | Role |
|---|---|---|---|
| jgstern | 1001 | yes | Human |
| jgstern_agent | 1002 | no | Agent |
Both are routinely logged in simultaneously. The agent runs Claude Code under jgstern_agent. The human SSHes in as jgstern.
import fnmatch
import os
import pwd

def _matches_agent_patterns(username: str) -> bool:
    # Patterns come from config.yaml (actor_resolution.agent_usernames);
    # the default ["*_agent"] is shown inline here for illustration.
    return any(fnmatch.fnmatch(username, pat) for pat in ["*_agent"])

def _resolve_actor() -> str:
    username = pwd.getpwuid(os.getuid()).pw_name
    if _matches_agent_patterns(username):
        return "agent"
    return "human"

No env vars to sniff or unset. No keys to generate or lose. No signatures to verify. os.getuid() returns 1002 no matter what environment variables are set, what CLI flags are passed, or what AGENTS.md says. The agent cannot become UID 1001 without su/sudo, which it doesn't have.
The CLI resolves the actor internally; there is no --as flag. Human-authority operations (see Field Locking) check the resolved actor and refuse to execute as "agent":
error: 'lock' requires human authority (current user: jgstern_agent).
Run this command as jgstern.
The human runs the TUI from their own account (jgstern). The TUI is the "human voice" because only UID 1001 can pass the actor check.
Configurable agent username patterns. The username.endswith("_agent") heuristic works for the current setup. For broader adoption, agent username patterns are configurable in config.yaml (see Config File): actor_resolution.agent_usernames: ["*_agent"]. Override per-project.
The repo lives in /home/jgstern_agent/hypergumbo/. A shared group gives both users write access to tracker directories:
# One-time setup (as jgstern, who has sudo):
sudo groupadd hypergumbo-tracker
sudo usermod -aG hypergumbo-tracker jgstern
sudo usermod -aG hypergumbo-tracker jgstern_agent
# Set tracker dirs to shared group with setgid:
chgrp -R hypergumbo-tracker .agent/tracker .agent/tracker-workspace
chmod -R g+rwX .agent/tracker .agent/tracker-workspace
chmod g+s .agent/tracker/.ops
chmod g+s .agent/tracker-workspace/.ops
chmod g+s .agent/tracker-workspace/stealth

The setgid bit ensures new .ops files created by either user inherit the group. Both accounts can append ops. The flock() around appends (see Key Design Decisions) handles concurrent writes.
Contributors and standalone adopters likely run single-user. For them:
- Actor resolution returns "human" (no _agent suffix).
- The agent (if they run one) gets the same identity — there's no OS boundary on a single-user machine.
- Social controls (AGENTS.md rules, CLI conventions) are the enforcement layer, same as the existing design for lock/discuss ops.
- Config file permissions don't provide additional protection (same user owns everything).
This is an honest degradation, not a bug. Fork contributors don't need the same governance rigor as the upstream maintainer's autonomous agent.
For adopters who want stronger guarantees, creating a second OS user account is straightforward. The README should provide explicit steps for setting up a two-user deployment on a VM or container, and explain why it's a good idea. Running in a VM with snapshots (or at the very least a container) is also recommended — it's the real backstop for catastrophic mistakes, and it's cheap.
See Config File for the template-based config design. The key security property: config.yaml is gitignored and owned by the human user with mode 644. The agent can read it (needs to, for validation) but cannot write it. The OS enforces this — UID 1002 cannot write a file owned by UID 1001 with mode 644. No CLI checks, no social contracts, no crypto. The agent cannot redefine governance rules (kinds, statuses, blocking semantics) without privilege escalation.
Layer 1: Tracker code self-heals deterministic issues. The tracker's Python code handles structural inconsistencies automatically — no agent involvement. Cross-tier duplicates with clear tier-movement ops are reconciled on read (see Self-Healing Reconciliation). The agent never sees the inconsistency — it was fixed before the CLI returned output. Self-healing is always append-only (a new reconcile op, never rewriting or deleting existing ops) so the audit trail shows what happened.
Layer 2: Agent uses CLI to resolve ambiguities on unlocked items. The agent calls scripts/tracker update or scripts/tracker promote as a black-box tool — same as it uses git or pytest. It's making work-item decisions, not debugging the tracker. The CLI enforces locks, actor authority, schema validation, and rate limits. The agent can't do anything the CLI doesn't allow.
Layer 3: Unanticipated errors surface to the human, with hardcoded blast radius limits. For errors the tracker can't self-heal and the agent can't resolve through normal CLI usage, the system surfaces them through validate output, TUI error indicators, and derived conflict fields in compiled snapshots. The channel is impossible for the agent to dismiss or silence.
Hardcoded blast radius limits (in the store, not in config):
- Never delete or rewrite ops. Self-healing appends, always.
- Refuse to write to items that can't compile. If an op log is corrupted beyond what self-healing can fix, the store refuses to append rather than making it worse. The item is frozen until a human looks at it.
- Cap self-healing attempts. If the same item triggers reconciliation more than 3 times, stop trying and escalate (see Self-Healing Reconciliation). Prevents repair loops.
- Discussion rate limit. Token-based daily cap per item (see Discussion Threads). Prevents runaway agent loops from bloating op logs.
- Fail-closed stop hook. If scripts/tracker count-todos exits non-zero (crash, corrupt cache, missing config), the stop hook treats this as blocking — the agent cannot stop. This eliminates the class of failure that motivated ADR-0013: governance tooling that silently gives wrong answers. A stopped agent is a loud signal that gets investigated; a churning agent with silently-broken governance is invisible.
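The fail-closed decision logic can be sketched as a Python analogue of the bash hook. The command invocation and output format are assumptions for illustration:

```python
import subprocess
import sys

def stop_decision(count_todos_cmd: list[str]) -> str:
    """Fail-closed sketch: only a clean, parseable zero count allows stopping."""
    try:
        result = subprocess.run(count_todos_cmd, capture_output=True,
                                text=True, timeout=30)
    except Exception:
        return "block"            # crash or timeout: fail closed
    if result.returncode != 0:
        return "block"            # non-zero exit: fail closed
    try:
        count = int(result.stdout.strip())
    except ValueError:
        return "block"            # garbage output: fail closed
    return "block" if count > 0 else "allow"
```

Every failure mode maps to "block": the only path to "allow" is a successful run that prints a parseable count of zero.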
The security guarantee, stated honestly: On a two-user deployment, human authority over ops is enforced by the OS (getuid). Human authority over config is enforced by file permissions. The agent cannot forge human-authority ops or redefine governance rules without privilege escalation. On a single-user deployment, both are enforced by convention (AGENTS.md rules, CLI design). The tracker does not claim cryptographic non-repudiation. The real backstop for all deployments is external: VM snapshots, limited forge permissions, and a read-only mirror.
| Proposal | Why not |
|---|---|
| Per-op cryptographic signatures | The OS user boundary already provides a non-spoofable actor identity. Crypto adds key management, performance cost, and contributor onboarding complexity for zero additional security in the actual deployment. |
| Hash-chain ops for tamper evidence | Append-only files with merge=union + nonce-on-every-line already make rewriting history a visible git operation (git log -p shows it). Hash chains add O(n) verification cost on every read and create fragile "chain broken" failure modes after legitimate git operations (rebase, cherry-pick). |
| Signed policy config (detached .sig or policy-as-ops) | Superseded by the gitignored-config-with-OS-permissions model, which provides stronger protection (OS-enforced, not crypto-enforced) with zero complexity. |
| bootstrap-secure script | The two-user setup is a shared-group chmod + chown away, folded into scripts/tracker init. No separate bootstrap needed. |
| Policy rollback prevention via policy_hash in every op | Config changes are rare. Adding a policy hash to every op's payload creates coupling between config edits and op validity. Not worth the complexity — and moot with OS-permission-protected config. |
| Hard discussion entry cap | A hard cap that blocks further discussion is unnecessarily restrictive. A generous rate limit (tokens/day) catches runaway loops without ever blocking legitimate long threads. |
| CLI _resolve_actor() check on config edits | Code theater. The agent can write files directly; a CLI check only works if the agent voluntarily uses the CLI. OS file permissions are the real enforcement. |
The tracker is a standalone Python package: packages/hypergumbo-tracker/. It lives alongside the other hypergumbo packages in the monorepo but has no dependency on hypergumbo-core — it doesn't need analyzers, IR, or tree-sitter. It's a standalone tool that happens to live in the same repo.
This keeps CI fast: tracker tests run in their own isolated job, not bloating the already-large core test suite.
packages/hypergumbo-tracker/
├── pyproject.toml # deps, console_scripts entry points, MPL-2.0 license
├── LICENSE # MPL-2.0 full text
├── README.md
├── src/
│ └── hypergumbo_tracker/
│ ├── __init__.py # Public API exports
│ ├── models.py # Dataclasses + op types + config loading + actor resolution
│ ├── store.py # YAML read/write, ID generation, compile(), list/filter, tree traversal
│ ├── trackerset.py # Multi-tier read: merges canonical + workspace + stealth stores
│ ├── cache.py # SQLite read cache: schema, invalidation, write-through, rebuild
│ ├── validation.py # Schema validation, enum enforcement, parent refs, cycle detection
│ ├── migration.py # One-time markdown → YAML converter
│ ├── cli.py # CLI (console_scripts: hypergumbo-tracker, hypergumbo-tracker-textconv)
│ ├── stop_hook.py # count_todos, hash_todos, generate_guidance (scope-aware)
│ ├── embeddings.py # Tier 2 embedding-based near-duplicate detection (ONNX/ModernBERT)
│ └── tui.py # Textual TUI application
└── tests/
├── conftest.py # Shared fixtures and test configuration
├── test_models.py
├── test_store.py # CRUD, compile(), concurrent-append scenarios
├── test_trackerset.py # Multi-tier merged reads, cross-tier refs, promote/demote
├── test_cache.py # SQLite cache: invalidation, write-through, cold start, corruption recovery
├── test_validation.py
├── test_migration.py
├── test_yaml_roundtrip.py # Adversarial YAML serialization tests
├── test_compile_properties.py # Property-based tests (hypothesis) for compile()
├── test_cli.py
├── test_stop_hook.py
├── test_embeddings.py # ONNX embedding and semantic duplicate detection tests
├── test_fork_workflow.py # Fork-based contributor workflow tests
├── test_git_integration.py # Git integration: Lamport clock, branch tracking
├── test_tui.py
└── test_tui_snapshots.py # Textual snapshot tests for visual regression
The package is a dependency of the hypergumbo umbrella meta-package — pip install hypergumbo pulls it in alongside core and the lang packages. But it has no dependency on hypergumbo-core (no analyzers, IR, or tree-sitter), so it can also be installed standalone by projects that want the tracker without hypergumbo's analysis tooling:
pip install hypergumbo-tracker # standalone CLI + TUI
pip install hypergumbo           # gets tracker + everything else

Required:
- ruamel.yaml (~0.18) — Round-trip-safe YAML write with preserved quoting (write path only — see YAML Serialization Rules)
- PyYAML (~6.0, with C extension) — Fast YAML read via CSafeLoader (read path only — see YAML Serialization Rules)
- proquint (~0.2) — Proquint encoding/decoding for hash-based IDs (pure Python, no deps; ~30 lines — could be vendored if preferred)
- rich (~14.3.2) — CLI table formatting
Required:
- textual (~7.5) — TUI framework (the TUI is a core feature; making it optional adds complexity for negligible footprint savings)
Optional ([dedup] extra):
- onnxruntime (~1.17) — ONNX model inference for validate --deep-similar (tier 2 dedup). CPU-only, no GPU or PyTorch required.
- tokenizers (~0.21) — Fast tokenizer for nomic-ai/modernbert-embed-base. HuggingFace's Rust-backed tokenizer library.
- The ONNX model file (model_q4f16.onnx, 140 MB) is downloaded on first use and cached locally. Falls back gracefully if unavailable.
Dev:
- pytest, pytest-cov, pytest-xdist — testing (same versions as other packages)
- pytest-asyncio — async test support for Textual's App.run_test()/Pilot (configure asyncio_mode=auto in pyproject.toml)
- pytest-textual-snapshot — official Textual snapshot plugin for SVG-based visual regression testing
- hypothesis — property-based testing for compile() invariants (see Verification)
The tracker package is licensed under MPL-2.0, while the rest of hypergumbo is AGPL-3.0-or-later. This dual-license structure enables standalone adoption: projects that want structured agent governance can pip install hypergumbo-tracker without AGPL obligations on their own code. MPL-2.0's copyleft is file-level (modifications to tracker source files must be shared), not program-level (the tracker can be embedded in proprietary projects without infecting them). AGPL's copyleft protects hypergumbo's core analysis tooling from unreciprocated SaaS use.
SPDX headers. Every source file in the repo carries an SPDX license identifier:
# SPDX-License-Identifier: MPL-2.0 # in packages/hypergumbo-tracker/
# SPDX-License-Identifier: AGPL-3.0-or-later   # everywhere else
# SPDX-License-Identifier: AGPL-3.0-or-later   # shell scripts outside the tracker package

The FSFE's REUSE tool validates compliance in CI (reuse lint). Contributors see the applicable license at the top of the file they're editing — no need to reason about directory boundaries. The DCO sign-off (git commit -s) is license-agnostic (it defers to "the open source license indicated in the file"), so the existing sign-off process requires no changes.
Integration glue. This ADR modifies files outside packages/hypergumbo-tracker/ (stop hook, pre-commit, CI workflows, AGENTS.md, scripts/tracker wrapper). Those files remain AGPL-3.0-or-later — they are hypergumbo-specific integration that is not useful standalone. MPL-2.0 is AGPL-3.0-compatible (MPL Section 3.3), so the AGPL host can depend on the MPL tracker without license conflict.
Entry points as the license boundary for executables. The tracker declares two console_scripts entry points in its pyproject.toml:
- `hypergumbo-tracker` — Main CLI (all subcommands)
- `hypergumbo-tracker-textconv` — Git textconv driver for `.ops` files (see textconv)
Both are installed to $PATH by pip install hypergumbo-tracker and are MPL-2.0 as part of the tracker package. The repo's scripts/tracker is a thin AGPL-3.0 wrapper that delegates to the installed hypergumbo-tracker command — it exists for consistency with other repo scripts (scripts/auto-pr, scripts/contribute, etc.), not because the tracker needs it. Standalone users interact exclusively with the MPL entry points.
Within the hypergumbo repo, `scripts/tracker` is a thin wrapper that delegates to the installed `hypergumbo-tracker` command (or falls back to `python -m hypergumbo_tracker.cli`), maintaining consistency with other repo scripts. All subcommands except `tui` produce plain text (or `--json` for machine consumption). All `<ID>` arguments accept proquint prefix matching (e.g., `INV-lus` or just `lus`) and positional aliases (`:N` referring to the Nth item from the last `list`/`ready` output).
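The prefix-matching behavior described above can be sketched as a scan over known IDs. This is a minimal illustration, not the tracker's actual API — the `resolve` function and its error strings are hypothetical:

```python
def resolve(prefix: str, ids: list[str]) -> str:
    """Resolve a (possibly kind-prefix-less) ID prefix to a unique full ID."""
    def matches(full: str) -> bool:
        # Match either the full ID ("INV-lus...") or the part after the
        # kind prefix ("lus..."), so bare prefixes work too.
        bare = full.split("-", 1)[1] if "-" in full else full
        return full.startswith(prefix) or bare.startswith(prefix)

    hits = [i for i in ids if matches(i)]
    if not hits:
        raise KeyError(f"no item matches {prefix!r}")
    if len(hits) > 1:
        raise KeyError(f"ambiguous prefix {prefix!r}: {', '.join(sorted(hits))}")
    return hits[0]
```

An ambiguous prefix fails loudly with the candidate list, which is what makes truncated IDs safe to paste from TUI output into the CLI.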
| Subcommand | Purpose | Primary Consumer |
|---|---|---|
| `init` | Create `.agent/tracker/` and `.agent/tracker-workspace/` dirs (with `.ops/` dotdirs), copy `config.yaml.template` → `config.yaml` (human-owned, mode 644), set up `.gitignore` entries (`config.yaml`, `stealth/`). See Config File, Security Model. | human |
| `count-todos [--hard\|--soft]` | Print integer count of blocking items (respects `stop_hook.scope` config, uses `blocking_statuses` from config). Exit 0 on success; exit 1 on error (stop hook treats non-zero as blocking — see Safety Model). | `stop_logic.sh` |
| `hash-todos` | Print SHA-256 of the circuit-breaker input. For each item with status in `blocking_statuses` (respecting `stop_hook.scope`), concatenate `id + "\t" + status + "\t" + title + "\n"`, sorted by ID; hash the resulting UTF-8 bytes with SHA-256. Discussion and fields are excluded — only identity and blocking status affect the hash, so the circuit breaker fires when the agent is making no governance-relevant progress, not when discussions or field details change. | `stop_logic.sh` |
| `validate [FILE...] [--similar] [--deep-similar] [--strict]` | Validate op log files. Exit codes: 0 = valid (warnings emitted to stderr), 1 = validation errors found, 2 = tracker internal failure (corrupt state, missing config, unreadable files). With file paths, validates only those files (but still checks cross-file constraints like duplicate IDs and dangling parent refs against the full set); no args = validate all. Warns on cross-tier duplicates and when `config.yaml` has kinds/statuses not in `config.yaml.template` (CI fallback gap). `--similar`: surface near-duplicate pairs via SimHash (skips pairs in `not_duplicate_of`). `--deep-similar`: additionally uses embedding-based semantic tags for discrimination (requires onnxruntime + tokenizers; falls back to SimHash-only if unavailable). `--strict`: promote warnings to errors (exit 1). Pre-commit hook and stop hook treat exit ≥ 1 as blocking. | pre-commit hook, CI |
| `add --kind <kind> --title "..." [--tier canonical\|workspace\|stealth]` | Create new item (appends create op; default tier: workspace). Computes SimHash on creation and warns if similar items exist (see Key Design Decisions). Accepts prefix or positional alias (`:N`) for `--duplicate-of`/`--not-duplicate-of` flags. | agent |
| `update <ID> --status\|--priority\|...` | Update fields (appends update op; respects `locked_fields` and actor authority). Scalar fields use `--status`, `--priority`, etc. Set-valued fields use `--add-tag`, `--remove-tag`, `--add-before`, `--remove-before`, `--add-duplicate-of`, `--remove-duplicate-of`, `--add-not-duplicate-of`, `--remove-not-duplicate-of` (mapped to add/remove dicts in the op). | agent |
| `discuss <ID> "msg"` | Append discuss op (actor resolved from `os.getuid()` — no `--as` flag; see Security Model) | both |
| `discuss <ID> --clear` | Append discuss_clear op (human-authority only) | human |
| `discuss <ID> --summarize "summary"` | Append discuss_summarize op | both |
| `lock <ID> <field> [<field>...]` | Append lock op (human-authority only) | human |
| `unlock <ID> <field> [<field>...]` | Append unlock op (human-authority only) | human |
| `promote <ID>` | Append promote op + move file workspace → canonical | both |
| `demote <ID>` | Append demote op + move file canonical → workspace | both |
| `stealth <ID>` | Move file workspace → stealth (human-authority only) | human |
| `unstealth <ID>` | Move file stealth → workspace (human-authority only) | human |
| `show <ID>` | Print compiled current state (formatted; includes cross-tier conflict indicator if applicable) | agent |
| `list [--status X] [--kind Y] [--tag Z] [--tier T]` | Filtered list (compact table, sorted by priority/before/created_at; shows tier indicator and conflict markers) | agent |
| `ready [--limit N]` | List actionable, unblocked items from all tiers (respects `before` soft-blocking; excludes items with cross-tier conflicts). Scope only affects `count-todos`, not `ready` — see Three-Tier Visibility. | agent |
| `log <ID>` | Print raw operation log | both |
| `migrate` | Convert existing markdown → YAML (one-time, into canonical) | human/agent |
| `guidance` | Generate guidance markdown for stop hook (scope-aware) | `stop_logic.sh` |
| `fork-setup` | Detect fork (upstream remote), set `stop_hook.scope: workspace` in config. Writes to `config.yaml` (human-owned), so must be run by the human user; if run by the agent, prints the required config change and exits with a message asking the human to run it. | human |
| `reconcile-reset <ID>` | Resolve a capped cross-tier duplicate: present tier copies, ask human to choose surviving tier, merge ops, delete other copy, reset reconciliation counter. Human-authority only. See Self-Healing Reconciliation. | human |
| `cache-rebuild` | Delete and rebuild cache from YAML source of truth | human/agent |
| `textconv <FILE>` | Emit compiled one-line-per-field text representation of an op log file (used by git's textconv diff driver — see textconv) | git diff |
| `tui` | Launch Textual TUI (human-authority context — `os.getuid()` resolves as human) | human |
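The `hash-todos` input specification is small enough to sketch directly. The item dicts and the blocking-status set below are illustrative, not the tracker's internal representation:

```python
import hashlib

def hash_todos(items: list[dict], blocking_statuses: set[str]) -> str:
    """SHA-256 over 'id\\tstatus\\ttitle\\n' lines for blocking items, sorted by ID."""
    blocking = sorted(
        (i for i in items if i["status"] in blocking_statuses),
        key=lambda i: i["id"],
    )
    payload = "".join(f'{i["id"]}\t{i["status"]}\t{i["title"]}\n' for i in blocking)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because discussion and field details never enter the payload, editing them leaves the hash unchanged — only creating, resolving, or retitling a blocking item moves the circuit breaker.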
Real-world terminals span 40×16 (phone over SSH) to 225×55 (full desktop). A single fixed layout tested only at 80×24 would be unusable on small screens and wasteful on large ones. The TUI uses three responsive layout tiers that adapt to the available terminal size, with a hard minimum of 40×16. Textual lacks CSS media queries, so responsive behavior is programmatic via Resize events + CSS class toggling — the idiomatic Textual pattern.
Hard minimum: 40×16. Below this, the TUI hides all interactive content and displays a centered static message — "Terminal too small (need 40×16, got W×H)" — until the terminal is resized above the minimum.
| Tier | Condition | Rationale |
|---|---|---|
| Compact | cols < 60 OR rows < 20 | Either dimension too small for two-pane |
| Wide | cols > 120 AND rows > 38 | Extra space for enhanced detail |
| Standard | (everything else) | Two-pane layout fits comfortably |
Evaluation order: compact first (any dimension too small), then wide (both dimensions large), then standard (the default). This ensures no terminal size falls through.
The OR/AND logic handles odd aspect ratios correctly:
- (100, 18): compact — height is the binding constraint
- (45, 40): compact — width too narrow for two panes
- (80, 24): standard — the typical terminal
- (130, 38): standard — borderline, not enough vertical for wide
- (225, 55): wide — full desktop
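The tier decision reduces to a few comparisons. A sketch of the evaluation order described above (the function name and string labels are illustrative):

```python
def layout_tier(cols: int, rows: int) -> str:
    """Compute the responsive layout tier from terminal dimensions."""
    if cols < 40 or rows < 16:
        return "too-small"          # hard minimum: show static message only
    if cols < 60 or rows < 20:
        return "compact"            # any dimension too small for two-pane
    if cols > 120 and rows > 38:
        return "wide"               # both dimensions large
    return "standard"               # the default: two-pane fits
```

Checking compact first and wide second means every (cols, rows) pair lands in exactly one tier.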
Chrome: 1-row header (app name + scope indicator), 1-row footer (top-3 keys: q/f/Enter). ~4 rows total chrome.
- List view (default): full-width DataTable. Columns: `#` (3 chars), tier indicator (1 char), priority (2 chars), truncated ID (adaptive), title (remaining width). Status column hidden below 55 cols.
- Detail view (`Enter`): replaces list, full-screen scrollable. Shows title, status, priority, tier, full ID, tags, parent, description, fields (schema-aware), discussion (5 most recent, scrollable). `Esc` returns to list.
- No tree toggle — insufficient width for indentation to be useful.
Chrome: 1-row header (filter chips for kind/status/tag/tier; search bar at ≥80 cols), 1-row footer (up to 6 keybindings). ~4 rows total chrome.
- Left panel (40–50% width, min 30 cols): DataTable or TreeView (`t` toggle). Columns: `#`, tier `[C]`/`[W]`/`[S]`, priority, ID (2–3 syllable pairs), status, title.
- Right panel (remaining width): detail view with Rich markup. For kinds with a `fields_schema`, known fields are rendered in declared order with their `description` as a tooltip/label; unknown fields appear in a separate "Other" section below. For kinds without a schema, fields are rendered as a generic key-value list. Lock icons on locked fields. Discussion entries (most recent, scrollable). Discussion badge `[20+ msgs]`.
- Vertical divider: 1 col.
Inherits standard structure with enhancements:
- Extra list columns: `created_at` (date), `updated_at` (date), conflict indicator.
- Longer ID truncation: 3–4 syllable pairs.
- Enhanced right panel: Secondary activity panel for discussion entries alongside detail (both visible simultaneously).
- Full keybindings in footer.
- Filter chips show active values inline.
Full proquint IDs are 48–53 chars. Truncation by available column width:
| Column width | Display | Example |
|---|---|---|
| ≤ 10 | prefix + 1 syllable pair | INV-bolil |
| 11–20 | prefix + 2 pairs | INV-bolil-mirid |
| 21–32 | prefix + 3–4 pairs | INV-bolil-mirid-pakim |
| > 32 | full or shortest unambiguous | full ID |
Uses the same shortest-unambiguous-prefix logic as the CLI — truncated IDs displayed in the TUI are directly usable as CLI arguments.
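The width-to-pairs rule in the table can be sketched as follows. This is illustrative: the 21–32 band is shown with a fixed 3 pairs, and the shortest-unambiguous fallback for wide columns is elided:

```python
def truncate_id(full_id: str, width: int) -> str:
    """Truncate a proquint ID (PREFIX-pair-pair-...) to fit a column width."""
    prefix, *pairs = full_id.split("-")
    if width > 32:
        return full_id  # full ID (or shortest unambiguous prefix, not shown)
    n = 1 if width <= 10 else 2 if width <= 20 else 3
    return "-".join([prefix, *pairs[:n]])
```

Any truncation produced this way is a valid prefix, so it feeds straight back into the CLI's prefix resolver.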
- `TrackerApp` maintains a reactive `layout_tier` attribute. `on_resize` computes the new tier from current dimensions; if the tier changed, it calls `_apply_layout_tier()`.
- Layout switching via CSS class toggling: `remove_class("compact", "standard", "wide")` then `add_class(new_tier)`. Three CSS rulesets control visibility and sizing per tier.
- Below 40×16: all content hidden, "too small" label shown.
- State preservation: selected item ID preserved (not row index), filter state preserved, unsaved edit form state preserved. Scroll positions reset on tier change.
| Key | Action | Compact list | Compact detail | Standard | Wide |
|---|---|---|---|---|---|
| `q` | Quit | ✓ | ✓ | ✓ | ✓ |
| `Enter` | Open detail / select (via Textual `on_data_table_row_selected` event, not explicit `BINDINGS` — avoids misleading footer entry in standard/wide modes where detail is always visible) | ✓ | — | ✓ | ✓ |
| `Esc` | Back to list | — | ✓ | — | — |
| `t` | Tree/table toggle | — | — | ✓ | ✓ |
| `f` | Filter panel | ✓ | ✓ | ✓ | ✓ |
| `e` | Edit item | ✓ | ✓ | ✓ | ✓ |
| `n` | New item | ✓ | ✓ | ✓ | ✓ |
| `p` | Set parent | — | ✓ | ✓ | ✓ |
| `b` | Set before | — | ✓ | ✓ | ✓ |
| `l` | Lock toggle | — | ✓ | ✓ | ✓ |
| `m` | Tier move | ✓ | ✓ | ✓ | ✓ |
| `d` | Discussion | ✓ | ✓ | ✓ | ✓ |
| `D` | Clear discussion | — | ✓ | ✓ | ✓ |
In compact mode, p/b/l/D require visual context only available in the detail view — disabled in list view, enabled in detail view.
All edits append ops to the YAML file (immediate persistence). When editing fields on a kind with a fields_schema, the TUI presents known fields as named inputs with type-appropriate widgets (text area for text, spinner for integer with min/max constraints, multi-line list editor for list). Unknown fields are editable via a generic key-value row. Items with unresolved cross-tier conflicts (see Self-Healing Reconciliation) show a conflict indicator with resolution options.
The TUI app accepts a dependency-injected `TrackerSet` (wrapping canonical and workspace `Store` instances) so tests can point at `tmp_path` fixtures without touching `.agent/tracker/` on the real filesystem. This also enables safe pytest-xdist parallelism. All test fixtures use deterministic `at` timestamps in ops (no `datetime.now()`) to avoid flaky snapshots — the compile path already derives `created_at`/`updated_at` from op timestamps, so freezing time at the op level is sufficient.
Multi-size test matrix (replaces single-size 80×24):
| Test size | Tier | Purpose |
|---|---|---|
| (30, 10) | too-small | "Terminal too small" message displayed |
| (40, 16) | compact | Minimum supported; list renders, basic nav |
| (50, 18) | compact | Phone-typical; ID truncation verified |
| (80, 24) | standard | Primary flow test size |
| (120, 34) | standard | Upper-standard; columns scale |
| (160, 45) | wide | Enhanced columns appear |
Dynamic resize tests:
- (80, 24) → (40, 16): standard→compact, selected item preserved
- (80, 24) → (160, 45): standard→wide, extra columns appear
- (40, 16) detail view → (80, 24): compact detail → standard right panel
- Any size → (30, 10): "too small" shown; resize back → app resumes
stop_logic.sh gains a conditional that tries the tracker CLI first and falls back to the existing grep patterns:
```bash
if [[ -x "$REPO_ROOT/scripts/tracker" && -d "$REPO_ROOT/.agent/tracker" ]]; then
    # count-todos respects stop_hook.scope from config.yaml:
    #   - "all" (default, upstream): counts canonical + workspace + stealth
    #   - "workspace" (forks): counts workspace + stealth only
    # Fail-closed: if the tracker errors, treat as blocking.
    # A stopped agent is a loud signal; silently-broken governance is invisible.
    TOTAL_HARD=$(scripts/tracker count-todos --hard) || {
        echo "tracker: count-todos --hard failed (exit $?). Treating as blocking." >&2
        TOTAL_HARD=999
    }
    TOTAL_SOFT=$(scripts/tracker count-todos --soft) || {
        echo "tracker: count-todos --soft failed (exit $?). Treating as blocking." >&2
        TOTAL_SOFT=999
    }
    TOTAL_TODOS=$((TOTAL_HARD + TOTAL_SOFT))
    CURRENT_HASH=$(scripts/tracker hash-todos 2>/dev/null) || {
        echo "WARNING: hash-todos failed, using fallback hash" >&2
        CURRENT_HASH="fallback-$$"
    }
    # ... existing hash file / circuit breaker logic unchanged ...
else
    # Legacy grep patterns (existing code, no changes)
    HARD_TODO_COUNT=$(grep -c '^\s*- \*\*TODO!\*\*' "$LEDGER_FILE" 2>/dev/null) || HARD_TODO_COUNT=0
    # ... etc ...
fi
```

Task selection vs. stopping. `count-todos` answers "can I stop?" (total open work, scoped by `stop_hook.scope`). The separate `scripts/tracker ready` command answers "what should I work on next?" — it returns items from all tiers that are actionable and unblocked by `before` links (see Key Design Decisions), so the agent is always aware of canonical items even on forks. The stop hook uses `count-todos`; the agent's task-selection logic (documented in AGENTS.md) uses `ready`.
The grep fallback was removed in PR 7 (commit 77e4dc2). stop_logic.sh now uses the tracker CLI exclusively (fail-closed: if the tracker CLI is present but fails, the hook blocks). The markdown files are read-only archives (kept for git history, no longer consumed by anything). Phase 1 (dual-mode with grep fallback) was a transitional step and is no longer relevant.
Added to .githooks/pre-commit, inserted before Ruff (fail fast — tracker validation takes ~100ms). Only staged .ops files are validated per-file; cross-file constraints (duplicate IDs, dangling parents) still check the full set but only load ID and parent fields, not full compilation:
```bash
# Run tracker validation (fast - only staged files)
echo -n "  Tracker (schema)... "
if [ -d ".agent/tracker" ] && command -v scripts/tracker &> /dev/null; then
    STAGED_TRACKER=$(git diff --cached --name-only -- '.agent/tracker/.ops/' '.agent/tracker-workspace/.ops/' '.agent/tracker-workspace/stealth/' || true)
    if [ -n "$STAGED_TRACKER" ]; then
        # Intentionally unquoted: one argument per staged file.
        if scripts/tracker validate $STAGED_TRACKER 2>/dev/null; then
            echo -e "${GREEN}✓${NC}"
        else
            echo -e "${RED}✗${NC}"
            echo ""
            echo -e "${RED}Tracker validation failed. Fix YAML issues before committing.${NC}"
            exit 1
        fi
    else
        echo -e "${YELLOW}skipped (no tracker files staged)${NC}"
    fi
else
    echo -e "${YELLOW}skipped (no tracker)${NC}"
fi
```

This matches the existing pre-commit style in .githooks/pre-commit (color-coded pass/skip/fail, echo-then-check pattern). In CI, `scripts/tracker validate` (no args) validates all files.
Validation catches:
- Malformed YAML (not a valid list of ops)
- First op is not `create` or is missing required `data` fields
- Unknown `kind` (not in config.yaml)
- Invalid `status` (not in config.yaml statuses list)
- Unknown `op` type
- Missing required op fields (`op`, `at`, `by`, `clock`, `nonce`)
- Priority not an integer in range 0–4
- Timestamps not valid ISO 8601 UTC
- Duplicate IDs (across all tiers: canonical, workspace, stealth)
- Dangling parent references (parent ID doesn't exist)
- ID prefix doesn't match kind's configured prefix
- Cycles in `before` links
- Required `fields` keys missing (per kind's `fields_schema`, if defined)
- `fields` values failing type/range checks (e.g., `progress_pct: "half done"` when schema says `type: integer`)
- Cross-tier duplicates (same ID exists in multiple tier directories — see Self-Healing Reconciliation)
- Warning (non-blocking): `config.yaml` contains kinds or statuses not present in `config.yaml.template` (local-only additions that would fail in CI)
- Warning (non-blocking): `config.yaml.template` has kinds or statuses that `config.yaml` doesn't have (stale local config — re-run `init`)
- Warning (non-blocking): unknown `fields` keys with edit-distance suggestion (e.g., `'rout_cause' — did you mean 'root_cause'?`), only for kinds with a `fields_schema`
- Warning (non-blocking): agent `update` ops touching fields that were locked at the time (by timestamp)
scripts/tracker migrate performs one-time conversion of the existing markdown files:
- Write default `config.yaml.template` with 3 kinds (invariant, meta_invariant, work_item)
- Parse `.agent/invariant-ledger.md` — regex on `## INV-NNN:` headers and `- **Status:**` fields
- Parse `~/hypergumbo_lab_notebook/guidance_log/work_items.md` — regex on category headers and `- **STATUS**` items
- Map to unified status:
  - "FIXED" / "✅ FIXED" → `done`
  - "⬛ WON'T DO" → `wont_do`
  - "UNFIXED" → `todo_hard`
  - "PARTIALLY ADDRESSED" → `in_progress`
  - `**TODO!**` → `todo_hard`
  - `**TODO**` → `todo_soft`
  - `**DONE**` → `done`
  - `**DEFERRED**` → `deferred`
- Assign priorities: old P1 → 1, P2 → 2, P3 → 3; invariants with todo_hard → 0, todo_soft → 1, done/deferred → 4
- Generate hash-based proquint IDs by hashing each `create` op's canonicalized `data` dict (SHA-256, first 128 bits, proquint-encoded), with kind-appropriate prefixes
- Convert `pending_generalizations` embedded lists into child items with `parent: <parent-ID>`
- Map work item categories to tags (e.g., "Developer Experience" → tag `developer_experience`)
- Write each item as an op log file in `.agent/tracker/.ops/` (canonical tier — migrated items are upstream's institutional memory), using dotfile naming (`.INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit.ops`)
- Create empty `.agent/tracker-workspace/` with `config.yaml.template` (including `.ops/` and `stealth/` dirs)
- Validate all written files
- Print summary: N items migrated (by kind), N parent-child links created
Migration is idempotent: re-running produces the same IDs (same content → same hash) and the same YAML output.
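The ID scheme behind that idempotency can be sketched end-to-end. One assumption is labeled here: the ADR says "canonicalized `data` dict" without specifying the canonicalization, so sorted compact JSON stands in for it. The proquint encoding itself follows the standard CVCVC layout (16 consonants, 4 vowels, one syllable per 16-bit word):

```python
import hashlib
import json

CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"

def quint(word: int) -> str:
    """Encode one 16-bit word as a 5-letter proquint syllable (CVCVC)."""
    return (CONSONANTS[(word >> 12) & 0xF] + VOWELS[(word >> 10) & 0x3]
            + CONSONANTS[(word >> 6) & 0xF] + VOWELS[(word >> 4) & 0x3]
            + CONSONANTS[word & 0xF])

def item_id(prefix: str, data: dict) -> str:
    """Hash a create op's data dict to a stable proquint ID."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()[:16]  # first 128 bits
    words = (int.from_bytes(digest[i:i + 2], "big") for i in range(0, 16, 2))
    return prefix + "-" + "-".join(quint(w) for w in words)
```

Same content hashes to the same ID regardless of dict key order, which is exactly what lets `migrate` be re-run safely.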
| File | Change |
|---|---|
| `packages/hypergumbo-tracker/` (NEW) | Entire new package: pyproject.toml (with console_scripts entry points), LICENSE (MPL-2.0), 11 src modules (all with `SPDX-License-Identifier: MPL-2.0` headers), 17 test modules |
| `packages/hypergumbo-tracker/LICENSE` (NEW) | MPL-2.0 full text |
| `scripts/tracker` (NEW) | Thin AGPL-3.0 bash wrapper delegating to installed `hypergumbo-tracker` entry point (falls back to `python -m hypergumbo_tracker.cli`) |
| `scripts/check-package-coverage` | Add tracker to PACKAGES map for per-package CI isolation |
| `scripts/dev-install` | Add `pip install -e packages/hypergumbo-tracker[dev]` |
| `.agent/hooks/_shared/stop_logic.sh` | Add tracker-first path with grep fallback (scope-aware via config) |
| `scripts/auto-pr` | Delete local and remote feature branch after successful merge (branch hygiene — see Key Design Decisions) |
| `scripts/contribute` | Add workspace exclusion (~15 lines): exclude `.agent/tracker-workspace/` from upstream PRs |
| `.agent/tracker/` (NEW) | Canonical tier: `.ops/` dotdir with op log files from migration + `config.yaml.template` (tracked) + `config.yaml` (gitignored, human-owned) |
| `.agent/tracker-workspace/` (NEW) | Workspace tier: empty `.ops/`, `stealth/` dirs + `config.yaml.template` (tracked) + `config.yaml` (gitignored) |
| `scripts/tracker-textconv` (NEW) | AGPL-3.0 bash shim for git textconv diff driver — delegates to `hypergumbo-tracker-textconv` entry point, falls back to `python -m hypergumbo_tracker.cli`, then to `cat "$1"` with warning (see textconv) |
| `.gitattributes` (NEW) | `linguist-generated` + `merge=union` + `diff=tracker` for both canonical and workspace `.ops/.*.ops` files (see .gitattributes, textconv) |
| `.gitignore` | Add `.agent/tracker/config.yaml`, `.agent/tracker-workspace/config.yaml`, `.agent/tracker-workspace/stealth/` |
| `AGENTS.md` | Update grep pattern instructions → `scripts/tracker` equivalents; add `tracker:` commit prefix convention and batching guidance (see Commit Convention); add task-selection guidance: use `scripts/tracker ready` (not `list`) to pick next work item; add agent context protection rules: always use `scripts/tracker show` or `--json`, always refuse to read .ops files (see Agent Context Protection); add branch hygiene expectation (delete feature branches after merge); update contributor workflow to reference `fork-setup`; document security model and two-user setup |
| `README.md` | Add section on recommended deployment setup: two OS user accounts (human + agent), VM with snapshots or container, with explicit setup steps and rationale (see Security Model) |
| `.agent/stop_reflect.md` | Update Section 2 grep patterns → tracker CLI |
| `.agent/cooldown_prompt.md` | Minor reference updates |
| `scripts/install-hooks` | Add `git config diff.tracker.textconv scripts/tracker-textconv` for local diff declutter (see textconv) |
| `.githooks/pre-commit` | Add incremental `scripts/tracker validate` step (staged files only from both tiers, before Ruff) |
| `.github/workflows/ci.yml` | Fix CODE_PATTERNS to exclude tracker .ops files; add `tracker_data` output; add tracker-validate job; update ci-complete gate; add concurrency group (see CI Integration) |
| `.github/workflows/full-suite.yml` | Fix CODE_PATTERNS to exclude tracker .ops files; add test-tracker job; update aggregate (see CI Integration) |
| `LICENSE` | Add preamble noting per-package licensing: `packages/hypergumbo-tracker/` is MPL-2.0, everything else AGPL-3.0-or-later |
| `CONTRIBUTING.md` | Document dual-license structure (MPL-2.0 for tracker, AGPL-3.0-or-later for everything else), SPDX header convention, and that DCO sign-off covers both licenses per-file |
- Create `packages/hypergumbo-tracker/` with pyproject.toml (including `console_scripts` entry points: `hypergumbo-tracker` → `hypergumbo_tracker.cli:main`, `hypergumbo-tracker-textconv` → `hypergumbo_tracker.cli:textconv_main`), LICENSE (MPL-2.0), src layout, tests dir. All source files carry `# SPDX-License-Identifier: MPL-2.0` headers
- Update root `LICENSE` with preamble noting per-package licensing
- Update `CONTRIBUTING.md` to document dual-license structure (MPL-2.0 for tracker, AGPL-3.0-or-later for everything else), SPDX header convention, and that DCO sign-off covers both licenses per-file
- `models.py`: Op dataclasses (including `promote`/`demote`/`reconcile`/`reconcile-reset` op types, `update` ops with `set`/`add`/`remove` dicts, `actor` field on all ops), Tier enum (canonical/workspace/stealth), config loading from chain (`config.yaml` → `config.yaml.template` fallback, including `fields_schema` per kind — supported types: `text`, `integer` with optional `min`/`max`, `list`, `boolean`; `blocking_statuses`/`resolved_statuses`; `actor_resolution.agent_usernames` patterns; `lamport_branches` list with default `[dev, main]`). Status vocabulary loaded from config at startup (no Python enum). Actor resolution via `os.getuid()` + configurable agent username patterns (see Security Model)
- `store.py`: YAML write (ruamel.yaml, flow-style for list-valued fields in `update` ops) and read (PyYAML `CSafeLoader` — see YAML Serialization Rules), hash-based ID generation (SHA-256 of canonicalized `create` op `data` dict, first 128 bits proquint-encoded — see Key Design Decisions), same-branch existence check on `add()` (refuse to create if a file with the computed ID already exists in the target tier — see Key Design Decisions), SimHash computation on item content (64-bit fingerprint, cached in SQLite), prefix matching resolver (shortest unambiguous prefix), positional alias support (stash file in XDG cache dir), scoped cross-branch Lamport clock (peek configurable branches + `HEAD` + unmerged branches via `git cat-file --batch`, with fallbacks for missing branches and shallow clones — see Key Design Decisions), cross-branch lock enforcement (same scoped peek, union of `locked_fields`), human-authority enforcement via `_resolve_actor()` (see Security Model), nonce generation (4 random hex chars per op, serialized as inline `# <nonce>` comment on every line for `merge=union` correctness — see Compile Rules), `flock()` with clock computation inside the lock (per-file advisory lock — see Key Design Decisions), discussion rate limit (token-based daily cap per item, `len(message) / 4.4` as estimate — see Discussion Threads), `compile()` function (tolerates duplicate `create` ops from cross-branch merges — lowest-clock `create` wins, subsequent identical-data `create` ops ignored — see Compile Rules; set-valued fields compiled via accumulated `add`/`remove` ops — see Compile Rules), list/filter, `ready()` filter (soft-blocking via `before` links, uses `resolved_statuses` from config), tree traversal (children/ancestors), canonical op field ordering, `before` topological sort, refuse to write to items that can't compile (frozen until human intervention — see Safety Model). Store operates on a single directory (one tier) — multi-tier merging is handled by `TrackerSet`
- `__init__.py`: public API
- Create `.gitattributes` with `linguist-generated` and `merge=union` for both `.agent/tracker/.ops/.*.ops` and `.agent/tracker-workspace/.ops/.*.ops` (see .gitattributes)
- Add `.agent/tracker/config.yaml`, `.agent/tracker-workspace/config.yaml`, and `.agent/tracker-workspace/stealth/` to `.gitignore`
- Update `scripts/check-package-coverage` and `scripts/dev-install`
- Tests: model construction, store CRUD (append ops with `actor` field preserved), hash-based ID generation (same content → same ID, different content → different ID, IDs are valid proquint-encoded), proquint round-trip (encode → decode → encode produces same result), `add()` same-branch existence check (create item, attempt `add()` with identical content → `ItemExistsError` with existing item's title; verify the original file is not overwritten; verify different content producing a different ID succeeds normally; verify hash collision — create item, then `add()` with different content that produces the same ID via mocked hash → auto-salts and creates under a different ID), prefix matching (unique prefix resolves, ambiguous prefix errors with candidates, kind-prefix-less matching works), positional aliases (`:1` resolves to first item in last list, stale alias file warns), SimHash computation (identical text → identical fingerprint, similar text → low Hamming distance, unrelated text → high Hamming distance), SimHash similarity warning on `add` (mock store with existing items, verify warning emitted when distance below threshold, verify no warning when above threshold, verify `not_duplicate_of` suppresses warning), `duplicate_of` exclusion (items with non-empty `duplicate_of` excluded from `ready` and `count_todos`), scoped cross-branch Lamport clock (mock `git cat-file --batch` to simulate peek across configured branches/HEAD/unmerged branches, verify clock > max across scoped set, verify merged branches are excluded, verify fallback when `dev`/`main` missing — uses `HEAD` only), cross-branch lock enforcement (mock `git cat-file --batch` to simulate lock on another branch in the scoped set, verify agent update rejected), nonce uniqueness (two ops with identical content/clock/timestamp produce byte-different serializations), `compile()` with interleaved ops from simulated concurrent branches (same clock values, clock-skewed timestamps), `compile()` with duplicate `create` ops (two `create` ops with same `data` but different nonces/clocks → lowest-clock `create` used for `created_at`, subsequent `create` ignored, all non-`create` ops from both branches folded normally; two `create` ops with same ID but different `data` → compile uses lowest-clock `create`, logs warning), `compile()` with `add`/`remove` ops on set-valued fields (two concurrent `add` ops for `tags` → union of both; `add` followed by `remove` → correct set difference; `set` followed by `add` → set replaces base then add accumulates; concurrent `set` ops → LWW with warning), tree traversal, `ready()` filter (items blocked by incomplete `before` predecessors excluded, transitive blocking, stale/cross-tier links ignored), `before` sorting, `before` cycle rejection
- `test_yaml_roundtrip.py`: adversarial inputs (`"yes"`, `"null"`, `"3.0"`, `"*bold*"`, strings with colons, leading whitespace, emoji), canonical field order verification (including `actor` field), nonce field presence verification, nonce-on-every-line verification (every line of every serialized op carries a `# <nonce>` inline comment matching the `nonce` field value), flow-style enforcement for list-valued fields in `update` ops (`add`/`remove` dicts), CSafeLoader/ruamel.yaml parity (verify both parsers produce identical Python objects for all op types including adversarial inputs — note: CSafeLoader strips comments, so the nonce-on-every-line comments are not visible on the read path; comments are verified via raw string inspection of the serialized output, not via parsed data)
- `test_compile_properties.py`: property-based tests using `hypothesis` — generate random op sequences (create followed by random update/discuss/lock/unlock ops with random clocks and timestamps) and verify: (1) idempotency (`compile(ops) == compile(ops)`), (2) permutation invariance (`compile(shuffle(ops)) == compile(ops)`), (3) terminal status consistency (compiled status = status from highest-clock update op that sets it), (4) duplicate-create resilience (generate op sequence with two `create` ops sharing the same `data` but different clocks/nonces, verify `compile()` produces the same result as with a single `create` op followed by the same non-`create` ops), (5) additive-op commutativity (generate random sequences of `add`/`remove` ops on `tags` with random clocks, verify `compile(shuffle(ops))` produces the same tag set regardless of op order)
- `trackerset.py`: Multi-tier wrapper that instantiates a `Store` per tier (canonical, workspace, stealth), merges reads transparently, resolves cross-tier `parent`/`before` references, routes writes to the correct tier, implements `promote()`/`demote()`/`stealth()`/`unstealth()` (append op + physical file move between directories), `reconcile_reset()` (human-authority — merge ops, delete duplicate, reset counter), self-healing cross-tier duplicate reconciliation (follows the last tier-movement op when deterministic, flags ambiguous cases with a derived `cross_tier_conflict` field, caps reconciliation attempts at 3 per item — see Self-Healing Reconciliation), provides a unified `ready()` (excludes items with cross-tier conflicts) and scope-aware `count_todos()` (respects `stop_hook.scope` and `blocking_statuses` from config)
- `cache.py`: SQLite read cache (see Read Cache) — one cache database per tier in `$XDG_CACHE_HOME/hypergumbo-tracker/<repo-fingerprint>/`. Schema creation (including `source_size` and `tier` columns), incremental byte-offset invalidation (seek to the stored `source_size`, parse only new bytes, skip data re-compile for discussion-only appends), write-through upsert on local ops, cold-start rebuild, `cache-rebuild` entry point, `TRACKER_CACHE_DIR` override. All read operations (`list`, `ready`, `count-todos`, `show`) query the cache; writes go to YAML and update the cache row in one step
- `test_trackerset.py`: multi-tier merged reads (items from canonical + workspace + stealth appear in a unified list with correct tier indicators), cross-tier `parent` resolution (workspace item with `parent` pointing to a canonical item resolves correctly), cross-tier `before` resolution, `promote` (workspace → canonical: op appended, file physically moved, cache updated in both tiers, ID unchanged), `demote` (canonical → workspace: reverse), `stealth` (workspace → stealth: file moves to gitignored dir), `unstealth` (stealth → workspace), scope-aware `count_todos` (scope=`all` counts canonical + workspace + stealth; scope=`workspace` counts workspace + stealth only; uses `blocking_statuses` from config), `ready` always shows all tiers regardless of scope, self-healing reconciliation (cross-tier duplicate with a `promote` op → auto-reconciled to canonical with a `reconcile` op appended; cross-tier duplicate with a `demote` op → auto-reconciled to workspace; cross-tier duplicate with no tier-movement ops → `cross_tier_conflict` flag set, item excluded from `ready`; reconciliation attempt cap: item with 3+ prior `reconcile` ops → stops trying, surfaces a persistent error; `reconcile-reset` resets the counter and resolves capped items; self-healing is append-only: verify no ops deleted or rewritten during reconciliation), human-authority enforcement (agent UID rejected for `lock`, `unlock`, `discuss_clear`, `stealth`, `unstealth`, `reconcile-reset`; agent UID accepted for `promote`, `demote`, `discuss_summarize`, `discuss`, `update`)
- `test_cache.py`: SQLite cache correctness — write-through (append op, verify cache row updated without re-parse, verify `source_size` updated), mtime invalidation (touch YAML file, verify re-parse on next read), cold start (delete `.cache.db`, verify rebuilt from YAML), corruption recovery (corrupt `.cache.db`, verify rebuilt transparently), stale cache (simulate `git pull` changing file mtimes, verify only changed items re-parsed), cache-vs-YAML consistency (compile from YAML and compare against the cache row for all items), incremental invalidation (append a discuss op to a file, verify only new bytes parsed and data fields not re-compiled; append an update op to a file, verify full re-compile triggered; simulate `merge=union` by appending ops from two simulated branches, verify incremental parse finds all new ops; simulate file truncation/rewrite, verify fallback to full re-parse; verify `source_size` tracking is accurate across append/merge/rewrite scenarios)
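The byte-offset invalidation trick is simple to sketch. A hypothetical reader (names and file layout are illustrative, not the real `cache.py`) that seeks to the last cached size and parses only the appended tail:

```python
import os
import tempfile

def read_new_ops(path, source_size):
    """Incremental read: return only bytes appended since the last scan.

    Returns (new_text, new_size). If the file shrank or was rewritten
    in place, fall back to a full re-parse from offset 0.
    """
    size = os.path.getsize(path)
    if size < source_size:            # truncation/rewrite: full re-parse
        source_size = 0
    with open(path, "rb") as f:
        f.seek(source_size)           # skip the already-cached bytes
        tail = f.read().decode("utf-8")
    return tail, size

# Simulate an external append (another process adding a discuss op).
with tempfile.NamedTemporaryFile("w", suffix=".ops", delete=False) as f:
    f.write("- op: create\n")
    path = f.name
_, cached_size = read_new_ops(path, 0)      # initial full scan
with open(path, "a") as f:
    f.write("- op: discuss\n")              # append-only write
tail, _ = read_new_ops(path, cached_size)
assert tail == "- op: discuss\n"            # only the new bytes re-parsed
os.unlink(path)
```

Because op logs are append-only, a discussion-only append yields a tail containing no data-bearing ops, which is how the cache can skip the data re-compile entirely.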
- `validation.py`: schema checks, status validation against config (not a hardcoded enum), dedup (across all tiers), cross-tier duplicate detection, parent ref checks (cross-tier), `before` cycle detection (cross-tier), compiled-state checks, per-kind `fields_schema` validation (required fields present, type/range checks on known fields, edit-distance typo warnings for unknown fields), config-vs-template divergence warning (warn when `config.yaml` has kinds/statuses not in `config.yaml.template`), flow-style enforcement for list-valued fields in `update` ops. Must support optional file-path arguments from the start (for incremental pre-commit validation — see Pre-Commit Validation). Exit codes: 0 = valid (warnings to stderr), 1 = validation errors, 2 = internal failure
- Tests: validation pass/fail (including `fields_schema`: required field missing → error, wrong type → error, unknown field with close edit distance → warning with suggestion, unknown field on a kind without schema → no warning), exit code verification (errors → exit 1, warnings only → exit 0, internal failure → exit 2, `--strict` promotes warnings to exit 1)
- `migration.py`: markdown parser, status normalizer, priority assigner (integer tiers), hash-based ID generator (SHA-256 of the canonicalized `create` op `data` dict, first 128 bits proquint-encoded), writer
- Test against the actual current content of both markdown files
- Creates `.agent/tracker/.ops/` (canonical tier) with migrated op log files (each dotfile containing a single `create` op)
- Creates `config.yaml.template` files for both tiers
- Creates empty `.agent/tracker-workspace/` with `.ops/`, `stealth/` dirs and template
- Tests: parse each markdown format, normalize all status variants, verify parent-child links
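The ID scheme (SHA-256 of the canonicalized `create` data, first 128 bits proquint-encoded) can be sketched as follows. The canonicalization shown (sorted-key JSON) is an assumption standing in for whatever the real writer does; the proquint encoding itself is the standard CVCVC scheme:

```python
import hashlib
import json

CONSONANTS = "bdfghjklmnprstvz"   # 16 consonants: 4 bits each
VOWELS = "aiou"                   # 4 vowels: 2 bits each

def proquint(word16: int) -> str:
    """Encode one 16-bit word as a pronounceable CVCVC group."""
    return (CONSONANTS[(word16 >> 12) & 0xF] + VOWELS[(word16 >> 10) & 0x3]
            + CONSONANTS[(word16 >> 6) & 0xF] + VOWELS[(word16 >> 4) & 0x3]
            + CONSONANTS[word16 & 0xF])

def item_id(data: dict) -> str:
    """Content-hash ID: SHA-256 of canonical JSON, first 128 bits
    rendered as eight proquint groups (as in the example IDs like
    lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    words = [int.from_bytes(digest[i:i + 2], "big") for i in range(0, 16, 2)]
    return "-".join(proquint(w) for w in words)

# Identical create data always yields the same ID: natural deduplication,
# and the reason concurrent identical `add`s collide onto one filename.
a = item_id({"title": "Call Attribution Completeness", "kind": "invariant"})
b = item_id({"kind": "invariant", "title": "Call Attribution Completeness"})
assert a == b and len(a.split("-")) == 8
```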
Absorbed into PR 1c. See above.
- `stop_hook.py`: scope-aware `count_todos()` (reads `stop_hook.scope` and `blocking_statuses` from config; exit 0 on success, exit 1 on error — stop hook treats non-zero as blocking), `hash_todos()` (input spec: for each item with status in `blocking_statuses` respecting scope, concatenate `id + "\t" + status + "\t" + title + "\n"` sorted by ID, SHA-256 hash the UTF-8 bytes — discussion and fields excluded), `generate_guidance()`
- Update `stop_logic.sh` with dual-mode (tracker-first with fail-closed error handling and scope-aware counting, grep-fallback for Phase 1 transition only)
- Update `.github/workflows/ci.yml`: fix `CODE_PATTERNS` to exclude `.agent/tracker/.ops/` and `.agent/tracker-workspace/.ops/`; add `tracker_data` output to `changes` job; add `tracker-validate` job; update `ci-complete` gate; add concurrency group (see CI Integration)
- Update `.github/workflows/full-suite.yml`: fix `CODE_PATTERNS`; add `test-tracker` job; update `aggregate` (see CI Integration)
- Tests: stop_hook functions match expected counts on fixture data (test both scope=`all` and scope=`workspace`), hash stability (verify hash input spec: IDs sorted, only blocking items, fields/discussion excluded), scope=`workspace` excludes canonical items from count, fail-closed behavior (mock `count_todos` to raise exception → stop hook treats as blocking; mock corrupt cache → rebuild attempted, if rebuild fails → blocking)
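The `hash_todos()` input spec above is precise enough to sketch directly (the item shape is hypothetical; the concatenation and hashing follow the spec in this section):

```python
import hashlib

def hash_todos(items, blocking_statuses):
    """Circuit-breaker hash: one `id\tstatus\ttitle\n` line per blocking
    item, sorted by ID, SHA-256 over the UTF-8 bytes. Discussion and
    fields are excluded, so chatter never perturbs the hash."""
    lines = sorted(
        f"{it['id']}\t{it['status']}\t{it['title']}\n"
        for it in items
        if it["status"] in blocking_statuses
    )
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()

items = [
    {"id": "b", "status": "todo_hard", "title": "Fix caller attribution"},
    {"id": "a", "status": "todo_soft", "title": "Tidy linker docs"},
    {"id": "c", "status": "done", "title": "Shipped"},  # non-blocking: excluded
]
h1 = hash_todos(items, {"todo_hard", "todo_soft"})
h2 = hash_todos(list(reversed(items)), {"todo_hard", "todo_soft"})
assert h1 == h2   # input order is irrelevant: lines are sorted by ID
```

Sorting the formatted lines sorts by ID because the ID leads each line; only status transitions into or out of `blocking_statuses`, title edits, or new items change the hash.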
PR 5: Pre-commit + AGENTS.md + commit convention + branch hygiene + contribute [MERGED] (commit 1e4a636)
- Update `.githooks/pre-commit` with incremental tracker validation (staged `.ops` files only from both tiers, before Ruff — see Pre-Commit Validation)
- Update AGENTS.md: replace grep pattern instructions with `scripts/tracker` equivalents; add `tracker:` commit prefix convention and batching guidance (see Commit Convention); add task-selection guidance instructing agents to use `scripts/tracker ready` (not `list`) to pick their next work item; add agent context protection rules: "Always use `scripts/tracker show <ID>` or `scripts/tracker show <ID> --json` to read tracker item state. Always refuse to read files ending in `.ops`." (see Agent Context Protection); add branch hygiene expectation (delete feature branches after merge); update contributor workflow to reference `fork-setup` and explain the three-tier model for forks; document security model and two-user setup expectations
- Update README.md: add section on recommended deployment setup — two OS user accounts (human + agent), VM with snapshots or container, with explicit setup steps (`groupadd`, `usermod`, `chgrp`, `chmod g+s`) and concise rationale (see Security Model)
- Update `scripts/auto-pr`: delete local and remote feature branch after successful merge (keeps the scoped Lamport clock branch set small — see Key Design Decisions)
- Update `scripts/contribute`: add workspace exclusion (~15 lines) to strip `.agent/tracker-workspace/` from upstream PRs
- Update stop_reflect.md, cooldown_prompt.md references
- Tests: pre-commit validation catches invalid `.ops` files from both tiers, warns on lock violations, skips gracefully when no tracker files staged; contribute workspace exclusion (mock git operations, verify workspace files excluded from PR branch)
- `tui.py`: `TrackerApp(App)` with dependency-injected `TrackerSet`, `_compute_tier(w, h)` function implementing the tier definitions above, CSS class switching (compact/standard/wide), `on_resize` handler
- `textual~=7.5` declared as required dep in PR 1a's pyproject.toml
- Compact layout: single-pane full-width DataTable, stacked detail on `Enter`, `Esc` returns to list. Minimum-size enforcement (centered "Terminal too small" message below 40×16)
- `_truncate_id(full_id, max_width, shortest_unambiguous)` helper implementing the ID truncation strategy above
- Footer with tier-appropriate keybinding hints (top-3 in compact: `q`/`f`/`Enter`)
- Basic keybindings: `q`, `f`, `e`, `n`, `m`, `d`
- Tests use Textual's `App.run_test()`/`Pilot` (headless, async via `pytest-asyncio`). Pilot flows at (40, 16) and (50, 18). Too-small test at (30, 10). Unit tests for `_compute_tier()` (all 12 representative sizes) and `_truncate_id()` (each column-width bucket)
- Two-pane layout: left DataTable/TreeView, right detail panel, vertical divider
- Tree/table toggle (`t`)
- Header filter chips (kind/status/tag/tier) + search bar (at ≥80 cols)
- Schema-aware detail rendering: known fields in declared order with description tooltips, "Other" section for unknown fields, lock icons on locked fields
- Discussion entries in right panel (most recent, scrollable). Discussion badge `[20+ msgs]`
- Standard keybindings enabled: `t`, `p`, `b`, `l`, `D`
- Tier indicator column (`[C]`/`[W]`/`[S]`), tier move dialog (`m` key: promote/demote/stealth/unstealth)
- Schema-aware edit form (type-appropriate widgets for known fields: text area, integer spinner with min/max, list editor; generic key-value row for unknown fields)
- Cross-tier conflict indicator for items with unresolved duplicates (see Self-Healing Reconciliation)
- Tests: Pilot flows at (80, 24) and (120, 34). Edit flow, lock toggle (`l`) verifying agent write rejected on a locked field, discussion panel (`d`) submit message + `D` clear, tier move (`m`) promote/demote, schema-aware rendering (kind with `fields_schema` vs. kind without), tree/table toggle (`t`) selection preservation, filter (`f`) by status/tier
- Wide enhancements: extra columns (`created_at`, `updated_at`, conflict indicator), longer ID truncation (3–4 syllable pairs), expanded footer with full keybindings, enhanced right panel with a secondary activity panel for discussion entries alongside detail, filter chips showing active values inline
- Dynamic resize handler with state preservation (selected item ID, filter state, edit form state preserved; scroll positions reset)
- "Too small" overlay for < 40×16 (any size → below minimum → all content hidden; resize back → app resumes)
- Tests: Pilot flow at (160, 45) verifying enhanced columns appear. Dynamic resize tests: (80, 24) → (40, 16) standard→compact with selected item preserved; (80, 24) → (160, 45) standard→wide with extra columns; (40, 16) detail view → (80, 24) compact detail → standard right panel; any size → (30, 10) "too small" shown, resize back → app resumes
- `pytest-textual-snapshot` SVG baselines for all three tiers:
- (40, 16): compact list view
- (55, 18): compact with status column visible
- (50, 18): compact detail view
- (80, 24): standard two-pane layout
- (80, 24): tree view with parent-child hierarchy
- (160, 45): wide layout with enhanced columns
- Filter panel open
- Discussion badge (`[20+ msgs]`)
- Locked-field item with lock icon
- Schema-aware detail (known fields in order, "Other" section) vs. generic detail (flat key-value list)
- Tier move dialog
- (30, 10): "too small" message
- Update with `pytest --snapshot-update`. Compatible with `pytest-xdist`
- Remove grep fallback from stop_logic.sh
- Add deprecation notice headers to the old markdown files
- Final cleanup
- Add pre-push hook warning when workspace items are pushed to upstream remote
- End-to-end fork workflow test: fork-setup → workspace writes → contribute excludes workspace → promote → separate tracker PR
- Documentation: add fork workflow guide to README or CONTRIBUTING.md
After each PR:
- `pytest -n auto --cov-fail-under=100` — full test coverage (project requirement)
- `scripts/tracker validate` — all YAML files pass schema validation (across both tiers)
- `scripts/tracker count-todos --hard` + `--soft` — counts match expected values (test with both scope=`all` and scope=`workspace`)
- `scripts/tracker cache-rebuild` — verify caches for both tiers rebuild cleanly and `list` output matches a full YAML-only compile
- Incremental invalidation sanity check: append a `discuss` op to an op log file (outside the store, simulating another process), run `scripts/tracker list`, verify the item's data fields are unchanged in output and that `source_size` in `.cache.db` reflects the new file size
End-to-end after PR 4:
- Trigger stop hook, verify it uses the tracker CLI path (not the grep fallback)
- Verify circuit breaker hash from `hash-todos` matches the old grep-based hash (regression test for migration correctness)
- Critical: end-to-end `merge=union` test. In a temporary git repo: (1) create branches A and B from a common base with an existing op log file, (2) on branch A, append an `update` op via `scripts/tracker update`, (3) on branch B, append a different `update` op (same op type — the scenario that fails without nonce-on-every-line) to the same item, (4) merge B into A, (5) verify no conflict markers, both ops present as distinct YAML list items, and `compile()` produces correct state reflecting both updates, (6) verify Lamport clocks are correctly ordered, (7) verify the `# <nonce>` comments on every line of each op survived the merge intact. Additionally test with identical `set:` blocks across ops (the scenario where nonce-on-first-line fails due to line stripping). This test validates the nonce-on-every-line + `merge=union` design. Add to `test_store.py` as an integration test using `subprocess` to run actual git commands in a temp repo.
- End-to-end rebase safety test. Same setup as the merge test, but (4a) rebase branch A onto B instead of merging, (4b) then merge the pre-rebase lineage with the post-rebase result. Verify no duplicate ops, no stripped fields, correct `compile()` output. This validates that nonce-on-every-line makes the tracker rebase-safe.
- Cross-branch duplicate creation test. In a temporary git repo: (1) create branches A and B from a common base with no op log file, (2) on branch A, `scripts/tracker add` with specific title/fields, (3) on branch B, `scripts/tracker add` with identical title/fields (producing the same content-hash ID and same filename), (4) on each branch, append a different `update` op to the item (A sets status to `in_progress`, B adds a field), (5) merge B into A, (6) verify no conflict markers, the file contains two `create` ops and both `update` ops, (7) `compile()` produces correct state: `created_at` from the lowest-clock `create`, status and fields reflecting both updates, (8) `validate` emits an informational notice about duplicate `create` ops but does not error. This validates the duplicate-create-op tolerance described in Compile Rules.
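The shape of such a subprocess-driven integration test can be sketched in a throwaway repo. This is an illustrative, self-contained version (a generic `*.ops` pattern and dummy op lines stand in for the real tracker CLI and file layout):

```python
import os
import subprocess
import tempfile

def git(*args, cwd):
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True)

def append(path, text):
    with open(path, "a") as f:
        f.write(text)

with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)
    git("config", "user.email", "t@example.com", cwd=repo)
    git("config", "user.name", "t", cwd=repo)
    # Declare union merges for op logs, as .gitattributes does for .ops.
    append(os.path.join(repo, ".gitattributes"), "*.ops merge=union\n")
    ops = os.path.join(repo, "item.ops")
    append(ops, "- op: create\n")
    git("add", "-A", cwd=repo)
    git("commit", "-m", "base", cwd=repo)
    base = git("rev-parse", "--abbrev-ref", "HEAD", cwd=repo).stdout.strip()
    git("checkout", "-b", "side", cwd=repo)
    append(ops, "- op: update  # branch B\n")
    git("commit", "-am", "side update", cwd=repo)
    git("checkout", base, cwd=repo)
    append(ops, "- op: update  # branch A\n")
    git("commit", "-am", "base update", cwd=repo)
    git("merge", "-m", "merge", "side", cwd=repo)  # union: keeps both sides
    with open(ops) as f:
        merged = f.read()
    assert "<<<<<<<" not in merged                # no conflict markers
    assert "branch A" in merged and "branch B" in merged
```

The real test additionally runs `compile()` over the merged file and inspects the per-line nonce comments; this sketch only demonstrates that `merge=union` preserves both concurrent appends without conflict markers.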
End-to-end after PR 5:
- Simulate fork workflow: create item in workspace, verify `contribute` excludes it from the PR, promote item to canonical, verify it appears in a separate commit
- Verify `fork-setup` detects the upstream remote and sets `stop_hook.scope: workspace`
End-to-end after PR 6d:
- `scripts/tracker tui` launches, displays all items from both tiers
- Dynamic resize: standard→compact preserves selected item; compact detail→standard moves detail to right panel
- Edits, locking, discussion, tier moves persist at all sizes
- ID truncation produces readable, prefix-matchable IDs at all widths
- Discussion badges for >20 entries
- Discussion entries scrollable in detail/activity panel
- All Pilot tests pass at multiple sizes with `tmp_path`-backed `TrackerSet`
- Snapshot baselines committed and passing for all three tiers
End-to-end after PR 8:
- Full fork lifecycle: fork → clone → fork-setup → agent creates workspace items → contribute (workspace excluded) → promote item → tracker PR → upstream merge → sync fork
Tracker op log files (.ops) are tracked on the main branch alongside code. Without careful CI configuration, every tracker change would trigger the full test suite unnecessarily, waste CI runner time, and potentially block the CI queue for real code changes. This section ensures tracker changes are smooth: no wasted CI, no blocked PRs, no noisy diffs.
Key architectural advantage: The existing CI already uses a changes job with job-level if: conditionals (not workflow-level paths filters), and a ci-complete gate as the sole required branch protection check. Jobs skipped via if: report as "skipped" which counts as passing for required checks in Forgejo Actions. This means we don't need structural changes — just surgical updates to the change detection logic and one new lightweight job.
The changes job in both ci.yml and full-suite.yml uses CODE_PATTERNS to decide whether expensive jobs run:
CODE_PATTERNS='\.py$|\.yaml$|\.yml$|\.json$|\.toml$|pyproject\.toml|scripts/|\.github/workflows/'

The `\.yaml$` pattern matches tracker config files, and `.ops` files could match other patterns. Fix by splitting the detection to exclude tracker data:
CODE_PATTERNS='\.py$|\.json$|\.toml$|pyproject\.toml|scripts/|\.github/workflows/'
YAML_PATTERN='\.ya?ml$'
TRACKER_DATA='^\.agent/tracker(-workspace)?/'
CHANGED=$(git diff --name-only "$base" "$head")
has_code=false
# Non-YAML code files
echo "$CHANGED" | grep -qE "$CODE_PATTERNS" && has_code=true
# YAML files that aren't tracker data
echo "$CHANGED" | grep -E "$YAML_PATTERN" | grep -vqE "$TRACKER_DATA" && has_code=true
echo "code=$has_code" >> "$GITHUB_OUTPUT"

Apply this to both `.github/workflows/ci.yml` (line 52) and `.github/workflows/full-suite.yml` (line 60).
No .gitattributes exists in the repo. Create one at the repo root:
# Tracker op log files: machine-generated append-only operation logs.
# - linguist-generated: collapse in PR diffs, exclude from language stats
# - merge=union: on conflict, keep lines from both sides (matches append-only design)
# - diff=tracker: use textconv driver to show compiled state in diffs (see [textconv](#local-diff-declutter-textconv))
.agent/tracker/.ops/.*.ops linguist-generated merge=union diff=tracker
.agent/tracker-workspace/.ops/.*.ops linguist-generated merge=union diff=tracker

`merge=union` is the critical entry. It tells git that when a merge conflict occurs in these files, it should keep all lines from both sides. This is exactly right for append-only operation logs: two branches that both appended ops will have all ops preserved without conflict markers. This upgrades the merge guarantee from "git usually handles appends correctly" to "git is explicitly told to keep everything from both sides."
`linguist-generated` causes Forgejo/Gitea to collapse these files in PR diffs by default and exclude them from language statistics, reducing review noise.
`diff=tracker` assigns a custom textconv diff driver that shows compiled item state instead of raw operation logs in local `git log -p` and `git diff`. This is especially valuable now that op logs live in dotfiles — `git log -p` would otherwise dump raw ops from hidden files, which is confusing; the textconv driver shows the compiled state instead.
Add a second output to the changes job so the new tracker-validate job knows when to run:
outputs:
code: ${{ steps.filter.outputs.code }}
tracker_data: ${{ steps.filter.outputs.tracker_data }}

Detection logic (appended to the filter step):
if echo "$CHANGED" | grep -qE '^\.agent/tracker(-workspace)?/|^packages/hypergumbo-tracker/'; then
echo "tracker_data=true" >> "$GITHUB_OUTPUT"
else
echo "tracker_data=false" >> "$GITHUB_OUTPUT"
fi

Note: `packages/hypergumbo-tracker/` source changes also trigger `code=true` (because `.py` matches `CODE_PATTERNS`), so the full code CI runs too. The `tracker_data` output additionally triggers the lightweight validation job below.
A fast job (~10 seconds) that validates tracker YAML schema. No tree-sitter grammars, no grammar wheel builds, no venv cache — just pip-install the tracker package and run validate:
tracker-validate:
needs: [changes, stop-the-line]
if: >-
always() &&
needs.changes.outputs.tracker_data == 'true' &&
needs.stop-the-line.result != 'failure'
runs-on: self-hosted
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Install tracker
run: pip install -e packages/hypergumbo-tracker
- name: Validate
run: scripts/tracker validate

Add `tracker-validate` to the gate job's `needs` list and failure check:
ci-complete:
needs: [changes, stop-the-line, lint, audit, verify-generated, build-grammars, pytest, dco, tracker-validate]
if: always()
# ... existing steps, plus:
# [[ "${{ needs.tracker-validate.result }}" == "failure" ]]
# in the failure check blockSince ci-complete is the sole required branch protection check, this ensures:
- Tracker-only PRs: code jobs skip ("skipped"), `tracker-validate` runs → `ci-complete` passes
- Code-only PRs: `tracker-validate` skips ("skipped"), code jobs run → `ci-complete` passes
- Mixed PRs: both run → `ci-complete` passes if both pass
Prevents CI queue congestion from rapid agent commits. Add at the top level of ci.yml, after on::
concurrency:
group: ci-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

When a new push arrives on a branch while CI is running for that branch, the in-progress run is cancelled and replaced. This is safe because `ci-complete` is the only required check — cancellation doesn't leave stale "pending" status checks. The full-suite.yml already has a singleton concurrency group with `cancel-in-progress: false` (correct — the full suite should not be interrupted).
Document in AGENTS.md:
Commit prefix. Tracker-only changes use a `tracker:` conventional-commit prefix:
tracker: close INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit, update 3 work items
tracker: batch status updates for completed invariants
Batching. Agents should batch tracker operations into fewer commits rather than committing after every `scripts/tracker update` call. Perform all tracker updates for a logical unit of work, then commit once with a summary message.
Filtering. To view history without tracker noise:
git log --oneline -- ':!.agent/tracker/.ops' ':!.agent/tracker-workspace/.ops' ':!.agent/tracker-workspace/stealth' # path-based (always works)
git log --oneline --invert-grep --grep='^tracker:' # prefix-based (requires convention)

Add a lightweight parallel test job alongside `test-core`, `test-mainstream`, `test-common`, `test-extended`:
test-tracker:
needs: [changes]
if: needs.changes.outputs.code == 'true'
runs-on: self-hosted
outputs:
coverage: ${{ steps.tests.outputs.coverage }}
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Install and test
id: tests
run: |
pip install --upgrade pip
pip install -e packages/hypergumbo-tracker[tui] pytest pytest-cov pytest-xdist
pytest packages/hypergumbo-tracker/tests/ -n auto --tb=short \
--cov=packages/hypergumbo-tracker/src --cov-report=term | tee coverage-output.txt
COV=$(grep "^TOTAL" coverage-output.txt | awk '{print $NF}' | tr -d '%')
echo "coverage=${COV:-0}" >> "$GITHUB_OUTPUT"

This job does not need the prep job, grammar wheels, or heavy deps — the tracker package has no tree-sitter dependency. Update the aggregate job to include `test-tracker` in its needs list and coverage reporting.
.agent/tracker/.ops/*.ops data changes produce code=false from the changes job, so smart test selection (ADR-0010) is never invoked for tracker-only changes. No changes to scripts/smart-test or .ci/affected-tests.txt handling needed.
When packages/hypergumbo-tracker/ code changes, code=true fires and the existing smart test selection runs normally. The scripts/check-package-coverage PACKAGES map needs a tracker entry to include the tracker package in per-package CI isolation (already noted in Files Modified).
linguist-generated (.gitattributes) collapses tracker diffs in Forgejo/Codeberg PR views, but doesn't help locally — git log -p, git diff, and git show still dump raw operation logs from the .ops dotfiles. Inspired by the smart-test pattern (wrap the tool, show a compact summary, keep the full output accessible), a textconv diff driver solves this transparently for all local git diff commands.
How it works. Git's `diff.<driver>.textconv` config points to an executable that converts a file to a text representation before diffing. Git runs the converter on both the old and new versions, then diffs the text representations. The `diff=tracker` attribute in `.gitattributes` assigns this driver to all tracker `.ops` files.
Setup (added to scripts/install-hooks):
git config diff.tracker.textconv scripts/tracker-textconv

`scripts/tracker-textconv` — a thin AGPL-3.0 bash shim that delegates to the MPL-2.0 `hypergumbo-tracker-textconv` entry point, with graceful fallback:
#!/usr/bin/env bash
# SPDX-License-Identifier: AGPL-3.0-or-later
# Git textconv driver for tracker op log files.
# Delegates to the MPL-2.0 entry point; falls back to raw YAML if not installed.
hypergumbo-tracker-textconv "$1" 2>/dev/null && exit 0
echo "# hypergumbo-tracker not installed — run dev-install for compiled diffs"
cat "$1"

`scripts/tracker textconv <FILE>` — CLI subcommand that compiles the item and emits a compact, one-line-per-field text representation designed for readable diffs:
INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit Call Attribution Completeness
status: todo_hard priority: P0 tags: [analysis_quality]
parent: null before: [] pr_ref: null
fields.statement: Every emitted `calls` edge has a non-null caller symbol
fields.root_cause: JS/TS arrow function early-return in _get_enclosing_function()
fields.fix: Position-based lookup for arrow functions
discussion: 2 entries
locked: [priority]
ops: 6 updated: 2026-02-11T19:30:00Z
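The compiling side of this output can be sketched against a hypothetical compiled-item dict (field names are taken from the example output above; the real compiled-item shape is an assumption):

```python
def textconv(item: dict) -> str:
    """Render a compiled item as a compact, diff-friendly text block:
    one line per logical field, stable ordering, no raw op noise.
    Stable line positions are what make field-level changes show up
    as clean one-line diffs in `git log -p`."""
    lines = [
        f"{item['id']} {item['title']}",
        f"  status: {item['status']}  priority: {item['priority']}"
        f"  tags: {item['tags']}",
        f"  parent: {item['parent']}  before: {item['before']}",
        *[f"  fields.{k}: {v}" for k, v in sorted(item["fields"].items())],
        f"  discussion: {len(item['discussion'])} entries",
        f"  ops: {item['op_count']}  updated: {item['updated_at']}",
    ]
    return "\n".join(lines) + "\n"

item = {
    "id": "INV-lusab-bired", "title": "Call Attribution Completeness",
    "status": "todo_hard", "priority": "P0", "tags": ["analysis_quality"],
    "parent": None, "before": [],
    "fields": {"statement": "Every calls edge has a non-null caller"},
    "discussion": ["q", "a"], "op_count": 6,
    "updated_at": "2026-02-11T19:30:00Z",
}
out = textconv(item)
assert out.splitlines()[0] == "INV-lusab-bired Call Attribution Completeness"
assert "discussion: 2 entries" in out
```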
When a field changes, git log -p shows a clean diff of the compiled states:
INV-lusab-bired-fomak-gunid-hasob-jikal-mofad-nukit Call Attribution Completeness
- status: todo_hard priority: P0 tags: [analysis_quality]
+ status: done priority: P0 tags: [analysis_quality]
parent: null before: [] pr_ref: null
...
- ops: 6 updated: 2026-02-11T19:30:00Z
+ ops: 7 updated: 2026-02-12T10:00:00Z

Instead of the raw YAML op that was appended:
+- op: update # e6f7
+ at: "2026-02-12T10:00:00Z" # e6f7
+ by: agent # e6f7
+ clock: 7 # e6f7
+ nonce: e6f7 # e6f7
+ set: # e6f7
+ status: done # e6f7Bypass. git log -p --no-textconv (or git diff --no-textconv) shows the raw .ops file content when needed. This is the standard git escape hatch — no custom flags required.
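How the nonce lands on every line can be sketched without a YAML library. A hypothetical serializer fragment (the real writer uses ruamel.yaml; the op fields follow the example above):

```python
def serialize_op(op: dict, nonce: str) -> str:
    """Emit one op as a YAML list item with the op's nonce as an inline
    comment on every line. Because no two ops share a nonce, no line of
    one op is ever textually identical to a line of another op, which
    is what keeps merge=union (and rebase) from collapsing or
    mis-attributing lines between concurrently appended ops."""
    lines = [f"- op: {op['op']}  # {nonce}"]
    for key, value in op.items():
        if key == "op":
            continue
        if isinstance(value, dict):          # nested block like `set:`
            lines.append(f"  {key}:  # {nonce}")
            for k, v in value.items():
                lines.append(f"    {k}: {v}  # {nonce}")
        else:
            lines.append(f"  {key}: {value}  # {nonce}")
    return "\n".join(lines) + "\n"

op = {"op": "update", "clock": 7, "nonce": "e6f7", "set": {"status": "done"}}
text = serialize_op(op, "e6f7")
# Every physical line carries the nonce comment.
assert all(l.endswith("# e6f7") for l in text.strip().splitlines())
```

Since CSafeLoader strips comments on the read path, the nonce comments cost nothing at parse time; they exist purely to make each op's lines textually unique for git's line-based merging.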
Bootstrapping. On a fresh clone before dev-install, the hypergumbo-tracker-textconv entry point isn't available. The cat "$1" fallback in scripts/tracker-textconv ensures diffs still work — they just show raw ops until the package is installed. No broken state, just degraded display.
| | smart-test | tracker-textconv |
|---|---|---|
| Wraps | pytest | git diff rendering for `.ops` |
| Shows | ~20-line compact summary | compiled item state |
| Full output | `.ci/pytest-output.log` | raw `.ops` content (via `--no-textconv`) |
| License | AGPL-3.0 (part of hypergumbo) | entry point is MPL-2.0; `scripts/tracker-textconv` shim is AGPL-3.0 |
| Setup | alias in `.venv/bin/pytest` | `diff=tracker` in `.gitattributes` + config in `install-hooks` |
| Transparent | yes (alias) | yes (git attribute) |
- Schema enforcement eliminates vocabulary drift and malformed entries — the demonstrated threat that motivated this ADR
- OS-level actor resolution (`os.getuid()`) provides non-forgeable human authority without crypto key management
- Gitignored config with OS file permissions makes governance rules harder for the agent to modify than source code
- Three-layer safety model: tracker self-heals deterministic issues, agent uses CLI as black box, unanticipated errors surface to human with hardcoded blast radius limits
- Agents get structured task selection via `scripts/tracker ready` instead of fragile grep patterns
- Humans get a TUI for browsing, triage, field locking, and async discussion
- Fork-safe three-tier visibility enables contributor workflows without governance conflicts
- `merge=union` with nonce-on-every-line eliminates merge conflicts for concurrent agent edits and is safe under both merge and rebase
- Additive ops for set-valued fields (`tags`, `before`, `duplicate_of`, `not_duplicate_of`) eliminate silent data loss under concurrency — consistent with the accumulated semantics already used for `locked_fields` and discussion
- Fail-closed stop hook ensures tracker errors surface as loud agent stops rather than silent governance failures
- Per-op `actor` field preserves full identity for audit trail and multi-agent debugging
- Append-only operation log provides a complete audit trail with no additional infrastructure
- Self-healing cross-tier reconciliation handles interrupted tier moves and merge artifacts without agent involvement
- Reusable across projects — standalone MPL-2.0 package with no hypergumbo-core dependency; MPL's file-level copyleft removes the AGPL adoption barrier for projects that want agent governance without code analysis
- Content-hash IDs provide natural deduplication without coordination
- SQLite read cache (XDG-compliant, per-user) makes frequent agent queries (`count-todos`, `ready`) sub-millisecond
- Config-defined statuses and blocking semantics — governance changes are config changes, not code PRs
- New package adds maintenance surface (~11 source modules, ~17 test modules)
- Nonce-on-every-line makes op log files more verbose (every line carries a `# <nonce>` suffix)
- Two YAML libraries (ruamel.yaml for writes, PyYAML/CSafeLoader for reads) in the dependency tree
- Dual-license repo (AGPL + MPL) requires SPDX headers on every file and clear contributor documentation
- Migration is a one-way door — reverting to markdown after migration loses op-log history
- Cross-branch Lamport clock adds coupling between the tracker store and git internals
- Op log files grow monotonically; compaction is deferred to a future revision
- Two-user deployment requires OS-level setup (shared group, setgid); single-user deployments degrade to social controls for human authority enforcement
- The TUI depends on Textual (required dependency), keeping the package self-contained
- Embedding-based dedup (tier 2) is optional and degrades gracefully when unavailable
- Stealth tier is gitignored — provides privacy but no backup
- `linguist-generated` collapses tracker diffs in PRs, reducing review noise at the cost of visibility
- `flock()` advisory locking protects against concurrent tracker processes but not arbitrary file writes — acceptable because the store is the sole writer
- ADR-0008: Autonomous Governance — Stop hook system this replaces
- ADR-0010: Modular Packages — Package structure pattern followed
- git-bug — Operation-sourced model inspiration
- beads — Per-field resolution strategy inspiration
- proquint — Pronounceable hash encoding
- `.agent/invariant-ledger.md` — Current invariant tracking (replaced by tracker)
- `~/hypergumbo_lab_notebook/guidance_log/work_items.md` — Current work item tracking (replaced by tracker)