Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,7 @@ Hook scripts in `src/hooks/` are standalone Node.js scripts (no iii-sdk import).
## Current Stats (v0.9.28)

- 61 MCP tools (8 visible by default, `AGENTMEMORY_TOOLS=all` for all)
- 135 REST endpoints
- 136 REST endpoints
- 6 MCP resources, 3 MCP prompts
- 12 hooks, 15 skills
- 50+ iii functions
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -1850,7 +1850,7 @@ Create `~/.agentmemory/.env`:

<h2 id="api"><picture><source media="(prefers-color-scheme: dark)" srcset="assets/tags/light/section-api.svg"><img src="assets/tags/section-api.svg" alt="API" height="32" /></picture></h2>

135 endpoints on port `3111`. The REST API binds to `127.0.0.1` by default. Protected endpoints require `Authorization: Bearer <secret>` when `AGENTMEMORY_SECRET` is set, and mesh sync endpoints require `AGENTMEMORY_SECRET` on both peers.
136 endpoints on port `3111`. The REST API binds to `127.0.0.1` by default. Protected endpoints require `Authorization: Bearer <secret>` when `AGENTMEMORY_SECRET` is set, and mesh sync endpoints require `AGENTMEMORY_SECRET` on both peers.

### Health Thresholds

Expand Down
2 changes: 1 addition & 1 deletion assets/tags/light/section-api.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
2 changes: 1 addition & 1 deletion assets/tags/section-api.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
# Arena Synthesis: Issue 338

## Rubric

1. Uses current repo evidence for stop order and path handling.
2. Accounts for the Windows comment and distinguishes stale original claims from
residual valid scope.
3. Identifies the protected boundary / human checkpoint correctly.
4. Proposes a narrow, testable fix direction without broad scope expansion.
5. Names concrete files/tests and residual uncertainty.

## Scores

| Candidate | Repo evidence | Windows residual | Boundary/checkpoint | Narrow fix | Files/tests/uncertainty | Total |
| --- | --- | --- | --- | --- | --- | --- |
| A | 5 | 5 | 5 | 4 | 5 | 24 |
| B | 5 | 5 | 4 | 5 | 5 | 24 |
| C | 4 | 4 | 4 | 4 | 3 | 19 |

## Decision

Base: Candidate B.

Candidate B is the cleanest base because it keeps the issue shape narrow:
the original broad claims are stale on this branch, while the remaining
plausible bug is native Windows shutdown still depending on a worker `SIGTERM`
even though the flush lives in the worker signal handler.

Grafts:
- From Candidate A: stronger proof that existing Windows stop tests still model
worker stop as `SIGTERM`-driven, especially
`test/cli-stop-port-detection.test.ts`.
- From Candidate A: fuller path-handling evidence in `src/cli/engine-launch.ts`,
`src/cli/runtime-config.ts`, `src/cli/iii-config.ts`,
`test/engine-launch.test.ts`, `test/cli-iii-config.test.ts`, and
`test/runtime-config.test.ts`.
- From Candidate A: clearer rejection notes that more grace time and another
stop-order tweak do not solve the Windows residual.
- From Candidate C: concise framing that the original POSIX-facing body is
stale while the Windows-specific comment remains live.

Rejected:
- A REST endpoint is one possible transport for a pre-stop flush, but the arena
does not pick it before human approval because that would change API surface
and endpoint counts.
- Boot reconciliation alone is not the primary fix. It may recover already
written state, but it does not prevent shutdown-time loss when the worker is
hard-terminated before flush.
- A Windows console-control-event solution is higher uncertainty because the
worker is spawned under iii-engine supervision.

## Validity Finding

Issue #338 is **stale for the original broad POSIX/data-dir claims** and
**valid for a narrower native Windows shutdown residual**.

Already fixed on current base:
- Native stop order is worker-first before engine in `src/cli.ts`.
- Unresponsive native `--force` stop is also worker-first in
`src/cli/stop-processes.ts`.
- Worker shutdown grace is 5000 ms on worker-first paths.
- Bundled/runtime iii config rewrites `./data/state_store.db` and
`./data/stream_store` to absolute paths under `~/.agentmemory/data`.

Still plausible and actionable:
- Worker stop still routes through `process.kill(pid, "SIGTERM")`.
- The critical flush is still inside the worker's `SIGTERM` / `SIGINT` handler:
`indexPersistence.stop()`, `indexPersistence.save()`, and
`shutdownSdkWithTimeout(sdk)`.
- Public issue comments report that Windows treats this `SIGTERM` path as hard
process termination, so the handler and flush do not run.

## Likely Smallest Fix Family

Recommended scope for implementation, pending human approval:

Add a local/private worker flush or checkpoint path that the CLI invokes before
terminating the worker on Windows. Preserve the existing worker-first stop order
and treat OS-level termination as cleanup after the checkpoint succeeds, or as a
forceful fallback after a clear timeout/failure policy.

Likely touched files:
- `src/cli.ts`
- `src/cli/stop-processes.ts`
- `src/index.ts`
- `src/shutdown.ts` or a new small lifecycle helper
- possibly `src/triggers/api.ts` if the chosen transport is REST
- focused tests under `test/`

Human checkpoint is required before implementation because this crosses
persistence, shutdown, and iii-engine lifecycle boundaries, and may add a new
callable control surface.

## Focused Test Direction

Before production edits:
- RED test proving Windows stop performs a flush/checkpoint before worker
termination, instead of relying only on `SIGTERM`.
- Test that worker flush happens before engine termination.
- Test timeout/failure behavior for pre-stop flush: either preserve and fail, or
force only with explicit warning when `--force` semantics apply.
- Preserve existing path tests: `test/engine-launch.test.ts`,
`test/cli-iii-config.test.ts`, and `test/runtime-config.test.ts`.

## Residual Uncertainty

- No live Windows repro was run in this environment.
- The exact lost artifact should be named by the failing test or harness:
observation KV rows, memory rows, BM25/vector snapshots, audit rows, or
in-flight hook writes.
- Current branch is behind newer `origin/main`; relevant stop/path files should
be rechecked before final implementation or PR work.
Loading
Loading