Ran MemFleet through a full end-to-end coordination exercise on a real two-machine deployment
(one engine, agents reaching it over a private network), driven entirely through the public
fleet_* MCP tools. Two things to report: the fleet fixes from the v0.6.17/v0.6.30 round hold
up under real multi-process use, and two smaller edges remain.
The earlier fleet fixes hold (confirmation)
The fleet issues fixed in v0.6.17 and re-confirmed in v0.6.30 all behave correctly on a current
deployment, exercised cross-process rather than in a single demo:
- Lease lifecycle is solid across every transition I exercised: an exclusive claim never
double-grants a second agent on the same scope (it queues, state=requested), renew extends
it, a higher-priority request preempts a lower one, and releasing auto-grants the next queued
waiter.
- Already-running agents see each other in real time. A second client process, started
separately, published an intent that the first process then saw live through its preflight,
with the natural-language assignment attached. The audit attributed each action to its own
session. That is the "refresh before every decision" fix working across processes.
- Recording edits classifies conflicts into additive / overlap / destructive; a destructive
overlap raises a Class C escalation with a mediation request and a suggested partition, and
resolving it propagates the right per-agent directive (the winner reads "proceed", the other
reads "defer"). The durable audit trail captures every step with actor, agent, and timestamp.
27 of 28 coordination operations behaved as documented. Good shape. The one that did not, plus
one restart edge, are below.
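The lease transitions confirmed above can be modeled as a small state machine. The sketch below is a hypothetical Python model, not MemFleet's implementation. `LeaseTable`, the TTL value, the state name "granted", and the rule that a strictly higher priority revokes the current holder (rather than only jumping the queue) are all assumptions. Only the observed behaviors come from the exercise: queue instead of double-grant (state=requested), renew extends, higher priority preempts, and release auto-grants the next waiter.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Lease:
    agent: str
    scope: str
    priority: int
    state: str = "requested"  # hypothetical states: requested, granted, released, preempted
    expires_at: float = 0.0


class LeaseTable:
    """Hypothetical model of exclusive per-scope leases (not MemFleet's code)."""

    def __init__(self, ttl: float = 30.0) -> None:
        self.ttl = ttl  # assumed lease duration; the real TTL is unknown
        self.holders: dict[str, Lease] = {}
        # Per-scope wait queues, ordered by higher priority first, then arrival.
        self.queues: dict[str, list[tuple[int, int, Lease]]] = {}
        self._arrival = itertools.count()

    def _grant(self, lease: Lease, now: float) -> None:
        lease.state = "granted"
        lease.expires_at = now + self.ttl
        self.holders[lease.scope] = lease

    def claim(self, agent: str, scope: str, priority: int, now: float) -> Lease:
        lease = Lease(agent, scope, priority)
        holder = self.holders.get(scope)
        if holder is None:
            self._grant(lease, now)
        elif priority > holder.priority:
            # Assumption: a strictly higher priority revokes the current holder.
            holder.state = "preempted"
            self._grant(lease, now)
        else:
            # Never double-grant: the request waits in state "requested".
            entry = (-priority, next(self._arrival), lease)
            heapq.heappush(self.queues.setdefault(scope, []), entry)
        return lease

    def renew(self, lease: Lease, now: float) -> None:
        if lease.state != "granted":
            raise ValueError(f"cannot renew a lease in state {lease.state!r}")
        lease.expires_at = now + self.ttl

    def release(self, lease: Lease, now: float) -> Lease | None:
        if self.holders.get(lease.scope) is not lease:
            raise ValueError("only the current holder can release a lease")
        lease.state = "released"
        del self.holders[lease.scope]
        queue = self.queues.get(lease.scope)
        if queue:
            _, _, waiter = heapq.heappop(queue)
            self._grant(waiter, now)  # auto-grant the next queued waiter
            return waiter
        return None
```

Under these assumptions, a second equal-priority claim on the same scope stays "requested" until the holder releases, at which point it is granted without the waiter having to claim again.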
Two rough edges
- fleet_submit_verdict rejects an argument shape its sibling calls accept. It requires its
verdict argument as a structured object and rejects the same value sent as a double-encoded
JSON string, while fleet_publish_intent and fleet_record_episode explicitly accept that
double-encoded form (some MCP clients stringify object arguments). The practical effect: a
client that stringifies object args can drive the human-resolution path
(fleet_resolve_escalation, which takes string args) but not the agent-mediator path
(fleet_submit_verdict). Accepting the double-encoded string for verdict too would remove the
asymmetry and make the agent-judge flow reachable from those clients.
- A long-running headless deployment cannot be cleanly restarted (license gate). After the
initial grace period, restarting the daemon exits with "no offline license installed", and
under a KeepAlive supervisor it throttle-loops instead of coming back up. Running in an
offline/headless configuration did not bypass it. So a deployment that has been up for a
while cannot be restarted without intervention. Either let offline/headless mode start
without the check, or document the requirement loudly before someone hits it in production. A
side effect worth noting: a headless engine appears to run without a file watcher, so its
graph likely goes stale unless something re-indexes it; a documented "keep a headless
deployment fresh" path would help.
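A minimal sketch of the fix suggested for the first rough edge, assuming the tool handler receives arguments that the transport has already JSON-decoded once, so a stringifying client's verdict arrives as a Python str containing JSON. The helper name `coerce_object_arg` and the verdict key "winner" used in the example are hypothetical, not part of MemFleet. The idea is simply that fleet_submit_verdict would normalize `verdict` the same tolerant way its siblings are described as doing.

```python
import json
from typing import Any


def coerce_object_arg(name: str, value: Any) -> dict:
    """Accept an object argument as a dict or as a JSON string encoding one.

    Hypothetical helper; MemFleet's real parameter handling may differ.
    """
    if isinstance(value, dict):
        return value  # well-behaved client: structured object passed through
    if isinstance(value, str):
        # Stringifying client: the object was serialized to a JSON string.
        try:
            decoded = json.loads(value)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{name}: string is not valid JSON ({exc.msg})") from exc
        if isinstance(decoded, dict):
            return decoded
        raise ValueError(f"{name}: expected a JSON object, got {type(decoded).__name__}")
    raise ValueError(f"{name}: expected an object or a JSON-encoded object string")
```

A handler would call it as `verdict = coerce_object_arg("verdict", args["verdict"])` before validating the verdict's fields, so both client styles reach the same validation path and still get a clear error for genuinely malformed input.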
(One already-open item, the opaque "bad intent" error on generic episode metadata, is tracked
separately and not repeated here.)
Net
The coordination model is well thought through, and the most load-bearing result is that the
prior fixes hold up under real cross-process, two-machine use: 27 of 28 operations behaved
exactly as documented. The two edges above are the agent-mediator verdict argument shape and
the headless restart/freshness story. Detail on any of these available on request.