feat: add integrated device leasing#890
Conversation
5c36131 to
5a70306
Compare
|
Size Report
Startup median (7 runs, lower is better):
Top changed chunks:
|
|
Review finding on the latest green #890 head: Blocking: src/cli/commands/connection-runtime.ts:115 and src/cli/commands/connection-runtime.ts:187 leave a newly allocated lease stranded if local remote-connection state cannot be written. materializeRemoteConnectionForCommand allocates or refreshes the remote lease, updates nextState with the lease id, then writes the local state file. If writeRemoteConnectionState throws, the catch only stops the newly prepared Metro cleanup and rethrows. The just-acquired lease remains active in the remote daemon, no session was opened/bound yet, and this client has no persisted state to release it later. For proxy/device-aware leases that means the selected device can stay busy until the 5 minute TTL. Please track when this materialization acquired a new lease and release that lease in the write-failure cleanup path, passing the same tenant/run/provider/client/device metadata. The existing write-failure test at src/tests/remote-connection.test.ts:1338 already covers Metro cleanup; it should also assert the newly allocated lease is released. Existing leases that were merely heartbeated should not be released. |
|
Still seeing the lease cleanup blocker on c8b99d6. The write-state failure path in materializeRemoteConnectionForCommand still catches writeRemoteConnectionState errors by stopping prepared Metro cleanup and rethrowing, but it does not release a fresh proxy lease returned by allocateOrReuseLease. So if allocation succeeds and the state write fails, the lease remains active until TTL. The new hardening commit updates selector propagation/scope checks, but I do not see a release-on-write-failure path or a regression test for that failure. |
cd62dd1 to
f66360a
Compare
Thermo-nuclear code-quality reviewReviewed the full diff (68 files, +3790/-595). The behavior and test coverage look solid, and a couple of extractions are genuinely good ( But this is the strict pass, and there are several structural issues I'd want addressed before merge. They mostly stem from one root cause: the "lease identity" concept is modeled ~10 different ways and re-stitched field-by-field at every layer, and the few helpers built to centralize that don't actually reach the boundaries that need them. 1. Blocking — the lease scope is one concept modeled as ~10 near-identical shapes, remapped by hand everywhereWe now have
The abstraction exists but doesn't earn its keep — we pay for the mapping layer and still hand-write the same object literal at each call site. Code-judo here: pick one canonical 2. Blocking —
|
|
E2E proxy leasing verification update (2026-06-26):
Fixes pushed in 131a793:
Validation:
|
|
Addressed the latest lease review in
Validation:
|
|
Critical cleanup follow-up pushed in What I double-checked:
Validation after the cleanup:
|
|
Addressed the agent-device-cloud review feedback in
Important boundary: upstream still cannot invent a hosted/cloud Validation:
|
The proxy connect profile is written to disk as a non-secret remote config, but it unconditionally copied `metroBearerToken` into that file, leaking the secret at rest. Mirror the cloud path, which keeps `daemonAuthToken` in-memory only: the token still flows through this connect via the returned flags, and later commands re-supply it via AGENT_DEVICE_METRO_BEARER_TOKEN. Extend the non-secret-profile test to assert the bearer token is absent from disk. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01VPa5Z9GBkeqoxVctC85N7e
releaseSessionLease + sessionStore.delete ran only on the happy path, after several awaits (app-log/perf/snapshot teardown, platform close dispatch, runner stop) that can throw. A failed close therefore stranded the device lease until the inactivity expiry. Wrap teardown in try/finally so ownership is always freed; the original error still propagates after finally. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01VPa5Z9GBkeqoxVctC85N7e
18963dd to
386abe7
Compare
|
Latest-head re-review on 386abe7: I do not see a remaining blocker. Checked the lease ADR and the production route: connect persists non-secret connection/identity metadata, proxy open resolves a concrete device before lease allocation, lease allocate/heartbeat/release carry tenant/run/provider/client/device scope, request admission and heartbeat run under the daemon request lock, open stores the admitted lease on the session, and close/disconnect release or clean up best-effort. The branch is clean against main and checks are green. The earlier local proxy E2E contention smoke covers the device-facing path I would require here. Residual risk: this is still broad routing/lifecycle work, so I would merge it with the existing proxy E2E evidence as the main confidence signal rather than relying on unit tests alone. |
Summary
Add integrated device leasing for shared remote/proxy access, so multiple agents can use one long-lived
agent-device proxywithout fighting over the same simulator/device or surfacing low-level runner ownership errors.Before this PR, remote/proxy clients could connect to the same host daemon but ownership was effectively enforced late by platform runner/helper locks. In practice, two sandboxed agents targeting the same iOS device could trip over errors like “iOS runner is already owned by another agent-device daemon”, and recovery often required manual cleanup or restarting the proxy.
After this PR, remote access has an explicit lease lifecycle:
connectstores the remote/proxy profile and stable client identity.openlazily resolves the target device and acquires a provider-aware device lease.openheartbeat the lease under the daemon request lock.closereleases the active session/device lease; inactive proxy leases expire after five minutes.DEVICE_LEASE_BUSY, including backend/provider/device metadata, instead of leaking into runner ownership failures.Example direct-proxy usage:
# Device host Mac: start one long-lived proxy agent-device proxy --port 4310If client B targets the exact device currently leased by client A, it gets a device-lease busy error and can retry after A closes or after the inactivity TTL expires. The proxy process itself is intended to stay up.
Implementation scope spans remote connection CLI/profile handling, daemon lease admission/session lifecycle, provider-aware device contention, generated non-secret proxy profiles, iOS runner lease diagnostics, docs, and focused regression coverage.
Touched-file count: 61.
Validation
Verified with focused lease/proxy coverage, static gates, and build:
pnpm exec vitest run src/daemon/__tests__/lease-context.test.ts src/daemon/__tests__/lease-registry.test.ts src/core/__tests__/lease-scope.test.ts src/__tests__/remote-connection.test.ts src/daemon/__tests__/lease-lifecycle.test.ts src/daemon/__tests__/request-execution-scope.test.ts src/platforms/ios/__tests__/runner-session.test.tspassed: 7 files / 142 tests.pnpm exec vitest run src/__tests__/remote-connection.test.ts src/daemon/__tests__/lease-registry.test.tspassed after the final cleanup: 2 files / 49 tests.pnpm formatpassed.pnpm check:quickpassed.pnpm check:fallow --base origin/mainpassed: no issues in 61 changed files.pnpm buildpassed.pnpm check:unitunit phase passed: 277 files / 2719 tests. The aggregate still exits intest/integration/smoke-daemon-clean.test.ts:40on the known daemon PID liveness assertion, outside the changed lease/proxy paths.Manual proxy verification exercised one host proxy with multiple sandboxed clients opening stock apps on separate iOS simulators, confirmed same-device contention returns
DEVICE_LEASE_BUSY, and confirmed close/reacquire works without restarting the proxy.