feat: enable parallel iOS simulator execution with per-device session tracking#3138
feat: enable parallel iOS simulator execution with per-device session tracking#3138qwertey6 wants to merge 1 commit intomobile-dev-inc:mainfrom
Conversation
|
CI note: The E2E failures on Android and iOS are pre-existing and unrelated to this PR:
Both failures reproduce on other unrelated PRs in the repo ( Build, unit tests, web E2E, and iOS XCTest runner tests all pass. |
dd74302 to
803cfba
Compare
This comment was marked as off-topic.
This comment was marked as off-topic.
|
Is the description of this PR correct?
I think sharding is a different use case from two parallel unrelated I've tidied up a couple of the linked issues - clearly duplicates. Oops. |
|
Good question — you're right that Two separate
I'll update the PR description to be more precise about which use case each fix addresses. Thanks for the nudge — and for cleaning up the dupes. |
803cfba to
7cd8f12
Compare
proksh
left a comment
There was a problem hiding this comment.
Thanks for tackling this @qwertey6 - parallel iOS is a real pain point and the three collision points are correctly identified.
Looking at the PR, every change is coordination machinery layered around three shared resources (the XCTest port on localhost, ~/.maestro/sessions, and the debug log directory) so that two processes can use them simultaneously without lying to each other. Each fix is mechanically correct but downstream of one underlying design choice: that those three resources are shared resources processes need to negotiate over. So the PR reads as a stack of small workarounds rather than one architectural change.
I'd suggest looking at a simpler architecture that removes most of this:
-
Ask the OS for a port.
ServerSocket(0).use { it.localPort }- no scanning, no7001..7128range, no usedPorts map, no--driver-host-portflag. Same TOCTOU exposure as the currentisPortAvailablecheck (microseconds, ephemeral range), dramatically simpler mechanism. -
Stop sharing session state across processes - and consider deleting
SessionStoreentirely (not "make it in-memory" - delete it). Within one Maestro JVM, two sessions never share a device:--shard-splitdistributes shards 1:1 onto distinct devices (runShardSuite readsdeviceIds[shardIndex], and validateDevices enforcesdevices >= shards), so the warm-reuse check is always false within one process. The shared file at~/.maestro/sessionsexists only to enable warm reuse across separately-launched CLI invocations on the same device - a CLI-iteration optimization worth a few seconds of cold-start. If warm reuse ever turns out to matter in practice, the right shape is a long-running daemon that owns drivers, not a shared file scribbled on by every CLI invocation.
The PID-suffixed log dir should stay - its unrelated to the workaround pattern.
Happy to talk through this if I'm missing a use case.
7cd8f12 to
58eacc2
Compare
qwertey6
left a comment
There was a problem hiding this comment.
@proksh — thanks for the detailed review. I agree with point 1 and disagree with point 2. Here's my reasoning:
Point 1: ServerSocket(0) — done
I've updated selectPort() to use ServerSocket(0). The 7001-7128 range, usedPorts ConcurrentHashMap, and isPortAvailable scanning are all removed. Much simpler. The --driver-host-port flag stays for the explicit-port use case (it still validates via isPortAvailable before using the user-specified port).
Point 2: Deleting SessionStore — I'd push back here
The SessionStore, connectToExistingSession, heartbeat loop, and shared ~/.maestro/sessions file are existing Maestro infrastructure that my changes fix — they don't add it. The per-device key format, self-exclusion bug, and file locking are bug fixes to code that was already shipping and already broken.
Deleting SessionStore would require rearchitecting two cross-process behaviors that currently depend on it:
1. Shutdown coordination (MaestroSessionManager line 129):
if (SessionStore.activeSessions().isEmpty()) {
session.close()
}Process A checks whether any other process is still using a device before closing. Without this, process A's shutdown hook calls session.close() → IOSDriver.close(), potentially killing a driver that process B is actively using. Removing SessionStore means either:
- Every process kills its driver on exit → breaks parallel execution (process A kills B's XCTest runner)
- No process kills its driver on exit → leaks orphaned xcodebuild/runner processes indefinitely
- A daemon owns driver lifecycle → correct, but significantly larger scope
2. Warm reuse across CLI invocations (connectToExistingSession):
Without it, every maestro test invocation pays the full cold-start penalty: extract build products, run xcodebuild test-without-building, poll for ~15-20s until the HTTP server responds. Warm reuse cuts this from ~15s to ~8s per device. For local iteration (run flow, tweak, run again), that's the difference between responsive and sluggish.
A long-running daemon is the right eventual architecture for warm reuse, but it's significantly larger scope than fixing the existing mechanism. My changes make the existing shared-state approach correct (per-device keys, self-exclusion, file locking) without introducing new architectural concepts.
Summary
I've adopted ServerSocket(0) for port selection (pushed). I'd keep SessionStore with the per-device fixes — it's existing infrastructure that serves real purposes, and deleting it introduces new bugs that would require a daemon to solve properly.
|
+1 on this. We're running parallel Maestro tests across multiple iOS simulators and hitting the exact port collision issue described here. Currently forced to run sequentially as a workaround, which significantly slows down our CI pipeline. Would love to see this merged! |
|
@qwertey6 Thanks for the pushback and for shipping I dug into I agree warm reuse is worth keeping and a daemon is out of scope. What I'd still push on is the shape of the cross-process state — the shared Alternative: turn
Migration is trivial — This addresses the underlying concern (processes shouldn't negotiate over shared mutable state) without losing warm reuse or scoping in a daemon. Should be smaller than what's in the PR today — One open question: am I missing a case where two processes legitimately need to share the same driver on the same device simultaneously? `--shard-split~ distributes 1:1 onto distinct devices within a JVM, and across CLI invocations the second one can just connect to the first's HTTP server without host-side coordination. If there's a case I'm missing, I'll drop the objection. |
|
I run |
Yes that will be 2 sessions. |
|
@proksh — I like the direction of per-device files. In fact, that's exactly what PR #3167 already implements — One concern with the So the question is whether this per-device port file (already in #3167) can fully replace
Two paths forward:
I'd lean toward A — ship the bug fixes now, iterate on the architecture next — but happy to go either way. |
proksh
left a comment
There was a problem hiding this comment.
One concern with the {port, pid, startedAt} JSON shape you proposed: PID liveness checks aren't cross-platform. ProcessHandle.of(pid) is Java 9+ (Maestro targets Java 8), and there's no reliable equivalent on Windows without JNA. The port probe alone is sufficient — if the port responds to /status, the runner is alive regardless of PID. If it doesn't, start fresh.
So the question is whether this per-device port file (already in #3167) can fully replace SessionStore, or whether there are cases where the heartbeat-based shared state is still needed. I believe it can replace it — the only cross-process coordination SessionStore provides is:
Warm reuse: "is a driver already running on this device?" → port probe handles this
Shutdown coordination: "should I close this driver?" → per-device file check handles this
Got it on PID / Java 8 / port probe + /status. That makes sense.
Re: replacing SessionStore with the per-device port approach in #3167: I’m aligned that warm reuse and shutdown hints can be driven by probe + on-disk port per deviceId, without heartbeats, as long as we define the shutdown cases clearly.
Two paths forward:
Option A: Merge this PR as-is (fixes the bugs in the existing SessionStore), then land #3167 next and use it as the foundation to refactor SessionStore out in a follow-up
Option B: Collapse #3138 and #3167 into a single PR that replaces SessionStore with per-device port files from the start
I'd lean toward A — ship the bug fixes now, iterate on the architecture next — but happy to go either way.
Lets go with option A. Prefer not to combine #3167 with this PR. This solution is already fixing a lot of edge cases we have.
Left a few minor comments.
| } | ||
|
|
||
| fun activeSessionsForDevice(platform: Platform, deviceId: String? = null): List<String> { | ||
| val devicePrefix = "${platform}_${deviceId ?: "unknown"}_" |
There was a problem hiding this comment.
We should avoid fallback to "unknown" here. This can cause 2 unrelated runs under one key.
Why is this deviceId nullable? Can we require non-null device key before writing session state.
| } | ||
|
|
||
| @Test | ||
| fun `shouldCloseSession returns true when no sessions remain for device even if other devices have sessions`() { |
There was a problem hiding this comment.
can we add a false case for this.
Two Android heartbeats on emulator-5554, delete one → shouldCloseSession(ANDROID, "emulator-5554") is false.
|
|
||
| val deviceASessions = sessionStore.activeSessionsForDevice(Platform.IOS, "device-A") | ||
|
|
||
| assertThat(deviceASessions).hasSize(1) |
There was a problem hiding this comment.
Can we also validate the key here -
assertThat(deviceASessions.single()).isEqualTo(expectedKey)
For future proofing activeSessionsForDevice.
| // DESIRED: should close because no Android sessions remain on this device | ||
| val shouldClose = sessionStore.shouldCloseSession(Platform.ANDROID, "emulator-5554") | ||
| assertThat(shouldClose).isTrue() | ||
| } |
There was a problem hiding this comment.
Can we add a test for pruneInactiveSessions?
To answer this more clearly @Fishbowler: Yes - two newSessions (two processes). Second one may reuse the same XCTest runner by skipping driver.open() when connectToExistingSession is set. So two sessions, often one driver. Correct me if I am wrong @qwertey6 |
|
Thanks! I'd certainly like that capability to be retained. |
|
@qwertey6 Let me know if you need anything from us. |
iOS simulators share the host's localhost, causing port collisions when
multiple Maestro processes target different sims simultaneously. Session
tracking was per-platform, so two processes on different devices would
interfere with each other's sessions.
Changes:
- Per-device session tracking: SessionStore keys are now
"{platform}_{deviceId}_{sessionId}" instead of "{platform}_{sessionId}"
- Add --driver-host-port CLI flag for explicit XCTest server port
- Auto-select available ports with isPortAvailable() check
- Refactor SessionStore from singleton to injectable class (DI)
- Add shouldCloseSession(platform, deviceId) for per-device shutdown
instead of global activeSessions().isEmpty()
- Add cross-process file locking to KeyValueStore (~/.maestro/sessions)
- Append PID to debug log directory to prevent parallel race
- Enable useJUnitPlatform() in maestro-cli (was missing)
- Add SessionStoreTest with 8 tests covering isolation and lifecycle
Verified: 3 iOS simulators + Android emulator running simultaneously,
all passing. Both --driver-host-port (explicit) and auto-port-selection
work correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
58eacc2 to
ab8ca1e
Compare
|
@proksh — addressed all four comments:
All pushed. |
Proposed changes
Parallel execution on iOS is completely broken. Two
maestro testprocesses targeting different iOS simulators collide on the XCTest HTTP server port and interfere with each other's sessions. This makes--shard-splitunusable on iOS and forces sequential execution.Impact
--shard-splitand multi-process execution now work on iOSThread.sleep(1000)hack from Fix Android Parallel/Shards Race Condition #1867 — the per-device session check eliminates the race condition it was working aroundRoot cause
Two bugs compound (same analysis as #2339 by @avinash-bharti):
hasActiveSessions()checks if any session exists on the same platform — regardless of device. Process B sees process A's session and skipsdriver.open(), connecting to the wrong device.What this PR does differently from #2339 and #2821
Builds on the same architectural approach (per-device session tracking +
--driver-host-port), with additional correctness fixes discovered during testing:--driver-host-portflaginitialDelay=0fires before session check, causing process to find itself and skipdriver.open(). Broke Android completely.~/.maestro/sessionshad noFileLock, concurrent processes could corrupt session datashouldCloseSession(platform, deviceId)replaces globalactiveSessions().isEmpty()isPortAvailable()with correct check-then-claim orderingChanges
SessionStore.kt— Refactored from singleton to injectable class. Key format:"{platform}_{deviceId}_{sessionId}". New methods:hasActiveSessionForDevice(),shouldCloseSession(),activeSessionsForDevice()MaestroSessionManager.kt— UsesSessionStore.default, per-device heartbeat/delete/closeKeyValueStore.kt— AddedFileLockfor cross-process safetyApp.kt+TestCommand.kt—--driver-host-portCLI optionTestCommand.kt—selectPort()withisPortAvailable()checkSocketUtils.kt—isPortAvailable()functionDebugLogStore.kt— PID in log directory name (Java 8 compatible viaManagementFactory)parent?.driverHostPortinstead ofnullVerification
--driver-host-portexplicit ports--shard-split=2with 2 flowsTesting
SessionStoreTest(per-device isolation, key format, lifecycle, shutdown)useJUnitPlatform()added tomaestro-cli/build.gradle.kts(was missing)Issues fixed
Closes #1853, #2556, #1485, #2082, #2703
Supersedes #2339 and #2821 — credit to @avinash-bharti and @omnarayan for the root cause analysis and architectural direction.