Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 12 additions & 7 deletions app/src-tauri/src/webview_apis/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -27,13 +27,18 @@
//!
//! ## Startup / port coordination
//!
//! The server picks its port at boot:
//! 1. If `OPENHUMAN_WEBVIEW_APIS_PORT` is set, try that port first.
//! 2. Else bind `127.0.0.1:0` and let the OS pick.
//!
//! Either way the resolved port is exposed via
//! [`resolved_port`] and pushed into the core sidecar's environment
//! as `OPENHUMAN_WEBVIEW_APIS_PORT` by `core_process::spawn_core`.
//! The server always binds `127.0.0.1:0` and lets the OS pick an
//! ephemeral port. The resolved port is exposed via [`resolved_port`]
//! and pushed into the core sidecar's environment as
//! `OPENHUMAN_WEBVIEW_APIS_PORT` by `core_process::spawn_core` so the
//! client side can find it.
//!
//! `OPENHUMAN_WEBVIEW_APIS_PORT` is an **output** of the bridge — it is
//! intentionally never read as input. Honouring a pre-existing value
//! was the cause of Sentry OPENHUMAN-TAURI-82 on Windows: a stale env
//! value left over from a prior run (or inherited from a parent
//! process) led the next launch to re-bind the exact same port and
//! fail with WSAEADDRINUSE (`os error 10048`).

pub mod router;
pub mod server;
Expand Down
88 changes: 78 additions & 10 deletions app/src-tauri/src/webview_apis/server.rs
Original file line number Diff line number Diff line change
Expand Up @@ -43,20 +43,24 @@ pub fn resolved_port() -> u16 {
/// Start the server. Idempotent: after the first successful call any
/// subsequent call is a no-op. Returns the bound port.
///
/// Port selection: if `PORT_ENV` is set and non-zero, bind that port
/// (caller gets a deterministic port across runs — useful in dev);
/// otherwise bind `127.0.0.1:0` and let the OS pick.
/// Port selection: always bind `127.0.0.1:0` and let the OS pick an
/// ephemeral port. The resolved port is then exported via `PORT_ENV`
/// (by the caller in `lib.rs`) so the core sidecar can discover it.
///
/// We deliberately ignore any pre-existing `PORT_ENV` value here:
/// honouring it caused Sentry OPENHUMAN-TAURI-82 on Windows — if a
/// previous run wrote `PORT_ENV=49342` into the user's environment
/// (or the env was inherited from a parent process / leftover dev
/// session), the next launch would attempt to re-bind that exact
/// port and fail with WSAEADDRINUSE / os error 10048 whenever the
/// socket was still held by another process or stuck in TIME_WAIT.
/// `PORT_ENV` is an *output* of the bridge, not an input.
pub async fn start() -> Result<u16, String> {
if STARTED.get().is_some() {
return Ok(resolved_port());
}

let requested = std::env::var(PORT_ENV)
.ok()
.and_then(|s| s.parse::<u16>().ok())
.unwrap_or(0);

let addr: SocketAddr = format!("127.0.0.1:{requested}")
let addr: SocketAddr = "127.0.0.1:0"
.parse()
.map_err(|e| format!("[webview_apis] bad addr: {e}"))?;
let listener = TcpListener::bind(addr)
Expand All @@ -69,7 +73,7 @@ pub async fn start() -> Result<u16, String> {
RESOLVED_PORT.store(port, Ordering::SeqCst);
let _ = STARTED.set(());

log::info!("[webview_apis] server listening on {bound}");
log::info!("[webview_apis] server listening on {bound} (OS-assigned ephemeral)");

let accept_handle = tokio::spawn(async move {
loop {
Expand Down Expand Up @@ -264,3 +268,67 @@ impl Response {
}
}
}

#[cfg(test)]
mod tests {
use super::*;

/// Regression test for Sentry OPENHUMAN-TAURI-82.
///
/// If `PORT_ENV` carries a stale value pointing at a port that is
/// already in use (the failure mode reported on Windows: a previous
/// run wrote `49342` into the env, then the same port was held by
/// another process / stuck in TIME_WAIT), `start()` must still
/// succeed by binding a fresh OS-assigned ephemeral port instead of
/// trying to re-bind the stale port.
// Single-threaded runtime: `std::env::set_var` mutates process-global
// state and is not thread-safe in Rust. Under the default multi-threaded
// tokio test runtime, threads spawned by the same test could observe
// the env between `set_var` and the restore. `current_thread` eliminates
// that intra-test window (cross-test races between OS threads from
// OTHER tests are still possible — see the save/restore pattern below
// for that part). Per graycyrus review on PR #1543.
#[tokio::test(flavor = "current_thread")]
async fn start_ignores_stale_port_env_and_binds_ephemeral() {
// Occupy a port so `PORT_ENV` points at something that would
// definitely fail if `start()` honoured it.
let blocker = TcpListener::bind("127.0.0.1:0")
.await
.expect("blocker bind");
let stale_port = blocker.local_addr().expect("blocker addr").port();
// Save+restore `PORT_ENV` so parallel tests in the same process
// don't see this test's mutation. (Per CodeRabbit feedback on PR
// #1543.) `std::env::set_var` is process-global; without the
// restore, an unrelated test asserting on `PORT_ENV` could observe
// `stale_port` and flake.
let prev_port_env = std::env::var(PORT_ENV).ok();
std::env::set_var(PORT_ENV, stale_port.to_string());

Comment thread
coderabbitai[bot] marked this conversation as resolved.
let bound = start()
.await
.expect("start should succeed despite stale PORT_ENV");

assert_ne!(
bound, stale_port,
"start() must pick a fresh ephemeral port, not the stale one in PORT_ENV"
);
assert_eq!(resolved_port(), bound);

// Hold `blocker` until after `start()` so the kernel definitely
// can't satisfy a bind on `stale_port` — defends against the
// exact race the Sentry issue describes.
drop(blocker);
// Note: `stop()` only aborts the accept loop — it does NOT (and
// cannot) reset the `STARTED` OnceLock. If a second test in this
// binary later calls `start()` it'll hit the idempotency
// early-return and silently observe this test's port. A future
// refactor to `AtomicBool`-based singleton would let `stop()`
// fully tear down. Tracked as graycyrus feedback on PR #1543;
// currently inert because this is the only test in the module.
stop();
match prev_port_env {
Some(v) => std::env::set_var(PORT_ENV, v),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] stop() does not reset the STARTED singleton — misleading teardown

STARTED is a OnceLock<()>. Once set it can never be unset. stop() aborts the accept-loop JoinHandle but does not touch STARTED, RESOLVED_PORT, or ACCEPT_LOOP. That means:

  • If a second test in the same binary later calls start(), it hits the early-return at the idempotency guard and silently gets back this test's port — not a freshly-bound one.
  • The stop() call gives a false impression that the server state is fully torn down.

Currently there's only one test so it doesn't cause a failure today, but it's a landmine.

Suggested fix — either:

  1. Remove the stop() call and add a comment explaining the singleton constraint:
// Note: stop() only aborts the accept loop; it does NOT reset
// the STARTED OnceLock. Any future test calling start() in the
// same binary will short-circuit and get this port back.
drop(blocker);
  1. Or, reset observable state under #[cfg(test)] inside stop():
pub fn stop() {
    // ... existing abort ...
    #[cfg(test)]
    RESOLVED_PORT.store(0, Ordering::SeqCst);
}

(Option 2 still can't reset OnceLock, so start() would still no-op. Fully resettable would need AtomicBool instead of OnceLock.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 44c155d6 — added inline comment documenting that stop() cannot reset the STARTED OnceLock. Currently inert (single test in module); a future test that calls start() again would observe this port. A full AtomicBool-based singleton refactor is the right move when a second test arrives; leaving as a flagged landmine vs. expanding scope here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 44c155d6 — added inline comment documenting that stop() cannot reset the STARTED OnceLock. Currently inert (single test in module); a future test that calls start() again would observe this port. A full AtomicBool-based singleton refactor is the right move when a second test arrives; leaving as a flagged landmine vs. expanding scope here.

None => std::env::remove_var(PORT_ENV),
}
}
}
Loading