openvmm_entry: add VM attach commands #3403
Conversation
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a mesh-backed attach socket and new top-level CLI commands (`run`, `attach`, `inspect`) so that later clients can connect to and manage an already-running OpenVMM instance without taking ownership of VM lifetime.
Changes:
- Introduces an attach protocol (`vm_connect`) and attach listener lifecycle in `VmController` (`--mesh-listen`).
- Refactors VM ownership into `VmController` (`CurrentVm`, server-mode `CreateVm`/`TeardownVm`) to share attach resources across CLI and ttrpc/grpc modes.
- Updates CLI parsing to support subcommands while preserving implicit `run`, and documents the new command shape in the Guide.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| openvmm/openvmm_entry/src/vm_controller.rs | Centralizes VM ownership, adds attach/connect and server-mode VM create/teardown RPCs, and manages attach listener lifecycle. |
| openvmm/openvmm_entry/src/vm_connect.rs | Adds the mesh-based attach request/response protocol and a background listener task. |
| openvmm/openvmm_entry/src/ttrpc/mod.rs | Switches ttrpc/grpc service to create/teardown VMs via VmController RPCs and wires controller channels into the worker. |
| openvmm/openvmm_entry/src/repl.rs | Adds “detach” vs “quit VM” behavior for REPL exits to support attach mode. |
| openvmm/openvmm_entry/src/meshworker.rs | Exposes process_mesh() for features requiring a process mesh (external listeners). |
| openvmm/openvmm_entry/src/lib.rs | Adds attach/inspect command handling, connect-to-VM helper, and refactors server-mode startup around VmController. |
| openvmm/openvmm_entry/src/cli_args.rs | Introduces top-level subcommands and legacy parsing fallback, adds --mesh-listen, and extends parsing tests. |
| Guide/src/reference/openvmm/management/cli.md | Documents new subcommand layout and --mesh-listen behavior/security expectations. |
```rust
/// clients.
///
/// Returns a handle that must be kept alive. Call [`AttachListener::shutdown`]
/// to stop accepting, shut down the mesh, join the listener task, and remove
```
The doc comment claims `AttachListener::shutdown` will "shut down the mesh", but the implementation only signals the task to stop and then removes the socket file; it never calls `mesh.shutdown()` (and it doesn't even have access to a mesh handle). Please update the comment to reflect what actually happens (stop accepting + join task + remove socket).
```diff
-/// to stop accepting, shut down the mesh, join the listener task, and remove
+/// to stop accepting new connections, join the listener task, and remove
```
```rust
pub struct VmController {
    pub(crate) driver: DefaultDriver,
    pub(crate) mesh: VmmMesh,
    pub(crate) vm_controller: mesh::Sender<VmControllerRpc>,
    pub(crate) current_vm: Option<CurrentVm>,
    pub(crate) attach_path: Option<PathBuf>,
    pub(crate) attach_listener: Option<vm_connect::AttachListener>,
    pub(crate) exit_on_vm_stop: bool,
```
`VmController` now holds a `vm_controller: mesh::Sender<VmControllerRpc>` to its own RPC channel. Because at least one sender is always kept alive by the controller itself, `rpc_recv.next().await` will never return `None`, so `Event::RpcClosed` can't trigger when the REPL/stdin thread exits without sending `Quit`. In `run_control_inner` the main task awaits `controller_task`, so this can hang indefinitely with a still-running VM.

Consider removing the self-referential sender from `VmController` (or ensuring it is dropped once the attach listener is set up), so that dropping the last external client sender can still close the channel and allow the controller to stop/exit as before.
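A minimal sketch of the hazard and one possible fix, using tokio's mpsc in place of the mesh channel (assumed analogous here: the receiver only completes once every sender is dropped):

```rust
use tokio::sync::mpsc;

struct Controller {
    // Keeping our own sender alive means `rx.recv()` below can never
    // yield None, no matter how many external clients go away.
    self_tx: Option<mpsc::Sender<u32>>,
    rx: mpsc::Receiver<u32>,
}

impl Controller {
    async fn run(mut self) {
        // Drop the self-referential sender once setup (e.g. wiring up the
        // attach listener) is done, so the channel can still close when
        // the last external client sender is dropped.
        let _ = self.self_tx.take();
        while let Some(msg) = self.rx.recv().await {
            println!("rpc: {msg}");
        }
        println!("all senders dropped; controller exits");
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(8);
    let controller = Controller {
        self_tx: Some(tx.clone()),
        rx,
    };
    drop(tx); // the last "external" client goes away
    controller.run().await; // terminates only because run() dropped self_tx
}
```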
```rust
async fn teardown_current_vm_after_worker_stop(&mut self) {
    if let Some(mut current_vm) = self.current_vm.take() {
        if let Some(mut vnc) = current_vm.vnc_worker.take() {
            vnc.stop();
            let _ = vnc.join().await;
        }
        if let Some(mut gdb) = current_vm.gdb_worker.take() {
            gdb.stop();
            let _ = gdb.join().await;
        }
    }
```
`teardown_current_vm_after_worker_stop()` drops `current_vm.vm_worker` without ever calling `join()`. `WorkerEvent::Stopped`/`Failed` only indicates an event was observed; `WorkerHandle::join()` is the only place that actually waits for the worker event stream to terminate and propagates failure. Please join the VM worker here (or reuse `stop_current_vm()` with a no-op stop) so the worker is fully reaped and errors aren't silently lost.
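One possible shape for the fix, reusing only the `WorkerHandle` calls already visible in `stop_current_vm()` (a sketch, not the PR's actual code; the worker has already stopped, so `stop()` is skipped but `join()` still reaps it and surfaces failures):

```rust
async fn teardown_current_vm_after_worker_stop(&mut self) {
    if let Some(mut current_vm) = self.current_vm.take() {
        // The worker already observed Stopped/Failed, but join() is still
        // needed to drain the event stream and propagate any error.
        if let Err(err) = current_vm.vm_worker.join().await {
            tracing::error!(
                error = err.as_ref() as &dyn std::error::Error,
                "vm worker join failed"
            );
        }
        if let Some(mut vnc) = current_vm.vnc_worker.take() {
            vnc.stop();
            let _ = vnc.join().await;
        }
        if let Some(mut gdb) = current_vm.gdb_worker.take() {
            gdb.stop();
            let _ = gdb.join().await;
        }
    }
}
```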
```rust
    transport,
})
.await?;
// Signal the the parent process that the server is ready.
```
The comment has a duplicated word: "Signal the the parent process". Please fix to "Signal the parent process" to avoid confusion in this startup/readiness path.
```diff
-// Signal the the parent process that the server is ready.
+// Signal the parent process that the server is ready.
```
```rust
/// Start a mesh listener that hands out VM control channels to connecting
/// clients.
///
/// Returns a handle that must be kept alive. Call [`AttachListener::shutdown`]
/// to stop accepting, shut down the mesh, join the listener task, and remove
/// the socket file.
pub async fn start_attach_listener(
    mesh: &mesh_process::Mesh,
    driver: &impl Spawn,
    path: &Path,
    vm_controller: mesh::Sender<VmControllerRpc>,
) -> anyhow::Result<AttachListener> {
```
The doc comment for `start_attach_listener` says `AttachListener::shutdown` will "shut down the mesh", but the implementation only stops the listener task and removes the socket file. Please update the comment to match the actual behavior (or extend `shutdown` to also shut down the mesh if that is the intent).
```rust
impl VmConnectResponse {
    pub fn into_repl_resources(self) -> repl::ReplResources {
        let (send, recv) = mesh::channel();
        drop(send);
        repl::ReplResources {
            vm_rpc: self.vm_rpc,
            vm_controller: self.vm_controller,
            vm_controller_events: recv,
            scsi_rpc: self.scsi_rpc,
            nvme_vtl2_rpc: self.nvme_vtl2_rpc,
            shutdown_ic: self.shutdown_ic,
            kvp_ic: self.kvp_ic,
            console_in: None,
            has_vtl2: self.has_vtl2,
            quit_behavior: repl::ReplQuitBehavior::Detach,
        }
    }
```
`VmConnectResponse::into_repl_resources` constructs `vm_controller_events` as a receiver whose sender is immediately dropped. This means an attached REPL will never receive `VmControllerEvent::WorkerStopped`/`GuestHalt` events, so `run_repl` can't automatically react when the VM stops (it will only exit when the user types `quit`). Consider extending the attach protocol to provide a real event stream (e.g., a per-attach subscription/broadcast receiver) or otherwise signal disconnect/VM termination to the REPL.
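A sketch of one way such a subscription could look. All names here are illustrative (not the PR's actual types), and it assumes `VmControllerEvent` is `Clone` and can be carried over a mesh channel:

```rust
struct AttachEventFanout {
    // One sender per attached client; hypothetical field, not in the PR.
    senders: Vec<mesh::Sender<VmControllerEvent>>,
}

impl AttachEventFanout {
    /// While servicing a Connect RPC: mint a fresh channel and return the
    /// receiver in VmConnectResponse instead of a pre-closed one.
    fn subscribe(&mut self) -> mesh::Receiver<VmControllerEvent> {
        let (send, recv) = mesh::channel();
        self.senders.push(send);
        recv
    }

    /// When the controller observes WorkerStopped / GuestHalt: fan the
    /// event out to every attached REPL.
    fn broadcast(&self, event: &VmControllerEvent) {
        for sender in &self.senders {
            sender.send(event.clone());
        }
    }
}
```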
```rust
child.wait()?;
let (status_send, status_recv) = mesh::oneshot();
std::thread::Builder::new()
```
Why do we need to do this whole dance just to wait on the child process?
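Presumably because `std::process::Child::wait()` is a blocking call: running it directly on the async driver would stall the executor, so it gets parked on a dedicated thread and bridged back through a oneshot. A sketch of that pattern (error handling elided; `"child-wait"` is an illustrative thread name):

```rust
// Blocking wait on a dedicated thread, bridged into async via a oneshot.
let (status_send, status_recv) = mesh::oneshot();
std::thread::Builder::new()
    .name("child-wait".into())
    .spawn(move || {
        // child.wait() blocks this thread, not the async executor.
        status_send.send(child.wait());
    })?;
// Later, in async context:
let exit_status = status_recv.await??;
```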
> will reject the command. Use `--pidfile <PATH>` to write the server process ID
> on startup and remove it on clean exit.
>
> The positional `SOCKETPATH` is the management RPC server socket. This is
Is it possible/desired to expose only a mesh socket? I guess that wouldn't have much value, since you couldn't launch a vm through it.
You could imagine only exposing a mesh socket and adding a way to specify the starting params via the CLI. I think we might get there eventually, but not yet. I can also imagine having a single socket support both gRPC and mesh (but not ttrpc and mesh, since ttrpc doesn't have any kind of protocol negotiation).
```rust
.with_context(|| format!("failed to bind to socket {}", path.display()))?;

let transport = match opt.transport {
    #[cfg(feature = "grpc")]
```
Can you double-check that these uses of these features are still cargo-hack clean?
```rust
        vm_controller.send(VmControllerRpc::Quit);
        break;
    }
    ReplQuitBehavior::Detach => {
```
Is there any way to kill the VM (send VmControllerRpc::Quit) when we're in detach behavior?
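One conceivable shape for that (illustrative; a `kill` command is not part of the PR): keep `quit` meaning detach, but let an attached REPL explicitly request teardown.

```rust
// Hypothetical REPL dispatch: `kill` tears the VM down even when attached,
// while `quit` keeps the detach behavior.
"kill" => {
    vm_controller.send(VmControllerRpc::Quit);
    break;
}
"quit" | "q" => match quit_behavior {
    ReplQuitBehavior::Quit => {
        vm_controller.send(VmControllerRpc::Quit);
        break;
    }
    ReplQuitBehavior::Detach => break,
},
```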
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
openvmm/openvmm_entry/src/vm_controller.rs:166
- The `VmController::run` doc comment says the controller runs "until the VM stops", but in server mode `exit_on_vm_stop` is false and the controller keeps running after VM stop/teardown to service future `CreateVm`/`Connect` calls. Update the comment to reflect the actual lifetime conditions (e.g., quit/RPC channel closed).
```rust
impl VmController {
    /// Run the controller, processing RPCs and worker events until the VM
    /// stops or the caller (REPL or ttrpc server) sends Quit.
    pub async fn run(
```
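One possible rewording that matches the server-mode lifetime:

```rust
/// Run the controller, processing RPCs and worker events until Quit is
/// received or the last RPC sender is dropped. In server mode the
/// controller outlives individual VMs and keeps servicing CreateVm and
/// Connect requests after a VM stops.
pub async fn run(
```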
```rust
/// to stop accepting, shut down the mesh, join the listener task, and remove
/// the socket file.
```
`AttachListener::shutdown`'s doc comment says it will "shut down the mesh", but the implementation only stops the listener task and removes the socket file (mesh shutdown happens elsewhere). Please update the comment to match the actual behavior to avoid confusion about ownership/lifetime.
```diff
-/// to stop accepting, shut down the mesh, join the listener task, and remove
-/// the socket file.
+/// to stop accepting new connections, join the listener task, and remove the
+/// socket file.
```
```rust
// Signal the the parent process that the server is ready.
pal::close_stdout().context("failed to close stdout")?;
// Signal the the parent process that the server is ready.
```
Minor grammar: the comment says "Signal the the parent process" (double "the").
```diff
-// Signal the the parent process that the server is ready.
+// Signal the parent process that the server is ready.
```
```rust
async fn stop_current_vm(&mut self) {
    let Some(mut current_vm) = self.current_vm.take() else {
        return;
    };

    current_vm.vm_worker.stop();
    if let Err(err) = current_vm.vm_worker.join().await {
        tracing::error!(
            error = err.as_ref() as &dyn std::error::Error,
            "vm worker join failed"
        );
    }
```
`CurrentVm` now owns `scsi_rpc`/`nvme_vtl2_rpc` (and IC RPC senders). `stop_current_vm()` waits for `vm_worker.join()` while those senders are still alive inside `current_vm`. Given the existing REPL workaround that explicitly drops `scsi_rpc`/`nvme_vtl2_rpc` before quitting (repl.rs notes they can hold up worker stop), this change risks reintroducing a shutdown hang/deadlock because the controller keeps the channels alive during join. Consider dropping/taking these RPC senders (and any other shutdown-blocking resources) before awaiting `vm_worker.join()`, or avoid storing these senders in `CurrentVm` across shutdown.
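A sketch of the suggested ordering (assuming the RPC senders are stored as `Option`s in `CurrentVm`; only the fields visible in the quoted code are shown):

```rust
async fn stop_current_vm(&mut self) {
    let Some(mut current_vm) = self.current_vm.take() else {
        return;
    };

    // Drop the senders that can hold up worker stop *before* awaiting
    // join(), mirroring the existing REPL workaround.
    drop(current_vm.scsi_rpc.take());
    drop(current_vm.nvme_vtl2_rpc.take());

    current_vm.vm_worker.stop();
    if let Err(err) = current_vm.vm_worker.join().await {
        tracing::error!(
            error = err.as_ref() as &dyn std::error::Error,
            "vm worker join failed"
        );
    }
}
```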
```rust
};

let message = if command == "run" {
    "VM launch options must be passed after the `run` subcommand.\n\nUse:\n openvmm run [VM launch options]\n\nFor example:\n openvmm run --memory 4\n\nIf you intended `run` as a value, pass it to the option that should receive it, for example:\n openvmm --memory 4 -c run\n"
```
In the run ordering hint, the example `openvmm run --memory 4` is misleading: `--memory` accepts raw bytes when no suffix is provided (so `4` means 4 bytes, not 4 GiB). Consider using `4GB`/`4G` (or another realistic size) in the example to avoid accidental tiny-memory launches.
| "VM launch options must be passed after the `run` subcommand.\n\nUse:\n openvmm run [VM launch options]\n\nFor example:\n openvmm run --memory 4\n\nIf you intended `run` as a value, pass it to the option that should receive it, for example:\n openvmm --memory 4 -c run\n" | |
| "VM launch options must be passed after the `run` subcommand.\n\nUse:\n openvmm run [VM launch options]\n\nFor example:\n openvmm run --memory 4GB\n\nIf you intended `run` as a value, pass it to the option that should receive it, for example:\n openvmm --memory 4GB -c run\n" |
```rust
// In non-optimized builds, clap uses an embarassing amount of stack space
// to construct the `Command` instance for `Options`, more than the Windows
// default of 1MB. This has been known since 2023:
```
Spelling: "embarassing" should be "embarrassing" in this comment.
```diff
-// In non-optimized builds, clap uses an embarassing amount of stack space
-// to construct the `Command` instance for `Options`, more than the Windows
-// default of 1MB. This has been known since 2023:
+// In non-optimized builds, clap uses an embarrassing amount of stack
+// space to construct the `Command` instance for `Options`, more than the
+// Windows default of 1MB. This has been known since 2023:
```
|
Is there any way to add a full integration test for this functionality? |
Running OpenVMM as a long-lived process currently ties interactive management to the original process or the ttrpc/grpc service path. This adds a mesh-backed attach socket so a later client can connect to the existing VM, open a REPL, or issue a one-shot inspect without taking ownership of VM lifetime.
The entry point now parses `run`, `attach`, and `inspect` as top-level commands while preserving implicit `run` for existing launch invocations. VM ownership is centralized in `VmController`, including server-mode VM creation and teardown, so both normal launches and ttrpc/grpc mode can expose the same attach resources. The Guide documents the new command shape, `--mesh-listen`, and the security and lifetime expectations for the socket.