nvme_driver: only support keepalive for a certain class of devices #3415
gurasinghMS wants to merge 18 commits into microsoft:main
Conversation
Pull request overview
This PR narrows NVMe "keepalive" (servicing without device reset) behavior in Underhill to only a specific class of devices, as a mitigation for known incompatibilities with certain devices.
Changes:
- Introduces a helper to decide whether a device is “keepalive-compatible” based on its identifier.
- During servicing shutdown, only enables keepalive (do-not-reset / skip shutdown) for compatible devices; logs when keepalive is requested but disabled per-device.
- During servicing save, skips persisting NVMe state for incompatible devices.
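Based on the review thread below, the new helper gates on the VPCI instance GUID containing `c05b`. A minimal sketch of that heuristic follows; the function body here is an assumption, not the actual OpenVMM implementation (which lives in `nvme_manager/mod.rs`):

```rust
/// Sketch of the keepalive gate, assuming the `c05b` substring heuristic
/// described in the review comments below. Not the real implementation.
pub(crate) fn is_nvme_keepalive_compatible(pci_id: &str) -> bool {
    // ASAP devices are identified by the VPCI instance GUID containing `c05b`.
    pci_id.contains("c05b")
}

fn main() {
    // A c05b device keeps keepalive; anything else falls back to reset.
    assert!(is_nvme_keepalive_compatible("c05b1234-0000-0000-0000-000000000000"));
    assert!(!is_nvme_keepalive_compatible("deadbeef-0000-0000-0000-000000000000"));
    println!("ok");
}
```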
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| openhcl/underhill_core/src/nvme_manager/mod.rs | Adds a helper function to gate keepalive compatibility based on an identifier heuristic. |
| openhcl/underhill_core/src/nvme_manager/manager.rs | Applies the compatibility gate to shutdown options and to which devices participate in save/restore. |
```rust
// nvme_keepalive is received from host but it is only valid
// when memory pool allocator supports save/restore. Further,
// as a partial mitigation for known incompatibilities between
// keepalive and NVMe Direct v2 devices, we only honor
// keepalive for ASAP devices (identified by the VPCI
// instance GUID containing `c05b`).
let host_requested_keepalive =
    nvme_keepalive && self.context.save_restore_supported;
let device_keepalive =
    host_requested_keepalive && is_nvme_keepalive_compatible(&pci_id);
if host_requested_keepalive && !device_keepalive {
    tracing::info!(
        %pci_id,
        "disabling nvme keepalive for non-ASAP device; \
         falling back to reset-on-servicing"
    );
}
```
This is good for a proof of concept and for evaluating this fix. As discussed offline, you'll want to use device identity data to make this analysis more robust.
Cool, thanks for giving this a try. Please make sure to write a CI test that validates this works when there are two devices: device (a) that supports keepalive and device (b) that does not. Hopefully that will be rather quick, and it will catch basic problems in this path.
```rust
do_not_reset: nvme_keepalive
    && self.context.save_restore_supported
    && is_nvme_keepalive_compatible(&pci_id),
skip_device_shutdown: nvme_keepalive
    && self.context.save_restore_supported
    && is_nvme_keepalive_compatible(&pci_id),
```
I don't follow. I think the code's logic is correct.
This is interesting; it makes me want to think twice about it, because when my local Copilot first wrote this code it also decided to use !is_nvme_keepalive_compatible(&pci_id) here, and I had corrected it.
```rust
.filter_map(|(pci_id, driver)| {
    if is_nvme_keepalive_compatible(pci_id) {
        Some((pci_id.clone(), driver.client().clone()))
    } else {
        tracing::info!(
            %pci_id,
            "skipping save of nvme device; \
             keepalive disabled for this device"
        );
        None
    }
})
```
This makes sense. I don't know how long the logging prolongs the time inside the lock (which lock? I don't see locks in this function), so I don't know if we should fix this.
I don't know what lock it is talking about either. But unless there are hundreds of devices, which there shouldn't be, I think this is fine. @alandau you ok moving forward with this?
alandau left a comment
Looks good, a few nits inline.
Do we want to allow keepalive only on c05b, or find out the PCI ID for the problematic hardware and disable keepalive only for it?
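The alternative raised here, deny-listing only the problematic hardware rather than allow-listing `c05b`, might look like the following. The vendor/device IDs are placeholders; the actual problematic part is not named in the PR:

```rust
/// Hypothetical deny-list variant of the gate: keepalive stays enabled for
/// everything except known-incompatible vendor/device pairs.
fn is_keepalive_denied(vendor_id: u16, device_id: u16) -> bool {
    // Placeholder IDs; the real problematic hardware was not identified here.
    const DENY_LIST: &[(u16, u16)] = &[(0xabcd, 0x1234)];
    DENY_LIST.contains(&(vendor_id, device_id))
}

fn main() {
    assert!(is_keepalive_denied(0xabcd, 0x1234));
    // The Microsoft vendor/device pair from the test diff is not denied.
    assert!(!is_keepalive_denied(0x1414, 0x00a9));
    println!("ok");
}
```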
…over, it will tear down that device state and start again ... at least it should
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
openhcl/underhill_core/src/nvme_manager/manager.rs:1
- The restore path still iterates every saved NVMe entry and always passes Some(&disk.driver_state), even when the PCI ID is not keepalive-compatible. That means a servicing upgrade from an older build that saved all NVMe controllers will still attempt to restore state for non-`c05b` devices instead of forcing them through the reset path. The save-side filter added in this PR only protects states produced by the new code, not states produced by the previous version.
// Copyright (c) Microsoft Corporation.
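A restore-side guard that would close the gap Copilot describes might look like the following. This is a hypothetical sketch, not part of the PR, which only filters on the save side:

```rust
/// Hypothetical restore-side guard: drop saved state for devices the
/// current gate rejects, forcing them through the reset path instead.
fn restore_state_for<'a, S>(pci_id: &str, saved: Option<&'a S>) -> Option<&'a S> {
    // Stand-in for the gate; the c05b heuristic is from the review thread.
    fn is_nvme_keepalive_compatible(pci_id: &str) -> bool {
        pci_id.contains("c05b")
    }
    if is_nvme_keepalive_compatible(pci_id) { saved } else { None }
}

fn main() {
    let state = 42u32;
    // A compatible device keeps its saved state; an incompatible one resets.
    assert_eq!(restore_state_for("c05b-dev", Some(&state)), Some(&42));
    assert_eq!(restore_state_for("other-dev", Some(&state)), None::<&u32>);
    println!("ok");
}
```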
```rust
vm.restart_openhcl(igvm_file.clone(), flags).await?;

agent.ping().await?;

// The non-c05b device must issue CREATE_IO_COMPLETION_QUEUE after
// servicing because its keepalive was downgraded to reset.
CancelContext::new()
    .with_timeout(Duration::from_secs(60))
    .until_cancelled(no_keepalive_create_seen_recv)
    .await
```
```rust
source: NvmeSpawnerError,
}

/// Device prefix for NVMe devices that are compatible with keepalive.
/// Returns whether the given PCI ID corresponds to an NVMe device that is
/// compatible with keepalive.
/// DEV_NOTE: This is a weak, heuristic-based approach which is not ideal
/// but is necessary.
pub(crate) fn is_nvme_keepalive_compatible(pci_id: &str) -> bool {
```
LGTM
… figure out keepalive info
```rust
// Read the PCI vendor/device IDs once and cache the resulting
// keepalive compatibility flag on the `NvmeDriverManager`.
let keepalive_compatible = is_nvme_keepalive_compatible(&pci_id);
```
@alandau do you know if the device would be resolved at this point? If the device has not arrived back and reading the config space leads to a stall, it could impact boot times (which is no better than not doing keepalive at all).
I don't. But I remember at some point we ask VfioDevice to restore and it has to go to /sys to do its magic. You can try seeing if it's doing it past this point or not.
```rust
let keepalive_compatible = is_nvme_keepalive_compatible(&pci_id);

let driver = NvmeDriverManager::new(
    &context.driver_source,
    &pci_id,
    context.vp_count,
    context.save_restore_supported,
    context.save_restore_supported && keepalive_compatible,
```
```rust
#[openvmm_test(openhcl_linux_direct_x64 [LATEST_LINUX_DIRECT_TEST_X64])]
async fn servicing_keepalive_per_device_gate(
    config: PetriVmBuilder<OpenVmmPetriBackend>,
    (igvm_file,): (ResolvedArtifact<impl petri_artifacts_common::tags::IsOpenhclIgvm>,),
) -> Result<(), anyhow::Error> {
```
```rust
.with_hardware_config_fault(
    HardwareConfigFaultConfig::new()
        .with_vendor_id(0x1414)
        .with_device_id(0xb111),
```
```rust
let hardware_config_fault = fault_configuration.hardware_config_fault.take();
let vendor_id = hardware_config_fault
    .and_then(|f| f.vendor_id)
    .unwrap_or(VENDOR_ID);
let device_id = hardware_config_fault
    .and_then(|f| f.device_id)
    .unwrap_or(0x00a9);
```
LGTM, a few small ones inline
alandau left a comment
Ohh, apparently commenting on the discussion tab doesn't submit pending line comments.
```rust
use nvme_resources::fault::AdminQueueFaultBehavior;
use nvme_resources::fault::AdminQueueFaultConfig;
use nvme_resources::fault::FaultConfiguration;
use nvme_resources::fault::HardwareConfigFaultConfig;
```
```rust
match guard.entry(pci_id.clone()) {
    hash_map::Entry::Occupied(_) => unreachable!(), // We checked above that this entry does not exist.
    hash_map::Entry::Vacant(entry) => {
        let keepalive_compatible = is_nvme_keepalive_compatible(&pci_id);
```
```rust
let mut devices_to_save: HashMap<String, NvmeDriverManagerClient> = self
    .context
    .devices
    .write()
    .iter()
```
This reverts commit 4efda83.
No description provided.