Fix NVMe hot-swap LED stuck in FAILURE under VMD controllers #275
tasleson wants to merge 3 commits into
NVMe udev events may arrive with a virtual nvme-subsystem sysfs path (e.g. /sys/devices/virtual/nvme-subsystem/nvme-subsys1/nvme1n1) that cannot be resolved to a PCI controller. This causes block_device_init() to fail in _compare(), silently dropping add and remove events.

Add a devnode name fallback to _compare() so that virtual nvme-subsystem paths are matched to their corresponding block device. This restores udev event processing for NVMe hot-swap under VMD controllers.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
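The fallback described in this commit can be pictured as a name comparison on the last path component. The sketch below is only an illustration of that idea, not the code in the patch: `last_component()` and `devnode_name_matches()` are hypothetical helpers, and it assumes the stored block device exposes a device node path such as /dev/nvme1n1.

```c
#include <stdbool.h>
#include <string.h>

/* Return the last path component, e.g. "nvme1n1" for ".../nvme-subsys1/nvme1n1". */
const char *last_component(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/*
 * Hypothetical fallback matcher (not ledmon's actual _compare()):
 * treat a udev sysfs path and a stored device node as the same device
 * when their final path components are equal and non-empty.
 */
bool devnode_name_matches(const char *udev_syspath, const char *devnode)
{
	const char *a = last_component(udev_syspath);
	const char *b = last_component(devnode);

	return *a != '\0' && strcmp(a, b) == 0;
}
```

A name-only match like this would be used only after the regular sysfs path comparison fails, so devices that already resolve to a PCI controller keep their existing matching behavior.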
FAILED_DRIVE is intentionally sticky across sysfs scan cycles to prevent RAID member fault LEDs from flickering. However, this also prevents standalone NVMe drives from recovering after a hot-swap cycle, leaving the failure LED on permanently.

When a non-RAID device in FAILED_DRIVE state reappears in the sysfs scan, transition it to ADDED so the normal state machine can drive it back to healthy operation. RAID members remain sticky and require intervention from mdadm or another RAID tool to clear the fault.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
When ledctl walks sysfs for --list-slots, VMD PCI slot entries may persist after a drive is physically removed because the VMD controller maintains the PCI topology. This causes ledctl to report a device in the slot that is no longer present.

Add a stat() check on the device node to verify the block device actually exists before associating it with the slot.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
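A minimal sketch of the kind of stat() check this commit describes is shown below. `devnode_is_present()` is a hypothetical helper name chosen for illustration; the actual patch may structure the check differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (name is an assumption, not from the patch):
 * report whether the given device node currently exists and is a
 * block device.
 */
bool devnode_is_present(const char *devnode)
{
	struct stat st;

	if (devnode == NULL || stat(devnode, &st) != 0)
		return false;

	return S_ISBLK(st.st_mode);
}
```

A check like this narrows the window in which a removed drive is still reported, but it cannot close it: the device can disappear between the stat() call and the moment the slot is printed.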
	result->bl_device = get_block_device_from_sysfs_path(pci_slot->ctx,
							     pci_slot->address, true);
	if (result->bl_device) {
I don't think we should care. There is always a race window. We can still hit a moment between stat and snprintf, although of course the window is reduced.
I think that we should always trust the state we have saved by _sysfs_scan.
It might not be up to date, but in the worst case we will print a device that is gone. Not a big deal.
There will be more places like that, so I would stick with no handling.
(Unless I am missing something?)
		temp->ibpi = block->ibpi;
	}
} else if (temp->ibpi == LED_IBPI_PATTERN_FAILED_DRIVE &&
	   !temp->raid_dev) {
This requires a config option because we are changing legacy behavior. I would add something like:
"BLINK_PERSISTENT_FAIL_ON_READD = TRUE", and that should be the default.
We cannot change it as is because the risk is too big. I cannot predict how many deployments rely on this behavior. It would be especially harmful for users who rely on the failure pattern as an indication that a disk is misbehaving.
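One way to picture the gated behavior requested here is a small decision function. This is a sketch under stated assumptions, not ledmon code: the `pattern` enum is a simplified stand-in for the real IBPI patterns, `pattern_on_readd()` is a hypothetical name, and `persistent_fail` stands for the proposed BLINK_PERSISTENT_FAIL_ON_READD option, with true preserving the legacy sticky behavior.

```c
#include <stdbool.h>

/* Simplified stand-ins for ledmon's IBPI patterns (not the real enum). */
enum pattern { PATTERN_NORMAL, PATTERN_ADDED, PATTERN_FAILED_DRIVE };

/*
 * Hypothetical policy for a device that reappears in the sysfs scan.
 * persistent_fail mirrors the proposed config option: when true, a
 * failed device stays failed, which is the legacy behavior.
 */
enum pattern pattern_on_readd(enum pattern current, bool is_raid_member,
			      bool persistent_fail)
{
	if (current != PATTERN_FAILED_DRIVE)
		return current;

	/* RAID members always stay sticky; others only when configured. */
	if (is_raid_member || persistent_fail)
		return PATTERN_FAILED_DRIVE;

	return PATTERN_ADDED;
}
```

With the option defaulting to true, existing deployments would see no change, and only users who explicitly disable it would get the recovery transition for non-RAID devices.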
Root cause
NVMe udev events arrive with a virtual nvme-subsystem sysfs path (e.g. /sys/devices/virtual/nvme-subsystem/nvme-subsys1/nvme1n1), but ledmon stores block devices using their physical PCI sysfs path. The _compare() function in udev event handling calls block_device_init() on the virtual path, which fails because block_get_controller() cannot match a virtual path to any PCI controller. This silently drops all add and remove udev events for NVMe devices.

Without udev event processing, ledmon detects removal only through a timestamp mismatch in _send_msg(), which sets FAILED_DRIVE. On re-insertion the device reappears in the sysfs scan, but FAILED_DRIVE is intentionally sticky in _add_block() (to protect RAID members), so the state is never cleared. The udev add event that would normally break out of this via the ADDED → ONESHOT_NORMAL → UNKNOWN state machine is never matched.

Changes
Fix udev event matching: Add a devnode name fallback to _compare() so that virtual nvme-subsystem paths are matched to their corresponding block device.

Allow non-RAID recovery from FAILED_DRIVE: When a non-RAID device in FAILED_DRIVE state reappears in the sysfs scan, transition it to ADDED so the state machine can drive it back to normal. RAID members remain sticky and require explicit intervention regardless of whether the removal was intentional or caused by hardware. Note: this change may need to be placed behind a configuration setting.

Validate device node in ledctl slot reporting: Add a stat() check on the device node before associating a block device with a PCI slot, so ledctl --list-slots does not report a device that is no longer present.

This needs careful review and, ideally, testing against the user-supplied issue.
Resolves: #274