Skip to content

kernel_npem: add support for kernel NPEM driver#270

Open
stuarthayes wants to merge 7 commits into
md-raid-utilities:mainfrom
stuarthayes:kernel_npem
Open

kernel_npem: add support for kernel NPEM driver#270
stuarthayes wants to merge 7 commits into
md-raid-utilities:mainfrom
stuarthayes:kernel_npem

Conversation

@stuarthayes
Copy link
Copy Markdown

Add a new controller for the recently-added kernel NPEM/_DSM driver (in the kernel at drivers/pci/npem.c) that provides access to NPEM LEDs via sysfs. This allows access even if the kernel doesn't allow direct access to PCI config space (for NPEM), and also allows access to the PCI _DSM method of controlling PCIe SSD LEDs (PCI Firmware Specification, Revision 3.3, section 4.7 "_DSM Definitions for PCIe SSD Status LED").

The code was based on the npem controller code, but uses the sysfs interface rather than directly accessing the NPEM capability in PCI config space.

mtkaczyk
mtkaczyk previously approved these changes Oct 29, 2025
Copy link
Copy Markdown
Member

@mtkaczyk mtkaczyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thank you!

Copy link
Copy Markdown
Collaborator

@bkucman bkucman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Stuart ,

Thank you for your contribution, Nice work!
After the review, I have a few points to clarify and some minor corrections.

It seems that the NPEM kernel still operates per slots, so my question is: did you use the implemented slot tests for the "NPEM kernel"? If not, please run them and verify that everything works correctly.

Also, please rebase the branch.

Thanks!

Comment thread src/lib/block.c Outdated
Comment thread src/lib/block.c
Comment thread src/lib/libled.c Outdated
Comment thread src/lib/kernel_npem.h Outdated
Comment thread src/lib/kernel_npem.c Outdated
Comment thread src/lib/sysfs.c Outdated
Add a new controller for the recently-added kernel NPEM/_DSM driver
(in the kernel at drivers/pci/npem.c) that provides access to NPEM LEDs via
sysfs. This allows access even if the kernel doesn't allow direct access to
PCI config space (for NPEM), and also allows access to the PCI _DSM method
of controlling PCIe SSD LEDs (PCI Firmware Specification, Revision 3.3,
section 4.7 "_DSM Definitions for PCIe SSD Status LED").

The code was based on the npem controller code, but uses the sysfs
interface rather than directly accessing the NPEM capability in PCI config
space.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
A few cosmetic changes for kernel_npem.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Include LED_CNTRL_TYPE_KERNEL_NPEM in led_controller_slot_support().

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Comment thread src/lib/kernel_npem.h
Add the newly-added KERNEL_NPEM controller type to two places in block.c
that were missed in the initial commit.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Update lib_unit_test.c to allow 7 controller types so it doesn't fail with
KERNEL_NPEM.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
@stuarthayes
Copy link
Copy Markdown
Author

It seems that the NPEM kernel still operates per slots, so my question is: did you use the implemented slot tests for the "NPEM kernel"? If not, please run them and verify that everything works correctly.

I hadn't run the tests. I found an issue when I ran the tests, which I fixed with another commit (7088fe0).

@stuarthayes
Copy link
Copy Markdown
Author

Also, please rebase the branch.

Done, thanks for all the feedback!

@tasleson
Copy link
Copy Markdown
Collaborator

tasleson commented Nov 21, 2025

FYI: I'm in the middle of reviewing, testing etc. I should be done by end of day Tuesday. @stuarthayes Thanks for this PR, greatly appreciated!

I manually tested this on a Dell PowerEdge R750. I'm seeing the following let states when listing the slots:

# ./src/ledctl/ledctl --list-slots -n "Kernel NPEM"
slot: 0000:69:00.0    led state: UNKNOWN         device: /dev/nvme1n1   
slot: 0000:6a:00.0    led state: UNKNOWN         device: /dev/nvme2n1   
slot: 0000:6b:00.0    led state: UNKNOWN         device: /dev/nvme6n1   
slot: 0000:6c:00.0    led state: UNKNOWN         device: /dev/nvme8n1   
slot: 0000:6d:00.0    led state: UNKNOWN         device: /dev/nvme10n1  
slot: 0000:6e:00.0    led state: UNKNOWN         device: /dev/nvme14n1  
slot: 0000:6f:00.0    led state: UNKNOWN         device: /dev/nvme18n1  
slot: 0000:70:00.0    led state: UNKNOWN         device: /dev/nvme19n1  
slot: 0000:e7:00.0    led state: UNKNOWN         device: /dev/nvme0n1   
slot: 0000:e8:00.0    led state: UNKNOWN         device: /dev/nvme3n1   
slot: 0000:e9:00.0    led state: UNKNOWN         device: /dev/nvme4n1   
slot: 0000:ea:00.0    led state: UNKNOWN         device: /dev/nvme5n1   
slot: 0000:eb:00.0    led state: UNKNOWN         device: /dev/nvme7n1   
slot: 0000:ec:00.0    led state: UNKNOWN         device: /dev/nvme9n1   
slot: 0000:ed:00.0    led state: UNKNOWN         device: /dev/nvme11n1  
slot: 0000:ee:00.0    led state: UNKNOWN         device: /dev/nvme12n1  
slot: 0000:f1:00.0    led state: UNKNOWN         device: /dev/nvme13n1  
slot: 0000:f2:00.0    led state: UNKNOWN         device: /dev/nvme15n1  
slot: 0000:f3:00.0    led state: UNKNOWN         device: /dev/nvme16n1  
slot: 0000:f4:00.0    led state: UNKNOWN         device: /dev/nvme17n1 

The hardware only supports locate, and I can both set and read that state:

# ./src/ledctl/ledctl --set-slot --slot 0000:f4:00.0 --controller-type "Kernel NPEM" --state locate
# ./src/ledctl/ledctl --list-slots -n "Kernel NPEM" | grep 0000:f4:00.0
slot: 0000:f4:00.0    led state: LOCATE          device: /dev/nvme17n1

Given that, I think it’s reasonable to report this as a known state (e.g., “OK”) rather than “UNKNOWN” when the hardware only implements a subset of the standard. We still need to be able to distinguish between:

  • a LED path that exists and returns a valid value (state is known), and
  • how we interpret that value semantically (e.g., mapping it to “OK”, “LOCATE”, etc.).

On a related note, I’m a little unsure about exposing yet another controller type to the end user. From their perspective, this feels more like an implementation detail: the same hardware on an older kernel would use a different implementation but should ideally look the same from the tool’s interface. That said, I understand why you introduced a new controller type here; I just want to flag the end user/API difference.

Notes

  • Unit tests fail on this hardware, as expected with what I'm seeing.
  • I'm not sure if this behavior is consistent for other LED transports too, when specifying an unsupported state for the hardware.
# ledctl failure=/dev/nvme16n1 ; echo $?;
/dev/shm/ledmon.conf: does not exist, using global config file
/etc/ledmon.conf: does not exist, using built-in defaults
0
# ledctl --set-slot --slot 0000:f4:00.0 --controller-type "Kernel NPEM" --state failure ; echo $?
/dev/shm/ledmon.conf: does not exist, using global config file
/etc/ledmon.conf: does not exist, using built-in defaults
10

Comment thread src/lib/libled.c Outdated
[LED_CNTRL_TYPE_AHCI] = "AHCI",
[LED_CNTRL_TYPE_NPEM] = "NPEM",
[LED_CNTRL_TYPE_AMD] = "AMD",
[LED_CNTRL_TYPE_KERNEL_NPEM] = "Kernel NPEM",
Copy link
Copy Markdown
Member

@mtkaczyk mtkaczyk Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh one last thing. On Linux systems spaces are avoided, so please change it to KERNEL_NPEM
I see that there is "Dell SSD" and you did it for consistently but I still think that changing it to avoid "" every time is better

@mtkaczyk
Copy link
Copy Markdown
Member

mtkaczyk commented Dec 2, 2025

Given that, I think it’s reasonable to report this as a known state (e.g., “OK”) rather than “UNKNOWN” when the hardware only implements a subset of the standard.

I would say that the right it to say that it belongs to hardware in this case, we would consider "HARDWARE_MANAGED" or something like that. I don't think that saying OK is correct because hardware may behave differently. Anyway, it is probably not big issue so it should be fine anyway.
It is already happening for other controllers as well.

On a related note, I’m a little unsure about exposing yet another controller type to the end user. From their perspective, this feels more like an implementation detail: the same hardware on an older kernel would use a different implementation but should ideally look the same from the tool’s interface. That said, I understand why you introduced a new controller type here; I just want to flag the end user/API difference.

This is reasonable. it should configurable by conf, I don't see a strong need to give users raw access to both interfaces directly.

@tasleson
Copy link
Copy Markdown
Collaborator

tasleson commented Dec 4, 2025

I would say that the right it to say that it belongs to hardware in this case, we would consider "HARDWARE_MANAGED" or something like that. I don't think that saying OK is correct because hardware may behave differently. Anyway, it is probably not big issue so it should be fine anyway.
It is already happening for other controllers as well.

This is what we have today

enum led_ibpi_pattern {
	LED_IBPI_PATTERN_UNKNOWN = 0,
	LED_IBPI_PATTERN_NONE = 1,	/* used only to initialize ibpi_prev */
	LED_IBPI_PATTERN_NORMAL = 2,
	LED_IBPI_PATTERN_ONESHOT_NORMAL = 3,
	LED_IBPI_PATTERN_DEGRADED = 4,
	LED_IBPI_PATTERN_HOTSPARE = 5,
	LED_IBPI_PATTERN_REBUILD = 6,
	LED_IBPI_PATTERN_FAILED_ARRAY = 7,
	LED_IBPI_PATTERN_PFA = 8,
	LED_IBPI_PATTERN_FAILED_DRIVE = 9,
	LED_IBPI_PATTERN_LOCATE = 10,
	LED_IBPI_PATTERN_LOCATE_OFF = 11,
	LED_IBPI_PATTERN_ADDED = 12,
	LED_IBPI_PATTERN_REMOVED = 13,
	LED_IBPI_PATTERN_LOCATE_AND_FAIL = 14,

How about LED_IBPI_PATTERN_NORMAL ?

IMHO we need to let the API caller know that we have some level of hardware support, LED_IBPI_PATTERN_UNKNOWN doesn't convey that.

NOTE: We can certainly add an additional value here.

@mtkaczyk
Copy link
Copy Markdown
Member

mtkaczyk commented Dec 9, 2025

How about LED_IBPI_PATTERN_NORMAL ?

For example, for NPEM it means that OK capability is ON. This would be also kind of inconsistently when we are presetting NORMAL when all LEDs are OFF. Unknown at least suggests that we don't know what is happening.

IMHO we need to let the API caller know that we have some level of hardware support, LED_IBPI_PATTERN_UNKNOWN doesn't convey that.

That would be perfect but probably we don't have such info in all cases. For NPEM, this information is visible as NPEM_ENABLE and it is turned on persistently when first NPEM request happens. There is no option to disable this and NPEM enable is not presented by driver:
https://elixir.bootlin.com/linux/v6.18/source/drivers/pci/npem.c#L187
https://elixir.bootlin.com/linux/v6.18/source/drivers/pci/npem.c#L207

	/* This bit is always required */
	ctrl = inds | PCI_NPEM_CTRL_ENABLE;

So, if all kernel interface NPEM capabilities are off before ledctl is used first time - it is hardware managed but if we turned NORMAL and disabled NORMAL, all are off but it is not hardware managed because NPEM_ENABLE is kept.
NPEM_ENABLE is not presented to userspace.

But overall, If we have good argument for that we can change UNKNOWN to NORMAL, I'm not against. It cannot be handled perfectly because kernel driver interface does not provide functionality to determine it (NPEM_ENABLE).

We have chosen to not present NPEM_ENABLE on purpose because for the regular user it provides unnecessary level of complexity and was not kernel ledclass interface friendly.

@bocmanpy
Copy link
Copy Markdown

Also wanted that feature, currently we can't use LED for our devices.

# ledctl --version
Intel(R) Enclosure LED Control Application 1.1.0
Copyright (C) 2009-2024 Intel Corporation.
# ledctl --list-controllers
/sys/devices/pci0000:58/0000:58:02.0 (NPEM)
/sys/devices/pci0000:6b/0000:6b:06.0 (NPEM)
/sys/devices/pci0000:c5/0000:c5:08.0 (NPEM)
/sys/devices/pci0000:58/0000:58:04.0 (NPEM)
/sys/devices/pci0000:6b/0000:6b:08.0 (NPEM)
/sys/devices/pci0000:c5/0000:c5:06.0 (NPEM)

Also the platform and kernel
6.8.0-87-generic
Supermicro Super Server (X14DBI-T)

@mtkaczyk
Copy link
Copy Markdown
Member

@stuarthayes gentle reminder :)

@stuarthayes
Copy link
Copy Markdown
Author

How about LED_IBPI_PATTERN_NORMAL ?

For example, for NPEM it means that OK capability is ON. This would be also kind of inconsistently when we are presetting NORMAL when all LEDs are OFF. Unknown at least suggests that we don't know what is happening.

IMHO we need to let the API caller know that we have some level of hardware support, LED_IBPI_PATTERN_UNKNOWN doesn't convey that.

How do you feel about using LED_IBPI_PATTERN_LOCATE_OFF when no LED states are active with kernel_npem in the special case for when LOCATE is the only LED that's supported (as is the case on many Dell systems)?

[root@sydsfwlab01-x2 ledmon_stuart2]# src/ledctl/ledctl --get-slot --slot 0000:8d:00.0 --controller-type "Kernel_NPEM"
slot: 0000:8d:00.0    led state: LOCATE_OFF      device: /dev/nvme5n1

Systems that support states other than just "locate" would presumably support all of the NPEM required states (i.e., "ok", "locate", "fail", and "rebuild"). "Locate" is kind of a special case in that it is independent of the other states, so even if firmware is controlling the LEDs indicating the drive state, the user could still toggle "locate" on and off. It's possible that some unknown/future system might support "fail" or "rebuild" or some other states but not explicitly support "ok", but that seems unlikely to me.

@stuarthayes
Copy link
Copy Markdown
Author

On a related note, I’m a little unsure about exposing yet another controller type to the end user. From their perspective, this feels more like an implementation detail: the same hardware on an older kernel would use a different implementation but should ideally look the same from the tool’s interface. That said, I understand why you introduced a new controller type here; I just want to flag the end user/API difference.

This is reasonable. it should configurable by conf, I don't see a strong need to give users raw access to both interfaces directly.

Is this requesting that the new kernel_npem controller ALSO be called “NPEM” rather than “Kernel NPEM”, so that the user just sees NPEM regardless of which controller type is used? So ledmon/ledctl would select the kernel_npem controller when it sees any devices preset that use that interface, otherwise it would choose the older (current) npem controller, unless the config file specifically specifies whether the user space or kernel NPEM driver should be used?

@mtkaczyk
Copy link
Copy Markdown
Member

Oh sorry @stuarthayes , I miss this..

How do you feel about using LED_IBPI_PATTERN_LOCATE_OFF when no LED states are active with kernel_npem in the special case for when LOCATE is the only LED that's supported (as is the case on many Dell systems)?

It is acceptable to me.

Is this requesting that the new kernel_npem controller ALSO be called “NPEM” rather than “Kernel NPEM”, so that the user just sees NPEM regardless of which controller type is used?

Correct!

So ledmon/ledctl would select the kernel_npem controller when it sees any devices preset that use that interface, otherwise it would choose the older (current) npem controller, unless the config file specifically specifies whether the user space or kernel NPEM driver should be used?

No, let's keep it simpler. Default should be kernel interface, so if no config option is specified we should check if kernel NPEM is supported and print error if not (we can suggest updating config file in error message).
No optimistic checks for detection should be done in my opinion, we work as configured.

Of course there is regression risk. I think it is still fine, we don't need to rush with ledmon release.

@tasleson
Copy link
Copy Markdown
Collaborator

@stuarthayes @mtkaczyk It would be great to get this change integrated into the code base, so that downstream users can get the functionality packaged etc.

Please let me know if there is anything I can do to help out.

Thanks!

@mtkaczyk
Copy link
Copy Markdown
Member

I already pinged @stuarthayes, he is aware that we are waiting.

Many systems (Dell servers) support the LED control _DSM but only support
the locate state. Return LOCATE_OFF instead of UNKNOWN for these sysetms
when LOCATE isn't on.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Change the new "Kernel NPEM" controller to be the "NPEM" controller,
replacing the older NPEM controller (that supports NPEM via direct PCI
config space access) by default. Add a conf file option "USERSPACE_NPEM="
that can be specified to force use of the older NPEM driver, for use on
systems with kernels that don't have the kernel NPEM/_DSM driver included.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
@stuarthayes
Copy link
Copy Markdown
Author

OK, I've changed the kernel_npem controller to report state as LOCATE_OFF (instead of UNKNOWN) if locate is the only state that is supported but locate is off... this will help Dell servers which only support the locate state.

I also changed kernel_npem to be called "NPEM" instead of "Kernel NPEM", and I modified the code so that only one NPEM controller will be used, which will be the new one (that interfaces with the kernel NPEM/_DSM driver) instead of the old one (which tries to directly access PCI config space in user space and doesn't support NPEM). I added an option "USERSPACE_NPEM=" which can be set to one in ledmon.conf to force use of the older NPEM driver in case someone is using this with a kernel that doesn't support the NPEM/_DSM driver.

Sorry it took so long to address these issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants