fix(collector): add index label to hwmon metrics to distinguish devices #456
On dual-socket systems, multiple hwmon devices can share the same chip_name. This adds an index label derived from the hwmon directory name so that each device's series is unique.
Hello @samoz83, thanks a lot for your time and effort in putting up this PR. Really appreciate it!! Is it possible for you to share the files inside …? Cheers!!
I did try to ttar the directory, but it had issues. Hopefully this tar has what you need; please let me know if you need me to do anything else. Thanks!
Signed-off-by: Mahendra Paipuri <mahendra.paipuri@gmail.com>
Hello @samoz83, thanks a lot for the files. I pushed a commit to your branch that updates the e2e test fixtures. Strangely, the commit exists on your branch but is not showing up on the PR. Could you look into it, please? If not, close this PR and open a new one. Thanks in advance!! And thanks again for your time and effort!
Not sure what happened there, but I've managed to push your changes; hopefully it should be okay now.
Awesome!! Cheers @samoz83
@samoz83 I will make a release with this patch by the end of the day! Thanks again!
Fixes #455
Description
Some hwmon devices report exactly the same name (for example, one device per socket on a dual-socket system).
Previously, the collector relied solely on this name for labelling. In my case, this caused Prometheus to treat metrics from different sockets as duplicates, resulting in silent data loss (one socket overwriting the other).
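As a simplified model of the collision (not the collector's or Prometheus's actual code), the Go sketch below identifies a series by its metric name plus its sorted label set. The metric name hwmon_power_watts and the readings are hypothetical; only the chip name amd_hsmp_hwmon comes from this PR. With chip_name as the only label, both sockets produce the same key, so the second reading replaces the first.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds a simplified series identity from a metric name and its
// labels, sorted by label name so the key does not depend on map order.
// Illustrative model only.
func seriesKey(metric string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return metric + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	// Hypothetical readings from two sockets whose hwmon devices report the same name.
	readings := []struct {
		chip  string
		value float64
	}{
		{"amd_hsmp_hwmon", 180.5},
		{"amd_hsmp_hwmon", 175.0},
	}

	series := map[string]float64{}
	for _, r := range readings {
		// Only chip_name is used as a label, mirroring the behaviour before the fix.
		key := seriesKey("hwmon_power_watts", map[string]string{"chip_name": r.chip})
		series[key] = r.value // the second socket overwrites the first
	}
	fmt.Println(len(series)) // prints 1
}
```

Adding a per-device label to the map passed to seriesKey is enough to make the two keys differ.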
The Fix:
This PR adds an index label to the generated metrics, derived from the number in the hwmon directory name (e.g., 6 for hwmon6, 7 for hwmon7). Because that number differs per device, every socket produces a unique time series, even when the devices share the same driver name.
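A minimal Go sketch of deriving such an index, assuming the device directories follow the usual /sys/class/hwmon/hwmonN naming. The helper hwmonIndex and the two example paths are hypothetical and not taken from the PR's diff; only the index and chip_name label names come from the description above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// hwmonIndex returns the numeric suffix of an hwmon sysfs directory,
// e.g. "/sys/class/hwmon/hwmon6" -> "6". It returns an error when the
// base name is not of the form "hwmonN" with N made only of digits.
// Hypothetical helper for illustration.
func hwmonIndex(dir string) (string, error) {
	base := filepath.Base(dir)
	idx, ok := strings.CutPrefix(base, "hwmon")
	if !ok || idx == "" {
		return "", fmt.Errorf("unexpected hwmon directory name %q", base)
	}
	for _, r := range idx {
		if r < '0' || r > '9' {
			return "", fmt.Errorf("non-numeric hwmon index in %q", base)
		}
	}
	return idx, nil
}

func main() {
	// Hypothetical sysfs paths for two sockets reporting the same chip name.
	devices := []struct {
		dir  string
		chip string
	}{
		{"/sys/class/hwmon/hwmon6", "amd_hsmp_hwmon"},
		{"/sys/class/hwmon/hwmon7", "amd_hsmp_hwmon"},
	}
	for _, d := range devices {
		idx, err := hwmonIndex(d.dir)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		// The label set now differs per device even though chip_name is identical.
		labels := map[string]string{"chip_name": d.chip, "index": idx}
		fmt.Println(labels)
	}
}
```

The index is kept as a string because it is only used as a label value. hwmonN numbers are assigned when devices register, so they may change across reboots or driver reloads. The label keeps series distinct within a scrape, but it is not necessarily a stable hardware identifier.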
How Has This Been Tested?
Tested on a dual-socket AMD EPYC server running Rocky Linux 9.
Before:
Output showed only one metric series for amd_hsmp_hwmon despite two sockets being present.
After:
Output now correctly shows two distinct series with unique indices: