A Prometheus exporter for NVIDIA/Mellanox network adapter temperature monitoring using the mget_temp utility from NVIDIA Firmware Tools (MFT).
This exporter polls comprehensive temperature data from NVIDIA/Mellanox network adapters using two commands per device and exposes the metrics in Prometheus format. It provides:
- Device temperatures (mget_temp): Main temperature readings from network adapters (direct reading from mget_temp -d DEVICE)
- Thermal diode temperatures (mget_thermal_diode_temp_celsius): Individual thermal sensor temperature readings in Celsius, including maximum allowed temperature as a label
- Thermal diode voltages (mget_thermal_diode_voltage_volts): Individual thermal sensor voltage readings in Volts, including maximum allowed voltage as a label
The exporter uses two commands per device:
- mget_temp -d device for the main temperature reading
- mget_temp -d device -v for detailed thermal diode data
It automatically distinguishes between temperature (T) and voltage (V) measurements and creates appropriate metrics for each type, with thresholds included as labels.
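The two invocations and the T/V split described above can be sketched in Go roughly as follows. This is an illustration only: the helper names `commandsFor` and `metricNameFor` are hypothetical, and representing the measurement kind as the strings "T" and "V" is an assumption about how the parsed data is carried around, not a description of the exporter's actual code.

```go
package main

import (
	"fmt"
	"os/exec"
)

// commandsFor (hypothetical helper) builds the two mget_temp invocations for
// one device: the plain reading and the verbose (-v) thermal diode listing.
// The commands are only constructed here, not executed.
func commandsFor(device string) (mainCmd, verboseCmd *exec.Cmd) {
	mainCmd = exec.Command("mget_temp", "-d", device)
	verboseCmd = exec.Command("mget_temp", "-d", device, "-v")
	return mainCmd, verboseCmd
}

// metricNameFor (hypothetical helper) maps a measurement kind to its metric
// family. Assumption: the kind arrives as "T" or "V"; ok is false otherwise.
func metricNameFor(kind string) (name string, ok bool) {
	switch kind {
	case "T":
		return "mget_thermal_diode_temp_celsius", true
	case "V":
		return "mget_thermal_diode_voltage_volts", true
	default:
		return "", false
	}
}

func main() {
	mainCmd, verboseCmd := commandsFor("/dev/mst/mt4127_pciconf0")
	fmt.Println(mainCmd.Args)
	fmt.Println(verboseCmd.Args)
	for _, kind := range []string{"T", "V", "?"} {
		name, ok := metricNameFor(kind)
		fmt.Println(kind, name, ok)
	}
}
```

Rejecting unknown kinds explicitly, rather than defaulting to temperature, keeps an unexpected row from being exported under the wrong unit.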
The exporter automatically discovers MST devices using the mst status command, polls temperature data every 10 seconds, and exposes only custom metrics (no Go runtime or process metrics) at the /metrics endpoint on port 6656 by default (configurable).
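One way to get a poll-then-serve loop that exposes only custom metrics is to render the text exposition format by hand with the standard library. The sketch below assumes exactly that; the `store`, `pollEvery` and `handler` names are hypothetical, and the real exporter may well use a Prometheus client library with a custom registry instead.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
	"sync"
	"time"
)

// store (hypothetical) holds the latest main temperature per device.
type store struct {
	mu    sync.RWMutex
	temps map[string]float64
}

func (s *store) set(device string, celsius float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.temps[device] = celsius
}

// render emits the samples in Prometheus text format, sorted by device.
// %q quoting is adequate for plain ASCII device names.
func (s *store) render() string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	devices := make([]string, 0, len(s.temps))
	for d := range s.temps {
		devices = append(devices, d)
	}
	sort.Strings(devices)
	var b strings.Builder
	b.WriteString("# TYPE mget_temp gauge\n")
	for _, d := range devices {
		fmt.Fprintf(&b, "mget_temp{device=%q} %g\n", d, s.temps[d])
	}
	return b.String()
}

// pollEvery calls refresh immediately, then once per interval until stop closes.
func pollEvery(interval time.Duration, stop <-chan struct{}, refresh func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		refresh()
		select {
		case <-ticker.C:
		case <-stop:
			return
		}
	}
}

// handler serves only the store's samples, so no runtime metrics appear.
func handler(s *store) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, s.render())
	})
	return mux
}

func main() {
	s := &store{temps: map[string]float64{}}
	s.set("/dev/mst/mt4127_pciconf0", 52) // made-up sample value
	fmt.Print(s.render())
	// A long-running process would instead do something like:
	//   go pollEvery(10*time.Second, stop, refresh)
	//   http.ListenAndServe(":6656", handler(s))
}
```

Serving from an in-memory snapshot means a scrape never waits on mget_temp itself; it only sees the values from the most recent poll.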
- Go 1.26+
- NVIDIA Firmware Tools (MFT) must be installed and available in your PATH
- NVIDIA/Mellanox network adapters
- Superuser/Administrator privileges: The exporter must be run with elevated privileges because the mget_temp utility requires superuser access
Download and install NVIDIA Firmware Tools from: https://network.nvidia.com/products/adapter-software/firmware-tools/
Ensure both the mget_temp and mst utilities are available in your system PATH.
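A quick way to verify this from Go is `exec.LookPath`, which resolves a command name against PATH. The snippet below is a minimal sketch of such a pre-flight check; `missingTools` is a hypothetical helper name and is not taken from the exporter.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools (hypothetical helper) returns the names that cannot be
// resolved to an executable through the PATH environment variable.
func missingTools(tools ...string) []string {
	var missing []string
	for _, t := range tools {
		if _, err := exec.LookPath(t); err != nil {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	if m := missingTools("mget_temp", "mst"); len(m) > 0 {
		fmt.Println("missing MFT utilities:", m)
		return
	}
	fmt.Println("mget_temp and mst found in PATH")
}
```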
The exporter automatically discovers MST devices using the mst status command. No manual configuration is required - it will find and monitor all available MST devices.
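As a rough sketch of what discovery could look like, the Go snippet below runs `mst status` and pulls out anything shaped like the device names used elsewhere in this README (for example /dev/mst/mt4127_pciconf0 or mt4115_pciconf0). The regular expression and the `extractDevices` helper are assumptions made for illustration; the exact layout of `mst status` output is not specified here, and the exporter's real parser may differ.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
)

// deviceRe matches names such as mt4127_pciconf0, optionally prefixed with
// /dev/mst/. Assumption: device names appear verbatim in the command output.
var deviceRe = regexp.MustCompile(`(?:/dev/mst/)?mt\d+_pciconf\d+`)

// extractDevices (hypothetical helper) returns the unique device names found
// in output, in order of first appearance.
func extractDevices(output string) []string {
	seen := map[string]bool{}
	var devices []string
	for _, m := range deviceRe.FindAllString(output, -1) {
		if !seen[m] {
			seen[m] = true
			devices = append(devices, m)
		}
	}
	return devices
}

func main() {
	out, err := exec.Command("mst", "status").Output()
	if err != nil {
		fmt.Println("mst status failed:", err)
		return
	}
	fmt.Println(extractDevices(string(out)))
}
```

Matching on the name pattern rather than on column positions keeps the sketch independent of how the surrounding status text is laid out.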
On Linux, you must run mst start before device discovery will work:
sudo mst start
Manual Device Specification (Optional):
If you need to specify devices manually, use the -devices flag:
# Linux
sudo ./mget_exporter -devices "/dev/mst/mt4127_pciconf0,/dev/mst/mt4128_pciconf0"
# Windows
mget_exporter.exe -devices "mt4115_pciconf0,mt4116_pciconf0"
The included build.sh script cross-compiles for Linux (amd64, arm64) and Windows (amd64) with stripped binaries:
./build.sh
Binaries are placed in the build/ directory.
# Linux
go build -o mget_exporter .
# Windows (cross-compile from Linux)
GOOS=windows GOARCH=amd64 go build -o mget_exporter.exe .
- -port string: Port to listen on (default "6656")
- -devices string: Comma-separated list of device IDs (optional; if not specified, devices are auto-discovered using mst status. ⚠️ On Linux you will have to run mst start first)
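For illustration, the two flags could be handled with Go's standard `flag` package along these lines. The flag names and the default port come from this README; the `config` type, the `parseFlags` helper and the whitespace trimming are assumptions added for the sketch.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// config (hypothetical) holds the parsed command-line options.
type config struct {
	port    string
	devices []string // empty means: auto-discover using mst status
}

// parseFlags parses -port and -devices from args (without the program name).
func parseFlags(args []string) (config, error) {
	fs := flag.NewFlagSet("mget_exporter", flag.ContinueOnError)
	port := fs.String("port", "6656", "Port to listen on")
	devices := fs.String("devices", "", "Comma-separated list of device IDs")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	cfg := config{port: *port}
	// Split the list and drop empty entries so "" yields no devices.
	for _, d := range strings.Split(*devices, ",") {
		if d = strings.TrimSpace(d); d != "" {
			cfg.devices = append(cfg.devices, d)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("port=%s devices=%v\n", cfg.port, cfg.devices)
}
```

Using a dedicated `FlagSet` instead of the global one keeps the parsing testable with arbitrary argument slices.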
Linux:
- First, start the MST service:
  sudo mst start
- Run the exporter with sudo (automatic device discovery):
  sudo ./mget_exporter
  Or with a custom port:
  sudo ./mget_exporter -port 8080
  Or with manual device specification:
  sudo ./mget_exporter -devices "/dev/mst/mt4127_pciconf0,/dev/mst/mt4128_pciconf0"
- Access metrics at http://localhost:6656/metrics (or your custom port)
Windows:
- Open Command Prompt or PowerShell as Administrator
- Run the exporter (automatic device discovery):
  mget_exporter.exe
  Or with a custom port:
  mget_exporter.exe -port 8080
  Or with manual device specification:
  mget_exporter.exe -devices "mt4115_pciconf0,mt4116_pciconf0"
- Access metrics at http://localhost:6656/metrics (or your custom port)
Note: On Windows, you must run the entire terminal session as Administrator before executing the exporter.
The exporter provides the following Prometheus metrics (only custom metrics, no Go runtime or process metrics):
- mget_temp{device="device_name"}: Main temperature reading from the network adapter (direct reading from mget_temp -d DEVICE)
- mget_thermal_diode_temp_celsius{device="device_name", diode="diode_name", threshold="max_temp"}: Temperature from individual thermal diodes in Celsius. The threshold label contains the maximum allowed temperature as an unsigned integer.
- mget_thermal_diode_voltage_volts{device="device_name", diode="diode_name", threshold="max_voltage"}: Voltage from individual thermal diodes in Volts. The threshold label contains the maximum allowed voltage as an unsigned integer.
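Because the threshold travels as a label, it reaches consumers as a string and has to be parsed before it can be compared with the sample value. The following Go sketch shows one way a consumer might do that; the `headroom` helper and the numbers in `main` are invented for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// headroom (hypothetical helper) returns how far a reading sits below the
// limit carried in the threshold label. A negative result means the limit
// has been exceeded.
func headroom(value float64, thresholdLabel string) (float64, error) {
	limit, err := strconv.ParseUint(thresholdLabel, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("threshold label %q is not an unsigned integer: %w", thresholdLabel, err)
	}
	return float64(limit) - value, nil
}

func main() {
	// Made-up sample: a diode reading with a made-up threshold label.
	h, err := headroom(61.5, "105")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("headroom: %.1f\n", h)
}
```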
The exporter efficiently gathers temperature data using the following approach:
- Main Temperature: Runs mget_temp -d device to get the main device temperature for each discovered device
- Thermal Diode Data: Runs mget_temp -d device -v to get detailed thermal diode information for each device
- Data Parsing: Parses the tabular output to extract thermal diode information
- Parallel Processing: Each device is monitored in parallel for optimal performance
This approach ensures automatic discovery of all available devices while maintaining the most accurate temperature readings.
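The parallel step can be sketched with one goroutine per device and a `sync.WaitGroup`. In the example below the reader function is injected so the fan-out logic can be shown without calling mget_temp; `pollAll`, `reading` and the fake reader are hypothetical names, and the exporter's own concurrency code may be organized differently.

```go
package main

import (
	"fmt"
	"sync"
)

// reading (hypothetical) is the outcome of polling one device.
type reading struct {
	celsius float64
	err     error
}

// pollAll calls read once per device, each in its own goroutine, waits for
// all of them, and returns the results keyed by device.
func pollAll(devices []string, read func(device string) (float64, error)) map[string]reading {
	results := make(map[string]reading, len(devices))
	var mu sync.Mutex // guards results, since maps are not safe for concurrent writes
	var wg sync.WaitGroup
	for _, d := range devices {
		wg.Add(1)
		go func(device string) {
			defer wg.Done()
			c, err := read(device)
			mu.Lock()
			results[device] = reading{celsius: c, err: err}
			mu.Unlock()
		}(d)
	}
	wg.Wait()
	return results
}

func main() {
	// fakeRead stands in for running mget_temp; the values are made up.
	fakeRead := func(device string) (float64, error) {
		return float64(len(device)), nil
	}
	for d, r := range pollAll([]string{"dev-a", "dev-bb"}, fakeRead) {
		fmt.Println(d, r.celsius, r.err)
	}
}
```

Recording the error per device means one failing adapter does not hide readings from the others.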
This software uses and integrates with proprietary NVIDIA tools and technologies:
- NVIDIA and Mellanox are trademarks of NVIDIA Corporation
- mget_temp utility is part of NVIDIA Firmware Tools (MFT) and is copyrighted by NVIDIA Corporation
- NVIDIA Firmware Tools are copyrighted by NVIDIA Corporation
This exporter is a third-party tool that interfaces with NVIDIA's MFT utilities and is not officially endorsed by NVIDIA Corporation.
This project is licensed under the MIT License - see the LICENSE file for details.
Important: While this exporter code is open source under the MIT License, it depends on proprietary NVIDIA Firmware Tools (MFT):
- This exporter code itself is free to use and modify under the MIT License
- NVIDIA Firmware Tools (MFT) are proprietary software - ensure compliance with NVIDIA's licensing terms
- The mget_temp utility and other MFT components are subject to NVIDIA's license agreements
- Users must obtain and install NVIDIA Firmware Tools separately from official NVIDIA sources
By using this exporter, you acknowledge that you will comply with all applicable NVIDIA license terms for the MFT tools.