NVSentinel Health Monitor #755
Is the health monitor required to use the DCGM host engine as a separate service for collecting data and sending it to platform-connectors? Does NVSentinel support a standalone mode instead?
Replies: 2 comments 6 replies
Hey @harinik05, in the current design, the nvsentinel-gpu-health-monitor component connects to DCGM only through the DCGM service. Starting the host engine inside gpu-health-monitor is not yet supported. One reason is that dcgm-exporter also needs DCGM metrics, and it connects to DCGM the same way nvsentinel-gpu-health-monitor does. So that is the default design choice. We will think more about whether host-engine startup can be built into the nvsentinel-gpu-health-monitor component. Can you please share your use case? Is it that DCGM is not running as a standalone service in your deployments?
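For anyone checking their own setup, here is a minimal sketch (not part of NVSentinel) that tests whether a standalone DCGM host engine endpoint accepts TCP connections. It assumes the host engine listens on port 5555, which I believe is the nv-hostengine default. The function name `hostengine_reachable` and the address in the usage note are hypothetical. It only confirms that the port is open, not that DCGM itself is healthy.

```python
import socket

# Assumption: 5555 is the nv-hostengine default TCP port; adjust it to
# match how the DCGM service is exposed in your cluster.
DEFAULT_HOSTENGINE_PORT = 5555


def hostengine_reachable(host: str, port: int = DEFAULT_HOSTENGINE_PORT,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and connects; the context
        # manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling `hostengine_reachable("dcgm.example.internal")` with your own service address in place of the placeholder returns `True` when something is listening there. Running the check from the pod or node where gpu-health-monitor runs shows what that component can see.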
@deesharma24 I have a follow-up question on this. You mentioned earlier that dcgm-exporter and the GPU health monitor connect to the DCGM service the same way. Is it possible to run both of them simultaneously? I'm wondering whether contention arises when two clients connect to the same host engine service endpoint.