Hey @harinik05, in the current design the nvsentinel-gpu-health-monitor component connects to DCGM only through the DCGM service; starting the hostengine inside gpu-health-monitor is not yet supported. One reason for this is that dcgm-exporter also needs DCGM metrics, and it connects to DCGM the same way nvsentinel-gpu-health-monitor does. So, by default, that is the design choice. We will think more about whether we can build hostengine startup into the nvsentinel-gpu-health-monitor component.
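In this standalone-service model, each DCGM client (nvsentinel-gpu-health-monitor, dcgm-exporter) only needs the address of an already-running hostengine rather than launching its own. The sketch below is a minimal illustration, assuming the hostengine is reachable over TCP on nv-hostengine's default port 5555. The helper names and the "host[:port]" setting format are hypothetical and not part of NVSentinel, and the check only confirms that something is listening on the endpoint; it does not use the DCGM API.

```python
import socket

# nv-hostengine's default listening port (assumption: not overridden in the deployment).
DEFAULT_HOSTENGINE_PORT = 5555


def parse_hostengine_address(address: str) -> tuple[str, int]:
    """Split a hypothetical "host[:port]" setting into (host, port).

    Falls back to the default hostengine port when no port is given.
    Assumption: IPv6 literals are not handled by this simple split.
    """
    host, sep, port = address.rpartition(":")
    if not sep:
        # No colon: the whole string is the host name.
        return address, DEFAULT_HOSTENGINE_PORT
    return host, int(port)


def hostengine_reachable(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the hostengine endpoint succeeds.

    This only checks that a listener exists; it does not speak the DCGM
    protocol, so it is a liveness hint, not a health check of DCGM itself.
    """
    host, port = parse_hostengine_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or name resolution failed.
        return False
```

Because both clients share one hostengine, a probe like this would fail for both of them at once when the DCGM service is down, which is the trade-off of not embedding a hostengine per component.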

Can you please share your use case? Is it that, in your deployments, DCGM is not running as a standalone service?

Replies: 2 comments, 6 replies

4 replies: @deesharma24, @deesharma24, @harinik05, @deesharma24

Answer selected by deesharma24
2 replies: @lalitadithya, @harinik05

Category: Q&A
Labels: None yet
Participants: 3