
GPU Monitoring

Niccanor Dhas edited this page Feb 22, 2026 · 1 revision

tmam can collect real-time GPU metrics from NVIDIA and AMD GPUs and display them in the dashboard's GPU Analytics view.


Enabling GPU Monitoring

Pass collect_gpu_stats=True to init():

from tmam import init

init(
    url="http://localhost:5050/api/sdk",
    public_key="pk-tmam-xxxxxxxx",
    secrect_key="sk-tmam-xxxxxxxx",
    application_name="my-gpu-app",
    collect_gpu_stats=True,
)

tmam will auto-detect whether an NVIDIA or AMD GPU is present.
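The exact detection mechanism is internal to tmam, but the effect is roughly what this sketch shows; `detect_gpu_vendor` is a hypothetical helper that guesses the vendor from which management library is importable, not a tmam API:

```python
import importlib.util

def detect_gpu_vendor():
    """Guess the GPU vendor from which vendor library is installed."""
    if importlib.util.find_spec("pynvml") is not None:
        return "nvidia"
    if importlib.util.find_spec("amdsmi") is not None:
        return "amd"
    return None  # no supported GPU library found

print(detect_gpu_vendor())
```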


GPU Vendor Requirements

NVIDIA GPUs

Install the pynvml library (NVIDIA Management Library Python bindings):

pip install pynvml

Requires NVIDIA drivers to be installed on the host. Works with any CUDA-capable GPU.
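For a sense of what the collector reads, here is a minimal sketch that queries NVML directly through pynvml. `read_nvidia_metrics` is a hypothetical helper for illustration (tmam's internal collector may differ); it degrades to an empty list on hosts without pynvml or an NVIDIA driver:

```python
def read_nvidia_metrics():
    """Return a list of per-GPU metric dicts, or [] if NVML is unavailable."""
    try:
        import pynvml
    except ImportError:
        return []  # pynvml not installed
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # no NVIDIA driver / GPU on this host
    metrics = []
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # byte counts
        metrics.append({
            "gpu.index": i,
            "gpu.utilization": pynvml.nvmlDeviceGetUtilizationRates(handle).gpu,
            "gpu.temperature": pynvml.nvmlDeviceGetTemperature(
                handle, pynvml.NVML_TEMPERATURE_GPU),
            "gpu.memory.used": mem.used // (1024 * 1024),   # bytes -> MB
            "gpu.memory.total": mem.total // (1024 * 1024),
        })
    pynvml.nvmlShutdown()
    return metrics

print(read_nvidia_metrics())
```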

AMD GPUs

Install the amdsmi library:

pip install amdsmi

Requires AMD ROCm drivers to be installed on the host.


Metrics Collected

For each GPU in the system, tmam collects the following metrics, tagged with the GPU index, UUID, and name:

| Metric Name | OTel Name | Description |
| --- | --- | --- |
| Utilization | gpu.utilization | Core utilization % |
| Encoder Utilization | gpu.enc.utilization | Video encoder utilization % |
| Decoder Utilization | gpu.dec.utilization | Video decoder utilization % |
| Temperature | gpu.temperature | Temperature in °C |
| Fan Speed | gpu.fan_speed | Fan speed (NVIDIA only) |
| Memory Available | gpu.memory.available | Available VRAM in MB |
| Memory Total | gpu.memory.total | Total VRAM in MB |
| Memory Used | gpu.memory.used | Used VRAM in MB |
| Memory Free | gpu.memory.free | Free VRAM in MB |
| Power Draw | gpu.power.draw | Current power draw in W |
| Power Limit | gpu.power.limit | Power limit in W |
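The gpu.memory.* values are reported in MB, while driver libraries such as NVML report raw byte counts. A quick sketch of the conversion (assuming the binary scaling 1 MB = 1024 × 1024 bytes, which is how NVML-style byte counts are usually divided; `bytes_to_mb` is an illustrative helper, not a tmam API):

```python
def bytes_to_mb(n_bytes):
    """Convert a raw byte count to MB, as used by the gpu.memory.* metrics."""
    return n_bytes / (1024 * 1024)

total = bytes_to_mb(8 * 1024**3)  # an 8 GiB card -> 8192.0 MB
used = bytes_to_mb(3 * 1024**3)   # 3 GiB in use  -> 3072.0 MB
print(total - used)               # free VRAM: 5120.0 MB
```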

All metrics are tagged with:

  • gpu.index — GPU index (0, 1, 2...)
  • gpu.uuid — GPU UUID
  • gpu.name — GPU model name
  • service.name — your application_name
  • deployment.environment — your environment
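The tag set above can be pictured as the attribute dictionary attached to each metric data point. This is illustrative only; `gpu_metric_attributes` is a hypothetical helper, and the example values are placeholders:

```python
def gpu_metric_attributes(index, uuid, name):
    """Build the attribute set attached to every GPU metric data point."""
    return {
        "gpu.index": index,
        "gpu.uuid": uuid,
        "gpu.name": name,
        "service.name": "my-gpu-app",     # from application_name
        "deployment.environment": "dev",  # from environment
    }

print(gpu_metric_attributes(0, "GPU-xxxxxxxx", "NVIDIA A100"))
```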

Viewing GPU Metrics

In the dashboard, navigate to Analytics → GPU to see:

  • GPU utilization over time
  • Memory usage (used vs. total)
  • Temperature and power draw
  • Per-GPU breakdowns for multi-GPU systems

No GPU Detected

If collect_gpu_stats=True but no supported GPU is found, tmam logs:

Tmam GPU Instrumentation Error: No supported GPUs found.
If this is a non-GPU host, set `collect_gpu_stats=False` to disable GPU stats.

This does not affect other tracing or metrics collection — it is non-fatal.
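Since the missing-GPU case is non-fatal but noisy, one option is to enable collection only when a GPU actually appears to be present. A sketch of that guard (probing for the vendor CLIs on PATH is a heuristic of this example, not something tmam does itself):

```python
import shutil

# Heuristic: enable GPU stats only when a vendor CLI tool is on PATH.
gpu_present = any(shutil.which(tool) for tool in ("nvidia-smi", "rocm-smi"))
print(gpu_present)
```

The resulting boolean can then be passed as collect_gpu_stats=gpu_present in the init() call, so the same code runs cleanly on both GPU and non-GPU hosts.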


Example: LLM + GPU Monitoring

from tmam import init
from transformers import pipeline

init(
    url="http://localhost:5050/api/sdk",
    public_key="pk-tmam-xxxxxxxx",
    secrect_key="sk-tmam-xxxxxxxx",
    application_name="local-llm",
    environment="dev",
    collect_gpu_stats=True,  # monitor GPU while running inference
)

# Transformers calls are auto-instrumented
generator = pipeline("text-generation", model="gpt2", device=0)
output = generator("The future of AI is", max_new_tokens=50)
print(output[0]["generated_text"])

While inference runs, tmam records both the LLM span (tokens, latency) and GPU metrics (VRAM usage, utilization) — correlatable by timestamp in the dashboard.
