Add SystemPerformanceInfo for compute platform telemetry #251

Open
follesoe wants to merge 3 commits into master from system-performance-telemetry

@follesoe (Member) commented Mar 5, 2026

Summary

Adds comprehensive system performance telemetry to support monitoring across both iMX6 (X3) and Jetson Orin NX 16 GB (X3 Ultra / X7) platforms — similar to what jtop, htop, and tegrastats provide.

New messages in message_formats.proto

| Message | Purpose |
|---|---|
| CpuCoreLoad | Per-core CPU index, load (0..1), and clock frequency (MHz) |
| GpuInfo | GPU load, frequency, and temperature |
| DlaInfo | Per-engine Deep Learning Accelerator load and frequency |
| MemoryInfo | RAM and swap totals/used/cached (uint64 to handle 16 GB+) |
| ThermalZone | Typed thermal zone reading using ThermalZoneId enum (CPU, GPU, SOC, BOARD, TJ) |
| VideoCodecInfo | NVENC/NVDEC encoder and decoder load |
| PowerRailInfo | Per-rail voltage, current, and power from INA sensors |
| SystemPerformanceInfo | Composite message combining all of the above plus queue loads |

New in telemetry.proto

  • SystemPerformanceInfoTel — telemetry wrapper for SystemPerformanceInfo

Deprecations

  • CPUInfo — superseded by SystemPerformanceInfo
  • CPUTemperature — superseded by SystemPerformanceInfo.thermal_zones

Both are kept intact for backward compatibility.

Wire size estimates

Jetson Orin NX (all fields populated: 8 cores, GPU, 2 DLA engines, 4 thermal zones, 4 power rails):

| Field | Bytes |
|---|---|
| cpu_cores (×8) | ~112 |
| cpu_load_average | 5 |
| gpu | ~17 |
| dla_engines (×2) | ~28 |
| memory | ~57 |
| thermal_zones (×4) | ~36 |
| power_rails (×4) | ~112 |
| video_codec | ~12 |
| Queue loads (×3) | 15 |
| Total | ~394 B |

iMX6 (4 cores, 1 thermal zone, no GPU/DLA/power/codec): ~133 B

At 1–10 Hz publish rates the fully populated Orin NX payload comes to roughly 0.4–3.9 KB/s on the wire, which stays under 5 KB/s.
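
The per-field figures above can be sanity-checked by hand from the protobuf wire format. The sketch below is only an illustration: the CpuCoreLoad layout it uses (a varint index in field 1, float load in field 2, float frequency in field 3, nested under field 1 of the parent) is an assumption chosen to match the ~112 B estimate, not the actual definition in message_formats.proto.

```python
def varint_size(value: int) -> int:
    """Bytes needed to varint-encode a non-negative integer."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def tag_size(field_number: int) -> int:
    # A tag is the varint of (field_number << 3) | wire_type;
    # the 3-bit wire type never changes the encoded length.
    return varint_size(field_number << 3)


def float_field(field_number: int) -> int:
    # 32-bit float: tag plus 4 fixed bytes.
    return tag_size(field_number) + 4


def varint_field(field_number: int, value: int) -> int:
    return tag_size(field_number) + varint_size(value)


def submessage_field(field_number: int, payload_size: int) -> int:
    # Length-delimited: tag, varint length prefix, then the payload.
    return tag_size(field_number) + varint_size(payload_size) + payload_size


# Hypothetical CpuCoreLoad layout (field numbers and types are assumptions).
core_payload = varint_field(1, 7) + float_field(2) + float_field(3)
cpu_cores_bytes = 8 * submessage_field(1, core_payload)


def bandwidth_bytes_per_s(payload_bytes: int, rate_hz: int) -> int:
    return payload_bytes * rate_hz


# Totals taken from the PR description.
orin_bandwidth = bandwidth_bytes_per_s(394, 10)
imx6_bandwidth = bandwidth_bytes_per_s(133, 10)

print(cpu_cores_bytes, orin_bandwidth, imx6_bandwidth)
```

Under these assumptions each core costs 14 bytes (1 tag + 1 length + 12 payload). A core with index 0 would encode slightly smaller in proto3 because default values are omitted, which is one reason the table figures are approximate.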

Design decisions

  • ThermalZoneId enum instead of string for thermal zones — saves ~25 bytes and provides type safety. Zones are well-known and finite across our platforms.
  • PowerRailInfo.name stays as string — rail names vary across board revisions, so a string is more flexible here.
  • uint64 for memory fields: uint32 would overflow at 4 GB, insufficient for the 16 GB Orin NX.
  • Platform-agnostic composite message — unpopulated fields are zero/empty by default in protobuf, so iMX6 simply omits GPU/DLA/power/codec fields with no overhead.
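
The uint64 decision can be illustrated with a quick check. This is a minimal sketch that assumes the memory fields are reported in bytes, which the "overflow at 4 GB" remark implies but the description does not state outright.

```python
UINT32_MAX = 2**32 - 1
GIB = 2**30


def fits_uint32(value: int) -> bool:
    return 0 <= value <= UINT32_MAX


def varint_size(value: int) -> int:
    """Bytes needed to varint-encode a non-negative integer."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


orin_ram_bytes = 16 * GIB  # 16 GB Orin NX, assuming byte units

# What a 32-bit field would actually carry after truncation.
truncated = orin_ram_bytes & UINT32_MAX

print(fits_uint32(orin_ram_bytes))  # False
print(truncated)                    # 0
print(varint_size(orin_ram_bytes))  # 5
```

Because uint64 is varint-encoded on the wire, the wider type costs nothing for small values; a 16 GiB total takes 5 bytes plus the tag.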

🤖 Generated with Claude Code

Add new protocol messages to support detailed performance monitoring
across both iMX6 (X3) and Jetson Orin NX (X3 Ultra/X7) platforms.

New messages: CpuCoreLoad, GpuInfo, DlaInfo, MemoryInfo, ThermalZone,
VideoCodecInfo, PowerRailInfo, and a composite SystemPerformanceInfo
with corresponding SystemPerformanceInfoTel telemetry wrapper.

Deprecates CPUInfo, CPUTemperature in favor of the new messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@follesoe follesoe requested review from Copilot and jp-pino and removed request for jp-pino March 5, 2026 09:59
@follesoe follesoe self-assigned this Mar 5, 2026
@follesoe follesoe added the enhancement New feature or request label Mar 5, 2026
@follesoe follesoe added this to the Blunux v4.7 milestone Mar 5, 2026
@follesoe follesoe marked this pull request as ready for review March 5, 2026 10:00
@follesoe follesoe requested a review from jp-pino March 5, 2026 10:00
Copilot AI (Contributor) left a comment

Pull request overview

Adds a new composite telemetry schema for system performance monitoring across compute platforms (iMX6 and Jetson Orin NX), introducing a richer replacement for legacy CPU-only telemetry while keeping backward compatibility.

Changes:

  • Added SystemPerformanceInfo and supporting messages/enums to message_formats.proto (CPU cores, GPU/DLA, memory, thermals, power rails, video codec, queue loads).
  • Added SystemPerformanceInfoTel wrapper to telemetry.proto.
  • Marked CPUInfo / CPUTemperature and CPUInfoTel as deprecated via comments pointing to the new message(s).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| protobuf_definitions/telemetry.proto | Adds SystemPerformanceInfoTel and marks CPUInfoTel as deprecated in comments. |
| protobuf_definitions/message_formats.proto | Introduces SystemPerformanceInfo and related component messages; annotates legacy CPU messages as deprecated. |

- Capitalize "cpu" to "CPU" in telemetry deprecation comment for
  consistency with existing naming conventions.
- Rename cpu_load_average to cpu_utilization to avoid confusion with
  Linux load average (which is unbounded and reported as 1/5/15 min).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
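
To illustrate the distinction behind the rename: a utilization value in the 0..1 range is typically derived from the change in busy versus total CPU time between two samples, whereas the Linux load average (for example from `os.getloadavg()`) counts runnable tasks and can exceed 1. The sketch below assumes the aggregate `cpu` line format of `/proc/stat`; the function names and the sample values are made up for illustration and are not taken from this PR.

```python
def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle_jiffies, total_jiffies) from a /proc/stat 'cpu' line.

    Fields after the label: user nice system idle iowait irq softirq steal.
    """
    fields = [int(x) for x in line.split()[1:9]]
    idle = fields[3] + fields[4]  # idle + iowait
    total = sum(fields)
    return idle, total


def cpu_utilization(prev: tuple[int, int], curr: tuple[int, int]) -> float:
    """Fraction of non-idle time between two samples, bounded to 0..1."""
    idle_delta = curr[0] - prev[0]
    total_delta = curr[1] - prev[1]
    if total_delta <= 0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - idle_delta / total_delta))


# Illustrative samples (not real measurements).
first = parse_cpu_line("cpu 100 0 50 800 50 0 0 0")
second = parse_cpu_line("cpu 250 0 100 1100 50 0 0 0")

utilization = cpu_utilization(first, second)
print(round(utilization, 3))
```

On a device, the two lines would come from reading `/proc/stat` at consecutive publish intervals.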

@jp-pino (Contributor) left a comment

For reference this is the info we can probably get:

[3 attached images]

Probably also good to report the current power mode

tegrastats and jtop only expose active/inactive status and clock
frequency for NVENC/NVDEC, not utilization percentages. Updated
fields from load floats to bool active + frequency pairs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@follesoe follesoe requested a review from jp-pino March 5, 2026 12:19