@alexsu52

Changes:

Only flag an imbalance if the COUNT of GPUs differs between NUMA nodes.
Examples:
4 GPUs on Node 0, 4 on Node 1 -> counts=[4,4] -> set={4} -> len=1 -> NOT imbalanced.
7 GPUs on Node 0, 1 on Node 1 -> counts=[7,1] -> set={7,1} -> len=2 -> imbalanced.
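
A minimal Python sketch of the rule above, for illustration only. It is not the code from this PR. The function name `is_numa_imbalanced` and the input format (a list giving the NUMA node of each GPU) are hypothetical. Only nodes that host at least one GPU are counted, matching the examples.

```python
from collections import Counter
from typing import Sequence


def is_numa_imbalanced(gpu_numa_nodes: Sequence[int]) -> bool:
    """Return True if GPUs are spread unevenly across the NUMA nodes they occupy.

    Hypothetical input format: gpu_numa_nodes[i] is the NUMA node GPU i is attached to.
    """
    # GPUs per occupied NUMA node, e.g. [0, 0, 1, 1] -> {0: 2, 1: 2}
    counts = Counter(gpu_numa_nodes).values()
    # Balanced when every occupied node hosts the same number of GPUs.
    return len(set(counts)) > 1


# 4 GPUs on node 0, 4 on node 1 -> set of counts {4} -> balanced
print(is_numa_imbalanced([0] * 4 + [1] * 4))  # False
# 7 GPUs on node 0, 1 on node 1 -> set of counts {7, 1} -> imbalanced
print(is_numa_imbalanced([0] * 7 + [1]))  # True
```

Note that, under this rule, all GPUs on a single node also counts as balanced, because there is only one distinct count.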

Reason for changes:

The previous logic issued a NUMA imbalance warning whenever the GPUs were not all attached to the same NUMA node. This produced a false positive on multi-socket CPU systems, where GPUs are commonly split evenly across sockets.
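
To show the false positive, here is a rough Python reconstruction of the old check, based only on the description above. The name `old_is_numa_imbalanced` and the per-GPU node list input are hypothetical and not taken from the actual code.

```python
from typing import Sequence


def old_is_numa_imbalanced(gpu_numa_nodes: Sequence[int]) -> bool:
    # Old behaviour as described: warn unless every GPU sits on a single NUMA node.
    return len(set(gpu_numa_nodes)) > 1


# Dual-socket host with 4 GPUs per socket: evenly split, yet the old check warns.
print(old_is_numa_imbalanced([0] * 4 + [1] * 4))  # True (false positive)
```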

Copilot AI review requested due to automatic review settings February 12, 2026 11:29

Copilot AI left a comment

Pull request overview

This PR fixes a false-positive NUMA imbalance warning that occurred when GPUs were distributed across multiple NUMA nodes in a balanced manner (e.g., multi-socket CPUs with equal GPU counts per socket). The logic now identifies an imbalance only when GPU counts differ between nodes.

Changes:

  • Updated NUMA imbalance detection logic to check for uneven GPU distribution across nodes rather than merely checking if GPUs span multiple nodes

@alexsu52 alexsu52 changed the title Fix NUMA imbalance to mean uneven GPU distribution across nodes Fix/preflight NUMA imbalance to mean uneven GPU distribution across nodes Feb 12, 2026