[release-1.14] Fix hami vGPU scheduling failure in large and medium-scale clusters by volcano-sh-bot · Pull Request #5427 · volcano-sh/volcano

volcano-sh-bot · 2026-06-11T06:58:24Z

This is an automated cherry-pick of #5393

Signed-off-by: fanhy36 <fanhy36@chinaunicom.cn>

volcano-sh-bot · 2026-06-11T06:58:35Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kingeasternsun for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

pkg/scheduler/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist

Code Review

This pull request removes the vGPU handshake mechanism and its associated node annotation patching, while also removing the unused patchNodeAnnotations utility function. Instead, it introduces validation checks to ensure that nodes have allocatable vGPU resources (VolcanoVGPUNumber, VolcanoVGPUCores, and VolcanoVGPUMemory) with non-zero values. The feedback suggests refactoring these repetitive resource validation checks into a loop to improve code maintainability.
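The acceptance rule described above can be illustrated with a small stand-alone sketch. It assumes that a plain `map[string]int64` stands in for `node.Status.Allocatable` (a `v1.ResourceList`), with the integer playing the role of `resource.Quantity.Value()`. The resource-name strings are assumed values of the `deviceconfig` constants and should be checked against the Volcano source. `hasAllocatableVGPU` is a hypothetical helper, not a function from this PR. The sketch shows that a node is accepted only when all three resources are present, and that a resource reported with a value of 0 is rejected just like a missing one.

```go
package main

import "fmt"

// Assumed string values of deviceconfig.VolcanoVGPUNumber, VolcanoVGPUCores
// and VolcanoVGPUMemory; verify them against the Volcano source tree.
const (
	vgpuNumber = "volcano.sh/vgpu-number"
	vgpuCores  = "volcano.sh/vgpu-cores"
	vgpuMemory = "volcano.sh/vgpu-memory"
)

// hasAllocatableVGPU is a hypothetical helper mirroring the three checks in
// this PR. A map[string]int64 stands in for v1.ResourceList, and the int64
// stands in for resource.Quantity.Value().
func hasAllocatableVGPU(allocatable map[string]int64) bool {
	// Each check rejects the node if the resource is absent OR reported as 0.
	if v, ok := allocatable[vgpuNumber]; !ok || v == 0 {
		return false
	}
	if v, ok := allocatable[vgpuCores]; !ok || v == 0 {
		return false
	}
	if v, ok := allocatable[vgpuMemory]; !ok || v == 0 {
		return false
	}
	return true
}

func main() {
	// Illustrative nodes only; the quantities are made up.
	names := []string{"all-set", "zero-cores", "no-memory", "cpu-only"}
	nodes := map[string]map[string]int64{
		"all-set":    {vgpuNumber: 10, vgpuCores: 100, vgpuMemory: 16384},
		"zero-cores": {vgpuNumber: 10, vgpuCores: 0, vgpuMemory: 16384},
		"no-memory":  {vgpuNumber: 10, vgpuCores: 100},
		"cpu-only":   {},
	}
	for _, name := range names {
		fmt.Printf("%s: %v\n", name, hasAllocatableVGPU(nodes[name]))
	}
}
```

Running it prints `true` only for `all-set`; `zero-cores`, `no-memory` and `cpu-only` all print `false`, since a zero quantity is treated the same as an absent one.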

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-11T06:59:33Z

+		gpuNumberRes, gpuNumberExists := node.Status.Allocatable[v1.ResourceName(deviceconfig.VolcanoVGPUNumber)]
+		if !gpuNumberExists || gpuNumberRes.Value() == 0 {
+			klog.V(3).Infof("Node %s does not have allocatable %s resource or value is 0, returning nil", node.Name, deviceconfig.VolcanoVGPUNumber)
+			return nil
+		}
+
+		vgpuCoresRes, vgpuCoresExists := node.Status.Allocatable[v1.ResourceName(deviceconfig.VolcanoVGPUCores)]
+		if !vgpuCoresExists || vgpuCoresRes.Value() == 0 {
+			klog.V(3).Infof("Node %s does not have allocatable %s resource or value is 0, returning nil", node.Name, deviceconfig.VolcanoVGPUCores)
+			return nil
+		}
+
+		vgpuMemoryRes, vgpuMemoryExists := node.Status.Allocatable[v1.ResourceName(deviceconfig.VolcanoVGPUMemory)]
+		if !vgpuMemoryExists || vgpuMemoryRes.Value() == 0 {
+			klog.V(3).Infof("Node %s does not have allocatable %s resource or value is 0, returning nil", node.Name, deviceconfig.VolcanoVGPUMemory)
+			return nil
+		}


The checks for the three allocatable vGPU resources (VolcanoVGPUNumber, VolcanoVGPUCores, and VolcanoVGPUMemory) are highly repetitive. We can simplify this code and improve maintainability by iterating over a slice of the required resource names in a loop.

Suggested change

-		gpuNumberRes, gpuNumberExists := node.Status.Allocatable[v1.ResourceName(deviceconfig.VolcanoVGPUNumber)]
-		if !gpuNumberExists || gpuNumberRes.Value() == 0 {
-			klog.V(3).Infof("Node %s does not have allocatable %s resource or value is 0, returning nil", node.Name, deviceconfig.VolcanoVGPUNumber)
-			return nil
-		}
-
-		vgpuCoresRes, vgpuCoresExists := node.Status.Allocatable[v1.ResourceName(deviceconfig.VolcanoVGPUCores)]
-		if !vgpuCoresExists || vgpuCoresRes.Value() == 0 {
-			klog.V(3).Infof("Node %s does not have allocatable %s resource or value is 0, returning nil", node.Name, deviceconfig.VolcanoVGPUCores)
-			return nil
-		}
-
-		vgpuMemoryRes, vgpuMemoryExists := node.Status.Allocatable[v1.ResourceName(deviceconfig.VolcanoVGPUMemory)]
-		if !vgpuMemoryExists || vgpuMemoryRes.Value() == 0 {
-			klog.V(3).Infof("Node %s does not have allocatable %s resource or value is 0, returning nil", node.Name, deviceconfig.VolcanoVGPUMemory)
-			return nil
-		}
+		requiredResources := []string{
+			deviceconfig.VolcanoVGPUNumber,
+			deviceconfig.VolcanoVGPUCores,
+			deviceconfig.VolcanoVGPUMemory,
+		}
+
+		for _, resName := range requiredResources {
+			res, exists := node.Status.Allocatable[v1.ResourceName(resName)]
+			if !exists || res.Value() == 0 {
+				klog.V(3).Infof("Node %s does not have allocatable %s resource or value is 0, returning nil", node.Name, resName)
+				return nil
+			}
+		}
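The loop-based suggestion can be exercised outside the scheduler with a minimal sketch. As before, a `map[string]int64` stands in for `node.Status.Allocatable`, the resource-name strings are assumed values of the `deviceconfig` constants, and `firstMissingResource` and the node name `node-1` are hypothetical. The helper returns the first required resource that is absent or zero, so the caller can log the same message the suggested loop emits.

```go
package main

import "fmt"

// Assumed values of deviceconfig.VolcanoVGPUNumber, VolcanoVGPUCores and
// VolcanoVGPUMemory, kept in the same order as the suggested change.
var requiredVGPUResources = []string{
	"volcano.sh/vgpu-number",
	"volcano.sh/vgpu-cores",
	"volcano.sh/vgpu-memory",
}

// firstMissingResource is a hypothetical helper: it walks the required names
// in slice order and returns the first one that is absent or has value 0.
// It returns ("", true) when every required resource is present and non-zero.
func firstMissingResource(allocatable map[string]int64, required []string) (string, bool) {
	for _, name := range required {
		if v, ok := allocatable[name]; !ok || v == 0 {
			return name, false
		}
	}
	return "", true
}

func main() {
	// Illustrative node: vgpu-cores is missing and vgpu-memory is 0.
	alloc := map[string]int64{
		"volcano.sh/vgpu-number": 8,
		"volcano.sh/vgpu-memory": 0,
	}
	if name, ok := firstMissingResource(alloc, requiredVGPUResources); !ok {
		fmt.Printf("Node %s does not have allocatable %s resource or value is 0, returning nil\n", "node-1", name)
	}
}
```

This reports `volcano.sh/vgpu-cores`, because cores is checked before memory. Keeping the names in a slice rather than a map matters here: Go randomizes map iteration order, whereas a slice preserves the order of the three original blocks, so the resource named in the log stays the same after the refactor.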

JesseStutler · 2026-06-11T12:20:54Z

This PR mistakenly contains a commit with a Chinese commit message, which needs to be rewritten. I will have Copilot amend the commit and redo the cherry-pick.

删除scheduler和hami-dp之间的handshake (English: remove the handshake between the scheduler and hami-dp)

dfa1e6a

Signed-off-by: fanhy36 <fanhy36@chinaunicom.cn>

volcano-sh-bot mentioned this pull request Jun 11, 2026

Fix hami vGPU scheduling failure in large and medium-scale clusters #5393

Merged

volcano-sh-bot requested review from merryzhou and william-wang June 11, 2026 06:58

volcano-sh-bot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) Jun 11, 2026

gemini-code-assist Bot reviewed Jun 11, 2026


JesseStutler closed this Jun 11, 2026


volcano-sh-bot wants to merge 1 commit into volcano-sh:release-1.14 from volcano-sh-bot:cherry-pick-5393-to-release-1.14

3 participants

Conversation

volcano-sh-bot commented Jun 11, 2026

Uh oh!

volcano-sh-bot commented Jun 11, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

JesseStutler commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants