fix ascend vnpu addResource#5363
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a PodMap to AscendDevice to track device usage per pod, preventing duplicate resource additions or subtractions. It also refactors AddResourceUsage and SubResourceUsage to accept *AscendDevice directly and switches from InRequestDevices to SupportDevices. Feedback on the changes highlights an inefficiency in SubResource where a new DeviceUsage is redundantly allocated and assigned to PodMap immediately before the key is deleted.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Signed-off-by: james <open4pd@4paradigm.com>
efa0705 to
c05e901
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds per-pod tracking for Ascend device allocations to make resource accounting idempotent and improves unit coverage around resource addition.
Changes:
- Introduces
PodMaponAscendDeviceto track per-pod device usage and prevent double-counting. - Updates add/sub resource logic to use
PodMapand to skip unknown devices safely. - Adds a table-driven test suite for
AscendDevices.AddResource.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/scheduler/api/devices/ascend/hami/device_info.go | Adds per-pod usage tracking (PodMap), updates add/sub accounting flow, and extends deepcopy/snapshot to include the new map. |
| pkg/scheduler/api/devices/ascend/hami/device_info_test.go | Adds table-driven tests and helper constructors for AddResource behavior (single/multi/duplicate/invalid annotation, etc.). |
Comments suppressed due to low confidence (2)
pkg/scheduler/api/devices/ascend/hami/device_info.go:154
- This changes the public method signatures from
(id string, ...)to(*AscendDevice, ...), which is a breaking API change for any external callers. Consider keeping the originalid-based exported methods as wrappers (lookup + call an unexported helper), or make the pointer-taking variants unexported to avoid forcing downstream changes.
func (ads *AscendDevices) AddResourceUsage(dev *AscendDevice, cores int32, mem int32) error {
dev.DeviceUsage.Used++
dev.DeviceUsage.Usedcores += cores
dev.DeviceUsage.Usedmem += mem
return nil
}
func (ads *AscendDevices) SubResourceUsage(dev *AscendDevice, cores int32, mem int32) error {
dev.DeviceUsage.Used--
dev.DeviceUsage.Usedcores -= cores
dev.DeviceUsage.Usedmem -= mem
return nil
}
pkg/scheduler/api/devices/ascend/hami/device_info.go:171
- The SubResource behavior changed materially (annotation key source + PodMap-gated subtraction). There are new tests for AddResource, but no corresponding unit tests asserting SubResource decrements exactly once and cleans up PodMap (including cases like missing device, repeated SubResource calls for same pod, and PodMap missing entry behavior). Adding targeted tests here would prevent regressions.
func (ads *AscendDevices) SubResource(pod *v1.Pod) {
if ads == nil {
return
}
ano_key := devices.SupportDevices[ads.Type]
ano, ok := pod.Annotations[ano_key]
if !ok {
return
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: james <open4pd@4paradigm.com>
ca2b977 to
e6c8337
Compare
|
Please help /cc @archlitchi |
|
@DSFans2014 have you tested on a NPU environment? |
Yes, I've already tested this fix locally. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: archlitchi The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/cherrypick release-1.15 |
|
@JesseStutler: new pull request created: #5377 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #5360
Special notes for your reviewer:
Does this PR introduce a user-facing change?