feat(predicates): implement DRA Score/NormalizeScore and add compile-time reflection checks#5423
feat(predicates): implement DRA Score/NormalizeScore and add compile-time reflection checks#5423gagan-devv wants to merge 3 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @gagan-devv! It looks like this is your first PR to volcano-sh/volcano 🎉 |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds Dynamic Resource Allocation (DRA) scoring support to the predicates plugin and introduces E2E + unit guardrail tests to validate scheduling/scoring behavior and upstream framework interface changes.
Changes:
- Add a new scheduling E2E test suite and a DRA score-focused E2E spec.
- Register DynamicResources as a Score plugin (with adapter + normalization support) and update existing plugin init tests.
- Add a guardrail test that reflects upstream plugin framework interface implementations and unit tests for the score adapter’s PreScore behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| test/e2e/scheduling/main_test.go | Adds suite-level client initialization for scheduling E2E tests. |
| test/e2e/scheduling/e2e_test.go | Adds Ginkgo suite entrypoint for scheduling E2E tests. |
| test/e2e/scheduling/dra_score.go | Adds an E2E spec validating scheduling using DRA scores and sets up a DeviceClass/ResourceClaimTemplate. |
| pkg/scheduler/plugins/predicates/predicates_test.go | Updates expectations to include DynamicResources in Score when enabled. |
| pkg/scheduler/plugins/predicates/predicates_extension_test.go | Adds reflection-based guardrail for upstream plugin interfaces + unit test for score adapter PreScore behavior. |
| pkg/scheduler/plugins/predicates/predicates.go | Enables DRA via args, registers DynamicResources scoring, improves normalizer selection, adds score adapter. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func TestMain(m *testing.M) { | ||
| home := e2eutil.HomeDir() | ||
| configPath := e2eutil.KubeconfigPath(home) | ||
| config, _ := clientcmd.BuildConfigFromFlags(e2eutil.MasterURL(), configPath) | ||
| e2eutil.VcClient = vcclient.NewForConfigOrDie(config) | ||
| e2eutil.KubeClient = kubernetes.NewForConfigOrDie(config) | ||
| os.Exit(m.Run()) | ||
| } |
| driver := drautils.NewDriver(f, nodes, drautils.DriverResources(2)) | ||
|
|
||
| BeforeEach(func() { | ||
| ensureDeviceClass(driver.Name) |
| { | ||
| Name: "req-1", | ||
| Exactly: &resourcev1.ExactDeviceRequest{ | ||
| DeviceClassName: "dra-score-class", |
| func ensureDeviceClass(driverName string) { | ||
| class := &resourcev1.DeviceClass{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: "dra-score-class", | ||
| }, |
| DeferCleanup(func() { | ||
| err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Delete(context.TODO(), "dra-score-class", metav1.DeleteOptions{}) | ||
| if err != nil && !apierrors.IsNotFound(err) { | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| } | ||
| }) | ||
| } |
| "PostBindPlugin": reflect.TypeOf((*fwk.PostBindPlugin)(nil)).Elem(), | ||
| "QueueSortPlugin": reflect.TypeOf((*fwk.QueueSortPlugin)(nil)).Elem(), | ||
| "PreEnqueuePlugin": reflect.TypeOf((*fwk.PreEnqueuePlugin)(nil)).Elem(), | ||
| "SignPlugin": reflect.TypeOf((*fwk.SignPlugin)(nil)).Elem(), |
| func (s *scorePluginAdapter) PreScore(ctx context.Context, state fwk.CycleState, pod *v1.Pod, nodes []fwk.NodeInfo) *fwk.Status { | ||
| if !s.enabled { | ||
| return fwk.NewStatus(fwk.Skip, "DRA scoring is disabled") | ||
| } |
| func (s *scorePluginAdapter) Score(ctx context.Context, state fwk.CycleState, pod *v1.Pod, nodeInfo fwk.NodeInfo) (int64, *fwk.Status) { | ||
| if !s.enabled { | ||
| return 0, nil | ||
| } |
| draWeight := 1 | ||
| if w, ok := framework.Get[int](pp.pluginArguments, "dynamicresources.weight"); ok { | ||
| draWeight = w | ||
| } | ||
| addScorePlugin(dynamicresources.Name, &scorePluginAdapter{ | ||
| ScorePlugin: dynamicResourceAllocationPlugin, | ||
| enabled: pp.enabledPredicates.dynamicResourceAllocationEnable, | ||
| }, draWeight) |
There was a problem hiding this comment.
Code Review
This pull request introduces Dynamic Resource Allocation (DRA) scoring support in the predicates plugin, along with corresponding unit and E2E tests. Feedback on the changes includes addressing a potential nil pointer dereference in the Score method when logging node information, properly handling errors when building the kubeconfig in E2E tests, and replacing context.TODO() with context.Background() in test contexts.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func (s *scorePluginAdapter) Score(ctx context.Context, state fwk.CycleState, pod *v1.Pod, nodeInfo fwk.NodeInfo) (int64, *fwk.Status) { | ||
| if !s.enabled { | ||
| return 0, nil | ||
| } | ||
| klog.V(4).Infof("Scoring pod %s/%s on node %s using DRA", pod.Namespace, pod.Name, nodeInfo.Node().Name) | ||
| return s.ScorePlugin.Score(ctx, state, pod, nodeInfo) | ||
| } |
There was a problem hiding this comment.
In Score, logging nodeInfo.Node().Name directly can cause a nil pointer dereference panic if nodeInfo or nodeInfo.Node() is nil (which can happen in mock/test environments or edge cases). It is safer to check for nil before accessing the node name.
func (s *scorePluginAdapter) Score(ctx context.Context, state fwk.CycleState, pod *v1.Pod, nodeInfo fwk.NodeInfo) (int64, *fwk.Status) {
if !s.enabled {
return 0, nil
}
nodeName := ""
if nodeInfo != nil && nodeInfo.Node() != nil {
nodeName = nodeInfo.Node().Name
}
klog.V(4).Infof("Scoring pod %s/%s on node %s using DRA", pod.Namespace, pod.Name, nodeName)
return s.ScorePlugin.Score(ctx, state, pod, nodeInfo)
}| func TestMain(m *testing.M) { | ||
| home := e2eutil.HomeDir() | ||
| configPath := e2eutil.KubeconfigPath(home) | ||
| config, _ := clientcmd.BuildConfigFromFlags(e2eutil.MasterURL(), configPath) |
There was a problem hiding this comment.
The error returned by clientcmd.BuildConfigFromFlags is ignored. If building the config fails, config will be nil, causing NewForConfigOrDie to panic with an unhelpful error. It is better to handle the error explicitly.
| config, _ := clientcmd.BuildConfigFromFlags(e2eutil.MasterURL(), configPath) | |
| config, err := clientcmd.BuildConfigFromFlags(e2eutil.MasterURL(), configPath) | |
| if err != nil { | |
| panic(err) | |
| } |
| _, err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Create(context.TODO(), class, metav1.CreateOptions{}) | ||
| if err != nil && !apierrors.IsAlreadyExists(err) { | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| } | ||
|
|
||
| DeferCleanup(func() { | ||
| err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Delete(context.TODO(), "dra-score-class", metav1.DeleteOptions{}) | ||
| if err != nil && !apierrors.IsNotFound(err) { | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| } | ||
| }) |
There was a problem hiding this comment.
Use context.Background() instead of context.TODO() as a default context for API calls in E2E tests when a specific request context is not available.
| _, err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Create(context.TODO(), class, metav1.CreateOptions{}) | |
| if err != nil && !apierrors.IsAlreadyExists(err) { | |
| Expect(err).NotTo(HaveOccurred()) | |
| } | |
| DeferCleanup(func() { | |
| err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Delete(context.TODO(), "dra-score-class", metav1.DeleteOptions{}) | |
| if err != nil && !apierrors.IsNotFound(err) { | |
| Expect(err).NotTo(HaveOccurred()) | |
| } | |
| }) | |
| _, err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Create(context.Background(), class, metav1.CreateOptions{}) | |
| if err != nil && !apierrors.IsAlreadyExists(err) { | |
| Expect(err).NotTo(HaveOccurred()) | |
| } | |
| DeferCleanup(func() { | |
| err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Delete(context.Background(), "dra-score-class", metav1.DeleteOptions{}) | |
| if err != nil && !apierrors.IsNotFound(err) { | |
| Expect(err).NotTo(HaveOccurred()) | |
| } | |
| }) |
…ardrails (volcano-sh#5329) Signed-off-by: Gagan Ahlawat <gagan.devvv@gmail.com>
Fixes parsing of predicate.DynamicResourceAllocationEnable in predicates plugin. Improves scorePluginAdapter to short-circuit when disabled and comply with klog.V(4). Adds Ginkgo E2E test suite under test/e2e/scheduling to verify DRA score scheduling. Signed-off-by: Gagan Ahlawat <gagan.devvv@gmail.com>
Adds nil-safety checks to scorePluginAdapter.Score and configures weight validation. Removes redundant enabled flag in scorePluginAdapter for consistent disabled behavior. Updates E2E tests with unique DeviceClass names and handles BuildConfigFromFlags errors. Signed-off-by: Gagan Ahlawat <gagan.devvv@gmail.com>
eb11446 to
cc59923
Compare
|
Hi @JesseStutler, I have successfully implemented the requested framework changes and opened this PR for review! Summary of Changes:
All unit tests and local Kind integration suites pass successfully. Ready for your review and feedback! |
Description
This PR resolves the synchronization gap between Volcano's scheduling predicates adapter layer and upstream
kube-schedulerframework extensions regarding Dynamic Resource Allocation (DRA).Key Implementations:
ScoreandNormalizeScoreextension entry points within Volcano's predicates proxy, allowing the scheduler to respect upstream DRA prioritization metrics when computing node weights.DynamicResourceAllocationEnableflag, ensuring the scoring pathway short-circuits gracefully to a neutral weight if DRA is disabled.predicates_extension_test.go) utilizing Go'sreflectpackage to automatically audit wrapped in-tree plugins (DynamicResourcesandVolumeBinding). The test maintains an explicit allowed interface whitelist and will throw a test failure if upstream updates introduce unmapped interface hooks.test/e2e/scheduling/dra_score.go), explicitly declaringschedulerName: volcano.Fixes #5329
Type of Change
Verification Conducted
go test -v ./pkg/scheduler/plugins/predicates/...executes successfully and the reflection matrix catches un-whitelisted assertions.dra_score.gosuite successfully passes under Volcano's scheduling loop runtime.