Skip to content

feat(predicates): implement DRA Score/NormalizeScore and add compile-time reflection checks#5423

Open
gagan-devv wants to merge 3 commits into
volcano-sh:masterfrom
gagan-devv:feature/dra-score-guardrails
Open

feat(predicates): implement DRA Score/NormalizeScore and add compile-time reflection checks#5423
gagan-devv wants to merge 3 commits into
volcano-sh:masterfrom
gagan-devv:feature/dra-score-guardrails

Conversation

@gagan-devv

Copy link
Copy Markdown

Description

This PR resolves the synchronization gap between Volcano's scheduling predicates adapter layer and upstream kube-scheduler framework extensions regarding Dynamic Resource Allocation (DRA).

Key Implementations:

  1. DRA Weight Mappings: Fully wired up the Score and NormalizeScore extension entry points within Volcano's predicates proxy, allowing the scheduler to respect upstream DRA prioritization metrics when computing node weights.
  2. Configuration Compliance: Wrapped execution vectors to respect the DynamicResourceAllocationEnable flag, ensuring the scoring pathway short-circuits gracefully to a neutral weight if DRA is disabled.
  3. CI Compile-Time Guardrails: Implemented a reflection-based unit test (predicates_extension_test.go) utilizing Go's reflect package to automatically audit wrapped in-tree plugins (DynamicResources and VolumeBinding). The test maintains an explicit allowed interface whitelist and will throw a test failure if upstream updates introduce unmapped interface hooks.
  4. E2E Testing Matrix: Ported and adapted core DRA scoring integration tests from upstream Kubernetes into Volcano's Ginkgo/Gomega suite (test/e2e/scheduling/dra_score.go), explicitly declaring schedulerName: volcano.

Fixes #5329

Type of Change

  • Bug fix / Feature enablement (non-breaking change which fixes an issue / adds functionality)
  • Automated Test Infrastructure Append

Verification Conducted

  • Unit Tests: Verified that go test -v ./pkg/scheduler/plugins/predicates/... executes successfully and the reflection matrix catches un-whitelisted assertions.
  • E2E Integration Tests: Provisioned a local multi-node Kind cluster and verified that the newly ported dra_score.go suite successfully passes under Volcano's scheduling loop runtime.

Copilot AI review requested due to automatic review settings June 10, 2026 09:45
@volcano-sh-bot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign archlitchi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot

Copy link
Copy Markdown
Contributor

Welcome @gagan-devv! It looks like this is your first PR to volcano-sh/volcano 🎉

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 10, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds Dynamic Resource Allocation (DRA) scoring support to the predicates plugin and introduces E2E + unit guardrail tests to validate scheduling/scoring behavior and upstream framework interface changes.

Changes:

  • Add a new scheduling E2E test suite and a DRA score-focused E2E spec.
  • Register DynamicResources as a Score plugin (with adapter + normalization support) and update existing plugin init tests.
  • Add a guardrail test that reflects upstream plugin framework interface implementations and unit tests for the score adapter’s PreScore behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
test/e2e/scheduling/main_test.go Adds suite-level client initialization for scheduling E2E tests.
test/e2e/scheduling/e2e_test.go Adds Ginkgo suite entrypoint for scheduling E2E tests.
test/e2e/scheduling/dra_score.go Adds an E2E spec validating scheduling using DRA scores and sets up a DeviceClass/ResourceClaimTemplate.
pkg/scheduler/plugins/predicates/predicates_test.go Updates expectations to include DynamicResources in Score when enabled.
pkg/scheduler/plugins/predicates/predicates_extension_test.go Adds reflection-based guardrail for upstream plugin interfaces + unit test for score adapter PreScore behavior.
pkg/scheduler/plugins/predicates/predicates.go Enables DRA via args, registers DynamicResources scoring, improves normalizer selection, adds score adapter.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +31 to +38
func TestMain(m *testing.M) {
home := e2eutil.HomeDir()
configPath := e2eutil.KubeconfigPath(home)
config, _ := clientcmd.BuildConfigFromFlags(e2eutil.MasterURL(), configPath)
e2eutil.VcClient = vcclient.NewForConfigOrDie(config)
e2eutil.KubeClient = kubernetes.NewForConfigOrDie(config)
os.Exit(m.Run())
}
Comment thread test/e2e/scheduling/dra_score.go Outdated
driver := drautils.NewDriver(f, nodes, drautils.DriverResources(2))

BeforeEach(func() {
ensureDeviceClass(driver.Name)
Comment thread test/e2e/scheduling/dra_score.go Outdated
{
Name: "req-1",
Exactly: &resourcev1.ExactDeviceRequest{
DeviceClassName: "dra-score-class",
Comment thread test/e2e/scheduling/dra_score.go Outdated
Comment on lines +108 to +112
func ensureDeviceClass(driverName string) {
class := &resourcev1.DeviceClass{
ObjectMeta: metav1.ObjectMeta{
Name: "dra-score-class",
},
Comment on lines +127 to +133
DeferCleanup(func() {
err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Delete(context.TODO(), "dra-score-class", metav1.DeleteOptions{})
if err != nil && !apierrors.IsNotFound(err) {
Expect(err).NotTo(HaveOccurred())
}
})
}
"PostBindPlugin": reflect.TypeOf((*fwk.PostBindPlugin)(nil)).Elem(),
"QueueSortPlugin": reflect.TypeOf((*fwk.QueueSortPlugin)(nil)).Elem(),
"PreEnqueuePlugin": reflect.TypeOf((*fwk.PreEnqueuePlugin)(nil)).Elem(),
"SignPlugin": reflect.TypeOf((*fwk.SignPlugin)(nil)).Elem(),
Comment on lines +940 to +943
func (s *scorePluginAdapter) PreScore(ctx context.Context, state fwk.CycleState, pod *v1.Pod, nodes []fwk.NodeInfo) *fwk.Status {
if !s.enabled {
return fwk.NewStatus(fwk.Skip, "DRA scoring is disabled")
}
Comment on lines +950 to +953
func (s *scorePluginAdapter) Score(ctx context.Context, state fwk.CycleState, pod *v1.Pod, nodeInfo fwk.NodeInfo) (int64, *fwk.Status) {
if !s.enabled {
return 0, nil
}
Comment on lines +631 to +638
draWeight := 1
if w, ok := framework.Get[int](pp.pluginArguments, "dynamicresources.weight"); ok {
draWeight = w
}
addScorePlugin(dynamicresources.Name, &scorePluginAdapter{
ScorePlugin: dynamicResourceAllocationPlugin,
enabled: pp.enabledPredicates.dynamicResourceAllocationEnable,
}, draWeight)

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces Dynamic Resource Allocation (DRA) scoring support in the predicates plugin, along with corresponding unit and E2E tests. Feedback on the changes includes addressing a potential nil pointer dereference in the Score method when logging node information, properly handling errors when building the kubeconfig in E2E tests, and replacing context.TODO() with context.Background() in test contexts.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +950 to +956
func (s *scorePluginAdapter) Score(ctx context.Context, state fwk.CycleState, pod *v1.Pod, nodeInfo fwk.NodeInfo) (int64, *fwk.Status) {
if !s.enabled {
return 0, nil
}
klog.V(4).Infof("Scoring pod %s/%s on node %s using DRA", pod.Namespace, pod.Name, nodeInfo.Node().Name)
return s.ScorePlugin.Score(ctx, state, pod, nodeInfo)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In Score, logging nodeInfo.Node().Name directly can cause a nil pointer dereference panic if nodeInfo or nodeInfo.Node() is nil (which can happen in mock/test environments or edge cases). It is safer to check for nil before accessing the node name.

func (s *scorePluginAdapter) Score(ctx context.Context, state fwk.CycleState, pod *v1.Pod, nodeInfo fwk.NodeInfo) (int64, *fwk.Status) {
	if !s.enabled {
		return 0, nil
	}
	nodeName := ""
	if nodeInfo != nil && nodeInfo.Node() != nil {
		nodeName = nodeInfo.Node().Name
	}
	klog.V(4).Infof("Scoring pod %s/%s on node %s using DRA", pod.Namespace, pod.Name, nodeName)
	return s.ScorePlugin.Score(ctx, state, pod, nodeInfo)
}

Comment thread test/e2e/scheduling/main_test.go Outdated
func TestMain(m *testing.M) {
home := e2eutil.HomeDir()
configPath := e2eutil.KubeconfigPath(home)
config, _ := clientcmd.BuildConfigFromFlags(e2eutil.MasterURL(), configPath)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error returned by clientcmd.BuildConfigFromFlags is ignored. If building the config fails, config will be nil, causing NewForConfigOrDie to panic with an unhelpful error. It is better to handle the error explicitly.

Suggested change
config, _ := clientcmd.BuildConfigFromFlags(e2eutil.MasterURL(), configPath)
config, err := clientcmd.BuildConfigFromFlags(e2eutil.MasterURL(), configPath)
if err != nil {
panic(err)
}

Comment thread test/e2e/scheduling/dra_score.go Outdated
Comment on lines +122 to +132
_, err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Create(context.TODO(), class, metav1.CreateOptions{})
if err != nil && !apierrors.IsAlreadyExists(err) {
Expect(err).NotTo(HaveOccurred())
}

DeferCleanup(func() {
err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Delete(context.TODO(), "dra-score-class", metav1.DeleteOptions{})
if err != nil && !apierrors.IsNotFound(err) {
Expect(err).NotTo(HaveOccurred())
}
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use context.Background() instead of context.TODO() as a default context for API calls in E2E tests when a specific request context is not available.

Suggested change
_, err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Create(context.TODO(), class, metav1.CreateOptions{})
if err != nil && !apierrors.IsAlreadyExists(err) {
Expect(err).NotTo(HaveOccurred())
}
DeferCleanup(func() {
err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Delete(context.TODO(), "dra-score-class", metav1.DeleteOptions{})
if err != nil && !apierrors.IsNotFound(err) {
Expect(err).NotTo(HaveOccurred())
}
})
_, err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Create(context.Background(), class, metav1.CreateOptions{})
if err != nil && !apierrors.IsAlreadyExists(err) {
Expect(err).NotTo(HaveOccurred())
}
DeferCleanup(func() {
err := e2eutil.KubeClient.ResourceV1().DeviceClasses().Delete(context.Background(), "dra-score-class", metav1.DeleteOptions{})
if err != nil && !apierrors.IsNotFound(err) {
Expect(err).NotTo(HaveOccurred())
}
})

@volcano-sh-bot volcano-sh-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/contains-merge-commits and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 10, 2026
…ardrails (volcano-sh#5329)

Signed-off-by: Gagan Ahlawat <gagan.devvv@gmail.com>
Fixes parsing of predicate.DynamicResourceAllocationEnable in predicates plugin.
Improves scorePluginAdapter to short-circuit when disabled and comply with klog.V(4).
Adds Ginkgo E2E test suite under test/e2e/scheduling to verify DRA score scheduling.

Signed-off-by: Gagan Ahlawat <gagan.devvv@gmail.com>
Adds nil-safety checks to scorePluginAdapter.Score and configures weight validation.
Removes redundant enabled flag in scorePluginAdapter for consistent disabled behavior.
Updates E2E tests with unique DeviceClass names and handles BuildConfigFromFlags errors.

Signed-off-by: Gagan Ahlawat <gagan.devvv@gmail.com>
@gagan-devv

Copy link
Copy Markdown
Author

Hi @JesseStutler, I have successfully implemented the requested framework changes and opened this PR for review!

Summary of Changes:

  1. DRA Scoring Paths: Wired up the Score and NormalizeScore extensions inside Volcano's predicates plugin to map upstream kube-scheduler weights when computing node priorities.
  2. Configuration Compliance: Wrapped the execution loop to respect the DynamicResourceAllocationEnable flag, ensuring a graceful fallback if DRA is toggled off.
  3. Automated Guardrails: Added a reflection-based unit test (predicates_extension_test.go) that checks the underlying type definitions of wrapped in-tree plugins (DynamicResources and VolumeBinding). It cross-references a hardcoded whitelist of allowed interfaces so future upstream updates that introduce new extension points will trigger an explicit test failure.
  4. E2E Integration Tests: Ported and adapted the upstream Kubernetes DRA score test specs into Volcano's Ginkgo framework (test/e2e/scheduling/dra_score.go), explicitly enforcing schedulerName: volcano.

All unit tests and local Kind integration suites pass successfully. Ready for your review and feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support DRA Score/NormalizeScore and add guardrails for kube-scheduler plugin extension changes

3 participants