Skip to content

Support vnpus.configs wrapper format for HAMi vNPU device config#5399

Open
shivansh-gohem wants to merge 2 commits into
volcano-sh:masterfrom
shivansh-gohem:fix/vnpus-configs-compat-5372
Open

Support vnpus.configs wrapper format for HAMi vNPU device config#5399
shivansh-gohem wants to merge 2 commits into
volcano-sh:masterfrom
shivansh-gohem:fix/vnpus-configs-compat-5372

Conversation

@shivansh-gohem

Copy link
Copy Markdown

What type of PR is this?

/kind bug

What this PR does / why we need it

The HAMi / ascend-device-plugin shared ConfigMap (hami-scheduler-device) has moved to a new vnpus structure:

vnpus:
  hamiVnpuCore: false
  configs:
    - chipName: 910A
      commonWord: Ascend910A
      ...

But Volcano currently parses vnpus as a direct array:

type Config struct {
    VNPUs []VNPUConfig `yaml:"vnpus"`
}

This means users following the HAMi vNPU guide will deploy the new format, but Volcano silently fails to parse the vNPU configs — resulting in zero configs being loaded.

This PR adds a VNPUsConfig wrapper struct with a custom UnmarshalYAML that auto-detects the YAML shape:

  1. First tries the new wrapper format: vnpus: { hamiVnpuCore: ..., configs: [...] }
  2. Falls back to the legacy array format: vnpus: [{ chipName: ... }]

This ensures full backward compatibility — existing deployments using the old array format continue to work, while new deployments following HAMi docs also parse correctly.

Which issue(s) this PR fixes

Fixes #5372

Changes

Core: Config Package

  • config/vnpu.go: Added VNPUsConfig wrapper struct with HamiVnpuCore bool + Configs []VNPUConfig, and a custom UnmarshalYAML method for dual-format parsing
  • config/config.go: Changed Config.VNPUs field type from []VNPUConfig to VNPUsConfig; updated GetDefaultDevicesConfig() and inline YAML documentation
  • config/config_test.go: Added TestParseVNPUsNewWrapperFormat and TestParseVNPUsLegacyArrayFormat

Consumers

  • deviceshare/deviceshare.go: Updated .VNPUs.VNPUs.Configs (1 site)
  • ascend/hami/device_info.go: Updated .VNPUs.VNPUs.Configs (3 sites)
  • ascend/hami/device_info_test.go: Updated .VNPUs.VNPUs.Configs (3 sites)

Special notes for your reviewer

  • The custom UnmarshalYAML uses a type alias (vnpusConfigAlias) to prevent infinite recursion when unmarshaling the wrapper format
  • The HamiVnpuCore field is parsed and stored but not yet consumed by Volcano — included for forward compatibility with HAMi
  • No external APIs or CLI flags are affected; this is a purely internal config parsing change

Does this PR introduce a user-facing change?

Support both the new HAMi vnpus.configs wrapper format and the legacy vnpus direct array format in device config parsing for Ascend vNPU.

Copilot AI review requested due to automatic review settings June 7, 2026 06:58
@volcano-sh-bot volcano-sh-bot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. labels Jun 7, 2026
@volcano-sh-bot

Copy link
Copy Markdown
Contributor

Welcome @shivansh-gohem! It looks like this is your first PR to volcano-sh/volcano 🎉

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 7, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates VNPU configuration parsing to support a new “HAMi wrapper” YAML format while preserving support for the legacy direct-array format, and updates call sites accordingly.

Changes:

  • Introduced VNPUsConfig wrapper type with custom YAML unmarshaling to accept both wrapper and legacy array formats.
  • Updated scheduler/device code paths to read VNPU configs via config.VNPUs.Configs.
  • Added unit tests covering unmarshaling for both YAML formats.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pkg/scheduler/plugins/deviceshare/deviceshare.go Switch VNPU iteration to the new VNPUs.Configs field.
pkg/scheduler/api/devices/config/vnpu.go Add VNPUsConfig and custom UnmarshalYAML supporting wrapper + legacy formats.
pkg/scheduler/api/devices/config/config_test.go Add tests for parsing the new wrapper and legacy array vnpus YAML.
pkg/scheduler/api/devices/config/config.go Change Config.VNPUs type to VNPUsConfig and update default config + docs comment.
pkg/scheduler/api/devices/ascend/hami/device_info_test.go Update tests to reference VNPUs.Configs.
pkg/scheduler/api/devices/ascend/hami/device_info.go Update runtime VNPU usage to reference VNPUs.Configs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +32 to +49
func (v *VNPUsConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
// Try the new wrapper format: { hamiVnpuCore: ..., configs: [...] }
type vnpusConfigAlias VNPUsConfig // alias prevents infinite recursion
var wrapper vnpusConfigAlias
if err := unmarshal(&wrapper); err == nil && len(wrapper.Configs) > 0 {
*v = VNPUsConfig(wrapper)
return nil
}

// Fallback: try legacy direct array format: [{ chipName: ... }]
var configs []VNPUConfig
if err := unmarshal(&configs); err != nil {
return fmt.Errorf("vnpus: failed to parse as either wrapper struct or direct array: %w", err)
}
v.Configs = configs
v.HamiVnpuCore = false
return nil
}
Comment on lines +281 to +308
func TestParseVNPUsNewWrapperFormat(t *testing.T) {
yamlStr := `
vnpus:
hamiVnpuCore: false
configs:
- chipName: 910A
commonWord: Ascend910A
resourceName: huawei.com/Ascend910A
resourceMemoryName: huawei.com/Ascend910A-memory
memoryAllocatable: 32768
memoryCapacity: 32768
aiCore: 30
templates:
- name: vir02
memory: 2184
aiCore: 2
`
var config Config
err := yaml.Unmarshal([]byte(yamlStr), &config)
assert.Nil(t, err)
assert.Equal(t, false, config.VNPUs.HamiVnpuCore)
assert.Equal(t, 1, len(config.VNPUs.Configs))
assert.Equal(t, "910A", config.VNPUs.Configs[0].ChipName)
assert.Equal(t, "Ascend910A", config.VNPUs.Configs[0].CommonWord)
assert.Equal(t, int64(32768), config.VNPUs.Configs[0].MemoryAllocatable)
assert.Equal(t, 1, len(config.VNPUs.Configs[0].Templates))
assert.Equal(t, "vir02", config.VNPUs.Configs[0].Templates[0].Name)
}
Comment on lines 54 to 58
type Config struct {
//NvidiaConfig is used for vGPU feature for nvidia, gpushare is not using this config
NvidiaConfig NvidiaConfig `yaml:"nvidia"`
VNPUs []VNPUConfig `yaml:"vnpus"`
VNPUs VNPUsConfig `yaml:"vnpus"`
}

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new wrapper format for VNPU configurations (VNPUsConfig) that supports both a new HAMi wrapper format and the legacy direct array format, updating references and adding corresponding unit tests. A review comment identifies a bug in the custom UnmarshalYAML implementation where a valid wrapper format with an empty configs list would incorrectly fall back to the legacy format and fail to parse, and provides a robust solution to handle this case.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +32 to +49
func (v *VNPUsConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
// Try the new wrapper format: { hamiVnpuCore: ..., configs: [...] }
type vnpusConfigAlias VNPUsConfig // alias prevents infinite recursion
var wrapper vnpusConfigAlias
if err := unmarshal(&wrapper); err == nil && len(wrapper.Configs) > 0 {
*v = VNPUsConfig(wrapper)
return nil
}

// Fallback: try legacy direct array format: [{ chipName: ... }]
var configs []VNPUConfig
if err := unmarshal(&configs); err != nil {
return fmt.Errorf("vnpus: failed to parse as either wrapper struct or direct array: %w", err)
}
v.Configs = configs
v.HamiVnpuCore = false
return nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current custom UnmarshalYAML implementation relies on len(wrapper.Configs) > 0 to detect the new wrapper format. However, if a user provides a valid new wrapper format with an empty configs list (e.g., vnpus: { hamiVnpuCore: false, configs: [] }), len(wrapper.Configs) > 0 will be false. This causes the parser to fall back to the legacy direct array format, which then fails because a YAML map cannot be unmarshaled into a slice.

To fix this and make the parsing robust, we should explicitly check if the YAML node is a map (representing the new wrapper format) by attempting to unmarshal it into a temporary map first. If that succeeds, we parse it as the wrapper format; otherwise, we fall back to the legacy direct array format.

func (v *VNPUsConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
	// Check if the input is a map (new wrapper format)
	var raw map[string]interface{}
	if err := unmarshal(&raw); err == nil {
		type vnpusConfigAlias VNPUsConfig // alias prevents infinite recursion
		var wrapper vnpusConfigAlias
		if err := unmarshal(&wrapper); err != nil {
			return fmt.Errorf("vnpus: failed to parse wrapper struct: %w", err)
		}
		*v = VNPUsConfig(wrapper)
		return nil
	}

	// Fallback: try legacy direct array format: [{ chipName: ... }]
	var configs []VNPUConfig
	if err := unmarshal(&configs); err != nil {
		return fmt.Errorf("vnpus: failed to parse as either wrapper struct or direct array: %w", err)
	}
	v.Configs = configs
	v.HamiVnpuCore = false
	return nil
}

@shivansh-gohem shivansh-gohem force-pushed the fix/vnpus-configs-compat-5372 branch from 596f7ad to 1341adf Compare June 7, 2026 07:02
@volcano-sh-bot volcano-sh-bot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jun 7, 2026
@shivansh-gohem

Copy link
Copy Markdown
Author

}

// UnmarshalYAML implements custom YAML unmarshaling for VNPUsConfig.
// It probes the YAML node type to disambiguate between the two formats:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these API-comments

@archlitchi

Copy link
Copy Markdown
Contributor

/ok-to-test

@volcano-sh-bot volcano-sh-bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jun 9, 2026
@archlitchi

Copy link
Copy Markdown
Contributor

/assign

@shivansh-gohem

Copy link
Copy Markdown
Author

@archlitchi resolved the feedback!! ptal

Comment thread pkg/scheduler/api/devices/config/vnpu.go Outdated
return nil
}

// Fallback: YAML node is an array — parse as legacy direct array format.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove all comments in this method except this one

@shivansh-gohem

Copy link
Copy Markdown
Author

@archlitchi ptal whenever you have time

@archlitchi

Copy link
Copy Markdown
Contributor

please resolve the conflicts for it to be merged

@shivansh-gohem shivansh-gohem force-pushed the fix/vnpus-configs-compat-5372 branch from 0a352a2 to 0258f50 Compare June 11, 2026 07:47
@shivansh-gohem

Copy link
Copy Markdown
Author

@archlitchi i have resolved all the conflicts , please have a look!!

@shivansh-gohem shivansh-gohem force-pushed the fix/vnpus-configs-compat-5372 branch from 0258f50 to dd6deab Compare June 11, 2026 18:34
@volcano-sh-bot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from archlitchi. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support vnpus.configs in HAMi vNPU device config ConfigMap

4 participants