
✨ Enable dual-stack binds for VM Operator webhook, metrics, health, profiler, and web-console validator#5

Open
hpannem wants to merge 37 commits into topic/hpannem/dualstack-final from ipv6-endpoints


@hpannem hpannem commented Apr 15, 2026

What does this PR do, and why is it needed?

VM Operator’s controller manager and web-console validation server need to listen on addresses that work in IPv6-only and dual-stack Kubernetes clusters. This change adds a configurable webhook bind address (default empty; deployments set ::) and wires it into controller-runtime’s webhook server Host. It updates shipped kustomize patches so metrics, health, and profiler use [::]:port where appropriate, and extends the web-console validator with SERVER_BIND_ADDRESS / --server-bind-address so the HTTP server can bind on [::] for dual-stack. WCP’s metrics port patch is updated to [::]:9848 while keeping args[0] as the metrics flag so existing JSON6902 patches keep working.
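As a minimal sketch of the address forms mentioned above (not code from this PR), Go's standard library `net.JoinHostPort` adds the brackets that IPv6 literals need, which is how the host `::` ends up written as `[::]:port`. The helper name `bindAddr` is hypothetical, and the port is reused from the description purely for illustration.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// bindAddr is a hypothetical helper that joins a host and a port into a
// listen address. net.JoinHostPort brackets hosts that contain a colon,
// such as the IPv6 unspecified address "::".
func bindAddr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(bindAddr("::", 9848))      // IPv6 unspecified address, bracketed
	fmt.Println(bindAddr("0.0.0.0", 9848)) // IPv4 unspecified address
	fmt.Println(bindAddr("", 9848))        // empty host, the pre-existing default
}
```

The webhook bind address in this PR is a bare host (`::`), while the metrics, health, and profiler flags take a full `host:port` value, so only the latter need the bracketed form.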

Are there any special notes for your reviewer

config/default/manager_auth_proxy_patch.yaml intentionally lists --metrics-addr=[::]:8443 first so config/wcp/vmoperator/manager_metrics_port_patch.yaml (which replaces args/0) still only retargets the metrics bind.
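The ordering constraint can be illustrated with a small sketch. It only mimics what a JSON6902 `replace` on `/args/0` does to a list and is not the kustomize implementation; `replaceArg` and the second placeholder flag are hypothetical, and the WCP value is assumed from the description above.

```go
package main

import "fmt"

// replaceArg mimics a JSON6902 "replace" on /args/<index>: it swaps exactly
// one element and leaves every other element untouched.
func replaceArg(args []string, index int, value string) ([]string, error) {
	if index < 0 || index >= len(args) {
		return nil, fmt.Errorf("index %d out of range for %d args", index, len(args))
	}
	out := append([]string(nil), args...) // copy so the base list is not mutated
	out[index] = value
	return out, nil
}

func main() {
	// The metrics flag is listed first, as in the default patch. The second
	// entry is a placeholder, not a real flag.
	base := []string{"--metrics-addr=[::]:8443", "--some-other-flag"}

	// Assumed WCP override: retarget only the metrics bind.
	patched, err := replaceArg(base, 0, "--metrics-addr=[::]:9848")
	if err != nil {
		panic(err)
	}
	fmt.Println(patched)
}
```

If another flag were moved to index 0, the same patch would silently overwrite that flag instead of the metrics address, which is why the position is kept stable.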

Please add a release note if necessary

The controller manager accepts --webhook-bind-address (use "::" for dual-stack). Default kustomize patches bind metrics, health, profiler, and the web-console validation server on IPv6 dual-stack-friendly addresses. The web-console validator honors SERVER_BIND_ADDRESS and --server-bind-address.
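One common way to honor both an environment variable and a flag is to use the environment value as the flag's default, so an explicit flag wins. The sketch below assumes that precedence; the PR text does not state it. `parseBindAddress` is a hypothetical name, and only the standard `flag` package is used.

```go
package main

import (
	"flag"
	"fmt"
)

// parseBindAddress is a hypothetical sketch. The flag default comes from the
// SERVER_BIND_ADDRESS environment variable, so a value passed on the command
// line takes precedence (an assumption, not confirmed by the PR).
func parseBindAddress(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("web-console-validator", flag.ContinueOnError)
	addr := fs.String("server-bind-address", getenv("SERVER_BIND_ADDRESS"),
		"host the HTTP server binds on")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *addr, nil
}

func main() {
	// Fake environment for the demonstration.
	env := func(key string) string {
		if key == "SERVER_BIND_ADDRESS" {
			return "::"
		}
		return ""
	}

	fromEnv, _ := parseBindAddress(nil, env)
	fromFlag, _ := parseBindAddress([]string{"--server-bind-address=0.0.0.0"}, env)
	fmt.Printf("env only: %q, flag override: %q\n", fromEnv, fromFlag)
}
```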

@hpannem hpannem self-assigned this Apr 15, 2026
aakashchan and others added 29 commits April 15, 2026 14:59
A LocalizedMethodFault can wrap one of many different vSphere fault types, such as NoCompatibleHost or GenericDrsFault.

    - Update LocalizedMessagesFromFault to recursively unwrap GenericDrsFault and NoCompatibleHost using reflection.
    - Refactor the fault parser to use reflection, allowing it to drill down into VimFault.MethodFault.FaultMessage and recursive Error arrays. Include new test coverage for anti-affinity GenericDrsFaults.

    - Add comprehensive unit tests for deeply nested fault extraction, including anti-affinity policy errors.
Changed build mode for Go from 'autobuild' to 'manual' and updated build steps accordingly.
This patch removes the codeql action until its issues can be
sorted.
A PVC can have other DataSourceRef types, such as snapshot.storage.k8s.io/VolumeSnapshot
if created from a snapshot, and its zonal constraint should still be
considered until we put all PVCs in the placement ConfigSpec.
Writing to the fake provider vSphereClient may race with the
controller reading it from the fake provider.

Seen in https://github.com/vmware-tanzu/vm-operator/actions/runs/24476299421/job/71529316798?pr=1554
When either the zone label or the status zone is unset, or they have
different values, look up the VM's zone and assign the zone to the label
and status. To avoid a lookup if both the label and status are set to the
same value, we'll just trust what is there.
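The decision described above can be sketched as a single predicate. This is an illustration under the assumption that both values are plain strings; `needsZoneLookup` is a hypothetical name and not the function used in the commit.

```go
package main

import "fmt"

// needsZoneLookup reports whether the VM's zone must be looked up: either
// value is unset, or the two disagree. When both are set and equal, the
// existing value is trusted and no lookup is needed.
func needsZoneLookup(labelZone, statusZone string) bool {
	return labelZone == "" || statusZone == "" || labelZone != statusZone
}

func main() {
	fmt.Println(needsZoneLookup("", "zone-a"))       // label unset
	fmt.Println(needsZoneLookup("zone-a", "zone-b")) // values differ
	fmt.Println(needsZoneLookup("zone-a", "zone-a")) // both set and equal
}
```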
Add the conversion bits that the old new-schema-version.py missed when a type
was just added in the now prior Hub version.

While here, also add conversion for VirtualMachineClassInstance
since that wasn't done between v1a4 and v1a5. Since this CRD is
stripped, only enable conversion when the feature is also enabled.
…ackfill-zone-status-from-label

🌱 Don't always backfill VM Status.Zone from label
…1a6-conversion

🐛 Add missing v1a6 version conversion bits
…roller-data-race

🌱 Fix data race in VMIC controller test
The updated version addresses the following CVEs:
CVE-2026-32282
CVE-2026-32289
CVE-2026-33810
CVE-2026-27144
CVE-2026-27143
CVE-2026-32288
CVE-2026-32283
CVE-2026-27140

Signed-off-by: Rafael Brito <rafael.brito@broadcom.com>
[Merging on behalf of @Shuting]
* ✨ Add validation for VLAN sub-interfaces capability

This commit introduces capability checks for VLAN sub-interfaces in
the VirtualMachine webhook to ensure the feature is only used when
supported by the Supervisor.

* Fix unit test regression error

* Use latest WCP Capabilities Key

* Fix comments

* Explicitly set VMVlanSubinterface to false in unit test
…rect parameter

Signed-off-by: Rafael Brito <rafael.brito@broadcom.com>
…/e2etest-tweak-script-and-readme

🐛 Small corrections on the newly introduced e2e test setup
…-just-vm-dsref

🐛 Only skip PVCs with VirtualMachine as the DSRef
Allow `kubernetes.io/hostname` as a valid topology key for VMAffinity
RequiredDuringScheduling when VMAffinityDuringExecution feature flag is enabled.

- Add feature flag check for hostname topology key validation
- Add comprehensive test coverage for all VMAffinity validation scenarios
- Maintain backward compatibility (zone-only when feature disabled)
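
A minimal sketch of the gating rule listed above, assuming the check reduces to a key comparison plus a boolean for the feature flag. `isAllowedTopologyKey` and its parameter are hypothetical; the real webhook validation may be structured differently.

```go
package main

import "fmt"

const (
	zoneTopologyKey = "topology.kubernetes.io/zone"
	hostTopologyKey = "kubernetes.io/hostname"
)

// isAllowedTopologyKey is a hypothetical validator. The zone key is always
// accepted; the hostname key is accepted only when the feature flag
// (VMAffinityDuringExecution in the commit message) is enabled.
func isAllowedTopologyKey(key string, hostTopologyEnabled bool) bool {
	switch key {
	case zoneTopologyKey:
		return true
	case hostTopologyKey:
		return hostTopologyEnabled
	default:
		return false
	}
}

func main() {
	for _, enabled := range []bool{false, true} {
		fmt.Printf("flag=%v zone=%v host=%v\n", enabled,
			isAllowedTopologyKey(zoneTopologyKey, enabled),
			isAllowedTopologyKey(hostTopologyKey, enabled))
	}
}
```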

Signed-off-by: Nabarun Pal <nabarun.pal@broadcom.com>
This PR exposes an argument for the download path of Kubectl and an
argument for where to store the logs.
…ffinity

Previously, processVMAffinity() and processVMAntiAffinity() only
translated zone-topology terms (topologyKey: topology.kubernetes.io/zone)
into VmPlacementPolicies. Host-topology terms (topologyKey:
kubernetes.io/hostname) were silently skipped with the expectation that
ClusterModules would handle host-level anti-affinity.

This change:

- Refactors buildTagIDsFromZoneTopology into a generic
  buildTagIDsFromTopology(vmCtx, terms, topologyKey) with thin wrappers
  buildTagIDsFromZoneTopology and buildTagIDsFromHostTopology to avoid
  duplicating the tag extraction logic.

- Extends processVMAffinity to generate VmVmAffinity policies with
  Host topology for both required and preferred terms.

- Extends processVMAntiAffinity to generate VmVmAntiAffinity policies
  with Host topology for both required and preferred terms. Each
  topology/strictness combination produces a separate policy object.

- Enables HostRecommRequired for placement calls made through group placement.

- Gates all host-topology processing behind the VMAffinityDuringExecution feature gate.

- Adds tests for required/preferred host-level affinity and
  anti-affinity, mixed zone+host topology terms, and verifies that
  host-topology terms are silently skipped when VMAffinityDuringExecution
  is disabled while zone-topology terms continue to work.

Signed-off-by: Nabarun Pal <nabarun.pal@broadcom.com>
When the VM Class BootOptions was already populated,
the reconciler cleared it and only applied VM spec fields.
Seed csBootOptions from *configSpec.BootOptions when it is set
so class defaults (e.g. bootDelay) remain unless the VM
spec overrides them.
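
The seeding behavior can be sketched as a merge in which class values act as defaults and VM spec values override them field by field. The `bootOptions` struct and its two fields are hypothetical stand-ins, not the actual vSphere or VM Operator types.

```go
package main

import "fmt"

// bootOptions is a hypothetical stand-in for the real boot options type.
// Pointer fields distinguish "unset" from an explicit zero value.
type bootOptions struct {
	BootDelayMs    *int64
	EnterBIOSSetup *bool
}

// mergeBootOptions seeds the result from the class defaults, then applies
// only the fields that the VM spec sets explicitly.
func mergeBootOptions(class, vm *bootOptions) bootOptions {
	var out bootOptions
	if class != nil {
		out = *class // keep class defaults instead of clearing them
	}
	if vm == nil {
		return out
	}
	if vm.BootDelayMs != nil {
		out.BootDelayMs = vm.BootDelayMs
	}
	if vm.EnterBIOSSetup != nil {
		out.EnterBIOSSetup = vm.EnterBIOSSetup
	}
	return out
}

func main() {
	classDelay, vmDelay := int64(5000), int64(1000)
	class := &bootOptions{BootDelayMs: &classDelay}

	classOnly := mergeBootOptions(class, nil)
	overridden := mergeBootOptions(class, &bootOptions{BootDelayMs: &vmDelay})
	fmt.Println(*classOnly.BootDelayMs, *overridden.BootDelayMs)
}
```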

Add tests for class-only bootDelay and VM spec override.
Update controller-runtime from v0.22.3 to v0.23.1 along with
k8s.io/* dependencies from v0.34.1 to v0.35.4, including e2e tests.
Ginkgo had to be upgraded to v2.27.2.

Additionally, this patch includes the update of otel dependency to v1.41 to address
CVE-2026-29181 and CVE-2026-39883.

This includes the following breaking/deprecation changes (from @aruneshpa):
- ctrl.NewWebhookManagedBy(mgr).For(&obj{}).Complete() changed to
  ctrl.NewWebhookManagedBy(mgr, &obj{}).Complete() (object is now
  passed as a second argument instead of via .For() chaining).
- mgr.GetEventRecorderFor() is deprecated in favor of
  mgr.GetEventRecorder(), which returns the new
  k8s.io/client-go/tools/events.EventRecorder instead of the old
  k8s.io/client-go/tools/record.EventRecorder. Migrated
  pkg/record.Recorder to accept the new EventRecorder type and
  updated all call sites and test fakes accordingly.

An investigation into why some test-services failed after the upgrade
pointed to the k8s v0.35.x client being ~100ms slower in cache sync.
The `Eventually` default timeout had to be increased from "1s" to "2s" in a couple of tests.

Signed-off-by: Rafael Brito <rafael.brito@broadcom.com>
Signed-off-by: Rafael Brito <rafa@stormforge.io>
Signed-off-by: Rafael Brito <rafael.brito@broadcom.com>
…finity

🌱 Allow host-level topology key in VMAffinity with feature flag
…/process-host-aaf

✨ Generate VmPlacementPolicies for host-level AF and AAF during Placement
…/bump-controller-runtime-and-otel

🌱 Update controller runtime to 0.23.1 and otel 1.41
When VMAffinityDuringExecution feature flag is enabled:
- Allow both zone and host topology keys for preferred scheduling
- Maintain backward compatibility when feature flag is disabled (zone only)
- Apply consistent host topology support across Required and Preferred scheduling

Test coverage added:
- VM Affinity PreferredDuringScheduling with host topology key acceptance
- VM Affinity PreferredDuringScheduling with unsupported topology key rejection
- VM Anti-Affinity PreferredDuringScheduling with unsupported topology key rejection

This enables users to specify kubernetes.io/hostname topology keys for
VM affinity preferred scheduling when the VMAffinityDuringExecution
capability is enabled.

Signed-off-by: Nabarun Pal <nabarun.pal@broadcom.com>
Adding a new E2E test in the VM-Hardware suite that tests positive and
negative cases for multi-writer, encrypted volumes with physical sharing
mode controllers.
palnabarun and others added 3 commits April 30, 2026 04:11
…ost-topology-af-preferred-during-scheduling

🐛 Add host topology support for VM Affinity PreferredDuringScheduling
…ervice changes (vmware-tanzu#1552)

* Dualstack VirtualMachineService changes

8 participants