Adding Capacity Reservation during VM Resize#4707
Adding Capacity Reservation during VM Resize#4707rh-returners wants to merge 14 commits intomasterfrom
Conversation
@microsoft-github-policy-service agree |
|
@copilot review |
Switched from reflection mode to source mode (-source) for the new CapacityReservationGroups, CapacityReservations, and VirtualMachines mock clients. This ensures deterministic mock generation across Go versions (reflection mode requires Go 1.25 which is only available in CI, not locally). Changes: - generate.go: revert to ResourceSKUsClient-only reflection mode; add separate source-mode generate directives for the 3 new clients - armcompute.go: revert to original single-interface version - Add new separate mock files per interface (source-mode generated) Co-authored-by: tuxerrante <8364469+tuxerrante@users.noreply.github.com> Agent-Logs-Url: https://github.com/Azure/ARO-RP/sessions/7f872d11-3b82-4401-bed7-8af1e5550ea9
Reviewed and addressed the CI failure in the Root cause: The new Fix: Switched to source mode (
The original |
There was a problem hiding this comment.
Pull request overview
Introduces capacity-reservation-aware VM resize support (via transient Capacity Reservation Groups) and addresses mock generation determinism issues by switching new interfaces to mockgen source-mode generation.
Changes:
- Add new ARM compute wrapper clients/interfaces (VirtualMachines, CapacityReservationGroups, CapacityReservations) plus deterministic mocks.
- Implement capacity-reservation-backed single-VM resize flow in admin AzureActions and expose it via the admin VM resize endpoint (
useCapacityReservation+zone). - Refactor admin action preparation helpers and extend unit tests/mocks to cover the new flow.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/util/mocks/azureclient/azuresdk/armcompute/virtualmachines.go | New gomock for the VirtualMachines ARM wrapper client. |
| pkg/util/mocks/azureclient/azuresdk/armcompute/capacityreservations.go | New gomock for CapacityReservations ARM wrapper client. |
| pkg/util/mocks/azureclient/azuresdk/armcompute/capacityreservationgroups.go | New gomock for CapacityReservationGroups ARM wrapper client. |
| pkg/util/mocks/azureclient/azuresdk/armcompute/armcompute.go | Keeps legacy ResourceSKUsClient mock; minor regen/import ordering changes. |
| pkg/util/mocks/adminactions/kubeactions.go | Mock regen/import ordering changes. |
| pkg/util/mocks/adminactions/azureactions.go | Extends AzureActions mock with CRG-related methods. |
| pkg/util/azureclient/azuresdk/armcompute/virtualmachines.go | New minimal ARM VirtualMachines client wrapper construction. |
| pkg/util/azureclient/azuresdk/armcompute/virtualmachines_addons.go | Adds helper methods (Get/List/*AndWait) for ARM VirtualMachines operations. |
| pkg/util/azureclient/azuresdk/armcompute/generate.go | Updates go:generate to create deterministic mocks for new interfaces via source mode. |
| pkg/util/azureclient/azuresdk/armcompute/capacityreservations.go | New minimal ARM CapacityReservations client wrapper construction. |
| pkg/util/azureclient/azuresdk/armcompute/capacityreservations_addons.go | Adds *AndWait helpers for ARM CapacityReservations operations. |
| pkg/util/azureclient/azuresdk/armcompute/capacityreservationgroups.go | New minimal ARM CapacityReservationGroups client wrapper construction. |
| pkg/util/azureclient/azuresdk/armcompute/capacityreservationgroups_addons.go | Adds helper methods for CRG create/delete behavior (including 202 handling). |
| pkg/frontend/adminactions/azureactions.go | Extends AzureActions interface and wires in new ARM compute clients. |
| pkg/frontend/adminactions/azureactions_vmresize_capacity.go | Implements CRG-backed single VM resize + CRG lifecycle management/retries. |
| pkg/frontend/adminactions/azureactions_vmresize_capacity_test.go | Adds unit tests for the CRG-backed resize helpers. |
| pkg/frontend/admin_vm_actions.go | Extracts a cluster-scoped admin actions preparer (no VM name validation). |
| pkg/frontend/admin_openshiftcluster_vmresize.go | Adds useCapacityReservation/zone handling and routes to CRG resize path. |
| pkg/frontend/admin_openshiftcluster_vmresize_test.go | Adds endpoint tests for CRG resize path and missing-zone validation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Validate zone parameter against VM's actual zone before creating any Azure resources in CRGResizeSingleVM to prevent unnecessary deallocation - Use a fresh background context with timeout for post-success CRG cleanup so cleanup is not skipped if the request context is cancelled/deadlined - Add tests for invalid useCapacityReservation value (400) and zone provided without useCapacityReservation=true (400) - Update existing CRGResizeSingleVM tests to expect the zone validation GET Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…atus and some other copilot fixes
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| //go:generate rm -rf ../../../../util/mocks/$GOPACKAGE | ||
| //go:generate mockgen -destination=../../../../util/mocks/azureclient/azuresdk/$GOPACKAGE/$GOPACKAGE.go github.com/Azure/ARO-RP/pkg/util/azureclient/azuresdk/$GOPACKAGE ResourceSKUsClient | ||
| //go:generate mockgen -source ./capacityreservationgroups.go -destination=../../../../util/mocks/azureclient/azuresdk/$GOPACKAGE/capacityreservationgroups.go github.com/Azure/ARO-RP/pkg/util/azureclient/azuresdk/$GOPACKAGE CapacityReservationGroupsClient | ||
| //go:generate mockgen -source ./capacityreservations.go -destination=../../../../util/mocks/azureclient/azuresdk/$GOPACKAGE/capacityreservations.go github.com/Azure/ARO-RP/pkg/util/azureclient/azuresdk/$GOPACKAGE CapacityReservationsClient | ||
| //go:generate mockgen -source ./virtualmachines.go -destination=../../../../util/mocks/azureclient/azuresdk/$GOPACKAGE/virtualmachines.go github.com/Azure/ARO-RP/pkg/util/azureclient/azuresdk/$GOPACKAGE VirtualMachinesClient |
There was a problem hiding this comment.
The rm -rf target (../../../../util/mocks/$GOPACKAGE) doesn't match the directory where mocks are generated (../../../../util/mocks/azureclient/azuresdk/$GOPACKAGE). This can leave stale generated files behind and undermine the determinism goal. Update the cleanup path to remove the actual destination directory before regenerating.
| ) | ||
|
|
||
| // CapacityReservationsClientAddons contains addons for CapacityReservationsClient | ||
| type CapacityReservationsClientAddons interface { |
There was a problem hiding this comment.
The generated mock for CapacityReservationsClient includes a Get(...) method, but the exposed interface here only declares CreateOrUpdateAndWait and DeleteAndWait. This mismatch usually indicates mockgen is not generating from the intended interface definition (or the interface is missing a method). Align the interface/mocks by either adding Get here intentionally (if it’s meant to be part of the minimal wrapper) or regenerating mocks from the correct source/args so the mock method set matches the actual interface.
| type CapacityReservationsClientAddons interface { | |
| type CapacityReservationsClientAddons interface { | |
| Get(ctx context.Context, resourceGroupName, capacityReservationGroupName, capacityReservationName string, options *armcompute.CapacityReservationsClientGetOptions) (armcompute.CapacityReservationsClientGetResponse, error) |
| func (c *capacityReservationGroupsClient) pollCRGDeleted(ctx context.Context, resourceGroupName, capacityReservationGroupName string) error { | ||
| for { | ||
| _, err := c.Get(ctx, resourceGroupName, capacityReservationGroupName, nil) | ||
| if err != nil { | ||
| var responseErr *azcore.ResponseError | ||
| if errors.As(err, &responseErr) && responseErr.StatusCode == http.StatusNotFound { | ||
| return nil | ||
| } | ||
| return err | ||
| } | ||
| select { | ||
| case <-ctx.Done(): | ||
| return ctx.Err() | ||
| case <-time.After(crgDeletePollInterval): | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Using time.After(...) inside a loop allocates a new timer each iteration. For long-running deletes this can generate unnecessary allocations. Prefer a time.Ticker (created once, stopped via defer ticker.Stop()) or reuse a time.Timer to reduce per-iteration overhead.
| func (a *azureActions) deleteReservationWithRetry(ctx context.Context, clusterRG, crName string) error { | ||
| for attempt := 1; attempt <= crgMaxRetries; attempt++ { | ||
| err := a.armCapacityReservations.DeleteAndWait(ctx, clusterRG, capacityReservationGroupName, crName) | ||
| if err == nil || azureerrors.IsNotFoundError(err) { | ||
| return nil | ||
| } | ||
| if !isReferencedByVMError(err) || attempt == crgMaxRetries { | ||
| return fmt.Errorf("delete target reservation %s: %w", crName, err) | ||
| } | ||
| a.log.Infof("reservation %s still referenced by VM (Azure propagation lag), retrying in %s (attempt %d/%d)", | ||
| crName, crgRetryInterval, attempt, crgMaxRetries) | ||
| select { | ||
| case <-ctx.Done(): | ||
| return ctx.Err() | ||
| case <-time.After(crgRetryInterval): | ||
| } | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
The final return nil is effectively unreachable because the loop either returns nil on success/404 or returns an error on the final retry. Removing dead/unreachable code simplifies reasoning and avoids suggesting a silent-success path that can’t actually happen.
| // Allow enough time for all retries in CRGDelete to complete. | ||
| cleanupTimeout := time.Duration(crgMaxRetries)*crgRetryInterval + 2*time.Minute | ||
| cleanCtx, cancel := context.WithTimeout(context.Background(), cleanupTimeout) |
There was a problem hiding this comment.
The cleanup timeout calculation is duplicated in two places. Extracting it into a small helper (or a named const/var) reduces the risk of future edits changing one path but not the other.
| cleanupTimeout := time.Duration(crgMaxRetries)*crgRetryInterval + 2*time.Minute | ||
| cleanupCtx, cleanupCancel := context.WithTimeout(context.Background(), cleanupTimeout) |
There was a problem hiding this comment.
The cleanup timeout calculation is duplicated in two places. Extracting it into a small helper (or a named const/var) reduces the risk of future edits changing one path but not the other.
What this PR does / why we need it
This PR adds Capacity Reservation Group (CRG)-backed VM resize support to the admin VM resize endpoint, allowing operators to resize a single OpenShift cluster VM while allocating capacity from an Azure Capacity Reservation Group.
It also fixes a mock generation determinism issue that was causing CI failures.
Changes
Which issue this PR addresses
ARO-23134 — Add Capacity Reservation support during VM resize operations.
Test plan
Is there any documentation that needs to be updated for this PR?
No user-facing documentation changes required. The new parameters are admin-only and will be documented in internal runbooks.
How do you know this will function as expected in production?