NO-JIRA: Add nested virtualization support for AWS EC2 NodePools#8681
NO-JIRA: Add nested virtualization support for AWS EC2 NodePools#8681jhjaggars wants to merge 5 commits into
Conversation
…tform Add CpuOptions struct to AWSNodePoolPlatform to support configuring nested virtualization on AWS EC2 instances. Supported instance families are C8i, M8i, and R8i with valid values of "enabled" or "disabled". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update vendored cluster-api-provider-aws from v2.10.0 to v2.11.1 which includes upstream NestedVirtualization support in CPUOptions. Regenerate CAPA CRDs and apply configuration clients. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ool creation Add --nested-virtualization flag to both `hypershift create cluster aws` and `hypershift create nodepool aws` commands. Accepts "enabled" or "disabled" values for C8i, M8i, and R8i instance families. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nested virtualization Map NodePool CpuOptions.NestedVirtualization to the upstream CAPA CPUOptions type on AWSMachineTemplateSpec. The cpuOptions handling runs before the placement nil guard so it applies even when placement is not configured. Also prevent informer creation for unused platform-specific resources (e.g. SecretProviderClass on non-Azure clusters). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@jhjaggars: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
📝 WalkthroughWalkthroughThis PR adds two independent features. First, it introduces AWS EC2 nested virtualization configuration by defining a new 🚥 Pre-merge checks | ✅ 10 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jhjaggars The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8681 +/- ##
==========================================
- Coverage 41.44% 41.43% -0.01%
==========================================
Files 756 756
Lines 93644 93648 +4
==========================================
- Hits 38807 38802 -5
- Misses 52114 52124 +10
+ Partials 2723 2722 -1
... and 4 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/cluster/aws/create.go`:
- Around line 410-414: Validate o.NestedVirtualization at the CLI boundary by
checking its value against the documented allow-list and return an error
immediately if it's not allowed; then compute the effective instance type (use
o.InstanceType if set else the code's default m5.large/m6g.large) and verify
that this instance type's family is one of the supported families for nested
virtualization, returning a clear validation error if incompatible; update the
block that assigns nodePool.Spec.Platform.AWS.CpuOptions (where
o.NestedVirtualization is used) to perform both checks and fail fast instead of
setting an invalid value.
In `@cmd/nodepool/aws/create.go`:
- Around line 108-109: Add early validation for the nested virtualization flag
before the NodePool spec is written. Implement allow-list based validation that
checks two things: first, verify that the nested virtualization value is only
"enabled" or "disabled" (reject any arbitrary values), and second, validate that
the selected instance family is compatible with nested virtualization (only C8i,
M8i, and R8i families are supported, so reject incompatible defaults like
m5.large and m6g.large). This validation should be added at the trust boundary
where the flag is processed and should reject invalid combinations before any
NodePool spec is created. Apply the same validation logic to the other affected
code sections around lines 176-180.
In `@support/controlplane-component/controlplane-component.go`:
- Around line 208-211: The deletion logic currently returns early based only on
the volatile field c.hasBeenApplied; update the delete path in the component's
delete handler (the method using c.hasBeenApplied) to avoid relying on in-memory
state by first checking persistent cluster state or resource markers
instead—e.g., query the API/server for the component resource or its
finalizer/annotation via the controller client (use the existing client/get
method used elsewhere) and only skip deletion when the resource truly does not
exist or when a persistent “never-delete” marker is present; remove the early
return that depends solely on c.hasBeenApplied and replace it with a
cluster-existence check and/or persistent state check before deciding not to
delete.
- Around line 186-190: The code marks c.hasBeenApplied = true regardless of
whether c.update(cpContext) returned an error; change the logic so
c.hasBeenApplied is set only when the update succeeded: call reconcilationError
= c.update(cpContext) and then set c.hasBeenApplied = true only if
reconcilationError == nil (i.e., after a successful return from c.update),
leaving it false when c.update fails so downstream delete/cleanup behavior
remains correct.
In `@support/controlplane-component/generic-adapter.go`:
- Around line 54-57: The short-circuit using the process-local flag
ga.hasBeenApplied causes deletion to be skipped after a controller restart;
remove that reliance and instead detect resource existence/ownership at runtime
before skipping deletion: in the delete path (where ga.hasBeenApplied is
checked) call the appropriate lookup/delete logic (e.g., use the adapter's
client or ga.get/ga.getResource method) to query the cluster for the resource or
its ownership/annotation, attempt the deletion if the resource is present, and
only ignore errors that indicate "not found"; keep the operation idempotent and
remove the ga.hasBeenApplied-only return so cleanup runs across restarts.
- Around line 21-22: The hasBeenApplied boolean on genericAdapter is mutated on
a value copy (so updates are lost) and is only in-memory (lost on restart);
change manifestsAdapters from map[string]genericAdapter to
map[string]*genericAdapter and update update() and any retrieval sites to use
pointer semantics so genericAdapter.reconcile() sets the real struct's
hasBeenApplied; also remove sole reliance on the in-memory flag by deriving
applied state from the cluster (e.g., check the target resource
existence/owner/annotation) or persist the flag into the control-plane
workload's Status (add a field in the CR status and read/write it in
update()/reconcile()) so the predicate-based cleanup survives restarts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: fea52243-7df4-4f72-a4be-ab67d3321aaa
⛔ Files ignored due to path filters (167)
api/hypershift/v1beta1/zz_generated.deepcopy.gois excluded by!**/zz_generated*.go,!**/zz_generated*api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/AAA_ungated.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/GCPPlatform.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/OpenStack.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**client/applyconfiguration/hypershift/v1beta1/awsnodepoolplatform.gois excluded by!client/**client/applyconfiguration/hypershift/v1beta1/cpuoptions.gois excluded by!client/**client/applyconfiguration/utils.gois excluded by!client/**cmd/install/assets/crds/cluster-api-provider-aws/infrastructure.cluster.x-k8s.io_awsclusters.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/cluster-api-provider-aws/infrastructure.cluster.x-k8s.io_awsclustertemplates.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/cluster-api-provider-aws/infrastructure.cluster.x-k8s.io_awsmachinepools.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/cluster-api-provider-aws/infrastructure.cluster.x-k8s.io_awsmachines.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/cluster-api-provider-aws/infrastructure.cluster.x-k8s.io_awsmachinetemplates.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-CustomNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-Default.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-TechPreviewNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamldocs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.mddocs/content/reference/api.mdis excluded by!docs/content/reference/api.mdgo.sumis excluded by!**/*.sumvendor/github.com/aws/aws-sdk-go-v2/service/ec2/CHANGELOG.mdis excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_AttachVolume.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_CreateSecondaryNetwork.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_CreateSecondarySubnet.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_DeleteSecondaryNetwork.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_DeleteSecondarySubnet.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_DescribeSecondaryInterfaces.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_DescribeSecondaryNetworks.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_DescribeSecondarySubnets.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_DetachVolume.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_ModifyInstanceCpuOptions.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_ModifyInstanceMetadataDefaults.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_ModifyInstanceMetadataOptions.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_RunInstances.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/api_op_SearchTransitGatewayRoutes.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/deserializers.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/generated.jsonis excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/go_module_metadata.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/serializers.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/types/enums.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/types/types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/ec2/validators.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/CHANGELOG.mdis excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/LICENSE.txtis excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_client.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_AssociateAccessPolicy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_AssociateEncryptionConfig.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_AssociateIdentityProviderConfig.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_CreateAccessEntry.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_CreateAddon.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_CreateCapability.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_CreateCluster.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_CreateEksAnywhereSubscription.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_CreateFargateProfile.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_CreateNodegroup.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_CreatePodIdentityAssociation.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DeleteAccessEntry.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DeleteAddon.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DeleteCapability.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DeleteCluster.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DeleteEksAnywhereSubscription.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DeleteFargateProfile.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DeleteNodegroup.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DeletePodIdentityAssociation.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DeregisterCluster.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeAccessEntry.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeAddon.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeAddonConfiguration.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeAddonVersions.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeCapability.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeCluster.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeClusterVersions.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeEksAnywhereSubscription.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeFargateProfile.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeIdentityProviderConfig.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeInsight.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeInsightsRefresh.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeNodegroup.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribePodIdentityAssociation.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DescribeUpdate.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_DisassociateAccessPolicy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListAccessEntries.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListAccessPolicies.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListAddons.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListAssociatedAccessPolicies.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListCapabilities.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListClusters.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListEksAnywhereSubscriptions.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListFargateProfiles.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListIdentityProviderConfigs.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListInsights.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListNodegroups.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListPodIdentityAssociations.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListTagsForResource.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_ListUpdates.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_RegisterCluster.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_StartInsightsRefresh.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_TagResource.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_UntagResource.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_UpdateAccessEntry.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_UpdateAddon.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_UpdateCapability.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_UpdateClusterConfig.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_UpdateClusterVersion.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_UpdateNodegroupConfig.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_UpdateNodegroupVersion.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/api_op_UpdatePodIdentityAssociation.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/auth.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/deserializers.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/doc.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/endpoints.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/generated.jsonis excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/go_module_metadata.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/internal/endpoints/endpoints.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/options.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/serializers.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/types/enums.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/types/errors.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/types/types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/aws/aws-sdk-go-v2/service/eks/validators.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift-online/ocm-common/pkg/resource/validations/kms_arn_regex_validation.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/aws.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*vendor/golang.org/x/crypto/blake2b/blake2b.gois excluded by!vendor/**,!**/vendor/**vendor/golang.org/x/crypto/blake2b/blake2bAVX2_amd64.gois excluded by!vendor/**,!**/vendor/**vendor/golang.org/x/crypto/blake2b/blake2bAVX2_amd64.sis excluded by!vendor/**,!**/vendor/**vendor/golang.org/x/crypto/blake2b/blake2b_amd64.sis excluded by!vendor/**,!**/vendor/**vendor/golang.org/x/crypto/blake2b/blake2b_generic.gois excluded by!vendor/**,!**/vendor/**vendor/golang.org/x/crypto/blake2b/blake2b_ref.gois excluded by!vendor/**,!**/vendor/**vendor/golang.org/x/crypto/blake2b/blake2x.gois excluded by!vendor/**,!**/vendor/**vendor/golang.org/x/crypto/blake2b/go125.gois excluded by!vendor/**,!**/vendor/**vendor/golang.org/x/crypto/blake2b/register.gois excluded by!vendor/**,!**/vendor/**vendor/modules.txtis excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta1/awscluster_conversion.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta1/awsmachine_conversion.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta1/network_types.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta1/types.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta1/zz_generated.conversion.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/awscluster_defaults.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/awscluster_types.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/awscluster_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/awsclustercontrolleridentity_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/awsclusterroleidentity_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/awsclusterstaticidentity_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/awsclustertemplate_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/awsmachine_types.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/awsmachine_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/awsmachinetemplate_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/bastion.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/defaults.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/network_types.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/types.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/api/v1beta2/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*vendor/sigs.k8s.io/cluster-api-provider-aws/v2/controlplane/rosa/api/v1beta2/rosacontrolplane_types.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/controlplane/rosa/api/v1beta2/rosacontrolplane_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/controlplane/rosa/api/v1beta2/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/OWNERSis excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/awsfargateprofile_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/awsmachinepool_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/awsmanagedmachinepool_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/rosamachinepool_defaults.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/rosamachinepool_types.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/rosamachinepool_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/rosanetwork_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/rosaroleconfig_webhook.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/validation.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/exp/api/v1beta2/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*vendor/sigs.k8s.io/cluster-api-provider-aws/v2/pkg/eks/eks.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/cluster-api-provider-aws/v2/pkg/hash/base36.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (7)
api/hypershift/v1beta1/aws.gocmd/cluster/aws/create.gocmd/nodepool/aws/create.gogo.modhypershift-operator/controllers/nodepool/aws.gosupport/controlplane-component/controlplane-component.gosupport/controlplane-component/generic-adapter.go
| if o.NestedVirtualization != "" { | ||
| nodePool.Spec.Platform.AWS.CpuOptions = &hyperv1.CpuOptions{ | ||
| NestedVirtualization: o.NestedVirtualization, | ||
| } | ||
| } |
There was a problem hiding this comment.
Fail fast on invalid --nested-virtualization values and unsupported effective instance types.
The flag is currently free-form, and if users set it without --instance-type, the default m5.large/m6g.large path is incompatible with the documented supported families. This will fail late instead of at CLI validation time.
Proposed fix
import (
"context"
"encoding/base64"
"encoding/json"
"errors"
"fmt"
"os"
+ "strings"
@@
func validateAWSOptions(_ context.Context, opts *core.CreateOptions, awsOpts *RawCreateOptions) error {
if err := ValidateCreateCredentialInfo(awsOpts.Credentials, awsOpts.CredentialSecretName, opts.Namespace, opts.PullSecretFile); err != nil {
return err
}
@@
if err := awsutil.ValidateVPCCIDR(awsOpts.VPCCIDR); err != nil {
return err
}
+ if err := validateNestedVirtualization(awsOpts.NestedVirtualization, awsOpts.InstanceType, opts.Arch); err != nil {
+ return err
+ }
return nil
}
+
+func validateNestedVirtualization(value, instanceType, arch string) error {
+ if value == "" {
+ return nil
+ }
+ if value != "enabled" && value != "disabled" {
+ return fmt.Errorf("nested-virtualization must be either enabled or disabled")
+ }
+ effectiveInstanceType := instanceType
+ if effectiveInstanceType == "" {
+ switch arch {
+ case hyperv1.ArchitectureAMD64:
+ effectiveInstanceType = "m5.large"
+ case hyperv1.ArchitectureARM64:
+ effectiveInstanceType = "m6g.large"
+ }
+ }
+ if !(strings.HasPrefix(effectiveInstanceType, "c8i.") ||
+ strings.HasPrefix(effectiveInstanceType, "m8i.") ||
+ strings.HasPrefix(effectiveInstanceType, "r8i.")) {
+ return fmt.Errorf("nested-virtualization is only supported on C8i, M8i, and R8i instance families")
+ }
+ return nil
+}As per coding guidelines: "Validate at trust boundaries with allow-lists, not deny-lists".
Also applies to: 514-514
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/cluster/aws/create.go` around lines 410 - 414, Validate
o.NestedVirtualization at the CLI boundary by checking its value against the
documented allow-list and return an error immediately if it's not allowed; then
compute the effective instance type (use o.InstanceType if set else the code's
default m5.large/m6g.large) and verify that this instance type's family is one
of the supported families for nested virtualization, returning a clear
validation error if incompatible; update the block that assigns
nodePool.Spec.Platform.AWS.CpuOptions (where o.NestedVirtualization is used) to
perform both checks and fail fast instead of setting an invalid value.
| cmd.Flags().StringVar(&platformOpts.NestedVirtualization, "nested-virtualization", platformOpts.NestedVirtualization, "Enable nested virtualization on EC2 instances (enabled or disabled). Supported on C8i, M8i, and R8i instance families.") | ||
|
|
There was a problem hiding this comment.
Add early validation for nested virtualization value and instance family compatibility.
This path also accepts arbitrary values and allows incompatible defaults (m5.large/m6g.large) when nested virtualization is requested, which should be rejected before writing the NodePool spec.
Proposed fix
import (
"context"
"fmt"
+ "strings"
@@
func (o *RawAWSPlatformCreateOptions) Validate(_ context.Context, _ *core.CreateNodePoolOptions) (core.NodePoolPlatformCompleter, error) {
@@
if o.RootVolumeSize < 8 {
return nil, fmt.Errorf("root volume size must be at least 8 GB, got %d", o.RootVolumeSize)
}
+ if o.NestedVirtualization != "" && o.NestedVirtualization != "enabled" && o.NestedVirtualization != "disabled" {
+ return nil, fmt.Errorf("nested-virtualization must be either enabled or disabled")
+ }
@@
if o.InstanceType != "" {
instanceType = o.InstanceType
} else {
@@
}
+ if o.NestedVirtualization != "" &&
+ !(strings.HasPrefix(instanceType, "c8i.") || strings.HasPrefix(instanceType, "m8i.") || strings.HasPrefix(instanceType, "r8i.")) {
+ return fmt.Errorf("nested-virtualization is only supported on C8i, M8i, and R8i instance families")
+ }As per coding guidelines: "Validate at trust boundaries with allow-lists, not deny-lists".
Also applies to: 176-180
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/nodepool/aws/create.go` around lines 108 - 109, Add early validation for
the nested virtualization flag before the NodePool spec is written. Implement
allow-list based validation that checks two things: first, verify that the
nested virtualization value is only "enabled" or "disabled" (reject any
arbitrary values), and second, validate that the selected instance family is
compatible with nested virtualization (only C8i, M8i, and R8i families are
supported, so reject incompatible defaults like m5.large and m6g.large). This
validation should be added at the trust boundary where the flag is processed and
should reject invalid combinations before any NodePool spec is created. Apply
the same validation logic to the other affected code sections around lines
176-180.
| if len(unavailableDependencies) == 0 { | ||
| // reconcile only when all dependencies are available, and don't return error immediately so it can be included in the status condition first. | ||
| reconcilationError = c.update(cpContext) | ||
| c.hasBeenApplied = true | ||
| } |
There was a problem hiding this comment.
Mark component applied only when update succeeds.
Line 189 sets c.hasBeenApplied = true even when reconcilationError != nil. This breaks the “applied successfully” contract and can trigger incorrect delete behavior later.
Suggested fix
reconcilationError = c.update(cpContext)
- c.hasBeenApplied = true
+ if reconcilationError == nil {
+ c.hasBeenApplied = true
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@support/controlplane-component/controlplane-component.go` around lines 186 -
190, The code marks c.hasBeenApplied = true regardless of whether
c.update(cpContext) returned an error; change the logic so c.hasBeenApplied is
set only when the update succeeded: call reconcilationError =
c.update(cpContext) and then set c.hasBeenApplied = true only if
reconcilationError == nil (i.e., after a successful return from c.update),
leaving it false when c.update fails so downstream delete/cleanup behavior
remains correct.
| if !c.hasBeenApplied { | ||
| // if the component has not been applied, it doesn't exist, so there's nothing to delete. | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Deletion skip is tied to volatile process state.
Line 208 returns early based only on c.hasBeenApplied. After controller restarts, this flag resets and can leave previously created component resources undeleted when predicate becomes false.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@support/controlplane-component/controlplane-component.go` around lines 208 -
211, The deletion logic currently returns early based only on the volatile field
c.hasBeenApplied; update the delete path in the component's delete handler (the
method using c.hasBeenApplied) to avoid relying on in-memory state by first
checking persistent cluster state or resource markers instead—e.g., query the
API/server for the component resource or its finalizer/annotation via the
controller client (use the existing client/get method used elsewhere) and only
skip deletion when the resource truly does not exist or when a persistent
“never-delete” marker is present; remove the early return that depends solely on
c.hasBeenApplied and replace it with a cluster-existence check and/or persistent
state check before deciding not to delete.
| // tracks whether the resource has been applied successfully. | ||
| hasBeenApplied bool |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify map value type and copy-based retrieval path
rg -n 'manifestsAdapters\s+map\[string\]genericAdapter|adapter,\s*exist\s*:=\s*c\.manifestsAdapters\[manifestName\]|return adapter\.reconcile' \
support/controlplane-component/controlplane-component.goRepository: openshift/hypershift
Length of output: 219
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== generic-adapter.go (relevant sections) ==="
sed -n '1,180p' support/controlplane-component/generic-adapter.go
echo "=== controlplane-component.go (relevant sections) ==="
sed -n '120,360p' support/controlplane-component/controlplane-component.goRepository: openshift/hypershift
Length of output: 11107
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== generic-adapter.go (relevant sections) ==="
sed -n '1,220p' support/controlplane-component/generic-adapter.go
echo "=== controlplane-component.go (relevant sections) ==="
sed -n '120,360p' support/controlplane-component/controlplane-component.goRepository: openshift/hypershift
Length of output: 11107
generic-adapter.hasBeenApplied doesn’t persist (breaks predicate-based cleanup)
controlPlaneWorkload.manifestsAdapters is map[string]genericAdapter, and update() loads adapter by value (adapter, exist := c.manifestsAdapters[manifestName]). genericAdapter.reconcile() mutates ga.hasBeenApplied, but those mutations apply to the local copy, so subsequent reconciles won’t see it—meaning the predicate == false branch will often early-return and skip deletion.
Additionally, hasBeenApplied is purely in-memory, so an operator restart will reset it and the predicate-false path may skip cleanup even when the object still exists.
Suggested fix
- manifestsAdapters map[string]genericAdapter
+ manifestsAdapters map[string]*genericAdapter🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@support/controlplane-component/generic-adapter.go` around lines 21 - 22, The
hasBeenApplied boolean on genericAdapter is mutated on a value copy (so updates
are lost) and is only in-memory (lost on restart); change manifestsAdapters from
map[string]genericAdapter to map[string]*genericAdapter and update update() and
any retrieval sites to use pointer semantics so genericAdapter.reconcile() sets
the real struct's hasBeenApplied; also remove sole reliance on the in-memory
flag by deriving applied state from the cluster (e.g., check the target resource
existence/owner/annotation) or persist the flag into the control-plane
workload's Status (add a field in the CR status and read/write it in
update()/reconcile()) so the predicate-based cleanup survives restarts.
| if !ga.hasBeenApplied { | ||
| // if the resource has not been applied, it doesn't exist, so there's nothing to delete. | ||
| return nil | ||
| } |
There was a problem hiding this comment.
In-memory gate can skip required cleanup after controller restart.
Line 54 short-circuits deletion solely from ga.hasBeenApplied. That flag is process-local, so after restart previously managed resources are treated as “never applied” and cleanup can be skipped.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@support/controlplane-component/generic-adapter.go` around lines 54 - 57, The
short-circuit using the process-local flag ga.hasBeenApplied causes deletion to
be skipped after a controller restart; remove that reliance and instead detect
resource existence/ownership at runtime before skipping deletion: in the delete
path (where ga.hasBeenApplied is checked) call the appropriate lookup/delete
logic (e.g., use the adapter's client or ga.get/ga.getResource method) to query
the cluster for the resource or its ownership/annotation, attempt the deletion
if the resource is present, and only ignore errors that indicate "not found";
keep the operation idempotent and remove the ga.hasBeenApplied-only return so
cleanup runs across restarts.
|
Now I have the complete root cause analysis. Here is my report: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryPR #8681 adds a Root CauseThe PR added an optimization to
Recommendations
Evidence
|
Adds support for enabling nested virtualization on AWS EC2 instances (C8i, M8i, R8i families) through the NodePool API. This allows running OpenShift Virtualization (KubeVirt) workloads on AWS-hosted clusters.
CpuOptionsstruct withNestedVirtualizationfield toAWSNodePoolPlatformNestedVirtualizationsupport inCPUOptions(PR kubernetes-sigs/cluster-api-provider-aws#5874)--nested-virtualizationflag tohypershift create cluster awsandhypershift create nodepool awsCpuOptions.NestedVirtualizationfrom NodePool spec to CAPAAWSMachineTemplateSpec.CPUOptions--nested-virtualization enabled --instance-type c8i.2xlargecpuOptions.nestedVirtualization: enabledpropagates through: NodePool → AWSMachineTemplate → AWSMachine → EC2 instanceCpuOptions.NestedVirtualization: enabledvia AWS CLIvmxCPU flag present on guest nodes via/proc/cpuinfoN/A — new feature
Summary by CodeRabbit
New Features
--nested-virtualizationCLI flag.Chores