diff --git a/docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.md b/docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.md index 455de6be441..18d15866ed9 100644 --- a/docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.md +++ b/docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.md @@ -159,3 +159,7 @@ Step-by-step description of the backup process: how the OADP plugin triggers the ### [Restore Flow](restore-flow.md) Step-by-step description of the restore process: how the OADP plugin injects the snapshot URL, how the Control Plane Operator restores etcd, and how the cluster recovers. + +### [Managed Services Credentials](managed-services-credentials.md) + +Credential configuration for managed platforms (ROSA HCP and ARO HCP) that use federated credentials (AWS STS/IRSA and Azure Workload Identity) instead of static keys. Covers auto-detection logic, Secret format requirements, and infrastructure prerequisites. diff --git a/docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md b/docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md new file mode 100644 index 00000000000..492b3263499 --- /dev/null +++ b/docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md @@ -0,0 +1,322 @@ +--- +title: Managed Services Credentials +--- + +# Managed Services Credential Configuration + +!!! warning "Tech Preview" + + This feature requires the `HCPEtcdBackup` feature gate enabled in the HyperShift Operator. + +The HCPEtcdBackup controller automatically detects the authentication mode from the credential Secret referenced in the backup specification. This page describes how to configure credentials for managed service platforms (ROSA HCP and ARO HCP) that use short-lived, federated credentials instead of long-lived static keys. + +## OADP Plugin Secret Flow + +The credential Secret originates in the OADP namespace (typically `openshift-adp`) and is always named `cloud-credentials`. It always uses the `cloud` data key. When the OADP HyperShift plugin triggers an etcd snapshot backup, it copies this Secret to the HyperShift Operator namespace, remapping the key and preserving the original: + +| | Namespace | Secret Name | Data Keys | +|---|-----------|-------------|-----------| +| **Source** | `` | `cloud-credentials` | `cloud` | +| **Destination** | `hypershift` (HO namespace) | `cloud-credentials` | `credentials` (remapped from `cloud`) + `cloud` (preserved) | + +The destination Secret contains both keys with the same content. The controller's credential auto-detection logic reads from the destination copy: + +- For **S3** storage: reads the `credentials` key (remapped from `cloud`) +- For **Azure Blob** storage: checks the `cloud` key first (preserved original), then falls back to `credentials` + +No manual Secret creation or copying is required — the OADP plugin handles this automatically for both ROSA HCP and ARO HCP. + +## Credential Auto-Detection + +The controller inspects the content of the credential Secret to determine the authentication mode. No explicit configuration flag is required — the Secret format itself drives the behavior. + +### AWS Credential Modes + +| Mode | Detection | PodSpec Behavior | +|------|-----------|------------------| +| **Static** | `credentials` key contains an AWS credentials file (`[default]` profile) | Mounts credentials file at `/etc/etcd-backup-creds/credentials`, passes `--credentials-file` | +| **STS/IRSA** | `credentials` key contains a bare IAM role ARN (`arn:aws:iam::...`) | Projected SA token volume (`sts.amazonaws.com` audience), `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` env vars, no credentials file | + +### Azure Credential Modes + +| Mode | Detection | PodSpec Behavior | +|------|-----------|------------------| +| **Workload Identity** | Secret has a `cloud` key containing a non-empty `AZURE_CLIENT_ID=` value | Pod label `azure.workload.identity/use=true`, SA annotated with `azure.workload.identity/client-id`, no credentials file, no `--azure-auth-type` flag | +| **Client Secret** | `credentials` key contains JSON with a non-empty `clientSecret` field | Mounts credentials file, passes `--credentials-file` and `--azure-auth-type client-secret` | +| **Managed Identity** | `credentials` key contains JSON without a `clientSecret` field | Mounts credentials file, passes `--credentials-file` and `--azure-auth-type managed-identity` | + +## ROSA HCP (AWS STS/IRSA) + +ROSA HCP clusters use IAM Roles for Service Accounts (IRSA) via AWS Security Token Service (STS). The backup Job Pod authenticates to S3 using a projected ServiceAccount token exchanged for temporary AWS credentials. + +### Prerequisites + +1. An S3 bucket for storing etcd snapshots +2. An IAM role with a trust policy allowing the OIDC provider associated with your management cluster +3. The IAM role must have permissions to write objects to the target S3 bucket + +#### IAM Role Trust Policy + +The trust policy must allow the `etcd-backup-job` ServiceAccount in the HyperShift Operator namespace to assume the role: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam:::oidc-provider/" + }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringEquals": { + ":sub": "system:serviceaccount::etcd-backup-job" + } + } + } + ] +} +``` + +Replace: + +- ``: Your AWS account ID +- ``: The OIDC provider URL for your management cluster (without `https://`) +- ``: The namespace where the HyperShift Operator runs (typically `hypershift`) + +#### IAM Role Permissions + +The role needs S3 write access: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "s3:PutObject", + "s3:AbortMultipartUpload" + ], + "Resource": [ + "arn:aws:s3:::", + "arn:aws:s3:::/*" + ] + } + ] +} +``` + +!!! note + + The `s3:PutObject` permission authorizes all multipart upload operations (`CreateMultipartUpload`, `UploadPart`, `CompleteMultipartUpload`) used by the AWS SDK v2 Transfer Manager, which automatically splits files larger than 5 MB into multipart uploads. Etcd snapshots typically range from 30-100 MB, so multipart upload is used in practice. `s3:AbortMultipartUpload` is included to clean up incomplete uploads on failure. + +### Credential Secret Format + +The credential Secret in the OADP namespace uses the `cloud` key with the IAM role ARN: + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: cloud-credentials + namespace: +type: Opaque +stringData: + cloud: "arn:aws:iam:::role/" +``` + +The OADP plugin copies this Secret to the HyperShift Operator namespace, remapping `cloud` → `credentials` and preserving the original `cloud` key (see [OADP Plugin Secret Flow](#oadp-plugin-secret-flow)). The controller reads the `credentials` key in the destination copy and detects STS mode when the value starts with `arn:`. + +!!! note + + The `cloud` value must be a bare ARN string, not an AWS credentials file. + +### HCPEtcdBackup CR + +```yaml +apiVersion: hypershift.openshift.io/v1beta1 +kind: HCPEtcdBackup +metadata: + name: my-backup + namespace: +spec: + storage: + storageType: S3 + s3: + bucket: + region: + keyPrefix: etcd-backups + credentials: + name: cloud-credentials +``` + +### What Happens at Runtime + +When the controller detects STS mode: + +1. The ServiceAccount `etcd-backup-job` is created in the HO namespace (no special annotations needed for AWS IRSA — the projected token handles authentication) +2. The Job Pod gets a projected volume with a ServiceAccount token using audience `sts.amazonaws.com` and 1-hour expiration +3. The upload container receives environment variables `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` +4. No credentials file is mounted — the AWS SDK uses the projected token to assume the role via STS + +## ARO HCP (Azure Workload Identity) + +ARO HCP clusters use Azure AD Workload Identity for pod-level authentication. The backup Job Pod authenticates to Azure Blob Storage using a federated token projected by the Azure Workload Identity webhook. + +### Prerequisites + +1. An Azure Storage Account with a blob container for storing etcd snapshots +2. A User-Assigned Managed Identity with a federated credential configured for the backup Job's ServiceAccount +3. The managed identity must have `Storage Blob Data Contributor` role on the storage account + +#### Federated Credential + +Create a federated credential on the managed identity that trusts the `etcd-backup-job` ServiceAccount: + +```bash +az identity federated-credential create \ + --name etcd-backup-fedcred \ + --identity-name \ + --resource-group \ + --issuer \ + --subject "system:serviceaccount::etcd-backup-job" \ + --audiences "api://AzureADTokenExchange" +``` + +Replace: + +- ``: Name of the User-Assigned Managed Identity +- ``: Resource group containing the managed identity +- ``: The OIDC issuer URL of your management cluster +- ``: The namespace where the HyperShift Operator runs (typically `hypershift`) + +!!! important + + The subject must match exactly: `system:serviceaccount::etcd-backup-job`. The Job runs in the HO namespace, not the HCP namespace. + +#### Storage Account Role Assignment + +Assign the `Storage Blob Data Contributor` role to the managed identity on the storage account: + +```bash +az role assignment create \ + --assignee-object-id $(az identity show -n -g --query principalId -o tsv) \ + --role "Storage Blob Data Contributor" \ + --scope /subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts/ +``` + +### Credential Secret Format + +The credential Secret in the OADP namespace uses the `cloud` key with Azure identity configuration in key-value format: + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: cloud-credentials + namespace: +type: Opaque +stringData: + cloud: | + AZURE_SUBSCRIPTION_ID= + AZURE_TENANT_ID= + AZURE_CLIENT_ID= + AZURE_RESOURCE_GROUP= + AZURE_CLOUD_NAME=AzurePublicCloud +``` + +The OADP plugin copies this Secret to the HyperShift Operator namespace, remapping `cloud` → `credentials` and preserving the original `cloud` key (see [OADP Plugin Secret Flow](#oadp-plugin-secret-flow)). The controller reads the destination copy and detects Workload Identity mode when it finds the `cloud` key. It extracts the `AZURE_CLIENT_ID` value to annotate the ServiceAccount. + +!!! note + + Fields like `AZURE_FEDERATED_TOKEN_FILE` and `AZURE_AUTHORITY_HOST` are not needed in the Secret — they are injected at runtime by the Azure Workload Identity webhook into the Pod environment. + +### HCPEtcdBackup CR + +```yaml +apiVersion: hypershift.openshift.io/v1beta1 +kind: HCPEtcdBackup +metadata: + name: my-backup + namespace: +spec: + storage: + storageType: AzureBlob + azureBlob: + container: + storageAccount: + keyPrefix: etcd-backups + credentials: + name: cloud-credentials +``` + +### What Happens at Runtime + +When the controller detects Azure Workload Identity mode: + +1. The ServiceAccount `etcd-backup-job` is created in the HO namespace with annotation `azure.workload.identity/client-id: ` +2. The Job Pod template gets the label `azure.workload.identity/use: "true"` +3. The Azure Workload Identity webhook mutates the Pod to inject: + - A projected volume with a federated token + - Environment variables `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_FEDERATED_TOKEN_FILE`, `AZURE_AUTHORITY_HOST` +4. No credentials file is mounted and no `--azure-auth-type` flag is passed — the Azure SDK uses the injected federated token + +## OADP Plugin Integration + +The OADP HyperShift plugin handles credential propagation automatically for both ROSA HCP and ARO HCP. No manual Secret creation or copying is required. The plugin copies the `cloud-credentials` Secret from the OADP namespace to the HyperShift Operator namespace, performing key remapping as described in the [OADP Plugin Secret Flow](#oadp-plugin-secret-flow) section above. The controller then auto-detects the credential mode from the destination Secret content. + +## Troubleshooting + +### AWS STS: Access Denied + +```text +An error occurred (AccessDenied) when calling the operation +``` + +- Verify the IAM role trust policy allows the correct OIDC provider and ServiceAccount subject +- Confirm the IAM role has all required S3 permissions on the target bucket (`s3:PutObject`, `s3:AbortMultipartUpload`) — see [IAM Role Permissions](#iam-role-permissions) +- Check that the OIDC provider URL matches the management cluster's issuer + +### Azure WI: No Matching Federated Identity Record + +```text +AADSTS700213: No matching federated identity record found for presented assertion subject +``` + +- Verify the federated credential subject matches exactly: `system:serviceaccount::etcd-backup-job` +- The Job runs in the HO namespace (e.g., `hypershift`), not the HCP namespace +- Confirm the OIDC issuer URL matches the management cluster's issuer +- Check the audience is set to `api://AzureADTokenExchange` + +### Secret Key Mismatch + +If the controller falls through to the wrong credential mode, check the destination Secret (in the HO namespace) has the expected keys after plugin copying: + +- AWS: must have a `credentials` key (remapped from `cloud` by the plugin) +- Azure WI: must have a `cloud` key (preserved by the plugin; the presence of this key triggers WI mode). `AZURE_CLIENT_ID=` should be included in the value for SA annotation +- Azure Client Secret: must have a `credentials` key with JSON containing `clientSecret` + +### Job Pods Not Starting + +Check the ServiceAccount and its annotations: + +```bash +kubectl get sa etcd-backup-job -n -o yaml +``` + +- For Azure WI: the SA must have annotation `azure.workload.identity/client-id` +- For AWS STS: verify the projected volume appears in the Pod spec + +### Verifying PodSpec + +Inspect the generated Job to confirm the correct credential mode was applied: + +```bash +kubectl get job -n -l app=etcd-backup -o yaml +``` + +- **AWS STS**: Look for `AWS_ROLE_ARN` env var and `aws-iam-token` projected volume +- **Azure WI**: Look for pod label `azure.workload.identity/use: "true"` and no `--credentials-file` in container args +- **Static/Client Secret**: Look for `backup-credentials` volume and `--credentials-file` in container args diff --git a/docs/content/reference/aggregated-docs.md b/docs/content/reference/aggregated-docs.md index 0a0aef48d23..861c16ec4cb 100644 --- a/docs/content/reference/aggregated-docs.md +++ b/docs/content/reference/aggregated-docs.md @@ -18597,6 +18597,338 @@ Step-by-step description of the backup process: how the OADP plugin triggers the Step-by-step description of the restore process: how the OADP plugin injects the snapshot URL, how the Control Plane Operator restores etcd, and how the cluster recovers. +### Managed Services Credentials + +Credential configuration for managed platforms (ROSA HCP and ARO HCP) that use federated credentials (AWS STS/IRSA and Azure Workload Identity) instead of static keys. Covers auto-detection logic, Secret format requirements, and infrastructure prerequisites. + + +--- + +## Source: docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md + +--- +title: Managed Services Credentials +--- + +# Managed Services Credential Configuration + +!!! warning "Tech Preview" + + This feature requires the `HCPEtcdBackup` feature gate enabled in the HyperShift Operator. + +The HCPEtcdBackup controller automatically detects the authentication mode from the credential Secret referenced in the backup specification. This page describes how to configure credentials for managed service platforms (ROSA HCP and ARO HCP) that use short-lived, federated credentials instead of long-lived static keys. + +## OADP Plugin Secret Flow + +The credential Secret originates in the OADP namespace (typically `openshift-adp`) and is always named `cloud-credentials`. It always uses the `cloud` data key. When the OADP HyperShift plugin triggers an etcd snapshot backup, it copies this Secret to the HyperShift Operator namespace, remapping the key and preserving the original: + +| | Namespace | Secret Name | Data Keys | +|---|-----------|-------------|-----------| +| **Source** | `` | `cloud-credentials` | `cloud` | +| **Destination** | `hypershift` (HO namespace) | `cloud-credentials` | `credentials` (remapped from `cloud`) + `cloud` (preserved) | + +The destination Secret contains both keys with the same content. The controller's credential auto-detection logic reads from the destination copy: + +- For **S3** storage: reads the `credentials` key (remapped from `cloud`) +- For **Azure Blob** storage: checks the `cloud` key first (preserved original), then falls back to `credentials` + +No manual Secret creation or copying is required — the OADP plugin handles this automatically for both ROSA HCP and ARO HCP. + +## Credential Auto-Detection + +The controller inspects the content of the credential Secret to determine the authentication mode. No explicit configuration flag is required — the Secret format itself drives the behavior. + +### AWS Credential Modes + +| Mode | Detection | PodSpec Behavior | +|------|-----------|------------------| +| **Static** | `credentials` key contains an AWS credentials file (`[default]` profile) | Mounts credentials file at `/etc/etcd-backup-creds/credentials`, passes `--credentials-file` | +| **STS/IRSA** | `credentials` key contains a bare IAM role ARN (`arn:aws:iam::...`) | Projected SA token volume (`sts.amazonaws.com` audience), `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` env vars, no credentials file | + +### Azure Credential Modes + +| Mode | Detection | PodSpec Behavior | +|------|-----------|------------------| +| **Workload Identity** | Secret has a `cloud` key containing a non-empty `AZURE_CLIENT_ID=` value | Pod label `azure.workload.identity/use=true`, SA annotated with `azure.workload.identity/client-id`, no credentials file, no `--azure-auth-type` flag | +| **Client Secret** | `credentials` key contains JSON with a non-empty `clientSecret` field | Mounts credentials file, passes `--credentials-file` and `--azure-auth-type client-secret` | +| **Managed Identity** | `credentials` key contains JSON without a `clientSecret` field | Mounts credentials file, passes `--credentials-file` and `--azure-auth-type managed-identity` | + +## ROSA HCP (AWS STS/IRSA) + +ROSA HCP clusters use IAM Roles for Service Accounts (IRSA) via AWS Security Token Service (STS). The backup Job Pod authenticates to S3 using a projected ServiceAccount token exchanged for temporary AWS credentials. + +### Prerequisites + +1. An S3 bucket for storing etcd snapshots +2. An IAM role with a trust policy allowing the OIDC provider associated with your management cluster +3. The IAM role must have permissions to write objects to the target S3 bucket + +#### IAM Role Trust Policy + +The trust policy must allow the `etcd-backup-job` ServiceAccount in the HyperShift Operator namespace to assume the role: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam:::oidc-provider/" + }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringEquals": { + ":sub": "system:serviceaccount::etcd-backup-job" + } + } + } + ] +} +``` + +Replace: + +- ``: Your AWS account ID +- ``: The OIDC provider URL for your management cluster (without `https://`) +- ``: The namespace where the HyperShift Operator runs (typically `hypershift`) + +#### IAM Role Permissions + +The role needs S3 write access: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "s3:PutObject", + "s3:AbortMultipartUpload" + ], + "Resource": [ + "arn:aws:s3:::", + "arn:aws:s3:::/*" + ] + } + ] +} +``` + +!!! note + + The `s3:PutObject` permission authorizes all multipart upload operations (`CreateMultipartUpload`, `UploadPart`, `CompleteMultipartUpload`) used by the AWS SDK v2 Transfer Manager, which automatically splits files larger than 5 MB into multipart uploads. Etcd snapshots typically range from 30-100 MB, so multipart upload is used in practice. `s3:AbortMultipartUpload` is included to clean up incomplete uploads on failure. + +### Credential Secret Format + +The credential Secret in the OADP namespace uses the `cloud` key with the IAM role ARN: + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: cloud-credentials + namespace: +type: Opaque +stringData: + cloud: "arn:aws:iam:::role/" +``` + +The OADP plugin copies this Secret to the HyperShift Operator namespace, remapping `cloud` → `credentials` and preserving the original `cloud` key (see OADP Plugin Secret Flow). The controller reads the `credentials` key in the destination copy and detects STS mode when the value starts with `arn:`. + +!!! note + + The `cloud` value must be a bare ARN string, not an AWS credentials file. + +### HCPEtcdBackup CR + +```yaml +apiVersion: hypershift.openshift.io/v1beta1 +kind: HCPEtcdBackup +metadata: + name: my-backup + namespace: +spec: + storage: + storageType: S3 + s3: + bucket: + region: + keyPrefix: etcd-backups + credentials: + name: cloud-credentials +``` + +### What Happens at Runtime + +When the controller detects STS mode: + +1. The ServiceAccount `etcd-backup-job` is created in the HO namespace (no special annotations needed for AWS IRSA — the projected token handles authentication) +2. The Job Pod gets a projected volume with a ServiceAccount token using audience `sts.amazonaws.com` and 1-hour expiration +3. The upload container receives environment variables `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` +4. No credentials file is mounted — the AWS SDK uses the projected token to assume the role via STS + +## ARO HCP (Azure Workload Identity) + +ARO HCP clusters use Azure AD Workload Identity for pod-level authentication. The backup Job Pod authenticates to Azure Blob Storage using a federated token projected by the Azure Workload Identity webhook. + +### Prerequisites + +1. An Azure Storage Account with a blob container for storing etcd snapshots +2. A User-Assigned Managed Identity with a federated credential configured for the backup Job's ServiceAccount +3. The managed identity must have `Storage Blob Data Contributor` role on the storage account + +#### Federated Credential + +Create a federated credential on the managed identity that trusts the `etcd-backup-job` ServiceAccount: + +```bash +az identity federated-credential create \ + --name etcd-backup-fedcred \ + --identity-name \ + --resource-group \ + --issuer \ + --subject "system:serviceaccount::etcd-backup-job" \ + --audiences "api://AzureADTokenExchange" +``` + +Replace: + +- ``: Name of the User-Assigned Managed Identity +- ``: Resource group containing the managed identity +- ``: The OIDC issuer URL of your management cluster +- ``: The namespace where the HyperShift Operator runs (typically `hypershift`) + +!!! important + + The subject must match exactly: `system:serviceaccount::etcd-backup-job`. The Job runs in the HO namespace, not the HCP namespace. + +#### Storage Account Role Assignment + +Assign the `Storage Blob Data Contributor` role to the managed identity on the storage account: + +```bash +az role assignment create \ + --assignee-object-id $(az identity show -n -g --query principalId -o tsv) \ + --role "Storage Blob Data Contributor" \ + --scope /subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts/ +``` + +### Credential Secret Format + +The credential Secret in the OADP namespace uses the `cloud` key with Azure identity configuration in key-value format: + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: cloud-credentials + namespace: +type: Opaque +stringData: + cloud: | + AZURE_SUBSCRIPTION_ID= + AZURE_TENANT_ID= + AZURE_CLIENT_ID= + AZURE_RESOURCE_GROUP= + AZURE_CLOUD_NAME=AzurePublicCloud +``` + +The OADP plugin copies this Secret to the HyperShift Operator namespace, remapping `cloud` → `credentials` and preserving the original `cloud` key (see OADP Plugin Secret Flow). The controller reads the destination copy and detects Workload Identity mode when it finds the `cloud` key. It extracts the `AZURE_CLIENT_ID` value to annotate the ServiceAccount. + +!!! note + + Fields like `AZURE_FEDERATED_TOKEN_FILE` and `AZURE_AUTHORITY_HOST` are not needed in the Secret — they are injected at runtime by the Azure Workload Identity webhook into the Pod environment. + +### HCPEtcdBackup CR + +```yaml +apiVersion: hypershift.openshift.io/v1beta1 +kind: HCPEtcdBackup +metadata: + name: my-backup + namespace: +spec: + storage: + storageType: AzureBlob + azureBlob: + container: + storageAccount: + keyPrefix: etcd-backups + credentials: + name: cloud-credentials +``` + +### What Happens at Runtime + +When the controller detects Azure Workload Identity mode: + +1. The ServiceAccount `etcd-backup-job` is created in the HO namespace with annotation `azure.workload.identity/client-id: ` +2. The Job Pod template gets the label `azure.workload.identity/use: "true"` +3. The Azure Workload Identity webhook mutates the Pod to inject: + - A projected volume with a federated token + - Environment variables `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_FEDERATED_TOKEN_FILE`, `AZURE_AUTHORITY_HOST` +4. No credentials file is mounted and no `--azure-auth-type` flag is passed — the Azure SDK uses the injected federated token + +## OADP Plugin Integration + +The OADP HyperShift plugin handles credential propagation automatically for both ROSA HCP and ARO HCP. No manual Secret creation or copying is required. The plugin copies the `cloud-credentials` Secret from the OADP namespace to the HyperShift Operator namespace, performing key remapping as described in the OADP Plugin Secret Flow section above. The controller then auto-detects the credential mode from the destination Secret content. + +## Troubleshooting + +### AWS STS: Access Denied + +```text +An error occurred (AccessDenied) when calling the operation +``` + +- Verify the IAM role trust policy allows the correct OIDC provider and ServiceAccount subject +- Confirm the IAM role has all required S3 permissions on the target bucket (`s3:PutObject`, `s3:AbortMultipartUpload`) — see IAM Role Permissions +- Check that the OIDC provider URL matches the management cluster's issuer + +### Azure WI: No Matching Federated Identity Record + +```text +AADSTS700213: No matching federated identity record found for presented assertion subject +``` + +- Verify the federated credential subject matches exactly: `system:serviceaccount::etcd-backup-job` +- The Job runs in the HO namespace (e.g., `hypershift`), not the HCP namespace +- Confirm the OIDC issuer URL matches the management cluster's issuer +- Check the audience is set to `api://AzureADTokenExchange` + +### Secret Key Mismatch + +If the controller falls through to the wrong credential mode, check the destination Secret (in the HO namespace) has the expected keys after plugin copying: + +- AWS: must have a `credentials` key (remapped from `cloud` by the plugin) +- Azure WI: must have a `cloud` key (preserved by the plugin; the presence of this key triggers WI mode). `AZURE_CLIENT_ID=` should be included in the value for SA annotation +- Azure Client Secret: must have a `credentials` key with JSON containing `clientSecret` + +### Job Pods Not Starting + +Check the ServiceAccount and its annotations: + +```bash +kubectl get sa etcd-backup-job -n -o yaml +``` + +- For Azure WI: the SA must have annotation `azure.workload.identity/client-id` +- For AWS STS: verify the projected volume appears in the Pod spec + +### Verifying PodSpec + +Inspect the generated Job to confirm the correct credential mode was applied: + +```bash +kubectl get job -n -l app=etcd-backup -o yaml +``` + +- **AWS STS**: Look for `AWS_ROLE_ARN` env var and `aws-iam-token` projected volume +- **Azure WI**: Look for pod label `azure.workload.identity/use: "true"` and no `--credentials-file` in container args +- **Static/Client Secret**: Look for `backup-credentials` volume and `--credentials-file` in container args + --- diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index c50f7e0a1fa..7e6ece8ecdd 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -113,6 +113,7 @@ nav: - how-to/disaster-recovery/etcd-snapshot-backup/index.md - how-to/disaster-recovery/etcd-snapshot-backup/backup-flow.md - how-to/disaster-recovery/etcd-snapshot-backup/restore-flow.md + - how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md - 'Disconnected': - how-to/disconnected/index.md - how-to/disconnected/image-content-sources.md diff --git a/hypershift-operator/controllers/etcdbackup/credentials.go b/hypershift-operator/controllers/etcdbackup/credentials.go new file mode 100644 index 00000000000..cb81fe5b641 --- /dev/null +++ b/hypershift-operator/controllers/etcdbackup/credentials.go @@ -0,0 +1,126 @@ +package etcdbackup + +import ( + "encoding/json" + "strings" + + hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1" + + corev1 "k8s.io/api/core/v1" + "k8s.io/klog/v2" +) + +type credentialMode string + +const ( + credentialModeAWSStatic credentialMode = "aws-static" + credentialModeAWSSTS credentialMode = "aws-sts" + credentialModeAzureClientSecret credentialMode = "azure-client-secret" + credentialModeAzureWorkloadIdentity credentialMode = "azure-workload-identity" + credentialModeAzureManagedIdentity credentialMode = "azure-managed-identity" +) + +type resolvedCredentials struct { + Mode credentialMode + SecretName string + RoleARN string + ClientID string +} + +func (c resolvedCredentials) needsCredentialsFile() bool { + switch c.Mode { + case credentialModeAWSStatic, credentialModeAzureClientSecret, credentialModeAzureManagedIdentity: + return true + default: + return false + } +} + +func (c resolvedCredentials) needsProjectedToken() bool { + return c.Mode == credentialModeAWSSTS +} + +func (c resolvedCredentials) needsWorkloadIdentityLabel() bool { + return c.Mode == credentialModeAzureWorkloadIdentity +} + +func (c resolvedCredentials) azureAuthType() string { + switch c.Mode { + case credentialModeAzureClientSecret: + return "client-secret" + case credentialModeAzureManagedIdentity: + return "managed-identity" + default: + return "" + } +} + +func resolveCredentials(storageType hyperv1.HCPEtcdBackupStorageType, secret *corev1.Secret) resolvedCredentials { + switch storageType { + case hyperv1.S3BackupStorage: + return resolveAWSCredentials(secret) + case hyperv1.AzureBlobBackupStorage: + return resolveAzureCredentials(secret) + default: + klog.Warningf("resolveCredentials: unknown storage type %q for secret %s, defaulting to AWS static credentials", storageType, secret.Name) + return resolvedCredentials{Mode: credentialModeAWSStatic, SecretName: secret.Name} + } +} + +func resolveAWSCredentials(secret *corev1.Secret) resolvedCredentials { + creds := strings.TrimSpace(string(secret.Data["credentials"])) + if strings.HasPrefix(creds, "arn:") { + return resolvedCredentials{ + Mode: credentialModeAWSSTS, + SecretName: secret.Name, + RoleARN: creds, + } + } + return resolvedCredentials{ + Mode: credentialModeAWSStatic, + SecretName: secret.Name, + } +} + +func resolveAzureCredentials(secret *corev1.Secret) resolvedCredentials { + if cloudData, ok := secret.Data["cloud"]; ok { + var clientID string + for line := range strings.SplitSeq(string(cloudData), "\n") { + line = strings.TrimSpace(line) + if v, ok := strings.CutPrefix(line, "AZURE_CLIENT_ID="); ok { + clientID = strings.TrimSpace(v) + break + } + } + if clientID == "" { + return resolvedCredentials{ + Mode: credentialModeAzureManagedIdentity, + SecretName: secret.Name, + } + } + return resolvedCredentials{ + Mode: credentialModeAzureWorkloadIdentity, + SecretName: secret.Name, + ClientID: clientID, + } + } + + if credData, ok := secret.Data["credentials"]; ok { + var creds struct { + ClientSecret string `json:"clientSecret"` + } + if err := json.Unmarshal(credData, &creds); err != nil { + klog.Warningf("resolveAzureCredentials: failed to parse 'credentials' key in secret %s as JSON: %v, falling through to managed-identity mode", secret.Name, err) + } else if creds.ClientSecret != "" { + return resolvedCredentials{ + Mode: credentialModeAzureClientSecret, + SecretName: secret.Name, + } + } + } + + return resolvedCredentials{ + Mode: credentialModeAzureManagedIdentity, + SecretName: secret.Name, + } +} diff --git a/hypershift-operator/controllers/etcdbackup/credentials_test.go b/hypershift-operator/controllers/etcdbackup/credentials_test.go new file mode 100644 index 00000000000..1a328cf4ab6 --- /dev/null +++ b/hypershift-operator/controllers/etcdbackup/credentials_test.go @@ -0,0 +1,302 @@ +package etcdbackup + +import ( + "testing" + + . "github.com/onsi/gomega" + + hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1" + + corev1 "k8s.io/api/core/v1" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" +) + +func TestResolveAWSCredentials(t *testing.T) { + t.Run("When Secret contains an ARN it should detect STS mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "aws-creds"}, + Data: map[string][]byte{ + "credentials": []byte("arn:aws:iam::123456789012:role/etcd-backup-role"), + }, + } + + result := resolveAWSCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAWSSTS)) + g.Expect(result.RoleARN).To(Equal("arn:aws:iam::123456789012:role/etcd-backup-role")) + g.Expect(result.SecretName).To(Equal("aws-creds")) + }) + + t.Run("When Secret contains an ARN with whitespace it should detect STS mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "aws-creds"}, + Data: map[string][]byte{ + "credentials": []byte(" arn:aws:iam::123456789012:role/etcd-backup-role\n"), + }, + } + + result := resolveAWSCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAWSSTS)) + g.Expect(result.RoleARN).To(Equal("arn:aws:iam::123456789012:role/etcd-backup-role")) + }) + + t.Run("When Secret contains INI credentials it should detect static mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "aws-creds"}, + Data: map[string][]byte{ + "credentials": []byte("[default]\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\naws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\n"), + }, + } + + result := resolveAWSCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAWSStatic)) + g.Expect(result.RoleARN).To(BeEmpty()) + g.Expect(result.SecretName).To(Equal("aws-creds")) + }) + + t.Run("When Secret has empty credentials it should detect static mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "aws-creds"}, + Data: map[string][]byte{"credentials": []byte("")}, + } + + result := resolveAWSCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAWSStatic)) + }) +} + +func TestResolveAzureCredentials(t *testing.T) { + t.Run("When Secret has cloud key with AZURE_CLIENT_ID it should detect workload-identity mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds"}, + Data: map[string][]byte{ + "cloud": []byte("AZURE_SUBSCRIPTION_ID=sub-123\nAZURE_TENANT_ID=tenant-456\nAZURE_CLIENT_ID=client-789\nAZURE_RESOURCE_GROUP=rg-test\nAZURE_CLOUD_NAME=AzurePublicCloud\n"), + }, + } + + result := resolveAzureCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAzureWorkloadIdentity)) + g.Expect(result.ClientID).To(Equal("client-789")) + g.Expect(result.SecretName).To(Equal("azure-creds")) + }) + + t.Run("When Secret has cloud key without AZURE_CLIENT_ID it should fall back to managed-identity mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds"}, + Data: map[string][]byte{ + "cloud": []byte("AZURE_SUBSCRIPTION_ID=sub-123\nAZURE_TENANT_ID=tenant-456\n"), + }, + } + + result := resolveAzureCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAzureManagedIdentity)) + g.Expect(result.ClientID).To(BeEmpty()) + }) + + t.Run("When Secret has credentials key with clientSecret JSON it should detect client-secret mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds"}, + Data: map[string][]byte{ + "credentials": []byte(`{"subscriptionId":"sub-123","tenantId":"tenant-456","clientId":"client-789","clientSecret":"secret-abc"}`), + }, + } + + result := resolveAzureCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAzureClientSecret)) + g.Expect(result.SecretName).To(Equal("azure-creds")) + }) + + t.Run("When Secret has credentials key with JSON but empty clientSecret it should detect managed-identity mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds"}, + Data: map[string][]byte{ + "credentials": []byte(`{"subscriptionId":"sub-123","tenantId":"tenant-456","clientId":"client-789","clientSecret":""}`), + }, + } + + result := resolveAzureCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAzureManagedIdentity)) + }) + + t.Run("When Secret has credentials key with non-JSON content it should detect managed-identity mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds"}, + Data: map[string][]byte{ + "credentials": []byte("certificate-data-here"), + }, + } + + result := resolveAzureCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAzureManagedIdentity)) + }) + + t.Run("When Secret has no known keys it should detect managed-identity mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds"}, + Data: map[string][]byte{}, + } + + result := resolveAzureCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAzureManagedIdentity)) + }) + + t.Run("When Secret has cloud key with AZURE_CLIENT_ID= and whitespace value it should fall back to managed-identity mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds"}, + Data: map[string][]byte{ + "cloud": []byte("AZURE_SUBSCRIPTION_ID=sub-123\nAZURE_CLIENT_ID= \nAZURE_TENANT_ID=tenant-456\n"), + }, + } + + result := resolveAzureCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAzureManagedIdentity)) + g.Expect(result.ClientID).To(BeEmpty()) + }) + + t.Run("When Secret has cloud key with Windows-style CRLF line endings it should trim and detect workload-identity mode", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds"}, + Data: map[string][]byte{ + "cloud": []byte("AZURE_SUBSCRIPTION_ID=sub-123\r\nAZURE_CLIENT_ID=client-789\r\nAZURE_TENANT_ID=tenant-456\r\n"), + }, + } + + result := resolveAzureCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAzureWorkloadIdentity)) + g.Expect(result.ClientID).To(Equal("client-789")) + }) + + t.Run("When Secret has both cloud and credentials keys it should prioritize cloud key", func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds"}, + Data: map[string][]byte{ + "cloud": []byte("AZURE_CLIENT_ID=wi-client\n"), + "credentials": []byte(`{"clientSecret":"should-be-ignored"}`), + }, + } + + result := resolveAzureCredentials(secret) + g.Expect(result.Mode).To(Equal(credentialModeAzureWorkloadIdentity)) + g.Expect(result.ClientID).To(Equal("wi-client")) + }) +} + +func TestResolveCredentials(t *testing.T) { + tests := []struct { + name string + storageType hyperv1.HCPEtcdBackupStorageType + secretName string + secretData map[string][]byte + wantMode credentialMode + }{ + { + name: "When storage type is S3 it should delegate to AWS resolution", + storageType: hyperv1.S3BackupStorage, + secretName: "aws-creds", + secretData: map[string][]byte{"credentials": []byte("arn:aws:iam::123456789012:role/test")}, + wantMode: credentialModeAWSSTS, + }, + { + name: "When storage type is AzureBlob it should delegate to Azure resolution", + storageType: hyperv1.AzureBlobBackupStorage, + secretName: "azure-creds", + secretData: map[string][]byte{"cloud": []byte("AZURE_CLIENT_ID=client-123\n")}, + wantMode: credentialModeAzureWorkloadIdentity, + }, + { + name: "When storage type is unknown it should default to AWS static mode", + storageType: "UnknownStorage", + secretName: "some-creds", + secretData: map[string][]byte{}, + wantMode: credentialModeAWSStatic, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + g := NewGomegaWithT(t) + secret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: tt.secretName}, + Data: tt.secretData, + } + result := resolveCredentials(tt.storageType, secret) + g.Expect(result.Mode).To(Equal(tt.wantMode)) + g.Expect(result.SecretName).To(Equal(tt.secretName)) + }) + } +} + +func TestCredentialBehavioralMethods(t *testing.T) { + tests := []struct { + name string + mode credentialMode + wantNeedsCredentialsFile bool + wantNeedsProjectedToken bool + wantNeedsWorkloadIdentity bool + wantAzureAuthType string + }{ + { + name: "When mode is AWS static it should need credentials file only", + mode: credentialModeAWSStatic, + wantNeedsCredentialsFile: true, + wantNeedsProjectedToken: false, + wantNeedsWorkloadIdentity: false, + wantAzureAuthType: "", + }, + { + name: "When mode is AWS STS it should need projected token only", + mode: credentialModeAWSSTS, + wantNeedsCredentialsFile: false, + wantNeedsProjectedToken: true, + wantNeedsWorkloadIdentity: false, + wantAzureAuthType: "", + }, + { + name: "When mode is Azure client-secret it should need credentials file and report auth type", + mode: credentialModeAzureClientSecret, + wantNeedsCredentialsFile: true, + wantNeedsProjectedToken: false, + wantNeedsWorkloadIdentity: false, + wantAzureAuthType: "client-secret", + }, + { + name: "When mode is Azure workload-identity it should need workload identity label only", + mode: credentialModeAzureWorkloadIdentity, + wantNeedsCredentialsFile: false, + wantNeedsProjectedToken: false, + wantNeedsWorkloadIdentity: true, + wantAzureAuthType: "", + }, + { + name: "When mode is Azure managed-identity it should need credentials file and report auth type", + mode: credentialModeAzureManagedIdentity, + wantNeedsCredentialsFile: true, + wantNeedsProjectedToken: false, + wantNeedsWorkloadIdentity: false, + wantAzureAuthType: "managed-identity", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + g := NewGomegaWithT(t) + creds := resolvedCredentials{Mode: tt.mode} + g.Expect(creds.needsCredentialsFile()).To(Equal(tt.wantNeedsCredentialsFile)) + g.Expect(creds.needsProjectedToken()).To(Equal(tt.wantNeedsProjectedToken)) + g.Expect(creds.needsWorkloadIdentityLabel()).To(Equal(tt.wantNeedsWorkloadIdentity)) + g.Expect(creds.azureAuthType()).To(Equal(tt.wantAzureAuthType)) + }) + } +} diff --git a/hypershift-operator/controllers/etcdbackup/reconciler.go b/hypershift-operator/controllers/etcdbackup/reconciler.go index 6c8e4f8672b..eba123ba4f6 100644 --- a/hypershift-operator/controllers/etcdbackup/reconciler.go +++ b/hypershift-operator/controllers/etcdbackup/reconciler.go @@ -65,11 +65,13 @@ const ( volumeEtcdCerts = "etcd-certs" volumeEtcdBackup = "etcd-backup" volumeCredentials = "backup-credentials" + volumeAWSIAMToken = "aws-iam-token" // Mount paths. mountPathEtcdCerts = "/etc/etcd-certs" mountPathEtcdBackup = "/etc/etcd-backup" mountPathCredentials = "/etc/etcd-backup-creds" + mountPathAWSIAMToken = "/var/run/secrets/aws-iam-token" requeueInterval = 10 * time.Second ) @@ -149,9 +151,24 @@ func (r *HCPEtcdBackupReconciler) validatePrerequisites(ctx context.Context, bac func (r *HCPEtcdBackupReconciler) createResourcesAndJob(ctx context.Context, backup *hyperv1.HCPEtcdBackup, hcp *hyperv1.HostedControlPlane) (ctrl.Result, error) { logger := log.FromContext(ctx) - logger.Info("creating backup resources", "backup", backup.Name, "namespace", backup.Namespace) - if err := r.ensureServiceAccount(ctx); err != nil { + credentialSecretName, err := r.getCredentialSecretName(backup) + if err != nil { + return r.setFailedConditionAndUpdate(ctx, backup, hyperv1.BackupFailedReason, err.Error()) + } + credSecret := &corev1.Secret{} + if err := r.Get(ctx, types.NamespacedName{Name: credentialSecretName, Namespace: r.OperatorNamespace}, credSecret); err != nil { + if apierrors.IsNotFound(err) { + return r.setFailedConditionAndUpdate(ctx, backup, hyperv1.BackupFailedReason, + fmt.Sprintf("credential Secret %q not found in namespace %q", credentialSecretName, r.OperatorNamespace)) + } + return ctrl.Result{}, fmt.Errorf("failed to get credential Secret: %w", err) + } + + creds := resolveCredentials(backup.Spec.Storage.StorageType, credSecret) + logger.Info("creating backup resources", "backup", backup.Name, "namespace", backup.Namespace, "credentialMode", creds.Mode) + + if err := r.ensureServiceAccount(ctx, creds); err != nil { return ctrl.Result{}, fmt.Errorf("failed to ensure ServiceAccount: %w", err) } @@ -163,7 +180,7 @@ func (r *HCPEtcdBackupReconciler) createResourcesAndJob(ctx context.Context, bac return ctrl.Result{}, fmt.Errorf("failed to ensure NetworkPolicy: %w", err) } - if err := r.createBackupJob(ctx, backup, hcp); err != nil { + if err := r.createBackupJob(ctx, backup, hcp, creds); err != nil { if apierrors.IsNotFound(err) { // Clean up RBAC and NetworkPolicy created above before marking terminal. if cleanupErr := r.cleanupResources(ctx, backup); cleanupErr != nil { @@ -376,6 +393,28 @@ func (r *HCPEtcdBackupReconciler) findActiveJob(ctx context.Context, hcpNamespac return nil, nil } +// hasNonTerminalBackup checks if any other HCPEtcdBackup in the same namespace +// is not yet in a terminal state (pending or in-progress). This guards against +// deleting shared RBAC/NetworkPolicy resources while another backup still needs them, +// covering the race window between HCPEtcdBackup creation and Job creation. +func (r *HCPEtcdBackupReconciler) hasNonTerminalBackup(ctx context.Context, current *hyperv1.HCPEtcdBackup) (bool, string, error) { + backupList := &hyperv1.HCPEtcdBackupList{} + if err := r.List(ctx, backupList, client.InNamespace(current.Namespace)); err != nil { + return false, "", err + } + + for i := range backupList.Items { + other := &backupList.Items[i] + if other.Name == current.Name { + continue + } + if !isTerminal(other) { + return true, other.Name, nil + } + } + return false, "", nil +} + // findJobForBackup finds the Job created for this specific backup. func (r *HCPEtcdBackupReconciler) findJobForBackup(ctx context.Context, backup *hyperv1.HCPEtcdBackup) (*batchv1.Job, error) { jobList := &batchv1.JobList{} @@ -538,7 +577,10 @@ func (r *HCPEtcdBackupReconciler) setEncryptionMetadata(backup *hyperv1.HCPEtcdB } // ensureServiceAccount creates the ServiceAccount for backup Jobs in the HO namespace. -func (r *HCPEtcdBackupReconciler) ensureServiceAccount(ctx context.Context) error { +// For Azure Workload Identity mode, if the SA already has a +// azure.workload.identity/client-id annotation (e.g., set by infrastructure/Helm), +// it is preserved. Otherwise, the annotation is set from the credential secret. +func (r *HCPEtcdBackupReconciler) ensureServiceAccount(ctx context.Context, creds resolvedCredentials) error { sa := &corev1.ServiceAccount{ ObjectMeta: metav1.ObjectMeta{ Name: jobServiceAccountName, @@ -546,6 +588,16 @@ func (r *HCPEtcdBackupReconciler) ensureServiceAccount(ctx context.Context) erro }, } _, err := controllerutil.CreateOrUpdate(ctx, r.Client, sa, func() error { + if creds.needsWorkloadIdentityLabel() { + if sa.Annotations == nil { + sa.Annotations = map[string]string{} + } + if sa.Annotations["azure.workload.identity/client-id"] == "" { + sa.Annotations["azure.workload.identity/client-id"] = creds.ClientID + } + } else { + delete(sa.Annotations, "azure.workload.identity/client-id") + } return nil }) return err @@ -657,18 +709,20 @@ func (r *HCPEtcdBackupReconciler) ensureNetworkPolicy(ctx context.Context, backu } // cleanupResources removes temporary NetworkPolicy and RBAC from the HCP namespace. -// It skips deletion if another backup Job is still active in the same HCP namespace, -// because the shared resources (NetworkPolicy, RBAC) are needed by that Job. +// It skips deletion if another backup is pending or in-progress in the same namespace, +// because the shared resources (NetworkPolicy, RBAC) may be needed by that backup. func (r *HCPEtcdBackupReconciler) cleanupResources(ctx context.Context, backup *hyperv1.HCPEtcdBackup) error { logger := log.FromContext(ctx) - // Guard: don't delete shared resources while another backup Job is active. - activeJob, err := r.findActiveJob(ctx, backup.Namespace) + // Guard: don't delete shared resources while another non-terminal backup exists. + // This covers both the case where a Job is active AND the race window where a + // new HCPEtcdBackup has been created but its Job hasn't been spawned yet. + hasOther, otherName, err := r.hasNonTerminalBackup(ctx, backup) if err != nil { - return fmt.Errorf("failed to check for active jobs before cleanup: %w", err) + return fmt.Errorf("failed to check for non-terminal backups before cleanup: %w", err) } - if activeJob != nil { - logger.Info("skipping cleanup: another backup Job is still active", "activeJob", activeJob.Name) + if hasOther { + logger.Info("skipping cleanup: another backup is still pending or in-progress", "otherBackup", otherName) return nil } @@ -719,7 +773,7 @@ func (r *HCPEtcdBackupReconciler) cleanupResources(ctx context.Context, backup * // createBackupJob creates the backup Job in the HO namespace with the 3-container // PodSpec: fetch-etcd-certs (init), etcdctl snapshot save (init), etcd-upload (main). -func (r *HCPEtcdBackupReconciler) createBackupJob(ctx context.Context, backup *hyperv1.HCPEtcdBackup, hcp *hyperv1.HostedControlPlane) error { +func (r *HCPEtcdBackupReconciler) createBackupJob(ctx context.Context, backup *hyperv1.HCPEtcdBackup, hcp *hyperv1.HostedControlPlane, creds resolvedCredentials) error { // Resolve images pullSecret := &corev1.Secret{} if err := r.Get(ctx, types.NamespacedName{Name: pullSecretName, Namespace: backup.Namespace}, pullSecret); err != nil { @@ -740,8 +794,8 @@ func (r *HCPEtcdBackupReconciler) createBackupJob(ctx context.Context, backup *h return fmt.Errorf("failed to resolve etcd image: %w", err) } - // Build upload args based on storage type - uploadArgs, credentialSecretName, err := r.buildUploadArgs(backup) + // Build upload args based on storage type and credential mode + uploadArgs, err := r.buildUploadArgs(backup, creds) if err != nil { return fmt.Errorf("failed to build upload args: %w", err) } @@ -753,6 +807,14 @@ func (r *HCPEtcdBackupReconciler) createBackupJob(ctx context.Context, backup *h LabelHCPNamespace: backup.Namespace, } + podLabels := make(map[string]string, len(jobLabels)+1) + for k, v := range jobLabels { + podLabels[k] = v + } + if creds.needsWorkloadIdentityLabel() { + podLabels["azure.workload.identity/use"] = "true" + } + job := &batchv1.Job{ ObjectMeta: metav1.ObjectMeta{ GenerateName: fmt.Sprintf("etcd-backup-%s-", backup.Name), @@ -765,33 +827,12 @@ func (r *HCPEtcdBackupReconciler) createBackupJob(ctx context.Context, backup *h BackoffLimit: ptr.To[int32](0), Template: corev1.PodTemplateSpec{ ObjectMeta: metav1.ObjectMeta{ - Labels: jobLabels, + Labels: podLabels, }, Spec: corev1.PodSpec{ ServiceAccountName: jobServiceAccountName, RestartPolicy: corev1.RestartPolicyNever, - Volumes: []corev1.Volume{ - { - Name: volumeEtcdCerts, - VolumeSource: corev1.VolumeSource{ - EmptyDir: &corev1.EmptyDirVolumeSource{}, - }, - }, - { - Name: volumeEtcdBackup, - VolumeSource: corev1.VolumeSource{ - EmptyDir: &corev1.EmptyDirVolumeSource{}, - }, - }, - { - Name: volumeCredentials, - VolumeSource: corev1.VolumeSource{ - Secret: &corev1.SecretVolumeSource{ - SecretName: credentialSecretName, - }, - }, - }, - }, + Volumes: r.buildJobVolumes(creds), InitContainers: []corev1.Container{ { Name: "fetch-certs", @@ -837,23 +878,7 @@ func (r *HCPEtcdBackupReconciler) createBackupJob(ctx context.Context, backup *h }, }, Containers: []corev1.Container{ - { - Name: "upload", - Image: cpoImage, - Command: uploadArgs, - VolumeMounts: []corev1.VolumeMount{ - { - Name: volumeEtcdBackup, - MountPath: mountPathEtcdBackup, - ReadOnly: true, - }, - { - Name: volumeCredentials, - MountPath: mountPathCredentials, - ReadOnly: true, - }, - }, - }, + r.buildUploadContainer(cpoImage, uploadArgs, creds), }, }, }, @@ -898,9 +923,94 @@ func (r *HCPEtcdBackupReconciler) getCredentialSecretName(backup *hyperv1.HCPEtc return "", fmt.Errorf("unsupported storage type: %s", backup.Spec.Storage.StorageType) } -// buildUploadArgs constructs the command args for the etcd-upload container -// and returns the credential Secret name. -func (r *HCPEtcdBackupReconciler) buildUploadArgs(backup *hyperv1.HCPEtcdBackup) ([]string, string, error) { +func (r *HCPEtcdBackupReconciler) buildJobVolumes(creds resolvedCredentials) []corev1.Volume { + volumes := []corev1.Volume{ + { + Name: volumeEtcdCerts, + VolumeSource: corev1.VolumeSource{ + EmptyDir: &corev1.EmptyDirVolumeSource{}, + }, + }, + { + Name: volumeEtcdBackup, + VolumeSource: corev1.VolumeSource{ + EmptyDir: &corev1.EmptyDirVolumeSource{}, + }, + }, + } + + if creds.needsCredentialsFile() { + volumes = append(volumes, corev1.Volume{ + Name: volumeCredentials, + VolumeSource: corev1.VolumeSource{ + Secret: &corev1.SecretVolumeSource{ + SecretName: creds.SecretName, + }, + }, + }) + } + + if creds.needsProjectedToken() { + volumes = append(volumes, corev1.Volume{ + Name: volumeAWSIAMToken, + VolumeSource: corev1.VolumeSource{ + Projected: &corev1.ProjectedVolumeSource{ + Sources: []corev1.VolumeProjection{ + { + ServiceAccountToken: &corev1.ServiceAccountTokenProjection{ + Audience: "sts.amazonaws.com", + ExpirationSeconds: ptr.To[int64](3600), + Path: "token", + }, + }, + }, + }, + }, + }) + } + + return volumes +} + +func (r *HCPEtcdBackupReconciler) buildUploadContainer(image string, args []string, creds resolvedCredentials) corev1.Container { + container := corev1.Container{ + Name: "upload", + Image: image, + Command: args, + VolumeMounts: []corev1.VolumeMount{ + { + Name: volumeEtcdBackup, + MountPath: mountPathEtcdBackup, + ReadOnly: true, + }, + }, + } + + if creds.needsCredentialsFile() { + container.VolumeMounts = append(container.VolumeMounts, corev1.VolumeMount{ + Name: volumeCredentials, + MountPath: mountPathCredentials, + ReadOnly: true, + }) + } + + if creds.needsProjectedToken() { + container.Env = []corev1.EnvVar{ + {Name: "AWS_ROLE_ARN", Value: creds.RoleARN}, + {Name: "AWS_WEB_IDENTITY_TOKEN_FILE", Value: mountPathAWSIAMToken + "/token"}, + } + container.VolumeMounts = append(container.VolumeMounts, corev1.VolumeMount{ + Name: volumeAWSIAMToken, + MountPath: mountPathAWSIAMToken, + ReadOnly: true, + }) + } + + return container +} + +// buildUploadArgs constructs the command args for the etcd-upload container. +func (r *HCPEtcdBackupReconciler) buildUploadArgs(backup *hyperv1.HCPEtcdBackup, creds resolvedCredentials) ([]string, error) { args := []string{ "control-plane-operator", "etcd-upload", "--snapshot-path", mountPathEtcdBackup + "/snapshot.db", @@ -914,12 +1024,14 @@ func (r *HCPEtcdBackupReconciler) buildUploadArgs(backup *hyperv1.HCPEtcdBackup) "--aws-bucket", s3.Bucket, "--aws-region", s3.Region, "--key-prefix", s3.KeyPrefix, - "--credentials-file", mountPathCredentials+"/credentials", ) + if creds.needsCredentialsFile() { + args = append(args, "--credentials-file", mountPathCredentials+"/credentials") + } if s3.KMSKeyARN != "" { args = append(args, "--aws-kms-key-arn", s3.KMSKeyARN) } - return args, s3.Credentials.Name, nil + return args, nil case hyperv1.AzureBlobBackupStorage: azure := backup.Spec.Storage.AzureBlob @@ -928,15 +1040,20 @@ func (r *HCPEtcdBackupReconciler) buildUploadArgs(backup *hyperv1.HCPEtcdBackup) "--azure-container", azure.Container, "--azure-storage-account", azure.StorageAccount, "--key-prefix", azure.KeyPrefix, - "--credentials-file", mountPathCredentials+"/credentials", ) + if creds.needsCredentialsFile() { + args = append(args, "--credentials-file", mountPathCredentials+"/credentials") + } + if authType := creds.azureAuthType(); authType != "" { + args = append(args, "--azure-auth-type", authType) + } if azure.EncryptionKeyURL != "" { args = append(args, "--azure-encryption-scope", azure.EncryptionKeyURL) } - return args, azure.Credentials.Name, nil + return args, nil } - return nil, "", fmt.Errorf("unsupported storage type: %s", backup.Spec.Storage.StorageType) + return nil, fmt.Errorf("unsupported storage type: %s", backup.Spec.Storage.StorageType) } // enforceRetention deletes the oldest completed HCPEtcdBackup CRs if the count diff --git a/hypershift-operator/controllers/etcdbackup/reconciler_test.go b/hypershift-operator/controllers/etcdbackup/reconciler_test.go index 7f15a5223ed..077e00f1957 100644 --- a/hypershift-operator/controllers/etcdbackup/reconciler_test.go +++ b/hypershift-operator/controllers/etcdbackup/reconciler_test.go @@ -436,6 +436,14 @@ func TestReconcile(t *testing.T) { }, } + // Non-terminal backup from tc4a (still in-progress) + tc4aBackup := &hyperv1.HCPEtcdBackup{ + ObjectMeta: metav1.ObjectMeta{ + Name: "pr8139-tc4a", + Namespace: testHCPNamespace, + }, + } + // Active Job from the first backup (tc4a) activeJob := &batchv1.Job{ ObjectMeta: metav1.ObjectMeta{ @@ -478,7 +486,7 @@ func TestReconcile(t *testing.T) { }, } - r := newReconciler(rejectedBackup, activeJob, np, role, rb) + r := newReconciler(rejectedBackup, tc4aBackup, activeJob, np, role, rb) ctx := context.Background() result, err := r.Reconcile(ctx, ctrl.Request{ @@ -685,7 +693,7 @@ func TestCleanupResources(t *testing.T) { g.Expect(err).ToNot(HaveOccurred()) }) - t.Run("When another backup Job is active it should skip cleanup", func(t *testing.T) { + t.Run("When another non-terminal backup exists it should skip cleanup", func(t *testing.T) { g := NewGomegaWithT(t) backup := newHCPEtcdBackup() @@ -709,31 +717,15 @@ func TestCleanupResources(t *testing.T) { }, } - // Active Job from another backup in the same HCP namespace - activeJob := &batchv1.Job{ + // Another non-terminal backup in the same namespace (no Job yet — race window) + otherBackup := &hyperv1.HCPEtcdBackup{ ObjectMeta: metav1.ObjectMeta{ - Name: "etcd-backup-other-backup", - Namespace: testHONamespace, - Labels: map[string]string{ - LabelApp: LabelName, - LabelBackupName: "other-backup", - LabelHCPNamespace: testHCPNamespace, - }, - }, - Spec: batchv1.JobSpec{ - Template: corev1.PodTemplateSpec{ - Spec: corev1.PodSpec{ - Containers: []corev1.Container{{Name: "test", Image: "test:latest"}}, - RestartPolicy: corev1.RestartPolicyNever, - }, - }, - }, - Status: batchv1.JobStatus{ - Active: 1, + Name: "other-backup", + Namespace: testHCPNamespace, }, } - r := newReconciler(backup, np, role, rb, activeJob) + r := newReconciler(backup, np, role, rb, otherBackup) ctx := context.Background() err := r.cleanupResources(ctx, backup) @@ -744,6 +736,58 @@ func TestCleanupResources(t *testing.T) { g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, &rbacv1.Role{})).To(Succeed()) g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, &rbacv1.RoleBinding{})).To(Succeed()) }) + + t.Run("When all other backups are terminal it should proceed with cleanup", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + + np := &networkingv1.NetworkPolicy{ + ObjectMeta: metav1.ObjectMeta{ + Name: NetworkPolicyName, + Namespace: testHCPNamespace, + }, + } + role := &rbacv1.Role{ + ObjectMeta: metav1.ObjectMeta{ + Name: RBACName, + Namespace: testHCPNamespace, + }, + } + rb := &rbacv1.RoleBinding{ + ObjectMeta: metav1.ObjectMeta{ + Name: RBACName, + Namespace: testHCPNamespace, + }, + } + + // Another backup that already completed (terminal) + completedBackup := &hyperv1.HCPEtcdBackup{ + ObjectMeta: metav1.ObjectMeta{ + Name: "completed-backup", + Namespace: testHCPNamespace, + }, + Status: hyperv1.HCPEtcdBackupStatus{ + Conditions: []metav1.Condition{ + { + Type: string(hyperv1.BackupCompleted), + Status: metav1.ConditionTrue, + Reason: hyperv1.BackupSucceededReason, + }, + }, + }, + } + + r := newReconciler(backup, np, role, rb, completedBackup) + ctx := context.Background() + + err := r.cleanupResources(ctx, backup) + g.Expect(err).ToNot(HaveOccurred()) + + // All resources should be deleted + g.Expect(r.Get(ctx, types.NamespacedName{Name: NetworkPolicyName, Namespace: testHCPNamespace}, &networkingv1.NetworkPolicy{})).ToNot(Succeed()) + g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, &rbacv1.Role{})).ToNot(Succeed()) + g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, &rbacv1.RoleBinding{})).ToNot(Succeed()) + }) } func newTestJob(status batchv1.JobStatus) *batchv1.Job { @@ -918,59 +962,117 @@ func TestSetEncryptionMetadata(t *testing.T) { } func TestBuildUploadArgs(t *testing.T) { - t.Run("When storage type is S3 it should build S3 upload args", func(t *testing.T) { - g := NewGomegaWithT(t) - backup := newHCPEtcdBackup() - r := newReconciler(backup) - - args, credSecret, err := r.buildUploadArgs(backup) - g.Expect(err).ToNot(HaveOccurred()) - g.Expect(credSecret).To(Equal("aws-creds")) - g.Expect(args).To(ContainElements( - "--storage-type", "S3", - "--aws-bucket", "my-bucket", - "--aws-region", "us-east-1", - "--key-prefix", "backups/test", - )) - }) - - t.Run("When S3 has KMS key it should include kms-key-arn flag", func(t *testing.T) { - g := NewGomegaWithT(t) - backup := newHCPEtcdBackup() - backup.Spec.Storage.S3.KMSKeyARN = "arn:aws:kms:us-east-1:123456789012:key/test" - r := newReconciler(backup) - - args, _, err := r.buildUploadArgs(backup) - g.Expect(err).ToNot(HaveOccurred()) - g.Expect(args).To(ContainElements("--aws-kms-key-arn", "arn:aws:kms:us-east-1:123456789012:key/test")) - }) - - t.Run("When storage type is AzureBlob it should build Azure upload args", func(t *testing.T) { - g := NewGomegaWithT(t) - backup := &hyperv1.HCPEtcdBackup{ + newAzureBackup := func(encryptionKeyURL string) *hyperv1.HCPEtcdBackup { + return &hyperv1.HCPEtcdBackup{ Spec: hyperv1.HCPEtcdBackupSpec{ Storage: hyperv1.HCPEtcdBackupStorage{ StorageType: hyperv1.AzureBlobBackupStorage, AzureBlob: hyperv1.HCPEtcdBackupAzureBlob{ - Container: "my-container", - StorageAccount: "mystorageaccount", - KeyPrefix: "backups", - Credentials: hyperv1.SecretReference{Name: "azure-creds"}, + Container: "my-container", + StorageAccount: "mystorageaccount", + KeyPrefix: "backups", + Credentials: hyperv1.SecretReference{Name: "azure-creds"}, + EncryptionKeyURL: encryptionKeyURL, }, }, }, } - r := newReconciler() + } - args, credSecret, err := r.buildUploadArgs(backup) - g.Expect(err).ToNot(HaveOccurred()) - g.Expect(credSecret).To(Equal("azure-creds")) - g.Expect(args).To(ContainElements( - "--storage-type", "AzureBlob", - "--azure-container", "my-container", - "--azure-storage-account", "mystorageaccount", - )) - }) + tests := []struct { + name string + backup *hyperv1.HCPEtcdBackup + creds resolvedCredentials + wantErr string + wantContain []string + wantNotContain []string + }{ + { + name: "When storage type is S3 with static credentials it should build S3 upload args with credentials-file", + backup: newHCPEtcdBackup(), + creds: resolvedCredentials{Mode: credentialModeAWSStatic, SecretName: "aws-creds"}, + wantContain: []string{ + "--storage-type", "S3", + "--aws-bucket", "my-bucket", + "--aws-region", "us-east-1", + "--key-prefix", "backups/test", + "--credentials-file", + }, + }, + { + name: "When storage type is S3 with STS credentials it should omit credentials-file", + backup: newHCPEtcdBackup(), + creds: resolvedCredentials{Mode: credentialModeAWSSTS, SecretName: "aws-creds", RoleARN: "arn:aws:iam::123:role/test"}, + wantContain: []string{"--storage-type", "S3"}, + wantNotContain: []string{"--credentials-file"}, + }, + { + name: "When S3 has KMS key it should include kms-key-arn flag", + backup: func() *hyperv1.HCPEtcdBackup { + b := newHCPEtcdBackup() + b.Spec.Storage.S3.KMSKeyARN = "arn:aws:kms:us-east-1:123456789012:key/test" + return b + }(), + creds: resolvedCredentials{Mode: credentialModeAWSStatic, SecretName: "aws-creds"}, + wantContain: []string{"--aws-kms-key-arn", "arn:aws:kms:us-east-1:123456789012:key/test"}, + }, + { + name: "When storage type is AzureBlob with client-secret it should include credentials-file and azure-auth-type", + backup: newAzureBackup(""), + creds: resolvedCredentials{Mode: credentialModeAzureClientSecret, SecretName: "azure-creds"}, + wantContain: []string{ + "--storage-type", "AzureBlob", + "--azure-container", "my-container", + "--azure-storage-account", "mystorageaccount", + "--credentials-file", + "--azure-auth-type", "client-secret", + }, + }, + { + name: "When storage type is AzureBlob with workload-identity it should omit credentials-file and azure-auth-type", + backup: newAzureBackup(""), + creds: resolvedCredentials{Mode: credentialModeAzureWorkloadIdentity, SecretName: "azure-creds", ClientID: "client-123"}, + wantContain: []string{"--storage-type", "AzureBlob"}, + wantNotContain: []string{"--credentials-file", "--azure-auth-type"}, + }, + { + name: "When storage type is AzureBlob with encryption key it should include azure-encryption-scope", + backup: newAzureBackup("https://myvault.vault.azure.net/keys/mykey"), + creds: resolvedCredentials{Mode: credentialModeAzureClientSecret, SecretName: "azure-creds"}, + wantContain: []string{"--azure-encryption-scope", "https://myvault.vault.azure.net/keys/mykey"}, + }, + { + name: "When storage type is unsupported it should return an error", + backup: &hyperv1.HCPEtcdBackup{ + Spec: hyperv1.HCPEtcdBackupSpec{ + Storage: hyperv1.HCPEtcdBackupStorage{StorageType: "UnknownStorage"}, + }, + }, + creds: resolvedCredentials{Mode: credentialModeAWSStatic, SecretName: "some-creds"}, + wantErr: "unsupported storage type", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + g := NewGomegaWithT(t) + r := newReconciler() + + args, err := r.buildUploadArgs(tt.backup, tt.creds) + if tt.wantErr != "" { + g.Expect(err).To(HaveOccurred()) + g.Expect(err.Error()).To(ContainSubstring(tt.wantErr)) + return + } + g.Expect(err).ToNot(HaveOccurred()) + for _, elem := range tt.wantContain { + g.Expect(args).To(ContainElement(elem), "expected arg %q to be present", elem) + } + for _, elem := range tt.wantNotContain { + g.Expect(args).ToNot(ContainElement(elem), "expected arg %q to be absent", elem) + } + }) + } } func TestEnforceRetention(t *testing.T) { @@ -1035,7 +1137,7 @@ func TestEnforceRetention(t *testing.T) { for i := range 3 { b := &hyperv1.HCPEtcdBackup{ ObjectMeta: metav1.ObjectMeta{ - Name: "backup-" + string(rune('a'+i)), + Name: fmt.Sprintf("backup-%d", i), Namespace: testHCPNamespace, }, Status: hyperv1.HCPEtcdBackupStatus{ @@ -1155,7 +1257,8 @@ func TestCreateBackupJob(t *testing.T) { r := newReconciler(backup, hcp, pullSecret, credSecret) ctx := context.Background() - err := r.createBackupJob(ctx, backup, hcp) + creds := resolvedCredentials{Mode: credentialModeAWSStatic, SecretName: "aws-creds"} + err := r.createBackupJob(ctx, backup, hcp, creds) g.Expect(err).ToNot(HaveOccurred()) // Find the created Job @@ -1234,7 +1337,8 @@ func TestCreateBackupJob(t *testing.T) { r := newReconciler(backup, hcp, pullSecret) ctx := context.Background() - err := r.createBackupJob(ctx, backup, hcp) + creds := resolvedCredentials{Mode: credentialModeAWSStatic, SecretName: "aws-creds"} + err := r.createBackupJob(ctx, backup, hcp, creds) g.Expect(err).ToNot(HaveOccurred()) jobList := &batchv1.JobList{} @@ -1283,7 +1387,8 @@ func TestCreateBackupJob(t *testing.T) { r := newReconciler(backup, hcp, pullSecret, credSecret) ctx := context.Background() - err := r.createBackupJob(ctx, backup, hcp) + creds := resolvedCredentials{Mode: credentialModeAzureClientSecret, SecretName: "azure-creds"} + err := r.createBackupJob(ctx, backup, hcp, creds) g.Expect(err).ToNot(HaveOccurred()) jobList := &batchv1.JobList{} @@ -1308,7 +1413,8 @@ func TestCreateBackupJob(t *testing.T) { r := newReconciler(backup, hcp) ctx := context.Background() - err := r.createBackupJob(ctx, backup, hcp) + creds := resolvedCredentials{Mode: credentialModeAWSStatic, SecretName: "aws-creds"} + err := r.createBackupJob(ctx, backup, hcp, creds) g.Expect(err).To(HaveOccurred()) g.Expect(err.Error()).To(ContainSubstring("pull secret")) }) @@ -1336,7 +1442,8 @@ func TestCreateBackupJob(t *testing.T) { r := newReconciler(backup, hcp, pullSecret, credSecret) ctx := context.Background() - err := r.createBackupJob(ctx, backup, hcp) + creds := resolvedCredentials{Mode: credentialModeAWSStatic, SecretName: "aws-creds"} + err := r.createBackupJob(ctx, backup, hcp, creds) g.Expect(err).ToNot(HaveOccurred()) jobList := &batchv1.JobList{} @@ -1346,122 +1453,380 @@ func TestCreateBackupJob(t *testing.T) { upload := jobList.Items[0].Spec.Template.Spec.Containers[0] g.Expect(upload.Command).To(ContainElements("--aws-kms-key-arn", "arn:aws:kms:us-east-1:123456789012:key/test-key")) }) -} -func TestReconcileHappyPath(t *testing.T) { - t.Run("When etcd is healthy and no active Job exists it should create backup resources and Job", func(t *testing.T) { + t.Run("When AWS STS mode it should create Job with projected token volume and env vars", func(t *testing.T) { g := NewGomegaWithT(t) backup := newHCPEtcdBackup() hcp := newHostedControlPlane() - sts := newEtcdStatefulSet(3, 3) pullSecret := &corev1.Secret{ - ObjectMeta: metav1.ObjectMeta{ - Name: pullSecretName, - Namespace: testHCPNamespace, - }, - Data: map[string][]byte{ - corev1.DockerConfigJsonKey: []byte(`{"auths":{}}`), - }, + ObjectMeta: metav1.ObjectMeta{Name: pullSecretName, Namespace: testHCPNamespace}, + Data: map[string][]byte{corev1.DockerConfigJsonKey: []byte(`{"auths":{}}`)}, } credSecret := &corev1.Secret{ - ObjectMeta: metav1.ObjectMeta{ - Name: "aws-creds", - Namespace: testHONamespace, - }, - Data: map[string][]byte{ - "credentials": []byte("fake-creds"), - }, + ObjectMeta: metav1.ObjectMeta{Name: "aws-creds", Namespace: testHONamespace}, } - r := newReconciler(backup, hcp, sts, pullSecret, credSecret) + r := newReconciler(backup, hcp, pullSecret, credSecret) ctx := context.Background() - result, err := r.Reconcile(ctx, ctrl.Request{ - NamespacedName: types.NamespacedName{ - Name: testBackupName, - Namespace: testHCPNamespace, - }, - }) + creds := resolvedCredentials{ + Mode: credentialModeAWSSTS, + SecretName: "aws-creds", + RoleARN: "arn:aws:iam::123456789012:role/etcd-backup", + } + err := r.createBackupJob(ctx, backup, hcp, creds) g.Expect(err).ToNot(HaveOccurred()) - g.Expect(result.RequeueAfter).To(Equal(requeueInterval)) - // Verify Job was created jobList := &batchv1.JobList{} g.Expect(r.List(ctx, jobList, client.InNamespace(testHONamespace))).To(Succeed()) g.Expect(jobList.Items).To(HaveLen(1)) - g.Expect(jobList.Items[0].Labels[LabelBackupName]).To(Equal(testBackupName)) - // Verify ServiceAccount was created - sa := &corev1.ServiceAccount{} - g.Expect(r.Get(ctx, types.NamespacedName{Name: jobServiceAccountName, Namespace: testHONamespace}, sa)).To(Succeed()) + podSpec := jobList.Items[0].Spec.Template.Spec - // Verify RBAC was created in HCP namespace - role := &rbacv1.Role{} - g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, role)).To(Succeed()) - rb := &rbacv1.RoleBinding{} - g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, rb)).To(Succeed()) + // Should NOT have credentials volume + for _, v := range podSpec.Volumes { + g.Expect(v.Name).ToNot(Equal(volumeCredentials), "credentials volume should not be present in STS mode") + } - // Verify NetworkPolicy was created in HCP namespace - np := &networkingv1.NetworkPolicy{} - g.Expect(r.Get(ctx, types.NamespacedName{Name: NetworkPolicyName, Namespace: testHCPNamespace}, np)).To(Succeed()) + // Should have projected token volume + var tokenVolume *corev1.Volume + for i := range podSpec.Volumes { + if podSpec.Volumes[i].Name == volumeAWSIAMToken { + tokenVolume = &podSpec.Volumes[i] + break + } + } + g.Expect(tokenVolume).ToNot(BeNil(), "projected token volume should be present") + g.Expect(tokenVolume.Projected).ToNot(BeNil()) + g.Expect(tokenVolume.Projected.Sources).To(HaveLen(1)) + g.Expect(tokenVolume.Projected.Sources[0].ServiceAccountToken.Audience).To(Equal("sts.amazonaws.com")) - // Verify backup status set to BackupInProgress - updated := &hyperv1.HCPEtcdBackup{} - g.Expect(r.Get(ctx, types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, updated)).To(Succeed()) - g.Expect(updated.Status.Conditions).To(HaveLen(1)) - g.Expect(updated.Status.Conditions[0].Reason).To(Equal(hyperv1.BackupInProgressReason)) - g.Expect(updated.Status.Conditions[0].Status).To(Equal(metav1.ConditionFalse)) + // Upload container env vars + upload := podSpec.Containers[0] + g.Expect(upload.Env).To(ContainElements( + corev1.EnvVar{Name: "AWS_ROLE_ARN", Value: "arn:aws:iam::123456789012:role/etcd-backup"}, + corev1.EnvVar{Name: "AWS_WEB_IDENTITY_TOKEN_FILE", Value: mountPathAWSIAMToken + "/token"}, + )) - // Verify HCP condition was set - updatedHCP := &hyperv1.HostedControlPlane{} - g.Expect(r.Get(ctx, types.NamespacedName{Name: testHCPName, Namespace: testHCPNamespace}, updatedHCP)).To(Succeed()) - hcpCond := meta.FindStatusCondition(updatedHCP.Status.Conditions, string(hyperv1.EtcdBackupSucceeded)) - g.Expect(hcpCond).ToNot(BeNil()) - g.Expect(hcpCond.Status).To(Equal(metav1.ConditionFalse)) - g.Expect(hcpCond.Reason).To(Equal(hyperv1.BackupInProgressReason)) + // Upload container should mount token volume, not credentials + var hasCreds, hasToken bool + for _, m := range upload.VolumeMounts { + if m.Name == volumeCredentials { + hasCreds = true + } + if m.Name == volumeAWSIAMToken { + hasToken = true + } + } + g.Expect(hasCreds).To(BeFalse(), "credentials mount should not be present in STS mode") + g.Expect(hasToken).To(BeTrue(), "token mount should be present in STS mode") + + // Upload args should NOT have --credentials-file + g.Expect(upload.Command).ToNot(ContainElement("--credentials-file")) }) -} -func TestUpdateHostedClusterBackupURL(t *testing.T) { - t.Run("When HCP has HostedClusterAnnotation it should update HC status with snapshot URL", func(t *testing.T) { + t.Run("When Azure Workload Identity mode it should create Job with WI pod label and no credentials volume", func(t *testing.T) { g := NewGomegaWithT(t) + backup := &hyperv1.HCPEtcdBackup{ + ObjectMeta: metav1.ObjectMeta{Name: testBackupName, Namespace: testHCPNamespace}, + Spec: hyperv1.HCPEtcdBackupSpec{ + Storage: hyperv1.HCPEtcdBackupStorage{ + StorageType: hyperv1.AzureBlobBackupStorage, + AzureBlob: hyperv1.HCPEtcdBackupAzureBlob{ + Container: "my-container", + StorageAccount: "mystorageaccount", + KeyPrefix: "backups/test", + Credentials: hyperv1.SecretReference{Name: "azure-creds"}, + }, + }, + }, + } hcp := newHostedControlPlane() - hc := newHostedCluster() - r := newReconciler(hcp, hc) + pullSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: pullSecretName, Namespace: testHCPNamespace}, + Data: map[string][]byte{corev1.DockerConfigJsonKey: []byte(`{"auths":{}}`)}, + } + credSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds", Namespace: testHONamespace}, + } + r := newReconciler(backup, hcp, pullSecret, credSecret) + ctx := context.Background() - err := r.updateHostedClusterBackupURL(context.Background(), hcp, "s3://bucket/snapshot.db") + creds := resolvedCredentials{ + Mode: credentialModeAzureWorkloadIdentity, + SecretName: "azure-creds", + ClientID: "client-789", + } + err := r.createBackupJob(ctx, backup, hcp, creds) g.Expect(err).ToNot(HaveOccurred()) - updatedHC := &hyperv1.HostedCluster{} - g.Expect(r.Get(context.Background(), types.NamespacedName{Name: testHCName, Namespace: testHCNamespace}, updatedHC)).To(Succeed()) - g.Expect(updatedHC.Status.LastSuccessfulEtcdBackupURL).To(Equal("s3://bucket/snapshot.db")) - }) + jobList := &batchv1.JobList{} + g.Expect(r.List(ctx, jobList, client.InNamespace(testHONamespace))).To(Succeed()) + g.Expect(jobList.Items).To(HaveLen(1)) - t.Run("When HCP is missing HostedClusterAnnotation it should return an error", func(t *testing.T) { - g := NewGomegaWithT(t) - hcp := newHostedControlPlane() - hcp.Annotations = nil - r := newReconciler(hcp) + podSpec := jobList.Items[0].Spec.Template.Spec - err := r.updateHostedClusterBackupURL(context.Background(), hcp, "s3://bucket/snapshot.db") - g.Expect(err).To(HaveOccurred()) - g.Expect(err.Error()).To(ContainSubstring("missing")) - }) + // Should NOT have credentials volume + for _, v := range podSpec.Volumes { + g.Expect(v.Name).ToNot(Equal(volumeCredentials), "credentials volume should not be present in WI mode") + } - t.Run("When HostedCluster does not exist it should return an error", func(t *testing.T) { - g := NewGomegaWithT(t) - hcp := newHostedControlPlane() - r := newReconciler(hcp) // no HC in the fake client + // Pod template should have WI label + podLabels := jobList.Items[0].Spec.Template.Labels + g.Expect(podLabels["azure.workload.identity/use"]).To(Equal("true")) - err := r.updateHostedClusterBackupURL(context.Background(), hcp, "s3://bucket/snapshot.db") - g.Expect(err).To(HaveOccurred()) + // Upload container should NOT mount credentials + upload := podSpec.Containers[0] + for _, m := range upload.VolumeMounts { + g.Expect(m.Name).ToNot(Equal(volumeCredentials), "credentials mount should not be present in WI mode") + } + + // Upload args should NOT have --credentials-file or --azure-auth-type + g.Expect(upload.Command).ToNot(ContainElement("--credentials-file")) + g.Expect(upload.Command).ToNot(ContainElement("--azure-auth-type")) }) - t.Run("When HC status update conflicts it should retry and succeed", func(t *testing.T) { + t.Run("When Azure managed-identity mode it should include credentials-file and azure-auth-type managed-identity", func(t *testing.T) { g := NewGomegaWithT(t) - hcp := newHostedControlPlane() - hc := newHostedCluster() - - callCount := 0 + backup := &hyperv1.HCPEtcdBackup{ + ObjectMeta: metav1.ObjectMeta{Name: testBackupName, Namespace: testHCPNamespace}, + Spec: hyperv1.HCPEtcdBackupSpec{ + Storage: hyperv1.HCPEtcdBackupStorage{ + StorageType: hyperv1.AzureBlobBackupStorage, + AzureBlob: hyperv1.HCPEtcdBackupAzureBlob{ + Container: "my-container", + StorageAccount: "mystorageaccount", + KeyPrefix: "backups/test", + Credentials: hyperv1.SecretReference{Name: "azure-creds"}, + }, + }, + }, + } + hcp := newHostedControlPlane() + pullSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: pullSecretName, Namespace: testHCPNamespace}, + Data: map[string][]byte{corev1.DockerConfigJsonKey: []byte(`{"auths":{}}`)}, + } + credSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds", Namespace: testHONamespace}, + } + r := newReconciler(backup, hcp, pullSecret, credSecret) + ctx := context.Background() + + creds := resolvedCredentials{Mode: credentialModeAzureManagedIdentity, SecretName: "azure-creds"} + err := r.createBackupJob(ctx, backup, hcp, creds) + g.Expect(err).ToNot(HaveOccurred()) + + jobList := &batchv1.JobList{} + g.Expect(r.List(ctx, jobList, client.InNamespace(testHONamespace))).To(Succeed()) + g.Expect(jobList.Items).To(HaveLen(1)) + + upload := jobList.Items[0].Spec.Template.Spec.Containers[0] + g.Expect(upload.Command).To(ContainElements("--credentials-file")) + g.Expect(upload.Command).To(ContainElements("--azure-auth-type", "managed-identity")) + + // Pod template should NOT have WI label + podLabels := jobList.Items[0].Spec.Template.Labels + g.Expect(podLabels).ToNot(HaveKey("azure.workload.identity/use")) + }) +} + +func TestEnsureServiceAccountWithCredentials(t *testing.T) { + t.Run("When Azure Workload Identity mode and SA has pre-configured annotation it should preserve it", func(t *testing.T) { + g := NewGomegaWithT(t) + preConfiguredSA := &corev1.ServiceAccount{ + ObjectMeta: metav1.ObjectMeta{ + Name: jobServiceAccountName, + Namespace: testHONamespace, + Annotations: map[string]string{ + "azure.workload.identity/client-id": "preconfigured-client-id", + }, + }, + } + r := newReconciler(preConfiguredSA) + ctx := context.Background() + + creds := resolvedCredentials{ + Mode: credentialModeAzureWorkloadIdentity, + ClientID: "different-client-id-from-secret", + } + g.Expect(r.ensureServiceAccount(ctx, creds)).To(Succeed()) + + sa := &corev1.ServiceAccount{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: jobServiceAccountName, Namespace: testHONamespace}, sa)).To(Succeed()) + g.Expect(sa.Annotations["azure.workload.identity/client-id"]).To(Equal("preconfigured-client-id")) + }) + + t.Run("When Azure Workload Identity mode and SA has no annotation it should set it from credential secret", func(t *testing.T) { + g := NewGomegaWithT(t) + r := newReconciler() + ctx := context.Background() + + creds := resolvedCredentials{ + Mode: credentialModeAzureWorkloadIdentity, + ClientID: "client-789", + } + g.Expect(r.ensureServiceAccount(ctx, creds)).To(Succeed()) + + sa := &corev1.ServiceAccount{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: jobServiceAccountName, Namespace: testHONamespace}, sa)).To(Succeed()) + g.Expect(sa.Annotations["azure.workload.identity/client-id"]).To(Equal("client-789")) + }) + + t.Run("When non-WI mode it should not annotate the ServiceAccount", func(t *testing.T) { + g := NewGomegaWithT(t) + r := newReconciler() + ctx := context.Background() + + creds := resolvedCredentials{Mode: credentialModeAWSStatic} + g.Expect(r.ensureServiceAccount(ctx, creds)).To(Succeed()) + + sa := &corev1.ServiceAccount{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: jobServiceAccountName, Namespace: testHONamespace}, sa)).To(Succeed()) + g.Expect(sa.Annotations).ToNot(HaveKey("azure.workload.identity/client-id")) + }) + + t.Run("When switching from WI to static mode it should remove the annotation", func(t *testing.T) { + g := NewGomegaWithT(t) + preConfiguredSA := &corev1.ServiceAccount{ + ObjectMeta: metav1.ObjectMeta{ + Name: jobServiceAccountName, + Namespace: testHONamespace, + Annotations: map[string]string{ + "azure.workload.identity/client-id": "client-789", + }, + }, + } + r := newReconciler(preConfiguredSA) + ctx := context.Background() + + // First: verify WI preserves annotation + wiCreds := resolvedCredentials{Mode: credentialModeAzureWorkloadIdentity, ClientID: "ignored"} + g.Expect(r.ensureServiceAccount(ctx, wiCreds)).To(Succeed()) + + // Then: update with static + staticCreds := resolvedCredentials{Mode: credentialModeAWSStatic} + g.Expect(r.ensureServiceAccount(ctx, staticCreds)).To(Succeed()) + + sa := &corev1.ServiceAccount{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: jobServiceAccountName, Namespace: testHONamespace}, sa)).To(Succeed()) + g.Expect(sa.Annotations).ToNot(HaveKey("azure.workload.identity/client-id")) + }) +} + +func TestReconcileHappyPath(t *testing.T) { + t.Run("When etcd is healthy and no active Job exists it should create backup resources and Job", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + hcp := newHostedControlPlane() + sts := newEtcdStatefulSet(3, 3) + pullSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{ + Name: pullSecretName, + Namespace: testHCPNamespace, + }, + Data: map[string][]byte{ + corev1.DockerConfigJsonKey: []byte(`{"auths":{}}`), + }, + } + credSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{ + Name: "aws-creds", + Namespace: testHONamespace, + }, + Data: map[string][]byte{ + "credentials": []byte("fake-creds"), + }, + } + r := newReconciler(backup, hcp, sts, pullSecret, credSecret) + ctx := context.Background() + + result, err := r.Reconcile(ctx, ctrl.Request{ + NamespacedName: types.NamespacedName{ + Name: testBackupName, + Namespace: testHCPNamespace, + }, + }) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(result.RequeueAfter).To(Equal(requeueInterval)) + + // Verify Job was created + jobList := &batchv1.JobList{} + g.Expect(r.List(ctx, jobList, client.InNamespace(testHONamespace))).To(Succeed()) + g.Expect(jobList.Items).To(HaveLen(1)) + g.Expect(jobList.Items[0].Labels[LabelBackupName]).To(Equal(testBackupName)) + + // Verify ServiceAccount was created + sa := &corev1.ServiceAccount{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: jobServiceAccountName, Namespace: testHONamespace}, sa)).To(Succeed()) + + // Verify RBAC was created in HCP namespace + role := &rbacv1.Role{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, role)).To(Succeed()) + rb := &rbacv1.RoleBinding{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, rb)).To(Succeed()) + + // Verify NetworkPolicy was created in HCP namespace + np := &networkingv1.NetworkPolicy{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: NetworkPolicyName, Namespace: testHCPNamespace}, np)).To(Succeed()) + + // Verify backup status set to BackupInProgress + updated := &hyperv1.HCPEtcdBackup{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, updated)).To(Succeed()) + g.Expect(updated.Status.Conditions).To(HaveLen(1)) + g.Expect(updated.Status.Conditions[0].Reason).To(Equal(hyperv1.BackupInProgressReason)) + g.Expect(updated.Status.Conditions[0].Status).To(Equal(metav1.ConditionFalse)) + + // Verify HCP condition was set + updatedHCP := &hyperv1.HostedControlPlane{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: testHCPName, Namespace: testHCPNamespace}, updatedHCP)).To(Succeed()) + hcpCond := meta.FindStatusCondition(updatedHCP.Status.Conditions, string(hyperv1.EtcdBackupSucceeded)) + g.Expect(hcpCond).ToNot(BeNil()) + g.Expect(hcpCond.Status).To(Equal(metav1.ConditionFalse)) + g.Expect(hcpCond.Reason).To(Equal(hyperv1.BackupInProgressReason)) + }) +} + +func TestUpdateHostedClusterBackupURL(t *testing.T) { + t.Run("When HCP has HostedClusterAnnotation it should update HC status with snapshot URL", func(t *testing.T) { + g := NewGomegaWithT(t) + hcp := newHostedControlPlane() + hc := newHostedCluster() + r := newReconciler(hcp, hc) + + err := r.updateHostedClusterBackupURL(context.Background(), hcp, "s3://bucket/snapshot.db") + g.Expect(err).ToNot(HaveOccurred()) + + updatedHC := &hyperv1.HostedCluster{} + g.Expect(r.Get(context.Background(), types.NamespacedName{Name: testHCName, Namespace: testHCNamespace}, updatedHC)).To(Succeed()) + g.Expect(updatedHC.Status.LastSuccessfulEtcdBackupURL).To(Equal("s3://bucket/snapshot.db")) + }) + + t.Run("When HCP is missing HostedClusterAnnotation it should return an error", func(t *testing.T) { + g := NewGomegaWithT(t) + hcp := newHostedControlPlane() + hcp.Annotations = nil + r := newReconciler(hcp) + + err := r.updateHostedClusterBackupURL(context.Background(), hcp, "s3://bucket/snapshot.db") + g.Expect(err).To(HaveOccurred()) + g.Expect(err.Error()).To(ContainSubstring("missing")) + }) + + t.Run("When HostedCluster does not exist it should return an error", func(t *testing.T) { + g := NewGomegaWithT(t) + hcp := newHostedControlPlane() + r := newReconciler(hcp) // no HC in the fake client + + err := r.updateHostedClusterBackupURL(context.Background(), hcp, "s3://bucket/snapshot.db") + g.Expect(err).To(HaveOccurred()) + }) + + t.Run("When HC status update conflicts it should retry and succeed", func(t *testing.T) { + g := NewGomegaWithT(t) + hcp := newHostedControlPlane() + hc := newHostedCluster() + + callCount := 0 s := newScheme() fakeClient := fake.NewClientBuilder(). WithScheme(s). @@ -1693,6 +2058,605 @@ func TestValidatePrerequisites(t *testing.T) { } } +func TestResolveControlPlaneOperatorImage(t *testing.T) { + t.Run("When HCP has ControlPlaneOperatorImage annotation it should use the override", func(t *testing.T) { + g := NewGomegaWithT(t) + hcp := newHostedControlPlane() + hcp.Annotations[hyperv1.ControlPlaneOperatorImageAnnotation] = "quay.io/custom/cpo:override" + pullSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: pullSecretName, Namespace: testHCPNamespace}, + Data: map[string][]byte{corev1.DockerConfigJsonKey: []byte(`{"auths":{}}`)}, + } + r := newReconciler(hcp, pullSecret) + + image, err := r.resolveControlPlaneOperatorImage(context.Background(), hcp, testReleaseImage, pullSecret.Data[corev1.DockerConfigJsonKey]) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(image).To(Equal("quay.io/custom/cpo:override")) + }) + + t.Run("When HCP has no annotation it should resolve from release payload", func(t *testing.T) { + g := NewGomegaWithT(t) + hcp := newHostedControlPlane() + r := newReconciler(hcp) + + image, err := r.resolveControlPlaneOperatorImage(context.Background(), hcp, testReleaseImage, []byte(`{"auths":{}}`)) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(image).To(Equal(testCPOImage)) + }) + + t.Run("When release payload has no hypershift component it should fallback to HO image", func(t *testing.T) { + g := NewGomegaWithT(t) + hcp := newHostedControlPlane() + r := newReconciler(hcp) + r.ReleaseProvider = &emptyReleaseProvider{} + + image, err := r.resolveControlPlaneOperatorImage(context.Background(), hcp, testReleaseImage, []byte(`{"auths":{}}`)) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(image).To(Equal("quay.io/hypershift/hypershift:latest")) + }) +} + +type emptyReleaseProvider struct{} + +func (f *emptyReleaseProvider) Lookup(_ context.Context, _ string, _ []byte) (*releaseinfo.ReleaseImage, error) { + return &releaseinfo.ReleaseImage{ + ImageStream: &imageapi.ImageStream{ + Spec: imageapi.ImageStreamSpec{ + Tags: []imageapi.TagReference{}, + }, + }, + }, nil +} + +func (f *emptyReleaseProvider) GetRegistryOverrides() map[string]string { return nil } +func (f *emptyReleaseProvider) GetOpenShiftImageRegistryOverrides() map[string][]string { + return nil +} +func (f *emptyReleaseProvider) GetMirroredReleaseImage() string { return "" } + +func TestCreateResourcesAndJob(t *testing.T) { + t.Run("When createBackupJob fails with NotFound it should cleanup resources and set BackupFailed", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + hcp := newHostedControlPlane() + credSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "aws-creds", Namespace: testHONamespace}, + Data: map[string][]byte{"credentials": []byte("fake-creds")}, + } + // No pull secret — createBackupJob will fail with NotFound + r := newReconciler(backup, hcp, credSecret) + ctx := context.Background() + + result, err := r.createResourcesAndJob(ctx, backup, hcp) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(result).To(Equal(ctrl.Result{})) + + updated := &hyperv1.HCPEtcdBackup{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, updated)).To(Succeed()) + g.Expect(updated.Status.Conditions).To(HaveLen(1)) + g.Expect(updated.Status.Conditions[0].Reason).To(Equal(hyperv1.BackupFailedReason)) + g.Expect(updated.Status.Conditions[0].Message).To(ContainSubstring("pull secret")) + }) + + t.Run("When credential secret is not found it should set BackupFailed", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + hcp := newHostedControlPlane() + r := newReconciler(backup, hcp) // no credential secret + ctx := context.Background() + + result, err := r.createResourcesAndJob(ctx, backup, hcp) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(result).To(Equal(ctrl.Result{})) + + updated := &hyperv1.HCPEtcdBackup{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, updated)).To(Succeed()) + g.Expect(updated.Status.Conditions[0].Reason).To(Equal(hyperv1.BackupFailedReason)) + g.Expect(updated.Status.Conditions[0].Message).To(ContainSubstring("credential Secret")) + }) +} + +func TestCleanupResourcesWithDeleteErrors(t *testing.T) { + t.Run("When delete returns a real error it should propagate the first error", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + + np := &networkingv1.NetworkPolicy{ + ObjectMeta: metav1.ObjectMeta{Name: NetworkPolicyName, Namespace: testHCPNamespace}, + } + + s := newScheme() + fakeClient := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(backup, np). + WithStatusSubresource(&hyperv1.HCPEtcdBackup{}). + WithInterceptorFuncs(interceptor.Funcs{ + Delete: func(ctx context.Context, cl client.WithWatch, obj client.Object, opts ...client.DeleteOption) error { + if _, ok := obj.(*networkingv1.NetworkPolicy); ok { + return fmt.Errorf("simulated delete failure") + } + return cl.Delete(ctx, obj, opts...) + }, + }). + Build() + + r := &HCPEtcdBackupReconciler{ + Client: fakeClient, + OperatorNamespace: testHONamespace, + } + + err := r.cleanupResources(context.Background(), backup) + g.Expect(err).To(HaveOccurred()) + g.Expect(err.Error()).To(ContainSubstring("simulated delete failure")) + }) +} + +func TestHandleJobStatusEdgeCases(t *testing.T) { + t.Run("When Job succeeds but getSnapshotURLFromPod fails it should still succeed with empty URL", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + hcp := newHostedControlPlane() + hc := newHostedCluster() + job := newTestJob(batchv1.JobStatus{ + Conditions: []batchv1.JobCondition{ + {Type: batchv1.JobComplete, Status: corev1.ConditionTrue}, + }, + }) + + s := newScheme() + fakeClient := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(backup, job, hcp, hc). + WithStatusSubresource(&hyperv1.HCPEtcdBackup{}, &hyperv1.HostedControlPlane{}, &hyperv1.HostedCluster{}). + WithInterceptorFuncs(interceptor.Funcs{ + List: func(ctx context.Context, cl client.WithWatch, list client.ObjectList, opts ...client.ListOption) error { + if _, ok := list.(*corev1.PodList); ok { + return fmt.Errorf("simulated pod list failure") + } + return cl.List(ctx, list, opts...) + }, + }). + Build() + + r := &HCPEtcdBackupReconciler{ + Client: fakeClient, + OperatorNamespace: testHONamespace, + } + + result, err := r.handleJobStatus(context.Background(), backup, job, hcp) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(result).To(Equal(ctrl.Result{})) + + updated := &hyperv1.HCPEtcdBackup{} + g.Expect(r.Get(context.Background(), types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, updated)).To(Succeed()) + g.Expect(updated.Status.Conditions[0].Reason).To(Equal(hyperv1.BackupSucceededReason)) + g.Expect(updated.Status.SnapshotURL).To(BeEmpty()) + }) + + t.Run("When Job succeeds with no termination message it should succeed with empty URL and not update HC", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + hcp := newHostedControlPlane() + hc := newHostedCluster() + job := newTestJob(batchv1.JobStatus{ + Conditions: []batchv1.JobCondition{ + {Type: batchv1.JobComplete, Status: corev1.ConditionTrue}, + }, + }) + // Pod exists but upload container has no termination message + pod := &corev1.Pod{ + ObjectMeta: metav1.ObjectMeta{ + Name: "backup-pod", Namespace: testHONamespace, + Labels: map[string]string{"batch.kubernetes.io/job-name": "etcd-backup-test"}, + }, + Spec: corev1.PodSpec{ + Containers: []corev1.Container{{Name: "upload", Image: "test:latest"}}, + RestartPolicy: corev1.RestartPolicyNever, + }, + Status: corev1.PodStatus{ + ContainerStatuses: []corev1.ContainerStatus{ + {Name: "upload", State: corev1.ContainerState{Terminated: &corev1.ContainerStateTerminated{ExitCode: 0}}}, + }, + }, + } + r := newReconciler(backup, job, hcp, hc, pod) + + result, err := r.handleJobStatus(context.Background(), backup, job, hcp) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(result).To(Equal(ctrl.Result{})) + + updated := &hyperv1.HCPEtcdBackup{} + g.Expect(r.Get(context.Background(), types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, updated)).To(Succeed()) + g.Expect(updated.Status.Conditions[0].Reason).To(Equal(hyperv1.BackupSucceededReason)) + g.Expect(updated.Status.SnapshotURL).To(BeEmpty()) + + // HC should not have been updated since URL was empty + updatedHC := &hyperv1.HostedCluster{} + g.Expect(r.Get(context.Background(), types.NamespacedName{Name: testHCName, Namespace: testHCNamespace}, updatedHC)).To(Succeed()) + g.Expect(updatedHC.Status.LastSuccessfulEtcdBackupURL).To(BeEmpty()) + }) +} + +func TestReconcileTerminalCleansUpAndEnforcesRetention(t *testing.T) { + t.Run("When backup is terminal it should cleanup RBAC and NetworkPolicy and enforce retention", func(t *testing.T) { + g := NewGomegaWithT(t) + ctx := context.Background() + + baseTime := time.Now().Add(-7 * time.Hour) + + // The backup being reconciled is terminal (succeeded) + backup := newHCPEtcdBackup() + backup.ObjectMeta.CreationTimestamp = metav1.NewTime(baseTime.Add(6 * time.Hour)) + backup.Status.Conditions = []metav1.Condition{ + { + Type: string(hyperv1.BackupCompleted), + Status: metav1.ConditionTrue, + Reason: hyperv1.BackupSucceededReason, + }, + } + + // 6 older completed backups (total 7 with above, max is 5 → should delete 2 oldest) + var objs []client.Object + objs = append(objs, backup) + for i := range 6 { + b := &hyperv1.HCPEtcdBackup{ + ObjectMeta: metav1.ObjectMeta{ + Name: fmt.Sprintf("old-backup-%d", i), + Namespace: testHCPNamespace, + CreationTimestamp: metav1.NewTime(baseTime.Add(time.Duration(i) * time.Hour)), + }, + Status: hyperv1.HCPEtcdBackupStatus{ + Conditions: []metav1.Condition{ + { + Type: string(hyperv1.BackupCompleted), + Status: metav1.ConditionTrue, + Reason: hyperv1.BackupSucceededReason, + }, + }, + }, + } + objs = append(objs, b) + } + + // Shared resources that should be cleaned up (no other non-terminal backup) + np := &networkingv1.NetworkPolicy{ + ObjectMeta: metav1.ObjectMeta{Name: NetworkPolicyName, Namespace: testHCPNamespace}, + } + role := &rbacv1.Role{ + ObjectMeta: metav1.ObjectMeta{Name: RBACName, Namespace: testHCPNamespace}, + } + rb := &rbacv1.RoleBinding{ + ObjectMeta: metav1.ObjectMeta{Name: RBACName, Namespace: testHCPNamespace}, + } + objs = append(objs, np, role, rb) + + r := newReconciler(objs...) + r.MaxBackupCount = 5 + + result, err := r.Reconcile(ctx, ctrl.Request{ + NamespacedName: types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, + }) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(result).To(Equal(ctrl.Result{})) + + // RBAC and NetworkPolicy should be gone + g.Expect(r.Get(ctx, types.NamespacedName{Name: NetworkPolicyName, Namespace: testHCPNamespace}, &networkingv1.NetworkPolicy{})).ToNot(Succeed()) + g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, &rbacv1.Role{})).ToNot(Succeed()) + g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, &rbacv1.RoleBinding{})).ToNot(Succeed()) + + // Retention: should have kept only 5 completed backups + remaining := &hyperv1.HCPEtcdBackupList{} + g.Expect(r.List(ctx, remaining, client.InNamespace(testHCPNamespace))).To(Succeed()) + g.Expect(remaining.Items).To(HaveLen(5)) + }) +} + +func TestReconcileMonitorsExistingJob(t *testing.T) { + t.Run("When backup has an existing completed Job it should set BackupSucceeded via Reconcile", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + hcp := newHostedControlPlane() + hc := newHostedCluster() + sts := newEtcdStatefulSet(3, 3) + credSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "aws-creds", Namespace: testHONamespace}, + Data: map[string][]byte{"credentials": []byte("fake-creds")}, + } + + completedJob := &batchv1.Job{ + ObjectMeta: metav1.ObjectMeta{ + Name: "etcd-backup-" + testBackupName, + Namespace: testHONamespace, + Labels: map[string]string{ + LabelApp: LabelName, + LabelBackupName: testBackupName, + LabelHCPNamespace: testHCPNamespace, + }, + }, + Spec: batchv1.JobSpec{ + Template: corev1.PodTemplateSpec{ + Spec: corev1.PodSpec{ + Containers: []corev1.Container{{Name: "test", Image: "test:latest"}}, + RestartPolicy: corev1.RestartPolicyNever, + }, + }, + }, + Status: batchv1.JobStatus{ + Conditions: []batchv1.JobCondition{ + {Type: batchv1.JobComplete, Status: corev1.ConditionTrue}, + }, + }, + } + pod := &corev1.Pod{ + ObjectMeta: metav1.ObjectMeta{ + Name: "backup-pod", Namespace: testHONamespace, + Labels: map[string]string{"batch.kubernetes.io/job-name": "etcd-backup-" + testBackupName}, + }, + Spec: corev1.PodSpec{ + Containers: []corev1.Container{{Name: "upload", Image: "test:latest"}}, + RestartPolicy: corev1.RestartPolicyNever, + }, + Status: corev1.PodStatus{ + ContainerStatuses: []corev1.ContainerStatus{ + { + Name: "upload", + State: corev1.ContainerState{ + Terminated: &corev1.ContainerStateTerminated{ + ExitCode: 0, + Message: "s3://bucket/snapshot.db", + }, + }, + }, + }, + }, + } + + r := newReconciler(backup, hcp, hc, sts, credSecret, completedJob, pod) + ctx := context.Background() + + result, err := r.Reconcile(ctx, ctrl.Request{ + NamespacedName: types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, + }) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(result).To(Equal(ctrl.Result{})) + + updated := &hyperv1.HCPEtcdBackup{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, updated)).To(Succeed()) + g.Expect(updated.Status.Conditions[0].Reason).To(Equal(hyperv1.BackupSucceededReason)) + g.Expect(updated.Status.SnapshotURL).To(Equal("s3://bucket/snapshot.db")) + }) + + t.Run("When backup has an existing failed Job it should set BackupFailed via Reconcile", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + hcp := newHostedControlPlane() + sts := newEtcdStatefulSet(3, 3) + credSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "aws-creds", Namespace: testHONamespace}, + Data: map[string][]byte{"credentials": []byte("fake-creds")}, + } + + failedJob := &batchv1.Job{ + ObjectMeta: metav1.ObjectMeta{ + Name: "etcd-backup-" + testBackupName, + Namespace: testHONamespace, + Labels: map[string]string{ + LabelApp: LabelName, + LabelBackupName: testBackupName, + LabelHCPNamespace: testHCPNamespace, + }, + }, + Spec: batchv1.JobSpec{ + Template: corev1.PodTemplateSpec{ + Spec: corev1.PodSpec{ + Containers: []corev1.Container{{Name: "test", Image: "test:latest"}}, + RestartPolicy: corev1.RestartPolicyNever, + }, + }, + }, + Status: batchv1.JobStatus{ + Conditions: []batchv1.JobCondition{ + {Type: batchv1.JobFailed, Status: corev1.ConditionTrue, Message: "BackoffLimitExceeded"}, + }, + }, + } + + r := newReconciler(backup, hcp, sts, credSecret, failedJob) + ctx := context.Background() + + result, err := r.Reconcile(ctx, ctrl.Request{ + NamespacedName: types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, + }) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(result).To(Equal(ctrl.Result{})) + + updated := &hyperv1.HCPEtcdBackup{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, updated)).To(Succeed()) + g.Expect(updated.Status.Conditions[0].Reason).To(Equal(hyperv1.BackupFailedReason)) + g.Expect(updated.Status.Conditions[0].Message).To(ContainSubstring("BackoffLimitExceeded")) + }) +} + +func TestReconcileHappyPathWithSTS(t *testing.T) { + t.Run("When AWS STS credentials it should create Job with projected token via full Reconcile", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + hcp := newHostedControlPlane() + sts := newEtcdStatefulSet(3, 3) + pullSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: pullSecretName, Namespace: testHCPNamespace}, + Data: map[string][]byte{corev1.DockerConfigJsonKey: []byte(`{"auths":{}}`)}, + } + credSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "aws-creds", Namespace: testHONamespace}, + Data: map[string][]byte{"credentials": []byte("arn:aws:iam::123456789012:role/etcd-backup")}, + } + r := newReconciler(backup, hcp, sts, pullSecret, credSecret) + ctx := context.Background() + + result, err := r.Reconcile(ctx, ctrl.Request{ + NamespacedName: types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, + }) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(result.RequeueAfter).To(Equal(requeueInterval)) + + jobList := &batchv1.JobList{} + g.Expect(r.List(ctx, jobList, client.InNamespace(testHONamespace))).To(Succeed()) + g.Expect(jobList.Items).To(HaveLen(1)) + + podSpec := jobList.Items[0].Spec.Template.Spec + // Should have projected token volume, not credentials + var hasToken, hasCreds bool + for _, v := range podSpec.Volumes { + if v.Name == volumeAWSIAMToken { + hasToken = true + } + if v.Name == volumeCredentials { + hasCreds = true + } + } + g.Expect(hasToken).To(BeTrue()) + g.Expect(hasCreds).To(BeFalse()) + + upload := podSpec.Containers[0] + g.Expect(upload.Env).To(ContainElement(corev1.EnvVar{ + Name: "AWS_ROLE_ARN", + Value: "arn:aws:iam::123456789012:role/etcd-backup", + })) + }) +} + +func TestReconcileHappyPathWithAzureWI(t *testing.T) { + t.Run("When Azure Workload Identity credentials it should annotate SA and create Job via full Reconcile", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := &hyperv1.HCPEtcdBackup{ + ObjectMeta: metav1.ObjectMeta{Name: testBackupName, Namespace: testHCPNamespace}, + Spec: hyperv1.HCPEtcdBackupSpec{ + Storage: hyperv1.HCPEtcdBackupStorage{ + StorageType: hyperv1.AzureBlobBackupStorage, + AzureBlob: hyperv1.HCPEtcdBackupAzureBlob{ + Container: "my-container", + StorageAccount: "mystorageaccount", + KeyPrefix: "backups/test", + Credentials: hyperv1.SecretReference{Name: "azure-creds"}, + }, + }, + }, + } + hcp := newHostedControlPlane() + sts := newEtcdStatefulSet(3, 3) + pullSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: pullSecretName, Namespace: testHCPNamespace}, + Data: map[string][]byte{corev1.DockerConfigJsonKey: []byte(`{"auths":{}}`)}, + } + credSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{Name: "azure-creds", Namespace: testHONamespace}, + Data: map[string][]byte{ + "cloud": []byte("AZURE_CLIENT_ID=my-client-id\nAZURE_TENANT_ID=my-tenant\n"), + }, + } + r := newReconciler(backup, hcp, sts, pullSecret, credSecret) + ctx := context.Background() + + result, err := r.Reconcile(ctx, ctrl.Request{ + NamespacedName: types.NamespacedName{Name: testBackupName, Namespace: testHCPNamespace}, + }) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(result.RequeueAfter).To(Equal(requeueInterval)) + + // Verify SA was annotated with the client ID + sa := &corev1.ServiceAccount{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: jobServiceAccountName, Namespace: testHONamespace}, sa)).To(Succeed()) + g.Expect(sa.Annotations["azure.workload.identity/client-id"]).To(Equal("my-client-id")) + + // Verify Job has WI pod label + jobList := &batchv1.JobList{} + g.Expect(r.List(ctx, jobList, client.InNamespace(testHONamespace))).To(Succeed()) + g.Expect(jobList.Items).To(HaveLen(1)) + g.Expect(jobList.Items[0].Spec.Template.Labels["azure.workload.identity/use"]).To(Equal("true")) + + // Verify no credentials volume + for _, v := range jobList.Items[0].Spec.Template.Spec.Volumes { + g.Expect(v.Name).ToNot(Equal(volumeCredentials)) + } + }) +} + +func TestEnsureRBACIdempotency(t *testing.T) { + t.Run("When ensureRBAC is called twice it should not error", func(t *testing.T) { + g := NewGomegaWithT(t) + backup := newHCPEtcdBackup() + r := newReconciler(backup) + ctx := context.Background() + + g.Expect(r.ensureRBAC(ctx, backup)).To(Succeed()) + g.Expect(r.ensureRBAC(ctx, backup)).To(Succeed()) + + role := &rbacv1.Role{} + g.Expect(r.Get(ctx, types.NamespacedName{Name: RBACName, Namespace: testHCPNamespace}, role)).To(Succeed()) + g.Expect(role.Rules).To(HaveLen(2)) + }) +} + +func TestEnforceRetentionDeleteRace(t *testing.T) { + t.Run("When an old backup was already deleted by another controller it should skip NotFound and succeed", func(t *testing.T) { + g := NewGomegaWithT(t) + + baseTime := time.Now().Add(-7 * time.Hour) + var backups []client.Object + for i := range 7 { + b := &hyperv1.HCPEtcdBackup{ + ObjectMeta: metav1.ObjectMeta{ + Name: fmt.Sprintf("backup-%d", i), + Namespace: testHCPNamespace, + CreationTimestamp: metav1.NewTime(baseTime.Add(time.Duration(i) * time.Hour)), + }, + Status: hyperv1.HCPEtcdBackupStatus{ + Conditions: []metav1.Condition{ + { + Type: string(hyperv1.BackupCompleted), + Status: metav1.ConditionTrue, + Reason: hyperv1.BackupSucceededReason, + }, + }, + }, + } + backups = append(backups, b) + } + + deleteCount := 0 + s := newScheme() + fakeClient := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(backups...). + WithStatusSubresource(&hyperv1.HCPEtcdBackup{}). + WithInterceptorFuncs(interceptor.Funcs{ + Delete: func(ctx context.Context, cl client.WithWatch, obj client.Object, opts ...client.DeleteOption) error { + deleteCount++ + if deleteCount == 1 { + // Simulate race: backup-0 already deleted by another controller + return apierrors.NewNotFound( + schema.GroupResource{Group: "hypershift.openshift.io", Resource: "hcpetcdbackups"}, + obj.GetName(), + ) + } + return cl.Delete(ctx, obj, opts...) + }, + }). + Build() + + r := &HCPEtcdBackupReconciler{ + Client: fakeClient, + OperatorNamespace: testHONamespace, + MaxBackupCount: 5, + } + + err := r.enforceRetention(context.Background(), testHCPNamespace) + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(deleteCount).To(Equal(2)) + }) +} + func TestGetSnapshotURLFromPod(t *testing.T) { tests := []struct { name string