Migrate Rancher test cases to Robot #2568
khushboo-rancher wants to merge 1 commit into harvester:main from
Signed-off-by: Khushboo <fnu.khushboo@suse.com>
Pull request overview
This PR migrates Rancher integration coverage from pytest into the Robot Framework test suite, adding end-to-end workflows for importing Harvester into Rancher, provisioning RKE2 guest clusters, deploying/validating workloads (including Traefik-based ingress), scaling, and performing an RKE2 upgrade scenario.
Changes:
- Added a Robot test suite for Rancher↔Harvester integration (single/multi-node clusters, workloads, scale up/down, upgrade, cleanup).
- Introduced a Rancher integration library layer (Base + CRD and REST implementations) and Robot keyword wrappers/resources to drive Rancher APIs.
- Updated global defaults (e.g., storage class) and added Rancher/RKE2-related variables/constants used by the new suites.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| harvester_robot_tests/tests/resilient/vm_node_failure.robot | Adds resilient/negative HA VM node-failure scenarios (reboot/power-off) in Robot. |
| harvester_robot_tests/tests/regression/test_rancher_integration.robot | New Rancher integration regression suite covering create/scale/upgrade/cleanup flows. |
| harvester_robot_tests/libs/rancher/rest.py | REST-based implementation of Rancher + Harvester API operations used by Robot keywords. |
| harvester_robot_tests/libs/rancher/crd.py | CRD/kubectl-based implementation of Rancher operations (alternative strategy to REST). |
| harvester_robot_tests/libs/rancher/rancher.py | Delegator selecting CRD vs REST implementation via env strategy. |
| harvester_robot_tests/libs/rancher/base.py | Defines the Rancher integration interface shared by CRD/REST implementations. |
| harvester_robot_tests/libs/rancher/__init__.py | Exposes the Rancher component package entrypoint. |
| harvester_robot_tests/libs/keywords/rancher_keywords.py | Python keyword wrapper used by Robot to call Rancher component operations. |
| harvester_robot_tests/libs/constant.py | Updates defaults and adds Rancher/RKE2-related constants (user data, timeouts, etc.). |
| harvester_robot_tests/keywords/variables.resource | Adds Rancher endpoint/credentials, RKE2 version, ingress controller, and IP pool variables. |
| harvester_robot_tests/keywords/rancher.resource | Large Robot resource providing reusable Rancher integration keywords and suite setup/teardown. |
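The layering described in the table (a `base.py` interface, CRD and REST implementations, and a `rancher.py` delegator that picks one via an environment strategy) can be pictured roughly as follows. This is only a sketch: the environment variable name `RANCHER_STRATEGY`, its default, and the placeholder return values are assumptions for illustration, not taken from the PR.

```python
import os
from abc import ABC, abstractmethod


class Base(ABC):
    """Interface shared by both strategies (only one method shown)."""

    @abstractmethod
    def get_rke2_version(self, target_version, rancher_endpoint=None):
        """Return the RKE2 version matching a prefix such as 'v1.28'."""


class CRD(Base):
    def get_rke2_version(self, target_version, rancher_endpoint=None):
        # Placeholder: a real implementation would go through kubectl/CRDs.
        return f"crd:{target_version}"


class Rest(Base):
    def get_rke2_version(self, target_version, rancher_endpoint=None):
        # Placeholder: a real implementation would call the Rancher REST API.
        return f"rest:{target_version}"


class Rancher(Base):
    """Delegator: picks one implementation and forwards every call to it."""

    _STRATEGIES = {"crd": CRD, "rest": Rest}

    def __init__(self, strategy=None):
        # HYPOTHETICAL variable name and default; the PR's names may differ.
        name = (strategy or os.environ.get("RANCHER_STRATEGY", "crd")).lower()
        if name not in self._STRATEGIES:
            raise ValueError(f"Unknown Rancher strategy: {name!r}")
        self._impl = self._STRATEGIES[name]()

    def get_rke2_version(self, target_version, rancher_endpoint=None):
        return self._impl.get_rke2_version(target_version, rancher_endpoint)
```

Because the delegator subclasses the same abstract base, a signature mismatch between the interface and an implementation is easier to spot in review.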
    cluster = self.wait_for_harvester_ready(cluster_name, timeout)
    cluster_id = cluster.get("status", {}).get("clusterName", "")

    # Get registration URL
    manifest_url = self.get_cluster_registration_url(cluster_id)
import_harvester_to_rancher calls wait_for_harvester_ready() immediately after creating the Rancher cluster entry, but the Harvester cluster won't become "ready" until after the registration manifest URL has been set. This also calls get_cluster_registration_url(cluster_id) without the required rancher_endpoint (and optional timeout), which will raise a TypeError. Update the flow to: wait_for_cluster_id -> get_cluster_registration_url(cluster_id, rancher_endpoint, timeout) -> set_cluster_registration_url -> wait_for_harvester_ready.
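The proposed ordering can be sketched as below. The `client` object stands in for the REST implementation; the exact signatures of `wait_for_cluster_id` and `set_cluster_registration_url`, the default timeout, and the `RecordingClient` stub with its return values are assumptions made for illustration.

```python
def import_harvester_to_rancher(client, cluster_name, rancher_endpoint, timeout=600):
    """Run the import steps in the order the review comment proposes."""
    # 1. Wait until Rancher has assigned an ID to the new cluster entry.
    cluster_id = client.wait_for_cluster_id(cluster_name, timeout)
    # 2. Fetch the registration manifest URL, passing the endpoint explicitly.
    manifest_url = client.get_cluster_registration_url(
        cluster_id, rancher_endpoint, timeout
    )
    # 3. Point Harvester at that URL so it can register itself.
    client.set_cluster_registration_url(manifest_url)
    # 4. Only now can the cluster be expected to become ready.
    client.wait_for_harvester_ready(cluster_name, timeout)
    return cluster_id


class RecordingClient:
    """Stand-in client that only records call order (illustration only)."""

    def __init__(self):
        self.calls = []

    def wait_for_cluster_id(self, cluster_name, timeout):
        self.calls.append("wait_for_cluster_id")
        return "c-m-example"  # hypothetical cluster ID

    def get_cluster_registration_url(self, cluster_id, rancher_endpoint, timeout):
        self.calls.append("get_cluster_registration_url")
        return f"{rancher_endpoint}/v3/import/example_{cluster_id}.yaml"

    def set_cluster_registration_url(self, url):
        self.calls.append("set_cluster_registration_url")

    def wait_for_harvester_ready(self, cluster_name, timeout):
        self.calls.append("wait_for_harvester_ready")
```

Running the function against `RecordingClient` is a cheap way to assert the ordering in a unit test before exercising a real Rancher instance.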
    if self.rancher_token and self.rancher_session:
        return  # Already authenticated

    if not rancher_endpoint:
        rancher_endpoint = self.rancher_endpoint
_authenticate_rancher() accepts a rancher_endpoint override but never persists it to self.rancher_endpoint. Callers like get_cluster_registration_url(..., rancher_endpoint=...) will authenticate against the override but subsequent _rancher_request() calls still build URLs from self.rancher_endpoint (possibly empty), leading to invalid requests. Consider setting self.rancher_endpoint = rancher_endpoint when an override is provided (and/or passing the endpoint through to _rancher_request).
Suggested change:
    - if self.rancher_token and self.rancher_session:
    -     return  # Already authenticated
    - if not rancher_endpoint:
    -     rancher_endpoint = self.rancher_endpoint
    + if rancher_endpoint:
    +     self.rancher_endpoint = rancher_endpoint
    + rancher_endpoint = self.rancher_endpoint
    + if self.rancher_token and self.rancher_session:
    +     return  # Already authenticated
    try:
        if method.upper() == "GET":
            response = self.rancher_session.get(url)
        elif method.upper() == "POST":
            response = self.rancher_session.post(url, json=data)
_rancher_request() issues HTTP requests without any timeout. If Rancher becomes slow/unreachable, the test run can hang indefinitely. Use a bounded timeout (e.g., DEFAULT_TIMEOUT/DEFAULT_TIMEOUT_LONG or a dedicated constant) for all methods and make it configurable if needed.
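One possible shape for a bounded request helper is sketched below. It assumes `session` behaves like a `requests.Session` (whose `request()` method accepts `json=` and `timeout=`); the function name, the `DEFAULT_TIMEOUT` value, and the return convention are illustrative assumptions rather than the PR's code.

```python
DEFAULT_TIMEOUT = 30  # seconds; hypothetical value, tune for the environment


def rancher_request(session, base_url, method, path, data=None,
                    timeout=DEFAULT_TIMEOUT):
    """Send one request with an explicit timeout and return (status, body)."""
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    # A single call site means every HTTP verb gets the same timeout.
    response = session.request(method.upper(), url, json=data, timeout=timeout)
    try:
        body = response.json()
    except ValueError:
        # Non-JSON responses fall back to the raw text.
        body = response.text
    return response.status_code, body
```

Routing every verb through `session.request` avoids a per-method `if/elif` chain where one branch could silently miss the timeout.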
    # Update the value field with the JSON string
    data["value"] = value

    # Use the API's internal _put method directly
    path = f"apis/{self.harvester_api.API_VERSION}/settings/cluster-registration-url"
    resp = self.harvester_api._put(path, json=data)

    if resp.status_code not in [200, 201]:
        try:
            error_data = resp.json()
        except Exception:
            error_data = resp.text
        raise Exception(
            f"Failed to set cluster-registration-url: "
            f"{resp.status_code}, {error_data}"
set_cluster_registration_url() calls self.harvester_api._put(...), but the Harvester API client created by create_harvester_api_client() exposes put(...) and settings.update(...)—there is no _put. This will raise AttributeError at runtime. Use self.harvester_api.settings.update("cluster-registration-url", value) (or self.harvester_api.put(...)) and validate the returned status code.
Suggested change:
    - # Update the value field with the JSON string
    - data["value"] = value
    - # Use the API's internal _put method directly
    - path = f"apis/{self.harvester_api.API_VERSION}/settings/cluster-registration-url"
    - resp = self.harvester_api._put(path, json=data)
    - if resp.status_code not in [200, 201]:
    -     try:
    -         error_data = resp.json()
    -     except Exception:
    -         error_data = resp.text
    -     raise Exception(
    -         f"Failed to set cluster-registration-url: "
    -         f"{resp.status_code}, {error_data}"
    + # Update the setting value using the public settings API
    + code, data = self.harvester_api.settings.update("cluster-registration-url", value)
    + if code not in [200, 201]:
    +     raise Exception(
    +         f"Failed to set cluster-registration-url: "
    +         f"{code}, {data}"
    code, data = self._rancher_request("POST", "v3/cloudcredentials", payload)

    if code not in [200, 201]:
Cloud credential endpoints use inconsistent casing: create uses v3/cloudcredentials, while get/delete use v3/cloudCredentials/.... Rancher API paths are typically case-sensitive, so this can cause 404s when reading/deleting a credential created by this client. Standardize the path casing across create/get/delete.
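A small path helper is one way to keep create/get/delete on the same casing. Which casing Rancher actually expects should be verified against the target Rancher version; the camelCase constant below is an assumption, as is the helper name and the example credential ID.

```python
# ASSUMPTION: the casing here is illustrative; confirm it against the Rancher API.
CLOUD_CREDENTIALS_PATH = "v3/cloudCredentials"


def cloud_credential_path(credential_id=None):
    """Return the collection path, or the item path when an ID is given."""
    if credential_id is None:
        return CLOUD_CREDENTIALS_PATH
    return f"{CLOUD_CREDENTIALS_PATH}/{credential_id}"
```

With this in place, create would POST to `cloud_credential_path()` and get/delete would use `cloud_credential_path(credential_id)`, so the casing lives in exactly one constant.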
    logging("Successfully set cluster-registration-url")
Duplicate log line: "Successfully set cluster-registration-url" is emitted twice, which adds noise to test logs and makes troubleshooting harder. Remove the extra logging call.
Suggested change:
    - logging("Successfully set cluster-registration-url")
        else:
            raise Exception(f"Failed to delete cloud credential: {stderr}")

    def create_rke2_cluster(self, name, cloud_provider_config_id, hostname_prefix,
create_rke2_cluster(..., hostname_prefix, ...) includes a hostname_prefix parameter but it is never used when constructing the Cluster manifest. Either apply it to the spec (if Rancher/Harvester supports naming the machines/VMs with a prefix) or remove it from the API to avoid confusion.
Suggested change:
    - def create_rke2_cluster(self, name, cloud_provider_config_id, hostname_prefix,
    + def create_rke2_cluster(self, name, cloud_provider_config_id,
    def get_rke2_version(self, target_version):
        """
        Get RKE2 version from Rancher that matches target version.

        Args:
            target_version: Target version prefix (e.g. 'v1.28', 'v1.29')
The abstract interface for get_rke2_version doesn't accept rancher_endpoint, but both implementations and all callers in rancher_keywords.py/rancher.py require passing an endpoint. Align the Base method signature with actual usage (e.g., get_rke2_version(self, target_version, rancher_endpoint=None)) to keep the interface consistent and prevent future implementation drift.
Suggested change:
    - def get_rke2_version(self, target_version):
    -     """
    -     Get RKE2 version from Rancher that matches target version.
    -     Args:
    -         target_version: Target version prefix (e.g. 'v1.28', 'v1.29')
    + def get_rke2_version(self, target_version, rancher_endpoint=None):
    +     """
    +     Get RKE2 version from Rancher that matches target version.
    +     Args:
    +         target_version: Target version prefix (e.g. 'v1.28', 'v1.29')
    +         rancher_endpoint: Rancher server endpoint
    ${RANCHER_ENDPOINT}    %{RANCHER_ENDPOINT=https://10.115.1.128}
    ${RANCHER_USERNAME}    %{RANCHER_USERNAME=admin}
    ${RANCHER_PASSWORD}    %{RANCHER_PASSWORD=password1234}
RANCHER_ENDPOINT defaults to a concrete internal IP (https://10.115.1.128). This makes the suite environment-specific and can accidentally target the wrong Rancher instance if the variable isn't overridden. Prefer leaving it empty or using a placeholder domain so CI/users are forced to configure it explicitly.
Suggested change:
    - ${RANCHER_ENDPOINT}    %{RANCHER_ENDPOINT=https://10.115.1.128}
    - ${RANCHER_USERNAME}    %{RANCHER_USERNAME=admin}
    - ${RANCHER_PASSWORD}    %{RANCHER_PASSWORD=password1234}
    + ${RANCHER_ENDPOINT}    %{RANCHER_ENDPOINT=}
    + ${RANCHER_USERNAME}    %{RANCHER_USERNAME=admin}
    + ${RANCHER_PASSWORD}    %{RANCHER_PASSWORD=}
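With an empty default, the suite needs to fail fast when the endpoint is not configured, rather than building requests against an empty base URL. A minimal sketch of such a check on the Python side follows; the function name and error messages are hypothetical, and only the `RANCHER_ENDPOINT` variable name comes from the PR.

```python
import os
from urllib.parse import urlparse


def require_rancher_endpoint(env=None):
    """Return the configured endpoint, or raise if it is missing or malformed."""
    env = os.environ if env is None else env
    endpoint = env.get("RANCHER_ENDPOINT", "").strip()
    if not endpoint:
        raise ValueError("RANCHER_ENDPOINT is not set; configure it explicitly")
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"RANCHER_ENDPOINT is not a valid URL: {endpoint!r}")
    # Normalize so later path joins do not produce a double slash.
    return endpoint.rstrip("/")
```

Calling this once from suite setup would turn a misconfigured run into an immediate, readable failure.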
    DEFAULT_RKE2_USER_DATA = """#cloud-config
    password: password
    chpasswd:
      expire: false
    ssh_pwauth: true
    package_update: true
DEFAULT_RKE2_USER_DATA hard-codes a plaintext password and enables SSH password auth by default. Even for test suites, this is a risky default that can leak into shared environments. Consider sourcing credentials from environment/Robot variables (or defaulting to SSH key-only) and documenting the expected security posture.
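One way to act on this is to build the user data at runtime, preferring an SSH key and only enabling password auth when a password is supplied explicitly. The sketch below is an assumption-laden illustration: the environment variable names `RKE2_SSH_PUBLIC_KEY` and `RKE2_VM_PASSWORD` and the function name are hypothetical, while the cloud-config keys mirror those in the existing constant plus the standard `ssh_authorized_keys` key.

```python
import json
import os


def build_rke2_user_data(env=None):
    """Build cloud-config user data without a hard-coded credential."""
    env = os.environ if env is None else env
    ssh_key = env.get("RKE2_SSH_PUBLIC_KEY", "").strip()  # hypothetical name
    password = env.get("RKE2_VM_PASSWORD", "")            # hypothetical name

    lines = ["#cloud-config"]
    if ssh_key:
        # Key-only access: password login over SSH stays disabled.
        lines += [
            "ssh_pwauth: false",
            "ssh_authorized_keys:",
            f"  - {json.dumps(ssh_key)}",
        ]
    elif password:
        # Explicit opt-in; json.dumps yields a safely quoted YAML scalar.
        lines += [
            f"password: {json.dumps(password)}",
            "chpasswd:",
            "  expire: false",
            "ssh_pwauth: true",
        ]
    else:
        raise ValueError("Set RKE2_SSH_PUBLIC_KEY or RKE2_VM_PASSWORD")
    lines.append("package_update: true")
    return "\n".join(lines) + "\n"
```

Raising when neither variable is set keeps the insecure path from becoming a silent default.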
Could you attach the Robot test report for reference?
Which issue(s) this PR fixes:
Issue #2488