Skip to content

feat(MTV-595): add warm migration test for VMs with XFS v4 filesystem#503

Open
Chenli-Hu wants to merge 1 commit into
RedHatQE:mainfrom
Chenli-Hu:xfs
Open

feat(MTV-595): add warm migration test for VMs with XFS v4 filesystem#503
Chenli-Hu wants to merge 1 commit into
RedHatQE:mainfrom
Chenli-Hu:xfs

Conversation

@Chenli-Hu
Copy link
Copy Markdown
Contributor

@Chenli-Hu Chenli-Hu commented May 19, 2026

Summary by CodeRabbit

  • New Features

    • XFS filesystem compatibility for warm migrations, including optional post-migration verification on migrated VMs.
    • Ability to specify a custom pod network in a migration plan to prioritize its mapping during migration.
  • Bug Fixes

    • Improved guest-network IP detection compatibility with older nmcli versions.
  • Tests

    • Added an XFS warm-migration test case and split parametrization for Windows vs. XFS; test config updated.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 24860194-4681-41ea-bbc3-957e6c9feb9b

📥 Commits

Reviewing files that changed from the base of the PR and between 63804a1 and 2ddfb0b.

📒 Files selected for processing (6)
  • tests/tests_config/config.py
  • tests/warm/test_warm_migration_comprehensive.py
  • utilities/mtv_migration.py
  • utilities/post_migration.py
  • utilities/utils.py
  • utilities/vmware_guest_operations.py

Walkthrough

Adds XFS compatibility and custom pod network support to warm migration tooling and tests: plan creation accepts xfs_compatibility, network mapping can prioritize a custom pod network, tests add an XFS scenario and pass the custom network, and post-migration checks validate XFS metadata on migrated Linux VMs.

Changes

Warm Migration XFS Support

Layer / File(s) Summary
Network reordering for custom pod network
utilities/utils.py
gen_network_map_list accepts custom_pod_network and, when set and pod_only is false, reorders VM networks so the named network is mapped to the pod destination first. Adds _reorder_networks_by_pod_network that finds the named network or raises ValueError listing available names.
XFS and custom network parameter wiring
utilities/mtv_migration.py
create_plan_resource gains xfs_compatibility: bool = False and conditionally injects it into the Plan CR. get_network_migration_map accepts custom_pod_network and forwards it to gen_network_map_list.
XFS test configuration and parametrization
tests/tests_config/config.py, tests/warm/test_warm_migration_comprehensive.py
Add "test_warm_migration_xfs" to tests_params enabling XFS and custom_pod_network. Split comprehensive warm migration parametrization into distinct Windows and XFS cases and add XFS to the test docstring feature list.
XFS test implementation
tests/warm/test_warm_migration_comprehensive.py
test_create_networkmap reads optional custom_pod_network from prepared_plan and passes it to get_network_migration_map. test_create_plan uses .get(...) with defaults for optional plan fields (including preserve_static_ips, xfs_compatibility) and passes target_labels only when present.
Post-migration XFS validation
utilities/post_migration.py
Add check_vm_command_output() to run SSH commands and assert expected stdout. When plan["xfs_compatibility"] is enabled and SSH connections exist, run xfs_info /sdb1 and expect crc=0, recording per-VM failures.
nmcli compatibility tweak
utilities/vmware_guest_operations.py
Embedded guest nmcli script now uses nmcli -f (instead of -g) to extract fields for compatibility with older nmcli versions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels: verified, lgtm-coderabbitai[bot]

Suggested reviewers:

  • myakove
  • krcmarik
  • solenoci
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: adding a warm migration test for VMs with XFS v4 filesystem, which is the primary focus across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@redhat-qe-bot
Copy link
Copy Markdown

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: Disabled for this repository
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: All label categories are enabled (default configuration)

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /automerge - Enable automatic merging when all requirements are met (maintainers and approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest conventional-title - Validate commit message format
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. Status Checks: All required status checks must pass
  3. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)
  4. Verified: PR must be marked as verified

📊 Review Process

Approvers and Reviewers

Approvers:

  • myakove
  • solenoci

Reviewers:

  • krcmarik
  • myakove
  • solenoci
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
  • automerge
AI Features
  • Conventional Title: Mode: fix (claude/claude-opus-4-6[1m])
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6[1m])
  • Test Oracle: Triggers: approved (cursor/gpt-5.4-xhigh-fast); /test-oracle can be used anytime

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is removed on new commits unless the push is detected as a clean rebase
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

@redhat-qe-bot redhat-qe-bot requested a review from solenoci May 19, 2026 00:44
@redhat-qe-bot redhat-qe-bot requested a review from myakove May 19, 2026 00:44
@redhat-qe-bot redhat-qe-bot requested a review from krcmarik May 19, 2026 00:44
@redhat-qe-bot redhat-qe-bot changed the title Warm migrate VM with XFS v4 from vSphere feat(MTV-595): add warm migration test for VMs with XFS v4 filesystem May 19, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@utilities/utils.py`:
- Around line 305-315: The code currently mutates the input list via
networks.remove(...) and networks.insert(...), which can leak order across
callers; instead construct and return a new list so the original networks
argument is not modified: after finding pod_net (the result of next(...)), build
a new list with pod_net first followed by all other entries filtered out (use
networks and custom_pod_network to filter), and return that new list; keep the
existing ValueError behavior when pod_net is None and reference the same symbols
(pod_net, networks, custom_pod_network) so the change is localized.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 02b43eca-2a31-467e-b8e6-358366e4a560

📥 Commits

Reviewing files that changed from the base of the PR and between f82917b and 8ed18dc.

📒 Files selected for processing (4)
  • tests/tests_config/config.py
  • tests/warm/test_warm_migration_comprehensive.py
  • utilities/mtv_migration.py
  • utilities/utils.py

Comment thread utilities/utils.py Outdated
@Chenli-Hu
Copy link
Copy Markdown
Contributor Author

Chenli-Hu commented May 19, 2026

Hi @krcmarik , please help to review, thank you!
I tested in local, both 1) and 2) are passed on MTV 2.11.6.
There is a bug: MTV-5419 on 2.12.0, without that fix the warm migration will failed on MTV 2.12.0. => Tested on latest 2.12.0, the bug is fixed, both 1) and 2) passed on 2.12.0-40.

1)Migrate cloned xfs VM only
#uv run pytest -vv -m "tier0" -k "comprehensive-warm-xfs" --tc=source_provider:"vcenter" --tc=storage_class:"nfs-csi" ....

2)Migrate both cloned mtv-win2022-ip-3disks and cloned mtv-feature-rhel7-xfs
#uv run pytest -vv -m "tier0" -k "comprehensive-warm" --tc=source_provider:"vcenter" --tc=storage_class:"nfs-csi"....


tests/warm/test_warm_migration_comprehensive.py::TestWarmMigrationComprehensive::test_create_storagemap[comprehensive-warm-xfs] PASSED
_ 1 of 5 completed, 1 Pass, 0 Fail, 0 Skip, 0 XPass, 0 XFail, 0 Error, 0 ReRun _

tests/warm/test_warm_migration_comprehensive.py::TestWarmMigrationComprehensive::test_create_networkmap[comprehensive-warm-xfs] PASSED
_ 2 of 5 completed, 2 Pass, 0 Fail, 0 Skip, 0 XPass, 0 XFail, 0 Error, 0 ReRun _

tests/warm/test_warm_migration_comprehensive.py::TestWarmMigrationComprehensive::test_create_plan[comprehensive-warm-xfs] PASSED
_ 3 of 5 completed, 3 Pass, 0 Fail, 0 Skip, 0 XPass, 0 XFail, 0 Error, 0 ReRun _

tests/warm/test_warm_migration_comprehensive.py::TestWarmMigrationComprehensive::test_migrate_vms[comprehensive-warm-xfs] PASSED
_ 4 of 5 completed, 4 Pass, 0 Fail, 0 Skip, 0 XPass, 0 XFail, 0 Error, 0 ReRun _

tests/warm/test_warm_migration_comprehensive.py::TestWarmMigrationComprehensive::test_check_vms[comprehensive-warm-xfs] PASSED
_ 5 of 5 completed, 5 Pass, 0 Fail, 0 Skip, 0 XPass, 0 XFail, 0 Error, 0 ReRun _
======== 5 passed, 211 deselected, 3021 warnings in 1589.90s (0:26:29) =========

@Chenli-Hu
Copy link
Copy Markdown
Contributor Author

Chenli-Hu commented May 19, 2026

/verified

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/warm/test_warm_migration_comprehensive.py`:
- Line 193: The test currently enables preserve_static_ips by default via
prepared_plan.get("preserve_static_ips", True), which implicitly turns the
feature on for plans that never declared it; change this to read the flag
without a default so missing keys are treated as absent (e.g., use
prepared_plan["preserve_static_ips"] or otherwise pass the explicit value from
the test fixture) to avoid silently enabling static-IP validation paths—update
the call site that sets preserve_static_ips to stop using .get(..., True) and
instead require an explicit value from prepared_plan.

In `@utilities/mtv_migration.py`:
- Around line 146-147: You've added a new positional parameter xfs_compatibility
before target_power_state which breaks existing positional callers; change the
function signature to keep existing positional binding by either moving
xfs_compatibility after target_power_state or make it keyword-only (e.g., put
"*, xfs_compatibility: bool = False" after target_power_state) so old callers
still bind target_power_state correctly, and then update any call sites/tests
that intentionally pass xfs_compatibility positionally to use the keyword form.

In `@utilities/post_migration.py`:
- Around line 1257-1276: The code resolves destination_vm_name but still uses
vm_name when creating the SSH connection and logging; update usages to target
the migrated VM by replacing vm_name with destination_vm_name in
vm_ssh_connections.create and in the LOGGER.info message (leave
get_ssh_credentials_from_provider_config(source_provider_data, source_vm_data)
and the win_os check as-is so credentials and OS checks still come from
source_vm_data). Ensure vm_ssh_connections.create(...) is called with
vm_name=destination_vm_name and corresponding log text reflects the destination
VM.

In `@utilities/utils.py`:
- Around line 258-259: The current call to _reorder_networks_by_pod_network can
raise a ValueError when custom_pod_network is missing even though pod_only=True
makes reordering unnecessary; update the condition around the call so it only
runs when custom_pod_network is truthy AND pod_only is False (i.e., guard the
call with "if custom_pod_network and not pod_only") to skip reordering in
pod_only mode; reference symbols: custom_pod_network, pod_only,
_reorder_networks_by_pod_network, networks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4da22c4c-e415-4cb9-abe3-75542d1da274

📥 Commits

Reviewing files that changed from the base of the PR and between 8ed18dc and 4a38083.

📒 Files selected for processing (5)
  • tests/tests_config/config.py
  • tests/warm/test_warm_migration_comprehensive.py
  • utilities/mtv_migration.py
  • utilities/post_migration.py
  • utilities/utils.py

Comment thread tests/warm/test_warm_migration_comprehensive.py Outdated
Comment thread utilities/mtv_migration.py Outdated
Comment thread utilities/post_migration.py
Comment thread utilities/utils.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@utilities/post_migration.py`:
- Around line 1475-1488: The XFS check calls check_vm_command_output whenever
plan.get("xfs_compatibility") is true, causing failures if the destination VM is
powered off; update the condition to also verify the destination VM is powered
on (e.g., vm.get("power_state") == "on" or an existing vm_powered_on flag) and
only run check_vm_command_output when vm_ssh_connections and
plan.get("xfs_compatibility") are true AND the VM is powered on, keeping the
existing try/except and res[vm_name].append(...) behavior for error reporting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9e1102a9-cf8a-4744-9f84-80712626103e

📥 Commits

Reviewing files that changed from the base of the PR and between 4a38083 and 9c5c0b4.

📒 Files selected for processing (6)
  • tests/tests_config/config.py
  • tests/warm/test_warm_migration_comprehensive.py
  • utilities/mtv_migration.py
  • utilities/post_migration.py
  • utilities/utils.py
  • utilities/vmware_guest_operations.py

Comment thread utilities/post_migration.py
@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented May 27, 2026

Code Review by Qodo

🐞 Bugs (5) 📘 Rule violations (4)

Grey Divider


Action required

1. source_vms_data defaulted to {} 📘 Rule violation ☼ Reliability
Description
check_vm_command_output() silently defaults missing plan["source_vms_data"][vm_name] to {},
even though it uses that data to infer Windows-vs-Linux and select SSH credentials, which can
mis-detect guest OS (e.g., treat Windows as Linux) and fail later for misleading reasons. This
violates the intended fail-fast behavior and can cause confusing downstream post-migration
validation failures.
Code

utilities/post_migration.py[R1320-1333]

Evidence
PR Compliance ID 9 requires rejecting invalid or missing required data early with a clear exception,
but the implementation uses plan.get("source_vms_data", {}).get(vm_name, {}), allowing absent
source_vms_data or missing VM entries to propagate into OS detection and credential selection
without an explicit failure. Because credential selection is driven by the win_os flag taken from
this optional/feature-specific structure (treated elsewhere as required only for specific features
like static-IP preservation), missing data can cause Windows guests to be treated as Linux and lead
to check failures for the wrong reason rather than a clear, immediate error at the point of use.

CLAUDE.md: Fail Fast: Do Not Allow Invalid None Values to Propagate
utilities/post_migration.py[1320-1333]
utilities/post_migration.py[39-81]
utilities/post_migration.py[1294-1336]
utilities/post_migration.py[1444-1454]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`check_vm_command_output()` currently uses `plan.get("source_vms_data", {}).get(vm_name, {})` to determine whether the guest is Windows and to choose Linux vs Windows SSH credentials, which masks missing `source_vms_data`/missing VM entries and can cause incorrect OS detection and misleading downstream failures during post-migration validation. Update the logic to fail fast with a clear exception when required VM metadata is missing, and/or stop using `source_vms_data` as the general source of OS/credential metadata.

## Issue Context
- This code path is used for post-migration validation and should comply with the fail-fast requirement (reject missing/invalid required data early with a clear, specific exception).
- `get_ssh_credentials_from_provider_config()` expects a reliable `source_vm_info["win_os"]` flag.
- `source_vms_data` is not guaranteed to exist for all scenarios/providers and is treated as required only for specific features (e.g., static-IP preservation), so it is not a reliable general source of OS metadata.
- `check_vms()` already has authoritative `source_vm` data available from `source_provider.vm_dict(...)`; prefer passing that into `check_vm_command_output()` (e.g., accept `source_vm_info: dict[str, Any]`) and use it for the Windows guard and credential selection. Optionally keep `source_vms_data` only for data that truly lives there, but do not use it for OS detection.

## Fix Focus Areas
- utilities/post_migration.py[1294-1361]
- utilities/post_migration.py[1320-1333]
- utilities/post_migration.py[1475-1494]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. No retries for SSH cmd 🐞 Bug ☼ Reliability
Description
check_vm_command_output() runs the SSH command exactly once with no retry/backoff, unlike existing
SSH checks in this file, so transient guest readiness/SSH instability post-migration can cause
intermittent false test failures.
Code

utilities/post_migration.py[R1337-1347]

Evidence
The codebase already implements retry/backoff for SSH readiness using TimeoutSampler, but the
newly added helper does not, increasing the chance of timing-based flake failures right after
migration.

utilities/post_migration.py[84-140]
utilities/post_migration.py[1294-1358]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`check_vm_command_output()` executes `executor.run_cmd(command)` once and fails immediately on non-zero rc or missing expected output. Post-migration guests commonly have short windows of instability (services still starting, udev settling, etc.), and this file already uses `TimeoutSampler` retry patterns for SSH connectivity.

### Issue Context
- `check_ssh_connectivity()` uses `TimeoutSampler` to avoid flakiness.
- `check_vm_command_output()` creates a new SSH connection and immediately executes the command without any retry loop.

### Fix Focus Areas
- utilities/post_migration.py[84-140]
- utilities/post_migration.py[1294-1361]

### Suggested fix
- Wrap the command execution + expected-output assertion in a `TimeoutSampler` retry loop (similar to `check_ssh_connectivity()`), e.g.:
 - retry for `wait_timeout=120` with `sleep=10`
 - retry on SSH exceptions and on `rc != 0`
 - (optionally) retry if `expected_output` not found, since filesystems/services may not be ready immediately
- Keep the final error message from the last attempt for debuggability.
- Alternatively, reuse an already-validated SSH session/connection if available to reduce repeated port-forward setup.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Defaults used in prepared_plan.get() 📘 Rule violation ≡ Correctness
Description
The test now reads project-controlled tests_params values via .get() with implicit
defaults/None, which can silently mask missing or misspelled required configuration keys and change
migration behavior without failing fast.
Code

tests/warm/test_warm_migration_comprehensive.py[R189-200]

Evidence
Rule 28 forbids using defaults for project-controlled config keys. The updated test_create_plan
passes multiple values from prepared_plan using .get() (including a default for
preserve_static_ips) which can silently substitute missing keys instead of failing fast.

CLAUDE.md: Configuration We Control Must Not Use Defaults: Access Required Keys Directly; Only Optional Feature Flags May Default
tests/warm/test_warm_migration_comprehensive.py[189-200]
tests/tests_config/config.py[524-575]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`prepared_plan` originates from `py_config["tests_params"][...]` (configuration we control). The test uses `.get()` (and `.get(..., default)`) for keys that should be explicitly present, which can hide misconfiguration and let invalid config propagate as `None`/defaults.

## Issue Context
Rule requires required project-controlled config keys to be accessed directly (e.g., `prepared_plan["target_power_state"]`) and forbids silently defaulting missing keys (except for optional feature flags).

## Fix Focus Areas
- tests/warm/test_warm_migration_comprehensive.py[189-200]
- tests/tests_config/config.py[524-575]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

4. Class parametrization shares class state 📘 Rule violation ☼ Reliability ⭐ New
Description
The class-level parametrization adds a second scenario (test_warm_migration_xfs) while the test
class stores migration resources on mutable class attributes, which can leak/overwrite state across
test cases and create order-dependent behavior. This violates the requirement to avoid shared
mutable state between tests.
Code

tests/warm/test_warm_migration_comprehensive.py[R40-43]

Evidence
A second class-level parameter set was added, causing the same test class to run for multiple
configurations. The class still uses mutable class attributes to store resources, which can be
overwritten between parametrized runs and introduces shared mutable state across test cases.

CLAUDE.md: Do Not Share Mutable State Between Tests
tests/warm/test_warm_migration_comprehensive.py[33-46]
tests/warm/test_warm_migration_comprehensive.py[92-201]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`TestWarmMigrationComprehensive` is now parametrized with two plan configurations, but it relies on mutable class attributes (`self.__class__.storage_map`, `network_map`, `plan_resource`) to pass state between tests. This creates cross-test coupling and can cause state from one parametrized run to overwrite/affect another.

## Issue Context
The compliance rule forbids sharing mutable state between tests; resources should be produced/consumed via fixtures (ideally class-scoped) or each test should be fully self-contained.

## Fix Focus Areas
- tests/warm/test_warm_migration_comprehensive.py[33-200]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


5. Wrong exception type used 🐞 Bug ⚙ Maintainability ⭐ New
Description
check_vm_command_output() raises GuestCommandError for SSH command failures, but GuestCommandError
is defined as VMware Guest Operations-specific, making failures misleading and potentially
mis-handled by callers that key off exception type.
Code

utilities/post_migration.py[R1351-1354]

Evidence
The exception class is explicitly documented as VMware Guest Operations-specific, but the new SSH
helper raises it for SSH command failures, creating a semantic mismatch.

exceptions/exceptions.py[122-127]
utilities/post_migration.py[1294-1354]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`utilities/post_migration.py:check_vm_command_output()` raises `GuestCommandError` when an SSH-executed command returns non-zero. However, `GuestCommandError` is defined/documented as applying to VMware Guest Operations failures, so using it for SSH conflates two different execution paths and makes troubleshooting/exception handling confusing.

### Issue Context
- `GuestCommandError` is documented as *VMware Guest Operations* related.
- `check_vm_command_output()` executes commands via SSH (`rrmngmnt` executor), not via Guest Operations.

### Fix Focus Areas
- Define a new exception for SSH/remote command execution (e.g., `SSHCommandError` / `RemoteCommandError`) **or** reuse an existing generic command-exec exception used elsewhere in the repo.
- Update `check_vm_command_output()` to raise the new/generic exception type instead of `GuestCommandError`.
- Ensure any callers/catch blocks (current and future) remain correct.

#### Fix Focus Areas (files/lines)
- exceptions/exceptions.py[122-130]
- utilities/post_migration.py[1294-1366]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. Hard-coded XFS device path 🐞 Bug ☼ Reliability
Description
When xfs_compatibility is enabled, check_vms() runs xfs_info against the fixed device /sdb1,
but the plan/test config does not specify which disk/partition is XFS; if the VM’s XFS filesystem is
on a different device name, the post-migration validation will fail even though migration succeeded.
Code

utilities/post_migration.py[R1475-1491]

Evidence
The test config enables xfs_compatibility but does not provide a device identifier; the
post-migration check then hard-codes /sdb1 and any resulting error is turned into a test failure
via the aggregated failed_checks path.

tests/tests_config/config.py[558-575]
utilities/post_migration.py[1475-1494]
utilities/post_migration.py[1592-1595]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The XFS compatibility post-migration check always executes `xfs_info /sdb1`. The VM config/plan does not define an XFS device path, so this check is brittle and can fail for correct migrations when the XFS filesystem is on a different device (e.g., `/dev/sda1`, `/dev/vda2`, `/dev/mapper/...`).

### Issue Context
- The new XFS verification runs inside `check_vms()` only when `plan.get('xfs_compatibility')` is true.
- Any exception is collected into `res[...]` and ultimately causes `pytest.fail(...)`, so a wrong device path makes the whole test fail.

### Fix Focus Areas
- utilities/post_migration.py[1294-1361]
- utilities/post_migration.py[1475-1494]
- tests/tests_config/config.py[558-575]

### Suggested fix
Implement one of the following (prefer dynamic discovery):
1) **Dynamic discovery**: SSH-run a command to find an XFS filesystem device (e.g., `findmnt -nr -t xfs -o SOURCE | head -n1` or `lsblk -nrpo NAME,FSTYPE | awk '$2=="xfs"{print $1; exit}'`) and then run `xfs_info` on that discovered device.
2) **Config-driven**: Add a plan config field like `xfs_device` (or `xfs_mountpoint`) under `test_warm_migration_xfs` and pass it into the verifier instead of hard-coding.
3) **Fallbacks**: Try a small set of common device patterns and fail with a clear error that includes `lsblk` output.

Also update logging/error messages to include the discovered device and full command output on failure.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (2)
7. Hardcoded namespaces in XFS config 📘 Rule violation ☼ Reliability
Description
The new XFS warm-migration test config hardcodes namespace values (vm_target_namespace pattern and
multus_namespace), rather than sourcing namespaces from fixtures, which can undermine parallel
isolation and policy consistency.
Code

tests/tests_config/config.py[R569-571]

Evidence
Rule 33 requires namespace values to be provided by fixtures and not hardcoded. The added
test_warm_migration_xfs configuration includes direct namespace strings (vm_target_namespace and
multus_namespace).

CLAUDE.md: Namespaces Must Be Provided by Fixtures (Never Hardcode)
tests/tests_config/config.py[558-575]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new `test_warm_migration_xfs` config sets namespace values directly in config (including a generated string and a fixed `default`). Compliance requires namespaces to be provided by fixtures rather than hardcoded in test logic/config.

## Issue Context
Namespace fixtures typically incorporate session UUID patterns to ensure uniqueness and safe parallel execution.

## Fix Focus Areas
- tests/tests_config/config.py[558-575]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


8. Pod network reorder mismatch 🐞 Bug ≡ Correctness
Description
_reorder_networks_by_pod_network() can only match networks by net['name'], but some inventories
(e.g., OpenShift) return a pod network mapping as {'type': 'pod'} without a name, making it
impossible to select/reorder the pod network via custom_pod_network and leading to
confusing/incorrect behavior when the pod network is not already first.
Code

utilities/utils.py[R305-311]

Evidence
The reorder helper searches for a match only via net.get('name'), while the OpenShift inventory
explicitly produces a pod network mapping that lacks a name key, so that entry can never be
targeted by custom_pod_network matching-by-name.

utilities/utils.py[254-312]
libs/forklift_inventory.py[479-499]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`custom_pod_network` reordering only matches networks by `name`, but OpenShift inventory can emit a pod network entry without a `name` key (`{"type": "pod"}`). This prevents selecting/reordering that pod entry and can cause incorrect mapping when the pod network is not first.

### Issue Context
- `gen_network_map_list()` optionally reorders the inventory network list before mapping index 0 to pod.
- OpenShift inventory network list may include a pod entry represented as `{"type": "pod"}`.

### Fix Focus Areas
- utilities/utils.py[226-312]
- libs/forklift_inventory.py[479-499]

### Suggested fix
- Make `_reorder_networks_by_pod_network()` able to:
 - Prefer `{"type": "pod"}` entry when `custom_pod_network` indicates pod (e.g., allow `custom_pod_network="pod"` or `"__pod__"`), **or**
 - If an entry has `type=="pod"` and no name, allow matching by a synthetic name, **or**
 - Explicitly reject `custom_pod_network` for inventories that include non-name entries with a clear error message explaining what values are valid.
- Update the ValueError message to differentiate unnamed networks (pod) from named ones so users know what they can select.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

9. Docstring exceptions incomplete 🐞 Bug ⚙ Maintainability ⭐ New
Description
check_vm_command_output() documents only ValueError/AssertionError/GuestCommandError, but it also
raises KeyError (missing source_vms_data) and ConnectionError (SSH connection not established), so
the function contract is inaccurate and callers will be misled.
Code

utilities/post_migration.py[R1312-1345]

Evidence
The docstring's Raises section does not include KeyError/ConnectionError, but the function body
raises both under common error paths.

utilities/post_migration.py[1302-1333]
utilities/post_migration.py[1342-1345]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`check_vm_command_output()` raises exceptions that are not listed in its docstring `Raises:` section (notably `KeyError` and `ConnectionError`). This makes the function contract misleading and complicates correct handling/debugging.

### Issue Context
The function explicitly raises:
- `KeyError` when `plan["source_vms_data"][vm_name]` is missing.
- `ConnectionError` when the SSH connection is not established.

### Fix Focus Areas
- Update the docstring `Raises:` section to include all exception types that can be raised.
- If you change the exception type for SSH command failures (see related finding), update the docstring accordingly.

#### Fix Focus Areas (files/lines)
- utilities/post_migration.py[1302-1365]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Previous review results

Review updated until commit 2ddfb0b

Results up to commit 63804a1


🐞 Bugs (2) 📘 Rule violations (2) 📎 Requirement gaps (0)


Action required
1. Defaults used in prepared_plan.get() 📘 Rule violation ≡ Correctness
Description
The test now reads project-controlled tests_params values via .get() with implicit
defaults/None, which can silently mask missing or misspelled required configuration keys and change
migration behavior without failing fast.
Code

tests/warm/test_warm_migration_comprehensive.py[R189-200]

Evidence
Rule 28 forbids using defaults for project-controlled config keys. The updated test_create_plan
passes multiple values from prepared_plan using .get() (including a default for
preserve_static_ips) which can silently substitute missing keys instead of failing fast.

CLAUDE.md: Configuration We Control Must Not Use Defaults: Access Required Keys Directly; Only Optional Feature Flags May Default
tests/warm/test_warm_migration_comprehensive.py[189-200]
tests/tests_config/config.py[524-575]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`prepared_plan` originates from `py_config["tests_params"][...]` (configuration we control). The test uses `.get()` (and `.get(..., default)`) for keys that should be explicitly present, which can hide misconfiguration and let invalid config propagate as `None`/defaults.

## Issue Context
Rule requires required project-controlled config keys to be accessed directly (e.g., `prepared_plan["target_power_state"]`) and forbids silently defaulting missing keys (except for optional feature flags).

## Fix Focus Areas
- tests/warm/test_warm_migration_comprehensive.py[189-200]
- tests/tests_config/config.py[524-575]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended
2. Hard-coded XFS device path 🐞 Bug ☼ Reliability
Description
When xfs_compatibility is enabled, check_vms() runs xfs_info against the fixed device /sdb1,
but the plan/test config does not specify which disk/partition is XFS; if the VM’s XFS filesystem is
on a different device name, the post-migration validation will fail even though migration succeeded.
Code

utilities/post_migration.py[R1475-1491]

Evidence
The test config enables xfs_compatibility but does not provide a device identifier; the
post-migration check then hard-codes /sdb1 and any resulting error is turned into a test failure
via the aggregated failed_checks path.

tests/tests_config/config.py[558-575]
utilities/post_migration.py[1475-1494]
utilities/post_migration.py[1592-1595]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The XFS compatibility post-migration check always executes `xfs_info /sdb1`. The VM config/plan does not define an XFS device path, so this check is brittle and can fail for correct migrations when the XFS filesystem is on a different device (e.g., `/dev/sda1`, `/dev/vda2`, `/dev/mapper/...`).

### Issue Context
- The new XFS verification runs inside `check_vms()` only when `plan.get('xfs_compatibility')` is true.
- Any exception is collected into `res[...]` and ultimately causes `pytest.fail(...)`, so a wrong device path makes the whole test fail.

### Fix Focus Areas
- utilities/post_migration.py[1294-1361]
- utilities/post_migration.py[1475-1494]
- tests/tests_config/config.py[558-575]

### Suggested fix
Implement one of the following (prefer dynamic discovery):
1) **Dynamic discovery**: SSH-run a command to find an XFS filesystem device (e.g., `findmnt -nr -t xfs -o SOURCE | head -n1` or `lsblk -nrpo NAME,FSTYPE | awk '$2=="xfs"{print $1; exit}'`) and then run `xfs_info` on that discovered device.
2) **Config-driven**: Add a plan config field like `xfs_device` (or `xfs_mountpoint`) under `test_warm_migration_xfs` and pass it into the verifier instead of hard-coding.
3) **Fallbacks**: Try a small set of common device patterns and fail with a clear error that includes `lsblk` output.

Also update logging/error messages to include the discovered device and full command output on failure.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Hardcoded namespaces in XFS config 📘 Rule violation ☼ Reliability
Description
The new XFS warm-migration test config hardcodes namespace values (vm_target_namespace pattern and
multus_namespace), rather than sourcing namespaces from fixtures, which can undermine parallel
isolation and policy consistency.
Code

tests/tests_config/config.py[R569-571]

Evidence
Rule 33 requires namespace values to be provided by fixtures and not hardcoded. The added
test_warm_migration_xfs configuration includes direct namespace strings (vm_target_namespace and
multus_namespace).

CLAUDE.md: Namespaces Must Be Provided by Fixtures (Never Hardcode)
tests/tests_config/config.py[558-575]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new `test_warm_migration_xfs` config sets namespace values directly in config (including a generated string and a fixed `default`). Compliance requires namespaces to be provided by fixtures rather than hardcoded in test logic/config.

## Issue Context
Namespace fixtures typically incorporate session UUID patterns to ensure uniqueness and safe parallel execution.

## Fix Focus Areas
- tests/tests_config/config.py[558-575]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


4. Pod network reorder mismatch 🐞 Bug ≡ Correctness
Description
_reorder_networks_by_pod_network() can only match networks by net['name'], but some inventories
(e.g., OpenShift) return a pod network mapping as {'type': 'pod'} without a name, making it
impossible to select/reorder the pod network via custom_pod_network and leading to
confusing/incorrect behavior when the pod network is not already first.
Code

utilities/utils.py[R305-311]

Evidence
The reorder helper searches for a match only via net.get('name'), while the OpenShift inventory
explicitly produces a pod network mapping that lacks a name key, so that entry can never be
targeted by custom_pod_network matching-by-name.

utilities/utils.py[254-312]
libs/forklift_inventory.py[479-499]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`custom_pod_network` reordering only matches networks by `name`, but OpenShift inventory can emit a pod network entry without a `name` key (`{"type": "pod"}`). This prevents selecting/reordering that pod entry and can cause incorrect mapping when the pod network is not first.

### Issue Context
- `gen_network_map_list()` optionally reorders the inventory network list before mapping index 0 to pod.
- OpenShift inventory network list may include a pod entry represented as `{"type": "pod"}`.

### Fix Focus Areas
- utilities/utils.py[226-312]
- libs/forklift_inventory.py[479-499]

### Suggested fix
- Make `_reorder_networks_by_pod_network()` able to:
 - Prefer `{"type": "pod"}` entry when `custom_pod_network` indicates pod (e.g., allow `custom_pod_network="pod"` or `"__pod__"`), **or**
 - If an entry has `type=="pod"` and no name, allow matching by a synthetic name, **or**
 - Explicitly reject `custom_pod_network` for inventories that include non-name entries with a clear error message explaining what values are valid.
- Update the ValueError message to differentiate unnamed networks (pod) from named ones so users know what they can select.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Results up to commit 63804a1


🐞 Bugs (1) 📘 Rule violations (1) 📎 Requirement gaps (0)


Action required
1. source_vms_data defaulted to {} 📘 Rule violation ☼ Reliability
Description
check_vm_command_output() silently defaults missing plan["source_vms_data"][vm_name] to {},
even though it uses that data to infer Windows-vs-Linux and select SSH credentials, which can
mis-detect guest OS (e.g., treat Windows as Linux) and fail later for misleading reasons. This
violates the intended fail-fast behavior and can cause confusing downstream post-migration
validation failures.
Code

utilities/post_migration.py[R1320-1333]

Evidence
PR Compliance ID 9 requires rejecting invalid or missing required data early with a clear exception,
but the implementation uses plan.get("source_vms_data", {}).get(vm_name, {}), allowing absent
source_vms_data or missing VM entries to propagate into OS detection and credential selection
without an explicit failure. Because credential selection is driven by the win_os flag taken from
this optional/feature-specific structure (treated elsewhere as required only for specific features
like static-IP preservation), missing data can cause Windows guests to be treated as Linux and lead
to check failures for the wrong reason rather than a clear, immediate error at the point of use.

CLAUDE.md: Fail Fast: Do Not Allow Invalid None Values to Propagate
utilities/post_migration.py[1320-1333]
utilities/post_migration.py[39-81]
utilities/post_migration.py[1294-1336]
utilities/post_migration.py[1444-1454]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`check_vm_command_output()` currently uses `plan.get("source_vms_data", {}).get(vm_name, {})` to determine whether the guest is Windows and to choose Linux vs Windows SSH credentials, which masks missing `source_vms_data`/missing VM entries and can cause incorrect OS detection and misleading downstream failures during post-migration validation. Update the logic to fail fast with a clear exception when required VM metadata is missing, and/or stop using `source_vms_data` as the general source of OS/credential metadata.

## Issue Context
- This code path is used for post-migration validation and should comply with the fail-fast requirement (reject missing/invalid required data early with a clear, specific exception).
- `get_ssh_credentials_from_provider_config()` expects a reliable `source_vm_info["win_os"]` flag.
- `source_vms_data` is not guaranteed to exist for all scenarios/providers and is treated as required only for specific features (e.g., static-IP preservation), so it is not a reliable general source of OS metadata.
- `check_vms()` already has authoritative `source_vm` data available from `source_provider.vm_dict(...)`; prefer passing that into `check_vm_command_output()` (e.g., accept `source_vm_info: dict[str, Any]`) and use it for the Windows guard and credential selection. Optionally keep `source_vms_data` only for data that truly lives there, but do not use it for OS detection.

## Fix Focus Areas
- utilities/post_migration.py[1294-1361]
- utilities/post_migration.py[1320-1333]
- utilities/post_migration.py[1475-1494]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. No retries for SSH cmd 🐞 Bug ☼ Reliability
Description
check_vm_command_output() runs the SSH command exactly once with no retry/backoff, unlike existing
SSH checks in this file, so transient guest readiness/SSH instability post-migration can cause
intermittent false test failures.
Code

utilities/post_migration.py[R1337-1347]

Evidence
The codebase already implements retry/backoff for SSH readiness using TimeoutSampler, but the
newly added helper does not, increasing the chance of timing-based flake failures right after
migration.

utilities/post_migration.py[84-140]
utilities/post_migration.py[1294-1358]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`check_vm_command_output()` executes `executor.run_cmd(command)` once and fails immediately on non-zero rc or missing expected output. Post-migration guests commonly have short windows of instability (services still starting, udev settling, etc.), and this file already uses `TimeoutSampler` retry patterns for SSH connectivity.

### Issue Context
- `check_ssh_connectivity()` uses `TimeoutSampler` to avoid flakiness.
- `check_vm_command_output()` creates a new SSH connection and immediately executes the command without any retry loop.

### Fix Focus Areas
- utilities/post_migration.py[84-140]
- utilities/post_migration.py[1294-1361]

### Suggested fix
- Wrap the command execution + expected-output assertion in a `TimeoutSampler` retry loop (similar to `check_ssh_connectivity()`), e.g.:
 - retry for `wait_timeout=120` with `sleep=10`
 - retry on SSH exceptions and on `rc != 0`
 - (optionally) retry if `expected_output` not found, since filesystems/services may not be ready immediately
- Keep the final error message from the last attempt for debuggability.
- Alternatively, reuse an already-validated SSH session/connection if available to reduce repeated port-forward setup.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Qodo Logo

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@utilities/post_migration.py`:
- Around line 1321-1333: The code is silently defaulting source_vm_data to {}
and assuming ssh_conn.rrmngmnt_host exists which leads to opaque failures;
update the logic around source_vm_data,
get_ssh_credentials_from_provider_config, and ssh_conn usage to fail fast:
explicitly check if plan.get("source_vms_data") is None or if source_vm_data is
missing for vm_name and raise a clear ValueError naming vm_name and plan key,
validate that get_ssh_credentials_from_provider_config returned non-None
ssh_username/ssh_password and raise a descriptive error if missing, and before
using ssh_conn.rrmngmnt_host validate ssh_conn is not None and that
ssh_conn.rrmngmnt_host is present (raise a clear exception referencing
destination_vm_name and ssh_conn) so callers get actionable setup errors instead
of opaque attribute errors.

In `@utilities/vmware_guest_operations.py`:
- Around line 222-224: The awk program is written with double braces so it’s
passed literally and returns empty values; update the shell command strings that
set dev and method (the lines constructing "dev=$(nmcli -f GENERAL.DEVICES ... |
awk -F: '{{print $2}}' ..." and "method=$(nmcli -f ipv4.method ... | awk -F:
'{{print $2}}' ...)") to use single-brace awk syntax ({print $2}) so dev and
method are extracted correctly; ensure both occurrences are fixed and keep the
surrounding nmcli/test/xargs pipeline intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 193fb40a-71bf-4373-b22a-67de42ab42f0

📥 Commits

Reviewing files that changed from the base of the PR and between 9c5c0b4 and 63804a1.

📒 Files selected for processing (6)
  • tests/tests_config/config.py
  • tests/warm/test_warm_migration_comprehensive.py
  • utilities/mtv_migration.py
  • utilities/post_migration.py
  • utilities/utils.py
  • utilities/vmware_guest_operations.py

Comment thread utilities/post_migration.py Outdated
Comment thread utilities/vmware_guest_operations.py Outdated
Comment on lines 189 to 200
virtual_machines_list=prepared_plan["virtual_machines"],
target_power_state=prepared_plan["target_power_state"],
target_power_state=prepared_plan.get("target_power_state"),
target_namespace=target_namespace,
warm_migration=prepared_plan["warm_migration"],
preserve_static_ips=prepared_plan["preserve_static_ips"],
vm_target_namespace=prepared_plan["vm_target_namespace"],
pvc_name_template=prepared_plan["pvc_name_template"],
pvc_name_template_use_generate_name=prepared_plan["pvc_name_template_use_generate_name"],
target_labels=target_vm_labels["vm_labels"],
target_affinity=prepared_plan["target_affinity"],
preserve_static_ips=prepared_plan.get("preserve_static_ips", False),
vm_target_namespace=prepared_plan.get("vm_target_namespace"),
pvc_name_template=prepared_plan.get("pvc_name_template"),
pvc_name_template_use_generate_name=prepared_plan.get("pvc_name_template_use_generate_name"),
target_labels=target_vm_labels.get("vm_labels") if "target_labels" in prepared_plan else None,
target_affinity=prepared_plan.get("target_affinity"),
xfs_compatibility=prepared_plan.get("xfs_compatibility", False),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Defaults used in prepared_plan.get() 📘 Rule violation ≡ Correctness

The test now reads project-controlled tests_params values via .get() with implicit
defaults/None, which can silently mask missing or misspelled required configuration keys and change
migration behavior without failing fast.
Agent Prompt
## Issue description
`prepared_plan` originates from `py_config["tests_params"][...]` (configuration we control). The test uses `.get()` (and `.get(..., default)`) for keys that should be explicitly present, which can hide misconfiguration and let invalid config propagate as `None`/defaults.

## Issue Context
Rule requires required project-controlled config keys to be accessed directly (e.g., `prepared_plan["target_power_state"]`) and forbids silently defaulting missing keys (except for optional feature flags).

## Fix Focus Areas
- tests/warm/test_warm_migration_comprehensive.py[189-200]
- tests/tests_config/config.py[524-575]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented May 27, 2026

Code review by qodo was updated up to the latest commit 63804a1

Comment on lines +1320 to +1333
# Get source VM data from plan
source_vm_data = plan.get("source_vms_data", {}).get(vm_name, {})

# Check if Windows VM
if source_vm_data.get("win_os", False):
raise ValueError(
f"Command output verification is only supported on Linux guests. VM '{vm_name}' is a Windows guest."
)

LOGGER.info(f"Running command on VM {destination_vm_name}: {' '.join(command)}")

# Get SSH credentials
ssh_username, ssh_password = get_ssh_credentials_from_provider_config(source_provider_data, source_vm_data)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. source_vms_data defaulted to {} 📘 Rule violation ☼ Reliability

check_vm_command_output() silently defaults missing plan["source_vms_data"][vm_name] to {},
even though it uses that data to infer Windows-vs-Linux and select SSH credentials, which can
mis-detect guest OS (e.g., treat Windows as Linux) and fail later for misleading reasons. This
violates the intended fail-fast behavior and can cause confusing downstream post-migration
validation failures.
Agent Prompt
## Issue description
`check_vm_command_output()` currently uses `plan.get("source_vms_data", {}).get(vm_name, {})` to determine whether the guest is Windows and to choose Linux vs Windows SSH credentials, which masks missing `source_vms_data`/missing VM entries and can cause incorrect OS detection and misleading downstream failures during post-migration validation. Update the logic to fail fast with a clear exception when required VM metadata is missing, and/or stop using `source_vms_data` as the general source of OS/credential metadata.

## Issue Context
- This code path is used for post-migration validation and should comply with the fail-fast requirement (reject missing/invalid required data early with a clear, specific exception).
- `get_ssh_credentials_from_provider_config()` expects a reliable `source_vm_info["win_os"]` flag.
- `source_vms_data` is not guaranteed to exist for all scenarios/providers and is treated as required only for specific features (e.g., static-IP preservation), so it is not a reliable general source of OS metadata.
- `check_vms()` already has authoritative `source_vm` data available from `source_provider.vm_dict(...)`; prefer passing that into `check_vm_command_output()` (e.g., accept `source_vm_info: dict[str, Any]`) and use it for the Windows guard and credential selection. Optionally keep `source_vms_data` only for data that truly lives there, but do not use it for OS detection.

## Fix Focus Areas
- utilities/post_migration.py[1294-1361]
- utilities/post_migration.py[1320-1333]
- utilities/post_migration.py[1475-1494]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +1337 to +1347
with ssh_conn:
# Execute command using executor with the correct user
executor = ssh_conn.rrmngmnt_host.executor(user=ssh_conn.rrmngmnt_user) # type: ignore[union-attr]
executor.port = ssh_conn.local_port

rc, stdout, stderr = executor.run_cmd(command)

if rc != 0:
raise GuestCommandError(
f"Command {' '.join(command)} failed on VM {destination_vm_name} (rc={rc}): {stderr}"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

2. No retries for ssh cmd 🐞 Bug ☼ Reliability

check_vm_command_output() runs the SSH command exactly once with no retry/backoff, unlike existing
SSH checks in this file, so transient guest readiness/SSH instability post-migration can cause
intermittent false test failures.
Agent Prompt
### Issue description
`check_vm_command_output()` executes `executor.run_cmd(command)` once and fails immediately on non-zero rc or missing expected output. Post-migration guests commonly have short windows of instability (services still starting, udev settling, etc.), and this file already uses `TimeoutSampler` retry patterns for SSH connectivity.

### Issue Context
- `check_ssh_connectivity()` uses `TimeoutSampler` to avoid flakiness.
- `check_vm_command_output()` creates a new SSH connection and immediately executes the command without any retry loop.

### Fix Focus Areas
- utilities/post_migration.py[84-140]
- utilities/post_migration.py[1294-1361]

### Suggested fix
- Wrap the command execution + expected-output assertion in a `TimeoutSampler` retry loop (similar to `check_ssh_connectivity()`), e.g.:
  - retry for `wait_timeout=120` with `sleep=10`
  - retry on SSH exceptions and on `rc != 0`
  - (optionally) retry if `expected_output` not found, since filesystems/services may not be ready immediately
- Keep the final error message from the last attempt for debuggability.
- Alternatively, reuse an already-validated SSH session/connection if available to reduce repeated port-forward setup.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Signed-off-by: Chenli Hu <chhu@redhat.com>
@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented May 27, 2026

Code review by qodo was updated up to the latest commit 2ddfb0b

@qodo-code-review
Copy link
Copy Markdown

Code Review by Qodo

Grey Divider

Looking for bugs?

Check back in a few minutes. An AI review agent is analyzing this pull request.

Grey Divider

Qodo Logo

@Chenli-Hu
Copy link
Copy Markdown
Contributor Author

/verified

@myakove
Copy link
Copy Markdown
Collaborator

myakove commented May 31, 2026

@Chenli-Hu pleasee checkout Qodo comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants