Skip to content

[VIRT] Refactor descheduler tests#4739

Open
dshchedr wants to merge 2 commits intoRedHatQE:mainfrom
dshchedr:refactor-descheduler-tests
Open

[VIRT] Refactor descheduler tests#4739
dshchedr wants to merge 2 commits intoRedHatQE:mainfrom
dshchedr:refactor-descheduler-tests

Conversation

@dshchedr
Copy link
Copy Markdown
Collaborator

@dshchedr dshchedr commented May 5, 2026

What this PR does / why we need it:

Remove flaky and duplicate tests.

  • Remove test_no_migrations_storm tests - flaky, cascading migrations are legitimate rebalancing behavior

  • Remove TestDeschedulerEvictsVMAfterDrainUncordon - basically it duplicates the utilization imbalance test, just with different setup

  • Fix flaky PSI metrics tests by selecting node with most VMs

Summary by CodeRabbit

  • Tests

    • Updated descheduler test setup to use a cordon/uncordon flow for preparing and restoring nodes, and adjusted test dependencies accordingly.
    • Removed a pair of migration-storm verification tests.
  • Chores

    • Simplified fixtures and removed a now-unneeded VM failover wait helper to reduce complexity and improve maintainability.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Walkthrough

Reworked descheduler tests to replace a drain-based flow with a cordon/uncordon flow: added fixtures that cordon and later uncordon a selected schedulable node, removed fixtures that computed drain targets and waited for per-VM failover, and removed tests asserting "no migration storm", updating test dependencies accordingly.

Changes

Descheduler Node Management Refactoring

Layer / File(s) Summary
Test Fixture Definition
tests/virt/node/descheduler/conftest.py
Added cordoned_node_for_vm_deployment(admin_client, schedulable_nodes) which cordons the first schedulable node via node_mgmt_console and waits until it is unschedulable; added uncordon_node(cordoned_node_for_vm_deployment) which runs oc adm uncordon <node> via run_command. Removed vms_orig_nodes_before_node_drain, node_to_drain, and drain_uncordon_node fixtures that selected drain targets and performed drain + per-VM failover waits. Also added imports shlex and run_command used by the new flow.
Test Implementation & Dependency Wiring
tests/virt/node/descheduler/test_descheduler.py
Replaced the drain_uncordon_node fixture usage with uncordon_node in test_descheduler_evicts_vm_after_drain_uncordon. Removed NO_MIGRATION_STORM_ASSERT_MESSAGE constant and deleted test_no_migrations_storm tests from both descheduler test classes; updated @pytest.mark.dependency() targets so subsequent boot-time tests depend directly on the eviction tests.
Utility Cleanup
tests/virt/node/descheduler/utils.py
Removed the helper wait_vmi_failover(vm, orig_node) and dropped TIMEOUT_15MIN from the imported timeout constants, as the per-VM failover wait is no longer used.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning PR description lacks most required template sections. Only "What this PR does / why we need it" is partially filled; short description, details, related issues, special notes, and jira-ticket are missing. Complete the template by adding: short description, detailed explanation, which issue(s) this fixes, special notes for reviewer, and jira-ticket URL (or "NONE" if not tracked).
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title '[VIRT] Refactor descheduler tests' accurately summarizes the main change: a refactoring of descheduler test fixtures and tests, replacing drain-based node preparation with cordon-based approach and removing migration storm validation tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-virtualization-qe-bot
Copy link
Copy Markdown

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest verify-bugs-are-open - verify-bugs-are-open
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)
  5. Verified: PR must be marked as verified

📊 Review Process

Approvers and Reviewers

Approvers:

  • dshchedr
  • vsibirsk

Reviewers:

  • SamAlber
  • akri3i
  • dshchedr
  • kbidarkar
  • vsibirsk
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
AI Features
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6[1m])

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is removed on new commits unless the push is detected as a clean rebase
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/virt/node/descheduler/conftest.py`:
- Around line 141-145: The uncordon_node fixture yields immediately after
running "oc adm uncordon" which causes a race; after calling run(...) in the
uncordon_node fixture, call the helper
wait_for_node_schedulable_status(cordoned_node_for_vm_deployment, True) (or
equivalent API used elsewhere) to wait until cordoned_node_for_vm_deployment
becomes schedulable before yielding, so tests like
assert_vms_distribution_after_failover observe the updated schedulable state
reliably.
- Line 2: Replace the forbidden subprocess.run usage and import with
pyhelper_utils.shell.run_command: remove "from subprocess import run" and import
run_command from pyhelper_utils.shell, then change the call sites that use
subprocess.run(..., shell=True, f"...") to run_command([...]) providing the
command and arguments as a list (no shell string interpolation) and
propagate/handle the returned result similarly; specifically update any
references to subprocess.run and the f-string command construction to use
run_command([...]) to eliminate shell=True and command-injection risk.

In `@tests/virt/node/descheduler/test_descheduler.py`:
- Line 50: Add a brief explanatory comment above the pytest.mark.dependency
decorator that states why the test depends on
test_descheduler_evicts_vm_after_drain_uncordon (for example: that the earlier
test sets up/validates VM drain/uncordon state required by this test), update
the decorator usage at the occurrence referencing
f"{TESTS_CLASS_NAME}::test_descheduler_evicts_vm_after_drain_uncordon" and apply
the same explanatory-comment pattern to the other dependency instance (the one
at the later occurrence around line 94) so both decorators have the required
rationale comment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e414306f-a6b1-494d-9885-a011e9246b59

📥 Commits

Reviewing files that changed from the base of the PR and between ff39c74 and 26d9304.

📒 Files selected for processing (2)
  • tests/virt/node/descheduler/conftest.py
  • tests/virt/node/descheduler/test_descheduler.py

Comment thread tests/virt/node/descheduler/conftest.py Outdated
Comment thread tests/virt/node/descheduler/conftest.py Outdated
Comment thread tests/virt/node/descheduler/test_descheduler.py Outdated
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 5, 2026
@openshift-virtualization-qe-bot-5
Copy link
Copy Markdown

New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-4739 published

@openshift-virtualization-qe-bot-3
Copy link
Copy Markdown
Contributor

Clean rebase detected — no code changes compared to previous head (c5b0963).

The node_to_run_stress fixture was picking the first node with any VMs,
which could result in insufficient load.

Now select the node with the MOST VMs to maximize stress load and ensure
the overutilization threshold.

Signed-off-by: Denys Shchedrivyi <dshchedr@redhat.com>
@openshift-virtualization-qe-bot-3
Copy link
Copy Markdown
Contributor

Clean rebase detected — no code changes compared to previous head (75a6e54).

@dshchedr
Copy link
Copy Markdown
Collaborator Author

dshchedr commented May 7, 2026

/build-and-push-container

@openshift-virtualization-qe-bot-5
Copy link
Copy Markdown

New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-4739 published

@dshchedr
Copy link
Copy Markdown
Collaborator Author

dshchedr commented May 8, 2026

/wip cancel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants