307 changes: 307 additions & 0 deletions stps/sig-virt/mig-vgpu-stp.md
@@ -0,0 +1,307 @@
# Openshift-virtualization-tests Test plan

## **MIG vGPU - Quality Engineering Plan**

### **Metadata & Tracking**

- **Enhancement(s):** [Links to enhancement(s); KubeVirt, OpenShift, etc.]

⚠️ Potential issue | 🟡 Minor

Fill in the Enhancement(s) field or remove the placeholder.

The Enhancement(s) field contains placeholder text. If no OpenShift enhancement PR exists for this feature, reference the High-Level Design (HLD) document here. Based on learnings, it's acceptable to reference only the HLD when no enhancement PR exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` at line 7, Replace the placeholder line
labeled "Enhancement(s):" with either the link(s) to the relevant enhancement
PR(s) (e.g., OpenShift/KubeVirt enhancement URLs) or, if no enhancement exists,
a link and brief citation to the High-Level Design (HLD) document; update the
"Enhancement(s):" field in the document so it no longer contains placeholder
text and includes the actual enhancement or HLD reference.

- **Feature Tracking:** https://redhat.atlassian.net/browse/VIRTSTRAT-166
- **Epic Tracking:** https://redhat.atlassian.net/browse/CNV-13713
<!-- Tasks must be created to block the feature -->
- **QE Owner(s):** [Name(s)]

⚠️ Potential issue | 🟡 Minor

Assign QE Owner(s) before approval.

The QE Owner(s) field must be populated with the actual name(s) of the responsible QE engineers before this STP can be approved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` at line 11, Populate the QE Owner(s) field in
the STP header by replacing the placeholder "[Name(s)]" with the actual
responsible QE engineer name(s); update the "QE Owner(s):" entry so it lists one
or more real names (e.g., "QE Owner(s): Jane Doe, John Smith") before approving
the document.

- **Owning SIG:** sig-virt
- **Participating SIGs:** [List of participating SIGs]
Comment on lines +5 to +13

⚠️ Potential issue | 🟡 Minor

Address VEP issue field mentioned in PR objectives.

The PR description states "STP Metadata: VEP issue field is present but not filled," but no VEP (Virt Enhancement Proposal) field appears in the Metadata & Tracking section. Please either:

  • Add the VEP field with appropriate value/placeholder, or
  • Clarify in the PR description if the VEP field is not applicable to this STP
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` around lines 5 - 13, The Metadata & Tracking
section currently omits the VEP field referenced in the PR objectives; either
add a "VEP:" (or "VEP issue:") entry under the "Metadata & Tracking" block with
an appropriate value or placeholder (e.g., "VEP: TBD" or the VEP number) so the
text "STP Metadata: VEP issue field is present but not filled" matches the
document, or update the PR description to explicitly state that a VEP is not
applicable; locate and modify the "Metadata & Tracking" section (the header and
the list containing Enhancement(s), Feature Tracking, Epic Tracking, QE
Owner(s), Owning SIG, Participating SIG) to include the new VEP field or change
the PR description accordingly.

⚠️ Potential issue | 🟡 Minor

Specify Participating SIGs or remove if none.

The Participating SIGs field should list any other SIGs involved in this feature, or be removed/marked as "None" if sig-virt is the only participating SIG.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` at line 13, Update the "**Participating
SIGs:**" field in the mig-vgpu-stp.md document: either list the other SIGs
participating (e.g., "sig-foo, sig-bar") if there are collaborators, or remove
the line entirely or replace it with "None" when sig-virt is the only SIG;
ensure the final text uses the exact "**Participating SIGs:**" label so
reviewers can find it easily.


**Document Conventions:**
- **MIG** — Multi-Instance GPU: NVIDIA technology that partitions a single physical GPU into multiple isolated instances
- **vGPU** — Virtual GPU: GPU virtualization that allows multiple VMs to share a physical GPU
- **MIG vGPU** — A vGPU slice backed by a MIG instance, combining MIG isolation with GPU virtualization
- **RHEL VM** — Red Hat Enterprise Linux Virtual Machine

### **Feature Overview**

<!-- Provide a brief (2-4 sentences) description of the feature being tested.
Include: what it does, why it matters to customers, and key technical components. -->

Enable support for MIG-backed NVIDIA vGPUs within OpenShift Virtualization, allowing users to allocate GPU resources more efficiently and securely by leveraging NVIDIA's Multi-Instance GPU (MIG) technology. This feature helps maximize GPU utilization, reduce resource fragmentation, and provide guaranteed performance for AI/ML or HPC workloads running in KubeVirt-based virtual machines on OpenShift.

---

### **I. Motivation and Requirements Review (QE Review Guidelines)**

This section documents the mandatory QE review process. The goal is to understand the feature's value,
technology, and testability before formal test planning.

#### **1. Requirement & User Story Review Checklist**

<!-- **How to complete this checklist:**
1. **Checkbox**: Mark [x] if the check is complete; if the item cannot be checked - add an explanation why in the `details` section
2. Complete the relevant, needed details for the checklist item -->

- [x] **Review Requirements**
- *List the key D/S requirements reviewed:*
- Nodes with supported NVIDIA GPUs must advertise MIG vGPU devices in their `Capacity` and `Allocatable` sections after MIG configuration
- RHEL VMs must be creatable with a MIG vGPU device attached and reach Running state
- The MIG vGPU device must be visible inside the RHEL VM via standard PCI enumeration (`lspci`)
- Multiple RHEL VMs must be able to share the same physical GPU concurrently

- [x] **Understand Value and Customer Use Cases**
- *Describe the feature's value to customers:* Customers running AI/ML and HPC workloads require dedicated, isolated GPU resources per VM. MIG vGPU provides hardware-level isolation with predictable performance, allowing safe multi-tenancy on expensive GPU hardware while maximizing GPU utilization and reducing resource fragmentation.

- [x] **Testability**
- *Note any requirements that are unclear or untestable:* None identified at this time; all acceptance criteria are testable via node inspection and in-VM CLI commands.

- [x] **Acceptance Criteria**
- *List the acceptance criteria:*
- Node `Capacity` and `Allocatable` fields reflect the configured MIG vGPU device after setup
- A RHEL VM with a MIG vGPU device request reaches `Running` state
- `lspci -nnk | grep NVIDIA` inside the RHEL VM returns the expected NVIDIA device entry
- Two RHEL VMs each assigned one MIG vGPU slice from the same GPU are both reachable and operational concurrently
- *Note any gaps or missing criteria:* None

- [x] **Non-Functional Requirements (NFRs)**
- *List applicable NFRs and their targets:*
- Resource isolation: MIG vGPU instances must not impact each other's performance
- Supportability: GPU device visibility must be consistent across VM restarts
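
The node-level acceptance criterion above can be sketched as a check over a node object's `status.capacity` and `status.allocatable` maps. This is a minimal illustration, not the test suite's implementation: it assumes the node has already been fetched as a plain dictionary (for example from JSON output), and the helper names `vgpu_resources` and `is_advertised` as well as the `nvidia.com/` prefix default are assumptions chosen for the example.

```python
from typing import Dict


def vgpu_resources(node: Dict, prefix: str = "nvidia.com/") -> Dict[str, Dict[str, int]]:
    """Collect extended resources whose name starts with `prefix`.

    `node` is assumed to be a node object as a plain dict, with the usual
    `status.capacity` and `status.allocatable` maps. Extended resources are
    integer quantities, so their values are converted with int().
    """
    status = node.get("status", {})
    capacity = status.get("capacity", {})
    allocatable = status.get("allocatable", {})
    found: Dict[str, Dict[str, int]] = {}
    for name in sorted(set(capacity) | set(allocatable)):
        if name.startswith(prefix):
            found[name] = {
                "capacity": int(capacity.get(name, 0)),
                "allocatable": int(allocatable.get(name, 0)),
            }
    return found


def is_advertised(node: Dict, resource_name: str, minimum: int = 1) -> bool:
    """True if the resource appears in both maps with at least `minimum` units."""
    counts = vgpu_resources(node).get(resource_name)
    return (
        counts is not None
        and counts["capacity"] >= minimum
        and counts["allocatable"] >= minimum
    )
```

For the concurrent two-VM criterion, the same check could be called with `minimum=2`; the actual resource name depends on how the cluster exposes the MIG vGPU device and is not specified here.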

#### **2. Known Limitations**

- **MIG vGPU is only supported on NVIDIA GPUs that support the MIG feature (e.g., A100, A30, H100); testing is limited to the NVIDIA A30 as that is the only available hardware**
- *Sign-off:* [Name/Date]

- **Only RHEL guest OS is validated **

⚠️ Potential issue | 🟡 Minor

Remove trailing asterisks.

The line contains trailing ** after "validated" which appears to be a formatting error.

📝 Proposed fix
-- **Only RHEL guest OS is validated **
+- **Only RHEL guest OS is validated**
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` at line 72, Fix the malformed bold markup on
the line containing "Only RHEL guest OS is validated" by removing the trailing
`**` so the bold formatting is balanced; update the line in mig-vgpu-stp.md from
"**Only RHEL guest OS is validated **" to either "**Only RHEL guest OS is
validated**" (to keep bold) or "Only RHEL guest OS is validated" (to remove
bold) as appropriate.

- *Sign-off:* [Name/Date]

- **MIG vGPU for Windows guests is only supported on vGPUs created on RTX Pro 6000 hardware; Windows MIG vGPU is not tested in this cycle as the available hardware is the A30**
- *Sign-off:* [Name/Date]

- **MIG vGPU configuration requires pre-configuration of the GPU node (MIG mode enabled, MIG profiles set) before VM scheduling**
- *Sign-off:* [Name/Date]
Comment on lines +70 to +79

⚠️ Potential issue | 🟡 Minor

Complete sign-off fields for Known Limitations.

All Known Limitations require sign-off with name and date before this STP is approved. These sign-offs acknowledge the documented constraints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` around lines 70 - 79, The Known Limitations
section contains placeholder sign-offs "[Name/Date]" for each bullet (e.g., the
lines "Only RHEL guest OS is validated", "MIG vGPU for Windows guests is only
supported on vGPUs created on RTX Pro 6000 hardware...", and "MIG vGPU
configuration requires pre-configuration of the GPU node..."); replace each
placeholder with the actual reviewer/approver name and date to complete the
sign-off fields so the STP can be approved.


#### **3. Technology and Design Review**

- [ ] **Developer Handoff/QE Kickoff**
- *Key takeaways and concerns:* [Summarize key points and concerns]

- [ ] **Technology Challenges**
- *List identified challenges:*
- Requires NVIDIA MIG-capable GPU hardware in the test cluster
- MIG profile configuration and GPU Operator setup must be completed before tests run
- *Impact on testing approach:* Tests can only execute on nodes with supported GPU hardware.

- [ ] **API Extensions**
- *List new or modified APIs:* Node resource capacity fields (`nvidia.com/mig-*` resources); VirtualMachine spec GPU device stanza
- *Testing impact:* Tests must validate node resource advertisement and VM spec GPU device assignment.

- [ ] **Test Environment Needs**
- *See environment requirements in Section II.3 and testing tools in Section II.3.1*

- [ ] **Topology Considerations**
- *Describe topology requirements:* At least one worker node with an NVIDIA A30 GPU and the NVIDIA GPU Operator installed and configured for MIG mode.
- *Impact on test design:* Tests must use node selectors or node affinity rules targeting the GPU-equipped node.
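
As an illustration of the GPU device stanza and node-selector targeting described above, the following sketch builds a `VirtualMachine` manifest as a plain dictionary. It uses the KubeVirt `spec.template.spec.domain.devices.gpus` list (entries with `name` and `deviceName`) and `spec.template.spec.nodeSelector`. The function name, the device name value, the node label, and the memory request are hypothetical placeholders, and disks and volumes are deliberately omitted, so this is not a complete bootable VM definition.

```python
from typing import Dict


def rhel_vm_with_gpu(
    name: str,
    namespace: str,
    device_name: str,
    node_selector: Dict[str, str],
) -> Dict:
    """Build a partial KubeVirt VirtualMachine manifest requesting one GPU device.

    `device_name` is the extended resource name advertised by the node
    (a placeholder here); `node_selector` pins the VM to the GPU node.
    Disks and volumes are left out to keep the sketch short.
    """
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "runStrategy": "Always",
            "template": {
                "spec": {
                    # Schedule only onto nodes carrying the given labels.
                    "nodeSelector": dict(node_selector),
                    "domain": {
                        "devices": {
                            "gpus": [{"name": "gpu0", "deviceName": device_name}],
                        },
                        # Placeholder memory request, not a sizing recommendation.
                        "resources": {"requests": {"memory": "4Gi"}},
                    },
                }
            },
        },
    }
```

Keeping the manifest as data makes it easy to assert on the GPU stanza in a unit test before the object is ever submitted to a cluster.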

Comment on lines +81 to +102

⚠️ Potential issue | 🟡 Minor

Complete the Technology and Design Review section.

Several items in this section are unchecked or contain placeholder text:

  • Line 83: Developer Handoff/QE Kickoff is unchecked with placeholder text
  • Lines 86-91: Technology Challenges is unchecked but has content
  • Lines 92-95: API Extensions is unchecked but has content
  • Lines 96-98: Test Environment Needs is unchecked
  • Lines 99-102: Topology Considerations is unchecked but has content

Please review each item and check the boxes once completed, or clarify if these are intentionally left for stakeholder input during review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` around lines 81 - 102, The Technology and
Design Review section has unchecked items and placeholder text; update the
checklist by marking the boxes as completed where content is provided (check
"Technology Challenges", "API Extensions", and "Topology Considerations") and
replace the placeholder in "Developer Handoff/QE Kickoff" with a concise summary
of handoff actions and QE kickoff steps (who, what, and follow-ups), and either
populate "Test Environment Needs" with the referenced environment/tool details
from Section II.3/II.3.1 or add a short note that the item requires stakeholder
confirmation; reference the section headings "Developer Handoff/QE Kickoff",
"Technology Challenges", "API Extensions", "Test Environment Needs", and
"Topology Considerations" when making these edits.

### **II. Software Test Plan (STP)**

This STP serves as the **overall roadmap for testing**, detailing the scope, approach, resources, and schedule.

#### **1. Scope of Testing**

**Testing Goals**

- **[P0]** Verify that a MIG-capable GPU node correctly advertises MIG vGPU devices in its `Capacity` and `Allocatable` node fields after MIG configuration
- **[P0]** Validate that a RHEL VM requesting a MIG vGPU device can be created and reaches `Running` state
- **[P0]** Confirm that the NVIDIA GPU device is visible inside a running RHEL VM via `lspci -nnk | grep NVIDIA`
- **[P1]** Verify that two RHEL VMs, each assigned one MIG vGPU slice from the same physical GPU, can run concurrently without conflict
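
The in-VM visibility goal relies on `lspci -nnk | grep NVIDIA`. A possible way to evaluate that output programmatically is sketched below. Note that it swaps in a slightly different technique: instead of matching the string "NVIDIA", it matches the PCI vendor ID `10de` (NVIDIA's vendor ID) in the `[vendor:device]` pair that `-nn` prints. The function name and return shape are assumptions made for this example, and the command output is assumed to be captured as text by whatever mechanism the test uses to run commands in the guest.

```python
import re
from typing import List, Tuple

NVIDIA_VENDOR_ID = "10de"

# With -nn, lspci prints the numeric IDs as "[vendor:device]", four hex digits each.
_VENDOR_DEVICE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def nvidia_devices(lspci_output: str) -> List[Tuple[str, str]]:
    """Return (pci_slot, device_id) for each NVIDIA device in `lspci -nnk` text.

    Indented lines (the per-device detail lines added by -k, such as the
    kernel driver in use) are skipped so each device is counted once.
    """
    devices: List[Tuple[str, str]] = []
    for line in lspci_output.splitlines():
        if not line or line[0].isspace():
            continue
        match = _VENDOR_DEVICE.search(line)
        if match and match.group(1).lower() == NVIDIA_VENDOR_ID:
            devices.append((line.split()[0], match.group(2).lower()))
    return devices
```

A test could then assert that the returned list is non-empty, or that it contains exactly the number of devices requested in the VM spec.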

**Out of Scope (Testing Scope Exclusions)**

- **Legacy GPUs without MIG support**
- *Rationale:* Only Ampere and Hopper generation GPUs (e.g., A100, H100/H200) or later that support MIG are targeted; testing on non-MIG GPUs is not planned
- *PM/Lead Agreement:* [Name/Date]

- **Advanced multi-tenancy beyond GPU-level isolation**
- *Rationale:* Deep security isolation beyond MIG's hardware partitioning (e.g., vTPM integration) is not addressed in this feature
- *PM/Lead Agreement:* [Name/Date]

- **Custom MIG topologies beyond standard configurations**
- *Rationale:* Standard MIG slicing profiles recognized by the NVIDIA GPU Operator are assumed; custom or non-standard MIG topologies are not tested
- *PM/Lead Agreement:* [Name/Date]

- **Windows guest OS**
- *Rationale:* MIG vGPU for Windows is only supported on RTX Pro 6000 hardware; the available test hardware is the NVIDIA A30, which does not support Windows MIG vGPU
- *PM/Lead Agreement:* [Name/Date]

- **GPU benchmark / performance testing inside VMs**
- *Rationale:* No standardized GPU benchmark tooling integrated into CI; performance NFRs deferred to a future cycle
- *PM/Lead Agreement:* [Name/Date]

- **MIG profile configuration and GPU Operator installation**
- *Rationale:* Infrastructure pre-configuration is handled outside the test scope; tests assume a correctly configured GPU node
- *PM/Lead Agreement:* [Name/Date]

Comment on lines +116 to +141

⚠️ Potential issue | 🟡 Minor

Complete PM/Lead Agreement sign-offs for out-of-scope items.

All out-of-scope items require PM/Lead sign-off (lines 120, 124, 128, 132, 136, 140) to ensure alignment on testing boundaries. These should be completed before final STP approval.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` around lines 116 - 141, Populate the PM/Lead
Agreement placeholders for each out-of-scope bullet so the document records
explicit sign-offs: add a named approver and date in place of each "[Name/Date]"
for the items "Legacy GPUs without MIG support", "Advanced multi-tenancy beyond
GPU-level isolation", "Custom MIG topologies beyond standard configurations",
"Windows guest OS", "GPU benchmark / performance testing inside VMs", and "MIG
profile configuration and GPU Operator installation" to indicate formal
acceptance of these exclusions.

**Test Limitations**

- **Testing is limited to the NVIDIA A30 GPU — other supported MIG-capable GPUs (e.g., A100, H100/H200) are not available in the test environment**
- *Sign-off:* [Name/Date]

- **MIG-capable GPU hardware (e.g., NVIDIA A100) must be available in the test cluster — tests cannot run on standard CI nodes**
- *Sign-off:* [Name/Date]
Comment on lines +142 to +148

⚠️ Potential issue | 🟡 Minor

Complete sign-offs for Test Limitations.

Sign-off fields at lines 145 and 148 must be completed before final STP approval.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` around lines 142 - 148, The "Test Limitations"
section contains two unresolved sign-off placeholders for the NVIDIA
A30-specific limitation and the MIG-capable GPU hardware requirement; complete
both sign-off fields by replacing "[Name/Date]" with the approver's full name
and approval date for each bullet (the bullet mentioning "NVIDIA A30 GPU" and
the bullet mentioning "MIG-capable GPU hardware (e.g., NVIDIA A100)"), ensuring
the completed entries are accurate and authoritative before final STP approval.


#### **2. Test Strategy**

**Functional**

- [x] **Functional Testing** — Validates that the feature works according to specified requirements and user stories
- *Details:* Functional tests cover node resource advertisement, VM creation with MIG vGPU, in-VM GPU visibility, and concurrent multi-VM execution on a shared GPU.

- [x] **Automation Testing** — Confirms test automation plan is in place for CI and regression coverage
- *Details:* All test scenarios will be automated using the standard openshift-virtualization-tests framework and integrated into the GPU-specific CI lane targeting MIG-capable nodes.

- [x] **Regression Testing** — Verifies that new changes do not break existing functionality
- *Details:* Existing GPU passthrough and vGPU tests will be included in regression scope to ensure MIG vGPU changes do not break non-MIG GPU workflows.

**Non-Functional**

- [ ] **Performance Testing**
- *Details:* Not applicable this cycle; GPU performance benchmarking inside VMs is out of scope (see Out of Scope).

- [ ] **Scale Testing**
- *Details:* Not applicable this cycle — limited to a single GPU node; scale testing deferred.

- [ ] **Security Testing**
- *Details:* N/A — no new RBAC or authentication changes introduced by this feature.

- [x] **Usability Testing**
- *Details:* Validate that node capacity/allocatable fields are set correctly. Validate that VM status and events provide clear feedback when a MIG vGPU device is successfully assigned or fails to be allocated.

- [ ] **Monitoring**
- *Details:* N/A — no new metrics or alerts introduced by this feature in this cycle.

**Integration & Compatibility**

- [x] **Compatibility Testing**
- *Details:* Tests run on the target OCP + OpenShift Virtualization version with NVIDIA GPU Operator. Ensure existing non-MIG vGPU and GPU passthrough tests remain unaffected.

- [ ] **Upgrade Testing**
- *Details:* Not in scope for this cycle.

- [x] **Dependencies**
- *Details:* Requires NVIDIA GPU Operator to be installed and MIG mode enabled on the target node before tests execute.

- [ ] **Cross Integrations**
- *Details:* N/A — no known cross-team integration impacts identified.

**Infrastructure**

- [ ] **Cloud Testing**
- *Details:* N/A — feature requires bare-metal nodes with MIG supported NVIDIA GPU hardware.

⚠️ Potential issue | 🟡 Minor

Hyphenate compound adjective.

"MIG supported" should be "MIG-supported" when used as a compound adjective modifying "NVIDIA GPU hardware."

📝 Proposed fix
-  - *Details:* N/A — feature requires bare-metal nodes with MIG supported NVIDIA GPU hardware.
+  - *Details:* N/A — feature requires bare-metal nodes with MIG-supported NVIDIA GPU hardware.
🧰 Tools
🪛 LanguageTool

[grammar] ~197-~197: Use a hyphen to join words.
Context: ...ature requires bare-metal nodes with MIG supported NVIDIA GPU hardware. #### **3...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` at line 197, Summary: The phrase "MIG
supported NVIDIA GPU hardware" must be hyphenated as a compound adjective. In
the document string "MIG supported NVIDIA GPU hardware" (found in
mig-vgpu-stp.md near the details line), replace it with "MIG-supported NVIDIA
GPU hardware" so the compound adjective correctly modifies "NVIDIA GPU
hardware"; ensure any other occurrences of the exact phrase "MIG supported" used
as a modifier are updated the same way.


#### **3. Test Environment**

- **Cluster Topology:** 3-master/3-worker bare-metal (at least one worker node with NVIDIA A30 GPU)

- **OCP & OpenShift Virtualization Version(s):** [e.g., OCP 4.21 with OpenShift Virtualization 4.21]

- **CPU Virtualization:** VT-x (Intel) or AMD-V enabled

- **Compute Resources:** GPU node requires an NVIDIA A30 GPU

- **Special Hardware:** NVIDIA A30 GPU on at least one worker node

- **Storage:** ocs-storagecluster-ceph-rbd-virtualization

- **Network:** OVN-Kubernetes, IPv4

- **Required Operators:** NVIDIA GPU Operator (with MIG mode and vGPU manager configured)

- **Platform:** Bare metal

- **Special Configurations:** GPU node must have MIG mode enabled and appropriate MIG profiles configured prior to test execution

#### **3.1. Testing Tools & Frameworks**

- **Test Framework:** openshift-virtualization-tests

- **CI/CD:** N/A

- **Other Tools:** `lspci` (available inside the RHEL VM guest OS) for in-VM GPU visibility validation

#### **3.2. DevOps & Cluster Provisioning**

MIG vGPU configuration must be enabled as part of the cluster deployment pipeline before any tests can execute. This work is tracked under [CNV-67712](https://redhat.atlassian.net/browse/CNV-67712).

- **Cluster Deploy Job:** The cluster deploy job must be extended to enable MIG vGPU configuration on nodes equipped with the NVIDIA A30 GPU. This includes:
- Enabling MIG mode on the GPU node during cluster provisioning
- Applying the appropriate MIG partition profile (e.g., `1g.6gb`) via the NVIDIA GPU Operator CRD

- **Tracking:** [CNV-67712 — Enable MIG vGPU configuration via the cluster deploy job](https://redhat.atlassian.net/browse/CNV-67712)
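
MIG profile names such as `1g.6gb` encode the number of GPU compute slices and the memory size in GB. The sketch below parses that basic form and estimates how many instances of one profile fit on a GPU with a given number of compute slices. It is an illustration only: the names `MigProfile`, `parse_mig_profile`, and `max_instances` are hypothetical, extended profile forms (for example those with suffixes) are rejected rather than handled, and the simple division ignores any placement rules the hardware may impose.

```python
import re
from typing import NamedTuple


class MigProfile(NamedTuple):
    compute_slices: int
    memory_gb: int


# Matches only the basic "<slices>g.<memory>gb" form, e.g. "1g.6gb".
_PROFILE = re.compile(r"^(\d+)g\.(\d+)gb$")


def parse_mig_profile(name: str) -> MigProfile:
    """Split a basic MIG profile name into compute slices and memory (GB)."""
    match = _PROFILE.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unsupported MIG profile name: {name!r}")
    return MigProfile(int(match.group(1)), int(match.group(2)))


def max_instances(profile: MigProfile, total_slices: int) -> int:
    """Upper bound on instances of `profile` given the GPU's compute slices.

    `total_slices` is supplied by the caller; placement constraints are
    not modelled, so treat the result as an estimate.
    """
    return total_slices // profile.compute_slices
```

A provisioning check could use the result to confirm that the configured profile leaves enough instances for the two-VM concurrency scenario.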

#### **4. Entry Criteria**

The following conditions must be met before testing can begin:

- [ ] Requirements and design documents are **approved and merged**

⚠️ Potential issue | 🟡 Minor

Requirements and design approval is a prerequisite.

Entry criterion "Requirements and design documents are approved and merged" is currently unchecked. This must be satisfied before testing begins.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` at line 243, Update the entry criterion
checklist in mig-vgpu-stp.md by marking the "Requirements and design documents
are **approved and merged**" item as satisfied: change the unchecked box "[ ]
Requirements and design documents are **approved and merged**" to a checked box
"[x] Requirements and design documents are **approved and merged**" and, if
possible, add a brief reference (PR/MR number or link) to the merged approval
artifact so reviewers can verify the prerequisite is met.

- [x] Test environment can be **set up and configured** (see Section II.3 - Test Environment)
- [x] NVIDIA GPU Operator is installed and MIG mode is enabled on the target GPU node
- [x] [CNV-67712](https://redhat.atlassian.net/browse/CNV-67712) is resolved — cluster deploy job enables MIG vGPU configuration automatically on the GPU node

#### **5. Risks**

**Timeline/Schedule**

- **Risk:** N/A
- **Mitigation:** N/A
- *Estimated impact on schedule:* N/A
- *Sign-off:* N/A

**Test Coverage**

- **Risk:** N/A
- **Mitigation:** N/A
- *Areas with reduced coverage:* N/A
- *Sign-off:* N/A

**Test Environment**

- **Risk:** Only one NVIDIA A30 GPU node exists in a single cluster; if the node or cluster is unavailable (e.g., hardware failure, cluster maintenance), all MIG vGPU testing is blocked with no fallback environment.
- **Mitigation:** None — no alternative GPU hardware or cluster is available; testing is fully dependent on this single node's availability.
- *Sign-off:* [Name/Date]
Comment on lines +266 to +268

⚠️ Potential issue | 🟡 Minor

Complete sign-off for Test Environment risk.

The risk acknowledgment at line 268 requires sign-off with name and date before final approval.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/mig-vgpu-stp.md` around lines 266 - 268, The Test Environment
risk block currently lacks an approver signature; update the risk
acknowledgement paragraph (the "**Risk:** Only one NVIDIA A30 GPU node..." /
"**Mitigation:** None" block) to include a completed sign-off by replacing
"[Name/Date]" with the approver's full name and the approval date in YYYY-MM-DD
(or the project's standard date format), ensuring the "*Sign-off:*" line reads
e.g. "*Sign-off:* Alice Smith / 2026-04-06" so the document has a clear,
traceable approval for the Test Environment risk.


### **III. Test Scenarios & Traceability**

what about these:

  1. migrate vm?
  2. restart vm?


<!-- This section links D/S requirements to test coverage, enabling reviewers to verify all requirements are tested. -->

- **[CNV-38740](https://redhat.atlassian.net/browse/CNV-38740)** — MIG vGPU node capacity and allocatable resources are updated after MIG configuration
- *Test Scenario:* [Tier 2] Verify node `Capacity` and `Allocatable` sections show the MIG vGPU device after MIG mode and profiles are configured on the GPU node
- *Priority:* P0

- **[CNV-38740](https://redhat.atlassian.net/browse/CNV-38740)** — RHEL VM with MIG vGPU device reaches Running state
- *Test Scenario:* [Tier 2] Verify a RHEL VM requesting a MIG vGPU device can be created and transitions to `Running` state
- *Priority:* P0

- **[CNV-38740](https://redhat.atlassian.net/browse/CNV-38740)** — MIG vGPU device is visible inside a running RHEL VM
- *Test Scenario:* [Tier 2] Verify the NVIDIA GPU device is visible inside the RHEL VM via `lspci -nnk | grep NVIDIA`
- *Priority:* P0

- **[CNV-38740](https://redhat.atlassian.net/browse/CNV-38740)** — Two RHEL VMs with one MIG vGPU each run concurrently on the same GPU
- *Test Scenario:* [Tier 2] Verify two RHEL VMs each assigned one MIG vGPU slice from the same A30 GPU can run in parallel without conflict
- *Priority:* P1

---

### **IV. Sign-off and Approval**

This Software Test Plan requires approval from the following stakeholders:

* **Reviewers:**
- dshchedr
- vsibirsk
- rnetser
- kbidarkar
- SiboWang1997
- jerry7z
- SamAlber
* **Approvers:**
- dshchedr
- vsibirsk
- rnetser