CNF-23565: Dedicate CPU resources for DPDK-based vSwitch/vRouter #2001
Tal-or wants to merge 2 commits into
@Tal-or: This pull request references CNF-23565, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the branch this PR targets. The story was expected to target the "5.0.0" version, but no target version was set.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
…Router
Adds enhancement proposal for dedicating CPUs exclusively for infrastructure networking workloads (OVS-DPDK, OpenPErouter). Introduces two new PerformanceProfile API fields: spec.cpu.dedicated and spec.net.disableOvsDynamicPinning.
Tracking: CNF-22582, RFE-8921
AIA Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
192566c to c941e7f
jmencak left a comment
Looks good to me overall. Have a couple of questions to improve my understanding of the problem and found a few nits.
| ### Non-Goals
|
| - Managing the lifecycle of OVS-DPDK processes themselves (PMD thread creation, DPDK EAL
I understand this is a non-goal, however, I'd like to understand how the OVS-DPDK processes run in OpenShift. So, they're not managed by kubelet at all? Do they run as regular userspace processes outside of OpenShift control?
> So, they're not managed by kubelet at all?

They are not.

> Do they run as regular userspace processes outside of OpenShift control?

AFAIU, yes.
The user will bypass the kernel networking stack, and all infrastructure communication will go through DPDK, which talks directly to the NICs.
I believe @MarSik can add more details and clarity about that.
That is how I understand the feature as well. The whole idea is that this stack is a regular systemd-controlled OS service, which is why it needs a dedicated set of CPUs isolated from the OS and from OCP.
9342e11 to 60e437d
Added as a separate commit for clarity. Will squash once it is ready to merge.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
60e437d to b632c5c
@Tal-or: all tests passed!
Thank you for the changes.
/cc @browsell
bartwensley left a comment
A few questions and comments, mostly regarding `isolcpus=domain`.
| `spec.net.disableOvsDynamicPinning` to prevent OVN-Kubernetes from dynamically changing `ovs-vswitchd` and
| `ovsdb-server` processes' CPU affinity.
| When `dedicated` is set, the operator automatically configures full kernel-level
| isolation (`isolcpus=domain,managed_irq`, `nohz_full`, `rcu_nocbs`), adds the dedicated CPUs to
OpenShift doesn't use "domain" AFAIK so I don't think that should be part of the proposed enhancement. That affects several places in the document.
| - Provide a `dedicated` CPU set in the PerformanceProfile API that is fully excluded from Kubelet
|   scheduling (all QoS classes), OS daemons, and kernel housekeeping.
| - Automatically ban dedicated CPUs from irqbalance and configure `isolcpus=domain,managed_irq`
Isn't adding domain here going to disable scheduling for ALL isolated CPUs which will break application pods?
| - A **MachineConfig** that:
|   - Does NOT include the OVS dynamic pinning trigger file (because `disableOvsDynamicPinning`
|     is `true`).
|   - Configures the irqbalance service with `IRQBALANCE_BANNED_CPUS` set to the hex mask of
The cri-o service updates IRQBALANCE_BANNED_CPUS to exclude isolated containers. Will it require any updates to account for the new dedicated CPUs?
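To make the "hex mask" in the quoted MachineConfig bullet concrete, here is a minimal sketch of how a CPU list could be turned into a bitmask string. The function names and the example CPU set `4-5` are hypothetical and not taken from the proposal, and the sketch does not handle any grouping that very large CPU counts might need.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Parse a Linux-style CPU list such as "2-3,6-7" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_to_hex_mask(cpus: set[int]) -> str:
    """Return a hex bitmask string with bit N set for every CPU N in `cpus`."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


if __name__ == "__main__":
    # Hypothetical dedicated set; the proposal does not fix these values.
    dedicated = parse_cpu_list("4-5")
    print(cpus_to_hex_mask(dedicated))  # prints "30" (bits 4 and 5)
```

Any component that rewrites the banned mask at runtime would need to OR the dedicated bits back in rather than overwrite them, which is the concern raised in the comment above.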
| and interrupt handling.
| - CPUs 2-3,6-7 are isolated for application workloads (Guaranteed QoS pods).
|
| 6. OVN-Kubernetes (or the network operator) starts OVS-DPDK and pins PMD threads to the
What component will be changed to do this pinning and how will it determine which CPUs to use?
| ## Open Questions
|
| 1. Should the API field be named `dedicated` or something more descriptive like
I think using `dedicated` is going to be confusing: it sounds very similar to isolated (or reserved). I'd prefer more specific naming.
| `infrastructureNetworking` or `dpdkCpus`? The current name is generic enough to support
| future use cases beyond DPDK but may be too vague.
|
| 2. Should `isolcpus=domain,managed_irq` be used together (both domain isolation and managed
As commented above, I don't see how it would be possible to use "domain" as it would apply to the isolated (i.e. application) CPUs as well.
| - Hardware interrupts are not routed to dedicated CPUs (`/proc/interrupts` verification).
| - No host processes are running on dedicated CPUs (`ps -eo pid,psr,comm` verification).
|
| ### Integration Tests
I think a test is required to ensure that `IRQBALANCE_BANNED_CPUS` is handled correctly when isolated containers (with the `irq-load-balancing.crio.io` annotation) are created and deleted.
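The invariant such a test would check can be sketched with a toy model. This is not cri-o's implementation: the class, its methods, and the CPU numbers are all hypothetical, and the model only assumes that the banned set should be the dedicated CPUs plus the CPUs of currently running annotated containers.

```python
class BannedCpuTracker:
    """Toy model: banned CPUs = dedicated CPUs + CPUs of live annotated containers."""

    def __init__(self, dedicated: set[int]) -> None:
        self.dedicated = set(dedicated)
        self.containers: dict[str, set[int]] = {}

    def add_container(self, name: str, cpus: set[int]) -> None:
        # Models creating a container that opted out of IRQ load balancing.
        self.containers[name] = set(cpus)

    def remove_container(self, name: str) -> None:
        # Models deleting that container; its CPUs become bannable again.
        self.containers.pop(name, None)

    def banned(self) -> set[int]:
        result = set(self.dedicated)
        for cpus in self.containers.values():
            result |= cpus
        return result


if __name__ == "__main__":
    tracker = BannedCpuTracker(dedicated={4, 5})   # hypothetical dedicated CPUs
    tracker.add_container("pod-a", {2, 3})         # hypothetical isolated container
    assert tracker.banned() == {2, 3, 4, 5}
    tracker.remove_container("pod-a")
    # The dedicated CPUs must remain banned after the container is gone.
    assert tracker.banned() == {4, 5}
```

A real integration test would read the actual irqbalance configuration on the node before and after the container lifecycle events and compare it against this expectation.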