
switchdev mode for physical functions when using SRIOV #213

@dann1

By default, SR-IOV-capable physical functions (PFs) operate in legacy mode. We are planning to extend the VF logic in the network drivers to handle VF passthrough while the PF is in switchdev mode.

We would like one-deploy to set up the smartnic mode when it is specified in the inventory, for example:

        - address: "0000:81:00.0"
          set_driver: omit
          set_numvfs: max
          set_mode: switchdev # default is legacy

Some examples of smartnic modes.

Mellanox node

[root@sm15 ~]# dpdk-devbind.py --status-dev net

Network devices using kernel driver
===================================
0000:01:00.0 'BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller 16d8' numa_node=0 if=vmnic0 drv=bnxt_en unused=vfio-pci
0000:01:00.1 'BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller 16d8' numa_node=0 if=vmnic1 drv=bnxt_en unused=vfio-pci
0000:81:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' numa_node=0 if=pf0,eth7,eth5,eth3,eth1,eth6,eth4,eth2,eth0 drv=mlx5_core unused=vfio-pci
0000:81:00.1 'MT27710 Family [ConnectX-4 Lx] 1015' numa_node=0 if=pf1 drv=mlx5_core unused=vfio-pci
0000:81:00.2 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf002 drv=mlx5_core unused=vfio-pci
0000:81:00.3 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf003 drv=mlx5_core unused=vfio-pci
0000:81:00.4 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf004 drv=mlx5_core unused=vfio-pci
0000:81:00.5 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf005 drv=mlx5_core unused=vfio-pci
0000:81:00.6 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf006 drv=mlx5_core unused=vfio-pci
0000:81:00.7 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf007 drv=mlx5_core unused=vfio-pci
0000:81:01.0 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf010 drv=mlx5_core unused=vfio-pci
0000:81:01.1 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf011 drv=mlx5_core unused=vfio-pci
0000:81:01.2 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf012 drv=mlx5_core unused=vfio-pci
0000:81:01.3 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf013 drv=mlx5_core unused=vfio-pci
0000:81:01.4 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf014 drv=mlx5_core unused=vfio-pci
0000:81:01.5 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf015 drv=mlx5_core unused=vfio-pci
0000:81:01.6 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf016 drv=mlx5_core unused=vfio-pci
0000:81:01.7 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf017 drv=mlx5_core unused=vfio-pci
0000:81:02.0 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf020 drv=mlx5_core unused=vfio-pci
0000:81:02.1 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=vf021 drv=mlx5_core unused=vfio-pci
[root@sm15 ~]# devlink dev eswitch show pci/0000:81:00.0
pci/0000:81:00.0: mode switchdev inline-mode link encap-mode basic
[root@sm15 ~]# devlink dev eswitch show pci/0000:81:00.1
pci/0000:81:00.1: mode legacy inline-mode none encap-mode basic
[root@sm15 ~]# devlink dev eswitch show pci/0000:01:00.0
pci/0000:01:00.0: mode legacy
[root@sm15 ~]# devlink dev eswitch show pci/0000:01:00.1
pci/0000:01:00.1: mode legacy

Intel node

[root@nfhhvbcnlb03 ~]# dpdk-devbind.py --status-dev net | grep -i e810
0000:16:00.0 'Ethernet Controller E810-XXV for SFP 159b' numa_node=0 if=ens1f0np0,ens1f0npf0vf0,ens1f0npf0vf1 drv=ice unused=vfio-pci
0000:16:00.1 'Ethernet Controller E810-XXV for SFP 159b' numa_node=0 if=ens1f1np1 drv=ice unused=vfio-pci
0000:40:00.0 'Ethernet Controller E810-XXV for SFP 159b' numa_node=0 if=eno12399np0 drv=ice unused=vfio-pci
0000:40:00.1 'Ethernet Controller E810-XXV for SFP 159b' numa_node=0 if=eno12409np1 drv=ice unused=vfio-pci
[root@nfhhvbcnlb03 ~]# dpdk-devbind.py --status-dev net | grep -i x710
0000:42:00.0 'Ethernet Controller X710 for 10GbE SFP+ 1572' numa_node=0 if=ens3f0 drv=i40e unused=vfio-pci
0000:42:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' numa_node=0 if=ens3f1 drv=i40e unused=vfio-pci
0000:6a:00.0 'Ethernet Controller X710 for 10GbE SFP+ 1572' numa_node=0 if=ens2f0 drv=i40e unused=vfio-pci
0000:6a:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' numa_node=0 if=ens2f1 drv=i40e unused=vfio-pci
[root@nfhhvbcnlb03 ~]# devlink dev eswitch show pci/0000:16:00.0
pci/0000:16:00.0: mode switchdev
[root@nfhhvbcnlb03 ~]# devlink dev eswitch show pci/0000:16:00.1
pci/0000:16:00.1: mode legacy
# x710 has i40e driver which doesn't support switchdev mode
[root@nfhhvbcnlb03 ~]# devlink dev eswitch show pci/0000:42:00.0
pci/0000:42:00.0:
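The three output shapes above (switchdev, legacy, and an empty answer from a driver without switchdev support) could be told apart with a small parser. This is only a sketch: `parse_eswitch_mode` and `eswitch_mode` are hypothetical helper names, and treating a missing `mode` field as "unsupported" is an assumption based on the i40e output shown here.

```shell
#!/bin/sh
# Hypothetical helper: extract the eswitch mode from one line of
# `devlink dev eswitch show` output.
# Prints "unsupported" when the line carries no "mode <value>" pair
# (assumption: this is how drivers without switchdev support answer).
parse_eswitch_mode() {
    printf '%s\n' "$1" | awk '
        {
            # Match the exact token "mode"; "inline-mode" and "encap-mode"
            # are separate tokens and are skipped.
            for (i = 1; i < NF; i++) {
                if ($i == "mode") { print $(i + 1); found = 1; exit }
            }
        }
        END { if (!found) print "unsupported" }'
}

# Hypothetical wrapper: query a PF by PCI address, e.g. eswitch_mode 0000:81:00.0
eswitch_mode() {
    parse_eswitch_mode "$(devlink dev eswitch show "pci/$1" 2>/dev/null)"
}
```

Keeping the parsing separate from the `devlink` call makes it possible to check the logic against captured output without touching real hardware.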

To activate switchdev mode:

pci="0000:16:00.0" # for example
devlink dev eswitch set pci/$pci mode switchdev

# successful activation
[root@nfhhvbcnlb03 ~]# devlink dev eswitch set pci/0000:16:00.0 mode switchdev
[root@nfhhvbcnlb03 ~]# echo $?

# failed activation. x710 has i40e driver which doesn't support switchdev mode
[root@nfhhvbcnlb03 ~]# devlink dev eswitch set pci/0000:42:00.0 mode switchdev 
kernel answers: Operation not supported
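A deployment task would probably want this step to be idempotent and to report unsupported devices cleanly. The following is a minimal sketch under that assumption; `set_eswitch_mode` is a hypothetical name, not existing one-deploy code, and it relies only on the `devlink dev eswitch show` and `devlink dev eswitch set` commands shown above.

```shell
#!/bin/sh
# Hypothetical helper: put a PF into the requested eswitch mode.
# Usage: set_eswitch_mode <pci-address> [mode]   (mode defaults to switchdev)
# Returns 0 if the PF is already in that mode or the change succeeds,
# 1 otherwise. The body runs in a subshell so its variables stay local.
set_eswitch_mode() (
    pci="$1"
    mode="${2:-switchdev}"

    # Read the current mode; empty if the driver reports none.
    current=$(devlink dev eswitch show "pci/$pci" 2>/dev/null |
        awk '{ for (i = 1; i < NF; i++) if ($i == "mode") { print $(i + 1); exit } }')

    if [ "$current" = "$mode" ]; then
        echo "$pci: already in $mode mode"
        return 0
    fi

    if devlink dev eswitch set "pci/$pci" mode "$mode"; then
        echo "$pci: switched to $mode mode"
    else
        echo "$pci: could not set $mode mode (the driver may not support it)" >&2
        return 1
    fi
)
```

For example, `set_eswitch_mode 0000:16:00.0 switchdev` would be a no-op on a PF that is already in switchdev mode, while a device such as the X710 above would make the function return a non-zero status.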

Activating switchdev mode creates new representor interfaces under the PF. Consider possible issues with the existing PCI management logic, which may now see several network interfaces for a single PCI address.

[root@sm15 ~]# devlink dev eswitch show pci/0000:81:00.0
pci/0000:81:00.0: mode switchdev inline-mode link encap-mode basic
[root@sm15 ~]# ls -l /sys/bus/pci/devices/0000\:81\:00.0/net/
total 0
drwxr-xr-x. 5 root root 0 May 14 11:36 eth0
drwxr-xr-x. 5 root root 0 May 14 11:36 eth1
drwxr-xr-x. 5 root root 0 May 14 11:36 eth2
drwxr-xr-x. 5 root root 0 May 14 11:36 eth3
drwxr-xr-x. 5 root root 0 May 14 11:36 eth4
drwxr-xr-x. 5 root root 0 May 14 11:36 eth5
drwxr-xr-x. 5 root root 0 May 14 11:36 eth6
drwxr-xr-x. 5 root root 0 May 14 11:36 eth7
drwxr-xr-x. 5 root root 0 May 13 10:54 pf0
[root@sm15 ~]# dpdk-devbind.py --status-dev net | grep -i pf0
0000:81:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' numa_node=0 if=pf0,eth7,eth5,eth3,eth1,eth6,eth4,eth2,eth0 drv=mlx5_core unused=vfio-pci
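To tell the PF netdev apart from its representors, the PCI management logic could read each interface's `phys_port_name` attribute in sysfs. This is a sketch, not tested against the nodes above: `list_pf_netdevs` is a hypothetical helper, the optional sysfs-root argument exists only to make it testable, and the exact `phys_port_name` values (for instance `pf0vf0` for a VF representor) depend on the driver and kernel version.

```shell
#!/bin/sh
# Hypothetical helper: print "<netdev> <phys_port_name>" for every network
# interface registered under a PF's PCI address.
# Usage: list_pf_netdevs <pci-address> [sysfs-root]   (sysfs-root defaults to /sys)
# The body runs in a subshell so its variables stay local.
list_pf_netdevs() (
    pci="$1"
    sysfs="${2:-/sys}"

    for dev in "$sysfs/bus/pci/devices/$pci/net"/*; do
        [ -d "$dev" ] || continue
        # Reading phys_port_name can fail on drivers that do not implement it.
        port=$(cat "$dev/phys_port_name" 2>/dev/null) || port=""
        echo "$(basename "$dev") ${port:-unknown}"
    done
)
```

Calling `list_pf_netdevs 0000:81:00.0` on the Mellanox node would print one line per entry in the `net/` directory listed above.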

When using this mode, hw-tc-offload should be enabled on the PF and on the VF representor interfaces:

ethtool -K $PF1 hw-tc-offload on
ethtool -K $VF_PR hw-tc-offload on
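Since the representors show up under the PF's `net/` directory in the listings above, the two commands could be generalized into a loop over that directory. This is a sketch under that assumption; `enable_tc_offload` is a hypothetical helper, and on drivers that expose representors elsewhere the loop would miss them.

```shell
#!/bin/sh
# Hypothetical helper: enable hw-tc-offload on every netdev found under a PF
# (the PF itself plus its VF representors, assuming they share the PF's
# sysfs net/ directory as in the listings above).
# Usage: enable_tc_offload <pci-address> [sysfs-root]   (sysfs-root defaults to /sys)
# Returns 1 if any interface could not be configured.
# The body runs in a subshell so its variables stay local.
enable_tc_offload() (
    pci="$1"
    sysfs="${2:-/sys}"
    rc=0

    for dev in "$sysfs/bus/pci/devices/$pci/net"/*; do
        [ -d "$dev" ] || continue
        ifname=$(basename "$dev")
        if ! ethtool -K "$ifname" hw-tc-offload on; then
            echo "$ifname: could not enable hw-tc-offload" >&2
            rc=1
        fi
    done
    return "$rc"
)
```

The function keeps going after a failure so that one misbehaving interface does not leave the remaining representors unconfigured.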

Some references
