To file a bug I needed to reproduce this setup on the latest version of each piece of software, preferably without needing a full OpenStack deployment. Then I could start removing layers of complexity until I had the simplest reproducer of the bug.
Although no single part of reproducing this setup was particularly difficult, it did involve a fair few moving parts, so below I run through them (mainly for my own benefit when, next week, I've forgotten everything I did).
DPDK
I was lucky enough to have access to a server with two dpdk-compatible network cards which I could deploy using maas. The server had also been set up to have hugepages created on install. This was done by creating a custom maas tag and assigning it to the server:
ubuntu@maas:~⟫ maas maas tag read dpdk
Success.
Machine-readable output follows:
{
    "definition": "",
    "name": "dpdk",
    "resource_uri": "/MAAS/api/2.0/tags/dpdk/",
    "kernel_opts": "hugepages=103117 iommu=pt intel_iommu=on",
    "comment": "DPDK enabled machines"
}
After installing the development release of Ubuntu (eoan) on the server it was time to install the dpdk and ovs packages.
ubuntu@node-licetus:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu Eoan Ermine (development branch)
Release:        19.10
Codename:       eoan
ubuntu@node-licetus:~$ sudo apt-get -q install -y dpdk openvswitch-switch-dpdk
Update the system to use openvswitch-switch-dpdk for ovs-vswitchd.
ubuntu@node-licetus:~$ sudo update-alternatives --set ovs-vswitchd /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk
update-alternatives: using /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk to provide /usr/sbin/ovs-vswitchd (ovs-vswitchd) in manual mode
dpdk refers to the network cards by PCI address, so find the interface names from their MAC addresses and then the PCI address behind each interface:
ubuntu@node-licetus:~$ grep -E 'a0:36:9f:dd:31:bc|a0:36:9f:dd:31:be' /sys/class/net/*/address
/sys/class/net/enp3s0f0/address:a0:36:9f:dd:31:bc
/sys/class/net/enp3s0f1/address:a0:36:9f:dd:31:be
ubuntu@node-licetus:~$ ls -ld /sys/class/net/enp3s0f0 /sys/class/net/enp3s0f1 | awk '{print $NF}' | awk 'BEGIN {FS="/"} {print $6}'
0000:03:00.0
0000:03:00.1
To switch the network cards from being kernel managed to being managed by dpdk, /etc/dpdk/interfaces is updated and dpdk restarted. Once this is done the network cards will disappear from tools like ip.
root@node-licetus:~# ip -br addr show | grep enp
enp3s0f0         UP             fe80::a236:9fff:fedd:31bc/64
enp3s0f1         UP             fe80::a236:9fff:fedd:31be/64
root@node-licetus:~# echo "pci 0000:03:00.0 vfio-pci
> pci 0000:03:00.1 vfio-pci" >> /etc/dpdk/interfaces
root@node-licetus:~# systemctl restart dpdk
root@node-licetus:~# ip -br addr show | grep enp
root@node-licetus:~#
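The PCI lookup and the /etc/dpdk/interfaces update above could be scripted. What follows is only a minimal sketch and not part of the original procedure: the function names pci_addr_of_iface and dpdk_interface_lines and the SYSFS_ROOT variable are invented for illustration, and it assumes each interface's device entry in sysfs is a symlink to its PCI device directory, which is the usual layout for PCI network cards.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override so the lookup can be tried
# against a fake directory tree; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the PCI address behind a kernel-managed network interface.
# Must be run before the card is handed over to dpdk, because the
# interface disappears from /sys/class/net afterwards.
pci_addr_of_iface() {
    basename "$(readlink -f "$SYSFS_ROOT/class/net/$1/device")"
}

# Print one "pci <address> <driver>" line per PCI address, in the
# format appended to /etc/dpdk/interfaces above.
dpdk_interface_lines() {
    driver="$1"
    shift
    for addr in "$@"; do
        printf 'pci %s %s\n' "$addr" "$driver"
    done
}

# Example (as root, not executed here):
#   dpdk_interface_lines vfio-pci \
#       "$(pci_addr_of_iface enp3s0f0)" \
#       "$(pci_addr_of_iface enp3s0f1)" >> /etc/dpdk/interfaces
```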
DPDK enabled OVS
There are a few global settings which need to be applied when using ovs with dpdk. The first relates to hugepages. Hugepages need to be allocated per NUMA node. First check that the hugepages have been created as requested by the kernel option specified in maas:
root@node-licetus:~# grep -i hugepages_ /proc/meminfo
HugePages_Total:   103117
HugePages_Free:    103117
HugePages_Rsvd:        0
HugePages_Surp:        0
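The counts above are in pages rather than megabytes, so it can help to convert them before deciding how much memory to hand to ovs. A small sketch, assuming the standard HugePages_Free and Hugepagesize fields of /proc/meminfo; the function name hugepages_free_mb is made up for this example:

```shell
#!/bin/sh
# Print the amount of free hugepage memory in MB.
# Reads a meminfo-style file; defaults to /proc/meminfo.
hugepages_free_mb() {
    awk '
        /^HugePages_Free:/ { free = $2 }      # number of free pages
        /^Hugepagesize:/   { size_kb = $2 }   # size of one page in kB
        END { printf "%d\n", free * size_kb / 1024 }
    ' "${1:-/proc/meminfo}"
}

# Example (not executed here):
#   hugepages_free_mb
```

Note that this is a system-wide total; it says nothing about how the pages are spread across NUMA nodes.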
Now see how many NUMA nodes there are:
# ls -ld /sys/devices/system/node/node* | wc -l
2
I chose to allocate 4096MB to each of the NUMA nodes. This is done via the dpdk-socket-mem option, which takes a comma-delimited list of memory sizes in MB, one per NUMA node, as its value:
root@node-licetus:~# ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="4096,4096"
root@node-licetus:~#
Next whitelist the network cards in ovs via their PCI addresses:
root@node-licetus:~# ovs-vsctl set Open_vSwitch . other_config:dpdk-extra="--pci-whitelist 0000:03:00.0 --pci-whitelist 0000:03:00.1"
root@node-licetus:~#
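Both option values are simple strings built from the NUMA node count and the PCI addresses, so they could be generated rather than typed. The helpers below are a sketch with invented names (numa_node_count, socket_mem_value, pci_whitelist_args); they assume the same amount of memory is wanted on every node, which is what I chose here but is not a requirement.

```shell
#!/bin/sh
# Count NUMA nodes, like the ls | wc -l check above. The directory
# argument is a hypothetical override for trying this on a fake tree.
numa_node_count() {
    ls -d "${1:-/sys/devices/system/node}"/node[0-9]* 2>/dev/null | wc -l | tr -d ' '
}

# Repeat the same memory size (MB) once per NUMA node.
# socket_mem_value 4096 2 prints 4096,4096
socket_mem_value() {
    mb="$1"
    nodes="$2"
    value=""
    i=0
    while [ "$i" -lt "$nodes" ]; do
        value="${value:+$value,}$mb"
        i=$((i + 1))
    done
    printf '%s\n' "$value"
}

# Build the dpdk-extra string from a list of PCI addresses.
pci_whitelist_args() {
    args=""
    for addr in "$@"; do
        args="${args:+$args }--pci-whitelist $addr"
    done
    printf '%s\n' "$args"
}

# Example (as root, not executed here):
#   ovs-vsctl set Open_vSwitch . \
#       other_config:dpdk-socket-mem="$(socket_mem_value 4096 "$(numa_node_count)")"
#   ovs-vsctl set Open_vSwitch . \
#       other_config:dpdk-extra="$(pci_whitelist_args 0000:03:00.0 0000:03:00.1)"
```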
Finally restart openvswitch-switch and check the log:
root@node-licetus:~# systemctl restart openvswitch-switch
root@node-licetus:~#
root@node-licetus:~# grep --color -E 'PCI|DPDK|ovs-vswitchd' /var/log/openvswitch/ovs-vswitchd.log
2019-07-11T12:19:51.475Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2019-07-11T12:19:51.496Z|00007|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.11.0
2019-07-11T13:07:54.806Z|00009|dpdk|ERR|DPDK not supported in this copy of Open vSwitch.
2019-07-11T13:08:17.961Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2019-07-11T13:08:17.969Z|00007|dpdk|INFO|Using DPDK 18.11.0
2019-07-11T13:08:17.969Z|00008|dpdk|INFO|DPDK Enabled - initializing...
2019-07-11T13:08:17.969Z|00011|dpdk|INFO|Per port memory for DPDK devices disabled.
2019-07-11T13:08:17.969Z|00012|dpdk|INFO|EAL ARGS: ovs-vswitchd --pci-whitelist 0000:03:00.0 --pci-whitelist 0000:03:00.1 --socket-mem 4096,4096 --socket-limit 4096,4096 -l 0.
2019-07-11T13:08:26.915Z|00019|dpdk|INFO|EAL: PCI device 0000:03:00.0 on NUMA socket 0
2019-07-11T13:08:27.600Z|00023|dpdk|INFO|EAL: PCI device 0000:03:00.1 on NUMA socket 0
2019-07-11T13:08:28.090Z|00026|dpdk|INFO|DPDK Enabled - initialized
2019-07-11T13:08:28.097Z|00051|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.11.0
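In the log above the earlier "DPDK not supported" error belongs to the previous run of ovs-vswitchd, before the alternative was switched, and only the lines after the last "opened log file" matter. Checking that by eye gets tedious, so here is a rough sketch of an automated check. The function name dpdk_initialized is invented, and treating every "opened log file" line as a fresh daemon start is a heuristic that fits this log but may not hold in general (for example after log rotation).

```shell
#!/bin/sh
# Succeed only if the most recent run recorded in the log reached
# "DPDK Enabled - initialized" without logging a dpdk error.
# Takes the log path as an optional argument.
dpdk_initialized() {
    awk '
        /[|]vlog[|]INFO[|]opened log file/ { ok = 0; err = 0 }  # new run: reset state
        /[|]dpdk[|]ERR[|]/                 { err = 1 }
        /DPDK Enabled - initialized/       { ok = 1 }
        END { if (ok && !err) exit 0; exit 1 }
    ' "${1:-/var/log/openvswitch/ovs-vswitchd.log}"
}

# Example (not executed here):
#   dpdk_initialized && echo "dpdk is up"
```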
Bridge with DPDK Bonded NICs
As per the OVS docs, when creating the bridge the datapath_type needs to be set to netdev to tell ovs to run in userspace mode.
root@node-licetus:~# ovs-vsctl -- add-br br-test
root@node-licetus:~# ovs-vsctl -- set bridge br-test datapath_type=netdev
Now create the bond device and attach it to the bridge:
root@node-licetus:~# ovs-vsctl --may-exist add-bond br-test dpdk-bond0 dpdk-nic1 dpdk-nic2 \
> -- set Interface dpdk-nic1 type=dpdk options:dpdk-devargs=0000:03:00.0 \
> -- set Interface dpdk-nic2 type=dpdk options:dpdk-devargs=0000:03:00.1
ovs-vsctl seems quite happy to create the bond even if there is a problem, so it's worth taking a moment to check that the device exists in the bridge without any errors:
root@node-licetus:~# ovs-vsctl show
181b55d1-999a-464b-adf4-d80ca1790988
    Bridge br-test
        Port br-test
            Interface br-test
                type: internal
        Port "dpdk-bond0"
            Interface "dpdk-nic1"
                type: dpdk
                options: {dpdk-devargs="0000:03:00.0"}
            Interface "dpdk-nic2"
                type: dpdk
                options: {dpdk-devargs="0000:03:00.1"}
    ovs_version: "2.11.0"
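When an interface cannot be set up, ovs-vsctl show adds an error: line under it, so the check above can be reduced to looking for those lines. A minimal sketch; the function name show_has_no_errors is my own, and it only inspects whatever text it is given on stdin:

```shell
#!/bin/sh
# Read `ovs-vsctl show` output on stdin. Print any "error:" lines
# and return non-zero if at least one was found.
show_has_no_errors() {
    if grep -E '^[[:space:]]*error:'; then
        return 1
    fi
    return 0
}

# Example (not executed here):
#   ovs-vsctl show | show_has_no_errors && echo "bond looks fine"
```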
Part of my testing was to use jumbo frames so the final step is to set the mtu on the dpdk devices:
root@node-licetus:~# ovs-vsctl set Interface dpdk-nic1 mtu_request=9000
root@node-licetus:~# ovs-vsctl set Interface dpdk-nic2 mtu_request=9000
root@node-licetus:~#
Tap in Network Namespace Attached to a Bridge
Create a tap device called tap1 in the bridge:
root@node-licetus:~# ovs-vsctl add-port br-test tap1 -- set Interface tap1 type=internal
root@node-licetus:~#
Create a network namespace called ns1 and place tap1 into it:
root@node-licetus:~# ip netns add ns1
root@node-licetus:~# ip link set tap1 netns ns1
Bring up tap1 and assign it an IP address:
root@node-licetus:~# ip netns exec ns1 ip link set dev tap1 up
root@node-licetus:~# ip netns exec ns1 ip link set dev lo up
root@node-licetus:~# ip netns exec ns1 ip addr add 172.20.0.1/24 dev tap1
root@node-licetus:~# ip netns exec ns1 ip link set dev tap1 mtu 9000
Finally, the tap needs to be on vlan 2933, which is delivered to the network cards as part of a vlan trunk. To assign tap1 to the vlan with id 2933, the tap port needs to be tagged:
root@node-licetus:~# ovs-vsctl set port tap1 tag=2933
root@node-licetus:~#
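Since I expect to repeat this section more than once, the steps could be collected into one place. The sketch below only prints the commands used above with the names filled in, so they can be reviewed before being piped to a shell. The function name tap_in_netns_commands and its argument order are invented for this example.

```shell
#!/bin/sh
# Print (do not run) the commands that create a tagged internal port
# and move it into a network namespace.
# Arguments: bridge, port, namespace, address/prefix, mtu, vlan id.
tap_in_netns_commands() {
    br="$1"; port="$2"; ns="$3"; cidr="$4"; mtu="$5"; vlan="$6"
    cat <<EOF
ovs-vsctl add-port $br $port -- set Interface $port type=internal
ip netns add $ns
ip link set $port netns $ns
ip netns exec $ns ip link set dev $port up
ip netns exec $ns ip link set dev lo up
ip netns exec $ns ip addr add $cidr dev $port
ip netns exec $ns ip link set dev $port mtu $mtu
ovs-vsctl set port $port tag=$vlan
EOF
}

# Example (review the output first, then run as root; not executed here):
#   tap_in_netns_commands br-test tap1 ns1 172.20.0.1/24 9000 2933 | sh -e
```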
Testing
That really is it. Below is the resulting bridge and tap interface:
root@node-licetus:~# ovs-vsctl show
181b55d1-999a-464b-adf4-d80ca1790988
    Bridge br-test
        Port "tap1"
            tag: 2933
            Interface "tap1"
                type: internal
        Port br-test
            Interface br-test
                type: internal
        Port "dpdk-bond0"
            Interface "dpdk-nic1"
                type: dpdk
                options: {dpdk-devargs="0000:03:00.0"}
            Interface "dpdk-nic2"
                type: dpdk
                options: {dpdk-devargs="0000:03:00.1"}
    ovs_version: "2.11.0"
root@node-licetus:~# ip netns exec ns1 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
12: tap1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether 0a:96:ad:cd:2e:7f brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.1/24 scope global tap1
       valid_lft forever preferred_lft forever
    inet6 fe80::896:adff:fecd:2e7f/64 scope link
       valid_lft forever preferred_lft forever
ubuntu@node-husband:~$ ping -c3 172.20.0.1
PING 172.20.0.1 (172.20.0.1) 56(84) bytes of data.
64 bytes from 172.20.0.1: icmp_seq=1 ttl=64 time=0.215 ms
64 bytes from 172.20.0.1: icmp_seq=2 ttl=64 time=0.146 ms
64 bytes from 172.20.0.1: icmp_seq=3 ttl=64 time=0.144 ms
--- 172.20.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2046ms
rtt min/avg/max/mdev = 0.144/0.168/0.215/0.034 ms
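The ping above uses the default 56-byte payload, so it shows connectivity but does not exercise jumbo frames. To do that the payload has to fill the MTU: for IPv4 that is the MTU minus 20 bytes of IP header (assuming no IP options) and 8 bytes of ICMP header. A small sketch of the arithmetic; the function name max_ping_payload is invented, and the example command assumes the iputils ping on Linux and an MTU of 9000 along the whole path, including the sending host.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in a single IPv4 packet
# for the given MTU: MTU - 20 (IPv4 header) - 8 (ICMP header).
max_ping_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Example (not executed here); -M do forbids fragmentation so an
# undersized MTU anywhere on the path shows up as an error:
#   ping -M do -c3 -s "$(max_ping_payload 9000)" 172.20.0.1
```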
Final Thoughts
If you want to use dpdk with OpenStack, the OpenStack charms make it easy. The charms look after all of the above and much more.
If you want a more complete guide to dpdk, try "The new simplicity to consume dpdk".