Monday, 14 March 2022

OpenStack OVS Debugging Walk-through

I recently needed to investigate broken connectivity in an OpenStack deployment after an OVS upgrade. This blog is a walk-through of that process. In this setup the missing connectivity is between the Octavia units and the amphora units they manage, but there is nothing Octavia-specific about the debugging session. At the start all I knew was that the OpenStack Xena Octavia tests were failing. On closer inspection of the test runs I could see that the amphorae were stuck in a BOOTING state:
$ openstack loadbalancer amphora list -c id -c status -c role
+--------------------------------------+-----------+--------+
| id                                   | status    | role   |
+--------------------------------------+-----------+--------+
| c7dc61df-c7c7-4bce-9d79-216118a0f001 | BOOTING   | BACKUP |
| e64930ad-1cf0-4073-9827-45302a3aac52 | BOOTING   | MASTER |
+--------------------------------------+-----------+--------+

The corresponding guest units appeared to be up and available:
$ openstack server list --all-projects -c Name -c Status -c Networks
+----------------------------------------------+--------+----------------------------------------------------------------------------------------------------------------+
| Name                                         | Status | Networks                                                                                                       |
+----------------------------------------------+--------+----------------------------------------------------------------------------------------------------------------+
| amphora-e64930ad-1cf0-4073-9827-45302a3aac52 | ACTIVE | lb-mgmt-net=fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf; private=192.168.0.59; private_lb_fip_network=10.42.0.175  |
| amphora-c7dc61df-c7c7-4bce-9d79-216118a0f001 | ACTIVE | lb-mgmt-net=fc00:5896:ff23:44dc:f816:3eff:fe7e:15d1; private=192.168.0.194; private_lb_fip_network=10.42.0.231 |
+----------------------------------------------+--------+----------------------------------------------------------------------------------------------------------------+
However, looking in the Octavia logs revealed these errors:
2022-03-12 09:04:19.832 78382 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. 
  Retrying.: requests.exceptions.ConnectTimeout: 
  HTTPSConnectionPool(host='fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf', port=9443): 
    Max retries exceeded with url: // 
    (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7fcc3436dca0>,
                                   'Connection to fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf timed out.
                                   (connect timeout=10.0)'))
Testing from the Octavia unit confirms the lack of connectivity:
$ ping -c2 fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf
PING fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf(fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf) 56 data bytes

--- fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1027ms

$ nc -zvw2 fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf 9443
nc: connect to fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf port 9443 (tcp) timed out: Operation now in progress
At this point the issue could be anywhere between the Octavia unit and the guest. I assumed the most likely culprit was the guest itself. To check this I set up a tcpdump on the tap device which connects the guest to the virtual network. To find the tap device, first find the hypervisor hosting the guest:
$ openstack server show 6bdbe8b9-2765-48aa-a48f-4057e8038d9c -c name -c 'OS-EXT-SRV-ATTR:hypervisor_hostname' 
+-------------------------------------+-----------------------------------------------------+
| Field                               | Value                                               |
+-------------------------------------+-----------------------------------------------------+
| OS-EXT-SRV-ATTR:hypervisor_hostname | juju-7091df-zaza-d0217f8a78e8-8.project.serverstack |
| name                                | amphora-e64930ad-1cf0-4073-9827-45302a3aac52        |
+-------------------------------------+-----------------------------------------------------+
Find the port ID corresponding to the IP address we are trying to connect to:
$ openstack port list -cID -c'Fixed IP Addresses' | grep 'fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf'
| 93fff561-bfda-46b4-9d61-5f4ea8559a15 | ip_address='fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf', subnet_id='e4f5ce87-759a-465a-9d89-0e43066a6c6a' |
Now that the port is known we can connect to the hypervisor (juju-7091df-zaza-d0217f8a78e8-8) and search for the tap device corresponding to the port ID:
$ ip a | grep tap | grep 93fff561
25: tap93fff561-bf:  mtu 1458 qdisc fq_codel master ovs-system state UNKNOWN group default qlen 1000
Any traffic to the guest using the IP associated with the port will pass through this tap device. If we set off a ping on the Octavia unit then we can use tcpdump on the tap device to confirm that the ICMPv6 echo packets are successfully arriving.
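For example, leave a ping running on the Octavia unit (the earlier ping command without a count, so it keeps going):
$ ping fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf
Then, on the hypervisor, listen on the tap device: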
$ sudo tcpdump -i tap93fff561-bf -l 'icmp6[icmp6type]=icmp6-echo'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap93fff561-bf, link-type EN10MB (Ethernet), capture size 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel
The packets are not arriving so the break must be earlier on. Connectivity from the Octavia unit to the guest goes through a gre tunnel between the Octavia unit and the hypervisor hosting the guest. The next step is to set up listeners at either end of the gre tunnel. In this case the IP address of the Octavia unit is 172.20.0.139 and the hypervisor is 172.20.0.209. This allows us to find the corresponding gre interface in the ovs br-tun bridge. Looking at the output below from the hypervisor there are four gre tunnels, and gre-ac14008b is the tunnel to the Octavia unit since its remote_ip option corresponds to the IP of the Octavia unit.
$ sudo ovs-vsctl show | grep -A3 "Port gre"
        Port gre-ac14008b
            Interface gre-ac14008b
                type: gre
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="172.20.0.209", out_key=flow, remote_ip="172.20.0.139"}
        Port gre-ac140070
            Interface gre-ac140070
                type: gre
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="172.20.0.209", out_key=flow, remote_ip="172.20.0.112"}
        Port gre-ac1400f4
            Interface gre-ac1400f4
                type: gre
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="172.20.0.209", out_key=flow, remote_ip="172.20.0.244"}
        Port gre-ac1400df
            Interface gre-ac1400df
                type: gre
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="172.20.0.209", out_key=flow, remote_ip="172.20.0.223"}
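Repeating the same lookup on the Octavia unit, a grep along these lines (an illustrative sketch, assuming its ovs-vsctl output follows the same layout) picks out the tunnel whose remote_ip is the hypervisor's address:
$ sudo ovs-vsctl show | grep -B3 'remote_ip="172.20.0.209"'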

This reveals that the interface on the Octavia unit which handles the other end of the tunnel is called gre-ac1400d1. With this information we can set up tcpdump processes on both ends of the gre tunnel to check for traffic. On the hypervisor:
$ sudo ovs-tcpdump -i gre-ac14008b -l 'icmp6[icmp6type]=icmp6-echo'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on migre-ac14008b, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
On the octavia unit:
$ sudo ovs-tcpdump -i gre-ac1400d1 -l 'icmp6[icmp6type]=icmp6-echo'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on migre-ac1400d1, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
The packets are not managing to leave the Octavia host; in fact they are not even making it to the gre interface on the Octavia host. Before investigating further it's worth looking at a diagram showing the devices between o-hm0 and the tap device associated with the guest: roughly, o-hm0 plugs into br-int on the Octavia unit, a patch port connects br-int to br-tun, br-tun holds the gre tunnels, and the same chain in reverse on the hypervisor ends at the tap device.
This is starting to look like an issue with port security. In an ML2 OVS deployment port security is enforced by openflow rules in the br-int ovs bridge. To check this we can disable port security and retest. In this deployment the port on the Octavia unit is named octavia-health-manager-octavia-2-listen-port:
$ openstack port show octavia-health-manager-octavia-2-listen-port --fit-width -c name -c fixed_ips -c port_security_enabled -c security_group_ids
+-----------------------+--------------------------------------------------------------------------------------------------------+
| Field                 | Value                                                                                                  |
+-----------------------+--------------------------------------------------------------------------------------------------------+
| fixed_ips             | ip_address='fc00:5896:ff23:44dc:f816:3eff:fea8:7ef6', subnet_id='e4f5ce87-759a-465a-9d89-0e43066a6c6a' |
| name                  | octavia-health-manager-octavia-2-listen-port                                                           |
| port_security_enabled | True                                                                                                   |
| security_group_ids    | c01a6b7c-55e6-49ba-a141-d67401c42b6d                                                                   |
+-----------------------+--------------------------------------------------------------------------------------------------------+
To disable port security, the security group needs to be removed too; this can be done in a single command:
$ openstack port set --disable-port-security --no-security-group octavia-health-manager-octavia-2-listen-port
Then retest:
$ ping -c2  fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf
PING fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf(fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf) 56 data bytes
64 bytes from fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf: icmp_seq=1 ttl=64 time=2.34 ms
64 bytes from fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf: icmp_seq=2 ttl=64 time=1.45 ms

--- fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 1.454/1.895/2.337/0.441 ms
So removing port security fixes the issue. This strongly implies that the openflow rules are blocking the packets or failing to forward them to the gre interface. To continue debugging the issue we need to re-enable port security:
$ openstack port set --security-group lb-health-mgr-sec-grp --enable-port-security octavia-health-manager-octavia-2-listen-port
The next obvious place to look is the security group rules of the security group associated with the port.
$ openstack security group rule list lb-health-mgr-sec-grp
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Direction | Remote Security Group | Remote Address Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| 2579171b-e2b7-43aa-b18e-1ca50bf1e2ce | None        | IPv6      | ::/0      |            | egress    | None                  | None                 |
| 632a4117-83bd-4d08-8bc7-de8424f92b6a | ipv6-icmp   | IPv6      | ::/0      |            | ingress   | None                  | None                 |
| be9c3312-dd15-4f1b-93bf-17a5f1917e0d | None        | IPv4      | 0.0.0.0/0 |            | egress    | None                  | None                 |
| eba21f3a-e90b-4b05-b729-20eb54ce047e | udp         | IPv6      | ::/0      | 5555:5555  | ingress   | None                  | None                 |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
The first rule in the list allows any IPv6 egress traffic, so the echo request should not be blocked. Since the security group rules look fine we should now look at the flows associated with br-int in case there are any clues there. The complete set of flows associated with br-int can be seen by running sudo ovs-ofctl dump-flows br-int (since there are 84 of them I have omitted listing them here). dump-flows by default includes additional information such as the number of packets which have matched a rule. This allows us to see how many of those rules are actually being hit:
$ sudo ovs-ofctl dump-flows br-int |  grep -v n_packets=0, | wc -l
21
This is still a lot of rules to go through. Luckily ovs provides a trace utility which allows you to send a theoretical packet through the flows and see how it would be treated. To use it you need to collect information about the source and destination.
IN_PORT="o-hm0"                                      # port the packet enters br-int on
BRIDGE="br-int"
HM0_SRC="fc00:5896:ff23:44dc:f816:3eff:fea8:7ef6"    # IPv6 address of the health-manager port (o-hm0)
DST_IP="fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf"     # IPv6 address of the amphora
HM0_MAC="fa:16:3e:a8:7e:f6"                          # MAC of the health-manager port
DST_MAC="fe:16:3e:0d:9a:cf"                          # MAC of the amphora port
sudo ovs-appctl ofproto/trace $BRIDGE in_port="$IN_PORT",icmp6,dl_src=$HM0_MAC,dl_dst=$DST_MAC,ipv6_src=$HM0_SRC,ipv6_dst=$DST_IP
The output is verbose but, crucially, at the end the packet has been transferred to br-tun and you can see it being sent out of the tunnels:
...
22. dl_vlan=1, priority 1, cookie 0xfc585d7201b94e91
    pop_vlan
    set_field:0x2f1->tun_id
    output:2
     -> output to kernel tunnel
    output:3
     -> output to kernel tunnel
    output:5
     -> output to kernel tunnel
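To see which tunnels those output ports are, one option is ovs-ofctl show, which lists every port on br-tun together with its OpenFlow port number; something like the grep below (a quick sketch) narrows the listing down to the gre ports:
$ sudo ovs-ofctl show br-tun | grep gre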
This is strange because the trace utility seems to indicate that the packets should be sent down the gre tunnels, but this is not what we are seeing in practice. Since I manually constructed the test packet previously it is possible I omitted some data which would cause the packet to be handled differently (such as icmp_type). To construct a more authentic packet to test with we can use tcpdump again to capture a packet and then pass this through the trace utility. To do this, set off a continuous ping again and run tcpdump, this time against the o-hm0 interface, saving the captured packets to a file:
$ sudo tcpdump -i o-hm0 'icmp6[icmp6type]=icmp6-echo' -w ping.pcap
The packets in the file need to be converted to hex before they can be fed into the trace utility:
$ sudo ovs-pcap ping.pcap > ping.hex
We can now grab the first packet in the file and feed it through the trace utility:
$ IN_PORT="o-hm0"
$ BRIDGE="br-int"
$ PACKET=$(head -1 ping.hex)                                   
$ sudo ovs-appctl ofproto/trace $BRIDGE in_port="$IN_PORT" $PACKET
...
20. dl_vlan=1,dl_dst=fa:16:3e:0d:9a:cf, priority 2, cookie 0xfc585d7201b94e91
    pop_vlan
    set_field:0x2f1->tun_id
    output:5
     -> output to kernel tunnel
Once again the packet is correctly transferred to br-tun and out of a tunnel. In this case, rather than flooding the packet out of all the gre tunnels, it has been sent down the tunnel that corresponds to the correct hypervisor. Earlier we identified the interface as gre-ac1400d1 and we can see that it is port 5:
$ sudo ovs-vsctl list interface | grep -B1 -E "ofport.*\s5"
name                : gre-ac1400d1
ofport              : 5
At this point it seems that the flows are permitting the packets in theory but not in practice. This points strongly to the rules themselves not being the issue, but rather the way ovs is handling them. However, investigating ovs itself is likely going to be a long process involving compiling ovs, so I really want to make sure I have ruled out any other factors. The next thing to try is manually adding permissive flow rules and seeing if that fixes connectivity. In this deployment port 2 is the patch port between br-int and br-tun and the host port o-hm0 is port 3.
$ sudo ovs-vsctl list interface | grep -B1 -E "ofport.*\s(2|3)"
name                : patch-tun
ofport              : 2
--
name                : o-hm0
ofport              : 3
The commands below add rules that allow any icmp6 traffic on both ports:
sudo ovs-ofctl add-flow br-int table=0,priority=96,icmp6,in_port=2,actions=NORMAL
sudo ovs-ofctl add-flow br-int table=0,priority=96,icmp6,in_port=3,actions=NORMAL
Once again connectivity is restored:
$ ping -c2 fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf
PING fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf(fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf) 56 data bytes
64 bytes from fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf: icmp_seq=1 ttl=64 time=1.49 ms
64 bytes from fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf: icmp_seq=2 ttl=64 time=1.34 ms

--- fc00:5896:ff23:44dc:f816:3eff:fe0d:9acf ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 1.342/1.416/1.490/0.074 ms
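As a quick sanity check, the packet counters on the two new rules should now be climbing, confirming they are the flows actually being hit; a grep along these lines (illustrative, matching on the priority used above) shows them:
$ sudo ovs-ofctl dump-flows br-int | grep priority=96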
Now remove the rules again:
$ sudo ovs-ofctl --strict del-flows br-int table=0,priority=96,icmp6,in_port=2
$ sudo ovs-ofctl --strict del-flows br-int table=0,priority=96,icmp6,in_port=3
At this point it seems clear that the issue is with ovs. I did one other test before investigating ovs itself: dump the flows from a working deployment with ovs 2.15 (using the handy --no-stats option to remove the packet and byte counts), then upgrade and dump and diff the flows again. Other than the cookie values changing and the order of some rules with the same priority shifting, the rules were the same.
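A rough sketch of that comparison (the file names are just placeholders, the second dump is taken after the upgrade, and sorting simply makes the diff easier to read since rules with the same priority can move around):
$ sudo ovs-ofctl --no-stats dump-flows br-int | sort > flows-before.txt
$ sudo ovs-ofctl --no-stats dump-flows br-int | sort > flows-after.txt
$ diff flows-before.txt flows-after.txt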