Wednesday 8 July 2015

Neutron Router High Availability? As easy as "juju set"

The Juju Charms for deploying OpenStack have just had their quarterly update (the 15.04 release). The charms now support Neutron's Layer 3 High Availability feature, which uses the Virtual Router Redundancy Protocol (VRRP). When enabled, it allows Neutron to quickly fail a router over to another Neutron gateway if the node hosting the active router is lost. The feature was introduced in Juno, where it was marked as experimental, so I would recommend only using it with Kilo or later deployments.

Enabling Router HA:

L3 HA in Kilo requires that DVR and L2 Population are disabled, so to enable it in the charms:
juju set neutron-api enable-l3ha=True
juju set neutron-api enable-dvr=False
juju set neutron-api l2-population=False

The number of L3 agents each HA router is scheduled on (one hosts the active router, the others act as standbys) can also be configured:
juju set neutron-api max-l3-agents-per-router=2
juju set neutron-api min-l3-agents-per-router=2
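As a minimal sketch, the settings above can be wrapped in a shell function that sanity-checks the agent counts before printing the juju set commands (a dry run; pipe the output to sh to apply it). The name l3ha_config_cmds is hypothetical, and the checks reflect my understanding of Neutron's own validation rather than anything the charm does: min-l3-agents-per-router must be at least 2, and max-l3-agents-per-router must be either 0 (schedule on every agent) or no smaller than the minimum.

```shell
# Hypothetical helper: print (dry run) the juju commands that enable L3 HA on
# the neutron-api charm, after checking the requested agent counts.
# Assumption: Neutron rejects min < 2, and max must be 0 (all agents) or >= min.
# Usage: l3ha_config_cmds 2 2 | sh
l3ha_config_cmds() {
    min_agents=$1
    max_agents=$2
    if [ "$min_agents" -lt 2 ]; then
        echo "error: min-l3-agents-per-router must be at least 2" >&2
        return 1
    fi
    if [ "$max_agents" -ne 0 ] && [ "$max_agents" -lt "$min_agents" ]; then
        echo "error: max-l3-agents-per-router must be 0 or >= min" >&2
        return 1
    fi
    # L3 HA cannot be combined with DVR or L2 Population in Kilo.
    echo "juju set neutron-api enable-l3ha=True"
    echo "juju set neutron-api enable-dvr=False"
    echo "juju set neutron-api l2-population=False"
    echo "juju set neutron-api max-l3-agents-per-router=$max_agents"
    echo "juju set neutron-api min-l3-agents-per-router=$min_agents"
}
```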

Creating an HA-enabled router:

Once enable-l3ha is set, the charms make HA the default for new routers, so no extra flag is needed:
$ neutron router-create ha-router
Created a new router:
| Field                 | Value                                |
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | 64ff0665-5600-433c-b2d8-33509ce88eb1 |
| name                  | test2                                |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | 8e8b1426508f42aeaff783180d7b2ef4     |
/!\ Currently a router cannot be switched into or out of HA mode after it has been created:
$ neutron router-update 64ff0665-5600-433c-b2d8-33509ce88eb1  --ha=False
400-{u'NeutronError': {u'message': u'Cannot update read-only attribute ha', u'type': u'HTTPBadRequest', u'detail': u''}}
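Since the ha attribute is fixed at creation time, it can be worth confirming it before relying on a router. A minimal sketch, assuming the ASCII table layout the neutron CLI prints above: the hypothetical table_field function reads that table on stdin and prints the value of one field.

```shell
# Hypothetical helper: print the value of one field from neutron CLI table
# output read on stdin, e.g.  neutron router-show ha-router | table_field ha
table_field() {
    awk -F'|' -v key="$1" '
        {
            name = $2
            value = $3
            # Trim the padding the CLI adds around each cell.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }
    '
}
```

For an HA router, `neutron router-show ha-router | table_field ha` should print True.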

Under the hood:

Below is a worked example that follows the creation of an HA-enabled router and shows the components Neutron creates implicitly. In this environment the following networks have already been created:
$ neutron net-list
| id                                   | name    | subnets                                              |
| 32ba54bc-804e-489e-8903-b8dc0ed535f7 | private | a3ed1cc4-3451-418f-a412-80ad8cca2ec4 |
| c9a3bc24-6390-4220-b136-bc0edf1fe2f2 | ext_net | 76098d4d-bfa4-4f96-89e0-78c851d80dac     |

$ neutron subnet-list 
| id                                   | name           | cidr            | allocation_pools                                   |
| 76098d4d-bfa4-4f96-89e0-78c851d80dac | ext_net_subnet |     | {"start": "", "end": ""}     |
| a3ed1cc4-3451-418f-a412-80ad8cca2ec4 | private_subnet | | {"start": "", "end": ""} |

In this environment there are three neutron-gateway units:
$ juju status neutron-gateway --format=short

- neutron-gateway/0: (started)
- neutron-gateway/1: (started)
- neutron-gateway/2: (started)

With their corresponding L3 agents:
$ neutron agent-list | grep "L3 agent"
| 28f227d8-e620-4478-ba36-856fb0409393 | L3 agent           | juju-lytrusty-machine-7  | :-)   | True           |
| 8d439f33-e4f8-4784-a617-5b3328bab9e3 | L3 agent           | juju-lytrusty-machine-6  | :-)   | True           |
| bdc00c2a-77c0-45c3-ab8a-ceca3319832d | L3 agent           | juju-lytrusty-machine-8  | :-)   | True           |

No router has been defined yet, so the only network namespace present is the DHCP namespace for the private network:
$ juju run --service neutron-gateway --format=yaml "ip netns list"
- MachineId: "6"
  Stdout: ""
  UnitId: neutron-gateway/0
- MachineId: "7"
  Stdout: |
  UnitId: neutron-gateway/1
- MachineId: "8"
  Stdout: ""
  UnitId: neutron-gateway/2

Creating a router adds a qrouter-$ROUTER_ID netns to two of the gateway nodes (since min-l3-agents-per-router=2 and max-l3-agents-per-router=2):

$ neutron router-create ha-router
Created a new router:
| Field                 | Value                                |
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | 192ba483-c060-4ee2-86ad-fe38ea280c93 |
| name                  | ha-router                            |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | 8e8b1426508f42aeaff783180d7b2ef4     |

Neutron has assigned this router to two of the three agents:

$ ROUTER_ID="192ba483-c060-4ee2-86ad-fe38ea280c93"
$ neutron l3-agent-list-hosting-router $ROUTER_ID
| id                                   | host                    | admin_state_up | alive |
| 28f227d8-e620-4478-ba36-856fb0409393 | juju-lytrusty-machine-7 | True           | :-)   |
| bdc00c2a-77c0-45c3-ab8a-ceca3319832d | juju-lytrusty-machine-8 | True           | :-)   |
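The number of hosting agents follows from the two charm options and the number of live L3 agents. The sketch below encodes my reading of the Kilo scheduler (an assumption, not something the charms expose): a max of 0 means every live agent, otherwise the router goes to min(max, live) agents, and creation fails if fewer than min agents are alive. expected_ha_agents is a hypothetical name.

```shell
# Hypothetical helper: how many L3 agents an HA router should be scheduled on.
# Arguments: min-l3-agents-per-router, max-l3-agents-per-router, live L3 agents.
# Assumption: max=0 means "every live agent"; fewer than min live agents fails.
# Usage: expected_ha_agents 2 2 3   (prints 2, as in this deployment)
expected_ha_agents() {
    min_agents=$1
    max_agents=$2
    live_agents=$3
    if [ "$live_agents" -lt "$min_agents" ]; then
        echo "error: only $live_agents L3 agents alive, need $min_agents" >&2
        return 1
    fi
    if [ "$max_agents" -eq 0 ] || [ "$max_agents" -gt "$live_agents" ]; then
        echo "$live_agents"
    else
        echo "$max_agents"
    fi
}
```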
A netns for the new router will have been created in neutron-gateway/1 and neutron-gateway/2:
$  juju run --service neutron-gateway --format=yaml "ip netns list"
- MachineId: "6"
  Stdout: ""
  UnitId: neutron-gateway/0
- MachineId: "7"
  Stdout: |
  UnitId: neutron-gateway/1
- MachineId: "8"
  Stdout: |
  UnitId: neutron-gateway/2
A keepalived process is spawned in each qrouter netns, and these processes communicate over a dedicated HA network which Neutron creates implicitly when the HA-enabled router is added:
$ neutron net-list
| id                                   | name                                               | subnets                                               |
| 32ba54bc-804e-489e-8903-b8dc0ed535f7 | private                                            | a3ed1cc4-3451-418f-a412-80ad8cca2ec4  |
| af9cad57-b4fe-465d-b439-b72aaec16309 | HA network tenant 8e8b1426508f42aeaff783180d7b2ef4 | f0cb279b-36fe-43dc-a03b-8eb8b99e7f0b |
| c9a3bc24-6390-4220-b136-bc0edf1fe2f2 | ext_net                                            | 76098d4d-bfa4-4f96-89e0-78c851d80dac      |

Neutron creates a dedicated interface in the qrouter netns for this traffic.

$ neutron port-list
| id                                   | name                                            | mac_address       | fixed_ips                                                                            |
| 72326e9b-67e8-403a-80c3-4bac9748cdb6 |                                                 | fa:16:3e:aa:2c:96 | {"subnet_id": "a3ed1cc4-3451-418f-a412-80ad8cca2ec4", "ip_address": ""}  |
| 89c47030-f849-41ed-96e6-a36a3a696eeb | HA port tenant 8e8b1426508f42aeaff783180d7b2ef4 | fa:16:3e:d4:fc:a1 | {"subnet_id": "f0cb279b-36fe-43dc-a03b-8eb8b99e7f0b", "ip_address": ""} |
| 9ce2b6ac-9983-4ffd-ae97-6400682021c8 | HA port tenant 8e8b1426508f42aeaff783180d7b2ef4 | fa:16:3e:a5:76:e9 | {"subnet_id": "f0cb279b-36fe-43dc-a03b-8eb8b99e7f0b", "ip_address": ""} |

$  juju run --unit neutron-gateway/1,neutron-gateway/2 --format=yaml "ip netns exec qrouter-$ROUTER_ID ip addr list | grep  ha-"
- MachineId: "7"
  Stdout: |
    2: ha-89c47030-f8:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        inet brd scope global ha-89c47030-f8
  UnitId: neutron-gateway/1
- MachineId: "8"
  Stdout: |
    2: ha-9ce2b6ac-99:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        inet brd scope global ha-9ce2b6ac-99
        inet scope global ha-9ce2b6ac-99
  UnitId: neutron-gateway/2
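The ha- interfaces can be matched back to the HA ports listed earlier: in this output each device name is the prefix ha- followed by the start of the port ID, cut to 14 characters (the gateway device qg-288da587-97 that appears later has the same shape). A small hypothetical helper, assuming that naming convention holds:

```shell
# Hypothetical helper: derive the Linux device name for a Neutron port,
# assuming the convention seen in this output: prefix + port ID, cut to 14 chars.
# Usage: neutron_dev_name ha- 89c47030-f849-41ed-96e6-a36a3a696eeb
#        (prints ha-89c47030-f8)
neutron_dev_name() {
    prefix=$1
    port_id=$2
    printf '%s%s\n' "$prefix" "$port_id" | cut -c1-14
}
```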

The router's current VRRP state is written to /var/lib/neutron/ha_confs/$ROUTER_ID/state, which can be queried to find out which node is currently the master:

$ juju run --unit  neutron-gateway/1,neutron-gateway/2 "cat /var/lib/neutron/ha_confs/$ROUTER_ID/state"
- MachineId: "7"
  Stdout: backup
  UnitId: neutron-gateway/1
- MachineId: "8"
  Stdout: master
  UnitId: neutron-gateway/2
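With more gateways, picking the master out by eye gets tedious. A minimal sketch, assuming the juju run YAML layout shown above (Stdout listed before UnitId, either inline or as a literal block): the hypothetical ha_master_units function prints every unit reporting master. More than one line of output would suggest a split-brain condition worth investigating.

```shell
# Hypothetical helper (not part of Juju or Neutron): read juju run YAML output
# on stdin and print the UnitId of every unit whose Stdout is "master".
# Usage: juju run --unit neutron-gateway/1,neutron-gateway/2 \
#          "cat /var/lib/neutron/ha_confs/$ROUTER_ID/state" | ha_master_units
ha_master_units() {
    awk '
        /^- / { state = ""; in_block = 0; next }      # a new unit entry
        /^  Stdout:/ {
            if ($2 ~ /^[|]/) {                        # literal block: value on next line
                in_block = 1
                state = ""
            } else {                                  # inline value, e.g. backup
                in_block = 0
                state = $2
            }
            next
        }
        in_block && /^    / { if (state == "") state = $1; next }
        /^  [A-Za-z]/ { in_block = 0 }                # any other key ends the block
        /^  UnitId:/ && state == "master" { print $2 }
    '
}
```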

Plugging the router into the networks:

$ neutron router-gateway-set $ROUTER_ID c9a3bc24-6390-4220-b136-bc0edf1fe2f2
Set gateway for router 192ba483-c060-4ee2-86ad-fe38ea280c93
$ neutron router-interface-add $ROUTER_ID a3ed1cc4-3451-418f-a412-80ad8cca2ec4
Added interface 4ffe673c-b528-4891-b9ec-3ebdcfc146e2 to router 192ba483-c060-4ee2-86ad-fe38ea280c93.

The router now has an address on the external network (see external_gateway_info below); like its other interface addresses, it is managed by keepalived:

$ neutron router-show $ROUTER_ID                            
| Field                 | Value                                                                                                                                                                                  |
| admin_state_up        | True                                                                                                                                                                                   |
| distributed           | False                                                                                                                                                                                  |
| external_gateway_info | {"network_id": "c9a3bc24-6390-4220-b136-bc0edf1fe2f2", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "76098d4d-bfa4-4f96-89e0-78c851d80dac", "ip_address": ""}]} |
| ha                    | True                                                                                                                                                                                   |
| id                    | 192ba483-c060-4ee2-86ad-fe38ea280c93                                                                                                                                                   |
| name                  | ha-router                                                                                                                                                                              |
| routes                |                                                                                                                                                                                        |
| status                | ACTIVE                                                                                                                                                                                 |
| tenant_id             | 8e8b1426508f42aeaff783180d7b2ef4                                                                                                                                                       |

Since neutron-gateway/2 has been elected master, it holds the router's external IP on the qg- interface in its netns:
$ juju run --unit neutron-gateway/1,neutron-gateway/2 --format=yaml "ip netns exec qrouter-192ba483-c060-4ee2-86ad-fe38ea280c93 ip addr list | grep  10.5.150"
- MachineId: "7"
  ReturnCode: 1
  Stdout: ""
  UnitId: neutron-gateway/1
- MachineId: "8"
  Stdout: |2
        inet scope global qg-288da587-97
  UnitId: neutron-gateway/2
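The grep exits non-zero on the unit that does not hold the address, and Juju reports that as ReturnCode (judging by the output above, the field is omitted when the command succeeds). A hypothetical units_with_match helper, assuming that layout, prints only the units where the command succeeded, i.e. the current holder of the router IP:

```shell
# Hypothetical helper: read juju run YAML output on stdin and print the UnitId
# of each entry without a non-zero ReturnCode (assumed omitted when it is 0).
# Usage: juju run --unit neutron-gateway/1,neutron-gateway/2 --format=yaml \
#          "ip netns exec qrouter-$ROUTER_ID ip addr list | grep 10.5.150" \
#          | units_with_match
units_with_match() {
    awk '
        function flush() { if (unit != "" && rc == 0) print unit }
        /^- /            { flush(); unit = ""; rc = 0; next }   # next unit entry
        /^  ReturnCode:/ { rc = $2 + 0 }
        /^  UnitId:/     { unit = $2 }
        END              { flush() }
    '
}
```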

$ ping -c2
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.756 ms
64 bytes from icmp_seq=2 ttl=64 time=0.487 ms

--- ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.487/0.621/0.756/0.136 ms

Finally, shutting down neutron-gateway/2 triggers the router IP to fail over to neutron-gateway/1:

$ juju run --unit neutron-gateway/2 "shutdown -h now"
$ juju run --unit  neutron-gateway/1 "cat /var/lib/neutron/ha_confs/$ROUTER_ID/state"
$ juju run --unit neutron-gateway/1 --format=yaml "ip netns exec qrouter-192ba483-c060-4ee2-86ad-fe38ea280c93 ip addr list | grep  10.5.150"
- MachineId: "7"
  Stdout: |2
        inet scope global qg-288da587-97
  UnitId: neutron-gateway/1

$ ping -c2
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.359 ms
64 bytes from icmp_seq=2 ttl=64 time=0.497 ms

--- ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.359/0.428/0.497/0.069 ms
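The failover is not instantaneous: the backup has to notice that VRRP advertisements have stopped before it takes over, so a check made straight after the shutdown can still see the old state. A minimal sketch of a polling helper; wait_until is a hypothetical name, and it simply retries a command a fixed number of times with a pause in between.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping INTERVAL
# seconds between tries; return success as soon as the command succeeds.
# Usage (assumes ROUTER_ID is exported so the inner sh can see it):
#   export ROUTER_ID
#   wait_until 30 2 sh -c 'juju run --unit neutron-gateway/1 \
#       "cat /var/lib/neutron/ha_confs/$ROUTER_ID/state" | grep -q master'
wait_until() {
    attempts=$1
    interval=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Only sleep if another attempt is left.
        if [ "$attempts" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    return 1
}
```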