..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================
Report HA Router Master
=======================

https://blueprints.launchpad.net/neutron/+spec/report-ha-router-master

Highly available routers are a new feature that was merged via the
l3-high-availability blueprint. HA routers are scheduled on multiple L3
agents, but the cloud operator has no way of knowing where the active
instance is.

Problem Description
===================

A cloud operator can see which L3 agents are hosting a router, but not
where the active instance is. Legacy routers may be manually moved from
one agent to another. With HA routers, the equivalent is moving the active
instance, but that is not currently possible. The first step is to know
where the active instance is, which is what this blueprint addresses;
setting the location of the active instance is out of scope and will be
addressed in the future.

The operator might want to perform node maintenance, which is assisted by
manually moving routers off the node. Likewise, the operator might want
to see the state of routers after a failover (did the active instance
actually fail over?).

Proposed Change
===============

l3-agent-list-hosting-router <router_id>

This command currently shows all L3 agents hosting the router. It will now
also show the HA state (active, standby or fault) of said router on every
agent.

::

  +-----------+------+----------------+-------+----------+
  | id        | host | admin_state_up | alive | ha_state |
  +-----------+------+----------------+-------+----------+
  | 534c4b37- | net1 | True           | :-)   | active   |
  | da2730c6- | net2 | True           | :-)   | standby  |
  | 7abcd991- | net3 | True           | xxx   | fault    |
  +-----------+------+----------------+-------+----------+

Keepalived doesn't provide a way to query the current VRRP state.
The only way to learn the state is to use notifier scripts.
These scripts are executed when a state transition occurs,
and receive the new state (master, backup or fault).

Every time we reconfigure keepalived (when the router is created or updated)
we tell it to execute a Python script
(maintained as part of the repository).

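For illustration only, the directives the agent could emit so that every VRRP
transition invokes that script might be generated roughly as follows. The
argument order passed to the script, and the helper name, are assumptions of
this sketch, not the final interface::

  def render_notify_lines(script_path, router_id, conf_dir):
      """Build illustrative keepalived ``notify_*`` directives for a router.

      Sketch only: the real L3 agent template decides the exact arguments
      passed to neutron-keepalived-state-change.
      """
      template = ('    notify_%(state)s '
                  '"%(script)s %(router)s %(conf_dir)s %(state)s"')
      return [template % {'state': state,
                          'script': script_path,
                          'router': router_id,
                          'conf_dir': conf_dir}
              for state in ('master', 'backup', 'fault')]
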
The script will:

1) Write the new state to a file in $state_path/ha_confs/<router_id>/state.
2) Notify the agent that a transition has occurred via a Unix domain socket.

The reason that step 1 happens in the script and not in the agent after it
receives the notification is that we want to record the state transition
whenever it happens, so that it isn't lost if the agent is down. keepalived
does not expose a way to query the current state, so if a state transition
occurred but we failed to write it down, that information is lost forever.
A minimal sketch of such a script is shown below.

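This is a minimal sketch of what the script could look like. The argument
order, socket path and wire format shown here are assumptions for the
example; the actual neutron-keepalived-state-change interface is defined by
the implementation::

  import os
  import socket
  import sys


  def main():
      # Assumed invocation: <script> <router_id> <state_path> <new_state>
      router_id, state_path, new_state = sys.argv[1:4]

      # Step 1: persist the transition immediately so it is not lost if the
      # L3 agent happens to be down when it occurs.
      state_file = os.path.join(state_path, 'ha_confs', router_id, 'state')
      with open(state_file, 'w') as f:
          f.write(new_state)

      # Step 2: best-effort notification to the L3 agent over a Unix domain
      # socket (path and payload format are illustrative).
      sock_path = os.path.join(state_path, 'ha_state.sock')
      try:
          sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
          sock.connect(sock_path)
          sock.sendall(('%s,%s' % (router_id, new_state)).encode('utf-8'))
          sock.close()
      except socket.error:
          # The agent re-reads the on-disk state when it starts, so a missed
          # notification is recovered later.
          pass


  if __name__ == '__main__':
      main()
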
The L3 agent will start and stop the metadata proxy when it receives a
notification. This is to save on memory usage by enabling the proxy
only on the active instance. This can be important at scale, as every
proxy takes 20+ MB.

The L3 agent will batch these state change notifications: once an event is
received, the agent collects further events for a period of T seconds. When
the timer goes off, it sends a single RPC message to the server with a map of
router ID to VRRP state on that specific agent. If a router changes state
multiple times during the batching period, the agent will only send the most
up to date state. Additionally, every time the agent starts it gets the list
of routers scheduled on it; the agent will loop through said routers, collect
their HA states from disk and update the server. This catches any state
changes that occurred while the agent was down.

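As an illustration of the batching behaviour (the implementation will reuse
the Nova notifier batching code, per the work items below, rather than this
stand-alone helper), a sketch with invented class and callback names might
look like this::

  import threading


  class StateChangeBatcher(object):
      """Illustrative batching helper, not the actual agent code.

      Collects (router_id, state) events and flushes them to a single
      callback T seconds after the first event of a batch arrives.
      """

      def __init__(self, batch_interval, send_callback):
          self._interval = batch_interval   # T seconds
          self._send = send_callback        # e.g. an RPC call to the server
          self._pending = {}                # router_id -> latest state
          self._timer = None
          self._lock = threading.Lock()

      def event(self, router_id, state):
          with self._lock:
              # Later events for the same router overwrite earlier ones, so
              # only the most recent state is reported.
              self._pending[router_id] = state
              if self._timer is None:
                  self._timer = threading.Timer(self._interval, self._flush)
                  self._timer.start()

      def _flush(self):
          with self._lock:
              batch, self._pending = self._pending, {}
              self._timer = None
          if batch:
              self._send(batch)  # one message with all accumulated changes
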
The RPC message will be retried in case the management network is
temporarily down or the agent is disconnected from it.

The server will then persist this information when it receives the RPC
message. The tables are already set up for this: each router has an entry in
the HA bindings table per agent it is scheduled on, and the record contains
the VRRP state on that specific agent. The controller will also persist the
last time a state change was received, so that in a split-brain situation the
admin would be able to tell which is the 'real' master by observing the
timestamps.

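A rough sketch of the server-side persistence step follows. The model, table
and column names below are illustrative stand-ins for the existing
L3HARouterAgentPortBinding table described under Data Model Impact, and the
session handling is simplified::

  import datetime

  import sqlalchemy as sa
  from sqlalchemy.ext.declarative import declarative_base

  Base = declarative_base()


  class HARouterAgentBinding(Base):
      # Stand-in for L3HARouterAgentPortBinding; the real schema may differ.
      __tablename__ = 'ha_router_agent_bindings'
      router_id = sa.Column(sa.String(36), primary_key=True)
      l3_agent_id = sa.Column(sa.String(36), primary_key=True)
      state = sa.Column(sa.Enum('active', 'standby', 'fault',
                                name='l3_ha_states'), default='standby')
      state_change_time = sa.Column(sa.DateTime)


  def update_ha_states(session, agent_id, router_states):
      """Persist the {router_id: vrrp_state} map reported by one agent."""
      now = datetime.datetime.utcnow()
      for router_id, state in router_states.items():
          (session.query(HARouterAgentBinding).
           filter_by(router_id=router_id, l3_agent_id=agent_id).
           update({'state': state, 'state_change_time': now}))
      session.commit()
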
Optionally*, the server will look for dead agents (that have not sent
heartbeats in a while) and will mark their HA routers as down. This will aid
the main use case of a hypervisor dying (and therefore being unable to report
any state changes) while another hypervisor hosts all of the routers. In this
case the API will return 'active' for all routers on both machines until the
server notices that the first agent died and marks its routers as down.

* This is an optional enhancement that could be added after this change
  lands, if we find it appropriate.

Data Model Impact
-----------------

The HA state of every router-to-agent binding is persisted in the
L3HARouterAgentPortBinding table. It is currently unused. A DB migration
will be necessary in order to add timestamps as well as the 'fault'
state, as currently only 'active' and 'standby' can be persisted.

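For illustration, that migration could look roughly like the sketch below.
The revision identifiers, table name and enum handling are assumptions; the
real patch has to deal with per-backend enum alteration::

  import sqlalchemy as sa
  from alembic import op

  # Illustrative migration sketch; identifiers are placeholders.
  revision = 'xxxxxxxxxxxx'
  down_revision = 'yyyyyyyyyyyy'

  new_states = sa.Enum('active', 'standby', 'fault', name='l3_ha_states')


  def upgrade():
      # Record when the last state change report was received.
      op.add_column('ha_router_agent_port_bindings',
                    sa.Column('state_change_time', sa.DateTime(),
                              nullable=True))
      # Extend the existing state enum with the 'fault' value; how this is
      # done in practice depends on the database backend.
      op.alter_column('ha_router_agent_port_bindings', 'state',
                      type_=new_states, existing_nullable=True)
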
REST API Impact
---------------

l3-agent-list-hosting-router will now return an extra column that can be
'active', 'standby' or 'fault' for HA routers, or None for other types of
routers.

Security Impact
---------------

keepalived runs as root, as does the transition script that it invokes.
The transition script talks to the agent via a Unix domain socket.

Notifications Impact
--------------------

None.

Other End User Impact
---------------------

python-neutronclient will support the new ha_state column. It will show
'active', 'standby' or 'fault' when a proper response is received. '-' will be
displayed if None is received from an old server or for non-HA routers.

Performance Impact
------------------

Assuming two L3 agents and 1,000 routers hosted on each, a failover from node
1 to node 2 should induce only a single RPC call from node 2 to the server,
and a single DB transaction.

IPv6 Impact
-----------

None.

Other Deployer Impact
---------------------

None.

Developer Impact
----------------

None.

Community Impact
----------------

None.

Alternatives
------------

Instead of neutron-keepalived-state-change notifying the agent via a Unix
domain socket, the agent could poll the state of all HA routers every
T seconds. It would then diff the new states against a cached copy
and notify the server of any changes. One could argue that this is simpler
to implement and maintain, but it is less performant.

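A sketch of that polling loop, with invented helper names, might be::

  import time


  def poll_ha_states(read_states, report_changes, interval):
      # Hedged sketch of the polling alternative: read_states() returns the
      # current {router_id: state} map from disk, report_changes() sends a
      # diff to the server.  Both callbacks are hypothetical.
      cached = {}
      while True:
          current = read_states()
          changed = {rid: st for rid, st in current.items()
                     if cached.get(rid) != st}
          if changed:
              report_changes(changed)
          cached = current
          time.sleep(interval)
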
Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Assaf Muller <amuller>

Work Items
----------

* The keepalived notifier bash scripts are currently generated in-line. They
  will be replaced by a Python script maintained as part of the repository.
  The script will be available as neutron-keepalived-state-change and will be
  invoked by keepalived.
* At first the script will replicate the existing behavior of the bash
  scripts: write the new state to disk and start up or shut down the metadata
  proxy.
* The script must also notify the agent of the state change via a Unix domain
  socket. Starting and stopping the metadata proxy will be moved to the agent.
* The RPC message that updates HA router states will be implemented (it
  actually already exists but cannot be used without changing its format).
* The agent will batch state change notifications into a single RPC
  message. The Nova notifier mechanism batches notifications and the code
  will be reused.
* The API must expose the new ha_state column.
* The L3 agent must report HA states after it starts.
* Add the fault state and state change timestamp via a DB migration patch.
* Optional: The controller will look for dead agents and move their HA routers
  to the fault state.

Dependencies
============

None.

Testing
=======

Tempest Tests
-------------

L3 HA cannot be tested in Tempest without multi-node support. L3 HA is the
first candidate to be tested when in-tree integration tests are introduced
via the integration-tests blueprint.

Functional Tests
----------------

The L3 agent already has functional testing in place. Two new tests will
be added:

1) When a state change occurs, the notification arrives at the agent.
2) When multiple state changes occur, the RPC call is sent to the server
   with the expected parameters.

API Tests
---------

The RPC and DB methods will be tested with unit tests.

Documentation Impact
====================

The changes to the API and CLI require documentation.

User Documentation
------------------

The CLI client documentation must be updated.

Developer Documentation
-----------------------

The Neutron API change must be documented.

References
==========

* Topic branch:
  https://review.openstack.org/#/q/branch:master+topic:bug/1365453,n,z
* https://blueprints.launchpad.net/neutron/+spec/l3-high-availability
* https://bugs.launchpad.net/neutron/+bug/1365453