Report HA router master

Change-Id: Ibf20960104660268bb28fd0083ad8dd3b0cd38c8

specs/kilo/report-ha-router-master.rst (new file, 266 lines)
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================
Report HA Router Master
=======================

https://blueprints.launchpad.net/neutron/+spec/report-ha-router-master

Highly available routers are a new feature, merged via the
l3-high-availability blueprint. HA routers are scheduled on multiple L3
agents; however, the cloud operator has no way of knowing where the active
instance is.

Problem Description
===================

A cloud operator can see which L3 agents are hosting a router, but not
where the active instance is. Legacy routers may be manually moved from
one agent to another. With HA routers, the equivalent is moving the active
instance, but that is not currently possible.

The first step is to know where the active instance is, which is what this
blueprint addresses; setting the location of the active instance is out of
scope and will be addressed in the future.

The operator might want to perform node maintenance, which is assisted by
manually moving routers off the node. Likewise, the operator might want
to see the state of routers after a failover (did the active instance
actually fail over?).

Proposed Change
===============

l3-agent-list-hosting-router <router_id>

Currently shows all L3 agents hosting the router. It will now also show the
HA state (active, standby, or fault) of said router on every agent.

::

    +-----------+------+----------------+-------+----------+
    | id        | host | admin_state_up | alive | ha_state |
    +-----------+------+----------------+-------+----------+
    | 534c4b37- | net1 | True           | :-)   | active   |
    | da2730c6- | net2 | True           | :-)   | standby  |
    | 7abcd991- | net3 | True           | xxx   | fault    |
    +-----------+------+----------------+-------+----------+

keepalived does not provide a way to query the current VRRP state. The only
way to learn the state, then, is to use notify scripts. These scripts are
executed when a state transition occurs and receive the new state (master,
backup, or fault).

Every time we reconfigure keepalived (when the router is created or updated),
we tell it to execute a Python script that is maintained as part of the
repository.

The script will:

1) Write the new state to a file in $state_path/ha_confs/router_id/state
2) Notify the agent that a transition has occurred via a Unix domain socket.

Step 1 happens in the script, and not in the agent after it receives the
notification, because we want to record the state transition whenever it
happens, so that it isn't lost if the agent is down. keepalived does not
expose a way to query its current state, so if a state transition occurs
and we fail to write it down, that information is lost forever.
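
The following is a minimal sketch of such a notify script, not the actual
implementation; the invocation convention, file names and socket path are
assumptions at this point:

::

    #!/usr/bin/env python
    # Hypothetical invocation, wired up in the keepalived configuration
    # that the agent generates:
    #     <script> <ha_conf_dir> <new_state>
    # where <new_state> is 'master', 'backup' or 'fault'.
    import os
    import socket
    import sys


    def main():
        ha_conf_dir, state = sys.argv[1], sys.argv[2].lower()

        # 1) Record the state on disk first, so the transition survives an
        #    agent outage (keepalived cannot be queried for it later).
        tmp_path = os.path.join(ha_conf_dir, 'state.tmp')
        with open(tmp_path, 'w') as state_file:
            state_file.write(state)
        os.rename(tmp_path, os.path.join(ha_conf_dir, 'state'))

        # 2) Best-effort notification to the L3 agent over a Unix domain
        #    socket (path is an assumption).
        try:
            sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            sock.connect(os.path.join(ha_conf_dir, 'notify.sock'))
            sock.sendall(state.encode('utf-8'))
            sock.close()
        except socket.error:
            pass  # the agent re-reads the state file when it restarts


    if __name__ == '__main__':
        main()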

The L3 agent will start and stop the metadata proxy when it receives a
notification. This saves memory by enabling the proxy only on the active
instance, which can be important at scale, as every proxy consumes 20+ MB.

The L3 agent will batch these state change notifications: once an event is
received by the agent, it batches all further events over a period of T
seconds. When the timer goes off, it sends a single RPC message to the
server with a map of router ID to VRRP state on that specific agent. If a
router changes state multiple times during the batching period, the agent
will only send the most up to date state. Additionally, every time the
agent starts it gets the list of routers scheduled on it; the agent will
loop through those routers, collect their HA states from disk and update
the server. This catches any state changes that occurred while the agent
was down.
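
The following is a minimal sketch of the batching described above, under
assumed names (StateChangeBatcher, send_rpc); the real agent would
integrate this with its RPC client and event loop:

::

    import threading


    class StateChangeBatcher(object):
        """Batch HA state changes into a single RPC call."""

        def __init__(self, send_rpc, batch_window=2.0):
            self._send_rpc = send_rpc    # callable taking {router_id: state}
            self._window = batch_window  # the 'T seconds' from the text
            self._pending = {}           # router_id -> latest VRRP state
            self._timer = None
            self._lock = threading.Lock()

        def enqueue(self, router_id, state):
            with self._lock:
                # A later event for the same router overwrites the earlier
                # one, so only the most up to date state is sent.
                self._pending[router_id] = state
                if self._timer is None:
                    self._timer = threading.Timer(self._window, self._flush)
                    self._timer.start()

        def _flush(self):
            with self._lock:
                batch, self._pending = self._pending, {}
                self._timer = None
            self._send_rpc(batch)  # one RPC message for the whole batch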

The RPC message will be retried in case the management network is
temporarily down, or the agent is disconnected from it.

Upon receiving the RPC message, the server will persist this information.
The tables are already set up for this: each router has an entry in the HA
bindings table per agent it is scheduled on, and the record contains the
VRRP state on that specific agent. The controller will also persist the
last time a state change was received, so that in a split brain situation
the admin can tell which is the 'real' master by observing the time stamps.
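
As a rough illustration, the server-side handler could persist a reported
batch along these lines; the table layout and names here are assumptions
(the real model is L3HARouterAgentPortBinding):

::

    import datetime

    import sqlalchemy as sa

    metadata = sa.MetaData()
    # Simplified stand-in for the existing HA bindings table.
    ha_bindings = sa.Table(
        'ha_router_agent_port_bindings', metadata,
        sa.Column('router_id', sa.String(36), primary_key=True),
        sa.Column('l3_agent_id', sa.String(36), primary_key=True),
        sa.Column('state', sa.Enum('active', 'standby', 'fault')),
        sa.Column('state_change_time', sa.DateTime()))  # added by this spec


    def update_routers_states(conn, agent_id, states):
        """Persist one agent's reported VRRP states in one transaction.

        `states` maps router_id -> 'active' | 'standby' | 'fault'.
        """
        now = datetime.datetime.utcnow()
        with conn.begin():
            for router_id, state in states.items():
                conn.execute(
                    ha_bindings.update()
                    .where(ha_bindings.c.router_id == router_id)
                    .where(ha_bindings.c.l3_agent_id == agent_id)
                    .values(state=state, state_change_time=now))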

Optionally*, the server will look for dead agents (those that have not sent
heartbeats in a while) and will mark their HA routers as down. This will aid
the main use case of a hypervisor dying (and hence being unable to report
any state changes) while another hypervisor takes over all of its routers.
In that case the API would return 'active' for the routers on both machines
until the server notices that the first agent died and marks its routers
as down.

* This is an optional enhancement that could be added after this blueprint
  lands, if we find it worthwhile.
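
If implemented, the sweep might look roughly like this; the record layout
and the choice of 'fault' as the resulting state are assumptions:

::

    import datetime


    def mark_dead_agents_routers(agent_heartbeats, bindings, dead_after):
        """Mark HA router bindings of dead agents as 'fault'.

        `agent_heartbeats` maps agent_id -> last heartbeat datetime;
        `dead_after` is a timedelta; `bindings` is an iterable of mutable
        binding records, each with `agent_id` and `state` attributes.
        """
        cutoff = datetime.datetime.utcnow() - dead_after
        dead = set(agent_id for agent_id, seen in agent_heartbeats.items()
                   if seen < cutoff)
        for binding in bindings:
            if binding.agent_id in dead:
                binding.state = 'fault'  # the agent can no longer report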

Data Model Impact
-----------------

The HA state of every router to agent binding is persisted in the
L3HARouterAgentPortBinding table. It is currently unused. A DB migration
will be necessary in order to add time stamps as well as the 'fault'
state, as currently only the 'active' and 'standby' states can be persisted.
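
A minimal sketch of such a migration, assuming the table is named
ha_router_agent_port_bindings and the new column is called
state_change_time (both names are assumptions at this point):

::

    from alembic import op
    import sqlalchemy as sa


    def upgrade():
        # Record when the last state change was reported, to help
        # diagnose split brain situations.
        op.add_column(
            'ha_router_agent_port_bindings',
            sa.Column('state_change_time', sa.DateTime(), nullable=True))
        # Widen the state enum so 'fault' can be persisted alongside
        # 'active' and 'standby'.
        op.alter_column(
            'ha_router_agent_port_bindings', 'state',
            type_=sa.Enum('active', 'standby', 'fault'),
            existing_nullable=True)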

REST API Impact
---------------

l3-agent-list-hosting-router will now return an extra column that can be
'active', 'standby' or 'fault' for HA routers, or None for other types of
routers.
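
For illustration, a response from the agent scheduler extension might carry
the new field as follows (endpoint shape abbreviated, values illustrative):

::

    GET /v2.0/routers/<router_id>/l3-agents

    {"agents": [
        {"id": "534c4b37-...", "host": "net1", "admin_state_up": true,
         "alive": true, "ha_state": "active"},
        {"id": "da2730c6-...", "host": "net2", "admin_state_up": true,
         "alive": true, "ha_state": "standby"}]}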

Security Impact
---------------

keepalived runs as root, as does the transition script that it invokes.
The transition script talks to the agent via a Unix domain socket.

Notifications Impact
--------------------

None.

Other End User Impact
---------------------

python-neutronclient will support the new ha_state column. It will show
'active', 'standby' or 'fault' when a proper response is received. '-' will
be displayed if None is received, from an old server or for non-HA routers.

Performance Impact
------------------

Assuming two L3 agents and 1,000 routers hosted on each, a failover from node
1 to node 2 should induce only a single RPC call from node 2 to the server,
and a single DB transaction.

IPv6 Impact
-----------

None.

Other Deployer Impact
---------------------

None.

Developer Impact
----------------

None.

Community Impact
----------------

None.

Alternatives
------------

Instead of neutron-keepalived-state-change notifying the agent via a Unix
domain socket, the agent could poll the state of all HA routers every
T seconds. It would then diff the new states against a cached copy
and notify the server of any changes. One could argue that this is simpler
to implement and maintain, but it is less performant.
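
A sketch of this polling alternative, with assumed names:

::

    def poll_ha_states(state_file_paths, cached_states, notify_server):
        """Read each router's state file, diff against a cache and report
        only the changes.

        `state_file_paths` maps router_id -> path of the state file that
        the notify script writes; `cached_states` is a mutable dict.
        """
        changes = {}
        for router_id, path in state_file_paths.items():
            try:
                with open(path) as state_file:
                    state = state_file.read().strip()
            except IOError:
                continue  # state file not written yet
            if cached_states.get(router_id) != state:
                cached_states[router_id] = state
                changes[router_id] = state
        if changes:
            notify_server(changes)  # single RPC carrying only the diffs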

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Assaf Muller <amuller>

Work Items
----------

* Current keepalived notifier bash scripts are generated in-line. These will
  now be a Python script maintained as part of the repository. The script
  will be available as neutron-keepalived-state-change and will be invoked
  by keepalived.
* At first the script will replicate the existing behavior of the bash
  scripts: Write the new state to disk and start up or shut down the metadata
  proxy.
* The script must also notify the agent of the state change via a Unix domain
  socket. Starting and stopping the metadata proxy will be moved to the agent.
* The RPC message that updates HA router states will be implemented (it
  actually already exists but cannot be used without changing its format).
* The agent will batch state change notifications into a single RPC
  message. The Nova notifier mechanism batches notifications and the code
  will be reused.
* The API must expose the new ha_state column.
* The L3 agent must report HA states after it starts.
* Add the fault state and state change timestamp via a DB migration patch.
* Optional: The controller will look for dead agents and move their HA routers
  to the fault state.

Dependencies
============

None.

Testing
=======

Tempest Tests
-------------

L3 HA cannot be tested in Tempest without multi-node support. L3 HA is the
first candidate to be tested when in-tree integration tests are introduced
via the integration-tests blueprint.

Functional Tests
----------------

The L3 agent already has functional testing in place. Two new tests will
be added:

1) When a state change occurs, verify that the notification arrives at the
   agent.
2) When multiple state changes occur, verify that the RPC call is sent to
   the server with the expected parameters.

API Tests
---------

The RPC and DB methods will be tested with unit tests.

Documentation Impact
====================

The changes to the API and CLI require documentation.

User Documentation
------------------

The CLI client documentation must be updated.

Developer Documentation
-----------------------

The Neutron API change must be documented.

References
==========

* Topic branch:
  https://review.openstack.org/#/q/branch:master+topic:bug/1365453,n,z
* https://blueprints.launchpad.net/neutron/+spec/l3-high-availability
* https://bugs.launchpad.net/neutron/+bug/1365453