host monitor by consul

This spec introduces a new host monitor by consul. It monitors host
heartbeat via management, tenant and storage connectivity. Only in the
case of defined states, it will trigger host recovery, for example,
storage interface disconnected.

Change-Id: If81a6a9543513acf199afe6a17dceb1544657272
Implements: bp host-monitor-by-consul
This commit is contained in:
suzhengwei 2020-06-08 14:42:14 +08:00 committed by Radosław Piliszek
parent 9631ab25d3
commit 53a7c7273c
2 changed files with 178 additions and 0 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 41 KiB

View File

@ -0,0 +1,178 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=======================
host monitor by consul
=======================
https://blueprints.launchpad.net/masakari-monitors/+spec/host-monitor-by-consul
Problem description
===================
Usually, there are management network, tenant network and storage network
in one cloud platform. The compute nodes may have management, tenant and
storage interface to connect to these three networks.
Currently, Masakari host monitor uses pacemaker and pacemaker-remote to
monitor hosts' connection. Actually, it is monitoring the hosts heartbeat
through management interface. Once a host's management connectivity is
detected down, it will send notification to masakari to trigger host
failure recovery workflow.
This solution has some flaws especially when ``management`` connectivity
is down and the other two connectivity ``tenant`` and ``storage`` are up.
Users can still access their VMs without any interruptions, so there is no
need to send notification in this case.
Proposed change
===============
This spec introduces a new host monitor. Specifically, host connectivity
monitoring via management, tenant and storage interfaces by consul agent.
The low-level architecture for host monitoring is shown as below:
.. image:: /images/host-monitor-by-consul.png
Each host runs three consul agents, which respectively bind management, tenant
and storage interfaces. They make up three independent consul cluster.
Consul agent runs in server mode on controller nodes, while in client mode on
compute nodes.
For example, consul cluster via management connectivity.
All agents bind management interface, and are responsible for running checks
and keeping services in sync.
Consul is built on top of Serf which provides a full gossip protocol that is
used for multiple purposes. Serf provides membership, failure detection, and
event broadcast. Consul use gossip protocol to manage membership. If one agent
is found disconnected, it will broadcast messages to the cluster quickly.
Host-monitor periodically retreives all consul members heath data from local
consul agents. It picks out every nodes' managent, tenant and storage health
state separately, and combine them together.
Then It will send notification depending on defined host states as listed below
in the table:
+-----------------+-----------------+-----------------+------------------+
| management | tenant | storage | actions |
+-----------------+-----------------+-----------------+------------------+
| up | up | down | recovery |
+-----------------+-----------------+-----------------+------------------+
| up | down | down | recovery |
+-----------------+-----------------+-----------------+------------------+
| down | up | down | recovery |
+-----------------+-----------------+-----------------+------------------+
| down | down | down | recovery |
+-----------------+-----------------+-----------------+------------------+
* 'up' represents connectivity up.
* 'down' represents connectivity down.
* 'recovery' represents host recovery.
Alternatives
------------
None
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
* suzhengwei <sugar-2008@163.com>
Work Items
----------
- Masakari host-monitor driver based on consul
- masakari documentation updates
Dependencies
============
- Requires that consul agents are installed and running to monitor
the hosts management, tenant and storage connectivity.
Testing
=======
Unit tests will be needed.
Documentation Impact
====================
The admin configuration documentation need to be updated.
References
==========
https://www.consul.io/
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Victoria
- Introduced
* - Xena
- Re-proposed