config/sysinv/sysinv/sysinv
Eric MacDonald f19dd0498f Mtce: Make Multi-Node Failure Avoidance Configurable
The maintenance system implements a high availability (HA) feature
designed to detect the simultaneous heartbeat failure of a group
of hosts and avoid failing all those hosts until heartbeat resumes
or a set period of time elapses.

This feature is called Multi-Node Failure Avoidance, aka MNFA, and
currently has the host threshold set to 3 and the timeout set to 100 secs.

This update implements enhancements to that existing feature by
making the 'number-of-hosts threshold' and 'timeout period'
customer configurable service parameters.

The new service parameters are listed under platform:maintenance and
are displayed with the following command:

> system service-parameter-list

mnfa_threshold: This new label and value are added to the puppet
managed /etc/mtc.ini and represent the number of hosts that must
fail heartbeat as a group, within the heartbeat failure window
(heartbeat_failure_threshold), before maintenance activates
MNFA Mode.

This update changes the default number of failing hosts from
3 to 2 while allowing a configurable range from 2 to 100.
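
The activation rule described above can be sketched roughly as follows.
This is a minimal illustration, not the maintenance daemon's code: the
function name, the host names, and the modelling of the heartbeat
failure window as a number of seconds are all assumptions made for the
example.

```python
def mnfa_should_activate(failure_times, now, window_secs, mnfa_threshold=2):
    """Return True when enough hosts failed heartbeat within the window.

    failure_times: dict mapping host name -> time its heartbeat failed.
    window_secs:   hypothetical stand-in for the heartbeat failure window,
                   expressed in seconds for this sketch only.
    mnfa_threshold: number of hosts that must fail as a group (default 2).
    """
    # Count only hosts whose failure falls inside the window ending at 'now'.
    recent = [
        host for host, failed_at in failure_times.items()
        if 0 <= now - failed_at <= window_secs
    ]
    return len(recent) >= mnfa_threshold


# Hypothetical usage: two hosts fail within 5 seconds of each other.
failures = {"compute-0": 10.0, "compute-1": 12.0}
activate = mnfa_should_activate(failures, now=13.0, window_secs=5.0)
```

With the new default threshold of 2, two hosts failing close together
are enough to enter MNFA mode in this sketch; under the old value of 3
the same pair would not be.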

mnfa_timeout: This new label and value are added to the puppet
managed /etc/mtc.ini. While MNFA mode is active, it remains active
until the number of failing hosts drops below the mnfa_threshold or this
timer expires. MNFA mode deactivates on the first occurrence of
either case. Upon deactivation, the remaining failed hosts are no
longer treated as a failure group but are instead Gracefully
Recovered individually. A value of zero imposes no timeout, making the
deactivation criteria solely host based.
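
The two deactivation criteria can be expressed as a small predicate.
This is only a sketch of the rule as stated in this message; the
function and argument names are hypothetical and do not come from the
maintenance code.

```python
def mnfa_should_deactivate(failing_count, mnfa_threshold,
                           elapsed_secs, mnfa_timeout):
    """Return True when MNFA mode should end.

    failing_count: hosts currently failing heartbeat.
    elapsed_secs:  seconds since MNFA mode was activated.
    mnfa_timeout:  0 means no timeout (host count is the only criterion).
    """
    # Criterion 1: the failing group shrank below the threshold.
    if failing_count < mnfa_threshold:
        return True
    # Criterion 2: the timer expired, only when a timeout is configured.
    if mnfa_timeout > 0 and elapsed_secs >= mnfa_timeout:
        return True
    return False
```

Whichever criterion is met first ends MNFA mode; with mnfa_timeout set
to 0 the second check never fires, so only host recovery can end it.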

This update changes the default timer from 100 seconds to 0 (no
timeout) while permitting a valid timeout range from 100 to 86400 secs
(1 day).
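
A range check for the two parameters might look like the sketch below.
The limits are taken from this message; the constant and function names
are hypothetical and are not claimed to match the sysinv service
parameter validators.

```python
# Limits as stated in this commit message.
MNFA_THRESHOLD_MIN = 2
MNFA_THRESHOLD_MAX = 100
MNFA_TIMEOUT_DISABLED = 0      # zero means no timeout
MNFA_TIMEOUT_MIN = 100
MNFA_TIMEOUT_MAX = 86400       # one day, in seconds


def validate_mnfa_threshold(value):
    """Return the threshold as an int, or raise ValueError if out of range."""
    threshold = int(value)
    if not MNFA_THRESHOLD_MIN <= threshold <= MNFA_THRESHOLD_MAX:
        raise ValueError(
            "mnfa_threshold must be between %d and %d"
            % (MNFA_THRESHOLD_MIN, MNFA_THRESHOLD_MAX))
    return threshold


def validate_mnfa_timeout(value):
    """Return the timeout as an int; accept 0 or the 100..86400 range."""
    timeout = int(value)
    if timeout == MNFA_TIMEOUT_DISABLED:
        return timeout
    if not MNFA_TIMEOUT_MIN <= timeout <= MNFA_TIMEOUT_MAX:
        raise ValueError(
            "mnfa_timeout must be 0 or between %d and %d"
            % (MNFA_TIMEOUT_MIN, MNFA_TIMEOUT_MAX))
    return timeout
```

Note that values from 1 to 99 are rejected for the timeout in this
sketch: zero is a special "disabled" value rather than the low end of
the range.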

DocImpact
Story: 2003576
Task: 24903

Change-Id: I2fb737a4cd3c235845b064449949fcada303d6b2
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-08-31 10:43:25 -04:00
.eggs StarlingX open source release updates 2018-05-31 07:35:52 -07:00
contrib StarlingX open source release updates 2018-05-31 07:35:52 -07:00
doc/source delete unnecessary symbol 2018-08-01 02:21:46 +00:00
etc/sysinv Remove Ceph Cache Tiering support 2018-06-29 13:44:43 -04:00
scripts StarlingX open source release updates 2018-05-31 07:35:52 -07:00
sysinv Mtce: Make Multi-Node Failure Avoidance Configurable 2018-08-31 10:43:25 -04:00
tools StarlingX open source release updates 2018-05-31 07:35:52 -07:00
.coveragerc StarlingX open source release updates 2018-05-31 07:35:52 -07:00
.gitignore Sysinv tox updates. Prepare for bandit reports and test reports 2018-06-29 13:25:09 -04:00
.testr.conf Sysinv. Cleanup import statements for pep8 2018-06-29 13:43:53 -04:00
CONTRIBUTING.rst StarlingX open source release updates 2018-05-31 07:35:52 -07:00
LICENSE StarlingX open source release updates 2018-05-31 07:35:52 -07:00
MANIFEST.in StarlingX open source release updates 2018-05-31 07:35:52 -07:00
README.rst StarlingX open source release updates 2018-05-31 07:35:52 -07:00
babel.cfg StarlingX open source release updates 2018-05-31 07:35:52 -07:00
openstack-common.conf StarlingX open source release updates 2018-05-31 07:35:52 -07:00
pylint.rc Fix sysinv tox job 2018-06-29 13:44:42 -04:00
requirements.txt Extend sysinv to assign kubernetes labels to nodes 2018-08-24 15:40:48 -04:00
setup.cfg Move storage puppet plugin before nova plugin 2018-08-29 16:45:53 -04:00
setup.py Add a zuul job for sysinv tox unittest 2018-08-13 16:34:06 +08:00
test-requirements.txt Fix TOX for sysinv 2018-06-28 22:07:39 -04:00
tox.ini Fix tox pep8 errors of type E722 in sysinv 2018-08-10 14:51:50 -04:00

README.rst

Placeholder to allow setup.py to work. Removing this requires modifying the setup.py manifest.