Node reinstallation
implements blueprint mos-node-reinstallation

Change-Id: I0fa9c39832a0e458fbb897456370872058197763

commit a2f6232f2e (parent b4b03b8730)

specs/7.0/node-reinstallation.rst (new file, 204 lines)

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
MOS Node Reinstallation
==========================================

https://blueprints.launchpad.net/fuel/+spec/mos-node-reinstallation

Node reinstallation allows failed nodes to be fully or partially recovered
using the standard Fuel processes 'provision' and 'deploy'.
Full reinstallation purges all data from the reinstalled node.
Partial reinstallation reinstalls the OS while preserving some data.
When only the system should be reinstalled from scratch (partial
reinstallation), the Partition Preservation feature should be enabled
(https://blueprints.launchpad.net/fuel/+spec/partition-preservation).

Problem description
===================

Currently Fuel does not fully support node reinstallation.
Slave nodes cannot be restored after a failure, including but not limited to
MongoDB failures, Galera failures, update failures, and upgrade failures.


Proposed change
===============

The reinstallation feature includes multiple changes that need to be
implemented:

* Partition Preservation (will be implemented separately)
  (https://blueprints.launchpad.net/fuel/+spec/partition-preservation).

* Node renaming (https://blueprints.launchpad.net/fuel/+spec/node-naming).

* MongoDB recovery in case of failure (expected to be fixed in 7.0).

* Swift ring sync during redeploy (expected to be fixed in 7.0).


Reinstallation process:

1) Nailgun shouldn't serialize a recovering controller as primary.
   Nailgun should always serialize a recovering controller as a regular
   controller. The same applies to other roles that have a primary variant.

2) Partition preservation settings should be prepared
   before the node is reprovisioned.

3) The last step is provision and deploy.
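
The first step of the process can be sketched as follows. This is only an illustration under stated assumptions: the ``Node`` class, the ``recovering`` flag, and ``serialize_roles`` are hypothetical names and do not reflect Nailgun's actual serializer code. The sketch picks the first non-recovering node as primary, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a node record; not Nailgun's real model."""
    name: str
    roles: List[str]
    recovering: bool = False  # True while the node is being reinstalled


def serialize_roles(nodes: List[Node], role: str = "controller") -> Dict[str, str]:
    """Map node name -> serialized role, with at most one primary.

    A recovering node is always serialized as a regular member, so the
    primary is chosen only among nodes that are not being reinstalled.
    """
    members = [n for n in nodes if role in n.roles]
    healthy = [n for n in members if not n.recovering]
    # Assumption: the first healthy member acts as primary.
    primary = healthy[0].name if healthy else None
    return {
        n.name: f"primary-{role}" if n.name == primary else role
        for n in members
    }


nodes = [
    Node("node-1", ["controller"], recovering=True),
    Node("node-2", ["controller"]),
    Node("node-3", ["controller"]),
    Node("node-4", ["compute"]),
]
serialized = serialize_roles(nodes)
```

If every member of the role is recovering, this sketch assigns no primary at all; how the real serializer should handle that case is not covered here.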


Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

No API changes are needed for reinstallation itself. The reinstallation
process will use the standard API calls: provision and deploy.

Related API changes are covered by the partition preservation blueprint
(https://blueprints.launchpad.net/fuel/+spec/partition-preservation)
and the node renaming blueprint
(https://blueprints.launchpad.net/fuel/+spec/node-naming).
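
As a rough illustration of reusing the provision and deploy calls, the sketch below only builds the two requests and sends nothing. The base URL, the ``PUT`` method, and the ``/clusters/<id>/<action>/?nodes=...`` URL shape are assumptions made for this example and should be checked against the Nailgun API of the target release; ``reinstall_requests`` is a hypothetical helper.

```python
from typing import List, Tuple

# Assumed base URL of the Nailgun API on the Fuel master node (hypothetical).
API_BASE = "http://10.20.0.2:8000/api"


def reinstall_requests(cluster_id: int, node_ids: List[int]) -> List[Tuple[str, str]]:
    """Return the (method, url) pairs for reinstalling the given nodes.

    Provisioning comes first and deployment second, matching the order
    given in the reinstallation process.
    """
    if not node_ids:
        raise ValueError("at least one node id is required")
    nodes = ",".join(str(n) for n in node_ids)
    return [
        # Assumed URL layout; verify against the real API before use.
        ("PUT", f"{API_BASE}/clusters/{cluster_id}/{action}/?nodes={nodes}")
        for action in ("provision", "deploy")
    ]


requests_to_send = reinstall_requests(cluster_id=1, node_ids=[3, 5])
```

A real client would also have to wait for the provisioning task to finish before issuing the deploy request; that polling is left out of the sketch.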


Upgrade impact
--------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

Reinstallation with partition preservation should shorten the
deployment stage: synchronization of the Swift, MySQL, and MongoDB
services should take less time.
When a compute node is reinstalled using the partition
preservation method, migration of VM images is not required.

Plugin impact
-------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

:Primary Assignee: Ivan Ponomarev

:QA: Dmitriy Kruglov

:Mandatory design review: Vladimir Kuklin


Work Items
----------

#. Nailgun shouldn't serialize a recovered controller as primary.
   Nailgun should be able to reinstall a slave node and return it
   to the cluster under the same name.


Dependencies
============

No strict dependencies

Testing
=======

* Manual testing and acceptance criteria:

  - It is possible to perform a full reinstallation (all data is purged) of a
    failed slave node to recover it to its previous working state
  - It is possible to perform a partial reinstallation (some data is preserved)
    of a failed slave node to recover it to its previous working state

Scenarios to automate:

Reinstall single compute:

1. Reinstall the compute node
2. Run Network check
3. Run the OSTF test set
4. List nova services and verify that the 'nova-compute' service is enabled
   and running on the reinstalled node
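
The last check in this scenario could be automated along these lines. This is a minimal sketch that assumes the service list has already been collected into dictionaries whose keys mirror the columns of ``nova service-list`` (binary, host, status, state); ``compute_service_ok`` and the sample records are hypothetical.

```python
from typing import Dict, List


def compute_service_ok(services: List[Dict[str, str]], host: str) -> bool:
    """Return True if nova-compute on `host` is both enabled and up."""
    return any(
        s.get("binary") == "nova-compute"
        and s.get("host") == host
        and s.get("status") == "enabled"
        and s.get("state") == "up"
        for s in services
    )


# Made-up sample data for illustration only.
services = [
    {"binary": "nova-scheduler", "host": "node-1", "status": "enabled", "state": "up"},
    {"binary": "nova-compute", "host": "node-4", "status": "enabled", "state": "up"},
    {"binary": "nova-compute", "host": "node-5", "status": "enabled", "state": "down"},
]
reinstalled_ok = compute_service_ok(services, "node-4")
```

Checking both the administrative status and the reported state keeps the test from passing when the service is enabled but has not checked in after the reinstallation.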

Reinstall single controller:

1. Reinstall the controller
2. Run Network check
3. Run the OSTF test set
4. Verify that the reinstalled controller is in the Pacemaker cluster and has
   'online' status
5. Verify that the reinstalled controller is in the RabbitMQ cluster and running
6. Verify that the reinstalled controller is in the Galera cluster
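
The Galera membership check might be expressed as below. The sketch assumes the ``wsrep_*`` status values have already been read from the reinstalled controller (for example with ``SHOW STATUS LIKE 'wsrep_%'``) into a dictionary; ``galera_member_ok``, the expected cluster size, and the sample values are hypothetical, and the exact set of conditions a real test should require is a judgment call.

```python
from typing import Dict


def galera_member_ok(status: Dict[str, str], expected_size: int) -> bool:
    """Check wsrep status values collected from the reinstalled controller."""
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
        and status.get("wsrep_ready") == "ON"
        and int(status.get("wsrep_cluster_size", "0")) == expected_size
    )


# Made-up sample values for illustration only.
synced = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_ready": "ON",
    "wsrep_cluster_size": "3",
}
joining = dict(synced, wsrep_local_state_comment="Joining", wsrep_ready="OFF")
```

Comparing the cluster size against the expected number of controllers guards against the node forming its own single-member cluster instead of rejoining the existing one.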

Reinstallation of full cluster:

1. Reinstall the whole cluster
2. Run Network check
3. Run the OSTF test set
4. Verify that each reinstalled controller is in the Pacemaker cluster and has
   'online' status
5. Verify that each reinstalled controller is in the RabbitMQ cluster and running
6. Verify that each reinstalled controller is in the Galera cluster
7. List nova services and verify that the 'nova-compute' service is enabled


Documentation Impact
====================

Reinstallation documentation will be added to the User Guide section.

References
==========