Fenix rolling upgrade
Use case definition for Fenix project Change-Id: I45e3b4982be4357479628f4414328915bf92a62d Signed-off-by: Tomi Juvonen <tomi.juvonen@nokia.com>
This commit is contained in:
parent
bf603c75cc
commit
124e51c285
@ -10,3 +10,4 @@ a starting point.
|
||||
|
||||
use-cases/nic-failure-affects-instance-and-app.rst
|
||||
use-cases/heat-mistral-aodh.rst
|
||||
use-cases/fenix-rolling-upgrade.rst
|
||||
|
107
use-cases/fenix-rolling-upgrade.rst
Normal file
107
use-cases/fenix-rolling-upgrade.rst
Normal file
@ -0,0 +1,107 @@
|
||||
==============================================
|
||||
Infrastructure rolling maintenance and upgrade
|
||||
==============================================
|
||||
|
||||
Telco has for years made maintenance and upgrades in rolling fashion. Now it is
|
||||
the time to achieve this in the OpenStack also. Rolling upgrade makes minimal
|
||||
downtime to infrastructure as well as for the application on top of it.
|
||||
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
- Infrastructure maintenance and upgrade needs to possible in rolling fashion
|
||||
to minimize downtime for services and applications.
|
||||
|
||||
- Maintenance and upgrade needs to be managed without adding more resources
|
||||
to a system while all compute capacity is in use.
|
||||
|
||||
- It needs to be possible to know what hosts and instances are maintained and
|
||||
what not.
|
||||
|
||||
- There needs to be a generic messaging defined between infrastructure and
|
||||
application manager (VNFM).
|
||||
|
||||
- It has to be possible to ask application manager to scale down at non busy
|
||||
hour to get free capacity during rolling maintenance and upgrade.
|
||||
|
||||
- Application manager will need to know when planned maintenance session is
|
||||
over, so it can scale back to full capacity.
|
||||
|
||||
- Application manager needs to be aware of planned host maintenance, so
|
||||
application (VNF) will safely be running somewhere else when the host will
|
||||
be down for maintenance.
|
||||
|
||||
- Different infrastructure services needs to be aware of host being down for
|
||||
maintenance. This can be important to disable automatic self-healing
|
||||
actions or billing. There needs to be a generic messaging defined for this.
|
||||
|
||||
- Application manager needs to know when his instances are to move to
|
||||
upgraded host, so it can also make its own upgrade to take new
|
||||
capabilities into use.
|
||||
|
||||
- Rolling maintenance framework needs to be pluggable to handle different
|
||||
maintenance and upgrade workflows and actions for hosts. This is also
|
||||
important to support different payloads and clouds.
|
||||
|
||||
- Infrastructure admin needs to be able to have rolling maintenance done
|
||||
with one-click.
|
||||
|
||||
- Infrastructure admin needs to be able to know rolling maintenance status
|
||||
through API and notification.
|
||||
|
||||
- It must be possible for each maintenance session to define needed software
|
||||
packages and plug-ins to run the maintenance workflow properly.
|
||||
|
||||
|
||||
OpenStack projects used
|
||||
=======================
|
||||
|
||||
All mentioned problems are being solved by the new `Fenix
|
||||
<https://wiki.openstack.org/wiki/Fenix>`_ project to manage the
|
||||
rolling maintenance and upgrade. More of its internals can be read
|
||||
from project own documentation and blueprints. Proof of concept code
|
||||
is already being tested in the OPNFV Doctor CI with a sample
|
||||
implementation. The `Doctor maintenance design document`__ describes
|
||||
the initial interaction needed. Also, the presentation in the
|
||||
OpenStack Vancouver summit `"How to gain VNF zero downtime during
|
||||
Infrastructure Maintenance and Upgrade"`__ will show the way for
|
||||
implementing the Fenix.
|
||||
|
||||
__ https://wiki.opnfv.org/download/attachments/5046291/Planned%20Maintenance%20Design%20Guideline.pdf?version=1&modificationDate=1527183603000&api=v2
|
||||
__ https://www.openstack.org/videos/vancouver-2018/how-to-gain-vnf-zero-down-time-during-infrastructure-maintenance-and-upgrade
|
||||
|
||||
As Fenix can interact with the application manager. There is a
|
||||
blueprint to support the interaction in Tacker__. This would enable a
|
||||
complex test case to be built to test Fenix workflow, that uses purely
|
||||
OpenStack components.
|
||||
|
||||
__ https://blueprints.launchpad.net/tacker/+spec/vnf-rolling-upgrade
|
||||
|
||||
To disable self-healing, Fenix host maintenance notification could be
|
||||
supported by Vitrage and Masakari.
|
||||
|
||||
As workflows can be different, there has already been some discussion with
|
||||
the Airship and the Blazar projects. The Blazar should make a blueprint to have
|
||||
it possible to change application-specific reservations to support rolling
|
||||
maintenance. Airship could later look to implement its own maintenance and
|
||||
upgrade process by utilizing Fenix.
|
||||
|
||||
Upgrade checks for different projects are `a community goal for
|
||||
Stein`__. This is one step towards the automated rolling upgrade.
|
||||
|
||||
__ https://storyboard.openstack.org/#!/story/2003657
|
||||
|
||||
|
||||
Future work
|
||||
===========
|
||||
|
||||
`Fenix blueprints`__ indicate what is yet to be done for the basic
|
||||
Fenix engine. When this work is ready, one can concentrate to make the
|
||||
sample workflow plug-in for the rolling upgrade, sample upgrade action
|
||||
plug-ins and the framework for testing it. Ideally, the framework use
|
||||
case would be the OpenStack and application (VNF) upgrade. This can
|
||||
then work as an example to implement own workflow and other plug-ins
|
||||
for a specific real work use case.
|
||||
|
||||
__ https://storyboard.openstack.org/#!/worklist/482
|
Loading…
Reference in New Issue
Block a user