Fenix rolling upgrade

Use case definition for Fenix project

Change-Id: I45e3b4982be4357479628f4414328915bf92a62d
Signed-off-by: Tomi Juvonen <tomi.juvonen@nokia.com>
2018-11-05 07:15:50 +02:00
commit 124e51c285
parent bf603c75cc
2 changed files with 108 additions and 0 deletions
--- a/doc/source/use-cases.rst
+++ b/doc/source/use-cases.rst
@@ -10,3 +10,4 @@ a starting point.

   use-cases/nic-failure-affects-instance-and-app.rst
   use-cases/heat-mistral-aodh.rst
+   use-cases/fenix-rolling-upgrade.rst
--- a/use-cases/fenix-rolling-upgrade.rst
+++ b/use-cases/fenix-rolling-upgrade.rst
@@ -0,0 +1,107 @@
+==============================================
+Infrastructure rolling maintenance and upgrade
+==============================================
+
+The telco industry has performed maintenance and upgrades in a rolling fashion
+for years, and now it is time to achieve the same in OpenStack. A rolling
+upgrade minimizes downtime for both the infrastructure and its applications.
+
+
+Problem description
+===================
+
+- Infrastructure maintenance and upgrades need to be possible in a rolling
+  fashion to minimize downtime for services and applications.
+
+- Maintenance and upgrades need to be manageable without adding resources
+  to the system, even when all compute capacity is in use.
+
+- It needs to be possible to know which hosts and instances have been
+  maintained and which have not.
+
+- Generic messaging needs to be defined between the infrastructure and the
+  application manager (VNFM).
+
+- It has to be possible to ask the application manager to scale down during
+  off-peak hours to free capacity for the rolling maintenance and upgrade.
+
+- The application manager needs to know when a planned maintenance session is
+  over, so it can scale back up to full capacity.
+
+- The application manager needs to be aware of planned host maintenance, so
+  that the application (VNF) is safely running elsewhere when the host goes
+  down for maintenance.
+
+- Various infrastructure services need to be aware of a host being down for
+  maintenance, for example to disable automatic self-healing actions or
+  billing. Generic messaging needs to be defined for this as well.
+
+- The application manager needs to know when its instances are about to be
+  moved to an upgraded host, so that it can perform its own upgrade to take
+  new capabilities into use.
+
+- The rolling maintenance framework needs to be pluggable to handle different
+  maintenance and upgrade workflows and host actions. This is also important
+  for supporting different payloads and clouds.
+
+- The infrastructure admin needs to be able to run rolling maintenance with
+  a single click.
+
+- The infrastructure admin needs to be able to track the rolling maintenance
+  status through an API and notifications.
+
+- It must be possible for each maintenance session to define the software
+  packages and plug-ins needed to run its maintenance workflow properly.
+
+
+OpenStack projects used
+=======================
+
+All of the problems above are being addressed by the new `Fenix
+<https://wiki.openstack.org/wiki/Fenix>`_ project, which manages
+rolling maintenance and upgrades. More about its internals can be found
+in the project's own documentation and blueprints. Proof-of-concept code
+is already being tested in the OPNFV Doctor CI with a sample
+implementation. The `Doctor maintenance design document`__ describes
+the initial interaction needed. In addition, the OpenStack Vancouver
+summit presentation `"How to gain VNF zero downtime during
+Infrastructure Maintenance and Upgrade"`__ outlines how Fenix can be
+implemented.
+
+__ https://wiki.opnfv.org/download/attachments/5046291/Planned%20Maintenance%20Design%20Guideline.pdf?version=1&modificationDate=1527183603000&api=v2
+__ https://www.openstack.org/videos/vancouver-2018/how-to-gain-vnf-zero-down-time-during-infrastructure-maintenance-and-upgrade
+
+As Fenix can interact with the application manager, there is a
+blueprint to support this interaction in Tacker__. This would enable
+building a complex test case for the Fenix workflow that uses purely
+OpenStack components.
+
+__ https://blueprints.launchpad.net/tacker/+spec/vnf-rolling-upgrade
+
+To disable self-healing during maintenance, Vitrage and Masakari could
+support the Fenix host maintenance notification.
+
+As workflows can differ, there have already been some discussions with
+the Airship and Blazar projects. Blazar should create a blueprint to make
+it possible to change application-specific reservations to support rolling
+maintenance. Airship could later implement its own maintenance and
+upgrade process by utilizing Fenix.
+
+Upgrade checks for the different projects are `a community goal for
+Stein`__. This is one step towards automated rolling upgrades.
+
+__ https://storyboard.openstack.org/#!/story/2003657
+
+
+Future work
+===========
+
+`Fenix blueprints`__ indicate what is yet to be done for the basic
+Fenix engine. Once this work is complete, the focus can shift to
+creating a sample workflow plug-in for the rolling upgrade, sample
+upgrade action plug-ins, and a framework for testing them. Ideally,
+the framework's use case would be an upgrade of both OpenStack and the
+application (VNF). This could then serve as an example for implementing
+custom workflow and other plug-ins for a specific real-world use case.
+
+__ https://storyboard.openstack.org/#!/worklist/482