Merge "Add Gaps Analysis to Rolling Upgrades"
@@ -1,92 +0,0 @@
|
|||||||
Rolling Upgrades
|
|
||||||
=============================
|
|
||||||
**Sections in** *italics* **are optional.**
|
|
||||||
|
|
||||||
*Problem description*
|
|
||||||
---------------------
|
|
||||||
OpenStack operators often shy away from upgrading to the latest OpenStack
|
|
||||||
release due to concerns about the intrusiveness of upgrades. This prohibits
|
|
||||||
operators from realizing the complete value of their OpenStack cloud,
|
|
||||||
specifically their access to a constantly improving platform.
|
|
||||||
|
|
||||||
User Stories
|
|
||||||
------------
|
|
||||||
* As a Cloud User, I want to experience a stable, regularly updated
|
|
||||||
OpenStack platform in order to utilize new features, bug fixes and
|
|
||||||
security enhancements, so that my cloud development experience is
|
|
||||||
consistently world-class.
|
|
||||||
* As a Cloud Operator, I want to provide my users a reliable and
|
|
||||||
available OpenStack platform so that they do not experience any data
|
|
||||||
plane or control plane downtime
|
|
||||||
* As a Cloud Operator, I want to have confidence in my ability to
|
|
||||||
perform an OpenStack cloud upgrade so that I can perform them on a
|
|
||||||
monthly basis
|
|
||||||
* As a Cloud Operator, I want to be able to roll back the most recent cloud
|
|
||||||
upgrade I initiate in the event of issues so that I can be confident
|
|
||||||
that even in the case of errors I will still avoid data plane or
|
|
||||||
control plane downtime
|
|
||||||
* As a Cloud Operator, I want to be able to define characteristics of
|
|
||||||
a rolling reboot of my data and control plane hosts so that my users
|
|
||||||
are not impacted by a rolling upgrade
|
|
||||||
|
|
||||||
Usage Scenarios Examples
|
|
||||||
------------------------
|
|
||||||
1. Successful upgrade
|
|
||||||
a. Cloud Operator schedules OpenStack upgrade to latest release
|
|
||||||
b. Cloud Operator can be assured that API contracts are backwards compatible
|
|
||||||
c. Cloud Operator performs upgrade following simple documentation
|
|
||||||
d. Cloud Operator notifies users of successful upgrade and new feature and
|
|
||||||
enhancement availability
|
|
||||||
e. Cloud Operator schedules next upgrade for 1 month's time to take
|
|
||||||
advantage of backports and security updates
|
|
||||||
2. Unsuccessful upgrade
|
|
||||||
a. Cloud Operator schedules OpenStack upgrade to latest 6 month release
|
|
||||||
b. While performing upgrade Cloud Operator notices an unexpected error
|
|
||||||
c. Cloud Operator rolls back the upgrade to a previously known, error-free
|
|
||||||
state
|
|
||||||
3. Immediate Upgrade
|
|
||||||
a. Cloud Operator is informed that a security vulnerability has been found
|
|
||||||
in an OpenStack service and a patch is available for the current release
|
|
||||||
b. Cloud Operator schedules an upgrade to the newest update
|
|
||||||
c. After successfully completed the Cloud Operator's cloud is no longer
|
|
||||||
vulnerable
|
|
||||||
4. Rolling Upgrade on Dataplane
|
|
||||||
a. Cloud Operator schedules an OpenStack upgrade for a security
|
|
||||||
vulnerability which requires reboots of the entire dataplane hosts
|
|
||||||
b. Cloud Operator initiates the upgrade and performs the reboots of the
|
|
||||||
dataplane hosts in an automated, configurable process
|
|
||||||
c. Cloud Users are unaffected by the reboots
|
|
||||||
|
|
||||||
Opportunity/Justification
|
|
||||||
-------------------------
|
|
||||||
This is a large reason why enterprises fail to gain the full value of their
|
|
||||||
OpenStack cloud. Upgrades have never been easy and in many environments require
|
|
||||||
**downtime of both the control and dataplane.** This is an inherently un-cloudy
|
|
||||||
characteristic of the OpenStack platform. Fixing upgrades so would clear up
|
|
||||||
many concerns which limit OpenStack adoption today.
|
|
||||||
|
|
||||||
Related User Stories
|
|
||||||
--------------------
|
|
||||||
None.
|
|
||||||
|
|
||||||
*Requirements*
|
|
||||||
--------------
|
|
||||||
None.
|
|
||||||
|
|
||||||
*Gaps*
|
|
||||||
------
|
|
||||||
Upgrades today require downtime in the data plane, network connectivity and often
|
|
||||||
control plane.
|
|
||||||
|
|
||||||
*Affected By*
|
|
||||||
-------------
|
|
||||||
None.
|
|
||||||
|
|
||||||
*External References*
|
|
||||||
---------------------
|
|
||||||
None.
|
|
||||||
|
|
||||||
Glossary
|
|
||||||
--------
|
|
||||||
**Control Plane** Hosts or infrastructure which operate OpenStack services
|
|
||||||
**Data Plane** Hosts or infrastructure which are managed by OpenStack services
|
|
||||||
0
user-stories/tracked/rolling-upgrades/gap/empty
Normal file
239
user-stories/tracked/rolling-upgrades/rolling-upgrades.rst
Normal file
@@ -0,0 +1,239 @@
|
|||||||
|
Rolling Updates and Upgrades
|
||||||
|
=============================
|
||||||
|
|
||||||
|
*Problem description*
|
||||||
|
---------------------
|
||||||
|
OpenStack operators often shy away from upgrading or updating OpenStack due to
|
||||||
|
concerns about the intrusiveness of upgrades. This prohibits operators from
|
||||||
|
realizing the complete value of their OpenStack cloud, specifically their
|
||||||
|
access to a constantly improving platform and interoperability with an
|
||||||
|
expanding OpenStack ecosystem.
|
||||||
|
|
||||||
|
The use cases below cover deployments based directly on the OpenStack upstream
|
||||||
|
code base. While some of the features may be utilized by distribution providers
|
||||||
|
to improve their support for non-disruptive updates and upgrades, they are not
|
||||||
|
specifically covered in this document.
|
||||||
|
|
||||||
|
User Stories
|
||||||
|
------------
|
||||||
|
* As a Cloud User, I want to experience a stable, regularly updated
|
||||||
|
OpenStack platform in order to utilize new features, bug fixes and
|
||||||
|
security enhancements, so that my cloud development experience is
|
||||||
|
consistently world-class.
|
||||||
|
* As a Cloud Operator, I want to provide my users a reliable and
|
||||||
|
available OpenStack platform so that they do not experience any data
|
||||||
|
plane or control plane downtime
|
||||||
|
* As a Cloud Operator, I want to have confidence in my ability to
|
||||||
|
perform OpenStack cloud updates so that I can perform them on a
|
||||||
|
monthly basis
|
||||||
|
* As a Cloud Operator, I want to be able to roll back the most recent cloud
|
||||||
|
upgrade or update I initiate in the event of issues so that I can be
|
||||||
|
confident that even in the case of errors I will still avoid data plane or
|
||||||
|
control plane downtime
|
||||||
|
* As a Cloud Operator, I want to be able to define characteristics of
|
||||||
|
a rolling reboot of my data and control plane hosts so that my users
|
||||||
|
are not impacted by a rolling upgrade
|
||||||
|
|
||||||
|
Usage Scenarios Examples
|
||||||
|
------------------------
|
||||||
|
1. Successful upgrade
|
||||||
|
a. Cloud Operator schedules OpenStack upgrade to latest release
|
||||||
|
b. Cloud Operator can be assured that API contracts are backwards
|
||||||
|
compatible
|
||||||
|
c. Cloud Operator performs upgrade following simple documentation
|
||||||
|
d. Cloud Operator notifies users of successful upgrade and new feature and
|
||||||
|
enhancement availability
|
||||||
|
e. Cloud Operator schedules next update for 1 month's time (or as needed)
|
||||||
|
to take advantage of backports, bug fixes and security updates
|
||||||
|
2. Unsuccessful Update/Upgrade
|
||||||
|
a. Cloud Operator schedules OpenStack upgrade to latest 6 month release
|
||||||
|
b. While performing upgrade Cloud Operator notices an unexpected error
|
||||||
|
c. Cloud Operator rolls back the upgrade or update to a previously known,
|
||||||
|
error-free state
|
||||||
|
3. Immediate Update
|
||||||
|
a. Cloud Operator is informed that a security vulnerability has been found
|
||||||
|
in an OpenStack service and a patch is available for the current release
|
||||||
|
b. Cloud Operator schedules an update to correct the vulnerability
|
||||||
|
c. After the update completes successfully, the Cloud Operator's cloud is no longer
|
||||||
|
vulnerable
|
||||||
|
4. Rolling Upgrade on Dataplane
|
||||||
|
a. Cloud Operator schedules an OpenStack upgrade or update for a security
|
||||||
|
vulnerability which requires reboots of the entire fleet of data-plane
|
||||||
|
hosts
|
||||||
|
b. Cloud Operator initiates the upgrade and performs the reboots of the
|
||||||
|
dataplane hosts in an automated, configurable process
|
||||||
|
c. Cloud Users are unaffected by the reboots
|
||||||
|
|
||||||
|
Opportunity/Justification
|
||||||
|
-------------------------
|
||||||
|
This is a large reason why enterprises fail to gain the full value of their
|
||||||
|
OpenStack cloud. **Upgrades have never been easy and in many environments
|
||||||
|
require downtime of both the control and data planes.** This is an inherently
|
||||||
|
un-cloudy characteristic of the OpenStack platform. Fixing upgrades would
|
||||||
|
clear up many concerns which limit OpenStack adoption today.
|
||||||
|
|
||||||
|
Related User Stories
|
||||||
|
--------------------
|
||||||
|
None.
|
||||||
|
|
||||||
|
*Requirements*
|
||||||
|
--------------
|
||||||
|
None.
|
||||||
|
|
||||||
|
*Gaps*
|
||||||
|
------
|
||||||
|
Upgrades today require downtime in the data plane, network connectivity and
|
||||||
|
often control plane.
|
||||||
|
|
||||||
|
The current gaps preventing rolling upgrades span a number of fronts which can
|
||||||
|
best be illustrated via a process for performing a rolling upgrade.
|
||||||
|
|
||||||
|
1. **Maintenance Mode** - Preventing the scheduling of additional instances on a
|
||||||
|
host
|
||||||
|
2. **Live Migration** - Improvements to live migrating existing resources from
|
||||||
|
hosts
|
||||||
|
3. **Upgrade Orchestration** - Orchestrating deployment of upgraded or new
|
||||||
|
versions of a service
|
||||||
|
4. **Versioned Objects** - Enabling communication between different versions of
|
||||||
|
the same OpenStack Service
|
||||||
|
5. **Online Schema Migration** - Enabling database schema migrations without
|
||||||
|
requiring service downtime
|
||||||
|
6. **Graceful Shutdown** - Ensuring services can be shut down without interrupting
|
||||||
|
requests in process
|
||||||
|
7. **Upgrade Orchestration** - Orchestrating potential removal of older versions
|
||||||
|
of a service and cleanup
|
||||||
|
8. **Upgrade Orchestration** - Ease of use tools for performing upgrades across
|
||||||
|
control and data plane hosts
|
||||||
|
9. **Upgrade Gating** - Gating projects on successful rolling upgrades
|
||||||
|
10. **Project Tagging** - Informing operators which projects can successfully
|
||||||
|
perform rolling upgrades
|
||||||
|
|
||||||
|
|
||||||
|
For operators, a successful cloud upgrade involves all OpenStack services
|
||||||
|
deployed in a cloud. For that reason a number of these fronts require
|
||||||
|
enhancements to all projects likely deployed by operators. We'll review these
|
||||||
|
items first:
|
||||||
|
|
||||||
|
**Versioned Objects**
|
||||||
|
|
||||||
|
A versioned objects library exists in Oslo. Each individual project must consider
|
||||||
|
whether or not versioned objects is the right tool for the rolling upgrades
|
||||||
|
job. The following is the status of versioned objects for common OpenStack
|
||||||
|
projects:
|
||||||
|
|
||||||
|
* Nova - Implemented
|
||||||
|
* Neutron - Not Implemented
|
||||||
|
* Glance - Not Implemented
|
||||||
|
* Cinder - Implemented
|
||||||
|
* Swift - Not Applicable
|
||||||
|
* Keystone - Not Implemented
|
||||||
|
* Horizon - Not Implemented
|
||||||
|
* Heat - Implemented
|
||||||
|
* Ceilometer - Alternatives Proposed
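
The idea behind versioned objects can be sketched in plain Python. The block
below is an illustration only and does not use the Oslo library's API; the
``Instance`` class, its fields, its version history and the ``to_primitive``
helper are all hypothetical. It shows a newer service downgrading a payload so
that a peer still running the older release can read it.

```python
# Hypothetical illustration only; this is not the Oslo versioned objects API.


def _as_tuple(version):
    """Turn a "major.minor" string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


class Instance:
    # Version history (assumed for this example):
    #   1.0 - initial fields: uuid, host
    #   1.1 - added field: availability_zone
    VERSION = "1.1"

    def __init__(self, uuid, host, availability_zone=None):
        self.uuid = uuid
        self.host = host
        self.availability_zone = availability_zone

    def to_primitive(self, target_version=None):
        """Serialize the object, dropping fields the receiver cannot read."""
        target = target_version or self.VERSION
        data = {
            "uuid": self.uuid,
            "host": self.host,
            "availability_zone": self.availability_zone,
        }
        if _as_tuple(target) < (1, 1):
            # A 1.0 peer does not know about availability_zone.
            del data["availability_zone"]
        return {"version": target, "data": data}
```

During a rolling upgrade the sender would pick ``target_version`` from the
oldest version still running in the deployment, so mixed-version services can
keep talking to each other.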
|
||||||
|
|
||||||
|
**Online Schema Migration**
|
||||||
|
|
||||||
|
Online schema migration, like versioned object support, is solved in a variety
|
||||||
|
of fashions. Some projects propose standard schema expansion and contraction to
|
||||||
|
happen over an entire development cycle rather than online at the time of
|
||||||
|
upgrade. The following is the status of online schema migration for common
|
||||||
|
OpenStack projects:
|
||||||
|
|
||||||
|
* Nova - Alternative Adopted
|
||||||
|
* Neutron - Not Implemented
|
||||||
|
* Glance - Unknown
|
||||||
|
* Cinder - Not Implemented
|
||||||
|
* Swift - Unknown
|
||||||
|
* Keystone - Unknown
|
||||||
|
* Horizon - Unknown
|
||||||
|
* Heat - Unknown
|
||||||
|
* Ceilometer - Unknown
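
The expand and contract approach mentioned above can be sketched with the
Python standard library's ``sqlite3`` module. This is a minimal illustration
under assumed names: the ``instances`` table and the ``name`` and
``display_name`` columns are hypothetical, and no OpenStack project's actual
migration tooling is shown.

```python
import sqlite3


def expand(conn):
    """Additive change: safe to apply while old code is still running."""
    conn.execute("ALTER TABLE instances ADD COLUMN display_name TEXT")


def migrate_data(conn):
    """Backfill the new column while old and new code run side by side."""
    conn.execute(
        "UPDATE instances SET display_name = name WHERE display_name IS NULL"
    )


def contract(conn):
    """Destructive change: apply only once no running service reads `name`."""
    # Rebuild the table without the old column.
    conn.executescript(
        """
        CREATE TABLE instances_new (id INTEGER PRIMARY KEY, display_name TEXT);
        INSERT INTO instances_new (id, display_name)
            SELECT id, display_name FROM instances;
        DROP TABLE instances;
        ALTER TABLE instances_new RENAME TO instances;
        """
    )
```

The point of splitting the work this way is that ``expand`` and
``migrate_data`` never break a service running the previous release, so only
``contract`` has to wait until every service has been upgraded.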
|
||||||
|
|
||||||
|
**Maintenance Mode**
|
||||||
|
|
||||||
|
Maintenance mode is only useful in those services where entire hosts are used
|
||||||
|
to create virtual resources. The following is the status of maintenance mode
|
||||||
|
for applicable OpenStack projects:
|
||||||
|
|
||||||
|
* Nova - Implemented
|
||||||
|
* Cinder - Not Implemented
|
||||||
|
* Neutron - Unknown
|
||||||
|
* Ceilometer - Unknown
|
||||||
|
* Swift - Implemented
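
As a rough sketch of what maintenance mode means for scheduling, the following
Python filters out hosts that an operator has flagged. The ``Host`` type, its
fields and ``schedulable_hosts`` are hypothetical names chosen for this
example and do not correspond to any project's scheduler code.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_vcpus: int
    in_maintenance: bool = False  # set by the operator before an upgrade


def schedulable_hosts(hosts, vcpus_needed):
    """Return hosts that may receive new instances.

    Hosts in maintenance keep running their existing instances but are
    skipped for new placements, so they can be drained and upgraded.
    """
    return [
        host
        for host in hosts
        if not host.in_maintenance and host.free_vcpus >= vcpus_needed
    ]
```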
|
||||||
|
|
||||||
|
**Live Migration**
|
||||||
|
|
||||||
|
Like maintenance mode, live migration is only applicable to those services
|
||||||
|
where hosts are providing resources. The following is the status of live
|
||||||
|
migration for applicable OpenStack projects:
|
||||||
|
|
||||||
|
* Nova - Implemented (needs some improvements)
|
||||||
|
* Cinder - Not Implemented
|
||||||
|
* Swift - Implemented
|
||||||
|
|
||||||
|
**Graceful Shutdown**
|
||||||
|
|
||||||
|
Graceful shutdown is applicable to all common OpenStack services and should
|
||||||
|
result in services being able to be shut down only after existing requests have
|
||||||
|
been processed. The following is the status of graceful shutdown across common
|
||||||
|
OpenStack projects:
|
||||||
|
|
||||||
|
* Nova - Implemented
|
||||||
|
* Neutron - Implemented
|
||||||
|
* Glance - Unknown
|
||||||
|
* Cinder - Implemented
|
||||||
|
* Swift - Unknown
|
||||||
|
* Keystone - Unknown
|
||||||
|
* Horizon - Unknown
|
||||||
|
* Heat - Unknown
|
||||||
|
* Ceilometer - Unknown
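
The behaviour described above, refusing new requests while letting existing
ones finish, can be sketched with the standard ``threading`` module. The
``Worker`` class and its method names are hypothetical and do not reflect how
any OpenStack service implements shutdown.

```python
import threading


class Worker:
    """Tracks in-flight requests so shutdown can wait for them to finish."""

    def __init__(self):
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._in_flight = 0
        self._stopping = False

    def try_begin(self):
        """Register a new request; returns False once shutdown has started."""
        with self._lock:
            if self._stopping:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Mark one request as finished and wake a waiting shutdown."""
        with self._idle:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def shutdown(self, timeout=None):
        """Stop accepting requests, then wait for in-flight ones to drain.

        Returns True if the worker drained, False if the timeout expired.
        """
        with self._idle:
            self._stopping = True
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout)
```

A service would typically call something like ``shutdown`` from its
termination signal handler and only exit once it returns ``True`` or the
operator-defined timeout is reached.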
|
||||||
|
|
||||||
|
Other fronts require work in specific orchestration projects or OpenStack
infra.
|
||||||
|
|
||||||
|
**Upgrade Orchestration**
|
||||||
|
|
||||||
|
Within OpenStack, many of the cloud deployment mechanisms have made a concerted
|
||||||
|
effort towards providing upgrade orchestration. Depending on the reference
|
||||||
|
architecture each deployment mechanism will determine the appropriate order and
|
||||||
|
methodology for performing a rolling upgrade. The status of each deployment
|
||||||
|
method's approach to rolling upgrades follows:
|
||||||
|
|
||||||
|
* TripleO - Unknown
|
||||||
|
* Fuel - Unknown
|
||||||
|
* OpenStack Puppet - Unknown
|
||||||
|
* OpenStack Ansible - Upgrade scripts
|
||||||
|
* OpenStack Chef - Unknown
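
Whatever the deployment mechanism, the core of a rolling upgrade is working
through hosts in small batches and stopping when a batch is unhealthy. The
sketch below is a generic illustration and not taken from any of the tools
listed above; ``upgrade_host`` and ``is_healthy`` are hypothetical callbacks
supplied by the caller, and ``max_unavailable`` is an assumed knob for how
many hosts may be out of service at once.

```python
def rolling_batches(hosts, max_unavailable):
    """Yield hosts in batches of at most `max_unavailable`."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    for start in range(0, len(hosts), max_unavailable):
        yield hosts[start:start + max_unavailable]


def rolling_upgrade(hosts, max_unavailable, upgrade_host, is_healthy):
    """Upgrade hosts batch by batch.

    Returns (upgraded, failed_batch). `failed_batch` is empty on success;
    otherwise it is the batch that failed its health check, and the caller
    decides whether to retry or roll back.
    """
    upgraded = []
    for batch in rolling_batches(hosts, max_unavailable):
        for host in batch:
            upgrade_host(host)
        if not all(is_healthy(host) for host in batch):
            return upgraded, batch
        upgraded.extend(batch)
    return upgraded, []
```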
|
||||||
|
|
||||||
|
**Upgrade Gating**
|
||||||
|
|
||||||
|
OpenStack infra has not begun deploying upgrade tests into the gate. There is
|
||||||
|
an available single-node upgrade test project called Grenade.
|
||||||
|
|
||||||
|
**Project Tagging**
|
||||||
|
|
||||||
|
There is no project metadata tag to signify that a given OpenStack project is
capable of performing a rolling upgrade.

* Status - Implemented
|
||||||
|
|
||||||
|
*Affected By*
|
||||||
|
-------------
|
||||||
|
None.
|
||||||
|
|
||||||
|
*External References*
|
||||||
|
---------------------
|
||||||
|
* `Dan Smith's Upgrade Blog Series <http://www.danplanet.com/blog/tag/upgrades/>`_
|
||||||
|
* `Rolling Upgrades Project Meta Data Tag <https://github.com/openstack/governance/blob/master/reference/tags/assert_supports-rolling-upgrade.rst>`_
|
||||||
|
|
||||||
|
|
||||||
|
Glossary
|
||||||
|
--------
|
||||||
|
* **Control Plane** Hosts or infrastructure which operate OpenStack services
|
||||||
|
(e.g. nova-api)
|
||||||
|
* **Data Plane** Hosts or infrastructure which are managed by OpenStack
|
||||||
|
services (e.g. VM running on the hypervisor)
|
||||||
|
* **Upgrade** Installing an entirely different OpenStack major software release
|
||||||
|
with new versions available twice a year
|
||||||
|
* **Update** Installing new OpenStack software, typically from a stable branch,
|
||||||
|
to gain access to bug fixes, security patches etc. These can happen as
|
||||||
|
frequently as needed
|
||||||