From 61b7f497543179d66b822a6c710cd564cd394cac Mon Sep 17 00:00:00 2001 From: Kenny Johnston Date: Wed, 9 Dec 2015 19:52:37 -0600 Subject: [PATCH] Add Gaps Analysis to Rolling Upgrades Added process for rolling upgrade and initial gaps analysis Refactor the folder structure. Change-Id: I95dc7d484d76118fadf143ba57593258bcc90ad7 --- user-stories/tracked/rolling-upgrades.rst | 92 ------- .../tracked/rolling-upgrades/gap/empty | 0 .../rolling-upgrades/rolling-upgrades.rst | 239 ++++++++++++++++++ 3 files changed, 239 insertions(+), 92 deletions(-) delete mode 100644 user-stories/tracked/rolling-upgrades.rst create mode 100644 user-stories/tracked/rolling-upgrades/gap/empty create mode 100644 user-stories/tracked/rolling-upgrades/rolling-upgrades.rst diff --git a/user-stories/tracked/rolling-upgrades.rst b/user-stories/tracked/rolling-upgrades.rst deleted file mode 100644 index bc9ac8e..0000000 --- a/user-stories/tracked/rolling-upgrades.rst +++ /dev/null @@ -1,92 +0,0 @@ -Rolling Upgrades -============================= -**Sections in** *italics* **are optional.** - -*Problem description* ---------------------- -OpenStack operators often shy away from upgrading to the latest OpenStack -release due to concerns about the intrusiveness of upgrades. This prohibits -operators from realizing the complete value of their OpenStack cloud, -specifically their access to a constantly improving platform. - -User Stories ------------- -* As a Cloud User, I want to experience a stable, regularly updated - OpenStack platform in order to utilize new features, bug fixes and - security enhancements, so that my cloud development experience is - consistently world-class. 
-* As a Cloud Operator, I want to provide my users a reliable and - available OpenStack platform so that they do not experience any data - plane or control plane downtime -* As a Cloud Operator, I want to have confidence in my ability to - perform an OpenStack cloud upgrade so that I can perform them on a - monthly basis -* As a Cloud Operator, I want to be able to roll back the most recent cloud - upgrade I initiate in the event of issues so that I can be confident - that even in the case of errors I will still avoid data plane or - control plane downtime -* As a Cloud Operator, I want to be able to define characteristics of - a rolling reboot of my data and control plane hosts so that my users - are not impacted by a rolling upgrade - -Usage Scenarios Examples ------------------------- -1. Successful upgrade - a. Cloud Operator schedules OpenStack upgrade to latest release - b. Cloud Operator can be assured that API contracts are backwards compatible - c. Cloud Operator performs upgrade following simple documentation - d. Cloud Operator notifies users of successful upgrade and new feature and - enhancement availability - e. Cloud Operator schedules next upgrade for 1 month's time to take - advantage of backports and security updates -2. Unsuccessful upgrade - a. Cloud Operator schedules OpenStack upgrade to latest 6 month release - b. While performing upgrade Cloud Operator notices an unexpected error - c. Cloud Operator rolls back the upgrade to a previously known, error-free - state -3. Immediate Upgrade - a. Cloud Operator is informed that a security vulnerability has been found - in an OpenStack service and a patch is available for the current release - b. Cloud Operator schedules an upgrade to the newest update - c. After successfully completed the Cloud Operator's cloud is no longer - vulnerable -4. Rolling Upgrade on Dataplane - a. 
Cloud Operator schedules an OpenStack upgrade for a security - vulnerability which requires reboots of the entire dataplane hosts - b. Cloud Operator initiates the upgrade and performs the reboots of the - dataplane hosts in an automated, configurable process - c. Cloud Users are unaffected by the reboots - -Opportunity/Justification -------------------------- -This is a large reason why enterprises fail to gain the full value of their -OpenStack cloud. Upgrades have never been easy and in many environments require -**downtime of both the control and dataplane.** This is an inherently un-cloudy -characteristic of the OpenStack platform. Fixing upgrades so would clear up -many concerns which limit OpenStack adoption today. - -Related User Stories --------------------- -None. - -*Requirements* --------------- -None. - -*Gaps* ------- -Upgrades today require downtime in the data plane, network connectivity and often -control plane. - -*Affected By* -------------- -None. - -*External References* ---------------------- -None. - -Glossary --------- -**Control Plane** Hosts or infrastructure which operate OpenStack services -**Data Plane** Hosts or infrastructure which are managed by OpenStack services diff --git a/user-stories/tracked/rolling-upgrades/gap/empty b/user-stories/tracked/rolling-upgrades/gap/empty new file mode 100644 index 0000000..e69de29 diff --git a/user-stories/tracked/rolling-upgrades/rolling-upgrades.rst b/user-stories/tracked/rolling-upgrades/rolling-upgrades.rst new file mode 100644 index 0000000..810f8bc --- /dev/null +++ b/user-stories/tracked/rolling-upgrades/rolling-upgrades.rst @@ -0,0 +1,239 @@ +Rolling Updates and Upgrades +============================= + +*Problem description* +--------------------- +OpenStack operators often shy away from upgrading or updating OpenStack due to +concerns about the intrusiveness of upgrades. 
This prevents operators from +realizing the complete value of their OpenStack cloud, specifically their +access to a constantly improving platform and interoperability with an +expanding OpenStack ecosystem. + +The use cases below cover deployments based directly on the OpenStack upstream +code base. While some of the features may be utilized by distribution providers +to improve their support for non-disruptive updates and upgrades, they are not +specifically covered in this document. + +User Stories +------------ +* As a Cloud User, I want to experience a stable, regularly updated + OpenStack platform in order to utilize new features, bug fixes and + security enhancements, so that my cloud development experience is + consistently world-class. +* As a Cloud Operator, I want to provide my users a reliable and + available OpenStack platform so that they do not experience any data + plane or control plane downtime +* As a Cloud Operator, I want to have confidence in my ability to + perform OpenStack cloud updates so that I can perform them on a + monthly basis +* As a Cloud Operator, I want to be able to roll back the most recent cloud + upgrade or update I initiate in the event of issues so that I can be + confident that even in the case of errors I will still avoid data plane or + control plane downtime +* As a Cloud Operator, I want to be able to define characteristics of + a rolling reboot of my data and control plane hosts so that my users + are not impacted by a rolling upgrade + +Usage Scenarios Examples +------------------------ +1. Successful upgrade + a. Cloud Operator schedules OpenStack upgrade to latest release + b. Cloud Operator can be assured that API contracts are backwards + compatible + c. Cloud Operator performs upgrade following simple documentation + d. Cloud Operator notifies users of successful upgrade and new feature and + enhancement availability + e.
Cloud Operator schedules next update for 1 month's time (or as needed) + to take advantage of backports, bug fixes and security updates +2. Unsuccessful Update/Upgrade + a. Cloud Operator schedules OpenStack upgrade to latest 6 month release + b. While performing upgrade Cloud Operator notices an unexpected error + c. Cloud Operator rolls back the upgrade or update to a previously known, + error-free state +3. Immediate Update + a. Cloud Operator is informed that a security vulnerability has been found + in an OpenStack service and a patch is available for the current release + b. Cloud Operator schedules an update to correct the vulnerability + c. After the update completes successfully, the Cloud Operator's cloud is no + longer vulnerable +4. Rolling Upgrade on Dataplane + a. Cloud Operator schedules an OpenStack upgrade or update for a security + vulnerability which requires reboots of the entire fleet of data-plane + hosts + b. Cloud Operator initiates the upgrade and performs the reboots of the + dataplane hosts in an automated, configurable process + c. Cloud Users are unaffected by the reboots + +Opportunity/Justification +------------------------- +This is a large reason why enterprises fail to gain the full value of their +OpenStack cloud. **Upgrades have never been easy and in many environments +require downtime of both the control and dataplane.** This is an inherently +un-cloudy characteristic of the OpenStack platform. Fixing upgrades would +clear up many concerns which limit OpenStack adoption today. + +Related User Stories +-------------------- +None. + +*Requirements* +-------------- +None. + +*Gaps* +------ +Upgrades today require downtime in the data plane, network connectivity and +often control plane. + +The current gaps preventing rolling upgrades span a number of fronts which can +best be illustrated via a process for performing a rolling upgrade. + +1. **Maintenance Mode**- Preventing the scheduling of additional instances on a + host +2.
**Live Migration**- Improvements to live migrating existing resources from + hosts +3. **Upgrade Orchestration**- Orchestrating deployment of upgraded or new + versions of a service +4. **Versioned Objects**- Enabling communication between different versions of + the same OpenStack Service +5. **Online Schema Migration**- Enable database schema migrations without + requiring service downtime +6. **Graceful Shutdown**- Ensure services can be shut down without interrupting + requests in process +7. **Upgrade Orchestration**- Orchestrating potential removal of older versions + of a service and cleanup +8. **Upgrade Orchestration**- Ease of use tools for performing upgrades across + control and data plane hosts +9. **Upgrade Gating**- Gating projects on successful rolling upgrades +10. **Project Tagging**- Informing operators which projects can successfully + perform rolling upgrades + + +For operators, a successful cloud upgrade involves all OpenStack services +deployed in a cloud. For that reason a number of these fronts require +enhancements to all projects likely deployed by operators. We'll review these +items first: + +**Versioned Objects** + +A versioned objects library exists in Oslo. Each individual project must consider +whether or not versioned objects is the right tool for the rolling upgrades +job. The following is the status of versioned objects for common OpenStack +projects: + +* Nova - Implemented +* Neutron - Not Implemented +* Glance - Not Implemented +* Cinder - Implemented +* Swift - Not Applicable +* Keystone - Not Implemented +* Horizon - Not Implemented +* Heat - Implemented +* Ceilometer - Alternatives Proposed + +**Online Schema Migration** + +Online schema migration, like versioned object support, is solved in a variety +of fashions. Some projects propose standard schema expansion and contraction to +happen over an entire development cycle rather than online at the time of +upgrade.
The following is the status of online schema migration for common +OpenStack projects: + +* Nova - Alternative Adopted +* Neutron - Not Implemented +* Glance - Unknown +* Cinder - Not Implemented +* Swift - Unknown +* Keystone - Unknown +* Horizon - Unknown +* Heat - Unknown +* Ceilometer - Unknown + +**Maintenance Mode** + +Maintenance mode is only useful in those services where entire hosts are used +to create virtual resources. The following is the status of maintenance mode +for applicable OpenStack projects: + +* Nova - Implemented +* Cinder - Not Implemented +* Neutron - Unknown +* Ceilometer - Unknown +* Swift - Implemented + +**Live Migration** + +Like maintenance mode, live migration is only applicable to those services +where hosts are providing resources. The following is the status of live +migration for applicable OpenStack projects: + +* Nova - Implemented (needs some improvements) +* Cinder - Not Implemented +* Swift - Implemented + +**Graceful Shutdown** + +Graceful shutdown is applicable to all common OpenStack services and should +result in services shutting down only after existing requests have +been processed. The following is the status of graceful shutdown across common +OpenStack projects: + +* Nova - Implemented +* Neutron - Implemented +* Glance - Unknown +* Cinder - Implemented +* Swift - Unknown +* Keystone - Unknown +* Horizon - Unknown +* Heat - Unknown +* Ceilometer - Unknown + +Other fronts require work in specific orchestration projects or OpenStack +infra. + +**Upgrade Orchestration** + +Within OpenStack many of the cloud deployment mechanisms have made concerted +effort towards providing upgrade orchestration. Depending on the reference +architecture each deployment mechanism will determine the appropriate order and +methodology for performing a rolling upgrade.
The status of each deployment +method's approach to rolling upgrades follows: + +* TripleO - Unknown +* Fuel - Unknown +* OpenStack Puppet - Unknown +* OpenStack Ansible - Upgrade scripts +* OpenStack Chef - Unknown + +**Upgrade Gating** + +OpenStack infra has not begun deploying upgrade tests into the gate. There is +an available single node upgrade test project called grenade. + +**Project Tagging** + +There is no project meta data tag to signify that a given OpenStack project is +capable of performing a rolling upgrade. +* Status - Implemented + +*Affected By* +------------- +None. + +*External References* +--------------------- +* `Dan Smith's Upgrade Blog Series `_ +* `Rolling Upgrades Project Meta Data Tag `_ + + +Glossary +-------- +* **Control Plane** Hosts or infrastructure which operate OpenStack services + (e.g. nova-api) +* **Data Plane** Hosts or infrastructure which are managed by OpenStack + services (e.g. VM running on the hypervisor) +* **Upgrade** Installing an entirely different OpenStack major software release + with new versions available twice a year +* **Update** Installing new OpenStack software, typically from a stable branch, + to gain access to bug fixes, security patches etc. These can happen as + frequently as needed \ No newline at end of file