..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License. http://creativecommons.org/licenses/by/3.0/legalcode

===========================
StarlingX Platform Upgrades
===========================

Storyboard:
https://storyboard.openstack.org/#!/story/2007403

This story will provide a mechanism to upgrade the platform components on
a running StarlingX system. This is required to allow upgrades between
StarlingX versions.

The platform upgrade components include the Host OS and StarlingX
components (e.g. flock services).

A maintenance release for stx3 is required to upgrade to stx4.0.

Problem Description
===================

StarlingX must provide a mechanism to allow migration to a new StarlingX
release.

In order to provide a robust and simple upgrade experience for users of
StarlingX, the upgrade process must be automated as much as possible and
controls must be in place to ensure the steps are followed in the right
order.

Release-over-release compatibility of the platform components is affected
by inter-node messaging between components, configuration migration
requirements, and kubernetes control plane compatibility.

The downtime during an upgrade must be minimized:

* controller upgrade - impact minimized to the time for a host-swact
* worker upgrade - impact to applications minimized to the time it takes
  to migrate applications from a worker node before it is upgraded
* storage upgrade - no loss of storage during an upgrade

Upgrades must be done in-service. The platform and applications must
continue to provide service during the upgrade. This does not apply to
simplex deployments.

Upgrades must be done without any additional hardware.

Background
----------

Three types of StarlingX upgrades will be supported:

* Platform Upgrade, which includes the Host OS and StarlingX components
  (e.g. flock services)
* Kubernetes Upgrade
* Application Upgrade, which includes StarlingX applications (e.g.
  platform-integ-apps, stx-openstack), user applications and kubernetes
  workloads

These three types of upgrades are done independently. For example, the
platform is upgraded to a new release of StarlingX without changing the
kubernetes version. However, there are dependencies which determine the
order in which these upgrades can be done. For example, kubernetes must
be upgraded to a particular version before a platform upgrade can be done.

Use Cases
---------

* Administrator wants to upgrade to a new StarlingX platform version
  with minimal impact to running applications.
* Administrator wants to abort an upgrade in progress prior to upgrading
  all controllers.

Note: Downgrade to a previous release version is not supported.

Proposed Process
================

StarlingX will only support upgrades from release N to release N+1.
For example, maintenance release stx3.x can be upgraded to stx4.0,
but not directly to stx5.0.
The changes required for kubernetes configuration compatibility are
delivered in a maintenance release to enable the upgrade from stx3 to
stx4.

The administrator must ensure that their deployment has enough
extra capacity (e.g. worker hosts) to allow one (or more) hosts to be
temporarily taken out of service for the upgrade.

For each supported platform version, the versions that can be upgraded
from are tracked by the metadata
(in metal/common-bsp/files/upgrades/metadata.xml).
The metadata handling is extended to support multiple from-versions.

A maintenance release will enable stx3 to stx4 upgrades, and includes
the configuration updates required to enable compatibility with the
kubernetes control plane during the upgrade.

The following is a summary of the steps the user will take when performing
a platform upgrade. For each step, a summary of the actions the system
will perform is provided.

The software_upgrade table tracks the upgrade state, which includes:

* upgrade-started
* data-migration
* data-migration-complete
* upgrading-controllers
* upgrading-hosts
* activation-requested
* activation-complete
* completing
* completed

When an upgrade is aborted the following state transitions occur:

* aborting
* abort-completing
* aborting-reinstall

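The state progression can be sketched as a transition table. This is a
minimal sketch: the state names are from this spec, but the exact
transition edges are inferred from the ordering of the steps and from the
failure states described later, and the helper function is hypothetical.

```python
# Hypothetical transition table for the software_upgrade states named in
# this spec; the edges are inferred from the step ordering, not taken
# from the sysinv implementation.
UPGRADE_TRANSITIONS = {
    "upgrade-started": {"data-migration", "aborting"},
    "data-migration": {"data-migration-complete", "data-migration-failed"},
    "data-migration-complete": {"upgrading-controllers", "aborting"},
    "upgrading-controllers": {"upgrading-hosts", "aborting"},
    "upgrading-hosts": {"activation-requested", "aborting-reinstall"},
    "activation-requested": {"activation-complete", "activation-failed"},
    "activation-complete": {"completing"},
    "completing": {"completed"},
    "aborting": {"abort-completing"},
}

def can_transition(current: str, target: str) -> bool:
    """Return True if the upgrade may move from current to target."""
    return target in UPGRADE_TRANSITIONS.get(current, set())
```

Such a table makes it easy for each upgrade CLI to reject a request that
arrives out of order.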
#. **Import release N+1 load**

   ::

     # system load-import <bootimage.iso> <bootimage.sig>

     # system load-list
     +----+----------+------------------+
     | id | state    | software_version |
     +----+----------+------------------+
     | 1  | active   | 19.12            |
     | 2  | imported | 20.06            |
     +----+----------+------------------+

   The fields are:

   * software_version: comes from metadata in the load image.

   * states:

     * active: the current version, version N
     * importing: the image is being uploaded to the load repository
     * error: the load is in an error state
     * deleting: the load is being deleted from the repository
     * imported: the version that can be upgraded to, i.e. version N+1

#. **Perform health checks for upgrade**

   ::

     # system health-query-upgrade

   This will perform health checks to ensure the system is in a state
   ready for upgrade.

   These health checks are also performed as part of upgrade-start.

   These include checks for:

   * upgrade target load is imported
   * all hosts provisioned
   * all hosts running the current load
   * all hosts unlocked/enabled
   * all hosts have matching configs
   * no management-affecting alarms
   * for ceph systems: storage cluster is healthy
   * kubernetes nodes are ready
   * kubernetes control plane pods are ready
   * the kubernetes control plane is at the version and configuration
     required for upgrade. If not, the kubernetes upgrade [1]_ method
     must be performed in order to bring it to baseline.

#. **Start the upgrade**

   ::

     # system upgrade-start

   This performs semantic checks and the health checks as per the
   'health-query-upgrade' step.

   This will make a copy of the system data (e.g. postgres databases,
   armada, helm, kubernetes, and puppet hiera data) to be used in the
   upgrade.

   Note that the /opt/etcd cluster data may be dynamically updating
   cluster state info until the host-swact, when service management
   brings down the active-standby etcd on the N side and brings it up on
   the N+1 side.

   Configuration changes are not allowed after this point, until the
   upgrade is completed.

#. **Lock and upgrade controller-1**

   ::

     # system host-upgrade controller-1

   * the upgrade state is set to 'data-migration'
   * the upgrade_controller_1 flag is updated so that controller-1 can
     determine whether it is in an upgrade
   * host controller-1 is reinstalled with the N+1 load
   * data and configuration are migrated from release N to release N+1
   * a special release N+1 puppet upgrade manifest is applied, based on
     the hiera data that was migrated from release N. This allows for
     one-time actions similar to what was done on the initial install of
     controller-0 (e.g. configuring rabbit, postgres, keystone).
   * hiera data is generated for release N+1, to be used to apply the
     regular puppet manifest when controller-1 is unlocked
   * replicated (DRBD) filesystems are synced
   * the upgrade state is set to 'data-migration-complete'
   * system data is present in both release N and release N+1 versioned
     directories (e.g. /opt/platform/config/<release>,
     /var/lib/postgresql/<release>)

#. **Unlock controller-1**

   This includes generating configuration data for controller-1 which
   must be generated from the active controller.

   * the join_cmd for the kubernetes control plane is generated on the N
     side for the N+1 hiera data

   The N+1 hiera data drives the puppet manifest apply.

#. **Swact to controller-1**

   ::

     # system host-swact controller-0

   * controller-1 becomes active and runs release N+1 while the rest of
     the system is running release N
   * any release N+1 components that do inter-node communication must be
     backwards compatible to ensure that communication with release N
     works correctly
   * etcd is backed up from /opt/etcd/version_N and restored to
     /opt/etcd/version_N+1 for the target version on host-swact.
     This must be performed at a time where data loss can be avoided.
     As part of the host-swact startup on controller-1, and during an
     upgrade, etcd is copied from the release N etcd directory to the
     release N+1 etcd directory.

#. **Lock and upgrade controller-0**

   ::

     # system host-upgrade controller-0

   * the N+1 load is installed and, on the host-unlock, the upgrades
     manifest and the puppet host configuration are applied
   * after controller-0 is upgraded, the upgrade state is set to
     'upgrading-hosts'

#. **If applicable, lock and upgrade storage hosts**

   ::

     # system host-upgrade storage-0

   * if provisioned, all storage hosts must be upgraded prior to
     proceeding with workers
   * the N+1 load is installed; up to half of the storage hosts can be
     done in parallel
   * Ceph data is synced

#. **Lock and upgrade worker hosts**

   ::

     # system host-upgrade worker-x

   * workloads are migrated from the worker node (triggered by host-lock)
   * the N+1 load is installed
   * workers can be done in parallel, depending upon excess capacity

   Each worker host will first be locked using the existing "system
   host-lock" CLI (worker hosts can be done in any order). This results
   in services being migrated off the host and applies the NoExecute
   taint, which will evict any pods that can be evicted.

#. **Activate the upgrade**

   ::

     # system upgrade-activate

   * perform any additional configuration which may be required after
     all hosts have been upgraded

#. **Swact back to controller-0**

   ::

     # system host-swact controller-1

#. **Complete the upgrade**

   ::

     # system upgrade-complete

   * run post-checks to ensure the upgrade has been completed
   * remove release N data

**Failure Handling**

* When a failure happens and cannot be resolved without manual
  intervention, the upgrade state will be set to data-migration-failed
  or activation-failed.
* To recover, the user will need to resolve the issue that caused the
  upgrade step to fail.
* An upgrade-abort is only possible before controller-0 has been
  upgraded. In other cases, the user would need to resolve the issue and
  reattempt the step.

**Health Checks**

* In order to ensure the health and stability of the system, health
  checks are performed both before allowing a platform upgrade to start
  and as each upgrade CLI is run.
* The health checks will include:

  * basic system health (i.e. system health-query)
  * new kubernetes-specific checks, for example:

    * verify that all kubernetes control plane pods are running
    * verify that all kubernetes applications are fully applied
    * verify that the kubernetes control plane version and configuration
      are at the baseline required for the platform upgrade

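The aggregation pattern (run every check and report all failures rather
than stopping at the first) can be sketched as below. The check names are
from this spec; the check functions are placeholders, not the real sysinv
implementations.

```python
# Minimal sketch of aggregated upgrade health checks; each real check
# would query fault management or the kube-apiserver.
def check_no_mgmt_alarms() -> bool:
    return True  # placeholder: query fault management for alarms

def check_k8s_nodes_ready() -> bool:
    return True  # placeholder: query node readiness via kube-apiserver

def check_k8s_control_plane_pods() -> bool:
    return True  # placeholder: verify control plane pods are running

HEALTH_CHECKS = {
    "no management affecting alarms": check_no_mgmt_alarms,
    "kubernetes nodes ready": check_k8s_nodes_ready,
    "kubernetes control plane pods ready": check_k8s_control_plane_pods,
}

def health_query_upgrade() -> list:
    """Return the names of all failing checks (empty means healthy)."""
    return [name for name, check in HEALTH_CHECKS.items() if not check()]
```

Reporting every failure at once lets the administrator fix all blocking
conditions in a single pass before retrying upgrade-start.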
**Interactions with container applications**

* The kubernetes join_cmd must be created from the N side running the
  active kubernetes control plane.
* The platform upgrade must be performed exclusively of the kubernetes
  upgrade. A kubernetes upgrade is not allowed when a platform upgrade
  is in progress, and vice-versa.
* Before starting a platform upgrade, we also need to check that the
  kubernetes configuration is at a baseline suitable for upgrade. The
  N+1 load metadata enforces the configuration baseline required on the
  N (from) side.
* If the N+1 release is at a newer kubernetes version, then the
  kubernetes upgrade procedure must be completed first in order to align
  the kubernetes version.
* After a platform upgrade has started, helm-override operations will be
  prevented, as these configuration changes will not be preserved after
  upgrade-start and can also trigger applications to be reapplied.

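The semantic check that blocks helm-override operations during an upgrade
can be sketched as a simple guard. This is a hypothetical sketch; the
exception type and function are illustrative, and the assumption that
'completed' is the only state in which overrides become allowed again
follows from the state list above.

```python
class UpgradeInProgressError(Exception):
    """Raised when an operation is blocked by a platform upgrade."""

def guard_helm_override(upgrade_state):
    """Reject helm-override operations once an upgrade has started.

    upgrade_state is None when no upgrade record exists; any state
    other than 'completed' means an upgrade is still in progress.
    """
    if upgrade_state is not None and upgrade_state != "completed":
        raise UpgradeInProgressError(
            "helm-override operations are not allowed during a "
            "platform upgrade (state: %s)" % upgrade_state)
```

The same guard shape applies to application-apply/remove and to starting
a kubernetes upgrade while a platform upgrade is in progress.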
Alternatives
------------

Update the kubernetes configuration to the N+1 configuration after the
upgrade. However, this would necessitate coordinating the activation of
features such as the control plane address and encryption at rest during
the upgrade, such as during the upgrade-activate step. This would
require N+1 to be backwards compatible with N.

A mechanism is required to upgrade etcd [3]_; keeping the versioning for
the etcd database will allow an upgrade to a newer etcd version.

etcd upgrade alternatives:

* host-swact to controller-1 during upgrade. As part of the host-swact
  during an upgrade, the kubernetes etcd is copied from the N side. This
  takes a copy at a time when the etcd data is not allowed to change, as
  etcd would be brought down on controller-0 prior to service management
  bringing up etcd on controller-1. After the new version of etcd runs
  with the migrated data, it is no longer possible to run the old
  version of etcd against it. Therefore, the release N version of the
  data must be maintained in the event of a host-swact back or
  upgrade-abort prior to the upgrade of controller-0. This is the chosen
  alternative.
* Alternative: migrate etcd at upgrade-start instead of at host-swact.
  Configuration changes which affect the cluster state information could
  still occur in this scenario. Kubernetes state changes that occur
  after the snapshot would be lost and have the potential to put the
  kubernetes cluster into a bad state.
* Alternative: leave /opt/etcd unversioned so that the N and N+1 sides
  both reference the same directory. This is based on the premise that
  the kubernetes control plane is upgraded independently and does not
  require a versioned directory. However, as noted in the host-swact
  alternative, this would not be compatible with upgrade-abort or a
  host-swact back to the N release.

Data Model Impact
-----------------

The following tables in the sysinv database are required. The data model
required to support platform upgrades is in the stx3.0 data model, and
includes the following platform-upgrade focused tables:

* loads

  represents the load version (e.g. N and N+1), load state and
  compatible versions

* software_upgrade

  represents the software upgrade state, from_load and to_load

* host_upgrade

  represents the software_load and target_load for each host

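The three tables can be sketched as dataclasses. This is an illustrative
reduction: only the columns named in this spec are shown, and the field
types are assumptions.

```python
from dataclasses import dataclass

# Illustrative sketch of the three sysinv tables described above; the
# column sets are reduced to the fields named in this spec.
@dataclass
class Load:
    id: int
    state: str               # active / importing / imported / deleting / error
    software_version: str
    compatible_version: str  # the from-version this load supports

@dataclass
class SoftwareUpgrade:
    state: str               # e.g. upgrade-started, data-migration, ...
    from_load: int           # foreign key into loads
    to_load: int             # foreign key into loads

@dataclass
class HostUpgrade:
    host_id: int
    software_load: int       # load currently installed on the host
    target_load: int         # load the host is being upgraded to
```

Per-host rows in host_upgrade are what let the system track a fleet where
some hosts are still on release N while others already run N+1.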
REST API Impact
---------------

The v1 load, health and upgrade resources implement the platform-upgrade
specific URLs utilized for the upgrade. The api-ref-sysinv-v1-config.rst
doc in the config repo is updated accordingly.

The sysinv REST API supports the following upgrade-related methods:

* The existing resource /loads

  * URLs:

    * /v1/loads

  * Request Methods:

    * GET /v1/loads

      * Returns all platform loads known to the system

    * POST /v1/loads/import_load

      * Imports the new load passed in the body of the POST request

* The existing resource /upgrade

  * URLs:

    * /v1/upgrade

* The existing resource /ihosts supports upgrade actions.

  * URLs:

    * /v1/ihosts/<hostid>

  * Request Methods:

    * POST /v1/ihosts/<hostid>/upgrade

      * Upgrades the platform load on the specified host

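A client-side sketch of the endpoints listed above. The paths come from
this spec; the base URL (host name and port) is an assumption, not a
documented default, and a real client would also attach a keystone token.

```python
# Hypothetical helpers that build the sysinv upgrade-related URLs listed
# above; SYSINV_BASE is an assumed endpoint, not a documented default.
SYSINV_BASE = "http://controller:6385"

def loads_url() -> str:
    """GET: list all platform loads known to the system."""
    return f"{SYSINV_BASE}/v1/loads"

def import_load_url() -> str:
    """POST: import a new load (iso/sig passed in the request body)."""
    return f"{SYSINV_BASE}/v1/loads/import_load"

def upgrade_url() -> str:
    """GET/POST: query or drive the software upgrade resource."""
    return f"{SYSINV_BASE}/v1/upgrade"

def host_upgrade_url(host_id: str) -> str:
    """POST: upgrade the platform load on the specified host."""
    return f"{SYSINV_BASE}/v1/ihosts/{host_id}/upgrade"
```

The `system` CLI commands shown in the process above are thin wrappers
over these same resources.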
Security Impact
---------------

This story provides a mechanism to upgrade the platform from one version
to another. It does not introduce any additional security impacts beyond
what already exists for the initial deployment.

Other End User Impact
---------------------

End users will typically perform upgrades using the sysinv (i.e.
system) CLI. The CLI commands used for the upgrade are as noted in
the `Proposed Process`_ section above.

Performance Impact
------------------

When a platform upgrade is in progress, each host must be taken out of
service in order to install the new load. The user must ensure that
there is enough capacity in the system to handle the removal from
service of one (or more) hosts as the load on each host is upgraded.

Other Deployer Impact
---------------------

Deployers will now be able to upgrade the StarlingX platform on a
running system.

Developer Impact
----------------

Developers working on the StarlingX components that manage container
applications may need to be aware that certain operations should be
prevented when a platform upgrade is in progress. This is discussed in
the `Proposed Process`_ section above.

Upgrade Impact
--------------

StarlingX platform upgrades are independent from the Kubernetes upgrade
[1]_. However, when StarlingX platform upgrades are supported, checks
must be put in place to ensure that the kubernetes version is not
allowed to change due to a platform upgrade. In effect, the system must
be upgraded to the same version of kubernetes as is packaged in the new
platform release. This will be enforced through semantic checking in the
platform upgrade APIs.

The platform upgrade excludes the upgrade of applications. Applications
will need to be compatible with the new version of the
platform/kubernetes. Any upgrade of hosted applications is independent
of the platform upgrade.

Simplex Platform Upgrades
=========================

At a high level the simplex upgrade process involves the following steps:

* taking a backup of the platform data
* installing the new StarlingX software
* restoring and migrating the platform data

Simplex Upgrade Process
-----------------------

#. **Import release N+1 load**

   ::

     # system load-import <bootimage.iso> <bootimage.sig>

     # system load-list
     +----+----------+------------------+
     | id | state    | software_version |
     +----+----------+------------------+
     | 1  | active   | 19.12            |
     | 2  | imported | 20.06            |
     +----+----------+------------------+

#. **Start the upgrade**

   ::

     # system upgrade-start

   This performs semantic checks and the health checks as per the
   'health-query-upgrade' command.

   This will make a copy of the system platform data similar to a
   platform backup. The upgrade data will be placed under /opt/backups.

   Any changes made after this point will be lost.

#. **Copy the upgrade data**

   During the upgrade process the rootfs will be wiped, and the upgrade
   data deleted. The upgrade data must be copied from the system to an
   alternate safe location (such as a USB drive or remote server).

#. **Lock and upgrade controller-0**

   ::

     # system host-upgrade controller-0

   This will wipe the rootfs and reboot the host.

#. **Install the new release of StarlingX**

   Install the new release of StarlingX software via network or USB.

#. **Restore the upgrade data**

   ::

     # ansible-playbook /usr/share/ansible/stx-ansible/playbooks/upgrade.yml

   The upgrade playbook will migrate the upgrade data to the current
   release and restore it to the system.

   This playbook requires the following parameters:

   * ansible_become_pass
   * admin_password
   * upgrade_data_file

#. **Unlock controller-0**

   ::

     # system host-unlock controller-0

#. **Activate the upgrade**

   ::

     # system upgrade-activate

   Perform any additional configuration which may be required after the
   host is unlocked.

#. **Complete the upgrade**

   ::

     # system upgrade-complete

   Remove data from the previous release.

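A pre-flight check for the restore step can be sketched as below: verify
that the parameters the upgrade playbook requires are all present before
invoking ansible. The helper is hypothetical; only the parameter names
come from this spec.

```python
# Hypothetical pre-flight check for the simplex upgrade playbook: the
# three required parameter names are taken from this spec.
REQUIRED_PARAMS = (
    "ansible_become_pass",
    "admin_password",
    "upgrade_data_file",
)

def missing_params(extra_vars: dict) -> list:
    """Return the required playbook parameters absent from extra_vars."""
    return [p for p in REQUIRED_PARAMS if not extra_vars.get(p)]
```

Failing fast here avoids wiping a host only to have the restore playbook
stop on a missing password or data file path.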
Implementation
==============

Assignee(s)
-----------

Primary assignee:

* John Kung (john.kung@windriver.com)

Other contributors:

* David Sullivan (david.sullivan@windriver.com)

Repos Impacted
--------------

* config
* update
* integ
* metal
* stx-puppet
* ansible-playbooks

Work Items
----------

Please refer to the Story [2]_ for the complete list of tasks.

The following are prerequisites prior to the upgrade:

* Update the kubernetes configuration to the configured features on N+1.
  This will be enabled by a software-delivered increment that will
  enable the required configuration baseline:

  * update the kubernetes control_plane_address
  * kubernetes encryption at rest

  Updating the kubernetes version is covered by [1]_ and is performed
  independently.

  * This is enforced by the upgrade load N+1 metadata, which specifies
    the from-load supported for upgrade.

* The etcd directory is unversioned so that it can be referenced by the
  N and N+1 kubernetes control planes.

The following steps in the upgrade require changes:

* load-import

  The metadata handling is extended to support multiple from-versions.

* health-query-upgrade

  Health checks are added to ensure the kubernetes version and
  configuration are at the correct baseline for upgrade.

upgrade-start

* upgrade-start-pkg-extract

  Update to reference dnf rather than the superseded repoquery tool.

* migrate puppet hiera data
* export the armada, helm and kubernetes configuration to N+1
* export the databases for N+1

host-upgrade

* create /etc/platform/.upgrade_controller_1 so that controller-1 can
  determine via RPC that a controller upgrade is required

host-unlock

* create the join command from the N side for the N+1 side
* run the upgrades playbook for docker. This will push the required
  docker images.

host-swact

* update to backup /opt/etcd/from_version and restore to
  /opt/etcd/to_version for the target version on host-swact.
  This is performed at a time where data loss can be avoided. During an
  upgrade, after the host-swact and before etcd has started on
  controller-1, the etcd data is copied from controller-0. Normally, an
  etcdctl snapshot is required when data is still dynamically changing;
  however, as service management manages etcd in active-standby, and the
  copy occurs as part of etcd startup, it is possible to use a direct
  copy.

Ansible:

* upgrade playbook for docker. push_k8s_images.yml is updated to handle
  the platform upgrade case.

Integ:

* Update registry-token-server to continue to support GET for token.
  This is performed as part of Story 2006145, Task 38763
  https://review.opendev.org/#/c/707283/

* Add semantic checks to existing APIs:

  * application-apply/remove/etc. - prevent when a platform upgrade is
    in progress
  * helm-override-update/etc. - prevent when a platform upgrade is in
    progress

Miscellaneous:

* Update the metadata for upgrade versions
* Remove openstack service and database references in upgrade code
* Update the supported from-version checks

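The extended metadata handling (multiple from-versions per load) can be
sketched as below. The element layout in the sample is an assumption for
illustration only; the spec names the file
(metal/common-bsp/files/upgrades/metadata.xml) but does not define its
schema.

```python
import xml.etree.ElementTree as ET

def supported_from_versions(metadata_xml: str) -> list:
    """Extract every release a load can be upgraded from.

    The element names here are an assumed layout; the spec only states
    that the metadata must support multiple from-versions.
    """
    root = ET.fromstring(metadata_xml)
    return [el.text for el in root.iter("version") if el.text]

# Hypothetical sample showing two supported from-versions for one load.
SAMPLE = """
<build>
  <supported_upgrades>
    <upgrade><version>19.12</version></upgrade>
    <upgrade><version>20.01</version></upgrade>
  </supported_upgrades>
</build>
"""
```

load-import would reject an iso whose metadata does not list the
currently active software_version among its from-versions.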
Dependencies
============

None

Testing
=======

Upgrades must be tested in the following StarlingX configurations:

* AIO-DX
* Standard with controller storage
* Standard with dedicated storage
* AIO-SX

The testing can be performed on hardware or virtual environments.

Documentation Impact
====================

New end user documentation will be required to describe how platform
upgrades should be done.

The config API reference will also need updates.

References
==========

.. [1] Kubernetes Upgrade Story https://storyboard.openstack.org/#!/story/2006781
.. [2] Platform Upgrades Story https://storyboard.openstack.org/#!/story/2007403
.. [3] etcd upgrades https://etcd.io/docs/v3.4.0/upgrades

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - stx-4.0
     - Introduced