==============
Upgrade checks
==============

.. note::

   This document details how to generate upgrade checks as part of a new
   feature or bugfix. For info on how to apply existing upgrade checks, refer
   to the documentation for the :program:`nova-status` command in
   :doc:`/cli/nova-status`. For info on the general upgrade process for a nova
   deployment, refer to :doc:`/admin/upgrades`.

Nova provides automated :ref:`upgrade check tooling <nova-status-checks>` to
assist deployment tools in verifying critical parts of the deployment,
especially when it comes to major changes during upgrades that require operator
intervention.

This guide covers the background on nova's upgrade check tooling, how it is
used, and what to look for in writing new checks.


Background
----------

Nova has historically supported offline database schema migrations
(:program:`nova-manage db sync` and :program:`nova-manage api_db sync`) and
online data migrations (:program:`nova-manage db online_data_migrations`)
during upgrades, as discussed in :doc:`/reference/database-migrations`.
The :program:`nova-status upgrade check` command was introduced in the 15.0.0
(Ocata) release to aid in the verification of two major required changes in that
release, namely Placement and Cells v2.

Integration with the Placement service and deploying Cells v2 were optional
starting in the 14.0.0 (Newton) release and became required in the Ocata
release. The nova team working on these changes knew that there were required
deployment changes to successfully upgrade to Ocata. In addition, the required
deployment changes were not things that could simply be verified in a database
migration script, e.g. a migration script should not make REST API calls to
Placement.

So ``nova-status upgrade check`` was written to provide an automated
"pre-flight" check to verify that required deployment steps were performed
prior to upgrading to Ocata.

Refer to the `Ocata changes`_ for implementation details.

.. _Ocata changes: https://review.opendev.org/#/q/topic:bp/resource-providers-scheduler-db-filters+status:merged+file:%255Enova/cmd/status.py


Guidelines
----------

* The checks should be able to run within a virtual environment or container.
  All that is required is a full configuration file, similar to running other
  ``nova-manage`` type administration commands. In the case of nova, this
  means having the :oslo.config:group:`api_database`,
  :oslo.config:group:`placement`, etc. sections configured.

* Candidates for automated upgrade checks are things in a project's upgrade
  release notes which can be verified via the database. For example, when
  upgrading to Cells v2 in Ocata, one required step was creating
  "cell mappings" for ``cell0`` and ``cell1``. This can easily be verified by
  checking the contents of the ``cell_mappings`` table in the ``nova_api``
  database.

* Checks will query the database(s) and potentially REST APIs (depending on the
  check) but should not expect to run RPC calls. For example, a check should
  not require that the ``nova-compute`` service is running on a particular
  host.

* Checks are typically meant to be run before restarting and upgrading to new
  service code, which is how `grenade uses them`__, but they can also be run
  as a :ref:`post-install verify step <verify-install-nova-status>`, which is
  how `openstack-ansible`__ also uses them. The high-level set of upgrade steps
  for upgrading nova in grenade is:

  * Install new code
  * Sync the database schema for new models
    (``nova-manage api_db sync``; ``nova-manage db sync``)
  * Run the online data migrations (``nova-manage db online_data_migrations``)
  * Run the upgrade check (``nova-status upgrade check``)
  * Restart services with new code

  .. __: https://github.com/openstack-dev/grenade/blob/dc7f4a4ba/projects/60_nova/upgrade.sh#L96
  .. __: https://review.opendev.org/#/c/575125/

* Checks must be idempotent so they can be run repeatedly and the results are
  always based on the latest data. This allows an operator to run the checks,
  fix any issues reported, and then iterate until the status check no longer
  reports any issues.

* Checks which cannot easily, or should not, be run within offline database
  migrations are good candidates for these CLI-driven checks. For example,
  ``instances`` records are in the cell database and for each instance there
  should be a corresponding ``request_specs`` table entry in the ``nova_api``
  database. A ``nova-manage db online_data_migrations`` routine was added in
  the Newton release to back-fill request specs for existing instances, and
  `in Rocky`__ an upgrade check was added to make sure all non-deleted
  instances have a request spec so compatibility code can be removed in Stein.
  In older releases of nova we would have added a `blocker migration`__ as part
  of the database schema migrations to make sure the online data migrations had
  been completed before the upgrade could proceed.

  .. note::

     Usage of ``nova-status upgrade check`` does not preclude the need
     for blocker migrations within a given database, but in the case of
     request specs the check spans multiple databases and was a better
     fit for the ``nova-status`` tooling.

  .. __: https://review.opendev.org/#/c/581813/
  .. __: https://review.opendev.org/#/c/289450/

* All checks should have an accompanying upgrade release note.


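The following is a minimal sketch of the kind of cross-database check described in the request spec guideline above. It is not nova's implementation: two in-memory SQLite databases stand in for the cell and ``nova_api`` databases, and the ``instances(uuid, deleted)`` and ``request_specs(instance_uuid)`` tables, as well as the ``check_request_specs`` function, are hypothetical simplifications. The check only reads the current data and makes no RPC calls, so it can be run repeatedly with results that always reflect the latest data.

```python
import sqlite3
from typing import List


def check_request_specs(cell_db: sqlite3.Connection,
                        api_db: sqlite3.Connection) -> List[str]:
    """Return UUIDs of non-deleted instances that have no request spec.

    Hypothetical, simplified schema: ``instances(uuid, deleted)`` in the
    cell database and ``request_specs(instance_uuid)`` in the API database.
    """
    # Only non-deleted instances need a request spec.
    instance_uuids = {
        row[0] for row in
        cell_db.execute("SELECT uuid FROM instances WHERE deleted = 0")
    }
    # The request specs live in a different database, so the comparison is
    # done in Python rather than with a SQL join.
    spec_uuids = {
        row[0] for row in
        api_db.execute("SELECT instance_uuid FROM request_specs")
    }
    return sorted(instance_uuids - spec_uuids)


# Example data: inst-2 has no request spec; inst-3 is deleted and ignored.
cell = sqlite3.connect(":memory:")
api = sqlite3.connect(":memory:")
cell.execute("CREATE TABLE instances (uuid TEXT, deleted INTEGER)")
api.execute("CREATE TABLE request_specs (instance_uuid TEXT)")
cell.executemany("INSERT INTO instances VALUES (?, ?)",
                 [("inst-1", 0), ("inst-2", 0), ("inst-3", 1)])
api.execute("INSERT INTO request_specs VALUES ('inst-1')")

print(check_request_specs(cell, api))  # ['inst-2']
```

An operator seeing a non-empty list would run ``nova-manage db online_data_migrations`` to back-fill the missing request specs and then rerun the check until the list comes back empty.

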
Structure
---------

There is no graph logic for checks, meaning each check is meant to be run
independently of other checks in the same set. For example, a project could
have five checks which run serially but that does not mean the second check
in the set depends on the results of the first check in the set, or the
third check depends on the second, and so on.

The base framework is fairly simple, as can be seen from the `initial change`_.
Each check is registered in the ``_upgrade_checks`` variable and the ``check``
method executes each check and records the result. The most severe result is
recorded for the final return code.

Each check produces one of three possible results:

* ``Success``: All upgrade readiness checks passed successfully and there is
  nothing to do.
* ``Warning``: At least one check encountered an issue and requires further
  investigation. This is considered a warning but the upgrade may be OK.
* ``Failure``: There was an upgrade status check failure that needs to be
  investigated. This should be considered something that stops an upgrade.

The ``UpgradeCheckResult`` object allows details to be added when there is a
warning or failure result. These details should generally explain how to
resolve the issue, e.g. maybe ``nova-manage db online_data_migrations`` is
incomplete and needs to be run again.

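A minimal sketch of the framework described above, written as plain Python for illustration. It is not the code from the initial change (current nova builds on the oslo.upgradecheck library instead); the ``Code`` enum, the two example checks and their names are hypothetical. The numeric values assume that success, warning and failure map to return codes ``0``, ``1`` and ``2``; the text here only states that a failure yields ``2``.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class Code(enum.IntEnum):
    # Assumed mapping; ordering the values by severity lets max() pick the
    # most severe result.
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


@dataclass
class UpgradeCheckResult:
    code: Code
    details: Optional[str] = None


class UpgradeCommands:
    """Hypothetical, simplified container for a project's upgrade checks."""

    def _check_example_ok(self):
        return UpgradeCheckResult(Code.SUCCESS)

    def _check_example_migrations(self):
        # Details should tell the operator how to resolve the problem.
        return UpgradeCheckResult(
            Code.WARNING,
            "Run 'nova-manage db online_data_migrations' and check again.")

    # Registry of (name, check) pairs. Each check runs independently of the
    # others; there is no ordering or dependency between them.
    _upgrade_checks = (
        ("Example OK", _check_example_ok),
        ("Example migrations", _check_example_migrations),
    )

    def check(self):
        results = [(name, func(self)) for name, func in self._upgrade_checks]
        # The most severe result becomes the overall return code.
        worst = max((result.code for _, result in results),
                    default=Code.SUCCESS)
        return worst, results


code, results = UpgradeCommands().check()
print(int(code))  # 1, since the worst result is a warning
```

Ordering the codes by severity means the overall result is simply the ``max()`` of the individual codes, which is one straightforward way to record the most severe result.
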
Using the `cells v2 check`_ as an example, there are really two checks
involved:

1. Do the cell0 and cell1 mappings exist?
2. Do host mappings exist in the API database if there are compute node
   records in the cell database?

Failing either check results in a ``Failure`` status for that check and a
return code of ``2`` for the overall run.

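The two-part cells v2 check could be sketched as a single function that returns a failure as soon as either sub-check fails. This is a simplified illustration, not the linked change: the ``check_cells_v2`` function and ``Result`` type are hypothetical, cell mappings are identified here by name only, and the database queries are replaced by plain arguments so the logic can be shown on its own.

```python
import enum
from dataclasses import dataclass
from typing import Optional, Set


class Code(enum.IntEnum):
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


@dataclass
class Result:
    code: Code
    details: Optional[str] = None


def check_cells_v2(cell_mapping_names: Set[str], num_compute_nodes: int,
                   num_host_mappings: int) -> Result:
    # Sub-check 1: the cell0 and cell1 mappings must both exist.
    missing = {"cell0", "cell1"} - cell_mapping_names
    if missing:
        return Result(Code.FAILURE,
                      "Missing cell mapping(s): %s" % ", ".join(sorted(missing)))
    # Sub-check 2: if the cell database has compute nodes, the API database
    # must also have host mappings.
    if num_compute_nodes and not num_host_mappings:
        return Result(Code.FAILURE,
                      "No host mappings found but there are compute nodes.")
    return Result(Code.SUCCESS)


print(check_cells_v2({"cell0"}, 0, 0).details)  # Missing cell mapping(s): cell1
```
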
The initial `placement check`_ provides an example of a warning response. In
that check, if there are fewer resource providers in Placement than there are
compute nodes in the cell database(s), the deployment may be underutilized
because the ``nova-scheduler`` is using the Placement service to determine
candidate hosts for scheduling.

Warning results are a good fit for conditions that are expected to be resolved
over the course of a rolling upgrade, e.g. ``nova-compute`` being configured to
report resource provider information into the Placement service. These are
things that should be investigated and completed at some point, but might not
cause any immediate failures.

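A hedged sketch of the warning case: compute nodes are counted across all cell databases and compared against the number of resource providers known to Placement. The ``check_resource_providers`` name and the way the counts are passed in are assumptions made for illustration; a real check would obtain these numbers from the cell databases and the Placement API.

```python
import enum
from typing import Dict, Optional, Tuple


class Code(enum.IntEnum):
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


def check_resource_providers(num_resource_providers: int,
                             compute_nodes_per_cell: Dict[str, int]
                             ) -> Tuple[Code, Optional[str]]:
    # Compute nodes may be spread across several cell databases.
    num_compute_nodes = sum(compute_nodes_per_cell.values())
    if num_resource_providers < num_compute_nodes:
        # Not fatal: compute services may still be reporting in during a
        # rolling upgrade, but the scheduler cannot use the missing hosts.
        return (Code.WARNING,
                "There are %d compute node(s) but only %d resource "
                "provider(s) in Placement." % (num_compute_nodes,
                                               num_resource_providers))
    return Code.SUCCESS, None
```
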
The results feed into a standard output for the checks:

.. code-block:: console

   $ nova-status upgrade check
   +----------------------------------------------------+
   | Upgrade Check Results                              |
   +----------------------------------------------------+
   | Check: Cells v2                                    |
   | Result: Success                                    |
   | Details: None                                      |
   +----------------------------------------------------+
   | Check: Placement API                               |
   | Result: Failure                                    |
   | Details: There is no placement-api endpoint in the |
   |          service catalog.                          |
   +----------------------------------------------------+

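The table above could be produced by a formatter along the following lines. This is only a sketch: the 50-character text width is inferred from the sample output, and the ``render`` and ``_row`` helpers are hypothetical rather than the actual ``nova-status`` formatting code.

```python
import textwrap

WIDTH = 50  # characters of text between the "| " and " |" borders


def _row(text):
    return "| " + text.ljust(WIDTH) + " |"


def render(results):
    """Render (name, result, details) tuples as a bordered text table."""
    border = "+" + "-" * (WIDTH + 2) + "+"
    lines = [border, _row("Upgrade Check Results"), border]
    for name, result, details in results:
        lines.append(_row("Check: %s" % name))
        lines.append(_row("Result: %s" % result))
        # Wrap long details, indenting continuation lines under the text.
        for text in textwrap.wrap("Details: %s" % details, WIDTH,
                                  subsequent_indent=" " * len("Details: ")):
            lines.append(_row(text))
        lines.append(border)
    return "\n".join(lines)


print(render([
    ("Cells v2", "Success", None),
    ("Placement API", "Failure",
     "There is no placement-api endpoint in the service catalog."),
]))
```

Wrapping the details with an indent matching ``Details:`` keeps long messages, such as the Placement failure, aligned under the first line of text.
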
.. _initial change: https://review.opendev.org/#/c/411517/
.. _cells v2 check: https://review.opendev.org/#/c/411525/
.. _placement check: https://review.opendev.org/#/c/413250/


FAQs
----

- How is the ``nova-status`` upgrade script packaged and deployed?

  There is a ``console_scripts`` entry for ``nova-status`` in the ``setup.cfg``
  file.

- Why are there multiple parts to the command structure, i.e. "upgrade" and
  "check"?

  This is an artifact of how the ``nova-manage`` command is structured: it
  has categories of sub-commands; for example, ``nova-manage db`` is a
  category made up of sub-commands such as ``nova-manage db sync``. The
  ``nova-status upgrade check`` command was written in the same way for
  consistency and extensibility if other sub-commands need to be added later.

- Why is the upgrade check command not part of the standard python-\*client
  CLIs?

  The ``nova-status`` command was modeled after the ``nova-manage`` command,
  which is meant to be admin-only and has direct access to the database,
  unlike other CLI packages like python-novaclient which require a token
  and communicate with nova over the REST API. Because of this, it is also
  possible to write commands in ``nova-manage`` and ``nova-status`` that can
  work while the API service is down for maintenance.

- How should the checks be documented?

  Each check should be documented in the :ref:`history section
  <nova-status-checks>` of the CLI guide and have a release note. This is
  important because the checks can be run in an isolated environment, apart
  from the actual deployed version of the code, and because the checks should
  be idempotent; the history / change log is useful for knowing what is being
  validated.

- Do other projects support upgrade checks?

  A community-wide `goal for the Stein release`__ was to add the same type of
  ``$PROJECT-status upgrade check`` tooling to other projects to ease
  upgrading OpenStack across the board. So while the guidelines in this
  document are primarily specific to nova, they should apply generically to
  other projects wishing to incorporate the same tooling.

  .. __: https://governance.openstack.org/tc/goals/stein/upgrade-checkers.html

- Where should the documentation live for projects other than nova?

  As part of the standard OpenStack project `documentation guidelines`__ the
  command should be documented under ``doc/source/cli`` in each project repo.

  .. __: https://docs.openstack.org/doc-contrib-guide/project-guides.html

- Can upgrade checks be backported?

  Sometimes upgrade checks can be backported to aid in pre-empting bugs on
  stable branches. For example, a check was added for `bug 1759316`__ in Rocky
  which was also backported to stable/queens in case anyone upgrading from Pike
  to Queens would hit the same issue. Backportable checks are generally only
  made for latent bugs since someone who has already passed checks and upgraded
  to a given stable branch should not start failing after a patch release on
  that same branch. For this reason, any check being backported should have a
  release note with it.

  .. __: https://bugs.launchpad.net/nova/+bug/1759316

- Can upgrade checks only be for N-1 to N version upgrades?

  No, not necessarily. The upgrade checks are also an essential part of
  `fast-forward upgrades`__ to make sure that, as you roll through each release
  performing schema (data model) updates and data migrations, you are also
  completing all of the necessary changes. For example, if you are
  fast-forward upgrading from Ocata to Rocky, something could have been
  added, deprecated or removed in Pike or Queens and a pre-upgrade check is
  a way to make sure the necessary steps were taken while upgrading through
  those releases before restarting the Rocky code at the end.

  .. __: https://wiki.openstack.org/wiki/Fast_forward_upgrades

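To illustrate the category and sub-command structure discussed in the FAQ above (``nova-status upgrade check``), here is a minimal sketch using Python's ``argparse`` with nested sub-parsers. It is not how nova itself parses its command line; the ``upgrade_check`` handler, ``build_parser`` and ``main`` are hypothetical placeholders.

```python
import argparse


def upgrade_check(args):
    # Placeholder: a real handler would run the registered checks and
    # return the most severe result code.
    return 0


def build_parser():
    parser = argparse.ArgumentParser(prog="nova-status")
    # First level: command categories, such as "upgrade".
    categories = parser.add_subparsers(dest="category", required=True)
    upgrade = categories.add_parser("upgrade", help="Upgrade related commands")
    # Second level: actions within a category, such as "check".
    actions = upgrade.add_subparsers(dest="action", required=True)
    check = actions.add_parser("check", help="Run the upgrade checks")
    check.set_defaults(func=upgrade_check)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    return args.func(args)


print(main(["upgrade", "check"]))  # 0
```

With this layout, adding another action to the ``upgrade`` category, or another category alongside it, only requires registering another sub-parser, which is the extensibility argument made above.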