==========
|
|
Cells (v2)
|
|
==========
|
|
|
|
.. versionadded:: 16.0.0 (Pike)
|
|
|
|
This document describes the layout of a deployment with cells v2, including
|
|
deployment considerations for security and scale and recommended practices and
|
|
tips for running and maintaining cells v2 for admins and operators. It is
|
|
focused on code present in Pike and later, and while it is geared towards
|
|
people who want to have multiple cells for whatever reason, the nature of the
|
|
cells v2 support in Nova means that it applies in some way to all deployments.
|
|
|
|
Before reading any further, there is a nice overview presentation_ that Andrew
|
|
Laski gave at the Austin (Newton) summit which may be worth watching.
|
|
|
|
.. _presentation: https://www.openstack.org/videos/summits/austin-2016/nova-cells-v2-whats-going-on
|
|
|
|
.. note::
|
|
|
|
Cells v2 is different from the cells feature found in earlier versions of
nova, also known as cells v1. Cells v1 was deprecated in the 16.0.0 (Pike)
release and removed entirely in the 20.0.0 (Train) release.
|
|
|
|
|
|
Overview
|
|
--------
|
|
|
|
The purpose of the cells functionality in nova is to allow larger deployments
|
|
to shard their many compute nodes into cells. All nova deployments are by
|
|
definition cells deployments, even if most will only ever have a single cell.
|
|
This means a multi-cell deployment will not be radically different from a
|
|
"standard" nova deployment.
|
|
|
|
Consider such a deployment. It will consist of the following components:
|
|
|
|
- The :program:`nova-api` service which provides the external REST API to
|
|
users.
|
|
|
|
- The :program:`nova-scheduler` and ``placement`` services which are
|
|
responsible for tracking resources and deciding which compute node instances
|
|
should be placed on.
|
|
|
|
- An "API database" that is used primarily by :program:`nova-api` and
|
|
:program:`nova-scheduler` (called *API-level services* below) to track
|
|
location information about instances, as well as a temporary location for
|
|
instances being built but not yet scheduled.
|
|
|
|
- The :program:`nova-conductor` service which offloads long-running tasks for
|
|
the API-level services and insulates compute nodes from direct database access.
|
|
|
|
- The :program:`nova-compute` service which manages the virt driver and
|
|
hypervisor host.
|
|
|
|
- A "cell database" which is used by API, conductor and compute
|
|
services, and which houses the majority of the information about
|
|
instances.
|
|
|
|
- A "cell0 database" which is just like the cell database, but
|
|
contains only instances that failed to be scheduled. This database mimics a
|
|
regular cell, but has no compute nodes and is used only as a place to put
|
|
instances that fail to land on a real compute node (and thus a real cell).
|
|
|
|
- A message queue which allows the services to communicate with each
|
|
other via RPC.
|
|
|
|
In smaller deployments, there will typically be a single message queue that all
|
|
services share and a single database server which hosts the API database, a
|
|
single cell database, as well as the required cell0 database. Because we only
|
|
have one "real" cell, we consider this a "single-cell deployment".
|
|
|
|
In larger deployments, we can opt to shard the deployment using multiple cells.
|
|
In this configuration there will still only be one global API database but
|
|
there will be a cell database (where the bulk of the instance information
|
|
lives) for each cell, each containing a portion of the instances for the entire
|
|
deployment within, as well as per-cell message queues and per-cell
|
|
:program:`nova-conductor` instances. There will also be an additional
|
|
:program:`nova-conductor` instance, known as a *super conductor*, to handle
|
|
API-level operations.
|
|
|
|
In these larger deployments, each of the nova services will use a cell-specific
|
|
configuration file, all of which will at a minimum specify a message queue
|
|
endpoint (i.e. :oslo.config:option:`transport_url`). Most of the services will
|
|
also contain database connection configuration information (i.e.
|
|
:oslo.config:option:`database.connection`), while API-level services that need
|
|
access to the global routing and placement information will also be configured
|
|
to reach the API database (i.e. :oslo.config:option:`api_database.connection`).
|
|
|
|
.. note::
|
|
|
|
The pair of :oslo.config:option:`transport_url` and
|
|
:oslo.config:option:`database.connection` configured for a service defines
|
|
what cell a service lives in.
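
For example, a :program:`nova-conductor` service configured with the following
options would live in the cell whose mapping record contains these same two
values. This is only a sketch - the hostnames and credentials are placeholders:

.. code-block:: ini

   [DEFAULT]
   transport_url = rabbit://nova:rabbitpass@cell1-mq.example.com:5672/

   [database]
   connection = mysql+pymysql://nova:novapass@cell1-db.example.com/nova?charset=utf8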
|
|
|
|
API-level services need to be able to contact other services in all of
|
|
the cells. Since they only have one configured
|
|
:oslo.config:option:`transport_url` and
|
|
:oslo.config:option:`database.connection`, they look up the information for the
|
|
other cells in the API database, with records called *cell mappings*.
|
|
|
|
.. note::
|
|
|
|
The API database must have cell mapping records that match
|
|
the :oslo.config:option:`transport_url` and
|
|
:oslo.config:option:`database.connection` configuration options of the
|
|
lower-level services. See the ``nova-manage`` :ref:`man-page-cells-v2`
|
|
commands for more information about how to create and examine these records.
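
For example, to examine the cell mapping records that the API-level services
will use, you might run the following (a sketch; the exact output columns vary
by release):

.. code-block:: bash

   $ nova-manage cell_v2 list_cells --verbose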
|
|
|
|
The following section goes into more detail about the difference between
|
|
single-cell and multi-cell deployments.
|
|
|
|
|
|
Service layout
|
|
--------------
|
|
|
|
The services generally have a well-defined communication pattern that
|
|
dictates their layout in a deployment. In a small/simple scenario, the
|
|
rules do not have much of an impact as all the services can
|
|
communicate with each other on a single message bus and in a single
|
|
cell database. However, as the deployment grows, scaling and security
|
|
concerns may drive separation and isolation of the services.
|
|
|
|
Single cell
|
|
~~~~~~~~~~~
|
|
|
|
This is a diagram of the basic services that a simple (single-cell) deployment
|
|
would have, as well as the relationships (i.e. communication paths) between
|
|
them:
|
|
|
|
.. graphviz::
|
|
|
|
digraph services {
|
|
graph [pad="0.35", ranksep="0.65", nodesep="0.55", concentrate=true];
|
|
node [fontsize=10 fontname="Monospace"];
|
|
edge [arrowhead="normal", arrowsize="0.8"];
|
|
labelloc=bottom;
|
|
labeljust=left;
|
|
|
|
{ rank=same
|
|
api [label="nova-api"]
|
|
apidb [label="API Database" shape="box"]
|
|
scheduler [label="nova-scheduler"]
|
|
}
|
|
{ rank=same
|
|
mq [label="MQ" shape="diamond"]
|
|
conductor [label="nova-conductor"]
|
|
}
|
|
{ rank=same
|
|
cell0db [label="Cell0 Database" shape="box"]
|
|
celldb [label="Cell Database" shape="box"]
|
|
compute [label="nova-compute"]
|
|
}
|
|
|
|
api -> mq -> compute
|
|
conductor -> mq -> scheduler
|
|
|
|
api -> apidb
|
|
api -> cell0db
|
|
api -> celldb
|
|
|
|
conductor -> apidb
|
|
conductor -> cell0db
|
|
conductor -> celldb
|
|
}
|
|
|
|
All of the services are configured to talk to each other over the same
|
|
message bus, and there is only one cell database where live instance
|
|
data resides. The cell0 database is present (and required) but as no
|
|
compute nodes are connected to it, this is still a "single cell"
|
|
deployment.
|
|
|
|
Multiple cells
|
|
~~~~~~~~~~~~~~
|
|
|
|
In order to shard the services into multiple cells, a number of things
|
|
must happen. First, the message bus must be split into pieces along
|
|
the same lines as the cell database. Second, a dedicated conductor
|
|
must be run for the API-level services, with access to the API
|
|
database and a dedicated message queue. We call this *super conductor*
|
|
to distinguish its place and purpose from the per-cell conductor nodes.
|
|
|
|
.. graphviz::
|
|
|
|
digraph services2 {
|
|
graph [pad="0.35", ranksep="0.65", nodesep="0.55", concentrate=true];
|
|
node [fontsize=10 fontname="Monospace"];
|
|
edge [arrowhead="normal", arrowsize="0.8"];
|
|
labelloc=bottom;
|
|
labeljust=left;
|
|
|
|
subgraph api {
|
|
api [label="nova-api"]
|
|
scheduler [label="nova-scheduler"]
|
|
conductor [label="super conductor"]
|
|
{ rank=same
|
|
apimq [label="API MQ" shape="diamond"]
|
|
apidb [label="API Database" shape="box"]
|
|
}
|
|
|
|
api -> apimq -> conductor
|
|
api -> apidb
|
|
conductor -> apimq -> scheduler
|
|
conductor -> apidb
|
|
}
|
|
|
|
subgraph clustercell0 {
|
|
label="Cell 0"
|
|
color=green
|
|
cell0db [label="Cell Database" shape="box"]
|
|
}
|
|
|
|
subgraph clustercell1 {
|
|
label="Cell 1"
|
|
color=blue
|
|
mq1 [label="Cell MQ" shape="diamond"]
|
|
cell1db [label="Cell Database" shape="box"]
|
|
conductor1 [label="nova-conductor"]
|
|
compute1 [label="nova-compute"]
|
|
|
|
conductor1 -> mq1 -> compute1
|
|
conductor1 -> cell1db
|
|
|
|
}
|
|
|
|
subgraph clustercell2 {
|
|
label="Cell 2"
|
|
color=red
|
|
mq2 [label="Cell MQ" shape="diamond"]
|
|
cell2db [label="Cell Database" shape="box"]
|
|
conductor2 [label="nova-conductor"]
|
|
compute2 [label="nova-compute"]
|
|
|
|
conductor2 -> mq2 -> compute2
|
|
conductor2 -> cell2db
|
|
}
|
|
|
|
api -> mq1 -> conductor1
|
|
api -> mq2 -> conductor2
|
|
api -> cell0db
|
|
api -> cell1db
|
|
api -> cell2db
|
|
|
|
conductor -> cell0db
|
|
conductor -> cell1db
|
|
conductor -> mq1
|
|
conductor -> cell2db
|
|
conductor -> mq2
|
|
}
|
|
|
|
It is important to note that services in the lower cell boxes only
|
|
have the ability to call back to the placement API but cannot access
|
|
any other API-layer services via RPC, nor do they have access to the
|
|
API database for global visibility of resources across the cloud.
|
|
This is intentional and provides security and failure domain
|
|
isolation benefits, but also has impacts on some things that would
|
|
otherwise require this any-to-any communication style. Check :ref:`upcall`
|
|
below for the most up-to-date information about any caveats that may be present
|
|
due to this limitation.
|
|
|
|
|
|
Database layout
|
|
---------------
|
|
|
|
As mentioned previously, there is a split between global data and data that is
|
|
local to a cell. The corresponding database schemas are referred to as the
*API* and *main* schemas, respectively.
|
|
|
|
API database
|
|
~~~~~~~~~~~~
|
|
|
|
The API database is the database used for API-level services, such as
|
|
:program:`nova-api` and, in a multi-cell deployment, the superconductor.
|
|
The models and migrations related to this database can be found in
|
|
``nova.db.api``, and the database can be managed using the
|
|
:program:`nova-manage api_db` commands.
|
|
|
|
Main (cell-level) database
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The main database is the database used for cell-level :program:`nova-conductor`
|
|
instances. The models and migrations related to this database can be found in
|
|
``nova.db.main``, and the database can be managed using the
|
|
:program:`nova-manage db` commands.
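
As a quick check that both schemas have been synced, the current version of
each database can be inspected with the corresponding ``version`` subcommands,
which read the connection URLs from the configuration file:

.. code-block:: bash

   $ nova-manage api_db version
   $ nova-manage db version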
|
|
|
|
|
|
Usage
|
|
-----
|
|
|
|
As noted previously, all deployments are now, in effect, cells v2 deployments.
As a result, setting up any nova deployment - even one that will only ever have
a single cell - involves some level of cells configuration. This takes the form
of changes to the main nova configuration file as well as some extra records in
the databases.
|
|
|
|
All nova deployments must now have the following databases available
|
|
and configured:
|
|
|
|
1. The "API" database
|
|
2. One special "cell" database called "cell0"
|
|
3. One (or eventually more) "cell" databases
|
|
|
|
Thus, a small nova deployment will have an API database, a cell0, and
|
|
what we will call here a "cell1" database. High-level tracking
|
|
information is kept in the API database. Instances that are never
|
|
scheduled are relegated to the cell0 database, which is effectively a
|
|
graveyard of instances that failed to start. All successful/running
|
|
instances are stored in "cell1".
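
The databases themselves are created outside of nova. A minimal sketch using
MySQL, with the database names and credentials chosen purely for illustration,
might look like this:

.. code-block:: bash

   $ mysql -u root -p <<EOF
   CREATE DATABASE nova_api;
   CREATE DATABASE nova_cell0;
   CREATE DATABASE nova;
   CREATE USER IF NOT EXISTS 'nova'@'%' IDENTIFIED BY 'secretmysql';
   GRANT ALL PRIVILEGES ON nova_api.* TO 'nova'@'%';
   GRANT ALL PRIVILEGES ON nova_cell0.* TO 'nova'@'%';
   GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'%';
   EOF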
|
|
|
|
.. note::
|
|
|
|
Since Nova services make use of both the configuration file and some
database records, starting or restarting those services with an
incomplete configuration could lead to an incorrect deployment.
Only restart the services once you have completed the steps described
below.
|
|
|
|
.. note::
|
|
|
|
The following examples show the full expanded command line usage of
|
|
the setup commands. This is to make it easier to visualize which of
|
|
the various URLs are used by each of the commands. However, you should be
|
|
able to put all of that in the config file and :program:`nova-manage` will
|
|
use those values. If need be, you can create separate config files and pass
|
|
them as ``nova-manage --config-file foo.conf`` to control the behavior
|
|
without specifying things on the command lines.
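
A minimal sketch of such a file, here called ``cell1.conf`` purely for
illustration, needs only the URLs used by the command being run:

.. code-block:: ini

   [DEFAULT]
   transport_url = rabbit://stackrabbit:secretrabbit@mqserver:5672/

   [database]
   connection = mysql+pymysql://root:secretmysql@dbserver/nova?charset=utf8

   [api_database]
   connection = mysql+pymysql://root:secretmysql@dbserver/nova_api?charset=utf8

It would then be passed as ``nova-manage --config-file cell1.conf <command>``.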
|
|
|
|
Configuring a new deployment
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
If you are installing Nova for the first time and have no compute hosts in the
|
|
database yet then it will be necessary to configure cell0 and at least one
|
|
additional "real" cell. To begin, ensure your API database schema has been
|
|
populated using the :program:`nova-manage api_db sync` command. Ensure the
|
|
connection information for this database is stored in the ``nova.conf`` file
|
|
using the :oslo.config:option:`api_database.connection` config option:
|
|
|
|
.. code-block:: ini
|
|
|
|
[api_database]
|
|
connection = mysql+pymysql://root:secretmysql@dbserver/nova_api?charset=utf8
|
|
|
|
Since there may be multiple "cell" databases (and in fact everyone
|
|
will have cell0 and cell1 at a minimum), connection info for these is
|
|
stored in the API database. Thus, the API database must exist and must provide
|
|
information on how to connect to it before continuing to the steps below, so
|
|
that :program:`nova-manage` can find your other databases.
|
|
|
|
Next, we will create the necessary records for the cell0 database. To
|
|
do that we will first use :program:`nova-manage cell_v2 map_cell0` to create
|
|
and map cell0. For example:
|
|
|
|
.. code-block:: bash
|
|
|
|
$ nova-manage cell_v2 map_cell0 \
|
|
--database_connection mysql+pymysql://root:secretmysql@dbserver/nova_cell0?charset=utf8
|
|
|
|
.. note::
|
|
|
|
If you don't specify ``--database_connection`` then the commands will use
|
|
the :oslo.config:option:`database.connection` value from your config file
|
|
and mangle the database name to have a ``_cell0`` suffix.
|
|
|
|
.. warning::
|
|
|
|
If your databases are on separate hosts then you should specify
|
|
``--database_connection`` or make certain that the :file:`nova.conf`
|
|
being used has the :oslo.config:option:`database.connection` value pointing
|
|
to the same user/password/host that will work for the cell0 database.
|
|
If the cell0 mapping was created incorrectly, it can be deleted
|
|
using the :program:`nova-manage cell_v2 delete_cell` command before running
|
|
:program:`nova-manage cell_v2 map_cell0` again with the proper database
|
|
connection value.
|
|
|
|
We will then use :program:`nova-manage db sync` to apply the database schema to
|
|
this new database. For example:
|
|
|
|
.. code-block:: bash
|
|
|
|
$ nova-manage db sync \
|
|
--database_connection mysql+pymysql://root:secretmysql@dbserver/nova_cell0?charset=utf8
|
|
|
|
Since no hosts are ever in cell0, nothing further is required for its setup.
|
|
Note that all deployments only ever have one cell0, as it is special, so once
|
|
you have done this step you never need to do it again, even if you add more
|
|
regular cells.
|
|
|
|
Now, we must create another cell which will be our first "regular"
|
|
cell, which has actual compute hosts in it, and to which instances can
|
|
actually be scheduled. First, we create the cell record using
|
|
:program:`nova-manage cell_v2 create_cell`. For example:
|
|
|
|
.. code-block:: bash
|
|
|
|
$ nova-manage cell_v2 create_cell \
|
|
--name cell1 \
|
|
--database_connection mysql+pymysql://root:secretmysql@127.0.0.1/nova?charset=utf8 \
|
|
--transport-url rabbit://stackrabbit:secretrabbit@mqserver:5672/
|
|
|
|
.. note::
|
|
|
|
If you don't specify the database and transport urls then
|
|
:program:`nova-manage` will use the :oslo.config:option:`transport_url` and
|
|
:oslo.config:option:`database.connection` values from the config file.
|
|
|
|
.. note::
|
|
|
|
It is a good idea to specify a name for the new cell you create so you can
|
|
easily look up cell UUIDs with the :program:`nova-manage cell_v2 list_cells`
|
|
command later if needed.
|
|
|
|
.. note::
|
|
|
|
The :program:`nova-manage cell_v2 create_cell` command will print the UUID
|
|
of the newly-created cell if ``--verbose`` is passed, which is useful if you
|
|
need to run commands like :program:`nova-manage cell_v2 discover_hosts`
|
|
targeted at a specific cell.
|
|
|
|
At this point, the API database can now find the cell database, and further
|
|
commands will attempt to look inside. If this is a completely fresh database
|
|
(such as if you're adding a cell, or if this is a new deployment), then you
|
|
will need to run :program:`nova-manage db sync` on it to initialize the
|
|
schema.
|
|
|
|
Now we have a cell, but no hosts are in it which means the scheduler will never
|
|
actually place instances there. The next step is to scan the database for
|
|
compute node records and add them into the cell we just created. For this step,
|
|
you must have had a compute node started such that it registers itself as a
|
|
running service. You can identify this using the :program:`openstack compute
|
|
service list` command:
|
|
|
|
.. code-block:: bash
|
|
|
|
$ openstack compute service list --service nova-compute
|
|
|
|
Once that has happened, you can scan and add it to the cell using the
|
|
:program:`nova-manage cell_v2 discover_hosts` command:
|
|
|
|
.. code-block:: bash
|
|
|
|
$ nova-manage cell_v2 discover_hosts
|
|
|
|
This command will connect to any databases for which you have created cells (as
|
|
above), look for hosts that have registered themselves there, and map those
|
|
hosts in the API database so that they are visible to the scheduler as
|
|
available targets for instances. Any time you add more compute hosts to a cell,
|
|
you need to re-run this command to map them from the top-level so they can be
|
|
utilized. You can also configure a periodic task to have Nova discover new
|
|
hosts automatically by setting the
|
|
:oslo.config:option:`scheduler.discover_hosts_in_cells_interval` to a time
|
|
interval in seconds. The periodic task is run by the :program:`nova-scheduler`
|
|
service, so you must be sure to configure it on all of your
|
|
:program:`nova-scheduler` hosts.
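
For example, to have the scheduler look for new compute hosts every five
minutes, you could set the following (the interval is only an example; pick a
value that suits your deployment):

.. code-block:: ini

   [scheduler]
   discover_hosts_in_cells_interval = 300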
|
|
|
|
.. note::
|
|
|
|
In the future, whenever you add new compute hosts, you will need to run the
|
|
:program:`nova-manage cell_v2 discover_hosts` command after starting them to
|
|
map them to the cell if you did not configure automatic host discovery using
|
|
:oslo.config:option:`scheduler.discover_hosts_in_cells_interval`.
|
|
|
|
Adding a new cell to an existing deployment
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
You can add additional cells to your deployment using the same steps used above
|
|
to create your first cell. We can create a new cell record using
|
|
:program:`nova-manage cell_v2 create_cell`. For example:
|
|
|
|
.. code-block:: bash
|
|
|
|
$ nova-manage cell_v2 create_cell \
|
|
--name cell2 \
|
|
--database_connection mysql+pymysql://root:secretmysql@127.0.0.1/nova?charset=utf8 \
|
|
--transport-url rabbit://stackrabbit:secretrabbit@mqserver:5672/
|
|
|
|
.. note::
|
|
|
|
If you don't specify the database and transport urls then
|
|
:program:`nova-manage` will use the :oslo.config:option:`transport_url` and
|
|
:oslo.config:option:`database.connection` values from the config file.
|
|
|
|
.. note::
|
|
|
|
It is a good idea to specify a name for the new cell you create so you can
|
|
easily look up cell UUIDs with the :program:`nova-manage cell_v2 list_cells`
|
|
command later if needed.
|
|
|
|
.. note::
|
|
|
|
The :program:`nova-manage cell_v2 create_cell` command will print the UUID
|
|
of the newly-created cell if ``--verbose`` is passed, which is useful if you
|
|
need to run commands like :program:`nova-manage cell_v2 discover_hosts`
|
|
targeted at a specific cell.
|
|
|
|
You can repeat this step for each cell you wish to add to your deployment. Your
|
|
existing cell database will be re-used - this simply informs the top-level API
|
|
database about your existing cell databases.
|
|
|
|
Once you've created your new cell, use :program:`nova-manage cell_v2
|
|
discover_hosts` to map compute hosts to cells. This is only necessary if you
|
|
haven't enabled automatic discovery using the
|
|
:oslo.config:option:`scheduler.discover_hosts_in_cells_interval` option. For
|
|
example:
|
|
|
|
.. code-block:: bash
|
|
|
|
$ nova-manage cell_v2 discover_hosts
|
|
|
|
.. note::
|
|
|
|
This command will search for compute hosts in each cell database and map
|
|
them to the corresponding cell. This can be slow, particularly for larger
|
|
deployments. You may wish to specify the ``--cell_uuid`` option, which will
|
|
limit the search to a specific cell. You can use the :program:`nova-manage
|
|
cell_v2 list_cells` command to look up cell UUIDs if you are going to
|
|
specify ``--cell_uuid``.
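
For example, to limit discovery to a single cell:

.. code-block:: bash

   $ nova-manage cell_v2 discover_hosts --cell_uuid <cell_uuid>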
|
|
|
|
Finally, run the :program:`nova-manage cell_v2 map_instances` command to map
|
|
existing instances to the new cell(s). For example:
|
|
|
|
.. code-block:: bash
|
|
|
|
$ nova-manage cell_v2 map_instances
|
|
|
|
.. note::
|
|
|
|
This command will search for instances in each cell database and map them to
|
|
the correct cell. This can be slow, particularly for larger deployments. You
|
|
may wish to specify the ``--cell_uuid`` option, which will limit the search
|
|
to a specific cell. You can use the :program:`nova-manage cell_v2
|
|
list_cells` command to look up cell UUIDs if you are going to specify
|
|
``--cell_uuid``.
|
|
|
|
.. note::
|
|
|
|
The ``--max-count`` option can be specified if you would like to limit the
|
|
number of instances to map in a single run. If ``--max-count`` is not
|
|
specified, all instances will be mapped. Repeated runs of the command will
|
|
start from where the last run finished so it is not necessary to increase
|
|
``--max-count`` to finish. An exit code of 0 indicates that all instances
|
|
have been mapped. An exit code of 1 indicates that there are remaining
|
|
instances that need to be mapped.
|
|
|
|
Template URLs in Cell Mappings
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Starting in the 18.0.0 (Rocky) release, the URLs provided in the cell mappings
|
|
for ``--database_connection`` and ``--transport-url`` can contain
|
|
variables which are evaluated each time they are loaded from the
|
|
database, and the values of which are taken from the corresponding
|
|
base options in the host's configuration file. The base URL is parsed
|
|
and the following elements may be substituted into the cell mapping
|
|
URL (using ``rabbit://bob:s3kret@myhost:123/nova?sync=true#extra``):
|
|
|
|
.. list-table:: Cell Mapping URL Variables
|
|
:header-rows: 1
|
|
:widths: 15, 50, 15
|
|
|
|
* - Variable
|
|
- Meaning
|
|
- Part of example URL
|
|
* - ``scheme``
|
|
- The part before the ``://``
|
|
- ``rabbit``
|
|
* - ``username``
|
|
- The username part of the credentials
|
|
- ``bob``
|
|
* - ``password``
|
|
- The password part of the credentials
|
|
- ``s3kret``
|
|
* - ``hostname``
|
|
- The hostname or address
|
|
- ``myhost``
|
|
* - ``port``
|
|
- The port number (must be specified)
|
|
- ``123``
|
|
* - ``path``
|
|
- The "path" part of the URL (without leading slash)
|
|
- ``nova``
|
|
* - ``query``
|
|
- The full query string arguments (without leading question mark)
|
|
- ``sync=true``
|
|
* - ``fragment``
|
|
- Everything after the first hash mark
|
|
- ``extra``
|
|
|
|
Variables are provided in curly brackets, like ``{username}``. A simple template
|
|
of ``rabbit://{username}:{password}@otherhost/{path}`` will generate a full URL
|
|
of ``rabbit://bob:s3kret@otherhost/nova`` when used with the above example.
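
Such a template is stored in the cell mapping itself. A sketch of setting one
with :program:`nova-manage cell_v2 update_cell` (the cell UUID is a
placeholder, and the URLs are quoted to keep the braces literal):

.. code-block:: bash

   $ nova-manage cell_v2 update_cell \
       --cell_uuid <cell_uuid> \
       --transport-url 'rabbit://{username}:{password}@otherhost/{path}' \
       --database_connection 'mysql+pymysql://{username}:{password}@mycell1dbhost/nova'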
|
|
|
|
.. note::
|
|
|
|
The :oslo.config:option:`database.connection` and
|
|
:oslo.config:option:`transport_url` values are not reloaded from the
|
|
configuration file during a SIGHUP, which means that a full service restart
|
|
will be required to notice changes in a cell mapping record if variables are
|
|
changed.
|
|
|
|
.. note::
|
|
|
|
The :oslo.config:option:`transport_url` option can contain an
|
|
extended syntax for the "netloc" part of the URL
|
|
(i.e. ``userA:passwordA@hostA:portA,userB:passwordB@hostB:portB``). In this
|
|
case, substitutions of the form ``username1``, ``username2``, etc. will be
|
|
honored and can be used in the template URL.
|
|
|
|
The templating of these URLs may be helpful in order to provide each service host
|
|
with its own credentials for, say, the database. Without templating, all hosts
|
|
will use the same URL (and thus credentials) for accessing services like the
|
|
database and message queue. By using a URL with a template that results in the
|
|
credentials being taken from the host-local configuration file, each host will
|
|
use different values for those connections.
|
|
|
|
Assuming you have two service hosts that are normally configured with the cell0
|
|
database as their primary connection, their (abbreviated) configurations would
|
|
look like this:
|
|
|
|
.. code-block:: ini
|
|
|
|
[database]
|
|
connection = mysql+pymysql://service1:foo@myapidbhost/nova_cell0
|
|
|
|
and:
|
|
|
|
.. code-block:: ini
|
|
|
|
[database]
|
|
connection = mysql+pymysql://service2:bar@myapidbhost/nova_cell0
|
|
|
|
Without cell mapping template URLs, they would still use the same credentials
|
|
(as stored in the mapping) to connect to the cell databases. However, consider
|
|
template URLs like the following::
|
|
|
|
mysql+pymysql://{username}:{password}@mycell1dbhost/nova
|
|
|
|
and::
|
|
|
|
mysql+pymysql://{username}:{password}@mycell2dbhost/nova
|
|
|
|
Using the first service and cell1 mapping, the calculated URL that will actually
|
|
be used for connecting to that database will be::
|
|
|
|
mysql+pymysql://service1:foo@mycell1dbhost/nova
|
|
|
|
|
|
Design
|
|
------
|
|
|
|
Prior to the introduction of cells v2, when a request hit the Nova API for a
|
|
particular instance, the instance information was fetched from the database.
|
|
The information contained the hostname of the compute node on which the
|
|
instance was currently located. If the request needed to take action on the
|
|
instance (which it generally would), the hostname was used to calculate the
|
|
name of a queue and a message was written there which would eventually find its
|
|
way to the proper compute node.
|
|
|
|
The meat of the cells v2 feature was to split this hostname lookup into two parts
|
|
that yielded three pieces of information instead of one. Basically, instead of
|
|
merely looking up the *name* of the compute node on which an instance was
|
|
located, we also started obtaining database and queue connection information.
|
|
Thus, when asked to take action on instance $foo, we now:
|
|
|
|
1. Look up the three-tuple of (database, queue, hostname) for that instance
|
|
2. Connect to that database and fetch the instance record
|
|
3. Connect to the queue and send the message to the proper hostname queue
|
|
|
|
The above differs from the previous organization in two ways. First, we now
|
|
need to do two database lookups before we know where the instance lives.
|
|
Second, we need to demand-connect to the appropriate database and queue. Both
|
|
of these changes had performance implications, but it was possible to mitigate
|
|
them through the use of things like a memcache of instance mapping information
|
|
and pooling of connections to database and queue systems. The number of cells
|
|
will always be much smaller than the number of instances.
|
|
|
|
There were also availability implications with the new feature, since something
like an instance list, which might query multiple cells, could end up with a
partial result
|
|
if there is a database failure in a cell. These issues can be mitigated, as
|
|
discussed in :ref:`handling-cell-failures`. A database failure within a cell
|
|
would cause larger issues than a partial list result so the expectation is that
|
|
it would be addressed quickly and cells v2 will handle it by indicating in the
|
|
response that the data may not be complete.
|
|
|
|
|
|
Comparison with cells v1
|
|
------------------------
|
|
|
|
Prior to the introduction of cells v2, nova had a very similar feature, also
|
|
called cells or referred to as cells v1 for disambiguation. Cells v2 was an
|
|
effort to address many of the perceived shortcomings of the cells v1 feature.
|
|
Benefits of the cells v2 feature over the previous cells v1 feature include:
|
|
|
|
- Native sharding of the database and queue as a first-class feature in nova.
|
|
All of the code paths will go through the lookup procedure and thus we won't
|
|
have the same feature parity issues as we do with current cells.
|
|
|
|
- No high-level replication of all the cell databases at the top. The API will
|
|
need a database of its own for things like the instance index, but it will
|
|
not need to replicate all the data at the top level.
|
|
|
|
- It draws a clear line between global and local data elements. Things like
|
|
flavors and keypairs are clearly global concepts that need only live at the
|
|
top level. Providing this separation allows compute nodes to become even more
|
|
stateless and insulated from things like deleted/changed global data.
|
|
|
|
- Existing non-cells users will suddenly gain the ability to spawn a new "cell"
|
|
from their existing deployment without changing their architecture. Simply
|
|
adding information about the new database and queue systems to the new index
|
|
will allow them to consume those resources.
|
|
|
|
- Existing cells users will need to fill out the cells mapping index, shut down
|
|
their existing cells synchronization service, and ultimately clean up their
|
|
top level database. However, since the high-level organization is not
|
|
substantially different, they will not have to re-architect their systems to
|
|
move to cells v2.
|
|
|
|
- Adding new sets of hosts as a new "cell" allows them to be plugged into a
|
|
deployment and tested before allowing builds to be scheduled to them.
|
|
|
|
|
|
.. _cells-v2-caveats:
|
|
|
|
Caveats
|
|
-------
|
|
|
|
.. note::
|
|
|
|
Many of these caveats have been addressed since the introduction of cells v2
|
|
in the 16.0.0 (Pike) release. These are called out below.
|
|
|
|
Cross-cell move operations
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Support for cross-cell cold migration and resize was introduced in the 21.0.0
|
|
(Ussuri) release. This is documented in
|
|
:doc:`/admin/configuration/cross-cell-resize`. Prior to this release, it was
|
|
not possible to cold migrate or resize an instance from a host in one cell to a
|
|
host in another cell.
|
|
|
|
It is not currently possible to live migrate, evacuate or unshelve an instance
|
|
from a host in one cell to a host in another cell.
|
|
|
|
Quota-related quirks
|
|
~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Quotas are now calculated live at the point at which an operation
|
|
would consume more resource, instead of being kept statically in the
|
|
database. This means that a multi-cell environment may incorrectly
|
|
calculate the usage of a tenant if one of the cells is unreachable, as
|
|
those resources cannot be counted. In this case, the tenant may be
|
|
able to consume more resource from one of the available cells, putting
|
|
them far over quota when the unreachable cell returns.
|
|
|
|
.. note::
|
|
|
|
Starting in the Train (20.0.0) release, it is possible to configure
|
|
counting of quota usage from the placement service and API database
|
|
to make quota usage calculations resilient to down or poor-performing
|
|
cells in a multi-cell environment. See the :doc:`quotas documentation
|
|
</admin/quotas>` for more details.
|
|
|
|
Starting in the 2023.2 Bobcat (28.0.0) release, it is possible to configure
|
|
unified limits quotas, which store quota limits as Keystone unified limits
and count quota usage from the placement service and API database. See the
|
|
:doc:`unified limits documentation </admin/unified-limits>` for more
|
|
details.
|
|
|
|
Performance of listing instances
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Prior to the 17.0.0 (Queens) release, the instance list operation may not sort
|
|
and paginate results properly when crossing multiple cell boundaries. Further,
|
|
the performance of a sorted list operation across multiple cells was
|
|
considerably slower than with a single cell. This was resolved as part of the
|
|
`efficient-multi-cell-instance-list-and-sort`__ spec.
|
|
|
|
.. __: https://blueprints.launchpad.net/nova/+spec/efficient-multi-cell-instance-list-and-sort
|
|
|
|
Notifications
|
|
~~~~~~~~~~~~~
|
|
|
|
With a multi-cell environment with multiple message queues, it is
|
|
likely that operators will want to configure a separate connection to
|
|
a unified queue for notifications. This can be done in the configuration file
|
|
of all nodes. Refer to the :oslo.messaging-doc:`oslo.messaging configuration
|
|
documentation
|
|
<configuration/opts.html#oslo_messaging_notifications.transport_url>` for more
|
|
details.
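
A sketch of such a configuration, pointing notifications at a dedicated
messaging endpoint (the URL is a placeholder):

.. code-block:: ini

   [oslo_messaging_notifications]
   transport_url = rabbit://notify:notifypass@notify-mq.example.com:5672/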
|
|
|
|
.. _cells-v2-layout-metadata-api:
|
|
|
|
Nova Metadata API service
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Starting from the 19.0.0 (Stein) release, the :doc:`nova metadata API service
|
|
</admin/metadata-service>` can be run either globally or per cell using the
|
|
:oslo.config:option:`api.local_metadata_per_cell` configuration option.
|
|
|
|
.. rubric:: Global
|
|
|
|
If you have networks that span cells, you might need to run Nova metadata API
|
|
globally. When running globally, it should be configured as an API-level
|
|
service with access to the :oslo.config:option:`api_database.connection`
|
|
information. The nova metadata API service **must not** be run as a standalone
|
|
service, using the :program:`nova-api-metadata` service, in this case.
|
|
|
|
.. rubric:: Local per cell
|
|
|
|
Running the Nova metadata API per cell can provide better performance and data
|
|
isolation in a multi-cell deployment. If your networks are segmented along
|
|
cell boundaries, then you can run Nova metadata API service per cell. If you
|
|
choose to run it per cell, you should also configure each
|
|
:neutron-doc:`neutron-metadata-agent
|
|
<configuration/metadata-agent.html?#DEFAULT.nova_metadata_host>` service to
|
|
point to the corresponding :program:`nova-api-metadata`. The nova metadata API
|
|
service **must** be run as a standalone service, using the
|
|
:program:`nova-api-metadata` service, in this case.
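
A sketch of the relevant options, with hostnames as placeholders; the nova side
enables per-cell metadata, while each cell's ``neutron-metadata-agent`` points
at that cell's :program:`nova-api-metadata` host:

.. code-block:: ini

   # nova.conf for the services in the cell
   [api]
   local_metadata_per_cell = True

   # metadata_agent.ini for the neutron-metadata-agent in the same cell
   [DEFAULT]
   nova_metadata_host = cell1-metadata.example.com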
|
|
|
|
Console proxies
|
|
~~~~~~~~~~~~~~~
|
|
|
|
Starting from the 18.0.0 (Rocky) release, console proxies must be run per cell
|
|
because console token authorizations are stored in cell databases. This means
|
|
that each console proxy server must have access to the
|
|
:oslo.config:option:`database.connection` information for the cell database
|
|
containing the instances for which it is proxying console access. This
|
|
functionality was added as part of the `convert-consoles-to-objects`__ spec.
|
|
|
|
.. __: https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/convert-consoles-to-objects.html
|
|
|
|
.. _upcall:
|
|
|
|
Operations requiring upcalls
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
If you deploy multiple cells with a superconductor as described above,
|
|
computes and cell-based conductors will not have the ability to speak
|
|
to the scheduler as they are not connected to the same MQ. This is by
|
|
design for isolation, but currently the processes are not in place to
|
|
implement some features without such connectivity. Thus, anything that
|
|
requires a so-called "upcall" will not function. This impacts the
|
|
following:
|
|
|
|
#. Instance reschedules during boot and resize (part 1)
|
|
|
|
.. note::
|
|
|
|
This has been resolved in the `Queens release`__.
|
|
|
|
.. __: https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/return-alternate-hosts.html
|
|
|
|
#. Instance affinity reporting from the compute nodes to scheduler
|
|
#. The late anti-affinity check during server create and evacuate
|
|
#. Querying host aggregates from the cell
|
|
|
|
.. note::
|
|
|
|
This has been resolved in the `Rocky release`__.
|
|
|
|
.. __: https://blueprints.launchpad.net/nova/+spec/live-migration-in-xapi-pool
|
|
|
|
#. Attaching a volume and ``[cinder] cross_az_attach = False``
|
|
#. Instance reschedules during boot and resize (part 2)
|
|
|
|
.. note:: This has been resolved in the `Ussuri release`__.
|
|
|
|
.. __: https://review.opendev.org/q/topic:bug/1781286
|
|
|
|
The first is simple: if you boot an instance and it gets scheduled to a
compute node but fails there, it would normally be re-scheduled to another
|
|
node. That requires scheduler intervention and thus it will not work
|
|
in Pike with a multi-cell layout. If you do not rely on reschedules
|
|
for covering up transient compute-node failures, then this will not
|
|
affect you. To ensure you do not make futile attempts at rescheduling,
|
|
you should set :oslo.config:option:`scheduler.max_attempts` to ``1`` in
|
|
``nova.conf``.
|
|
|
|
The second two are related. The summary is that some of the facilities
|
|
that Nova has for ensuring that affinity/anti-affinity is preserved
|
|
between instances do not function in Pike with a multi-cell
|
|
layout. If you don't use affinity operations, then this will not
|
|
affect you. To make sure you don't make futile attempts at the
|
|
affinity check, you should set
|
|
:oslo.config:option:`workarounds.disable_group_policy_check_upcall` to ``True``
|
|
and :oslo.config:option:`filter_scheduler.track_instance_changes` to ``False``
|
|
in ``nova.conf``.
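
A sketch of the ``nova.conf`` settings described above:

.. code-block:: ini

   [scheduler]
   max_attempts = 1

   [workarounds]
   disable_group_policy_check_upcall = True

   [filter_scheduler]
   track_instance_changes = False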
|
|
|
|
The fourth was previously only a problem when performing live migrations using
|
|
the since-removed XenAPI driver and not specifying ``--block-migrate``. The
|
|
driver would attempt to figure out if block migration should be performed based
|
|
on source and destination hosts being in the same aggregate. Since aggregates
|
|
data had migrated to the API database, the cell conductor would not be able to
|
|
access the aggregate information and would fail.
|
|
|
|
The fifth is a problem because when a volume is attached to an instance
|
|
in the *nova-compute* service, and ``[cinder]/cross_az_attach=False`` in
|
|
nova.conf, we attempt to look up the availability zone that the instance is
|
|
in which includes getting any host aggregates that the ``instance.host`` is in.
|
|
Since the aggregates are in the API database and the cell conductor cannot
|
|
access that information, this will fail. In the future this check could be
|
|
moved to the *nova-api* service such that the availability zone between the
|
|
instance and the volume is checked before we reach the cell, except in the
|
|
case of :term:`boot from volume <Boot From Volume>` where the *nova-compute*
|
|
service itself creates the volume and must tell Cinder in which availability
|
|
zone to create the volume. Long-term, volume creation during boot from volume
|
|
should be moved to the top-level superconductor which would eliminate this AZ
|
|
up-call check problem.
|
|
|
|
The sixth is detailed in `bug 1781286`__ and is similar to the first issue.
|
|
The issue is that servers created without a specific availability zone
|
|
will have their AZ calculated during a reschedule based on the alternate host
|
|
selected. Determining the AZ for the alternate host requires an "up call" to
|
|
the API DB.
|
|
|
|
.. __: https://bugs.launchpad.net/nova/+bug/1781286
|
|
|
|
|
|
.. _handling-cell-failures:
|
|
|
|
Handling cell failures
|
|
----------------------
|
|
|
|
For an explanation on how ``nova-api`` handles cell failures please see the
|
|
`Handling Down Cells`__ section of the Compute API guide. Below, you can find
|
|
some recommended practices and considerations for effectively tolerating cell
|
|
failure situations.
|
|
|
|
.. __: https://docs.openstack.org/api-guide/compute/down_cells.html
|
|
|
|
Configuration considerations
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Since whether a cell is reachable or not is determined through timeouts, it is
suggested that you provide suitable values for the following settings based on
your requirements.
|
|
|
|
#. :oslo.config:option:`database.max_retries` is 10 by default meaning every time
|
|
a cell becomes unreachable, it would retry 10 times before nova can declare the
|
|
cell as a "down" cell.
|
|
#. :oslo.config:option:`database.retry_interval` is 10 seconds and
|
|
:oslo.config:option:`oslo_messaging_rabbit.rabbit_retry_interval` is 1 second by
|
|
default meaning every time a cell becomes unreachable it would retry every 10
|
|
seconds or 1 second depending on if it's a database or a message queue problem.
|
|
#. Nova also has a timeout value called ``CELL_TIMEOUT`` which is hardcoded to 60
|
|
seconds and that is the total time the nova-api would wait before returning
|
|
partial results for the "down" cells.
|
|
|
|
The values of the above settings will affect the time required for nova to decide
|
|
if a cell is unreachable and then take the necessary actions like returning
|
|
partial results.
|
|
|
|
The operator can also control the results of certain actions like listing
|
|
servers and services depending on the value of the
|
|
:oslo.config:option:`api.list_records_by_skipping_down_cells` config option.
|
|
If this is true, the results from the unreachable cells will be skipped
|
|
and if it is false, the request will just fail with an API error in situations where
|
|
partial constructs cannot be computed.
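
A sketch of tuning these options in ``nova.conf``, with values chosen purely
for illustration:

.. code-block:: ini

   [database]
   max_retries = 5
   retry_interval = 5

   [oslo_messaging_rabbit]
   rabbit_retry_interval = 1

   [api]
   list_records_by_skipping_down_cells = true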
|
|
|
|
Disabling down cells
|
|
~~~~~~~~~~~~~~~~~~~~
|
|
|
|
While a temporary outage in the infrastructure is being fixed, the affected
cells can be disabled so that they are no longer scheduling candidates.
|
|
To enable or disable a cell, use :command:`nova-manage cell_v2 update_cell
|
|
--cell_uuid <cell_uuid> --disable`. See the :ref:`man-page-cells-v2` man page
|
|
for details on command usage.
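
For example:

.. code-block:: bash

   # Remove the cell from the set of scheduling candidates during maintenance.
   $ nova-manage cell_v2 update_cell --cell_uuid <cell_uuid> --disable

   # Re-enable it once the maintenance is over.
   $ nova-manage cell_v2 update_cell --cell_uuid <cell_uuid> --enable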
|
|
|
|
Known issues
|
|
~~~~~~~~~~~~
|
|
|
|
1. **Services and Performance:** In case a cell is down during the startup of nova
|
|
services, there is a chance that the services will hang because they are not able
|
|
to connect to all the cell databases that might be required for certain calculations
|
|
and initializations. An example of this situation is that if
:oslo.config:option:`upgrade_levels.compute` is set to ``auto``, the
|
|
``nova-api`` service hangs on startup if there is at least one unreachable
|
|
cell. This is because it needs to connect to all the cells to gather
|
|
information on each of the compute service's version to determine the compute
|
|
version cap to use. The current workaround is to pin the
|
|
:oslo.config:option:`upgrade_levels.compute` to a particular version like
|
|
"rocky" and get the service up under such situations. See `bug 1815697`__
|
|
for more details. Also note
|
|
that in general during situations where cells are not reachable certain
|
|
"slowness" may be experienced in operations requiring hitting all the cells
|
|
because of the aforementioned configurable timeout/retry values.
|
|
|
|
.. _cells-counting-quotas:
|
|
|
|
2. **Counting Quotas:** Another known issue is in the current approach of counting
|
|
quotas where we query each cell database to get the used resources and aggregate
|
|
them, which makes it sensitive to temporary cell outages. While a cell is
|
|
unavailable, we cannot count resource usage residing in that cell database and
|
|
things would behave as though more quota is available than should be. That is,
|
|
if a tenant has used all of their quota and part of it is in cell A and cell A
|
|
goes offline temporarily, that tenant will suddenly be able to allocate more
|
|
resources than their limit (assuming cell A returns, the tenant will have more
|
|
resources allocated than their allowed quota).
|
|
|
|
.. note:: Starting in the Train (20.0.0) release, it is possible to
|
|
configure counting of quota usage from the placement service and
|
|
API database to make quota usage calculations resilient to down or
|
|
poor-performing cells in a multi-cell environment. See the
|
|
:doc:`quotas documentation</user/quotas>` for more details.
|
|
|
|
.. __: https://bugs.launchpad.net/nova/+bug/1815697
|
|
|
|
|
|
FAQs
|
|
----
|
|
|
|
- How do I find out which hosts are bound to which cell?
|
|
|
|
There are a couple of ways to do this.
|
|
|
|
#. Run :program:`nova-manage cell_v2 discover_hosts --verbose`.
|
|
|
|
This does not produce a report but if you are trying to determine if a
|
|
host is in a cell you can run this and it will report any hosts that are
|
|
not yet mapped to a cell and map them. This command is idempotent.
|
|
|
|
#. Run :program:`nova-manage cell_v2 list_hosts`.
|
|
|
|
This will list hosts in all cells. If you want to list hosts in a
|
|
specific cell, you can use the ``--cell_uuid`` option.
|
|
|
|
- I updated the ``database_connection`` and/or ``transport_url`` in a cell
|
|
using the ``nova-manage cell_v2 update_cell`` command but the API is still
|
|
trying to use the old settings.
|
|
|
|
The cell mappings are cached in the :program:`nova-api` service worker so you
|
|
will need to restart the worker process to rebuild the cache. Note that there
|
|
is another global cache tied to request contexts, which is used in the
|
|
nova-conductor and nova-scheduler services, so you might need to do the same
|
|
if you are having the same issue in those services. As of the 16.0.0 (Pike)
|
|
release there is no timer on the cache or hook to refresh the cache using a
|
|
SIGHUP to the service.
|
|
|
|
- I have upgraded from Newton to Ocata and I can list instances but I get a
|
|
HTTP 404 (NotFound) error when I try to get details on a specific instance.
|
|
|
|
Instances need to be mapped to cells so the API knows which cell an instance
|
|
lives in. When upgrading, the :program:`nova-manage cell_v2 simple_cell_setup`
|
|
command will automatically map the instances to the single cell which is
|
|
backed by the existing nova database. If you have already upgraded and did
|
|
not use the :program:`nova-manage cell_v2 simple_cell_setup` command, you can run the
|
|
:program:`nova-manage cell_v2 map_instances` command with the ``--cell_uuid``
|
|
option to map all instances in the given cell. See the
|
|
:ref:`man-page-cells-v2` man page for details on command usage.
|
|
|
|
- Can I create a cell but have it disabled from scheduling?
|
|
|
|
Yes. It is possible to create a pre-disabled cell such that it does not
|
|
become a candidate for scheduling new VMs. This can be done by running the
|
|
:program:`nova-manage cell_v2 create_cell` command with the ``--disabled``
|
|
option.
|
|
|
|
- How can I disable a cell so that the new server create requests do not go to
|
|
it while I perform maintenance?
|
|
|
|
Existing cells can be disabled by running :program:`nova-manage cell_v2
|
|
update_cell` with the ``--disable`` option and can be re-enabled once the
|
|
maintenance period is over by running this command with the ``--enable``
|
|
option.
|
|
|
|
- I disabled (or enabled) a cell using the :program:`nova-manage cell_v2
|
|
update_cell` command, or I created a new (pre-disabled) cell mapping using the
|
|
:program:`nova-manage cell_v2 create_cell` command but the scheduler is still
|
|
using the old settings.
|
|
|
|
The cell mappings are cached in the scheduler worker so you will either need
|
|
to restart the scheduler process to refresh the cache, or send a SIGHUP
|
|
signal to the scheduler by which it will automatically refresh the cells
|
|
cache and the changes will take effect.
|
|
|
|
- Why was the cells REST API not implemented for cells v2? Why are there no
|
|
CRUD operations for cells in the API?
|
|
|
|
One of the deployment challenges that cells v1 had was the requirement for
|
|
the API and control services to be up before a new cell could be deployed.
|
|
This was not a problem for large-scale public clouds that never shut down,
|
|
but is not a reasonable requirement for smaller clouds that do offline
|
|
upgrades and/or clouds which could be taken completely offline by something
|
|
like a power outage. Initial devstack and gate testing for cells v1 was
|
|
delayed by the need to engineer a solution for bringing the services
|
|
partially online in order to deploy the rest, and this continues to be a gap
|
|
for other deployment tools. Consider also the FFU case where the control
|
|
plane needs to be down for a multi-release upgrade window where changes to
|
|
cell records have to be made. This would be quite a bit harder if the way
|
|
those changes are made is via the API, which must remain down during the
|
|
process.
|
|
|
|
Further, there is a long-term goal to move cell configuration (i.e.
|
|
cell_mappings and the associated URLs and credentials) into config and get
|
|
away from the need to store and provision those things in the database.
|
|
Obviously a CRUD interface in the API would prevent us from making that move.
|
|
|
|
- Why are cells not exposed as a grouping mechanism in the API for listing
|
|
services, instances, and other resources?
|
|
|
|
Early in the design of cells v2 we set a goal to not let the cell concept
|
|
leak out of the API, even for operators. Aggregates are the way nova supports
|
|
grouping of hosts for a variety of reasons, and aggregates can cut across
|
|
cells, and/or be aligned with them if desired. If we were to support cells as
|
|
another grouping mechanism, we would likely end up having to implement many
|
|
of the same features for them as aggregates, such as scheduler features,
|
|
metadata, and other searching/filtering operations. Since aggregates are how
|
|
Nova supports grouping, we expect operators to use aggregates any time they
|
|
need to refer to a cell as a group of hosts from the API, and leave actual
|
|
cells as a purely architectural detail.
|
|
|
|
The need to filter instances by cell in the API can and should be solved by
|
|
adding a generic by-aggregate filter, which would allow listing instances on
|
|
hosts contained within any aggregate, including one that matches the cell
|
|
boundaries if so desired.
|
|
|
|
- Why are the API responses for ``GET /servers``, ``GET /servers/detail``,
|
|
``GET /servers/{server_id}`` and ``GET /os-services`` missing some
|
|
information for certain cells at certain times? Why do I see the status as
|
|
"UNKNOWN" for the servers in those cells at those times when I run
|
|
``openstack server list`` or ``openstack server show``?
|
|
|
|
Starting from microversion 2.69 the API responses of ``GET /servers``, ``GET
|
|
/servers/detail``, ``GET /servers/{server_id}`` and ``GET /os-services`` may
|
|
contain missing keys during down cell situations. See the `Handling Down
|
|
Cells`__ section of the Compute API guide for more information on the partial
|
|
constructs.
|
|
|
|
For administrative considerations, see :ref:`handling-cell-failures`.
|
|
|
|
.. __: https://docs.openstack.org/api-guide/compute/down_cells.html
|
|
|
|
References
|
|
----------
|
|
|
|
A large number of cells v2-related presentations have been given at various
|
|
OpenStack and OpenInfra Summits over the years. These provide an excellent
|
|
reference on the history and development of the feature along with details from
|
|
real-world users of the feature.
|
|
|
|
- `Newton Summit Video - Nova Cells V2: What's Going On?`__
|
|
- `Pike Summit Video - Scaling Nova: How CellsV2 Affects Your Deployment`__
|
|
- `Queens Summit Video - Add Cellsv2 to your existing Nova deployment`__
|
|
- `Rocky Summit Video - Moving from CellsV1 to CellsV2 at CERN`__
|
|
- `Stein Summit Video - Scaling Nova with CellsV2: The Nova Developer and the
|
|
CERN Operator perspective`__
|
|
- `Train Summit Video - What's new in Nova Cellsv2?`__
|
|
|
|
.. __: https://www.openstack.org/videos/austin-2016/nova-cells-v2-whats-going-on
|
|
.. __: https://www.openstack.org/videos/boston-2017/scaling-nova-how-cellsv2-affects-your-deployment
|
|
.. __: https://www.openstack.org/videos/sydney-2017/adding-cellsv2-to-your-existing-nova-deployment
|
|
.. __: https://www.openstack.org/videos/summits/vancouver-2018/moving-from-cellsv1-to-cellsv2-at-cern
|
|
.. __: https://www.openstack.org/videos/summits/berlin-2018/scaling-nova-with-cellsv2-the-nova-developer-and-the-cern-operator-perspective
|
|
.. __: https://www.openstack.org/videos/summits/denver-2019/whats-new-in-nova-cellsv2
|