Add some more cellsv2 doc goodness

This adds a fresh cellsv2 overview document that talks about
deployment decisions for single and multiple cell environments
in an attempt to help address confusion about what the service
layouts look like in a multi-cell setup.

Change-Id: I1da7c375dbb98c125aebabec548280de8d8ed381
Dan Smith 5 years ago
parent 7234e6e474
commit 7c17010448

@ -175,6 +175,7 @@ these are a great place to start reading up on the current plans.
:maxdepth: 1

@ -11,6 +11,8 @@
License for the specific language governing permissions and limitations
under the License.
.. _cells:

@ -0,0 +1,293 @@
Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
Cells Layout (v2)
This document describes the layout of a deployment with Cells
version 2, including deployment considerations for security and
scale. It is focused on code present in Pike and later, and while it
is geared towards people who want to have multiple cells for whatever
reason, the nature of the cellsv2 support in Nova means that it
applies in some way to all deployments.
.. note:: The concepts laid out in this document do not in any way
relate to CellsV1, which includes the ``nova-cells``
service, and the ``[cells]`` section of the configuration
file. For more information on the differences, see the main
:ref:`cells` page.
A basic Nova system consists of the following components:
* The nova-api service which provides the external REST API to users.
* The nova-scheduler and placement services which are responsible
for tracking resources and deciding which compute node instances
should be on.
* An "API database" that is used primarily by nova-api and
nova-scheduler (called *API-level services* below) to track location
information about instances, as well as a temporary location for
instances being built but not yet scheduled.
* The nova-conductor service which offloads long-running tasks for the
API-level service, as well as insulates compute nodes from direct
database access.
* The nova-compute service which manages the virt driver and
hypervisor host.
* A "cell database" which is used by API, conductor and compute
services, and which houses the majority of the information about
instances.
* A "cell0 database" which is just like the cell database, but
contains only instances that failed to be scheduled.
* A message queue which allows the services to communicate with each
other via RPC.
All deployments have at least the above components. Small deployments
likely have a single message queue that all services share, and a
single database server which hosts the API database, a single cell
database, as well as the required cell0 database. This is considered a
"single-cell deployment" because it only has one "real" cell. The
cell0 database mimics a regular cell, but has no compute nodes and is
used only as a place to put instances that fail to land on a real
compute node (and thus a real cell).
The purpose of the cells functionality in nova is specifically to
allow larger deployments to shard their many compute nodes into cells,
each of which has a database and message queue. The API database is
always and only global, but there can be many cell databases (where
the bulk of the instance information lives), each with a portion of
the instances for the entire deployment within.
Every nova service uses a configuration file, which at a minimum
specifies a message queue endpoint
(i.e. ``[DEFAULT]/transport_url``). Most of the services also require
configuration of database connection information
(i.e. ``[database]/connection``). API-level services that need access
to the global routing and placement information will also be
configured to reach the API database
(i.e. ``[api_database]/connection``).
.. note:: The pair of ``transport_url`` and ``[database]/connection``
configured for a service defines which cell that service lives in.
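
The options named above can be illustrated with a minimal sketch. It
assumes a hypothetical ``nova.conf`` for an API-level service in a
single-cell deployment; the hostnames, credentials, database names and
the helper functions ``cell_identity`` and ``is_api_level`` are
placeholders invented for illustration and are not part of Nova.

.. code-block:: python

  import configparser

  # Hypothetical nova.conf contents; every value here is a placeholder.
  SAMPLE_NOVA_CONF = """
  [DEFAULT]
  transport_url = rabbit://nova:secret@mq.example.org:5672/

  [database]
  connection = mysql+pymysql://nova:secret@db.example.org/nova_cell1

  [api_database]
  connection = mysql+pymysql://nova:secret@db.example.org/nova_api
  """


  def _parse(conf_text):
      # Strip the indentation used for embedding the sample in this doc.
      lines = [line.strip() for line in conf_text.splitlines()]
      parser = configparser.ConfigParser(interpolation=None)
      parser.read_string("\n".join(lines))
      return parser


  def cell_identity(conf_text):
      """Return the (transport_url, database connection) pair."""
      parser = _parse(conf_text)
      return (parser.get("DEFAULT", "transport_url"),
              parser.get("database", "connection"))


  def is_api_level(conf_text):
      """True if the service is configured to reach the API database."""
      return _parse(conf_text).has_option("api_database", "connection")

In this sketch, two services whose ``cell_identity`` pairs are equal
would be considered to live in the same cell.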
API-level services need to be able to contact other services in all of
the cells. Since they only have one configured ``transport_url`` and
``[database]/connection`` they look up the information for the other
cells in the API database, with records called *cell mappings*.
.. note:: The API database must have cell mapping records that match
the ``transport_url`` and ``[database]/connection``
configuration elements of the lower-level services. See the
``nova-manage`` :ref:`man-page-cells-v2` commands for more
information about how to create and examine these records.
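
As a rough illustration of the lookup described above, the following
sketch models cell mappings as in-memory records. The ``CellMapping``
tuple, the two dictionaries and both helper functions are hypothetical
simplifications; they do not reflect the actual API database schema or
any Nova interface.

.. code-block:: python

  from collections import namedtuple

  # Hypothetical, simplified stand-in for a cell mapping record.
  CellMapping = namedtuple(
      "CellMapping", "name transport_url database_connection")

  # Placeholder records; URLs are invented for illustration.
  CELL_MAPPINGS = {
      "cell1": CellMapping(
          "cell1",
          "rabbit://cell1-mq.example.org/",
          "mysql+pymysql://cell1-db.example.org/nova_cell1"),
      "cell2": CellMapping(
          "cell2",
          "rabbit://cell2-mq.example.org/",
          "mysql+pymysql://cell2-db.example.org/nova_cell2"),
  }

  # Placeholder location information: which cell holds which instance.
  INSTANCE_LOCATIONS = {"instance-a": "cell1", "instance-b": "cell2"}


  def target_cell(instance_id):
      """Find the message queue and database to use for an instance."""
      return CELL_MAPPINGS[INSTANCE_LOCATIONS[instance_id]]


  def matches_service_config(mapping, transport_url, connection):
      """Check a mapping against a lower-level service's configuration."""
      return (mapping.transport_url == transport_url
              and mapping.database_connection == connection)

If ``matches_service_config`` returned ``False`` for every service in a
cell, an API-level service following the mapping would be talking to a
queue or database that none of that cell's services use.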
Service Layout
The services generally have a well-defined communication pattern that
dictates their layout in a deployment. In a small/simple scenario, the
rules do not have much of an impact as all the services can
communicate with each other on a single message bus and in a single
cell database. However, as the deployment grows, scaling and security
concerns may drive separation and isolation of the services.
This is a diagram of the basic services that a simple (single-cell)
deployment would have, as well as the relationships
(i.e. communication paths) between them:
.. graphviz::

  digraph services {
    graph [pad="0.35", ranksep="0.65", nodesep="0.55", concentrate=true];
    node [fontsize=10 fontname="Monospace"];
    edge [arrowhead="normal", arrowsize="0.8"];

    { rank=same
      api [label="nova-api"]
      apidb [label="API Database" shape="box"]
      scheduler [label="nova-scheduler"]
    }
    { rank=same
      mq [label="MQ" shape="diamond"]
      conductor [label="nova-conductor"]
    }
    { rank=same
      cell0db [label="Cell0 Database" shape="box"]
      celldb [label="Cell Database" shape="box"]
      compute [label="nova-compute"]
    }

    api -> mq -> compute
    conductor -> mq -> scheduler

    api -> apidb
    api -> cell0db
    api -> celldb

    conductor -> apidb
    conductor -> cell0db
    conductor -> celldb
  }

All of the services are configured to talk to each other over the same
message bus, and there is only one cell database where live instance
data resides. The cell0 database is present (and required) but as no
compute nodes are connected to it, this is still a "single cell"
deployment.
Multiple Cells
In order to shard the services into multiple cells, a number of things
must happen. First, the message bus must be split into pieces along
the same lines as the cell database. Second, a dedicated conductor
must be run for the API-level services, with access to the API
database and a dedicated message queue. We call this the *super conductor*
to distinguish its place and purpose from the per-cell conductor nodes.
.. graphviz::

  digraph services2 {
    graph [pad="0.35", ranksep="0.65", nodesep="0.55", concentrate=true];
    node [fontsize=10 fontname="Monospace"];
    edge [arrowhead="normal", arrowsize="0.8"];

    subgraph api {
      api [label="nova-api"]
      scheduler [label="nova-scheduler"]
      conductor [label="super conductor"]
      { rank=same
        apimq [label="API MQ" shape="diamond"]
        apidb [label="API Database" shape="box"]
      }

      api -> apimq -> conductor
      api -> apidb
      conductor -> apimq -> scheduler
      conductor -> apidb
    }

    subgraph clustercell0 {
      label="Cell 0"
      cell0db [label="Cell Database" shape="box"]
    }

    subgraph clustercell1 {
      label="Cell 1"
      mq1 [label="Cell MQ" shape="diamond"]
      cell1db [label="Cell Database" shape="box"]
      conductor1 [label="nova-conductor"]
      compute1 [label="nova-compute"]

      conductor1 -> mq1 -> compute1
      conductor1 -> cell1db
    }

    subgraph clustercell2 {
      label="Cell 2"
      mq2 [label="Cell MQ" shape="diamond"]
      cell2db [label="Cell Database" shape="box"]
      conductor2 [label="nova-conductor"]
      compute2 [label="nova-compute"]

      conductor2 -> mq2 -> compute2
      conductor2 -> cell2db
    }

    api -> mq1 -> conductor1
    api -> mq2 -> conductor2
    api -> cell0db
    api -> cell1db
    api -> cell2db

    conductor -> cell0db
    conductor -> cell1db
    conductor -> mq1
    conductor -> cell2db
    conductor -> mq2
  }

It is important to note that services in the lower cell boxes do not
have the ability to call back to the API-layer services via RPC, nor
do they have access to the API database for global visibility of
resources across the cloud. This is intentional and provides security
and failure domain isolation benefits, but also has impacts on some
things that would otherwise require this any-to-any communication
style. Check the release notes for the version of Nova you are using
for the most up-to-date information about any caveats that may be
present due to this limitation.
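
The isolation property described above can be checked against the
diagram with a small sketch. The ``EDGES`` table below transcribes the
arrows of the multi-cell diagram, reading each arrow as "may initiate
communication with"; that reading and the ``can_reach`` helper are
assumptions made for illustration, not Nova code.

.. code-block:: python

  # Arrows copied from the multi-cell diagram (source -> targets).
  EDGES = {
      "api": ["apimq", "apidb", "mq1", "mq2",
              "cell0db", "cell1db", "cell2db"],
      "apimq": ["conductor", "scheduler"],
      "conductor": ["apimq", "apidb", "cell0db",
                    "cell1db", "mq1", "cell2db", "mq2"],
      "conductor1": ["mq1", "cell1db"],
      "mq1": ["conductor1", "compute1"],
      "conductor2": ["mq2", "cell2db"],
      "mq2": ["conductor2", "compute2"],
  }


  def can_reach(source, target):
      """Depth-first search along the arrows of the diagram."""
      seen = set()
      stack = [source]
      while stack:
          node = stack.pop()
          if node == target:
              return True
          if node in seen:
              continue
          seen.add(node)
          stack.extend(EDGES.get(node, []))
      return False

Under this model ``can_reach("api", "compute1")`` is true, while
``can_reach("compute1", "apidb")`` and
``can_reach("conductor1", "apimq")`` are both false, which matches the
statement that cell-level services cannot call back up to the API layer.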
Caveats of a Multi-Cell deployment
.. note:: This information is correct as of the Pike release.
Cross-cell instance migrations
Currently it is not possible to migrate an instance from a host in one
cell to a host in another cell. This may be possible in the future,
but it is currently unsupported. This affects cold migration,
resize, live migration, evacuate, and unshelve operations.
Quota-related quirks
Quotas are now calculated live at the point at which an operation
would consume more resources, instead of being kept statically in the
database. This means that a multi-cell environment may incorrectly
calculate the usage of a tenant if one of the cells is unreachable, as
the resources in that cell cannot be counted. In this case, the tenant
may be able to consume more resources from one of the available cells,
putting them far over quota when the unreachable cell returns. In the future,
placement will provide us with a consistent way to calculate usage
independent of the actual cell being reachable.
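
The undercounting described above can be sketched as follows. This is a
simplified model and not Nova's quota code: ``CellUnreachable``,
``count_instances``, ``may_boot`` and the two per-cell callables are all
hypothetical names, and the numbers are invented.

.. code-block:: python

  class CellUnreachable(Exception):
      """Raised by a per-cell counter when its cell cannot be reached."""


  def count_instances(cells, project_id):
      """Sum live usage over all cells, skipping unreachable ones."""
      total = 0
      skipped = []
      for name, fetch in cells.items():
          try:
              total += fetch(project_id)
          except CellUnreachable:
              # Usage in this cell is silently left out of the total.
              skipped.append(name)
      return total, skipped


  def may_boot(cells, project_id, limit, requested=1):
      """Decide whether a request fits within the counted usage."""
      used, _skipped = count_instances(cells, project_id)
      return used + requested <= limit


  def cell1_usage(project_id):
      return 4  # the project has 4 instances in cell1


  def cell2_down(project_id):
      raise CellUnreachable("cell2")  # 4 more instances, but uncounted


  CELLS = {"cell1": cell1_usage, "cell2": cell2_down}

With a limit of 5, ``may_boot(CELLS, "project-x", limit=5)`` allows the
request because only the 4 instances in ``cell1`` are counted, even
though the project would already hold 8 instances once ``cell2``
returns.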
Performance of listing instances
With multiple cells, the instance list operation may not sort and
paginate results properly when crossing multiple cell
boundaries. Further, the performance of a sorted list operation will
be considerably slower than with a single cell.
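
One way to picture the extra work is a merge of per-cell result lists.
The sketch below assumes each cell has already returned its own results
sorted by the requested key; the ``list_instances`` helper and the
sample records are hypothetical and only illustrate the general
technique, not Nova's implementation.

.. code-block:: python

  import heapq
  import itertools


  def list_instances(per_cell_results, limit, sort_key):
      """Merge already-sorted per-cell lists and keep the first page."""
      merged = heapq.merge(*per_cell_results, key=sort_key)
      return list(itertools.islice(merged, limit))


  # Placeholder results, each list sorted by "created" within its cell.
  CELL1 = [{"name": "a", "created": 1}, {"name": "d", "created": 4}]
  CELL2 = [{"name": "b", "created": 2}, {"name": "c", "created": 3}]

  PAGE = list_instances([CELL1, CELL2], limit=3,
                        sort_key=lambda inst: inst["created"])

Because any cell could hold the first ``limit`` entries, each cell has
to be asked for up to ``limit`` rows before the merge can produce a
single page, which is one reason a sorted listing costs more with
several cells than with one.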
With a multi-cell environment with multiple message queues, it is
likely that operators will want to configure a separate connection to
a unified queue for notifications. This can be done in the
configuration file of all nodes. See the oslo.messaging configuration
documentation for more details.
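
As a hedged illustration, the sketch below assumes the separate
connection is set through the ``transport_url`` option in the
``[oslo_messaging_notifications]`` section, with the RPC
``[DEFAULT]/transport_url`` used when that option is absent. The
hostnames and the ``notification_transport`` helper are placeholders;
consult the oslo.messaging documentation for the authoritative option
list.

.. code-block:: python

  import configparser

  # Hypothetical configuration for a node in cell1; values are invented.
  SAMPLE_CONF = "\n".join([
      "[DEFAULT]",
      "transport_url = rabbit://nova:secret@cell1-mq.example.org:5672/",
      "",
      "[oslo_messaging_notifications]",
      "transport_url = rabbit://nova:secret@notify-mq.example.org:5672/",
  ])


  def notification_transport(conf_text):
      """Return the queue URL this sketch would use for notifications."""
      parser = configparser.ConfigParser(interpolation=None)
      parser.read_string(conf_text)
      section = "oslo_messaging_notifications"
      if parser.has_section(section):
          # configparser falls back to [DEFAULT] if the key is missing.
          return parser.get(section, "transport_url")
      return parser.get("DEFAULT", "transport_url")

Every node in every cell would carry the same notification URL, while
``[DEFAULT]/transport_url`` keeps pointing at that node's own cell
queue.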
Neutron Metadata API proxy
The Neutron metadata API proxy should be global across all cells, and
thus be configured as an API-level service with access to the
``[api_database]/connection`` information.