diff --git a/doc/source/index.rst b/doc/source/index.rst
index 841de444c9d6..4d17a822ab82 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -175,6 +175,7 @@ these are a great place to start reading up on the current plans.
    :maxdepth: 1
 
    user/cells
+   user/cellsv2_layout
    user/upgrade
    contributor/api
    contributor/microversions
diff --git a/doc/source/user/cells.rst b/doc/source/user/cells.rst
index fe1165529aeb..c3926637a816 100644
--- a/doc/source/user/cells.rst
+++ b/doc/source/user/cells.rst
@@ -11,6 +11,8 @@
       License for the specific language governing permissions and limitations
       under the License.
 
+.. _cells:
+
 =======
  Cells
 =======
diff --git a/doc/source/user/cellsv2_layout.rst b/doc/source/user/cellsv2_layout.rst
new file mode 100644
index 000000000000..137fa5733dc4
--- /dev/null
+++ b/doc/source/user/cellsv2_layout.rst
@@ -0,0 +1,293 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+===================
+ Cells Layout (v2)
+===================
+
+This document describes the layout of a deployment with Cells version
+2, including deployment considerations for security and scale. It is
+focused on code present in Pike and later, and while it is geared
+towards people who want to have multiple cells for whatever reason,
+the nature of the cellsv2 support in Nova means that it applies in
+some way to all deployments.
+
+.. note:: The concepts laid out in this document do not in any way
+          relate to CellsV1, which includes the ``nova-cells`` service
+          and the ``[cells]`` section of the configuration file. For
+          more information on the differences, see the main
+          :ref:`cells` page.
+
+Concepts
+========
+
+A basic Nova system consists of the following components:
+
+* The ``nova-api`` service which provides the external REST API to
+  users.
+* The ``nova-scheduler`` and ``placement`` services which are
+  responsible for tracking resources and deciding which compute node
+  each instance should run on.
+* An "API database" that is used primarily by ``nova-api`` and
+  ``nova-scheduler`` (called *API-level services* below) to track
+  location information about instances, as well as a temporary
+  location for instances being built but not yet scheduled.
+* The ``nova-conductor`` service which offloads long-running tasks for
+  the API-level services and insulates compute nodes from direct
+  database access.
+* The ``nova-compute`` service which manages the virt driver and
+  hypervisor host.
+* A "cell database" which is used by the API, conductor, and compute
+  services, and which houses the majority of the information about
+  instances.
+* A "cell0 database" which is just like the cell database, but
+  contains only instances that failed to be scheduled.
+* A message queue which allows the services to communicate with each
+  other via RPC.
+
+All deployments have at least the above components. Small deployments
+likely have a single message queue that all services share, and a
+single database server which hosts the API database, a single cell
+database, and the required cell0 database. This is considered a
+"single-cell deployment" because it only has one "real" cell. The
+cell0 database mimics a regular cell, but has no compute nodes and is
+used only as a place to put instances that fail to land on a real
+compute node (and thus a real cell).
+
+The purpose of the cells functionality in nova is to allow larger
+deployments to shard their many compute nodes into cells, each of
+which has its own database and message queue. The API database is
+always and only global, but there can be many cell databases (where
+the bulk of the instance information lives), each holding a portion of
+the instances for the entire deployment.
+
+Each of the nova services uses a configuration file, which will at a
+minimum specify a message queue endpoint
+(i.e. ``[DEFAULT]/transport_url``). Most of the services also require
+configuration of database connection information
+(i.e. ``[database]/connection``). API-level services that need access
+to the global routing and placement information will also be
+configured to reach the API database
+(i.e. ``[api_database]/connection``).
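+
+For illustration, here is a minimal sketch of what these options might
+look like in ``nova.conf``. All hostnames, credentials, and database
+names below are placeholders, not recommendations:
+
+.. code-block:: ini
+
+   [DEFAULT]
+   # The message queue this service uses for RPC
+   transport_url = rabbit://nova:secret@cell1-mq.example.com:5672/
+
+   [database]
+   # The cell database holding the bulk of the instance information
+   connection = mysql+pymysql://nova:secret@cell1-db.example.com/nova_cell1
+
+   [api_database]
+   # Only API-level services configure this; services that live
+   # inside a cell omit it
+   connection = mysql+pymysql://nova:secret@api-db.example.com/nova_api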
+
+.. note:: The pair of ``transport_url`` and ``[database]/connection``
+          configured for a service defines what cell that service
+          lives in.
+
+API-level services need to be able to contact other services in all of
+the cells. Since they only have one configured ``transport_url`` and
+``[database]/connection``, they look up the information for the other
+cells in the API database, in records called *cell mappings*.
+
+.. note:: The API database must have cell mapping records that match
+          the ``transport_url`` and ``[database]/connection``
+          configuration elements of the lower-level services. See the
+          ``nova-manage`` :ref:`man-page-cells-v2` commands for more
+          information about how to create and examine these records.
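+
+As a sketch (again with placeholder URLs, which must match what the
+services in each cell are actually configured with), the mappings for
+a simple deployment might be created and verified like this:
+
+.. code-block:: console
+
+   # Register the special cell0 database
+   $ nova-manage cell_v2 map_cell0 \
+       --database_connection mysql+pymysql://nova:secret@api-db.example.com/nova_cell0
+
+   # Register the one "real" cell
+   $ nova-manage cell_v2 create_cell --name cell1 \
+       --transport-url rabbit://nova:secret@cell1-mq.example.com:5672/ \
+       --database_connection mysql+pymysql://nova:secret@cell1-db.example.com/nova_cell1
+
+   # List the resulting cell mapping records
+   $ nova-manage cell_v2 list_cells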
+
+Service Layout
+==============
+
+The services generally have a well-defined communication pattern that
+dictates their layout in a deployment. In a small/simple scenario, the
+rules do not have much of an impact, as all the services can
+communicate with each other on a single message bus and in a single
+cell database. However, as the deployment grows, scaling and security
+concerns may drive separation and isolation of the services.
+
+Simple
+------
+
+This is a diagram of the basic services that a simple (single-cell)
+deployment would have, as well as the relationships
+(i.e. communication paths) between them:
+
+.. graphviz::
+
+  digraph services {
+    graph [pad="0.35", ranksep="0.65", nodesep="0.55", concentrate=true];
+    node [fontsize=10 fontname="Monospace"];
+    edge [arrowhead="normal", arrowsize="0.8"];
+    labelloc=bottom;
+    labeljust=left;
+
+    { rank=same
+      api [label="nova-api"]
+      apidb [label="API Database" shape="box"]
+      scheduler [label="nova-scheduler"]
+    }
+    { rank=same
+      mq [label="MQ" shape="diamond"]
+      conductor [label="nova-conductor"]
+    }
+    { rank=same
+      cell0db [label="Cell0 Database" shape="box"]
+      celldb [label="Cell Database" shape="box"]
+      compute [label="nova-compute"]
+    }
+
+    api -> mq -> compute
+    conductor -> mq -> scheduler
+
+    api -> apidb
+    api -> cell0db
+    api -> celldb
+
+    conductor -> apidb
+    conductor -> cell0db
+    conductor -> celldb
+  }
+
+All of the services are configured to talk to each other over the same
+message bus, and there is only one cell database where live instance
+data resides. The cell0 database is present (and required), but since
+no compute nodes are connected to it, this is still a "single cell"
+deployment.
+
+Multiple Cells
+--------------
+
+In order to shard the services into multiple cells, a number of things
+must happen. First, the message bus must be split into pieces along
+the same lines as the cell database. Second, a dedicated conductor
+must be run for the API-level services, with access to the API
+database and a dedicated message queue. We call this the *super
+conductor* to distinguish its place and purpose from the per-cell
+conductor nodes.
+
+.. graphviz::
+
+  digraph services2 {
+    graph [pad="0.35", ranksep="0.65", nodesep="0.55", concentrate=true];
+    node [fontsize=10 fontname="Monospace"];
+    edge [arrowhead="normal", arrowsize="0.8"];
+    labelloc=bottom;
+    labeljust=left;
+
+    subgraph api {
+      api [label="nova-api"]
+      scheduler [label="nova-scheduler"]
+      conductor [label="super conductor"]
+      { rank=same
+        apimq [label="API MQ" shape="diamond"]
+        apidb [label="API Database" shape="box"]
+      }
+
+      api -> apimq -> conductor
+      api -> apidb
+      conductor -> apimq -> scheduler
+      conductor -> apidb
+    }
+
+    subgraph clustercell0 {
+      label="Cell 0"
+      color=green
+      cell0db [label="Cell Database" shape="box"]
+    }
+
+    subgraph clustercell1 {
+      label="Cell 1"
+      color=blue
+      mq1 [label="Cell MQ" shape="diamond"]
+      cell1db [label="Cell Database" shape="box"]
+      conductor1 [label="nova-conductor"]
+      compute1 [label="nova-compute"]
+
+      conductor1 -> mq1 -> compute1
+      conductor1 -> cell1db
+    }
+
+    subgraph clustercell2 {
+      label="Cell 2"
+      color=red
+      mq2 [label="Cell MQ" shape="diamond"]
+      cell2db [label="Cell Database" shape="box"]
+      conductor2 [label="nova-conductor"]
+      compute2 [label="nova-compute"]
+
+      conductor2 -> mq2 -> compute2
+      conductor2 -> cell2db
+    }
+
+    api -> mq1 -> conductor1
+    api -> mq2 -> conductor2
+    api -> cell0db
+    api -> cell1db
+    api -> cell2db
+
+    conductor -> cell0db
+    conductor -> cell1db
+    conductor -> mq1
+    conductor -> cell2db
+    conductor -> mq2
+  }
+
+It is important to note that services in the lower cell boxes do not
+have the ability to call back to the API-layer services via RPC, nor
+do they have access to the API database for global visibility of
+resources across the cloud. This is intentional and provides security
+and failure domain isolation benefits, but it also has impacts on some
+things that would otherwise require this any-to-any communication
+style. Check the release notes for the version of Nova you are using
+for the most up-to-date information about any caveats that may be
+present due to this limitation.
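+
+As a concrete sketch of the split (all endpoints below are
+placeholders), the super conductor is configured with the API message
+queue and the API database, while a per-cell conductor sees only its
+own cell's message queue and database:
+
+.. code-block:: ini
+
+   # Super conductor (API level): reaches the cell databases through
+   # the cell mappings in the API database
+   [DEFAULT]
+   transport_url = rabbit://nova:secret@api-mq.example.com:5672/
+
+   [api_database]
+   connection = mysql+pymysql://nova:secret@api-db.example.com/nova_api
+
+.. code-block:: ini
+
+   # Per-cell conductor in cell1: no [api_database] section, so it has
+   # no global visibility; this transport_url/[database] pair is what
+   # places it in cell1
+   [DEFAULT]
+   transport_url = rabbit://nova:secret@cell1-mq.example.com:5672/
+
+   [database]
+   connection = mysql+pymysql://nova:secret@cell1-db.example.com/nova_cell1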
+
+Caveats of a Multi-Cell deployment
+----------------------------------
+
+.. note:: This information is correct as of the Pike release.
+
+Cross-cell instance migrations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently it is not possible to migrate an instance from a host in one
+cell to a host in another cell. This may be possible in the future,
+but it is currently unsupported. This impacts cold migration, resizes,
+live migrations, evacuate, and unshelve operations.
+
+Quota-related quirks
+~~~~~~~~~~~~~~~~~~~~
+
+Quotas are now calculated live at the point at which an operation
+would consume more resources, instead of being kept statically in the
+database. This means that a multi-cell environment may incorrectly
+calculate the usage of a tenant if one of the cells is unreachable, as
+those resources cannot be counted. In this case, the tenant may be
+able to consume more resources from one of the available cells,
+putting them far over quota when the unreachable cell returns. In the
+future, placement will provide a consistent way to calculate usage
+independent of whether the actual cell is reachable.
+
+Performance of listing instances
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With multiple cells, the instance list operation may not sort and
+paginate results properly when crossing multiple cell
+boundaries. Further, the performance of a sorted list operation will
+be considerably slower than with a single cell.
+
+Notifications
+~~~~~~~~~~~~~
+
+In a multi-cell environment with multiple message queues, it is likely
+that operators will want to configure a separate connection to a
+unified queue for notifications. This can be done in the configuration
+file of all nodes. See the `oslo.messaging configuration
+`_ documentation for more details.
+
+Neutron Metadata API proxy
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Neutron metadata API proxy should be global across all cells, and
+thus be configured as an API-level service with access to the
+``[api_database]/connection`` information.
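+
+Hypothetically, the configuration for such a global metadata service
+would therefore resemble that of the other API-level services, and
+could also carry the unified notification connection described
+above. All values here are placeholders:
+
+.. code-block:: ini
+
+   [DEFAULT]
+   # API-level message queue, not a cell-local one
+   transport_url = rabbit://nova:secret@api-mq.example.com:5672/
+
+   [api_database]
+   # Global visibility of instances across all cells
+   connection = mysql+pymysql://nova:secret@api-db.example.com/nova_api
+
+   [oslo_messaging_notifications]
+   # Optional: route notifications to a single unified queue
+   # (see the Notifications caveat above)
+   driver = messagingv2
+   transport_url = rabbit://nova:secret@notify-mq.example.com:5672/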