Cross-cell resize
Note
This document describes how to configure nova for cross-cell resize. For information on same-cell resize, refer to /admin/configuration/resize. For information on the cells v2 feature, refer to /admin/cells.
Historically, resizing and cold migrating a server has been explicitly restricted to the cell in which the server already exists. The cross-cell resize feature lets nova be configured to resize and cold migrate servers across cells.
The full design details are in the Ussuri spec and there is a video from a summit talk with a high-level overview.
Use case
There are many reasons to use multiple cells in a nova deployment beyond just scaling the database and message queue. Cells can also be used to shard a deployment by hardware generation and feature functionality. When sharding by hardware generation, it would be natural to set up a host aggregate for each cell and map flavors to the aggregate. Then, when it comes time to decommission old hardware, the deployer can provide new flavors and ask users to resize to them before some deadline. Under the covers, the resize migrates their servers to the new cell with newer hardware. Administrators can also cold migrate the servers to the new cell during a maintenance window.
Requirements
To enable cross-cell resize functionality, the following conditions must be met.
Minimum compute versions
All compute services must be upgraded to 21.0.0 (Ussuri) or later and must not be pinned to older RPC API versions in the upgrade_levels.compute configuration option.
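Both conditions above can be roughed out as a pre-flight check in Python. This is a minimal sketch under some assumptions. You have already collected the installed nova release of each compute host (for example, from your package manager) into a dict, and you have the text of nova.conf. The function names and input format are hypothetical. Treating only an unset or "auto" pin as safe is a deliberately conservative assumption, so any other value is simply reported for manual review.

```python
import configparser


def computes_below_ussuri(versions):
    """Hypothetical helper: hosts whose nova release is older than 21.0.0.

    ``versions`` is an assumed mapping of hostname -> version string,
    e.g. {"compute1": "21.0.0"}; only the major component is compared.
    """
    return sorted(
        host for host, version in versions.items()
        if int(version.split(".")[0]) < 21
    )


def compute_rpc_pin(nova_conf_text):
    """Hypothetical helper: return the [upgrade_levels] compute pin, if any.

    Conservative assumption: an unset value or "auto" is treated as safe and
    returns None; anything else is returned so an operator can confirm it is
    not an older release pin.
    """
    # strict=False tolerates repeated keys; interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    value = parser.get("upgrade_levels", "compute", fallback="").strip()
    return None if value in ("", "auto") else value


if __name__ == "__main__":
    hosts = {"compute1": "21.0.0", "compute2": "20.2.0"}
    conf = "[upgrade_levels]\ncompute = train\n"
    print(computes_below_ussuri(hosts))  # ['compute2']
    print(compute_rpc_pin(conf))         # train
```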
Policy configuration
The policy rule compute:servers:resize:cross_cell controls who can perform a cross-cell resize or cold migrate operation. By default, the policy disables the functionality for all users. No microversion is required to opt into the behavior; the user only needs to pass the policy check. As such, it is recommended to start by allowing only certain users to perform a cross-cell resize or cold migration. For example, set the rule to rule:admin_api, or to some other rule that covers test teams but not normal users, until you are comfortable supporting the feature.
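For instance, a policy override that restricts the operation to administrators could be written out with a few lines of Python. This sketch assumes a YAML policy file; check the oslo_policy policy_file option for the path your deployment actually reads. The helper name is hypothetical. It relies on the fact that a JSON-quoted string is also a valid YAML double-quoted scalar.

```python
import json


def render_policy_overrides(rules):
    """Hypothetical helper: render policy rule name -> check string as YAML.

    Each key and value is emitted as a JSON string, which YAML also accepts
    as a double-quoted scalar, so the output can be saved as a policy file.
    """
    lines = [f"{json.dumps(name)}: {json.dumps(check)}"
             for name, check in sorted(rules.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    overrides = {"compute:servers:resize:cross_cell": "rule:admin_api"}
    print(render_policy_overrides(overrides), end="")
    # "compute:servers:resize:cross_cell": "rule:admin_api"
```

To open the feature to a test team instead, swap rule:admin_api for a check string that matches that team's role.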
Compute driver
No special compute driver implementation is required to support the feature; it is built on existing driver interfaces used during resize and shelve/unshelve. However, only the libvirt compute driver has integration testing in the nova-multi-cell CI job.
Networking
The networking API must expose the Port Bindings Extended API extension, which was added in the 13.0.0 (Rocky) release of Neutron.
Notifications
The types of events and their payloads remain unchanged. The major difference from same-cell resize is that the publisher_id may be different in some cases, since some events are sent from the conductor service rather than a compute service. For example, with same-cell resize the instance.resize_revert.start notification is sent from the source compute host in the finish_revert_resize method, but with cross-cell resize that same notification is sent from the conductor service.
The message queue that carries the notifications will also differ between the source and target cells, assuming they use separate transports.
Instance actions
The overall instance actions, named resize, confirmResize and revertResize, are the same as for same-cell resize. However, the events that make up those actions will be different for cross-cell resize. The event names are generated from the compute service methods involved in the operation, and a cross-cell resize involves different methods. This is important to know when triaging a failed cross-cell resize operation.
Scheduling
The CrossCellWeigher is enabled by default. When a scheduling request allows selecting compute nodes from another cell, the weigher by default prefers hosts within the source cell over hosts from another cell. However, this behavior is configurable with the filter_scheduler.cross_cell_move_weight_multiplier configuration option, for example if you want to drain old cells when resizing or cold migrating.
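The effect of the multiplier's sign can be illustrated with a simplified model. Hosts in the instance's current (source) cell get a raw weight of 1, hosts in any other cell get 0, and the raw weight is scaled by the multiplier. This is only a sketch of the idea, not nova's CrossCellWeigher code, which among other things normalizes weights across all enabled weighers. The (hostname, cell) tuples and the function names are hypothetical.

```python
def cross_cell_weight(host_cell, source_cell, multiplier):
    """Simplified, hypothetical weight favouring or avoiding the source cell.

    A positive multiplier ranks source-cell hosts first; a negative one
    ranks hosts in other cells first, e.g. to drain an old cell.
    """
    raw = 1.0 if host_cell == source_cell else 0.0
    return raw * multiplier


def rank_hosts(hosts, source_cell, multiplier):
    """Sort (hostname, cell) pairs from most to least preferred."""
    return sorted(
        hosts,
        key=lambda host: cross_cell_weight(host[1], source_cell, multiplier),
        reverse=True,
    )


if __name__ == "__main__":
    hosts = [("new-1", "cell2"), ("old-1", "cell1")]
    print([name for name, _ in rank_hosts(hosts, "cell1", 1000000.0)])
    # ['old-1', 'new-1']
    print([name for name, _ in rank_hosts(hosts, "cell1", -1000000.0)])
    # ['new-1', 'old-1']
```

With a large positive multiplier, this term dominates the other weighers and keeps the move inside the source cell when possible. Flipping the sign pushes servers out of the source cell.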
Code flow
The end-user experience, i.e. the status transitions, is meant to be unchanged. A successfully cross-cell resized server goes to VERIFY_RESIZE status. From there, the user can either confirm or revert the resize using the normal confirmResize and revertResize server action APIs.
Under the covers there are some differences from a traditional same-cell resize:
- There is no inter-compute interaction. Everything is synchronously orchestrated from the (super)conductor service, and these synchronous calls use the long_rpc_timeout configuration option.
- The orchestration tasks in the (super)conductor service are in charge of three things: creating a copy of the instance and its related records in the target cell database at the beginning of the operation; deleting them in case of rollback or when the resize is confirmed/reverted; and updating the instance_mappings table record in the API database.
- Non-volume-backed servers have their root disk uploaded to the image service as a temporary snapshot image, just like during the shelveOffload operation. When the resize is finished on the destination host in the target cell, that snapshot image is used to spawn the guest and is then deleted.
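To make the database bookkeeping above concrete, here is a toy in-memory model of it. The record is copied into the target cell at the start. The instance_mappings entry is repointed once the guest is running in the target cell, and the source copy is left in place until the resize is confirmed or reverted. The dicts and function names are hypothetical stand-ins for the real cell and API databases; this is not nova's code.

```python
import copy


def start_cross_cell_resize(cells, mappings, instance_uuid, source, target):
    """Hypothetical: copy the instance record from source to target cell DB.

    ``cells`` maps cell name -> {instance_uuid: record} and ``mappings`` maps
    instance_uuid -> cell name, standing in for the instance_mappings table.
    """
    if mappings[instance_uuid] != source:
        raise ValueError("instance is not mapped to the source cell")
    # Deep copy so later changes in the target cell leave the source intact.
    cells[target][instance_uuid] = copy.deepcopy(cells[source][instance_uuid])


def finish_cross_cell_resize(cells, mappings, instance_uuid, target):
    """Hypothetical: once the guest runs in the target cell, repoint mapping."""
    cells[target][instance_uuid]["status"] = "VERIFY_RESIZE"
    # From here on the API reads the instance from the target cell; the
    # source cell copy stays until the resize is confirmed or reverted.
    mappings[instance_uuid] = target


if __name__ == "__main__":
    cells = {"cell1": {"uuid-1": {"status": "ACTIVE"}}, "cell2": {}}
    mappings = {"uuid-1": "cell1"}
    start_cross_cell_resize(cells, mappings, "uuid-1", "cell1", "cell2")
    finish_cross_cell_resize(cells, mappings, "uuid-1", "cell2")
    print(mappings["uuid-1"])  # cell2
    print(sorted(name for name, db in cells.items() if "uuid-1" in db))
    # ['cell1', 'cell2']
```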
Sequence diagram
The following diagrams are current as of the 21.0.0 (Ussuri) release.
Resize
This is the sequence of calls to get the server to VERIFY_RESIZE status.
seqdiag {
    API; Conductor; Scheduler; Source; Destination;
    edge_length = 300;
    span_height = 15;
    activation = none;
    default_note_color = white;
    API ->> Conductor [label = "cast", note = "resize_instance/migrate_server"];
    Conductor => Scheduler [label = "MigrationTask", note = "select_destinations"];
    Conductor -> Conductor [label = "TargetDBSetupTask"];
    Conductor => Destination [label = "PrepResizeAtDestTask", note = "prep_snapshot_based_resize_at_dest"];
    Conductor => Source [label = "PrepResizeAtSourceTask", note = "prep_snapshot_based_resize_at_source"];
    Conductor => Destination [label = "FinishResizeAtDestTask", note = "finish_snapshot_based_resize_at_dest"];
    Conductor -> Conductor [label = "FinishResizeAtDestTask", note = "update instance mapping"];
}
Confirm resize
This is the sequence of calls when confirming or deleting a server in VERIFY_RESIZE status.
seqdiag {
    API; Conductor; Source;
    edge_length = 300;
    span_height = 15;
    activation = none;
    default_note_color = white;
    API ->> Conductor [label = "cast (or call if deleting)", note = "confirm_snapshot_based_resize"];
    // separator to indicate everything after this is driven by ConfirmResizeTask
    === ConfirmResizeTask ===
    Conductor => Source [label = "call", note = "confirm_snapshot_based_resize_at_source"];
    Conductor -> Conductor [note = "hard delete source cell instance"];
    Conductor -> Conductor [note = "update target cell instance status"];
}
Revert resize
This is the sequence of calls when reverting a server in VERIFY_RESIZE status.
seqdiag {
    API; Conductor; Source; Destination;
    edge_length = 300;
    span_height = 15;
    activation = none;
    default_note_color = white;
    API ->> Conductor [label = "cast", note = "revert_snapshot_based_resize"];
    // separator to indicate everything after this is driven by RevertResizeTask
    === RevertResizeTask ===
    Conductor -> Conductor [note = "update records from target to source cell"];
    Conductor -> Conductor [note = "update instance mapping"];
    Conductor => Destination [label = "call", note = "revert_snapshot_based_resize_at_dest"];
    Conductor -> Conductor [note = "hard delete target cell instance"];
    Conductor => Source [label = "call", note = "finish_revert_snapshot_based_resize_at_source"];
}
Limitations
The following are known not to be supported yet:
- Instances with ports attached that have bandwidth-aware resource provider allocations (see /admin/ports-with-resource-requests). Nova falls back to same-cell resize if the server has such ports.
- Rescheduling to alternative hosts within the same target cell in case the primary selected host fails the prep_snapshot_based_resize_at_dest call.
These may not work since they have not been validated by integration testing:
- Instances with PCI devices attached.
- Instances with a NUMA topology.
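A pre-check along these lines could help decide up front how a cross-cell resize of a given server is likely to behave. This is a hypothetical helper, not nova logic. It assumes you have the server's Neutron ports as dicts and that your credentials can see each port's resource_request attribute, which is set on ports with bandwidth requirements. It also assumes you already know whether the server has PCI devices or a NUMA topology.

```python
def cross_cell_resize_concerns(ports, has_pci_devices, has_numa_topology):
    """Hypothetical helper returning (falls_back_to_same_cell, warnings).

    ``ports`` is a list of port dicts; a non-empty ``resource_request`` is
    treated as a bandwidth-aware port, for which nova falls back to
    same-cell resize.
    """
    falls_back = any(port.get("resource_request") for port in ports)
    warnings = []
    # These cases are untested rather than known to fail.
    if has_pci_devices:
        warnings.append("PCI devices: not validated by integration testing")
    if has_numa_topology:
        warnings.append("NUMA topology: not validated by integration testing")
    return falls_back, warnings


if __name__ == "__main__":
    ports = [{"id": "p1", "resource_request": None}]
    print(cross_cell_resize_concerns(ports, False, True))
    # (False, ['NUMA topology: not validated by integration testing'])
```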
Other limitations:
- The config drive associated with the server, if there is one, will be regenerated on the destination host in the target cell. Therefore, if the server was created with personality files, they will be lost. However, this is no worse than evacuating a server that has a config drive when the source and destination compute hosts are not on shared storage, or shelve offloading and unshelving a server with a config drive. If necessary, the resized server can be rebuilt to regain the personality files.
- The _poll_unconfirmed_resizes periodic task, which can be configured via the resize_confirm_window option to automatically confirm pending resizes on the target host, might not support cross-cell resizes. Doing so would require an up-call to the API to confirm the resize and clean up the source cell database.
Troubleshooting
Timeouts
Configure a service user in case the user token times out, e.g. during the snapshot and download of a large server image.
If RPC calls are timing out with a MessagingTimeout error in the logs, check the long_rpc_timeout option to see if it is high enough, though the default value (30 minutes) should be sufficient.
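A back-of-the-envelope estimate can show whether the timeout is plausibly the problem. Compare the estimated time to move the root disk snapshot with long_rpc_timeout. This sketch makes several assumptions. The upload and the download each happen within a single RPC call, so each leg is compared with the timeout on its own. Throughput is steady at a value you supply. Image conversion and other overheads are ignored. The 1800-second fallback corresponds to the 30-minute default mentioned above, and the function names are hypothetical.

```python
import configparser

DEFAULT_LONG_RPC_TIMEOUT = 1800  # seconds, i.e. the 30-minute default


def long_rpc_timeout(nova_conf_text):
    """Hypothetical helper: read [DEFAULT] long_rpc_timeout from nova.conf."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    return parser.getint("DEFAULT", "long_rpc_timeout",
                         fallback=DEFAULT_LONG_RPC_TIMEOUT)


def transfer_seconds(disk_gib, throughput_mib_s):
    """Estimate seconds to move one copy of the disk at a steady throughput."""
    return disk_gib * 1024 / throughput_mib_s


if __name__ == "__main__":
    timeout = long_rpc_timeout("[DEFAULT]\ndebug = true\n")
    estimate = transfer_seconds(disk_gib=500, throughput_mib_s=200)
    print(timeout, estimate, estimate > timeout)  # 1800 2560.0 True
```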
Recovering from failure
The orchestration tasks in conductor that drive the operation are built with rollbacks so each part of the operation can be rolled back in order if a subsequent task fails.
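The rollback behavior can be pictured as a simple pattern. Tasks run in order and the runner remembers which ones completed. If one fails, the completed tasks are rolled back in reverse order before the error is re-raised. This is a generic sketch with a hypothetical Task class, not nova's conductor task framework. It assumes a failing task cleans up after itself, and the task names merely echo the sequence diagrams.

```python
class Task:
    """Hypothetical orchestration step with execute and rollback hooks."""

    def __init__(self, name, log, fail=False):
        self.name = name
        self.log = log
        self.fail = fail

    def execute(self):
        if self.fail:
            raise RuntimeError(f"{self.name} failed")
        self.log.append(f"execute {self.name}")

    def rollback(self):
        self.log.append(f"rollback {self.name}")


def run_with_rollback(tasks):
    """Run tasks in order; on failure, undo completed ones in reverse order."""
    completed = []
    try:
        for task in tasks:
            task.execute()
            completed.append(task)
    except Exception:
        # Undo later steps first, since they may depend on earlier ones.
        for task in reversed(completed):
            task.rollback()
        raise


if __name__ == "__main__":
    log = []
    steps = [Task("TargetDBSetupTask", log),
             Task("PrepResizeAtDestTask", log),
             Task("PrepResizeAtSourceTask", log, fail=True)]
    try:
        run_with_rollback(steps)
    except RuntimeError as exc:
        print(exc)  # PrepResizeAtSourceTask failed
    print(log)
```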
The thing to keep in mind is that the instance_mappings record in the API DB is the authority on where the instance "lives". That is where the API goes to show the instance in a GET /servers/{server_id} call, or for any action performed on the server, including deleting it.
So if the resize fails and there is a copy of the instance and its related records in the target cell, the tasks should automatically delete them. If they do not, you can hard-delete the records from whichever cell is not the one in the instance_mappings table.
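Deciding which copy is stale follows directly from that rule. The sketch below takes the cell named in the instance's instance_mappings record and the cells that still hold a record for the instance. It returns the cells whose records are candidates for manual hard-deletion. It is only meaningful after a failed resize: while a server is in VERIFY_RESIZE, both copies exist legitimately. The inputs are hypothetical (gathering them means querying the API and cell databases yourself), and the helper deletes nothing.

```python
def stale_instance_cells(mapped_cell, cells_with_record):
    """Hypothetical helper: cells holding a copy the mapping does not point at.

    Only use this after a failed resize; during VERIFY_RESIZE both the
    source and target cell copies are expected to exist.
    """
    if mapped_cell not in cells_with_record:
        raise ValueError(f"mapped cell {mapped_cell!r} has no instance record")
    # The instance_mappings cell is authoritative; any other copy is leftover.
    return sorted(set(cells_with_record) - {mapped_cell})


if __name__ == "__main__":
    print(stale_instance_cells("cell1", {"cell1", "cell2"}))  # ['cell2']
```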
If the instance is in ERROR status, check the logs of both the source and destination compute services to see if anything needs to be manually recovered, for example volume attachments or port bindings, and also check the (super)conductor service logs. If the volume attachments and port bindings are OK (current and pointing at the correct host), try hard rebooting the server to get it back to ACTIVE status. If that fails, you may need to rebuild the server on the source host. Note that the guest's disks on the source host are not deleted until the resize is confirmed. If there is an issue before confirmation, or confirmation itself fails, the guest disks should still be available for rebuilding the instance if necessary.