nova/doc/source/admin/troubleshooting/affinity-policy-violated.rst

Affinity policy violated with parallel requests

Problem

When server create requests for an affinity or anti-affinity server group are issued in parallel, the servers can be placed in violation of the policy (for example, anti-affinity servers landing on the same host) and still go to the ACTIVE state.

Solution

There are two ways to avoid anti-/affinity policy violations among multiple server create requests.

Create multiple servers as a single request

Use the multi-create API with the min_count parameter set or the multi-create CLI with the --min option set to the desired number of servers.
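As a rough illustration of the API form, the sketch below builds a request body for POST /servers that asks for several servers in one call. The helper name build_multi_create_body and all identifier values are hypothetical placeholders; only the min_count, max_count and os:scheduler_hints fields come from the Compute API itself, and the "networks": "auto" value assumes microversion 2.37 or later.

```python
import json


def build_multi_create_body(name, image_ref, flavor_ref, group_id, count):
    """Build a POST /servers body that requests `count` servers at once.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "server": {
            "name": name,
            "imageRef": image_ref,
            "flavorRef": flavor_ref,
            # Assumption: microversion >= 2.37, where "auto" is accepted.
            "networks": "auto",
            # Setting both counts to the same value asks for exactly
            # `count` servers in a single scheduling request.
            "min_count": count,
            "max_count": count,
        },
        # The server group that carries the affinity/anti-affinity policy.
        "os:scheduler_hints": {"group": group_id},
    }


# Placeholder values; substitute real UUIDs/IDs from your deployment.
body = build_multi_create_body(
    "web", "IMAGE_UUID", "FLAVOR_ID", "SERVER_GROUP_UUID", 3
)
print(json.dumps(body, indent=2))
```

Because all three servers travel in one request, the scheduler receives them together rather than as three independent requests.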

This works because nova-scheduler sees the whole batch of requests at the same time as a group, so it can choose compute hosts that satisfy the anti-/affinity constraint and place the servers on the same host or on different hosts accordingly.
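The difference between a batch and racing requests can be pictured with a toy model. This is not nova-scheduler code: pick_host, schedule_batch and schedule_racing are hypothetical names, and the "first acceptable host wins" rule is a simplifying assumption standing in for real filtering and weighing.

```python
def pick_host(hosts, group_hosts, policy):
    """Return the first host that satisfies `policy`, or None.

    `group_hosts` is the set of hosts the group is known to occupy.
    """
    for host in hosts:
        if policy == "anti-affinity" and host in group_hosts:
            continue
        if policy == "affinity" and group_hosts and host not in group_hosts:
            continue
        return host
    return None


def schedule_batch(hosts, count, policy):
    """One multi-create request: each pick sees the picks made before it."""
    group_hosts = set()
    placements = []
    for _ in range(count):
        host = pick_host(hosts, group_hosts, policy)
        placements.append(host)
        if host is not None:
            group_hosts.add(host)
    return placements


def schedule_racing(hosts, count, policy):
    """Separate parallel requests: each one sees the same stale, empty group."""
    stale_group_hosts = set()
    return [pick_host(hosts, stale_group_hosts, policy) for _ in range(count)]


hosts = ["host-a", "host-b", "host-c"]
print(schedule_batch(hosts, 3, "anti-affinity"))   # ['host-a', 'host-b', 'host-c']
print(schedule_racing(hosts, 3, "anti-affinity"))  # ['host-a', 'host-a', 'host-a']
```

In the racing case every request is evaluated against the same outdated view of the group, which is how anti-affinity servers can end up together.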

Adjust Nova configuration settings

When requests are made separately and the scheduler cannot consider the batch of requests at the same time as a group, anti-/affinity races are handled by what is called the "late affinity check" in nova-compute. Once a server lands on a compute host, if the request involves a server group, nova-compute contacts the API database (via nova-conductor) to retrieve the server group and then checks whether the affinity policy has been violated. If the policy has been violated, nova-compute initiates a reschedule of the server create request. Note that this means the deployment must have the scheduler.max_attempts option set to a value greater than 1 (the default is 3) to handle races.
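The interaction between the late affinity check and the attempt limit can be sketched as a small simulation. It is a simplified model rather than nova's implementation: late_affinity_check and build_with_reschedule are hypothetical names, and the assumption that each reschedule simply moves to the next host in a fixed candidate list stands in for a real trip back through the scheduler.

```python
def late_affinity_check(host, other_member_hosts, policy):
    """Return True if a server on `host` is consistent with `policy`."""
    others = set(other_member_hosts)
    if policy == "anti-affinity":
        return host not in others
    if policy == "affinity":
        # Valid if the group is empty or everyone is already on this host.
        return others <= {host}
    raise ValueError("unknown policy: %s" % policy)


def build_with_reschedule(candidate_hosts, group_hosts, policy, max_attempts=3):
    """Try hosts in order, rescheduling when the late check fails.

    Returns (chosen_host_or_None, list_of_hosts_tried).
    """
    tried = []
    for host in candidate_hosts[:max_attempts]:
        tried.append(host)
        if late_affinity_check(host, group_hosts, policy):
            return host, tried
        # Check failed: in this model a reschedule moves to the next host.
    return None, tried


# A racing request first lands on host-a, which the group already occupies.
print(build_with_reschedule(["host-a", "host-b"], {"host-a"}, "anti-affinity"))
# With a single attempt there is no room to recover from the race.
print(build_with_reschedule(["host-a", "host-b"], {"host-a"}, "anti-affinity",
                            max_attempts=1))
```

The second call shows why an attempt limit of 1 cannot absorb a race: the first placement fails the check and no attempts remain for a reschedule.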

An ideal configuration for multiple cells will minimize upcalls from the cells to the API database. This is how devstack, for example, is configured in the CI gate. The cell conductors do not set api_database.connection and nova-compute sets workarounds.disable_group_policy_check_upcall to True.

However, if a deployment needs to handle racing affinity requests, it needs to configure cell conductors to have access to the API database, for example:

[api_database]
connection = mysql+pymysql://root:a@127.0.0.1/nova_api?charset=utf8

The deployment also needs to configure nova-compute services not to disable the group policy check upcall, either by leaving workarounds.disable_group_policy_check_upcall unset (the default is False) or by setting it to False, for example:

[workarounds]
disable_group_policy_check_upcall = False
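One way to sanity-check the two settings above is to parse the relevant configuration text and test each condition. The sketch uses Python's standard configparser purely for illustration; nova itself loads configuration through oslo.config, and the function names conductor_can_reach_api_db and compute_performs_late_check are hypothetical.

```python
import configparser


def _parse(conf_text):
    # interpolation=None keeps '%' characters in connection URLs literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser


def conductor_can_reach_api_db(conf_text):
    """True if the cell conductor config sets [api_database] connection."""
    parser = _parse(conf_text)
    return bool(parser.get("api_database", "connection", fallback="").strip())


def compute_performs_late_check(conf_text):
    """True unless the nova-compute config disables the group policy upcall."""
    parser = _parse(conf_text)
    disabled = parser.getboolean(
        "workarounds", "disable_group_policy_check_upcall", fallback=False
    )
    return not disabled


conductor_conf = """
[api_database]
connection = mysql+pymysql://root:a@127.0.0.1/nova_api?charset=utf8
"""

compute_conf = """
[workarounds]
disable_group_policy_check_upcall = False
"""

print(conductor_can_reach_api_db(conductor_conf))   # True
print(compute_performs_late_check(compute_conf))    # True
```

Both conditions must hold for the late affinity check to run: the first on the cell conductors, the second on every nova-compute service.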

With these settings, anti-/affinity policy should not be violated even when parallel server create requests are racing.

Future work is needed to add anti-/affinity support to the placement service in order to eliminate the need for the late affinity check in nova-compute.