Affinity policy violated with parallel requests
Problem
Parallel server create requests for affinity or anti-affinity land on the same host and servers go to the ACTIVE state even though the affinity or anti-affinity policy was violated.
Solution
There are two ways to avoid anti-/affinity policy violations among multiple server create requests.
Create multiple servers as a single request
Use the multi-create API with the min_count parameter set or the multi-create CLI with the --min option set to the desired number of servers.
This works because when the batch of requests is visible to nova-scheduler at the same time as a group, it will be able to choose compute hosts that satisfy the anti-/affinity constraint and will send them to the same hosts or different hosts accordingly.
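For illustration, a minimal sketch using the python-openstackclient CLI (the image, flavor, group, and server names are placeholders, and <group-uuid> is the UUID of an existing server group):

# Create a server group with the desired policy.
$ openstack server group create --policy anti-affinity my-group

# Request three servers in a single multi-create call; the scheduler sees
# the whole batch at once and places each server on a different host.
$ openstack server create --image cirros --flavor m1.tiny \
    --hint group=<group-uuid> --min 3 --max 3 test-server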
Adjust Nova configuration settings
When requests are made separately and the scheduler cannot consider the batch of requests at the same time as a group, anti-/affinity races are handled by what is called the "late affinity check" in nova-compute. Once a server lands on a compute host, if the request involves a server group, nova-compute contacts the API database (via nova-conductor) to retrieve the server group and then it checks whether the affinity policy has been violated. If the policy has been violated, nova-compute initiates a reschedule of the server create request. Note that this means the deployment must have scheduler.max_attempts set greater than 1 (default is 3) to handle races.
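The default value already satisfies this; setting it explicitly in nova.conf would look like:

[scheduler]
# Must be greater than 1 so a server create that violates the policy can be rescheduled.
max_attempts = 3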
An ideal configuration for multiple cells will minimize upcalls from the cells to the API database. This is how devstack, for example, is configured in the CI gate. The cell conductors do not set api_database.connection and nova-compute sets workarounds.disable_group_policy_check_upcall to True.
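In that upcall-free configuration, each nova-compute service carries, for example:

[workarounds]
# Skip the late affinity check, avoiding an upcall to the API database.
disable_group_policy_check_upcall = True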
However, if a deployment needs to handle racing affinity requests, it needs to configure cell conductors to have access to the API database, for example:
[api_database]
connection = mysql+pymysql://root:a@127.0.0.1/nova_api?charset=utf8
The deployment also needs to configure nova-compute services not to disable the group policy check upcall, either by not setting workarounds.disable_group_policy_check_upcall (i.e. using the default) or by setting it to False, for example:
[workarounds]
disable_group_policy_check_upcall = False
With these settings, anti-/affinity policy should not be violated even when parallel server create requests are racing.
Future work is needed to add anti-/affinity support to the placement service in order to eliminate the need for the late affinity check in nova-compute.