Add info about affinity requests to the troubleshooting doc
We had recent bug report about a possible regression related to affinity policy enforcement with parallel server create requests. It turned out not to be a regression but because of the complexity around affinity enforcement, it might help to add a section to the compute troubleshooting doc about it which we could refer to in the future. Related-Bug: #1863190 Change-Id: I508c48183a7205d46e13154d4e92d31dfa7f7d78
This commit is contained in:
parent
139d9b63b5
commit
df216de6d9
@ -16,6 +16,7 @@ you how to troubleshoot Compute.
|
||||
|
||||
troubleshooting/orphaned-allocations.rst
|
||||
troubleshooting/rebuild-placement-db.rst
|
||||
troubleshooting/affinity-policy-violated.rst
|
||||
|
||||
|
||||
Compute service logging
|
||||
|
@ -0,0 +1,78 @@
|
||||
Affinity policy violated with parallel requests
|
||||
===============================================
|
||||
|
||||
Problem
|
||||
-------
|
||||
|
||||
Parallel server create requests for affinity or anti-affinity land on the same
|
||||
host and servers go to the ``ACTIVE`` state even though the affinity or
|
||||
anti-affinity policy was violated.
|
||||
|
||||
Solution
|
||||
--------
|
||||
|
||||
There are two ways to avoid anti-/affinity policy violations among multiple
|
||||
server create requests.
|
||||
|
||||
Create multiple servers as a single request
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Use the `multi-create API`_ with the ``min_count`` parameter set or the
|
||||
`multi-create CLI`_ with the ``--min`` option set to the desired number of
|
||||
servers.
|
||||
|
||||
This works because when the batch of requests is visible to ``nova-scheduler``
|
||||
at the same time as a group, it will be able to choose compute hosts that
|
||||
satisfy the anti-/affinity constraint and will send them to the same hosts or
|
||||
different hosts accordingly.
|
||||
|
||||
.. _multi-create API: https://docs.openstack.org/api-ref/compute/#create-multiple-servers
|
||||
.. _multi-create CLI: https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#server-create
|
||||
|
||||
Adjust Nova configuration settings
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
When requests are made separately and the scheduler cannot consider the batch
|
||||
of requests at the same time as a group, anti-/affinity races are handled by
|
||||
what is called the "late affinity check" in ``nova-compute``. Once a server
|
||||
lands on a compute host, if the request involves a server group,
|
||||
``nova-compute`` contacts the API database (via ``nova-conductor``) to retrieve
|
||||
the server group and then it checks whether the affinity policy has been
|
||||
violated. If the policy has been violated, ``nova-compute`` initiates a
|
||||
reschedule of the server create request. Note that this means the deployment
|
||||
must have :oslo.config:option:`scheduler.max_attempts` set greater than ``1``
|
||||
(default is ``3``) to handle races.
|
||||
|
||||
An ideal configuration for multiple cells will minimize `upcalls`_ from the
|
||||
cells to the API database. This is how devstack, for example, is configured in
|
||||
the CI gate. The cell conductors do not set
|
||||
:oslo.config:option:`api_database.connection` and ``nova-compute`` sets
|
||||
:oslo.config:option:`workarounds.disable_group_policy_check_upcall` to
|
||||
``True``.
|
||||
|
||||
However, if a deployment needs to handle racing affinity requests, it needs to
|
||||
configure cell conductors to have access to the API database, for example:
|
||||
|
||||
.. code-block:: ini
|
||||
|
||||
[api_database]
|
||||
connection = mysql+pymysql://root:a@127.0.0.1/nova_api?charset=utf8
|
||||
|
||||
The deployment also needs to configure ``nova-compute`` services not to disable
|
||||
the group policy check upcall by either not setting (use the default)
|
||||
:oslo.config:option:`workarounds.disable_group_policy_check_upcall` or setting
|
||||
it to ``False``, for example:
|
||||
|
||||
.. code-block:: ini
|
||||
|
||||
[workarounds]
|
||||
disable_group_policy_check_upcall = False
|
||||
|
||||
With these settings, anti-/affinity policy should not be violated even when
|
||||
parallel server create requests are racing.
|
||||
|
||||
Future work is needed to add anti-/affinity support to the placement service in
|
||||
order to eliminate the need for the late affinity check in ``nova-compute``.
|
||||
|
||||
.. _upcalls: https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls
|
||||
|
Loading…
Reference in New Issue
Block a user