nova/nova/conductor
Matt Riedemann c895d3e6bc Sanity check instance mapping during scheduling
mnaser reported a weird case where an instance was found
in both cell0 (deleted there) and in cell1 (not deleted
there but in error state from a failed build). It's unclear
how this could happen, other than through a clustered RabbitMQ
issue where perhaps the schedule and build request to conductor
happens twice for the same instance: one request picks a host
and tries to build, while the other fails during scheduling
and is buried in cell0.

To avoid a split-brain situation like this, we add a sanity
check in _bury_in_cell0 to make sure the instance mapping is
not already pointing at a cell when we go to update it to
cell0. Similarly, a check is added in the
schedule_and_build_instances flow (the code is moved to a
private method to make it easier to test).
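
The check described above can be sketched roughly as follows. This
is a standalone illustration, not the code from manager.py: the
CellMapping and InstanceMapping classes here are simplified
stand-ins, and map_instance_to_cell is a hypothetical helper name
chosen for this example. The only idea it shows is "refuse to
repoint a mapping that already points at a cell".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellMapping:
    """Simplified stand-in for a cell mapping record (assumption)."""
    uuid: str
    name: str


@dataclass
class InstanceMapping:
    """Simplified stand-in for an instance mapping record (assumption)."""
    instance_uuid: str
    cell_mapping: Optional[CellMapping] = None


def map_instance_to_cell(inst_mapping: InstanceMapping,
                         target_cell: CellMapping) -> bool:
    """Point the mapping at target_cell only if it is not mapped yet.

    Returns True if the mapping was updated, or False if the instance
    was already mapped to a cell and the update was skipped.
    """
    if inst_mapping.cell_mapping is not None:
        # Another request already claimed this instance for a cell;
        # leave the existing mapping alone to avoid a split brain.
        return False
    inst_mapping.cell_mapping = target_cell
    return True


if __name__ == "__main__":
    cell0 = CellMapping(uuid="cell0-uuid", name="cell0")
    cell1 = CellMapping(uuid="cell1-uuid", name="cell1")
    mapping = InstanceMapping(instance_uuid="fake-instance")

    # The first request schedules the instance into cell1.
    print(map_instance_to_cell(mapping, cell1))
    # A duplicate request that failed scheduling tries to bury the
    # instance in cell0; the check makes this a no-op.
    print(map_instance_to_cell(mapping, cell0))
    print(mapping.cell_mapping.name)
```

In this sketch the caller decides what to do when False is returned
(for example, log the condition and skip the rest of the flow); how
the real change reacts in each code path is not stated in the commit
message.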

In the worst case this is unnecessary but harmless; in the
best case it helps avoid split-brain issues with clustered
RabbitMQ.

Closes-Bug: #1775934

Change-Id: I335113f0ec59516cb337d34b6fc9078ea202130f
(cherry picked from commit 5b552518e1)
(cherry picked from commit efc35b1c52)
2020-10-06 21:51:34 +00:00
tasks Merge "Update instance.availability_zone during live migration" 2019-03-13 23:25:11 +00:00
__init__.py Remove conductor local api:s and 'use_local' config option 2016-10-18 14:26:06 +02:00
api.py In Python3.7 async is a keyword [1] 2018-07-20 12:21:34 -04:00
manager.py Sanity check instance mapping during scheduling 2020-10-06 21:51:34 +00:00
rpcapi.py Handle legacy request spec dict in ComputeTaskManager._cold_migrate 2019-09-26 11:00:11 -04:00