Recover from REBOOT-* state on compute manager start-up

If a compute manager is stopped or fails during certain REBOOT-*
operations, the instance is left stuck in a transitional
task_state.

This change handles two possible task states, REBOOT_PENDING and
REBOOT_STARTED (for both soft and hard reboots), and either clears
these states or retries the reboot depending on the instance state.

Both task states are set after the request has reached the compute
manager, so we can handle them safely, knowing that the operation
ended or failed when the compute manager restarted.

We retry the reboot where the state is PENDING, or where the state is
STARTED and the instance is not running.

Where the instance is running and the state is STARTED we simply
transition the instance to an ACTIVE state. The user can retry the
reboot if required.
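
The recovery rules above can be sketched roughly as follows. This is
only an illustration, not the code in this change: the constants are
plain-string stand-ins for nova.compute.task_states and
nova.compute.power_state, the *_HARD names are assumed names for the
hard-reboot variants, and recovery_action() is a hypothetical helper.

```python
# Stand-in constants (assumptions for illustration only; the real values
# live in nova.compute.power_state and nova.compute.task_states).
RUNNING = "running"
SHUTDOWN = "shutdown"

REBOOT_PENDING = "reboot_pending"
REBOOT_STARTED = "reboot_started"
# Assumed names for the hard-reboot variants of the two states.
REBOOT_PENDING_HARD = "reboot_pending_hard"
REBOOT_STARTED_HARD = "reboot_started_hard"

PENDING_STATES = {REBOOT_PENDING, REBOOT_PENDING_HARD}
STARTED_STATES = {REBOOT_STARTED, REBOOT_STARTED_HARD}


def recovery_action(task_state, current_power_state):
    """Hypothetical helper: decide what to do with an instance at start-up.

    Returns 'retry' to re-issue the reboot, 'clear' to drop the stuck
    task_state and leave the instance ACTIVE, or 'ignore' when the
    task_state is not one of the reboot states handled here.
    """
    if task_state in PENDING_STATES:
        # The reboot never started, so it is retried.
        return "retry"
    if task_state in STARTED_STATES:
        if current_power_state != RUNNING:
            # The reboot started but the instance did not come back up.
            return "retry"
        # The instance is running; only the stale task_state is cleared.
        return "clear"
    return "ignore"
```

For example, recovery_action(REBOOT_STARTED, RUNNING) gives 'clear',
while the same task state with a non-running instance gives 'retry'.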

Related to blueprint recover-stuck-state

Change-Id: Ib318b3d444a67616441302f3daa8de20ee946f3f
David McNally
2013-11-22 16:18:53 +00:00
parent 5137045db2
commit cc0be157d0
12 changed files with 327 additions and 19 deletions

@@ -23,6 +23,8 @@ from oslo.config import cfg
from nova import block_device
from nova.compute import flavors
from nova.compute import power_state
from nova.compute import task_states
from nova import exception
from nova.network import model as network_model
from nova import notifications
@@ -448,6 +450,16 @@ def usage_volume_info(vol_usage):
    return usage_info


def get_reboot_type(task_state, current_power_state):
    """Checks if the current instance state requires a HARD reboot."""
    if current_power_state != power_state.RUNNING:
        return 'HARD'
    soft_types = [task_states.REBOOT_STARTED, task_states.REBOOT_PENDING,
                  task_states.REBOOTING]
    reboot_type = 'SOFT' if task_state in soft_types else 'HARD'
    return reboot_type


class EventReporter(object):
    """Context manager to report instance action events."""