Recover from REBOOT-* state on compute manager start-up

If a compute manager is stopped or fails during certain REBOOT-* operations, the instance is left stuck in a transitional task_state.

This change handles two such task states, REBOOT_PENDING and REBOOT_STARTED (for both soft and hard reboots), and either clears the state or retries the reboot depending on the instance state. Both task states are set after the request has reached the compute manager, so they can be handled safely knowing that the operation ended or failed when the compute manager restarted.

The reboot is retried where the state is PENDING, and where the state is STARTED and the instance is not running. Where the instance is running and the state is STARTED, the instance is simply transitioned to an ACTIVE state; the user can retry the reboot if required.

Related to blueprint recover-stuck-state

Change-Id: Ib318b3d444a67616441302f3daa8de20ee946f3f
2013-11-22 16:18:53 +00:00
parent 5137045db2
commit cc0be157d0
12 changed files with 327 additions and 19 deletions
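
The recovery rules in the commit message can be read as a small decision table. The following is a minimal standalone sketch of that table, not the code from this change: the string constants are stand-ins for the values in nova.compute.task_states and nova.compute.power_state, and plan_recovery is a hypothetical helper that exists only for illustration. Only get_reboot_type mirrors a function that the diff actually adds.

```python
# Stand-in constants. The real values live in nova.compute.task_states and
# nova.compute.power_state; the strings below are assumptions for this sketch.
REBOOT_PENDING = 'reboot_pending'
REBOOT_STARTED = 'reboot_started'
REBOOTING = 'rebooting'
REBOOT_PENDING_HARD = 'reboot_pending_hard'
REBOOT_STARTED_HARD = 'reboot_started_hard'
RUNNING = 'running'
SHUTDOWN = 'shutdown'


def get_reboot_type(task_state, current_power_state):
    """Mirror of the helper added to nova/compute/utils.py in this change."""
    # An instance that is not running cannot be soft rebooted.
    if current_power_state != RUNNING:
        return 'HARD'
    soft_types = [REBOOT_STARTED, REBOOT_PENDING, REBOOTING]
    return 'SOFT' if task_state in soft_types else 'HARD'


def plan_recovery(task_state, current_power_state):
    """Hypothetical helper: decide what to do with an instance at start-up.

    Returns ('retry', reboot_type), ('clear', None) or ('ignore', None).
    """
    pending = (REBOOT_PENDING, REBOOT_PENDING_HARD)
    started = (REBOOT_STARTED, REBOOT_STARTED_HARD)

    if task_state in pending:
        # The reboot never began, so it is always retried.
        return ('retry', get_reboot_type(task_state, current_power_state))
    if task_state in started:
        if current_power_state != RUNNING:
            # The reboot began but the instance did not come back up.
            return ('retry', get_reboot_type(task_state, current_power_state))
        # The instance is running: clear the task state, leaving it ACTIVE.
        return ('clear', None)
    # Any other task state is outside the scope of this recovery.
    return ('ignore', None)


if __name__ == '__main__':
    for state, power in [(REBOOT_PENDING, RUNNING),
                         (REBOOT_STARTED, SHUTDOWN),
                         (REBOOT_STARTED_HARD, RUNNING)]:
        print(state, power, plan_recovery(state, power))
```

Returning a plan rather than acting on the instance keeps the sketch free of any dependency on the compute manager or the virt driver; how the actual change wires these decisions into start-up is not shown in this part of the diff.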
--- a/nova/compute/utils.py
+++ b/nova/compute/utils.py
@@ -23,6 +23,8 @@ from oslo.config import cfg

 from nova import block_device
 from nova.compute import flavors
+from nova.compute import power_state
+from nova.compute import task_states
 from nova import exception
 from nova.network import model as network_model
 from nova import notifications
@@ -448,6 +450,16 @@ def usage_volume_info(vol_usage):
    return usage_info


+def get_reboot_type(task_state, current_power_state):
+    """Checks if the current instance state requires a HARD reboot."""
+    if current_power_state != power_state.RUNNING:
+        return 'HARD'
+    soft_types = [task_states.REBOOT_STARTED, task_states.REBOOT_PENDING,
+                  task_states.REBOOTING]
+    reboot_type = 'SOFT' if task_state in soft_types else 'HARD'
+    return reboot_type
+
+
 class EventReporter(object):
    """Context manager to report instance action events."""