From e8b6d26043bc66a8ae7b15125da1c284aacf629d Mon Sep 17 00:00:00 2001
From: Adam Spiers
Date: Wed, 25 May 2016 18:03:29 +0100
Subject: [PATCH] be more precise about failure events

"Compute host is down" is not a failure event; in fact it's not
necessarily even a failure state, because the compute host could have
been cleanly shut down. In contrast, it's clearly a failure event if
the compute host crashes or hangs.

Also make similar fixes for other failure events.

Change-Id: Ifae9aee9d5b7df37b322aa98807ce31d15312734
---
 user-stories/proposed/ha_vm.rst | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/user-stories/proposed/ha_vm.rst b/user-stories/proposed/ha_vm.rst
index 8d37b20..61beec6 100644
--- a/user-stories/proposed/ha_vm.rst
+++ b/user-stories/proposed/ha_vm.rst
@@ -42,11 +42,11 @@ can be detected and recovered by the system. Possible failure events include:
 
 * VM provisioning process (nova-compute service) is down.
 
-* Compute host is down.
+* Compute host crashes or hangs.
 
-* Hypervisor has failed (e.g. libvirtd process is dead or unresponsive).
+* Hypervisor fails, e.g. libvirtd process dies or becomes unresponsive.
 
-* Network is down
+* Network component fails.
 
   There are many ways a network component could fail, e.g. NIC
   configuration error, NIC driver failure, NIC hardware failure, cable
@@ -71,11 +71,11 @@ The goal of the user story is to reduce that interruption via automated recovery
 Usage Scenario Examples
 +++++++++++++++++++++++
 
-* VM is down
+* Recovery from VM failure
 
   Monitor the VM. Detect VM down failure and notify system to recover the VM.
 
-* VM provisioning process is down
+* Recovery from ``nova-compute`` failure
 
   Monitor the provisioning process (nova-compute service). Detect process failure
   and notify system to restart the service.
@@ -87,7 +87,7 @@ Usage Scenario Examples
   the hosts must be fenced to prevent two instances writing to the same shared
   storage that lead to data corruption.
 
-* Hypervisor host is down
+* Recovery from hypervisor host failure
 
   Monitor the hypervisor host. Detect hypervisor host failure and evacuate all
   VMs from failed host. Restart the VMs on new hosts that enable an