Add Instance HA support

This adds support for an Instance HA deployment option which evacuates
VMs after a compute node failure. To enable this feature just add
-e environments/compute-instanceha.yaml and make sure the compute nodes
have the OS::TripleO::Services::ComputeInstanceHA and the
OS::TripleO::Services::PacemakerRemote services added to it.

Testing has been done as follows:
1) Deploy an overcloud with Instance HA
2) Create a VM on the overcloud
3) Crash a compute node
4) Observe that the nova evacuate resource agent initiates the nova
   evacuation:
Nov 29 10:39:49 localhost NovaEvacuate(nova-evacuate)[32253]: NOTICE: Initiating evacuation of overcloud-novacompute-0.localdomain with fence_evacuate
Nov 29 10:39:57 localhost NovaEvacuate(nova-evacuate)[32253]: NOTICE: Completed evacuation of overcloud-novacompute-0.localdomain
5) Observe the VM having been started on the functional compute node

A documentation patch will follow explaining the whole mechanism more
in detail.

blueprint instance-ha

Depends-On: I4d1908242e9513a225d2b1da06ed4ee769ee10f7
Change-Id: If6c7d6c56eca96bd64ac5936036d119bd9ec6226
This commit is contained in:
Michele Baldessari 2017-09-21 22:16:20 +02:00
parent 1724c7d088
commit c56cdc8dda
5 changed files with 70 additions and 0 deletions

View File

@ -0,0 +1,9 @@
# An environment which enables instance HA
# Needs to be combined with environments/puppet-pacemaker.yaml
# The ComputeInstanceHA *and* PacemakerRemote services need to be added
# to your Compute role
resource_registry:
OS::TripleO::Services::ComputeInstanceHA: ../puppet/services/pacemaker/compute-instanceha.yaml
parameter_defaults:
EnableInstanceHA: true

View File

@ -308,6 +308,7 @@ resource_registry:
OS::TripleO::Services::SkydiveAgent: OS::Heat::None
OS::TripleO::Services::SkydiveAnalyzer: OS::Heat::None
OS::TripleO::Services::LoginDefs: OS::Heat::None
OS::TripleO::Services::ComputeInstanceHA: OS::Heat::None
# Logging
OS::TripleO::Services::Logging::BarbicanApi: docker/services/logging/files/barbican-api.yaml

View File

@ -397,6 +397,12 @@ outputs:
keystone::db::mysql::allowed_hosts:
- '%'
- "%{hiera('mysql_bind_host')}"
pacemaker:
keystone::endpoint::public_url: {get_param: [EndpointMap, KeystonePublic, uri_no_suffix]}
keystone::endpoint::internal_url: {get_param: [EndpointMap, KeystoneInternal, uri_no_suffix]}
keystone::endpoint::admin_url: {get_param: [EndpointMap, KeystoneAdmin, uri_no_suffix]}
keystone::endpoint::region: {get_param: KeystoneRegion}
keystone::admin_password: {get_param: AdminPassword}
horizon:
if:
- keystone_ldap_domain_enabled

View File

@ -0,0 +1,48 @@
heat_template_version: queens
description: >
OpenStack Compute InstanceHA services configured with Puppet
parameters:
ServiceData:
default: {}
description: Dictionary packing service data
type: json
ServiceNetMap:
default: {}
description: Mapping of service_name -> network name. Typically set
via parameter_defaults in the resource registry. This
mapping overrides those in ServiceNetMapDefaults.
type: json
DefaultPasswords:
default: {}
type: json
RoleName:
default: ''
description: Role name on which the service is applied
type: string
RoleParameters:
default: {}
description: Parameters specific to the role
type: json
EndpointMap:
default: {}
description: Mapping of service endpoint -> protocol. Typically set
via parameter_defaults in the resource registry.
type: json
EnableInstanceHA:
default: false
description: Whether to enable an Instance Ha configurarion or not.
This setup requires the Compute role to have the
PacemakerRemote service added to it.
type: boolean
outputs:
role_data:
description: Role data for Compute InstanceHA service.
value:
service_name: compute_instanceha
global_config_settings:
tripleo::instanceha: {get_param: EnableInstanceHA}
step_config: |
include ::tripleo::profile::pacemaker::compute_instanceha

View File

@ -0,0 +1,6 @@
---
features:
- |
Support for Instance HA is added. This configures the control plane to do
fence for a compute node that dies, then a nova --force-down and finally
and evacuation for the vms that were running on the failed node.