tripleo-heat-templates/extraconfig/tasks/major_upgrade_pacemaker.yaml
Michele Baldessari ad07a29f94 Fix races in major-upgrade-pacemaker Step2
tripleo-heat-templates/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh
has the following code:
"""
...
check_resource mongod started 600

if [[ -n $(is_bootstrap_node) ]]; then
...
    tstart=$(date +%s)
    while ! clustercheck; do
        sleep 5
        tnow=$(date +%s)
        if (( tnow-tstart > galera_sync_timeout )) ; then
            echo_error "ERROR galera sync timed out"
            exit 1
        fi
    done

    # Run all the db syncs
    cinder-manage db sync
...
fi

start_or_enable_service rabbitmq
check_resource rabbitmq started 600
start_or_enable_service redis
check_resource redis started 600
start_or_enable_service openstack-cinder-volume
check_resource openstack-cinder-volume started 600

systemctl_swift start

for service in $(services_to_migrate); do
    manage_systemd_service start "${service%%-clone}"
    check_resource_systemd "${service%%-clone}" started 600
done
"""

The problem with the above code is that it is open to the following race
condition:
1) The bootstrap node is busy checking the galera status via clustercheck.
2) A non-bootstrap node has already reached start_or_enable_service
   rabbitmq and the lines that follow. These are effectively skipped
   there, because start_or_enable_service is a noop on non-bootstrap
   nodes and check_resource rabbitmq only verifies that
   "pcs status | grep rabbitmq" returns true.
3) The non-bootstrap node can then reach manage_systemd_service start,
   which fails with errors like:
   "Job for openstack-nova-scheduler.service failed because the control
   process exited with error code. See \"systemctl status
   openstack-nova-scheduler.service\" and \"journalctl -xe\" for
   details.\n" (because the db tables have not been migrated yet)

This happens because step 3) starts on the non-bootstrap nodes before the
db-sync statements have completed on the bootstrap node. I did not feel
like changing the semantics of check_resource and removing the noop on
non-bootstrap nodes, as other parts of the tree might rely on this
behaviour.
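
The following is an illustrative sketch only, not the fix adopted by this
change: one way to enforce the ordering would be for the bootstrap node to
publish a flag once the db syncs are done and for the other nodes to wait
on it. It assumes a hypothetical pacemaker cluster property named
dbsync_done and reuses the helpers from the scripts quoted above
(is_bootstrap_node, services_to_migrate, manage_systemd_service,
check_resource_systemd, echo_error):
"""
dbsync_timeout=600

if [[ -n $(is_bootstrap_node) ]]; then
    # ... run all the db syncs (cinder-manage db sync, etc.) ...
    # then publish a hypothetical cluster-wide flag so the other
    # controllers know the migrations are complete
    crm_attribute --type crm_config --name dbsync_done --update true
else
    # Non-bootstrap nodes block here until the bootstrap node has set
    # the flag, instead of racing ahead to the systemd service starts.
    tstart=$(date +%s)
    until [[ "$(crm_attribute --type crm_config --name dbsync_done --query --quiet 2>/dev/null)" == "true" ]]; do
        sleep 5
        tnow=$(date +%s)
        if (( tnow - tstart > dbsync_timeout )); then
            echo_error "ERROR waiting for db syncs on the bootstrap node timed out"
            exit 1
        fi
    done
fi

# Only now is it safe, on every node, to start the migrated services.
for service in $(services_to_migrate); do
    manage_systemd_service start "${service%%-clone}"
    check_resource_systemd "${service%%-clone}" started 600
done
"""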

Depends-On: Ia016264b51f485b97fa150ebd357b109581342ed
Change-Id: I663313e183bb05b35d0c5af016c2d1705c772bd9
Closes-Bug: #1627965
2016-09-29 07:41:28 +02:00

heat_template_version: 2016-10-14
description: 'Upgrade for Pacemaker deployments'

parameters:
  servers:
    type: json
  input_values:
    type: json
    description: input values for the software deployments
  UpgradeLevelNovaCompute:
    type: string
    description: Nova Compute upgrade level
    default: ''
  MySqlMajorUpgrade:
    type: string
    description: Can be auto,yes,no and influences if the major upgrade should do or detect an automatic mysql upgrade
    constraints:
    - allowed_values: ['auto', 'yes', 'no']
    default: 'auto'
  IgnoreCephUpgradeWarnings:
    type: boolean
    default: false
    description: If enabled, Ceph upgrade will be forced even though cluster or PGs status is not clean

resources:
  # TODO(jistr): for Mitaka->Newton upgrades and further we can use
  # map_merge with input_values instead of feeding params into scripts
  # via str_replace on bash snippets

  CephMonUpgradeConfig:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config:
        list_join:
        - ''
        - - str_replace:
              template: |
                #!/bin/bash
                ignore_ceph_upgrade_warnings='IGNORE_CEPH_UPGRADE_WARNINGS'
              params:
                IGNORE_CEPH_UPGRADE_WARNINGS: {get_param: IgnoreCephUpgradeWarnings}
          - get_file: major_upgrade_ceph_mon.sh

  CephMonUpgradeDeployment:
    type: OS::Heat::SoftwareDeploymentGroup
    properties:
      servers: {get_param: [servers, Controller]}
      config: {get_resource: CephMonUpgradeConfig}
      input_values: {get_param: input_values}
      update_policy:
        batch_create:
          max_batch_size: 1
        rolling_update:
          max_batch_size: 1

  ControllerPacemakerUpgradeConfig_Step1:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config:
        list_join:
        - ''
        - - str_replace:
              template: |
                #!/bin/bash
                upgrade_level_nova_compute='UPGRADE_LEVEL_NOVA_COMPUTE'
              params:
                UPGRADE_LEVEL_NOVA_COMPUTE: {get_param: UpgradeLevelNovaCompute}
          - str_replace:
              template: |
                #!/bin/bash
                mariadb_do_major_upgrade='MYSQL_MAJOR_UPGRADE'
              params:
                MYSQL_MAJOR_UPGRADE: {get_param: MySqlMajorUpgrade}
          - get_file: pacemaker_common_functions.sh
          - get_file: major_upgrade_check.sh
          - get_file: major_upgrade_pacemaker_migrations.sh
          - get_file: major_upgrade_controller_pacemaker_1.sh

  ControllerPacemakerUpgradeDeployment_Step1:
    type: OS::Heat::SoftwareDeploymentGroup
    depends_on: CephMonUpgradeDeployment
    properties:
      servers: {get_param: [servers, Controller]}
      config: {get_resource: ControllerPacemakerUpgradeConfig_Step1}
      input_values: {get_param: input_values}

  BlockStorageUpgradeConfig:
    type: OS::Heat::SoftwareConfig
    depends_on: ControllerPacemakerUpgradeDeployment_Step1
    properties:
      group: script
      config: {get_file: major_upgrade_block_storage.sh}

  BlockStorageUpgradeDeployment:
    type: OS::Heat::SoftwareDeploymentGroup
    properties:
      servers: {get_param: [servers, BlockStorage]}
      config: {get_resource: BlockStorageUpgradeConfig}
      input_values: {get_param: input_values}

  ControllerPacemakerUpgradeConfig_Step2:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config:
        list_join:
        - ''
        - - get_file: pacemaker_common_functions.sh
          - get_file: major_upgrade_pacemaker_migrations.sh
          - get_file: major_upgrade_controller_pacemaker_2.sh

  ControllerPacemakerUpgradeDeployment_Step2:
    type: OS::Heat::SoftwareDeploymentGroup
    depends_on: BlockStorageUpgradeDeployment
    properties:
      servers: {get_param: [servers, Controller]}
      config: {get_resource: ControllerPacemakerUpgradeConfig_Step2}
      input_values: {get_param: input_values}

  ControllerPacemakerUpgradeConfig_Step3:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config:
        list_join:
        - ''
        - - get_file: pacemaker_common_functions.sh
          - get_file: major_upgrade_pacemaker_migrations.sh
          - get_file: major_upgrade_controller_pacemaker_3.sh

  ControllerPacemakerUpgradeDeployment_Step3:
    type: OS::Heat::SoftwareDeploymentGroup
    depends_on: ControllerPacemakerUpgradeDeployment_Step2
    properties:
      servers: {get_param: [servers, Controller]}
      config: {get_resource: ControllerPacemakerUpgradeConfig_Step3}
      input_values: {get_param: input_values}