RETIRED, Heat templates for deploying OpenStack
Michele Baldessari a7e9730a85 Fix pcs restart in composable HA
When a redeploy command is being run in a composable HA environment, if there
are any configuration changes, the <bundle>_restart containers will be kicked
off. These restart containers will then try and restart the bundles globally in
the cluster.

These restarts will be fired off in parallel from different nodes. So
haproxy-bundle will be restarted from controller-0, mysql-bundle from
database-0, rabbitmq-bundle from messaging-0.

This has proven to be problematic and very often (rhbz#1868113) it would fail
the redeploy with:
2020-08-11T13:40:25.996896822+00:00 stderr F Error: Could not complete shutdown of rabbitmq-bundle, 1 resources remaining
2020-08-11T13:40:25.996896822+00:00 stderr F Error performing operation: Timer expired
2020-08-11T13:40:25.996896822+00:00 stderr F Set 'rabbitmq-bundle' option: id=rabbitmq-bundle-meta_attributes-target-role set=rabbitmq-bundle-meta_attributes name=target-role value=stopped
2020-08-11T13:40:25.996896822+00:00 stderr F Waiting for 2 resources to stop:
2020-08-11T13:40:25.996896822+00:00 stderr F * galera-bundle
2020-08-11T13:40:25.996896822+00:00 stderr F * rabbitmq-bundle
2020-08-11T13:40:25.996896822+00:00 stderr F * galera-bundle
2020-08-11T13:40:25.996896822+00:00 stderr F Deleted 'rabbitmq-bundle' option: id=rabbitmq-bundle-meta_attributes-target-role name=target-role
2020-08-11T13:40:25.996896822+00:00 stderr F

or

2020-08-11T13:39:49.197487180+00:00 stderr F Waiting for 2 resources to start again:
2020-08-11T13:39:49.197487180+00:00 stderr F * galera-bundle
2020-08-11T13:39:49.197487180+00:00 stderr F * rabbitmq-bundle
2020-08-11T13:39:49.197487180+00:00 stderr F Could not complete restart of galera-bundle, 1 resources remaining
2020-08-11T13:39:49.197487180+00:00 stderr F * rabbitmq-bundle
2020-08-11T13:39:49.197487180+00:00 stderr F

After discussing it with kgaillot it seems that concurrent restarts in pcmk are just brittle:
"""
Sadly restarts are brittle, and they do in fact assume that nothing else is causing resources to start or stop. They work like this:

- Get the current configuration and state of the cluster, including a list of active resources (list #1)
- Set resource target-role to Stopped
- Get the current configuration and state of the cluster, including a list of which resources *should* be active (list #2)
- Compare lists #1 and #2, and the difference is the resources that should stop
- Periodically refresh the configuration and state until the list of active resources matches list #2
- Delete the target-role
- Periodically refresh the configuration and state until the list of active resources matches list #1
"""
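
The first log excerpt above, where a restart of rabbitmq-bundle also waits for galera-bundle, is what the list comparison in this description would produce when two restarts overlap. A minimal Python sketch of that comparison step, assuming a simplified model in which cluster state is just a set of active resource names (an illustration only, not Pacemaker's code; the function name `resources_to_wait_for` is hypothetical):

```python
def resources_to_wait_for(list1, list2):
    """Return resources active before the stop request (list #1) that are
    not expected to be active afterwards (list #2), sorted by name."""
    return sorted(set(list1) - set(list2))


# List #1: snapshot taken by the rabbitmq-bundle restart before it acts.
list1 = {"haproxy-bundle", "galera-bundle", "rabbitmq-bundle"}

# Isolated case: only rabbitmq-bundle has target-role=Stopped.
list2_isolated = list1 - {"rabbitmq-bundle"}

# Concurrent case: another node has also set galera-bundle to Stopped
# between the two snapshots, so it is missing from list #2 as well.
list2_concurrent = list1 - {"rabbitmq-bundle", "galera-bundle"}

print(resources_to_wait_for(list1, list2_isolated))
print(resources_to_wait_for(list1, list2_concurrent))
```

In the concurrent case the rabbitmq-bundle restart treats galera-bundle as one of its own resources to stop and later to start again, so its outcome depends on the timing of the other restart.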

So the suggestion is to replace the restarts with an enable/disable cycle of the resource.
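
As a rough illustration of such a cycle, the Python sketch below disables a resource and then enables it again with `pcs resource disable` and `pcs resource enable`, each with `--wait`. It is not the script changed by this commit: the helper names, the 300-second default, and the injectable `run` parameter are assumptions made for the example.

```python
import subprocess


def disable_enable_commands(resource, wait_seconds=300):
    """Build the two pcs invocations for one disable/enable cycle.

    wait_seconds is a hypothetical default, not a value from the commit.
    """
    wait = f"--wait={wait_seconds}"
    return [
        ["pcs", "resource", "disable", resource, wait],
        ["pcs", "resource", "enable", resource, wait],
    ]


def cycle_resource(resource, wait_seconds=300, run=subprocess.run):
    """Run disable, then enable; stop at the first failing command.

    `run` is injectable so the sequence can be checked without a cluster.
    """
    for argv in disable_enable_commands(resource, wait_seconds):
        run(argv, check=True)
```

On a cluster node this could be called as `cycle_resource("rabbitmq-bundle")`. Each step names only the one resource and waits for that step to finish.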

Tested this on a dozen runs on a composable HA environment and did not observe the error
any longer.

NB: This is not a clean cherry-pick of the related change, but a merge
    of master's I9cc27b1539a62a88fb0bccac64e6b1ae9295f22e and
    Ia850286682f09cd75651591a1158c2e467343c1d (Drop bootstrap_host_exec
    from pacemaker_restart_bundle)

Closes-Bug: #1892206

Change-Id: I9cc27b1539a62a88fb0bccac64e6b1ae9295f22e
2020-08-21 11:29:13 +02:00
ci Merge "[queens-only] Fluentd - Fix multiline format" into stable/queens 2020-07-10 23:34:18 +00:00
common Merge "[queens-only] Fluentd - Fix multiline format" into stable/queens 2020-07-10 23:34:18 +00:00
deployed-server Added OctaviaDeploymentConfig to deployed server role 2020-06-23 16:36:03 -02:30
docker Exclude /etc/hostname 2020-08-11 13:41:52 -04:00
docker_config_scripts Fix pcs restart in composable HA 2020-08-21 11:29:13 +02:00
environments Merge "[QUEENS-ONLY] Deploy clustered Redis with STF" into stable/queens 2020-08-12 16:55:16 +00:00
extraconfig Handle generating grub config for EFI partitions 2020-05-15 15:54:44 +12:00
firstboot Try a timesync as part of first boot 2019-05-23 08:18:40 -06:00
network Add j2 per-role MetricsQdrNetwork 2020-08-07 14:22:29 +02:00
plan-samples Update default value for derive params workflow inputs 2018-01-15 05:50:47 -05:00
puppet Merge "Add j2 per-role MetricsQdrNetwork" into stable/queens 2020-08-11 04:32:10 +00:00
releasenotes HA: minor update of arbitrary container image name 2020-03-30 14:11:33 +02:00
roles Add SSHD composable service to Networker role definition 2020-01-29 15:05:50 -07:00
sample-env-generator Merge "TLS everywhere: switch Octavia to use DNS entries" into stable/queens 2019-06-14 22:18:31 +00:00
scripts [Templates] Use str_replace for hosts. 2018-11-15 08:54:22 +00:00
tools NodeDataLookup utility should rely on python env 2020-02-20 22:20:20 +00:00
tripleo_heat_templates Do not generate apache/haproxy certs for invalid networks 2018-02-08 12:50:04 +00:00
validation-scripts Make comparisons case insensitive 2019-06-26 08:07:56 -06:00
zuul.d Add scenario 10 (octavia) job to queens 2019-10-21 08:05:36 +00:00
.gitignore Sample environment generator 2017-06-12 15:02:50 -05:00
.gitreview OpenDev Migration Patch 2019-04-19 19:35:08 +00:00
.testr.conf Improve nova statedir ownership logic 2018-07-25 16:49:52 +02:00
LICENSE Add license file 2014-01-20 11:58:20 +01:00
README.rst Merge "fix the scenario chart" into stable/queens 2018-06-01 20:40:28 +00:00
all-nodes-validation.yaml Optional ICMP validation of controllers and gateways 2019-03-18 17:06:45 +00:00
babel.cfg Add release configuration. 2013-10-22 17:49:35 +01:00
bindep.txt Add in roles data validation 2017-07-07 09:51:40 -06:00
bootstrap-config.yaml Change template names to queens 2017-11-23 10:15:32 +01:00
capabilities-map.yaml Add networking-ansible ML2 plugin support 2018-12-07 08:56:17 +00:00
config-download-software.yaml Support SshKnownHostsDeployment with config-download 2018-07-12 19:58:40 -04:00
config-download-structured.yaml Support SshKnownHostsDeployment with config-download 2018-07-12 19:58:40 -04:00
default_passwords.yaml Change template names to queens 2017-11-23 10:15:32 +01:00
hosts-config.yaml [Templates] Use str_replace for hosts. 2018-11-15 08:54:22 +00:00
j2_excludes.yaml Remove ipv6 specific network templates 2017-08-31 13:12:17 -07:00
net-config-bond.j2.yaml Add ability to specify dns search domains 2019-05-23 20:15:17 +00:00
net-config-bridge.j2.yaml Render NIC config templates with jinja2 2018-02-13 00:19:37 -08:00
net-config-linux-bridge.j2.yaml Render NIC config templates with jinja2 2018-02-13 00:19:37 -08:00
net-config-noop.j2.yaml Render NIC config templates with jinja2 2018-02-13 00:19:37 -08:00
net-config-static-bridge-with-external-dhcp.j2.yaml Render NIC config templates with jinja2 2018-02-13 00:19:37 -08:00
net-config-static-bridge.j2.yaml Add ability to specify dns search domains 2019-05-23 20:15:17 +00:00
net-config-static.j2.yaml Add ability to specify dns search domains 2019-05-23 20:15:17 +00:00
net-config-undercloud.j2.yaml Add ability to specify dns search domains 2019-05-23 20:15:17 +00:00
network_data.yaml Allow overlay tunnel endpoints on IPv6 address 2019-09-02 15:30:59 +02:00
network_data_ganesha.yaml Allow overlay tunnel endpoints on IPv6 address 2019-09-02 15:30:59 +02:00
overcloud-resource-registry-puppet.j2.yaml [Rocky and older] Actually install tmpwatch on overcloud nodes 2019-12-17 06:32:50 +00:00
overcloud.j2.yaml Add network vip mapping into service data 2020-01-27 09:14:22 +00:00
plan-environment.yaml Containers defaults for plan environment 2018-05-28 08:11:24 +00:00
requirements.txt Add validation for hiera interpolation in services 2020-01-10 10:20:39 +01:00
roles_data.yaml Merge "QDR for metrics collection purposes" into stable/queens 2019-07-31 21:46:04 +00:00
roles_data_undercloud.yaml [Queens-only] Install and configure tmpwatch for log cleanup 2019-03-27 07:42:11 +01:00
setup.cfg Drop deprecated templates/Makefile/merge.py 2015-11-25 15:00:13 -05:00
setup.py Updated from global requirements 2017-03-28 13:03:01 +00:00
test-requirements.txt Improve nova statedir ownership logic 2018-07-25 16:49:52 +02:00
tox.ini Improve nova statedir ownership logic 2018-07-25 16:49:52 +02:00

README.rst

Team and repository tags

tripleo-heat-templates

Heat templates to deploy OpenStack using OpenStack.

Features

These templates provide the ability to deploy a multi-node, role-based OpenStack environment using OpenStack Heat. Notable features include:

  • Choice of deployment/configuration tooling: puppet, (soon) docker
  • Role-based deployment: roles for the controller, compute, ceph, swift, and cinder storage
  • Physical network configuration: support for isolated networks, bonding, and standard ctlplane networking

Directories

A description of the directory layout in TripleO Heat Templates.

  • environments: contains heat environment files that can be used with -e on the command line to enable features, etc.
  • extraconfig: templates used to enable 'extra' functionality. Includes functionality for distro-specific registration and upgrades.
  • firstboot: example first_boot scripts that can be used when initially creating instances.
  • network: heat templates to help create isolated networks and ports
  • puppet: templates mostly driven by configuration with puppet. To use these templates you can use the overcloud-resource-registry-puppet.yaml.
  • validation-scripts: validation scripts useful to all deployment configurations
  • roles: example roles that can be used with the tripleoclient to generate a roles_data.yaml for a deployment. See the roles/README.rst for additional details.

Service testing matrix

The configuration for the CI scenarios will be defined in tripleo-heat-templates/ci/ and should be executed according to the following table:

Columns: scn000, scn001, scn002, scn003, scn004, scn006, scn007, scn009, non-ha, ovh-ha

service          entries in column order (empty cells omitted)
openshift        X
keystone         X X X X X X X X X
glance           rbd swift file rgw file file file file
cinder           rbd iscsi
heat             X X
ironic           X
mysql            X X X X X X X X X
neutron          ovs ovs ovs ovs ovs ovn ovs ovs
neutron-bgpvpn   wip
ovn              X
neutron-l2gw     wip
rabbitmq         X X X X X X X X
mongodb          (no entries)
redis            X X
haproxy          X X X X X X X X
memcached        X X X X X X X X
pacemaker        X X X X X X X X
nova             qemu qemu qemu qemu ironic qemu qemu qemu
ntp              X X X X X X X X X X
snmp             X X X X X X X X X X
timezone         X X X X X X X X X X
sahara           X
mistral          X
swift            X
aodh             X X
ceilometer       X X
gnocchi          rbd swift
panko            X X
barbican         X
zaqar            X
ec2api           X
cephrgw          X
tacker           X
congress         X
cephmds          X
manila           X
collectd         X
fluentd          X
sensu-client     X