a7e9730a85
When a redeploy command is run in a composable HA environment and there are
any configuration changes, the <bundle>_restart containers are kicked off.
These restart containers then try to restart the bundles globally in the
cluster, and the restarts are fired off in parallel from different nodes:
haproxy-bundle is restarted from controller-0, mysql-bundle from database-0,
rabbitmq-bundle from messaging-0.

This has proven to be problematic, and very often (rhbz#1868113) the redeploy
would fail with:

  2020-08-11T13:40:25.996896822+00:00 stderr F Error: Could not complete shutdown of rabbitmq-bundle, 1 resources remaining
  2020-08-11T13:40:25.996896822+00:00 stderr F Error performing operation: Timer expired
  2020-08-11T13:40:25.996896822+00:00 stderr F Set 'rabbitmq-bundle' option: id=rabbitmq-bundle-meta_attributes-target-role set=rabbitmq-bundle-meta_attributes name=target-role value=stopped
  2020-08-11T13:40:25.996896822+00:00 stderr F Waiting for 2 resources to stop:
  2020-08-11T13:40:25.996896822+00:00 stderr F  * galera-bundle
  2020-08-11T13:40:25.996896822+00:00 stderr F  * rabbitmq-bundle
  2020-08-11T13:40:25.996896822+00:00 stderr F  * galera-bundle
  2020-08-11T13:40:25.996896822+00:00 stderr F Deleted 'rabbitmq-bundle' option: id=rabbitmq-bundle-meta_attributes-target-role name=target-role

or:

  2020-08-11T13:39:49.197487180+00:00 stderr F Waiting for 2 resources to start again:
  2020-08-11T13:39:49.197487180+00:00 stderr F  * galera-bundle
  2020-08-11T13:39:49.197487180+00:00 stderr F  * rabbitmq-bundle
  2020-08-11T13:39:49.197487180+00:00 stderr F Could not complete restart of galera-bundle, 1 resources remaining
  2020-08-11T13:39:49.197487180+00:00 stderr F  * rabbitmq-bundle

After discussing it with kgaillot, it seems that concurrent restarts in pcmk
are just brittle:

  """
  Sadly restarts are brittle, and they do in fact assume that nothing else
  is causing resources to start or stop. They work like this:

  - Get the current configuration and state of the cluster, including a
    list of active resources (list #1)
  - Set resource target-role to Stopped
  - Get the current configuration and state of the cluster, including a
    list of which resources *should* be active (list #2)
  - Compare lists #1 and #2; the difference is the resources that should
    stop
  - Periodically refresh the configuration and state until the list of
    active resources matches list #2
  - Delete the target-role
  - Periodically refresh the configuration and state until the list of
    active resources matches list #1
  """

So the suggestion is to replace the restarts with an enable/disable cycle of
the resource. Tested this on a dozen runs on a composable HA environment and
did not observe the error any longer.

NB: This is not a clean cherry-pick of the related change, but a merge of
master's I9cc27b1539a62a88fb0bccac64e6b1ae9295f22e and
Ia850286682f09cd75651591a1158c2e467343c1d (Drop bootstrap_host_exec from
pacemaker_restart_bundle)

Closes-Bug: #1892206
Change-Id: I9cc27b1539a62a88fb0bccac64e6b1ae9295f22e
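The restart sequence quoted above can be modeled with a toy sketch to show why it breaks under concurrent changes. Everything below is an illustrative assumption (the ToyCluster class, instant convergence on each poll, bare name sets instead of pacemaker's full CIB state); it is not pacemaker's actual code:

```python
# Toy model of pacemaker's restart logic as described in the quote.
# Assumption: the cluster converges toward its configuration on each poll.

class ToyCluster:
    def __init__(self, resources):
        self.config = set(resources)   # resources configured to run
        self.running = set(resources)  # resources actually running

    def active_resources(self):
        return set(self.running)

    def expected_resources(self):
        return set(self.config)

    def poll_until(self, predicate, max_polls=10):
        for _ in range(max_polls):
            if predicate():
                return
            # on each poll the cluster converges toward its configuration
            self.running = set(self.config)
        raise TimeoutError("Timer expired")  # the error seen in the logs


def restart(cluster, resource):
    # List #1: resources active before the restart.
    list1 = cluster.active_resources()
    # Set target-role=Stopped for the resource.
    cluster.config.discard(resource)
    # List #2: resources that *should* be active now; list #1 minus
    # list #2 is what has to stop.
    list2 = cluster.expected_resources()
    cluster.poll_until(lambda: cluster.active_resources() == list2)
    # Delete the target-role override and wait until list #1 is back.
    cluster.config.add(resource)
    cluster.poll_until(lambda: cluster.active_resources() == list1)
```

If nothing else touches the cluster, the restart completes. But if another node disables a different bundle between the two snapshots, the final wait can never match list #1 and the restart times out, mirroring the "Timer expired" failures above.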
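The proposed enable/disable cycle could look roughly like this with the pcs CLI. The wrapper function name and the timeout default are hypothetical; `pcs resource disable` and `pcs resource enable` with `--wait[=n]` are real pcs subcommands:

```shell
# Sketch of the proposed fix: an explicit disable/enable cycle with
# --wait, instead of the brittle `pcs resource restart`.
# bundle_restart and the 600s default timeout are illustrative.
bundle_restart() {
    bundle="$1"
    timeout="${2:-600}"
    # Stop the bundle and block until the cluster reports it stopped.
    pcs resource disable "$bundle" --wait="$timeout" || return 1
    # Start it again and block until it is running.
    pcs resource enable "$bundle" --wait="$timeout"
}
```

Usage would be e.g. `bundle_restart rabbitmq-bundle` from the bundle's bootstrap node; unlike restart, each step only waits for the named resource to converge, so concurrent changes to other bundles do not wedge it.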
README.rst
tripleo-heat-templates
Heat templates to deploy OpenStack using OpenStack.
- Free software: Apache License (2.0)
- Documentation: https://docs.openstack.org/tripleo-docs/latest/
- Source: http://git.openstack.org/cgit/openstack/tripleo-heat-templates
- Bugs: https://bugs.launchpad.net/tripleo
Features
The ability to deploy a multi-node, role-based OpenStack deployment using OpenStack Heat. Notable features include:
- Choice of deployment/configuration tooling: puppet, (soon) docker
- Role-based deployment: roles for the controller, compute, ceph, swift, and cinder storage
- Physical network configuration: support for isolated networks, bonding, and standard ctlplane networking
Directories
A description of the directory layout in TripleO Heat Templates.
- environments: contains heat environment files that can be used with -e
  on the command line to enable features, etc.
- extraconfig: templates used to enable 'extra' functionality. Includes
functionality for distro specific registration and upgrades.
- firstboot: example first_boot scripts that can be used when initially
creating instances.
- network: heat templates to help create isolated networks and ports
- puppet: templates mostly driven by configuration with puppet. To use these
templates you can use the overcloud-resource-registry-puppet.yaml.
- validation-scripts: validation scripts useful to all deployment
configurations
- roles: example roles that can be used with the tripleoclient to generate
  a roles_data.yaml for a deployment. See roles/README.rst for additional details.
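For example, enabling one of the shipped environment files at deploy time might look like the following. The chosen file (environments/low-memory-usage.yaml) is just one example from that directory, and the wrapper function exists purely to make the command line concrete:

```shell
# Hedged example of passing an environment file from environments/ with
# -e; later -e files take precedence over earlier ones.
deploy_with_env() {
    openstack overcloud deploy --templates \
        -e environments/low-memory-usage.yaml \
        "$@"
}
```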
Service testing matrix
The configuration for the CI scenarios will be defined in tripleo-heat-templates/ci/ and should be executed according to the following table:
| | scn000 | scn001 | scn002 | scn003 | scn004 | scn006 | scn007 | scn009 | non-ha | ovh-ha |
|---|---|---|---|---|---|---|---|---|---|---|
| openshift | | | | | | | | | | |
| keystone | | | | | | | | | | |
| glance | | | | | | | | | | |
| swift | | | | | | | | | | |
| cinder | | iscsi | | | | | | | | |
| heat | | | | | | | | | | |
| ironic | | | | | | | | | | |
| mysql | | | | | | | | | | |
| neutron | | | | | | | | | | |
| neutron-bgpvpn | | | | | | | | | | |
| ovn | | | | | | | | | | |
| neutron-l2gw | | | | | | | | | | |
| rabbitmq | | | | | | | | | | |
| mongodb | | | | | | | | | | |
| redis | | | | | | | | | | |
| haproxy | | | | | | | | | | |
| memcached | | | | | | | | | | |
| pacemaker | | | | | | | | | | |
| nova | | | | | ironic | | | | | |
| ntp | | | | | | | | | | |
| snmp | | | | | | | | | | |
| timezone | | | | | | | | | | |
| sahara | | | | | | | | | | |
| mistral | | | | | | | | | | |
| swift | | | | | | | | | | |
| aodh | | | | | | | | | | |
| ceilometer | | | | | | | | | | |
| gnocchi | | | | | | | | | | |
| panko | | | | | | | | | | |
| barbican | | | | | | | | | | |
| zaqar | | | | | | | | | | |
| ec2api | | | | | | | | | | |
| cephrgw | | | | | | | | | | |
| tacker | | | | | | | | | | |
| congress | | | | | | | | | | |
| cephmds | | | | | | | | | | |
| manila | | | | | | | | | | |
| collectd | | | | | | | | | | |
| fluentd | | | | | | | | | | |
| sensu-client | | | | | | | | | | |