tripleo-heat-templates/deployment
Damien Ciabrini 3230f005c1 HA: reorder init_bundle and restart_bundle for improved updates
A pacemaker bundle can be restarted either because:
  . a tripleo config has been updated (from /var/lib/config-data)
  . the bundle config has been updated (container image, bundle
    parameter,...)

In HA services, a special container "*_restart_bundle" is in charge
of restarting the HA service on tripleo config change. A special
container "*_init_bundle" handles restart on bundle config change.

When both types of change occur at the same time, the bundle must
be restarted first, so that the container has a chance to be
recreated with all bind-mounts updated before it tries to reload
the updated config.

Implement the improvement with two changes:

1. Make the "*_restart_bundle" start after the "*_init_bundle", and
make sure "*_restart_bundle" is only enabled after the initial
deployment.

2. During minor update, make sure that the "*_restart_bundle" not
only restarts the container, but also waits until the service
is operational (e.g. galera fully promoted to Master). This forces
the rolling restart to happen sequentially, and avoids service
disruption in quorum-based clustered services like galera and
rabbitmq.

Tested the following update use cases:

* minor update: ensure that *_restart_bundle restarts all types of
  resources (OCF, bundles, A/P, A/P Master/Slave).

* minor update: ensure *_restart_bundle is not executed when no
  config or image update happened for a service.

* restart_bundle: when resource (OCF or container) fails to
  restart, bail out early instead of waiting for nothing until
  timeout is reached.

* restart_bundle: make sure a resource is restarted even when it
  is in a failed state when *_restart_bundle is called.

* restart_bundle: A/P can be restarted on any node, so watch
  restart globally. When the resource restarts as Slave, continue
  watching for a Master elsewhere in the cluster.

* restart_bundle: if an A/P is not running locally, make sure it
  doesn't get restarted anywhere else in the cluster.

* restart_bundle: do not try to restart stopped (disabled) or
  unmanaged resource. Bail out early instead, to not wait until
  timeout is reached.

* stack update: make sure that running a stack update with no
  change does not trigger any *_restart_bundle, and does not
  restart any HA container either.

* stack update: when bundle and config will change, ensure bundle
  is updated before HA containers are restarted (e.g. HAProxy
  migration to TLS everywhere)

Change-Id: Ic41d4597e9033f9d7847bb6c10c25f443fbd5b0e
Closes-Bug: #1839858
2020-01-23 16:09:36 +01:00
aide Convert heat template to use aide role 2019-07-03 12:00:17 +00:00
aodh Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
apache Revert "Wire-in Apache MPM module parameters and switch it" 2019-08-02 10:34:07 +00:00
auditd Move auditd, ca-cert, certmonger to deployment 2019-05-30 20:37:25 +00:00
backup-and-restore Use list join for rendering rear config file in heat 2019-11-24 23:32:40 +01:00
barbican Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
cavium Remove unnecessary slash volume maps 2019-12-04 20:32:14 +02:00
ceilometer Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
ceph-ansible Merge "Add swiftoperator role on ceph-rgw template" 2020-01-20 17:57:36 +00:00
certs Move auditd, ca-cert, certmonger to deployment 2019-05-30 20:37:25 +00:00
cinder HA: reorder init_bundle and restart_bundle for improved updates 2020-01-23 16:09:36 +01:00
clients Use ansible to install client packages 2020-01-16 08:16:01 -06:00
container-image-prepare Honor Debug for container image prepare 2019-11-01 08:51:41 -06:00
database HA: reorder init_bundle and restart_bundle for improved updates 2020-01-23 16:09:36 +01:00
deprecated Use ansible to install client packages 2020-01-16 08:16:01 -06:00
etcd Remove unnecessary slash volume maps 2019-12-04 20:32:14 +02:00
experimental Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
glance Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
gnocchi Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
haproxy HA: reorder init_bundle and restart_bundle for improved updates 2020-01-23 16:09:36 +01:00
heat Raise Heat API WSGI timeout to 600s 2020-01-21 13:49:20 -05:00
horizon horizon: put plugins toggles in quotes 2020-01-13 11:07:06 -05:00
image-serve Convert firewall rules to use TripleO-Ansible 2019-11-18 15:40:22 -06:00
ipa Restart certmnonger after registering system with IPA 2019-10-28 11:24:31 -04:00
ipsec Convert firewall rules to use TripleO-Ansible 2019-11-18 15:40:22 -06:00
ironic Assign service role for ironic user 2020-01-18 21:14:44 +09:00
iscsid Get rid of docker removing in post_upgrade tasks. 2019-11-12 16:33:38 +01:00
keepalived Remove unnecessary slash volume maps 2019-12-04 20:32:14 +02:00
kernel Modify import_role to include_role for boot params service 2020-01-10 11:58:38 +05:30
keystone keystone: fix trailing space 2020-01-08 23:12:55 +00:00
logging Merge "Fix containers-common.yaml path for RsyslogSidecar service" 2019-12-16 23:41:09 +00:00
login-defs Use login-defs role from tripleo-ansible in sc004 2019-08-10 13:25:16 +03:00
logrotate Adding hourly option to LogrotateRotationInterval parameter 2020-01-07 16:54:06 -05:00
manila HA: reorder init_bundle and restart_bundle for improved updates 2020-01-23 16:09:36 +01:00
masquerade-networks Move masq-nets, swift-external, and validations to deployment 2019-05-30 20:37:30 +00:00
memcached Convert firewall rules to use TripleO-Ansible 2019-11-18 15:40:22 -06:00
messaging Remove unnecessary slash volume maps 2019-12-04 20:32:14 +02:00
metrics Merge "Open ports for Metrics QDRs" 2020-01-16 02:30:52 +00:00
mistral Merge "Manage all Keystone resources with Ansible" 2020-01-09 04:40:33 +00:00
multipathd Remove unnecessary slash volume maps 2019-12-04 20:32:14 +02:00
neutron depends_on: add .service to avoid errors in logs 2020-01-13 22:49:25 -05:00
nova Merge "Improve documentation for 'NovaComputeCpuSharedSet' parameter" 2020-01-21 23:21:00 +00:00
octavia Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
openvswitch depends_on: add .service to avoid errors in logs 2020-01-13 22:49:25 -05:00
ovn HA: reorder init_bundle and restart_bundle for improved updates 2020-01-23 16:09:36 +01:00
pacemaker clustercheck: use fqdn instead of ip for bind address 2020-01-17 14:05:01 +01:00
placement Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
podman Add second fact to ensure type safty 2019-09-27 14:47:05 -05:00
qdr Remove unnecessary slash volume maps 2019-12-04 20:32:14 +02:00
rabbitmq HA: reorder init_bundle and restart_bundle for improved updates 2020-01-23 16:09:36 +01:00
rhsm Convert firewall rules to use TripleO-Ansible 2019-11-18 15:40:22 -06:00
sahara Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
securetty Configure securetty using tripleo-ansible 2019-09-04 01:22:40 +00:00
skydive Convert firewall rules to use TripleO-Ansible 2019-11-18 15:40:22 -06:00
snmp Merge "Check if snmpd is enabled for upgrade_tasks" 2019-11-20 19:17:50 +00:00
sshd Fix sshd firewall rule 2020-01-01 13:13:22 +05:30
swift Merge "Fix permission error if Barbican is enabled for Swift" 2020-01-09 21:28:03 +00:00
tests Add an experimental test container volume create service 2019-12-23 16:47:42 +00:00
time Configure time using tripleo-ansible 2019-11-20 14:03:40 +00:00
timesync Convert firewall rules to use TripleO-Ansible 2019-11-18 15:40:22 -06:00
tripleo-firewall Convert firewall rules to use TripleO-Ansible 2019-11-18 15:40:22 -06:00
tripleo-packages Bypass openvswitch update logic if expected packages are not present 2020-01-07 12:00:06 -03:30
tuned Convert heat template to use tuned role 2019-06-26 23:01:43 +00:00
undercloud Bypass openvswitch update logic if expected packages are not present 2020-01-07 12:00:06 -03:30
validations Use tripleo-validations-package role instead of puppet 2019-08-26 08:56:35 +00:00
veritas-hyperscale Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
vpp Move vpp, and veritas-hyperscale into deployment 2019-05-30 20:37:33 +00:00
zaqar Manage all Keystone resources with Ansible 2020-01-06 22:33:05 +00:00
README.rst deployment: document keystone_resources 2020-01-10 02:36:39 +00:00
containers-common.yaml HA: reorder init_bundle and restart_bundle for improved updates 2020-01-23 16:09:36 +01:00

README.rst

TripleO Deployments

This directory contains files that represent individual service deployments, orchestration tools, and the configuration tools used to deploy them.

Directory Structure

Each logical grouping of services has its own directory, for example 'timesync'. That directory holds the related timesync service templates, for example those that configure timesync on baremetal or via containers.

Filenaming conventions

By convention, each deployment service filename reflects both the deployment engine (baremetal or containers) and the config tool used to deploy that service.

The convention is <service-name>-<engine>-<config management tool>.

Examples:

deployment/aodh/aodh-api-container-puppet.yaml (containerized Aodh service configured with Puppet)

deployment/aodh/aodh-api-container-ansible.yaml (containerized Aodh service configured with Ansible)

deployment/timesync/chrony-baremetal-ansible.yaml (baremetal Chrony service configured with Ansible)

deployment/timesync/chrony-baremetal-puppet.yaml (baremetal Chrony service configured with Puppet)
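
The convention can be checked mechanically. The following is a minimal Python sketch, not part of TripleO: the function name parse_deployment_filename and the ENGINES and CONFIG_TOOLS sets are assumptions made for illustration, taken only from the examples above.

```python
from pathlib import PurePosixPath

# Engines and config tools seen in the examples above. These sets are an
# assumption for this sketch, not an exhaustive list from the repository.
ENGINES = {"baremetal", "container"}
CONFIG_TOOLS = {"puppet", "ansible"}


def parse_deployment_filename(path):
    """Split '<service-name>-<engine>-<config tool>.yaml' into its parts."""
    stem = PurePosixPath(path).stem  # e.g. 'aodh-api-container-puppet'
    # The service name may itself contain dashes, so split from the right.
    parts = stem.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError(f"{path!r} does not follow the naming convention")
    service, engine, tool = parts
    if engine not in ENGINES or tool not in CONFIG_TOOLS:
        raise ValueError(f"{path!r} does not follow the naming convention")
    return {"service": service, "engine": engine, "config_tool": tool}
```

Splitting from the right keeps multi-word service names such as aodh-api intact.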

Building Kolla Images

TripleO currently relies on Kolla (Dockerfile) containers. Kolla supports container customization, and TripleO uses this feature to inject puppet (our configuration tool of choice) into the Kolla base images. A variety of other customizations are made via the tripleo-common/container-images/tripleo_kolla_template_overrides.j2 file.

To build Kolla images for TripleO, adjust your kolla config [1] to build your centos base image with puppet using the example below:

    $ cat template-overrides.j2
    {% extends parent_template %}
    {% set base_centos_binary_packages_append = ['puppet'] %}
    {% set nova_scheduler_packages_append = ['openstack-tripleo-common'] %}

    kolla-build --base centos --template-override template-overrides.j2

Containerized Deployment Template Structure

Each deployment template may define a set of output values that control the underlying service deployment in a variety of ways. These output sections are specific to the TripleO deployment architecture. The following sections are available for containerized services.

  • config_settings: This section contains service-specific hiera data that can be used to generate config files for each service. This data is ultimately processed by the container-puppet.py tool, which generates config files for each service according to the settings here.

  • kolla_config: Contains YAML that represents how to map config files into the kolla container. This config file is typically mapped into the container itself at the /var/lib/kolla/config_files/config.json location and drives how kolla's external config mechanisms work.

  • docker_config: Data that is passed to the paunch tool to configure a container, or a set of containers, at each step. See the available steps documented below, which are implemented by TripleO's cluster deployment architecture. If you want the tasks executed only once, on the bootstrap node of each role in the cluster, use the /usr/bin/bootstrap_host_exec wrapper.

  • puppet_config: This section is a nested set of key value pairs that drive the creation of config files using puppet. Required parameters include:

    • puppet_tags: Puppet resource tag names that are used to generate config files with puppet. Only the named config resources are used to generate a config file. Any service that specifies tags will have the default tags of 'file,concat,file_line,augeas,cron' appended to the setting. Example: keystone_config
    • config_volume: The name of the volume (directory) where config files will be generated for this service. Use this as the location to bind mount into the running Kolla container for configuration.
    • config_image: The name of the container image that will be used for generating configuration files. This is often the same container that the runtime service uses. Some services share a common set of config files which are generated in a common base container.
    • step_config: This setting controls the manifest that is used to create container config files via puppet. The puppet_tags above are used along with this manifest to generate a config directory for this container.
  • container_puppet_tasks: This section provides data to drive the container-puppet.py tool directly. The task is executed for the defined steps before the corresponding docker_config step. Puppet always sees the step number overridden as step 6. This can be useful for initialization tasks. See container-puppet.py for formatting. Note that the tasks are executed only once, on the bootstrap node of each role in the cluster. Make sure the puppet manifest ensures the wanted "at most once" semantics. That may be achieved via the <service_name>_short_bootstrap_node_name hiera parameters, which are automatically evaluated for each service.

  • global_config_settings: The hiera keys in this section are distributed to all roles.

  • service_config_settings: Takes an extra key to wire in values that are defined for a service that need to be consumed by some other service. For example:

        service_config_settings:
          haproxy:
            foo: bar

    This will set the hiera key 'foo' on all roles where haproxy is included.
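
The effect of service_config_settings can be modeled in a few lines of Python. This is only a sketch of the semantics described above, not TripleO's implementation; the function name hiera_for_roles, the role names, and the role-to-service mapping are hypothetical.

```python
def hiera_for_roles(roles, service_config_settings):
    """Return the hiera keys each role receives.

    roles: mapping of role name -> list of services included in that role
           (hypothetical input shape for this sketch).
    service_config_settings: mapping of target service -> hiera keys.
    """
    result = {}
    for role, services in roles.items():
        keys = {}
        for target, settings in service_config_settings.items():
            # Keys are only set on roles where the target service is included.
            if target in services:
                keys.update(settings)
        result[role] = keys
    return result


roles = {"Controller": ["haproxy", "keystone"], "Compute": ["nova_compute"]}
settings = {"haproxy": {"foo": "bar"}}
per_role = hiera_for_roles(roles, settings)
```

With these inputs, only the Controller role receives the 'foo' key, because it is the only role that includes haproxy.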

Deployment steps

Similar to baremetal, containers are brought up in a stepwise manner. The current architecture supports bringing up baremetal services alongside containers. For each step, the baremetal puppet manifests are executed first, and then any containers are brought up.

Steps correlate to the following:

Pre) Containers config files generated per hiera settings.
1) Load Balancer configuration baremetal
  a) step 1 baremetal
  b) step 1 containers
2) Core Services (Database/Rabbit/NTP/etc.)
  a) step 2 baremetal
  b) step 2 containers
3) Early Openstack Service setup (Ringbuilder, etc.)
  a) step 3 baremetal
  b) step 3 containers
4) General OpenStack Services
  a) step 4 baremetal
  b) step 4 containers
  c) Keystone containers post initialization (tenant,service,endpoint creation)
5) Service activation (Pacemaker), online data migration
  a) step 5 baremetal
  b) step 5 containers
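
The ordering above (baremetal first, then containers, for each step) can be written as a short Python sketch. The function name deployment_order is hypothetical, and the sketch deliberately leaves out the Keystone post-initialization sub-step of step 4.

```python
def deployment_order(last_step=5):
    """List the deployment phases in the order described above."""
    order = ["pre: generate container config files"]
    for step in range(1, last_step + 1):
        # Within a step, baremetal puppet manifests run before containers.
        order.append(f"step {step} baremetal")
        order.append(f"step {step} containers")
    return order


for phase in deployment_order():
    print(phase)
```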

Update steps:

All services have an associated update_tasks output, an ansible snippet that is run during a minor update in a rolling fashion (one node at a time).

For Controller (where pacemaker is running) we have the following states:
  1. Step=1: Stop the cluster on the updated node.
  2. Step=2: Pull the latest image and retag it as pcmklatest.
  3. Step=3: yum upgrade happens on the host.
  4. Step=4: Restart the cluster on the node.
  5. Step=5: Verification: currently we test that the pacemaker services are running.

Then the usual deploy steps are run, which pull in the latest image for all containerized services and the updated configuration, if any.

Note: as pacemaker is not containerized, steps 1 and 4 happen in deployment/pacemaker/pacemaker-baremetal-puppet.yaml.

Fast-forward Upgrade Steps

Each service template may optionally define a fast_forward_upgrade_tasks key, which is a list of Ansible tasks to be performed during the fast-forward upgrade process. As with Upgrade steps, each task is associated with a particular step, provided as a variable and used along with a release variable by a basic conditional that determines when the task should run.

Steps are broken down into two categories: prep tasks executed across all hosts, and bootstrap tasks executed on a single host for a given role.

The individual steps then correspond to the following tasks during the upgrade:

Prep steps:

  • Step=0: Check running services
  • Step=1: Stop the service
  • Step=2: Stop the cluster
  • Step=3: Update repos

Bootstrap steps:

  • Step=4: DB backups
  • Step=5: Pre package update commands
  • Step=6: Package updates
  • Step=7: Post package update commands
  • Step=8: DB syncs
  • Step=9: Verification
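
The split between the two categories can be expressed as a small lookup. This Python sketch only restates the step ranges listed above; the function name ffu_phase is hypothetical.

```python
PREP_STEPS = range(0, 4)        # steps 0-3, executed across all hosts
BOOTSTRAP_STEPS = range(4, 10)  # steps 4-9, executed on a single host per role


def ffu_phase(step):
    """Return which category a fast-forward upgrade step belongs to."""
    step = int(step)
    if step in PREP_STEPS:
        return "prep"
    if step in BOOTSTRAP_STEPS:
        return "bootstrap"
    raise ValueError(f"unknown fast-forward upgrade step: {step}")
```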

Input Parameters

Each service may define its own input parameters and defaults. Operators will use the parameter_defaults section of any Heat environment to set per service parameters.

Apart from service-specific inputs, there are a few default parameters for all services. The default parameters are:

  • ServiceData: Mapping of service specific data. It is used to encapsulate all the service specific data. As of now, it contains net_cidr_map, which contains the CIDR map for all the networks. Additional data will be added as and when required.

  • ServiceNetMap: Mapping of service_name -> network name. Default mappings for service to network names are defined in ../network/service_net_map.j2.yaml, which may be overridden via ServiceNetMap values added to a user environment file via parameter_defaults.

  • EndpointMap: Mapping of service endpoint -> protocol. Contains a mapping of endpoint data generated for all services, based on the data included in ../network/endpoints/endpoint_data.yaml.

  • DefaultPasswords: Mapping of service -> default password. Used to pass some passwords from the parent templates. This is a legacy interface and should not be used by new services.

  • RoleName: Name of the role on which this service is deployed. A service can be deployed in multiple roles. This is an internal parameter (should not be set via environment file), which is fetched from the name attribute of the roles_data.yaml template.

  • RoleParameters: Parameters specific to the role on which the service is applied. Using the format "<RoleName>Parameters" in the parameter_defaults of a user environment file, parameters can be provided for a specific role. For example, to provide a parameter specific to the "Compute" role, use the format below:

    parameter_defaults:
      ComputeParameters:
        Param1: value
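
The "<RoleName>Parameters" naming can be illustrated with a short Python sketch. It only shows how the key name is derived from the role name; the function name role_parameters is hypothetical, and how TripleO merges these values with other parameters is not modeled here.

```python
def role_parameters(parameter_defaults, role_name):
    """Return the role-specific parameters for role_name, if any."""
    # The key is the role name followed by the literal suffix 'Parameters'.
    key = f"{role_name}Parameters"
    return dict(parameter_defaults.get(key, {}))


parameter_defaults = {"ComputeParameters": {"Param1": "value"}}
compute_params = role_parameters(parameter_defaults, "Compute")
controller_params = role_parameters(parameter_defaults, "Controller")
```

Here compute_params holds Param1, while controller_params is empty because no ControllerParameters key was provided.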

Update Steps

Each service template may optionally define an update_tasks key, which is a list of ansible tasks to be performed during the minor update process. These are executed in a rolling manner, node by node.

We allow a series of steps for the per-service update sequence via conditionals referencing a step variable, e.g. when: step|int == 2.
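
How the step conditional selects tasks can be modeled in Python. This is a simplified sketch, not the Ansible implementation: each task is represented as a dict with a hypothetical 'step' key standing in for its when: condition, and it assumes each node runs through all steps before the next node starts.

```python
def run_rolling_update(nodes, update_tasks, steps):
    """Return (node, step, task name) tuples in execution order."""
    executed = []
    for node in nodes:  # rolling: one node at a time
        for step in steps:
            for task in update_tasks:
                # Mirrors `when: step|int == N`; the step may arrive as a string.
                if int(step) == task["step"]:
                    executed.append((node, int(step), task["name"]))
    return executed


tasks = [
    {"name": "pull image", "step": 2},
    {"name": "stop cluster", "step": 1},
]
plan = run_rolling_update(["ctrl-0", "ctrl-1"], tasks, ["1", "2"])
```

The order in which tasks are declared does not matter here; the step value alone decides when each task runs on a node.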

Pre-upgrade Rolling Steps

Each service template may optionally define a pre_upgrade_rolling_tasks key, which is a list of ansible tasks to be performed before the main upgrade phase. These tasks are executed in a node-by-node rolling manner on the overcloud, similarly to update_tasks.

Upgrade Steps

Each service template may optionally define an upgrade_tasks key, which is a list of ansible tasks to be performed during the upgrade process.

Similar to the update_tasks, we allow a series of steps for the per-service upgrade sequence, defined as ansible tasks with a "when: step|int == 1" for the first step, "== 2" for the second, etc.

Steps correlate to the following:

  1. Perform any pre-upgrade validations.
  2. Stop the control-plane services, e.g. disable LoadBalancer, stop pacemaker cluster and stop any managed resources. The exact order is controlled by the cluster constraints.
  3. Perform a package update and install new packages: a general upgrade is done, and only new packages should go into the service ansible tasks.
  4. Start services needed for migration tasks (e.g. DB).
  5. Perform any migration tasks, e.g. DB sync commands.

Note that the services are not started in the upgrade tasks - we instead re-run puppet which does any reconfiguration required for the new version, then starts the services.

When running an OS upgrade via the tags system_upgrade_prepare and system_upgrade_run, or the combined tag system_upgrade, the steps correlate to the following:

  1. Any pre-service-stop actions. (system_upgrade_prepare)
  2. Stop all services. (system_upgrade_prepare)
  3. Post-service-stop actions like removing packages before the upgrade. (system_upgrade_prepare)
  4. Step reserved for the tripleo-packages service. Only package download for upgrade (under system_upgrade_prepare tag), and reboot for performing the offline upgrade (under system_upgrade_run tag) happen here.
  5. Any post-upgrade tasks. (system_upgrade_run)

Nova Server Metadata Settings

One can use the hook of type OS::TripleO::ServiceServerMetadataHook to pass entries to the nova instances' metadata. It is, however, disabled by default. To override it, define it in the resource registry. An implementation of this hook needs to conform to the following:

  • It needs to define an input called RoleData of json type. This gets as input the contents of the role_data for each role's ServiceChain.
  • It needs to define an output called metadata, which will be given to the Nova Server resource as the instance's metadata.

Keystone resources management

Keystone resources, such as users, roles, domains, endpoints, services, and role assignments, are now managed by the tripleo-keystone-resources Ansible role.


  1. See the override file which can be used to build Kolla packages that work with TripleO.