tripleo-heat-templates/deployment/certs
Damien Ciabrini 8968c7efd6 Rolling certificate update for HA services
(manually squashed the subsequent fix [1] into a single commit)
(also manually squashed [2] because of #1906505)

Certain HA clustered services (e.g. galera) cannot natively
reload their TLS certificate without being restarted. If too
many replicas are restarted concurrently, this might result in
full service disruption.

To preserve service availability, provide a means to ensure that
only one service replica is restarted at a time in the cluster.
This works by using pacemaker's CIB to implement a cluster-wide
restart lock for a service. The lock has a TTL so it's guaranteed
to be eventually released without requiring complex contingency
cleanup in case of failures.
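
The restart lock described above can be illustrated with a minimal
sketch. This is not the code from this change: the in-memory
TTLLockStore class is a hypothetical stand-in for pacemaker's CIB,
and every name in it is an assumption made for illustration. It only
shows why a TTL removes the need for failure cleanup: an expired lock
can simply be taken over by the next node.

```python
import time


class TTLLockStore:
    """Hypothetical in-memory stand-in for a cluster-wide store (e.g. the CIB)."""

    def __init__(self, clock=time.monotonic):
        self._locks = {}  # service name -> (owner, expiry timestamp)
        self._clock = clock

    def acquire(self, service, owner, ttl):
        """Take the lock unless another owner holds an unexpired one."""
        now = self._clock()
        holder = self._locks.get(service)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False
        # Either free, expired, or already ours: (re)take it with a fresh TTL.
        self._locks[service] = (owner, now + ttl)
        return True

    def release(self, service, owner):
        """Drop the lock, but only if this owner still holds it."""
        holder = self._locks.get(service)
        if holder is not None and holder[0] == owner:
            del self._locks[service]
            return True
        return False


def restart_with_lock(store, service, node, restart, ttl=60):
    """Restart one replica only while holding the service-wide lock."""
    if not store.acquire(service, node, ttl):
        return False  # another replica is restarting; caller retries later
    try:
        restart(node)
    finally:
        # If the node dies before this runs, the TTL frees the lock anyway.
        store.release(service, node)
    return True
```

A real implementation additionally needs the acquire step to be atomic
across the cluster, which this single-process sketch does not attempt
to model.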

Tested locally by running the following:
1. force-recreate the certificate on all nodes at once for galera
   (ipa-cert resubmit -i mysql), and verify that the resources
   restart one after the other

2. create a lock manually in pacemaker, recreate the certificate for
   galera on all nodes, and verify that no resource is restarted
   before the manually created lock expires.

3. create a lock manually, let it expire, recreate a certificate,
   and verify that the resource is restarted appropriately and the
   lock gets cleaned up from pacemaker once the restart has finished.

[1] Id10f026c8b31cad7b7313ac9427a99b3e6744788
[2] I17f1364932e43b8487515084e41b525e186888db

Related-Bug: #1904193
Closes-Bug: #1885113
Change-Id: Ib2b62e33b34cf72edfdae6299cf432259bf960a2
(cherry picked from commit 0f54889408)
(cherry picked from commit c8f5fdfc36)
(cherry picked from commit 8b16911cc2)
2021-01-07 14:30:21 +01:00
..
ca-certs-baremetal-puppet.yaml Use absolute name to include puppet classes 2020-04-11 08:13:23 +09:00
certmonger-user-baremetal-puppet.yaml Rolling certificate update for HA services 2021-01-07 14:30:21 +01:00