tripleo-heat-templates/deployment/pacemaker
Michele Baldessari dd45ce71a9 Reauthenticate remotes after an upgrade to Train
When upgrading from Queens to Train and changing the major operating
system version as well, puppet-tripleo currently takes great care to let
any remotes still running on CentOS 7 keep working while the control
plane has moved to CentOS 8 and hence to new pcmk/pcs versions.

Since pcs has changed how it authenticates with remotes, we need to
make sure that after an upgrade we reauthenticate any remotes with pcs.
Otherwise any pcs operation involving a remote (e.g. a scale-up) will
fail with an error like the following:

2021-10-14 18:57:16,332 p=133710 u=mistral n=ansible | fatal: [clopkhd1]: FAILED! => {"ansible_job_id": "823537932379.262448", "attempts": 126, "changed": true, "cmd": "set -o pipefail; puppet apply  --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules --detailed-exitcodes --summarize --color=false   /var/lib/tripleo-config/puppet_step_config.pp 2>&1 | logger -s -t puppet-user", "delta": "0:06:42.041400", "end": "2021-10-14 18:57:13.293758", "failed_when_result": true, "finished": 1, "msg": "non-zero return code", "
...
...
Error: pcs create failed: Error: Hosts 'cmp1', 'cmp10', 'cmp11', 'cmp12', 'cmp14', 'cmp15', 'cmp16', 'cmp17', 'cmp18', 'cmp19', 'cmp2', 'cmp3', 'cmp4', 'cmp5', 'cmp6', 'cmp7', 'cmp8', 'cmp9' are not known to pcs, try to authenticate the hosts using 'pcs host auth cmp1 cmp10 cmp11 cmp12 cmp14 cmp15 cmp16 cmp17 cmp18 cmp19 cmp2 cmp3 cmp4 cmp5 cmp6 cmp7 cmp8 cmp9' command, use --skip-offline to override\n<13>Oct 14 18:56:54 puppet-user: Error: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Pacemaker::Resource::Remote[cmp13]/Pcmk_remote[cmp13]/ensure: change from 'absent' to 'present' failed: pcs create failed: Error: Hosts 'cmp1', 'cmp10', 'cmp11', 'cmp12', 'cmp14', 'cmp15', 'cmp16', 'cmp17', 'cmp18', 'cmp19', 'cmp2', 'cmp3', 'cmp4', 'cmp5', 'cmp6', 'cmp7', 'cmp8', 'cmp9' are not known to pcs, try to authenticate the hosts using 'pcs host auth cmp1 cmp10 cmp11 cmp12 cmp14 cmp15 cmp16 cmp17 cmp18 cmp19 cmp2 cmp3 cmp4 cmp5 cmp6 cmp7 cmp8 cmp9' command, use --skip-offline to override\n<13>Oct 14 18:56:54 puppet-user: Notice: /Stage[main]/Tripleo::Profile::Base::Pacemaker/Exec[exec-wait-for-cmp13]: Dependency Pcmk_remote[cmp13] has failures: true

This is because the upgraded remotes have not been reauthenticated to
pcs (which basically means that the remote computes are not present in
/var/lib/pcsd/known-hosts). With this change we observe the following
during a fast-forward upgrade (FFU):
2021-10-29 21:12:20 | TASK [Try and reauthenticate the remote via pcsd from the core cluster] ********
2021-10-29 21:12:20 | Friday 29 October 2021  21:12:14 +0000 (0:00:00.760)       0:00:15.098 ********
2021-10-29 21:12:20 | changed: [compute-0 -> 192.168.24.20] => {"changed": true, "cmd": "pcs host auth \"compute-0\" -u hacluster -p $(hiera -c /etc/puppet/hiera.yaml hacluster_pwd)", "delta": "0:00:02.180656", "end": "2021-10-29 21:12:17.656863", "rc": 0, "start": "2021-10-29 21:12:15.476207", "stderr": "", "stderr_lines": [], "stdout": "compute-0: Authorized", "stdout_lines": ["compute-0: Authorized"]}

Afterwards the node is listed in /var/lib/pcsd/known-hosts (which it
was not before):
[root@controller-0 ~]# grep compute /var/lib/pcsd/known-hosts
    "compute-0": {
          "addr": "compute-0",

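The known-hosts check above can be generalized to several remotes. The
following is only a minimal sketch, not part of this change: the helper
name missing_remotes is hypothetical, and it assumes the file is
pretty-printed with one quoted host key per line, as in the grep output
above. It prints every given host that has no entry in the file.

```shell
#!/bin/sh
# Hypothetical helper: print each given host name that has no entry in the
# known-hosts file. Assumes one quoted host key per line, e.g.  "compute-0": {
missing_remotes() {
    known_hosts_file=$1
    shift
    for host in "$@"; do
        # -F: match a fixed string, -q: no output, only the exit status.
        # The closing quote keeps "compute-1" from matching "compute-10".
        if ! grep -qF "\"$host\": {" "$known_hosts_file" 2>/dev/null; then
            printf '%s\n' "$host"
        fi
    done
}

# Example (dry run): only echo the reauthentication command per missing remote.
# for h in $(missing_remotes /var/lib/pcsd/known-hosts compute-0 compute-1); do
#     echo pcs host auth "$h" -u hacluster -p '<hacluster password>'
# done
```

Any host the helper prints would still need a 'pcs host auth' run from a
core cluster node, as in the task output above.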
Closes-Bug: #1949255

Change-Id: Ib105fedd014a46260cd3f2fa3e2e59ed0ffb730d
2021-10-30 07:29:31 +00:00
clustercheck-container-puppet.yaml Remove unnecessary slash volume maps 2020-02-10 12:01:02 -05:00
compute-instanceha-baremetal-puppet.yaml Remove corosync.conf if it's a dir from remote. 2020-11-20 14:24:41 +01:00
ovn-dbs-baremetal-puppet.yaml Move compute-instanceha, neutron-ovn-dvr-ha to deployments 2019-05-30 20:37:36 +00:00
pacemaker-baremetal-puppet.yaml Correct metrics_qdr logging path and regex parsing 2021-06-04 11:24:17 -04:00
pacemaker-remote-baremetal-puppet.yaml Reauthenticate remotes after an upgrade to Train 2021-10-30 07:29:31 +00:00