Currently, if backup_cib() fails we give up immediately.
In most cases this is fine, since there is no chance
that the command will succeed on a later retry. There are a number of
cases, though, where retrying should be done. IHA FFU is one of them,
because the following can happen:
1) On the compute node we will try to set up a property when the remote
is not yet connected
Jul 21 04:59:45 compute-1 puppet-user: Debug: Executing: '/sbin/ip6tables-save'
Jul 21 04:59:46 compute-1 pacemaker-remoted: warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_rw because not connected to cluster
Jul 21 04:59:46 compute-1 pacemaker-remoted: error: Error in connection setup (/dev/shm/qb-42459-61225-14-DzDuCR/qb): Remote I/O error (121)
Jul 21 04:59:46 compute-1 puppet-user: Error: /Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-compute-1-compute-instanceha-role]: Could not evaluate: backup_cib: Running: pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20200721-61008-hz2sdw failed with code: 1 -> Error: unable to get cib
2) This happens because in IHA FFU, after the control plane has been
   upgraded, we upgrade the compute nodes one by one, without running
   any other commands on the control plane. So we need to keep retrying
   to set the property even if backup_cib() fails, because eventually
   the core cluster *will* reconnect to the remote:
Jul 21 05:01:22 compute-1 pacemaker-remoted: notice: Remote client connection accepted
Since we cannot necessarily control at what point the core cluster
retries the connection, we should retry a few times no matter what.
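The retry-on-failure idea can be sketched as a plain loop (a simplified illustration only, not the actual puppet-pacemaker provider code; the method name `retry_until_connected` and the simulated transient failure are made up for this example):

```ruby
# Minimal sketch: keep retrying an operation that may fail transiently
# (standing in for backup_cib / "pcs cluster cib ..." returning
# "Error: unable to get cib") until it succeeds or the attempt budget
# is exhausted. Returns the number of tries it took.
def retry_until_connected(max_tries: 10, sleep_s: 0)
  tries = 0
  begin
    tries += 1
    yield
  rescue RuntimeError
    raise if tries >= max_tries  # give up only after several attempts
    sleep(sleep_s)               # back off before the next attempt
    retry
  end
  tries
end

# Simulate a remote that only becomes reachable on the 3rd attempt,
# as when pacemaker-remoted has not yet accepted the cluster connection.
attempts_needed = 3
calls = 0
tries = retry_until_connected(max_tries: 10) do
  calls += 1
  raise 'Error: unable to get cib' if calls < attempts_needed
end
puts "succeeded after #{tries} tries"
```

With a non-zero `sleep_s`, this gives the core cluster time to re-establish the remote connection between attempts instead of failing the puppet run on the first error.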
Tested this 4 times in a row successfully (before this patch it would
fail quite often).