c138575062
On AIO deployments puppet is run twice with two different manifests:
1. 'controller': to configure controller services
2. 'worker': to configure worker services

Ceph is configured when the 'controller' manifests are applied; there is no need to run its configuration a second time when the 'worker' set is applied. This commit adds new puppet classes that encapsulate ceph configuration based on node personality, and adds a check so that it is not applied a second time on controllers.

If the ceph manifests are executed a second time, we run into a race between SM's process monitoring and the 'worker' puppet manifests, which trigger a restart of ceph-mon as part of reconfiguration.

After a reboot on AIO, SM takes control of ceph-mon monitoring once the 'controller' puppet manifests finish applying. As part of this, SM watches for process death notifications, reading the pid from the .pid file, and periodically executes '/etc/init.d/ceph status mon.controller' for more advanced monitoring.

When the 'worker' manifests are executed, they trigger a restart of ceph-mon through '/etc/init.d/ceph restart', which has two steps: 'stop', in which ceph-mon is stopped, and 'start', in which it is started again. In the 'stop' step, stopping ceph-mon kills the ceph-mon process and removes its PID file. This is promptly detected by SM, which immediately starts a new ceph-mon that creates a new pid file. The problem is that ceph-mon was already mid-restart, and at the end of the 'stop' step the init script cleans up the new pid file instead of the old one. This leads to the controllers swacting a couple of times before the system gets rid of the rogue process.

Change-Id: I2a0df3bab716a553e71e322e1515bee2bb2f700d
Co-authored-by: Ovidiu Poncea <ovidiu.poncea@windriver.com>
Story: 2002844
Task: 29214
Signed-off-by: Ovidiu Poncea <ovidiu.poncea@windriver.com>
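The pid-file race described above can be reduced to a minimal shell sketch. This is not the real /etc/init.d/ceph script or SM code; the pid-file path and pid values are hypothetical, chosen only to show how the init script's delayed cleanup in the 'stop' step ends up deleting the pid file of the process SM just started:

```shell
#!/bin/sh
# Hypothetical demo of the race; PIDFILE is a stand-in for the real
# ceph-mon pid file, and 1111/2222 stand in for real pids.
PIDFILE="${TMPDIR:-/tmp}/ceph-mon.pid.demo"

# 1. ceph-mon is running; its pid file exists (old pid).
echo 1111 > "$PIDFILE"

# 2. init 'stop' step: ceph-mon is killed and its pid file disappears.
#    SM detects the process death at this point.
rm -f "$PIDFILE"

# 3. SM reacts immediately: it starts a new ceph-mon, which writes a
#    new pid file.
echo 2222 > "$PIDFILE"

# 4. The 'stop' step's final cleanup runs only now, and removes the
#    pid file it finds -- the NEW one -- orphaning the new process.
rm -f "$PIDFILE"

# Result: the new ceph-mon runs unmonitored and no pid file exists,
# which is the "rogue process" the commit message describes.
if [ -e "$PIDFILE" ]; then
    echo "pid file present"
else
    echo "pid file missing"
fi
```

Running the sketch prints "pid file missing": the restarted ceph-mon is alive, but nothing tracks it, which is why SM's monitoring and the init script end up fighting until the controllers swact.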
Files changed:
  build_srpm.data
  puppet-manifests.spec