Add retry to start OSD
In Nautilus, it was verified that on every start during the recovery
sequence the OSD has an initial monmap whose fsid is zeroed. During
initialization the OSD requests the monmap from the monitor and then
proceeds to its initialization routines, which involve sending commands
to be executed by the monitor using the stored fsid. If that fsid does
not match the one in the monitor, a "wrong fsid" message is sent and
the OSD stops its execution.

In most cases the OSD receives the monmap before it sends commands to
the monitor, but occasionally it does not. In virtual environments this
error could be reproduced only once in 90 tries. With the rt kernel it
seems to happen more frequently.

This fix retries the OSD start up to five times, giving the OSD a chance
to receive the monmap correctly before it starts sending commands to the
monitor.

Testing performed:
AIO-SX - Created an ansible test file to reproduce this error and left
it running until the "wrong fsid" message appeared. On the second try
the OSD received the monmap and was started successfully.

Story: 2009074
Task: 44094

Signed-off-by: Vinicius Lopes da Silva <vinicius.lopesdasilva@windriver.com>
Change-Id: Ib4b6d37b520ec2d78ea7b6a6c411a128ce284f66
@@ -194,8 +194,8 @@
 - name: Restore store.db from mon-store
   shell: cp -ar /tmp/mon-store/store.db /var/lib/ceph/mon/ceph-{{ mon_name }}

-- name: Bring up ceph Monitor and OSDs
-  command: /etc/init.d/ceph start mon osd
+- name: Bring up ceph Monitor
+  command: /etc/init.d/ceph start mon

 - name: Wait for ceph monitor to be up
   shell: ceph -s
@@ -203,6 +203,19 @@
   retries: 5
   delay: 2

+# During initialization of OSD, it requests the monmap to the monitor
+# before sending the monitor commands to be run. Since there are different
+# threads involved, it is possible the OSD sends the command before receiving
+# the monmap, causing an error of "wrong fsid" and making OSD to stop its
+# execution. So we add a retry of 5 to make sure it will receive the monmap when
+# it should
+- name: Bring up ceph OSDs
+  command: /etc/init.d/ceph start osd
+  retries: 5
+  delay: 3
+  register: result
+  until: result.rc == 0
+
 - name: Enable Ceph Msgr v2 protocol
   shell: ceph mon enable-msgr2
   until: true
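For readers who want to see the retry logic outside of Ansible, the "Bring up ceph OSDs" task (retry on a non-zero return code, with a pause between attempts) can be approximated by a small shell loop. This is only a sketch under stated assumptions: the helper name `retry_until_ok` is hypothetical and not part of this commit, and the way it counts attempts is not guaranteed to match how Ansible counts `retries`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): run a command until it
# exits with status 0, trying at most MAX_ATTEMPTS times and sleeping
# DELAY seconds between attempts.
# Usage: retry_until_ok MAX_ATTEMPTS DELAY COMMAND [ARGS...]
retry_until_ok() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Success: stop retrying immediately.
        if "$@"; then
            return 0
        fi
        # Out of attempts: report and fail.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

With this helper, a rough manual equivalent of the new task would be `retry_until_ok 5 3 /etc/init.d/ceph start osd`. Like the task, it relies only on the exit status of the init script and does not inspect the output for the "wrong fsid" message.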