StarlingX Integration and packaging
e0f83421e2
The Ceph processes were not being started before Pmon began monitoring them, so Pmon detected Ceph process failures while the host was still initializing. In addition, the puppet manifest created the .ceph_started flag at the wrong time, which prevented the Ceph processes from being controlled at boot.

The ceph.sh script is modified to initialize all the Ceph processes according to the system mode and the system type:

- Simplex: all processes are started in sequence (mon, mds, osd).
- Duplex: only the fixed monitor and the mds are started; SM starts the other processes.
- Standard: the script starts with no parameter first, and then starts the mds.

After all processes have been initialized, the .ceph_started flag is created so that Pmon and SM can start monitoring the Ceph processes.

The ceph.sh script always returns success so that a failure to start Ceph does not prevent the host from being enabled. If any Ceph process fails to start, Pmon and SM will try to recover it and raise alarms accordingly.

Additional changes:
- Added a 'forcestart' action to the ceph-init-wrapper script to bypass the .ceph_started flag. The 'start' action of the ceph-init-wrapper script checks for the .ceph_started flag and skips the initialization if the flag does not exist. Creating the flag before calling the 'start' command would trigger a race condition between Pmon/SM and the ceph.sh script.
- Improved the logging to add a timestamp to each line.

Test-Plan:
PASS: AIO-SX deploy and lock/unlock, checking that Ceph is running as expected and that the pmon log shows no errors for Ceph processes.
PASS: For AIO-DX, AIO-DX+, Standard, Storage: deploy, lock/unlock each host, DOR test, force reboot the active controller and force reboot the standby controller. Check that Ceph is running as expected and that the pmon log shows no errors for Ceph processes.
PASS: Apply Ceph after the deploy (runtime) and check that the .ceph_started flag has been created.
PASS: Deploy AIO-SX with Ceph configured with 1 OSD. Force corrupt OSD data by deleting some files from the disk. Reboot the host. Check in /var/log/ceph/ceph-init.log that the start osd action returned an error, and check that an alarm was raised for the OSD.

Partial-bug: 2083056
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: I41cad8190616909f2a8be1d27c2ef8dd5a75a6a3

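
The boot-time behavior described in the commit message can be sketched as follows. This is only an illustrative Python model, not the actual ceph.sh or ceph-init-wrapper code, which are shell scripts. The names `START_PLAN`, `wrapper_should_start`, and `boot_start` are hypothetical, as is the use of `None` to stand for the "no parameter" start on Standard systems. The sketch also assumes that the flag is created even when a process fails to start, so that Pmon and SM can take over recovery.

```python
from pathlib import Path
import tempfile

# Hypothetical start plans, keyed by deployment. None stands for the
# "no parameter" start that the commit message describes for Standard.
START_PLAN = {
    "simplex": ["mon", "mds", "osd"],  # everything, in sequence
    "duplex": ["mon", "mds"],          # fixed monitor and mds; SM starts the rest
    "standard": [None, "mds"],         # plain start first, then the mds
}


def wrapper_should_start(action, flag):
    """Model of the wrapper's flag check: 'start' requires the flag,
    'forcestart' bypasses it."""
    if action == "forcestart":
        return True
    if action == "start":
        return flag.exists()
    raise ValueError(f"unsupported action: {action}")


def boot_start(deployment, start_service, flag):
    """Start the planned targets in order, then create the flag.

    start_service is a caller-supplied callable that returns a truthy
    value on success. The return code is always 0, mirroring the rule
    that a Ceph failure must not keep the host from being enabled.
    """
    failed = []
    for target in START_PLAN[deployment]:
        try:
            ok = bool(start_service(target))
        except Exception:
            ok = False
        if not ok:
            failed.append(target)
    # Assumption: the flag is created regardless of failures, only after
    # every start attempt, so monitoring never races with the boot script.
    flag.touch()
    return 0, failed


if __name__ == "__main__":
    flag = Path(tempfile.mkdtemp()) / ".ceph_started"
    print(wrapper_should_start("start", flag))
    rc, failed = boot_start("simplex", lambda t: t != "osd", flag)
    print(rc, failed, wrapper_should_start("start", flag))
```

Creating the flag only after the last start attempt is what keeps a plain 'start' request from Pmon or SM from overlapping with the boot script, which is the race condition the commit message warns about.
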
||
---|---|---|
base | ||
bmc | ||
centos-debian-compat | ||
ceph/ceph | ||
config | ||
database/mariadb/debian | ||
devstack | ||
doc | ||
docker/python-docker/debian | ||
filesystem | ||
golang-github-dev | ||
gpu/gpu-operator | ||
grub | ||
kata-containers/debian | ||
kubernetes | ||
ldap | ||
networking | ||
ostree | ||
python | ||
releasenotes | ||
requests-toolbelt | ||
security | ||
storage-drivers/trident-installer/debian | ||
tools | ||
virt | ||
.gitignore | ||
.gitreview | ||
.pylintrc | ||
.yamllint | ||
.zuul.yaml | ||
bindep.txt | ||
CONTRIBUTORS.wrs | ||
debian_build_layer.cfg | ||
debian_iso_image.inc | ||
debian_pkg_dirs | ||
debian_stable_docker_images.inc | ||
distroless_stable_docker_images.inc | ||
LICENSE | ||
README.rst | ||
test-requirements.txt | ||
tox.ini |
integ
StarlingX Integration