Improve sc-adminep-certificate creation error log

When creating the sc-adminep-certificate in subclouds during
bootstrap/rehoming, there is a rescue block used to log the
state of the involved objects and apps for debug purposes in
case of failures.

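However, a rescue block that completes successfully clears
the failure, so the playbook moves forward in an unhealthy
state and fails later on, causing confusion while debugging.
Add a fail task at the end of the rescue block so the
playbook stops right after the debug data is logged.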

Also move the conditional wait that confirms the readiness of
the Certificate in K8s into the same block where the objects
are created. This way, the existing rescue tasks log details
if sc-cert/sc-adminep-certificate is not ready after the
maximum expected time.
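
The block/rescue behaviour described above can be sketched as
follows. This is a simplified, hypothetical task list, not the
actual playbook: the task names and the single debug task are
illustrative, while the kubectl command and the variables are
taken from the diff further down. In Ansible, a rescue section
that finishes without error marks the failure as handled, so
an explicit fail task is needed to stop the play.

```yaml
# Minimal sketch, assuming the variables sc_adminep_ca_cert_ns and
# sc_adminep_cert_secret are defined elsewhere in the play.
- name: Create and verify the admin endpoint certificate
  block:
    # Placed inside the block so that a timeout triggers the rescue tasks.
    - name: Wait up to 30s for admin endpoint certificate to be ready
      ansible.builtin.command: >-
        kubectl --kubeconfig=/etc/kubernetes/admin.conf -n "{{ sc_adminep_ca_cert_ns }}"
        wait --for=condition=ready certificate "{{ sc_adminep_cert_secret }}" --timeout=30s

  rescue:
    # Illustrative stand-in for the real resource dumps.
    - name: Log which task failed
      ansible.builtin.debug:
        msg: "Failed task: {{ ansible_failed_task.name }}"

    # Without this task, a successful rescue clears the failure
    # and the play continues with the next task.
    - name: Stop the playbook after logging debug data
      ansible.builtin.fail:
        msg: "Error while creating certificate {{ sc_adminep_cert_secret }}."
```

Keeping the fail task last in the rescue section means all
debug output is collected before the playbook aborts.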

Test Plan:
PASS: Deployed an SX subcloud successfully.

PASS: Deployed SX subcloud. During the deployment, manually
      modified the sc-cert/sc-adminep-certificate Certificate
      to not be issued after creation, before the conditional
      wait that verifies it. Observed that the rescue tasks
      were triggered and the playbook failed afterwards.

Closes-bug: 2091651

Change-Id: Ib0b617eda2bf67d5a8e743f7afd58445ffc36a7d
Signed-off-by: Marcelo de Castro Loebens <Marcelo.DeCastroLoebens@windriver.com>
Marcelo de Castro Loebens
2024-11-04 11:09:55 -04:00
parent 711e6dc349
commit 4ab7aae0d6

@@ -42,6 +42,12 @@
 until: create_subcloud_ep is not failed
 retries: 10
 delay: 30
+- name: Wait up to 30s for admin endpoint certificate to be ready
+command: >-
+kubectl --kubeconfig=/etc/kubernetes/admin.conf -n "{{ sc_adminep_ca_cert_ns }}"
+wait --for=condition=ready certificate "{{ sc_adminep_cert_secret }}" --timeout=30s
 rescue:
 - name: System app-list
 shell: "source /etc/platform/openrc; system application-list"
@@ -85,10 +91,10 @@
 - debug:
 msg: "{{ all_sc_cert_yaml_output.stdout_lines }}"
-- name: Wait up to 30s for admin endpoint certificate to be ready
-command: >-
-kubectl --kubeconfig=/etc/kubernetes/admin.conf -n "{{ sc_adminep_ca_cert_ns }}"
-wait --for=condition=ready certificate "{{ sc_adminep_cert_secret }}" --timeout=30s
+- fail:
+msg: >-
+"Error while creating certificate {{ sc_adminep_cert_secret }}.
+Check resources dumped and the logs from cert-manager pods."
 - name: Copy admin endpoint certficates to the shared filesystem directory
 copy: