1bf3f77ebfafe1c1fdd56bd96c0e51e4fe2c412d
Role for recovering subcloud certificates after expiry.
This role recovers the k8s Root CAs, the k8s leaf certificates and the
dc admin endpoint certificate chain.
This commit adds support for AIO-DX subcloud types to the existing
common/recover-subcloud-certificates role.
Note:
- As it stands, the role only works for AIO-SX and AIO-DX subcloud types.
Additional work will be done for other types of environments, and the
common/recover-subcloud-certificates role will evolve to support other
multi-node system types as well.
A follow-up review will be posted with enhancements for compute nodes
after more testing and refactoring.
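For reference, a minimal sketch of how the role could be triggered from the
system controller with ansible-playbook; the wrapper playbook name, the
ad-hoc inventory entry and the credential variables below are placeholders
for illustration, not the project's actual interface:
  # Hypothetical wrapper playbook that only includes the
  # common/recover-subcloud-certificates role; all names are illustrative.
  ansible-playbook recover_subcloud_certificates.yml \
      -i "subcloud1," \
      -e "ansible_ssh_user=sysadmin ansible_ssh_pass=<sysadmin-password>"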
Test case:
PASS: In a subcloud where k8s certificates are not expired, run
'sudo show-certs.sh' and take note of the kubelet certificate dates
(kubelet-client-current.pem, kubelet-server, kubelet CA), the
k8s certificate dates (admin.conf, apiserver,
apiserver-kubelet-client, controller-manager.conf,
front-proxy-client, scheduler.conf, K8s Root CA, FrontProxy CA) and
the dc admin endpoint certificates (DC-adminep-root-ca,
sc-adminep-intermediate-ca, subcloud#-adminep-certificate).
Then trigger the execution of role
common/recover-subcloud-certificates from the system controller and
verify that the dates have not changed (see the sketch below).
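One way to record those dates for the before/after comparison is sketched
below; it assumes the standard kubeadm and kubelet certificate paths, so
adjust the file list if the environment differs:
  # Print the 'notAfter' date of the main control-plane and kubelet
  # certificates so the output can be diffed before and after the role runs.
  sudo kubeadm certs check-expiration --config /etc/kubernetes/kubeadm.yaml
  for c in /etc/kubernetes/pki/apiserver.crt \
           /etc/kubernetes/pki/apiserver-kubelet-client.crt \
           /etc/kubernetes/pki/front-proxy-client.crt \
           /var/lib/kubelet/pki/kubelet-client-current.pem; do
      echo "$c: $(sudo openssl x509 -noout -enddate -in "$c")"
  done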
PASS: After the step above, run 'kubectl get po -A' to verify the
health of the cluster.
PASS: Verify rehoming runs for SX and DX subclouds successfully
after recovery:
1) On the subcloud:
- Change the 'hardware clock' of the vbox vm to more than 11 years in the future
- Verify, after turning the vm back on, that the system date is now 11 years ahead
- Verify that kubernetes is not responding: no kubectl commands are
accepted, 'sudo show-certs.sh' shows the etcd and kubelet certs expired,
and 'kubeadm certs check-expiration --config
/etc/kubernetes/kubeadm.yaml' shows that all k8s certificates have expired
2) On the new systemcontroller:
- Configure network and ensure connectivity between new system
controller and the subcloud
- Trigger the execution of role common/recover-subcloud-certificates
and wait for it to finish (~7 mins when certs are expired,
about 20 secs otherwise)
3) On subcloud:
- Run 'kubeadm certs check-expiration --config
/etc/kubernetes/kubeadm.yaml' and verify that all certificates
(admin.conf, apiserver, apiserver-kubelet-client,
controller-manager.conf, front-proxy-client, scheduler.conf,
FrontProxy CA, K8s Root CA) now show valid dates.
- Run 'sudo show-certs.sh' and verify that
the etcd certificates (etcd-client.crt, etcd-server.crt,
apiserver-etcd-client.crt), kubelet certificates
(kubelet-client-current.pem, kubelet-server, kubelet CA),
and the dc admin endpoint certificates (DC-adminep-root-ca,
sc-adminep-intermediate-ca, subcloud#-adminep-certificate)
now show valid dates.
4) On the new systemcontroller:
- Run 'dcmanager subcloud add --migrate' for the subcloud and verify
that the rehoming procedure is able to complete.
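As a hedged complement to the per-certificate checks above, every
certificate under the kubeadm PKI tree can be scanned in a single pass
(standard /etc/kubernetes/pki layout assumed):
  # List the expiry date of every certificate under /etc/kubernetes/pki,
  # including the etcd sub-directory, to confirm nothing was missed.
  sudo find /etc/kubernetes/pki \( -name '*.crt' -o -name '*.pem' \) |
  while read -r c; do
      echo "$c: $(sudo openssl x509 -noout -enddate -in "$c" 2>/dev/null)"
  done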
PASS: Verify reconnect to the same system controller is possible:
1) On the systemcontroller:
- Verify that the target subcloud is online
- Change system controller's date to the target date in the future
- Manually recover certificates by running
/usr/bin/kube-cert-rotation.sh and
/usr/bin/kube-expired-kubelet-cert-recovery.sh and manually
restarting pods
2) On the subcloud:
- Change the 'hardware clock' of the vbox vm to more than 11 years in the future
- Verify, after turning the vm back on, that the system date is now 11 years ahead
- Verify that kubernetes is not responding: no kubectl commands are
accepted, 'sudo show-certs.sh' shows the etcd and kubelet certs expired,
and 'kubeadm certs check-expiration --config
/etc/kubernetes/kubeadm.yaml' shows that all k8s certificates have expired
3) On systemcontroller:
- Verify that subcloud now appears as 'offline'
- Trigger the execution of role common/recover-subcloud-certificates
and wait for it to finish (~7 mins when certs are expired,
about 20 secs otherwise)
4) On subcloud:
- Run 'kubeadm certs check-expiration --config
/etc/kubernetes/kubeadm.yaml' and verify that all certificates
(admin.conf, apiserver, apiserver-kubelet-client,
controller-manager.conf, front-proxy-client, scheduler.conf,
FrontProxy CA, K8s Root CA) now show valid dates.
- Run 'sudo show-certs.sh' and verify that
the etcd certificates (etcd-client.crt, etcd-server.crt,
apiserver-etcd-client.crt), kubelet certificates
(kubelet-client-current.pem, kubelet-server, kubelet CA),
and the dc admin endpoint certificates (DC-adminep-root-ca,
sc-adminep-intermediate-ca, subcloud#-adminep-certificate)
now show valid dates.
5) On systemcontroller:
- Run 'dcmanager subcloud show subcloud#' and verify that the subcloud
is now back online
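From the system controller, the transition back to 'online' can be watched
with something along the lines of the sketch below; the subcloud name is a
placeholder and the exact field names in the dcmanager output may vary
between releases:
  # Poll the subcloud state every 30 seconds until availability
  # flips back to 'online'; 'subcloud1' is illustrative only.
  watch -n 30 "dcmanager subcloud show subcloud1 | grep -E 'availability|sync'"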
Story: 2010815
Task: 48713
Depends-on: https://review.opendev.org/c/starlingx/config/+/893163
Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: Ief6504644e55d6ce83a85742c83b4173fbbf8808
stx-ansible-playbooks
StarlingX Bootstrap and Deployment Ansible Playbooks
Execution environment
- Unix-like OS (recent Linux-based distributions, macOS, Cygwin)
- Python 3.8 or later
Additional Required Packages
In addition to the packages listed in requirements.txt and test-requirements.txt, the following packages are required to run the playbooks remotely:
- python3-pexpect
- python3-ptyprocess
- sshpass
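On a Debian/Ubuntu control host, these can be installed along the lines of
the sketch below (the pip step assumes installing into the current Python
environment is acceptable):
  # Distro packages needed to run the playbooks remotely ...
  sudo apt-get install -y python3-pexpect python3-ptyprocess sshpass
  # ... plus the Python dependencies pinned by this repository.
  python3 -m pip install -r requirements.txt -r test-requirements.txt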
Supported StarlingX Releases
The playbooks are compatible with StarlingX R8.0 and later.
Executing StarlingX Playbooks
Bootstrap Playbook
For instructions on how to set up and execute the bootstrap playbook
from another host, please refer to the StarlingX Documentation, under
Installation Guides, in the "Configure controller-0" section for the
respective system deployment type.
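For orientation only, a remote bootstrap run looks roughly like the sketch
below; the inventory contents, IP address and playbook path are assumptions
made for illustration, and the documentation pages referenced above remain
the authoritative procedure:
  # Minimal ad-hoc inventory pointing at controller-0 by its OAM IP;
  # the address and credentials are placeholders.
  cat > inventory.ini <<'EOF'
  [controller]
  controller-0 ansible_host=10.10.10.2 ansible_user=sysadmin
  EOF
  # Run the bootstrap playbook from this repository against that host.
  ansible-playbook playbookconfig/src/playbooks/bootstrap.yml -i inventory.ini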
Developer Notes
This repository is not intended to be developed standalone, but rather as part of the StarlingX Source System, which is defined by the StarlingX manifest.