1bf3f77ebfafe1c1fdd56bd96c0e51e4fe2c412d
Role for recovering subcloud certificates after expiry.
This role recovers the k8s Root CAs, the k8s leaf certificates and the
dc admin endpoint certificate chain.
This commit adds support for AIO-DX subcloud types to the existing
common/recover-subcloud-certificates role.
Note:
- As it stands, the role only works for AIO-SX and AIO-DX subcloud types.
Additional work will be done for other types of environments, and the
common/recover-subcloud-certificates role will evolve to support other
multi-node system types as well.
A follow-up review will be posted with enhancements for compute nodes
after more testing and refactoring.
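For reference, a minimal sketch of how the role could be triggered from the
system controller with ansible-playbook; the wrapper playbook name, the
ad-hoc inventory entry and the credential variables below are placeholders
for illustration, not the project's actual interface:
  # Hypothetical wrapper playbook that only includes the
  # common/recover-subcloud-certificates role; all names are illustrative.
  ansible-playbook recover_subcloud_certificates.yml \
      -i "subcloud1," \
      -e "ansible_ssh_user=sysadmin ansible_ssh_pass=<sysadmin-password>"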
Test case:
PASS: In a subcloud where k8s certificates are not expired, run
'sudo show-certs.sh' and take note of the kubelet certificate dates
(kubelet-client-current.pem, kubelet-server, kubelet CA), the
k8s certificate dates (admin.conf, apiserver,
apiserver-kubelet-client, controller-manager.conf,
front-proxy-client, scheduler.conf, K8s Root CA, FrontProxy CA) and
the dc admin endpoint certificates (DC-adminep-root-ca,
sc-adminep-intermediate-ca, subcloud#-adminep-certificate).
Then trigger the execution of role
common/recover-subcloud-certificates from the system controller and
verify that the dates have not changed (see the sketch below).
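One way to record those dates for the before/after comparison is sketched
below; it assumes the standard kubeadm and kubelet certificate paths, so
adjust the file list if the environment differs:
  # Print the 'notAfter' date of the main control-plane and kubelet
  # certificates so the output can be diffed before and after the role runs.
  sudo kubeadm certs check-expiration --config /etc/kubernetes/kubeadm.yaml
  for c in /etc/kubernetes/pki/apiserver.crt \
           /etc/kubernetes/pki/apiserver-kubelet-client.crt \
           /etc/kubernetes/pki/front-proxy-client.crt \
           /var/lib/kubelet/pki/kubelet-client-current.pem; do
      echo "$c: $(sudo openssl x509 -noout -enddate -in "$c")"
  done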
PASS: After the step above, run 'kubectl get po -A' to verify the
health of the cluster.
PASS: Verify rehoming runs for SX and DX subclouds successfully
after recovery:
1) On the subcloud:
- Change the 'hardware clock' of the vbox vm to more than 11 years in the future
- Verify, after turning the vm back on, that the system date is now 11 years ahead
- Verify that kubernetes is not responding: no kubectl commands are
accepted, 'sudo show-certs.sh' shows the etcd and kubelet certs expired,
and 'kubeadm certs check-expiration --config
/etc/kubernetes/kubeadm.yaml' shows that all k8s certificates have expired
2) On the new systemcontroller:
- Configure network and ensure connectivity between new system
controller and the subcloud
- Trigger the execution of role common/recover-subcloud-certificates
and wait for it to finish (~7 mins when certs are expired,
about 20 secs otherwise)
3) On subcloud:
- Run 'kubeadm certs check-expiration --config
/etc/kubernetes/kubeadm.yaml' and verify that all certificates
(admin.conf, apiserver, apiserver-kubelet-client,
controller-manager.conf, front-proxy-client, scheduler.conf,
FrontProxy CA, K8s Root CA) now show valid dates.
- Run 'sudo show-certs.sh' and verify that
the etcd certificates (etcd-client.crt, etcd-server.crt,
apiserver-etcd-client.crt), kubelet certificates
(kubelet-client-current.pem, kubelet-server, kubelet CA),
and the dc admin endpoint certificates (DC-adminep-root-ca,
sc-adminep-intermediate-ca, subcloud#-adminep-certificate)
now show valid dates.
4) On the new systemcontroller:
- Run 'dcmanager subcloud add --migrate' for the subcloud and verify
that the rehoming procedure is able to complete.
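As a hedged complement to the per-certificate checks above, every
certificate under the kubeadm PKI tree can be scanned in a single pass
(standard /etc/kubernetes/pki layout assumed):
  # List the expiry date of every certificate under /etc/kubernetes/pki,
  # including the etcd sub-directory, to confirm nothing was missed.
  sudo find /etc/kubernetes/pki \( -name '*.crt' -o -name '*.pem' \) |
  while read -r c; do
      echo "$c: $(sudo openssl x509 -noout -enddate -in "$c" 2>/dev/null)"
  done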
PASS: Verify reconnect to the same system controller is possible:
1) On the systemcontroller:
- Verify that the target subcloud is online
- Change system controller's date to the target date in the future
- Manually recover certificates by running
/usr/bin/kube-cert-rotation.sh and
/usr/bin/kube-expired-kubelet-cert-recovery.sh and manually
restarting pods
2) On the subcloud:
- Change the 'hardware clock' of the vbox vm to more than 11 years in the future
- Verify, after turning the vm back on, that the system date is now 11 years ahead
- Verify that kubernetes is not responding: no kubectl commands are
accepted, 'sudo show-certs.sh' shows the etcd and kubelet certs expired,
and 'kubeadm certs check-expiration --config
/etc/kubernetes/kubeadm.yaml' shows that all k8s certificates have expired
3) On systemcontroller:
- Verify that subcloud now appears as 'offline'
- Trigger the execution of role common/recover-subcloud-certificates
and wait for it to finish (~7 mins when certs are expired,
about 20 secs otherwise)
4) On subcloud:
- Run 'kubeadm certs check-expiration --config
/etc/kubernetes/kubeadm.yaml' and verify that all certificates
(admin.conf, apiserver, apiserver-kubelet-client,
controller-manager.conf, front-proxy-client, scheduler.conf,
FrontProxy CA, K8s Root CA) now show valid dates.
- Run 'sudo show-certs.sh' and verify that
the etcd certificates (etcd-client.crt, etcd-server.crt,
apiserver-etcd-client.crt), kubelet certificates
(kubelet-client-current.pem, kubelet-server, kubelet CA),
and the dc admin endpoint certificates (DC-adminep-root-ca,
sc-adminep-intermediate-ca, subcloud#-adminep-certificate)
now show valid dates.
5) On systemcontroller:
- Run 'dcmanager subcloud show subcloud#' and verify that the subcloud
is now back online
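From the system controller, the transition back to 'online' can be watched
with something along the lines of the sketch below; the subcloud name is a
placeholder and the exact field names in the dcmanager output may vary
between releases:
  # Poll the subcloud state every 30 seconds until availability
  # flips back to 'online'; 'subcloud1' is illustrative only.
  watch -n 30 "dcmanager subcloud show subcloud1 | grep -E 'availability|sync'"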
Story: 2010815
Task: 48713
Depends-on: https://review.opendev.org/c/starlingx/config/+/893163
Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: Ief6504644e55d6ce83a85742c83b4173fbbf8808
stx-ansible-playbooks
StarlingX Bootstrap and Deployment Ansible Playbooks
Execution environment
- Unix-like OS (recent Linux-based distributions, macOS, Cygwin)
- Python 3.8 or later
Additional Required Packages
In addition to the packages listed in requirements.txt and test-requirements.txt, the following packages are required to run the playbooks remotely:
- python3-pexpect
- python3-ptyprocess
- sshpass
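On a Debian/Ubuntu control host, these can be installed along the lines of
the sketch below (the pip step assumes installing into the current Python
environment is acceptable):
  # Distro packages needed to run the playbooks remotely ...
  sudo apt-get install -y python3-pexpect python3-ptyprocess sshpass
  # ... plus the Python dependencies pinned by this repository.
  python3 -m pip install -r requirements.txt -r test-requirements.txt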
Supported StarlingX Releases
The playbooks are compatible with StarlingX R8.0 and later.
Executing StarlingX Playbooks
Bootstrap Playbook
For instructions on how to set up and execute the bootstrap playbook
from another host, please refer to the StarlingX Documentation, under
Installation Guides, in the "Configure controller-0" section for the
respective system deployment type.
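For orientation only, a remote bootstrap run looks roughly like the sketch
below; the inventory contents, IP address and playbook path are assumptions
made for illustration, and the documentation pages referenced above remain
the authoritative procedure:
  # Minimal ad-hoc inventory pointing at controller-0 by its OAM IP;
  # the address and credentials are placeholders.
  cat > inventory.ini <<'EOF'
  [controller]
  controller-0 ansible_host=10.10.10.2 ansible_user=sysadmin
  EOF
  # Run the bootstrap playbook from this repository against that host.
  ansible-playbook playbookconfig/src/playbooks/bootstrap.yml -i inventory.ini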
Developer Notes
This repository is not intended to be developed standalone, but rather as part of the StarlingX Source System, which is defined by the StarlingX manifest.