diff --git a/doc/source/specs/stx-6.0/approved/_placeholder.rst b/doc/source/specs/stx-6.0/approved/_placeholder.rst deleted file mode 100644 index 0cd761b..0000000 --- a/doc/source/specs/stx-6.0/approved/_placeholder.rst +++ /dev/null @@ -1,8 +0,0 @@ -.. placeholder: - -=========== -Placeholder -=========== - -This file is a placeholder and should be deleted when the first spec is moved -to this directory. diff --git a/doc/source/specs/stx-6.0/approved/security-2008675-kubernetes-rootca-update.rst b/doc/source/specs/stx-6.0/approved/security-2008675-kubernetes-rootca-update.rst new file mode 100644 index 0000000..b9f85b9 --- /dev/null +++ b/doc/source/specs/stx-6.0/approved/security-2008675-kubernetes-rootca-update.rst @@ -0,0 +1,641 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. http://creativecommons.org/licenses/by/3.0/legalcode + + +===================================== +Kubernetes root CA certificate update +===================================== + +Storyboard: +https://storyboard.openstack.org/#!/story/2008675 + +This feature introduces CLI/REST APIs and execution orchestration for updating +Kubernetes root CA certificate and certificates issued by the root CA in a +rolling fashion so that the impact on the system is minimized. + +Problem description +=================== + +In a deployed Kubernetes cluster, the root CA certificate signs all the other +serving and client certificates used by various components for various +purposes. This root CA certificate may need to be updated for security or +administrative reasons while the cluster is still running. + +An update mechanism is needed to update the root CA certificate and all the +certificates signed by the root CA certificate in a rolling fashion (i.e., +minimal impact on the applications and services running in the cluster). + +Currently Kubernetes doesn't provide such a mechanism out of the box.
A manual +update procedure [1]_ is possible but it's lengthy and error-prone. This +feature introduces a set of CLI/REST APIs and execution orchestration to +simplify the procedure. + +Use Cases +--------- + +* The cluster's root CA certificate approaches its expiry date and the cloud + admin needs to update the root CA certificate so that the cluster continues + to function. +* The cloud admin decides to replace the root CA certificate with a new one for + security reasons. + +Proposed change +=============== + +Enhance sysinv to support root CA certificate rolling update +------------------------------------------------------------ + +A rolling update procedure roughly based on [1]_ has been investigated. The +procedure consists of three phases. The first phase is to update kubernetes +components and pods to trust the new root CA certificate along with the old one +(trust both). The second phase is to update kubernetes components' server and +client certificates with new ones signed by the new root CA certificate. The +third phase is to remove the old root CA certificate from components' and pods' +trusted CA bundle so that only the new root CA certificate is trusted. + +We will wrap this update procedure in sysinv CLI commands and supporting +APIs. VIM and DC orchestration of the procedure will follow in the future. This +is being done to hide the complexities of the underlying procedure, add +semantic checks and overall provide a simpler, less error-prone procedure, +analogous to the approach taken for other complex multi-host +procedures such as kubernetes upgrade, patching and system upgrades. + +The overall feature will have multiple layers. The sysinv REST APIs and CLI are +the first layer, providing the fundamental implementation of the certificate +update. VIM orchestration is the second layer for executing the update across +all hosts in a cluster, by utilizing support from sysinv.
DC Orchestration is the third +layer for executing VIM orchestration across all subclouds of a DC system. + +There will also be a 4th layer in the future where cert-manager will manage the +kubernetes Root CA certificate and key. cert-mon will monitor the certificate +and raise an alarm when it needs to be updated so that the user can schedule +the orchestration of the update during a maintenance window. + +The initial version of the spec will cover only the first layer, the sysinv +support for root CA certificate update. Changes include adding new system CLI +commands and sysinv REST APIs to the existing framework, adding logic to sysinv +conductor to generate required puppet hieradata, and adding new puppet runtime +manifests to be applied by sysinv agent to make the actual certificate update +on hosts. + +Sysinv operations for root CA certificate update +------------------------------------------------ + +A new set of sysinv CLI commands will be introduced to simplify the update +procedure. It will be a procedure similar to software upgrade, with a start, +execute and complete cycle. There won't be support for "abort", but the user +can retry a command if it fails. The user can also restart the update +procedure by uploading or regenerating a new root CA certificate. This also +provides a mechanism to revert to the original CA certificate if the user +chooses to upload the original CA certificate. + +The following is a summary of the CLI commands and the steps to perform +kubernetes root CA certificate update. + +1. system kube-rootca-update-start +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* Pre-check to validate the update, initialize the procedure and mark update + progress as update-started. + +2. system kube-rootca-certificate-generate +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* Generate a new kubernetes root CA certificate +* Change progress state to update-new-rootca-cert-generated + +2. 
system kube-rootca-certificate-upload +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* User can choose to use this command to upload a new kubernetes root CA + certificate and private key from a file instead of generating one +* Change progress state to update-new-rootca-cert-uploaded + +3. system kube-rootca-host-update --phase=trustBothCAs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* Update apiserver's trusted CAs to include the new CA cert +* Update scheduler's trusted CAs to include the new CA cert +* Update controller-manager's trusted CAs to include the new CA cert +* Update kubelet's trusted CAs to include the new CA cert +* Update admin.conf's trusted CAs to include the new CA cert +* Change progress state to updated-host-trustBothCAs on success +* Change progress state to updating-host-trustBothCAs-failed on failure + +4. system kube-rootca-pods-update --phase=trustBothCAs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* Annotate Daemonsets and Deployments to trigger pod replacement in a safer + rolling fashion, so that pods pick up the new root CA cert as a trusted + CA along with the old root CA certificate +* Change progress state to updated-pods-trustBothCAs on success +* Change progress state to updating-pods-trustBothCAs-failed on failure + +5. 
system kube-rootca-host-update --phase=updateCerts +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* Update admin.conf's client cert/key data with new ones signed by the + new root CA +* Update apiserver's server and client certs/keys with new ones signed by the + new root CA +* Update scheduler's client cert/key with a new one signed by the new root CA +* Update controller-manager's client cert/key with a new one signed by the new + root CA +* Update kubelet's client cert/key with a new one signed by the new root CA +* Change progress state to updated-host-updateCerts on success +* Change progress state to updating-host-updateCerts-failed on failure + +6. system kube-rootca-host-update --phase=trustNewCA +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* Update admin.conf's trusted CAs to remove the old root CA +* Update apiserver's trusted CAs to remove the old root CA +* Update controller-manager's trusted CAs to remove the old root CA +* Update scheduler's trusted CAs to remove the old root CA +* Update kubelet's trusted CAs to remove the old root CA +* Change progress state to updated-host-trustNewCA on success +* Change progress state to updating-host-trustNewCA-failed on failure + +7. system kube-rootca-pods-update --phase=trustNewCA +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* Annotate Daemonsets and Deployments to trigger pod replacement in a safer + rolling fashion, to remove the old root CA from pods' trusted CA list +* Change progress state to updated-pods-trustNewCA on success +* Change progress state to updating-pods-trustNewCA-failed on failure + +8. 
system kube-rootca-update-complete +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* Post-check to verify the update +* Change the progress state to update-complete + +system kube-rootca-update-list +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* Run this command anytime to show the update status of all hosts in the + cluster + +system kube-rootca-update-show +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* Run this command anytime to show the overall update status + +VIM Orchestration Operations +---------------------------- + +Refer to future spec + +DC Orchestration Operations +--------------------------- + +Refer to future spec + +cert-mon monitoring and alarm raising +------------------------------------- + +Refer to future spec + +Fault Handling +-------------- + +After the update starts, the user can retry any step that fails. At any step +before update-complete, the user can choose to upload or regenerate a new root +CA certificate and start the update procedure again. This provides a mechanism +to recover from a step that fails multiple times, as well as a mechanism to +restore the original root CA certificate. + +CLI Clients +----------- + +We will extend the existing system clients to add the new commands. + +Web GUI +------- + +If we want to allow the update to be handled entirely through the GUI we'd need +to add support in the GUI for all the operations from sysinv. + +This will not be implemented in the initial release. + +Alternatives +------------ + +kubernetes v1.18.1 supports renewing certificates via the +"kubeadm alpha certs renew" command [2]_. Certificates that kubeadm can renew +include admin.conf, apiserver, apiserver-kubelet-client, +controller-manager.conf and scheduler.conf. It doesn't support renewal of the +root CA certificate or kubelet client certificates.
+ +We could update /etc/kubernetes/pki/ca.crt and /etc/kubernetes/pki/ca.key with +a new root CA cert and use kubeadm to update the certificates it supports, but +this procedure won't be a rolling update and will cause a service outage. We +would still have to handle kubelet client certificates as they are not managed +by kubeadm. + +Notably, this alternative would be a lengthy, manual and error-prone +procedure. + +Data model impact +----------------- + +In order to track the progress of the update, the following tables in the +sysinv database are required. + +* kube_rootca_update + + * created/updated/deleted_at: as per other tables + * id: as per other tables + * uuid: as per other tables + * from_rootca_cert: character (255), the id of the old root CA cert + * to_rootca_cert: character (255), the id of the new root CA cert + * state: character (255), the state of the update + +* kube_rootca_host_update + + * created/updated/deleted_at: as per other tables + * id: as per other tables + * uuid: as per other tables + * target_rootca_cert: character (255), the id of the new root CA cert + * effective_rootca_cert: character (255), the id of the current root CA cert + * state: character (255), the state of the update + * host_id: foreign key (i_host.id) + +REST API impact +--------------- + +New sysinv REST APIs will be added to implement the certificate update logic on +top of the existing sysinv API framework. The actual certificate update in the +API implementation will be performed by sysinv-agent applying runtime puppet +manifests on each host.
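The per-host ``state`` values reported through these APIs follow a regular naming pattern in the CLI step list earlier in this spec. A minimal Python sketch of that pattern is given here for illustration; the ``Phase`` enum and the ``host_state`` helper are hypothetical names introduced for this sketch and are not part of sysinv. Only the phase names and state strings come from the spec.

```python
from enum import Enum
from typing import Optional


class Phase(str, Enum):
    """Host update phases named in this spec, in execution order."""
    TRUST_BOTH_CAS = "trustBothCAs"
    UPDATE_CERTS = "updateCerts"
    TRUST_NEW_CA = "trustNewCA"


def host_state(phase: str, succeeded: Optional[bool] = None) -> str:
    """Build the host progress state string for a phase (hypothetical helper).

    succeeded=None means the phase is still in progress.
    """
    name = Phase(phase).value  # raises ValueError for an unknown phase
    if succeeded is None:
        return f"updating-host-{name}"
    if succeeded:
        return f"updated-host-{name}"
    return f"updating-host-{name}-failed"


print(host_state("trustBothCAs"))
print(host_state("updateCerts", succeeded=True))
print(host_state("trustNewCA", succeeded=False))
```

Rejecting unknown phase names up front is one way the semantic checks mentioned in this spec could guard the ``--phase`` argument; the actual checks are left to the implementation.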
+ +The following is the list of REST resources and APIs to be added: + +The new resource /kube_update_ca is added +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* URLs: + + * /v1/kube_update_ca + +* Request Methods: + + * POST /v1/kube_update_ca + + * Creates (starts) a new root CA cert update + + * Response body example:: + + {"from_rootca_cert": "kubenetes-5118144266510589551", + "state": "update-started", + "uuid": "223ba65e-45d1-4383-baa7-f03bb4c46773", + "created_at": "2021-03-25T12:04:10.372399+00:00", + "updated_at": "2021-03-25T12:04:10.372399+00:00"} + + * GET /v1/kube_update_ca + + * Return the current kube_update_ca + + * Response body example:: + + {"from_rootca_cert": "kubenetes-5118144266510589551", + "to_rootca_cert": "kubenetes-6118144266510589551", + "state": "update-started", + "uuid": "223ba65e-45d1-4383-baa7-f03bb4c46773", + "created_at": "2021-03-25T12:04:10.372399+00:00", + "updated_at": "2021-03-25T14:45:43.252964+00:00"} + + * PATCH /v1/kube_update_ca + + * Modifies the current rootca_update. Used to update the state of the + update (e.g. to update-complete). + + * Response body example:: + + {"from_rootca_cert": "kubenetes-5118144266510589551", + "to_rootca_cert": "kubenetes-6118144266510589551", + "state": "update-complete", + "uuid": "223ba65e-45d1-4383-baa7-f03bb4c46773", + "created_at": "2021-03-25T12:04:10.372399+00:00", + "updated_at": "2021-03-25T14:45:43.252964+00:00"} + + * DELETE /v1/kube_update_ca + + * Deletes the current rootca_update (after it is completed) + +The new resource /kube_rootca_certificate/upload is added +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* URLs: + + * /v1/kube_rootca_certificate/upload + +* Request Methods: + + * POST /v1/kube_rootca_certificate/upload + + * Upload a root CA cert and key from a file + + * Request body example:: + + {"ca.crt": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMyRENDQWNDZ0...",
+ "ca.key": "LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcGdJQk..."} + + * Response body example:: + + {"cert_id": "kubenetes-5118144266510589551"} + +The new resource /kube_rootca_certificate/generate is added +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* URLs: + + * /v1/kube_rootca_certificate/generate + +* Request Methods: + + * POST /v1/kube_rootca_certificate/generate + + * Tell sysinv to generate a new root CA cert and key pair + + * Response body example:: + + {"cert_id": "kubenetes-5118144266510589551"} + +The existing resource /ihosts is modified to add new actions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* URLs: + + * /v1/ihosts/ + +* Request Methods: + + * POST /v1/ihosts//kube_update_ca + + * Update root CA cert on the specified host + + * Request body example:: + + {"phase": "trustBothCAs"} + + * Response body example:: + + {"id": "4", + "hostname": "controller-1", + "personality": "controller", + "target_rootca_cert": "kubenetes-6118144266510589551", + "effective_rootca_cert": "kubenetes-5118144266510589551", + "state": "updating-host-trustBothCAs"} + +The new resource /kube_hosts_update_ca +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* URLs: + + * /v1/kube_hosts_update_ca + +* Request Methods: + + * GET /v1/kube_hosts_update_ca + + * Returns the update details of all hosts + + * Response body example:: + + { + "hosts": [ + {"id": "2", + "hostname": "controller-1", + "personality": "controller", + "target_rootca_cert": "kubenetes-6118144266510589551", + "effective_rootca_cert": "kubenetes-5118144266510589551", + "state": "updating-host-trustBothCAs" + }, + {"id": "4", + "hostname": "compute-0", + "personality": "compute", + "target_rootca_cert": "kubenetes-6118144266510589551", + "effective_rootca_cert": "kubenetes-5118144266510589551", + "state": "updating-host-updateCerts" + } + ] + } + +Security impact +--------------- + +The new sysinv APIs will be added within the existing framework, so there are +no
changes to the existing security model. + +The feature provides a mechanism to update kubernetes certificates. +Frequent or routine certificate updates will enhance cluster security. + +Other end user impact +--------------------- + +End users will typically perform kubernetes root CA certificate updates using +the sysinv (i.e. system) CLI. The new CLI commands are shown in the Proposed +change section above. + +Performance Impact +------------------ + +When a root CA certificate update is in progress, kubernetes components +(apiserver, scheduler, controller-manager, kubelet) and application pods will +be restarted. Since the update is a rolling update, the system will +function as usual, but there will be a small performance impact during the +update. The user should update the hosts sequentially so the impact can be +minimized. + +Other deployer impact +--------------------- + +Deployers will now be able to update the root CA certificate on a running +system in a rolling fashion. + +Developer impact +---------------- + +Developers working on the StarlingX components that manage container +applications may need to be aware that certain operations should be prevented +when a root CA update is in progress, since these components will be restarted +during the update. + +Developers working on application pods may also need to be aware that certain +operations should be prevented when a root CA update is in progress as pods will +be restarted during the update. + +Generally speaking, there shouldn't be any deployment or development activities +on the system when an update is in progress. A maintenance window is a good time +to do the update. + +Upgrade impact +-------------- + +The newly added root CA update tables in the sysinv database need to be +created during an upgrade from a release without this feature to a release +with this feature. The tables will initially be empty.
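As a rough illustration of the upgrade step, the two tables from the Data model impact section could be created as follows. This is only a sketch under stated assumptions: it uses Python's standard ``sqlite3`` module with an in-memory database as a stand-in for the real sysinv database, and the ``uuid`` width, the timestamp column types, the ``i_host`` stub table and the function names are all assumptions made for this example rather than details taken from this spec.

```python
import sqlite3

# Column names follow the "Data model impact" section. The SQLite types,
# the uuid width and the i_host stub are assumptions for this sketch only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS i_host (id INTEGER PRIMARY KEY);

CREATE TABLE IF NOT EXISTS kube_rootca_update (
    id INTEGER PRIMARY KEY,
    uuid VARCHAR(36) UNIQUE,
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    deleted_at TIMESTAMP,
    from_rootca_cert VARCHAR(255),
    to_rootca_cert VARCHAR(255),
    state VARCHAR(255)
);

CREATE TABLE IF NOT EXISTS kube_rootca_host_update (
    id INTEGER PRIMARY KEY,
    uuid VARCHAR(36) UNIQUE,
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    deleted_at TIMESTAMP,
    target_rootca_cert VARCHAR(255),
    effective_rootca_cert VARCHAR(255),
    state VARCHAR(255),
    host_id INTEGER REFERENCES i_host (id)
);
"""


def create_rootca_update_tables(conn: sqlite3.Connection) -> None:
    """Create the tracking tables; safe to call more than once."""
    conn.executescript(SCHEMA)


def row_count(conn: sqlite3.Connection, table: str) -> int:
    """Return the number of rows in a table (table name is trusted here)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
create_rootca_update_tables(conn)
create_rootca_update_tables(conn)  # second call is a no-op
print(row_count(conn, "kube_rootca_update"),
      row_count(conn, "kube_rootca_host_update"))
```

Using ``IF NOT EXISTS`` keeps the step idempotent, which may be convenient if an upgrade script is retried; whether the real upgrade script needs that property is not stated in this spec.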
+ +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + +* Andy Ning (andy.wrs) + +Other contributors: + +* Soubihe, Joao Paulo (jsoubihe) + +Repos Impacted +-------------- + +Repos impacted by this spec: +* config +* stx-puppet + +Work Items +---------- + +Sysinv +^^^^^^ + +* New DB tables and APIs to access them + +* kube-rootca-update-start CLI/API + + * basic infrastructure + * semantic and system health checks for update start + * raise alarm to prevent upgrade, patching, etc. + +* kube-rootca-certificate-upload CLI/API + + * basic infrastructure + * semantic checks + * root CA issuer creation in cert-manager + * calculate the ID of the new root certificate + +* kube-rootca-certificate-generate CLI/API + + * basic infrastructure + * root CA certificate and issuer creation in cert-manager + * calculate the ID of the new root certificate + +* kube-rootca-host-update --phase=trustBothCAs CLI/API + + * basic infrastructure + * semantic checks + * conductor RPC/implementation (generate hieradata, call agent to apply + puppet manifests, handle apply result, update host state etc...) + * agent RPC/implementation (apply puppet manifest, report back config + status, etc...) + +* kube-rootca-pods-update --phase=trustBothCAs CLI/API + + * basic infrastructure + * semantic checks + * conductor implementation (generate hieradata, trigger puppet + manifests apply, handle apply result, update progress state etc...) + +* kube-rootca-host-update --phase=updateCerts CLI/API + + * basic infrastructure + * semantic checks + * conductor RPC/implementation (generate certificates and hieradata, call + agent to apply puppet manifests, handle apply result, update host state + etc...) + * agent RPC/implementation (apply puppet manifest, report back config + status, etc...)
+ +* kube-rootca-host-update --phase=trustNewCA CLI/API + + * basic infrastructure + * semantic checks + * conductor RPC/implementation (generate hieradata, call agent to apply + puppet manifests, handle apply result, update host state etc...) + * agent RPC/implementation (apply puppet manifest, report back config + status, etc...) + +* kube-rootca-pods-update --phase=trustNewCA CLI/API + + * basic infrastructure + * semantic checks + * conductor implementation (generate hieradata, trigger puppet + manifests apply, handle apply result, update progress state etc...) + +* kube-rootca-update-complete CLI/API + + * basic infrastructure + * semantic checks + * clear the update in progress alarm + * system health checks for update complete + +* kube-rootca-update-show CLI/API + + * basic infrastructure + * conductor database query + +* kube-rootca-update-list CLI/API + + * basic infrastructure + * conductor database query + +Puppet +^^^^^^ + +* runtime manifest for host update trustBothCAs phase +* runtime manifest for host update updateCerts phase +* runtime manifest for host update trustNewCA phase + +System Upgrade +^^^^^^^^^^^^^^ + +* Upgrade script to create the new tables in the sysinv database when upgrading + from a release without this feature. The tables will have default empty + values. + +Dependencies +============ + +None + +Testing +======= + +The feature must be tested in the following StarlingX configurations: + +* AIO-SX +* AIO-DX +* Standard with at least one kubernetes worker node + +The tests can be performed on hardware or in virtual environments. + +Documentation Impact +==================== + +New end user documentation will be required to describe how kubernetes root CA +certificate update should be done. The config API reference will also need +updates. + +References +========== + +.. [1] https://kubernetes.io/docs/tasks/tls/manual-rotation-of-ca-certificates +.. 
[2] https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/ + + +History +======= + +.. list-table:: Revisions + :header-rows: 1 + + * - Release Name + - Description + * - stx-6.0 + - Introduced