Initial spec for kubernetes root CA certificate update

Story: 2008675

Authored-By: Andy Ning <andy.ning@windriver.com>
Co-Authored-By: Joao Paulo Soubihe <joaopaulo.soubihe@windriver.com>
Signed-off-by: Andy Ning <andy.ning@windriver.com>
Change-Id: Ia09423afcf1762857a347d99f3cda8da2c4b1e77
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License. http://creativecommons.org/licenses/by/3.0/legalcode
=====================================
Kubernetes root CA certificate update
=====================================
Storyboard:
https://storyboard.openstack.org/#!/story/2008675
This feature introduces CLI/REST APIs and execution orchestration for updating
Kubernetes root CA certificate and certificates issued by the root CA in a
rolling fashion so that the impact on the system is minimized.
Problem description
===================
In a deployed Kubernetes cluster, the root CA certificate signs all the other
serving and client certificates used by various components for various
purposes. This root CA certificate may need to be updated for security or
administrative reasons while the cluster is still running.
An update mechanism is needed to update the root CA certificate and all the
certificates signed by the root CA certificate in a rolling fashion (i.e., with
minimal impact on the applications and services running in the cluster).
Currently Kubernetes doesn't provide such a mechanism out of the box. A manual
update procedure [1]_ is possible but it's lengthy and error-prone. This
feature introduces a set of CLI/REST APIs and execution orchestration to
simplify the procedure.
Use Cases
---------
* The cluster's root CA certificate approaches its expiry date, so the cloud
admin needs to update the root CA certificate in order for the cluster to
function continuously.
* The cloud admin decides to replace the root CA certificate with a new one for
security reasons.
Proposed change
===============
Enhance sysinv to support root CA certificate rolling update
------------------------------------------------------------
A rolling update procedure roughly based on [1]_ has been investigated. The
procedure consists of three phases. The first phase is to update kubernetes
components and pods to trust the new root CA certificate along with the old one
(trust both). The second phase is to update kubernetes components' server and
client certificates with new ones signed by the new root CA certificate. The
third phase is to remove the old root CA certificate from components' and pods'
trusted CA bundle so that only the new root CA certificate is trusted.
We will wrap this update procedure in sysinv CLI commands and supporting
APIs; VIM and DC orchestration of the procedure will follow in the future. This
is being done to hide the complexities of the underlying procedure, add
semantic checks and overall provide a simpler, less error-prone procedure,
analogous to the approach taken for other complex multi-host procedures such as
kubernetes upgrade, patching and system upgrades.
The overall feature will have multiple layers. The sysinv REST APIs and CLI are
the first layer, providing the fundamental implementation of the certificate
update. VIM orchestration is the second layer, executing the update across all
hosts in a cluster by utilizing the sysinv support. DC orchestration is the
third layer, executing VIM orchestration across all subclouds of a DC system.
There will also be a 4th layer in the future where cert-manager will manage the
kubernetes root CA certificate and key. cert-mon will monitor the certificate
and raise an alarm when it needs to be updated, so that the user can schedule
the orchestration of the update during a maintenance window.
The initial version of the spec will cover only the first layer, the sysinv
support for root CA certificate update. Changes include adding new system CLI
commands and sysinv REST APIs to the existing framework, adding logic to sysinv
conductor to generate required puppet hieradata, and adding new puppet runtime
manifests to be applied by sysinv agent to make the actual certificate update
on hosts.
Sysinv operations for root CA certificate update
------------------------------------------------
A new set of sysinv CLI commands will be introduced to simplify the update
procedure. It will be a procedure similar to software upgrade, with a start,
execute and complete cycle. There won't be support for "abort", but the user
can retry a command if it fails, and the user can choose to restart the update
procedure by uploading or re-generating a new root CA certificate. This also
provides a mechanism to revert to the original CA certificate if the user
chooses to upload the original CA certificate again.
The following is a summary of the CLI commands and the steps to perform
kubernetes root CA certificate update.
1. system kube-rootca-update-start
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Pre-check to validate the update, initialize the procedure and mark update
progress as update-started.
2. system kube-rootca-certificate-generate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Generates a new kubernetes root CA certificate
* Change progress state to update-new-rootca-cert-generated
2. system kube-rootca-certificate-upload
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* User can choose to use this command to upload a new kubernetes root CA
certificate and private key from a file instead of generating one
* Change progress state to update-new-rootca-cert-uploaded
3. system kube-rootca-host-update <hostname> --phase=trustBothCAs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Update apiserver's trusted CAs to include the new CA cert
* Update scheduler's trusted CAs to include the new CA cert
* Update controller-manager's trusted CAs to include the new CA cert
* Update kubelet's trusted CAs to include the new CA cert
* Update admin.conf's trusted CAs to include the new CA cert
* Change progress state to updated-host-trustBothCAs on success
* Change progress state to updating-host-trustBothCAs-failed on failure
4. system kube-rootca-pods-update --phase=trustBothCAs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Annotate Daemonsets and Deployments to trigger pod replacement in a safe
rolling fashion, ensuring that pods pick up the new root CA cert as a trusted
CA along with the old root CA certificate
* Change progress state to updated-pods-trustBothCAs on success
* Change progress state to updating-pods-trustBothCAs-failed on failure
5. system kube-rootca-host-update <hostname> --phase=updateCerts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Update admin.conf's client cert/key data with new ones signed by the
new root CA
* Update apiserver's server and client certs/keys with new ones signed by the
new root CA
* Update scheduler's client cert/key with new one signed by the new root CA
* Update controller-manager's client cert/key with new one signed by the new
root CA
* Update kubelet's client cert/key with new one signed by the new root CA
* Change progress state to updated-host-updateCerts on success
* Change progress state to updating-host-updateCerts-failed on failure
6. system kube-rootca-host-update <hostname> --phase=trustNewCA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Update admin.conf's trusted CAs to remove the old root CA
* Update apiserver's trusted CAs to remove the old root CA
* Update controller-manager's trusted CAs to remove the old root CA
* Update scheduler's trusted CAs to remove the old root CA
* Update kubelet's trusted CAs to remove the old root CA
* Change progress state to updated-host-trustNewCA on success
* Change progress state to updating-host-trustNewCA-failed on failure
7. system kube-rootca-pods-update --phase=trustNewCA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Annotate Daemonsets and Deployments to trigger pod replacement in a safe
rolling fashion, removing the old root CA from the pods' trusted CA list
* Change progress state to updated-pods-trustNewCA on success
* Change progress state to updating-pods-trustNewCA-failed on failure
8. system kube-rootca-update-complete
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Post-check to verify the update
* Change the progress state to update-complete
system kube-rootca-update-list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Run this command anytime to show the update status of all hosts in the
cluster
system kube-rootca-update-show
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Run this command anytime to show the overall update status
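For quick reference, the success and failure progress states associated with
each of the commands above can be summarized as follows. This is a
non-normative Python sketch; the state names are taken directly from the steps
above and may differ slightly in the final implementation::

  # Non-normative sketch: each update step mapped to its
  # (success state, failure state) pair as described above.
  STEP_STATES = {
      "kube-rootca-update-start":
          ("update-started", None),
      "kube-rootca-certificate-generate":
          ("update-new-rootca-cert-generated", None),
      "kube-rootca-certificate-upload":
          ("update-new-rootca-cert-uploaded", None),
      "kube-rootca-host-update --phase=trustBothCAs":
          ("updated-host-trustBothCAs", "updating-host-trustBothCAs-failed"),
      "kube-rootca-pods-update --phase=trustBothCAs":
          ("updated-pods-trustBothCAs", "updating-pods-trustBothCAs-failed"),
      "kube-rootca-host-update --phase=updateCerts":
          ("updated-host-updateCerts", "updating-host-updateCerts-failed"),
      "kube-rootca-host-update --phase=trustNewCA":
          ("updated-host-trustNewCA", "updating-host-trustNewCA-failed"),
      "kube-rootca-pods-update --phase=trustNewCA":
          ("updated-pods-trustNewCA", "updating-pods-trustNewCA-failed"),
      "kube-rootca-update-complete":
          ("update-complete", None),
  }
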
VIM Orchestration Operations
----------------------------
Refer to future spec
DC Orchestration Operations
---------------------------
Refer to future spec
cert-mon monitoring and alarm raising
-------------------------------------
Refer to future spec
Fault Handling
--------------
After the update starts, the user can retry a step that fails. At any step
before update-complete, the user can choose to upload or regenerate a new root
CA certificate and start the update procedure again. This provides a mechanism
to recover from a step that fails multiple times, as well as a mechanism to
restore the original root CA certificate.
CLI Clients
-----------
We will extend the existing system clients to add the new commands.
Web GUI
-------
If we want to allow the update to be handled entirely through the GUI, we'd
need to add support in the GUI for all the operations from sysinv.
This will not be implemented in the initial release.
Alternatives
------------
Kubernetes v1.18.1 supports renewing certificates via the
"kubeadm alpha certs renew" command [2]_. Certificates that can be renewed by
kubeadm include admin.conf, apiserver, apiserver-kubelet-client,
controller-manager.conf and scheduler.conf. It doesn't support renewal of the
root CA certificate or the kubelet client certificates.
We could update /etc/kubernetes/pki/ca.crt and /etc/kubernetes/pki/ca.key with
a new root CA cert and use kubeadm to update the certificates it supports, but
this procedure wouldn't be a rolling update and would cause a service outage.
We would still have to handle the kubelet client certificates, as they are not
managed by kubeadm.
Notably, this alternative would be a lengthy, manual and error-prone procedure.
Data model impact
-----------------
In order to track the progress of the update, the following tables are
required in the sysinv database.
* kube_rootca_update
* created_at/updated_at/deleted_at: as per other tables
* id: as per other tables
* uuid: as per other tables
* from_rootca_cert: character (255), the id of the old root CA cert
* to_rootca_cert: character (255), the id of the new root CA cert
* state: character (255), the state of the update
* kube_rootca_host_update
* created_at/updated_at/deleted_at: as per other tables
* id: as per other tables
* uuid: as per other tables
* target_rootca_cert: character (255), the id of the new root CA cert
* effective_rootca_cert: character (255), the id of the current root CA cert
* state: character (255), the state of the update
* host_id: foreign key (i_host.id)
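For illustration only, a sysinv database migration for these tables could look
roughly as follows. This is a minimal sketch assuming sqlalchemy-migrate style
migrations (as used by existing sysinv migrations); column types and the final
schema may differ::

  # Sketch of a migration creating the two new tables described above.
  # Not the final schema; column names follow the lists in this spec.
  from sqlalchemy import (Column, DateTime, ForeignKey, Integer, MetaData,
                          String, Table)

  def upgrade(migrate_engine):
      meta = MetaData(bind=migrate_engine)
      Table('i_host', meta, autoload=True)   # needed for the foreign key

      Table(
          'kube_rootca_update', meta,
          Column('created_at', DateTime),
          Column('updated_at', DateTime),
          Column('deleted_at', DateTime),
          Column('id', Integer, primary_key=True, nullable=False),
          Column('uuid', String(36), unique=True),
          Column('from_rootca_cert', String(255)),   # id of the old root CA cert
          Column('to_rootca_cert', String(255)),     # id of the new root CA cert
          Column('state', String(255)),
      ).create()

      Table(
          'kube_rootca_host_update', meta,
          Column('created_at', DateTime),
          Column('updated_at', DateTime),
          Column('deleted_at', DateTime),
          Column('id', Integer, primary_key=True, nullable=False),
          Column('uuid', String(36), unique=True),
          Column('target_rootca_cert', String(255)),
          Column('effective_rootca_cert', String(255)),
          Column('state', String(255)),
          Column('host_id', Integer, ForeignKey('i_host.id')),
      ).create()
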
REST API impact
---------------
New sysinv REST APIs will be added to implement the certificate update logic on
top of the existing sysinv API framework. The actual certificate update in the
API implementation will be performed by the sysinv-agent applying runtime
puppet manifests on each host.
The following is the list of REST resources and APIs to be added:
The new resource /kube_update_ca is added
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* URLS:
* /v1/kube_update_ca
* Request Methods:
* POST /v1/kube_update_ca
* Creates (starts) a new root CA cert update
* Response body example::
{"from_rootca_cert": "kubenetes-5118144266510589551",
"state": "update-started",
"uuid": "223ba65e-45d1-4383-baa7-f03bb4c46773",
"created_at": "2021-03-25T12:04:10.372399+00:00",
"updated_at": "2021-03-25T12:04:10.372399+00:00"}
* GET /v1/kube_update_ca
* Return the current kube_update_ca
* Response body example::
{"from_rootca_cert": "kubenetes-5118144266510589551",
"to_rootca_cert": "kubenetes-6118144266510589551",
"state": "update-started",
"uuid": "223ba65e-45d1-4383-baa7-f03bb4c46773",
"created_at": "2021-03-25T12:04:10.372399+00:00",
"updated_at": "2021-03-25T14:45:43.252964+00:00"}
* PATCH /v1/kube_update_ca
* Modifies the current rootca_update. Used to update the state of the
update (e.g. to update-complete).
* Response body example::
{"from_rootca_cert": "kubenetes-5118144266510589551",
"to_rootca_cert": "kubenetes-6118144266510589551",
"state": "update-complete",
"uuid": "223ba65e-45d1-4383-baa7-f03bb4c46773",
"created_at": "2021-03-25T12:04:10.372399+00:00",
"updated_at": "2021-03-25T14:45:43.252964+00:00"}
* DELETE /v1/kube_update_ca
* Deletes the current rootca_update (after it is completed)
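As an illustration of the lifecycle, the resource could be driven directly over
REST as follows. This is a sketch only: the endpoint URL, token handling and
the PATCH body format are assumptions, and in practice the system CLI performs
these calls::

  # Sketch: exercising the kube_update_ca lifecycle over REST.
  # SYSINV_URL and the token are placeholders; keystone authentication
  # and the exact PATCH body format are not defined by this spec.
  import requests

  SYSINV_URL = "http://controller:6385/v1"
  HEADERS = {"X-Auth-Token": "<token>"}

  # Start a new root CA update (state becomes "update-started")
  update = requests.post(SYSINV_URL + "/kube_update_ca", headers=HEADERS).json()

  # ...generate/upload the new cert and run the host/pod phases...

  # Check the overall update status at any time
  update = requests.get(SYSINV_URL + "/kube_update_ca", headers=HEADERS).json()

  # Mark the update complete, then delete the completed record
  requests.patch(SYSINV_URL + "/kube_update_ca", headers=HEADERS,
                 json=[{"op": "replace", "path": "/state",
                        "value": "update-complete"}])
  requests.delete(SYSINV_URL + "/kube_update_ca", headers=HEADERS)
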
The new resource /kube_rootca_certificate/upload is added
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* URLS:
* /v1/kube_rootca_certificate/upload
* Request Methods:
* POST /v1/kube_rootca_certificate/upload
* Upload a root CA cert and key from a file
* Request body example::
{"ca.crt": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMyRENDQWNDZ0..."
"ca.key": "LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcGdJQk..."}
* Return body example::
{"cert_id": "kubenetes-5118144266510589551"}
The new resource /v1/kube_rootca_certificate/generate is added
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* URLS:
* /v1/kube_rootca_certificate/generate
* Request Methods:
* POST /v1/kube_rootca_certificate/generate
* Tell sysinv to generate a new root CA cert and key pair
* Return body example::
{"cert_id": "kubenetes-5118144266510589551"}
The existing resource /ihosts is modified to add new actions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* URLS:
* /v1/ihosts/<hostid>
* Request Methods:
* POST /v1/ihosts/<hostid>/kube_update_ca
* Update root CA cert on the specified host
* Request body example::
{"phase", "trustBothCAs"}
* Response body example::
{"id": "4",
"hostname": "controller-1",
"personality": "controller",
"target_rootca_cert": "kubenetes-6118144266510589551",
"effective_rootca_cert": "kubenetes-5118144266510589551",
"state": "updating-host-trustBothCAs"}
The new resource /kube_hosts_update_ca
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* URLs:
* /v1/kube_hosts_update_ca
* Request Methods:
* GET /v1/kube_hosts_update_ca
* Returns the update details of all hosts
* Response body example::
{
"hosts": [
{"id": "2",
"hostname": "controller-1",
"personality": "controller",
"target_rootca_cert": "kubenetes-6118144266510589551",
"effective_rootca_cert": "kubenetes-5118144266510589551",
"state": "updating-host-trustBothCAs"
},
{"id": "4",
"hostname": "compute-0",
"personality": "compute",
"target_rootca_cert": "kubenetes-6118144266510589551",
"effective_rootca_cert": "kubenetes-5118144266510589551",
"state": "updating-host-updateCerts"
}
]
}
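For example, a script (or, later, VIM orchestration) could combine this
resource with the per-host action above to drive one phase across the hosts
sequentially, waiting for each host to report success before moving to the
next. This is a sketch only, with the same endpoint and authentication
assumptions as above::

  # Sketch: apply one update phase host by host and wait for each host
  # to reach the "updated-host-<phase>" state before continuing.
  import time
  import requests

  SYSINV_URL = "http://controller:6385/v1"
  HEADERS = {"X-Auth-Token": "<token>"}

  def host_states():
      resp = requests.get(SYSINV_URL + "/kube_hosts_update_ca", headers=HEADERS)
      return {h["id"]: h for h in resp.json()["hosts"]}

  def run_phase(phase):
      for host_id, host in host_states().items():
          requests.post("%s/ihosts/%s/kube_update_ca" % (SYSINV_URL, host_id),
                        headers=HEADERS, json={"phase": phase})
          while True:
              state = host_states()[host_id]["state"]
              if state == "updated-host-%s" % phase:
                  break                      # this host is done, move on
              if state.endswith("-failed"):
                  raise RuntimeError("%s failed phase %s"
                                     % (host["hostname"], phase))
              time.sleep(10)

  run_phase("trustBothCAs")
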
Security impact
---------------
The new sysinv APIs are added within the existing framework; there are no
changes to the existing security model.
The feature provides a mechanism to update kubernetes certificates.
Frequent or routine certificate updates will enhance cluster security.
Other end user impact
---------------------
End users will typically perform kubernetes root CA certificate update using
the sysinv (i.e. system) CLI. The new CLI commands are shown in the Proposed
change section above.
Performance Impact
------------------
When a root CA certificate update is in progress, kubernetes components
(apiserver, scheduler, controller-manager, kubelet) and application pods will
be restarted. Since the update is a rolling update, the system will continue to
function as usual, but there will be a small performance impact during the
update. The user should update the hosts sequentially so that the impact is
minimized.
Other deployer impact
---------------------
Deployers will now be able to update the root CA certificate on a running
system in a rolling fashion.
Developer impact
----------------
Developers working on the StarlingX components that manage container
applications may need to be aware that certain operations should be prevented
when a root CA update is in progress, since these components will be restarted
during the update.
Developers working on application pods may also need to be aware that certain
operations should be prevented when a root CA update is in progress as pods will
be restarted during the update.
Generally speaking, there shouldn't be any deployment or development activities
on the system when an update is in progress. A maintenance window is a good time
to do the update.
Upgrade impact
--------------
The newly added root CA update tables need to be created in the sysinv database
during upgrade from a release without this feature to a release with this
feature. The tables will initially be empty.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
* Andy Ning (andy.wrs)
Other contributors:
* Soubihe, Joao Paulo (jsoubihe)
Repos Impacted
--------------
Repos impacted by this spec:
* config
* stx-puppet
Work Items
----------
Sysinv
^^^^^^
* New DB tables and APIs to access them
* kube-rootca-update-start CLI/API
* basic infrastructure
* semantic and system health checks for update start
* raise alarm to prevent upgrade, patching, etc.
* kube-rootca-certificate-upload CLI/API
* basic infrastructure
* semantic checks
* root CA issuer creation in cert-manager
* calculate the ID of the new root certificate
* kube-rootca-certificate-generate CLI/API
* basic infrastructure
* root CA certificate and issuer creation in cert-manager
* calculate the ID of the new root certificate
* kube-rootca-host-update <hostname> --phase=trustBothCAs CLI/API
* basic infrastructure
* semantic checks
* conductor RPC/implementation (generate hieradata, call agent to apply
puppet manifests, handle apply result, update host state etc...)
* agent RPC/implementation (apply puppet manifest, report back config
status, etc...)
* kube-rootca-pods-update --phase=trustBothCAs CLI/API
* basic infrastructure
* semantic checks
* conductor implementation (generate hieradata, trigger puppet
manifests apply, handle apply result, update progress state etc...)
* kube-rootca-host-update <hostname> --phase=updateCerts CLI/API
* basic infrastructure
* semantic checks
* conductor RPC/implementation (generate certificates and hieradata, call
agent to apply puppet manifests, handle apply result, update host state
etc...)
* agent RPC/implementation (apply puppet manifest, report back config
status, etc...)
* kube-rootca-host-update <hostname> --phase=trustNewCA CLI/API
* basic infrastructure
* semantic checks
* conductor RPC/implementation (generate hieradata, call agent to apply
puppet manifests, handle apply result, update host state etc...)
* agent RPC/implementation (apply puppet manifest, report back config
status, etc...)
* kube-rootca-pods-update --phase=trustNewCA CLI/API
* basic infrastructure
* semantic checks
* conductor implementation (generate hieradata, trigger puppet
manifests apply, handle apply result, update progress state etc...)
* kube-rootca-update-complete CLI/API
* basic infrastructure
* semantic checks
* clear the update in progress alarm
* system health checks for update complete
* kube-rootca-update-show CLI/API
* basic infrastructure
* conductor database query
* kube-rootca-update-list CLI/API
* basic infrastructure
* conductor database query
Puppet
^^^^^^
* runtime manifest for host update trustBothCAs phase
* runtime manifest for host update updateCerts phase
* runtime manifest for host update trustNewCA phase
System Upgrade
^^^^^^^^^^^^^^
* Upgrade script to create the new tables in the sysinv database when upgrading
from a release without this feature. The tables will initially be empty.
Dependencies
============
None
Testing
=======
The feature must be tested in the following StarlingX configurations:
* AIO-SX
* AIO-DX
* Standard with at least one kubernetes worker node
Testing can be performed on hardware or in virtual environments.
Documentation Impact
====================
New end user documentation will be required to describe how kubernetes root CA
certificate update should be done. The config API reference will also need
updates.
References
==========
.. [1] https://kubernetes.io/docs/tasks/tls/manual-rotation-of-ca-certificates
.. [2] https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - stx-6.0
- Introduced