..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. http://creativecommons.org/licenses/by/3.0/legalcode

======================================
Secret Management with Hashicorp Vault
======================================

Storyboard: https://storyboard.openstack.org/#!/story/2007718

This story introduces Vault into the StarlingX solution in order to
support secret management.

Problem description
===================

Users want the ability to store and access secrets securely. These secrets can
include credentials, encryption keys, API tokens and other data that should
not be stored in plain text on a system. Vault provides the ability to encrypt
and store secrets with access control via a range of authorization and access
policy configurations.

Vault's features include dynamic secret generation, data encryption, leasing
and renewal, revocation and audit/logging. Vault is an open source project
and is supported for deployment in a Kubernetes cluster. More information is
available at: https://www.vaultproject.io/docs

Use Cases
---------

* Developer wants to store secrets to be consumed by an application
* Developer wants to submit data for encryption before storing it elsewhere
* End user wants granular access control for secrets with detailed logs
* End user wants to dynamically generate API keys for scripts

Proposed change
===============

Packaging & installation
------------------------

StarlingX will contain the Armada manifest and Vault helm charts, which will
be uploaded and applied to pull the container images from a public registry,
and to configure and launch the Vault pods.

The Vault application will initially be packaged as a tarball that can be
transferred to the system and activated with ``system application-upload``
and ``system application-apply``.

In the future, the app will be auto-installed during platform deployment in
order to provide secret management for other StarlingX applications.

The Vault application will provide system overrides and support auto-apply on
system configuration changes. This specifically relates to the Vault HA
configuration parameters, which will be set based on the StarlingX deployment
type.

The user will be able to override the application's default configuration
parameters via user overrides.

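As an illustration of how the HA parameters could be derived from the
deployment type, the sketch below maps a system type to overrides for the
Vault helm chart. The deployment type names and replica counts are assumptions
made for this example and are not fixed by this spec. The ``server.ha`` keys
follow the layout of the official Vault helm chart values.

```python
# Hypothetical mapping from deployment type to Vault server replica count.
# The names and counts are illustrative assumptions, not StarlingX defaults.
REPLICAS_BY_DEPLOYMENT = {
    "aio-simplex": 1,
    "standard": 3,
}


def vault_ha_overrides(deployment_type):
    """Return helm overrides that size the Vault cluster for a deployment."""
    replicas = REPLICAS_BY_DEPLOYMENT[deployment_type]
    ha_enabled = replicas > 1
    return {
        "server": {
            "ha": {
                "enabled": ha_enabled,
                "replicas": replicas,
                # Integrated storage (Raft) replicates data between replicas.
                "raft": {"enabled": ha_enabled},
            }
        }
    }


print(vault_ha_overrides("standard"))
```

A user override would then simply replace the values produced here, which
keeps the system defaults and the user's choices in the same structure.
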
Vault integration for platform secrets
--------------------------------------

StarlingX platform apps currently using Barbican for secrets storage will be
migrated to use Vault. This will include the existing IPMI credentials and
external Docker registry credentials currently stored in Barbican. Vault is
being added as a cloud-native/container-based solution that can more easily
provide a robust feature set to hosted container applications.

Vault storage backend
---------------------

Vault supports several storage backends where encrypted data is persisted.
The suitable options for StarlingX are the Consul backend and the integrated
storage backend with the Raft consensus protocol. Integrated storage with Raft
is preferred for the following reasons:

- Does not require the integration and deployment of another separate app.
  Consul is a service mesh that offers many other unrelated features.
  Deploying it would significantly broaden the maintenance scope for the
  Vault app.

- Supports HA and provides backup/restore workflows.

- Simple to deploy and understand. The official Vault helm charts
  automatically create and mount a data storage PVC for each Vault pod, and
  the Raft consensus protocol is used to replicate Vault data to each pod.

- The Raft consensus protocol is at the heart of etcd in Kubernetes, so
  knowledge is transferable between the solutions.

The following table illustrates the failure tolerance for various cluster
sizes. Quorum is a strict majority of the servers, so an even-sized cluster
tolerates no more failures than the odd-sized cluster that is one server
smaller. The Vault documentation recommends cluster sizes of 3 or 5.

+---------+-------------+-------------------+
| Servers | Quorum Size | Failure Tolerance |
+=========+=============+===================+
| 1       | 1           | 0                 |
+---------+-------------+-------------------+
| 2       | 2           | 0                 |
+---------+-------------+-------------------+
| 3       | 2           | 1                 |
+---------+-------------+-------------------+
| 4       | 3           | 1                 |
+---------+-------------+-------------------+
| 5       | 3           | 2                 |
+---------+-------------+-------------------+
| 6       | 4           | 2                 |
+---------+-------------+-------------------+
| 7       | 4           | 3                 |
+---------+-------------+-------------------+

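The figures in the table follow from the majority rule used by Raft. A minimal
sketch that reproduces them, where the function name ``raft_quorum`` is chosen
for this example only:

```python
def raft_quorum(servers):
    """Return (quorum_size, failure_tolerance) for a Raft cluster."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    # A quorum is a strict majority of the servers.
    quorum = servers // 2 + 1
    # The cluster stays available while a quorum of servers remains.
    return quorum, servers - quorum


for n in range(1, 8):
    quorum, tolerance = raft_quorum(n)
    print(f"{n} servers: quorum {quorum}, tolerates {tolerance} failure(s)")
```
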
For additional information on storage backends, see:
https://www.vaultproject.io/docs/configuration/storage

Alternatives
------------

In the secret management ecosystem, Vault offers what can be described as a
'secret service', meaning that it provides secrets as a service to consumers.
This differs from other secret management tools that are focused on file
encryption. Many of the other products in the secret service category are
cloud provider products and cannot be directly integrated into a
provider-agnostic platform.

Vault is the most fully featured and documented of the available self-hosted
options. Vault also provides official helm charts and will continue to
extend its cloud integration in the future.

For a helpful comparison of secret management tools, see:
https://amido.com/app/uploads/2019/04/Secret-Management-tooling-Table-Tools.png

Data model impact
-----------------

Applications currently consuming secrets from Barbican will require code
changes to make the appropriate REST API calls to access Vault. This will also
require defining a policy schema for accessing the secrets in Vault, along
with selecting a method of authorizing the applications to access Vault.
Barbican relies on Keystone as its authorization backend, while Vault supports
a range of authorization backends. Vault's Kubernetes auth method will be used
to authenticate apps running within the Kubernetes cluster. The AppRole auth
method will be used for apps running on the underlying OS.

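To make the two login paths concrete, the sketch below builds, without
sending, the login requests for the Kubernetes and AppRole auth methods. It
assumes both methods are mounted at their default paths (``kubernetes`` and
``approle``). The Vault address and the role name are hypothetical
placeholders.

```python
import json
import urllib.request

VAULT_ADDR = "http://vault.example:8200"  # hypothetical address


def _login_request(addr, mount, payload):
    """Build (but do not send) a login request for an auth method mount."""
    return urllib.request.Request(
        f"{addr}/v1/auth/{mount}/login",
        data=json.dumps(payload).encode(),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def kubernetes_login_request(role, service_account_jwt, addr=VAULT_ADDR):
    # Pods present their service account token together with a Vault role.
    return _login_request(
        addr, "kubernetes", {"role": role, "jwt": service_account_jwt}
    )


def approle_login_request(role_id, secret_id, addr=VAULT_ADDR):
    # Host-level apps present a role ID and secret ID pair instead.
    return _login_request(
        addr, "approle", {"role_id": role_id, "secret_id": secret_id}
    )


def client_token(login_response):
    # A successful login returns the token under auth.client_token.
    return login_response["auth"]["client_token"]
```

In both cases the application sends the returned token in the
``X-Vault-Token`` header on later requests, so the remaining client code does
not depend on which auth method was used.
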
REST API impact
---------------

Vault provides a new REST API for setting and consuming secrets and for
encryption services.

For API documentation, see:
https://www.vaultproject.io/api-docs

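As a rough illustration of this API, the sketch below builds, without sending,
requests to write and read a secret and to encrypt data. It assumes a
version 2 key/value engine mounted at ``secret/`` and a transit engine mounted
at ``transit/``. The address, token, secret path and key name used with it are
hypothetical.

```python
import base64
import json
import urllib.request


def _request(addr, token, method, path, payload=None):
    """Build (but do not send) an authenticated Vault API request."""
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        f"{addr}/v1/{path}",
        data=data,
        method=method,
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


def kv_write_request(addr, token, path, secret):
    # KV version 2 wraps the secret under a top-level "data" key.
    return _request(addr, token, "POST", f"secret/data/{path}",
                    {"data": secret})


def kv_read_request(addr, token, path):
    return _request(addr, token, "GET", f"secret/data/{path}")


def transit_encrypt_request(addr, token, key_name, plaintext):
    # The transit engine expects base64-encoded plaintext.
    encoded = base64.b64encode(plaintext).decode()
    return _request(addr, token, "POST", f"transit/encrypt/{key_name}",
                    {"plaintext": encoded})


def secret_from_read(response):
    # KV version 2 read responses nest the secret under data.data.
    return response["data"]["data"]
```
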
Security impact
---------------

Vault is a security-focused application which aims to enhance the security
posture of the apps that consume its services. Concerns raised by this app
integration are discussed here.

Handling of the root token and master key shards
________________________________________________

Initialization of a Vault server generates a root token used for logging into
Vault, as well as a configurable number of master key shards used for
unsealing Vault and for rekey/recovery operations.

Vault documentation recommends revoking the initial root token after initial
configuration is completed and other authorization methods have been made
available. Root tokens can be regenerated with a quorum of key shards present.

The key shards are the linchpin of Vault's security and are used to decrypt
the master key in Vault's memory during unsealing. These shards must be
preserved, but they invalidate Vault's security if they are compromised.

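The number of shards and the quorum needed to use them are chosen at
initialization time. A minimal sketch of the payload for Vault's ``sys/init``
endpoint follows. The defaults of 5 shares and a threshold of 3 mirror Vault's
own defaults, and the helper names are invented for this example.

```python
def init_payload(secret_shares=5, secret_threshold=3):
    """Return the request body for initializing Vault via sys/init."""
    if not 1 <= secret_threshold <= secret_shares:
        raise ValueError("threshold must be between 1 and the share count")
    if secret_shares > 1 and secret_threshold == 1:
        # Several shards with a threshold of one would let any single
        # shard holder unseal Vault, which defeats the purpose of splitting.
        raise ValueError("threshold must exceed 1 when using several shares")
    return {
        "secret_shares": secret_shares,
        "secret_threshold": secret_threshold,
    }


def split_init_response(response):
    """Separate the key shards from the root token in an init response."""
    return response["keys_base64"], response["root_token"]


print(init_payload())
```

Keeping the shards and the root token apart from the moment they are returned
makes it easier to revoke the token early while the shards go to their
longer-term storage.
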
Unsealing Vault server pods
___________________________

Vault servers automatically come up in a sealed state where they are unable to
decrypt any of their stored data until a quorum of key shards is provided
during an unseal operation. This means that Vault servers running in a cluster
will be sealed if their pods are restarted for any reason.

This challenge can be handled by configuring Vault to auto-unseal by querying
an external Key Management System (e.g. from a cloud provider) for unseal
keys, querying another Vault server for unseal keys, or by storing the unseal
keys somewhere and configuring Kubernetes to unseal Vault via the REST API
when a pod restarts.

In the short term, this integration will store the key shards locally in a
keyring with root-only permissions. This will require a custom script that
allows Kubernetes to access these keys and provide them to the pods via the
Vault REST API upon restart.

In the longer term, support will be extended to configure either an external
KMS or a second external Vault instance for auto-unseal operations.

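The short-term approach amounts to submitting stored shards to a restarted pod
until it reports that it is unsealed. A minimal sketch, assuming the shards
have already been read from the keyring and that the pod is reachable at the
given address (both outside the scope of this example):

```python
import json
import urllib.request


def submit_shard(addr, shard):
    """Send one key shard to sys/unseal and return the parsed seal status."""
    request = urllib.request.Request(
        f"{addr}/v1/sys/unseal",
        data=json.dumps({"key": shard}).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def unseal(addr, shards, submit=submit_shard):
    """Submit shards one at a time; return True once Vault is unsealed."""
    for shard in shards:
        status = submit(addr, shard)
        # Vault reports sealed=False as soon as the threshold is reached.
        if not status["sealed"]:
            return True
    return False
```

The ``submit`` parameter exists only so the loop can be exercised without a
running Vault server. A real script would also need to detect which pods are
sealed and handle transport errors.
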
Storing StarlingX system secrets alongside user secrets
_______________________________________________________

The strict policy enforcement and authorization structure of Vault is intended
for use cases where secrets must be isolated and their access must be finely
controlled. StarlingX system secrets and user secrets can therefore be kept in
the same Vault deployment and separated by policy.

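One way to picture this isolation is as a pair of Vault policies that grant
access to disjoint path prefixes. The sketch below renders such policies in
Vault's HCL policy syntax. The path layout (``secret/data/platform/`` and
``secret/data/users/``) is a hypothetical convention, not one defined by this
spec.

```python
def policy_hcl(rules):
    """Render a Vault policy from a {path: [capabilities]} mapping."""
    blocks = []
    for path, capabilities in rules.items():
        caps = ", ".join(f'"{cap}"' for cap in capabilities)
        blocks.append(f'path "{path}" {{\n  capabilities = [{caps}]\n}}')
    return "\n\n".join(blocks)


# Hypothetical policy for platform services: read-only platform secrets.
platform_policy = policy_hcl({"secret/data/platform/*": ["read"]})

# Hypothetical policy for an end user: full control of their own subtree.
user_policy = policy_hcl(
    {"secret/data/users/alice/*": ["create", "read", "update", "delete"]}
)

print(platform_policy)
print(user_policy)
```

Because Vault denies access by default, a token that carries only the user
policy cannot read anything under the platform prefix.
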
Other end user impact
---------------------

None

Developer impact
----------------

Developers working on StarlingX may wish to store secrets in Vault for use
with their applications. This will require considering how to set the secret,
what authorization method the application will use and whether the secret will
be required during platform deployment. Vault cannot be made available until
late in the deployment because it runs in the Kubernetes cluster.

Developers who are working on Kubernetes applications may want to leverage the
Vault injection feature, which injects secrets into a pod and makes them
available at a specified path. It does not require the application to be
Vault-aware.

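Injection is requested through pod annotations that the Vault agent injector
recognizes. A minimal sketch that assembles them follows. The role name,
secret name and secret path are hypothetical, and the injector itself must be
enabled in the Vault deployment for the annotations to have any effect.

```python
def injector_annotations(role, secrets):
    """Build pod annotations asking the Vault injector to render secrets.

    `secrets` maps a file name to the Vault path of the secret to render.
    """
    annotations = {
        "vault.hashicorp.com/agent-inject": "true",
        "vault.hashicorp.com/role": role,
    }
    for name, vault_path in secrets.items():
        key = f"vault.hashicorp.com/agent-inject-secret-{name}"
        annotations[key] = vault_path
    return annotations


# Hypothetical role and secret path, for illustration only.
print(injector_annotations("demo-app", {"db-creds": "secret/data/demo/db"}))
```

With these annotations on the pod template, the injector by default renders
each secret to a file under ``/vault/secrets/`` named after the annotation
suffix, so the application only has to read a file.
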
Upgrade impact
--------------

An upgrade path needs to be determined for migrating secrets from Barbican to
Vault.

Implementation
==============

Assignee(s)
-----------

Primary assignee:

* Cole Walker

Repos Impacted
--------------

* vault-armada-app
* ansible-playbooks

Work Items
----------

* Create new repo for the new application 'Vault', define required armada
  manifests and import helm charts for app

* Implement system overrides and define Vault as a mandatory platform app

* Auto-apply HA configuration based on system deployment type (AIO, Standard,
  etc.)

* Extend ansible role 'bringup-bootstrap-applications' to include Vault

* Write automation to auto unseal restarted Vault pods using master key shards

Dependencies
============

* Kubernetes cluster

Testing
=======

* Vault pods should return to a ready state after being restarted
  as indicated by ``kubectl get pods``

* User overrides should be available for various parameters including HA
  configuration, initialization parameters and audit configuration

* The deployment should auto apply HA settings based on the system type

* Users should be able to set and consume secrets

* Activities should be logged to the audit storage backend

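The first check in the list above could be automated by inspecting the JSON
form of the pod list. A minimal sketch, assuming the input is the parsed
output of ``kubectl get pods -o json`` for the namespace where Vault runs (the
namespace itself is not specified here):

```python
def pods_not_ready(pod_list):
    """Return the names of pods whose Ready condition is not True."""
    not_ready = []
    for pod in pod_list["items"]:
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            cond["type"] == "Ready" and cond["status"] == "True"
            for cond in conditions
        )
        if not ready:
            not_ready.append(pod["metadata"]["name"])
    return not_ready


# Hand-written sample in the shape of a Kubernetes pod list.
sample = {
    "items": [
        {"metadata": {"name": "vault-0"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "vault-1"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]
}
print(pods_not_ready(sample))
```

A test would restart the Vault pods, poll until this list is empty or a
timeout expires, and fail in the latter case.
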
Documentation Impact
====================

Documentation is to be updated with the user override configuration parameters
and with guidance on using Vault in StarlingX.

References
==========

* Feature storyboard: https://storyboard.openstack.org/#!/story/2007718
* Vault: https://www.vaultproject.io/

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - STX 5.0
     - Introduced