Merge "Add Project Teapot idea"

This commit is contained in:
Zuul 2020-03-09 19:14:38 +00:00 committed by Gerrit Code Review
commit 6f33f32257
11 changed files with 1388 additions and 1 deletions


@ -0,0 +1,178 @@
Teapot Compute
==============
Project Teapot is conceived as an exclusively bare-metal compute service for
Kubernetes clusters. Providing bare-metal compute workers to tenants allows
them to make their own decisions about how they make use of virtualisation. For
example, tenants can choose to use a container hypervisor (such as Kata_) to
further sandbox applications, traditional VMs (such as those managed by
KubeVirt_ or `OpenStack Nova`_), *or both* `side-by-side
<https://kubernetes.io/docs/concepts/containers/runtime-class/>`_ in the same
cluster. Furthermore, it allows users to manage all components of an
application -- both those that run in containers and those that need a
traditional VM -- from the same Kubernetes control plane (using KubeVirt).
Finally, it eliminates the complexity of needing to virtualise access to
specialist hardware such as :abbr:`GPGPU (general-purpose GPU)`\ s or FPGAs,
while still allowing the capability to be used by different tenants at
different times.
However, the *master* nodes of tenant clusters will run in containers on the
management cluster (or some other centrally-managed cluster). This makes it
easy and cost-effective to provide high availability of cluster control planes,
without sacrificing large numbers of hosts to this purpose or requiring
workloads to run on master nodes. It also makes it possible to optionally
operate Teapot as a fully-managed Kubernetes service. Finally, it makes it
relatively cheap to scale a cluster to zero when it has nothing to do, for
example if it is only used for batch jobs, without requiring it to be recreated
from scratch each time. Since the management cluster also runs on bare metal,
the tenant pods could also be isolated from each other and from the rest of the
system using Kata, in addition to regular security policies.
.. _teapot-compute-metal3:
Metal³
------
Provisioning of bare-metal servers will use `Metal³`_.
The baremetal-operator from Metal³ provides a Kubernetes-native interface over
a simplified `OpenStack Ironic`_ deployment. In this configuration, Ironic runs
standalone (i.e. it does not use Keystone authentication). All communication
between components occurs inside of a pod. RabbitMQ has been replaced by
json-rpc. Ironic state is maintained in a database, but the database can run on
ephemeral storage -- the Kubernetes custom resource (BareMetalHost) is the
source of truth.
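As a rough sketch of how Kubernetes-native that interface is (using the Python
Kubernetes client; the host name, namespace and BMC address are placeholders,
and the BMC credentials Secret is assumed to exist already), registering a
host amounts to creating a BareMetalHost resource:

.. code-block:: python

   # Hedged sketch: register a host with the baremetal-operator by creating a
   # BareMetalHost custom resource. All names and addresses are placeholders.
   from kubernetes import client, config

   config.load_kube_config()  # management cluster context
   api = client.CustomObjectsApi()

   bare_metal_host = {
       "apiVersion": "metal3.io/v1alpha1",
       "kind": "BareMetalHost",
       "metadata": {"name": "worker-01", "namespace": "metal3"},
       "spec": {
           "online": True,
           "bootMACAddress": "52:54:00:aa:bb:cc",
           "bmc": {
               # BMC network address, reachable only from the management cluster
               "address": "ipmi://10.0.0.11",
               # Secret holding the BMC username and password
               "credentialsName": "worker-01-bmc-secret",
           },
       },
   }

   api.create_namespaced_custom_object(
       "metal3.io", "v1alpha1", "metal3", "baremetalhosts", bare_metal_host
   )
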
The baremetal-operator will run only in the management cluster (or some other
centrally managed cluster) because it requires access to both the :abbr:`BMC
(Baseboard Management Controller)`\ s' network (as well as the
:ref:`provisioning network <teapot-networking-provisioning>`) and the
authentication credentials for the BMCs.
.. _teapot-compute-cluster-api:
Cluster API
-----------
The baremetal-operator can be integrated with the Kubernetes Cluster Lifecycle
SIG's `Cluster API`_ via another Metal³ component, the
cluster-api-provider-baremetal. This contains a BareMetalMachine controller
that implements the Machine abstraction using a BareMetalHost. (Airship_ 2.0 is
also slated to use Metal³ and the Cluster API to manage cluster provisioning,
so this mechanism could be extended to deploy fully-configured clusters with
Airship as well.)
When the Cluster API is used to build standalone clusters, typically a
bootstrap node is created (often using a local VM) to run it in order to create
the permanent cluster members. The Cluster and Machine resources are then
'pivoted' (copied) into the cluster, which continues to manage itself while the
bootstrap node is retired. When used with a centralised cluster manager such as
Teapot, the process is usually similar but can use the management cluster to do
the bootstrapping. Pivoting is optional but usually expected.
Teapot imposes some additional constraints. Because the BareMetalHost objects
must remain in the management cluster, the Machine objects cannot be simply
copied to the tenant cluster and continue to be backed by the BareMetalMachine
controller in its present form.
One option might be to build a machine controller for the tenant cluster that
is backed by a Machine object in another cluster (the management cluster). This
might prove useful for centralised management clusters in general, not just
Teapot. We would have no choice but to name this component
cluster-api-provider-cluster-api.
Cluster API does not yet have support for running the tenant control plane in
containers. Tools like Gardener_ do, but are not yet well integrated with the
Cluster API. However, the Cluster Lifecycle SIG is aware of this use case, and
will likely evolve the Cluster API to make this possible.
.. _teapot-compute-autoscaling:
Autoscaling
-----------
The preferred mechanism in Kubernetes for applications to control the size of
the cluster they run in is the Cluster Autoscaler. There is no separate
interface to this mechanism for applications. If an application is too busy, it
simply requests more or larger pods. When there is no longer sufficient
capacity to schedule all requested pods, the Cluster Autoscaler will scale the
cluster up. Similarly, if there is significant excess capacity not being used
by pods, Cluster Autoscaler will scale the cluster down.
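To make the mechanism concrete, the sketch below (using the Python Kubernetes
client; the deployment name, namespace, image and resource figures are all
invented) shows the only 'interface' an application has to autoscaling:
declaring replicas and resource requests. If the requested pods cannot all be
scheduled, the Cluster Autoscaler reacts by adding machines.

.. code-block:: python

   # Illustrative sketch: an application scales itself by requesting more or
   # larger pods; there is no direct autoscaling API call to make.
   from kubernetes import client, config

   config.load_kube_config()  # tenant cluster context

   deployment = {
       "apiVersion": "apps/v1",
       "kind": "Deployment",
       "metadata": {"name": "batch-worker", "namespace": "default"},
       "spec": {
           "replicas": 50,  # raising this may leave pods unschedulable...
           "selector": {"matchLabels": {"app": "batch-worker"}},
           "template": {
               "metadata": {"labels": {"app": "batch-worker"}},
               "spec": {
                   "containers": [{
                       "name": "worker",
                       "image": "example.com/batch-worker:latest",
                       # ...and these requests determine how much capacity is
                       # needed; unschedulable pods trigger the scale-up.
                       "resources": {"requests": {"cpu": "8", "memory": "32Gi"}},
                   }],
               },
           },
       },
   }

   client.AppsV1Api().create_namespaced_deployment("default", deployment)
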
Cluster Autoscaler works using its own cloud-specific plugins. A `plugin that
uses the Cluster API is in progress
<https://github.com/kubernetes/autoscaler/pull/1866>`_, so Teapot could
automatically make use of that provided that the Machine resources were pivoted
into the tenant cluster.
One significant challenge posed by bare metal is the extremely high latency
involved in provisioning a bare-metal host (15 minutes is not unusual, due in
large part to running hardware tests including checking increasingly massive
amounts of RAM). The situation is even worse when needing to deprovision a host
from one tenant before giving it to another tenant, since that requires
cleaning the local disks, though this extra overhead can be essentially
eliminated if the disk is encrypted (in which case only the keys need be
erased).
.. _teapot-compute-scheduling:
Scheduling
----------
Unlike when operating a standalone bare-metal cluster, when allocating hosts
amongst different clusters it is important to have sophisticated ways of
selecting which hosts are added to which cluster.
An obvious example would be selecting for various hardware traits -- which are
unlikely to be grouped into 'flavours' in the way that Nova does. The optimal
way of doing this would likely include some sort of cost function, so that a
cluster is always allocated the minimum spec machine that meets its
requirements. Another example would be selecting for either affinity or
anti-affinity of hosts, possibly at different (and deployment-specific) levels
of granularity.
Work is underway in Metal³ on a hardware-classification-controller that will
add labels to BareMetalHosts based on selected traits, and the baremetal
actuator can select hosts based on labels. This would be sufficient to perform
flavour-based allocation and affinity, but likely not on its own for
trait-based allocation and anti-affinity.
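In its simplest form, the kind of cost function alluded to above might look
like the toy sketch below; the trait labels, weights and cost model are
invented for illustration and are not part of any existing Metal³ component.

.. code-block:: python

   # Toy sketch of trait filtering plus a cost function, so that a cluster is
   # always allocated the minimum-spec machine that meets its requirements.
   WEIGHTS = {"cpus": 1.0, "ram_gb": 0.1, "gpus": 50.0}  # invented relative costs

   def cost(host):
       return sum(WEIGHTS.get(k, 0.0) * v for k, v in host["resources"].items())

   def pick_host(hosts, required_traits):
       candidates = [h for h in hosts if required_traits.issubset(h["labels"])]
       return min(candidates, key=cost, default=None)

   hosts = [
       {"name": "gpu-box", "labels": {"gpu", "10gbe"},
        "resources": {"cpus": 64, "ram_gb": 512, "gpus": 4}},
       {"name": "small-box", "labels": {"10gbe"},
        "resources": {"cpus": 16, "ram_gb": 64, "gpus": 0}},
   ]

   # The cheapest host satisfying the traits wins; the GPU box stays in reserve.
   print(pick_host(hosts, {"10gbe"})["name"])  # -> small-box
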
.. _teapot-compute-reservation:
Reservation and Quota Management
--------------------------------
The design for quota management should recognise the many ways in which it is
used in both private and public clouds. In public clouds utilisation is
controlled by billing; quotas are primarily a tool for *users* to limit their
financial exposure.
In private OpenStack clouds, the implementation of chargeback is rare. A more
common model is that a department will contribute a portion of the capital
budget for a cloud in exchange for a quota -- a model that fits quite well with
Teapot's allocation of entire hosts to tenants.
To best support the private cloud use case, there need to be separate concepts
of a guaranteed minimum reservation and a maximum quota. The sum of minimum
reservations must not exceed the capacity of the cloud (a more complex
requirement than it sounds, since it must take into account selected hardware
traits). Some form of pre-emption is needed, along with a way of prioritising
requests for hosts. Similar concepts exist in many public clouds, in the form
of reserved and spot-rate instances.
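As a much-simplified illustration of why that feasibility check is harder than
it sounds, the sketch below treats each hardware trait as an independent pool;
in reality a single host can satisfy several trait sets at once, which is what
makes the real constraint awkward. All figures are invented.

.. code-block:: python

   # Simplified sketch: the sum of guaranteed minimum reservations must fit
   # within capacity for every hardware trait.
   from collections import Counter

   def reservations_feasible(reservations, capacity_by_trait):
       demanded = Counter()
       for res in reservations:  # e.g. {"tenant": "a", "trait": "gpu", "hosts": 4}
           demanded[res["trait"]] += res["hosts"]
       return all(demanded[t] <= capacity_by_trait.get(t, 0) for t in demanded)

   print(reservations_feasible(
       [{"tenant": "a", "trait": "gpu", "hosts": 4},
        {"tenant": "b", "trait": "gpu", "hosts": 3}],
       {"gpu": 6, "generic": 40},
   ))  # -> False: the guaranteed minimums exceed the GPU capacity
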
The reservation/quota system should have a time component. This allows, for
example, users who have large batch jobs to reserve capacity for them without
tying it up around the clock. (The increasing importance of machine learning
means that once again almost everybody has large batch jobs.) Time-based
reservations can also help mitigate the high latency of moving hosts between
tenants, by allowing some of the demand to be anticipated.
.. _Kata: https://katacontainers.io/
.. _KubeVirt: https://kubevirt.io/
.. _OpenStack Nova: https://docs.openstack.org/nova
.. _Metal³: https://metal3.io/
.. _OpenStack Ironic: https://docs.openstack.org/ironic
.. _Cluster API: https://github.com/kubernetes-sigs/cluster-api#readme
.. _Airship: https://www.airshipit.org/
.. _Gardener: https://gardener.cloud/030-architecture/


@ -0,0 +1,117 @@
Teapot DNS
==========
Project Teapot must provide a trusted way for DNS information generated by the
(untrusted) tenant clusters to be propagated out to the network.
Each tenant cluster requires at least 2 DNS records -- one for the control
plane, and a wildcard for any applications. These would usually be subdomains
of a zone delegated to the Teapot cloud for this purpose. Teapot would be responsible
for rolling up these records and making them available over DNS.
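As a purely hypothetical illustration (the zone, record names and addresses
are invented), the roll-up for each tenant cluster amounts to two records
under the delegated zone:

.. code-block:: python

   # Hypothetical sketch of the two records Teapot would publish per cluster.
   def cluster_records(cluster, zone, api_vip, ingress_vip):
       return [
           # Control plane endpoint, e.g. the tenant's Kubernetes API server.
           {"name": f"api.{cluster}.{zone}", "type": "A", "data": api_vip},
           # Wildcard covering application Ingresses in the tenant cluster.
           {"name": f"*.apps.{cluster}.{zone}", "type": "A", "data": ingress_vip},
       ]

   print(cluster_records("blue", "clusters.teapot.example.com",
                         "198.51.100.10", "198.51.100.11"))
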
Since Teapot will be responsible for :ref:`allocating public IP addresses
<teapot-networking-external>`, it will also need to be responsible for
advertising reverse DNS records for those IPs.
Implementation Options
----------------------
The Kubernetes SIG ExternalDNS_ project is a Kubernetes-native service that
collects IP addresses for Services and Ingresses running in the cluster and
exports DNS records for them (though it is *not* itself a DNS server). It
supports many different back-ends -- both traditional DNS servers and
cloud-based services (including OpenStack Designate).
While tenants are of course free to run this in their own clusters already
(perhaps pointing to an external cloud service), this is not sufficient to
satisfy the above requirements. It requires them to use an external cloud
service (which may not always be appropriate for internal-only applications in
a private cloud), since tenants are untrusted and cannot be given write access
to an internal DNS server. And reverse DNS records cannot be exported, because
tenant clusters are not a trusted source of information about what IP addresses
are assigned to them.
.. _teapot-dns-externaldns:
ExternalDNS in load balancing cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If Teapot implemented the :doc:`load balancing <load-balancing>` :ref:`option
based on Ingress resources <teapot-load-balancing-ingress-api>` in the
management cluster (or a separate load balancing cluster), and these were used
for both Services and Ingresses, then ExternalDNS running in that same cluster
would automatically see all of the external endpoints for the tenant clusters.
It could even rely on the fact that the IP addresses will have been sanitised
already before creating the Ingress objects. There would need to be provision
made somewhere for sanitising the DNS names, however.
On its own this only satisfies the first requirement. Additional work might
need to be done to export the wildcard DNS records for the tenant workloads.
(Note that the tenant control planes would be running in containers on the
management cluster or another centrally-managed cluster, and may well have
Ingress resources associated with them already.) And additional work would
certainly be needed to export the reverse DNS records.
A major downside of this is that it gives the tenant very little control over
whether and how it exports DNS information.
.. _teapot-dns-externaldns-sync:
Build a component to sync ExternalDNS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A component running in a tenant cluster could sync any ExternalDNS Endpoint
resources (not to be confused with Kubernetes Endpoint resources) from the
tenant cluster into the management cluster. (This component could even be
written as an ExternalDNS provider.) This option is analogous to an
:ref:`Ingress-based API for load balancing
<teapot-load-balancing-ingress-api>`.
On the management cluster side, a validating webhook would check for legitimacy
prior to accepting a resource. More investigation is required into the
mechanics of this -- since the resources are not normally manipulated by
anything other than ExternalDNS itself, having something else writing to them
might prove brittle.
Again, additional work might need to be done to export the wildcard DNS records
for the tenant workloads and would be needed for reverse DNS records.
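A rough sketch of such a sync component is shown below, assuming ExternalDNS's
DNSEndpoint custom resource (``externaldns.k8s.io/v1alpha1``) and the Python
Kubernetes client; the contexts, namespaces and ownership label are
hypothetical, and the legitimacy checks would live in the validating webhook
on the management side.

.. code-block:: python

   # Sketch of syncing DNSEndpoint resources from a tenant cluster into the
   # management cluster. Contexts, namespaces and labels are placeholders.
   from kubernetes import client, config

   GROUP, VERSION, PLURAL = "externaldns.k8s.io", "v1alpha1", "dnsendpoints"

   tenant = client.CustomObjectsApi(config.new_client_from_config(context="tenant"))
   mgmt = client.CustomObjectsApi(config.new_client_from_config(context="management"))

   def sync(tenant_ns="default", mgmt_ns="tenant-blue-dns"):
       endpoints = tenant.list_namespaced_custom_object(GROUP, VERSION, tenant_ns, PLURAL)
       for ep in endpoints.get("items", []):
           ep["metadata"] = {
               "name": ep["metadata"]["name"],
               "namespace": mgmt_ns,
               "labels": {"teapot.example.com/tenant": "blue"},  # hypothetical label
           }
           # The management cluster's validating webhook vets each record here.
           mgmt.create_namespaced_custom_object(GROUP, VERSION, mgmt_ns, PLURAL, ep)
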
.. _teapot-dns-designate:
OpenStack Designate
~~~~~~~~~~~~~~~~~~~
Designate_ is already one of the supported back-ends for ExternalDNS. By
running a minimal, opinionated installation of Designate in the management
cluster we could allow tenants to choose whether and how to set up ExternalDNS
in their own clusters. They could choose to export records to the Teapot cloud,
to some external cloud, or not at all.
Since Designate has an API, it would be easy to add the two top-level records
for each cluster.
Designate has the ability to export reverse DNS records based on floating IPs.
However, the current implementation is tightly coupled to Neutron. If Neutron
is used in Teapot it should be as an implementation detail only, so other
services like Designate should not rely on integrating with it. Therefore
additional work would be required to support reverse DNS. There is an API
plugin point to pull data, or it could be pushed in through the Designate API.
Ideally the back-end in the opinionated configuration would be CoreDNS_, due to
its status in the Kubernetes community (it is used for the *internal* DNS and
is a CNCF project). However, there is currently no CoreDNS back-end for
Designate. An alternative to writing one would be to write a Designate plugin
for CoreDNS -- similar plugins exist for other clouds already. The latter would
provide the most benefit to OpenStack users, since theoretically tenants could
make use of it even if CoreDNS is not chosen as the back-end by their OpenStack
cloud's administrators.
The Designate Sink component would not be required, but the rest of Designate
is also built around RabbitMQ, which is highly undesirable. However, it is
largely used to implement RPC patterns (``call``, not ``cast``), and might be
amenable to being swapped for a json-rpc interface in the same way as is done
in Ironic for Metal³.
.. _ExternalDNS: https://github.com/kubernetes-sigs/external-dns#readme
.. _Designate: https://docs.openstack.org/designate/
.. _CoreDNS: https://coredns.io/


@ -0,0 +1,146 @@
Teapot Identity Management
==========================
Teapot need not, and should not, impose any particular identity management
system for tenant clusters. These are the clusters that applications and
application developers/operators will routinely interact with, and the choice
of identity management providers is completely up to the administrators of
those clusters, or at least the administrator of the Teapot cloud when running
as a fully-managed service.
Identity management in Teapot itself (i.e. the management cluster) is needed
for two different purposes. While not strictly necessary, it would be
advantageous to require only one identity management provider to cover both of
these use cases.
Authenticating From Below
-------------------------
Software running in the tenant clusters needs to authenticate to the cloud to
request resources, such as machines, :doc:`load balancers <load-balancing>`,
:doc:`shared storage <storage>`, :doc:`DNS records <dns>`, and (in future)
managed software services.
Credentials for these purposes should be regularly rotated and narrowly
authorised, to limit both the scope and duration of any compromise.
Authenticating From Above
-------------------------
Real users, and sometimes software services, need to authenticate to the cloud to
create or destroy clusters, manually scale them up or down, request quotas, and
so on.
In many cases, such as most enterprise private clouds, these credentials should
be linked to an external identity management provider. This would allow
auditors of the system to tie physical hardware directly back to the corporeal
humans to whom it is allocated and the organisational units to which they
belong.
Humans must also have a secure way of delegating privileges to an application
to interact with the cloud in this way -- for example, imagine a CI system that
needs to create an entire test cluster from scratch and destroy it again. This
must not require the user's own credentials to be stored anywhere.
Implementation options
----------------------
.. _teapot-idm-keystone:
OpenStack Keystone
~~~~~~~~~~~~~~~~~~
Keystone_ is currently the only game in town for providing identity management
for the OpenStack services that are candidates for inclusion in Teapot to
provide multi-tenant functionality, such as :ref:`Manila
<teapot-storage-manila>` and :ref:`Designate <teapot-dns-designate>`. Therefore
using Keystone for all identity management on the management cluster would not
only not increase the complexity of the deployment, it would actually minimise it.
An authorisation webhook for Kubernetes that uses Keystone is available in
cloud-provider-openstack_. In general, OAuth seems to be preferred to webhooks
for connecting external identity management systems, but there is at least a
working option.
Keystone supports delegating user authentication
to LDAP, as well as offering its own built-in user management. It can also
federate with other identity providers via the `OpenID Connect`_ or SAML_
protocols. Using Keystone would also make it simpler to run Teapot alongside an
existing OpenStack cloud -- enabling tenants to share services in that cloud,
as well as potentially making Teapot's functionality available behind an
OpenStack-native API (similar to Magnum) for those who want it.
Keystone also features quota management capabilities that could be reused to
manage tenant quotas_. A proof-of-concept for a validating webhook that allows
this to be used for governing Kubernetes resources `exists
<https://github.com/cmurphy/keyhook#readme>`_.
While there are generally significant impedance mismatches between the
Kubernetes and Keystone models of authorisation, Project Teapot is a fresh
start and can prescribe custom policy models that mitigate the mismatch.
(Ongoing changes to default policies will likely smooth over these kinds of
issues in regular OpenStack clouds also.) This may not be so easy when sharing
a Keystone :doc:`with an OpenStack cloud <openstack-integration>` though.
Keystone Application Credentials allow users to create (potentially)
short-lived credentials that an application can use to authenticate without the
need to store the user's own LDAP password (which likely also governs their
access to a wide range of unrelated corporate services) anywhere. Credentials
provided to tenant clusters should be exclusively of this type, limited to the
purpose assigned (e.g. credentials intended for accessing storage can only be
used to access storage), and regularly rotated out and expired.
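For example (a sketch only; the keystoneauth1 classes are real, but the
endpoint and credential values are placeholders), a component in a tenant
cluster would authenticate with an application credential rather than with any
user's own password:

.. code-block:: python

   # Sketch: authenticate to Keystone with a narrowly-scoped application
   # credential. The URL, ID and secret below are placeholders.
   from keystoneauth1.identity.v3 import ApplicationCredential
   from keystoneauth1.session import Session

   auth = ApplicationCredential(
       auth_url="https://keystone.teapot.example.com/v3",
       application_credential_id="abc123",      # scoped, e.g. storage-only roles
       application_credential_secret="s3cret",  # injected as a Kubernetes Secret
   )
   sess = Session(auth=auth)

   # Any OpenStack-derived service (Manila, Designate, ...) can be called
   # through this session, and the credential can be rotated or revoked
   # without touching the user's identity-provider account.
   token = sess.get_token()
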
.. _teapot-idm-dex:
Dex
~~~
Dex_ is an identity management service that uses `OpenID Connect`_ to provide
authentication to Kubernetes. It too supports delegating user authentication to
LDAP, amongst others. This would likely be seen as a more conventional choice
in the Kubernetes community. Dex can store its data using Kubernetes custom
resources, so it is the most lightweight option.
Dex does not support authorisation. However, Keystone supports OpenID Connect
as a federated identity provider, so it could still be used as the
authorisation mechanism (including for OpenStack-derived services such as
Manila) using Dex for authentication, though this inevitably adds additional
moving parts. In general, Keystone has `difficulty
<https://bugs.launchpad.net/keystone/+bug/1589993>`_ with application
credentials for federated users because it is not immediately notified of
membership revocations, but since both components are under the same control in
this case it would be easier to build some additional integration to keep them
in sync.
.. _teapot-idm-keycloak:
Keycloak
~~~~~~~~
Keycloak_ is a more full-featured identity management service. It would also be
seen in the Kubernetes community as a more conventional choice than Keystone,
although it does not use the Kubernetes API as a data store. Keycloak is
significantly more complex to deploy than Dex. However, a `Kubernetes operator
for Keycloak <https://operatorhub.io/operator/keycloak-operator>`_ now exists,
which should hide much of the complexity.
Keystone could federate to Keycloak as an identity management provider using
either OpenID Connect or SAML.
Theoretically, Keycloak could be used without Keystone if the Keystone
middleware in the services were replaced by some new OpenID Connect middleware.
The architecture of OpenStack is designed to make this at least possible. It
would also require changes to client-side code (most prominently any
cloud-provider-openstack providers that might otherwise be reused), although
there is a chance that they could be contained to a small blast radius around
Gophercloud's `clientconfig module
<https://github.com/gophercloud/utils/tree/master/openstack/clientconfig>`_.
.. _Keystone: https://docs.openstack.org/keystone/
.. _OpenID Connect: https://openid.net/connect/
.. _SAML: https://docs.oasis-open.org/security/saml/Post2.0/sstc-saml-tech-overview-2.0.html
.. _cloud-provider-openstack: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-keystone-webhook-authenticator-and-authorizer.md#readme
.. _quotas: https://docs.openstack.org/keystone/latest/admin/unified-limits.html
.. _Dex: https://github.com/dexidp/dex/#readme
.. _Keycloak: https://www.keycloak.org/


@ -0,0 +1,129 @@
Project Teapot
==============
.. _teapot-introduction:
Introduction
------------
Project Teapot is a design proposal for a bare-metal cloud to run Kubernetes
on.
When OpenStack was first designed, 10 years ago, the cloud computing landscape
was a very different place. In the intervening period, OpenStack has amassed an
enormous installed base of many thousands of users who all depend on it
remaining essentially the same service, with backward-compatible APIs. If we
designed an open source cloud platform without those restrictions and looking
ahead to the 2020s, knowing everything we know today, what might it look like?
And how could we build it without starting from scratch, but using existing
open source technologies where possible? Project Teapot is one answer to these
questions.
Project Teapot is designed to run natively on Kubernetes, and to integrate with
Kubernetes clusters deployed by tenants. It provides only bare-metal compute
capacity, so that tenants can orchestrate all aspects of an application -- from
legacy VMs to cloud-native containers to workloads requiring custom hardware,
and everything in between -- through a single API that they can control.
It seems inevitable that numerous organisations are going to end up
implementing various subsets of this functionality just to deal with bare-metal
clusters in their own environment. By developing Teapot in the open, we would
give them a chance to reduce costs by collaborating on a common solution.
.. _teapot-goals:
Goals
-----
OpenStack's `mission
<https://governance.openstack.org/tc/resolutions/20160217-mission-amendment.html>`_
is to be ubiquitous; Teapot's is narrower. In the 2020s, Kubernetes will be
ubiquitous. However, Kubernetes' separation of responsibilities with the
underlying cloud means that some important capabilities are considered out of
scope for it -- most obviously multi-tenancy of the sort provided by clouds,
allowing isolation from potentially malicious users (including innocuous users
who have had their workloads hacked by malicious third parties). Teapot's
primary mission is to fill those gaps with an open source solution, by
providing a cloud layer to manage a physical data center beneath Kubernetes.
In addition to mediating access to a physical data center, another important
role of clouds is to offer managed services (for example, a database as a
service). Teapot itself can be used to provide a managed service -- Kubernetes
(though it could equally be configured to provide fully user-controlled tenant
clusters). A secondary goal is to make Teapot a platform that cloud providers
could use to offer other kinds of managed service as well. Teapot is an easier
base than OpenStack on which to deploy such services because it is itself based
on Kubernetes.
.. _teapot-non-goals:
Non-Goals
---------
Teapot's design makes it suitable for deployments that require multi-tenancy
and are medium-sized or larger. Specifically, Teapot makes sense when tenants
are large enough to be able to utilise at least one (and usually more than one)
entire bare-metal server, because managing virtual machines is not a goal.
Smaller deployments that nevertheless require hard multi-tenancy (that is to
say, zero trust required between tenants) would be better off with OpenStack.
Smaller deployments that do not require hard multi-tenancy would be better off
running a single standalone Kubernetes cluster.
.. _teapot-design:
Design
------
The `Vision for OpenStack Clouds`_ states that the `physical data center
management function
<https://governance.openstack.org/tc/reference/technical-vision.html#basic-physical-data-center-management>`_
of a cloud must "[provide] the abstractions needed to deal with external
systems like :doc:`compute <compute>`, :doc:`storage <storage>`, and
:doc:`networking <networking>` hardware [including :doc:`load balancers
<load-balancing>` and :doc:`hardware security modules <key-management>`], the
:doc:`Domain Name System <dns>`, and :doc:`identity management systems <idm>`."
This proposal discusses implementation options for each of those classes of
systems.
Teapot also fulfils the `self-service
<https://governance.openstack.org/tc/reference/technical-vision.html#self-service>`_
requirements of a cloud, by providing multi-tenancy and :ref:`capacity
management <teapot-compute-reservation>`. In the Kubernetes model,
multi-tenancy is something that must be provided by the cloud layer.
Because Teapot targets Kubernetes as its tenant workload, it is able to
`provide applications control
<https://governance.openstack.org/tc/reference/technical-vision.html#application-control>`_
over the cloud using the standard Kubernetes interfaces (such as Ingress
resources and the Cluster Autoscaler). This greatly simplifies porting of many
workloads to and from other clouds.
Teapot is designed to be radically simpler than OpenStack to :doc:`install
<installation>` and operate. By running on the same technology stack as the
tenant clusters it deploys, it allows a common set of skills to be applied to
the operation of both applications and the underlying infrastructure. By
eschewing direct management of virtualisation it avoids having to shoehorn
bare-metal management into a virtualisation context or vice-versa, and
eliminates entire layers of networking abstractions.
At the same time, Teapot should be able to :doc:`interoperate with OpenStack
<openstack-integration>` when required so that each enhances the value of the
other without adding unnecessary layers of complexity.
Index
-----
.. toctree::
compute
storage
networking
load-balancing
dns
idm
key-management
installation
openstack-integration
.. _Vision for OpenStack Clouds: https://governance.openstack.org/tc/reference/technical-vision.html


@ -0,0 +1,69 @@
Teapot Installation
===================
In a sense, the core of Teapot is simply an application running in a Kubernetes
cluster (the management cluster). This is a great advantage for ease of
installation, because Kubernetes is renowned for its simplicity in
bootstrapping. Many, many (perhaps too many) tools already exist for
bootstrapping a Kubernetes cluster, so there is no need to reinvent them.
However, Teapot is designed to be the system that provides cloud services to
bare-metal Kubernetes clusters, and while it is possible to run the management
cluster on another cloud (such as OpenStack), it is likely in most instances to
be self-hosted on bare metal. This presents a unique bootstrapping challenge.
OpenStack does not define an 'official' installer, largely due to the plethora
of configuration management tools that different users preferred. Teapot does
not have the same issue, as it standardises on Kubernetes as the *lingua
franca*. There should be a single official installer and third parties are
encouraged to add extensions and customisations by adding Resources and
Operators through the Kubernetes API.
Implementation Options
----------------------
Metal³
~~~~~~
`Metal³`_ is designed to bootstrap standalone bare-metal clusters, so it can be
used to install the management cluster. There are multiple ways to do this. One
is to use the `Cluster API`_ on a bootstrap VM, and then pivot the relevant
resources into the cluster. The OpenShift installer takes a slightly different
approach, again using a bootstrap VM, but creating the master nodes initially
using Terraform and then creating BareMetalHost resources marked as 'externally
provisioned' for them in the cluster.
One inevitable challenge is that the initial bootstrap VM must be able to
connect to the :ref:`management and provisioning networks
<teapot-networking-provisioning>` in order to begin the installation. That
makes it difficult to simply run from a laptop, which makes installing a small
proof-of-concept cluster harder than anyone would like. (This is inherent to
the bare-metal environment and also a problem for OpenStack installers.) If a
physical host must be used as the bootstrap, reincorporating that hardware into
the actual cluster once it is up and running should at least be simpler on
Kubernetes.
Airship
~~~~~~~
Airship_ 2.0 uses Metal³ and the Cluster API to provision Kubernetes clusters
on bare metal. It also provides a declarative way of repeatably setting the
initial configuration and workloads of the deployed cluster, along with a rich
document layering and substitution model (based on Kustomize). This might be
the simplest existing way of defining what a Teapot installation looks like
while allowing distributors and third-party vendors a clear method for
providing customisations and add-ons.
Teapot Operator
~~~~~~~~~~~~~~~
A Kubernetes operator for managing the deployment and configuration of the
Teapot components could greatly simplify the installation process. This is not
incompatible with using Airship (or indeed any other method) to define the
configuration, as Helm would just create the top-level custom resource(s)
controlled by the operator, instead of lower-level resources for the individual
components.
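As a purely hypothetical illustration of that idea (the API group, kind and
every field below are invented, not an existing interface), the whole
installation might be driven by a single top-level custom resource:

.. code-block:: python

   # Hypothetical top-level resource that a Teapot operator might reconcile.
   from kubernetes import client, config

   config.load_kube_config()

   teapot = {
       "apiVersion": "teapot.example.com/v1alpha1",
       "kind": "TeapotDeployment",
       "metadata": {"name": "teapot", "namespace": "teapot-system"},
       "spec": {
           "dns": {"backend": "coredns"},
           "loadBalancing": {"mode": "ingress"},
           "identity": {"provider": "keystone"},
           "provisioningNetwork": {"vlan": 0, "cidr": "172.22.0.0/24"},
       },
   }

   client.CustomObjectsApi().create_namespaced_custom_object(
       "teapot.example.com", "v1alpha1", "teapot-system", "teapotdeployments", teapot
   )
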
.. _Metal³: https://metal3.io/
.. _Cluster API: https://github.com/kubernetes-sigs/cluster-api#readme
.. _Airship: https://www.airshipit.org/


@ -0,0 +1,70 @@
Teapot Key Management
=====================
Kubernetes offers the Secret resource for storing secrets needed by
applications. This is an improvement on storing them in the applications'
source code, but unfortunately by default Secrets are not encrypted at rest,
but simply stored in etcd in plaintext. An EncryptionConfiguration_ resource
can be used to ensure the Secrets are encrypted before storing them, but in
most cases the keys used to encrypt the data are themselves stored in etcd in
plaintext, alongside the encrypted data.
This can be avoided by using a `Key Management Service provider`_ plugin. In
this case the encryption keys for each Secret are themselves encrypted, and can
only be decrypted using a master key stored in the key management service
(which may be a hardware security module). All extant KMS providers appear to
be for cloud services; there are no bare-metal options.
Since the KMS provider is necessary to provide effective encryption at rest and
is the *de facto* responsibility of the cloud, it would be desirable for Teapot
to support it. The implementation should be able to make use of :abbr:`HSM
(Hardware Security Module)`\ s, but also be able to work with a pure-software
solution.
Implementation Options
----------------------
.. _teapot-key-management-barbican:
OpenStack Barbican
~~~~~~~~~~~~~~~~~~
Barbican_ provides exactly the thing we want. It `provides
<https://docs.openstack.org/barbican/latest/install/barbican-backend.html>`_ an
abstraction over HSMs as well as software implementations using Dogtag_ (which
can itself store its master keys either in software or in an HSM) or Vault_,
along with another that simply stores its master key in the config file.
Like other OpenStack services, Barbican uses Keystone for :doc:`authentication
<idm>`. A :abbr:`KMS (Key Management Service)` provider for Barbican already
exists in cloud-provider-openstack_. This could be used in both the management
cluster and in tenant clusters.
Barbican's architecture is relatively simple, although it does rely on RabbitMQ
for communication between the API and the workers. This should be easy to
replace with something like json-rpc as was done for Ironic in Metal³ to
simplify the deployment.
Storing keys in software on a dynamic system like Kubernetes presents
challenges. It might be necessary to use a host volume on the master nodes to
store master keys when no HSM is available. Ultimately the most secure solution
is to use an HSM.
.. _teapot-key-management-secrets:
Write a new KMS plugin using Secrets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Writing a KMS provider plugin is very straightforward. We could write one that
just uses a Secret stored in the management cluster as the master key.
However, this could not be used to encrypt Secrets at rest in the management
cluster itself.
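The envelope-encryption idea behind such a plugin is sketched below. This is
not the Kubernetes KMS provider gRPC interface itself, just the core logic,
with a hypothetical Secret name and namespace and the ``cryptography``
library's Fernet cipher standing in for a real cipher choice.

.. code-block:: python

   # Sketch of envelope encryption with the master key read from a Secret in
   # the management cluster. Secret name, namespace and key format are assumed.
   import base64
   from cryptography.fernet import Fernet
   from kubernetes import client, config

   config.load_kube_config()  # management cluster context
   secret = client.CoreV1Api().read_namespaced_secret("teapot-master-key", "teapot-kms")
   master = Fernet(base64.b64decode(secret.data["key"]))  # assumes a Fernet key is stored

   def encrypt(plaintext: bytes) -> dict:
       dek = Fernet.generate_key()              # per-object data encryption key
       return {
           "ciphertext": Fernet(dek).encrypt(plaintext),
           "wrapped_dek": master.encrypt(dek),  # only the wrapped DEK is stored
       }

   def decrypt(blob: dict) -> bytes:
       dek = master.decrypt(blob["wrapped_dek"])
       return Fernet(dek).decrypt(blob["ciphertext"])
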
.. _EncryptionConfiguration: https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
.. _Key Management Service provider: https://kubernetes.io/docs/tasks/administer-cluster/kms-provider/
.. _Barbican: https://docs.openstack.org/barbican/latest/
.. _Dogtag: https://www.dogtagpki.org/wiki/PKI_Main_Page
.. _Vault: https://www.vaultproject.io/
.. _cloud-provider-openstack: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-barbican-kms-plugin.md#readme


@ -0,0 +1,241 @@
Teapot Load Balancing
=====================
Load balancers are one of the things that Kubernetes expects to be provided by
the underlying cloud. No multi-tenant bare-metal solutions for this exist, so
project Teapot would need to provide one. Ideally an external load balancer
would act as an abstraction over what could be either a tenant-specific
software load balancer or multi-tenant-safe access to a hardware (or virtual)
load balancer.
There are two ways for an application to request an external load balancer in
Kubernetes. The first is to create a Service_ with type |LoadBalancer|_. This
is the older way of doing things but is still useful for lower-level plumbing,
and may be required for non-HTTP(S) protocols. The preferred (though nominally
beta) way is to create an Ingress_. The Ingress API allows for more
sophisticated control (such as adding :abbr:`TLS (Transport Layer Security)`
termination), and can allow multiple services to share a single external load
balancer (including across different DNS names), and hence a single IP address.
Most managed Kubernetes services provide an Ingress controller that can set up
external load balancers, including TLS termination, using the underlying
cloud's services. Without this, tenants can still use an Ingress controller,
but it would have to be one that uses resources available to the tenant, such
as by running software load balancers in the tenant cluster.
When using a Service of type |LoadBalancer| (rather than an Ingress), there is
no standardised way of requesting TLS termination (some cloud providers permit
it using an annotation), so supporting this use case is not a high priority.
The |LoadBalancer| Service type in general should be supported, however (though
there are existing Kubernetes offerings where it is not).
Implementation options
----------------------
The choices below are not mutually exclusive. An administrator of a Teapot
cloud and their tenants could each potentially choose from among several
available options.
.. _teapot-load-balancing-metallb-l2:
MetalLB (Layer 2) on tenant cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MetalLB_ project provides two ways of doing load balancing for bare-metal
clusters. One requires control over only layer 2, although it really only
provides the high-availability aspects of load balancing, not actual balancing.
All incoming traffic for each service is directed to a single node; from there
kube-proxy distributes it to the endpoints that handle it. However, should the
node die, traffic rapidly fails over to another node.
This form of load balancing does not support offloading TLS termination,
results in large amounts of East-West traffic, and consumes resources from the
guest cluster.
Tenants could decide to use this unilaterally (i.e. without the involvement of
the management cluster or its administrators). However, using MetalLB restricts
the choice of CNI plugins -- for example it does not work with OVN. A
prerequisite for using it would be that all tenant machines share a layer 2
broadcast domain, which may be undesirable in larger clouds. This may be an
acceptable solution for Services in some cases though.
.. _teapot-load-balancing-metallb-l3-management:
MetalLB (Layer 3) on management cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The layer 3 form of MetalLB_ load balancing provides true load balancing, but
requires control over the network hardware in the form of advertising
:abbr:`ECMP (Equal-Cost Multi-Path)` routes via BGP. (This also places
additional `requirements
<https://metallb.universe.tf/concepts/bgp/#limitations>`_ on the network
hardware.) Since tenant clusters are not trusted to do this, it would have to
run in the management cluster. There would need to be an API in the management
cluster to vet requests and pass them on to MetalLB, and a
cloud-provider-teapot plugin that tenants could optionally install to connect
to it.
This form of load balancing does not support offloading TLS termination either.
.. _teapot-load-balancing-metallb-l3-tenant:
MetalLB (Layer 3) on tenant cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While the network cannot trust BGP announcements from tenants, in principle the
management cluster could have a component, perhaps based on `ExaBGP
<https://github.com/Exa-Networks/exabgp#readme>`_, that listens to such
announcements on the tenant V(x)LANs, drops any that refer to networks not
allocated to the tenant, and rebroadcasts the legitimate ones to the network
hardware.
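The policy core of such a component might look like the sketch below
(illustrative only; how announcements are received from ExaBGP and rebroadcast
to the hardware is omitted, and the tenant allocations are invented):

.. code-block:: python

   # Sketch: drop BGP announcements for prefixes not allocated to the tenant.
   import ipaddress

   ALLOCATIONS = {                      # prefixes delegated to each tenant
       "blue": [ipaddress.ip_network("192.0.2.0/26")],
       "green": [ipaddress.ip_network("192.0.2.64/26")],
   }

   def allowed(tenant: str, announced_prefix: str) -> bool:
       prefix = ipaddress.ip_network(announced_prefix)
       return any(prefix.subnet_of(allocation)
                  for allocation in ALLOCATIONS.get(tenant, []))

   print(allowed("blue", "192.0.2.32/30"))  # True: inside blue's allocation
   print(allowed("blue", "192.0.2.96/30"))  # False: dropped, belongs to green
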
This would allow tenant networks to choose to make use of MetalLB in its Layer
3 mode, providing actual traffic balancing as well as making it possible to
split tenant machines amongst separate L2 broadcast domains. It would also
allow tenants to choose among a much wider range of :doc:`CNI plugins
<./networking>`, many of which also rely on BGP announcements.
This form of load balancing still does not support offloading TLS termination.
.. _teapot-load-balancing-ovn:
Build a new OVN-based load balancer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One drawback of MetalLB is that it is not compatible with using OVN as the
network overlay. This is unfortunate, as OVN is one of the most popular network
overlays used with OpenStack, and thus might be a common choice for those
wanting to integrate workloads running in OpenStack and Kubernetes together.
A new OVN-based network load balancer in the vein of MetalLB might provide more
options for this group.
.. _teapot-load-balancing-ingress-api:
Build a new API using Ingress resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A new API in the management cluster would receive requests in a form similar to
an Ingress resource, sanitise them, and then proxy them to an Ingress
controller running in the management cluster (or some other
centrally-controlled cluster). In fact, it is possible the 'API' could be as
simple as using the existing Ingress API in a namespace with a validating
webhook.
The most challenging part of this would be coaxing the Ingress controllers on
the load balancing cluster to target services in a different cluster (the
tenant cluster). Most likely we would have to sync the EndpointSlices from the
tenant cluster into the load balancing cluster.
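A rough sketch of what that sync might produce is shown below: a tenant
EndpointSlice (``discovery.k8s.io/v1``, handled here as a plain dict) is
rewritten so that an Ingress controller in the load balancing cluster can
target the tenant's pods directly. The naming convention, and the assumption
that pod IPs are routable from the load balancing cluster, are hypothetical.

.. code-block:: python

   # Sketch: mirror a tenant EndpointSlice into the load balancing cluster.
   def mirror_endpoint_slice(tenant_slice: dict, tenant: str, lb_namespace: str) -> dict:
       svc = tenant_slice["metadata"]["labels"]["kubernetes.io/service-name"]
       return {
           "apiVersion": "discovery.k8s.io/v1",
           "kind": "EndpointSlice",
           "metadata": {
               "name": f"{tenant}-{tenant_slice['metadata']['name']}",
               "namespace": lb_namespace,
               # Attach the mirrored slice to a like-named headless Service in
               # the load balancing cluster (a hypothetical convention).
               "labels": {"kubernetes.io/service-name": f"{tenant}-{svc}"},
           },
           "addressType": tenant_slice["addressType"],
           "ports": tenant_slice["ports"],
           # Assumes pod IPs are reachable from the load balancing cluster,
           # since both sit on networks managed by Teapot.
           "endpoints": [{"addresses": e["addresses"]}
                         for e in tenant_slice["endpoints"]],
       }
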
In all likelihood when using a software-based Ingress controller running in a
load balancing cluster, a network load balancer would also be used on that
cluster to ensure high-availability of the load balancers themselves. Examples
include MetalLB and `kube-keepalived-vip
<https://github.com/aledbf/kube-keepalived-vip>`_ (which uses :abbr:`VRRP
(Virtual Router Redundancy Protocol)` to ensure high availability). This
component would need to be integrated with :ref:`public IP assignment
<teapot-networking-external>`.
There are already controllers for several types of software load balancers (the
nginx controller is even officially supported by the Kubernetes project), as
well as multiple hardware load balancers. This includes an existing Octavia
Ingress controller in cloud-provider-openstack_, which would be useful for
:doc:`integrating with OpenStack clouds <openstack-integration>`. The ecosystem
around this API is likely to have continued growth. This is also likely to be
the site of future innovation around configuration of network hardware, such as
hardware firewalls.
In general, Ingress controllers are not expected to support non-HTTP(S)
protocols, so it's not necessarily possible to implement the |LoadBalancer|
Service type with an arbitrary plugin. However, the nginx Ingress controller
has support for arbitrary `TCP and UDP services
<https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/>`_,
so the API would be able to provide for either type.
Unlike the network load balancer options, this form of load balancing would be
able to terminate TLS connections.
.. _teapot-load-balancing-custom-api:
Build a new custom API
~~~~~~~~~~~~~~~~~~~~~~
A new service running on the management cluster would provide an API through
which tenants could request a load balancer. An implementation of this API
would provide a pure-software load balancer running in containers in the
management cluster (or some other centrally-controlled cluster). As in the case
of an Ingress-based controller, a network load balancer would likely be used to
provide high-availability of the load balancers.
The API would be designed such that alternate implementations of the controller
could be created for various load balancing hardware. Ideally one would take
the form of a shim to the existing cloud-provider API for load balancers, so
that existing plugins could be used. This would include
cloud-provider-openstack, for the case where Teapot is installed alongside an
OpenStack cloud allowing it to make use of Octavia.
Unlike the network load balancer options, this form of load balancing would be
able to terminate TLS connections.
This option seems to be strictly inferior to using Ingress controllers on the
load balancing cluster to implement an API, assuming both options prove
feasible.
.. _teapot-load-balancing-ingress-controller:
Build a new Ingress controller
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In the event that we build a new API in the management cluster, a Teapot
Ingress controller would proxy requests for an Ingress to it. This controller
would likely be responsible for syncing the EndpointSlices to the API as well.
.. _teapot-load-balancing-cloud-provider:
Build a new cloud-provider
~~~~~~~~~~~~~~~~~~~~~~~~~~
In the event that we build a new API in the management cluster, a
cloud-provider-teapot plugin that tenants could optionally install would allow
them to make use of the API in the management cluster to configure Services of
type |LoadBalancer|.
While helpful to increase portability of applications between clouds, this is a
much lower priority than building an Ingress controller. Tenants can always
choose to use Layer 2 MetalLB for their |LoadBalancer| Services instead.
.. _teapot-load-balancing-octavia:
OpenStack Octavia
~~~~~~~~~~~~~~~~~
On paper, Octavia_ provides exactly what we want: a multi-tenant abstraction
layer over hardware load balancer APIs, with a software-based driver for those
wanting a pure-software solution.
In practice, however, there is only one driver for a hardware load balancer
(along with a couple of other out-of-tree drivers), and an Ingress controller
for that hardware also exists. More drivers existed for the earlier Neutron
LBaaS v2 API, but some vendors had largely moved on to Kubernetes by the time
the Neutron API was replaced by Octavia.
The pure-software driver (Amphora) itself supports provider plugins for its
compute and network. However, the only currently available providers are for
OpenStack Nova and OpenStack Neutron. Nova will not be present in Teapot. Since
we want to make use of Neutron only as a replaceable implementation detail --
if at all -- Teapot cannot allow other components of the system to become
dependent on it. Additional providers would have to be written in order to use
Octavia in Teapot.
Another possibility is integration in the other direction -- using a
Kubernetes-based service as a driver for Octavia when Teapot is
:doc:`co-installed with an OpenStack cloud <openstack-integration>`.
.. |LoadBalancer| replace:: ``LoadBalancer``
.. _Service: https://kubernetes.io/docs/concepts/services-networking/service/
.. _LoadBalancer: https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer
.. _Ingress: https://kubernetes.io/docs/concepts/services-networking/ingress/
.. _cloud-provider-openstack: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-octavia-ingress-controller.md#readme
.. _MetalLB: https://metallb.universe.tf/
.. _Octavia: https://docs.openstack.org/octavia/


@ -0,0 +1,187 @@
Teapot Networking
=================
In Project Teapot, tenant clusters are deployed exclusively on bare-metal
servers, which are under the complete control of the tenant. Therefore the
network itself must be the guarantor of multi-tenancy, with only untrusted
components running on tenant machines. (Trusted components can still run within
the management cluster.)
.. _teapot-networking-multi-tenancy:
Multi-tenant Network Model
--------------------------
Support for VLANs and VxLAN is ubiquitous in modern data center network
hardware, so this will be the basis for Teapot's networking. Each tenant will
be assigned one or more V(x)LANs. (Separate failure domains will likely also
have separate broadcast domains.) As machines are assigned to the tenant, the
Teapot controller will connect each to a private virtual network also assigned
to the tenant.
Small deployments can just use VLANs. Larger deployments may need VxLAN, and in
this case :abbr:`VTEP (VxLAN Tunnel EndPoint)`-capable edge switches and a
VTEP-capable router will be required.
This design frees the tenant clusters from being forced to use a particular
:abbr:`CNI (Container Network Interface)` plugin. Tenants are free to select a
networking overlay (e.g. Flannel, Cilium, OVN, &c.) or other CNI plugin (e.g.
Calico, Romana) of their choice within the tenant cluster, provided that it
does not need to be trusted by the network. (This would preclude solutions that
rely on advertising BGP/OSPF routes, although it's conceivable that one day
these advertisements could be filtered through a trusted component in the
management cluster and rebroadcast to the unencapsulated network -- this would
also be useful for :ref:`load balancing
<teapot-load-balancing-metallb-l3-tenant>` of Services.) If the tenant's CNI
plugin does create an overlay network, that technically means that packets will
be double-encapsulated, which is a Bad Thing when it occurs in VM-based
clusters, for several reasons:
* There is a performance overhead to encapsulating the packets on the
hypervisor, and it also limits the ability to apply some performance
optimisations (such as using SR-IOV to provide direct access to the NICs from
the VMs by virtualising the PCIe bus).
* The extra overhead in each packet can cause fragmentation, and reduces the
bandwidth available at the edge.
* Broadcast, multicast and unknown unicast traffic is flooded to all possible
endpoints in the overlay network; doing this at multiple layers can increase
network load.
However, these problems are significantly mitigated in the Teapot model:
* The performance cost of performing the second encapsulation is eliminated by
offloading it to the network hardware.
* Encapsulation headers are carried only within the core of the network, where
bandwidth is less scarce and frame sizes can be adjusted to prevent
fragmentation.
* CNI plugins don't generally make significant use of broadcast or multicast.
.. _teapot-networking-provisioning:
Provisioning Network
--------------------
Generally bare-metal machines will need at least one interface connected to a
provisioning network in order to boot using :abbr:`PXE (Pre-boot execution
environment)`. Typically the provisioning network is required to be an untagged
VLAN.
PXE can be avoided by provisioning using virtual media (where the BMC attaches
a virtual disk containing the boot image to the host's USB), but hardware
support for doing this from Ironic is uneven (though rapidly improving) and it
is considerably slower than PXE. In addition, the Ironic agent typically
communicates over this network for purposes such as introspection of hosts or
cleaning of disks.
For the purpose of PXE booting, hosts could be left permanently connected to
the provisioning network provided they are isolated from each other (e.g. using
private VLANs). This would have the downside that the main network interface of
the tenant worker would have to appear on a tagged VLAN. However, the Ironic
agent's access to the Ironic APIs is unauthenticated, and therefore not safe to
be carried over networks that have hosts allocated to tenants connected to
them. This could occur over a separate network, but in any event hosts'
membership of this network will have to be changed dynamically in concert with
the baremetal provisioner.
The :abbr:`BMC (Baseboard management controller)`\ s will be connected to a
separate network that is reachable only from the management cluster.
.. _teapot-networking-storage:
Storage Network
---------------
When (optionally) used in combination with multi-tenant storage, machines will
need to also be connected to a separate storage network. The networking
requirements for this network are much simpler, as it does not need to be
dynamically managed. Each edge port should be isolated from all of the others
(using e.g. Private VLANs), regardless of whether they are part of the same
tenant. :abbr:`QoS (Quality of Service)` rules should ensure that no individual
machine can effectively deny access to others. Configuring the switches for the
storage network can be considered out of scope for Project Teapot, at least
initially, as the configuration need not be dynamic, but might be in scope for
the :doc:`installer <installation>`.
.. _teapot-networking-external:
External connections
--------------------
Workloads running in a tenant cluster can request to be exposed for incoming
external connections in a number of different ways. The Teapot cloud is
responsible for ensuring that each of these is possible.
The ``NodePort`` service type simply requires that the IP addresses of the
cluster members be routable from external networks.
For IPv4 support in particular, Teapot will need to be able to allocate public
IP addresses and route traffic for them to the appropriate networks.
Traditionally this is done using :abbr:`NAT (Network Address Translation)`
(e.g. Floating IPs in OpenStack). Users can specify ``externalIPs`` on a
Service to make use of public IPs within their cluster, although there's no built-in way to
discover what IPs are available. Teapot should also have a way of exporting the
:doc:`reverse DNS records <dns>` for public IP addresses.
The ``LoadBalancer`` Service type uses an external :doc:`load balancer
<load-balancing>` as a front end. Traffic from the load balancer is directed
to a ``NodePort`` service within the tenant cluster.
Most managed Kubernetes services provide an Ingress controller that can set up
load balancing (including :abbr:`TLS (Transport Layer Security)` termination)
in the underlying cloud for HTTP(S) traffic, including automatically
configuring public IPs. If Teapot provided :ref:`such an Ingress controller
<teapot-load-balancing-ingress-controller>`, it might be a viable option to not
support public IPs at all for the ``NodePort`` service type. In this case, the
implementation of public IPs could be confined to the :ref:`load balancing API
<teapot-load-balancing-ingress-api>`, and the only stable public IP addresses
would be the Virtual IPs of the load balancers. Tenant IPv6 addresses could
easily be made publicly routable to provide direct access to ``NodePort``
services over IPv6 only, although this also comes with the caveat that some
clients may be tempted to rely on the IP of a Service being static, when in
fact the only safe way to reference it is via a :doc:`DNS name <dns>` exported
by ExternalDNS.
Implementation Options
----------------------
.. _teapot-networking-ansible:
Ansible Networking
~~~~~~~~~~~~~~~~~~
A good long-term implementation strategy might be to use ansible-networking to
directly configure the top-of-rack switches. This would be driven by a
Kubernetes controller running in the management cluster operating on a set of
Custom Resource Definitions (CRDs). The ansible-networking project supports a
wide variety of hardware already. A minimal proof of concept for this
controller `exists <https://github.com/bcrochet/physical-switch-operator>`_.
In addition to configuring the edge switches, a solution for public IPs and
other ways of exposing services is also needed. Future requirements likely
include configuring limited cross-tenant network connectivity, and access to
hardware load balancers and other data center hardware.
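A purely hypothetical example of the kind of custom resource such a controller
might reconcile is shown below; the API group, kind and fields are invented
for illustration.

.. code-block:: python

   # Hypothetical CRD instance mapping an edge switch port onto a tenant VLAN.
   switch_port = {
       "apiVersion": "networking.teapot.example.com/v1alpha1",
       "kind": "SwitchPort",
       "metadata": {"name": "tor1-eth1-12", "namespace": "teapot-system"},
       "spec": {
           "switch": "tor1.rack3",      # reached via its Ansible network module
           "interface": "Ethernet1/12",
           "mode": "access",
           "vlan": 2104,                # the tenant's V(x)LAN
           "host": "worker-01",         # BareMetalHost this port is cabled to
       },
   }
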
.. _teapot-networking-neutron:
OpenStack Neutron
~~~~~~~~~~~~~~~~~
A good short-term option might be to use a cut-down Neutron installation as an
implementation detail to manage the network. Using only the baremetal port
types in Neutron circumvents a lot of the complexity. Most of the Neutron
agents would not be required, so message queue--based RPC could be eliminated
or replaced with json-rpc (as it has been in Ironic for Metal³). Since only a
trusted service would be controlling network changes, Keystone authentication
would not be required either.
To ensure that Neutron itself could eventually be switched out, it would be
strictly confined behind a Kubernetes-native API, in much the same way as
Ironic is behind Metal³. The existing direct integration between Ironic and
Neutron would not be used, and nor could we rely on Neutron to provide an
integration point for e.g. :ref:`Octavia <teapot-load-balancing-octavia>` to
provide an abstraction over hardware load balancers.
The abstraction point would be the Kubernetes CRDs -- different controllers
could be chosen to manage custom resources (and those might in turn make use of
additional non-public CRDs), but we would not attempt to build controllers with
multiple plugin points that could lead to ballooning complexity.


@ -0,0 +1,109 @@
Teapot and OpenStack
====================
Many potential users of Teapot have large existing OpenStack deployments.
Teapot is not intended to be a wholesale replacement for OpenStack -- it does
not deal with virtualisation at all, in fact -- so it is important that the two
complement each other.
.. _teapot-openstack-managed-services:
Managed Services
----------------
A goal of Teapot is to make it easier for cloud providers to offer managed
services to tenants. Attempts to do this in OpenStack, such as Trove_, have
mostly foundered. The Kubernetes Operator pattern offers the most promising
ground for building such services in future, and since Teapot is
Kubernetes-native it would be well-placed to host them.
Building a thin OpenStack-style ReST API over such services would
simultaneously allow their use from an OpenStack cloud (presumably one
sharing, or federated to, the same Keystone). In fact, most such services
could be decoupled
from Teapot altogether and run in a generic Kubernetes cluster so that they
could benefit users of either cloud type even absent the other.
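
As a sketch of how thin such an API layer could be, the hypothetical Flask
handler below (Keystone authentication middleware omitted) simply translates
an OpenStack-style request into a custom resource for a database operator to
reconcile; the ``dbaas.teapot.example`` group and all of the field names are
invented for illustration:

.. code-block:: python

    from flask import Flask, jsonify, request
    from kubernetes import client, config

    app = Flask(__name__)
    config.load_incluster_config()
    crds = client.CustomObjectsApi()

    @app.route("/v1/databases", methods=["POST"])
    def create_database():
        req = request.get_json()
        # Map the caller's project onto a namespace and hand the request over
        # to the (hypothetical) operator as a custom resource.
        instance = {
            "apiVersion": "dbaas.teapot.example/v1alpha1",
            "kind": "DatabaseInstance",
            "metadata": {"name": req["name"], "namespace": req["project_id"]},
            "spec": {
                "engine": req.get("engine", "postgresql"),
                "replicas": req.get("replicas", 1),
            },
        }
        crds.create_namespaced_custom_object(
            group="dbaas.teapot.example", version="v1alpha1",
            namespace=req["project_id"], plural="databaseinstances",
            body=instance)
        return jsonify(instance), 202

    if __name__ == "__main__":
        app.run(port=8080)
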
Teapot's :ref:`load balancing API <teapot-load-balancing-ingress-api>` would
arguably already be a managed service. As a first example, :ref:`Octavia
<teapot-load-balancing-octavia>` could possibly use it as a back-end.
.. _teapot-openstack-side-by-side:
Side-by-side Clouds
-------------------
Teapot should be co-installable alongside an existing OpenStack cloud to
provide additional value. In this configuration, the Teapot cloud would use the
OpenStack cloud's :ref:`Keystone <teapot-idm-keystone>` and any services that
are expected to be found in the catalog (e.g. :ref:`Manila
<teapot-storage-manila>`, :ref:`Cinder <teapot-storage-cinder>`,
:ref:`Designate <teapot-dns-designate>`).
An OpenStack-style ReST API in front of Teapot would allow users of the
OpenStack cloud to create and manage bare-metal Kubernetes clusters in much the
same way they do today with Magnum.
Tenants would need a way to connect their Neutron networks in OpenStack to the
Kubernetes clusters. Since Teapot tenant networks are :ref:`just V(x)LANs
<teapot-networking-multi-tenancy>`, this could be accomplished by adding those
networks as provider networks in Neutron, and allowing the correct tenants to
connect to them via Neutron routers. This should be sufficient for the main use
case, which would be running parts of an application in a Kubernetes cluster
while other parts remain in OpenStack VMs.
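
A sketch of what that plumbing might look like from the OpenStack side, using
openstacksdk with admin credentials; the VLAN segment, physical network name,
CIDR and target project ID are all placeholders:

.. code-block:: python

    import openstack

    conn = openstack.connect(cloud="openstack-admin")

    # Expose the tenant's Teapot VLAN as a provider network in Neutron...
    net = conn.network.create_network(
        name="teapot-tenant-a",
        provider_network_type="vlan",
        provider_physical_network="datacentre",
        provider_segmentation_id=2042,
    )
    conn.network.create_subnet(
        network_id=net.id, ip_version=4, cidr="192.0.2.0/24",
        name="teapot-tenant-a-v4",
    )

    # ...and grant only the matching OpenStack project access to it, so that
    # the tenant can attach it to their own Neutron router.
    conn.network.create_rbac_policy(
        object_type="network",
        object_id=net.id,
        action="access_as_shared",
        target_project_id="9c5e00-placeholder",
    )
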
However, the ideal for this type of deployment would be to allow servers to be
dynamically moved between the OpenStack and Teapot clouds. Sharing inventory
with OpenStack's Ironic might be simple enough -- if Metal³ were configured to
use the OpenStack cloud's Ironic then a small component could claim hosts in
OpenStack Placement and create corresponding BareMetalHost objects in Teapot.
Both clouds would end up manipulating the top-of-rack switch configuration for
a host, but presumably only at different times.
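
The Teapot half of such a component might look something like the following
sketch; the OpenStack Placement interaction is elided, and the host name, BMC
address and credentials Secret are placeholders:

.. code-block:: python

    from kubernetes import client, config

    config.load_kube_config()

    # After claiming the node in OpenStack Placement (not shown), register the
    # same physical host with Teapot's Metal³ by creating a BareMetalHost.
    bmh = {
        "apiVersion": "metal3.io/v1alpha1",
        "kind": "BareMetalHost",
        "metadata": {"name": "rack1-host-12", "namespace": "tenant-a"},
        "spec": {
            "online": True,
            "bootMACAddress": "52:54:00:12:34:56",
            "bmc": {
                "address": "ipmi://10.0.0.12",
                "credentialsName": "rack1-host-12-bmc-secret",
            },
        },
    }

    client.CustomObjectsApi().create_namespaced_custom_object(
        group="metal3.io", version="v1alpha1", namespace="tenant-a",
        plural="baremetalhosts", body=bmh)
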
Switching hosts between acting as OpenStack compute nodes and being available
to Teapot tenants would be more complex, since it would require interaction
with the tool managing the OpenStack deployment, of which there are many.
However, supporting autoscaling between the two is probably unnecessary.
Manually moving hosts between the clouds should be manageable, since no changes
to the physical network cabling would be required. Separate :ref:`provisioning
networks <teapot-networking-provisioning>` would need to be maintained, since
the provisioner needs control over DHCP.
.. _teapot-openstack-on-teapot:
OpenStack on Teapot
-------------------
To date, the most popular OpenStack installers have converged on Ansible as a
deployment tool, because OpenStack's complexity demands a degree of control
over the deployment workflow that purely declarative tools struggle to match.
However, Kubernetes Operators present a declarative alternative that is
nonetheless equally flexible. Even without Operators, Airship_ and StarlingX_
are both
installing OpenStack on top of Kubernetes. It seems likely that in the future
this will be a popular way of layering things, and Teapot is well-placed to
enable it since it provides bare-metal hosts running Kubernetes.
For a large, shared OpenStack cloud, this would likely be best achieved by
running the OpenStack control plane components inside the Teapot management
cluster. Sharing of services would then be similar to the side-by-side case.
OpenStack Compute nodes or e.g. Ceph storage nodes could be deployed using
`Metal³`_. This effectively means building an OpenStack installation/management
system similar to a TripleO undercloud but based on Kubernetes.
There is a second use case, for running small OpenStack installations (similar
to StarlingX) within a tenant. In these cases, the tenant OpenStack would still
need to access storage from the Teapot cloud. This could possibly be achieved
by federating the tenant Keystone to Teapot's Keystone and using hierarchical
multi-tenancy so that projects in the tenant Keystone are actually sub-projects
of the tenant's project in the Teapot Keystone. (The long-dead `Trio2o
<https://opendev.org/x/trio2o#trio2o>`_ project also offered a potential
solution in the form of an API proxy, but probably not one worth resurrecting.)
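
The mapping itself would be straightforward; a sketch using openstacksdk
against the Teapot Keystone, in which the project names and cloud entry are
placeholders:

.. code-block:: python

    import openstack

    conn = openstack.connect(cloud="teapot-admin")

    # The tenant's existing project in the Teapot Keystone.
    parent = conn.identity.find_project("tenant-a")

    # Each project in the tenant's own Keystone becomes a sub-project here, so
    # ownership of (and quota for) shared storage stays rooted under the
    # tenant's project.
    conn.identity.create_project(
        name="tenant-a-openstack-dev",
        domain_id=parent.domain_id,
        parent_id=parent.id,
    )
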
Use of an overlay network (e.g. OVN) would be required, since the tenant would
have no access to the underlying network hardware. Some integration between the
tenant's Neutron and Teapot would need to be built to allow ingress traffic.
.. _Trove: https://docs.openstack.org/trove/
.. _Airship: https://www.airshipit.org/
.. _StarlingX: https://www.starlingx.io/
.. _Metal³: https://metal3.io/


@ -0,0 +1,140 @@
Teapot Storage
==============
Project Teapot should have the ability to optionally provide multi-tenant
access to shared file, block, and/or object storage. Shared file and block
storage capabilities are not currently available to Kubernetes users except
through the cloud providers.
Tenants can always choose to use hyperconverged storage -- that is to say, both
compute and storage workloads on the same hosts -- without involvement or
permission from Teapot. (For example, by using Rook_.) However, this means that
compute and storage cannot be scaled independently; they are tightly coupled.
Tenants with disproportionately large amounts of data but modest compute needs
(and sometimes vice-versa) would not be served efficiently. Hyperconverged
storage also usually makes sense only for clusters that are essentially fixed
in size: changing the size of the cluster triggers a rebalancing of storage,
so it is not suitable for workloads that vary greatly over time (for instance,
training of machine learning models).
Running hyperconverged storage efficiently also requires a somewhat specialised
choice of servers. Particularly in a large cloud where different tenants have
different storage requirements, it might be cheaper to provide a centralised
storage cluster and thus require either fewer variants or less specialisation
of server hardware.
For all of these reasons, a shared storage pool is needed to take full
advantage of the highly dynamic environment offered by a cloud like Teapot.
Providing multi-tenant access to shared file and block storage allows the cloud
provider to use a dedicated storage network (such as a :abbr:`SAN (Storage Area
Network)`). Many potential users may already have something like this. Having
the storage centralised also makes it easier and more efficient to share large
amounts of data between tenants when required (since traffic can be confined to
the same :ref:`storage network <teapot-networking-storage>` rather than
traversing the public network).
Applications can use object storage hosted anywhere (including outside clouds),
but to minimise network bandwidth it will often be better to have it nearby.
Should the `proposal to add Object Bucket Provisioning
<https://github.com/kubernetes/enhancements/pull/1383>`_ to Kubernetes
eventuate, there will also be an advantage in having object storage as part of
the local cloud, using the same authentication mechanism.
Implementation Options
----------------------
OpenStack already provides robust, mature implementations of multi-tenant
shared storage that are accessible from Kubernetes. The main task would be to
integrate them into the system and simplify deployment. These services would
run in either the management cluster or a separate (but still
centrally-managed) storage cluster.
.. _teapot-storage-manila:
OpenStack Manila
~~~~~~~~~~~~~~~~
Manila_ is the most natural fit for Kubernetes because it provides 'RWX'
(Read/Write Many) persistent storage, which is often needed to avoid downtime
when pods are upgraded or rescheduled to different nodes as well as for
applications where multiple pods are writing to the same filesystem in
parallel.
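
From the tenant's point of view this is just an ordinary 'RWX'
PersistentVolumeClaim against a Manila-backed storage class; a sketch using
the Kubernetes Python client, with a placeholder class name and size:

.. code-block:: python

    from kubernetes import client, config

    config.load_kube_config()

    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "shared-data"},
        "spec": {
            "accessModes": ["ReadWriteMany"],         # many pods, many nodes
            "storageClassName": "csi-manila-cephfs",  # placeholder class name
            "resources": {"requests": {"storage": "100Gi"}},
        },
    }

    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc)
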
Manila's architecture is relatively simple already. It would be helpful if the
dependency on RabbitMQ could be removed (to be replaced with e.g. json-rpc in
the same way that Ironic has in Metal³), but this would require more
investigation. An Operator for deploying and managing Manila on Kubernetes is
under development.
A :abbr:`CSI (Container Storage Interface)` plugin for Manila already exists in
cloud-provider-openstack_.
.. _teapot-storage-cinder:
OpenStack Cinder
~~~~~~~~~~~~~~~~
Cinder_ is more limited than Manila in the sense that it can provide only 'RWO'
(Read/Write Once) access to persistent storage for most applications.
(Kubernetes volume mounts are generally file-based -- Kubernetes creates its
own file system on block devices if none is present.) However, Kubernetes does
now support raw block storage volumes, which *do* support 'RWX' mode for
applications that can work with raw block offsets. KubeVirt in particular is
expected to make use of raw block mode persistent volumes for backing virtual
machines, so this is likely to be a common use case.
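
A raw block volume is requested by setting ``volumeMode: Block`` on the claim;
shared ('RWX') access additionally depends on the backing Cinder volume type
supporting multi-attach. A sketch, with a placeholder class name and size:

.. code-block:: python

    from kubernetes import client, config

    config.load_kube_config()

    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "vm-disk-0"},
        "spec": {
            "volumeMode": "Block",             # raw device, no filesystem
            "accessModes": ["ReadWriteMany"],  # e.g. for VM live migration
            "storageClassName": "csi-cinder",  # placeholder class name
            "resources": {"requests": {"storage": "40Gi"}},
        },
    }

    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc)
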
Much of the complexity in Cinder is linked to the need to provide agents
running on Nova compute hosts. Since Teapot is a baremetal-only service, only
the parts of Cinder needed to provide storage to Ironic servers are required.
Unfortunately, Cinder is quite heavily dependent on RabbitMQ. However, there
may be scope for simplification through further work with the Cinder community.
The remaining portions of Cinder are architecturally very similar to Manila, so
similar results could be expected.
Cinder has a dependency on Barbican for supporting encrypted volumes. Encrypted
volume support is not required but would be nice to have. This is another
reason to use :ref:`Barbican <teapot-key-management-barbican>`. It would be
nice to think that we could adapt Cinder to use Kubernetes Secrets instead
(perhaps via another key-manager back-end for Castellan), but without Barbican
or an equivalent that doesn't actually provide the :doc:`level of security you
would hope for <key-management>` anyway.
A :abbr:`CSI (Container Storage Interface)` plugin for Cinder already exists in
cloud-provider-openstack_.
Ember_ is an alternative CSI plugin that makes use of cinderlib, rather than
all of Cinder. This allows Cinder's hardware drivers to be used directly from
Kubernetes while eliminating a lot of overhead. However, some of the overhead
that is eliminated is the API that enforces multi-tenancy. Therefore, Ember is
not an option for this particular use case.
.. _teapot-storage-swift:
OpenStack Swift
~~~~~~~~~~~~~~~
Swift_ is a very mature object storage system, with both a native API and the
ability to emulate Amazon S3. It supports :ref:`Keystone <teapot-idm-keystone>`
authentication. It has a relatively simple architecture that should make it
straightforward to deploy on top of Kubernetes.
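
For example, an application could talk to Swift directly with its Keystone
credentials using python-swiftclient; the endpoint and account details below
are placeholders:

.. code-block:: python

    from swiftclient import Connection

    conn = Connection(
        authurl="https://keystone.teapot.example:5000/v3",
        user="demo", key="secret", auth_version="3",
        os_options={
            "project_name": "demo",
            "project_domain_name": "Default",
            "user_domain_name": "Default",
        },
    )

    conn.put_container("training-data")
    with open("dataset-001.tar", "rb") as data:
        conn.put_object("training-data", "dataset-001.tar", contents=data)
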
.. _teapot-storage-radosgw:
Ceph Object Gateway
~~~~~~~~~~~~~~~~~~~
RadosGW_ is a service to provide an object storage interface backed by Ceph,
with two APIs that are compatible with large subsets of Swift and Amazon S3,
respectively. It can use either :ref:`Keystone <teapot-idm-keystone>` or
:ref:`Keycloak <teapot-idm-keycloak>` for authentication. It can be installed
and managed using the Rook_ operator.
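
As a rough sketch, the object store itself would be declared as a Rook custom
resource and reconciled by the operator; the pool sizing below is purely
illustrative, and the Keystone/Keycloak authentication configuration is
omitted:

.. code-block:: python

    from kubernetes import client, config

    config.load_kube_config()

    object_store = {
        "apiVersion": "ceph.rook.io/v1",
        "kind": "CephObjectStore",
        "metadata": {"name": "teapot-objects", "namespace": "rook-ceph"},
        "spec": {
            "metadataPool": {"replicated": {"size": 3}},
            "dataPool": {"replicated": {"size": 3}},
            "gateway": {"port": 80, "instances": 2},
        },
    }

    client.CustomObjectsApi().create_namespaced_custom_object(
        group="ceph.rook.io", version="v1", namespace="rook-ceph",
        plural="cephobjectstores", body=object_store)
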
.. _Rook: https://rook.io/
.. _cloud-provider-openstack: https://github.com/kubernetes/cloud-provider-openstack#readme
.. _Manila: https://docs.openstack.org/manila/latest/
.. _Cinder: https://docs.openstack.org/cinder/latest/
.. _Ember: https://ember-csi.io/
.. _Swift: https://docs.openstack.org/swift/latest/
.. _RadosGW: https://docs.ceph.com/docs/master/radosgw/


@ -89,8 +89,9 @@ Proposed ideas
 ==============
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
    :titlesonly:
    :glob:
 
+   ideas/*/index
    ideas/*