Merge "Add Project Teapot idea"

This commit is contained in:
Zuul 2020-03-09 19:14:38 +00:00 committed by Gerrit Code Review
commit 6f33f32257
11 changed files with 1388 additions and 1 deletions


@ -0,0 +1,178 @@
Teapot Compute
==============
Project Teapot is conceived as an exclusively bare-metal compute service for
Kubernetes clusters. Providing bare-metal compute workers to tenants allows
them to make their own decisions about how they make use of virtualisation. For
example, tenants can choose to use a container hypervisor (such as Kata_) to
further sandbox applications, traditional VMs (such as those managed by
KubeVirt_ or `OpenStack Nova`_), *or both* `side-by-side
<https://kubernetes.io/docs/concepts/containers/runtime-class/>`_ in the same
cluster. Furthermore, it allows users to manage all components of an
application -- both those that run in containers and those that need a
traditional VM -- from the same Kubernetes control plane (using KubeVirt).
Finally, it eliminates the complexity of needing to virtualise access to
specialist hardware such as :abbr:`GPGPU (general-purpose GPU)`\ s or FPGAs,
while still allowing the capability to be used by different tenants at
different times.
However, the *master* nodes of tenant clusters will run in containers on the
management cluster (or some other centrally-managed cluster). This makes it
easy and cost-effective to provide high availability of cluster control planes,
without sacrificing large numbers of hosts to this purpose or requiring
workloads to run on master nodes. It also makes it possible to optionally
operate Teapot as a fully-managed Kubernetes service. Finally, it makes it
relatively cheap to scale a cluster to zero when it has nothing to do, for
example if it is only used for batch jobs, without requiring it to be recreated
from scratch each time. Since the management cluster also runs on bare metal,
the tenant pods could also be isolated from each other and from the rest of the
system using Kata, in addition to regular security policies.
.. _teapot-compute-metal3:
Metal³
------
Provisioning of bare-metal servers will use `Metal³`_.
The baremetal-operator from Metal³ provides a Kubernetes-native interface over
a simplified `OpenStack Ironic`_ deployment. In this configuration, Ironic runs
standalone (i.e. it does not use Keystone authentication). All communication
between components occurs inside of a pod. RabbitMQ has been replaced by
json-rpc. Ironic state is maintained in a database, but the database can run on
ephemeral storage -- the Kubernetes custom resource (BareMetalHost) is the
source of truth.
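As a rough sketch of how Kubernetes-native that interface is (using the Python
Kubernetes client; the host name, namespace and BMC address are placeholders,
and the BMC credentials Secret is assumed to exist already), registering a
host amounts to creating a BareMetalHost resource:

.. code-block:: python

   # Hedged sketch: register a host with the baremetal-operator by creating a
   # BareMetalHost custom resource. All names and addresses are placeholders.
   from kubernetes import client, config

   config.load_kube_config()  # management cluster context
   api = client.CustomObjectsApi()

   bare_metal_host = {
       "apiVersion": "metal3.io/v1alpha1",
       "kind": "BareMetalHost",
       "metadata": {"name": "worker-01", "namespace": "metal3"},
       "spec": {
           "online": True,
           "bootMACAddress": "52:54:00:aa:bb:cc",
           "bmc": {
               # BMC network address, reachable only from the management cluster
               "address": "ipmi://10.0.0.11",
               # Secret holding the BMC username and password
               "credentialsName": "worker-01-bmc-secret",
           },
       },
   }

   api.create_namespaced_custom_object(
       "metal3.io", "v1alpha1", "metal3", "baremetalhosts", bare_metal_host
   )
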
The baremetal-operator will run only in the management cluster (or some other
centrally managed cluster) because it requires access to both the :abbr:`BMC
(Baseboard Management Controller)`\ s' network (as well as the
:ref:`provisioning network <teapot-networking-provisioning>`) and the
authentication credentials for the BMCs.
.. _teapot-compute-cluster-api:
Cluster API
-----------
The baremetal-operator can be integrated with the Kubernetes Cluster Lifecycle
SIG's `Cluster API`_ via another Metal³ component, the
cluster-api-provider-baremetal. This contains a BareMetalMachine controller
that implements the Machine abstraction using a BareMetalHost. (Airship_ 2.0 is
also slated to use Metal³ and the Cluster API to manage cluster provisioning,
so this mechanism could be extended to deploy fully-configured clusters with
Airship as well.)
When the Cluster API is used to build standalone clusters, typically a
bootstrap node is created (often using a local VM) to run it in order to create
the permanent cluster members. The Cluster and Machine resources are then
'pivoted' (copied) into the cluster, which continues to manage itself while the
bootstrap node is retired. When used with a centralised cluster manager such as
Teapot, the process is usually similar but can use the management cluster to do
the bootstrapping. Pivoting is optional but usually expected.
Teapot imposes some additional constraints. Because the BareMetalHost objects
must remain in the management cluster, the Machine objects cannot be simply
copied to the tenant cluster and continue to be backed by the BareMetalMachine
controller in its present form.
One option might be to build a machine controller for the tenant cluster that
is backed by a Machine object in another cluster (the management cluster). This
might prove useful for centralised management clusters in general, not just
Teapot. We would have no choice but to name this component
cluster-api-provider-cluster-api.
Cluster API does not yet have support for running the tenant control plane in
containers. Tools like Gardener_ do, but are not yet well integrated with the
Cluster API. However, the Cluster Lifecycle SIG is aware of this use case, and
will likely evolve the Cluster API to make this possible.
.. _teapot-compute-autoscaling:
Autoscaling
-----------
The preferred mechanism in Kubernetes for applications to control the size of
the cluster they run in is the Cluster Autoscaler. There is no separate
interface to this mechanism for applications. If an application is too busy, it
simply requests more or larger pods. When there is no longer sufficient
capacity to schedule all requested pods, the Cluster Autoscaler will scale the
cluster up. Similarly, if there is significant excess capacity not being used
by pods, Cluster Autoscaler will scale the cluster down.
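To make the mechanism concrete, the sketch below (using the Python Kubernetes
client; the deployment name, namespace, image and resource figures are all
invented) shows the only 'interface' an application has to autoscaling:
declaring replicas and resource requests. If the requested pods cannot all be
scheduled, the Cluster Autoscaler reacts by adding machines.

.. code-block:: python

   # Illustrative sketch: an application scales itself by requesting more or
   # larger pods; there is no direct autoscaling API call to make.
   from kubernetes import client, config

   config.load_kube_config()  # tenant cluster context

   deployment = {
       "apiVersion": "apps/v1",
       "kind": "Deployment",
       "metadata": {"name": "batch-worker", "namespace": "default"},
       "spec": {
           "replicas": 50,  # raising this may leave pods unschedulable...
           "selector": {"matchLabels": {"app": "batch-worker"}},
           "template": {
               "metadata": {"labels": {"app": "batch-worker"}},
               "spec": {
                   "containers": [{
                       "name": "worker",
                       "image": "example.com/batch-worker:latest",
                       # ...and these requests determine how much capacity is
                       # needed; unschedulable pods trigger the scale-up.
                       "resources": {"requests": {"cpu": "8", "memory": "32Gi"}},
                   }],
               },
           },
       },
   }

   client.AppsV1Api().create_namespaced_deployment("default", deployment)
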
Cluster Autoscaler works using its own cloud-specific plugins. A `plugin that
uses the Cluster API is in progress
<https://github.com/kubernetes/autoscaler/pull/1866>`_, so Teapot could
automatically make use of that provided that the Machine resources were pivoted
into the tenant cluster.
One significant challenge posed by bare metal is the extremely high latency
involved in provisioning a bare-metal host (15 minutes is not unusual, due in
large part to running hardware tests including checking increasingly massive
amounts of RAM). The situation is even worse when needing to deprovision a host
from one tenant before giving it to another tenant, since that requires
cleaning the local disks, though this extra overhead can be essentially
eliminated if the disk is encrypted (in which case only the keys need be
erased).
.. _teapot-compute-scheduling:
Scheduling
----------
Unlike when operating a standalone bare-metal cluster, when allocating hosts
amongst different clusters it is important to have sophisticated ways of
selecting which hosts are added to which cluster.
An obvious example would be selecting for various hardware traits -- which are
unlikely to be grouped into 'flavours' in the way that Nova does. The optimal
way of doing this would likely include some sort of cost function, so that a
cluster is always allocated the minimum spec machine that meets its
requirements. Another example would be selecting for either affinity or
anti-affinity of hosts, possibly at different (and deployment-specific) levels
of granularity.
Work is underway in Metal³ on a hardware-classification-controller that will
add labels to BareMetalHosts based on selected traits, and the baremetal
actuator can select hosts based on labels. This would be sufficient to perform
flavour-based allocation and affinity, but likely not on its own for
trait-based allocation and anti-affinity.
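In its simplest form, the kind of cost function alluded to above might look
like the toy sketch below; the trait labels, weights and cost model are
invented for illustration and are not part of any existing Metal³ component.

.. code-block:: python

   # Toy sketch of trait filtering plus a cost function, so that a cluster is
   # always allocated the minimum-spec machine that meets its requirements.
   WEIGHTS = {"cpus": 1.0, "ram_gb": 0.1, "gpus": 50.0}  # invented relative costs

   def cost(host):
       return sum(WEIGHTS.get(k, 0.0) * v for k, v in host["resources"].items())

   def pick_host(hosts, required_traits):
       candidates = [h for h in hosts if required_traits.issubset(h["labels"])]
       return min(candidates, key=cost, default=None)

   hosts = [
       {"name": "gpu-box", "labels": {"gpu", "10gbe"},
        "resources": {"cpus": 64, "ram_gb": 512, "gpus": 4}},
       {"name": "small-box", "labels": {"10gbe"},
        "resources": {"cpus": 16, "ram_gb": 64, "gpus": 0}},
   ]

   # The cheapest host satisfying the traits wins; the GPU box stays in reserve.
   print(pick_host(hosts, {"10gbe"})["name"])  # -> small-box
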
.. _teapot-compute-reservation:
Reservation and Quota Management
--------------------------------
The design for quota management should recognise the many ways in which it is
used in both private and public clouds. In public clouds utilisation is
controlled by billing; quotas are primarily a tool for *users* to limit their
financial exposure.
In private OpenStack clouds, the implementation of chargeback is rare. A more
common model is that a department will contribute a portion of the capital
budget for a cloud in exchange for a quota -- a model that fits quite well with
Teapot's allocation of entire hosts to tenants.
To best support the private cloud use case, there need to be separate concepts
of a guaranteed minimum reservation and a maximum quota. The sum of minimum
reservations must not exceed the capacity of the cloud (a more complex
requirement than it sounds, since it must take into account selected hardware
traits). Some form of pre-emption is needed, along with a way of prioritising
requests for hosts. Similar concepts exist in many public clouds, in the form
of reserved and spot-rate instances.
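As a much-simplified illustration of why that feasibility check is harder than
it sounds, the sketch below treats each hardware trait as an independent pool;
in reality a single host can satisfy several trait sets at once, which is what
makes the real constraint awkward. All figures are invented.

.. code-block:: python

   # Simplified sketch: the sum of guaranteed minimum reservations must fit
   # within capacity for every hardware trait.
   from collections import Counter

   def reservations_feasible(reservations, capacity_by_trait):
       demanded = Counter()
       for res in reservations:  # e.g. {"tenant": "a", "trait": "gpu", "hosts": 4}
           demanded[res["trait"]] += res["hosts"]
       return all(demanded[t] <= capacity_by_trait.get(t, 0) for t in demanded)

   print(reservations_feasible(
       [{"tenant": "a", "trait": "gpu", "hosts": 4},
        {"tenant": "b", "trait": "gpu", "hosts": 3}],
       {"gpu": 6, "generic": 40},
   ))  # -> False: the guaranteed minimums exceed the GPU capacity
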
The reservation/quota system should have a time component. This allows, for
example, users who have large batch jobs to reserve capacity for them without
tying it up around the clock. (The increasing importance of machine learning
means that once again almost everybody has large batch jobs.) Time-based
reservations can also help mitigate the high latency of moving hosts between
tenants, by allowing some of the demand to be anticipated.
.. _Kata: https://katacontainers.io/
.. _KubeVirt: https://kubevirt.io/
.. _OpenStack Nova: https://docs.openstack.org/nova
.. _Metal³: https://metal3.io/
.. _OpenStack Ironic: https://docs.openstack.org/ironic
.. _Cluster API: https://github.com/kubernetes-sigs/cluster-api#readme
.. _Airship: https://www.airshipit.org/
.. _Gardener: https://gardener.cloud/030-architecture/


@ -0,0 +1,117 @@
Teapot DNS
==========
Project Teapot must provide a trusted way for DNS information generated by the
(untrusted) tenant clusters to be propagated out to the network.
Each tenant cluster requires at least 2 DNS records -- one for the control
plane, and a wildcard for any applications. These would usually be subdomains
of a zone delegated to the Teapot cloud for this purpose. Teapot would be responsible
for rolling up these records and making them available over DNS.
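As a purely hypothetical illustration (the zone, record names and addresses
are invented), the roll-up for each tenant cluster amounts to two records
under the delegated zone:

.. code-block:: python

   # Hypothetical sketch of the two records Teapot would publish per cluster.
   def cluster_records(cluster, zone, api_vip, ingress_vip):
       return [
           # Control plane endpoint, e.g. the tenant's Kubernetes API server.
           {"name": f"api.{cluster}.{zone}", "type": "A", "data": api_vip},
           # Wildcard covering application Ingresses in the tenant cluster.
           {"name": f"*.apps.{cluster}.{zone}", "type": "A", "data": ingress_vip},
       ]

   print(cluster_records("blue", "clusters.teapot.example.com",
                         "198.51.100.10", "198.51.100.11"))
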
Since Teapot will be responsible for :ref:`allocating public IP addresses
<teapot-networking-external>`, it will also need to be responsible for
advertising reverse DNS records for those IPs.
Implementation Options
----------------------
The Kubernetes SIG ExternalDNS_ project is a Kubernetes-native service that
collects IP addresses for Services and Ingresses running in the cluster and
exports DNS records for them (though it is *not* itself a DNS server). It
supports many different back-ends -- both traditional DNS servers and
cloud-based services (including OpenStack Designate).
While tenants are of course free to run this in their own clusters already
(perhaps pointing to an external cloud service), this is not sufficient to
satisfy the above requirements. It requires them to use an external cloud
service (which may not always be appropriate for internal-only applications in
a private cloud), since tenants are untrusted and cannot be given write access
to an internal DNS server. And reverse DNS records cannot be exported, because
tenant clusters are not a trusted source of information about what IP addresses
are assigned to them.
.. _teapot-dns-externaldns:
ExternalDNS in load balancing cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If Teapot implemented the :doc:`load balancing <load-balancing>` :ref:`option
based on Ingress resources <teapot-load-balancing-ingress-api>` in the
management cluster (or a separate load balancing cluster), and these were used
for both Services and Ingresses, then ExternalDNS running in that same cluster
would automatically see all of the external endpoints for the tenant clusters.
It could even rely on the fact that the IP addresses will have been sanitised
already before creating the Ingress objects. There would need to be provision
made somewhere for sanitising the DNS names, however.
On its own this only satisfies the first requirement. Additional work might
need to be done to export the wildcard DNS records for the tenant workloads.
(Note that the tenant control planes would be running in containers on the
management cluster or another centrally-managed cluster, and may well have
Ingress resources associated with them already.) And additional work would
certainly be needed to export the reverse DNS records.
A major downside of this is that it gives the tenant very little control over
whether and how it exports DNS information.
.. _teapot-dns-externaldns-sync:
Build a component to sync ExternalDNS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A component running in a tenant cluster could sync any ExternalDNS Endpoint
resources (not to be confused with Kubernetes Endpoint resources) from the
tenant cluster into the management cluster. (This component could even be
written as an ExternalDNS provider.) This option is analogous to an
:ref:`Ingress-based API for load balancing
<teapot-load-balancing-ingress-api>`.
On the management cluster side, a validating webhook would check for legitimacy
prior to accepting a resource. More investigation is required into the
mechanics of this -- since the resources are not normally manipulated by
anything other than ExternalDNS itself, having something else writing to them
might prove brittle.
Again, additional work might need to be done to export the wildcard DNS records
for the tenant workloads and would be needed for reverse DNS records.
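A rough sketch of such a sync component is shown below, assuming ExternalDNS's
DNSEndpoint custom resource (``externaldns.k8s.io/v1alpha1``) and the Python
Kubernetes client; the contexts, namespaces and ownership label are
hypothetical, and the legitimacy checks would live in the validating webhook
on the management side.

.. code-block:: python

   # Sketch of syncing DNSEndpoint resources from a tenant cluster into the
   # management cluster. Contexts, namespaces and labels are placeholders.
   from kubernetes import client, config

   GROUP, VERSION, PLURAL = "externaldns.k8s.io", "v1alpha1", "dnsendpoints"

   tenant = client.CustomObjectsApi(config.new_client_from_config(context="tenant"))
   mgmt = client.CustomObjectsApi(config.new_client_from_config(context="management"))

   def sync(tenant_ns="default", mgmt_ns="tenant-blue-dns"):
       endpoints = tenant.list_namespaced_custom_object(GROUP, VERSION, tenant_ns, PLURAL)
       for ep in endpoints.get("items", []):
           ep["metadata"] = {
               "name": ep["metadata"]["name"],
               "namespace": mgmt_ns,
               "labels": {"teapot.example.com/tenant": "blue"},  # hypothetical label
           }
           # The management cluster's validating webhook vets each record here.
           mgmt.create_namespaced_custom_object(GROUP, VERSION, mgmt_ns, PLURAL, ep)
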
.. _teapot-dns-designate:
OpenStack Designate
~~~~~~~~~~~~~~~~~~~
Designate_ is already one of the supported back-ends for ExternalDNS. By
running a minimal, opinionated installation of Designate in the management
cluster we could allow tenants to choose whether and how to set up ExternalDNS
in their own clusters. They could choose to export records to the Teapot cloud,
to some external cloud, or not at all.
Since Designate has an API, it would be easy to add the two top-level records
for each cluster.
Designate has the ability to export reverse DNS records based on floating IPs.
However, the current implementation is tightly coupled to Neutron. If Neutron
is used in Teapot it should be as an implementation detail only, so other
services like Designate should not rely on integrating with it. Therefore
additional work would be required to support reverse DNS. There is an API
plugin point to pull data, or it could be pushed in through the Designate API.
Ideally the back-end in the opinionated configuration would be CoreDNS_, due to
its status in the Kubernetes community (it is used for the *internal* DNS and
is a CNCF project). However, there is currently no CoreDNS back-end for
Designate. An alternative to writing one would be to write a Designate plugin
for CoreDNS -- similar plugins exist for other clouds already. The latter would
provide the most benefit to OpenStack users, since theoretically tenants could
make use of it even if CoreDNS is not chosen as the back-end by their OpenStack
cloud's administrators.
The Designate Sink component would not be required, but the rest of Designate
is also built around RabbitMQ, which is highly undesirable. However, it is
largely used to implement RPC patterns (``call``, not ``cast``), and might be
amenable to being swapped for a json-rpc interface in the same way as is done
in Ironic for Metal³.
.. _ExternalDNS: https://github.com/kubernetes-sigs/external-dns#readme
.. _Designate: https://docs.openstack.org/designate/
.. _CoreDNS: https://coredns.io/


@ -0,0 +1,146 @@
Teapot Identity Management
==========================
Teapot need not, and should not, impose any particular identity management
system for tenant clusters. These are the clusters that applications and
application developers/operators will routinely interact with, and the choice
of identity management providers is completely up to the administrators of
those clusters, or at least the administrator of the Teapot cloud when running
as a fully-managed service.
Identity management in Teapot itself (i.e. the management cluster) is needed
for two different purposes. While not strictly necessary, it would be
advantageous to require only one identity management provider to cover both of
these use cases.
Authenticating From Below
-------------------------
Software running in the tenant clusters needs to authenticate to the cloud to
request resources, such as machines, :doc:`load balancers <load-balancing>`,
:doc:`shared storage <storage>`, :doc:`DNS records <dns>`, and (in future)
managed software services.
Credentials for these purposes should be regularly rotated and narrowly
authorised, to limit both the scope and duration of any compromise.
Authenticating From Above
-------------------------
Real users, and sometimes software services, need to authenticate to the cloud to
create or destroy clusters, manually scale them up or down, request quotas, and
so on.
In many cases, such as most enterprise private clouds, these credentials should
be linked to an external identity management provider. This would allow
auditors of the system to tie physical hardware directly back to the corporeal
humans to whom it is allocated and the organisational units to which they
belong.
Humans must also have a secure way of delegating privileges to an application
to interact with the cloud in this way -- for example, imagine a CI system that
needs to create an entire test cluster from scratch and destroy it again. This
must not require the user's own credentials to be stored anywhere.
Implementation options
----------------------
.. _teapot-idm-keystone:
OpenStack Keystone
~~~~~~~~~~~~~~~~~~
Keystone_ is currently the only game in town for providing identity management
for the OpenStack services that are candidates for inclusion in Teapot to
provide multi-tenant functionality, such as :ref:`Manila
<teapot-storage-manila>` and :ref:`Designate <teapot-dns-designate>`. Therefore
using Keystone for all identity management on the management cluster would not
only not increase the complexity of the deployment, it would actually minimise it.
An authorisation webhook for Kubernetes that uses Keystone is available in
cloud-provider-openstack_. In general, OAuth seems to be preferred to webhooks
for connecting external identity management systems, but there is at least a
working option.
Keystone supports delegating user authentication
to LDAP, as well as offering its own built-in user management. It can also
federate with other identity providers via the `OpenID Connect`_ or SAML_
protocols. Using Keystone would also make it simpler to run Teapot alongside an
existing OpenStack cloud -- enabling tenants to share services in that cloud,
as well as potentially making Teapot's functionality available behind an
OpenStack-native API (similar to Magnum) for those who want it.
Keystone also features quota management capabilities that could be reused to
manage tenant quotas_. A proof-of-concept for a validating webhook that allows
this to be used for governing Kubernetes resources `exists
<https://github.com/cmurphy/keyhook#readme>`_.
While there are generally significant impedance mismatches between the
Kubernetes and Keystone models of authorisation, Project Teapot is a fresh
start and can prescribe custom policy models that mitigate the mismatch.
(Ongoing changes to default policies will likely smooth over these kinds of
issues in regular OpenStack clouds also.) This may not be so easy when sharing
a Keystone :doc:`with an OpenStack cloud <openstack-integration>` though.
Keystone Application Credentials allow users to create (potentially)
short-lived credentials that an application can use to authenticate without the
need to store the user's own LDAP password (which likely also governs their
access to a wide range of unrelated corporate services) anywhere. Credentials
provided to tenant clusters should be exclusively of this type, limited to the
purpose assigned (e.g. credentials intended for accessing storage can only be
used to access storage), and regularly rotated out and expired.
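For example (a sketch only; the keystoneauth1 classes are real, but the
endpoint and credential values are placeholders), a component in a tenant
cluster would authenticate with an application credential rather than with any
user's own password:

.. code-block:: python

   # Sketch: authenticate to Keystone with a narrowly-scoped application
   # credential. The URL, ID and secret below are placeholders.
   from keystoneauth1.identity.v3 import ApplicationCredential
   from keystoneauth1.session import Session

   auth = ApplicationCredential(
       auth_url="https://keystone.teapot.example.com/v3",
       application_credential_id="abc123",      # scoped, e.g. storage-only roles
       application_credential_secret="s3cret",  # injected as a Kubernetes Secret
   )
   sess = Session(auth=auth)

   # Any OpenStack-derived service (Manila, Designate, ...) can be called
   # through this session, and the credential can be rotated or revoked
   # without touching the user's identity-provider account.
   token = sess.get_token()
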
.. _teapot-idm-dex:
Dex
~~~
Dex_ is an identity management service that uses `OpenID Connect`_ to provide
authentication to Kubernetes. It too supports delegating user authentication to
LDAP, amongst others. This would likely be seen as a more conventional choice
in the Kubernetes community. Dex can store its data using Kubernetes custom
resources, so it is the most lightweight option.
Dex does not support authorisation. However, Keystone supports OpenID Connect
as a federated identity provider, so it could still be used as the
authorisation mechanism (including for OpenStack-derived services such as
Manila) using Dex for authentication, though this inevitably adds additional
moving parts. In general, Keystone has `difficulty
<https://bugs.launchpad.net/keystone/+bug/1589993>`_ with application
credentials for federated users because it is not immediately notified of
membership revocations, but since both components are under the same control in
this case it would be easier to build some additional integration to keep them
in sync.
.. _teapot-idm-keycloak:
Keycloak
~~~~~~~~
Keycloak_ is a more full-featured identity management service. It would also be
seen in the Kubernetes community as a more conventional choice than Keystone,
although it does not use the Kubernetes API as a data store. Keycloak is
significantly more complex to deploy than Dex. However, a `Kubernetes operator
for Keycloak <https://operatorhub.io/operator/keycloak-operator>`_ now exists,
which should hide much of the complexity.
Keystone could federate to Keycloak as an identity management provider using
either OpenID Connect or SAML.
Theoretically, Keycloak could be used without Keystone if the Keystone
middleware in the services were replaced by some new OpenID Connect middleware.
The architecture of OpenStack is designed to make this at least possible. It
would also require changes to client-side code (most prominently any
cloud-provider-openstack providers that might otherwise be reused), although
there is a chance that they could be contained to a small blast radius around
Gophercloud's `clientconfig module
<https://github.com/gophercloud/utils/tree/master/openstack/clientconfig>`_.
.. _Keystone: https://docs.openstack.org/keystone/
.. _OpenID Connect: https://openid.net/connect/
.. _SAML: https://docs.oasis-open.org/security/saml/Post2.0/sstc-saml-tech-overview-2.0.html
.. _cloud-provider-openstack: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-keystone-webhook-authenticator-and-authorizer.md#readme
.. _quotas: https://docs.openstack.org/keystone/latest/admin/unified-limits.html
.. _Dex: https://github.com/dexidp/dex/#readme
.. _Keycloak: https://www.keycloak.org/


@ -0,0 +1,129 @@
Project Teapot
==============
.. _teapot-introduction:
Introduction
------------
Project Teapot is a design proposal for a bare-metal cloud to run Kubernetes
on.
When OpenStack was first designed, 10 years ago, the cloud computing landscape
was a very different place. In the intervening period, OpenStack has amassed an
enormous installed base of many thousands of users who all depend on it
remaining essentially the same service, with backward-compatible APIs. If we
designed an open source cloud platform without those restrictions and looking
ahead to the 2020s, knowing everything we know today, what might it look like?
And how could we build it without starting from scratch, but using existing
open source technologies where possible? Project Teapot is one answer to these
questions.
Project Teapot is designed to run natively on Kubernetes, and to integrate with
Kubernetes clusters deployed by tenants. It provides only bare-metal compute
capacity, so that tenants can orchestrate all aspects of an application -- from
legacy VMs to cloud-native containers to workloads requiring custom hardware,
and everything in between -- through a single API that they can control.
It seems inevitable that numerous organisations are going to end up
implementing various subsets of this functionality just to deal with bare-metal
clusters in their own environment. By developing Teapot in the open, we would
give them a chance to reduce costs by collaborating on a common solution.
.. _teapot-goals:
Goals
-----
OpenStack's `mission
<https://governance.openstack.org/tc/resolutions/20160217-mission-amendment.html>`_
is to be ubiquitous; Teapot's is narrower. In the 2020s, Kubernetes will be
ubiquitous. However, Kubernetes' separation of responsibilities with the
underlying cloud means that some important capabilities are considered out of
scope for it -- most obviously multi-tenancy of the sort provided by clouds,
allowing isolation from potentially malicious users (including innocuous users
who have had their workloads hacked by malicious third parties). Teapot's
primary mission is to fill those gaps with an open source solution, by
providing a cloud layer to manage a physical data center beneath Kubernetes.
In addition to mediating access to a physical data center, another important
role of clouds is to offer managed services (for example, a database as a
service). Teapot itself can be used to provide a managed service -- Kubernetes
(though it could equally be configured to provide fully user-controlled tenant
clusters). A secondary goal is to make Teapot a platform that cloud providers
could use to offer other kinds of managed service as well. Teapot is an easier
base than OpenStack on which to deploy such services because it is itself based
on Kubernetes.
.. _teapot-non-goals:
Non-Goals
---------
Teapot's design makes it suitable for deployments that require multi-tenancy
and are medium-sized or larger. Specifically, Teapot makes sense when tenants
are large enough to be able to utilise at least one (and usually more than one)
entire bare-metal server, because managing virtual machines is not a goal.
Smaller deployments that nevertheless require hard multi-tenancy (that is to
say, zero trust required between tenants) would be better off with OpenStack.
Smaller deployments that do not require hard multi-tenancy would be better off
running a single standalone Kubernetes cluster.
.. _teapot-design:
Design
------
The `Vision for OpenStack Clouds`_ states that the `physical data center
management function
<https://governance.openstack.org/tc/reference/technical-vision.html#basic-physical-data-center-management>`_
of a cloud must "[provide] the abstractions needed to deal with external
systems like :doc:`compute <compute>`, :doc:`storage <storage>`, and
:doc:`networking <networking>` hardware [including :doc:`load balancers
<load-balancing>` and :doc:`hardware security modules <key-management>`], the
:doc:`Domain Name System <dns>`, and :doc:`identity management systems <idm>`."
This proposal discusses implementation options for each of those classes of
systems.
Teapot also fulfils the `self-service
<https://governance.openstack.org/tc/reference/technical-vision.html#self-service>`_
requirements of a cloud, by providing multi-tenancy and :ref:`capacity
management <teapot-compute-reservation>`. In the Kubernetes model,
multi-tenancy is something that must be provided by the cloud layer.
Because Teapot targets Kubernetes as its tenant workload, it is able to
`provide applications control
<https://governance.openstack.org/tc/reference/technical-vision.html#application-control>`_
over the cloud using the standard Kubernetes interfaces (such as Ingress
resources and the Cluster Autoscaler). This greatly simplifies porting of many
workloads to and from other clouds.
Teapot is designed to be radically simpler than OpenStack to :doc:`install
<installation>` and operate. By running on the same technology stack as the
tenant clusters it deploys, it allows a common set of skills to be applied to
the operation of both applications and the underlying infrastructure. By
eschewing direct management of virtualisation it avoids having to shoehorn
bare-metal management into a virtualisation context or vice-versa, and
eliminates entire layers of networking abstractions.
At the same time, Teapot should be able to :doc:`interoperate with OpenStack
<openstack-integration>` when required so that each enhances the value of the
other without adding unnecessary layers of complexity.
Index
-----
.. toctree::
compute
storage
networking
load-balancing
dns
idm
key-management
installation
openstack-integration
.. _Vision for OpenStack Clouds: https://governance.openstack.org/tc/reference/technical-vision.html


@ -0,0 +1,69 @@
Teapot Installation
===================
In a sense, the core of Teapot is simply an application running in a Kubernetes
cluster (the management cluster). This is a great advantage for ease of
installation, because Kubernetes is renowned for its simplicity in
bootstrapping. Many, many (perhaps too many) tools already exist for
bootstrapping a Kubernetes cluster, so there is no need to reinvent them.
However, Teapot is designed to be the system that provides cloud services to
bare-metal Kubernetes clusters, and while it is possible to run the management
cluster on another cloud (such as OpenStack), it is likely in most instances to
be self-hosted on bare metal. This presents a unique bootstrapping challenge.
OpenStack does not define an 'official' installer, largely due to the plethora
of configuration management tools that different users preferred. Teapot does
not have the same issue, as it standardises on Kubernetes as the *lingua
franca*. There should be a single official installer and third parties are
encouraged to add extensions and customisations by adding Resources and
Operators through the Kubernetes API.
Implementation Options
----------------------
Metal³
~~~~~~
`Metal³`_ is designed to bootstrap standalone bare-metal clusters, so it can be
used to install the management cluster. There are multiple ways to do this. One
is to use the `Cluster API`_ on a bootstrap VM, and then pivot the relevant
resources into the cluster. The OpenShift installer takes a slightly different
approach, again using a bootstrap VM, but creating the master nodes initially
using Terraform and then creating BareMetalHost resources marked as 'externally
provisioned' for them in the cluster.
One inevitable challenge is that the initial bootstrap VM must be able to
connect to the :ref:`management and provisioning networks
<teapot-networking-provisioning>` in order to begin the installation. That
makes it difficult to simply run from a laptop, which makes installing a small
proof-of-concept cluster harder than anyone would like. (This is inherent to
the bare-metal environment and also a problem for OpenStack installers.) If a
physical host must be used as the bootstrap, reincorporating that hardware into
the actual cluster once it is up and running should at least be simpler on
Kubernetes.
Airship
~~~~~~~
Airship_ 2.0 uses Metal³ and the Cluster API to provision Kubernetes clusters
on bare metal. It also provides a declarative way of repeatably setting the
initial configuration and workloads of the deployed cluster, along with a rich
document layering and substitution model (based on Kustomize). This might be
the simplest existing way of defining what a Teapot installation looks like
while allowing distributors and third-party vendors a clear method for
providing customisations and add-ons.
Teapot Operator
~~~~~~~~~~~~~~~
A Kubernetes operator for managing the deployment and configuration of the
Teapot components could greatly simplify the installation process. This is not
incompatible with using Airship (or indeed any other method) to define the
configuration, as Helm would just create the top-level custom resource(s)
controlled by the operator, instead of lower-level resources for the individual
components.
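As a purely hypothetical illustration of that idea (the API group, kind and
every field below are invented, not an existing interface), the whole
installation might be driven by a single top-level custom resource:

.. code-block:: python

   # Hypothetical top-level resource that a Teapot operator might reconcile.
   from kubernetes import client, config

   config.load_kube_config()

   teapot = {
       "apiVersion": "teapot.example.com/v1alpha1",
       "kind": "TeapotDeployment",
       "metadata": {"name": "teapot", "namespace": "teapot-system"},
       "spec": {
           "dns": {"backend": "coredns"},
           "loadBalancing": {"mode": "ingress"},
           "identity": {"provider": "keystone"},
           "provisioningNetwork": {"vlan": 0, "cidr": "172.22.0.0/24"},
       },
   }

   client.CustomObjectsApi().create_namespaced_custom_object(
       "teapot.example.com", "v1alpha1", "teapot-system", "teapotdeployments", teapot
   )
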
.. _Metal³: https://metal3.io/
.. _Cluster API: https://github.com/kubernetes-sigs/cluster-api#readme
.. _Airship: https://www.airshipit.org/


@ -0,0 +1,70 @@
Teapot Key Management
=====================
Kubernetes offers the Secret resource for storing secrets needed by
applications. This is an improvement on storing them in the applications'
source code, but unfortunately by default Secrets are not encrypted at rest,
but simply stored in etcd in plaintext. An EncryptionConfiguration_ resource
can be used to ensure the Secrets are encrypted before storing them, but in
most cases the keys used to encrypt the data are themselves stored in etcd in
plaintext, alongside the encrypted data.
This can be avoided by using a `Key Management Service provider`_ plugin. In
this case the encryption keys for each Secret are themselves encrypted, and can
only be decrypted using a master key stored in the key management service
(which may be a hardware security module). All extant KMS providers appear to
be for cloud services; there are no bare-metal options.
Since the KMS provider is necessary to provide effective encryption at rest and
is the *de facto* responsibility of the cloud, it would be desirable for Teapot
to support it. The implementation should be able to make use of :abbr:`HSM
(Hardware Security Module)`\ s, but also be able to work with a pure-software
solution.
Implementation Options
----------------------
.. _teapot-key-management-barbican:
OpenStack Barbican
~~~~~~~~~~~~~~~~~~
Barbican_ provides exactly the thing we want. It `provides
<https://docs.openstack.org/barbican/latest/install/barbican-backend.html>`_ an
abstraction over HSMs as well as software implementations using Dogtag_ (which
can itself store its master keys either in software or in an HSM) or Vault_,
along with another that simply stores its master key in the config file.
Like other OpenStack services, Barbican uses Keystone for :doc:`authentication
<idm>`. A :abbr:`KMS (Key Management Service)` provider for Barbican already
exists in cloud-provider-openstack_. This could be used in both the management
cluster and in tenant clusters.
Barbican's architecture is relatively simple, although it does rely on RabbitMQ
for communication between the API and the workers. This should be easy to
replace with something like json-rpc as was done for Ironic in Metal³ to
simplify the deployment.
Storing keys in software on a dynamic system like Kubernetes presents
challenges. It might be necessary to use a host volume on the master nodes to
store master keys when no HSM is available. Ultimately the most secure solution
is to use an HSM.
.. _teapot-key-management-secrets:
Write a new KMS plugin using Secrets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Writing a KMS provider plugin is very straightforward. We could write one that
just uses a Secret stored in the management cluster as the master key.
However, this could not be used to encrypt Secrets at rest in the management
cluster itself.
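The envelope-encryption idea behind such a plugin is sketched below. This is
not the Kubernetes KMS provider gRPC interface itself, just the core logic,
with a hypothetical Secret name and namespace and the ``cryptography``
library's Fernet cipher standing in for a real cipher choice.

.. code-block:: python

   # Sketch of envelope encryption with the master key read from a Secret in
   # the management cluster. Secret name, namespace and key format are assumed.
   import base64
   from cryptography.fernet import Fernet
   from kubernetes import client, config

   config.load_kube_config()  # management cluster context
   secret = client.CoreV1Api().read_namespaced_secret("teapot-master-key", "teapot-kms")
   master = Fernet(base64.b64decode(secret.data["key"]))  # assumes a Fernet key is stored

   def encrypt(plaintext: bytes) -> dict:
       dek = Fernet.generate_key()              # per-object data encryption key
       return {
           "ciphertext": Fernet(dek).encrypt(plaintext),
           "wrapped_dek": master.encrypt(dek),  # only the wrapped DEK is stored
       }

   def decrypt(blob: dict) -> bytes:
       dek = master.decrypt(blob["wrapped_dek"])
       return Fernet(dek).decrypt(blob["ciphertext"])
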
.. _EncryptionConfiguration: https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
.. _Key Management Service provider: https://kubernetes.io/docs/tasks/administer-cluster/kms-provider/
.. _Barbican: https://docs.openstack.org/barbican/latest/
.. _Dogtag: https://www.dogtagpki.org/wiki/PKI_Main_Page
.. _Vault: https://www.vaultproject.io/
.. _cloud-provider-openstack: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-barbican-kms-plugin.md#readme


@ -0,0 +1,241 @@
Teapot Load Balancing
=====================
Load balancers are one of the things that Kubernetes expects to be provided by
the underlying cloud. No multi-tenant bare-metal solutions for this exist, so
project Teapot would need to provide one. Ideally an external load balancer
would act as an abstraction over what could be either a tenant-specific
software load balancer or multi-tenant-safe access to a hardware (or virtual)
load balancer.
There are two ways for an application to request an external load balancer in
Kubernetes. The first is to create a Service_ with type |LoadBalancer|_. This
is the older way of doing things but is still useful for lower-level plumbing,
and may be required for non-HTTP(S) protocols. The preferred (though nominally
beta) way is to create an Ingress_. The Ingress API allows for more
sophisticated control (such as adding :abbr:`TLS (Transport Layer Security)`
termination), and can allow multiple services to share a single external load
balancer (including across different DNS names), and hence a single IP address.
Most managed Kubernetes services provide an Ingress controller that can set up
external load balancers, including TLS termination, using the underlying
cloud's services. Without this, tenants can still use an Ingress controller,
but it would have to be one that uses resources available to the tenant, such
as by running software load balancers in the tenant cluster.
When using a Service of type |LoadBalancer| (rather than an Ingress), there is
no standardised way of requesting TLS termination (some cloud providers permit
it using an annotation), so supporting this use case is not a high priority.
The |LoadBalancer| Service type in general should be supported, however (though
there are existing Kubernetes offerings where it is not).
Implementation options
----------------------
The choices below are not mutually exclusive. An administrator of a Teapot
cloud and their tenants could each potentially choose from among several
available options.
.. _teapot-load-balancing-metallb-l2:
MetalLB (Layer 2) on tenant cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MetalLB_ project provides two ways of doing load balancing for bare-metal
clusters. One requires control over only layer 2, although it really only
provides the high-availability aspects of load balancing, not actual balancing.
All incoming traffic for each service is directed to a single node; from there
kube-proxy distributes it to the endpoints that handle it. However, should the
node die, traffic rapidly fails over to another node.
This form of load balancing does not support offloading TLS termination,
results in large amounts of East-West traffic, and consumes resources from the
guest cluster.
Tenants could decide to use this unilaterally (i.e. without the involvement of
the management cluster or its administrators). However, using MetalLB restricts
the choice of CNI plugins -- for example it does not work with OVN. A
prerequisite for using it would be that all tenant machines share a layer 2
broadcast domain, which may be undesirable in larger clouds. This may be an
acceptable solution for Services in some cases though.
.. _teapot-load-balancing-metallb-l3-management:
MetalLB (Layer 3) on management cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The layer 3 form of MetalLB_ load balancing provides true load balancing, but
requires control over the network hardware in the form of advertising
:abbr:`ECMP (Equal-Cost Multi-Path)` routes via BGP. (This also places
additional `requirements
<https://metallb.universe.tf/concepts/bgp/#limitations>`_ on the network
hardware.) Since tenant clusters are not trusted to do this, it would have to
run in the management cluster. There would need to be an API in the management
cluster to vet requests and pass them on to MetalLB, and a
cloud-provider-teapot plugin that tenants could optionally install to connect
to it.
This form of load balancing does not support offloading TLS termination either.
.. _teapot-load-balancing-metallb-l3-tenant:
MetalLB (Layer 3) on tenant cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While the network cannot trust BGP announcements from tenants, in principle the
management cluster could have a component, perhaps based on `ExaBGP
<https://github.com/Exa-Networks/exabgp#readme>`_, that listens to such
announcements on the tenant V(x)LANs, drops any that refer to networks not
allocated to the tenant, and rebroadcasts the legitimate ones to the network
hardware.
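The policy core of such a component might look like the sketch below
(illustrative only; how announcements are received from ExaBGP and rebroadcast
to the hardware is omitted, and the tenant allocations are invented):

.. code-block:: python

   # Sketch: drop BGP announcements for prefixes not allocated to the tenant.
   import ipaddress

   ALLOCATIONS = {                      # prefixes delegated to each tenant
       "blue": [ipaddress.ip_network("192.0.2.0/26")],
       "green": [ipaddress.ip_network("192.0.2.64/26")],
   }

   def allowed(tenant: str, announced_prefix: str) -> bool:
       prefix = ipaddress.ip_network(announced_prefix)
       return any(prefix.subnet_of(allocation)
                  for allocation in ALLOCATIONS.get(tenant, []))

   print(allowed("blue", "192.0.2.32/30"))  # True: inside blue's allocation
   print(allowed("blue", "192.0.2.96/30"))  # False: dropped, belongs to green
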
This would allow tenant networks to choose to make use of MetalLB in its Layer
3 mode, providing actual traffic balancing as well as making it possible to
split tenant machines amongst separate L2 broadcast domains. It would also
allow tenants to choose among a much wider range of :doc:`CNI plugins
<./networking>`, many of which also rely on BGP announcements.
This form of load balancing still does not support offloading TLS termination.
.. _teapot-load-balancing-ovn:
Build a new OVN-based load balancer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One drawback of MetalLB is that it is not compatible with using OVN as the
network overlay. This is unfortunate, as OVN is one of the most popular network
overlays used with OpenStack, and thus might be a common choice for those
wanting to integrate workloads running in OpenStack and Kubernetes together.
A new OVN-based network load balancer in the vein of MetalLB might provide more
options for this group.
.. _teapot-load-balancing-ingress-api:
Build a new API using Ingress resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A new API in the management cluster would receive requests in a form similar to
an Ingress resource, sanitise them, and then proxy them to an Ingress
controller running in the management cluster (or some other
centrally-controlled cluster). In fact, it is possible the 'API' could be as
simple as using the existing Ingress API in a namespace with a validating
webhook.
The most challenging part of this would be coaxing the Ingress controllers on
the load balancing cluster to target services in a different cluster (the
tenant cluster). Most likely we would have to sync the EndpointSlices from the
tenant cluster into the load balancing cluster.
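A rough sketch of what that sync might produce is shown below: a tenant
EndpointSlice (``discovery.k8s.io/v1``, handled here as a plain dict) is
rewritten so that an Ingress controller in the load balancing cluster can
target the tenant's pods directly. The naming convention, and the assumption
that pod IPs are routable from the load balancing cluster, are hypothetical.

.. code-block:: python

   # Sketch: mirror a tenant EndpointSlice into the load balancing cluster.
   def mirror_endpoint_slice(tenant_slice: dict, tenant: str, lb_namespace: str) -> dict:
       svc = tenant_slice["metadata"]["labels"]["kubernetes.io/service-name"]
       return {
           "apiVersion": "discovery.k8s.io/v1",
           "kind": "EndpointSlice",
           "metadata": {
               "name": f"{tenant}-{tenant_slice['metadata']['name']}",
               "namespace": lb_namespace,
               # Attach the mirrored slice to a like-named headless Service in
               # the load balancing cluster (a hypothetical convention).
               "labels": {"kubernetes.io/service-name": f"{tenant}-{svc}"},
           },
           "addressType": tenant_slice["addressType"],
           "ports": tenant_slice["ports"],
           # Assumes pod IPs are reachable from the load balancing cluster,
           # since both sit on networks managed by Teapot.
           "endpoints": [{"addresses": e["addresses"]}
                         for e in tenant_slice["endpoints"]],
       }
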
In all likelihood when using a software-based Ingress controller running in a
load balancing cluster, a network load balancer would also be used on that
cluster to ensure high-availability of the load balancers themselves. Examples
include MetalLB and `kube-keepalived-vip
<https://github.com/aledbf/kube-keepalived-vip>`_ (which uses :abbr:`VRRP
(Virtual Router Redundancy Protocol)` to ensure high availability). This
component would need to be integrated with :ref:`public IP assignment
<teapot-networking-external>`.
There are already controllers for several types of software load balancers (the
nginx controller is even officially supported by the Kubernetes project), as
well as multiple hardware load balancers. This includes an existing Octavia
Ingress controller in cloud-provider-openstack_, which would be useful for
:doc:`integrating with OpenStack clouds <openstack-integration>`. The ecosystem
around this API is likely to have continued growth. This is also likely to be
the site of future innovation around configuration of network hardware, such as
hardware firewalls.
In general, Ingress controllers are not expected to support non-HTTP(S)
protocols, so it's not necessarily possible to implement the |LoadBalancer|
Service type with an arbitrary plugin. However, the nginx Ingress controller
has support for arbitrary `TCP and UDP services
<https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/>`_,
so the API would be able to provide for either type.
Unlike the network load balancer options, this form of load balancing would be
able to terminate TLS connections.
.. _teapot-load-balancing-custom-api:
Build a new custom API
~~~~~~~~~~~~~~~~~~~~~~
A new service running on the management cluster would provide an API through
which tenants could request a load balancer. An implementation of this API
would provide a pure-software load balancer running in containers in the
management cluster (or some other centrally-controlled cluster). As in the case
of an Ingress-based controller, a network load balancer would likely be used to
provide high-availability of the load balancers.
The API would be designed such that alternate implementations of the controller
could be created for various load balancing hardware. Ideally one would take
the form of a shim to the existing cloud-provider API for load balancers, so
that existing plugins could be used. This would include
cloud-provider-openstack, for the case where Teapot is installed alongside an
OpenStack cloud allowing it to make use of Octavia.
Unlike the network load balancer options, this form of load balancing would be
able to terminate TLS connections.
This option seems to be strictly inferior to using Ingress controllers on the
load balancing cluster to implement an API, assuming both options prove
feasible.
.. _teapot-load-balancing-ingress-controller:
Build a new Ingress controller
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In the event that we build a new API in the management cluster, a Teapot
Ingress controller would proxy requests for an Ingress to it. This controller
would likely be responsible for syncing the EndpointSlices to the API as well.
.. _teapot-load-balancing-cloud-provider:
Build a new cloud-provider
~~~~~~~~~~~~~~~~~~~~~~~~~~
In the event that we build a new API in the management cluster, a
cloud-provider-teapot plugin that tenants could optionally install would allow
them to make use of the API in the management cluster to configure Services of
type |LoadBalancer|.
While helpful to increase portability of applications between clouds, this is a
much lower priority than building an Ingress controller. Tenants can always
choose to use Layer 2 MetalLB for their |LoadBalancer| Services instead.
.. _teapot-load-balancing-octavia:
OpenStack Octavia
~~~~~~~~~~~~~~~~~
On paper, Octavia_ provides exactly what we want: a multi-tenant abstraction
layer over hardware load balancer APIs, with a software-based driver for those
wanting a pure-software solution.
In practice, however, there is only one driver for a hardware load balancer
(along with a couple of other out-of-tree drivers), and an Ingress controller
for that hardware also exists. More drivers existed for the earlier Neutron
LBaaS v2 API, but some vendors had largely moved on to Kubernetes by the time
the Neutron API was replaced by Octavia.
The pure-software driver (Amphora) itself supports provider plugins for its
compute and network. However, the only currently available providers are for
OpenStack Nova and OpenStack Neutron. Nova will not be present in Teapot. Since
we want to make use of Neutron only as a replaceable implementation detail --
if at all -- Teapot cannot allow other components of the system to become
dependent on it. Additional providers would have to be written in order to use
Octavia in Teapot.
Another possibility is integration in the other direction -- using a
Kubernetes-based service as a driver for Octavia when Teapot is
:doc:`co-installed with an OpenStack cloud <openstack-integration>`.
.. |LoadBalancer| replace:: ``LoadBalancer``
.. _Service: https://kubernetes.io/docs/concepts/services-networking/service/
.. _LoadBalancer: https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer
.. _Ingress: https://kubernetes.io/docs/concepts/services-networking/ingress/
.. _cloud-provider-openstack: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-octavia-ingress-controller.md#readme
.. _MetalLB: https://metallb.universe.tf/
.. _Octavia: https://docs.openstack.org/octavia/


@ -0,0 +1,187 @@
Teapot Networking
=================
In Project Teapot, tenant clusters are deployed exclusively on bare-metal
servers, which are under the complete control of the tenant. Therefore the
network itself must be the guarantor of multi-tenancy, with only untrusted
components running on tenant machines. (Trusted components can still run within
the management cluster.)
.. _teapot-networking-multi-tenancy:
Multi-tenant Network Model
--------------------------
Support for VLANs and VxLAN is ubiquitous in modern data center network
hardware, so this will be the basis for Teapot's networking. Each tenant will
be assigned one or more V(x)LANs. (Separate failure domains will likely also
have separate broadcast domains.) As machines are assigned to the tenant, the
Teapot controller will connect each to a private virtual network also assigned
to the tenant.
Small deployments can just use VLANs. Larger deployments may need VxLAN, and in
this case :abbr:`VTEP (VxLAN Tunnel EndPoint)`-capable edge switches and a
VTEP-capable router will be required.
This design frees the tenant clusters from being forced to use a particular
:abbr:`CNI (Container Network Interface)` plugin. Tenants are free to select a
networking overlay (e.g. Flannel, Cilium, OVN, &c.) or other CNI plugin (e.g.
Calico, Romana) of their choice within the tenant cluster, provided that it
does not need to be trusted by the network. (This would preclude solutions that
rely on advertising BGP/OSPF routes, although it's conceivable that one day
these advertisements could be filtered through a trusted component in the
management cluster and rebroadcast to the unencapsulated network -- this would
also be useful for :ref:`load balancing
<teapot-load-balancing-metallb-l3-tenant>` of Services.) If the tenant's CNI
plugin does create an overlay network, that technically means that packets will
be double-encapsulated, which is a Bad Thing when it occurs in VM-based
clusters, for several reasons:
* There is a performance overhead to encapsulating the packets on the
hypervisor, and it also limits the ability to apply some performance
optimisations (such as using SR-IOV to provide direct access to the NICs from
the VMs by virtualising the PCIe bus).
* The extra overhead in each packet can cause fragmentation, and reduces the
bandwidth available at the edge.
* Broadcast, multicast and unknown unicast traffic is flooded to all possible
endpoints in the overlay network; doing this at multiple layers can increase
network load.
However, these problems are significantly mitigated in the Teapot model:
* The performance cost of performing the second encapsulation is eliminated by
offloading it to the network hardware.
* Encapsulation headers are carried only within the core of the network, where
bandwidth is less scarce and frame sizes can be adjusted to prevent
fragmentation.
* CNI plugins don't generally make significant use of broadcast or multicast.
.. _teapot-networking-provisioning:
Provisioning Network
--------------------
Generally bare-metal machines will need at least one interface connected to a
provisioning network in order to boot using :abbr:`PXE (Pre-boot execution
environment)`. Typically the provisioning network is required to be an untagged
VLAN.
PXE can be avoided by provisioning using virtual media (where the BMC attaches
a virtual disk containing the boot image to the host's USB), but hardware
support for doing this from Ironic is uneven (though rapidly improving) and it
is considerably slower than PXE. In addition, the Ironic agent typically
communicates over this network for purposes such as introspection of hosts or
cleaning of disks.
For the purpose of PXE booting, hosts could be left permanently connected to
the provisioning network provided they are isolated from each other (e.g. using
private VLANs). This would have the downside that the main network interface of
the tenant worker would have to appear on a tagged VLAN. However, the Ironic
agent's access to the Ironic APIs is unauthenticated, and therefore not safe to
be carried over networks that have hosts allocated to tenants connected to
them. This could occur over a separate network, but in any event hosts'
membership of this network will have to be changed dynamically in concert with
the baremetal provisioner.
The :abbr:`BMC (Baseboard management controller)`\ s will be connected to a
separate network that is reachable only from the management cluster.
.. _teapot-networking-storage:
Storage Network
---------------
When (optionally) used in combination with multi-tenant storage, machines will
need to also be connected to a separate storage network. The networking
requirements for this network are much simpler, as it does not need to be
dynamically managed. Each edge port should be isolated from all of the others
(using e.g. Private VLANs), regardless of whether they are part of the same
tenant. :abbr:`QoS (Quality of Service)` rules should ensure that no individual
machine can effectively deny access to others. Configuring the switches for the
storage network can be considered out of scope for Project Teapot, at least
initially, as the configuration need not be dynamic, but might be in scope for
the :doc:`installer <installation>`.
.. _teapot-networking-external:
External connections
--------------------
Workloads running in a tenant cluster can request to be exposed for incoming
external connections in a number of different ways. The Teapot cloud is
responsible for ensuring that each of these is possible.
The ``NodePort`` service type simply requires that the IP addresses of the
cluster members be routable from external networks.
For IPv4 support in particular, Teapot will need to be able to allocate public
IP addresses and route traffic for them to the appropriate networks.
Traditionally this is done using :abbr:`NAT (Network Address Translation)`
(e.g. Floating IPs in OpenStack). Users can specify ``externalIPs`` on a
Service to make use of public IPs within their cluster, although there's no built-in way to
discover what IPs are available. Teapot should also have a way of exporting the
:doc:`reverse DNS records <dns>` for public IP addresses.
The ``LoadBalancer`` Service type uses an external :doc:`load balancer
<load-balancing>` as a front end. Traffic from the load balancer is directed
to a ``NodePort`` service within the tenant cluster.
Most managed Kubernetes services provide an Ingress controller that can set up
load balancing (including :abbr:`TLS (Transport Layer Security)` termination)
in the underlying cloud for HTTP(S) traffic, including automatically
configuring public IPs. If Teapot provided :ref:`such an Ingress controller
<teapot-load-balancing-ingress-controller>`, it might be a viable option to not
support public IPs at all for the ``NodePort`` service type. In this case, the
implementation of public IPs could be confined to the :ref:`load balancing API
<teapot-load-balancing-ingress-api>`, and the only stable public IP addresses
would be the Virtual IPs of the load balancers. Tenant IPv6 addresses could
easily be made publicly routable to provide direct access to ``NodePort``
services over IPv6 only, although this also comes with the caveat that some
clients may be tempted to rely on the IP of a Service being static, when in
fact the only safe way to reference it is via a :doc:`DNS name <dns>` exported
by ExternalDNS.
Implementation Options
----------------------
.. _teapot-networking-ansible:
Ansible Networking
~~~~~~~~~~~~~~~~~~
A good long-term implementation strategy might be to use ansible-networking to
directly configure the top-of-rack switches. This would be driven by a
Kubernetes controller running in the management cluster operating on a set of
Custom Resource Definitions (CRDs). The ansible-networking project supports a
wide variety of hardware already. A minimal proof of concept for this
controller `exists <https://github.com/bcrochet/physical-switch-operator>`_.
In addition to configuring the edge switches, a solution for public IPs and
other ways of exposing services is also needed. Future requirements likely
include configuring limited cross-tenant network connectivity, and access to
hardware load balancers and other data center hardware.
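A purely hypothetical example of the kind of custom resource such a controller
might reconcile is shown below; the API group, kind and fields are invented
for illustration.

.. code-block:: python

   # Hypothetical CRD instance mapping an edge switch port onto a tenant VLAN.
   switch_port = {
       "apiVersion": "networking.teapot.example.com/v1alpha1",
       "kind": "SwitchPort",
       "metadata": {"name": "tor1-eth1-12", "namespace": "teapot-system"},
       "spec": {
           "switch": "tor1.rack3",      # reached via its Ansible network module
           "interface": "Ethernet1/12",
           "mode": "access",
           "vlan": 2104,                # the tenant's V(x)LAN
           "host": "worker-01",         # BareMetalHost this port is cabled to
       },
   }
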
.. _teapot-networking-neutron:
OpenStack Neutron
~~~~~~~~~~~~~~~~~
A good short-term option might be to use a cut-down Neutron installation as an
implementation detail to manage the network. Using only the baremetal port
types in Neutron circumvents a lot of the complexity. Most of the Neutron
agents would not be required, so message queue--based RPC could be eliminated
or replaced with json-rpc (as it has been in Ironic for Metal³). Since only a
trusted service would be controlling network changes, Keystone authentication
would not be required either.
To ensure that Neutron itself could eventually be switched out, it would be
strictly confined behind a Kubernetes-native API, in much the same way as
Ironic is behind Metal³. The existing direct integration between Ironic and
Neutron would not be used, and nor could we rely on Neutron to provide an
integration point for e.g. :ref:`Octavia <teapot-load-balancing-octavia>` to
provide an abstraction over hardware load balancers.
The abstraction point would be the Kubernetes CRDs -- different controllers
could be chosen to manage custom resources (and those might in turn make use of
additional non-public CRDs), but we would not attempt to build controllers with
multiple plugin points that could lead to ballooning complexity.


@ -0,0 +1,109 @@
Teapot and OpenStack
====================
Many potential users of Teapot have large existing OpenStack deployments.
Teapot is not intended to be a wholesale replacement for OpenStack -- it does
not deal with virtualisation at all, in fact -- so it is important that the two
complement each other.
.. _teapot-openstack-managed-services:
Managed Services
----------------
A goal of Teapot is to make it easier for cloud providers to offer managed
services to tenants. Attempts to do this in OpenStack, such as Trove_, have
mostly foundered. The Kubernetes Operator pattern offers the most promising
ground for building such services in future, and since Teapot is
Kubernetes-native it would be well-placed to host them.
Building a thin OpenStack-style ReST API over such services would
simultaneously allow their use from an OpenStack cloud (presumably one
sharing, or federated to, the same Keystone). In fact, most such services
could be decoupled
from Teapot altogether and run in a generic Kubernetes cluster so that they
could benefit users of either cloud type even absent the other.
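
As a sketch of how thin such an API layer could be, the hypothetical Flask
handler below (Keystone authentication middleware omitted) simply translates
an OpenStack-style request into a custom resource for a database operator to
reconcile; the ``dbaas.teapot.example`` group and all of the field names are
invented for illustration:

.. code-block:: python

    from flask import Flask, jsonify, request
    from kubernetes import client, config

    app = Flask(__name__)
    config.load_incluster_config()
    crds = client.CustomObjectsApi()

    @app.route("/v1/databases", methods=["POST"])
    def create_database():
        req = request.get_json()
        # Map the caller's project onto a namespace and hand the request over
        # to the (hypothetical) operator as a custom resource.
        instance = {
            "apiVersion": "dbaas.teapot.example/v1alpha1",
            "kind": "DatabaseInstance",
            "metadata": {"name": req["name"], "namespace": req["project_id"]},
            "spec": {
                "engine": req.get("engine", "postgresql"),
                "replicas": req.get("replicas", 1),
            },
        }
        crds.create_namespaced_custom_object(
            group="dbaas.teapot.example", version="v1alpha1",
            namespace=req["project_id"], plural="databaseinstances",
            body=instance)
        return jsonify(instance), 202

    if __name__ == "__main__":
        app.run(port=8080)
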
Teapot's :ref:`load balancing API <teapot-load-balancing-ingress-api>` would
arguably already be a managed service. As a first example, :ref:`Octavia
<teapot-load-balancing-octavia>` could possibly use it as a back-end.
.. _teapot-openstack-side-by-side:
Side-by-side Clouds
-------------------
Teapot should be co-installable alongside an existing OpenStack cloud to
provide additional value. In this configuration, the Teapot cloud would use the
OpenStack cloud's :ref:`Keystone <teapot-idm-keystone>` and any services that
are expected to be found in the catalog (e.g. :ref:`Manila
<teapot-storage-manila>`, :ref:`Cinder <teapot-storage-cinder>`,
:ref:`Designate <teapot-dns-designate>`).
An OpenStack-style ReST API in front of Teapot would allow users of the
OpenStack cloud to create and manage bare-metal Kubernetes clusters in much the
same way they do today with Magnum.
Tenants would need a way to connect their Neutron networks in OpenStack to the
Kubernetes clusters. Since Teapot tenant networks are :ref:`just V(x)LANs
<teapot-networking-multi-tenancy>`, this could be accomplished by adding those
networks as provider networks in Neutron, and allowing the correct tenants to
connect to them via Neutron routers. This should be sufficient for the main use
case, which would be running parts of an application in a Kubernetes cluster
while other parts remain in OpenStack VMs.
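
A sketch of what that plumbing might look like from the OpenStack side, using
openstacksdk with admin credentials; the VLAN segment, physical network name,
CIDR and target project ID are all placeholders:

.. code-block:: python

    import openstack

    conn = openstack.connect(cloud="openstack-admin")

    # Expose the tenant's Teapot VLAN as a provider network in Neutron...
    net = conn.network.create_network(
        name="teapot-tenant-a",
        provider_network_type="vlan",
        provider_physical_network="datacentre",
        provider_segmentation_id=2042,
    )
    conn.network.create_subnet(
        network_id=net.id, ip_version=4, cidr="192.0.2.0/24",
        name="teapot-tenant-a-v4",
    )

    # ...and grant only the matching OpenStack project access to it, so that
    # the tenant can attach it to their own Neutron router.
    conn.network.create_rbac_policy(
        object_type="network",
        object_id=net.id,
        action="access_as_shared",
        target_project_id="9c5e00-placeholder",
    )
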
However, the ideal for this type of deployment would be to allow servers to be
dynamically moved between the OpenStack and Teapot clouds. Sharing inventory
with OpenStack's Ironic might be simple enough -- if Metal³ were configured to
use the OpenStack cloud's Ironic then a small component could claim hosts in
OpenStack Placement and create corresponding BareMetalHost objects in Teapot.
Both clouds would end up manipulating the top-of-rack switch configuration for
a host, but presumably only at different times.
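
The Teapot half of such a component might look something like the following
sketch; the OpenStack Placement interaction is elided, and the host name, BMC
address and credentials Secret are placeholders:

.. code-block:: python

    from kubernetes import client, config

    config.load_kube_config()

    # After claiming the node in OpenStack Placement (not shown), register the
    # same physical host with Teapot's Metal³ by creating a BareMetalHost.
    bmh = {
        "apiVersion": "metal3.io/v1alpha1",
        "kind": "BareMetalHost",
        "metadata": {"name": "rack1-host-12", "namespace": "tenant-a"},
        "spec": {
            "online": True,
            "bootMACAddress": "52:54:00:12:34:56",
            "bmc": {
                "address": "ipmi://10.0.0.12",
                "credentialsName": "rack1-host-12-bmc-secret",
            },
        },
    }

    client.CustomObjectsApi().create_namespaced_custom_object(
        group="metal3.io", version="v1alpha1", namespace="tenant-a",
        plural="baremetalhosts", body=bmh)
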
Switching hosts between acting as OpenStack compute nodes and being available
to Teapot tenants would be more complex, since it would require interaction
with the tool managing the OpenStack deployment, of which there are many.
However, supporting autoscaling between the two is probably unnecessary.
Manually moving hosts between the clouds should be manageable, since no changes
to the physical network cabling would be required. Separate :ref:`provisioning
networks <teapot-networking-provisioning>` would need to be maintained, since
the provisioner needs control over DHCP.
.. _teapot-openstack-on-teapot:
OpenStack on Teapot
-------------------
To date, the most popular OpenStack installers have converged on Ansible as a
deployment tool, because OpenStack's complexity demands a degree of control
over the deployment workflow that purely declarative tools struggle to match.
However, Kubernetes Operators present a declarative alternative that is
nonetheless equally flexible. Even without Operators, Airship_ and StarlingX_
are both
installing OpenStack on top of Kubernetes. It seems likely that in the future
this will be a popular way of layering things, and Teapot is well-placed to
enable it since it provides bare-metal hosts running Kubernetes.
For a large, shared OpenStack cloud, this would likely be best achieved by
running the OpenStack control plane components inside the Teapot management
cluster. Sharing of services would then be similar to the side-by-side case.
OpenStack Compute nodes or e.g. Ceph storage nodes could be deployed using
`Metal³`_. This effectively means building an OpenStack installation/management
system similar to a TripleO undercloud but based on Kubernetes.
There is a second use case, for running small OpenStack installations (similar
to StarlingX) within a tenant. In these cases, the tenant OpenStack would still
need to access storage from the Teapot cloud. This could possibly be achieved
by federating the tenant Keystone to Teapot's Keystone and using hierarchical
multi-tenancy so that projects in the tenant Keystone are actually sub-projects
of the tenant's project in the Teapot Keystone. (The long-dead `Trio2o
<https://opendev.org/x/trio2o#trio2o>`_ project also offered a potential
solution in the form of an API proxy, but probably not one worth resurrecting.)
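
The mapping itself would be straightforward; a sketch using openstacksdk
against the Teapot Keystone, in which the project names and cloud entry are
placeholders:

.. code-block:: python

    import openstack

    conn = openstack.connect(cloud="teapot-admin")

    # The tenant's existing project in the Teapot Keystone.
    parent = conn.identity.find_project("tenant-a")

    # Each project in the tenant's own Keystone becomes a sub-project here, so
    # ownership of (and quota for) shared storage stays rooted under the
    # tenant's project.
    conn.identity.create_project(
        name="tenant-a-openstack-dev",
        domain_id=parent.domain_id,
        parent_id=parent.id,
    )
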
Use of an overlay network (e.g. OVN) would be required, since the tenant would
have no access to the underlying network hardware. Some integration between the
tenant's Neutron and Teapot would need to be built to allow ingress traffic.
.. _Trove: https://docs.openstack.org/trove/
.. _Airship: https://www.airshipit.org/
.. _StarlingX: https://www.starlingx.io/
.. _Metal³: https://metal3.io/


@ -0,0 +1,140 @@
Teapot Storage
==============
Project Teapot should have the ability to optionally provide multi-tenant
access to shared file, block, and/or object storage. Shared file and block
storage capabilities are not currently available to Kubernetes users except
through the cloud providers.
Tenants can always choose to use hyperconverged storage -- that is to say, both
compute and storage workloads on the same hosts -- without involvement or
permission from Teapot. (For example, by using Rook_.) However, this means that
compute and storage cannot be scaled independently; they are tightly coupled.
Tenants with disproportionately large amounts of data but modest compute needs
(and sometimes vice-versa) would not be served efficiently. Hyperconverged
storage also usually makes sense only for clusters that are essentially fixed
in size: changing the size of the cluster triggers a rebalancing of storage,
so it is not suitable for workloads that vary greatly over time (for instance,
training of machine learning models).
Running hyperconverged storage efficiently also requires a somewhat specialised
choice of servers. Particularly in a large cloud where different tenants have
different storage requirements, it might be cheaper to provide a centralised
storage cluster and thus require either fewer variants or less specialisation
of server hardware.
For all of these reasons, a shared storage pool is needed to take full
advantage of the highly dynamic environment offered by a cloud like Teapot.
Providing multi-tenant access to shared file and block storage allows the cloud
provider to use a dedicated storage network (such as a :abbr:`SAN (Storage Area
Network)`). Many potential users may already have something like this. Having
the storage centralised also makes it easier and more efficient to share large
amounts of data between tenants when required (since traffic can be confined to
the same :ref:`storage network <teapot-networking-storage>` rather than
traversing the public network).
Applications can use object storage hosted anywhere (including outside clouds),
but to minimise network bandwidth it will often be better to have it nearby.
Should the `proposal to add Object Bucket Provisioning
<https://github.com/kubernetes/enhancements/pull/1383>`_ to Kubernetes
eventuate, there will also be an advantage in having object storage as part of
the local cloud, using the same authentication mechanism.
Implementation Options
----------------------
OpenStack already provides robust, mature implementations of multi-tenant
shared storage that are accessible from Kubernetes. The main task would be to
integrate them into the system and simplify deployment. These services would
run in either the management cluster or a separate (but still
centrally-managed) storage cluster.
.. _teapot-storage-manila:
OpenStack Manila
~~~~~~~~~~~~~~~~
Manila_ is the most natural fit for Kubernetes because it provides 'RWX'
(Read/Write Many) persistent storage, which is often needed to avoid downtime
when pods are upgraded or rescheduled to different nodes as well as for
applications where multiple pods are writing to the same filesystem in
parallel.
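
From the tenant's point of view this is just an ordinary 'RWX'
PersistentVolumeClaim against a Manila-backed storage class; a sketch using
the Kubernetes Python client, with a placeholder class name and size:

.. code-block:: python

    from kubernetes import client, config

    config.load_kube_config()

    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "shared-data"},
        "spec": {
            "accessModes": ["ReadWriteMany"],         # many pods, many nodes
            "storageClassName": "csi-manila-cephfs",  # placeholder class name
            "resources": {"requests": {"storage": "100Gi"}},
        },
    }

    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc)
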
Manila's architecture is relatively simple already. It would be helpful if the
dependency on RabbitMQ could be removed (to be replaced with e.g. json-rpc in
the same way that Ironic has in Metal³), but this would require more
investigation. An Operator for deploying and managing Manila on Kubernetes is
under development.
A :abbr:`CSI (Container Storage Interface)` plugin for Manila already exists in
cloud-provider-openstack_.
.. _teapot-storage-cinder:
OpenStack Cinder
~~~~~~~~~~~~~~~~
Cinder_ is more limited than Manila in the sense that it can provide only 'RWO'
(Read/Write Once) access to persistent storage for most applications.
(Kubernetes volume mounts are generally file-based -- Kubernetes creates its
own file system on block devices if none is present.) However, Kubernetes does
now support raw block storage volumes, which *do* support 'RWX' mode for
applications that can work with raw block offsets. KubeVirt in particular is
expected to make use of raw block mode persistent volumes for backing virtual
machines, so this is likely to be a common use case.
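
A raw block volume is requested by setting ``volumeMode: Block`` on the claim;
shared ('RWX') access additionally depends on the backing Cinder volume type
supporting multi-attach. A sketch, with a placeholder class name and size:

.. code-block:: python

    from kubernetes import client, config

    config.load_kube_config()

    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "vm-disk-0"},
        "spec": {
            "volumeMode": "Block",             # raw device, no filesystem
            "accessModes": ["ReadWriteMany"],  # e.g. for VM live migration
            "storageClassName": "csi-cinder",  # placeholder class name
            "resources": {"requests": {"storage": "40Gi"}},
        },
    }

    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc)
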
Much of the complexity in Cinder is linked to the need to provide agents
running on Nova compute hosts. Since Teapot is a baremetal-only service, only
the parts of Cinder needed to provide storage to Ironic servers are required.
Unfortunately, Cinder is quite heavily dependent on RabbitMQ. However, there
may be scope for simplification through further work with the Cinder community.
The remaining portions of Cinder are architecturally very similar to Manila, so
similar results could be expected.
Cinder has a dependency on Barbican for supporting encrypted volumes. Encrypted
volume support is not required but would be nice to have. This is another
reason to use :ref:`Barbican <teapot-key-management-barbican>`. It would be
nice to think that we could adapt Cinder to use Kubernetes Secrets instead
(perhaps via another key-manager back-end for Castellan), but without Barbican
or an equivalent that doesn't actually provide the :doc:`level of security you
would hope for <key-management>` anyway.
A :abbr:`CSI (Container Storage Interface)` plugin for Cinder already exists in
cloud-provider-openstack_.
Ember_ is an alternative CSI plugin that makes use of cinderlib, rather than
all of Cinder. This allows Cinder's hardware drivers to be used directly from
Kubernetes while eliminating a lot of overhead. However, some of the overhead
that is eliminated is the API that enforces multi-tenancy. Therefore, Ember is
not an option for this particular use case.
.. _teapot-storage-swift:
OpenStack Swift
~~~~~~~~~~~~~~~
Swift_ is a very mature object storage system, with both a native API and the
ability to emulate Amazon S3. It supports :ref:`Keystone <teapot-idm-keystone>`
authentication. It has a relatively simple architecture that should make it
straightforward to deploy on top of Kubernetes.
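
For example, an application could talk to Swift directly with its Keystone
credentials using python-swiftclient; the endpoint and account details below
are placeholders:

.. code-block:: python

    from swiftclient import Connection

    conn = Connection(
        authurl="https://keystone.teapot.example:5000/v3",
        user="demo", key="secret", auth_version="3",
        os_options={
            "project_name": "demo",
            "project_domain_name": "Default",
            "user_domain_name": "Default",
        },
    )

    conn.put_container("training-data")
    with open("dataset-001.tar", "rb") as data:
        conn.put_object("training-data", "dataset-001.tar", contents=data)
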
.. _teapot-storage-radosgw:
Ceph Object Gateway
~~~~~~~~~~~~~~~~~~~
RadosGW_ is a service to provide an object storage interface backed by Ceph,
with two APIs that are compatible with large subsets of Swift and Amazon S3,
respectively. It can use either :ref:`Keystone <teapot-idm-keystone>` or
:ref:`Keycloak <teapot-idm-keycloak>` for authentication. It can be installed
and managed using the Rook_ operator.
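
As a rough sketch, the object store itself would be declared as a Rook custom
resource and reconciled by the operator; the pool sizing below is purely
illustrative, and the Keystone/Keycloak authentication configuration is
omitted:

.. code-block:: python

    from kubernetes import client, config

    config.load_kube_config()

    object_store = {
        "apiVersion": "ceph.rook.io/v1",
        "kind": "CephObjectStore",
        "metadata": {"name": "teapot-objects", "namespace": "rook-ceph"},
        "spec": {
            "metadataPool": {"replicated": {"size": 3}},
            "dataPool": {"replicated": {"size": 3}},
            "gateway": {"port": 80, "instances": 2},
        },
    }

    client.CustomObjectsApi().create_namespaced_custom_object(
        group="ceph.rook.io", version="v1", namespace="rook-ceph",
        plural="cephobjectstores", body=object_store)
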
.. _Rook: https://rook.io/
.. _cloud-provider-openstack: https://github.com/kubernetes/cloud-provider-openstack#readme
.. _Manila: https://docs.openstack.org/manila/latest/
.. _Cinder: https://docs.openstack.org/cinder/latest/
.. _Ember: https://ember-csi.io/
.. _Swift: https://docs.openstack.org/swift/latest/
.. _RadosGW: https://docs.ceph.com/docs/master/radosgw/


@ -89,8 +89,9 @@ Proposed ideas
 ==============
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
    :titlesonly:
    :glob:
 
+   ideas/*/index
    ideas/*