Support for Podman in Stein

This is a spec for the next generation of containers in the TripleO Stein release. Change-Id: I12782deab7b24105aa5efd81827b8b5a7330d5f1
2018-09-13 16:31:14 -06:00 · 4a7f0256a0
parent 5bfcc67c68
commit 4a7f0256a0
1 changed files with 322 additions and 0 deletions
--- a/specs/stein/podman.rst
+++ b/specs/stein/podman.rst
@@ -0,0 +1,322 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=======================================
+Podman support for container management
+=======================================
+
+Launchpad blueprint:
+
+https://blueprints.launchpad.net/tripleo/+spec/podman-support
+
+There is an ongoing desire to manage TripleO containers with a set of tools
+designed to solve complex problems when deploying applications.
+The containerization of TripleO started with a Docker CLI implementation,
+but we are looking at how we could move container orchestration
+to a Kubernetes-friendly solution.
+
+
+Problem Description
+===================
+
+There are three problems that this document will cover:
+
+* There is an ongoing discussion on whether or not Docker will be
+  maintained on future versions of Red Hat platforms. There is a general
+  move toward OCI (Open Container Initiative) conformant runtimes, such as
+  CRI-O (Container Runtime Interface for OCI).
+
+* The TripleO community has been looking at how we could orchestrate the
+  container lifecycle with Kubernetes, in order to bring consistency with
+  other projects such as OpenShift.
+
+* The TripleO project aims to work on the next version of Red Hat platforms,
+  therefore we are looking at Docker alternatives in the Stein cycle.
+
+
+Proposed Change
+===============
+
+Introduction
+------------
+
+The containerization of TripleO has been an ongoing effort for a few releases
+now, and we have always taken a step-by-step approach that tries to
+maintain backward compatibility for deployers and developers, and
+that keeps upgrades from a previous release possible without too much
+pain. With that said, we are proposing a change that isn't too
+disruptive but is still aligned with the general roadmap of the container
+story and will hopefully lead us to manage our containers with Kubernetes.
+We use the Paunch project to provide an abstraction in our container integration.
+Paunch will handle container configuration formats and support multiple backends.
+
+Integrate Podman CLI
+--------------------
+
+The goal of Podman is to allow users to run standalone (non-orchestrated)
+containers, which is what we have been doing with Docker until now.
+Podman also allows users to run groups of containers called Pods. Pod is
+a term developed for the Kubernetes project that describes an object that
+has one or more containerized processes sharing multiple namespaces
+(Network, IPC and optionally PID).
+Podman doesn't have a daemon, which makes it lighter than Docker, and it uses
+the more traditional fork/exec model of Unix and Linux.
+The container runtime used by Podman is runc.
+The CLI is partially backward compatible with Docker, so its integration
+in TripleO shouldn't be that painful.
+
+It is proposed to add support for the Podman CLI (alongside the Docker CLI) in
+TripleO to manage the creation, deletion and inspection of our containers.
+We would have a new parameter called ContainerCli in TripleO that, if set to
+'podman', makes container provisioning use the Podman CLI instead of the
+Docker CLI.
+
+Because there is no daemon, there are some problems that we need to solve:
+
+* Automatically restart failed containers.
+* Automatically start containers when the host is (re)booted.
+* Start the containers in a specific order during host boot.
+* Provide a channel of communication with containers.
+* Run container healthchecks.
+
+To solve the first 3 problems, it is proposed to use Systemd:
+
+* Use Restart so we can configure a restart policy for our containers.
+  Most of our containers would run with the Restart=always policy, but we'll
+  have to support some exceptions.
+* The systemd services will be enabled by default so the containers start
+  at boot.
+* The ordering will be managed by Wants, which declares a dependency between
+  Systemd units. Wants is a weaker version of Requires. It will let us make sure
+  we start HAproxy before Keepalived, for example, if they are on the same host.
+  Because it is a weak dependency, it will only be honored if the containers
+  are running on the same host. Note that Wants alone does not order unit
+  start-up, so it would be paired with After or Before.
+* The way containers will be managed (start/stop/restart/status) will be
+  familiar to operators who are used to controlling Systemd services. However,
+  we probably want to make it clear that managing the containers with
+  Systemd is not our long-term goal.
+
+The Systemd integration would be:
+
+* complete enough to cover our use-cases and bring feature parity with the
+  Docker implementation.
+* light enough to let us migrate our container lifecycle to Kubernetes
+  in the future (e.g. CRI-O).
+
+
+For the fourth problem, we are still investigating the options:
+
+* varlink: interface description format and protocol that aims to make services
+  accessible to both humans and machines in the simplest feasible way.
+* CRI-O: OCI-based implementation of Kubernetes Container Runtime Interface
+  without Kubelet. For example, we could use a CRI-O Python binding to
+  communicate with the containers.
+* A dedicated image which runs the rootwrap daemon, with rootwrap filters to only run the allowed
+  commands.  The controlling container will have the rootwrap socket mounted in so that it can
+  trigger allowed calls in the rootwrap container.  For pacemaker, the rootwrap container will allow
+  image tagging. For neutron, the rootwrap container will spawn the processes inside the container,
+  so it will need to be a long-lived container that is managed outside paunch.
+
+             +---------+     +----------+
+             |         |     |          |
+             | L3Agent +-----+ Rootwrap |
+             |         |     |          |
+             +---------+     +----------+
+
+  In this example, the L3Agent container has mounted in the rootwrap daemon socket so that it can
+  run allowed commands inside the rootwrap container.
+
+Finally, the fifth problem is still an ongoing question.
+There are some plans to support healthchecks in Podman but nothing has been
+done as of today. We might have to implement something on our side with
+Systemd.
+
+Alternatives
+============
+
+Two alternatives are proposed.
+
+CRI-O Integration
+-----------------
+
+CRI-O is meant to provide an integration path between OCI conformant runtimes
+and the kubelet. Specifically, it implements the Kubelet Container Runtime
+Interface (CRI) using OCI conformant runtimes. Note that the CLI utility for
+interacting with CRI-O isn't meant to be used in production, so managing
+the container lifecycle with a CLI is only possible with Docker or Podman.
+
+So instead of a smooth migration from Docker CLI to Podman CLI, we could go
+straight to Kubernetes integration and convert our TripleO services to work
+with a standalone Kubelet managed by CRI-O.
+We would have to generate YAML files for each container in a Pod format,
+so CRI-O can manage them.
+It wouldn't require Systemd integration, as the containers will be managed
+by Kubelet.
+The operator would control the container lifecycle by using kubectl commands
+and the automated deployment & upgrade process would happen in Paunch with
+a Kubelet backend.
+
+While this implementation will help us to move to a multi-node Kubernetes
+friendly environment, it remains the riskiest option in terms of the
+amount of work that needs to happen versus the time that we have to design,
+implement, test and ship the next tooling before the end of the Stein cycle.
+
+We also need to keep in mind that CRI-O and Podman share containers/storage
+and containers/image libraries, so the issues that we have had with Podman
+will be hit with CRI-O as well.
+
+Keep Docker
+-----------
+
+We could keep Docker around and not change anything in the way we manage
+containers. We could also keep Docker and make it work with CRI-O.
+The only risk here is that Docker tooling might not be supported in the future
+by Red Hat platforms and we would be on our own if any issue arose with Docker.
+The TripleO community is always seeking a healthy, long-term
+collaboration with the project communities that we interact
+with.
+
+Proposed roadmap
+================
+
+In Stein:
+
+* Make Paunch support Podman as an alternative to Docker.
+* Get our existing services fully deployable on Podman, with parity to
+  what we had with Docker.
+* If we have time, add Podman pod support to Paunch.
+
+In "T" cycle:
+
+* Rewrite all of our container YAML to the pod format.
+* Add a Kubelet backend to Paunch (or change our agent tooling to call
+  Kubelet directly from Ansible).
+* Get our existing services fully deployable via Kubelet, with parity to
+  what we had with Podman / Docker.
+* Evaluate switching to Kubernetes proper.
+
+
+Security Impact
+===============
+
+The TripleO containers will rely on Podman security.
+If we don't use CRI-O or varlink to communicate with containers, we'll have
+to consider running some containers in privileged mode and mounting
+/var/lib/containers into the containers. This is a security concern and
+we'll have to evaluate it.
+Also, we'll have to make the proposed solution work with SELinux in Enforcing mode.
+
+The Docker solution doesn't enforce SELinux separation between containers.
+Podman does, and there's currently no easy way to deactivate that globally.
+So we'll basically get more secure containers with Podman, as we have to
+support separation from the very beginning.
+
+Upgrade Impact
+==============
+
+The containers that were managed by Docker Engine will be removed and
+provisioned into the new runtime. This process will happen when Paunch
+generates and executes the new container configuration.
+The operator shouldn't have to take any manual action, and the migration will be
+automated, mainly by Paunch.
+The Containerized Undercloud upgrade job will test the upgrade of an Undercloud
+running Docker containers on Rocky to Podman containers on Stein.
+The Overcloud upgrade jobs will also test this path.
+
+Note: as the Docker runtime doesn't have SELinux separation,
+some chcon/relabelling might be needed prior to the move to the Podman runtime.
+
+End User Impact
+===============
+
+The operators won't be able to run the Docker CLI like before and instead will
+have to use the Podman CLI, where some backward compatibility is guaranteed.
+
+Performance Impact
+==================
+
+There are different aspects of performance that we'll need to investigate:
+
+* Container performance (relying on Podman).
+* How Systemd + Podman work together and how restarts work versus Docker engine.
+
+Deployer Impact
+===============
+
+There shouldn't be much impact for the deployer, as we aim to make this change
+as transparent as possible. The only option (so far) that will be
+exposed to the deployer will be "ContainerCli", where only 'docker' and
+'podman' will be supported. If 'podman' is chosen, the transition will be
+automated.
+
+Developer Impact
+================
+
+There shouldn't be much impact for the developer of TripleO services, except
+that some things in Podman behave slightly differently from Docker.
+For example, Podman won't create missing directories when
+bind-mounting into containers, while Docker creates them.
+
+Implementation
+==============
+
+Contributors
+------------
+
+* Bogdan Dobrelya
+* Cédric Jeanneret
+* Emilien Macchi
+* Steve Baker
+
+Work Items
+----------
+
+* Update TripleO services to work with Podman (e.g. fix bind-mount issues).
+* SELinux separation (relates to bind-mount rights + some other issues when
+  we're calling iptables/other host commands from a container).
+* Systemd integration.
+* Healthcheck support.
+* Socket / runtime: varlink? CRI-O?
+* Upgrade workflow.
+* Testing.
+* Documentation for operators.
+
+
+Dependencies
+============
+
+* The Podman integration depends a lot on how stable the tool is and how
+  often it is released and shipped so we can test it in CI.
+* The Healthchecks interface depends on Podman's roadmap.
+
+Testing
+=======
+
+First of all, we'll switch the Undercloud jobs to use Podman and this work
+should be done by milestone-1. Both the deployment and upgrade jobs should
+be switched and actually working.
+The overcloud jobs should be switched by milestone-2.
+
+We'll keep Docker testing support for as long as we keep running tests on the
+CentOS 7 platform.
+
+Documentation Impact
+====================
+
+We'll need to document the new commands (mainly the same as Docker), and
+the differences in how containers should be managed (Systemd instead of Docker
+CLI for example).
+
+
+References
+==========
+
+* https://www.projectatomic.io/blog/2018/02/reintroduction-podman/
+* https://github.com/kubernetes-sigs/cri-o
+* https://github.com/kubernetes/community/blob/master/contributors/devel/container-runtime-interface.md
+* https://varlink.org/
+* https://github.com/containers/libpod/blob/master/transfer.md
+* https://etherpad.openstack.org/p/tripleo-standalone-kubelet-poc
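
As a rough illustration of the Systemd integration proposed under "Integrate Podman CLI" (restart policy, start at boot, Wants-based dependencies), the following Python sketch renders a unit file for a container that has already been created. The helper name `render_container_unit`, the unit layout and the 10-second stop timeout are assumptions made for this example, not the format Paunch generates. After= is emitted next to Wants= on the assumption that start ordering is wanted, since Wants= by itself only pulls the other unit in.

```python
from typing import Iterable


def render_container_unit(name: str,
                          restart: str = "always",
                          wants: Iterable[str] = ()) -> str:
    """Render a systemd unit that starts an existing Podman container.

    Hypothetical helper: the unit layout is an assumption for
    illustration, not the format Paunch generates.
    """
    lines = ["[Unit]", f"Description={name} container"]
    for dep in wants:
        # Wants= is the weak dependency named in the spec; After= is added
        # because Wants= alone does not order unit start-up.
        lines.append(f"Wants={dep}.service")
        lines.append(f"After={dep}.service")
    lines += [
        "",
        "[Service]",
        # Restart policy; the spec expects "always" for most containers.
        f"Restart={restart}",
        # The container is assumed to have been created beforehand.
        f"ExecStart=/usr/bin/podman start -a {name}",
        # Assumed 10-second grace period before the container is killed.
        f"ExecStop=/usr/bin/podman stop -t 10 {name}",
        "",
        "[Install]",
        # Enabling the unit makes the container start at boot.
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_container_unit("keepalived", wants=["haproxy"]))
```

Calling `render_container_unit("keepalived", wants=["haproxy"])` yields a unit that wants and starts after `haproxy.service`, which mirrors the HAproxy-before-Keepalived example in the spec.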