Add container spec

This adds a proposed specification for using containers as build
resources.

Change-Id: I5d495f87a7d1f97546d2f3325caad8a2e30b201c
James E. Blair 2018-04-10 10:23:36 -07:00
parent 44ef49f997
commit 3ad89dba93
3 changed files with 347 additions and 0 deletions


@@ -17,4 +17,5 @@ Zuul, though advanced users may find it interesting.
docs
ansible
javascript
specs/index
releasenotes


@@ -0,0 +1,330 @@
Use Kubernetes for Build Resources
==================================

.. warning:: This is not authoritative documentation. These features
   are not currently available in Zuul. They may change significantly
   before final implementation, or may never be fully completed.

There has been a lot of interest in using containers for build
resources in Zuul. The use cases are varied, so we need to describe
in concrete terms what we aim to support. Zuul provides a number of
unique facilities to a CI/CD system which are well-explored in the
full-system context (i.e., VMs or bare metal) but it's less obvious
how to take advantage of these features in a container environment.
As we design support for containers it's also important that we
understand how things like speculative git repo states and job content
will work with containers.
In this document, we will consider two general approaches to using
containers as build resources:

* Containers that behave like a machine
* Native container workflow

Finally, there are multiple container environments. Kubernetes and
OpenShift (an open source distribution of Kubernetes) are popular
environments which provide significant infrastructure to help us more
easily integrate them with Zuul, so this document will focus on these.
We may be able to extend this to other environments in the future.
.. _container-machine:
Containers That Behave Like a Machine
-------------------------------------
In some cases users may want to run scripted job content in an
environment that is more lightweight than a VM. In this case, we're
expecting to get a container which behaves like a VM. The important
characteristic of this scenario is that the job is not designed
specifically as a container-specific workload (e.g., it might simply
be a code style check). It could just as easily run on a VM as a
container.
To achieve this, we should expect that a job defined in terms of
simple commands will work. For example, a job which runs a playbook
with::

  - hosts: all
    tasks:
      - command: tox -e pep8

should work given an appropriate base job which prepares a container.
A user might expect to request a container in the same way they
request any other node::

  nodeset:
    nodes:
      - name: controller
        label: python-container

To provide this support, Nodepool would need to create the requested
container. Nodepool can use either the kubernetes python client or
the `openshift python client`_ to do this. The openshift client is
downstream of the native `kubernetes python client`_, so they are
compatible, but using the openshift client offers a superset of
functionality, which may come in handy later. Since the Ansible
k8s_raw module uses the openshift client for this reason, we may want
to as well.
In Kubernetes, a group of related containers forms a pod, which is the
API-level object used to create one or more containers. Even if a
single container is desired, a pod with a single container is created.
Therefore, for this case, Nodepool will need to create a pod
containing a single container. Some aspects of the
container's configuration (such as the image which is used) will need
to be defined in the Nodepool configuration for the label. These
configuration values should be supplied in a manner typical of
Nodepool's configuration format. Note that very little customization
is expected here -- more complex topologies should not be supported by
this mechanism and should be left instead for :ref:`container-native`.
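
As a rough illustration, the label definition in Nodepool might look
something like the following sketch; the driver and option names are
hypothetical, and the actual configuration syntax would be settled
during implementation:

.. code-block:: yaml

   # Hypothetical Nodepool configuration sketch -- the driver and
   # option names shown here are illustrative, not an implemented
   # syntax.
   providers:
     - name: kube-cluster
       driver: kubernetes
       labels:
         - name: python-container
           image: docker.io/library/python:3.6
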
Containers in Kubernetes always run a single command, and when that
command is finished, the container terminates. Nodepool doesn't have
the context to run a job-specific command at this point, so instead it
can create the container with a command that simply runs forever, for
example, ``/bin/sh``.
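
For example, the pod created for such a label might look roughly like
the following; the names and image are illustrative, and ``stdin`` and
``tty`` keep the shell from exiting immediately:

.. code-block:: yaml

   # Illustrative pod Nodepool might create for the label above: a
   # single container kept alive by an interactive shell.  Names and
   # image are examples only.
   apiVersion: v1
   kind: Pod
   metadata:
     name: python-container-0000000001
   spec:
     containers:
       - name: worker
         image: docker.io/library/python:3.6
         command: ["/bin/sh"]
         stdin: true    # keep the shell waiting for input instead of exiting
         tty: true
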
A "container behaving like a machine" may be accessible via SSH, or it
may not. It's generally not difficult to run SSHD in a container;
however, for that to be useful, it still needs to be reachable over
the network from the Zuul executor. This requires
that a service be configured for the container along with ingress
access to the cluster. This is an additional complication that some
users may not want to undertake, especially if the goal is to run a
relatively simple job. On the other hand, some environments may be
more naturally suited to using an SSH connection.
We can address both cases, but they will be handled a bit differently.
First, the case where the container does run SSHD:
Nodepool would need to create a Kubernetes service for the container.
If Nodepool and the Zuul executor are running in the same Kubernetes
cluster, the container will be accessible to them, so Nodepool can
return this information to Zuul and the service address can be added
to the Ansible inventory with the SSH connection plugin as normal. If
the Kubernetes cluster is external to Nodepool and Zuul, Nodepool will
also need to establish an ingress resource in order to make it
externally accessible. Both of these will require additional Nodepool
configuration and code to implement. Due to the additional
complexity, these should be implemented as follow-on changes after the
simpler case where SSHD is not running in the container.
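
For reference, the in-cluster service Nodepool would create in the SSH
case might look roughly like the following sketch; the name and
selector are illustrative:

.. code-block:: yaml

   # Illustrative in-cluster service exposing the container's SSH port;
   # external access would additionally require an ingress (or similar)
   # resource.  The name and selector are examples only.
   apiVersion: v1
   kind: Service
   metadata:
     name: python-container-0000000001-ssh
   spec:
     selector:
       zuul-node: python-container-0000000001
     ports:
       - protocol: TCP
         port: 22
         targetPort: 22
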
In the case where the container does not run SSHD and we interact
with it via native commands, Nodepool will create a service account in
Kubernetes and pass its credentials to Zuul, indicating that the
appropriate connection plugin for Ansible is the `kubectl connection
plugin`_. Zuul will add the container to the inventory with that
configuration, and will then be able to run additional commands in the
container -- the commands which comprise the actual job.
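
The resulting inventory entry might look something like the following
sketch; the kubectl connection plugin takes the pod name from the
inventory hostname, and the namespace, container, and kubeconfig
values shown here are examples only:

.. code-block:: yaml

   # Illustrative Ansible inventory entry using the kubectl connection
   # plugin.  All values are examples.
   all:
     hosts:
       python-container-0000000001:
         ansible_connection: kubectl
         ansible_kubectl_namespace: zuul-build-0000000001
         ansible_kubectl_container: worker
         ansible_kubectl_kubeconfig: /var/lib/zuul/kubeconfig
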
Strictly speaking, this is all that is required for basic support in
Zuul, but as discussed in the introduction, we need to understand how
to build a complete solution including dealing with the speculative
git repo state.
A base job can be constructed to update the git repos inside the
container, and retrieve any artifacts produced. We should be able to
have the same base job detect whether there are containers in the
inventory and alter behavior as needed to accommodate them. For a
discussion of how the git repo state can be synchronized, see
:ref:`git-repo-sync`.
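
For example, a base job might guard container-specific steps with a
check of the connection type, along the lines of the following sketch;
the role name is hypothetical:

.. code-block:: yaml

   # Sketch of a base-job task that only runs for container nodes,
   # keyed off the connection plugin recorded in the inventory.
   - name: Update git repos inside the container
     include_role:
       name: prepare-workspace-kubectl
     when: ansible_connection | default('ssh') == 'kubectl'
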
If we want streaming output from a kubectl command, we may need to
create a local fork of the kubectl connection plugin in order to
connect it to the log streamer (much in the way we do for the command
module).
Not all jobs will be expected to work in containers. Some frequently
used Ansible modules will not behave as expected when run with the
kubectl connection plugin. The ``synchronize`` module, in particular,
may be problematic (though since synchronize already supports the
docker connection, that may be possible to overcome). We will want to
think about
ways to keep the base job(s) as flexible as possible so they work with
multiple connection types, but there may be limits. Containers which
run SSHD should not have these problems.
.. _kubectl connection plugin: https://docs.ansible.com/ansible/2.5/plugins/connection/kubectl.html
.. _openshift python client: https://pypi.org/project/openshift/
.. _kubernetes python client: https://pypi.org/project/kubernetes/
.. _container-native:
Native Container Workflow
-------------------------
A workflow that is designed from the start for containers may behave
very differently. In particular, it's likely to be heavily image
based, and may have any number of containers which may be created and
destroyed in the process of executing the job.
It may use the `k8s_raw Ansible module`_ to interact directly with
Kubernetes, creating and destroying pods for the job in much the same
way that an existing job may use Ansible to orchestrate actions on a
worker node.
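
For instance, such a job might include a task along the lines of the
following sketch; the pod name, namespace variable, and image are
illustrative only:

.. code-block:: yaml

   # Sketch: a job creating its own pod directly with k8s_raw.
   - name: Launch a pod for this job
     k8s_raw:
       state: present
       definition:
         apiVersion: v1
         kind: Pod
         metadata:
           name: glance-under-test
           namespace: "{{ zuul_work_namespace }}"   # hypothetical variable
         spec:
           containers:
             - name: glance
               image: "{{ speculative_image }}"   # e.g. an image built earlier in the job
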
All of this means that we should not expect Nodepool to provide a
running container -- the job itself will create containers as needed.
It also means that we need to think about how a job will use the
speculative git repos. It's very likely to need to build custom
images using those repos which are then used to launch containers.
Let's consider a job which begins by building container images from
the speculative git source, then launches containers from those images
and exercises them.

.. note:: It's also worth considering a complete job graph where a
   dedicated job builds images and subsequent jobs use them. We'll
   deal with that situation in :ref:`buildset`.

Within a single job, we could build images by requesting either a full
machine or a :ref:`container-machine` from Nodepool and running the
image build on that machine. Or we could use the `k8s_raw Ansible
module`_ to create that container from within the job. We would use the
:ref:`git-repo-sync` process to get the appropriate source code onto
the builder. Regardless, once the image builds are complete, we can
then use the result in the remainder of the job.
In order to use an image (regardless of how it's created) Kubernetes
is going to expect to be able to find the image in a repository it
knows about. Putting images built from speculative future git repo
states into a public image repository may be confusing, and would
require extra work to clean those up. Therefore, the best approach
may be to work with private, per-build image repositories.
The best approach for this may be to have the job run an image
repository after it completes the image builds, then upload those
builds to the repository. The only thing Nodepool needs to provide in
this situation is a Kubernetes namespace for the job. The job itself
can perform the image build, create a service account token for the
image repository, run the image repository, and upload the image. Of
course, it will be useful to create reusable roles and jobs in
zuul-jobs to implement this universally.
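
A rough sketch of what part of such a role might do, assuming a
per-build registry reachable at a hypothetical ``local_registry``
address has already been started:

.. code-block:: yaml

   # Sketch: build an image from the prepared speculative source and
   # push it into the per-build registry.  "local_registry" is a
   # hypothetical variable; the image name is illustrative.
   - name: Build an image from the speculative source
     command: docker build -t {{ local_registry }}/glance:{{ zuul.build }} .
     args:
       chdir: "{{ zuul.project.src_dir }}"

   - name: Push the image into the per-build registry
     command: docker push {{ local_registry }}/glance:{{ zuul.build }}
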
OpenShift provides some features that make this easier, so an
OpenShift-specific driver could additionally do the following and
reduce the complexity in the job:
We can ask Nodepool to create an `OpenShift project`_ for the use of
the job. That will create a private image repository for the project.
Service accounts in the project are automatically created with
``imagePullSecrets`` configured to use the private image repository [#f1]_.
We can have Zuul use one of the default service accounts, or have
Nodepool create a new one specifically for Zuul; then, when using the
`k8s_raw Ansible module`_, the image registry will automatically be
used.
While we may consider expanding the Nodepool API and configuration
language to more explicitly support other types of resources in the
future, for now, the concept of labels is sufficiently generic to
support the use cases outlined here. A label might correspond to a
virtual machine, physical machine, container, namespace, or OpenShift
project. In all cases, Zuul requests one of these things from
Nodepool by using a label.
.. _OpenShift Project: https://docs.openshift.org/latest/dev_guide/projects.html
.. [#f1] https://docs.openshift.org/latest/dev_guide/managing_images.html#using-image-pull-secrets
.. _k8s_raw Ansible module: http://docs.ansible.com/ansible/2.5/modules/k8s_raw_module.html
.. _git-repo-sync:
Synchronizing Git Repos
-----------------------
Our existing method of synchronizing git repositories onto a worker
node relies on SSH. It's possible to run an SSH daemon in a container
(or pod), but if it's otherwise not needed, it may be considered too
cumbersome. In particular, it may mean establishing a service entry
in kubernetes and an ingress route so that the executor can reach the
SSH server. However, it's always possible to run commands in a
container using kubectl with direct stdin/stdout connections without
any of the service/ingress complications. It should be possible to
adapt our process to use this.
Our current process will use a git cache if present on the worker
image. This is optional -- a Zuul user does not need a specially
prepared image, but if one is present, it can speed up operation. In
a container environment, we could have Nodepool build container images
with a git repo cache, but in the world of containers, there are
universally accessible image stores, and considerable tooling around
building custom images already. So for now, we won't have Nodepool
build container images itself; rather, we expect that a publicly
accessible base image will be used, or that an administrator will
build a custom image and make it available to Kubernetes if one is
needed in their environment. If we find that we also want to support
container image builds in Nodepool in the future, we can add support
for that later.
The first step in the process is to create a new pod from a base
image; ensure it has ``git`` installed. If the pod is going to be
used to run a single command (i.e., :ref:`container-machine`) or will
only be used to build images, then a single container is sufficient.
However, if the pod will support multiple containers,
each needing access to the git cache, then we can use the `sidecar
pattern`_ to update the git repo once. In that case, in the pod
definition, we should specify an `emptyDir volume`_ where the final
git repos will be placed, and other containers in the pod can mount
the same volume.
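
A sketch of such a pod definition, with a sidecar container that owns
the repo update and an emptyDir volume shared with the work container;
the names and images are examples only:

.. code-block:: yaml

   # Illustrative pod using the sidecar pattern: the "sidecar"
   # container prepares the git repos in the shared emptyDir volume
   # once, and the work container mounts the same volume at /zuul.
   apiVersion: v1
   kind: Pod
   metadata:
     name: zuultest
   spec:
     volumes:
       - name: zuul-repos
         emptyDir: {}
     containers:
       - name: sidecar
         image: alpine/git        # any image with git installed
         command: ["/bin/sh"]
         stdin: true
         tty: true
         volumeMounts:
           - name: zuul-repos
             mountPath: /zuul
       - name: worker
         image: docker.io/library/python:3.6
         command: ["/bin/sh"]
         stdin: true
         tty: true
         volumeMounts:
           - name: zuul-repos
             mountPath: /zuul
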
Run commands in the container to clone the git repos to the
destination path.
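
For example, run against the sidecar container via the kubectl
connection; the URL and destination path here are illustrative:

.. code-block:: yaml

   # Sketch: clone the upstream repo into the shared volume so it can
   # then be updated to the speculative state by the push below.
   - name: Clone the upstream repo into the shared volume
     command: git clone https://git.example.com/openstack/glance /zuul/glance
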
Run commands in the container to push the updated git commits. In
place of the normal ``git push`` command which relies on SSH, use a
custom SSH command which uses kubectl to set up the remote end of the
connection.
Here is an example custom SSH script:

.. code-block:: bash

   #!/bin/bash
   # Ignore the host and command arguments supplied by git and instead
   # run git-receive-pack in the sidecar container via kubectl.
   /usr/bin/kubectl exec zuultest -c sidecar -i /usr/bin/git-receive-pack /zuul/glance

Here is an example use of that script to push to a remote branch:

.. code-block:: console

   [root@kube-1 glance]# GIT_SSH="/root/gitssh.sh" git push kube HEAD:testbranch
   Counting objects: 3, done.
   Delta compression using up to 4 threads.
   Compressing objects: 100% (3/3), done.
   Writing objects: 100% (3/3), 281 bytes | 281.00 KiB/s, done.
   Total 3 (delta 2), reused 0 (delta 0)
   To git+ssh://kube/
    * [new branch]      HEAD -> testbranch

.. _sidecar pattern: https://docs.microsoft.com/en-us/azure/architecture/patterns/sidecar
.. _emptyDir volume: https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
.. _buildset:
BuildSet Resources
------------------
It may be very desirable to construct a job graph which builds
container images once at the top, and then supports multiple jobs
which deploy and exercise those images. The use of a private image
registry is particularly suited to this.
On the other hand, folks may want jobs in a buildset to be isolated
from each other, so we may not want to simply assume that all jobs in
a buildset are related.
An approach which is intuitive and doesn't preclude either option is
to allow the user to tell Zuul that the resources used by a job (e.g.,
the Kubernetes namespace, and any containers or other nodes) should
continue running until the end of the buildset. These resources would
then be placed in the inventory of child jobs for their use. In this
way, the job we constructed earlier which built an image and uploaded
it into a registry that it hosted could then be the root of a tree of
child jobs which use that registry. If the image-build-registry job
created a service token, that could be passed to the child jobs for
their use when they start their own containers or pods.
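
As a sketch of how this might look to a user; the
``buildset-resources`` attribute is purely hypothetical and is used
here only to illustrate the idea, while the dependency syntax in the
project stanza is existing Zuul configuration:

.. code-block:: yaml

   # Hypothetical configuration sketch: "buildset-resources" is not
   # existing Zuul syntax; it only illustrates the proposed knob.
   - job:
       name: build-images
       buildset-resources: true

   - project:
       check:
         jobs:
           - build-images
           - deploy-and-test:
               dependencies:
                 - build-images
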
In order to support this, we may need to implement provider affinity
for builds in a buildset in Nodepool so that we don't have to deal
with ingress access to the registry (which may not be possible).
Otherwise, if Nodepool had access to two Kubernetes clusters, it
might assign a child job to a different cluster.


@@ -0,0 +1,16 @@
Specifications
==============
This section contains specifications for future Zuul development. As
we work on implementing significant changes, these document our plans
for those changes and help us work on them collaboratively.

.. warning:: These are not authoritative documentation. These
   features are not currently available in Zuul. They may change
   significantly before final implementation, or may never be fully
   completed.

.. toctree::
   :maxdepth: 1

   container-build-resources