Remove implemented specs

These three specifications have been fully implemented:

    * Use Kubernetes for Build Resources
    * Multiple Ansible versions
    * Web Dashboard Log Handling

Remove them from the index since they are not intended as long-term
feature documentation, and the features they implemented were
described in proper documentation as a part of that work.

Also update the index page with a couple of additional sentences
about how spec completion is handled (through removal of the
implemented spec documents) and why we consider it reasonable to do
it this way.

Change-Id: I493d995a329a087341986457bec4b328f1e8b7ea
Jeremy Stanley 2020-01-07 19:06:20 +00:00 committed by James E. Blair
parent d576363599
commit 1324a3a2a8
4 changed files with 4 additions and 578 deletions


@@ -1,330 +0,0 @@
Use Kubernetes for Build Resources
==================================
.. warning:: This is not authoritative documentation. These features
   are not currently available in Zuul. They may change significantly
   before final implementation, or may never be fully completed.

There has been a lot of interest in using containers for build
resources in Zuul. The use cases are varied, so we need to describe
in concrete terms what we aim to support. Zuul provides a number of
unique facilities to a CI/CD system which are well-explored in the
full-system context (i.e., VMs or bare metal) but it's less obvious
how to take advantage of these features in a container environment.
As we design support for containers it's also important that we
understand how things like speculative git repo states and job content
will work with containers.
In this document, we will consider two general approaches to using
containers as build resources:

* Containers that behave like a machine
* Native container workflow

Finally, there are multiple container environments. Kubernetes and
OpenShift (an open source distribution of Kubernetes) are popular
environments which provide significant infrastructure to help us more
easily integrate them with Zuul, so this document will focus on these.
We may be able to extend this to other environments in the future.
.. _container-machine:
Containers That Behave Like a Machine
-------------------------------------
In some cases users may want to run scripted job content in an
environment that is more lightweight than a VM. In this case, we're
expecting to get a container which behaves like a VM. The important
characteristic of this scenario is that the job is not designed
specifically as a container-specific workload (e.g., it might simply
be a code style check). It could just as easily run on a VM as a
container.
To achieve this, we should expect that a job defined in terms of
simple commands should work. For example, a job which runs a playbook
with::

   hosts: all
   tasks:
     - command: tox -e pep8

should work, given an appropriate base job which prepares a container.
A user might expect to request a container in the same way they
request any other node::

   nodeset:
     nodes:
       - name: controller
         label: python-container

To provide this support, Nodepool would need to create the requested
container. Nodepool can use either the kubernetes python client or
the `openshift python client`_ to do this. The openshift client is
downstream of the native `kubernetes python client`_, so they are
compatible, but using the openshift client offers a superset of
functionality, which may come in handy later. Since the Ansible
k8s_raw module uses the openshift client for this reason, we may want
to as well.
In kubernetes, a group of related containers forms a pod, which is the
API-level object used in order to cause a container or containers to
be created. Even if a single container is desired, a pod with a
single container is created. Therefore, for this case, Nodepool will
need to create a pod with a container. Some aspects of the
container's configuration (such as the image which is used) will need
to be defined in the Nodepool configuration for the label. These
configuration values should be supplied in a manner typical of
Nodepool's configuration format. Note that very little customization
is expected here -- more complex topologies should not be supported by
this mechanism and should be left instead for :ref:`container-native`.
Containers in Kubernetes always run a single command, and when that
command is finished, the container terminates. Nodepool doesn't have
the context to run a command at this point, so instead, it can create
a container running a command that can simply run forever, for
example, ``/bin/sh``.
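
As a rough sketch, the Nodepool configuration for such a label might look
something like the following (the exact syntax is an assumption and would be
settled during implementation; the provider and image names are placeholders):

.. code-block:: yaml

   labels:
     - name: python-container

   providers:
     - name: example-kubernetes
       driver: kubernetes
       context: example-cluster
       pools:
         - name: main
           labels:
             - name: python-container
               type: pod
               image: docker.io/library/python:slim
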
A "container behaving like a machine" may be accessible via SSH, or it
may not. It's generally not difficult to run an SSHD in a container,
however, in order for that to be useful, it still needs to be
accessible over the network from the Zuul executor. This requires
that a service be configured for the container along with ingress
access to the cluster. This is an additional complication that some
users may not want to undertake, especially if the goal is to run a
relatively simple job. On the other hand, some environments may be
more naturally suited to using an SSH connection.
We can address both cases, but they will be handled a bit differently.
First, the case where the container does run SSHD:
Nodepool would need to create a Kubernetes service for the container.
If Nodepool and the Zuul executor are running in the same Kubernetes
cluster, the container will be accessible to them, so Nodepool can
return this information to Zuul and the service address can be added
to the Ansible inventory with the SSH connection plugin as normal. If
the kubernetes cluster is external to Nodepool and Zuul, Nodepool will
also need to establish an ingress resource in order to make it
externally accessible. Both of these will require additional Nodepool
configuration and code to implement. Due to the additional
complexity, these should be implemented as follow-on changes after the
simpler case where SSHD is not running in the container.
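
As an illustration, a minimal Kubernetes Service exposing SSH on such a pod
might look like this (all names are illustrative):

.. code-block:: yaml

   apiVersion: v1
   kind: Service
   metadata:
     name: zuul-node-ssh
   spec:
     selector:
       zuul-node: controller
     ports:
       - protocol: TCP
         port: 22
         targetPort: 22
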
In the case where the container does not run SSHD, and we interact
with it via native commands, Nodepool will create a service account in
Kubernetes, and inform Zuul that the appropriate connection plugin for
Ansible is the `kubectl connection plugin`_, along with the service
account credentials, and Zuul will add it to the inventory with that
configuration. It will then be able to run additional commands in the
container -- the commands which comprise the actual job.
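
A sketch of the resulting inventory entry, assuming variable names along the
lines of those documented for the kubectl connection plugin (the namespace and
pod names are placeholders):

.. code-block:: yaml

   all:
     hosts:
       controller:
         ansible_connection: kubectl
         ansible_kubectl_namespace: zuul-build-0123
         ansible_kubectl_pod: controller
         ansible_kubectl_container: worker
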
Strictly speaking, this is all that is required for basic support in
Zuul, but as discussed in the introduction, we need to understand how
to build a complete solution including dealing with the speculative
git repo state.
A base job can be constructed to update the git repos inside the
container, and retrieve any artifacts produced. We should be able to
have the same base job detect whether there are containers in the
inventory and alter behavior as needed to accommodate them. For a
discussion of how the git repo state can be synchronized, see
:ref:`git-repo-sync`.
If we want streaming output from a kubectl command, we may need to
create a local fork of the kubectl connection plugin in order to
connect it to the log streamer (much in the way we do for the command
module).
Not all jobs will be expected to work in containers. Some frequently
used Ansible modules will not behave as expected when run with the
kubectl connection plugin. The ``synchronize`` module, in particular, may be
problematic (though since synchronize is supported with the docker connection
plugin, that may be possible to overcome). We will want to think about
ways to keep the base job(s) as flexible as possible so they work with
multiple connection types, but there may be limits. Containers which
run SSHD should not have these problems.
.. _kubectl connection plugin: https://docs.ansible.com/ansible/2.5/plugins/connection/kubectl.html
.. _openshift python client: https://pypi.org/project/openshift/
.. _kubernetes python client: https://pypi.org/project/kubernetes/
.. _container-native:
Native Container Workflow
-------------------------
A workflow that is designed from the start for containers may behave
very differently. In particular, it's likely to be heavily image
based, and may have any number of containers which may be created and
destroyed in the process of executing the job.
It may use the `k8s_raw Ansible module`_ to interact directly with
Kubernetes, creating and destroying pods for the job in much the same
way that an existing job may use Ansible to orchestrate actions on a
worker node.
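
For example, a task along the following lines (a sketch only; the pod and image
names are placeholders) could create a pod directly from the job's playbook:

.. code-block:: yaml

   - name: Create a pod for this job
     k8s_raw:
       state: present
       resource_definition:
         apiVersion: v1
         kind: Pod
         metadata:
           name: test-pod
           namespace: zuul-build-0123
         spec:
           containers:
             - name: worker
               image: registry.example.org/myapp:change-0123
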
All of this means that we should not expect Nodepool to provide a
running container -- the job itself will create containers as needed.
It also means that we need to think about how a job will use the
speculative git repos. It's very likely to need to build custom
images using those repos which are then used to launch containers.
Let's consider a job which begins by building container images from
the speculative git source, then launches containers from those images
and exercises them.
.. note:: It's also worth considering a complete job graph where a
   dedicated job builds images and subsequent jobs use them. We'll
   deal with that situation in :ref:`buildset`.

Within a single job, we could build images by requesting either a full
machine or a :ref:`container-machine` from Nodepool and running the
image build on that machine. Or we could use the `k8s_raw Ansible
module`_ to create that container from within the job. We would use the
:ref:`git-repo-sync` process to get the appropriate source code onto
the builder. Regardless, once the image builds are complete, we can
then use the result in the remainder of the job.
In order to use an image (regardless of how it's created) Kubernetes
is going to expect to be able to find the image in a repository it
knows about. Putting images created based on speculative future git
repo states into a public image repository may be confusing, and
require extra work to clean those up. Therefore, the best approach
may be to work with private, per-build image repositories.
One way to accomplish this is to have the job run an image
repository after it completes the image builds, then upload those
builds to the repository. The only thing Nodepool needs to provide in
this situation is a Kubernetes namespace for the job. The job itself
can perform the image build, create a service account token for the
image repository, run the image repository, and upload the image. Of
course, it will be useful to create reusable roles and jobs in
zuul-jobs to implement this universally.
OpenShift provides some features that make this easier, so an
OpenShift-specific driver could additionally do the following and
reduce the complexity in the job:
We can ask Nodepool to create an `OpenShift project`_ for the use of
the job. That will create a private image repository for the project.
Service accounts in the project are automatically created with
``imagePullSecrets`` configured to use the private image repository [#f1]_.
We can have Zuul use one of the default service accounts, or have
Nodepool create a new one specifically for Zuul, and then when using
the `k8s_raw Ansible module`_, the image registry will automatically be
used.
While we may consider expanding the Nodepool API and configuration
language to more explicitly support other types of resources in the
future, for now, the concept of labels is sufficiently generic to
support the use cases outlined here. A label might correspond to a
virtual machine, physical machine, container, namespace, or OpenShift
project. In all cases, Zuul requests one of these things from
Nodepool by using a label.
.. _OpenShift Project: https://docs.openshift.org/latest/dev_guide/projects.html
.. [#f1] https://docs.openshift.org/latest/dev_guide/managing_images.html#using-image-pull-secrets
.. _k8s_raw Ansible module: http://docs.ansible.com/ansible/2.5/modules/k8s_raw_module.html
.. _git-repo-sync:
Synchronizing Git Repos
-----------------------
Our existing method of synchronizing git repositories onto a worker
node relies on SSH. It's possible to run an SSH daemon in a container
(or pod), but if it's otherwise not needed, it may be considered too
cumbersome. In particular, it may mean establishing a service entry
in kubernetes and an ingress route so that the executor can reach the
SSH server. However, it's always possible to run commands in a
container using kubectl with direct stdin/stdout connections without
any of the service/ingress complications. It should be possible to
adapt our process to use this.
Our current process will use a git cache if present on the worker
image. This is optional -- a Zuul user does not need a specially
prepared image, but if one is present, it can speed up operation. In
a container environment, we could have Nodepool build container images
with a git repo cache, but in the world of containers, there are
universally accessible image stores, and considerable tooling around
building custom images already. So for now, we won't have Nodepool
build container images itself, but rather expect that a publicly
accessible base image will be used, or an administrator will create
and make an image available to Kubernetes if a custom image is needed
in their environment. If we find that we also want to support
container image builds in Nodepool in the future, we can add support
for that later.
The first step in the process is to create a new pod from a base
image and ensure it has ``git`` installed. If the pod is going to
be used to run a single command (i.e., :ref:`container-machine`, or
will only be used to build images), then a single container is
sufficient. However, if the pod will support multiple containers,
each needing access to the git cache, then we can use the `sidecar
pattern`_ to update the git repo once. In that case, in the pod
definition, we should specify an `emptyDir volume`_ where the final
git repos will be placed, and other containers in the pod can mount
the same volume.
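
A sketch of such a pod definition (image names are placeholders; the pod and
sidecar names match the example script below):

.. code-block:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: zuultest
   spec:
     containers:
       - name: sidecar
         image: example.org/git-sidecar
         command: ['/bin/sh', '-c', 'sleep infinity']
         volumeMounts:
           - name: zuul-src
             mountPath: /zuul
       - name: worker
         image: example.org/worker
         volumeMounts:
           - name: zuul-src
             mountPath: /zuul
     volumes:
       - name: zuul-src
         emptyDir: {}
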
Run commands in the container to clone the git repos to the
destination path.
Run commands in the container to push the updated git commits. In
place of the normal ``git push`` command which relies on SSH, use a
custom SSH command which uses kubectl to set up the remote end of the
connection.
Here is an example custom ssh script:

.. code-block:: bash

   #!/bin/bash
   /usr/bin/kubectl exec zuultest -c sidecar -i /usr/bin/git-receive-pack /zuul/glance

Here is an example use of that script to push to a remote branch:

.. code-block:: console

   [root@kube-1 glance]# GIT_SSH="/root/gitssh.sh" git push kube HEAD:testbranch
   Counting objects: 3, done.
   Delta compression using up to 4 threads.
   Compressing objects: 100% (3/3), done.
   Writing objects: 100% (3/3), 281 bytes | 281.00 KiB/s, done.
   Total 3 (delta 2), reused 0 (delta 0)
   To git+ssh://kube/
    * [new branch]      HEAD -> testbranch

.. _sidecar pattern: https://docs.microsoft.com/en-us/azure/architecture/patterns/sidecar
.. _emptyDir volume: https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
.. _buildset:
BuildSet Resources
------------------
It may be very desirable to construct a job graph which builds
container images once at the top, and then supports multiple jobs
which deploy and exercise those images. The use of a private image
registry is particularly suited to this.
On the other hand, folks may want jobs in a buildset to be isolated
from each other, so we may not want to simply assume that all jobs in
a buildset are related.
An approach which is intuitive and doesn't preclude either option is
to allow the user to tell Zuul that the resources used by a job (e.g.,
the Kubernetes namespace, and any containers or other nodes) should
continue running until the end of the buildset. These resources would
then be placed in the inventory of child jobs for their use. In this
way, the job we constructed earlier which built an image and uploaded
it into a registry that it hosted could then be the root of a tree of
child jobs which use that registry. If the image-build-registry job
created a service token, that could be passed to the child jobs for
their use when they start their own containers or pods.
In order to support this, we may need to implement provider affinity
for builds in a buildset in Nodepool so that we don't have to deal
with ingress access to the registry (which may not be possible).
Otherwise if a Nodepool had access to two Kubernetes clusters, we
might assign a child job to a different cluster.


@@ -3,7 +3,10 @@ Specifications
 This section contains specifications for future Zuul development. As
 we work on implementing significant changes, these document our plans
-for those changes and help us work on them collaboratively.
+for those changes and help us work on them collaboratively. Once a
+specification is implemented, it should be removed. All relevant
+details for implemented work must be reflected correctly in Zuul's
+documentation instead.
 
 .. warning:: These are not authoritative documentation. These
    features are not currently available in Zuul. They may change
@@ -13,9 +16,6 @@ for those changes and help us work on them collaboratively.
 .. toctree::
    :maxdepth: 1
 
-   container-build-resources
-   multiple-ansible-versions
-   logs
    tenant-scoped-admin-web-API
    kubernetes-operator
    circular-dependencies


@@ -1,87 +0,0 @@
Web Dashboard Log Handling
==========================
.. warning:: This is not authoritative documentation. These features
   are not currently available in Zuul. They may change significantly
   before final implementation, or may never be fully completed.

The OpenStack project infrastructure developed some useful log hosting
tools and practices, but they are neither scalable nor generally
applicable to other projects. This spec describes some of those
features and how we can incorporate them into Zuul in a more widely
applicable way.
In general, OpenStack uses a static log server and several features of
Apache, including a python WSGI app, to mediate access to the logs.
This provides directory browsing, log severity filters and highlights,
deep linking to log lines, and dynamic rendering of ARA from sqlite
databases.
We can expand the role of the Zuul dashboard to take on some of those
duties, thereby reducing the storage requirements, and offloading the
computation to the browser of the user requesting it. In the process,
we can make a better experience for users navigating log files, all
without compromising Zuul's independence from backend storage.
All of what follows should apply equally to file or swift-based
storage.
Much of this functionality centers on enhancements to the existing
per-build page in the web dashboard.
If we have the uploader also create and upload a file manifest (say,
zuul-manifest.json), then the build page can fetch [#f1]_ this file
from the log storage and display an index of artifacts for the build.
This can allow for access to artifacts directly within the Zuul web
user interface, without the discontinuity of hitting an artifact
server for the index. Since swift (and other object storage systems)
may not be capable of dynamically creating indexes, this allows the
functionality to work in all cases and removes the need to
pre-generate index pages when using swift.
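
The exact schema would be determined during implementation; as a rough sketch,
such a manifest might contain something like the following (rendered as YAML
here for readability, though the actual file would be JSON):

.. code-block:: yaml

   tree:
     - name: job-output.txt
       mimetype: text/plain
     - name: logs
       mimetype: application/directory
       children:
         - name: controller
           mimetype: application/directory
           children:
             - name: syslog.txt
               mimetype: text/plain
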
We have extended Zuul to store additional artifact information about a
build. For example, in our docs build jobs, we override the
success-url to link to the generated doc content. If we return the
preview location as additional information about the build, we can
highlight that in a prominent place on the build page. In the future,
we may want to leave the success-url alone so that users visit the
build page and then follow a link to the preview. It would be an
extra click compared to what we do today, but it may be a better
experience in the long run. Either way would still be an option, of
course, according to operator preference.
Another important feature we currently have in OpenStack is a piece of
middleware we call "os-loganalyze" (or OSLA) which HTMLifies text
logs. It dynamically parses log files and creates anchors for each
line (for easy hyperlink references) and also highlights and allows
filtering by severity.
We could implement this in javascript, so that when a user browses
from the build page to a text file, the javascript component could
fetch [#f1]_ the file and perform the HTMLification as OSLA does
today. If it can't parse the file, it will just pass it through
unchanged. And in the virtual file listing on the build page, we can
add a "raw" link for direct download of the file without going through
the HTMLification. This would eliminate the need for us to pre-render
content when we upload to swift, and by implementing this in a generic
manner in javascript in Zuul, more non-OpenStack Zuul users would
benefit from this functionality.
Any unknown files, or existing .html files, should just be direct links
which leave the Zuul dashboard interface (we don't want to attempt to
proxy or embed everything).
Finally, even today Zuul creates a job-output.json, with the idea that
we would eventually use it as a substitute for job-output.txt when
reviewing logs. The build page would be a natural launching point for a
page which read this json file and created a more interactive and
detailed version of the streaming job output.
In summary, by moving some of the OpenStack-specific log processing into
Zuul-dashboard javascript, we can make it more generic and widely
applicable, and at the same time provide better support for multiple log
storage systems.
.. [#f1] Swift does support specifying CORS headers. Users would need
to do so to support this, and users of static file servers would need
to do something similar.


@@ -1,157 +0,0 @@
Multiple Ansible versions
=========================
.. warning:: This is not authoritative documentation. These features
   are not currently available in Zuul. They may change significantly
   before final implementation, or may never be fully completed.

Currently zuul only supports one specific ansible version at a time. This
complicates upgrading ansible, because ansible often breaks backwards
compatibility, so the upgrade needs to be synchronized across the complete
deployment, which is often not possible.
Instead we want to support multiple ansible versions at once so we can handle
the lifecycle of ansible versions by adding new versions and deprecating old
ones.
Requirements
------------
We want jobs to be able to pick a specific ansible version to run. However, as
zuul overrides many parts of ansible, we will let a job select only a minor
version (e.g. 2.6) from a list of supported versions. This is also
necessary from a security point of view, so that a user cannot pick a specific
bugfix version that is known to contain certain security flaws. Zuul needs to
support a list of specific minor versions, from which it will pick the latest
bugfix version, or a pinned version if we need to.
We need virtual environments for ansible, and these are not relocatable.
Because of this, we must install ansible after zuul itself has been installed.
Further, we need to support sdist based installations as well as wheel based
installations; however, wheel based installations don't offer any possibility
for post-install hooks. Thus, to stay consistent, we will require manually
running a post installation script to install the ansible versions. This script
will work standalone as well as within the executor process context, which will
make it possible to let the executor optionally install ansible at startup time
if there is interest. This can be useful for the quick start, but the
recommended way will be the manual run.
We also need a configuration that specifies additional packages that need to
be installed along with ansible. This is required because different deployers
use different ansible connections or modules that have additional optional
dependencies (e.g. winrm).
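
As a purely hypothetical sketch of such a configuration (the option name and
format here are assumptions, not part of this proposal's defined interface):

.. code-block:: yaml

   ansible-extra-packages:
     - pywinrm            # needed for the winrm connection plugin
     - requests-kerberos  # kerberos authentication support
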
Job configuration
-----------------
There will be a new job attribute ``ansible-version`` that will instruct zuul
to take the specified version. This attribute will be validated against a list
of supported ansible versions that zuul currently can handle. Zuul will throw
a configuration error if a job selects an unsupported or unparsable version.
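
A minimal sketch of how a job might select a version (the job name and
playbook path here are purely illustrative):

.. code-block:: yaml

   - job:
       name: example-unit-tests
       ansible-version: '2.6'
       run: playbooks/unit-tests.yaml
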
If no ``ansible-version`` is defined, zuul will pick up the ansible version
marked as default. We will also need a mechanism to deprecate ansible versions
in preparation for removing old ones; we could add labels to the supported
versions to express that. The supported versions we will start with will be:

* 2.5 (deprecated)
* 2.6
* 2.7 (default)

The default ansible version will be configurable at the global and tenant level.
This way, updating the default to newer versions can be made much less
disruptive. Being able to specify the default version at the tenant level makes
it easy to canary-release ansible updates in a larger multi-tenant environment.
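
As a sketch only (the attribute name here is hypothetical and would be settled
during implementation), a tenant-level default might be expressed as:

.. code-block:: yaml

   - tenant:
       name: example-tenant
       default-ansible-version: '2.7'
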
We will also need to be able to pin a version to a specific bugfix version in
case the latest one is known to be broken. This will also be handled by the
installation mechanisms described below.
Installing ansible
------------------
We currently pull in ansible via the ``requirements.txt``. This will no longer
be sufficient. Instead zuul itself needs to take care of installing the
supported versions into a pre-defined directory structure using the venv module.
The executor will have two new config options:

* ``ansible-root``: The root path where ansible installations will be found.
  The default will be ``<exec-root>/lib/zuul/executor-ansible``. All supported
  ansible installations will live inside a venv in the path
  ``ansible-root/<minor-version>``.

* ``manage-ansible``: A boolean flag that tells the executor to manage the
  installed ansible versions itself. The default will be ``true``.

  If set to ``true`` the executor will install and upgrade all supported
  ansible versions on startup.

  If set to ``false`` the executor will validate the presence of all supported
  ansible versions on startup and throw an error on missing installations.

The management of the ansible installations will be performed by a script
installed alongside the zuul binaries. It will install and update every
supported version of ansible into a specified ``ansible-root`` directory. This
script can then be used by the executor or externally, depending on the
configuration.
Dockerized deployment
---------------------
In a dockerized deployment it is preferable to pre-install ansible during the
image build. This can be done by calling the installation script during the
image build. The official images will contain all supported versions.
Ansible module overrides
------------------------
We currently have many ansible module overrides. These may or may not be
targeted to a specific ansible version. Currently they are organized into the
folder ``zuul/ansible``. In order to support multiple ansible versions without
needing to fork everything there, this will be reorganized into:

* ``zuul/ansible/generic``: Overrides and modules valid for all supported
  ansible versions.
* ``zuul/ansible/<version>``: Overrides and modules valid for a specific
  version.

If there are overrides that are valid for a range of ansible versions, we can
define them in the lowest version and symlink them into the other versions, in
order to minimize additional maintenance overhead by not forking an override
where possible. Generally we should strive to keep as much as possible in the
generic part to minimize the maintenance effort.
Deprecation policy
------------------
We should handle deprecating and removing supported ansible versions similar to
the deprecation policy described in zuul-jobs:
https://zuul-ci.org/docs/zuul-jobs/policy.html
Further, whenever we release a new version of zuul, we should make sure that
the list of supported ansible versions is a subset of what is supported by
ansible at that time. The list of supported ansible
versions can be found here:
https://docs.ansible.com/ansible/latest/reference_appendices/release_and_maintenance.html#release-status
We also should notify users when they use deprecated ansible versions. This
can be done in two ways. First, the executor will emit a warning to the logs
when it encounters a job that uses a deprecated ansible version. Second, the
executor can already return warnings together with the build result; these will
be added directly to the report to the code review system. This can be used to
warn about deprecated ansible versions in a prominent location instead of
burying them somewhere in megabytes of logs.

Testing
-------
We also have a set of tests that validate the security overrides. We need to
test them for all supported ansible versions. Where needed we also need to fork
or add additional version-specific tests.