Python Build/Install Process Simplification
###########################################
:date: 2017-09-06 13:00
:tags: python, build, source, repo
The current python wheel/venv build process is not easily understood, and the
install process has become complicated. This blueprint aims to work towards
making it simpler to deploy, simpler to understand and to make many of the
current features which are forced on all deployers to be opt-in.
Launchpad Blueprint:
https://blueprints.launchpad.net/openstack-ansible/+spec/python-build-install-simplification
Problem description
===================
Building
~~~~~~~~
The Python repository used for OpenStack-Ansible deployments prepares
`Python wheels`_ for any git- or pypi-sourced packages for an environment.
Using wheels speeds up the installation of a package and removes the need
to install the distribution packages required to compile it at install
time.
The repository preparation process also prepares `Python virtualenvs`_
for all OSA roles with the prefix ``os_`` (which are expected to be
OpenStack services) in order to speed up the deployment of the services
by downloading a complete virtualenv instead of installing the packages
individually for every host that needs the service.
The ``py_pkgs`` lookup pulls together the information used by the build
process. It is a black box in terms of what it does, making decisions
about the information it reads and outputs which are not
documented anywhere other than in the code itself. The code is not easily
modified without breaking the process and is therefore most often left
alone and not well maintained, resulting in an increasing amount of technical
debt. The subsequent jinja in the repo-build role which processes it is
tough to work through and not easily maintained. While both of these could
be adjusted to make use of different plugins or filters, it would remain
a set of black boxes which are complex to untangle.
The way that git repositories are specified and parameters are provided
to the build process does not scale very well. Each git repo requires at
least two flat variables to be set (``git_repo`` and ``git_install_branch``)
and can optionally have more set. This model of setting variables makes it
really easy to override individual settings, but requires the use of a
pattern match mechanism to discover all the settings (which is why we use
the lookup to do it). The settings are also spread across disparate places
(``defaults/repo_packages``, each role's ``defaults``), making them hard to
find. It is not obvious to most newcomers how to change them, nor to many
veterans what some of the settings mean. It often requires a lot of code
walking to understand the meaning of some settings like ``venvwithindex``
and ``ignorerequirements``.
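As a rough illustration of the current pattern (the service name and values
below are placeholders rather than canonical defaults), each git source is
described by a handful of flat variables:

.. code-block:: yaml

   # Minimal per-source settings (two flat variables per git repository).
   glance_git_repo: https://git.openstack.org/openstack/glance
   glance_git_install_branch: master

   # Optional extras whose meaning is only documented in the build code.
   glance_git_project_group: glance_all
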
The git clone process used to fetch the git sources for building is done
asynchronously in order to improve the time to completion; however,
individual asynchronous tasks cannot be retried in Ansible, and the
git clones commonly fail. This is an Ansible limitation which we could work
around by implementing our own action module, but this would increase the
technical debt as we would have to constantly keep the module code updated
as we update to later versions of Ansible.
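A minimal sketch of the asynchronous clone pattern follows; the variable and
path names are illustrative only. Note that the retry loop on
``async_status`` merely waits for completion, it cannot re-run a clone which
has already failed:

.. code-block:: yaml

   - name: Clone the git sources asynchronously
     git:
       repo: "{{ item.repo }}"
       dest: "/var/www/repo/openstackgit/{{ item.name }}"
       version: "{{ item.version }}"
     async: 600
     poll: 0
     register: _git_clone
     with_items: "{{ repo_sources }}"

   - name: Wait for the clones to complete
     async_status:
       jid: "{{ item.ansible_job_id }}"
     register: _git_clone_result
     until: _git_clone_result.finished
     retries: 60
     delay: 10
     with_items: "{{ _git_clone.results }}"
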
When building wheels, pip has no way of resolving all dependencies up-front.
The only capability it has is to resolve the requirements for the current
package. It then processes each package requirement in turn. To do so requires
downloading the package and unpacking it to read the requirements. This is a
sequential process and therefore takes a long time when processing packages
with a lot of requirements as is typical for OpenStack projects.
In Kilo the OpenStack requirements management process did not have the jobs
which tested the co-installability of all OpenStack packages and produced the
``upper-constraints.txt`` file as a manifest of which package versions worked
together. We therefore needed to do our own processing of all python packages
which would be installed by the roles and had to compile a set of requirements
and constraints across them all for the purpose of building the wheels, and
ensuring that the installed set was consistent for a build. When the
OpenStack requirements repository started publishing the upper-constraints
file we adopted it immediately to help keep builds more consistent. However,
we still produce our own ``requirements_absolute_requirements.txt`` file
which is used for all pip install tasks in order to ensure consistency and
to ensure that the packages we built from git are used (rather than having
the role install from the git source, it installs the wheel held on the repo
server). However, this is no longer practical as different services have
requirements and needs which are not resolvable down to a common set; we need
to be able to allow the installation of any version of a package and only
apply constraints when needed.
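A hedged sketch of the install pattern this produces today (the repo URL,
path and variable names are assumptions used purely for illustration):

.. code-block:: yaml

   - name: Install the service packages from the repo server
     pip:
       name: "{{ glance_pip_packages }}"
       virtualenv: /openstack/venvs/glance
       extra_args: >-
         --constraint http://repo.example.lan:8181/constraints/requirements_absolute_requirements.txt
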
Some of the venvs we build do not adhere to the OpenStack requirements process
and therefore sometimes cannot be built using the upper constraints file.
There has also been some interest in being able to do mixed series deployments
instead of homogeneous deployments. This would involve preparing a venv
containing packages from a different series with a different set of
constraints. Currently the constraints used in the repo build process are
global - we only have the ability to enable/disable their use when building
venvs. It would be better to be able to specify a global fallback for
constraints, but to allow per venv constraints too.
The use of Python 2.7 for OpenStack and Ansible is waning and the need to
shift everything to use Python 3.5 has arisen as a new requirement. The
tooling will need to be shifted to implement the venvs using Python 3.5
where applicable, but may still need to prepare venvs using Python 2.7 if
a service does not yet support running in a Python 3.5 environment.
In Newton we introduced multi-architecture builds to cater for multiple
architectures, then also had to split out multi-distro builds because the
wheel/venv references to C libraries differ between distros, the available
libraries being different on each. Currently this is working, but it makes
the repo build process much more complex and much slower. The
process to synchronise the per-distro and per-architecture built artifacts is
error prone and confuses many newcomers to the project.
In order to facilitate using the repo-server to respond to pip index queries,
multiple directories and symlinks have been used to prepare the appropriate
structure so that the correct responses are given back to pip. The process of
setting up all the symlinks is very time consuming and can in some places
leave dead links, especially when rebuilding for a specific release tag.
Storing
~~~~~~~
Once the wheels, venvs and other artifacts are built for an environment they
are stored and synchronised between the repo containers using a combination of
rsync and lsyncd. While this sync process is generally OK, it is commonly a
cause for confusion and requires a complex troubleshooting process to figure
out why packages are not present.
Installing
~~~~~~~~~~
The consumption of the prepared wheels and virtualenvs has changed over
time. With the introduction of ``developer_mode`` into the roles there
is a lot of code and functionality duplication between the repo build
process and the role installation process.
The need to cater for the optional inclusion of a variety of plugins/drivers
in the venvs either through the use of additional Python packages or by
symlinking system packages into the venv (when the package is proprietary or
unavailable via git or pypi) causes further complexity in the process.
When executing a pip installation, pip always looks for packages in the
following order: local cache, local folder, default index, extra indexes.
Pip will always check all locations before deciding which to use for the
installation. This means that if there are multiple indexes used, it queries
them all. This can be slow if any of those are not local to the environment.
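For example (the index URLs below are placeholders), a configuration such as
the following causes pip to consult both indexes for every requirement before
choosing a file to install:

.. code-block:: yaml

   - name: Configure pip with a local index plus an extra upstream index
     copy:
       dest: /root/.pip/pip.conf
       content: |
         [global]
         index-url = http://repo.example.lan:8181/simple
         extra-index-url = https://pypi.org/simple
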
.. _Python wheels: http://pythonwheels.com
.. _Python virtualenvs: https://virtualenv.pypa.io
Proposed change
===============
* Change the repo build process to, by default, only build wheels for git
sources it is given without also building the dependencies. The ability
to build all wheels will still be there, but will not be the default
behaviour. This will cut down the time taken in this process when in CI,
development environments or small online environments where it is not
necessary to build/store all the wheels. The full build will only be
necessary for offline deployments and for environments where the deployer
specifically opts in to ensuring that everything is built.
* Replace the current storage structure for wheels with a flat directory.
This directory will be served via the pypi API provided by the very simple
`pypiserver`_ application. If we need to continue to provide per-distro
or per-architecture wheels then we could implement distro/arch indexes
which are supplied by individual folders. However, it is unlikely that
this will be necessary.
* Use nginx as a reverse proxy which responds to requests from pip by first
trying against the local pypiserver, then against tarballs.openstack.org
and then against pypi. This will allow nginx to cache all downloaded
packages (speeding up subsequent requests) without the repo server having
to build them.
* Implement changes to the roles to allow service-specific constraints to
be applied when building venvs. This allows a CI process to build service
venvs and to publish the list of tested versions for that service. Then
for production builds the published list can be used as a constraint for
the venv to ensure that production builds use the same versions. This
solves a problem we have today where some projects (eg: tempest, rally,
gnocchi) have to be built unconstrained as they do not conform to the
global requirements process.
* Implement changes to each role to handle the wheel building and venv
building, but do it in such a way that only the build can be executed
by using tags, setting a specific flag, or using ``include_role`` with
``tasks_from`` (see the sketch after this list). The specific dependencies
can then be itemised in the role and the role can be used for artifact
preparation.
* Remove all pip install activities from hosts, replacing them with the
use of distro packages exclusively for any python requirements on the
hosts. We should install as few python packages on the host as possible
and focus all efforts on putting everything we need (including the Ansible
requirements for targeted hosts) into venvs.
All Ansible tasks should then specifically use the appropriate venv
when executing tasks, avoiding the use of any python libraries on the
host. This prevents system package conflicts and will reduce the host
package installation requirements.
* Implement a playbook which is optionally used to prepare pre-built venvs
for an environment as they are today. If a deployer wishes to prepare
the venvs in a build process, the playbook should be exercised in the
build process and should be executed on a designated 'build host' which
will make use of ephemeral containers and/or virtual machines on the build
host to exercise the builds for the necessary distribution and architecture
combinations.
* Remove the complex git caching/staging process which exists today and
make the use of the repo server for git caching for the services that
require it (eg: nova-console uses novnc/spice from git) entirely optional.
* Implement a playbook which can be used to stage offline installs by
downloading all built artifacts (perhaps completed by a CI job) to the
deployment host, then distributing them appropriately.
* Simplify the constraints management by applying ``--constraint`` options
in the following order (see the sketch after this list):
1. ``--constraint user-specified-constraints.txt``
2. ``--constraint openstack-ansible-pins.txt``
3. ``--constraint openstack-upper-constraints.txt``
This would replace the current method which merges the various constraints
into one file, requiring a fair amount of Jinja magic because a single
constraints file cannot contain two different pins for the same package and
still resolve successfully into the single result our current mechanism
needs.
* Implement changes to roles to ensure that the build tasks, and the
packages only required when building (dev headers, etc.), are only
used when a build is being executed. The build packages and the runtime
packages will be split into separate lists so that the runtime environment
only installs the packages it needs.
* Ensure that 'optional' pip packages are installed into the venv during the
build stage, rather than during the install stage.
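As referenced above, a minimal sketch of invoking only a role's build tasks
via ``include_role`` with ``tasks_from``; the task file name is an assumption
about how the roles might be arranged, not a settled interface:

.. code-block:: yaml

   # The tasks_from file name below is illustrative only.
   - name: Build the glance artifacts without deploying the service
     hosts: build_hosts
     tasks:
       - include_role:
           name: os_glance
           tasks_from: glance_venv_build.yml
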
.. _pypiserver: https://pypiserver.readthedocs.io
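As referenced in the constraints item above, a sketch of applying the
layered constraints directly to a pip installation; the file locations and
variable names are placeholders:

.. code-block:: yaml

   - name: Build a service venv with layered constraints
     pip:
       name: "{{ service_pip_packages }}"
       virtualenv: /openstack/venvs/example-service
       extra_args: >-
         --constraint /etc/openstack_deploy/user-specified-constraints.txt
         --constraint /opt/openstack-ansible-pins.txt
         --constraint /opt/openstack-upper-constraints.txt
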
Alternatives
------------
* The build process can remain as-is, continuing to confuse deployers and
remaining difficult to maintain.
* The build process can be changed to only build and store wheels for packages
which are pip installed onto the hosts, and only to build and store the
venvs for distribution.
Playbook/Role impact
--------------------
Playbooks will be added to cater for the build process and the staging
process. The roles will be adjusted to properly separate out the build
tasks and the distro packages to install for the build (versus those
required when using pre-built wheels).
Upgrade impact
--------------
Care will be taken to ensure that upgrades happen as they do today.
Security impact
---------------
The security posture should be improved by the reduction of packages installed
onto hosts and containers when a full set of artifacts is built.
Performance impact
------------------
The performance of the deployment should be improved due to the reduction in
time taken to deploy with pre-built packages if a full set of artifacts is
built.
End user impact
---------------
There is no end-user impact for consumers of an OpenStack cloud, except
perhaps that upgrades will be quicker to execute, thus resulting in reduced
maintenance slot requirements.
Deployer impact
---------------
* As deployments and upgrades will be quicker to execute, deployers will be
able to execute them in shorter maintenance slots.
* Deployers will need to understand how better to utilise the CI process to
prepare the required artifacts to speed up deployments.
Developer impact
----------------
As the build process will be integrated into the roles, it will be easier to
understand how it works and what it does.
Dependencies
------------
This spec will be implemented in partnership with
https://blueprints.launchpad.net/openstack-ansible/+spec/deployment-stages
Implementation
==============
Assignee(s)
-----------
Primary assignee:
jesse-pretorius (odyssey4me)
Work items
----------
Each of the roles implemented in the default AIO will be worked through in
sequence to re-arrange and optimise based on this workflow. The work items
are not being detailed here but will be reflected in gerrit through the
blueprint's topic and will be visible in launchpad.
Testing
=======
As this process matures, it may be simpler to use the integrated build for
all role testing instead of having two separate test implementations. This
reduces technical debt for the project.
Documentation impact
====================
This work will need to include documentation updates which describe the new
way that deployments can be implemented using full artifact builds and
how to implement offline installs.
References
==========
* https://12factor.net/
* http://www.clearlytech.com/2014/01/04/12-factor-apps-plain-english/