Validations in TripleO Workflows

Running validations inside the existing workflows will make them more
readily available on the command line, let us use them in the CI, and
simplify the GUI code.

Change-Id: I4ce37e5e09cb37be0b83e7388102866e2ee8a8e1

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

================================
Validations in TripleO Workflows
================================
https://blueprints.launchpad.net/tripleo/+spec/validations-in-workflows

The Newton release introduced TripleO validations -- a set of
extendable checks that identify potential deployment issues early and
verify that the deployed OpenStack is set up properly. These
validations are run automatically by the TripleO UI, but there is no
support for them in the command line workflow and they are not
exercised by our CI jobs either.

Problem Description
===================

When enabled, TripleO UI runs the validations at the appropriate
phases of the planning and deployment process. This is done within
the TripleO UI codebase and is therefore not available to
python-tripleoclient or the CI.

The TripleO deployer can run the validations manually, but they need
to know at which point in the process to do so, and they have to call
Mistral directly.

This causes a disparity between the command line and GUI experience
and complicates the efforts to exercise the validations in the CI.

Proposed Change
===============

Overview
--------

Each validation already advertises where in the planning/deployment
process it should be run: this is recorded under its
``vars/metadata/groups`` section. In addition, the
``tripleo.validations.v1.run_groups`` Mistral workflow lets us run all
the validations belonging to a given group.
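
For illustration, here is roughly what that looks like in a validation
playbook. The validation itself is an invented example; the point is
the ``metadata`` structure that ``run_groups`` matches against:

.. code-block:: yaml

    - hosts: undercloud
      vars:
        metadata:
          # Invented example validation; only the structure matters.
          name: Undercloud disk space
          description: Verify the undercloud has enough free disk space.
          # run_groups selects validations by these group names.
          groups:
            - pre-introspection
      tasks:
        - name: Check the available disk space
          command: df -h /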

For each validation group (currently ``pre-introspection``,
``pre-deployment`` and ``post-deployment``) we will update the
appropriate workflow in tripleo-common to optionally call
``run_groups``.

Each of the workflows above will receive a new Mistral input called
``run_validations``. It will be a boolean value that indicates whether
the validations ought to be run as part of that workflow or not.
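
A minimal sketch of what the gating could look like in a deployment
workflow follows. The task layout, the ``group_names`` input and the
final ``deploy`` action are illustrative assumptions, not the exact
implementation:

.. code-block:: yaml

    version: '2.0'

    tripleo.deployment.v1.deploy_plan:
      input:
        - container
        # The new input; defaults to skipping the validations.
        - run_validations: false

      tasks:
        decide_on_validations:
          action: std.noop
          on-complete:
            # Branch on the new boolean input.
            - run_the_validations: <% $.run_validations %>
            - deploy: <% not $.run_validations %>

        run_the_validations:
          workflow: tripleo.validations.v1.run_groups
          input:
            group_names: ['pre-deployment']
          # A validation failure stops the workflow here, which
          # blocks the deployment.
          on-success: deploy

        deploy:
          action: tripleo.deployment.deploy container=<% $.container %>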

To expose this functionality to the command line user, we will add an
option for enabling/disabling validations to python-tripleoclient
(which will set the ``run_validations`` Mistral input) and a way to
display the results of each validation in the command's output.

When the validations are run, they will report their status to Zaqar
and any failures will block the deployment. The deployer can disable
the validations if they wish to proceed despite the failures.
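
The status reporting can reuse the ``zaqar.queue_post`` pattern the
other tripleo-common workflows already use for progress messages; a
rough sketch, with an invented message payload:

.. code-block:: yaml

    send_validations_failed:
      action: zaqar.queue_post
      input:
        queue_name: <% $.queue_name %>
        messages:
          body:
            # The payload shown here is invented for the example.
            type: tripleo.validations.v1.run_groups
            payload:
              status: FAILED
              message: 'One or more validations failed.'
              execution: <% execution() %>
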
One unresolved question is the post-deployment validations. The Heat
stack create/update Mistral action is currently asynchronous and we
have no way of calling actions after the deployment has finished.
Unless we change that, the post-deployment validations may have to be
run manually (or via python-tripleoclient).

Alternatives
------------

1. Document where and how to run each group and leave it at that.
   This risks that users already familiar with TripleO will miss the
   validations or won't bother running them.

   We would still need to find a way to run the validations in a CI
   job, though.

2. Provide subcommands in python-tripleoclient for running validations
   (and groups of validations) and rely on people running them
   manually. This is similar to 1., but provides an easier way of
   running a validation and getting its result.

   Note that this may be a useful addition even with the proposal
   outlined in this specification.

3. Do what the GUI does in python-tripleoclient, too. The client would
   know when to run which validation and would report the results
   back. The drawback is that we would need to implement and maintain
   the same set of rules in two different codebases, with no API for
   them -- which is exactly what the switch to Mistral is supposed to
   solve.

Security Impact
---------------

None

Other End User Impact
---------------------

We will need to modify python-tripleoclient to display the status of
the validations once they have finished. TripleO UI already does this.

The deployers may need to learn about the validations.

Performance Impact
------------------

Running a validation can take about a minute (this depends on the
nature of the validation: e.g. does it check a configuration file, or
does it need to log in to all the compute nodes?).

This may be a concern if we run multiple validations at the same
time.

We should be able to run a whole group in parallel. It's possible
we're already doing that, but this needs to be investigated.
Specifically, does ``with-items`` run the tasks in sequence or in
parallel?
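
For the record, the Mistral documentation describes ``with-items`` as
running its iterations in parallel, with an optional ``concurrency``
policy to cap the number of simultaneous runs. A sketch of how a group
could fan out over its validations (the workflow and input names here
are assumptions made for the example):

.. code-block:: yaml

    run_validation_group:
      # Each iteration starts a separate sub-workflow execution;
      # they run concurrently unless 'concurrency' limits them.
      with-items: validation in <% $.validations %>
      workflow: tripleo.validations.v1.run_validation
      input:
        validation_name: <% $.validation %>
      concurrency: 5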

There are also some options that would allow us to speed up the
running time of a validation itself, using common ways of speeding up
Ansible playbooks in general (see the sketch after this list):

* Disabling the default "setup" task for validations that don't need
  it (this task gathers hardware and system information about the
  target node, which takes some time)

* Using persistent SSH connections

* Making each validation task run independently (by default, Ansible
  runs a task on all the nodes, waits for its completion everywhere
  and only then moves on to the next task)

* Caching the inventory: each validation runs the
  ``tripleo-ansible-inventory`` script, which gathers information
  about the deployed servers and configuration from Mistral and Heat.
  Running this script can be slow. When we run multiple validations
  at the same time, we should generate the inventory only once and
  cache the result.
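
The first and third items map directly to playbook-level settings. A
sketch, assuming the validation does not need any gathered facts
(persistent SSH connections live in ``ansible.cfg`` rather than the
playbook, as noted in the comment):

.. code-block:: yaml

    - hosts: all
      # Skip the implicit "setup" task for validations that don't
      # need hardware or system facts.
      gather_facts: false
      # Let every host move through the tasks at its own pace instead
      # of waiting for all the other hosts after each task.
      strategy: free
      # Persistent SSH connections are an ansible.cfg setting, e.g.:
      #   ssh_args = -o ControlMaster=auto -o ControlPersist=60s
      tasks:
        - name: Example validation task
          command: /bin/true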

Since the validations are going to be optional, the deployer can
always choose not to run them. On the other hand, the time saved by
catching problems before a lengthy broken deployment should ideally
outweigh the slowdown the validations introduce.

We will also document the actual time difference. This information
should be readily available from our CI environments, but we should
also provide measurements on bare metal.

Other Deployer Impact
---------------------

Depending on whether the validations end up running by default, the
only impact should be an option that lets the deployer turn them on
or off.

Developer Impact
----------------

The TripleO developers may need to learn about the validations, where
to find them and how to change them.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  tsedovic

Other contributors:
  None

Work Items
----------

* Add the ``run_validations`` input and call ``run_groups`` from the
  deployment and node registration workflows
* Add an option to run the validations to python-tripleoclient
* Display the validation results in python-tripleoclient
* Add or update a CI job to run the validations
* Add a CI job to tripleo-validations

Dependencies
============

None

Testing
=======

This should make the validations testable in CI. Ideally, we would
verify the expected success/failure for the known validations given
the CI environment. But having them go through the testing machinery
would be a good first step to ensure we don't break anything.

Documentation Impact
====================

We will need to document the fact that we have validations, where
they live, and when and how they are run.

References
==========

* http://docs.openstack.org/developer/tripleo-common/readme.html#validations
* http://git.openstack.org/cgit/openstack/tripleo-validations/
* http://docs.openstack.org/developer/tripleo-validations/