Validations in TripleO Workflows

Running validations inside the existing workflows will make them more
readily available on the command line, let us use them in the CI, and
simplify the GUI code.

Change-Id: I4ce37e5e09cb37be0b83e7388102866e2ee8a8e1

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

================================
Validations in TripleO Workflows
================================
https://blueprints.launchpad.net/tripleo/+spec/validations-in-workflows

The Newton release introduced TripleO validations -- a set of
extendable checks that identify potential deployment issues early and
verify that the deployed OpenStack is set up properly. These
validations are run automatically by the TripleO UI, but there is no
support for them in the command line workflow and they are not
exercised by our CI jobs either.

Problem Description
===================

When enabled, TripleO UI runs the validations at the appropriate
phases of the planning and deployment process. This is done within
the TripleO UI codebase and is therefore not available to
python-tripleoclient or the CI.

The TripleO deployer can run the validations manually, but they need
to know at which point in the process to do so, and they have to call
Mistral directly.

This causes a disparity between the command line and GUI experience
and complicates the efforts to exercise the validations in the CI.

Proposed Change
===============

Overview
--------

Each validation already advertises where in the planning/deployment
process it should be run: this is recorded under its
``vars/metadata/groups`` section. In addition, the
``tripleo.validations.v1.run_groups`` Mistral workflow lets us run all
the validations belonging to a given group.
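
For illustration, here is roughly what that looks like in a validation
playbook. The validation itself is an invented example; the point is
the ``metadata`` structure that ``run_groups`` matches against:

.. code-block:: yaml

    - hosts: undercloud
      vars:
        metadata:
          # Invented example validation; only the structure matters.
          name: Undercloud disk space
          description: Verify the undercloud has enough free disk space.
          # run_groups selects validations by these group names.
          groups:
            - pre-introspection
      tasks:
        - name: Check the available disk space
          command: df -h /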

For each validation group (currently ``pre-introspection``,
``pre-deployment`` and ``post-deployment``) we will update the
appropriate workflow in tripleo-common to optionally call
``run_groups``.

Each of the workflows above will receive a new Mistral input called
``run_validations``. It will be a boolean value that indicates whether
the validations ought to be run as part of that workflow or not.
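
A minimal sketch of what the gating could look like in a deployment
workflow follows. The task layout, the ``group_names`` input and the
final ``deploy`` action are illustrative assumptions, not the exact
implementation:

.. code-block:: yaml

    version: '2.0'

    tripleo.deployment.v1.deploy_plan:
      input:
        - container
        # The new input; defaults to skipping the validations.
        - run_validations: false

      tasks:
        decide_on_validations:
          action: std.noop
          on-complete:
            # Branch on the new boolean input.
            - run_the_validations: <% $.run_validations %>
            - deploy: <% not $.run_validations %>

        run_the_validations:
          workflow: tripleo.validations.v1.run_groups
          input:
            group_names: ['pre-deployment']
          # A validation failure stops the workflow here, which
          # blocks the deployment.
          on-success: deploy

        deploy:
          action: tripleo.deployment.deploy container=<% $.container %>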

To expose this functionality to the command line user, we will add an
option for enabling/disabling validations to python-tripleoclient
(which will set the ``run_validations`` Mistral input) and a way to
display the results of each validation in the command's output.

When the validations are run, they will report their status to Zaqar
and any failures will block the deployment. The deployer can disable
the validations if they wish to proceed despite the failures.
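
The status reporting can reuse the ``zaqar.queue_post`` pattern the
other tripleo-common workflows already use for progress messages; a
rough sketch, with an invented message payload:

.. code-block:: yaml

    send_validations_failed:
      action: zaqar.queue_post
      input:
        queue_name: <% $.queue_name %>
        messages:
          body:
            # The payload shown here is invented for the example.
            type: tripleo.validations.v1.run_groups
            payload:
              status: FAILED
              message: 'One or more validations failed.'
              execution: <% execution() %>
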
One unresolved question is the post-deployment validations. The Heat
stack create/update Mistral action is currently asynchronous and we
have no way of calling actions after the deployment has finished.
Unless we change that, the post-deployment validations may have to be
run manually (or via python-tripleoclient).

Alternatives
------------

1. Document where and how to run each group and leave it at that.
   This risks that users already familiar with TripleO will miss the
   validations or won't bother running them.

   We would still need to find a way to run the validations in a CI
   job, though.

2. Provide subcommands in python-tripleoclient for running validations
   (and groups of validations) and rely on people running them
   manually. This is similar to 1., but provides an easier way of
   running a validation and getting its result.

   Note that this may be a useful addition even with the proposal
   outlined in this specification.

3. Do what the GUI does in python-tripleoclient, too. The client would
   know when to run which validation and would report the results
   back. The drawback is that we would need to implement and maintain
   the same set of rules in two different codebases, with no API for
   them -- which is exactly what the switch to Mistral is supposed to
   solve.

Security Impact
---------------

None

Other End User Impact
---------------------

We will need to modify python-tripleoclient to display the status of
the validations once they have finished. TripleO UI already does this.

The deployers may need to learn about the validations.

Performance Impact
------------------

Running a validation can take about a minute (this depends on the
nature of the validation: e.g. does it check a configuration file, or
does it need to log in to all the compute nodes?).

This may be a concern if we run multiple validations at the same
time.

We should be able to run a whole group in parallel. It's possible
we're already doing that, but this needs to be investigated.
Specifically, does ``with-items`` run the tasks in sequence or in
parallel?
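
For the record, the Mistral documentation describes ``with-items`` as
running its iterations in parallel, with an optional ``concurrency``
policy to cap the number of simultaneous runs. A sketch of how a group
could fan out over its validations (the workflow and input names here
are assumptions made for the example):

.. code-block:: yaml

    run_validation_group:
      # Each iteration starts a separate sub-workflow execution;
      # they run concurrently unless 'concurrency' limits them.
      with-items: validation in <% $.validations %>
      workflow: tripleo.validations.v1.run_validation
      input:
        validation_name: <% $.validation %>
      concurrency: 5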

There are also some options that would allow us to speed up the
running time of a validation itself, using common ways of speeding up
Ansible playbooks in general (see the sketch after this list):

* Disabling the default "setup" task for validations that don't need
  it (this task gathers hardware and system information about the
  target node, which takes some time)

* Using persistent SSH connections

* Making each validation task run independently (by default, Ansible
  runs a task on all the nodes, waits for its completion everywhere
  and only then moves on to the next task)

* Caching the inventory: each validation runs the
  ``tripleo-ansible-inventory`` script, which gathers information
  about the deployed servers and configuration from Mistral and Heat.
  Running this script can be slow. When we run multiple validations
  at the same time, we should generate the inventory only once and
  cache the result.
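
The first and third items map directly to playbook-level settings. A
sketch, assuming the validation does not need any gathered facts
(persistent SSH connections live in ``ansible.cfg`` rather than the
playbook, as noted in the comment):

.. code-block:: yaml

    - hosts: all
      # Skip the implicit "setup" task for validations that don't
      # need hardware or system facts.
      gather_facts: false
      # Let every host move through the tasks at its own pace instead
      # of waiting for all the other hosts after each task.
      strategy: free
      # Persistent SSH connections are an ansible.cfg setting, e.g.:
      #   ssh_args = -o ControlMaster=auto -o ControlPersist=60s
      tasks:
        - name: Example validation task
          command: /bin/true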

Since the validations are going to be optional, the deployer can
always choose not to run them. On the other hand, the time saved by
catching problems before a lengthy broken deployment should ideally
outweigh the slowdown the validations introduce.

We will also document the actual time difference. This information
should be readily available from our CI environments, but we should
also provide measurements on bare metal.

Other Deployer Impact
---------------------

Depending on whether the validations end up running by default, the
only impact should be an option that lets the deployer turn them on
or off.

Developer Impact
----------------

The TripleO developers may need to learn about the validations, where
to find them and how to change them.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  tsedovic

Other contributors:
  None

Work Items
----------

* Add the ``run_validations`` input and call ``run_groups`` from the
  deployment and node registration workflows
* Add an option to run the validations to python-tripleoclient
* Display the validation results in python-tripleoclient
* Add or update a CI job to run the validations
* Add a CI job to tripleo-validations

Dependencies
============

None

Testing
=======

This should make the validations testable in CI. Ideally, we would
verify the expected success/failure for the known validations given
the CI environment. But having them go through the testing machinery
would be a good first step to ensure we don't break anything.

Documentation Impact
====================

We will need to document the fact that we have validations, where
they live, and when and how they are run.

References
==========

* http://docs.openstack.org/developer/tripleo-common/readme.html#validations
* http://git.openstack.org/cgit/openstack/tripleo-validations/
* http://docs.openstack.org/developer/tripleo-validations/