From 42edb218bd9c0ca0ef658f7ef9e9039f1f760ae5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Jeanneret?= <cjeanner@redhat.com>
Date: Mon, 6 Aug 2018 16:10:12 +0200
Subject: [PATCH] Validation Framework specifications

Provide a common, unified validation framework inside tripleoclient.

This resubmits Iffaa3c99ac401626c70211437dd98f214b4973e4 previously
merged too fast.

This reverts commit 20fc7a387af043809ec96a6ac1c3bac29f60516b.

Blueprint: validation-framework
Change-Id: Ib99f82227d045c07d1e8b602627c8bcd6a88114c
---
 specs/stein/validation-framework.rst | 276 +++++++++++++++++++++++++++
 1 file changed, 276 insertions(+)
 create mode 100644 specs/stein/validation-framework.rst

diff --git a/specs/stein/validation-framework.rst b/specs/stein/validation-framework.rst
new file mode 100644
index 00000000..2a0c785f
--- /dev/null
+++ b/specs/stein/validation-framework.rst
@@ -0,0 +1,276 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=================================================================
+Provide a common Validation Framework inside python-tripleoclient
+=================================================================
+
+https://blueprints.launchpad.net/tripleo/+spec/validation-framework
+
+Currently, we're lacking a common validation framework in tripleoclient. This
+framework should provide an easy way to validate the environment prior to a
+deploy and prior to an update/upgrade, on both undercloud and overcloud.
+
+Problem Description
+===================
+
+Currently, we have two types of validations:
+
+* Those launched prior to the undercloud deploy, embedded into the deploy itself
+
+* Those launched at will via a Mistral Workflow
+
+There is no unified, easy way to call a single validation on its own, and we
+lack the capacity to easily add new validations for the undercloud preflight
+checks.
+
+The current situation is not optimal: the operator must either go through the
+UI to run validations, or run them from the CLI using the exact same Mistral
+workflows as the UI. Neither option can provide proper preflight validations,
+especially when no working Mistral is available (prior to the undercloud
+deploy, or with all-in-one/standalone).
+
+Moreover, there is a need to make the CLI and UI converge. The latter already
+uses the full list of validations. Adding the full support of
+tripleo-validations to the CLI will improve the overall quality, usability and
+maintenance of the validations.
+
+Finally, a third type should be added: service validations called during the
+deploy itself. This doesn't directly affect the tripleoclient codebase, but
+rather tripleo-heat-templates.
+
+Proposed Change
+===============
+
+Overview
+--------
+
+In order to improve the current situation, we propose to create a new
+"branching" in the tripleoclient commands: ``openstack tripleo validator``.
+
+This new subcommand will allow listing and running validations in an
+independent way.
+
+Doing so will give a clear and clean view of the validations we can run
+depending on the stage we're in.
+
+(Note: the subcommand has yet to be defined - this is only a "mock-up".)
+
+The following subcommands should be supported:
+
+* ``openstack tripleo validator list``: will display all the available
+  validations with a small description, like "validate network capabilities on
+  undercloud"
+
+* ``openstack tripleo validator run``: will run the validations. Should take
+  options, like:
+
+  * ``--validation-name``: runs only the passed validation.
+  * ``--undercloud``: runs all undercloud-related validations.
+  * ``--overcloud``: runs all overcloud-related validations.
+  * ``--use-mistral``: runs validations through Mistral.
+  * ``--use-ansible``: runs validations directly via Ansible.
+  * ``--plan``: runs validations against a specific plan. Defaults to
+    ``$TRIPLEO_PLAN_NAME`` or "overcloud".
+
+* In addition, common options for all the subcommands:
+
+  * ``--extra-playbooks``: path to a local directory containing validation
+    playbooks maintained by the operator, or a swift directory containing
+    extra validation playbooks.
+  * ``--output``: points to a valid Ansible stdout_callback, such as the native
+    *json*, or custom *validation_output*. The default one should be the latter
+    as it renders a "human readable" output. More callbacks can be added later.
+
+The ``--extra-playbooks`` option must support both a local path and a remote
+swift container, since the custom validation support will push any validation
+to a dedicated swift directory.
+
+The default engine will be determined by the presence of Mistral: if Mistral is
+present and accepting requests (meaning the Undercloud is most probably
+deployed), the validator has to use it by default. If no Mistral is present, it
+must fall back to ansible-playbook.
+
+The validations should be in the form of Ansible playbooks, in order to be
+easily accessed from Mistral as well (as is currently the case). This will
+also make it possible to provide proper documentation and a canvas, and to
+validate a playbook before running it (ensuring it has metadata, output,
+and so on).
+
+We might also create some dedicated playbooks in order to make a kind of
+"self validation", ensuring we actually can run the validations (network,
+resources, and so on).
+
+The UI uses Mistral workflows in order to run the validations - the CLI must
+be able to use those same workflows of course, but also run at least some
+validations directly via Ansible, especially when we want to validate the
+undercloud environment before we even deploy it.
+
+In the end, all the default validation playbooks should be in one and only one
+location: tripleo-validations. Since support for "custom validations" is being
+added, those should also be supported (see references for details).
+
+In order to get a proper way to "aim" the validations, proper validation groups
+must be created and documented. Of course, one validation can be part of
+multiple groups.
+
+In addition, proper documentation with examples describing the Good Practices
+regarding playbook content, format and outputs should be created.
+
+For instance, a playbook should contain a description, a "human readable error
+output", and if applicable a possible solution.
+
+Proper testing for the default validations (i.e. those in tripleo-validations)
+might be added as well in order to ensure a new validation follows the Good
+Practices.
+
+We might want to add support for "nagios-compatible outputs" and exit codes,
+but it is unclear whether running those validations through a monitoring tool
+is a good idea, given the load it might create. This has to be discussed
+later, once we get the framework in place.
+
+Alternatives
+------------
+
+There are no real alternatives. Currently, we have many ways to validate, but
+they are unrelated and uncoordinated. If we don't provide a unified framework,
+we will get more and more "side validation ways" and it won't be maintainable.
+
+Security Impact
+---------------
+
+Elevated rights might be needed for some validations - they should be granted
+via the system sudoers, in a way that limits unwanted privilege escalation.
+
+
+Other End User Impact
+---------------------
+
+The end user will get a proper way to validate the environment prior to any
+action.
+This will give more confidence in the final product, and ease the update and
+upgrade processes.
+
+It will also provide a good way to collect information about the systems in
+case of failures.
+
+If a "nagios-compatible output" is to be created (a mix of Ansible JSON output,
+parsing and compatibility handling), it might provide a way to get a daily
+report about the health of the stack - this might be a nice feature, but it is
+not in the current scope (it will need a new stdout_callback, for instance).
+
+Performance Impact
+------------------
+
+The more validations we get, the more time it might take if we decide to run
+them by default prior to any action.
+
+The current way to disable them, either with a configuration file or a CLI
+option, will stay.
+
+In addition, we can make good use of "groups" in order to filter out greedy
+validations.
+
+
+Other Deployer Impact
+---------------------
+
+Providing a CLI subcommand for validation will make the deployment easier.
+
+Providing a unified framework will allow an operator to run the validations
+either from the UI, or from the CLI, without any surprise regarding the
+validation list.
+
+Developer Impact
+----------------
+
+A refactoring will be needed in python-tripleoclient and probably in
+tripleo-common in order to get a proper subcommand and options.
+
+A correct way to call Ansible from Python is to be decided (ansible-runner?).
+
+A correct way to call Mistral workflows from the CLI is to be created if it
+does not already exist.
+
+In the end, the framework will allow other OpenStack projects to push their own
+validations, since they are the ones knowing how and what to validate in the
+different services making up OpenStack.
+
+All validations will be centralized in the tripleo-validations repository.
+This means we might want to create a proper tree in order to avoid having
+100+ validations in the same directory.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  cjeanner
+
+Other contributors:
+  akrivoka
+  ccamacho
+  dpeacock
+  florianf
+
+
+Work Items
+----------
+
+* List the currently existing validations in both undercloud_preflight.py and
+  openstack-tripleo-validations.
+
+* Decide if we integrate ansible-runner as a dependency (needs to be packaged).
+
+* Implement the undercloud_preflight validations as Ansible playbooks.
+
+* Implement a proper way to call Ansible from the tripleoclient code.
+
+* Implement support for a configuration file dedicated to the validations.
+
+* Implement the new subcommand tree in tripleoclient.
+
+* Validate, Validate, Validate.
+
+
+Dependencies
+============
+
+* Ansible-runner: https://github.com/ansible/ansible-runner
+
+* Openstack-tripleo-validations: https://github.com/openstack/tripleo-validations
+
+
+
+Testing
+=======
+
+The CI can't possibly provide the "right" environment with all the requirements.
+The code has to implement a way to configure the validations so that the CI
+can override the *production* values we will set in the validations.
+
+
+Documentation Impact
+====================
+
+A new entry in the documentation must be created in order to describe this new
+framework (for the devs) and new subcommand (for the operators).
+
+References
+==========
+
+* http://lists.openstack.org/pipermail/openstack-dev/2018-July/132263.html
+
+* https://bugzilla.redhat.com/show_bug.cgi?id=1599829
+
+* https://bugzilla.redhat.com/show_bug.cgi?id=1601739
+
+* https://review.openstack.org/569513 (custom validation support)
+
+* https://docs.openstack.org/tripleo-docs/latest/install/validations/validations.html