Graph concept extension

Blueprint: graph-concept-extension
Change-Id: I1d0b9844a2603774f261b7d933e0c720ecd0e112
Vladimir Kuklin 2016-08-15 15:39:38 +03:00 committed by Igor Kalnitsky
parent a14e3d1560
commit 9f74458044

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

======================================
Fuel Graph Concept Extension And Usage
======================================

https://blueprints.launchpad.net/fuel/+spec/graph-concept-extension

This specification extends the Fuel graph concept so that graphs can be
executed for purposes other than deployment.

-------------------
Problem description
-------------------

Currently, the Fuel graph concept is tied to the deployment process. For
example, we can't use graphs for provisioning, deletion or verification.
Those actions are hardcoded in Nailgun and Astute, and there's no easy way
to extend them.

Meanwhile, we want to see every action as a graph in order to make it
pluggable and extendable, since end users usually want to change these
actions somehow. For instance, some of them want to use the BitTorrent
protocol instead of HTTP for image delivery, and so far there's no way to
change that.

Another problem is that we can't verify an advanced network configuration in
bootstrap mode. The problem lies in our approach: network-checker is
responsible only for the basic configuration, while the l23network manifest
has to be applied in order to verify the network against the real
configuration. Having everything in graphs allows us to reuse that Puppet
manifest and hence prepare the network for verification.

There are plenty of places where we have hardcoded actions instead of
declarative ones. Moving them into graphs will help clean up and simplify
our code base, as well as make it possible to customize them manually or
via plugins.

----------------
Proposed changes
----------------

#. **Transaction Manager**

   Nailgun should have a general transaction manager capable of running a
   single graph as well as several graphs within a single transaction.

   The transaction manager must be used by the new RESTful API endpoint
   for executing graphs. See the REST API section for details.

#. **Default Graphs for Basic Actions**

   At minimum, we want to see the following actions as graphs:

   * Deployment (done)
   * Provisioning
   * Verification
   * Deletion

   Hence, fuel-library should provide tasks for those graphs the same way
   it provides them for deployment. The proposed way is to separate them on
   the filesystem (put them into different directories) and sync them one by
   one by passing an additional argument to Fuel CLI. Example:

   .. code-block:: console

      fuel rel --sync-deployment-tasks --dir /etc/puppet/ --graph provision

#. **Scenarios**

   Scenarios are a way to run specified graphs one by one, each on a
   pre-defined set of nodes. A set of nodes can be specified either
   explicitly or by using a YAQL expression.

   Scenarios are a good way to describe high-level orchestration flows, such
   as "Deploy Changes", in a declarative manner.

#. **New Astute tasks**

   In order to support existing scenarios as graphs, we need to implement the
   following tasks in the task-based format in Astute:

   * ``erase_node`` - run the mcollective ``erase_node`` action
   * ``master_shell`` - execute a task on the master node with a particular
     node context
   * ``move_to_bootstrap`` - re-register a node with a bootstrap profile in
     cobbler

#. **New method of node status updates**

   In order to get rid of the hardcoded state machine of node statuses, we
   need to provide a way to set node statuses in a data-driven format.
   Hence, it's proposed to add a set of callbacks: ``on_success``,
   ``on_error`` and ``on_stop``.

   .. code-block:: yaml

      graph_metadata:
        on_success:
          node_attributes:
            status: ready
        on_error:
          node_attributes:
            status: error
            error_type: deploy
        on_stop: null
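
One possible reading of the callbacks above: when a graph finishes with a given outcome, merge that callback's ``node_attributes`` into every node the graph ran on, and leave nodes untouched when the callback is ``null``. The Python sketch below only illustrates that reading. The ``apply_graph_callback`` helper, the outcome names and the plain-dict node representation are hypothetical and are not part of Nailgun.

.. code-block:: python

   from typing import Any, Dict, List, Optional

   # Hypothetical mapping from a graph outcome to its callback key.
   CALLBACK_KEYS = {"success": "on_success", "error": "on_error",
                    "stop": "on_stop"}


   def apply_graph_callback(graph_metadata: Dict[str, Any], outcome: str,
                            nodes: List[Dict[str, Any]]
                            ) -> List[Dict[str, Any]]:
       """Return copies of ``nodes`` updated by the callback for ``outcome``.

       A missing or null callback leaves the nodes unchanged.
       """
       callback: Optional[Dict[str, Any]] = graph_metadata.get(
           CALLBACK_KEYS[outcome])
       if not callback:
           return [dict(node) for node in nodes]
       attributes = callback.get("node_attributes", {})
       # Later keys win, so callback attributes override current values.
       return [{**node, **attributes} for node in nodes]


   # The YAML metadata above, as it would look after parsing.
   graph_metadata = {
       "on_success": {"node_attributes": {"status": "ready"}},
       "on_error": {"node_attributes": {"status": "error",
                                        "error_type": "deploy"}},
       "on_stop": None,
   }
   nodes = [{"id": 1, "status": "provisioned"},
            {"id": 2, "status": "provisioned"}]
   print(apply_graph_callback(graph_metadata, "error", nodes))

Returning copies keeps the decision about when to persist the new attributes with the caller, which would be the transaction manager in this design.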

Web UI
======

Custom graph management in Fuel UI was described and implemented in [1];
the ability to execute a sequence of graphs is introduced in this spec as
an extension.

Working in the 'Custom Scenarios' deployment mode, the user should be able
to specify a sequence of space-separated graph types to execute.

Also, Fuel UI should use the new ``/api/v1/graphs/execute/`` handler (which
works with the transaction manager) to run one or several graphs.
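
Turning the space-separated sequence entered in the 'Custom Scenarios' mode into a ``/graphs/execute`` request body is mostly string handling. The Python sketch below shows one way to do it. It is only an illustration of the logic, since Fuel UI itself is not written in Python. The ``build_execute_request`` name and the decision to omit the optional ``nodes`` and ``tasks`` fields are assumptions.

.. code-block:: python

   from typing import Any, Dict


   def build_execute_request(cluster_id: int, graph_types: str,
                             dry_run: bool = False,
                             force: bool = False) -> Dict[str, Any]:
       """Build a /graphs/execute body from space-separated graph types."""
       # str.split() without arguments also ignores repeated spaces.
       types = graph_types.split()
       if not types:
           raise ValueError("at least one graph type is required")
       return {
           "cluster": cluster_id,
           # Graphs run one by one, in the order the user typed them.
           "graphs": [{"type": graph_type} for graph_type in types],
           "dry_run": dry_run,
           "force": force,
       }


   print(build_execute_request(1, "provision  deployment"))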

Nailgun
=======

Data model
----------

#. Having everything defined as a graph, together with a mechanism to run
   several graphs within a single transaction, means we can no longer rely
   on the task's name. It makes more sense to distinguish runs by two
   criteria: ``graph_type`` and ``dry_run``. So it's proposed to extend the
   ``tasks`` table with those columns and mark ``tasks.name`` as a
   deprecated column.

#. Transient node statuses shouldn't be persisted in the database. That
   means the ``nodes::status`` attribute should contain either
   ``discover``, ``provisioned`` or ``deployed``. The statuses
   ``provisioning``, ``deploying`` and ``error`` should be calculated based
   on node attributes:

   * ``provisioning`` = ``discover`` + ``progress >= 0``
   * ``deploying`` = ``provisioned`` + ``progress >= 0``
   * ``error`` = ``error_type`` is not ``null``

   When any action is committed, ``progress`` should be reset to ``100``.

   ``error_type`` should not be limited to a pre-defined set of types.

#. In order to implement scenarios, we need to design a database schema for
   a new entity. Here's a proposed solution:

   .. code-block:: text

                           SCENARIOS_ACTS
      SCENARIOS            +--------------------+
      +-----------+        | + id (pk)          |
      | + id (pk) |<-------| + scenario_id (fk) |
      | + name    |        | + order            |
      +-----------+        | + graph_type       |
                           | + nodes            |
                           +--------------------+

   where:

   * ``scenarios::name`` is a unique identifier used by clients for running
     scenarios;
   * ``scenarios_acts::scenario_id`` is a foreign key to ``scenarios``;
   * ``scenarios_acts::order`` is the execution order within the scenario;
   * ``scenarios_acts::graph_type`` is the graph type to run;
   * ``scenarios_acts::nodes`` is a JSON column that may contain either a
     hardcoded JSON array with node IDs or a JSON object with a ``yaql_exp``
     key for getting node IDs on the fly.

   Executing a scenario means running its graphs on the corresponding sets
   of nodes within a single transaction.
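
The following minimal sketch makes the proposed schema concrete using Python's built-in ``sqlite3`` module. Several things are assumptions for illustration: the column types, the quoting of ``order`` (a reserved SQL word), and the ``resolve_nodes`` and ``scenario_plan`` helpers. YAQL evaluation is replaced by a hypothetical callable rather than a real YAQL engine.

.. code-block:: python

   import json
   import sqlite3
   from typing import Callable, List, Tuple, Union

   conn = sqlite3.connect(":memory:")
   conn.executescript("""
   CREATE TABLE scenarios (
       id   INTEGER PRIMARY KEY,
       name TEXT NOT NULL UNIQUE
   );
   CREATE TABLE scenarios_acts (
       id          INTEGER PRIMARY KEY,
       scenario_id INTEGER NOT NULL REFERENCES scenarios (id),
       "order"     INTEGER NOT NULL,  -- quoted: ORDER is a reserved word
       graph_type  TEXT NOT NULL,
       nodes       TEXT NOT NULL      -- JSON array of IDs or {"yaql_exp": ...}
   );
   """)
   conn.execute("INSERT INTO scenarios (id, name) VALUES (1, 'deploy-changes')")
   conn.executemany(
       'INSERT INTO scenarios_acts (scenario_id, "order", graph_type, nodes) '
       "VALUES (?, ?, ?, ?)",
       [
           (1, 2, "deployment", json.dumps([1, 2, 3])),
           (1, 1, "provision",
            json.dumps({"yaql_exp": "select nodes for provisioning"})),
       ],
   )

   NodesSpec = Union[List[int], dict]
   YaqlEvaluator = Callable[[str], List[int]]


   def resolve_nodes(spec: NodesSpec, evaluate_yaql: YaqlEvaluator) -> List[int]:
       """Return node IDs: either the hardcoded list or the YAQL result."""
       if isinstance(spec, list):
           return spec
       return evaluate_yaql(spec["yaql_exp"])


   def scenario_plan(conn: sqlite3.Connection, name: str,
                     evaluate_yaql: YaqlEvaluator) -> List[Tuple[str, List[int]]]:
       """List (graph_type, node IDs) pairs of a scenario in execution order."""
       rows = conn.execute(
           "SELECT a.graph_type, a.nodes FROM scenarios_acts AS a "
           "JOIN scenarios AS s ON s.id = a.scenario_id "
           'WHERE s.name = ? ORDER BY a."order"',
           (name,),
       ).fetchall()
       return [(graph_type, resolve_nodes(json.loads(nodes), evaluate_yaql))
               for graph_type, nodes in rows]


   def fake_yaql(expression: str) -> List[int]:
       # Hypothetical stand-in: a real implementation would evaluate the
       # expression against cluster data with a YAQL engine.
       return [4, 5]


   print(scenario_plan(conn, "deploy-changes", fake_yaql))

Acts are inserted out of order on purpose. The ``ORDER BY`` on the ``order`` column, not insertion order, determines the sequence in which graphs would run.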

REST API
--------

#. **Graphs Execution**

   .. http:post:: /graphs/execute

      Execute the passed graphs.

      **Request:**

      .. code-block:: http

         POST /graphs/execute HTTP/1.1

         {
           "cluster": <cluster-id>,
           "graphs": [
             {
               "type": "graph-type-1",
               "nodes": [1, 2, 3, 4],
               "tasks": ["task-a", "task-b"]
             },
             {
               "type": "graph-type-2",
               "nodes": [3, 4],
               "tasks": ["task-c", "task-d"]
             }
           ],
           "dry_run": false,
           "force": false
         }

      where:

      * ``cluster`` -- cluster ID;
      * ``graphs`` -- list of graphs to be executed, each with optional
        ``nodes`` and ``tasks`` params;
      * ``dry_run`` (optional, default: false) -- run graphs in dry-run mode;
      * ``force`` (optional, default: false) -- execute tasks anyway,
        without taking previous runs into account.

      **Response:**

      .. code-block:: http

         HTTP/1.1 202 Accepted

         {
           "task_uuid": "transaction-uuid",
           ...
         }

      where:

      * ``task_uuid`` -- unique ID of the accepted transaction.

   Since the graph term has been extended, some requests should be renamed to
   avoid misunderstanding. The word deployment/deploy should be removed from
   the following requests:

   * ``GET /releases/<release_id>/deployment_graphs/``
   * ``GET/POST/PUT/PATCH/DELETE /releases/<release_id>/deployment_graphs/<graph_type>/``
   * ``GET /releases/<release_id>/deployment_tasks/``
   * ``GET /clusters/<cluster_id>/deployment_graphs/``
   * ``GET /clusters/<cluster_id>/deployment_tasks/``
   * ``GET/POST/PUT/PATCH/DELETE /clusters/<cluster_id>/deployment_graphs/<graph_type>/``
   * ``GET /plugins/<plugin_id>/deployment_graphs/``
   * ``GET/POST/PUT/PATCH/DELETE /plugins/<plugin_id>/deployment_graphs/<graph_type>/``
   * ``GET /clusters/<cluster_id>/deploy_tasks/graph.gv``

#. **Scenarios**

   .. http:post:: /scenarios

      Create a new scenario.

      **Request:**

      .. code-block:: http

         POST /scenarios HTTP/1.1

         {
           "name": "deploy-changes",
           "scenario": [
             {
               "graph_type": "provision",
               "nodes": {
                 "yaql_exp": "select nodes for provisioning"
               }
             },
             {
               "graph_type": "deployment",
               "nodes": ...
             }
             ...
           ]
         }

   .. http:get:: /scenarios

      List available scenarios.

      **Response:**

      .. code-block:: http

         HTTP/1.1 200 OK

         [
           {
             "id": 1,
             "name": "deploy-changes",
             "scenario": [
               ... scenario's acts ...
             ]
           },
           {
             "id": 2,
             ...
           }
         ]

   .. http:post:: /scenarios/:name/execute

      Run the scenario with the given ``name``. If successful, a transaction
      ID is returned.

      **Response:**

      .. code-block:: http

         HTTP/1.1 202 Accepted

         {
           "task_uuid": "transaction uuid"
         }
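
A server-side handler for ``/graphs/execute`` has to validate the request body and fill in the documented defaults for ``dry_run`` and ``force``. The Python sketch below shows one way to do that. The function name is hypothetical. Representing an omitted ``nodes`` or ``tasks`` list as ``None`` (meaning "no restriction") is also an assumption, because the spec does not say how omission is interpreted.

.. code-block:: python

   from typing import Any, Dict


   def normalize_execute_request(body: Dict[str, Any]) -> Dict[str, Any]:
       """Validate a /graphs/execute body and apply the documented defaults."""
       cluster = body.get("cluster")
       # bool is a subclass of int, so reject it explicitly.
       if not isinstance(cluster, int) or isinstance(cluster, bool):
           raise ValueError("'cluster' must be an integer cluster id")

       graphs = body.get("graphs")
       if not isinstance(graphs, list) or not graphs:
           raise ValueError("'graphs' must be a non-empty list")

       normalized = []
       for graph in graphs:
           if not isinstance(graph, dict) or "type" not in graph:
               raise ValueError("every graph must be an object with a 'type'")
           normalized.append({
               "type": graph["type"],
               # Assumption: None means "not restricted" for nodes and tasks.
               "nodes": graph.get("nodes"),
               "tasks": graph.get("tasks"),
           })

       flags = {}
       for flag in ("dry_run", "force"):
           value = body.get(flag, False)  # both flags default to false
           if not isinstance(value, bool):
               raise ValueError("'%s' must be a boolean" % flag)
           flags[flag] = value

       return {"cluster": cluster, "graphs": normalized, **flags}


   print(normalize_execute_request(
       {"cluster": 1, "graphs": [{"type": "provision", "nodes": [1, 2]}]}))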

Orchestration
=============

None

RPC Protocol
------------

None

Fuel Client
===========

The common custom graph commands [0] will be used for listing, uploading
and downloading graphs.

The graph execution command should stay practically the same; however, it
must be possible to define several graph types to run them one by one. It
should also be possible to force execution of tasks without skipping any of
them, and to run only specific tasks, ignoring dependencies.

.. code-block:: console

   fuel2 graph execute --env 1 [--nodes 1 2 3]
                               [--graph-types gtype1 gtype2]
                               [--task-names task1 task2]
                               [--force]
                               [--dry-run]

where:

* ``--nodes`` executes only on the passed nodes;
* ``--graph-types`` executes the passed graphs within one transaction;
* ``--task-names`` executes only the passed tasks, skipping the others;
* ``--force`` executes tasks even if previous runs would cause them to be
  skipped;
* ``--dry-run`` executes in dry-run mode (doesn't affect nodes).
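
The options above map fairly directly onto the ``/graphs/execute`` request body. The sketch below uses plain ``argparse`` to show that mapping. It is not fuelclient's actual command implementation. Two rules in it are assumptions: the ``default`` graph type used when ``--graph-types`` is omitted, and that ``--nodes`` and ``--task-names`` apply to every listed graph.

.. code-block:: python

   import argparse
   import json
   from typing import Any, Dict, List

   # Assumption: graph type used when --graph-types is not given.
   DEFAULT_GRAPH_TYPE = "default"


   def build_parser() -> argparse.ArgumentParser:
       # Mirrors the synopsis above; not fuelclient's real command class.
       parser = argparse.ArgumentParser(prog="fuel2 graph execute")
       parser.add_argument("--env", type=int, required=True)
       parser.add_argument("--nodes", type=int, nargs="+")
       parser.add_argument("--graph-types", nargs="+")
       parser.add_argument("--task-names", nargs="+")
       parser.add_argument("--force", action="store_true")
       parser.add_argument("--dry-run", action="store_true")
       return parser


   def to_request(args: argparse.Namespace) -> Dict[str, Any]:
       """Translate parsed options into a /graphs/execute request body."""
       graph_types: List[str] = args.graph_types or [DEFAULT_GRAPH_TYPE]
       graphs = []
       for graph_type in graph_types:
           graph: Dict[str, Any] = {"type": graph_type}
           # Assumption: --nodes and --task-names apply to every graph.
           if args.nodes:
               graph["nodes"] = args.nodes
           if args.task_names:
               graph["tasks"] = args.task_names
           graphs.append(graph)
       return {"cluster": args.env, "graphs": graphs,
               "dry_run": args.dry_run, "force": args.force}


   args = build_parser().parse_args(
       ["--env", "1", "--nodes", "1", "2",
        "--graph-types", "provision", "deployment"])
   print(json.dumps(to_request(args)))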

Plugins
=======

None

Fuel Library
============

* Compose the default provisioning and deletion graphs.
* Compose the default verification graph. This graph should contain tasks
  for configuring and checking the network.
* All default graphs should be loaded during the Fuel installation with
  the corresponding graph types.

------------
Alternatives
------------

None for the whole approach.

For the verification tool:

* Use the standard network verification mechanism, although in this case
  we deal with an unrealistic network configuration.
* Use the connectivity checker plugin [2] to verify the network during
  deployment, although reworking it would take more time.

--------------
Upgrade impact
--------------

Some API endpoints are renamed, which breaks backward compatibility.

---------------
Security impact
---------------

None

--------------------
Notifications impact
--------------------

None

---------------
End user impact
---------------

Ability to:

* execute different graphs for different purposes;
* check the real network configuration design before the deployment
  process.

------------------
Performance impact
------------------

None

-----------------
Deployment impact
-----------------

The whole mechanism becomes more flexible. The provisioning part is
configurable and easier to debug. Thanks to the verification graph,
detecting errors before the deployment stage may save a lot of time when
reconfiguration is necessary.

----------------
Developer impact
----------------

None

---------------------
Infrastructure impact
---------------------

None

--------------------
Documentation impact
--------------------

* The API, CLI and UI documentation should be updated to reflect these
  changes.

--------------
Implementation
--------------

Assignee(s)
===========

Primary assignee:
  bgaifullin

Other contributors:
  vsharshov (astute)
  sbogatkin (library: deletion, provisioning)
  lefremova (library: verification)
  ikutukov (client)

Mandatory design review:
  ashtokolov
  vkuklin

Work Items
==========

* Implement a transaction manager that runs several graphs one by one,
  each with its own context generated on top of the changes committed by
  the previous graph.
* Implement new Astute tasks for moving nodes to bootstrap, running shell
  tasks on the master node in the context of other roles, and removing
  nodes.
* Implement new graphs to run provisioning, deployment, deletion and
  verification.
* Implement a CLI interface to run graphs in one transaction.
* Implement Fuel UI support for running graphs in one transaction, as well
  as scenarios.
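
The first work item can be sketched as a loop that builds each graph's context from the state committed by the previous graph and aborts the transaction on the first failure. Everything in the Python sketch below is hypothetical and only illustrates the intended control flow. This includes ``run_transaction``, the ``Transaction`` dataclass, the ``pending``/``running``/``ready``/``error`` status strings and the fake graph runner.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import Callable, Dict, List

   # node id -> node attributes; a stand-in for the real deployment context.
   Context = Dict[int, Dict[str, str]]
   # Hypothetical executor: runs one graph, returns per-node attribute changes.
   GraphRunner = Callable[[str, Context], Context]


   @dataclass
   class Transaction:
       cluster_id: int
       state: Context = field(default_factory=dict)
       completed: List[str] = field(default_factory=list)
       status: str = "pending"


   def run_transaction(cluster_id: int, graph_types: List[str],
                       state: Context, run_graph: GraphRunner) -> Transaction:
       """Run graphs one by one within a single transaction."""
       tx = Transaction(cluster_id=cluster_id, state=state, status="running")
       for graph_type in graph_types:
           # Fresh context built from the state committed so far; copies keep
           # a graph from mutating committed state directly.
           context = {node: dict(attrs) for node, attrs in tx.state.items()}
           try:
               changes = run_graph(graph_type, context)
           except Exception:
               # The first failing graph stops the whole transaction.
               tx.status = "error"
               return tx
           # Commit this graph's changes before the next graph starts.
           tx.state = {node: {**attrs, **changes.get(node, {})}
                       for node, attrs in tx.state.items()}
           tx.completed.append(graph_type)
       tx.status = "ready"
       return tx


   def fake_runner(graph_type: str, context: Context) -> Context:
       # Toy graphs: deployment only works on nodes provisioned earlier.
       if graph_type == "provision":
           return {node: {"status": "provisioned"} for node in context}
       if graph_type == "deployment":
           if any(a["status"] != "provisioned" for a in context.values()):
               raise RuntimeError("nodes must be provisioned first")
           return {node: {"status": "deployed"} for node in context}
       raise RuntimeError("unknown graph type: %s" % graph_type)


   tx = run_transaction(1, ["provision", "deployment"],
                        {1: {"status": "discover"}}, fake_runner)
   print(tx.status, tx.completed, tx.state)

The toy ``deployment`` graph succeeds only because ``provision`` committed its changes first. That dependency on previously committed state is what the work item describes.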

Dependencies
============

Custom graph management on UI [1].

-----------
Testing, QA
-----------

* New logic in Nailgun should be covered by unit and integration tests.
* Functional tests that execute verification and provisioning graphs on
  bootstrap nodes should be introduced.

Acceptance criteria
===================

* The Fuel graph concept is extended, so the graph mechanism can be used
  for different purposes.
* A network checking tool for realistic configurations is introduced in
  Fuel, based on executing an appropriate verification graph on bootstrap
  nodes. As a result, a cloud operator can investigate production-specific
  network defects before deployment.
* Provisioning and deletion also work by executing the corresponding
  graphs.
* While the default graphs for the base actions are loaded during the Fuel
  installation, the user may specify and execute custom graphs.

----------
References
----------

[0] Allow user to run custom graph on cluster
    https://blueprints.launchpad.net/fuel/+spec/custom-graph-execution

[1] Custom graph management on UI
    https://blueprints.launchpad.net/fuel/+spec/ui-custom-graph

[2] Connectivity checker plugin
    https://github.com/xenolog/fuel-plugin-connectivity-checker