Merge "Initial version of developer doc for action"

2015-06-21 01:26:29 +00:00 · 2015-06-21 01:26:29 +00:00 · de8b506381
parent 3cef8499ab 2a159a6d36
commit de8b506381
1 changed files with 320 additions and 0 deletions
--- a/doc/source/developer/action.rst
+++ b/doc/source/developer/action.rst
@ -0,0 +1,320 @@
+..
+  Licensed under the Apache License, Version 2.0 (the "License"); you may
+  not use this file except in compliance with the License. You may obtain
+  a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+  License for the specific language governing permissions and limitations
+  under the License.
+
+Actions
+=======
+
+An action is an abstraction of some logic that can be executed by a worker
+thread. Most of the operations supported by Senlin are executed asynchronously,
+which means they are queued into database and then picked up by certain worker
+thread for execution.
+
+Currently, Senlin only supports builtin actions listed below. In future, we
+may evolve to support user-defined actions (UDAs). A user-defined action may
+carry a Shell script to be executed on a target Nova server, or a Heat
+SoftwareConfig to be deployed on a stack, for example. The following builtin
+actions are supported at the time of this design:
+
+- ``CLUSTER_CREATE``: An action for creating a cluster;
+- ``CLUSTER_DELETE``: An action for deleting a cluster;
+- ``CLUSTER_UPDATE``: An action for updating a cluster;
+- ``CLUSTER_ADD_NODES``: An action for adding existing nodes to a cluster;
+- ``CLUSTER_DEL_NODES``: An action for removing nodes from a cluster;
+- ``CLUSTER_RESIZE``: An action for adjusting the size of a cluster;
+- ``CLUSTER_SCALE_IN``: An action to shrink the size of a cluster by removing
+  nodes from the cluster;
+- ``CLUSTER_SCALE_OUT``: An action to extend the size of a cluster by creating
+  new nodes using the ``profile_id`` of the cluster;
+- ``CLUSTER_ATTACH_POLICY``: An action to attach a policy to a cluster;
+- ``CLUSTER_DETACH_POLICY``: An action to detach a policy from a cluster;
+- ``CLUSTER_UPDATE_POLICY``: An action to update the properties of a binding
+  between a cluster and a policy;
+- ``NODE_CREATE``: An action for creating a new node;
+- ``NODE_DELETE``: An action for deleting an existing node;
+- ``NODE_UPDATE``: An action for updating the properties of an existing node;
+- ``NODE_JOIN``: An action for joining a node to an existing cluster;
+- ``NODE_LEAVE``: An action for a node to leave its current owning cluster;
+- ``POLICY_ENABLE``: An action for globally enabling a policy;
+- ``POLICY_DISABLE``: An action for globally disabling a policy;
+- ``POLICY_UPDATE``: An action for updating the properties of a policy.
+
+
+-----------------
+Action Properties
+-----------------
+
+An action has the following properties when created:
+
+- ``id``: a globally unique ID for the action object;
+- ``name``: a string representation of the action name which might be
+  generated automatically for actions derived from other operations;
+- ``context``: a dictionary that contains the calling context that will be
+  used by the engine when executing the action. Contents in this dictionary
+  may contain sensitive information such as user credentials.
+- ``action``: a text property that contains the action body to be executed.
+  Currently, this property only contains the name of a builtin action. In
+  future, we will provide a structured definition of action for UDAs.
+- ``target``: the UUID of an object (e.g. a cluster, a node or a policy) to
+  be operated;
+- ``cause``: a string indicating the reason why this action was created. The
+  purpose of this property is for the engine to check whether a new lock should
+  be acquired before operating an object. Valid values for this property
+  include:
+
+  * ``RPC Request``: this indicates that the action was created upon receiving
+    a RPC request from Senlin API, which means a lock is likely needed;
+  * ``Derived Action``: this indicates that the action was created internally
+    as part of the execution path of another action, which means a lock might
+    have been acquired;
+
+- ``owner``: the UUID of a worker thread that currently "owns" this action and
+  is responsible for executing it.
+- ``interval``: the interval (in seconds) for repetitive actions, a value of 0
+  means that the action won't be repeated;
+- ``start_time``: timestamp when the action was last started. This field is
+  provided for action execution timeout detection;
+- ``stop_time``: timestamp when the action was stopped. This field is provided
+  for measuring the execution time of an action;
+- ``timeout``: timeout (in seconds) for the action execution. A value of 0
+  means that the action does not have a customized timeout constraint, though
+  it may still have to honor the system wide ``default_action_timeout``
+  setting.
+- ``status``: a string representation of the current status of the action. See
+  subsection below for detailed status definitions.
+- ``status_reason``: a string describing the reason that has led the action to
+  its current status.
+- ``control``: a string for holding the pending signals such as ``CANCEL``,
+  ``SUSPEND`` or ``RESUME``.
+- ``inputs``: a dictionary that provides inputs to the action when executed;
+- ``outputs``: a dictionary that captures the outputs (including error
+  messages) from the action execution;
+- ``depends_on``: a UUID list for the actions that must be successfully
+  completed before the current action becomes ``READY``. An action cannot
+  become ``READY`` when this property is not an empty string.
+- ``depended_by``: a UUID list for the actions that depends on the successful
+  completion of current action. When the current action is completed with a
+  success, the actions listed in this property will get notified.
+- ``created_time``: the timestamp when the action was created;
+- ``updated_time``: the timestamp when the action was last updated;
+- ``deleted_time``: the timestamp when the action was deleted. Note that a
+  non-empty value of this property alway indicates that the action is deleted.
+
+*TODO*: Add support for scheduled action execution.
+
+*NOTE*: The default value of the ``default_action_timeout`` is 3600 seconds.
+
+
+The Action Data Property
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+An action object has a property named ``data`` which is used for saving policy
+decisions. This property is a Python dict for different policies to save and
+exchange policy decision data.
+
+Suppose we have a scaling policy, a deletion policy and a load-balancing
+policy attached to the same cluster. By design, when an ``CLUSTER_SCALE_IN``
+action is picked up for execution, the following sequence will happen:
+
+1) When the action is about to be executed, the worker thread checks all
+   policies that have registered a "pre_op" on this action type.
+2) Based on priority setting, the "pre_op" of the scaling policy is invoked,
+   and the policy determines the number of nodes to be deleted. This decision
+   is saved to the action's ``data`` property in the following format:
+
+::
+
+   "deletion": {
+     "count": 2
+   }
+
+3) Based on the priority setting, the deletion policy is evaluated next. When
+   the "pre_op" method of the deletion policy is invoked, it first checks the
+   ``data`` property of the action where it finds out the number of nodes to
+   delete. Then it will calculate the list of candidates to be deleted using
+   its selection criteria (e.g. ``OLDEST_FIRST``). Finally, it saves the list
+   of candidate nodes to be deleted to the ``data`` property of the action, in
+   the following format:
+
+::
+
+   "deletion": {
+     "count": 2,
+     "candidates": ["1234-4567-9900", "3232-5656-1111"]
+   }
+
+4) According to priority setting, the load-balancing policy is evaluated last.
+   When invoked, its "pre_op" method checks the ``data`` property of the
+   action and finds out the candidate nodes to be removed from the cluster.
+   With this information, the method removes the nodes from the load-balancer
+   maintained by the policy.
+
+5) The action's ``execute()`` method is now invoked and it removes the nodes
+   as given in its ``data`` property, updates the cluster's last update
+   timestamp, then returns.
+
+From the example above, we can see that the ``data`` property of an action
+plays a critical role in policy checking and enforcement. To avoid losing of
+the in-memory ``data`` content during service restart, Senlin persists the
+content to database whenever it is changed.
+
+Note that there are policies that will write to the ``data`` property of a
+node for a similar reason. For example, a placement policy may decide where a
+new node should be created. This information is saved into the ``data``
+property of a node. When a profile is about to create a node, it is supposed
+to check this property and enforce it. For a Nova server profile, this means
+that the profile code will inject ``scheduler_hints`` to the server instance
+before it is created.
+
+
+---------------
+Action Statuses
+---------------
+
+An action can be in one of the following statuses during its lifetime:
+
+- ``INIT``: Action object is being initialized, not ready for execution;
+- ``READY``: Action object can be picked up by any worker thread for
+  execution;
+- ``WAITING``: Action object has dependencies on other actions, it may
+  become ``READY`` only when the dependents are all completed with successes;
+- ``RUNNING``: Action object is being executed by a worker thread;
+- ``SUSPENDED``: Action object is suspended during execution, so the only way
+  to put it back to ``RUNNING`` status is to send it a ``RESUME`` signal;
+- ``SUCCEEDED``: Action object has completed execution with a success;
+- ``FAILED``: Action object execution has been aborted due to failures;
+- ``CANCELLED``: Action object execution has been aborted due to a ``CANCEL``
+  signal.
+
+Collectively, the ``SUCCEEDED``, ``FAILED`` and ``CANCELLED`` statuses are all
+valid action completion status.
+
+
+------------------------------------------
+The ``execute()`` Method and Return Values
+------------------------------------------
+
+Each subclass of the base ``Action`` must provide an implementation of the
+``execute()`` method which provides the actual logic to be invoked by the
+generic action execution framework.
+
+Senlin defines a protocol for the execution of actions. The ``execute()``
+method should always return a tuple ``<RES>, <REASON>`` where the ``<RES>``
+indicates whether the action procedure execution was successful and the
+``<REASON>`` provides an explanation of the result, e.g. the error message
+when the execution has failed. In this protocol, the action procedure can
+return one of the following values:
+
+- ``OK``: the action execution was a complete success;
+- ``ERROR``: the action execution has failed with error messages;
+- ``RETRY``: the action execution has encountered some resource competition
+  situation, so the recommendation is to re-start the action if possible;
+- ``CANCEL``: the action has received a ``CANCEL`` signal and thus has aborted
+  its execution;
+- ``TIMEOUT``: the action has detected a timeout error when performing some
+  time consuming jobs.
+
+When the return value is ``OK``, the action status will be set to
+``SUCCEEDED``; when the return value is ``ERROR`` or ``TIMEOUT``, the action
+status will be set to ``FAILED``; when the return value is ``CANCEL``, the
+action status will be set to ``CANCELLED``; finally, when the return value is
+``RETRY``, the action status is reset to ``READY``, and the current worker
+thread will release its lock on the action so that other threads can pick it
+up when resources permit.
+
+
+------------------
+Creating An Action
+------------------
+
+Currently, Senlin actions are mostly generated from within the Senlin engine,
+either due to a RPC request, or due to aother action's execution.
+
+In future, Senlin plans to support user-defined actions (UDAs). Senlin API will
+provide API for creating an UDA and invoking an action which can be an UDA.
+
+
+---------------
+Listing Actions
+---------------
+
+Senlin provides an ``action_list`` API for users to query the action objects
+in the Senlin database. Such a query request can be accompanied with the
+following query parameters in the query string:
+
+- ``filters``: a map that will be used for filtering out records that fail to
+  match the criteria. The recognizable keys in the map include:
+
+  * ``name``: the name of the actions where the value can be a string or a
+    list of strings;
+  * ``target``: the UUID of the object targeted by the action where the value
+    can be a string or a list of strings;
+  * ``action``: the builtin action for matching where the value can be a
+    string or a list of strings;
+  * ``created_time``: the timestamp the action was created;
+  * ``updated_time``: the timestamp the action as last updated;
+  * ``deleted_time``: the timestamp the action was deleted.
+
+- ``limit``: a number that restricts the maximum number of action records to be
+  returned from the query. It is useful for displaying the records in pages
+  where the page size can be specified as the limit.
+- ``marker``: A string that represents the last seen UUID of actions in
+  previous queries. This query will only return results appearing after the
+  specified UUID. This is useful for displaying records in pages.
+- ``sort_dir``: A string to enforce sorting of the results. It can accept
+  either ``asc`` or ``desc`` as its value.
+- ``sort_keys``: A string or a list of strings where each string gives an
+  action property name used for sorting.
+- ``show_deleted``: A boolean indicating whether deleted actions should be
+  included in the results. The default is False.
+
+
+-----------------
+Getting An Action
+-----------------
+
+Senlin API provides the ``action_show`` API call for software or a user to
+retrieve a specific action for examining its details. When such a query
+arrives at the Senlin engine, the engine will search the database for the
+``action_id`` specified.
+
+User can provide the UUID, the name or the short ID of an action as the
+``action_id`` for query. The Senlin engine will try each of them in sequence.
+When more than one action matches the criteria, an error message is returned
+to user, or else the details of the action object is returned.
+
+
+-------------------
+Signaling An Action
+-------------------
+
+When an action is in ``RUNNING`` status, a user can send signals to it. A
+signal is actually a word that will be written into the ``control`` field of
+the ``action`` table in the database.
+
+When an action is capable of handling signals, it is supposed to check its
+``control`` field in the DB table regularly and abort execution in a graceful
+way. An action has the freedom to check or ignore these signals. In other
+words, Senlin cannot guarantee that a signal will have effect on any action.
+
+The currently supported signal words are:
+
+- ``CANCEL``: this word indicates that the target action should cancel its
+  execution and return when possible;
+- ``SUSPEND``: this word indicates that the target action should suspend its
+  execution when possible. The action doesn't have to return. As an
+  alternative, it can sleep waiting on a ``RESUME`` signal to continue its
+  work;
+- ``RESUME``: this word indicates that the target action, if suspended, should
+  resume its execution.
+
+The support to ``SUSPEND`` and ``RESUME`` signals are still under development.