Merge tag '0.4.0' into debian/experimental

Release 0.4.0

Conflicts:
	doc/source/utils.rst
	requirements.txt
	taskflow/tests/unit/action_engine/test_compile.py
	taskflow/tests/unit/test_action_engine_compile.py
	taskflow/tests/unit/test_flattening.py
Thomas Goirand
2014-10-01 16:31:01 +08:00
152 changed files with 13513 additions and 1760 deletions

View File

@@ -1,8 +1,7 @@
If you would like to contribute to the development of OpenStack,
you must follow the steps in the "If you're a developer, start here"
section of this page:
you must follow the steps documented at:
http://wiki.openstack.org/HowToContribute
http://wiki.openstack.org/HowToContribute#If_you.27re_a_developer
Once those steps have been completed, changes to OpenStack
should be submitted for review via the Gerrit tool, following

View File

@@ -20,13 +20,14 @@ Requirements
Because TaskFlow has many optional (pluggable) parts like persistence
backends and engines, we decided to split our requirements into two
parts: - things that are absolutely required by TaskFlow (you cant use
TaskFlow without them) are put to ``requirements.txt``; - things that
are required by some optional part of TaskFlow (you can use TaskFlow
without them) are put to ``optional-requirements.txt``; if you want to
use the feature in question, you should add that requirements to your
project or environment; - as usual, things that required only for
running tests are put to ``test-requirements.txt``.
parts:

- things that are absolutely required by TaskFlow (you can't use TaskFlow
  without them) are put into ``requirements-pyN.txt`` (``N`` being the Python
  *major* version number used to install the package);
- things that are required by some optional part of TaskFlow (you can use
  TaskFlow without them) are put into ``optional-requirements.txt``; if you
  want to use the feature in question, you should add those requirements to
  your project or environment;
- as usual, things that are required only for running tests are put into
  ``test-requirements.txt``.
Tox.ini
~~~~~~~

doc/diagrams/core.graffle (new file, 8023 lines)

File diff suppressed because it is too large.

View File

@@ -53,8 +53,8 @@ the task.
... def execute(self, spam, eggs):
... return spam + eggs
...
>>> MyTask().requires
set(['eggs', 'spam'])
>>> sorted(MyTask().requires)
['eggs', 'spam']
Inference from the method signature is the *simplest* way to specify task
arguments. Optional arguments (with default values), and special arguments like

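For context, the behavior this doctest exercises can be reproduced with a
short standalone sketch (assuming TaskFlow is installed; the task mirrors the
one in the doctest above)::

    from taskflow import task

    class MyTask(task.Task):
        # Arguments of execute() (other than 'self') are inferred as the
        # task's required inputs.
        def execute(self, spam, eggs):
            return spam + eggs

    # ``requires`` is a set, so sort it for deterministic output.
    print(sorted(MyTask().requires))  # ['eggs', 'spam']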
View File

@@ -24,9 +24,41 @@ They are responsible for the following:
.. note::
They are inspired by and have similar responsiblities
They are inspired by and have similar responsibilities
as `railroad conductors`_.
Considerations
==============
Some usage considerations should be kept in mind when using a conductor to
make sure it's used in a safe and reliable manner. Eventually we hope to make
these non-issues but for now they are worth mentioning.
Endless cycling
---------------
**What:** A job that fails (due to some type of internal error) on one
conductor will be abandoned by that conductor and then another conductor may
experience those same errors and abandon it (and repeat). This will create a
job abandonment cycle that will continue for as long as the job exists in a
claimable state.
**Example:**
.. image:: img/conductor_cycle.png
:scale: 70%
:alt: Conductor cycling
**Alleviate by:**
#. Forcefully delete jobs that have been failing continuously after a given
number of conductor attempts. This can be done either manually or
automatically via scripts (or other associated monitoring); a sketch of
this follows below.
#. Resolve the internal error's cause (storage backend failure, other...).
#. Help implement `jobboard garbage binning`_.
.. _jobboard garbage binning: https://blueprints.launchpad.net/taskflow/+spec/jobboard-garbage-bin
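To make the first alleviation above concrete, a *hypothetical* monitoring
sketch (the ``trash`` call assumes the garbage-binning blueprint is available;
how ``attempt_counts`` is populated is left to the surrounding monitoring)::

    MAX_ATTEMPTS = 3

    def prune_cycling_jobs(board, attempt_counts):
        # attempt_counts: job uuid -> number of times some conductor has
        # claimed and then abandoned that job.
        for job in board.iterjobs(ensure_fresh=True):
            if attempt_counts.get(job.uuid, 0) > MAX_ATTEMPTS:
                # Forcefully remove the job to break the abandonment cycle.
                board.trash(job, who='cycle-monitor')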
Interfaces
==========

View File

@@ -11,6 +11,7 @@ sys.path.insert(0, os.path.abspath('../..'))
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.doctest',
'sphinx.ext.extlinks',
'sphinx.ext.inheritance_diagram',
'sphinx.ext.intersphinx',
'sphinx.ext.viewcode',
@@ -37,6 +38,7 @@ exclude_patterns = ['_build']
# General information about the project.
project = u'TaskFlow'
copyright = u'2013-2014, OpenStack Foundation'
source_tree = 'http://git.openstack.org/cgit/openstack/taskflow/tree'
# If true, '()' will be appended to :func: etc. cross-reference text.
add_function_parentheses = True
@@ -51,6 +53,10 @@ pygments_style = 'sphinx'
# Prefixes that are ignored for sorting the Python module index
modindex_common_prefix = ['taskflow.']
# Shortened external links.
extlinks = {
'example': (source_tree + '/taskflow/examples/%s.py', ''),
}
# -- Options for HTML output --------------------------------------------------

View File

@@ -1,11 +1,165 @@
========
Examples
========
Making phone calls
==================
While developing TaskFlow the team has worked hard to make sure the concepts
that TaskFlow provides are explained by *relevant* examples. To explore these
please check out the `examples`_ directory in the TaskFlow source tree. If the
examples provided are not satisfactory (or up to your standards) contributions
are welcome and very much appreciated to improve them.
.. note::
.. _examples: http://git.openstack.org/cgit/openstack/taskflow/tree/taskflow/examples
Full source located at :example:`simple_linear`.
.. literalinclude:: ../../taskflow/examples/simple_linear.py
:language: python
:linenos:
:lines: 16-
Making phone calls (automatically reverting)
============================================
.. note::
Full source located at :example:`reverting_linear`.
.. literalinclude:: ../../taskflow/examples/reverting_linear.py
:language: python
:linenos:
:lines: 16-
Building a car
==============
.. note::
Full source located at :example:`build_a_car`.
.. literalinclude:: ../../taskflow/examples/build_a_car.py
:language: python
:linenos:
:lines: 16-
Linear equation solver (explicit dependencies)
==============================================
.. note::
Full source located at :example:`calculate_linear`.
.. literalinclude:: ../../taskflow/examples/calculate_linear.py
:language: python
:linenos:
:lines: 16-
Linear equation solver (inferred dependencies)
==============================================
``Source:`` :example:`graph_flow.py`
.. literalinclude:: ../../taskflow/examples/graph_flow.py
:language: python
:linenos:
:lines: 16-
Linear equation solver (in parallel)
====================================
.. note::
Full source located at :example:`calculate_in_parallel`
.. literalinclude:: ../../taskflow/examples/calculate_in_parallel.py
:language: python
:linenos:
:lines: 16-
Creating a volume (in parallel)
===============================
.. note::
Full source located at :example:`create_parallel_volume`
.. literalinclude:: ../../taskflow/examples/create_parallel_volume.py
:language: python
:linenos:
:lines: 16-
Storing & emitting a bill
=========================
.. note::
Full source located at :example:`fake_billing`
.. literalinclude:: ../../taskflow/examples/fake_billing.py
:language: python
:linenos:
:lines: 16-
Suspending a workflow & resuming
================================
.. note::
Full source located at :example:`resume_from_backend`
.. literalinclude:: ../../taskflow/examples/resume_from_backend.py
:language: python
:linenos:
:lines: 16-
Creating a virtual machine (resumable)
======================================
.. note::
Full source located at :example:`resume_vm_boot`
.. literalinclude:: ../../taskflow/examples/resume_vm_boot.py
:language: python
:linenos:
:lines: 16-
Creating a volume (resumable)
=============================
.. note::
Full source located at :example:`resume_volume_create`
.. literalinclude:: ../../taskflow/examples/resume_volume_create.py
:language: python
:linenos:
:lines: 16-
Running engines via iteration
=============================
.. note::
Full source located at :example:`run_by_iter`
.. literalinclude:: ../../taskflow/examples/run_by_iter.py
:language: python
:linenos:
:lines: 16-
Controlling retries using a retry controller
============================================
.. note::
Full source located at :example:`retry_flow`
.. literalinclude:: ../../taskflow/examples/retry_flow.py
:language: python
:linenos:
:lines: 16-
Distributed execution (simple)
==============================
.. note::
Full source located at :example:`wbe_simple_linear`
.. literalinclude:: ../../taskflow/examples/wbe_simple_linear.py
:language: python
:linenos:
:lines: 16-

(Binary image diffs not shown: several diagram images under
``doc/source/img`` were added, replaced or removed, including the new
``conductor_cycle.png`` and the SVG state diagrams for the engine, flow,
task and retry states that supersede the previous PNG renderings.)

View File

@@ -1,13 +1,14 @@
TaskFlow
========
*TaskFlow is a Python library for OpenStack that helps make task execution
easy, consistent, and reliable.*
*TaskFlow is a Python library that helps to make task execution easy,
consistent and reliable.* [#f1]_
.. note::
Additional documentation is also hosted on wiki:
https://wiki.openstack.org/wiki/TaskFlow
If you are just getting started or looking for an overview please
visit: http://wiki.openstack.org/wiki/TaskFlow which provides better
introductory material, description of high level goals and related content.
Contents
========
@@ -33,6 +34,55 @@ Contents
workers
Examples
--------
While developing TaskFlow the team has worked *hard* to make sure the various
concepts are explained by *relevant* examples. Here are a few selected examples
to get started (ordered by *perceived* complexity):
.. toctree::
:maxdepth: 2
examples
To explore more of these examples please check out the `examples`_ directory
in the TaskFlow `source tree`_.
.. note::
If the examples provided are not satisfactory (or up to your
standards) contributions are welcome and very much appreciated to help
improve them. The higher the quality and the clearer the examples are, the
better and more useful they are for everyone.
.. _examples: http://git.openstack.org/cgit/openstack/taskflow/tree/taskflow/examples
.. _source tree: http://git.openstack.org/cgit/openstack/taskflow/
Considerations
--------------
Things to consider before (and during) development and integration with
TaskFlow into your project:
* Read over the `paradigm shifts`_ and engage the team in `IRC`_ (or via the
`openstack-dev`_ mailing list) if these need more explanation (prefix
``[TaskFlow]`` to your email's subject to get an even faster response).
* Follow (or at least attempt to follow) some of the established
`best practices`_ (feel free to add your own suggested best practices).
.. warning::
External usage of internal helpers and other internal utility functions
and modules should be kept to a *minimum* as these may be altered,
refactored or moved *without* notice. If you are unsure whether to use
a function, class, or module, please ask (see above).
.. _IRC: irc://chat.freenode.net/openstack-state-management
.. _best practices: http://wiki.openstack.org/wiki/TaskFlow/Best_practices
.. _paradigm shifts: http://wiki.openstack.org/wiki/TaskFlow/Paradigm_shifts
.. _openstack-dev: mailto:openstack-dev@lists.openstack.org
Miscellaneous
-------------
@@ -40,9 +90,7 @@ Miscellaneous
:maxdepth: 2
exceptions
utils
states
examples
Indices and tables
==================
@@ -51,3 +99,7 @@ Indices and tables
* :ref:`modindex`
* :ref:`search`
.. [#f1] It should be noted that even though it is designed with OpenStack
integration in mind, and that is where most of its *current*
integration is, it aims to be generally usable and useful in any
project.

View File

@@ -34,6 +34,7 @@ set of names of such values is available via ``provides`` property of the flow.
from taskflow import task
from taskflow.patterns import linear_flow
from taskflow import engines
from pprint import pprint
For example:
@@ -118,10 +119,11 @@ of the engine helpers (:py:func:`~taskflow.engines.helpers.run` or
>>> flo = linear_flow.Flow("cat-dog")
>>> flo.add(CatTalk(), DogTalk(provides="dog"))
<taskflow.patterns.linear_flow.Flow object at 0x...>
>>> engines.run(flo, store={'meow': 'meow', 'woof': 'woof'})
>>> result = engines.run(flo, store={'meow': 'meow', 'woof': 'woof'})
meow
woof
{'meow': 'meow', 'woof': 'woof', 'dog': 'dog'}
>>> pprint(result)
{'dog': 'dog', 'meow': 'meow', 'woof': 'woof'}
You can also directly interact with the engine storage layer to add additional
values, note that if this route is used you can't use
@@ -145,7 +147,7 @@ Outputs
As you can see from examples above, the run method returns all flow outputs in
a ``dict``. This same data can be fetched via
:py:meth:`~taskflow.storage.Storage.fetch_all` method of the storage. You can
also get single results using :py:meth:`~taskflow.storage.Storage.fetch_all`.
also get single results using :py:meth:`~taskflow.storage.Storage.fetch`.
For example:
.. doctest::
@@ -154,8 +156,8 @@ For example:
>>> eng.run()
meow
woof
>>> print(eng.storage.fetch_all())
{'meow': 'meow', 'woof': 'woof', 'dog': 'dog'}
>>> pprint(eng.storage.fetch_all())
{'dog': 'dog', 'meow': 'meow', 'woof': 'woof'}
>>> print(eng.storage.fetch("dog"))
dog
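Pulling the pieces of this doctest together, a self-contained sketch (the
``CatTalk``/``DogTalk`` bodies are reconstructed from the doctest output, not
copied from the source)::

    from pprint import pprint

    from taskflow import engines
    from taskflow.patterns import linear_flow
    from taskflow import task

    class CatTalk(task.Task):
        def execute(self, meow):
            print(meow)

    class DogTalk(task.Task):
        def execute(self, woof):
            print(woof)
            return 'dog'  # stored under the name given via provides=

    flo = linear_flow.Flow("cat-dog")
    flo.add(CatTalk(), DogTalk(provides="dog"))
    # run() returns all flow outputs in a dict; single values can later
    # be fetched via the engine's storage (as shown above).
    result = engines.run(flo, store={'meow': 'meow', 'woof': 'woof'})
    pprint(result)  # {'dog': 'dog', 'meow': 'meow', 'woof': 'woof'}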

View File

@@ -214,7 +214,7 @@ the engine can immediately stop doing further work. The effect that this causes
is that when a claim is lost another engine can immediately attempt to acquire
the claim that was previously lost and it *could* begin working on the
unfinished tasks that the later engine may also still be executing (since that
engine is not yet aware that it has lost the claim).
engine is not yet aware that it has *lost* the claim).
**TLDR:** not `preemptable`_, possible to become aware of losing a claim
after the fact (at the next state change), another engine could have acquired
@@ -235,8 +235,8 @@ the claim by then, therefore both would be *working* on a job.
#. Delay claiming partially completed work by adding a wait period (to allow
the previous engine to coalesce) before working on a partially completed job
(combine this with the prior suggestions and dual-engine issues should be
avoided).
(combine this with the prior suggestions and *most* dual-engine issues
should be avoided).
.. _idempotent: http://en.wikipedia.org/wiki/Idempotence
.. _preemptable: http://en.wikipedia.org/wiki/Preemption_%28computing%29

View File

@@ -7,46 +7,40 @@ States
Engine
======
.. image:: img/engine_states.png
:height: 265px
:align: right
.. image:: img/engine_states.svg
:width: 660px
:align: left
:alt: Action engine state transitions
Executing
---------
**RESUMING** - Prepares flow & atoms to be resumed.
**RESUMING** - Prepare flow to be resumed.
**SCHEDULING** - Schedules and submits atoms to be worked on.
**SCHEDULING** - Schedule nodes to be worked on.
**WAITING** - Wait for atoms to finish executing.
**WAITING** - Wait for nodes to finish executing.
**ANALYZING** - Analyzes and processes result/s of atom completion.
**ANALYZING** - Analyze and process result/s of node completion.
**SUCCESS** - Completed successfully.
End
---
**SUCCESS** - Engine completed successfully.
**REVERTED** - Engine reverting was induced and all nodes were not completed
**REVERTED** - Reverting was induced and all atoms were **not** completed
successfully.
**SUSPENDED** - Engine was suspended while running..
**SUSPENDED** - Suspended while running.
Flow
====
.. image:: img/flow_states.png
:height: 400px
:align: right
.. image:: img/flow_states.svg
:width: 660px
:align: left
:alt: Flow state transitions
**PENDING** - A flow starts its life in this state.
**RUNNING** - In this state the flow makes progress, executes and/or reverts its
tasks.
atoms.
**SUCCESS** - Once all tasks have finished successfully the flow transitions to
**SUCCESS** - Once all atoms have finished successfully the flow transitions to
the SUCCESS state.
**REVERTED** - The flow transitions to this state when it has been reverted
@@ -57,14 +51,14 @@ after the failure.
**SUSPENDING** - In the RUNNING state the flow can be suspended. When this
happens, flow transitions to the SUSPENDING state immediately. In that state
the engine running the flow waits for running tasks to finish (since the engine
can not preempt tasks that are active).
the engine running the flow waits for running atoms to finish (since the engine
can not preempt atoms that are active).
**SUSPENDED** - When no tasks are running and all results received so far are
**SUSPENDED** - When no atoms are running and all results received so far are
saved, the flow transitions from the SUSPENDING state to SUSPENDED. Also it may
go to the SUCCESS state if all tasks were in fact ran, or to the REVERTED state
if the flow was reverting and all tasks were reverted while the engine was
waiting for running tasks to finish, or to the FAILURE state if tasks were run
go to the SUCCESS state if all atoms were in fact run, or to the REVERTED state
if the flow was reverting and all atoms were reverted while the engine was
waiting for running atoms to finish, or to the FAILURE state if atoms were run
or reverted and some of them failed.
**RESUMING** - When the flow is interrupted 'in a hard way' (e.g. server
@@ -79,24 +73,25 @@ From the SUCCESS, FAILURE or REVERTED states the flow can be ran again (and
thus it goes back into the RUNNING state). One of the possible use cases for
this transition is to allow for alteration of a flow or flow details associated
with a previously ran flow after the flow has finished, and client code wants
to ensure that each task from this new (potentially updated) flow has its
to ensure that each atom from this new (potentially updated) flow has its
chance to run.
.. note::
The current code also contains strong checks during each flow state
transition using the model described above and raises the InvalidState
exception if an invalid transition is attempted. This exception being
triggered usually means there is some kind of bug in the engine code or some
type of misuse/state violation is occurring, and should be reported as such.
transition using the model described above and raises the
:py:class:`~taskflow.exceptions.InvalidState` exception if an invalid
transition is attempted. This exception being triggered usually means there
is some kind of bug in the engine code or some type of misuse/state violation
is occurring, and should be reported as such.
Task
====
.. image:: img/task_states.png
:height: 265px
:align: right
.. image:: img/task_states.svg
:width: 660px
:align: left
:alt: Task state transitions
**PENDING** - When a task is added to a flow, it starts in the PENDING state,
@@ -105,7 +100,8 @@ on to complete. The task transitions to the PENDING state after it was
reverted and its flow was restarted or retried.
**RUNNING** - When flow starts to execute the task, it transitions to the
RUNNING state, and stays in this state until its execute() method returns.
RUNNING state, and stays in this state until its
:py:meth:`execute() <taskflow.task.BaseTask.execute>` method returns.
**SUCCESS** - The task transitions to this state after it was finished
successfully.
@@ -115,20 +111,20 @@ error. When the flow containing this task is being reverted, all its tasks are
walked in particular order.
**REVERTING** - The task transitions to this state when the flow starts to
revert it and its revert() method is called. Only tasks in the SUCCESS or
FAILURE state can be reverted. If this method fails (raises exception), task
goes to the FAILURE state.
revert it and its :py:meth:`revert() <taskflow.task.BaseTask.revert>` method
is called. Only tasks in the SUCCESS or FAILURE state can be reverted. If this
method fails (raises exception), the task goes to the FAILURE state.
**REVERTED** - The task that has been reverted appears it this state.
**REVERTED** - A task that has been reverted appears in this state.
Retry
=====
.. image:: img/retry_states.png
:height: 275px
:align: right
:alt: Task state transitions
.. image:: img/retry_states.svg
:width: 660px
:align: left
:alt: Retry state transitions
Retry has the same states as a task and one additional state.
@@ -138,7 +134,8 @@ on to complete. The retry transitions to the PENDING state after it was
reverted and its flow was restarted or retried.
**RUNNING** - When flow starts to execute the retry, it transitions to the
RUNNING state, and stays in this state until its execute() method returns.
RUNNING state, and stays in this state until its
:py:meth:`execute() <taskflow.retry.Retry.execute>` method returns.
**SUCCESS** - The retry transitions to this state after it was finished
successfully.
@@ -148,14 +145,12 @@ error. When the flow containing this retry is being reverted, all its tasks are
walked in particular order.
**REVERTING** - The retry transitions to this state when the flow starts to
revert it and its revert() method is called. Only retries in SUCCESS or FAILURE
state can be reverted. If this method fails (raises exception), task goes to
the FAILURE.
revert it and its :py:meth:`revert() <taskflow.retry.Retry.revert>` method is
called. Only retries in SUCCESS or FAILURE state can be reverted. If this
method fails (raises exception), the retry goes to the FAILURE state.
**REVERTED** - The retry that has been reverted appears it this state.
**REVERTED** - A retry that has been reverted appears in this state.
**RETRYING** - If flow that is managed by the current retry was failed and
reverted, the retry prepares it for the next run and transitions to the
reverted, the engine prepares it for the next run and transitions to the
RETRYING state.
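Client code can observe the engine-level states described above by driving an
engine iteratively; a minimal sketch (assuming the default action engine)::

    from taskflow import engines
    from taskflow.patterns import linear_flow
    from taskflow import task

    class Noop(task.Task):
        def execute(self):
            pass

    eng = engines.load(linear_flow.Flow("watch-states").add(Noop()))
    # run_iter() drives the engine and yields its intermediate states
    # (RESUMING, SCHEDULING, WAITING, ANALYZING, ...) until terminal.
    for state in eng.run_iter():
        print(state)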

View File

@@ -1,15 +0,0 @@
-----
Utils
-----
.. warning::
External usage of internal helpers and other internal utility functions
and modules should be kept to a *minimum* as these may be altered,
refactored or moved *without* notice.
The following classes and modules though may be used:
.. autoclass:: taskflow.utils.misc.Failure
.. autoclass:: taskflow.utils.eventlet_utils.GreenExecutor
.. automodule:: taskflow.utils.persistence_utils

View File

@@ -7,8 +7,7 @@ Overview
This is an engine that schedules tasks to **workers** -- separate processes
dedicated to certain atoms' execution, possibly running on other machines,
connected via `amqp`_ (or other supported `kombu
<http://kombu.readthedocs.org/>`_ transports).
connected via `amqp`_ (or other supported `kombu`_ transports).
.. note::
@@ -18,6 +17,7 @@ connected via `amqp`_ (or other supported `kombu
production ready.
.. _blueprint page: https://blueprints.launchpad.net/taskflow?searchtext=wbe
.. _kombu: http://kombu.readthedocs.org/
Terminology
-----------

View File

@@ -1,8 +1,11 @@
# This file lists dependencies that are used by different
# pluggable (optional) parts of TaskFlow, like engines
# or persistence backends. They are not strictly required
# by TaskFlow (you can use TaskFlow without them), but
# so they don't go to requirements.txt.
# This file lists dependencies that are used by different pluggable (optional)
# parts of TaskFlow, like engines or persistence backends. They are not
# strictly required by TaskFlow (aka you can use TaskFlow without them), so
# they don't go into one of the requirements.txt files.
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
# Database (sqlalchemy) persistence:
SQLAlchemy>=0.7.8,<=0.9.99

requirements-py2.txt (new file, 22 lines)
View File

@@ -0,0 +1,22 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
# Packages needed for using this library.
anyjson>=0.3.3
iso8601>=0.1.9
# Only needed on python 2.6
ordereddict
# Python 2->3 compatibility library.
six>=1.7.0
# Very nice graph library
networkx>=1.8
Babel>=1.3
# Used for backend storage engine loading.
stevedore>=0.14
# Backport for concurrent.futures which exists in 3.2+
futures>=2.1.6
# Used for structured input validation
jsonschema>=2.0.0,<3.0.0
# For pretty printing state-machine tables
PrettyTable>=0.7,<0.8

requirements-py3.txt (new file, 18 lines)
View File

@@ -0,0 +1,18 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
# Packages needed for using this library.
anyjson>=0.3.3
iso8601>=0.1.9
# Python 2->3 compatibility library.
six>=1.7.0
# Very nice graph library
networkx>=1.8
Babel>=1.3
# Used for backend storage engine loading.
stevedore>=0.14
# Used for structured input validation
jsonschema>=2.0.0,<3.0.0
# For pretty printing state-machine tables
PrettyTable>=0.7,<0.8

View File

@@ -1,13 +0,0 @@
# Packages needed for using this library.
pbr>=0.6,!=0.7,<1.0
anyjson>=0.3.3
iso8601>=0.1.9
# Python 2->3 compatibility library.
six>=1.6.0
# Very nice graph library
networkx>=1.8
Babel>=1.3
# Used for backend storage engine loading.
stevedore>=0.14
# Backport for concurrent.futures which exists in 3.2+
futures>=2.1.3

View File

@@ -78,7 +78,9 @@ def _build_rebind_dict(args, rebind_args):
def _build_arg_mapping(atom_name, reqs, rebind_args, function, do_infer,
ignore_list=None):
"""Given a function, its requirements and a rebind mapping this helper
"""Builds an input argument mapping for a given function.
Given a function, its requirements and a rebind mapping this helper
function will build the correct argument mapping for the given function as
well as verify that the final argument mapping does not have missing or
extra arguments (where applicable).
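From the user-facing side, the mapping this helper builds is what task
rebinding produces; a small sketch (the stored name ``'breakfast'`` is made up
for illustration)::

    from taskflow import task

    class Cook(task.Task):
        def execute(self, spam):
            return spam + '!'

    # Map the execute() argument 'spam' onto the stored name 'breakfast';
    # the helper above builds and validates this mapping internally.
    cook = Cook(rebind={'spam': 'breakfast'})
    print(cook.rebind)  # {'spam': 'breakfast'}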

View File

@@ -24,7 +24,9 @@ from taskflow.utils import lock_utils
@six.add_metaclass(abc.ABCMeta)
class Conductor(object):
"""Conductors act as entities which extract jobs from a jobboard, assign
"""Conductors conduct jobs & assist in associated runtime interactions.
Conductors act as entities which extract jobs from a jobboard, assign
their work to some engine (using some desired configuration) and then wait
for that work to complete. If the work fails then they abandon the claimed
work (or if the process they are running in crashes or dies this
@@ -99,13 +101,13 @@ class Conductor(object):
@abc.abstractmethod
def run(self):
"""Continuously claims, runs, and consumes jobs, and waits for more
jobs when there are none left on the jobboard.
"""
"""Continuously claims, runs, and consumes jobs (and repeat)."""
@abc.abstractmethod
def _dispatch_job(self, job):
"""Accepts a single (already claimed) job and causes it to be run in
"""Dispatches a claimed job for work completion.
Accepts a single (already claimed) job and causes it to be run in
an engine. Returns a boolean that signifies whether the job should
be consumed. The job is consumed upon completion (unless False is
returned which will signify the job should be abandoned instead).

View File

@@ -20,8 +20,8 @@ import six
from taskflow.conductors import base
from taskflow import exceptions as excp
from taskflow.listeners import logging as logging_listener
from taskflow.types import timing as tt
from taskflow.utils import lock_utils
from taskflow.utils import misc
LOG = logging.getLogger(__name__)
WAIT_TIMEOUT = 0.5
@@ -58,8 +58,8 @@ class SingleThreadedConductor(base.Conductor):
if wait_timeout is None:
wait_timeout = WAIT_TIMEOUT
if isinstance(wait_timeout, (int, float) + six.string_types):
self._wait_timeout = misc.Timeout(float(wait_timeout))
elif isinstance(wait_timeout, misc.Timeout):
self._wait_timeout = tt.Timeout(float(wait_timeout))
elif isinstance(wait_timeout, tt.Timeout):
self._wait_timeout = wait_timeout
else:
raise ValueError("Invalid timeout literal: %s" % (wait_timeout))
@@ -67,10 +67,13 @@ class SingleThreadedConductor(base.Conductor):
@lock_utils.locked
def stop(self, timeout=None):
"""Requests the conductor to stop dispatching and returns whether the
stop request was successfully completed. If the dispatching is still
occurring then False is returned otherwise True will be returned to
signal that the conductor is no longer dispatching job requests.
"""Requests the conductor to stop dispatching.
This method can be used to request that a conductor stop its
consumption & dispatching loop. It returns whether the stop request
was successfully completed. If the dispatching is still occurring
then False is returned otherwise True will be returned to signal that
the conductor is no longer consuming & dispatching job requests.
NOTE(harlowja): If a timeout is provided the dispatcher loop may
not have ceased by the timeout reached (the request to cease will
@@ -93,17 +96,24 @@ class SingleThreadedConductor(base.Conductor):
engine.run()
except excp.WrappedFailure as e:
if all((f.check(*NO_CONSUME_EXCEPTIONS) for f in e)):
LOG.warn("Job execution failed (consumption being"
" skipped): %s", job, exc_info=True)
consume = False
else:
LOG.warn("Job execution failed: %s", job, exc_info=True)
if LOG.isEnabledFor(logging.WARNING):
if consume:
LOG.warn("Job execution failed (consumption being"
" skipped): %s [%s failures]", job, len(e))
else:
LOG.warn("Job execution failed (consumption"
" proceeding): %s [%s failures]", job, len(e))
# Show the failure/s + traceback (if possible)...
for i, f in enumerate(e):
LOG.warn("%s. %s", i + 1, f.pformat(traceback=True))
except NO_CONSUME_EXCEPTIONS:
LOG.warn("Job execution failed (consumption being"
" skipped): %s", job, exc_info=True)
consume = False
except Exception:
LOG.warn("Job execution failed: %s", job, exc_info=True)
LOG.warn("Job execution failed (consumption proceeding): %s",
job, exc_info=True)
else:
LOG.info("Job completed successfully: %s", job)
return consume
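The consumption decision above hinges on ``Failure.check``; in isolation that
filtering looks roughly like the following sketch (the exact exception types
in ``NO_CONSUME_EXCEPTIONS`` are not shown in this excerpt, so they stay
abstract here)::

    def should_consume(wrapped_failure, no_consume_types):
        # Skip consumption only when *every* contained failure matches
        # one of the known non-consumable exception types.
        return not all(f.check(*no_consume_types) for f in wrapped_failure)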

View File

@@ -22,11 +22,13 @@ from taskflow import states as st
class Analyzer(object):
"""Analyzes a compilation output to get the next atoms for execution or
reversion by utilizing the compilations underlying structures (graphs,
nodes and edge relations...) and using this information along with the
atom state/states stored in storage to provide useful analysis functions
to the rest of the runtime system.
"""Analyzes a compilation and aids in execution processes.
Its primary purpose is to get the next atoms for execution or reversion
by utilizing the compilation's underlying structures (graphs, nodes and
edge relations...) and using this information along with the atom
state/states stored in storage to provide other useful functionality to
the rest of the runtime system.
"""
def __init__(self, compilation, storage):
@@ -56,8 +58,11 @@ class Analyzer(object):
return []
def browse_nodes_for_execute(self, node=None):
"""Browse next nodes to execute for given node if specified and
for whole graph otherwise.
"""Browse next nodes to execute.
This returns a collection of nodes that are ready to be executed, if
given a specific node it will only examine the successors of that node,
otherwise it will examine the whole graph.
"""
if node:
nodes = self._graph.successors(node)
@@ -71,8 +76,11 @@ class Analyzer(object):
return available_nodes
def browse_nodes_for_revert(self, node=None):
"""Browse next nodes to revert for given node if specified and
for whole graph otherwise.
"""Browse next nodes to revert.
This returns a collection of nodes that are ready to be reverted, if
given a specific node it will only examine the predecessors of that
node, otherwise it will examine the whole graph.
"""
if node:
nodes = self._graph.predecessors(node)
@@ -87,7 +95,6 @@ class Analyzer(object):
def _is_ready_for_execute(self, task):
"""Checks if task is ready to be executed."""
state = self.get_state(task)
intention = self._storage.get_atom_intention(task.name)
transition = st.check_task_transition(state, st.RUNNING)
@@ -104,7 +111,6 @@ class Analyzer(object):
def _is_ready_for_revert(self, task):
"""Checks if task is ready to be reverted."""
state = self.get_state(task)
intention = self._storage.get_atom_intention(task.name)
transition = st.check_task_transition(state, st.REVERTING)
@@ -120,15 +126,14 @@ class Analyzer(object):
for state, intention in six.itervalues(task_states))
def iterate_subgraph(self, retry):
"""Iterates a subgraph connected to current retry controller, including
nested retry controllers and its nodes.
"""
"""Iterates a subgraph connected to given retry controller."""
for _src, dst in traversal.dfs_edges(self._graph, retry):
yield dst
def iterate_retries(self, state=None):
"""Iterates retry controllers of a graph with given state or all
retries if state is None.
"""Iterates retry controllers that match the provided state.
If no state is provided it will yield back all retry controllers.
"""
for node in self._graph.nodes_iter():
if isinstance(node, retry_atom.Retry):

View File

@@ -42,8 +42,7 @@ class Compilation(object):
class PatternCompiler(object):
"""Compiles patterns & atoms (potentially nested) into an compilation
unit with a *logically* equivalent directed acyclic graph representation.
"""Compiles patterns & atoms into a compilation unit.
NOTE(harlowja): during this pattern translation process any nested flows
will be converted into their equivalent subgraphs. This currently implies
@@ -51,8 +50,8 @@ class PatternCompiler(object):
be associated with their previously containing flow but instead will lose
this identity and what will remain is the logical constraints that their
contained flow mandated. In the future this may be changed so that this
association is not lost via the compilation process (since it is sometime
useful to retain part of this relationship).
association is not lost via the compilation process (since it can be
useful to retain this relationship).
"""
def compile(self, root):
graph = _Flattener(root).flatten()
@@ -80,9 +79,11 @@ class _Flattener(object):
self._freeze = bool(freeze)
def _add_new_edges(self, graph, nodes_from, nodes_to, edge_attrs):
"""Adds new edges from nodes to other nodes in the specified graph,
with the following edge attributes (defaulting to the class provided
edge_data if None), if the edge does not already exist.
"""Adds new edges from nodes to other nodes in the specified graph.
It will connect the nodes_from to the nodes_to if an edge currently
does *not* exist. When an edge is created the provided edge attributes
will be applied to the new edge between these two nodes.
"""
nodes_to = list(nodes_to)
for u in nodes_from:
@@ -109,8 +110,18 @@ class _Flattener(object):
elif isinstance(item, task.BaseTask):
return self._flatten_task
elif isinstance(item, retry.Retry):
raise TypeError("Retry controller %s (%s) is used not as a flow "
"parameter" % (item, type(item)))
if len(self._history) == 1:
raise TypeError("Retry controller: %s (%s) must only be used"
" as a flow constructor parameter and not as a"
" root component" % (item, type(item)))
else:
# TODO(harlowja): we should raise this type error earlier
# instead of later since we should do this same check on add()
# calls, this makes the error more visible (instead of waiting
# until compile time).
raise TypeError("Retry controller: %s (%s) must only be used"
" as a flow constructor parameter and not as a"
" flow added component" % (item, type(item)))
else:
return None
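For reference, the placement these new error messages ask for looks like this
(a sketch; ``Times`` is one of the retry controllers TaskFlow provides)::

    from taskflow.patterns import linear_flow
    from taskflow import retry

    # Correct: the retry controller is a flow *constructor* parameter.
    flo = linear_flow.Flow("with-retry", retry.Times(3))

    # Incorrect (triggers the TypeError above): using a retry controller
    # as a root component or add()-ing it to a flow like a task.
    # flo.add(retry.Times(3))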

View File

@@ -14,24 +14,33 @@
# License for the specific language governing permissions and limitations
# under the License.
import contextlib
import threading
from taskflow.engines.action_engine import compiler
from taskflow.engines.action_engine import executor
from taskflow.engines.action_engine import runtime
from taskflow.engines import base
from taskflow import exceptions as exc
from taskflow.openstack.common import excutils
from taskflow import retry
from taskflow import states
from taskflow import storage as atom_storage
from taskflow.utils import lock_utils
from taskflow.utils import misc
from taskflow.utils import reflection
@contextlib.contextmanager
def _start_stop(executor):
# A teenie helper context manager to safely start/stop an executor...
executor.start()
try:
yield executor
finally:
executor.stop()
class ActionEngine(base.EngineBase):
"""Generic action-based engine.
@@ -112,31 +121,38 @@ class ActionEngine(base.EngineBase):
"""
self.compile()
self.prepare()
self._task_executor.start()
state = None
runner = self._runtime.runner
try:
last_state = None
with _start_stop(self._task_executor):
self._change_state(states.RUNNING)
for state in runner.run_iter(timeout=timeout):
try:
try_suspend = yield state
except GeneratorExit:
break
else:
if try_suspend:
try:
closed = False
for (last_state, failures) in runner.run_iter(timeout=timeout):
if failures:
misc.Failure.reraise_if_any(failures)
if closed:
continue
try:
try_suspend = yield last_state
except GeneratorExit:
# The generator was closed, attempt to suspend and
# continue looping until we have cleanly closed up
# shop...
closed = True
self.suspend()
except Exception:
with excutils.save_and_reraise_exception():
self._change_state(states.FAILURE)
else:
ignorable_states = getattr(runner, 'ignorable_states', [])
if state and state not in ignorable_states:
self._change_state(state)
if state != states.SUSPENDED and state != states.SUCCESS:
failures = self.storage.get_failures()
misc.Failure.reraise_if_any(failures.values())
finally:
self._task_executor.stop()
else:
if try_suspend:
self.suspend()
except Exception:
with excutils.save_and_reraise_exception():
self._change_state(states.FAILURE)
else:
ignorable_states = getattr(runner, 'ignorable_states', [])
if last_state and last_state not in ignorable_states:
self._change_state(last_state)
if last_state not in [states.SUSPENDED, states.SUCCESS]:
failures = self.storage.get_failures()
misc.Failure.reraise_if_any(failures.values())
def _change_state(self, state):
with self._state_lock:
@@ -144,20 +160,12 @@ class ActionEngine(base.EngineBase):
if not states.check_flow_transition(old_state, state):
return
self.storage.set_flow_state(state)
try:
flow_uuid = self._flow.uuid
except AttributeError:
# NOTE(harlowja): if the flow was just a single task, then it
# will not itself have a uuid, but the constructed flow_detail
# will.
if self._flow_detail is not None:
flow_uuid = self._flow_detail.uuid
else:
flow_uuid = None
details = dict(engine=self,
flow_name=self._flow.name,
flow_uuid=flow_uuid,
old_state=old_state)
details = {
'engine': self,
'flow_name': self.storage.flow_name,
'flow_uuid': self.storage.flow_uuid,
'old_state': old_state,
}
self.notifier.notify(state, details)
def _ensure_storage(self):
@@ -226,9 +234,12 @@ class MultiThreadedActionEngine(ActionEngine):
_storage_factory = atom_storage.MultiThreadedStorage
def _task_executor_factory(self):
return executor.ParallelTaskExecutor(self._executor)
return executor.ParallelTaskExecutor(executor=self._executor,
max_workers=self._max_workers)
def __init__(self, flow, flow_detail, backend, conf, **kwargs):
def __init__(self, flow, flow_detail, backend, conf,
executor=None, max_workers=None):
super(MultiThreadedActionEngine, self).__init__(
flow, flow_detail, backend, conf)
self._executor = kwargs.get('executor')
self._executor = executor
self._max_workers = max_workers

View File

@@ -31,11 +31,14 @@ REVERTED = 'reverted'
def _execute_task(task, arguments, progress_callback):
with task.autobind('update_progress', progress_callback):
try:
task.pre_execute()
result = task.execute(**arguments)
except Exception:
# NOTE(imelnikov): wrap current exception with Failure
# object and return it.
result = misc.Failure()
finally:
task.post_execute()
return (task, EXECUTED, result)
@@ -45,11 +48,14 @@ def _revert_task(task, arguments, result, failures, progress_callback):
kwargs['flow_failures'] = failures
with task.autobind('update_progress', progress_callback):
try:
task.pre_revert()
result = task.revert(**kwargs)
except Exception:
# NOTE(imelnikov): wrap current exception with Failure
# object and return it.
result = misc.Failure()
finally:
task.post_revert()
return (task, REVERTED, result)
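Since the executor now invokes these hooks, a task subclass can use them like
so (a minimal sketch)::

    from taskflow import task

    class AuditedTask(task.Task):
        def pre_execute(self):
            print("about to execute")

        def execute(self):
            return 42

        def post_execute(self):
            # Runs even if execute() raised (note the finally above).
            print("finished executing")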
@@ -105,13 +111,14 @@ class SerialTaskExecutor(TaskExecutorBase):
class ParallelTaskExecutor(TaskExecutorBase):
"""Executes tasks in parallel.
Submits tasks to executor which should provide interface similar
Submits tasks to an executor which should provide an interface similar
to concurrent.futures.Executor.
"""
def __init__(self, executor=None):
def __init__(self, executor=None, max_workers=None):
self._executor = executor
self._own_executor = executor is None
self._max_workers = max_workers
self._create_executor = executor is None
def execute_task(self, task, task_uuid, arguments, progress_callback=None):
return self._executor.submit(
@@ -127,11 +134,14 @@ class ParallelTaskExecutor(TaskExecutorBase):
return async_utils.wait_for_any(fs, timeout)
def start(self):
if self._own_executor:
thread_count = threading_utils.get_optimal_thread_count()
self._executor = futures.ThreadPoolExecutor(thread_count)
if self._create_executor:
if self._max_workers is not None:
max_workers = self._max_workers
else:
max_workers = threading_utils.get_optimal_thread_count()
self._executor = futures.ThreadPoolExecutor(max_workers)
def stop(self):
if self._own_executor:
if self._create_executor:
self._executor.shutdown(wait=True)
self._executor = None
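A hedged sketch of how a caller might use the new ``max_workers`` knob (the
``engine_conf`` string form and keyword passthrough are assumptions about this
release's helper API, and ``flo`` is a previously built flow)::

    from taskflow import engines

    # Keyword arguments given to load() are forwarded to the engine
    # constructor shown above.
    eng = engines.load(flo, engine_conf='parallel', max_workers=4)
    eng.run()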

View File

@@ -17,7 +17,6 @@
import logging
from taskflow.engines.action_engine import executor as ex
from taskflow import exceptions
from taskflow import states
from taskflow.utils import async_utils
from taskflow.utils import misc
@@ -39,27 +38,25 @@ class RetryAction(object):
return kwargs
def change_state(self, retry, state, result=None):
old_state = self._storage.get_atom_state(retry.name)
if old_state == state:
return state != states.PENDING
if state in SAVE_RESULT_STATES:
self._storage.save(retry.name, result, state)
elif state == states.REVERTED:
self._storage.cleanup_retry_history(retry.name, state)
else:
old_state = self._storage.get_atom_state(retry.name)
if state == old_state:
# NOTE(imelnikov): nothing really changed, so we should not
# write anything to storage and run notifications
return
self._storage.set_atom_state(retry.name, state)
retry_uuid = self._storage.get_atom_uuid(retry.name)
details = dict(retry_name=retry.name,
retry_uuid=retry_uuid,
result=result)
self._notifier.notify(state, details)
return True
def execute(self, retry):
if not self.change_state(retry, states.RUNNING):
raise exceptions.InvalidState("Retry controller %s is in invalid "
"state and can't be executed" %
retry.name)
self.change_state(retry, states.RUNNING)
kwargs = self._get_retry_args(retry)
try:
result = retry.execute(**kwargs)
@@ -71,10 +68,7 @@ class RetryAction(object):
return async_utils.make_completed_future((retry, ex.EXECUTED, result))
def revert(self, retry):
if not self.change_state(retry, states.REVERTING):
raise exceptions.InvalidState("Retry controller %s is in invalid "
"state and can't be reverted" %
retry.name)
self.change_state(retry, states.REVERTING)
kwargs = self._get_retry_args(retry)
kwargs['flow_failures'] = self._storage.get_failures()
try:

View File

@@ -14,24 +14,199 @@
# License for the specific language governing permissions and limitations
# under the License.
import logging
from taskflow import states as st
from taskflow.types import fsm
from taskflow.utils import misc
# Waiting state timeout (in seconds).
_WAITING_TIMEOUT = 60
_WAITING_TIMEOUT = 60 # in seconds
# Meta states the state machine uses.
_UNDEFINED = 'UNDEFINED'
_GAME_OVER = 'GAME_OVER'
_META_STATES = (_GAME_OVER, _UNDEFINED)
LOG = logging.getLogger(__name__)
class _MachineMemory(object):
"""State machine memory."""
def __init__(self):
self.next_nodes = set()
self.not_done = set()
self.failures = []
self.done = set()
class _MachineBuilder(object):
"""State machine *builder* that the runner uses.
NOTE(harlowja): the machine states that this builder will create are::
+--------------+-----------+------------+----------+---------+
| Start | Event | End | On Enter | On Exit |
+--------------+-----------+------------+----------+---------+
| ANALYZING | finished | GAME_OVER | on_enter | on_exit |
| ANALYZING | schedule | SCHEDULING | on_enter | on_exit |
| ANALYZING | wait | WAITING | on_enter | on_exit |
| FAILURE[$] | | | | |
| GAME_OVER | failed | FAILURE | on_enter | on_exit |
| GAME_OVER | reverted | REVERTED | on_enter | on_exit |
| GAME_OVER | success | SUCCESS | on_enter | on_exit |
| GAME_OVER | suspended | SUSPENDED | on_enter | on_exit |
| RESUMING | schedule | SCHEDULING | on_enter | on_exit |
| REVERTED[$] | | | | |
| SCHEDULING | wait | WAITING | on_enter | on_exit |
| SUCCESS[$] | | | | |
| SUSPENDED[$] | | | | |
| UNDEFINED[^] | start | RESUMING | on_enter | on_exit |
| WAITING | analyze | ANALYZING | on_enter | on_exit |
+--------------+-----------+------------+----------+---------+
Between any of these yielded states (minus ``GAME_OVER`` and ``UNDEFINED``)
if the engine has been suspended or the engine has failed (due to a
non-resolvable task failure or scheduling failure) the machine will stop
executing new tasks (currently running tasks will be allowed to complete)
and this machine's run loop will be broken.
"""
def __init__(self, runtime, waiter):
self._analyzer = runtime.analyzer
self._completer = runtime.completer
self._scheduler = runtime.scheduler
self._storage = runtime.storage
self._waiter = waiter
def runnable(self):
return self._storage.get_flow_state() == st.RUNNING
def build(self, timeout=None):
memory = _MachineMemory()
if timeout is None:
timeout = _WAITING_TIMEOUT
def resume(old_state, new_state, event):
memory.next_nodes.update(self._completer.resume())
memory.next_nodes.update(self._analyzer.get_next_nodes())
return 'schedule'
def game_over(old_state, new_state, event):
if memory.failures:
return 'failed'
if self._analyzer.get_next_nodes():
return 'suspended'
elif self._analyzer.is_success():
return 'success'
else:
return 'reverted'
def schedule(old_state, new_state, event):
if self.runnable() and memory.next_nodes:
not_done, failures = self._scheduler.schedule(
memory.next_nodes)
if not_done:
memory.not_done.update(not_done)
if failures:
memory.failures.extend(failures)
memory.next_nodes.clear()
return 'wait'
def wait(old_state, new_state, event):
# TODO(harlowja): maybe we should start doing 'yield from' this
# call sometime in the future, or equivalent that will work in
# py2 and py3.
if memory.not_done:
done, not_done = self._waiter.wait_for_any(memory.not_done,
timeout)
memory.done.update(done)
memory.not_done = not_done
return 'analyze'
def analyze(old_state, new_state, event):
next_nodes = set()
while memory.done:
fut = memory.done.pop()
try:
node, event, result = fut.result()
retain = self._completer.complete(node, event, result)
if retain and isinstance(result, misc.Failure):
memory.failures.append(result)
except Exception:
memory.failures.append(misc.Failure())
else:
try:
more_nodes = self._analyzer.get_next_nodes(node)
except Exception:
memory.failures.append(misc.Failure())
else:
next_nodes.update(more_nodes)
if self.runnable() and next_nodes and not memory.failures:
memory.next_nodes.update(next_nodes)
return 'schedule'
elif memory.not_done:
return 'wait'
else:
return 'finished'
def on_exit(old_state, event):
LOG.debug("Exiting old state '%s' in response to event '%s'",
old_state, event)
def on_enter(new_state, event):
LOG.debug("Entering new state '%s' in response to event '%s'",
new_state, event)
# NOTE(harlowja): when run in debugging mode it is quite useful
# to track the various state transitions as they happen...
watchers = {}
if LOG.isEnabledFor(logging.DEBUG):
watchers['on_exit'] = on_exit
watchers['on_enter'] = on_enter
m = fsm.FSM(_UNDEFINED)
m.add_state(_GAME_OVER, **watchers)
m.add_state(_UNDEFINED, **watchers)
m.add_state(st.ANALYZING, **watchers)
m.add_state(st.RESUMING, **watchers)
m.add_state(st.REVERTED, terminal=True, **watchers)
m.add_state(st.SCHEDULING, **watchers)
m.add_state(st.SUCCESS, terminal=True, **watchers)
m.add_state(st.SUSPENDED, terminal=True, **watchers)
m.add_state(st.WAITING, **watchers)
m.add_state(st.FAILURE, terminal=True, **watchers)
m.add_transition(_GAME_OVER, st.REVERTED, 'reverted')
m.add_transition(_GAME_OVER, st.SUCCESS, 'success')
m.add_transition(_GAME_OVER, st.SUSPENDED, 'suspended')
m.add_transition(_GAME_OVER, st.FAILURE, 'failed')
m.add_transition(_UNDEFINED, st.RESUMING, 'start')
m.add_transition(st.ANALYZING, _GAME_OVER, 'finished')
m.add_transition(st.ANALYZING, st.SCHEDULING, 'schedule')
m.add_transition(st.ANALYZING, st.WAITING, 'wait')
m.add_transition(st.RESUMING, st.SCHEDULING, 'schedule')
m.add_transition(st.SCHEDULING, st.WAITING, 'wait')
m.add_transition(st.WAITING, st.ANALYZING, 'analyze')
m.add_reaction(_GAME_OVER, 'finished', game_over)
m.add_reaction(st.ANALYZING, 'analyze', analyze)
m.add_reaction(st.RESUMING, 'start', resume)
m.add_reaction(st.SCHEDULING, 'schedule', schedule)
m.add_reaction(st.WAITING, 'wait', wait)
return (m, memory)
class Runner(object):
"""Runner that iterates while executing nodes using the given runtime.
This runner acts as the action engine run loop, it resumes the workflow,
schedules all task it can for execution using the runtimes scheduler and
analyzer components, and than waits on returned futures and then activates
the runtimes completion component to finish up those tasks.
This process repeats until the analzyer runs out of next nodes, when the
scheduler can no longer schedule tasks or when the the engine has been
suspended or a task has failed and that failure could not be resolved.
This runner acts as the action engine run loop/state-machine, it resumes
the workflow, schedules all tasks it can for execution using the runtime's
scheduler and analyzer components, and then waits on returned futures and
then activates the runtime's completion component to finish up those tasks
and so on...
NOTE(harlowja): If the runtime's scheduler component is able to schedule
tasks in parallel, this enables parallel running and/or reversion.
@@ -43,94 +218,22 @@ class Runner(object):
ignorable_states = (st.SCHEDULING, st.WAITING, st.RESUMING, st.ANALYZING)
def __init__(self, runtime, waiter):
self._scheduler = runtime.scheduler
self._completer = runtime.completer
self._storage = runtime.storage
self._analyzer = runtime.analyzer
self._waiter = waiter
self._builder = _MachineBuilder(runtime, waiter)
def is_running(self):
return self._storage.get_flow_state() == st.RUNNING
@property
def builder(self):
return self._builder
def runnable(self):
return self._builder.runnable()
def run_iter(self, timeout=None):
"""Runs the nodes using the runtime components.
NOTE(harlowja): the states that this generator will go through are:
RESUMING -> SCHEDULING
SCHEDULING -> WAITING
WAITING -> ANALYZING
ANALYZING -> SCHEDULING
Between any of these yielded states if the engine has been suspended
or the engine has failed (due to a non-resolveable task failure or
scheduling failure) the engine will stop executing new tasks (currently
running tasks will be allowed to complete) and this iteration loop
will be broken.
"""
if timeout is None:
timeout = _WAITING_TIMEOUT
# Prepare flow to be resumed
yield st.RESUMING
next_nodes = self._completer.resume()
next_nodes.update(self._analyzer.get_next_nodes())
# Schedule nodes to be worked on
yield st.SCHEDULING
if self.is_running():
not_done, failures = self._scheduler.schedule(next_nodes)
else:
not_done, failures = (set(), [])
# Run!
#
# At this point we need to ensure we wait for all active nodes to
# finish running (even if we are asked to suspend) since we can not
# preempt those tasks (maybe in the future we will be better able to do
# this).
while not_done:
yield st.WAITING
# TODO(harlowja): maybe we should start doing 'yield from' this
# call sometime in the future, or equivalent that will work in
# py2 and py3.
done, not_done = self._waiter.wait_for_any(not_done, timeout)
# Analyze the results and schedule more nodes (unless we had
# failures). If failures occurred just continue processing what
# is running (so that we don't leave it abandoned) but do not
# schedule anything new.
yield st.ANALYZING
next_nodes = set()
for future in done:
try:
node, event, result = future.result()
retain = self._completer.complete(node, event, result)
if retain and isinstance(result, misc.Failure):
failures.append(result)
except Exception:
failures.append(misc.Failure())
"""Runs the nodes using a built state machine."""
machine, memory = self.builder.build(timeout=timeout)
for (_prior_state, new_state) in machine.run_iter('start'):
# NOTE(harlowja): skip over meta-states.
if new_state not in _META_STATES:
if new_state == st.FAILURE:
yield (new_state, memory.failures)
else:
try:
more_nodes = self._analyzer.get_next_nodes(node)
except Exception:
failures.append(misc.Failure())
else:
next_nodes.update(more_nodes)
if next_nodes and not failures and self.is_running():
yield st.SCHEDULING
# Recheck incase someone suspended it.
if self.is_running():
more_not_done, failures = self._scheduler.schedule(
next_nodes)
not_done.update(more_not_done)
if failures:
misc.Failure.reraise_if_any(failures)
if self._analyzer.get_next_nodes():
yield st.SUSPENDED
elif self._analyzer.is_success():
yield st.SUCCESS
else:
yield st.REVERTED
yield (new_state, [])
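Since the builder above leans entirely on ``taskflow.types.fsm``, a tiny
standalone sketch of that type's API (as used in this diff; the states and
events here are made up)::

    from taskflow.types import fsm

    m = fsm.FSM('UNDEFINED')
    m.add_state('UNDEFINED')
    m.add_state('RESUMING')
    m.add_state('DONE', terminal=True)
    m.add_transition('UNDEFINED', 'RESUMING', 'start')
    m.add_transition('RESUMING', 'DONE', 'finish')
    # A reaction returns the next event to process after a state is
    # entered due to the given event.
    m.add_reaction('RESUMING', 'start', lambda old, new, event: 'finish')
    for prior_state, new_state in m.run_iter('start'):
        print(prior_state, '->', new_state)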

View File

@@ -14,23 +14,24 @@
# License for the specific language governing permissions and limitations
# under the License.
from taskflow.engines.action_engine import analyzer as ca
from taskflow.engines.action_engine import executor as ex
from taskflow.engines.action_engine import retry_action as ra
from taskflow.engines.action_engine import runner as ru
from taskflow.engines.action_engine import task_action as ta
from taskflow import exceptions as excp
from taskflow import retry as retry_atom
from taskflow import states as st
from taskflow import task as task_atom
from taskflow.utils import misc
from taskflow.engines.action_engine import analyzer as ca
from taskflow.engines.action_engine import executor as ex
from taskflow.engines.action_engine import retry_action as ra
from taskflow.engines.action_engine import runner as ru
from taskflow.engines.action_engine import task_action as ta
class Runtime(object):
"""An object that contains various utility methods and properties that
represent the collection of runtime components and functionality needed
for an action engine to run to completion.
"""A aggregate of runtime objects, properties, ... used during execution.
This object contains various utility methods and properties that represent
the collection of runtime components and functionality needed for an
action engine to run to completion.
"""
def __init__(self, compilation, storage, task_notifier, task_executor):
@@ -155,8 +156,13 @@ class Completer(object):
return False
def _process_atom_failure(self, atom, failure):
"""On atom failure find its retry controller, ask for the action to
perform with failed subflow and set proper intention for subflow nodes.
"""Processes atom failure & applies resolution strategies.
On atom failure this will find the atom's associated retry controller
and ask that controller for the strategy to perform to resolve that
failure. After getting a resolution strategy decision this method will
then adjust the other atoms' intentions and states as needed so that
the failure can be worked around.
"""
retry = self._analyzer.find_atom_retry(atom)
if retry:
@@ -195,6 +201,9 @@ class Scheduler(object):
def _schedule_node(self, node):
"""Schedule a single node for execution."""
# TODO(harlowja): we need to rework this so that we aren't doing type
# checking here, type checking usually means something isn't done right
# and usually will limit extensibility in the future.
if isinstance(node, task_atom.BaseTask):
return self._schedule_task(node)
elif isinstance(node, retry_atom.Retry):
@@ -204,8 +213,10 @@ class Scheduler(object):
% (node, type(node)))
def _schedule_retry(self, retry):
"""Schedules the given retry for revert or execute depending
on its intention.
"""Schedules the given retry atom for *future* completion.
Depending on the atom's stored intention this may schedule the retry
atom for reversion or execution.
"""
intention = self._storage.get_atom_intention(retry.name)
if intention == st.EXECUTE:
@@ -221,8 +232,10 @@ class Scheduler(object):
" intention: %s" % intention)
def _schedule_task(self, task):
"""Schedules the given task for revert or execute depending
on its intention.
"""Schedules the given task atom for *future* completion.
Depending on the atom's stored intention this may schedule the task
atom for reversion or execution.
"""
intention = self._storage.get_atom_intention(task.name)
if intention == st.EXECUTE:
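
The TODO above calls out the isinstance() based dispatch as something that limits extensibility. A minimal, hypothetical sketch of the registry-style alternative it alludes to (illustrative only, not part of this change):

class DispatchingScheduler(object):
    """Registry-based scheduling dispatch (a sketch, not taskflow code)."""

    def __init__(self):
        self._handlers = {}

    def register(self, atom_cls, handler):
        # Associate a scheduling function with an atom class.
        self._handlers[atom_cls] = handler

    def schedule(self, node):
        # Walk the MRO so subclasses of registered types are also handled,
        # without the scheduler having to know about them explicitly.
        for cls in type(node).__mro__:
            handler = self._handlers.get(cls)
            if handler is not None:
                return handler(node)
        raise TypeError("Unknown how to schedule %r (%s)"
                        % (node, type(node)))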

View File

@@ -16,7 +16,6 @@
import logging
from taskflow import exceptions
from taskflow import states
from taskflow.utils import misc
@@ -32,10 +31,30 @@ class TaskAction(object):
self._task_executor = task_executor
self._notifier = notifier
def change_state(self, task, state, result=None, progress=None):
def _is_identity_transition(self, state, task, progress):
if state in SAVE_RESULT_STATES:
# saving result is never identity transition
return False
old_state = self._storage.get_atom_state(task.name)
if state != old_state:
# changing state is not identity transition by definition
return False
# NOTE(imelnikov): last thing to check is that the progress has
# changed, which means progress is not None and is different from
# what is stored in the database.
if progress is None:
return False
old_progress = self._storage.get_task_progress(task.name)
if old_progress != progress:
return False
return True
def change_state(self, task, state, result=None, progress=None):
if self._is_identity_transition(state, task, progress):
# NOTE(imelnikov): ignore identity transitions in order
# to avoid extra write to storage backend and, what's
# more important, extra notifications
return
if state in SAVE_RESULT_STATES:
self._storage.save(task.name, result, state)
else:
@@ -49,7 +68,6 @@ class TaskAction(object):
self._notifier.notify(state, details)
if progress is not None:
task.update_progress(progress)
return True
def _on_update_progress(self, task, event_data, progress, **kwargs):
"""Should be called when task updates its progress."""
@@ -62,9 +80,7 @@ class TaskAction(object):
task, progress)
def schedule_execution(self, task):
if not self.change_state(task, states.RUNNING, progress=0.0):
raise exceptions.InvalidState("Task %s is in invalid state and"
" can't be executed" % task.name)
self.change_state(task, states.RUNNING, progress=0.0)
kwargs = self._storage.fetch_mapped_args(task.rebind,
atom_name=task.name)
task_uuid = self._storage.get_atom_uuid(task.name)
@@ -79,9 +95,7 @@ class TaskAction(object):
result=result, progress=1.0)
def schedule_reversion(self, task):
if not self.change_state(task, states.REVERTING, progress=0.0):
raise exceptions.InvalidState("Task %s is in invalid state and"
" can't be reverted" % task.name)
self.change_state(task, states.REVERTING, progress=0.0)
kwargs = self._storage.fetch_mapped_args(task.rebind,
atom_name=task.name)
task_uuid = self._storage.get_atom_uuid(task.name)
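
A standalone sketch of the identity-transition predicate added above, mirroring its logic with simplified stand-ins: a storage write (and its notifications) is skipped only when neither the state nor the supplied progress would change anything already stored.

SAVE_RESULT_STATES = ('SUCCESS', 'FAILURE')  # simplified stand-ins

def is_identity_transition(old_state, new_state, old_progress, new_progress):
    if new_state in SAVE_RESULT_STATES:
        # Saving a result is never an identity transition.
        return False
    if new_state != old_state:
        # Changing state is not an identity transition by definition.
        return False
    if new_progress is None:
        # No progress supplied, nothing further to compare against.
        return False
    # Identity only when the progress also stays exactly the same.
    return old_progress == new_progress

assert is_identity_transition('RUNNING', 'RUNNING', 0.5, 0.5)
assert not is_identity_transition('RUNNING', 'RUNNING', 0.5, 0.6)
assert not is_identity_transition('RUNNING', 'SUCCESS', 1.0, 1.0)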

View File

@@ -54,9 +54,12 @@ class EngineBase(object):
@abc.abstractmethod
def compile(self):
"""Compiles the contained flow into a structure which the engine can
use to run or if this can not be done then an exception is thrown
indicating why this compilation could not be achieved.
"""Compiles the contained flow into a internal representation.
This internal representation is what the engine will *actually* use to
run. If this compilation can not be accomplished then an exception
is expected to be thrown with a message indicating why the compilation
could not be achieved.
"""
@abc.abstractmethod

View File

@@ -31,15 +31,23 @@ from taskflow.utils import reflection
ENGINES_NAMESPACE = 'taskflow.engines'
def _fetch_factory(factory_name):
try:
return importutils.import_class(factory_name)
except (ImportError, ValueError) as e:
raise ImportError("Could not import factory %r: %s"
% (factory_name, e))
def _fetch_validate_factory(flow_factory):
if isinstance(flow_factory, six.string_types):
factory_fun = importutils.import_class(flow_factory)
factory_fun = _fetch_factory(flow_factory)
factory_name = flow_factory
else:
factory_fun = flow_factory
factory_name = reflection.get_callable_name(flow_factory)
try:
reimported = importutils.import_class(factory_name)
reimported = _fetch_factory(factory_name)
assert reimported == factory_fun
except (ImportError, AssertionError):
raise ValueError('Flow factory %r is not reimportable by name %s'
@@ -50,7 +58,7 @@ def _fetch_validate_factory(flow_factory):
def load(flow, store=None, flow_detail=None, book=None,
engine_conf=None, backend=None, namespace=ENGINES_NAMESPACE,
**kwargs):
"""Load flow into engine.
"""Load a flow into an engine.
This function creates and prepares an engine to run the
flow. All that is left is to run the engine with the 'run()' method.
@@ -151,8 +159,7 @@ def run(flow, store=None, flow_detail=None, book=None,
def save_factory_details(flow_detail,
flow_factory, factory_args, factory_kwargs,
backend=None):
"""Saves the given factories reimportable name, args, kwargs into the
flow detail.
"""Saves the given factories reimportable attributes into the flow detail.
This function saves the factory name, arguments, and keyword arguments
into the given flow details object and if a backend is provided it will
@@ -227,9 +234,11 @@ def load_from_factory(flow_factory, factory_args=None, factory_kwargs=None,
def flow_from_detail(flow_detail):
"""Recreate flow previously loaded with load_form_factory.
"""Reloads a flow previously saved.
Gets flow factory name from metadata, calls it to recreate the flow.
Gets the flow factory's name and any arguments and keyword arguments from
the flow detail's metadata, and then calls that factory to recreate the
flow.
:param flow_detail: FlowDetail that holds state of the flow to load
"""
@@ -241,7 +250,7 @@ def flow_from_detail(flow_detail):
% (flow_detail.name, flow_detail.uuid))
try:
factory_fun = importutils.import_class(factory_data['name'])
factory_fun = _fetch_factory(factory_data['name'])
except (KeyError, ImportError):
raise ImportError('Could not import factory for flow %s %s'
% (flow_detail.name, flow_detail.uuid))
@@ -253,10 +262,10 @@ def flow_from_detail(flow_detail):
def load_from_detail(flow_detail, store=None, engine_conf=None, backend=None,
namespace=ENGINES_NAMESPACE, **kwargs):
"""Reload flow previously loaded with load_form_factory function.
"""Reloads an engine previously saved.
Gets flow factory name from metadata, calls it to recreate the flow
and loads flow into engine with load().
This reloads the flow using the flow_from_detail() function and then calls
into the load() function to create an engine from that flow.
:param flow_detail: FlowDetail that holds state of the flow to load
:param store: dict -- data to put to storage to satisfy flow requirements
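
A minimal sketch of the reimportable-by-name round trip that save_factory_details() and flow_from_detail() rely on, using plain importlib instead of the oslo importutils helper (names here are illustrative):

import importlib

def import_by_name(name):
    # Resolve a dotted 'module.attribute' path back to the object.
    module_name, _, attr = name.rpartition('.')
    return getattr(importlib.import_module(module_name), attr)

def ensure_reimportable(factory):
    # A factory saved into a flow detail must be importable again from
    # its dotted name when the flow detail is later reloaded.
    name = "%s.%s" % (factory.__module__, factory.__name__)
    if import_by_name(name) is not factory:
        raise ValueError("Flow factory %r is not reimportable by name %s"
                         % (factory, name))
    return name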

View File

@@ -14,54 +14,16 @@
# License for the specific language governing permissions and limitations
# under the License.
import logging
import random
import six
from taskflow.engines.worker_based import protocol as pr
from taskflow.utils import lock_utils as lu
LOG = logging.getLogger(__name__)
from taskflow.types import cache as base
class Cache(object):
"""Represents thread-safe cache."""
def __init__(self):
self._data = {}
self._lock = lu.ReaderWriterLock()
def get(self, key):
"""Retrieve a value from the cache."""
with self._lock.read_lock():
return self._data.get(key)
def set(self, key, value):
"""Set a value in the cache."""
with self._lock.write_lock():
self._data[key] = value
LOG.debug("Cache updated. Capacity: %s", len(self._data))
def delete(self, key):
"""Delete a value from the cache."""
with self._lock.write_lock():
self._data.pop(key, None)
def cleanup(self, on_expired_callback=None):
"""Delete out-dated values from the cache."""
with self._lock.write_lock():
expired_values = [(k, v) for k, v in six.iteritems(self._data)
if v.expired]
for (k, _v) in expired_values:
self._data.pop(k, None)
if on_expired_callback:
for (_k, v) in expired_values:
on_expired_callback(v)
class RequestsCache(Cache):
"""Represents thread-safe requests cache."""
class RequestsCache(base.ExpiringCache):
"""Represents a thread-safe requests cache."""
def get_waiting_requests(self, tasks):
"""Get list of waiting requests by tasks."""
@@ -73,8 +35,8 @@ class RequestsCache(Cache):
return waiting_requests
class WorkersCache(Cache):
"""Represents thread-safe workers cache."""
class WorkersCache(base.ExpiringCache):
"""Represents a thread-safe workers cache."""
def get_topic_by_task(self, task):
"""Get topic for a given task."""

View File

@@ -0,0 +1,112 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
from kombu import exceptions as kombu_exc
import six
from taskflow import exceptions as excp
LOG = logging.getLogger(__name__)
class TypeDispatcher(object):
"""Receives messages and dispatches to type specific handlers."""
def __init__(self, type_handlers):
self._handlers = dict(type_handlers)
self._requeue_filters = []
def add_requeue_filter(self, callback):
"""Add a callback that can *request* message requeuing.
The callback will be activated before the message has been acked and
it can be used to instruct the dispatcher to requeue the message
instead of processing it.
"""
assert six.callable(callback), "Callback must be callable"
self._requeue_filters.append(callback)
def _collect_requeue_votes(self, data, message):
# Returns how many of the filters asked for the message to be requeued.
requeue_votes = 0
for f in self._requeue_filters:
try:
if f(data, message):
requeue_votes += 1
except Exception:
LOG.exception("Failed calling requeue filter to determine"
" if message %r should be requeued.",
message.delivery_tag)
return requeue_votes
def _requeue_log_error(self, message, errors):
# TODO(harlowja): Remove when http://github.com/celery/kombu/pull/372
# is merged and a version is released with this change...
try:
message.requeue()
except errors as exc:
# This was taken from how kombu is formatting its messages
# when its reject_log_error or ack_log_error functions are
# used so that we have a similar error format for requeuing.
LOG.critical("Couldn't requeue %r, reason:%r",
message.delivery_tag, exc, exc_info=True)
else:
LOG.debug("AMQP message %r requeued.", message.delivery_tag)
def _process_message(self, data, message, message_type):
handler = self._handlers.get(message_type)
if handler is None:
message.reject_log_error(logger=LOG,
errors=(kombu_exc.MessageStateError,))
LOG.warning("Unexpected message type: '%s' in message"
" %r", message_type, message.delivery_tag)
else:
if isinstance(handler, (tuple, list)):
handler, validator = handler
try:
validator(data)
except excp.InvalidFormat as e:
message.reject_log_error(
logger=LOG, errors=(kombu_exc.MessageStateError,))
LOG.warn("Message: %r, '%s' was rejected due to it being"
" in an invalid format: %s",
message.delivery_tag, message_type, e)
return
message.ack_log_error(logger=LOG,
errors=(kombu_exc.MessageStateError,))
if message.acknowledged:
LOG.debug("AMQP message %r acknowledged.",
message.delivery_tag)
handler(data, message)
def on_message(self, data, message):
"""This method is called on incoming messages."""
LOG.debug("Got message: %r", message.delivery_tag)
if self._collect_requeue_votes(data, message):
self._requeue_log_error(message,
errors=(kombu_exc.MessageStateError,))
else:
try:
message_type = message.properties['type']
except KeyError:
message.reject_log_error(
logger=LOG, errors=(kombu_exc.MessageStateError,))
LOG.warning("The 'type' message property is missing"
" in message %r", message.delivery_tag)
else:
self._process_message(data, message, message_type)
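
A hypothetical exercise of the dispatcher above using a stub message object (real messages come from kombu; the stub only mimics the handful of attributes and methods the dispatcher touches):

class StubMessage(object):
    # Mimics just enough of a kombu message for this illustration.
    def __init__(self, delivery_tag, properties):
        self.delivery_tag = delivery_tag
        self.properties = properties
        self.acknowledged = False

    def ack_log_error(self, logger=None, errors=()):
        self.acknowledged = True

    def reject_log_error(self, logger=None, errors=()):
        pass

    def requeue(self):
        pass

def on_notify(data, message):
    print("notify handled: %s" % (data,))

dispatcher = TypeDispatcher({'NOTIFY': on_notify})
# Requeue nothing here; a real user might requeue while shutting down.
dispatcher.add_requeue_filter(lambda data, message: False)
dispatcher.on_message({'topic': 'test'},
                      StubMessage('tag-1', {'type': 'NOTIFY'}))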

View File

@@ -14,15 +14,16 @@
# License for the specific language governing permissions and limitations
# under the License.
import functools
import logging
from kombu import exceptions as kombu_exc
from taskflow.engines.action_engine import executor
from taskflow.engines.worker_based import cache
from taskflow.engines.worker_based import protocol as pr
from taskflow.engines.worker_based import proxy
from taskflow import exceptions as exc
from taskflow.openstack.common import timeutils
from taskflow.types import timing as tt
from taskflow.utils import async_utils
from taskflow.utils import misc
from taskflow.utils import reflection
@@ -74,48 +75,42 @@ class WorkerTaskExecutor(executor.TaskExecutorBase):
self._topics = topics
self._requests_cache = cache.RequestsCache()
self._workers_cache = cache.WorkersCache()
self._proxy = proxy.Proxy(uuid, exchange, self._on_message,
self._workers_arrival = threading.Condition()
handlers = {
pr.NOTIFY: [
self._process_notify,
functools.partial(pr.Notify.validate, response=True),
],
pr.RESPONSE: [
self._process_response,
pr.Response.validate,
],
}
self._proxy = proxy.Proxy(uuid, exchange, handlers,
self._on_wait, **kwargs)
self._proxy_thread = None
self._periodic = PeriodicWorker(misc.Timeout(pr.NOTIFY_PERIOD),
self._periodic = PeriodicWorker(tt.Timeout(pr.NOTIFY_PERIOD),
[self._notify_topics])
self._periodic_thread = None
def _on_message(self, data, message):
"""This method is called on incoming message."""
LOG.debug("Got message: %s", data)
try:
# acknowledge message before processing
message.ack()
except kombu_exc.MessageStateError:
LOG.exception("Failed to acknowledge AMQP message.")
else:
LOG.debug("AMQP message acknowledged.")
try:
msg_type = message.properties['type']
except KeyError:
LOG.warning("The 'type' message property is missing.")
else:
if msg_type == pr.NOTIFY:
self._process_notify(data)
elif msg_type == pr.RESPONSE:
self._process_response(data, message)
else:
LOG.warning("Unexpected message type: %s", msg_type)
def _process_notify(self, notify):
def _process_notify(self, notify, message):
"""Process notify message from remote side."""
LOG.debug("Start processing notify message.")
topic = notify['topic']
tasks = notify['tasks']
# add worker info to the cache
self._workers_cache.set(topic, tasks)
self._workers_arrival.acquire()
try:
self._workers_cache[topic] = tasks
self._workers_arrival.notify_all()
finally:
self._workers_arrival.release()
# publish waiting requests
for request in self._requests_cache.get_waiting_requests(tasks):
request.set_pending()
self._publish_request(request, topic)
if request.transition_and_log_error(pr.PENDING, logger=LOG):
self._publish_request(request, topic)
def _process_response(self, response, message):
"""Process response from remote side."""
@@ -125,20 +120,23 @@ class WorkerTaskExecutor(executor.TaskExecutorBase):
except KeyError:
LOG.warning("The 'correlation_id' message property is missing.")
else:
LOG.debug("Task uuid: '%s'", task_uuid)
request = self._requests_cache.get(task_uuid)
if request is not None:
response = pr.Response.from_dict(response)
if response.state == pr.RUNNING:
request.set_running()
request.transition_and_log_error(pr.RUNNING, logger=LOG)
elif response.state == pr.PROGRESS:
request.on_progress(**response.data)
elif response.state in (pr.FAILURE, pr.SUCCESS):
# NOTE(imelnikov): request should not be in cache when
# another thread can see its result and schedule another
# request with same uuid; so we remove it, then set result
self._requests_cache.delete(request.uuid)
request.set_result(**response.data)
moved = request.transition_and_log_error(response.state,
logger=LOG)
if moved:
# NOTE(imelnikov): request should not be in the
# cache when another thread can see its result and
# schedule another request with the same uuid; so
# we remove it, then set the result...
del self._requests_cache[request.uuid]
request.set_result(**response.data)
else:
LOG.warning("Unexpected response status: '%s'",
response.state)
@@ -152,10 +150,21 @@ class WorkerTaskExecutor(executor.TaskExecutorBase):
When request has expired it is removed from the requests cache and
the `RequestTimeout` exception is set as a request result.
"""
LOG.debug("Request '%r' has expired.", request)
LOG.debug("The '%r' request has expired.", request)
request.set_result(misc.Failure.from_exception(
exc.RequestTimeout("The '%r' request has expired" % request)))
if request.transition_and_log_error(pr.FAILURE, logger=LOG):
# Raise an exception (and then catch it) so we get a nice
# traceback that the request will get instead of it getting
# just an exception with no traceback...
try:
request_age = timeutils.delta_seconds(request.created_on,
timeutils.utcnow())
raise exc.RequestTimeout(
"Request '%s' has expired after waiting for %0.2f"
" seconds for it to transition out of (%s) states"
% (request, request_age, ", ".join(pr.WAITING_STATES)))
except exc.RequestTimeout:
with misc.capture_failure() as fail:
LOG.debug(fail.exception_str)
request.set_result(fail)
def _on_wait(self):
"""This function is called cyclically between draining events."""
@@ -174,11 +183,11 @@ class WorkerTaskExecutor(executor.TaskExecutorBase):
# before putting it into the requests cache to prevent the notify
# processing thread from getting the list of waiting requests and
# publishing it before it is published here, so it wouldn't be
# published twice.
request.set_pending()
self._requests_cache.set(request.uuid, request)
self._publish_request(request, topic)
if request.transition_and_log_error(pr.PENDING, logger=LOG):
self._requests_cache[request.uuid] = request
self._publish_request(request, topic)
else:
self._requests_cache.set(request.uuid, request)
self._requests_cache[request.uuid] = request
return request.result
@@ -191,10 +200,10 @@ class WorkerTaskExecutor(executor.TaskExecutorBase):
correlation_id=request.uuid)
except Exception:
with misc.capture_failure() as failure:
LOG.exception("Failed to submit the '%s' request." %
request)
self._requests_cache.delete(request.uuid)
request.set_result(failure)
LOG.exception("Failed to submit the '%s' request.", request)
if request.transition_and_log_error(pr.FAILURE, logger=LOG):
del self._requests_cache[request.uuid]
request.set_result(failure)
def _notify_topics(self):
"""Cyclically called to publish notify message to each topic."""
@@ -215,8 +224,35 @@ class WorkerTaskExecutor(executor.TaskExecutorBase):
"""Wait for futures returned by this executor to complete."""
return async_utils.wait_for_any(fs, timeout)
def wait_for_workers(self, workers=1, timeout=None):
"""Waits for geq workers to notify they are ready to do work.
NOTE(harlowja): if a timeout is provided this function will wait
until that timeout expires, if the amount of workers does not reach
the desired amount of workers before the timeout expires then this will
return how many workers are still needed, otherwise it will
return zero.
"""
if workers <= 0:
raise ValueError("Worker amount must be greater than zero")
w = None
if timeout is not None:
w = tt.StopWatch(timeout).start()
self._workers_arrival.acquire()
try:
while len(self._workers_cache) < workers:
if w is not None and w.expired():
return workers - len(self._workers_cache)
timeout = None
if w is not None:
timeout = w.leftover()
self._workers_arrival.wait(timeout)
return 0
finally:
self._workers_arrival.release()
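
The arrival condition used here pairs with _process_notify(): workers registering under the lock wake any waiters. A self-contained sketch of that producer/consumer pattern (simplified names, no taskflow imports):

import threading
import time

workers = {}
arrival = threading.Condition()

def register_worker(topic, tasks):
    # Producer side: mirrors _process_notify() recording a worker and
    # waking any wait_for_workers() callers.
    with arrival:
        workers[topic] = tasks
        arrival.notify_all()

def wait_for_workers(count, timeout=None):
    # Consumer side: wait under an optional deadline; returns how many
    # workers are still missing (zero when enough have arrived).
    deadline = None if timeout is None else time.time() + timeout
    with arrival:
        while len(workers) < count:
            if deadline is not None:
                remaining = deadline - time.time()
                if remaining <= 0:
                    return count - len(workers)
                arrival.wait(remaining)
            else:
                arrival.wait()
    return 0

register_worker('topic-a', ['TaskOne'])
print(wait_for_workers(1, timeout=0.1))  # => 0
print(wait_for_workers(2, timeout=0.1))  # => 1 (one worker still missing)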
def start(self):
"""Start proxy thread (and associated topic notification thread)."""
"""Starts proxy thread and associated topic notification thread."""
if not _is_alive(self._proxy_thread):
self._proxy_thread = tu.daemon_thread(self._proxy.start)
self._proxy_thread.start()
@@ -227,9 +263,7 @@ class WorkerTaskExecutor(executor.TaskExecutorBase):
self._periodic_thread.start()
def stop(self):
"""Stop proxy thread (and associated topic notification thread), so
those threads will be gracefully terminated.
"""
"""Stops proxy thread and associated topic notification thread."""
if self._periodic_thread is not None:
self._periodic.stop()
self._periodic_thread.join()

View File

@@ -15,16 +15,24 @@
# under the License.
import abc
import six
import logging
import threading
from concurrent import futures
import jsonschema
from jsonschema import exceptions as schema_exc
import six
from taskflow.engines.action_engine import executor
from taskflow import exceptions as excp
from taskflow.openstack.common import timeutils
from taskflow.types import timing as tt
from taskflow.utils import lock_utils
from taskflow.utils import misc
from taskflow.utils import reflection
# NOTE(skudriashev): This is protocol events, not related to the task states.
# NOTE(skudriashev): This is protocol states and events, which are not
# related to task states.
WAITING = 'WAITING'
PENDING = 'PENDING'
RUNNING = 'RUNNING'
@@ -32,6 +40,35 @@ SUCCESS = 'SUCCESS'
FAILURE = 'FAILURE'
PROGRESS = 'PROGRESS'
# During these states the expiry is active (once out of these states the expiry
# no longer matters, since we have no way of knowing how long a task will run
# for).
WAITING_STATES = (WAITING, PENDING)
_ALL_STATES = (WAITING, PENDING, RUNNING, SUCCESS, FAILURE, PROGRESS)
_STOP_TIMER_STATES = (RUNNING, SUCCESS, FAILURE)
# Transitions that a request state can go through.
_ALLOWED_TRANSITIONS = (
# Used when a executor starts to publish a request to a selected worker.
(WAITING, PENDING),
# When a request expires (isn't able to be processed by any worker).
(WAITING, FAILURE),
# Worker has started executing a request.
(PENDING, RUNNING),
# Worker failed to construct/process a request to run (either the worker
# did not transition to RUNNING in the given timeout or the worker itself
# had some type of failure before RUNNING started).
#
# Also used by the executor if the request was attempted to be published
# but that publishing process did not work out.
(PENDING, FAILURE),
# Execution failed due to some type of remote failure.
(RUNNING, FAILURE),
# Execution succeeded & has completed.
(RUNNING, SUCCESS),
)
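
A tiny standalone check of the table above, restated as a set of pairs; note that SUCCESS and FAILURE have no outgoing edges (they are terminal):

ALLOWED_TRANSITIONS = frozenset([
    ('WAITING', 'PENDING'), ('WAITING', 'FAILURE'),
    ('PENDING', 'RUNNING'), ('PENDING', 'FAILURE'),
    ('RUNNING', 'SUCCESS'), ('RUNNING', 'FAILURE'),
])

def may_transition(old_state, new_state):
    return (old_state, new_state) in ALLOWED_TRANSITIONS

assert may_transition('WAITING', 'PENDING')
assert not may_transition('SUCCESS', 'RUNNING')  # terminal states stay put
assert not may_transition('WAITING', 'RUNNING')  # must pass through PENDING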
# Remote task actions.
EXECUTE = 'execute'
REVERT = 'revert'
@@ -61,6 +98,14 @@ NOTIFY = 'NOTIFY'
REQUEST = 'REQUEST'
RESPONSE = 'RESPONSE'
# Special jsonschema validation types/adjustments.
_SCHEMA_TYPES = {
# See: https://github.com/Julian/jsonschema/issues/148
'array': (list, tuple),
}
LOG = logging.getLogger(__name__)
@six.add_metaclass(abc.ABCMeta)
class Message(object):
@@ -78,18 +123,101 @@ class Notify(Message):
"""Represents notify message type."""
TYPE = NOTIFY
# NOTE(harlowja): the executor (the entity who initially requests a worker
# to send back a notification response) schema is different than the
# worker response schema (that's why there are two schemas here).
_RESPONSE_SCHEMA = {
"type": "object",
'properties': {
'topic': {
"type": "string",
},
'tasks': {
"type": "array",
"items": {
"type": "string",
},
}
},
"required": ["topic", 'tasks'],
"additionalProperties": False,
}
_SENDER_SCHEMA = {
"type": "object",
"additionalProperties": False,
}
def __init__(self, **data):
self._data = data
def to_dict(self):
return self._data
@classmethod
def validate(cls, data, response):
if response:
schema = cls._RESPONSE_SCHEMA
else:
schema = cls._SENDER_SCHEMA
try:
jsonschema.validate(data, schema, types=_SCHEMA_TYPES)
except schema_exc.ValidationError as e:
if response:
raise excp.InvalidFormat("%s message response data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
else:
raise excp.InvalidFormat("%s message sender data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
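
An illustration of why the 'array' entry in _SCHEMA_TYPES above exists (this uses the `types=` keyword accepted by the jsonschema releases of this era; much later major versions removed it): by default a Python tuple does not count as a JSON array.

import jsonschema

schema = {"type": "array", "items": {"type": "string"}}
tasks = ("worker-task-1", "worker-task-2")  # a tuple, not a list

# Without the override this raises a ValidationError; with it, it passes.
jsonschema.validate(tasks, schema, types={'array': (list, tuple)})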
class Request(Message):
"""Represents request with execution results. Every request is created in
the WAITING state and is expired within the given timeout.
"""Represents request with execution results.
Every request is created in the WAITING state and is expired within the
given timeout if it does not transition out of the (WAITING, PENDING)
states.
"""
TYPE = REQUEST
_SCHEMA = {
"type": "object",
'properties': {
# These two are typically only sent on revert actions (that is
# why they are not included in the required section).
'result': {},
'failures': {
"type": "object",
},
'task_cls': {
'type': 'string',
},
'task_name': {
'type': 'string',
},
'task_version': {
"oneOf": [
{
"type": "string",
},
{
"type": "array",
},
],
},
'action': {
"type": "string",
"enum": list(six.iterkeys(ACTION_TO_EVENT)),
},
# Keyword arguments that end up in the revert() or execute()
# method of the remote task.
'arguments': {
"type": "object",
},
},
'required': ['task_cls', 'task_name', 'task_version', 'action'],
}
def __init__(self, task, uuid, action, arguments, progress_callback,
timeout, **kwargs):
@@ -101,13 +229,12 @@ class Request(Message):
self._arguments = arguments
self._progress_callback = progress_callback
self._kwargs = kwargs
self._watch = misc.StopWatch(duration=timeout).start()
self._watch = tt.StopWatch(duration=timeout).start()
self._state = WAITING
self._lock = threading.Lock()
self._created_on = timeutils.utcnow()
self.result = futures.Future()
def __repr__(self):
return "%s:%s" % (self._task_cls, self._action)
@property
def uuid(self):
return self._uuid
@@ -120,6 +247,10 @@ class Request(Message):
def state(self):
return self._state
@property
def created_on(self):
return self._created_on
@property
def expired(self):
"""Check if request has expired.
@@ -131,13 +262,16 @@ class Request(Message):
state for more then the given timeout (it is not considered to be
expired in any other state).
"""
if self._state in (WAITING, PENDING):
if self._state in WAITING_STATES:
return self._watch.expired()
return False
def to_dict(self):
"""Return json-serializable request, converting all `misc.Failure`
objects into dictionaries.
"""Return json-serializable request.
For requests that have failed due to some exception this will
convert all `misc.Failure` objects into dictionaries (which will then
be reconstituted by the receiver).
"""
request = dict(task_cls=self._task_cls, task_name=self._task.name,
task_version=self._task.version, action=self._action,
@@ -158,20 +292,121 @@ class Request(Message):
def set_result(self, result):
self.result.set_result((self._task, self._event, result))
def set_pending(self):
self._state = PENDING
def set_running(self):
self._state = RUNNING
self._watch.stop()
def on_progress(self, event_data, progress):
self._progress_callback(self._task, event_data, progress)
def transition_and_log_error(self, new_state, logger=None):
"""Transitions *and* logs an error if that transitioning raises.
This overlays the transition function and performs nearly the same
functionality but instead of raising if the transition was not valid
it logs a warning to the provided logger and returns False to
indicate that the transition was not performed (note that this
is *different* from the transition function where False means
ignored).
"""
if logger is None:
logger = LOG
moved = False
try:
moved = self.transition(new_state)
except excp.InvalidState:
logger.warn("Failed to transition '%s' to %s state.", self,
new_state, exc_info=True)
return moved
@lock_utils.locked
def transition(self, new_state):
"""Transitions the request to a new state.
If transition was performed, it returns True. If transition
should was ignored, it returns False. If transition was not
valid (and will not be performed), it raises an InvalidState
exception.
"""
old_state = self._state
if old_state == new_state:
return False
pair = (old_state, new_state)
if pair not in _ALLOWED_TRANSITIONS:
raise excp.InvalidState("Request transition from %s to %s is"
" not allowed" % pair)
if new_state in _STOP_TIMER_STATES:
self._watch.stop()
self._state = new_state
LOG.debug("Transitioned '%s' from %s state to %s state", self,
old_state, new_state)
return True
@classmethod
def validate(cls, data):
try:
jsonschema.validate(data, cls._SCHEMA, types=_SCHEMA_TYPES)
except schema_exc.ValidationError as e:
raise excp.InvalidFormat("%s message response data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
class Response(Message):
"""Represents response message type."""
TYPE = RESPONSE
_SCHEMA = {
"type": "object",
'properties': {
'state': {
"type": "string",
"enum": list(_ALL_STATES),
},
'data': {
"anyOf": [
{
"$ref": "#/definitions/progress",
},
{
"$ref": "#/definitions/completion",
},
{
"$ref": "#/definitions/empty",
},
],
},
},
"required": ["state", 'data'],
"additionalProperties": False,
"definitions": {
"progress": {
"type": "object",
"properties": {
'progress': {
'type': 'number',
},
'event_data': {
'type': 'object',
},
},
"required": ["progress", 'event_data'],
"additionalProperties": False,
},
# Used when sending *only* request state changes (and no data is
# expected).
"empty": {
"type": "object",
"additionalProperties": False,
},
"completion": {
"type": "object",
"properties": {
# This can be any arbitrary type that a task returns, so
# that's why we can't be strict about what type it is since
# any of the json serializable types are allowed.
"result": {},
},
"required": ["result"],
"additionalProperties": False,
},
},
}
def __init__(self, state, **data):
self._state = state
@@ -195,3 +430,12 @@ class Response(Message):
def to_dict(self):
return dict(state=self._state, data=self._data)
@classmethod
def validate(cls, data):
try:
jsonschema.validate(data, cls._SCHEMA, types=_SCHEMA_TYPES)
except schema_exc.ValidationError as e:
raise excp.InvalidFormat("%s message response data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
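
Hypothetical payloads matching the three `data` alternatives allowed by the response schema above (progress, completion and empty):

# A progress update from a worker.
Response.validate({'state': 'PROGRESS',
                   'data': {'progress': 0.5, 'event_data': {}}})
# A completion (the result may be any json-serializable value).
Response.validate({'state': 'SUCCESS', 'data': {'result': 42}})
# A state-only change with no payload expected.
Response.validate({'state': 'RUNNING', 'data': {}})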

View File

@@ -14,13 +14,16 @@
# License for the specific language governing permissions and limitations
# under the License.
import kombu
import logging
import socket
import threading
import kombu
import six
from taskflow.engines.worker_based import dispatcher
from taskflow.utils import misc
LOG = logging.getLogger(__name__)
# NOTE(skudriashev): A timeout of 1 is often used in environments where
@@ -29,39 +32,56 @@ DRAIN_EVENTS_PERIOD = 1
class Proxy(object):
"""Proxy picks up messages from the named exchange, calls on_message
callback when new message received and is used to publish messages.
"""
"""A proxy processes messages from/to the named exchange."""
def __init__(self, topic, exchange_name, on_message, on_wait=None,
def __init__(self, topic, exchange_name, type_handlers, on_wait=None,
**kwargs):
self._topic = topic
self._exchange_name = exchange_name
self._on_message = on_message
self._on_wait = on_wait
self._running = threading.Event()
self._url = kwargs.get('url')
self._transport = kwargs.get('transport')
self._transport_opts = kwargs.get('transport_options')
self._dispatcher = dispatcher.TypeDispatcher(type_handlers)
self._dispatcher.add_requeue_filter(
# NOTE(skudriashev): Process all incoming messages only if proxy is
# running, otherwise requeue them.
lambda data, message: not self.is_running)
url = kwargs.get('url')
transport = kwargs.get('transport')
transport_opts = kwargs.get('transport_options')
self._drain_events_timeout = DRAIN_EVENTS_PERIOD
if self._transport == 'memory' and self._transport_opts:
polling_interval = self._transport_opts.get('polling_interval')
if polling_interval:
if transport == 'memory' and transport_opts:
polling_interval = transport_opts.get('polling_interval')
if polling_interval is not None:
self._drain_events_timeout = polling_interval
# create connection
self._conn = kombu.Connection(self._url, transport=self._transport,
transport_options=self._transport_opts)
self._conn = kombu.Connection(url, transport=transport,
transport_options=transport_opts)
# create exchange
self._exchange = kombu.Exchange(name=self._exchange_name,
durable=False,
auto_delete=True)
@property
def connection_details(self):
# The kombu drivers seem to use 'N/A' when they don't have a version...
driver_version = self._conn.transport.driver_version()
if driver_version and driver_version.lower() == 'n/a':
driver_version = None
return misc.AttrDict(
uri=self._conn.as_uri(include_password=False),
transport=misc.AttrDict(
options=dict(self._conn.transport_options),
driver_type=self._conn.transport.driver_type,
driver_name=self._conn.transport.driver_name,
driver_version=driver_version))
@property
def is_running(self):
"""Return whether proxy is running."""
"""Return whether the proxy is running."""
return self._running.is_set()
def _make_queue(self, name, exchange, **kwargs):
@@ -74,7 +94,7 @@ class Proxy(object):
**kwargs)
def publish(self, msg, routing_key, **kwargs):
"""Publish message to the named exchange with routing key."""
"""Publish message to the named exchange with given routing key."""
LOG.debug("Sending %s", msg)
if isinstance(routing_key, six.string_types):
routing_keys = [routing_key]
@@ -97,7 +117,7 @@ class Proxy(object):
with kombu.connections[self._conn].acquire(block=True) as conn:
queue = self._make_queue(self._topic, self._exchange, channel=conn)
with conn.Consumer(queues=queue,
callbacks=[self._on_message]):
callbacks=[self._dispatcher.on_message]):
self._running.set()
while self.is_running:
try:

View File

@@ -17,7 +17,7 @@
import functools
import logging
from kombu import exceptions as kombu_exc
import six
from taskflow.engines.worker_based import protocol as pr
from taskflow.engines.worker_based import proxy
@@ -26,60 +26,52 @@ from taskflow.utils import misc
LOG = logging.getLogger(__name__)
def delayed(executor):
"""Wraps & runs the function using a futures compatible executor."""
def decorator(f):
@six.wraps(f)
def wrapper(*args, **kwargs):
return executor.submit(f, *args, **kwargs)
return wrapper
return decorator
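
A short usage sketch of the delayed() helper above: the wrapped callable immediately returns a future instead of blocking the caller while the work runs on the executor.

from concurrent import futures

executor = futures.ThreadPoolExecutor(max_workers=2)

@delayed(executor)
def process(data):
    return len(data)

fut = process("hello")  # returns a future right away
print(fut.result())     # => 5 (blocks only here, for the actual result)
executor.shutdown()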
class Server(object):
"""Server implementation that waits for incoming tasks requests."""
def __init__(self, topic, exchange, executor, endpoints, **kwargs):
self._proxy = proxy.Proxy(topic, exchange, self._on_message, **kwargs)
handlers = {
pr.NOTIFY: [
delayed(executor)(self._process_notify),
functools.partial(pr.Notify.validate, response=False),
],
pr.REQUEST: [
delayed(executor)(self._process_request),
pr.Request.validate,
],
}
self._proxy = proxy.Proxy(topic, exchange, handlers,
on_wait=None, **kwargs)
self._topic = topic
self._executor = executor
self._endpoints = dict([(endpoint.name, endpoint)
for endpoint in endpoints])
def _on_message(self, data, message):
"""This method is called on incoming message."""
LOG.debug("Got message: %s", data)
# NOTE(skudriashev): Process all incoming messages only if proxy is
# running, otherwise requeue them.
if self._proxy.is_running:
# NOTE(skudriashev): Process request only if message has been
# acknowledged successfully.
try:
# acknowledge message before processing
message.ack()
except kombu_exc.MessageStateError:
LOG.exception("Failed to acknowledge AMQP message.")
else:
LOG.debug("AMQP message acknowledged.")
try:
msg_type = message.properties['type']
except KeyError:
LOG.warning("The 'type' message property is missing.")
else:
if msg_type == pr.NOTIFY:
handler = self._process_notify
elif msg_type == pr.REQUEST:
handler = self._process_request
else:
LOG.warning("Unexpected message type: %s", msg_type)
return
# spawn new thread to process request
self._executor.submit(handler, data, message)
else:
try:
# requeue message
message.requeue()
except kombu_exc.MessageStateError:
LOG.exception("Failed to requeue AMQP message.")
else:
LOG.debug("AMQP message requeued.")
@property
def connection_details(self):
return self._proxy.connection_details
@staticmethod
def _parse_request(task_cls, task_name, action, arguments, result=None,
failures=None, **kwargs):
"""Parse request before it can be processed. All `misc.Failure` objects
that have been converted to dict on the remote side to be serializable
are now converted back to objects.
"""Parse request before it can be further processed.
All `misc.Failure` objects that have been converted to dict on the
remote side will now be converted back to `misc.Failure` objects.
"""
action_args = dict(arguments=arguments, task_name=task_name)
if result is not None:
@@ -96,9 +88,10 @@ class Server(object):
@staticmethod
def _parse_message(message):
"""Parse broker message to get the `reply_to` and the `correlation_id`
properties. If required properties are missing - the `ValueError` is
raised.
"""Extracts required attributes out of the messages properties.
This extracts the `reply_to` and the `correlation_id` properties. If
any of these required properties are missing a `ValueError` is raised.
"""
properties = []
for prop in ('reply_to', 'correlation_id'):

View File

@@ -15,6 +15,11 @@
# under the License.
import logging
import os
import platform
import socket
import string
import sys
from concurrent import futures
@@ -23,6 +28,37 @@ from taskflow.engines.worker_based import server
from taskflow import task as t_task
from taskflow.utils import reflection
from taskflow.utils import threading_utils as tu
from taskflow import version
BANNER_TEMPLATE = string.Template("""
TaskFlow v${version} WBE worker.
Connection details:
Driver = $transport_driver
Exchange = $exchange
Topic = $topic
Transport = $transport_type
Uri = $connection_uri
Powered by:
Executor = $executor_type
Thread count = $executor_thread_count
Supported endpoints:$endpoints
System details:
Hostname = $hostname
Pid = $pid
Platform = $platform
Python = $python
Thread id = $thread_id
""".strip())
BANNER_TEMPLATE.defaults = {
# These values may not be possible to fetch/know, so default to unknown...
'pid': '???',
'hostname': '???',
'executor_thread_count': '???',
'endpoints': ' %s' % ([]),
# These are static (avoid refetching...)
'version': version.version_string(),
'python': sys.version.split("\n", 1)[0].strip(),
}
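
The substitute(mapping, **overrides) call used below merges the defaults with the gathered parameters, with keyword arguments taking precedence; shown standalone with illustrative values:

import string

tpl = string.Template("$name v$version on host $hostname")
defaults = {'hostname': '???'}  # may be unfetchable on some platforms

# Keyword arguments win over the mapping passed positionally.
print(tpl.substitute(defaults, name='TaskFlow', version='0.4.0'))
# => TaskFlow v0.4.0 on host ???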
LOG = logging.getLogger(__name__)
@@ -78,6 +114,7 @@ class Worker(object):
self._executor = futures.ThreadPoolExecutor(self._threads_count)
self._owns_executor = True
self._endpoints = self._derive_endpoints(tasks)
self._exchange = exchange
self._server = server.Server(topic, exchange, self._executor,
self._endpoints, **kwargs)
@@ -87,17 +124,48 @@ class Worker(object):
derived_tasks = reflection.find_subclasses(tasks, t_task.BaseTask)
return [endpoint.Endpoint(task) for task in derived_tasks]
def run(self):
"""Run worker."""
if self._threads_count != -1:
LOG.info("Starting the '%s' topic worker in %s threads.",
self._topic, self._threads_count)
def _generate_banner(self):
"""Generates a banner that can be useful to display before running."""
tpl_params = {}
connection_details = self._server.connection_details
transport = connection_details.transport
if transport.driver_version:
transport_driver = "%s v%s" % (transport.driver_name,
transport.driver_version)
else:
LOG.info("Starting the '%s' topic worker using a %s.", self._topic,
self._executor)
LOG.info("Tasks list:")
for endpoint in self._endpoints:
LOG.info("|-- %s", endpoint)
transport_driver = transport.driver_name
tpl_params['transport_driver'] = transport_driver
tpl_params['exchange'] = self._exchange
tpl_params['topic'] = self._topic
tpl_params['transport_type'] = transport.driver_type
tpl_params['connection_uri'] = connection_details.uri
tpl_params['executor_type'] = reflection.get_class_name(self._executor)
if self._threads_count != -1:
tpl_params['executor_thread_count'] = self._threads_count
if self._endpoints:
pretty_endpoints = []
for ep in self._endpoints:
pretty_endpoints.append(" - %s" % ep)
# This ensures there is a newline before the list...
tpl_params['endpoints'] = "\n" + "\n".join(pretty_endpoints)
try:
tpl_params['hostname'] = socket.getfqdn()
except socket.error:
pass
try:
tpl_params['pid'] = os.getpid()
except OSError:
pass
tpl_params['platform'] = platform.platform()
tpl_params['thread_id'] = tu.get_ident()
return BANNER_TEMPLATE.substitute(BANNER_TEMPLATE.defaults,
**tpl_params)
def run(self, display_banner=True):
"""Runs the worker."""
if display_banner:
for line in self._generate_banner().splitlines():
LOG.info(line)
self._server.start()
def wait(self):

View File

@@ -32,14 +32,16 @@ from taskflow.patterns import graph_flow as gf
from taskflow.patterns import linear_flow as lf
from taskflow import task
import example_utils as eu # noqa
# INTRO: This examples shows how a graph_flow and linear_flow can be used
# together to execute non-dependent tasks by going through the steps required
# to build a simplistic car (an assembly line if you will). It also shows
# how raw functions can be wrapped into a task object instead of being forced
# to use the more heavy task base class. This is useful in scenarios where
# pre-existing code has functions that you easily want to plug-in to taskflow,
# without requiring a large amount of code changes.
# INTRO: This example shows how a graph flow and a linear flow can be used
# together to execute dependent & non-dependent tasks by going through the
# steps required to build a simplistic car (an assembly line if you will). It
# also shows how raw functions can be wrapped into a task object instead of
# being forced to use the more *heavy* task base class. This is useful in
# scenarios where pre-existing code has functions that you easily want to
# plug-in to taskflow, without requiring a large amount of code changes.
def build_frame():
@@ -58,6 +60,9 @@ def build_wheels():
return '4'
# These just return true to indicate success, they would in the real world
# do more than just that.
def install_engine(frame, engine):
return True
@@ -75,13 +80,7 @@ def install_wheels(frame, engine, engine_installed, wheels):
def trash(**kwargs):
print_wrapped("Throwing away pieces of car!")
def print_wrapped(text):
print("-" * (len(text)))
print(text)
print("-" * (len(text)))
eu.print_wrapped("Throwing away pieces of car!")
def startup(**kwargs):
@@ -114,6 +113,9 @@ def task_watch(state, details):
flow = lf.Flow("make-auto").add(
task.FunctorTask(startup, revert=trash, provides='ran'),
# A graph flow allows automatic dependency based ordering, the ordering
# is determined by analyzing the symbols required and provided and ordering
# execution based on a functioning order (if one exists).
gf.Flow("install-parts").add(
task.FunctorTask(build_frame, provides='frame'),
task.FunctorTask(build_engine, provides='engine'),
@@ -141,7 +143,7 @@ flow = lf.Flow("make-auto").add(
# the tasks should produce, in this example this specification will influence
# what those tasks do and what output they create. Different tasks depend on
# different information from this specification, all of which will be provided
# automatically by the engine.
# automatically by the engine to those tasks.
spec = {
"frame": 'steel',
"engine": 'honda',
@@ -164,7 +166,7 @@ engine = taskflow.engines.load(flow, store={'spec': spec.copy()})
engine.notifier.register('*', flow_watch)
engine.task_notifier.register('*', task_watch)
print_wrapped("Building a car")
eu.print_wrapped("Building a car")
engine.run()
# Alter the specification and ensure that the reverting logic gets triggered
@@ -177,8 +179,8 @@ engine = taskflow.engines.load(flow, store={'spec': spec.copy()})
engine.notifier.register('*', flow_watch)
engine.task_notifier.register('*', task_watch)
print_wrapped("Building a wrong car that doesn't match specification")
eu.print_wrapped("Building a wrong car that doesn't match specification")
try:
engine.run()
except Exception as e:
print_wrapped("Flow failed: %s" % e)
eu.print_wrapped("Flow failed: %s" % e)

View File

@@ -29,8 +29,11 @@ import taskflow.engines
from taskflow.patterns import graph_flow as gf
from taskflow import task
import example_utils as eu # noqa
# In this example we demonstrate use of TargetedFlow to make oversimplified
# In this example we demonstrate use of a target flow (a flow that only
# executes up to a specified target) to make an *oversimplified* pseudo
# build system. It pretends to compile all sources to object files and
# link them into an executable. It also can build docs, but this can be
# "switched off" via targeted flow special power -- ability to ignore
@@ -75,7 +78,7 @@ class BuildDocsTask(task.Task):
def make_flow_and_store(source_files, executable_only=False):
flow = gf.TargetedFlow('build flow')
flow = gf.TargetedFlow('build-flow')
object_targets = []
store = {}
for source in source_files:
@@ -97,12 +100,12 @@ def make_flow_and_store(source_files, executable_only=False):
return flow, store
SOURCE_FILES = ['first.c', 'second.cpp', 'main.cpp']
if __name__ == "__main__":
SOURCE_FILES = ['first.c', 'second.cpp', 'main.cpp']
eu.print_wrapped('Running all tasks:')
flow, store = make_flow_and_store(SOURCE_FILES)
taskflow.engines.run(flow, store=store)
print('Running all tasks:')
flow, store = make_flow_and_store(SOURCE_FILES)
taskflow.engines.run(flow, store=store)
print('\nBuilding executable, no docs:')
flow, store = make_flow_and_store(SOURCE_FILES, executable_only=True)
taskflow.engines.run(flow, store=store)
eu.print_wrapped('Building executable, no docs:')
flow, store = make_flow_and_store(SOURCE_FILES, executable_only=True)
taskflow.engines.run(flow, store=store)

View File

@@ -26,25 +26,24 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
sys.path.insert(0, top_dir)
import taskflow.engines
from taskflow.patterns import linear_flow as lf
from taskflow.patterns import unordered_flow as uf
from taskflow import task
# INTRO: This examples shows how linear_flow and unordered_flow can be used
# together to execute calculations in parallel and then use the
# result for the next task. Adder task is used for all calculations
# and arguments' bindings are used to set correct parameters to the task.
# INTRO: This example shows how a linear flow and an unordered flow can be
# used together to execute calculations in parallel and then use the
# result for the next task/s. The adder task is used for all calculations
# and argument bindings are used to set correct parameters for each task.
# This task provides some values as a result of execution, which can be
# useful when you want to provide values from a static set to other tasks that
# depend on those values existing before those tasks can run.
#
# This method is *depreciated* in favor of a simpler mechanism that just
# provides those values on engine running by prepopulating the storage backend
# before your tasks are ran (which accomplishes a similar goal in a more
# uniform manner).
# NOTE(harlowja): this usage is *deprecated* in favor of a simpler mechanism
# that provides those values on engine running by prepopulating the storage
# backend before your tasks are run (which accomplishes a similar goal in a
# more uniform manner).
class Provider(task.Task):
def __init__(self, name, *args, **kwargs):
super(Provider, self).__init__(name=name, **kwargs)

View File

@@ -30,11 +30,11 @@ from taskflow.patterns import linear_flow as lf
from taskflow import task
# INTRO: In this example linear_flow is used to group four tasks to calculate
# INTRO: In this example a linear flow is used to group four tasks to calculate
# a value. A single added task is used twice, showing how this can be done
# and the twice added task takes in different bound values. In the first case
# it uses default parameters ('x' and 'y') and in the second case arguments
# are bound with ('z', 'd') keys from the engines storage mechanism.
# are bound with ('z', 'd') keys from the engine's internal storage mechanism.
#
# A multiplier task uses a binding that another task also provides, but this
# example explicitly shows that 'z' parameter is bound with 'a' key
@@ -47,10 +47,10 @@ from taskflow import task
# useful when you want to provide values from a static set to other tasks that
# depend on those values existing before those tasks can run.
#
# This method is *depreciated* in favor of a simpler mechanism that just
# provides those values on engine running by prepopulating the storage backend
# before your tasks are ran (which accomplishes a similar goal in a more
# uniform manner).
# NOTE(harlowja): this usage is *deprecated* in favor of a simpler mechanism
# that just provides those values on engine running by prepopulating the
# storage backend before your tasks are run (which accomplishes a similar goal
# in a more uniform manner).
class Provider(task.Task):
def __init__(self, name, *args, **kwargs):
@@ -89,8 +89,8 @@ class Multiplier(task.Task):
# Note here that the ordering is established so that the correct sequences
# of operations occurs where the adding and multiplying is done according
# to the expected and typical mathematical model. A graph_flow could also be
# used here to automatically ensure the correct ordering.
# to the expected and typical mathematical model. A graph flow could also be
# used here to automatically infer & ensure the correct ordering.
flow = lf.Flow('root').add(
# Provide the initial values for other tasks to depend on.
#

View File

@@ -35,7 +35,6 @@ sys.path.insert(0, self_dir)
# while the function will have returned.
import taskflow.engines
from taskflow.listeners import base
from taskflow.patterns import linear_flow as lf
from taskflow import states

View File

@@ -35,6 +35,12 @@ except ImportError:
SQLALCHEMY_AVAILABLE = False
def print_wrapped(text):
print("-" * (len(text)))
print(text)
print("-" * (len(text)))
def rm_path(persist_path):
if not os.path.exists(persist_path):
return

View File

@@ -28,10 +28,9 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
sys.path.insert(0, top_dir)
from taskflow.openstack.common import uuidutils
from taskflow import engines
from taskflow.listeners import printing
from taskflow.openstack.common import uuidutils
from taskflow.patterns import graph_flow as gf
from taskflow.patterns import linear_flow as lf
from taskflow import task
@@ -70,7 +69,7 @@ class UrlCaller(object):
# Since engines save the output of tasks to an optional persistent storage
# backend, resources have to be dealt with in a slightly different manner since
# resources are transient and can not be persisted (or serialized). For tasks
# resources are transient and can *not* be persisted (or serialized). For tasks
# that require access to a set of resources it is a common pattern to provide
# an object (in this case this object) on construction of those tasks via the
# task constructor.
@@ -149,9 +148,9 @@ class DeclareSuccess(task.Task):
print("All data processed and sent to %s" % (sent_to))
# Resources (db handles and similar) of course can't be persisted so we need
# to make sure that we pass this resource fetcher to the tasks constructor so
# that the tasks have access to any needed resources (the resources are
# Resources (db handles and similar) of course can *not* be persisted so we
# need to make sure that we pass this resource fetcher to the tasks constructor
# so that the tasks have access to any needed resources (the resources are
# lazily loaded so that they are only created when they are used).
resources = ResourceFetcher()
flow = lf.Flow("initialize-me")

View File

@@ -31,20 +31,20 @@ from taskflow.patterns import linear_flow as lf
from taskflow import task
# In this example there are complex dependencies between tasks that are used to
# perform a simple set of linear equations.
# In this example there are complex *inferred* dependencies between tasks that
# are used to perform a simple set of linear equations.
#
# As you will see below the tasks just define what they require as input
# and produce as output (named values). Then the user doesn't care about
# ordering the TASKS (in this case the tasks calculate pieces of the overall
# ordering the tasks (in this case the tasks calculate pieces of the overall
# equation).
#
# As you will notice graph_flow resolves dependencies automatically using the
# tasks requirements and provided values and no ordering dependency has to be
# manually created.
# As you will notice a graph flow resolves dependencies automatically using the
# tasks' symbol requirements and provided symbol values and no ordering
# dependency has to be manually created.
#
# Also notice that flows of any types can be nested into a graph_flow; subflows
# dependencies will be resolved too!! Pretty cool right!
# Also notice that flows of any type can be nested into a graph flow; showing
# that subflow dependencies (and associated ordering) will be inferred too.
class Adder(task.Task):

View File

@@ -35,7 +35,7 @@ from taskflow.persistence import logbook
from taskflow import task
from taskflow.utils import persistence_utils as p_utils
import example_utils # noqa
import example_utils as eu # noqa
# INTRO: In this example we create two tasks, one that will say hi and one
# that will say bye with optional capability to raise an error while
@@ -49,12 +49,6 @@ import example_utils # noqa
# the database during both of these modes (failing or not failing).
def print_wrapped(text):
print("-" * (len(text)))
print(text)
print("-" * (len(text)))
class HiTask(task.Task):
def execute(self):
print("Hi!")
@@ -84,7 +78,7 @@ def make_flow(blowup=False):
# Persist the flow and task state here, if the file/dir exists already blowup
# if not don't blowup, this allows a user to see both the modes and to see
# what is stored in each case.
if example_utils.SQLALCHEMY_AVAILABLE:
if eu.SQLALCHEMY_AVAILABLE:
persist_path = os.path.join(tempfile.gettempdir(), "persisting.db")
backend_uri = "sqlite:///%s" % (persist_path)
else:
@@ -96,7 +90,7 @@ if os.path.exists(persist_path):
else:
blowup = True
with example_utils.get_backend(backend_uri) as backend:
with eu.get_backend(backend_uri) as backend:
# Now we can run.
engine_config = {
'backend': backend,
@@ -108,17 +102,17 @@ with example_utils.get_backend(backend_uri) as backend:
# did exist, assume we won't blowup (and therefore this shows the undo
# and redo that a flow will go through).
flow = make_flow(blowup=blowup)
print_wrapped("Running")
eu.print_wrapped("Running")
try:
eng = engines.load(flow, **engine_config)
eng.run()
if not blowup:
example_utils.rm_path(persist_path)
eu.rm_path(persist_path)
except Exception:
# NOTE(harlowja): don't exit with non-zero status code, so that we can
# print the book contents, as well as avoiding exiting also makes the
# unit tests (which also runs these examples) pass.
traceback.print_exc(file=sys.stdout)
print_wrapped("Book contents")
eu.print_wrapped("Book contents")
print(p_utils.pformat(engine_config['book']))

View File

@@ -0,0 +1,11 @@
Running simple flow:
Fetching number for Josh.
Calling Josh 777.
Calling many people using prefixed factory:
Fetching number for Jim.
Calling Jim 444.
Fetching number for Joe.
Calling Joe 555.
Fetching number for Josh.
Calling Josh 777.

View File

@@ -0,0 +1,113 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Ivan Melnikov <iv at altlinux dot org>
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
import os
import sys
logging.basicConfig(level=logging.ERROR)
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
import taskflow.engines
from taskflow.patterns import linear_flow as lf
from taskflow import task
# INTRO: pseudo-scoping by adding prefixes
# Sometimes you need scoping -- e.g. for adding several
# similar subflows to one flow to do the same stuff for different
# data. But the current version of TaskFlow does not allow that
# directly, so you have to resort to some kind of trickery.
# One (and the more or less recommended, if not the only) way of
# solving the problem is to transform every task name, its
# provides and requires values -- e.g. by adding a prefix to them.
# This example shows how this could be done.
# The example task is simple: for each specified person, fetch
# his or her phone number from the phone book and call them.
PHONE_BOOK = {
'jim': '444',
'joe': '555',
'iv_m': '666',
'josh': '777'
}
class FetchNumberTask(task.Task):
"""Task that fetches number from phone book."""
default_provides = 'number'
def execute(self, person):
print('Fetching number for %s.' % person)
return PHONE_BOOK[person.lower()]
class CallTask(task.Task):
"""Task that calls person by number."""
def execute(self, person, number):
print('Calling %s %s.' % (person, number))
# This is how it works for one person:
simple_flow = lf.Flow('simple one').add(
FetchNumberTask(),
CallTask())
print('Running simple flow:')
taskflow.engines.run(simple_flow, store={'person': 'Josh'})
# To call several people you'll need a factory function that will
# make a flow with a given prefix for you. We need to add a prefix
# to task names, their provides and requires values. For requires,
# we use the `rebind` argument of the task constructor.
def subflow_factory(prefix):
def pr(what):
return '%s-%s' % (prefix, what)
return lf.Flow(pr('flow')).add(
FetchNumberTask(pr('fetch'),
provides=pr('number'),
rebind=[pr('person')]),
CallTask(pr('call'),
rebind=[pr('person'), pr('number')])
)
def call_them_all():
# Let's call them all. We need a flow:
flow = lf.Flow('call-them-prefixed')
# We'll also need to inject person names with prefixed argument
# names into storage to satisfy task requirements.
persons = {}
for person in ('Jim', 'Joe', 'Josh'):
prefix = person.lower()
persons['%s-person' % prefix] = person
flow.add(subflow_factory(prefix))
taskflow.engines.run(flow, store=persons)
print('\nCalling many people using prefixed factory:')
call_them_all()

View File

@@ -28,12 +28,11 @@ sys.path.insert(0, top_dir)
sys.path.insert(0, self_dir)
import taskflow.engines
from taskflow.patterns import linear_flow as lf
from taskflow import task
from taskflow.utils import persistence_utils as p_utils
import example_utils # noqa
import example_utils as eu # noqa
# INTRO: In this example linear_flow is used to group three tasks, one which
# will suspend the future work the engine may do. This suspend engine is then
@@ -53,20 +52,13 @@ import example_utils # noqa
#
# python taskflow/examples/resume_from_backend.py \
# zookeeper://127.0.0.1:2181/taskflow/resume_from_backend/
#
### UTILITY FUNCTIONS #########################################
def print_wrapped(text):
print("-" * (len(text)))
print(text)
print("-" * (len(text)))
# UTILITY FUNCTIONS #########################################
def print_task_states(flowdetail, msg):
print_wrapped(msg)
eu.print_wrapped(msg)
print("Flow '%s' state: %s" % (flowdetail.name, flowdetail.state))
# Sort by these so that our test validation doesn't get confused by the
# order in which the items in the flow detail appear.
@@ -82,7 +74,7 @@ def find_flow_detail(backend, lb_id, fd_id):
return lb.find(fd_id)
### CREATE FLOW ###############################################
# CREATE FLOW ###############################################
class InterruptTask(task.Task):
@@ -104,12 +96,12 @@ def flow_factory():
TestTask(name='second'))
### INITIALIZE PERSISTENCE ####################################
# INITIALIZE PERSISTENCE ####################################
with example_utils.get_backend() as backend:
with eu.get_backend() as backend:
logbook = p_utils.temporary_log_book(backend)
### CREATE AND RUN THE FLOW: FIRST ATTEMPT ####################
# CREATE AND RUN THE FLOW: FIRST ATTEMPT ####################
flow = flow_factory()
flowdetail = p_utils.create_flow_detail(flow, logbook, backend)
@@ -117,13 +109,13 @@ with example_utils.get_backend() as backend:
backend=backend)
print_task_states(flowdetail, "At the beginning, there is no state")
print_wrapped("Running")
eu.print_wrapped("Running")
engine.run()
print_task_states(flowdetail, "After running")
### RE-CREATE, RESUME, RUN ####################################
# RE-CREATE, RESUME, RUN ####################################
print_wrapped("Resuming and running again")
eu.print_wrapped("Resuming and running again")
# NOTE(harlowja): reload the flow detail from backend, this will allow us
# to resume the flow from its suspended state, but first we need to search

View File

@@ -30,7 +30,6 @@ sys.path.insert(0, example_dir)
import taskflow.engines
from taskflow import states
import example_utils # noqa

View File

@@ -31,19 +31,16 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
sys.path.insert(0, top_dir)
sys.path.insert(0, self_dir)
from taskflow.patterns import graph_flow as gf
from taskflow.patterns import linear_flow as lf
from taskflow.openstack.common import uuidutils
from taskflow import engines
from taskflow import exceptions as exc
from taskflow.openstack.common import uuidutils
from taskflow.patterns import graph_flow as gf
from taskflow.patterns import linear_flow as lf
from taskflow import task
from taskflow.utils import eventlet_utils as e_utils
from taskflow.utils import persistence_utils as p_utils
import example_utils # noqa
import example_utils as eu # noqa
# INTRO: This example shows how a hierarchy of flows can be used to create a
# vm in a reliable & resumable manner using taskflow + a miniature version of
@@ -61,12 +58,6 @@ def slow_down(how_long=0.5):
time.sleep(how_long)
def print_wrapped(text):
print("-" * (len(text)))
print(text)
print("-" * (len(text)))
class PrintText(task.Task):
"""Just inserts some text print outs in a workflow."""
def __init__(self, print_what, no_slow=False):
@@ -77,10 +68,10 @@ class PrintText(task.Task):
def execute(self):
if self._no_slow:
print_wrapped(self._text)
eu.print_wrapped(self._text)
else:
with slow_down():
print_wrapped(self._text)
eu.print_wrapped(self._text)
class DefineVMSpec(task.Task):
@@ -229,10 +220,10 @@ def create_flow():
PrintText("Instance is running!", no_slow=True))
return flow
print_wrapped("Initializing")
eu.print_wrapped("Initializing")
# Setup the persistence & resumption layer.
with example_utils.get_backend() as backend:
with eu.get_backend() as backend:
try:
book_id, flow_id = sys.argv[2].split("+", 1)
if not uuidutils.is_uuid_like(book_id):
@@ -275,7 +266,7 @@ with example_utils.get_backend() as backend:
engine_conf=engine_conf)
# Make me my vm please!
print_wrapped('Running')
eu.print_wrapped('Running')
engine.run()
# How to use.

View File

@@ -31,12 +31,10 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
sys.path.insert(0, top_dir)
sys.path.insert(0, self_dir)
from taskflow import engines
from taskflow.patterns import graph_flow as gf
from taskflow.patterns import linear_flow as lf
from taskflow import engines
from taskflow import task
from taskflow.utils import persistence_utils as p_utils
import example_utils # noqa

View File

@@ -26,26 +26,19 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
sys.path.insert(0, top_dir)
import taskflow.engines
from taskflow.patterns import linear_flow as lf
from taskflow import task
# INTRO: In this example we create three tasks, each of which ~calls~ a given
# number (provided as a function input), one of those tasks fails calling a
# number (provided as a function input), one of those tasks *fails* calling a
# given number (the suzzie calling); this causes the workflow to enter the
# reverting process, which activates the revert methods of the previous two
# phone ~calls~.
#
# This simulated calling makes it appear like all three calls occur or all
# three don't occur (transaction-like capabilities). No persistence layer is
# used here so reverting and executing will not handle process failure.
#
# This example shows a basic usage of the taskflow structures without involving
# the complexity of persistence. Using the structures that taskflow provides
# via tasks and flows makes it possible for you to easily at a later time
# hook in a persistence layer (and then gain the functionality that offers)
# when you decide the complexity of adding that layer in is 'worth it' for your
# applications usage pattern (which some applications may not need).
# used here so reverting and executing will *not* be tolerant of process
# failure.
class CallJim(task.Task):
@@ -94,6 +87,6 @@ except Exception as e:
# how to deal with multiple tasks failing while running.
#
# You will also note that this is not a problem in this case since no
# parallelism is involved; this is ensured by the usage of a linear flow,
# which runs serially as well as the default engine type which is 'serial'.
# parallelism is involved; this is ensured by the usage of a linear flow
# and the default engine type which is 'serial' vs being 'parallel'.
print("Flow failed: %s" % e)

View File

@@ -36,12 +36,13 @@ from taskflow import task
# sequence (the flow) and then passing the work off to an engine, with some
# initial data to be ran in a reliable manner.
#
# This example shows a basic usage of the taskflow structures without involving
# the complexity of persistence. Using the structures that taskflow provides
# via tasks and flows makes it possible for you to easily at a later time
# hook in a persistence layer (and then gain the functionality that offers)
# when you decide the complexity of adding that layer in is 'worth it' for your
# applications usage pattern (which some applications may not need).
# NOTE(harlowja): This example shows a basic usage of the taskflow structures
# without involving the complexity of persistence. Using the structures that
# taskflow provides via tasks and flows makes it possible for you to easily at
# a later time hook in a persistence layer (and then gain the functionality
# that offers) when you decide the complexity of adding that layer in
# is 'worth it' for your application's usage pattern (which certain applications
# may not need).
class CallJim(task.Task):

View File

@@ -26,7 +26,6 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
sys.path.insert(0, top_dir)
import taskflow.engines
from taskflow.patterns import linear_flow as lf
from taskflow import task

View File

@@ -0,0 +1,5 @@
Running 2 workers.
Executing some work.
Execution finished.
Result = {"result1": 1, "result2": 666, "x": 111, "y": 222, "z": 333}
Stopping workers.

View File

@@ -0,0 +1,146 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import json
import logging
import os
import sys
import tempfile
import threading
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
from taskflow import engines
from taskflow.engines.worker_based import worker
from taskflow.patterns import linear_flow as lf
from taskflow.tests import utils
import example_utils # noqa
# INTRO: This example walks through a miniature workflow which shows how to
# start up a number of workers (these workers will process task execution and
# reversion requests using any provided input data) and then use an engine
# to build a set of *capable* tasks and flows (the engine can not create
# tasks that the workers are unable to run; that would end in failure) and
# then execute that workflow seamlessly, using the workers to perform the
# actual execution.
#
# NOTE(harlowja): this example simulates the expected larger number of workers
# by using a set of threads (which in this example simulate the remote workers
# that would typically be running on other external machines).
# A filesystem can also be used as the queue transport (useful as a simple
# transport type that does not involve setting up a larger mq system). If
# this is false then the memory transport is used instead; both work in
# standalone setups.
USE_FILESYSTEM = False
BASE_SHARED_CONF = {
'exchange': 'taskflow',
}
WORKERS = 2
WORKER_CONF = {
# These are the tasks the worker can execute; they *must* be importable.
# Typically this list is used to restrict what workers may execute to
# a smaller set of *allowed* tasks that are known to be safe (one would
# not want to allow all python code to be executed).
'tasks': [
'taskflow.tests.utils:TaskOneArgOneReturn',
'taskflow.tests.utils:TaskMultiArgOneReturn'
],
}
ENGINE_CONF = {
'engine': 'worker-based',
}
def run(engine_conf):
flow = lf.Flow('simple-linear').add(
utils.TaskOneArgOneReturn(provides='result1'),
utils.TaskMultiArgOneReturn(provides='result2')
)
eng = engines.load(flow,
store=dict(x=111, y=222, z=333),
engine_conf=engine_conf)
eng.run()
return eng.storage.fetch_all()
if __name__ == "__main__":
logging.basicConfig(level=logging.ERROR)
# Setup our transport configuration and merge it into the worker and
# engine configuration so that both of those use it correctly.
shared_conf = dict(BASE_SHARED_CONF)
tmp_path = None
if USE_FILESYSTEM:
tmp_path = tempfile.mkdtemp(prefix='wbe-example-')
shared_conf.update({
'transport': 'filesystem',
'transport_options': {
'data_folder_in': tmp_path,
'data_folder_out': tmp_path,
'polling_interval': 0.1,
},
})
else:
shared_conf.update({
'transport': 'memory',
'transport_options': {
'polling_interval': 0.1,
},
})
worker_conf = dict(WORKER_CONF)
worker_conf.update(shared_conf)
engine_conf = dict(ENGINE_CONF)
engine_conf.update(shared_conf)
workers = []
worker_topics = []
try:
# Create a set of workers to simulate actual remote workers.
print('Running %s workers.' % (WORKERS))
for i in range(0, WORKERS):
worker_conf['topic'] = 'worker-%s' % (i + 1)
worker_topics.append(worker_conf['topic'])
w = worker.Worker(**worker_conf)
runner = threading.Thread(target=w.run)
runner.daemon = True
runner.start()
w.wait()
workers.append((runner, w.stop))
# Now use those workers to do something.
print('Executing some work.')
engine_conf['topics'] = worker_topics
result = run(engine_conf)
print('Execution finished.')
# This is done so that the test examples can work correctly
# even when the keys change order (which will happen in various
# python versions).
print("Result = %s" % json.dumps(result, sort_keys=True))
finally:
# And cleanup.
print('Stopping workers.')
while workers:
r, stopper = workers.pop()
stopper()
r.join()
if tmp_path:
example_utils.rm_path(tmp_path)

View File

@@ -1,61 +0,0 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import json
import logging
import sys
import taskflow.engines
from taskflow.patterns import linear_flow as lf
from taskflow.tests import utils
LOG = logging.getLogger(__name__)
if __name__ == "__main__":
logging.basicConfig(level=logging.ERROR)
engine_conf = {
'engine': 'worker-based',
'exchange': 'taskflow',
'topics': ['test-topic'],
}
# parse command line
try:
arg = sys.argv[1]
except IndexError:
pass
else:
try:
cfg = json.loads(arg)
except ValueError:
engine_conf.update(url=arg)
else:
engine_conf.update(cfg)
finally:
LOG.debug("Worker configuration: %s\n" %
json.dumps(engine_conf, sort_keys=True, indent=4))
# create and run flow
flow = lf.Flow('simple-linear').add(
utils.TaskOneArgOneReturn(provides='result1'),
utils.TaskMultiArgOneReturn(provides='result2')
)
eng = taskflow.engines.load(flow,
store=dict(x=111, y=222, z=333),
engine_conf=engine_conf)
eng.run()
print(json.dumps(eng.storage.fetch_all(), sort_keys=True))

View File

@@ -1,58 +0,0 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import json
import logging
import sys
from taskflow.engines.worker_based import worker as w
LOG = logging.getLogger(__name__)
if __name__ == "__main__":
logging.basicConfig(level=logging.ERROR)
worker_conf = {
'exchange': 'taskflow',
'topic': 'test-topic',
'tasks': [
'taskflow.tests.utils:TaskOneArgOneReturn',
'taskflow.tests.utils:TaskMultiArgOneReturn'
]
}
# parse command line
try:
arg = sys.argv[1]
except IndexError:
pass
else:
try:
cfg = json.loads(arg)
except ValueError:
worker_conf.update(url=arg)
else:
worker_conf.update(cfg)
finally:
LOG.debug("Worker configuration: %s\n" %
json.dumps(worker_conf, sort_keys=True, indent=4))
# run worker
worker = w.Worker(**worker_conf)
try:
worker.run()
except KeyboardInterrupt:
pass

View File

@@ -1,6 +0,0 @@
Run worker.
Run flow.
{"result1": 1, "result2": 666, "x": 111, "y": 222, "z": 333}
Flow finished.
Stop worker.

View File

@@ -1,73 +0,0 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import json
import os
import subprocess
import sys
import tempfile
self_dir = os.path.abspath(os.path.dirname(__file__))
sys.path.insert(0, self_dir)
import example_utils # noqa
def _path_to(name):
return os.path.abspath(os.path.join(os.path.dirname(__file__),
'worker_based', name))
def run_test(name, config):
cmd = [sys.executable, _path_to(name), config]
process = subprocess.Popen(cmd, stdin=None, stdout=subprocess.PIPE,
stderr=sys.stderr)
return process, cmd
def main():
tmp_path = None
try:
tmp_path = tempfile.mkdtemp(prefix='worker-based-example-')
config = json.dumps({
'transport': 'filesystem',
'transport_options': {
'data_folder_in': tmp_path,
'data_folder_out': tmp_path
}
})
print('Run worker.')
worker_process, _ = run_test('worker.py', config)
print('Run flow.')
flow_process, flow_cmd = run_test('flow.py', config)
stdout, _ = flow_process.communicate()
rc = flow_process.returncode
if rc != 0:
raise RuntimeError("Could not run %s [%s]" % (flow_cmd, rc))
print(stdout.decode())
print('Flow finished.')
print('Stop worker.')
worker_process.terminate()
finally:
if tmp_path is not None:
example_utils.rm_path(tmp_path)
if __name__ == '__main__':
main()

View File

@@ -29,13 +29,14 @@ sys.path.insert(0, top_dir)
import taskflow.engines
from taskflow import exceptions
from taskflow.patterns import unordered_flow as uf
from taskflow import task
from taskflow.tests import utils
from taskflow.utils import misc
import example_utils as eu # noqa
# INTRO: In this example we create two tasks which can trigger exceptions
# based on various inputs to show how to analyze the thrown exceptions to
# determine which types were thrown and handle the different types in
# different ways.
@@ -54,12 +55,6 @@ from taskflow.utils import misc
# that code to do further cleanups (if desired).
def print_wrapped(text):
print("-" * (len(text)))
print(text)
print("-" * (len(text)))
class FirstException(Exception):
"""Exception that first task raises."""
@@ -112,18 +107,18 @@ def run(**store):
misc.Failure.reraise_if_any(unknown_failures)
print_wrapped("Raise and catch first exception only")
eu.print_wrapped("Raise and catch first exception only")
run(sleep1=0.0, raise1=True,
sleep2=0.0, raise2=False)
# NOTE(imelnikov): in general, sleeping does not guarantee that we'll have
# both tasks running before one of them fails, but with the current
# implementation this works most of the time, which is enough for our
# purposes here (as an example).
print_wrapped("Raise and catch both exceptions")
eu.print_wrapped("Raise and catch both exceptions")
run(sleep1=1.0, raise1=True,
sleep2=1.0, raise2=True)
print_wrapped("Handle one exception, and re-raise another")
eu.print_wrapped("Handle one exception, and re-raise another")
try:
run(sleep1=1.0, raise1=True,
sleep2=1.0, raise2='boom')

View File

@@ -84,9 +84,7 @@ class ExecutionFailure(TaskFlowException):
class RequestTimeout(ExecutionFailure):
"""Raised when a worker request was not finished within an allotted
timeout.
"""
"""Raised when a worker request was not finished within allotted time."""
class InvalidState(ExecutionFailure):
@@ -131,6 +129,10 @@ class MultipleChoices(TaskFlowException):
"""Raised when some decision can't be made due to many possible choices."""
class InvalidFormat(TaskFlowException):
"""Raised when some object/entity is not in the expected format."""
# Others.
class WrappedFailure(Exception):

View File

@@ -55,8 +55,11 @@ class Flow(object):
@property
def retry(self):
"""A retry object that will affect control how (and if) this flow
retries while execution is underway.
"""The associated flow retry controller.
This retry controller object will affect & control how (and if) this
flow and its contained components retry when execution is underway and
a failure occurs.
"""
return self._retry

View File

@@ -31,9 +31,25 @@ LOG = logging.getLogger(__name__)
def fetch(name, conf, namespace=BACKEND_NAMESPACE, **kwargs):
"""Fetch a jobboard backend with the given configuration (and any board
specific kwargs) in the given entrypoint namespace and create it with the
given name.
"""Fetch a jobboard backend with the given configuration.
This fetch method will look for the entrypoint name in the entrypoint
namespace, and then attempt to instantiate that entrypoint using the
provided name, configuration and any board specific kwargs.
NOTE(harlowja): to aid in making it easy to specify configuration and
options to a board the configuration (which is typically just a dictionary)
can also be a uri string that identifies the entrypoint name and any
configuration specific to that board.
For example, given the following configuration uri:
zookeeper://<not-used>/?a=b&c=d
This will look for the entrypoint named 'zookeeper' and will provide
a configuration object composed of the uri's parameters, in this case that
is {'a': 'b', 'c': 'd'}, to the constructor of that board instance (also
including the name specified).
"""
if isinstance(conf, six.string_types):
conf = {'board': conf}
@@ -58,8 +74,11 @@ def fetch(name, conf, namespace=BACKEND_NAMESPACE, **kwargs):
@contextlib.contextmanager
def backend(name, conf, namespace=BACKEND_NAMESPACE, **kwargs):
"""Fetches a jobboard backend, connects to it and allows it to be used in
a context manager statement with the jobboard being closed upon completion.
"""Fetches a jobboard, connects to it and closes it on completion.
This allows a board instance to be fetched, connected to, and then used in a
context manager statement with the board being closed upon context
manager exit.
"""
jb = fetch(name, conf, namespace=namespace, **kwargs)
jb.connect()
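For reference, a minimal sketch of how the two helpers above could be used together (assuming these helpers live at taskflow.jobs.backends and that a zookeeper server is reachable; the board name and conf values are illustrative only):
from taskflow.jobs import backends as job_backends
# Dictionary form; an equivalent uri form would name the 'zookeeper'
# entrypoint in the scheme and carry the options as query parameters.
conf = {'board': 'zookeeper', 'path': '/taskflow/my-board'}
with job_backends.backend('my-board', conf) as board:
    # The board is already connected here and is closed on context exit.
    print(board.job_count)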

View File

@@ -33,6 +33,7 @@ from taskflow.openstack.common import excutils
from taskflow.openstack.common import jsonutils
from taskflow.openstack.common import uuidutils
from taskflow import states
from taskflow.types import timing as tt
from taskflow.utils import kazoo_utils
from taskflow.utils import lock_utils
from taskflow.utils import misc
@@ -431,7 +432,7 @@ class ZookeeperJobBoard(jobboard.NotifyingJobBoard):
else:
child_proc(request)
def post(self, name, book, details=None):
def post(self, name, book=None, details=None):
def format_posting(job_uuid):
posting = {
@@ -475,6 +476,17 @@ class ZookeeperJobBoard(jobboard.NotifyingJobBoard):
return job
def claim(self, job, who):
def _unclaimable_try_find_owner(cause):
try:
owner = self.find_owner(job)
except Exception:
owner = None
if owner:
msg = "Job %s already claimed by '%s'" % (job.uuid, owner)
else:
msg = "Job %s already claimed" % (job.uuid)
return excp.UnclaimableJob(msg, cause)
_check_who(who)
with self._wrap(job.uuid, job.path, "Claiming failure: %s"):
# NOTE(harlowja): post as json which will allow for future changes
@@ -482,21 +494,33 @@ class ZookeeperJobBoard(jobboard.NotifyingJobBoard):
value = jsonutils.dumps({
'owner': who,
})
# Ensure the target job is still existent (at the right version).
job_data, job_stat = self._client.get(job.path)
txn = self._client.transaction()
# This will abort (and not create the lock) if the job has been
# removed (somehow...) or updated by someone else to a different
# version...
txn.check(job.path, version=job_stat.version)
txn.create(job.lock_path, value=misc.binary_encode(value),
ephemeral=True)
try:
self._client.create(job.lock_path,
value=misc.binary_encode(value),
ephemeral=True)
except k_exceptions.NodeExistsException:
# Try to see if we can find who the owner really is...
try:
owner = self.find_owner(job)
except Exception:
owner = None
if owner:
msg = "Job %s already claimed by '%s'" % (job.uuid, owner)
kazoo_utils.checked_commit(txn)
except k_exceptions.NodeExistsError as e:
raise _unclaimable_try_find_owner(e)
except kazoo_utils.KazooTransactionException as e:
if len(e.failures) < 2:
raise
else:
msg = "Job %s already claimed" % (job.uuid)
raise excp.UnclaimableJob(msg)
if isinstance(e.failures[0], k_exceptions.NoNodeError):
raise excp.NotFound(
"Job %s not found to be claimed" % job.uuid,
e.failures[0])
if isinstance(e.failures[1], k_exceptions.NodeExistsError):
raise _unclaimable_try_find_owner(e.failures[1])
else:
raise excp.UnclaimableJob(
"Job %s claim failed due to transaction"
" not succeeding" % (job.uuid), e)
@contextlib.contextmanager
def _wrap(self, job_uuid, job_path,
@@ -557,9 +581,10 @@ class ZookeeperJobBoard(jobboard.NotifyingJobBoard):
raise excp.JobFailure("Can not consume a job %s"
" which is not owned by %s"
% (job.uuid, who))
with self._client.transaction() as txn:
txn.delete(job.lock_path, version=lock_stat.version)
txn.delete(job.path, version=data_stat.version)
txn = self._client.transaction()
txn.delete(job.lock_path, version=lock_stat.version)
txn.delete(job.path, version=data_stat.version)
kazoo_utils.checked_commit(txn)
self._remove_job(job.path)
def abandon(self, job, who):
@@ -576,8 +601,9 @@ class ZookeeperJobBoard(jobboard.NotifyingJobBoard):
raise excp.JobFailure("Can not abandon a job %s"
" which is not owned by %s"
% (job.uuid, who))
with self._client.transaction() as txn:
txn.delete(job.lock_path, version=lock_stat.version)
txn = self._client.transaction()
txn.delete(job.lock_path, version=lock_stat.version)
kazoo_utils.checked_commit(txn)
def _state_change_listener(self, state):
LOG.debug("Kazoo client has changed to state: %s", state)
@@ -586,13 +612,12 @@ class ZookeeperJobBoard(jobboard.NotifyingJobBoard):
# Wait until timeout expires (or forever) for jobs to appear.
watch = None
if timeout is not None:
watch = misc.StopWatch(duration=float(timeout))
watch.start()
watch = tt.StopWatch(duration=float(timeout)).start()
self._job_cond.acquire()
try:
while True:
if not self._known_jobs:
if watch and watch.expired():
if watch is not None and watch.expired():
raise excp.NotFound("Expired waiting for jobs to"
" arrive; waited %s seconds"
% watch.elapsed())
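The claim logic above relies on a check-then-create zookeeper transaction; below is a minimal standalone sketch of that pattern using the plain kazoo client API (the checked_commit helper in the diff wraps the commit/failure handling shown here; paths and owner are illustrative):
import json
def try_claim(client, job_path, lock_path, owner):
    """Attempt an atomic check-then-create claim; True on success."""
    # Read the job node to capture its current version.
    _data, stat = client.get(job_path)
    txn = client.transaction()
    # Abort if the job was removed or changed since we read it...
    txn.check(job_path, version=stat.version)
    # ...and atomically create an ephemeral lock naming the owner.
    txn.create(lock_path,
               value=json.dumps({'owner': owner}).encode('utf-8'),
               ephemeral=True)
    results = txn.commit()
    # kazoo returns per-operation results; exception instances in the
    # list mean the transaction (as a whole) did not succeed.
    return not any(isinstance(r, Exception) for r in results)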

View File

@@ -24,16 +24,22 @@ from taskflow.openstack.common import uuidutils
@six.add_metaclass(abc.ABCMeta)
class Job(object):
"""A job is a higher level abstraction over a set of flows as well as the
*ownership* of those flows, it is the highest piece of work that can be
owned by an entity performing those flows.
"""A abstraction that represents a named and trackable unit of work.
Only one entity will be operating on the flows contained in a job at a
given time (for the foreseeable future).
A job connects a logbook, an owner, last modified and created on dates and
any associated state that the job has. Since it is a connector to a
logbook, and each logbook is associated with a set of factories that can
create sets of flows, it is the current top-level container for a piece of work
that can be owned by an entity (typically that entity will read those
logbooks and run any contained flows).
It is the object that should be transferred to another entity on failure of
so that the contained flows ownership can be transferred to the secondary
entity for resumption/continuation/reverting.
Only one entity will be allowed to own and operate on the flows contained
in a job at a given time (for the foreseeable future).
NOTE(harlowja): It is the object that will be transferred to another
entity on failure so that the contained flows ownership can be
transferred to the secondary entity/owner for resumption, continuation,
reverting...
"""
def __init__(self, name, uuid=None, details=None):

View File

@@ -24,10 +24,15 @@ from taskflow.utils import misc
@six.add_metaclass(abc.ABCMeta)
class JobBoard(object):
"""A jobboard is an abstract representation of a place where jobs
can be posted, reposted, claimed and transferred. There can be multiple
implementations of this job board, depending on the desired semantics and
capabilities of the underlying jobboard implementation.
"""A place where jobs can be posted, reposted, claimed and transferred.
There can be multiple implementations of this job board, depending on the
desired semantics and capabilities of the underlying jobboard
implementation.
NOTE(harlowja): the name is meant to be analogous to a board/posting
system that is used in newspapers, or elsewhere to solicit jobs that
people can interview and apply for (and then work on & complete).
"""
def __init__(self, name, conf):
@@ -36,8 +41,7 @@ class JobBoard(object):
@abc.abstractmethod
def iterjobs(self, only_unclaimed=False, ensure_fresh=False):
"""Returns an iterator that will provide back jobs that are currently
on this jobboard.
"""Returns an iterator of jobs that are currently on this board.
NOTE(harlowja): the ordering of this iteration should be by posting
order (oldest to newest) if possible, but it is left up to the backing
@@ -60,9 +64,10 @@ class JobBoard(object):
@abc.abstractmethod
def wait(self, timeout=None):
"""Waits a given amount of time for job/s to be posted, when jobs are
found then an iterator will be returned that contains the jobs at
the given point in time.
"""Waits a given amount of time for jobs to be posted.
When jobs are found then an iterator will be returned that can be used
to iterate over those jobs.
NOTE(harlowja): since a jobboard can be mutated on by multiple external
entities at the *same* time the iterator that can be returned *may*
@@ -75,8 +80,11 @@ class JobBoard(object):
@abc.abstractproperty
def job_count(self):
"""Returns how many jobs are on this jobboard (this count may change as
new jobs appear or are removed).
"""Returns how many jobs are on this jobboard.
NOTE(harlowja): this count may change as jobs appear or are removed so
the accuracy of this count should not be used in a way that requires
it to be exact & absolute.
"""
@abc.abstractmethod
@@ -90,11 +98,13 @@ class JobBoard(object):
@abc.abstractmethod
def consume(self, job, who):
"""Permanently (and atomically) removes a job from the jobboard,
signaling that this job has been completed by the entity assigned
to that job.
"""Permanently (and atomically) removes a job from the jobboard.
Only the entity that has claimed that job is able to consume a job.
Consumption signals to the board (and any others examining the board)
that this job has been completed by the entity that previously claimed
that job.
Only the entity that has claimed that job is able to consume the job.
A job that has been consumed can not be reclaimed or reposted by
another entity (job postings are immutable). Any entity consuming
@@ -108,12 +118,14 @@ class JobBoard(object):
"""
@abc.abstractmethod
def post(self, name, book, details=None):
"""Atomically creates and posts a job to the jobboard, allowing others
to attempt to claim that job (and subsequently work on that job). The
contents of the provided logbook must provide enough information for
others to reference to construct & work on the desired entries that
are contained in that logbook.
def post(self, name, book=None, details=None):
"""Atomically creates and posts a job to the jobboard.
This posting allows others to attempt to claim that job (and
subsequently work on that job). The contents of the provided logbook,
details dictionary, or name (or a mix of these) must provide *enough*
information for consumers to reference to construct and perform that
job's contained work (whatever it may be).
Once a job has been posted it can only be removed by consuming that
job (after that job is claimed). Any entity can post/propose jobs
@@ -124,13 +136,14 @@ class JobBoard(object):
@abc.abstractmethod
def claim(self, job, who):
"""Atomically attempts to claim the given job for the entity and either
succeeds or fails at claiming by throwing corresponding exceptions.
"""Atomically attempts to claim the provided job.
If a job is claimed it is expected that the entity that claims that job
will at sometime in the future work on that jobs flows and either fail
at completing them (resulting in a reposting) or consume that job from
the jobboard (signaling its completion).
will at some time in the future work on that job's contents and either
fail at completing them (resulting in a reposting) or consume that job
from the jobboard (signaling its completion). If claiming fails then
a corresponding exception will be raised to signal this to the claim
attempter.
:param job: a job on this jobboard that can be claimed (if it does
not exist then a NotFound exception will be raised).
@@ -139,10 +152,12 @@ class JobBoard(object):
@abc.abstractmethod
def abandon(self, job, who):
"""Atomically abandons the given job on the jobboard, allowing that job
to be reclaimed by others. This would typically occur if the entity
that has claimed the job has failed or is unable to complete the job
or jobs it has claimed.
"""Atomically attempts to abandon the provided job.
This abandonment signals to others that the job may now be reclaimed.
This would typically occur if the entity that has claimed the job has
failed or is unable to complete the job or jobs it had previously
claimed.
Only the entity that has claimed that job can abandon a job. Any entity
abandoning an unclaimed job (or a job they do not own) will cause an
@@ -177,13 +192,14 @@ REMOVAL = 'REMOVAL' # existing job is/has been removed
class NotifyingJobBoard(JobBoard):
"""A jobboard subclass that can notify about jobs being created
and removed, which can remove the repeated usage of iterjobs() to achieve
the same operation.
"""A jobboard subclass that can notify others about board events.
Implementers are expected to notify *at least* about jobs being posted
and removed.
NOTE(harlowja): notifications that are emitted *may* be emitted on a
separate dedicated thread when they occur, so ensure that all callbacks
registered are thread safe.
registered are thread safe (and block for as little time as possible).
"""
def __init__(self, name, conf):
super(NotifyingJobBoard, self).__init__(name, conf)
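Pulling the abstract methods above together, here is a minimal sketch of the intended post, claim, consume/abandon lifecycle (the board is assumed to be an already connected implementation; the job name, details and worker name are illustrative):
from taskflow import exceptions as excp
def process_one(board, who='worker-1'):
    """Post a job, claim it, then consume (done) or abandon (retry)."""
    job = board.post('example-job', details={'x': 1})
    try:
        board.claim(job, who)
    except excp.UnclaimableJob:
        return  # another entity raced us to the claim
    try:
        # ... perform the work the job's logbook/details describe ...
        pass
    except Exception:
        board.abandon(job, who)  # failure: allow others to reclaim it
    else:
        board.consume(job, who)  # success: permanently remove the job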

View File

@@ -21,7 +21,7 @@ import logging
from taskflow import exceptions as exc
from taskflow.listeners import base
from taskflow import states
from taskflow.utils import misc
from taskflow.types import timing as tt
STARTING_STATES = (states.RUNNING, states.REVERTING)
FINISHED_STATES = base.FINISH_STATES + (states.REVERTED,)
@@ -64,8 +64,7 @@ class TimingListener(base.ListenerBase):
if state == states.PENDING:
self._timers.pop(task_name, None)
elif state in STARTING_STATES:
self._timers[task_name] = misc.StopWatch()
self._timers[task_name].start()
self._timers[task_name] = tt.StopWatch().start()
elif state in FINISHED_STATES:
if task_name in self._timers:
self._record_ending(self._timers[task_name], task_name)
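A minimal sketch of the StopWatch type used above (the module path and the start()/expired()/elapsed() methods are taken from this diff itself; start() evidently returns the watch, since the calls are chained):
import time
from taskflow.types import timing as tt
watch = tt.StopWatch(duration=2.0).start()
while not watch.expired():
    time.sleep(0.5)  # poll until the 2 second duration elapses
print('waited %0.2f seconds' % watch.elapsed())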

View File

@@ -42,7 +42,7 @@ class TranslatorFactory(object):
"""Create translator functions
"""
def __init__(self, domain, lazy=False, localedir=None):
def __init__(self, domain, localedir=None):
"""Establish a set of translation functions for the domain.
:param domain: Name of translation domain,
@@ -55,7 +55,6 @@ class TranslatorFactory(object):
:type localedir: str
"""
self.domain = domain
self.lazy = lazy
if localedir is None:
localedir = os.environ.get(domain.upper() + '_LOCALEDIR')
self.localedir = localedir
@@ -75,16 +74,19 @@ class TranslatorFactory(object):
"""
if domain is None:
domain = self.domain
if self.lazy:
return functools.partial(Message, domain=domain)
t = gettext.translation(
domain,
localedir=self.localedir,
fallback=True,
)
if six.PY3:
return t.gettext
return t.ugettext
t = gettext.translation(domain,
localedir=self.localedir,
fallback=True)
# Use the appropriate method of the translation object based
# on the python version.
m = t.gettext if six.PY3 else t.ugettext
def f(msg):
"""oslo.i18n.gettextutils translation function."""
if USE_LAZY:
return Message(msg, domain=domain)
return m(msg)
return f
@property
def primary(self):
@@ -159,7 +161,7 @@ def enable_lazy():
USE_LAZY = True
def install(domain, lazy=False):
def install(domain):
"""Install a _() function using the given translation domain.
Given a translation domain, install a _() function using gettext's
@@ -170,26 +172,14 @@ def install(domain, lazy=False):
a translation-domain-specific environment variable (e.g.
NOVA_LOCALEDIR).
Note that to enable lazy translation, enable_lazy must be
called.
:param domain: the translation domain
:param lazy: indicates whether or not to install the lazy _() function.
The lazy _() introduces a way to do deferred translation
of messages by installing a _ that builds Message objects,
instead of strings, which can then be lazily translated into
any available locale.
"""
if lazy:
from six import moves
tf = TranslatorFactory(domain, lazy=True)
moves.builtins.__dict__['_'] = tf.primary
else:
localedir = '%s_LOCALEDIR' % domain.upper()
if six.PY3:
gettext.install(domain,
localedir=os.environ.get(localedir))
else:
gettext.install(domain,
localedir=os.environ.get(localedir),
unicode=True)
from six import moves
tf = TranslatorFactory(domain)
moves.builtins.__dict__['_'] = tf.primary
class Message(six.text_type):
@@ -373,8 +363,8 @@ def get_available_languages(domain):
'zh_Hant_HK': 'zh_HK',
'zh_Hant': 'zh_TW',
'fil': 'tl_PH'}
for (locale, alias) in six.iteritems(aliases):
if locale in language_list and alias not in language_list:
for (locale_, alias) in six.iteritems(aliases):
if locale_ in language_list and alias not in language_list:
language_list.append(alias)
_AVAILABLE_LANGUAGES[domain] = language_list

View File

@@ -24,10 +24,10 @@ import traceback
def import_class(import_str):
"""Returns a class from a string including module and class."""
mod_str, _sep, class_str = import_str.rpartition('.')
__import__(mod_str)
try:
__import__(mod_str)
return getattr(sys.modules[mod_str], class_str)
except (ValueError, AttributeError):
except AttributeError:
raise ImportError('Class %s cannot be found (%s)' %
(class_str,
traceback.format_exception(*sys.exc_info())))
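A minimal usage sketch of import_class (assuming the helper lives in the common importutils module, as is conventional for oslo-incubator code):
from taskflow.openstack.common import importutils
# Resolve a class from its dotted path, then instantiate it.
flow_cls = importutils.import_class('taskflow.patterns.linear_flow.Flow')
flow = flow_cls('my-flow')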

View File

@@ -38,11 +38,19 @@ import inspect
import itertools
import sys
is_simplejson = False
if sys.version_info < (2, 7):
# On Python <= 2.6, json module is not C boosted, so try to use
# simplejson module if available
try:
import simplejson as json
# NOTE(mriedem): Make sure we have a new enough version of simplejson
# to support the namedobject_as_tuple argument. This can be removed
# in the Kilo release when python 2.6 support is dropped.
if 'namedtuple_as_object' in inspect.getargspec(json.dumps).args:
is_simplejson = True
else:
import json
except ImportError:
import json
else:
@@ -165,15 +173,23 @@ def to_primitive(value, convert_instances=False, convert_datetime=True,
def dumps(value, default=to_primitive, **kwargs):
if is_simplejson:
kwargs['namedtuple_as_object'] = False
return json.dumps(value, default=default, **kwargs)
def loads(s, encoding='utf-8'):
return json.loads(strutils.safe_decode(s, encoding))
def dump(obj, fp, *args, **kwargs):
if is_simplejson:
kwargs['namedtuple_as_object'] = False
return json.dump(obj, fp, *args, **kwargs)
def load(fp, encoding='utf-8'):
return json.load(codecs.getreader(encoding)(fp))
def loads(s, encoding='utf-8', **kwargs):
return json.loads(strutils.safe_decode(s, encoding), **kwargs)
def load(fp, encoding='utf-8', **kwargs):
return json.load(codecs.getreader(encoding)(fp), **kwargs)
try:
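A minimal sketch of the dumps/loads wrappers above (extra kwargs now pass straight through to the underlying json module; the module path is assumed from the imports used elsewhere in this change):
from taskflow.openstack.common import jsonutils
blob = jsonutils.dumps({'b': 2, 'a': 1}, sort_keys=True)
assert blob == '{"a": 1, "b": 2}'
assert jsonutils.loads(blob) == {'a': 1, 'b': 2}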

View File

@@ -17,18 +17,15 @@
Network-related utilities and helper functions.
"""
# TODO(jd) Use six.moves once
# https://bitbucket.org/gutworth/six/pull-request/28
# is merged
try:
import urllib.parse
SplitResult = urllib.parse.SplitResult
except ImportError:
import urlparse
SplitResult = urlparse.SplitResult
import logging
import socket
from six.moves.urllib import parse
from taskflow.openstack.common.gettextutils import _LW
LOG = logging.getLogger(__name__)
def parse_host_port(address, default_port=None):
"""Interpret a string as a host:port pair.
@@ -52,8 +49,12 @@ def parse_host_port(address, default_port=None):
('::1', 1234)
>>> parse_host_port('2001:db8:85a3::8a2e:370:7334', default_port=1234)
('2001:db8:85a3::8a2e:370:7334', 1234)
>>> parse_host_port(None)
(None, None)
"""
if not address:
return (None, None)
if address[0] == '[':
# Escaped ipv6
_host, _port = address[1:].split(']')
@@ -74,7 +75,7 @@ def parse_host_port(address, default_port=None):
return (host, None if port is None else int(port))
class ModifiedSplitResult(SplitResult):
class ModifiedSplitResult(parse.SplitResult):
"""Split results class for urlsplit."""
# NOTE(dims): The functions below are needed for Python 2.6.x.
@@ -106,3 +107,57 @@ def urlsplit(url, scheme='', allow_fragments=True):
path, query = path.split('?', 1)
return ModifiedSplitResult(scheme, netloc,
path, query, fragment)
def set_tcp_keepalive(sock, tcp_keepalive=True,
tcp_keepidle=None,
tcp_keepalive_interval=None,
tcp_keepalive_count=None):
"""Set values for tcp keepalive parameters
This function configures tcp keepalive parameters if users wish to do
so.
:param tcp_keepalive: Boolean, turn on or off tcp_keepalive. If users are
not sure, this should be True, and default values will be used.
:param tcp_keepidle: time to wait before starting to send keepalive probes
:param tcp_keepalive_interval: time between successive probes, once the
initial wait time is over
:param tcp_keepalive_count: number of probes to send before the connection
is killed
"""
# NOTE(praneshp): Despite keepalive being a tcp concept, the level is
# still SOL_SOCKET. This is a quirk.
if isinstance(tcp_keepalive, bool):
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, tcp_keepalive)
else:
raise TypeError("tcp_keepalive must be a boolean")
if not tcp_keepalive:
return
# These options aren't available in the OS X version of eventlet.
# Idle + Count * Interval effectively gives you the total timeout.
if tcp_keepidle is not None:
if hasattr(socket, 'TCP_KEEPIDLE'):
sock.setsockopt(socket.IPPROTO_TCP,
socket.TCP_KEEPIDLE,
tcp_keepidle)
else:
LOG.warning(_LW('tcp_keepidle not available on your system'))
if tcp_keepalive_interval is not None:
if hasattr(socket, 'TCP_KEEPINTVL'):
sock.setsockopt(socket.IPPROTO_TCP,
socket.TCP_KEEPINTVL,
tcp_keepalive_interval)
else:
LOG.warning(_LW('tcp_keepintvl not available on your system'))
if tcp_keepalive_count is not None:
if hasattr(socket, 'TCP_KEEPCNT'):
sock.setsockopt(socket.IPPROTO_TCP,
socket.TCP_KEEPCNT,
tcp_keepalive_count)
else:
LOG.warning(_LW('tcp_keepcnt not available on your system'))
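A minimal sketch of configuring keepalive on a socket with the helper above (the module path is assumed to be the common network_utils module; the idle/interval/count values are illustrative):
import socket
from taskflow.openstack.common import network_utils
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Probe after 60s idle, then every 10s, giving up after 5 failed probes
# (total timeout ~= idle + interval * count on supporting platforms).
network_utils.set_tcp_keepalive(sock,
                                tcp_keepidle=60,
                                tcp_keepalive_interval=10,
                                tcp_keepalive_count=5)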

View File

@@ -50,6 +50,39 @@ SLUGIFY_STRIP_RE = re.compile(r"[^\w\s-]")
SLUGIFY_HYPHENATE_RE = re.compile(r"[-\s]+")
# NOTE(flaper87): The following globals are used by `mask_password`
_SANITIZE_KEYS = ['adminPass', 'admin_pass', 'password', 'admin_password']
# NOTE(ldbragst): Let's build a list of regex objects using the list of
# _SANITIZE_KEYS we already have. This way, we only have to add the new key
# to the list of _SANITIZE_KEYS and we can generate regular expressions
# for XML and JSON automatically.
_SANITIZE_PATTERNS_2 = []
_SANITIZE_PATTERNS_1 = []
# NOTE(amrith): Some regular expressions have only one parameter, some
# have two parameters. Use different lists of patterns here.
_FORMAT_PATTERNS_1 = [r'(%(key)s\s*[=]\s*)[^\s^\'^\"]+']
_FORMAT_PATTERNS_2 = [r'(%(key)s\s*[=]\s*[\"\']).*?([\"\'])',
r'(%(key)s\s+[\"\']).*?([\"\'])',
r'([-]{2}%(key)s\s+)[^\'^\"^=^\s]+([\s]*)',
r'(<%(key)s>).*?(</%(key)s>)',
r'([\"\']%(key)s[\"\']\s*:\s*[\"\']).*?([\"\'])',
r'([\'"].*?%(key)s[\'"]\s*:\s*u?[\'"]).*?([\'"])',
r'([\'"].*?%(key)s[\'"]\s*,\s*\'--?[A-z]+\'\s*,\s*u?'
'[\'"]).*?([\'"])',
r'(%(key)s\s*--?[A-z]+\s*)\S+(\s*)']
for key in _SANITIZE_KEYS:
for pattern in _FORMAT_PATTERNS_2:
reg_ex = re.compile(pattern % {'key': key}, re.DOTALL)
_SANITIZE_PATTERNS_2.append(reg_ex)
for pattern in _FORMAT_PATTERNS_1:
reg_ex = re.compile(pattern % {'key': key}, re.DOTALL)
_SANITIZE_PATTERNS_1.append(reg_ex)
def int_from_bool_as_string(subject):
"""Interpret a string as a boolean and return either 1 or 0.
@@ -237,3 +270,42 @@ def to_slug(value, incoming=None, errors="strict"):
"ascii", "ignore").decode("ascii")
value = SLUGIFY_STRIP_RE.sub("", value).strip().lower()
return SLUGIFY_HYPHENATE_RE.sub("-", value)
def mask_password(message, secret="***"):
"""Replace password with 'secret' in message.
:param message: The string which includes security information.
:param secret: value with which to replace passwords.
:returns: The unicode value of message with the password fields masked.
For example:
>>> mask_password("'adminPass' : 'aaaaa'")
"'adminPass' : '***'"
>>> mask_password("'admin_pass' : 'aaaaa'")
"'admin_pass' : '***'"
>>> mask_password('"password" : "aaaaa"')
'"password" : "***"'
>>> mask_password("'original_password' : 'aaaaa'")
"'original_password' : '***'"
>>> mask_password("u'original_password' : u'aaaaa'")
"u'original_password' : u'***'"
"""
message = six.text_type(message)
# NOTE(ldbragst): Check to see if anything in message contains any key
# specified in _SANITIZE_KEYS, if not then just return the message since
# we don't have to mask any passwords.
if not any(key in message for key in _SANITIZE_KEYS):
return message
substitute = r'\g<1>' + secret + r'\g<2>'
for pattern in _SANITIZE_PATTERNS_2:
message = re.sub(pattern, substitute, message)
substitute = r'\g<1>' + secret
for pattern in _SANITIZE_PATTERNS_1:
message = re.sub(pattern, substitute, message)
return message

View File

@@ -114,7 +114,7 @@ def utcnow():
def iso8601_from_timestamp(timestamp):
"""Returns a iso8601 formatted date from timestamp."""
"""Returns an iso8601 formatted date from timestamp."""
return isotime(datetime.datetime.utcfromtimestamp(timestamp))
@@ -134,7 +134,7 @@ def set_time_override(override_time=None):
def advance_time_delta(timedelta):
"""Advance overridden time using a datetime.timedelta."""
assert(not utcnow.override_time is None)
assert utcnow.override_time is not None
try:
for dt in utcnow.override_time:
dt += timedelta

View File

@@ -72,9 +72,12 @@ class Flow(flow.Flow):
return graph
def _swap(self, graph):
"""Validates the replacement graph and then swaps the underlying graph
with a frozen version of the replacement graph (this maintains the
invariant that the underlying graph is immutable).
"""Validates the replacement graph and then swaps the underlying graph.
After swapping occurs the underlying graph will be frozen so that the
immutability invariant is maintained (we may be able to relax this
constraint in the future since our exposed public api does not allow
direct access to the underlying graph).
"""
if not graph.is_directed_acyclic():
raise exc.DependencyFailure("No path through the items in the"

View File

@@ -30,8 +30,25 @@ LOG = logging.getLogger(__name__)
def fetch(conf, namespace=BACKEND_NAMESPACE, **kwargs):
"""Fetches a given backend using the given configuration (and any backend
specific kwargs) in the given entrypoint namespace.
"""Fetch a persistence backend with the given configuration.
This fetch method will look for the entrypoint name in the entrypoint
namespace, and then attempt to instantiate that entrypoint using the
provided configuration and any persistence backend specific kwargs.
NOTE(harlowja): to aid in making it easy to specify configuration and
options to a backend the configuration (which is typically just a dictionary)
can also be a uri string that identifies the entrypoint name and any
configuration specific to that backend.
For example, given the following configuration uri:
mysql://<not-used>/?a=b&c=d
This will look for the entrypoint named 'mysql' and will provide
a configuration object composed of the uri's parameters, in this case that
is {'a': 'b', 'c': 'd'}, to the constructor of that persistence backend
instance.
"""
backend_name = conf['connection']
try:
@@ -54,8 +71,12 @@ def fetch(conf, namespace=BACKEND_NAMESPACE, **kwargs):
@contextlib.contextmanager
def backend(conf, namespace=BACKEND_NAMESPACE, **kwargs):
"""Fetches a persistence backend, ensures that it is upgraded and upon
context manager completion closes the backend.
"""Fetches a backend, connects, upgrades, then closes it on completion.
This allows a backend instance to be fetched, connected to, have its schema
upgraded (if the schema is already up to date this is a no-op) and then
used in a context manager statement with the backend being closed upon
context manager exit.
"""
with contextlib.closing(fetch(conf, namespace=namespace, **kwargs)) as be:
with contextlib.closing(be.get_connection()) as conn:
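A minimal sketch tying the two helpers above together (assuming these helpers live at taskflow.persistence.backends; the sqlite connection string is illustrative, matching the example conf style used by the sqlalchemy backend):
from taskflow.persistence import backends as persistence_backends
from taskflow.utils import persistence_utils as p_utils
conf = {'connection': 'sqlite:////tmp/test.db'}
with persistence_backends.backend(conf) as be:
    # The schema has already been upgraded by the context manager.
    book = p_utils.temporary_log_book(be)
    print(book)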

View File

@@ -70,9 +70,11 @@ class Connection(object):
@abc.abstractmethod
def validate(self):
"""Validates that a backend is still ok to be used (the semantics
of this vary depending on the backend). On failure a backend specific
exception is raised that will indicate why the failure occurred.
"""Validates that a backend is still ok to be used.
The semantics of this *may* vary depending on the backend. On failure a
backend specific exception should be raised that will indicate why the
failure occurred.
"""
pass

View File

@@ -33,10 +33,24 @@ LOG = logging.getLogger(__name__)
class DirBackend(base.Backend):
"""A backend that writes logbooks, flow details, and task details to a
provided directory. This backend does *not* provide transactional semantics
although it does guarantee that there will be no race conditions when
writing/reading by using file level locking.
"""A directory and file based backend.
This backend writes logbooks, flow details, and atom details to a provided
base path on the local filesystem. It will create and store those objects
in three key directories (one for logbooks, one for flow details and one
for atom details). It creates those associated directories and then
creates files inside those directories that represent the contents of those
objects for later reading and writing.
This backend does *not* provide true transactional semantics. It does
guarantee that there will be no interprocess race conditions when
writing and reading by using a consistent hierarchy of file based locks.
Example conf:
conf = {
"path": "/tmp/taskflow",
}
"""
def __init__(self, conf):
super(DirBackend, self).__init__(conf)

View File

@@ -15,8 +15,6 @@
# License for the specific language governing permissions and limitations
# under the License.
"""Implementation of in-memory backend."""
import logging
import six
@@ -29,8 +27,10 @@ LOG = logging.getLogger(__name__)
class MemoryBackend(base.Backend):
"""A backend that writes logbooks, flow details, and task details to in
memory dictionaries.
"""A in-memory (non-persistent) backend.
This backend writes logbooks, flow details, and atom details to in-memory
dictionaries and retrieves from those dictionaries as needed.
"""
def __init__(self, conf=None):
super(MemoryBackend, self).__init__(conf)

View File

@@ -32,6 +32,7 @@ from sqlalchemy import orm as sa_orm
from sqlalchemy import pool as sa_pool
from taskflow import exceptions as exc
from taskflow.openstack.common import strutils
from taskflow.persistence.backends import base
from taskflow.persistence.backends.sqlalchemy import migration
from taskflow.persistence.backends.sqlalchemy import models
@@ -120,6 +121,18 @@ def _is_db_connection_error(reason):
return _in_any(reason, list(MY_SQL_CONN_ERRORS + POSTGRES_CONN_ERRORS))
def _as_bool(value):
if isinstance(value, bool):
return value
# This is different from strutils, but imho is an acceptable difference.
if value is None:
return False
# NOTE(harlowja): prefer strictness to avoid users getting accustomed
# to passing bad values in and this *just working* (which imho is a bad
# habit to encourage).
return strutils.bool_from_string(value, strict=True)
def _thread_yield(dbapi_con, con_record):
"""Ensure other greenthreads get a chance to be executed.
@@ -167,6 +180,14 @@ def _ping_listener(dbapi_conn, connection_rec, connection_proxy):
class SQLAlchemyBackend(base.Backend):
"""A sqlalchemy backend.
Example conf:
conf = {
"connection": "sqlite:////tmp/test.db",
}
"""
def __init__(self, conf, engine=None):
super(SQLAlchemyBackend, self).__init__(conf)
if engine is not None:
@@ -183,8 +204,8 @@ class SQLAlchemyBackend(base.Backend):
# all the popping that will happen below.
conf = copy.deepcopy(self._conf)
engine_args = {
'echo': misc.as_bool(conf.pop('echo', False)),
'convert_unicode': misc.as_bool(conf.pop('convert_unicode', True)),
'echo': _as_bool(conf.pop('echo', False)),
'convert_unicode': _as_bool(conf.pop('convert_unicode', True)),
'pool_recycle': 3600,
}
if 'idle_timeout' in conf:
@@ -229,13 +250,13 @@ class SQLAlchemyBackend(base.Backend):
engine = sa.create_engine(sql_connection, **engine_args)
checkin_yield = conf.pop('checkin_yield',
eventlet_utils.EVENTLET_AVAILABLE)
if misc.as_bool(checkin_yield):
if _as_bool(checkin_yield):
sa.event.listen(engine, 'checkin', _thread_yield)
if 'mysql' in e_url.drivername:
if misc.as_bool(conf.pop('checkout_ping', True)):
if _as_bool(conf.pop('checkout_ping', True)):
sa.event.listen(engine, 'checkout', _ping_listener)
mode = None
if misc.as_bool(conf.pop('mysql_traditional_mode', True)):
if _as_bool(conf.pop('mysql_traditional_mode', True)):
mode = 'TRADITIONAL'
if 'mysql_sql_mode' in conf:
mode = conf.pop('mysql_sql_mode')
@@ -337,9 +358,13 @@ class Connection(base.Connection):
failures[-1].reraise()
def _run_in_session(self, functor, *args, **kwargs):
"""Runs a function in a session and makes sure that sqlalchemy
exceptions aren't emitted from that sessions actions (as that would
expose the underlying backends exception model).
"""Runs a callback in a session.
This function proxy will create a session, and then call the callback
with that session (along with the provided args and kwargs). It ensures
that the session is opened & closed and makes sure that sqlalchemy
exceptions aren't emitted from the callback or sessions actions (as
that would expose the underlying sqlalchemy exception model).
"""
try:
session = self._make_session()

View File

@@ -34,9 +34,16 @@ MIN_ZK_VERSION = (3, 4, 0)
class ZkBackend(base.Backend):
"""ZooKeeper as backend storage implementation
"""A zookeeper backend.
Example conf (use Kazoo):
This backend writes logbooks, flow details, and atom details to a provided
base path in zookeeper. It will create and store those objects in three
key directories (one for logbooks, one for flow details and one for atom
details). It creates those associated directories and then creates files
inside those directories that represent the contents of those objects for
later reading and writing.
Example conf:
conf = {
"hosts": "192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181",
@@ -126,8 +133,11 @@ class ZkConnection(base.Connection):
@contextlib.contextmanager
def _exc_wrapper(self):
"""Exception wrapper which wraps kazoo exceptions and groups them
to taskflow exceptions.
"""Exception context-manager which wraps kazoo exceptions.
This is used to capture and wrap any kazoo specific exceptions and
then group them into corresponding taskflow exceptions (not doing
that would expose the underlying kazoo exception model).
"""
try:
yield
@@ -146,8 +156,10 @@ class ZkConnection(base.Connection):
def update_atom_details(self, ad):
"""Update a atom detail transactionally."""
with self._exc_wrapper():
with self._client.transaction() as txn:
return self._update_atom_details(ad, txn)
txn = self._client.transaction()
ad = self._update_atom_details(ad, txn)
k_utils.checked_commit(txn)
return ad
def _update_atom_details(self, ad, txn, create_missing=False):
# Determine whether the desired data exists or not.
@@ -199,8 +211,10 @@ class ZkConnection(base.Connection):
def update_flow_details(self, fd):
"""Update a flow detail transactionally."""
with self._exc_wrapper():
with self._client.transaction() as txn:
return self._update_flow_details(fd, txn)
txn = self._client.transaction()
fd = self._update_flow_details(fd, txn)
k_utils.checked_commit(txn)
return fd
def _update_flow_details(self, fd, txn, create_missing=False):
# Determine whether the desired data exists or not
@@ -296,19 +310,19 @@ class ZkConnection(base.Connection):
return e_lb
with self._exc_wrapper():
with self._client.transaction() as txn:
# Determine whether the desired data exists or not.
lb_path = paths.join(self.book_path, lb.uuid)
try:
lb_data, _zstat = self._client.get(lb_path)
except k_exc.NoNodeError:
# Create a new logbook since it doesn't exist.
e_lb = _create_logbook(lb_path, txn)
else:
# Otherwise update the existing logbook instead.
e_lb = _update_logbook(lb_path, lb_data, txn)
# Finally return (updated) logbook.
return e_lb
txn = self._client.transaction()
# Determine whether the desired data exists or not.
lb_path = paths.join(self.book_path, lb.uuid)
try:
lb_data, _zstat = self._client.get(lb_path)
except k_exc.NoNodeError:
# Create a new logbook since it doesn't exist.
e_lb = _create_logbook(lb_path, txn)
else:
# Otherwise update the existing logbook instead.
e_lb = _update_logbook(lb_path, lb_data, txn)
k_utils.checked_commit(txn)
return e_lb
def _get_logbook(self, lb_uuid):
lb_path = paths.join(self.book_path, lb_uuid)
@@ -370,35 +384,38 @@ class ZkConnection(base.Connection):
txn.delete(lb_path)
with self._exc_wrapper():
with self._client.transaction() as txn:
_destroy_logbook(lb_uuid, txn)
txn = self._client.transaction()
_destroy_logbook(lb_uuid, txn)
k_utils.checked_commit(txn)
def clear_all(self, delete_dirs=True):
"""Delete all data transactionally."""
with self._exc_wrapper():
with self._client.transaction() as txn:
txn = self._client.transaction()
# Delete all data under logbook path.
for lb_uuid in self._client.get_children(self.book_path):
lb_path = paths.join(self.book_path, lb_uuid)
for fd_uuid in self._client.get_children(lb_path):
txn.delete(paths.join(lb_path, fd_uuid))
txn.delete(lb_path)
# Delete all data under logbook path.
for lb_uuid in self._client.get_children(self.book_path):
lb_path = paths.join(self.book_path, lb_uuid)
for fd_uuid in self._client.get_children(lb_path):
txn.delete(paths.join(lb_path, fd_uuid))
txn.delete(lb_path)
# Delete all data under flow detail path.
for fd_uuid in self._client.get_children(self.flow_path):
fd_path = paths.join(self.flow_path, fd_uuid)
for ad_uuid in self._client.get_children(fd_path):
txn.delete(paths.join(fd_path, ad_uuid))
txn.delete(fd_path)
# Delete all data under flow detail path.
for fd_uuid in self._client.get_children(self.flow_path):
fd_path = paths.join(self.flow_path, fd_uuid)
for ad_uuid in self._client.get_children(fd_path):
txn.delete(paths.join(fd_path, ad_uuid))
txn.delete(fd_path)
# Delete all data under atom detail path.
for ad_uuid in self._client.get_children(self.atom_path):
ad_path = paths.join(self.atom_path, ad_uuid)
txn.delete(ad_path)
# Delete all data under atom detail path.
for ad_uuid in self._client.get_children(self.atom_path):
ad_path = paths.join(self.atom_path, ad_uuid)
txn.delete(ad_path)
# Delete containing directories.
if delete_dirs:
txn.delete(self.book_path)
txn.delete(self.atom_path)
txn.delete(self.flow_path)
# Delete containing directories.
if delete_dirs:
txn.delete(self.book_path)
txn.delete(self.atom_path)
txn.delete(self.flow_path)
k_utils.checked_commit(txn)

View File

@@ -20,6 +20,7 @@ down_revision = '84d6e888850'
from alembic import op
import sqlalchemy as sa
from taskflow import states

View File

@@ -28,6 +28,7 @@ down_revision = '1c783c0c2875'
from alembic import op
import sqlalchemy as sa
from taskflow.persistence import logbook

View File

@@ -25,7 +25,6 @@ from sqlalchemy import types as types
from taskflow.openstack.common import jsonutils
from taskflow.openstack.common import timeutils
from taskflow.openstack.common import uuidutils
from taskflow.persistence import logbook
from taskflow import states

View File

@@ -64,14 +64,20 @@ def _fix_meta(data):
class LogBook(object):
"""This class that contains a dict of flow detail entries for a
given *job* so that the job can track what 'work' has been
completed for resumption/reverting and miscellaneous tracking
"""A container of flow details, a name and associated metadata.
Typically this class contains a collection of flow detail entries
for a given engine (or job) so that those entities can track what 'work'
has been completed for resumption, reverting and miscellaneous tracking
purposes.
The data contained within this class need *not* be backed by the backend
storage in real time. The data in this class will only be guaranteed to be
persisted when a save occurs via some backend connection.
NOTE(harlowja): the naming of this class is analogous to a ship's log or
a similar type of record used in detailing work that has been completed
(or work that has not been completed).
"""
def __init__(self, name, uuid=None):
if uuid:
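A hedged sketch of the containment hierarchy described above (assuming a backend object obtained earlier, for example via taskflow.persistence.backends.fetch):

    import contextlib

    from taskflow.openstack.common import uuidutils
    from taskflow.persistence import logbook

    lb = logbook.LogBook("cleanup-work")
    fd = logbook.FlowDetail("cleanup-flow", uuid=uuidutils.generate_uuid())
    lb.add(fd)

    # Nothing above touches the backend; persistence only happens on save.
    with contextlib.closing(backend.get_connection()) as conn:
        conn.save_logbook(lb)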
@@ -159,8 +165,11 @@ class LogBook(object):
class FlowDetail(object):
"""This class contains a dict of atom detail entries for a given
flow along with any metadata associated with that flow.
"""A container of atom details, a name and associated metadata.
Typically this class contains a collection of atom detail entries that
represent the atoms in a given flow structure (along with any other needed
metadata relevant to that flow).
The data contained within this class need *not* be backed by the backend
storage in real time. The data in this class will only be guaranteed to be
@@ -241,13 +250,15 @@ class FlowDetail(object):
@six.add_metaclass(abc.ABCMeta)
class AtomDetail(object):
"""This is a base class that contains an entry that contains the
persistence of an atom after or before (or during) it is running including
any results it may have produced, any state that it may be in (failed
for example), any exception that occurred when running and any associated
stacktrace that may have occurring during that exception being thrown
and any other metadata that should be stored along-side the details
about this atom.
"""A base container of atom specific runtime information and metadata.
This is a base class that contains attributes that are used to connect
an atom to the persistence layer before, during, or after it is running.
These include any results it may have produced, any state that it may be
in (failed for example), any exception that occurred when running, any
associated stacktrace that may have occurred when that exception was
thrown, and any other metadata that should be stored alongside the
details about the connected atom.
The data contained within this class need *not* be backed by the backend
storage in real time. The data in this class will only be guaranteed to be
@@ -276,8 +287,11 @@ class AtomDetail(object):
@property
def last_results(self):
"""Gets the atoms last result (if it has many results it should then
return the last one of many).
"""Gets the atoms last result.
If the atom has produced many results (for example if it has been
retried, reverted, executed and so on) this returns the last one of
those many results.
"""
return self.results
@@ -397,7 +411,7 @@ class TaskDetail(AtomDetail):
def merge(self, other, deep_copy=False):
if not isinstance(other, TaskDetail):
raise NotImplemented("Can only merge with other task details")
raise NotImplementedError("Can only merge with other task details")
if other is self:
return self
super(TaskDetail, self).merge(other, deep_copy=deep_copy)
@@ -482,7 +496,8 @@ class RetryDetail(AtomDetail):
def merge(self, other, deep_copy=False):
if not isinstance(other, RetryDetail):
raise NotImplemented("Can only merge with other retry details")
raise NotImplementedError("Can only merge with other retry "
"details")
if other is self:
return self
super(RetryDetail, self).merge(other, deep_copy=deep_copy)

View File

@@ -34,8 +34,7 @@ RETRY = "RETRY"
@six.add_metaclass(abc.ABCMeta)
class Decider(object):
"""A base class or mixin for an object that can decide how to resolve
execution failures.
"""A class/mixin object that can decide how to resolve execution failures.
A decider may be executed multiple times on subflow or other atom
failure and it is expected to make a decision about what should be done
@@ -45,10 +44,11 @@ class Decider(object):
@abc.abstractmethod
def on_failure(self, history, *args, **kwargs):
"""On subflow failure makes a decision about the future flow
execution using information about prior previous failures (if this
historical failure information is not available or was not persisted
this history will be empty).
"""On failure makes a decision about the future.
This method will typically use information about prior failures (if
this historical failure information is not available or was not
persisted this history will be empty).
Returns retry action constant:
@@ -63,9 +63,13 @@ class Decider(object):
@six.add_metaclass(abc.ABCMeta)
class Retry(atom.Atom, Decider):
"""A base class for a retry object that decides how to resolve subflow
execution failures and may also provide execute and revert methods to alter
the inputs of subflow atoms.
"""A class that can decide how to resolve execution failures.
This abstract base class is used to inherit from and provide different
strategies that will be activated upon execution failures. Since a retry
object is an atom it may also provide execute and revert methods to alter
the inputs of connected atoms (depending on the desired strategy to be
used this can be quite useful).
"""
default_provides = None
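A hedged sketch of a minimal concrete retry controller (loosely mirroring the built-in Times strategy; not taskflow's actual implementation):

    from taskflow import retry


    class LimitedAttempts(retry.Retry):
        """Retries a connected subflow up to a fixed number of times."""

        def __init__(self, attempts=3, **kwargs):
            super(LimitedAttempts, self).__init__(**kwargs)
            self._attempts = attempts

        def on_failure(self, history, *args, **kwargs):
            # Each (result, failures) entry in the history is one prior try.
            if len(history) < self._attempts:
                return retry.RETRY
            return retry.REVERT

        def execute(self, history, *args, **kwargs):
            # Expose the current attempt number to connected atoms.
            return len(history) + 1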
@@ -88,22 +92,32 @@ class Retry(atom.Atom, Decider):
@abc.abstractmethod
def execute(self, history, *args, **kwargs):
"""Activate a given retry which will produce data required to
start or restart a subflow using previously provided values and a
history of subflow failures from previous runs.
Retry can provide same values multiple times (after each run),
the latest value will be used by tasks. Old values will be saved to
the history of retry that is a list of tuples (result, failures)
where failures is a dictionary of failures by task names.
This allows to make retries of subflow with different parameters.
"""Executes the given retry atom.
This execution activates a given retry which will typically produce
data required to start or restart a connected component using
previously provided values and a history of prior failures from
previous runs. The historical data can be analyzed to alter the
resolution strategy that this retry controller will use.
For example, a retry can provide the same values multiple times (after
each run), the latest value, or some other variation. Old values are
saved to the history of the retry atom automatically; that is, a list
of (result, failures) tuples is persisted, where failures is a
dictionary of failures indexed by task names and result is the
execution result returned by this retry controller during that failure
resolution attempt.
"""
def revert(self, history, *args, **kwargs):
"""Revert this retry using the given context, all results
that had been provided by previous tries and all errors caused
a reversion. This method will be called only if a subflow must be
reverted without the retry. It won't be called on subflow retry, but
all subflow's tasks will be reverted before the retry.
"""Reverts this retry using the given context.
On revert call all results that had been provided by previous tries
and all errors caused during reversion are provided. This method
will be called *only* if a subflow must be reverted without the
retry (that is to say that the controller has ran out of resolution
options and has either given up resolution or has failed to handle
a execution failure).
"""
@@ -146,9 +160,12 @@ class Times(Retry):
class ForEachBase(Retry):
"""Base class for retries that iterate given collection."""
"""Base class for retries that iterate over a given collection."""
def _get_next_value(self, values, history):
# Fetches the next resolution result to try, removes overlapping
# entries with what has already been tried and then returns the first
# resolution strategy remaining.
items = (item for item, _failures in history)
remaining = misc.sequence_minus(values, items)
if not remaining:
@@ -166,8 +183,10 @@ class ForEachBase(Retry):
class ForEach(ForEachBase):
"""Accepts a collection of values to the constructor. Returns the next
element of the collection on each try.
"""Applies a statically provided collection of strategies.
Accepts a collection of decision strategies on construction and returns the
next element of the collection on each try.
"""
def __init__(self, values, name=None, provides=None, requires=None,
@@ -180,12 +199,17 @@ class ForEach(ForEachBase):
return self._on_failure(self._values, history)
def execute(self, history, *args, **kwargs):
# NOTE(harlowja): This allows any connected components to know the
# current resolution strategy being attempted.
return self._get_next_value(self._values, history)
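A hedged usage sketch of ForEach feeding a different value into a flow on each failed attempt; the 'timeout' name and the consuming task are illustrative assumptions:

    from taskflow import retry
    from taskflow import task
    from taskflow.patterns import linear_flow


    class FlakyCall(task.Task):
        def execute(self, timeout):
            # Attempt some operation bounded by the provided timeout.
            pass


    flow = linear_flow.Flow(
        "call-with-growing-timeouts",
        retry=retry.ForEach([1, 2, 5], provides='timeout'))
    flow.add(FlakyCall())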
class ParameterizedForEach(ForEachBase):
"""Accepts a collection of values from storage as a parameter of execute
method. Returns the next element of the collection on each try.
"""Applies a dynamically provided collection of strategies.
Accepts a collection of decision strategies from a predecessor (or from
storage) as a parameter and returns the next element of that collection on
each try.
"""
def on_failure(self, values, history, *args, **kwargs):

View File

@@ -46,14 +46,14 @@ EXECUTE = 'EXECUTE'
IGNORE = 'IGNORE'
REVERT = 'REVERT'
RETRY = 'RETRY'
INTENTIONS = [EXECUTE, IGNORE, REVERT, RETRY]
INTENTIONS = (EXECUTE, IGNORE, REVERT, RETRY)
# Additional engine states
SCHEDULING = 'SCHEDULING'
WAITING = 'WAITING'
ANALYZING = 'ANALYZING'
## Flow state transitions
# Flow state transitions
# See: http://docs.openstack.org/developer/taskflow/states.html
_ALLOWED_FLOW_TRANSITIONS = frozenset((
@@ -124,7 +124,7 @@ def check_flow_transition(old_state, new_state):
% pair)
## Task state transitions
# Task state transitions
# See: http://docs.openstack.org/developer/taskflow/states.html
_ALLOWED_TASK_TRANSITIONS = frozenset((

View File

@@ -77,9 +77,12 @@ class Storage(object):
@abc.abstractproperty
def _lock_cls(self):
"""Lock class used to generate reader/writer locks for protecting
read/write access to the underlying storage backend and internally
mutating operations.
"""Lock class used to generate reader/writer locks.
These locks are used for protecting read/write access to the
underlying storage backend when internally mutating operations occur.
They ensure that we read and write data in a consistent manner when
being used in a multithreaded situation.
"""
def _with_connection(self, functor, *args, **kwargs):
@@ -248,9 +251,12 @@ class Storage(object):
self._with_connection(self._save_atom_detail, ad)
def update_atom_metadata(self, atom_name, update_with):
"""Updates a atoms metadata given another dictionary or a list of
(key, value) pairs to include in the updated metadata (newer keys will
overwrite older keys).
"""Updates a atoms associated metadata.
This update will take a provided dictionary or a list of (key, value)
pairs to include in the updated metadata (newer keys will overwrite
older keys) and after merging saves the updated data into the
underlying persistence layer.
"""
self._update_atom_metadata(atom_name, update_with)
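A hedged usage sketch (the atom name and keys are illustrative):

    # Each call merges into (and persists) the atom's stored metadata;
    # newer keys overwrite older ones.
    storage.update_atom_metadata('fetch-image', {'attempt': 1})
    storage.update_atom_metadata('fetch-image',
                                 [('attempt', 2), ('worker', 'node-1')])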

View File

@@ -30,8 +30,12 @@ LOG = logging.getLogger(__name__)
@six.add_metaclass(abc.ABCMeta)
class BaseTask(atom.Atom):
"""An abstraction that defines a potential piece of work that can be
applied and can be reverted to undo the work as a single task.
"""An abstraction that defines a potential piece of work.
This potential piece of work is expected to be able to contain
functionality that defines what can be executed to accomplish that work
as well as a way of defining what can be executed to revert/undo that
same piece of work.
"""
TASK_EVENTS = ('update_progress', )
@@ -43,6 +47,15 @@ class BaseTask(atom.Atom):
# Map of events => lists of callbacks to invoke on task events.
self._events_listeners = collections.defaultdict(list)
def pre_execute(self):
"""Code to be run prior to executing the task.
A common pattern for initializing the state of the system prior to
running tasks is to define some code in a base class that all your
tasks inherit from. In that class, you can define a pre_execute
method and it will always be invoked just prior to your tasks running.
"""
@abc.abstractmethod
def execute(self, *args, **kwargs):
"""Activate a given task which will perform some operation and return.
@@ -61,6 +74,25 @@ class BaseTask(atom.Atom):
or remote).
"""
def post_execute(self):
"""Code to be run after executing the task.
A common pattern for cleaning up global state of the system after the
execution of tasks is to define some code in a base class that all your
tasks inherit from. In that class, you can define a post_execute
method and it will always be invoked just after your tasks execute,
regardless of whether they succeeded or not.
This pattern is useful if you have global shared database sessions
that need to be cleaned up, for example.
"""
def pre_revert(self):
"""Code to be run prior to reverting the task.
This works the same as pre_execute, but for the revert phase.
"""
def revert(self, *args, **kwargs):
"""Revert this task.
@@ -75,6 +107,12 @@ class BaseTask(atom.Atom):
contain the failure information.
"""
def post_revert(self):
"""Code to be run after reverting the task.
This works the same as post_execute, but for the revert phase.
"""
def update_progress(self, progress, **kwargs):
"""Update task progress and notify all registered listeners.
@@ -101,8 +139,12 @@ class BaseTask(atom.Atom):
@contextlib.contextmanager
def autobind(self, event_name, handler_func, **kwargs):
"""Binds a given function to the task for a given event name and then
unbinds that event name and associated function automatically on exit.
"""Binds & unbinds a given event handler to the task.
This function binds and unbinds using the context manager protocol.
When events are triggered on the task of the given event name this
handler will automatically be called with the provided keyword
arguments.
"""
bound = False
if handler_func is not None:
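A hedged usage sketch of autobind; the handler signature shown is an assumption about how task events are dispatched:

    def print_progress(task, progress, **kwargs):
        print("%s is %0.1f%% complete" % (task.name, progress * 100.0))

    t = MyTask()  # some BaseTask subclass
    with t.autobind('update_progress', print_progress):
        t.execute()  # any update_progress() calls now reach the handler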
@@ -135,10 +177,11 @@ class BaseTask(atom.Atom):
self._events_listeners[event].append((handler, kwargs))
def unbind(self, event, handler=None):
"""Remove a previously-attached event handler from the task. If handler
function not passed, then unbind all event handlers for the provided
event. If multiple of the same handlers are bound, then the first
match is removed (and only the first match).
"""Remove a previously-attached event handler from the task.
If a handler function is not passed, then this will unbind all event
handlers for the provided event. If multiple of the same handlers are
bound, then the first match is removed (and only the first match).
:param event: event type
:param handler: handler previously bound

View File

@@ -14,15 +14,13 @@
# License for the specific language governing permissions and limitations
# under the License.
import fixtures
import mock
import six
from testtools import compat
from testtools import matchers
from testtools import testcase
import fixtures
import six
from taskflow import exceptions
from taskflow.tests import utils
from taskflow.utils import misc
@@ -41,8 +39,11 @@ class GreaterThanEqual(object):
class FailureRegexpMatcher(object):
"""Matches if the failure was caused by the given exception and its string
matches to the given pattern.
"""Matches if the failure was caused by the given exception and message.
This will match if a given failure contains an exception of the given
class type and if its string message matches the given regular
expression pattern.
"""
def __init__(self, exc_class, pattern):
@@ -59,8 +60,10 @@ class FailureRegexpMatcher(object):
class ItemsEqual(object):
"""Matches the sequence that has same elements as reference
object, regardless of the order.
"""Matches the items in two sequences.
This matcher will validate that the provided sequence has the same elements
as a reference sequence, regardless of the order.
"""
def __init__(self, seq):
@@ -167,9 +170,7 @@ class TestCase(testcase.TestCase):
def assertFailuresRegexp(self, exc_class, pattern, callable_obj, *args,
**kwargs):
"""Assert that the callable failed with the given exception and its
string matches to the given pattern.
"""
"""Asserts the callable failed with the given exception and message."""
try:
with utils.wrap_all_failures():
callable_obj(*args, **kwargs)
@@ -200,8 +201,11 @@ class MockTestCase(TestCase):
return mocked
def _patch_class(self, module, name, autospec=True, attach_as=None):
"""Patch class, create class instance mock and attach them to
the master mock.
"""Patches a modules class.
This will create a class instance mock (using the provided name to
find the class in the module) and attach a mock class to the master
mock so that both are cleaned up on test exit.
"""
if autospec:
instance_mock = mock.Mock(spec_set=getattr(module, name))

View File

@@ -24,7 +24,7 @@ extension; then it will be checked that output did not change.
When this module is used as main module, output for all examples are
generated. Please note that this will break tests as output for most
examples is indeterministic.
examples is nondeterministic (due to hash randomization, for example).
"""
@@ -91,8 +91,12 @@ def list_examples():
class ExamplesTestCase(taskflow.test.TestCase):
@classmethod
def update(cls):
"""For each example, adds on a test method that the testing framework
will then run.
"""For each example, adds on a test method.
This newly created test method will then be activated by the testing
framework when it scans for and runs tests. This makes for an elegant
and simple way to ensure that all of the provided examples
actually work.
"""
def add_test_method(name, method_name):
def test_example(self):
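A hedged sketch of this dynamic generation pattern in isolation (list_examples and run_example stand in for the module's discovery and execution helpers):

    import unittest


    class GeneratedExamplesTestCase(unittest.TestCase):

        @classmethod
        def update(cls):
            for name in list_examples():  # assumed discovery helper
                def make_test(example_name):
                    def test_example(self):
                        run_example(example_name)  # assumed runner
                    return test_example
                # Each example becomes its own discoverable test method.
                setattr(cls, 'test_%s' % name, make_test(name))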

Some files were not shown because too many files have changed in this diff.