Merge tag '1.15.0' into debian/liberty

taskflow 1.15.0 release
This commit is contained in:
Thomas Goirand
2015-07-15 11:26:16 +02:00
188 changed files with 10919 additions and 6876 deletions

1
ChangeLog Normal file

@@ -0,0 +1 @@
.. This is a generated file! Do not edit.


@@ -1,6 +1,14 @@
TaskFlow
========
.. image:: https://img.shields.io/pypi/v/taskflow.svg
:target: https://pypi.python.org/pypi/taskflow/
:alt: Latest Version
.. image:: https://img.shields.io/pypi/dm/taskflow.svg
:target: https://pypi.python.org/pypi/taskflow/
:alt: Downloads
A library to do [jobs, tasks, flows] in a highly available, easy to understand
and declarative manner (and more!) to be used with OpenStack and other
projects.
@@ -22,18 +30,16 @@ Requirements
~~~~~~~~~~~~
Because this project has many optional (pluggable) parts like persistence
backends and engines, we decided to split our requirements into three
backends and engines, we decided to split our requirements into two
parts: - things that are absolutely required (you can't use the project
without them) are put into ``requirements-pyN.txt`` (``N`` being the
Python *major* version number used to install the package). The requirements
without them) are put into ``requirements.txt``. The requirements
that are required by some optional part of this project (you can use the
project without them) are put into our ``tox.ini`` file (so that we can still
test the optional functionality works as expected). If you want to use the
feature in question (`eventlet`_ or the worker based engine that
uses `kombu`_ or the `sqlalchemy`_ persistence backend or jobboards which
project without them) are put into our ``test-requirements.txt`` file (so
that we can still test the optional functionality works as expected). If
you want to use the feature in question (`eventlet`_ or the worker based engine
that uses `kombu`_ or the `sqlalchemy`_ persistence backend or jobboards which
have an implementation built using `kazoo`_ ...), you should add
that requirement(s) to your project or environment; - as usual, things that
required only for running tests are put into ``test-requirements.txt``.
that requirement(s) to your project or environment.
Tox.ini
~~~~~~~


@@ -74,8 +74,8 @@ ignored during inference (as these names have special meaning/usage in python).
... def execute(self, *args, **kwargs):
... pass
...
>>> UniTask().requires
frozenset([])
>>> sorted(UniTask().requires)
[]
.. make vim sphinx highlighter* happy**
@@ -84,7 +84,7 @@ Rebinding
---------
**Why:** There are cases when the value you want to pass to a task/retry is
stored with a name other then the corresponding arguments name. That's when the
stored with a name other than the corresponding arguments name. That's when the
``rebind`` constructor parameter comes in handy. Using it the flow author
can instruct the engine to fetch a value from storage by one name, but pass it
to a tasks/retrys ``execute`` method with another name. There are two possible
@@ -214,8 +214,8 @@ name of the value.
... def execute(self):
... return 42
...
>>> TheAnswerReturningTask(provides='the_answer').provides
set(['the_answer'])
>>> sorted(TheAnswerReturningTask(provides='the_answer').provides)
['the_answer']
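As an illustrative (non-doctest) sketch combining the ideas above, ``provides`` names a task's output while ``rebind`` renames what its ``execute`` method receives (the task and value names here are made up for illustration)::

    from taskflow import task

    class CallTask(task.Task):
        def execute(self, phone_number):
            print("Calling %s." % phone_number)
            return 'connected'

    # Fetch 'phone_number' from storage under the name 'joes_number'
    # (positional rebind form) and store the result under 'call_status'.
    call_joe = CallTask(rebind=('joes_number',), provides='call_status')

    # The dictionary rebind form maps argument name -> storage name.
    call_jim = CallTask(rebind={'phone_number': 'jims_number'},
                        provides='call_status')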
Returning a tuple
+++++++++++++++++
@@ -416,7 +416,7 @@ the following history (printed as a list)::
At this point (since the implementation returned ``RETRY``) the
|retry.execute| method will be called again and it will receive the same
history and it can then return a value that subsequent tasks can use to alter
there behavior.
their behavior.
If instead the |retry.execute| method itself raises an exception,
the |retry.revert| method of the implementation will be called and


@@ -23,9 +23,9 @@ values (requirements) and name outputs (provided values).
Task
=====
A :py:class:`task <taskflow.task.BaseTask>` (derived from an atom) is the
smallest possible unit of work that can have an execute & rollback sequence
associated with it. These task objects all derive
A :py:class:`task <taskflow.task.BaseTask>` (derived from an atom) is a
unit of work that can have an execute & rollback sequence associated with
it (they are *nearly* analogous to functions). These task objects all derive
from :py:class:`~taskflow.task.BaseTask` which defines what a task must
provide in terms of properties and methods.
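A small illustrative sketch of such a task (the names used here are made up)::

    from taskflow import task

    class CreateVolume(task.Task):
        """A unit of work with an execute & rollback (revert) pair."""

        def execute(self, size):
            # Do the work and return a result for later atoms to use.
            print("Creating a %sGB volume." % size)
            return 'volume-1'

        def revert(self, size, result, **kwargs):
            # Called when this (or a later) atom fails; undo what
            # execute() did (``result`` holds execute's outcome).
            print("Undoing volume creation (result=%r)." % (result,))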
@@ -48,38 +48,30 @@ Retry
=====
A :py:class:`retry <taskflow.retry.Retry>` (derived from an atom) is a special
unit that handles errors, controls flow execution and can (for example) retry
other atoms with other parameters if needed. When an associated atom
fails, these retry units are *consulted* to determine what the resolution
method should be. The goal is that with this *consultation* the retry atom
will suggest a method for getting around the failure (perhaps by retrying,
reverting a single item, or reverting everything contained in the retries
associated scope).
unit of work that handles errors, controls flow execution and can (for
example) retry other atoms with other parameters if needed. When an associated
atom fails, these retry units are *consulted* to determine what the resolution
*strategy* should be. The goal is that with this consultation the retry atom
will suggest a *strategy* for getting around the failure (perhaps by retrying,
reverting a single atom, or reverting everything contained in the retries
associated `scope`_).
Currently derivatives of the :py:class:`retry <taskflow.retry.Retry>` base
class must provide a ``on_failure`` method to determine how a failure should
be handled.
class must provide a :py:func:`~taskflow.retry.Retry.on_failure` method to
determine how a failure should be handled. The current enumeration(s) that can
be returned from the :py:func:`~taskflow.retry.Retry.on_failure` method
are defined in an enumeration class described here:
The current enumeration set that can be returned from this method is:
* ``RETRY`` - retries the surrounding subflow (a retry object is associated
with a flow, which is typically converted into a graph hierarchy at
compilation time) again.
* ``REVERT`` - reverts only the surrounding subflow but *consult* the
parent atom before doing this to determine if the parent retry object
provides a different reconciliation strategy (retry atoms can be nested, this
is possible since flows themselves can be nested).
* ``REVERT_ALL`` - completely reverts a whole flow.
.. autoclass:: taskflow.retry.Decision
To aid in the reconciliation process the
:py:class:`retry <taskflow.retry.Retry>` base class also mandates ``execute``
and ``revert`` methods (although subclasses are allowed to define these methods
as no-ops) that can be used by a retry atom to interact with the runtime
execution model (for example, to track the number of times it has been
called which is useful for the :py:class:`~taskflow.retry.ForEach` retry
subclass).
:py:class:`retry <taskflow.retry.Retry>` base class also mandates
:py:func:`~taskflow.retry.Retry.execute`
and :py:func:`~taskflow.retry.Retry.revert` methods (although subclasses
are allowed to define these methods as no-ops) that can be used by a retry
atom to interact with the runtime execution model (for example, to track the
number of times it has been called which is useful for
the :py:class:`~taskflow.retry.ForEach` retry subclass).
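A rough sketch of such a derivative (assuming the ``RETRY``/``REVERT_ALL`` decision constants exposed by the ``taskflow.retry`` module)::

    from taskflow import retry

    class RetryTwiceThenGiveUp(retry.Retry):
        """Suggests retrying the associated scope twice, then reverting all."""

        def on_failure(self, history, *args, **kwargs):
            # 'history' records this retry atom's prior outcomes/failures.
            if len(history) < 2:
                return retry.RETRY
            return retry.REVERT_ALL

        def execute(self, history, *args, **kwargs):
            # Provide the attempt number to atoms in this retry's scope.
            return len(history) + 1

        def revert(self, history, *args, **kwargs):
            # Nothing to clean up in this sketch.
            pass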
To avoid recreating common retry patterns the following retry
subclasses are provided:
@@ -94,8 +86,40 @@ subclasses are provided:
:py:class:`~taskflow.retry.ForEach` but extracts values from storage
instead of the :py:class:`~taskflow.retry.ForEach` constructor.
Examples
--------
.. _scope: http://en.wikipedia.org/wiki/Scope_%28computer_science%29
.. note::
They are *similar* to exception handlers but are made to be *more* capable
due to their ability to *dynamically* choose a reconciliation strategy,
which allows for these atoms to influence subsequent execution(s) and the
inputs any associated atoms require.
Area of influence
-----------------
Each retry atom is associated with a flow and it can *influence* how the
atoms (or nested flows) contained in that flow retry or revert (using
the previously mentioned patterns and decision enumerations):
*For example:*
.. image:: img/area_of_influence.svg
:width: 325px
:align: left
:alt: Retry area of influence
In this diagram retry controller (1) will be consulted if task ``A``, ``B``
or ``C`` fail and retry controller (2) decides to delegate its retry decision
to retry controller (1). If retry controller (2) does **not** decide to
delegate its retry decision to retry controller (1) then retry
controller (1) will be oblivious of any decisions. If any of
task ``1``, ``2`` or ``3`` fail then only retry controller (1) will be
consulted to determine the strategy/pattern to apply to resolve their
associated failure.
Usage examples
--------------
.. testsetup::
@@ -167,7 +191,13 @@ Interfaces
==========
.. automodule:: taskflow.task
.. automodule:: taskflow.retry
.. autoclass:: taskflow.retry.Retry
.. autoclass:: taskflow.retry.History
.. autoclass:: taskflow.retry.AlwaysRevert
.. autoclass:: taskflow.retry.AlwaysRevertAll
.. autoclass:: taskflow.retry.Times
.. autoclass:: taskflow.retry.ForEach
.. autoclass:: taskflow.retry.ParameterizedForEach
Hierarchy
=========
@@ -175,5 +205,10 @@ Hierarchy
.. inheritance-diagram::
taskflow.atom
taskflow.task
taskflow.retry
taskflow.retry.Retry
taskflow.retry.AlwaysRevert
taskflow.retry.AlwaysRevertAll
taskflow.retry.Times
taskflow.retry.ForEach
taskflow.retry.ParameterizedForEach
:parts: 1


@@ -2,6 +2,10 @@
Conductors
----------
.. image:: img/conductor.png
:width: 97px
:alt: Conductor
Overview
========
@@ -18,14 +22,14 @@ They are responsible for the following:
tasks and flows to be executed).
* Dispatching the engine using the provided :doc:`persistence <persistence>`
layer and engine configuration.
* Completing or abandoning the claimed job (depending on dispatching and
execution outcome).
* Completing or abandoning the claimed :doc:`job <jobs>` (depending on
dispatching and execution outcome).
* *Rinse and repeat*.
.. note::
They are inspired by and have similar responsibilities
as `railroad conductors`_.
as `railroad conductors`_ or `musical conductors`_.
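A rough (hypothetical) sketch of wiring a blocking conductor to a jobboard and persistence backend; the board name, conductor name and the in-memory/zookeeper configuration are placeholders standing in for a real deployment::

    import contextlib

    from taskflow.conductors import backends as conductor_backends
    from taskflow.jobs import backends as job_backends
    from taskflow.persistence import backends as persistence_backends

    persistence = persistence_backends.fetch({'connection': 'memory://'})
    board = job_backends.fetch('my-board', {'board': 'zookeeper'},
                               persistence=persistence)

    with contextlib.closing(board):
        board.connect()
        conductor = conductor_backends.fetch('blocking', 'conductor-1', board,
                                             persistence=persistence)
        # run() claims jobs, dispatches engines and completes/abandons the
        # claimed jobs until stop() is called from elsewhere.
        conductor.run()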
Considerations
==============
@@ -53,28 +57,31 @@ claimable state.
#. Forcefully delete jobs that have been failing continuously after a given
number of conductor attempts. This can be either done manually or
automatically via scripts (or other associated monitoring).
automatically via scripts (or other associated monitoring) or via
the jobboards :py:func:`~taskflow.jobs.base.JobBoard.trash` method.
#. Resolve the internal error's cause (storage backend failure, other...).
#. Help implement `jobboard garbage binning`_.
.. _jobboard garbage binning: https://blueprints.launchpad.net/taskflow/+spec/jobboard-garbage-bin
Interfaces
==========
.. automodule:: taskflow.conductors.base
.. automodule:: taskflow.conductors.backends
Implementations
===============
.. automodule:: taskflow.conductors.single_threaded
Blocking
--------
.. automodule:: taskflow.conductors.backends.impl_blocking
Hierarchy
=========
.. inheritance-diagram::
taskflow.conductors.base
taskflow.conductors.single_threaded
taskflow.conductors.backends.impl_blocking
:parts: 1
.. _musical conductors: http://en.wikipedia.org/wiki/Conducting
.. _railroad conductors: http://en.wikipedia.org/wiki/Conductor_%28transportation%29


@@ -17,11 +17,13 @@ and *ideal* is that deployers or developers of a service that use TaskFlow can
select an engine that suits their setup best without modifying the code of
said service.
Engines usually have different capabilities and configuration, but all of them
**must** implement the same interface and preserve the semantics of patterns
(e.g. parts of a :py:class:`.linear_flow.Flow`
are run one after another, in order, even if the selected engine is *capable*
of running tasks in parallel).
.. note::
Engines usually have different capabilities and configuration, but all of
them **must** implement the same interface and preserve the semantics of
patterns (e.g. parts of a :py:class:`.linear_flow.Flow`
are run one after another, in order, even if the selected
engine is *capable* of running tasks in parallel).
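For example (a minimal sketch), a linear flow keeps its ordering no matter which engine type runs it::

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow

    class EchoTask(task.Task):
        def execute(self):
            print(self.name)

    flow = linear_flow.Flow('ordered').add(EchoTask('first'),
                                           EchoTask('second'),
                                           EchoTask('third'))

    # Even with a parallel-capable engine the linear_flow constraints force
    # 'first', 'second' and 'third' to run one after another, in order.
    engines.run(flow, engine='parallel')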
Why they exist
--------------
@@ -29,7 +31,7 @@ Why they exist
An engine being *the* core component which actually makes your flows progress
is likely a new concept for many programmers so let's describe how it operates
in more depth and some of the reasoning behind why it exists. This will
hopefully make it more clear on there value add to the TaskFlow library user.
hopefully make it more clear on their value add to the TaskFlow library user.
First though let us discuss something most are already familiar with: the
difference between `declarative`_ and `imperative`_ programming models. The
@@ -57,7 +59,7 @@ declarative model) allows for the following functionality to become possible:
accomplished allows for a *natural* way of resuming by allowing the engine to
track the current state and know at which point a workflow is in and how to
get back into that state when resumption occurs.
* Enhancing scalability: When a engine is responsible for executing your
* Enhancing scalability: When an engine is responsible for executing your
desired work it becomes possible to alter the *how* in the future by creating
new types of execution backends (for example the `worker`_ model which does
not execute locally). Without the decoupling of the *what* and the *how* it
@@ -172,13 +174,13 @@ using your desired execution model.
scalability by reducing thread/process creation and teardown as well as by
reusing existing pools (which is a good practice in general).
.. note::
.. warning::
Running tasks with a `process pool executor`_ is **experimentally**
supported. This is mainly due to the `futures backport`_ and
the `multiprocessing`_ module that exist in older versions of python not
being as up to date (with important fixes such as :pybug:`4892`,
:pybug:`6721`, :pybug:`9205`, :pybug:`11635`, :pybug:`16284`,
:pybug:`6721`, :pybug:`9205`, :pybug:`16284`,
:pybug:`22393` and others...) as the most recent python version (which
themselves have a variety of ongoing/recent bugs).
@@ -203,7 +205,7 @@ For further information, please refer to the following:
How they run
============
To provide a peek into the general process that a engine goes through when
To provide a peek into the general process that an engine goes through when
running lets break it apart a little and describe what one of the engine types
does while executing (for this we will look into the
:py:class:`~taskflow.engines.action_engine.engine.ActionEngine` engine type).
@@ -221,39 +223,48 @@ are setup.
Compiling
---------
During this stage the flow will be converted into an internal graph
representation using a
:py:class:`~taskflow.engines.action_engine.compiler.Compiler` (the default
implementation for patterns is the
During this stage (see :py:func:`~taskflow.engines.base.Engine.compile`) the
flow will be converted into an internal graph representation using a
compiler (the default implementation for patterns is the
:py:class:`~taskflow.engines.action_engine.compiler.PatternCompiler`). This
class compiles/converts the flow objects and contained atoms into a
`networkx`_ directed graph that contains the equivalent atoms defined in the
flow and any nested flows & atoms as well as the constraints that are created
by the application of the different flow patterns. This graph is then what will
be analyzed & traversed during the engines execution. At this point a few
helper object are also created and saved to internal engine variables (these
object help in execution of atoms, analyzing the graph and performing other
internal engine activities). At the finishing of this stage a
`networkx`_ directed graph (and tree structure) that contains the equivalent
atoms defined in the flow and any nested flows & atoms as well as the
constraints that are created by the application of the different flow
patterns. This graph (and tree) are what will be analyzed & traversed during
the engines execution. At this point a few helper objects are also created and
saved to internal engine variables (these objects help in execution of
atoms, analyzing the graph and performing other internal engine
activities). At the finishing of this stage a
:py:class:`~taskflow.engines.action_engine.runtime.Runtime` object is created
which contains references to all needed runtime components.
which contains references to all needed runtime components and its
:py:func:`~taskflow.engines.action_engine.runtime.Runtime.compile` is called
to compile a cache of frequently used execution helper objects.
Preparation
-----------
This stage starts by setting up the storage needed for all atoms in the
previously created graph, ensuring that corresponding
:py:class:`~taskflow.persistence.logbook.AtomDetail` (or subclass of) objects
are created for each node in the graph. Once this is done final validation
occurs on the requirements that are needed to start execution and what
:py:class:`~taskflow.storage.Storage` provides. If there is any atom or flow
requirements not satisfied then execution will not be allowed to continue.
This stage (see :py:func:`~taskflow.engines.base.Engine.prepare`) starts by
setting up the storage needed for all atoms in the compiled graph, ensuring
that corresponding :py:class:`~taskflow.persistence.models.AtomDetail` (or
subclass of) objects are created for each node in the graph.
Validation
----------
This stage (see :py:func:`~taskflow.engines.base.Engine.validate`) performs
any final validation of the compiled (and now storage prepared) engine. It
compares the requirements that are needed to start execution and
what is currently provided or will be produced in the future. If there are
*any* atom requirements that are not satisfied (no known current provider or
future producer is found) then execution will **not** be allowed to continue.
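A small sketch that drives these stages explicitly (``run()`` would otherwise perform any not-yet-done stages itself); the flow and task names here are illustrative only::

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow

    class Doubler(task.Task):
        default_provides = 'doubled'

        def execute(self, number):
            return number * 2

    flow = linear_flow.Flow('demo').add(Doubler())
    engine = engines.load(flow, store={'number': 21})

    engine.compile()    # flow -> graph (and tree) + runtime helpers
    engine.prepare()    # create/restore AtomDetail objects in storage
    engine.validate()   # ensure all requirements can be satisfied
    engine.run()
    print(engine.storage.fetch('doubled'))  # -> 42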
Execution
---------
The graph (and helper objects) previously created are now used for guiding
further execution. The flow is put into the ``RUNNING`` :doc:`state <states>`
and a
further execution (see :py:func:`~taskflow.engines.base.Engine.run`). The
flow is put into the ``RUNNING`` :doc:`state <states>` and a
:py:class:`~taskflow.engines.action_engine.runner.Runner` implementation
object starts to take over and begins going through the stages listed
below (for a more visual diagram/representation see
@@ -262,10 +273,10 @@ the :ref:`engine state diagram <engine states>`).
.. note::
The engine will respect the constraints imposed by the flow. For example,
if Engine is executing a :py:class:`.linear_flow.Flow` then it is
constrained by the dependency-graph which is linear in this case, and hence
using a Parallel Engine may not yield any benefits if one is looking for
concurrency.
if an engine is executing a :py:class:`~taskflow.patterns.linear_flow.Flow`
then it is constrained by the dependency graph which is linear in this
case, and hence using a parallel engine may not yield any benefits if one
is looking for concurrency.
Resumption
^^^^^^^^^^
@@ -282,7 +293,7 @@ for things like retry atom which can influence what a tasks intention should be
:py:class:`~taskflow.engines.action_engine.analyzer.Analyzer` helper
object which was designed to provide helper methods for this analysis). Once
these intentions are determined and associated with each task (the intention is
also stored in the :py:class:`~taskflow.persistence.logbook.AtomDetail` object)
also stored in the :py:class:`~taskflow.persistence.models.AtomDetail` object)
the :ref:`scheduling <scheduling>` stage starts.
.. _scheduling:
@@ -292,7 +303,7 @@ Scheduling
This stage selects which atoms are eligible to run by using a
:py:class:`~taskflow.engines.action_engine.scheduler.Scheduler` implementation
(the default implementation looks at there intention, checking if predecessor
(the default implementation looks at their intention, checking if predecessor
atoms have run and so-on, using a
:py:class:`~taskflow.engines.action_engine.analyzer.Analyzer` helper
object as needed) and submits those atoms to a previously provided compatible
@@ -312,15 +323,15 @@ submitted to complete. Once one of the future objects completes (or fails) that
atoms result will be examined and finalized using a
:py:class:`~taskflow.engines.action_engine.completer.Completer` implementation.
It typically will persist results to a provided persistence backend (saved
into the corresponding :py:class:`~taskflow.persistence.logbook.AtomDetail`
and :py:class:`~taskflow.persistence.logbook.FlowDetail` objects via the
into the corresponding :py:class:`~taskflow.persistence.models.AtomDetail`
and :py:class:`~taskflow.persistence.models.FlowDetail` objects via the
:py:class:`~taskflow.storage.Storage` helper) and reflect
the new state of the atom. At this point what typically happens falls into two
categories, one for if that atom failed and one for if it did not. If the atom
failed it may be set to a new intention such as ``RETRY`` or
``REVERT`` (other atoms that were predecessors of this failing atom may also
have their intention altered). Once this intention adjustment has happened a
new round of :ref:`scheduling <scheduling>` occurs and this process repeats
new round of :ref:`scheduling <scheduling>` occurs and this process repeats
until the engine succeeds or fails (if the process running the engine dies the
above stages will be restarted and resuming will occur).
@@ -328,8 +339,8 @@ above stages will be restarted and resuming will occur).
If the engine is suspended while the engine is going through the above
stages this will stop any further scheduling stages from occurring and
all currently executing atoms will be allowed to finish (and there results
will be saved).
all currently executing work will be allowed to finish (see
:ref:`suspension <suspension>`).
Finishing
---------
@@ -346,6 +357,79 @@ failures have occurred then the engine will have finished and if so desired the
:doc:`persistence <persistence>` can be used to cleanup any details that were
saved for this execution.
Special cases
=============
.. _suspension:
Suspension
----------
Each engine implements a :py:func:`~taskflow.engines.base.Engine.suspend`
method that can be used to *externally* (or in the future *internally*) request
that the engine stop :ref:`scheduling <scheduling>` new work. By default what
this performs is a transition of the flow state from ``RUNNING`` into a
``SUSPENDING`` state (which will later transition into a ``SUSPENDED`` state).
Since an engine may be remotely executing atoms (or locally executing them)
and there is currently no preemption what occurs is that the engines
:py:class:`~taskflow.engines.action_engine.runner.Runner` state machine will
detect that this transition into ``SUSPENDING`` has occurred and the state
machine will avoid scheduling new work (it will though let active work
continue). After the current work has finished the engine will
transition from ``SUSPENDING`` into ``SUSPENDED`` and return from its
:py:func:`~taskflow.engines.base.Engine.run` method.
.. note::
When :py:func:`~taskflow.engines.base.Engine.run` returns there *may*
(depending on what was active when
:py:func:`~taskflow.engines.base.Engine.suspend` was called) be unfinished
work remaining in the flow, which can be resumed at a later point in time.
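An illustrative sketch of requesting a suspension from another thread (timings and names are arbitrary)::

    import threading
    import time

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow

    class Sleepy(task.Task):
        def execute(self):
            time.sleep(1)

    flow = linear_flow.Flow('naps').add(Sleepy('nap-1'), Sleepy('nap-2'))
    engine = engines.load(flow)

    runner = threading.Thread(target=engine.run)
    runner.start()
    time.sleep(0.1)

    engine.suspend()   # flow goes RUNNING -> SUSPENDING -> SUSPENDED
    runner.join()      # run() returns once the active task finishes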
Scoping
=======
During creation of flows it is also important to understand the lookup
strategy (also typically known as `scope`_ resolution) that the engine you
are using will internally use. For example when a task ``A`` provides
result 'a' and a task ``B`` after ``A`` provides a different result 'a' and a
task ``C`` after ``A`` and after ``B`` requires 'a' to run, which one will
be selected?
Default strategy
----------------
When an engine is executing it internally interacts with the
:py:class:`~taskflow.storage.Storage` class and that class interacts with a
:py:class:`~taskflow.engines.action_engine.scopes.ScopeWalker` instance,
using the following lookup order to find (or fail) an atom's requirement
lookup/request:
#. Transient injected atom specific arguments.
#. Non-transient injected atom specific arguments.
#. Transient injected arguments (flow specific).
#. Non-transient injected arguments (flow specific).
#. First scope visited provider that produces the named result; note that
if multiple providers are found in the same scope the *first* (the scope
walkers yielded ordering defines what *first* means) that produced that
result *and* can be extracted without raising an error is selected as the
provider of the requested requirement.
#. Fails with :py:class:`~taskflow.exceptions.NotFound` if unresolved at this
point (the ``cause`` attribute of this exception may have more details on
why the lookup failed).
.. note::
To examine this information when debugging it is recommended to
enable the ``BLATHER`` logging level (level 5). At this level the storage
and scope code/layers will log what is being searched for and what is
being found.
.. _scope: http://en.wikipedia.org/wiki/Scope_%28computer_science%29
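A small sketch of the default lookup in action (names are illustrative)::

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow

    class ProvideA(task.Task):
        default_provides = 'a'

        def execute(self):
            return 'from-task-A'

    class NeedsA(task.Task):
        def execute(self, a):
            print("Got 'a' = %s" % a)

    flow = linear_flow.Flow('lookup-demo').add(ProvideA('A'), NeedsA('C'))

    # Task 'C' resolves 'a' from the scope-visited provider 'A'; per the
    # lookup order above, injected arguments (for example passing
    # store={'a': 'injected'} to load/run) would take precedence over it.
    engines.run(flow)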
Interfaces
==========
@@ -354,15 +438,27 @@ Interfaces
Implementations
===============
.. automodule:: taskflow.engines.action_engine.engine
Components
----------
.. warning::
External usage of internal engine functions, components and modules should
be kept to a **minimum** as they may be altered, refactored or moved to
other locations **without** notice (and without the typical deprecation
cycle).
.. automodule:: taskflow.engines.action_engine.analyzer
.. automodule:: taskflow.engines.action_engine.compiler
.. automodule:: taskflow.engines.action_engine.completer
.. automodule:: taskflow.engines.action_engine.engine
.. automodule:: taskflow.engines.action_engine.executor
.. automodule:: taskflow.engines.action_engine.runner
.. automodule:: taskflow.engines.action_engine.runtime
.. automodule:: taskflow.engines.action_engine.scheduler
.. automodule:: taskflow.engines.action_engine.scopes
.. autoclass:: taskflow.engines.action_engine.scopes.ScopeWalker
:special-members: __iter__
Hierarchy
=========


@@ -34,6 +34,30 @@ Using listeners
:linenos:
:lines: 16-
Using listeners (to watch a phone call)
=======================================
.. note::
Full source located at :example:`simple_linear_listening`.
.. literalinclude:: ../../taskflow/examples/simple_linear_listening.py
:language: python
:linenos:
:lines: 16-
Dumping an in-memory backend
============================
.. note::
Full source located at :example:`dump_memory_backend`.
.. literalinclude:: ../../taskflow/examples/dump_memory_backend.py
:language: python
:linenos:
:lines: 16-
Making phone calls
==================
@@ -176,6 +200,18 @@ Summation mapper(s) and reducer (in parallel)
:linenos:
:lines: 16-
Sharing a thread pool executor (in parallel)
============================================
.. note::
Full source located at :example:`share_engine_thread`
.. literalinclude:: ../../taskflow/examples/share_engine_thread.py
:language: python
:linenos:
:lines: 16-
Storing & emitting a bill
=========================
@@ -306,3 +342,28 @@ Jobboard producer/consumer (simple)
:language: python
:linenos:
:lines: 16-
Conductor simulating a CI pipeline
==================================
.. note::
Full source located at :example:`tox_conductor`
.. literalinclude:: ../../taskflow/examples/tox_conductor.py
:language: python
:linenos:
:lines: 16-
Conductor running 99 bottles of beer song requests
==================================================
.. note::
Full source located at :example:`99_bottles`
.. literalinclude:: ../../taskflow/examples/99_bottles.py
:language: python
:linenos:
:lines: 16-

2
doc/source/history.rst Normal file

@@ -0,0 +1,2 @@
.. include:: ../../ChangeLog

(Image diffs suppressed: several documentation images were added or updated.)

@@ -14,7 +14,7 @@ Contents
========
.. toctree::
:maxdepth: 3
:maxdepth: 2
atoms
arguments_and_results
@@ -29,6 +29,9 @@ Contents
jobs
conductors
Supplementary
=============
Examples
--------
@@ -62,7 +65,8 @@ TaskFlow into your project:
* Read over the `paradigm shifts`_ and engage the team in `IRC`_ (or via the
`openstack-dev`_ mailing list) if these need more explanation (prefix
``[TaskFlow]`` to your emails subject to get an even faster response).
``[Oslo][TaskFlow]`` to your emails subject to get an even faster
response).
* Follow (or at least attempt to follow) some of the established
`best practices`_ (feel free to add your own suggested best practices).
* Keep in touch with the team (see above); we are all friendly and enjoy
@@ -85,6 +89,29 @@ Miscellaneous
types
utils
Bookshelf
---------
A useful collection of links, documents, papers, similar
projects, frameworks and libraries.
.. note::
Please feel free to submit your own additions and/or changes.
.. toctree::
:maxdepth: 1
shelf
History
-------
.. toctree::
:maxdepth: 2
history
Indices and tables
==================


@@ -30,7 +30,7 @@ Definitions
Jobs
A :py:class:`job <taskflow.jobs.base.Job>` consists of a unique identifier,
name, and a reference to a :py:class:`logbook
<taskflow.persistence.logbook.LogBook>` which contains the details of the
<taskflow.persistence.models.LogBook>` which contains the details of the
work that has been or should be/will be completed to finish the work that has
been created for that job.
@@ -43,7 +43,7 @@ Jobboards
jobboards implement the same interface and semantics so that the backend
usage is as transparent as possible. This allows deployers or developers of a
service that uses TaskFlow to select a jobboard implementation that fits
their setup (and there intended usage) best.
their setup (and their intended usage) best.
High level architecture
=======================
@@ -62,7 +62,8 @@ Features
the previously partially completed work or begin initial work to ensure
that the workflow as a whole progresses (where progressing implies
transitioning through the workflow :doc:`patterns <patterns>` and
:doc:`atoms <atoms>` and completing their associated state transitions).
:doc:`atoms <atoms>` and completing their associated
:doc:`states <states>` transitions).
- Atomic transfer and single ownership
@@ -94,11 +95,12 @@ Features
Usage
=====
All engines are mere classes that implement same interface, and of course it is
possible to import them and create their instances just like with any classes
in Python. But the easier (and recommended) way for creating jobboards is by
using the :py:meth:`fetch() <taskflow.jobs.backends.fetch>` function which uses
entrypoints (internally using `stevedore`_) to fetch and configure your backend
All jobboards are mere classes that implement the same interface, and of course
it is possible to import them and create instances of them just like with any
other class in Python. But the easier (and recommended) way for creating
jobboards is by using the :py:meth:`fetch() <taskflow.jobs.backends.fetch>`
function which uses entrypoints (internally using `stevedore`_) to fetch and
configure your backend.
Using this function the typical creation of a jobboard (and an example posting
of a job) might look like:
@@ -200,13 +202,27 @@ Additional *configuration* parameters:
* ``handler``: a class that provides ``kazoo.handlers``-like interface; it will
be used internally by `kazoo`_ to perform asynchronous operations, useful
when your program uses eventlet and you want to instruct kazoo to use an
eventlet compatible handler (such as the `eventlet handler`_).
eventlet compatible handler.
.. note::
See :py:class:`~taskflow.jobs.backends.impl_zookeeper.ZookeeperJobBoard`
for implementation details.
Redis
-----
**Board type**: ``'redis'``
Uses `redis`_ to provide the jobboard capabilities and semantics by using
a redis hash datastructure and individual job ownership keys (that can
optionally expire after a given amount of time).
.. note::
See :py:class:`~taskflow.jobs.backends.impl_redis.RedisJobBoard`
for implementation details.
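A hypothetical fetch of a redis backed jobboard; the board name and the connection keys shown are placeholders for your own redis settings::

    from taskflow.jobs import backends as job_backends

    # 'board' selects the backend type; the remaining keys are handed to
    # that backend's client configuration.
    board = job_backends.fetch('my-board', {
        'board': 'redis',
        'host': '127.0.0.1',
        'port': 6379,
    })
    board.connect()
    try:
        job = board.post('example-job', book=None)
        print(job)
    finally:
        board.close()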
Considerations
==============
@@ -218,7 +234,7 @@ Dual-engine jobs
----------------
**What:** Since atoms and engines are not currently `preemptable`_ we can not
force a engine (or the threads/remote workers... it is using to run) to stop
force an engine (or the threads/remote workers... it is using to run) to stop
working on an atom (it is generally bad behavior to force code to stop without
its consent anyway) if it has already started working on an atom (short of
doing a ``kill -9`` on the running interpreter). This could cause problems
@@ -265,18 +281,27 @@ Interfaces
Implementations
===============
Zookeeper
---------
.. automodule:: taskflow.jobs.backends.impl_zookeeper
Redis
-----
.. automodule:: taskflow.jobs.backends.impl_redis
Hierarchy
=========
.. inheritance-diagram::
taskflow.jobs.base
taskflow.jobs.backends.impl_redis
taskflow.jobs.backends.impl_zookeeper
:parts: 1
.. _paradigm shift: https://wiki.openstack.org/wiki/TaskFlow/Paradigm_shifts#Workflow_ownership_transfer
.. _zookeeper: http://zookeeper.apache.org/
.. _kazoo: http://kazoo.readthedocs.org/
.. _eventlet handler: https://pypi.python.org/pypi/kazoo-eventlet-handler/
.. _stevedore: http://stevedore.readthedocs.org/
.. _redis: http://redis.io/


@@ -1,6 +1,6 @@
===========================
---------------------------
Notifications and listeners
===========================
---------------------------
.. testsetup::
@@ -10,13 +10,12 @@ Notifications and listeners
from taskflow.types import notifier
ANY = notifier.Notifier.ANY
--------
Overview
--------
========
Engines provide a way to receive notification on task and flow state
transitions, which is useful for monitoring, logging, metrics, debugging
and plenty of other tasks.
transitions (see :doc:`states <states>`), which is useful for
monitoring, logging, metrics, debugging and plenty of other tasks.
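A minimal sketch of such a registration on an engine's flow notifier (the callback signature and ``details`` keys follow the examples elsewhere in these docs; the flow/task names are made up)::

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow
    from taskflow.types import notifier

    ANY = notifier.Notifier.ANY

    class Noop(task.Task):
        def execute(self):
            pass

    def flow_watch(state, details):
        # Called on every flow state transition.
        print("Flow %r moved into state %s" % (details.get('flow_name'), state))

    engine = engines.load(linear_flow.Flow('f').add(Noop()))
    engine.notifier.register(ANY, flow_watch)
    engine.run()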
To receive these notifications you should register a callback with
an instance of the :py:class:`~taskflow.types.notifier.Notifier`
@@ -27,9 +26,8 @@ TaskFlow also comes with a set of predefined :ref:`listeners <listeners>`, and
provides means to write your own listeners, which can be more convenient than
using raw callbacks.
--------------------------------------
Receiving notifications with callbacks
--------------------------------------
======================================
Flow notifications
------------------
@@ -106,9 +104,8 @@ A basic example is:
.. _listeners:
---------
Listeners
---------
=========
TaskFlow comes with a set of predefined listeners -- helper classes that can be
used to do various actions on flow and/or tasks transitions. You can also
@@ -147,28 +144,31 @@ For example, this is how you can use
<taskflow.engines.action_engine.engine.SerialActionEngine object at ...> has moved task 'DogTalk' (...) into state 'SUCCESS' from state 'RUNNING' with result 'dog' (failure=False)
<taskflow.engines.action_engine.engine.SerialActionEngine object at ...> has moved flow 'cat-dog' (...) into state 'SUCCESS' from state 'RUNNING'
Basic listener
--------------
Interfaces
==========
.. autoclass:: taskflow.listeners.base.Listener
.. automodule:: taskflow.listeners.base
Implementations
===============
Printing and logging listeners
------------------------------
.. autoclass:: taskflow.listeners.base.DumpingListener
.. autoclass:: taskflow.listeners.logging.LoggingListener
.. autoclass:: taskflow.listeners.logging.DynamicLoggingListener
.. autoclass:: taskflow.listeners.printing.PrintingListener
Timing listener
---------------
Timing listeners
----------------
.. autoclass:: taskflow.listeners.timing.TimingListener
.. autoclass:: taskflow.listeners.timing.DurationListener
.. autoclass:: taskflow.listeners.timing.PrintingTimingListener
.. autoclass:: taskflow.listeners.timing.PrintingDurationListener
.. autoclass:: taskflow.listeners.timing.EventTimeListener
Claim listener
--------------
@@ -181,7 +181,7 @@ Capturing listener
.. autoclass:: taskflow.listeners.capturing.CaptureListener
Hierarchy
---------
=========
.. inheritance-diagram::
taskflow.listeners.base.DumpingListener
@@ -191,6 +191,7 @@ Hierarchy
taskflow.listeners.logging.DynamicLoggingListener
taskflow.listeners.logging.LoggingListener
taskflow.listeners.printing.PrintingListener
taskflow.listeners.timing.PrintingTimingListener
taskflow.listeners.timing.TimingListener
taskflow.listeners.timing.PrintingDurationListener
taskflow.listeners.timing.EventTimeListener
taskflow.listeners.timing.DurationListener
:parts: 1


@@ -40,38 +40,38 @@ On :doc:`engine <engines>` construction typically a backend (it can be
optional) will be provided which satisfies the
:py:class:`~taskflow.persistence.base.Backend` abstraction. Along with
providing a backend object a
:py:class:`~taskflow.persistence.logbook.FlowDetail` object will also be
:py:class:`~taskflow.persistence.models.FlowDetail` object will also be
created and provided (this object will contain the details about the flow to be
ran) to the engine constructor (or associated :py:meth:`load()
<taskflow.engines.helpers.load>` helper functions). Typically a
:py:class:`~taskflow.persistence.logbook.FlowDetail` object is created from a
:py:class:`~taskflow.persistence.logbook.LogBook` object (the book object acts
as a type of container for :py:class:`~taskflow.persistence.logbook.FlowDetail`
and :py:class:`~taskflow.persistence.logbook.AtomDetail` objects).
:py:class:`~taskflow.persistence.models.FlowDetail` object is created from a
:py:class:`~taskflow.persistence.models.LogBook` object (the book object acts
as a type of container for :py:class:`~taskflow.persistence.models.FlowDetail`
and :py:class:`~taskflow.persistence.models.AtomDetail` objects).
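A small sketch of this wiring using the in-memory backend and the logbook helpers from ``taskflow.utils.persistence_utils`` (as done in the bundled examples; flow and task names here are illustrative)::

    import contextlib

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow
    from taskflow.persistence import backends as persistence_backends
    from taskflow.utils import persistence_utils

    class Noop(task.Task):
        def execute(self):
            pass

    backend = persistence_backends.fetch({'connection': 'memory://'})
    with contextlib.closing(backend.get_connection()) as conn:
        conn.upgrade()

    book = persistence_utils.temporary_log_book(backend)
    flow = linear_flow.Flow('persisted').add(Noop())

    # load() creates (and saves) a FlowDetail inside the given logbook and
    # uses the backend for all later state/result persistence.
    engine = engines.load(flow, backend=backend, book=book)
    engine.run()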
**Preparation**: Once an engine starts to run it will create a
:py:class:`~taskflow.storage.Storage` object which will act as the engines
interface to the underlying backend storage objects (it provides helper
functions that are commonly used by the engine, avoiding repeating code when
interacting with the provided
:py:class:`~taskflow.persistence.logbook.FlowDetail` and
:py:class:`~taskflow.persistence.models.FlowDetail` and
:py:class:`~taskflow.persistence.base.Backend` objects). As an engine
initializes it will extract (or create)
:py:class:`~taskflow.persistence.logbook.AtomDetail` objects for each atom in
:py:class:`~taskflow.persistence.models.AtomDetail` objects for each atom in
the workflow the engine will be executing.
**Execution:** When an engine begins to execute (see :doc:`engine <engines>`
for more of the details about how an engine goes about this process) it will
examine any previously existing
:py:class:`~taskflow.persistence.logbook.AtomDetail` objects to see if they can
:py:class:`~taskflow.persistence.models.AtomDetail` objects to see if they can
be used for resuming; see :doc:`resumption <resumption>` for more details on
this subject. For atoms which have not finished (or did not finish correctly
from a previous run) they will begin executing only after any dependent inputs
are ready. This is done by analyzing the execution graph and looking at
predecessor :py:class:`~taskflow.persistence.logbook.AtomDetail` outputs and
predecessor :py:class:`~taskflow.persistence.models.AtomDetail` outputs and
states (which may have been persisted in a past run). This will result in
either using there previous information or by running those predecessors and
saving their output to the :py:class:`~taskflow.persistence.logbook.FlowDetail`
either using their previous information or by running those predecessors and
saving their output to the :py:class:`~taskflow.persistence.models.FlowDetail`
and :py:class:`~taskflow.persistence.base.Backend` objects. This
execution, analysis and interaction with the storage objects continues (what is
described here is a simplification of what really happens; which is quite a bit
@@ -81,7 +81,7 @@ will have succeeded or failed in its attempt to run the workflow).
**Post-execution:** Typically when an engine is done running the logbook would
be discarded (to avoid creating a stockpile of useless data) and the backend
storage would be told to delete any contents for a given execution. For certain
use-cases though it may be advantageous to retain logbooks and there contents.
use-cases though it may be advantageous to retain logbooks and their contents.
A few scenarios come to mind:
@@ -176,7 +176,7 @@ concept everyone is familiar with).
See :py:class:`~taskflow.persistence.backends.impl_dir.DirBackend`
for implementation details.
Sqlalchemy
SQLAlchemy
----------
**Connection**: ``'mysql'`` or ``'postgres'`` or ``'sqlite'``
@@ -249,9 +249,13 @@ parent_uuid VARCHAR False
``results`` will contain. This size limit will restrict how many prior
failures a retry atom can contain. More information and a future fix
will be posted to bug `1416088`_ (for the meantime try to ensure that
your retry units history does not grow beyond ~80 prior results).
your retry units history does not grow beyond ~80 prior results). This
truncation can also be avoided by providing ``mysql_sql_mode`` as
``traditional`` when selecting your mysql + sqlalchemy based
backend (see the `mysql modes`_ documentation for what this implies).
.. _1416088: http://bugs.launchpad.net/taskflow/+bug/1416088
.. _mysql modes: http://dev.mysql.com/doc/refman/5.0/en/sql-mode.html
Zookeeper
---------
@@ -279,14 +283,34 @@ Interfaces
.. automodule:: taskflow.persistence.backends
.. automodule:: taskflow.persistence.base
.. automodule:: taskflow.persistence.logbook
.. automodule:: taskflow.persistence.path_based
Models
======
.. automodule:: taskflow.persistence.models
Implementations
===============
.. automodule:: taskflow.persistence.backends.impl_dir
Memory
------
.. automodule:: taskflow.persistence.backends.impl_memory
Files
-----
.. automodule:: taskflow.persistence.backends.impl_dir
SQLAlchemy
----------
.. automodule:: taskflow.persistence.backends.impl_sqlalchemy
Zookeeper
---------
.. automodule:: taskflow.persistence.backends.impl_zookeeper
Storage


@@ -46,7 +46,7 @@ name serves a special purpose in the resumption process (as well as serving a
useful purpose when running, allowing for atom identification in the
:doc:`notification <notifications>` process). The reason for having names is
that an atom in a flow needs to be somehow matched with (a potentially)
existing :py:class:`~taskflow.persistence.logbook.AtomDetail` during engine
existing :py:class:`~taskflow.persistence.models.AtomDetail` during engine
resumption & subsequent running.
The match should be:
@@ -71,9 +71,9 @@ Scenarios
=========
When new flow is loaded into engine, there is no persisted data for it yet, so
a corresponding :py:class:`~taskflow.persistence.logbook.FlowDetail` object
a corresponding :py:class:`~taskflow.persistence.models.FlowDetail` object
will be created, as well as a
:py:class:`~taskflow.persistence.logbook.AtomDetail` object for each atom that
:py:class:`~taskflow.persistence.models.AtomDetail` object for each atom that
is contained in it. These will be immediately saved into the persistence
backend that is configured. If no persistence backend is configured, then as
expected nothing will be saved and the atoms and flow will be ran in a
@@ -94,7 +94,7 @@ When the factory function mentioned above returns the exact same the flow and
atoms (no changes are performed).
**Runtime change:** Nothing should be done -- the engine will re-associate
atoms with :py:class:`~taskflow.persistence.logbook.AtomDetail` objects by name
atoms with :py:class:`~taskflow.persistence.models.AtomDetail` objects by name
and then the engine resumes.
Atom was added
@@ -105,7 +105,7 @@ in (for example for changing the runtime structure of what was previously ran
in the first run).
**Runtime change:** By default when the engine resumes it will notice that a
corresponding :py:class:`~taskflow.persistence.logbook.AtomDetail` does not
corresponding :py:class:`~taskflow.persistence.models.AtomDetail` does not
exist and one will be created and associated.
Atom was removed
@@ -134,7 +134,7 @@ factory should replace this name where it was being used previously.
exist when a new atom is added. In the future TaskFlow could make this easier
by providing an ``upgrade()`` function that can be used to give users the
ability to upgrade atoms before running (manual introspection & modification of
a :py:class:`~taskflow.persistence.logbook.LogBook` can be done before engine
a :py:class:`~taskflow.persistence.models.LogBook` can be done before engine
loading and running to accomplish this in the meantime).
Atom was split in two atoms or merged
@@ -150,7 +150,7 @@ exist when a new atom is added or removed. In the future TaskFlow could make
this easier by providing a ``migrate()`` function that can be used to give
users the ability to migrate atoms previous data before running (manual
introspection & modification of a
:py:class:`~taskflow.persistence.logbook.LogBook` can be done before engine
:py:class:`~taskflow.persistence.models.LogBook` can be done before engine
loading and running to accomplish this in the meantime).
Flow structure was changed

60
doc/source/shelf.rst Normal file

@@ -0,0 +1,60 @@
Libraries & frameworks
----------------------
* `APScheduler`_ (Python)
* `Async`_ (Python)
* `Celery`_ (Python)
* `Graffiti`_ (Python)
* `JobLib`_ (Python)
* `Luigi`_ (Python)
* `Mesos`_ (C/C++)
* `Papy`_ (Python)
* `Parallel Python`_ (Python)
* `RQ`_ (Python)
* `Spiff`_ (Python)
* `TBB Flow`_ (C/C++)
Languages
---------
* `Ani`_
* `Make`_
* `Plaid`_
Services
--------
* `Cloud Dataflow`_
* `Mistral`_
Papers
------
* `Advances in Dataflow Programming Languages`_
Related paradigms
-----------------
* `Dataflow programming`_
* `Programming paradigm(s)`_
.. _APScheduler: http://pythonhosted.org/APScheduler/
.. _Async: http://pypi.python.org/pypi/async
.. _Celery: http://www.celeryproject.org/
.. _Graffiti: http://github.com/SegFaultAX/graffiti
.. _JobLib: http://pythonhosted.org/joblib/index.html
.. _Luigi: http://github.com/spotify/luigi
.. _RQ: http://python-rq.org/
.. _Mistral: http://wiki.openstack.org/wiki/Mistral
.. _Mesos: http://mesos.apache.org/
.. _Parallel Python: http://www.parallelpython.com/
.. _Spiff: http://github.com/knipknap/SpiffWorkflow
.. _Papy: http://code.google.com/p/papy/
.. _Make: http://www.gnu.org/software/make/
.. _Ani: http://code.google.com/p/anic/
.. _Programming paradigm(s): http://en.wikipedia.org/wiki/Programming_paradigm
.. _Plaid: http://www.cs.cmu.edu/~aldrich/plaid/
.. _Advances in Dataflow Programming Languages: http://www.cs.ucf.edu/~dcm/Teaching/COT4810-Spring2011/Literature/DataFlowProgrammingLanguages.pdf
.. _Cloud Dataflow: https://cloud.google.com/dataflow/
.. _TBB Flow: https://www.threadingbuildingblocks.org/tutorial-intel-tbb-flow-graph
.. _Dataflow programming: http://en.wikipedia.org/wiki/Dataflow_programming


@@ -121,9 +121,14 @@ or if needed will wait for all of the atoms it depends on to complete.
.. note::
A engine running a task also transitions the task to the ``PENDING`` state
An engine running a task also transitions the task to the ``PENDING`` state
after it was reverted and its containing flow was restarted or retried.
**IGNORE** - When a conditional decision has been made to skip (not
execute) the task the engine will transition the task to
the ``IGNORE`` state.
**RUNNING** - When an engine running the task starts to execute the task, the
engine will transition the task to the ``RUNNING`` state, and the task will
stay in this state until the tasks :py:meth:`~taskflow.task.BaseTask.execute`
@@ -168,10 +173,14 @@ flow that the retry is associated with by consulting its
.. note::
A engine running a retry also transitions the retry to the ``PENDING`` state
An engine running a retry also transitions the retry to the ``PENDING`` state
after it was reverted and its associated flow was restarted or retried.
**RUNNING** - When a engine starts to execute the retry, the engine
**IGNORE** - When a conditional decision has been made to skip (not
execute) the retry the engine will transition the retry to
the ``IGNORE`` state.
**RUNNING** - When an engine starts to execute the retry, the engine
transitions the retry to the ``RUNNING`` state, and the retry stays in this
state until its :py:meth:`~taskflow.retry.Retry.execute` method returns.
@@ -194,3 +203,26 @@ already in the ``FAILURE`` state then this is a no-op).
**RETRYING** - If flow that is associated with the current retry was failed and
reverted, the engine prepares the flow for the next run and transitions the
retry to the ``RETRYING`` state.
Jobs
====
.. image:: img/job_states.svg
:width: 500px
:align: center
:alt: Job state transitions
**UNCLAIMED** - A job (with details about what work is to be completed) has
been initially posted (by some posting entity) to be worked on by some other
entity (for example a :doc:`conductor <conductors>`). This can also be a state
that is entered when some owning entity has manually abandoned (or
lost ownership of) a previously claimed job.
**CLAIMED** - A job that is *actively* owned by some entity; typically that
ownership is tied to jobs persistent data via some ephemeral connection so
that the job ownership is lost (typically automatically or after some
timeout) if that ephemeral connection is lost.
**COMPLETE** - The work defined in the job has been finished by its owning
entity and the job can no longer be processed (and it *may* be removed at
some/any point in the future).


@@ -29,11 +29,6 @@ FSM
.. automodule:: taskflow.types.fsm
Futures
=======
.. automodule:: taskflow.types.futures
Graph
=====
@@ -43,11 +38,12 @@ Notifier
========
.. automodule:: taskflow.types.notifier
:special-members: __call__
Periodic
========
Sets
====
.. automodule:: taskflow.types.periodic
.. automodule:: taskflow.types.sets
Table
=====


@@ -33,11 +33,6 @@ Kombu
.. automodule:: taskflow.utils.kombu_utils
Locks
~~~~~
.. automodule:: taskflow.utils.lock_utils
Miscellaneous
~~~~~~~~~~~~~
@@ -48,6 +43,16 @@ Persistence
.. automodule:: taskflow.utils.persistence_utils
Redis
~~~~~
.. automodule:: taskflow.utils.redis_utils
Schema
~~~~~~
.. automodule:: taskflow.utils.schema_utils
Threading
~~~~~~~~~


@@ -7,10 +7,9 @@ connected via `amqp`_ (or other supported `kombu`_ transports).
.. note::
This engine is under active development and is experimental but it is
usable and does work but is missing some features (please check the
`blueprint page`_ for known issues and plans) that will make it more
production ready.
This engine is under active development and is usable and **does** work
but is missing some features (please check the `blueprint page`_ for
known issues and plans) that will make it more production ready.
.. _blueprint page: https://blueprints.launchpad.net/taskflow?searchtext=wbe
@@ -18,8 +17,8 @@ Terminology
-----------
Client
Code or program or service that uses this library to define flows and
run them via engines.
Code or program or service (or user) that uses this library to define
flows and run them via engines.
Transport + protocol
Mechanism (and `protocol`_ on top of that mechanism) used to pass information
@@ -118,7 +117,7 @@ engine executor in the following manner:
4. The executor gets the task request confirmation from the worker and the task
request state changes from the ``PENDING`` to the ``RUNNING`` state. Once a
task request is in the ``RUNNING`` state it can't be timed-out (considering
that task execution process may take unpredictable time).
that the task execution process may take an unpredictable amount of time).
5. The executor gets the task execution result from the worker and passes it
back to the executor and worker-based engine to finish task processing (this
repeats for subsequent tasks).
@@ -129,7 +128,9 @@ engine executor in the following manner:
json-serializable (they contain references to tracebacks which are not
serializable), so they are converted to dicts before sending and converted
from dicts after receiving on both executor & worker sides (this
translation is lossy since the traceback won't be fully retained).
translation is lossy since the traceback can't be fully retained, due
to its contents containing internal interpreter references and
details).
Protocol
~~~~~~~~
@@ -406,16 +407,20 @@ Limitations
locally to avoid transport overhead for very simple tasks (currently it will
run even lightweight tasks remotely, which may be non-performant).
* Fault detection, currently when a worker acknowledges a task the engine will
wait for the task result indefinitely (a task could take a very long time to
finish). In the future there needs to be a way to limit the duration of a
remote workers execution (and track there liveness) and possibly spawn
the task on a secondary worker if a timeout is reached (aka the first worker
has died or has stopped responding).
wait for the task result indefinitely (a task may take an indeterminate
amount of time to finish). In the future there needs to be a way to limit
the duration of a remote workers execution (and track their liveness) and
possibly spawn the task on a secondary worker if a timeout is reached (aka
the first worker has died or has stopped responding).
Interfaces
==========
Implementations
===============
.. automodule:: taskflow.engines.worker_based.engine
Components
----------
.. automodule:: taskflow.engines.worker_based.proxy
.. automodule:: taskflow.engines.worker_based.worker


@@ -1,7 +1,4 @@
[DEFAULT]
# The list of modules to copy from oslo-incubator.git
script=tools/run_cross_tests.sh
# The base module to hold the copy of openstack.common
base=taskflow


@@ -12,7 +12,7 @@ variable-rgx=[a-z_][a-z0-9_]{0,30}$
argument-rgx=[a-z_][a-z0-9_]{1,30}$
# Method names should be at least 3 characters long
# and be lowecased with underscores
# and be lowercased with underscores
method-rgx=[a-z_][a-z0-9_]{2,50}$
# Don't require docstrings on tests.


@@ -1,30 +0,0 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
# See: https://bugs.launchpad.net/pbr/+bug/1384919 for why this is here...
pbr>=0.6,!=0.7,<1.0
# Packages needed for using this library.
# Only needed on python 2.6
ordereddict
# Python 2->3 compatibility library.
six>=1.7.0
# Very nice graph library
networkx>=1.8
# Used for backend storage engine loading.
stevedore>=1.1.0 # Apache-2.0
# Backport for concurrent.futures which exists in 3.2+
futures>=2.1.6
# Used for structured input validation
jsonschema>=2.0.0,<3.0.0
# For common utilities
oslo.utils>=1.2.0 # Apache-2.0
oslo.serialization>=1.2.0 # Apache-2.0


@@ -1,24 +0,0 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
# See: https://bugs.launchpad.net/pbr/+bug/1384919 for why this is here...
pbr>=0.6,!=0.7,<1.0
# Packages needed for using this library.
# Python 2->3 compatibility library.
six>=1.7.0
# Very nice graph library
networkx>=1.8
# Used for backend storage engine loading.
stevedore>=1.1.0 # Apache-2.0
# Used for structured input validation
jsonschema>=2.0.0,<3.0.0
# For common utilities
oslo.utils>=1.2.0 # Apache-2.0
oslo.serialization>=1.2.0 # Apache-2.0

48
requirements.txt Normal file

@@ -0,0 +1,48 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
# See: https://bugs.launchpad.net/pbr/+bug/1384919 for why this is here...
pbr<2.0,>=0.11
# Packages needed for using this library.
# Python 2->3 compatibility library.
six>=1.9.0
# Enum library made for <= python 3.3
enum34;python_version=='2.7' or python_version=='2.6'
# For async and/or periodic work
futurist>=0.1.1 # Apache-2.0
# For reader/writer + interprocess locks.
fasteners>=0.7 # Apache-2.0
# Very nice graph library
networkx>=1.8
# For contextlib new additions/compatibility for <= python 3.3
contextlib2>=0.4.0 # PSF License
# Used for backend storage engine loading.
stevedore>=1.5.0 # Apache-2.0
# Backport for concurrent.futures which exists in 3.2+
futures>=3.0;python_version=='2.7' or python_version=='2.6'
# Backport for time.monotonic which is in 3.3+
monotonic>=0.1 # Apache-2.0
# Used for structured input validation
jsonschema!=2.5.0,<3.0.0,>=2.0.0
# For common utilities
oslo.utils>=1.6.0 # Apache-2.0
oslo.serialization>=1.4.0 # Apache-2.0
# For lru caches and such
cachetools>=1.0.0 # MIT License
# For deprecation of things
debtcollector>=0.3.0 # Apache-2.0

View File

@@ -17,10 +17,8 @@ classifier =
Operating System :: POSIX :: Linux
Programming Language :: Python
Programming Language :: Python :: 2
Programming Language :: Python :: 2.6
Programming Language :: Python :: 2.7
Programming Language :: Python :: 3
Programming Language :: Python :: 3.3
Programming Language :: Python :: 3.4
Topic :: Software Development :: Libraries
Topic :: System :: Distributed Computing
@@ -36,6 +34,10 @@ packages =
[entry_points]
taskflow.jobboards =
zookeeper = taskflow.jobs.backends.impl_zookeeper:ZookeeperJobBoard
redis = taskflow.jobs.backends.impl_redis:RedisJobBoard
taskflow.conductors =
blocking = taskflow.conductors.backends.impl_blocking:BlockingConductor
taskflow.persistence =
dir = taskflow.persistence.backends.impl_dir:DirBackend

View File

@@ -1,4 +1,3 @@
#!/usr/bin/env python
# Copyright (c) 2013 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");

View File

@@ -16,14 +16,22 @@
# under the License.
import abc
import collections
import itertools
from oslo_utils import reflection
import six
from six.moves import zip as compat_zip
from taskflow import exceptions
from taskflow.types import sets
from taskflow.utils import misc
# Helper types tuples...
_sequence_types = (list, tuple, collections.Sequence)
_set_types = (set, collections.Set)
def _save_as_to_mapping(save_as):
"""Convert save_as to mapping name => index.
@@ -33,25 +41,27 @@ def _save_as_to_mapping(save_as):
# outside of code so that it's more easily understandable, since what an
# atom returns is pretty crucial for other later operations.
if save_as is None:
return {}
return collections.OrderedDict()
if isinstance(save_as, six.string_types):
# NOTE(harlowja): this means that your atom will only return one item
# instead of a dictionary-like object or a indexable object (like a
# list or tuple).
return {save_as: None}
elif isinstance(save_as, (tuple, list)):
return collections.OrderedDict([(save_as, None)])
elif isinstance(save_as, _sequence_types):
# NOTE(harlowja): this means that your atom will return a indexable
# object, like a list or tuple and the results can be mapped by index
# to that tuple/list that is returned for others to use.
return dict((key, num) for num, key in enumerate(save_as))
elif isinstance(save_as, set):
return collections.OrderedDict((key, num)
for num, key in enumerate(save_as))
elif isinstance(save_as, _set_types):
# NOTE(harlowja): in the case where a set is given we will not be
# able to determine the numeric ordering in a reliable way (since it is
# a unordered set) so the only way for us to easily map the result of
# the atom will be via the key itself.
return dict((key, key) for key in save_as)
raise TypeError('Atom provides parameter '
'should be str, set or tuple/list, not %r' % save_as)
# able to determine the numeric ordering in a reliable way (since it
# may be an unordered set) so the only way for us to easily map the
# result of the atom will be via the key itself.
return collections.OrderedDict((key, key) for key in save_as)
else:
raise TypeError('Atom provides parameter '
'should be str, set or tuple/list, not %r' % save_as)
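For illustration, a small sketch of the mappings this helper now produces
(the ``taskflow.atom`` import path is assumed from context, since file names
are not shown in this rendering):

    from taskflow import atom as _atom

    print(_atom._save_as_to_mapping(None))        # OrderedDict()
    print(_atom._save_as_to_mapping('x'))         # OrderedDict([('x', None)])
    print(_atom._save_as_to_mapping(('a', 'b')))  # OrderedDict([('a', 0), ('b', 1)])
    print(_atom._save_as_to_mapping({'a'}))       # OrderedDict([('a', 'a')])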
def _build_rebind_dict(args, rebind_args):
@@ -62,9 +72,9 @@ def _build_rebind_dict(args, rebind_args):
new name onto the required name).
"""
if rebind_args is None:
return {}
return collections.OrderedDict()
elif isinstance(rebind_args, (list, tuple)):
rebind = dict(zip(args, rebind_args))
rebind = collections.OrderedDict(compat_zip(args, rebind_args))
if len(args) < len(rebind_args):
rebind.update((a, a) for a in rebind_args[len(args):])
return rebind
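A quick sketch of what the positional form of rebinding yields (same assumed
import path as above):

    from taskflow import atom as _atom

    # Function args paired positionally with replacement names.
    print(_atom._build_rebind_dict(['a', 'b'], ['x', 'y']))
    # -> OrderedDict([('a', 'x'), ('b', 'y')])

    # Rebind names beyond the function's own args map to themselves.
    print(_atom._build_rebind_dict(['a'], ['x', 'extra']))
    # -> OrderedDict([('a', 'x'), ('extra', 'extra')])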
@@ -85,11 +95,11 @@ def _build_arg_mapping(atom_name, reqs, rebind_args, function, do_infer,
extra arguments (where applicable).
"""
# build a list of required arguments based on function signature
# Build a list of required arguments based on function signature.
req_args = reflection.get_callable_args(function, required_only=True)
all_args = reflection.get_callable_args(function, required_only=False)
# remove arguments that are part of ignore_list
# Remove arguments that are part of ignore list.
if ignore_list:
for arg in ignore_list:
if arg in req_args:
@@ -97,65 +107,56 @@ def _build_arg_mapping(atom_name, reqs, rebind_args, function, do_infer,
else:
ignore_list = []
required = {}
# add reqs to required mappings
# Build the required names.
required = collections.OrderedDict()
# Add required arguments to required mappings if inference is enabled.
if do_infer:
required.update((a, a) for a in req_args)
# Add additional manually provided requirements to required mappings.
if reqs:
if isinstance(reqs, six.string_types):
required.update({reqs: reqs})
else:
required.update((a, a) for a in reqs)
# add req_args to required mappings if do_infer is set
if do_infer:
required.update((a, a) for a in req_args)
# update required mappings based on rebind_args
# Update required mappings values based on rebinding of arguments names.
required.update(_build_rebind_dict(req_args, rebind_args))
# Determine if there are optional arguments that we may or may not take.
if do_infer:
opt_args = set(all_args) - set(required) - set(ignore_list)
optional = dict((a, a) for a in opt_args)
opt_args = sets.OrderedSet(all_args)
opt_args = opt_args - set(itertools.chain(six.iterkeys(required),
iter(ignore_list)))
optional = collections.OrderedDict((a, a) for a in opt_args)
else:
optional = {}
optional = collections.OrderedDict()
# Check if we are given some extra arguments that we aren't able to accept.
if not reflection.accepts_kwargs(function):
extra_args = set(required) - set(all_args)
extra_args = sets.OrderedSet(six.iterkeys(required))
extra_args -= all_args
if extra_args:
extra_args_str = ', '.join(sorted(extra_args))
raise ValueError('Extra arguments given to atom %s: %s'
% (atom_name, extra_args_str))
% (atom_name, list(extra_args)))
# NOTE(imelnikov): don't use set to preserve order in error message
missing_args = [arg for arg in req_args if arg not in required]
if missing_args:
raise ValueError('Missing arguments for atom %s: %s'
% (atom_name, ' ,'.join(missing_args)))
% (atom_name, missing_args))
return required, optional
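Putting the pieces together, a sketch of the mappings produced for a
hypothetical callable (the function and its argument names are made up):

    from taskflow import atom as _atom

    def make_server(flavor, image, name=None):
        pass

    required, optional = _atom._build_arg_mapping(
        'make_server', None, None, make_server, True, ignore_list=None)
    print(required)  # OrderedDict([('flavor', 'flavor'), ('image', 'image')])
    print(optional)  # OrderedDict([('name', 'name')])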
@six.add_metaclass(abc.ABCMeta)
class Atom(object):
"""An abstract flow atom that causes a flow to progress (in some manner).
"""An unit of work that causes a flow to progress (in some manner).
An atom is a named object that operates with input flow data to perform
An atom is a named object that operates with input data to perform
some action that furthers the overall flow's progress. It usually also
produces some of its own named output as a result of this process.
:ivar version: An *immutable* version that associates version information
with this atom. It can be useful in resuming older versions
of atoms. Standard major, minor versioning concepts
should apply.
:ivar save_as: An *immutable* output ``resource`` name dictionary this atom
produces that other atoms may depend on this atom providing.
The format is output index (or key when a dictionary
is returned from the execute method) to stored argument
name.
:ivar rebind: An *immutable* input ``resource`` mapping dictionary that
can be used to alter the inputs given to this atom. It is
typically used for mapping a prior atoms output into
the names that this atom expects (in a way this is like
remapping a namespace of another atom into the namespace
of this atom).
:param name: Meaningful name for this atom, should be something that is
distinguishable and understandable for notification,
debugging, storing and any other similar purposes.
@@ -164,52 +165,61 @@ class Atom(object):
to correlate and associate the thing/s this atom
produces, if it produces anything at all.
:param inject: An *immutable* input_name => value dictionary which
specifies any initial inputs that should be automatically
injected into the atoms scope before the atom execution
commences (this allows for providing atom *local* values that
do not need to be provided by other atoms/dependents).
specifies any initial inputs that should be automatically
injected into the atoms scope before the atom execution
commences (this allows for providing atom *local* values
that do not need to be provided by other atoms/dependents).
:ivar version: An *immutable* version that associates version information
with this atom. It can be useful in resuming older versions
of atoms. Standard major, minor versioning concepts
should apply.
:ivar save_as: An *immutable* output ``resource`` name
:py:class:`.OrderedDict` this atom produces that other
atoms may depend on this atom providing. The format is
output index (or key when a dictionary is returned from
the execute method) to stored argument name.
:ivar rebind: An *immutable* input ``resource`` :py:class:`.OrderedDict`
that can be used to alter the inputs given to this atom. It
is typically used for mapping a prior atoms output into
the names that this atom expects (in a way this is like
remapping a namespace of another atom into the namespace
of this atom).
:ivar inject: See parameter ``inject``.
:ivar requires: Any inputs this atom requires to function (if applicable).
NOTE(harlowja): there can be no intersection between what
this atom requires and what it produces (since this would
be an impossible dependency to satisfy).
:ivar optional: Any inputs that are optional for this atom's execute
method.
:ivar name: See parameter ``name``.
:ivar requires: A :py:class:`~taskflow.types.sets.OrderedSet` of inputs
this atom requires to function.
:ivar optional: A :py:class:`~taskflow.types.sets.OrderedSet` of inputs
that are optional for this atom to function.
:ivar provides: A :py:class:`~taskflow.types.sets.OrderedSet` of outputs
this atom produces.
"""
def __init__(self, name=None, provides=None, inject=None):
self._name = name
self.save_as = _save_as_to_mapping(provides)
self.name = name
self.version = (1, 0)
self.inject = inject
self.requires = frozenset()
self.optional = frozenset()
self.save_as = _save_as_to_mapping(provides)
self.requires = sets.OrderedSet()
self.optional = sets.OrderedSet()
self.provides = sets.OrderedSet(self.save_as)
self.rebind = collections.OrderedDict()
def _build_arg_mapping(self, executor, requires=None, rebind=None,
auto_extract=True, ignore_list=None):
req_arg, opt_arg = _build_arg_mapping(self.name, requires, rebind,
executor, auto_extract,
ignore_list)
self.rebind = {}
if opt_arg:
self.rebind.update(opt_arg)
if req_arg:
self.rebind.update(req_arg)
self.requires = frozenset(req_arg.values())
self.optional = frozenset(opt_arg.values())
required, optional = _build_arg_mapping(self.name, requires, rebind,
executor, auto_extract,
ignore_list=ignore_list)
rebind = collections.OrderedDict()
for (arg_name, bound_name) in itertools.chain(six.iteritems(required),
six.iteritems(optional)):
rebind.setdefault(arg_name, bound_name)
self.rebind = rebind
self.requires = sets.OrderedSet(six.itervalues(required))
self.optional = sets.OrderedSet(six.itervalues(optional))
if self.inject:
inject_set = set(six.iterkeys(self.inject))
self.requires -= inject_set
self.optional -= inject_set
out_of_order = self.provides.intersection(self.requires)
if out_of_order:
raise exceptions.DependencyFailure(
"Atom %(item)s provides %(oo)s that are required "
"by this atom"
% dict(item=self.name, oo=sorted(out_of_order)))
inject_keys = frozenset(six.iterkeys(self.inject))
self.requires -= inject_keys
self.optional -= inject_keys
@abc.abstractmethod
def execute(self, *args, **kwargs):
@@ -219,23 +229,8 @@ class Atom(object):
def revert(self, *args, **kwargs):
"""Reverts this atom (undoing any :meth:`execute` side-effects)."""
@property
def name(self):
"""A non-unique name for this atom (human readable)."""
return self._name
def __str__(self):
return "%s==%s" % (self.name, misc.get_version_string(self))
def __repr__(self):
return '<%s %s>' % (reflection.get_class_name(self), self)
@property
def provides(self):
"""Any outputs this atom produces.
NOTE(harlowja): there can be no intersection between what this atom
requires and what it produces (since this would be an impossible
dependency to satisfy).
"""
return set(self.save_as)
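Taken together, a brief sketch of how these attributes now surface on a task
(tasks build on this atom base class; the ``MakeVolume`` class and its names
are made up for illustration):

    from taskflow import task

    class MakeVolume(task.Task):
        def execute(self, size, availability_zone=None):
            return 'vol-1'

    vol = MakeVolume(provides='volume', inject={'size': 10})
    print(list(vol.requires))   # []  ('size' is satisfied by inject)
    print(list(vol.optional))   # ['availability_zone']
    print(list(vol.provides))   # ['volume']
    print(vol.rebind)           # each execute() arg mapped to itself here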

View File

@@ -0,0 +1,45 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
import stevedore.driver
from taskflow import exceptions as exc
# NOTE(harlowja): this is the entrypoint namespace, not the module namespace.
CONDUCTOR_NAMESPACE = 'taskflow.conductors'
LOG = logging.getLogger(__name__)
def fetch(kind, name, jobboard, namespace=CONDUCTOR_NAMESPACE, **kwargs):
"""Fetch a conductor backend with the given options.
This fetch method will look for the entrypoint 'kind' in the entrypoint
namespace, and then attempt to instantiate that entrypoint using the
provided name, jobboard and any board specific kwargs.
"""
LOG.debug('Looking for %r conductor driver in %r', kind, namespace)
try:
mgr = stevedore.driver.DriverManager(
namespace, kind,
invoke_on_load=True,
invoke_args=(name, jobboard),
invoke_kwds=kwargs)
return mgr.driver
except RuntimeError as e:
raise exc.NotFound("Could not find conductor %s" % (kind), e)
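A usage sketch (assuming this helper is importable as
``taskflow.conductors.backends.fetch``; jobboard and persistence creation are
not shown and the names below are placeholders):

    from taskflow.conductors import backends as conductor_backends

    def start_conductor(jobboard, persistence_backend):
        # 'blocking' matches the taskflow.conductors entrypoint registered
        # in setup.cfg above.
        conductor = conductor_backends.fetch(
            'blocking', 'conductor-1', jobboard,
            persistence=persistence_backend, wait_timeout=1.0)
        conductor.connect()
        conductor.run()  # Blocks until conductor.stop() is called elsewhere.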

View File

@@ -0,0 +1,219 @@
# -*- coding: utf-8 -*-
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import threading
try:
from contextlib import ExitStack # noqa
except ImportError:
from contextlib2 import ExitStack # noqa
from debtcollector import removals
from oslo_utils import excutils
import six
from taskflow.conductors import base
from taskflow import exceptions as excp
from taskflow.listeners import logging as logging_listener
from taskflow import logging
from taskflow.types import timing as tt
from taskflow.utils import async_utils
LOG = logging.getLogger(__name__)
WAIT_TIMEOUT = 0.5
NO_CONSUME_EXCEPTIONS = tuple([
excp.ExecutionFailure,
excp.StorageFailure,
])
class BlockingConductor(base.Conductor):
"""A conductor that runs jobs in its own dispatching loop.
This conductor iterates over jobs in the provided jobboard (waiting for
the given timeout if no jobs exist) and attempts to claim them, work on
those jobs in its local thread (blocking further work from being claimed
and consumed) and then consume those work units after completion. This
process will repeat until the conductor has been stopped or other critical
error occurs.
NOTE(harlowja): consumption occurs even if an engine fails to run due to
a task failure. This is only skipped when an execution failure or
a storage failure occurs which are *usually* correctable by re-running on
a different conductor (storage failures and execution failures may be
transient issues that can be worked around by later execution). If a job
after completing can not be consumed or abandoned the conductor relies
upon the jobboard capabilities to automatically abandon these jobs.
"""
START_FINISH_EVENTS_EMITTED = tuple([
'compilation', 'preparation',
'validation', 'running',
])
"""Events will be emitted for the start and finish of each engine
activity defined above, the actual event name that can be registered
to subscribe to will be ``${event}_start`` and ``${event}_end`` where
the ``${event}`` in this pseudo-variable will be one of these events.
"""
def __init__(self, name, jobboard,
persistence=None, engine=None,
engine_options=None, wait_timeout=None):
super(BlockingConductor, self).__init__(
name, jobboard, persistence=persistence,
engine=engine, engine_options=engine_options)
if wait_timeout is None:
wait_timeout = WAIT_TIMEOUT
if isinstance(wait_timeout, (int, float) + six.string_types):
self._wait_timeout = tt.Timeout(float(wait_timeout))
elif isinstance(wait_timeout, tt.Timeout):
self._wait_timeout = wait_timeout
else:
raise ValueError("Invalid timeout literal: %s" % (wait_timeout))
self._dead = threading.Event()
@removals.removed_kwarg('timeout', version="0.8", removal_version="2.0")
def stop(self, timeout=None):
"""Requests the conductor to stop dispatching.
This method can be used to request that a conductor stop its
consumption & dispatching loop.
The method returns immediately regardless of whether the conductor has
been stopped.
.. deprecated:: 0.8
The ``timeout`` parameter is **deprecated** and is present for
backward compatibility **only**. In order to wait for the
conductor to gracefully shut down, :py:meth:`wait` should be used
instead.
"""
self._wait_timeout.interrupt()
@property
def dispatching(self):
return not self._dead.is_set()
def _listeners_from_job(self, job, engine):
listeners = super(BlockingConductor, self)._listeners_from_job(job,
engine)
listeners.append(logging_listener.LoggingListener(engine, log=LOG))
return listeners
def _dispatch_job(self, job):
engine = self._engine_from_job(job)
listeners = self._listeners_from_job(job, engine)
with ExitStack() as stack:
for listener in listeners:
stack.enter_context(listener)
LOG.debug("Dispatching engine for job '%s'", job)
consume = True
try:
for stage_func, event_name in [(engine.compile, 'compilation'),
(engine.prepare, 'preparation'),
(engine.validate, 'validation'),
(engine.run, 'running')]:
self._notifier.notify("%s_start" % event_name, {
'job': job,
'engine': engine,
'conductor': self,
})
stage_func()
self._notifier.notify("%s_end" % event_name, {
'job': job,
'engine': engine,
'conductor': self,
})
except excp.WrappedFailure as e:
if all((f.check(*NO_CONSUME_EXCEPTIONS) for f in e)):
consume = False
if LOG.isEnabledFor(logging.WARNING):
if consume:
LOG.warn("Job execution failed (consumption being"
" skipped): %s [%s failures]", job, len(e))
else:
LOG.warn("Job execution failed (consumption"
" proceeding): %s [%s failures]", job, len(e))
# Show the failure/s + traceback (if possible)...
for i, f in enumerate(e):
LOG.warn("%s. %s", i + 1, f.pformat(traceback=True))
except NO_CONSUME_EXCEPTIONS:
LOG.warn("Job execution failed (consumption being"
" skipped): %s", job, exc_info=True)
consume = False
except Exception:
LOG.warn("Job execution failed (consumption proceeding): %s",
job, exc_info=True)
else:
LOG.info("Job completed successfully: %s", job)
return async_utils.make_completed_future(consume)
def run(self):
self._dead.clear()
try:
while True:
if self._wait_timeout.is_stopped():
break
dispatched = 0
for job in self._jobboard.iterjobs():
if self._wait_timeout.is_stopped():
break
LOG.debug("Trying to claim job: %s", job)
try:
self._jobboard.claim(job, self._name)
except (excp.UnclaimableJob, excp.NotFound):
LOG.debug("Job already claimed or consumed: %s", job)
continue
consume = False
try:
f = self._dispatch_job(job)
except KeyboardInterrupt:
with excutils.save_and_reraise_exception():
LOG.warn("Job dispatching interrupted: %s", job)
except Exception:
LOG.warn("Job dispatching failed: %s", job,
exc_info=True)
else:
dispatched += 1
consume = f.result()
try:
if consume:
self._jobboard.consume(job, self._name)
else:
self._jobboard.abandon(job, self._name)
except (excp.JobFailure, excp.NotFound):
if consume:
LOG.warn("Failed job consumption: %s", job,
exc_info=True)
else:
LOG.warn("Failed job abandonment: %s", job,
exc_info=True)
if dispatched == 0 and not self._wait_timeout.is_stopped():
self._wait_timeout.wait()
finally:
self._dead.set()
def wait(self, timeout=None):
"""Waits for the conductor to gracefully exit.
This method waits for the conductor to gracefully exit. An optional
timeout can be provided, which will cause the method to return
within the specified timeout. If the timeout is reached, the returned
value will be False.
:param timeout: Maximum number of seconds that the :meth:`wait` method
should block for.
"""
return self._dead.wait(timeout)
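Subscribing to the per-stage events described above is done through the
conductor's notifier; a sketch (the helper name is made up):

    def watch_engine_stages(conductor):
        # Event names follow the ${event}_start / ${event}_end convention
        # (compilation, preparation, validation, running).
        def on_stage(event, details):
            print('%s happened for job %s' % (event, details['job']))
        for stage in ('compilation', 'preparation', 'validation', 'running'):
            conductor.notifier.register(stage + '_start', on_stage)
            conductor.notifier.register(stage + '_end', on_stage)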

View File

@@ -15,16 +15,17 @@
import abc
import threading
import fasteners
import six
from taskflow import engines
from taskflow import exceptions as excp
from taskflow.utils import lock_utils
from taskflow.types import notifier
@six.add_metaclass(abc.ABCMeta)
class Conductor(object):
"""Conductors conduct jobs & assist in associated runtime interactions.
"""Base for all conductor implementations.
Conductors act as entities which extract jobs from a jobboard, assign
their work to some engine (using some desired configuration) and then wait
@@ -34,8 +35,8 @@ class Conductor(object):
period of time will finish up the prior failed conductors work.
"""
def __init__(self, name, jobboard, persistence,
engine=None, engine_options=None):
def __init__(self, name, jobboard,
persistence=None, engine=None, engine_options=None):
self._name = name
self._jobboard = jobboard
self._engine = engine
@@ -45,6 +46,18 @@ class Conductor(object):
self._engine_options = engine_options.copy()
self._persistence = persistence
self._lock = threading.RLock()
self._notifier = notifier.Notifier()
@property
def notifier(self):
"""The conductor actions (or other state changes) notifier.
NOTE(harlowja): different conductor implementations may emit
different events + event details at different times, so refer to your
conductor documentation to know exactly what can and what can not be
subscribed to.
"""
return self._notifier
def _flow_detail_from_job(self, job):
"""Extracts a flow detail from a job (via some manner).
@@ -88,20 +101,36 @@ class Conductor(object):
store = dict(job.details["store"])
else:
store = {}
return engines.load_from_detail(flow_detail, store=store,
engine=self._engine,
backend=self._persistence,
**self._engine_options)
engine = engines.load_from_detail(flow_detail, store=store,
engine=self._engine,
backend=self._persistence,
**self._engine_options)
return engine
@lock_utils.locked
def _listeners_from_job(self, job, engine):
"""Returns a list of listeners to be attached to an engine.
This method should be overridden in order to attach listeners to
engines. It will be called once for each job, and the returned list of
listeners will be added to the engine for that job.
:param job: A job instance that is about to be run in an engine.
:param engine: The engine that listeners will be attached to.
:returns: a list of (unregistered) listener instances.
"""
# TODO(dkrause): Create a standard way to pass listeners or
# listener factories over the jobboard
return []
@fasteners.locked
def connect(self):
"""Ensures the jobboard is connected (noop if it is already)."""
if not self._jobboard.connected:
self._jobboard.connect()
@lock_utils.locked
@fasteners.locked
def close(self):
"""Closes the jobboard, disallowing further use."""
"""Closes the contained jobboard, disallowing further use."""
self._jobboard.close()
@abc.abstractmethod
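The ``_listeners_from_job`` hook above is how per-job listeners get attached;
a sketch of overriding it (the subclass name is made up, and the printing
listener is used only as an example of a stock listener):

    from taskflow.conductors.backends import impl_blocking
    from taskflow.listeners import printing

    class ChattyConductor(impl_blocking.BlockingConductor):
        def _listeners_from_job(self, job, engine):
            listeners = super(ChattyConductor, self)._listeners_from_job(
                job, engine)
            # Each returned (unregistered) listener is entered as a context
            # manager around the engine run by the dispatching conductor.
            listeners.append(printing.PrintingListener(engine))
            return listeners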

View File

@@ -1,5 +1,7 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2015 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
@@ -12,163 +14,18 @@
# License for the specific language governing permissions and limitations
# under the License.
import six
from debtcollector import moves
from debtcollector import removals
from taskflow.conductors import base
from taskflow import exceptions as excp
from taskflow.listeners import logging as logging_listener
from taskflow import logging
from taskflow.types import timing as tt
from taskflow.utils import async_utils
from taskflow.utils import deprecation
from taskflow.utils import threading_utils
from taskflow.conductors.backends import impl_blocking
LOG = logging.getLogger(__name__)
WAIT_TIMEOUT = 0.5
NO_CONSUME_EXCEPTIONS = tuple([
excp.ExecutionFailure,
excp.StorageFailure,
])
# TODO(harlowja): remove this module soon...
removals.removed_module(__name__,
replacement="the conductor entrypoints",
version="0.8", removal_version="2.0",
stacklevel=4)
class SingleThreadedConductor(base.Conductor):
"""A conductor that runs jobs in its own dispatching loop.
This conductor iterates over jobs in the provided jobboard (waiting for
the given timeout if no jobs exist) and attempts to claim them, work on
those jobs in its local thread (blocking further work from being claimed
and consumed) and then consume those work units after completetion. This
process will repeat until the conductor has been stopped or other critical
error occurs.
NOTE(harlowja): consumption occurs even if a engine fails to run due to
a task failure. This is only skipped when an execution failure or
a storage failure occurs which are *usually* correctable by re-running on
a different conductor (storage failures and execution failures may be
transient issues that can be worked around by later execution). If a job
after completing can not be consumed or abandoned the conductor relies
upon the jobboard capabilities to automatically abandon these jobs.
"""
def __init__(self, name, jobboard, persistence,
engine=None, engine_options=None, wait_timeout=None):
super(SingleThreadedConductor, self).__init__(
name, jobboard, persistence,
engine=engine, engine_options=engine_options)
if wait_timeout is None:
wait_timeout = WAIT_TIMEOUT
if isinstance(wait_timeout, (int, float) + six.string_types):
self._wait_timeout = tt.Timeout(float(wait_timeout))
elif isinstance(wait_timeout, tt.Timeout):
self._wait_timeout = wait_timeout
else:
raise ValueError("Invalid timeout literal: %s" % (wait_timeout))
self._dead = threading_utils.Event()
@deprecation.removed_kwarg('timeout',
version="0.8", removal_version="?")
def stop(self, timeout=None):
"""Requests the conductor to stop dispatching.
This method can be used to request that a conductor stop its
consumption & dispatching loop.
The method returns immediately regardless of whether the conductor has
been stopped.
:param timeout: This parameter is **deprecated** and is present for
backward compatibility **only**. In order to wait for
the conductor to gracefully shut down, :meth:`wait`
should be used instead.
"""
self._wait_timeout.interrupt()
@property
def dispatching(self):
return not self._dead.is_set()
def _dispatch_job(self, job):
engine = self._engine_from_job(job)
consume = True
with logging_listener.LoggingListener(engine, log=LOG):
LOG.debug("Dispatching engine %s for job: %s", engine, job)
try:
engine.run()
except excp.WrappedFailure as e:
if all((f.check(*NO_CONSUME_EXCEPTIONS) for f in e)):
consume = False
if LOG.isEnabledFor(logging.WARNING):
if consume:
LOG.warn("Job execution failed (consumption being"
" skipped): %s [%s failures]", job, len(e))
else:
LOG.warn("Job execution failed (consumption"
" proceeding): %s [%s failures]", job, len(e))
# Show the failure/s + traceback (if possible)...
for i, f in enumerate(e):
LOG.warn("%s. %s", i + 1, f.pformat(traceback=True))
except NO_CONSUME_EXCEPTIONS:
LOG.warn("Job execution failed (consumption being"
" skipped): %s", job, exc_info=True)
consume = False
except Exception:
LOG.warn("Job execution failed (consumption proceeding): %s",
job, exc_info=True)
else:
LOG.info("Job completed successfully: %s", job)
return async_utils.make_completed_future(consume)
def run(self):
self._dead.clear()
try:
while True:
if self._wait_timeout.is_stopped():
break
dispatched = 0
for job in self._jobboard.iterjobs():
if self._wait_timeout.is_stopped():
break
LOG.debug("Trying to claim job: %s", job)
try:
self._jobboard.claim(job, self._name)
except (excp.UnclaimableJob, excp.NotFound):
LOG.debug("Job already claimed or consumed: %s", job)
continue
consume = False
try:
f = self._dispatch_job(job)
except Exception:
LOG.warn("Job dispatching failed: %s", job,
exc_info=True)
else:
dispatched += 1
consume = f.result()
try:
if consume:
self._jobboard.consume(job, self._name)
else:
self._jobboard.abandon(job, self._name)
except (excp.JobFailure, excp.NotFound):
if consume:
LOG.warn("Failed job consumption: %s", job,
exc_info=True)
else:
LOG.warn("Failed job abandonment: %s", job,
exc_info=True)
if dispatched == 0 and not self._wait_timeout.is_stopped():
self._wait_timeout.wait()
finally:
self._dead.set()
def wait(self, timeout=None):
"""Waits for the conductor to gracefully exit.
This method waits for the conductor to gracefully exit. An optional
timeout can be provided, which will cause the method to return
within the specified timeout. If the timeout is reached, the returned
value will be False.
:param timeout: Maximum number of seconds that the :meth:`wait` method
should block for.
"""
return self._dead.wait(timeout)
# TODO(harlowja): remove this proxy/legacy class soon...
SingleThreadedConductor = moves.moved_class(
impl_blocking.BlockingConductor, 'SingleThreadedConductor',
__name__, version="0.8", removal_version="?")

View File

@@ -14,8 +14,16 @@
# License for the specific language governing permissions and limitations
# under the License.
from oslo_utils import eventletutils as _eventletutils
# promote helpers to this module namespace
# Give a nice warning that if eventlet is being used these modules
# are highly recommended to be patched (or otherwise bad things could
# happen).
_eventletutils.warn_eventlet_not_patched(
expected_patched_modules=['time', 'thread'])
# Promote helpers to this module namespace (for easy access).
from taskflow.engines.helpers import flow_from_detail # noqa
from taskflow.engines.helpers import load # noqa
from taskflow.engines.helpers import load_from_detail # noqa

View File

@@ -32,11 +32,6 @@ SAVE_RESULT_STATES = (states.SUCCESS, states.FAILURE)
class Action(object):
"""An action that handles executing, state changes, ... of atoms."""
def __init__(self, storage, notifier, walker_factory):
def __init__(self, storage, notifier):
self._storage = storage
self._notifier = notifier
self._walker_factory = walker_factory
@abc.abstractmethod
def handles(self, atom):
"""Checks if this action handles the provided atom."""

View File

@@ -14,13 +14,14 @@
# License for the specific language governing permissions and limitations
# under the License.
import futurist
from taskflow.engines.action_engine.actions import base
from taskflow.engines.action_engine import executor as ex
from taskflow import logging
from taskflow import retry as retry_atom
from taskflow import states
from taskflow.types import failure
from taskflow.types import futures
LOG = logging.getLogger(__name__)
@@ -44,20 +45,14 @@ def _revert_retry(retry, arguments):
class RetryAction(base.Action):
"""An action that handles executing, state changes, ... of retry atoms."""
def __init__(self, storage, notifier, walker_factory):
super(RetryAction, self).__init__(storage, notifier, walker_factory)
self._executor = futures.SynchronousExecutor()
@staticmethod
def handles(atom):
return isinstance(atom, retry_atom.Retry)
def __init__(self, storage, notifier):
super(RetryAction, self).__init__(storage, notifier)
self._executor = futurist.SynchronousExecutor()
def _get_retry_args(self, retry, addons=None):
scope_walker = self._walker_factory(retry)
arguments = self._storage.fetch_mapped_args(
retry.rebind,
atom_name=retry.name,
scope_walker=scope_walker,
optional_args=retry.optional
)
history = self._storage.get_retry_history(retry.name)

View File

@@ -28,14 +28,10 @@ LOG = logging.getLogger(__name__)
class TaskAction(base.Action):
"""An action that handles scheduling, state changes, ... of task atoms."""
def __init__(self, storage, notifier, walker_factory, task_executor):
super(TaskAction, self).__init__(storage, notifier, walker_factory)
def __init__(self, storage, notifier, task_executor):
super(TaskAction, self).__init__(storage, notifier)
self._task_executor = task_executor
@staticmethod
def handles(atom):
return isinstance(atom, task_atom.BaseTask)
def _is_identity_transition(self, old_state, state, task, progress):
if state in base.SAVE_RESULT_STATES:
# saving result is never identity transition
@@ -100,11 +96,9 @@ class TaskAction(base.Action):
def schedule_execution(self, task):
self.change_state(task, states.RUNNING, progress=0.0)
scope_walker = self._walker_factory(task)
arguments = self._storage.fetch_mapped_args(
task.rebind,
atom_name=task.name,
scope_walker=scope_walker,
optional_args=task.optional
)
if task.notifier.can_be_registered(task_atom.EVENT_UPDATE_PROGRESS):
@@ -126,11 +120,9 @@ class TaskAction(base.Action):
def schedule_reversion(self, task):
self.change_state(task, states.REVERTING, progress=0.0)
scope_walker = self._walker_factory(task)
arguments = self._storage.fetch_mapped_args(
task.rebind,
atom_name=task.name,
scope_walker=scope_walker,
optional_args=task.optional
)
task_uuid = self._storage.get_atom_uuid(task.name)

View File

@@ -14,6 +14,8 @@
# License for the specific language governing permissions and limitations
# under the License.
import itertools
from networkx.algorithms import traversal
import six
@@ -21,6 +23,60 @@ from taskflow import retry as retry_atom
from taskflow import states as st
class IgnoreDecider(object):
"""Checks any provided edge-deciders and determines if ok to run."""
def __init__(self, atom, edge_deciders):
self._atom = atom
self._edge_deciders = edge_deciders
def check(self, runtime):
"""Returns bool of whether this decider should allow running."""
results = {}
for name in six.iterkeys(self._edge_deciders):
results[name] = runtime.storage.get(name)
for local_decider in six.itervalues(self._edge_deciders):
if not local_decider(history=results):
return False
return True
def affect(self, runtime):
"""If the :py:func:`~.check` returns false, affects associated atoms.
This will alter the associated atom + successor atoms by setting their
state to ``IGNORE`` so that they are ignored in future runtime
activities.
"""
successors_iter = runtime.analyzer.iterate_subgraph(self._atom)
runtime.reset_nodes(itertools.chain([self._atom], successors_iter),
state=st.IGNORE, intention=st.IGNORE)
def check_and_affect(self, runtime):
"""Handles :py:func:`~.check` + :py:func:`~.affect` in right order."""
proceed = self.check(runtime)
if not proceed:
self.affect(runtime)
return proceed
class NoOpDecider(object):
"""No-op decider that says it is always ok to run & has no effect(s)."""
def check(self, runtime):
"""Always good to go."""
return True
def affect(self, runtime):
"""Does nothing."""
def check_and_affect(self, runtime):
"""Handles :py:func:`~.check` + :py:func:`~.affect` in right order.
Does nothing.
"""
return self.check(runtime)
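The deciders consulted above are plain callables that receive the
predecessor results by keyword and return a boolean; a sketch of one (how
deciders get attached to graph edges is not shown in this hunk):

    def only_if_created(history):
        # 'history' maps predecessor atom names to their stored results, as
        # fetched in IgnoreDecider.check(); returning False causes the
        # successor atom (and its subgraph) to be marked IGNORE.
        return all(result == 'CREATED' for result in history.values())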
class Analyzer(object):
"""Analyzes a compilation and aids in execution processes.
@@ -31,21 +87,25 @@ class Analyzer(object):
the rest of the runtime system.
"""
def __init__(self, compilation, storage):
self._storage = storage
self._graph = compilation.execution_graph
def __init__(self, runtime):
self._storage = runtime.storage
self._execution_graph = runtime.compilation.execution_graph
self._check_atom_transition = runtime.check_atom_transition
self._fetch_edge_deciders = runtime.fetch_edge_deciders
def get_next_nodes(self, node=None):
"""Get next nodes to run (originating from node or all nodes)."""
if node is None:
execute = self.browse_nodes_for_execute()
revert = self.browse_nodes_for_revert()
return execute + revert
state = self.get_state(node)
intention = self._storage.get_atom_intention(node.name)
if state == st.SUCCESS:
if intention == st.REVERT:
return [node]
return [
(node, NoOpDecider()),
]
elif intention == st.EXECUTE:
return self.browse_nodes_for_execute(node)
else:
@@ -60,74 +120,90 @@ class Analyzer(object):
def browse_nodes_for_execute(self, node=None):
"""Browse next nodes to execute.
This returns a collection of nodes that are ready to be executed, if
given a specific node it will only examine the successors of that node,
otherwise it will examine the whole graph.
This returns a collection of nodes that *may* be ready to be
executed, if given a specific node it will only examine the successors
of that node, otherwise it will examine the whole graph.
"""
if node:
nodes = self._graph.successors(node)
if node is not None:
nodes = self._execution_graph.successors(node)
else:
nodes = self._graph.nodes_iter()
available_nodes = []
nodes = self._execution_graph.nodes_iter()
ready_nodes = []
for node in nodes:
if self._is_ready_for_execute(node):
available_nodes.append(node)
return available_nodes
is_ready, late_decider = self._get_maybe_ready_for_execute(node)
if is_ready:
ready_nodes.append((node, late_decider))
return ready_nodes
def browse_nodes_for_revert(self, node=None):
"""Browse next nodes to revert.
This returns a collection of nodes that are ready to be be reverted, if
given a specific node it will only examine the predecessors of that
node, otherwise it will examine the whole graph.
This returns a collection of nodes that *may* be ready to be
reverted, if given a specific node it will only examine the
predecessors of that node, otherwise it will examine the whole
graph.
"""
if node:
nodes = self._graph.predecessors(node)
if node is not None:
nodes = self._execution_graph.predecessors(node)
else:
nodes = self._graph.nodes_iter()
available_nodes = []
nodes = self._execution_graph.nodes_iter()
ready_nodes = []
for node in nodes:
if self._is_ready_for_revert(node):
available_nodes.append(node)
return available_nodes
is_ready, late_decider = self._get_maybe_ready_for_revert(node)
if is_ready:
ready_nodes.append((node, late_decider))
return ready_nodes
def _is_ready_for_execute(self, task):
"""Checks if task is ready to be executed."""
state = self.get_state(task)
intention = self._storage.get_atom_intention(task.name)
transition = st.check_task_transition(state, st.RUNNING)
def _get_maybe_ready_for_execute(self, atom):
"""Returns if an atom is *likely* ready to be executed."""
state = self.get_state(atom)
intention = self._storage.get_atom_intention(atom.name)
transition = self._check_atom_transition(atom, state, st.RUNNING)
if not transition or intention != st.EXECUTE:
return False
return (False, None)
task_names = []
for prev_task in self._graph.predecessors(task):
task_names.append(prev_task.name)
predecessor_names = []
for previous_atom in self._execution_graph.predecessors(atom):
predecessor_names.append(previous_atom.name)
task_states = self._storage.get_atoms_states(task_names)
return all(state == st.SUCCESS and intention == st.EXECUTE
for state, intention in six.itervalues(task_states))
predecessor_states = self._storage.get_atoms_states(predecessor_names)
predecessor_states_iter = six.itervalues(predecessor_states)
ok_to_run = all(state == st.SUCCESS and intention == st.EXECUTE
for state, intention in predecessor_states_iter)
def _is_ready_for_revert(self, task):
"""Checks if task is ready to be reverted."""
state = self.get_state(task)
intention = self._storage.get_atom_intention(task.name)
transition = st.check_task_transition(state, st.REVERTING)
if not ok_to_run:
return (False, None)
else:
edge_deciders = self._fetch_edge_deciders(atom)
return (True, IgnoreDecider(atom, edge_deciders))
def _get_maybe_ready_for_revert(self, atom):
"""Returns if an atom is *likely* ready to be reverted."""
state = self.get_state(atom)
intention = self._storage.get_atom_intention(atom.name)
transition = self._check_atom_transition(atom, state, st.REVERTING)
if not transition or intention not in (st.REVERT, st.RETRY):
return False
return (False, None)
task_names = []
for prev_task in self._graph.successors(task):
task_names.append(prev_task.name)
predecessor_names = []
for previous_atom in self._execution_graph.successors(atom):
predecessor_names.append(previous_atom.name)
task_states = self._storage.get_atoms_states(task_names)
return all(state in (st.PENDING, st.REVERTED)
for state, intention in six.itervalues(task_states))
predecessor_states = self._storage.get_atoms_states(predecessor_names)
predecessor_states_iter = six.itervalues(predecessor_states)
ok_to_run = all(state in (st.PENDING, st.REVERTED)
for state, intention in predecessor_states_iter)
def iterate_subgraph(self, retry):
"""Iterates a subgraph connected to given retry controller."""
for _src, dst in traversal.dfs_edges(self._graph, retry):
if not ok_to_run:
return (False, None)
else:
return (True, NoOpDecider())
def iterate_subgraph(self, atom):
"""Iterates a subgraph connected to given atom."""
for _src, dst in traversal.dfs_edges(self._execution_graph, atom):
yield dst
def iterate_retries(self, state=None):
@@ -135,23 +211,30 @@ class Analyzer(object):
If no state is provided it will yield back all retry controllers.
"""
for node in self._graph.nodes_iter():
for node in self._execution_graph.nodes_iter():
if isinstance(node, retry_atom.Retry):
if not state or self.get_state(node) == state:
yield node
def iterate_all_nodes(self):
for node in self._graph.nodes_iter():
"""Yields back all nodes in the execution graph."""
for node in self._execution_graph.nodes_iter():
yield node
def find_atom_retry(self, atom):
return self._graph.node[atom].get('retry')
"""Returns the retry atom associated to the given atom (or none)."""
return self._execution_graph.node[atom].get('retry')
def is_success(self):
for node in self._graph.nodes_iter():
if self.get_state(node) != st.SUCCESS:
"""Checks if all nodes in the execution graph are in 'happy' state."""
for atom in self.iterate_all_nodes():
atom_state = self.get_state(atom)
if atom_state == st.IGNORE:
continue
if atom_state != st.SUCCESS:
return False
return True
def get_state(self, node):
return self._storage.get_atom_state(node.name)
def get_state(self, atom):
"""Gets the state of a given atom (from the backend storage unit)."""
return self._storage.get_atom_state(atom.name)

View File

@@ -17,14 +17,14 @@
import collections
import threading
import fasteners
from taskflow import exceptions as exc
from taskflow import flow
from taskflow import logging
from taskflow import retry
from taskflow import task
from taskflow.types import graph as gr
from taskflow.types import tree as tr
from taskflow.utils import lock_utils
from taskflow.utils import misc
LOG = logging.getLogger(__name__)
@@ -158,13 +158,22 @@ class Linker(object):
" decomposed into an empty graph" % (v, u, u))
for u in u_g.nodes_iter():
for v in v_g.nodes_iter():
depends_on = u.provides & v.requires
# This is using the intersection() method vs the &
# operator since the latter doesn't work with frozen
# sets (when used in combination with ordered sets).
#
# If this is not done the following happens...
#
# TypeError: unsupported operand type(s)
# for &: 'frozenset' and 'OrderedSet'
depends_on = u.provides.intersection(v.requires)
if depends_on:
edge_attrs = {
_EDGE_REASONS: frozenset(depends_on),
}
_add_update_edges(graph,
[u], [v],
attr_dict={
_EDGE_REASONS: depends_on,
})
attr_dict=edge_attrs)
else:
# Connect nodes with no predecessors in v to nodes with no
# successors in the *first* non-empty predecessor of v (thus
@@ -180,8 +189,84 @@ class Linker(object):
priors.append((u, v))
class _TaskCompiler(object):
"""Non-recursive compiler of tasks."""
@staticmethod
def handles(obj):
return isinstance(obj, task.BaseTask)
def compile(self, task, parent=None):
graph = gr.DiGraph(name=task.name)
graph.add_node(task)
node = tr.Node(task)
if parent is not None:
parent.add(node)
return graph, node
class _FlowCompiler(object):
"""Recursive compiler of flows."""
@staticmethod
def handles(obj):
return isinstance(obj, flow.Flow)
def __init__(self, deep_compiler_func, linker):
self._deep_compiler_func = deep_compiler_func
self._linker = linker
def _connect_retry(self, retry, graph):
graph.add_node(retry)
# All nodes that have no predecessors should depend on this retry.
nodes_to = [n for n in graph.no_predecessors_iter() if n is not retry]
if nodes_to:
_add_update_edges(graph, [retry], nodes_to,
attr_dict=_RETRY_EDGE_DATA)
# Add association for each node of graph that has no existing retry.
for n in graph.nodes_iter():
if n is not retry and flow.LINK_RETRY not in graph.node[n]:
graph.node[n][flow.LINK_RETRY] = retry
@staticmethod
def _occurence_detector(to_graph, from_graph):
return sum(1 for node in from_graph.nodes_iter()
if node in to_graph)
def _decompose_flow(self, flow, parent=None):
"""Decomposes a flow into a graph, tree node + decomposed subgraphs."""
graph = gr.DiGraph(name=flow.name)
node = tr.Node(flow)
if parent is not None:
parent.add(node)
if flow.retry is not None:
node.add(tr.Node(flow.retry))
decomposed_members = {}
for item in flow:
subgraph, _subnode = self._deep_compiler_func(item, parent=node)
decomposed_members[item] = subgraph
if subgraph.number_of_nodes():
graph = gr.merge_graphs(
graph, subgraph,
# We can specialize this to be simpler than the default
# algorithm which creates overhead that we don't
# need for our purposes...
overlap_detector=self._occurence_detector)
return graph, node, decomposed_members
def compile(self, flow, parent=None):
graph, node, decomposed_members = self._decompose_flow(flow,
parent=parent)
self._linker.apply_constraints(graph, flow, decomposed_members)
if flow.retry is not None:
self._connect_retry(flow.retry, graph)
return graph, node
class PatternCompiler(object):
"""Compiles a pattern (or task) into a compilation unit.
"""Compiles a flow pattern (or task) into a compilation unit.
Let's dive into the basic idea for how this works:
@@ -189,9 +274,10 @@ class PatternCompiler(object):
this object could be a task, or a flow (one of the supported patterns),
the end-goal is to produce a :py:class:`.Compilation` object as the result
with the needed components. If this is not possible a
:py:class:`~.taskflow.exceptions.CompilationFailure` will be raised (or
in the case where a unknown type is being requested to compile
a ``TypeError`` will be raised).
:py:class:`~.taskflow.exceptions.CompilationFailure` will be raised.
In the case where an **unknown** type is being requested to compile
a ``TypeError`` will be raised and when a duplicate object (one that
has **already** been compiled) is encountered a ``ValueError`` is raised.
The complexity of this comes into play when the 'root' is a flow that
contains itself other nested flows (and so-on); to compile this object and
@@ -281,98 +367,40 @@ class PatternCompiler(object):
self._freeze = freeze
self._lock = threading.Lock()
self._compilation = None
self._matchers = [
_FlowCompiler(self._compile, self._linker),
_TaskCompiler(),
]
def _flatten(self, item, parent):
"""Flattens a item (pattern, task) into a graph + tree node."""
functor = self._find_flattener(item, parent)
self._pre_item_flatten(item)
graph, node = functor(item, parent)
self._post_item_flatten(item, graph, node)
return graph, node
def _find_flattener(self, item, parent):
"""Locates the flattening function to use to flatten the given item."""
if isinstance(item, flow.Flow):
return self._flatten_flow
elif isinstance(item, task.BaseTask):
return self._flatten_task
elif isinstance(item, retry.Retry):
if parent is None:
raise TypeError("Retry controller '%s' (%s) must only be used"
" as a flow constructor parameter and not as a"
" root component" % (item, type(item)))
else:
raise TypeError("Retry controller '%s' (%s) must only be used"
" as a flow constructor parameter and not as a"
" flow added component" % (item, type(item)))
def _compile(self, item, parent=None):
"""Compiles a item (pattern, task) into a graph + tree node."""
for m in self._matchers:
if m.handles(item):
self._pre_item_compile(item)
graph, node = m.compile(item, parent=parent)
self._post_item_compile(item, graph, node)
return graph, node
else:
raise TypeError("Unknown item '%s' (%s) requested to flatten"
raise TypeError("Unknown object '%s' (%s) requested to compile"
% (item, type(item)))
def _connect_retry(self, retry, graph):
graph.add_node(retry)
# All nodes that have no predecessors should depend on this retry.
nodes_to = [n for n in graph.no_predecessors_iter() if n is not retry]
if nodes_to:
_add_update_edges(graph, [retry], nodes_to,
attr_dict=_RETRY_EDGE_DATA)
# Add association for each node of graph that has no existing retry.
for n in graph.nodes_iter():
if n is not retry and flow.LINK_RETRY not in graph.node[n]:
graph.node[n][flow.LINK_RETRY] = retry
def _flatten_task(self, task, parent):
"""Flattens a individual task."""
graph = gr.DiGraph(name=task.name)
graph.add_node(task)
node = tr.Node(task)
if parent is not None:
parent.add(node)
return graph, node
def _decompose_flow(self, flow, parent):
"""Decomposes a flow into a graph, tree node + decomposed subgraphs."""
graph = gr.DiGraph(name=flow.name)
node = tr.Node(flow)
if parent is not None:
parent.add(node)
if flow.retry is not None:
node.add(tr.Node(flow.retry))
decomposed_members = {}
for item in flow:
subgraph, _subnode = self._flatten(item, node)
decomposed_members[item] = subgraph
if subgraph.number_of_nodes():
graph = gr.merge_graphs([graph, subgraph])
return graph, node, decomposed_members
def _flatten_flow(self, flow, parent):
"""Flattens a flow."""
graph, node, decomposed_members = self._decompose_flow(flow, parent)
self._linker.apply_constraints(graph, flow, decomposed_members)
if flow.retry is not None:
self._connect_retry(flow.retry, graph)
return graph, node
def _pre_item_flatten(self, item):
"""Called before a item is flattened; any pre-flattening actions."""
def _pre_item_compile(self, item):
"""Called before a item is compiled; any pre-compilation actions."""
if item in self._history:
raise ValueError("Already flattened item '%s' (%s), recursive"
" flattening is not supported" % (item,
type(item)))
raise ValueError("Already compiled item '%s' (%s), duplicate"
" and/or recursive compiling is not"
" supported" % (item, type(item)))
self._history.add(item)
def _post_item_flatten(self, item, graph, node):
"""Called after a item is flattened; doing post-flattening actions."""
def _post_item_compile(self, item, graph, node):
"""Called after a item is compiled; doing post-compilation actions."""
def _pre_flatten(self):
"""Called before the flattening of the root starts."""
def _pre_compile(self):
"""Called before the compilation of the root starts."""
self._history.clear()
def _post_flatten(self, graph, node):
"""Called after the flattening of the root finishes successfully."""
def _post_compile(self, graph, node):
"""Called after the compilation of the root finishes successfully."""
dup_names = misc.get_duplicate_keys(graph.nodes_iter(),
key=lambda node: node.name)
if dup_names:
@@ -396,13 +424,13 @@ class PatternCompiler(object):
# Indent it so that it's slightly offset from the above line.
LOG.blather(" %s", line)
@lock_utils.locked
@fasteners.locked
def compile(self):
"""Compiles the contained item into a compiled equivalent."""
if self._compilation is None:
self._pre_flatten()
graph, node = self._flatten(self._root, None)
self._post_flatten(graph, node)
self._pre_compile()
graph, node = self._compile(self._root, parent=None)
self._post_compile(graph, node)
if self._freeze:
graph.freeze()
node.freeze()
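A small usage sketch of the compiler (the ``Noop`` task and the flow name are
made up; the module path is assumed from context):

    from taskflow.engines.action_engine import compiler
    from taskflow.patterns import linear_flow
    from taskflow import task

    class Noop(task.Task):
        def execute(self):
            pass

    flow = linear_flow.Flow('demo').add(Noop('a'), Noop('b'))
    compilation = compiler.PatternCompiler(flow).compile()
    print(compilation.execution_graph.number_of_nodes())  # 2
    print(compilation.hierarchy)  # root tree node (the flow), tasks beneath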

View File

@@ -14,22 +14,102 @@
# License for the specific language governing permissions and limitations
# under the License.
import abc
import weakref
from oslo_utils import reflection
import six
from taskflow.engines.action_engine import executor as ex
from taskflow import logging
from taskflow import retry as retry_atom
from taskflow import states as st
from taskflow import task as task_atom
from taskflow.types import failure
LOG = logging.getLogger(__name__)
@six.add_metaclass(abc.ABCMeta)
class Strategy(object):
"""Failure resolution strategy base class."""
strategy = None
def __init__(self, runtime):
self._runtime = runtime
@abc.abstractmethod
def apply(self):
"""Applies some algorithm to resolve some detected failure."""
def __str__(self):
base = reflection.get_class_name(self, fully_qualified=False)
if self.strategy is not None:
strategy_name = self.strategy.name
else:
strategy_name = "???"
return base + "(strategy=%s)" % (strategy_name)
class RevertAndRetry(Strategy):
"""Sets the *associated* subflow for revert to be later retried."""
strategy = retry_atom.RETRY
def __init__(self, runtime, retry):
super(RevertAndRetry, self).__init__(runtime)
self._retry = retry
def apply(self):
tweaked = self._runtime.reset_nodes([self._retry], state=None,
intention=st.RETRY)
tweaked.extend(self._runtime.reset_subgraph(self._retry, state=None,
intention=st.REVERT))
return tweaked
class RevertAll(Strategy):
"""Sets *all* nodes/atoms to the ``REVERT`` intention."""
strategy = retry_atom.REVERT_ALL
def __init__(self, runtime):
super(RevertAll, self).__init__(runtime)
self._analyzer = runtime.analyzer
def apply(self):
return self._runtime.reset_nodes(self._analyzer.iterate_all_nodes(),
state=None, intention=st.REVERT)
class Revert(Strategy):
"""Sets atom and *associated* nodes to the ``REVERT`` intention."""
strategy = retry_atom.REVERT
def __init__(self, runtime, atom):
super(Revert, self).__init__(runtime)
self._atom = atom
def apply(self):
tweaked = self._runtime.reset_nodes([self._atom], state=None,
intention=st.REVERT)
tweaked.extend(self._runtime.reset_subgraph(self._atom, state=None,
intention=st.REVERT))
return tweaked
class Completer(object):
"""Completes atoms using actions to complete them."""
def __init__(self, runtime):
self._runtime = runtime
self._runtime = weakref.proxy(runtime)
self._analyzer = runtime.analyzer
self._retry_action = runtime.retry_action
self._storage = runtime.storage
self._task_action = runtime.task_action
self._undefined_resolver = RevertAll(self._runtime)
def _complete_task(self, task, event, result):
"""Completes the given task, processes task failure."""
@@ -75,6 +155,32 @@ class Completer(object):
return True
return False
def _determine_resolution(self, atom, failure):
"""Determines which resolution strategy to activate/apply."""
retry = self._analyzer.find_atom_retry(atom)
if retry is not None:
# Ask retry controller what to do in case of failure.
strategy = self._retry_action.on_failure(retry, atom, failure)
if strategy == retry_atom.RETRY:
return RevertAndRetry(self._runtime, retry)
elif strategy == retry_atom.REVERT:
# Ask parent retry and figure out what to do...
parent_resolver = self._determine_resolution(retry, failure)
# Ok if the parent resolver says something not REVERT, and
# it isn't just using the undefined resolver, assume the
# parent knows best.
if parent_resolver is not self._undefined_resolver:
if parent_resolver.strategy != retry_atom.REVERT:
return parent_resolver
return Revert(self._runtime, retry)
elif strategy == retry_atom.REVERT_ALL:
return RevertAll(self._runtime)
else:
raise ValueError("Unknown atom failure resolution"
" action/strategy '%s'" % strategy)
else:
return self._undefined_resolver
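Which strategy gets applied is driven by what the flow's retry controller
returns from ``on_failure`` (see the resolution logic above); a brief sketch
with one of the stock retry controllers (the task and flow names are made up):

    from taskflow.patterns import linear_flow
    from taskflow import retry
    from taskflow import task

    class Flaky(task.Task):
        def execute(self):
            raise RuntimeError('boom')

    # Times.on_failure() returns RETRY until its attempts are exhausted, so a
    # failure in this flow is resolved with the RevertAndRetry strategy above
    # (the associated subflow is reverted and then re-attempted).
    flow = linear_flow.Flow('work', retry=retry.Times(attempts=3)).add(Flaky('t'))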
def _process_atom_failure(self, atom, failure):
"""Processes atom failure & applies resolution strategies.
@@ -84,30 +190,15 @@ class Completer(object):
then adjust the needed other atoms intentions, and states, ... so that
the failure can be worked around.
"""
retry = self._analyzer.find_atom_retry(atom)
if retry is not None:
# Ask retry controller what to do in case of failure
action = self._retry_action.on_failure(retry, atom, failure)
if action == retry_atom.RETRY:
# Prepare just the surrounding subflow for revert to be later
# retried...
self._storage.set_atom_intention(retry.name, st.RETRY)
self._runtime.reset_subgraph(retry, state=None,
intention=st.REVERT)
elif action == retry_atom.REVERT:
# Ask parent checkpoint.
self._process_atom_failure(retry, failure)
elif action == retry_atom.REVERT_ALL:
# Prepare all flow for revert
self._revert_all()
else:
raise ValueError("Unknown atom failure resolution"
" action '%s'" % action)
resolver = self._determine_resolution(atom, failure)
LOG.debug("Applying resolver '%s' to resolve failure '%s'"
" of atom '%s'", resolver, failure, atom)
tweaked = resolver.apply()
# Only show the tweaked node list when blather is on, otherwise
# just show the count of nodes tweaked...
if LOG.isEnabledFor(logging.BLATHER):
LOG.blather("Modified/tweaked %s nodes while applying"
" resolver '%s'", tweaked, resolver)
else:
# Prepare all flow for revert
self._revert_all()
def _revert_all(self):
"""Attempts to set all nodes to the REVERT intention."""
self._runtime.reset_nodes(self._analyzer.iterate_all_nodes(),
state=None, intention=st.REVERT)
LOG.debug("Modified/tweaked %s nodes while applying"
" resolver '%s'", len(tweaked), resolver)

View File

@@ -19,7 +19,10 @@ import contextlib
import threading
from concurrent import futures
import fasteners
import networkx as nx
from oslo_utils import excutils
from oslo_utils import strutils
import six
from taskflow.engines.action_engine import compiler
@@ -27,11 +30,14 @@ from taskflow.engines.action_engine import executor
from taskflow.engines.action_engine import runtime
from taskflow.engines import base
from taskflow import exceptions as exc
from taskflow import logging
from taskflow import states
from taskflow import storage
from taskflow.types import failure
from taskflow.utils import lock_utils
from taskflow.utils import misc
LOG = logging.getLogger(__name__)
@contextlib.contextmanager
def _start_stop(executor):
@@ -60,6 +66,13 @@ class ActionEngine(base.Engine):
"""
_compiler_factory = compiler.PatternCompiler
NO_RERAISING_STATES = frozenset([states.SUSPENDED, states.SUCCESS])
"""
States that, if the engine stops while in them, will **not** cause any
captured failures to be reraised. Stopping in states **not** in this list
will cause any captured failure(s) to be reraised.
"""
def __init__(self, flow, flow_detail, backend, options):
super(ActionEngine, self).__init__(flow, flow_detail, backend, options)
self._runtime = None
@@ -69,10 +82,18 @@ class ActionEngine(base.Engine):
self._state_lock = threading.RLock()
self._storage_ensured = False
def _check(self, name, check_compiled, check_storage_ensured):
"""Check (and raise) if the engine has not reached a certain stage."""
if check_compiled and not self._compiled:
raise exc.InvalidState("Can not %s an engine which"
" has not been compiled" % name)
if check_storage_ensured and not self._storage_ensured:
raise exc.InvalidState("Can not %s an engine"
" which has not had its storage"
" populated" % name)
def suspend(self):
if not self._compiled:
raise exc.InvalidState("Can not suspend an engine"
" which has not been compiled")
self._check('suspend', True, False)
self._change_state(states.SUSPENDING)
@property
@@ -88,8 +109,31 @@ class ActionEngine(base.Engine):
else:
return None
@misc.cachedproperty
def storage(self):
"""The storage unit for this engine.
NOTE(harlowja): the atom argument lookup strategy will change for
this storage unit after
:py:func:`~taskflow.engines.base.Engine.compile` has
completed (since **only** after compilation is the actual structure
known). Before :py:func:`~taskflow.engines.base.Engine.compile`
has completed, the atom argument lookup strategy will be
restricted to injected arguments **only** (this will **not** reflect
the actual runtime lookup strategy, which typically will be, but is
not always, different).
"""
def _scope_fetcher(atom_name):
if self._compiled:
return self._runtime.fetch_scopes_for(atom_name)
else:
return None
return storage.Storage(self._flow_detail,
backend=self._backend,
scope_fetcher=_scope_fetcher)
def run(self):
with lock_utils.try_lock(self._lock) as was_locked:
with fasteners.try_lock(self._lock) as was_locked:
if not was_locked:
raise exc.ExecutionFailure("Engine currently locked, please"
" try again later")
@@ -119,6 +163,7 @@ class ActionEngine(base.Engine):
"""
self.compile()
self.prepare()
self.validate()
runner = self._runtime.runner
last_state = None
with _start_stop(self._task_executor):
@@ -148,7 +193,7 @@ class ActionEngine(base.Engine):
ignorable_states = getattr(runner, 'ignorable_states', [])
if last_state and last_state not in ignorable_states:
self._change_state(last_state)
if last_state not in [states.SUSPENDED, states.SUCCESS]:
if last_state not in self.NO_RERAISING_STATES:
failures = self.storage.get_failures()
failure.Failure.reraise_if_any(failures.values())
@@ -168,16 +213,63 @@ class ActionEngine(base.Engine):
def _ensure_storage(self):
"""Ensure all contained atoms exist in the storage unit."""
transient = strutils.bool_from_string(
self._options.get('inject_transient', True))
self.storage.ensure_atoms(
self._compilation.execution_graph.nodes_iter())
for node in self._compilation.execution_graph.nodes_iter():
self.storage.ensure_atom(node)
if node.inject:
self.storage.inject_atom_args(node.name, node.inject)
self.storage.inject_atom_args(node.name,
node.inject,
transient=transient)
@lock_utils.locked
@fasteners.locked
def validate(self):
self._check('validate', True, True)
# At this point we can check to ensure all dependencies are either
# flow/task provided or storage provided; if there are still missing
# dependencies then this flow will fail at runtime (which we can avoid
# by failing at validation time).
execution_graph = self._compilation.execution_graph
if LOG.isEnabledFor(logging.BLATHER):
LOG.blather("Validating scoping and argument visibility for"
" execution graph with %s nodes and %s edges with"
" density %0.3f", execution_graph.number_of_nodes(),
execution_graph.number_of_edges(),
nx.density(execution_graph))
missing = set()
# Attempt to retain a chain of what was missing (so that the final
# raised exception for the flow has the nodes that had missing
# dependencies).
last_cause = None
last_node = None
missing_nodes = 0
fetch_func = self.storage.fetch_unsatisfied_args
for node in execution_graph.nodes_iter():
node_missing = fetch_func(node.name, node.rebind,
optional_args=node.optional)
if node_missing:
cause = exc.MissingDependencies(node,
sorted(node_missing),
cause=last_cause)
last_cause = cause
last_node = node
missing_nodes += 1
missing.update(node_missing)
if missing:
# For when a task is provided (instead of a flow) and that
# task is the only item in the graph and it is missing deps, avoid
# re-wrapping it in yet another exception...
if missing_nodes == 1 and last_node is self._flow:
raise last_cause
else:
raise exc.MissingDependencies(self._flow,
sorted(missing),
cause=last_cause)
@fasteners.locked
def prepare(self):
if not self._compiled:
raise exc.InvalidState("Can not prepare an engine"
" which has not been compiled")
self._check('prepare', True, False)
if not self._storage_ensured:
# Set our own state to resuming -> (ensure atoms exist
# in storage) -> suspended in the storage unit and notify any
@@ -186,14 +278,6 @@ class ActionEngine(base.Engine):
self._ensure_storage()
self._change_state(states.SUSPENDED)
self._storage_ensured = True
# At this point we can check to ensure all dependencies are either
# flow/task provided or storage provided, if there are still missing
# dependencies then this flow will fail at runtime (which we can avoid
# by failing at preparation time).
external_provides = set(self.storage.fetch_all().keys())
missing = self._flow.requires - external_provides
if missing:
raise exc.MissingDependencies(self._flow, sorted(missing))
# Reset everything back to pending (if we were previously reverted).
if self.storage.get_flow_state() == states.REVERTED:
self._runtime.reset_all()
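Taken together, the new ``_check``/``validate`` plumbing means missing arguments surface before ``run()``. A hedged usage sketch (task and flow names are made up, assuming the 1.15.0 behaviour shown in this diff):

from taskflow import engines
from taskflow import exceptions as exc
from taskflow.patterns import linear_flow
from taskflow import task

class NeedsX(task.Task):
    def execute(self, x):
        return x * 2

flow = linear_flow.Flow('demo').add(NeedsX())
eng = engines.load(flow)   # no store given, so 'x' is never satisfied
eng.compile()
eng.prepare()
try:
    eng.validate()         # new step: fails here instead of mid-run
except exc.MissingDependencies as e:
    print("caught before running: %s" % e)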
@@ -203,7 +287,7 @@ class ActionEngine(base.Engine):
def _compiler(self):
return self._compiler_factory(self._flow)
@lock_utils.locked
@fasteners.locked
def compile(self):
if self._compiled:
return
@@ -212,6 +296,7 @@ class ActionEngine(base.Engine):
self.storage,
self.atom_notifier,
self._task_executor)
self._runtime.compile()
self._compiled = True
@@ -239,7 +324,7 @@ class _ExecutorTextMatch(collections.namedtuple('_ExecutorTextMatch',
class ParallelActionEngine(ActionEngine):
"""Engine that runs tasks in parallel manner.
Supported keyword arguments:
Supported option keys:
* ``executor``: a object that implements a :pep:`3148` compatible executor
interface; it will be used for scheduling tasks. The following
@@ -279,7 +364,7 @@ String (case insensitive) Executor used
#
# NOTE(harlowja): the reason we use the library/built-in futures is to
# allow for instances of that to be detected and handled correctly, instead
# of forcing everyone to use our derivatives...
# of forcing everyone to use our derivatives (futurist or other)...
_executor_cls_matchers = [
_ExecutorTypeMatch((futures.ThreadPoolExecutor,),
executor.ParallelThreadTaskExecutor),

View File

@@ -19,7 +19,9 @@ import collections
from multiprocessing import managers
import os
import pickle
import threading
import futurist
from oslo_utils import excutils
from oslo_utils import reflection
from oslo_utils import timeutils
@@ -30,9 +32,7 @@ from six.moves import queue as compat_queue
from taskflow import logging
from taskflow import task as task_atom
from taskflow.types import failure
from taskflow.types import futures
from taskflow.types import notifier
from taskflow.types import timing
from taskflow.utils import async_utils
from taskflow.utils import threading_utils
@@ -175,7 +175,7 @@ class _WaitWorkItem(object):
'kind': _KIND_COMPLETE_ME,
}
if self._channel.put(message):
watch = timing.StopWatch()
watch = timeutils.StopWatch()
watch.start()
self._barrier.wait()
LOG.blather("Waited %s seconds until task '%s' %s emitted"
@@ -240,7 +240,7 @@ class _Dispatcher(object):
raise ValueError("Provided dispatch periodicity must be greater"
" than zero and not '%s'" % dispatch_periodicity)
self._targets = {}
self._dead = threading_utils.Event()
self._dead = threading.Event()
self._dispatch_periodicity = dispatch_periodicity
self._stop_when_empty = False
@@ -304,7 +304,7 @@ class _Dispatcher(object):
" %s to target '%s'", kind, sender, target)
def run(self, queue):
watch = timing.StopWatch(duration=self._dispatch_periodicity)
watch = timeutils.StopWatch(duration=self._dispatch_periodicity)
while (not self._dead.is_set() or
(self._stop_when_empty and self._targets)):
watch.restart()
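The ``timing.StopWatch`` to ``timeutils.StopWatch`` switch relies on oslo.utils providing the same small surface; a quick sketch of the calls used here:

from oslo_utils import timeutils

watch = timeutils.StopWatch(duration=0.25)  # optional expiry duration
watch.start()
# ... do some work ...
print(watch.elapsed())   # seconds elapsed so far
print(watch.expired())   # True once the 0.25s duration has passed
watch.restart()          # reset and start timing again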
@@ -347,18 +347,16 @@ class TaskExecutor(object):
def start(self):
"""Prepare to execute tasks."""
pass
def stop(self):
"""Finalize task executor."""
pass
class SerialTaskExecutor(TaskExecutor):
"""Executes tasks one after another."""
def __init__(self):
self._executor = futures.SynchronousExecutor()
self._executor = futurist.SynchronousExecutor()
def start(self):
self._executor.restart()
@@ -417,11 +415,8 @@ class ParallelTaskExecutor(TaskExecutor):
def start(self):
if self._own_executor:
if self._max_workers is not None:
max_workers = self._max_workers
else:
max_workers = threading_utils.get_optimal_thread_count()
self._executor = self._create_executor(max_workers=max_workers)
self._executor = self._create_executor(
max_workers=self._max_workers)
def stop(self):
if self._own_executor:
@@ -433,7 +428,7 @@ class ParallelThreadTaskExecutor(ParallelTaskExecutor):
"""Executes tasks in parallel using a thread pool executor."""
def _create_executor(self, max_workers=None):
return futures.ThreadPoolExecutor(max_workers=max_workers)
return futurist.ThreadPoolExecutor(max_workers=max_workers)
class ParallelProcessTaskExecutor(ParallelTaskExecutor):
@@ -463,7 +458,7 @@ class ParallelProcessTaskExecutor(ParallelTaskExecutor):
self._queue = None
def _create_executor(self, max_workers=None):
return futures.ProcessPoolExecutor(max_workers=max_workers)
return futurist.ProcessPoolExecutor(max_workers=max_workers)
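Likewise the executors now come from ``futurist`` rather than ``taskflow.types.futures``; they keep the :pep:`3148` interface, for example:

import futurist

def work(x):
    return x + 1

# Runs submissions inline (what the serial engine uses).
sync = futurist.SynchronousExecutor()
print(sync.submit(work, 1).result())   # 2

# Thread-backed pool; max_workers=None lets futurist pick a default.
pool = futurist.ThreadPoolExecutor(max_workers=2)
try:
    print(pool.submit(work, 2).result())   # 3
finally:
    pool.shutdown()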
def start(self):
if threading_utils.is_alive(self._worker):

View File

@@ -51,39 +51,50 @@ class _MachineMemory(object):
self.done = set()
class _MachineBuilder(object):
"""State machine *builder* that the runner uses.
class Runner(object):
"""State machine *builder* + *runner* that powers the engine components.
NOTE(harlowja): the machine states that this builds for are::
NOTE(harlowja): the machine (states and events that will trigger
transitions) that this builds is represented by the following
table::
+--------------+------------------+------------+----------+---------+
Start | Event | End | On Enter | On Exit
+--------------+------------------+------------+----------+---------+
ANALYZING | completed | GAME_OVER | |
ANALYZING | schedule_next | SCHEDULING | |
ANALYZING | wait_finished | WAITING | |
FAILURE[$] | | | |
GAME_OVER | failed | FAILURE | |
GAME_OVER | reverted | REVERTED | |
GAME_OVER | success | SUCCESS | |
GAME_OVER | suspended | SUSPENDED | |
RESUMING | schedule_next | SCHEDULING | |
REVERTED[$] | | | |
SCHEDULING | wait_finished | WAITING | |
SUCCESS[$] | | | |
SUSPENDED[$] | | | |
UNDEFINED[^] | start | RESUMING | |
WAITING | examine_finished | ANALYZING | |
+--------------+------------------+------------+----------+---------+
+--------------+------------------+------------+----------+---------+
Start | Event | End | On Enter | On Exit
+--------------+------------------+------------+----------+---------+
ANALYZING | completed | GAME_OVER | |
ANALYZING | schedule_next | SCHEDULING | |
ANALYZING | wait_finished | WAITING | |
FAILURE[$] | | | |
GAME_OVER | failed | FAILURE | |
GAME_OVER | reverted | REVERTED | |
GAME_OVER | success | SUCCESS | |
GAME_OVER | suspended | SUSPENDED | |
RESUMING | schedule_next | SCHEDULING | |
REVERTED[$] | | | |
SCHEDULING | wait_finished | WAITING | |
SUCCESS[$] | | | |
SUSPENDED[$] | | | |
UNDEFINED[^] | start | RESUMING | |
WAITING | examine_finished | ANALYZING | |
+--------------+------------------+------------+----------+---------+
Between any of these yielded states (minus ``GAME_OVER`` and ``UNDEFINED``)
if the engine has been suspended or the engine has failed (due to a
non-resolvable task failure or scheduling failure) the machine will stop
executing new tasks (currently running tasks will be allowed to complete)
and this machine's run loop will be broken.
NOTE(harlowja): If the runtime's scheduler component is able to schedule
tasks in parallel, this enables parallel running and/or reversion.
"""
# Informational states this action yields while running, not useful to
# have the engine record but useful to provide to end-users when doing
# execution iterations.
ignorable_states = (st.SCHEDULING, st.WAITING, st.RESUMING, st.ANALYZING)
def __init__(self, runtime, waiter):
self._runtime = runtime
self._analyzer = runtime.analyzer
self._completer = runtime.completer
self._scheduler = runtime.scheduler
@@ -91,20 +102,36 @@ class _MachineBuilder(object):
self._waiter = waiter
def runnable(self):
"""Checks if the storage says the flow is still runnable/running."""
return self._storage.get_flow_state() == st.RUNNING
def build(self, timeout=None):
"""Builds a state-machine (that can be/is used during running)."""
memory = _MachineMemory()
if timeout is None:
timeout = _WAITING_TIMEOUT
# Cache some local functions/methods...
do_schedule = self._scheduler.schedule
wait_for_any = self._waiter.wait_for_any
do_complete = self._completer.complete
def iter_next_nodes(target_node=None):
# Yields, filters, and tweaks the next nodes to execute...
maybe_nodes = self._analyzer.get_next_nodes(node=target_node)
for node, late_decider in maybe_nodes:
proceed = late_decider.check_and_affect(self._runtime)
if proceed:
yield node
def resume(old_state, new_state, event):
# This reaction function just updates the state machine's memory
# to include any nodes that need to be executed (from a previous
# attempt, which may be empty if it never ran before) and any nodes
# that are now ready to be run.
memory.next_nodes.update(self._completer.resume())
memory.next_nodes.update(self._analyzer.get_next_nodes())
memory.next_nodes.update(iter_next_nodes())
return _SCHEDULE
def game_over(old_state, new_state, event):
@@ -114,7 +141,7 @@ class _MachineBuilder(object):
# it is *always* called before the final state is entered.
if memory.failures:
return _FAILED
if self._analyzer.get_next_nodes():
if any(1 for node in iter_next_nodes()):
return _SUSPENDED
elif self._analyzer.is_success():
return _SUCCESS
@@ -128,8 +155,7 @@ class _MachineBuilder(object):
# that holds this information to stop or suspend); handles failures
# that occur during this process safely...
if self.runnable() and memory.next_nodes:
not_done, failures = self._scheduler.schedule(
memory.next_nodes)
not_done, failures = do_schedule(memory.next_nodes)
if not_done:
memory.not_done.update(not_done)
if failures:
@@ -142,8 +168,7 @@ class _MachineBuilder(object):
# call sometime in the future, or equivalent that will work in
# py2 and py3.
if memory.not_done:
done, not_done = self._waiter.wait_for_any(memory.not_done,
timeout)
done, not_done = wait_for_any(memory.not_done, timeout)
memory.done.update(done)
memory.not_done = not_done
return _ANALYZE
@@ -160,7 +185,7 @@ class _MachineBuilder(object):
node = fut.atom
try:
event, result = fut.result()
retain = self._completer.complete(node, event, result)
retain = do_complete(node, event, result)
if isinstance(result, failure.Failure):
if retain:
memory.failures.append(result)
@@ -183,7 +208,7 @@ class _MachineBuilder(object):
memory.failures.append(failure.Failure())
else:
try:
more_nodes = self._analyzer.get_next_nodes(node)
more_nodes = set(iter_next_nodes(target_node=node))
except Exception:
memory.failures.append(failure.Failure())
else:
@@ -204,10 +229,10 @@ class _MachineBuilder(object):
LOG.debug("Entering new state '%s' in response to event '%s'",
new_state, event)
# NOTE(harlowja): when run in debugging mode it is quite useful
# NOTE(harlowja): when run in blather mode it is quite useful
# to track the various state transitions as they happen...
watchers = {}
if LOG.isEnabledFor(logging.DEBUG):
if LOG.isEnabledFor(logging.BLATHER):
watchers['on_exit'] = on_exit
watchers['on_enter'] = on_enter
@@ -244,38 +269,9 @@ class _MachineBuilder(object):
m.freeze()
return (m, memory)
class Runner(object):
"""Runner that iterates while executing nodes using the given runtime.
This runner acts as the action engine run loop/state-machine; it resumes
the workflow, schedules all tasks it can for execution using the runtime's
scheduler and analyzer components, and then waits on the returned futures
and then activates the runtime's completion component to finish up those
tasks and so on...
NOTE(harlowja): If the runtime's scheduler component is able to schedule
tasks in parallel, this enables parallel running and/or reversion.
"""
# Informational states this action yields while running, not useful to
# have the engine record but useful to provide to end-users when doing
# execution iterations.
ignorable_states = (st.SCHEDULING, st.WAITING, st.RESUMING, st.ANALYZING)
def __init__(self, runtime, waiter):
self._builder = _MachineBuilder(runtime, waiter)
@property
def builder(self):
return self._builder
def runnable(self):
return self._builder.runnable()
def run_iter(self, timeout=None):
"""Runs the nodes using a built state machine."""
machine, memory = self.builder.build(timeout=timeout)
"""Runs iteratively using a locally built state machine."""
machine, memory = self.build(timeout=timeout)
for (_prior_state, new_state) in machine.run_iter(_START):
# NOTE(harlowja): skip over meta-states.
if new_state not in _META_STATES:

View File

@@ -14,6 +14,8 @@
# License for the specific language governing permissions and limitations
# under the License.
import functools
from taskflow.engines.action_engine.actions import retry as ra
from taskflow.engines.action_engine.actions import task as ta
from taskflow.engines.action_engine import analyzer as an
@@ -21,7 +23,9 @@ from taskflow.engines.action_engine import completer as co
from taskflow.engines.action_engine import runner as ru
from taskflow.engines.action_engine import scheduler as sched
from taskflow.engines.action_engine import scopes as sc
from taskflow import flow as flow_type
from taskflow import states as st
from taskflow import task
from taskflow.utils import misc
@@ -38,7 +42,53 @@ class Runtime(object):
self._task_executor = task_executor
self._storage = storage
self._compilation = compilation
self._scopes = {}
self._atom_cache = {}
def compile(self):
"""Compiles & caches frequently used execution helper objects.
Build out a cache of commonly used items that are associated
with the contained atoms (by name), and are useful to have for
quick lookup (for example, the change state handler function for
each atom, the scope walker object for each atom, the task or retry
specific scheduler and so on).
"""
change_state_handlers = {
'task': functools.partial(self.task_action.change_state,
progress=0.0),
'retry': self.retry_action.change_state,
}
schedulers = {
'retry': self.retry_scheduler,
'task': self.task_scheduler,
}
execution_graph = self._compilation.execution_graph
for atom in self.analyzer.iterate_all_nodes():
metadata = {}
walker = sc.ScopeWalker(self.compilation, atom, names_only=True)
if isinstance(atom, task.BaseTask):
check_transition_handler = st.check_task_transition
change_state_handler = change_state_handlers['task']
scheduler = schedulers['task']
else:
check_transition_handler = st.check_retry_transition
change_state_handler = change_state_handlers['retry']
scheduler = schedulers['retry']
edge_deciders = {}
for previous_atom in execution_graph.predecessors(atom):
# If there is any link decider function that says if this connection
# is able to run (or should not), ensure we retain it and use
# it later as needed.
u_v_data = execution_graph.adj[previous_atom][atom]
u_v_decider = u_v_data.get(flow_type.LINK_DECIDER)
if u_v_decider is not None:
edge_deciders[previous_atom.name] = u_v_decider
metadata['scope_walker'] = walker
metadata['check_transition_handler'] = check_transition_handler
metadata['change_state_handler'] = change_state_handler
metadata['scheduler'] = scheduler
metadata['edge_deciders'] = edge_deciders
self._atom_cache[atom.name] = metadata
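The idea behind the new ``compile()`` step is plain dict lookups at run time instead of repeated type checks; an illustrative stand-alone sketch (not the real Runtime, names are made up):

import functools

def change_task_state(atom, state, progress=0.0):
    print("task %s -> %s (%.0f%%)" % (atom, state, progress * 100))

def change_retry_state(atom, state):
    print("retry %s -> %s" % (atom, state))

def build_cache(atoms):
    handlers = {
        'task': functools.partial(change_task_state, progress=0.0),
        'retry': change_retry_state,
    }
    cache = {}
    for name, kind in atoms:
        # Computed once; later code only does cache[name][...] lookups.
        cache[name] = {'change_state_handler': handlers[kind]}
    return cache

cache = build_cache([('fetch', 'task'), ('again', 'retry')])
cache['fetch']['change_state_handler']('fetch', 'PENDING')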
@property
def compilation(self):
@@ -50,7 +100,7 @@ class Runtime(object):
@misc.cachedproperty
def analyzer(self):
return an.Analyzer(self._compilation, self._storage)
return an.Analyzer(self)
@misc.cachedproperty
def runner(self):
@@ -64,53 +114,101 @@ class Runtime(object):
def scheduler(self):
return sched.Scheduler(self)
@misc.cachedproperty
def task_scheduler(self):
return sched.TaskScheduler(self)
@misc.cachedproperty
def retry_scheduler(self):
return sched.RetryScheduler(self)
@misc.cachedproperty
def retry_action(self):
return ra.RetryAction(self._storage, self._atom_notifier,
self._fetch_scopes_for)
return ra.RetryAction(self._storage,
self._atom_notifier)
@misc.cachedproperty
def task_action(self):
return ta.TaskAction(self._storage,
self._atom_notifier, self._fetch_scopes_for,
self._atom_notifier,
self._task_executor)
def _fetch_scopes_for(self, atom):
"""Fetches a tuple of the visible scopes for the given atom."""
def check_atom_transition(self, atom, current_state, target_state):
"""Checks if the atom can transition to the provided target state."""
# This does not check if the name exists (since this is only used
# internally to the engine, and is not exposed to atoms that will
# not exist and therefore doesn't need to handle that case).
metadata = self._atom_cache[atom.name]
check_transition_handler = metadata['check_transition_handler']
return check_transition_handler(current_state, target_state)
def fetch_edge_deciders(self, atom):
"""Fetches the edge deciders for the given atom."""
# This does not check if the name exists (since this is only used
# internally to the engine, and is not exposed to atoms that will
# not exist and therefore doesn't need to handle that case).
metadata = self._atom_cache[atom.name]
return metadata['edge_deciders']
def fetch_scheduler(self, atom):
"""Fetches the cached specific scheduler for the given atom."""
# This does not check if the name exists (since this is only used
# internally to the engine, and is not exposed to atoms that will
# not exist and therefore doesn't need to handle that case).
metadata = self._atom_cache[atom.name]
return metadata['scheduler']
def fetch_scopes_for(self, atom_name):
"""Fetches a walker of the visible scopes for the given atom."""
try:
return self._scopes[atom]
metadata = self._atom_cache[atom_name]
except KeyError:
walker = sc.ScopeWalker(self.compilation, atom,
names_only=True)
visible_to = tuple(walker)
self._scopes[atom] = visible_to
return visible_to
# This signals to the caller that there is no walker for the given
# atom name (no atom is known to be named with that name); this is
# done since the storage
# layer will call into this layer to fetch a scope for a named
# atom and users can provide random names that do not actually
# exist...
return None
else:
return metadata['scope_walker']
# Various helper methods used by the runtime components; not for public
# consumption...
def reset_nodes(self, nodes, state=st.PENDING, intention=st.EXECUTE):
for node in nodes:
def reset_nodes(self, atoms, state=st.PENDING, intention=st.EXECUTE):
"""Resets all the provided atoms to the given state and intention."""
tweaked = []
for atom in atoms:
metadata = self._atom_cache[atom.name]
if state or intention:
tweaked.append((atom, state, intention))
if state:
if self.task_action.handles(node):
self.task_action.change_state(node, state,
progress=0.0)
elif self.retry_action.handles(node):
self.retry_action.change_state(node, state)
else:
raise TypeError("Unknown how to reset atom '%s' (%s)"
% (node, type(node)))
change_state_handler = metadata['change_state_handler']
change_state_handler(atom, state)
if intention:
self.storage.set_atom_intention(node.name, intention)
self.storage.set_atom_intention(atom.name, intention)
return tweaked
def reset_all(self, state=st.PENDING, intention=st.EXECUTE):
self.reset_nodes(self.analyzer.iterate_all_nodes(),
state=state, intention=intention)
"""Resets all atoms to the given state and intention."""
return self.reset_nodes(self.analyzer.iterate_all_nodes(),
state=state, intention=intention)
def reset_subgraph(self, node, state=st.PENDING, intention=st.EXECUTE):
self.reset_nodes(self.analyzer.iterate_subgraph(node),
state=state, intention=intention)
def reset_subgraph(self, atom, state=st.PENDING, intention=st.EXECUTE):
"""Resets an atom's subgraph to the given state and intention.
The subgraph consists of all of the atom's successors.
"""
return self.reset_nodes(self.analyzer.iterate_subgraph(atom),
state=state, intention=intention)
def retry_subflow(self, retry):
"""Prepares a retry + its subgraph for execution.
This sets the retry's intention to ``EXECUTE`` and resets all of its
subgraph (its successors) to the ``PENDING`` state with an ``EXECUTE``
intention.
"""
self.storage.set_atom_intention(retry.name, st.EXECUTE)
self.reset_subgraph(retry)

View File

@@ -14,23 +14,21 @@
# License for the specific language governing permissions and limitations
# under the License.
import weakref
from taskflow import exceptions as excp
from taskflow import retry as retry_atom
from taskflow import states as st
from taskflow import task as task_atom
from taskflow.types import failure
class _RetryScheduler(object):
class RetryScheduler(object):
"""Schedules retry atoms."""
def __init__(self, runtime):
self._runtime = runtime
self._runtime = weakref.proxy(runtime)
self._retry_action = runtime.retry_action
self._storage = runtime.storage
@staticmethod
def handles(atom):
return isinstance(atom, retry_atom.Retry)
def schedule(self, retry):
"""Schedules the given retry atom for *future* completion.
@@ -51,15 +49,13 @@ class _RetryScheduler(object):
" intention: %s" % intention)
class _TaskScheduler(object):
class TaskScheduler(object):
"""Schedules task atoms."""
def __init__(self, runtime):
self._storage = runtime.storage
self._task_action = runtime.task_action
@staticmethod
def handles(atom):
return isinstance(atom, task_atom.BaseTask)
def schedule(self, task):
"""Schedules the given task atom for *future* completion.
@@ -77,39 +73,28 @@ class _TaskScheduler(object):
class Scheduler(object):
"""Schedules atoms using actions to schedule."""
"""Safely schedules atoms using a runtime ``fetch_scheduler`` routine."""
def __init__(self, runtime):
self._schedulers = [
_RetryScheduler(runtime),
_TaskScheduler(runtime),
]
self._fetch_scheduler = runtime.fetch_scheduler
def _schedule_node(self, node):
"""Schedule a single node for execution."""
for sched in self._schedulers:
if sched.handles(node):
return sched.schedule(node)
else:
raise TypeError("Unknown how to schedule '%s' (%s)"
% (node, type(node)))
def schedule(self, atoms):
"""Schedules the provided atoms for *future* completion.
def schedule(self, nodes):
"""Schedules the provided nodes for *future* completion.
This method should schedule a future for each node provided and return
This method should schedule a future for each atom provided and return
a set of those futures to be waited on (or used for other similar
purposes). It should also return any failure objects that represented
scheduling failures that may have occurred during this scheduling
process.
"""
futures = set()
for node in nodes:
for atom in atoms:
scheduler = self._fetch_scheduler(atom)
try:
futures.add(self._schedule_node(node))
futures.add(scheduler.schedule(atom))
except Exception:
# Immediately stop scheduling future work so that we can
# exit execution early (rather than later) if a single task
# exit execution early (rather than later) if a single atom
# fails to schedule correctly.
return (futures, [failure.Failure()])
return (futures, [])
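The rewritten scheduler is essentially a loop over per-atom schedulers that bails out on the first scheduling failure; an illustrative stand-alone version of that behaviour (names are made up):

def schedule_all(atoms, fetch_scheduler, make_failure):
    scheduled = set()
    for atom in atoms:
        scheduler = fetch_scheduler(atom)
        try:
            scheduled.add(scheduler.schedule(atom))
        except Exception:
            # Stop scheduling more work so the run loop can exit early.
            return scheduled, [make_failure()]
    return scheduled, []

class BrokenScheduler(object):
    def schedule(self, atom):
        raise RuntimeError("boom")

print(schedule_all(['t1'], lambda atom: BrokenScheduler(),
                   make_failure=lambda: 'failure-object'))
# -> (set(), ['failure-object'])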

View File

@@ -21,29 +21,30 @@ from taskflow import logging
LOG = logging.getLogger(__name__)
def _extract_atoms(node, idx=-1):
def _extract_atoms_iter(node, idx=-1):
# Always go left to right, since right to left is the pattern order
# and we want to go backwards and not forwards through that ordering...
if idx == -1:
children_iter = node.reverse_iter()
else:
children_iter = reversed(node[0:idx])
atoms = []
for child in children_iter:
if isinstance(child.item, flow_type.Flow):
atoms.extend(_extract_atoms(child))
for atom in _extract_atoms_iter(child):
yield atom
elif isinstance(child.item, atom_type.Atom):
atoms.append(child.item)
yield child.item
else:
raise TypeError(
"Unknown extraction item '%s' (%s)" % (child.item,
type(child.item)))
return atoms
class ScopeWalker(object):
"""Walks through the scopes of an atom using an engine's compilation.
NOTE(harlowja): for internal usage only.
This will walk the visible scopes that are accessible for the given
atom, which can be used by some external entity in some meaningful way,
for example to find dependent values...
@@ -54,60 +55,80 @@ class ScopeWalker(object):
if self._node is None:
raise ValueError("Unable to find atom '%s' in compilation"
" hierarchy" % atom)
self._level_cache = {}
self._atom = atom
self._graph = compilation.execution_graph
self._names_only = names_only
self._predecessors = None
#: Function that extracts the *associated* atoms of a given tree node.
_extract_atoms_iter = staticmethod(_extract_atoms_iter)
def __iter__(self):
"""Iterates over the visible scopes.
How this works is the following:
We find all the possible predecessors of the given atom; this is useful
since we know they occurred before this atom but it doesn't tell us
the corresponding scope *level* that each predecessor was created in,
so we need to find this information.
We first grab all the predecessors of the given atom (let's call it
``Y``) by using the :py:class:`~.compiler.Compilation` execution
graph (and doing a reverse breadth-first expansion to gather its
predecessors), this is useful since we know they *always* will
exist (and execute) before this atom but it does not tell us the
corresponding scope *level* (flow, nested flow...) that each
predecessor was created in, so we need to find this information.
For that information we consult the location of the atom ``Y`` in the
node hierarchy. We lookup in a reverse order the parent ``X`` of ``Y``
and traverse backwards from the index in the parent where ``Y``
occurred, all children in ``X`` that we encounter in this backwards
search (if a child is a flow itself, its atom contents will be
expanded) will be assumed to be at the same scope. This is then a
*potential* single scope, to make an *actual* scope we remove the items
from the *potential* scope that are not predecessors of ``Y`` to form
the *actual* scope.
:py:class:`~.compiler.Compilation` hierarchy/tree. We look up in a
reverse order the parent ``X`` of ``Y`` and traverse backwards from
the index in the parent where ``Y`` exists to all siblings (and
children of those siblings) in ``X`` that we encounter in this
backwards search (if a sibling is a flow itself, its atom(s)
will be recursively expanded and included). This collection will
then be assumed to be at the same scope. This is what is called
a *potential* single scope, to make an *actual* scope we remove the
items from the *potential* scope that are **not** predecessors
of ``Y`` to form the *actual* scope which we then yield back.
Then for additional scopes we continue up the tree, by finding the
parent of ``X`` (let's call it ``Z``) and perform the same operation,
going through the children in a reverse manner from the index in
parent ``Z`` where ``X`` was located. This forms another *potential*
scope which we provide back as an *actual* scope after reducing the
potential set by the predecessors of ``Y``. We then repeat this process
until we no longer have any parent nodes (aka have reached the top of
the tree) or we run out of predecessors.
potential set to only include predecessors previously gathered. We
then repeat this process until we no longer have any parent
nodes (aka we have reached the top of the tree) or we run out of
predecessors.
"""
predecessors = set(self._graph.bfs_predecessors_iter(self._atom))
if self._predecessors is None:
pred_iter = self._graph.bfs_predecessors_iter(self._atom)
self._predecessors = set(pred_iter)
predecessors = self._predecessors.copy()
last = self._node
for parent in self._node.path_iter(include_self=False):
for lvl, parent in enumerate(self._node.path_iter(include_self=False)):
if not predecessors:
break
last_idx = parent.index(last.item)
visible = []
for a in _extract_atoms(parent, idx=last_idx):
if a in predecessors:
predecessors.remove(a)
if not self._names_only:
visible.append(a)
else:
visible.append(a.name)
if LOG.isEnabledFor(logging.BLATHER):
if not self._names_only:
try:
visible, removals = self._level_cache[lvl]
predecessors = predecessors - removals
except KeyError:
visible = []
removals = set()
for atom in self._extract_atoms_iter(parent, idx=last_idx):
if atom in predecessors:
predecessors.remove(atom)
removals.add(atom)
visible.append(atom)
if not predecessors:
break
self._level_cache[lvl] = (visible, removals)
if LOG.isEnabledFor(logging.BLATHER):
visible_names = [a.name for a in visible]
else:
visible_names = visible
LOG.blather("Scope visible to '%s' (limited by parent '%s'"
" index < %s) is: %s", self._atom,
parent.item.name, last_idx, visible_names)
yield visible
LOG.blather("Scope visible to '%s' (limited by parent '%s'"
" index < %s) is: %s", self._atom,
parent.item.name, last_idx, visible_names)
if self._names_only:
yield [a.name for a in visible]
else:
yield visible
last = parent
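Callers still just iterate the walker, one yielded item per scope level; a hedged sketch (assuming ``PatternCompiler(flow).compile()`` produces a compilation the same way the engine does internally):

from taskflow.engines.action_engine import compiler
from taskflow.engines.action_engine import scopes
from taskflow.patterns import linear_flow
from taskflow import task

class A(task.Task):
    def execute(self):
        pass

class B(task.Task):
    def execute(self):
        pass

a, b = A('a'), B('b')
flow = linear_flow.Flow('lf').add(a, b)
compilation = compiler.PatternCompiler(flow).compile()
walker = scopes.ScopeWalker(compilation, b, names_only=True)
for level, visible_names in enumerate(walker):
    print(level, visible_names)   # level 0 should include 'a'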

View File

@@ -17,12 +17,10 @@
import abc
from debtcollector import moves
import six
from taskflow import storage
from taskflow.types import notifier
from taskflow.utils import deprecation
from taskflow.utils import misc
@six.add_metaclass(abc.ABCMeta)
@@ -56,10 +54,18 @@ class Engine(object):
return self._notifier
@property
@deprecation.moved_property('atom_notifier', version="0.6",
removal_version="?")
@moves.moved_property('atom_notifier', version="0.6",
removal_version="2.0")
def task_notifier(self):
"""The task notifier."""
"""The task notifier.
.. deprecated:: 0.6
The property is **deprecated** and is present for
backward compatibility **only**. In order to access this
property going forward the :py:attr:`.atom_notifier` should
be used instead.
"""
return self._atom_notifier
@property
@@ -72,10 +78,9 @@ class Engine(object):
"""The options that were passed to this engine on construction."""
return self._options
@misc.cachedproperty
@abc.abstractproperty
def storage(self):
"""The storage unit for this flow."""
return storage.Storage(self._flow_detail, backend=self._backend)
"""The storage unit for this engine."""
@abc.abstractmethod
def compile(self):
@@ -92,9 +97,18 @@ class Engine(object):
"""Performs any pre-run, but post-compilation actions.
NOTE(harlowja): During preparation it is currently assumed that the
underlying storage will be initialized, all final dependencies
will be verified, the tasks will be reset and the engine will enter
the PENDING state.
underlying storage will be initialized, the atoms will be reset and
the engine will enter the PENDING state.
"""
@abc.abstractmethod
def validate(self):
"""Performs any pre-run, post-prepare validation actions.
NOTE(harlowja): During validation all final dependencies
will be verified and ensured. This will by default check that all
atoms have satisfiable requirements (satisfied by some other
provider).
"""
@abc.abstractmethod
@@ -105,15 +119,13 @@ class Engine(object):
def suspend(self):
"""Attempts to suspend the engine.
If the engine is currently running tasks then this will attempt to
suspend future work from being started (currently active tasks can
If the engine is currently running atoms then this will attempt to
suspend future work from being started (currently active atoms can
not currently be preempted) and move the engine into a suspend state
which can then later be resumed from.
"""
# TODO(harlowja): remove in 0.7 or later...
EngineBase = deprecation.moved_inheritable_class(Engine,
'EngineBase', __name__,
version="0.6",
removal_version="?")
EngineBase = moves.moved_class(Engine, 'EngineBase', __name__,
version="0.6", removal_version="2.0")
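A minimal sketch of how these ``debtcollector`` helpers are used (class and property names here are illustrative):

from debtcollector import moves

class Engine(object):
    @property
    def atom_notifier(self):
        return "the-new-property"

    @property
    @moves.moved_property('atom_notifier', version="0.6",
                          removal_version="2.0")
    def task_notifier(self):
        # Old name kept alive; using it emits a DeprecationWarning.
        return self.atom_notifier

# Old class name kept importable, warning when it is used.
EngineBase = moves.moved_class(Engine, 'EngineBase', __name__,
                               version="0.6", removal_version="2.0")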

View File

@@ -18,6 +18,7 @@ import contextlib
import itertools
import traceback
from debtcollector import renames
from oslo_utils import importutils
from oslo_utils import reflection
import six
@@ -26,7 +27,6 @@ import stevedore.driver
from taskflow import exceptions as exc
from taskflow import logging
from taskflow.persistence import backends as p_backends
from taskflow.utils import deprecation
from taskflow.utils import misc
from taskflow.utils import persistence_utils as p_utils
@@ -90,14 +90,14 @@ def _extract_engine(**kwargs):
lambda frame: frame[0] in _FILE_NAMES,
reversed(traceback.extract_stack(limit=3)))
stacklevel = sum(1 for _frame in finder)
decorator = deprecation.renamed_kwarg('engine_conf', 'engine',
version="0.6",
removal_version="?",
# Three is added on since the
# decorator adds three of its own
# stack levels that we need to
# hop out of...
stacklevel=stacklevel + 3)
decorator = renames.renamed_kwarg('engine_conf', 'engine',
version="0.6",
removal_version="2.0",
# Three is added on since the
# decorator adds three of its own
# stack levels that we need to
# hop out of...
stacklevel=stacklevel + 3)
return decorator(_compat_extract)(**kwargs)
else:
return _compat_extract(**kwargs)
@@ -134,7 +134,7 @@ def load(flow, store=None, flow_detail=None, book=None,
This function creates and prepares an engine to run the provided flow. All
that is left after this returns is to run the engine with the
engine's ``run()`` method.
engine's :py:meth:`~taskflow.engines.base.Engine.run` method.
Which engine to load is specified via the ``engine`` parameter. It
can be a string that names the engine type to use, or a string that
@@ -143,7 +143,15 @@ def load(flow, store=None, flow_detail=None, book=None,
Which storage backend to use is defined by the backend parameter. It
can be a backend itself, or a dictionary that is passed to
``taskflow.persistence.backends.fetch()`` to obtain a viable backend.
:py:func:`~taskflow.persistence.backends.fetch` to obtain a
viable backend.
.. deprecated:: 0.6
The ``engine_conf`` argument is **deprecated** and is present
for backward compatibility **only**. In order to provide this
argument going forward the ``engine`` string (or URI) argument
should be used instead.
:param flow: flow to load
:param store: dict -- data to put to storage to satisfy flow requirements
@@ -198,7 +206,15 @@ def run(flow, store=None, flow_detail=None, book=None,
The arguments are interpreted as for :func:`load() <load>`.
:returns: dictionary of all named results (see ``storage.fetch_all()``)
.. deprecated:: 0.6
The ``engine_conf`` argument is **deprecated** and is present
for backward compatibility **only**. In order to provide this
argument going forward the ``engine`` string (or URI) argument
should be used instead.
:returns: dictionary of all named
results (see :py:meth:`~.taskflow.storage.Storage.fetch_all`)
"""
engine = load(flow, store=store, flow_detail=flow_detail, book=book,
engine_conf=engine_conf, backend=backend,
@@ -262,6 +278,13 @@ def load_from_factory(flow_factory, factory_args=None, factory_kwargs=None,
Further arguments are interpreted as for :func:`load() <load>`.
.. deprecated:: 0.6
The ``engine_conf`` argument is **deprecated** and is present
for backward compatibility **only**. In order to provide this
argument going forward the ``engine`` string (or URI) argument
should be used instead.
:returns: engine
"""
@@ -322,6 +345,13 @@ def load_from_detail(flow_detail, store=None, engine_conf=None, backend=None,
Further arguments are interpreted as for :func:`load() <load>`.
.. deprecated:: 0.6
The ``engine_conf`` argument is **deprecated** and is present
for backward compatibility **only**. In order to provide this
argument going forward the ``engine`` string (or URI) argument
should be used instead.
:returns: engine
"""
flow = flow_from_detail(flow_detail)

View File

@@ -23,6 +23,36 @@ from taskflow.utils import kombu_utils as ku
LOG = logging.getLogger(__name__)
class Handler(object):
"""Component(s) that will be called on reception of messages."""
__slots__ = ['_process_message', '_validator']
def __init__(self, process_message, validator=None):
self._process_message = process_message
self._validator = validator
@property
def process_message(self):
"""Main callback that is called to process a received message.
This is only called after the format has been validated (using
the ``validator`` callback if applicable) and only after the message
has been acknowledged.
"""
return self._process_message
@property
def validator(self):
"""Optional callback that will be activated before processing.
This callback, if present, is expected to validate the message and
raise :py:class:`~taskflow.exceptions.InvalidFormat` if the message
is not valid.
"""
return self._validator
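A short sketch of how the new ``Handler`` wrapper is built and used (the callbacks here are illustrative; a real validator would raise :py:class:`~taskflow.exceptions.InvalidFormat`):

from taskflow.engines.worker_based import dispatcher

def process_response(data, message):
    print("processing %s" % (data,))

def validate_response(data):
    if 'state' not in data:
        raise ValueError("missing state")   # illustrative only

handler = dispatcher.Handler(process_response,
                             validator=validate_response)
if handler.validator is not None:
    handler.validator({'state': 'SUCCESS'})
handler.process_message({'state': 'SUCCESS'}, None)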
class TypeDispatcher(object):
"""Receives messages and dispatches to type specific handlers."""
@@ -99,10 +129,9 @@ class TypeDispatcher(object):
LOG.warning("Unexpected message type: '%s' in message"
" '%s'", message_type, ku.DelayedPretty(message))
else:
if isinstance(handler, (tuple, list)):
handler, validator = handler
if handler.validator is not None:
try:
validator(data)
handler.validator(data)
except excp.InvalidFormat as e:
message.reject_log_error(
logger=LOG, errors=(kombu_exc.MessageStateError,))
@@ -115,7 +144,7 @@ class TypeDispatcher(object):
if message.acknowledged:
LOG.debug("Message '%s' was acknowledged.",
ku.DelayedPretty(message))
handler(data, message)
handler.process_message(data, message)
else:
message.reject_log_error(logger=LOG,
errors=(kombu_exc.MessageStateError,))

View File

@@ -36,8 +36,9 @@ class WorkerBasedActionEngine(engine.ActionEngine):
of the (PENDING, WAITING) request states. When
expired, the task the request was made
for will have its result become a
`RequestTimeout` exception instead of its
normally returned value (or raised exception).
:py:class:`~taskflow.exceptions.RequestTimeout`
exception instead of its normally returned
value (or raised exception).
:param transport_options: transport specific options (see:
http://kombu.readthedocs.org/ for what these
options imply and are expected to be)

View File

@@ -16,16 +16,17 @@
import functools
from futurist import periodics
from oslo_utils import timeutils
from taskflow.engines.action_engine import executor
from taskflow.engines.worker_based import dispatcher
from taskflow.engines.worker_based import protocol as pr
from taskflow.engines.worker_based import proxy
from taskflow.engines.worker_based import types as wt
from taskflow import exceptions as exc
from taskflow import logging
from taskflow import task as task_atom
from taskflow.types import periodic
from taskflow.utils import kombu_utils as ku
from taskflow.utils import misc
from taskflow.utils import threading_utils as tu
@@ -44,10 +45,8 @@ class WorkerTaskExecutor(executor.TaskExecutor):
self._requests_cache = wt.RequestsCache()
self._transition_timeout = transition_timeout
type_handlers = {
pr.RESPONSE: [
self._process_response,
pr.Response.validate,
],
pr.RESPONSE: dispatcher.Handler(self._process_response,
validator=pr.Response.validate),
}
self._proxy = proxy.Proxy(uuid, exchange,
type_handlers=type_handlers,
@@ -68,7 +67,7 @@ class WorkerTaskExecutor(executor.TaskExecutor):
self._helpers.bind(lambda: tu.daemon_thread(self._proxy.start),
after_start=lambda t: self._proxy.wait(),
before_join=lambda t: self._proxy.stop())
p_worker = periodic.PeriodicWorker.create([self._finder])
p_worker = periodics.PeriodicWorker.create([self._finder])
if p_worker:
self._helpers.bind(lambda: tu.daemon_thread(p_worker.start),
before_join=lambda t: p_worker.stop(),

View File

@@ -15,11 +15,11 @@
# under the License.
import abc
import collections
import threading
from concurrent import futures
import jsonschema
from jsonschema import exceptions as schema_exc
import fasteners
import futurist
from oslo_utils import reflection
from oslo_utils import timeutils
import six
@@ -28,8 +28,7 @@ from taskflow.engines.action_engine import executor
from taskflow import exceptions as excp
from taskflow import logging
from taskflow.types import failure as ft
from taskflow.types import timing as tt
from taskflow.utils import lock_utils
from taskflow.utils import schema_utils as su
# NOTE(skudriashev): This is protocol states and events, which are not
# related to task states.
@@ -98,12 +97,6 @@ NOTIFY = 'NOTIFY'
REQUEST = 'REQUEST'
RESPONSE = 'RESPONSE'
# Special jsonschema validation types/adjustments.
_SCHEMA_TYPES = {
# See: https://github.com/Julian/jsonschema/issues/148
'array': (list, tuple),
}
LOG = logging.getLogger(__name__)
@@ -112,7 +105,8 @@ class Message(object):
"""Base class for all message types."""
def __str__(self):
return "<%s> %s" % (self.TYPE, self.to_dict())
cls_name = reflection.get_class_name(self, fully_qualified=False)
return "<%s> %s" % (cls_name, self.to_dict())
@abc.abstractmethod
def to_dict(self):
@@ -166,16 +160,25 @@ class Notify(Message):
else:
schema = cls.SENDER_SCHEMA
try:
jsonschema.validate(data, schema, types=_SCHEMA_TYPES)
except schema_exc.ValidationError as e:
su.schema_validate(data, schema)
except su.ValidationError as e:
cls_name = reflection.get_class_name(cls, fully_qualified=False)
if response:
raise excp.InvalidFormat("%s message response data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
excp.raise_with_cause(excp.InvalidFormat,
"%s message response data not of the"
" expected format: %s" % (cls_name,
e.message),
cause=e)
else:
raise excp.InvalidFormat("%s message sender data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
excp.raise_with_cause(excp.InvalidFormat,
"%s message sender data not of the"
" expected format: %s" % (cls_name,
e.message),
cause=e)
_WorkUnit = collections.namedtuple('_WorkUnit', ['task_cls', 'task_name',
'action', 'arguments'])
class Request(Message):
@@ -235,11 +238,11 @@ class Request(Message):
self._event = ACTION_TO_EVENT[action]
self._arguments = arguments
self._kwargs = kwargs
self._watch = tt.StopWatch(duration=timeout).start()
self._watch = timeutils.StopWatch(duration=timeout).start()
self._state = WAITING
self._lock = threading.Lock()
self._created_on = timeutils.utcnow()
self._result = futures.Future()
self._result = futurist.Future()
self._result.atom = task
self._notifier = task.notifier
@@ -332,7 +335,7 @@ class Request(Message):
new_state, exc_info=True)
return moved
@lock_utils.locked
@fasteners.locked
def transition(self, new_state):
"""Transitions the request to a new state.
@@ -358,11 +361,60 @@ class Request(Message):
@classmethod
def validate(cls, data):
try:
jsonschema.validate(data, cls.SCHEMA, types=_SCHEMA_TYPES)
except schema_exc.ValidationError as e:
raise excp.InvalidFormat("%s message response data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
su.schema_validate(data, cls.SCHEMA)
except su.ValidationError as e:
cls_name = reflection.get_class_name(cls, fully_qualified=False)
excp.raise_with_cause(excp.InvalidFormat,
"%s message response data not of the"
" expected format: %s" % (cls_name,
e.message),
cause=e)
else:
# Validate all failure dictionaries that *may* be present...
failures = []
if 'failures' in data:
failures.extend(six.itervalues(data['failures']))
result = data.get('result')
if result is not None:
result_data_type, result_data = result
if result_data_type == 'failure':
failures.append(result_data)
for fail_data in failures:
ft.Failure.validate(fail_data)
@staticmethod
def from_dict(data, task_uuid=None):
"""Parses **validated** data into a work unit.
All :py:class:`~taskflow.types.failure.Failure` objects that have been
converted to dict(s) on the remote side will now be converted back
to :py:class:`~taskflow.types.failure.Failure` objects.
"""
task_cls = data['task_cls']
task_name = data['task_name']
action = data['action']
arguments = data.get('arguments', {})
result = data.get('result')
failures = data.get('failures')
# These arguments will eventually be given to the task executor
# so they need to be in a format it will accept (and using keyword
# argument names that it accepts)...
arguments = {
'arguments': arguments,
}
if task_uuid is not None:
arguments['task_uuid'] = task_uuid
if result is not None:
result_data_type, result_data = result
if result_data_type == 'failure':
arguments['result'] = ft.Failure.from_dict(result_data)
else:
arguments['result'] = result_data
if failures is not None:
arguments['failures'] = {}
for task, fail_data in six.iteritems(failures):
arguments['failures'][task] = ft.Failure.from_dict(fail_data)
return _WorkUnit(task_cls, task_name, action, arguments)
class Response(Message):
@@ -455,8 +507,15 @@ class Response(Message):
@classmethod
def validate(cls, data):
try:
jsonschema.validate(data, cls.SCHEMA, types=_SCHEMA_TYPES)
except schema_exc.ValidationError as e:
raise excp.InvalidFormat("%s message response data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
su.schema_validate(data, cls.SCHEMA)
except su.ValidationError as e:
cls_name = reflection.get_class_name(cls, fully_qualified=False)
excp.raise_with_cause(excp.InvalidFormat,
"%s message response data not of the"
" expected format: %s" % (cls_name,
e.message),
cause=e)
else:
state = data['state']
if state == FAILURE and 'result' in data:
ft.Failure.validate(data['result'])
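The validation rework leans on two helpers visible in this diff, ``schema_utils.schema_validate`` and ``exceptions.raise_with_cause``; a hedged sketch of the chaining pattern with a trivial schema:

from taskflow import exceptions as excp
from taskflow.utils import schema_utils as su

SCHEMA = {
    'type': 'object',
    'properties': {'state': {'type': 'string'}},
    'required': ['state'],
}

def validate(data):
    try:
        su.schema_validate(data, SCHEMA)
    except su.ValidationError as e:
        # Re-raise as a taskflow error, keeping the original exception
        # attached as the cause (py3 chaining, emulated on py2).
        excp.raise_with_cause(excp.InvalidFormat,
                              "data not of the expected format: %s"
                              % e.message, cause=e)

validate({'state': 'SUCCESS'})   # ok
validate({})                     # raises InvalidFormat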

View File

@@ -15,6 +15,7 @@
# under the License.
import collections
import threading
import kombu
from kombu import exceptions as kombu_exceptions
@@ -22,7 +23,6 @@ import six
from taskflow.engines.worker_based import dispatcher
from taskflow import logging
from taskflow.utils import threading_utils
LOG = logging.getLogger(__name__)
@@ -75,7 +75,7 @@ class Proxy(object):
self._topic = topic
self._exchange_name = exchange
self._on_wait = on_wait
self._running = threading_utils.Event()
self._running = threading.Event()
self._dispatcher = dispatcher.TypeDispatcher(
# NOTE(skudriashev): Process all incoming messages only if proxy is
# running, otherwise requeue them.

View File

@@ -17,14 +17,14 @@
import functools
from oslo_utils import reflection
import six
from oslo_utils import timeutils
from taskflow.engines.worker_based import dispatcher
from taskflow.engines.worker_based import protocol as pr
from taskflow.engines.worker_based import proxy
from taskflow import logging
from taskflow.types import failure as ft
from taskflow.types import notifier as nt
from taskflow.types import timing as tt
from taskflow.utils import kombu_utils as ku
from taskflow.utils import misc
@@ -38,14 +38,13 @@ class Server(object):
url=None, transport=None, transport_options=None,
retry_options=None):
type_handlers = {
pr.NOTIFY: [
pr.NOTIFY: dispatcher.Handler(
self._delayed_process(self._process_notify),
functools.partial(pr.Notify.validate, response=False),
],
pr.REQUEST: [
validator=functools.partial(pr.Notify.validate,
response=False)),
pr.REQUEST: dispatcher.Handler(
self._delayed_process(self._process_request),
pr.Request.validate,
],
validator=pr.Request.validate),
}
self._executor = executor
self._proxy = proxy.Proxy(topic, exchange,
@@ -77,7 +76,7 @@ class Server(object):
def _on_receive(content, message):
LOG.debug("Submitting message '%s' for execution in the"
" future to '%s'", ku.DelayedPretty(message), func_name)
watch = tt.StopWatch()
watch = timeutils.StopWatch()
watch.start()
try:
self._executor.submit(_on_run, watch, content, message)
@@ -94,32 +93,6 @@ class Server(object):
def connection_details(self):
return self._proxy.connection_details
@staticmethod
def _parse_request(task_cls, task_name, action, arguments, result=None,
failures=None, **kwargs):
"""Parse request before it can be further processed.
All `failure.Failure` objects that have been converted to dict on the
remote side will now be converted back to `failure.Failure` objects.
"""
# These arguments will eventually be given to the task executor
# so they need to be in a format it will accept (and using keyword
# argument names that it accepts)...
arguments = {
'arguments': arguments,
}
if result is not None:
data_type, data = result
if data_type == 'failure':
arguments['result'] = ft.Failure.from_dict(data)
else:
arguments['result'] = data
if failures is not None:
arguments['failures'] = {}
for key, data in six.iteritems(failures):
arguments['failures'][key] = ft.Failure.from_dict(data)
return (task_cls, task_name, action, arguments)
@staticmethod
def _parse_message(message):
"""Extracts required attributes out of the message's properties.
@@ -199,11 +172,9 @@ class Server(object):
reply_callback = functools.partial(self._reply, True, reply_to,
task_uuid)
# parse request to get task name, action and action arguments
# Parse the request to get the activity/work to perform.
try:
bundle = self._parse_request(**request)
task_cls, task_name, action, arguments = bundle
arguments['task_uuid'] = task_uuid
work = pr.Request.from_dict(request, task_uuid=task_uuid)
except ValueError:
with misc.capture_failure() as failure:
LOG.warn("Failed to parse request contents from message '%s'",
@@ -211,34 +182,35 @@ class Server(object):
reply_callback(result=failure.to_dict())
return
# get task endpoint
# Now fetch the task endpoint (and action handler on it).
try:
endpoint = self._endpoints[task_cls]
endpoint = self._endpoints[work.task_cls]
except KeyError:
with misc.capture_failure() as failure:
LOG.warn("The '%s' task endpoint does not exist, unable"
" to continue processing request message '%s'",
task_cls, ku.DelayedPretty(message), exc_info=True)
work.task_cls, ku.DelayedPretty(message),
exc_info=True)
reply_callback(result=failure.to_dict())
return
else:
try:
handler = getattr(endpoint, action)
handler = getattr(endpoint, work.action)
except AttributeError:
with misc.capture_failure() as failure:
LOG.warn("The '%s' handler does not exist on task endpoint"
" '%s', unable to continue processing request"
" message '%s'", action, endpoint,
" message '%s'", work.action, endpoint,
ku.DelayedPretty(message), exc_info=True)
reply_callback(result=failure.to_dict())
return
else:
try:
task = endpoint.generate(name=task_name)
task = endpoint.generate(name=work.task_name)
except Exception:
with misc.capture_failure() as failure:
LOG.warn("The '%s' task '%s' generation for request"
" message '%s' failed", endpoint, action,
" message '%s' failed", endpoint, work.action,
ku.DelayedPretty(message), exc_info=True)
reply_callback(result=failure.to_dict())
return
@@ -246,7 +218,7 @@ class Server(object):
if not reply_callback(state=pr.RUNNING):
return
# associate *any* events this task emits with a proxy that will
# Associate *any* events this task emits with a proxy that will
# emit them back to the engine... for handling at the engine side
# of things...
if task.notifier.can_be_registered(nt.Notifier.ANY):
@@ -254,22 +226,23 @@ class Server(object):
functools.partial(self._on_event,
reply_to, task_uuid))
elif isinstance(task.notifier, nt.RestrictedNotifier):
# only proxy the allowable events then...
# Only proxy the allowable events then...
for event_type in task.notifier.events_iter():
task.notifier.register(event_type,
functools.partial(self._on_event,
reply_to, task_uuid))
# perform the task action
# Perform the task action.
try:
result = handler(task, **arguments)
result = handler(task, **work.arguments)
except Exception:
with misc.capture_failure() as failure:
LOG.warn("The '%s' endpoint '%s' execution for request"
" message '%s' failed", endpoint, action,
" message '%s' failed", endpoint, work.action,
ku.DelayedPretty(message), exc_info=True)
reply_callback(result=failure.to_dict())
else:
# And be done with it!
if isinstance(result, ft.Failure):
reply_callback(result=result.to_dict())
else:

View File

@@ -20,15 +20,16 @@ import itertools
import random
import threading
from futurist import periodics
from oslo_utils import reflection
from oslo_utils import timeutils
import six
from taskflow.engines.worker_based import dispatcher
from taskflow.engines.worker_based import protocol as pr
from taskflow import logging
from taskflow.types import cache as base
from taskflow.types import notifier
from taskflow.types import periodic
from taskflow.types import timing as tt
from taskflow.utils import kombu_utils as ku
LOG = logging.getLogger(__name__)
@@ -122,7 +123,7 @@ class WorkerFinder(object):
"""
if workers <= 0:
raise ValueError("Worker amount must be greater than zero")
watch = tt.StopWatch(duration=timeout)
watch = timeutils.StopWatch(duration=timeout)
watch.start()
with self._cond:
while self._total_workers() < workers:
@@ -165,10 +166,10 @@ class ProxyWorkerFinder(WorkerFinder):
self._workers = {}
self._uuid = uuid
self._proxy.dispatcher.type_handlers.update({
pr.NOTIFY: [
pr.NOTIFY: dispatcher.Handler(
self._process_response,
functools.partial(pr.Notify.validate, response=True),
],
validator=functools.partial(pr.Notify.validate,
response=True)),
})
self._counter = itertools.count()
@@ -179,7 +180,7 @@ class ProxyWorkerFinder(WorkerFinder):
else:
return TopicWorker(topic, tasks)
@periodic.periodic(pr.NOTIFY_PERIOD)
@periodics.periodic(pr.NOTIFY_PERIOD, run_immediately=True)
def beat(self):
"""Cyclically called to publish notify message to each topic."""
self._proxy.publish(pr.Notify(), self._topics, reply_to=self._uuid)

View File

@@ -20,47 +20,17 @@ import socket
import string
import sys
import futurist
from oslo_utils import reflection
from taskflow.engines.worker_based import endpoint
from taskflow.engines.worker_based import server
from taskflow import logging
from taskflow import task as t_task
from taskflow.types import futures
from taskflow.utils import misc
from taskflow.utils import threading_utils as tu
from taskflow import version
BANNER_TEMPLATE = string.Template("""
TaskFlow v${version} WBE worker.
Connection details:
Driver = $transport_driver
Exchange = $exchange
Topic = $topic
Transport = $transport_type
Uri = $connection_uri
Powered by:
Executor = $executor_type
Thread count = $executor_thread_count
Supported endpoints:$endpoints
System details:
Hostname = $hostname
Pid = $pid
Platform = $platform
Python = $python
Thread id = $thread_id
""".strip())
BANNER_TEMPLATE.defaults = {
# These values may not be possible to fetch/known, default to unknown...
'pid': '???',
'hostname': '???',
'executor_thread_count': '???',
'endpoints': ' %s' % ([]),
# These are static (avoid refetching...)
'version': version.version_string(),
'python': sys.version.split("\n", 1)[0].strip(),
}
LOG = logging.getLogger(__name__)
@@ -88,6 +58,39 @@ class Worker(object):
(see: :py:attr:`~.proxy.Proxy.DEFAULT_RETRY_OPTIONS`)
"""
BANNER_TEMPLATE = string.Template("""
TaskFlow v${version} WBE worker.
Connection details:
Driver = $transport_driver
Exchange = $exchange
Topic = $topic
Transport = $transport_type
Uri = $connection_uri
Powered by:
Executor = $executor_type
Thread count = $executor_thread_count
Supported endpoints:$endpoints
System details:
Hostname = $hostname
Pid = $pid
Platform = $platform
Python = $python
Thread id = $thread_id
""".strip())
# See: http://bugs.python.org/issue13173 for why we are doing this...
BANNER_TEMPLATE.defaults = {
# These values may not be possible to fetch/known, default
# to ??? to represent that they are unknown...
'pid': '???',
'hostname': '???',
'executor_thread_count': '???',
'endpoints': ' %s' % ([]),
# These are static (avoid refetching...)
'version': version.version_string(),
'python': sys.version.split("\n", 1)[0].strip(),
}
def __init__(self, exchange, topic, tasks,
executor=None, threads_count=None, url=None,
transport=None, transport_options=None,
@@ -95,13 +98,9 @@ class Worker(object):
self._topic = topic
self._executor = executor
self._owns_executor = False
self._threads_count = -1
if self._executor is None:
if threads_count is not None:
self._threads_count = int(threads_count)
else:
self._threads_count = tu.get_optimal_thread_count()
self._executor = futures.ThreadPoolExecutor(self._threads_count)
self._executor = futurist.ThreadPoolExecutor(
max_workers=threads_count)
self._owns_executor = True
self._endpoints = self._derive_endpoints(tasks)
self._exchange = exchange
@@ -119,7 +118,10 @@ class Worker(object):
def _generate_banner(self):
"""Generates a banner that can be useful to display before running."""
tpl_params = {}
try:
tpl_params = dict(self.BANNER_TEMPLATE.defaults)
except AttributeError:
tpl_params = {}
connection_details = self._server.connection_details
transport = connection_details.transport
if transport.driver_version:
@@ -133,8 +135,9 @@ class Worker(object):
tpl_params['transport_type'] = transport.driver_type
tpl_params['connection_uri'] = connection_details.uri
tpl_params['executor_type'] = reflection.get_class_name(self._executor)
if self._threads_count != -1:
tpl_params['executor_thread_count'] = self._threads_count
threads_count = getattr(self._executor, 'max_workers', None)
if threads_count is not None:
tpl_params['executor_thread_count'] = threads_count
if self._endpoints:
pretty_endpoints = []
for ep in self._endpoints:
@@ -151,8 +154,7 @@ class Worker(object):
pass
tpl_params['platform'] = platform.platform()
tpl_params['thread_id'] = tu.get_ident()
banner = BANNER_TEMPLATE.substitute(BANNER_TEMPLATE.defaults,
**tpl_params)
banner = self.BANNER_TEMPLATE.substitute(**tpl_params)
# NOTE(harlowja): this is needed since the template in this file
# will always have newlines that end with '\n' (even on different
# platforms due to the way this source file is encoded) so we have

View File

@@ -0,0 +1,204 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2015 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import contextlib
import logging
import os
import sys
import time
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
from taskflow.conductors import backends as conductor_backends
from taskflow import engines
from taskflow.jobs import backends as job_backends
from taskflow.patterns import linear_flow as lf
from taskflow.persistence import backends as persistence_backends
from taskflow.persistence import logbook
from taskflow import task
from taskflow.types import timing
from oslo_utils import uuidutils
# Instructions!
#
# 1. Install zookeeper (or change host listed below)
# 2. Download this example, place in file '99_bottles.py'
# 3. Run `python 99_bottles.py p` to place a song request onto the jobboard
# 4. Run `python 99_bottles.py c` a few times (in different shells)
# 5. On demand kill previously listed processes created in (4) and watch
# the work resume on another process (and repeat)
# 6. Keep enough workers alive to eventually finish the song (if desired).
ME = os.getpid()
ZK_HOST = "localhost:2181"
JB_CONF = {
'hosts': ZK_HOST,
'board': 'zookeeper',
'path': '/taskflow/99-bottles-demo',
}
PERSISTENCE_URI = r"sqlite:////tmp/bottles.db"
TAKE_DOWN_DELAY = 1.0
PASS_AROUND_DELAY = 3.0
HOW_MANY_BOTTLES = 99
class TakeABottleDown(task.Task):
def execute(self, bottles_left):
sys.stdout.write('Take one down, ')
sys.stdout.flush()
time.sleep(TAKE_DOWN_DELAY)
return bottles_left - 1
class PassItAround(task.Task):
def execute(self):
sys.stdout.write('pass it around, ')
sys.stdout.flush()
time.sleep(PASS_AROUND_DELAY)
class Conclusion(task.Task):
def execute(self, bottles_left):
sys.stdout.write('%s bottles of beer on the wall...\n' % bottles_left)
sys.stdout.flush()
def make_bottles(count):
# This is the function that will be called to generate the workflow
# and will also be called to regenerate it on resumption so that work
# can continue from where it last left off...
s = lf.Flow("bottle-song")
take_bottle = TakeABottleDown("take-bottle-%s" % count,
inject={'bottles_left': count},
provides='bottles_left')
pass_it = PassItAround("pass-%s-around" % count)
next_bottles = Conclusion("next-bottles-%s" % (count - 1))
s.add(take_bottle, pass_it, next_bottles)
for bottle in reversed(list(range(1, count))):
take_bottle = TakeABottleDown("take-bottle-%s" % bottle,
provides='bottles_left')
pass_it = PassItAround("pass-%s-around" % bottle)
next_bottles = Conclusion("next-bottles-%s" % (bottle - 1))
s.add(take_bottle, pass_it, next_bottles)
return s
def run_conductor():
# This continuously consumes until it is stopped via ctrl-c or another
# kill signal...
event_watches = {}
# This will be triggered by the conductor doing various activities
# with engines, and is quite nice to be able to see the various timing
# segments (which is useful for debugging, or watching, or figuring out
# where to optimize).
def on_conductor_event(event, details):
print("Event '%s' has been received..." % event)
print("Details = %s" % details)
if event.endswith("_start"):
w = timing.StopWatch()
w.start()
base_event = event[0:-len("_start")]
event_watches[base_event] = w
if event.endswith("_end"):
base_event = event[0:-len("_end")]
try:
w = event_watches.pop(base_event)
w.stop()
print("It took %0.3f seconds for event '%s' to finish"
% (w.elapsed(), base_event))
except KeyError:
pass
print("Starting conductor with pid: %s" % ME)
my_name = "conductor-%s" % ME
persist_backend = persistence_backends.fetch(PERSISTENCE_URI)
with contextlib.closing(persist_backend):
with contextlib.closing(persist_backend.get_connection()) as conn:
conn.upgrade()
job_backend = job_backends.fetch(my_name, JB_CONF,
persistence=persist_backend)
job_backend.connect()
with contextlib.closing(job_backend):
cond = conductor_backends.fetch('blocking', my_name, job_backend,
persistence=persist_backend)
cond.notifier.register(cond.notifier.ANY, on_conductor_event)
# Run forever, and kill -9 or ctrl-c me...
try:
cond.run()
finally:
cond.stop()
cond.wait()
def run_poster():
# This just posts a single job and then ends...
print("Starting poster with pid: %s" % ME)
my_name = "poster-%s" % ME
persist_backend = persistence_backends.fetch(PERSISTENCE_URI)
with contextlib.closing(persist_backend):
with contextlib.closing(persist_backend.get_connection()) as conn:
conn.upgrade()
job_backend = job_backends.fetch(my_name, JB_CONF,
persistence=persist_backend)
job_backend.connect()
with contextlib.closing(job_backend):
# Create information in the persistence backend about the
# unit of work we want to complete and the factory that
# can be called to create the tasks needed to get that unit
# of work done.
lb = logbook.LogBook("post-from-%s" % my_name)
fd = logbook.FlowDetail("song-from-%s" % my_name,
uuidutils.generate_uuid())
lb.add(fd)
with contextlib.closing(persist_backend.get_connection()) as conn:
conn.save_logbook(lb)
engines.save_factory_details(fd, make_bottles,
[HOW_MANY_BOTTLES], {},
backend=persist_backend)
# Post, and be done with it!
jb = job_backend.post("song-from-%s" % my_name, book=lb)
print("Posted: %s" % jb)
print("Goodbye...")
def main():
if len(sys.argv) == 1:
sys.stderr.write("%s p|c\n" % os.path.basename(sys.argv[0]))
elif sys.argv[1] in ('p', 'c'):
if sys.argv[-1] == "v":
logging.basicConfig(level=5)
else:
logging.basicConfig(level=logging.ERROR)
if sys.argv[1] == 'p':
run_poster()
else:
run_conductor()
else:
sys.stderr.write("%s p|c (v?)\n" % os.path.basename(sys.argv[0]))
if __name__ == '__main__':
main()

View File

@@ -38,7 +38,7 @@ from taskflow import task
# In this example we show how a simple linear set of tasks can be executed
# using local processes (and not threads or remote workers) with minimial (if
# using local processes (and not threads or remote workers) with minimal (if
# any) modification to those tasks to make them safe to run in this mode.
#
# This is useful since it allows further scaling up your workflows when thread

View File

@@ -38,7 +38,7 @@ ANY = notifier.Notifier.ANY
import example_utils as eu # noqa
# INTRO: This examples shows how a graph flow and linear flow can be used
# INTRO: This example shows how a graph flow and linear flow can be used
# together to execute dependent & non-dependent tasks by going through the
# steps required to build a simplistic car (an assembly line if you will). It
# also shows how raw functions can be wrapped into a task object instead of
@@ -167,7 +167,7 @@ engine = taskflow.engines.load(flow, store={'spec': spec.copy()})
# flow_watch function for flow state transitions, and registers the
# same all (ANY) state transitions for task state transitions.
engine.notifier.register(ANY, flow_watch)
engine.task_notifier.register(ANY, task_watch)
engine.atom_notifier.register(ANY, task_watch)
eu.print_wrapped("Building a car")
engine.run()
@@ -180,7 +180,7 @@ spec['doors'] = 5
engine = taskflow.engines.load(flow, store={'spec': spec.copy()})
engine.notifier.register(ANY, flow_watch)
engine.task_notifier.register(ANY, task_watch)
engine.atom_notifier.register(ANY, task_watch)
eu.print_wrapped("Building a wrong car that doesn't match specification")
try:

View File

@@ -30,7 +30,7 @@ from taskflow.patterns import linear_flow as lf
from taskflow.patterns import unordered_flow as uf
from taskflow import task
# INTRO: This examples shows how a linear flow and a unordered flow can be
# INTRO: These examples show how a linear flow and an unordered flow can be
# used together to execute calculations in parallel and then use the
# result for the next task/s. The adder task is used for all calculations
# and argument bindings are used to set correct parameters for each task.

View File

@@ -35,7 +35,7 @@ from taskflow.listeners import printing
from taskflow.patterns import unordered_flow as uf
from taskflow import task
# INTRO: This examples shows how unordered_flow can be used to create a large
# INTRO: These examples show how unordered_flow can be used to create a large
# number of fake volumes in parallel (or serially, depending on a constant that
# can be easily changed).

View File

@@ -0,0 +1,78 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2015 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
import os
import sys
logging.basicConfig(level=logging.ERROR)
self_dir = os.path.abspath(os.path.dirname(__file__))
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
sys.path.insert(0, self_dir)
from taskflow import engines
from taskflow.patterns import linear_flow as lf
from taskflow.persistence import backends
from taskflow import task
from taskflow.utils import persistence_utils as pu
# INTRO: in this example we create a dummy flow with a dummy task, and run
# it using an in-memory backend and pre/post run we dump out the contents
# of the in-memory backend's tree structure (which can be quite useful to
# look at for debugging or other analysis).
class PrintTask(task.Task):
def execute(self):
print("Running '%s'" % self.name)
backend = backends.fetch({
'connection': 'memory://',
})
book, flow_detail = pu.temporary_flow_detail(backend=backend)
# Make a little flow and run it...
f = lf.Flow('root')
for alpha in ['a', 'b', 'c']:
f.add(PrintTask(alpha))
e = engines.load(f, flow_detail=flow_detail,
book=book, backend=backend)
e.compile()
e.prepare()
print("----------")
print("Before run")
print("----------")
print(backend.memory.pformat())
print("----------")
e.run()
print("---------")
print("After run")
print("---------")
for path in backend.memory.ls_r(backend.memory.root_path, absolute=True):
value = backend.memory[path]
if value:
print("%s -> %s" % (path, value))
else:
print("%s" % (path))

View File

@@ -31,8 +31,8 @@ from taskflow.patterns import linear_flow as lf
from taskflow import task
# INTRO: This example walks through a miniature workflow which will do a
# simple echo operation; during this execution a listener is assocated with
# the engine to recieve all notifications about what the flow has performed,
# simple echo operation; during this execution a listener is associated with
# the engine to receive all notifications about what the flow has performed,
# this example dumps that output to the stdout for viewing (at debug level
# to show all the information which is possible).

View File

@@ -36,8 +36,8 @@ from taskflow.patterns import linear_flow as lf
from taskflow import task
from taskflow.utils import misc
# INTRO: This example walks through a miniature workflow which simulates a
# the reception of a API request, creation of a database entry, driver
# INTRO: This example walks through a miniature workflow which simulates
# the reception of an API request, creation of a database entry, driver
# activation (which invokes a 'fake' webservice) and final completion.
#
# This example also shows how a function/object (in this class the url sending)

View File

@@ -80,12 +80,37 @@ store = {
"y5": 9,
}
# These are the expected values that should be created.
unexpected = 0
expected = [
('x1', 4),
('x2', 12),
('x3', 16),
('x4', 21),
('x5', 20),
('x6', 41),
('x7', 82),
]
result = taskflow.engines.run(
flow, engine='serial', store=store)
print("Single threaded engine result %s" % result)
for (name, value) in expected:
actual = result.get(name)
if actual != value:
sys.stderr.write("%s != %s\n" % (actual, value))
unexpected += 1
result = taskflow.engines.run(
flow, engine='parallel', store=store)
print("Multi threaded engine result %s" % result)
for (name, value) in expected:
actual = result.get(name)
if actual != value:
sys.stderr.write("%s != %s\n" % (actual, value))
unexpected += 1
if unexpected:
sys.exit(1)

View File

@@ -25,16 +25,17 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir))
sys.path.insert(0, top_dir)
import futurist
from taskflow import engines
from taskflow.patterns import linear_flow as lf
from taskflow.patterns import unordered_flow as uf
from taskflow import task
from taskflow.types import futures
from taskflow.utils import eventlet_utils
# INTRO: This is the defacto hello world equivalent for taskflow; it shows how
# a overly simplistic workflow can be created that runs using different
# an overly simplistic workflow can be created that runs using different
# engines using different styles of execution (all can be used to run in
# parallel if a workflow is provided that is parallelizable).
@@ -82,19 +83,19 @@ song.add(PrinterTask("conductor@begin",
# Run in parallel using eventlet green threads...
if eventlet_utils.EVENTLET_AVAILABLE:
with futures.GreenThreadPoolExecutor() as executor:
with futurist.GreenThreadPoolExecutor() as executor:
e = engines.load(song, executor=executor, engine='parallel')
e.run()
# Run in parallel using real threads...
with futures.ThreadPoolExecutor(max_workers=1) as executor:
with futurist.ThreadPoolExecutor(max_workers=1) as executor:
e = engines.load(song, executor=executor, engine='parallel')
e.run()
# Run in parallel using external processes...
with futures.ProcessPoolExecutor(max_workers=1) as executor:
with futurist.ProcessPoolExecutor(max_workers=1) as executor:
e = engines.load(song, executor=executor, engine='parallel')
e.run()

View File

@@ -1,171 +0,0 @@
# -*- encoding: utf-8 -*-
#
# Copyright © 2013 eNovance <licensing@enovance.com>
#
# Authors: Dan Krause <dan@dankrause.net>
# Cyril Roelandt <cyril.roelandt@enovance.com>
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
# This example shows how to use the job board feature.
#
# Let's start by creating some jobs:
# $ python job_board_no_test.py create my-board my-job '{}'
# $ python job_board_no_test.py create my-board my-job '{"foo": "bar"}'
# $ python job_board_no_test.py create my-board my-job '{"foo": "baz"}'
# $ python job_board_no_test.py create my-board my-job '{"foo": "barbaz"}'
#
# Make sure they were registered:
# $ python job_board_no_test.py list my-board
# 7277181a-1f83-473d-8233-f361615bae9e - {}
# 84a396e8-d02e-450d-8566-d93cb68550c0 - {u'foo': u'bar'}
# 4d355d6a-2c72-44a2-a558-19ae52e8ae2c - {u'foo': u'baz'}
# cd9aae2c-fd64-416d-8ba0-426fa8e3d59c - {u'foo': u'barbaz'}
#
# Perform one job:
# $ python job_board_no_test.py consume my-board \
# 84a396e8-d02e-450d-8566-d93cb68550c0
# Performing job 84a396e8-d02e-450d-8566-d93cb68550c0 with args \
# {u'foo': u'bar'}
# $ python job_board_no_test.py list my-board
# 7277181a-1f83-473d-8233-f361615bae9e - {}
# 4d355d6a-2c72-44a2-a558-19ae52e8ae2c - {u'foo': u'baz'}
# cd9aae2c-fd64-416d-8ba0-426fa8e3d59c - {u'foo': u'barbaz'}
#
# Delete a job:
# $ python job_board_no_test.py delete my-board \
# cd9aae2c-fd64-416d-8ba0-426fa8e3d59c
# $ python job_board_no_test.py list my-board
# 7277181a-1f83-473d-8233-f361615bae9e - {}
# 4d355d6a-2c72-44a2-a558-19ae52e8ae2c - {u'foo': u'baz'}
#
# Delete all the remaining jobs
# $ python job_board_no_test.py clear my-board
# $ python job_board_no_test.py list my-board
# $
import argparse
import contextlib
import json
import os
import sys
import tempfile
import taskflow.jobs.backends as job_backends
from taskflow.persistence import logbook
import example_utils # noqa
@contextlib.contextmanager
def jobboard(*args, **kwargs):
jb = job_backends.fetch(*args, **kwargs)
jb.connect()
yield jb
jb.close()
conf = {
'board': 'zookeeper',
'hosts': ['127.0.0.1:2181']
}
def consume_job(args):
def perform_job(job):
print("Performing job %s with args %s" % (job.uuid, job.details))
with jobboard(args.board_name, conf) as jb:
for job in jb.iterjobs(ensure_fresh=True):
if job.uuid == args.job_uuid:
jb.claim(job, "test-client")
perform_job(job)
jb.consume(job, "test-client")
def clear_jobs(args):
with jobboard(args.board_name, conf) as jb:
for job in jb.iterjobs(ensure_fresh=True):
jb.claim(job, "test-client")
jb.consume(job, "test-client")
def create_job(args):
store = json.loads(args.details)
book = logbook.LogBook(args.job_name)
if example_utils.SQLALCHEMY_AVAILABLE:
persist_path = os.path.join(tempfile.gettempdir(), "persisting.db")
backend_uri = "sqlite:///%s" % (persist_path)
else:
persist_path = os.path.join(tempfile.gettempdir(), "persisting")
backend_uri = "file:///%s" % (persist_path)
with example_utils.get_backend(backend_uri) as backend:
backend.get_connection().save_logbook(book)
with jobboard(args.board_name, conf, persistence=backend) as jb:
jb.post(args.job_name, book, details=store)
def list_jobs(args):
with jobboard(args.board_name, conf) as jb:
for job in jb.iterjobs(ensure_fresh=True):
print("%s - %s" % (job.uuid, job.details))
def delete_job(args):
with jobboard(args.board_name, conf) as jb:
for job in jb.iterjobs(ensure_fresh=True):
if job.uuid == args.job_uuid:
jb.claim(job, "test-client")
jb.consume(job, "test-client")
def main(argv):
parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(title='subcommands',
description='valid subcommands',
help='additional help')
# Consume command
parser_consume = subparsers.add_parser('consume')
parser_consume.add_argument('board_name')
parser_consume.add_argument('job_uuid')
parser_consume.set_defaults(func=consume_job)
# Clear command
parser_consume = subparsers.add_parser('clear')
parser_consume.add_argument('board_name')
parser_consume.set_defaults(func=clear_jobs)
# Create command
parser_create = subparsers.add_parser('create')
parser_create.add_argument('board_name')
parser_create.add_argument('job_name')
parser_create.add_argument('details')
parser_create.set_defaults(func=create_job)
# Delete command
parser_delete = subparsers.add_parser('delete')
parser_delete.add_argument('board_name')
parser_delete.add_argument('job_uuid')
parser_delete.set_defaults(func=delete_job)
# List command
parser_list = subparsers.add_parser('list')
parser_list.add_argument('board_name')
parser_list.set_defaults(func=list_jobs)
args = parser.parse_args(argv)
args.func(args)
if __name__ == '__main__':
main(sys.argv[1:])

View File

@@ -30,6 +30,7 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir))
sys.path.insert(0, top_dir)
import six
from six.moves import range as compat_range
from zake import fake_client
@@ -40,7 +41,7 @@ from taskflow.utils import threading_utils
# In this example we show how a jobboard can be used to post work for other
# entities to work on. This example creates a set of jobs using one producer
# thread (typically this would be split across many machines) and then having
# other worker threads with there own jobboards select work using a given
# other worker threads with their own jobboards select work using a given
# filters [red/blue] and then perform that work (and consuming or abandoning
# the job after it has been completed or failed).
@@ -66,7 +67,7 @@ PRODUCER_UNITS = 10
# How many units of work are expected to be produced (used so workers can
# know when to stop running and shutdown, typically this would not be a
# a value but we have to limit this examples execution time to be less than
# a value but we have to limit this example's execution time to be less than
# infinity).
EXPECTED_UNITS = PRODUCER_UNITS * PRODUCERS
@@ -150,6 +151,14 @@ def producer(ident, client):
def main():
if six.PY3:
# TODO(harlowja): Hack to make eventlet work right, remove when the
# following is fixed: https://github.com/eventlet/eventlet/issues/230
from taskflow.utils import eventlet_utils as _eu # noqa
try:
import eventlet as _eventlet # noqa
except ImportError:
pass
with contextlib.closing(fake_client.FakeClient()) as c:
created = []
for i in compat_range(0, PRODUCERS):

View File

@@ -27,12 +27,12 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir))
sys.path.insert(0, top_dir)
import futurist
from six.moves import range as compat_range
from taskflow import engines
from taskflow.patterns import unordered_flow as uf
from taskflow import task
from taskflow.types import futures
from taskflow.utils import eventlet_utils
# INTRO: This example walks through a miniature workflow which does a parallel
@@ -98,9 +98,9 @@ def main():
# Now run it (using the specified executor)...
if eventlet_utils.EVENTLET_AVAILABLE:
executor = futures.GreenThreadPoolExecutor(max_workers=5)
executor = futurist.GreenThreadPoolExecutor(max_workers=5)
else:
executor = futures.ThreadPoolExecutor(max_workers=5)
executor = futurist.ThreadPoolExecutor(max_workers=5)
try:
e = engines.load(f, engine='parallel', executor=executor)
for st in e.run_iter():

View File

@@ -31,7 +31,7 @@ sys.path.insert(0, self_dir)
from taskflow import engines
from taskflow.patterns import linear_flow as lf
from taskflow.persistence import logbook
from taskflow.persistence import models
from taskflow import task
from taskflow.utils import persistence_utils as p_utils
@@ -68,15 +68,15 @@ class ByeTask(task.Task):
print("Bye!")
# This generates your flow structure (at this stage nothing is ran).
# This generates your flow structure (at this stage nothing is run).
def make_flow(blowup=False):
flow = lf.Flow("hello-world")
flow.add(HiTask(), ByeTask(blowup))
return flow
# Persist the flow and task state here, if the file/dir exists already blowup
# if not don't blowup, this allows a user to see both the modes and to see
# Persist the flow and task state here, if the file/dir exists already blow up
# if not don't blow up, this allows a user to see both the modes and to see
# what is stored in each case.
if eu.SQLALCHEMY_AVAILABLE:
persist_path = os.path.join(tempfile.gettempdir(), "persisting.db")
@@ -91,10 +91,10 @@ else:
blowup = True
with eu.get_backend(backend_uri) as backend:
# Make a flow that will blowup if the file doesn't exist previously, if it
# did exist, assume we won't blowup (and therefore this shows the undo
# Make a flow that will blow up if the file didn't exist previously, if it
# did exist, assume we won't blow up (and therefore this shows the undo
# and redo that a flow will go through).
book = logbook.LogBook("my-test")
book = models.LogBook("my-test")
flow = make_flow(blowup=blowup)
eu.print_wrapped("Running")
try:

View File

@@ -31,6 +31,7 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
sys.path.insert(0, top_dir)
sys.path.insert(0, self_dir)
import futurist
from oslo_utils import uuidutils
from taskflow import engines
@@ -38,13 +39,12 @@ from taskflow import exceptions as exc
from taskflow.patterns import graph_flow as gf
from taskflow.patterns import linear_flow as lf
from taskflow import task
from taskflow.types import futures
from taskflow.utils import eventlet_utils
from taskflow.utils import persistence_utils as p_utils
import example_utils as eu # noqa
# INTRO: This examples shows how a hierarchy of flows can be used to create a
# INTRO: These examples show how a hierarchy of flows can be used to create a
# vm in a reliable & resumable manner using taskflow + a miniature version of
# what nova does while booting a vm.
@@ -239,7 +239,7 @@ with eu.get_backend() as backend:
# Set up how we want our engine to run, serial, parallel...
executor = None
if eventlet_utils.EVENTLET_AVAILABLE:
executor = futures.GreenThreadPoolExecutor(5)
executor = futurist.GreenThreadPoolExecutor(5)
# Create/fetch a logbook that will track the workflows work.
book = None

View File

@@ -39,7 +39,7 @@ from taskflow.utils import persistence_utils as p_utils
import example_utils # noqa
# INTRO: This examples shows how a hierarchy of flows can be used to create a
# INTRO: These examples show how a hierarchy of flows can be used to create a
# pseudo-volume in a reliable & resumable manner using taskflow + a miniature
# version of what cinder does while creating a volume (very miniature).

View File

@@ -32,7 +32,7 @@ from taskflow import task
# INTRO: In this example we create a retry controller that receives a phone
# directory and tries different phone numbers. The next task tries to call Jim
# using the given number. If if is not a Jim's number, the tasks raises an
# using the given number. If it is not a Jim's number, the task raises an
# exception and retry controller takes the next number from the phone
# directory and retries the call.
#

View File

@@ -37,7 +37,7 @@ from taskflow import task
from taskflow.utils import persistence_utils
# INTRO: This examples shows how to run a set of engines at the same time, each
# INTRO: This example shows how to run a set of engines at the same time, each
# running in different engines using a single thread of control to iterate over
# each engine (which causes that engine to advance to its next state during
# each iteration).

View File

@@ -33,10 +33,10 @@ from taskflow.persistence import backends as persistence_backends
from taskflow import task
from taskflow.utils import persistence_utils
# INTRO: This examples shows how to run a engine using the engine iteration
# INTRO: These examples show how to run an engine using the engine iteration
# capability, in between iterations other activities occur (in this case a
# value is output to stdout); but more complicated actions can occur at the
# boundary when a engine yields its current state back to the caller.
# boundary when an engine yields its current state back to the caller.
class EchoNameTask(task.Task):

View File

@@ -0,0 +1,81 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2012-2013 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
import os
import random
import sys
import time
logging.basicConfig(level=logging.ERROR)
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
import futurist
import six
from taskflow import engines
from taskflow.patterns import unordered_flow as uf
from taskflow import task
from taskflow.utils import threading_utils as tu
# INTRO: in this example we create 2 dummy flow(s), each with 2 dummy task(s),
# and run them using a shared thread pool executor to show how a single executor can
# be used with more than one engine (sharing the execution thread pool between
# them); this allows for saving resources and reusing threads in situations
# where this is beneficial.
class DelayedTask(task.Task):
def __init__(self, name):
super(DelayedTask, self).__init__(name=name)
self._wait_for = random.random()
def execute(self):
print("Running '%s' in thread '%s'" % (self.name, tu.get_ident()))
time.sleep(self._wait_for)
f1 = uf.Flow("f1")
f1.add(DelayedTask("f1-1"))
f1.add(DelayedTask("f1-2"))
f2 = uf.Flow("f2")
f2.add(DelayedTask("f2-1"))
f2.add(DelayedTask("f2-2"))
# Run them all using the same futures (thread-pool based) executor...
with futurist.ThreadPoolExecutor() as ex:
e1 = engines.load(f1, engine='parallel', executor=ex)
e2 = engines.load(f2, engine='parallel', executor=ex)
iters = [e1.run_iter(), e2.run_iter()]
# Iterate over a copy (so we can remove from the source list).
cloned_iters = list(iters)
while iters:
# Run a single 'step' of each iterator, forcing each engine to perform
# some work, then yield, and repeat until each iterator is consumed
# and there is no more engine work to be done.
for it in cloned_iters:
try:
six.next(it)
except StopIteration:
try:
iters.remove(it)
except ValueError:
pass

View File

@@ -41,8 +41,8 @@ from taskflow import task
# taskflow provides via tasks and flows makes it possible for you to easily at
# a later time hook in a persistence layer (and then gain the functionality
# that offers) when you decide the complexity of adding that layer in
# is 'worth it' for your applications usage pattern (which certain applications
# may not need).
# is 'worth it' for your application's usage pattern (which certain
# applications may not need).
class CallJim(task.Task):

View File

@@ -37,7 +37,7 @@ ANY = notifier.Notifier.ANY
# a given ~phone~ number (provided as a function input) in a linear fashion
# (one after the other).
#
# For a workflow which is serial this shows a extremely simple way
# For a workflow which is serial this shows an extremely simple way
# of structuring your tasks (the code that does the work) into a linear
# sequence (the flow) and then passing the work off to an engine, with some
# initial data to be ran in a reliable manner.
@@ -92,11 +92,11 @@ engine = taskflow.engines.load(flow, store={
})
# This is where we attach our callback functions to the 2 different
# notification objects that a engine exposes. The usage of a '*' (kleene star)
# notification objects that an engine exposes. The usage of a ANY (kleene star)
# here means that we want to be notified on all state changes, if you want to
# restrict to a specific state change, just register that instead.
engine.notifier.register(ANY, flow_watch)
engine.task_notifier.register(ANY, task_watch)
engine.atom_notifier.register(ANY, task_watch)
# And now run!
engine.run()

View File

@@ -31,7 +31,7 @@ from taskflow import engines
from taskflow.patterns import linear_flow
from taskflow import task
# INTRO: This examples shows how a task (in a linear/serial workflow) can
# INTRO: This example shows how a task (in a linear/serial workflow) can
# produce an output that can be then consumed/used by a downstream task.

View File

@@ -27,9 +27,9 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
sys.path.insert(0, top_dir)
sys.path.insert(0, self_dir)
# INTRO: this examples shows a simplistic map/reduce implementation where
# INTRO: These examples show a simplistic map/reduce implementation where
# a set of mapper(s) will sum a series of input numbers (in parallel) and
# return there individual summed result. A reducer will then use those
# return their individual summed result. A reducer will then use those
# produced values and perform a final summation and this result will then be
# printed (and verified to ensure the calculation was as expected).

View File

@@ -0,0 +1,75 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
import os
import sys
logging.basicConfig(level=logging.ERROR)
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
from taskflow import engines
from taskflow.patterns import graph_flow as gf
from taskflow.persistence import backends
from taskflow import task
from taskflow.utils import persistence_utils as pu
class DummyTask(task.Task):
def execute(self):
print("Running %s" % self.name)
def allow(history):
print(history)
return False
r = gf.Flow("root")
r_a = DummyTask('r-a')
r_b = DummyTask('r-b')
r.add(r_a, r_b)
r.link(r_a, r_b, decider=allow)
backend = backends.fetch({
'connection': 'memory://',
})
book, flow_detail = pu.temporary_flow_detail(backend=backend)
e = engines.load(r, flow_detail=flow_detail, book=book, backend=backend)
e.compile()
e.prepare()
e.run()
print("---------")
print("After run")
print("---------")
entries = [os.path.join(backend.memory.root_path, child)
for child in backend.memory.ls(backend.memory.root_path)]
while entries:
path = entries.pop()
value = backend.memory[path]
if value:
print("%s -> %s" % (path, value))
else:
print("%s" % (path))
entries.extend(os.path.join(path, child)
for child in backend.memory.ls(path))

View File

@@ -36,7 +36,7 @@ from taskflow import task
# and have variable run time tasks run and show how the listener will print
# out how long those tasks took (when they started and when they finished).
#
# This shows how timing metrics can be gathered (or attached onto a engine)
# This shows how timing metrics can be gathered (or attached onto an engine)
# after a workflow has been constructed, making it easy to gather metrics
# dynamically for situations where this kind of information is applicable (or
# even adding this information on at a later point in the future when your
@@ -55,5 +55,5 @@ class VariableTask(task.Task):
f = lf.Flow('root')
f.add(VariableTask('a'), VariableTask('b'), VariableTask('c'))
e = engines.load(f)
with timing.PrintingTimingListener(e):
with timing.PrintingDurationListener(e):
e.run()

View File

@@ -0,0 +1,243 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import contextlib
import itertools
import logging
import os
import shutil
import socket
import sys
import tempfile
import threading
import time
logging.basicConfig(level=logging.ERROR)
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
from oslo_utils import timeutils
from oslo_utils import uuidutils
import six
from zake import fake_client
from taskflow.conductors import backends as conductors
from taskflow import engines
from taskflow.jobs import backends as boards
from taskflow.patterns import linear_flow
from taskflow.persistence import backends as persistence
from taskflow.persistence import models
from taskflow import task
from taskflow.utils import threading_utils
# INTRO: This example shows how a worker/producer can post desired work (jobs)
# to a jobboard and a conductor can consume that work (jobs) from that jobboard
# and execute those jobs in a reliable & async manner (for example, if the
# conductor were to crash then the job will be released back onto the jobboard
# and another conductor can attempt to finish it, from wherever that job last
# left off).
#
# In this example an in-memory jobboard (and in-memory storage) is created and
# used that simulates how this would be done at a larger scale (it is an
# example after all).
# Restrict how long this example runs for...
RUN_TIME = 5
REVIEW_CREATION_DELAY = 0.5
SCAN_DELAY = 0.1
NAME = "%s_%s" % (socket.getfqdn(), os.getpid())
# This won't really use zookeeper but will use a local version of it using
# the zake library that mimics an actual zookeeper cluster using threads and
# an in-memory data structure.
JOBBOARD_CONF = {
'board': 'zookeeper://localhost?path=/taskflow/tox/jobs',
}
class RunReview(task.Task):
# A dummy task that clones the review and runs tox...
def _clone_review(self, review, temp_dir):
print("Cloning review '%s' into %s" % (review['id'], temp_dir))
def _run_tox(self, temp_dir):
print("Running tox in %s" % temp_dir)
def execute(self, review, temp_dir):
self._clone_review(review, temp_dir)
self._run_tox(temp_dir)
class MakeTempDir(task.Task):
# A task that creates and destroys a temporary dir (on failure).
#
# It provides the location of the temporary dir for other tasks to use
# as they see fit.
default_provides = 'temp_dir'
def execute(self):
return tempfile.mkdtemp()
def revert(self, *args, **kwargs):
temp_dir = kwargs.get(task.REVERT_RESULT)
if temp_dir:
shutil.rmtree(temp_dir)
class CleanResources(task.Task):
# A task that cleans up any workflow resources.
def execute(self, temp_dir):
print("Removing %s" % temp_dir)
shutil.rmtree(temp_dir)
def review_iter():
"""Makes reviews (never-ending iterator/generator)."""
review_id_gen = itertools.count(0)
while True:
review_id = six.next(review_id_gen)
review = {
'id': review_id,
}
yield review
# The reason this is at the module namespace level is important, since it must
# be accessible from a conductor dispatching an engine; if it was a lambda
# function, for example, it would not be reimportable and the conductor would
# be unable to reference it when creating the workflow to run.
def create_review_workflow():
"""Factory method used to create a review workflow to run."""
f = linear_flow.Flow("tester")
f.add(
MakeTempDir(name="maker"),
RunReview(name="runner"),
CleanResources(name="cleaner")
)
return f
def generate_reviewer(client, saver, name=NAME):
"""Creates a review producer thread with the given name prefix."""
real_name = "%s_reviewer" % name
no_more = threading.Event()
jb = boards.fetch(real_name, JOBBOARD_CONF,
client=client, persistence=saver)
def make_save_book(saver, review_id):
# Record what we want to happen (sometime in the future).
book = models.LogBook("book_%s" % review_id)
detail = models.FlowDetail("flow_%s" % review_id,
uuidutils.generate_uuid())
book.add(detail)
# Associate the factory method we want to be called (in the future)
# with the book, so that the conductor will be able to call into
# that factory to retrieve the workflow objects that represent the
# work.
#
# These args and kwargs *can* be used to save any specific parameters
# into the factory when it is being called to create the workflow
# objects (typically used to tell a factory how to create a unique
# workflow that represents this review).
factory_args = ()
factory_kwargs = {}
engines.save_factory_details(detail, create_review_workflow,
factory_args, factory_kwargs)
with contextlib.closing(saver.get_connection()) as conn:
conn.save_logbook(book)
return book
def run():
"""Periodically publishes 'fake' reviews to analyze."""
jb.connect()
review_generator = review_iter()
with contextlib.closing(jb):
while not no_more.is_set():
review = six.next(review_generator)
details = {
'store': {
'review': review,
},
}
job_name = "%s_%s" % (real_name, review['id'])
print("Posting review '%s'" % review['id'])
jb.post(job_name,
book=make_save_book(saver, review['id']),
details=details)
time.sleep(REVIEW_CREATION_DELAY)
# Return the unstarted thread, and a callback that can be used to
# shut down that thread (to avoid running forever).
return (threading_utils.daemon_thread(target=run), no_more.set)
def generate_conductor(client, saver, name=NAME):
"""Creates a conductor thread with the given name prefix."""
real_name = "%s_conductor" % name
jb = boards.fetch(name, JOBBOARD_CONF,
client=client, persistence=saver)
conductor = conductors.fetch("blocking", real_name, jb,
engine='parallel', wait_timeout=SCAN_DELAY)
def run():
jb.connect()
with contextlib.closing(jb):
conductor.run()
# Return the unstarted thread, and a callback that can be used to
# shut down that thread (to avoid running forever).
return (threading_utils.daemon_thread(target=run), conductor.stop)
def main():
# Need to share the same backend, so that data can be shared...
persistence_conf = {
'connection': 'memory',
}
saver = persistence.fetch(persistence_conf)
with contextlib.closing(saver.get_connection()) as conn:
# This ensures that the needed backend setup/data directories/schema
# upgrades and so on... exist before they are attempted to be used...
conn.upgrade()
fc1 = fake_client.FakeClient()
# Done like this to share the same client storage location so the correct
# zookeeper features work across clients...
fc2 = fake_client.FakeClient(storage=fc1.storage)
entities = [
generate_reviewer(fc1, saver),
generate_conductor(fc2, saver),
]
for t, stopper in entities:
t.start()
try:
watch = timeutils.StopWatch(duration=RUN_TIME)
watch.start()
while not watch.expired():
time.sleep(0.1)
finally:
for t, stopper in reversed(entities):
stopper()
t.join()
if __name__ == '__main__':
main()

View File

@@ -36,10 +36,10 @@ from taskflow.utils import threading_utils
ANY = notifier.Notifier.ANY
# INTRO: This examples shows how to use a remote workers event notification
# INTRO: These examples show how to use a remote worker's event notification
# attribute to proxy back task event notifications to the controlling process.
#
# In this case a simple set of events are triggered by a worker running a
# In this case a simple set of events is triggered by a worker running a
# task (simulated to be remote by using a kombu memory transport and threads).
# Those events that the 'remote worker' produces will then be proxied back to
# the task that the engine is running 'remotely', and then they will be emitted
@@ -113,10 +113,10 @@ if __name__ == "__main__":
workers = []
# These topics will be used to request worker information on; those
# workers will respond with there capabilities which the executing engine
# workers will respond with their capabilities which the executing engine
# will use to match pending tasks to a matched worker, this will cause
# the task to be sent for execution, and the engine will wait until it
# is finished (a response is recieved) and then the engine will either
# is finished (a response is received) and then the engine will either
# continue with other tasks, do some retry/failure resolution logic or
# stop (and potentially re-raise the remote workers failure)...
worker_topics = []

View File

@@ -111,11 +111,11 @@ def calculate(engine_conf):
# an image bitmap file.
# And unordered flow is used here since the mandelbrot calculation is an
# example of a embarrassingly parallel computation that we can scatter
# example of an embarrassingly parallel computation that we can scatter
# across as many workers as possible.
flow = uf.Flow("mandelbrot")
# These symbols will be automatically given to tasks as input to there
# These symbols will be automatically given to tasks as input to their
# execute method, in this case these are constants used in the mandelbrot
# calculation.
store = {

View File

@@ -17,15 +17,46 @@
import os
import traceback
from oslo_utils import excutils
from oslo_utils import reflection
import six
def raise_with_cause(exc_cls, message, *args, **kwargs):
"""Helper to raise + chain exceptions (when able) and associate a *cause*.
NOTE(harlowja): Since in py3.x exceptions can be chained (due to
:pep:`3134`) we should try to raise the desired exception with the given
*cause* (or extract a *cause* from the current stack if able) so that the
exception formats nicely in old and new versions of python. Since py2.x
does **not** support exception chaining (or formatting) our root exception
class has a :py:meth:`~taskflow.exceptions.TaskFlowException.pformat`
method that can be used to get *similar* information instead (and this
function makes sure to retain the *cause* in that case as well so
that the :py:meth:`~taskflow.exceptions.TaskFlowException.pformat` method
shows them).
:param exc_cls: the :py:class:`~taskflow.exceptions.TaskFlowException`
class to raise.
:param exc_cls: the :py:class:`~taskflow.exceptions.TaskFlowException`
class to raise.
:param message: the text/str message that will be passed to
the exception's constructor as its first positional
argument.
:param args: any additional positional arguments to pass to the
exception's constructor.
:param kwargs: any additional keyword arguments to pass to the
exception's constructor.
"""
if not issubclass(exc_cls, TaskFlowException):
raise ValueError("Subclass of taskflow exception is required")
excutils.raise_with_cause(exc_cls, message, *args, **kwargs)
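As a rough usage sketch of the helper above (the surrounding ``connect`` function and the ``client.ping`` call are illustrative only; ``JobFailure`` is one of the taskflow exception classes used elsewhere in this changeset):

    from taskflow import exceptions as exc

    def connect(client):
        try:
            client.ping()
        except IOError:
            # The IOError becomes the *cause*: chained via PEP 3134 on py3.x,
            # and kept attached so TaskFlowException.pformat() can still show
            # it on py2.x.
            exc.raise_with_cause(exc.JobFailure,
                                 "Failed to connect to the job backend")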
class TaskFlowException(Exception):
"""Base class for *most* exceptions emitted from this library.
NOTE(harlowja): in later versions of python we can likely remove the need
to have a cause here as PY3+ have implemented PEP 3134 which handles
chaining in a much more elegant manner.
to have a ``cause`` here as PY3+ have implemented :pep:`3134` which
handles chaining in a much more elegant manner.
:param message: the exception message, typically some string that is
useful for consumers to view when debugging or analyzing
@@ -43,35 +74,55 @@ class TaskFlowException(Exception):
def cause(self):
return self._cause
def pformat(self, indent=2, indent_text=" "):
def __str__(self):
return self.pformat()
def _get_message(self):
# We must *not* call into the __str__ method as that will reactivate
# the pformat method, which will end up badly (and doesn't look
# pretty at all); so be careful...
return self.args[0]
def pformat(self, indent=2, indent_text=" ", show_root_class=False):
"""Pretty formats a taskflow exception + any connected causes."""
if indent < 0:
raise ValueError("indent must be greater than or equal to zero")
return os.linesep.join(self._pformat(self, [], 0,
indent=indent,
indent_text=indent_text))
@classmethod
def _pformat(cls, excp, lines, current_indent, indent=2, indent_text=" "):
line_prefix = indent_text * current_indent
for line in traceback.format_exception_only(type(excp), excp):
# We'll add our own newlines on at the end of formatting.
#
# NOTE(harlowja): the reason we don't search for os.linesep is
# that the traceback module seems to only use '\n' (for some
# reason).
if line.endswith("\n"):
line = line[0:-1]
lines.append(line_prefix + line)
try:
cause = excp.cause
except AttributeError:
pass
else:
if cause is not None:
cls._pformat(cause, lines, current_indent + indent,
indent=indent, indent_text=indent_text)
return lines
raise ValueError("Provided 'indent' must be greater than"
" or equal to zero instead of %s" % indent)
buf = six.StringIO()
if show_root_class:
buf.write(reflection.get_class_name(self, fully_qualified=False))
buf.write(": ")
buf.write(self._get_message())
active_indent = indent
next_up = self.cause
seen = []
while next_up is not None and next_up not in seen:
seen.append(next_up)
buf.write(os.linesep)
if isinstance(next_up, TaskFlowException):
buf.write(indent_text * active_indent)
buf.write(reflection.get_class_name(next_up,
fully_qualified=False))
buf.write(": ")
buf.write(next_up._get_message())
else:
lines = traceback.format_exception_only(type(next_up), next_up)
for i, line in enumerate(lines):
buf.write(indent_text * active_indent)
if line.endswith("\n"):
# We'll add our own newlines on...
line = line[0:-1]
buf.write(line)
if i + 1 != len(lines):
buf.write(os.linesep)
if not isinstance(next_up, TaskFlowException):
# Don't go deeper into non-taskflow exceptions... as we
# don't know if their exception 'cause' attributes are even
# usable objects...
break
active_indent += indent
next_up = getattr(next_up, 'cause', None)
return buf.getvalue()
# Errors related to storage or operations on storage units.

View File

@@ -31,6 +31,9 @@ LINK_RETRY = 'retry'
# This key denotes the link was created due to symbol constraints and the
# value will be a set of names that the constraint ensures are satisfied.
LINK_REASONS = 'reasons'
#
# This key denotes a callable that will determine if the target is visited.
LINK_DECIDER = 'decider'
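For reference, a minimal sketch of how such a decider callable ends up attached to a link (this mirrors the graph-flow decider example added earlier in this changeset; the task names here are illustrative):

    from taskflow.patterns import graph_flow as gf
    from taskflow import task

    class DummyTask(task.Task):
        def execute(self):
            print("Running %s" % self.name)

    def allow(history):
        # Returning False means the link target (and anything depending on
        # it) will not be visited by the engine.
        return False

    r = gf.Flow("root")
    r_a = DummyTask('r-a')
    r_b = DummyTask('r-b')
    r.add(r_a, r_b)
    r.link(r_a, r_b, decider=allow)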
@six.add_metaclass(abc.ABCMeta)
@@ -96,9 +99,8 @@ class Flow(object):
"""
def __str__(self):
lines = ["%s: %s" % (reflection.get_class_name(self), self.name)]
lines.append("%s" % (len(self)))
return "; ".join(lines)
return "%s: %s(len=%d)" % (reflection.get_class_name(self),
self.name, len(self))
@property
def provides(self):

View File

@@ -39,17 +39,17 @@ def fetch(name, conf, namespace=BACKEND_NAMESPACE, **kwargs):
NOTE(harlowja): to aid in making it easy to specify configuration and
options to a board, the configuration (which is typically just a dictionary)
can also be a uri string that identifies the entrypoint name and any
can also be a URI string that identifies the entrypoint name and any
configuration specific to that board.
For example, given the following configuration uri:
For example, given the following configuration URI::
zookeeper://<not-used>/?a=b&c=d
zookeeper://<not-used>/?a=b&c=d
This will look for the entrypoint named 'zookeeper' and will provide
a configuration object composed of the uris parameters, in this case that
is {'a': 'b', 'c': 'd'} to the constructor of that board instance (also
including the name specified).
a configuration object composed of the URI's components, in this case that
is ``{'a': 'b', 'c': 'd'}`` to the constructor of that board
instance (also including the name specified).
"""
if isinstance(conf, six.string_types):
conf = {'board': conf}
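As a rough illustration of that URI form (a sketch only: the path value is illustrative, a reachable zookeeper is assumed, and the persistence backend wiring shown in the larger examples in this changeset is omitted):

    from taskflow.jobs import backends as job_backends

    # Expands to roughly {'board': 'zookeeper', 'path': '/taskflow/demo'}
    # (plus the given name) before the matching entrypoint is loaded.
    board = job_backends.fetch("my-board",
                               "zookeeper://localhost?path=/taskflow/demo")
    board.connect()
    try:
        for job in board.iterjobs(ensure_fresh=True):
            print("%s - %s" % (job.uuid, job.details))
    finally:
        board.close()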

View File

@@ -0,0 +1,957 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2015 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import contextlib
import datetime
import string
import threading
import time
import fasteners
import msgpack
from oslo_serialization import msgpackutils
from oslo_utils import strutils
from oslo_utils import timeutils
from oslo_utils import uuidutils
from redis import exceptions as redis_exceptions
import six
from six.moves import range as compat_range
from taskflow import exceptions as exc
from taskflow.jobs import base
from taskflow import logging
from taskflow import states
from taskflow.utils import misc
from taskflow.utils import redis_utils as ru
LOG = logging.getLogger(__name__)
@contextlib.contextmanager
def _translate_failures():
"""Translates common redis exceptions into taskflow exceptions."""
try:
yield
except redis_exceptions.ConnectionError:
exc.raise_with_cause(exc.JobFailure, "Failed to connect to redis")
except redis_exceptions.TimeoutError:
exc.raise_with_cause(exc.JobFailure,
"Failed to communicate with redis, connection"
" timed out")
except redis_exceptions.RedisError:
exc.raise_with_cause(exc.JobFailure,
"Failed to communicate with redis,"
" internal error")
class RedisJob(base.Job):
"""A redis job."""
def __init__(self, board, name, sequence, key,
uuid=None, details=None,
created_on=None, backend=None,
book=None, book_data=None):
super(RedisJob, self).__init__(board, name,
uuid=uuid, details=details,
backend=backend,
book=book, book_data=book_data)
self._created_on = created_on
self._client = board._client
self._redis_version = board._redis_version
self._sequence = sequence
self._key = key
self._last_modified_key = board.join(key + board.LAST_MODIFIED_POSTFIX)
self._owner_key = board.join(key + board.OWNED_POSTFIX)
@property
def key(self):
"""Key (in board listings/trash hash) the job data is stored under."""
return self._key
@property
def last_modified_key(self):
"""Key the job last modified data is stored under."""
return self._last_modified_key
@property
def owner_key(self):
"""Key the job claim + data of the owner is stored under."""
return self._owner_key
@property
def sequence(self):
"""Sequence number of the current job."""
return self._sequence
def expires_in(self):
"""How many seconds until the claim expires.
Returns the number of seconds until the ownership entry expires or
:attr:`~taskflow.utils.redis_utils.UnknownExpire.DOES_NOT_EXPIRE` or
:attr:`~taskflow.utils.redis_utils.UnknownExpire.KEY_NOT_FOUND` if it
does not expire or if the expiry can not be determined (perhaps the
:attr:`.owner_key` expired at/before time of inquiry?).
"""
with _translate_failures():
return ru.get_expiry(self._client, self._owner_key,
prior_version=self._redis_version)
def extend_expiry(self, expiry):
"""Extends the owner key (aka the claim) expiry for this job.
NOTE(harlowja): if the claim for this job did **not** previously
have an expiry associated with it, calling this method will create
one (and after that time elapses the claim on this job will cease
to exist).
Returns ``True`` if the expiry request was performed
otherwise ``False``.
"""
with _translate_failures():
return ru.apply_expiry(self._client, self._owner_key, expiry,
prior_version=self._redis_version)
def __lt__(self, other):
if self.created_on == other.created_on:
return self.sequence < other.sequence
else:
return self.created_on < other.created_on
@property
def created_on(self):
return self._created_on
@property
def last_modified(self):
with _translate_failures():
raw_last_modified = self._client.get(self._last_modified_key)
last_modified = None
if raw_last_modified:
last_modified = self._board._loads(
raw_last_modified, root_types=(datetime.datetime,))
# NOTE(harlowja): just in case this is somehow busted (due to time
# sync issues/other), give back the most recent one (since redis
# does not maintain clock information; this can happen because
# clients who mutate jobs also send the time in).
last_modified = max(last_modified, self._created_on)
return last_modified
@property
def state(self):
listings_key = self._board.listings_key
owner_key = self._owner_key
listings_sub_key = self._key
def _do_fetch(p):
# NOTE(harlowja): the state of a job in redis is not stored in any
# explicit 'state' field, but is derived from which keys exist in
# redis instead (i.e. if an owner key exists, then we know an owner
# is active; if the job data exists but no owner key does, then the
# job is unclaimed; and so on)...
p.multi()
p.hexists(listings_key, listings_sub_key)
p.exists(owner_key)
job_exists, owner_exists = p.execute()
if not job_exists:
if owner_exists:
# This should **not** be possible due to lua code ordering
# but let's log an INFO statement if it does happen (so
# that it can be investigated)...
LOG.info("Unexpected owner key found at '%s' when job"
" key '%s[%s]' was not found", owner_key,
listings_key, listings_sub_key)
return states.COMPLETE
else:
if owner_exists:
return states.CLAIMED
else:
return states.UNCLAIMED
with _translate_failures():
return self._client.transaction(_do_fetch,
listings_key, owner_key,
value_from_callable=True)
def __str__(self):
"""Pretty formats the job into something *more* meaningful."""
tpl = "%s: %s (uuid=%s, owner_key=%s, sequence=%s, details=%s)"
return tpl % (type(self).__name__,
self.name, self.uuid, self.owner_key,
self.sequence, self.details)
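A rough sketch of how the two expiry helpers above might be used to keep a claim alive while work is still in progress (``still_working`` is a hypothetical progress check and the 30/10 second values are arbitrary; this presumes the job was claimed with an ``expiry``)::

    import time

    from taskflow.utils import redis_utils as ru

    def keep_claim_alive(job, expiry=30, poll_every=10):
        # Periodically push the claim expiry forward so the claim does not
        # lapse while the work the job describes is still running.
        while still_working(job):  # hypothetical progress check
            left = job.expires_in()
            if left in (ru.UnknownExpire.DOES_NOT_EXPIRE,
                        ru.UnknownExpire.KEY_NOT_FOUND):
                break
            if left < poll_every:
                job.extend_expiry(expiry)
            time.sleep(poll_every)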
class RedisJobBoard(base.JobBoard):
"""A jobboard backed by `redis`_.
Powered by the `redis-py <http://redis-py.readthedocs.org/>`_ library.
This jobboard creates job entries by listing jobs in a redis `hash`_. This
hash contains jobs that can be actively worked on by (and examined/claimed
by) some set of eligible consumers. Job posting is typically performed
using the :meth:`.post` method (this creates a hash entry with job
contents/details encoded in `msgpack`_). The users of these
jobboard(s) (potentially on disjoint sets of machines) can then
iterate over the available jobs and decide if they want to attempt to
claim one of the jobs they have iterated over. If so they will then
attempt to contact redis and they will attempt to create a key in
redis (using an embedded lua script to perform this atomically) to claim a
desired job. If the entity trying to use the jobboard to :meth:`.claim`
the job is able to create that lock/owner key then it will be
allowed (and expected) to perform whatever *work* the contents of that
job described. Once the claiming entity is finished the lock/owner key
and the `hash`_ entry will be deleted (if successfully completed) in a
single request (also using an embedded lua script to perform this
atomically). If the claiming entity is not successful (or the entity
that claimed the job dies) the lock/owner key can be released
automatically (by **optional** usage of a claim expiry) or by
using :meth:`.abandon` to manually abandon the job so that it can be
consumed/worked on by others.
NOTE(harlowja): by default the :meth:`.claim` has no expiry (which
means claims will be persistent, even under claiming entity failure). To
ensure an expiry occurs, pass a numeric value for the ``expiry`` keyword
argument to the :meth:`.claim` method that defines how many seconds the
claim should be retained for. When an expiry is used ensure that the
claim is kept alive while it is being worked on by using
the :py:meth:`~.RedisJob.extend_expiry` method periodically.
.. _msgpack: http://msgpack.org/
.. _redis: http://redis.io/
.. _hash: http://redis.io/topics/data-types#hashes
"""
CLIENT_CONF_TRANSFERS = tuple([
# Host config...
('host', str),
('port', int),
# See: http://redis.io/commands/auth
('password', str),
# Data encoding/decoding + error handling
('encoding', str),
('encoding_errors', str),
# Connection settings.
('socket_timeout', float),
('socket_connect_timeout', float),
# This one negates the usage of host, port, socket connection
# settings as it doesn't use the same kind of underlying socket...
('unix_socket_path', str),
# Do you want ssl?
('ssl', strutils.bool_from_string),
('ssl_keyfile', str),
('ssl_certfile', str),
('ssl_cert_reqs', str),
('ssl_ca_certs', str),
# See: http://www.rediscookbook.org/multiple_databases.html
('db', int),
])
"""
Keys (and value type converters) that we allow to proxy from the jobboard
configuration into the redis client (used to configure the redis client
internals if no explicit client is provided via the ``client`` keyword
argument).
See: http://redis-py.readthedocs.org/en/latest/#redis.Redis
See: https://github.com/andymccurdy/redis-py/blob/2.10.3/redis/client.py
"""
#: Postfix (combined with job key) used to make a jobs owner key.
OWNED_POSTFIX = b".owned"
#: Postfix (combined with job key) used to make a jobs last modified key.
LAST_MODIFIED_POSTFIX = b".last_modified"
#: Default namespace for keys when none is provided.
DEFAULT_NAMESPACE = b'taskflow'
MIN_REDIS_VERSION = (2, 6)
"""
Minimum redis version this backend requires.
This version is required since we need the built-in server-side lua
scripting support that is included in 2.6 and newer.
"""
NAMESPACE_SEP = b':'
"""
Separator that is used to combine a key with the namespace (to get
the **actual** key that will be used).
"""
KEY_PIECE_SEP = b'.'
"""
Separator that is used to combine a bunch of key pieces together (to get
the **actual** key that will be used).
"""
#: Expected lua response status field when call is ok.
SCRIPT_STATUS_OK = "ok"
#: Expected lua response status field when call is **not** ok.
SCRIPT_STATUS_ERROR = "error"
#: Expected lua script error response when the owner is not as expected.
SCRIPT_NOT_EXPECTED_OWNER = "Not expected owner!"
#: Expected lua script error response when the owner is not findable.
SCRIPT_UNKNOWN_OWNER = "Unknown owner!"
#: Expected lua script error response when the job is not findable.
SCRIPT_UNKNOWN_JOB = "Unknown job!"
#: Expected lua script error response when the job is already claimed.
SCRIPT_ALREADY_CLAIMED = "Job already claimed!"
SCRIPT_TEMPLATES = {
'consume': """
-- Extract *all* the variables (so we can easily know what they are)...
local owner_key = KEYS[1]
local listings_key = KEYS[2]
local last_modified_key = KEYS[3]
local expected_owner = ARGV[1]
local job_key = ARGV[2]
local result = {}
if redis.call("hexists", listings_key, job_key) == 1 then
if redis.call("exists", owner_key) == 1 then
local owner = redis.call("get", owner_key)
if owner ~= expected_owner then
result["status"] = "${error}"
result["reason"] = "${not_expected_owner}"
result["owner"] = owner
else
-- The order is important here, delete the owner first (and if
-- that blows up, the job data will still exist so it can be
-- worked on again, instead of the reverse)...
redis.call("del", owner_key, last_modified_key)
redis.call("hdel", listings_key, job_key)
result["status"] = "${ok}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_owner}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_job}"
end
return cmsgpack.pack(result)
""",
'claim': """
local function apply_ttl(key, ms_expiry)
if ms_expiry ~= nil then
redis.call("pexpire", key, ms_expiry)
end
end
-- Extract *all* the variables (so we can easily know what they are)...
local owner_key = KEYS[1]
local listings_key = KEYS[2]
local last_modified_key = KEYS[3]
local expected_owner = ARGV[1]
local job_key = ARGV[2]
local last_modified_blob = ARGV[3]
-- If this is non-numeric (which it may be) this becomes nil
local ms_expiry = nil
if ARGV[4] ~= "none" then
ms_expiry = tonumber(ARGV[4])
end
local result = {}
if redis.call("hexists", listings_key, job_key) == 1 then
if redis.call("exists", owner_key) == 1 then
local owner = redis.call("get", owner_key)
if owner == expected_owner then
-- Owner is the same, leave it alone...
redis.call("set", last_modified_key, last_modified_blob)
apply_ttl(owner_key, ms_expiry)
result["status"] = "${ok}"
else
result["status"] = "${error}"
result["reason"] = "${already_claimed}"
result["owner"] = owner
end
else
redis.call("set", owner_key, expected_owner)
redis.call("set", last_modified_key, last_modified_blob)
apply_ttl(owner_key, ms_expiry)
result["status"] = "${ok}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_job}"
end
return cmsgpack.pack(result)
""",
'abandon': """
-- Extract *all* the variables (so we can easily know what they are)...
local owner_key = KEYS[1]
local listings_key = KEYS[2]
local last_modified_key = KEYS[3]
local expected_owner = ARGV[1]
local job_key = ARGV[2]
local last_modified_blob = ARGV[3]
local result = {}
if redis.call("hexists", listings_key, job_key) == 1 then
if redis.call("exists", owner_key) == 1 then
local owner = redis.call("get", owner_key)
if owner ~= expected_owner then
result["status"] = "${error}"
result["reason"] = "${not_expected_owner}"
result["owner"] = owner
else
redis.call("del", owner_key)
redis.call("set", last_modified_key, last_modified_blob)
result["status"] = "${ok}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_owner}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_job}"
end
return cmsgpack.pack(result)
""",
'trash': """
-- Extract *all* the variables (so we can easily know what they are)...
local owner_key = KEYS[1]
local listings_key = KEYS[2]
local last_modified_key = KEYS[3]
local trash_listings_key = KEYS[4]
local expected_owner = ARGV[1]
local job_key = ARGV[2]
local last_modified_blob = ARGV[3]
local result = {}
if redis.call("hexists", listings_key, job_key) == 1 then
local raw_posting = redis.call("hget", listings_key, job_key)
if redis.call("exists", owner_key) == 1 then
local owner = redis.call("get", owner_key)
if owner ~= expected_owner then
result["status"] = "${error}"
result["reason"] = "${not_expected_owner}"
result["owner"] = owner
else
-- This ordering is important (try to first move the value
-- and only if that works do we try to do any deletions)...
redis.call("hset", trash_listings_key, job_key, raw_posting)
redis.call("set", last_modified_key, last_modified_blob)
redis.call("del", owner_key)
redis.call("hdel", listings_key, job_key)
result["status"] = "${ok}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_owner}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_job}"
end
return cmsgpack.pack(result)
""",
}
"""`Lua`_ **template** scripts that will be used by various methods (they
are turned into real scripts and loaded on call into the :func:`.connect`
method).
Some things to note:
- The lua script is run serially, so when this runs no other command will
be mutating the backend (and redis also ensures that no other script
will be running), so atomicity of these scripts is guaranteed by redis.
- Transactions were considered (and even mostly implemented) but
ultimately rejected since redis does not support rollbacks and
transactions can **not** be interdependent (later operations can **not**
depend on the results of earlier operations). Both of these issues limit
our ability to correctly report errors (with useful messages) and to
maintain consistency under failure/contention (due to the inability to
rollback). A third and final blow to using transactions was that to
correctly use them we would have to set a watch on a *very* contentious
key (the listings key), which would under load cause clients to retry more
often than would be desired (this also increases network load, CPU
cycles used, transaction failures triggered and so on).
- Partial transaction execution is possible due to pre/post ``EXEC``
failures (and the lack of rollback makes this worse).
So overall after thinking it over, it seemed like having small lua scripts
was not that bad (even if somewhat convoluted) due to the above and
publicly mentioned issues with transactions. In general, using lua scripts
for this purpose seems to be somewhat common practice and it solves the
issues that came up when transactions were considered & implemented.
Some links about redis (and redis + lua) that may be useful to look over:
- `Atomicity of scripts`_
- `Scripting and transactions`_
- `Why redis does not support rollbacks`_
- `Intro to lua for redis programmers`_
- `Five key takeaways for developing with redis`_
- `Everything you always wanted to know about redis`_ (slides)
.. _Lua: http://www.lua.org/
.. _Atomicity of scripts: http://redis.io/commands/eval#atomicity-of-\
scripts
.. _Scripting and transactions: http://redis.io/topics/transactions#redis-\
scripting-and-transactions
.. _Why redis does not support rollbacks: http://redis.io/topics/transa\
ctions#why-redis-does-not-suppo\
rt-roll-backs
.. _Intro to lua for redis programmers: http://www.redisgreen.net/blog/int\
ro-to-lua-for-redis-programmers
.. _Five key takeaways for developing with redis: https://redislabs.com/bl\
og/5-key-takeaways-fo\
r-developing-with-redis
.. _Everything you always wanted to know about redis: http://www.slidesh
are.net/carlosabal\
de/everything-you-a\
lways-wanted-to-\
know-about-redis-b\
ut-were-afraid-to-ask
"""
@classmethod
def _make_client(cls, conf):
client_conf = {}
for key, value_type_converter in cls.CLIENT_CONF_TRANSFERS:
if key in conf:
if value_type_converter is not None:
client_conf[key] = value_type_converter(conf[key])
else:
client_conf[key] = conf[key]
return ru.RedisClient(**client_conf)
def __init__(self, name, conf,
client=None, persistence=None):
super(RedisJobBoard, self).__init__(name, conf)
self._closed = True
if client is not None:
self._client = client
self._owns_client = False
else:
self._client = self._make_client(self._conf)
# NOTE(harlowja): This client should not work until connected...
self._client.close()
self._owns_client = True
self._namespace = self._conf.get('namespace', self.DEFAULT_NAMESPACE)
self._open_close_lock = threading.RLock()
# Redis server version connected to + scripts (populated on connect).
self._redis_version = None
self._scripts = {}
# The backend to load the full logbooks from, since what is sent over
# the data connection is only the logbook uuid and name, and not the
# full logbook.
self._persistence = persistence
def join(self, key_piece, *more_key_pieces):
"""Create and return a namespaced key from many segments.
NOTE(harlowja): all pieces that are text/unicode are converted into
their binary equivalent (if they are already binary no conversion
takes place) before being joined (as redis expects binary keys and not
unicode/text ones).
"""
namespace_pieces = []
if self._namespace is not None:
namespace_pieces = [self._namespace, self.NAMESPACE_SEP]
else:
namespace_pieces = []
key_pieces = [key_piece]
if more_key_pieces:
key_pieces.extend(more_key_pieces)
for i in compat_range(0, len(namespace_pieces)):
namespace_pieces[i] = misc.binary_encode(namespace_pieces[i])
for i in compat_range(0, len(key_pieces)):
key_pieces[i] = misc.binary_encode(key_pieces[i])
namespace = b"".join(namespace_pieces)
key = self.KEY_PIECE_SEP.join(key_pieces)
return namespace + key
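For example, with the default namespace the method above produces keys like the following (a sketch; the board construction shown is only for illustration)::

    board = RedisJobBoard('test-board', {})
    board.join(b"listings")      # -> b'taskflow:listings'
    board.join(b"job", b"1234")  # -> b'taskflow:job.1234'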
@property
def namespace(self):
"""The namespace all keys will be prefixed with (or none)."""
return self._namespace
@misc.cachedproperty
def trash_key(self):
"""Key where a hash will be stored with trashed jobs in it."""
return self.join(b"trash")
@misc.cachedproperty
def sequence_key(self):
"""Key where a integer will be stored (used to sequence jobs)."""
return self.join(b"sequence")
@misc.cachedproperty
def listings_key(self):
"""Key where a hash will be stored with active jobs in it."""
return self.join(b"listings")
@property
def job_count(self):
with _translate_failures():
return self._client.hlen(self.listings_key)
@property
def connected(self):
return not self._closed
@fasteners.locked(lock='_open_close_lock')
def connect(self):
self.close()
if self._owns_client:
self._client = self._make_client(self._conf)
with _translate_failures():
# The client maintains a connection pool, so do a ping and
# if that works then assume the connection works, which may or
# may not be continuously maintained (if the server dies
# at a later time, we will become aware of that when the next
# op occurs).
self._client.ping()
is_new_enough, redis_version = ru.is_server_new_enough(
self._client, self.MIN_REDIS_VERSION)
if not is_new_enough:
wanted_version = ".".join([str(p)
for p in self.MIN_REDIS_VERSION])
if redis_version:
raise exc.JobFailure("Redis version %s or greater is"
" required (version %s is to"
" old)" % (wanted_version,
redis_version))
else:
raise exc.JobFailure("Redis version %s or greater is"
" required" % (wanted_version))
else:
self._redis_version = redis_version
script_params = {
# Status field values.
'ok': self.SCRIPT_STATUS_OK,
'error': self.SCRIPT_STATUS_ERROR,
# Known error reasons (when status field is error).
'not_expected_owner': self.SCRIPT_NOT_EXPECTED_OWNER,
'unknown_owner': self.SCRIPT_UNKNOWN_OWNER,
'unknown_job': self.SCRIPT_UNKNOWN_JOB,
'already_claimed': self.SCRIPT_ALREADY_CLAIMED,
}
prepared_scripts = {}
for n, raw_script_tpl in six.iteritems(self.SCRIPT_TEMPLATES):
script_tpl = string.Template(raw_script_tpl)
script_blob = script_tpl.substitute(**script_params)
script = self._client.register_script(script_blob)
prepared_scripts[n] = script
self._scripts.update(prepared_scripts)
self._closed = False
@fasteners.locked(lock='_open_close_lock')
def close(self):
if self._owns_client:
self._client.close()
self._scripts.clear()
self._redis_version = None
self._closed = True
@staticmethod
def _dumps(obj):
try:
return msgpackutils.dumps(obj)
except (msgpack.PackException, ValueError):
# TODO(harlowja): remove direct msgpack exception access when
# oslo.utils provides easy access to the underlying msgpack
# pack/unpack exceptions..
exc.raise_with_cause(exc.JobFailure,
"Failed to serialize object to"
" msgpack blob")
@staticmethod
def _loads(blob, root_types=(dict,)):
try:
return misc.decode_msgpack(blob, root_types=root_types)
except (msgpack.UnpackException, ValueError):
# TODO(harlowja): remove direct msgpack exception access when
# oslo.utils provides easy access to the underlying msgpack
# pack/unpack exceptions..
exc.raise_with_cause(exc.JobFailure,
"Failed to deserialize object from"
" msgpack blob (of length %s)" % len(blob))
_decode_owner = staticmethod(misc.binary_decode)
_encode_owner = staticmethod(misc.binary_encode)
def find_owner(self, job):
owner_key = self.join(job.key + self.OWNED_POSTFIX)
with _translate_failures():
raw_owner = self._client.get(owner_key)
return self._decode_owner(raw_owner)
def post(self, name, book=None, details=None):
job_uuid = uuidutils.generate_uuid()
posting = base.format_posting(job_uuid, name,
created_on=timeutils.utcnow(),
book=book, details=details)
with _translate_failures():
sequence = self._client.incr(self.sequence_key)
posting.update({
'sequence': sequence,
})
with _translate_failures():
raw_posting = self._dumps(posting)
raw_job_uuid = six.b(job_uuid)
was_posted = bool(self._client.hsetnx(self.listings_key,
raw_job_uuid, raw_posting))
if not was_posted:
raise exc.JobFailure("New job located at '%s[%s]' could not"
" be posted" % (self.listings_key,
raw_job_uuid))
else:
return RedisJob(self, name, sequence, raw_job_uuid,
uuid=job_uuid, details=details,
created_on=posting['created_on'],
book=book, book_data=posting.get('book'),
backend=self._persistence)
def wait(self, timeout=None, initial_delay=0.005,
max_delay=1.0, sleep_func=time.sleep):
if initial_delay > max_delay:
raise ValueError("Initial delay %s must be less than or equal"
" to the provided max delay %s"
% (initial_delay, max_delay))
# This does a spin-loop that backs off by doubling the delay
# up to the provided max-delay. In the future we could try having
# a secondary client connected into redis pubsub and use that
# instead, but for now this is simpler.
w = timeutils.StopWatch(duration=timeout)
w.start()
delay = initial_delay
while True:
jc = self.job_count
if jc > 0:
it = self.iterjobs()
return it
else:
if w.expired():
raise exc.NotFound("Expired waiting for jobs to"
" arrive; waited %s seconds"
% w.elapsed())
else:
remaining = w.leftover(return_none=True)
if remaining is not None:
delay = min(delay * 2, remaining, max_delay)
else:
delay = min(delay * 2, max_delay)
sleep_func(delay)
def iterjobs(self, only_unclaimed=False, ensure_fresh=False):
with _translate_failures():
raw_postings = self._client.hgetall(self.listings_key)
postings = []
for raw_job_key, raw_posting in six.iteritems(raw_postings):
posting = self._loads(raw_posting)
details = posting.get('details', {})
job_uuid = posting['uuid']
job = RedisJob(self, posting['name'], posting['sequence'],
raw_job_key, uuid=job_uuid, details=details,
created_on=posting['created_on'],
book_data=posting.get('book'),
backend=self._persistence)
postings.append(job)
postings = sorted(postings)
for job in postings:
if only_unclaimed:
if job.state == states.UNCLAIMED:
yield job
else:
yield job
@base.check_who
def consume(self, job, who):
script = self._get_script('consume')
with _translate_failures():
raw_who = self._encode_owner(who)
raw_result = script(keys=[job.owner_key, self.listings_key,
job.last_modified_key],
args=[raw_who, job.key])
result = self._loads(raw_result)
status = result['status']
if status != self.SCRIPT_STATUS_OK:
reason = result.get('reason')
if reason == self.SCRIPT_UNKNOWN_JOB:
raise exc.NotFound("Job %s not found to be"
" consumed" % (job.uuid))
elif reason == self.SCRIPT_UNKNOWN_OWNER:
raise exc.NotFound("Can not consume job %s"
" which we can not determine"
" the owner of" % (job.uuid))
elif reason == self.SCRIPT_NOT_EXPECTED_OWNER:
raw_owner = result.get('owner')
if raw_owner:
owner = self._decode_owner(raw_owner)
raise exc.JobFailure("Can not consume job %s"
" which is not owned by %s (it is"
" actively owned by %s)"
% (job.uuid, who, owner))
else:
raise exc.JobFailure("Can not consume job %s"
" which is not owned by %s"
% (job.uuid, who))
else:
raise exc.JobFailure("Failure to consume job %s,"
" unknown internal error (reason=%s)"
% (job.uuid, reason))
@base.check_who
def claim(self, job, who, expiry=None):
if expiry is None:
# On the lua side none doesn't translate to nil so we have
# to do this string conversion to make sure that we can tell
# the difference.
ms_expiry = "none"
else:
ms_expiry = int(expiry * 1000.0)
if ms_expiry <= 0:
raise ValueError("Provided expiry (when converted to"
" milliseconds) must be greater"
" than zero instead of %s" % (expiry))
script = self._get_script('claim')
with _translate_failures():
raw_who = self._encode_owner(who)
raw_result = script(keys=[job.owner_key, self.listings_key,
job.last_modified_key],
args=[raw_who, job.key,
# NOTE(harlowja): we need to send this
# in as a blob (even if it's not
# set/used), since the format can not
# currently be created in lua...
self._dumps(timeutils.utcnow()),
ms_expiry])
result = self._loads(raw_result)
status = result['status']
if status != self.SCRIPT_STATUS_OK:
reason = result.get('reason')
if reason == self.SCRIPT_UNKNOWN_JOB:
raise exc.NotFound("Job %s not found to be"
" claimed" % (job.uuid))
elif reason == self.SCRIPT_ALREADY_CLAIMED:
raw_owner = result.get('owner')
if raw_owner:
owner = self._decode_owner(raw_owner)
raise exc.UnclaimableJob("Job %s already"
" claimed by %s"
% (job.uuid, owner))
else:
raise exc.UnclaimableJob("Job %s already"
" claimed" % (job.uuid))
else:
raise exc.JobFailure("Failure to claim job %s,"
" unknown internal error (reason=%s)"
% (job.uuid, reason))
@base.check_who
def abandon(self, job, who):
script = self._get_script('abandon')
with _translate_failures():
raw_who = self._encode_owner(who)
raw_result = script(keys=[job.owner_key, self.listings_key,
job.last_modified_key],
args=[raw_who, job.key,
self._dumps(timeutils.utcnow())])
result = self._loads(raw_result)
status = result.get('status')
if status != self.SCRIPT_STATUS_OK:
reason = result.get('reason')
if reason == self.SCRIPT_UNKNOWN_JOB:
raise exc.NotFound("Job %s not found to be"
" abandoned" % (job.uuid))
elif reason == self.SCRIPT_UNKNOWN_OWNER:
raise exc.NotFound("Can not abandon job %s"
" which we can not determine"
" the owner of" % (job.uuid))
elif reason == self.SCRIPT_NOT_EXPECTED_OWNER:
raw_owner = result.get('owner')
if raw_owner:
owner = self._decode_owner(raw_owner)
raise exc.JobFailure("Can not abandon job %s"
" which is not owned by %s (it is"
" actively owned by %s)"
% (job.uuid, who, owner))
else:
raise exc.JobFailure("Can not abandon job %s"
" which is not owned by %s"
% (job.uuid, who))
else:
raise exc.JobFailure("Failure to abandon job %s,"
" unknown internal"
" error (status=%s, reason=%s)"
% (job.uuid, status, reason))
def _get_script(self, name):
try:
return self._scripts[name]
except KeyError:
exc.raise_with_cause(exc.NotFound,
"Can not access %s script (has this"
" board been connected?)" % name)
@base.check_who
def trash(self, job, who):
script = self._get_script('trash')
with _translate_failures():
raw_who = self._encode_owner(who)
raw_result = script(keys=[job.owner_key, self.listings_key,
job.last_modified_key, self.trash_key],
args=[raw_who, job.key,
self._dumps(timeutils.utcnow())])
result = self._loads(raw_result)
status = result['status']
if status != self.SCRIPT_STATUS_OK:
reason = result.get('reason')
if reason == self.SCRIPT_UNKNOWN_JOB:
raise exc.NotFound("Job %s not found to be"
" trashed" % (job.uuid))
elif reason == self.SCRIPT_UNKNOWN_OWNER:
raise exc.NotFound("Can not trash job %s"
" which we can not determine"
" the owner of" % (job.uuid))
elif reason == self.SCRIPT_NOT_EXPECTED_OWNER:
raw_owner = result.get('owner')
if raw_owner:
owner = self._decode_owner(raw_owner)
raise exc.JobFailure("Can not trash job %s"
" which is not owned by %s (it is"
" actively owned by %s)"
% (job.uuid, who, owner))
else:
raise exc.JobFailure("Can not trash job %s"
" which is not owned by %s"
% (job.uuid, who))
else:
raise exc.JobFailure("Failure to trash job %s,"
" unknown internal error (reason=%s)"
% (job.uuid, reason))
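Putting the pieces above together, a rough end-to-end sketch of posting, claiming and consuming a job on such a board (the module path, connection details, names and the 30 second expiry are assumptions for illustration)::

    from taskflow import exceptions as exc
    from taskflow.jobs.backends import impl_redis

    board = impl_redis.RedisJobBoard('my-board', {'host': '127.0.0.1',
                                                  'port': 6379})
    board.connect()
    try:
        board.post('resize-vm', details={'vm': 'vm-1'})
        for job in board.iterjobs(only_unclaimed=True):
            try:
                # Claim with an expiry so a dead worker's claim releases
                # automatically instead of lingering forever.
                board.claim(job, 'worker-1', expiry=30)
            except (exc.UnclaimableJob, exc.NotFound):
                continue
            # ... perform whatever work the job details describe ...
            board.consume(job, 'worker-1')
    finally:
        board.close()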

View File

@@ -17,14 +17,17 @@
import collections
import contextlib
import functools
import sys
import threading
from concurrent import futures
import fasteners
import futurist
from kazoo import exceptions as k_exceptions
from kazoo.protocol import paths as k_paths
from kazoo.recipe import watchers
from oslo_serialization import jsonutils
from oslo_utils import excutils
from oslo_utils import timeutils
from oslo_utils import uuidutils
import six
@@ -32,66 +35,39 @@ from taskflow import exceptions as excp
from taskflow.jobs import base
from taskflow import logging
from taskflow import states
from taskflow.types import timing as tt
from taskflow.utils import kazoo_utils
from taskflow.utils import lock_utils
from taskflow.utils import misc
LOG = logging.getLogger(__name__)
UNCLAIMED_JOB_STATES = (
states.UNCLAIMED,
)
ALL_JOB_STATES = (
states.UNCLAIMED,
states.COMPLETE,
states.CLAIMED,
)
# Transaction support was added in 3.4.0
MIN_ZK_VERSION = (3, 4, 0)
LOCK_POSTFIX = ".lock"
JOB_PREFIX = 'job'
def _check_who(who):
if not isinstance(who, six.string_types):
raise TypeError("Job applicant must be a string type")
if len(who) == 0:
raise ValueError("Job applicant must be non-empty")
class ZookeeperJob(base.Job):
"""A zookeeper job."""
def __init__(self, name, board, client, backend, path,
def __init__(self, board, name, client, path,
uuid=None, details=None, book=None, book_data=None,
created_on=None):
super(ZookeeperJob, self).__init__(name, uuid=uuid, details=details)
self._board = board
self._book = book
if not book_data:
book_data = {}
self._book_data = book_data
created_on=None, backend=None):
super(ZookeeperJob, self).__init__(board, name,
uuid=uuid, details=details,
backend=backend,
book=book, book_data=book_data)
self._client = client
self._backend = backend
if all((self._book, self._book_data)):
raise ValueError("Only one of 'book_data' or 'book'"
" can be provided")
self._path = k_paths.normpath(path)
self._lock_path = path + LOCK_POSTFIX
self._lock_path = self._path + board.LOCK_POSTFIX
self._created_on = created_on
self._node_not_found = False
basename = k_paths.basename(self._path)
self._root = self._path[0:-len(basename)]
self._sequence = int(basename[len(JOB_PREFIX):])
self._sequence = int(basename[len(board.JOB_PREFIX):])
@property
def lock_path(self):
"""Path the job lock/claim and owner znode is stored."""
return self._lock_path
@property
def path(self):
"""Path the job data znode is stored."""
return self._path
@property
@@ -112,22 +88,27 @@ class ZookeeperJob(base.Job):
return trans_func(attr)
else:
return attr
except k_exceptions.NoNodeError as e:
raise excp.NotFound("Can not fetch the %r attribute"
" of job %s (%s), path %s not found"
% (attr_name, self.uuid, self.path, path), e)
except self._client.handler.timeout_exception as e:
raise excp.JobFailure("Can not fetch the %r attribute"
" of job %s (%s), operation timed out"
% (attr_name, self.uuid, self.path), e)
except k_exceptions.SessionExpiredError as e:
raise excp.JobFailure("Can not fetch the %r attribute"
" of job %s (%s), session expired"
% (attr_name, self.uuid, self.path), e)
except (AttributeError, k_exceptions.KazooException) as e:
raise excp.JobFailure("Can not fetch the %r attribute"
" of job %s (%s), internal error" %
(attr_name, self.uuid, self.path), e)
except k_exceptions.NoNodeError:
excp.raise_with_cause(
excp.NotFound,
"Can not fetch the %r attribute of job %s (%s),"
" path %s not found" % (attr_name, self.uuid,
self.path, path))
except self._client.handler.timeout_exception:
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the %r attribute of job %s (%s),"
" operation timed out" % (attr_name, self.uuid, self.path))
except k_exceptions.SessionExpiredError:
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the %r attribute of job %s (%s),"
" session expired" % (attr_name, self.uuid, self.path))
except (AttributeError, k_exceptions.KazooException):
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the %r attribute of job %s (%s),"
" internal error" % (attr_name, self.uuid, self.path))
@property
def last_modified(self):
@@ -155,23 +136,6 @@ class ZookeeperJob(base.Job):
self._node_not_found = True
return self._created_on
@property
def board(self):
return self._board
def _load_book(self):
book_uuid = self.book_uuid
if self._backend is not None and book_uuid is not None:
# TODO(harlowja): we are currently limited by assuming that the
# job posted has the same backend as this loader (to start this
# seems to be a ok assumption, and can be adjusted in the future
# if we determine there is a use-case for multi-backend loaders,
# aka a registry of loaders).
with contextlib.closing(self._backend.get_connection()) as conn:
return conn.get_logbook(book_uuid)
# No backend to fetch from or no uuid specified
return None
@property
def state(self):
owner = self.board.find_owner(self)
@@ -181,15 +145,21 @@ class ZookeeperJob(base.Job):
job_data = misc.decode_json(raw_data)
except k_exceptions.NoNodeError:
pass
except k_exceptions.SessionExpiredError as e:
raise excp.JobFailure("Can not fetch the state of %s,"
" session expired" % (self.uuid), e)
except self._client.handler.timeout_exception as e:
raise excp.JobFailure("Can not fetch the state of %s,"
" operation timed out" % (self.uuid), e)
except k_exceptions.KazooException as e:
raise excp.JobFailure("Can not fetch the state of %s, internal"
" error" % (self.uuid), e)
except k_exceptions.SessionExpiredError:
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the state of %s,"
" session expired" % (self.uuid))
except self._client.handler.timeout_exception:
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the state of %s,"
" operation timed out" % (self.uuid))
except k_exceptions.KazooException:
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the state of %s,"
" internal error" % (self.uuid))
if not job_data:
# No data this job has been completed (the owner that we might have
# fetched will not be able to be fetched again, since the job node
@@ -209,30 +179,6 @@ class ZookeeperJob(base.Job):
def __hash__(self):
return hash(self.path)
@property
def book(self):
if self._book is None:
self._book = self._load_book()
return self._book
@property
def book_uuid(self):
if self._book:
return self._book.uuid
if self._book_data:
return self._book_data.get('uuid')
else:
return None
@property
def book_name(self):
if self._book:
return self._book.name
if self._book_data:
return self._book_data.get('name')
else:
return None
class ZookeeperJobBoardIterator(six.Iterator):
"""Iterator over a zookeeper jobboard that iterates over potential jobs.
@@ -246,6 +192,16 @@ class ZookeeperJobBoardIterator(six.Iterator):
over unclaimed jobs.
"""
_UNCLAIMED_JOB_STATES = (
states.UNCLAIMED,
)
_JOB_STATES = (
states.UNCLAIMED,
states.COMPLETE,
states.CLAIMED,
)
def __init__(self, board, only_unclaimed=False, ensure_fresh=False):
self._board = board
self._jobs = collections.deque()
@@ -255,6 +211,7 @@ class ZookeeperJobBoardIterator(six.Iterator):
@property
def board(self):
"""The board this iterator was created from."""
return self._board
def __iter__(self):
@@ -262,9 +219,9 @@ class ZookeeperJobBoardIterator(six.Iterator):
def _next_job(self):
if self.only_unclaimed:
allowed_states = UNCLAIMED_JOB_STATES
allowed_states = self._UNCLAIMED_JOB_STATES
else:
allowed_states = ALL_JOB_STATES
allowed_states = self._JOB_STATES
job = None
while self._jobs and job is None:
maybe_job = self._jobs.popleft()
@@ -292,29 +249,49 @@ class ZookeeperJobBoardIterator(six.Iterator):
class ZookeeperJobBoard(base.NotifyingJobBoard):
"""A jobboard backend by zookeeper.
"""A jobboard backed by `zookeeper`_.
Powered by the `kazoo <http://kazoo.readthedocs.org/>`_ library.
This jobboard creates *sequenced* persistent znodes in a directory in
zookeeper (that directory defaults ``/taskflow/jobs``) and uses zookeeper
watches to notify other jobboards that the job which was posted using the
:meth:`.post` method (this creates a znode with contents/details in json)
The users of those jobboard(s) (potentially on disjoint sets of machines)
can then iterate over the available jobs and decide if they want to attempt
to claim one of the jobs they have iterated over. If so they will then
attempt to contact zookeeper and will attempt to create a ephemeral znode
using the name of the persistent znode + ".lock" as a postfix. If the
entity trying to use the jobboard to :meth:`.claim` the job is able to
create a ephemeral znode with that name then it will be allowed (and
expected) to perform whatever *work* the contents of that job that it
locked described. Once finished the ephemeral znode and persistent znode
may be deleted (if successfully completed) in a single transcation or if
not successfull (or the entity that claimed the znode dies) the ephemeral
znode will be released (either manually by using :meth:`.abandon` or
automatically by zookeeper the ephemeral is deemed to be lost).
zookeeper and uses zookeeper watches to notify other jobboards of
jobs which were posted using the :meth:`.post` method (this creates a
znode with job contents/details encoded in `json`_). The users of these
jobboard(s) (potentially on disjoint sets of machines) can then iterate
over the available jobs and decide if they want
to attempt to claim one of the jobs they have iterated over. If so they
will then attempt to contact zookeeper and they will attempt to create an
ephemeral znode using the name of the persistent znode + ".lock" as a
postfix. If the entity trying to use the jobboard to :meth:`.claim` the
job is able to create an ephemeral znode with that name then it will be
allowed (and expected) to perform whatever *work* the contents of that
job described. Once the claiming entity is finished the ephemeral znode
and persistent znode will be deleted (if successfully completed) in a
single transaction. If the claiming entity is not successful (or the
entity that claimed the znode dies) the ephemeral znode will be
released (either manually by using :meth:`.abandon` or automatically by
zookeeper when the ephemeral node and associated session is deemed to
have been lost).
.. _zookeeper: http://zookeeper.apache.org/
.. _json: http://json.org/
"""
#: Transaction support was added in 3.4.0 so we need at least that version.
MIN_ZK_VERSION = (3, 4, 0)
#: Znode **postfix** that lock entries have.
LOCK_POSTFIX = ".lock"
#: Znode child path created under root path that contains trashed jobs.
TRASH_FOLDER = ".trash"
#: Znode **prefix** that job entries have.
JOB_PREFIX = 'job'
#: Default znode path used for jobs (data, locks...).
DEFAULT_PATH = "/taskflow/jobs"
def __init__(self, name, conf,
client=None, persistence=None, emit_notifications=True):
super(ZookeeperJobBoard, self).__init__(name, conf)
@@ -324,17 +301,17 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
else:
self._client = kazoo_utils.make_client(self._conf)
self._owned = True
path = str(conf.get("path", "/taskflow/jobs"))
path = str(conf.get("path", self.DEFAULT_PATH))
if not path:
raise ValueError("Empty zookeeper path is disallowed")
if not k_paths.isabs(path):
raise ValueError("Zookeeper path must be absolute")
self._path = path
# The backend to load the full logbooks from, since whats sent over
# the zookeeper data connection is only the logbook uuid and name, and
# not currently the full logbook (later when a zookeeper backend
# appears we can likely optimize for that backend usage by directly
# reading from the path where the data is stored, if we want).
self._trash_path = self._path.replace(k_paths.basename(self._path),
self.TRASH_FOLDER)
# The backend to load the full logbooks from, since what is sent over
# the data connection is only the logbook uuid and name, and not the
# full logbook.
self._persistence = persistence
# Misc. internal details
self._known_jobs = {}
@@ -345,23 +322,34 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
self._job_watcher = None
# Since we use sequenced ids this will be the path that the sequences
# are prefixed with, for example, job0000000001, job0000000002, ...
self._job_base = k_paths.join(path, JOB_PREFIX)
self._job_base = k_paths.join(path, self.JOB_PREFIX)
self._worker = None
self._emit_notifications = bool(emit_notifications)
self._connected = False
def _emit(self, state, details):
# Submit the work to the executor to avoid blocking the kazoo queue.
# Submit the work to the executor to avoid blocking the kazoo threads
# and queue(s)...
worker = self._worker
if worker is None:
return
try:
self._worker.submit(self.notifier.notify, state, details)
except (AttributeError, RuntimeError):
# Notification thread is shutdown or non-existent, either case we
# just want to skip submitting a notification...
worker.submit(self.notifier.notify, state, details)
except RuntimeError:
# Notification thread is shutdown just skip submitting a
# notification...
pass
@property
def path(self):
"""Path where all job znodes will be stored."""
return self._path
@property
def trash_path(self):
"""Path where all trashed job znodes will be stored."""
return self._trash_path
@property
def job_count(self):
return len(self._known_jobs)
@@ -375,15 +363,17 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
def _force_refresh(self):
try:
children = self._client.get_children(self.path)
except self._client.handler.timeout_exception as e:
raise excp.JobFailure("Refreshing failure, operation timed out",
e)
except k_exceptions.SessionExpiredError as e:
raise excp.JobFailure("Refreshing failure, session expired", e)
except self._client.handler.timeout_exception:
excp.raise_with_cause(excp.JobFailure,
"Refreshing failure, operation timed out")
except k_exceptions.SessionExpiredError:
excp.raise_with_cause(excp.JobFailure,
"Refreshing failure, session expired")
except k_exceptions.NoNodeError:
pass
except k_exceptions.KazooException as e:
raise excp.JobFailure("Refreshing failure, internal error", e)
except k_exceptions.KazooException:
excp.raise_with_cause(excp.JobFailure,
"Refreshing failure, internal error")
else:
self._on_job_posting(children, delayed=False)
@@ -429,8 +419,9 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
# jobs information into the known job set (if it's already
# existing then just leave it alone).
if path not in self._known_jobs:
job = ZookeeperJob(job_data['name'], self,
self._client, self._persistence, path,
job = ZookeeperJob(self, job_data['name'],
self._client, path,
backend=self._persistence,
uuid=job_data['uuid'],
book_data=job_data.get("book"),
details=job_data.get("details", {}),
@@ -444,7 +435,8 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
LOG.debug("Got children %s under path %s", children, self.path)
child_paths = []
for c in children:
if c.endswith(LOCK_POSTFIX) or not c.startswith(JOB_PREFIX):
if (c.endswith(self.LOCK_POSTFIX) or
not c.startswith(self.JOB_PREFIX)):
# Skip lock paths or non-job-paths (these are not valid jobs)
continue
child_paths.append(k_paths.join(self.path, c))
@@ -488,45 +480,31 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
self._process_child(path, request)
def post(self, name, book=None, details=None):
def format_posting(job_uuid):
posting = {
'uuid': job_uuid,
'name': name,
}
if details:
posting['details'] = details
else:
posting['details'] = {}
if book is not None:
posting['book'] = {
'name': book.name,
'uuid': book.uuid,
}
return posting
# NOTE(harlowja): Jobs are not ephemeral, they will persist until they
# are consumed (this may change later, but seems safer to do this until
# further notice).
job_uuid = uuidutils.generate_uuid()
job_posting = base.format_posting(job_uuid, name,
book=book, details=details)
raw_job_posting = misc.binary_encode(jsonutils.dumps(job_posting))
with self._wrap(job_uuid, None,
"Posting failure: %s", ensure_known=False):
job_posting = format_posting(job_uuid)
job_posting = misc.binary_encode(jsonutils.dumps(job_posting))
fail_msg_tpl="Posting failure: %s",
ensure_known=False):
job_path = self._client.create(self._job_base,
value=job_posting,
value=raw_job_posting,
sequence=True,
ephemeral=False)
job = ZookeeperJob(name, self, self._client,
self._persistence, job_path,
book=book, details=details,
uuid=job_uuid)
job = ZookeeperJob(self, name, self._client, job_path,
backend=self._persistence,
book=book, details=details, uuid=job_uuid,
book_data=job_posting.get('book'))
with self._job_cond:
self._known_jobs[job_path] = job
self._job_cond.notify_all()
self._emit(base.POSTED, details={'job': job})
return job
@base.check_who
def claim(self, job, who):
def _unclaimable_try_find_owner(cause):
try:
@@ -534,13 +512,14 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
except Exception:
owner = None
if owner:
msg = "Job %s already claimed by '%s'" % (job.uuid, owner)
message = "Job %s already claimed by '%s'" % (job.uuid, owner)
else:
msg = "Job %s already claimed" % (job.uuid)
return excp.UnclaimableJob(msg, cause)
message = "Job %s already claimed" % (job.uuid)
excp.raise_with_cause(excp.UnclaimableJob,
message, cause=cause)
_check_who(who)
with self._wrap(job.uuid, job.path, "Claiming failure: %s"):
with self._wrap(job.uuid, job.path,
fail_msg_tpl="Claiming failure: %s"):
# NOTE(harlowja): post as json which will allow for future changes
# more easily than a raw string/text.
value = jsonutils.dumps({
@@ -558,21 +537,23 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
try:
kazoo_utils.checked_commit(txn)
except k_exceptions.NodeExistsError as e:
raise _unclaimable_try_find_owner(e)
_unclaimable_try_find_owner(e)
except kazoo_utils.KazooTransactionException as e:
if len(e.failures) < 2:
raise
else:
if isinstance(e.failures[0], k_exceptions.NoNodeError):
raise excp.NotFound(
excp.raise_with_cause(
excp.NotFound,
"Job %s not found to be claimed" % job.uuid,
e.failures[0])
cause=e.failures[0])
if isinstance(e.failures[1], k_exceptions.NodeExistsError):
raise _unclaimable_try_find_owner(e.failures[1])
_unclaimable_try_find_owner(e.failures[1])
else:
raise excp.UnclaimableJob(
excp.raise_with_cause(
excp.UnclaimableJob,
"Job %s claim failed due to transaction"
" not succeeding" % (job.uuid), e)
" not succeeding" % (job.uuid), cause=e)
@contextlib.contextmanager
def _wrap(self, job_uuid, job_path,
@@ -588,21 +569,23 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
raise excp.NotFound(fail_msg_tpl % (job_uuid))
try:
yield
except self._client.handler.timeout_exception as e:
except self._client.handler.timeout_exception:
fail_msg_tpl += ", operation timed out"
raise excp.JobFailure(fail_msg_tpl % (job_uuid), e)
except k_exceptions.SessionExpiredError as e:
excp.raise_with_cause(excp.JobFailure, fail_msg_tpl % (job_uuid))
except k_exceptions.SessionExpiredError:
fail_msg_tpl += ", session expired"
raise excp.JobFailure(fail_msg_tpl % (job_uuid), e)
excp.raise_with_cause(excp.JobFailure, fail_msg_tpl % (job_uuid))
except k_exceptions.NoNodeError:
fail_msg_tpl += ", unknown job"
raise excp.NotFound(fail_msg_tpl % (job_uuid))
except k_exceptions.KazooException as e:
excp.raise_with_cause(excp.NotFound, fail_msg_tpl % (job_uuid))
except k_exceptions.KazooException:
fail_msg_tpl += ", internal error"
raise excp.JobFailure(fail_msg_tpl % (job_uuid), e)
excp.raise_with_cause(excp.JobFailure, fail_msg_tpl % (job_uuid))
def find_owner(self, job):
with self._wrap(job.uuid, job.path, "Owner query failure: %s"):
with self._wrap(job.uuid, job.path,
fail_msg_tpl="Owner query failure: %s",
ensure_known=False):
try:
self._client.sync(job.lock_path)
raw_data, _lock_stat = self._client.get(job.lock_path)
@@ -618,14 +601,16 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
return (misc.decode_json(lock_data), lock_stat,
misc.decode_json(job_data), job_stat)
@base.check_who
def consume(self, job, who):
_check_who(who)
with self._wrap(job.uuid, job.path, "Consumption failure: %s"):
with self._wrap(job.uuid, job.path,
fail_msg_tpl="Consumption failure: %s"):
try:
owner_data = self._get_owner_and_data(job)
lock_data, lock_stat, data, data_stat = owner_data
except k_exceptions.NoNodeError:
raise excp.JobFailure("Can not consume a job %s"
excp.raise_with_cause(excp.NotFound,
"Can not consume a job %s"
" which we can not determine"
" the owner of" % (job.uuid))
if lock_data.get("owner") != who:
@@ -638,14 +623,16 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
kazoo_utils.checked_commit(txn)
self._remove_job(job.path)
@base.check_who
def abandon(self, job, who):
_check_who(who)
with self._wrap(job.uuid, job.path, "Abandonment failure: %s"):
with self._wrap(job.uuid, job.path,
fail_msg_tpl="Abandonment failure: %s"):
try:
owner_data = self._get_owner_and_data(job)
lock_data, lock_stat, data, data_stat = owner_data
except k_exceptions.NoNodeError:
raise excp.JobFailure("Can not abandon a job %s"
excp.raise_with_cause(excp.NotFound,
"Can not abandon a job %s"
" which we can not determine"
" the owner of" % (job.uuid))
if lock_data.get("owner") != who:
@@ -656,12 +643,36 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
txn.delete(job.lock_path, version=lock_stat.version)
kazoo_utils.checked_commit(txn)
@base.check_who
def trash(self, job, who):
with self._wrap(job.uuid, job.path,
fail_msg_tpl="Trash failure: %s"):
try:
owner_data = self._get_owner_and_data(job)
lock_data, lock_stat, data, data_stat = owner_data
except k_exceptions.NoNodeError:
excp.raise_with_cause(excp.NotFound,
"Can not trash a job %s"
" which we can not determine"
" the owner of" % (job.uuid))
if lock_data.get("owner") != who:
raise excp.JobFailure("Can not trash a job %s"
" which is not owned by %s"
% (job.uuid, who))
trash_path = job.path.replace(self.path, self.trash_path)
value = misc.binary_encode(jsonutils.dumps(data))
txn = self._client.transaction()
txn.create(trash_path, value=value)
txn.delete(job.lock_path, version=lock_stat.version)
txn.delete(job.path, version=data_stat.version)
kazoo_utils.checked_commit(txn)
def _state_change_listener(self, state):
LOG.debug("Kazoo client has changed to state: %s", state)
def wait(self, timeout=None):
# Wait until timeout expires (or forever) for jobs to appear.
watch = tt.StopWatch(duration=timeout)
watch = timeutils.StopWatch(duration=timeout)
watch.start()
with self._job_cond:
while True:
@@ -684,9 +695,9 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
@property
def connected(self):
return self._client.connected
return self._connected and self._client.connected
@lock_utils.locked(lock='_open_close_lock')
@fasteners.locked(lock='_open_close_lock')
def close(self):
if self._owned:
LOG.debug("Stopping client")
@@ -698,8 +709,9 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
with self._job_cond:
self._known_jobs.clear()
LOG.debug("Stopped & cleared local state")
self._connected = False
@lock_utils.locked(lock='_open_close_lock')
@fasteners.locked(lock='_open_close_lock')
def connect(self, timeout=10.0):
def try_clean():
@@ -717,25 +729,33 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
timeout = float(timeout)
self._client.start(timeout=timeout)
except (self._client.handler.timeout_exception,
k_exceptions.KazooException) as e:
raise excp.JobFailure("Failed to connect to zookeeper", e)
k_exceptions.KazooException):
excp.raise_with_cause(excp.JobFailure,
"Failed to connect to zookeeper")
try:
if self._conf.get('check_compatible', True):
kazoo_utils.check_compatible(self._client, MIN_ZK_VERSION)
kazoo_utils.check_compatible(self._client, self.MIN_ZK_VERSION)
if self._worker is None and self._emit_notifications:
self._worker = futures.ThreadPoolExecutor(max_workers=1)
self._worker = futurist.ThreadPoolExecutor(max_workers=1)
self._client.ensure_path(self.path)
self._client.ensure_path(self.trash_path)
if self._job_watcher is None:
self._job_watcher = watchers.ChildrenWatch(
self._client,
self.path,
func=self._on_job_posting,
allow_session_lost=True)
self._connected = True
except excp.IncompatibleVersion:
with excutils.save_and_reraise_exception():
try_clean()
except (self._client.handler.timeout_exception,
k_exceptions.KazooException) as e:
try_clean()
raise excp.JobFailure("Failed to do post-connection"
" initialization", e)
k_exceptions.KazooException):
exc_type, exc, exc_tb = sys.exc_info()
try:
try_clean()
excp.raise_with_cause(excp.JobFailure,
"Failed to do post-connection"
" initialization", cause=exc)
finally:
del(exc_type, exc, exc_tb)
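To show where the new :meth:`.trash` support fits, a small sketch of a worker deciding a claimed job is broken and moving it aside (the module path, board configuration, names and the ``job_looks_broken`` check are assumptions)::

    from taskflow import exceptions as exc
    from taskflow.jobs.backends import impl_zookeeper

    board = impl_zookeeper.ZookeeperJobBoard('my-board',
                                             {'hosts': 'localhost:2181'})
    board.connect()
    try:
        for job in board.iterjobs(only_unclaimed=True):
            try:
                board.claim(job, 'worker-1')
            except exc.UnclaimableJob:
                continue
            if job_looks_broken(job):  # hypothetical validation
                # Moves the job znode under the trash path so that other
                # workers will not keep re-claiming a broken job.
                board.trash(job, 'worker-1')
            else:
                board.abandon(job, 'worker-1')
    finally:
        board.close()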

View File

@@ -16,6 +16,7 @@
# under the License.
import abc
import contextlib
from oslo_utils import uuidutils
import six
@@ -43,7 +44,9 @@ class Job(object):
reverting...
"""
def __init__(self, name, uuid=None, details=None):
def __init__(self, board, name,
uuid=None, details=None, backend=None,
book=None, book_data=None):
if uuid:
self._uuid = uuid
else:
@@ -52,45 +55,62 @@ class Job(object):
if not details:
details = {}
self._details = details
self._backend = backend
self._board = board
self._book = book
if not book_data:
book_data = {}
self._book_data = book_data
@abc.abstractproperty
def last_modified(self):
"""The datetime the job was last modified."""
pass
@abc.abstractproperty
def created_on(self):
"""The datetime the job was created on."""
pass
@abc.abstractproperty
@property
def board(self):
"""The board this job was posted on or was created from."""
return self._board
@abc.abstractproperty
def state(self):
"""The current state of this job."""
"""Access the current state of this job."""
pass
@abc.abstractproperty
@property
def book(self):
"""Logbook associated with this job.
If no logbook is associated with this job, this property is None.
"""
if self._book is None:
self._book = self._load_book()
return self._book
@abc.abstractproperty
@property
def book_uuid(self):
"""UUID of logbook associated with this job.
If no logbook is associated with this job, this property is None.
"""
if self._book is not None:
return self._book.uuid
else:
return self._book_data.get('uuid')
@abc.abstractproperty
@property
def book_name(self):
"""Name of logbook associated with this job.
If no logbook is associated with this job, this property is None.
"""
if self._book is not None:
return self._book.name
else:
return self._book_data.get('name')
@property
def uuid(self):
@@ -107,10 +127,24 @@ class Job(object):
"""The non-uniquely identifying name of this job."""
return self._name
def _load_book(self):
book_uuid = self.book_uuid
if self._backend is not None and book_uuid is not None:
# TODO(harlowja): we are currently limited by assuming that the
# job posted has the same backend as this loader (to start this
# seems to be a ok assumption, and can be adjusted in the future
# if we determine there is a use-case for multi-backend loaders,
# aka a registry of loaders).
with contextlib.closing(self._backend.get_connection()) as conn:
return conn.get_logbook(book_uuid)
# No backend to fetch from or no uuid specified
return None
def __str__(self):
"""Pretty formats the job into something *more* meaningful."""
return "%s %s (%s): %s" % (type(self).__name__,
self.name, self.uuid, self.details)
return "%s: %s (uuid=%s, details=%s)" % (type(self).__name__,
self.name, self.uuid,
self.details)
@six.add_metaclass(abc.ABCMeta)
@@ -260,6 +294,25 @@ class JobBoard(object):
this must be the same name that was used for claiming this job.
"""
@abc.abstractmethod
def trash(self, job, who):
"""Trash the provided job.
Trashing a job signals to others that the job is broken and should not
be reclaimed. This is provided as an option for users to be able to
remove jobs from the board externally. The trashed job details should
be kept around in an alternate location to be reviewed, if desired.
Only the entity that has claimed that job can trash a job. Any entity
trashing an unclaimed job (or a job they do not own) will cause an
exception.
:param job: a job on this jobboard that can be trashed (if it does
not exist then a NotFound exception will be raised).
:param who: string that names the entity performing the trashing,
this must be the same name that was used for claiming this job.
"""
@abc.abstractproperty
def connected(self):
"""Returns if this jobboard is connected."""
@@ -295,3 +348,40 @@ class NotifyingJobBoard(JobBoard):
def __init__(self, name, conf):
super(NotifyingJobBoard, self).__init__(name, conf)
self.notifier = notifier.Notifier()
# Internal helpers for usage by board implementations...
def check_who(meth):
@six.wraps(meth)
def wrapper(self, job, who, *args, **kwargs):
if not isinstance(who, six.string_types):
raise TypeError("Job applicant must be a string type")
if len(who) == 0:
raise ValueError("Job applicant must be non-empty")
return meth(self, job, who, *args, **kwargs)
return wrapper
def format_posting(uuid, name, created_on=None, last_modified=None,
details=None, book=None):
posting = {
'uuid': uuid,
'name': name,
}
if created_on is not None:
posting['created_on'] = created_on
if last_modified is not None:
posting['last_modified'] = last_modified
if details:
posting['details'] = details
else:
posting['details'] = {}
if book is not None:
posting['book'] = {
'name': book.name,
'uuid': book.uuid,
}
return posting
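
For reference, a minimal sketch of how the claim/consume/trash contract above is intended to be used; ``board`` and ``job`` are assumed to come from a concrete jobboard backend (such as the ZooKeeper one), ``do_work`` is a hypothetical callable, and error handling is simplified::

    # Hedged sketch: process a claimed job, trashing it on failure so other
    # consumers will not pick it back up. Only the claiming entity, identified
    # by the same non-empty ``who`` string, may trash or consume the job.
    def process_one(board, job, who="worker-1", do_work=lambda job: None):
        board.claim(job, who)
        try:
            do_work(job)
        except Exception:
            board.trash(job, who)    # moved aside for later review
            raise
        else:
            board.consume(job, who)  # normal completion removes the job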

View File

@@ -18,6 +18,7 @@ from __future__ import absolute_import
import abc
from debtcollector import moves
from oslo_utils import excutils
import six
@@ -25,7 +26,6 @@ from taskflow import logging
from taskflow import states
from taskflow.types import failure
from taskflow.types import notifier
from taskflow.utils import deprecation
LOG = logging.getLogger(__name__)
@@ -165,10 +165,8 @@ class Listener(object):
# TODO(harlowja): remove in 0.7 or later...
ListenerBase = deprecation.moved_inheritable_class(Listener,
'ListenerBase', __name__,
version="0.6",
removal_version="?")
ListenerBase = moves.moved_class(Listener, 'ListenerBase', __name__,
version="0.6", removal_version="2.0")
@six.add_metaclass(abc.ABCMeta)
@@ -213,10 +211,18 @@ class DumpingListener(Listener):
# TODO(harlowja): remove in 0.7 or later...
class LoggingBase(deprecation.moved_inheritable_class(DumpingListener,
'LoggingBase', __name__,
version="0.6",
removal_version="?")):
class LoggingBase(moves.moved_class(DumpingListener,
'LoggingBase', __name__,
version="0.6", removal_version="2.0")):
"""Legacy logging base.
.. deprecated:: 0.6
This class is **deprecated** and is present for backward
compatibility **only**, its replacement
:py:class:`.DumpingListener` should be used going forward.
"""
def _dump(self, message, *args, **kwargs):
self._log(message, *args, **kwargs)
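
The ``moves.moved_class`` helper used above is the generic debtcollector pattern for keeping an old class name importable while steering users toward the new one; a small self-contained sketch with invented class names, following the same call form as this diff::

    from debtcollector import moves

    class DumpingThing(object):
        """The new, preferred class."""

    # Importing or instantiating ``LegacyThing`` still works, but emits a
    # deprecation warning that points at ``DumpingThing`` and records the
    # version the alias appeared in and when it will be removed.
    LegacyThing = moves.moved_class(DumpingThing, 'LegacyThing', __name__,
                                    version="0.6", removal_version="2.0")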

View File

@@ -17,12 +17,15 @@
from __future__ import absolute_import
import itertools
import time
from debtcollector import moves
from oslo_utils import timeutils
from taskflow import exceptions as exc
from taskflow.listeners import base
from taskflow import logging
from taskflow import states
from taskflow.types import timing as tt
STARTING_STATES = frozenset((states.RUNNING, states.REVERTING))
FINISHED_STATES = frozenset((base.FINISH_STATES + (states.REVERTED,)))
@@ -39,7 +42,7 @@ def _printer(message):
print(message)
class TimingListener(base.Listener):
class DurationListener(base.Listener):
"""Listener that captures task duration.
It records how long a task took to execute (or fail)
@@ -47,13 +50,13 @@ class TimingListener(base.Listener):
to task metadata with key ``'duration'``.
"""
def __init__(self, engine):
super(TimingListener, self).__init__(engine,
task_listen_for=WATCH_STATES,
flow_listen_for=[])
super(DurationListener, self).__init__(engine,
task_listen_for=WATCH_STATES,
flow_listen_for=[])
self._timers = {}
def deregister(self):
super(TimingListener, self).deregister()
super(DurationListener, self).deregister()
# There should be none left at deregistering time, so log a
# warning if any somehow got left behind...
leftover_timers = len(self._timers)
@@ -78,7 +81,7 @@ class TimingListener(base.Listener):
if state == states.PENDING:
self._timers.pop(task_name, None)
elif state in STARTING_STATES:
self._timers[task_name] = tt.StopWatch().start()
self._timers[task_name] = timeutils.StopWatch().start()
elif state in FINISHED_STATES:
timer = self._timers.pop(task_name, None)
if timer is not None:
@@ -86,22 +89,76 @@ class TimingListener(base.Listener):
self._record_ending(timer, task_name)
class PrintingTimingListener(TimingListener):
"""Listener that prints the start & stop timing as well as recording it."""
TimingListener = moves.moved_class(DurationListener,
'TimingListener', __name__,
version="0.8", removal_version="2.0")
class PrintingDurationListener(DurationListener):
"""Listener that prints the duration as well as recording it."""
def __init__(self, engine, printer=None):
super(PrintingTimingListener, self).__init__(engine)
super(PrintingDurationListener, self).__init__(engine)
if printer is None:
self._printer = _printer
else:
self._printer = printer
def _record_ending(self, timer, task_name):
super(PrintingTimingListener, self)._record_ending(timer, task_name)
super(PrintingDurationListener, self)._record_ending(timer, task_name)
self._printer("It took task '%s' %0.2f seconds to"
" finish." % (task_name, timer.elapsed()))
def _task_receiver(self, state, details):
super(PrintingTimingListener, self)._task_receiver(state, details)
super(PrintingDurationListener, self)._task_receiver(state, details)
if state in STARTING_STATES:
self._printer("'%s' task started." % (details['task_name']))
PrintingTimingListener = moves.moved_class(
PrintingDurationListener, 'PrintingTimingListener', __name__,
version="0.8", removal_version="2.0")
class EventTimeListener(base.Listener):
"""Listener that captures task, flow, and retry event timestamps.
It records when an event is received (using unix time) to
storage. It saves the timestamps under keys (in atom or flow details
metadata) of the format ``{event}-timestamp`` where ``event`` is the
state/event name that has been received.
This information can be later extracted/examined to derive durations...
"""
def __init__(self, engine,
task_listen_for=base.DEFAULT_LISTEN_FOR,
flow_listen_for=base.DEFAULT_LISTEN_FOR,
retry_listen_for=base.DEFAULT_LISTEN_FOR):
super(EventTimeListener, self).__init__(
engine, task_listen_for=task_listen_for,
flow_listen_for=flow_listen_for, retry_listen_for=retry_listen_for)
def _record_atom_event(self, state, atom_name):
meta_update = {'%s-timestamp' % state: time.time()}
try:
# Don't let storage failures throw exceptions in a listener method.
self._engine.storage.update_atom_metadata(atom_name, meta_update)
except exc.StorageFailure:
LOG.warn("Failure to store timestamp %s for atom %s",
meta_update, atom_name, exc_info=True)
def _flow_receiver(self, state, details):
meta_update = {'%s-timestamp' % state: time.time()}
try:
# Don't let storage failures throw exceptions in a listener method.
self._engine.storage.update_flow_metadata(meta_update)
except exc.StorageFailure:
LOG.warn("Failure to store timestamp %s for flow %s",
meta_update, details['flow_name'], exc_info=True)
def _task_receiver(self, state, details):
self._record_atom_event(state, details['task_name'])
def _retry_receiver(self, state, details):
self._record_atom_event(state, details['retry_name'])
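
A hedged usage sketch for the renamed listeners; the flow and task below are invented for illustration, and the listener is used as a context manager around ``engine.run()`` so that it registers and deregisters itself::

    from taskflow import engines
    from taskflow import task
    from taskflow.listeners import timing
    from taskflow.patterns import linear_flow

    class Noop(task.Task):
        def execute(self):
            return 'done'

    flow = linear_flow.Flow('timed').add(Noop('noop'))
    engine = engines.load(flow)

    # DurationListener records a 'duration' key into each task's metadata;
    # the printing variant also prints start/finish messages. EventTimeListener
    # could be attached the same way to store per-state timestamps instead.
    with timing.PrintingDurationListener(engine):
        engine.run()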

View File

@@ -32,6 +32,7 @@ CRITICAL = logging.CRITICAL
DEBUG = logging.DEBUG
ERROR = logging.ERROR
FATAL = logging.FATAL
INFO = logging.INFO
NOTSET = logging.NOTSET
WARN = logging.WARN
WARNING = logging.WARNING

View File

@@ -16,22 +16,32 @@
import collections
import six
from taskflow import exceptions as exc
from taskflow import flow
from taskflow.types import graph as gr
def _unsatisfied_requires(node, graph, *additional_provided):
"""Extracts the unsatisified symbol requirements of a single node."""
requires = set(node.requires)
if not requires:
return requires
for provided in additional_provided:
requires = requires - provided
# This is using the difference() method vs the -
# operator since the latter doesn't work with frozen
# or regular sets (when used in combination with ordered
# sets).
#
# If this is not done the following happens...
#
# TypeError: unsupported operand type(s)
# for -: 'set' and 'OrderedSet'
requires = requires.difference(provided)
if not requires:
return requires
for pred in graph.bfs_predecessors_iter(node):
requires = requires - pred.provides
requires = requires.difference(pred.provides)
if not requires:
return requires
return requires
@@ -55,16 +65,23 @@ class Flow(flow.Flow):
self._graph = gr.DiGraph()
self._graph.freeze()
def link(self, u, v):
#: Extracts the unsatisfied symbol requirements of a single node.
_unsatisfied_requires = staticmethod(_unsatisfied_requires)
def link(self, u, v, decider=None):
"""Link existing node u as a runtime dependency of existing node v."""
if not self._graph.has_node(u):
raise ValueError('Item %s not found to link from' % (u))
raise ValueError("Node '%s' not found to link from" % (u))
if not self._graph.has_node(v):
raise ValueError('Item %s not found to link to' % (v))
self._swap(self._link(u, v, manual=True))
raise ValueError("Node '%s' not found to link to" % (v))
if decider is not None:
if not six.callable(decider):
raise ValueError("Decider boolean callback must be callable")
self._swap(self._link(u, v, manual=True, decider=decider))
return self
def _link(self, u, v, graph=None, reason=None, manual=False):
def _link(self, u, v, graph=None,
reason=None, manual=False, decider=None):
mutable_graph = True
if graph is None:
graph = self._graph
@@ -74,6 +91,8 @@ class Flow(flow.Flow):
attrs = graph.get_edge_data(u, v)
if not attrs:
attrs = {}
if decider is not None:
attrs[flow.LINK_DECIDER] = decider
if manual:
attrs[flow.LINK_MANUAL] = True
if reason is not None:
@@ -94,34 +113,38 @@ class Flow(flow.Flow):
direct access to the underlying graph).
"""
if not graph.is_directed_acyclic():
raise exc.DependencyFailure("No path through the items in the"
raise exc.DependencyFailure("No path through the node(s) in the"
" graph produces an ordering that"
" will allow for logical"
" edge traversal")
self._graph = graph.freeze()
def add(self, *items, **kwargs):
def add(self, *nodes, **kwargs):
"""Adds a given task/tasks/flow/flows to this flow.
:param items: items to add to the flow
:param nodes: node(s) to add to the flow
:param kwargs: keyword arguments, the two keyword arguments
currently processed are:
* ``resolve_requires``, a boolean that when true (the
default) implies that when items are added their
symbol requirements will be matched to existing items
and links will be automatically made to those
default) implies that when node(s) are added their
symbol requirements will be matched to existing
node(s) and links will be automatically made to those
providers. If multiple possible providers exist
then an AmbiguousDependency exception will be raised.
* ``resolve_existing``, a boolean that when true (the
default) implies that on addition of a new item that
existing items will have their requirements scanned
for symbols that this newly added item can provide.
default) implies that on addition of a new node that
existing node(s) will have their requirements scanned
for symbols that this newly added node can provide.
If a match is found a link is automatically created
from the newly added item to the requiree.
from the newly added node to the requiree.
"""
items = [i for i in items if not self._graph.has_node(i)]
if not items:
# Let's try to avoid doing any work if we can, since the code below
# (after this filter) can create more temporary graphs that aren't needed
# if the nodes already exist...
nodes = [i for i in nodes if not self._graph.has_node(i)]
if not nodes:
return self
# This syntax will *hopefully* be better in future versions of python.
@@ -143,52 +166,52 @@ class Flow(flow.Flow):
retry_provides.add(value)
provided[value].append(self._retry)
for item in self._graph.nodes_iter():
for value in _unsatisfied_requires(item, self._graph,
retry_provides):
required[value].append(item)
for value in item.provides:
provided[value].append(item)
for node in self._graph.nodes_iter():
for value in self._unsatisfied_requires(node, self._graph,
retry_provides):
required[value].append(node)
for value in node.provides:
provided[value].append(node)
# NOTE(harlowja): Add items and edges to a temporary copy of the
# NOTE(harlowja): Add node(s) and edge(s) to a temporary copy of the
# underlying graph, and only if that addition is successful do we then
# swap with the underlying graph.
tmp_graph = gr.DiGraph(self._graph)
for item in items:
tmp_graph.add_node(item)
for node in nodes:
tmp_graph.add_node(node)
# Try to find a valid provider.
if resolve_requires:
for value in _unsatisfied_requires(item, tmp_graph,
retry_provides):
for value in self._unsatisfied_requires(node, tmp_graph,
retry_provides):
if value in provided:
providers = provided[value]
if len(providers) > 1:
provider_names = [n.name for n in providers]
raise exc.AmbiguousDependency(
"Resolution error detected when"
" adding %(item)s, multiple"
" adding '%(node)s', multiple"
" providers %(providers)s found for"
" required symbol '%(value)s'"
% dict(item=item.name,
% dict(node=node.name,
providers=sorted(provider_names),
value=value))
else:
self._link(providers[0], item,
self._link(providers[0], node,
graph=tmp_graph, reason=value)
else:
required[value].append(item)
required[value].append(node)
for value in item.provides:
provided[value].append(item)
for value in node.provides:
provided[value].append(node)
# See if what we provide fulfills any existing requiree.
if resolve_existing:
for value in item.provides:
for value in node.provides:
if value in required:
for requiree in list(required[value]):
if requiree is not item:
self._link(item, requiree,
if requiree is not node:
self._link(node, requiree,
graph=tmp_graph, reason=value)
required[value].remove(requiree)
@@ -222,8 +245,9 @@ class Flow(flow.Flow):
requires.update(self._retry.requires)
retry_provides.update(self._retry.provides)
g = self._get_subgraph()
for item in g.nodes_iter():
requires.update(_unsatisfied_requires(item, g, retry_provides))
for node in g.nodes_iter():
requires.update(self._unsatisfied_requires(node, g,
retry_provides))
return frozenset(requires)
@@ -239,36 +263,35 @@ class TargetedFlow(Flow):
self._subgraph = None
self._target = None
def set_target(self, target_item):
def set_target(self, target_node):
"""Set target for the flow.
Any items (tasks or subflows) not needed for the target
item will not be executed.
Any node(s) (tasks or subflows) not needed for the target
node will not be executed.
"""
if not self._graph.has_node(target_item):
raise ValueError('Item %s not found' % target_item)
self._target = target_item
if not self._graph.has_node(target_node):
raise ValueError("Node '%s' not found" % target_node)
self._target = target_node
self._subgraph = None
def reset_target(self):
"""Reset target for the flow.
All items of the flow will be executed.
All node(s) of the flow will be executed.
"""
self._target = None
self._subgraph = None
def add(self, *items):
def add(self, *nodes):
"""Adds a given task/tasks/flow/flows to this flow."""
super(TargetedFlow, self).add(*items)
super(TargetedFlow, self).add(*nodes)
# reset cached subgraph, in case it was affected
self._subgraph = None
return self
def link(self, u, v):
def link(self, u, v, decider=None):
"""Link existing node u as a runtime dependency of existing node v."""
super(TargetedFlow, self).link(u, v)
super(TargetedFlow, self).link(u, v, decider=decider)
# reset cached subgraph, in case it was affected
self._subgraph = None
return self
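
A sketch of the new ``decider`` hook on ``link``; the tasks are invented, and the decider callable is assumed to receive a history-like mapping of the predecessor's name to its result and to return a boolean that gates whether the dependent node runs::

    from taskflow import task
    from taskflow.patterns import graph_flow

    class CheckSpace(task.Task):
        default_provides = 'enough_space'

        def execute(self):
            return True

    class Resize(task.Task):
        def execute(self, enough_space):
            print('resizing...')

    check, resize = CheckSpace('check'), Resize('resize')
    flow = graph_flow.Flow('maybe-resize').add(check, resize)
    # If the decider returns False at runtime, 'resize' is not executed.
    flow.link(check, resize,
              decider=lambda history: bool(history.get('check')))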

View File

@@ -16,6 +16,7 @@
import contextlib
import six
from stevedore import driver
from taskflow import exceptions as exc
@@ -38,18 +39,20 @@ def fetch(conf, namespace=BACKEND_NAMESPACE, **kwargs):
NOTE(harlowja): to aid in making it easy to specify configuration and
options to a backend, the configuration (which is typically just a dictionary)
can also be a uri string that identifies the entrypoint name and any
can also be a URI string that identifies the entrypoint name and any
configuration specific to that backend.
For example, given the following configuration uri:
For example, given the following configuration URI::
mysql://<not-used>/?a=b&c=d
mysql://<not-used>/?a=b&c=d
This will look for the entrypoint named 'mysql' and will provide
a configuration object composed of the uris parameters, in this case that
is {'a': 'b', 'c': 'd'} to the constructor of that persistence backend
a configuration object composed of the URI's components, in this case that
is ``{'a': 'b', 'c': 'd'}`` to the constructor of that persistence backend
instance.
"""
if isinstance(conf, six.string_types):
conf = {'connection': conf}
backend_name = conf['connection']
try:
uri = misc.parse_uri(backend_name)

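A sketch of the two equivalent ways of configuring ``fetch``, using the in-memory backend so nothing external is required::

    import contextlib

    from taskflow.persistence import backends

    # A bare URI string is folded into {'connection': <uri>}; the scheme
    # ('memory' here) selects the entrypoint, and any query parameters become
    # backend-specific configuration.
    backend = backends.fetch('memory://')
    # equivalent: backend = backends.fetch({'connection': 'memory://'})

    with contextlib.closing(backend.get_connection()) as conn:
        conn.upgrade()  # prepare whatever structures the backend needs
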
View File

@@ -15,33 +15,39 @@
# License for the specific language governing permissions and limitations
# under the License.
import contextlib
import errno
import os
import shutil
import cachetools
import fasteners
from oslo_serialization import jsonutils
import six
from taskflow import exceptions as exc
from taskflow import logging
from taskflow.persistence import base
from taskflow.persistence import logbook
from taskflow.utils import lock_utils
from taskflow.persistence import path_based
from taskflow.utils import misc
LOG = logging.getLogger(__name__)
@contextlib.contextmanager
def _storagefailure_wrapper():
try:
yield
except exc.TaskFlowException:
raise
except Exception as e:
if isinstance(e, (IOError, OSError)) and e.errno == errno.ENOENT:
exc.raise_with_cause(exc.NotFound,
'Item not found: %s' % e.filename,
cause=e)
else:
exc.raise_with_cause(exc.StorageFailure,
"Storage backend internal error", cause=e)
class DirBackend(base.Backend):
class DirBackend(path_based.PathBasedBackend):
"""A directory and file based backend.
This backend writes logbooks, flow details, and atom details to a provided
base path on the local filesystem. It will create and store those objects
in three key directories (one for logbooks, one for flow details and one
for atom details). It creates those associated directories and then
creates files inside those directories that represent the contents of those
objects for later reading and writing.
This backend does *not* provide true transactional semantics. It does
guarantee that there will be no interprocess race conditions when
writing and reading by using a consistent hierarchy of file based locks.
@@ -49,22 +55,33 @@ class DirBackend(base.Backend):
Example configuration::
conf = {
"path": "/tmp/taskflow",
"path": "/tmp/taskflow", # save data to this root directory
"max_cache_size": 1024, # keep up-to 1024 entries in memory
}
"""
DEFAULT_FILE_ENCODING = 'utf-8'
"""
Default encoding used when encoding files from text/unicode into binary,
or decoding them from binary back into text/unicode.
"""
def __init__(self, conf):
super(DirBackend, self).__init__(conf)
self._path = os.path.abspath(conf['path'])
self._lock_path = os.path.join(self._path, 'locks')
self._file_cache = {}
@property
def lock_path(self):
return self._lock_path
@property
def base_path(self):
return self._path
max_cache_size = self._conf.get('max_cache_size')
if max_cache_size is not None:
max_cache_size = int(max_cache_size)
if max_cache_size < 1:
raise ValueError("Maximum cache size must be greater than"
" or equal to one")
self.file_cache = cachetools.LRUCache(max_cache_size)
else:
self.file_cache = {}
self.encoding = self._conf.get('encoding', self.DEFAULT_FILE_ENCODING)
if not self._path:
raise ValueError("Empty path is disallowed")
self._path = os.path.abspath(self._path)
self.lock = fasteners.ReaderWriterLock()
def get_connection(self):
return Connection(self)
@@ -73,333 +90,77 @@ class DirBackend(base.Backend):
pass
class Connection(base.Connection):
def __init__(self, backend):
self._backend = backend
self._file_cache = self._backend._file_cache
self._flow_path = os.path.join(self._backend.base_path, 'flows')
self._atom_path = os.path.join(self._backend.base_path, 'atoms')
self._book_path = os.path.join(self._backend.base_path, 'books')
def validate(self):
# Verify key paths exist.
paths = [
self._backend.base_path,
self._backend.lock_path,
self._flow_path,
self._atom_path,
self._book_path,
]
for p in paths:
if not os.path.isdir(p):
raise RuntimeError("Missing required directory: %s" % (p))
class Connection(path_based.PathBasedConnection):
def _read_from(self, filename):
# This is very similar to the oslo-incubator fileutils module, but
# tweaked to not depend on a global cache, as well as tweaked to not
# pull in the oslo logging module (which is a huge pile of code).
mtime = os.path.getmtime(filename)
cache_info = self._file_cache.setdefault(filename, {})
cache_info = self.backend.file_cache.setdefault(filename, {})
if not cache_info or mtime > cache_info.get('mtime', 0):
with open(filename, 'rb') as fp:
cache_info['data'] = fp.read().decode('utf-8')
cache_info['data'] = misc.binary_decode(
fp.read(), encoding=self.backend.encoding)
cache_info['mtime'] = mtime
return cache_info['data']
def _write_to(self, filename, contents):
if isinstance(contents, six.text_type):
contents = contents.encode('utf-8')
contents = misc.binary_encode(contents,
encoding=self.backend.encoding)
with open(filename, 'wb') as fp:
fp.write(contents)
self._file_cache.pop(filename, None)
self.backend.file_cache.pop(filename, None)
def _run_with_process_lock(self, lock_name, functor, *args, **kwargs):
lock_path = os.path.join(self.backend.lock_path, lock_name)
with lock_utils.InterProcessLock(lock_path):
@contextlib.contextmanager
def _path_lock(self, path):
lockfile = self._join_path(path, 'lock')
with fasteners.InterProcessLock(lockfile) as lock:
with _storagefailure_wrapper():
yield lock
def _join_path(self, *parts):
return os.path.join(*parts)
def _get_item(self, path):
with self._path_lock(path):
item_path = self._join_path(path, 'metadata')
return misc.decode_json(self._read_from(item_path))
def _set_item(self, path, value, transaction):
with self._path_lock(path):
item_path = self._join_path(path, 'metadata')
self._write_to(item_path, jsonutils.dumps(value))
def _del_tree(self, path, transaction):
with self._path_lock(path):
shutil.rmtree(path)
def _get_children(self, path):
with _storagefailure_wrapper():
return [link for link in os.listdir(path)
if os.path.islink(self._join_path(path, link))]
def _ensure_path(self, path):
with _storagefailure_wrapper():
misc.ensure_tree(path)
def _create_link(self, src_path, dest_path, transaction):
with _storagefailure_wrapper():
try:
return functor(*args, **kwargs)
except exc.TaskFlowException:
raise
except Exception as e:
LOG.exception("Failed running locking file based session")
# NOTE(harlowja): trap all other errors as storage errors.
raise exc.StorageFailure("Storage backend internal error", e)
def _get_logbooks(self):
lb_uuids = []
try:
lb_uuids = [d for d in os.listdir(self._book_path)
if os.path.isdir(os.path.join(self._book_path, d))]
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise
for lb_uuid in lb_uuids:
try:
yield self._get_logbook(lb_uuid)
except exc.NotFound:
pass
def get_logbooks(self):
try:
books = list(self._get_logbooks())
except EnvironmentError as e:
raise exc.StorageFailure("Unable to fetch logbooks", e)
else:
for b in books:
yield b
@property
def backend(self):
return self._backend
def close(self):
pass
def _save_atom_details(self, atom_detail, ignore_missing):
# See if we have an existing atom detail to merge with.
e_ad = None
try:
e_ad = self._get_atom_details(atom_detail.uuid, lock=False)
except EnvironmentError:
if not ignore_missing:
raise exc.NotFound("No atom details found with id: %s"
% atom_detail.uuid)
if e_ad is not None:
atom_detail = e_ad.merge(atom_detail)
ad_path = os.path.join(self._atom_path, atom_detail.uuid)
ad_data = base._format_atom(atom_detail)
self._write_to(ad_path, jsonutils.dumps(ad_data))
return atom_detail
def update_atom_details(self, atom_detail):
return self._run_with_process_lock("atom",
self._save_atom_details,
atom_detail,
ignore_missing=False)
def _get_atom_details(self, uuid, lock=True):
def _get():
ad_path = os.path.join(self._atom_path, uuid)
ad_data = misc.decode_json(self._read_from(ad_path))
ad_cls = logbook.atom_detail_class(ad_data['type'])
return ad_cls.from_dict(ad_data['atom'])
if lock:
return self._run_with_process_lock('atom', _get)
else:
return _get()
def _get_flow_details(self, uuid, lock=True):
def _get():
fd_path = os.path.join(self._flow_path, uuid)
meta_path = os.path.join(fd_path, 'metadata')
meta = misc.decode_json(self._read_from(meta_path))
fd = logbook.FlowDetail.from_dict(meta)
ad_to_load = []
ad_path = os.path.join(fd_path, 'atoms')
try:
ad_to_load = [f for f in os.listdir(ad_path)
if os.path.islink(os.path.join(ad_path, f))]
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise
for ad_uuid in ad_to_load:
fd.add(self._get_atom_details(ad_uuid))
return fd
if lock:
return self._run_with_process_lock('flow', _get)
else:
return _get()
def _save_atoms_and_link(self, atom_details, local_atom_path):
for atom_detail in atom_details:
self._save_atom_details(atom_detail, ignore_missing=True)
src_ad_path = os.path.join(self._atom_path, atom_detail.uuid)
target_ad_path = os.path.join(local_atom_path, atom_detail.uuid)
try:
os.symlink(src_ad_path, target_ad_path)
except EnvironmentError as e:
os.symlink(src_path, dest_path)
except OSError as e:
if e.errno != errno.EEXIST:
raise
def _save_flow_details(self, flow_detail, ignore_missing):
# See if we have an existing flow detail to merge with.
e_fd = None
try:
e_fd = self._get_flow_details(flow_detail.uuid, lock=False)
except EnvironmentError:
if not ignore_missing:
raise exc.NotFound("No flow details found with id: %s"
% flow_detail.uuid)
if e_fd is not None:
e_fd = e_fd.merge(flow_detail)
for ad in flow_detail:
if e_fd.find(ad.uuid) is None:
e_fd.add(ad)
flow_detail = e_fd
flow_path = os.path.join(self._flow_path, flow_detail.uuid)
misc.ensure_tree(flow_path)
self._write_to(os.path.join(flow_path, 'metadata'),
jsonutils.dumps(flow_detail.to_dict()))
if len(flow_detail):
atom_path = os.path.join(flow_path, 'atoms')
misc.ensure_tree(atom_path)
self._run_with_process_lock('atom',
self._save_atoms_and_link,
list(flow_detail), atom_path)
return flow_detail
@contextlib.contextmanager
def _transaction(self):
"""This just wraps a global write-lock."""
lock = self.backend.lock.write_lock
with lock():
yield
def update_flow_details(self, flow_detail):
return self._run_with_process_lock("flow",
self._save_flow_details,
flow_detail,
ignore_missing=False)
def _save_flows_and_link(self, flow_details, local_flow_path):
for flow_detail in flow_details:
self._save_flow_details(flow_detail, ignore_missing=True)
src_fd_path = os.path.join(self._flow_path, flow_detail.uuid)
target_fd_path = os.path.join(local_flow_path, flow_detail.uuid)
try:
os.symlink(src_fd_path, target_fd_path)
except EnvironmentError as e:
if e.errno != errno.EEXIST:
raise
def _save_logbook(self, book):
# See if we have an existing logbook to merge with.
e_lb = None
try:
e_lb = self._get_logbook(book.uuid)
except exc.NotFound:
pass
if e_lb is not None:
e_lb = e_lb.merge(book)
for fd in book:
if e_lb.find(fd.uuid) is None:
e_lb.add(fd)
book = e_lb
book_path = os.path.join(self._book_path, book.uuid)
misc.ensure_tree(book_path)
self._write_to(os.path.join(book_path, 'metadata'),
jsonutils.dumps(book.to_dict(marshal_time=True)))
if len(book):
flow_path = os.path.join(book_path, 'flows')
misc.ensure_tree(flow_path)
self._run_with_process_lock('flow',
self._save_flows_and_link,
list(book), flow_path)
return book
def save_logbook(self, book):
return self._run_with_process_lock("book",
self._save_logbook, book)
def upgrade(self):
def _step_create():
for path in (self._book_path, self._flow_path, self._atom_path):
try:
misc.ensure_tree(path)
except EnvironmentError as e:
raise exc.StorageFailure("Unable to create logbooks"
" required child path %s" % path,
e)
for path in (self._backend.base_path, self._backend.lock_path):
try:
misc.ensure_tree(path)
except EnvironmentError as e:
raise exc.StorageFailure("Unable to create logbooks required"
" path %s" % path, e)
self._run_with_process_lock("init", _step_create)
def clear_all(self):
def _step_clear():
for d in (self._book_path, self._flow_path, self._atom_path):
if os.path.isdir(d):
shutil.rmtree(d)
def _step_atom():
self._run_with_process_lock("atom", _step_clear)
def _step_flow():
self._run_with_process_lock("flow", _step_atom)
def _step_book():
self._run_with_process_lock("book", _step_flow)
# Acquire all locks by going through this little hierarchy.
self._run_with_process_lock("init", _step_book)
def destroy_logbook(self, book_uuid):
def _destroy_atoms(atom_details):
for atom_detail in atom_details:
atom_path = os.path.join(self._atom_path, atom_detail.uuid)
try:
shutil.rmtree(atom_path)
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise exc.StorageFailure("Unable to remove atom"
" directory %s" % atom_path,
e)
def _destroy_flows(flow_details):
for flow_detail in flow_details:
flow_path = os.path.join(self._flow_path, flow_detail.uuid)
self._run_with_process_lock("atom", _destroy_atoms,
list(flow_detail))
try:
shutil.rmtree(flow_path)
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise exc.StorageFailure("Unable to remove flow"
" directory %s" % flow_path,
e)
def _destroy_book():
book = self._get_logbook(book_uuid)
book_path = os.path.join(self._book_path, book.uuid)
self._run_with_process_lock("flow", _destroy_flows, list(book))
try:
shutil.rmtree(book_path)
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise exc.StorageFailure("Unable to remove book"
" directory %s" % book_path, e)
# Acquire all locks by going through this little hierarchy.
self._run_with_process_lock("book", _destroy_book)
def _get_logbook(self, book_uuid):
book_path = os.path.join(self._book_path, book_uuid)
meta_path = os.path.join(book_path, 'metadata')
try:
meta = misc.decode_json(self._read_from(meta_path))
except EnvironmentError as e:
if e.errno == errno.ENOENT:
raise exc.NotFound("No logbook found with id: %s" % book_uuid)
else:
raise
lb = logbook.LogBook.from_dict(meta, unmarshal_time=True)
fd_path = os.path.join(book_path, 'flows')
fd_uuids = []
try:
fd_uuids = [f for f in os.listdir(fd_path)
if os.path.islink(os.path.join(fd_path, f))]
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise
for fd_uuid in fd_uuids:
lb.add(self._get_flow_details(fd_uuid))
return lb
def get_logbook(self, book_uuid):
return self._run_with_process_lock("book",
self._get_logbook, book_uuid)
def validate(self):
with _storagefailure_wrapper():
for p in (self.flow_path, self.atom_path, self.book_path):
if not os.path.isdir(p):
raise RuntimeError("Missing required directory: %s" % (p))
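
To tie the directory backend options together, a small sketch of constructing it directly; the path is hypothetical and the connection methods are assumed to follow the generic connection contract (``upgrade`` prepares storage, ``validate`` checks it)::

    import contextlib

    from taskflow.persistence.backends import impl_dir

    conf = {
        "path": "/tmp/taskflow",   # root directory for books/flows/atoms
        "max_cache_size": 1024,    # optional LRU bound on the file cache
        "encoding": "utf-8",       # optional, this is already the default
    }
    backend = impl_dir.DirBackend(conf)
    with contextlib.closing(backend.get_connection()) as conn:
        conn.upgrade()    # create the required directory hierarchy
        conn.validate()   # verify the directories now exist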

Some files were not shown because too many files have changed in this diff.