Add specific scoping documentation

Adds information into the arguments and result docs about how scoping lookup works and what it implies.

Change-Id: I810874dce042ec43fe9e704d6689215e19d67c9c
2015-02-24 16:19:04 -08:00
parent 0a97fb96b5
commit 6da46b71d9
3 changed files with 66 additions and 30 deletions
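
For readers skimming the diff, the lookup order this commit documents can be sketched as a small toy model. This is only an illustrative sketch, not TaskFlow's actual storage code: the `resolve` function, its parameters, the `provides`/`results` mappings and the local `NotFound` class are all hypothetical names, and the sketch assumes that the scope walker yields the nearest provider first and that lookup stops at the first scope containing any provider.

```python
class NotFound(Exception):
    """Toy stand-in for taskflow.exceptions.NotFound (illustration only)."""


def resolve(name, atom_injected, transient, persistent,
            scopes, provides, results):
    """Resolve ``name`` following the documented lookup order.

    All shapes here are assumptions made for the sketch:
    ``scopes`` is a list of provider-name lists (nearest scope first),
    ``provides`` maps a provider name to the result names it declares,
    ``results`` maps a provider name to the results it actually saved.
    """
    # Steps 1-3: injected values win, most specific source first.
    for source in (atom_injected, transient, persistent):
        if name in source:
            return source[name]
    # Step 4: the first scope containing any provider of ``name`` decides.
    for scope in scopes:
        candidates = [p for p in scope if name in provides.get(p, ())]
        if not candidates:
            continue
        for provider in candidates:
            try:
                # First provider whose result can actually be extracted.
                return results[provider][name]
            except KeyError:
                continue
        break  # assumption: outer scopes are not consulted after this
    # Step 5: nothing resolved the requirement.
    raise NotFound("Unable to find result %r" % name)


# Tasks A and B both provide 'a'; C runs after both and requires 'a'.
provides = {"A": {"a"}, "B": {"a"}}
results = {"A": {"a": 1}, "B": {"a": 2}}
scopes = [["B", "A"]]  # assumed walker ordering: nearest provider first

selected = resolve("a", {}, {}, {}, scopes, provides, results)
injected = resolve("a", {"a": 0}, {}, {}, scopes, provides, results)
fallback = resolve("a", {}, {}, {}, scopes, provides, {"A": {"a": 1}})
```

Under these assumptions `selected` comes from ``B``, `injected` comes from the atom specific injected arguments, and `fallback` comes from ``A`` because ``B`` saved nothing that could be extracted.
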
--- a/doc/source/engines.rst
+++ b/doc/source/engines.rst
@@ -346,6 +346,47 @@ failures have occurred then the engine will have finished and if so desired the
 :doc:`persistence <persistence>` can be used to cleanup any details that were
 saved for this execution.

+Scoping
+=======
+
+During creation of flows it is also important to understand the lookup
+strategy (also typically known as `scope`_ resolution) that the engine you
+are using will internally use. For example, when a task ``A`` provides
+result 'a' and a task ``B`` after ``A`` provides a different result 'a' and a
+task ``C`` after ``A`` and after ``B`` requires 'a' to run, which one will
+be selected?
+
+Default strategy
+----------------
+
+When an engine is executing it internally interacts with the
+:py:class:`~taskflow.storage.Storage` class,
+and that class interacts with a
+:py:class:`~taskflow.engines.action_engine.scopes.ScopeWalker` instance.
+The :py:class:`~taskflow.storage.Storage` class uses the following
+lookup order to resolve (or fail to resolve) an atom's requirement:
+
+#. Injected atom specific arguments.
+#. Transient injected arguments.
+#. Non-transient injected arguments.
+#. First scope visited provider that produces the named result; note that
+   if multiple providers are found in the same scope the *first* (the scope
+   walker's yielded ordering defines what *first* means) that produced that
+   result *and* can be extracted without raising an error is selected as the
+   provider of the requested requirement.
+#. Fails with :py:class:`~taskflow.exceptions.NotFound` if unresolved at this
+   point (the ``cause`` attribute of this exception may have more details on
+   why the lookup failed).
+
+.. note::
+
+    To examine this information when debugging it is recommended to
+    enable the ``BLATHER`` logging level (level 5). At this level the storage
+    and scope code/layers will log what is being searched for and what is
+    being found.
+
+.. _scope: http://en.wikipedia.org/wiki/Scope_%28computer_science%29
+
 Interfaces
 ==========

@@ -362,7 +403,8 @@ Implementations
 .. automodule:: taskflow.engines.action_engine.runner
 .. automodule:: taskflow.engines.action_engine.runtime
 .. automodule:: taskflow.engines.action_engine.scheduler
-.. automodule:: taskflow.engines.action_engine.scopes
+.. autoclass:: taskflow.engines.action_engine.scopes.ScopeWalker
+    :special-members: __iter__

 Hierarchy
 =========
--- a/taskflow/engines/action_engine/scopes.py
+++ b/taskflow/engines/action_engine/scopes.py
@@ -44,6 +44,8 @@ def _extract_atoms(node, idx=-1):
 class ScopeWalker(object):
    """Walks through the scopes of a atom using a engines compilation.

+    NOTE(harlowja): for internal usage only.
+
    This will walk the visible scopes that are accessible for the given
    atom, which can be used by some external entity in some meaningful way,
    for example to find dependent values...
@@ -63,29 +65,35 @@ class ScopeWalker(object):

        How this works is the following:

-        We find all the possible predecessors of the given atom, this is useful
-        since we know they occurred before this atom but it doesn't tell us
-        the corresponding scope *level* that each predecessor was created in,
-        so we need to find this information.
+        We first grab all the predecessors of the given atom (let's call it
+        ``Y``) by using the :py:class:`~.compiler.Compilation` execution
+        graph (and doing a reverse breadth-first expansion to gather its
+        predecessors). This is useful since we know they *always* will
+        exist (and execute) before this atom, but it does not tell us the
+        corresponding scope *level* (flow, nested flow...) that each
+        predecessor was created in, so we need to find this information.

        For that information we consult the location of the atom ``Y`` in the
-        node hierarchy. We lookup in a reverse order the parent ``X`` of ``Y``
-        and traverse backwards from the index in the parent where ``Y``
-        occurred, all children in ``X`` that we encounter in this backwards
-        search (if a child is a flow itself, its atom contents will be
-        expanded) will be assumed to be at the same scope. This is then a
-        *potential* single scope, to make an *actual* scope we remove the items
-        from the *potential* scope that are not predecessors of ``Y`` to form
-        the *actual* scope.
+        :py:class:`~.compiler.Compilation` hierarchy/tree. We look up, in
+        reverse order, the parent ``X`` of ``Y`` and traverse backwards from
+        the index in the parent where ``Y`` exists to all siblings (and
+        children of those siblings) in ``X`` that we encounter in this
+        backwards search (if a sibling is a flow itself, its atom(s)
+        will be recursively expanded and included). This collection will
+        then be assumed to be at the same scope. This is what is called
+        a *potential* single scope; to make an *actual* scope we remove the
+        items from the *potential* scope that are **not** predecessors
+        of ``Y`` to form the *actual* scope which we then yield back.

        Then for additional scopes we continue up the tree, by finding the
        parent of ``X`` (lets call it ``Z``) and perform the same operation,
        going through the children in a reverse manner from the index in
        parent ``Z`` where ``X`` was located. This forms another *potential*
        scope which we provide back as an *actual* scope after reducing the
-        potential set by the predecessors of ``Y``. We then repeat this process
-        until we no longer have any parent nodes (aka have reached the top of
-        the tree) or we run out of predecessors.
+        potential set to only include predecessors previously gathered. We
+        then repeat this process until we no longer have any parent
+        nodes (aka we have reached the top of the tree) or we run out of
+        predecessors.
        """
        predecessors = set(self._graph.bfs_predecessors_iter(self._atom))
        last = self._node
--- a/taskflow/storage.py
+++ b/taskflow/storage.py
@@ -673,24 +673,10 @@ class Storage(object):
        with self._lock.read_lock():
            if optional_args is None:
                optional_args = []
-
            if atom_name and atom_name not in self._atom_name_to_uuid:
                raise exceptions.NotFound("Unknown atom name: %s" % atom_name)
            if not args_mapping:
                return {}
-
-            # The order of lookup is the following:
-            #
-            # 1. Injected atom specific arguments.
-            # 2. Transient injected arguments.
-            # 3. Non-transient injected arguments.
-            # 4. First scope visited group that produces the named result.
-            #    a). The first of that group that actually provided the name
-            #        result is selected (if group size is greater than one).
-            #
-            # Otherwise: blowup! (this will also happen if reading or
-            # extracting an expected result fails, since it is better to fail
-            # on lookup then provide invalid data from the wrong provider)
            if atom_name:
                injected_args = self._injected_args.get(atom_name, {})
            else: