Merge tag '1.15.0' into debian/liberty

taskflow 1.15.0 release
This commit is contained in:
Thomas Goirand
2015-07-15 11:26:16 +02:00
188 changed files with 10919 additions and 6876 deletions

1
ChangeLog Normal file

@@ -0,0 +1 @@
.. This is a generated file! Do not edit.


@@ -1,6 +1,14 @@
TaskFlow
========
.. image:: https://img.shields.io/pypi/v/taskflow.svg
:target: https://pypi.python.org/pypi/taskflow/
:alt: Latest Version
.. image:: https://img.shields.io/pypi/dm/taskflow.svg
:target: https://pypi.python.org/pypi/taskflow/
:alt: Downloads
A library to do [jobs, tasks, flows] in a highly available, easy to understand
and declarative manner (and more!) to be used with OpenStack and other
projects.
@@ -22,18 +30,16 @@ Requirements
~~~~~~~~~~~~
Because this project has many optional (pluggable) parts like persistence
backends and engines, we decided to split our requirements into three
backends and engines, we decided to split our requirements into two
parts: - things that are absolutely required (you can't use the project
without them) are put into ``requirements-pyN.txt`` (``N`` being the
Python *major* version number used to install the package). The requirements
without them) are put into ``requirements.txt``. The requirements
that are required by some optional part of this project (you can use the
project without them) are put into our ``tox.ini`` file (so that we can still
test the optional functionality works as expected). If you want to use the
feature in question (`eventlet`_ or the worker based engine that
uses `kombu`_ or the `sqlalchemy`_ persistence backend or jobboards which
project without them) are put into our ``test-requirements.txt`` file (so
that we can still test the optional functionality works as expected). If
you want to use the feature in question (`eventlet`_ or the worker based engine
that uses `kombu`_ or the `sqlalchemy`_ persistence backend or jobboards which
have an implementation built using `kazoo`_ ...), you should add
that requirement(s) to your project or environment; - as usual, things that
required only for running tests are put into ``test-requirements.txt``.
that requirement(s) to your project or environment.
Tox.ini
~~~~~~~


@@ -74,8 +74,8 @@ ignored during inference (as these names have special meaning/usage in python).
... def execute(self, *args, **kwargs):
... pass
...
>>> UniTask().requires
frozenset([])
>>> sorted(UniTask().requires)
[]
.. make vim sphinx highlighter* happy**
@@ -84,7 +84,7 @@ Rebinding
---------
**Why:** There are cases when the value you want to pass to a task/retry is
stored with a name other then the corresponding arguments name. That's when the
stored with a name other than the corresponding arguments name. That's when the
``rebind`` constructor parameter comes in handy. Using it the flow author
can instruct the engine to fetch a value from storage by one name, but pass it
to a tasks/retrys ``execute`` method with another name. There are two possible
@@ -214,8 +214,8 @@ name of the value.
... def execute(self):
... return 42
...
>>> TheAnswerReturningTask(provides='the_answer').provides
set(['the_answer'])
>>> sorted(TheAnswerReturningTask(provides='the_answer').provides)
['the_answer']
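As an illustrative (non-doctest) sketch combining the ideas above, ``provides`` names a task's output while ``rebind`` renames what its ``execute`` method receives (the task and value names here are made up for illustration)::

    from taskflow import task

    class CallTask(task.Task):
        def execute(self, phone_number):
            print("Calling %s." % phone_number)
            return 'connected'

    # Fetch 'phone_number' from storage under the name 'joes_number'
    # (positional rebind form) and store the result under 'call_status'.
    call_joe = CallTask(rebind=('joes_number',), provides='call_status')

    # The dictionary rebind form maps argument name -> storage name.
    call_jim = CallTask(rebind={'phone_number': 'jims_number'},
                        provides='call_status')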
Returning a tuple
+++++++++++++++++
@@ -416,7 +416,7 @@ the following history (printed as a list)::
At this point (since the implementation returned ``RETRY``) the
|retry.execute| method will be called again and it will receive the same
history and it can then return a value that subsequent tasks can use to alter
there behavior.
their behavior.
If instead the |retry.execute| method itself raises an exception,
the |retry.revert| method of the implementation will be called and


@@ -23,9 +23,9 @@ values (requirements) and name outputs (provided values).
Task
=====
A :py:class:`task <taskflow.task.BaseTask>` (derived from an atom) is the
smallest possible unit of work that can have an execute & rollback sequence
associated with it. These task objects all derive
A :py:class:`task <taskflow.task.BaseTask>` (derived from an atom) is a
unit of work that can have an execute & rollback sequence associated with
it (they are *nearly* analogous to functions). These task objects all derive
from :py:class:`~taskflow.task.BaseTask` which defines what a task must
provide in terms of properties and methods.
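A small illustrative sketch of such a task (the names used here are made up)::

    from taskflow import task

    class CreateVolume(task.Task):
        """A unit of work with an execute & rollback (revert) pair."""

        def execute(self, size):
            # Do the work and return a result for later atoms to use.
            print("Creating a %sGB volume." % size)
            return 'volume-1'

        def revert(self, size, result, **kwargs):
            # Called when this (or a later) atom fails; undo what
            # execute() did (``result`` holds execute's outcome).
            print("Undoing volume creation (result=%r)." % (result,))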
@@ -48,38 +48,30 @@ Retry
=====
A :py:class:`retry <taskflow.retry.Retry>` (derived from an atom) is a special
unit that handles errors, controls flow execution and can (for example) retry
other atoms with other parameters if needed. When an associated atom
fails, these retry units are *consulted* to determine what the resolution
method should be. The goal is that with this *consultation* the retry atom
will suggest a method for getting around the failure (perhaps by retrying,
reverting a single item, or reverting everything contained in the retries
associated scope).
unit of work that handles errors, controls flow execution and can (for
example) retry other atoms with other parameters if needed. When an associated
atom fails, these retry units are *consulted* to determine what the resolution
*strategy* should be. The goal is that with this consultation the retry atom
will suggest a *strategy* for getting around the failure (perhaps by retrying,
reverting a single atom, or reverting everything contained in the retries
associated `scope`_).
Currently derivatives of the :py:class:`retry <taskflow.retry.Retry>` base
class must provide a ``on_failure`` method to determine how a failure should
be handled.
class must provide a :py:func:`~taskflow.retry.Retry.on_failure` method to
determine how a failure should be handled. The current enumeration(s) that can
be returned from the :py:func:`~taskflow.retry.Retry.on_failure` method
are defined in an enumeration class described here:
The current enumeration set that can be returned from this method is:
* ``RETRY`` - retries the surrounding subflow (a retry object is associated
with a flow, which is typically converted into a graph hierarchy at
compilation time) again.
* ``REVERT`` - reverts only the surrounding subflow but *consult* the
parent atom before doing this to determine if the parent retry object
provides a different reconciliation strategy (retry atoms can be nested, this
is possible since flows themselves can be nested).
* ``REVERT_ALL`` - completely reverts a whole flow.
.. autoclass:: taskflow.retry.Decision
To aid in the reconciliation process the
:py:class:`retry <taskflow.retry.Retry>` base class also mandates ``execute``
and ``revert`` methods (although subclasses are allowed to define these methods
as no-ops) that can be used by a retry atom to interact with the runtime
execution model (for example, to track the number of times it has been
called which is useful for the :py:class:`~taskflow.retry.ForEach` retry
subclass).
:py:class:`retry <taskflow.retry.Retry>` base class also mandates
:py:func:`~taskflow.retry.Retry.execute`
and :py:func:`~taskflow.retry.Retry.revert` methods (although subclasses
are allowed to define these methods as no-ops) that can be used by a retry
atom to interact with the runtime execution model (for example, to track the
number of times it has been called which is useful for
the :py:class:`~taskflow.retry.ForEach` retry subclass).
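A rough sketch of such a derivative (assuming the ``RETRY``/``REVERT_ALL`` decision constants exposed by the ``taskflow.retry`` module)::

    from taskflow import retry

    class RetryTwiceThenGiveUp(retry.Retry):
        """Suggests retrying the associated scope twice, then reverting all."""

        def on_failure(self, history, *args, **kwargs):
            # 'history' records this retry atom's prior outcomes/failures.
            if len(history) < 2:
                return retry.RETRY
            return retry.REVERT_ALL

        def execute(self, history, *args, **kwargs):
            # Provide the attempt number to atoms in this retry's scope.
            return len(history) + 1

        def revert(self, history, *args, **kwargs):
            # Nothing to clean up in this sketch.
            pass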
To avoid recreating common retry patterns the following retry
subclasses are provided:
@@ -94,8 +86,40 @@ subclasses are provided:
:py:class:`~taskflow.retry.ForEach` but extracts values from storage
instead of the :py:class:`~taskflow.retry.ForEach` constructor.
Examples
--------
.. _scope: http://en.wikipedia.org/wiki/Scope_%28computer_science%29
.. note::
They are *similar* to exception handlers but are made to be *more* capable
due to their ability to *dynamically* choose a reconciliation strategy,
which allows for these atoms to influence subsequent execution(s) and the
inputs any associated atoms require.
Area of influence
-----------------
Each retry atom is associated with a flow and it can *influence* how the
atoms (or nested flows) contained in that flow retry or revert (using
the previously mentioned patterns and decision enumerations):
*For example:*
.. image:: img/area_of_influence.svg
:width: 325px
:align: left
:alt: Retry area of influence
In this diagram retry controller (1) will be consulted if task ``A``, ``B``
or ``C`` fail and retry controller (2) decides to delegate its retry decision
to retry controller (1). If retry controller (2) does **not** decide to
delegate its retry decision to retry controller (1) then retry
controller (1) will be oblivious of any decisions. If any of
task ``1``, ``2`` or ``3`` fail then only retry controller (1) will be
consulted to determine the strategy/pattern to apply to resolve their
associated failure.
Usage examples
--------------
.. testsetup::
@@ -167,7 +191,13 @@ Interfaces
==========
.. automodule:: taskflow.task
.. automodule:: taskflow.retry
.. autoclass:: taskflow.retry.Retry
.. autoclass:: taskflow.retry.History
.. autoclass:: taskflow.retry.AlwaysRevert
.. autoclass:: taskflow.retry.AlwaysRevertAll
.. autoclass:: taskflow.retry.Times
.. autoclass:: taskflow.retry.ForEach
.. autoclass:: taskflow.retry.ParameterizedForEach
Hierarchy
=========
@@ -175,5 +205,10 @@ Hierarchy
.. inheritance-diagram::
taskflow.atom
taskflow.task
taskflow.retry
taskflow.retry.Retry
taskflow.retry.AlwaysRevert
taskflow.retry.AlwaysRevertAll
taskflow.retry.Times
taskflow.retry.ForEach
taskflow.retry.ParameterizedForEach
:parts: 1


@@ -2,6 +2,10 @@
Conductors
----------
.. image:: img/conductor.png
:width: 97px
:alt: Conductor
Overview
========
@@ -18,14 +22,14 @@ They are responsible for the following:
tasks and flows to be executed).
* Dispatching the engine using the provided :doc:`persistence <persistence>`
layer and engine configuration.
* Completing or abandoning the claimed job (depending on dispatching and
execution outcome).
* Completing or abandoning the claimed :doc:`job <jobs>` (depending on
dispatching and execution outcome).
* *Rinse and repeat*.
.. note::
They are inspired by and have similar responsibilities
as `railroad conductors`_.
as `railroad conductors`_ or `musical conductors`_.
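A rough (hypothetical) sketch of wiring a blocking conductor to a jobboard and persistence backend; the board name, conductor name and the in-memory/zookeeper configuration are placeholders standing in for a real deployment::

    import contextlib

    from taskflow.conductors import backends as conductor_backends
    from taskflow.jobs import backends as job_backends
    from taskflow.persistence import backends as persistence_backends

    persistence = persistence_backends.fetch({'connection': 'memory://'})
    board = job_backends.fetch('my-board', {'board': 'zookeeper'},
                               persistence=persistence)

    with contextlib.closing(board):
        board.connect()
        conductor = conductor_backends.fetch('blocking', 'conductor-1', board,
                                             persistence=persistence)
        # run() claims jobs, dispatches engines and completes/abandons the
        # claimed jobs until stop() is called from elsewhere.
        conductor.run()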
Considerations
==============
@@ -53,28 +57,31 @@ claimable state.
#. Forcefully delete jobs that have been failing continuously after a given
number of conductor attempts. This can be either done manually or
automatically via scripts (or other associated monitoring).
automatically via scripts (or other associated monitoring) or via
the jobboards :py:func:`~taskflow.jobs.base.JobBoard.trash` method.
#. Resolve the internal error's cause (storage backend failure, other...).
#. Help implement `jobboard garbage binning`_.
.. _jobboard garbage binning: https://blueprints.launchpad.net/taskflow/+spec/jobboard-garbage-bin
Interfaces
==========
.. automodule:: taskflow.conductors.base
.. automodule:: taskflow.conductors.backends
Implementations
===============
.. automodule:: taskflow.conductors.single_threaded
Blocking
--------
.. automodule:: taskflow.conductors.backends.impl_blocking
Hierarchy
=========
.. inheritance-diagram::
taskflow.conductors.base
taskflow.conductors.single_threaded
taskflow.conductors.backends.impl_blocking
:parts: 1
.. _musical conductors: http://en.wikipedia.org/wiki/Conducting
.. _railroad conductors: http://en.wikipedia.org/wiki/Conductor_%28transportation%29


@@ -17,11 +17,13 @@ and *ideal* is that deployers or developers of a service that use TaskFlow can
select an engine that suits their setup best without modifying the code of
said service.
Engines usually have different capabilities and configuration, but all of them
**must** implement the same interface and preserve the semantics of patterns
(e.g. parts of a :py:class:`.linear_flow.Flow`
are run one after another, in order, even if the selected engine is *capable*
of running tasks in parallel).
.. note::
Engines usually have different capabilities and configuration, but all of
them **must** implement the same interface and preserve the semantics of
patterns (e.g. parts of a :py:class:`.linear_flow.Flow`
are run one after another, in order, even if the selected
engine is *capable* of running tasks in parallel).
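For example (a minimal sketch), a linear flow keeps its ordering no matter which engine type runs it::

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow

    class EchoTask(task.Task):
        def execute(self):
            print(self.name)

    flow = linear_flow.Flow('ordered').add(EchoTask('first'),
                                           EchoTask('second'),
                                           EchoTask('third'))

    # Even with a parallel-capable engine the linear_flow constraints force
    # 'first', 'second' and 'third' to run one after another, in order.
    engines.run(flow, engine='parallel')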
Why they exist
--------------
@@ -29,7 +31,7 @@ Why they exist
An engine being *the* core component which actually makes your flows progress
is likely a new concept for many programmers so let's describe how it operates
in more depth and some of the reasoning behind why it exists. This will
hopefully make it more clear on there value add to the TaskFlow library user.
hopefully make it more clear on their value add to the TaskFlow library user.
First though let us discuss something most are already familiar with: the
difference between `declarative`_ and `imperative`_ programming models. The
@@ -57,7 +59,7 @@ declarative model) allows for the following functionality to become possible:
accomplished allows for a *natural* way of resuming by allowing the engine to
track the current state and know at which point a workflow is in and how to
get back into that state when resumption occurs.
* Enhancing scalability: When a engine is responsible for executing your
* Enhancing scalability: When an engine is responsible for executing your
desired work it becomes possible to alter the *how* in the future by creating
new types of execution backends (for example the `worker`_ model which does
not execute locally). Without the decoupling of the *what* and the *how* it
@@ -172,13 +174,13 @@ using your desired execution model.
scalability by reducing thread/process creation and teardown as well as by
reusing existing pools (which is a good practice in general).
.. note::
.. warning::
Running tasks with a `process pool executor`_ is **experimentally**
supported. This is mainly due to the `futures backport`_ and
the `multiprocessing`_ module that exist in older versions of python not
being as up to date (with important fixes such as :pybug:`4892`,
:pybug:`6721`, :pybug:`9205`, :pybug:`11635`, :pybug:`16284`,
:pybug:`6721`, :pybug:`9205`, :pybug:`16284`,
:pybug:`22393` and others...) as the most recent python version (which
themselves have a variety of ongoing/recent bugs).
@@ -203,7 +205,7 @@ For further information, please refer to the following:
How they run
============
To provide a peek into the general process that a engine goes through when
To provide a peek into the general process that an engine goes through when
running lets break it apart a little and describe what one of the engine types
does while executing (for this we will look into the
:py:class:`~taskflow.engines.action_engine.engine.ActionEngine` engine type).
@@ -221,39 +223,48 @@ are setup.
Compiling
---------
During this stage the flow will be converted into an internal graph
representation using a
:py:class:`~taskflow.engines.action_engine.compiler.Compiler` (the default
implementation for patterns is the
During this stage (see :py:func:`~taskflow.engines.base.Engine.compile`) the
flow will be converted into an internal graph representation using a
compiler (the default implementation for patterns is the
:py:class:`~taskflow.engines.action_engine.compiler.PatternCompiler`). This
class compiles/converts the flow objects and contained atoms into a
`networkx`_ directed graph that contains the equivalent atoms defined in the
flow and any nested flows & atoms as well as the constraints that are created
by the application of the different flow patterns. This graph is then what will
be analyzed & traversed during the engines execution. At this point a few
helper object are also created and saved to internal engine variables (these
object help in execution of atoms, analyzing the graph and performing other
internal engine activities). At the finishing of this stage a
`networkx`_ directed graph (and tree structure) that contains the equivalent
atoms defined in the flow and any nested flows & atoms as well as the
constraints that are created by the application of the different flow
patterns. This graph (and tree) are what will be analyzed & traversed during
the engines execution. At this point a few helper objects are also created and
saved to internal engine variables (these objects help in execution of
atoms, analyzing the graph and performing other internal engine
activities). At the finishing of this stage a
:py:class:`~taskflow.engines.action_engine.runtime.Runtime` object is created
which contains references to all needed runtime components.
which contains references to all needed runtime components and its
:py:func:`~taskflow.engines.action_engine.runtime.Runtime.compile` is called
to compile a cache of frequently used execution helper objects.
Preparation
-----------
This stage starts by setting up the storage needed for all atoms in the
previously created graph, ensuring that corresponding
:py:class:`~taskflow.persistence.logbook.AtomDetail` (or subclass of) objects
are created for each node in the graph. Once this is done final validation
occurs on the requirements that are needed to start execution and what
:py:class:`~taskflow.storage.Storage` provides. If there is any atom or flow
requirements not satisfied then execution will not be allowed to continue.
This stage (see :py:func:`~taskflow.engines.base.Engine.prepare`) starts by
setting up the storage needed for all atoms in the compiled graph, ensuring
that corresponding :py:class:`~taskflow.persistence.models.AtomDetail` (or
subclass of) objects are created for each node in the graph.
Validation
----------
This stage (see :py:func:`~taskflow.engines.base.Engine.validate`) performs
any final validation of the compiled (and now storage prepared) engine. It
compares the requirements that are needed to start execution and
what is currently provided or will be produced in the future. If there are
*any* atom requirements that are not satisfied (no known current provider or
future producer is found) then execution will **not** be allowed to continue.
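A small sketch that drives these stages explicitly (``run()`` would otherwise perform any not-yet-done stages itself); the flow and task names here are illustrative only::

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow

    class Doubler(task.Task):
        default_provides = 'doubled'

        def execute(self, number):
            return number * 2

    flow = linear_flow.Flow('demo').add(Doubler())
    engine = engines.load(flow, store={'number': 21})

    engine.compile()    # flow -> graph (and tree) + runtime helpers
    engine.prepare()    # create/restore AtomDetail objects in storage
    engine.validate()   # ensure all requirements can be satisfied
    engine.run()
    print(engine.storage.fetch('doubled'))  # -> 42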
Execution
---------
The graph (and helper objects) previously created are now used for guiding
further execution. The flow is put into the ``RUNNING`` :doc:`state <states>`
and a
further execution (see :py:func:`~taskflow.engines.base.Engine.run`). The
flow is put into the ``RUNNING`` :doc:`state <states>` and a
:py:class:`~taskflow.engines.action_engine.runner.Runner` implementation
object starts to take over and begins going through the stages listed
below (for a more visual diagram/representation see
@@ -262,10 +273,10 @@ the :ref:`engine state diagram <engine states>`).
.. note::
The engine will respect the constraints imposed by the flow. For example,
if Engine is executing a :py:class:`.linear_flow.Flow` then it is
constrained by the dependency-graph which is linear in this case, and hence
using a Parallel Engine may not yield any benefits if one is looking for
concurrency.
if an engine is executing a :py:class:`~taskflow.patterns.linear_flow.Flow`
then it is constrained by the dependency graph which is linear in this
case, and hence using a parallel engine may not yield any benefits if one
is looking for concurrency.
Resumption
^^^^^^^^^^
@@ -282,7 +293,7 @@ for things like retry atom which can influence what a tasks intention should be
:py:class:`~taskflow.engines.action_engine.analyzer.Analyzer` helper
object which was designed to provide helper methods for this analysis). Once
these intentions are determined and associated with each task (the intention is
also stored in the :py:class:`~taskflow.persistence.logbook.AtomDetail` object)
also stored in the :py:class:`~taskflow.persistence.models.AtomDetail` object)
the :ref:`scheduling <scheduling>` stage starts.
.. _scheduling:
@@ -292,7 +303,7 @@ Scheduling
This stage selects which atoms are eligible to run by using a
:py:class:`~taskflow.engines.action_engine.scheduler.Scheduler` implementation
(the default implementation looks at there intention, checking if predecessor
(the default implementation looks at their intention, checking if predecessor
atoms have run and so-on, using a
:py:class:`~taskflow.engines.action_engine.analyzer.Analyzer` helper
object as needed) and submits those atoms to a previously provided compatible
@@ -312,15 +323,15 @@ submitted to complete. Once one of the future objects completes (or fails) that
atoms result will be examined and finalized using a
:py:class:`~taskflow.engines.action_engine.completer.Completer` implementation.
It typically will persist results to a provided persistence backend (saved
into the corresponding :py:class:`~taskflow.persistence.logbook.AtomDetail`
and :py:class:`~taskflow.persistence.logbook.FlowDetail` objects via the
into the corresponding :py:class:`~taskflow.persistence.models.AtomDetail`
and :py:class:`~taskflow.persistence.models.FlowDetail` objects via the
:py:class:`~taskflow.storage.Storage` helper) and reflect
the new state of the atom. At this point what typically happens falls into two
categories, one for if that atom failed and one for if it did not. If the atom
failed it may be set to a new intention such as ``RETRY`` or
``REVERT`` (other atoms that were predecessors of this failing atom may also
have their intention altered). Once this intention adjustment has happened a
new round of :ref:`scheduling <scheduling>` occurs and this process repeats
new round of :ref:`scheduling <scheduling>` occurs and this process repeats
until the engine succeeds or fails (if the process running the engine dies the
above stages will be restarted and resuming will occur).
@@ -328,8 +339,8 @@ above stages will be restarted and resuming will occur).
If the engine is suspended while the engine is going through the above
stages this will stop any further scheduling stages from occurring and
all currently executing atoms will be allowed to finish (and there results
will be saved).
all currently executing work will be allowed to finish (see
:ref:`suspension <suspension>`).
Finishing
---------
@@ -346,6 +357,79 @@ failures have occurred then the engine will have finished and if so desired the
:doc:`persistence <persistence>` can be used to cleanup any details that were
saved for this execution.
Special cases
=============
.. _suspension:
Suspension
----------
Each engine implements a :py:func:`~taskflow.engines.base.Engine.suspend`
method that can be used to *externally* (or in the future *internally*) request
that the engine stop :ref:`scheduling <scheduling>` new work. By default what
this performs is a transition of the flow state from ``RUNNING`` into a
``SUSPENDING`` state (which will later transition into a ``SUSPENDED`` state).
Since an engine may be remotely executing atoms (or locally executing them)
and there is currently no preemption what occurs is that the engines
:py:class:`~taskflow.engines.action_engine.runner.Runner` state machine will
detect that this transition into ``SUSPENDING`` has occurred and the state
machine will avoid scheduling new work (it will though let active work
continue). After the current work has finished the engine will
transition from ``SUSPENDING`` into ``SUSPENDED`` and return from its
:py:func:`~taskflow.engines.base.Engine.run` method.
.. note::
When :py:func:`~taskflow.engines.base.Engine.run` returns there *may*
(depending on what was active when
:py:func:`~taskflow.engines.base.Engine.suspend` was called) be unfinished
work remaining in the flow, which can be resumed at a later point in time.
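An illustrative sketch of requesting a suspension from another thread (timings and names are arbitrary)::

    import threading
    import time

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow

    class Sleepy(task.Task):
        def execute(self):
            time.sleep(1)

    flow = linear_flow.Flow('naps').add(Sleepy('nap-1'), Sleepy('nap-2'))
    engine = engines.load(flow)

    runner = threading.Thread(target=engine.run)
    runner.start()
    time.sleep(0.1)

    engine.suspend()   # flow goes RUNNING -> SUSPENDING -> SUSPENDED
    runner.join()      # run() returns once the active task finishes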
Scoping
=======
During creation of flows it is also important to understand the lookup
strategy (also typically known as `scope`_ resolution) that the engine you
are using will internally use. For example when a task ``A`` provides
result 'a' and a task ``B`` after ``A`` provides a different result 'a' and a
task ``C`` after ``A`` and after ``B`` requires 'a' to run, which one will
be selected?
Default strategy
----------------
When an engine is executing it internally interacts with the
:py:class:`~taskflow.storage.Storage` class and that class interacts with a
:py:class:`~taskflow.engines.action_engine.scopes.ScopeWalker` instance,
using the following lookup order to find (or fail) an atom's requirement
lookup/request:
#. Transient injected atom specific arguments.
#. Non-transient injected atom specific arguments.
#. Transient injected arguments (flow specific).
#. Non-transient injected arguments (flow specific).
#. First scope visited provider that produces the named result; note that
if multiple providers are found in the same scope the *first* (the scope
walkers yielded ordering defines what *first* means) that produced that
result *and* can be extracted without raising an error is selected as the
provider of the requested requirement.
#. Fails with :py:class:`~taskflow.exceptions.NotFound` if unresolved at this
point (the ``cause`` attribute of this exception may have more details on
why the lookup failed).
.. note::
To examine this information when debugging it is recommended to
enable the ``BLATHER`` logging level (level 5). At this level the storage
and scope code/layers will log what is being searched for and what is
being found.
.. _scope: http://en.wikipedia.org/wiki/Scope_%28computer_science%29
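A small sketch of the default lookup in action (names are illustrative)::

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow

    class ProvideA(task.Task):
        default_provides = 'a'

        def execute(self):
            return 'from-task-A'

    class NeedsA(task.Task):
        def execute(self, a):
            print("Got 'a' = %s" % a)

    flow = linear_flow.Flow('lookup-demo').add(ProvideA('A'), NeedsA('C'))

    # Task 'C' resolves 'a' from the scope-visited provider 'A'; per the
    # lookup order above, injected arguments (for example passing
    # store={'a': 'injected'} to load/run) would take precedence over it.
    engines.run(flow)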
Interfaces
==========
@@ -354,15 +438,27 @@ Interfaces
Implementations
===============
.. automodule:: taskflow.engines.action_engine.engine
Components
----------
.. warning::
External usage of internal engine functions, components and modules should
be kept to a **minimum** as they may be altered, refactored or moved to
other locations **without** notice (and without the typical deprecation
cycle).
.. automodule:: taskflow.engines.action_engine.analyzer
.. automodule:: taskflow.engines.action_engine.compiler
.. automodule:: taskflow.engines.action_engine.completer
.. automodule:: taskflow.engines.action_engine.engine
.. automodule:: taskflow.engines.action_engine.executor
.. automodule:: taskflow.engines.action_engine.runner
.. automodule:: taskflow.engines.action_engine.runtime
.. automodule:: taskflow.engines.action_engine.scheduler
.. automodule:: taskflow.engines.action_engine.scopes
.. autoclass:: taskflow.engines.action_engine.scopes.ScopeWalker
:special-members: __iter__
Hierarchy
=========


@@ -34,6 +34,30 @@ Using listeners
:linenos:
:lines: 16-
Using listeners (to watch a phone call)
=======================================
.. note::
Full source located at :example:`simple_linear_listening`.
.. literalinclude:: ../../taskflow/examples/simple_linear_listening.py
:language: python
:linenos:
:lines: 16-
Dumping an in-memory backend
============================
.. note::
Full source located at :example:`dump_memory_backend`.
.. literalinclude:: ../../taskflow/examples/dump_memory_backend.py
:language: python
:linenos:
:lines: 16-
Making phone calls
==================
@@ -176,6 +200,18 @@ Summation mapper(s) and reducer (in parallel)
:linenos:
:lines: 16-
Sharing a thread pool executor (in parallel)
============================================
.. note::
Full source located at :example:`share_engine_thread`
.. literalinclude:: ../../taskflow/examples/share_engine_thread.py
:language: python
:linenos:
:lines: 16-
Storing & emitting a bill
=========================
@@ -306,3 +342,28 @@ Jobboard producer/consumer (simple)
:language: python
:linenos:
:lines: 16-
Conductor simulating a CI pipeline
==================================
.. note::
Full source located at :example:`tox_conductor`
.. literalinclude:: ../../taskflow/examples/tox_conductor.py
:language: python
:linenos:
:lines: 16-
Conductor running 99 bottles of beer song requests
==================================================
.. note::
Full source located at :example:`99_bottles`
.. literalinclude:: ../../taskflow/examples/99_bottles.py
:language: python
:linenos:
:lines: 16-

2
doc/source/history.rst Normal file

@@ -0,0 +1,2 @@
.. include:: ../../ChangeLog

(Image diffs suppressed: several documentation images were added or updated.)

@@ -14,7 +14,7 @@ Contents
========
.. toctree::
:maxdepth: 3
:maxdepth: 2
atoms
arguments_and_results
@@ -29,6 +29,9 @@ Contents
jobs
conductors
Supplementary
=============
Examples
--------
@@ -62,7 +65,8 @@ TaskFlow into your project:
* Read over the `paradigm shifts`_ and engage the team in `IRC`_ (or via the
`openstack-dev`_ mailing list) if these need more explanation (prefix
``[TaskFlow]`` to your emails subject to get an even faster response).
``[Oslo][TaskFlow]`` to your emails subject to get an even faster
response).
* Follow (or at least attempt to follow) some of the established
`best practices`_ (feel free to add your own suggested best practices).
* Keep in touch with the team (see above); we are all friendly and enjoy
@@ -85,6 +89,29 @@ Miscellaneous
types
utils
Bookshelf
---------
A useful collection of links, documents, papers, similar
projects, frameworks and libraries.
.. note::
Please feel free to submit your own additions and/or changes.
.. toctree::
:maxdepth: 1
shelf
History
-------
.. toctree::
:maxdepth: 2
history
Indices and tables
==================


@@ -30,7 +30,7 @@ Definitions
Jobs
A :py:class:`job <taskflow.jobs.base.Job>` consists of a unique identifier,
name, and a reference to a :py:class:`logbook
<taskflow.persistence.logbook.LogBook>` which contains the details of the
<taskflow.persistence.models.LogBook>` which contains the details of the
work that has been or should be/will be completed to finish the work that has
been created for that job.
@@ -43,7 +43,7 @@ Jobboards
jobboards implement the same interface and semantics so that the backend
usage is as transparent as possible. This allows deployers or developers of a
service that uses TaskFlow to select a jobboard implementation that fits
their setup (and there intended usage) best.
their setup (and their intended usage) best.
High level architecture
=======================
@@ -62,7 +62,8 @@ Features
the previously partially completed work or begin initial work to ensure
that the workflow as a whole progresses (where progressing implies
transitioning through the workflow :doc:`patterns <patterns>` and
:doc:`atoms <atoms>` and completing their associated state transitions).
:doc:`atoms <atoms>` and completing their associated
:doc:`states <states>` transitions).
- Atomic transfer and single ownership
@@ -94,11 +95,12 @@ Features
Usage
=====
All engines are mere classes that implement same interface, and of course it is
possible to import them and create their instances just like with any classes
in Python. But the easier (and recommended) way for creating jobboards is by
using the :py:meth:`fetch() <taskflow.jobs.backends.fetch>` function which uses
entrypoints (internally using `stevedore`_) to fetch and configure your backend
All jobboards are mere classes that implement the same interface, and of course
it is possible to import them and create instances of them just like with any
other class in Python. But the easier (and recommended) way for creating
jobboards is by using the :py:meth:`fetch() <taskflow.jobs.backends.fetch>`
function which uses entrypoints (internally using `stevedore`_) to fetch and
configure your backend.
Using this function the typical creation of a jobboard (and an example posting
of a job) might look like:
@@ -200,13 +202,27 @@ Additional *configuration* parameters:
* ``handler``: a class that provides ``kazoo.handlers``-like interface; it will
be used internally by `kazoo`_ to perform asynchronous operations, useful
when your program uses eventlet and you want to instruct kazoo to use an
eventlet compatible handler (such as the `eventlet handler`_).
eventlet compatible handler.
.. note::
See :py:class:`~taskflow.jobs.backends.impl_zookeeper.ZookeeperJobBoard`
for implementation details.
Redis
-----
**Board type**: ``'redis'``
Uses `redis`_ to provide the jobboard capabilities and semantics by using
a redis hash datastructure and individual job ownership keys (that can
optionally expire after a given amount of time).
.. note::
See :py:class:`~taskflow.jobs.backends.impl_redis.RedisJobBoard`
for implementation details.
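A hypothetical fetch of a redis backed jobboard; the board name and the connection keys shown are placeholders for your own redis settings::

    from taskflow.jobs import backends as job_backends

    # 'board' selects the backend type; the remaining keys are handed to
    # that backend's client configuration.
    board = job_backends.fetch('my-board', {
        'board': 'redis',
        'host': '127.0.0.1',
        'port': 6379,
    })
    board.connect()
    try:
        job = board.post('example-job', book=None)
        print(job)
    finally:
        board.close()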
Considerations
==============
@@ -218,7 +234,7 @@ Dual-engine jobs
----------------
**What:** Since atoms and engines are not currently `preemptable`_ we can not
force a engine (or the threads/remote workers... it is using to run) to stop
force an engine (or the threads/remote workers... it is using to run) to stop
working on an atom (it is generally bad behavior to force code to stop without
its consent anyway) if it has already started working on an atom (short of
doing a ``kill -9`` on the running interpreter). This could cause problems
@@ -265,18 +281,27 @@ Interfaces
Implementations
===============
Zookeeper
---------
.. automodule:: taskflow.jobs.backends.impl_zookeeper
Redis
-----
.. automodule:: taskflow.jobs.backends.impl_redis
Hierarchy
=========
.. inheritance-diagram::
taskflow.jobs.base
taskflow.jobs.backends.impl_redis
taskflow.jobs.backends.impl_zookeeper
:parts: 1
.. _paradigm shift: https://wiki.openstack.org/wiki/TaskFlow/Paradigm_shifts#Workflow_ownership_transfer
.. _zookeeper: http://zookeeper.apache.org/
.. _kazoo: http://kazoo.readthedocs.org/
.. _eventlet handler: https://pypi.python.org/pypi/kazoo-eventlet-handler/
.. _stevedore: http://stevedore.readthedocs.org/
.. _redis: http://redis.io/


@@ -1,6 +1,6 @@
===========================
---------------------------
Notifications and listeners
===========================
---------------------------
.. testsetup::
@@ -10,13 +10,12 @@ Notifications and listeners
from taskflow.types import notifier
ANY = notifier.Notifier.ANY
--------
Overview
--------
========
Engines provide a way to receive notification on task and flow state
transitions, which is useful for monitoring, logging, metrics, debugging
and plenty of other tasks.
transitions (see :doc:`states <states>`), which is useful for
monitoring, logging, metrics, debugging and plenty of other tasks.
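A minimal sketch of such a registration on an engine's flow notifier (the callback signature and ``details`` keys follow the examples elsewhere in these docs; the flow/task names are made up)::

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow
    from taskflow.types import notifier

    ANY = notifier.Notifier.ANY

    class Noop(task.Task):
        def execute(self):
            pass

    def flow_watch(state, details):
        # Called on every flow state transition.
        print("Flow %r moved into state %s" % (details.get('flow_name'), state))

    engine = engines.load(linear_flow.Flow('f').add(Noop()))
    engine.notifier.register(ANY, flow_watch)
    engine.run()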
To receive these notifications you should register a callback with
an instance of the :py:class:`~taskflow.types.notifier.Notifier`
@@ -27,9 +26,8 @@ TaskFlow also comes with a set of predefined :ref:`listeners <listeners>`, and
provides means to write your own listeners, which can be more convenient than
using raw callbacks.
--------------------------------------
Receiving notifications with callbacks
--------------------------------------
======================================
Flow notifications
------------------
@@ -106,9 +104,8 @@ A basic example is:
.. _listeners:
---------
Listeners
---------
=========
TaskFlow comes with a set of predefined listeners -- helper classes that can be
used to do various actions on flow and/or tasks transitions. You can also
@@ -147,28 +144,31 @@ For example, this is how you can use
<taskflow.engines.action_engine.engine.SerialActionEngine object at ...> has moved task 'DogTalk' (...) into state 'SUCCESS' from state 'RUNNING' with result 'dog' (failure=False)
<taskflow.engines.action_engine.engine.SerialActionEngine object at ...> has moved flow 'cat-dog' (...) into state 'SUCCESS' from state 'RUNNING'
Basic listener
--------------
Interfaces
==========
.. autoclass:: taskflow.listeners.base.Listener
.. automodule:: taskflow.listeners.base
Implementations
===============
Printing and logging listeners
------------------------------
.. autoclass:: taskflow.listeners.base.DumpingListener
.. autoclass:: taskflow.listeners.logging.LoggingListener
.. autoclass:: taskflow.listeners.logging.DynamicLoggingListener
.. autoclass:: taskflow.listeners.printing.PrintingListener
Timing listener
---------------
Timing listeners
----------------
.. autoclass:: taskflow.listeners.timing.TimingListener
.. autoclass:: taskflow.listeners.timing.DurationListener
.. autoclass:: taskflow.listeners.timing.PrintingTimingListener
.. autoclass:: taskflow.listeners.timing.PrintingDurationListener
.. autoclass:: taskflow.listeners.timing.EventTimeListener
Claim listener
--------------
@@ -181,7 +181,7 @@ Capturing listener
.. autoclass:: taskflow.listeners.capturing.CaptureListener
Hierarchy
---------
=========
.. inheritance-diagram::
taskflow.listeners.base.DumpingListener
@@ -191,6 +191,7 @@ Hierarchy
taskflow.listeners.logging.DynamicLoggingListener
taskflow.listeners.logging.LoggingListener
taskflow.listeners.printing.PrintingListener
taskflow.listeners.timing.PrintingTimingListener
taskflow.listeners.timing.TimingListener
taskflow.listeners.timing.PrintingDurationListener
taskflow.listeners.timing.EventTimeListener
taskflow.listeners.timing.DurationListener
:parts: 1


@@ -40,38 +40,38 @@ On :doc:`engine <engines>` construction typically a backend (it can be
optional) will be provided which satisfies the
:py:class:`~taskflow.persistence.base.Backend` abstraction. Along with
providing a backend object a
:py:class:`~taskflow.persistence.logbook.FlowDetail` object will also be
:py:class:`~taskflow.persistence.models.FlowDetail` object will also be
created and provided (this object will contain the details about the flow to be
ran) to the engine constructor (or associated :py:meth:`load()
<taskflow.engines.helpers.load>` helper functions). Typically a
:py:class:`~taskflow.persistence.logbook.FlowDetail` object is created from a
:py:class:`~taskflow.persistence.logbook.LogBook` object (the book object acts
as a type of container for :py:class:`~taskflow.persistence.logbook.FlowDetail`
and :py:class:`~taskflow.persistence.logbook.AtomDetail` objects).
:py:class:`~taskflow.persistence.models.FlowDetail` object is created from a
:py:class:`~taskflow.persistence.models.LogBook` object (the book object acts
as a type of container for :py:class:`~taskflow.persistence.models.FlowDetail`
and :py:class:`~taskflow.persistence.models.AtomDetail` objects).
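A small sketch of this wiring using the in-memory backend and the logbook helpers from ``taskflow.utils.persistence_utils`` (as done in the bundled examples; flow and task names here are illustrative)::

    import contextlib

    from taskflow import engines
    from taskflow import task
    from taskflow.patterns import linear_flow
    from taskflow.persistence import backends as persistence_backends
    from taskflow.utils import persistence_utils

    class Noop(task.Task):
        def execute(self):
            pass

    backend = persistence_backends.fetch({'connection': 'memory://'})
    with contextlib.closing(backend.get_connection()) as conn:
        conn.upgrade()

    book = persistence_utils.temporary_log_book(backend)
    flow = linear_flow.Flow('persisted').add(Noop())

    # load() creates (and saves) a FlowDetail inside the given logbook and
    # uses the backend for all later state/result persistence.
    engine = engines.load(flow, backend=backend, book=book)
    engine.run()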
**Preparation**: Once an engine starts to run it will create a
:py:class:`~taskflow.storage.Storage` object which will act as the engines
interface to the underlying backend storage objects (it provides helper
functions that are commonly used by the engine, avoiding repeating code when
interacting with the provided
:py:class:`~taskflow.persistence.logbook.FlowDetail` and
:py:class:`~taskflow.persistence.models.FlowDetail` and
:py:class:`~taskflow.persistence.base.Backend` objects). As an engine
initializes it will extract (or create)
:py:class:`~taskflow.persistence.logbook.AtomDetail` objects for each atom in
:py:class:`~taskflow.persistence.models.AtomDetail` objects for each atom in
the workflow the engine will be executing.
**Execution:** When an engine begins to execute (see :doc:`engine <engines>`
for more of the details about how an engine goes about this process) it will
examine any previously existing
:py:class:`~taskflow.persistence.logbook.AtomDetail` objects to see if they can
:py:class:`~taskflow.persistence.models.AtomDetail` objects to see if they can
be used for resuming; see :doc:`resumption <resumption>` for more details on
this subject. For atoms which have not finished (or did not finish correctly
from a previous run) they will begin executing only after any dependent inputs
are ready. This is done by analyzing the execution graph and looking at
predecessor :py:class:`~taskflow.persistence.logbook.AtomDetail` outputs and
predecessor :py:class:`~taskflow.persistence.models.AtomDetail` outputs and
states (which may have been persisted in a past run). This will result in
either using there previous information or by running those predecessors and
saving their output to the :py:class:`~taskflow.persistence.logbook.FlowDetail`
either using their previous information or by running those predecessors and
saving their output to the :py:class:`~taskflow.persistence.models.FlowDetail`
and :py:class:`~taskflow.persistence.base.Backend` objects. This
execution, analysis and interaction with the storage objects continues (what is
described here is a simplification of what really happens; which is quite a bit
@@ -81,7 +81,7 @@ will have succeeded or failed in its attempt to run the workflow).
**Post-execution:** Typically when an engine is done running the logbook would
be discarded (to avoid creating a stockpile of useless data) and the backend
storage would be told to delete any contents for a given execution. For certain
use-cases though it may be advantageous to retain logbooks and there contents.
use-cases though it may be advantageous to retain logbooks and their contents.
A few scenarios come to mind:
@@ -176,7 +176,7 @@ concept everyone is familiar with).
See :py:class:`~taskflow.persistence.backends.impl_dir.DirBackend`
for implementation details.
Sqlalchemy
SQLAlchemy
----------
**Connection**: ``'mysql'`` or ``'postgres'`` or ``'sqlite'``
@@ -249,9 +249,13 @@ parent_uuid VARCHAR False
``results`` will contain. This size limit will restrict how many prior
failures a retry atom can contain. More information and a future fix
will be posted to bug `1416088`_ (for the meantime try to ensure that
your retry units history does not grow beyond ~80 prior results).
your retry units history does not grow beyond ~80 prior results). This
truncation can also be avoided by providing ``mysql_sql_mode`` as
``traditional`` when selecting your mysql + sqlalchemy based
backend (see the `mysql modes`_ documentation for what this implies).
.. _1416088: http://bugs.launchpad.net/taskflow/+bug/1416088
.. _mysql modes: http://dev.mysql.com/doc/refman/5.0/en/sql-mode.html
Zookeeper
---------
@@ -279,14 +283,34 @@ Interfaces
.. automodule:: taskflow.persistence.backends
.. automodule:: taskflow.persistence.base
.. automodule:: taskflow.persistence.logbook
.. automodule:: taskflow.persistence.path_based
Models
======
.. automodule:: taskflow.persistence.models
Implementations
===============
.. automodule:: taskflow.persistence.backends.impl_dir
Memory
------
.. automodule:: taskflow.persistence.backends.impl_memory
Files
-----
.. automodule:: taskflow.persistence.backends.impl_dir
SQLAlchemy
----------
.. automodule:: taskflow.persistence.backends.impl_sqlalchemy
Zookeeper
---------
.. automodule:: taskflow.persistence.backends.impl_zookeeper
Storage


@@ -46,7 +46,7 @@ name serves a special purpose in the resumption process (as well as serving a
useful purpose when running, allowing for atom identification in the
:doc:`notification <notifications>` process). The reason for having names is
that an atom in a flow needs to be somehow matched with (a potentially)
existing :py:class:`~taskflow.persistence.logbook.AtomDetail` during engine
existing :py:class:`~taskflow.persistence.models.AtomDetail` during engine
resumption & subsequent running.
The match should be:
@@ -71,9 +71,9 @@ Scenarios
=========
When new flow is loaded into engine, there is no persisted data for it yet, so
a corresponding :py:class:`~taskflow.persistence.logbook.FlowDetail` object
a corresponding :py:class:`~taskflow.persistence.models.FlowDetail` object
will be created, as well as a
:py:class:`~taskflow.persistence.logbook.AtomDetail` object for each atom that
:py:class:`~taskflow.persistence.models.AtomDetail` object for each atom that
is contained in it. These will be immediately saved into the persistence
backend that is configured. If no persistence backend is configured, then as
expected nothing will be saved and the atoms and flow will be ran in a
@@ -94,7 +94,7 @@ When the factory function mentioned above returns the exact same the flow and
atoms (no changes are performed).
**Runtime change:** Nothing should be done -- the engine will re-associate
atoms with :py:class:`~taskflow.persistence.logbook.AtomDetail` objects by name
atoms with :py:class:`~taskflow.persistence.models.AtomDetail` objects by name
and then the engine resumes.
Atom was added
@@ -105,7 +105,7 @@ in (for example for changing the runtime structure of what was previously ran
in the first run).
**Runtime change:** By default when the engine resumes it will notice that a
corresponding :py:class:`~taskflow.persistence.logbook.AtomDetail` does not
corresponding :py:class:`~taskflow.persistence.models.AtomDetail` does not
exist and one will be created and associated.
Atom was removed
@@ -134,7 +134,7 @@ factory should replace this name where it was being used previously.
exist when a new atom is added. In the future TaskFlow could make this easier
by providing an ``upgrade()`` function that can be used to give users the
ability to upgrade atoms before running (manual introspection & modification of
a :py:class:`~taskflow.persistence.logbook.LogBook` can be done before engine
a :py:class:`~taskflow.persistence.models.LogBook` can be done before engine
loading and running to accomplish this in the meantime).
Atom was split in two atoms or merged
@@ -150,7 +150,7 @@ exist when a new atom is added or removed. In the future TaskFlow could make
this easier by providing a ``migrate()`` function that can be used to give
users the ability to migrate atoms previous data before running (manual
introspection & modification of a
:py:class:`~taskflow.persistence.logbook.LogBook` can be done before engine
:py:class:`~taskflow.persistence.models.LogBook` can be done before engine
loading and running to accomplish this in the meantime).
Flow structure was changed

60
doc/source/shelf.rst Normal file

@@ -0,0 +1,60 @@
Libraries & frameworks
----------------------
* `APScheduler`_ (Python)
* `Async`_ (Python)
* `Celery`_ (Python)
* `Graffiti`_ (Python)
* `JobLib`_ (Python)
* `Luigi`_ (Python)
* `Mesos`_ (C/C++)
* `Papy`_ (Python)
* `Parallel Python`_ (Python)
* `RQ`_ (Python)
* `Spiff`_ (Python)
* `TBB Flow`_ (C/C++)
Languages
---------
* `Ani`_
* `Make`_
* `Plaid`_
Services
--------
* `Cloud Dataflow`_
* `Mistral`_
Papers
------
* `Advances in Dataflow Programming Languages`_
Related paradigms
-----------------
* `Dataflow programming`_
* `Programming paradigm(s)`_
.. _APScheduler: http://pythonhosted.org/APScheduler/
.. _Async: http://pypi.python.org/pypi/async
.. _Celery: http://www.celeryproject.org/
.. _Graffiti: http://github.com/SegFaultAX/graffiti
.. _JobLib: http://pythonhosted.org/joblib/index.html
.. _Luigi: http://github.com/spotify/luigi
.. _RQ: http://python-rq.org/
.. _Mistral: http://wiki.openstack.org/wiki/Mistral
.. _Mesos: http://mesos.apache.org/
.. _Parallel Python: http://www.parallelpython.com/
.. _Spiff: http://github.com/knipknap/SpiffWorkflow
.. _Papy: http://code.google.com/p/papy/
.. _Make: http://www.gnu.org/software/make/
.. _Ani: http://code.google.com/p/anic/
.. _Programming paradigm(s): http://en.wikipedia.org/wiki/Programming_paradigm
.. _Plaid: http://www.cs.cmu.edu/~aldrich/plaid/
.. _Advances in Dataflow Programming Languages: http://www.cs.ucf.edu/~dcm/Teaching/COT4810-Spring2011/Literature/DataFlowProgrammingLanguages.pdf
.. _Cloud Dataflow: https://cloud.google.com/dataflow/
.. _TBB Flow: https://www.threadingbuildingblocks.org/tutorial-intel-tbb-flow-graph
.. _Dataflow programming: http://en.wikipedia.org/wiki/Dataflow_programming


@@ -121,9 +121,14 @@ or if needed will wait for all of the atoms it depends on to complete.
.. note::
A engine running a task also transitions the task to the ``PENDING`` state
An engine running a task also transitions the task to the ``PENDING`` state
after it was reverted and its containing flow was restarted or retried.
**IGNORE** - When a conditional decision has been made to skip (not
execute) the task the engine will transition the task to
the ``IGNORE`` state.
**RUNNING** - When an engine running the task starts to execute the task, the
engine will transition the task to the ``RUNNING`` state, and the task will
stay in this state until the tasks :py:meth:`~taskflow.task.BaseTask.execute`
@@ -168,10 +173,14 @@ flow that the retry is associated with by consulting its
.. note::
A engine running a retry also transitions the retry to the ``PENDING`` state
An engine running a retry also transitions the retry to the ``PENDING`` state
after it was reverted and its associated flow was restarted or retried.
**RUNNING** - When a engine starts to execute the retry, the engine
**IGNORE** - When a conditional decision has been made to skip (not
execute) the retry the engine will transition the retry to
the ``IGNORE`` state.
**RUNNING** - When an engine starts to execute the retry, the engine
transitions the retry to the ``RUNNING`` state, and the retry stays in this
state until its :py:meth:`~taskflow.retry.Retry.execute` method returns.
@@ -194,3 +203,26 @@ already in the ``FAILURE`` state then this is a no-op).
**RETRYING** - If flow that is associated with the current retry was failed and
reverted, the engine prepares the flow for the next run and transitions the
retry to the ``RETRYING`` state.
Jobs
====
.. image:: img/job_states.svg
:width: 500px
:align: center
:alt: Job state transitions
**UNCLAIMED** - A job (with details about what work is to be completed) has
been initially posted (by some posting entity) to be worked on by some other
entity (for example a :doc:`conductor <conductors>`). This can also be a state
that is entered when some owning entity has manually abandoned (or
lost ownership of) a previously claimed job.
**CLAIMED** - A job that is *actively* owned by some entity; typically that
ownership is tied to jobs persistent data via some ephemeral connection so
that the job ownership is lost (typically automatically or after some
timeout) if that ephemeral connection is lost.
**COMPLETE** - The work defined in the job has been finished by its owning
entity and the job can no longer be processed (and it *may* be removed at
some/any point in the future).


@@ -29,11 +29,6 @@ FSM
.. automodule:: taskflow.types.fsm
Futures
=======
.. automodule:: taskflow.types.futures
Graph
=====
@@ -43,11 +38,12 @@ Notifier
========
.. automodule:: taskflow.types.notifier
:special-members: __call__
Periodic
========
Sets
====
.. automodule:: taskflow.types.periodic
.. automodule:: taskflow.types.sets
Table
=====


@@ -33,11 +33,6 @@ Kombu
.. automodule:: taskflow.utils.kombu_utils
Locks
~~~~~
.. automodule:: taskflow.utils.lock_utils
Miscellaneous
~~~~~~~~~~~~~
@@ -48,6 +43,16 @@ Persistence
.. automodule:: taskflow.utils.persistence_utils
Redis
~~~~~
.. automodule:: taskflow.utils.redis_utils
Schema
~~~~~~
.. automodule:: taskflow.utils.schema_utils
Threading
~~~~~~~~~


@@ -7,10 +7,9 @@ connected via `amqp`_ (or other supported `kombu`_ transports).
.. note::
This engine is under active development and is experimental but it is
usable and does work but is missing some features (please check the
`blueprint page`_ for known issues and plans) that will make it more
production ready.
This engine is under active development and is usable and **does** work
but is missing some features (please check the `blueprint page`_ for
known issues and plans) that will make it more production ready.
.. _blueprint page: https://blueprints.launchpad.net/taskflow?searchtext=wbe
@@ -18,8 +17,8 @@ Terminology
-----------
Client
Code or program or service that uses this library to define flows and
run them via engines.
Code or program or service (or user) that uses this library to define
flows and run them via engines.
Transport + protocol
Mechanism (and `protocol`_ on top of that mechanism) used to pass information
@@ -118,7 +117,7 @@ engine executor in the following manner:
4. The executor gets the task request confirmation from the worker and the task
request state changes from the ``PENDING`` to the ``RUNNING`` state. Once a
task request is in the ``RUNNING`` state it can't be timed-out (considering
that task execution process may take unpredictable time).
that the task execution process may take an unpredictable amount of time).
5. The executor gets the task execution result from the worker and passes it
back to the executor and worker-based engine to finish task processing (this
repeats for subsequent tasks).
@@ -129,7 +128,9 @@ engine executor in the following manner:
json-serializable (they contain references to tracebacks which are not
serializable), so they are converted to dicts before sending and converted
from dicts after receiving on both executor & worker sides (this
translation is lossy since the traceback won't be fully retained).
translation is lossy since the traceback can't be fully retained, due
to its contents containing internal interpreter references and
details).
Protocol
~~~~~~~~
@@ -406,16 +407,20 @@ Limitations
locally to avoid transport overhead for very simple tasks (currently it will
run even lightweight tasks remotely, which may be non-performant).
* Fault detection, currently when a worker acknowledges a task the engine will
wait for the task result indefinitely (a task could take a very long time to
finish). In the future there needs to be a way to limit the duration of a
remote workers execution (and track there liveness) and possibly spawn
the task on a secondary worker if a timeout is reached (aka the first worker
has died or has stopped responding).
wait for the task result indefinitely (a task may take an indeterminate
amount of time to finish). In the future there needs to be a way to limit
the duration of a remote workers execution (and track their liveness) and
possibly spawn the task on a secondary worker if a timeout is reached (aka
the first worker has died or has stopped responding).
Interfaces
==========
Implementations
===============
.. automodule:: taskflow.engines.worker_based.engine
Components
----------
.. automodule:: taskflow.engines.worker_based.proxy
.. automodule:: taskflow.engines.worker_based.worker


@@ -1,7 +1,4 @@
[DEFAULT]
# The list of modules to copy from oslo-incubator.git
script=tools/run_cross_tests.sh
# The base module to hold the copy of openstack.common
base=taskflow


@@ -12,7 +12,7 @@ variable-rgx=[a-z_][a-z0-9_]{0,30}$
argument-rgx=[a-z_][a-z0-9_]{1,30}$
# Method names should be at least 3 characters long
# and be lowecased with underscores
# and be lowercased with underscores
method-rgx=[a-z_][a-z0-9_]{2,50}$
# Don't require docstrings on tests.


@@ -1,30 +0,0 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
# See: https://bugs.launchpad.net/pbr/+bug/1384919 for why this is here...
pbr>=0.6,!=0.7,<1.0
# Packages needed for using this library.
# Only needed on python 2.6
ordereddict
# Python 2->3 compatibility library.
six>=1.7.0
# Very nice graph library
networkx>=1.8
# Used for backend storage engine loading.
stevedore>=1.1.0 # Apache-2.0
# Backport for concurrent.futures which exists in 3.2+
futures>=2.1.6
# Used for structured input validation
jsonschema>=2.0.0,<3.0.0
# For common utilities
oslo.utils>=1.2.0 # Apache-2.0
oslo.serialization>=1.2.0 # Apache-2.0


@@ -1,24 +0,0 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
# See: https://bugs.launchpad.net/pbr/+bug/1384919 for why this is here...
pbr>=0.6,!=0.7,<1.0
# Packages needed for using this library.
# Python 2->3 compatibility library.
six>=1.7.0
# Very nice graph library
networkx>=1.8
# Used for backend storage engine loading.
stevedore>=1.1.0 # Apache-2.0
# Used for structured input validation
jsonschema>=2.0.0,<3.0.0
# For common utilities
oslo.utils>=1.2.0 # Apache-2.0
oslo.serialization>=1.2.0 # Apache-2.0

48
requirements.txt Normal file

@@ -0,0 +1,48 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
# See: https://bugs.launchpad.net/pbr/+bug/1384919 for why this is here...
pbr<2.0,>=0.11
# Packages needed for using this library.
# Python 2->3 compatibility library.
six>=1.9.0
# Enum library made for <= python 3.3
enum34;python_version=='2.7' or python_version=='2.6'
# For async and/or periodic work
futurist>=0.1.1 # Apache-2.0
# For reader/writer + interprocess locks.
fasteners>=0.7 # Apache-2.0
# Very nice graph library
networkx>=1.8
# For contextlib new additions/compatibility for <= python 3.3
contextlib2>=0.4.0 # PSF License
# Used for backend storage engine loading.
stevedore>=1.5.0 # Apache-2.0
# Backport for concurrent.futures which exists in 3.2+
futures>=3.0;python_version=='2.7' or python_version=='2.6'
# Backport for time.monotonic which is in 3.3+
monotonic>=0.1 # Apache-2.0
# Used for structured input validation
jsonschema!=2.5.0,<3.0.0,>=2.0.0
# For common utilities
oslo.utils>=1.6.0 # Apache-2.0
oslo.serialization>=1.4.0 # Apache-2.0
# For lru caches and such
cachetools>=1.0.0 # MIT License
# For deprecation of things
debtcollector>=0.3.0 # Apache-2.0

View File

@@ -17,10 +17,8 @@ classifier =
Operating System :: POSIX :: Linux
Programming Language :: Python
Programming Language :: Python :: 2
Programming Language :: Python :: 2.6
Programming Language :: Python :: 2.7
Programming Language :: Python :: 3
Programming Language :: Python :: 3.3
Programming Language :: Python :: 3.4
Topic :: Software Development :: Libraries
Topic :: System :: Distributed Computing
@@ -36,6 +34,10 @@ packages =
[entry_points]
taskflow.jobboards =
zookeeper = taskflow.jobs.backends.impl_zookeeper:ZookeeperJobBoard
redis = taskflow.jobs.backends.impl_redis:RedisJobBoard
taskflow.conductors =
blocking = taskflow.conductors.backends.impl_blocking:BlockingConductor
taskflow.persistence =
dir = taskflow.persistence.backends.impl_dir:DirBackend

View File

@@ -1,4 +1,3 @@
#!/usr/bin/env python
# Copyright (c) 2013 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");

View File

@@ -16,14 +16,22 @@
# under the License.
import abc
import collections
import itertools
from oslo_utils import reflection
import six
from six.moves import zip as compat_zip
from taskflow import exceptions
from taskflow.types import sets
from taskflow.utils import misc
# Helper types tuples...
_sequence_types = (list, tuple, collections.Sequence)
_set_types = (set, collections.Set)
def _save_as_to_mapping(save_as):
"""Convert save_as to mapping name => index.
@@ -33,25 +41,27 @@ def _save_as_to_mapping(save_as):
# outside of code so that it's more easily understandable, since what an
# atom returns is pretty crucial for other later operations.
if save_as is None:
return {}
return collections.OrderedDict()
if isinstance(save_as, six.string_types):
# NOTE(harlowja): this means that your atom will only return one item
# instead of a dictionary-like object or a indexable object (like a
# list or tuple).
return {save_as: None}
elif isinstance(save_as, (tuple, list)):
return collections.OrderedDict([(save_as, None)])
elif isinstance(save_as, _sequence_types):
# NOTE(harlowja): this means that your atom will return a indexable
# object, like a list or tuple and the results can be mapped by index
# to that tuple/list that is returned for others to use.
return dict((key, num) for num, key in enumerate(save_as))
elif isinstance(save_as, set):
return collections.OrderedDict((key, num)
for num, key in enumerate(save_as))
elif isinstance(save_as, _set_types):
# NOTE(harlowja): in the case where a set is given we will not be
# able to determine the numeric ordering in a reliable way (since it is
# a unordered set) so the only way for us to easily map the result of
# the atom will be via the key itself.
return dict((key, key) for key in save_as)
raise TypeError('Atom provides parameter '
'should be str, set or tuple/list, not %r' % save_as)
# able to determine the numeric ordering in a reliable way (since it
# may be an unordered set) so the only way for us to easily map the
# result of the atom will be via the key itself.
return collections.OrderedDict((key, key) for key in save_as)
else:
raise TypeError('Atom provides parameter '
'should be str, set or tuple/list, not %r' % save_as)
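For illustration, a small sketch of the mappings this helper now produces
(the ``taskflow.atom`` import path is assumed from context, since file names
are not shown in this rendering):

    from taskflow import atom as _atom

    print(_atom._save_as_to_mapping(None))        # OrderedDict()
    print(_atom._save_as_to_mapping('x'))         # OrderedDict([('x', None)])
    print(_atom._save_as_to_mapping(('a', 'b')))  # OrderedDict([('a', 0), ('b', 1)])
    print(_atom._save_as_to_mapping({'a'}))       # OrderedDict([('a', 'a')])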
def _build_rebind_dict(args, rebind_args):
@@ -62,9 +72,9 @@ def _build_rebind_dict(args, rebind_args):
new name onto the required name).
"""
if rebind_args is None:
return {}
return collections.OrderedDict()
elif isinstance(rebind_args, (list, tuple)):
rebind = dict(zip(args, rebind_args))
rebind = collections.OrderedDict(compat_zip(args, rebind_args))
if len(args) < len(rebind_args):
rebind.update((a, a) for a in rebind_args[len(args):])
return rebind
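A quick sketch of what the positional form of rebinding yields (same assumed
import path as above):

    from taskflow import atom as _atom

    # Function args paired positionally with replacement names.
    print(_atom._build_rebind_dict(['a', 'b'], ['x', 'y']))
    # -> OrderedDict([('a', 'x'), ('b', 'y')])

    # Rebind names beyond the function's own args map to themselves.
    print(_atom._build_rebind_dict(['a'], ['x', 'extra']))
    # -> OrderedDict([('a', 'x'), ('extra', 'extra')])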
@@ -85,11 +95,11 @@ def _build_arg_mapping(atom_name, reqs, rebind_args, function, do_infer,
extra arguments (where applicable).
"""
# build a list of required arguments based on function signature
# Build a list of required arguments based on function signature.
req_args = reflection.get_callable_args(function, required_only=True)
all_args = reflection.get_callable_args(function, required_only=False)
# remove arguments that are part of ignore_list
# Remove arguments that are part of ignore list.
if ignore_list:
for arg in ignore_list:
if arg in req_args:
@@ -97,65 +107,56 @@ def _build_arg_mapping(atom_name, reqs, rebind_args, function, do_infer,
else:
ignore_list = []
required = {}
# add reqs to required mappings
# Build the required names.
required = collections.OrderedDict()
# Add required arguments to required mappings if inference is enabled.
if do_infer:
required.update((a, a) for a in req_args)
# Add additional manually provided requirements to required mappings.
if reqs:
if isinstance(reqs, six.string_types):
required.update({reqs: reqs})
else:
required.update((a, a) for a in reqs)
# add req_args to required mappings if do_infer is set
if do_infer:
required.update((a, a) for a in req_args)
# update required mappings based on rebind_args
# Update required mappings values based on rebinding of arguments names.
required.update(_build_rebind_dict(req_args, rebind_args))
# Determine if there are optional arguments that we may or may not take.
if do_infer:
opt_args = set(all_args) - set(required) - set(ignore_list)
optional = dict((a, a) for a in opt_args)
opt_args = sets.OrderedSet(all_args)
opt_args = opt_args - set(itertools.chain(six.iterkeys(required),
iter(ignore_list)))
optional = collections.OrderedDict((a, a) for a in opt_args)
else:
optional = {}
optional = collections.OrderedDict()
# Check if we are given some extra arguments that we aren't able to accept.
if not reflection.accepts_kwargs(function):
extra_args = set(required) - set(all_args)
extra_args = sets.OrderedSet(six.iterkeys(required))
extra_args -= all_args
if extra_args:
extra_args_str = ', '.join(sorted(extra_args))
raise ValueError('Extra arguments given to atom %s: %s'
% (atom_name, extra_args_str))
% (atom_name, list(extra_args)))
# NOTE(imelnikov): don't use set to preserve order in error message
missing_args = [arg for arg in req_args if arg not in required]
if missing_args:
raise ValueError('Missing arguments for atom %s: %s'
% (atom_name, ' ,'.join(missing_args)))
% (atom_name, missing_args))
return required, optional
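Putting the pieces together, a sketch of the mappings produced for a
hypothetical callable (the function and its argument names are made up):

    from taskflow import atom as _atom

    def make_server(flavor, image, name=None):
        pass

    required, optional = _atom._build_arg_mapping(
        'make_server', None, None, make_server, True, ignore_list=None)
    print(required)  # OrderedDict([('flavor', 'flavor'), ('image', 'image')])
    print(optional)  # OrderedDict([('name', 'name')])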
@six.add_metaclass(abc.ABCMeta)
class Atom(object):
"""An abstract flow atom that causes a flow to progress (in some manner).
"""An unit of work that causes a flow to progress (in some manner).
An atom is a named object that operates with input flow data to perform
An atom is a named object that operates with input data to perform
some action that furthers the overall flow's progress. It usually also
produces some of its own named output as a result of this process.
:ivar version: An *immutable* version that associates version information
with this atom. It can be useful in resuming older versions
of atoms. Standard major, minor versioning concepts
should apply.
:ivar save_as: An *immutable* output ``resource`` name dictionary this atom
produces that other atoms may depend on this atom providing.
The format is output index (or key when a dictionary
is returned from the execute method) to stored argument
name.
:ivar rebind: An *immutable* input ``resource`` mapping dictionary that
can be used to alter the inputs given to this atom. It is
typically used for mapping a prior atoms output into
the names that this atom expects (in a way this is like
remapping a namespace of another atom into the namespace
of this atom).
:param name: Meaningful name for this atom, should be something that is
distinguishable and understandable for notification,
debugging, storing and any other similar purposes.
@@ -164,52 +165,61 @@ class Atom(object):
to correlate and associate the thing/s this atom
produces, if it produces anything at all.
:param inject: An *immutable* input_name => value dictionary which
specifies any initial inputs that should be automatically
injected into the atoms scope before the atom execution
commences (this allows for providing atom *local* values that
do not need to be provided by other atoms/dependents).
specifies any initial inputs that should be automatically
injected into the atoms scope before the atom execution
commences (this allows for providing atom *local* values
that do not need to be provided by other atoms/dependents).
:ivar version: An *immutable* version that associates version information
with this atom. It can be useful in resuming older versions
of atoms. Standard major, minor versioning concepts
should apply.
:ivar save_as: An *immutable* output ``resource`` name
:py:class:`.OrderedDict` this atom produces that other
atoms may depend on this atom providing. The format is
output index (or key when a dictionary is returned from
the execute method) to stored argument name.
:ivar rebind: An *immutable* input ``resource`` :py:class:`.OrderedDict`
that can be used to alter the inputs given to this atom. It
is typically used for mapping a prior atoms output into
the names that this atom expects (in a way this is like
remapping a namespace of another atom into the namespace
of this atom).
:ivar inject: See parameter ``inject``.
:ivar requires: Any inputs this atom requires to function (if applicable).
NOTE(harlowja): there can be no intersection between what
this atom requires and what it produces (since this would
be an impossible dependency to satisfy).
:ivar optional: Any inputs that are optional for this atom's execute
method.
:ivar name: See parameter ``name``.
:ivar requires: A :py:class:`~taskflow.types.sets.OrderedSet` of inputs
this atom requires to function.
:ivar optional: A :py:class:`~taskflow.types.sets.OrderedSet` of inputs
that are optional for this atom to function.
:ivar provides: A :py:class:`~taskflow.types.sets.OrderedSet` of outputs
this atom produces.
"""
def __init__(self, name=None, provides=None, inject=None):
self._name = name
self.save_as = _save_as_to_mapping(provides)
self.name = name
self.version = (1, 0)
self.inject = inject
self.requires = frozenset()
self.optional = frozenset()
self.save_as = _save_as_to_mapping(provides)
self.requires = sets.OrderedSet()
self.optional = sets.OrderedSet()
self.provides = sets.OrderedSet(self.save_as)
self.rebind = collections.OrderedDict()
def _build_arg_mapping(self, executor, requires=None, rebind=None,
auto_extract=True, ignore_list=None):
req_arg, opt_arg = _build_arg_mapping(self.name, requires, rebind,
executor, auto_extract,
ignore_list)
self.rebind = {}
if opt_arg:
self.rebind.update(opt_arg)
if req_arg:
self.rebind.update(req_arg)
self.requires = frozenset(req_arg.values())
self.optional = frozenset(opt_arg.values())
required, optional = _build_arg_mapping(self.name, requires, rebind,
executor, auto_extract,
ignore_list=ignore_list)
rebind = collections.OrderedDict()
for (arg_name, bound_name) in itertools.chain(six.iteritems(required),
six.iteritems(optional)):
rebind.setdefault(arg_name, bound_name)
self.rebind = rebind
self.requires = sets.OrderedSet(six.itervalues(required))
self.optional = sets.OrderedSet(six.itervalues(optional))
if self.inject:
inject_set = set(six.iterkeys(self.inject))
self.requires -= inject_set
self.optional -= inject_set
out_of_order = self.provides.intersection(self.requires)
if out_of_order:
raise exceptions.DependencyFailure(
"Atom %(item)s provides %(oo)s that are required "
"by this atom"
% dict(item=self.name, oo=sorted(out_of_order)))
inject_keys = frozenset(six.iterkeys(self.inject))
self.requires -= inject_keys
self.optional -= inject_keys
@abc.abstractmethod
def execute(self, *args, **kwargs):
@@ -219,23 +229,8 @@ class Atom(object):
def revert(self, *args, **kwargs):
"""Reverts this atom (undoing any :meth:`execute` side-effects)."""
@property
def name(self):
"""A non-unique name for this atom (human readable)."""
return self._name
def __str__(self):
return "%s==%s" % (self.name, misc.get_version_string(self))
def __repr__(self):
return '<%s %s>' % (reflection.get_class_name(self), self)
@property
def provides(self):
"""Any outputs this atom produces.
NOTE(harlowja): there can be no intersection between what this atom
requires and what it produces (since this would be an impossible
dependency to satisfy).
"""
return set(self.save_as)
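Taken together, a brief sketch of how these attributes now surface on a task
(tasks build on this atom base class; the ``MakeVolume`` class and its names
are made up for illustration):

    from taskflow import task

    class MakeVolume(task.Task):
        def execute(self, size, availability_zone=None):
            return 'vol-1'

    vol = MakeVolume(provides='volume', inject={'size': 10})
    print(list(vol.requires))   # []  ('size' is satisfied by inject)
    print(list(vol.optional))   # ['availability_zone']
    print(list(vol.provides))   # ['volume']
    print(vol.rebind)           # each execute() arg mapped to itself here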

View File

@@ -0,0 +1,45 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
import stevedore.driver
from taskflow import exceptions as exc
# NOTE(harlowja): this is the entrypoint namespace, not the module namespace.
CONDUCTOR_NAMESPACE = 'taskflow.conductors'
LOG = logging.getLogger(__name__)
def fetch(kind, name, jobboard, namespace=CONDUCTOR_NAMESPACE, **kwargs):
"""Fetch a conductor backend with the given options.
This fetch method will look for the entrypoint 'kind' in the entrypoint
namespace, and then attempt to instantiate that entrypoint using the
provided name, jobboard and any board specific kwargs.
"""
LOG.debug('Looking for %r conductor driver in %r', kind, namespace)
try:
mgr = stevedore.driver.DriverManager(
namespace, kind,
invoke_on_load=True,
invoke_args=(name, jobboard),
invoke_kwds=kwargs)
return mgr.driver
except RuntimeError as e:
raise exc.NotFound("Could not find conductor %s" % (kind), e)
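A usage sketch (assuming this helper is importable as
``taskflow.conductors.backends.fetch``; jobboard and persistence creation are
not shown and the names below are placeholders):

    from taskflow.conductors import backends as conductor_backends

    def start_conductor(jobboard, persistence_backend):
        # 'blocking' matches the taskflow.conductors entrypoint registered
        # in setup.cfg above.
        conductor = conductor_backends.fetch(
            'blocking', 'conductor-1', jobboard,
            persistence=persistence_backend, wait_timeout=1.0)
        conductor.connect()
        conductor.run()  # Blocks until conductor.stop() is called elsewhere.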

View File

@@ -0,0 +1,219 @@
# -*- coding: utf-8 -*-
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import threading
try:
from contextlib import ExitStack # noqa
except ImportError:
from contextlib2 import ExitStack # noqa
from debtcollector import removals
from oslo_utils import excutils
import six
from taskflow.conductors import base
from taskflow import exceptions as excp
from taskflow.listeners import logging as logging_listener
from taskflow import logging
from taskflow.types import timing as tt
from taskflow.utils import async_utils
LOG = logging.getLogger(__name__)
WAIT_TIMEOUT = 0.5
NO_CONSUME_EXCEPTIONS = tuple([
excp.ExecutionFailure,
excp.StorageFailure,
])
class BlockingConductor(base.Conductor):
"""A conductor that runs jobs in its own dispatching loop.
This conductor iterates over jobs in the provided jobboard (waiting for
the given timeout if no jobs exist) and attempts to claim them, work on
those jobs in its local thread (blocking further work from being claimed
and consumed) and then consume those work units after completion. This
process will repeat until the conductor has been stopped or other critical
error occurs.
NOTE(harlowja): consumption occurs even if an engine fails to run due to
a task failure. This is only skipped when an execution failure or
a storage failure occurs which are *usually* correctable by re-running on
a different conductor (storage failures and execution failures may be
transient issues that can be worked around by later execution). If a job
after completing can not be consumed or abandoned the conductor relies
upon the jobboard capabilities to automatically abandon these jobs.
"""
START_FINISH_EVENTS_EMITTED = tuple([
'compilation', 'preparation',
'validation', 'running',
])
"""Events will be emitted for the start and finish of each engine
activity defined above, the actual event name that can be registered
to subscribe to will be ``${event}_start`` and ``${event}_end`` where
the ``${event}`` in this pseudo-variable will be one of these events.
"""
def __init__(self, name, jobboard,
persistence=None, engine=None,
engine_options=None, wait_timeout=None):
super(BlockingConductor, self).__init__(
name, jobboard, persistence=persistence,
engine=engine, engine_options=engine_options)
if wait_timeout is None:
wait_timeout = WAIT_TIMEOUT
if isinstance(wait_timeout, (int, float) + six.string_types):
self._wait_timeout = tt.Timeout(float(wait_timeout))
elif isinstance(wait_timeout, tt.Timeout):
self._wait_timeout = wait_timeout
else:
raise ValueError("Invalid timeout literal: %s" % (wait_timeout))
self._dead = threading.Event()
@removals.removed_kwarg('timeout', version="0.8", removal_version="2.0")
def stop(self, timeout=None):
"""Requests the conductor to stop dispatching.
This method can be used to request that a conductor stop its
consumption & dispatching loop.
The method returns immediately regardless of whether the conductor has
been stopped.
.. deprecated:: 0.8
The ``timeout`` parameter is **deprecated** and is present for
backward compatibility **only**. In order to wait for the
conductor to gracefully shut down, :py:meth:`wait` should be used
instead.
"""
self._wait_timeout.interrupt()
@property
def dispatching(self):
return not self._dead.is_set()
def _listeners_from_job(self, job, engine):
listeners = super(BlockingConductor, self)._listeners_from_job(job,
engine)
listeners.append(logging_listener.LoggingListener(engine, log=LOG))
return listeners
def _dispatch_job(self, job):
engine = self._engine_from_job(job)
listeners = self._listeners_from_job(job, engine)
with ExitStack() as stack:
for listener in listeners:
stack.enter_context(listener)
LOG.debug("Dispatching engine for job '%s'", job)
consume = True
try:
for stage_func, event_name in [(engine.compile, 'compilation'),
(engine.prepare, 'preparation'),
(engine.validate, 'validation'),
(engine.run, 'running')]:
self._notifier.notify("%s_start" % event_name, {
'job': job,
'engine': engine,
'conductor': self,
})
stage_func()
self._notifier.notify("%s_end" % event_name, {
'job': job,
'engine': engine,
'conductor': self,
})
except excp.WrappedFailure as e:
if all((f.check(*NO_CONSUME_EXCEPTIONS) for f in e)):
consume = False
if LOG.isEnabledFor(logging.WARNING):
if consume:
LOG.warn("Job execution failed (consumption being"
" skipped): %s [%s failures]", job, len(e))
else:
LOG.warn("Job execution failed (consumption"
" proceeding): %s [%s failures]", job, len(e))
# Show the failure/s + traceback (if possible)...
for i, f in enumerate(e):
LOG.warn("%s. %s", i + 1, f.pformat(traceback=True))
except NO_CONSUME_EXCEPTIONS:
LOG.warn("Job execution failed (consumption being"
" skipped): %s", job, exc_info=True)
consume = False
except Exception:
LOG.warn("Job execution failed (consumption proceeding): %s",
job, exc_info=True)
else:
LOG.info("Job completed successfully: %s", job)
return async_utils.make_completed_future(consume)
def run(self):
self._dead.clear()
try:
while True:
if self._wait_timeout.is_stopped():
break
dispatched = 0
for job in self._jobboard.iterjobs():
if self._wait_timeout.is_stopped():
break
LOG.debug("Trying to claim job: %s", job)
try:
self._jobboard.claim(job, self._name)
except (excp.UnclaimableJob, excp.NotFound):
LOG.debug("Job already claimed or consumed: %s", job)
continue
consume = False
try:
f = self._dispatch_job(job)
except KeyboardInterrupt:
with excutils.save_and_reraise_exception():
LOG.warn("Job dispatching interrupted: %s", job)
except Exception:
LOG.warn("Job dispatching failed: %s", job,
exc_info=True)
else:
dispatched += 1
consume = f.result()
try:
if consume:
self._jobboard.consume(job, self._name)
else:
self._jobboard.abandon(job, self._name)
except (excp.JobFailure, excp.NotFound):
if consume:
LOG.warn("Failed job consumption: %s", job,
exc_info=True)
else:
LOG.warn("Failed job abandonment: %s", job,
exc_info=True)
if dispatched == 0 and not self._wait_timeout.is_stopped():
self._wait_timeout.wait()
finally:
self._dead.set()
def wait(self, timeout=None):
"""Waits for the conductor to gracefully exit.
This method waits for the conductor to gracefully exit. An optional
timeout can be provided, which will cause the method to return
within the specified timeout. If the timeout is reached, the returned
value will be False.
:param timeout: Maximum number of seconds that the :meth:`wait` method
should block for.
"""
return self._dead.wait(timeout)
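Subscribing to the per-stage events described above is done through the
conductor's notifier; a sketch (the helper name is made up):

    def watch_engine_stages(conductor):
        # Event names follow the ${event}_start / ${event}_end convention
        # (compilation, preparation, validation, running).
        def on_stage(event, details):
            print('%s happened for job %s' % (event, details['job']))
        for stage in ('compilation', 'preparation', 'validation', 'running'):
            conductor.notifier.register(stage + '_start', on_stage)
            conductor.notifier.register(stage + '_end', on_stage)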

View File

@@ -15,16 +15,17 @@
import abc
import threading
import fasteners
import six
from taskflow import engines
from taskflow import exceptions as excp
from taskflow.utils import lock_utils
from taskflow.types import notifier
@six.add_metaclass(abc.ABCMeta)
class Conductor(object):
"""Conductors conduct jobs & assist in associated runtime interactions.
"""Base for all conductor implementations.
Conductors act as entities which extract jobs from a jobboard, assign
their work to some engine (using some desired configuration) and then wait
@@ -34,8 +35,8 @@ class Conductor(object):
period of time will finish up the prior failed conductors work.
"""
def __init__(self, name, jobboard, persistence,
engine=None, engine_options=None):
def __init__(self, name, jobboard,
persistence=None, engine=None, engine_options=None):
self._name = name
self._jobboard = jobboard
self._engine = engine
@@ -45,6 +46,18 @@ class Conductor(object):
self._engine_options = engine_options.copy()
self._persistence = persistence
self._lock = threading.RLock()
self._notifier = notifier.Notifier()
@property
def notifier(self):
"""The conductor actions (or other state changes) notifier.
NOTE(harlowja): different conductor implementations may emit
different events + event details at different times, so refer to your
conductor documentation to know exactly what can and what can not be
subscribed to.
"""
return self._notifier
def _flow_detail_from_job(self, job):
"""Extracts a flow detail from a job (via some manner).
@@ -88,20 +101,36 @@ class Conductor(object):
store = dict(job.details["store"])
else:
store = {}
return engines.load_from_detail(flow_detail, store=store,
engine=self._engine,
backend=self._persistence,
**self._engine_options)
engine = engines.load_from_detail(flow_detail, store=store,
engine=self._engine,
backend=self._persistence,
**self._engine_options)
return engine
@lock_utils.locked
def _listeners_from_job(self, job, engine):
"""Returns a list of listeners to be attached to an engine.
This method should be overridden in order to attach listeners to
engines. It will be called once for each job, and the returned list of
listeners will be added to the engine for that job.
:param job: A job instance that is about to be run in an engine.
:param engine: The engine that listeners will be attached to.
:returns: a list of (unregistered) listener instances.
"""
# TODO(dkrause): Create a standard way to pass listeners or
# listener factories over the jobboard
return []
@fasteners.locked
def connect(self):
"""Ensures the jobboard is connected (noop if it is already)."""
if not self._jobboard.connected:
self._jobboard.connect()
@lock_utils.locked
@fasteners.locked
def close(self):
"""Closes the jobboard, disallowing further use."""
"""Closes the contained jobboard, disallowing further use."""
self._jobboard.close()
@abc.abstractmethod
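The ``_listeners_from_job`` hook above is how per-job listeners get attached;
a sketch of overriding it (the subclass name is made up, and the printing
listener is used only as an example of a stock listener):

    from taskflow.conductors.backends import impl_blocking
    from taskflow.listeners import printing

    class ChattyConductor(impl_blocking.BlockingConductor):
        def _listeners_from_job(self, job, engine):
            listeners = super(ChattyConductor, self)._listeners_from_job(
                job, engine)
            # Each returned (unregistered) listener is entered as a context
            # manager around the engine run by the dispatching conductor.
            listeners.append(printing.PrintingListener(engine))
            return listeners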

View File

@@ -1,5 +1,7 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2015 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
@@ -12,163 +14,18 @@
# License for the specific language governing permissions and limitations
# under the License.
import six
from debtcollector import moves
from debtcollector import removals
from taskflow.conductors import base
from taskflow import exceptions as excp
from taskflow.listeners import logging as logging_listener
from taskflow import logging
from taskflow.types import timing as tt
from taskflow.utils import async_utils
from taskflow.utils import deprecation
from taskflow.utils import threading_utils
from taskflow.conductors.backends import impl_blocking
LOG = logging.getLogger(__name__)
WAIT_TIMEOUT = 0.5
NO_CONSUME_EXCEPTIONS = tuple([
excp.ExecutionFailure,
excp.StorageFailure,
])
# TODO(harlowja): remove this module soon...
removals.removed_module(__name__,
replacement="the conductor entrypoints",
version="0.8", removal_version="2.0",
stacklevel=4)
class SingleThreadedConductor(base.Conductor):
"""A conductor that runs jobs in its own dispatching loop.
This conductor iterates over jobs in the provided jobboard (waiting for
the given timeout if no jobs exist) and attempts to claim them, work on
those jobs in its local thread (blocking further work from being claimed
and consumed) and then consume those work units after completetion. This
process will repeat until the conductor has been stopped or other critical
error occurs.
NOTE(harlowja): consumption occurs even if a engine fails to run due to
a task failure. This is only skipped when an execution failure or
a storage failure occurs which are *usually* correctable by re-running on
a different conductor (storage failures and execution failures may be
transient issues that can be worked around by later execution). If a job
after completing can not be consumed or abandoned the conductor relies
upon the jobboard capabilities to automatically abandon these jobs.
"""
def __init__(self, name, jobboard, persistence,
engine=None, engine_options=None, wait_timeout=None):
super(SingleThreadedConductor, self).__init__(
name, jobboard, persistence,
engine=engine, engine_options=engine_options)
if wait_timeout is None:
wait_timeout = WAIT_TIMEOUT
if isinstance(wait_timeout, (int, float) + six.string_types):
self._wait_timeout = tt.Timeout(float(wait_timeout))
elif isinstance(wait_timeout, tt.Timeout):
self._wait_timeout = wait_timeout
else:
raise ValueError("Invalid timeout literal: %s" % (wait_timeout))
self._dead = threading_utils.Event()
@deprecation.removed_kwarg('timeout',
version="0.8", removal_version="?")
def stop(self, timeout=None):
"""Requests the conductor to stop dispatching.
This method can be used to request that a conductor stop its
consumption & dispatching loop.
The method returns immediately regardless of whether the conductor has
been stopped.
:param timeout: This parameter is **deprecated** and is present for
backward compatibility **only**. In order to wait for
the conductor to gracefully shut down, :meth:`wait`
should be used instead.
"""
self._wait_timeout.interrupt()
@property
def dispatching(self):
return not self._dead.is_set()
def _dispatch_job(self, job):
engine = self._engine_from_job(job)
consume = True
with logging_listener.LoggingListener(engine, log=LOG):
LOG.debug("Dispatching engine %s for job: %s", engine, job)
try:
engine.run()
except excp.WrappedFailure as e:
if all((f.check(*NO_CONSUME_EXCEPTIONS) for f in e)):
consume = False
if LOG.isEnabledFor(logging.WARNING):
if consume:
LOG.warn("Job execution failed (consumption being"
" skipped): %s [%s failures]", job, len(e))
else:
LOG.warn("Job execution failed (consumption"
" proceeding): %s [%s failures]", job, len(e))
# Show the failure/s + traceback (if possible)...
for i, f in enumerate(e):
LOG.warn("%s. %s", i + 1, f.pformat(traceback=True))
except NO_CONSUME_EXCEPTIONS:
LOG.warn("Job execution failed (consumption being"
" skipped): %s", job, exc_info=True)
consume = False
except Exception:
LOG.warn("Job execution failed (consumption proceeding): %s",
job, exc_info=True)
else:
LOG.info("Job completed successfully: %s", job)
return async_utils.make_completed_future(consume)
def run(self):
self._dead.clear()
try:
while True:
if self._wait_timeout.is_stopped():
break
dispatched = 0
for job in self._jobboard.iterjobs():
if self._wait_timeout.is_stopped():
break
LOG.debug("Trying to claim job: %s", job)
try:
self._jobboard.claim(job, self._name)
except (excp.UnclaimableJob, excp.NotFound):
LOG.debug("Job already claimed or consumed: %s", job)
continue
consume = False
try:
f = self._dispatch_job(job)
except Exception:
LOG.warn("Job dispatching failed: %s", job,
exc_info=True)
else:
dispatched += 1
consume = f.result()
try:
if consume:
self._jobboard.consume(job, self._name)
else:
self._jobboard.abandon(job, self._name)
except (excp.JobFailure, excp.NotFound):
if consume:
LOG.warn("Failed job consumption: %s", job,
exc_info=True)
else:
LOG.warn("Failed job abandonment: %s", job,
exc_info=True)
if dispatched == 0 and not self._wait_timeout.is_stopped():
self._wait_timeout.wait()
finally:
self._dead.set()
def wait(self, timeout=None):
"""Waits for the conductor to gracefully exit.
This method waits for the conductor to gracefully exit. An optional
timeout can be provided, which will cause the method to return
within the specified timeout. If the timeout is reached, the returned
value will be False.
:param timeout: Maximum number of seconds that the :meth:`wait` method
should block for.
"""
return self._dead.wait(timeout)
# TODO(harlowja): remove this proxy/legacy class soon...
SingleThreadedConductor = moves.moved_class(
impl_blocking.BlockingConductor, 'SingleThreadedConductor',
__name__, version="0.8", removal_version="?")

View File

@@ -14,8 +14,16 @@
# License for the specific language governing permissions and limitations
# under the License.
from oslo_utils import eventletutils as _eventletutils
# promote helpers to this module namespace
# Give a nice warning that if eventlet is being used these modules
# are highly recommended to be patched (or otherwise bad things could
# happen).
_eventletutils.warn_eventlet_not_patched(
expected_patched_modules=['time', 'thread'])
# Promote helpers to this module namespace (for easy access).
from taskflow.engines.helpers import flow_from_detail # noqa
from taskflow.engines.helpers import load # noqa
from taskflow.engines.helpers import load_from_detail # noqa

View File

@@ -32,11 +32,6 @@ SAVE_RESULT_STATES = (states.SUCCESS, states.FAILURE)
class Action(object):
"""An action that handles executing, state changes, ... of atoms."""
def __init__(self, storage, notifier, walker_factory):
def __init__(self, storage, notifier):
self._storage = storage
self._notifier = notifier
self._walker_factory = walker_factory
@abc.abstractmethod
def handles(self, atom):
"""Checks if this action handles the provided atom."""

View File

@@ -14,13 +14,14 @@
# License for the specific language governing permissions and limitations
# under the License.
import futurist
from taskflow.engines.action_engine.actions import base
from taskflow.engines.action_engine import executor as ex
from taskflow import logging
from taskflow import retry as retry_atom
from taskflow import states
from taskflow.types import failure
from taskflow.types import futures
LOG = logging.getLogger(__name__)
@@ -44,20 +45,14 @@ def _revert_retry(retry, arguments):
class RetryAction(base.Action):
"""An action that handles executing, state changes, ... of retry atoms."""
def __init__(self, storage, notifier, walker_factory):
super(RetryAction, self).__init__(storage, notifier, walker_factory)
self._executor = futures.SynchronousExecutor()
@staticmethod
def handles(atom):
return isinstance(atom, retry_atom.Retry)
def __init__(self, storage, notifier):
super(RetryAction, self).__init__(storage, notifier)
self._executor = futurist.SynchronousExecutor()
def _get_retry_args(self, retry, addons=None):
scope_walker = self._walker_factory(retry)
arguments = self._storage.fetch_mapped_args(
retry.rebind,
atom_name=retry.name,
scope_walker=scope_walker,
optional_args=retry.optional
)
history = self._storage.get_retry_history(retry.name)

View File

@@ -28,14 +28,10 @@ LOG = logging.getLogger(__name__)
class TaskAction(base.Action):
"""An action that handles scheduling, state changes, ... of task atoms."""
def __init__(self, storage, notifier, walker_factory, task_executor):
super(TaskAction, self).__init__(storage, notifier, walker_factory)
def __init__(self, storage, notifier, task_executor):
super(TaskAction, self).__init__(storage, notifier)
self._task_executor = task_executor
@staticmethod
def handles(atom):
return isinstance(atom, task_atom.BaseTask)
def _is_identity_transition(self, old_state, state, task, progress):
if state in base.SAVE_RESULT_STATES:
# saving result is never identity transition
@@ -100,11 +96,9 @@ class TaskAction(base.Action):
def schedule_execution(self, task):
self.change_state(task, states.RUNNING, progress=0.0)
scope_walker = self._walker_factory(task)
arguments = self._storage.fetch_mapped_args(
task.rebind,
atom_name=task.name,
scope_walker=scope_walker,
optional_args=task.optional
)
if task.notifier.can_be_registered(task_atom.EVENT_UPDATE_PROGRESS):
@@ -126,11 +120,9 @@ class TaskAction(base.Action):
def schedule_reversion(self, task):
self.change_state(task, states.REVERTING, progress=0.0)
scope_walker = self._walker_factory(task)
arguments = self._storage.fetch_mapped_args(
task.rebind,
atom_name=task.name,
scope_walker=scope_walker,
optional_args=task.optional
)
task_uuid = self._storage.get_atom_uuid(task.name)

View File

@@ -14,6 +14,8 @@
# License for the specific language governing permissions and limitations
# under the License.
import itertools
from networkx.algorithms import traversal
import six
@@ -21,6 +23,60 @@ from taskflow import retry as retry_atom
from taskflow import states as st
class IgnoreDecider(object):
"""Checks any provided edge-deciders and determines if ok to run."""
def __init__(self, atom, edge_deciders):
self._atom = atom
self._edge_deciders = edge_deciders
def check(self, runtime):
"""Returns bool of whether this decider should allow running."""
results = {}
for name in six.iterkeys(self._edge_deciders):
results[name] = runtime.storage.get(name)
for local_decider in six.itervalues(self._edge_deciders):
if not local_decider(history=results):
return False
return True
def affect(self, runtime):
"""If the :py:func:`~.check` returns false, affects associated atoms.
This will alter the associated atom + successor atoms by setting their
state to ``IGNORE`` so that they are ignored in future runtime
activities.
"""
successors_iter = runtime.analyzer.iterate_subgraph(self._atom)
runtime.reset_nodes(itertools.chain([self._atom], successors_iter),
state=st.IGNORE, intention=st.IGNORE)
def check_and_affect(self, runtime):
"""Handles :py:func:`~.check` + :py:func:`~.affect` in right order."""
proceed = self.check(runtime)
if not proceed:
self.affect(runtime)
return proceed
class NoOpDecider(object):
"""No-op decider that says it is always ok to run & has no effect(s)."""
def check(self, runtime):
"""Always good to go."""
return True
def affect(self, runtime):
"""Does nothing."""
def check_and_affect(self, runtime):
"""Handles :py:func:`~.check` + :py:func:`~.affect` in right order.
Does nothing.
"""
return self.check(runtime)
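The deciders consulted above are plain callables that receive the
predecessor results by keyword and return a boolean; a sketch of one (how
deciders get attached to graph edges is not shown in this hunk):

    def only_if_created(history):
        # 'history' maps predecessor atom names to their stored results, as
        # fetched in IgnoreDecider.check(); returning False causes the
        # successor atom (and its subgraph) to be marked IGNORE.
        return all(result == 'CREATED' for result in history.values())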
class Analyzer(object):
"""Analyzes a compilation and aids in execution processes.
@@ -31,21 +87,25 @@ class Analyzer(object):
the rest of the runtime system.
"""
def __init__(self, compilation, storage):
self._storage = storage
self._graph = compilation.execution_graph
def __init__(self, runtime):
self._storage = runtime.storage
self._execution_graph = runtime.compilation.execution_graph
self._check_atom_transition = runtime.check_atom_transition
self._fetch_edge_deciders = runtime.fetch_edge_deciders
def get_next_nodes(self, node=None):
"""Get next nodes to run (originating from node or all nodes)."""
if node is None:
execute = self.browse_nodes_for_execute()
revert = self.browse_nodes_for_revert()
return execute + revert
state = self.get_state(node)
intention = self._storage.get_atom_intention(node.name)
if state == st.SUCCESS:
if intention == st.REVERT:
return [node]
return [
(node, NoOpDecider()),
]
elif intention == st.EXECUTE:
return self.browse_nodes_for_execute(node)
else:
@@ -60,74 +120,90 @@ class Analyzer(object):
def browse_nodes_for_execute(self, node=None):
"""Browse next nodes to execute.
This returns a collection of nodes that are ready to be executed, if
given a specific node it will only examine the successors of that node,
otherwise it will examine the whole graph.
This returns a collection of nodes that *may* be ready to be
executed, if given a specific node it will only examine the successors
of that node, otherwise it will examine the whole graph.
"""
if node:
nodes = self._graph.successors(node)
if node is not None:
nodes = self._execution_graph.successors(node)
else:
nodes = self._graph.nodes_iter()
available_nodes = []
nodes = self._execution_graph.nodes_iter()
ready_nodes = []
for node in nodes:
if self._is_ready_for_execute(node):
available_nodes.append(node)
return available_nodes
is_ready, late_decider = self._get_maybe_ready_for_execute(node)
if is_ready:
ready_nodes.append((node, late_decider))
return ready_nodes
def browse_nodes_for_revert(self, node=None):
"""Browse next nodes to revert.
This returns a collection of nodes that are ready to be be reverted, if
given a specific node it will only examine the predecessors of that
node, otherwise it will examine the whole graph.
This returns a collection of nodes that *may* be ready to be
reverted, if given a specific node it will only examine the
predecessors of that node, otherwise it will examine the whole
graph.
"""
if node:
nodes = self._graph.predecessors(node)
if node is not None:
nodes = self._execution_graph.predecessors(node)
else:
nodes = self._graph.nodes_iter()
available_nodes = []
nodes = self._execution_graph.nodes_iter()
ready_nodes = []
for node in nodes:
if self._is_ready_for_revert(node):
available_nodes.append(node)
return available_nodes
is_ready, late_decider = self._get_maybe_ready_for_revert(node)
if is_ready:
ready_nodes.append((node, late_decider))
return ready_nodes
def _is_ready_for_execute(self, task):
"""Checks if task is ready to be executed."""
state = self.get_state(task)
intention = self._storage.get_atom_intention(task.name)
transition = st.check_task_transition(state, st.RUNNING)
def _get_maybe_ready_for_execute(self, atom):
"""Returns if an atom is *likely* ready to be executed."""
state = self.get_state(atom)
intention = self._storage.get_atom_intention(atom.name)
transition = self._check_atom_transition(atom, state, st.RUNNING)
if not transition or intention != st.EXECUTE:
return False
return (False, None)
task_names = []
for prev_task in self._graph.predecessors(task):
task_names.append(prev_task.name)
predecessor_names = []
for previous_atom in self._execution_graph.predecessors(atom):
predecessor_names.append(previous_atom.name)
task_states = self._storage.get_atoms_states(task_names)
return all(state == st.SUCCESS and intention == st.EXECUTE
for state, intention in six.itervalues(task_states))
predecessor_states = self._storage.get_atoms_states(predecessor_names)
predecessor_states_iter = six.itervalues(predecessor_states)
ok_to_run = all(state == st.SUCCESS and intention == st.EXECUTE
for state, intention in predecessor_states_iter)
def _is_ready_for_revert(self, task):
"""Checks if task is ready to be reverted."""
state = self.get_state(task)
intention = self._storage.get_atom_intention(task.name)
transition = st.check_task_transition(state, st.REVERTING)
if not ok_to_run:
return (False, None)
else:
edge_deciders = self._fetch_edge_deciders(atom)
return (True, IgnoreDecider(atom, edge_deciders))
def _get_maybe_ready_for_revert(self, atom):
"""Returns if an atom is *likely* ready to be reverted."""
state = self.get_state(atom)
intention = self._storage.get_atom_intention(atom.name)
transition = self._check_atom_transition(atom, state, st.REVERTING)
if not transition or intention not in (st.REVERT, st.RETRY):
return False
return (False, None)
task_names = []
for prev_task in self._graph.successors(task):
task_names.append(prev_task.name)
predecessor_names = []
for previous_atom in self._execution_graph.successors(atom):
predecessor_names.append(previous_atom.name)
task_states = self._storage.get_atoms_states(task_names)
return all(state in (st.PENDING, st.REVERTED)
for state, intention in six.itervalues(task_states))
predecessor_states = self._storage.get_atoms_states(predecessor_names)
predecessor_states_iter = six.itervalues(predecessor_states)
ok_to_run = all(state in (st.PENDING, st.REVERTED)
for state, intention in predecessor_states_iter)
def iterate_subgraph(self, retry):
"""Iterates a subgraph connected to given retry controller."""
for _src, dst in traversal.dfs_edges(self._graph, retry):
if not ok_to_run:
return (False, None)
else:
return (True, NoOpDecider())
def iterate_subgraph(self, atom):
"""Iterates a subgraph connected to given atom."""
for _src, dst in traversal.dfs_edges(self._execution_graph, atom):
yield dst
def iterate_retries(self, state=None):
@@ -135,23 +211,30 @@ class Analyzer(object):
If no state is provided it will yield back all retry controllers.
"""
for node in self._graph.nodes_iter():
for node in self._execution_graph.nodes_iter():
if isinstance(node, retry_atom.Retry):
if not state or self.get_state(node) == state:
yield node
def iterate_all_nodes(self):
for node in self._graph.nodes_iter():
"""Yields back all nodes in the execution graph."""
for node in self._execution_graph.nodes_iter():
yield node
def find_atom_retry(self, atom):
return self._graph.node[atom].get('retry')
"""Returns the retry atom associated to the given atom (or none)."""
return self._execution_graph.node[atom].get('retry')
def is_success(self):
for node in self._graph.nodes_iter():
if self.get_state(node) != st.SUCCESS:
"""Checks if all nodes in the execution graph are in 'happy' state."""
for atom in self.iterate_all_nodes():
atom_state = self.get_state(atom)
if atom_state == st.IGNORE:
continue
if atom_state != st.SUCCESS:
return False
return True
def get_state(self, node):
return self._storage.get_atom_state(node.name)
def get_state(self, atom):
"""Gets the state of a given atom (from the backend storage unit)."""
return self._storage.get_atom_state(atom.name)

View File

@@ -17,14 +17,14 @@
import collections
import threading
import fasteners
from taskflow import exceptions as exc
from taskflow import flow
from taskflow import logging
from taskflow import retry
from taskflow import task
from taskflow.types import graph as gr
from taskflow.types import tree as tr
from taskflow.utils import lock_utils
from taskflow.utils import misc
LOG = logging.getLogger(__name__)
@@ -158,13 +158,22 @@ class Linker(object):
" decomposed into an empty graph" % (v, u, u))
for u in u_g.nodes_iter():
for v in v_g.nodes_iter():
depends_on = u.provides & v.requires
# This is using the intersection() method vs the &
# operator since the latter doesn't work with frozen
# sets (when used in combination with ordered sets).
#
# If this is not done the following happens...
#
# TypeError: unsupported operand type(s)
# for &: 'frozenset' and 'OrderedSet'
depends_on = u.provides.intersection(v.requires)
if depends_on:
edge_attrs = {
_EDGE_REASONS: frozenset(depends_on),
}
_add_update_edges(graph,
[u], [v],
attr_dict={
_EDGE_REASONS: depends_on,
})
attr_dict=edge_attrs)
else:
# Connect nodes with no predecessors in v to nodes with no
# successors in the *first* non-empty predecessor of v (thus
@@ -180,8 +189,84 @@ class Linker(object):
priors.append((u, v))
class _TaskCompiler(object):
"""Non-recursive compiler of tasks."""
@staticmethod
def handles(obj):
return isinstance(obj, task.BaseTask)
def compile(self, task, parent=None):
graph = gr.DiGraph(name=task.name)
graph.add_node(task)
node = tr.Node(task)
if parent is not None:
parent.add(node)
return graph, node
class _FlowCompiler(object):
"""Recursive compiler of flows."""
@staticmethod
def handles(obj):
return isinstance(obj, flow.Flow)
def __init__(self, deep_compiler_func, linker):
self._deep_compiler_func = deep_compiler_func
self._linker = linker
def _connect_retry(self, retry, graph):
graph.add_node(retry)
# All nodes that have no predecessors should depend on this retry.
nodes_to = [n for n in graph.no_predecessors_iter() if n is not retry]
if nodes_to:
_add_update_edges(graph, [retry], nodes_to,
attr_dict=_RETRY_EDGE_DATA)
# Add association for each node of graph that has no existing retry.
for n in graph.nodes_iter():
if n is not retry and flow.LINK_RETRY not in graph.node[n]:
graph.node[n][flow.LINK_RETRY] = retry
@staticmethod
def _occurence_detector(to_graph, from_graph):
return sum(1 for node in from_graph.nodes_iter()
if node in to_graph)
def _decompose_flow(self, flow, parent=None):
"""Decomposes a flow into a graph, tree node + decomposed subgraphs."""
graph = gr.DiGraph(name=flow.name)
node = tr.Node(flow)
if parent is not None:
parent.add(node)
if flow.retry is not None:
node.add(tr.Node(flow.retry))
decomposed_members = {}
for item in flow:
subgraph, _subnode = self._deep_compiler_func(item, parent=node)
decomposed_members[item] = subgraph
if subgraph.number_of_nodes():
graph = gr.merge_graphs(
graph, subgraph,
# We can specialize this to be simpler than the default
# algorithm which creates overhead that we don't
# need for our purposes...
overlap_detector=self._occurence_detector)
return graph, node, decomposed_members
def compile(self, flow, parent=None):
graph, node, decomposed_members = self._decompose_flow(flow,
parent=parent)
self._linker.apply_constraints(graph, flow, decomposed_members)
if flow.retry is not None:
self._connect_retry(flow.retry, graph)
return graph, node
class PatternCompiler(object):
"""Compiles a pattern (or task) into a compilation unit.
"""Compiles a flow pattern (or task) into a compilation unit.
Let's dive into the basic idea for how this works:
@@ -189,9 +274,10 @@ class PatternCompiler(object):
this object could be a task, or a flow (one of the supported patterns),
the end-goal is to produce a :py:class:`.Compilation` object as the result
with the needed components. If this is not possible a
:py:class:`~.taskflow.exceptions.CompilationFailure` will be raised (or
in the case where a unknown type is being requested to compile
a ``TypeError`` will be raised).
:py:class:`~.taskflow.exceptions.CompilationFailure` will be raised.
In the case where an **unknown** type is being requested to compile
a ``TypeError`` will be raised and when a duplicate object (one that
has **already** been compiled) is encountered a ``ValueError`` is raised.
The complexity of this comes into play when the 'root' is a flow that
contains itself other nested flows (and so-on); to compile this object and
@@ -281,98 +367,40 @@ class PatternCompiler(object):
self._freeze = freeze
self._lock = threading.Lock()
self._compilation = None
self._matchers = [
_FlowCompiler(self._compile, self._linker),
_TaskCompiler(),
]
def _flatten(self, item, parent):
"""Flattens a item (pattern, task) into a graph + tree node."""
functor = self._find_flattener(item, parent)
self._pre_item_flatten(item)
graph, node = functor(item, parent)
self._post_item_flatten(item, graph, node)
return graph, node
def _find_flattener(self, item, parent):
"""Locates the flattening function to use to flatten the given item."""
if isinstance(item, flow.Flow):
return self._flatten_flow
elif isinstance(item, task.BaseTask):
return self._flatten_task
elif isinstance(item, retry.Retry):
if parent is None:
raise TypeError("Retry controller '%s' (%s) must only be used"
" as a flow constructor parameter and not as a"
" root component" % (item, type(item)))
else:
raise TypeError("Retry controller '%s' (%s) must only be used"
" as a flow constructor parameter and not as a"
" flow added component" % (item, type(item)))
def _compile(self, item, parent=None):
"""Compiles a item (pattern, task) into a graph + tree node."""
for m in self._matchers:
if m.handles(item):
self._pre_item_compile(item)
graph, node = m.compile(item, parent=parent)
self._post_item_compile(item, graph, node)
return graph, node
else:
raise TypeError("Unknown item '%s' (%s) requested to flatten"
raise TypeError("Unknown object '%s' (%s) requested to compile"
% (item, type(item)))
def _connect_retry(self, retry, graph):
graph.add_node(retry)
# All nodes that have no predecessors should depend on this retry.
nodes_to = [n for n in graph.no_predecessors_iter() if n is not retry]
if nodes_to:
_add_update_edges(graph, [retry], nodes_to,
attr_dict=_RETRY_EDGE_DATA)
# Add association for each node of graph that has no existing retry.
for n in graph.nodes_iter():
if n is not retry and flow.LINK_RETRY not in graph.node[n]:
graph.node[n][flow.LINK_RETRY] = retry
def _flatten_task(self, task, parent):
"""Flattens a individual task."""
graph = gr.DiGraph(name=task.name)
graph.add_node(task)
node = tr.Node(task)
if parent is not None:
parent.add(node)
return graph, node
def _decompose_flow(self, flow, parent):
"""Decomposes a flow into a graph, tree node + decomposed subgraphs."""
graph = gr.DiGraph(name=flow.name)
node = tr.Node(flow)
if parent is not None:
parent.add(node)
if flow.retry is not None:
node.add(tr.Node(flow.retry))
decomposed_members = {}
for item in flow:
subgraph, _subnode = self._flatten(item, node)
decomposed_members[item] = subgraph
if subgraph.number_of_nodes():
graph = gr.merge_graphs([graph, subgraph])
return graph, node, decomposed_members
def _flatten_flow(self, flow, parent):
"""Flattens a flow."""
graph, node, decomposed_members = self._decompose_flow(flow, parent)
self._linker.apply_constraints(graph, flow, decomposed_members)
if flow.retry is not None:
self._connect_retry(flow.retry, graph)
return graph, node
def _pre_item_flatten(self, item):
"""Called before a item is flattened; any pre-flattening actions."""
def _pre_item_compile(self, item):
"""Called before a item is compiled; any pre-compilation actions."""
if item in self._history:
raise ValueError("Already flattened item '%s' (%s), recursive"
" flattening is not supported" % (item,
type(item)))
raise ValueError("Already compiled item '%s' (%s), duplicate"
" and/or recursive compiling is not"
" supported" % (item, type(item)))
self._history.add(item)
def _post_item_flatten(self, item, graph, node):
"""Called after a item is flattened; doing post-flattening actions."""
def _post_item_compile(self, item, graph, node):
"""Called after a item is compiled; doing post-compilation actions."""
def _pre_flatten(self):
"""Called before the flattening of the root starts."""
def _pre_compile(self):
"""Called before the compilation of the root starts."""
self._history.clear()
def _post_flatten(self, graph, node):
"""Called after the flattening of the root finishes successfully."""
def _post_compile(self, graph, node):
"""Called after the compilation of the root finishes successfully."""
dup_names = misc.get_duplicate_keys(graph.nodes_iter(),
key=lambda node: node.name)
if dup_names:
@@ -396,13 +424,13 @@ class PatternCompiler(object):
# Indent it so that it's slightly offset from the above line.
LOG.blather(" %s", line)
@lock_utils.locked
@fasteners.locked
def compile(self):
"""Compiles the contained item into a compiled equivalent."""
if self._compilation is None:
self._pre_flatten()
graph, node = self._flatten(self._root, None)
self._post_flatten(graph, node)
self._pre_compile()
graph, node = self._compile(self._root, parent=None)
self._post_compile(graph, node)
if self._freeze:
graph.freeze()
node.freeze()
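A small usage sketch of the compiler (the ``Noop`` task and the flow name are
made up; the module path is assumed from context):

    from taskflow.engines.action_engine import compiler
    from taskflow.patterns import linear_flow
    from taskflow import task

    class Noop(task.Task):
        def execute(self):
            pass

    flow = linear_flow.Flow('demo').add(Noop('a'), Noop('b'))
    compilation = compiler.PatternCompiler(flow).compile()
    print(compilation.execution_graph.number_of_nodes())  # 2
    print(compilation.hierarchy)  # root tree node (the flow), tasks beneath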

View File

@@ -14,22 +14,102 @@
# License for the specific language governing permissions and limitations
# under the License.
import abc
import weakref
from oslo_utils import reflection
import six
from taskflow.engines.action_engine import executor as ex
from taskflow import logging
from taskflow import retry as retry_atom
from taskflow import states as st
from taskflow import task as task_atom
from taskflow.types import failure
LOG = logging.getLogger(__name__)
@six.add_metaclass(abc.ABCMeta)
class Strategy(object):
"""Failure resolution strategy base class."""
strategy = None
def __init__(self, runtime):
self._runtime = runtime
@abc.abstractmethod
def apply(self):
"""Applies some algorithm to resolve some detected failure."""
def __str__(self):
base = reflection.get_class_name(self, fully_qualified=False)
if self.strategy is not None:
strategy_name = self.strategy.name
else:
strategy_name = "???"
return base + "(strategy=%s)" % (strategy_name)
class RevertAndRetry(Strategy):
"""Sets the *associated* subflow for revert to be later retried."""
strategy = retry_atom.RETRY
def __init__(self, runtime, retry):
super(RevertAndRetry, self).__init__(runtime)
self._retry = retry
def apply(self):
tweaked = self._runtime.reset_nodes([self._retry], state=None,
intention=st.RETRY)
tweaked.extend(self._runtime.reset_subgraph(self._retry, state=None,
intention=st.REVERT))
return tweaked
class RevertAll(Strategy):
"""Sets *all* nodes/atoms to the ``REVERT`` intention."""
strategy = retry_atom.REVERT_ALL
def __init__(self, runtime):
super(RevertAll, self).__init__(runtime)
self._analyzer = runtime.analyzer
def apply(self):
return self._runtime.reset_nodes(self._analyzer.iterate_all_nodes(),
state=None, intention=st.REVERT)
class Revert(Strategy):
"""Sets atom and *associated* nodes to the ``REVERT`` intention."""
strategy = retry_atom.REVERT
def __init__(self, runtime, atom):
super(Revert, self).__init__(runtime)
self._atom = atom
def apply(self):
tweaked = self._runtime.reset_nodes([self._atom], state=None,
intention=st.REVERT)
tweaked.extend(self._runtime.reset_subgraph(self._atom, state=None,
intention=st.REVERT))
return tweaked
class Completer(object):
"""Completes atoms using actions to complete them."""
def __init__(self, runtime):
self._runtime = runtime
self._runtime = weakref.proxy(runtime)
self._analyzer = runtime.analyzer
self._retry_action = runtime.retry_action
self._storage = runtime.storage
self._task_action = runtime.task_action
self._undefined_resolver = RevertAll(self._runtime)
def _complete_task(self, task, event, result):
"""Completes the given task, processes task failure."""
@@ -75,6 +155,32 @@ class Completer(object):
return True
return False
def _determine_resolution(self, atom, failure):
"""Determines which resolution strategy to activate/apply."""
retry = self._analyzer.find_atom_retry(atom)
if retry is not None:
# Ask retry controller what to do in case of failure.
strategy = self._retry_action.on_failure(retry, atom, failure)
if strategy == retry_atom.RETRY:
return RevertAndRetry(self._runtime, retry)
elif strategy == retry_atom.REVERT:
# Ask parent retry and figure out what to do...
parent_resolver = self._determine_resolution(retry, failure)
# Ok if the parent resolver says something not REVERT, and
# it isn't just using the undefined resolver, assume the
# parent knows best.
if parent_resolver is not self._undefined_resolver:
if parent_resolver.strategy != retry_atom.REVERT:
return parent_resolver
return Revert(self._runtime, retry)
elif strategy == retry_atom.REVERT_ALL:
return RevertAll(self._runtime)
else:
raise ValueError("Unknown atom failure resolution"
" action/strategy '%s'" % strategy)
else:
return self._undefined_resolver
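Which strategy gets applied is driven by what the flow's retry controller
returns from ``on_failure`` (see the resolution logic above); a brief sketch
with one of the stock retry controllers (the task and flow names are made up):

    from taskflow.patterns import linear_flow
    from taskflow import retry
    from taskflow import task

    class Flaky(task.Task):
        def execute(self):
            raise RuntimeError('boom')

    # Times.on_failure() returns RETRY until its attempts are exhausted, so a
    # failure in this flow is resolved with the RevertAndRetry strategy above
    # (the associated subflow is reverted and then re-attempted).
    flow = linear_flow.Flow('work', retry=retry.Times(attempts=3)).add(Flaky('t'))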
def _process_atom_failure(self, atom, failure):
"""Processes atom failure & applies resolution strategies.
@@ -84,30 +190,15 @@ class Completer(object):
then adjust the needed other atoms intentions, and states, ... so that
the failure can be worked around.
"""
retry = self._analyzer.find_atom_retry(atom)
if retry is not None:
# Ask retry controller what to do in case of failure
action = self._retry_action.on_failure(retry, atom, failure)
if action == retry_atom.RETRY:
# Prepare just the surrounding subflow for revert to be later
# retried...
self._storage.set_atom_intention(retry.name, st.RETRY)
self._runtime.reset_subgraph(retry, state=None,
intention=st.REVERT)
elif action == retry_atom.REVERT:
# Ask parent checkpoint.
self._process_atom_failure(retry, failure)
elif action == retry_atom.REVERT_ALL:
# Prepare all flow for revert
self._revert_all()
else:
raise ValueError("Unknown atom failure resolution"
" action '%s'" % action)
resolver = self._determine_resolution(atom, failure)
LOG.debug("Applying resolver '%s' to resolve failure '%s'"
" of atom '%s'", resolver, failure, atom)
tweaked = resolver.apply()
# Only show the tweaked node list when blather is on, otherwise
# just show the count of nodes tweaked...
if LOG.isEnabledFor(logging.BLATHER):
LOG.blather("Modified/tweaked %s nodes while applying"
" resolver '%s'", tweaked, resolver)
else:
# Prepare all flow for revert
self._revert_all()
def _revert_all(self):
"""Attempts to set all nodes to the REVERT intention."""
self._runtime.reset_nodes(self._analyzer.iterate_all_nodes(),
state=None, intention=st.REVERT)
LOG.debug("Modified/tweaked %s nodes while applying"
" resolver '%s'", len(tweaked), resolver)

View File

@@ -19,7 +19,10 @@ import contextlib
import threading
from concurrent import futures
import fasteners
import networkx as nx
from oslo_utils import excutils
from oslo_utils import strutils
import six
from taskflow.engines.action_engine import compiler
@@ -27,11 +30,14 @@ from taskflow.engines.action_engine import executor
from taskflow.engines.action_engine import runtime
from taskflow.engines import base
from taskflow import exceptions as exc
from taskflow import logging
from taskflow import states
from taskflow import storage
from taskflow.types import failure
from taskflow.utils import lock_utils
from taskflow.utils import misc
LOG = logging.getLogger(__name__)
@contextlib.contextmanager
def _start_stop(executor):
@@ -60,6 +66,13 @@ class ActionEngine(base.Engine):
"""
_compiler_factory = compiler.PatternCompiler
NO_RERAISING_STATES = frozenset([states.SUSPENDED, states.SUCCESS])
"""
States that, if the engine stops while in them, will **not** cause any
captured failures to be reraised. Stopping in states **not** in this list
will cause any captured failure(s) to be reraised.
"""
def __init__(self, flow, flow_detail, backend, options):
super(ActionEngine, self).__init__(flow, flow_detail, backend, options)
self._runtime = None
@@ -69,10 +82,18 @@ class ActionEngine(base.Engine):
self._state_lock = threading.RLock()
self._storage_ensured = False
def _check(self, name, check_compiled, check_storage_ensured):
"""Check (and raise) if the engine has not reached a certain stage."""
if check_compiled and not self._compiled:
raise exc.InvalidState("Can not %s an engine which"
" has not been compiled" % name)
if check_storage_ensured and not self._storage_ensured:
raise exc.InvalidState("Can not %s an engine"
" which has not had its storage"
" populated" % name)
def suspend(self):
if not self._compiled:
raise exc.InvalidState("Can not suspend an engine"
" which has not been compiled")
self._check('suspend', True, False)
self._change_state(states.SUSPENDING)
@property
@@ -88,8 +109,31 @@ class ActionEngine(base.Engine):
else:
return None
@misc.cachedproperty
def storage(self):
"""The storage unit for this engine.
NOTE(harlowja): the atom argument lookup strategy will change for
this storage unit after
:py:func:`~taskflow.engines.base.Engine.compile` has
completed (since **only** after compilation is the actual structure
known). Before :py:func:`~taskflow.engines.base.Engine.compile`
has completed, the atom argument lookup strategy will be
restricted to injected arguments **only** (this will **not** reflect
the actual runtime lookup strategy, which typically will be, but is
not always, different).
"""
def _scope_fetcher(atom_name):
if self._compiled:
return self._runtime.fetch_scopes_for(atom_name)
else:
return None
return storage.Storage(self._flow_detail,
backend=self._backend,
scope_fetcher=_scope_fetcher)
def run(self):
with lock_utils.try_lock(self._lock) as was_locked:
with fasteners.try_lock(self._lock) as was_locked:
if not was_locked:
raise exc.ExecutionFailure("Engine currently locked, please"
" try again later")
@@ -119,6 +163,7 @@ class ActionEngine(base.Engine):
"""
self.compile()
self.prepare()
self.validate()
runner = self._runtime.runner
last_state = None
with _start_stop(self._task_executor):
@@ -148,7 +193,7 @@ class ActionEngine(base.Engine):
ignorable_states = getattr(runner, 'ignorable_states', [])
if last_state and last_state not in ignorable_states:
self._change_state(last_state)
if last_state not in [states.SUSPENDED, states.SUCCESS]:
if last_state not in self.NO_RERAISING_STATES:
failures = self.storage.get_failures()
failure.Failure.reraise_if_any(failures.values())
@@ -168,16 +213,63 @@ class ActionEngine(base.Engine):
def _ensure_storage(self):
"""Ensure all contained atoms exist in the storage unit."""
transient = strutils.bool_from_string(
self._options.get('inject_transient', True))
self.storage.ensure_atoms(
self._compilation.execution_graph.nodes_iter())
for node in self._compilation.execution_graph.nodes_iter():
self.storage.ensure_atom(node)
if node.inject:
self.storage.inject_atom_args(node.name, node.inject)
self.storage.inject_atom_args(node.name,
node.inject,
transient=transient)
@lock_utils.locked
@fasteners.locked
def validate(self):
self._check('validate', True, True)
# At this point we can check to ensure all dependencies are either
# flow/task provided or storage provided; if there are still missing
# dependencies then this flow will fail at runtime (which we can avoid
# by failing at validation time).
execution_graph = self._compilation.execution_graph
if LOG.isEnabledFor(logging.BLATHER):
LOG.blather("Validating scoping and argument visibility for"
" execution graph with %s nodes and %s edges with"
" density %0.3f", execution_graph.number_of_nodes(),
execution_graph.number_of_edges(),
nx.density(execution_graph))
missing = set()
# Attempt to retain a chain of what was missing (so that the final
# raised exception for the flow has the nodes that had missing
# dependencies).
last_cause = None
last_node = None
missing_nodes = 0
fetch_func = self.storage.fetch_unsatisfied_args
for node in execution_graph.nodes_iter():
node_missing = fetch_func(node.name, node.rebind,
optional_args=node.optional)
if node_missing:
cause = exc.MissingDependencies(node,
sorted(node_missing),
cause=last_cause)
last_cause = cause
last_node = node
missing_nodes += 1
missing.update(node_missing)
if missing:
# For when a task is provided (instead of a flow) and that
# task is the only item in the graph and it is missing deps, avoid
# re-wrapping it in yet another exception...
if missing_nodes == 1 and last_node is self._flow:
raise last_cause
else:
raise exc.MissingDependencies(self._flow,
sorted(missing),
cause=last_cause)
@fasteners.locked
def prepare(self):
if not self._compiled:
raise exc.InvalidState("Can not prepare an engine"
" which has not been compiled")
self._check('prepare', True, False)
if not self._storage_ensured:
# Set our own state to resuming -> (ensure atoms exist
# in storage) -> suspended in the storage unit and notify any
@@ -186,14 +278,6 @@ class ActionEngine(base.Engine):
self._ensure_storage()
self._change_state(states.SUSPENDED)
self._storage_ensured = True
# At this point we can check to ensure all dependencies are either
# flow/task provided or storage provided, if there are still missing
# dependencies then this flow will fail at runtime (which we can avoid
# by failing at preparation time).
external_provides = set(self.storage.fetch_all().keys())
missing = self._flow.requires - external_provides
if missing:
raise exc.MissingDependencies(self._flow, sorted(missing))
# Reset everything back to pending (if we were previously reverted).
if self.storage.get_flow_state() == states.REVERTED:
self._runtime.reset_all()
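Taken together, the new ``_check``/``validate`` plumbing means missing arguments surface before ``run()``. A hedged usage sketch (task and flow names are made up, assuming the 1.15.0 behaviour shown in this diff):

from taskflow import engines
from taskflow import exceptions as exc
from taskflow.patterns import linear_flow
from taskflow import task

class NeedsX(task.Task):
    def execute(self, x):
        return x * 2

flow = linear_flow.Flow('demo').add(NeedsX())
eng = engines.load(flow)   # no store given, so 'x' is never satisfied
eng.compile()
eng.prepare()
try:
    eng.validate()         # new step: fails here instead of mid-run
except exc.MissingDependencies as e:
    print("caught before running: %s" % e)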
@@ -203,7 +287,7 @@ class ActionEngine(base.Engine):
def _compiler(self):
return self._compiler_factory(self._flow)
@lock_utils.locked
@fasteners.locked
def compile(self):
if self._compiled:
return
@@ -212,6 +296,7 @@ class ActionEngine(base.Engine):
self.storage,
self.atom_notifier,
self._task_executor)
self._runtime.compile()
self._compiled = True
@@ -239,7 +324,7 @@ class _ExecutorTextMatch(collections.namedtuple('_ExecutorTextMatch',
class ParallelActionEngine(ActionEngine):
"""Engine that runs tasks in parallel manner.
Supported keyword arguments:
Supported option keys:
* ``executor``: a object that implements a :pep:`3148` compatible executor
interface; it will be used for scheduling tasks. The following
@@ -279,7 +364,7 @@ String (case insensitive) Executor used
#
# NOTE(harlowja): the reason we use the library/built-in futures is to
# allow for instances of that to be detected and handled correctly, instead
# of forcing everyone to use our derivatives...
# of forcing everyone to use our derivatives (futurist or other)...
_executor_cls_matchers = [
_ExecutorTypeMatch((futures.ThreadPoolExecutor,),
executor.ParallelThreadTaskExecutor),

View File

@@ -19,7 +19,9 @@ import collections
from multiprocessing import managers
import os
import pickle
import threading
import futurist
from oslo_utils import excutils
from oslo_utils import reflection
from oslo_utils import timeutils
@@ -30,9 +32,7 @@ from six.moves import queue as compat_queue
from taskflow import logging
from taskflow import task as task_atom
from taskflow.types import failure
from taskflow.types import futures
from taskflow.types import notifier
from taskflow.types import timing
from taskflow.utils import async_utils
from taskflow.utils import threading_utils
@@ -175,7 +175,7 @@ class _WaitWorkItem(object):
'kind': _KIND_COMPLETE_ME,
}
if self._channel.put(message):
watch = timing.StopWatch()
watch = timeutils.StopWatch()
watch.start()
self._barrier.wait()
LOG.blather("Waited %s seconds until task '%s' %s emitted"
@@ -240,7 +240,7 @@ class _Dispatcher(object):
raise ValueError("Provided dispatch periodicity must be greater"
" than zero and not '%s'" % dispatch_periodicity)
self._targets = {}
self._dead = threading_utils.Event()
self._dead = threading.Event()
self._dispatch_periodicity = dispatch_periodicity
self._stop_when_empty = False
@@ -304,7 +304,7 @@ class _Dispatcher(object):
" %s to target '%s'", kind, sender, target)
def run(self, queue):
watch = timing.StopWatch(duration=self._dispatch_periodicity)
watch = timeutils.StopWatch(duration=self._dispatch_periodicity)
while (not self._dead.is_set() or
(self._stop_when_empty and self._targets)):
watch.restart()
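The ``timing.StopWatch`` to ``timeutils.StopWatch`` switch relies on oslo.utils providing the same small surface; a quick sketch of the calls used here:

from oslo_utils import timeutils

watch = timeutils.StopWatch(duration=0.25)  # optional expiry duration
watch.start()
# ... do some work ...
print(watch.elapsed())   # seconds elapsed so far
print(watch.expired())   # True once the 0.25s duration has passed
watch.restart()          # reset and start timing again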
@@ -347,18 +347,16 @@ class TaskExecutor(object):
def start(self):
"""Prepare to execute tasks."""
pass
def stop(self):
"""Finalize task executor."""
pass
class SerialTaskExecutor(TaskExecutor):
"""Executes tasks one after another."""
def __init__(self):
self._executor = futures.SynchronousExecutor()
self._executor = futurist.SynchronousExecutor()
def start(self):
self._executor.restart()
@@ -417,11 +415,8 @@ class ParallelTaskExecutor(TaskExecutor):
def start(self):
if self._own_executor:
if self._max_workers is not None:
max_workers = self._max_workers
else:
max_workers = threading_utils.get_optimal_thread_count()
self._executor = self._create_executor(max_workers=max_workers)
self._executor = self._create_executor(
max_workers=self._max_workers)
def stop(self):
if self._own_executor:
@@ -433,7 +428,7 @@ class ParallelThreadTaskExecutor(ParallelTaskExecutor):
"""Executes tasks in parallel using a thread pool executor."""
def _create_executor(self, max_workers=None):
return futures.ThreadPoolExecutor(max_workers=max_workers)
return futurist.ThreadPoolExecutor(max_workers=max_workers)
class ParallelProcessTaskExecutor(ParallelTaskExecutor):
@@ -463,7 +458,7 @@ class ParallelProcessTaskExecutor(ParallelTaskExecutor):
self._queue = None
def _create_executor(self, max_workers=None):
return futures.ProcessPoolExecutor(max_workers=max_workers)
return futurist.ProcessPoolExecutor(max_workers=max_workers)
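Likewise the executors now come from ``futurist`` rather than ``taskflow.types.futures``; they keep the :pep:`3148` interface, for example:

import futurist

def work(x):
    return x + 1

# Runs submissions inline (what the serial engine uses).
sync = futurist.SynchronousExecutor()
print(sync.submit(work, 1).result())   # 2

# Thread-backed pool; max_workers=None lets futurist pick a default.
pool = futurist.ThreadPoolExecutor(max_workers=2)
try:
    print(pool.submit(work, 2).result())   # 3
finally:
    pool.shutdown()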
def start(self):
if threading_utils.is_alive(self._worker):

View File

@@ -51,39 +51,50 @@ class _MachineMemory(object):
self.done = set()
class _MachineBuilder(object):
"""State machine *builder* that the runner uses.
class Runner(object):
"""State machine *builder* + *runner* that powers the engine components.
NOTE(harlowja): the machine states that this builds for are::
NOTE(harlowja): the machine (states and events that will trigger
transitions) that this builds is represented by the following
table::
+--------------+------------------+------------+----------+---------+
Start | Event | End | On Enter | On Exit
+--------------+------------------+------------+----------+---------+
ANALYZING | completed | GAME_OVER | |
ANALYZING | schedule_next | SCHEDULING | |
ANALYZING | wait_finished | WAITING | |
FAILURE[$] | | | |
GAME_OVER | failed | FAILURE | |
GAME_OVER | reverted | REVERTED | |
GAME_OVER | success | SUCCESS | |
GAME_OVER | suspended | SUSPENDED | |
RESUMING | schedule_next | SCHEDULING | |
REVERTED[$] | | | |
SCHEDULING | wait_finished | WAITING | |
SUCCESS[$] | | | |
SUSPENDED[$] | | | |
UNDEFINED[^] | start | RESUMING | |
WAITING | examine_finished | ANALYZING | |
+--------------+------------------+------------+----------+---------+
+--------------+------------------+------------+----------+---------+
Start | Event | End | On Enter | On Exit
+--------------+------------------+------------+----------+---------+
ANALYZING | completed | GAME_OVER | |
ANALYZING | schedule_next | SCHEDULING | |
ANALYZING | wait_finished | WAITING | |
FAILURE[$] | | | |
GAME_OVER | failed | FAILURE | |
GAME_OVER | reverted | REVERTED | |
GAME_OVER | success | SUCCESS | |
GAME_OVER | suspended | SUSPENDED | |
RESUMING | schedule_next | SCHEDULING | |
REVERTED[$] | | | |
SCHEDULING | wait_finished | WAITING | |
SUCCESS[$] | | | |
SUSPENDED[$] | | | |
UNDEFINED[^] | start | RESUMING | |
WAITING | examine_finished | ANALYZING | |
+--------------+------------------+------------+----------+---------+
Between any of these yielded states (minus ``GAME_OVER`` and ``UNDEFINED``)
if the engine has been suspended or the engine has failed (due to a
non-resolvable task failure or scheduling failure) the machine will stop
executing new tasks (currently running tasks will be allowed to complete)
and this machine's run loop will be broken.
NOTE(harlowja): If the runtime's scheduler component is able to schedule
tasks in parallel, this enables parallel running and/or reversion.
"""
# Informational states this action yields while running, not useful to
# have the engine record but useful to provide to end-users when doing
# execution iterations.
ignorable_states = (st.SCHEDULING, st.WAITING, st.RESUMING, st.ANALYZING)
def __init__(self, runtime, waiter):
self._runtime = runtime
self._analyzer = runtime.analyzer
self._completer = runtime.completer
self._scheduler = runtime.scheduler
@@ -91,20 +102,36 @@ class _MachineBuilder(object):
self._waiter = waiter
def runnable(self):
"""Checks if the storage says the flow is still runnable/running."""
return self._storage.get_flow_state() == st.RUNNING
def build(self, timeout=None):
"""Builds a state-machine (that can be/is used during running)."""
memory = _MachineMemory()
if timeout is None:
timeout = _WAITING_TIMEOUT
# Cache some local functions/methods...
do_schedule = self._scheduler.schedule
wait_for_any = self._waiter.wait_for_any
do_complete = self._completer.complete
def iter_next_nodes(target_node=None):
# Yields, filters, and tweaks the next nodes to execute...
maybe_nodes = self._analyzer.get_next_nodes(node=target_node)
for node, late_decider in maybe_nodes:
proceed = late_decider.check_and_affect(self._runtime)
if proceed:
yield node
def resume(old_state, new_state, event):
# This reaction function just updates the state machine's memory
# to include any nodes that need to be executed (from a previous
# attempt, which may be empty if it never ran before) and any nodes
# that are now ready to be run.
memory.next_nodes.update(self._completer.resume())
memory.next_nodes.update(self._analyzer.get_next_nodes())
memory.next_nodes.update(iter_next_nodes())
return _SCHEDULE
def game_over(old_state, new_state, event):
@@ -114,7 +141,7 @@ class _MachineBuilder(object):
# it is *always* called before the final state is entered.
if memory.failures:
return _FAILED
if self._analyzer.get_next_nodes():
if any(1 for node in iter_next_nodes()):
return _SUSPENDED
elif self._analyzer.is_success():
return _SUCCESS
@@ -128,8 +155,7 @@ class _MachineBuilder(object):
# that holds this information to stop or suspend); handles failures
# that occur during this process safely...
if self.runnable() and memory.next_nodes:
not_done, failures = self._scheduler.schedule(
memory.next_nodes)
not_done, failures = do_schedule(memory.next_nodes)
if not_done:
memory.not_done.update(not_done)
if failures:
@@ -142,8 +168,7 @@ class _MachineBuilder(object):
# call sometime in the future, or equivalent that will work in
# py2 and py3.
if memory.not_done:
done, not_done = self._waiter.wait_for_any(memory.not_done,
timeout)
done, not_done = wait_for_any(memory.not_done, timeout)
memory.done.update(done)
memory.not_done = not_done
return _ANALYZE
@@ -160,7 +185,7 @@ class _MachineBuilder(object):
node = fut.atom
try:
event, result = fut.result()
retain = self._completer.complete(node, event, result)
retain = do_complete(node, event, result)
if isinstance(result, failure.Failure):
if retain:
memory.failures.append(result)
@@ -183,7 +208,7 @@ class _MachineBuilder(object):
memory.failures.append(failure.Failure())
else:
try:
more_nodes = self._analyzer.get_next_nodes(node)
more_nodes = set(iter_next_nodes(target_node=node))
except Exception:
memory.failures.append(failure.Failure())
else:
@@ -204,10 +229,10 @@ class _MachineBuilder(object):
LOG.debug("Entering new state '%s' in response to event '%s'",
new_state, event)
# NOTE(harlowja): when run in debugging mode it is quite useful
# NOTE(harlowja): when run in blather mode it is quite useful
# to track the various state transitions as they happen...
watchers = {}
if LOG.isEnabledFor(logging.DEBUG):
if LOG.isEnabledFor(logging.BLATHER):
watchers['on_exit'] = on_exit
watchers['on_enter'] = on_enter
@@ -244,38 +269,9 @@ class _MachineBuilder(object):
m.freeze()
return (m, memory)
class Runner(object):
"""Runner that iterates while executing nodes using the given runtime.
This runner acts as the action engine run loop/state-machine; it resumes
the workflow, schedules all tasks it can for execution using the runtime's
scheduler and analyzer components, and then waits on the returned futures
and then activates the runtime's completion component to finish up those
tasks and so on...
NOTE(harlowja): If the runtime's scheduler component is able to schedule
tasks in parallel, this enables parallel running and/or reversion.
"""
# Informational states this action yields while running, not useful to
# have the engine record but useful to provide to end-users when doing
# execution iterations.
ignorable_states = (st.SCHEDULING, st.WAITING, st.RESUMING, st.ANALYZING)
def __init__(self, runtime, waiter):
self._builder = _MachineBuilder(runtime, waiter)
@property
def builder(self):
return self._builder
def runnable(self):
return self._builder.runnable()
def run_iter(self, timeout=None):
"""Runs the nodes using a built state machine."""
machine, memory = self.builder.build(timeout=timeout)
"""Runs iteratively using a locally built state machine."""
machine, memory = self.build(timeout=timeout)
for (_prior_state, new_state) in machine.run_iter(_START):
# NOTE(harlowja): skip over meta-states.
if new_state not in _META_STATES:

View File

@@ -14,6 +14,8 @@
# License for the specific language governing permissions and limitations
# under the License.
import functools
from taskflow.engines.action_engine.actions import retry as ra
from taskflow.engines.action_engine.actions import task as ta
from taskflow.engines.action_engine import analyzer as an
@@ -21,7 +23,9 @@ from taskflow.engines.action_engine import completer as co
from taskflow.engines.action_engine import runner as ru
from taskflow.engines.action_engine import scheduler as sched
from taskflow.engines.action_engine import scopes as sc
from taskflow import flow as flow_type
from taskflow import states as st
from taskflow import task
from taskflow.utils import misc
@@ -38,7 +42,53 @@ class Runtime(object):
self._task_executor = task_executor
self._storage = storage
self._compilation = compilation
self._scopes = {}
self._atom_cache = {}
def compile(self):
"""Compiles & caches frequently used execution helper objects.
Build out a cache of commonly used items that are associated
with the contained atoms (by name), and are useful to have for
quick lookup (for example, the change state handler function for
each atom, the scope walker object for each atom, the task or retry
specific scheduler and so on).
"""
change_state_handlers = {
'task': functools.partial(self.task_action.change_state,
progress=0.0),
'retry': self.retry_action.change_state,
}
schedulers = {
'retry': self.retry_scheduler,
'task': self.task_scheduler,
}
execution_graph = self._compilation.execution_graph
for atom in self.analyzer.iterate_all_nodes():
metadata = {}
walker = sc.ScopeWalker(self.compilation, atom, names_only=True)
if isinstance(atom, task.BaseTask):
check_transition_handler = st.check_task_transition
change_state_handler = change_state_handlers['task']
scheduler = schedulers['task']
else:
check_transition_handler = st.check_retry_transition
change_state_handler = change_state_handlers['retry']
scheduler = schedulers['retry']
edge_deciders = {}
for previous_atom in execution_graph.predecessors(atom):
# If there is any link decider function that says if this connection
# is able to run (or should not), ensure we retain it and use
# it later as needed.
u_v_data = execution_graph.adj[previous_atom][atom]
u_v_decider = u_v_data.get(flow_type.LINK_DECIDER)
if u_v_decider is not None:
edge_deciders[previous_atom.name] = u_v_decider
metadata['scope_walker'] = walker
metadata['check_transition_handler'] = check_transition_handler
metadata['change_state_handler'] = change_state_handler
metadata['scheduler'] = scheduler
metadata['edge_deciders'] = edge_deciders
self._atom_cache[atom.name] = metadata
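The idea behind the new ``compile()`` step is plain dict lookups at run time instead of repeated type checks; an illustrative stand-alone sketch (not the real Runtime, names are made up):

import functools

def change_task_state(atom, state, progress=0.0):
    print("task %s -> %s (%.0f%%)" % (atom, state, progress * 100))

def change_retry_state(atom, state):
    print("retry %s -> %s" % (atom, state))

def build_cache(atoms):
    handlers = {
        'task': functools.partial(change_task_state, progress=0.0),
        'retry': change_retry_state,
    }
    cache = {}
    for name, kind in atoms:
        # Computed once; later code only does cache[name][...] lookups.
        cache[name] = {'change_state_handler': handlers[kind]}
    return cache

cache = build_cache([('fetch', 'task'), ('again', 'retry')])
cache['fetch']['change_state_handler']('fetch', 'PENDING')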
@property
def compilation(self):
@@ -50,7 +100,7 @@ class Runtime(object):
@misc.cachedproperty
def analyzer(self):
return an.Analyzer(self._compilation, self._storage)
return an.Analyzer(self)
@misc.cachedproperty
def runner(self):
@@ -64,53 +114,101 @@ class Runtime(object):
def scheduler(self):
return sched.Scheduler(self)
@misc.cachedproperty
def task_scheduler(self):
return sched.TaskScheduler(self)
@misc.cachedproperty
def retry_scheduler(self):
return sched.RetryScheduler(self)
@misc.cachedproperty
def retry_action(self):
return ra.RetryAction(self._storage, self._atom_notifier,
self._fetch_scopes_for)
return ra.RetryAction(self._storage,
self._atom_notifier)
@misc.cachedproperty
def task_action(self):
return ta.TaskAction(self._storage,
self._atom_notifier, self._fetch_scopes_for,
self._atom_notifier,
self._task_executor)
def _fetch_scopes_for(self, atom):
"""Fetches a tuple of the visible scopes for the given atom."""
def check_atom_transition(self, atom, current_state, target_state):
"""Checks if the atom can transition to the provided target state."""
# This does not check if the name exists (since this is only used
# internally to the engine, and is not exposed to atoms that will
# not exist and therefore doesn't need to handle that case).
metadata = self._atom_cache[atom.name]
check_transition_handler = metadata['check_transition_handler']
return check_transition_handler(current_state, target_state)
def fetch_edge_deciders(self, atom):
"""Fetches the edge deciders for the given atom."""
# This does not check if the name exists (since this is only used
# internally to the engine, and is not exposed to atoms that will
# not exist and therefore doesn't need to handle that case).
metadata = self._atom_cache[atom.name]
return metadata['edge_deciders']
def fetch_scheduler(self, atom):
"""Fetches the cached specific scheduler for the given atom."""
# This does not check if the name exists (since this is only used
# internally to the engine, and is not exposed to atoms that will
# not exist and therefore doesn't need to handle that case).
metadata = self._atom_cache[atom.name]
return metadata['scheduler']
def fetch_scopes_for(self, atom_name):
"""Fetches a walker of the visible scopes for the given atom."""
try:
return self._scopes[atom]
metadata = self._atom_cache[atom_name]
except KeyError:
walker = sc.ScopeWalker(self.compilation, atom,
names_only=True)
visible_to = tuple(walker)
self._scopes[atom] = visible_to
return visible_to
# This signals to the caller that there is no walker for the given
# atom name (no atom is known to be named with that name); this is
# done since the storage
# layer will call into this layer to fetch a scope for a named
# atom and users can provide random names that do not actually
# exist...
return None
else:
return metadata['scope_walker']
# Various helper methods used by the runtime components; not for public
# consumption...
def reset_nodes(self, nodes, state=st.PENDING, intention=st.EXECUTE):
for node in nodes:
def reset_nodes(self, atoms, state=st.PENDING, intention=st.EXECUTE):
"""Resets all the provided atoms to the given state and intention."""
tweaked = []
for atom in atoms:
metadata = self._atom_cache[atom.name]
if state or intention:
tweaked.append((atom, state, intention))
if state:
if self.task_action.handles(node):
self.task_action.change_state(node, state,
progress=0.0)
elif self.retry_action.handles(node):
self.retry_action.change_state(node, state)
else:
raise TypeError("Unknown how to reset atom '%s' (%s)"
% (node, type(node)))
change_state_handler = metadata['change_state_handler']
change_state_handler(atom, state)
if intention:
self.storage.set_atom_intention(node.name, intention)
self.storage.set_atom_intention(atom.name, intention)
return tweaked
def reset_all(self, state=st.PENDING, intention=st.EXECUTE):
self.reset_nodes(self.analyzer.iterate_all_nodes(),
state=state, intention=intention)
"""Resets all atoms to the given state and intention."""
return self.reset_nodes(self.analyzer.iterate_all_nodes(),
state=state, intention=intention)
def reset_subgraph(self, node, state=st.PENDING, intention=st.EXECUTE):
self.reset_nodes(self.analyzer.iterate_subgraph(node),
state=state, intention=intention)
def reset_subgraph(self, atom, state=st.PENDING, intention=st.EXECUTE):
"""Resets an atom's subgraph to the given state and intention.
The subgraph consists of all of the atom's successors.
"""
return self.reset_nodes(self.analyzer.iterate_subgraph(atom),
state=state, intention=intention)
def retry_subflow(self, retry):
"""Prepares a retry + its subgraph for execution.
This sets the retry's intention to ``EXECUTE`` and resets all of its
subgraph (its successors) to the ``PENDING`` state with an ``EXECUTE``
intention.
"""
self.storage.set_atom_intention(retry.name, st.EXECUTE)
self.reset_subgraph(retry)

View File

@@ -14,23 +14,21 @@
# License for the specific language governing permissions and limitations
# under the License.
import weakref
from taskflow import exceptions as excp
from taskflow import retry as retry_atom
from taskflow import states as st
from taskflow import task as task_atom
from taskflow.types import failure
class _RetryScheduler(object):
class RetryScheduler(object):
"""Schedules retry atoms."""
def __init__(self, runtime):
self._runtime = runtime
self._runtime = weakref.proxy(runtime)
self._retry_action = runtime.retry_action
self._storage = runtime.storage
@staticmethod
def handles(atom):
return isinstance(atom, retry_atom.Retry)
def schedule(self, retry):
"""Schedules the given retry atom for *future* completion.
@@ -51,15 +49,13 @@ class _RetryScheduler(object):
" intention: %s" % intention)
class _TaskScheduler(object):
class TaskScheduler(object):
"""Schedules task atoms."""
def __init__(self, runtime):
self._storage = runtime.storage
self._task_action = runtime.task_action
@staticmethod
def handles(atom):
return isinstance(atom, task_atom.BaseTask)
def schedule(self, task):
"""Schedules the given task atom for *future* completion.
@@ -77,39 +73,28 @@ class _TaskScheduler(object):
class Scheduler(object):
"""Schedules atoms using actions to schedule."""
"""Safely schedules atoms using a runtime ``fetch_scheduler`` routine."""
def __init__(self, runtime):
self._schedulers = [
_RetryScheduler(runtime),
_TaskScheduler(runtime),
]
self._fetch_scheduler = runtime.fetch_scheduler
def _schedule_node(self, node):
"""Schedule a single node for execution."""
for sched in self._schedulers:
if sched.handles(node):
return sched.schedule(node)
else:
raise TypeError("Unknown how to schedule '%s' (%s)"
% (node, type(node)))
def schedule(self, atoms):
"""Schedules the provided atoms for *future* completion.
def schedule(self, nodes):
"""Schedules the provided nodes for *future* completion.
This method should schedule a future for each node provided and return
This method should schedule a future for each atom provided and return
a set of those futures to be waited on (or used for other similar
purposes). It should also return any failure objects that represented
scheduling failures that may have occurred during this scheduling
process.
"""
futures = set()
for node in nodes:
for atom in atoms:
scheduler = self._fetch_scheduler(atom)
try:
futures.add(self._schedule_node(node))
futures.add(scheduler.schedule(atom))
except Exception:
# Immediately stop scheduling future work so that we can
# exit execution early (rather than later) if a single task
# exit execution early (rather than later) if a single atom
# fails to schedule correctly.
return (futures, [failure.Failure()])
return (futures, [])
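The rewritten scheduler is essentially a loop over per-atom schedulers that bails out on the first scheduling failure; an illustrative stand-alone version of that behaviour (names are made up):

def schedule_all(atoms, fetch_scheduler, make_failure):
    scheduled = set()
    for atom in atoms:
        scheduler = fetch_scheduler(atom)
        try:
            scheduled.add(scheduler.schedule(atom))
        except Exception:
            # Stop scheduling more work so the run loop can exit early.
            return scheduled, [make_failure()]
    return scheduled, []

class BrokenScheduler(object):
    def schedule(self, atom):
        raise RuntimeError("boom")

print(schedule_all(['t1'], lambda atom: BrokenScheduler(),
                   make_failure=lambda: 'failure-object'))
# -> (set(), ['failure-object'])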

View File

@@ -21,29 +21,30 @@ from taskflow import logging
LOG = logging.getLogger(__name__)
def _extract_atoms(node, idx=-1):
def _extract_atoms_iter(node, idx=-1):
# Always go left to right, since right to left is the pattern order
# and we want to go backwards and not forwards through that ordering...
if idx == -1:
children_iter = node.reverse_iter()
else:
children_iter = reversed(node[0:idx])
atoms = []
for child in children_iter:
if isinstance(child.item, flow_type.Flow):
atoms.extend(_extract_atoms(child))
for atom in _extract_atoms_iter(child):
yield atom
elif isinstance(child.item, atom_type.Atom):
atoms.append(child.item)
yield child.item
else:
raise TypeError(
"Unknown extraction item '%s' (%s)" % (child.item,
type(child.item)))
return atoms
class ScopeWalker(object):
"""Walks through the scopes of an atom using an engine's compilation.
NOTE(harlowja): for internal usage only.
This will walk the visible scopes that are accessible for the given
atom, which can be used by some external entity in some meaningful way,
for example to find dependent values...
@@ -54,60 +55,80 @@ class ScopeWalker(object):
if self._node is None:
raise ValueError("Unable to find atom '%s' in compilation"
" hierarchy" % atom)
self._level_cache = {}
self._atom = atom
self._graph = compilation.execution_graph
self._names_only = names_only
self._predecessors = None
#: Function that extracts the *associated* atoms of a given tree node.
_extract_atoms_iter = staticmethod(_extract_atoms_iter)
def __iter__(self):
"""Iterates over the visible scopes.
How this works is the following:
We find all the possible predecessors of the given atom; this is useful
since we know they occurred before this atom but it doesn't tell us
the corresponding scope *level* that each predecessor was created in,
so we need to find this information.
We first grab all the predecessors of the given atom (let's call it
``Y``) by using the :py:class:`~.compiler.Compilation` execution
graph (and doing a reverse breadth-first expansion to gather its
predecessors), this is useful since we know they *always* will
exist (and execute) before this atom but it does not tell us the
corresponding scope *level* (flow, nested flow...) that each
predecessor was created in, so we need to find this information.
For that information we consult the location of the atom ``Y`` in the
node hierarchy. We lookup in a reverse order the parent ``X`` of ``Y``
and traverse backwards from the index in the parent where ``Y``
occurred, all children in ``X`` that we encounter in this backwards
search (if a child is a flow itself, its atom contents will be
expanded) will be assumed to be at the same scope. This is then a
*potential* single scope, to make an *actual* scope we remove the items
from the *potential* scope that are not predecessors of ``Y`` to form
the *actual* scope.
:py:class:`~.compiler.Compilation` hierarchy/tree. We look up in a
reverse order the parent ``X`` of ``Y`` and traverse backwards from
the index in the parent where ``Y`` exists to all siblings (and
children of those siblings) in ``X`` that we encounter in this
backwards search (if a sibling is a flow itself, its atom(s)
will be recursively expanded and included). This collection will
then be assumed to be at the same scope. This is what is called
a *potential* single scope, to make an *actual* scope we remove the
items from the *potential* scope that are **not** predecessors
of ``Y`` to form the *actual* scope which we then yield back.
Then for additional scopes we continue up the tree, by finding the
parent of ``X`` (let's call it ``Z``) and perform the same operation,
going through the children in a reverse manner from the index in
parent ``Z`` where ``X`` was located. This forms another *potential*
scope which we provide back as an *actual* scope after reducing the
potential set by the predecessors of ``Y``. We then repeat this process
until we no longer have any parent nodes (aka have reached the top of
the tree) or we run out of predecessors.
potential set to only include predecessors previously gathered. We
then repeat this process until we no longer have any parent
nodes (aka we have reached the top of the tree) or we run out of
predecessors.
"""
predecessors = set(self._graph.bfs_predecessors_iter(self._atom))
if self._predecessors is None:
pred_iter = self._graph.bfs_predecessors_iter(self._atom)
self._predecessors = set(pred_iter)
predecessors = self._predecessors.copy()
last = self._node
for parent in self._node.path_iter(include_self=False):
for lvl, parent in enumerate(self._node.path_iter(include_self=False)):
if not predecessors:
break
last_idx = parent.index(last.item)
visible = []
for a in _extract_atoms(parent, idx=last_idx):
if a in predecessors:
predecessors.remove(a)
if not self._names_only:
visible.append(a)
else:
visible.append(a.name)
if LOG.isEnabledFor(logging.BLATHER):
if not self._names_only:
try:
visible, removals = self._level_cache[lvl]
predecessors = predecessors - removals
except KeyError:
visible = []
removals = set()
for atom in self._extract_atoms_iter(parent, idx=last_idx):
if atom in predecessors:
predecessors.remove(atom)
removals.add(atom)
visible.append(atom)
if not predecessors:
break
self._level_cache[lvl] = (visible, removals)
if LOG.isEnabledFor(logging.BLATHER):
visible_names = [a.name for a in visible]
else:
visible_names = visible
LOG.blather("Scope visible to '%s' (limited by parent '%s'"
" index < %s) is: %s", self._atom,
parent.item.name, last_idx, visible_names)
yield visible
LOG.blather("Scope visible to '%s' (limited by parent '%s'"
" index < %s) is: %s", self._atom,
parent.item.name, last_idx, visible_names)
if self._names_only:
yield [a.name for a in visible]
else:
yield visible
last = parent
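Callers still just iterate the walker, one yielded item per scope level; a hedged sketch (assuming ``PatternCompiler(flow).compile()`` produces a compilation the same way the engine does internally):

from taskflow.engines.action_engine import compiler
from taskflow.engines.action_engine import scopes
from taskflow.patterns import linear_flow
from taskflow import task

class A(task.Task):
    def execute(self):
        pass

class B(task.Task):
    def execute(self):
        pass

a, b = A('a'), B('b')
flow = linear_flow.Flow('lf').add(a, b)
compilation = compiler.PatternCompiler(flow).compile()
walker = scopes.ScopeWalker(compilation, b, names_only=True)
for level, visible_names in enumerate(walker):
    print(level, visible_names)   # level 0 should include 'a'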

View File

@@ -17,12 +17,10 @@
import abc
from debtcollector import moves
import six
from taskflow import storage
from taskflow.types import notifier
from taskflow.utils import deprecation
from taskflow.utils import misc
@six.add_metaclass(abc.ABCMeta)
@@ -56,10 +54,18 @@ class Engine(object):
return self._notifier
@property
@deprecation.moved_property('atom_notifier', version="0.6",
removal_version="?")
@moves.moved_property('atom_notifier', version="0.6",
removal_version="2.0")
def task_notifier(self):
"""The task notifier."""
"""The task notifier.
.. deprecated:: 0.6
The property is **deprecated** and is present for
backward compatibility **only**. In order to access this
property going forward the :py:attr:`.atom_notifier` should
be used instead.
"""
return self._atom_notifier
@property
@@ -72,10 +78,9 @@ class Engine(object):
"""The options that were passed to this engine on construction."""
return self._options
@misc.cachedproperty
@abc.abstractproperty
def storage(self):
"""The storage unit for this flow."""
return storage.Storage(self._flow_detail, backend=self._backend)
"""The storage unit for this engine."""
@abc.abstractmethod
def compile(self):
@@ -92,9 +97,18 @@ class Engine(object):
"""Performs any pre-run, but post-compilation actions.
NOTE(harlowja): During preparation it is currently assumed that the
underlying storage will be initialized, all final dependencies
will be verified, the tasks will be reset and the engine will enter
the PENDING state.
underlying storage will be initialized, the atoms will be reset and
the engine will enter the PENDING state.
"""
@abc.abstractmethod
def validate(self):
"""Performs any pre-run, post-prepare validation actions.
NOTE(harlowja): During validation all final dependencies
will be verified and ensured. This will by default check that all
atoms have satisfiable requirements (satisfied by some other
provider).
"""
@abc.abstractmethod
@@ -105,15 +119,13 @@ class Engine(object):
def suspend(self):
"""Attempts to suspend the engine.
If the engine is currently running tasks then this will attempt to
suspend future work from being started (currently active tasks can
If the engine is currently running atoms then this will attempt to
suspend future work from being started (currently active atoms can
not currently be preempted) and move the engine into a suspend state
which can then later be resumed from.
"""
# TODO(harlowja): remove in 0.7 or later...
EngineBase = deprecation.moved_inheritable_class(Engine,
'EngineBase', __name__,
version="0.6",
removal_version="?")
EngineBase = moves.moved_class(Engine, 'EngineBase', __name__,
version="0.6", removal_version="2.0")
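A minimal sketch of how these ``debtcollector`` helpers are used (class and property names here are illustrative):

from debtcollector import moves

class Engine(object):
    @property
    def atom_notifier(self):
        return "the-new-property"

    @property
    @moves.moved_property('atom_notifier', version="0.6",
                          removal_version="2.0")
    def task_notifier(self):
        # Old name kept alive; using it emits a DeprecationWarning.
        return self.atom_notifier

# Old class name kept importable, warning when it is used.
EngineBase = moves.moved_class(Engine, 'EngineBase', __name__,
                               version="0.6", removal_version="2.0")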

View File

@@ -18,6 +18,7 @@ import contextlib
import itertools
import traceback
from debtcollector import renames
from oslo_utils import importutils
from oslo_utils import reflection
import six
@@ -26,7 +27,6 @@ import stevedore.driver
from taskflow import exceptions as exc
from taskflow import logging
from taskflow.persistence import backends as p_backends
from taskflow.utils import deprecation
from taskflow.utils import misc
from taskflow.utils import persistence_utils as p_utils
@@ -90,14 +90,14 @@ def _extract_engine(**kwargs):
lambda frame: frame[0] in _FILE_NAMES,
reversed(traceback.extract_stack(limit=3)))
stacklevel = sum(1 for _frame in finder)
decorator = deprecation.renamed_kwarg('engine_conf', 'engine',
version="0.6",
removal_version="?",
# Three is added on since the
# decorator adds three of its own
# stack levels that we need to
# hop out of...
stacklevel=stacklevel + 3)
decorator = renames.renamed_kwarg('engine_conf', 'engine',
version="0.6",
removal_version="2.0",
# Three is added on since the
# decorator adds three of its own
# stack levels that we need to
# hop out of...
stacklevel=stacklevel + 3)
return decorator(_compat_extract)(**kwargs)
else:
return _compat_extract(**kwargs)
@@ -134,7 +134,7 @@ def load(flow, store=None, flow_detail=None, book=None,
This function creates and prepares an engine to run the provided flow. All
that is left after this returns is to run the engine with the
engine's ``run()`` method.
engine's :py:meth:`~taskflow.engines.base.Engine.run` method.
Which engine to load is specified via the ``engine`` parameter. It
can be a string that names the engine type to use, or a string that
@@ -143,7 +143,15 @@ def load(flow, store=None, flow_detail=None, book=None,
Which storage backend to use is defined by the backend parameter. It
can be a backend itself, or a dictionary that is passed to
``taskflow.persistence.backends.fetch()`` to obtain a viable backend.
:py:func:`~taskflow.persistence.backends.fetch` to obtain a
viable backend.
.. deprecated:: 0.6
The ``engine_conf`` argument is **deprecated** and is present
for backward compatibility **only**. In order to provide this
argument going forward the ``engine`` string (or URI) argument
should be used instead.
:param flow: flow to load
:param store: dict -- data to put to storage to satisfy flow requirements
@@ -198,7 +206,15 @@ def run(flow, store=None, flow_detail=None, book=None,
The arguments are interpreted as for :func:`load() <load>`.
:returns: dictionary of all named results (see ``storage.fetch_all()``)
.. deprecated:: 0.6
The ``engine_conf`` argument is **deprecated** and is present
for backward compatibility **only**. In order to provide this
argument going forward the ``engine`` string (or URI) argument
should be used instead.
:returns: dictionary of all named
results (see :py:meth:`~.taskflow.storage.Storage.fetch_all`)
"""
engine = load(flow, store=store, flow_detail=flow_detail, book=book,
engine_conf=engine_conf, backend=backend,
@@ -262,6 +278,13 @@ def load_from_factory(flow_factory, factory_args=None, factory_kwargs=None,
Further arguments are interpreted as for :func:`load() <load>`.
.. deprecated:: 0.6
The ``engine_conf`` argument is **deprecated** and is present
for backward compatibility **only**. In order to provide this
argument going forward the ``engine`` string (or URI) argument
should be used instead.
:returns: engine
"""
@@ -322,6 +345,13 @@ def load_from_detail(flow_detail, store=None, engine_conf=None, backend=None,
Further arguments are interpreted as for :func:`load() <load>`.
.. deprecated:: 0.6
The ``engine_conf`` argument is **deprecated** and is present
for backward compatibility **only**. In order to provide this
argument going forward the ``engine`` string (or URI) argument
should be used instead.
:returns: engine
"""
flow = flow_from_detail(flow_detail)

View File

@@ -23,6 +23,36 @@ from taskflow.utils import kombu_utils as ku
LOG = logging.getLogger(__name__)
class Handler(object):
"""Component(s) that will be called on reception of messages."""
__slots__ = ['_process_message', '_validator']
def __init__(self, process_message, validator=None):
self._process_message = process_message
self._validator = validator
@property
def process_message(self):
"""Main callback that is called to process a received message.
This is only called after the format has been validated (using
the ``validator`` callback if applicable) and only after the message
has been acknowledged.
"""
return self._process_message
@property
def validator(self):
"""Optional callback that will be activated before processing.
This callback, if present, is expected to validate the message and
raise :py:class:`~taskflow.exceptions.InvalidFormat` if the message
is not valid.
"""
return self._validator
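A short sketch of how the new ``Handler`` wrapper is built and used (the callbacks here are illustrative; a real validator would raise :py:class:`~taskflow.exceptions.InvalidFormat`):

from taskflow.engines.worker_based import dispatcher

def process_response(data, message):
    print("processing %s" % (data,))

def validate_response(data):
    if 'state' not in data:
        raise ValueError("missing state")   # illustrative only

handler = dispatcher.Handler(process_response,
                             validator=validate_response)
if handler.validator is not None:
    handler.validator({'state': 'SUCCESS'})
handler.process_message({'state': 'SUCCESS'}, None)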
class TypeDispatcher(object):
"""Receives messages and dispatches to type specific handlers."""
@@ -99,10 +129,9 @@ class TypeDispatcher(object):
LOG.warning("Unexpected message type: '%s' in message"
" '%s'", message_type, ku.DelayedPretty(message))
else:
if isinstance(handler, (tuple, list)):
handler, validator = handler
if handler.validator is not None:
try:
validator(data)
handler.validator(data)
except excp.InvalidFormat as e:
message.reject_log_error(
logger=LOG, errors=(kombu_exc.MessageStateError,))
@@ -115,7 +144,7 @@ class TypeDispatcher(object):
if message.acknowledged:
LOG.debug("Message '%s' was acknowledged.",
ku.DelayedPretty(message))
handler(data, message)
handler.process_message(data, message)
else:
message.reject_log_error(logger=LOG,
errors=(kombu_exc.MessageStateError,))

View File

@@ -36,8 +36,9 @@ class WorkerBasedActionEngine(engine.ActionEngine):
of the (PENDING, WAITING) request states. When
expired, the task the request was made
for will have its result become a
`RequestTimeout` exception instead of its
normally returned value (or raised exception).
:py:class:`~taskflow.exceptions.RequestTimeout`
exception instead of its normally returned
value (or raised exception).
:param transport_options: transport specific options (see:
http://kombu.readthedocs.org/ for what these
options imply and are expected to be)

View File

@@ -16,16 +16,17 @@
import functools
from futurist import periodics
from oslo_utils import timeutils
from taskflow.engines.action_engine import executor
from taskflow.engines.worker_based import dispatcher
from taskflow.engines.worker_based import protocol as pr
from taskflow.engines.worker_based import proxy
from taskflow.engines.worker_based import types as wt
from taskflow import exceptions as exc
from taskflow import logging
from taskflow import task as task_atom
from taskflow.types import periodic
from taskflow.utils import kombu_utils as ku
from taskflow.utils import misc
from taskflow.utils import threading_utils as tu
@@ -44,10 +45,8 @@ class WorkerTaskExecutor(executor.TaskExecutor):
self._requests_cache = wt.RequestsCache()
self._transition_timeout = transition_timeout
type_handlers = {
pr.RESPONSE: [
self._process_response,
pr.Response.validate,
],
pr.RESPONSE: dispatcher.Handler(self._process_response,
validator=pr.Response.validate),
}
self._proxy = proxy.Proxy(uuid, exchange,
type_handlers=type_handlers,
@@ -68,7 +67,7 @@ class WorkerTaskExecutor(executor.TaskExecutor):
self._helpers.bind(lambda: tu.daemon_thread(self._proxy.start),
after_start=lambda t: self._proxy.wait(),
before_join=lambda t: self._proxy.stop())
p_worker = periodic.PeriodicWorker.create([self._finder])
p_worker = periodics.PeriodicWorker.create([self._finder])
if p_worker:
self._helpers.bind(lambda: tu.daemon_thread(p_worker.start),
before_join=lambda t: p_worker.stop(),

View File

@@ -15,11 +15,11 @@
# under the License.
import abc
import collections
import threading
from concurrent import futures
import jsonschema
from jsonschema import exceptions as schema_exc
import fasteners
import futurist
from oslo_utils import reflection
from oslo_utils import timeutils
import six
@@ -28,8 +28,7 @@ from taskflow.engines.action_engine import executor
from taskflow import exceptions as excp
from taskflow import logging
from taskflow.types import failure as ft
from taskflow.types import timing as tt
from taskflow.utils import lock_utils
from taskflow.utils import schema_utils as su
# NOTE(skudriashev): This is protocol states and events, which are not
# related to task states.
@@ -98,12 +97,6 @@ NOTIFY = 'NOTIFY'
REQUEST = 'REQUEST'
RESPONSE = 'RESPONSE'
# Special jsonschema validation types/adjustments.
_SCHEMA_TYPES = {
# See: https://github.com/Julian/jsonschema/issues/148
'array': (list, tuple),
}
LOG = logging.getLogger(__name__)
@@ -112,7 +105,8 @@ class Message(object):
"""Base class for all message types."""
def __str__(self):
return "<%s> %s" % (self.TYPE, self.to_dict())
cls_name = reflection.get_class_name(self, fully_qualified=False)
return "<%s> %s" % (cls_name, self.to_dict())
@abc.abstractmethod
def to_dict(self):
@@ -166,16 +160,25 @@ class Notify(Message):
else:
schema = cls.SENDER_SCHEMA
try:
jsonschema.validate(data, schema, types=_SCHEMA_TYPES)
except schema_exc.ValidationError as e:
su.schema_validate(data, schema)
except su.ValidationError as e:
cls_name = reflection.get_class_name(cls, fully_qualified=False)
if response:
raise excp.InvalidFormat("%s message response data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
excp.raise_with_cause(excp.InvalidFormat,
"%s message response data not of the"
" expected format: %s" % (cls_name,
e.message),
cause=e)
else:
raise excp.InvalidFormat("%s message sender data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
excp.raise_with_cause(excp.InvalidFormat,
"%s message sender data not of the"
" expected format: %s" % (cls_name,
e.message),
cause=e)
_WorkUnit = collections.namedtuple('_WorkUnit', ['task_cls', 'task_name',
'action', 'arguments'])
class Request(Message):
@@ -235,11 +238,11 @@ class Request(Message):
self._event = ACTION_TO_EVENT[action]
self._arguments = arguments
self._kwargs = kwargs
self._watch = tt.StopWatch(duration=timeout).start()
self._watch = timeutils.StopWatch(duration=timeout).start()
self._state = WAITING
self._lock = threading.Lock()
self._created_on = timeutils.utcnow()
self._result = futures.Future()
self._result = futurist.Future()
self._result.atom = task
self._notifier = task.notifier
@@ -332,7 +335,7 @@ class Request(Message):
new_state, exc_info=True)
return moved
@lock_utils.locked
@fasteners.locked
def transition(self, new_state):
"""Transitions the request to a new state.
@@ -358,11 +361,60 @@ class Request(Message):
@classmethod
def validate(cls, data):
try:
jsonschema.validate(data, cls.SCHEMA, types=_SCHEMA_TYPES)
except schema_exc.ValidationError as e:
raise excp.InvalidFormat("%s message response data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
su.schema_validate(data, cls.SCHEMA)
except su.ValidationError as e:
cls_name = reflection.get_class_name(cls, fully_qualified=False)
excp.raise_with_cause(excp.InvalidFormat,
"%s message response data not of the"
" expected format: %s" % (cls_name,
e.message),
cause=e)
else:
# Validate all failure dictionaries that *may* be present...
failures = []
if 'failures' in data:
failures.extend(six.itervalues(data['failures']))
result = data.get('result')
if result is not None:
result_data_type, result_data = result
if result_data_type == 'failure':
failures.append(result_data)
for fail_data in failures:
ft.Failure.validate(fail_data)
@staticmethod
def from_dict(data, task_uuid=None):
"""Parses **validated** data into a work unit.
All :py:class:`~taskflow.types.failure.Failure` objects that have been
converted to dict(s) on the remote side will now be converted back
to :py:class:`~taskflow.types.failure.Failure` objects.
"""
task_cls = data['task_cls']
task_name = data['task_name']
action = data['action']
arguments = data.get('arguments', {})
result = data.get('result')
failures = data.get('failures')
# These arguments will eventually be given to the task executor
# so they need to be in a format it will accept (and using keyword
# argument names that it accepts)...
arguments = {
'arguments': arguments,
}
if task_uuid is not None:
arguments['task_uuid'] = task_uuid
if result is not None:
result_data_type, result_data = result
if result_data_type == 'failure':
arguments['result'] = ft.Failure.from_dict(result_data)
else:
arguments['result'] = result_data
if failures is not None:
arguments['failures'] = {}
for task, fail_data in six.iteritems(failures):
arguments['failures'][task] = ft.Failure.from_dict(fail_data)
return _WorkUnit(task_cls, task_name, action, arguments)
class Response(Message):
@@ -455,8 +507,15 @@ class Response(Message):
@classmethod
def validate(cls, data):
try:
jsonschema.validate(data, cls.SCHEMA, types=_SCHEMA_TYPES)
except schema_exc.ValidationError as e:
raise excp.InvalidFormat("%s message response data not of the"
" expected format: %s"
% (cls.TYPE, e.message), e)
su.schema_validate(data, cls.SCHEMA)
except su.ValidationError as e:
cls_name = reflection.get_class_name(cls, fully_qualified=False)
excp.raise_with_cause(excp.InvalidFormat,
"%s message response data not of the"
" expected format: %s" % (cls_name,
e.message),
cause=e)
else:
state = data['state']
if state == FAILURE and 'result' in data:
ft.Failure.validate(data['result'])
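The validation rework leans on two helpers visible in this diff, ``schema_utils.schema_validate`` and ``exceptions.raise_with_cause``; a hedged sketch of the chaining pattern with a trivial schema:

from taskflow import exceptions as excp
from taskflow.utils import schema_utils as su

SCHEMA = {
    'type': 'object',
    'properties': {'state': {'type': 'string'}},
    'required': ['state'],
}

def validate(data):
    try:
        su.schema_validate(data, SCHEMA)
    except su.ValidationError as e:
        # Re-raise as a taskflow error, keeping the original exception
        # attached as the cause (py3 chaining, emulated on py2).
        excp.raise_with_cause(excp.InvalidFormat,
                              "data not of the expected format: %s"
                              % e.message, cause=e)

validate({'state': 'SUCCESS'})   # ok
validate({})                     # raises InvalidFormat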

View File

@@ -15,6 +15,7 @@
# under the License.
import collections
import threading
import kombu
from kombu import exceptions as kombu_exceptions
@@ -22,7 +23,6 @@ import six
from taskflow.engines.worker_based import dispatcher
from taskflow import logging
from taskflow.utils import threading_utils
LOG = logging.getLogger(__name__)
@@ -75,7 +75,7 @@ class Proxy(object):
self._topic = topic
self._exchange_name = exchange
self._on_wait = on_wait
self._running = threading_utils.Event()
self._running = threading.Event()
self._dispatcher = dispatcher.TypeDispatcher(
# NOTE(skudriashev): Process all incoming messages only if proxy is
# running, otherwise requeue them.

View File

@@ -17,14 +17,14 @@
import functools
from oslo_utils import reflection
import six
from oslo_utils import timeutils
from taskflow.engines.worker_based import dispatcher
from taskflow.engines.worker_based import protocol as pr
from taskflow.engines.worker_based import proxy
from taskflow import logging
from taskflow.types import failure as ft
from taskflow.types import notifier as nt
from taskflow.types import timing as tt
from taskflow.utils import kombu_utils as ku
from taskflow.utils import misc
@@ -38,14 +38,13 @@ class Server(object):
url=None, transport=None, transport_options=None,
retry_options=None):
type_handlers = {
pr.NOTIFY: [
pr.NOTIFY: dispatcher.Handler(
self._delayed_process(self._process_notify),
functools.partial(pr.Notify.validate, response=False),
],
pr.REQUEST: [
validator=functools.partial(pr.Notify.validate,
response=False)),
pr.REQUEST: dispatcher.Handler(
self._delayed_process(self._process_request),
pr.Request.validate,
],
validator=pr.Request.validate),
}
self._executor = executor
self._proxy = proxy.Proxy(topic, exchange,
@@ -77,7 +76,7 @@ class Server(object):
def _on_receive(content, message):
LOG.debug("Submitting message '%s' for execution in the"
" future to '%s'", ku.DelayedPretty(message), func_name)
watch = tt.StopWatch()
watch = timeutils.StopWatch()
watch.start()
try:
self._executor.submit(_on_run, watch, content, message)
@@ -94,32 +93,6 @@ class Server(object):
def connection_details(self):
return self._proxy.connection_details
@staticmethod
def _parse_request(task_cls, task_name, action, arguments, result=None,
failures=None, **kwargs):
"""Parse request before it can be further processed.
All `failure.Failure` objects that have been converted to dict on the
remote side will now be converted back to `failure.Failure` objects.
"""
# These arguments will eventually be given to the task executor
# so they need to be in a format it will accept (and using keyword
# argument names that it accepts)...
arguments = {
'arguments': arguments,
}
if result is not None:
data_type, data = result
if data_type == 'failure':
arguments['result'] = ft.Failure.from_dict(data)
else:
arguments['result'] = data
if failures is not None:
arguments['failures'] = {}
for key, data in six.iteritems(failures):
arguments['failures'][key] = ft.Failure.from_dict(data)
return (task_cls, task_name, action, arguments)
@staticmethod
def _parse_message(message):
"""Extracts required attributes out of the message's properties.
@@ -199,11 +172,9 @@ class Server(object):
reply_callback = functools.partial(self._reply, True, reply_to,
task_uuid)
# parse request to get task name, action and action arguments
# Parse the request to get the activity/work to perform.
try:
bundle = self._parse_request(**request)
task_cls, task_name, action, arguments = bundle
arguments['task_uuid'] = task_uuid
work = pr.Request.from_dict(request, task_uuid=task_uuid)
except ValueError:
with misc.capture_failure() as failure:
LOG.warn("Failed to parse request contents from message '%s'",
@@ -211,34 +182,35 @@ class Server(object):
reply_callback(result=failure.to_dict())
return
# get task endpoint
# Now fetch the task endpoint (and action handler on it).
try:
endpoint = self._endpoints[task_cls]
endpoint = self._endpoints[work.task_cls]
except KeyError:
with misc.capture_failure() as failure:
LOG.warn("The '%s' task endpoint does not exist, unable"
" to continue processing request message '%s'",
task_cls, ku.DelayedPretty(message), exc_info=True)
work.task_cls, ku.DelayedPretty(message),
exc_info=True)
reply_callback(result=failure.to_dict())
return
else:
try:
handler = getattr(endpoint, action)
handler = getattr(endpoint, work.action)
except AttributeError:
with misc.capture_failure() as failure:
LOG.warn("The '%s' handler does not exist on task endpoint"
" '%s', unable to continue processing request"
" message '%s'", action, endpoint,
" message '%s'", work.action, endpoint,
ku.DelayedPretty(message), exc_info=True)
reply_callback(result=failure.to_dict())
return
else:
try:
task = endpoint.generate(name=task_name)
task = endpoint.generate(name=work.task_name)
except Exception:
with misc.capture_failure() as failure:
LOG.warn("The '%s' task '%s' generation for request"
" message '%s' failed", endpoint, action,
" message '%s' failed", endpoint, work.action,
ku.DelayedPretty(message), exc_info=True)
reply_callback(result=failure.to_dict())
return
@@ -246,7 +218,7 @@ class Server(object):
if not reply_callback(state=pr.RUNNING):
return
# associate *any* events this task emits with a proxy that will
# Associate *any* events this task emits with a proxy that will
# emit them back to the engine... for handling at the engine side
# of things...
if task.notifier.can_be_registered(nt.Notifier.ANY):
@@ -254,22 +226,23 @@ class Server(object):
functools.partial(self._on_event,
reply_to, task_uuid))
elif isinstance(task.notifier, nt.RestrictedNotifier):
# only proxy the allowable events then...
# Only proxy the allowable events then...
for event_type in task.notifier.events_iter():
task.notifier.register(event_type,
functools.partial(self._on_event,
reply_to, task_uuid))
# perform the task action
# Perform the task action.
try:
result = handler(task, **arguments)
result = handler(task, **work.arguments)
except Exception:
with misc.capture_failure() as failure:
LOG.warn("The '%s' endpoint '%s' execution for request"
" message '%s' failed", endpoint, action,
" message '%s' failed", endpoint, work.action,
ku.DelayedPretty(message), exc_info=True)
reply_callback(result=failure.to_dict())
else:
# And be done with it!
if isinstance(result, ft.Failure):
reply_callback(result=result.to_dict())
else:

View File

@@ -20,15 +20,16 @@ import itertools
import random
import threading
from futurist import periodics
from oslo_utils import reflection
from oslo_utils import timeutils
import six
from taskflow.engines.worker_based import dispatcher
from taskflow.engines.worker_based import protocol as pr
from taskflow import logging
from taskflow.types import cache as base
from taskflow.types import notifier
from taskflow.types import periodic
from taskflow.types import timing as tt
from taskflow.utils import kombu_utils as ku
LOG = logging.getLogger(__name__)
@@ -122,7 +123,7 @@ class WorkerFinder(object):
"""
if workers <= 0:
raise ValueError("Worker amount must be greater than zero")
watch = tt.StopWatch(duration=timeout)
watch = timeutils.StopWatch(duration=timeout)
watch.start()
with self._cond:
while self._total_workers() < workers:
@@ -165,10 +166,10 @@ class ProxyWorkerFinder(WorkerFinder):
self._workers = {}
self._uuid = uuid
self._proxy.dispatcher.type_handlers.update({
pr.NOTIFY: [
pr.NOTIFY: dispatcher.Handler(
self._process_response,
functools.partial(pr.Notify.validate, response=True),
],
validator=functools.partial(pr.Notify.validate,
response=True)),
})
self._counter = itertools.count()
@@ -179,7 +180,7 @@ class ProxyWorkerFinder(WorkerFinder):
else:
return TopicWorker(topic, tasks)
@periodic.periodic(pr.NOTIFY_PERIOD)
@periodics.periodic(pr.NOTIFY_PERIOD, run_immediately=True)
def beat(self):
"""Cyclically called to publish notify message to each topic."""
self._proxy.publish(pr.Notify(), self._topics, reply_to=self._uuid)

View File

@@ -20,47 +20,17 @@ import socket
import string
import sys
import futurist
from oslo_utils import reflection
from taskflow.engines.worker_based import endpoint
from taskflow.engines.worker_based import server
from taskflow import logging
from taskflow import task as t_task
from taskflow.types import futures
from taskflow.utils import misc
from taskflow.utils import threading_utils as tu
from taskflow import version
BANNER_TEMPLATE = string.Template("""
TaskFlow v${version} WBE worker.
Connection details:
Driver = $transport_driver
Exchange = $exchange
Topic = $topic
Transport = $transport_type
Uri = $connection_uri
Powered by:
Executor = $executor_type
Thread count = $executor_thread_count
Supported endpoints:$endpoints
System details:
Hostname = $hostname
Pid = $pid
Platform = $platform
Python = $python
Thread id = $thread_id
""".strip())
BANNER_TEMPLATE.defaults = {
# These values may not be possible to fetch/known, default to unknown...
'pid': '???',
'hostname': '???',
'executor_thread_count': '???',
'endpoints': ' %s' % ([]),
# These are static (avoid refetching...)
'version': version.version_string(),
'python': sys.version.split("\n", 1)[0].strip(),
}
LOG = logging.getLogger(__name__)
@@ -88,6 +58,39 @@ class Worker(object):
(see: :py:attr:`~.proxy.Proxy.DEFAULT_RETRY_OPTIONS`)
"""
BANNER_TEMPLATE = string.Template("""
TaskFlow v${version} WBE worker.
Connection details:
Driver = $transport_driver
Exchange = $exchange
Topic = $topic
Transport = $transport_type
Uri = $connection_uri
Powered by:
Executor = $executor_type
Thread count = $executor_thread_count
Supported endpoints:$endpoints
System details:
Hostname = $hostname
Pid = $pid
Platform = $platform
Python = $python
Thread id = $thread_id
""".strip())
# See: http://bugs.python.org/issue13173 for why we are doing this...
BANNER_TEMPLATE.defaults = {
# These values may not be possible to fetch/known, default
# to ??? to represent that they are unknown...
'pid': '???',
'hostname': '???',
'executor_thread_count': '???',
'endpoints': ' %s' % ([]),
# These are static (avoid refetching...)
'version': version.version_string(),
'python': sys.version.split("\n", 1)[0].strip(),
}
def __init__(self, exchange, topic, tasks,
executor=None, threads_count=None, url=None,
transport=None, transport_options=None,
@@ -95,13 +98,9 @@ class Worker(object):
self._topic = topic
self._executor = executor
self._owns_executor = False
self._threads_count = -1
if self._executor is None:
if threads_count is not None:
self._threads_count = int(threads_count)
else:
self._threads_count = tu.get_optimal_thread_count()
self._executor = futures.ThreadPoolExecutor(self._threads_count)
self._executor = futurist.ThreadPoolExecutor(
max_workers=threads_count)
self._owns_executor = True
self._endpoints = self._derive_endpoints(tasks)
self._exchange = exchange
@@ -119,7 +118,10 @@ class Worker(object):
def _generate_banner(self):
"""Generates a banner that can be useful to display before running."""
tpl_params = {}
try:
tpl_params = dict(self.BANNER_TEMPLATE.defaults)
except AttributeError:
tpl_params = {}
connection_details = self._server.connection_details
transport = connection_details.transport
if transport.driver_version:
@@ -133,8 +135,9 @@ class Worker(object):
tpl_params['transport_type'] = transport.driver_type
tpl_params['connection_uri'] = connection_details.uri
tpl_params['executor_type'] = reflection.get_class_name(self._executor)
if self._threads_count != -1:
tpl_params['executor_thread_count'] = self._threads_count
threads_count = getattr(self._executor, 'max_workers', None)
if threads_count is not None:
tpl_params['executor_thread_count'] = threads_count
if self._endpoints:
pretty_endpoints = []
for ep in self._endpoints:
@@ -151,8 +154,7 @@ class Worker(object):
pass
tpl_params['platform'] = platform.platform()
tpl_params['thread_id'] = tu.get_ident()
banner = BANNER_TEMPLATE.substitute(BANNER_TEMPLATE.defaults,
**tpl_params)
banner = self.BANNER_TEMPLATE.substitute(**tpl_params)
# NOTE(harlowja): this is needed since the template in this file
# will always have newlines that end with '\n' (even on different
# platforms due to the way this source file is encoded) so we have

View File

@@ -0,0 +1,204 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2015 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import contextlib
import logging
import os
import sys
import time
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
from taskflow.conductors import backends as conductor_backends
from taskflow import engines
from taskflow.jobs import backends as job_backends
from taskflow.patterns import linear_flow as lf
from taskflow.persistence import backends as persistence_backends
from taskflow.persistence import logbook
from taskflow import task
from taskflow.types import timing
from oslo_utils import uuidutils
# Instructions!
#
# 1. Install zookeeper (or change host listed below)
# 2. Download this example, place in file '99_bottles.py'
# 3. Run `python 99_bottles.py p` to place a song request onto the jobboard
# 4. Run `python 99_bottles.py c` a few times (in different shells)
# 5. On demand kill previously listed processes created in (4) and watch
# the work resume on another process (and repeat)
# 6. Keep enough workers alive to eventually finish the song (if desired).
ME = os.getpid()
ZK_HOST = "localhost:2181"
JB_CONF = {
'hosts': ZK_HOST,
'board': 'zookeeper',
'path': '/taskflow/99-bottles-demo',
}
PERSISTENCE_URI = r"sqlite:////tmp/bottles.db"
TAKE_DOWN_DELAY = 1.0
PASS_AROUND_DELAY = 3.0
HOW_MANY_BOTTLES = 99
class TakeABottleDown(task.Task):
def execute(self, bottles_left):
sys.stdout.write('Take one down, ')
sys.stdout.flush()
time.sleep(TAKE_DOWN_DELAY)
return bottles_left - 1
class PassItAround(task.Task):
def execute(self):
sys.stdout.write('pass it around, ')
sys.stdout.flush()
time.sleep(PASS_AROUND_DELAY)
class Conclusion(task.Task):
def execute(self, bottles_left):
sys.stdout.write('%s bottles of beer on the wall...\n' % bottles_left)
sys.stdout.flush()
def make_bottles(count):
# This is the function that will be called to generate the workflow
# and will also be called to regenerate it on resumption so that work
# can continue from where it last left off...
s = lf.Flow("bottle-song")
take_bottle = TakeABottleDown("take-bottle-%s" % count,
inject={'bottles_left': count},
provides='bottles_left')
pass_it = PassItAround("pass-%s-around" % count)
next_bottles = Conclusion("next-bottles-%s" % (count - 1))
s.add(take_bottle, pass_it, next_bottles)
for bottle in reversed(list(range(1, count))):
take_bottle = TakeABottleDown("take-bottle-%s" % bottle,
provides='bottles_left')
pass_it = PassItAround("pass-%s-around" % bottle)
next_bottles = Conclusion("next-bottles-%s" % (bottle - 1))
s.add(take_bottle, pass_it, next_bottles)
return s
def run_conductor():
# This continuously consumes until it is stopped via ctrl-c or another
# kill signal...
event_watches = {}
# This will be triggered by the conductor doing various activities
# with engines, and is quite nice to be able to see the various timing
# segments (which is useful for debugging, or watching, or figuring out
# where to optimize).
def on_conductor_event(event, details):
print("Event '%s' has been received..." % event)
print("Details = %s" % details)
if event.endswith("_start"):
w = timing.StopWatch()
w.start()
base_event = event[0:-len("_start")]
event_watches[base_event] = w
if event.endswith("_end"):
base_event = event[0:-len("_end")]
try:
w = event_watches.pop(base_event)
w.stop()
print("It took %0.3f seconds for event '%s' to finish"
% (w.elapsed(), base_event))
except KeyError:
pass
print("Starting conductor with pid: %s" % ME)
my_name = "conductor-%s" % ME
persist_backend = persistence_backends.fetch(PERSISTENCE_URI)
with contextlib.closing(persist_backend):
with contextlib.closing(persist_backend.get_connection()) as conn:
conn.upgrade()
job_backend = job_backends.fetch(my_name, JB_CONF,
persistence=persist_backend)
job_backend.connect()
with contextlib.closing(job_backend):
cond = conductor_backends.fetch('blocking', my_name, job_backend,
persistence=persist_backend)
cond.notifier.register(cond.notifier.ANY, on_conductor_event)
# Run forever, and kill -9 or ctrl-c me...
try:
cond.run()
finally:
cond.stop()
cond.wait()
def run_poster():
# This just posts a single job and then ends...
print("Starting poster with pid: %s" % ME)
my_name = "poster-%s" % ME
persist_backend = persistence_backends.fetch(PERSISTENCE_URI)
with contextlib.closing(persist_backend):
with contextlib.closing(persist_backend.get_connection()) as conn:
conn.upgrade()
job_backend = job_backends.fetch(my_name, JB_CONF,
persistence=persist_backend)
job_backend.connect()
with contextlib.closing(job_backend):
# Create information in the persistence backend about the
# unit of work we want to complete and the factory that
# can be called to create the tasks needed to get that unit
# of work done.
lb = logbook.LogBook("post-from-%s" % my_name)
fd = logbook.FlowDetail("song-from-%s" % my_name,
uuidutils.generate_uuid())
lb.add(fd)
with contextlib.closing(persist_backend.get_connection()) as conn:
conn.save_logbook(lb)
engines.save_factory_details(fd, make_bottles,
[HOW_MANY_BOTTLES], {},
backend=persist_backend)
# Post, and be done with it!
jb = job_backend.post("song-from-%s" % my_name, book=lb)
print("Posted: %s" % jb)
print("Goodbye...")
def main():
if len(sys.argv) == 1:
sys.stderr.write("%s p|c\n" % os.path.basename(sys.argv[0]))
elif sys.argv[1] in ('p', 'c'):
if sys.argv[-1] == "v":
logging.basicConfig(level=5)
else:
logging.basicConfig(level=logging.ERROR)
if sys.argv[1] == 'p':
run_poster()
else:
run_conductor()
else:
sys.stderr.write("%s p|c (v?)\n" % os.path.basename(sys.argv[0]))
if __name__ == '__main__':
main()

View File

@@ -38,7 +38,7 @@ from taskflow import task
# In this example we show how a simple linear set of tasks can be executed
# using local processes (and not threads or remote workers) with minimial (if
# using local processes (and not threads or remote workers) with minimal (if
# any) modification to those tasks to make them safe to run in this mode.
#
# This is useful since it allows further scaling up your workflows when thread

View File

@@ -38,7 +38,7 @@ ANY = notifier.Notifier.ANY
import example_utils as eu # noqa
# INTRO: This examples shows how a graph flow and linear flow can be used
# INTRO: This example shows how a graph flow and linear flow can be used
# together to execute dependent & non-dependent tasks by going through the
# steps required to build a simplistic car (an assembly line if you will). It
# also shows how raw functions can be wrapped into a task object instead of
@@ -167,7 +167,7 @@ engine = taskflow.engines.load(flow, store={'spec': spec.copy()})
# flow_watch function for flow state transitions, and registers the
# same all (ANY) state transitions for task state transitions.
engine.notifier.register(ANY, flow_watch)
engine.task_notifier.register(ANY, task_watch)
engine.atom_notifier.register(ANY, task_watch)
eu.print_wrapped("Building a car")
engine.run()
@@ -180,7 +180,7 @@ spec['doors'] = 5
engine = taskflow.engines.load(flow, store={'spec': spec.copy()})
engine.notifier.register(ANY, flow_watch)
engine.task_notifier.register(ANY, task_watch)
engine.atom_notifier.register(ANY, task_watch)
eu.print_wrapped("Building a wrong car that doesn't match specification")
try:

View File

@@ -30,7 +30,7 @@ from taskflow.patterns import linear_flow as lf
from taskflow.patterns import unordered_flow as uf
from taskflow import task
# INTRO: This examples shows how a linear flow and a unordered flow can be
# INTRO: These examples show how a linear flow and an unordered flow can be
# used together to execute calculations in parallel and then use the
# result for the next task/s. The adder task is used for all calculations
# and argument bindings are used to set correct parameters for each task.

View File

@@ -35,7 +35,7 @@ from taskflow.listeners import printing
from taskflow.patterns import unordered_flow as uf
from taskflow import task
# INTRO: This examples shows how unordered_flow can be used to create a large
# INTRO: These examples show how unordered_flow can be used to create a large
# number of fake volumes in parallel (or serially, depending on a constant that
# can be easily changed).

View File

@@ -0,0 +1,78 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2015 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
import os
import sys
logging.basicConfig(level=logging.ERROR)
self_dir = os.path.abspath(os.path.dirname(__file__))
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
sys.path.insert(0, self_dir)
from taskflow import engines
from taskflow.patterns import linear_flow as lf
from taskflow.persistence import backends
from taskflow import task
from taskflow.utils import persistence_utils as pu
# INTRO: in this example we create a dummy flow with a dummy task, and run
# it using an in-memory backend and pre/post run we dump out the contents
# of the in-memory backend's tree structure (which can be quite useful to
# look at for debugging or other analysis).
class PrintTask(task.Task):
def execute(self):
print("Running '%s'" % self.name)
backend = backends.fetch({
'connection': 'memory://',
})
book, flow_detail = pu.temporary_flow_detail(backend=backend)
# Make a little flow and run it...
f = lf.Flow('root')
for alpha in ['a', 'b', 'c']:
f.add(PrintTask(alpha))
e = engines.load(f, flow_detail=flow_detail,
book=book, backend=backend)
e.compile()
e.prepare()
print("----------")
print("Before run")
print("----------")
print(backend.memory.pformat())
print("----------")
e.run()
print("---------")
print("After run")
print("---------")
for path in backend.memory.ls_r(backend.memory.root_path, absolute=True):
value = backend.memory[path]
if value:
print("%s -> %s" % (path, value))
else:
print("%s" % (path))

View File

@@ -31,8 +31,8 @@ from taskflow.patterns import linear_flow as lf
from taskflow import task
# INTRO: This example walks through a miniature workflow which will do a
# simple echo operation; during this execution a listener is assocated with
# the engine to recieve all notifications about what the flow has performed,
# simple echo operation; during this execution a listener is associated with
# the engine to receive all notifications about what the flow has performed,
# this example dumps that output to the stdout for viewing (at debug level
# to show all the information which is possible).

View File

@@ -36,8 +36,8 @@ from taskflow.patterns import linear_flow as lf
from taskflow import task
from taskflow.utils import misc
# INTRO: This example walks through a miniature workflow which simulates a
# the reception of a API request, creation of a database entry, driver
# INTRO: This example walks through a miniature workflow which simulates
# the reception of an API request, creation of a database entry, driver
# activation (which invokes a 'fake' webservice) and final completion.
#
# This example also shows how a function/object (in this class the url sending)

View File

@@ -80,12 +80,37 @@ store = {
"y5": 9,
}
# These are the expected values that should be created.
unexpected = 0
expected = [
('x1', 4),
('x2', 12),
('x3', 16),
('x4', 21),
('x5', 20),
('x6', 41),
('x7', 82),
]
result = taskflow.engines.run(
flow, engine='serial', store=store)
print("Single threaded engine result %s" % result)
for (name, value) in expected:
actual = result.get(name)
if actual != value:
sys.stderr.write("%s != %s\n" % (actual, value))
unexpected += 1
result = taskflow.engines.run(
flow, engine='parallel', store=store)
print("Multi threaded engine result %s" % result)
for (name, value) in expected:
actual = result.get(name)
if actual != value:
sys.stderr.write("%s != %s\n" % (actual, value))
unexpected += 1
if unexpected:
sys.exit(1)

View File

@@ -25,16 +25,17 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir))
sys.path.insert(0, top_dir)
import futurist
from taskflow import engines
from taskflow.patterns import linear_flow as lf
from taskflow.patterns import unordered_flow as uf
from taskflow import task
from taskflow.types import futures
from taskflow.utils import eventlet_utils
# INTRO: This is the defacto hello world equivalent for taskflow; it shows how
# a overly simplistic workflow can be created that runs using different
# an overly simplistic workflow can be created that runs using different
# engines using different styles of execution (all can be used to run in
# parallel if a workflow is provided that is parallelizable).
@@ -82,19 +83,19 @@ song.add(PrinterTask("conductor@begin",
# Run in parallel using eventlet green threads...
if eventlet_utils.EVENTLET_AVAILABLE:
with futures.GreenThreadPoolExecutor() as executor:
with futurist.GreenThreadPoolExecutor() as executor:
e = engines.load(song, executor=executor, engine='parallel')
e.run()
# Run in parallel using real threads...
with futures.ThreadPoolExecutor(max_workers=1) as executor:
with futurist.ThreadPoolExecutor(max_workers=1) as executor:
e = engines.load(song, executor=executor, engine='parallel')
e.run()
# Run in parallel using external processes...
with futures.ProcessPoolExecutor(max_workers=1) as executor:
with futurist.ProcessPoolExecutor(max_workers=1) as executor:
e = engines.load(song, executor=executor, engine='parallel')
e.run()

View File

@@ -1,171 +0,0 @@
# -*- encoding: utf-8 -*-
#
# Copyright © 2013 eNovance <licensing@enovance.com>
#
# Authors: Dan Krause <dan@dankrause.net>
# Cyril Roelandt <cyril.roelandt@enovance.com>
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
# This example shows how to use the job board feature.
#
# Let's start by creating some jobs:
# $ python job_board_no_test.py create my-board my-job '{}'
# $ python job_board_no_test.py create my-board my-job '{"foo": "bar"}'
# $ python job_board_no_test.py create my-board my-job '{"foo": "baz"}'
# $ python job_board_no_test.py create my-board my-job '{"foo": "barbaz"}'
#
# Make sure they were registered:
# $ python job_board_no_test.py list my-board
# 7277181a-1f83-473d-8233-f361615bae9e - {}
# 84a396e8-d02e-450d-8566-d93cb68550c0 - {u'foo': u'bar'}
# 4d355d6a-2c72-44a2-a558-19ae52e8ae2c - {u'foo': u'baz'}
# cd9aae2c-fd64-416d-8ba0-426fa8e3d59c - {u'foo': u'barbaz'}
#
# Perform one job:
# $ python job_board_no_test.py consume my-board \
# 84a396e8-d02e-450d-8566-d93cb68550c0
# Performing job 84a396e8-d02e-450d-8566-d93cb68550c0 with args \
# {u'foo': u'bar'}
# $ python job_board_no_test.py list my-board
# 7277181a-1f83-473d-8233-f361615bae9e - {}
# 4d355d6a-2c72-44a2-a558-19ae52e8ae2c - {u'foo': u'baz'}
# cd9aae2c-fd64-416d-8ba0-426fa8e3d59c - {u'foo': u'barbaz'}
#
# Delete a job:
# $ python job_board_no_test.py delete my-board \
# cd9aae2c-fd64-416d-8ba0-426fa8e3d59c
# $ python job_board_no_test.py list my-board
# 7277181a-1f83-473d-8233-f361615bae9e - {}
# 4d355d6a-2c72-44a2-a558-19ae52e8ae2c - {u'foo': u'baz'}
#
# Delete all the remaining jobs
# $ python job_board_no_test.py clear my-board
# $ python job_board_no_test.py list my-board
# $
import argparse
import contextlib
import json
import os
import sys
import tempfile
import taskflow.jobs.backends as job_backends
from taskflow.persistence import logbook
import example_utils # noqa
@contextlib.contextmanager
def jobboard(*args, **kwargs):
jb = job_backends.fetch(*args, **kwargs)
jb.connect()
yield jb
jb.close()
conf = {
'board': 'zookeeper',
'hosts': ['127.0.0.1:2181']
}
def consume_job(args):
def perform_job(job):
print("Performing job %s with args %s" % (job.uuid, job.details))
with jobboard(args.board_name, conf) as jb:
for job in jb.iterjobs(ensure_fresh=True):
if job.uuid == args.job_uuid:
jb.claim(job, "test-client")
perform_job(job)
jb.consume(job, "test-client")
def clear_jobs(args):
with jobboard(args.board_name, conf) as jb:
for job in jb.iterjobs(ensure_fresh=True):
jb.claim(job, "test-client")
jb.consume(job, "test-client")
def create_job(args):
store = json.loads(args.details)
book = logbook.LogBook(args.job_name)
if example_utils.SQLALCHEMY_AVAILABLE:
persist_path = os.path.join(tempfile.gettempdir(), "persisting.db")
backend_uri = "sqlite:///%s" % (persist_path)
else:
persist_path = os.path.join(tempfile.gettempdir(), "persisting")
backend_uri = "file:///%s" % (persist_path)
with example_utils.get_backend(backend_uri) as backend:
backend.get_connection().save_logbook(book)
with jobboard(args.board_name, conf, persistence=backend) as jb:
jb.post(args.job_name, book, details=store)
def list_jobs(args):
with jobboard(args.board_name, conf) as jb:
for job in jb.iterjobs(ensure_fresh=True):
print("%s - %s" % (job.uuid, job.details))
def delete_job(args):
with jobboard(args.board_name, conf) as jb:
for job in jb.iterjobs(ensure_fresh=True):
if job.uuid == args.job_uuid:
jb.claim(job, "test-client")
jb.consume(job, "test-client")
def main(argv):
parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(title='subcommands',
description='valid subcommands',
help='additional help')
# Consume command
parser_consume = subparsers.add_parser('consume')
parser_consume.add_argument('board_name')
parser_consume.add_argument('job_uuid')
parser_consume.set_defaults(func=consume_job)
# Clear command
parser_consume = subparsers.add_parser('clear')
parser_consume.add_argument('board_name')
parser_consume.set_defaults(func=clear_jobs)
# Create command
parser_create = subparsers.add_parser('create')
parser_create.add_argument('board_name')
parser_create.add_argument('job_name')
parser_create.add_argument('details')
parser_create.set_defaults(func=create_job)
# Delete command
parser_delete = subparsers.add_parser('delete')
parser_delete.add_argument('board_name')
parser_delete.add_argument('job_uuid')
parser_delete.set_defaults(func=delete_job)
# List command
parser_list = subparsers.add_parser('list')
parser_list.add_argument('board_name')
parser_list.set_defaults(func=list_jobs)
args = parser.parse_args(argv)
args.func(args)
if __name__ == '__main__':
main(sys.argv[1:])

View File

@@ -30,6 +30,7 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir))
sys.path.insert(0, top_dir)
import six
from six.moves import range as compat_range
from zake import fake_client
@@ -40,7 +41,7 @@ from taskflow.utils import threading_utils
# In this example we show how a jobboard can be used to post work for other
# entities to work on. This example creates a set of jobs using one producer
# thread (typically this would be split across many machines) and then having
# other worker threads with there own jobboards select work using a given
# other worker threads with their own jobboards select work using a given
# filters [red/blue] and then perform that work (and consuming or abandoning
# the job after it has been completed or failed).
@@ -66,7 +67,7 @@ PRODUCER_UNITS = 10
# How many units of work are expected to be produced (used so workers can
# know when to stop running and shutdown, typically this would not be a
# a value but we have to limit this examples execution time to be less than
# a value but we have to limit this example's execution time to be less than
# infinity).
EXPECTED_UNITS = PRODUCER_UNITS * PRODUCERS
@@ -150,6 +151,14 @@ def producer(ident, client):
def main():
if six.PY3:
# TODO(harlowja): Hack to make eventlet work right, remove when the
# following is fixed: https://github.com/eventlet/eventlet/issues/230
from taskflow.utils import eventlet_utils as _eu # noqa
try:
import eventlet as _eventlet # noqa
except ImportError:
pass
with contextlib.closing(fake_client.FakeClient()) as c:
created = []
for i in compat_range(0, PRODUCERS):

View File

@@ -27,12 +27,12 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir))
sys.path.insert(0, top_dir)
import futurist
from six.moves import range as compat_range
from taskflow import engines
from taskflow.patterns import unordered_flow as uf
from taskflow import task
from taskflow.types import futures
from taskflow.utils import eventlet_utils
# INTRO: This example walks through a miniature workflow which does a parallel
@@ -98,9 +98,9 @@ def main():
# Now run it (using the specified executor)...
if eventlet_utils.EVENTLET_AVAILABLE:
executor = futures.GreenThreadPoolExecutor(max_workers=5)
executor = futurist.GreenThreadPoolExecutor(max_workers=5)
else:
executor = futures.ThreadPoolExecutor(max_workers=5)
executor = futurist.ThreadPoolExecutor(max_workers=5)
try:
e = engines.load(f, engine='parallel', executor=executor)
for st in e.run_iter():

View File

@@ -31,7 +31,7 @@ sys.path.insert(0, self_dir)
from taskflow import engines
from taskflow.patterns import linear_flow as lf
from taskflow.persistence import logbook
from taskflow.persistence import models
from taskflow import task
from taskflow.utils import persistence_utils as p_utils
@@ -68,15 +68,15 @@ class ByeTask(task.Task):
print("Bye!")
# This generates your flow structure (at this stage nothing is ran).
# This generates your flow structure (at this stage nothing is run).
def make_flow(blowup=False):
flow = lf.Flow("hello-world")
flow.add(HiTask(), ByeTask(blowup))
return flow
# Persist the flow and task state here, if the file/dir exists already blowup
# if not don't blowup, this allows a user to see both the modes and to see
# Persist the flow and task state here, if the file/dir exists already blow up
# if not don't blow up, this allows a user to see both the modes and to see
# what is stored in each case.
if eu.SQLALCHEMY_AVAILABLE:
persist_path = os.path.join(tempfile.gettempdir(), "persisting.db")
@@ -91,10 +91,10 @@ else:
blowup = True
with eu.get_backend(backend_uri) as backend:
# Make a flow that will blowup if the file doesn't exist previously, if it
# did exist, assume we won't blowup (and therefore this shows the undo
# Make a flow that will blow up if the file didn't exist previously, if it
# did exist, assume we won't blow up (and therefore this shows the undo
# and redo that a flow will go through).
book = logbook.LogBook("my-test")
book = models.LogBook("my-test")
flow = make_flow(blowup=blowup)
eu.print_wrapped("Running")
try:

View File

@@ -31,6 +31,7 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
sys.path.insert(0, top_dir)
sys.path.insert(0, self_dir)
import futurist
from oslo_utils import uuidutils
from taskflow import engines
@@ -38,13 +39,12 @@ from taskflow import exceptions as exc
from taskflow.patterns import graph_flow as gf
from taskflow.patterns import linear_flow as lf
from taskflow import task
from taskflow.types import futures
from taskflow.utils import eventlet_utils
from taskflow.utils import persistence_utils as p_utils
import example_utils as eu # noqa
# INTRO: This examples shows how a hierarchy of flows can be used to create a
# INTRO: These examples show how a hierarchy of flows can be used to create a
# vm in a reliable & resumable manner using taskflow + a miniature version of
# what nova does while booting a vm.
@@ -239,7 +239,7 @@ with eu.get_backend() as backend:
# Set up how we want our engine to run, serial, parallel...
executor = None
if eventlet_utils.EVENTLET_AVAILABLE:
executor = futures.GreenThreadPoolExecutor(5)
executor = futurist.GreenThreadPoolExecutor(5)
# Create/fetch a logbook that will track the workflows work.
book = None

View File

@@ -39,7 +39,7 @@ from taskflow.utils import persistence_utils as p_utils
import example_utils # noqa
# INTRO: This examples shows how a hierarchy of flows can be used to create a
# INTRO: These examples show how a hierarchy of flows can be used to create a
# pseudo-volume in a reliable & resumable manner using taskflow + a miniature
# version of what cinder does while creating a volume (very miniature).

View File

@@ -32,7 +32,7 @@ from taskflow import task
# INTRO: In this example we create a retry controller that receives a phone
# directory and tries different phone numbers. The next task tries to call Jim
# using the given number. If if is not a Jim's number, the tasks raises an
# using the given number. If it is not a Jim's number, the task raises an
# exception and retry controller takes the next number from the phone
# directory and retries the call.
#

View File

@@ -37,7 +37,7 @@ from taskflow import task
from taskflow.utils import persistence_utils
# INTRO: This examples shows how to run a set of engines at the same time, each
# INTRO: This example shows how to run a set of engines at the same time, each
# running in different engines using a single thread of control to iterate over
# each engine (which causes that engine to advance to its next state during
# each iteration).

View File

@@ -33,10 +33,10 @@ from taskflow.persistence import backends as persistence_backends
from taskflow import task
from taskflow.utils import persistence_utils
# INTRO: This examples shows how to run a engine using the engine iteration
# INTRO: These examples show how to run an engine using the engine iteration
# capability, in between iterations other activities occur (in this case a
# value is output to stdout); but more complicated actions can occur at the
# boundary when a engine yields its current state back to the caller.
# boundary when an engine yields its current state back to the caller.
class EchoNameTask(task.Task):

View File

@@ -0,0 +1,81 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2012-2013 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
import os
import random
import sys
import time
logging.basicConfig(level=logging.ERROR)
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
import futurist
import six
from taskflow import engines
from taskflow.patterns import unordered_flow as uf
from taskflow import task
from taskflow.utils import threading_utils as tu
# INTRO: in this example we create 2 dummy flow(s), each with 2 dummy task(s),
# and run them using a shared thread pool executor to show how a single executor can
# be used with more than one engine (sharing the execution thread pool between
# them); this allows for saving resources and reusing threads in situations
# where this is beneficial.
class DelayedTask(task.Task):
def __init__(self, name):
super(DelayedTask, self).__init__(name=name)
self._wait_for = random.random()
def execute(self):
print("Running '%s' in thread '%s'" % (self.name, tu.get_ident()))
time.sleep(self._wait_for)
f1 = uf.Flow("f1")
f1.add(DelayedTask("f1-1"))
f1.add(DelayedTask("f1-2"))
f2 = uf.Flow("f2")
f2.add(DelayedTask("f2-1"))
f2.add(DelayedTask("f2-2"))
# Run them all using the same futures (thread-pool based) executor...
with futurist.ThreadPoolExecutor() as ex:
e1 = engines.load(f1, engine='parallel', executor=ex)
e2 = engines.load(f2, engine='parallel', executor=ex)
iters = [e1.run_iter(), e2.run_iter()]
# Iterate over a copy (so we can remove from the source list).
cloned_iters = list(iters)
while iters:
# Run a single 'step' of each iterator, forcing each engine to perform
# some work, then yield, and repeat until each iterator is consumed
# and there is no more engine work to be done.
for it in cloned_iters:
try:
six.next(it)
except StopIteration:
try:
iters.remove(it)
except ValueError:
pass

View File

@@ -41,8 +41,8 @@ from taskflow import task
# taskflow provides via tasks and flows makes it possible for you to easily at
# a later time hook in a persistence layer (and then gain the functionality
# that offers) when you decide the complexity of adding that layer in
# is 'worth it' for your applications usage pattern (which certain applications
# may not need).
# is 'worth it' for your application's usage pattern (which certain
# applications may not need).
class CallJim(task.Task):

View File

@@ -37,7 +37,7 @@ ANY = notifier.Notifier.ANY
# a given ~phone~ number (provided as a function input) in a linear fashion
# (one after the other).
#
# For a workflow which is serial this shows a extremely simple way
# For a workflow which is serial this shows an extremely simple way
# of structuring your tasks (the code that does the work) into a linear
# sequence (the flow) and then passing the work off to an engine, with some
# initial data to be ran in a reliable manner.
@@ -92,11 +92,11 @@ engine = taskflow.engines.load(flow, store={
})
# This is where we attach our callback functions to the 2 different
# notification objects that a engine exposes. The usage of a '*' (kleene star)
# notification objects that an engine exposes. The usage of a ANY (kleene star)
# here means that we want to be notified on all state changes, if you want to
# restrict to a specific state change, just register that instead.
engine.notifier.register(ANY, flow_watch)
engine.task_notifier.register(ANY, task_watch)
engine.atom_notifier.register(ANY, task_watch)
# And now run!
engine.run()

View File

@@ -31,7 +31,7 @@ from taskflow import engines
from taskflow.patterns import linear_flow
from taskflow import task
# INTRO: This examples shows how a task (in a linear/serial workflow) can
# INTRO: This example shows how a task (in a linear/serial workflow) can
# produce an output that can be then consumed/used by a downstream task.

View File

@@ -27,9 +27,9 @@ top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
sys.path.insert(0, top_dir)
sys.path.insert(0, self_dir)
# INTRO: this examples shows a simplistic map/reduce implementation where
# INTRO: These examples show a simplistic map/reduce implementation where
# a set of mapper(s) will sum a series of input numbers (in parallel) and
# return there individual summed result. A reducer will then use those
# return their individual summed result. A reducer will then use those
# produced values and perform a final summation and this result will then be
# printed (and verified to ensure the calculation was as expected).

View File

@@ -0,0 +1,75 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import logging
import os
import sys
logging.basicConfig(level=logging.ERROR)
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
from taskflow import engines
from taskflow.patterns import graph_flow as gf
from taskflow.persistence import backends
from taskflow import task
from taskflow.utils import persistence_utils as pu
class DummyTask(task.Task):
def execute(self):
print("Running %s" % self.name)
def allow(history):
print(history)
return False
r = gf.Flow("root")
r_a = DummyTask('r-a')
r_b = DummyTask('r-b')
r.add(r_a, r_b)
r.link(r_a, r_b, decider=allow)
backend = backends.fetch({
'connection': 'memory://',
})
book, flow_detail = pu.temporary_flow_detail(backend=backend)
e = engines.load(r, flow_detail=flow_detail, book=book, backend=backend)
e.compile()
e.prepare()
e.run()
print("---------")
print("After run")
print("---------")
entries = [os.path.join(backend.memory.root_path, child)
for child in backend.memory.ls(backend.memory.root_path)]
while entries:
path = entries.pop()
value = backend.memory[path]
if value:
print("%s -> %s" % (path, value))
else:
print("%s" % (path))
entries.extend(os.path.join(path, child)
for child in backend.memory.ls(path))

View File

@@ -36,7 +36,7 @@ from taskflow import task
# and have variable run time tasks run and show how the listener will print
# out how long those tasks took (when they started and when they finished).
#
# This shows how timing metrics can be gathered (or attached onto a engine)
# This shows how timing metrics can be gathered (or attached onto an engine)
# after a workflow has been constructed, making it easy to gather metrics
# dynamically for situations where this kind of information is applicable (or
# even adding this information on at a later point in the future when your
@@ -55,5 +55,5 @@ class VariableTask(task.Task):
f = lf.Flow('root')
f.add(VariableTask('a'), VariableTask('b'), VariableTask('c'))
e = engines.load(f)
with timing.PrintingTimingListener(e):
with timing.PrintingDurationListener(e):
e.run()

View File

@@ -0,0 +1,243 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2014 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import contextlib
import itertools
import logging
import os
import shutil
import socket
import sys
import tempfile
import threading
import time
logging.basicConfig(level=logging.ERROR)
top_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
os.pardir,
os.pardir))
sys.path.insert(0, top_dir)
from oslo_utils import timeutils
from oslo_utils import uuidutils
import six
from zake import fake_client
from taskflow.conductors import backends as conductors
from taskflow import engines
from taskflow.jobs import backends as boards
from taskflow.patterns import linear_flow
from taskflow.persistence import backends as persistence
from taskflow.persistence import models
from taskflow import task
from taskflow.utils import threading_utils
# INTRO: This example shows how a worker/producer can post desired work (jobs)
# to a jobboard and a conductor can consume that work (jobs) from that jobboard
# and execute those jobs in a reliable & async manner (for example, if the
# conductor were to crash then the job will be released back onto the jobboard
# and another conductor can attempt to finish it, from wherever that job last
# left off).
#
# In this example an in-memory jobboard (and in-memory storage) is created and
# used that simulates how this would be done at a larger scale (it is an
# example after all).
# Restrict how long this example runs for...
RUN_TIME = 5
REVIEW_CREATION_DELAY = 0.5
SCAN_DELAY = 0.1
NAME = "%s_%s" % (socket.getfqdn(), os.getpid())
# This won't really use zookeeper but will use a local version of it using
# the zake library that mimics an actual zookeeper cluster using threads and
# an in-memory data structure.
JOBBOARD_CONF = {
'board': 'zookeeper://localhost?path=/taskflow/tox/jobs',
}
class RunReview(task.Task):
# A dummy task that clones the review and runs tox...
def _clone_review(self, review, temp_dir):
print("Cloning review '%s' into %s" % (review['id'], temp_dir))
def _run_tox(self, temp_dir):
print("Running tox in %s" % temp_dir)
def execute(self, review, temp_dir):
self._clone_review(review, temp_dir)
self._run_tox(temp_dir)
class MakeTempDir(task.Task):
# A task that creates and destroys a temporary dir (on failure).
#
# It provides the location of the temporary dir for other tasks to use
# as they see fit.
default_provides = 'temp_dir'
def execute(self):
return tempfile.mkdtemp()
def revert(self, *args, **kwargs):
temp_dir = kwargs.get(task.REVERT_RESULT)
if temp_dir:
shutil.rmtree(temp_dir)
class CleanResources(task.Task):
# A task that cleans up any workflow resources.
def execute(self, temp_dir):
print("Removing %s" % temp_dir)
shutil.rmtree(temp_dir)
def review_iter():
"""Makes reviews (never-ending iterator/generator)."""
review_id_gen = itertools.count(0)
while True:
review_id = six.next(review_id_gen)
review = {
'id': review_id,
}
yield review
# The reason this is at the module namespace level is important, since it must
# be accessible from a conductor dispatching an engine; if it was a lambda
# function, for example, it would not be reimportable and the conductor would
# be unable to reference it when creating the workflow to run.
def create_review_workflow():
"""Factory method used to create a review workflow to run."""
f = linear_flow.Flow("tester")
f.add(
MakeTempDir(name="maker"),
RunReview(name="runner"),
CleanResources(name="cleaner")
)
return f
def generate_reviewer(client, saver, name=NAME):
"""Creates a review producer thread with the given name prefix."""
real_name = "%s_reviewer" % name
no_more = threading.Event()
jb = boards.fetch(real_name, JOBBOARD_CONF,
client=client, persistence=saver)
def make_save_book(saver, review_id):
# Record what we want to happen (sometime in the future).
book = models.LogBook("book_%s" % review_id)
detail = models.FlowDetail("flow_%s" % review_id,
uuidutils.generate_uuid())
book.add(detail)
# Associate the factory method we want to be called (in the future)
# with the book, so that the conductor will be able to call into
# that factory to retrieve the workflow objects that represent the
# work.
#
# These args and kwargs *can* be used to save any specific parameters
# into the factory when it is being called to create the workflow
# objects (typically used to tell a factory how to create a unique
# workflow that represents this review).
factory_args = ()
factory_kwargs = {}
engines.save_factory_details(detail, create_review_workflow,
factory_args, factory_kwargs)
with contextlib.closing(saver.get_connection()) as conn:
conn.save_logbook(book)
return book
def run():
"""Periodically publishes 'fake' reviews to analyze."""
jb.connect()
review_generator = review_iter()
with contextlib.closing(jb):
while not no_more.is_set():
review = six.next(review_generator)
details = {
'store': {
'review': review,
},
}
job_name = "%s_%s" % (real_name, review['id'])
print("Posting review '%s'" % review['id'])
jb.post(job_name,
book=make_save_book(saver, review['id']),
details=details)
time.sleep(REVIEW_CREATION_DELAY)
# Return the unstarted thread, and a callback that can be used to
# shut down that thread (to avoid running forever).
return (threading_utils.daemon_thread(target=run), no_more.set)
def generate_conductor(client, saver, name=NAME):
"""Creates a conductor thread with the given name prefix."""
real_name = "%s_conductor" % name
jb = boards.fetch(name, JOBBOARD_CONF,
client=client, persistence=saver)
conductor = conductors.fetch("blocking", real_name, jb,
engine='parallel', wait_timeout=SCAN_DELAY)
def run():
jb.connect()
with contextlib.closing(jb):
conductor.run()
# Return the unstarted thread, and a callback that can be used to
# shut down that thread (to avoid running forever).
return (threading_utils.daemon_thread(target=run), conductor.stop)
def main():
# Need to share the same backend, so that data can be shared...
persistence_conf = {
'connection': 'memory',
}
saver = persistence.fetch(persistence_conf)
with contextlib.closing(saver.get_connection()) as conn:
# This ensures that the needed backend setup/data directories/schema
# upgrades and so on... exist before they are attempted to be used...
conn.upgrade()
fc1 = fake_client.FakeClient()
# Done like this to share the same client storage location so the correct
# zookeeper features work across clients...
fc2 = fake_client.FakeClient(storage=fc1.storage)
entities = [
generate_reviewer(fc1, saver),
generate_conductor(fc2, saver),
]
for t, stopper in entities:
t.start()
try:
watch = timeutils.StopWatch(duration=RUN_TIME)
watch.start()
while not watch.expired():
time.sleep(0.1)
finally:
for t, stopper in reversed(entities):
stopper()
t.join()
if __name__ == '__main__':
main()

View File

@@ -36,10 +36,10 @@ from taskflow.utils import threading_utils
ANY = notifier.Notifier.ANY
# INTRO: This examples shows how to use a remote workers event notification
# INTRO: These examples show how to use a remote worker's event notification
# attribute to proxy back task event notifications to the controlling process.
#
# In this case a simple set of events are triggered by a worker running a
# In this case a simple set of events is triggered by a worker running a
# task (simulated to be remote by using a kombu memory transport and threads).
# Those events that the 'remote worker' produces will then be proxied back to
# the task that the engine is running 'remotely', and then they will be emitted
@@ -113,10 +113,10 @@ if __name__ == "__main__":
workers = []
# These topics will be used to request worker information on; those
# workers will respond with there capabilities which the executing engine
# workers will respond with their capabilities which the executing engine
# will use to match pending tasks to a matched worker, this will cause
# the task to be sent for execution, and the engine will wait until it
# is finished (a response is recieved) and then the engine will either
# is finished (a response is received) and then the engine will either
# continue with other tasks, do some retry/failure resolution logic or
# stop (and potentially re-raise the remote workers failure)...
worker_topics = []

View File

@@ -111,11 +111,11 @@ def calculate(engine_conf):
# an image bitmap file.
# And unordered flow is used here since the mandelbrot calculation is an
# example of a embarrassingly parallel computation that we can scatter
# example of an embarrassingly parallel computation that we can scatter
# across as many workers as possible.
flow = uf.Flow("mandelbrot")
# These symbols will be automatically given to tasks as input to there
# These symbols will be automatically given to tasks as input to their
# execute method, in this case these are constants used in the mandelbrot
# calculation.
store = {

View File

@@ -17,15 +17,46 @@
import os
import traceback
from oslo_utils import excutils
from oslo_utils import reflection
import six
def raise_with_cause(exc_cls, message, *args, **kwargs):
"""Helper to raise + chain exceptions (when able) and associate a *cause*.
NOTE(harlowja): Since in py3.x exceptions can be chained (due to
:pep:`3134`) we should try to raise the desired exception with the given
*cause* (or extract a *cause* from the current stack if able) so that the
exception formats nicely in old and new versions of python. Since py2.x
does **not** support exception chaining (or formatting) our root exception
class has a :py:meth:`~taskflow.exceptions.TaskFlowException.pformat`
method that can be used to get *similar* information instead (and this
function makes sure to retain the *cause* in that case as well so
that the :py:meth:`~taskflow.exceptions.TaskFlowException.pformat` method
shows them).
:param exc_cls: the :py:class:`~taskflow.exceptions.TaskFlowException`
class to raise.
:param exc_cls: the :py:class:`~taskflow.exceptions.TaskFlowException`
class to raise.
:param message: the text/str message that will be passed to
the exception's constructor as its first positional
argument.
:param args: any additional positional arguments to pass to the
exception's constructor.
:param kwargs: any additional keyword arguments to pass to the
exception's constructor.
"""
if not issubclass(exc_cls, TaskFlowException):
raise ValueError("Subclass of taskflow exception is required")
excutils.raise_with_cause(exc_cls, message, *args, **kwargs)
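As a rough usage sketch of the helper above (the surrounding ``connect`` function and the ``client.ping`` call are illustrative only; ``JobFailure`` is one of the taskflow exception classes used elsewhere in this changeset):

    from taskflow import exceptions as exc

    def connect(client):
        try:
            client.ping()
        except IOError:
            # The IOError becomes the *cause*: chained via PEP 3134 on py3.x,
            # and kept attached so TaskFlowException.pformat() can still show
            # it on py2.x.
            exc.raise_with_cause(exc.JobFailure,
                                 "Failed to connect to the job backend")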
class TaskFlowException(Exception):
"""Base class for *most* exceptions emitted from this library.
NOTE(harlowja): in later versions of python we can likely remove the need
to have a cause here as PY3+ have implemented PEP 3134 which handles
chaining in a much more elegant manner.
to have a ``cause`` here as PY3+ have implemented :pep:`3134` which
handles chaining in a much more elegant manner.
:param message: the exception message, typically some string that is
useful for consumers to view when debugging or analyzing
@@ -43,35 +74,55 @@ class TaskFlowException(Exception):
def cause(self):
return self._cause
def pformat(self, indent=2, indent_text=" "):
def __str__(self):
return self.pformat()
def _get_message(self):
# We must *not* call into the __str__ method as that will reactivate
# the pformat method, which will end up badly (and doesn't look
# pretty at all); so be careful...
return self.args[0]
def pformat(self, indent=2, indent_text=" ", show_root_class=False):
"""Pretty formats a taskflow exception + any connected causes."""
if indent < 0:
raise ValueError("indent must be greater than or equal to zero")
return os.linesep.join(self._pformat(self, [], 0,
indent=indent,
indent_text=indent_text))
@classmethod
def _pformat(cls, excp, lines, current_indent, indent=2, indent_text=" "):
line_prefix = indent_text * current_indent
for line in traceback.format_exception_only(type(excp), excp):
# We'll add our own newlines on at the end of formatting.
#
# NOTE(harlowja): the reason we don't search for os.linesep is
# that the traceback module seems to only use '\n' (for some
# reason).
if line.endswith("\n"):
line = line[0:-1]
lines.append(line_prefix + line)
try:
cause = excp.cause
except AttributeError:
pass
else:
if cause is not None:
cls._pformat(cause, lines, current_indent + indent,
indent=indent, indent_text=indent_text)
return lines
raise ValueError("Provided 'indent' must be greater than"
" or equal to zero instead of %s" % indent)
buf = six.StringIO()
if show_root_class:
buf.write(reflection.get_class_name(self, fully_qualified=False))
buf.write(": ")
buf.write(self._get_message())
active_indent = indent
next_up = self.cause
seen = []
while next_up is not None and next_up not in seen:
seen.append(next_up)
buf.write(os.linesep)
if isinstance(next_up, TaskFlowException):
buf.write(indent_text * active_indent)
buf.write(reflection.get_class_name(next_up,
fully_qualified=False))
buf.write(": ")
buf.write(next_up._get_message())
else:
lines = traceback.format_exception_only(type(next_up), next_up)
for i, line in enumerate(lines):
buf.write(indent_text * active_indent)
if line.endswith("\n"):
# We'll add our own newlines on...
line = line[0:-1]
buf.write(line)
if i + 1 != len(lines):
buf.write(os.linesep)
if not isinstance(next_up, TaskFlowException):
# Don't go deeper into non-taskflow exceptions... as we
# don't know if their exception 'cause' attributes are even
# usable objects...
break
active_indent += indent
next_up = getattr(next_up, 'cause', None)
return buf.getvalue()
# Errors related to storage or operations on storage units.

View File

@@ -31,6 +31,9 @@ LINK_RETRY = 'retry'
# This key denotes the link was created due to symbol constraints and the
# value will be a set of names that the constraint ensures are satisfied.
LINK_REASONS = 'reasons'
#
# This key denotes a callable that will determine if the target is visited.
LINK_DECIDER = 'decider'
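For reference, a minimal sketch of how such a decider callable ends up attached to a link (this mirrors the graph-flow decider example added earlier in this changeset; the task names here are illustrative):

    from taskflow.patterns import graph_flow as gf
    from taskflow import task

    class DummyTask(task.Task):
        def execute(self):
            print("Running %s" % self.name)

    def allow(history):
        # Returning False means the link target (and anything depending on
        # it) will not be visited by the engine.
        return False

    r = gf.Flow("root")
    r_a = DummyTask('r-a')
    r_b = DummyTask('r-b')
    r.add(r_a, r_b)
    r.link(r_a, r_b, decider=allow)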
@six.add_metaclass(abc.ABCMeta)
@@ -96,9 +99,8 @@ class Flow(object):
"""
def __str__(self):
lines = ["%s: %s" % (reflection.get_class_name(self), self.name)]
lines.append("%s" % (len(self)))
return "; ".join(lines)
return "%s: %s(len=%d)" % (reflection.get_class_name(self),
self.name, len(self))
@property
def provides(self):

View File

@@ -39,17 +39,17 @@ def fetch(name, conf, namespace=BACKEND_NAMESPACE, **kwargs):
NOTE(harlowja): to aid in making it easy to specify configuration and
options to a board, the configuration (which is typically just a dictionary)
can also be a uri string that identifies the entrypoint name and any
can also be a URI string that identifies the entrypoint name and any
configuration specific to that board.
For example, given the following configuration uri:
For example, given the following configuration URI::
zookeeper://<not-used>/?a=b&c=d
zookeeper://<not-used>/?a=b&c=d
This will look for the entrypoint named 'zookeeper' and will provide
a configuration object composed of the uris parameters, in this case that
is {'a': 'b', 'c': 'd'} to the constructor of that board instance (also
including the name specified).
a configuration object composed of the URI's components, in this case that
is ``{'a': 'b', 'c': 'd'}`` to the constructor of that board
instance (also including the name specified).
"""
if isinstance(conf, six.string_types):
conf = {'board': conf}
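As a rough illustration of that URI form (a sketch only: the path value is illustrative, a reachable zookeeper is assumed, and the persistence backend wiring shown in the larger examples in this changeset is omitted):

    from taskflow.jobs import backends as job_backends

    # Expands to roughly {'board': 'zookeeper', 'path': '/taskflow/demo'}
    # (plus the given name) before the matching entrypoint is loaded.
    board = job_backends.fetch("my-board",
                               "zookeeper://localhost?path=/taskflow/demo")
    board.connect()
    try:
        for job in board.iterjobs(ensure_fresh=True):
            print("%s - %s" % (job.uuid, job.details))
    finally:
        board.close()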

View File

@@ -0,0 +1,957 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2015 Yahoo! Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
import contextlib
import datetime
import string
import threading
import time
import fasteners
import msgpack
from oslo_serialization import msgpackutils
from oslo_utils import strutils
from oslo_utils import timeutils
from oslo_utils import uuidutils
from redis import exceptions as redis_exceptions
import six
from six.moves import range as compat_range
from taskflow import exceptions as exc
from taskflow.jobs import base
from taskflow import logging
from taskflow import states
from taskflow.utils import misc
from taskflow.utils import redis_utils as ru
LOG = logging.getLogger(__name__)
@contextlib.contextmanager
def _translate_failures():
"""Translates common redis exceptions into taskflow exceptions."""
try:
yield
except redis_exceptions.ConnectionError:
exc.raise_with_cause(exc.JobFailure, "Failed to connect to redis")
except redis_exceptions.TimeoutError:
exc.raise_with_cause(exc.JobFailure,
"Failed to communicate with redis, connection"
" timed out")
except redis_exceptions.RedisError:
exc.raise_with_cause(exc.JobFailure,
"Failed to communicate with redis,"
" internal error")
class RedisJob(base.Job):
"""A redis job."""
def __init__(self, board, name, sequence, key,
uuid=None, details=None,
created_on=None, backend=None,
book=None, book_data=None):
super(RedisJob, self).__init__(board, name,
uuid=uuid, details=details,
backend=backend,
book=book, book_data=book_data)
self._created_on = created_on
self._client = board._client
self._redis_version = board._redis_version
self._sequence = sequence
self._key = key
self._last_modified_key = board.join(key + board.LAST_MODIFIED_POSTFIX)
self._owner_key = board.join(key + board.OWNED_POSTFIX)
@property
def key(self):
"""Key (in board listings/trash hash) the job data is stored under."""
return self._key
@property
def last_modified_key(self):
"""Key the job last modified data is stored under."""
return self._last_modified_key
@property
def owner_key(self):
"""Key the job claim + data of the owner is stored under."""
return self._owner_key
@property
def sequence(self):
"""Sequence number of the current job."""
return self._sequence
def expires_in(self):
"""How many seconds until the claim expires.
Returns the number of seconds until the ownership entry expires or
:attr:`~taskflow.utils.redis_utils.UnknownExpire.DOES_NOT_EXPIRE` or
:attr:`~taskflow.utils.redis_utils.UnknownExpire.KEY_NOT_FOUND` if it
does not expire or if the expiry can not be determined (perhaps the
:attr:`.owner_key` expired at/before time of inquiry?).
"""
with _translate_failures():
return ru.get_expiry(self._client, self._owner_key,
prior_version=self._redis_version)
def extend_expiry(self, expiry):
"""Extends the owner key (aka the claim) expiry for this job.
NOTE(harlowja): if the claim for this job did **not** previously
have an expiry associated with it, calling this method will create
one (and after that time elapses the claim on this job will cease
to exist).
Returns ``True`` if the expiry request was performed
otherwise ``False``.
"""
with _translate_failures():
return ru.apply_expiry(self._client, self._owner_key, expiry,
prior_version=self._redis_version)
def __lt__(self, other):
if self.created_on == other.created_on:
return self.sequence < other.sequence
else:
return self.created_on < other.created_on
@property
def created_on(self):
return self._created_on
@property
def last_modified(self):
with _translate_failures():
raw_last_modified = self._client.get(self._last_modified_key)
last_modified = None
if raw_last_modified:
last_modified = self._board._loads(
raw_last_modified, root_types=(datetime.datetime,))
# NOTE(harlowja): just in case this is somehow busted (due to time
# sync issues/other), give back the most recent one (since redis
# does not maintain clock information; this can happen because
# clients who mutate jobs also send the time in).
last_modified = max(last_modified, self._created_on)
return last_modified
@property
def state(self):
listings_key = self._board.listings_key
owner_key = self._owner_key
listings_sub_key = self._key
def _do_fetch(p):
# NOTE(harlowja): the state of a job in redis is not stored in any
# explicit 'state' field, but is derived from which keys exist in
# redis instead (i.e. if an owner key exists, then we know an owner
# is active; if the job data exists but no owner key does, then the
# job is unclaimed; and so on)...
p.multi()
p.hexists(listings_key, listings_sub_key)
p.exists(owner_key)
job_exists, owner_exists = p.execute()
if not job_exists:
if owner_exists:
# This should **not** be possible due to lua code ordering
# but let's log an INFO statement if it does happen (so
# that it can be investigated)...
LOG.info("Unexpected owner key found at '%s' when job"
" key '%s[%s]' was not found", owner_key,
listings_key, listings_sub_key)
return states.COMPLETE
else:
if owner_exists:
return states.CLAIMED
else:
return states.UNCLAIMED
with _translate_failures():
return self._client.transaction(_do_fetch,
listings_key, owner_key,
value_from_callable=True)
def __str__(self):
"""Pretty formats the job into something *more* meaningful."""
tpl = "%s: %s (uuid=%s, owner_key=%s, sequence=%s, details=%s)"
return tpl % (type(self).__name__,
self.name, self.uuid, self.owner_key,
self.sequence, self.details)
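A rough sketch of how the two expiry helpers above might be used to keep a claim alive while work is still in progress (``still_working`` is a hypothetical progress check and the 30/10 second values are arbitrary; this presumes the job was claimed with an ``expiry``)::

    import time

    from taskflow.utils import redis_utils as ru

    def keep_claim_alive(job, expiry=30, poll_every=10):
        # Periodically push the claim expiry forward so the claim does not
        # lapse while the work the job describes is still running.
        while still_working(job):  # hypothetical progress check
            left = job.expires_in()
            if left in (ru.UnknownExpire.DOES_NOT_EXPIRE,
                        ru.UnknownExpire.KEY_NOT_FOUND):
                break
            if left < poll_every:
                job.extend_expiry(expiry)
            time.sleep(poll_every)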
class RedisJobBoard(base.JobBoard):
"""A jobboard backed by `redis`_.
Powered by the `redis-py <http://redis-py.readthedocs.org/>`_ library.
This jobboard creates job entries by listing jobs in a redis `hash`_. This
hash contains jobs that can be actively worked on by (and examined/claimed
by) some set of eligible consumers. Job posting is typically performed
using the :meth:`.post` method (this creates a hash entry with job
contents/details encoded in `msgpack`_). The users of these
jobboard(s) (potentially on disjoint sets of machines) can then
iterate over the available jobs and decide if they want to attempt to
claim one of the jobs they have iterated over. If so they will then
attempt to contact redis and they will attempt to create a key in
redis (using an embedded lua script to perform this atomically) to claim a
desired job. If the entity trying to use the jobboard to :meth:`.claim`
the job is able to create that lock/owner key then it will be
allowed (and expected) to perform whatever *work* the contents of that
job described. Once the claiming entity is finished the lock/owner key
and the `hash`_ entry will be deleted (if successfully completed) in a
single request (also using an embedded lua script to perform this
atomically). If the claiming entity is not successful (or the entity
that claimed the job dies) the lock/owner key can be released
automatically (by **optional** usage of a claim expiry) or by
using :meth:`.abandon` to manually abandon the job so that it can be
consumed/worked on by others.
NOTE(harlowja): by default the :meth:`.claim` has no expiry (which
means claims will be persistent, even under claiming entity failure). To
ensure an expiry occurs, pass a numeric value for the ``expiry`` keyword
argument to the :meth:`.claim` method that defines how many seconds the
claim should be retained for. When an expiry is used ensure that the
claim is kept alive while it is being worked on by using
the :py:meth:`~.RedisJob.extend_expiry` method periodically.
.. _msgpack: http://msgpack.org/
.. _redis: http://redis.io/
.. _hash: http://redis.io/topics/data-types#hashes
"""
CLIENT_CONF_TRANSFERS = tuple([
# Host config...
('host', str),
('port', int),
# See: http://redis.io/commands/auth
('password', str),
# Data encoding/decoding + error handling
('encoding', str),
('encoding_errors', str),
# Connection settings.
('socket_timeout', float),
('socket_connect_timeout', float),
# This one negates the usage of host, port, socket connection
# settings as it doesn't use the same kind of underlying socket...
('unix_socket_path', str),
# Do you want ssl?
('ssl', strutils.bool_from_string),
('ssl_keyfile', str),
('ssl_certfile', str),
('ssl_cert_reqs', str),
('ssl_ca_certs', str),
# See: http://www.rediscookbook.org/multiple_databases.html
('db', int),
])
"""
Keys (and value type converters) that we allow to proxy from the jobboard
configuration into the redis client (used to configure the redis client
internals if no explicit client is provided via the ``client`` keyword
argument).
See: http://redis-py.readthedocs.org/en/latest/#redis.Redis
See: https://github.com/andymccurdy/redis-py/blob/2.10.3/redis/client.py
"""
#: Postfix (combined with job key) used to make a jobs owner key.
OWNED_POSTFIX = b".owned"
#: Postfix (combined with job key) used to make a jobs last modified key.
LAST_MODIFIED_POSTFIX = b".last_modified"
#: Default namespace for keys when none is provided.
DEFAULT_NAMESPACE = b'taskflow'
MIN_REDIS_VERSION = (2, 6)
"""
Minimum redis version this backend requires.
This version is required since we need the built-in server-side lua
scripting support that is included in 2.6 and newer.
"""
NAMESPACE_SEP = b':'
"""
Separator that is used to combine a key with the namespace (to get
the **actual** key that will be used).
"""
KEY_PIECE_SEP = b'.'
"""
Separator that is used to combine a bunch of key pieces together (to get
the **actual** key that will be used).
"""
#: Expected lua response status field when call is ok.
SCRIPT_STATUS_OK = "ok"
#: Expected lua response status field when call is **not** ok.
SCRIPT_STATUS_ERROR = "error"
#: Expected lua script error response when the owner is not as expected.
SCRIPT_NOT_EXPECTED_OWNER = "Not expected owner!"
#: Expected lua script error response when the owner is not findable.
SCRIPT_UNKNOWN_OWNER = "Unknown owner!"
#: Expected lua script error response when the job is not findable.
SCRIPT_UNKNOWN_JOB = "Unknown job!"
#: Expected lua script error response when the job is already claimed.
SCRIPT_ALREADY_CLAIMED = "Job already claimed!"
SCRIPT_TEMPLATES = {
'consume': """
-- Extract *all* the variables (so we can easily know what they are)...
local owner_key = KEYS[1]
local listings_key = KEYS[2]
local last_modified_key = KEYS[3]
local expected_owner = ARGV[1]
local job_key = ARGV[2]
local result = {}
if redis.call("hexists", listings_key, job_key) == 1 then
if redis.call("exists", owner_key) == 1 then
local owner = redis.call("get", owner_key)
if owner ~= expected_owner then
result["status"] = "${error}"
result["reason"] = "${not_expected_owner}"
result["owner"] = owner
else
-- The order is important here, delete the owner first (and if
-- that blows up, the job data will still exist so it can be
-- worked on again, instead of the reverse)...
redis.call("del", owner_key, last_modified_key)
redis.call("hdel", listings_key, job_key)
result["status"] = "${ok}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_owner}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_job}"
end
return cmsgpack.pack(result)
""",
'claim': """
local function apply_ttl(key, ms_expiry)
if ms_expiry ~= nil then
redis.call("pexpire", key, ms_expiry)
end
end
-- Extract *all* the variables (so we can easily know what they are)...
local owner_key = KEYS[1]
local listings_key = KEYS[2]
local last_modified_key = KEYS[3]
local expected_owner = ARGV[1]
local job_key = ARGV[2]
local last_modified_blob = ARGV[3]
-- If this is non-numeric (which it may be) this becomes nil
local ms_expiry = nil
if ARGV[4] ~= "none" then
ms_expiry = tonumber(ARGV[4])
end
local result = {}
if redis.call("hexists", listings_key, job_key) == 1 then
if redis.call("exists", owner_key) == 1 then
local owner = redis.call("get", owner_key)
if owner == expected_owner then
-- Owner is the same, leave it alone...
redis.call("set", last_modified_key, last_modified_blob)
apply_ttl(owner_key, ms_expiry)
result["status"] = "${ok}"
else
result["status"] = "${error}"
result["reason"] = "${already_claimed}"
result["owner"] = owner
end
else
redis.call("set", owner_key, expected_owner)
redis.call("set", last_modified_key, last_modified_blob)
apply_ttl(owner_key, ms_expiry)
result["status"] = "${ok}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_job}"
end
return cmsgpack.pack(result)
""",
'abandon': """
-- Extract *all* the variables (so we can easily know what they are)...
local owner_key = KEYS[1]
local listings_key = KEYS[2]
local last_modified_key = KEYS[3]
local expected_owner = ARGV[1]
local job_key = ARGV[2]
local last_modified_blob = ARGV[3]
local result = {}
if redis.call("hexists", listings_key, job_key) == 1 then
if redis.call("exists", owner_key) == 1 then
local owner = redis.call("get", owner_key)
if owner ~= expected_owner then
result["status"] = "${error}"
result["reason"] = "${not_expected_owner}"
result["owner"] = owner
else
redis.call("del", owner_key)
redis.call("set", last_modified_key, last_modified_blob)
result["status"] = "${ok}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_owner}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_job}"
end
return cmsgpack.pack(result)
""",
'trash': """
-- Extract *all* the variables (so we can easily know what they are)...
local owner_key = KEYS[1]
local listings_key = KEYS[2]
local last_modified_key = KEYS[3]
local trash_listings_key = KEYS[4]
local expected_owner = ARGV[1]
local job_key = ARGV[2]
local last_modified_blob = ARGV[3]
local result = {}
if redis.call("hexists", listings_key, job_key) == 1 then
local raw_posting = redis.call("hget", listings_key, job_key)
if redis.call("exists", owner_key) == 1 then
local owner = redis.call("get", owner_key)
if owner ~= expected_owner then
result["status"] = "${error}"
result["reason"] = "${not_expected_owner}"
result["owner"] = owner
else
-- This ordering is important (try to first move the value
-- and only if that works do we try to do any deletions)...
redis.call("hset", trash_listings_key, job_key, raw_posting)
redis.call("set", last_modified_key, last_modified_blob)
redis.call("del", owner_key)
redis.call("hdel", listings_key, job_key)
result["status"] = "${ok}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_owner}"
end
else
result["status"] = "${error}"
result["reason"] = "${unknown_job}"
end
return cmsgpack.pack(result)
""",
}
"""`Lua`_ **template** scripts that will be used by various methods (they
are turned into real scripts and loaded on call into the :func:`.connect`
method).
Some things to note:
- The lua script is run serially, so when this runs no other command will
be mutating the backend (and redis also ensures that no other script
will be running), so atomicity of these scripts is guaranteed by redis.
- Transactions were considered (and even mostly implemented) but
ultimately rejected since redis does not support rollbacks and
transactions can **not** be interdependent (later operations can **not**
depend on the results of earlier operations). Both of these issues limit
our ability to correctly report errors (with useful messages) and to
maintain consistency under failure/contention (due to the inability to
rollback). A third and final blow to using transactions was that to
correctly use them we would have to set a watch on a *very* contentious
key (the listings key), which would under load cause clients to retry more
often than would be desired (this also increases network load, CPU
cycles used, transaction failures triggered and so on).
- Partial transaction execution is possible due to pre/post ``EXEC``
failures (and the lack of rollback makes this worse).
So overall after thinking it over, it seemed like having small lua scripts
was not that bad (even if somewhat convoluted) due to the above and
publicly mentioned issues with transactions. In general, using lua scripts
for this purpose seems to be somewhat common practice and it solves the
issues that came up when transactions were considered & implemented.
Some links about redis (and redis + lua) that may be useful to look over:
- `Atomicity of scripts`_
- `Scripting and transactions`_
- `Why redis does not support rollbacks`_
- `Intro to lua for redis programmers`_
- `Five key takeaways for developing with redis`_
- `Everything you always wanted to know about redis`_ (slides)
.. _Lua: http://www.lua.org/
.. _Atomicity of scripts: http://redis.io/commands/eval#atomicity-of-\
scripts
.. _Scripting and transactions: http://redis.io/topics/transactions#redis-\
scripting-and-transactions
.. _Why redis does not support rollbacks: http://redis.io/topics/transa\
ctions#why-redis-does-not-suppo\
rt-roll-backs
.. _Intro to lua for redis programmers: http://www.redisgreen.net/blog/int\
ro-to-lua-for-redis-programmers
.. _Five key takeaways for developing with redis: https://redislabs.com/bl\
og/5-key-takeaways-fo\
r-developing-with-redis
.. _Everything you always wanted to know about redis: http://www.slidesh
are.net/carlosabal\
de/everything-you-a\
lways-wanted-to-\
know-about-redis-b\
ut-were-afraid-to-ask
"""
@classmethod
def _make_client(cls, conf):
client_conf = {}
for key, value_type_converter in cls.CLIENT_CONF_TRANSFERS:
if key in conf:
if value_type_converter is not None:
client_conf[key] = value_type_converter(conf[key])
else:
client_conf[key] = conf[key]
return ru.RedisClient(**client_conf)
def __init__(self, name, conf,
client=None, persistence=None):
super(RedisJobBoard, self).__init__(name, conf)
self._closed = True
if client is not None:
self._client = client
self._owns_client = False
else:
self._client = self._make_client(self._conf)
# NOTE(harlowja): This client should not work until connected...
self._client.close()
self._owns_client = True
self._namespace = self._conf.get('namespace', self.DEFAULT_NAMESPACE)
self._open_close_lock = threading.RLock()
# Redis server version connected to + scripts (populated on connect).
self._redis_version = None
self._scripts = {}
# The backend to load the full logbooks from, since what is sent over
# the data connection is only the logbook uuid and name, and not the
# full logbook.
self._persistence = persistence
def join(self, key_piece, *more_key_pieces):
"""Create and return a namespaced key from many segments.
NOTE(harlowja): all pieces that are text/unicode are converted into
their binary equivalent (if they are already binary no conversion
takes place) before being joined (as redis expects binary keys and not
unicode/text ones).
"""
namespace_pieces = []
if self._namespace is not None:
namespace_pieces = [self._namespace, self.NAMESPACE_SEP]
else:
namespace_pieces = []
key_pieces = [key_piece]
if more_key_pieces:
key_pieces.extend(more_key_pieces)
for i in compat_range(0, len(namespace_pieces)):
namespace_pieces[i] = misc.binary_encode(namespace_pieces[i])
for i in compat_range(0, len(key_pieces)):
key_pieces[i] = misc.binary_encode(key_pieces[i])
namespace = b"".join(namespace_pieces)
key = self.KEY_PIECE_SEP.join(key_pieces)
return namespace + key
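For example, with the default namespace the method above produces keys like the following (a sketch; the board construction shown is only for illustration)::

    board = RedisJobBoard('test-board', {})
    board.join(b"listings")      # -> b'taskflow:listings'
    board.join(b"job", b"1234")  # -> b'taskflow:job.1234'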
@property
def namespace(self):
"""The namespace all keys will be prefixed with (or none)."""
return self._namespace
@misc.cachedproperty
def trash_key(self):
"""Key where a hash will be stored with trashed jobs in it."""
return self.join(b"trash")
@misc.cachedproperty
def sequence_key(self):
"""Key where a integer will be stored (used to sequence jobs)."""
return self.join(b"sequence")
@misc.cachedproperty
def listings_key(self):
"""Key where a hash will be stored with active jobs in it."""
return self.join(b"listings")
@property
def job_count(self):
with _translate_failures():
return self._client.hlen(self.listings_key)
@property
def connected(self):
return not self._closed
@fasteners.locked(lock='_open_close_lock')
def connect(self):
self.close()
if self._owns_client:
self._client = self._make_client(self._conf)
with _translate_failures():
# The client maintains a connection pool, so do a ping and
# if that works then assume the connection works, which may or
# may not be continuously maintained (if the server dies
# at a later time, we will become aware of that when the next
# op occurs).
self._client.ping()
is_new_enough, redis_version = ru.is_server_new_enough(
self._client, self.MIN_REDIS_VERSION)
if not is_new_enough:
wanted_version = ".".join([str(p)
for p in self.MIN_REDIS_VERSION])
if redis_version:
raise exc.JobFailure("Redis version %s or greater is"
" required (version %s is to"
" old)" % (wanted_version,
redis_version))
else:
raise exc.JobFailure("Redis version %s or greater is"
" required" % (wanted_version))
else:
self._redis_version = redis_version
script_params = {
# Status field values.
'ok': self.SCRIPT_STATUS_OK,
'error': self.SCRIPT_STATUS_ERROR,
# Known error reasons (when status field is error).
'not_expected_owner': self.SCRIPT_NOT_EXPECTED_OWNER,
'unknown_owner': self.SCRIPT_UNKNOWN_OWNER,
'unknown_job': self.SCRIPT_UNKNOWN_JOB,
'already_claimed': self.SCRIPT_ALREADY_CLAIMED,
}
prepared_scripts = {}
for n, raw_script_tpl in six.iteritems(self.SCRIPT_TEMPLATES):
script_tpl = string.Template(raw_script_tpl)
script_blob = script_tpl.substitute(**script_params)
script = self._client.register_script(script_blob)
prepared_scripts[n] = script
self._scripts.update(prepared_scripts)
self._closed = False
@fasteners.locked(lock='_open_close_lock')
def close(self):
if self._owns_client:
self._client.close()
self._scripts.clear()
self._redis_version = None
self._closed = True
@staticmethod
def _dumps(obj):
try:
return msgpackutils.dumps(obj)
except (msgpack.PackException, ValueError):
# TODO(harlowja): remove direct msgpack exception access when
# oslo.utils provides easy access to the underlying msgpack
# pack/unpack exceptions..
exc.raise_with_cause(exc.JobFailure,
"Failed to serialize object to"
" msgpack blob")
@staticmethod
def _loads(blob, root_types=(dict,)):
try:
return misc.decode_msgpack(blob, root_types=root_types)
except (msgpack.UnpackException, ValueError):
# TODO(harlowja): remove direct msgpack exception access when
# oslo.utils provides easy access to the underlying msgpack
# pack/unpack exceptions..
exc.raise_with_cause(exc.JobFailure,
"Failed to deserialize object from"
" msgpack blob (of length %s)" % len(blob))
_decode_owner = staticmethod(misc.binary_decode)
_encode_owner = staticmethod(misc.binary_encode)
def find_owner(self, job):
owner_key = self.join(job.key + self.OWNED_POSTFIX)
with _translate_failures():
raw_owner = self._client.get(owner_key)
return self._decode_owner(raw_owner)
def post(self, name, book=None, details=None):
job_uuid = uuidutils.generate_uuid()
posting = base.format_posting(job_uuid, name,
created_on=timeutils.utcnow(),
book=book, details=details)
with _translate_failures():
sequence = self._client.incr(self.sequence_key)
posting.update({
'sequence': sequence,
})
with _translate_failures():
raw_posting = self._dumps(posting)
raw_job_uuid = six.b(job_uuid)
was_posted = bool(self._client.hsetnx(self.listings_key,
raw_job_uuid, raw_posting))
if not was_posted:
raise exc.JobFailure("New job located at '%s[%s]' could not"
" be posted" % (self.listings_key,
raw_job_uuid))
else:
return RedisJob(self, name, sequence, raw_job_uuid,
uuid=job_uuid, details=details,
created_on=posting['created_on'],
book=book, book_data=posting.get('book'),
backend=self._persistence)
def wait(self, timeout=None, initial_delay=0.005,
max_delay=1.0, sleep_func=time.sleep):
if initial_delay > max_delay:
raise ValueError("Initial delay %s must be less than or equal"
" to the provided max delay %s"
% (initial_delay, max_delay))
# This does a spin-loop that backs off by doubling the delay
# up to the provided max-delay. In the future we could try having
# a secondary client connected into redis pubsub and use that
# instead, but for now this is simpler.
w = timeutils.StopWatch(duration=timeout)
w.start()
delay = initial_delay
while True:
jc = self.job_count
if jc > 0:
it = self.iterjobs()
return it
else:
if w.expired():
raise exc.NotFound("Expired waiting for jobs to"
" arrive; waited %s seconds"
% w.elapsed())
else:
remaining = w.leftover(return_none=True)
if remaining is not None:
delay = min(delay * 2, remaining, max_delay)
else:
delay = min(delay * 2, max_delay)
sleep_func(delay)
def iterjobs(self, only_unclaimed=False, ensure_fresh=False):
with _translate_failures():
raw_postings = self._client.hgetall(self.listings_key)
postings = []
for raw_job_key, raw_posting in six.iteritems(raw_postings):
posting = self._loads(raw_posting)
details = posting.get('details', {})
job_uuid = posting['uuid']
job = RedisJob(self, posting['name'], posting['sequence'],
raw_job_key, uuid=job_uuid, details=details,
created_on=posting['created_on'],
book_data=posting.get('book'),
backend=self._persistence)
postings.append(job)
postings = sorted(postings)
for job in postings:
if only_unclaimed:
if job.state == states.UNCLAIMED:
yield job
else:
yield job
@base.check_who
def consume(self, job, who):
script = self._get_script('consume')
with _translate_failures():
raw_who = self._encode_owner(who)
raw_result = script(keys=[job.owner_key, self.listings_key,
job.last_modified_key],
args=[raw_who, job.key])
result = self._loads(raw_result)
status = result['status']
if status != self.SCRIPT_STATUS_OK:
reason = result.get('reason')
if reason == self.SCRIPT_UNKNOWN_JOB:
raise exc.NotFound("Job %s not found to be"
" consumed" % (job.uuid))
elif reason == self.SCRIPT_UNKNOWN_OWNER:
raise exc.NotFound("Can not consume job %s"
" which we can not determine"
" the owner of" % (job.uuid))
elif reason == self.SCRIPT_NOT_EXPECTED_OWNER:
raw_owner = result.get('owner')
if raw_owner:
owner = self._decode_owner(raw_owner)
raise exc.JobFailure("Can not consume job %s"
" which is not owned by %s (it is"
" actively owned by %s)"
% (job.uuid, who, owner))
else:
raise exc.JobFailure("Can not consume job %s"
" which is not owned by %s"
% (job.uuid, who))
else:
raise exc.JobFailure("Failure to consume job %s,"
" unknown internal error (reason=%s)"
% (job.uuid, reason))
@base.check_who
def claim(self, job, who, expiry=None):
if expiry is None:
# On the lua side none doesn't translate to nil so we have
# to do this string conversion to make sure that we can tell
# the difference.
ms_expiry = "none"
else:
ms_expiry = int(expiry * 1000.0)
if ms_expiry <= 0:
raise ValueError("Provided expiry (when converted to"
" milliseconds) must be greater"
" than zero instead of %s" % (expiry))
script = self._get_script('claim')
with _translate_failures():
raw_who = self._encode_owner(who)
raw_result = script(keys=[job.owner_key, self.listings_key,
job.last_modified_key],
args=[raw_who, job.key,
# NOTE(harlowja): we need to send this
# in as a blob (even if it's not
# set/used), since the format can not
# currently be created in lua...
self._dumps(timeutils.utcnow()),
ms_expiry])
result = self._loads(raw_result)
status = result['status']
if status != self.SCRIPT_STATUS_OK:
reason = result.get('reason')
if reason == self.SCRIPT_UNKNOWN_JOB:
raise exc.NotFound("Job %s not found to be"
" claimed" % (job.uuid))
elif reason == self.SCRIPT_ALREADY_CLAIMED:
raw_owner = result.get('owner')
if raw_owner:
owner = self._decode_owner(raw_owner)
raise exc.UnclaimableJob("Job %s already"
" claimed by %s"
% (job.uuid, owner))
else:
raise exc.UnclaimableJob("Job %s already"
" claimed" % (job.uuid))
else:
raise exc.JobFailure("Failure to claim job %s,"
" unknown internal error (reason=%s)"
% (job.uuid, reason))
@base.check_who
def abandon(self, job, who):
script = self._get_script('abandon')
with _translate_failures():
raw_who = self._encode_owner(who)
raw_result = script(keys=[job.owner_key, self.listings_key,
job.last_modified_key],
args=[raw_who, job.key,
self._dumps(timeutils.utcnow())])
result = self._loads(raw_result)
status = result.get('status')
if status != self.SCRIPT_STATUS_OK:
reason = result.get('reason')
if reason == self.SCRIPT_UNKNOWN_JOB:
raise exc.NotFound("Job %s not found to be"
" abandoned" % (job.uuid))
elif reason == self.SCRIPT_UNKNOWN_OWNER:
raise exc.NotFound("Can not abandon job %s"
" which we can not determine"
" the owner of" % (job.uuid))
elif reason == self.SCRIPT_NOT_EXPECTED_OWNER:
raw_owner = result.get('owner')
if raw_owner:
owner = self._decode_owner(raw_owner)
raise exc.JobFailure("Can not abandon job %s"
" which is not owned by %s (it is"
" actively owned by %s)"
% (job.uuid, who, owner))
else:
raise exc.JobFailure("Can not abandon job %s"
" which is not owned by %s"
% (job.uuid, who))
else:
raise exc.JobFailure("Failure to abandon job %s,"
" unknown internal"
" error (status=%s, reason=%s)"
% (job.uuid, status, reason))
def _get_script(self, name):
try:
return self._scripts[name]
except KeyError:
exc.raise_with_cause(exc.NotFound,
"Can not access %s script (has this"
" board been connected?)" % name)
@base.check_who
def trash(self, job, who):
script = self._get_script('trash')
with _translate_failures():
raw_who = self._encode_owner(who)
raw_result = script(keys=[job.owner_key, self.listings_key,
job.last_modified_key, self.trash_key],
args=[raw_who, job.key,
self._dumps(timeutils.utcnow())])
result = self._loads(raw_result)
status = result['status']
if status != self.SCRIPT_STATUS_OK:
reason = result.get('reason')
if reason == self.SCRIPT_UNKNOWN_JOB:
raise exc.NotFound("Job %s not found to be"
" trashed" % (job.uuid))
elif reason == self.SCRIPT_UNKNOWN_OWNER:
raise exc.NotFound("Can not trash job %s"
" which we can not determine"
" the owner of" % (job.uuid))
elif reason == self.SCRIPT_NOT_EXPECTED_OWNER:
raw_owner = result.get('owner')
if raw_owner:
owner = self._decode_owner(raw_owner)
raise exc.JobFailure("Can not trash job %s"
" which is not owned by %s (it is"
" actively owned by %s)"
% (job.uuid, who, owner))
else:
raise exc.JobFailure("Can not trash job %s"
" which is not owned by %s"
% (job.uuid, who))
else:
raise exc.JobFailure("Failure to trash job %s,"
" unknown internal error (reason=%s)"
% (job.uuid, reason))
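Putting the pieces above together, a rough end-to-end sketch of posting, claiming and consuming a job on such a board (the module path, connection details, names and the 30 second expiry are assumptions for illustration)::

    from taskflow import exceptions as exc
    from taskflow.jobs.backends import impl_redis

    board = impl_redis.RedisJobBoard('my-board', {'host': '127.0.0.1',
                                                  'port': 6379})
    board.connect()
    try:
        board.post('resize-vm', details={'vm': 'vm-1'})
        for job in board.iterjobs(only_unclaimed=True):
            try:
                # Claim with an expiry so a dead worker's claim releases
                # automatically instead of lingering forever.
                board.claim(job, 'worker-1', expiry=30)
            except (exc.UnclaimableJob, exc.NotFound):
                continue
            # ... perform whatever work the job details describe ...
            board.consume(job, 'worker-1')
    finally:
        board.close()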

View File

@@ -17,14 +17,17 @@
import collections
import contextlib
import functools
import sys
import threading
from concurrent import futures
import fasteners
import futurist
from kazoo import exceptions as k_exceptions
from kazoo.protocol import paths as k_paths
from kazoo.recipe import watchers
from oslo_serialization import jsonutils
from oslo_utils import excutils
from oslo_utils import timeutils
from oslo_utils import uuidutils
import six
@@ -32,66 +35,39 @@ from taskflow import exceptions as excp
from taskflow.jobs import base
from taskflow import logging
from taskflow import states
from taskflow.types import timing as tt
from taskflow.utils import kazoo_utils
from taskflow.utils import lock_utils
from taskflow.utils import misc
LOG = logging.getLogger(__name__)
UNCLAIMED_JOB_STATES = (
states.UNCLAIMED,
)
ALL_JOB_STATES = (
states.UNCLAIMED,
states.COMPLETE,
states.CLAIMED,
)
# Transaction support was added in 3.4.0
MIN_ZK_VERSION = (3, 4, 0)
LOCK_POSTFIX = ".lock"
JOB_PREFIX = 'job'
def _check_who(who):
if not isinstance(who, six.string_types):
raise TypeError("Job applicant must be a string type")
if len(who) == 0:
raise ValueError("Job applicant must be non-empty")
class ZookeeperJob(base.Job):
"""A zookeeper job."""
def __init__(self, name, board, client, backend, path,
def __init__(self, board, name, client, path,
uuid=None, details=None, book=None, book_data=None,
created_on=None):
super(ZookeeperJob, self).__init__(name, uuid=uuid, details=details)
self._board = board
self._book = book
if not book_data:
book_data = {}
self._book_data = book_data
created_on=None, backend=None):
super(ZookeeperJob, self).__init__(board, name,
uuid=uuid, details=details,
backend=backend,
book=book, book_data=book_data)
self._client = client
self._backend = backend
if all((self._book, self._book_data)):
raise ValueError("Only one of 'book_data' or 'book'"
" can be provided")
self._path = k_paths.normpath(path)
self._lock_path = path + LOCK_POSTFIX
self._lock_path = self._path + board.LOCK_POSTFIX
self._created_on = created_on
self._node_not_found = False
basename = k_paths.basename(self._path)
self._root = self._path[0:-len(basename)]
self._sequence = int(basename[len(JOB_PREFIX):])
self._sequence = int(basename[len(board.JOB_PREFIX):])
@property
def lock_path(self):
"""Path the job lock/claim and owner znode is stored."""
return self._lock_path
@property
def path(self):
"""Path the job data znode is stored."""
return self._path
@property
@@ -112,22 +88,27 @@ class ZookeeperJob(base.Job):
return trans_func(attr)
else:
return attr
except k_exceptions.NoNodeError as e:
raise excp.NotFound("Can not fetch the %r attribute"
" of job %s (%s), path %s not found"
% (attr_name, self.uuid, self.path, path), e)
except self._client.handler.timeout_exception as e:
raise excp.JobFailure("Can not fetch the %r attribute"
" of job %s (%s), operation timed out"
% (attr_name, self.uuid, self.path), e)
except k_exceptions.SessionExpiredError as e:
raise excp.JobFailure("Can not fetch the %r attribute"
" of job %s (%s), session expired"
% (attr_name, self.uuid, self.path), e)
except (AttributeError, k_exceptions.KazooException) as e:
raise excp.JobFailure("Can not fetch the %r attribute"
" of job %s (%s), internal error" %
(attr_name, self.uuid, self.path), e)
except k_exceptions.NoNodeError:
excp.raise_with_cause(
excp.NotFound,
"Can not fetch the %r attribute of job %s (%s),"
" path %s not found" % (attr_name, self.uuid,
self.path, path))
except self._client.handler.timeout_exception:
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the %r attribute of job %s (%s),"
" operation timed out" % (attr_name, self.uuid, self.path))
except k_exceptions.SessionExpiredError:
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the %r attribute of job %s (%s),"
" session expired" % (attr_name, self.uuid, self.path))
except (AttributeError, k_exceptions.KazooException):
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the %r attribute of job %s (%s),"
" internal error" % (attr_name, self.uuid, self.path))
@property
def last_modified(self):
@@ -155,23 +136,6 @@ class ZookeeperJob(base.Job):
self._node_not_found = True
return self._created_on
@property
def board(self):
return self._board
def _load_book(self):
book_uuid = self.book_uuid
if self._backend is not None and book_uuid is not None:
# TODO(harlowja): we are currently limited by assuming that the
# job posted has the same backend as this loader (to start this
# seems to be a ok assumption, and can be adjusted in the future
# if we determine there is a use-case for multi-backend loaders,
# aka a registry of loaders).
with contextlib.closing(self._backend.get_connection()) as conn:
return conn.get_logbook(book_uuid)
# No backend to fetch from or no uuid specified
return None
@property
def state(self):
owner = self.board.find_owner(self)
@@ -181,15 +145,21 @@ class ZookeeperJob(base.Job):
job_data = misc.decode_json(raw_data)
except k_exceptions.NoNodeError:
pass
except k_exceptions.SessionExpiredError as e:
raise excp.JobFailure("Can not fetch the state of %s,"
" session expired" % (self.uuid), e)
except self._client.handler.timeout_exception as e:
raise excp.JobFailure("Can not fetch the state of %s,"
" operation timed out" % (self.uuid), e)
except k_exceptions.KazooException as e:
raise excp.JobFailure("Can not fetch the state of %s, internal"
" error" % (self.uuid), e)
except k_exceptions.SessionExpiredError:
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the state of %s,"
" session expired" % (self.uuid))
except self._client.handler.timeout_exception:
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the state of %s,"
" operation timed out" % (self.uuid))
except k_exceptions.KazooException:
excp.raise_with_cause(
excp.JobFailure,
"Can not fetch the state of %s,"
" internal error" % (self.uuid))
if not job_data:
# No data this job has been completed (the owner that we might have
# fetched will not be able to be fetched again, since the job node
@@ -209,30 +179,6 @@ class ZookeeperJob(base.Job):
def __hash__(self):
return hash(self.path)
@property
def book(self):
if self._book is None:
self._book = self._load_book()
return self._book
@property
def book_uuid(self):
if self._book:
return self._book.uuid
if self._book_data:
return self._book_data.get('uuid')
else:
return None
@property
def book_name(self):
if self._book:
return self._book.name
if self._book_data:
return self._book_data.get('name')
else:
return None
class ZookeeperJobBoardIterator(six.Iterator):
"""Iterator over a zookeeper jobboard that iterates over potential jobs.
@@ -246,6 +192,16 @@ class ZookeeperJobBoardIterator(six.Iterator):
over unclaimed jobs.
"""
_UNCLAIMED_JOB_STATES = (
states.UNCLAIMED,
)
_JOB_STATES = (
states.UNCLAIMED,
states.COMPLETE,
states.CLAIMED,
)
def __init__(self, board, only_unclaimed=False, ensure_fresh=False):
self._board = board
self._jobs = collections.deque()
@@ -255,6 +211,7 @@ class ZookeeperJobBoardIterator(six.Iterator):
@property
def board(self):
"""The board this iterator was created from."""
return self._board
def __iter__(self):
@@ -262,9 +219,9 @@ class ZookeeperJobBoardIterator(six.Iterator):
def _next_job(self):
if self.only_unclaimed:
allowed_states = UNCLAIMED_JOB_STATES
allowed_states = self._UNCLAIMED_JOB_STATES
else:
allowed_states = ALL_JOB_STATES
allowed_states = self._JOB_STATES
job = None
while self._jobs and job is None:
maybe_job = self._jobs.popleft()
@@ -292,29 +249,49 @@ class ZookeeperJobBoardIterator(six.Iterator):
class ZookeeperJobBoard(base.NotifyingJobBoard):
"""A jobboard backend by zookeeper.
"""A jobboard backed by `zookeeper`_.
Powered by the `kazoo <http://kazoo.readthedocs.org/>`_ library.
This jobboard creates *sequenced* persistent znodes in a directory in
zookeeper (that directory defaults ``/taskflow/jobs``) and uses zookeeper
watches to notify other jobboards that the job which was posted using the
:meth:`.post` method (this creates a znode with contents/details in json)
The users of those jobboard(s) (potentially on disjoint sets of machines)
can then iterate over the available jobs and decide if they want to attempt
to claim one of the jobs they have iterated over. If so they will then
attempt to contact zookeeper and will attempt to create a ephemeral znode
using the name of the persistent znode + ".lock" as a postfix. If the
entity trying to use the jobboard to :meth:`.claim` the job is able to
create a ephemeral znode with that name then it will be allowed (and
expected) to perform whatever *work* the contents of that job that it
locked described. Once finished the ephemeral znode and persistent znode
may be deleted (if successfully completed) in a single transcation or if
not successfull (or the entity that claimed the znode dies) the ephemeral
znode will be released (either manually by using :meth:`.abandon` or
automatically by zookeeper the ephemeral is deemed to be lost).
zookeeper and uses zookeeper watches to notify other jobboards of
jobs which were posted using the :meth:`.post` method (this creates a
znode with job contents/details encoded in `json`_). The users of these
jobboard(s) (potentially on disjoint sets of machines) can then iterate
over the available jobs and decide if they want
to attempt to claim one of the jobs they have iterated over. If so they
will then attempt to contact zookeeper and they will attempt to create an
ephemeral znode using the name of the persistent znode + ".lock" as a
postfix. If the entity trying to use the jobboard to :meth:`.claim` the
job is able to create an ephemeral znode with that name then it will be
allowed (and expected) to perform whatever *work* the contents of that
job described. Once the claiming entity is finished the ephemeral znode
and persistent znode will be deleted (if successfully completed) in a
single transaction. If the claiming entity is not successful (or the
entity that claimed the znode dies) the ephemeral znode will be
released (either manually by using :meth:`.abandon` or automatically by
zookeeper when the ephemeral node and associated session is deemed to
have been lost).
.. _zookeeper: http://zookeeper.apache.org/
.. _json: http://json.org/
"""
#: Transaction support was added in 3.4.0 so we need at least that version.
MIN_ZK_VERSION = (3, 4, 0)
#: Znode **postfix** that lock entries have.
LOCK_POSTFIX = ".lock"
#: Znode child path created under root path that contains trashed jobs.
TRASH_FOLDER = ".trash"
#: Znode **prefix** that job entries have.
JOB_PREFIX = 'job'
#: Default znode path used for jobs (data, locks...).
DEFAULT_PATH = "/taskflow/jobs"
def __init__(self, name, conf,
client=None, persistence=None, emit_notifications=True):
super(ZookeeperJobBoard, self).__init__(name, conf)
@@ -324,17 +301,17 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
else:
self._client = kazoo_utils.make_client(self._conf)
self._owned = True
path = str(conf.get("path", "/taskflow/jobs"))
path = str(conf.get("path", self.DEFAULT_PATH))
if not path:
raise ValueError("Empty zookeeper path is disallowed")
if not k_paths.isabs(path):
raise ValueError("Zookeeper path must be absolute")
self._path = path
# The backend to load the full logbooks from, since whats sent over
# the zookeeper data connection is only the logbook uuid and name, and
# not currently the full logbook (later when a zookeeper backend
# appears we can likely optimize for that backend usage by directly
# reading from the path where the data is stored, if we want).
self._trash_path = self._path.replace(k_paths.basename(self._path),
self.TRASH_FOLDER)
# The backend to load the full logbooks from, since what is sent over
# the data connection is only the logbook uuid and name, and not the
# full logbook.
self._persistence = persistence
# Misc. internal details
self._known_jobs = {}
@@ -345,23 +322,34 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
self._job_watcher = None
# Since we use sequenced ids this will be the path that the sequences
# are prefixed with, for example, job0000000001, job0000000002, ...
self._job_base = k_paths.join(path, JOB_PREFIX)
self._job_base = k_paths.join(path, self.JOB_PREFIX)
self._worker = None
self._emit_notifications = bool(emit_notifications)
self._connected = False
def _emit(self, state, details):
# Submit the work to the executor to avoid blocking the kazoo queue.
# Submit the work to the executor to avoid blocking the kazoo threads
# and queue(s)...
worker = self._worker
if worker is None:
return
try:
self._worker.submit(self.notifier.notify, state, details)
except (AttributeError, RuntimeError):
# Notification thread is shutdown or non-existent, either case we
# just want to skip submitting a notification...
worker.submit(self.notifier.notify, state, details)
except RuntimeError:
# Notification thread is shutdown just skip submitting a
# notification...
pass
@property
def path(self):
"""Path where all job znodes will be stored."""
return self._path
@property
def trash_path(self):
"""Path where all trashed job znodes will be stored."""
return self._trash_path
@property
def job_count(self):
return len(self._known_jobs)
@@ -375,15 +363,17 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
def _force_refresh(self):
try:
children = self._client.get_children(self.path)
except self._client.handler.timeout_exception as e:
raise excp.JobFailure("Refreshing failure, operation timed out",
e)
except k_exceptions.SessionExpiredError as e:
raise excp.JobFailure("Refreshing failure, session expired", e)
except self._client.handler.timeout_exception:
excp.raise_with_cause(excp.JobFailure,
"Refreshing failure, operation timed out")
except k_exceptions.SessionExpiredError:
excp.raise_with_cause(excp.JobFailure,
"Refreshing failure, session expired")
except k_exceptions.NoNodeError:
pass
except k_exceptions.KazooException as e:
raise excp.JobFailure("Refreshing failure, internal error", e)
except k_exceptions.KazooException:
excp.raise_with_cause(excp.JobFailure,
"Refreshing failure, internal error")
else:
self._on_job_posting(children, delayed=False)
@@ -429,8 +419,9 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
# jobs information into the known job set (if it's already
# existing then just leave it alone).
if path not in self._known_jobs:
job = ZookeeperJob(job_data['name'], self,
self._client, self._persistence, path,
job = ZookeeperJob(self, job_data['name'],
self._client, path,
backend=self._persistence,
uuid=job_data['uuid'],
book_data=job_data.get("book"),
details=job_data.get("details", {}),
@@ -444,7 +435,8 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
LOG.debug("Got children %s under path %s", children, self.path)
child_paths = []
for c in children:
if c.endswith(LOCK_POSTFIX) or not c.startswith(JOB_PREFIX):
if (c.endswith(self.LOCK_POSTFIX) or
not c.startswith(self.JOB_PREFIX)):
# Skip lock paths or non-job-paths (these are not valid jobs)
continue
child_paths.append(k_paths.join(self.path, c))
@@ -488,45 +480,31 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
self._process_child(path, request)
def post(self, name, book=None, details=None):
def format_posting(job_uuid):
posting = {
'uuid': job_uuid,
'name': name,
}
if details:
posting['details'] = details
else:
posting['details'] = {}
if book is not None:
posting['book'] = {
'name': book.name,
'uuid': book.uuid,
}
return posting
# NOTE(harlowja): Jobs are not ephemeral, they will persist until they
# are consumed (this may change later, but seems safer to do this until
# further notice).
job_uuid = uuidutils.generate_uuid()
job_posting = base.format_posting(job_uuid, name,
book=book, details=details)
raw_job_posting = misc.binary_encode(jsonutils.dumps(job_posting))
with self._wrap(job_uuid, None,
"Posting failure: %s", ensure_known=False):
job_posting = format_posting(job_uuid)
job_posting = misc.binary_encode(jsonutils.dumps(job_posting))
fail_msg_tpl="Posting failure: %s",
ensure_known=False):
job_path = self._client.create(self._job_base,
value=job_posting,
value=raw_job_posting,
sequence=True,
ephemeral=False)
job = ZookeeperJob(name, self, self._client,
self._persistence, job_path,
book=book, details=details,
uuid=job_uuid)
job = ZookeeperJob(self, name, self._client, job_path,
backend=self._persistence,
book=book, details=details, uuid=job_uuid,
book_data=job_posting.get('book'))
with self._job_cond:
self._known_jobs[job_path] = job
self._job_cond.notify_all()
self._emit(base.POSTED, details={'job': job})
return job
@base.check_who
def claim(self, job, who):
def _unclaimable_try_find_owner(cause):
try:
@@ -534,13 +512,14 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
except Exception:
owner = None
if owner:
msg = "Job %s already claimed by '%s'" % (job.uuid, owner)
message = "Job %s already claimed by '%s'" % (job.uuid, owner)
else:
msg = "Job %s already claimed" % (job.uuid)
return excp.UnclaimableJob(msg, cause)
message = "Job %s already claimed" % (job.uuid)
excp.raise_with_cause(excp.UnclaimableJob,
message, cause=cause)
_check_who(who)
with self._wrap(job.uuid, job.path, "Claiming failure: %s"):
with self._wrap(job.uuid, job.path,
fail_msg_tpl="Claiming failure: %s"):
# NOTE(harlowja): post as json which will allow for future changes
# more easily than a raw string/text.
value = jsonutils.dumps({
@@ -558,21 +537,23 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
try:
kazoo_utils.checked_commit(txn)
except k_exceptions.NodeExistsError as e:
raise _unclaimable_try_find_owner(e)
_unclaimable_try_find_owner(e)
except kazoo_utils.KazooTransactionException as e:
if len(e.failures) < 2:
raise
else:
if isinstance(e.failures[0], k_exceptions.NoNodeError):
raise excp.NotFound(
excp.raise_with_cause(
excp.NotFound,
"Job %s not found to be claimed" % job.uuid,
e.failures[0])
cause=e.failures[0])
if isinstance(e.failures[1], k_exceptions.NodeExistsError):
raise _unclaimable_try_find_owner(e.failures[1])
_unclaimable_try_find_owner(e.failures[1])
else:
raise excp.UnclaimableJob(
excp.raise_with_cause(
excp.UnclaimableJob,
"Job %s claim failed due to transaction"
" not succeeding" % (job.uuid), e)
" not succeeding" % (job.uuid), cause=e)
@contextlib.contextmanager
def _wrap(self, job_uuid, job_path,
@@ -588,21 +569,23 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
raise excp.NotFound(fail_msg_tpl % (job_uuid))
try:
yield
except self._client.handler.timeout_exception as e:
except self._client.handler.timeout_exception:
fail_msg_tpl += ", operation timed out"
raise excp.JobFailure(fail_msg_tpl % (job_uuid), e)
except k_exceptions.SessionExpiredError as e:
excp.raise_with_cause(excp.JobFailure, fail_msg_tpl % (job_uuid))
except k_exceptions.SessionExpiredError:
fail_msg_tpl += ", session expired"
raise excp.JobFailure(fail_msg_tpl % (job_uuid), e)
excp.raise_with_cause(excp.JobFailure, fail_msg_tpl % (job_uuid))
except k_exceptions.NoNodeError:
fail_msg_tpl += ", unknown job"
raise excp.NotFound(fail_msg_tpl % (job_uuid))
except k_exceptions.KazooException as e:
excp.raise_with_cause(excp.NotFound, fail_msg_tpl % (job_uuid))
except k_exceptions.KazooException:
fail_msg_tpl += ", internal error"
raise excp.JobFailure(fail_msg_tpl % (job_uuid), e)
excp.raise_with_cause(excp.JobFailure, fail_msg_tpl % (job_uuid))
def find_owner(self, job):
with self._wrap(job.uuid, job.path, "Owner query failure: %s"):
with self._wrap(job.uuid, job.path,
fail_msg_tpl="Owner query failure: %s",
ensure_known=False):
try:
self._client.sync(job.lock_path)
raw_data, _lock_stat = self._client.get(job.lock_path)
@@ -618,14 +601,16 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
return (misc.decode_json(lock_data), lock_stat,
misc.decode_json(job_data), job_stat)
@base.check_who
def consume(self, job, who):
_check_who(who)
with self._wrap(job.uuid, job.path, "Consumption failure: %s"):
with self._wrap(job.uuid, job.path,
fail_msg_tpl="Consumption failure: %s"):
try:
owner_data = self._get_owner_and_data(job)
lock_data, lock_stat, data, data_stat = owner_data
except k_exceptions.NoNodeError:
raise excp.JobFailure("Can not consume a job %s"
excp.raise_with_cause(excp.NotFound,
"Can not consume a job %s"
" which we can not determine"
" the owner of" % (job.uuid))
if lock_data.get("owner") != who:
@@ -638,14 +623,16 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
kazoo_utils.checked_commit(txn)
self._remove_job(job.path)
@base.check_who
def abandon(self, job, who):
_check_who(who)
with self._wrap(job.uuid, job.path, "Abandonment failure: %s"):
with self._wrap(job.uuid, job.path,
fail_msg_tpl="Abandonment failure: %s"):
try:
owner_data = self._get_owner_and_data(job)
lock_data, lock_stat, data, data_stat = owner_data
except k_exceptions.NoNodeError:
raise excp.JobFailure("Can not abandon a job %s"
excp.raise_with_cause(excp.NotFound,
"Can not abandon a job %s"
" which we can not determine"
" the owner of" % (job.uuid))
if lock_data.get("owner") != who:
@@ -656,12 +643,36 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
txn.delete(job.lock_path, version=lock_stat.version)
kazoo_utils.checked_commit(txn)
@base.check_who
def trash(self, job, who):
with self._wrap(job.uuid, job.path,
fail_msg_tpl="Trash failure: %s"):
try:
owner_data = self._get_owner_and_data(job)
lock_data, lock_stat, data, data_stat = owner_data
except k_exceptions.NoNodeError:
excp.raise_with_cause(excp.NotFound,
"Can not trash a job %s"
" which we can not determine"
" the owner of" % (job.uuid))
if lock_data.get("owner") != who:
raise excp.JobFailure("Can not trash a job %s"
" which is not owned by %s"
% (job.uuid, who))
trash_path = job.path.replace(self.path, self.trash_path)
value = misc.binary_encode(jsonutils.dumps(data))
txn = self._client.transaction()
txn.create(trash_path, value=value)
txn.delete(job.lock_path, version=lock_stat.version)
txn.delete(job.path, version=data_stat.version)
kazoo_utils.checked_commit(txn)
def _state_change_listener(self, state):
LOG.debug("Kazoo client has changed to state: %s", state)
def wait(self, timeout=None):
# Wait until timeout expires (or forever) for jobs to appear.
watch = tt.StopWatch(duration=timeout)
watch = timeutils.StopWatch(duration=timeout)
watch.start()
with self._job_cond:
while True:
@@ -684,9 +695,9 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
@property
def connected(self):
return self._client.connected
return self._connected and self._client.connected
@lock_utils.locked(lock='_open_close_lock')
@fasteners.locked(lock='_open_close_lock')
def close(self):
if self._owned:
LOG.debug("Stopping client")
@@ -698,8 +709,9 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
with self._job_cond:
self._known_jobs.clear()
LOG.debug("Stopped & cleared local state")
self._connected = False
@lock_utils.locked(lock='_open_close_lock')
@fasteners.locked(lock='_open_close_lock')
def connect(self, timeout=10.0):
def try_clean():
@@ -717,25 +729,33 @@ class ZookeeperJobBoard(base.NotifyingJobBoard):
timeout = float(timeout)
self._client.start(timeout=timeout)
except (self._client.handler.timeout_exception,
k_exceptions.KazooException) as e:
raise excp.JobFailure("Failed to connect to zookeeper", e)
k_exceptions.KazooException):
excp.raise_with_cause(excp.JobFailure,
"Failed to connect to zookeeper")
try:
if self._conf.get('check_compatible', True):
kazoo_utils.check_compatible(self._client, MIN_ZK_VERSION)
kazoo_utils.check_compatible(self._client, self.MIN_ZK_VERSION)
if self._worker is None and self._emit_notifications:
self._worker = futures.ThreadPoolExecutor(max_workers=1)
self._worker = futurist.ThreadPoolExecutor(max_workers=1)
self._client.ensure_path(self.path)
self._client.ensure_path(self.trash_path)
if self._job_watcher is None:
self._job_watcher = watchers.ChildrenWatch(
self._client,
self.path,
func=self._on_job_posting,
allow_session_lost=True)
self._connected = True
except excp.IncompatibleVersion:
with excutils.save_and_reraise_exception():
try_clean()
except (self._client.handler.timeout_exception,
k_exceptions.KazooException) as e:
try_clean()
raise excp.JobFailure("Failed to do post-connection"
" initialization", e)
k_exceptions.KazooException):
exc_type, exc, exc_tb = sys.exc_info()
try:
try_clean()
excp.raise_with_cause(excp.JobFailure,
"Failed to do post-connection"
" initialization", cause=exc)
finally:
del(exc_type, exc, exc_tb)
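To show where the new :meth:`.trash` support fits, a small sketch of a worker deciding a claimed job is broken and moving it aside (the module path, board configuration, names and the ``job_looks_broken`` check are assumptions)::

    from taskflow import exceptions as exc
    from taskflow.jobs.backends import impl_zookeeper

    board = impl_zookeeper.ZookeeperJobBoard('my-board',
                                             {'hosts': 'localhost:2181'})
    board.connect()
    try:
        for job in board.iterjobs(only_unclaimed=True):
            try:
                board.claim(job, 'worker-1')
            except exc.UnclaimableJob:
                continue
            if job_looks_broken(job):  # hypothetical validation
                # Moves the job znode under the trash path so that other
                # workers will not keep re-claiming a broken job.
                board.trash(job, 'worker-1')
            else:
                board.abandon(job, 'worker-1')
    finally:
        board.close()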

View File

@@ -16,6 +16,7 @@
# under the License.
import abc
import contextlib
from oslo_utils import uuidutils
import six
@@ -43,7 +44,9 @@ class Job(object):
reverting...
"""
def __init__(self, name, uuid=None, details=None):
def __init__(self, board, name,
uuid=None, details=None, backend=None,
book=None, book_data=None):
if uuid:
self._uuid = uuid
else:
@@ -52,45 +55,62 @@ class Job(object):
if not details:
details = {}
self._details = details
self._backend = backend
self._board = board
self._book = book
if not book_data:
book_data = {}
self._book_data = book_data
@abc.abstractproperty
def last_modified(self):
"""The datetime the job was last modified."""
pass
@abc.abstractproperty
def created_on(self):
"""The datetime the job was created on."""
pass
@abc.abstractproperty
@property
def board(self):
"""The board this job was posted on or was created from."""
return self._board
@abc.abstractproperty
def state(self):
"""The current state of this job."""
"""Access the current state of this job."""
pass
@abc.abstractproperty
@property
def book(self):
"""Logbook associated with this job.
If no logbook is associated with this job, this property is None.
"""
if self._book is None:
self._book = self._load_book()
return self._book
@abc.abstractproperty
@property
def book_uuid(self):
"""UUID of logbook associated with this job.
If no logbook is associated with this job, this property is None.
"""
if self._book is not None:
return self._book.uuid
else:
return self._book_data.get('uuid')
@abc.abstractproperty
@property
def book_name(self):
"""Name of logbook associated with this job.
If no logbook is associated with this job, this property is None.
"""
if self._book is not None:
return self._book.name
else:
return self._book_data.get('name')
@property
def uuid(self):
@@ -107,10 +127,24 @@ class Job(object):
"""The non-uniquely identifying name of this job."""
return self._name
def _load_book(self):
book_uuid = self.book_uuid
if self._backend is not None and book_uuid is not None:
# TODO(harlowja): we are currently limited by assuming that the
# job posted has the same backend as this loader (to start this
# seems to be a ok assumption, and can be adjusted in the future
# if we determine there is a use-case for multi-backend loaders,
# aka a registry of loaders).
with contextlib.closing(self._backend.get_connection()) as conn:
return conn.get_logbook(book_uuid)
# No backend to fetch from or no uuid specified
return None
def __str__(self):
"""Pretty formats the job into something *more* meaningful."""
return "%s %s (%s): %s" % (type(self).__name__,
self.name, self.uuid, self.details)
return "%s: %s (uuid=%s, details=%s)" % (type(self).__name__,
self.name, self.uuid,
self.details)
@six.add_metaclass(abc.ABCMeta)
@@ -260,6 +294,25 @@ class JobBoard(object):
this must be the same name that was used for claiming this job.
"""
@abc.abstractmethod
def trash(self, job, who):
"""Trash the provided job.
Trashing a job signals to others that the job is broken and should not
be reclaimed. This is provided as an option for users to be able to
remove jobs from the board externally. The trashed job details should
be kept around in an alternate location to be reviewed, if desired.
Only the entity that has claimed that job can trash a job. Any entity
trashing an unclaimed job (or a job they do not own) will cause an
exception.
:param job: a job on this jobboard that can be trashed (if it does
not exist then a NotFound exception will be raised).
:param who: string that names the entity performing the trashing,
this must be the same name that was used for claiming this job.
"""
@abc.abstractproperty
def connected(self):
"""Returns if this jobboard is connected."""
@@ -295,3 +348,40 @@ class NotifyingJobBoard(JobBoard):
def __init__(self, name, conf):
super(NotifyingJobBoard, self).__init__(name, conf)
self.notifier = notifier.Notifier()
# Internal helpers for usage by board implementations...
def check_who(meth):
@six.wraps(meth)
def wrapper(self, job, who, *args, **kwargs):
if not isinstance(who, six.string_types):
raise TypeError("Job applicant must be a string type")
if len(who) == 0:
raise ValueError("Job applicant must be non-empty")
return meth(self, job, who, *args, **kwargs)
return wrapper
def format_posting(uuid, name, created_on=None, last_modified=None,
details=None, book=None):
posting = {
'uuid': uuid,
'name': name,
}
if created_on is not None:
posting['created_on'] = created_on
if last_modified is not None:
posting['last_modified'] = last_modified
if details:
posting['details'] = details
else:
posting['details'] = {}
if book is not None:
posting['book'] = {
'name': book.name,
'uuid': book.uuid,
}
return posting
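
For reference, a minimal sketch of how the claim/consume/trash contract above is intended to be used; ``board`` and ``job`` are assumed to come from a concrete jobboard backend (such as the ZooKeeper one), ``do_work`` is a hypothetical callable, and error handling is simplified::

    # Hedged sketch: process a claimed job, trashing it on failure so other
    # consumers will not pick it back up. Only the claiming entity, identified
    # by the same non-empty ``who`` string, may trash or consume the job.
    def process_one(board, job, who="worker-1", do_work=lambda job: None):
        board.claim(job, who)
        try:
            do_work(job)
        except Exception:
            board.trash(job, who)    # moved aside for later review
            raise
        else:
            board.consume(job, who)  # normal completion removes the job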

View File

@@ -18,6 +18,7 @@ from __future__ import absolute_import
import abc
from debtcollector import moves
from oslo_utils import excutils
import six
@@ -25,7 +26,6 @@ from taskflow import logging
from taskflow import states
from taskflow.types import failure
from taskflow.types import notifier
from taskflow.utils import deprecation
LOG = logging.getLogger(__name__)
@@ -165,10 +165,8 @@ class Listener(object):
# TODO(harlowja): remove in 0.7 or later...
ListenerBase = deprecation.moved_inheritable_class(Listener,
'ListenerBase', __name__,
version="0.6",
removal_version="?")
ListenerBase = moves.moved_class(Listener, 'ListenerBase', __name__,
version="0.6", removal_version="2.0")
@six.add_metaclass(abc.ABCMeta)
@@ -213,10 +211,18 @@ class DumpingListener(Listener):
# TODO(harlowja): remove in 0.7 or later...
class LoggingBase(deprecation.moved_inheritable_class(DumpingListener,
'LoggingBase', __name__,
version="0.6",
removal_version="?")):
class LoggingBase(moves.moved_class(DumpingListener,
'LoggingBase', __name__,
version="0.6", removal_version="2.0")):
"""Legacy logging base.
.. deprecated:: 0.6
This class is **deprecated** and is present for backward
compatibility **only**, its replacement
:py:class:`.DumpingListener` should be used going forward.
"""
def _dump(self, message, *args, **kwargs):
self._log(message, *args, **kwargs)
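
The ``moves.moved_class`` helper used above is the generic debtcollector pattern for keeping an old class name importable while steering users toward the new one; a small self-contained sketch with invented class names, following the same call form as this diff::

    from debtcollector import moves

    class DumpingThing(object):
        """The new, preferred class."""

    # Importing or instantiating ``LegacyThing`` still works, but emits a
    # deprecation warning that points at ``DumpingThing`` and records the
    # version the alias appeared in and when it will be removed.
    LegacyThing = moves.moved_class(DumpingThing, 'LegacyThing', __name__,
                                    version="0.6", removal_version="2.0")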

View File

@@ -17,12 +17,15 @@
from __future__ import absolute_import
import itertools
import time
from debtcollector import moves
from oslo_utils import timeutils
from taskflow import exceptions as exc
from taskflow.listeners import base
from taskflow import logging
from taskflow import states
from taskflow.types import timing as tt
STARTING_STATES = frozenset((states.RUNNING, states.REVERTING))
FINISHED_STATES = frozenset((base.FINISH_STATES + (states.REVERTED,)))
@@ -39,7 +42,7 @@ def _printer(message):
print(message)
class TimingListener(base.Listener):
class DurationListener(base.Listener):
"""Listener that captures task duration.
It records how long a task took to execute (or fail)
@@ -47,13 +50,13 @@ class TimingListener(base.Listener):
to task metadata with key ``'duration'``.
"""
def __init__(self, engine):
super(TimingListener, self).__init__(engine,
task_listen_for=WATCH_STATES,
flow_listen_for=[])
super(DurationListener, self).__init__(engine,
task_listen_for=WATCH_STATES,
flow_listen_for=[])
self._timers = {}
def deregister(self):
super(TimingListener, self).deregister()
super(DurationListener, self).deregister()
# There should be none left at deregistering time, so log a
# warning if any somehow got left behind...
leftover_timers = len(self._timers)
@@ -78,7 +81,7 @@ class TimingListener(base.Listener):
if state == states.PENDING:
self._timers.pop(task_name, None)
elif state in STARTING_STATES:
self._timers[task_name] = tt.StopWatch().start()
self._timers[task_name] = timeutils.StopWatch().start()
elif state in FINISHED_STATES:
timer = self._timers.pop(task_name, None)
if timer is not None:
@@ -86,22 +89,76 @@ class TimingListener(base.Listener):
self._record_ending(timer, task_name)
class PrintingTimingListener(TimingListener):
"""Listener that prints the start & stop timing as well as recording it."""
TimingListener = moves.moved_class(DurationListener,
'TimingListener', __name__,
version="0.8", removal_version="2.0")
class PrintingDurationListener(DurationListener):
"""Listener that prints the duration as well as recording it."""
def __init__(self, engine, printer=None):
super(PrintingTimingListener, self).__init__(engine)
super(PrintingDurationListener, self).__init__(engine)
if printer is None:
self._printer = _printer
else:
self._printer = printer
def _record_ending(self, timer, task_name):
super(PrintingTimingListener, self)._record_ending(timer, task_name)
super(PrintingDurationListener, self)._record_ending(timer, task_name)
self._printer("It took task '%s' %0.2f seconds to"
" finish." % (task_name, timer.elapsed()))
def _task_receiver(self, state, details):
super(PrintingTimingListener, self)._task_receiver(state, details)
super(PrintingDurationListener, self)._task_receiver(state, details)
if state in STARTING_STATES:
self._printer("'%s' task started." % (details['task_name']))
PrintingTimingListener = moves.moved_class(
PrintingDurationListener, 'PrintingTimingListener', __name__,
version="0.8", removal_version="2.0")
class EventTimeListener(base.Listener):
"""Listener that captures task, flow, and retry event timestamps.
It records when an event is received (using unix time) to
storage. It saves the timestamps under keys (in atom or flow details
metadata) of the format ``{event}-timestamp`` where ``event`` is the
state/event name that has been received.
This information can be later extracted/examined to derive durations...
"""
def __init__(self, engine,
task_listen_for=base.DEFAULT_LISTEN_FOR,
flow_listen_for=base.DEFAULT_LISTEN_FOR,
retry_listen_for=base.DEFAULT_LISTEN_FOR):
super(EventTimeListener, self).__init__(
engine, task_listen_for=task_listen_for,
flow_listen_for=flow_listen_for, retry_listen_for=retry_listen_for)
def _record_atom_event(self, state, atom_name):
meta_update = {'%s-timestamp' % state: time.time()}
try:
# Don't let storage failures throw exceptions in a listener method.
self._engine.storage.update_atom_metadata(atom_name, meta_update)
except exc.StorageFailure:
LOG.warn("Failure to store timestamp %s for atom %s",
meta_update, atom_name, exc_info=True)
def _flow_receiver(self, state, details):
meta_update = {'%s-timestamp' % state: time.time()}
try:
# Don't let storage failures throw exceptions in a listener method.
self._engine.storage.update_flow_metadata(meta_update)
except exc.StorageFailure:
LOG.warn("Failure to store timestamp %s for flow %s",
meta_update, details['flow_name'], exc_info=True)
def _task_receiver(self, state, details):
self._record_atom_event(state, details['task_name'])
def _retry_receiver(self, state, details):
self._record_atom_event(state, details['retry_name'])
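
A hedged usage sketch for the renamed listeners; the flow and task below are invented for illustration, and the listener is used as a context manager around ``engine.run()`` so that it registers and deregisters itself::

    from taskflow import engines
    from taskflow import task
    from taskflow.listeners import timing
    from taskflow.patterns import linear_flow

    class Noop(task.Task):
        def execute(self):
            return 'done'

    flow = linear_flow.Flow('timed').add(Noop('noop'))
    engine = engines.load(flow)

    # DurationListener records a 'duration' key into each task's metadata;
    # the printing variant also prints start/finish messages. EventTimeListener
    # could be attached the same way to store per-state timestamps instead.
    with timing.PrintingDurationListener(engine):
        engine.run()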

View File

@@ -32,6 +32,7 @@ CRITICAL = logging.CRITICAL
DEBUG = logging.DEBUG
ERROR = logging.ERROR
FATAL = logging.FATAL
INFO = logging.INFO
NOTSET = logging.NOTSET
WARN = logging.WARN
WARNING = logging.WARNING

View File

@@ -16,22 +16,32 @@
import collections
import six
from taskflow import exceptions as exc
from taskflow import flow
from taskflow.types import graph as gr
def _unsatisfied_requires(node, graph, *additional_provided):
"""Extracts the unsatisified symbol requirements of a single node."""
requires = set(node.requires)
if not requires:
return requires
for provided in additional_provided:
requires = requires - provided
# This is using the difference() method vs the -
# operator since the latter doesn't work with frozen
# or regular sets (when used in combination with ordered
# sets).
#
# If this is not done the following happens...
#
# TypeError: unsupported operand type(s)
# for -: 'set' and 'OrderedSet'
requires = requires.difference(provided)
if not requires:
return requires
for pred in graph.bfs_predecessors_iter(node):
requires = requires - pred.provides
requires = requires.difference(pred.provides)
if not requires:
return requires
return requires
@@ -55,16 +65,23 @@ class Flow(flow.Flow):
self._graph = gr.DiGraph()
self._graph.freeze()
def link(self, u, v):
#: Extracts the unsatisfied symbol requirements of a single node.
_unsatisfied_requires = staticmethod(_unsatisfied_requires)
def link(self, u, v, decider=None):
"""Link existing node u as a runtime dependency of existing node v."""
if not self._graph.has_node(u):
raise ValueError('Item %s not found to link from' % (u))
raise ValueError("Node '%s' not found to link from" % (u))
if not self._graph.has_node(v):
raise ValueError('Item %s not found to link to' % (v))
self._swap(self._link(u, v, manual=True))
raise ValueError("Node '%s' not found to link to" % (v))
if decider is not None:
if not six.callable(decider):
raise ValueError("Decider boolean callback must be callable")
self._swap(self._link(u, v, manual=True, decider=decider))
return self
def _link(self, u, v, graph=None, reason=None, manual=False):
def _link(self, u, v, graph=None,
reason=None, manual=False, decider=None):
mutable_graph = True
if graph is None:
graph = self._graph
@@ -74,6 +91,8 @@ class Flow(flow.Flow):
attrs = graph.get_edge_data(u, v)
if not attrs:
attrs = {}
if decider is not None:
attrs[flow.LINK_DECIDER] = decider
if manual:
attrs[flow.LINK_MANUAL] = True
if reason is not None:
@@ -94,34 +113,38 @@ class Flow(flow.Flow):
direct access to the underlying graph).
"""
if not graph.is_directed_acyclic():
raise exc.DependencyFailure("No path through the items in the"
raise exc.DependencyFailure("No path through the node(s) in the"
" graph produces an ordering that"
" will allow for logical"
" edge traversal")
self._graph = graph.freeze()
def add(self, *items, **kwargs):
def add(self, *nodes, **kwargs):
"""Adds a given task/tasks/flow/flows to this flow.
:param items: items to add to the flow
:param nodes: node(s) to add to the flow
:param kwargs: keyword arguments, the two keyword arguments
currently processed are:
* ``resolve_requires``, a boolean that when true (the
default) implies that when items are added their
symbol requirements will be matched to existing items
and links will be automatically made to those
default) implies that when node(s) are added their
symbol requirements will be matched to existing
node(s) and links will be automatically made to those
providers. If multiple possible providers exist
then an AmbiguousDependency exception will be raised.
* ``resolve_existing``, a boolean that when true (the
default) implies that on addition of a new item that
existing items will have their requirements scanned
for symbols that this newly added item can provide.
default) implies that on addition of a new node that
existing node(s) will have their requirements scanned
for symbols that this newly added node can provide.
If a match is found a link is automatically created
from the newly added item to the requiree.
from the newly added node to the requiree.
"""
items = [i for i in items if not self._graph.has_node(i)]
if not items:
# Let's try to avoid doing any work if we can, since the code below
# (after this filter) can create more temporary graphs that aren't needed
# if the nodes already exist...
nodes = [i for i in nodes if not self._graph.has_node(i)]
if not nodes:
return self
# This syntax will *hopefully* be better in future versions of python.
@@ -143,52 +166,52 @@ class Flow(flow.Flow):
retry_provides.add(value)
provided[value].append(self._retry)
for item in self._graph.nodes_iter():
for value in _unsatisfied_requires(item, self._graph,
retry_provides):
required[value].append(item)
for value in item.provides:
provided[value].append(item)
for node in self._graph.nodes_iter():
for value in self._unsatisfied_requires(node, self._graph,
retry_provides):
required[value].append(node)
for value in node.provides:
provided[value].append(node)
# NOTE(harlowja): Add items and edges to a temporary copy of the
# NOTE(harlowja): Add node(s) and edge(s) to a temporary copy of the
# underlying graph, and only if that addition is successful do we then
# swap with the underlying graph.
tmp_graph = gr.DiGraph(self._graph)
for item in items:
tmp_graph.add_node(item)
for node in nodes:
tmp_graph.add_node(node)
# Try to find a valid provider.
if resolve_requires:
for value in _unsatisfied_requires(item, tmp_graph,
retry_provides):
for value in self._unsatisfied_requires(node, tmp_graph,
retry_provides):
if value in provided:
providers = provided[value]
if len(providers) > 1:
provider_names = [n.name for n in providers]
raise exc.AmbiguousDependency(
"Resolution error detected when"
" adding %(item)s, multiple"
" adding '%(node)s', multiple"
" providers %(providers)s found for"
" required symbol '%(value)s'"
% dict(item=item.name,
% dict(node=node.name,
providers=sorted(provider_names),
value=value))
else:
self._link(providers[0], item,
self._link(providers[0], node,
graph=tmp_graph, reason=value)
else:
required[value].append(item)
required[value].append(node)
for value in item.provides:
provided[value].append(item)
for value in node.provides:
provided[value].append(node)
# See if what we provide fulfills any existing requiree.
if resolve_existing:
for value in item.provides:
for value in node.provides:
if value in required:
for requiree in list(required[value]):
if requiree is not item:
self._link(item, requiree,
if requiree is not node:
self._link(node, requiree,
graph=tmp_graph, reason=value)
required[value].remove(requiree)
@@ -222,8 +245,9 @@ class Flow(flow.Flow):
requires.update(self._retry.requires)
retry_provides.update(self._retry.provides)
g = self._get_subgraph()
for item in g.nodes_iter():
requires.update(_unsatisfied_requires(item, g, retry_provides))
for node in g.nodes_iter():
requires.update(self._unsatisfied_requires(node, g,
retry_provides))
return frozenset(requires)
@@ -239,36 +263,35 @@ class TargetedFlow(Flow):
self._subgraph = None
self._target = None
def set_target(self, target_item):
def set_target(self, target_node):
"""Set target for the flow.
Any items (tasks or subflows) not needed for the target
item will not be executed.
Any node(s) (tasks or subflows) not needed for the target
node will not be executed.
"""
if not self._graph.has_node(target_item):
raise ValueError('Item %s not found' % target_item)
self._target = target_item
if not self._graph.has_node(target_node):
raise ValueError("Node '%s' not found" % target_node)
self._target = target_node
self._subgraph = None
def reset_target(self):
"""Reset target for the flow.
All items of the flow will be executed.
All node(s) of the flow will be executed.
"""
self._target = None
self._subgraph = None
def add(self, *items):
def add(self, *nodes):
"""Adds a given task/tasks/flow/flows to this flow."""
super(TargetedFlow, self).add(*items)
super(TargetedFlow, self).add(*nodes)
# reset cached subgraph, in case it was affected
self._subgraph = None
return self
def link(self, u, v):
def link(self, u, v, decider=None):
"""Link existing node u as a runtime dependency of existing node v."""
super(TargetedFlow, self).link(u, v)
super(TargetedFlow, self).link(u, v, decider=decider)
# reset cached subgraph, in case it was affected
self._subgraph = None
return self
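
A sketch of the new ``decider`` hook on ``link``; the tasks are invented, and the decider callable is assumed to receive a history-like mapping of the predecessor's name to its result and to return a boolean that gates whether the dependent node runs::

    from taskflow import task
    from taskflow.patterns import graph_flow

    class CheckSpace(task.Task):
        default_provides = 'enough_space'

        def execute(self):
            return True

    class Resize(task.Task):
        def execute(self, enough_space):
            print('resizing...')

    check, resize = CheckSpace('check'), Resize('resize')
    flow = graph_flow.Flow('maybe-resize').add(check, resize)
    # If the decider returns False at runtime, 'resize' is not executed.
    flow.link(check, resize,
              decider=lambda history: bool(history.get('check')))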

View File

@@ -16,6 +16,7 @@
import contextlib
import six
from stevedore import driver
from taskflow import exceptions as exc
@@ -38,18 +39,20 @@ def fetch(conf, namespace=BACKEND_NAMESPACE, **kwargs):
NOTE(harlowja): to aid in making it easy to specify configuration and
options to a backend, the configuration (which is typically just a dictionary)
can also be a uri string that identifies the entrypoint name and any
can also be a URI string that identifies the entrypoint name and any
configuration specific to that backend.
For example, given the following configuration uri:
For example, given the following configuration URI::
mysql://<not-used>/?a=b&c=d
mysql://<not-used>/?a=b&c=d
This will look for the entrypoint named 'mysql' and will provide
a configuration object composed of the uris parameters, in this case that
is {'a': 'b', 'c': 'd'} to the constructor of that persistence backend
a configuration object composed of the URI's components, in this case that
is ``{'a': 'b', 'c': 'd'}`` to the constructor of that persistence backend
instance.
"""
if isinstance(conf, six.string_types):
conf = {'connection': conf}
backend_name = conf['connection']
try:
uri = misc.parse_uri(backend_name)

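A sketch of the two equivalent ways of configuring ``fetch``, using the in-memory backend so nothing external is required::

    import contextlib

    from taskflow.persistence import backends

    # A bare URI string is folded into {'connection': <uri>}; the scheme
    # ('memory' here) selects the entrypoint, and any query parameters become
    # backend-specific configuration.
    backend = backends.fetch('memory://')
    # equivalent: backend = backends.fetch({'connection': 'memory://'})

    with contextlib.closing(backend.get_connection()) as conn:
        conn.upgrade()  # prepare whatever structures the backend needs
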
View File

@@ -15,33 +15,39 @@
# License for the specific language governing permissions and limitations
# under the License.
import contextlib
import errno
import os
import shutil
import cachetools
import fasteners
from oslo_serialization import jsonutils
import six
from taskflow import exceptions as exc
from taskflow import logging
from taskflow.persistence import base
from taskflow.persistence import logbook
from taskflow.utils import lock_utils
from taskflow.persistence import path_based
from taskflow.utils import misc
LOG = logging.getLogger(__name__)
@contextlib.contextmanager
def _storagefailure_wrapper():
try:
yield
except exc.TaskFlowException:
raise
except Exception as e:
if isinstance(e, (IOError, OSError)) and e.errno == errno.ENOENT:
exc.raise_with_cause(exc.NotFound,
'Item not found: %s' % e.filename,
cause=e)
else:
exc.raise_with_cause(exc.StorageFailure,
"Storage backend internal error", cause=e)
class DirBackend(base.Backend):
class DirBackend(path_based.PathBasedBackend):
"""A directory and file based backend.
This backend writes logbooks, flow details, and atom details to a provided
base path on the local filesystem. It will create and store those objects
in three key directories (one for logbooks, one for flow details and one
for atom details). It creates those associated directories and then
creates files inside those directories that represent the contents of those
objects for later reading and writing.
This backend does *not* provide true transactional semantics. It does
guarantee that there will be no interprocess race conditions when
writing and reading by using a consistent hierarchy of file based locks.
@@ -49,22 +55,33 @@ class DirBackend(base.Backend):
Example configuration::
conf = {
"path": "/tmp/taskflow",
"path": "/tmp/taskflow", # save data to this root directory
"max_cache_size": 1024, # keep up-to 1024 entries in memory
}
"""
DEFAULT_FILE_ENCODING = 'utf-8'
"""
Default encoding used when encoding files from text/unicode into binary,
or decoding them from binary back into text/unicode.
"""
def __init__(self, conf):
super(DirBackend, self).__init__(conf)
self._path = os.path.abspath(conf['path'])
self._lock_path = os.path.join(self._path, 'locks')
self._file_cache = {}
@property
def lock_path(self):
return self._lock_path
@property
def base_path(self):
return self._path
max_cache_size = self._conf.get('max_cache_size')
if max_cache_size is not None:
max_cache_size = int(max_cache_size)
if max_cache_size < 1:
raise ValueError("Maximum cache size must be greater than"
" or equal to one")
self.file_cache = cachetools.LRUCache(max_cache_size)
else:
self.file_cache = {}
self.encoding = self._conf.get('encoding', self.DEFAULT_FILE_ENCODING)
if not self._path:
raise ValueError("Empty path is disallowed")
self._path = os.path.abspath(self._path)
self.lock = fasteners.ReaderWriterLock()
def get_connection(self):
return Connection(self)
@@ -73,333 +90,77 @@ class DirBackend(base.Backend):
pass
class Connection(base.Connection):
def __init__(self, backend):
self._backend = backend
self._file_cache = self._backend._file_cache
self._flow_path = os.path.join(self._backend.base_path, 'flows')
self._atom_path = os.path.join(self._backend.base_path, 'atoms')
self._book_path = os.path.join(self._backend.base_path, 'books')
def validate(self):
# Verify key paths exist.
paths = [
self._backend.base_path,
self._backend.lock_path,
self._flow_path,
self._atom_path,
self._book_path,
]
for p in paths:
if not os.path.isdir(p):
raise RuntimeError("Missing required directory: %s" % (p))
class Connection(path_based.PathBasedConnection):
def _read_from(self, filename):
# This is very similar to the oslo-incubator fileutils module, but
# tweaked to not depend on a global cache, as well as tweaked to not
# pull in the oslo logging module (which is a huge pile of code).
mtime = os.path.getmtime(filename)
cache_info = self._file_cache.setdefault(filename, {})
cache_info = self.backend.file_cache.setdefault(filename, {})
if not cache_info or mtime > cache_info.get('mtime', 0):
with open(filename, 'rb') as fp:
cache_info['data'] = fp.read().decode('utf-8')
cache_info['data'] = misc.binary_decode(
fp.read(), encoding=self.backend.encoding)
cache_info['mtime'] = mtime
return cache_info['data']
def _write_to(self, filename, contents):
if isinstance(contents, six.text_type):
contents = contents.encode('utf-8')
contents = misc.binary_encode(contents,
encoding=self.backend.encoding)
with open(filename, 'wb') as fp:
fp.write(contents)
self._file_cache.pop(filename, None)
self.backend.file_cache.pop(filename, None)
def _run_with_process_lock(self, lock_name, functor, *args, **kwargs):
lock_path = os.path.join(self.backend.lock_path, lock_name)
with lock_utils.InterProcessLock(lock_path):
@contextlib.contextmanager
def _path_lock(self, path):
lockfile = self._join_path(path, 'lock')
with fasteners.InterProcessLock(lockfile) as lock:
with _storagefailure_wrapper():
yield lock
def _join_path(self, *parts):
return os.path.join(*parts)
def _get_item(self, path):
with self._path_lock(path):
item_path = self._join_path(path, 'metadata')
return misc.decode_json(self._read_from(item_path))
def _set_item(self, path, value, transaction):
with self._path_lock(path):
item_path = self._join_path(path, 'metadata')
self._write_to(item_path, jsonutils.dumps(value))
def _del_tree(self, path, transaction):
with self._path_lock(path):
shutil.rmtree(path)
def _get_children(self, path):
with _storagefailure_wrapper():
return [link for link in os.listdir(path)
if os.path.islink(self._join_path(path, link))]
def _ensure_path(self, path):
with _storagefailure_wrapper():
misc.ensure_tree(path)
def _create_link(self, src_path, dest_path, transaction):
with _storagefailure_wrapper():
try:
return functor(*args, **kwargs)
except exc.TaskFlowException:
raise
except Exception as e:
LOG.exception("Failed running locking file based session")
# NOTE(harlowja): trap all other errors as storage errors.
raise exc.StorageFailure("Storage backend internal error", e)
def _get_logbooks(self):
lb_uuids = []
try:
lb_uuids = [d for d in os.listdir(self._book_path)
if os.path.isdir(os.path.join(self._book_path, d))]
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise
for lb_uuid in lb_uuids:
try:
yield self._get_logbook(lb_uuid)
except exc.NotFound:
pass
def get_logbooks(self):
try:
books = list(self._get_logbooks())
except EnvironmentError as e:
raise exc.StorageFailure("Unable to fetch logbooks", e)
else:
for b in books:
yield b
@property
def backend(self):
return self._backend
def close(self):
pass
def _save_atom_details(self, atom_detail, ignore_missing):
# See if we have an existing atom detail to merge with.
e_ad = None
try:
e_ad = self._get_atom_details(atom_detail.uuid, lock=False)
except EnvironmentError:
if not ignore_missing:
raise exc.NotFound("No atom details found with id: %s"
% atom_detail.uuid)
if e_ad is not None:
atom_detail = e_ad.merge(atom_detail)
ad_path = os.path.join(self._atom_path, atom_detail.uuid)
ad_data = base._format_atom(atom_detail)
self._write_to(ad_path, jsonutils.dumps(ad_data))
return atom_detail
def update_atom_details(self, atom_detail):
return self._run_with_process_lock("atom",
self._save_atom_details,
atom_detail,
ignore_missing=False)
def _get_atom_details(self, uuid, lock=True):
def _get():
ad_path = os.path.join(self._atom_path, uuid)
ad_data = misc.decode_json(self._read_from(ad_path))
ad_cls = logbook.atom_detail_class(ad_data['type'])
return ad_cls.from_dict(ad_data['atom'])
if lock:
return self._run_with_process_lock('atom', _get)
else:
return _get()
def _get_flow_details(self, uuid, lock=True):
def _get():
fd_path = os.path.join(self._flow_path, uuid)
meta_path = os.path.join(fd_path, 'metadata')
meta = misc.decode_json(self._read_from(meta_path))
fd = logbook.FlowDetail.from_dict(meta)
ad_to_load = []
ad_path = os.path.join(fd_path, 'atoms')
try:
ad_to_load = [f for f in os.listdir(ad_path)
if os.path.islink(os.path.join(ad_path, f))]
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise
for ad_uuid in ad_to_load:
fd.add(self._get_atom_details(ad_uuid))
return fd
if lock:
return self._run_with_process_lock('flow', _get)
else:
return _get()
def _save_atoms_and_link(self, atom_details, local_atom_path):
for atom_detail in atom_details:
self._save_atom_details(atom_detail, ignore_missing=True)
src_ad_path = os.path.join(self._atom_path, atom_detail.uuid)
target_ad_path = os.path.join(local_atom_path, atom_detail.uuid)
try:
os.symlink(src_ad_path, target_ad_path)
except EnvironmentError as e:
os.symlink(src_path, dest_path)
except OSError as e:
if e.errno != errno.EEXIST:
raise
def _save_flow_details(self, flow_detail, ignore_missing):
# See if we have an existing flow detail to merge with.
e_fd = None
try:
e_fd = self._get_flow_details(flow_detail.uuid, lock=False)
except EnvironmentError:
if not ignore_missing:
raise exc.NotFound("No flow details found with id: %s"
% flow_detail.uuid)
if e_fd is not None:
e_fd = e_fd.merge(flow_detail)
for ad in flow_detail:
if e_fd.find(ad.uuid) is None:
e_fd.add(ad)
flow_detail = e_fd
flow_path = os.path.join(self._flow_path, flow_detail.uuid)
misc.ensure_tree(flow_path)
self._write_to(os.path.join(flow_path, 'metadata'),
jsonutils.dumps(flow_detail.to_dict()))
if len(flow_detail):
atom_path = os.path.join(flow_path, 'atoms')
misc.ensure_tree(atom_path)
self._run_with_process_lock('atom',
self._save_atoms_and_link,
list(flow_detail), atom_path)
return flow_detail
@contextlib.contextmanager
def _transaction(self):
"""This just wraps a global write-lock."""
lock = self.backend.lock.write_lock
with lock():
yield
def update_flow_details(self, flow_detail):
return self._run_with_process_lock("flow",
self._save_flow_details,
flow_detail,
ignore_missing=False)
def _save_flows_and_link(self, flow_details, local_flow_path):
for flow_detail in flow_details:
self._save_flow_details(flow_detail, ignore_missing=True)
src_fd_path = os.path.join(self._flow_path, flow_detail.uuid)
target_fd_path = os.path.join(local_flow_path, flow_detail.uuid)
try:
os.symlink(src_fd_path, target_fd_path)
except EnvironmentError as e:
if e.errno != errno.EEXIST:
raise
def _save_logbook(self, book):
# See if we have an existing logbook to merge with.
e_lb = None
try:
e_lb = self._get_logbook(book.uuid)
except exc.NotFound:
pass
if e_lb is not None:
e_lb = e_lb.merge(book)
for fd in book:
if e_lb.find(fd.uuid) is None:
e_lb.add(fd)
book = e_lb
book_path = os.path.join(self._book_path, book.uuid)
misc.ensure_tree(book_path)
self._write_to(os.path.join(book_path, 'metadata'),
jsonutils.dumps(book.to_dict(marshal_time=True)))
if len(book):
flow_path = os.path.join(book_path, 'flows')
misc.ensure_tree(flow_path)
self._run_with_process_lock('flow',
self._save_flows_and_link,
list(book), flow_path)
return book
def save_logbook(self, book):
return self._run_with_process_lock("book",
self._save_logbook, book)
def upgrade(self):
def _step_create():
for path in (self._book_path, self._flow_path, self._atom_path):
try:
misc.ensure_tree(path)
except EnvironmentError as e:
raise exc.StorageFailure("Unable to create logbooks"
" required child path %s" % path,
e)
for path in (self._backend.base_path, self._backend.lock_path):
try:
misc.ensure_tree(path)
except EnvironmentError as e:
raise exc.StorageFailure("Unable to create logbooks required"
" path %s" % path, e)
self._run_with_process_lock("init", _step_create)
def clear_all(self):
def _step_clear():
for d in (self._book_path, self._flow_path, self._atom_path):
if os.path.isdir(d):
shutil.rmtree(d)
def _step_atom():
self._run_with_process_lock("atom", _step_clear)
def _step_flow():
self._run_with_process_lock("flow", _step_atom)
def _step_book():
self._run_with_process_lock("book", _step_flow)
# Acquire all locks by going through this little hierarchy.
self._run_with_process_lock("init", _step_book)
def destroy_logbook(self, book_uuid):
def _destroy_atoms(atom_details):
for atom_detail in atom_details:
atom_path = os.path.join(self._atom_path, atom_detail.uuid)
try:
shutil.rmtree(atom_path)
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise exc.StorageFailure("Unable to remove atom"
" directory %s" % atom_path,
e)
def _destroy_flows(flow_details):
for flow_detail in flow_details:
flow_path = os.path.join(self._flow_path, flow_detail.uuid)
self._run_with_process_lock("atom", _destroy_atoms,
list(flow_detail))
try:
shutil.rmtree(flow_path)
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise exc.StorageFailure("Unable to remove flow"
" directory %s" % flow_path,
e)
def _destroy_book():
book = self._get_logbook(book_uuid)
book_path = os.path.join(self._book_path, book.uuid)
self._run_with_process_lock("flow", _destroy_flows, list(book))
try:
shutil.rmtree(book_path)
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise exc.StorageFailure("Unable to remove book"
" directory %s" % book_path, e)
# Acquire all locks by going through this little hierarchy.
self._run_with_process_lock("book", _destroy_book)
def _get_logbook(self, book_uuid):
book_path = os.path.join(self._book_path, book_uuid)
meta_path = os.path.join(book_path, 'metadata')
try:
meta = misc.decode_json(self._read_from(meta_path))
except EnvironmentError as e:
if e.errno == errno.ENOENT:
raise exc.NotFound("No logbook found with id: %s" % book_uuid)
else:
raise
lb = logbook.LogBook.from_dict(meta, unmarshal_time=True)
fd_path = os.path.join(book_path, 'flows')
fd_uuids = []
try:
fd_uuids = [f for f in os.listdir(fd_path)
if os.path.islink(os.path.join(fd_path, f))]
except EnvironmentError as e:
if e.errno != errno.ENOENT:
raise
for fd_uuid in fd_uuids:
lb.add(self._get_flow_details(fd_uuid))
return lb
def get_logbook(self, book_uuid):
return self._run_with_process_lock("book",
self._get_logbook, book_uuid)
def validate(self):
with _storagefailure_wrapper():
for p in (self.flow_path, self.atom_path, self.book_path):
if not os.path.isdir(p):
raise RuntimeError("Missing required directory: %s" % (p))
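
To tie the directory backend options together, a small sketch of constructing it directly; the path is hypothetical and the connection methods are assumed to follow the generic connection contract (``upgrade`` prepares storage, ``validate`` checks it)::

    import contextlib

    from taskflow.persistence.backends import impl_dir

    conf = {
        "path": "/tmp/taskflow",   # root directory for books/flows/atoms
        "max_cache_size": 1024,    # optional LRU bound on the file cache
        "encoding": "utf-8",       # optional, this is already the default
    }
    backend = impl_dir.DirBackend(conf)
    with contextlib.closing(backend.get_connection()) as conn:
        conn.upgrade()    # create the required directory hierarchy
        conn.validate()   # verify the directories now exist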

Some files were not shown because too many files have changed in this diff.