Merge "Check documentation for simple style requirements"

commit 258e009ae3
@@ -11,10 +11,10 @@ Atom Arguments and Results
 In taskflow, all flow and task state goes to (potentially persistent) storage.
 That includes all the information that :doc:`atoms <atoms>` (e.g. tasks) in the
 flow need when they are executed, and all the information task produces (via
-serializable task results). A developer who implements tasks or flows can specify
-what arguments a task accepts and what result it returns in several ways. This
-document will help you understand what those ways are and how to use those ways
-to accomplish your desired usage pattern.
+serializable task results). A developer who implements tasks or flows can
+specify what arguments a task accepts and what result it returns in several
+ways. This document will help you understand what those ways are and how to use
+those ways to accomplish your desired usage pattern.

 .. glossary::

@@ -191,11 +191,11 @@ Results Specification
 =====================

 In python, function results are not named, so we can not infer what a task
-returns. This is important since the complete task result (what the |task.execute|
-method returns) is saved in (potentially persistent) storage, and it is
-typically (but not always) desirable to make those results accessible to other
-tasks. To accomplish this the task specifies names of those values via its
-``provides`` task constructor parameter or other method (see below).
+returns. This is important since the complete task result (what the
+|task.execute| method returns) is saved in (potentially persistent) storage,
+and it is typically (but not always) desirable to make those results accessible
+to other tasks. To accomplish this the task specifies names of those values via
+its ``provides`` task constructor parameter or other method (see below).

 Returning One Value
 -------------------
@@ -267,7 +267,8 @@ Another option is to return several values as a dictionary (aka a ``dict``).
                 'pieces': 'PIECEs'
             }

-TaskFlow expects that a dict will be returned if ``provides`` argument is a ``set``:
+TaskFlow expects that a dict will be returned if ``provides`` argument is a
+``set``:

 ::

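The ``provides`` mechanism this section documents can be sketched in plain Python. This is an illustrative sketch only (``bind_result`` is a hypothetical helper written for this example, not part of TaskFlow): it shows how a returned value could be bound to declared names for a ``set``, a name sequence, a single name, or no ``provides`` at all.

```python
# Illustrative sketch, not TaskFlow's implementation: binding a task's
# returned value(s) to the names declared via ``provides``.

def bind_result(provides, result):
    """Return a name -> value mapping for a task result."""
    if isinstance(provides, set):
        # A set of names requires the task to return a dict keyed by them.
        return {name: result[name] for name in provides}
    if isinstance(provides, (tuple, list)):
        # A sequence of names unpacks a returned tuple positionally.
        return dict(zip(provides, result))
    if provides is not None:
        # A single name stores the whole result under that name.
        return {provides: result}
    # No provides: the result stays hidden from other tasks.
    return {}

print(bind_result({'bits', 'pieces'}, {'bits': 'BITs', 'pieces': 'PIECEs'}))
```

This mirrors the ``BitsAndPiecesTask`` examples in the surrounding hunks: ``provides=('b', 'p')`` unpacks the tuple, while ``provides='bnp'`` stores the whole tuple under one name.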
@@ -314,15 +315,15 @@ Of course, the flow author can override this to change names if needed:

     BitsAndPiecesTask(provides=('b', 'p'))

-or to change structure -- e.g. this instance will make whole tuple accessible to
-other tasks by name 'bnp':
+or to change structure -- e.g. this instance will make whole tuple accessible
+to other tasks by name 'bnp':

 ::

     BitsAndPiecesTask(provides='bnp')

-or the flow author may want to return default behavior and hide the results of the
-task from other tasks in the flow (e.g. to avoid naming conflicts):
+or the flow author may want to return default behavior and hide the results of
+the task from other tasks in the flow (e.g. to avoid naming conflicts):

 ::

@@ -339,7 +340,8 @@ For ``result`` value, two cases are possible:

 * if task is being reverted because it failed (an exception was raised from its
   |task.execute| method), ``result`` value is instance of
-  :py:class:`taskflow.utils.misc.Failure` object that holds exception information;
+  :py:class:`taskflow.utils.misc.Failure` object that holds exception
+  information;

 * if task is being reverted because some other task failed, and this task
   finished successfully, ``result`` value is task result fetched from storage:
@@ -360,7 +362,8 @@ To determine if task failed you can check whether ``result`` is instance of

         def revert(self, result, spam, eggs):
             if isinstance(result, misc.Failure):
-                print("This task failed, exception: %s" % result.exception_str)
+                print("This task failed, exception: %s"
+                      % result.exception_str)
             else:
                 print("do_something returned %r" % result)

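The ``isinstance`` check in that hunk can be exercised in a self-contained sketch. The ``Failure`` class below is a hypothetical stand-in for ``taskflow.utils.misc.Failure`` (only the ``exception_str`` attribute used above is modeled), so the example runs without TaskFlow installed:

```python
# Illustrative sketch: distinguishing a failed execute() from a successful one
# inside revert(). ``Failure`` is a minimal stand-in for
# taskflow.utils.misc.Failure, modeling only the attribute used here.

class Failure(object):
    def __init__(self, exception_str):
        self.exception_str = exception_str

def revert(result):
    # A Failure means this task's own execute() raised; anything else is the
    # previously stored task result fetched from storage.
    if isinstance(result, Failure):
        return "This task failed, exception: %s" % result.exception_str
    return "do_something returned %r" % result

print(revert(Failure("ValueError: bad spam")))
print(revert(42))
```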
@@ -372,9 +375,10 @@ representation of result.
 Retry Arguments
 ===============

-A Retry controller works with arguments in the same way as a Task. But it has an additional parameter 'history' that is
-a list of tuples. Each tuple contains a result of the previous Retry run and a table where a key is a failed task and a value
-is a :py:class:`taskflow.utils.misc.Failure`.
+A Retry controller works with arguments in the same way as a Task. But it has
+an additional parameter 'history' that is a list of tuples. Each tuple contains
+a result of the previous Retry run and a table where a key is a failed task and
+a value is a :py:class:`taskflow.utils.misc.Failure`.

 Consider the following Retry::

@@ -393,15 +397,18 @@ Consider the following Retry::
         def revert(self, history, *args, **kwargs):
             print history

-Imagine the following Retry had returned a value '5' and then some task 'A' failed with some exception.
-In this case ``on_failure`` method will receive the following history::
+Imagine the following Retry had returned a value '5' and then some task 'A'
+failed with some exception. In this case ``on_failure`` method will receive
+the following history::

     [('5', {'A': misc.Failure()})]

-Then the |retry.execute| method will be called again and it'll receive the same history.
+Then the |retry.execute| method will be called again and it'll receive the same
+history.

-If the |retry.execute| method raises an exception, the |retry.revert| method of Retry will be called and :py:class:`taskflow.utils.misc.Failure` object will be present
-in the history instead of Retry result::
+If the |retry.execute| method raises an exception, the |retry.revert| method of
+Retry will be called and :py:class:`taskflow.utils.misc.Failure` object will be
+present in the history instead of Retry result::

     [('5', {'A': misc.Failure()}), (misc.Failure(), {})]

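The history structure described in that hunk can be built and inspected in plain Python. Here ``failure`` is a plain placeholder object standing in for a ``taskflow.utils.misc.Failure`` instance, so the shapes match the two literal examples above without importing TaskFlow:

```python
# Illustrative sketch of the history a Retry receives: a list of
# (previous_result, {failed_task_name: failure}) tuples.

failure = object()  # placeholder for a taskflow.utils.misc.Failure instance

history = [
    ('5', {'A': failure}),  # first run returned '5', then task 'A' failed
    (failure, {}),          # the Retry's own execute() raised afterwards
]

# Each entry unpacks into the prior result and a table of failed tasks.
last_result, failed_tasks = history[0]
assert last_result == '5'
assert list(failed_tasks) == ['A']
```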
@@ -7,8 +7,8 @@ Overview

 Engines are what **really** runs your atoms.

-An *engine* takes a flow structure (described by :doc:`patterns <patterns>`) and
-uses it to decide which :doc:`atom <atoms>` to run and when.
+An *engine* takes a flow structure (described by :doc:`patterns <patterns>`)
+and uses it to decide which :doc:`atom <atoms>` to run and when.

 TaskFlow provides different implementations of engines. Some may be easier to
 use (ie, require no additional infrastructure setup) and understand; others
@@ -18,10 +18,10 @@ select an engine that suites their setup best without modifying the code of
 said service.

 Engines usually have different capabilities and configuration, but all of them
-**must** implement the same interface and preserve the semantics of patterns (e.g.
-parts of :py:class:`linear flow <taskflow.patterns.linear_flow.Flow>` are run
-one after another, in order, even if engine is *capable* of running tasks in
-parallel).
+**must** implement the same interface and preserve the semantics of patterns
+(e.g. parts of :py:class:`linear flow <taskflow.patterns.linear_flow.Flow>`
+are run one after another, in order, even if engine is *capable* of running
+tasks in parallel).

 Why they exist
 --------------
@@ -31,66 +31,71 @@ likely a new concept for many programmers so let's describe how it operates in
 more depth and some of the reasoning behind why it exists. This will hopefully
 make it more clear on there value add to the TaskFlow library user.

-First though let us discuss something most are familiar already with; the difference
-between `declarative`_ and `imperative`_ programming models. The imperative model
-involves establishing statements that accomplish a programs action (likely using
-conditionals and such other language features to do this). This kind of program embeds
-the *how* to accomplish a goal while also defining *what* the goal actually is (and the state
-of this is maintained in memory or on the stack while these statements execute). In contrast
-there is the the declarative model which instead of combining the *how* to accomplish a goal
-along side the *what* is to be accomplished splits these two into only declaring what
-the intended goal is and not the *how*. In TaskFlow terminology the *what* is the structure
-of your flows and the tasks and other atoms you have inside those flows, but the *how*
-is not defined (the line becomes blurred since tasks themselves contain imperative
-code, but for now consider a task as more of a *pure* function that executes, reverts and may
-require inputs and provide outputs). This is where engines get involved; they do
-the execution of the *what* defined via :doc:`atoms <atoms>`, tasks, flows and
-the relationships defined there-in and execute these in a well-defined
-manner (and the engine is responsible for *most* of the state manipulation
-instead).
+First though let us discuss something most are familiar already with; the
+difference between `declarative`_ and `imperative`_ programming models. The
+imperative model involves establishing statements that accomplish a programs
+action (likely using conditionals and such other language features to do this).
+This kind of program embeds the *how* to accomplish a goal while also defining
+*what* the goal actually is (and the state of this is maintained in memory or
+on the stack while these statements execute). In contrast there is the the
+declarative model which instead of combining the *how* to accomplish a goal
+along side the *what* is to be accomplished splits these two into only
+declaring what the intended goal is and not the *how*. In TaskFlow terminology
+the *what* is the structure of your flows and the tasks and other atoms you
+have inside those flows, but the *how* is not defined (the line becomes blurred
+since tasks themselves contain imperative code, but for now consider a task as
+more of a *pure* function that executes, reverts and may require inputs and
+provide outputs). This is where engines get involved; they do the execution of
+the *what* defined via :doc:`atoms <atoms>`, tasks, flows and the relationships
+defined there-in and execute these in a well-defined manner (and the engine is
+responsible for *most* of the state manipulation instead).

 This mix of imperative and declarative (with a stronger emphasis on the
 declarative model) allows for the following functionality to be possible:

-* Enhancing reliability: Decoupling of state alterations from what should be accomplished
-  allows for a *natural* way of resuming by allowing the engine to track the current state
-  and know at which point a flow is in and how to get back into that state when
-  resumption occurs.
-* Enhancing scalability: When a engine is responsible for executing your desired work
-  it becomes possible to alter the *how* in the future by creating new types of execution
-  backends (for example the worker model which does not execute locally). Without the decoupling
-  of the *what* and the *how* it is not possible to provide such a feature (since by the very
-  nature of that coupling this kind of functionality is inherently hard to provide).
-* Enhancing consistency: Since the engine is responsible for executing atoms and the
-  associated workflow, it can be one (if not the only) of the primary entities
-  that is working to keep the execution model in a consistent state. Coupled with atoms
-  which *should* be immutable and have have limited (if any) internal state the
-  ability to reason about and obtain consistency can be vastly improved.
+* Enhancing reliability: Decoupling of state alterations from what should be
+  accomplished allows for a *natural* way of resuming by allowing the engine to
+  track the current state and know at which point a flow is in and how to get
+  back into that state when resumption occurs.
+* Enhancing scalability: When a engine is responsible for executing your
+  desired work it becomes possible to alter the *how* in the future by creating
+  new types of execution backends (for example the worker model which does not
+  execute locally). Without the decoupling of the *what* and the *how* it is
+  not possible to provide such a feature (since by the very nature of that
+  coupling this kind of functionality is inherently hard to provide).
+* Enhancing consistency: Since the engine is responsible for executing atoms
+  and the associated workflow, it can be one (if not the only) of the primary
+  entities that is working to keep the execution model in a consistent state.
+  Coupled with atoms which *should* be immutable and have have limited (if any)
+  internal state the ability to reason about and obtain consistency can be
+  vastly improved.

-* With future features around locking (using `tooz`_ to help) engines can also
-  help ensure that resources being accessed by tasks are reliably obtained and
-  mutated on. This will help ensure that other processes, threads, or other types
-  of entities are also not executing tasks that manipulate those same resources (further
-  increasing consistency).
+* With future features around locking (using `tooz`_ to help) engines can
+  also help ensure that resources being accessed by tasks are reliably
+  obtained and mutated on. This will help ensure that other processes,
+  threads, or other types of entities are also not executing tasks that
+  manipulate those same resources (further increasing consistency).

 Of course these kind of features can come with some drawbacks:

-* The downside of decoupling the *how* and the *what* is that the imperative model
-  where functions control & manipulate state must start to be shifted away from
-  (and this is likely a mindset change for programmers used to the imperative
-  model). We have worked to make this less of a concern by creating and
-  encouraging the usage of :doc:`persistence <persistence>`, to help make it possible
-  to have some level of provided state transfer mechanism.
-* Depending on how much imperative code exists (and state inside that code) there
-  can be *significant* rework of that code and converting or refactoring it to these new concepts.
-  We have tried to help here by allowing you to have tasks that internally use regular python
-  code (and internally can be written in an imperative style) as well as by providing examples
-  and these developer docs; helping this process be as seamless as possible.
-* Another one of the downsides of decoupling the *what* from the *how* is that it may become
-  harder to use traditional techniques to debug failures (especially if remote workers are
-  involved). We try to help here by making it easy to track, monitor and introspect
-  the actions & state changes that are occurring inside an engine (see
-  :doc:`notifications <notifications>` for how to use some of these capabilities).
+* The downside of decoupling the *how* and the *what* is that the imperative
+  model where functions control & manipulate state must start to be shifted
+  away from (and this is likely a mindset change for programmers used to the
+  imperative model). We have worked to make this less of a concern by creating
+  and encouraging the usage of :doc:`persistence <persistence>`, to help make
+  it possible to have some level of provided state transfer mechanism.
+* Depending on how much imperative code exists (and state inside that code)
+  there can be *significant* rework of that code and converting or refactoring
+  it to these new concepts. We have tried to help here by allowing you to have
+  tasks that internally use regular python code (and internally can be written
+  in an imperative style) as well as by providing examples and these developer
+  docs; helping this process be as seamless as possible.
+* Another one of the downsides of decoupling the *what* from the *how* is that
+  it may become harder to use traditional techniques to debug failures
+  (especially if remote workers are involved). We try to help here by making it
+  easy to track, monitor and introspect the actions & state changes that are
+  occurring inside an engine (see :doc:`notifications <notifications>` for how
+  to use some of these capabilities).

 .. _declarative: http://en.wikipedia.org/wiki/Declarative_programming
 .. _imperative: http://en.wikipedia.org/wiki/Imperative_programming
@@ -112,7 +117,8 @@ might look like::

     ...
     flow = make_flow()
-    engine = engines.load(flow, engine_conf=my_conf, backend=my_persistence_conf)
+    engine = engines.load(flow, engine_conf=my_conf,
+                          backend=my_persistence_conf)
     engine.run


@@ -153,10 +159,10 @@ Parallel engine schedules tasks onto different threads to run them in parallel.
 Additional supported keyword arguments:

 * ``executor``: a object that implements a :pep:`3148` compatible `executor`_
-  interface; it will be used for scheduling tasks. You can use instances
-  of a `thread pool executor`_ or a
-  :py:class:`green executor <taskflow.utils.eventlet_utils.GreenExecutor>`
-  (which internally uses `eventlet <http://eventlet.net/>`_ and greenthread pools).
+  interface; it will be used for scheduling tasks. You can use instances of a
+  `thread pool executor`_ or a :py:class:`green executor
+  <taskflow.utils.eventlet_utils.GreenExecutor>` (which internally uses
+  `eventlet <http://eventlet.net/>`_ and greenthread pools).

 .. tip::

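The executor interface that hunk refers to is the :pep:`3148` one implemented by the stdlib's ``concurrent.futures``. A minimal, self-contained sketch of that interface (the task functions are hypothetical placeholders, not TaskFlow atoms):

```python
# Sketch of the PEP 3148 executor interface the parallel engine relies on:
# submit() returns a future whose result() blocks until the task completes.
from concurrent.futures import ThreadPoolExecutor

def task_a():
    return 'a-result'

def task_b():
    return 'b-result'

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(task_a), executor.submit(task_b)]
    # Iterating the submission list preserves submission order regardless of
    # which thread finishes first.
    results = [f.result() for f in futures]

print(results)  # ['a-result', 'b-result']
```

Any object exposing this ``submit``/future contract (such as a green executor) can be slotted in the same way.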
@@ -190,22 +196,23 @@ Creation
 The first thing that occurs is that the user creates an engine for a given
 flow, providing a flow detail (where results will be saved into a provided
 :doc:`persistence <persistence>` backend). This is typically accomplished via
-the methods described above in `creating engines`_. The engine at this point now will
-have references to your flow and backends and other internal variables are
-setup.
+the methods described above in `creating engines`_. The engine at this point
+now will have references to your flow and backends and other internal variables
+are setup.

 Compiling
 ---------

-During this stage the flow will be converted into an internal graph representation
-using a flow :py:func:`~taskflow.utils.flow_utils.flatten` function. This function
-converts the flow objects and contained atoms into a `networkx`_ directed graph that
-contains the equivalent atoms defined in the flow and any nested flows & atoms as
-well as the constraints that are created by the application of the different flow
-patterns. This graph is then what will be analyzed & traversed during the engines
-execution. At this point a few helper object are also created and saved to
-internal engine variables (these object help in execution of atoms, analyzing
-the graph and performing other internal engine activities).
+During this stage the flow will be converted into an internal graph
+representation using a flow :py:func:`~taskflow.utils.flow_utils.flatten`
+function. This function converts the flow objects and contained atoms into a
+`networkx`_ directed graph that contains the equivalent atoms defined in the
+flow and any nested flows & atoms as well as the constraints that are created
+by the application of the different flow patterns. This graph is then what will
+be analyzed & traversed during the engines execution. At this point a few
+helper object are also created and saved to internal engine variables (these
+object help in execution of atoms, analyzing the graph and performing other
+internal engine activities).

 Preparation
 -----------
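The flattening step the compiling section describes can be sketched without TaskFlow or networkx. This toy ``flatten`` is not ``taskflow.utils.flow_utils.flatten``; it only illustrates the idea of collapsing nested (linear) flows into one edge list of ordering constraints:

```python
# Illustrative sketch: recursively flatten a nested linear flow into the edge
# list of a directed graph. A 'flow' here is just a list whose items are atom
# names (strings) or nested lists (sub-flows) -- toy stand-ins for real
# flow objects.

def flatten(flow):
    """Yield (predecessor, successor) edges for a nested linear flow."""
    def leaves(item):
        if isinstance(item, list):
            for sub in item:
                for leaf in leaves(sub):
                    yield leaf
        else:
            yield item

    atoms = list(leaves(flow))
    # Linear-flow semantics: each atom must run after the previous one.
    return list(zip(atoms, atoms[1:]))

edges = flatten(['fetch', ['transform', 'validate'], 'store'])
print(edges)  # [('fetch', 'transform'), ('transform', 'validate'), ('validate', 'store')]
```

Other patterns (unordered, graph) would contribute different constraint edges, but the compilation output is the same kind of directed graph.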
@@ -213,35 +220,37 @@ Preparation
 This stage starts by setting up the storage needed for all atoms in the
 previously created graph, ensuring that corresponding
 :py:class:`~taskflow.persistence.logbook.AtomDetail` (or subclass of) objects
-are created for each node in the graph. Once this is done final validation occurs
-on the requirements that are needed to start execution and what storage provides.
-If there is any atom or flow requirements not satisfied then execution will not be
-allowed to continue.
+are created for each node in the graph. Once this is done final validation
+occurs on the requirements that are needed to start execution and what storage
+provides. If there is any atom or flow requirements not satisfied then
+execution will not be allowed to continue.

 Execution
 ---------

-The graph (and helper objects) previously created are now used for guiding further
-execution. The flow is put into the ``RUNNING`` :doc:`state <states>` and a
+The graph (and helper objects) previously created are now used for guiding
+further execution. The flow is put into the ``RUNNING`` :doc:`state <states>`
+and a
 :py:class:`~taskflow.engines.action_engine.graph_action.FutureGraphAction`
 object starts to take over and begins going through the stages listed below.

 Resumption
 ^^^^^^^^^^

-One of the first stages is to analyze the :doc:`state <states>` of the tasks in the graph,
-determining which ones have failed, which one were previously running and
-determining what the intention of that task should now be (typically an
-intention can be that it should ``REVERT``, or that it should ``EXECUTE`` or
-that it should be ``IGNORED``). This intention is determined by analyzing the
-current state of the task; which is determined by looking at the state in the task
-detail object for that task and analyzing edges of the graph for things like
-retry atom which can influence what a tasks intention should be (this is aided
-by the usage of the :py:class:`~taskflow.engines.action_engine.graph_analyzer.GraphAnalyzer`
-helper object which was designed to provide helper methods for this analysis). Once
+One of the first stages is to analyze the :doc:`state <states>` of the tasks in
+the graph, determining which ones have failed, which one were previously
+running and determining what the intention of that task should now be
+(typically an intention can be that it should ``REVERT``, or that it should
+``EXECUTE`` or that it should be ``IGNORED``). This intention is determined by
+analyzing the current state of the task; which is determined by looking at the
+state in the task detail object for that task and analyzing edges of the graph
+for things like retry atom which can influence what a tasks intention should be
+(this is aided by the usage of the
+:py:class:`~taskflow.engines.action_engine.graph_analyzer.GraphAnalyzer` helper
+object which was designed to provide helper methods for this analysis). Once
 these intentions are determined and associated with each task (the intention is
-also stored in the :py:class:`~taskflow.persistence.logbook.AtomDetail` object) the
-scheduling stage starts.
+also stored in the :py:class:`~taskflow.persistence.logbook.AtomDetail` object)
+the scheduling stage starts.

 Scheduling
 ^^^^^^^^^^
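The intention assignment the resumption section describes can be sketched as a simple mapping from a task's persisted state to its next intention. The state and intention names mirror the ones used above; the mapping itself is a deliberate simplification of what the engine and ``GraphAnalyzer`` actually do (which also considers graph edges and retry atoms):

```python
# Illustrative sketch of intention selection during resumption: decide what
# to do with each task based on the state recovered from its task detail.

def intention_for(state):
    if state == 'FAILURE':
        return 'REVERT'
    if state in ('SUCCESS', 'REVERTED'):
        return 'IGNORED'
    # PENDING or interrupted RUNNING tasks get (re)executed on resumption.
    return 'EXECUTE'

states = {'load': 'SUCCESS', 'transform': 'RUNNING', 'store': 'PENDING'}
intentions = {name: intention_for(s) for name, s in states.items()}
print(intentions)
```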
@@ -250,25 +259,26 @@ This stage selects which atoms are eligible to run (looking at there intention,
 checking if predecessor atoms have ran and so-on, again using the
 :py:class:`~taskflow.engines.action_engine.graph_analyzer.GraphAnalyzer` helper
 object) and submits those atoms to a previously provided compatible
-`executor`_ for asynchronous execution. This executor will return a `future`_ object
-for each atom submitted; all of which are collected into a list of not done
-futures. This will end the initial round of scheduling and at this point the
-engine enters the waiting stage.
+`executor`_ for asynchronous execution. This executor will return a `future`_
+object for each atom submitted; all of which are collected into a list of not
+done futures. This will end the initial round of scheduling and at this point
+the engine enters the waiting stage.

 Waiting
 ^^^^^^^

-In this stage the engine waits for any of the future objects previously submitted
-to complete. Once one of the future objects completes (or fails) that atoms result
-will be examined and persisted to the persistence backend (saved into the
-corresponding :py:class:`~taskflow.persistence.logbook.AtomDetail` object) and
-the state of the atom is changed. At this point what happens falls into two categories,
-one for if that atom failed and one for if it did not. If the atom failed it may
-be set to a new intention such as ``RETRY`` or ``REVERT`` (other atoms that were
-predecessors of this failing atom may also have there intention altered). Once this
-intention adjustment has happened a new round of scheduling occurs and this process
-repeats until the engine succeeds or fails (if the process running the engine
-dies the above stages will be restarted and resuming will occur).
+In this stage the engine waits for any of the future objects previously
+submitted to complete. Once one of the future objects completes (or fails) that
+atoms result will be examined and persisted to the persistence backend (saved
+into the corresponding :py:class:`~taskflow.persistence.logbook.AtomDetail`
+object) and the state of the atom is changed. At this point what happens falls
+into two categories, one for if that atom failed and one for if it did not. If
+the atom failed it may be set to a new intention such as ``RETRY`` or
+``REVERT`` (other atoms that were predecessors of this failing atom may also
+have there intention altered). Once this intention adjustment has happened a
+new round of scheduling occurs and this process repeats until the engine
+succeeds or fails (if the process running the engine dies the above stages will
+be restarted and resuming will occur).

 .. note::

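The schedule/wait loop those two sections describe can be sketched with the stdlib futures machinery: submit work, wait for the first completion, record its result, then repeat until no futures remain. The task functions here are hypothetical placeholders; a real engine would also persist results and adjust intentions between rounds:

```python
# Sketch of the scheduling + waiting loop: collect not-done futures, wait for
# any to complete, harvest the completed ones, repeat until the set is empty.
import concurrent.futures as cf

def run_all(task_funcs):
    results = {}
    with cf.ThreadPoolExecutor(max_workers=2) as executor:
        # Initial round of scheduling: one future per submitted atom.
        not_done = {executor.submit(fn): name for name, fn in task_funcs.items()}
        while not_done:
            done, _ = cf.wait(not_done, return_when=cf.FIRST_COMPLETED)
            for future in done:
                name = not_done.pop(future)
                # A real engine would persist this result (or failure) here and
                # possibly alter intentions before scheduling more atoms.
                results[name] = future.result()
    return results

print(run_all({'a': lambda: 1, 'b': lambda: 2}))
```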
@@ -280,15 +290,17 @@ dies the above stages will be restarted and resuming will occur).
 Finishing
 ---------

-At this point the :py:class:`~taskflow.engines.action_engine.graph_action.FutureGraphAction`
-has now finished successfully, failed, or the execution was suspended. Depending
-on which one of these occurs will cause the flow to enter a new state (typically one
-of ``FAILURE``, ``SUSPENDED``, ``SUCCESS`` or ``REVERTED``). :doc:`Notifications <notifications>`
-will be sent out about this final state change (other state changes also send out notifications)
-and any failures that occurred will be reraised (the failure objects are wrapped
-exceptions). If no failures have occurred then the engine will have finished and
-if so desired the :doc:`persistence <persistence>` can be used to cleanup any
-details that were saved for this execution.
+At this point the
+:py:class:`~taskflow.engines.action_engine.graph_action.FutureGraphAction` has
+now finished successfully, failed, or the execution was suspended. Depending on
+which one of these occurs will cause the flow to enter a new state (typically
+one of ``FAILURE``, ``SUSPENDED``, ``SUCCESS`` or ``REVERTED``).
+:doc:`Notifications <notifications>` will be sent out about this final state
+change (other state changes also send out notifications) and any failures that
+occurred will be reraised (the failure objects are wrapped exceptions). If no
+failures have occurred then the engine will have finished and if so desired the
+:doc:`persistence <persistence>` can be used to cleanup any details that were
+saved for this execution.

 Interfaces
 ==========
@@ -6,7 +6,8 @@ easy, consistent, and reliable.*

 .. note::

-    Additional documentation is also hosted on wiki: https://wiki.openstack.org/wiki/TaskFlow
+    Additional documentation is also hosted on wiki:
+    https://wiki.openstack.org/wiki/TaskFlow

 Contents
 ========
@@ -4,19 +4,19 @@ Inputs and Outputs

 In TaskFlow there are multiple ways to provide inputs for your tasks and flows
 and get information from them. This document describes one of them, that
-involves task arguments and results. There are also
-:doc:`notifications <notifications>`, which allow you to get notified when task
-or flow changed state. You may also opt to use the :doc:`persistence <persistence>`
-layer itself directly.
+involves task arguments and results. There are also :doc:`notifications
+<notifications>`, which allow you to get notified when task or flow changed
+state. You may also opt to use the :doc:`persistence <persistence>` layer
+itself directly.

 -----------------------
 Flow Inputs and Outputs
 -----------------------

 Tasks accept inputs via task arguments and provide outputs via task results
-(see :doc:`arguments and results <arguments_and_results>` for more details). This
-is the standard and recommended way to pass data from one task to another. Of
-course not every task argument needs to be provided to some other task of a
+(see :doc:`arguments and results <arguments_and_results>` for more details).
+This is the standard and recommended way to pass data from one task to another.
+Of course not every task argument needs to be provided to some other task of a
 flow, and not every task result should be consumed by every task.

 If some value is required by one or more tasks of a flow, but is not provided
@@ -54,10 +54,12 @@ For example:

 .. make vim syntax highlighter happy**

-As you can see, this flow does not require b, as it is provided by the fist task.
+As you can see, this flow does not require b, as it is provided by the fist
+task.

 .. note::
-    There is no difference between processing of Task and Retry inputs and outputs.
+    There is no difference between processing of Task and Retry inputs
+    and outputs.

 ------------------
 Engine and Storage
@ -93,7 +95,8 @@ prior to running:
|
||||
>>> engines.run(flo)
|
||||
Traceback (most recent call last):
|
||||
...
|
||||
taskflow.exceptions.MissingDependencies: taskflow.patterns.linear_flow.Flow: cat-dog;
|
||||
taskflow.exceptions.MissingDependencies:
|
||||
taskflow.patterns.linear_flow.Flow: cat-dog;
|
||||
2 requires ['meow', 'woof'] but no other entity produces said requirements
|
||||
|
||||
The recommended way to provide flow inputs is to use the ``store`` parameter
@ -120,10 +123,10 @@ of the engine helpers (:py:func:`~taskflow.engines.helpers.run` or

    woof
    {'meow': 'meow', 'woof': 'woof', 'dog': 'dog'}

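The effect of ``store`` can be pictured as pre-seeding a symbol table that
task arguments are later looked up in by name. A dict-backed sketch (a
hypothetical stand-in, not the real :py:class:`~taskflow.storage.Storage`
implementation):

```python
# Minimal sketch of "store" semantics (hypothetical class, not taskflow's
# implementation): values passed via store are injected into storage before
# the flow runs, and task arguments/results are looked up there by name.
class Storage:
    def __init__(self):
        self._results = {}

    def inject(self, pairs):      # seed the initial flow inputs
        self._results.update(pairs)

    def save(self, name, value):  # record a named task result
        self._results[name] = value

    def fetch(self, name):        # look up a single named value
        return self._results[name]

    def fetch_all(self):          # everything provided or produced so far
        return dict(self._results)

storage = Storage()
storage.inject({'meow': 'meow', 'woof': 'woof'})  # what store={...} would do
storage.save('dog', 'dog')                        # what a task result does
print(storage.fetch('woof'))  # woof
print(storage.fetch_all())    # {'meow': 'meow', 'woof': 'woof', 'dog': 'dog'}
```

This mirrors the doctest output above: injected inputs and saved task results
end up in the same namespace.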
You can also directly interact with the engine storage layer to add additional
values; note that if this route is used you can't use
:py:func:`~taskflow.engines.helpers.run` to run your engine (instead you must
invoke the engine's run method directly):

.. doctest::

@ -142,8 +145,8 @@ Outputs

As you can see from examples above, the run method returns all flow outputs in
a ``dict``. This same data can be fetched via the
:py:meth:`~taskflow.storage.Storage.fetch_all` method of the storage. You can
also get single results using :py:meth:`~taskflow.storage.Storage.fetch`.
For example:

.. doctest::


@ -6,13 +6,13 @@ Overview
========

Jobs and jobboards are a **novel** concept that taskflow provides to allow for
automatic ownership transfer of workflows between capable owners (those owners
usually then use :doc:`engines <engines>` to complete the workflow). They
provide the necessary semantics to be able to atomically transfer a job from a
producer to a consumer in a reliable and fault tolerant manner. They are
modeled off the concept used to post and acquire work in the physical world
(typically a job listing in a newspaper or online website serves a similar
role).

**TLDR:** It's similar to a queue, but consumers lock items on the queue when
claiming them, and only remove them from the queue when they're done with the
@ -25,20 +25,22 @@ Definitions
===========

Jobs
  A :py:class:`job <taskflow.jobs.job.Job>` consists of a unique identifier,
  name, and a reference to a :py:class:`logbook
  <taskflow.persistence.logbook.LogBook>` which contains the details of the
  work that has been or should be/will be completed to finish the work that
  has been created for that job.

Jobboards
  A :py:class:`jobboard <taskflow.jobs.jobboard.JobBoard>` is responsible for
  managing the posting, ownership, and delivery of jobs. It acts as the
  location where jobs can be posted, claimed and searched for; typically by
  iteration or notification. Jobboards may be backed by different *capable*
  implementations (each with potentially differing configuration) but all
  jobboards implement the same interface and semantics so that the backend
  usage is as transparent as possible. This allows deployers or developers of
  a service that uses TaskFlow to select a jobboard implementation that fits
  their setup (and their intended usage) best.

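The claim semantics described above (lock on claim, remove only on completion)
can be mimicked with a toy in-memory board; the class and method names below
are hypothetical and not the real jobboard API:

```python
# Toy sketch of jobboard claim semantics (hypothetical names, not the real
# taskflow JobBoard API): consumers lock (claim) a job rather than popping it,
# and only remove it from the board once the work is actually done.
import threading

class ToyJobBoard:
    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}  # job name -> owner (None while unclaimed)

    def post(self, name):
        with self._lock:
            self._jobs[name] = None

    def claim(self, name, owner):
        with self._lock:
            if name not in self._jobs or self._jobs[name] is not None:
                return False  # missing, or already owned by someone else
            self._jobs[name] = owner
            return True

    def abandon(self, name, owner):
        with self._lock:
            if self._jobs.get(name) == owner:
                self._jobs[name] = None  # job stays posted for other owners

    def consume(self, name, owner):
        with self._lock:
            if self._jobs.get(name) == owner:
                del self._jobs[name]  # removed only once the work is done

board = ToyJobBoard()
board.post('backup-volume-1')
print(board.claim('backup-volume-1', 'worker-a'))  # True
print(board.claim('backup-volume-1', 'worker-b'))  # False: already claimed
board.consume('backup-volume-1', 'worker-a')
```

If ``worker-a`` dies instead of consuming, abandoning (or a claim expiring)
returns the job to the board, which is what makes the hand-off fault tolerant.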
Features
========

@ -197,18 +199,19 @@ non-issues but for now they are worth mentioning.

Dual-engine jobs
----------------

**What:** Since atoms and engines are not currently `preemptable`_ we can not
force an engine (or the threads/remote workers... it is using to run) to stop
working on an atom once it has already started working on it, short of doing a
``kill -9`` on the running interpreter (it is generally bad behavior to force
code to stop without its consent anyway). This could cause problems since the
points where an engine can notice that it no longer owns a claim are the
:doc:`state <states>` changes that occur (transitioning to a new atom or
recording a result, for example); upon noticing the claim has been lost the
engine can immediately stop doing further work. The effect this causes is that
when a claim is lost another engine can immediately attempt to acquire the
claim that was previously lost and it *could* begin working on the unfinished
tasks that the former engine may also still be executing (since that engine is
not yet aware that it has lost the claim).

**TLDR:** not `preemptable`_, possible to become aware of losing a claim
after the fact (at the next state change), another engine could have acquired
@ -219,17 +222,18 @@ the claim by then, therefore both would be *working* on a job.

#. Ensure your atoms are `idempotent`_; this will allow an engine that may be
   executing the same atom to continue executing without causing any
   conflicts/problems (idempotency guarantees this).
#. On claiming jobs that have been claimed previously, enforce a policy that
   happens before the job's workflow begins to execute (possibly prior to an
   engine beginning the job's work) that ensures that any prior work has been
   rolled back before continuing rolling forward. For example:

   * Rolling back the last atom/set of atoms that finished.
   * Rolling back the last state change that occurred.

#. Delay claiming partially completed work by adding a wait period (to allow
   the previous engine to coalesce) before working on a partially completed job
   (combine this with the prior suggestions and dual-engine issues should be
   avoided).

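The first suggestion can be illustrated with a toy idempotent atom (all names
below are hypothetical): running it twice, as two engines racing over a lost
claim might, converges on the same state instead of conflicting:

```python
# Hedged sketch of an idempotent atom body (hypothetical names): executing it
# twice (e.g. by two engines that both believe they own the claim) converges
# on the same final state instead of failing or duplicating the side effect.
def ensure_volume_attached(attachments, volume, server):
    """Attach volume to server; a no-op when the attachment already exists."""
    if attachments.get(volume) == server:
        return attachments  # already done: safe to re-run
    if volume in attachments:
        raise RuntimeError('%s is attached elsewhere' % volume)
    attachments[volume] = server
    return attachments

state = {}
ensure_volume_attached(state, 'vol-1', 'srv-9')
ensure_volume_attached(state, 'vol-1', 'srv-9')  # second run changes nothing
print(state)  # {'vol-1': 'srv-9'}
```

The "check before acting" shape is what makes the re-execution harmless; an
atom that blindly performed the attachment would fail (or double-attach) on
the second run.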
.. _idempotent: http://en.wikipedia.org/wiki/Idempotence
.. _preemptable: http://en.wikipedia.org/wiki/Preemption_%28computing%29

@ -21,8 +21,8 @@ To receive these notifications you should register a callback in

Each engine provides two of them: one notifies about flow state changes,
and the other notifies about task state changes.

TaskFlow also has a set of predefined :ref:`listeners <listeners>`, and
provides means to write your own listeners, which can be more convenient than
using raw callbacks.

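The callback mechanism can be sketched roughly as follows (hypothetical names,
not the exact notifier API that engines expose):

```python
# Rough sketch of state-change notifications (hypothetical names, not the
# exact engine notifier API): callbacks registered for a state, or for ANY,
# fire whenever a transition to that state is recorded.
class Notifier:
    ANY = '*'

    def __init__(self):
        self._callbacks = {}

    def register(self, state, callback):
        self._callbacks.setdefault(state, []).append(callback)

    def notify(self, state, details):
        # Fire callbacks registered for this exact state, then wildcards.
        for key in (state, self.ANY):
            for callback in self._callbacks.get(key, []):
                callback(state, details)

seen = []
notifier = Notifier()
notifier.register(Notifier.ANY, lambda s, d: seen.append((s, d['task'])))
notifier.notify('RUNNING', {'task': 'cat'})
notifier.notify('SUCCESS', {'task': 'cat'})
print(seen)  # [('RUNNING', 'cat'), ('SUCCESS', 'cat')]
```

A listener, in this picture, is just an object that registers a set of such
callbacks for you and unregisters them when it is done.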
--------------------------------------

@ -17,17 +17,18 @@ This abstraction serves the following *major* purposes:

* Tracking of what was done (introspection).
* Saving *memory* which allows for restarting from the last saved state
  which is a critical feature to restart and resume workflows (checkpointing).
* Associating additional metadata with atoms while running (without those
  atoms needing to save this data themselves). This makes it possible to
  add on new metadata in the future without having to change the atoms
  themselves. For example the following can be saved:

  * Timing information (how long a task took to run).
  * User information (who the task ran as).
  * When an atom/workflow was run (and why).

* Saving historical data (failures, successes, intermediary results...)
  to allow retry atoms to decide if they should continue vs. stop.
* *Something you create...*

.. _stevedore: http://stevedore.readthedocs.org/
@ -35,39 +36,47 @@ This abstraction serves the following *major* purposes:
How it is used
==============

On :doc:`engine <engines>` construction typically a backend (it can be
optional) will be provided which satisfies the
:py:class:`~taskflow.persistence.backends.base.Backend` abstraction. Along with
providing a backend object a
:py:class:`~taskflow.persistence.logbook.FlowDetail` object will also be
created and provided (this object will contain the details about the flow to be
run) to the engine constructor (or associated :py:meth:`load()
<taskflow.engines.helpers.load>` helper functions). Typically a
:py:class:`~taskflow.persistence.logbook.FlowDetail` object is created from a
:py:class:`~taskflow.persistence.logbook.LogBook` object (the book object acts
as a type of container for :py:class:`~taskflow.persistence.logbook.FlowDetail`
and :py:class:`~taskflow.persistence.logbook.AtomDetail` objects).

**Preparation**: Once an engine starts to run it will create a
:py:class:`~taskflow.storage.Storage` object which will act as the engine's
interface to the underlying backend storage objects (it provides helper
functions that are commonly used by the engine, avoiding repeated code when
interacting with the provided
:py:class:`~taskflow.persistence.logbook.FlowDetail` and
:py:class:`~taskflow.persistence.backends.base.Backend` objects). As an engine
initializes it will extract (or create)
:py:class:`~taskflow.persistence.logbook.AtomDetail` objects for each atom in
the workflow the engine will be executing.

**Execution:** When an engine begins to execute (see :doc:`engine <engines>`
for more of the details about how an engine goes about this process) it will
examine any previously existing
:py:class:`~taskflow.persistence.logbook.AtomDetail` objects to see if they can
be used for resuming; see :doc:`resumption <resumption>` for more details on
this subject. Atoms which have not finished (or did not finish correctly in a
previous run) will begin executing only after any dependent inputs are ready.
This is done by analyzing the execution graph and looking at predecessor
:py:class:`~taskflow.persistence.logbook.AtomDetail` outputs and states (which
may have been persisted in a past run). This will result in either using their
previous information or in running those predecessors and saving their output
to the :py:class:`~taskflow.persistence.logbook.FlowDetail` and
:py:class:`~taskflow.persistence.backends.base.Backend` objects. This
execution, analysis and interaction with the storage objects continues (what is
described here is a simplification of what really happens, which is quite a bit
more complex) until the engine has finished running (at which point the engine
will have succeeded or failed in its attempt to run the workflow).

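A drastically simplified sketch of this skip-or-execute behavior for a linear
workflow (illustrative only; the real engine is graph-based and far more
involved):

```python
# Simplified resumption sketch (the real engine is graph-based and far more
# complex): atoms with a persisted SUCCESS state are skipped and their stored
# results reused; everything else is (re)executed in order.
def run_linear(atoms, details):
    """atoms: [(name, func)] in order; details: persisted per-atom records."""
    results = {}
    for name, func in atoms:
        detail = details.setdefault(name, {'state': 'PENDING', 'result': None})
        if detail['state'] != 'SUCCESS':
            detail['result'] = func(results)
            detail['state'] = 'SUCCESS'  # "persisted" before moving on
        results[name] = detail['result']
    return results

# Pretend a prior run already completed the 'fetch' atom with result 42.
details = {'fetch': {'state': 'SUCCESS', 'result': 42}}
out = run_linear(
    [('fetch', lambda r: 0),                 # skipped: already SUCCESS
     ('double', lambda r: r['fetch'] * 2)],  # runs, sees the stored 42
    details)
print(out)  # {'fetch': 42, 'double': 84}
```

The key property mirrored here is that the second atom consumes the
*persisted* predecessor output, not a freshly recomputed one.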
**Post-execution:** Typically when an engine is done running the logbook would
be discarded (to avoid creating a stockpile of useless data) and the backend
@ -91,23 +100,24 @@ A few scenarios come to mind:

It should be emphasized that the logbook is the authoritative, and, preferably,
the **only** (see :doc:`inputs and outputs <inputs_and_outputs>`) source of
run-time state information (breaking this principle makes it hard/impossible
to restart or resume in any type of automated fashion). When an atom returns
a result, it should be written directly to a logbook. When atom or flow state
changes in any way, the logbook is first to know (see :doc:`notifications
<notifications>` for how a user may also get notified of those same state
changes). The logbook, a backend, and the associated storage helper class are
responsible for storing the actual data. These components used together specify
the persistence mechanism (how data is saved and where -- memory, database,
whatever...) and the persistence policy (when data is saved -- every time it
changes or at some particular moments or simply never).

Usage
=====

To select which persistence backend to use you should use the :py:meth:`fetch()
<taskflow.persistence.backends.fetch>` function which uses entrypoints
(internally using `stevedore`_) to fetch and configure your backend. This makes
it simpler than accessing the backend data types directly and provides a common
function from which a backend can be fetched.

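A rough sketch of what such a lookup does (the real function resolves
entrypoints through `stevedore`_; the registry below merely stands in for
them, and the backend "constructors" are placeholders):

```python
# Rough sketch of entrypoint-style backend selection (illustrative only: the
# real fetch() resolves entrypoints via stevedore; this dict stands in for
# the registered entrypoints, and the lambdas for backend constructors).
_BACKENDS = {
    'memory': lambda conf: {'type': 'memory', 'conf': conf},
    'sqlite': lambda conf: {'type': 'sqlite', 'conf': conf},
}

def fetch(conf):
    """Look up and configure a backend from its connection name."""
    kind = conf['connection']
    try:
        factory = _BACKENDS[kind]
    except KeyError:
        raise ValueError('unknown backend: %s' % kind)
    return factory(conf)

backend = fetch({'connection': 'memory'})
print(backend['type'])  # memory
```

The point of the indirection is that new backends can be registered (via
entrypoints) without callers ever importing a concrete backend class.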
@ -158,11 +168,11 @@ Sqlalchemy

**Connection**: ``'mysql'`` or ``'postgres'`` or ``'sqlite'``

Retains all data in an `ACID`_ compliant database using the `sqlalchemy`_
library for schemas, connections, and database interaction functionality.
Useful when you need a higher level of durability than offered by the previous
solutions. When using these connection types it is possible to resume an engine
from a peer machine (this does not apply when using sqlite).

.. _sqlalchemy: http://www.sqlalchemy.org/docs/
.. _ACID: https://en.wikipedia.org/wiki/ACID

@ -5,34 +5,34 @@ Resumption
Overview
========

**Question**: *How can we persist the flow so that it can be resumed, restarted
or rolled-back on engine failure?*

**Answer:** Since a flow is a set of :doc:`atoms <atoms>` and relations between
atoms we need to create a model and corresponding information that allows us to
persist the *right* amount of information to preserve, resume, and rollback a
flow on software or hardware failure.

To allow for resumption taskflow must be able to re-create the flow and
re-connect the links between atoms (and between atoms->atom details and so on)
in order to revert those atoms or resume those atoms in the correct ordering.
Taskflow provides a pattern that can help in automating this process (it does
**not** prohibit the user from creating their own strategies for doing this).

Factories
=========

The default provided way is to provide a `factory`_ function which will create
(or recreate) your workflow. This function can be provided when loading a flow
and corresponding engine via the provided :py:meth:`load_from_factory()
<taskflow.engines.helpers.load_from_factory>` method. This `factory`_ function
is expected to be a function (or ``staticmethod``) which is reimportable (aka
has a well defined name that can be located by the ``__import__`` function in
python; this excludes ``lambda`` style functions and ``instance`` methods). The
`factory`_ function name will be saved into the logbook and it will be imported
and called to create the workflow objects (or recreate them if resumption
happens). This allows for the flow to be recreated if and when that is needed
(even on remote machines, as long as the reimportable name can be located).

.. _factory: https://en.wikipedia.org/wiki/Factory_%28object-oriented_programming%29

@ -40,10 +40,10 @@ Names
=====

When a flow is created it is expected that each atom has a unique name; this
name serves a special purpose in the resumption process (as well as serving a
useful purpose when running, allowing for atom identification in the
:doc:`notification <notifications>` process). The reason for having names is
that an atom in a flow needs to be somehow matched with (a potentially)
existing :py:class:`~taskflow.persistence.logbook.AtomDetail` during engine
resumption & subsequent running.

@ -61,27 +61,29 @@ Names provide this although they do have weaknesses:

.. note::

   Despite these weaknesses, names were selected as a *good enough*
   solution for the above matching requirements (until something better is
   invented/created that can satisfy those same requirements).

Scenarios
=========

When a new flow is loaded into an engine, there is no persisted data for it
yet, so a corresponding :py:class:`~taskflow.persistence.logbook.FlowDetail`
object will be created, as well as a
:py:class:`~taskflow.persistence.logbook.AtomDetail` object for each atom that
is contained in it. These will be immediately saved into the persistence
backend that is configured. If no persistence backend is configured, then as
expected nothing will be saved and the atoms and flow will be run in a
non-persistent manner.

**Subsequent run:** When we resume the flow from a persistent backend (for
example, if the flow was interrupted and the engine destroyed to save resources
or if the service was restarted), we need to re-create the flow. For that, we
will call the function that was saved on first-time loading that builds the
flow for us (aka the flow factory function described above) and the engine will
run. The following scenarios explain some expected structural changes and how
they can be accommodated (and what the effect will be when resuming & running).

Same atoms
----------
@ -96,61 +98,64 @@ and then the engine resumes.

Atom was added
--------------

When the factory function mentioned above alters the flow by adding a new atom
(for example, to change the runtime structure of what was previously run in
the first run).

**Runtime change:** By default when the engine resumes it will notice that a
corresponding :py:class:`~taskflow.persistence.logbook.AtomDetail` does not
exist and one will be created and associated.

Atom was removed
----------------

When the factory function mentioned above alters the flow by removing an
existing atom (for example, to change the runtime structure of what was
previously run in the first run).

**Runtime change:** Nothing should be done -- flow structure is reloaded from
the factory function, and the removed atom is not in it -- so the flow will be
run as if it was not there, and any results it returned if it was completed
before will be ignored.

Atom code was changed
---------------------

When the factory function mentioned above alters the flow by deciding that a
newer version of a previously existing atom should be run (possibly to perform
some kind of upgrade or to fix a bug in a prior atom's code).

**Factory change:** The atom name & version will have to be altered. The
factory should replace this name where it was being used previously.

**Runtime change:** This will fall under the same runtime adjustments that
exist when a new atom is added. In the future taskflow could make this easier
by providing an ``upgrade()`` function that can be used to give users the
ability to upgrade atoms before running (manual introspection & modification of
a :py:class:`~taskflow.persistence.logbook.LogBook` can be done before engine
loading and running to accomplish this in the meantime).

Atom was split in two atoms or merged from two (or more) to one atom
--------------------------------------------------------------------

When the factory function mentioned above alters the flow by deciding that a
previously existing atom should be split into N atoms or the factory function
decides that N atoms should be merged into <N atoms (typically occurring
during refactoring).

**Runtime change:** This will fall under the same runtime adjustments that
exist when a new atom is added or removed. In the future taskflow could make
this easier by providing a ``migrate()`` function that can be used to give
users the ability to migrate an atom's previous data before running (manual
introspection & modification of a
:py:class:`~taskflow.persistence.logbook.LogBook` can be done before engine
loading and running to accomplish this in the meantime).

Flow structure was changed
--------------------------

If manual links were added to or removed from the graph, or task requirements
were changed, or the flow was refactored (an atom moved into or out of
subflows, a linear flow was replaced with a graph flow, tasks were reordered in
a linear flow, etc.).

**Runtime change:** Nothing should be done.

@ -12,23 +12,52 @@ Flow States

**PENDING** - A flow starts its life in this state.

**RUNNING** - In this state the flow makes progress, executing and/or
reverting its tasks.

**SUCCESS** - Once all tasks have finished successfully the flow transitions to
the SUCCESS state.

**REVERTED** - The flow transitions to this state when it has been reverted
successfully after a failure.

**FAILURE** - The flow transitions to this state when it can not be reverted
after a failure.

**SUSPENDING** - In the RUNNING state the flow can be suspended. When this
happens, the flow transitions to the SUSPENDING state immediately. In that
state the engine running the flow waits for running tasks to finish (since the
engine can not preempt tasks that are active).

**SUSPENDED** - When no tasks are running and all results received so far are
saved, the flow transitions from the SUSPENDING state to SUSPENDED. It may also
go to the SUCCESS state if all tasks were in fact run, or to the REVERTED state
if the flow was reverting and all tasks were reverted while the engine was
waiting for running tasks to finish, or to the FAILURE state if tasks were run
or reverted and some of them failed.

**RESUMING** - When the flow is interrupted 'in a hard way' (e.g. the server
crashed), it can be loaded from storage in any state. If the state is not
PENDING (aka, the flow was never run) or SUCCESS, FAILURE or REVERTED (in which
case the flow has already finished), the flow gets set to the RESUMING state
for the short time period while it is being loaded from backend storage [a
database, a filesystem...] (this transition is not shown on the diagram). When
the flow is finally loaded, it goes to the SUSPENDED state.

From the SUCCESS, FAILURE or REVERTED states the flow can be run again (and
thus it goes back into the RUNNING state). One of the possible use cases for
this transition is to allow for alteration of a flow or flow details associated
with a previously run flow after the flow has finished, and client code wants
to ensure that each task from this new (potentially updated) flow has its
chance to run.

.. note::

    The current code also contains strong checks during each flow state
    transition using the model described above and raises the InvalidState
    exception if an invalid transition is attempted. This exception being
    triggered usually means there is some kind of bug in the engine code or
    some type of misuse/state violation is occurring, and should be reported
    as such.

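The flow state machine described above can be sketched as a simple adjacency
map with a transition check. This is an illustrative sketch only -- the names
``ALLOWED`` and ``check_transition`` are hypothetical and are not taskflow's
actual internal API:

```python
# Illustrative sketch of the flow state machine described above.
# ALLOWED and check_transition are hypothetical names, not taskflow's API.
ALLOWED = {
    'PENDING': {'RUNNING'},
    'RUNNING': {'SUCCESS', 'REVERTED', 'FAILURE', 'SUSPENDING'},
    'SUCCESS': {'RUNNING'},
    'REVERTED': {'RUNNING'},
    'FAILURE': {'RUNNING'},
    'SUSPENDING': {'SUSPENDED', 'SUCCESS', 'REVERTED', 'FAILURE'},
    'SUSPENDED': {'RUNNING'},
    'RESUMING': {'SUSPENDED'},
}


def check_transition(old_state, new_state):
    """Raise if the transition is not allowed by the model sketched above."""
    if new_state not in ALLOWED.get(old_state, set()):
        raise RuntimeError('Invalid transition: %s -> %s'
                           % (old_state, new_state))
    return True
```

An engine performing such a check on every transition is what makes invalid
transitions surface early (as the InvalidState exception mentioned in the note
above) instead of silently corrupting flow state.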
Task States

@ -39,17 +68,25 @@ Task States

   :align: right
   :alt: Task state transitions

**PENDING** - When a task is added to a flow, it starts in the PENDING state,
which means it can be executed immediately or waits for all of the tasks it
depends on to complete. The task transitions to the PENDING state after it was
reverted and its flow was restarted or retried.

**RUNNING** - When the flow starts to execute the task, it transitions to the
RUNNING state, and stays in this state until its execute() method returns.

**SUCCESS** - The task transitions to this state after it has finished
successfully.

**FAILURE** - The task transitions to this state after it has finished with an
error. When the flow containing this task is being reverted, all its tasks are
walked in a particular order.

**REVERTING** - The task transitions to this state when the flow starts to
revert it and its revert() method is called. Only tasks in the SUCCESS or
FAILURE state can be reverted. If this method fails (raises an exception), the
task goes to the FAILURE state.

**REVERTED** - A task that has been reverted appears in this state.

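The execute()/revert() pair that drives these transitions can be sketched with
a plain class. This is a standalone illustration (``CreateVolume`` and its
arguments are made up for this example); in real code such a class would
subclass ``taskflow.task.Task``:

```python
# Sketch of a task whose methods drive the state transitions above:
# execute() returning -> SUCCESS, execute() raising -> FAILURE,
# revert() being called -> REVERTING, then REVERTED.
# CreateVolume is a hypothetical example, not a taskflow-provided task.
class CreateVolume(object):  # would subclass taskflow.task.Task in real usage
    def execute(self, size):
        if size <= 0:
            raise ValueError('size must be positive')   # task -> FAILURE
        return {'volume_id': 'vol-1', 'size': size}     # task -> SUCCESS

    def revert(self, size, result=None, **kwargs):
        # Called while the task is in the REVERTING state; any partially
        # created resources would be cleaned up here.
        pass
```

Note that revert() receives the prior result so it can undo only what execute()
actually accomplished.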
@ -64,20 +101,30 @@ Retry States

Retry has the same states as a task and one additional state.

**PENDING** - When a retry is added to a flow, it starts in the PENDING state,
which means it can be executed immediately or waits for all of the tasks it
depends on to complete. The retry transitions to the PENDING state after it was
reverted and its flow was restarted or retried.

**RUNNING** - When the flow starts to execute the retry, it transitions to the
RUNNING state, and stays in this state until its execute() method returns.

**SUCCESS** - The retry transitions to this state after it has finished
successfully.

**FAILURE** - The retry transitions to this state after it has finished with an
error. When the flow containing this retry is being reverted, all its tasks are
walked in a particular order.

**REVERTING** - The retry transitions to this state when the flow starts to
revert it and its revert() method is called. Only retries in the SUCCESS or
FAILURE state can be reverted. If this method fails (raises an exception), the
retry goes to the FAILURE state.

**REVERTED** - A retry that has been reverted appears in this state.

**RETRYING** - If the flow that is managed by the current retry has failed and
been reverted, the retry prepares it for the next run and transitions to the
RETRYING state.

@ -7,8 +7,8 @@ Overview

This is an engine that schedules tasks to **workers** -- separate processes
dedicated for certain atoms' execution, possibly running on other machines,
connected via `amqp`_ (or other supported `kombu
<http://kombu.readthedocs.org/>`_ transports).

.. note::

@ -43,8 +43,9 @@ Worker

configured to run in as many threads (green or not) as desired.

Proxy

Executors interact with workers via a proxy. The proxy maintains the
underlying transport and publishes messages (and invokes callbacks on message
reception).

Requirements
------------

@ -122,12 +123,13 @@ engine executor in the following manner:

1. The executor initiates task execution/reversion using a proxy object.
2. :py:class:`~taskflow.engines.worker_based.proxy.Proxy` publishes a task
   request (format is described below) into a named exchange using a routing
   key that is used to deliver the request to a particular worker's topic. The
   executor then waits for the task requests to be accepted and confirmed by
   workers. If the executor doesn't get a task confirmation from workers within
   the given timeout the task is considered timed-out and a timeout exception
   is raised.
3. A worker receives a request message and starts a new thread for processing
   it.

   1. The worker dispatches the request (gets the desired endpoint that
      actually executes the task).

@ -141,8 +143,8 @@ engine executor in the following manner:

   handled by the engine, dispatching to listeners and so-on).

4. The executor gets the task request confirmation from the worker and the
   task request state changes from the ``PENDING`` to the ``RUNNING`` state.
   Once a task request is in the ``RUNNING`` state it can't be timed-out
   (considering that the task execution process may take an unpredictable
   amount of time).
5. The executor gets the task execution result from the worker and passes it
   back to the executor and worker-based engine to finish task processing (this
@ -303,7 +305,8 @@ Additional supported keyword arguments:

* ``executor``: a class that provides a
  :py:class:`~taskflow.engines.worker_based.executor.WorkerTaskExecutor`
  interface; it will be used for executing, reverting and waiting for remote
  tasks.

Limitations
===========

114 tools/check_doc.py Normal file
@ -0,0 +1,114 @@

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Copyright (C) 2014 Ivan Melnikov <iv at altlinux dot org>
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.


"""Check documentation for simple style requirements.

What is checked:
    - lines should not be longer than 79 characters
      - exception: line with no whitespace except maybe in the beginning
      - exception: line that starts with '..' -- longer directives are
        allowed, including footnotes
    - no tabulation for indentation
    - no trailing whitespace
"""

import fnmatch
import os
import re
import sys


FILE_PATTERNS = ['*.rst', '*.txt']
MAX_LINE_LENGTH = 79
# Raw strings avoid invalid '\s' escape-sequence warnings.
TRAILING_WHITESPACE_REGEX = re.compile(r'\s$')
STARTING_WHITESPACE_REGEX = re.compile(r'^(\s+)')


def check_max_length(line):
    if len(line) > MAX_LINE_LENGTH:
        stripped = line.strip()
        if not any((
            line.startswith('..'),  # this is a directive
            stripped.startswith('>>>'),  # this is a doctest
            stripped.startswith('...'),  # and this
            stripped.startswith('taskflow.'),
            ' ' not in stripped  # line can't be split
        )):
            yield ('D001', 'Line too long')


def check_trailing_whitespace(line):
    if TRAILING_WHITESPACE_REGEX.search(line):
        yield ('D002', 'Trailing whitespace')


def check_indentation_no_tab(line):
    match = STARTING_WHITESPACE_REGEX.search(line)
    if match:
        spaces = match.group(1)
        if '\t' in spaces:
            yield ('D003', 'Tabulation used for indentation')


LINE_CHECKS = (check_max_length,
               check_trailing_whitespace,
               check_indentation_no_tab)


def check_lines(lines):
    for idx, line in enumerate(lines, 1):
        line = line.rstrip('\n')
        for check in LINE_CHECKS:
            for code, message in check(line):
                yield idx, code, message


def check_files(filenames):
    for fn in filenames:
        with open(fn) as f:
            for line_num, code, message in check_lines(f):
                yield fn, line_num, code, message


def find_files(paths, patterns):
    for path in paths:
        if os.path.isfile(path):
            yield path
        elif os.path.isdir(path):
            for root, dirnames, filenames in os.walk(path):
                for filename in filenames:
                    if any(fnmatch.fnmatch(filename, pattern)
                           for pattern in patterns):
                        yield os.path.join(root, filename)
        else:
            print('Invalid path: %s' % path)


def main():
    ok = True
    if len(sys.argv) > 1:
        dirs = sys.argv[1:]
    else:
        dirs = ['.']
    for error in check_files(find_files(dirs, FILE_PATTERNS)):
        ok = False
        print('%s:%s: %s %s' % error)
    sys.exit(0 if ok else 1)


if __name__ == '__main__':
    main()
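The per-line checks in check_doc.py compose as generators over an iterable of
lines. A self-contained sketch (two of the checks are re-stated inline in
simplified form so the snippet runs on its own):

```python
import re

# Simplified stand-ins for two of the checks in tools/check_doc.py,
# restated here so this snippet is self-contained.
def check_max_length(line, limit=79):
    stripped = line.strip()
    # Skip unsplittable lines and directives, as the real checker does.
    if len(line) > limit and ' ' in stripped and not line.startswith('..'):
        yield ('D001', 'Line too long')


def check_trailing_whitespace(line):
    if re.search(r'\s$', line):
        yield ('D002', 'Trailing whitespace')


def check_lines(lines, checks=(check_max_length, check_trailing_whitespace)):
    # Yield (line_number, code, message) for every violation found.
    for idx, line in enumerate(lines, 1):
        line = line.rstrip('\n')
        for check in checks:
            for code, message in check(line):
                yield idx, code, message


sample = [
    'A short, clean line\n',
    'word ' * 19 + 'word\n',  # 99 characters with spaces -> D001
    'trailing space   \n',    # -> D002
]
for line_num, code, message in check_lines(sample):
    print('%d: %s %s' % (line_num, code, message))
```

Each checker yields zero or more (code, message) pairs per line, so adding a
new rule is just a matter of appending another generator to the checks tuple.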
@ -54,6 +54,7 @@ deps = -r{toxinidir}/requirements.txt

commands =
    python setup.py testr --slowest --testr-args='{posargs}'
    sphinx-build -b doctest doc/source doc/build
    python tools/check_doc.py doc/source

[testenv:py33]
deps = {[testenv]deps}