Add a conductor considerations section
Add a small section in the conductor docs about the cycling issue and give some resolutions that can be applied as well as link to the better solution which is garbage collection for jobs that are not working out. Also includes some tiny tweaks to other docs.

Change-Id: I73e9f8f5a8888eaf967d62723f6ffb45b02887c9
parent 95cb0625f4
commit c2ec0b2e49
@ -24,9 +24,41 @@ They are responsible for the following:
 
 .. note::
 
-    They are inspired by and have similar responsiblities
+    They are inspired by and have similar responsibilities
     as `railroad conductors`_.
+
+Considerations
+==============
+
+Some usage considerations should be used when using a conductor to make sure
+it's used in a safe and reliable manner. Eventually we hope to make these
+non-issues but for now they are worth mentioning.
+
+Endless cycling
+---------------
+
+**What:** Jobs that fail (due to some type of internal error) on one conductor
+will be abandoned by that conductor and then another conductor may experience
+those same errors and abandon it (and repeat). This will create a job
+abandonment cycle that will continue for as long as the job exists in an
+claimable state.
+
+**Example:**
+
+.. image:: img/conductor_cycle.png
+   :scale: 70%
+   :alt: Conductor cycling
+
+**Alleviate by:**
+
+#. Forcefully delete jobs that have been failing continuously after a given
+   number of conductor attempts. This can be either done manually or
+   automatically via scripts (or other associated monitoring).
+#. Resolve the internal error's cause (storage backend failure, other...).
+#. Help implement `jobboard garbage binning`_.
+
+.. _jobboard garbage binning: https://blueprints.launchpad.net/taskflow/+spec/jobboard-garbage-bin
 
 Interfaces
 ==========
 
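The first alleviation, forcefully deleting a job after a given number of failed conductor attempts, can be automated with a small script. The following is a minimal, self-contained simulation of both the cycle and that cap. It rests on stated assumptions:

* `InMemoryBoard`, `Job`, `run_conductor_round` and `MAX_ATTEMPTS` are hypothetical stand-ins, not TaskFlow's jobboard API.
* A real script would use whatever claim, abandon, consume and delete operations the deployed jobboard backend actually offers.
* It would also track attempts wherever that deployment records them.

```python
from dataclasses import dataclass, field

# Assumed threshold: failed conductor attempts tolerated before force-deleting.
MAX_ATTEMPTS = 3


@dataclass
class Job:
    """Hypothetical job record; `attempts` counts failed conductor runs."""
    name: str
    attempts: int = 0


@dataclass
class InMemoryBoard:
    """Hypothetical stand-in for a jobboard (not TaskFlow's API)."""
    jobs: dict = field(default_factory=dict)
    consumed: list = field(default_factory=list)
    deleted: list = field(default_factory=list)

    def post(self, job):
        self.jobs[job.name] = job

    def claimable(self):
        # Return a snapshot so jobs can be removed while iterating.
        return list(self.jobs.values())

    def consume(self, job):
        # Success: the job is finished and leaves the board.
        del self.jobs[job.name]
        self.consumed.append(job.name)

    def abandon(self, job):
        # Failure: the job stays on the board, claimable by any conductor.
        job.attempts += 1

    def force_delete(self, job):
        # Break the cycle: remove a job that keeps failing.
        del self.jobs[job.name]
        self.deleted.append(job.name)


def run_conductor_round(board, work, max_attempts=MAX_ATTEMPTS):
    """Simulate one conductor pass over every claimable job."""
    for job in board.claimable():
        try:
            work(job)
        except Exception:
            board.abandon(job)
            if job.attempts >= max_attempts:
                board.force_delete(job)
        else:
            board.consume(job)


def storage_backend_down(job):
    # Simulated internal error that every conductor hits in the same way.
    raise RuntimeError("storage backend failure")


board = InMemoryBoard()
board.post(Job("resize-volume"))
rounds = 0
while board.claimable():  # Without the cap this loop would never end.
    run_conductor_round(board, storage_backend_down)
    rounds += 1
print(rounds, board.deleted)  # 3 ['resize-volume']
```

Each round stands in for a different conductor picking up the same job. Only the attempt cap lets the job leave the board. The jobboard garbage binning blueprint linked above is the more complete answer, because failed jobs would be set aside instead of deleted.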
BIN
doc/source/img/conductor_cycle.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 36 KiB |
@ -214,7 +214,7 @@ the engine can immediately stop doing further work. The effect that this causes
 is that when a claim is lost another engine can immediately attempt to acquire
 the claim that was previously lost and it *could* begin working on the
 unfinished tasks that the later engine may also still be executing (since that
-engine is not yet aware that it has lost the claim).
+engine is not yet aware that it has *lost* the claim).
 
 **TLDR:** not `preemptable`_, possible to become aware of losing a claim
 after the fact (at the next state change), another engine could have acquired
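The following is a minimal, single-process simulation of the TLDR. `ClaimRegistry` and `run_engine` are hypothetical and are not TaskFlow's engine or jobboard API. A nested call stands in for the second engine running concurrently. The engine checks ownership only at state changes (before each task), so a task already in progress keeps running after the claim has moved elsewhere.

```python
class ClaimRegistry:
    """Hypothetical claim store mapping a job to its current owner."""

    def __init__(self):
        self.owners = {}

    def claim(self, job, owner):
        self.owners[job] = owner

    def owner_of(self, job):
        return self.owners.get(job)


def run_engine(name, job, tasks, registry, log, while_running=None):
    """Run `tasks` in order, checking claim ownership only at state
    changes (before each task), never while a task is executing."""
    for task in tasks:
        if registry.owner_of(job) != name:
            log.append((name, "lost claim, stopping before " + task))
            return
        log.append((name, "start " + task))
        if while_running is not None:
            while_running(task)  # Stands in for time spent inside the task.
        log.append((name, "finish " + task))


registry = ClaimRegistry()
registry.claim("job-1", "engine-a")
log = []


def claim_lost_during(task):
    # While engine-a is inside task-1 its claim is lost; engine-b acquires
    # the job and, unaware that engine-a is still busy, runs the same work.
    if task == "task-1":
        registry.claim("job-1", "engine-b")
        run_engine("engine-b", "job-1", ["task-1", "task-2"], registry, log)


run_engine("engine-a", "job-1", ["task-1", "task-2"], registry, log,
           while_running=claim_lost_during)
for entry in log:
    print(entry)
```

In the resulting log, both engines execute `task-1`. `engine-a` only notices the lost claim at its next state change, just before `task-2`. This is why the surrounding text recommends idempotent tasks and delayed claiming.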
@ -235,8 +235,8 @@ the claim by then, therefore both would be *working* on a job.
 
 #. Delay claiming partially completed work by adding a wait period (to allow
    the previous engine to coalesce) before working on a partially completed job
-   (combine this with the prior suggestions and dual-engine issues should be
-   avoided).
+   (combine this with the prior suggestions and *most* dual-engine issues
+   should be avoided).
 
 .. _idempotent: http://en.wikipedia.org/wiki/Idempotence
 .. _preemptable: http://en.wikipedia.org/wiki/Preemption_%28computing%29
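The wait-period suggestion can be sketched as a guard that a conductor evaluates before claiming. The sketch rests on stated assumptions:

* It assumes the jobboard exposes when a job's previous claim was lost (`lost_at`) and whether some of its work already completed (`partially_completed`).
* Those fields, `JobInfo`, `may_claim` and the `GRACE_PERIOD` value are all hypothetical and are not part of TaskFlow.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Assumed grace period (seconds); it should exceed the time an engine
# needs to reach its next state change and notice it lost the claim.
GRACE_PERIOD = 30.0


@dataclass
class JobInfo:
    """Hypothetical job metadata a conductor might inspect before claiming."""
    name: str
    partially_completed: bool
    lost_at: Optional[float] = None  # Wall-clock time the last claim was lost.


def may_claim(job, now=None, grace=GRACE_PERIOD):
    """Return True if a conductor should attempt to claim `job` now."""
    if now is None:
        now = time.time()  # Assumes roughly synchronized clocks.
    if not job.partially_completed or job.lost_at is None:
        return True  # No earlier engine to wait for.
    # Wait so the previous engine can coalesce (notice and stop) first.
    return now - job.lost_at >= grace


job = JobInfo("job-1", partially_completed=True, lost_at=100.0)
print(may_claim(job, now=110.0))  # False: only 10s since the claim was lost
print(may_claim(job, now=131.0))  # True: the grace period has elapsed
```

The guard only narrows the window. An earlier engine stuck in a task that runs longer than the grace period still will not notice until its next state change. This is why the updated text claims that only *most* dual-engine issues are avoided.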
@ -7,8 +7,7 @@ Overview
 
 This is engine that schedules tasks to **workers** -- separate processes
 dedicated for certain atoms execution, possibly running on other machines,
-connected via `amqp`_ (or other supported `kombu
-<http://kombu.readthedocs.org/>`_ transports).
+connected via `amqp`_ (or other supported `kombu`_ transports).
 
 .. note::
 
@ -18,6 +17,7 @@ connected via `amqp`_ (or other supported `kombu
     production ready.
 
 .. _blueprint page: https://blueprints.launchpad.net/taskflow?searchtext=wbe
+.. _kombu: http://kombu.readthedocs.org/
 
 Terminology
 -----------