Merge "Add a conductor considerations section"
@@ -24,9 +24,41 @@ They are responsible for the following:
 
 .. note::
 
-    They are inspired by and have similar responsiblities
+    They are inspired by and have similar responsibilities
     as `railroad conductors`_.
 
+Considerations
+==============
+
+Some usage considerations should be used when using a conductor to make sure
+it's used in a safe and reliable manner. Eventually we hope to make these
+non-issues but for now they are worth mentioning.
+
+Endless cycling
+---------------
+
+**What:** Jobs that fail (due to some type of internal error) on one conductor
+will be abandoned by that conductor and then another conductor may experience
+those same errors and abandon it (and repeat). This will create a job
+abandonment cycle that will continue for as long as the job exists in an
+claimable state.
+
+**Example:**
+
+.. image:: img/conductor_cycle.png
+   :scale: 70%
+   :alt: Conductor cycling
+
+**Alleviate by:**
+
+#. Forcefully delete jobs that have been failing continuously after a given
+   number of conductor attempts. This can be either done manually or
+   automatically via scripts (or other associated monitoring).
+#. Resolve the internal error's cause (storage backend failure, other...).
+#. Help implement `jobboard garbage binning`_.
+
+.. _jobboard garbage binning: https://blueprints.launchpad.net/taskflow/+spec/jobboard-garbage-bin
+
 Interfaces
 ==========
 
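The first alleviation step in the "Endless cycling" section above (deleting jobs that keep failing after a given number of conductor attempts) could be scripted roughly as follows. This is a minimal sketch and not taskflow's API: ``AbandonTracker``, ``max_attempts`` and the ``trash`` callback are hypothetical names, and hooking it up to a real jobboard's abandonment events and job removal is assumed rather than shown.

```python
from collections import Counter


class AbandonTracker:
    """Counts abandonments per job and removes jobs that keep cycling.

    Hypothetical helper for illustration; not part of taskflow.
    """

    def __init__(self, max_attempts, trash):
        if max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        self._max_attempts = max_attempts
        # Assumed callback that deletes the job from the jobboard.
        self._trash = trash
        self._attempts = Counter()

    def on_abandon(self, job_id):
        """Record one abandonment; trash the job once the limit is reached.

        Returns True if the job was trashed, False otherwise.
        """
        self._attempts[job_id] += 1
        if self._attempts[job_id] >= self._max_attempts:
            del self._attempts[job_id]
            self._trash(job_id)
            return True
        return False


# Usage: the third abandonment of the same job triggers removal.
trashed = []
tracker = AbandonTracker(max_attempts=3, trash=trashed.append)
results = [tracker.on_abandon("job-1") for _ in range(3)]
```

Keeping the counter outside any single conductor matters here, since the cycle described above spans several conductors; a monitoring script or shared store would be the natural owner of this state.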
BIN doc/source/img/conductor_cycle.png (Normal file)
Binary file not shown. Size after: 36 KiB
@@ -214,7 +214,7 @@ the engine can immediately stop doing further work. The effect that this causes
 is that when a claim is lost another engine can immediately attempt to acquire
 the claim that was previously lost and it *could* begin working on the
 unfinished tasks that the later engine may also still be executing (since that
-engine is not yet aware that it has lost the claim).
+engine is not yet aware that it has *lost* the claim).
 
 **TLDR:** not `preemptable`_, possible to become aware of losing a claim
 after the fact (at the next state change), another engine could have acquired
@@ -235,8 +235,8 @@ the claim by then, therefore both would be *working* on a job.
 
 #. Delay claiming partially completed work by adding a wait period (to allow
    the previous engine to coalesce) before working on a partially completed job
-   (combine this with the prior suggestions and dual-engine issues should be
-   avoided).
+   (combine this with the prior suggestions and *most* dual-engine issues
+   should be avoided).
 
 .. _idempotent: http://en.wikipedia.org/wiki/Idempotence
 .. _preemptable: http://en.wikipedia.org/wiki/Preemption_%28computing%29
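The "delay claiming partially completed work" suggestion in the hunk above can be illustrated with a small check. This is only a sketch under stated assumptions: ``ready_to_work`` and its parameters are hypothetical, and it assumes some timestamp of the previous engine's last recorded update is available, which the text above does not specify.

```python
def ready_to_work(partially_completed, last_update, now, wait_period):
    """Decide whether a newly claimed job may be worked on yet.

    Fresh jobs can start immediately. Partially completed jobs are held
    until ``wait_period`` seconds have passed since the previous engine's
    last recorded update, giving that engine time to notice (at its next
    state change) that it lost the claim.
    """
    if not partially_completed:
        return True
    return (now - last_update) >= wait_period


# Usage with plain numbers standing in for timestamps (seconds).
fresh = ready_to_work(False, last_update=100.0, now=101.0, wait_period=30.0)
too_soon = ready_to_work(True, last_update=100.0, now=110.0, wait_period=30.0)
waited = ready_to_work(True, last_update=100.0, now=130.0, wait_period=30.0)
```

As the text notes, a wait period only avoids *most* dual-engine issues: a previous engine stuck in a long-running task may still be executing after any fixed delay, so idempotent tasks remain worthwhile.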
@@ -7,8 +7,7 @@ Overview
 
 This is engine that schedules tasks to **workers** -- separate processes
 dedicated for certain atoms execution, possibly running on other machines,
-connected via `amqp`_ (or other supported `kombu
-<http://kombu.readthedocs.org/>`_ transports).
+connected via `amqp`_ (or other supported `kombu`_ transports).
 
 .. note::
 
@@ -18,6 +17,7 @@ connected via `amqp`_ (or other supported `kombu
     production ready.
 
 .. _blueprint page: https://blueprints.launchpad.net/taskflow?searchtext=wbe
+.. _kombu: http://kombu.readthedocs.org/
 
 Terminology
 -----------