zuul/zuul/manager
James E. Blair 61eea2169b Handle more than 1024 changes in the pipeline change list
We originally wrote the change list to be a best-effort service for
the scheduler's check of whether a change is in a pipeline (which
must be fast and can't lock each of the pipelines to read in the full
state).  To make it even simpler, we avoided sharding and instead
limited it to only the first 1024 changes.  But scope creep happened,
and it now also serves to provide the list of relevant changes to the
change cache.  If we have a pipeline with 1025 changes and delete
one of them from the cache, that tenant will break, so this needs to
be corrected.
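
As a rough illustration of that failure mode (a sketch of one plausible
reading only; `listed_changes`, `prune_cache`, and the key names are
hypothetical and are not Zuul's actual API), a cache cleanup that trusts
a list capped at 1024 entries drops a change that is still enqueued:

```python
MAX_LISTED = 1024  # the old cap described above


def listed_changes(pipeline_changes, limit=MAX_LISTED):
    """Return only the first `limit` changes, as the unsharded list did."""
    return pipeline_changes[:limit]


def prune_cache(cache, relevant):
    """Keep only cache entries whose key appears in the relevant list."""
    relevant = set(relevant)
    return {key: value for key, value in cache.items() if key in relevant}


# A pipeline with 1025 changes, all of them present in the cache.
pipeline = [f"change-{i}" for i in range(1025)]
cache = {key: {"key": key} for key in pipeline}

# The cleanup only sees the truncated list, so one live change is evicted.
pruned = prune_cache(cache, listed_changes(pipeline))
missing = [key for key in pipeline if key not in pruned]
```

Here `missing` holds the single change past the cap, even though it is
still in the pipeline.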

This change uses sharding to correct it.  Since it's possible to
attempt to read a sharded object mid-write, we retry reads in the case
of exceptions until they succeed.

In most cases this should still be only a single znode, but because
we truncate sharded znodes, there is a chance of reading incorrect
data even with a small number of changes.

To resolve this for all cases, we retry the read until it succeeds.
The scheduler no longer reads the state at the start of pipeline
processing (it never needed to anyway), so if the data become corrupt,
a scheduler will eventually be able to correct it.  In other words,
the main pipeline processing path only writes this, and the other
paths only read it.
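
The read path described above can be sketched roughly as follows.  This
is a minimal illustration under stated assumptions, not the code in this
change: `FakeStore` is an in-memory stand-in for ZooKeeper, and
`SHARD_SIZE`, `write_sharded`, `read_sharded`, `read_with_retry`, and the
`on_failure` hook are hypothetical names.

```python
import json
import time
import zlib

SHARD_SIZE = 16  # hypothetical; tiny so that even small lists span shards


class FakeStore:
    """In-memory stand-in for ZooKeeper: maps shard index to bytes."""

    def __init__(self):
        self.shards = {}


def write_sharded(store, changes):
    """Serialize and compress the list, then split it across shards."""
    payload = zlib.compress(json.dumps(changes).encode("utf8"))
    store.shards = {
        offset // SHARD_SIZE: payload[offset:offset + SHARD_SIZE]
        for offset in range(0, len(payload), SHARD_SIZE)
    }


def read_sharded(store):
    """Join all shards and decode; raises if the data are incomplete."""
    payload = b"".join(store.shards[i] for i in sorted(store.shards))
    return json.loads(zlib.decompress(payload).decode("utf8"))


def read_with_retry(store, delay=0.0, on_failure=None):
    """Retry the read until it succeeds.

    A reader may observe a partially written (or truncated) object, so
    any exception is treated as transient.  `on_failure` is a
    hypothetical hook used only to let a test simulate the writer
    finishing between attempts.
    """
    while True:
        try:
            return read_sharded(store)
        except Exception:
            if on_failure is not None:
                on_failure(store)
            time.sleep(delay)
```

Only the writer repairs the data; readers never write, they simply try
again until a consistent copy is available.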

(An alternative solution would be to leave this as it was and instead
load the full pipeline state for maintaining the change cache; that
runs infrequently enough that we can accept the cost.  The sharding
approach is chosen since it also makes other uses of this object more
correct.)

Change-Id: I132d67149c065df7343cbd3aea69988f547498f4
2021-12-01 16:30:01 -08:00
__init__.py     Handle more than 1024 changes in the pipeline change list   2021-12-01 16:30:01 -08:00
dependent.py    Refactor change key/reference resolution                    2021-11-03 18:00:19 +01:00
independent.py  Refactor change key/reference resolution                    2021-11-03 18:00:19 +01:00
serial.py       zuul-web: add pipelines' manager, triggers data in status   2021-10-27 15:13:01 +02:00
shared.py       Store change queues in Zookeeper                            2021-10-26 07:25:03 +02:00
supercedent.py  Fix mutation while iterating over queues                    2021-12-01 16:28:08 -08:00