61822ec737
There is a race condition where an executor may crash and leave a stuck build. We can avoid that by performing the following two actions in a transaction:

* Update the build request state to COMPLETED
* Submit the BuildCompletedEvent to the event queue

The race condition occurs when the build request is marked as completed but no BuildCompletedEvent arrives. In that case, Zuul sees the completed build request and assumes that the event will be forthcoming; therefore the build request itself is not considered lost. The only way for such a build request to be removed is a buildset reset.

By including these operations in a transaction, only the following states are possible if the executor crashes:

* It crashes before the build is complete: the build is declared lost and restarted.
* It crashes after the build is complete: the scheduler doesn't care.

Transactions are limited to 1MB just like any other ZK network operation, and the result data can be large, but we already put that in a side-channel if it exceeds a certain size, so only the actual event znode and request znode need to be involved in the transaction.

Change-Id: Ibedf2c5db825fb444f652b60e1c6f2c7aadc6950
||
---|---|---|
.. | ||
sensors | ||
__init__.py | ||
client.py | ||
common.py | ||
server.py |
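
The all-or-nothing completion described in the commit message can be illustrated with a minimal sketch. This is not Zuul's code: `InMemoryStore`, `Transaction`, `complete_build`, and the znode paths are hypothetical stand-ins that only mimic the shape of a ZooKeeper multi-operation transaction, so the example runs without a ZooKeeper server. Zuul's actual implementation may differ in names and details.

```python
class TransactionError(Exception):
    """Raised when any operation in a transaction cannot be applied."""


class InMemoryStore:
    """Hypothetical stand-in for a ZooKeeper tree: maps path -> bytes."""

    def __init__(self):
        self.nodes = {}

    def transaction(self):
        return Transaction(self)


class Transaction:
    """Collects operations and applies them all-or-nothing on commit."""

    def __init__(self, store):
        self.store = store
        self.ops = []

    def set_data(self, path, value):
        self.ops.append(("set", path, value))

    def create(self, path, value):
        self.ops.append(("create", path, value))

    def commit(self):
        # Stage changes on a copy so a failure leaves the store untouched.
        staged = dict(self.store.nodes)
        for op, path, value in self.ops:
            if op == "set" and path not in staged:
                raise TransactionError(f"no such node: {path}")
            if op == "create" and path in staged:
                raise TransactionError(f"node exists: {path}")
            staged[path] = value
        # Only reached if every operation succeeded.
        self.store.nodes = staged


def complete_build(store, request_path, event_path, event_data):
    """Mark the request COMPLETED and enqueue the event in one transaction."""
    tr = store.transaction()
    tr.set_data(request_path, b"COMPLETED")
    tr.create(event_path, event_data)
    tr.commit()
```

Because both operations commit together, an observer never sees a request marked COMPLETED without its matching event: if the process dies before `commit()` returns successfully, the request is still in its earlier state and can be declared lost and restarted.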