From 8dbccc1d9944597cba91543c99ef71aae382f42c Mon Sep 17 00:00:00 2001
From: Clark Boylan
Date: Thu, 22 Jun 2017 12:51:22 -0700
Subject: [PATCH] Don't join image upload workers on stop()

Image uploads are synchronous with their workers and because writing
lots of bytes to clouds can be slow this means that join()ing on an
image upload worker can take a significant amount of time.

When a nodepool builder receives a sigint we want it to stop in a
reasonable amount of time and gracefully close connections to the
zookeeper database so that locks are released properly and records can
be cleaned safely.

The old stop() code which handles sigint joined on the upload worker
threads. This meant it couldn't happen in a reasonable amount of time
for the reason above. This then leads to killing the process in init
scripts with sigkill.

Thankfully we can just not wait for upload workers to join and let
process exit kill the upload process for us. Separately we can
gracefully close the zookeeper connection. Then any other builders
(possibly when this one restarts) can clean up the upload record in zk
and in the cloud.

Change-Id: I52425bb8e5b8f0d6e1d25674cbe590e32b629e6d
---
 nodepool/builder.py | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/nodepool/builder.py b/nodepool/builder.py
index f2be99400..06c82d469 100644
--- a/nodepool/builder.py
+++ b/nodepool/builder.py
@@ -1127,7 +1127,14 @@ class NodePoolBuilder(object):
         '''
         with self._start_lock:
             self.log.debug("Stopping. NodePoolBuilder shutting down workers")
-            workers = self._build_workers + self._upload_workers
+            # Note we do not add the upload workers to this list intentionally.
+            # The reason for this is that uploads can take many hours and there
+            # is no good way to stop the blocking writes performed by the
+            # uploads in order to join() below on a reasonable amount of time.
+            # Killing the process will stop the upload then both the record
+            # in zk and in the cloud will be deleted by any other running
+            # builders or when this builder starts again.
+            workers = self._build_workers
             if self._janitor:
                 workers += [self._janitor]
             for worker in (workers):
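
The shutdown pattern this change relies on can be sketched in isolation. The following is a minimal illustration, not nodepool's code: the `Worker` and `BlockingUploadWorker` classes, the `stop()` helper, and the timings are all hypothetical, and it assumes the upload threads are daemon threads so that process exit can end them without a join().

```python
import threading
import time


class Worker(threading.Thread):
    """Hypothetical stand-in for a worker that honors a stop request."""

    def __init__(self, name, work_seconds):
        # Assumption: daemon=True, so an unjoined worker cannot keep the
        # process alive once the main thread exits.
        super().__init__(name=name, daemon=True)
        self._work_seconds = work_seconds
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait() returns early as soon as shutdown() sets the event.
        self._stop_event.wait(self._work_seconds)

    def shutdown(self):
        self._stop_event.set()


class BlockingUploadWorker(Worker):
    """Hypothetical upload worker stuck in a write it cannot interrupt."""

    def run(self):
        # time.sleep() stands in for a long blocking upload; it ignores
        # the stop event entirely.
        time.sleep(self._work_seconds)


def stop(build_workers, upload_workers):
    """Ask every worker to stop, but join() only the build workers."""
    for worker in build_workers + upload_workers:
        worker.shutdown()
    # Upload workers are left out of the join on purpose: waiting for them
    # could take as long as the upload itself.
    for worker in build_workers:
        worker.join()


build_workers = [Worker("build-0", 30)]
upload_workers = [BlockingUploadWorker("upload-0", 30)]
for w in build_workers + upload_workers:
    w.start()

started = time.monotonic()
stop(build_workers, upload_workers)
elapsed = time.monotonic() - started
print("stop() returned after %.2fs" % elapsed)
```

In this sketch stop() returns almost immediately even though the upload thread is still running. The unfinished upload is abandoned when the process exits, which is why the commit message leans on other builders (or this one after a restart) to clean up the stale record afterwards.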