Use node cache in node deleter

The node deleter method runs every 5 seconds and iterates over every node. For any node that is not READY, it attempts to lock the node to determine if it should be deleted. In other words, this method attempts to lock almost every node every 5 seconds.

Locking nodes is expensive (involving a number of ZK round trips). However, our node cache keeps a count of the number of lock contenders on a given node. That can be used as a proxy for whether the node is probably locked. Zero contenders means it could be unlocked. One or more contenders means it's probably locked.

To greatly reduce ZK traffic, we can use the node cache and only attempt to lock nodes if the cache indicates they are potentially unlocked. If we're wrong, we'll try again in 5 seconds anyway.

Change-Id: Ieb54babef92d5dbf3173316c6a1711e0e4a70403
2023-03-14 15:49:50 -07:00 · 965e5d9a60
parent 05c5a11ce8
commit 965e5d9a60
1 changed file with 5 additions and 1 deletion
--- a/nodepool/launcher.py
+++ b/nodepool/launcher.py
@@ -843,7 +843,7 @@ class DeletedNodeWorker(BaseCleanupWorker):
                          zk.DELETING, zk.DELETED, zk.ABORTED)

        zk_conn = self._nodepool.getZK()
-        for node in zk_conn.nodeIterator():
+        for node in zk_conn.nodeIterator(cached=True, cached_ids=True):
            # If a ready node has been allocated to a request, but that
            # request is now missing, deallocate it.
            if (node.state == zk.READY and node.allocated_to
@@ -872,6 +872,10 @@ class DeletedNodeWorker(BaseCleanupWorker):
            if node.provider not in self._nodepool.config.providers:
                continue

+            # This node is *probably* locked, so skip it.
+            if node.lock_contenders:
+                continue
+
            # Any nodes in these states that are unlocked can be deleted.
            if node.state in cleanup_states:
                try:
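
The heuristic described in the commit message can be illustrated in isolation. The following is a minimal standalone sketch, not the nodepool implementation: CachedNode, cleanup_pass, the try_lock and delete callbacks, and the state constants are all hypothetical names invented for this example. It shows a cached contender count being used to skip lock attempts, and a stale cache entry being tolerated because the node is simply retried on the next pass.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical state names for this sketch (not nodepool's zk constants).
READY = "ready"
USED = "used"
DELETING = "deleting"
CLEANUP_STATES = {USED, DELETING}


@dataclass
class CachedNode:
    """Hypothetical stand-in for a node record read from a local cache."""
    id: str
    state: str
    lock_contenders: int  # cached count; may be slightly out of date


def cleanup_pass(
    nodes: Iterable[CachedNode],
    try_lock: Callable[[CachedNode], bool],
    delete: Callable[[CachedNode], None],
) -> int:
    """Run one cleanup pass and return the number of lock attempts made."""
    attempts = 0
    for node in nodes:
        # One or more contenders: the node is *probably* locked, so skip
        # it without paying for any round trips to the lock service.
        if node.lock_contenders:
            continue
        if node.state not in CLEANUP_STATES:
            continue
        attempts += 1
        if not try_lock(node):
            # The cache was stale and the node is actually locked.
            # Nothing is lost: the next periodic pass will try again.
            continue
        delete(node)
    return attempts
```

In this sketch only nodes with zero cached contenders reach try_lock, so the number of expensive lock attempts per pass scales with the number of probably-unlocked nodes rather than with the total number of nodes.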