Use node cache in node deleter

The node deleter method runs every 5 seconds and iterates over
every node.  For any node that is not READY, it attempts to lock
the node to determine if it should be deleted.  In other words,
this method attempts to lock almost every node every 5 seconds.

Locking a node is expensive: it involves several ZooKeeper (ZK) round trips.
However, our node cache keeps a count of the number of lock
contenders on a given node.  That can be used as a proxy for whether
the node is probably locked.  Zero contenders means it could be
unlocked.  One or more contenders means it's probably locked.
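
A minimal sketch of the contender-count heuristic follows. It assumes a kazoo-style lock recipe, in which every process waiting for or holding a node's lock creates one ephemeral sequential child znode under the lock path. `CachedNode`, `lock_children` and `probably_locked` are hypothetical names, not nodepool's API. Only `lock_contenders` appears in the diff.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CachedNode:
    # Hypothetical cache entry: the node id plus the names of the child
    # znodes that a watch on the node's lock path has observed (assumed).
    node_id: str
    lock_children: List[str] = field(default_factory=list)

    @property
    def lock_contenders(self) -> int:
        # Assumes each process waiting for (or holding) the lock owns one
        # ephemeral sequential child, so the child count approximates the
        # number of contenders.
        return len(self.lock_children)


def probably_locked(node: CachedNode) -> bool:
    # One or more contenders: someone likely holds the lock.
    # Zero contenders: the node *might* be unlocked, but the cache can lag
    # behind ZooKeeper, so a real lock attempt is still required.
    return node.lock_contenders > 0


idle = CachedNode("0000000001")
busy = CachedNode("0000000002", ["abc__lock__0000000007"])
print(probably_locked(idle), probably_locked(busy))  # False True
```

The heuristic only decides whether a lock attempt is worth making. The lock attempt itself remains the source of truth.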

To greatly reduce ZK traffic, we can use the node cache and only
attempt to lock nodes if the cache indicates they are potentially
unlocked.  If we're wrong, we'll try again in 5 seconds anyway.
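
The effect on ZK traffic can be illustrated with a small simulation. This is a sketch under stated assumptions, not nodepool code. `Node`, `FakeZK`, `deleter_pass` and the lowercase state strings are hypothetical stand-ins. A lock attempt is counted rather than sent to ZooKeeper.

```python
from dataclasses import dataclass
from typing import Iterable, List

# States roughly mirroring the cleanup_states tuple in the diff
# (assumed lowercase string values).
CLEANUP_STATES = {"used", "in-use", "building", "failed",
                  "deleting", "deleted", "aborted"}


@dataclass
class Node:
    # Hypothetical stand-in for a node record read from the node cache.
    id: str
    state: str
    lock_contenders: int


class FakeZK:
    """Counts lock attempts; in reality each costs several ZK round trips."""

    def __init__(self, held: Iterable[str]):
        self.held = set(held)  # ids of nodes whose lock is actually held
        self.lock_attempts = 0

    def try_lock(self, node: Node) -> bool:
        # Non-blocking lock: fail immediately if someone else holds it.
        self.lock_attempts += 1
        return node.id not in self.held


def deleter_pass(zk: FakeZK, nodes: List[Node], use_cache: bool) -> List[str]:
    """One pass of the deleter; returns ids of nodes locked for deletion."""
    to_delete = []
    for node in nodes:
        if node.state not in CLEANUP_STATES:
            continue
        # The change in this commit: skip nodes the cache says are
        # *probably* locked instead of paying for a lock attempt.
        if use_cache and node.lock_contenders:
            continue
        if zk.try_lock(node):
            to_delete.append(node.id)
    return to_delete


# 98 in-use nodes held by their users, one used node ready for deletion,
# one ready node, and one node whose cache entry is stale (0 contenders,
# but actually locked).
nodes = [Node(f"{i:010d}", "in-use", 1) for i in range(98)]
nodes.append(Node("0000000098", "used", 0))
nodes.append(Node("0000000099", "ready", 0))
nodes.append(Node("0000000100", "in-use", 0))
held = [n.id for n in nodes if n.state == "in-use"]

before = FakeZK(held)
print(deleter_pass(before, nodes, use_cache=False), before.lock_attempts)
# ['0000000098'] 100
after = FakeZK(held)
print(deleter_pass(after, nodes, use_cache=True), after.lock_attempts)
# ['0000000098'] 2
```

With the cache consulted, this pass issues 2 lock attempts instead of 100 and still finds the same deletable node. The stale entry (`0000000100`) shows the fallback described above. The cache wrongly suggests the node is unlocked, so the lock attempt fails and the node is reconsidered on the next pass.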

Change-Id: Ieb54babef92d5dbf3173316c6a1711e0e4a70403
James E. Blair 2023-03-14 15:49:50 -07:00
parent 5033d399c4
commit 49da7656fe
1 changed file with 5 additions and 1 deletion

@@ -843,7 +843,7 @@ class DeletedNodeWorker(BaseCleanupWorker):
                           zk.DELETING, zk.DELETED, zk.ABORTED)
 
         zk_conn = self._nodepool.getZK()
-        for node in zk_conn.nodeIterator():
+        for node in zk_conn.nodeIterator(cached=True, cached_ids=True):
             # If a ready node has been allocated to a request, but that
             # request is now missing, deallocate it.
             if (node.state == zk.READY and node.allocated_to
@@ -872,6 +872,10 @@ class DeletedNodeWorker(BaseCleanupWorker):
             if node.provider not in self._nodepool.config.providers:
                 continue
 
+            # This node is *probably* locked, so skip it.
+            if node.lock_contenders:
+                continue
+
             # Any nodes in these states that are unlocked can be deleted.
             if node.state in cleanup_states:
                 try: