Do not delete queues on error

When a quorum queue was in error, we used to delete the faulty queue and
re-declare it. The problem with that code is that, under heavy load, for
queues that are shared across agents, this could fail in a loop and never
recover.

Agent A detects that the queue is broken:
it deletes the queue and then recreates it.

In the meantime, agent B detects the same thing:
it deletes the queue (possibly the one agent A just recreated) and then
recreates it as well.

Now imagine this with more than 3k agents...
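The failure mode can be reduced to a toy, single-threaded model in which every agent reacts to the same broken queue in turn. `SharedQueue` and `recover()` are hypothetical names, not oslo.messaging or kombu code, and clearing consumers on deletion is an assumption approximating how a broker drops the consumers of a deleted queue. With delete-before-declare, each agent destroys the queue (and the consumers) that the previous agent just recreated; with a plain idempotent declare, nothing is thrown away.

```python
from dataclasses import dataclass, field


@dataclass
class SharedQueue:
    """Toy model of one queue shared by many agents (hypothetical)."""

    exists: bool = True
    generation: int = 0  # how many times the queue has been recreated
    deletions: int = 0
    consumers: set[int] = field(default_factory=set)

    def delete(self) -> None:
        # Assumption: deleting a queue also drops every consumer attached to it.
        self.exists = False
        self.deletions += 1
        self.consumers.clear()

    def declare(self) -> None:
        # Declare is idempotent: it only creates the queue when it is missing.
        if not self.exists:
            self.exists = True
            self.generation += 1

    def consume(self, agent: int) -> None:
        self.consumers.add(agent)


def recover(queue: SharedQueue, agents: int, delete_first: bool) -> SharedQueue:
    """Every agent reacts to the same error, one after another."""
    for agent in range(agents):
        if delete_first:
            queue.delete()  # old behaviour: throw away whatever exists now
        queue.declare()
        queue.consume(agent)
    return queue


old = recover(SharedQueue(), agents=3000, delete_first=True)
new = recover(SharedQueue(), agents=3000, delete_first=False)
print(old.deletions, old.generation, len(old.consumers))  # 3000 3000 1
print(new.deletions, new.generation, len(new.consumers))  # 0 0 3000
```

Real agents run concurrently, so their deletes also interleave with other agents' declares and consumes; the sequential model only counts the wasted work, which grows linearly with the number of agents.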

Change-Id: I762bf2839482ee06c5b2fc9fa50f38f5542cbe99
Closes-bug: #2133389
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Arnaud Morin
2025-11-20 15:29:52 +01:00
parent cc15b88d3c
commit ecef670efc
2 changed files with 5 additions and 26 deletions


@@ -558,14 +558,6 @@ class Consumer:
                           'Queue: [%(queue)s], '
                           'error message: [%(err_str)s]', info)
                 time.sleep(interval)
-                if self.queue_arguments.get('x-queue-type') == 'quorum':
-                    # Before re-declare queue, try to delete it
-                    # This is helping with issue #2028384
-                    # NOTE(amorin) we need to make sure the connection is
-                    # established again, because when an error occur, the
-                    # connection is closed.
-                    conn.ensure_connection()
-                    self.queue.delete()
                 self.queue.declare()
             else:
                 raise
@@ -608,24 +600,6 @@ class Consumer:
                                    nowait=self.nowait)
             else:
                 raise
-        except amqp_ex.InternalError as exc:
-            if self.queue_arguments.get('x-queue-type') == 'quorum':
-                # Before re-consume queue, try to delete it
-                # This is helping with issue #2028384
-                if exc.code == 541:
-                    LOG.warning('Queue %s seems broken, will try delete it '
-                                'before starting over.', self.queue.name)
-                    # NOTE(amorin) we need to make sure the connection is
-                    # established again, because when an error occur, the
-                    # connection is closed.
-                    conn.ensure_connection()
-                    self.queue.delete()
-                    self.declare(conn)
-                    self.queue.consume(callback=self._callback,
-                                       consumer_tag=str(tag),
-                                       nowait=self.nowait)
-            else:
-                raise
 
     def cancel(self, tag):
         LOG.trace('ConsumerBase.cancel: canceling %s', tag)


@@ -0,0 +1,5 @@
+---
+fixes:
+  - |
+    Avoid deleting RabbitMQ ``quorum`` queues if they are failing on server
+    side with ``Internal Server Error`` (error ``541``).
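After this change, the declare path keeps only "wait, then declare again" when it hits error 541, and the consume path no longer has a dedicated branch that deletes and re-creates the queue. The sketch below models the resulting declare flow. `AmqpError`, `FlakyQueue`, `declare_with_retry`, and the 2-second default interval are hypothetical stand-ins, not the driver's actual classes or values; the real code also logs the error before sleeping.

```python
import time


class AmqpError(Exception):
    """Hypothetical stand-in for the broker error that carries code 541."""

    def __init__(self, code: int) -> None:
        super().__init__(f"AMQP error {code}")
        self.code = code


class FlakyQueue:
    """Hypothetical queue whose first `failures` declares fail with 541."""

    def __init__(self, failures: int = 1) -> None:
        self.failures = failures
        self.declares = 0
        self.deletes = 0

    def declare(self) -> None:
        if self.failures:
            self.failures -= 1
            raise AmqpError(541)
        self.declares += 1

    def delete(self) -> None:
        self.deletes += 1


def declare_with_retry(queue, interval: float = 2.0, sleep=time.sleep) -> None:
    """Declare once; on a 541 error wait `interval` seconds and declare again.

    Mirrors the shape of the patched code path: there is no
    queue.delete() before the second declare, quorum queue or not.
    """
    try:
        queue.declare()
    except AmqpError as exc:
        if exc.code != 541:
            raise
        sleep(interval)
        queue.declare()  # a second failure propagates to the caller


q = FlakyQueue(failures=1)
declare_with_retry(q, sleep=lambda _seconds: None)  # skip the real wait
print(q.declares, q.deletes)  # 1 0
```

Because declaring an existing queue is harmless, many agents can run this path at the same time without undoing each other's work, which is the property the delete-first approach lacked.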