From 9e868e15de576ef6ae2d873edc7f6ffed469165d Mon Sep 17 00:00:00 2001
From: Michele Baldessari
Date: Thu, 19 Dec 2019 07:13:41 +0100
Subject: [PATCH] Increase rabbitmq tcp backlog

From https://bugzilla.redhat.com/show_bug.cgi?id=1778428

We need to tune the default rabbitmq tcp listen backlog. It currently
defaults to 128, but here's what happens:

Say we have 1500 total rabbitmq client connections spread across a
3-node cluster, evenly distributed so each node has 500 clients. Then
we stop rabbitmq on one of the nodes. Those 500 client connections all
immediately fail over to the other two nodes. Assuming a roughly even
split, each node gets 250 connections simultaneously. Since the tcp
listen backlog is only 128, a large number of the failover connections
cannot connect and get ECONNREFUSED because the kernel just drops them.
Eventually the clients retry and the backlog clears, but the failed
attempts make the logs noisy and make failover take a little longer.

The upstream docs discuss this here:
https://www.rabbitmq.com/networking.html#tuning-for-large-number-of-connections-connection-backlog

Suggested-By: John Eckersberg
Closes-Bug: #1854704
Change-Id: If6da4aff016db9a72e1cb9dfc9731f06e062f64d
(cherry picked from commit 9f4832fcc4d939da3d4e7f83e26c4f934bff7dc0)
(cherry picked from commit 5c8f9c67f1bd3eccaf68490b7432935deb171776)
---
 puppet/services/rabbitmq.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/puppet/services/rabbitmq.yaml b/puppet/services/rabbitmq.yaml
index 1c25be3e29..58b92b30ef 100644
--- a/puppet/services/rabbitmq.yaml
+++ b/puppet/services/rabbitmq.yaml
@@ -104,6 +104,7 @@ outputs:
             rabbitmq::wipe_db_on_cookie_change: true
             rabbitmq::port: 5672
             rabbitmq::loopback_users: []
+            rabbitmq::tcp_backlog: 4096
             rabbitmq::package_provider: yum
             rabbitmq::package_source: undef
             rabbitmq::repos_ensure: false
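
Not part of the patch itself, but a sketch of how the change could be
checked on a node after the Puppet run. The rabbitmq::tcp_backlog hiera
value is consumed by the puppetlabs-rabbitmq module; exactly how it is
rendered into the RabbitMQ configuration depends on the module version,
so treat the commands below as an illustration under those assumptions
rather than a verified procedure. The kernel also caps the effective
accept queue at net.core.somaxconn, a point the upstream networking
guide linked above covers, so that sysctl may need raising as well.

    # Assumed verification steps on a controller node after the deploy.
    # The effective listen backlog is min(4096, net.core.somaxconn), so
    # raise the kernel cap too where it still defaults to 128:
    sysctl -w net.core.somaxconn=4096

    # For listening sockets, ss reports the configured backlog in the
    # Send-Q column; expect 4096 on the AMQP listener once RabbitMQ has
    # picked up the new setting.
    ss -ltn 'sport = :5672'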