From 3ffae36b9de967747a1cc1045eae0923d6bd893c Mon Sep 17 00:00:00 2001
From: Bogdan Dobrelya
Date: Mon, 8 Jun 2015 15:24:36 +0200
Subject: [PATCH] Adjust the RabbitMQ kernel net_ticktime parameter

Without this fix, arbitrary queue masters can get stuck during short
CPU load spikes. rabbitmqctl then hangs, and the OCF monitor logic of
the related Pacemaker RA erases and restarts the affected rabbit node.
This is a problem because Oslo.messaging does not yet seem able to
survive such a short outage of the underlying AMQP layer without
disrupting the RPC calls in flight.

The workaround is to reduce the net_ticktime kernel parameter from its
default of 60 seconds to 10 seconds. This should allow the RabbitMQ
cluster to detect short-lived partitions sooner and autoheal them.
Once a partition has been detected and healed, rabbitmqctl should
hopefully no longer hang, because the stuck queue masters will have
been recovered.

Partial-bug: #1460762
Change-Id: I3aa47b51ae080bb4a8b298c61a629ac8225d2abd
Signed-off-by: Bogdan Dobrelya
---
 deployment/puppet/osnailyfacter/modular/rabbitmq/rabbitmq.pp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/deployment/puppet/osnailyfacter/modular/rabbitmq/rabbitmq.pp b/deployment/puppet/osnailyfacter/modular/rabbitmq/rabbitmq.pp
index 4b80bd6b58..0797b3d9a8 100644
--- a/deployment/puppet/osnailyfacter/modular/rabbitmq/rabbitmq.pp
+++ b/deployment/puppet/osnailyfacter/modular/rabbitmq/rabbitmq.pp
@@ -57,6 +57,7 @@ if $queue_provider == 'rabbitmq' {
       'inet_dist_listen_min' => '41055',
       'inet_dist_listen_max' => '41055',
       'inet_default_connect_options' => '[{nodelay,true}]',
+      'net_ticktime' => '10',
     }
   )
   $config_variables = hiera('rabbit_config_variables',
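
Reviewer note: to make the effect of the new value concrete, here is a rough sketch of the detection window it implies. It assumes the behavior described in the Erlang kernel documentation, where a silent peer is considered down after between T - T/4 and T + T/4 seconds (T being net_ticktime). The function name `detection_window` is hypothetical and is not part of this change.

```python
# Sketch only: failure-detection window implied by the Erlang kernel
# net_ticktime setting, assuming detection happens between T - T/4 and
# T + T/4 seconds as described in the Erlang kernel documentation.


def detection_window(net_ticktime):
    """Return (min_seconds, max_seconds) before a silent peer is considered down."""
    quarter = net_ticktime / 4
    return (net_ticktime - quarter, net_ticktime + quarter)


if __name__ == "__main__":
    # Compare the default (60 s) with the value set by this patch (10 s).
    for ticktime in (60, 10):
        low, high = detection_window(ticktime)
        print(f"net_ticktime={ticktime}s: peer declared down after {low:g}-{high:g}s")
```

Under that assumption, the default of 60 seconds gives a window of 45 to 75 seconds, while 10 seconds narrows it to 7.5 to 12.5 seconds. That is why short partitions would be noticed, and autohealed, much sooner.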