Increase tolerance for declaring neutron agents down
The neutron server listens for heartbeats from the various neutron agents running on worker nodes. The agents send this heartbeat every 30s, but use a synchronous RPC, which can take up to 60s to time out if the rabbitmq server disappears (e.g. when a controller host is powered down unexpectedly). The default agent_down_time is 75s, so if two of these synchronous RPC messages time out in a row (due to rabbitmq server issues related to a controller power down or swact), the neutron agent will be incorrectly declared down. This causes the VIM to migrate instances away from the worker node, which we want to avoid.

To make this more tolerant of temporary failures in the rabbitmq server, I am increasing the timeout (agent_down_time) to 150s.

Change-Id: Iecd1a7d1034bc8c98853ba279336c26dc7bc3fe9
Closes-Bug: 1817935
Signed-off-by: Bart Wensley <barton.wensley@windriver.com>
@@ -893,6 +893,9 @@ data:
 enable_new_agents: false
 allow_automatic_dhcp_failover: true
 allow_automatic_l3agent_failover: true
+# Increase from default of 75 seconds to avoid agents being declared
+# down during controller swacts, reboots, etc...
+agent_down_time: 150
 agent:
 root_helper: sudo
 vhost:
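The timing argument in the commit message can be checked with a small calculation. The sketch below is only an illustration, not neutron code: the function names are hypothetical, and the model is deliberately simplified. It assumes each failed heartbeat blocks for the full 60s RPC timeout and that the server hears nothing from the agent while those calls are blocked. It ignores the 30s reporting interval and the exact moment of the last successful heartbeat.

```python
RPC_TIMEOUT_S = 60  # worst-case time for one synchronous heartbeat RPC to time out


def blocked_seconds(consecutive_timeouts, rpc_timeout=RPC_TIMEOUT_S):
    """Seconds an agent spends blocked in consecutive timed-out heartbeat RPCs."""
    return consecutive_timeouts * rpc_timeout


def declared_down(consecutive_timeouts, agent_down_time, rpc_timeout=RPC_TIMEOUT_S):
    """True if the blocked time alone exceeds agent_down_time (simplified model)."""
    return blocked_seconds(consecutive_timeouts, rpc_timeout) > agent_down_time


if __name__ == "__main__":
    for down_time in (75, 150):
        for timeouts in (1, 2):
            print(
                f"agent_down_time={down_time}s, timeouts in a row={timeouts}: "
                f"declared down={declared_down(timeouts, down_time)}"
            )
```

Under these assumptions, two timeouts in a row account for 120s of silence, which exceeds the 75s default but stays below the new 150s value.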