539e376978
The previous commit attempted to prevent a hung process/thread by using a "tcp ping" to determine whether the server_endpoint was reachable before deploying the heat stack and starting the heartbeat thread. While this approach works well in theory, in practice in a live K8 environment we are seeing many random errors such as: [Errno 111] Connection refused: ConnectionRefusedError. There could be numerous reasons for these random connection errors, but ZMQ has retry logic which should overcome these problems. This commit updates the sockets used in ZMQ to add a timeout (a padded value of agent_loss_timeout). While this does not prevent the creation of a heat stack and heartbeat thread that might never respond, it does solve the initial problem of stuck processes/threads and allows a clean exit. Change-Id: I8193c72120b459c2a18d780d9f8799e8df592e20 |
||
---|---|---|
agent | ||
engine | ||
openstack | ||
resources | ||
scenarios | ||
tests | ||
__init__.py | ||
lib.py | ||
version.py |
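
The "tcp ping" reachability check mentioned in the commit message can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function name `tcp_ping`, its parameters, and the default timeout are assumptions made for the example.

```python
import socket


def tcp_ping(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration; the name and signature are not
    taken from the repository.
    """
    try:
        # create_connection raises OSError (or a subclass) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError ([Errno 111] on Linux) and socket timeouts
        # are both OSError subclasses, so a refused or slow endpoint lands here.
        return False
```

A single probe like this reports only whether the endpoint accepted a connection at that instant. As the commit message notes, transient refusals can occur in a live environment, so one `False` result does not prove the endpoint is permanently unreachable, which is why the commit relies on ZMQ's retry logic plus a socket timeout instead.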