03f7dc29b7
Previously the initial call to connect to an RBD cluster via the RADOS API could hang indefinitely if network or other environment-related issues were encountered. When this happens during a call to update_available_resource, the local n-cpu service can report as UP while never being able to break out of a subsequent RPC timeout loop, as documented in the bug closed below.

This change adds a simple timeout configurable to be used when initially connecting to the cluster [1][2][3]. The default timeout of 5 seconds is small enough to ensure that, if the issue is encountered, the n-cpu service can be marked as DOWN before an RPC timeout is seen.

[1] http://docs.ceph.com/docs/luminous/rados/api/python/#rados.Rados.connect
[2] http://docs.ceph.com/docs/mimic/rados/api/python/#rados.Rados.connect
[3] http://docs.ceph.com/docs/nautilus/rados/api/python/#rados.Rados.connect

Closes-bug: #1834048
Change-Id: I67f341bf895d6cc5d503da274c089d443295199e
13 lines
509 B
YAML
---
other:
  - |
    A new ``[libvirt]/rbd_connect_timeout`` configuration option has been
    introduced to limit the time spent waiting when connecting to a RBD cluster
    via the RADOS API. This timeout currently defaults to 5 seconds.

    This aims to address issues reported in `bug 1834048`_ where failures to
    initially connect to a RBD cluster left the nova-compute service inoperable
    due to constant RPC timeouts being hit.

    .. _bug 1834048: https://bugs.launchpad.net/nova/+bug/1834048
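
    As a minimal sketch, assuming the compute node reads its settings from
    ``nova.conf``, the option could be set explicitly as shown here. The
    value is simply the stated default; a suitable value depends on the
    deployment::

        [libvirt]
        rbd_connect_timeout = 5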