fe26a52024
This change makes nova configure oslo.messaging's active call monitoring feature if the operator increases the rpc_response_timeout configuration option beyond the default of 60 seconds. If this happens, oslo.messaging will heartbeat actively-running calls to indicate that they are still running, avoiding a false timeout at the shorter interval, while still detecting actual dead-service failures before the longer timeout value.

In addition, this adds a long_rpc_timeout configuration option that we can use for known-to-run-long operations separately from the base rpc_response_timeout value, and pre_live_migration() is changed to use this, as it is known to suffer from early false timeouts.

Depends-On: Iecb7bef61b3b8145126ead1f74dbaadd7d97b407
Change-Id: Icb0bdc6d4ce4524341e70e737eafcb25f346d197
21 lines
1.1 KiB
YAML
---
features:
  - |
    Utilizing recent changes in oslo.messaging, the
    `rpc_response_timeout` value can now be increased significantly if
    needed to keep long-running RPC calls (such as live migration
    prep) from timing out before they legitimately complete. If
    `rpc_response_timeout` is increased beyond the default, nova will
    request active call monitoring from oslo.messaging, which will
    effectively heartbeat running activities to avoid a timeout, while
    still detecting failures related to service outages or message bus
    congestion in a reasonable amount of time. Further, the
    `[DEFAULT]/long_rpc_timeout` option has been added, which allows
    setting an alternate timeout value for RPC calls that are known to
    take a long time. The default for this is 1800 seconds, and the
    `rpc_response_timeout` value will be used for the heartbeat
    frequency interval, providing a similar failure-detection
    experience for these calls despite the longer overall timeout.
    Currently, only the live migration RPC call uses this longer
    timeout value.
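
The interaction between the two options described in the release note can be modelled in a few lines of Python. This is only an illustrative sketch, not nova's actual code: `CallTimeouts` and `choose_timeouts` are hypothetical names, and the choice to heartbeat at the 60-second default when `rpc_response_timeout` is raised is an assumption read from the commit message ("avoiding a false timeout at the shorter interval").

```python
from dataclasses import dataclass
from typing import Optional

# Default value of rpc_response_timeout, as stated in the commit message.
DEFAULT_RPC_RESPONSE_TIMEOUT = 60


@dataclass(frozen=True)
class CallTimeouts:
    """Hypothetical container for the two values attached to one RPC call."""
    timeout: int                         # overall wait for the reply, in seconds
    call_monitor_timeout: Optional[int]  # heartbeat interval; None = no monitoring


def choose_timeouts(rpc_response_timeout: int = 60,
                    long_rpc_timeout: int = 1800,
                    long_running: bool = False) -> CallTimeouts:
    """Pick timeouts for a call, following the behaviour the note describes."""
    if long_running:
        # Known-long calls (currently live migration prep): wait up to
        # long_rpc_timeout, heartbeating every rpc_response_timeout seconds.
        return CallTimeouts(long_rpc_timeout, rpc_response_timeout)
    if rpc_response_timeout > DEFAULT_RPC_RESPONSE_TIMEOUT:
        # Operator raised the base timeout: request active call monitoring.
        # Assumption: the heartbeat interval is the old 60-second default.
        return CallTimeouts(rpc_response_timeout, DEFAULT_RPC_RESPONSE_TIMEOUT)
    # Default configuration: plain timeout, no call monitoring.
    return CallTimeouts(rpc_response_timeout, None)


if __name__ == "__main__":
    print(choose_timeouts())                          # default settings
    print(choose_timeouts(rpc_response_timeout=300))  # raised base timeout
    print(choose_timeouts(long_running=True))         # long-running call
```

In this model a dead service is still noticed after roughly one heartbeat interval, while a healthy but slow call may run up to the longer overall timeout.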