Fix Field `health_status_reason[api]' cannot be None`

If nodes in a cluster were deleted, periodic health checks for the cluster
will time out. This results in lots of log entries like the following:

 ERROR oslo.service.loopingcall ValueError: Field
 `health_status_reason[api]' cannot be None

The timeout is successfully caught by exception handling in this part of
the code. However, it is raised as a MaxRetryError, which has neither a
body nor a message attribute. E.g.

 MaxRetryError("HTTPSConnectionPool(host='', port=6443): Max
 retries exceeded with url: /healthz (Caused by
 NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object
 at 0x7f6b9b915a20>: Failed to establish a new connection: [Errno 110]

This means health_status_reason will be a dict like `{'api': None}`.
Saving this using oslo_versionedobjects will throw a ValueError,
because although the dict itself is coerced as a `Dict(default=<class
'oslo_versionedobjects.fields.UnspecifiedDefault'>,nullable=True)`, the
None value will be coerced as a `String(default=<class
'oslo_versionedobjects.fields.UnspecifiedDefault'>,nullable=False)` and
that is not nullable.
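
The failure path described above can be reproduced in a minimal sketch.
It assumes a hypothetical exception class (FakeMaxRetryError) and a
hypothetical helper (coerce_non_nullable_string) that only imitates the
behaviour of a non-nullable String field. Neither is part of urllib3 or
oslo_versionedobjects.

```python
class FakeMaxRetryError(Exception):
    """Hypothetical stand-in for an exception with no body/message attrs."""


def coerce_non_nullable_string(attr, value):
    """Illustrative imitation of a String(nullable=False) coercion."""
    if value is None:
        raise ValueError("Field `%s' cannot be None" % attr)
    return str(value)


def reason_from_exception(exc):
    # Same fallback chain as the monitor code: body first, then message.
    return getattr(exc, 'body', None) or getattr(exc, 'message', None)


exc = FakeMaxRetryError("Max retries exceeded with url: /healthz")

# Neither attribute exists on the exception, so the reason is None.
health_status_reason = {'api': reason_from_exception(exc)}

try:
    # The dict itself is acceptable, but each value must be a non-None string.
    for key, value in health_status_reason.items():
        coerce_non_nullable_string('health_status_reason[%s]' % key, value)
except ValueError as err:
    error_message = str(err)
```

Here the dict passes as a container, while the None stored under 'api' is
what triggers the ValueError.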

Task: 38316
Change-Id: I8fd8d363284b06cf0bfba45d5845ba8687a2c783
(cherry picked from commit 30436350af)
(cherry picked from commit 62c5de0743)
Jake Yip 5 months ago
committed by Bharat Kunwar
1 changed file with 2 additions and 1 deletion
magnum/drivers/common/

@@ -233,6 +233,7 @@ class K8sMonitor(monitors.MonitorBase):
             if not api_status:
                 api_status = (getattr(exp_api, 'body', None) or
                               getattr(exp_api, 'message', None))
-            health_status_reason['api'] = api_status
+            if api_status is not None:
+                health_status_reason['api'] = api_status
 
         return health_status, health_status_reason
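
As a rough illustration of the guarded assignment in this change, the
sketch below isolates the logic in a hypothetical helper
(record_api_reason) and uses a hypothetical exception class
(ErrorWithBody). Both names are assumptions made for the example and do
not appear in magnum.

```python
class ErrorWithBody(Exception):
    """Hypothetical exception that carries a body attribute."""

    def __init__(self, body):
        super().__init__(body)
        self.body = body


def record_api_reason(health_status_reason, exp_api, api_status=None):
    """Store a reason under 'api' only when one could be determined."""
    if not api_status:
        api_status = (getattr(exp_api, 'body', None) or
                      getattr(exp_api, 'message', None))
    # Guard from this change: never store None in the dict.
    if api_status is not None:
        health_status_reason['api'] = api_status
    return health_status_reason


# An exception without body/message leaves the dict untouched.
without_attrs = record_api_reason({}, Exception("timed out"))

# An exception with a body is recorded as before.
with_body = record_api_reason({}, ErrorWithBody("unhealthy"))
```

With the guard in place, the 'api' key is simply absent when no reason is
available, so no None value reaches the String coercion.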