Fix Field `health_status_reason[api]' cannot be None`

If nodes in a cluster were deleted, periodic health checks for the cluster
will time out. This will result in lots of logs like the following:

 ERROR oslo.service.loopingcall ValueError: Field
 `health_status_reason[api]' cannot be None

The timeout is successfully caught by exception handling in this part of
the code. However, it is raised as a MaxRetryError exception, which has
neither a body nor a message attribute. E.g.

 MaxRetryError("HTTPSConnectionPool(host='115.146.81.72', port=6443): Max
 retries exceeded with url: /healthz (Caused by
 NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object
 at 0x7f6b9b915a20>: Failed to establish a new connection: [Errno 110]
 ETIMEDOUT',))",)
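
A minimal sketch of why the lookup ends up as None. FakeMaxRetryError and
extract_api_status are hypothetical names used only for illustration; the
stand-in class merely mimics an exception that has no body or message
attributes, and the lookup mirrors the getattr chain in the monitor code:

```python
class FakeMaxRetryError(Exception):
    """Hypothetical stand-in: an exception without body/message attributes."""


def extract_api_status(exp_api):
    # Same lookup order as the monitor code: 'body' first, then 'message'.
    return (getattr(exp_api, 'body', None) or
            getattr(exp_api, 'message', None))


exc = FakeMaxRetryError("Max retries exceeded with url: /healthz")
# Neither attribute exists on the exception, so both getattr calls
# fall back to None.
api_status = extract_api_status(exc)
```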

This means health_status_reason will be a dict like `{'api': None}`.
Saving this using oslo_versionedobjects will throw a ValueError,
because although the dict itself is coerced as a `Dict(default=<class
'oslo_versionedobjects.fields.UnspecifiedDefault'>,nullable=True)`, the
None value will be coerced as a `String(default=<class
'oslo_versionedobjects.fields.UnspecifiedDefault'>,nullable=False)`,
which is not nullable.
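
The behaviour can be sketched without the library. coerce_string_dict and
build_reason below are hypothetical helpers, not oslo_versionedobjects
APIs. They assume a nullable dict whose values are non-nullable strings,
and they show how leaving the key out avoids the error:

```python
def coerce_string_dict(field_name, value):
    """Simplified stand-in: nullable dict, non-nullable string values."""
    if value is None:
        return None  # the dict itself is allowed to be None
    coerced = {}
    for key, element in value.items():
        if element is None:
            # Each element is checked as a non-nullable string.
            raise ValueError(
                "Field `%s[%s]' cannot be None" % (field_name, key))
        coerced[key] = str(element)
    return coerced


def build_reason(api_status):
    # Only record the API status when there is one, so the dict never
    # carries a None value.
    health_status_reason = {}
    if api_status is not None:
        health_status_reason['api'] = api_status
    return health_status_reason
```

With these helpers, `coerce_string_dict('health_status_reason', {'api': None})`
raises the ValueError, while the dict returned by `build_reason(None)` is
empty and passes.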

Task: 38316
Change-Id: I8fd8d363284b06cf0bfba45d5845ba8687a2c783
Jake Yip 2020-01-20 21:53:59 +11:00
parent 7f8ffe7d7b
commit 30436350af
1 changed files with 2 additions and 1 deletions


@@ -246,6 +246,7 @@ class K8sMonitor(monitors.MonitorBase):
             if not api_status:
                 api_status = (getattr(exp_api, 'body', None) or
                               getattr(exp_api, 'message', None))
-            health_status_reason['api'] = api_status
+            if api_status is not None:
+                health_status_reason['api'] = api_status
         return health_status, health_status_reason