Use nova's ping method to find out if the service is alive

Currently the probe sends a fake RPC call,
"pod_health_probe_method_ignore_errors", to the service just to find out
whether it is responding. Because no such method exists, the probe has to
catch and handle the exception that the service inevitably throws.

While this technically works, the exceptions pollute the log files and
make it harder for users to spot real errors.

This is what the error looks like:

ERROR oslo_messaging.rpc.server [-] Exception during message handling: oslo_messaging.rpc.dispatcher.UnsupportedVersion: Endpoint does not support RPC version 1.0. Attempted method: pod_health_probe_method_ignore_errors
ERROR oslo_messaging.rpc.server Traceback (most recent call last):
ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
ERROR oslo_messaging.rpc.server   File "/var/lib/openstack/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 276, in dispatch
ERROR oslo_messaging.rpc.server     raise UnsupportedVersion(version, method=method)
ERROR oslo_messaging.rpc.server oslo_messaging.rpc.dispatcher.UnsupportedVersion: Endpoint does not support RPC version 1.0. Attempted method: pod_health_probe_method_ignore_errors

This situation is new since https://review.openstack.org/#/c/639711/
which (correctly) increased the default level of logging. Before 639711,
error messages from oslo (both real ones and those that could be ignored)
did not appear in the nova logs at all.

Fortunately, nova's BaseAPI class provides a 'ping' method that can
be used for this basic purpose by all nova components.
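
The difference between the two probing styles can be sketched without a
message bus. This is only an illustration: FakeRPCClient, its error counter
and both probe functions are hypothetical stand-ins, not part of nova or
oslo.messaging, and the reply shape is an assumption.

```python
class FakeRPCClient:
    """Hypothetical stand-in for an RPC client; not the oslo.messaging API."""

    def __init__(self, known_methods):
        self.known_methods = set(known_methods)
        # Counts how many calls would have left an error in the service log.
        self.errors_logged = 0

    def call(self, ctxt, method, **kwargs):
        if method not in self.known_methods:
            # The real service logs a traceback before reporting the failure.
            self.errors_logged += 1
            raise RuntimeError("Attempted method: %s" % method)
        # Assumed reply shape, for illustration only.
        return {"method": method, "args": kwargs}


def probe_with_fake_method(client):
    """Old style: call a method that does not exist and swallow the error."""
    try:
        client.call({}, "pod_health_probe_method_ignore_errors")
    except RuntimeError:
        # An error reply still proves that the service consumed the message.
        pass
    return True


def probe_with_ping(client):
    """New style: call 'ping', which the service actually implements."""
    client.call({}, "ping", arg=None)
    return True
```

Both probes report the service as alive, but only the first one leaves an
error behind on every run.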

Change-Id: I0062e74bed399206becb8d9e00f9ec805da864a3
Jiří Suchomel 2019-04-09 10:37:46 +02:00
parent 4b4745f1cd
commit baf5356a4f

@@ -17,8 +17,8 @@
 """
 Health probe script for OpenStack service that uses RPC/unix domain socket for
 communication. Check's the RPC tcp socket status on the process and send
-message to service through rpc call method and expects a reply. It is expected
-to receive failure from the service's RPC server as the method does not exist.
+message to service through rpc call method and expects a reply.
+Use nova's ping method that is designed just for such simple purpose.
 Script returns failure to Kubernetes only when
 a. TCP socket for the RPC communication are not established.
@@ -28,7 +28,7 @@ Script returns failure to Kubernetes only when
 sys.stderr.write() writes to pod's events on failures.
 Usage example for Nova Compute:
-# python health-probe-rpc.py --config-file /etc/nova/nova.conf \
+# python health-probe.py --config-file /etc/nova/nova.conf \
 # --service-queue-name compute
 """
@@ -50,12 +50,15 @@ def check_service_status(transport):
     """Verify service status. Return success if service consumes message"""
     try:
         target = oslo_messaging.Target(topic=cfg.CONF.service_queue_name,
-                                       server=socket.gethostname())
+                                       server=socket.gethostname(),
+                                       namespace='baseapi',
+                                       version="1.1")
         client = oslo_messaging.RPCClient(transport, target,
                                           timeout=60,
                                           retry=2)
         client.call(context.RequestContext(),
-                    'pod_health_probe_method_ignore_errors')
+                    'ping',
+                    arg=None)
     except oslo_messaging.exceptions.MessageDeliveryFailure:
         # Log to pod events
         sys.stderr.write("Health probe unable to reach message bus")