OpenStack cross service/project profiler

Boris Pavlovic b2c9b86ad7 Add OSprofiler docs This is required by python-docs job + it's good to have documentation on read the docs. As we don't won't to duplicate work, index.rst is just symlink to README.rst Change-Id: I1b42fc7c135367ca77949998c5e0db5fc5dd7434		2014-08-01 12:27:42 +04:00
doc/source	Add OSprofiler docs	2014-08-01 12:27:42 +04:00
osprofiler	Merge "Use compare_digest or an equivalent when available"	2014-07-25 00:05:43 +00:00
tests	Merge "Prevent Messaging to resend failed notifications"	2014-07-21 17:36:58 +00:00
tools	Init Strucutre of lib	2014-01-09 11:25:23 +04:00
.gitignore	Add OSprofiler docs	2014-08-01 12:27:42 +04:00
.gitreview	Add git review file	2014-06-09 22:51:17 +04:00
.testr.conf	Init Strucutre of lib	2014-01-09 11:25:23 +04:00
LICENSE	Init Strucutre of lib	2014-01-09 11:25:23 +04:00
README.rst	Imporve read me	2014-07-14 07:12:18 +04:00
requirements.txt	Remove unused libs from requirments and fix info in setup.cfg	2014-06-25 02:36:33 +04:00
setup.cfg	Add OSprofiler docs	2014-08-01 12:27:42 +04:00
setup.py	Init Strucutre of lib	2014-01-09 11:25:23 +04:00
test-requirements.txt	Add OSprofiler docs	2014-08-01 12:27:42 +04:00
tox.ini	Add OSprofiler docs	2014-08-01 12:27:42 +04:00

README.rst

OSProfiler

OSProfiler is an OpenStack cross-project profiling library.

Background

OpenStack consists of multiple projects. Each project, in turn, is composed of multiple services. To process a request, e.g. to boot a virtual machine, OpenStack uses multiple services from different projects. If something works too slowly, it is extremely hard to understand what exactly goes wrong and to locate the bottleneck.

To resolve this issue, we introduce a tiny but powerful library, osprofiler, that is going to be used by all OpenStack projects and their python clients. It generates 1 trace per request that goes through all involved services, and builds a tree of calls (see an example).

Why not cProfile etc.?

The scope of this library is quite different:

- We are interested in getting one trace of points from different services, not in tracing all python calls inside one process.
- This library should be easy to integrate into OpenStack. This means that:
  - It shouldn't require too many changes in the code bases of integrating projects.
  - We should be able to turn it off fully.
  - We should be able to keep it turned on in lazy mode in production (e.g. an admin should be able to "trace" on request).

OSprofiler API version 0.2.5

There are a couple of things that you should know about the API before using it.

4 ways to add a new trace point

    from osprofiler import profiler

    def some_func():
        profiler.start("point_name", {"any_key": "with_any_value"})
        # your code
        profiler.stop({"any_info_about_point": "in_this_dict"})


    @profiler.Trace("point_name",
                    info={"any_info_about_point": "in_this_dict"},
                    hide_args=False)
    def some_func2(*args, **kwargs):
        # If you need to hide args in profile info, put hide_args=True
        pass

    def some_func3():
        with profiler.trace("point_name",
                            info={"any_key": "with_any_value"}):
            # some code here
            pass

    @profiler.trace_cls("point_name", info={}, hide_args=False,
                        trace_private=False)
    class TracedClass(object):

        def traced_method(self):
            pass

        def _traced_only_if_trace_private_true(self):
            pass

How does the profiler work?

- @profiler.Trace() and profiler.trace() are just syntactic sugar that call the profiler.start() and profiler.stop() methods.
- Every call of profiler.start() or profiler.stop() sends 1 message to the collector. This means that every trace point creates 2 records in the collector (more about the collector and records later).
- Nested trace points are supported. The sample below produces 2 trace points:

      profiler.start("parent_point")
      profiler.start("child_point")
      profiler.stop()
      profiler.stop()

  The implementation is quite simple. The profiler has one stack that contains the ids of all trace points. E.g.:

      profiler.start("parent_point")  # trace_stack.push(<new_uuid>)
                                      # send to collector -> trace_stack[-2:]

      profiler.start("child_point")   # trace_stack.push(<new_uuid>)
                                      # send to collector -> trace_stack[-2:]

      profiler.stop()                 # send to collector -> trace_stack[-2:]
                                      # trace_stack.pop()

      profiler.stop()                 # send to collector -> trace_stack[-2:]
                                      # trace_stack.pop()

  It's simple to build a tree of nested trace points, having (parent_id, point_id) of all trace points.
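
The stack handling described above can be sketched in plain Python. This is only an illustration, not osprofiler's code: the class SketchProfiler, its send callback, and the fallback of using base_id as the root parent are all assumptions made for the example.

```python
import uuid


class SketchProfiler:
    """Hypothetical stand-in that mimics the stack handling described above."""

    def __init__(self, base_id, parent_id=None, send=print):
        self.base_id = base_id
        # Assumption: without an explicit parent, base_id acts as the root parent.
        self.trace_stack = [base_id, parent_id or base_id]
        self.names = []   # names of the currently open trace points
        self.send = send  # callable that delivers one message to a collector

    def _message(self, name, suffix, info):
        # The two topmost ids are (parent_id, trace_id) of the current point.
        parent_id, trace_id = self.trace_stack[-2:]
        return {
            "name": "%s-%s" % (name, suffix),
            "base_id": self.base_id,
            "parent_id": parent_id,
            "trace_id": trace_id,
            "info": info or {},
        }

    def start(self, name, info=None):
        self.trace_stack.append(str(uuid.uuid4()))
        self.names.append(name)
        self.send(self._message(name, "start", info))

    def stop(self, info=None):
        name = self.names.pop()
        self.send(self._message(name, "stop", info))
        self.trace_stack.pop()


collected = []
p = SketchProfiler(base_id=str(uuid.uuid4()), send=collected.append)
p.start("parent_point")
p.start("child_point")
p.stop()
p.stop()
```

After the four calls, collected holds four messages, and the parent_id of the child point's messages equals the trace_id of the parent point's messages.
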
Process of sending to collector

Each trace point produces 2 messages (start and stop). Messages like the one below are sent to a collector:

    {
        "name": <point_name>-(start|stop),
        "base_id": <uuid>,
        "parent_id": <uuid>,
        "trace_id": <uuid>,
        "info": <dict>
    }

- base_id - <uuid> that is equal for all trace points that belong to one trace. This simplifies retrieving all trace points related to one trace from the collector.
- parent_id - <uuid> of the parent trace point
- trace_id - <uuid> of the current trace point
- info - a dictionary that contains user information passed via calls of the profiler start() & stop() methods.
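
Given messages with the fields listed above, the tree of calls could be rebuilt roughly as follows. This is a minimal sketch: the function build_tree and the sample messages (with short strings standing in for real UUIDs) are hypothetical and not part of osprofiler.

```python
def build_tree(messages):
    """Group start/stop messages into nodes and link them via parent_id."""
    nodes = {}
    for msg in messages:
        node = nodes.setdefault(
            msg["trace_id"],
            {"parent_id": msg["parent_id"], "children": [], "events": []},
        )
        node["events"].append(msg["name"])

    roots = []
    for node in nodes.values():
        parent = nodes.get(node["parent_id"])
        if parent is None:
            # The parent is not a collected point, so this is a top-level point.
            roots.append(node)
        else:
            parent["children"].append(node)
    return roots


# Hypothetical messages for one trace with a parent point and a child point.
sample = [
    {"name": "parent_point-start", "base_id": "b", "parent_id": "b",
     "trace_id": "t1", "info": {}},
    {"name": "child_point-start", "base_id": "b", "parent_id": "t1",
     "trace_id": "t2", "info": {}},
    {"name": "child_point-stop", "base_id": "b", "parent_id": "t1",
     "trace_id": "t2", "info": {}},
    {"name": "parent_point-stop", "base_id": "b", "parent_id": "b",
     "trace_id": "t1", "info": {}},
]
roots = build_tree(sample)
```

Here roots contains a single node for parent_point, whose children list contains the node for child_point.
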
Setting up the collector.

The profiler doesn't include a trace point collector. The user/developer should instead provide a method that sends messages to a collector. Let's take a look at a trivial sample, where the collector is just a file:

    import json

    from osprofiler import notifier

    def send_info_to_file_collector(info, context=None):
        with open("traces", "a") as f:
            f.write(json.dumps(info))

    notifier.set(send_info_to_file_collector)

So now on every profiler.start() and profiler.stop() call we will write info about the trace point to the end of the traces file.
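
If the file is meant to be read back later, one possible variation is to write one JSON record per line. The sketch below assumes that layout; make_file_collector and read_traces are hypothetical helper names, and the resulting callable would be passed to notifier.set() in the same way as in the sample above.

```python
import json
import os
import tempfile


def make_file_collector(path):
    """Return a notifier-style callable that appends one JSON record per line."""
    def send_info_to_file_collector(info, context=None):
        with open(path, "a") as f:
            f.write(json.dumps(info) + "\n")
    return send_info_to_file_collector


def read_traces(path):
    """Load every record previously written by the collector."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


# Demonstration with a temporary file and two hand-written messages.
traces_path = os.path.join(tempfile.mkdtemp(), "traces")
collector = make_file_collector(traces_path)
collector({"name": "point_name-start", "info": {}})
collector({"name": "point_name-stop", "info": {}})
records = read_traces(traces_path)
```

The newline separator is the design choice that matters here: without it, consecutive records run together and cannot be parsed line by line.
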
Initialization of profiler.

If profiler is not initialized, all calls to profiler.start() and profiler.stop() will be ignored.

Initialization is a quite simple procedure.

    from osprofiler import profiler

    profiler.init("SECRET_HMAC_KEY", base_id=<uuid>, parent_id=<uuid>)

SECRET_HMAC_KEY - will be discussed later, because it's related to the integration of OSprofiler & OpenStack.

base_id and parent_id will be used to initialize trace_stack in the profiler, e.g. trace_stack = [base_id, parent_id].

Integration with OpenStack

There are 4 topics related to the integration of OSprofiler & OpenStack:

What should we use as a centralized collector?

We decided to use Ceilometer, because:
- It's already integrated in OpenStack, so it's quite simple to send notifications to it from all projects.
- There is an OpenStack API in Ceilometer that allows us to retrieve all messages related to one trace. Take a look at osprofiler.parsers.ceilometer:get_notifications

How to set up the profiler notifier?

We decided to use the oslo.messaging Notifier API, because:
- oslo.messaging is integrated in all projects
- It's the simplest way to send notifications to Ceilometer; take a look at the osprofiler.notifiers.messaging.Messaging:notify method
- We don't need to add any new CONF options in projects

How to initialize the profiler to get one trace across all services?

To enable cross-service profiling, the caller needs to send base_id & trace_id to the callee, so that the callee can init its profiler with these values.

In case of OpenStack there are 2 kinds of interaction between 2 services:
- REST API
  It's well known that there are python clients for every project, that generate proper HTTP requests, and parse responses to objects.
  
  These python clients are used in 2 cases:
  - User access -> OpenStack
  - Service from Project 1 would like to access Service from Project 2
  So what we need is to:
  - Put headers with trace info in python clients (if the profiler is initialized)
  - Add OSprofiler WSGI middleware to the service, which initializes the profiler if there are special trace headers that are signed by the HMAC key from api-paste.ini
  Actually the algorithm is a bit more complex. The Python client also signs the trace info with the HMAC key passed to profiler.init, and on reception the WSGI middleware checks that it is signed with the same HMAC key that is specified in api-paste.ini. This ensures that only a user who knows the HMAC key in api-paste.ini can init a profiler properly and send trace info that will actually be processed. Trace info that does not pass the HMAC validation is discarded.
- RPC API
  RPC calls are used for interaction between services of one project. It's well known that projects use oslo.messaging to deal with RPC. This is very good, because projects deal with RPC in a similar way.
  
  So there are 3 required changes:
  - On the caller side, put trace info into the request context (if the profiler was initialized).
  - On the callee side, initialize the profiler if there is trace info in the request context.
  - Trace all methods of the callee API (can be done via profiler.trace_cls).
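
The HMAC check described for the REST API case can be illustrated with the standard library. This is a sketch under assumptions, not the middleware's actual code: the function names, the base64/JSON encoding of the trace info, and the choice of SHA-256 are all hypothetical.

```python
import base64
import hashlib
import hmac
import json


def sign_trace_info(trace_info, hmac_key):
    """Serialize trace info and return (payload, signature) as header values."""
    payload = base64.urlsafe_b64encode(
        json.dumps(trace_info, sort_keys=True).encode("utf-8"))
    signature = hmac.new(hmac_key.encode("utf-8"), payload,
                         hashlib.sha256).hexdigest()
    return payload.decode("ascii"), signature


def verify_trace_info(payload, signature, hmac_key):
    """Return the trace info dict, or None if the signature does not match."""
    expected = hmac.new(hmac_key.encode("utf-8"), payload.encode("ascii"),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, signature):
        return None  # trace info that fails validation is discarded
    raw = base64.urlsafe_b64decode(payload.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


# Caller side: sign with the key that was passed to profiler.init().
payload, signature = sign_trace_info(
    {"base_id": "b", "parent_id": "t1"}, "SECRET_HMAC_KEY")

# Callee side: accept the trace info only if it was signed with its own key.
accepted = verify_trace_info(payload, signature, "SECRET_HMAC_KEY")
rejected = verify_trace_info(payload, signature, "WRONG_KEY")
```

With matching keys the callee recovers the trace info and can init its profiler from it; with a different key the result is None and no profiler is initialized.
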
What points should be tracked by default?
I think that for all projects we should include by default 5 kinds of points:
- All HTTP calls - helps to get information about: what HTTP requests were done, duration of calls (latency of service), information about projects involved in request.
- All RPC calls - helps to understand the duration of the parts of a request related to different services in one project. This information is essential to understand which service produces the bottleneck.
- All DB API calls - in some cases a slow DB query can produce a bottleneck. So it's quite useful to track how much time a request spends in the DB layer.
- All driver calls - in case of nova, cinder and others we have vendor drivers. Duration
- All SQL requests (turned off by default, because they produce a lot of traffic)