=============================================
OSprofiler cross service & project profiling
=============================================
https://blueprints.launchpad.net/oslo/+spec/osprofiler-cross-service-project-profiling
Mission Statement
=================
OSprofiler is a distributed trace toolkit library. It provides Pythonic
helpers for trace generation, so projects don't have to repeat code to trace
WSGI, RPC, DB and other important places. It also provides an interface for
various collection backends and helpers to parse data in and out of them.
Problem Description
===================
* Background:
OpenStack consists of multiple projects. Each project, in turn, is composed
of multiple services. To process some request, e.g. to boot a virtual
machine, OpenStack uses multiple services from different projects. When
something works too slowly, it is extremely complicated to understand
what exactly went wrong and to locate the bottleneck.
To resolve this issue, we introduce a tiny but powerful library,
**OSprofiler**, that can be used by all OpenStack projects and their
Python clients to generate a single trace per request that goes
through all involved services and builds a tree of calls (see an
`example <http://boris-42.github.io/ngk.html>`_).
* For more details about how exactly OSprofiler works, take a look at the
`readme <https://github.com/openstack/osprofiler/blob/master/README.rst>`_.
Proposed Change
===============
OSprofiler is already used by Cinder, Heat and Glance, and it is going
to be adopted by most of the other projects, for example:
- Nova https://review.openstack.org/#/c/254703/
- OpenStack Client https://review.openstack.org/#/c/255861/
Since OSprofiler is currently not under the OpenStack governance umbrella
(https://github.com/openstack/governance/blob/master/reference/projects.yaml),
we propose that the Oslo program become the home for OSprofiler.
Further, just like other Oslo projects, OSprofiler should have its own core
team.
Implementation
--------------
* **How OSprofiler works**
* OSprofiler is a very tiny library that allows you to create nested
traces. Basically it just keeps an in-memory stack (list) of elements.
Each element has 3 trace ids:
base_id - all points of the same trace have the same base_id, which helps
us fetch all points related to a given trace
parent_id - the parent point's id
current_id - the current point's id
It has 2 methods, start() and stop(): start() pushes a new element and
calls the driver's notify method with a payload; stop() removes the latest
element from the stack and calls notify() one more time.
This allows us to build a tree of calls with durations and payload info,
like `here <http://boris-42.github.io/ngk.html>`_.
For more details please
`read the docs <https://github.com/openstack/osprofiler>`_.
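To make this concrete, here is a small self-contained sketch (not the actual
OSprofiler code) of the stack-based start()/stop() mechanism described above;
the ``_notify`` part just prints the payload that a real driver would send::

    import uuid


    class SimpleProfiler(object):
        """Simplified illustration of OSprofiler's stack of trace points."""

        def __init__(self, base_id=None):
            # base_id is shared by every point of the same trace.
            self._trace_stack = [base_id or str(uuid.uuid4())]
            self._name_stack = []

        def start(self, name, info=None):
            # Push a new point and notify the driver that it started.
            self._name_stack.append(name)
            self._trace_stack.append(str(uuid.uuid4()))
            self._notify("%s-start" % name, info)

        def stop(self, info=None):
            # Notify the driver that the latest point finished, then pop it.
            self._notify("%s-stop" % self._name_stack.pop(), info)
            self._trace_stack.pop()

        def _notify(self, name, info):
            payload = {
                "name": name,
                "base_id": self._trace_stack[0],     # same for the whole trace
                "parent_id": self._trace_stack[-2],  # id of the enclosing point
                "trace_id": self._trace_stack[-1],   # id of the current point
                "info": info,
            }
            print(payload)  # a real driver sends a notification instead

Nesting start()/stop() pairs produces payloads that can later be re-assembled
into the tree of calls shown in the example above.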
* **What is going to be used as an OSprofiler driver (trace collector)**
* OSprofiler is going to have multi-driver support, which means that
basically any centralized system can be used to collect the data,
or we can even write trace information just to files.
* As a first driver we are going to use oslo.messaging & Ceilometer.
* In the future the OSprofiler team is going to add drivers for:
MongoDB, InfluxDB, ElasticSearch
* **OSprofiler integration points:**
* Changes in projects' configuration:
Add a config group "profiler" with the following config options (in all projects):
enabled = False # Fully disable OSprofiler by default
trace_db = False # Whether to trace DB requests or not. It is disabled
# by default, because there are too many DB requests
# and tracing them is useful only for deep debugging.
hmacs = SECRET_KEY # HMAC keys that are used to sign the HTTP headers
# that activate OSprofiler; this is used to prevent
# regular users from triggering OSprofiler.
# They must be the same for all projects.
connection = None # Connection string that specifies which
# OSprofiler driver to use and the credentials
# for the specified backend, e.g.
# mongo://user:password@ip:port/schema
* Keep a single trace between 2 projects:
Add the OSprofiler middleware to all pipelines in the paste.ini of all
projects. This middleware initializes OSprofiler if the request carries the
special HTTP headers signed with the proper HMAC key.
* Keep a single trace between 2 services of a single project:
If the RPC caller has an initialized profiler, it should add a special
payload containing trace information to all RPC messages.
The callee process should initialize OSprofiler if it finds such a payload
(see the sketch after this list).
* Changes required in Python clients
There are 2 changes required in each Python client:
- If the profiler is enabled, add the profiler trace headers; this is used
when project A is calling project B's API via the Python client.
- CLI argument --profile <HMAC> - that will initialize OSprofiler
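A rough sketch of the RPC caller/callee pattern described above, assuming the
``profiler.get()`` / ``profiler.init()`` helpers documented in the OSprofiler
readme (the ``trace_info`` payload key is only an illustration)::

    from osprofiler import profiler


    def add_trace_info(msg):
        # Caller side: attach trace information to the outgoing RPC message,
        # but only if the profiler was initialized for this request.
        p = profiler.get()
        if p is not None:
            msg["trace_info"] = {
                "hmac_key": p.hmac_key,
                "base_id": p.get_base_id(),
                "parent_id": p.get_id(),
            }
        return msg


    def init_from_message(msg):
        # Callee side: re-initialize the profiler from the received payload,
        # so the trace continues across the RPC boundary.
        trace_info = msg.get("trace_info")
        if trace_info:
            profiler.init(**trace_info)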
* **What points should be tracked by default?**
I think that for all projects we should include by default the following kinds
of points (an instrumentation sketch follows these lists):
* All HTTP calls - helps to get information about what HTTP requests were
done, the duration of calls (latency of the service), and the projects
involved in the request.
* All RPC calls - helps to understand the duration of the parts of a request
related to different services in one project. This information is essential
to understand which service produces the bottleneck.
* All DB API calls - in some cases a slow DB query can produce the bottleneck,
so it's quite useful to track how much time a request spends in the DB layer.
* All driver layer calls - in the case of Nova, Cinder and others we have
vendor drivers (e.g. nova.virt.driver).
* All DB API layer calls (e.g. nova.db.api)
* All raw SQL requests (turned off by default, because they produce a lot of
traffic)
* Any other important project-specific methods/classes/code pieces
* **Points that should be tracked in future as well:**
* All external commands.
For example, oslo.concurrency processutils.execute() calls.
Because the work done by external commands is a key part of many backend
API implementations and takes non-trivial time.
* Worker threads spawned / run.
Some API calls will result in single-use background threads being spawned
to process some work asynchronously from the rest of the work.
I think it is important to be able to capture this work in traces,
by recording a trace point when a thread start is requested, and then having
a trace point at the start and end of the thread's main method.
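As an illustration of how individual points from the lists above could be
instrumented, OSprofiler provides ``trace`` and ``trace_cls`` decorators (a
sketch; the method and class names below are made up, see the OSprofiler
readme for the authoritative usage)::

    from osprofiler import profiler


    @profiler.trace("db-api-call", info={"db.method": "instance_get"})
    def instance_get(context, instance_id):
        # The decorator emits a start/stop pair around the call, so the
        # duration of this DB API method shows up in the trace tree.
        pass


    @profiler.trace_cls("driver")
    class FakeVirtDriver(object):
        # trace_cls wraps the public methods of a class, which is handy for
        # driver layers such as nova.virt.driver implementations.
        def spawn(self, context, instance):
            pass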
Alternatives
------------
* Why not cProfile or some other Python tracer/debugger?
**The scope of this library is quite different:**
* It should create a single trace that goes across all services/projects
* It is more interesting to be able to collect data about specific points
in code instead of all methods
* cProfile-like functionality can be integrated into OSprofiler in the future
* This library should be easily integrable with OpenStack. This means that:
* It shouldn't require too many changes in the code bases of integrating
projects.
* It should be simple to turn it off completely
* It should be possible to keep it turned on in lazy mode in production
(e.g. users who know the HMAC key can trace their requests).
* What about Zipkin and other existing distributed tracing solutions?
OSprofiler is a small library that provides a tracing solution for OpenStack
with no vendor lock-in.
OSprofiler doesn't intend to implement a whole Zipkin-like stack. It's just a
tiny library that is used to integrate OpenStack with different collectors
and to provide a native OpenStack tracer/profiler that is not hard-coded to
any particular tracing service (e.g. Zipkin).
Impact On Existing APIs
-----------------------
* All API methods of all projects will accept 2 new headers,
X-Trace-Info and X-Trace-HMAC, that will actually trigger the profiler.
* All Python clients will accept a --profile option that will put the proper
X-Trace-Info and X-Trace-HMAC headers into the HTTP request and print the
trace id (see the sketch below).
* There is no need for any changes in any existing Oslo libs at the moment.
If we decide to integrate OSprofiler into some of the Oslo libs, we will
follow the standard process of specs/pr/reviews in that specific lib.
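A sketch of the client-side change, assuming the ``get_trace_id_headers()``
helper from ``osprofiler.web`` (the endpoint URL and the use of ``requests``
are just illustrations)::

    import requests

    from osprofiler import profiler
    from osprofiler import web as osprofiler_web

    # --profile <HMAC> boils down to initializing the profiler before the call.
    profiler.init(hmac_key="SECRET_KEY")

    # Returns the signed X-Trace-Info / X-Trace-HMAC headers when the profiler
    # is initialized, and an empty dict otherwise.
    headers = osprofiler_web.get_trace_id_headers()

    resp = requests.get("http://nova-api:8774/v2.1/servers", headers=headers)
    print("Trace id: %s" % profiler.get().get_base_id())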
Security Impact
---------------
OSprofiler uses HMAC to sign trace headers. Only people who know the
secret key used for the HMAC are able to trigger the profiler.
As HMAC is quite secure, there shouldn't be issues with security.
Even in the worst case, when an attacker knows the secret key, they will only
be able to trigger the profiler, which will make their own requests a bit
slower but won't affect other users.
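For illustration, the check boils down to a standard HMAC comparison, roughly
like the following sketch with Python's ``hmac`` module (this is not the exact
OSprofiler wire format)::

    import hashlib
    import hmac
    import json


    def sign(trace_info, hmac_key):
        # The client serializes the trace info and signs it with the shared key.
        data = json.dumps(trace_info, sort_keys=True).encode("utf-8")
        return hmac.new(hmac_key.encode("utf-8"), data, hashlib.sha1).hexdigest()


    def is_valid(trace_info, signature, hmac_key):
        # The service recomputes the signature; requests signed with an unknown
        # key simply never activate the profiler.
        return hmac.compare_digest(sign(trace_info, hmac_key), signature)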
Performance Impact
------------------
* If it is turned off, there is negligible performance overhead -
just a couple of "if None" checks.
* If it is turned on, there are two different cases:
* No trace headers in the request => no performance impact
* There is a trace header:
* The trace id is signed with a wrong HMAC: the only overhead is checking
that the trace id is signed with the proper HMAC key.
* The trace id is signed with the proper HMAC: the overhead will depend on
the amount of trace points and the Notification API performance. For
requests like booting a VM or creating a volume it can be measurable; for
simple requests like showing details about a resource it will be
insignificant.
* Trace every Nth request configuration (not done yet):
in such a configuration every Nth request will have OSprofiler overhead that
depends on many things: the amount of traced points (depends on the API
method), the OSprofiler backend, and other factors.
Configuration Impact
--------------------
We are adding a new CONF group with options to all projects:
[profiler]
# If False, fully disable the profiling feature.
#enabled = False
# If False, do not trace SQL requests.
#trace_db = True
# HMAC keys that are used to sign headers.
# Because OpenStack contains many projects, we are not able to update all HMAC
# keys at the same point in time. To provide the ability to do no-downtime
# rolling updates of HMAC keys (for security reasons) we need the ability to
# specify many HMACs.
# The process of updating OLDKey -> NEWKey will look like:
# 1) Initial system configuration:
#    All services have OLDKey; users use OLDKey
# 2) In the middle, step 1:
#    All services have OLDKey, part of them have both OLDKey and NEWKey;
#    users use OLDKey
# 3) In the middle, step 2:
#    All services have OLDKey and NEWKey; users use NEWKey
# 4) In the middle, step 3:
#    Part of the services have both keys and some of the services have only
#    NEWKey; users use NEWKey
# 5) End system configuration:
#    All services have only NEWKey
#hmacs = SECRET_KEY1, SECRET_KEY2
# Profiler driver (collector) connection string.
connection = None
By default OSprofiler is turned off. However, it can be kept on in production,
because it doesn't add any overhead until it is triggered, and the profiler
can be triggered only by a person who knows the HMAC key.
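A sketch of how each project might register this group with oslo.config
(option names follow this spec; the final option definitions may differ)::

    from oslo_config import cfg

    profiler_opts = [
        cfg.BoolOpt("enabled", default=False,
                    help="If False, fully disable the profiling feature."),
        cfg.BoolOpt("trace_db", default=False,
                    help="If False, do not trace DB requests."),
        cfg.ListOpt("hmacs",
                    help="HMAC keys used to sign trace headers; must be the "
                         "same for all projects."),
        cfg.StrOpt("connection",
                   help="Profiler driver (collector) connection string."),
    ]

    CONF = cfg.CONF
    CONF.register_opts(profiler_opts, group="profiler")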
Developer Impact
----------------
Developers will be able to profile OpenStack and fix different issues related
to performance and scale.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
boris-42
dinabelova
harlowja
zhiyan
Milestones
----------
Mitaka-3
Work Items
----------
* Remove the OSprofiler configuration from paste.ini:
https://github.com/openstack/osprofiler/blob/master/doc/specs/in-progress/make_paste_ini_config_optional.rst
* Multi backend support:
https://github.com/openstack/osprofiler/blob/master/doc/specs/in-progress/multi_backend_support.rst
* Move all projects to the new multi-backend model
* Integrate OSprofiler in all OpenStack projects
* Better integration testing:
https://github.com/openstack/osprofiler/blob/master/doc/specs/in-progress/integration_testing.rst
* Better DevStack integration:
https://github.com/openstack/osprofiler/blob/master/doc/specs/in-progress/better_devstack_integration.rst
* Integrate Rally & OSprofiler
Incubation
==========
Adoption
--------
All projects and Python clients only need to add a fairly small amount of code
to make cross-project/service tracing possible.
Library
-------
The code of the OSprofiler library can be found here:
https://github.com/openstack/osprofiler/
Anticipated API Stabilization
-----------------------------
None.
Documentation Impact
====================
We should document in one place how to configure and use OSprofiler.
Dependencies
============
- OSprofiler can be a runtime dependency of the Python clients
- OSprofiler should be in the requirements of projects that will use OSprofiler
References
==========
* OSprofiler lib: https://github.com/stackforge/osprofiler
* OSprofiler specs: https://github.com/openstack/osprofiler/tree/master/doc/specs
.. note::
This work is licensed under a Creative Commons Attribution 3.0
Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode