Add a document that describes profiling eventlet services
Profiling eventlet-using services is a bit different from standard situations, so here is a document that tries to explain the basics of how to get started doing it.

Change-Id: If8c34653285f07c5cc1abccabfec16f18daafdde
@@ -80,6 +80,10 @@ be Python code. All new code needs to be validated somehow.

* :doc:`/contributor/testing/down-cell`

* **Profiling Guides**: These are guides to profiling nova.

  * :doc:`/contributor/testing/eventlet-profiling`

The Nova API
============
doc/source/contributor/testing/eventlet-profiling.rst (new file, 241 lines)
@@ -0,0 +1,241 @@
=======================
Profiling With Eventlet
=======================

When performance of one of the Nova services is worse than expected, and other
sorts of analysis do not lead to candidate fixes, profiling is an excellent
tool for producing a detailed analysis of which methods in the code are called
most often and which consume the most time.

Because most Nova services use eventlet_, the standard profiling tool provided
with Python, cProfile_, will not work. Something is required to keep track of
changing tasks. Thankfully eventlet comes with
``eventlet.green.profile.Profile``, a mostly undocumented class that provides a
similar (but not identical) API to the one provided by Python's ``Profile``
while outputting the same format.

.. note:: The eventlet Profile outputs the ``prof`` format produced by
          ``profile``, which is not the same as that output by ``cProfile``.
          Some analysis tools (for example, SnakeViz_) only read the latter
          so the options for analyzing eventlet profiling are not always
          deluxe (see below).

Setup
=====

This guide assumes the Nova service being profiled is running devstack, but
that is not necessary. What is necessary is that the code associated with the
service can be changed and the service restarted, in place.

Profiling the entire service will produce mostly noise, and the output will be
confusing because different tasks will operate during the profile run. It is
better to begin the process with a candidate task or method *within* the
service that can be associated with an identifier. For example,
``select_destinations`` in the ``FilterScheduler`` can be associated with the
list of ``instance_uuids`` passed to it, and it runs only once for that set of
instance uuids.

The process for profiling is:

#. Identify the method to be profiled.

#. Populate the environment with sufficient resources to exercise the code.
   For example, you may wish to use the FakeVirtDriver_ to have nova aware of
   multiple ``nova-compute`` processes. Or you may wish to launch many
   instances if you are evaluating a method that loops over instances.

#. At the start of that method, change the code to instantiate a ``Profile``
   object and ``start()`` it, as shown in the sketch after this list.

#. At the end of that method, change the code to ``stop()`` profiling and write
   the data (with ``dump_stats()``) to a reasonable location.

#. Restart the service.

#. Cause the method being evaluated to run.

#. Analyze the profile data with the pstats_ module.

.. note:: ``stop()`` and ``start()`` are two of the ways in which the eventlet
          ``Profile`` API differs from the stdlib. There the methods are
          ``enable()`` and ``disable()``.

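In condensed form, steps 3 and 4 amount to the following sketch. The method
name, its body, and the dump path here are hypothetical stand-ins for
illustration, not code from Nova:

.. code-block:: python

    from eventlet.green import profile

    def some_method(self, request_id):
        # Begin collecting profile data when the method starts.
        pr = profile.Profile()
        pr.start()

        # ... the real work of the method being profiled goes here ...
        result = self._do_work(request_id)

        # Stop profiling and write the stats to a file whose name includes
        # an identifier, so separate runs do not overwrite one another.
        pr.stop()
        pr.dump_stats('/tmp/profiling/%s.prof' % request_id)
        return result
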
Example
=======

For this example we will analyze ``select_destinations`` in the
``FilterScheduler``. A known problem is that it does excessive work when
presented with too many candidate results from the Placement service. We'd like
to know why.

We'll configure and run devstack_ with FakeVirtDriver_ so there are several
candidate hypervisors (the following ``local.conf`` is also useful for other
profiling and benchmarking scenarios, so not all changes are relevant here):

.. code-block:: ini

    [[local|localrc]]
    ADMIN_PASSWORD=secret
    DATABASE_PASSWORD=$ADMIN_PASSWORD
    RABBIT_PASSWORD=$ADMIN_PASSWORD
    SERVICE_PASSWORD=$ADMIN_PASSWORD
    USE_PYTHON3=True
    VIRT_DRIVER=fake
    # You may use different numbers of fake computes, but be careful: 100 will
    # completely overwhelm a 16GB, 16VCPU server. In the test profiles below a
    # value of 50 was used, on a 16GB, 16VCPU server.
    NUMBER_FAKE_NOVA_COMPUTE=25
    disable_service cinder
    disable_service horizon
    disable_service dstat
    disable_service tempest

    [[post-config|$NOVA_CONF]]
    rpc_response_timeout = 300

    # Disable filtering entirely. For some profiling this will not be what you
    # want.
    [filter_scheduler]
    enabled_filters = '""'
    # Send only one type of notification to avoid notification overhead.
    [notifications]
    notification_format = unversioned

Change the code in ``nova/scheduler/filter_scheduler.py`` as follows:

.. code-block:: diff

    diff --git a/nova/scheduler/filter_scheduler.py b/nova/scheduler/filter_scheduler.py
    index 672f23077e..cb0f87fe48 100644
    --- a/nova/scheduler/filter_scheduler.py
    +++ b/nova/scheduler/filter_scheduler.py
    @@ -49,92 +49,99 @@ class FilterScheduler(driver.Scheduler):
         def select_destinations(self, context, spec_obj, instance_uuids,
                 alloc_reqs_by_rp_uuid, provider_summaries,
                 allocation_request_version=None, return_alternates=False):
             """Returns a list of lists of Selection objects, which represent the
             hosts and (optionally) alternates for each instance.

             :param context: The RequestContext object
             :param spec_obj: The RequestSpec object
             :param instance_uuids: List of UUIDs, one for each value of the spec
                                    object's num_instances attribute
             :param alloc_reqs_by_rp_uuid: Optional dict, keyed by resource provider
                                           UUID, of the allocation_requests that may
                                           be used to claim resources against
                                           matched hosts. If None, indicates either
                                           the placement API wasn't reachable or
                                           that there were no allocation_requests
                                           returned by the placement API. If the
                                           latter, the provider_summaries will be an
                                           empty dict, not None.
             :param provider_summaries: Optional dict, keyed by resource provider
                                        UUID, of information that will be used by
                                        the filters/weighers in selecting matching
                                        hosts for a request. If None, indicates that
                                        the scheduler driver should grab all compute
                                        node information locally and that the
                                        Placement API is not used. If an empty dict,
                                        indicates the Placement API returned no
                                        potential matches for the requested
                                        resources.
             :param allocation_request_version: The microversion used to request the
                                                allocations.
             :param return_alternates: When True, zero or more alternate hosts are
                                       returned with each selected host. The number
                                       of alternates is determined by the
                                       configuration option
                                       `CONF.scheduler.max_attempts`.
             """
    +        from eventlet.green import profile
    +        pr = profile.Profile()
    +        pr.start()
    +
             self.notifier.info(
                 context, 'scheduler.select_destinations.start',
                 dict(request_spec=spec_obj.to_legacy_request_spec_dict()))
             compute_utils.notify_about_scheduler_action(
                 context=context, request_spec=spec_obj,
                 action=fields_obj.NotificationAction.SELECT_DESTINATIONS,
                 phase=fields_obj.NotificationPhase.START)

             host_selections = self._schedule(context, spec_obj, instance_uuids,
                 alloc_reqs_by_rp_uuid, provider_summaries,
                 allocation_request_version, return_alternates)
             self.notifier.info(
                 context, 'scheduler.select_destinations.end',
                 dict(request_spec=spec_obj.to_legacy_request_spec_dict()))
             compute_utils.notify_about_scheduler_action(
                 context=context, request_spec=spec_obj,
                 action=fields_obj.NotificationAction.SELECT_DESTINATIONS,
                 phase=fields_obj.NotificationPhase.END)
    +        pr.stop()
    +        pr.dump_stats('/tmp/select_destinations/%s.prof' % ':'.join(instance_uuids))
    +
             return host_selections

Make a ``/tmp/select_destinations`` directory that is writable by the user
nova-scheduler will run as. This is where the profile output will go.

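If you would rather not create the directory by hand, a line like the
following (a suggestion of this guide, not part of the patch above) can be
placed before the ``dump_stats()`` call:

.. code-block:: python

    import os

    # Create the output directory if it does not already exist;
    # exist_ok makes this safe to run on every call.
    os.makedirs('/tmp/select_destinations', exist_ok=True)
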
Restart the scheduler service. Note that ``systemctl restart`` may not kill
things sufficiently dead, so::

    sudo systemctl stop devstack@n-sch
    sleep 5
    sudo systemctl start devstack@n-sch

Create a server (which will call ``select_destinations``)::

    openstack server create --image cirros-0.4.0-x86_64-disk --flavor c1 x1

In ``/tmp/select_destinations`` there should be a file with a name using the
uuid of the created server with a ``.prof`` extension.

Change to that directory and view the profile using the pstats
`interactive mode`_::

    python3 -m pstats ef044142-f3b8-409d-9af6-c60cea39b273.prof

Sort stats by their cumulative time::

    ef044142-f3b8-409d-9af6-c60cea39b273.prof% sort cumtime
    ef044142-f3b8-409d-9af6-c60cea39b273.prof% stats 10
    Tue Aug  6 17:17:56 2019    ef044142-f3b8-409d-9af6-c60cea39b273.prof

             603477 function calls (587772 primitive calls) in 2.294 seconds

       Ordered by: cumulative time
       List reduced from 2484 to 10 due to restriction <10>

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000    1.957    1.957 profile:0(start)
            1    0.000    0.000    1.911    1.911 /mnt/share/opt/stack/nova/nova/scheduler/filter_scheduler.py:113(_schedule)
            1    0.000    0.000    1.834    1.834 /mnt/share/opt/stack/nova/nova/scheduler/filter_scheduler.py:485(_get_all_host_states)
            1    0.000    0.000    1.834    1.834 /mnt/share/opt/stack/nova/nova/scheduler/host_manager.py:757(get_host_states_by_uuids)
            1    0.004    0.004    1.818    1.818 /mnt/share/opt/stack/nova/nova/scheduler/host_manager.py:777(_get_host_states)
      104/103    0.001    0.000    1.409    0.014 /usr/local/lib/python3.6/dist-packages/oslo_versionedobjects/base.py:170(wrapper)
           50    0.001    0.000    1.290    0.026 /mnt/share/opt/stack/nova/nova/scheduler/host_manager.py:836(_get_instance_info)
           50    0.001    0.000    1.289    0.026 /mnt/share/opt/stack/nova/nova/scheduler/host_manager.py:820(_get_instances_by_host)
          103    0.001    0.000    0.890    0.009 /usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/query.py:3325(__iter__)
           50    0.001    0.000    0.776    0.016 /mnt/share/opt/stack/nova/nova/objects/host_mapping.py:99(get_by_host)

From this we can make a couple of useful inferences about ``get_by_host``:

* It is called once for each of the 50 ``FakeVirtDriver`` hypervisors
  configured for these tests.

* It (and the methods it calls internally) consumes about 40%
  (``0.776 / 1.957``) of the entire time spent running the
  ``select_destinations`` method (indicated by ``profile:0(start)``, above).

Several other sort modes can be used. List those that are available by entering
``sort`` without arguments.

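The same analysis can also be scripted with the pstats_ module instead of the
interactive mode; this is a minimal sketch (the file name is from the run
above and will differ in your environment):

.. code-block:: python

    import pstats

    # Load the dumped profile and print the ten most expensive calls,
    # sorted by cumulative time, matching the interactive session above.
    stats = pstats.Stats('ef044142-f3b8-409d-9af6-c60cea39b273.prof')
    stats.sort_stats('cumtime').print_stats(10)

    # Optionally restrict the listing to nova's own modules to cut noise.
    stats.print_stats('nova', 10)
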
.. _eventlet: https://eventlet.net/
.. _cProfile: https://docs.python.org/3/library/profile.html
.. _SnakeViz: https://jiffyclub.github.io/snakeviz/
.. _devstack: https://docs.openstack.org/devstack/latest/
.. _FakeVirtDriver: https://docs.openstack.org/devstack/latest/guides/nova.html#fake-virt-driver
.. _pstats: https://docs.python.org/3/library/profile.html#pstats.Stats
.. _interactive mode: https://www.stefaanlippens.net/python_profiling_with_pstats_interactive_mode/
@@ -217,6 +217,7 @@ looking parts of our architecture. These are collected below.

   contributor/testing/serial-console
   contributor/testing/zero-downtime-upgrade
   contributor/testing/down-cell
   contributor/testing/eventlet-profiling
   contributor/how-to-get-involved
   contributor/process
   contributor/project-scope