[ops-guide] Openstack process
1. Added openstack process 2. Minor restructure to seperate openstack components from 3rd party software. Change-Id: If2fd671f1407f9b7fe06aab4266f43280d50b80f Closes-bug: 1457768
This commit is contained in:
parent
ae2f313fb7
commit
f4b046bbf4
@ -8,14 +8,6 @@ creating a functional cloud. The latter involves monitoring resource
|
|||||||
usage over time in order to make informed decisions about potential
|
usage over time in order to make informed decisions about potential
|
||||||
bottlenecks and upgrades.
|
bottlenecks and upgrades.
|
||||||
|
|
||||||
**Nagios** is an open source monitoring service. It's capable of executing
|
|
||||||
arbitrary commands to check the status of server and network services,
|
|
||||||
remotely executing arbitrary commands directly on servers, and allowing
|
|
||||||
servers to push notifications back in the form of passive monitoring.
|
|
||||||
Nagios has been around since 1999. Although newer monitoring services
|
|
||||||
are available, Nagios is a tried-and-true systems administration
|
|
||||||
staple.
|
|
||||||
|
|
||||||
Process Monitoring
|
Process Monitoring
|
||||||
~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
@ -38,31 +30,46 @@ the ``nova-api`` service is running on the cloud controller:
|
|||||||
/usr/bin/nova-api --config-file=/etc/nova/nova.conf
|
/usr/bin/nova-api --config-file=/etc/nova/nova.conf
|
||||||
root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api
|
root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api
|
||||||
|
|
||||||
You can create automated alerts for critical processes by using Nagios
|
|
||||||
and NRPE. For example, to ensure that the ``nova-compute`` process is
|
|
||||||
running on compute nodes, create an alert on your Nagios server that
|
|
||||||
looks like this:
|
|
||||||
|
|
||||||
.. code-block:: none
|
The OpenStack processes that should be monitored depend on the specific
|
||||||
|
configuration of the environment, but can include:
|
||||||
|
|
||||||
define service {
|
**Compute service (nova)**
|
||||||
host_name c01.example.com
|
|
||||||
check_command check_nrpe_1arg!check_nova-compute
|
|
||||||
use generic-service
|
|
||||||
notification_period 24x7
|
|
||||||
contact_groups sysadmins
|
|
||||||
service_description nova-compute
|
|
||||||
}
|
|
||||||
|
|
||||||
Then on the actual compute node, create the following NRPE
|
* nova-api
|
||||||
configuration:
|
* nova-scheduler
|
||||||
|
* nova-conductor
|
||||||
|
* nova-novncproxy
|
||||||
|
* nova-cert
|
||||||
|
* nova-compute
|
||||||
|
|
||||||
.. code-block:: none
|
**Block Storage service (cinder)**
|
||||||
|
|
||||||
\command[check_nova-compute]=/usr/lib/nagios/plugins/check_procs -c 1: -a nova-compute
|
* cinder-volume
|
||||||
|
* cinder-api
|
||||||
|
* cinder-scheduler
|
||||||
|
|
||||||
Nagios checks that at least one ``nova-compute`` service is running at
|
**Networking service (neutron)**
|
||||||
all times.
|
|
||||||
|
* neutron-api
|
||||||
|
* neutron-server
|
||||||
|
* neutron-openvswitch-agent
|
||||||
|
* neutron-dhcp-agent
|
||||||
|
* neutron-l3-agent
|
||||||
|
* neutron-metadata-agent
|
||||||
|
|
||||||
|
**Image service (glance)**
|
||||||
|
|
||||||
|
* glance-api
|
||||||
|
* glance-registry
|
||||||
|
|
||||||
|
**Identity service (keystone)**
|
||||||
|
|
||||||
|
The keystone processes can run standalone, but are often run within Apache
|
||||||
|
as WSGI applications.
|
||||||
|
|
||||||
|
* keystone-public
|
||||||
|
* keystone-admin
|
||||||
|
|
||||||
Resource Alerting
|
Resource Alerting
|
||||||
~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~
|
||||||
@ -81,96 +88,10 @@ Some of the resources that you want to monitor include:
|
|||||||
* Network I/O
|
* Network I/O
|
||||||
* Available vCPUs
|
* Available vCPUs
|
||||||
|
|
||||||
For example, to monitor disk capacity on a compute node with Nagios, add
|
Telemetry Service
|
||||||
the following to your Nagios configuration:
|
~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. code-block:: none
|
The Telemetry service (:term:`ceilometer`) collects
|
||||||
|
|
||||||
define service {
|
|
||||||
host_name c01.example.com
|
|
||||||
check_command check_nrpe!check_all_disks!20% 10%
|
|
||||||
use generic-service
|
|
||||||
contact_groups sysadmins
|
|
||||||
service_description Disk
|
|
||||||
}
|
|
||||||
|
|
||||||
On the compute node, add the following to your NRPE configuration:
|
|
||||||
|
|
||||||
.. code-block:: none
|
|
||||||
|
|
||||||
command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -e
|
|
||||||
|
|
||||||
Nagios alerts you with a WARNING when any disk on the compute node is 80
|
|
||||||
percent full and CRITICAL when 90 percent is full.
|
|
||||||
|
|
||||||
StackTach
|
|
||||||
~~~~~~~~~
|
|
||||||
|
|
||||||
StackTach is a tool that collects and reports the notifications sent by
|
|
||||||
``nova``. Notifications are essentially the same as logs but can be much
|
|
||||||
more detailed. Nearly all OpenStack components are capable of generating
|
|
||||||
notifications when significant events occur. Notifications are messages
|
|
||||||
placed on the OpenStack queue (generally RabbitMQ) for consumption by
|
|
||||||
downstream systems. An overview of notifications can be found at `System
|
|
||||||
Usage
|
|
||||||
Data <https://wiki.openstack.org/wiki/SystemUsageData>`_.
|
|
||||||
|
|
||||||
To enable ``nova`` to send notifications, add the following to
|
|
||||||
``nova.conf``:
|
|
||||||
|
|
||||||
.. code-block:: ini
|
|
||||||
|
|
||||||
notification_topics=monitor
|
|
||||||
notification_driver=messagingv2
|
|
||||||
|
|
||||||
Once ``nova`` is sending notifications, install and configure StackTach.
|
|
||||||
StackTach workers for Queue consumption and pipeling processing are
|
|
||||||
configured to read these notifications from RabbitMQ servers and store
|
|
||||||
them in a database. Users can inquire on instances, requests and servers
|
|
||||||
by using the browser interface or command line tool,
|
|
||||||
`Stacky <https://github.com/rackerlabs/stacky>`_. Since StackTach is
|
|
||||||
relatively new and constantly changing, installation instructions
|
|
||||||
quickly become outdated. Please refer to the `StackTach Git
|
|
||||||
repo <https://git.openstack.org/cgit/openstack/stacktach>`_ for
|
|
||||||
instructions as well as a demo video. Additional details on the latest
|
|
||||||
developments can be discovered at the `official
|
|
||||||
page <http://stacktach.com/>`_
|
|
||||||
|
|
||||||
Logstash
|
|
||||||
~~~~~~~~
|
|
||||||
|
|
||||||
Logstash is a high performance indexing and search engine for logs. Logs
|
|
||||||
from Jenkins test runs are sent to logstash where they are indexed and
|
|
||||||
stored. Logstash facilitates reviewing logs from multiple sources in a
|
|
||||||
single test run, searching for errors or particular events within a test
|
|
||||||
run, and searching for log event trends across test runs.
|
|
||||||
|
|
||||||
There are four major layers in Logstash setup which are
|
|
||||||
|
|
||||||
* Log Pusher
|
|
||||||
* Log Indexer
|
|
||||||
* ElasticSearch
|
|
||||||
* Kibana
|
|
||||||
|
|
||||||
Each layer scales horizontally. As the number of logs grows you can add
|
|
||||||
more log pushers, more Logstash indexers, and more ElasticSearch nodes.
|
|
||||||
|
|
||||||
Logpusher is a pair of Python scripts which first listens to Jenkins
|
|
||||||
build events and converts them into Gearman jobs. Gearman provides a
|
|
||||||
generic application framework to farm out work to other machines or
|
|
||||||
processes that are better suited to do the work. It allows you to do
|
|
||||||
work in parallel, to load balance processing, and to call functions
|
|
||||||
between languages. Later Logpusher performs Gearman jobs to push log
|
|
||||||
files into logstash. Logstash indexer reads these log events, filters
|
|
||||||
them to remove unwanted lines, collapse multiple events together, and
|
|
||||||
parses useful information before shipping them to ElasticSearch for
|
|
||||||
storage and indexing. Kibana is a logstash oriented web client for
|
|
||||||
ElasticSearch.
|
|
||||||
|
|
||||||
OpenStack Telemetry
|
|
||||||
~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
An integrated OpenStack project (code-named :term:`ceilometer`) collects
|
|
||||||
metering and event data relating to OpenStack services. Data collected
|
metering and event data relating to OpenStack services. Data collected
|
||||||
by the Telemetry service could be used for billing. Depending on
|
by the Telemetry service could be used for billing. Depending on
|
||||||
deployment configuration, collected data may be accessible to users
|
deployment configuration, collected data may be accessible to users
|
||||||
@ -182,7 +103,7 @@ Guide <http://docs.openstack.org/admin-guide/telemetry.html>`_ or
|
|||||||
in the `developer
|
in the `developer
|
||||||
documentation <http://docs.openstack.org/developer/ceilometer>`_.
|
documentation <http://docs.openstack.org/developer/ceilometer>`_.
|
||||||
|
|
||||||
OpenStack-Specific Resources
|
OpenStack Specific Resources
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
Resources such as memory, disk, and CPU are generic resources that all
|
Resources such as memory, disk, and CPU are generic resources that all
|
||||||
@ -392,3 +313,130 @@ collectd is out of the scope of this book, a good starting point would
|
|||||||
be to use collectd to store the result as a COUNTER data type. More
|
be to use collectd to store the result as a COUNTER data type. More
|
||||||
information can be found in `collectd's
|
information can be found in `collectd's
|
||||||
documentation <https://collectd.org/wiki/index.php/Data_source>`_.
|
documentation <https://collectd.org/wiki/index.php/Data_source>`_.
|
||||||
|
|
||||||
|
|
||||||
|
Monitoring Tools
|
||||||
|
~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Nagios
|
||||||
|
------
|
||||||
|
|
||||||
|
|
||||||
|
Nagios is an open source monitoring service. It is capable of executing
|
||||||
|
arbitrary commands to check the status of server and network services,
|
||||||
|
remotely executing arbitrary commands directly on servers, and allowing
|
||||||
|
servers to push notifications back in the form of passive monitoring.
|
||||||
|
Nagios has been around since 1999. Although newer monitoring services
|
||||||
|
are available, Nagios is a tried-and-true systems administration
|
||||||
|
staple.
|
||||||
|
|
||||||
|
You can create automated alerts for critical processes by using Nagios
|
||||||
|
and NRPE. For example, to ensure that the ``nova-compute`` process is
|
||||||
|
running on Compute nodes, create an alert on your Nagios server:
|
||||||
|
|
||||||
|
.. code-block:: ini
|
||||||
|
|
||||||
|
define service {
|
||||||
|
host_name c01.example.com
|
||||||
|
check_command check_nrpe_1arg!check_nova-compute
|
||||||
|
use generic-service
|
||||||
|
notification_period 24x7
|
||||||
|
contact_groups sysadmins
|
||||||
|
service_description nova-compute
|
||||||
|
}
|
||||||
|
|
||||||
|
On the Compute node, create the following NRPE
|
||||||
|
configuration:
|
||||||
|
|
||||||
|
.. code-block:: ini
|
||||||
|
|
||||||
|
command[check_nova-compute]=/usr/lib/nagios/plugins/check_procs -c 1: -a nova-compute
|
||||||
|
|
||||||
|
Nagios checks that at least one ``nova-compute`` service is running at
|
||||||
|
all times.
|
||||||
|
|
||||||
|
For resource alerting, for example, monitor disk capacity on a compute node
|
||||||
|
with Nagios, add the following to your Nagios configuration:
|
||||||
|
|
||||||
|
.. code-block:: ini
|
||||||
|
|
||||||
|
define service {
|
||||||
|
host_name c01.example.com
|
||||||
|
check_command check_nrpe!check_all_disks!20% 10%
|
||||||
|
use generic-service
|
||||||
|
contact_groups sysadmins
|
||||||
|
service_description Disk
|
||||||
|
}
|
||||||
|
|
||||||
|
On the compute node, add the following to your NRPE configuration:
|
||||||
|
|
||||||
|
.. code-block:: ini
|
||||||
|
|
||||||
|
command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -e
|
||||||
|
|
||||||
|
Nagios alerts you with a `WARNING` when any disk on the compute node is 80
|
||||||
|
percent full and `CRITICAL` when 90 percent is full.
|
||||||
|
|
||||||
|
StackTach
|
||||||
|
---------
|
||||||
|
|
||||||
|
StackTach is a tool that collects and reports the notifications sent by
|
||||||
|
nova. Notifications are essentially the same as logs but can be much
|
||||||
|
more detailed. Nearly all OpenStack components are capable of generating
|
||||||
|
notifications when significant events occur. Notifications are messages
|
||||||
|
placed on the OpenStack queue (generally RabbitMQ) for consumption by
|
||||||
|
downstream systems. An overview of notifications can be found at `System
|
||||||
|
Usage
|
||||||
|
Data <https://wiki.openstack.org/wiki/SystemUsageData>`_.
|
||||||
|
|
||||||
|
To enable nova to send notifications, add the following to the
|
||||||
|
``nova.conf`` configuration file:
|
||||||
|
|
||||||
|
.. code-block:: ini
|
||||||
|
|
||||||
|
notification_topics=monitor
|
||||||
|
notification_driver=messagingv2
|
||||||
|
|
||||||
|
Once nova is sending notifications, install and configure StackTach.
|
||||||
|
StackTach works for queue consumption and pipeline processing are
|
||||||
|
configured to read these notifications from RabbitMQ servers and store
|
||||||
|
them in a database. Users can inquire on instances, requests, and servers
|
||||||
|
by using the browser interface or command-line tool,
|
||||||
|
`Stacky <https://github.com/rackerlabs/stacky>`_. Since StackTach is
|
||||||
|
relatively new and constantly changing, installation instructions
|
||||||
|
quickly become outdated. Refer to the `StackTach Git
|
||||||
|
repository <https://git.openstack.org/cgit/openstack/stacktach>`_ for
|
||||||
|
instructions as well as a demostration video. Additional details on the latest
|
||||||
|
developments can be discovered at the `official
|
||||||
|
page <http://stacktach.com/>`_
|
||||||
|
|
||||||
|
Logstash
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Logstash is a high performance indexing and search engine for logs. Logs
|
||||||
|
from Jenkins test runs are sent to logstash where they are indexed and
|
||||||
|
stored. Logstash facilitates reviewing logs from multiple sources in a
|
||||||
|
single test run, searching for errors or particular events within a test
|
||||||
|
run, and searching for log event trends across test runs.
|
||||||
|
|
||||||
|
There are four major layers in Logstash setup which are:
|
||||||
|
|
||||||
|
* Log Pusher
|
||||||
|
* Log Indexer
|
||||||
|
* ElasticSearch
|
||||||
|
* Kibana
|
||||||
|
|
||||||
|
Each layer scales horizontally. As the number of logs grows you can add
|
||||||
|
more log pushers, more Logstash indexers, and more ElasticSearch nodes.
|
||||||
|
|
||||||
|
Logpusher is a pair of Python scripts that first listens to Jenkins
|
||||||
|
build events, then converts them into Gearman jobs. Gearman provides a
|
||||||
|
generic application framework to farm out work to other machines or
|
||||||
|
processes that are better suited to do the work. It allows you to do
|
||||||
|
work in parallel, to load balance processing, and to call functions
|
||||||
|
between languages. Later, Logpusher performs Gearman jobs to push log
|
||||||
|
files into logstash. Logstash indexer reads these log events, filters
|
||||||
|
them to remove unwanted lines, collapse multiple events together, and
|
||||||
|
parses useful information before shipping them to ElasticSearch for
|
||||||
|
storage and indexing. Kibana is a logstash oriented web client for
|
||||||
|
ElasticSearch.
|
||||||
|
Loading…
Reference in New Issue
Block a user