[ops-guide] Openstack process

1. Added openstack process 2. Minor restructure to seperate openstack components from 3rd party software. Change-Id: If2fd671f1407f9b7fe06aab4266f43280d50b80f Closes-bug: 1457768
2016-08-17 13:57:00 +02:00 · 2016-08-17 13:57:00 +02:00 · f4b046bbf4
commit f4b046bbf4
parent ae2f313fb7
1 changed files with 165 additions and 117 deletions
--- a/doc/ops-guide/source/ops-monitoring.rst
+++ b/doc/ops-guide/source/ops-monitoring.rst
@ -8,14 +8,6 @@ creating a functional cloud. The latter involves monitoring resource
 usage over time in order to make informed decisions about potential
 bottlenecks and upgrades.
 **Nagios** is an open source monitoring service. It's capable of executing
 arbitrary commands to check the status of server and network services,
 remotely executing arbitrary commands directly on servers, and allowing
 servers to push notifications back in the form of passive monitoring.
 Nagios has been around since 1999. Although newer monitoring services
 are available, Nagios is a tried-and-true systems administration
 staple.
 Process Monitoring
 ~~~~~~~~~~~~~~~~~~
@ -38,31 +30,46 @@ the ``nova-api`` service is running on the cloud controller:
   /usr/bin/nova-api --config-file=/etc/nova/nova.conf
   root 24121 0.0 0.0 11688 912 pts/5 S+ 13:07 0:00 grep nova-api
 You can create automated alerts for critical processes by using Nagios
 and NRPE. For example, to ensure that the ``nova-compute`` process is
 running on compute nodes, create an alert on your Nagios server that
 looks like this:
-.. code-block:: none
+The OpenStack processes that should be monitored depend on the specific
 configuration of the environment, but can include:
-   define service {
+**Compute service (nova)**
       host_name c01.example.com
       check_command check_nrpe_1arg!check_nova-compute
       use generic-service
       notification_period 24x7
       contact_groups sysadmins
       service_description nova-compute
   }
-Then on the actual compute node, create the following NRPE
+* nova-api
-configuration:
+* nova-scheduler
 * nova-conductor
 * nova-novncproxy
 * nova-cert
 * nova-compute
-.. code-block:: none
+**Block Storage service (cinder)**
-    \command[check_nova-compute]=/usr/lib/nagios/plugins/check_procs -c 1: -a nova-compute
+* cinder-volume
 * cinder-api
 * cinder-scheduler
-Nagios checks that at least one ``nova-compute`` service is running at
+**Networking service (neutron)**
-all times.
+
 * neutron-api
 * neutron-server
 * neutron-openvswitch-agent
 * neutron-dhcp-agent
 * neutron-l3-agent
 * neutron-metadata-agent
 **Image service (glance)**
 * glance-api
 * glance-registry
 **Identity service (keystone)**
 The keystone processes can run standalone, but are often run within Apache
 as WSGI applications.
 * keystone-public
 * keystone-admin
 Resource Alerting
 ~~~~~~~~~~~~~~~~~
@ -81,96 +88,10 @@ Some of the resources that you want to monitor include:
 * Network I/O
 * Available vCPUs
-For example, to monitor disk capacity on a compute node with Nagios, add
+Telemetry Service
-the following to your Nagios configuration:
+~~~~~~~~~~~~~~~~~
-.. code-block:: none
+The Telemetry service (:term:`ceilometer`) collects
   define service {
       host_name c01.example.com
       check_command check_nrpe!check_all_disks!20% 10%
       use generic-service
       contact_groups sysadmins
       service_description Disk
   }
 On the compute node, add the following to your NRPE configuration:
 .. code-block:: none
   command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -e
 Nagios alerts you with a WARNING when any disk on the compute node is 80
 percent full and CRITICAL when 90 percent is full.
 StackTach
 ~~~~~~~~~
 StackTach is a tool that collects and reports the notifications sent by
 ``nova``. Notifications are essentially the same as logs but can be much
 more detailed. Nearly all OpenStack components are capable of generating
 notifications when significant events occur. Notifications are messages
 placed on the OpenStack queue (generally RabbitMQ) for consumption by
 downstream systems. An overview of notifications can be found at `System
 Usage
 Data <https://wiki.openstack.org/wiki/SystemUsageData>`_.
 To enable ``nova`` to send notifications, add the following to
 ``nova.conf``:
 .. code-block:: ini
   notification_topics=monitor
   notification_driver=messagingv2
 Once ``nova`` is sending notifications, install and configure StackTach.
 StackTach workers for Queue consumption and pipeling processing are
 configured to read these notifications from RabbitMQ servers and store
 them in a database. Users can inquire on instances, requests and servers
 by using the browser interface or command line tool,
 `Stacky <https://github.com/rackerlabs/stacky>`_. Since StackTach is
 relatively new and constantly changing, installation instructions
 quickly become outdated. Please refer to the `StackTach Git
 repo <https://git.openstack.org/cgit/openstack/stacktach>`_ for
 instructions as well as a demo video. Additional details on the latest
 developments can be discovered at the `official
 page <http://stacktach.com/>`_
 Logstash
 ~~~~~~~~
 Logstash is a high performance indexing and search engine for logs. Logs
 from Jenkins test runs are sent to logstash where they are indexed and
 stored. Logstash facilitates reviewing logs from multiple sources in a
 single test run, searching for errors or particular events within a test
 run, and searching for log event trends across test runs.
 There are four major layers in Logstash setup which are
 * Log Pusher
 * Log Indexer
 * ElasticSearch
 * Kibana
 Each layer scales horizontally. As the number of logs grows you can add
 more log pushers, more Logstash indexers, and more ElasticSearch nodes.
 Logpusher is a pair of Python scripts which first listens to Jenkins
 build events and converts them into Gearman jobs. Gearman provides a
 generic application framework to farm out work to other machines or
 processes that are better suited to do the work. It allows you to do
 work in parallel, to load balance processing, and to call functions
 between languages. Later Logpusher performs Gearman jobs to push log
 files into logstash. Logstash indexer reads these log events, filters
 them to remove unwanted lines, collapse multiple events together, and
 parses useful information before shipping them to ElasticSearch for
 storage and indexing. Kibana is a logstash oriented web client for
 ElasticSearch.
 OpenStack Telemetry
 ~~~~~~~~~~~~~~~~~~~
 An integrated OpenStack project (code-named :term:`ceilometer`) collects
 metering and event data relating to OpenStack services. Data collected
 by the Telemetry service could be used for billing. Depending on
 deployment configuration, collected data may be accessible to users
@ -182,7 +103,7 @@ Guide <http://docs.openstack.org/admin-guide/telemetry.html>`_ or
 in the `developer
 documentation <http://docs.openstack.org/developer/ceilometer>`_.
-OpenStack-Specific Resources
+OpenStack Specific Resources
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Resources such as memory, disk, and CPU are generic resources that all
@ -392,3 +313,130 @@ collectd is out of the scope of this book, a good starting point would
 be to use collectd to store the result as a COUNTER data type. More
 information can be found in `collectd's
 documentation <https://collectd.org/wiki/index.php/Data_source>`_.
 Monitoring Tools
 ~~~~~~~~~~~~~~~~
 Nagios
 ------
 Nagios is an open source monitoring service. It is capable of executing
 arbitrary commands to check the status of server and network services,
 remotely executing arbitrary commands directly on servers, and allowing
 servers to push notifications back in the form of passive monitoring.
 Nagios has been around since 1999. Although newer monitoring services
 are available, Nagios is a tried-and-true systems administration
 staple.
 You can create automated alerts for critical processes by using Nagios
 and NRPE. For example, to ensure that the ``nova-compute`` process is
 running on Compute nodes, create an alert on your Nagios server:
 .. code-block:: ini
   define service {
       host_name c01.example.com
       check_command check_nrpe_1arg!check_nova-compute
       use generic-service
       notification_period 24x7
       contact_groups sysadmins
       service_description nova-compute
   }
 On the Compute node, create the following NRPE
 configuration:
 .. code-block:: ini
    command[check_nova-compute]=/usr/lib/nagios/plugins/check_procs -c 1: -a nova-compute
 Nagios checks that at least one ``nova-compute`` service is running at
 all times.
 For resource alerting, for example, monitor disk capacity on a compute node
 with Nagios, add the following to your Nagios configuration:
 .. code-block:: ini
   define service {
       host_name c01.example.com
       check_command check_nrpe!check_all_disks!20% 10%
       use generic-service
       contact_groups sysadmins
       service_description Disk
   }
 On the compute node, add the following to your NRPE configuration:
 .. code-block:: ini
   command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -e
 Nagios alerts you with a `WARNING` when any disk on the compute node is 80
 percent full and `CRITICAL` when 90 percent is full.
 StackTach
 ---------
 StackTach is a tool that collects and reports the notifications sent by
 nova. Notifications are essentially the same as logs but can be much
 more detailed. Nearly all OpenStack components are capable of generating
 notifications when significant events occur. Notifications are messages
 placed on the OpenStack queue (generally RabbitMQ) for consumption by
 downstream systems. An overview of notifications can be found at `System
 Usage
 Data <https://wiki.openstack.org/wiki/SystemUsageData>`_.
 To enable nova to send notifications, add the following to the
 ``nova.conf`` configuration file:
 .. code-block:: ini
   notification_topics=monitor
   notification_driver=messagingv2
 Once nova is sending notifications, install and configure StackTach.
 StackTach works for queue consumption and pipeline processing are
 configured to read these notifications from RabbitMQ servers and store
 them in a database. Users can inquire on instances, requests, and servers
 by using the browser interface or command-line tool,
 `Stacky <https://github.com/rackerlabs/stacky>`_. Since StackTach is
 relatively new and constantly changing, installation instructions
 quickly become outdated. Refer to the `StackTach Git
 repository <https://git.openstack.org/cgit/openstack/stacktach>`_ for
 instructions as well as a demostration video. Additional details on the latest
 developments can be discovered at the `official
 page <http://stacktach.com/>`_
 Logstash
 ~~~~~~~~
 Logstash is a high performance indexing and search engine for logs. Logs
 from Jenkins test runs are sent to logstash where they are indexed and
 stored. Logstash facilitates reviewing logs from multiple sources in a
 single test run, searching for errors or particular events within a test
 run, and searching for log event trends across test runs.
 There are four major layers in Logstash setup which are:
 * Log Pusher
 * Log Indexer
 * ElasticSearch
 * Kibana
 Each layer scales horizontally. As the number of logs grows you can add
 more log pushers, more Logstash indexers, and more ElasticSearch nodes.
 Logpusher is a pair of Python scripts that first listens to Jenkins
 build events, then converts them into Gearman jobs. Gearman provides a
 generic application framework to farm out work to other machines or
 processes that are better suited to do the work. It allows you to do
 work in parallel, to load balance processing, and to call functions
 between languages. Later, Logpusher performs Gearman jobs to push log
 files into logstash. Logstash indexer reads these log events, filters
 them to remove unwanted lines, collapse multiple events together, and
 parses useful information before shipping them to ElasticSearch for
 storage and indexing. Kibana is a logstash oriented web client for
 ElasticSearch.