Add logstash documentation.
* doc/source/logstash.rst: Add documentation on our Logstash system
  architecture and how to query logstash.

Change-Id: I9da3e6d6391081131d1fd852230ddac6326c01a2
Reviewed-on: https://review.openstack.org/31257
Reviewed-by: James E. Blair <corvus@inaugust.com>
Reviewed-by: Elizabeth Krumbach Joseph <lyz@princessleia.com>
Approved: Jeremy Stanley <fungi@yuggoth.org>
Reviewed-by: Jeremy Stanley <fungi@yuggoth.org>
Tested-by: Jenkins
commit 6881008de0 (parent c42c7acdc4)
At a Glance
===========

:Hosts:
  * http://logstash.openstack.org
  * logstash-worker\*.openstack.org
  * elasticsearch.openstack.org
:Puppet:
  * :file:`modules/logstash`
  * :file:`modules/openstack_project/manifests/elasticsearch.pp`
:Configuration:
  * :file:`modules/openstack_project/files/logstash`
  * :file:`modules/openstack_project/templates/logstash`
:Projects:
  * http://logstash.net/
  * http://kibana.org/
  * http://www.elasticsearch.org/
:Bugs:
  * http://bugs.launchpad.net/openstack-ci
  * https://logstash.jira.com/secure/Dashboard.jspa
  * https://github.com/rashidkpc/Kibana/issues
  * https://github.com/elasticsearch/elasticsearch/issues

Overview
========

…sources in a single test run, searching for errors or particular
events within a test run, as well as searching for log event trends
across test runs.

System Architecture
===================

There are four major layers in our Logstash setup.

1. Log Pusher Script.
   Subscribes to the Jenkins ZeroMQ Event Publisher, listening for
   build-finished events. When a build finishes, this script fetches
   the logs generated by that build, chops them up, annotates them with
   Jenkins build info, and finally sends them to a Logstash indexer
   process.
2. Logstash Indexer.
   Reads these log events from the log pusher, filters them to remove
   unwanted lines, collapses multiline events together, and parses
   useful information out of the events before shipping them to
   ElasticSearch for storage and indexing.
3. ElasticSearch.
   Provides log storage, indexing, and search.
4. Kibana.
   A Logstash-oriented web client for ElasticSearch. You can perform
   queries on your Logstash logs in ElasticSearch through Kibana using
   the Lucene query language.

Each layer scales horizontally. As the number of logs grows we can add
more log pushers, more Logstash indexers, and more ElasticSearch nodes.
Currently we have multiple Logstash worker nodes that each pair a log
pusher with a Logstash indexer. We did this because each Logstash
process can dedicate only a single thread to filtering log events,
which becomes a bottleneck very quickly. This looks something like::

            _ logstash-worker1 _
           /                    \
  jenkins --- logstash-worker2 --- elasticsearch -- kibana
           \_                    _/
              logstash-worker3

Log Pusher
----------

This is a simple Python script that is given a list of log files to
push to Logstash when Jenkins builds complete.

Log pushing looks like this:

* Jenkins publishes build-complete notifications.
* The log pusher receives the notification from Jenkins.
* Using info in the notification, the log files are retrieved.
* The log files are processed, then shipped to Logstash.
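The receive-and-annotate steps above can be sketched in Python as
follows. This is a hypothetical, simplified sketch, not the actual
script; in particular the notification format shown here is an
assumption.

```python
import json

def parse_build_event(message):
    """Split a Jenkins ZeroMQ notification into its topic and JSON body.

    Assumed message shape: 'onFinalized {...json build info...}'.
    """
    topic, _, body = message.partition(" ")
    return topic, json.loads(body)

def annotate(log_lines, build_info):
    """Attach the Jenkins build info to every log line before shipping."""
    return [dict(build_info, event_message=line) for line in log_lines]

event = 'onFinalized {"build_name": "gate-foo", "build_number": "10"}'
topic, info = parse_build_event(event)
events = annotate(["2013-05-31T17:31:39.113 DEBUG Something happened"], info)
print(topic)                       # onFinalized
print(events[0]["build_name"])     # gate-foo
```

Each annotated dict is what would then be serialized and sent on to a
Logstash indexer.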

In the near future this script will be modified to act as a Gearman
worker so that we can add an arbitrary number of them without needing
to partition the log files each worker handles by hand. Instead, each
worker will be able to fetch and push any log file, and will do so as
directed by Gearman.

If you are interested in the technical details, the source of this
script can be found at
:file:`modules/openstack_project/files/logstash/log-pusher.py`.

Logstash
--------

Logstash does the heavy lifting of squashing all of our log lines into
events with a common format. It reads the JSON log events from the log
pusher connected to it, deletes events we don't want, parses log lines
to set the timestamp, message, and other fields for the event, then
ships these processed events off to ElasticSearch where they are stored
and made queryable.

At a high level Logstash takes:

::

  {
    "fields": {
      "build_name": "gate-foo",
      "build_number": "10",
      "event_message": "2013-05-31T17:31:39.113 DEBUG Something happened"
    }
  }

And turns that into:

::

  {
    "fields": {
      "build_name": "gate-foo",
      "build_number": "10",
      "loglevel": "DEBUG"
    },
    "@message": "Something happened",
    "@timestamp": "2013-05-31T17:31:39.113Z"
  }

It flattens each log line into something that looks very much like all
of the other events, regardless of the source log line format. This
makes querying your logs for lines from a specific build that failed
between two timestamps with specific message content very easy. You
don't need to write complicated greps; instead you query against a
schema.

The config file that tells Logstash how to do this flattening can be
found at
:file:`modules/openstack_project/templates/logstash/indexer.conf.erb`.
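For illustration, the transformation shown above can be re-implemented
in a few lines of Python. This is only a sketch of the idea; the real
flattening is done by the Logstash filter chain in the config file, and
the timestamp/level prefix assumed here matches only the example event.

```python
import re

# Matches log lines shaped like the example event above:
# "<ISO timestamp> <LEVEL> <message>". An assumption for this sketch.
TIMESTAMPED = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>[A-Z]+) (?P<msg>.*)$")

def flatten(event):
    """Pull timestamp, level, and message out of event_message."""
    fields = dict(event["fields"])
    match = TIMESTAMPED.match(fields.pop("event_message"))
    fields["loglevel"] = match.group("level")
    return {
        "fields": fields,
        "@message": match.group("msg"),
        "@timestamp": match.group("ts") + "Z",
    }

flat = flatten({"fields": {
    "build_name": "gate-foo",
    "build_number": "10",
    "event_message": "2013-05-31T17:31:39.113 DEBUG Something happened",
}})
print(flat["@message"])    # Something happened
print(flat["@timestamp"])  # 2013-05-31T17:31:39.113Z
```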

ElasticSearch
-------------

ElasticSearch is basically a REST API layer for Lucene. It provides
the storage and search engine for Logstash. It scales horizontally and
loves it when you give it more memory. Currently we run a single-node
cluster on a large VM to give ElasticSearch both memory and disk space.
Per index (Logstash creates one index per day) we have one replica (on
the same node; this does not provide HA, but it speeds up searches) and
five shards (each shard is basically its own index, and having multiple
shards increases indexing throughput).

As this setup grows and handles more logs we may need to add more
ElasticSearch nodes and run a proper cluster. We haven't reached that
point yet, but it will probably become necessary as disk and memory
footprints increase.
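The per-index layout described above corresponds to ElasticSearch index
settings along these lines. This is an illustrative fragment, not the
actual configuration, which is managed by our puppet module:

```json
{
  "index": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```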

Kibana
------

Kibana is a Ruby app sitting behind Apache that provides a nice web UI
for querying Logstash events stored in ElasticSearch. Our install can
be reached at http://logstash.openstack.org. See :ref:`query-logstash`
for more info on using Kibana to perform queries.
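Under the hood, Kibana's Lucene queries end up as searches against the
ElasticSearch REST ``_search`` endpoint. A raw equivalent URL looks
roughly like this; the host and port here are assumptions for the
sketch:

```python
import urllib.parse

def search_url(query, host="http://elasticsearch.openstack.org:9200"):
    """Build a URI-style ElasticSearch search URL for a Lucene query."""
    params = urllib.parse.urlencode({"q": query, "size": 100})
    return "%s/_search?%s" % (host, params)

print(search_url('@fields.loglevel:"ERROR"'))
```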

.. _query-logstash:

Querying Logstash
=================

Hop on over to http://logstash.openstack.org and by default you get
the last 15 minutes of everything Logstash knows about, in chunks of
100. We run a lot of tests, but it is possible no logs have come in
over the last 15 minutes; change the dropdown in the top left from
``Last 15m`` to ``Last 60m`` to get a better window on the logs. At
this point you should see a list of logs; if you click on a log event
it will expand and show you all of the fields associated with that
event and their values (note that Chromium and Kibana seem to have
trouble with this at times and some fields end up without values; use
Firefox if this happens). You can search based on all of these fields,
and if you click the magnifying glass next to a field in the expanded
event view it will add that field and value to your search. This is a
good way of refining searches without a lot of typing.

The above is good info for poking around in the Logstash logs, but say
one of your changes has a failing test and you want to know why. We
can jumpstart the refining process with a simple query.

``@fields.build_change:"$FAILING_CHANGE" AND @fields.build_patchset:"$FAILING_PATCHSET" AND @fields.build_name:"$FAILING_BUILD_NAME" AND @fields.build_number:"$FAILING_BUILD_NUMBER"``

This will show you all logs available from the patchset and build pair
that failed. Chances are that this is still a significant number of
logs and you will want to do more filtering. You can add more filters
to the query using ``AND`` and ``OR``, and parentheses can be used to
group sections of the query. Potential additions to the above query
might be:

* ``AND @fields.filename:"logs/syslog.txt"`` to get syslog events.
* ``AND @fields.filename:"logs/screen-n-api.txt"`` to get Nova API events.
* ``AND @fields.loglevel:"ERROR"`` to get ERROR level events.
* ``AND @message:"error"`` to get events with error in their message.

and so on.
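Assembling that query by hand gets tedious; a hypothetical helper (not
part of our tooling) that builds it from a failing build's identifiers
might look like:

```python
def build_query(change, patchset, build_name, build_number, extra=()):
    """Join the standard build-identifying clauses, plus any extras,
    with AND, matching the query shown above."""
    parts = [
        '@fields.build_change:"%s"' % change,
        '@fields.build_patchset:"%s"' % patchset,
        '@fields.build_name:"%s"' % build_name,
        '@fields.build_number:"%s"' % build_number,
    ]
    parts.extend(extra)
    return " AND ".join(parts)

# Hypothetical change/patchset/build values for illustration.
print(build_query("29036", "1", "gate-foo", "10",
                  extra=['@fields.loglevel:"ERROR"']))
```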

General query tips:

* Don't search ``All time``. ElasticSearch is bad at trying to find all
  the things it ever knew about. Give it a window of time to look
  through. You can use the presets in the dropdown to select a window,
  or use the ``foo`` to ``bar`` boxes above the frequency graph.
* Only the @message field can have fuzzy searches performed on it.
  Other fields require specific information.
* This system is growing fast and may not always keep up with the load.
  Be patient. If expected logs do not show up immediately after the
  Jenkins job completes, wait a few minutes.