Add logstash documentation.
* doc/source/logstash.rst: Add documentation on our Logstash system
  architecture and how to query logstash.

Change-Id: I9da3e6d6391081131d1fd852230ddac6326c01a2
Reviewed-on: https://review.openstack.org/31257
Reviewed-by: James E. Blair <corvus@inaugust.com>
Reviewed-by: Elizabeth Krumbach Joseph <lyz@princessleia.com>
Approved: Jeremy Stanley <fungi@yuggoth.org>
Reviewed-by: Jeremy Stanley <fungi@yuggoth.org>
Tested-by: Jenkins
parent c42c7acdc4 · commit 6881008de0
@@ -12,7 +12,7 @@ At a Glance

:Hosts:
  * http://logstash.openstack.org
  * logstash-worker\*.openstack.org
  * elasticsearch.openstack.org
:Puppet:
  * :file:`modules/logstash`
@@ -21,13 +21,16 @@ At a Glance

  * :file:`modules/openstack_project/manifests/elasticsearch.pp`
:Configuration:
  * :file:`modules/openstack_project/files/logstash`
  * :file:`modules/openstack_project/templates/logstash`
:Projects:
  * http://logstash.net/
  * http://kibana.org/
  * http://www.elasticsearch.org/
:Bugs:
  * http://bugs.launchpad.net/openstack-ci
  * https://logstash.jira.com/secure/Dashboard.jspa
  * https://github.com/rashidkpc/Kibana/issues
  * https://github.com/elasticsearch/elasticsearch/issues

Overview
========
@@ -38,7 +41,186 @@ sources in a single test run, searching for errors or particular

events within a test run, as well as searching for log event trends
across test runs.

System Architecture
===================

There are four major layers in our Logstash setup.

1. Log Pusher Script.
   Subscribes to the Jenkins ZeroMQ Event Publisher listening for build
   finished events. When a build finishes, this script fetches the logs
   generated by that build, chops them up, annotates them with Jenkins
   build info, and finally sends them to a Logstash indexer process.
2. Logstash Indexer.
   Reads these log events from the log pusher, filters them to remove
   unwanted lines, collapses multiline events together, and parses
   useful information out of the events before shipping them to
   ElasticSearch for storage and indexing.
3. ElasticSearch.
   Provides log storage, indexing, and search.
4. Kibana.
   A Logstash-oriented web client for ElasticSearch. You can perform
   queries on your Logstash logs in ElasticSearch through Kibana using
   the Lucene query language.

Each layer scales horizontally. As the number of logs grows we can add
more log pushers, more Logstash indexers, and more ElasticSearch nodes.
Currently we have multiple Logstash worker nodes that pair a log pusher
with a Logstash indexer. We did this because each Logstash process can
only dedicate a single thread to filtering log events, which becomes a
bottleneck very quickly. This looks something like:

::

      _ logstash-worker1 _
     /                    \
  jenkins -- logstash-worker2 -- elasticsearch -- kibana
     \_                  _/
        logstash-worker3

Log Pusher
----------

This is a simple Python script that is given a list of log files to push
to Logstash when Jenkins builds complete.

Log pushing looks like this:

* Jenkins publishes build complete notifications.
* The log pusher receives the notification from Jenkins.
* Using info in the notification, log files are retrieved.
* Log files are processed then shipped to Logstash.

In the near future this script will be modified to act as a Gearman
worker so that we can add an arbitrary number of them without needing
to partition the log files that each worker handles by hand. Instead,
each worker will be able to fetch and push any log file and will do
so as directed by Gearman.

If you are interested in the technical details, the source of this
script can be found at
:file:`modules/openstack_project/files/logstash/log-pusher.py`
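The steps above can be sketched in Python. This is an illustrative
sketch only, not the real log-pusher.py; the notification layout and the
names used here (``name``, ``build``, ``log_url``, ``FILES_TO_PUSH``)
are assumptions for the example:

```python
import json

# Hypothetical list of log files each worker is configured to push.
FILES_TO_PUSH = ["logs/syslog.txt", "logs/screen-n-api.txt"]


def logs_for_build(notification, files=FILES_TO_PUSH):
    """Turn a build-finished notification into annotated log fetch jobs."""
    build = notification["build"]
    jobs = []
    for name in files:
        jobs.append({
            # URL to fetch the log file from (assumed log server layout).
            "url": "%s/%s" % (build["log_url"].rstrip("/"), name),
            # Jenkins build info used to annotate every log line sent
            # on to the Logstash indexer.
            "fields": {
                "build_name": notification["name"],
                "build_number": str(build["number"]),
                "filename": name,
            },
        })
    return jobs


# A made-up build-finished event, as it might arrive over ZeroMQ.
notification = json.loads(
    '{"name": "gate-foo",'
    ' "build": {"number": 10, "log_url": "http://logs.example.org/10/"}}'
)
jobs = logs_for_build(notification)
```

Each resulting job would then be fetched, chopped up, and shipped to the
indexer with its ``fields`` attached.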

Logstash
--------

Logstash does the heavy lifting of squashing all of our log lines into
events with a common format. It reads the JSON log events from the log
pusher connected to it, deletes events we don't want, parses log lines
to set the timestamp, message, and other fields for the event, then
ships these processed events off to ElasticSearch where they are stored
and made queryable.

At a high level Logstash takes:

::

  {
    "fields": {
      "build_name": "gate-foo",
      "build_number": "10",
      "event_message": "2013-05-31T17:31:39.113 DEBUG Something happened"
    }
  }

And turns that into:

::

  {
    "fields": {
      "build_name": "gate-foo",
      "build_number": "10",
      "loglevel": "DEBUG"
    },
    "@message": "Something happened",
    "@timestamp": "2013-05-31T17:31:39.113Z"
  }

It flattens each log line into something that looks very much like
all of the other events regardless of the source log line format. This
makes querying your logs for lines from a specific build that failed
between two timestamps with specific message content very easy. You
don't need to write complicated greps; instead you query against a
schema.

The config file that tells Logstash how to do this flattening can be
found at
:file:`modules/openstack_project/templates/logstash/indexer.conf.erb`
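The flattening above can be sketched in Python (the real rules are
Logstash filters in indexer.conf.erb; the regular expression here is an
assumed stand-in for those filters, matching only the example line
format shown above):

```python
import re

# Assumed line format: "<ISO timestamp> <LEVEL> <message>".
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<loglevel>[A-Z]+) (?P<message>.*)$"
)


def flatten(event):
    """Parse event_message into top-level @timestamp/@message fields."""
    fields = dict(event["fields"])
    match = LINE_RE.match(fields.pop("event_message"))
    fields["loglevel"] = match.group("loglevel")
    return {
        "fields": fields,
        "@message": match.group("message"),
        "@timestamp": match.group("timestamp") + "Z",
    }


raw = {
    "fields": {
        "build_name": "gate-foo",
        "build_number": "10",
        "event_message": "2013-05-31T17:31:39.113 DEBUG Something happened",
    }
}
flat = flatten(raw)
```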

ElasticSearch
-------------

ElasticSearch is basically a REST API layer for Lucene. It provides
the storage and search engine for Logstash. It scales horizontally and
loves it when you give it more memory. Currently we run a single node
cluster on a large VM to give ElasticSearch both memory and disk space.
Per index (Logstash creates one index per day) we have one replica (on
the same node; this does not provide HA, but it speeds up searches) and
five shards (each shard is basically its own index; having multiple
shards increases indexing throughput).

As this setup grows and handles more logs we may need to add more
ElasticSearch nodes and run a proper cluster. We haven't reached that
point yet, but it will probably be necessary as disk and memory
footprints increase.
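The per-index layout described above corresponds to ElasticSearch's
``number_of_shards`` and ``number_of_replicas`` index settings. A sketch
of the settings body one might PUT when creating such an index (our
actual values are managed by puppet, not this snippet):

```python
# Settings matching the layout described above: five primary shards for
# indexing throughput, one replica of each for faster searches (not HA,
# since the replica lives on the same node).
index_settings = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
    }
}

# Each daily index therefore holds 5 primaries + 5 replicas:
total_shard_copies = (
    index_settings["settings"]["number_of_shards"]
    * (1 + index_settings["settings"]["number_of_replicas"])
)
```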

Kibana
------

Kibana is a Ruby app sitting behind Apache that provides a nice web UI
for querying Logstash events stored in ElasticSearch. Our install can
be reached at http://logstash.openstack.org. See
:ref:`query-logstash` for more info on using Kibana to perform
queries.

.. _query-logstash:

Querying Logstash
=================

Hop on over to http://logstash.openstack.org and by default you get the
last 15 minutes of everything Logstash knows about in chunks of 100.
We run a lot of tests, but it is possible no logs have come in over the
last 15 minutes; change the dropdown in the top left from ``Last 15m``
to ``Last 60m`` to get a better window on the logs. At this point you
should see a list of logs; if you click on a log event it will expand
and show you all of the fields associated with that event and their
values (note: Chromium and Kibana seem to have trouble with this at
times and some fields end up without values; use Firefox if this
happens). You can search based on all of these fields, and if you click
the magnifying glass next to a field in the expanded event view it will
add that field and value to your search. This is a good way of refining
searches without a lot of typing.

The above is good info for poking around in the Logstash logs, but
suppose one of your changes has a failing test and you want to know
why. We can jumpstart the refining process with a simple query.

``@fields.build_change:"$FAILING_CHANGE" AND @fields.build_patchset:"$FAILING_PATCHSET" AND @fields.build_name:"$FAILING_BUILD_NAME" AND @fields.build_number:"$FAILING_BUILD_NUMBER"``

This will show you all logs available from the patchset and build pair
that failed. Chances are that this is still a significant number of
logs and you will want to do more filtering. You can add more filters
to the query using ``AND`` and ``OR``, and parentheses can be used to
group sections of the query. Potential additions to the above query
might be:

* ``AND @fields.filename:"logs/syslog.txt"`` to get syslog events.
* ``AND @fields.filename:"logs/screen-n-api.txt"`` to get Nova API events.
* ``AND @fields.loglevel:"ERROR"`` to get ERROR level events.
* ``AND @message:"error"`` to get events with error in their message.

and so on.

General query tips:

* Don't search ``All time``. ElasticSearch is bad at trying to find all
  the things it ever knew about. Give it a window of time to look
  through. You can use the presets in the dropdown to select a window or
  use the ``foo`` to ``bar`` boxes above the frequency graph.
* Only the @message field can have fuzzy searches performed on it. Other
  fields require specific information.
* This system is growing fast and may not always keep up with the load.
  Be patient. If expected logs do not show up immediately after the
  Jenkins job completes, wait a few minutes.
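A small helper can assemble the failing-build query above without
hand-typing every term. This is a convenience sketch only; the field
names come from the examples in this section, and the sample values are
made up:

```python
def build_query(change, patchset, build_name, build_number, extra=()):
    """Build a Lucene query string for one failing patchset/build pair.

    extra is an optional sequence of additional clauses (already in
    field:"value" form) to AND onto the end.
    """
    terms = [
        '@fields.build_change:"%s"' % change,
        '@fields.build_patchset:"%s"' % patchset,
        '@fields.build_name:"%s"' % build_name,
        '@fields.build_number:"%s"' % build_number,
    ]
    terms.extend(extra)
    return " AND ".join(terms)


# Example: all ERROR-level events from a hypothetical failing build.
query = build_query("12345", "2", "gate-foo", "10",
                    extra=['@fields.loglevel:"ERROR"'])
```

The resulting string can be pasted directly into the Kibana search box.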