=================
Logging with Heka
=================

[No Jira Epic for this spec]

This specification describes the logging system to be implemented in Mirantis
Cloud Platform (MCP). It is complementary to the *General Logging, Monitoring,
Alerting architecture for MCP* specification.

The system is based on `Heka`_, `Elasticsearch`_ and `Kibana`_.

.. _Heka: http://hekad.readthedocs.org/
.. _Elasticsearch: https://www.elastic.co/products/elasticsearch
.. _Kibana: https://www.elastic.co/products/kibana

Problem description
===================

It is important that MCP comes with a robust and scalable logging system. This
specification describes the logging system we want to implement in MCP. It is
based on StackLight's logging solution for MOS.

Use Cases
---------

The target user of the logging system is the Operator of the Kubernetes
cluster. The Operator will use Kibana to view and search logs, with dashboards
providing statistical views of the logs.

We also want to be able to derive metrics from logs and to monitor logs, for
example to detect spikes of errors. The solution described in this
specification makes this possible, but the details of log monitoring will be
covered in a separate specification.

The deployment and configuration of Elasticsearch and Kibana through Kubernetes
will also be described in a separate specification.

Proposed change
===============

Architecture
------------

The architecture is as follows::

  Cluster nodes
  +---------------+             +----------------+
  | +---------------+           | +----------------+
  | | +---------------+         | | +----------------+
  | | |               |         | | |                |
  | | | Logs+-+       |         | | |                |
  | | |      |        |         | | |                |
  | | |      |        |         | | | Elasticsearch  |
  | | |   +--v-+      |         | | |                |
  | | |   |Heka+-------------------->                |
  +-+ |   +----+      |         +-+ |                |
    +-+               |           +-+                |
      +---------------+             +----------------+

In this architecture Heka runs on every node of the Kubernetes cluster. It runs
in a dedicated container, referred to as the *Heka container* in the rest of
this document.

Each Heka instance reads and processes the logs local to the node it runs on,
and sends these logs to Elasticsearch for indexing. Elasticsearch may be
distributed across multiple nodes for resiliency and scalability, but this
topic is outside the scope of this specification.

Heka, written in Go, is fast and has a small footprint, making it possible to
run it on every node of the cluster and effectively distribute the log
processing load.

Another important aspect is flow control, to avoid losing log messages in case
of overload. Heka's filter and output plugins, and the Elasticsearch output
plugin in particular, support the use of a disk-based message queue. This queue
allows plugins to reprocess messages when downstream servers (Elasticsearch)
are down or cannot keep up with the data flow.

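As an illustration, an ``ElasticSearchOutput`` block with disk-based buffering
enabled could look like the following sketch (the ``server`` URL, the message
matcher and the queue sizes are assumptions, not final values)::

    [ElasticSearchOutput]
    message_matcher = "Type == 'log'"
    server = "http://elasticsearch:9200"
    encoder = "ESJsonEncoder"
    use_buffering = true

      [ElasticSearchOutput.buffering]
      # 128 MB per queue file, 1 GB of queued data at most
      max_file_size = 134217728
      max_buffer_size = 1073741824
      # block the pipeline rather than drop messages when the queue is full
      full_action = "block"
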
Rely on Docker Logging
----------------------

Based on `discussions`_ with the Mirantis architects and on experience gained
with the Kolla project, the plan is to rely on `Docker Logging`_ and Heka's
`DockerLogInput plugin`_.

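For reference, a minimal ``DockerLogInput`` block could look like this sketch
(the decoder name is a placeholder)::

    [DockerLogInput]
    # default Docker socket location
    endpoint = "unix:///var/run/docker.sock"
    decoder = "docker_log_decoder"
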
Since the `Kolla logging specification`_ was written, support for Docker
Logging has improved in Heka. More specifically, Heka is now able to collect
logs that were created while Heka wasn't running.

Things to note:

* When ``DockerLogInput`` is used there is no way to differentiate log messages
  for containers producing multiple log streams – containers running multiple
  processes/agents, for example. So some other technique will have to be used
  for containers producing multiple log streams. One technique involves using
  log files and Docker volumes, which is the technique currently used in Kolla.
  Another technique involves having services use Syslog, with Heka acting as
  a Syslog server for these services.

* We will also probably encounter services that cannot be configured to log to
  ``stdout``. So again, we will have to resort to some other technique for
  these services. Log files or Syslog can be used, as described previously.

* Past experiments have shown that the OpenStack logs written to ``stdout`` are
  visible to neither Heka nor ``docker logs``. This problem does not exist
  when ``stderr`` is used rather than ``stdout``. The cause of this problem is
  currently unknown.

* ``DockerLogInput`` relies on Docker's `Get container logs endpoint`_, which
  works only for containers with the ``json-file`` or ``journald`` logging
  drivers. This means the Docker daemon cannot be configured with a logging
  driver other than ``json-file`` or ``journald``.

* If the ``json-file`` logging driver is used then the ``max-size`` and
  ``max-file`` options should be set, so that container logs are rolled over
  as appropriate. These options are not set by default in Ubuntu (neither in
  14.04 nor in 16.04).

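For the services that cannot log to ``stdout``/``stderr``, having Heka act as
a Syslog server could be sketched as follows (the listen address and the
decoder name are assumptions)::

    [syslog_input]
    type = "UdpInput"
    # listen on the loopback interface only
    address = "127.0.0.1:5140"
    decoder = "syslog_decoder"
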
.. _discussions: https://docs.google.com/document/d/15QYIX_cggbDH2wAJ6-7xUfmyZ3Izy_MOasVACutwqkE
.. _Docker Logging: https://docs.docker.com/engine/admin/logging/overview/
.. _DockerLogInput plugin: http://hekad.readthedocs.org/en/v0.10.0/config/inputs/docker_log.html
.. _Kolla logging specification: https://github.com/openstack/kolla/blob/master/specs/logging-with-heka.rst
.. _Get container logs endpoint: https://docs.docker.com/engine/reference/api/docker_remote_api_v1.20/#get-container-logs

Read Python Tracebacks
----------------------

In case of exceptions the OpenStack services log Python tracebacks as multiple
log messages. If no special care is taken the Python tracebacks will be
indexed as separate documents in Elasticsearch, and displayed as distinct log
entries in Kibana, making them hard to read. To address that issue we will use
a custom Heka decoder, which will be responsible for coalescing the log lines
making up a Python traceback into one message.

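Such a decoder would typically be implemented as a Lua sandbox and registered
in the Heka configuration along these lines (the Lua module path is
hypothetical)::

    [openstack_log_decoder]
    type = "SandboxDecoder"
    filename = "lua_decoders/openstack_log.lua"
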
Collect system logs
-------------------

In addition to container logs it is important to collect system logs as well.
For that we propose to mount the host's ``/var/log`` directory into the Heka
container (as ``/var/log-host/``), and to configure Heka to get logs from the
standard log files located in that directory (e.g. ``kern.log``, ``auth.log``,
``messages``).

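Heka's ``LogstreamerInput`` could be used for this, with one instance per log
stream; a sketch for ``kern.log`` (the decoder name is an assumption)::

    [kern_log]
    type = "LogstreamerInput"
    log_directory = "/var/log-host"
    file_match = 'kern\.log'
    decoder = "syslog_decoder"
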
Create a ``heka`` user
----------------------

For security reasons a ``heka`` user will be created in the Heka container and
the ``hekad`` daemon will run under that user.

Deployment
----------

Following the MCP approach to packaging and service execution, the Heka daemon
will run in a container. We plan to rely on Kubernetes's `Daemon Sets`_
functionality for deploying Heka on all the Kubernetes nodes.

We also want Heka to be deployed on the Kubernetes master node. For that the
Kubernetes master node should also be a minion server, where Kubernetes may
deploy containers.

.. _Daemon Sets: http://kubernetes.io/docs/admin/daemons/

Security impact
---------------

The security impact is minor, as Heka will not expose any network port to the
outside. Also, Heka's "dynamic sandboxes" functionality will be disabled,
eliminating the risk of injecting malicious code into the Heka pipeline.

Performance Impact
|
||||||
|
------------------
|
||||||
|
|
||||||
|
The ``hekad`` daemon will run in a container on each cluster node. And we have
|
||||||
|
assessed that Heka is lightweight enough to run on every node. See the
|
||||||
|
`Introduction of Heka in Kolla`_ email sent to the openstack-dev mailing list
|
||||||
|
for a discussion on comparison between Heka and Logstash. Also, a possible
|
||||||
|
option would be to constrain the resources associated to the Heka container.
|
||||||
|
|
||||||
|
.. _Introduction of Heka in Kolla: http://lists.openstack.org/pipermail/openstack-dev/2016-January/083751.html
|
||||||
|
|
||||||
|
Alternatives
------------

An alternative to this proposal involves relying on Kubernetes Logging, i.e.
using Kubernetes's native logging system. Some `research`_ has been done on
Kubernetes Logging. The conclusion of this research is that Kubernetes Logging
is not flexible enough, making it impossible to implement features such as
log monitoring in the future.

.. _research: https://mirantis.jira.com/wiki/display/NG/k8s+LMA+approaches

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Éric Lemoine (elemoine)

Work Items
----------

1. Create a Heka Docker image
2. Create the general Heka configuration
3. Deploy Heka through Kubernetes
4. Collect OpenStack logs
5. Collect other services' logs (RabbitMQ, MySQL...)
6. Collect Kubernetes logs
7. Send logs to Elasticsearch

Testing
=======

We will add functional tests that verify that the Heka chain works for all the
service and system logs Heka collects. These tests will be executed as part of
the gating process.

Documentation Impact
====================

None.

References
==========

None.