=================
Logging with Heka
=================

[No Jira Epic for this spec]

This specification describes the logging system to be implemented in Mirantis
Cloud Platform (MCP). It is complementary to the *General Logging, Monitoring,
Alerting architecture for MCP* specification.

The system is based on `Heka`_, `Elasticsearch`_ and `Kibana`_.

.. _Heka: http://hekad.readthedocs.org/
.. _Elasticsearch: https://www.elastic.co/products/elasticsearch
.. _Kibana: https://www.elastic.co/products/kibana

Problem description
===================

It is important that MCP comes with a robust and scalable logging system. This
specification describes the logging system we want to implement in MCP. It is
based on StackLight's logging solution for MOS.

Use Cases
---------

The target user of the logging system is the Operator of the Kubernetes
cluster. The Operator will use Kibana to view and search logs, with dashboards
providing statistical views of the logs.

We also want to be able to derive metrics from logs and to monitor logs, for
example to detect spikes of errors. The solution described in this
specification makes this possible, but the details of log monitoring will be
covered in a separate specification.

The deployment and configuration of Elasticsearch and Kibana through Kubernetes
will also be described in a separate specification.

Proposed change
===============

Architecture
------------

The architecture is as follows::

  Cluster nodes
  +---------------+             +----------------+
  | +---------------+           | +----------------+
  | | +---------------+         | | +----------------+
  | | |               |         | | |                |
  | | | Logs+-+       |         | | |                |
  | | |      |        |         | | |                |
  | | |      |        |         | | | Elasticsearch  |
  | | |   +--v-+      |         | | |                |
  | | |   |Heka+-------------------->                |
  +-+ |   +----+      |         +-+ |                |
    +-+               |           +-+                |
      +---------------+             +----------------+

In this architecture Heka runs on every node of the Kubernetes cluster. It runs
in a dedicated container, referred to as the *Heka container* in the rest of
this document.

Each Heka instance reads and processes the logs local to the node it runs on,
and sends these logs to Elasticsearch for indexing. Elasticsearch may be
distributed across multiple nodes for resiliency and scalability, but this
topic is outside the scope of this specification.

Heka, written in Go, is fast and has a small footprint, making it possible to
run it on every node of the cluster and effectively distribute the log
processing load.

Another important aspect is flow control, to avoid losing log messages in case
of overload. Heka's filter and output plugins, and the Elasticsearch output
plugin in particular, support the use of a disk-based message queue. This queue
allows plugins to reprocess messages when downstream servers (Elasticsearch)
are down or cannot keep up with the data flow.

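As an illustration, an ``ElasticSearchOutput`` block with disk-based buffering
enabled could look like the following sketch (the ``server`` URL, the message
matcher and the queue sizes are assumptions, not final values)::

    [ElasticSearchOutput]
    message_matcher = "Type == 'log'"
    server = "http://elasticsearch:9200"
    encoder = "ESJsonEncoder"
    use_buffering = true

      [ElasticSearchOutput.buffering]
      # 128 MB per queue file, 1 GB of queued data at most
      max_file_size = 134217728
      max_buffer_size = 1073741824
      # block the pipeline rather than drop messages when the queue is full
      full_action = "block"
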
Rely on Docker Logging
----------------------

Based on `discussions`_ with the Mirantis architects and on experience gained
with the Kolla project, the plan is to rely on `Docker Logging`_ and Heka's
`DockerLogInput plugin`_.

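For reference, a minimal ``DockerLogInput`` block could look like this sketch
(the decoder name is a placeholder)::

    [DockerLogInput]
    # default Docker socket location
    endpoint = "unix:///var/run/docker.sock"
    decoder = "docker_log_decoder"
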
Since the `Kolla logging specification`_ was written, support for Docker
Logging has improved in Heka. More specifically, Heka is now able to collect
logs that were created while Heka wasn't running.

Things to note:

* When ``DockerLogInput`` is used there is no way to differentiate log messages
  for containers producing multiple log streams – containers running multiple
  processes/agents, for example. So some other technique will have to be used
  for containers producing multiple log streams. One technique involves using
  log files and Docker volumes, which is the technique currently used in Kolla.
  Another technique involves having services use Syslog, with Heka acting as
  a Syslog server for these services.

* We will also probably encounter services that cannot be configured to log to
  ``stdout``. So again, we will have to resort to some other technique for
  these services. Log files or Syslog can be used, as described previously.

* Past experiments have shown that the OpenStack logs written to ``stdout`` are
  visible to neither Heka nor ``docker logs``. This problem does not exist
  when ``stderr`` is used rather than ``stdout``. The cause of this problem is
  currently unknown.

* ``DockerLogInput`` relies on Docker's `Get container logs endpoint`_, which
  works only for containers with the ``json-file`` or ``journald`` logging
  drivers. This means the Docker daemon cannot be configured with a logging
  driver other than ``json-file`` or ``journald``.

* If the ``json-file`` logging driver is used then the ``max-size`` and
  ``max-file`` options should be set, so that container logs are rolled over
  as appropriate. These options are not set by default in Ubuntu (neither in
  14.04 nor in 16.04).

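For the services that cannot log to ``stdout``/``stderr``, having Heka act as
a Syslog server could be sketched as follows (the listen address and the
decoder name are assumptions)::

    [syslog_input]
    type = "UdpInput"
    # listen on the loopback interface only
    address = "127.0.0.1:5140"
    decoder = "syslog_decoder"
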
.. _discussions: https://docs.google.com/document/d/15QYIX_cggbDH2wAJ6-7xUfmyZ3Izy_MOasVACutwqkE
.. _Docker Logging: https://docs.docker.com/engine/admin/logging/overview/
.. _DockerLogInput plugin: http://hekad.readthedocs.org/en/v0.10.0/config/inputs/docker_log.html
.. _Kolla logging specification: https://github.com/openstack/kolla/blob/master/specs/logging-with-heka.rst
.. _Get container logs endpoint: https://docs.docker.com/engine/reference/api/docker_remote_api_v1.20/#get-container-logs

Read Python Tracebacks
----------------------

In case of exceptions the OpenStack services log Python tracebacks as multiple
log messages. If no special care is taken the Python tracebacks will be
indexed as separate documents in Elasticsearch, and displayed as distinct log
entries in Kibana, making them hard to read. To address that issue we will use
a custom Heka decoder, which will be responsible for coalescing the log lines
making up a Python traceback into one message.

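Such a decoder would typically be implemented as a Lua sandbox and registered
in the Heka configuration along these lines (the Lua module path is
hypothetical)::

    [openstack_log_decoder]
    type = "SandboxDecoder"
    filename = "lua_decoders/openstack_log.lua"
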
Collect system logs
-------------------

In addition to container logs it is important to collect system logs as well.
For that we propose to mount the host's ``/var/log`` directory into the Heka
container (as ``/var/log-host/``), and to configure Heka to get logs from the
standard log files located in that directory (e.g. ``kern.log``, ``auth.log``,
``messages``).

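Heka's ``LogstreamerInput`` could be used for this, with one instance per log
stream; a sketch for ``kern.log`` (the decoder name is an assumption)::

    [kern_log]
    type = "LogstreamerInput"
    log_directory = "/var/log-host"
    file_match = 'kern\.log'
    decoder = "syslog_decoder"
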
Create a ``heka`` user
----------------------

For security reasons a ``heka`` user will be created in the Heka container and
the ``hekad`` daemon will run under that user.

Deployment
----------

Following the MCP approach to packaging and service execution, the Heka daemon
will run in a container. We plan to rely on Kubernetes's `Daemon Sets`_
functionality for deploying Heka on all the Kubernetes nodes.

We also want Heka to be deployed on the Kubernetes master node. For that the
Kubernetes master node should also be a minion server, where Kubernetes may
deploy containers.

.. _Daemon Sets: http://kubernetes.io/docs/admin/daemons/

Security impact
---------------

The security impact is minor, as Heka will not expose any network port to the
outside. Also, Heka's "dynamic sandboxes" functionality will be disabled,
eliminating the risk of injecting malicious code into the Heka pipeline.

Performance Impact
|
||||||
|
------------------
|
||||||
|
|
||||||
|
The ``hekad`` daemon will run in a container on each cluster node. And we have
|
||||||
|
assessed that Heka is lightweight enough to run on every node. See the
|
||||||
|
`Introduction of Heka in Kolla`_ email sent to the openstack-dev mailing list
|
||||||
|
for a discussion on comparison between Heka and Logstash. Also, a possible
|
||||||
|
option would be to constrain the resources associated to the Heka container.
|
||||||
|
|
||||||
|
.. _Introduction of Heka in Kolla: http://lists.openstack.org/pipermail/openstack-dev/2016-January/083751.html
|
||||||
|
|
||||||
|
Alternatives
------------

An alternative to this proposal involves relying on Kubernetes Logging, i.e.
using Kubernetes's native logging system. Some `research`_ has been done on
Kubernetes Logging. The conclusion of this research is that Kubernetes Logging
is not flexible enough, making it impossible to implement features such as
log monitoring in the future.

.. _research: https://mirantis.jira.com/wiki/display/NG/k8s+LMA+approaches

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Éric Lemoine (elemoine)

Work Items
----------

1. Create a Heka Docker image
2. Create the general Heka configuration
3. Deploy Heka through Kubernetes
4. Collect OpenStack logs
5. Collect other services' logs (RabbitMQ, MySQL...)
6. Collect Kubernetes logs
7. Send logs to Elasticsearch

Testing
=======

We will add functional tests that verify that the Heka chain works for all the
service and system logs Heka collects. These tests will be executed as part of
the gating process.

Documentation Impact
====================

None.

References
==========

None.