From a31a70d265b7696b8607a7dcc65823611bc3c5cd Mon Sep 17 00:00:00 2001 From: caoyuan Date: Sat, 15 Oct 2016 04:07:29 +0000 Subject: [PATCH] Delete the unnecessary space TrivialFix Change-Id: Ifca64d8c4d6e7654d25001022e806f082acfa0b8 --- specs/ansible-multi.rst | 30 ++++----- specs/containerize-openstack.rst | 46 ++++++------- specs/kubernetes-deployment.rst | 46 ++++++------- specs/logging-with-heka.rst | 110 +++++++++++++++---------------- specs/template.rst | 4 +- 5 files changed, 118 insertions(+), 118 deletions(-) diff --git a/specs/ansible-multi.rst b/specs/ansible-multi.rst index 1d05e7f049..15292ecb3b 100644 --- a/specs/ansible-multi.rst +++ b/specs/ansible-multi.rst @@ -9,13 +9,13 @@ Multi-node Ansible ================== This blueprint specifies an approach to automate the deployment of OpenStack -using Ansible and Docker best practices. The overriding principles used in +using Ansible and Docker best practices. The overriding principles used in this specification are simplicity, flexibility and optimized deployment speed. Problem description =================== -Kolla can be deployed multi-node currently. To do so, the environment +Kolla can be deployed multi-node currently. To do so, the environment variables must be hand edited to define the hosts to connect to for various services. @@ -42,10 +42,10 @@ Proposed change =============== The docker-compose tool is single node and does nearly the same job as Ansible -would in this specification. As a result, we recommend deprecating +would in this specification. As a result, we recommend deprecating docker-compose as the default deployment system for Kolla. -To replace it, we recommend Ansible as a technology choice. Ansible is easy +To replace it, we recommend Ansible as a technology choice. Ansible is easy to learn, easy to use, and offers a base set of functionality to solve deployment as outlined in our four use cases. @@ -53,36 +53,36 @@ We recommend three models of configuration. The first model is based upon internally configuring the container and having the container take responsibility for all container configuration including -database setup, database synchronization, and keystone registration. This -model uses docker-compose and docker as dependencies. Existing containers will +database setup, database synchronization, and keystone registration. This +model uses docker-compose and docker as dependencies. Existing containers will be maintained but new container content will use either of the two remaining -models. James Slagle (TripleO PTL on behalf of our downstream TripleO +models. James Slagle (TripleO PTL on behalf of our downstream TripleO community) was very clear that he would prefer to see this model stay available -and maintained. As TripleO enters the world of Big Tent, they don't intend to +and maintained. As TripleO enters the world of Big Tent, they don't intend to deploy all of the services, and as such it doesn't make sense to maintain this legacy operational mode for new container content except on demand of our -downstreams, hopefully with their assistance. This model is called +downstreams, hopefully with their assistance. This model is called CONFIG_INSIDE. The second model and third model configure the containers outside of the -container. These models depend on Ansible and Docker. In the future, the +container. These models depend on Ansible and Docker. 
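
As an illustration of the outside-configuration approach, a
CONFIG_OUTSIDE_COPY_ONCE deployment might render a service's configuration on
the target host with Ansible and hand it to the container a single time at
startup. This is only a sketch; the service, template, paths and image name
are hypothetical::

    - name: Render glance-api configuration on the deployment host
      template:
        src: glance-api.conf.j2
        dest: /etc/kolla/glance-api/glance-api.conf

    - name: Start glance-api, which copies the config in once at startup
      command: >
        docker run -d --name glance_api
        -v /etc/kolla/glance-api:/opt/kolla/config:ro
        kollaglue/centos-rdo-glance-api
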
In the future, the OpenStack Puppet, OpenStack Chef and TripleO communities may decide to switch to one of these two models in which case these communities may maintain tooling -to integrate with Kolla. The major difference between these two models is that +to integrate with Kolla. The major difference between these two models is that one offers immutability and single source of truth (CONFIG_OUTSIDE_COPY_ONCE), while the third model trades these two properties to allow an Operator to directly modify configuration files on a system and have the configuration be -live in the container (CONFIG_OUTSIDE_COPY_ALWAYS). Because +live in the container (CONFIG_OUTSIDE_COPY_ALWAYS). Because CONFIG_OUTSIDE_COPY_ALWAYS requires direct Operator intervention on a node, and we prefer as a community Operators interact with the tools provided by Kolla, CONFIG_OUTSIDE_COPY_ONCE will be the default. We do not have to further enhance two sets of container configuration, but instead can focus our development effort on the default Ansible configuration -methods. If a defect is found in one of the containers based upon the +methods. If a defect is found in one of the containers based upon the CONFIG_INSIDE model, the community will repair it. -Finally we will implement a complete Ansible deployment system. The details +Finally we will implement a complete Ansible deployment system. The details of the implementation are covered in a later section in this specification. We estimate this will be approximately ~1000 LOC defining ~100 Ansible tasks. We further estimate the total code base when complete will be under 6 KLOC. @@ -97,7 +97,7 @@ best practices while introducing completely customizable configuration. The CONFIG_OUTSIDE_COPY_ALWAYS model of configuration offers the Operator greater flexibility in managing their deployment, at greater risk of damaging -their deployment. It trades one set of best practices for another, +their deployment. It trades one set of best practices for another, specifically the Kolla container best practices for flexibility. Security impact diff --git a/specs/containerize-openstack.rst b/specs/containerize-openstack.rst index 11d775d463..89f2e71bc4 100644 --- a/specs/containerize-openstack.rst +++ b/specs/containerize-openstack.rst @@ -9,8 +9,8 @@ Containerize OpenStack ====================== When upgrading or downgrading OpenStack, it is possible to use package based -management or image-based management. Containerizing OpenStack is meant to -optimize image-based management of OpenStack. Containerizing OpenStack +management or image-based management. Containerizing OpenStack is meant to +optimize image-based management of OpenStack. Containerizing OpenStack solves a manageability and availability problem with the current state of the art deployment systems in OpenStack. @@ -20,34 +20,34 @@ Problem description Current state of the art deployment systems use either image based or package based upgrade. -Image based upgrades are utilized by TripleO. When TripleO updates a system, +Image based upgrades are utilized by TripleO. When TripleO updates a system, it creates an image of the entire disk and deploys that rather than just the -parts that compose the OpenStack deployment. This results in significant -loss of availability. Further running VMs are shut down in the imaging -process. However, image based systems offer atomicity, because all related +parts that compose the OpenStack deployment. This results in significant +loss of availability. 
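
For contrast, the containerized approach proposed in this spec can replace a
single service rather than reimaging the whole disk. A hedged Ansible sketch
of such a per-service swap, with hypothetical container and image names::

    - name: Pull the new nova-compute image
      command: docker pull kollaglue/centos-rdo-nova-compute:latest

    - name: Recreate the running container from the new image
      shell: |
        docker rm -f nova_compute
        docker run -d --name nova_compute --net=host --privileged \
          -v /run:/run kollaglue/centos-rdo-nova-compute:latest
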
Further running VMs are shut down in the imaging +process. However, image based systems offer atomicity, because all related software for a service is updated in one atomic action by reimaging the system. -Other systems use package based upgrade. Package based upgrades suffer from -a non-atomic nature. An update may update 1 or more RPM packages. The update +Other systems use package based upgrade. Package based upgrades suffer from +a non-atomic nature. An update may update 1 or more RPM packages. The update process could fail for any number of reasons, and there is no way to back -out the existing changes. Typically in an OpenStack deployment it is +out the existing changes. Typically in an OpenStack deployment it is desirable to update a service that does one thing including it's dependencies -as an atomic unit. Package based upgrades do not offer atomicity. +as an atomic unit. Package based upgrades do not offer atomicity. To solve this problem, containers can be used to provide an image-based update approach which offers atomic upgrade of a running system with minimal -interruption in service. A rough prototype of compute upgrade [1] shows +interruption in service. A rough prototype of compute upgrade [1] shows approximately a 10 second window of unavailability during a software update. The prototype keeps virtual machines running without interruption. Use cases --------- -1. Upgrade or rollback OpenStack deployments atomically. End-user wants to +1. Upgrade or rollback OpenStack deployments atomically. End-user wants to change the running software versions in her system to deploy a new upstream release without interrupting service for significant periods. -2. Upgrade OpenStack based by component. End-user wants to upgrade her system +2. Upgrade OpenStack based by component. End-user wants to upgrade her system in fine-grained chunks to limit damage from a failed upgrade. -3. Rollback OpenStack based by component. End-user experienced a failed +3. Rollback OpenStack based by component. End-user experienced a failed upgrade and wishes to rollback to the last known good working version. @@ -180,16 +180,16 @@ The various container sets are composed in more detail as follows: * swift-proxy-server In order to achieve the desired results, we plan to permit super-privileged -containers. A super-privileged container is defined as any container launched +containers. A super-privileged container is defined as any container launched with the --privileged=true flag to docker that: * bind-mounts specific security-crucial host operating system directories - with -v. This includes nearly all directories in the filesystem except for + with -v. This includes nearly all directories in the filesystem except for leaf directories with no other host operating system use. * shares any namespace with the --ipc=host, --pid=host, or --net=host flags We will not use the Docker EXPOSE operation since all containers will use ---net=host. One motive for using --net=host is it is inherently simpler. +--net=host. One motive for using --net=host is it is inherently simpler. A different motive for not using EXPOSE is the 20 microsecond penalty applied to every packet forwarded and returned by docker-proxy. If EXPOSE functionality is desired, it can be added back by @@ -207,12 +207,12 @@ If the container does not pass its healthcheck operation, it should be restarted. Integration of metadata with fig or a similar single node Docker orchestration -tool will be implemented. 
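
For example, a super-privileged compute container could be described to fig
or a similar tool with an entry along the following lines; the image name and
bind mounts are illustrative only::

    novacompute:
      image: kollaglue/centos-rdo-nova-compute
      privileged: true
      net: host
      pid: host
      volumes:
        - /run:/run
        - /sys/fs/cgroup:/sys/fs/cgroup
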
Even though fig executes on a single node, the +tool will be implemented. Even though fig executes on a single node, the containers will be designed to run multi-node and the deploy tool should take -some form of information to allow it to operate multi-node. The deploy tool +some form of information to allow it to operate multi-node. The deploy tool should take a set of key/value pairs as inputs and convert them into inputs -into the environment passed to Docker. These key/value pairs could be a file -or environment variables. We will not offer integration with multi-node +into the environment passed to Docker. These key/value pairs could be a file +or environment variables. We will not offer integration with multi-node scheduling or orchestration tools, but instead expect our consumers to manage each bare metal machine using our fig or similar in nature tool integration. @@ -220,7 +220,7 @@ Any contributions from the community of the required metadata to run these containers using a multi-node orchestration tool will be warmly received but generally won't be maintained by the core team. -The technique for launching the deploy script is not handled by Kolla. This +The technique for launching the deploy script is not handled by Kolla. This is a problem for a higher level deployment tool such as TripleO or Fuel to tackle. @@ -229,7 +229,7 @@ Logs from the individual containers will be retrievable in some consistent way. Security impact --------------- -Container usage with super-privileged mode may possibly impact security. For +Container usage with super-privileged mode may possibly impact security. For example, when using --net=host mode and bind-mounting /run which is necessary for a compute node, it is possible that a compute breakout could corrupt the host operating system. diff --git a/specs/kubernetes-deployment.rst b/specs/kubernetes-deployment.rst index edf5454c61..37cfc759ff 100644 --- a/specs/kubernetes-deployment.rst +++ b/specs/kubernetes-deployment.rst @@ -6,7 +6,7 @@ https://blueprints.launchpad.net/kolla/+spec/kolla-kubernetes Kubernetes was evaluated by the Kolla team in the first two months of the project and it was found to be problematic because it did not support net=host, -pid=host, and --privileged features in docker. Since then, it has developed +pid=host, and --privileged features in docker. Since then, it has developed these features [1]. The objective is to manage the lifecycle of containerized OpenStack services by @@ -51,7 +51,7 @@ Orchestration ------------- OpenStack on Kubernetes will be orchestrated by outside tools in order to create -a production ready OpenStack environment. The kolla-kubernetes repo is where +a production ready OpenStack environment. The kolla-kubernetes repo is where any deployment tool can join the community and be a part of orchestrating a kolla-kubernetes deployment. @@ -60,10 +60,10 @@ Service Config Management Config generation will be completely decoupled from the deployment. The containers only expect a config file to land in a specific directory in -the container in order to run. With this decoupled model, any tool could be -used to generate config files. The kolla-kubernetes community will evaluate +the container in order to run. With this decoupled model, any tool could be +used to generate config files. The kolla-kubernetes community will evaluate any config generation tool, but will likely use Ansible for config generation -in order to reuse existing work from the community. 
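
A hedged example of what such config generation could look like; the
template, paths and ``kubectl`` invocation are assumptions rather than
settled interfaces::

    - name: Generate nova.conf from Jinja2 templates plus overrides
      template:
        src: nova.conf.j2
        dest: /etc/kolla-kubernetes/nova/nova.conf

    - name: Load the rendered file into a Kubernetes configmap
      command: >
        kubectl create configmap nova-conf
        --from-file=/etc/kolla-kubernetes/nova/nova.conf
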
This solution uses +in order to reuse existing work from the community. This solution uses customized Ansible and jinja2 templates to generate the config. Also, there will be a maintained set of defaults and a global yaml file that can override the defaults. @@ -82,7 +82,7 @@ will be a Kubernetes Job, which will run the task until completion then terminate the pods [7]. Each service will have a bootstrap task so that when the operator upgrades, -the bootstrap tasks are reused to upgrade the database. This will allow +the bootstrap tasks are reused to upgrade the database. This will allow deployment and upgrades to follow the same pipeline. The Kolla containers will communicate with the Kubernetes API server to in order @@ -96,14 +96,14 @@ require some orchestration and the bootstrap pod will need to be setup to never restart or be replicated. 2) Use a sidecar container in the pod to handle the database sync with proper -health checking to make sure the services are coming up healthy. The big +health checking to make sure the services are coming up healthy. The big difference between kolla's old docker-compose solution and Kubernetes, is that -docker-compose would only restart the containers. Kubernetes will completely -reschedule them. Which means, removing the pod and restarting it. The reason +docker-compose would only restart the containers. Kubernetes will completely +reschedule them. Which means, removing the pod and restarting it. The reason this would fix that race condition failure kolla saw from docker-compose is because glance would be rescheduled on failure allowing keystone to get a chance to sync with the database and become active instead of constantly being -piled with glance requests. There can also be health checks around this to help +piled with glance requests. There can also be health checks around this to help determine order. If kolla-kubernetes used this sidecar approach, it would regain the use of @@ -116,12 +116,12 @@ Dependencies - Docker >= 1.10.0 - Jinja2 >= 2.8.0 -Kubernetes does not support dependencies between pods. The operator will launch +Kubernetes does not support dependencies between pods. The operator will launch all the services and use kubernetes health checks to bring the deployment to an operational state. With orchestration around Kubernetes, the operator can determine what tasks are -run and when the tasks are run. This way, dependencies are handled at the +run and when the tasks are run. This way, dependencies are handled at the orchestration level, but they are not required because proper health checking will bring up the cluster in a healthy state. @@ -133,7 +133,7 @@ desired state for the pods and the deployment will move the cluster to the desired state when a change is detected. Kolla-kubernetes will provide Jobs that will provide the operator with the -flexibility needed to under go a step wise upgrade. In future releases, +flexibility needed to under go a step wise upgrade. In future releases, kolla-kubernetes will look to Kubernetes to provide a means for operators to plugin these jobs into a Deployment. @@ -141,22 +141,22 @@ Reconfigure ----------- The operator generates a new config and loads it into the Kubernetes configmap -by changing the configmap version in the service yaml file. Then, the operator +by changing the configmap version in the service yaml file. Then, the operator will trigger a rolling upgrade, which will scale down old pods and bring up new ones that will run with the updated configuration files. 
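
Concretely, the pod template might mount a versioned configmap, so that
bumping the name (for example from ``nova-conf-v1`` to ``nova-conf-v2``) is
what triggers the rolling upgrade. The names below are illustrative::

    spec:
      containers:
        - name: nova-api
          image: kolla/centos-binary-nova-api
          volumeMounts:
            - name: nova-config
              mountPath: /etc/nova
      volumes:
        - name: nova-config
          configMap:
            name: nova-conf-v2
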
There's an open issue upstream in Kubernetes where the plan is to add support -around detecting if a pod has a changed in the configmap [6]. Depending on what -the solution is, kolla-kubernetes may or may not use it. The rolling +around detecting if a pod has a changed in the configmap [6]. Depending on what +the solution is, kolla-kubernetes may or may not use it. The rolling upgrade feature will provide kolla-kubernetes with an elegant way to handle restarting the services. HA Architecture --------------- -Kubernetes uses health checks to bring up the services. Therefore, +Kubernetes uses health checks to bring up the services. Therefore, kolla-kubernetes will use the same checks when monitoring if a service is -healthy. When a service fails, the replication controller will be responsible +healthy. When a service fails, the replication controller will be responsible for bringing up a new container in its place [8][9]. However, Kubernetes does not cover all the HA corner cases, for instance, @@ -178,14 +178,14 @@ guarantee a pod will always be scheduled to a host, it makes node based persistent storage unlikely, unless the community uses labels for every pod. Persistent storage in kolla-kubernetes will come from volumes backed by -different storage offerings to provide persistent storage. Kolla-kubernetes +different storage offerings to provide persistent storage. Kolla-kubernetes will provide a default solution using Ceph RBD, that the community will use to deploy multinode with. From there, kolla-kubernetes can add any additional persistent storage options as well as support options for the operator to reference an existing storage solution. To deploy Ceph, the community will use the Ansible playbooks from Kolla to -deploy a containerized Ceph at least for the 1.0 release. After Kubernetes +deploy a containerized Ceph at least for the 1.0 release. After Kubernetes deployment matures, the community can evaluate building its own Ceph deployment solution. @@ -198,9 +198,9 @@ Service Roles At the broadest level, OpenStack can split up into two main roles, Controller and Compute. With Kubernetes, the role definition layer changes. Kolla-kubernetes will still need to define Compute nodes, but not Controller -nodes. Compute nodes hold the libvirt container and the running vms. That +nodes. Compute nodes hold the libvirt container and the running vms. That service cannont migrate because the vms associated with it exist on the node. -However, the Controller role is more flexible. The Kubernetes layer provides IP +However, the Controller role is more flexible. The Kubernetes layer provides IP persistence so that APIs will remain active and abstracted from the operator's view [15]. kolla-kubernetes can direct Controller services away from the Compute node using labels, while managing Compute services more strictly. @@ -244,7 +244,7 @@ To reuse Kolla's containers, kolla-kubernetes will use elastic search, heka, and kibana as the default logging mechanism. The community will implement centralized logging by using a 'side car' container -in the Kubernetes pod [17]. The logging service will trace the logs from the +in the Kubernetes pod [17]. The logging service will trace the logs from the shared volume of the running serivce and send the data to elastic search. This solution is ideal because volumes are shared amoung the containers in a pod. 
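
A minimal sketch of that side car arrangement, with hypothetical names, in
which both containers of the pod share one log volume::

    apiVersion: v1
    kind: Pod
    metadata:
      name: glance-api
    spec:
      containers:
        - name: glance-api
          image: kolla/centos-binary-glance-api
          volumeMounts:
            - name: logs
              mountPath: /var/log/kolla
        - name: log-shipper
          image: kolla/centos-binary-heka
          volumeMounts:
            - name: logs
              mountPath: /var/log/kolla
              readOnly: true
      volumes:
        - name: logs
          emptyDir: {}
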
diff --git a/specs/logging-with-heka.rst b/specs/logging-with-heka.rst index 776c9f5258..17ba4b3eb9 100644 --- a/specs/logging-with-heka.rst +++ b/specs/logging-with-heka.rst @@ -4,15 +4,15 @@ Logging with Heka https://blueprints.launchpad.net/kolla/+spec/heka -Kolla currently uses Rsyslog for logging. And Change Request ``252968`` [1] +Kolla currently uses Rsyslog for logging. And Change Request ``252968`` [1] suggests to use ELK (Elasticsearch, Logstash, Kibana) as a way to index all the logs, and visualize them. This spec suggests using Heka [2] instead of Logstash, while still using -Elasticsearch for indexing and Kibana for visualization. It also discusses +Elasticsearch for indexing and Kibana for visualization. It also discusses the removal of Rsyslog along the way. -What is Heka? Heka is a open-source stream processing software created and +What is Heka? Heka is a open-source stream processing software created and maintained by Mozilla. Using Heka will provide a lightweight and scalable log processing solution @@ -22,7 +22,7 @@ Problem description =================== Change Request ``252968`` [1] adds an Ansible role named "elk" that enables -deploying ELK (Elasticsearch, Logstash, Kibana) on nodes with that role. This +deploying ELK (Elasticsearch, Logstash, Kibana) on nodes with that role. This spec builds on that work, proposing a scalable log processing architecture based on the Heka [2] stream processing software. @@ -34,7 +34,7 @@ OpenStack nodes rather than using a centralized log processing engine that represents a bottleneck and a single-point-of-failure. We also know from experience that Heka provides all the necessary flexibility -for processing other types of data streams than log messages. For example, we +for processing other types of data streams than log messages. For example, we already use Heka together with Elasticsearch for logs, but also with collectd and InfluxDB for statistics and metrics. @@ -53,16 +53,16 @@ in a dedicated container, referred to as the Heka container in the rest of this document. Each Heka instance reads and processes the logs local to the node it runs on, -and sends these logs to Elasticsearch for indexing. Elasticsearch may be +and sends these logs to Elasticsearch for indexing. Elasticsearch may be distributed on multiple nodes for resiliency and scalability, but that part is outside the scope of that specification. Heka, written in Go, is fast and has a small footprint, making it possible to -run it on every node of the cluster. In contrast, Logstash runs in a JVM and +run it on every node of the cluster. In contrast, Logstash runs in a JVM and is known [3] to be too heavy to run on every node. Another important aspect is flow control and avoiding the loss of log messages -in case of overload. Heka’s filter and output plugins, and the Elasticsearch +in case of overload. Heka’s filter and output plugins, and the Elasticsearch output plugin in particular, support the use of a disk based message queue. This message queue allows plugins to reprocess messages from the queue when downstream servers (Elasticsearch) are down or cannot keep up with the data @@ -74,20 +74,20 @@ which introduces some complexity and other points-of-failures. Remove Rsyslog -------------- -Kolla currently uses Rsyslog. The Kolla services are configured to write their -logs to Syslog. Rsyslog gets the logs from the ``/var/lib/kolla/dev/log`` Unix -socket and dispatches them to log files on the local file system. Rsyslog +Kolla currently uses Rsyslog. 
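
For reference, the Elasticsearch shipping path with a disk based queue
described in the previous section might look roughly as follows in Heka's
configuration; section and option names follow Heka's documented plugins,
while the values are illustrative::

    [ESJsonEncoder]
    es_index_from_timestamp = true

    [ElasticSearchOutput]
    message_matcher = "Type == 'log'"
    server = "http://127.0.0.1:9200"
    encoder = "ESJsonEncoder"
    use_buffering = true

    [ElasticSearchOutput.buffering]
    max_file_size = 134217728
    full_action = "block"
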
The Kolla services are configured to write their +logs to Syslog. Rsyslog gets the logs from the ``/var/lib/kolla/dev/log`` Unix +socket and dispatches them to log files on the local file system. Rsyslog running in a Docker container, the log files are stored in a Docker volume (named ``rsyslog``). With Rsyslog already running on each cluster node, the question of using two -log processing daemons, namely ``rsyslogd`` and ``hekad``, has been raised on -the mailing list. The spec evaluates the possibility of using ``hekad`` only, +log processing daemons, namely ``rsyslogd`` and ``hekad``, has been raised on +the mailing list. The spec evaluates the possibility of using ``hekad`` only, based on some prototyping work we have conducted [4]. Note: Kolla doesn't currently collect logs from RabbitMQ, HAProxy and -Keepalived. For RabbitMQ the problem is related to RabbitMQ not having the -capability to write its logs to Syslog. HAProxy and Keepalived do have that +Keepalived. For RabbitMQ the problem is related to RabbitMQ not having the +capability to write its logs to Syslog. HAProxy and Keepalived do have that capability, but the ``/var/lib/kolla/dev/log`` Unix socket file is currently not mounted into the HAProxy and Keepalived containers. @@ -96,21 +96,21 @@ Use Heka's ``DockerLogInput`` plugin To remove Rsyslog and only use Heka one option would be to make the Kolla services write their logs to ``stdout`` (or ``stderr``) and rely on Heka's -``DockerLogInput`` plugin [5] for reading the logs. Our experiments have +``DockerLogInput`` plugin [5] for reading the logs. Our experiments have revealed a number of problems with this option: * The ``DockerLogInput`` plugin doesn't currently work for containers that have - a ``tty`` allocated. And Kolla currently allocates a tty for all containers + a ``tty`` allocated. And Kolla currently allocates a tty for all containers (for good reasons). * When ``DockerLogInput`` is used there is no way to differentiate log messages - for containers producing multiple log streams. ``neutron-agents`` is an - example of such a container. (Sam Yaple has raised that issue multiple + for containers producing multiple log streams. ``neutron-agents`` is an + example of such a container. (Sam Yaple has raised that issue multiple times.) * If Heka is stopped and restarted later then log messages will be lost, as the ``DockerLogInput`` plugin doesn't currently have a mechanism for tracking its - positions in the log streams. This is in contrast to the ``LogstreamerInput`` + positions in the log streams. This is in contrast to the ``LogstreamerInput`` plugin [6] which does include that mechanism. For these reasons we think that relying on the ``DockerLogInput`` plugin may @@ -119,7 +119,7 @@ not be a practical option. For the note, our experiments have also shown that the OpenStack containers logs written to ``stdout`` are visible to neither Heka nor ``docker logs``. This problem is not reproducible when ``stderr`` is used rather than -``stdout``. The cause of this problem is currently unknown. And it looks like +``stdout``. The cause of this problem is currently unknown. And it looks like other people have come across that issue [7]. Use local log files @@ -129,7 +129,7 @@ Another option consists of configuring all the Kolla services to log into local files, and using Heka's ``LogstreamerInput`` plugin [5]. This option involves using a Docker named volume, mounted both into the service -containers (in ``rw`` mode) and into the Heka container (in ``ro`` mode). 
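
A minimal ``LogstreamerInput`` pointed at such a volume might look as
follows; the directory layout and match pattern are assumptions::

    [openstack_logs]
    type = "LogstreamerInput"
    log_directory = "/var/log/kolla"
    file_match = '(?P<Service>[^/]+)/.*\.log'
    differentiator = ["Service"]
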
The +containers (in ``rw`` mode) and into the Heka container (in ``ro`` mode). The services write logs into files placed in that volume, and Heka reads logs from the files found in that volume. @@ -138,28 +138,28 @@ And it relies on Heka's ``LogstreamerInput`` plugin, which, based on our experience, is efficient and robust. Keeping file logs locally on the nodes has been established as a requirement by -the Kolla developers. With this option, and the Docker volume used, meeting +the Kolla developers. With this option, and the Docker volume used, meeting that requirement necessitates no additional mechanism. For this option to be applicable the services must have the capability of -logging into files. Most of the Kolla services have this capability. The +logging into files. Most of the Kolla services have this capability. The exceptions are HAProxy and Keepalived, for which a different mechanism should -be used (described further down in the document). Note that this will make it +be used (described further down in the document). Note that this will make it possible to collect logs from RabbitMQ, which does not support logging to Syslog but does support logging to a file. Also, this option requires that the services have the permission to create files into the Docker volume, and that Heka has the permission to read these -files. This means that the Docker named volume will have to have appropriate -owner, group and permission bits. With the Heka container running under +files. This means that the Docker named volume will have to have appropriate +owner, group and permission bits. With the Heka container running under a specific user (see below) this will mean using an ``extend_start.sh`` script -including ``sudo chown`` and possibly ``sudo chmod`` commands. Our prototype +including ``sudo chown`` and possibly ``sudo chmod`` commands. Our prototype [4] already includes this. As mentioned already the ``LogstreamerInput`` plugin includes a mechanism for -tracking positions in log streams. This works with journal files stored on the -file system (in ``/var/cache/hekad``). A specific volume, private to Heka, -will be used for these journal files. In this way no logs will be lost if the +tracking positions in log streams. This works with journal files stored on the +file system (in ``/var/cache/hekad``). A specific volume, private to Heka, +will be used for these journal files. In this way no logs will be lost if the Heka container is removed and a new one is created. Handling HAProxy and Keepalived @@ -174,7 +174,7 @@ This works by using Heka's ``UdpInput`` plugin with its ``net`` option set to ``unixgram``. This also requires that a Unix socket is created by Heka, and that socket is -mounted into the HAProxy and Keepalived containers. For that we will use the +mounted into the HAProxy and Keepalived containers. For that we will use the same technique as the one currently used in Kolla with Rsyslog, that is mounting ``/var/lib/kolla/dev`` into the Heka container and mounting ``/var/lib/kolla/dev/log`` into the service containers. @@ -182,7 +182,7 @@ mounting ``/var/lib/kolla/dev`` into the Heka container and mounting Our prototype already includes some code demonstrating this. See [4]. Also, to be able to store a copy of the HAProxy and Keepalived logs locally on -the node, we will use Heka's ``FileOutput`` plugin. We will possibly create +the node, we will use Heka's ``FileOutput`` plugin. 
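
Taken together, the socket input and a local file copy might be declared
roughly as follows; the decoder, field names and paths are assumptions::

    [syslog_input]
    type = "UdpInput"
    net = "unixgram"
    address = "/var/lib/kolla/dev/log"
    decoder = "RsyslogDecoder"

    [haproxy_file_output]
    type = "FileOutput"
    message_matcher = "Fields[programname] == 'haproxy'"
    path = "/var/log/kolla/haproxy/haproxy.log"
    encoder = "PayloadEncoder"
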
We will possibly create two instances of that plugin, one for HAProxy and one for Keepalived, with specific filters (``message_matcher``). @@ -190,29 +190,29 @@ Read Python Tracebacks ---------------------- In case of exceptions the OpenStack services log Python Tracebacks as multiple -log messages. If no special care is taken then the Python Tracebacks will be +log messages. If no special care is taken then the Python Tracebacks will be indexed as separate documents in Elasticsearch, and displayed as distinct log -entries in Kibana, making them hard to read. To address that issue we will use +entries in Kibana, making them hard to read. To address that issue we will use a custom Heka decoder, which will be responsible for coalescing the log lines -making up a Python Traceback into one message. Our prototype includes that +making up a Python Traceback into one message. Our prototype includes that decoder [4]. Collect system logs ------------------- In addition to container logs we think it is important to collect system logs -as well. For that we propose to mount the host's ``/var/log`` directory into +as well. For that we propose to mount the host's ``/var/log`` directory into the Heka container, and configure Heka to get logs from standard log files -located in that directory (e.g. ``kern.log``, ``auth.log``, ``messages``). The +located in that directory (e.g. ``kern.log``, ``auth.log``, ``messages``). The list of system log files will be determined at development time. Log rotation ------------ -Log rotation is an important aspect of the logging system. Currently Kolla -doesn't rotate logs. Logs just accumulate in the ``rsyslog`` Docker volume. +Log rotation is an important aspect of the logging system. Currently Kolla +doesn't rotate logs. Logs just accumulate in the ``rsyslog`` Docker volume. The work on Heka proposed in this spec isn't directly related to log rotation, -but we are suggesting to address this issue for Mitaka. This will mean +but we are suggesting to address this issue for Mitaka. This will mean creating a new container that uses ``logrotate`` to manage the log files created by the Kolla containers. @@ -220,33 +220,33 @@ Create an ``heka`` user ----------------------- For security reasons an ``heka`` user will be created in the Heka container and -the ``hekad`` daemon will run under that user. The ``heka`` user will be added +the ``hekad`` daemon will run under that user. The ``heka`` user will be added to the ``kolla`` group, to make sure that Heka can read the log files created by the services. Security impact --------------- -Heka is a mature product maintained and used in production by Mozilla. So we -trust Heka as being secure. We also trust the Heka developers as being serious +Heka is a mature product maintained and used in production by Mozilla. So we +trust Heka as being secure. We also trust the Heka developers as being serious should security vulnerabilities be found in the Heka code. As described above we are proposing to use a Docker volume between the service -containers and the Heka container. The group of the volume directory and the -log files will be ``kolla``. And the owner of the log files will be the user -that executes the service producing logs. But the ``gid`` of the ``kolla`` +containers and the Heka container. The group of the volume directory and the +log files will be ``kolla``. And the owner of the log files will be the user +that executes the service producing logs. 
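
That ownership arrangement could be established by the ``extend_start.sh``
fragment mentioned earlier; the path and mode below are placeholders::

    #!/bin/bash
    # Hand the shared log volume to the kolla group so services can write
    # their log files and the heka user can read them (illustrative values).
    sudo chown :kolla /var/log/kolla
    sudo chmod 0775 /var/log/kolla
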
But the ``gid`` of the ``kolla`` group and the ``uid``'s of the users executing the services may correspond -to a different group and different users on the host system. This means -that the permissions may not be right on the host system. This problem is +to a different group and different users on the host system. This means +that the permissions may not be right on the host system. This problem is not specific to this specification, and it already exists in Kolla (for the mariadb data volume for example). Performance Impact ------------------ -The ``hekad`` daemon will run in a container on each cluster node. But the -``rsyslogd`` will be removed. And we have assessed that Heka is lightweight -enough to run on every node. Also, a possible option would be to constrain the +The ``hekad`` daemon will run in a container on each cluster node. But the +``rsyslogd`` will be removed. And we have assessed that Heka is lightweight +enough to run on every node. Also, a possible option would be to constrain the Heka container to only use a defined amount of resources. Alternatives @@ -256,12 +256,12 @@ An alternative to this proposal involves using Logstash in a centralized way as done in [1]. Another alternative would be to execute Logstash on each cluster node, as this -spec proposes with Heka. But this would mean running a JVM on each cluster +spec proposes with Heka. But this would mean running a JVM on each cluster node, and using Redis as a centralized queue. Also, as described above, we initially considered relying on services writing -their logs to ``stdout`` and use Heka's ``DockerLogInput`` plugin. But our -prototyping work has demonstrated the limits of that approach. See the +their logs to ``stdout`` and use Heka's ``DockerLogInput`` plugin. But our +prototyping work has demonstrated the limits of that approach. See the ``DockerLogInput`` section above for more information. Implementation diff --git a/specs/template.rst b/specs/template.rst index 27b93152f4..6b62ec3760 100644 --- a/specs/template.rst +++ b/specs/template.rst @@ -8,8 +8,8 @@ This template should be in ReSTructured text. The filename in the git repository should match the launchpad URL, for example a URL of https://blueprints.launchpad.net/kolla/+spec/awesome-thing should be named - awesome-thing.rst . Please do not delete any of the sections in this - template. If you have nothing to say for a whole section, just write: None + awesome-thing.rst . Please do not delete any of the sections in this + template. If you have nothing to say for a whole section, just write: None For help with syntax, see http://sphinx-doc.org/rest.html To test out your formatting, see http://www.tele3.cz/jbar/rest/rest.html