add container-healthcheck spec

Adds a spec in support of
https://blueprints.launchpad.net/tripleo/+spec/container-healthchecks

Change-Id: If3250bc8a3bf37ae78c13798af4aa4bbbe888c97
This commit is contained in:
Lars Kellogg-Stedman 2017-05-17 11:27:32 -04:00
parent 55a1f2a4ba
commit 8c02482736
1 changed files with 148 additions and 0 deletions

View File

@ -0,0 +1,148 @@
===========================================
Container Healthchecks for TripleO Services
===========================================
https://blueprints.launchpad.net/tripleo/+spec/container-healthchecks
An OpenStack deployment involves many services spread across many
hosts. It is important that we provide tooling and APIs that make it
as easy as possible to monitor this large, distributed environment.
The move to containerized services in the overcloud [1]
brings with it many opportunities, such as the ability to bundle
services with their associated health checks and provide a standard
API for assessing the health of the service.
[1]: https://blueprints.launchpad.net/tripleo/+spec/containerize-tripleo
Problem Description
===================
The people who are in the best position to develop appropriate health
checks for a service are generally those people responsible for
developing the service. Unfortunately, the task of setting up
monitoring generally ends up in the hands of cloud operators or some
intermediary.
I propose that we take advantage of the bundling offered by
containerized services and create a standard API with which an
operator can assess the health of a service. This makes life easier
for the operator, who can now provide granular service monitoring
without requiring detailed knowledge about every service, and it
allows service developers to ensure that services are monitored
appropriately.
Proposed Change
===============
Overview
--------
The Docker engine (since version 1.12), as well as most higher-level
orchestration frameworks, provide a standard mechanism for validating
the health of a container. Docker itself provides the
HEALTHCHECK_ directive, while Kubernetes has explicit
support for `liveness and readiness probes`_. Both
mechanisms work by executing a defined command inside the container,
and using the result of that executing to determine whether or not the
container is "healthy".
.. _liveness and readiness probes: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
.. _healthcheck: https://docs.docker.com/engine/reference/builder/#healthcheck
I propose that we explicitly support these interfaces in containerized
TripleO services through the following means:
1. Include in every container a `/openstack/healthcheck` command that
will check the health of the containerized service, exit with
status ``0`` if the service is healthy or ``1`` if not, and provide
a message on ``stdout`` describing the nature of the error.
2. Include in every Docker image an appropriate ``HEALTHCHECK``
directive to utilize the script::
HEALTHCHECK CMD /openstack/healthcheck
3. If Kubernetes becomes a standard part of the TripleO deployment
process, we may be able to implement liveness or readiness probes
using the same script::
livenessProbe:
exec:
command:
- /openstack/healthcheck
Alternatives
------------
The alternative is the status quo: services do not provide a standard
healthcheck API, and service monitoring must be configured
individually by cloud operators.
Security Impact
---------------
N/A
Other End User Impact
---------------------
Users can explicitly run the healthcheck script to immediately assess
the state of a service.
Performance Impact
------------------
This proposal will result in the periodic execution of tasks on the
overcloud hosts. When designing health checks, service developers
should select appropriate check intervals such that there is minimal
operational overhead from the health checks.
Other Deployer Impact
---------------------
N/A
Developer Impact
----------------
Developers will need to determine how best to assess the health of a
service and provide the appropriate script to perform this check.
Implementation
==============
Assignee(s)
-----------
N/A
Work Items
----------
N/A
Dependencies
============
- This requires that we implement `containerize-tripleo-overcloud`_
blueprint.
.. _containerize-tripleo-overcloud: https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/containerize-tripleo-overcloud.html
Testing
=======
TripleO CI jobs should be updated to utilize the healthcheck API to
determine if services are running correctly.
Documentation Impact
====================
Any documentation describing the process of containerizing a service
for TripleoO must be updated to describe the healthcheck API.
References
==========
N/A