==============================
OPENSTACK DIAGNOSTICS PROPOSAL
==============================

.. contents::

Project Name
============

**Official:** OpenStack Diagnostics

**Codename:** Rubick

OVERVIEW
========

The typical OpenStack cloud life cycle consists of two phases:

- initial deployment
- operation maintenance

OpenStack cloud operators usually rely on deployment tools to configure all
platform components correctly and efficiently in the **initial deployment**
phase. Multiple OpenStack projects cover that area: TripleO/Tuskar, Fuel and
Devstack, to name a few.

However, once the cloud is installed and running, platform configuration and
operational conditions begin to change. These changes can break the
consistency and integration of cloud platform components. Keeping the cloud up
and running is the essence of the **operation maintenance** phase.

A cloud operator must quickly and efficiently identify the root cause of such
failures and respond to it. To do so, the operator must check whether the
OpenStack configuration is sane and consistent. These checks can be thought of
as the rules of a diagnostic system.

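The idea of checks as rules can be illustrated with a minimal Python sketch.
It assumes that configuration has already been collected into a nested
dictionary keyed by service, section and option. The ``Rule`` class, the
``check_rules`` and ``same_value`` helpers and the example option values are
hypothetical: they illustrate the concept and are not Rubick's rule format.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, Dict, List, Tuple

   # Hypothetical configuration model: service -> section -> option -> value.
   Config = Dict[str, Dict[str, Dict[str, str]]]


   @dataclass
   class Rule:
       """A diagnostic rule: a name, a description and a predicate."""
       name: str
       description: str
       check: Callable[[Config], bool]


   def check_rules(config: Config, rules: List[Rule]) -> List[str]:
       """Return the names of all rules whose predicate fails."""
       return [rule.name for rule in rules if not rule.check(config)]


   def same_value(refs: List[Tuple[str, str, str]]) -> Callable[[Config], bool]:
       """Build a predicate: all referenced options exist and are equal."""
       def predicate(config: Config) -> bool:
           values = {
               config.get(service, {}).get(section, {}).get(option)
               for service, section, option in refs
           }
           # A missing option shows up as None and fails the check.
           return len(values) == 1 and None not in values
       return predicate


   # Illustrative rule: Nova and Cinder must use the same message broker host.
   rules = [
       Rule(
           name="broker-host-consistent",
           description="nova and cinder point at the same RabbitMQ host",
           check=same_value([
               ("nova", "DEFAULT", "rabbit_host"),
               ("cinder", "DEFAULT", "rabbit_host"),
           ]),
       ),
   ]

   config = {
       "nova": {"DEFAULT": {"rabbit_host": "10.0.0.5"}},
       "cinder": {"DEFAULT": {"rabbit_host": "10.0.0.6"}},
   }

   print(check_rules(config, rules))  # ['broker-host-consistent']

Keeping each rule as data plus a predicate means new checks can be added
without changing the code that evaluates them.
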
Few projects in the OpenStack ecosystem aim to increase the reliability and
resilience of the cloud at the operation stage. With this proposal we want to
introduce a project that will help operators diagnose their OpenStack platform,
reduce response time to known and unknown failures, and effectively support the
desired SLA.

Mission
-------

Diagnostics' mission is to **provide OpenStack cloud operators with tools that
minimize the time and effort needed to identify and fix errors in the operation
maintenance phase of the cloud life cycle.**

User Stories
------------

- As a **cloud operator**, I want to make sure that my OpenStack architecture
  and configuration are sane and consistent across all platform components and
  services.

- As a **cloud architect**, I want to make sure that my OpenStack architecture
  and configuration are compliant with best practices.

- As a **cloud architect**, I need a knowledge base of sanity checks and best
  practices for troubleshooting my OpenStack cloud, which I can reuse and
  update with my own checks and rules.

- As a **cloud operator**, I want to be able to automatically extract
  configuration parameters from all OpenStack components to verify their
  correctness, consistency and integrity.

- As a **cloud operator**, I want an automatic diagnostics tool that can
  inspect the configuration of my OpenStack cloud and report whether it is
  sane and/or compliant with community-defined best practices.

- As a **cloud operator**, I want to be able to define rules used to inspect
  and verify the configuration of OpenStack components, and store them for
  verifying future configuration changes.

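Extracting configuration parameters, as described in the stories above, could
start from the INI-style files that most OpenStack services use. The sketch
below relies only on Python's standard ``configparser`` module; the flat
``(service, section, option)`` key layout, the helper names and the sample
file contents are assumptions made for illustration.

.. code-block:: python

   import configparser
   from typing import Dict, Tuple

   # Hypothetical flat model: (service, section, option) -> raw string value.
   Key = Tuple[str, str, str]


   def extract_params(service: str, ini_text: str) -> Dict[Key, str]:
       """Parse one INI-style configuration file into flat parameters."""
       parser = configparser.ConfigParser(
           interpolation=None,          # values may contain a literal '%'
           strict=False,                # tolerate repeated options; last one wins
           default_section="__none__",  # keep [DEFAULT] as an ordinary section
       )
       parser.read_string(ini_text)
       return {
           (service, section, option): value
           for section in parser.sections()
           for option, value in parser.items(section)
       }


   def inconsistent(params: Dict[Key, str], section: str, option: str) -> Dict[str, str]:
       """Return {service: value} if services disagree on an option, else {}."""
       values = {
           svc: val
           for (svc, sec, opt), val in params.items()
           if sec == section and opt == option
       }
       return values if len(set(values.values())) > 1 else {}


   # Illustrative file contents; real files would be fetched from the nodes.
   nova_conf = """
   [DEFAULT]
   rabbit_host = 10.0.0.5

   [keystone_authtoken]
   auth_host = 10.0.0.2
   """

   glance_conf = """
   [keystone_authtoken]
   auth_host = 10.0.0.3
   """

   params = {}
   params.update(extract_params("nova", nova_conf))
   params.update(extract_params("glance", glance_conf))
   print(inconsistent(params, "keystone_authtoken", "auth_host"))
   # {'nova': '10.0.0.2', 'glance': '10.0.0.3'}

Note that ``configparser`` lower-cases option names by default, which matches
the lowercase option names OpenStack services use.
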
Roadmap
-------

Proof of concept (PoC) implementation: end of October 2013. The PoC
implementation includes:

#. Open source code in a stackforge repository
#. Standalone service with REST API v0.1
#. Simple SSH-based configuration data extraction
#. Rules engine with grammatic analysis
#. Basic healthcheck ruleset v0.1 with example rules of different types
#. Filesystem-based ruleset store

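A filesystem-based ruleset store can be as simple as one file per rule in a
directory. The sketch below assumes a hypothetical JSON schema (``name``,
``service``, ``section``, ``option``, ``expected``) and compares each rule
against a flat ``(service, section, option)`` parameter map; neither the
schema nor the function names come from the project itself.

.. code-block:: python

   import json
   import tempfile
   from pathlib import Path
   from typing import Dict, List, Tuple

   # Hypothetical rule schema: every rule is one JSON document with these keys.
   REQUIRED_FIELDS = ("name", "service", "section", "option", "expected")


   def save_rule(store: Path, rule: Dict[str, str]) -> Path:
       """Validate a rule and write it to <store>/<name>.json."""
       missing = [field for field in REQUIRED_FIELDS if field not in rule]
       if missing:
           raise ValueError(f"rule is missing fields: {missing}")
       store.mkdir(parents=True, exist_ok=True)
       path = store / f"{rule['name']}.json"
       path.write_text(json.dumps(rule, indent=2, sort_keys=True))
       return path


   def load_ruleset(store: Path) -> List[Dict[str, str]]:
       """Load every *.json rule from the store, sorted by file name."""
       return [json.loads(p.read_text()) for p in sorted(store.glob("*.json"))]


   def evaluate(rules: List[Dict[str, str]],
                params: Dict[Tuple[str, str, str], str]) -> List[str]:
       """Return names of rules whose expected value differs from the actual one."""
       return [
           rule["name"]
           for rule in rules
           if params.get((rule["service"], rule["section"], rule["option"]))
           != rule["expected"]
       ]


   # Usage: a throwaway store directory holding one illustrative rule.
   with tempfile.TemporaryDirectory() as tmp:
       store = Path(tmp) / "healthcheck"
       save_rule(store, {"name": "nova-debug-off", "service": "nova",
                         "section": "DEFAULT", "option": "debug",
                         "expected": "False"})
       rules = load_ruleset(store)
       print(evaluate(rules, {("nova", "DEFAULT", "debug"): "True"}))
       # ['nova-debug-off']

Storing one rule per file keeps rules easy to review, version and share, which
fits the goal of a reusable knowledge base of checks.
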
The PoC scope does not include:

#. Basic integration with OpenStack Deployment program projects (Tuskar,
   TripleO)
#. Extraction of configuration data from Heat metadata
#. Extended ruleset with example best practices
#. Healthcheck ruleset v1.0
#. Ruleset store back-ends

Assumptions
-----------

We assume that we must reuse as much as possible from the OpenStack Deployment
program in terms of platform configuration and architecture definitions (i.e.,
TripleO Heat templates and configuration file templates).

DESIGN
======

.. include:: service_architecture.rst

.. include:: rules_engine.rst

.. include:: openstack_integration.rst

.. include:: openstack_architecture_model.rst