==============================
OPENSTACK DIAGNOSTICS PROPOSAL
==============================

.. contents::

Project Name
============

**Official:** OpenStack Diagnostics

**Codename:** Rubick

OVERVIEW
========

The typical OpenStack cloud life cycle consists of two phases:

- initial deployment and
- operation maintenance

OpenStack cloud operators usually rely on deployment tools to configure all the
platform components correctly and efficiently in the **initial deployment**
phase. Multiple OpenStack projects cover that area: TripleO/Tuskar, Fuel and
DevStack, to name a few.

However, once the cloud is installed and running, platform configurations and
operational conditions begin to change. These changes can break the
consistency and integration of cloud platform components. Keeping the cloud up
and running is the essence of the **operation maintenance** phase.

A cloud operator must quickly and efficiently identify and respond to the root
cause of such failures. To do so, the operator must check that the OpenStack
configuration is sane and consistent. These checks can be thought of as the
rules of a diagnostic system.

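To make the idea of checks as rules more concrete, here is a minimal sketch in
Python. It is not part of this proposal's design: the ``Rule`` class, the
``run_rules`` and ``same_value`` helpers, and the dictionary layout of the
collected configuration are all hypothetical names chosen for illustration, and
``rabbit_host`` serves only as a sample option.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical in-memory view of collected settings:
# component name -> {option name: value}.
Config = Dict[str, Dict[str, str]]


@dataclass
class Rule:
    """A named predicate over the collected configuration (illustrative only)."""
    name: str
    check: Callable[[Config], bool]


def same_value(option: str, components: List[str]) -> Callable[[Config], bool]:
    """Build a check that passes when `option` is set identically everywhere."""
    def check(config: Config) -> bool:
        values = {config.get(c, {}).get(option) for c in components}
        # Exactly one distinct value, and no component is missing the option.
        return len(values) == 1 and None not in values
    return check


def run_rules(config: Config, rules: List[Rule]) -> List[str]:
    """Return the names of all rules that the configuration violates."""
    return [rule.name for rule in rules if not rule.check(config)]


# Sample ruleset; the option and component names are assumptions.
RULES = [
    Rule("rabbit_host is identical in nova and cinder",
         same_value("rabbit_host", ["nova", "cinder"])),
]

if __name__ == "__main__":
    collected = {
        "nova": {"rabbit_host": "10.0.0.1"},
        "cinder": {"rabbit_host": "10.0.0.2"},
    }
    for failed in run_rules(collected, RULES):
        print("FAILED:", failed)
```

In this sketch a rule is just a named predicate, so an operator could add new
checks without changing the code that runs them.
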
Few projects in the OpenStack ecosystem aim to increase the reliability and
resilience of the cloud at the operation stage. With this proposal we want to
introduce a project which will help operators diagnose their OpenStack
platform, reduce response time to known and unknown failures and effectively
support the desired SLA.

Mission
-------

Diagnostics' mission is to **provide OpenStack cloud operators with tools that
minimize the time and effort needed to identify and fix errors in the operation
maintenance phase of the cloud life cycle.**

User Stories
------------

- As a **cloud operator**, I want to make sure that my OpenStack architecture
  and configuration are sane and consistent across all platform components and
  services.
- As a **cloud architect**, I want to make sure that my OpenStack architecture
  and configuration are compliant with best practices.
- As a **cloud architect**, I need a knowledge base of sanity checks and best
  practices for troubleshooting my OpenStack cloud, which I can reuse and
  update with my own checks and rules.
- As a **cloud operator**, I want to be able to automatically extract
  configuration parameters from all OpenStack components to verify their
  correctness, consistency and integrity.
- As a **cloud operator**, I want an automatic diagnostics tool which can
  inspect the configuration of my OpenStack cloud and report whether it is sane
  and/or compliant with community-defined best practices.
- As a **cloud operator**, I want to be able to define rules used to inspect
  and verify the configuration of OpenStack components and store them for
  verifying future configuration changes.

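The extraction story above could start from something as simple as parsing each
component's INI-style configuration file into a dictionary. The sketch below
assumes the file contents have already been fetched from the node (for example
over SSH); ``parse_config`` and the sample text are hypothetical, and only the
Python standard library is used.

```python
import configparser
from typing import Dict


def parse_config(text: str) -> Dict[str, Dict[str, str]]:
    """Parse INI-style text into {section: {option: value}}.

    The default section name is remapped to a placeholder so that
    [DEFAULT] is kept as an ordinary section instead of being merged
    into every other section.
    """
    parser = configparser.ConfigParser(
        default_section="__no_default__",  # placeholder name, an assumption
        interpolation=None,                # keep '%' characters literal
        delimiters=("=",),                 # do not split values on ':'
    )
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


# Made-up sample content, used only to exercise the parser.
SAMPLE = """\
[DEFAULT]
verbose = True

[database]
connection = sqlite:///nova.db
"""

if __name__ == "__main__":
    for section, options in parse_config(SAMPLE).items():
        print(section, options)
```

The resulting dictionaries, one per component, are the kind of input that
verification rules could then be run against.
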
Roadmap
-------

Proof of concept (PoC) implementation: end of October 2013. The PoC
implementation includes:

#. Open source code in a StackForge repository
#. Standalone service with REST API v0.1
#. Simple SSH-based configuration data extraction
#. Rules engine with grammatical analysis
#. Basic healthcheck ruleset v0.1 with example rules of different types
#. Filesystem-based ruleset store

The PoC scope does not include:

#. Basic integration with OpenStack Deployment program projects (Tuskar,
   TripleO)
#. Extraction of configuration data from Heat metadata
#. Extended ruleset with example best practices
#. Healthcheck ruleset v1.0
#. Ruleset store back-ends

Assumptions
-----------

We assume that we must reuse as much as possible from the OpenStack Deployment
program in terms of platform configuration and architecture definitions (i.e.
TripleO Heat templates and configuration file templates).

DESIGN
======

.. include:: service_architecture.rst

.. include:: rules_engine.rst

.. include:: openstack_integration.rst

.. include:: openstack_architecture_model.rst