Nicholas Chase d6af412fa7 Break HA Guide into chapters and sections
This patch breaks the monolithic bk-ha-guide.xml file into chapters
and sections. Section files are placed in subdirectories, with the
subdirectories named after the chapters (and parts) to which they
belong.

This patch just does structural fixes. Once it's in, we can begin to
do content cleanup in manageable chunks.

Change-Id: I27397834141a3e6c305f60e71350ce869ab7c8a1
Implements: blueprint convert-ha-guide-to-docbook
2014-05-31 20:55:29 -04:00

<chapter xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude"
version="5.0" xml:id="ch-intro">
<info>
<title>Introduction to OpenStack High Availability</title>
</info>
<simpara>High Availability systems seek to minimize two things:</simpara>
<itemizedlist>
<listitem>
<simpara><emphasis role="strong">System downtime</emphasis>: occurs when a <emphasis>user-facing</emphasis> service is unavailable beyond a specified maximum amount of time.
</simpara>
</listitem>
<listitem>
<simpara><emphasis role="strong">Data loss</emphasis>: the accidental deletion or destruction of data.
</simpara>
</listitem>
</itemizedlist>
<simpara>Most high availability systems guarantee protection against system downtime and data loss only in the event of a single failure. However, they are also expected to protect against cascading failures, where a single failure deteriorates into a series of consequential failures.</simpara>
<simpara>A crucial aspect of high availability is the elimination of single points of failure (SPOFs). A SPOF is an individual piece of equipment or software that causes system downtime or data loss if it fails. To eliminate SPOFs, check that mechanisms exist for redundancy of:</simpara>
<itemizedlist>
<listitem>
<simpara>
Network components, such as switches and routers
</simpara>
</listitem>
<listitem>
<simpara>
Applications and automatic service migration
</simpara>
</listitem>
<listitem>
<simpara>
Storage components
</simpara>
</listitem>
<listitem>
<simpara>
Facility services such as power, air conditioning, and fire protection
</simpara>
</listitem>
</itemizedlist>
<simpara>Most high availability systems will fail in the event of multiple independent (non-consequential) failures. In that case, most systems prioritize protecting data over maintaining availability.</simpara>
<simpara>High availability systems typically achieve uptime of 99.99% or more, which equates to less than an hour (roughly 53 minutes) of cumulative downtime per year. To achieve this, they should keep recovery times after a failure to about one to two minutes, and sometimes significantly less.</simpara>
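<simpara>The relationship between an uptime percentage and a yearly downtime budget is simple arithmetic. The following Python sketch is illustrative only: the function name <literal>downtime_minutes_per_year</literal> is hypothetical, and the calculation assumes a 365-day year.</simpara>
<programlisting language="python"><![CDATA[
```python
# Minutes in a non-leap year (an assumption of this sketch).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the cumulative downtime per year allowed by an uptime target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Print the downtime budget for a few common availability targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per year")
```
]]></programlisting>
<simpara>At the 99.99% target, the budget is about 52.6 minutes per year, so a handful of failures that each take one to two minutes to recover from still fit within it.</simpara>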
<simpara>OpenStack currently meets such availability requirements for its own infrastructure services, meaning that an uptime of 99.99% is feasible for the OpenStack infrastructure proper. However, OpenStack <emphasis>does not</emphasis> guarantee 99.99% availability for individual guest instances.</simpara>
<simpara>Preventing single points of failure can depend on whether or not a service is stateless.</simpara>
<section xml:id="stateless-vs-stateful">
<info>
<title>Stateless vs. Stateful services</title>
</info>
<simpara>A stateless service is one that provides a response after your request, and then requires no further attention. To make a stateless service highly available, you need to provide redundant instances and load balance them. OpenStack services that are stateless include nova-api, nova-conductor, glance-api, keystone-api, neutron-api and nova-scheduler.</simpara>
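<simpara>As a rough illustration of why redundancy plus load balancing is enough for a stateless service, the following Python sketch rotates requests across interchangeable instances. The names <literal>RoundRobinBalancer</literal> and <literal>make_api_instance</literal> are hypothetical and do not correspond to any OpenStack or HAProxy API. A real load balancer would also perform health checks.</simpara>
<programlisting language="python"><![CDATA[
```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each request to the next backend in a fixed rotation."""

    def __init__(self, backends):
        self._rotation = cycle(list(backends))

    def route(self, request):
        backend = next(self._rotation)
        return backend(request)


def make_api_instance(name):
    """Build a stateless handler: its answer depends only on the request."""
    def handle(request):
        return f"{name} handled {request}"
    return handle


balancer = RoundRobinBalancer(
    [make_api_instance("api-1"), make_api_instance("api-2")]
)
# Any instance can serve any request because no state carries over.
responses = [balancer.route(f"request-{i}") for i in range(4)]
```
]]></programlisting>
<simpara>Because no handler remembers earlier requests, it does not matter which instance receives a given request. That property is what lets additional instances be added or removed freely.</simpara>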
<simpara>A stateful service is one where subsequent requests to the service depend on the results of the first request. Stateful services are more difficult to manage because a single action typically involves more than one request, so simply providing additional instances and load balancing will not solve the problem. For example, if the Horizon user interface reset itself every time you went to a new page, it wouldn't be very useful. OpenStack services that are stateful include the OpenStack database and message queue.</simpara>
<simpara>Making stateful services highly available can depend on whether you choose an active/passive or active/active configuration.</simpara>
</section>
<section xml:id="ap-intro">
<info>
<title>Active/Passive</title>
</info>
<simpara>In an active/passive configuration, systems are set up to bring additional resources online to replace those that have failed. For example, OpenStack would write to the main database while maintaining a disaster recovery database that can be brought online in the event that the main database fails.</simpara>
<simpara>Typically, an active/passive installation for a stateless service would maintain a redundant instance that can be brought online when required. Requests are load balanced using a virtual IP address and a load balancer such as HAProxy.</simpara>
<simpara>A typical active/passive installation for a stateful service maintains a replacement resource that can be brought online when required. A separate application (such as Pacemaker or Corosync) monitors these services, bringing the backup online as necessary.</simpara>
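<simpara>The monitoring and promotion step can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how Pacemaker or Corosync are implemented: the <literal>Node</literal> and <literal>ActivePassivePair</literal> classes are hypothetical, and health is reduced to a single boolean flag.</simpara>
<programlisting language="python"><![CDATA[
```python
class Node:
    """A resource that is either healthy or failed (a deliberate simplification)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


class ActivePassivePair:
    """Track which node is active and promote the standby on failure."""

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby

    def check_and_failover(self):
        """Swap roles if the active node has failed and the standby is healthy."""
        if not self.active.healthy and self.standby.healthy:
            self.active, self.standby = self.standby, self.active
        return self.active


pair = ActivePassivePair(Node("db-primary"), Node("db-standby"))
pair.active.healthy = False               # simulate a failure of the main database
promoted = pair.check_and_failover()      # the monitor brings the backup online
```
]]></programlisting>
<simpara>In this model the standby does no work until it is promoted, which is the defining trait of an active/passive setup.</simpara>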
</section>
<section xml:id="aa-intro">
<info>
<title>Active/Active</title>
</info>
<simpara>In an active/active configuration, systems also use a backup but manage both the main and redundant systems concurrently. This way, if there is a failure, the user is unlikely to notice. The backup system is already online and takes on increased load while the main system is fixed and brought back online.</simpara>
<simpara>Typically, an active/active installation for a stateless service would maintain a redundant instance, and requests are load balanced using a virtual IP address and a load balancer such as HAProxy.</simpara>
<simpara>A typical active/active installation for a stateful service would include redundant services with all instances having an identical state. For example, updates to one instance of a database would also update all other instances. This way a request to one instance is the same as a request to any other. A load balancer manages the traffic to these systems, ensuring that operational systems always handle the request.</simpara>
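<simpara>The idea that every instance holds identical state can be illustrated with a small Python model. The <literal>ReplicatedStore</literal> class is hypothetical and assumes synchronous replication to every online replica. It omits the resynchronization that a real database cluster needs before a failed node rejoins.</simpara>
<programlisting language="python"><![CDATA[
```python
class ReplicatedStore:
    """Toy active/active store: each write goes to every online replica."""

    def __init__(self, replica_count):
        self.replicas = [dict() for _ in range(replica_count)]
        self.online = [True] * replica_count

    def fail(self, index):
        """Mark one replica as failed so it no longer serves traffic."""
        self.online[index] = False

    def write(self, key, value):
        # Apply the update to all online replicas so their state stays identical.
        for replica, up in zip(self.replicas, self.online):
            if up:
                replica[key] = value

    def read(self, key):
        # Serve the read from the first online replica, as a load balancer
        # would route only to operational systems.
        for replica, up in zip(self.replicas, self.online):
            if up:
                return replica[key]
        raise RuntimeError("no replica is online")


store = ReplicatedStore(3)
store.write("instance-42", "ACTIVE")
store.fail(0)                      # one replica fails after the write
state = store.read("instance-42")  # the remaining replicas still answer
```
]]></programlisting>
<simpara>Because the write reached every replica before the failure, the read returns the same answer no matter which surviving replica handles it.</simpara>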
<simpara>These are some of the more common ways to implement these high availability architectures, but they are by no means the only ways to do so. The important thing is to make sure that your services are redundant and available; how you achieve that is up to you. This document covers some of the more common options for highly available systems.</simpara>
</section>
</chapter>