Improve chapter "Introduction to OpenStack High Availability"

Change-Id: Ibc10eee41eda2d69ba35b717b966d957509cd723
2014-09-23 22:03:22 +02:00
parent 0afc42e955
commit 5f11d6d651
1 changed files with 118 additions and 69 deletions
--- a/doc/high-availability-guide/ch_intro.xml
+++ b/doc/high-availability-guide/ch_intro.xml
@@ -1,74 +1,123 @@
 <?xml version="1.0" encoding="UTF-8"?>
-  <chapter xmlns="http://docbook.org/ns/docbook"
+<chapter xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="ch-intro">
-
-      <title>Introduction to OpenStack High Availability</title>
-
-    <para>High Availability systems seek to minimize two things:</para>
-    <itemizedlist>
-      <listitem>
-        <para><emphasis role="strong">System downtime</emphasis> — occurs when a <emphasis>user-facing</emphasis> service is unavailable beyond a specified maximum amount of time.
-</para>
-      </listitem>
-      <listitem>
-        <para><emphasis role="strong">Data loss</emphasis> — accidental deletion or destruction of data.
-</para>
-      </listitem>
-    </itemizedlist>
-    <para>Most high availability systems guarantee protection against system downtime and data loss only in the event of a single failure. However, they are also expected to protect against cascading failures, where a single failure deteriorates into a series of consequential failures.</para>
-    <para>A crucial aspect of high availability is the elimination of single points of failure (SPOFs). A SPOF is an individual piece of equipment or software which will cause system downtime or data loss if it fails. In order to eliminate SPOFs, check that mechanisms exist for redundancy of:</para>
-    <itemizedlist>
-      <listitem>
-        <para>
-Network components, such as switches and routers
-</para>
-      </listitem>
-      <listitem>
-        <para>
-Applications and automatic service migration
-</para>
-      </listitem>
-      <listitem>
-        <para>
-Storage components
-</para>
-      </listitem>
-      <listitem>
-        <para>
-Facility services such as power, air conditioning, and fire protection
-</para>
-      </listitem>
-    </itemizedlist>
-    <para>Most high availability systems will fail in the event of multiple independent (non-consequential) failures. In this case, most systems will protect data over maintaining availability.</para>
-    <para>High-availability systems typically achieve an uptime percentage of 99.99% or more, which roughly equates to less than an hour of cumulative downtime per year. In order to achieve this, high availability systems should keep recovery times after a failure to about one to two minutes, sometimes significantly less.</para>
-    <para>OpenStack currently meets such availability requirements for its own infrastructure services, meaning that an uptime of 99.99% is feasible for the OpenStack infrastructure proper. However, OpenStack <emphasis>does</emphasis> <emphasis>not</emphasis> guarantee 99.99% availability for individual guest instances.</para>
-    <para>Preventing single points of failure can depend on whether or not a service is stateless.</para>
-    <section xml:id="stateless-vs-stateful">
-
-        <title>Stateless vs. Stateful services</title>
-
-      <para>A stateless service is one that provides a response after your request, and then requires no further attention. To make a stateless service highly available, you need to provide redundant instances and load balance them. OpenStack services that are stateless include nova-api, nova-conductor, glance-api, keystone-api, neutron-api and nova-scheduler.</para>
-      <para>A stateful service is one where subsequent requests to the service depend on the results of the first request. Stateful services are more difficult to manage because a single action typically involves more than one request, so simply providing additional instances and load balancing will not solve the problem. For example, if the Horizon user interface reset itself every time you went to a new page, it wouldn’t be very useful. OpenStack services that are stateful include the OpenStack database and message queue.</para>
-      <para>Making stateful services highly available can depend on whether you choose an active/passive or active/active configuration.</para>
-    </section>
-    <section xml:id="ap-intro">
-
-        <title>Active/Passive</title>
-
-      <para>In an active/passive configuration, systems are set up to bring additional resources online to replace those that have failed. For example, OpenStack would write to the main database while maintaining a disaster recovery database that can be brought online in the event that the main database fails.</para>
-      <para>Typically, an active/passive installation for a stateless service would maintain a redundant instance that can be brought online when required. Requests may be handled using a virtual IP address to facilitate return to service with minimal reconfiguration required.</para>
-      <para>A typical active/passive installation for a stateful service maintains a replacement resource that can be brought online when required. A separate application (such as Pacemaker or Corosync) monitors these services, bringing the backup online as necessary.</para>
-    </section>
-    <section xml:id="aa-intro">
-
-        <title>Active/Active</title>
-
-      <para>In an active/active configuration, systems also use a backup but will manage both the main and redundant systems concurrently. This way, if there is a failure the user is unlikely to notice. The backup system is already online, and takes on increased load while the main system is fixed and brought back online.</para>
-      <para>Typically, an active/active installation for a stateless service would maintain a redundant instance, and requests are load balanced using a virtual IP address and a load balancer such as HAProxy.</para>
-      <para>A typical active/active installation for a stateful service would include redundant services with all instances having an identical state. For example, updates to one instance of a database would also update all other instances. This way a request to one instance is the same as a request to any other. A load balancer manages the traffic to these systems, ensuring that operational systems always handle the request.</para>
-      <para>These are some of the more common ways to implement these high availability architectures, but they are by no means the only ways to do it. The important thing is to make sure that your services are redundant, and available; how you achieve that is up to you. This document will cover some of the more common options for highly available systems.</para>
-    </section>
-  </chapter>
+  <title>Introduction to OpenStack High Availability</title>
+  <para>High Availability systems seek to minimize two things:</para>
+  <variablelist>
+    <varlistentry>
+      <term>System downtime</term>
+      <listitem><para>Occurs when a user-facing service is unavailable
+        beyond a specified maximum amount of time.</para></listitem>
+    </varlistentry>
+    <varlistentry>
+      <term>Data loss</term>
+      <listitem><para>Accidental deletion or destruction of
+        data.</para></listitem>
+    </varlistentry>
+  </variablelist>
+  <para>Most high availability systems guarantee protection against system
+    downtime and data loss only in the event of a single failure. However,
+    they are also expected to protect against cascading failures, where a
+    single failure deteriorates into a series of consequential failures.</para>
+  <para>A crucial aspect of high availability is the elimination of single
+    points of failure (SPOFs). A SPOF is an individual piece of equipment or
+    software which will cause system downtime or data loss if it fails.
+    In order to eliminate SPOFs, check that mechanisms exist for redundancy
+    of:</para>
+  <itemizedlist>
+    <listitem>
+      <para>Network components, such as switches and routers</para>
+    </listitem>
+    <listitem>
+      <para>Applications and automatic service migration</para>
+    </listitem>
+    <listitem>
+      <para>Storage components</para>
+    </listitem>
+    <listitem>
+      <para>Facility services such as power, air conditioning, and fire
+        protection</para>
+    </listitem>
+  </itemizedlist>
+  <para>Most high availability systems will fail in the event of multiple
+    independent (non-consequential) failures. In this case, most systems will
+    protect data over maintaining availability.</para>
+  <para>High-availability systems typically achieve an uptime percentage of
+    99.99% or more, which roughly equates to less than an hour of cumulative
+    downtime per year. In order to achieve this, high availability systems
+    should keep recovery times after a failure to about one to two minutes,
+    sometimes significantly less.</para>
+  <para>OpenStack currently meets such availability requirements for its own
+    infrastructure services, meaning that an uptime of 99.99% is feasible for
+    the OpenStack infrastructure proper. However, OpenStack does not guarantee
+    99.99% availability for individual guest instances.</para>
+  <para>Preventing single points of failure can depend on whether or not a
+    service is stateless.</para>
+  <section xml:id="stateless-vs-stateful">
+    <title>Stateless vs. Stateful services</title>
+    <para>A stateless service is one that provides a response after your
+      request, and then requires no further attention. To make a stateless
+      service highly available, you need to provide redundant instances and
+      load balance them. OpenStack services that are stateless include
+      <systemitem class="service">nova-api</systemitem>,
+      <systemitem class="service">nova-conductor</systemitem>,
+      <systemitem class="service">glance-api</systemitem>,
+      <systemitem class="service">keystone-api</systemitem>,
+      <systemitem class="service">neutron-api</systemitem> and
+      <systemitem class="service">nova-scheduler</systemitem>.</para>
+    <para>A stateful service is one where subsequent requests to the service
+      depend on the results of the first request. Stateful services are more
+      difficult to manage because a single action typically involves more than
+      one request, so simply providing additional instances and load balancing
+      will not solve the problem. For example, if the Horizon user interface
+      reset itself every time you went to a new page, it wouldn't be very
+      useful. OpenStack services that are stateful include the OpenStack
+      database and message queue.</para>
+    <para>Making stateful services highly available can depend on whether you
+      choose an active/passive or active/active configuration.</para>
+  </section>
+  <section xml:id="ap-intro">
+    <title>Active/Passive</title>
+    <para>In an active/passive configuration, systems are set up to bring
+      additional resources online to replace those that have failed. For
+      example, OpenStack would write to the main database while maintaining a
+      disaster recovery database that can be brought online in the event that
+      the main database fails.</para>
+    <para>Typically, an active/passive installation for a stateless service
+      would maintain a redundant instance that can be brought online when
+      required. Requests may be handled using a virtual IP address to
+      facilitate return to service with minimal reconfiguration
+      required.</para>
+    <para>A typical active/passive installation for a stateful service
+      maintains a replacement resource that can be brought online when
+      required. A separate application (such as Pacemaker or Corosync) monitors
+      these services, bringing the backup online as necessary.</para>
+  </section>
+  <section xml:id="aa-intro">
+    <title>Active/Active</title>
+    <para>In an active/active configuration, systems also use a backup but will
+      manage both the main and redundant systems concurrently. This way, if
+      there is a failure the user is unlikely to notice. The backup system is
+      already online, and takes on increased load while the main system is
+      fixed and brought back online.</para>
+    <para>Typically, an active/active installation for a stateless service
+      would maintain a redundant instance, and requests are load balanced using
+      a virtual IP address and a load balancer such as HAProxy.</para>
+    <para>A typical active/active installation for a stateful service would
+      include redundant services with all instances having an identical state.
+      For example, updates to one instance of a database would also update all
+      other instances. This way a request to one instance is the same as a
+      request to any other. A load balancer manages the traffic to these
+      systems, ensuring that operational systems always handle the
+      request.</para>
+    <para>These are some of the more common ways to implement these high
+      availability architectures, but they are by no means the only ways to do
+      it. The important thing is to make sure that your services are redundant,
+      and available; how you achieve that is up to you. This document will
+      cover some of the more common options for highly available
+      systems.</para>
+  </section>
+</chapter>