openstack-manuals/doc/arch-design/multi_site/section_user_requirements_multi_site.xml
darrenchan 68e8c66e79 Multi-site chapter edits
1. Edits to the multi-site chapter
2. Removed duplicated legal content which was added to a common section. See https://review.openstack.org/#/c/212299/

Change-Id: I10e3a04650548454c73024d87cbbb6fda63454e8
Implements: blueprint arch-guide
2015-08-18 16:00:02 +10:00


<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="user-requirements-multi-site">
<?dbhtml stop-chunking?>
<title>User requirements</title>
<section xml:id="workload-characteristics">
<title>Workload characteristics</title>
<para>An understanding of the expected workloads for a desired
multi-site environment and use case is an important factor in
the decision-making process. In this context, <literal>workload</literal>
refers to the way the systems are used. A workload could be a
single application or a suite of applications that work together.
It could also be a duplicate set of applications that need to
run in multiple cloud environments. Often in a multi-site deployment,
the same workload will need to work identically in more than one
physical location.</para>
<para>This multi-site scenario likely includes one or more of the
other scenarios in this book, with the additional requirement
of having the workloads in two or more locations. A common
driver for this requirement is the location of the users.</para>
<para>For many use cases, the proximity of users to their
workloads directly influences application performance and
should therefore be taken into consideration
in the design. Certain applications require minimal
latency that can only be achieved by deploying the cloud in
multiple locations. These locations could be in different data
centers, cities, countries, or geographical regions, depending
on the user requirements and the location of the users.</para></section>
<section xml:id="consistency-images-templates-across-sites">
<title>Consistency of images and templates across different
sites</title>
<para>It is essential that the deployment of instances is
consistent across the different sites and that this consistency is
built into the infrastructure. If OpenStack Object Storage is used as
a back end for the Image service, it is possible to create repositories
of consistent images across multiple sites. Having central
endpoints with multiple storage nodes allows consistent centralized
storage for every site.</para>
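<para>As an illustration of what a consistent image library means in
practice, the following minimal sketch compares per-site image catalogs
and reports images that are missing from a site or whose checksum
differs between sites. The function name
<literal>find_image_drift</literal>, the site names, and the catalog
layout (image name mapped to checksum) are assumptions made for this
example; they are not part of any OpenStack API, and how the catalogs
are collected is left open.</para>
<programlisting language="python"><![CDATA[
```python
from typing import Dict, Set

# Hypothetical input format: for each site, a mapping of
# image name -> checksum string.
Catalog = Dict[str, str]


def find_image_drift(catalogs: Dict[str, Catalog]) -> Dict[str, Dict[str, Set[str]]]:
    """Report images that are not identical on every site.

    Returns {image_name: {"missing": sites lacking the image,
                          "checksums": distinct checksums seen}}
    for each image that is missing somewhere or has differing checksums.
    """
    all_images: Set[str] = set()
    for catalog in catalogs.values():
        all_images.update(catalog)

    drift: Dict[str, Dict[str, Set[str]]] = {}
    for image in sorted(all_images):
        missing = {site for site, cat in catalogs.items() if image not in cat}
        checksums = {cat[image] for cat in catalogs.values() if image in cat}
        if missing or len(checksums) > 1:
            drift[image] = {"missing": missing, "checksums": checksums}
    return drift


if __name__ == "__main__":
    # Made-up example data for three hypothetical sites.
    example = {
        "site-a": {"ubuntu-14.04": "abc123", "centos-7": "def456"},
        "site-b": {"ubuntu-14.04": "abc123", "centos-7": "000000"},
        "site-c": {"ubuntu-14.04": "abc123"},
    }
    for name, detail in find_image_drift(example).items():
        print(name, sorted(detail["missing"]), sorted(detail["checksums"]))
```
]]></programlisting>
<para>An empty result indicates that every site holds the same set of
images with matching checksums.</para>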
<para>Not using a centralized object store increases the operational
overhead of maintaining a consistent image library. This
could include development of a replication mechanism to handle
the transport of images and the changes to the images across
multiple sites.</para></section>
<section xml:id="high-availability-multi-site">
<title>High availability</title>
<para>If high availability is required to provide continuous
infrastructure operations, define the baseline level of high
availability that the deployment must meet.</para>
<para>The OpenStack management components need to have a basic and
minimal level of redundancy. The simplest example is that the loss
of any single site should have minimal impact on the
availability of the OpenStack services.</para>
<para>The <link
xlink:href="http://docs.openstack.org/high-availability-guide/content/"><citetitle>OpenStack
High Availability Guide</citetitle></link>
contains more information on how to provide redundancy for the
OpenStack components.</para>
<para>Multiple network links should be deployed between sites to
provide redundancy for all components. This includes storage
replication, which should be isolated to a dedicated network
or VLAN with the ability to assign QoS to control the
replication traffic or provide priority for this traffic. Note
that if the stored data changes frequently, the network
requirements could have a significant effect on the
operational cost of maintaining the sites.</para>
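<para>To give a rough sense of how the rate of data change drives
inter-site network requirements, the following sketch estimates the
sustained bandwidth needed to replicate a given daily volume of changed
data. The function name, the replication window, and the 20 percent
protocol overhead are illustrative assumptions, not measured values;
decimal units are used (1 GB = 8000 megabits).</para>
<programlisting language="python"><![CDATA[
```python
def required_replication_mbps(changed_gb_per_day: float,
                              window_hours: float = 24.0,
                              overhead: float = 1.2) -> float:
    """Estimate sustained bandwidth (Mbit/s) to replicate daily changes.

    changed_gb_per_day: volume of changed data per day, in decimal GB.
    window_hours: hours per day during which replication may run (assumed).
    overhead: multiplier for protocol and retransmission overhead (assumed).
    """
    if changed_gb_per_day < 0 or window_hours <= 0 or overhead < 1.0:
        raise ValueError("invalid input")
    megabits = changed_gb_per_day * 8000.0   # 1 GB = 8000 Mbit (decimal)
    seconds = window_hours * 3600.0
    return megabits / seconds * overhead


if __name__ == "__main__":
    # Hypothetical figures: 500 GB of changes per day, replicated
    # during an 8-hour window with the default overhead factor.
    print(round(required_replication_mbps(500, window_hours=8), 1), "Mbit/s")
```
]]></programlisting>
<para>An estimate like this is only a starting point; burst behavior and
link sharing with other traffic also affect the capacity that is
actually needed.</para>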
<para>The ability to maintain object availability at both sites
has significant implications for the object storage design and
implementation. It also has a significant impact on the
WAN design between the sites.</para>
<para>Connecting more than two sites increases the challenges and
adds more complexity to the design considerations. Multi-site
implementations require planning to address the additional
topology used for internal and external connectivity. Some options
include full mesh, hub-and-spoke, spine-leaf, and 3D torus
topologies.</para>
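<para>One way to see how complexity grows with the number of sites is
to count the inter-site links each topology needs. The sketch below
covers only the two simplest cases, full mesh and hub-and-spoke, and
assumes a single link per pair of connected sites; redundant links
would multiply these figures. The function names are chosen for this
example.</para>
<programlisting language="python"><![CDATA[
```python
def full_mesh_links(sites: int) -> int:
    """Links needed so that every site connects directly to every other."""
    if sites < 1:
        raise ValueError("sites must be at least 1")
    return sites * (sites - 1) // 2


def hub_and_spoke_links(sites: int) -> int:
    """Links needed when one hub site connects to every other site."""
    if sites < 1:
        raise ValueError("sites must be at least 1")
    return sites - 1


if __name__ == "__main__":
    # Compare link counts as the number of sites grows.
    for n in (2, 3, 5, 10):
        print(n, full_mesh_links(n), hub_and_spoke_links(n))
```
]]></programlisting>
<para>Full mesh grows quadratically with the number of sites, while
hub-and-spoke grows linearly but makes the hub site a potential single
point of failure.</para>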
<para>If applications running in a cloud are not cloud-aware, there
should be clear measures and expectations to define what the
infrastructure can and cannot support. An example would be
shared storage between sites. This is possible; however, such a
solution is not native to OpenStack and requires a third-party
hardware vendor to fulfill such a requirement. Another example
can be seen in applications that are able to consume resources
in object storage directly. These applications need to be
cloud-aware to make good use of an OpenStack Object
Store.</para></section>
<section xml:id="application-readiness">
<title>Application readiness</title>
<para>Some applications are tolerant of the lack of synchronized
object storage, while others may need those objects to be
replicated and available across regions. Understanding how
the cloud implementation impacts new and existing applications
is important for risk mitigation and for the overall success of a
cloud project. Applications may have to be written or rewritten
for an infrastructure with little to no redundancy, or with the
cloud in mind.</para></section>
<section xml:id="cost-multi-site">
<title>Cost</title>
<para>A greater number of sites increases the cost and complexity of a
multi-site deployment. Costs can be broken down into the following
categories:</para>
<itemizedlist>
<listitem>
<para>Compute resources</para>
</listitem>
<listitem>
<para>Networking resources</para>
</listitem>
<listitem>
<para>Replication</para>
</listitem>
<listitem>
<para>Storage</para>
</listitem>
<listitem>
<para>Management</para>
</listitem>
<listitem>
<para>Operational costs</para>
</listitem>
</itemizedlist></section>
<section xml:id="site-loss-and-recovery">
<title>Site loss and recovery</title>
<para>Outages can cause partial or full loss of site functionality.
Strategies should be implemented to understand and plan for recovery
scenarios.</para>
<itemizedlist>
<listitem>
<para>The deployed applications need to continue to
function and, more importantly, you must consider the
impact on the performance and reliability of the application
when a site is unavailable.</para>
</listitem>
<listitem>
<para>It is important to understand what happens to the
replication of objects and data between the sites when
a site goes down. If this causes queues to start
building up, consider how long these queues can
safely exist until an error occurs.</para>
</listitem>
<listitem>
<para>After an outage, ensure that a method is in place for resuming
proper operation of the site when it comes back online.
We recommend you architect the recovery to avoid race conditions.</para>
</listitem>
</itemizedlist></section>
<section xml:id="compliance-and-geo-location-multi-site">
<title>Compliance and geo-location</title>
<para>An organization may have legal obligations and
regulatory compliance measures that require certain
workloads or data not to be located in certain regions.</para></section>
<section xml:id="auditing-multi-site">
<title>Auditing</title>
<para>A well-thought-out auditing strategy is important for
tracking down issues quickly. Keeping track of
changes made to security groups and tenants can be
useful for rolling back changes that affect production.
For example, if all security group rules for a tenant
disappeared, the ability to quickly track down the issue would
be important for operational and legal reasons.</para></section>
<section xml:id="separation-of-duties">
<title>Separation of duties</title>
<para>A common requirement is to define different roles for the
different cloud administration functions. An example would be
a requirement to segregate the duties and permissions by
site.</para></section>
<section xml:id="authentication-between-sites">
<title>Authentication between sites</title>
<para>We recommend a single authentication domain
rather than a separate implementation for each
site. This requires an authentication mechanism that is highly
available and distributed to ensure continuous operation.
Authentication server locality might be required and should be
planned for.</para></section>
</section>