security-doc/security-guide/section_data-processing-deployment.xml
Michael McCune e958f07f31 Adding data processing chapter
This change introduces the data processing service chapter as chapter 13
in the security guide.

Changes
* adding data processing chapter to index
* adding data processing chapter file
* adding introduction section file
* adding architecture image
* adding deployment section file
* adding configuration and hardening section file
* adding case studies section file
* adding data processing to the introduction to openstack section

Change-Id: I50c5066373f7c9bd75eb956cbb163f27d6a63058
Closes-bug: 1415218
2015-02-19 16:59:14 -05:00


<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="data-processing-deployment">
<?dbhtml stop-chunking?>
<title>Deployment</title>
<para>
The Data processing service is deployed, like many other OpenStack
services, as an application running on a host connected to the stack.
As of the Kilo release, it has the ability to be deployed in a
distributed manner with several redundant controllers. Like other
services, it also requires a database to store information about its
resources. See <xref linkend="databases"/>. The Data processing
service must manage several Identity service trusts, communicate
directly with the Orchestration and Networking services, and
potentially create users in a proxy domain. For these reasons, the
controller needs access to the control plane, and we recommend
installing it alongside other service controllers.
</para>
<para>
Data processing interacts directly with several OpenStack services:
</para>
<itemizedlist>
<listitem>
<para>
Compute
</para>
</listitem>
<listitem>
<para>
Identity
</para>
</listitem>
<listitem>
<para>
Networking
</para>
</listitem>
<listitem>
<para>
Object Storage
</para>
</listitem>
<listitem>
<para>
Orchestration
</para>
</listitem>
<listitem>
<para>
Block Storage (optional)
</para>
</listitem>
</itemizedlist>
<para>
We recommend documenting all the data flows and bridging points
between these services and the data processing controller. See
<xref linkend="documentation"/>.
</para>
<para>
The Object Storage service is used by the Data processing service to store
job binaries and data sources. Users wishing to have access to the full
Data processing service functionality will need an object store in the
projects they are using.
</para>
<para>
The Networking service plays an important role in the provisioning of
clusters. Prior to provisioning, the user is expected to provide one
or more networks for the cluster instances. The action of associating
networks is similar to the process of assigning networks when
launching instances through the dashboard. These networks are used by
the controller for administrative access to the instances and
frameworks of its clusters.
</para>
<para>
Also of note is the Identity service. Users of the Data processing service
need appropriate roles in their projects to allow the provisioning of
instances for their clusters. Installations that use the proxy domain
configuration require special consideration: the Data processing service
needs the ability to create users within the proxy domain. See
<xref linkend="data-processing-configuration-and-hardening-proxy-domains"/>.
</para>
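<para>
As a rough sketch only, a proxy domain setup in the Data processing
service configuration file might resemble the following fragment. The
option names are assumed from Kilo-era releases, and the domain name
<literal>sahara_proxy</literal> is a hypothetical placeholder. Verify
both against the configuration reference for your release.
</para>
<programlisting language="ini">[DEFAULT]
# Assumed option names; confirm against your release before use.
use_domain_for_proxy_users = true
# Hypothetical domain name; the domain must already exist in the
# Identity service.
proxy_user_domain_name = sahara_proxy</programlisting>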
<section xml:id="data-processing-deployment-controller-network-access-to-clusters">
<title>Controller network access to clusters</title>
<para>
One of the primary tasks of the data processing controller is to
communicate with the instances it spawns. These instances are
provisioned and then configured depending on the framework being
used. The communication between the controller and the instances uses
the secure shell (SSH) and HTTP protocols.
</para>
<para>
When provisioning clusters, each instance is given an IP address in
the networks provided by the user. The first network is often referred
to as the data processing management network, and instances can use the
fixed IP address assigned by the Networking service for this network.
The controller can also be configured to use floating IP addresses for
the instances in addition to their fixed address. When communicating
with the instances, the controller prefers the floating address
if enabled.
</para>
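<para>
As a minimal sketch of the address preference described above, and
assuming the <literal>use_floating_ips</literal> option name found in
Kilo-era releases of the Data processing service configuration file,
floating addresses could be disabled as shown below. With this
setting, the controller host must be able to reach the fixed addresses
on the management network. Verify the option name and its default
against the configuration reference for your release.
</para>
<programlisting language="ini">[DEFAULT]
# Assumed option name; confirm against your release before use.
# When false, the controller communicates with instances through
# their fixed addresses only.
use_floating_ips = false</programlisting>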
<para>
For situations where the fixed and floating IP addresses do not
provide the functionality required, the controller can provide access
through two alternative methods: custom network topologies and indirect
access. The custom network topologies feature allows the controller to
access the instances through a shell command supplied in the
configuration file. Indirect access allows the user to designate,
during cluster provisioning, instances that act as proxy gateways.
These options are discussed with examples of usage in
<xref linkend="data-processing-configuration-and-hardening"/>.
</para>
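<para>
To illustrate the custom network topologies feature, the following
fragment sketches what a supplied shell command might look like. The
<literal>proxy_command</literal> option name and the
<literal>{host}</literal> and <literal>{port}</literal> placeholders
are assumed from Kilo-era releases, and the relay host
<literal>relay.example.org</literal> is hypothetical. Treat this as an
illustration rather than a recommended configuration.
</para>
<programlisting language="ini">[DEFAULT]
# Assumed option name and placeholders; confirm against your release.
# The controller would substitute the address and port of the target
# instance for {host} and {port}, then tunnel its connection through
# the hypothetical relay host.
proxy_command = 'ssh relay.example.org nc {host} {port}'</programlisting>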
</section>
</section>