This change introduces the data processing service chapter as chapter 13 in the security guide. Changes * adding data processing chapter to index * adding data processing chapter file * adding introduction section file * adding architecture image * adding deployment section file * adding configuration and hardening section file * adding case studies section file * adding data processing to the introduction to openstack section Change-Id: I50c5066373f7c9bd75eb956cbb163f27d6a63058 Closes-bug: 1415218
119 lines
4.8 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="data-processing-deployment">
  <?dbhtml stop-chunking?>
  <title>Deployment</title>
  <para>
    The Data processing service is deployed, like many other OpenStack
    services, as an application running on a host connected to the stack.
    As of the Kilo release, it can be deployed in a distributed manner
    with several redundant controllers. Like other services, it also
    requires a database to store information about its resources. See
    <xref linkend="databases"/>. Note that the Data processing service
    needs to manage several Identity service trusts, communicate directly
    with the Orchestration and Networking services, and potentially create
    users in a proxy domain. For these reasons, the controller needs
    access to the control plane, and we therefore recommend installing it
    alongside other service controllers.
  </para>
  <para>
    The Data processing service interacts directly with several OpenStack
    services:
  </para>
  <itemizedlist>
    <listitem>
      <para>Compute</para>
    </listitem>
    <listitem>
      <para>Identity</para>
    </listitem>
    <listitem>
      <para>Networking</para>
    </listitem>
    <listitem>
      <para>Object Storage</para>
    </listitem>
    <listitem>
      <para>Orchestration</para>
    </listitem>
    <listitem>
      <para>Block Storage (optional)</para>
    </listitem>
  </itemizedlist>
  <para>
    We recommend documenting all the data flows and bridging points
    between these services and the data processing controller. See
    <xref linkend="documentation"/>.
  </para>
  <para>
    The Data processing service uses the Object Storage service to store
    job binaries and data sources. Users who want access to the full
    Data processing service functionality need an object store in the
    projects they use.
  </para>
  <para>
    The Networking service plays an important role in the provisioning of
    clusters. Before provisioning, the user is expected to provide one
    or more networks for the cluster instances. Associating networks with
    a cluster is similar to assigning networks when launching instances
    through the dashboard. The controller uses these networks for
    administrative access to the instances and frameworks of its clusters.
  </para>
  <para>
    The Identity service is also important. Users of the Data processing
    service need appropriate roles in their projects to allow the
    provisioning of instances for their clusters. Installations that use
    the proxy domain configuration require special consideration:
    specifically, the Data processing service needs the ability to create
    users within the proxy domain. See
    <xref linkend="data-processing-configuration-and-hardening-proxy-domains"/>.
  </para>
  <section xml:id="data-processing-deployment-controller-network-access-to-clusters">
    <title>Controller network access to clusters</title>
    <para>
      One of the primary tasks of the data processing controller is to
      communicate with the instances it spawns. These instances are
      provisioned and then configured depending on the framework being
      used. The controller communicates with the instances over the
      secure shell (SSH) and HTTP protocols.
    </para>
    <para>
      When provisioning clusters, each instance is given an IP address in
      the networks provided by the user. The first network is often
      referred to as the data processing management network, and instances
      can use the fixed IP address assigned by the Networking service for
      this network. The controller can also be configured to use floating
      IP addresses for the instances in addition to their fixed addresses.
      When communicating with the instances, the controller prefers the
      floating address if enabled.
    </para>
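    <para>
      The address preference described above can be sketched in a few
      lines of Python. This is an illustration only, not the service's
      implementation: the <literal>Instance</literal> class, its field
      names, and the <literal>management_address</literal> function are
      hypothetical names introduced for this example, and the fallback to
      the fixed address when no floating address is assigned is an
      assumption.
    </para>
    <programlisting language="python"><![CDATA[from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """Hypothetical record of the addresses assigned to a cluster instance."""
    fixed_ip: str
    floating_ip: Optional[str] = None


def management_address(instance: Instance, use_floating_ips: bool) -> str:
    """Return the address the controller would contact.

    Assumption: when floating IP addresses are enabled and one is
    assigned, it is preferred; otherwise the fixed address is used.
    """
    if use_floating_ips and instance.floating_ip:
        return instance.floating_ip
    return instance.fixed_ip


# Example addresses are taken from documentation-reserved ranges.
node = Instance(fixed_ip="10.0.0.5", floating_ip="203.0.113.17")
print(management_address(node, use_floating_ips=True))
print(management_address(node, use_floating_ips=False))
]]></programlisting>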
    <para>
      For situations where the fixed and floating IP addresses do not
      provide the required functionality, the controller can provide
      access through two alternate methods: custom network topologies and
      indirect access. The custom network topologies feature allows the
      controller to access the instances through a shell command supplied
      in the configuration file. Indirect access lets the user specify,
      during cluster provisioning, instances that can be used as proxy
      gateways. These options are discussed with examples of usage in
      <xref linkend="data-processing-configuration-and-hardening"/>.
    </para>
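    <para>
      As a rough illustration of the custom network topologies idea, the
      following Python sketch expands a shell command template into the
      argument list that would be run to reach an instance. It is not the
      service's implementation: the <literal>{host}</literal> and
      <literal>{port}</literal> placeholders, the gateway host name, and
      the <literal>build_proxy_command</literal> function are assumptions
      made for this example. The template is split into arguments before
      substitution so that a substituted value cannot introduce extra
      arguments.
    </para>
    <programlisting language="python"><![CDATA[import shlex


def build_proxy_command(template: str, host: str, port: int) -> list:
    """Expand a command template into an argument list.

    Assumption: the template uses {host} and {port} placeholders; the
    real placeholder syntax is defined by the service configuration.
    """
    # Split first, then substitute per argument so that a substituted
    # value can never introduce additional arguments.
    return [arg.format(host=host, port=port) for arg in shlex.split(template)]


# Hypothetical template: reach the instance through a gateway host.
template = "ssh gateway.example.org nc {host} {port}"
print(build_proxy_command(template, "10.0.0.5", 22))
]]></programlisting>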
  </section>
</section>