Updating features documentation

This change updates the grammar and content of the features page.
Additionally, it moves some of the more configuration-related topics to
the configuration guides, depending on the nature of each feature.

Change-Id: I1489ffbe1fe0f71a3a41a02e46614ab77263b8d8
This commit is contained in:
Michael McCune 2015-04-07 18:41:38 -04:00 committed by Nikolay Starodubtsev
parent e70491d5c9
commit 2729062cef
3 changed files with 537 additions and 421 deletions


@ -5,72 +5,6 @@ This guide addresses specific aspects of Sahara configuration that pertain to
advanced usage. It is divided into sections about various features that can be
utilized, and their related configurations.
Object Storage access using proxy users
---------------------------------------
To improve security for clusters accessing files in Object Storage,
sahara can be configured to use proxy users and delegated trusts for
access. This behavior has been implemented to reduce the need for
storing and distributing user credentials.
The use of proxy users involves creating an Identity domain that will be
designated as the home for these users. Proxy users will be
created on demand by sahara and will only exist during a job execution
which requires Object Storage access. The domain created for the
proxy users must be backed by a driver that allows sahara's admin user to
create new user accounts. This new domain should contain no roles, to limit
the potential access of a proxy user.
Once the domain has been created, sahara must be configured to use it by
adding the domain name and any potential delegated roles that must be used
for Object Storage access to the configuration file. With the
domain enabled in sahara, users will no longer be required to enter
credentials for their data sources and job binaries referenced in
Object Storage.
Detailed instructions
^^^^^^^^^^^^^^^^^^^^^
First a domain must be created in the Identity service to hold proxy
users created by sahara. This domain must have an identity backend driver
that allows for sahara to create new users. The default SQL engine is
sufficient but if your keystone identity is backed by LDAP or similar
then domain specific configurations should be used to ensure sahara's
access. Please see the `Keystone documentation`_ for more information.
.. _Keystone documentation: http://docs.openstack.org/developer/keystone/configuration.html#domain-specific-drivers
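As a brief illustration only (the file locations and driver value below are
assumptions for this sketch, not requirements from this guide), enabling
domain specific drivers in keystone and backing the example ``sahara_proxy``
domain used below with the SQL identity driver could resemble:
.. sourcecode:: cfg
# keystone.conf (assumed path)
[identity]
domain_specific_drivers_enabled = True
domain_config_dir = /etc/keystone/domains
# /etc/keystone/domains/keystone.sahara_proxy.conf (assumed path)
[identity]
driver = keystone.identity.backends.sql.Identity
Consult the `Keystone documentation`_ linked above for the authoritative
option names and values in your release.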
With the domain created, sahara's configuration file should be updated to
include the new domain name and any potential roles that will be needed. For
this example let's assume that the name of the proxy domain is
``sahara_proxy`` and the roles needed by proxy users will be ``Member`` and
``SwiftUser``.
.. sourcecode:: cfg
[DEFAULT]
use_domain_for_proxy_users=True
proxy_user_domain_name=sahara_proxy
proxy_user_role_names=Member,SwiftUser
..
A note on the use of roles. In the context of the proxy user, any roles
specified here are roles intended to be delegated to the proxy user from the
user with access to Object Storage. More specifically, any roles that
are required for Object Storage access by the project owning the object
store must be delegated to the proxy user for authentication to be
successful.
Finally, the stack administrator must ensure that images registered with
sahara have the latest version of the Hadoop swift filesystem plugin
installed. The sources for this plugin can be found in the
`sahara extra repository`_. For more information on images or swift
integration see the sahara documentation sections
:ref:`diskimage-builder-label` and :ref:`swift-integration-label`.
.. _Sahara extra repository: http://github.com/openstack/sahara-extra
.. _custom_network_topologies:
Custom network topologies
@ -106,6 +40,229 @@ a custom network namespace:
[DEFAULT]
proxy_command='ip netns exec ns_for_{network_id} nc {host} {port}'
.. _data_locality_configuration:
Data-locality configuration
---------------------------
Hadoop provides the data-locality feature to enable task tracker and
data nodes the capability of spawning on the same rack, Compute node,
or virtual machine. Sahara exposes this functionality to the user
through a few configuration parameters and user defined topology files.
To enable data-locality, set the ``enable_data_locality`` parameter to
``True`` in the sahara configuration file
.. sourcecode:: cfg
[DEFAULT]
enable_data_locality=True
With data locality enabled, you must now specify the topology files
for the Compute and Object Storage services. These files are
specified in the sahara configuration file as follows:
.. sourcecode:: cfg
[DEFAULT]
compute_topology_file=/etc/sahara/compute.topology
swift_topology_file=/etc/sahara/swift.topology
The ``compute_topology_file`` should contain mappings between Compute
nodes and racks in the following format:
.. sourcecode:: cfg
compute1 /rack1
compute2 /rack2
compute3 /rack2
Note that the Compute node names must be exactly the same as configured in
OpenStack (``host`` column in admin list for instances).
The ``swift_topology_file`` should contain mappings between Object Storage
nodes and racks in the following format:
.. sourcecode:: cfg
node1 /rack1
node2 /rack2
node3 /rack2
Note that the Object Storage node names must be exactly the same as
configured in the object ring. Also, you should ensure that instances
with the task tracker process have direct access to the Object Storage
nodes.
Hadoop versions after 1.2.0 support four-layer topology (for more detail
please see `HADOOP-8468 JIRA issue`_). To enable this feature set the
``enable_hypervisor_awareness`` parameter to ``True`` in the configuration
file. In this case sahara will add the Compute node ID as a second level of
topology for virtual machines.
.. _HADOOP-8468 JIRA issue: https://issues.apache.org/jira/browse/HADOOP-8468
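For example, enabling the four-layer topology alongside the basic
data-locality support combines the options described above in the sahara
configuration file:
.. sourcecode:: cfg
[DEFAULT]
enable_data_locality=True
enable_hypervisor_awareness=True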
.. _distributed-mode-configuration:
Distributed mode configuration
------------------------------
Sahara can be configured to run in a distributed mode that creates a
separation between the API and engine processes. This allows the API
process to remain relatively free to handle requests while offloading
intensive tasks to the engine processes.
The ``sahara-api`` application works as a front-end and serves user
requests. It offloads 'heavy' tasks to the ``sahara-engine`` process
via RPC mechanisms. While the ``sahara-engine`` process could be loaded
with tasks, ``sahara-api`` stays free and hence may quickly respond to
user queries.
If sahara runs on several hosts, the API requests could be
balanced between several ``sahara-api`` hosts using a load balancer.
It is not required to balance load between different ``sahara-engine``
hosts as this will be automatically done via the message broker.
If a single host becomes unavailable, other hosts will continue
serving user requests. Hence better scalability is achieved, along with
some fault tolerance. Note that distributed mode does not provide true
high availability. While the failure of a single host does not
affect the work of the others, all of the operations running on
the failed host will stop. For example, if a cluster scaling is
interrupted, the cluster will be stuck in a half-scaled state. The
cluster might continue working, but it will be impossible to scale it
further or run jobs on it via EDP.
To run sahara in distributed mode pick several hosts on which
you want to run sahara services and follow these steps:
* On each host install and configure sahara using the
`installation guide <../installation.guide.html>`_
except:
* Do not run ``sahara-db-manage`` or launch sahara with ``sahara-all``
* Ensure that each configuration file provides a database connection
string to a single database for all hosts.
* Run ``sahara-db-manage`` as described in the installation guide,
but only on a single (arbitrarily picked) host.
* The ``sahara-api`` and ``sahara-engine`` processes use oslo.messaging to
communicate with each other. You will need to configure it properly on
each host (see below).
* Run ``sahara-api`` and ``sahara-engine`` on the desired hosts. You may
run both processes on the same or separate hosts as long as they are
configured to use the same message broker and database.
To configure oslo.messaging, first you will need to choose a message
broker driver. Currently there are three drivers provided: RabbitMQ, Qpid
or ZeroMQ. For the RabbitMQ or Qpid drivers please see the
:ref:`notification-configuration` documentation for an explanation of
common configuration options.
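As a minimal sketch (the host names, credentials, and database URL shown
here are placeholders rather than values from this guide), each host's
configuration file could point at the shared broker and database as follows:
.. sourcecode:: cfg
[DEFAULT]
rpc_backend=rabbit
[database]
# every sahara-api and sahara-engine host must use the same database
connection=mysql://sahara:SAHARA_DBPASS@database.example.com/sahara
[oslo_messaging_rabbit]
rabbit_host=broker.example.com
rabbit_userid=sahara
rabbit_password=RABBIT_PASS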
For an expanded view of all the options provided by each message broker
driver in oslo.messaging please refer to the options available in the
respective source trees:
* For Rabbit MQ see
* rabbit_opts variable in `impl_rabbit.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/impl_rabbit.py?id=1.4.0#n38>`_
* amqp_opts variable in `amqp.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/amqp.py?id=1.4.0#n37>`_
* For Qpid see
* qpid_opts variable in `impl_qpid.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/impl_qpid.py?id=1.4.0#n40>`_
* amqp_opts variable in `amqp.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/amqp.py?id=1.4.0#n37>`_
* For Zmq see
* zmq_opts variable in `impl_zmq.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/impl_zmq.py?id=1.4.0#n49>`_
* matchmaker_opts variable in `matchmaker.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/matchmaker.py?id=1.4.0#n27>`_
* matchmaker_redis_opts variable in `matchmaker_redis.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/matchmaker_redis.py?id=1.4.0#n26>`_
* matchmaker_opts variable in `matchmaker_ring.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/matchmaker_ring.py?id=1.4.0#n27>`_
These options will also be present in the generated sample configuration
file. For instructions on creating the configuration file please see the
:doc:`configuration.guide`.
External key manager usage (EXPERIMENTAL)
-----------------------------------------
Sahara generates and stores several passwords during the course of operation.
To harden sahara's usage of passwords it can be instructed to use an
external key manager for storage and retrieval of these secrets. To enable
this feature there must first be an OpenStack Key Manager service deployed
within the stack. Currently, the barbican project is the only key manager
supported by sahara.
With a Key Manager service deployed on the stack, sahara must be configured
to enable the external storage of secrets. This is accomplished by editing
the sahara configuration file as follows:
.. sourcecode:: cfg
[DEFAULT]
use_external_key_manager=True
.. TODO (mimccune)
this language should be removed once a new keystone authentication
section has been created in the configuration file.
Additionally, at this time there are two more values which must be provided
to ensure proper access for sahara to the Key Manager service. These are
the Identity domain for the administrative user and the domain for the
administrative project. By default these values will appear as:
.. sourcecode:: cfg
[DEFAULT]
admin_user_domain_name=default
admin_project_domain_name=default
With all of these values configured and the Key Manager service deployed,
sahara will begin storing its secrets in the external manager.
Indirect instance access through proxy nodes
--------------------------------------------
.. warning::
The indirect VMs access feature is in alpha state. We do not
recommend using it in a production environment.
Sahara needs to access instances through SSH during cluster setup. This
access can be obtained a number of different ways (see
:ref:`neutron-nova-network`, :ref:`floating_ip_management`,
:ref:`custom_network_topologies`). Sometimes it is impossible to provide
access to all nodes (because of limited numbers of floating IPs or security
policies). In these cases access can be gained using other nodes of the
cluster as proxy gateways. To enable this set ``is_proxy_gateway=True``
for the node group you want to use as proxy. Sahara will communicate with
all other cluster instances through the instances of this node group.
Note, if ``use_floating_ips=true`` and the cluster contains a node group with
``is_proxy_gateway=True``, the requirement to have ``floating_ip_pool``
specified is applied only to the proxy node group. Other instances will be
accessed through proxy instances using the standard private network.
Note, the Cloudera Hadoop plugin doesn't support access to Cloudera manager
through a proxy node. This means that for CDH clusters only nodes with
the Cloudera manager can be designated as proxy gateway nodes.
Multi region deployment
-----------------------
Sahara supports multi region deployment. To enable this option each
instance of sahara should have the ``os_region_name=<region>``
parameter set in the configuration file. The following example demonstrates
configuring sahara to use the ``RegionOne`` region:
.. sourcecode:: cfg
[DEFAULT]
os_region_name=RegionOne
.. _non-root-users:
Non-root users
--------------
@ -166,39 +323,111 @@ set the following parameter in your sahara configuration file:
For more information on rootwrap please refer to the
`official Rootwrap documentation <https://wiki.openstack.org/wiki/Rootwrap>`_
External key manager usage (EXPERIMENTAL)
-----------------------------------------
Sahara generates and stores several passwords during the course of operation.
To harden sahara's usage of passwords it can be instructed to use an
external key manager for storage and retrieval of these secrets. To enable
this feature there must first be an OpenStack Key Manager service deployed
within the stack. Currently, the barbican project is the only key manager
supported by sahara.
With a Key Manager service deployed on the stack, sahara must be configured
to enable the external storage of secrets. This is accomplished by editing
the configuration file as follows:
.. sourcecode:: cfg
[DEFAULT]
use_external_key_manager=True
.. TODO (mimccune)
this language should be removed once a new keystone authentication
section has been created in the configuration file.
Additionally, at this time there are two more values which must be provided
to ensure proper access for sahara to the Key Manager service. These are
the Identity domain for the administrative user and the domain for the
administrative project. By default these values will appear as:
.. sourcecode:: cfg
[DEFAULT]
admin_user_domain_name=default
admin_project_domain_name=default
With all of these values configured and the Key Manager service deployed,
sahara will begin storing its secrets in the external manager.
Object Storage access using proxy users
---------------------------------------
To improve security for clusters accessing files in Object Storage,
sahara can be configured to use proxy users and delegated trusts for
access. This behavior has been implemented to reduce the need for
storing and distributing user credentials.
The use of proxy users involves creating an Identity domain that will be
designated as the home for these users. Proxy users will be
created on demand by sahara and will only exist during a job execution
which requires Object Storage access. The domain created for the
proxy users must be backed by a driver that allows sahara's admin user to
create new user accounts. This new domain should contain no roles, to limit
the potential access of a proxy user.
Once the domain has been created, sahara must be configured to use it by
adding the domain name and any potential delegated roles that must be used
for Object Storage access to the sahara configuration file. With the
domain enabled in sahara, users will no longer be required to enter
credentials for their data sources and job binaries referenced in
Object Storage.
Detailed instructions
^^^^^^^^^^^^^^^^^^^^^
First a domain must be created in the Identity service to hold proxy
users created by sahara. This domain must have an identity backend driver
that allows for sahara to create new users. The default SQL engine is
sufficient but if your keystone identity is backed by LDAP or similar
then domain specific configurations should be used to ensure sahara's
access. Please see the `Keystone documentation`_ for more information.
.. _Keystone documentation: http://docs.openstack.org/developer/keystone/configuration.html#domain-specific-drivers
With the domain created, sahara's configuration file should be updated to
include the new domain name and any potential roles that will be needed. For
this example let's assume that the name of the proxy domain is
``sahara_proxy`` and the roles needed by proxy users will be ``Member`` and
``SwiftUser``.
.. sourcecode:: cfg
[DEFAULT]
use_domain_for_proxy_users=True
proxy_user_domain_name=sahara_proxy
proxy_user_role_names=Member,SwiftUser
..
A note on the use of roles. In the context of the proxy user, any roles
specified here are roles intended to be delegated to the proxy user from the
user with access to Object Storage. More specifically, any roles that
are required for Object Storage access by the project owning the object
store must be delegated to the proxy user for authentication to be
successful.
Finally, the stack administrator must ensure that images registered with
sahara have the latest version of the Hadoop swift filesystem plugin
installed. The sources for this plugin can be found in the
`sahara extra repository`_. For more information on images or swift
integration see the sahara documentation sections
:ref:`diskimage-builder-label` and :ref:`swift-integration-label`.
.. _Sahara extra repository: http://github.com/openstack/sahara-extra
.. _volume_instance_locality_configuration:
Volume instance locality configuration
--------------------------------------
The Block Storage service provides the ability to define volume instance
locality to ensure that instance volumes are created on the same host
as the hypervisor. The ``InstanceLocalityFilter`` provides the mechanism
for the selection of a storage provider located on the same physical
host as an instance.
To enable this functionality for instances of a specific node group, the
``volume_local_to_instance`` field in the node group template should be
set to ``True`` and some extra configurations are needed:
* The cinder-volume service should be launched on every physical host and at
least one physical host should run both cinder-scheduler and
cinder-volume services.
* ``InstanceLocalityFilter`` should be added to the list of default filters
(``scheduler_default_filters`` in cinder) for the Block Storage
configuration; see the example after this list.
* The Extended Server Attributes extension needs to be active in the Compute
service (this is true by default in nova), so that the
``OS-EXT-SRV-ATTR:host`` property is returned when requesting instance
info.
* The user making the call needs to have sufficient rights for the property to
be returned by the Compute service.
This can be done in either of the following ways:
* by changing nova's ``policy.json`` to allow the user access to the
``extended_server_attributes`` option.
* by designating an account with privileged rights in the cinder
configuration:
.. sourcecode:: cfg
os_privileged_user_name =
os_privileged_user_password =
os_privileged_user_tenant =
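Relating to the filter requirement noted in the list above, a cinder
configuration fragment that appends ``InstanceLocalityFilter`` to the
scheduler defaults might look like the following (the other filter names
shown are cinder's usual defaults and may differ in your deployment):
.. sourcecode:: cfg
# cinder.conf
[DEFAULT]
scheduler_default_filters=AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter,InstanceLocalityFilter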
Note that if the host has no space for volume creation, the volume will be
created with an ``Error`` state and cannot be used.


@ -5,6 +5,9 @@ This guide covers the steps for a basic configuration of sahara.
It will help you to configure the service in the most simple
manner.
Basic configuration
-------------------
Sahara is packaged with a basic sample configuration file:
``sahara.conf.sample-basic``. This file contains all the essential
parameters that are required for sahara. We recommend creating your
@ -70,8 +73,66 @@ file which control the level of logging output; ``verbose`` and
level will be set to INFO, and with ``debug`` set to ``true`` it will
be set to DEBUG. By default, sahara's log level is set to WARNING.
Sahara notifications configuration
----------------------------------
.. _neutron-nova-network:
Networking configuration
------------------------
By default sahara is configured to use the nova-network implementation
of OpenStack Networking. If an OpenStack cluster uses Neutron,
then the ``use_neutron`` parameter should be set to ``True`` in the
sahara configuration file. Additionally, if the cluster supports network
namespaces the ``use_namespaces`` property can be used to enable their usage.
.. sourcecode:: cfg
[DEFAULT]
use_neutron=True
use_namespaces=True
.. note::
If a user other than ``root`` will be running the Sahara server
instance and namespaces are used, some additional configuration is
required, please see :ref:`non-root-users` for more information.
.. _floating_ip_management:
Floating IP management
++++++++++++++++++++++
During cluster setup sahara must access instances through a secure
shell (SSH). To establish this connection it may use either the fixed
or floating IP address of an instance. By default sahara is configured
to use floating IP addresses for access. This is controlled by the
``use_floating_ips`` configuration parameter. With this setup the user
has two options for ensuring that all instances gain a floating IP
address:
* If using nova-network, it may be configured to assign floating
IP addresses automatically by setting the ``auto_assign_floating_ip``
parameter to ``True`` in the nova configuration file
(usually ``nova.conf``); see the example after this list.
* The user may specify a floating IP address pool for each node
group directly.
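For the nova-network option noted in the list above, a minimal fragment of
the nova configuration file could be:
.. sourcecode:: cfg
# nova.conf (nova-network deployments only)
[DEFAULT]
auto_assign_floating_ip=True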
.. warning::
When using floating IP addresses for management
(``use_floating_ips=True``) **every** instance in the cluster must have
a floating IP address, otherwise sahara will not be able to utilize
that cluster.
If not using floating IP addresses (``use_floating_ips=False``) sahara
will use fixed IP addresses for instance management. When using neutron
for the Networking service the user will be able to choose the
fixed IP network for all instances in a cluster. Whether using nova-network
or neutron it is important to ensure that all hosts running sahara
have access to the fixed IP networks.
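For example, to have sahara manage instances over their fixed IP addresses,
disable the parameter in the sahara configuration file:
.. sourcecode:: cfg
[DEFAULT]
use_floating_ips=False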
.. _notification-configuration:
Notifications configuration
---------------------------
Sahara can be configured to send notifications to the OpenStack
Telemetry module. To enable this functionality the following parameters
@ -129,15 +190,38 @@ in the ``[oslo_messaging_qpid]`` section:
qpid_password=
..
.. _orchestration-configuration:
Orchestration configuration
---------------------------
By default sahara is configured to use the direct engine for instance
creation. This engine makes calls directly to the services required
for instance provisioning. Sahara can be configured to use the OpenStack
Orchestration service for this task instead of the direct engine.
To configure sahara to utilize the Orchestration service for instance
provisioning the ``infrastructure_engine`` parameter should be modified in
the configuration file as follows:
.. sourcecode:: cfg
[DEFAULT]
infrastructure_engine=heat
There is feature parity between the direct and heat infrastructure
engines. We recommend using the heat engine for provisioning, as the
direct engine is planned for deprecation.
.. _policy-configuration-label:
Sahara policy configuration
Policy configuration
---------------------------
Sahara's public API calls may be restricted to certain sets of users by
using a policy configuration file. The location of the policy file(s)
is controlled by the ``policy_file`` and ``policy_dirs`` parameters
in the ``[DEFAULT]`` section. By default sahara will search for
in the ``[oslo_policy]`` section. By default sahara will search for
a ``policy.json`` file in the same directory as the configuration file.
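A minimal sketch of these options (``policy.json`` is the default described
above; the ``policy.d`` directory value is an assumed example):
.. sourcecode:: cfg
[oslo_policy]
policy_file=policy.json
policy_dirs=policy.d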
Examples


@ -1,248 +1,99 @@
Features Overview
=================
Cluster Scaling
---------------
The mechanism of cluster scaling is designed to enable a user to change the
number of running instances without creating a new cluster.
A user may change the number of instances in existing Node Groups or add new Node
Groups.
If a cluster fails to scale properly, all changes will be rolled back.
Swift Integration
-----------------
In order to leverage Swift within Hadoop, including using Swift data sources
from within EDP, Hadoop requires the application of a patch.
For additional information about using Swift with Sahara, including patching
Hadoop and configuring Sahara, please refer to the :doc:`hadoop-swift`
documentation.
Cinder support
--------------
Cinder is a block storage service that can be used as an alternative for an
ephemeral drive. Using Cinder volumes increases reliability of data which is
important for HDFS service.
A user can set how many volumes will be attached to each node in a Node Group
and the size of each volume.
All volumes are attached during Cluster creation/scaling operations.
.. _neutron-nova-network:
Neutron and Nova Network support
--------------------------------
OpenStack clusters may use Nova or Neutron as a networking service. Sahara
supports both, but when deployed a special configuration for networking
should be set explicitly. By default Sahara will behave as if Nova is used.
If an OpenStack cluster uses Neutron, then the ``use_neutron`` property should
be set to ``True`` in the Sahara configuration file. Additionally, if the
cluster supports network namespaces the ``use_namespaces`` property can be
used to enable their usage.
.. sourcecode:: cfg
[DEFAULT]
use_neutron=True
use_namespaces=True
.. note::
If a user other than ``root`` will be running the Sahara server
instance and namespaces are used, some additional configuration is
required, please see the :doc:`advanced.configuration.guide` for more
information.
.. _floating_ip_management:
Floating IP Management
----------------------
Sahara needs to access instances through ssh during a Cluster setup. To
establish a connection Sahara may
use both: fixed and floating IP of an Instance. By default
``use_floating_ips`` parameter is set to ``True``, so
Sahara will use Floating IP of an Instance to connect. In this case, the user has
two options for how to make all instances
get a floating IP:
* Nova Network may be configured to assign floating IPs automatically by
setting ``auto_assign_floating_ip`` to ``True`` in ``nova.conf``
* User may specify a floating IP pool for each Node Group directly.
Note: When using floating IPs for management (``use_floating_ip=True``)
**every** instance in the Cluster should have a floating IP,
otherwise Sahara will not be able to work with it.
If the ``use_floating_ips`` parameter is set to ``False`` Sahara will use
Instances' fixed IPs for management. In this case
the node where Sahara is running should have access to Instances' fixed IP
network. When OpenStack uses Neutron for
networking, a user will be able to choose fixed IP network for all instances
in a Cluster.
This page highlights some of the most prominent features available in
sahara. The guidance provided here is primarily focused on the
runtime aspects of sahara; for discussions about configuring the sahara
server processes please see the :doc:`configuration.guide` and
:doc:`advanced.configuration.guide`.
Anti-affinity
-------------
One of the problems in Hadoop running on OpenStack is that there is no
ability to control where the machine is actually running.
We cannot be sure that two new virtual machines are started on different
physical machines. As a result, any replication with the cluster
One of the problems with running data processing applications on OpenStack
is the inability to control where an instance is actually running. It is
not always possible to ensure that two new virtual machines are started on
different physical machines. As a result, any replication within the cluster
is not reliable because all replicas may turn up on one physical machine.
The anti-affinity feature provides an ability to explicitly tell Sahara to run
specified processes on different compute nodes. This
is especially useful for the Hadoop data node process to make HDFS replicas
reliable.
To remedy this, sahara provides the anti-affinity feature to explicitly
command all instances of the specified processes to spawn on different
Compute nodes. This is especially useful for Hadoop data node processes
to increase HDFS replica reliability.
Starting with the Juno release, Sahara creates server groups with the
``anti-affinity`` policy to enable the anti-affinity feature. Sahara creates one
server group per cluster and assigns all instances with affected processes to
this server group. Refer to the Nova documentation on how server groups work.
Starting with the Juno release, sahara can create server groups with the
``anti-affinity`` policy to enable this feature. Sahara creates one server
group per cluster and assigns all instances with affected processes to
this server group. Refer to the `Nova documentation`_ on how server groups
work.
This feature is supported by all plugins out of the box.
This feature is supported by all plugins out of the box, and can be enabled
during the cluster template creation.
.. _Nova documentation: http://docs.openstack.org/developer/nova
Block Storage support
---------------------
OpenStack Block Storage (cinder) can be used as an alternative for
ephemeral drives on instances. Using Block Storage volumes increases the
reliability of data which is important for HDFS service.
A user can set how many volumes will be attached to each instance in a
node group, and the size of each volume.
All volumes are attached during cluster creation/scaling operations.
Cluster scaling
---------------
Cluster scaling allows users to change the number of running instances
in a cluster without needing to recreate the cluster. Users may
increase or decrease the number of instances in node groups or add
new node groups to existing clusters.
If a cluster fails to scale properly, all changes will be rolled back.
Data-locality
-------------
It is extremely important for data processing to work locally (on the same rack,
OpenStack compute node or even VM). Hadoop supports the data-locality feature and can schedule jobs to
task tracker nodes that are local for input stream. In this case task tracker
could communicate directly with the local data node.
Sahara supports topology configuration for HDFS and Swift data sources.
It is extremely important for data processing applications to perform
work locally on the same rack, OpenStack Compute node, or virtual
machine. Hadoop supports a data-locality feature and can schedule jobs
to task tracker nodes that are local for the input stream. In this
manner the task tracker nodes can communicate directly with the local
data nodes.
To enable data-locality set ``enable_data_locality`` parameter to ``True`` in
Sahara configuration file
Sahara supports topology configuration for HDFS and Object Storage
data sources. For more information on configuring this option please
see the :ref:`data_locality_configuration` documentation.
.. sourcecode:: cfg
enable_data_locality=True
In this case two files with topology must be provided to Sahara.
Options ``compute_topology_file`` and ``swift_topology_file`` parameters
control location of files with compute and swift nodes topology descriptions
correspondingly.
``compute_topology_file`` should contain mapping between compute nodes and
racks in the following format:
.. sourcecode:: cfg
compute1 /rack1
compute2 /rack2
compute3 /rack2
Note that the compute node name must be exactly the same as configured in
OpenStack (``host`` column in admin list for instances).
``swift_topology_file`` should contain mapping between swift nodes and
racks in the following format:
.. sourcecode:: cfg
node1 /rack1
node2 /rack2
node3 /rack2
Note that the swift node names must be exactly the same as configured in the object.builder
swift ring. Also make sure that VMs with the task tracker service have direct access
to swift nodes.
Hadoop versions after 1.2.0 support four-layer topology
(https://issues.apache.org/jira/browse/HADOOP-8468). To enable this feature
set ``enable_hypervisor_awareness`` option to ``True`` in Sahara configuration
file. In this case Sahara will add the compute node ID as a second level of
topology for Virtual Machines.
Volume-to-instance locality
---------------------------
Having an instance and an attached volume on the same physical host can be very
helpful in order to achieve high-performance disk I/O. To achieve this,
volume-to-instance locality should be used.
Cinder has ``InstanceLocalityFilter`` which enables selection of a storage
back-end located on the host where the instance's hypervisor is running. It
allows volumes to be created on the same physical host as the instance.
To enable this functionality for instances of a specific node group, the
``volume_local_to_instance`` field in node group template should be set to
``True`` and some extra configurations are needed:
* Cinder-volume service should be launched on every physical host and at least
one physical host should run both cinder-scheduler and cinder-volume services.
* ``InstanceLocalityFilter`` should be added to the list of default filters
(``scheduler_default_filters`` in Cinder config).
* The Extended Server Attributes extension needs to be active in Nova
(this is true by default), so that the ``OS-EXT-SRV-ATTR:host`` property is
returned when requesting instance info.
* The user making the call needs to have sufficient rights for the property to
be returned by Nova.
This can be made:
* by changing Nova's ``policy.json`` (the ``extended_server_attributes`` option)
* by setting an account with privileged rights in Cinder config:
.. sourcecode:: cfg
os_privileged_user_name =
os_privileged_user_password =
os_privileged_user_tenant =
It should be noted that in a situation when the host has no space for volume
creation, the created volume will have an ``Error`` state and can not be used.
Security group management
-------------------------
Sahara allows you to control which security groups will be used for created
instances. This can be done by providing the ``security_groups`` parameter for
the Node Group or Node Group Template. By default an empty list is used that
will result in using the default security group.
Sahara may also create a security group for instances in the node group
automatically. This security group will only have open ports which are
required by instance processes or the Sahara engine. This option is useful
for development and secured from outside environments, but for production
environments it is recommended to control the security group policy manually.
Heat Integration
Distributed Mode
----------------
Sahara may use
`OpenStack Orchestration engine <https://wiki.openstack.org/wiki/Heat>`_
(aka Heat) to provision nodes for Hadoop cluster.
To make Sahara work with Heat the following steps are required:
The :doc:`installation.guide` suggests launching sahara as a single
``sahara-all`` process. It is also possible to run sahara in distributed
mode with ``sahara-api`` and ``sahara-engine`` processes running on several
machines simultaneously. Running in distributed mode allows sahara to
offload intensive tasks to the engine processes while keeping the API
process free to handle requests.
* Your OpenStack installation must have 'orchestration' service up and running
* Sahara must contain the following configuration parameter in *sahara.conf*:
.. sourcecode:: cfg
# An engine which will be used to provision infrastructure for Hadoop cluster. (string value)
infrastructure_engine=heat
There is a feature parity between direct and heat infrastructure engines. It is
recommended to use the heat engine since the direct engine will be deprecated at some
point.
Multi region deployment
-----------------------
Sahara supports multi region deployment. In this case, each instance of Sahara
should have the ``os_region_name=<region>`` property set in the
configuration file.
For an expanded discussion of configuring sahara to run in distributed
mode please see the :ref:`distributed-mode-configuration` documentation.
Hadoop HDFS High Availability
-----------------------------
Hadoop HDFS High Availability (HDFS HA) uses 2 Namenodes in an active/standby
architecture to ensure that HDFS will continue to work even when the active namenode fails.
The High Availability is achieved by using a set of JournalNodes and Zookeeper servers along
with ZooKeeper Failover Controllers (ZKFC) and some additional configurations and changes to
HDFS and other services that use HDFS.
Currently HDFS HA is only supported with the HDP 2.0.6 plugin. The feature is enabled through
a cluster_configs parameter in the cluster's JSON:
Hadoop HDFS High Availability (HDFS HA) provides an architecture to ensure
that HDFS will continue to work in the event of an active namenode failure.
It uses 2 namenodes in an active/standby configuration to provide this
availability.
High availability is achieved by using a set of journalnodes and Zookeeper
servers along with ZooKeeper Failover Controllers (ZKFC) and additional
configuration changes to HDFS and other services that use HDFS.
Currently HDFS HA is only supported with the HDP 2.0.6 plugin. The feature
is enabled through a ``cluster_configs`` parameter in the cluster's JSON:
.. sourcecode:: cfg
@ -252,9 +103,38 @@ a cluster_configs parameter in the cluster's JSON:
}
}
Networking support
------------------
Sahara supports both the nova-network and neutron implementations of
OpenStack Networking. By default sahara is configured to behave as if
the nova-network implementation is available. For OpenStack installations
that are using the neutron project please see :ref:`neutron-nova-network`.
Object Storage support
----------------------
Sahara can use OpenStack Object Storage (swift) to store job binaries and data
sources utilized by its job executions and clusters. In order to
leverage this support within Hadoop, including using Object Storage
for data sources for EDP, Hadoop requires the application of
a patch. For additional information about enabling this support,
including patching Hadoop and configuring sahara, please refer to
the :doc:`hadoop-swift` documentation.
Orchestration support
---------------------
Sahara may use the
`OpenStack Orchestration engine <https://wiki.openstack.org/wiki/Heat>`_
(heat) to provision nodes for clusters. For more information about
enabling Orchestration usage in sahara please see
:ref:`orchestration-configuration`.
Plugin Capabilities
-------------------
The below tables provides a plugin capability matrix:
The following table provides a plugin capability matrix:
+--------------------------+---------+----------+----------+-------+
| | Plugin |
@ -274,111 +154,34 @@ The below tables provides a plugin capability matrix:
| EDP | x | x | x | x |
+--------------------------+---------+----------+----------+-------+
Running Sahara in Distributed Mode
----------------------------------
Security group management
-------------------------
The :doc:`installation.guide` suggests to launch
Sahara as a single 'sahara-all' process. It is also possible to run Sahara
in distributed mode with 'sahara-api' and 'sahara-engine' processes running
on several machines simultaneously.
.. TODO (mimccune)
This section could use an example to show how security groups are
used.
Sahara-api works as a front-end and serves users' requests. It
offloads 'heavy' tasks to the sahara-engine via RPC mechanism. While the
sahara-engine could be loaded, sahara-api by design stays free
and hence may quickly respond on user queries.
Sahara allows you to control which security groups will be used for created
instances. This can be done by providing the ``security_groups`` parameter for
the node group or node group template. The default for this option is an
empty list, which will result in the default project security group being
used for the instances.
If Sahara runs on several machines, the API requests could be
balanced between several sahara-api instances using a load balancer.
It is not required to balance load between different sahara-engine
instances, as that will be automatically done via a message queue.
Sahara may also create a security group for instances in the node group
automatically. This security group will only contain open ports for required
instance processes and the sahara engine. This option is useful
for development and for when your installation is secured from outside
environments. For production environments we recommend controlling the
security group policy manually.
If a single machine goes down, others will continue serving
users' requests. Hence a better scalability is achieved and some
fault tolerance as well. Note that the proposed solution is not
a true High Availability. While failure of a single machine does not
affect work of other machines, all of the operations running on
the failed machine will stop. For example, if a cluster
scaling is interrupted, the cluster will be stuck in a half-scaled state.
The cluster will probably continue working, but it will be impossible
to scale it further or run jobs on it via EDP.
Volume-to-instance locality
---------------------------
To run Sahara in distributed mode pick several machines on which
you want to run Sahara services and follow these steps:
Having an instance and an attached volume on the same physical host can
be very helpful in order to achieve high-performance disk I/O operations.
To achieve this, sahara provides access to the Block Storage
volume instance locality functionality.
* On each machine install and configure Sahara using the
`installation guide <../installation.guide.html>`_
except:
* Do not run 'sahara-db-manage' or launch Sahara with 'sahara-all'
* Make sure sahara.conf provides database connection string to a
single database on all machines.
* Run 'sahara-db-manage' as described in the installation guide,
but only on a single (arbitrarily picked) machine.
* sahara-api and sahara-engine processes use oslo.messaging to
communicate with each other. You need to configure it properly on
each node (see below).
* run sahara-api and sahara-engine on the desired nodes. On a node
you can run both sahara-api and sahara-engine or you can run them on
separate nodes. It does not matter as long as they are configured
to use the same message broker and database.
To configure oslo.messaging, first you need to pick the driver you are
going to use. Right now three drivers are provided: Rabbit MQ, Qpid or Zmq.
To use Rabbit MQ or Qpid driver, you will have to setup messaging broker.
The picked driver must be supplied in ``sahara.conf`` in
``[DEFAULT]/rpc_backend`` parameter. Use one the following values:
``rabbit``, ``qpid`` or ``zmq``. Next you have to supply
driver-specific options.
Unfortunately, right now there is no documentation with a description of
drivers' configuration. The options are available only in source code.
* For Rabbit MQ see
* rabbit_opts variable in `impl_rabbit.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/impl_rabbit.py?id=1.4.0#n38>`_
* amqp_opts variable in `amqp.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/amqp.py?id=1.4.0#n37>`_
* For Qpid see
* qpid_opts variable in `impl_qpid.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/impl_qpid.py?id=1.4.0#n40>`_
* amqp_opts variable in `amqp.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/amqp.py?id=1.4.0#n37>`_
* For Zmq see
* zmq_opts variable in `impl_zmq.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/impl_zmq.py?id=1.4.0#n49>`_
* matchmaker_opts variable in `matchmaker.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/matchmaker.py?id=1.4.0#n27>`_
* matchmaker_redis_opts variable in `matchmaker_redis.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/matchmaker_redis.py?id=1.4.0#n26>`_
* matchmaker_opts variable in `matchmaker_ring.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/matchmaker_ring.py?id=1.4.0#n27>`_
You can find the same options defined in ``sahara.conf.sample``. You can use
it to find section names for each option (matchmaker options are
defined not in ``[DEFAULT]``)
Managing instances with limited access
--------------------------------------
.. warning::
The indirect VMs access feature is in alpha state. We do not
recommend using it in a production environment.
Sahara needs to access instances through ssh during a Cluster setup. This
could be obtained by a number of ways (see :ref:`neutron-nova-network`,
:ref:`floating_ip_management`, :ref:`custom_network_topologies`). But
sometimes it is impossible to provide access to all nodes (because of limited
numbers of floating IPs or security policies). In this case
access can be gained using other nodes of the cluster. To do that set
``is_proxy_gateway=True`` for the node group you want to use as proxy. In this
case Sahara will communicate with all other instances via instances of this
node group.
Note, if ``use_floating_ips=true`` and the cluster contains a node group with
``is_proxy_gateway=True``, requirement to have ``floating_ip_pool`` specified
is applied only to the proxy node group. Other instances will be accessed via
proxy instances using standard private network.
Note, Cloudera hadoop plugin doesn't support access to Cloudera manager via
proxy node. This means that for CDH clusters only the node with the manager
could be a proxy gateway node.
For more information on using volume instance locality with sahara,
please see the :ref:`volume_instance_locality_configuration`
documentation.