4.9 KiB
Sahara Cluster Statuses Overview
All Sahara Cluster operations are performed in multiple steps. A
Cluster object has a Status
attribute which changes when
Sahara finishes one step of operations and starts another one. Also a
Cluster object has a Status description
attribute which
changes whenever Cluster errors occur.
- Sahara supports three types of Cluster operations:
-
- Create a new Cluster
- Scale/Shrink an existing Cluster
- Delete an existing Cluster
Creating a new Cluster
1. Validating
Before performing any operations with OpenStack environment, Sahara validates user input.
- There are two types of validations, that are done:
-
- Check that a request contains all necessary fields and that the request does not violate any constraints like unique naming, etc.
- Plugin check (optional). The provisioning Plugin may also perform any specific checks like a Cluster topology validation check.
If any of the validations fails during creating, the Cluster object
will still be kept in the database with an Error
status. If
any validations fails during scaling the Active
Cluster, it
will be kept with an Active
status. In both cases status
description will contain error messages about the reasons of
failure.
2. InfraUpdating
This status means that the Provisioning plugin is performing some infrastructure updates.
3. Spawning
- Sahara sends requests to OpenStack for all resources to be created:
-
- VMs
- Volumes
- Floating IPs (if Sahara is configured to use Floating IPs)
It takes some time for OpenStack to schedule all the required VMs and
Volumes, so sahara will wait until all of the VMs are in an
Active
state.
4. Waiting
Sahara waits while VMs' operating systems boot up and all internal infrastructure components like networks and volumes are attached and ready to use.
5. Preparing
Sahara prepares a Cluster for starting. This step includes generating
the /etc/hosts
file or changing
/etc/resolv.conf
file (if you use Designate service), so
that all instances can access each other by a hostname. Also Sahara
updates the authorized_keys
file on each VM, so that VMs
can communicate without passwords.
6. Configuring
Sahara pushes service configurations to VMs. Both XML and JSON based configurations and environmental variables are set on this step.
7. Starting
Sahara is starting Hadoop services on Cluster's VMs.
8. Active
Active status means that a Cluster has started successfully and is ready to run EDP Jobs.
Scaling/Shrinking an existing Cluster
1. Validating
Sahara checks the scale/shrink request for validity. The Plugin method called for performing Plugin specific checks is different from the validation method in creation.
2. Scaling
Sahara performs database operations updating all affected existing Node Groups and creating new ones to join the existing Node Groups.
3. Adding Instances
Status is similar to Spawning
in Cluster creation.
Sahara adds required amount of VMs to the existing Node Groups and
creates new Node Groups.
4. Configuring
Status is similar to Configuring
in Cluster creation.
New instances are being configured in the same manner as already
existing ones. The VMs in the existing Cluster are also updated with a
new /etc/hosts
file or /etc/resolv.conf
file.
5. Decommissioning
Sahara stops Hadoop services on VMs that will be deleted from a Cluster. Decommissioning a Data Node may take some time because Hadoop rearranges data replicas around the Cluster, so that no data will be lost after that Data Node is deleted.
6. Deleting Instances
- Sahara sends requests to OpenStack to release unneeded resources:
-
- VMs
- Volumes
- Floating IPs (if they are used)
7. Active
The same Active
status as after Cluster creation.
Deleting an existing Cluster
1. Deleting
The only step, that releases all Cluster's resources and removes it from the database.
Error State
If the Cluster creation fails, the Cluster will enter the
Error
state. This status means the Cluster may not be able
to perform any operations normally. This cluster will stay in the
database until it is manually deleted. The reason for failure may be
found in the sahara logs. Also, the status description will contain
information about the error.
If an error occurs during the Adding Instances
operation, Sahara will first try to rollback this operation. If a
rollback is impossible or fails itself, then the Cluster will also go
into an Error
state. If a rollback was successful, Cluster
will get into an Active
state and status description will
contain a short message about the reason of
Adding Instances
failure.