Configure Compute service groups

To effectively manage and utilize compute nodes, the Compute service must know their statuses. For example, when a user launches a new VM, the Compute scheduler sends the request to a live node; the Compute service queries the ServiceGroup API to find out whether a node is alive. When a compute worker (running the nova-compute daemon) starts, it calls the join API to join the compute group. Any interested service (for example, the scheduler) can query the group's membership and the status of its nodes. Internally, the ServiceGroup client driver automatically updates the compute worker status. The database, ZooKeeper, and Memcache drivers are available.
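The join and query calls are part of nova's internal Python API rather than a REST endpoint. As a rough sketch only (the module path and method names follow the nova.servicegroup and nova.db modules of this release series; treat the exact signatures as assumptions and verify them against the release you run), a scheduler-like service could check liveness along these lines:

    # Rough sketch only -- uses nova's *internal* Python API, not a REST
    # endpoint.  Module paths and signatures are assumptions based on the
    # nova code base of this release series.
    from nova import context
    from nova import db
    from nova.servicegroup import api as servicegroup_api

    servicegroup = servicegroup_api.API()   # loads the driver named by servicegroup_driver

    ctxt = context.get_admin_context()
    services = db.service_get_all_by_topic(ctxt, 'compute')   # all nova-compute records

    # service_is_up() asks the configured driver (db, zk, or mc) whether the
    # worker has checked in recently enough to be considered alive.
    live_hosts = [s['host'] for s in services if servicegroup.service_is_up(s)]
    print(live_hosts)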
Database ServiceGroup driver

By default, Compute uses the database driver to track node liveness. In a compute worker, this driver periodically sends a database update command, saying "I'm OK" with a timestamp. Compute uses a pre-defined timeout (service_down_time) to determine whether a node is dead. The driver has limitations, which can be an issue depending on your setup. The more compute worker nodes you have, the more pressure you put on the database. By default, the timeout is 60 seconds, so it might take some time to detect node failures. You could reduce the timeout value, but you must also make the database updates more frequent, which again increases the database workload. The database contains data that is both transient (whether the node is alive) and persistent (for example, entries for VM owners). With the ServiceGroup abstraction, Compute can treat each type separately.
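For illustration, the database driver is tuned from the [DEFAULT] section of the /etc/nova/nova.conf file; the values shown below are the usual defaults:

    # Driver for the ServiceGroup service (the default)
    servicegroup_driver="db"

    # Seconds between a node reporting its state to the database
    report_interval=10

    # Maximum time since last check-in for a node to be considered up
    service_down_time=60

Keep service_down_time comfortably larger than report_interval so that a single missed report does not mark a healthy node as dead.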
ZooKeeper ServiceGroup driver

The ZooKeeper ServiceGroup driver works by using ZooKeeper ephemeral nodes. ZooKeeper, in contrast to databases, is a distributed system whose load is divided among several servers. On a compute worker node, after establishing a ZooKeeper session, the driver creates an ephemeral znode in the group directory. Ephemeral znodes have the same lifespan as the session: if the worker node or the nova-compute daemon crashes, or a network partition separates the worker from the ZooKeeper server quorum, the ephemeral znode is removed automatically. The driver gets the group membership by running the ls command in the group directory.

To use the ZooKeeper driver, you must install ZooKeeper servers and client libraries. Setting up ZooKeeper servers is outside the scope of this guide (for more information, see Apache ZooKeeper). You must also install two client-side Python libraries on every nova node: python-zookeeper, the official ZooKeeper Python binding, and evzookeeper, a library that makes the binding work with the eventlet threading model.

The following example assumes the ZooKeeper server addresses and ports are 192.168.2.1:2181, 192.168.2.2:2181, and 192.168.2.3:2181. The following values in the /etc/nova/nova.conf file (on every node) are required for the ZooKeeper driver:

    # Driver for the ServiceGroup service
    servicegroup_driver="zk"

    [zookeeper]
    address="192.168.2.1:2181,192.168.2.2:2181,192.168.2.3:2181"

To customize the Compute service groups further, adjust the remaining configuration options in the [zookeeper] section of the nova.conf file.
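The ephemeral-znode mechanism itself is easy to demonstrate outside of nova. The sketch below uses the kazoo client purely for illustration (the driver relies on python-zookeeper and evzookeeper, as noted above), and the /servicegroups/compute path and host name are invented for the example:

    # Illustration of ephemeral-znode group membership, not the nova driver
    # itself.  Uses the kazoo client library; the znode path and host name
    # are made up for this example.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts='192.168.2.1:2181,192.168.2.2:2181,192.168.2.3:2181')
    zk.start()

    # Each worker creates an ephemeral znode; it disappears automatically
    # when the session ends (process crash, network partition, ...).
    zk.create('/servicegroups/compute/compute-host-1', b'alive',
              ephemeral=True, makepath=True)

    # Any other client can list the group directory to see which members
    # are alive, which is what the driver's "ls" amounts to.
    print(zk.get_children('/servicegroups/compute'))

    zk.stop()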
Memcache ServiceGroup driver

The memcache ServiceGroup driver uses memcached, which is a distributed memory object caching system that is often used to increase site performance. For more details, see memcached.org.

To use the memcache driver, you must install memcached. However, because memcached is often used for both OpenStack Object Storage and the OpenStack dashboard, it might already be installed. If memcached is not installed, refer to the OpenStack Installation Guide for more information.

The following values in the /etc/nova/nova.conf file (on every node) are required for the memcache driver:

    # Driver for the ServiceGroup service
    servicegroup_driver="mc"

    # Memcached servers. Use either a list of memcached servers to use for caching (list value),
    # or "<None>" for in-process caching (default).
    memcached_servers=<None>

    # Timeout; maximum time since last check-in for up service (integer value).
    # Helps to define whether a node is dead
    service_down_time=60
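The underlying idea is a key with a short expiry: each worker keeps refreshing its own key, and a node whose key has expired or gone stale is treated as down. A minimal illustration of that mechanism with the python-memcached client (not the driver code itself; the key naming is made up here):

    # Illustration of the check-in mechanism, not the nova driver itself.
    # Uses the python-memcached client; key names are made up for this example.
    import time
    import memcache

    SERVICE_DOWN_TIME = 60
    mc = memcache.Client(['192.168.2.1:11211'])

    # Worker side: periodically refresh "I'm OK" with a timestamp.
    mc.set('compute:compute-host-1', time.time(), time=SERVICE_DOWN_TIME)

    # Querying side: a missing or stale key means the node is down.
    last_seen = mc.get('compute:compute-host-1')
    alive = last_seen is not None and (time.time() - last_seen) < SERVICE_DOWN_TIME
    print(alive)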