diff --git a/doc/admin_api/about.rst b/doc/admin_api/about.rst index 83cb5ac9..3123352d 100644 --- a/doc/admin_api/about.rst +++ b/doc/admin_api/about.rst @@ -4,8 +4,11 @@ Description Purpose ------- -The Admin API server listens for REST+JSON connections to interface various -parts of the LBaaS system and other scripts with the LBaaS database state. +The Admin API server listens for REST+JSON connections to provide information +about the state of Libra to external systems. + +Additionally the Admin API has several schedulers which automatically maintain +the health of the Libra system and the connected Load Balancer devices. Design ------ @@ -14,3 +17,6 @@ Similar to the main API server it uses an Eventlet WSGI web server frontend with Pecan+WSME to process requests. SQLAlchemy+MySQL is used to access the data store. The main internal difference (apart from the API itself) is the Admin API server doesn't use keystone or gearman. + +It spawns several scheduled threads to run tasks such as building new devices +for the pool, monitoring load balancer devices and maintaining IP addresses. diff --git a/doc/admin_api/config.rst b/doc/admin_api/config.rst index e6c2e91f..e12b6301 100644 --- a/doc/admin_api/config.rst +++ b/doc/admin_api/config.rst @@ -16,6 +16,7 @@ Configuration File db_section=mysql1 ssl_certfile=/opt/server.crt ssl_keyfile=/opt/server.key + gearman=127.0.0.1:4730 [mysql1] host=localhost @@ -39,7 +40,7 @@ Command Line Options The port number to listen on, default is 8889 - .. option:: --db_secions + .. option:: --db_sections Config file sections that describe the MySQL servers. This option can be specified multiple times for Galera or NDB clusters. @@ -90,11 +91,6 @@ Command Line Options How long to wait until we consider the second and final ping check failed. Default is 30 seconds. - .. option:: --stats_repair_timer - - How often to run a check to see if damaged load balancers had been - repaired (in seconds), default 180 - .. 
option:: --number_of_servers The number of Admin API servers in the system. @@ -123,3 +119,17 @@ Command Line Options A list of tags to be used for the datadog driver + .. option:: --node_pool_size + + The number of hot spare load balancer devices to keep in the pool, + default 10 + + .. option:: --vip_pool_size + + The number of hot spare floating IPs to keep in the pool, default 10 + + .. option:: --expire_days + + The number of days before DELETED load balancers are purged from the + database. The purge is run every 24 hours. Purge is not run if no + value is provided. diff --git a/doc/admin_api/index.rst b/doc/admin_api/index.rst index f6137510..bf69ee7e 100644 --- a/doc/admin_api/index.rst +++ b/doc/admin_api/index.rst @@ -6,4 +6,5 @@ Libra Admin API Server about config + schedulers api diff --git a/doc/admin_api/schedulers.rst b/doc/admin_api/schedulers.rst new file mode 100644 index 00000000..ae265268 --- /dev/null +++ b/doc/admin_api/schedulers.rst @@ -0,0 +1,66 @@ +Admin Schedulers +================ + +The Admin API has several schedulers to maintain the health of the Libra +system. This section of the document goes into detail about each one. + +Each Admin API server takes its turn to run these tasks. Which server is +next is determined by the :option:`--number_of_servers` and +:option:`--server_id` options. + +Stats Scheduler +--------------- + +This scheduler is actually a monitoring scheduler and at a later date will also +gather statistics for billing purposes. It is executed once a minute. + +It sends a gearman message to each active Load Balancer device. There are three +possible outcomes from the results: + +#. If all is good, no action is taken +#. If a node connected to a load balancer has failed, the node is marked as + ERROR and the load balancer is marked as DEGRADED +#. If a device has failed, the load balancer will automatically be rebuilt on a new + device and the associated floating IP will be re-pointed to that device.
The + old device will be marked for deletion + +Delete Scheduler +---------------- + +This scheduler looks out for any devices marked for deletion after use or after +an error state. It is executed once a minute. + +It sends a gearman message to the Pool Manager to delete any devices that are +to be deleted and removes them from the database. + +Create Scheduler +---------------- + +This scheduler takes a look at the number of hot spare devices available. It +is executed once a minute (after the delete scheduler). + +If the number of available hot spare devices falls below the value specified by +:option:`--node_pool_size` it will request that new devices are built and those +devices will be added to the database. It records how many are currently being +built so that long build times don't mean multiple Admin APIs are trying to fulfil +the same quota. + +VIP Scheduler +------------- + +This scheduler takes a look at the number of hot spare floating IPs available. +It is executed once a minute. + +If the number of available floating IP addresses falls below the value specified +by :option:`--vip_pool_size` it will request that new IPs are built and those +will be added to the database. + +Expunge Scheduler +----------------- + +This scheduler removes logical Load Balancers marked as DELETED from the +database. It is executed once a day. + +The DELETED logical Load Balancers remain in the database mainly for billing +purposes. This clears out any that were deleted after the number of days +specified by :option:`--expire_days`. diff --git a/doc/api/api.rst b/doc/api/api.rst index bd2242f8..64b69f1e 100644 --- a/doc/api/api.rst +++ b/doc/api/api.rst @@ -319,6 +319,8 @@ by the LBaaS service.
+-----------------+------------------------------------------------------------+----------+-----------------------------------------------------------------+ | Virtual IP | :ref:`Get list of virtual IPs ` | GET | {baseURI}/{ver}/loadbalancers/{loadbalancerId}/virtualips | +-----------------+------------------------------------------------------------+----------+-----------------------------------------------------------------+ +| Logs | :ref:`Archive log file to Object Storage ` | POST | {baseURI}/{ver}/loadbalancers/{loadbalancerId}/logs | ++-----------------+------------------------------------------------------------+----------+-----------------------------------------------------------------+ 5.2 Common Request Headers ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -2270,6 +2272,74 @@ if not found. ] } +.. _api-logs: + +22. Archive log file to Object Storage +-------------------------------------- + +22.1 Operation +~~~~~~~~~~~~~~ + ++----------+------------------------------------+--------+-----------------------------------------------------+ +| Resource | Operation | Method | Path | ++==========+====================================+========+=====================================================+ +| Logs | Archive log file to Object Storage | POST | {baseURI}/{ver}/loadbalancers/{loadbalancerId}/logs | ++----------+------------------------------------+--------+-----------------------------------------------------+ + +22.2 Description +~~~~~~~~~~~~~~~~ + +The operation tells the load balancer to push the current log file into an HP Cloud Object Storage container. The status of the load balancer will be set to 'PENDING_UPDATE' during the operation and back to 'ACTIVE' upon success or failure. A success/failure message can be found in the 'statusDescription' field when getting the load balancer details. 
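+As an illustration, the archive operation above can be driven with a short
+Python sketch. Everything endpoint-related here is a placeholder, not a real
+value: substitute the API host, version segment, load balancer ID and auth
+token in use.
+
+.. code-block:: python
+
```python
import json

# All endpoint details below are placeholders, not real values: substitute
# the API host, version segment, load balancer ID and auth token in use.
base_uri = "https://lbaas.example.com"
ver = "v1.1"
lb_id = "12345"
token = "HPAuth_d17efd"

url = "%s/%s/loadbalancers/%s/logs" % (base_uri, ver, lb_id)
headers = {"X-Auth-Token": token, "Content-Type": "application/json"}

# Optional body redirecting the archive to a different object store;
# an empty POST body uses the 'lbaaslogs' container defaults instead.
body = json.dumps({
    "objectStoreBasePath": "mylblogs",
    "objectStoreEndpoint": "https://region-b.geo-1.objects.hpcloudsvc.com:443/v1/1234567890123",
    "authToken": token,
})

# A real client would now POST `body` to `url` with `headers`
# (e.g. using http.client or the requests library).
print(url)
```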
+ +**Load Balancer Status Values** + ++----------------+---------------+--------------------------------+ +| Status | Name | Description | ++================+===============+================================+ +| ACTIVE | Load balancer | is in an operational state | ++----------------+---------------+--------------------------------+ +| PENDING_UPDATE | Load balancer | is in the process of an update | ++----------------+---------------+--------------------------------+ + +By default, with empty POST data, the load balancer will upload to the swift account owned by the same tenant as the load balancer in a container called 'lbaaslogs'. To change this, the following optional parameters need to be provided in the POST body: + +**objectStoreBasePath** : the object store container to use + +**objectStoreEndpoint** : the object store endpoint to use including tenantID, for example: https://region-b.geo-1.objects.hpcloudsvc.com:443/v1/1234567890123 + +**authToken** : an authentication token to the object store for the load balancer to use + +22.3 Request Data +~~~~~~~~~~~~~~~~~ + +The caller is required to provide request data with the POST which includes the appropriate information to upload logs. + +22.4 Query Parameters Supported +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +None required. + +22.5 Required HTTP Header Values +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**X-Auth-Token** + +22.6 Request Body +~~~~~~~~~~~~~~~~~ + +The request body must follow the correct format for log archiving, as in the example below. + +A request that uploads the logs to a different object store: + +:: + + { + "objectStoreBasePath": "mylblogs", + "objectStoreEndpoint": "https://region-b.geo-1.objects.hpcloudsvc.com:443/v1/1234567890123", + "authToken": "HPAuth_d17efd" + } + + + Features Currently Not Implemented or Supported ----------------------------------------------- @@ -2281,7 +2351,3 @@ The following features are not supported. advertised in /protocols request.
Instead TCP will be used for port 443 and the HTTPS connections will be passed through the load balancer with no termination at the load balancer. - -3. The ability to list deleted load balancers is not yet supported. - - diff --git a/doc/api/config.rst b/doc/api/config.rst index 0031fe2e..18b06775 100644 --- a/doc/api/config.rst +++ b/doc/api/config.rst @@ -97,11 +97,6 @@ Command Line Options The path for the SSL key file to be used for the frontend of the API server - .. option:: --expire_days - - Deleted Load Balancers older than this number of days will be expunged - from the database using a sceduler that is executed every 24 hours. - .. option:: --ip_filters A mask of IP addresses to filter for backend nodes in the form diff --git a/doc/index.rst b/doc/index.rst index 858b5e58..2f7c15a3 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -7,7 +7,6 @@ Load Balancer as a Service Device Tools introduction worker/index pool_mgm/index - statsd/index api/index admin_api/index config diff --git a/doc/libralayout.png b/doc/libralayout.png index 2f9cfcbb..aaca3947 100644 Binary files a/doc/libralayout.png and b/doc/libralayout.png differ diff --git a/doc/pool_mgm/about.rst b/doc/pool_mgm/about.rst index e74cef37..97e6f3f8 100644 --- a/doc/pool_mgm/about.rst +++ b/doc/pool_mgm/about.rst @@ -4,15 +4,16 @@ Description Purpose ------- -The Libra Node Pool manager is designed to keep a constant pool of spare load -balancer nodes so that when a new one is needed it simply needs configuring. -This saves on time needed to spin up new nodes upon customer request and extra -delays due to new nodes failing. +The Libra Node Pool manager is designed to communicate with OpenStack Nova or +any other compute API to provide nodes and floating IPs to the Libra system +for use. It does this by providing a gearman worker interface to the Nova +API. This means you can have multiple pool managers running and gearman will +decide on the next available pool manager to take a job.
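+The request/reply flow between a component and the pool manager worker can be
+sketched in Python. This is illustrative only: the job server address and the
+commented-out ``python-gearman`` calls are assumptions, and the reply shown is
+a canned example rather than real worker output.
+
+.. code-block:: python
+
```python
import json

# A component asks the pool manager for a new device by submitting a JSON
# job to the gearman worker registered as "libra_pool_mgm".
request = json.dumps({"action": "BUILD_DEVICE"})

# The actual submission (commented out) would use a gearman client, e.g.:
#   client = gearman.GearmanClient(["127.0.0.1:4730"])
#   job = client.submit_job("libra_pool_mgm", request)
#   reply = json.loads(job.result)

# A successful reply echoes the request and adds "response": "PASS".
# Canned example reply, for illustration only:
reply = json.loads('{"action": "BUILD_DEVICE", "response": "PASS", '
                   '"name": "libra-haproxy-example", "addr": "15.185.175.81"}')

if reply["response"] != "PASS":
    raise RuntimeError("pool manager could not build a device")
print(reply["name"], reply["addr"])
```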
Design ------ -It is designed to probe the API server every X minutes (5 by default) to find -out how many free nodes there are. If this falls below a certain defined level -the pool manager will spin up new nodes and supply their details to the -API server. +It is designed to accept requests from the Libra components to manipulate Nova +instances and floating IPs. It is a daemon which is a gearman worker. Any +commands sent to that worker are converted into Nova commands and the results +are sent back to the client. diff --git a/doc/pool_mgm/code.rst b/doc/pool_mgm/code.rst deleted file mode 100644 index f27dc63e..00000000 --- a/doc/pool_mgm/code.rst +++ /dev/null @@ -1,131 +0,0 @@ -Code Walkthrough -================ - -Here we'll highlight some of the more important code aspects. - -Server Class ------------- -.. py:module:: libra.mgm.mgm - -.. py:class:: Server(logger, args) - - This class is the main server activity once it has started in either - daemon on non-daemon mode. - - :param logger: An instance of :py:class:`logging.logger` - :param args: An instance of :py:class:`libra.common.options.Options` - - .. py:method:: main() - - Sets the signal handler and then called :py:meth:`check_nodes` - - .. py:method:: check_nodes() - - Runs a check to see if new nodes are needed. Called once by - :py:meth:`main` at start and then called by the scheduler. - It also restarts the scheduler at the end of execution - - .. py:method:: reset_scheduler() - - Uses :py:class:`threading.Timer` to set the next scheduled execution of - :py:meth:`check_nodes` - - .. py:method:: build_nodes(count, api) - - Builds the required number of nodes determined by - :py:meth:`check_nodes`. - - :param count: The number of nodes to build - :param api: A driver derived from the :py:class:`MgmDriver` parent class - - .. py:method:: exit_handler(signum, frame) - - The signal handler function. 
Clears the signal handler and calls - :py:meth:`shutdown` - - :param signum: The signal number - :param frame: The stack frame - - .. py:method:: shutdown(error) - - Causes the application to exit - - :param error: set to True if an error caused shutdown - :type error: boolean - -Node Class ----------- - -.. py:module:: libra.mgm.node - -.. py:class:: Node(username, password, tenant, auth_url, region, keyname, secgroup, image, node_type) - - This class uses :py:class:`novaclient.client` to manipulate Nova nodes - - :param username: The Nova username - :param password: The Nova password - :param tenant: The Nova tenant - :param auth_url: The Nova authentication URL - :param region: The Nova region - :param keyaname: The Nova key name for new nodes - :param secgroup: The Nova security group for new nodes - :param image: The Nova image ID or name for new nodes - :param node_type: The flavor ID or name for new nodes - - .. py:method:: build() - - Creates a new Nova node and tests that it is running. It will poll - every 3 seconds for 2 minutes to check if the node is running. - - :return: True and status dictionary for success, False and error for fail - -MgmDriver Class ---------------- - -.. py:module:: libra.mgm.drivers.base - -.. py:class:: MgmDriver - - The defines the API for interacting with various API servers. Drivers for - these API servers should inherit from this class and implement the relevant - API methods that it can support. - `This is an abstract class and is not meant to be instantiated directly.` - - .. py:method:: get_free_count() - - Gets the number of free nodes. This is used to calculate if more nodes - are needed - - :return: the number of free nodes - - .. py:method:: add_node(name, address) - - Adds the node details for a new device to the API server. - - :param name: the new name for the node - :param address: the new public IP address for the node - :return: True or False and the JSON response (if any) - - .. 
py:method:: is_online() - - Check to see if the driver has access to a valid API server - - :return: True or False - - .. py:method:: get_url() - - Gets the URL for the current API server - - :return: the URL for the current API server - -Known Drivers Dictionary ------------------------- - -.. py:data:: known_drivers - - This is the dictionary that maps values for the - :option:`--driver ` option - to a class implementing the driver :py:class:`~MgmDriver` API - for that API server. After implementing a new driver class, you simply add - a new entry to this dictionary to plug in the new driver. - diff --git a/doc/pool_mgm/commands.rst b/doc/pool_mgm/commands.rst new file mode 100644 index 00000000..667600ae --- /dev/null +++ b/doc/pool_mgm/commands.rst @@ -0,0 +1,154 @@ +Gearman Commands +================ + +The Pool Manager registers as the worker name ``libra_pool_mgm`` on the gearman +servers. Using this it accepts the JSON requests outlined in this document. + +In all cases it will return the original message along with the following for +success: + +.. code-block:: json + + { + "response": "PASS" + } + +And this for failure: + +.. code-block:: json + + { + "response": "FAIL" + } + +BUILD_DEVICE +------------ + +This command sends the Nova ``boot`` command using the Nova API and returns +details about the resulting new Nova instance. Details about which image and +other Nova settings to use are configured using the options or config file for +Pool Manager. + +Example: + +.. code-block:: json + + { + "action": "BUILD_DEVICE" + } + +Response: + +.. code-block:: json + + { + "action": "BUILD_DEVICE", + "response": "PASS", + "name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9", + "addr": "15.185.175.81", + "type": "basename: libra-stg-haproxy, image: 12345", + "az": "3" + } + +DELETE_DEVICE +------------- + +This command requests that a Nova instance be deleted. + +Example: + +.. 
code-block:: json + + { + "action": "DELETE_DEVICE", + "name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9" + } + +Response: + +.. code-block:: json + + { + "action": "DELETE_DEVICE", + "name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9", + "response": "PASS" + } + +BUILD_IP +-------- + +This command requests a floating IP from Nova. + +Example: + +.. code-block:: json + + { + "action": "BUILD_IP" + } + +Response: + +.. code-block:: json + + { + "action": "BUILD_IP", + "response": "PASS", + "id": "12345", + "ip": "15.185.234.125" + } + +ASSIGN_IP +--------- + +This command assigns floating IP addresses to Nova instances (by name of +instance). + +Example: + +.. code-block:: json + + { + "action": "ASSIGN_IP", + "ip": "15.185.234.125", + "name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9" + } + +Response: + +.. code-block:: json + + { + "action": "ASSIGN_IP", + "ip": "15.185.234.125", + "name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9", + "response": "PASS" + } + +REMOVE_IP +--------- + +This command removes a floating IP address from a Nova instance, preserving +the IP address so that it can be used another time. + +Example: + +.. code-block:: json + + { + "action": "REMOVE_IP", + "ip": "15.185.234.125", + "name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9" + } + +Response: + +.. code-block:: json + + { + "action": "REMOVE_IP", + "ip": "15.185.234.125", + "name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9", + "response": "PASS" + } + diff --git a/doc/pool_mgm/config.rst b/doc/pool_mgm/config.rst index 1bbb824b..095cd8f4 100644 --- a/doc/pool_mgm/config.rst +++ b/doc/pool_mgm/config.rst @@ -25,40 +25,13 @@ Configuration File nova_secgroup = default nova_image = 12345 nova_image_size = standard.medium - api_server = 10.0.0.1:8889 10.0.0.2:8889 - nodes = 10 - check_interval = 5 - submit_interval = 15 + gearman=127.0.0.1:4730 node_basename = 'libra' Command Line Options -------------------- ..
program:: libra_pool_mgm - .. option:: --api_server - - The hostname/IP and port colon separated pointed to an Admin API server - for use with the HP REST API driver. Can be specified multiple times for - multiple servers - - .. option:: --check_interval - - How often to check the API server to see if new nodes are needed - (value is minutes) - - .. option:: --submit_interval - - How often to check the list of nodes to see if the nodes - are now in a good state (value is in minutes) - - .. option:: --driver - - API driver to use. Valid driver options are: - - * *hp_rest* - HP REST API, talks to the HP Cloud API server (based - on Atlas API) - This is the default driver. - .. option:: --datadir The data directory used to store things such as the failed node list. @@ -73,10 +46,6 @@ Command Line Options A name to prefix the UUID name given to the nodes the pool manager generates. - .. option:: --nodes - - The size of the pool of spare nodes the pool manager should keep. - .. option:: --nova_auth_url The URL used to authenticate for the Nova API @@ -114,3 +83,20 @@ Command Line Options The flavor ID (image size ID) or name to use for new nodes spun up in the Nova API + .. option:: --gearman_ssl_ca + + The path for the Gearman SSL Certificate Authority. + + .. option:: --gearman_ssl_cert + + The path for the Gearman SSL certificate. + + .. option:: --gearman_ssl_key + + The path for the Gearman SSL key. + + .. option:: --gearman + + Used to specify the Gearman job server hostname and port. 
This option + can be used multiple times to specify multiple job servers + diff --git a/doc/pool_mgm/index.rst b/doc/pool_mgm/index.rst index 0b011b3b..6b3d9b19 100644 --- a/doc/pool_mgm/index.rst +++ b/doc/pool_mgm/index.rst @@ -6,4 +6,4 @@ Libra Node Pool Manager about config - code + commands diff --git a/doc/sources/libralayout.odg b/doc/sources/libralayout.odg index ebb82800..579f46bd 100644 Binary files a/doc/sources/libralayout.odg and b/doc/sources/libralayout.odg differ diff --git a/etc/sample_libra.cfg b/etc/sample_libra.cfg index 00398c74..c3bd5afc 100644 --- a/etc/sample_libra.cfg +++ b/etc/sample_libra.cfg @@ -51,33 +51,20 @@ nova_keyname = default nova_secgroup = default nova_image = 12345 nova_image_size = standard.medium -api_server = 10.0.0.1:8889 10.0.0.2:8889 -nodes = 10 -check_interval = 5 -submit_interval = 15 node_basename = 'libra' az = 1 - -[statsd] -api_server=127.0.0.1:8889 -server=127.0.0.1:4730 -logfile=/tmp/statsd.log -pid=/tmp/statsd.pid -driver=dummy datadog hp_rest -datadog_api_key=0987654321 -datadog_app_key=1234567890 -datadog_message_tail="@user@domain.com" -datadog_tags=service:lbaas -datadog_env=prod -ping_interval = 60 -poll_timeout = 5 -poll_timeout_retry = 30 +gearman=127.0.0.1:4730 [admin_api] db_sections=mysql1 ssl_certfile=certfile.crt ssl_keyfile=keyfile.key expire_days=7 +stats_driver=dummy datadog database +datadog_api_key=KEY +datadog_app_key=KEY2 +datadog_tags=service:lbaas +node_pool_size=50 [api] host=0.0.0.0 diff --git a/libra/mgm/mgm.py b/libra/mgm/mgm.py index 1ce956bf..b7980d03 100644 --- a/libra/mgm/mgm.py +++ b/libra/mgm/mgm.py @@ -44,10 +44,6 @@ class Server(object): def main(): options = Options('mgm', 'Node Management Daemon') - options.parser.add_argument( - '--api_server', action='append', metavar='HOST:PORT', default=[], - help='a list of API servers to connect to (for HP REST API driver)' - ) options.parser.add_argument( '--az', type=int, help='The az number the node will reside in (to be passed to the 
API'
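The turn-taking between Admin API servers described in the schedulers document (driven by ``--number_of_servers`` and ``--server_id``) can be sketched as a simple round-robin check. This is an illustration of the idea only, not the actual Libra implementation:

```python
def my_turn(cycle, server_id, number_of_servers):
    """Return True when this Admin API server should run the scheduled
    task for the given cycle (simple round-robin over server IDs)."""
    return cycle % number_of_servers == server_id

# With three servers, consecutive cycles rotate through servers 0, 1, 2.
schedule = [sid for cycle in range(6) for sid in range(3) if my_turn(cycle, sid, 3)]
print(schedule)  # -> [0, 1, 2, 0, 1, 2]
```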