..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

=================================
Active-Active, N+1 Amphorae Setup
=================================

https://blueprints.launchpad.net/octavia/+spec/active-active-topology

Problem description
===================

This blueprint describes how Octavia implements an *active-active*
loadbalancer (LB) solution that is highly-available through redundant
Amphorae. It presents the high-level service topology and suggests
high-level code changes to the current code base to realize this scenario.
In a nutshell, an *Amphora Cluster* of two or more active Amphorae
collectively provides the loadbalancing service.

The Amphora Cluster shall be managed by an *Amphora Cluster Manager* (ACM).
The ACM shall provide an abstraction that allows different types of
active-active features (e.g., failure recovery, elasticity, etc.). The
initial implementation shall not rely on external services, but the
abstraction shall allow for interaction with external ACMs (to be developed
later).

This blueprint uses terminology defined in the Octavia glossary when
available, and defines new terms to describe new components and features as
necessary.

.. _P2:

**Note:** Items marked with [`P2`_] refer to lower priority features to be
designed / implemented only after initial release.

Proposed change
===============

A tenant should be able to start a highly-available loadbalancer for the
tenant's backend services as follows:

* The operator should be able to configure an active-active topology
  through an Octavia configuration file or [`P2`_] through a Neutron flavor,
  which the loadbalancer shall support. Octavia shall support active-active
  topologies in addition to the topologies that it currently supports.

* In an active-active topology, a cluster of two or more amphorae shall
  host a replicated configuration of the load-balancing services. Octavia
  will manage this *Amphora Cluster* as a highly-available service using a
  pool of active resources.

* The Amphora Cluster shall provide the load-balancing services and support
  the configurations that are supported by a single Amphora topology,
  including L7 load-balancing, SSL termination, etc.

* The active-active topology shall support various Amphora types and
  implementations, including virtual machines, [`P2`_] containers, and
  bare-metal servers.

* The operator should be able to configure the high-availability
  requirements for the active-active load-balancing services. The operator
  shall be able to specify the number of healthy Amphorae that must exist in
  the load-balancing Amphora Cluster. If the number of healthy Amphorae drops
  under the desired number, Octavia shall automatically and seamlessly
  create and configure a new Amphora and add it to the Amphora Cluster.
  [`P2`_] The operator should further be able to require that the Amphorae
  of the Amphora Cluster be allocated on separate physical resources.

* An Amphora Cluster will collectively act to serve as a single logical
  loadbalancer as defined in the Octavia glossary. Octavia will seamlessly
  distribute incoming external traffic among the Amphorae in the Amphora
  Cluster. To that end, Octavia will employ a *Distributor* component that
  will forward external traffic towards the managed amphora instances.
  Conceptually, the Distributor provides an extra level of load-balancing
  for an active-active Octavia application, albeit a simplified one.
  Octavia should be able to support several Distributor implementations
  (e.g., software-based and hardware-based) and different affinity models
  (at minimum, flow-affinity should be supported to allow TCP connectivity
  between clients and Amphorae).

* The detailed design of the Distributor component will be described in a
  separate document (see "Distributor for Active-Active, N+1 Amphorae
  Setup", active-active-distributor.rst).
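Keeping a desired number of healthy Amphorae implies some bookkeeping in the
ACM. The following is a minimal Python sketch under stated assumptions, not
Octavia code: ``ClusterPolicy``, ``lb_status``, and ``amphorae_to_create``
are hypothetical names; the fields mirror the ``desired_size``, ``min_size``,
and ``cooldown`` columns proposed under "Data model impact"; and the
``ONLINE``/``DEGRADED``/``DOWN`` mapping follows the status rule under
"Amphora related changes", with the ``DOWN`` threshold assumed to be
``min_size`` (inclusive).

.. code-block:: python

    """Hypothetical sketch of ACM bookkeeping for an active-active cluster.

    None of these names exist in Octavia; they only illustrate the rules
    described in this spec.
    """
    from dataclasses import dataclass
    from typing import List


    @dataclass
    class ClusterPolicy:
        desired_size: int  # Amphorae the ACM tries to keep in the cluster
        min_size: int      # assumed inclusive: fewer ONLINE Amphorae => DOWN
        cooldown: float    # seconds between successive add/remove operations


    def lb_status(amphora_statuses: List[str], policy: ClusterPolicy) -> str:
        """Derive the load balancer status from its Amphorae's statuses."""
        online = sum(1 for status in amphora_statuses if status == "ONLINE")
        if amphora_statuses and online == len(amphora_statuses):
            return "ONLINE"    # every Amphora is serving traffic
        if online >= policy.min_size:
            return "DEGRADED"  # some Amphorae are missing, but enough remain
        return "DOWN"          # dropped below the operator's threshold


    def amphorae_to_create(online: int, building: int, policy: ClusterPolicy,
                           last_change: float, now: float) -> int:
        """Return how many replacement Amphorae to request right now."""
        if now - last_change < policy.cooldown:
            return 0  # still cooling down; avoid thrashing
        # Count Amphorae that are already being built so that a slow boot
        # does not trigger duplicate creations.
        return max(0, policy.desired_size - online - building)


    policy = ClusterPolicy(desired_size=4, min_size=2, cooldown=60.0)
    print(lb_status(["ONLINE", "ONLINE", "ERROR", "ONLINE"], policy))  # DEGRADED
    print(amphorae_to_create(online=3, building=0, policy=policy,
                             last_change=0.0, now=120.0))  # 1

Separating status derivation from the replacement decision keeps the
``DEGRADED`` report independent of whether a replacement is allowed yet; the
cooldown only throttles the ACM's actions, not what it reports.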
High-level Topology Description
-------------------------------

Single Tenant
~~~~~~~~~~~~~

* The following diagram illustrates the active-active topology:

  [Diagram: active-active topology for a single tenant. Internet (world)
  clients reach the Router GW, which maps a floating IP to the LB VIP on the
  tenant's front-end network. The Distributor (IP, VIP, MGMT IP) answers ARP
  for the VIP and forwards traffic to Amphorae (1)..(k) of the Amphora
  Cluster, each with a front-end IP, a VIP, a back-end IP, and a MGMT IP.
  The Amphorae forward requests over the tenant's back-end network to
  Tenant Services (1)..(m) in the back-end pool. The Octavia LBaaS
  Controller (including the Amphora Cluster Manager) reaches the
  Distributor and the Amphorae over the Management Network.]

* An example of high-level data-flow:

  1. Internet clients access a tenant service through an externally visible
     floating-IP (IPv4 or IPv6).

  2. If IPv4, a gateway router maps the floating IP into a loadbalancer's
     internal VIP on the tenant's front-end network.

  3. The (multi-tenant) Distributor receives incoming requests to the
     loadbalancer's VIP. It acts as a one-legged direct return LB,
     answering ``arp`` requests for the loadbalancer's VIP (see Distributor
     spec.).

  4. The Distributor distributes incoming connections over the tenant's
     Amphora Cluster, by forwarding each new connection opened with a
     loadbalancer's VIP to a front-end MAC address of an Amphora in the
     Amphora Cluster (layer-2 forwarding). *Note*: the Distributor may
     implement other forwarding schemes to support more complex routing
     mechanisms, such as DVR (see Distributor spec.).

  5. An Amphora receives the connection and accepts traffic addressed to
     the loadbalancer's VIP. The front-end IPs of the Amphorae are allocated
     on the tenant's front-end network. Each Amphora accepts VIP traffic,
     but does not answer ``arp`` requests for the VIP address.

  6. The Amphora load-balances the incoming connections to the back-end
     pool of tenant servers, by forwarding each external request to a
     member on the tenant network. The Amphora also performs SSL
     termination if configured.

  7. Outgoing traffic traverses from the back-end pool members, through
     the Amphora and directly to the gateway (i.e., not through the
     Distributor).
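Step 4 depends on flow-affinity: every packet of a client connection must
reach the same Amphora, or that Amphora's TCP and TLS state is lost. The
following minimal Python sketch assumes a connection-tracking table backed by
rendezvous (highest-random-weight) hashing of the 5-tuple; this is one
possible technique, not the algorithm of the Distributor spec, and the
class, method names, and MAC addresses are hypothetical.

.. code-block:: python

    """Hypothetical sketch of flow-affinity forwarding in a Distributor.

    Picks an Amphora front-end MAC for each new connection to the VIP and
    keeps using it for the rest of that flow.
    """
    import hashlib
    from typing import Dict, List, Tuple

    # (src_ip, src_port, dst_ip, dst_port, protocol) of a client connection
    FlowKey = Tuple[str, int, str, int, str]


    def _weight(flow: FlowKey, mac: str) -> int:
        """Rendezvous-hashing weight of a (flow, Amphora) pair."""
        data = "|".join(str(field) for field in flow) + "|" + mac
        return int.from_bytes(hashlib.sha256(data.encode()).digest()[:8], "big")


    class FlowAffinityDistributor:
        def __init__(self, amphora_macs: List[str]) -> None:
            self.amphora_macs: List[str] = list(amphora_macs)
            self.flows: Dict[FlowKey, str] = {}  # connection-tracking table

        def register(self, mac: str) -> None:
            """Add an ONLINE Amphora; existing flows keep their Amphora."""
            if mac not in self.amphora_macs:
                self.amphora_macs.append(mac)

        def deregister(self, mac: str) -> None:
            """Remove an Amphora and forget the flows pinned to it."""
            self.amphora_macs.remove(mac)
            self.flows = {f: m for f, m in self.flows.items() if m != mac}

        def forward(self, flow: FlowKey) -> str:
            """Return the front-end MAC to which this packet is forwarded."""
            mac = self.flows.get(flow)
            if mac is None:
                if not self.amphora_macs:
                    raise LookupError("no Amphora registered for this VIP")
                # Highest weight wins; the choice is deterministic, so even
                # without the table a flow maps to the same Amphora as long
                # as membership does not change.
                mac = max(self.amphora_macs, key=lambda m: _weight(flow, m))
                self.flows[flow] = mac
            return mac

For example, ``FlowAffinityDistributor(["fa:16:3e:00:00:01",
"fa:16:3e:00:00:02"]).forward(("203.0.113.7", 51514, "10.0.0.10", 443,
"tcp"))`` returns one of the two MACs, and repeated calls for that 5-tuple
return the same one. Rendezvous hashing is chosen here because removing an
Amphora remaps only the flows that were pinned to it.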
Multi-tenant Support
~~~~~~~~~~~~~~~~~~~~

* The following diagram illustrates the active-active topology with
  multiple tenants:

  [Diagram: active-active topology with two tenants. Floating IPs map to LB
  VIP A and LB VIP B on the separate front-end networks of tenants A and B.
  A single Distributor has one interface per front-end network (IP A and
  IP B, each with its VIP), answers ARP for both VIPs, and forwards to the
  Amphora Clusters A(1..k) and B(1..q). Each cluster serves its own back-end
  pool: Tenant A Services (1..m) and Tenant B Services (1..r). The Octavia
  LBaaS Controller (including the Amphora Cluster Manager) manages the
  Distributor and all Amphorae over the Management Network.]

* Both tenants A and B share the Distributor, but each has a different
  front-end network. The Distributor listens on both loadbalancers' VIPs
  and forwards to either A's or B's Amphorae.

* The Amphorae and the back-end (tenant) networks are not shared between
  tenants.

Problem Details
---------------

* Octavia should support different Distributor implementations, similar
  to its support for different Amphora types. The operator should be able
  to configure different types of algorithms for the Distributor. All
  algorithms should provide flow-affinity to allow TLS termination at the
  amphora. See :doc:`active-active-distributor` for details.

* The Octavia controller shall seamlessly configure any newly created
  Amphora ([`P2`_] including peer state synchronization, such as
  sticky-tables, if needed) and shall reconfigure the other solution
  components (e.g., Neutron) as needed. The controller shall further manage
  all Amphora life-cycle events.

* Since it is impractical at scale for peer state synchronization to occur
  between all Amphorae that are part of a single load balancer, these
  Amphorae need to be divided into smaller peer groups (consisting of 2 or
  3 Amphorae), and each Amphora synchronizes state information only with
  the other members of its peer group.

Required changes
----------------

The active-active loadbalancers require the following high-level changes:

Amphora related changes
~~~~~~~~~~~~~~~~~~~~~~~

* Update the Amphora image to support the active-active topology. The
  front-end still has both a unique IP (to allow direct addressing on the
  front-end network) and a VIP; however, it should not answer ARP requests
  for the VIP address (all Amphorae in a single Amphora Cluster
  concurrently serve the same VIP). Amphorae should continue to have a
  management IP on the LB Network so Octavia can configure them. Amphorae
  should also generally support hot-plugging interfaces into back-end
  tenant networks as they do in the current implementation. [`P2`_]
  Finally, the Amphora configuration may need to be changed to randomize
  the member list, in order to prevent synchronized decisions by all
  Amphorae in the Amphora Cluster.

* Extend the data model to support active-active Amphorae. This is somewhat
  similar to active-passive (VRRP) support. Each Amphora needs to store its
  IP and port on its front-end network (similar to ha_ip and ha_port_id in
  the current model) and its role should indicate it is in a cluster.

  The provisioning status should be interpreted as referring to an Amphora
  only and not the load-balancing service. The status of the load balancer
  should correspond to the number of ``ONLINE`` Amphorae in the Cluster. If
  all Amphorae are ``ONLINE``, the load balancer is also ``ONLINE``. If a
  small number of Amphorae are not ``ONLINE``, then the load balancer is
  ``DEGRADED``. If enough Amphorae are not ``ONLINE`` (past a threshold),
  then the load balancer is ``DOWN``.

* Rework some of the controller worker flows to support creation and
  deletion of Amphorae by the ACM in an asynchronous manner. The compute
  node may be created/deleted independently of the corresponding Amphora
  flow, triggered as events by the ACM logic (e.g., node update). The flows
  do not need much change (beyond those implied by the changes in the data
  model), since the post-creation/pre-deletion configuration of each
  Amphora is unchanged. This is also similar to the failure recovery flow,
  where a recovery flow is triggered asynchronously.

* Create a flow (or task) for the controller worker for (de-)registration
  of Amphorae with the Distributor. The Distributor has to be aware of the
  current ``ONLINE`` Amphorae, to which it can forward traffic. [`P2`_] The
  Distributor can do very basic monitoring of the Amphorae health
  (primarily to make sure network connectivity between the Distributor and
  Amphorae is working). Monitoring pool member health will remain the
  purview of the pool health monitors.
* All the Amphorae in the Amphora Cluster shall replicate the same
  listeners, pools, and TLS configuration, as they do now. We assume all
  Amphorae in the Amphora Cluster can perform exactly the same
  load-balancing decisions and can be treated as equivalent by the
  Distributor (except for affinity considerations).

* Extend the Amphora (REST) API and/or *Plug VIP* task to allow disabling
  of ``arp`` on the VIP.

* In order to prevent losing session_persistence data in the event of an
  Amphora failure, the Amphorae will need to be configured to share
  session_persistence data (via stick tables) with a subset of other
  Amphorae that are part of the same load balancer configuration (i.e., a
  peer group).

Amphora Cluster Manager driver for the active-active topology (*new*)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Add an active-active topology to the topology types.

* Add a new driver to support creation/deletion of an Amphora Cluster via
  an ACM. This will re-use existing controller-worker flows as much as
  possible. The reference ACM will call the existing drivers to create
  compute nodes for the Amphorae and configure them.

* The ACM shall orchestrate creation and deletion of Amphora instances to
  meet the availability requirements. Amphora failover will utilize the
  existing health monitor flows, with hooks to notify the ACM when the
  ACTIVE-ACTIVE topology is used. [`P2`_] The ACM shall handle graceful
  Amphora removal via draining (delay actual removal until existing
  connections are terminated or some timeout has passed).

* Change the flow of LB creation. The ACM driver shall create an Amphora
  Cluster instance for each new loadbalancer. It should maintain the
  desired number of Amphorae in the Cluster and meet the high-availability
  configuration given by the operator. *Note*: a base functionality is
  already supported by the Health Manager; it may be enough to support a
  fixed or dynamic cluster size. In any case, existing flows to manage
  Amphora life cycle will be re-used in the reference ACM driver.

* The ACM shall be responsible for providing health, performance, and
  life-cycle management at the Cluster-level rather than at Amphora-level.
  Maintaining the loadbalancer status (as described above) by some
  function of the collective status of all Amphorae in the Cluster is one
  example. Other examples include tracking configuration changes, providing
  Cluster statistics, monitoring and maintaining compute nodes for the
  Cluster, etc. The ACM abstraction would also support pluggable ACM
  implementations that may provide more advanced capabilities (e.g.,
  elasticity, AZ aware availability, etc.). The reference ACM driver will
  re-use existing components and/or code which currently handle health,
  life-cycle, etc. management for other load balancer topologies.

* New data model for an Amphora Cluster which has a one-to-one mapping
  with the loadbalancer. This defines the common properties of the Amphora
  Cluster (e.g., id, min. size, desired size, etc.) and additional
  properties for the specific implementation.

* Add configuration file options to support configuration of an
  active-active Amphora Cluster. Add default configuration. [`P2`_] Add
  Operator API.

* Add or update documentation for new components added and new or changed
  functionality.

* Communication between the ACM and Distributors should be secured using
  two-way SSL certificate authentication much the same way this is
  accomplished between other Octavia controller components and Amphorae
  today.

Network driver changes
~~~~~~~~~~~~~~~~~~~~~~

* Support the creation, connection, and configuration of the various
  networks and interfaces as described in the diagrams under "High-level
  Topology Description".

* Adding a new loadbalancer requires attaching the Distributor to the
  loadbalancer's front-end network, adding a VIP port to the Distributor,
  and configuring the Distributor to answer ``arp`` requests for the VIP.
  The Distributor shall have a separate interface for each loadbalancer
  and shall not allow any routing between different ports; in particular,
  Amphorae of different tenants must not be able to communicate with each
  other. In the reference implementation, this will be accomplished by
  using separate OVS bridges per load balancer.

* Adding a new Amphora requires attaching it to the front-end and back-end
  networks (similar to the current implementation), adding the VIP (but
  with ``arp`` disabled), and registering the Amphora with the
  Distributor. The tenant's front-end and back-end networks must allow
  attachment of dynamically created Amphorae by involving the ACM (e.g.,
  when the health monitor replaces a failed Amphora). ([`P2`_] Extend the
  LBaaS API to allow specifying an address range for new Amphorae usage,
  e.g., a subnet pool.)

Amphora health-monitoring support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Modify the Health Manager to manage the health for an Amphora Cluster
  through the ACM; namely, forward Amphora health change events to the ACM,
  so it can decide when the Amphora Cluster is considered to be in a
  healthy state. This should be done in addition to managing the health of
  each Amphora. [`P2`_] Monitor the Amphorae also on their front-end
  network (i.e., from the Distributor).

Distributor support
~~~~~~~~~~~~~~~~~~~

* **Note:** as mentioned above, the detailed design of the Distributor
  component is described in a separate document. Some design
  considerations are highlighted below.

* The Distributor should be supported similarly to an Amphora; namely,
  have its own abstract driver.

* For a reference implementation, add support for a Distributor image.

* Define a REST API for Distributor configuration (no SSH API). The API
  shall support:

  - Add and remove a VIP (loadbalancer) and specify distribution parameters
    (e.g., affinity, algorithm, etc.).
  - Registration and de-registration of Amphorae.
  - Status
  - [`P2`_] Macro-level stats

* Spawn Distributors (if using on-demand Distributor compute nodes) and/or
  attach to existing ones as needed. Manage health and life-cycle of the
  Distributor(s). Create, connect, and configure Distributor networks as
  necessary.

* Create a data model for the Distributor.

* Add a Distributor driver and flows to (re-)configure the Distributor on
  creation/destruction of a new loadbalancer (add/remove loadbalancer VIP)
  and [`P2`_] configure the distribution algorithm for the loadbalancer's
  Amphora Cluster.

* Add flows to Octavia to (re-)configure the Distributor on adding/removing
  Amphorae from the Amphora Cluster.

Packaging
~~~~~~~~~

* Extend Octavia installation scripts to create an image for the
  Distributor.

Alternatives
------------

* Use external services to manage the cluster directly.
  This utilizes functionality that already exists in OpenStack (e.g., Heat
  and Ceilometer) rather than replicating it. This approach would also
  benefit from future extensions to these services. On the other hand, this
  adds undesirable dependencies on other projects (and their corresponding
  teams), complicates handling of failures, and requires defensive coding
  around service calls. Furthermore, these services cannot handle the
  LB-specific control configuration.

* Implement a nested Octavia.
  Use another layer of Octavia to distribute traffic across the Amphora
  Cluster (i.e., the Amphorae in the Cluster are back-end members of
  another Octavia instance). This approach has the potential to provide
  greater flexibility (e.g., provide NAT and/or more complex distribution
  algorithms). It also potentially reuses existing code. However, we do not
  want the Distributor to proxy connections, so HAProxy cannot be used.
  Furthermore, this approach might significantly increase the overhead of
  the solution.
Data model impact
-----------------

* loadbalancer table

  - `cluster_id`: associated Amphora Cluster (no changes to table, 1-1
    relationship from Cluster data-model)

* lb_topology table

  - new value: ``ACTIVE_ACTIVE``

* amphora_role table

  - new value: ``IN_CLUSTER``

* Distributor table (*new*): Distributor information, similar to Amphora.
  See :doc:`active-active-distributor`

* Cluster table (*new*): an extension to loadbalancer (i.e., one-to-one
  mapping to load-balancer)

  - `id` (primary key)
  - `cluster_name`: identifier of Cluster instance for Amphora Cluster
    Manager
  - `desired_size`: required number of Amphorae in Cluster. Octavia will
    create this many active-active Amphorae in the Amphora Cluster.
  - `min_size`: number of ``ACTIVE`` Amphorae in Cluster must be above this
    number for Amphora Cluster status to be ``ACTIVE``
  - `cooldown`: cooldown period between successive add/remove Amphora
    operations (to avoid thrashing)
  - `load_balancer_id`: 1:1 relationship to loadbalancer
  - `distributor_id`: N:1 relationship to Distributor. Support multiple
    Distributors
  - `provisioning_status`
  - `operating_status`
  - `enabled`
  - `cluster_type`: type of Amphora Cluster implementation

REST API impact
---------------

* Distributor REST API -- This is a new internal API that will be secured
  via two-way SSL certificate authentication. See
  :doc:`active-active-distributor`

* Amphora REST API -- support configuration of disabling ``arp`` on VIP.

* [`P2`_] LBaaS API -- support configuration of desired availability,
  perhaps by selecting a flavor (e.g., gold is a minimum of 4 Amphorae,
  platinum is a minimum of 10 Amphorae).

* Operator API --

  - Topology to use
  - Cluster type
  - Default availability parameters for the Amphora Cluster

Security impact
---------------

* See :doc:`active-active-distributor` for Distributor related security
  impact.

Notifications impact
--------------------

None.

Other end user impact
---------------------

None.
Performance Impact
------------------

ACTIVE-ACTIVE should be able to deliver significantly higher performance
than the SINGLE or ACTIVE-STANDBY topologies. It will consume more
resources to deliver this higher performance.

Other deployer impact
---------------------

The reference ACM becomes a new process that is part of the Octavia
control components (like the controller worker, health monitor, and
housekeeper). If the reference implementation is used, a new Distributor
image will need to be created and stored in glance much the same way the
Amphora image is created and stored today.

Developer impact
----------------

None.

Implementation
==============

Assignee(s)
-----------

@TODO

Work Items
----------

@TODO

Dependencies
============

@TODO

Testing
=======

* Unit tests with tox.
* Functional tests with tox.
* Scenario tests.

Documentation Impact
====================

Need to document all new APIs and API changes, the new ACTIVE-ACTIVE
topology design and features, and new instructions for operators seeking to
deploy Octavia with the ACTIVE-ACTIVE topology.

References
==========

https://blueprints.launchpad.net/octavia/+spec/base-image

https://blueprints.launchpad.net/octavia/+spec/controller-worker

https://blueprints.launchpad.net/octavia/+spec/amphora-driver-interface

https://blueprints.launchpad.net/octavia/+spec/controller

https://blueprints.launchpad.net/octavia/+spec/operator-api

:doc:`../../api/haproxy-amphora-api`

https://blueprints.launchpad.net/octavia/+spec/active-active-topology