octavia/specs/version0.8/active_passive_loadbalancer.rst
melissaml 3ee0f52151 Rename review.openstack.org to review.opendev.org
There are many references to review.openstack.org, and while the
redirect should work, we can also go ahead and fix them.

Change-Id: I02b3758e707319489e03a6cd00766b0b9381dc12
2019-05-29 00:21:34 +00:00

390 lines
13 KiB
ReStructuredText

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=======================================
Active-Standby Amphora Setup using VRRP
=======================================
https://blueprints.launchpad.net/octavia/+spec/activepassiveamphora
This blueprint describes how Octavia implements its Active/Standby
solution. It will describe the high level topology and the proposed code
changes from the current supported Single topology to realize the high
availability loadbalancer scenario.
Problem description
===================
A tenant should be able to start high availability loadbalancer(s) for the
tenant's backend services as follows:
* The operator should be able to configure an Active/Standby topology through
an octavia configuration file, which the loadbalancer shall support. An
Active/Standby topology shall be supported by Octavia in addition to the
Single topology that is currently supported.
* In Active/Standby, two Amphorae shall host a replicated configuration of the
load balancing services. Both amphorae will also deploy a Virtual Router
Redundancy Protocol (VRRP) implementation [2].
* Upon failure of the master amphora, the backup one shall seamlessly take over
the load balancing functions. After the master amphora changes to a healthy
status, the backup amphora shall give up the load balancing functions to the
master again (see [2] section 3 for details on master election protocol).
* Fail-overs shall be seamless to end-users and fail-over time should be
minimized.
* The following diagram illustrates the Active/Standby topology.
asciiflow::
+--------+
| Tenant |
|Service |
| (1) |
+--------+ +-----------+
| +--------+ +----+ Master +----+
| | Tenant | |VIP | Amphora |IP1 |
| |Service | +--+-+-----+-----+-+--+
| | (M) | | |MGMT |VRRP | |
| +--------+ | | IP | IP1 | |
| | Tenant | +--+--++----+ |
| | Network | | | | +-----------------+ Floating +---------+
v-v-------------^----+---v-^----v-^-+ Router | IP | |
^---------------+----v-^---+------+-+Floating <-> VIP <----------+ Internet|
| Management | | | | | | | |
| (MGMT) | | | | +-----------------+ +---------+
| Network | +--+--++----+ |
| Paired |MGMT |VRRP | |
| | | IP | IP2 | |
+-----------+ | +-----+-----+ |
| Octavia | ++---+ Backup +-+--+
|Controller | |VIP | Amphora |IP2 |
| (s) | +----+-----------+----+
+-----------+
* The newly introduced VRRP IPs shall communicate on the same tenant network
(see security impact for more details).
* The existing Haproxy Jinja configuration template shall include "peer"
setup for state synchronization over the VRRP IP addresses.
* The VRRP IP addresses shall work with both IPv4 and IPv6.
Proposed change
===============
The Active/Standby loadbalancers require the following high level changes:
* Add support of VRRP in the amphora base image through Keepalived.
* Extend the controller worker to be able to spawn N amphorae associated with
the same loadbalancer on N different compute nodes (This takes into account
future work on Active/Active topology). The amphorae shall be allowed to
use the VIP through "allow address pairing". These amphorae shall replicate
the same listeners, and pools configuration. Note: topology is a property
of a load balancer and not of one of its amphorae.
* Extend the amphora driver interface, the amphora REST driver, and Jinja
configuration templates for the newly introduced VRRP service [4].
* Develop a Keepalived driver.
* Extend the network driver to become aware of the different loadbalancer
topologies and add support of network creation. The network driver shall
also pair the different amphorae in a given topology to the same VIP address.
* Extend the controller worker to build the right flow/sub-flows according to
the given topology. The controller worker is also responsible of creating
the correct stores needed by other flow/sub-flows.
* Extend the Octavia configuration and Operator API to support the
Active/Standby topology.
* MINOR: Extend the Health Manager to be aware of the role of the amphora
(Master/Backup) [9]. If the health manager decided to spawn a new amphora
to replace an unhealthy one (while a backup amphora is already in service),
it must replicate the same VRRP priorities, ids, and authentication
credentials to keep the loadbalancer in its appropriate configuration.
Listeners associated with this load balancer shall be put in a DEGRADED
provisioning state.
Alternatives
------------
We could use heartbeats as an alternative to VRRP, which is also a widely
adopted solution. Heartbeats better suit redundant file servers, filesystems,
and databases rather than network services such as routers, firewalls, and
loadbalancers. Willy Tarreau, the creator of Haproxy, provides a detailed
view on the major differences between heartbeats and VRRP in [5].
Data model impact
-----------------
The data model of the Octavia database shall be impacted as follows:
* A new column in the load_balancer table shall indicate its topology. The
topology field takes values from: SINGLE, or ACTIVE/STANDBY.
* A new column in the amphora table shall indicate an amphora's role in the
topology. If the topology is SINGLE, the amphora role shall be STANDALONE. If
the topology is ACTIVE/STANDBY, the amphora role shall be either MASTER or
BACKUP. This role field will also be of use for the Active/Active topology.
* New value tables for the loadbalancer topology and the amphorae roles.
* New columns in the amphora table shall indicate the VRRP priority, the VRRP
ID, and the VRRP interface of the amphora.
* A new column in the listener table shall indicate the TCP port used for
listener internal data synchronization.
* VRRP groups define the common VRRP configurations for all listeners on an
amphora. A new table shall hold the VRRP groups main configuration
primitives including at least: VRRP authentication information, role and
priority advertisement interval. Each Active/Standby loadbalancer defines one
and only one VRRP group.
REST API impact
---------------
** Changes to amphora API: see [11] **
PUT /listeners/{amphora_id}/{listener_id}/haproxy
PUT /vrrp/upload
PUT /vrrp/{action}
GET /interface/{ip_addr}
** Changes to operator API: see [10] **
POST /loadbalancers
* Successful Status Code - 202
* JSON Request Body Attributes
** vip - another JSON object with one required attribute from the following
*** net_port_id - uuid
*** subnet_id - uuid
*** floating_ip_id - uuid
*** floating_ip_network_id - uuid
** tenant_id - string - optional - default "0" * 36 (for now)
** name - string - optional - default null
** description - string - optional - default null
** enabled - boolean - optional - default true
* JSON Response Body Attributes
** id - uuid
** vip - another JSON object
*** net_port_id - uuid
*** subnet_id - uuid
*** floating_ip_id - uuid
*** floating_ip_network_id - uuid
** tenant_id - string
** name - string
** description - string
** enabled - boolean
** provisioning_status - string enum - (ACTIVE, PENDING_CREATE, PENDING_UPDATE,
PENDING_DELETE, DELETED, ERROR)
** operating_status - string enum - (ONLINE, OFFLINE, DEGRADED, ERROR)
** **topology - string enum - (SINGLE, ACTIVE_STANDBY)**
PUT /loadbalancers/{lb_id}
* Successful Status Code - 202
* JSON Request Body Attributes
** name - string
** description - string
** enabled - boolean
* JSON Response Body Attributes
** id - uuid
** vip - another JSON object
*** net_port_id - uuid
*** subnet_id - uuid
*** floating_ip_id - uuid
*** floating_ip_network_id - uuid
** tenant_id - string
** name - string
** description - string
** enabled - boolean
** provisioning_status - string enum - (ACTIVE, PENDING_CREATE, PENDING_UPDATE,
PENDING_DELETE, DELETED, ERROR)
** operating_status - string enum - (ONLINE, OFFLINE, DEGRADED, ERROR)
** **topology - string enum - (SINGLE, ACTIVE_STANDBY)**
GET /loadbalancers/{lb_id}
* Successful Status Code - 200
* JSON Response Body Attributes
** id - uuid
** vip - another JSON object
*** net_port_id - uuid
*** subnet_id - uuid
*** floating_ip_id - uuid
*** floating_ip_network_id - uuid
** tenant_id - string
** name - string
** description - string
** enabled - boolean
** provisioning_status - string enum - (ACTIVE, PENDING_CREATE, PENDING_UPDATE,
PENDING_DELETE, DELETED, ERROR)
** operating_status - string enum - (ONLINE, OFFLINE, DEGRADED, ERROR)
** **topology - string enum - (SINGLE, ACTIVE_STANDBY)**
Security impact
---------------
* The VRRP driver must automatically add a security group rule to the amphora's
security group to allow VRRP traffic (Protocol number 112) on the same tenant
subnet.
* The VRRP driver shall automatically add a security group rule to allow
Authentication Header traffic (Protocol number 51).
* VRRP driver shall support authentication-type MD5.
* The HAProxy driver must be updated to automatically add a security group rule
that allows multi-peers to synchronize their states.
* Currently HAProxy **does not** support peer authentication, and state sync
messages are in plaintext.
* At this point, VRRP shall communicate on the same tenant network. The
rationale is to fail-over based on a similar network interfaces condition
which the tenant operates experience. Also, VRRP traffic and sync messages
shall naturally inherit same protections applied to the tenant network.
This may create fake fail-overs if the tenant network is under unplanned,
heavy traffic. This is still better than failing over while the master is
actually serving tenant's traffic or not failing over at all if the master
has failed services. Additionally, the Keepalived shall check the health of
the HAproxy service.
* In next steps the following shall be taken into account:
* Tenant quotas and supported topologies.
* Protection of VRRP Traffic, HAproxy state sync, Router IDs, and pass
phrases in both packets and DB.
Notifications impact
--------------------
None.
Other end user impact
---------------------
* The operator shall be able to specify the loadbalancer topology in the
Octavia configuration file (used by default).
Performance Impact
------------------
The Active/Standby can consume up to twice the resources (storage, network,
compute) as required by the Single Topology. Nevertheless, one single amphora
shall be active (i.e. serving end-user) at any point in time. If the Master
amphora is healthy, the backup one shall remain idle until it receives no
VRRP advertisements from the master.
The VRRP requires executing health checks in the amphorae at fine grain
granularity period. The health checks shall be as lightweight as possible
such that VRRP is able to execute all check scripts within a predefined
interval. If the check scripts failed to run within this predefined interval,
VRRP may become unstable and may alternate the amphorae roles between MASTER
and BACKUP incorrectly.
Other deployer impact
---------------------
* An amphora_topology config option shall be added. The controller worker
shall change its taskflow behavior according to the requirement of different
topologies.
* By default, the amphora_topology is SINGLE and the ACTIVE/STANDBY topology
shall be enabled/requested explicitly by operators.
* The Keepalived version deployed in the amphora image must be newer than
1.2.8 to support unicast VRRP mode.
Developer impact
----------------
None.
Implementation
==============
Assignee(s)
-----------
Sherif Abdelwahab (abdelwas)
Work Items
----------
* Amphora image update to include Keepalived.
* Data model updates.
* Control Worker extensions.
* Keepalived driver.
* Update Network driver.
* Security rules.
* Update Amphora REST APIs and Jinja Configurations.
* Update Octavia Operator APIs.
Dependencies
============
Keepalived version deployed in the amphora image must be newer than 1.2.8 to
support unicast VRRP mode.
Testing
=======
* Unit tests with tox.
* Function tests with tox.
Documentation Impact
====================
* Description of the different supported topologies: Single, Active/Standby.
* Octavia configuration file changes to enable the Active/Standby topology.
* CLI changes to enable the Active/Standby topology.
* Changes shall be introduced to the amphora APIs: see [11].
References
==========
[1] Implementing High Availability Instances with Neutron using VRRP
http://goo.gl/eP71g7
[2] RFC3768 Virtual Router Redundancy Protocol (VRRP)
[3] https://review.opendev.org/#/c/38230/
[4] http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html
[5] http://www.formilux.org/archives/haproxy/1003/3259.html
[6] https://blueprints.launchpad.net/octavia/+spec/base-image
[7] https://blueprints.launchpad.net/octavia/+spec/controller-worker
[8] https://blueprints.launchpad.net/octavia/+spec/amphora-driver-interface
[9] https://blueprints.launchpad.net/octavia/+spec/controller
[10] https://blueprints.launchpad.net/octavia/+spec/operator-api
[11] doc/main/api/haproxy-amphora-api.rst