This patch refactors the failover flows to improve the performance
and reliability of failovers in Octavia.
Specific improvements are:
* More tasks and flows will retry when other OpenStack services are
failing.
* Failover can now succeed even when all of the amphorae are missing
for a given load balancer.
* It will check and repair the load balancer VIP should the VIP
port(s) become corrupted in neutron.
* It will clean up extra resources that may be associated with a
load balancer in the event of a cloud service failure.
This patch also removes some dead code.
Change-Id: I04cb2f1f10ec566298834f81df0cf8b100ca916c
Story: 2003084
Task: 23166
Story: 2004440
Task: 28108
Add new configuration option "minimum_tls_versions" to octavia.conf.
Listeners, pools, or the default values for either will be blocked from
using lower versions.
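As a rough illustration of the check described above, the sketch below rejects any requested version that is lower than a configured minimum. The helper name validate_tls_versions, the ordering list, and the treatment of the minimum as a single version string are assumptions made for this example; this is not the code added by the patch.

```python
# Ordered from oldest to newest, using the octavia-lib value names.
TLS_VERSION_ORDER = ["SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2", "TLSv1.3"]


def validate_tls_versions(requested, minimum):
    """Raise ValueError if any requested version is older than `minimum`.

    Hypothetical helper, for illustration only.
    """
    min_rank = TLS_VERSION_ORDER.index(minimum)
    for version in requested:
        if version not in TLS_VERSION_ORDER:
            raise ValueError("unknown TLS version: %s" % version)
        if TLS_VERSION_ORDER.index(version) < min_rank:
            raise ValueError(
                "%s is lower than the minimum %s" % (version, minimum))
    return list(requested)
```

With a minimum of TLSv1.2, a request for TLSv1.2 and TLSv1.3 passes, while a request that includes TLSv1 raises ValueError.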
Change-Id: Ifa0d695c2227772d6b37987a7857fe58ca660dc8
Story: 2006733
Task: 37171
Depends-On: I480b7fb9756d98ba9dbcdfd1d4b193ce6868e291
Add field tls_versions to pools for restricting TLS versions used.
This is a colon-separated string of versions to be used.
Available values (as defined in octavia-lib):
SSLv3, TLSv1, TLSv1.1, TLSv1.2, TLSv1.3
Add default_pool_tls_versions in octavia.conf
Note: TLSv1.3 connections will use haproxy's default ciphers
instead of the listener's tls_ciphers field
Change-Id: I480b7fb9756d98ba9dbcdfd1d4b193ce6868e291
Story: 2006733
Task: 37173
Depends-On: Ic33d9b9a256490ae1b048cdfd2475d6340509fdb
Add field tls_versions to listeners for restricting TLS versions used.
This is a list of versions to be used.
Available values (as defined in octavia-lib):
SSLv3, TLSv1, TLSv1.1, TLSv1.2, TLSv1.3
Add default_listener_tls_versions in octavia.conf.
Note that at this time TLS 1.3 ciphersuites are not implemented,
so any TLS 1.3 connections will use haproxy's default ciphers
instead of what's specified by tls_ciphers.
Change-Id: Ic33d9b9a256490ae1b048cdfd2475d6340509fdb
Story: 2006733
Task: 37170
Task: 37169
Add new configuration option "tls_cipher_blacklist" to octavia.conf.
Blacklisted ciphers are blocked from being used in listeners, pools, or
default cipher strings.
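A minimal sketch of such a check, assuming both the cipher string and the blacklist are colon-separated lists of literal OpenSSL cipher names. The function name find_blacklisted_ciphers is hypothetical, and the sketch does not handle OpenSSL keywords or operators such as HIGH or !aNULL.

```python
def find_blacklisted_ciphers(cipher_string, blacklist_string):
    """Return the ciphers in `cipher_string` that appear in the blacklist.

    Both arguments are assumed to be colon-separated cipher names;
    empty entries are ignored.
    """
    blacklist = set(filter(None, blacklist_string.split(":")))
    requested = filter(None, cipher_string.split(":"))
    return [cipher for cipher in requested if cipher in blacklist]
```

A caller could reject the request whenever the returned list is not empty.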
Change-Id: I44fd4da1b47faee9cc01b9426898a28b6f13f223
Story: 2006627
Task: 37168
* Make sure the user has access to the subnet in the request when
creating or updating a pool member.
* Make sure the user has access to the port, subnet, or network when
creating a load balancer.
Story: 2007531
Task: 39339
Change-Id: I479019a911b5a1acfc1951d1cbbc2a351089cb4d
Introduce TaskFlowServiceController, which uses the taskflow
jobboard feature and saves job information in a persistence backend.
The jobboard can be operated via RedisTaskFlowDriver or
ZookeeperTaskFlowDriver, selectable in the configuration.
RedisTaskFlowDriver is introduced as the default jobboard backend.
Using the jobboard allows jobs to be resumed after a restart or stop
of the Octavia controller services.
The persistence backend saves the state of flow tasks, which is required
when resuming a job. The SQLAlchemy backend is used here.
Bump the taskflow version to 3.7.1 and add a dependency on
SQLAlchemy-Utils (required for taskflow sqlalchemy
backend support).
Story: 2005072
Task: 30806
Task: 30816
Task: 30817
Change-Id: I92ee4e879e98e4718d2e9aba56486341223a9157
Pools can now each be assigned an OpenSSL cipher string with the
field tls_ciphers. A new configuration option, default_pool_ciphers,
specifies what cipher string to use for new tls-enabled pools
if one is not explicitly specified at time of creation.
Change-Id: Iedb7774bfb8d70ea307d6a513248e1fe2389fa34
Depends-On: I77da6f14063877af0077f2c12df1aab5d5ead187
Story: 2006627
Task: 37172
Listeners will now be able to each be assigned their own OpenSSL
cipher string with a new field: tls_ciphers. There is also a new
configuration option, default_listener_ciphers, which specifies the
cipher string to assign to new listeners when one is not explicitly
specified.
Change-Id: I77da6f14063877af0077f2c12df1aab5d5ead187
Depends-On: Id5f4c20abd40dd092558a711987953012d4ae67f
Story: 2006627
Task: 36839
The healthcheck middleware adds a /healthcheck URL that allows
unauthenticated access to provide a simple check when running
octavia-api behind a load balancer.
https://docs.openstack.org/oslo.middleware/latest/reference/healthcheck_plugins.html
Co-authored-by: Michael Johnson <johnsomor@gmail.com>
Change-Id: I10db6226750f7b7c703067d2ab82eea3a9875112
The provided etc/octavia.conf file is typically installed by system
packages. It is important to set correct configuration option names and
default values even when commented out.
Task: 37525
Story: 2006891
Change-Id: Ia9da64d76e31422464af9d24b675094f25350f48
Fix an issue that prevents graceful shutdown of controller workers.
The cotyledon.Service.terminate function is by definition the graceful
termination function and does not take a 'graceful' optional boolean
argument (https://cotyledon.readthedocs.io/en/latest/api.html).
Because of this error, message_listener.wait() was never called in the
consumers' termination functions, so flows could be interrupted before
completion and could leave resources such as load balancer in a
PENDING_* provisioning state.
By default cotyledon.Service terminates the server after a timeout if
the worker could not shut itself down gracefully. The default value
for the timeout is 300 seconds (set in the devstack plugin) and can be
overridden using the graceful_shutdown_timeout setting in octavia.conf.
The default value will be lowered once the work on persistent taskflow
is merged.
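The shape of the fix can be sketched without cotyledon itself. In the example below both classes are stand-ins invented for illustration (they are not the cotyledon or oslo.messaging classes), and the stop() call before wait() is an assumption. The point is only that terminate() takes no arguments and always waits for in-flight work.

```python
class FakeMessageListener:
    """Stand-in for a message listener; records the calls it receives."""

    def __init__(self):
        self.calls = []

    def stop(self):
        self.calls.append("stop")

    def wait(self):
        # A real listener would block here until in-flight work completes.
        self.calls.append("wait")


class ConsumerService:
    """Hypothetical service whose terminate() takes no arguments."""

    def __init__(self, message_listener):
        self.message_listener = message_listener

    def terminate(self):
        # Stop accepting new messages, then wait for running flows to end.
        self.message_listener.stop()
        self.message_listener.wait()
```

Because terminate() has no 'graceful' flag to misread, the wait() call cannot be skipped.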
Story: 2006603
Task: 36770
Change-Id: I3f776bd018246897c9a889699a2d0ecbbfbb7098
This patch adds support for long-running provider driver agents to
the Octavia driver-agent.
It will fork a process for all of the enabled provider driver
agents at startup.
Change-Id: Ib7042bcc48b1dd5b37b671dd5e64728b71ab9542
Story: 2006250
Task: 35863
In some production deployments, volume-based storage is used instead of
local disk to protect data and to allow live migration.
This patch adds:
- creation of a cinder volume for amphora
- boot amphora with cinder volume
- config options for cinder client
- unit tests for cinder functionality
Story: 2001594
Co-authored-by: Vadim Ponomarev <velizarx@gmail.com>
Co-authored-by: Margarita Shakhova <shakhova.margarita@gmail.com>
Change-Id: I8181ed696b9ab556e7741c08839d79167aff8350
This patch adds support for the octavia-lib to get objects by ID.
Change-Id: I98b399891488e5972ea4d332c06b55b34f20fb11
Story: 2005870
Task: 33680
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
The default TaskFlow engine is now set to 'parallel' instead of
'serial'. The parallel engine schedules tasks onto different threads to
allow for running non-dependent tasks simultaneously. This has the
benefit of accelerating the execution of some Octavia Amphora flows such
as provisioning of active-standby amphora loadbalancers.
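The benefit of running non-dependent tasks on separate threads can be illustrated with the standard library alone. The sketch below does not use TaskFlow; the function names, the role names, and the sleep that stands in for slow work are all assumptions made for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_amphora(role):
    # Placeholder for slow, independent work such as booting one amphora.
    time.sleep(0.2)
    return "%s ready" % role


def run_parallel(roles):
    """Run one independent task per role on separate threads."""
    with ThreadPoolExecutor(max_workers=len(roles)) as executor:
        # map() returns results in the same order as the inputs.
        return list(executor.map(build_amphora, roles))
```

Two tasks that each take 0.2 seconds finish in roughly 0.2 seconds in total here, instead of about 0.4 seconds when run one after the other.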
Change-Id: I108b7f629d39c40b60ddf4a1878631f32e37b357
This sets up the HTTPProxyToWSGI middleware in front of Octavia API. The
purpose of this middleware is to set up the request URL correctly in
the case there is a proxy (For instance, a loadbalancer such as HAProxy)
in front of Octavia API.
When TLS connections are terminated at the proxy and a client requests
the versions from the '/' resource of the Octavia API, the returned
protocol is incorrect: it shows 'http' instead of 'https'.
This middleware handles such cases.
The HTTPProxyToWSGI is off by default and needs to be enabled via a
configuration value.
It can be enabled with the option in octavia.conf:
[oslo_middleware]
enable_proxy_headers_parsing=True
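Conceptually, the middleware picks the scheme reported by the proxy over the scheme of the local connection. The sketch below is a simplified picture of that idea, not the oslo.middleware implementation: the function name effective_scheme is hypothetical, and only the X-Forwarded-Proto header is considered.

```python
def effective_scheme(environ, enable_proxy_headers_parsing=False):
    """Return the URL scheme the client actually used.

    `environ` is a WSGI-style dict. Proxy headers are trusted only when
    parsing is explicitly enabled, mirroring the off-by-default option.
    """
    if enable_proxy_headers_parsing:
        forwarded = environ.get("HTTP_X_FORWARDED_PROTO")
        if forwarded:
            return forwarded
    return environ.get("wsgi.url_scheme", "http")
```

With parsing disabled, a request that arrived at the proxy over TLS is still reported as 'http'; with parsing enabled, it is reported as 'https'.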
Story: 2005105
Task: 29732
Change-Id: I276188530a83598ed75560f02ed9d80ce9afca2f
Configure rsyslog to forward logs to a target host
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Story: 1665069
Task: 33646
Change-Id: I00703f86555cbb574b943794b14a36fbc644f1b2
This patch configures the primary components of the amphora to log
to syslog using consistent logging facilities.
By default, user traffic logs will go to LOG_LOCAL0 and the amphora
processes (haproxy, keepalived, etc.) will log to LOG_LOCAL1.
This is a patch supporting log offloading.
Change-Id: Ifda91e0310e812e34f1e398dd3176af8a9c58f89
Story: 1665069
Task: 5486
The OpenStack Anchor project is now officially retired[1].
This patch removes the references to Anchor from Octavia.
These old references were confusing new users.
[1] https://review.opendev.org/#/c/611187/
Change-Id: Idfe90aa69b497e8270118174dde00567d7fab4ab
Includes some updates to docs and configs and related files to remove
references to neutron-lbaas. Also remove handlers.
Change-Id: I3082962841d3b645f3cbd1a6b41fc7fb28dcf7e6
This affects only the internal certificates that we generate and install
on Amphorae for use with the amphora-agent.
Change-Id: I8c3eb71246d339bd2d43092cce4e6122a49e9534
This patch improves the Octavia documentation in two ways:
It clarifies the format for the enabled_provider_drivers
configuration setting.
It also adds a link to the Octavia release notes to the documentation
home page.
Change-Id: I3f0349f37a5683061de2beff689314469a7dc255
The server_certs_key_passphrase was added in
I06d329ca53bc36bd27f7870ae7c7ca0cf18575b2 and should be
a part of the example octavia.conf
Change-Id: I5e60e8fbb7af381b59c6d7b02d5ba8eb47e91720
This is the base patch that updates octavia to use the new octavia-lib.
It is backwards compatible by using debtcollector moves.
It adds a new controller process called the "driver-agent".
This patch also adds unit test coverage for a few additional modules.
Depends-On: https://review.openstack.org/#/c/641180/
Change-Id: I438e1548ec0fb6111d1ab85b05015007d9d0a006
This was from when we thought Anchor was the future of our internal cert
authority configuration. Self-signed certs are perfectly acceptable for
production deployments.
Change-Id: I5351a3bc4f1d80846ecbc7e1a77a47d9b91d7de7
This patch changes the [haproxy_amphora] connection_max_retries and
build_active_retries default values from 300 to 120. This means load
balancer builds will wait for ten minutes instead of twenty-five minutes
for nova to boot the virtual machine.
We feel these are more reasonable default values for most production
deployments and provide a better user experience.
Only environments running in nested virtualization without nested
virtualization enabled in the hypervisor could require a value as high as
300.
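The wait times quoted above follow from multiplying the retry count by the interval between retries. The 5 second interval used below is inferred from the figures in this message (120 retries for ten minutes) and should be treated as an assumption; the function name is hypothetical.

```python
def max_wait_minutes(retries, interval_seconds=5):
    """Upper bound on the wait, in minutes, for a given retry count."""
    return retries * interval_seconds / 60


new_default = max_wait_minutes(120)  # ten minutes
old_default = max_wait_minutes(300)  # twenty-five minutes
```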
Depends-On: https://review.openstack.org/637074
Change-Id: I46be11062fb15ed21169fbec5dc8451a588273a5
Occasionally the test jobs[1] will fail with:
octavia.amphorae.drivers.haproxy.rest_api_driver [-]
Could not connect to instance. Read timed out. (read timeout=120.0)
This patch increases the default read timeout to 180 and changes the
directory copy that would subsequently fail to be more idempotent.
[1] http://logs.openstack.org/09/613709/14/check/ \
octavia-v2-dsvm-scenario-two-node/d83db12/controller2/logs/ \
screen-o-cw.txt.gz#_Feb_08_21_58_23_919928
Change-Id: Ia0bd6762c2605ce240a549b3e90e5c44b65897a5
This patch adds Cloud Auditing Data Federation (CADF) auditing support to the
Octavia API. This is implemented using the keystonemiddleware audit filter.
Change-Id: I87a7e15171dfaf28b6ed97ca71d4423d18fbdbea
This patch adds a few optimizations when using the amphora driver.
1. It increases the amp_active_retries from 10 to 30. This increases
the time we wait for nova to mark an instance "ACTIVE". The old default
of 10 was one minute forty seconds, but in some clouds it's been observed
that the nova scheduler can get overloaded and take longer than a minute
forty to schedule the instance. Setting this to 30 means we will wait
five minutes for nova to schedule the instance.
2. It enables TCP kernel splicing in HAProxy. This has been shown to
reduce the CPU overhead for very high rate TCP load balancers.
3. Finally it enables "safe" HTTP keepalives on the backend member
connections [1]. This increases the request rate possible while using HTTP
protocol listeners and members.
[1] http://cbonte.github.io/haproxy-dconv/1.6/configuration.html#4-http-reuse
Change-Id: I3af009cac9a9edc8aef793b52c6a1488fde2c59b
When the queue_event_streamer driver is used and RabbitMQ
is down, stats update processes occupy the thread pool
that is shared with health update processes. As a result,
a RabbitMQ outage unexpectedly leads to the deletion of all
existing amphorae. This commit separates the thread pools and aims
to keep the existing amphorae working even when RabbitMQ
is down.
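The effect of separating the pools can be sketched with the standard library. Everything below is illustrative: the function names are hypothetical, and an unset threading.Event stands in for a message queue that is down.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

queue_is_up = threading.Event()  # left unset to simulate a queue outage


def update_stats():
    # Simulates a stats update that blocks while the message queue is down.
    queue_is_up.wait()
    return "stats ok"


def update_health():
    return "health ok"


def run_with_separate_pools():
    stats_pool = ThreadPoolExecutor(max_workers=1)
    health_pool = ThreadPoolExecutor(max_workers=1)
    try:
        stats_pool.submit(update_stats)  # stalls on the simulated outage
        health_future = health_pool.submit(update_health)
        # The health update finishes even though the stats update is stuck.
        return health_future.result(timeout=5)
    finally:
        queue_is_up.set()  # release the stalled worker so the pools can exit
        stats_pool.shutdown()
        health_pool.shutdown()
```

Had both submissions gone to a single pool with one worker, the health update would have queued behind the stalled stats update.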
Change-Id: I576687f5b646496ff3a00787cf5e8c27f36b9448
Task: 22929
Story: 2002937
This patch adds a configuration option for reserved IP addresses that
cannot be used for load balancer member addresses. By default, this will
include the nova metadata service address 169.254.169.254.
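A minimal sketch of such a check, assuming the reserved addresses are held as a list of strings. The names RESERVED_IPS and is_reserved are hypothetical and do not reflect the option or function names used in the patch.

```python
import ipaddress

RESERVED_IPS = ["169.254.169.254"]  # the default named in this message


def is_reserved(member_address, reserved_ips=RESERVED_IPS):
    """Return True if `member_address` matches a reserved IP address."""
    address = ipaddress.ip_address(member_address)
    return any(address == ipaddress.ip_address(ip) for ip in reserved_ips)
```

Comparing parsed addresses rather than raw strings means equivalent spellings of the same address are also caught.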
Change-Id: I25de5ed5f6f35afc55dd1154c3e02934fddb100a
Story: 2003413
Task: 24555
In Pike[1], we introduced a user_group auto detection for haproxy.
The default user group name is auto-detected for any OS distribution
we support as a base for Amphorae.
user_group remained as an option for admins but was also
marked deprecated in Pike[2].
This patch removes that option altogether.
Story: 2003323
Task: 24357
[1] Ia8fede9d7da4709a48661d1fc595a16d04fcbfa9
[2] https://review.openstack.org/#/c/429398/45/octavia/common/config.py@175
Change-Id: Iddd4162674f116705d2b47062cbf7ca88f2677a6
Add new types to the db table.
Extend the existing API, including Listener, Pool, and HealthMonitor, to
support UDP fields.
The healthmonitor part needs to wait for another patch that fixes the
default value.
Patch List:
[1] Finish keepalived LVS jinja template for UDP support
[2] Extend the amp agent to upload/refresh the keepalived
process
[3] Extend the db model and db table with the fields necessary to meet the
needs of the new UDP backend
[4] Add logic/workflow element processing for UDP cases
[5] Extend the existing API to access udp parameters in Listener API
[6] Extend the existing pool API to access the new option in
session_persistence fields
Story: 1657091
Task: 5484
Change-Id: If728705f142f4195fe624bd9ef17413722d54fe3
These files are kept separate from the rest of the current Octavia repo
until the other parts are ready.
Patch List:
[1] Finish keepalived LVS jinja template for UDP support
[2] Extend the amp agent to upload/refresh the keepalived
process
[3] Extend the db model and db table with the fields necessary to meet the
needs of the new UDP backend
[4] Add logic/workflow element processing for UDP cases
[5] Extend the existing API to access udp parameters in Listener API
[6] Extend the existing pool API to access the new option in
session_persistence fields
Change-Id: Ib4924e602d450b1feadb29e830d715ae77f5bbfe
If a load balancer loses more than one amphora at the same time
the failover process will fail and leave the load balancer in
provisioning status ERROR.
This patch resolves this by failing over one amphora at a time and
marking any other failed amphorae with status ERROR. The health
manager will then fail over the other failed amphorae in subsequent checks.
This patch will update multiple healthy amphorae in parallel and will
time out failed amphorae using the new "active_connection_max_retries"
configuration setting used for "fail-fast" connections.
The patch also updates the amphora failover flow documentation to
show the full flow and not just the spares failover flow.
It updates the amphora driver "get_diagnostics" method to pass instead
of error.
It also adds an AmphoraComputeConnectivityWait task to explicitly wait
for a compute instance to come up and be reachable. This allows a longer
timeout and clarifies this may fail due to compute (nova) failures.
Previously the first plug vip task would do this wait.
Change-Id: Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701
Story: 2001481
Task: 6202
This patch addresses the following:
Fixes some unit tests.
Cleans up some code from the parent patches.
Adds a release note for the provider driver support.
Adds the "List providers" API.
Adds a document listing the known provider drivers.
Adds a provider driver development guide.
Change-Id: I90dc39e5e9d7d5839913dc2dbf187d935ee2b8b5
Story: 1655768
Task: 5165
This patch adds provider driver support to the Octavia v2 API, starting
with the load balancer API.
This patch also creates a provider driver for Octavia, initially fully
implementing the load balancer methods.
Follow-on patches will implement the remaining parts of the API.
Change-Id: Ia15280827799d1800c23ed76d2af0e3596b9d2f7
Story: 1655768
Task: 5165
PING is a trap. There is no real-world scenario where PING is the option
that makes the most sense, but people are familiar with it, and it seems
"simple", so they pick it. This needs to stop. Empower operators to
disable this!
Change-Id: Ifa80b7a5973361c13f2e6611789aa9798325ece0
Option auth_uri from group keystone_authtoken is deprecated[1].
Use option www_authenticate_uri from group keystone_authtoken.
[1] https://review.openstack.org/#/c/508522/
Change-Id: If6eee4ecfb4c6c607c9ee762cc535cf5d6180d88
* Switch to ProcessPool from ThreadPool (Python threads are horrible)
* Load the health/stats update drivers correctly with stevedore
* Add logging driver
Change-Id: Icda4ff218a6bbaea252c2d75c073754f4161a597
*NOT* deprecating the old way of storing these, as I believe that would
create a huge mess for anyone already using it.
Change-Id: I1fee174d8b8956f3d2053781a7f18c2940b21765
This patch is the initial implementation of a distributor driver for
Octavia Active/Active topology support.
This patch is a decomposition of the following patch:
https://review.openstack.org/#/c/313006
Story: 2001288
Task: 5836
Depends-On: I97b52b80efb33749647229a55147a08afa112dd2
Change-Id: I65e4a533caee692e1c98e8c6586c2e2132f2e34c
Co-Authored-By: Valeria Perelman <perelman@il.ibm.com>
This adds a way to configure the event streamer transport URL
so it can post to a different queue, e.g. Neutron's.
Change-Id: I69d3d6d30e33878052f2c56b8c79a14cc4ec1b24
In large build situations, nova can be slow to build VMs. This means that the
default 100 second timeout may expire before the final status has been updated
in the neutron database. This patch emits the provisioning status so that it
stays in sync with the neutron database.
Change-Id: If6c0b81630fd1911518792d9947f8622f065ff4e