Running amphora failover against the amphora noop driver was raising a
TypeError (reload() takes from 2 to 3 positional arguments but 4 were
This patch refactors the failover flows to improve the performance
and reliability of failovers in Octavia.
Specific improvements are:
* More tasks and flows will retry when other OpenStack services are
* Failover can now succeed even when all of the amphora are missing
for a given load balancer.
* It will check and repair the load balancer VIP should the VIP
port(s) become corrupted in neutron.
* It will cleanup extra resources that may be associated with a
load balancer in the event of a cloud service failure.
This patch also removes some dead code.
-w (timeout) option doesn't do anything in nmap-ncat (default netcat in
CentOS/RHEL) for UDP datagrams, and nmap-ncat has a default idle timeout
set to 2 seconds.
We can get the same behavior as netcap-openbsd (Debian/Ubuntu) by
setting that idle timeout (-i) option to 1 second.
This commit detects the flavor of the netcat binary (nmap vs other) and
uses it to adapt the parameters.
E741 ambiguous variable name 'l'
Change 'l' to another variable in affected code.
Also had to set the latex_engine to 'xelatex' in doc/source/conf.py
in order to get past an openstackdocstheme change the broke the pdf
This patch introduces 2 macros in lvs.
1. Support HTTP GET, allow users create HTTP healthmonitor for udp pool.
2. Support TCP check, allow users create TCP healthmonitor for udp pool.
Co-Authored-By: Adam Harwell <firstname.lastname@example.org>
Flask's stream always returns bytes, file write always takes string.
This causes py3 amps to return 500 on cert rotation AND wipe out the
certificate, so the amphora are no longer controllable and go to ERROR
state. Anyone running py3 amps prior to this patch will experience
amphorae breaking on a timer due to housekeeping cert rotation!
Listeners will now be able to each be assigned their own OpenSSL
cipher string with a new field: tls_ciphers. There is also a new
configuration option, default_listener_ciphers, which specifies the
cipher string to assign to new listeners when one is not explicitly
Should have done "pad to 8 characters" on the hex conversion, but it was
instead hardcoded to pad a single `0`, which is right in a lot of cases
but not all.
>>> ip1 = ipaddress.ip_address('184.108.40.206')
>>> ip2 = ipaddress.ip_address('10.1.1.1')
>>> "%X" % ip1._ip
>>> "%X" % ip2._ip
Convert all code to not require six library and instead
use python 3.x logic.
Created one helper method in common.utils for binary
representation to limit code changes.
Fixed endpoints logs for listener, pool and member as well.
Rework _get_create_amp_for_lb_subflow due to issue with taskflow
decider and retry subflow.
Retry subflow was not ignored properly for spare amphorae case,
so _get_create_amp_for_lb_subflow has been split to
3 separate subflows each of them linked to graph flow.
This is work around and can be removed when proper mechanism
implemeneted in taskflow library. (added several todos about it).
Code was not using the correct filenames for the 'route',
'route6', 'rule' and 'rule6' files on Red Hat images.
Changed to use config option 'agent_server_network_file'
if it's specified, else the file of the correct name, and
added unit tests for each.
This patch corrects a bug with mutli-listener load balancers that
are using either TLS client authentication and/or backend
This patch allows listeners on a load balancer to continue to
operate should one listener fail to access secret content in
barbican. Previously if one listener failed to access barbican
content, all of the listeners would be impacted.
This patch also cleans up some unused code and unnecessary comments.
Use taskflow retry for connectivity wait. 
This reqired for redis jobboard implementation as each retry expand
claim for job on worker. This means that worker is proccesing job and
it should not be released for other workers to work on it.
Adopted for v2 flows.
 - https://docs.openstack.org/taskflow/latest/user/atoms.html#retry
Currently the keepalivedlvs_query script calls ipvsadm -Ln --stats
to query the local lvs for connection information. If any of these
values grow large enough they will be abbreviated with human-
friendly suffixes (K, M, G) and cause the get_ipvsadm_info func
to raise an exception when it receives a non-integer value from
its command output. By using the --exact argument in addition to
the existing arguments, we can ensure the output is always expanded
numbers, per the ipvsadm man page, and will only ever offer integer
outputs to the get_ipvsadm_info command.
The single process patch changed the way listeners and load balancers
are deployed inside the amphora. This caused listeners with SNI
enabled to load all of the certificates for all of the TLS enabled
listeners on a load balancer.
This patch corrects that by configuring each listener with a
specific list of certificates.
With new pylint release (2.4.1), new warnings were triggered:
Multi-listener LB commit (Idaccbcfa0126f1e26fbb3ad770c65c9266cfad5b)
introduced a v2 message for octavia healthmonitor.
This commit fixes an issue with healthmonitor messages for UDP
listeners, they didn't follow the v2 message specification: pools
dictionaries were stored in listener objects (v1 format) instead of
being stored as in the root dictionary of the message.
Currently jinja_combo.build_config method expect to use single
tls cert, though with multiple listeners there could be multiple
certs. Also in case of HTTP and TERMINATED_HTTPS listeners on the
same loadbalancer - creation of the second listener will fail.
Correct the inline comment to not include an empty new line at the start
of generated /var/lib/octavia/vrrp/check_script.sh that leads to this
kind of error:
> Aug 26 11:49:32 amphora-12184e15-1ec3-4d80-98a7-c7d1ddb6716f
> Keepalived_vrrp: Error exec-ing command
> '/var/lib/octavia/vrrp/check_script.sh', error 8: Exec format error
Currently the amphora agent will lookup interfaces using the
interface name determined earlier in the plug method. This can
lead to a race condition with the udev interface renaming rule.
This patch changes the interface lookup to use the MAC address
directly and not rely on the interface name.
When removing listeners, listeners are removed from the load balancer's
listener list just before reconfiguring each amphora.
In case of ACTIVE/STANDBY topology, the code is performed on both
amphorae, so the listener is removed twice from the list.
This commit ensures that we don't remove an already removed listener.
When removing a UDP health monitor, keepalived is reloaded with a
configuration without any checkers.
But if keepalived has previously detected a down server, the state of
the server is unchanged and it will never be added to the list of IPVS
Restarting keepalived on configuration change works around this issue.
This issue is fixed in keepalived (>=2.0.14):
Load balancers with multiple listeners, running on an amphora image
with HAProxy 1.8 or newer can experience excessive memory usage that
may lead to an ERROR provisioning_status.
This patch resolves this issue by consolidating the listeners into
a single haproxy process inside the amphora.
Co-Authored-By: Adam Harwell <email@example.com>
An exception handler in the amphora-agent has a python3 string
comparison bug that will cause a TypeError.
This patch fixes that bug and adds test coverage for the
This patch configures the primary components of the amphora to log
to syslog using consistent logging facilities.
By default, user traffic logs will go to LOG_LOCAL0 and the amphora
processes (haproxy, keepalived, etc.) will log to LOG_LOCAL1.
This is a patch supporting log offloading.
In order to support Python 3.7, pylint has to be updated to 2.0.0
minimum. Newer versions of Pylint enforce additional checkers which can
be addressed with some code refactoring rather than silently ignoring
them in pylintrc; except useless-object-inheritance which is required to
be silented so that we stay compatible with Python 2.x.
In Python 3.3 IOError is just an alias of OSError. This
causes logging in a very specific scenario to not log
the appropriate message, as one code path is unreachable.
This is fixed in this patch by merging the two exception paths.