Running amphora failover against the amphora noop driver was raising a
TypeError (reload() takes from 2 to 3 positional arguments but 4 were
given).
Change-Id: I64172d6995959cf377364584ad9a2395f9ec0605
This patch refactors the failover flows to improve the performance
and reliability of failovers in Octavia.
Specific improvements are:
* More tasks and flows will retry when other OpenStack services are
failing.
* Failover can now succeed even when all of the amphorae are missing
for a given load balancer.
* It will check and repair the load balancer VIP should the VIP
port(s) become corrupted in neutron.
* It will clean up extra resources that may be associated with a
load balancer in the event of a cloud service failure.
This patch also removes some dead code.
Change-Id: I04cb2f1f10ec566298834f81df0cf8b100ca916c
Story: 2003084
Task: 23166
Story: 2004440
Task: 28108
The -w (timeout) option has no effect in nmap-ncat (the default netcat
in CentOS/RHEL) for UDP datagrams, and nmap-ncat has a default idle
timeout of 2 seconds.
We can get the same behavior as netcat-openbsd (Debian/Ubuntu) by
setting the idle timeout (-i) option to 1 second.
This commit detects the flavor of the netcat binary (nmap vs other) and
uses it to adapt the parameters.
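A minimal sketch of the flavor check described above. The helper name
udp_timeout_args, the 'nmap.org' marker, the sample probe strings and the
use of "-w 1" for the non-nmap case are all assumptions made for
illustration; they are not taken from the patch.

```python
def udp_timeout_args(version_output):
    """Return netcat timeout arguments for the detected flavor.

    version_output is whatever text the binary printed when probed;
    matching on 'nmap.org' is an assumption made for this sketch.
    """
    if 'nmap.org' in version_output.lower():
        # nmap-ncat: -w has no effect for UDP, so set the idle timeout.
        return ['-i', '1']
    # Other flavors (e.g. netcat-openbsd): assumed here to honor -w 1.
    return ['-w', '1']


# Illustrative probe outputs, not captured from real binaries.
print(udp_timeout_args('Ncat: Version 7.70 ( https://nmap.org/ncat )'))
print(udp_timeout_args('OpenBSD netcat (Debian patchlevel 1.206-1)'))
```

Keeping the decision in a pure function that takes the probe text makes
it easy to unit test without either netcat binary installed.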
Story: 2007688
Task: 39800
Change-Id: I0100aaa428477f011bd39a90dd4ec98199b4bebc
E741 ambiguous variable name 'l'
Change 'l' to another variable in affected code.
Also had to set the latex_engine to 'xelatex' in doc/source/conf.py
in order to get past an openstackdocstheme change that broke the pdf
doc build.
Change-Id: Idd176e40ccf2a79832a5c99140bd30e5e1f9c0d8
This patch introduces 2 macros in lvs.
1. Support HTTP GET, allowing users to create an HTTP healthmonitor
for a UDP pool.
2. Support TCP check, allowing users to create a TCP healthmonitor
for a UDP pool.
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Change-Id: I61c7d8d4df54710a92b8c055be84bba29bf3d7e6
Story: 2003200
Task: 23356
Story: 2003199
Task: 23355
Flask's stream always returns bytes, while a file opened in text mode
only accepts str.
This causes py3 amps to return 500 on cert rotation AND wipe out the
certificate, so the amphorae are no longer controllable and go to ERROR
state. Anyone running py3 amps prior to this patch will experience
amphorae breaking on a timer due to housekeeping cert rotation!
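A small sketch of the failure mode, assuming the file was opened in text
mode. The helper write_stream, the file name and the chunk contents are
hypothetical; writing in binary mode is one way to avoid the mismatch
and is not necessarily how this patch does it.

```python
import os
import tempfile


def write_stream(path, chunks):
    """Write an iterable of bytes chunks to path in binary mode."""
    with open(path, 'wb') as cert_file:
        for chunk in chunks:
            cert_file.write(chunk)


chunks = [b'-----BEGIN CERTIFICATE-----\n', b'...\n']
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'server.pem')
    text_mode_error = None
    try:
        # Text mode rejects bytes on Python 3, but opening with 'w'
        # has already truncated the file by the time write() fails.
        with open(path, 'w') as cert_file:
            cert_file.write(chunks[0])
    except TypeError as exc:
        text_mode_error = exc
    truncated_size = os.path.getsize(path)

    # Binary mode accepts the bytes chunks unchanged.
    write_stream(path, chunks)
    written_size = os.path.getsize(path)

print(type(text_mode_error).__name__, truncated_size, written_size)
```

The truncation on open is consistent with the certificate being wiped
even though the request itself fails.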
Change-Id: I831b0b48d719397c14d80f8ebcbad997c50c7795
Listeners will now be able to each be assigned their own OpenSSL
cipher string with a new field: tls_ciphers. There is also a new
configuration option, default_listener_ciphers, which specifies the
cipher string to assign to new listeners when one is not explicitly
specified.
Change-Id: I77da6f14063877af0077f2c12df1aab5d5ead187
Depends-On: Id5f4c20abd40dd092558a711987953012d4ae67f
Story: 2006627
Task: 36839
Should have done "pad to 8 characters" on the hex conversion, but it was
instead hardcoded to pad a single `0`, which is right in a lot of cases
but not all.
For example:
>>> ip1 = ipaddress.ip_address('98.136.140.23')
>>> ip2 = ipaddress.ip_address('10.1.1.1')
>>> "%X" % ip1._ip
'62888C17'
>>> "%X" % ip2._ip
'A010101'
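A minimal sketch of the padded conversion for the two addresses above.
The helper name ip_to_hex is hypothetical, and it uses int() instead of
the private _ip attribute; it only illustrates "%08X" for IPv4.

```python
import ipaddress


def ip_to_hex(address):
    """Return an IPv4 address as 8 upper-case hex digits, zero-padded."""
    return "%08X" % int(ipaddress.ip_address(address))


print(ip_to_hex('98.136.140.23'))
print(ip_to_hex('10.1.1.1'))
```

A fixed width of 8 also covers addresses whose first two octets are
small, where a single leading '0' would still leave the string short.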
Change-Id: Ia9fec4e72c00f7086489b245d9dc50ed9c27f12a
Convert all code to not require six library and instead
use python 3.x logic.
Created one helper method in common.utils for binary
representation to limit code changes.
Change-Id: I2716ce93691d11100ee951a3a3f491329a4073f0
Fixed endpoints logs for listener, pool and member as well.
Rework _get_create_amp_for_lb_subflow due to an issue with the taskflow
decider and retry subflow.
The retry subflow was not ignored properly in the spare amphorae case,
so _get_create_amp_for_lb_subflow has been split into
3 separate subflows, each of them linked to the graph flow.
This is a workaround and can be removed when a proper mechanism is
implemented in the taskflow library (several TODOs were added about it).
Change-Id: Ibd114fa14123e6de6c5d6f260e32cf7f2b28805a
Story: 2005072
Task: 30814
Code was not using the correct filenames for the 'route',
'route6', 'rule' and 'rule6' files on Red Hat images.
Changed to use config option 'agent_server_network_file'
if it's specified, else the file of the correct name, and
added unit tests for each.
Change-Id: I335287da66524d026f0c42086d885b478c568bbd
Task: 37881
Story: 2007051
This patch corrects a bug with multi-listener load balancers that
use TLS client authentication and/or backend re-encryption.
Change-Id: Ib7b083e1dfbfd7afcca870ed6f60a871b2e19253
Story: 2006822
Task: 37394
This patch allows listeners on a load balancer to continue to
operate should one listener fail to access secret content in
barbican. Previously if one listener failed to access barbican
content, all of the listeners would be impacted.
This patch also cleans up some unused code and unnecessary comments.
Change-Id: I300839fe7cf88763e1e0b8c484029662beb64f0a
Story: 2006676
Task: 36951
Use taskflow retry for connectivity wait. [1]
This is required for the redis jobboard implementation, as each retry
extends the worker's claim on the job. This signals that the worker is
still processing the job, so it should not be released to other workers.
Adopted for v2 flows.
[1] - https://docs.openstack.org/taskflow/latest/user/atoms.html#retry
Story: 2005072
Task: 33477
Change-Id: I2cf241ea965ad56ed70ebde83632ab855f5d859e
Currently the keepalivedlvs_query script calls ipvsadm -Ln --stats
to query the local lvs for connection information. If any of these
values grow large enough they will be abbreviated with human-
friendly suffixes (K, M, G) and cause the get_ipvsadm_info func
to raise an exception when it receives a non-integer value from
its command output. By using the --exact argument in addition to
the existing arguments, we can ensure the output is always expanded
numbers, per the ipvsadm man page, and will only ever offer integer
outputs to the get_ipvsadm_info command.
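A minimal sketch of why an abbreviated counter breaks integer parsing.
The helper parse_counter and both field values are illustrative only;
they are not the real get_ipvsadm_info code or captured ipvsadm output.

```python
def parse_counter(field):
    """Convert one ipvsadm statistics field to an integer."""
    return int(field)


# Illustrative field values, not captured from a real ipvsadm run.
exact_field = '1234567'
abbreviated_field = '1234K'

print(parse_counter(exact_field))

try:
    parse_counter(abbreviated_field)
    abbreviated_ok = True
except ValueError as exc:
    # int() cannot interpret the K/M/G suffix.
    abbreviated_ok = False
    print('cannot parse abbreviated value:', exc)
```

Asking ipvsadm for exact values avoids having to guess how a suffix
was rounded when converting it back to a number.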
Change-Id: I2e8c0be2221c0c23b752fdf2cdff065cddf830a5
Story: 2006791
Task: 37331
The single process patch changed the way listeners and load balancers
are deployed inside the amphora. This caused listeners with SNI
enabled to load all of the certificates for all of the TLS enabled
listeners on a load balancer.
This patch corrects that by configuring each listener with a
specific list of certificates.
Change-Id: I2f3c7ab4137dbd84d77a6a6b675975af406249d0
Story: 2006758
Task: 37252
With new pylint release (2.4.1), new warnings were triggered:
- unnecessary-comprehension
- no-else-break
- no-else-continue
- import-outside-toplevel
Change-Id: I301cc9fc6b41e9e97f051df29d768b172cade636
Multi-listener LB commit (Idaccbcfa0126f1e26fbb3ad770c65c9266cfad5b)
introduced a v2 message for octavia healthmonitor.
This commit fixes an issue with healthmonitor messages for UDP
listeners. They didn't follow the v2 message specification: pool
dictionaries were stored in listener objects (v1 format) instead of
being stored in the root dictionary of the message.
Story: 2005736
Task: 33394
Change-Id: I93e5eb5bc69fe4de4c450c09367b319769ef07db
The amphora no-op driver had the wrong method signature for the
update_amphora_agent_config method.
This patch corrects that issue.
Change-Id: Ib1b0df3b7227d8a8dd68276e279cae1c4974ded2
Currently the jinja_combo.build_config method expects a single
TLS cert, though with multiple listeners there can be multiple
certs. Also, with HTTP and TERMINATED_HTTPS listeners on the
same load balancer, creation of the second listener fails.
Change-Id: Iad3b55e5add4283256f7836c3d4a501aa57ffc2f
Story: 2006513
Task: 36510
Correct the inline comment to not include an empty new line at the start
of generated /var/lib/octavia/vrrp/check_script.sh that leads to this
kind of error:
> Aug 26 11:49:32 amphora-12184e15-1ec3-4d80-98a7-c7d1ddb6716f
> Keepalived_vrrp[15265]: Error exec-ing command
> '/var/lib/octavia/vrrp/check_script.sh', error 8: Exec format error
Change-Id: Icddd2873abeb56a389a35356995df6dde70872b2
Currently the amphora agent looks up interfaces using the
interface name determined earlier in the plug method. This can
lead to a race condition with the udev interface renaming rule.
This patch changes the interface lookup to use the MAC address
directly and not rely on the interface name.
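A rough sketch of a MAC-based lookup, assuming the Linux sysfs layout
where each interface directory contains an 'address' file. The helper
find_interface_by_mac and the sample MAC addresses are hypothetical, and
the agent may use a different mechanism to do the lookup.

```python
import os
import tempfile


def find_interface_by_mac(mac, sysfs_net='/sys/class/net'):
    """Return the name of the interface whose MAC matches, or None."""
    wanted = mac.strip().lower()
    for name in sorted(os.listdir(sysfs_net)):
        address_file = os.path.join(sysfs_net, name, 'address')
        try:
            with open(address_file) as handle:
                if handle.read().strip().lower() == wanted:
                    return name
        except OSError:
            # Entry without a readable address file; skip it.
            continue
    return None


# Demo against a throwaway directory that mimics the sysfs layout.
with tempfile.TemporaryDirectory() as fake_sysfs:
    for name, address in (('eth0', 'fa:16:3e:00:00:01'),
                          ('eth1', 'fa:16:3e:00:00:02')):
        os.makedirs(os.path.join(fake_sysfs, name))
        with open(os.path.join(fake_sysfs, name, 'address'), 'w') as handle:
            handle.write(address + '\n')
    found = find_interface_by_mac('FA:16:3E:00:00:02', fake_sysfs)
    missing = find_interface_by_mac('fa:16:3e:00:00:99', fake_sysfs)

print(found, missing)
```

Because the name is resolved at lookup time, a rename by udev between
plug and lookup does not matter.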
Story: 2006300
Task: 36013
Change-Id: I5bc21d5abdeb67a3a8ae88456735643463f15694
When removing listeners, listeners are removed from the load balancer's
listener list just before reconfiguring each amphora.
With the ACTIVE/STANDBY topology, this code runs for both
amphorae, so the listener is removed twice from the list.
This commit ensures that we don't remove an already removed listener.
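A minimal sketch of the guard, using plain strings in place of listener
objects. The helper remove_listener and the amphora role names are
hypothetical and only illustrate the idempotent removal.

```python
def remove_listener(listeners, listener):
    """Remove listener from the list if it is still present."""
    if listener in listeners:
        listeners.remove(listener)


listeners = ['listener-a', 'listener-b']

# The same removal runs once per amphora in an ACTIVE/STANDBY pair.
for _amphora in ('master', 'backup'):
    remove_listener(listeners, 'listener-a')

print(listeners)
```

Without the membership check, the second list.remove() call would raise
ValueError.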
Story: 2006329
Task: 36065
Change-Id: I426255f587f36b415eb999a9eb28cf0f91de94b0
When removing a UDP health monitor, keepalived is reloaded with a
configuration without any checkers.
But if keepalived has previously detected a down server, the state of
the server is unchanged and it will never be added to the list of IPVS
servers.
Restarting keepalived on configuration change works around this issue.
This issue is fixed in keepalived (>=2.0.14):
https://github.com/acassen/keepalived/issues/1163
Story: 2005774
Task: 33491
Change-Id: Iaa34db6cb1dfed98e96a585c5d105e263c7efa65
This commit fixes pool and members status when using UDP loadbalancers.
Story: 2005736
Task: 33394
Change-Id: I75cde3ff820f085aebbdffd1e40c5ff40f16835d
Load balancers with multiple listeners, running on an amphora image
with HAProxy 1.8 or newer can experience excessive memory usage that
may lead to an ERROR provisioning_status.
This patch resolves this issue by consolidating the listeners into
a single haproxy process inside the amphora.
Story: 2005412
Task: 34744
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Change-Id: Idaccbcfa0126f1e26fbb3ad770c65c9266cfad5b
An exception handler in the amphora-agent has a python3 string
comparison bug that will cause a TypeError.
This patch fixes that bug and adds test coverage for the
start_stop_listener.
Change-Id: I6f5d95c5f875edda530f54ae72386d6495235ca6
Story: 2005898
Task: 33760
Configure rsyslog to forward logs to a target host
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Story: 1665069
Task: 33646
Change-Id: I00703f86555cbb574b943794b14a36fbc644f1b2
This patch configures the primary components of the amphora to log
to syslog using consistent logging facilities.
By default, user traffic logs will go to LOG_LOCAL0 and the amphora
processes (haproxy, keepalived, etc.) will log to LOG_LOCAL1.
This is a patch supporting log offloading.
Change-Id: Ifda91e0310e812e34f1e398dd3176af8a9c58f89
Story: 1665069
Task: 5486
This resolves extraneous "improper escape sequence" warnings on
python 3.6+ [1].
Note, this does not resolve those warnings from pylint. There
is already another proposed patch to address pylint[2].
[1] https://review.opendev.org/494322
[2] https://review.opendev.org/635236
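A short sketch of the usual fix for this class of warning: mark
patterns that contain backslashes as raw strings. The pattern and the
name AMPHORA_NAME are made up for illustration and do not come from the
patch.

```python
import re

# Written as '^amphora-(\d+)$' without the r prefix, the "\d" would be
# an unrecognized escape and trigger the warning when compiled.
AMPHORA_NAME = re.compile(r'^amphora-(\d+)$')

match = AMPHORA_NAME.match('amphora-42')
print(match.group(1))

# The raw string and the doubled backslash spell the same characters.
same_text = r'\d' == '\\d'
print(same_text)
```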
Change-Id: Ie160436913e4d935bab118d31ba10193ac38bd8f
In order to support Python 3.7, pylint has to be updated to 2.0.0
minimum. Newer versions of Pylint enforce additional checkers, which are
addressed here with some code refactoring rather than by silently
ignoring them in pylintrc. The exception is useless-object-inheritance,
which has to be silenced so that we stay compatible with Python 2.x.
Story: 2004073
Task: 27434
Change-Id: I52301d763797d619f195bd8a1c32bc47f1e68420
Since Python 3.3, IOError is just an alias of OSError. This
causes logging in a very specific scenario to not log
the appropriate message, as one code path is unreachable.
This is fixed in this patch by merging the two exception paths.
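A minimal sketch of the unreachable handler and a merged version. The
function names and messages are hypothetical; only the aliasing of
IOError to OSError is a Python fact.

```python
def describe(exc):
    """Show which handler catches exc when both names are listed."""
    try:
        raise exc
    except OSError:
        return 'OSError handler'
    except IOError:
        # Never reached on Python 3.3+: IOError is OSError.
        return 'IOError handler'


def describe_merged(exc):
    """Single handler covering what used to be two code paths."""
    try:
        raise exc
    except OSError as err:
        return 'I/O or OS error: %s' % err


alias = IOError is OSError
print(alias)
print(describe(IOError('disk full')))
print(describe_merged(IOError('disk full')))
```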
Story: 2005576
Task: 30765
Change-Id: Ie81de8e85753fde1516aea0b084df6a0c513ad7b