Convert all code to drop the six library and use native
Python 3.x constructs instead.
Add one helper function to common.utils for binary
representation to limit code changes.
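A minimal sketch of what such a helper might look like, assuming it simply encodes text as UTF-8 bytes. The name `b` and its body are illustrative assumptions, not taken from this change:

```python
def b(text):
    """Return the UTF-8 bytes for a str (hypothetical helper)."""
    return text.encode('utf-8')


# Call sites that previously relied on six for bytes literals can call
# the helper instead.
payload = b('heartbeat')
```

Keeping the conversion in one place means call sites change by a single import rather than by rewriting every literal.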
Change-Id: I2716ce93691d11100ee951a3a3f491329a4073f0
Fixed endpoints logs for listener, pool and member as well.
Rework _get_create_amp_for_lb_subflow due to an issue with the taskflow
decider and the retry subflow.
The retry subflow was not ignored properly in the spare amphorae case,
so _get_create_amp_for_lb_subflow has been split into
3 separate subflows, each of them linked to the graph flow.
This is a workaround and can be removed when a proper mechanism is
implemented in the taskflow library (several TODOs were added about it).
Change-Id: Ibd114fa14123e6de6c5d6f260e32cf7f2b28805a
Story: 2005072
Task: 30814
This patch corrects a bug with multi-listener load balancers that
use TLS client authentication and/or backend re-encryption.
Change-Id: Ib7b083e1dfbfd7afcca870ed6f60a871b2e19253
Story: 2006822
Task: 37394
This patch allows listeners on a load balancer to continue to
operate should one listener fail to access secret content in
barbican. Previously if one listener failed to access barbican
content, all of the listeners would be impacted.
This patch also cleans up some unused code and unnecessary comments.
Change-Id: I300839fe7cf88763e1e0b8c484029662beb64f0a
Story: 2006676
Task: 36951
Use taskflow retry for the connectivity wait. [1]
This is required for the redis jobboard implementation, as each retry
extends the worker's claim on the job. This signals that the worker is
still processing the job and that it should not be released for other
workers to work on.
Adopted for v2 flows.
[1] - https://docs.openstack.org/taskflow/latest/user/atoms.html#retry
Story: 2005072
Task: 33477
Change-Id: I2cf241ea965ad56ed70ebde83632ab855f5d859e
The single process patch changed the way listeners and load balancers
are deployed inside the amphora. This caused listeners with SNI
enabled to load all of the certificates for all of the TLS enabled
listeners on a load balancer.
This patch corrects that by configuring each listener with a
specific list of certificates.
Change-Id: I2f3c7ab4137dbd84d77a6a6b675975af406249d0
Story: 2006758
Task: 37252
The amphora no-op driver had the wrong method signature for the
update_amphora_agent_config method.
This patch corrects that issue.
Change-Id: Ib1b0df3b7227d8a8dd68276e279cae1c4974ded2
Currently the jinja_combo.build_config method expects a single
TLS cert, though with multiple listeners there could be multiple
certs. Also, with HTTP and TERMINATED_HTTPS listeners on the
same load balancer, creation of the second listener will fail.
Change-Id: Iad3b55e5add4283256f7836c3d4a501aa57ffc2f
Story: 2006513
Task: 36510
When removing listeners, listeners are removed from the load balancer's
listener list just before reconfiguring each amphora.
With the ACTIVE/STANDBY topology, this code runs once for each of the
two amphorae, so the listener is removed twice from the list.
This commit ensures that we don't remove an already removed listener.
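A rough sketch of the guard, assuming listeners are tracked in a plain list. The function name and the identifiers below are hypothetical and only illustrate the idea:

```python
def remove_listener(listeners, listener_id):
    """Remove listener_id from the list only if it is still present."""
    if listener_id in listeners:
        listeners.remove(listener_id)
    return listeners


lb_listeners = ['listener-a', 'listener-b']
# In ACTIVE/STANDBY the same removal runs once per amphora.
for _amphora in ('amp-master', 'amp-backup'):
    remove_listener(lb_listeners, 'listener-a')
```

Without the membership check, the second call to list.remove() would raise ValueError.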
Story: 2006329
Task: 36065
Change-Id: I426255f587f36b415eb999a9eb28cf0f91de94b0
Load balancers with multiple listeners, running on an amphora image
with HAProxy 1.8 or newer can experience excessive memory usage that
may lead to an ERROR provisioning_status.
This patch resolves this issue by consolidating the listeners into
a single haproxy process inside the amphora.
Story: 2005412
Task: 34744
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Change-Id: Idaccbcfa0126f1e26fbb3ad770c65c9266cfad5b
In order to support Python 3.7, pylint has to be updated to at least
2.0.0. Newer versions of pylint enforce additional checkers, which can
be addressed with some code refactoring rather than silently ignoring
them in pylintrc; the exception is useless-object-inheritance, which
must be silenced so that we stay compatible with Python 2.x.
Story: 2004073
Task: 27434
Change-Id: I52301d763797d619f195bd8a1c32bc47f1e68420
Add tls_ca_container_id and crl_container_id into Pool API.
Story: 2003858
Task: 26672
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: I6cd6e2ca8e48a5df707a70d22505dec9d752c7eb
Add one field to Pool, 'tls_container_ref', as Listener already has.
It stores the client certificate that the pool presents to the
backend servers when traffic to those servers must carry a
certificate for the TLS connection.
Story: 2003859
Task: 26685
Change-Id: I29b7c7116e6087c942179ed9efdead494ef277a3
Add crl-file support on the Listener side.
Story: 2002165
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: I9e2ec06719fbbfd19482c2b8d39220e7e4ed81e3
This patch adds 'client_ca_tls_container_ref' to the listener API for
front-end client authentication.
Story: 2002165
Task: 20018
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: I8a96d6fdfe53a16d1abcfd09bc6afedd6c490de2
This patch validates that a flavor is compatible with using spares
pool amphora. It will also update the amphora-agent config after
a spares pool amphora has been allocated.
This patch enables updating a running amphora's agent
configuration and having the mutable options be adopted.
The following amphora agent configuration options can be updated:
* heartbeat_key
* controller_ip_port_list
* heartbeat_interval
* loadbalancer_topology
This patch adds the support to the amphora-agent and the amphora
driver. A follow-on patch will expose this capability via the
amphora admin API.
Change-Id: I97bdf5188808193516509f20767e82c0f8d2f5a5
The dual-amp-down fix added an amphora parameter to the amphora driver
interface, but failed to update the driver base and the noop driver.
This patch corrects that oversight.
Depends-On: https://review.openstack.org/634992
Change-Id: I7bd63c933f8e7cd10ff5c89fafbbb09e8cc9e3e1
Load balancers with IPv6 VIP addresses would fail to create due to
a duplicate address detection issue. The keepalived process would also
crash with a segfault due to a known bug[1].
This patch resolves both issues and allows load balancers with IPv6
VIP addresses to be created in active/standby topology.
[1] https://github.com/acassen/keepalived/issues/457
Story: 2003451
Task: 24657
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: I15a4be05740e2657f998902d468e57763c3ed52e
A recent patch[1] (stein master) added the http-reuse option to the
haproxy template for pools. This feature is not available in the HAProxy
version included with CentOS 7, 1.5.x. This could cause an upgrade issue
if the control plane was upgraded to Stein, but the cloud still had older
CentOS-based amphorae.
This patch corrects that issue by checking the HAProxy version in the
amphora and adjusting the template if it finds an older HAProxy.
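The version check can be pictured roughly as follows. This is a sketch only: it assumes a dotted numeric version string, that http-reuse is usable from HAProxy 1.6 onward, and that the template emits "http-reuse safe"; the function names are hypothetical.

```python
def supports_http_reuse(haproxy_version):
    """Return True if the given HAProxy version string is 1.6 or newer."""
    major, minor = (int(part) for part in haproxy_version.split('.')[:2])
    return (major, minor) >= (1, 6)


def pool_options(haproxy_version):
    """Build the pool option lines for a (hypothetical) template."""
    options = ['balance roundrobin']
    if supports_http_reuse(haproxy_version):
        # Only emitted when the amphora's HAProxy understands it.
        options.append('http-reuse safe')
    return options
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings lexically.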
This patch also updates the test_health_check_stale_amphora test to
not wait (sleep) for the full heartbeat_timeout.
[1] https://review.openstack.org/#/c/598379/
Change-Id: I3d990d1d3cd93dbeced9edc53f9c166610dafcd0
Story: 2003901
Task: 26775
The amphora no-op driver did not get updated properly for the multi-amphora
failover fix.
This patch fixes that issue and corrects the doc strings for the
haproxy amphora driver update_amphora_listeners method.
Change-Id: Ib0d63da7c5599069f5ea50f0dfbc59eefba58c84
When the queue_event_streamer driver is used and RabbitMQ
is down, stats update processes occupy the thread pool
that is shared with health update processes. As a result,
an unexpected RabbitMQ outage leads to all existing
amphorae being deleted. This commit separates the thread pools and
aims to keep the existing amphorae working even when RabbitMQ
is down.
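The effect of separating the pools can be illustrated with a small standard-library sketch. The pool names, the worker functions and the Event standing in for a blocked RabbitMQ publish are all assumptions made for illustration, not the code in this change:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One pool per kind of update, so a stalled stats update cannot starve
# health updates.
health_pool = ThreadPoolExecutor(max_workers=1)
stats_pool = ThreadPoolExecutor(max_workers=1)

rabbit_up = threading.Event()


def update_stats():
    rabbit_up.wait()  # stand-in for a publish that blocks while the broker is down
    return 'stats'


def update_health():
    return 'health'


stats_future = stats_pool.submit(update_stats)
health_future = health_pool.submit(update_health)

# The health update completes even though the stats update is still blocked.
health_result = health_future.result(timeout=5)

rabbit_up.set()  # the broker comes back
stats_result = stats_future.result(timeout=5)

health_pool.shutdown()
stats_pool.shutdown()
```

With a single shared one-worker pool, the health update would have queued behind the blocked stats update.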
Change-Id: I576687f5b646496ff3a00787cf5e8c27f36b9448
Task: 22929
Story: 2002937
In Pike[1], we introduced user_group auto-detection for haproxy.
The default user group name is auto-detected for any OS distribution
we support as a base for Amphorae.
user_group remained as an option for admins but was also
marked deprecated in Pike[2].
This patch removes that option altogether.
Story: 2003323
Task: 24357
[1] Ia8fede9d7da4709a48661d1fc595a16d04fcbfa9
[2] https://review.openstack.org/#/c/429398/45/octavia/common/config.py@175
Change-Id: Iddd4162674f116705d2b47062cbf7ca88f2677a6
1. Removes the misc_dynamic setting from the UDP-CONNECT health monitor
as our script does not use it.
2. Adds a release note for the UDP features.
3. Updates the API reference for UDP support.
4. Adds a comment to the keepalived config with the LB ID.
5. Updates the status message type to be the correct UDP protocol.
6. Fixes an error when deleting a listener if there are multiple amphorae.
7. Refactors systemd service script handling.
Story: 2003306
Task: 24258
Change-Id: I09240023d066ac5a71836d01045cda6ce5678712
These files will be split from the current Octavia repo until the other
parts are ready.
Patch List:
[1] Finish keepalived LVS jinja template for UDP support
[2] Extend the amp agent so it can upload/refresh the keepalived
process
[3] Extend the db model and db table with the fields needed to meet the
new udp backend
[4] Add logic/workflow elements to process UDP cases
[5] Extend the existing API to access udp parameters in Listener API
[6] Extend the existing pool API to access the new option in
session_persistence fields
Change-Id: Ib4924e602d450b1feadb29e830d715ae77f5bbfe
If a load balancer loses more than one amphora at the same time
the failover process will fail and leave the load balancer in
provisioning status ERROR.
This patch resolves this by failing over one amphora at a time,
marking any other failed amphorae with status ERROR. The health
manager will then fail over the other failed amphorae in subsequent checks.
This patch will update multiple healthy amphorae in parallel and will
time out failed amphorae using the new "active_connection_max_retries"
configuration setting used for "fail-fast" connections.
The patch also updates the amphora failover flow documentation to
show the full flow and not just the spares failover flow.
It updates the amphora driver "get_diagnostics" method to pass instead
of raising an error.
It also adds an AmphoraComputeConnectivityWait task to explicitly wait
for a compute instance to come up and be reachable. This allows a longer
timeout and clarifies this may fail due to compute (nova) failures.
Previously the first plug vip task would do this wait.
Change-Id: Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701
Story: 2001481
Task: 6202
We want to default to running all tox environments under python 3, so
set the basepython value in each environment.
We do not want to specify a minor version number, because we do not
want to have to update the file every time we upgrade python.
We do not want to set the override once in testenv, because that
breaks the more specific versions used in default environments like
py35 and py36.
This patch also updates pylint to 1.5.6 which is compatible with
python3.
Updating pylint surfaces some issues; this patch addresses
them so the Octavia code passes pylint 1.5.6.
Change-Id: Iec21f4c803a427059d595612336d67a35ebf9585
Signed-off-by: Doug Hellmann <doug@doughellmann.com>
Currently we are not logging the exception information the amphora
agent provides in the case of failure. This makes troubleshooting
agent issues hard because you need access to the amphora.
This patch adds logging to capture the amphora agent exception
information in the controller log.
Change-Id: I59514f42c38fc37c7a1fbb5507cd0a15f6d545cb
The common name is used as a file name inside the HAProxy
configuration file. However, a common name can include spaces,
which results in a configuration file that simply doesn't
work.
The patch changes the functionality so that it instead creates
a SHA1 hash of the certificate and uses that as the file name
to avoid those issues.
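As an illustration of the approach, a file name can be derived from the certificate bytes with hashlib. The function name, the ".pem" suffix and the sample input are assumptions for this sketch:

```python
import hashlib


def cert_file_name(cert_pem):
    """Derive a file name from the SHA1 hex digest of the certificate."""
    return hashlib.sha1(cert_pem).hexdigest() + '.pem'


# Placeholder bytes standing in for real PEM data.
name = cert_file_name(b'-----BEGIN CERTIFICATE-----\nplaceholder')
```

A hex digest contains only the characters 0-9 and a-f, so the resulting name is always safe to embed in the configuration file.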
Change-Id: I039ed0b40df8b72a1238f8896548fe77086c530c
If nova fails to delete an amphora, it will continue to send
health heartbeat messages to the health manager. This patch improves the
logging for these amphorae.
It also optimizes the statistics update flow when event streaming is
disabled by removing two extra database calls.
This patch also removes the un-used BaseControllerTask class.
This patch also finally solidifies that there will be one LB per amphora.
Change-Id: Idf83b19216c680a4854c1239ed9c5bc5ce7364a7
It was reported that the Health Manager process could be crashed with
malformed heartbeat packets. I was unable to reproduce the issue
(I suspect oslo_utils fixed the root cause), but I could see how this
could happen and our error handling could be improved.
This is a lower severity issue, as this port is intended to be accessible
only from a private lb-mgmt-net network.
This patch adds additional exception handling to the Health Manager
listener routines to better handle heartbeat packet issues.
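The kind of defensive handling described here might look like the sketch below. The wire format (zlib-compressed JSON followed by an HMAC-SHA256 digest) and the function name are assumptions made for illustration; the point is that any malformed packet yields None instead of an unhandled exception.

```python
import hashlib
import hmac
import json
import zlib


def unwrap_heartbeat(packet, key):
    """Return the decoded heartbeat dict, or None if the packet is bad."""
    digest_len = hashlib.sha256().digest_size
    try:
        payload, received = packet[:-digest_len], packet[-digest_len:]
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(received, expected):
            return None
        return json.loads(zlib.decompress(payload).decode('utf-8'))
    except (zlib.error, ValueError):
        # ValueError also covers invalid JSON and invalid UTF-8.
        return None
```

A listener loop can then log and skip packets for which this returns None rather than crashing the process.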
Change-Id: I2da6fa394f5152148237d0986fd969b7950815ba
Story: 2001959
Task: 15081
This also fixes build-openstack-sphinx-docs; a change was introduced
in sphinx 1.6.6:
https://github.com/sphinx-doc/sphinx/pull/4335/files
If the size of __init__.py is less than 2, the module is
skipped, which causes the sphinx consistency check to fail later.
Change-Id: I9d8764b6e907aceed8bb8a9b04711145d0eb32ad
* Switch to ProcessPool from ThreadPool (Python threads are horrible)
* Load the health/stats update drivers correctly with stevedore
* Add logging driver
Change-Id: Icda4ff218a6bbaea252c2d75c073754f4161a597
If a Health Manager is overloaded, it can begin to fall very far behind
in processing health updates. This causes huge delays in the whole
system and can cause two distinctly different issues:
1) If the HMs are all suddenly busy, delays can be long enough that no
messages get through within the failover timeout, and amps start to
fail, increasing load on the HMs and causing a cascade failure (I have
witnessed this happen once and take down over 50 LBs before manual
intervention could be taken).
2) Even one overloaded HM can cause updates to queue for extremely long
periods, which makes the system unreliable. Amps can go down and still
have health updates register for some time as the HM processes the queue
(in some cases I have seen dead amps updated for 5-10 minutes).
If we short-circuit handling before we update the health table, we can
solve these problems in two ways:
1) The heavy processing generally happens after this, so
short-circuiting early will let some other threads finish faster and
have some chance of success.
2) Amphora health won't continue to be updated long after the messages
were received, so it won't be possible for zombie amphorae to eat as
many brains.
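The short-circuit can be sketched as an age check that runs before any health-table work. The function names, the max_age parameter and the use of a monotonic clock are assumptions for illustration:

```python
import time


def is_stale(received_at, max_age, now=None):
    """Return True if a heartbeat waited in the queue longer than max_age."""
    if now is None:
        now = time.monotonic()
    return (now - received_at) > max_age


def handle_heartbeat(message, received_at, max_age, update_health, now=None):
    """Drop stale messages before any health-table update happens."""
    if is_stale(received_at, max_age, now):
        return False
    update_health(message)
    return True
```

For example, with max_age set to 10 seconds, a message received at t=0 is processed at t=5 but dropped at t=50, so a dead amphora stops being refreshed by old queued heartbeats.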
Change-Id: Iceeacfdcaebe1f9bb99bc08e318c9da73a66898d
The 'import tools' line is fragile as it depends on how things are
executed as to whether or not '.' is in the python path.
Do the sphinx path munging before importing it.
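A minimal sketch of the ordering, assuming the local module lives in the directory the build runs from (the exact path used by the docs build is not stated here):

```python
import os
import sys

# Put the current directory on the import path explicitly, so that a later
# local import does not depend on how the build was started.
sys.path.insert(0, os.path.abspath('.'))

# Local imports such as "import tools" belong after this point.
```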
Also, remove reference to modules/autoindex which does not exist and
thus causes sadness from warning-is-error.
Moves documentation requirements into doc/requirements.txt
Depends-On: Ib121961c5a953a434e7b333cd70f7838a2671f69
Change-Id: I23691aa1d0ea038ec1215e6199015529ddd92de4