Convert all code to no longer require the six library and to
use Python 3.x logic instead.
Created one helper method in common.utils for binary
representation to limit code changes.
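A minimal sketch of what such a binary-representation helper might look like. The name to_bytes and its signature are hypothetical; the commit message does not show the actual helper in common.utils.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is a str.

    Hypothetical helper: a single call site like this can replace
    six-based conversions scattered through the code.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


payload = to_bytes("heartbeat")
```

Callers that may receive either str or bytes can pass the value through unchanged, which keeps the diff at each call site to one line.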
Change-Id: I2716ce93691d11100ee951a3a3f491329a4073f0
Now that we are python3 only, we should move to using the built
in version of mock that supports all of our testing needs and
remove the dependency on the "mock" package.
This patch moves all references to "import mock" to
"from unittest import mock". It also cleans up some newline
inconsistencies.
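For illustration, a test module after this change might import and use mock as follows. The fetch_status function and the client object are made up for the example.

```python
from unittest import mock  # replaces the third-party "import mock"


def fetch_status(client):
    """Hypothetical code under test: upper-case the client's status."""
    return client.get_status().upper()


client = mock.Mock()
client.get_status.return_value = "active"
result = fetch_status(client)

# The standard library mock offers the same assertion helpers.
client.get_status.assert_called_once_with()
```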
Change-Id: I72520a2ca010c2c27315d9dff839a4f9d7540b6b
This patch allows listeners on a load balancer to continue to
operate should one listener fail to access secret content in
barbican. Previously if one listener failed to access barbican
content, all of the listeners would be impacted.
This patch also cleans up some unused code and unnecessary comments.
Change-Id: I300839fe7cf88763e1e0b8c484029662beb64f0a
Story: 2006676
Task: 36951
The single process patch changed the way listeners and load balancers
are deployed inside the amphora. This caused listeners with SNI
enabled to load all of the certificates for all of the TLS enabled
listeners on a load balancer.
This patch corrects that by configuring each listener with a
specific list of certificates.
Change-Id: I2f3c7ab4137dbd84d77a6a6b675975af406249d0
Story: 2006758
Task: 37252
The amphora no-op driver had the wrong method signature for the
update_amphora_agent_config method.
This patch corrects that issue.
Change-Id: Ib1b0df3b7227d8a8dd68276e279cae1c4974ded2
This patch adds support for the octavia-lib to get objects by ID.
Change-Id: I98b399891488e5972ea4d332c06b55b34f20fb11
Story: 2005870
Task: 33680
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
When removing listeners, listeners are removed from the load balancer's
listener list just before reconfiguring each amphora.
In the case of ACTIVE/STANDBY topology, the code is executed for both
amphorae, so the listener is removed twice from the list.
This commit ensures that we don't remove an already removed listener.
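The guard described above can be sketched as follows. This is an illustration only; the function name and the use of plain lists of IDs are assumptions, not the actual Octavia code.

```python
def remove_listener(listeners, listener_id):
    """Remove listener_id from listeners if it is still present.

    Returns True when something was removed, False when the listener
    had already been removed (for example by the pass for the other
    amphora in an ACTIVE/STANDBY pair).
    """
    if listener_id in listeners:
        listeners.remove(listener_id)
        return True
    return False


lb_listeners = ["listener-1", "listener-2"]
first_pass = remove_listener(lb_listeners, "listener-1")   # first amphora
second_pass = remove_listener(lb_listeners, "listener-1")  # second amphora
```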
Story: 2006329
Task: 36065
Change-Id: I426255f587f36b415eb999a9eb28cf0f91de94b0
Load balancers with multiple listeners, running on an amphora image
with HAProxy 1.8 or newer can experience excessive memory usage that
may lead to an ERROR provisioning_status.
This patch resolves this issue by consolidating the listeners into
a single haproxy process inside the amphora.
Story: 2005412
Task: 34744
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Change-Id: Idaccbcfa0126f1e26fbb3ad770c65c9266cfad5b
Add tls_ca_container_id and crl_container_id into Pool API.
Story: 2003858
Task: 26672
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: I6cd6e2ca8e48a5df707a70d22505dec9d752c7eb
Add one field, 'tls_container_ref', as Listener already has. This
field is introduced into Pool to store the client certificate that the
pool presents to the backend servers, for cases where the traffic must
carry a certificate that the servers check for the TLS connection.
Story: 2003859
Task: 26685
Change-Id: I29b7c7116e6087c942179ed9efdead494ef277a3
Add crl-file in Listener side.
Story: 2002165
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: I9e2ec06719fbbfd19482c2b8d39220e7e4ed81e3
This patch adds 'client_ca_tls_container_ref' to the listener API for
front-end client authentication.
Story: 2002165
Task: 20018
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: I8a96d6fdfe53a16d1abcfd09bc6afedd6c490de2
This patch validates that a flavor is compatible with using spares
pool amphora. It will also update the amphora-agent config after
a spares pool amphora has been allocated.
This patch enables the ability to update a running amphora's agent
configuration and have the mutable options be adopted.
The following amphora agent configuration options can be updated:
heartbeat_key
controller_ip_port_list
heartbeat_interval
loadbalancer_topology
This patch adds the support to the amphora-agent and the amphora
driver. A follow on patch will expose this capability via the
amphora admin API.
Change-Id: I97bdf5188808193516509f20767e82c0f8d2f5a5
The dual-amp-down fix added an amphora parameter to the amphora driver
interface, but failed to update the driver base and the noop driver.
This patch corrects that oversight.
Depends-On: https://review.openstack.org/634992
Change-Id: I7bd63c933f8e7cd10ff5c89fafbbb09e8cc9e3e1
Load balancers with IPv6 VIP addresses would fail to create due to
a duplicate address detection issue. The keepalived process would also
crash with a segfault due to a known bug[1].
This patch resolves both issues and allows load balancers with IPv6
VIP addresses to be created in active/standby topology.
[1] https://github.com/acassen/keepalived/issues/457
Story: 2003451
Task: 24657
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: I15a4be05740e2657f998902d468e57763c3ed52e
A recent patch[1] (stein master) added the http-reuse option to the
haproxy template for pools. This feature is not available in the HAProxy
version included with CentOS 7, 1.5.x. This could cause an upgrade issue
if the control plane was upgraded to Stein, but the cloud still had older
CentOS based amphora.
This patch corrects that issue by checking the HAProxy version in the
amphora and adjusting the template if it finds an older HAProxy.
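A rough sketch of such a version check. The function names are hypothetical, and the threshold assumes that http-reuse first became available in HAProxy 1.6; the directives emitted are only placeholders for whatever the real template renders.

```python
import re


def haproxy_version_tuple(version):
    """Parse a version string such as '1.5.18' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized HAProxy version: %r" % version)
    return int(match.group(1)), int(match.group(2))


def supports_http_reuse(version):
    # Assumption: http-reuse is available from HAProxy 1.6 onward.
    return haproxy_version_tuple(version) >= (1, 6)


def pool_template_lines(version):
    """Return illustrative backend lines, omitting http-reuse on old HAProxy."""
    lines = ["balance roundrobin"]
    if supports_http_reuse(version):
        lines.append("http-reuse safe")
    return lines
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings lexically.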
This patch also updates the test_health_check_stale_amphora test to
not wait (sleep) for the full heartbeat_timeout.
[1] https://review.openstack.org/#/c/598379/
Change-Id: I3d990d1d3cd93dbeced9edc53f9c166610dafcd0
Story: 2003901
Task: 26775
The amphora no-op driver did not get updated properly for the multi-amphora
failover fix.
This patch fixes that issue and corrects the doc strings for the
haproxy amphora driver update_amphora_listeners method.
Change-Id: Ib0d63da7c5599069f5ea50f0dfbc59eefba58c84
When the queue_event_streamer driver is used and RabbitMQ
is down, stats update processes occupy the thread pool
that is shared with health update processes. As a result,
a RabbitMQ outage unexpectedly leads to the deletion of all
existing amphorae. This commit separates the thread pools and
aims to keep the existing amphorae working even when RabbitMQ
is down.
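The idea of separating the pools can be sketched like this. The pool sizes, function names, and message layout are assumptions made for the example, not the actual health manager code.

```python
from concurrent.futures import ThreadPoolExecutor

# Two independent pools: stats updates that block (for example on an
# unreachable message queue) can no longer starve health updates.
health_pool = ThreadPoolExecutor(max_workers=2)
stats_pool = ThreadPoolExecutor(max_workers=2)


def update_health(message):
    """Placeholder for the health update handler."""
    return ("health", message["id"])


def update_stats(message):
    """Placeholder for the stats update handler."""
    return ("stats", message["id"])


def dispatch(message):
    """Submit each kind of update to its own pool and return both futures."""
    health_future = health_pool.submit(update_health, message)
    stats_future = stats_pool.submit(update_stats, message)
    return health_future, stats_future
```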
Change-Id: I576687f5b646496ff3a00787cf5e8c27f36b9448
Task: 22929
Story: 2002937
In Pike[1], we introduced a user_group auto detection for haproxy.
The default user group name is auto-detected for any OS distribution
we support as a base for Amphorae.
user_group remained as an option for admins but was also
marked deprecated in Pike[2].
This patch removes that option altogether.
Story: 2003323
Task: 24357
[1] Ia8fede9d7da4709a48661d1fc595a16d04fcbfa9
[2] https://review.openstack.org/#/c/429398/45/octavia/common/config.py@175
Change-Id: Iddd4162674f116705d2b47062cbf7ca88f2677a6
1. Removes the misc_dynamic setting from the UDP-CONNECT health monitor
as our script does not use it.
2. Adds a release note for the UDP features.
3. Updates the API reference for UDP support.
4. Adds a comment to the keepalived config with the LB ID.
5. Updates the status message type to be the correct UDP protocol.
6. Fixes an error when deleting a listener if there are multiple amphorae.
7. Refactors systemd service script handling.
Story: 2003306
Task: 24258
Change-Id: I09240023d066ac5a71836d01045cda6ce5678712
These files will be split from the current Octavia repo before the other
parts are ready.
Patch List:
[1] Finish keepalived LVS jinja template for UDP support
[2] Extend the amp agent so it can upload/refresh the keepalived
process
[3] Extend the db model and db table with the fields needed for the new
udp backend
[4] Add logic/workflow elements to process UDP cases
[5] Extend the existing Listener API to accept udp parameters
[6] Extend the existing pool API to accept the new option in
session_persistence fields
Change-Id: Ib4924e602d450b1feadb29e830d715ae77f5bbfe
If a load balancer loses more than one amphora at the same time
the failover process will fail and leave the load balancer in
provisioning status ERROR.
This patch resolves this by failing over one amphora at a time,
marking any other failed amphora with status ERROR. The health
manager will then fail over the other failed amphora in subsequent checks.
This patch will update multiple healthy amphora in parallel and will
time out failed amphora using the new "active_connection_max_retries"
configuration setting used for "fail-fast" connections.
The patch also updates the amphora failover flow documentation to
show the full flow and not just the spares failover flow.
It updates the amphora driver "get_diagnostics" method to pass instead
of error.
It also adds an AmphoraComputeConnectivityWait task to explicitly wait
for a compute instance to come up and be reachable. This allows a longer
timeout and clarifies this may fail due to compute (nova) failures.
Previously the first plug vip task would do this wait.
Change-Id: Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701
Story: 2001481
Task: 6202
The common name is used as a file name inside the HAProxy
configuration file. However, a common name can include spaces
and it will result in a configuration file that simply doesn't
work because of the spaces.
The patch changes the functionality so that it instead creates
a SHA1 hash of the certificate and uses that as the file name
to avoid those issues.
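As a sketch of the approach, a file name can be derived from the certificate content with hashlib. The function name and the ".pem" suffix are assumptions; the commit message does not say exactly which bytes are hashed or how the file is named.

```python
import hashlib


def cert_filename(pem_bytes):
    """Derive a file name from the certificate bytes.

    A hex digest contains only [0-9a-f], so the resulting name never
    includes spaces or other characters that would break the config.
    """
    return hashlib.sha1(pem_bytes).hexdigest() + ".pem"


name = cert_filename(b"-----BEGIN CERTIFICATE-----\nexample\n")
```

The hash is used here only to build a stable, safe identifier, not for any security purpose.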
Change-Id: I039ed0b40df8b72a1238f8896548fe77086c530c
In the case that nova failed to delete an amphora, it will continue to send
health heartbeat messages to the health manager. This patch improves the
logging of these amphora.
It also optimizes the statistics update flow when event streaming is
disabled by removing two extra database calls.
This patch also removes the unused BaseControllerTask class.
This patch also finally solidifies that there will be one LB per amphora.
Change-Id: Idf83b19216c680a4854c1239ed9c5bc5ce7364a7
It was reported that the Health Manager process could be crashed with
malformed heartbeat packets. I was unable to reproduce the issue
(I suspect oslo_utils fixed the root cause), but I could see how this
could happen and our error handling could be improved.
This is a lower severity issue as this port is intended to be only accessible
from a private lb-mgmt-net network.
This patch adds additional exception handling to the Health Manager
listener routines to better handle heartbeat packet issues.
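The kind of defensive parsing described above might look like the following sketch. The wire format used here (zlib-compressed JSON) and the function name are assumptions for illustration; the real heartbeat format is not described in this message.

```python
import json
import zlib


def parse_heartbeat(packet):
    """Return the decoded heartbeat dict, or None if the packet is malformed.

    Any decoding failure is swallowed here so that one bad packet cannot
    crash the listener loop; a real handler would also log the problem.
    """
    try:
        return json.loads(zlib.decompress(packet).decode("utf-8"))
    except (zlib.error, UnicodeDecodeError, ValueError):
        return None


good = zlib.compress(json.dumps({"id": "amp-1"}).encode("utf-8"))
```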
Change-Id: I2da6fa394f5152148237d0986fd969b7950815ba
Story: 2001959
Task: 15081
This also fixes build-openstack-sphinx-docs. There was a change introduced
in sphinx 1.6.6:
https://github.com/sphinx-doc/sphinx/pull/4335/files
If the size of __init__.py is less than 2, then the module is skipped,
which causes the sphinx consistency check to fail later.
Change-Id: I9d8764b6e907aceed8bb8a9b04711145d0eb32ad
* Switch to ProcessPool from ThreadPool (Python threads are horrible)
* Load the health/stats update drivers correctly with stevedore
* Add logging driver
Change-Id: Icda4ff218a6bbaea252c2d75c073754f4161a597
If a Health Manager is overloaded, it can begin to fall very far behind
in processing health updates. This causes huge delays in the whole
system and can cause two distinctly different issues:
1) If the HMs are all suddenly busy, delays can be long enough that no
messages get through within the failover timeout, and amps start to
fail, increasing load on the HMs and causing a cascade failure (I have
witnessed this happen once and take down over 50 LBs before manual
intervention could be taken).
2) Even one overloaded HM can cause updates to queue for extremely long
periods, which makes the system unreliable. Amps can go down and still
have health updates register for some time as the HM processes the queue
(in some cases I have seen dead amps updated for 5-10 minutes).
If we short-circuit handling before we update the health table, we can
solve these problems in two ways:
1) The heavy processing generally happens after this, so
short-circuiting early will let some other threads finish faster and
have some chance of success.
2) Amphora health won't continue to be updated long after the messages
were received, so it won't be possible for zombie amphorae to eat as
many brains.
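The short-circuit can be sketched as an age check performed before any update work. The names, the max_age parameter, and the use of a monotonic clock are assumptions for the example; the commit message does not state which threshold is used.

```python
import time


def is_stale(received_at, max_age, now=None):
    """Return True if the message was received more than max_age seconds ago."""
    if now is None:
        now = time.monotonic()
    return (now - received_at) > max_age


def handle_heartbeat(message, received_at, max_age, update_health, now=None):
    """Drop messages that sat in the queue too long; otherwise process them."""
    if is_stale(received_at, max_age, now):
        return False  # short-circuit before touching the health table
    update_health(message)
    return True


updates = []
fresh = handle_heartbeat({"id": "amp-1"}, 100.0, 10.0, updates.append, now=105.0)
late = handle_heartbeat({"id": "amp-2"}, 100.0, 10.0, updates.append, now=120.0)
```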
Change-Id: Iceeacfdcaebe1f9bb99bc08e318c9da73a66898d
In case the same resource is deleted twice on the amphora,
ignore the 404 -- normally this should be covered by our
API logic to only allow exactly one delete, but this adds
an additional fail safe.
Per johnsom's suggestion also handle that on the amphora
level.
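A minimal sketch of treating "already gone" as success on delete. The NotFound exception and the dict-based store are stand-ins invented for the example, not the actual driver or agent code.

```python
class NotFound(Exception):
    """Stand-in for a 404 response from the amphora."""


def delete_listener(store, listener_id):
    """Delete the listener, raising NotFound if it does not exist."""
    if listener_id not in store:
        raise NotFound(listener_id)
    del store[listener_id]


def delete_listener_idempotent(store, listener_id):
    """Delete the listener; a missing listener is not treated as an error."""
    try:
        delete_listener(store, listener_id)
    except NotFound:
        return False  # it was already deleted, nothing left to do
    return True


amphora_listeners = {"listener-1": "config"}
first = delete_listener_idempotent(amphora_listeners, "listener-1")
second = delete_listener_idempotent(amphora_listeners, "listener-1")
```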
Closes-Bug: #1705764
Change-Id: I2c5f2a4719c405d5a24acf76db40a90da67d8d17
WIP - This patch attempts to fix the py3x gates.
Please add to it as you find issues.
Closes-Bug: #1659064
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Depends-On: If6b6f19130c965436a637a03a4cf72203e0786b0
Change-Id: If642f7ddcb886b4e9fd04a12397f26c72b3485a4
One of the rest driver unit tests was not mocking out the requests
module causing it to attempt to make https get requests.
This patch corrects that oversight.
Change-Id: Iff99207883483ff7f886f62c03664446eb0df492
This patch addresses several places where IPv6 and IPv6 link-local
addresses were not considered for communication between amphora and the
controller worker.
In the devstack plugin we permit both IPv4 and IPv6 for health
monitoring and the amphora REST API.
In the amphora's UDP health sender we parse the IP port string in a
manner which permits IPv6 addresses by splitting on the last colon
rather than every colon.
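Splitting on the last colon can be done with str.rsplit. The function name below is hypothetical, and the sketch assumes the string has the plain "address:port" form without brackets.

```python
def split_host_port(value):
    """Split 'address:port' on the last colon so IPv6 addresses survive."""
    host, port = value.rsplit(":", 1)
    return host, int(port)


ipv4 = split_host_port("192.0.2.10:5555")
ipv6 = split_host_port("fe80::1:5555")
```

A plain split(":") would break the IPv6 address into several pieces, which is the failure mode the paragraph above describes.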
In the controller REST API driver we append an interface scope if using
IPv6 link-local addresses. This interface can be specified by an
operator if they are using an interface other than o-hm0; this is only
required if using IPv6 link-local addresses.
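Appending the interface scope can be sketched with the standard ipaddress module. The function name is hypothetical; only the o-hm0 default comes from the text above.

```python
import ipaddress


def scoped_host(address, interface="o-hm0"):
    """Append '%<interface>' to IPv6 link-local addresses, else return as-is."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6 and ip.is_link_local:
        return "%s%%%s" % (address, interface)
    return address
```

The explicit version check keeps IPv4 link-local addresses (169.254.0.0/16) from receiving a scope suffix.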
Change-Id: I9d07bec4ac105e8876fadb72a83a590ffd4d2e66
Some of the tests were failing due to improper configuration overrides
in the test cases. This patch fixes those tests to use the current
recommended method with oslo test fixtures that will cleanup after the
test.
Change-Id: I5f1ea16bbc16056aa756415a618a8f4192436dfd
Closes-Bug: #1630060
MTU must be set properly because if the tenant network uses some kind
of tunneling, the default MTU may cause packet loss.
Change-Id: Ife10cb8b5ad8e5066f2e7a1565ad72a3e1916688
Closes-Bug: #1627687
This commit adds the ability for Octavia to make use of PKCS7
intermediate certificate bundles. These PKCS7 bundles may be in PEM or
DER format. This feature is being added since barbican specifies that
this is the preferred format for intermediate bundles in secret
containers.
This commit also re-arranges and/or strengthens several of our existing
tests of TLS / SNI functionality and in the process also fixes a bug
where encrypted private keys were not uploaded to amphorae in a format
that haproxy can readily parse. I have also added several sample or
dummy certificates which can be used for an up-coming scenario test
which exercises TLS-termination capabilities of Octavia.
Change-Id: I14e394bbf48456d2e2a7bbefcc777a1b6f4b83e4
Closes-Bug: #1627356
Closes-Bug: #1627367
The admin-state-up=False action for loadbalancer and listener
failed to effect the appropriate change. This patch corrects that
and also removes an unnecessary call to the amphora-agent.
Change-Id: I698f964f584d150f162f6c8cb41c65f5c5556b52
Closes-Bug: #1619449
Subnets will sometimes be defined to have static routes that all
fixed ips on that subnet should use. Neutron calls them host routes
in the API. This makes Octavia aware of them.
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: I37b79da5e4cf532a31780537702d6effa656de5b
This patch updates the haproxy service scripts to handle the case
where the network interfaces have not yet been plugged. This can
occur in a failover situation.
This patch also makes sure we don't move the management lan interface
into the network namespace.
Closes-Bug: #1509706
Closes-Bug: #1577963
Change-Id: I04d267bd3cdedca11f0350c5255086233cba14ec
time.sleep() should be mocked when tests run so they don't take an
extra amount of time to run for no reason.
30s from test_health_daemon.TestHealthDaemon.test_run_sender
10s from test_rest_api_driver.TestAmphoraAPIClientTest.test_request
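A small sketch of patching time.sleep in a test. The wait_and_report function is made up for the example; only mock.patch and time.sleep are real.

```python
import time
from unittest import mock


def wait_and_report(delay):
    """Hypothetical code under test that sleeps before returning."""
    time.sleep(delay)
    return "done"


# With time.sleep patched, the call returns immediately instead of
# blocking for 30 seconds, and the test can still check the argument.
with mock.patch("time.sleep") as fake_sleep:
    outcome = wait_and_report(30)
```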
Change-Id: If32c2021ea37240fd200ebc41d519ed897be87b2
Currently Octavia assumes that DHCP service is available on
the VIP and member subnets. This is not the case for all operators.
This patch makes Octavia use the IP information provided when
the ports are created, if available. If the IP information is
not available on the ports it will fall back to relying on DHCP.
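The fallback logic can be sketched as below. The port dictionary layout and the returned structure are assumptions for illustration, not the actual data passed to the amphora.

```python
def interface_config(port):
    """Prefer the IP information on the port; fall back to DHCP otherwise."""
    fixed_ips = port.get("fixed_ips") or []
    if fixed_ips:
        return {
            "mode": "static",
            "addresses": [ip["ip_address"] for ip in fixed_ips],
        }
    return {"mode": "dhcp", "addresses": []}


static_cfg = interface_config({"fixed_ips": [{"ip_address": "192.0.2.20"}]})
dhcp_cfg = interface_config({})
```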
Change-Id: I08a93d4318bbce48128019376320782d1a334369
Closes-Bug: #1607900