If a Health Manager is overloaded, it can begin to fall very far behind
in processing health updates. This causes huge delays in the whole
system and can cause two distinctly different issues:
1) If the HMs are all suddenly busy, delays can be long enough that no
messages get through within the failover timeout, and amps start to
fail, increasing load on the HMs and causing a cascade failure (I have
witnessed this happen once and take down over 50 LBs before manual
intervention could be taken).
2) Even one overloaded HM can cause updates to queue for extremely long
periods, which makes the system unreliable. Amps can go down and still
have health updates register for some time as the HM processes the queue
(in some cases I have seen dead amps updated for 5-10 minutes).
If we short-circuit handling before we update the health table, we can
solve these problems in two ways:
1) The heavy processing generally happens after this, so
short-circuiting early will let some other threads finish faster and
have some chance of success.
2) Amphora health won't continue to be updated long after the messages
were received, so it won't be possible for zombie amphorae to eat as
many brains.
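
The short-circuit described above can be sketched as follows. This is a
minimal illustration, not the actual Health Manager code: the function name,
the `received_at` timestamp, the injected `update_health_table` callable, and
the `HEARTBEAT_TIMEOUT` value are all assumptions made for the example.

```python
import time

# Hypothetical value; in practice this would come from configuration.
HEARTBEAT_TIMEOUT = 10.0  # seconds


def handle_health_message(message, received_at, update_health_table,
                          now=time.monotonic):
    """Process one health message unless it is already too old.

    Returns True if the health table was updated, False if the message
    was dropped because it waited in the queue past the timeout.
    """
    age = now() - received_at
    if age > HEARTBEAT_TIMEOUT:
        # Short-circuit: skip the health table update and all of the
        # heavier processing that would normally follow it.
        return False
    update_health_table(message)
    return True
```

Dropping a stale message early frees the worker for fresher messages and
keeps a dead amphora from having its health record refreshed by old data.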
Change-Id: Iceeacfdcaebe1f9bb99bc08e318c9da73a66898d
(cherry picked from commit 61e0c14f48130d1d0519fa5527d2712ba6ce504f)
1) Reduce database interaction.
2) Support more than one LB on an amp.
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: If4f5f93fb8cfd4db8ab3eb9f0518a16da9068160
story: 2001202
task: 5707
(cherry picked from commit 4bddaf6bf2bc85a882006a006b3619e4ed8a82ea)
pip 10 fails when installing the amphora-agent due to a package conflict
between the PyYAML required by oslo.config and the python3-yaml package
installed in the base images.
We cannot backport the venv fix from master as the stable branch uses
the ubuntu element which does not have enough disk space to install with the
venv.
So to make a minimal fix for the stable branches we are going to pin pip
in the amphora image.
(cherry picked from commit 9e1ced71e1e270777b15adbcde8de03577375964)
Change-Id: I04bf04178c5cd169e3ff1a16a1c0423ef0a07b6d
This patch changes the load balancer delete flow order to deallocate the VIP
port(s) prior to deleting the amphorae. This resolves an issue where an
initial delete fails due to a neutron outage.
Change-Id: Ifcfdba1d28aa732d0f7b3b0a8bffb58cee0ab7cc
Story: 2001523
Task: 6325
(cherry picked from commit 87c233fd402397d918e85d774f9e298ff0751a4f)
There is an edge case where an amphora may have been deleted, and marked
deleted in our database, but still be running in nova. One example is an
instance stuck in nova "deleting" status. These can still report health
heartbeats, but we will see them as failed as they are reporting load
balancing configuration that we do not have a record for. This will lead to
a failover to repair the amphora, which will re-attempt the delete and reset
the amphora health record, which can in turn lead to another failover attempt.
This patch will log and leave the amphora as "busy" if a failover is attempted
on an amphora we show as "DELETED". This will prevent repeated failover
attempts on "DELETED" amphora that are still alive in the compute system.
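
A rough sketch of the guard described above, under stated assumptions: the
amphora and its health record are plain dicts here, and `maybe_failover` and
the injected `failover` callable are hypothetical names, not the controller's
real interface.

```python
import logging

LOG = logging.getLogger(__name__)

DELETED = "DELETED"


def maybe_failover(amphora, health, failover):
    """Run failover unless our records show the amphora as DELETED.

    Returns True if failover was attempted, False if it was skipped.
    """
    if amphora["status"] == DELETED:
        LOG.warning("Amphora %s is marked DELETED but is still reporting; "
                    "skipping failover.", amphora["id"])
        # Deliberately leave health["busy"] untouched (still True) so the
        # record is not picked up for another failover attempt.
        return False
    failover(amphora)
    return True
```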
It also fixes a mis-named mock object in a failover test.
Change-Id: I3397ffacf8e08964ecd4b47f2353542b6bc57645
(cherry picked from commit 1a35d6dc81b96b34f1b5251540afcbe62f2fb0d2)
The executor will hide any unhandled exceptions raised from the update_health
or update_stats methods. This patch updates the health manager to log those
exceptions.
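
For illustration, one way to surface such exceptions is to wrap the submitted
callable so it logs before re-raising. This is a sketch assuming a
`concurrent.futures`-style executor; `log_unhandled`, `submit_logged`, and the
stand-in `update_health` below are hypothetical and not the patch itself.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

LOG = logging.getLogger(__name__)


def log_unhandled(func):
    """Wrap func so that any exception is logged before it propagates."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Without this, the exception is only stored on the Future and
            # is never seen unless someone calls result() or exception().
            LOG.exception("Unhandled exception in %s", func.__name__)
            raise
    return wrapper


def update_health(message):
    # Stand-in for the real handler; fails on purpose for illustration.
    raise ValueError("bad health message: %r" % (message,))


def submit_logged(executor, func, *args):
    """Submit func to the executor with exception logging attached."""
    return executor.submit(log_unhandled(func), *args)
```

With `ThreadPoolExecutor(max_workers=1)` as the executor,
`submit_logged(executor, update_health, msg)` writes the traceback to the log
even if the returned future is never inspected.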
(cherry picked from commit 24cd0070759fe094430d809167b2aa4964107a70)
Change-Id: Icc4d6c2cd7a93bb22ccab73e102db58ab423959a
Zuul no longer requires the project-name for in-repo configuration.
Omitting it makes forking or renaming projects easier.
Change-Id: I00cf2ae60e6290c3cb4d37ca8b9b4e4f16cdd5ea
The allowed address pairs network driver did not handle having objects
already deleted properly and would prematurely fail.
This patch improves the edge case handling of the driver.
Change-Id: Ice15aa2d6309648da43d6a40ce0286d5dce17500
Story: 2001258
Task: 5791
(cherry picked from commit ab0fe776d854fdf7970eb96efb32ac1a8a9e26a5)
There have been recent changes to how docs jobs are run under zuul [1].
This patch updates stable/pike to be able to run under the new zuul docs
jobs.
[1] https://review.openstack.org/#/c/508694/
(cherry picked from commit 967edebe0c4c1711aa6b566ed581bc33d51b611c)
Change-Id: I7ba3f7ea0b1e0733cec4e0cd3a05128c772e0238
It is possible that during the `_update_security_group_rules` operation,
the security group rule which is to be deleted gets removed by some
other external operation. If this happens, a NotFound exception is raised
and not handled, which can lead to a load balancer stuck in a
PENDING_DELETE status.
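
The handling can be sketched as below. The `NotFound` class here is a local
stand-in for the network client's not-found error, and `delete_rules` with its
injected `delete_rule` callable is a hypothetical helper, not the driver's
actual method.

```python
class NotFound(Exception):
    """Stand-in for the client's 'resource not found' error (assumption)."""


def delete_rules(delete_rule, rule_ids):
    """Delete each rule, treating an already-missing rule as success.

    Returns the list of rule IDs that were already gone.
    """
    already_gone = []
    for rule_id in rule_ids:
        try:
            delete_rule(rule_id)
        except NotFound:
            # Something else removed the rule first. The end state is the
            # one we wanted, so continue instead of failing the flow.
            already_gone.append(rule_id)
    return already_gone
```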
Change-Id: Ic9ebe8392758f3de5fc6a94f815f83d6de113188
Story: 2001300
Task: 5851
(cherry picked from commit 709a23cdce016795571695983e38bc8e8b38d355)
This brings Octavia in line with what n-lbaas does for VIP port and
allows a user to attach a FIP to the VIP port generated by Octavia.
Change-Id: Ib7e6374cad49a16a733dacb2ea1ca096d8c4e6e4
(cherry picked from commit 0f4a5e21709c77c926cbbf7929217fa951b48ad7)
The keepalived check script would indicate that the amphora was not healthy
if the listener had not yet been created. This change fixes this, allowing
the VIP to come up and master election to complete at load balancer create
time.
(cherry picked from commit 29051f012cf8db550c2c133353258f4d64ddb8d7)
Change-Id: Icb393e3ec33fee3d9a4ca89eb4ab338c53fc2d9d
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
This patch corrects a bug in the load balancer locking that would
modify the child object provisioning status even though it is returning
that the load balancer is currently immutable.
Story: 2001258
Task: 5788
Change-Id: I017bdcd902327d0cc363a6edb34c5eaeb9fd42e8
(cherry picked from commit b7bb5aff2dcb08d42aeaf8b13928e1b3c8189263)
OVH infra hosts are causing "KVM: entry failed, hardware error 0x0"
failures where instances fail to start (cirros, etc.).
This patch excludes OVH instances from kvm enablement until the issue is
resolved.
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
(cherry picked from commit 926fb27fbeb41cec405e35f0814e14e152aff927)
Change-Id: I41a7e35a66a98a837364f5febf793bb6e93f714e
Previously the neutron network driver was masking the actual error being
returned from neutron while doing a VIP port allocation.
This patch will pass through the neutron error to the user if it is
an API ready exception, giving the user more useful information about the
error neutron is reporting, for example that the requested VIP address is
already allocated.
Change-Id: Ic327b06ff9ccf9ae0c9931a8e6569c20d03c4a79
Closes-Bug: #1714593
Story: 1714593
Task: 5005
(cherry picked from commit c1afc1586394333b71da2678568ce5696e0d189d)
When the user attempts to delete a load balancer, without the cascade option,
the load balancer is locked in "PENDING_DELETE" prior to the check for child
objects. The child object check will return the error to the user, but would
leave the load balancer locked in "PENDING_DELETE".
This patch corrects the order to do the validation prior to locking the
load balancer in "PENDING_DELETE".
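
The corrected ordering can be illustrated with a small sketch. The load
balancer is modeled as a plain dict, and `delete_load_balancer` and
`ValidationError` are hypothetical names chosen for the example rather than
the API controller's real code.

```python
class ValidationError(Exception):
    """Raised when the load balancer still has child objects."""


def delete_load_balancer(lb, cascade=False):
    """Validate first, then lock; lb is a plain dict in this sketch."""
    if not cascade and lb["listeners"]:
        # Fail before touching the status so the load balancer is not
        # left locked in PENDING_DELETE after the error is returned.
        raise ValidationError("load balancer has child objects")
    lb["provisioning_status"] = "PENDING_DELETE"
    return lb
```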
Story: 2001256
Task: 5786
(cherry picked from commit c05a8cfb88bd1219a086e8826f7783b64f1e8537)
Change-Id: I7364e6dbe6f647962df892e4c72cfdf7598c502b
The scenario manager in the tempest plugin was copied from
Tempest to avoid any plugin break.
This copied version of the scenario manager uses a lot of
Tempest interfaces. Many of them are not stable and might
change, which can break the plugin.
For example - https://review.openstack.org/#/c/503875/3
This commit shrinks the scenario manager copy to keep
only required methods.
Change-Id: I95385f8359521bd8baa603cdcc3f0b6ca4c81db8
This patch fixes a revert method that was not handling extra parameters
being passed to it.
It also adds a hacking check to make sure this does not happen in the
future.
The patch also breaks the bad habit of compiling regex strings for every
line of code in the project.
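
As a rough illustration of both points, a check in the usual pycodestyle
style (a generator yielding an offset and a message) can compile its pattern
once at import time. The pattern, function name, and message text below are
assumptions for the example and are not taken from the project's hacking
checks.

```python
import re

# Compiled once at import time rather than once per checked line.
# Illustrative pattern: flags revert() definitions that lack **kwargs.
_REVERT_DEF_RE = re.compile(r"^\s*def revert\((?!.*\*\*kwargs).*\):")


def check_revert_accepts_kwargs(logical_line):
    """Yield (offset, message) if a revert method lacks **kwargs."""
    if _REVERT_DEF_RE.match(logical_line):
        yield 0, "revert methods must accept **kwargs"
```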
Change-Id: If29e377204432e215bfea97f9d76bce0a442f4c8
(cherry picked from commit c3754dbf5a5ca6d2fa378cf95e30cd64fc8120a5)
Octavia v2 API was failing to create the load balancer when the user
specified a VIP port ID.
This also improves the user experience when specifying a VIP address.
It also removes the unused nova_network directory.
Change-Id: I8b533094df1e5425f824fff0454335709ce05447
Closes-Bug: #1709922
If the LB security group contains a rule without a protocol, the worker may
raise an error in _update_security_group_rules during listener creation.
Change-Id: Idc826d251296435119ae963c832de29160062967
Right now the VIP data isn't actually stored back to the DB; it just
looks like it is. As a result, a port is created and then orphaned, and
another port with a different IP is created later.
Change-Id: Ibb7b2bd89155e37fb41a5f62ba2cda6e233a127a
Tempest is making credentials_factory a stable interface
and will be removing the AdminManager class, which is a wrapper
for creating a client manager with admin credentials.
An admin manager can instead be instantiated by providing the
admin credentials to clients.Manager.
This commit removes the usage of AdminManager.
Change-Id: I321a7985a60ceb0c6230d04e42c2b4c0a804027f
Now that all objects have a proper provisioning status, update the revert
methods to properly set the provisioning status to error.
Change-Id: I74e44474e7cd05979f2a9ce24143641230e4b394
Closes-Bug: #1624166