1857 Commits

Author SHA1 Message Date
Adam Harwell
78f1c7b128 Minimize the effect of overloaded Health Manager processes
If a Health Manager is overloaded, it can begin to fall very far behind
in processing health updates. This causes huge delays in the whole
system and can cause two distinctly different issues:

1) If the HMs are all suddenly busy, delays can be long enough that no
messages get through within the failover timeout, and amps start to
fail, increasing load on the HMs and causing a cascade failure (I have
witnessed this happen once and take down over 50 LBs before manual
intervention could be taken).

2) Even one overloaded HM can cause updates to queue for extremely long
periods, which makes the system unreliable. Amps can go down and still
have health updates register for some time as the HM processes the queue
(in some cases I have seen dead amps updated for 5-10 minutes).

If we short-circuit handling before we update the health table, we can
solve these problems in two ways:

1) The heavy processing generally happens after this, so
short-circuiting early will let some other threads finish faster and
have some chance of success.

2) Amphora health won't continue to be updated long after the messages
were received, so it won't be possible for zombie amphorae to eat as
many brains.
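
The short-circuit described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `HealthMessage`, `is_stale`, `handle_health`, and the 10-second default timeout are hypothetical names and values chosen for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class HealthMessage:
    """Hypothetical heartbeat record; not the real Octavia message format."""
    amphora_id: str
    received_at: float  # time.time() value when the packet arrived


def is_stale(msg, now, heartbeat_timeout):
    """Return True if the message sat in the queue longer than the timeout."""
    return (now - msg.received_at) > heartbeat_timeout


def handle_health(msg, update_health_table, heartbeat_timeout=10.0, now=None):
    """Process one message; return True if handled, False if dropped."""
    now = time.time() if now is None else now
    if is_stale(msg, now, heartbeat_timeout):
        # Short-circuit before the health table update and the heavier
        # processing that would follow it.
        return False
    update_health_table(msg.amphora_id, now)
    return True
```

Dropping a stale message early means a dead amphora stops having its health record refreshed once its queued heartbeats age out.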

Change-Id: Iceeacfdcaebe1f9bb99bc08e318c9da73a66898d
(cherry picked from commit 61e0c14f48130d1d0519fa5527d2712ba6ce504f)
2018-04-21 00:12:53 +00:00
Adam Harwell
ed77d01939 Clean up test_update_db.py a little bit
Change-Id: Ifa7fbb90465f1f1aa867838a1048bd5aad019b53
(cherry picked from commit 3a53e54ff51fdeb94db0367839fe360d3772ea08)
2018-04-21 00:12:50 +00:00
wei
38f076249c Optimize update_health process
1:reduce database interaction
2:support more than one LB on an amp

Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: If4f5f93fb8cfd4db8ab3eb9f0518a16da9068160
story: 2001202
task: 5707
(cherry picked from commit 4bddaf6bf2bc85a882006a006b3619e4ed8a82ea)
2018-04-20 17:12:38 -07:00
Michael Johnson
c49959d2ab Pin pip < 10 in the amphora image
pip 10 fails when installing the amphora-agent due to a package conflict
between the oslo.config PyYAML requirement and the python3-yaml installed in
the base images.
We cannot backport the venv fix from master as the stable branch uses
the ubuntu element which does not have enough disk space to install with the
venv.
So to make a minimal fix for the stable branches we are going to pin pip
in the amphora image.

(cherry picked from commit 9e1ced71e1e270777b15adbcde8de03577375964)

Change-Id: I04bf04178c5cd169e3ff1a16a1c0423ef0a07b6d
2018-04-19 16:17:23 -07:00
Michael Johnson
5c2d2d2e5c Deallocate the VIP prior to deleting the amphorae
This patch changes the load balancer delete flow order to deallocate the VIP
port(s) prior to deleting the amphorae. This resolves an issue where an
initial delete fails due to a neutron outage.

Change-Id: Ifcfdba1d28aa732d0f7b3b0a8bffb58cee0ab7cc
Story: 2001523
Task: 6325
(cherry picked from commit 87c233fd402397d918e85d774f9e298ff0751a4f)
2018-04-04 18:34:51 +00:00
Michael Johnson
78902ae5dc Fix health manager edge case with zombie amphora
There is an edge case where an amphora may have been deleted, marked deleted in
our database, but still be running in nova. One example is an instance stuck
in nova "deleting" status. These can still report health heartbeats, but
we will see them as failed as they are reporting load balancing configuration
that we do not have a record for. This will lead to a failover to repair the
amphora, which will re-attempt the delete and reset the amphora health record,
which can lead to another failover attempt.

This patch will log and leave the amphora as "busy" if a failover is attempted
on an amphora we show as "DELETED". This will prevent repeated failover
attempts on "DELETED" amphora that are still alive in the compute system.
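
A minimal sketch of the guard described above, assuming plain dicts for the amphora and its health record. The function name, the dict keys, and the `do_failover` callback are hypothetical; only the "DELETED" status and the "busy" flag come from the commit message.

```python
import logging

LOG = logging.getLogger(__name__)

DELETED = "DELETED"  # status string taken from the commit message


def failover_amphora(amphora, health_record, do_failover):
    """Start a failover unless the database shows the amphora as DELETED.

    `amphora` and `health_record` are plain dicts with hypothetical keys.
    Returns True if a failover was started, False if it was skipped.
    """
    if amphora["status"] == DELETED:
        LOG.warning("Amphora %s is DELETED but still reporting health; "
                    "leaving it busy and skipping failover", amphora["id"])
        # Leaving the record busy keeps it from being picked up again.
        health_record["busy"] = True
        return False
    do_failover(amphora)
    return True
```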

It also fixes a mis-named mock object in a failover test.

Change-Id: I3397ffacf8e08964ecd4b47f2353542b6bc57645
(cherry picked from commit 1a35d6dc81b96b34f1b5251540afcbe62f2fb0d2)
2018-03-27 12:06:00 -07:00
Michael Johnson
2a162d70f5 Log health manager exceptions
The executor will hide any unhandled exceptions raised from the update_health
or update_stats methods. This patch updates the health manager to log those
exceptions.
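
To illustrate the problem: with Python's `concurrent.futures`, an exception raised inside a submitted callable is stored on the returned future and stays invisible unless something calls `result()` or `exception()` on it. One possible way to surface it is a logging wrapper like the sketch below; `log_exceptions` is a hypothetical helper, not the code from this patch.

```python
import functools
import logging
from concurrent.futures import ThreadPoolExecutor

LOG = logging.getLogger(__name__)


def log_exceptions(func):
    """Wrap func so that any exception is logged before it propagates."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Without this, the executor only stores the exception on the
            # future, and nothing is reported if the future is never read.
            LOG.exception("Unhandled exception in %s", func.__name__)
            raise
    return wrapper


def submit_logged(executor, func, *args, **kwargs):
    """Submit func to the executor with exception logging applied."""
    return executor.submit(log_exceptions(func), *args, **kwargs)
```

A caller would use `submit_logged(executor, update_health, data)` in place of `executor.submit(update_health, data)`, where `update_health` stands for whatever callable is being dispatched.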

(cherry picked from commit 24cd0070759fe094430d809167b2aa4964107a70)

Change-Id: Icc4d6c2cd7a93bb22ccab73e102db58ab423959a
2018-03-18 17:41:49 +00:00
James E. Blair
a39cf13351 Zuul: Remove project name
Zuul no longer requires the project-name for in-repo configuration.
Omitting it makes forking or renaming projects easier.

Change-Id: I00cf2ae60e6290c3cb4d37ca8b9b4e4f16cdd5ea
2018-02-08 20:28:38 -08:00
Michael Johnson
d9e24e83bb Make the allowed_address_pairs driver better
The allowed address pairs network driver did not handle having objects
already deleted properly and would prematurely fail.
This patch improves the edge case handling of the driver.

Change-Id: Ice15aa2d6309648da43d6a40ce0286d5dce17500
Story: 2001258
Task: 5791
(cherry picked from commit ab0fe776d854fdf7970eb96efb32ac1a8a9e26a5)
2017-12-22 00:01:54 +01:00
Michael Johnson
9ade0ff715 Updating for new sphinx docs jobs
There have been recent changes to how docs jobs are run under zuul [1].
This patch updates stable/pike to be able to run under the new zuul docs
jobs.

[1] https://review.openstack.org/#/c/508694/

(cherry picked from commit 967edebe0c4c1711aa6b566ed581bc33d51b611c)

Change-Id: I7ba3f7ea0b1e0733cec4e0cd3a05128c772e0238
2017-12-21 12:12:49 -08:00
Zuul
b84682c587 Merge "Handle race condition deleting security group rule" into stable/pike 2017-12-04 21:07:57 +00:00
Mohammed Naser
f00f00ea0a Handle race condition deleting security group rule
It is possible that during the `_update_security_group_rules` operation,
the security group rule which is to be deleted gets removed by some
other external operation. If this happens, a NotFound exception is raised
and not handled, which can lead to a load balancer stuck in
a PENDING_DELETE status.
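
The general shape of such a fix can be sketched as treating "already gone" as success during deletion. The `NotFound` class and `delete_rules` function below are stand-ins written for this example; they are not the neutron client's exception or the driver's actual method.

```python
class NotFound(Exception):
    """Stand-in for the network client's not-found error (hypothetical)."""


def delete_rules(client_delete, rule_ids):
    """Delete each rule, treating an already-missing rule as success.

    `client_delete` is any callable that deletes one rule by id and raises
    NotFound if the rule no longer exists. Returns the ids that were
    already gone.
    """
    already_gone = []
    for rule_id in rule_ids:
        try:
            client_delete(rule_id)
        except NotFound:
            # Something else removed the rule first; the desired end state
            # (rule absent) is already reached, so keep going.
            already_gone.append(rule_id)
    return already_gone
```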

Change-Id: Ic9ebe8392758f3de5fc6a94f815f83d6de113188
Story: 2001300
Task: 5851
(cherry picked from commit 709a23cdce016795571695983e38bc8e8b38d355)
2017-12-04 17:02:37 +00:00
German Eichberger
ff15dde793 Adds the user's project id to the VIP port creation
This brings Octavia in line with what n-lbaas does for VIP port and
allows a user to attach a FIP to the VIP port generated by Octavia.

Change-Id: Ib7e6374cad49a16a733dacb2ea1ca096d8c4e6e4
(cherry picked from commit 0f4a5e21709c77c926cbbf7929217fa951b48ad7)
2017-11-30 17:05:25 +00:00
Zuul
d2fad0b654 Merge "Stop masking VIP allocate neutron errors" into stable/pike 2017-11-27 23:25:52 +00:00
Michael Johnson
f3256eb5ef Fix keepalived check script
The keepalived check script would indicate that the amphora was not healthy
if the listener had not yet been created. This change fixes this, allowing
the VIP to come up and master election to complete at load balancer create
time.

(cherry picked from commit 29051f012cf8db550c2c133353258f4d64ddb8d7)
Change-Id: Icb393e3ec33fee3d9a4ca89eb4ab338c53fc2d9d
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
2017-11-21 17:10:48 +00:00
Michael Johnson
534e1f932c Stop child objects changing status when LB locked
This patch corrects a bug in the load balancer locking that would
modify the child object provisioning status even though it is returning
that the load balancer is currently immutable.

Story: 2001258
Task: 5788

Change-Id: I017bdcd902327d0cc363a6edb34c5eaeb9fd42e8
(cherry picked from commit b7bb5aff2dcb08d42aeaf8b13928e1b3c8189263)
2017-11-09 08:31:33 +00:00
Michael Johnson
921a93fade Setup Octavia stable/pike for zuul v3
Initial patch to setup Octavia stable/pike branch for zuul v3 jobs.

Change-Id: If0ce494a4f15a96489444f965d3620bcfa4cf8d2
2017-11-06 16:37:23 -08:00
Michael Johnson
1c94f6663c Disable kvm on OVH infra instances
OVH infra hosts are causing "KVM: entry failed, hardware error 0x0"
failures where instances fail to start (cirros, etc.).
This patch excludes OVH instances from kvm enablement until the issue is
resolved.

Co-Authored-By: Adam Harwell <flux.adam@gmail.com>

(cherry picked from commit 926fb27fbeb41cec405e35f0814e14e152aff927)

Change-Id: I41a7e35a66a98a837364f5febf793bb6e93f714e
2017-11-01 15:33:45 -07:00
Michael Johnson
c76cb446c5 Stop masking VIP allocate neutron errors
Previously the neutron network driver was masking the actual error being
returned from neutron while doing a VIP port allocation.
This patch will pass through the neutron error to the user if it is
an API-ready exception, giving the user more useful information about the
error neutron is reporting, for example that the requested VIP address is
already allocated.

Change-Id: Ic327b06ff9ccf9ae0c9931a8e6569c20d03c4a79
Closes-Bug: #1714593
Story: 1714593
Task: 5005
(cherry picked from commit c1afc1586394333b71da2678568ce5696e0d189d)
2017-10-27 10:11:16 +00:00
Michael Johnson
c3279a6bd4 Fix non-cascade LB delete with children
When the user attempts to delete a load balancer, without the cascade option,
the load balancer is locked in "PENDING_DELETE" prior to the check for child
objects. The child object check will return the error to the user, but would
leave the load balancer locked in "PENDING_DELETE".
This patch corrects the order to do the validation prior to locking the
load balancer in "PENDING_DELETE".
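
The ordering change can be sketched as "validate first, lock second". The dict layout, `ValidationError`, and `delete_load_balancer` below are hypothetical and only illustrate the order of operations.

```python
class ValidationError(Exception):
    """Hypothetical error returned to the user when a delete is rejected."""


def delete_load_balancer(lb, cascade=False):
    """Mark a load balancer for deletion.

    `lb` is a plain dict with hypothetical keys `children` and
    `provisioning_status`.
    """
    # Validate before locking: if the check fails, the status is untouched
    # and the load balancer does not stay stuck in PENDING_DELETE.
    if not cascade and lb["children"]:
        raise ValidationError("load balancer has child objects; "
                              "use the cascade option")
    lb["provisioning_status"] = "PENDING_DELETE"
    return lb
```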

Story: 2001256
Task: 5786
(cherry picked from commit c05a8cfb88bd1219a086e8826f7783b64f1e8537)

Change-Id: I7364e6dbe6f647962df892e4c72cfdf7598c502b
2017-10-26 20:49:12 +00:00
ghanshyam
8565bcd146 Shrink Tempest scenario manager copy
The scenario manager in the tempest plugin was copied from
Tempest to avoid any plugin break.

This copied version of the scenario manager uses a lot of
Tempest interfaces. Many of them are not stable and might
change, which can break the plugin.

For example - https://review.openstack.org/#/c/503875/3

This commit shrinks the scenario manager copy to keep
only required methods.

Change-Id: I95385f8359521bd8baa603cdcc3f0b6ca4c81db8
2017-09-23 18:32:01 +00:00
OpenStack Proposal Bot
4eef736030 Updated from global requirements
Change-Id: I973764f10f18fbdbb47a23b6e4dcd35c3003449f
2017-09-06 21:13:05 +00:00
Michael Johnson
cdac977f63 Add a CLI section to the Octavia docs
Change-Id: I4a134771f2428439cfb6e1b697a6b8c1b16b220a
1.0.1
2017-08-23 14:43:57 -07:00
Michael Johnson
e4152755f3 Setup octavia for stable/pike
Change-Id: I9c981848d5708c47434f29ceb68ad47a2dc4deaf
2017-08-17 16:48:17 -07:00
Jenkins
2c8cc57156 Merge "Fix a bad revert method and add hacking check" into stable/pike 1.0.0.0rc2 1.0.0 2017-08-17 02:56:40 +00:00
Michael Johnson
b684a4639c Fix health monitor DB locking.
Change-Id: Ida0d9e1d7a808706c69808dc78e16bc8292a39c0
(cherry picked from commit 5744872c9474126c3d5e0d5dc1c12dfaed2431e1)
2017-08-16 22:08:24 +00:00
Michael Johnson
9454901b73 Fix a bad revert method and add hacking check
This patch fixes a revert method that was not handling extra parameters
being passed to it.
It also adds a hacking check to make sure this does not happen in the
future.
The patch also breaks the bad habit of compiling regex strings for every
line of code in the project.
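
As an illustration of the regex point, a per-line check can compile its pattern once at import time and reuse it for every line. The pattern and function names below are hypothetical and are not the project's actual hacking check.

```python
import re

# Compiled once when the module is imported, then reused for every line.
# The pattern is a made-up example that matches a `def revert(` definition.
_REVERT_DEF = re.compile(r"^\s*def revert\(")


def check_line(line):
    """Return True if the line defines a method named revert."""
    return _REVERT_DEF.match(line) is not None


def scan(lines):
    """Return the 1-based line numbers that define a revert method."""
    return [n for n, line in enumerate(lines, start=1) if check_line(line)]
```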

Change-Id: If29e377204432e215bfea97f9d76bce0a442f4c8
(cherry picked from commit c3754dbf5a5ca6d2fa378cf95e30cd64fc8120a5)
2017-08-16 22:08:10 +00:00
02926d8907 Update UPPER_CONSTRAINTS_FILE for stable/pike
Change-Id: I31d2a1b7218a2f986389f0c6b3db8470dd24644b
2017-08-11 08:28:08 +00:00
accf201f82 Update .gitreview for stable/pike
Change-Id: I64a99ed2e5a2919a5b50f55bf59116874e362939
2017-08-11 08:28:07 +00:00
Jenkins
e0f996dad7 Merge "Update devstack readme.md" 1.0.0.0rc1 2017-08-11 03:58:05 +00:00
Jenkins
f299fae89f Merge "Fix LB creation with VIP port" 2017-08-11 03:48:23 +00:00
Michael Johnson
7c986df83d Fix LB creation with VIP port
Octavia v2 API was failing to create the load balancer when the user
specified a VIP port ID.
This also improves the user experience when specifying a VIP address.
It also removes the unused nova_network directory.

Change-Id: I8b533094df1e5425f824fff0454335709ce05447
Closes-Bug: #1709922
2017-08-10 16:18:29 -07:00
Jenkins
283ff4b754 Merge "Ignore 404 amphora error when deleting resources" 2017-08-10 20:10:00 +00:00
Jenkins
4878e909f8 Merge "LB Admin down should show operating_status OFFLINE" 2017-08-10 19:14:17 +00:00
Jenkins
956291a0f7 Merge "Correct status for disabled members (honest abe edition)" 2017-08-10 19:13:31 +00:00
Jenkins
d68c58349b Merge "Fix DB update reverts for provisioning status" 2017-08-10 09:47:25 +00:00
Jenkins
738f3f78a6 Merge "Properly store VIP data on LB Create" 2017-08-10 09:15:49 +00:00
Jenkins
7ab5b27413 Merge "Fix sg_rule didn't set protocol field" 2017-08-10 00:09:09 +00:00
Jenkins
2d4cce80de Merge "Update links in README" 2017-08-09 19:23:11 +00:00
Michael Johnson
6e71722bc9 Update devstack readme.md
Change-Id: I34437efccab1f1f4e183d6cf322207c34b1c49aa
2017-08-09 12:22:45 -07:00
ZhaoBo
7f5749d534 Fix sg_rule didn't set protocol field
If the LB security group contains a rule without a protocol, the worker may
raise an error in _update_security_group_rules during listener creation.

Change-Id: Idc826d251296435119ae963c832de29160062967
2017-08-09 19:12:42 +00:00
Jenkins
e9471d728e Merge "Remove usage of credentials_factory.AdminManager" 2017-08-09 08:26:37 +00:00
Adam Harwell
c28d212a17 Properly store VIP data on LB Create
Right now the vip data isn't actually stored back to the DB, it just
looks like it is... So, actually it will create a port and then orphan
it, then create another port with a different IP later.

Change-Id: Ibb7b2bd89155e37fb41a5f62ba2cda6e233a127a
2017-08-08 12:21:53 -07:00
OpenStack Proposal Bot
54a880d6c1 Updated from global requirements
Change-Id: Ie319d58dd01a141066cba0e0c1a7770d2ce79d38
2017-08-08 12:18:14 +00:00
lidong
a1e596c77d Update links in README
Change-Id: I3532dcd5365fad3e8c7a09077ff6fd087592879f
2017-08-08 11:09:39 +08:00
ghanshyam
757dc692df Remove usage of credentials_factory.AdminManager
Tempest is making credentials_factory a stable interface
and will be removing the AdminManager class, which is a wrapper
for creating a client manager with admin credentials.

The admin manager can be instantiated by providing the admin credentials
to clients.Manager.

This commit removes the usage of AdminManager.

Change-Id: I321a7985a60ceb0c6230d04e42c2b4c0a804027f
2017-08-08 04:10:18 +03:00
Jenkins
c87ec394c1 Merge "Add allocate vip port when create loadbalancer in server side" 2017-08-07 20:49:05 +00:00
Michael Johnson
579f18d627 Fix DB update reverts for provisioning status
Now that all objects have a proper provisioning status, update the revert
methods to properly set the provisioning status to error.

Change-Id: I74e44474e7cd05979f2a9ce24143641230e4b394
Closes-Bug: #1624166
2017-08-07 10:54:00 -07:00
Jenkins
02d332bde7 Merge "Spec detailing Octavia service flavors support" 2017-08-07 16:37:18 +00:00
cheng
e797e8763f Add allocate vip port when create loadbalancer in server side
Closes-Bug: #1666559
Closes-Bug: #1695331
Change-Id: I102efb9a22ac1cdcffc0f959d0b04401e34f425c
Signed-off-by: cheng <tangch318@gmail.com>
2017-08-06 15:39:25 +00:00