When a load balancer that has listeners is disabled, the Health Manager
does not update the amphora health, so the amphora keeps failing over
once heartbeat_timeout expires.
Story: 2007587
Task: 39521
Change-Id: Ia6d3f40ae1b9b352492162513c9262748ee67e6f
Convert all code to not require six library and instead
use python 3.x logic.
Created one helper method in common.utils for binary
representation to limit code changes.
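For illustration only, a helper of this kind might look like the sketch
below. The name to_bytes is hypothetical and is not necessarily the
helper this patch adds to common.utils.

```python
def to_bytes(value):
    """Return value as UTF-8 bytes; bytes pass through unchanged.

    Hypothetical stand-in for six-style binary helpers, written with
    plain Python 3 logic.
    """
    if isinstance(value, bytes):
        return value
    return value.encode('utf-8')


encoded = to_bytes('heartbeat')
```

Call sites that previously relied on six can then call the one helper
instead of repeating the encode logic.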
Change-Id: I2716ce93691d11100ee951a3a3f491329a4073f0
Eventually hacking will move to 2.0.0 (diskimage-builder
is holding it back), and when it does there will be a few
errors to fix. We can get ahead of it so it doesn't
break us with some small changes for these items:
F601 dictionary key $item repeated with different values
F632 use ==/!= to compare str, bytes, and int literals
E501 line too long
While doing this noticed the lower-constraints.txt for
hacking was set at 0.12.0, when test-requirements.txt
had it at 1.1.0, so fixed that as well.
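The F601 and F632 items listed above can be illustrated with a small
sketch. The names is_udp and STATUS are made up for the example and do
not come from the Octavia tree.

```python
# F632: compare str, bytes, and int literals with == / !=, not "is".
def is_udp(protocol):
    # Flagged form would be: protocol is 'UDP'
    return protocol == 'UDP'


# F601: a dict literal must not repeat a key with different values.
# Python silently keeps the last value, so only the intended entry
# should remain.
STATUS = {
    'UP': 'ONLINE',
    'DOWN': 'ERROR',  # a second 'DOWN' entry here would trigger F601
}
```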
Change-Id: I80d2a5f97e7a4896a8fa765c1971c8bb7e72d211
Now that we are python3 only, we should move to using the built
in version of mock that supports all of our testing needs and
remove the dependency on the "mock" package.
This patch moves all references to "import mock" to
"from unittest import mock". It also cleans up some new line
inconsistency.
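A minimal sketch of the import change in a test module follows. The
repo object and the ids are invented for the example.

```python
from unittest import mock  # previously: import mock

# The standard-library mock offers the same MagicMock API the tests use.
repo = mock.MagicMock()
repo.get.return_value = 'amphora-1'

result = repo.get('some-id')
repo.get.assert_called_once_with('some-id')
```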
Change-Id: I72520a2ca010c2c27315d9dff839a4f9d7540b6b
Load balancers with multiple listeners, running on an amphora image
with HAProxy 1.8 or newer can experience excessive memory usage that
may lead to an ERROR provisioning_status.
This patch resolves this issue by consolidating the listeners into
a single haproxy process inside the amphora.
Story: 2005412
Task: 34744
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Change-Id: Idaccbcfa0126f1e26fbb3ad770c65c9266cfad5b
Includes some updates to docs and configs and related files to remove
references to neutron-lbaas. Also remove handlers.
Change-Id: I3082962841d3b645f3cbd1a6b41fc7fb28dcf7e6
When no UDP listeners are present, skip the UDP health-check code
branch, which prevents expensive and unnecessary DB calls.
Also optimise the UDP health-check code so it only fetches information
for relevant listeners.
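The idea can be sketched as follows. This is not the code in the patch:
the listener dicts and the fetch_udp_details callable are assumptions
made for the example.

```python
def udp_listeners(listeners):
    """Return only the listeners whose protocol is UDP."""
    return [lsnr for lsnr in listeners if lsnr.get('protocol') == 'UDP']


def process_udp_health(listeners, fetch_udp_details):
    relevant = udp_listeners(listeners)
    if not relevant:
        # No UDP listeners: skip the UDP branch and its DB lookups.
        return []
    # Fetch details only for the relevant (UDP) listeners.
    return [fetch_udp_details(lsnr['id']) for lsnr in relevant]
```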
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Change-Id: I7fde54084b39c1d0529cfb0bcfb79697d63ea6ae
Zombie amphorae will be deleted by the system to lighten the load on
operators if the amphora record is still in the database.
Story: 2003912
Task: 26800
Change-Id: If133a3d36a9381bcca9f7d00f5c1531885907940
When running stress tests against the Octavia Health Manager it was
observed that the scalability and performance of the health manager has
degraded.
It was observed that the ORM layer was forming poorly optimized queries,
putting excessive load on the database engine and unnecessary code paths
were executing for each heartbeat message.
This patch optimizes the health manager processing of amphora-agent
heartbeat messages by optimizing the database requests, pool processing,
and event streamer code paths.
Change-Id: I2f75715b09430ad139306d9196df0ec5d7a63da8
Story: 2001896
Task: 14381
These files will be split out from the current Octavia repo before the
other parts are ready.
Patch List:
[1] Finish keepalived LVS jinja template for UDP support
[2] Extend the amp agent so it can upload/refresh the keepalived
process
[3] Extend the db model and db table with the fields necessary to meet the new
udp backend
[4] Add logic/workflow elements to process UDP cases
[5] Extend the existing API to access udp parameters in Listener API
[6] Extend the existing pool API to access the new option in
session_persistence fields
Change-Id: Ib4924e602d450b1feadb29e830d715ae77f5bbfe
In the case that nova failed to delete an amphora, it will continue to send
health heartbeat messages to the health manager. This patch improves the
logging of these amphorae.
It also optimizes the statistics update flow when event streaming is
disabled by removing two extra database calls.
This patch also removes the un-used BaseControllerTask class.
This patch also finally solidifies that there will be one LB per amphora.
Change-Id: Idf83b19216c680a4854c1239ed9c5bc5ce7364a7
Members that were disabled / admin_state_up=False were simply excluded
from the haproxy configuration we pass to the amps. Instead, we should
be creating them in a disabled state, so they return in health messages
as status "maint", and can be marked OFFLINE via the standard health
mechanism, instead of just via override hacks.
This also resolves a bug introduced in an earlier change:
https://review.openstack.org/#/c/567322/
which caused admin-downed members to stay in NO_MONITOR always.
Change-Id: I6615b3ff89d7cef2af52d474aab3a03d947f98be
During a create of a member on a pool with no healthmonitor, there is a
chance the member will be briefly updated to OFFLINE incorrectly.
The flow of events:
1. A Member is created in DB in NO_MONITOR.
2. A Member is passed to the handler.
3. A health message comes in, not including the member.
4. The Member is present in the DB but missing from the Amp, and is
set to OFFLINE status.
5. The Member is created on the amp.
6. Another health message comes in, and the Member is put back in the
correct NO_MONITOR state.
Change-Id: I74c8cda0c7da88fedf652215630c701bb2761eef
This patch updates the health manager to update the amphora_health timestamp
even if the listener count does not match if the load balancer is in PENDING_*
provisioning status. This will stop failovers from occurring on load balancers
being updated or deleted.
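The decision described above can be sketched roughly as below. The
function name and arguments are hypothetical. Only the rule itself
(a listener count mismatch is tolerated while the load balancer is in a
PENDING_* provisioning status) comes from this message.

```python
def should_update_health_timestamp(expected_listeners, reported_listeners,
                                   provisioning_status):
    """Decide whether the amphora_health timestamp should be refreshed."""
    if expected_listeners == reported_listeners:
        return True
    # A mismatch is expected while the load balancer is being updated
    # or deleted, so it must not count against the amphora.
    return provisioning_status.startswith('PENDING_')
```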
Change-Id: I7d483d764bd841c6977afa0bcf3227e02c573b91
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
The executor will hide any unhandled exceptions raised from the update_health
or update_stats methods. This patch updates the health manager to log those
exceptions.
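The behaviour is easy to reproduce with concurrent.futures: an exception
raised in a submitted callable is stored on the future and never
surfaces unless something asks for it. The sketch below shows one way to
log it with a done-callback. It is not necessarily how this patch does
it, and this update_health is a stand-in.

```python
import logging
from concurrent import futures

LOG = logging.getLogger(__name__)


def log_exception(future):
    """Done-callback that logs any exception raised by the worker."""
    exc = future.exception()
    if exc is not None:
        LOG.error('Health update failed: %r', exc)


def update_health(message):
    # Stand-in for the real update; always fails for the demonstration.
    raise ValueError('bad heartbeat: %s' % message)


with futures.ThreadPoolExecutor(max_workers=1) as executor:
    fut = executor.submit(update_health, 'demo')
    fut.add_done_callback(log_exception)
```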
Change-Id: I941730c5e0964ec319b5eb2b7992335b71f9d134
* Switch to ProcessPool from ThreadPool (Python threads are horrible)
* Load the health/stats update drivers correctly with stevedore
* Add logging driver
Change-Id: Icda4ff218a6bbaea252c2d75c073754f4161a597
If a Health Manager is overloaded, it can begin to fall very far behind
in processing health updates. This causes huge delays in the whole
system and can cause two distinctly different issues:
1) If the HMs are all suddenly busy, delays can be long enough that no
messages get through within the failover timeout, and amps start to
fail, increasing load on the HMs and causing a cascade failure (I have
witnessed this happen once and take down over 50 LBs before manual
intervention could be taken).
2) Even one overloaded HM can cause updates to queue for extremely long
periods, which makes the system unreliable. Amps can go down and still
have health updates register for some time as the HM processes the queue
(in some cases I have seen dead amps updated for 5-10 minutes).
If we short-circuit handling before we update the health table, we can
solve these problems in two ways:
1) The heavy processing generally happens after this, so
short-circuiting early will let some other threads finish faster and
have some chance of success.
2) Amphora health won't continue to be updated long after the messages
were received, so it won't be possible for zombie amphorae to eat as
many brains.
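A rough sketch of the short-circuit, assuming each message carries the
time it was received and that the staleness threshold is the heartbeat
timeout. Both are assumptions for the example, as are the function
names.

```python
import time


def is_stale(received_at, heartbeat_timeout, now=None):
    """True when a message waited in the queue longer than the timeout."""
    now = time.time() if now is None else now
    return (now - received_at) > heartbeat_timeout


def handle(message, received_at, heartbeat_timeout, update_health_table,
           now=None):
    if is_stale(received_at, heartbeat_timeout, now):
        # Short-circuit: drop the message before touching the health
        # table, so a dead amphora is not refreshed by queued messages.
        return False
    update_health_table(message)
    return True
```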
Change-Id: Iceeacfdcaebe1f9bb99bc08e318c9da73a66898d
1:reduce database interaction
2:support more than one LB on an amp
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: If4f5f93fb8cfd4db8ab3eb9f0518a16da9068160
story: 2001202
task: 5707
The health manager was marking the load balancer as ONLINE even though it was
admin_state_up == False because it was getting a health heartbeat.
This patch causes the load balancer operating status to reflect the admin
state.
Change-Id: Id0fa51447789f5ab190463a3ee0775a8612551ab
Closes-Bug: #1695323
This is *one way* to handle this problem (arguably the "correct" way).
By actively updating the status of members that don't report, we can set
disabled members to OFFLINE exactly when this becomes true.
Change-Id: I908879470e5b7767711c1234063e7959aa6055ef
Closes-Bug: #1706828
Some of the tests were failing due to improper configuration overrides
in the test cases. This patch fixes those tests to use the current
recommended method with oslo test fixtures that will cleanup after the
test.
Change-Id: I5f1ea16bbc16056aa756415a618a8f4192436dfd
Closes-Bug: #1630060
1. Add request error count
2. Add root element 'listener' in the API response body
Change-Id: I8beb918c176ed848affa264cb036763240d07dcd
Implements: blueprint stats-support
It seems we are running into an intermittent issue where the
session params are not fully mocked causing a
'No sql_connection parameter is established'.
Change-Id: I115d89199118db86d64cb750669c969bada3e401
Closes-Bug: #1621211
This is a fix for the bug listed below. Adding the amphora id
to the table allows multiple amphora to send statistics via
the heartbeat without each heartbeat overwriting other
heartbeats for the same listener / different amphora.
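The effect of the extra key column can be sketched with an in-memory
dict standing in for the table. All names here are invented for the
example.

```python
stats = {}


def record_stats(listener_id, amphora_id, values):
    # Keying on both ids gives each amphora its own row, so one
    # heartbeat no longer overwrites another for the same listener.
    stats[(listener_id, amphora_id)] = values


def listener_total(listener_id, field):
    """Sum one statistic across all amphorae serving a listener."""
    return sum(values[field]
               for (lid, _amp), values in stats.items()
               if lid == listener_id)
```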
Change-Id: I9f50a5de2c1b0665e62d45fcc5815f2b4093b2df
Closes-Bug: 1573607
This commit updates the health monitor to update the load balancer
status even if the given load balancer (and associated amphorae) do not
have any listeners.
Change-Id: Iedfd9ebcf8e2e948c89cbc584c8fdea921b0cfbd
Closes-Bug: 1544290
EventStream will be used to serialize messages from the octavia
database to neutron-lbaas database via oslo_messaging. Also
renaming update mixin class since it's not really a mixin. The
health manager will make changes to the octavia database when
members etc are marked as down and up etc which would result
in databases that were not in sync between neutron-lbaas and
octavia. A mechanism to communicate database changes from
octavia back to neutron is required so this CR attempts
to use an oslo_messaging system to communicate those changes.
DocImpact - /etc/octavia.conf the user can set the option
event_streamer_driver = neutron_event_streamer
to set up a queue to connect to neutron-lbaas.
If this option is left blank it will default to
the noop_event_streamer, which will do nothing,
effectively turning the Queue off.
Co-Authored-By: Brandon Logan <brandon.logan@rackspace.com>
Change-Id: I77a049dcc21e3ee6287e661e82365ab7b9c44562
Mock 1.0.1 is actually different enough from unittest.mock 3.4
to cause porting issues.
Hence deprecate unittest.mock use and use mock directly.
Closes-Bug: #1506808
Change-Id: I585a5f370566159ff4a071e25d7e2a3702392a2d
When no health monitor is assigned to a member HAProxy returns
"no_check" as the status for the server. This change will pull
that forward into Octavia as member operational_status "NO_MONITOR"
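A minimal sketch of such a translation is shown below. Only the
"no_check" to "NO_MONITOR" pair comes from this change. The "maint"
entry reflects the disabled-member handling described in a later change
in this log, and the function name is hypothetical.

```python
HAPROXY_TO_OPERATING_STATUS = {
    'no_check': 'NO_MONITOR',  # no health monitor assigned to the member
    'maint': 'OFFLINE',        # member created in a disabled state
}


def member_operating_status(haproxy_status):
    """Translate an HAProxy server status, or return None if unmapped."""
    return HAPROXY_TO_OPERATING_STATUS.get(haproxy_status)
```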
Closes-Bug: 1491936
Change-Id: Ifefd2bddf8999c75397bf7c693042003e0f8a382
Used a binary compressed encoding of the JSON-dumped object to reduce
the size needed to send heartbeats in case some stats objects
start getting sent later on. Also used sha256 instead of sha1
with hmac.
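The scheme can be sketched with the standard library alone. The wire
layout (compressed payload followed by the raw digest) and the function
names pack and unpack are assumptions for the example, not a
description of the actual heartbeat format.

```python
import hashlib
import hmac
import json
import zlib

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def pack(obj, key):
    """Compress the JSON dump of obj and append an HMAC-SHA256 digest."""
    payload = zlib.compress(json.dumps(obj).encode('utf-8'))
    digest = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + digest


def unpack(envelope, key):
    """Verify the digest, then decompress and decode the payload."""
    payload, digest = envelope[:-DIGEST_LEN], envelope[-DIGEST_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError('HMAC mismatch')
    return json.loads(zlib.decompress(payload).decode('utf-8'))
```

Verifying the digest before decompressing means an unauthenticated
packet is rejected without being parsed.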
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Co-Authored-By: German Eichberger <german.eichbeger@hp.com>
Co-Authored-By: Carlos Garza <carlos.garza@rackspace.com>
Partially implements: health-manager
Change-Id: I932c693101b94c9132e1741291610508876eab43
This model is used to check amphora health
Add a column 'busy' and primary key for data table amphora health
Add multiprocessing code in cmd/health_manager: one process for the health check, the other for UDP packet listening.
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Co-Authored-By: min wang <swiftwangster@gmail.com>
Implements: blueprint health-manager
Change-Id: I8aeb6b82b58b59951a414e7c2e4c2c58c33a5d15
Co-Authored-By: German Eichberger <german.eichberger@hp.com>
Create data models and repositories for healthmanager
Change the health manager's property---lastupdated to be datetime type
Change the code based on the previous comment
Design a new class healthmanager and implement the update health method
and unit test class test_health_mixin
Add try/except for get member in update_health_mixin
Delete the pool part when the member gets offline status
Add get_all method for AmphoraHealthRepository so we can pass non-equality comparisons to it,
also make a test for it in test_repositories.py
Changed the name of test_all and get_all to be test_all_filter and get_all_filter
Change-Id: Ic356dee139e743a9617d401f9658cfefcb49d15f