When a load balancer with a listener is disabled, the Health Manager does not
update the amphora health record, so the amphora keeps failing over after the
heartbeat_timeout expires.
Story: 2007587
Task: 39521
Change-Id: Ia6d3f40ae1b9b352492162513c9262748ee67e6f
This patch allows listeners on a load balancer to continue to
operate should one listener fail to access secret content in
barbican. Previously if one listener failed to access barbican
content, all of the listeners would be impacted.
This patch also cleans up some unused code and unnecessary comments.
Change-Id: I300839fe7cf88763e1e0b8c484029662beb64f0a
Story: 2006676
Task: 36951
Load balancers with multiple listeners, running on an amphora image
with HAProxy 1.8 or newer can experience excessive memory usage that
may lead to an ERROR provisioning_status.
This patch resolves this issue by consolidating the listeners into
a single haproxy process inside the amphora.
Story: 2005412
Task: 34744
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Change-Id: Idaccbcfa0126f1e26fbb3ad770c65c9266cfad5b
In order to support Python 3.7, pylint has to be updated to 2.0.0
minimum. Newer versions of Pylint enforce additional checkers which can
be addressed with some code refactoring rather than silently ignoring
them in pylintrc; the exception is useless-object-inheritance, which must
remain silenced so that we stay compatible with Python 2.x.
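As a minimal illustration of what the useless-object-inheritance checker flags
(the class names here are hypothetical, not taken from the Octavia code base):

    # Explicit object inheritance is redundant on Python 3 but still
    # meaningful on Python 2, hence the checker stays disabled for now.


    class AmphoraDriver(object):  # pylint: disable=useless-object-inheritance
        """Explicit object inheritance kept for Python 2 compatibility."""


    class HealthUpdater:
        """Python 3-only style that the checker would prefer."""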
Story: 2004073
Task: 27434
Change-Id: I52301d763797d619f195bd8a1c32bc47f1e68420
Includes some updates to docs and configs and related files to remove
references to neutron-lbaas. Also remove handlers.
Change-Id: I3082962841d3b645f3cbd1a6b41fc7fb28dcf7e6
When no UDP listeners are present, skip the UDP health-check code
branch, which prevents expensive and unnecessary DB calls.
Also optimise the UDP health-check code so it only fetches information
for relevant listeners.
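A minimal sketch of the short-circuit idea, using hypothetical listener objects
and helper names rather than the actual Octavia code:

    from dataclasses import dataclass


    @dataclass
    class Listener:
        id: str
        protocol: str


    def update_udp_listener_health(listeners):
        """Only enter the UDP code path (and hit the DB) when UDP listeners exist."""
        udp_listeners = [lsnr for lsnr in listeners if lsnr.protocol == 'UDP']
        if not udp_listeners:
            # No UDP listeners: skip the branch and its database calls entirely.
            return {}
        # Fetch state only for the relevant (UDP) listeners.
        return {lsnr.id: 'ONLINE' for lsnr in udp_listeners}


    print(update_udp_listener_health([Listener('l1', 'HTTP')]))  # {}
    print(update_udp_listener_health([Listener('l2', 'UDP')]))   # {'l2': 'ONLINE'}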
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Change-Id: I7fde54084b39c1d0529cfb0bcfb79697d63ea6ae
Zombie amphorae will be deleted by the system to lighten the load on
operators if the amphora record is still in the database.
Story: 2003912
Task: 26800
Change-Id: If133a3d36a9381bcca9f7d00f5c1531885907940
When running stress tests against the Octavia Health Manager, it was
observed that the scalability and performance of the health manager had
degraded.
It was observed that the ORM layer was forming poorly optimized queries,
putting excessive load on the database engine, and that unnecessary code
paths were executing for each heartbeat message.
This patch optimizes the health manager processing of amphora-agent
heartbeat messages by optimizing the database requests, pool processing,
and event streamer code paths.
Change-Id: I2f75715b09430ad139306d9196df0ec5d7a63da8
Story: 2001896
Task: 14381
These files will be split into the current Octavia repo before the other
parts are ready.
Patch List:
[1] Finish keepalived LVS jinja template for UDP support
[2] Extend the amphora agent to upload/refresh the keepalived process
[3] Extend the db model and db tables with the fields needed for the new
UDP backend
[4] Add logic/workflow elements to process the UDP cases
[5] Extend the existing Listener API to access the UDP parameters
[6] Extend the existing pool API to access the new option in the
session_persistence fields
Change-Id: Ib4924e602d450b1feadb29e830d715ae77f5bbfe
The health manager reports an ERROR for each heartbeat from a spare
amphora. This patch resolves that issue.
Change-Id: Ia4a709a5c96803c76cf4325d34281645737a3add
Story: 2002621
Task: 22251
In the case that nova fails to delete an amphora, it will continue to send
health heartbeat messages to the health manager. This patch improves the
logging for these amphorae.
It also optimizes the statistics update flow when event streaming is
disabled by removing two extra database calls.
This patch also removes the unused BaseControllerTask class.
This patch also finally solidifies that there will be one LB per amphora.
Change-Id: Idf83b19216c680a4854c1239ed9c5bc5ce7364a7
During a create of a member on a pool with no healthmonitor, there is a
chance the member will be briefly updated to OFFLINE incorrectly.
The flow of events:
1. A Member is created in DB in NO_MONITOR.
2. A Member is passed to the handler.
3. A health message comes in, not including the member.
4. The Member is present in the DB but missing from the Amp, and is
set to OFFLINE status.
5. The Member is created on the amp.
6. Another health message comes in, and the Member is put back in the
correct NO_MONITOR state.
Change-Id: I74c8cda0c7da88fedf652215630c701bb2761eef
This patch updates the health manager to update the amphora_health timestamp
even if the listener count does not match if the load balancer is in PENDING_*
provisioning status. This will stop failovers from occurring on load balancers
being updated or deleted.
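A rough sketch of the intended check (the status names follow the Octavia
conventions mentioned above; the function and record shapes are hypothetical):

    from datetime import datetime


    def should_refresh_health(listener_count_matches, provisioning_status):
        """Refresh the amphora_health timestamp even on a listener-count
        mismatch while the load balancer is still in a PENDING_* state."""
        return listener_count_matches or provisioning_status.startswith('PENDING_')


    # The heartbeat handler would then only skip the timestamp update when the
    # counts disagree on a load balancer that is not mid-operation.
    if should_refresh_health(False, 'PENDING_UPDATE'):
        last_update = datetime.utcnow()  # keeps the amphora from failing over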
Change-Id: I7d483d764bd841c6977afa0bcf3227e02c573b91
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
The executor will hide any unhandled exceptions raised from the update_health
or update_stats methods. This patch updates the health manager to log those
exceptions.
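A small sketch of the general pattern for surfacing exceptions swallowed by an
executor, assuming a concurrent.futures-style pool (not the exact Octavia code):

    import logging
    from concurrent.futures import ThreadPoolExecutor

    logging.basicConfig(level=logging.ERROR)
    LOG = logging.getLogger(__name__)


    def _log_exception(future):
        """Done-callback: surface any exception the submitted task raised."""
        exc = future.exception()
        if exc is not None:
            LOG.error('update_health/update_stats raised', exc_info=exc)


    def update_health(message):
        raise ValueError('bad heartbeat: %s' % message)


    with ThreadPoolExecutor(max_workers=2) as pool:
        fut = pool.submit(update_health, {'id': 'amp-1'})
        fut.add_done_callback(_log_exception)  # without this the error is silent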
Change-Id: I941730c5e0964ec319b5eb2b7992335b71f9d134
* Switch to ProcessPool from ThreadPool (Python threads are horrible)
* Load the health/stats update drivers correctly with stevedore
* Add logging driver
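A hedged sketch of loading an update driver by name with stevedore and handing
work to a process pool (the entry-point namespace, driver name, and message
handler below are placeholders, not the real Octavia entry points):

    from concurrent.futures import ProcessPoolExecutor

    from stevedore import driver


    def load_update_driver(name):
        """Load a health/stats update driver registered under a setuptools
        entry-point namespace (namespace and name here are placeholders)."""
        manager = driver.DriverManager(
            namespace='example.health_update_drivers',
            name=name,
            invoke_on_load=True,
        )
        return manager.driver


    def process_heartbeat(message):
        # In the real service each message would be handed to the loaded driver.
        return message['id']


    if __name__ == '__main__':
        # A process pool sidesteps the GIL for CPU-bound heartbeat handling.
        with ProcessPoolExecutor(max_workers=4) as pool:
            print(pool.submit(process_heartbeat, {'id': 'amp-1'}).result())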
Change-Id: Icda4ff218a6bbaea252c2d75c073754f4161a597
If a Health Manager is overloaded, it can begin to fall very far behind
in processing health updates. This causes huge delays in the whole
system and can cause two distinctly different issues:
1) If the HMs are all suddenly busy, delays can be long enough that no
messages get through within the failover timeout, and amps start to
fail, increasing load on the HMs and causing a cascade failure (I have
witnessed this happen once and take down over 50 LBs before manual
intervention could be taken).
2) Even one overloaded HM can cause updates to queue for extremely long
periods, which makes the system unreliable. Amps can go down and still
have health updates register for some time as the HM processes the queue
(in some cases I have seen dead amps updated for 5-10 minutes).
If we short-circuit handling before we update the health table, we can
solve these problems in two ways:
1) The heavy processing generally happens after this, so
short-circuiting early will let some other threads finish faster and
have some chance of success.
2) Amphora health won't continue to be updated long after the messages
were received, so it won't be possible for zombie amphorae to eat as
many brains.
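A simplified sketch of the short-circuit idea: drop a heartbeat that is already
older than the failover window before doing any further processing (the
threshold and record shapes are illustrative only):

    import time

    HEARTBEAT_TIMEOUT = 60  # seconds; illustrative value


    def process_heartbeat(message, received_at, now=None):
        """Short-circuit before the heavy per-listener/pool processing."""
        now = time.time() if now is None else now
        if now - received_at > HEARTBEAT_TIMEOUT:
            # The message sat in the queue too long; updating the health table
            # now would only keep a possibly-dead (zombie) amphora looking alive.
            return False
        # ... update amphora_health timestamp, then do the expensive work ...
        return True


    print(process_heartbeat({'id': 'amp-1'}, received_at=time.time() - 300))  # False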
Change-Id: Iceeacfdcaebe1f9bb99bc08e318c9da73a66898d
1. Reduce database interaction
2. Support more than one LB on an amp
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: If4f5f93fb8cfd4db8ab3eb9f0518a16da9068160
story: 2001202
task: 5707
The health manager was marking the load balancer as ONLINE, even though its
admin_state_up was False, because it was receiving health heartbeats.
This patch causes the load balancer operating status to reflect the admin
state.
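A minimal sketch of the mapping described here (the constants follow Octavia's
status vocabulary; the function itself is illustrative):

    ONLINE = 'ONLINE'
    OFFLINE = 'OFFLINE'


    def lb_operating_status(admin_state_up, healthy):
        """An administratively disabled load balancer reports OFFLINE even
        while heartbeats are still arriving from its amphora."""
        if not admin_state_up:
            return OFFLINE
        return ONLINE if healthy else OFFLINE


    print(lb_operating_status(admin_state_up=False, healthy=True))  # OFFLINE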
Change-Id: Id0fa51447789f5ab190463a3ee0775a8612551ab
Closes-Bug: #1695323
This is *one way* to handle this problem (arguably the "correct" way).
By actively updating the status of members that don't report, we can set
disabled members to OFFLINE exactly when this becomes true.
Change-Id: I908879470e5b7767711c1234063e7959aa6055ef
Closes-Bug: #1706828
In large build situations, nova can be slow to build VMs; this means that the
default 100 second timeout may expire before the final status has been updated
in the neutron database. This patch will emit the provisioning status to keep
it in sync with the neutron database.
Change-Id: If6c0b81630fd1911518792d9947f8622f065ff4e
This patch updates the Octavia documentation in support of the
OpenStack documentation migration[1].
[1] https://specs.openstack.org/openstack/docs-specs/specs/pike/os-manuals-migration.html
Change-Id: I97fd038b8050bfe776c3fca8336d9090f8236362
Depends-On: Ia750cb049c0f53a234ea70ce1f2bbbb7a2aa9454
We should avoid using six.iteritems to obtain iterators; we can use
dict.items instead, as it returns an iterator in Python 3 as well,
and dict.items/keys is more readable. In Python 2, the performance
impact of building a list should be negligible.
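For illustration, the replacement is a direct one (the stats dict is just an
example):

    import six

    stats = {'bytes_in': 10, 'bytes_out': 20}

    # Before: explicit six helper for a Python 2/3 compatible iterator.
    for key, value in six.iteritems(stats):
        print(key, value)

    # After: dict.items() already returns a view/iterator on Python 3, and
    # building a small list on Python 2 is negligible.
    for key, value in stats.items():
        print(key, value)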
Change-Id: I153d91e884ef0ea0a760527f3dab2b8d5ed3e38e
1. Add request error count
2. Add root element 'listener' in the API response body
Change-Id: I8beb918c176ed848affa264cb036763240d07dcd
Implements: blueprint stats-support
This is a fix for the bug listed below. Adding the amphora id
to the table allows multiple amphora to send statistics via
the heartbeat without each heartbeat overwriting other
heartbeats for the same listener / different amphora.
Change-Id: I9f50a5de2c1b0665e62d45fcc5815f2b4093b2df
Closes-Bug: 1573607
This commit updates the health monitor to update the load balancer
status even if the given load balancer (and associated amphorae) do not
have any listeners.
Change-Id: Iedfd9ebcf8e2e948c89cbc584c8fdea921b0cfbd
Closes-Bug: 1544290
EventStream will be used to serialize messages from the octavia
database to the neutron-lbaas database via oslo_messaging. Also
renaming the update mixin class since it is not really a mixin. The
health manager will make changes to the octavia database when
members etc. are marked as down and up, which would result
in databases that were not in sync between neutron-lbaas and
octavia. A mechanism to communicate database changes from
octavia back to neutron is required, so this CR attempts
to use an oslo_messaging system to communicate those changes.
DocImpact - In /etc/octavia.conf the user can set the option
event_streamer_driver = neutron_event_streamer
to set up a queue to connect to neutron-lbaas.
If this option is left blank it will default to
the noop_event_streamer, which will do nothing,
effectively turning the queue off.
Co-Authored-By: Brandon Logan <brandon.logan@rackspace.com>
Change-Id: I77a049dcc21e3ee6287e661e82365ab7b9c44562
When no health monitor is assigned to a member, HAProxy returns
"no_check" as the status for the server. This change will pull
that forward into Octavia as member operational_status "NO_MONITOR"
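A small sketch of the status translation; only the no_check -> NO_MONITOR case
is taken from the text above, the other entries are illustrative:

    # Translate HAProxy server status strings into Octavia member statuses.
    HAPROXY_TO_OCTAVIA = {
        'UP': 'ONLINE',
        'DOWN': 'ERROR',
        'no_check': 'NO_MONITOR',  # member has no health monitor assigned
    }


    def member_status(haproxy_status):
        return HAPROXY_TO_OCTAVIA.get(haproxy_status, 'ERROR')


    print(member_status('no_check'))  # NO_MONITOR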
Closes-Bug: 1491936
Change-Id: Ifefd2bddf8999c75397bf7c693042003e0f8a382
Used binary compressed encoding of the JSON-dumped object to reduce
the size needed to send heartbeats in case some stats objects
start getting sent later on. Also used sha256 instead of sha1
with hmac.
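A rough sketch of the encoding described (the key, field names, and framing are
illustrative; this is not the exact wire format):

    import hashlib
    import hmac
    import json
    import zlib

    KEY = b'heartbeat_key'  # shared secret; illustrative value


    def encode_heartbeat(obj):
        """zlib-compress the JSON payload and append an HMAC-SHA256 digest."""
        payload = zlib.compress(json.dumps(obj).encode('utf-8'))
        digest = hmac.new(KEY, payload, hashlib.sha256).digest()
        return payload + digest


    def decode_heartbeat(data):
        payload, digest = data[:-32], data[-32:]
        expected = hmac.new(KEY, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(digest, expected):
            raise ValueError('bad heartbeat signature')
        return json.loads(zlib.decompress(payload).decode('utf-8'))


    msg = {'id': 'amp-1', 'listeners': {}}
    assert decode_heartbeat(encode_heartbeat(msg)) == msg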
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Co-Authored-By: German Eichberger <german.eichbeger@hp.com>
Co-Authored-By: Carlos Garza <carlos.garza@rackspace.com>
Partially implements: health-manager
Change-Id: I932c693101b94c9132e1741291610508876eab43
Co-Authored-By: German Eichberger <german.eichberger@hp.com>
Create data models and repositories for the health manager.
Change the health manager's lastupdated property to be of datetime type.
Change the code based on the previous comments.
Design a new class healthmanager and implement the update health method
and the unit test class test_health_mixin.
Add try/except for get member in update_health_mixin.
Delete the pool part when the member gets offline status.
Add a get_all method for AmphoraHealthRepository so we can pass non-equality
comparisons to it; also add a test for it in test_repositories.py.
Change the names of test_all and get_all to test_all_filter and get_all_filter.
Change-Id: Ic356dee139e743a9617d401f9658cfefcb49d15f