Load balancers with multiple listeners running on an amphora image
with HAProxy 1.8 or newer can experience excessive memory usage that
may lead to an ERROR provisioning_status.
This patch resolves this issue by consolidating the listeners into
a single haproxy process inside the amphora.
Story: 2005412
Task: 34744
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Change-Id: Idaccbcfa0126f1e26fbb3ad770c65c9266cfad5b
A user came to the IRC channel with CLI errors:
"Client-side error: Validation failure: Missing project ID in
request where one is required."
The root cause was that [api_settings] auth_strategy was set to
"noauth" instead of "keystone".
This patch adds a warning log message to the API process that
warns users that auth_strategy should typically be set to
keystone.
It also advises the user to have an administrator check the keystone
settings in octavia.conf.
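A minimal sketch of such a start-up warning, using the standard `logging` module. The helper name `warn_if_not_keystone` is hypothetical, and the option value is passed in directly instead of being read from oslo.config's `[api_settings]` section:

```python
import logging

LOG = logging.getLogger("octavia.api")


def warn_if_not_keystone(auth_strategy):
    """Warn when [api_settings] auth_strategy is not "keystone".

    Hypothetical helper: the value is passed in directly rather than read
    from oslo.config. Returns True if a warning was logged.
    """
    if auth_strategy == "keystone":
        return False
    LOG.warning(
        "auth_strategy is set to %r; it should typically be 'keystone'. "
        "Please have an administrator check the keystone settings in "
        "octavia.conf.",
        auth_strategy,
    )
    return True
```

Calling it once when the API process starts is enough; it does not change behavior, it only makes the misconfiguration visible in the log.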
Change-Id: I7793d7a9113b23ac88e7c53d5dc292a70b9453b5
Configure rsyslog to forward logs to a target host
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Story: 1665069
Task: 33646
Change-Id: I00703f86555cbb574b943794b14a36fbc644f1b2
This patch configures the primary components of the amphora to log
to syslog using consistent logging facilities.
By default, user traffic logs will go to LOG_LOCAL0 and the amphora
processes (haproxy, keepalived, etc.) will log to LOG_LOCAL1.
This is a patch supporting log offloading.
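Syslog encodes the facility into each message's PRI value (facility * 8 + severity), which is what lets a receiver such as rsyslog route user traffic and administrative logs separately. A minimal sketch using the standard library's `SysLogHandler` constants; the helper names and the default `127.0.0.1:514` UDP target are assumptions for illustration:

```python
from logging.handlers import SysLogHandler

# Facility assignments described above: user traffic vs. amphora processes.
USER_TRAFFIC_FACILITY = SysLogHandler.LOG_LOCAL0
ADMIN_FACILITY = SysLogHandler.LOG_LOCAL1


def syslog_pri(facility, severity):
    """Return the syslog PRI value: facility * 8 + severity."""
    return (facility << 3) | severity


def make_handler(facility, address=("127.0.0.1", 514)):
    """Build a UDP syslog handler that tags records with `facility`."""
    return SysLogHandler(address=address, facility=facility)
```

For example, an INFO-level message on LOG_LOCAL1 carries PRI (17 << 3) | 6 = 142, while the same severity on LOG_LOCAL0 carries 134.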
Change-Id: Ifda91e0310e812e34f1e398dd3176af8a9c58f89
Story: 1665069
Task: 5486
This patch creates an Amphora v2 provider driver as well as a
V2 controller worker.
This is in preparation for having the amphora driver use the new
provider driver data models and rely less on native Octavia database
access.
It is also a preparation step for enabling TaskFlow JobBoard, as
this work will move to storing dictionaries in the flows instead
of database models.
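The reason for storing dictionaries is that plain dicts can be serialized and persisted, while ORM objects stay tied to a database session. A minimal sketch of flattening nested models into a flow-store payload; the `Pool`/`Member` dataclasses and `to_flow_store` are hypothetical and far simpler than the real provider data models:

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Member:
    """Hypothetical, simplified provider-style data model."""
    member_id: str
    address: str
    protocol_port: int


@dataclass
class Pool:
    """Hypothetical, simplified provider-style data model."""
    pool_id: str
    members: List[Member] = field(default_factory=list)


def to_flow_store(pool):
    """Flatten a model (and its nested models) into plain dicts.

    Plain dicts, lists and strings can be serialized, unlike ORM objects
    that remain bound to a database session.
    """
    return {"pool": asdict(pool)}
```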
Change-Id: Ia65539a8c39560e2276750d8e79a637be4c0f265
Story: 2005072
Task: 30806
In order to support Python 3.7, pylint has to be updated to at least
2.0.0. Newer versions of Pylint enforce additional checkers, which can
be addressed with some code refactoring rather than silently ignoring
them in pylintrc. The exception is useless-object-inheritance, which must
be silenced so that we stay compatible with Python 2.x.
Story: 2004073
Task: 27434
Change-Id: I52301d763797d619f195bd8a1c32bc47f1e68420
This is the base patch that updates octavia to use the new octavia-lib.
It is backwards compatible by using debtcollector moves.
It adds a new controller process called the "driver-agent".
This patch also adds unit test coverage for a few additional modules.
Depends-On: https://review.openstack.org/#/c/641180/
Change-Id: I438e1548ec0fb6111d1ab85b05015007d9d0a006
This commit adds the functionality of octavia-status CLI for performing
upgrade checks as part of the Stein cycle upgrade-checkers goal.
It only includes a sample check, which must be replaced by real checks in
the future.
Change-Id: I8b6d134b0bf5b5c82a19177fed6145ef8aaf7507
Story: 2003657
Task: 26146
When the queue_event_streamer driver is used and RabbitMQ
is down, stats update processes occupy the thread pool,
which is shared with health update processes. As a result,
a RabbitMQ outage can unexpectedly lead to the deletion of all
existing amphorae. This commit separates the thread pools so
that existing amphorae keep working even when RabbitMQ
is down.
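A minimal sketch of why separate pools help, using `concurrent.futures.ThreadPoolExecutor`. The pool sizes and the `send_stats`/`update_health` functions are hypothetical stand-ins; a `threading.Event` simulates the broker being down:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Two independent pools: a backlog of stuck stats updates can no longer
# starve health updates of worker threads.
health_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="health")
stats_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="stats")


def send_stats(broker_up):
    """Simulate a stats update that blocks while the broker is down."""
    broker_up.wait()  # blocks until the (simulated) broker recovers
    return "stats sent"


def update_health(amphora_id):
    """Simulate recording a heartbeat for one amphora."""
    return f"{amphora_id} healthy"
```

With a single shared pool, ten blocked `send_stats` calls would occupy every worker and `update_health` would never run; with separate pools it completes immediately.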
Change-Id: I576687f5b646496ff3a00787cf5e8c27f36b9448
Task: 22929
Story: 2002937
This patch enables oslo_config mutable configuration for the Octavia
control plane processes. The configuration will be updated when the
parent process receives a HUP signal.
This completes the Rocky goal: Enable mutable configuration.
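The reload-on-HUP idea can be sketched with the standard library alone. This is an analogue for illustration, not oslo.config's mechanism; `ReloadableConfig` is hypothetical and uses `configparser` in place of oslo.config:

```python
import configparser
import signal


class ReloadableConfig:
    """Hypothetical stand-in for mutable config: re-read a file on SIGHUP."""

    def __init__(self, path):
        self.path = path
        self.parser = configparser.ConfigParser()
        self.reload()

    def reload(self, signum=None, frame=None):
        # The signature matches a signal handler, so it can be registered
        # directly with signal.signal().
        fresh = configparser.ConfigParser()
        fresh.read(self.path)
        self.parser = fresh  # swap in the new values in one assignment

    def install(self):
        # Handlers can only be registered from the main thread, which is
        # why the parent process is the one that receives HUP.
        signal.signal(signal.SIGHUP, self.reload)
```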
Change-Id: Idaf608c6e5fd2fa74a68c3b562be441a20107a50
Story: 2001545
Task: 6391
We want to default to running all tox environments under python 3, so
set the basepython value in each environment.
We do not want to specify a minor version number, because we do not
want to have to update the file every time we upgrade python.
We do not want to set the override once in testenv, because that
breaks the more specific versions used in default environments like
py35 and py36.
This patch also updates pylint to 1.5.6, which is compatible with
python3.
Updating pylint surfaced some issues; this patch addresses
them so the Octavia code passes pylint 1.5.6.
Change-Id: Iec21f4c803a427059d595612336d67a35ebf9585
Signed-off-by: Doug Hellmann <doug@doughellmann.com>
Capture exceptions in each thread's task handler to prevent the thread
from exiting abnormally.
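A minimal sketch of the pattern: wrap each task in `try/except` inside the worker loop so one failing task is logged and the thread keeps serving the queue. The `worker` function and its `None` stop sentinel are hypothetical:

```python
import logging
import queue
import threading

LOG = logging.getLogger(__name__)


def worker(tasks, handler, results):
    """Drain `tasks`; a failing task is logged instead of killing the thread."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: stop the worker
            break
        try:
            results.append(handler(item))
        except Exception:
            # Without this guard the exception would end the thread's loop.
            LOG.exception("Task handler failed for %r", item)
```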
Story: 2001749
Task: 12128
Change-Id: I0fef53347e4460b8e6e200589736ae65c85a5145
It was reported that the Health Manager process could crash when receiving
malformed heartbeat packets. I was unable to reproduce the issue
(I suspect oslo_utils fixed the root cause), but I could see how this
could happen and how our error handling could be improved.
This is lower severity because this port is intended to be accessible
only from the private lb-mgmt-net network.
This patch adds additional exception handling to the Health Manager
listener routines to better handle heartbeat packet issues.
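A minimal sketch of defensive heartbeat parsing that drops bad packets instead of raising. The packet format here (zlib-compressed JSON followed by a hex HMAC-SHA256 digest) is an assumption for illustration, as are the `wrap` and `parse_heartbeat` names:

```python
import hashlib
import hmac
import json
import logging
import zlib

LOG = logging.getLogger(__name__)
HMAC_LEN = hashlib.sha256().digest_size * 2  # hex-encoded digest length (64)


def wrap(obj, key):
    """Build a packet in the assumed format: zlib(JSON) + hex HMAC-SHA256."""
    payload = zlib.compress(json.dumps(obj).encode("utf-8"))
    digest = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload + digest.encode("ascii")


def parse_heartbeat(packet, key):
    """Return the decoded heartbeat dict, or None if the packet is malformed."""
    if len(packet) <= HMAC_LEN:
        LOG.warning("Dropping heartbeat: packet too short (%d bytes)", len(packet))
        return None
    payload, received = packet[:-HMAC_LEN], packet[-HMAC_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(received, expected):
        LOG.warning("Dropping heartbeat: HMAC mismatch")
        return None
    try:
        obj = json.loads(zlib.decompress(payload))
    except (zlib.error, ValueError) as exc:
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
        LOG.warning("Dropping heartbeat: cannot decode payload (%s)", exc)
        return None
    if not isinstance(obj, dict):
        LOG.warning("Dropping heartbeat: payload is not a JSON object")
        return None
    return obj
```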
Change-Id: I2da6fa394f5152148237d0986fd969b7950815ba
Story: 2001959
Task: 15081
This also fixes build-openstack-sphinx-docs, which broke because of a change
introduced in sphinx 1.6.6:
https://github.com/sphinx-doc/sphinx/pull/4335/files
If the size of __init__.py is less than 2 bytes, the module is skipped,
which causes sphinx consistency checking to fail later.
Change-Id: I9d8764b6e907aceed8bb8a9b04711145d0eb32ad
* Switch to ProcessPool from ThreadPool (Python threads are horrible)
* Load the health/stats update drivers correctly with stevedore
* Add logging driver
Change-Id: Icda4ff218a6bbaea252c2d75c073754f4161a597
The periodic health_check job was reading its timing config before
oslo.config had actually loaded the real values, so it would always use the
default. Now the periodic is set up at runtime, so it loads the real
config value.
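The bug pattern is that a decorator argument is evaluated once, at import time. A minimal sketch contrasting the two approaches; `CONF` is a plain dict standing in for oslo.config, and `periodic` is a hypothetical decorator in the style of futurist's, not its real API:

```python
# Hypothetical stand-in for oslo.config's CONF: defaults exist at import,
# real values only arrive once the service parses its config files.
CONF = {"health_check_interval": 3}


def periodic(spacing):
    """Hypothetical futurist-style decorator that records a run interval."""
    def decorator(func):
        func.spacing = spacing
        return func
    return decorator


# BUG pattern: the decorator argument is evaluated at import time, so the
# job keeps the default even after the config is loaded.
@periodic(spacing=CONF["health_check_interval"])
def health_check_at_import():
    pass


def build_health_check():
    """Fixed pattern: create the periodic at runtime, after config loading."""
    @periodic(spacing=CONF["health_check_interval"])
    def health_check():
        pass
    return health_check


def load_config():
    """Simulate parsing octavia.conf at service start-up."""
    CONF["health_check_interval"] = 10
```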
Change-Id: I85c6ff0e698c6c5899c78f8c3a5b119e80bb972a
When trying to exit health_manager, the terminal
would hang due to a child process using time.sleep().
Now the process uses futurist.periodics to schedule
when to run, which allows it to exit quickly and
gracefully.
Also correctly handles `failover_amphora` not working out or being
cancelled, and logs the statistics of those occurrences
instead of incorrectly assuming everything always works out.
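A minimal sketch of tallying failover outcomes instead of assuming success. It uses the standard library's `concurrent.futures` as a stand-in for futurist futures, and `tally_failovers` is a hypothetical helper:

```python
import logging
from concurrent.futures import CancelledError

LOG = logging.getLogger(__name__)


def tally_failovers(futures):
    """Count succeeded/failed/cancelled failovers instead of assuming success."""
    stats = {"succeeded": 0, "failed": 0, "cancelled": 0}
    for fut in futures:
        try:
            fut.result()
        except CancelledError:
            stats["cancelled"] += 1
        except Exception:
            LOG.exception("Failover failed")
            stats["failed"] += 1
        else:
            stats["succeeded"] += 1
    LOG.info("Failover results: %s", stats)
    return stats
```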
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Co-Authored-By: Joshua Harlow <jxharlow@godaddy.com>
Change-Id: I870edaab73ab20a9322c8bc1bd2514897417d12a
Also, create a section for API settings `api_settings` and move some
related settings there.
This patch also enables the configuration settings to be logged
when the api process is started if debug is True.
Change-Id: I31671789d186c4b8a775cc12a414acd2d439512d
When trying to exit the house_keeping daemon,
the terminal would hang until all threads finished
their iteration of time.sleep(). Now the threads
use an Event object instead, so on keyboard interrupt
the threads exit without waiting.
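A minimal sketch of the Event-based loop: `Event.wait(interval)` sleeps like `time.sleep()` but returns as soon as the event is set. The function names are hypothetical:

```python
import threading


def housekeeping_loop(stop_event, interval, run_once):
    """Run `run_once` every `interval` seconds until `stop_event` is set."""
    while not stop_event.is_set():
        run_once()
        # Unlike time.sleep(), Event.wait() returns as soon as the event is
        # set, so shutdown does not wait for the rest of the interval.
        stop_event.wait(interval)


def run_until_interrupted(interval, run_once):
    """Run the loop in a worker thread; Ctrl-C stops it immediately."""
    stop_event = threading.Event()
    worker = threading.Thread(
        target=housekeeping_loop, args=(stop_event, interval, run_once))
    worker.start()
    try:
        while worker.is_alive():
            worker.join(timeout=1)  # short joins keep Ctrl-C responsive
    except KeyboardInterrupt:
        stop_event.set()  # wake the worker out of its wait
        worker.join()
```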
Change-Id: I4cb62977f647209ea87001a949fc42472ad53a70
We should avoid using six.iteritems to get iterators.
We can use dict.items instead, as it returns an iterable view in PY3 as well.
dict.items/keys are also more readable.
In py2, the performance cost of building a list should be negligible.
Change-Id: I153d91e884ef0ea0a760527f3dab2b8d5ed3e38e
Changes in diskimage-builder switched the amphora image to use
python3. The active/standby code was not python3 compatible.
This patch corrects that issue.
Change-Id: I81db0e52f1a21d1e3ceea6a4ec2467145f761e55
Closes-Bug: #1659116
This patch sets up a separate log file for the amphora-agent
and configures logrotate to manage this new log.
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Change-Id: Ia7b057642d7a567d685d989d1c689d5f3481e73e
This fixes:
Error: '::9443' is not a valid port number.
Also introduce a unit test skeleton for the amphora agent.
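The message does not say what caused the error. One plausible cause, assumed here, is an IPv6 wildcard address joined to a port without brackets (for example `:::9443`): a parser that splits on the first colon then sees `'::9443'` as the port. A minimal sketch of building a bind string that brackets IPv6 literals; `format_bind` is hypothetical:

```python
import ipaddress


def format_bind(host, port):
    """Return a host:port bind string, bracketing IPv6 literals."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # a hostname, not an IP literal
    return f"[{host}]:{port}" if is_v6 else f"{host}:{port}"
```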
Change-Id: Ic7aebf0674ba7036356bb3231c26fa309cd4c475
Remove unneeded import_group lines, which do nothing and just make the
code harder to understand.
Change-Id: I673dd04dd31ae9771e6af982d184eee0e9cbf2d4
String interpolation should be delayed to be handled by the logging code,
rather than being done at the point of the logging call.
Ref: http://docs.openstack.org/developer/oslo.i18n/guidelines.html#log-translation
For example:
# WRONG
LOG.info(_LI('some message: variable=%s') % variable)
# RIGHT
LOG.info(_LI('some message: variable=%s'), variable)
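A runnable illustration of why the second form is preferred, using the standard `logging` module (the oslo.i18n `_LI` marker is omitted). `Expensive` is a hypothetical object that counts how often it is rendered; when INFO is disabled, only the eager `%` form pays the formatting cost:

```python
import logging


class Expensive:
    """Counts how often it is rendered as a string."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive value"


logger = logging.getLogger("interpolation-demo")
logger.setLevel(logging.WARNING)  # INFO records are filtered out

value = Expensive()
# RIGHT: the logger drops the record before formatting it.
logger.info("some message: variable=%s", value)
deferred_renders = value.renders
# WRONG: the string is built before logging decides to drop it.
logger.info("some message: variable=%s" % value)
eager_renders = value.renders - deferred_renders
```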
Change-Id: I77c9b9783c623167ada1631cf05bf4cf4c40e6b1
Flask's default runner (werkzeug) is plagued with bugs.
If we use gunicorn instead, we should have far fewer problems!
Depends-On: I211dc771aa95147c0f1d9e6ac1a65a7e164b33c2
Change-Id: I59897167f9285bf013f8a155dd2ea4f799ac1d3f
This patch is preparatory work needed for Id99948aec64656a0532afc68e146f0610bff1378,
which fixes the amphora-agent support for RH-based Linux flavors.
This is a pure refactor. Functions were gathered under classes (making
them methods) so that state, such as the operating system flavor, can be preserved
throughout the entire amphora agent process lifecycle.
Related-Bug: #1548070
Change-Id: Ic149211dba8ea78e08cb06b6e1f65da00a6571c7
When a load balancer is deleted, the corresponding DB entry is marked
as DELETED and is never actually removed, nor is the VIP
associated with this load balancer.
This adds a new method to the db_cleanup routine that scans the DB for
load balancers with DELETED provisioning_status and deletes them
from the DB if they are older than load_balancer_expiry_age. Corresponding
VIP entries are deleted in cascade.
Added new config option `load_balancer_expiry_age` to the `house_keeping`
config section.
Also changed the default value of exp_age argument to
CONF.house_keeping.amphora_expiry_age in check_amphora_expiry_age
method.
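A minimal sketch of the cleanup with a cascading VIP delete, using the standard `sqlite3` module in place of Octavia's real database layer. The two-table schema and `cleanup_load_balancers` are hypothetical simplifications; `expiry_age` plays the role of `load_balancer_expiry_age`:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE load_balancer (
    id TEXT PRIMARY KEY,
    provisioning_status TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE TABLE vip (
    load_balancer_id TEXT PRIMARY KEY
        REFERENCES load_balancer(id) ON DELETE CASCADE,
    ip_address TEXT
);
"""


def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys (and cascades) when enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def cleanup_load_balancers(conn, expiry_age, now=None):
    """Delete DELETED load balancers older than expiry_age; VIPs cascade."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - expiry_age).isoformat()  # same ISO format as updated_at
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM load_balancer "
            "WHERE provisioning_status = 'DELETED' AND updated_at < ?",
            (cutoff,),
        )
    return cur.rowcount
```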
DocImpact
Closes-Bug: #1573725
Change-Id: I4f99d38f44f218ac55a76ef062ed9ea401c0a02d
These are modules that should be imported rather than run from the shell,
so they do not require the executable permission.
Change-Id: I869d73411ec8e308a80d780e10e77dcc48097d42
Make the octavia.api.app module work with web servers running on WSGI.
Take uwsgi as an example:
uwsgi --socket /tmp/octavia.sock \
--pythonpath /home/devstack/octavia/octavia/api \
--module "app:setup_app()" \
--pidfile /tmp/octavia.pid --vacuum \
--daemonize /var/log/octavia/octavia.log
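What `--module "app:setup_app()"` relies on is a factory that returns a WSGI callable. A minimal sketch of that shape; this `setup_app` is a hypothetical stand-in, not Octavia's real application, and `wsgiref` is used only because it ships with Python:

```python
from wsgiref.simple_server import make_server


def setup_app():
    """Hypothetical stand-in for a factory that returns a WSGI callable."""

    def app(environ, start_response):
        body = b'{"status": "ok"}'
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


def serve(host="127.0.0.1", port=8080):
    """Host the callable with the stdlib server; uwsgi would do this instead."""
    with make_server(host, port, setup_app()) as server:
        server.serve_forever()
```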
Change-Id: I3282da1191965e8d83c8bf74ef1a1285673a6987
EventStreamer will be used to serialize messages from the octavia
database to the neutron-lbaas database via oslo_messaging. This also
renames the update mixin class, since it's not really a mixin. The
health manager makes changes to the octavia database when
members etc. are marked down or up, which would leave the
neutron-lbaas and octavia databases out of sync.
A mechanism to communicate database changes from
octavia back to neutron is required, so this CR attempts
to use oslo_messaging to communicate those changes.
DocImpact: in /etc/octavia.conf the user can set the option
event_streamer_driver = neutron_event_streamer
to set up a queue to connect to neutron-lbaas.
If this option is left blank, it will default to
the noop_event_streamer, which does nothing,
effectively turning the queue off.
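A minimal sketch of the "blank means noop" driver selection. The classes and the `DRIVERS` mapping are hypothetical; `RecordingEventStreamer` only records events where a real driver would publish them over oslo_messaging:

```python
class NoopEventStreamer:
    """Default driver: accepts events and drops them (queue effectively off)."""

    def emit(self, event):
        return None


class RecordingEventStreamer:
    """Hypothetical stand-in for a driver that would publish to neutron-lbaas."""

    def __init__(self):
        self.sent = []

    def emit(self, event):
        self.sent.append(event)


DRIVERS = {
    "noop_event_streamer": NoopEventStreamer,
    "neutron_event_streamer": RecordingEventStreamer,
}


def load_event_streamer(name):
    """Pick a driver by config value; a blank value falls back to noop."""
    return DRIVERS[name or "noop_event_streamer"]()
```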
Co-Authored-By: Brandon Logan <brandon.logan@rackspace.com>
Change-Id: I77a049dcc21e3ee6287e661e82365ab7b9c44562
This makes more sense and also suppresses the error messages when
launching the service returned from oslo_messaging.get_rpc_server.
Instead of that service's wait() being called, the Consumer's
wait() will be called.
Change-Id: I63816e92fbe26a4213946e6ab584531bdc3b7dd2
Closes-Bug: #1527418
This patch implements the Active/Standby blueprint in
https://blueprints.launchpad.net/octavia/+spec/activepassiveamphora
The following points describe the main changes:
1. The patch introduces new flows and subflows to create M amphorae. The
controller worker parses the loadbalancer_topology configuration. If the
loadbalancer_topology value is ACTIVE_STANDBY, the controller invokes a new flow
independent from the SINGLE topology case, which is left untouched. The new
flow uses conditional taskflows to check for spare amphorae at runtime. This
removes the need for the exception workaround we had earlier. The controller
creates the amphorae in parallel using an unordered flow. A new database task
alters an amphora's role to either MASTER or BACKUP and assigns a VRRP priority to
each amphora. After the amphorae are created, the controller invokes a separate
flow for post amphora configuration including plug_vip methods, vrrp
configuration upload, and keepalived service start.
2. The patch introduces new data models that include a new table for VRRP group
configuration per loadbalancer, and updates the amphora, loadbalancer, and
listener tables to support the new active/standby capability. The VRRPGroup
table hides authentication data, and makes future extensions of VRRP
capabilities easy.
3. This patch updates the existing Haproxy configuration templates to include
peer synchronization. In case of ACTIVE_STANDBY configuration, the jinja
configuration renders the peer section in the Haproxy configuration and assigns
short names to the amphorae as listener peers. As each listener implies a separate
Haproxy process, each listener synchronizes on a different port, evaluated as
BASE_PORT (1024) + NUMBER_OF_LISTENERS, accounting for ports in use.
4. This patch introduces a new Jinja configuration templater and a REST driver
for Keepalived (developed as a Mixin). By default, Keepalived runs "all" check
scripts found in a predefined directory. The keepalived driver is a Mixin that
can be plugged into other services' drivers. It is the responsibility of these
services' drivers to introduce their own check scripts. In this patch a
lightweight check script for Haproxy was introduced along with changes in the
amphora agent installation script.
5. VRRP requires enabling protocol 112 for Master/Backup advertisements,
and enabling protocol 51 for authentication header. This patch enables these
protocols as needed in the loadbalancer security group.
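A minimal sketch of the role assignment in point 1 and the protocol list in point 5. The `Amphora` dataclass, the rule that the first amphora becomes MASTER, and the priority values 100/90 are assumptions for illustration; VRRP itself elects the router with the highest priority as master:

```python
from dataclasses import dataclass
from typing import Optional

# Assumed priority values; VRRP gives mastership to the highest priority.
MASTER_PRIORITY = 100
BACKUP_PRIORITY = 90

VRRP_PROTOCOL = 112  # IP protocol number for VRRP advertisements
AH_PROTOCOL = 51  # IP protocol number for the Authentication Header


@dataclass
class Amphora:
    """Hypothetical, simplified amphora model."""
    amphora_id: str
    role: Optional[str] = None
    vrrp_priority: Optional[int] = None


def assign_roles(amphorae):
    """Mark the first amphora MASTER and the rest BACKUP, with priorities."""
    for index, amp in enumerate(amphorae):
        if index == 0:
            amp.role, amp.vrrp_priority = "MASTER", MASTER_PRIORITY
        else:
            amp.role, amp.vrrp_priority = "BACKUP", BACKUP_PRIORITY
    return amphorae


def security_group_protocols():
    """Protocols the load balancer security group must allow for VRRP."""
    return [VRRP_PROTOCOL, AH_PROTOCOL]
```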
Note: Updates to the failover flow to support active/standby will come in
a dependent patch.
Note: The amphora-agent is pinned to this patch in this patch set. This
is required so the scenario tests will pass. It will be removed in a
follow up patch.
Co-Authored-By: Sherif Abdelwahab <sherif.abdelwahab@hp.com>
Co-Authored-By: Michael Johnson <johnsomor@gmail.com>
Implements: blueprint activepassiveamphora
Depends-On: Ifdf20378b26cdd13e0a3ff87cec8990fe89c0661
Change-Id: Ic4e04594e114ba682088d68d5f1af3f8f376db83
The goal of this patch is to detect when an amphora's cert will expire
within 2 weeks of utcnow and, when it will, update its cert with a new
one and update its DB information at the same time.
To achieve this, I made the following changes:
* Add 2 new columns, cert_busy and cert_expiration, to the amphora table
* Add methods to get the cert expiration date from the PEM server_pem and
  update DB info
* Use the new REST agent method to perform cycling
* Add a process in housekeeping to facilitate rotation
* Add unit tests
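A minimal sketch of the housekeeping check. It assumes, for illustration, that `cert_busy` marks a rotation already in progress so an amphora is not rotated twice concurrently; `needs_rotation` is a hypothetical helper, and parsing the expiration date out of the PEM is not shown:

```python
from datetime import datetime, timedelta, timezone

ROTATION_WINDOW = timedelta(weeks=2)


def needs_rotation(cert_expiration, cert_busy, now=None):
    """True if the cert expires within two weeks and is not already rotating."""
    now = now or datetime.now(timezone.utc)
    return not cert_busy and cert_expiration - now < ROTATION_WINDOW
```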
Change-Id: I28578a3e560ee09ba300788a5423863c893b8638
Oslo_reports enables OpenStack projects to dump Guru Meditation
Reports with useful debugging information to files or stderr.
Closes-Bug: #1514504
Change-Id: Id35fb7dc8c31f304cbf1d9cca0d21b9d5e97865a