61 Commits (65e132a734f005f090a384bfa129482d195c6d6e)

Author SHA1 Message Date
Zuul 37799137a3 Merge "Fix multi-listener load balancers" 4 years ago
Michael Johnson 06ce4777c3 Fix multi-listener load balancers
Load balancers with multiple listeners, running on an amphora image
with HAProxy 1.8 or newer can experience excessive memory usage that
may lead to an ERROR provisioning_status.
This patch resolves this issue by consolidating the listeners into
a single haproxy process inside the amphora.

Story: 2005412
Task: 34744
Co-Authored-By: Adam Harwell <>
Change-Id: Idaccbcfa0126f1e26fbb3ad770c65c9266cfad5b
4 years ago
Michael Johnson 48371c98ee Add warning log if auth_strategy is not keystone
A user came to the IRC channel with CLI errors:
"Client-side error: Validation failure: Missing project ID in
request where one is required."

The root cause was the [api_settings] auth_strategy was set to
"noauth" instead of "keystone".

This patch adds a warning log message to the API process that
warns users that typically the auth_strategy should be set to
"keystone". It also points the user to have an administrator check the
keystone settings in the octavia.conf.

Change-Id: I7793d7a9113b23ac88e7c53d5dc292a70b9453b5
4 years ago
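
The check described in 48371c98ee can be pictured roughly as follows. This is a minimal sketch, not the code from the patch: the function name `warn_if_not_keystone`, the constant, and the message wording are all hypothetical, and the real option is read from the [api_settings] section of octavia.conf rather than passed in as an argument.

```python
import logging

LOG = logging.getLogger(__name__)

# Hypothetical constant for illustration; the commit only says the value
# should typically be "keystone".
EXPECTED_AUTH_STRATEGY = "keystone"


def warn_if_not_keystone(auth_strategy):
    """Log a warning and return True when auth_strategy is not keystone."""
    if auth_strategy != EXPECTED_AUTH_STRATEGY:
        LOG.warning(
            "The auth_strategy is set to '%s'. Typically it should be "
            "'%s'; ask an administrator to check the keystone settings "
            "in octavia.conf.",
            auth_strategy, EXPECTED_AUTH_STRATEGY)
        return True
    return False
```

Calling `warn_if_not_keystone("noauth")` at API start-up would emit the warning once, which is the situation the IRC user in the commit message ran into.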
German Eichberger 686303e79d Amphora logging
Configure rsyslog to forward logs to a target host

Co-Authored-By: Michael Johnson <>
Story: 1665069
Task: 33646

Change-Id: I00703f86555cbb574b943794b14a36fbc644f1b2
4 years ago
Michael Johnson 80ddbaeef4 Align logging in the amphora
This patch configures the primary components of the amphora to log
to syslog using consistent logging facilities.
By default, user traffic logs will go to LOG_LOCAL0 and the amphora
processes (haproxy, keepalived, etc.) will log to LOG_LOCAL1.

This is a patch supporting log offloading.

Change-Id: Ifda91e0310e812e34f1e398dd3176af8a9c58f89
Story: 1665069
Task: 5486
4 years ago
Zuul ff4680eb71 Merge "Create Amphora V2 provider driver" 4 years ago
Michael Johnson 0ab16921ae Create Amphora V2 provider driver
This patch creates an Amphora v2 provider driver as well as a
V2 controller worker.
This is in preparation for having the amphora driver use the new
provider driver data models and rely less on native Octavia database
models. It is also a preparation step for enabling TaskFlow JobBoard as
this work will move to storing dictionaries in the flows instead
of database models.

Change-Id: Ia65539a8c39560e2276750d8e79a637be4c0f265
Story: 2005072
Task: 30806
4 years ago
Zuul 59660fb365 Merge "Force amp-agent communication to TLSv1.2" 4 years ago
Adam Harwell 5b831f2a5b Force amp-agent communication to TLSv1.2
Also allow configuration of this minimum.
The previous default of SSLv2/3 is very insecure.

Change-Id: If34c7c34d9a6a77685fb177976dc2070760c7b37
4 years ago
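
As an illustration of the idea in 5b831f2a5b (a configurable minimum that defaults to TLSv1.2), here is a sketch using the modern standard-library ssl API. It is not the code from the patch, which predates some of these calls; `build_server_context` is a hypothetical helper name.

```python
import ssl


def build_server_context(minimum=ssl.TLSVersion.TLSv1_2):
    """Return a server-side SSLContext that refuses older protocol versions.

    `minimum` is configurable, mirroring the commit's note that the
    minimum version can be changed by the operator.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Handshakes that negotiate anything below this version are rejected.
    context.minimum_version = minimum
    return context
```

A real deployment would also load the server certificate and key into the context before wrapping a socket; that part is omitted here.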
Carlos Goncalves c4faac25de Add Python 3.7 support
In order to support Python 3.7, pylint has to be updated to 2.0.0
minimum. Newer versions of Pylint enforce additional checkers which can
be addressed with some code refactoring rather than silently ignoring
them in pylintrc; except useless-object-inheritance which is required to
be silenced so that we stay compatible with Python 2.x.

Story: 2004073
Task: 27434

Change-Id: I52301d763797d619f195bd8a1c32bc47f1e68420
4 years ago
Michael Johnson 8997def2b5 Updates Octavia to support octavia-lib
This is the base patch that updates octavia to use the new octavia-lib.
It is backwards compatible by using debtcollector moves.

It adds a new controller process called the "driver-agent".

This patch also adds unit test coverage for a few additional modules.


Change-Id: I438e1548ec0fb6111d1ab85b05015007d9d0a006
4 years ago
akhiljain23 c60931f4b4 Add framework for octavia-status upgrade check
This commit adds the functionality of octavia-status CLI for performing
upgrade checks as part of the Stein cycle upgrade-checkers goal.
It only includes a sample check which must be replaced by real checks in

Change-Id: I8b6d134b0bf5b5c82a19177fed6145ef8aaf7507
Story: 2003657
Task: 26146
5 years ago
Tatsuma Matsuki ad69363fc7 Separate the thread pool for health and stats update
When queue_event_streamer driver is used and RabbitMQ
is down, stats update processes occupy the thread pool
which is shared with health update processes. As a result, a
RabbitMQ outage unexpectedly leads to the deletion of all existing
amphorae. This commit separates the thread pool and aims
to keep the existing amphorae working even when RabbitMQ
is down.

Change-Id: I576687f5b646496ff3a00787cf5e8c27f36b9448
Task: 22929
Story: 2002937
5 years ago
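
The separation described in ad69363fc7 can be sketched with two independent executors, so that stalled stats updates cannot starve health updates. This is an assumption-laden illustration rather than the patch itself: the pool sizes, the `dispatch` function, and the callback signatures are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Two independent pools: a backlog in one cannot consume the other's workers.
health_pool = ThreadPoolExecutor(max_workers=2)
stats_pool = ThreadPoolExecutor(max_workers=2)


def dispatch(heartbeat, update_health, update_stats):
    """Submit one heartbeat to both pools and return the two futures."""
    health_future = health_pool.submit(update_health, heartbeat)
    stats_future = stats_pool.submit(update_stats, heartbeat)
    return health_future, stats_future
```

With a single shared pool, stats tasks blocked on an unreachable message queue would hold every worker, health updates would stop being processed, and the amphorae would look dead to the health check.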
Zuul a74f8b4874 Merge "fix tox python3 overrides" 5 years ago
Zuul 047433b722 Merge "Add exception handling for housekeeping service" 5 years ago
Michael Johnson 5021e0f547 Enable oslo_config mutable configurations
This patch enables oslo_config mutable configuration for the Octavia
control plane processes. The configuration will be updated when the
parent process receives a HUP signal.
This completes the Rocky goal: Enable mutable configuration.

Change-Id: Idaf608c6e5fd2fa74a68c3b562be441a20107a50
Story: 2001545
Task: 6391
5 years ago
Doug Hellmann 0322cbc5c3 fix tox python3 overrides
We want to default to running all tox environments under python 3, so
set the basepython value in each environment.

We do not want to specify a minor version number, because we do not
want to have to update the file every time we upgrade python.

We do not want to set the override once in testenv, because that
breaks the more specific versions used in default environments like
py35 and py36.

This patch also updates pylint to 1.5.6 which is compatible with
In updating pylint we have some issues to correct, this patch addresses
those issues so the Octavia code passes pylint 1.5.6.

Change-Id: Iec21f4c803a427059d595612336d67a35ebf9585
Signed-off-by: Doug Hellmann <>
5 years ago
huangshan 7128d732bc Add exception handling for housekeeping service
Capture exceptions for each thread's task handler to prevent it
from exiting abnormally.
Story: 2001749
Task: 12128

Change-Id: I0fef53347e4460b8e6e200589736ae65c85a5145
5 years ago
Adam Harwell 344967a0c1 Let healthmanager process shutdown cleanly (again)
Change-Id: I5e341915e6e92e5f8279c7c643374ea3d581841b
5 years ago
Michael Johnson 8e2f7512c2 Improve Health Manager error handling
It was reported that the Health Manager process could be crashed with
malformed heartbeat packets. I was unable to reproduce the issue
(I suspect oslo_utils fixed the root cause), but I could see how this
could happen and our error handling could be improved.
This is a lower severity as this port is intended to be only accessible
from a private lb-mgmt-net network.
This patch adds additional exception handling to the Health Manager
listener routines to better handle heartbeat packet issues.

Change-Id: I2da6fa394f5152148237d0986fd969b7950815ba
Story: 2001959
Task: 15081
5 years ago
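
The kind of defensive handling 8e2f7512c2 describes might look like the sketch below: a malformed packet is logged and dropped instead of crashing the listener. The JSON payload and the required "id" field are assumptions made purely for illustration and are not the actual heartbeat wire format; `parse_heartbeat` is a hypothetical name.

```python
import json
import logging

LOG = logging.getLogger(__name__)


def parse_heartbeat(data):
    """Return the decoded heartbeat dict, or None if the packet is malformed."""
    try:
        message = json.loads(data.decode("utf-8"))
    except ValueError as exc:
        # Both UnicodeDecodeError and json.JSONDecodeError subclass ValueError.
        LOG.warning("Dropping malformed heartbeat packet: %s", exc)
        return None
    if not isinstance(message, dict) or "id" not in message:
        LOG.warning("Dropping heartbeat packet without an 'id' field.")
        return None
    return message
```

The listener loop can then simply skip any packet for which this returns None and keep receiving.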
Jacky Hu 649b33d247 Add license for empty
This also fixes build-openstack-sphinx-docs; there was a change introduced
in sphinx 1.6.6:

If the size of is less than 2, then the module would be
skipped, which will cause the sphinx consistency check to fail later.

Change-Id: I9d8764b6e907aceed8bb8a9b04711145d0eb32ad
5 years ago
Adam Harwell f9dafb9a7a Overhaul HealthManager update threading
* Switch to ProcessPool from ThreadPool (Python threads are horrible)
* Load the health/stats update drivers correctly with stevedore
* Add logging driver

Change-Id: Icda4ff218a6bbaea252c2d75c073754f4161a597
5 years ago
Adam Harwell 4dc1f63df2 Healthmanager health_check timer config fix
The periodics health_check job was reading its timing config before
oslo.config had actually loaded real values, so it would always use the
default. Now it sets up the periodic at runtime, which loads the real
config value.

Change-Id: I85c6ff0e698c6c5899c78f8c3a5b119e80bb972a
5 years ago
Jude Cross 7663430f06 Fix health_manager to exit without waiting
When trying to exit health_manager the terminal
would hang due to a child process using time.sleep().
Now the process uses futurist.periodics to schedule
when to run, which allows it to exit quickly and gracefully.

Also handles the `failover_amphora` not working out or being
cancelled correctly and logging the statistics of those occurrences
instead of incorrectly assuming everything always works out.

Co-Authored-By: Adam Harwell <>
Co-Authored-By: Joshua Harlow <>

Change-Id: I870edaab73ab20a9322c8bc1bd2514897417d12a
6 years ago
Michael Johnson 5744872c94 Fix health monitor DB locking.
Change-Id: Ida0d9e1d7a808706c69808dc78e16bc8292a39c0
6 years ago
Adam Harwell c764abc355 Allow operators to disable v1 or v2.0 api endpoints
Also, create a section for API settings `api_settings` and move some
related settings there.

This patch also enables the configuration settings to be logged
when the api process is started if debug is True.

Change-Id: I31671789d186c4b8a775cc12a414acd2d439512d
6 years ago
e dc882e9d27 Remove log translations from octavia
Log messages are no longer being translated. This removes all use of
the _LE, _LI, and _LW translation markers to simplify logging and to
avoid confusion with new contributions.

This patch also adds hacking rules for the translation tags.


Co-Authored-By: Michael Johnson <>
Change-Id: Ic95111d09e38b3f44fd6c85d0bcf0355c21ef545
6 years ago
Adam Harwell 9027154a5a Removing dependency on eventlet and oslo.service
Change-Id: I453e9b86d4edfedd63cc59e47bf745e166ff836f
6 years ago
Jude Cross f37e776eb3 Fix house_keeping daemon to use Event.wait()
When trying to exit out of the house_keeping daemon
the terminal would hang until all threads finished
their iteration of time.sleep(). Now the threads
instead use the Event object so on keyboard interrupt
the threads will exit without waiting.

Change-Id: I4cb62977f647209ea87001a949fc42472ad53a70
6 years ago
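
The technique named in f37e776eb3 (waiting on an Event instead of sleeping) can be sketched as follows. The `run_periodic` helper is a hypothetical name and not taken from the house_keeping code.

```python
import threading


def run_periodic(task, interval, stop_event):
    """Call task() every `interval` seconds until stop_event is set."""
    while not stop_event.is_set():
        task()
        # Unlike time.sleep(interval), wait() returns as soon as the event
        # is set, so shutdown does not have to sit out the full interval.
        stop_event.wait(interval)
```

A KeyboardInterrupt handler in the main thread only needs to call `stop_event.set()` and then join the worker threads.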
WangBinbin 039395f7ee Replace six.iteritems() with .items()
We should avoid using six.iteritems to get iterators.
We can use dict.items instead, as it returns a lazy view in PY3 as well.
And dict.items/keys are more readable.

In py2, the performance cost of building a list should be negligible.

Change-Id: I153d91e884ef0ea0a760527f3dab2b8d5ed3e38e
6 years ago
Michael Johnson 7fdc8a1e06 Update for new pep8 rules E402 and W503
Change-Id: I181f396b002d0c3b89579c4fc33c34b1c099953e
6 years ago
Michael Johnson eebd2d4e33 Fix active/standby under python3
Changes in diskimage-builder switched the amphora image to use
python3.  The active/standby code was not python3 compatible.
This patch corrects that issue.

Change-Id: I81db0e52f1a21d1e3ceea6a4ec2467145f761e55
Closes-Bug: #1659116
6 years ago
Adam Harwell bf8aac5561 Amphora-agent should log to a distinct location
This patch sets up a separate log file for the amphora-agent
and logrotate to manage this new log.

Co-Authored-By: Adam Harwell <>
Co-Authored-By: Michael Johnson <>
Change-Id: Ia7b057642d7a567d685d989d1c689d5f3481e73e
7 years ago
Dustin Lundquist 126ec9701e Properly format IPv6 bind address strings
This fixes:
    Error: '::9443' is not a valid port number.

Introduce unit test skeleton for amphora agent.

Change-Id: Ic7aebf0674ba7036356bb3231c26fa309cd4c475
7 years ago
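
The error quoted in 126ec9701e arises when an IPv6 host and a port are joined with a bare colon. A common convention is to wrap IPv6 literals in square brackets; the sketch below shows that convention and is not the code from the patch. `format_bind_address` is a hypothetical name.

```python
import ipaddress


def format_bind_address(host, port):
    """Return 'host:port', bracketing the host when it is an IPv6 literal."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal (for example a hostname); leave it unbracketed.
        is_ipv6 = False
    if is_ipv6:
        return "[%s]:%d" % (host, port)
    return "%s:%d" % (host, port)
```

For the wildcard IPv6 address and the port from the error message, this yields `[::]:9443`, which a bind-string parser can split unambiguously.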
Jenkins 6a67d49642 Merge "Remove CONF.import_group" 7 years ago
Jenkins 8fef2f04a7 Merge "Run amphora agent with gunicorn" 7 years ago
Lubosz "diltram" Kosnik 867b350988 Remove CONF.import_group
Remove unneeded import_group lines, which do nothing and just make the
code harder to understand.

Change-Id: I673dd04dd31ae9771e6af982d184eee0e9cbf2d4
7 years ago
zhangyanxian 4ced93b005 Modify variable's using method in Log Messages
String interpolation should be delayed to be handled by the logging code, 
rather than being done at the point of the logging call. 
For example:
# WRONG
LOG.info(_LI('some message: variable=%s') % variable)
# RIGHT
LOG.info(_LI('some message: variable=%s'), variable)

Change-Id: I77c9b9783c623167ada1631cf05bf4cf4c40e6b1
7 years ago
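
The benefit of the delayed interpolation recommended in 4ced93b005 can be demonstrated with the standard logging module. In this sketch the `Expensive` class is a hypothetical stand-in that counts how many times it is rendered to a string.

```python
import logging

LOG = logging.getLogger("lazy_logging_example")
LOG.setLevel(logging.WARNING)  # DEBUG messages are discarded


class Expensive:
    """Counts how often it is converted to a string."""

    def __init__(self):
        self.rendered = 0

    def __str__(self):
        self.rendered += 1
        return "value"


variable = Expensive()

# Eager: the % operator formats the string before LOG.debug is even called.
LOG.debug("some message: variable=%s" % variable)

# Lazy: the logger sees that DEBUG is disabled and never formats the message.
LOG.debug("some message: variable=%s", variable)
```

After both calls `variable.rendered` is 1: only the eager form paid the formatting cost for a message that was thrown away.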
Adam Harwell 48a1e7cbe9 Run amphora agent with gunicorn
Flask's default runner (werkzeug) is plagued with bugs.
If we use gunicorn instead, we should have far fewer problems!

Depends-On: I211dc771aa95147c0f1d9e6ac1a65a7e164b33c2
Change-Id: I59897167f9285bf013f8a155dd2ea4f799ac1d3f
7 years ago
Nir Magnezi 12d2a0f01b Amphora agent refactor to classes
This patch is prep work needed for Id99948aec64656a0532afc68e146f0610bff1378,
which fixes the amphora-agent support for RH based Linux flavors.

This is a pure refactor. Functions were gathered under classes (making
them methods) so state, such as the operating system flavor, can be preserved
throughout the entire amphora agent process lifecycle.

Related-Bug: #1548070

Change-Id: Ic149211dba8ea78e08cb06b6e1f65da00a6571c7
7 years ago
Yang Li eff96e6a5e We should set status to be 1 if get nothing from socket
Change-Id: I52964d4d59ecb70438cc3e45c826bf079711b878
7 years ago
Elena Ezhova d73df70d85 Cleanup deleted load balancers in housekeeper's db_cleanup
When a load balancer is deleted, the corresponding DB entry is marked
as DELETED and is never actually removed, along with the VIP
associated with this load balancer.

This adds a new method to db_cleanup routine that scans the DB for
load balancers with DELETED provisioning_status and deletes them
from db if they are older than load_balancer_expiry_age. Corresponding
VIP entries are deleted in cascade.

Added new config option `load_balancer_expiry_age` to the `house_keeping`
config section.

Also changed the default value of exp_age argument to
CONF.house_keeping.amphora_expiry_age in check_amphora_expiry_age

Closes-Bug #1573725

Change-Id: I4f99d38f44f218ac55a76ef062ed9ea401c0a02d
7 years ago
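
The selection rule described in d73df70d85 (DELETED status and older than load_balancer_expiry_age) can be sketched in plain Python. This is an illustration only: the real routine queries the database, whereas here load balancers are plain dicts and the `updated_at` key and both function names are hypothetical.

```python
import datetime


def is_expired(updated_at, expiry_age_seconds, now=None):
    """Return True when updated_at is older than the expiry age."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return now - updated_at > datetime.timedelta(seconds=expiry_age_seconds)


def expired_load_balancers(load_balancers, expiry_age_seconds, now=None):
    """Select DELETED load balancers that are old enough to purge."""
    return [
        lb for lb in load_balancers
        if lb["provisioning_status"] == "DELETED"
        and is_expired(lb["updated_at"], expiry_age_seconds, now)
    ]
```

Rows selected this way would then be deleted from the database, with the associated VIP entries removed by the cascade the commit message mentions.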
ZhiQiang Fan 632ab41a03 [Trivial] Remove unnecessary executable privilege
They are modules that should be imported rather than run in a shell,
so they do not require executable privilege.

Change-Id: I869d73411ec8e308a80d780e10e77dcc48097d42
7 years ago
Lingxian Kong 2e379269fb Add WSGI support for octavia api
Make the module work with web servers running on WSGI.

Take uwsgi for an example:
uwsgi --socket /tmp/octavia.sock \
      --pythonpath /home/devstack/octavia/octavia/api \
      --module "app:setup_app()" \
      --pidfile /tmp/ --vacuum \
      --daemonize /var/log/octavia/octavia.log

Change-Id: I3282da1191965e8d83c8bf74ef1a1285673a6987
7 years ago
Carlos D. Garza c84021ac27 Implementing EventStreamer
EventStreamer will be used to serialize messages from the octavia
database to neutron-lbaas database via oslo_messaging. Also
renaming update mixin class since it's not really a mixin. The
health manager will make changes to the octavia database when
members etc. are marked as down and up, which would result
in databases that were not in sync between neutron-lbaas and
octavia. A mechanism to communicate database changes from
octavia back to neutron is required, so this CR attempts
to use an oslo_messaging system to communicate those changes.

Docimpact - /etc/octavia.conf the user can set the option
            event_streamer_driver = neutron_event_streamer
            to setup a queue to connect to neutron-lbaas.
            if this option is left blank it will default to
            the noop_event_streamer which will do nothing
            effectively turning the Queue off.

Co-Authored-By: Brandon Logan <>
Change-Id: I77a049dcc21e3ee6287e661e82365ab7b9c44562
7 years ago
caoyue 1f5031fedc Remove unused logging import
It's obvious the code was copied from another place;
let's make it perfect.

Change-Id: I4f24622c497dd65d1d8a3e829a5ef8c4978f6a46
7 years ago
Brandon Logan 4a6e5a3f21 Make Consumer an oslo_service
This makes more sense and also suppresses the error messages when
launching the service returned from oslo_messaging.get_rpc_server
service.  Instead of that service wait() being called, the Consumer's
wait will be called.

Change-Id: I63816e92fbe26a4213946e6ab584531bdc3b7dd2
Closes-Bug: #1527418
8 years ago
Sherif Abdelwahab 58cda714ba Amphora Flows and Drivers for Active Standby
This patch implements the Active/Standby blueprint in

The following points describe the main changes:

1. The patch introduces new flows and subflows to create M amphorae. The
controller worker parses the loadbalancer_topology configuration. If the
loadbalancer_topology value is ACTIVE_STANDBY, the controller invokes a new flow
independent from the SINGLE topology case, which is left untouched. The new
flow uses conditional taskflows to check for spare amphorae at runtime. This
removes the need for the exception workaround we earlier had. The controller
creates the amphorae in parallel using an unordered flow. A new database task
alters an amphora role as either MASTER or BACKUP and assigns a VRRP priority to
each amphora. After the amphorae are created, the controller invokes a separate
flow for post amphora configuration including plug_vip methods, vrrp
configuration upload, and keepalived service start.

2. The patch introduces new data models that include a new table for VRRP group
configuration per loadbalancer, and update the amphora, loadbalancer, and
listener tables to support the new active/standby capability. The VRRPGroup
table hides authentication data, and makes future extensions of VRRP
capabilities easy.

3. This patch updates the existing Haproxy configuration templates  to include
peer synchronization. In case of ACTIVE_STANDBY configuration, the jinja
configuration renders the peer section in the Haproxy configuration and assigns
short names to the amphorae as listener peers. As each listener implies a
separate Haproxy process, each listener synchronizes on a different port evaluated as
BASE_PORT (1024) + NUMBER_OF_LISTENERS accounting for ports in use.

4. This patch introduces a new Jinja configuration templater and a REST driver
for Keepalived (developed as a Mixin). By default, Keepalived runs "all" check
scripts found in a predefined directory. The keepalived driver is a Mixin that
can be plugged in other services' drivers. It is the responsibility of these
services drivers to introduce their own check scripts. In this patch a
lightweight check script for Haproxy was introduced along with changes in the
amphora agent installation script.

5. The VRRP requires enabling protocol 112 for Master/Backup advertisements,
and enabling protocol 51 for authentication header. This patch enables these
protocols as needed in the loadbalancer security group.

Note: Updates to the failover flow to support active/standby will come in
a dependent patch.
Note: The amphora-agent is pinned to this patch in this patch set.  This
is required so the scenario tests will pass.  It will be removed in a
follow up patch.

Co-Authored-By: Sherif Abdelwahab <>
Co-Authored-By: Michael Johnson <>
Implements: blueprint activepassiveamphora
Depends-On: Ifdf20378b26cdd13e0a3ff87cec8990fe89c0661
Change-Id: Ic4e04594e114ba682088d68d5f1af3f8f376db83
8 years ago
minwang 19c7f93882 Add cert tracking and rotating in Housekeeping
The goal of this patch is to rotate amphora certificates: once we detect
that an amphora's cert will expire within 2 weeks of utcnow, we replace it
with a new one and update its db information at the same time.

In order to achieve this target, I made the following changes:

* Add 2 new columns cert_busy and cert_expiration in amphora table
* Add methods to get cert expiration date from PEM server_pem and
  update db info
* Use the new REST agent method to perform cycling
* Add process in housekeeping to facilitate rotation
* Add unit tests

Change-Id: I28578a3e560ee09ba300788a5423863c893b8638
8 years ago
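
The two-week rule in 19c7f93882 can be sketched as a simple date comparison. This is an illustration under stated assumptions, not the housekeeping code: `needs_rotation` is a hypothetical name, and treating cert_busy as "a rotation is already in progress" is a guess at the column's meaning.

```python
import datetime

# Rotate when the certificate expires within two weeks, per the commit text.
ROTATION_WINDOW = datetime.timedelta(weeks=2)


def needs_rotation(cert_expiration, cert_busy=False, now=None):
    """Return True when the cert expires within the rotation window."""
    if cert_busy:
        # Assumption: a busy flag means another worker is already rotating.
        return False
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return cert_expiration <= now + ROTATION_WINDOW
```

A housekeeping loop would call this for each amphora's cert_expiration value and trigger the agent's certificate update for those that return True.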
Bertrand Lallau d5e0811926 Add Guru Meditation Report feature
Oslo_reports enables OpenStack projects to dump Guru Meditation
Reports with useful debugging information to files or stderr.

Closes-Bug: #1514504
Change-Id: Id35fb7dc8c31f304cbf1d9cca0d21b9d5e97865a
8 years ago