kolla-ansible/ansible/roles
Doug Szumski 6bfe1927f0 Remove classic queue mirroring for internal RabbitMQ
When OpenStack is deployed with Kolla-Ansible, by default there
are no durable queues or exchanges created by the OpenStack
services in RabbitMQ. In RabbitMQ terminology, not being durable
is referred to as `transient`, and this means that the queue
is generally held in memory.

Whether OpenStack services create durable or transient queues is
traditionally controlled by the oslo.messaging config option:
`amqp_durable_queues`. In Kolla-Ansible, this remains set to
the default of `False` in all services. The only `durable`
objects are the `amq*` exchanges which are internal to RabbitMQ.
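As a rough illustration of the option discussed above, the sketch below
reads `amqp_durable_queues` from a service configuration file. It assumes
the option lives in the `[oslo_messaging_rabbit]` section, and the sample
configuration text and the helper name `durable_queues_enabled` are
hypothetical, not taken from Kolla-Ansible.

```python
import configparser

# Hypothetical excerpt of a service config file (for example nova.conf).
SAMPLE_CONF = """
[DEFAULT]
debug = False

[oslo_messaging_rabbit]
amqp_durable_queues = False
"""


def durable_queues_enabled(conf_text: str) -> bool:
    """Return True if the config asks for durable queues.

    Assumption: the option is read from [oslo_messaging_rabbit] and
    defaults to False when the section or the option is absent.
    """
    # strict=False tolerates repeated keys, which can appear in real
    # service configs; interpolation is disabled so '%' is literal.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getboolean(
        "oslo_messaging_rabbit", "amqp_durable_queues", fallback=False
    )


if __name__ == "__main__":
    print(durable_queues_enabled(SAMPLE_CONF))
```

With the default left in place, the helper reports that queues are
transient, which matches the behaviour described in this commit message.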

More recently, oslo.messaging has introduced support for
quorum queues [7]. These are a successor to durable classic
queues, but it isn't yet clear whether they are a good fit for
OpenStack in general [8].

For clustered RabbitMQ deployments, Kolla-Ansible configures all
queues as `replicated` [1]. Replication occurs over all nodes
in the cluster. RabbitMQ refers to this as 'mirroring of classic
queues'.
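Classic queue mirroring is enabled through a RabbitMQ policy whose
definition contains an `ha-mode` key [1]. The sketch below shows what
such a policy might look like in a definitions document and how it could
be detected. The policy name `ha-all`, the `.*` pattern, and the helper
name are assumptions for illustration, not the exact contents of the
Kolla-Ansible template.

```python
import json

# Hypothetical definitions document with one mirroring policy.
DEFINITIONS = json.loads("""
{
  "policies": [
    {"vhost": "/", "name": "ha-all", "pattern": ".*",
     "apply-to": "all", "definition": {"ha-mode": "all"}, "priority": 0}
  ]
}
""")


def mirroring_policies(definitions: dict) -> list:
    """Return the names of policies that enable classic queue mirroring.

    A policy is treated as a mirroring policy when its definition
    contains the 'ha-mode' key.
    """
    return [
        policy["name"]
        for policy in definitions.get("policies", [])
        if "ha-mode" in policy.get("definition", {})
    ]


if __name__ == "__main__":
    print(mirroring_policies(DEFINITIONS))
```

A policy with `"ha-mode": "all"` and a catch-all pattern is what makes
every matching queue replicate across all nodes in the cluster.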

In summary, this means that a multi-node Kolla-Ansible deployment
will end up with a large number of transient, mirrored queues
and exchanges. However, the RabbitMQ documentation warns against
this, stating that 'For replicated queues, the only reasonable
option is to use durable queues' [2]. This is discussed
further in the following bug report: [3].
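To make the problematic combination concrete, the sketch below flags
queues that are both transient and mirrored. It assumes queue records
shaped like those returned by the RabbitMQ management API, with a
`durable` flag and an `effective_policy_definition` mapping; the field
names should be checked against the RabbitMQ version in use, and the
queue names are made up.

```python
def transient_mirrored(queues: list) -> list:
    """Return names of queues that are non-durable but mirrored."""
    flagged = []
    for queue in queues:
        # The policy field may be missing or empty when no policy applies.
        policy = queue.get("effective_policy_definition") or {}
        if not queue.get("durable", False) and "ha-mode" in policy:
            flagged.append(queue["name"])
    return flagged


# Hypothetical queue records for illustration only.
SAMPLE_QUEUES = [
    {"name": "conductor", "durable": False,
     "effective_policy_definition": {"ha-mode": "all"}},
    {"name": "notifications.info", "durable": True,
     "effective_policy_definition": {"ha-mode": "all"}},
    {"name": "reply_abc123", "durable": False,
     "effective_policy_definition": []},
]

if __name__ == "__main__":
    print(transient_mirrored(SAMPLE_QUEUES))
```

Only the first record is flagged: it is mirrored without being durable,
which is the configuration the RabbitMQ documentation advises against.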

Whilst we could try enabling the `amqp_durable_queues` option
for each service (this is suggested in [4]), there are
a number of complexities with this approach, not limited to:

1) RabbitMQ is planning to remove classic queue mirroring in
   favor of 'Quorum queues' in a forthcoming release [5].
2) Durable queues will be written to disk, which may cause
   performance problems at scale. Note that this includes
   Quorum queues which are always durable.
3) Potential for race conditions and other complexity
   discussed recently on the mailing list under:
   `[ops] [kolla] RabbitMQ High Availability`

The remaining option, proposed here, is to use classic
non-mirrored queues everywhere, and rely on services to recover
if the node hosting a queue or exchange they are using fails.
There is some discussion of this approach in [6]. The downside
of potential message loss needs to be weighed against the real
upsides of increasing the performance of RabbitMQ, and moving
to a configuration which is officially supported and hopefully
more stable. In the future, we can then consider promoting
specific queues to quorum queues, in cases where message loss
can result in failure states which are hard to recover from.

[1] https://www.rabbitmq.com/ha.html
[2] https://www.rabbitmq.com/queues.html
[3] https://github.com/rabbitmq/rabbitmq-server/issues/2045
[4] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
[5] https://blog.rabbitmq.com/posts/2021/08/4.0-deprecation-announcements/
[6] https://fuel-ccp.readthedocs.io/en/latest/design/ref_arch_1000_nodes.html#replication
[7] https://bugs.launchpad.net/oslo.messaging/+bug/1942933
[8] https://www.rabbitmq.com/quorum-queues.html#use-cases

Partial-Bug: #1954925
Change-Id: I91d0e23b22319cf3fdb7603f5401d24e3b76a56e
2022-02-21 18:54:04 +00:00
aodh Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
barbican Remove custom value for max_allowed_request_size_in_bytes 2022-01-18 22:04:31 +01:00
baremetal CI: Fix new ansible-lint failures 2022-02-15 07:42:53 +00:00
bifrost Merge "bifrost: preempt change in defaults for TFTP and HTTP boot paths" 2022-01-07 09:08:56 +00:00
blazar Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
ceilometer Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
ceph-rgw Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
certificates Merge "certificates: generate libvirt TLS certificates" 2022-02-03 19:11:03 +00:00
cinder Merge "Add support for VMware First Class Disk (FCD)" 2022-02-21 11:07:00 +00:00
cloudkitty multiple: remove duplicated variables between defaults and group vars 2022-01-12 09:28:41 +00:00
collectd Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
common Fix fluentd v1 buffer syntax issue 2022-02-11 11:33:38 +00:00
cyborg Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
designate multiple: remove duplicated variables between defaults and group vars 2022-01-12 09:28:41 +00:00
destroy octavia: support tenant management network 2021-03-03 10:20:40 +08:00
elasticsearch Merge "Continue to run all actions if one action failed in curator" 2022-01-18 10:48:22 +00:00
etcd Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
freezer Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
glance Glance: add lock_path setting 2022-02-01 11:24:04 +00:00
gnocchi Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
grafana Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
hacluster Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
haproxy-config Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
heat Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
horizon horizon: Support custom themes 2022-01-31 14:34:24 +00:00
influxdb Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
ironic Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
iscsi Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
kafka Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
keystone Add OIDCDiscoverURL mod_oidc option 2022-02-02 15:40:50 +01:00
kibana Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
kuryr Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
loadbalancer [haproxy] optionally set socket to allow admin commands 2022-02-09 17:21:18 +00:00
magnum Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
manila Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
mariadb Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
masakari Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
memcached Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
mistral Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
module-load Drop support for /etc/modules 2020-08-25 20:20:57 +01:00
monasca Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
multipathd Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
murano Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
neutron Merge "Add support for VMware NSXP" 2022-02-18 12:04:41 +00:00
nova Merge "Add support for VMware NSXP" 2022-02-18 12:04:41 +00:00
nova-cell Merge "Add support for VMware NSXP" 2022-02-18 12:04:41 +00:00
octavia octavia: drop warning about certificate changes 2022-02-08 12:18:13 +00:00
octavia-certificates [docs] Unify project's naming convention 2021-01-27 20:08:41 +01:00
openvswitch Merge "openvswitch: add option to set hw offload" 2022-01-26 10:55:02 +00:00
ovn Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
ovs-dpdk multiple: remove duplicated variables between defaults and group vars 2022-01-12 09:28:41 +00:00
placement Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
prechecks Merge "Add Ansible 5 aka core 2.12 support" 2022-01-20 20:53:03 +00:00
prometheus Configure node-exporter to report correct file system metrics 2022-02-18 18:36:22 +01:00
prune-images Performance: replace unconditional include_tasks with import_tasks 2020-08-28 16:12:03 +00:00
qdrouterd Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
rabbitmq Remove classic queue mirroring for internal RabbitMQ 2022-02-21 18:54:04 +00:00
redis Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
sahara Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
senlin Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
service-cert-copy Add kolla_externally_managed_cert option 2021-03-02 18:09:06 +01:00
service-images-pull Add ability to retry image pulling 2021-08-19 18:38:59 +00:00
service-ks-register Remove delegate_to from service-ks-register tasks 2019-09-26 10:38:35 +01:00
service-precheck Add Ansible group check to prechecks 2020-02-28 16:23:14 +00:00
service-rabbitmq Configure RabbitMQ user tags in nova-cell role 2020-05-15 16:02:46 +01:00
service-stop/tasks Fix kolla-ansible stop with heterogeneous hosts 2020-03-23 17:21:53 +00:00
skydive Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
solum Merge "Use Docker healthchecks for solum services" 2022-01-07 10:22:08 +00:00
storm Merge "Move project_name and kolla_role_name to role vars" 2022-01-06 15:29:25 +00:00
swift Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
tacker Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
telegraf Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
trove Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
vitrage Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
watcher Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
zookeeper Move project_name and kolla_role_name to role vars 2021-12-31 09:26:25 +00:00
zun Deploy Zun with Cinder Ceph support 2022-02-02 19:15:51 +00:00