[zmq] Update configurations documentation

This change describes new configurations that appeared in the Ocata release,
such as dynamic connections and other types of proxied deployments.

Change-Id: Id6e9b062101d8916323edc143ea5379585192581
ozamiatin 2017-02-13 19:16:31 +02:00
parent 337f499c58
commit 6c7094a921
1 changed file with 399 additions and 75 deletions


@ -36,7 +36,7 @@ Currently, ZeroMQ is one of the RPC backend drivers in oslo.messaging. ZeroMQ
can be the only RPC driver across the OpenStack cluster.
This document provides deployment information for this driver in oslo_messaging.
Unlike AMQP-based drivers such as RabbitMQ, ZeroMQ by default doesn't have
any central brokers in oslo.messaging; instead, each host (running OpenStack
services) is both a ZeroMQ client and a server. As a result, each host needs to
listen on a certain TCP port for incoming connections and directly connect
@ -78,15 +78,16 @@ Assuming the following systems as a goal.
| Horizon |
+---------------------+
===================
Basic Configuration
===================
Enabling (mandatory)
--------------------
To enable the driver the 'transport_url' option must be set to 'zmq://'
in the [DEFAULT] section of the conf file, and the 'rpc_zmq_host' option
must be set to the hostname of the current node. ::
[DEFAULT]
@ -95,15 +96,23 @@ must be set to the hostname of the current node. ::
[oslo_messaging_zmq]
rpc_zmq_host = {hostname}
The default configuration of the zmq driver is called 'Static Direct Connections'
(to learn more about zmq driver configurations, see the 'Existing Configurations'
section). It means that all services connect directly to each other and all
connections are static: they are opened at the beginning of a service's lifecycle
and closed only when the service quits. This configuration is the simplest one
since it doesn't require any helper services (proxies) other than the matchmaker
to be running.
Matchmaking (mandatory)
-----------------------
The ZeroMQ driver implements a matching capability to discover hosts available
for communication when sending to a bare topic. This allows broker-less
communications.
The Matchmaker is pluggable and it provides two different Matchmaker classes.
MatchmakerDummy: default matchmaker driver for all-in-one scenario (messages
are sent to itself; used mainly for testing).
@ -120,14 +129,24 @@ specify the URL as follows::
In order to clean up expired records from the redis storage (e.g. when a target
listener goes down), a TTL may be applied to keys. Configure the
'zmq_target_expire' option, which is 300 (seconds) by default. The option is not
specific to redis, so it is defined in the [oslo_messaging_zmq] section. If the
option value is <= 0, keys don't expire and live forever in the storage.
The other option is 'zmq_target_update' (180 seconds by default), which
specifies how often each RPC-server should update the matchmaker. Its optimal
value is generally zmq_target_expire divided by 2 (or by 1.5). It is recommended
to calculate it based on 'zmq_target_expire' so that records of live services
don't expire before they are updated.
Generally the matchmaker can be considered an alternative to service
heartbeating.
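For example, a minimal sketch of these two options (the concrete values are
illustrative only), keeping the update interval at half of the expiration time::
[oslo_messaging_zmq]
# expire a target record if its service stops refreshing it for 5 minutes
zmq_target_expire = 300
# refresh records twice per expiration period so live records never expire
zmq_target_update = 150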
Matchmaker Data Source (mandatory)
----------------------------------
The Matchmaker data source is stored in files or in the Redis server discussed in
the previous section. How to populate this database is the key issue in making
the ZeroMQ driver work.
@ -154,63 +173,7 @@ To deploy redis with HA follow the `sentinel-install`_ instructions. From the
messaging driver's side you will need to set up the following configuration::
[DEFAULT]
transport_url = "zmq+redis://host1:26379,host2:26379,host3:26379"
Restrict the number of TCP sockets on controller
------------------------------------------------
The most heavily used RPC pattern (CALL) may consume too many TCP sockets on
controller node in directly connected configuration. To solve the issue
ROUTER proxy may be used.
In order to configure driver to use ROUTER proxy set up the 'use_router_proxy'
option to true in [oslo_messaging_zmq] section (false is set by default).
For example::
use_router_proxy = true
Not less than 3 proxies should be running on controllers or on stand alone
nodes. The parameters for the script oslo-messaging-zmq-proxy should be::
oslo-messaging-zmq-proxy
--config-file /etc/oslo/zeromq.conf
--log-file /var/log/oslo/zeromq-router-proxy.log
--host node-123
--frontend-port 50001
--backend-port 50002
--publisher-port 50003
--debug True
Command line arguments like host, frontend_port, backend_port and publisher_port
respectively can also be set in [zmq_proxy_opts] section of a configuration
file (i.e., /etc/oslo/zeromq.conf). All arguments are optional.
Port value of 0 means random port (see the next section for more details).
Fanout-based patterns like CAST+Fanout and notifications always use proxy
as they act over PUB/SUB, 'use_pub_sub' option defaults to true. In such case
publisher proxy should be running. Actually proxy does both: routing to a
DEALER endpoint for direct messages and publishing to all subscribers over
zmq.PUB socket.
If not using PUB/SUB (use_pub_sub = false) then fanout will be emulated over
direct DEALER/ROUTER unicast which is possible but less efficient and therefore
is not recommended. In a case of direct DEALER/ROUTER unicast proxy is not
needed.
This option can be set in [oslo_messaging_zmq] section.
For example::
use_pub_sub = true
In case of using a proxy all publishers (clients) talk to servers over
the proxy connecting to it via TCP.
You can specify ZeroMQ options in /etc/oslo/zeromq.conf if necessary.
transport_url = "zmq+sentinel://host1:26379,host2:26379,host3:26379"
Listening Address (optional)
@ -235,18 +198,379 @@ controls number of retries before 'ports range exceeded' failure.
For example::
rpc_zmq_min_port = 49153
rpc_zmq_max_port = 65536
rpc_zmq_bind_port_retries = 100
=======================
Existing Configurations
=======================
Static Direct Connections
-------------------------
An example of a service config file::
[DEFAULT]
transport_url = "zmq+redis://host-1:6379"
[oslo_messaging_zmq]
use_pub_sub = false
use_router_proxy = false
use_dynamic_connections = false
zmq_target_expire = 60
zmq_target_update = 30
rpc_zmq_min_port = 49153
rpc_zmq_max_port = 65536
In both the static and the dynamic direct connections configurations it is
necessary to configure the firewall to open the binding port range on each node::
iptables -A INPUT -p tcp --match multiport --dports 49152:65535 -j ACCEPT
The security recommendation here (general for any RPC backend) is to set up a
private network for the message bus and a separate open network for public APIs.
The ZeroMQ driver doesn't support authentication or encryption at its level.
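As a sketch (the 10.0.0.0/24 subnet below is purely illustrative), the binding
port range can be opened to such a private management network only::
# allow RPC traffic from the management network
iptables -A INPUT -p tcp -s 10.0.0.0/24 --match multiport --dports 49152:65535 -j ACCEPT
# and drop it from everywhere else
iptables -A INPUT -p tcp --match multiport --dports 49152:65535 -j DROP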
As stated above, this configuration is the simplest one since it requires only a
Matchmaker service to be running. That is why the driver's options are configured
by default to use this type of topology.
The biggest advantage of static direct connections (other than simplicity) is
performance. On small deployments (20-50 nodes) it can outperform brokered
solutions (or solutions with proxies) by 3x-5x. This is possible because the
configuration has no central node bottleneck, so its throughput is limited only
by TCP and network bandwidth.
Unfortunately this approach cannot be applied as-is at a big scale (over 500 nodes).
The main problem is that the number of connections between services, and in
particular the number of connections on each controller node, grows (in the worst
case) quadratically with the total number of running services. That is not
acceptable.
However this approach can be successfully used, and is recommended, when services
on controllers don't talk to agent services on resource nodes using
oslo.messaging RPC, but RPC is used only for communication between controller
services.
Examples are a Cinder+Ceph backend and the way Ironic uses oslo.messaging.
For all the other cases, like Nova and Neutron at a big scale, proxy-based
configurations or the dynamic connections configuration are more appropriate.
The exception may be running OpenStack services inside Docker containers with
Kubernetes, since Kubernetes already solves similar problems by using KubeProxy
and virtual IP addresses for each container: it manages all the traffic using
iptables, which is more than adequate to solve the problem described above.
Summing up, it is recommended to use this type of zmq configuration for:
1. Small clouds (up to 100 nodes)
2. Cinder+Ceph deployment
3. Ironic deployment
4. OpenStack + Kubernetes (OpenStack in containers) deployment
Dynamic Direct Connections
--------------------------
An example of a service config file::
[DEFAULT]
transport_url = "zmq+redis://host-1:6379"
[oslo_messaging_zmq]
use_pub_sub = false
use_router_proxy = false
use_dynamic_connections = true
zmq_failover_connections = 2
zmq_linger = 60
zmq_target_expire = 60
zmq_target_update = 30
rpc_zmq_min_port = 49153
rpc_zmq_max_port = 65536
The 'use_dynamic_connections = true' setting obviously states that connections
are dynamic. 'zmq_linger' becomes crucial with dynamic connections in order to
avoid socket leaks. If a socket is connected to a wrong (dead) host which is
somehow still present in the Matchmaker and a message has been sent, the socket
cannot be closed while the message stays in the queue (the default linger is to
wait indefinitely), so linger needs to be specified explicitly.
Services often run more than one worker on the same topic. Workers are equal, so
any of them can handle the message. In order to connect to more than one available
worker, set the 'zmq_failover_connections' option to some value (2 by default,
which means 2 additional connections). Take care, because it may also result in a
slow-down.
All recommendations regarding port ranges described in the previous section are
also valid here.
Most things are similar to the static connections case; the only difference is
that each message causes a connection setup and a disconnect immediately after
the message has been sent.
The advantage of this deployment is that the average number of connections on a
controller node at any moment is not high, even for quite large deployments.
The disadvantage is the overhead caused by the need to connect/disconnect per
message.
So this configuration can without a doubt be considered the slowest one. The
good news is that OpenStack RPC doesn't require "thousands of messages per second"
throughput for each particular service (not to be confused with the central
broker/proxy throughput, which needs to be as high as possible at a big scale and
can be a serious bottleneck).
One more bad thing about this particular configuration is fanout. Here it is an
operation of completely linear complexity, and it suffers the most from the
connect/disconnect overhead per message. So for fanout it is fair to say that
services can experience a significant slow-down with dynamic connections.
The recommended way to solve this problem is to use a combined solution with a
proxied PUB/SUB infrastructure for fanout and dynamic direct connections for
direct message types (plain CAST and CALL messages). This combined approach is
described later in the text.
Router Proxy
------------
An example of a service config file::
[DEFAULT]
transport_url = "zmq+redis://host-1:6379"
[oslo_messaging_zmq]
use_pub_sub = false
use_router_proxy = true
use_dynamic_connections = false
An example of a proxy config file::
[DEFAULT]
transport_url = "zmq+redis://host-1:6379"
[oslo_messaging_zmq]
use_pub_sub = false
[zmq_proxy_opts]
host = host-1
RPC may consume too many TCP sockets on a controller node in the directly
connected configuration. To solve the issue a ROUTER proxy may be used.
In order to configure the driver to use a ROUTER proxy, set the 'use_router_proxy'
option to true in the [oslo_messaging_zmq] section (false is set by default).
Pay attention to the 'use_pub_sub = false' line, which has to match across all
services' and proxies' config files; it won't work if the proxy uses PUB/SUB and
the services don't.
At least 3 proxies should be running on controllers or on standalone
nodes. The parameters for the oslo-messaging-zmq-proxy script should be::
oslo-messaging-zmq-proxy
--config-file /etc/oslo/zeromq.conf
--log-file /var/log/oslo/zeromq-router-proxy.log
--host node-123
--frontend-port 50001
--backend-port 50002
--debug
The config file for a proxy consists of the default section, the
'oslo_messaging_zmq' section and an additional 'zmq_proxy_opts' section.
Command line arguments like host, frontend_port, backend_port and publisher_port
can also be set in the 'zmq_proxy_opts' section of a configuration file
(e.g., /etc/oslo/zeromq.conf). All arguments are optional.
A port value of 0 means a random port (see the next section for more details).
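For example, a sketch of such a proxy config file, pinning the same values as
the command line above (the host name and port numbers are illustrative)::
[DEFAULT]
transport_url = "zmq+redis://host-1:6379"
[oslo_messaging_zmq]
use_pub_sub = false
[zmq_proxy_opts]
host = node-123
frontend_port = 50001
backend_port = 50002
# publisher_port is only used when the proxy also publishes over PUB/SUB
publisher_port = 50003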
Take into account that the --debug flag makes the proxy write a log record for
every dispatched message, which affects proxy performance significantly, so the
flag is not recommended in production. Without --debug there will be only
Matchmaker updates or critical errors in the proxy logs.
In this configuration we use the proxy as a very simple dispatcher (so it has the
best performance with minimal overhead). The only thing the proxy does is get the
binary routing-key frame from the message and dispatch the message on this key.
In this kind of deployment the client is in charge of doing fanout. Before sending
a fanout message the client takes a list of available hosts for the topic and
sends as many messages as the number of hosts it got.
This configuration just uses the DEALER/ROUTER pattern of ZeroMQ and doesn't use
PUB/SUB, as stated above.
The disadvantage of this approach is again slower client-side fanout, but it is
much better than with dynamic direct connections because we don't need to connect
and disconnect per message.
ZeroMQ PUB/SUB Infrastructure
-----------------------------
An example of a service config file::
[DEFAULT]
transport_url = "zmq+redis://host-1:6379"
[oslo_messaging_zmq]
use_pub_sub = true
use_router_proxy = true
use_dynamic_connections = false
An example of a proxy config file::
[DEFAULT]
transport_url = "zmq+redis://host-1:6379"
[oslo_messaging_zmq]
use_pub_sub = true
[zmq_proxy_opts]
host = host-1
It seems obvious that the fanout pattern of oslo.messaging maps onto the ZeroMQ
PUB/SUB pattern, but that is only true at first glance. It does map, but let's
look a bit closer.
The first caveat is that in oslo.messaging it is the client who does fanout (and
generally initiates the conversation), while the server is passive. In ZeroMQ, by
contrast, the publisher is a server and subscribers are clients. And here is the
problem: RPC-servers are subscribers in terms of ZeroMQ PUB/SUB; they hold the SUB
socket and wait for messages. They don't know anything about RPC-clients, and
clients generally come later than servers. So servers don't have a PUB socket to
subscribe to on start, and we need to introduce something in the middle; this is
where the proxy plays its role.
The publisher proxy has a ROUTER socket on the front-end and a PUB socket on the
back-end. The client connects to the ROUTER and sends a single message to the
publisher proxy. The proxy redirects this message to the PUB socket, which
performs the actual publishing.
Command to run a central publisher proxy::
oslo-messaging-zmq-proxy
--config-file /etc/oslo/zeromq.conf
--log-file /var/log/oslo/zeromq-router-proxy.log
--host node-123
--frontend-port 50001
--publisher-port 50003
--debug
When we run a publisher proxy we need to specify the --publisher-port option;
otherwise a random port will be picked and clients will get it from the
Matchmaker.
The advantage of this approach is really fast fanout: although it takes time on
the proxy to publish, ZeroMQ PUB/SUB is one of the fastest fanout pattern
implementations. It also makes clients faster, because they need to send only a
single message to a proxy.
For load balancing and HA it is recommended to have at least 3 proxies,
but the number of running proxies is not limited. They also don't form a cluster,
so there are no limitations on their number caused by consistency algorithm
requirements.
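For example, a sketch of running three such publisher proxies, assuming
controller nodes named host-1, host-2 and host-3 sharing the same config file::
# on host-1
oslo-messaging-zmq-proxy --config-file /etc/oslo/zeromq.conf --host host-1 --publisher-port 50003
# on host-2
oslo-messaging-zmq-proxy --config-file /etc/oslo/zeromq.conf --host host-2 --publisher-port 50003
# on host-3
oslo-messaging-zmq-proxy --config-file /etc/oslo/zeromq.conf --host host-3 --publisher-port 50003
Clients then learn about all running proxies from the Matchmaker, so no extra
client-side configuration is required.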
The disadvantage is that the number of connections on a proxy doubles compared
to the previous deployment, because we still need to use the router for direct
messages.
The documented limitation of ZeroMQ PUB/SUB is 10k subscribers.
In order to limit the number of subscribers and connections, local proxies
may be used. To run a local publisher use the following command::
oslo-messaging-zmq-proxy
--local-publisher
--config-file /etc/oslo/zeromq.conf
--log-file /var/log/oslo/zeromq-router-proxy.log
--host localhost
--publisher-port 60001
--debug
Pay attention to the --local-publisher flag, which specifies the type of proxy.
Local publishers may be running on every single node of a deployment. To make
services use local publishers, the 'subscribe_on' option has to be specified
in the service's config file::
[DEFAULT]
transport_url = "zmq+redis://host-1:6379"
[oslo_messaging_zmq]
use_pub_sub = true
use_router_proxy = true
use_dynamic_connections = false
subscribe_on = localhost:60001
If we forget to specify 'subscribe_on', services will take info from the
Matchmaker and still connect to a central proxy, so the trick wouldn't work. A
local proxy gets all the needed info from the matchmaker in order to find central
proxies and subscribes to them. Frankly speaking, you can put a central proxy in
the 'subscribe_on' value; even a list of hosts may be passed the same way as we do
for the transport_url::
subscribe_on = host-1:50003,host-2:50003,host-3:50003
This is completely valid, just not necessary because we have information about
central proxies in the Matchmaker. One more thing to highlight about 'subscribe_on'
is that it has higher priority than the Matchmaker when explicitly specified.
Concluding all the above, fanout over PUB/SUB proxies is the best choice
because of the static connections infrastructure, failover when one or more
publishers die, and ZeroMQ PUB/SUB's high performance.
What If We Mix Different Configurations?
----------------------------------------
The three boolean variables 'use_pub_sub', 'use_router_proxy' and
'use_dynamic_connections' give us exactly 8 possible combinations, but from a
practical perspective not all of them are usable, so let's discuss only those
which make sense.
The main recommended combination is Dynamic Direct Connections plus the PUB/SUB
infrastructure. We deploy PUB/SUB proxies as described in the corresponding
paragraph (either with local+central proxies or with central proxies only), and
the services' configuration file will look like the following::
[DEFAULT]
transport_url = "zmq+redis://host-1:6379"
[oslo_messaging_zmq]
use_pub_sub = true
use_router_proxy = false
use_dynamic_connections = true
So we just tell the driver not to pass the direct messages CALL and CAST over the
router, but to send them directly to RPC servers. All the details of configuring
services and port ranges should be taken from the 'Dynamic Direct Connections'
paragraph. So it is a combined configuration, and currently it is the best choice
from the number-of-connections perspective.
Frankly speaking, the deployment from the 'ZeroMQ PUB/SUB Infrastructure' section
is also a combination of 'Router Proxy' with PUB/SUB; we've just used the same
proxies for both.
Here we've discussed combinations inside the same service, but configurations can
also be combined on a higher level, the level of services. For example, you could
have a deployment where Cinder uses static direct connections and Nova/Neutron
use combined PUB/SUB + dynamic direct connections (a sketch is shown below). Such
an approach needs additional caution and may be confusing for cloud operators,
but it provides maximum optimization of performance and of the number of
connections on proxies and controller nodes.
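A sketch of such a split, assuming Cinder services keep static direct connections
while Nova and Neutron services use the combined PUB/SUB + dynamic direct
connections approach::
# cinder.conf (static direct connections)
[oslo_messaging_zmq]
use_pub_sub = false
use_router_proxy = false
use_dynamic_connections = false
# nova.conf / neutron.conf (PUB/SUB + dynamic direct connections)
[oslo_messaging_zmq]
use_pub_sub = true
use_router_proxy = false
use_dynamic_connections = true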
================
DevStack Support
================
The ZeroMQ driver is supported by DevStack. The configuration is as follows::
ENABLED_SERVICES+=,-rabbit,zeromq
ZEROMQ_MATCHMAKER=redis
The ZeroMQ driver can be tested on a single-node deployment with DevStack. Take
into account that on a single node any performance increase compared to other
backends is not that obvious; to see a significant speed-up you need at least
20 nodes.
In the [localrc] section of local.conf you need to enable the zmq plugin, which
lives in the `devstack-plugin-zmq`_ repository.
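A minimal local.conf sketch, assuming the plugin is consumed straight from the
openstack git namespace (the plugin URL is an assumption, adjust it to your
environment)::
[[local|localrc]]
enable_plugin devstack-plugin-zmq https://git.openstack.org/openstack/devstack-plugin-zmq
ENABLED_SERVICES+=,-rabbit,zeromq
ZEROMQ_MATCHMAKER=redis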