Refactor the security_group_rules_for_devices call.

Refactoring the call into security_group_rules_for_devices_compact,
which will obsolete the old one during the next cycle.

blueprint security-group-rules-for-devices-rpc-call-refactor

Change-Id: I6830c48082d3886bbd9c3a9c29f9311e45e2cb17

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode
=======================================================
'security_group_rules_for_devices' RPC call refactoring
=======================================================
https://blueprints.launchpad.net/neutron/+spec/security-group-rules-for-devices-rpc-call-refactor
Security group rules synchronization from neutron-server to L2 agents scales
poorly for high density clouds, leading to a blocked neutron-server in some
situations.
Problem description
===================
The security_group_rules_for_devices RPC call from L2 agents to
neutron-server doesn't scale well because all the security group rule entries
are expanded with each specific member IP address (see [#call_results]_) of a
security group, when that group is referenced in rules like::

    allow all from 'default' group for IPv4 and IPv6
This leads to:

* huge AMQP messages (>20-600 MB)
* very long processing times on the neutron-server side (>60 seconds) when
  there are lots of instances under the same tenant/security group
* neutron-server lockups when the RPC call times out on the agent side and
  the same security_group_rules_for_devices call is issued back to neutron
For a more detailed insight:

* security_group_rules_for_devices is an RPC call from the L2 agents to
  neutron-server, see [#call_results]_.
  The call's argument is a list of device_ids; device_ids are connected
  to ports. Neutron builds the list of security group rules
  (see [#call_results]_) and returns it keyed by device_id.
The message size between neutron-server and the L2 agent grows
according to the following formula::

    MessageSize ~= base + L * Instances_in_host *
                   (Instances_in_security_group - 1)

where L = len(str(security_group_rule)) ~= 440 bytes.
This problem is more likely to appear in big or dense clouds, and it is a
real issue because nova itself can scale to 10x as many instances without
incident: see [#n2nov_scaling]_.
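To make the growth concrete, here is a rough back-of-the-envelope estimate
based on the formula above; the instance counts and the fixed overhead are
purely illustrative assumptions, not measurements:

.. code-block:: python

    # Rough estimate of the old (expanded) reply size; numbers are illustrative.
    RULE_LEN = 440   # ~len(str(security_group_rule)), as stated above
    BASE = 1000      # assumed fixed overhead, in bytes

    def old_message_size(instances_in_host, instances_in_security_group):
        """MessageSize ~= base + L * Instances_in_host * (Instances_in_sg - 1)"""
        return (BASE + RULE_LEN * instances_in_host *
                (instances_in_security_group - 1))

    # 50 instances on the host, 2000 instances sharing one security group:
    print(old_message_size(50, 2000) / (1024.0 * 1024))  # ~42 MB per reply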
Proposed change
===============
Refactor security_group_rules_for_devices into a new call
'security_group_rules_for_devices_compact' which, instead of returning the
expanded result [#call_results]_, returns a more compact result that can be
expanded by the L2 agent.

The new 'security_group_rules_for_devices_compact' RPC call returns:
.. code-block:: python
{'security_groups': {'sg-id1': {'rules': [ {},{},{},{}]}
},
'security_group_member_ips': { 'sg-id1' : {'ipv4': ['192.168.11.2/32'],
'ipv6': [] },
'sg-id2-referenced-from-id1' : {...},
},
'devices': {'dev-id1': { ... ,
'fixed_ips': ['192.168.11.4'],
'security_groups': ['sg-id1', 'sg-id2', ...] }}
}
Security group rules would be passed in a non-expanded way, which implies
that any reference to src or dst security groups in rules would be stored
as security group ids.
A response like this (old result):
.. code-block:: python
{'dev-id1':
{...,
'security_group_rules': [{'direction': u'egress',
'ethertype': u'IPv6',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e'},
{'direction': u'egress',
'ethertype': u'IPv4',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'protocol': u'icmp',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'source_ip_prefix': '192.168.11.2/32'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'source_ip_prefix': '192.168.11.3/32'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'source_ip_prefix': '192.168.11.4/32'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'23138476-4fde-454e-33ad-abc123456782',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'source_ip_prefix': '192.168.33.4/32'}
]
},
'dev-id2': {
...,
'security_group_rules': [{'direction': u'egress',
'ethertype': u'IPv6',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e'},
{'direction': u'egress',
'ethertype': u'IPv4',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'protocol': u'icmp',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'source_ip_prefix': '192.168.11.2/32'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'source_ip_prefix': '192.168.11.3/32'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'source_ip_prefix': '192.168.11.4/32'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'source_ip_prefix': '192.168.11.5/32'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'23138476-4fde-454e-33ad-abc123456782',
'security_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e',
'source_ip_prefix': '192.168.33.4/32'}
]
}
}
Would be like this in the new version:
.. code-block:: python
{'security_groups': {u'1809f907-4b0c-4445-a366-ff28eaab9c2e':
{'rules': [
{'direction': u'egress', 'ethertype': u'IPv6'},
{'direction': u'egress', 'ethertype': u'IPv4'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'protocol': u'icmp',},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'1809f907-4b0c-4445-a366-ff28eaab9c2e'},
{'direction': u'ingress',
'ethertype': u'IPv4',
'remote_group_id':
u'23138476-4fde-454e-33ad-abc123456782'}
]
}
},
'security_group_member_ips': { u'1809f907-4b0c-4445-a366-ff28eaab9c2e' :
{u'ipv4': ['192.168.11.2/32',
'192.168.11.3/32',
'192.169.11.4/32',
'192.168.11.5/32'],
u'ipv6': []
},
u'23138476-4fde-454e-33ad-abc123456782' :
{u'ipv4': ['192.168.33.2/32'],
u'ipv6': []
}
},
'devices': {'dev-id1': { ... ,
'fixed_ips': ['192.168.11.4'],
'security_groups':
['1809f907-4b0c-4445-a366-ff28eaab9c2e'] },
'dev-id2': { ... ,
'fixed_ips': ['192.168.11.4'],
'security_groups':
['1809f907-4b0c-4445-a366-ff28eaab9c2e'] },
}
All security groups referenced from the devices will be included in the
response.

All member IP addresses of every security group referenced via
remote_group_id will also be included in the response.
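As an illustration of how an agent could consume this, here is a minimal
sketch (hypothetical helper names, not the actual agent code) that expands
the compact response back into per-device rules by resolving remote_group_id
against security_group_member_ips:

.. code-block:: python

    def expand_compact_response(resp):
        """Expand the compact reply into per-device security group rules."""
        sg_rules = resp['security_groups']
        member_ips = resp['security_group_member_ips']
        expanded = {}
        for dev_id, dev in resp['devices'].items():
            dev_rules = []
            for sg_id in dev['security_groups']:
                for rule in sg_rules[sg_id]['rules']:
                    remote = rule.get('remote_group_id')
                    if remote is None:
                        dev_rules.append(dict(rule, security_group_id=sg_id))
                        continue
                    # One expanded rule per member IP of the referenced group.
                    ip_ver = 'ipv4' if rule['ethertype'] == 'IPv4' else 'ipv6'
                    prefix = ('source_ip_prefix'
                              if rule['direction'] == 'ingress'
                              else 'dest_ip_prefix')
                    for ip in member_ips[remote][ip_ver]:
                        dev_rules.append(dict(rule, security_group_id=sg_id,
                                              **{prefix: ip}))
            expanded[dev_id] = dict(dev, security_group_rules=dev_rules)
        return expanded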
The old call could be marked as deprecated during this J cycle
and removed during the K cycle.

Making the refactor into a new call would have the following advantages:

#. Compatibility with older agents during a neutron-server upgrade.
#. Ability to split the patches (server/agents) into more steps, as we can
   add the new call on the server side while keeping the agents calling the
   old one, and then refactor the agents in further steps.
The resulting message size would be::

    MessageSize ~= base +
                   D * Instances_in_host +
                   L * Referenced_security_groups +
                   I * Instances_in_referenced_security_groups

where L = len(str(compact_security_group_rule)) ~= 220 bytes,
D = len(str(device_description_including_ips_and_sg_ids)), and
I = len(str(ip_address + ',')) ~= 17 bytes.
In the new message format no data is replicated, and no two variables are
multiplied together: the message size grows as a sum of the factors above
instead of as a product.
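Plugging in the same illustrative numbers used for the old formula above
(again, assumptions rather than measurements, including the assumed device
description size):

.. code-block:: python

    # Rough estimate of the compact reply size; numbers are illustrative.
    COMPACT_RULE_LEN = 220   # ~len(str(compact_security_group_rule))
    DEVICE_LEN = 500         # assumed device description size, in bytes
    IP_LEN = 17              # ~len(str(ip_address + ','))
    BASE = 1000              # assumed fixed overhead, in bytes

    def compact_message_size(instances_in_host, referenced_groups,
                             instances_in_referenced_groups):
        return (BASE +
                DEVICE_LEN * instances_in_host +
                COMPACT_RULE_LEN * referenced_groups +
                IP_LEN * instances_in_referenced_groups)

    # 50 instances on the host, 1 referenced group with 2000 members:
    print(compact_message_size(50, 1, 2000) / 1024.0)  # ~59 KB vs ~42 MB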
Next steps:

There is a proposal from Édouard Thuleau to use an RPC topic per security
group [#sec_fanout]_, which would be addressed in a second iteration after
this one.
Alternatives
------------
* Instead of including all the security groups in one RPC call, this could
  be split into a second call, 'security_groups_and_referenced_members',
  which would receive a list of security group ids and return a list of
  security groups plus a list of security group member IP addresses.
  A full sync would then require 2 calls to neutron-server.

* We could have a 'security_groups' and a 'security_groups_members' call:
  the first would provide the security groups without member IP addresses,
  and the second a list of IPv4 and IPv6 member addresses for each security
  group. A full sync would then require 3 calls to neutron-server, but this
  approach would allow new members or new rules in security groups to be
  communicated separately, further reducing the information transmitted in
  those cases. As for the first alternative, reducing the traffic volume by
  increasing the number of calls seems like a bad tradeoff due to the
  overhead/latency generated by each call.

* We could just compact CIDR ranges during rule generation (see the sketch
  after this list); that wouldn't require modifications to the agents, but
  it would increase the RPC request processing time.
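A minimal sketch of what CIDR compaction could look like, using the Python
standard library ipaddress module (Python 3.3+) purely for illustration;
the member IPs are taken from the examples above:

.. code-block:: python

    import ipaddress

    # Member IPs that would otherwise become one expanded rule each.
    member_ips = ['192.168.11.2/32', '192.168.11.3/32',
                  '192.168.11.4/32', '192.168.11.5/32']
    networks = [ipaddress.ip_network(ip) for ip in member_ips]

    # Collapse adjacent addresses into the fewest covering CIDRs.
    print(list(ipaddress.collapse_addresses(networks)))
    # [IPv4Network('192.168.11.2/31'), IPv4Network('192.168.11.4/31')]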
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
The performance impact should be very positive in the following situations:

* Security group changes: neutron-server load and AMQP message sizes.
  At this moment oslo.messaging serializes structures to JSON because of
  AMQP version limitations (the string size limit for dictionaries being
  one of them). Reducing the message size will reduce the dictionary
  building time and also the serialization time; a quick way to measure
  the difference is sketched after this list.
* Creation time for new ports
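Assuming JSON serialization as described above, the size reduction can be
sanity-checked by serializing both payload formats directly; old_response
and compact_response below stand for the two example replies shown earlier
and are not defined here:

.. code-block:: python

    import json

    def payload_size(payload):
        """Approximate on-the-wire size of a JSON-serialized reply."""
        return len(json.dumps(payload))

    # Example comparison, with the payloads built elsewhere:
    # print(payload_size(old_response), payload_size(compact_response))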
An even higher performance impact on packet processing would be achieved
by the iptables-level optimization proposed in the ipset spec
[#ipset_spec]_.

If transmission times become our bottleneck instead of processing times,
we may consider compression, if that is available at the AMQP level.
Other deployer impact
---------------------
None
Developer impact
----------------
* I'm unsure whether there are proprietary L2 agents talking to this RPC
  interface. Introducing the refactor as a new RPC call instead of
  modifying the existing one allows some time for those agents to be
  upgraded.
* All the agents that use this RPC call would need to be updated to the
  new call before the old one is removed.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
https://launchpad.net/~mangelajo
Other contributors:
http://launchpad.net/~shihanzhang
Work Items
----------
* Refactor the RPC call into a new one in neutron-server and
  the matching RPC call in the agent mixin.
* Add functional testing to validate the approach.
* Upgrade the agents to use the new call, one by one.
* Analyze db access and file a new spec if improvements can
be made in this area.
Dependencies
============
None
Testing
=======
Functional testing will validate the approach and make sure the result
that was possible with the old method can be consistently replicated
(only faster) with the new one. Covering both RPC calls with functional
tests will avoid regressions in either code path, while testing this in
Tempest only exercises the default RPC call.
Documentation Impact
====================
None
References
==========
.. [#call_results] http://www.fpaste.org/104401/14008522/
.. [#n2nov_scaling] http://javacruft.wordpress.com/2014/06/18/168k-instances/
.. [#iptables] http://www.fpaste.org/104431/40085672/
.. [#sec_fanout] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038374.html
.. [#ipset_spec] https://review.openstack.org/#/c/100761/