..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.


      Convention for heading levels in Neutron devref:
      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4
      (Avoid deeper levels because they do not render well.)

Open vSwitch Firewall Driver
============================
The OVS driver has the same API as the current iptables firewall driver and
keeps the state of security groups and ports inside the firewall. The
``SGPortMap`` class keeps this state consistent by mapping ports to security
groups and vice versa. Every port and security group is represented by its own
object encapsulating the necessary information.
.. note::

   The Open vSwitch firewall driver uses ``register 5`` to identify the port
   related to the flow and ``register 6`` to identify the network, which is
   used in particular for conntrack zones.

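As a rough illustration of the bookkeeping described above, the following
sketch keeps a two-way mapping between ports and security groups. The class
name ``PortGroupMapSketch`` and its method are hypothetical; they are not the
actual ``SGPortMap`` interface.

```python
from collections import defaultdict


class PortGroupMapSketch:
    """Hypothetical two-way map between port IDs and security group IDs."""

    def __init__(self):
        self.groups_by_port = defaultdict(set)
        self.ports_by_group = defaultdict(set)

    def set_port_groups(self, port_id, group_ids):
        """Replace the groups of a port and keep the reverse map in sync."""
        for old_group in self.groups_by_port.pop(port_id, set()):
            self.ports_by_group[old_group].discard(port_id)
        for group_id in group_ids:
            self.groups_by_port[port_id].add(group_id)
            self.ports_by_group[group_id].add(port_id)


sg_port_map = PortGroupMapSketch()
sg_port_map.set_port_groups("port-1", ["sg-1"])
sg_port_map.set_port_groups("port-2", ["sg-1", "sg-2"])
sg_port_map.set_port_groups("port-1", ["sg-2"])  # move port-1 to sg-2
```

Updating both directions in one place is what keeps lookups by port and
lookups by security group from drifting apart.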
Ingress/Egress Terminology
--------------------------
In this document, the terms ``ingress`` and ``egress`` are relative to
a VM instance connected to OVS (or a netns connected to OVS):

* ``ingress`` applies to traffic that will ultimately go into a VM (or into
  a netns), assuming it is not dropped

* ``egress`` applies to traffic coming from a VM (or from a netns)

::

                    .                                                 .
             _______|\                                         _______|\
            \ ingress \                                       \ ingress \
            /_______  /                                       /_______  /
                    |/        .-----------------.                     |/
                    '         |                 |                     '
                              |                 |-----------( netns interface )
      ( non-VM, non-netns )---|       OVS       |
    ( interface: phy, patch ) |                 |------------( VM interface )
                    .         |                 |                     .
                   /|________ '-----------------'                    /|________
                  / egress  /                                       / egress  /
                  \ ________\                                       \ ________\
                   \|                                                \|
                    '                                                 '

Note that these terms are used differently in OVS code and documentation, where
they are relative to the OVS bridge, with ``ingress`` applying to traffic as
it comes into the OVS bridge, and ``egress`` applying to traffic as it leaves
the OVS bridge.
Firewall API calls
------------------
There are two main calls performed by the firewall driver in order to either
create or update a port with security groups: ``prepare_port_filter`` and
``update_port_filter``. Both methods rely on the security group objects that
are already defined in the driver and work similarly to their iptables
counterparts. The definition of the objects will be described later in this
document. ``prepare_port_filter`` must be called only once during port
creation, and it defines the initial rules for the port. When the port is
updated, all filtering rules are removed, and new rules are generated based on
the available information about security groups in the driver.
Security group rules can be defined in the firewall driver by calling
``update_security_group_rules``, which rewrites all the rules for a given
security group. If a remote security group is changed, then
``update_security_group_members`` is called to determine the set of IP
addresses that should be allowed for this remote security group. Calling this
method will not have any effect on existing instance ports. In other words, if
the port is using security groups and its rules are changed by calling one of
the above methods, then no new rules are generated for this port.
``update_port_filter`` must be called for the changes to take effect.
All the machinery above is controlled by security group RPC methods, which
means the firewall driver doesn't have any logic to decide which port should be
updated based on the provided changes. It only performs actions when called
from the controller.
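The deferred behaviour described above can be pictured with a toy model. The
method names mirror the calls named in this section, but the class, the
simplified signatures and the string "rules" are assumptions made for
illustration only.

```python
class FirewallSketch:
    """Toy model: rule updates are stored, ports are refreshed on request."""

    def __init__(self):
        self.sg_rules = {}      # security group ID -> list of rules
        self.port_groups = {}   # port ID -> list of security group IDs
        self.port_flows = {}    # port ID -> rules currently installed

    def update_security_group_rules(self, sg_id, rules):
        # Rewrites the stored rules; installed port flows are left untouched.
        self.sg_rules[sg_id] = list(rules)

    def prepare_port_filter(self, port_id, sg_ids):
        self.port_groups[port_id] = list(sg_ids)
        self._install(port_id)

    def update_port_filter(self, port_id):
        # Remove everything installed for the port, then regenerate it.
        self.port_flows.pop(port_id, None)
        self._install(port_id)

    def _install(self, port_id):
        self.port_flows[port_id] = [
            rule
            for sg_id in self.port_groups[port_id]
            for rule in self.sg_rules.get(sg_id, [])
        ]


fw = FirewallSketch()
fw.update_security_group_rules("sg-1", ["allow icmp"])
fw.prepare_port_filter("port-1", ["sg-1"])
fw.update_security_group_rules("sg-1", ["allow icmp", "allow tcp/22"])
stale = list(fw.port_flows["port-1"])   # still the old rules
fw.update_port_filter("port-1")
fresh = fw.port_flows["port-1"]         # regenerated from current state
```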
OpenFlow rules
--------------
At first, every connection is split into ingress and egress processes based on
the input or output port respectively. Each port contains the initial
hardcoded flows for ARP, DHCP and established connections, which are accepted
by default. To detect established connections, a flow must be marked by
conntrack first with an ``action=ct()`` rule. An accepted flow means that
ingress packets for the connection are directly sent to the port, and egress
packets are left to be normally switched by the integration bridge.
Connections that are not matched by the above rules are sent to either the
ingress or egress filtering table, depending on their direction. Flows derived
from security group rules are kept in separate tables so that they are easy to
identify during removal.
Security group rules are treated differently depending on whether they have
a remote group ID. A security group rule without a remote group ID is
expanded into several OpenFlow rules by the method
``create_flows_from_rule_and_port``. A security group rule with a remote
group ID is expressed by three sets of flows. The first two are conjunctive
flows, which will be described in the next section. The third set matches on
the conjunction IDs and carries the accept actions.
Flow priorities for security group rules
----------------------------------------
The OpenFlow spec says a packet should not match against multiple
flows at the same priority [1]_. The firewall driver uses 8 levels of
priorities to achieve this. The method ``flow_priority_offset``
calculates a priority for a given security group rule. The use of
priorities is essential with conjunction flows, which will be
described later in the conjunction flows examples.
.. [1] Although OVS seems to handle overlapping flows in some cases,
   we shouldn't rely on that.

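A minimal sketch of how such an offset could be computed is shown below. It is
reconstructed from the priorities visible in the example flows later in this
document (70 for a protocol-agnostic rule with a remote group, 71 for ICMP and
73 for TCP with a remote group, 75 for ICMP without one). The base value of 70,
the string protocol names and the function names are assumptions, and the
actual ``flow_priority_offset`` may differ.

```python
BASE_PRIORITY = 70  # assumption: lowest rule priority seen in the examples


def priority_offset_sketch(rule):
    """Return an offset in 0..7 for a security group rule (dict)."""
    # Rules with a remote group (conjunction flows) use offsets 0..3,
    # rules without one use offsets 4..7.
    offset = 0 if "remote_group_id" in rule else 4
    protocol = rule.get("protocol")
    if protocol is None:
        return offset                 # any IP protocol
    if protocol in ("icmp", "ipv6-icmp"):
        if "port_range_min" not in rule:
            return offset + 1         # ICMP, any type
        if "port_range_max" not in rule:
            return offset + 2         # ICMP, type only
    return offset + 3                 # most specific match


def flow_priority_sketch(rule):
    return BASE_PRIORITY + priority_offset_sketch(rule)


icmp_remote = flow_priority_sketch(
    {"remote_group_id": "sg-1", "protocol": "icmp"})
tcp_remote = flow_priority_sketch(
    {"remote_group_id": "sg-1", "protocol": "tcp"})
any_remote = flow_priority_sketch({"remote_group_id": "sg-3"})
icmp_plain = flow_priority_sketch({"protocol": "icmp"})
```

Less specific matches get lower offsets, so a broader rule can never shadow a
narrower one at the same priority.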
Uses of conjunctive flows
-------------------------
For a security group rule with a remote group ID, flows that match on
nw_src for remote_group_id addresses and match on dl_dst for port MAC
addresses are needed (for ingress rules; likewise for egress
rules). Without conjunction, this results in O(n*m) flows, where n and
m are the numbers of ports in the remote group and in the port's security
group, respectively.
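To get a feel for the difference, the sketch below compares the two flow
counts. It is deliberately simplified: it counts one flow per match and
ignores the doubling caused by separate ``ct_state`` variants, and the function
names are hypothetical.

```python
def flows_without_conjunction(n_remote, m_local):
    # One flow for every (remote address, local port) combination.
    return n_remote * m_local


def flows_with_conjunction(n_remote, m_local):
    # Dimension 1: one flow per remote address.
    # Dimension 2: one flow per local port.
    # Plus one conj_id action flow per local port.
    return n_remote + 2 * m_local


cross_product = flows_without_conjunction(100, 50)
conjunctive = flows_with_conjunction(100, 50)
```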
A conj_id is allocated for each (remote_group_id, security_group_id,
direction, ethertype, flow_priority_offset) tuple. The class
``ConjIdMap`` handles the mapping. The same conj_id is shared between
security group rules if multiple rules belong to the same tuple above.
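The allocation can be sketched as a dictionary keyed by that tuple. The class
below is a hypothetical stand-in, not the real ``ConjIdMap``; in particular,
the step of 2 between IDs is an assumption inspired by the example flows, where
IDs appear in pairs (one for established and one for new connections).

```python
import itertools


class ConjIdAllocatorSketch:
    """Hypothetical allocator handing out one stable ID per rule tuple."""

    def __init__(self, first_id=2, step=2):
        self._next_ids = itertools.count(first_id, step)
        self._ids = {}

    def get_conj_id(self, remote_group_id, security_group_id, direction,
                    ethertype, priority_offset):
        key = (remote_group_id, security_group_id, direction, ethertype,
               priority_offset)
        if key not in self._ids:
            self._ids[key] = next(self._next_ids)
        return self._ids[key]


allocator = ConjIdAllocatorSketch()
# Two TCP rules (for example ports 22 and 80) share the same tuple.
tcp_22 = allocator.get_conj_id("sg-1", "sg-2", "ingress", "IPv4", 3)
tcp_80 = allocator.get_conj_id("sg-1", "sg-2", "ingress", "IPv4", 3)
# An ICMP rule has a different priority offset, hence a different ID.
icmp = allocator.get_conj_id("sg-1", "sg-2", "ingress", "IPv4", 1)
```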
Conjunctive flows consist of two dimensions. Flows that belong to
dimension 1 of 2 are generated by the method
``create_flows_for_ip_address`` and are in charge of the IP address based
filtering specified by their remote group IDs. Flows that belong to
dimension 2 of 2 are generated by the method
``create_flows_from_rule_and_port`` and modified by the method
``substitute_conjunction_actions``; they represent the portion of
the rule other than its remote group ID.
Those dimension 2 of 2 flows are per port and contain no remote group
information. When there are multiple security group rules for a port,
those flows can overlap. To avoid such a situation, flows are sorted
and fed to ``merge_port_ranges`` or ``merge_common_rules`` methods to
rearrange them.
Rules example with explanation:
-------------------------------
The following example presents two ports on the same host. They have different
security groups, and ICMP traffic is allowed from the first security group to
the second security group. The ports have the following attributes:

::

    Port 1
      - plugged to the port 1 in OVS bridge
      - IP address: 192.168.0.1
      - MAC address: fa:16:3e:a4:22:10
      - security group 1: can send ICMP packets out
      - allowed address pair: 10.0.0.1/32, fa:16:3e:8c:84:13

    Port 2
      - plugged to the port 2 in OVS bridge
      - IP address: 192.168.0.2
      - MAC address: fa:16:3e:24:57:c7
      - security group 2:
          - can receive ICMP packets from security group 1
          - can receive TCP packets from security group 1
          - can receive TCP packets to port 80 from security group 2
          - can receive IP packets from security group 3
      - allowed address pair: 10.1.0.0/24, fa:16:3e:8c:84:14

|table_0| contains a low priority rule to continue packet processing in
|table_60|, also known as the TRANSIENT table. |table_0| is left for use by
other features that take precedence over the firewall, e.g. DVR. The only
requirement is that after such a feature is done with its processing, it passes
packets on to the TRANSIENT table. The TRANSIENT table distinguishes
ingress traffic from egress traffic and loads into ``register 5`` a value
identifying the port (for egress traffic based on the switch port number, and
for ingress traffic based on the network id and destination MAC address);
``register 6`` contains a value identifying the network (which is also the
OVSDB port tag) to isolate connections into separate conntrack zones.
For VLAN networks, the physical VLAN tag is used as an extra match field for
this identification.
::
table=60, priority=100,in_port=1 actions=load:0x1->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,71)
table=60, priority=100,in_port=2 actions=load:0x2->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,71)
table=60, priority=90,dl_vlan=0x284,dl_dst=fa:16:3e:a4:22:10 actions=load:0x1->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,81)
table=60, priority=90,dl_vlan=0x284,dl_dst=fa:16:3e:8c:84:13 actions=load:0x1->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,81)
table=60, priority=90,dl_vlan=0x284,dl_dst=fa:16:3e:24:57:c7 actions=load:0x2->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,81)
table=60, priority=90,dl_vlan=0x284,dl_dst=fa:16:3e:8c:84:14 actions=load:0x2->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,81)
table=60, priority=0 actions=NORMAL
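The per-port flows above follow a regular pattern, which the sketch below
reproduces as strings. The helper ``transient_flows_sketch`` is hypothetical
and only mirrors the example output; it is not how the driver builds flows.

```python
def transient_flows_sketch(ofport, vlan_tag, macs):
    """Build table 60 flow strings for one port (illustrative only)."""
    load = "load:{:#x}->NXM_NX_REG5[],load:{:#x}->NXM_NX_REG6[]".format(
        ofport, vlan_tag)
    # Egress: identified by the switch port the packet came in on.
    flows = [
        "table=60, priority=100,in_port={} actions={},resubmit(,71)".format(
            ofport, load)
    ]
    # Ingress: identified by VLAN tag and destination MAC address.
    for mac in macs:
        flows.append(
            "table=60, priority=90,dl_vlan={:#x},dl_dst={} "
            "actions={},resubmit(,81)".format(vlan_tag, mac, load))
    return flows


port1_flows = transient_flows_sketch(
    1, 0x284, ["fa:16:3e:a4:22:10", "fa:16:3e:8c:84:13"])
```

Each allowed address pair MAC gets its own ingress flow, which is why port 1
and port 2 each contribute two ``priority=90`` entries.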
The following table, |table_71|, implements ARP spoofing protection and IP
spoofing protection, allows traffic related to IP address allocation (dhcp,
dhcpv6, slaac, ndp) for egress traffic, and allows ARP replies. It also
identifies untracked connections, which are processed later with information
obtained from conntrack. Notice the ``zone=NXM_NX_REG6[0..15]`` in ``actions``
when obtaining information from conntrack. It says that the conntrack zone of
every port is defined by the value in ``register 6`` (OVSDB port tag
identifying the network). It's there to avoid accepting established traffic
that belongs to a different port with the same conntrack parameters.
The very first rule in |table_71| removes conntrack information for the
use case where a Neutron logical port is placed directly on the hypervisor.
In such a case the kernel does a conntrack lookup before the packet reaches the
Open vSwitch bridge. Tracked packets are sent back for processing by the same
table after conntrack information is cleared.
::
table=71, priority=110,ct_state=+trk actions=ct_clear,resubmit(,71)
Rules below allow ICMPv6 traffic for multicast listeners, neighbour
solicitation and neighbour advertisement.
::
table=71, priority=95,icmp6,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:11,ipv6_src=fe80::11,icmp_type=130 actions=resubmit(,94)
table=71, priority=95,icmp6,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:11,ipv6_src=fe80::11,icmp_type=131 actions=resubmit(,94)
table=71, priority=95,icmp6,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:11,ipv6_src=fe80::11,icmp_type=132 actions=resubmit(,94)
table=71, priority=95,icmp6,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:11,ipv6_src=fe80::11,icmp_type=135 actions=resubmit(,94)
table=71, priority=95,icmp6,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:11,ipv6_src=fe80::11,icmp_type=136 actions=resubmit(,94)
table=71, priority=95,icmp6,reg5=0x2,in_port=2,dl_src=fa:16:3e:a4:22:22,ipv6_src=fe80::22,icmp_type=130 actions=resubmit(,94)
table=71, priority=95,icmp6,reg5=0x2,in_port=2,dl_src=fa:16:3e:a4:22:22,ipv6_src=fe80::22,icmp_type=131 actions=resubmit(,94)
table=71, priority=95,icmp6,reg5=0x2,in_port=2,dl_src=fa:16:3e:a4:22:22,ipv6_src=fe80::22,icmp_type=132 actions=resubmit(,94)
table=71, priority=95,icmp6,reg5=0x2,in_port=2,dl_src=fa:16:3e:a4:22:22,ipv6_src=fe80::22,icmp_type=135 actions=resubmit(,94)
table=71, priority=95,icmp6,reg5=0x2,in_port=2,dl_src=fa:16:3e:a4:22:22,ipv6_src=fe80::22,icmp_type=136 actions=resubmit(,94)
The following rules implement ARP spoofing protection:
::
table=71, priority=95,arp,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:10,arp_spa=192.168.0.1 actions=resubmit(,94)
table=71, priority=95,arp,reg5=0x1,in_port=1,dl_src=fa:16:3e:8c:84:13,arp_spa=10.0.0.1 actions=resubmit(,94)
table=71, priority=95,arp,reg5=0x2,in_port=2,dl_src=fa:16:3e:24:57:c7,arp_spa=192.168.0.2 actions=resubmit(,94)
table=71, priority=95,arp,reg5=0x2,in_port=2,dl_src=fa:16:3e:8c:84:14,arp_spa=10.1.0.0/24 actions=resubmit(,94)
DHCP and DHCPv6 client traffic from instances is allowed, but DHCP servers
running on instances are blocked.
::
table=71, priority=80,udp,reg5=0x1,in_port=1,tp_src=68,tp_dst=67 actions=resubmit(,73)
table=71, priority=80,udp6,reg5=0x1,in_port=1,tp_src=546,tp_dst=547 actions=resubmit(,73)
table=71, priority=70,udp,reg5=0x1,in_port=1,tp_src=67,tp_dst=68 actions=resubmit(,93)
table=71, priority=70,udp6,reg5=0x1,in_port=1,tp_src=547,tp_dst=546 actions=resubmit(,93)
table=71, priority=80,udp,reg5=0x2,in_port=2,tp_src=68,tp_dst=67 actions=resubmit(,73)
table=71, priority=80,udp6,reg5=0x2,in_port=2,tp_src=546,tp_dst=547 actions=resubmit(,73)
table=71, priority=70,udp,reg5=0x2,in_port=2,tp_src=67,tp_dst=68 actions=resubmit(,93)
table=71, priority=70,udp6,reg5=0x2,in_port=2,tp_src=547,tp_dst=546 actions=resubmit(,93)
The following rules obtain conntrack information for valid IP and MAC address
combinations. All other packets are dropped.
::
table=71, priority=65,ip,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:10,nw_src=192.168.0.1 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=65,ip,reg5=0x1,in_port=1,dl_src=fa:16:3e:8c:84:13,nw_src=10.0.0.1 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=65,ip,reg5=0x2,in_port=2,dl_src=fa:16:3e:24:57:c7,nw_src=192.168.0.2 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=65,ip,reg5=0x2,in_port=2,dl_src=fa:16:3e:8c:84:14,nw_src=10.1.0.0/24 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=65,ipv6,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:10,ipv6_src=fe80::f816:3eff:fea4:2210 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=65,ipv6,reg5=0x2,in_port=2,dl_src=fa:16:3e:24:57:c7,ipv6_src=fe80::f816:3eff:fe24:57c7 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=10,reg5=0x1,in_port=1 actions=resubmit(,93)
table=71, priority=10,reg5=0x2,in_port=2 actions=resubmit(,93)
table=71, priority=0 actions=drop
|table_72| accepts only established or related connections, and implements
rules defined by security groups. As this egress connection might also be an
ingress connection for some other port, it's not switched yet but eventually
processed by the ingress pipeline.
All established or new connections defined by security group rules are
``accepted``, which will be explained later. All invalid packets are dropped.
In the case below we allow all ICMP egress traffic.
::
table=72, priority=75,ct_state=+est-rel-rpl,icmp,reg5=0x1 actions=resubmit(,73)
table=72, priority=75,ct_state=+new-est,icmp,reg5=0x1 actions=resubmit(,73)
table=72, priority=50,ct_state=+inv+trk actions=resubmit(,93)
The ``ct_mark=0x1`` match in the flows below is important. Connections that
were marked as no longer valid by a rule introduced later have this
value. These are typically connections that were allowed by some security
group rule that has since been removed.
::
table=72, priority=50,ct_mark=0x1,reg5=0x1 actions=resubmit(,93)
table=72, priority=50,ct_mark=0x1,reg5=0x2 actions=resubmit(,93)
All other connections that are not marked and are established or related are
allowed.
::
table=72, priority=50,ct_state=+est-rel+rpl,ct_zone=644,ct_mark=0,reg5=0x1 actions=resubmit(,94)
table=72, priority=50,ct_state=+est-rel+rpl,ct_zone=644,ct_mark=0,reg5=0x2 actions=resubmit(,94)
table=72, priority=50,ct_state=-new-est+rel-inv,ct_zone=644,ct_mark=0,reg5=0x1 actions=resubmit(,94)
table=72, priority=50,ct_state=-new-est+rel-inv,ct_zone=644,ct_mark=0,reg5=0x2 actions=resubmit(,94)
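The ``ct_zone=644`` match in these flows is the same network identifier that
the TRANSIENT table loaded into ``register 6`` as ``0x284``. A small sketch of
the ``zone=NXM_NX_REG6[0..15]`` selection, with a hypothetical helper name:

```python
def conntrack_zone_sketch(reg6):
    """Zone taken from bits 0..15 of register 6."""
    return reg6 & 0xFFFF


zone = conntrack_zone_sketch(0x284)
```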
The following flows mark established connections that weren't matched
by the previous flows, which means they no longer have an accepting security
group rule.
::
table=72, priority=40,ct_state=-est,reg5=0x1 actions=resubmit(,93)
table=72, priority=40,ct_state=+est,reg5=0x1 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))
table=72, priority=40,ct_state=-est,reg5=0x2 actions=resubmit(,93)
table=72, priority=40,ct_state=+est,reg5=0x2 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))
table=72, priority=0 actions=drop
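The priority ordering in |table_72| can be read as a decision procedure. The
function below is a simplified model of the example flows for a single tracked
packet; the function name, its arguments and the returned strings are
assumptions made for illustration.

```python
def rules_egress_decision(ct_state, ct_mark, sg_rule_matches):
    """Model of table 72 for one packet; ct_state is a set of flag names."""
    est, new = "est" in ct_state, "new" in ct_state
    rel, rpl, inv = "rel" in ct_state, "rpl" in ct_state, "inv" in ct_state

    # Priority 70-77: flows generated from security group rules.
    if sg_rule_matches and (
            (est and not rel and not rpl) or (new and not est)):
        return "table 73"
    # Priority 50: invalid packets and connections marked as stale.
    if inv or ct_mark == 1:
        return "table 93"
    # Priority 50: unmarked reply or related traffic is accepted.
    if ct_mark == 0 and (
            (est and not rel and rpl) or (rel and not new and not est)):
        return "table 94"
    # Priority 40: anything else that is not established is dropped,
    # established leftovers get marked so later packets are dropped.
    if not est:
        return "table 93"
    return "commit with ct_mark=0x1"


allowed_new = rules_egress_decision({"new"}, 0, True)
reply = rules_egress_decision({"est", "rpl"}, 0, False)
orphaned = rules_egress_decision({"est"}, 0, False)
already_marked = rules_egress_decision({"est"}, 1, False)
```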
In |table_73|, all detected ingress connections are sent to the ingress
pipeline. Since the connection was already accepted by the egress pipeline, all
remaining egress connections are sent to normal flood'n'learn switching
in |table_94|.
::
table=73, priority=100,reg6=0x284,dl_dst=fa:16:3e:a4:22:10 actions=load:0x1->NXM_NX_REG5[],resubmit(,81)
table=73, priority=100,reg6=0x284,dl_dst=fa:16:3e:8c:84:13 actions=load:0x1->NXM_NX_REG5[],resubmit(,81)
table=73, priority=100,reg6=0x284,dl_dst=fa:16:3e:24:57:c7 actions=load:0x2->NXM_NX_REG5[],resubmit(,81)
table=73, priority=100,reg6=0x284,dl_dst=fa:16:3e:8c:84:14 actions=load:0x2->NXM_NX_REG5[],resubmit(,81)
table=73, priority=90,ct_state=+new-est,reg5=0x1 actions=ct(commit,zone=NXM_NX_REG6[0..15]),resubmit(,91)
table=73, priority=90,ct_state=+new-est,reg5=0x2 actions=ct(commit,zone=NXM_NX_REG6[0..15]),resubmit(,91)
table=73, priority=80,reg5=0x1 actions=resubmit(,94)
table=73, priority=80,reg5=0x2 actions=resubmit(,94)
table=73, priority=0 actions=drop
|table_81| is similar to |table_71|: it allows basic ingress traffic for
obtaining an IP address and for ARP queries. Note that the VLAN tag must be
removed by adding ``strip_vlan`` to the actions list prior to injecting the
packet directly into the port. Untracked packets are sent to obtain conntrack
information.
::
table=81, priority=100,arp,reg5=0x1 actions=strip_vlan,output:1
table=81, priority=100,arp,reg5=0x2 actions=strip_vlan,output:2
table=81, priority=100,icmp6,reg5=0x1,icmp_type=130 actions=strip_vlan,output:1
table=81, priority=100,icmp6,reg5=0x1,icmp_type=131 actions=strip_vlan,output:1
table=81, priority=100,icmp6,reg5=0x1,icmp_type=132 actions=strip_vlan,output:1
table=81, priority=100,icmp6,reg5=0x1,icmp_type=135 actions=strip_vlan,output:1
table=81, priority=100,icmp6,reg5=0x1,icmp_type=136 actions=strip_vlan,output:1
table=81, priority=100,icmp6,reg5=0x2,icmp_type=130 actions=strip_vlan,output:2
table=81, priority=100,icmp6,reg5=0x2,icmp_type=131 actions=strip_vlan,output:2
table=81, priority=100,icmp6,reg5=0x2,icmp_type=132 actions=strip_vlan,output:2
table=81, priority=100,icmp6,reg5=0x2,icmp_type=135 actions=strip_vlan,output:2
table=81, priority=100,icmp6,reg5=0x2,icmp_type=136 actions=strip_vlan,output:2
table=81, priority=95,udp,reg5=0x1,tp_src=67,tp_dst=68 actions=strip_vlan,output:1
table=81, priority=95,udp6,reg5=0x1,tp_src=547,tp_dst=546 actions=strip_vlan,output:1
table=81, priority=95,udp,reg5=0x2,tp_src=67,tp_dst=68 actions=strip_vlan,output:2
table=81, priority=95,udp6,reg5=0x2,tp_src=547,tp_dst=546 actions=strip_vlan,output:2
table=81, priority=90,ct_state=-trk,ip,reg5=0x1 actions=ct(table=82,zone=NXM_NX_REG6[0..15])
table=81, priority=90,ct_state=-trk,ipv6,reg5=0x1 actions=ct(table=82,zone=NXM_NX_REG6[0..15])
table=81, priority=90,ct_state=-trk,ip,reg5=0x2 actions=ct(table=82,zone=NXM_NX_REG6[0..15])
table=81, priority=90,ct_state=-trk,ipv6,reg5=0x2 actions=ct(table=82,zone=NXM_NX_REG6[0..15])
table=81, priority=80,ct_state=+trk,reg5=0x1 actions=resubmit(,82)
table=81, priority=80,ct_state=+trk,reg5=0x2 actions=resubmit(,82)
table=81, priority=0 actions=drop
Similarly to |table_72|, |table_82| accepts established and related
connections. In this case we allow all ICMP traffic coming from
``security group 1`` which is in this case only ``port 1``.
The first four flows match on the IP addresses, and the
next two flows match on the ICMP protocol.
These six flows define conjunction flows, and the next two define actions for
them.
::
table=82, priority=71,ct_state=+est-rel-rpl,ip,reg6=0x284,nw_src=192.168.0.1 actions=conjunction(18,1/2)
table=82, priority=71,ct_state=+est-rel-rpl,ip,reg6=0x284,nw_src=10.0.0.1 actions=conjunction(18,1/2)
table=82, priority=71,ct_state=+new-est,ip,reg6=0x284,nw_src=192.168.0.1 actions=conjunction(19,1/2)
table=82, priority=71,ct_state=+new-est,ip,reg6=0x284,nw_src=10.0.0.1 actions=conjunction(19,1/2)
table=82, priority=71,ct_state=+est-rel-rpl,icmp,reg5=0x2 actions=conjunction(18,2/2)
table=82, priority=71,ct_state=+new-est,icmp,reg5=0x2 actions=conjunction(19,2/2)
table=82, priority=71,conj_id=18,ct_state=+est-rel-rpl,ip,reg5=0x2 actions=strip_vlan,output:2
table=82, priority=71,conj_id=19,ct_state=+new-est,ip,reg5=0x2 actions=ct(commit,zone=NXM_NX_REG6[0..15]),strip_vlan,output:2,resubmit(,92)
table=82, priority=50,ct_state=+inv+trk actions=resubmit(,93)
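The way the two dimensions combine can be sketched as follows: a conjunction
ID matches only when at least one flow from every dimension matches the
packet. This is a simplified model with exact-match fields and a hypothetical
function name, not the OVS classifier itself.

```python
def matched_conj_ids(packet, flows):
    """flows: iterable of (match_dict, conj_id, dimension, n_dimensions)."""
    hit = {}
    for match, conj_id, dim, n_dims in flows:
        if all(packet.get(key) == value for key, value in match.items()):
            hit.setdefault((conj_id, n_dims), set()).add(dim)
    return {cid for (cid, n_dims), dims in hit.items() if len(dims) == n_dims}


NEW = "+new-est"
example_flows = [
    # Dimension 1 of 2: remote group addresses.
    ({"ct_state": NEW, "reg6": 0x284, "nw_src": "192.168.0.1"}, 19, 1, 2),
    ({"ct_state": NEW, "reg6": 0x284, "nw_src": "10.0.0.1"}, 19, 1, 2),
    # Dimension 2 of 2: the rest of the rule, per port.
    ({"ct_state": NEW, "proto": "icmp", "reg5": 2}, 19, 2, 2),
]

from_group = matched_conj_ids(
    {"ct_state": NEW, "reg6": 0x284, "nw_src": "192.168.0.1",
     "proto": "icmp", "reg5": 2}, example_flows)
from_stranger = matched_conj_ids(
    {"ct_state": NEW, "reg6": 0x284, "nw_src": "192.168.0.9",
     "proto": "icmp", "reg5": 2}, example_flows)
wrong_proto = matched_conj_ids(
    {"ct_state": NEW, "reg6": 0x284, "nw_src": "192.168.0.1",
     "proto": "tcp", "reg5": 2}, example_flows)
```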
There are some more security group rules with remote group IDs. Next
we look at the TCP related ones. An excerpt of the flows that correspond to
those rules follows:
::
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=0x60/0xffe0 actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=0x60/0xffe0 actions=conjunction(23,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=0x40/0xfff0 actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=0x40/0xfff0 actions=conjunction(23,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=0x58/0xfff8 actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=0x58/0xfff8 actions=conjunction(23,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=0x54/0xfffc actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=0x54/0xfffc actions=conjunction(23,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=0x52/0xfffe actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=0x52/0xfffe actions=conjunction(23,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=80 actions=conjunction(22,2/2),conjunction(14,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=81 actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=80 actions=conjunction(23,2/2),conjunction(15,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=81 actions=conjunction(23,2/2)
Only dimension 2/2 flows are shown here, as the others are similar to
the previous ICMP example. There are many more flows, but only those
covering ports 64 to 127 are shown for brevity.
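The ``tp_dst`` value/mask pairs above are what a port range looks like once it
is split into aligned power-of-two blocks. The helper below is a minimal
sketch of such a split; its name is hypothetical and it is not necessarily the
routine Neutron uses.

```python
def port_range_to_masks(port_min, port_max):
    """Split [port_min, port_max] into (value, mask) pairs of 16-bit ports."""
    pairs = []
    port = port_min
    while port <= port_max:
        # Largest power-of-two block that starts at an aligned address...
        size = port & -port if port else 1 << 16
        # ...and still fits into the remaining range.
        while size > port_max - port + 1:
            size >>= 1
        pairs.append((port, 0xFFFF & ~(size - 1)))
        port += size
    return pairs


above_80 = port_range_to_masks(81, 127)
below_80 = port_range_to_masks(64, 79)
```

For ports 81 to 127 this yields 81 exactly, then ``0x52/0xfffe``,
``0x54/0xfffc``, ``0x58/0xfff8`` and ``0x60/0xffe0``, which are the masked
matches shown in the excerpt.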
The conjunction IDs 22 and 23 correspond to packets from security
group 1, and the conjunction IDs 14 and 15 correspond to those from
security group 2. These flows are from the following security group rules,
::
- can receive TCP packets from security group 1
- can receive TCP packets to port 80 from security group 2
and these rules have been processed by ``merge_port_ranges`` into:
::
- can receive TCP packets to port != 80 from security group 1
- can receive TCP packets to port 80 from security group 1 or 2
before translating to flows so that there is only one matching flow
even when the TCP destination port is 80.
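The effect of this rearrangement can be sketched as a split of overlapping
port ranges into disjoint ones, each carrying every conjunction ID that applies
to it. The function below is an illustration under that assumption, using the
conjunction IDs from the flows above (22 on every port, 14 only on port 80);
it is not the actual ``merge_port_ranges`` implementation.

```python
def merge_port_ranges_sketch(rules):
    """rules: list of (port_min, port_max, conj_id) -> disjoint ranges."""
    # Every range start and every (end + 1) is a possible boundary.
    points = sorted({p for lo, hi, _ in rules for p in (lo, hi + 1)})
    merged = []
    for start, end in zip(points, points[1:]):
        ids = {cid for lo, hi, cid in rules if lo <= start and end - 1 <= hi}
        if not ids:
            continue  # gap not covered by any rule
        if merged and merged[-1][1] == start - 1 and merged[-1][2] == ids:
            # Same IDs as the adjacent range: extend it.
            merged[-1] = (merged[-1][0], end - 1, ids)
        else:
            merged.append((start, end - 1, ids))
    return merged


disjoint = merge_port_ranges_sketch([
    (1, 65535, 22),   # TCP, any port
    (80, 80, 14),     # TCP, port 80 only
])
```

With disjoint ranges, a packet to port 80 hits exactly one dimension 2/2 flow,
and that flow carries both conjunction actions.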
The remaining rule is L4 protocol agnostic.
::
table=82, priority=70,ct_state=+est-rel-rpl,ip,reg5=0x2 actions=conjunction(24,2/2)
table=82, priority=70,ct_state=+new-est,ip,reg5=0x2 actions=conjunction(25,2/2)
Any IP packet that matches the previous TCP flows matches one of these
flows, but the corresponding security group rules have different
remote group IDs. Unlike the above TCP example, there's no convenient
way of expressing ``protocol != TCP`` or ``icmp_code != 1``. So the
OVS firewall uses a different priority than the previous TCP flows so
as not to mix them up.
The mechanism for dropping connections that are not allowed anymore is the
same as in |table_72|.
::
table=82, priority=50,ct_mark=0x1,reg5=0x1 actions=resubmit(,93)
table=82, priority=50,ct_mark=0x1,reg5=0x2 actions=resubmit(,93)
table=82, priority=50,ct_state=+est-rel+rpl,ct_zone=644,ct_mark=0,reg5=0x1 actions=strip_vlan,output:1
table=82, priority=50,ct_state=+est-rel+rpl,ct_zone=644,ct_mark=0,reg5=0x2 actions=strip_vlan,output:2
table=82, priority=50,ct_state=-new-est+rel-inv,ct_zone=644,ct_mark=0,reg5=0x1 actions=strip_vlan,output:1
table=82, priority=50,ct_state=-new-est+rel-inv,ct_zone=644,ct_mark=0,reg5=0x2 actions=strip_vlan,output:2
table=82, priority=40,ct_state=-est,reg5=0x1 actions=resubmit(,93)
table=82, priority=40,ct_state=+est,reg5=0x1 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))
table=82, priority=40,ct_state=-est,reg5=0x2 actions=resubmit(,93)
table=82, priority=40,ct_state=+est,reg5=0x2 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))
table=82, priority=0 actions=drop
.. note::

   Conntrack zones on a single node are now based on the network to which
   a port is plugged in. That makes a difference between traffic on hypervisor
   only and east-west traffic. For example, if a port has a VIP that was
   migrated to a port on a different node, then the new port won't contain
   conntrack information about previous traffic that happened with VIP.

OVS firewall integration points
-------------------------------
There are four tables to which packets are sent after going through the OVS
firewall pipeline. The tables can be used by other mechanisms that are supposed
to work with the OVS firewall, typically L2 agent extensions.
Egress pipeline
~~~~~~~~~~~~~~~
Packets are sent to |table_91| and |table_94| when they are considered accepted
by the egress pipeline, and they will be processed so that they are forwarded
to their destination by being submitted to a NORMAL action, which results in
Ethernet flood/learn processing.
Two tables are used to differentiate between the first packets of a connection
and the following packets. This was introduced for performance reasons to
allow the logging extension to only log the first packets of a connection.
Only the first accepted packet of each connection session will go to |table_91|
and the following ones will go to |table_94|.
Note that |table_91| merely resubmits to |table_94|, which contains the actual
NORMAL action; this provides a single place where the NORMAL action can
be overridden by other components (currently used by the ``networking-bagpipe``
driver for ``networking-bgpvpn``).
Ingress pipeline
~~~~~~~~~~~~~~~~
The first packet of each connection accepted by the ingress pipeline is sent
to |table_92|. The default action in this table is DROP because at this point
the packets have already been delivered to their destination port. This
integration point is essentially provided for the logging extension.
Packets are sent to |table_93| if processing by the ingress filtering
concluded that they should be dropped.
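The hand-off to these tables can be summarised in a small sketch. The constant
names come from the table labels used in this document; the function itself
and its arguments are hypothetical and only restate the prose above.

```python
ACCEPTED_EGRESS_TRAFFIC = 91
ACCEPTED_INGRESS_TRAFFIC = 92
DROPPED_TRAFFIC = 93
ACCEPTED_EGRESS_TRAFFIC_NORMAL = 94


def integration_table(direction, accepted, first_packet):
    """Table a packet is handed to after filtering (simplified)."""
    if not accepted:
        return DROPPED_TRAFFIC
    if direction == "egress":
        # The first packet takes a detour through table 91, which then
        # resubmits to table 94; later packets go to table 94 directly.
        if first_packet:
            return ACCEPTED_EGRESS_TRAFFIC
        return ACCEPTED_EGRESS_TRAFFIC_NORMAL
    # Ingress: only the first packet of a connection is sent to table 92;
    # later packets are just delivered to the port.
    return ACCEPTED_INGRESS_TRAFFIC if first_packet else None


first_egress = integration_table("egress", True, True)
later_egress = integration_table("egress", True, False)
first_ingress = integration_table("ingress", True, True)
dropped = integration_table("ingress", False, False)
```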
Upgrade path from iptables hybrid driver
----------------------------------------
During an upgrade, the agent will need to re-plug each instance's tap device
into the integration bridge while trying to not break existing connections. One
of the following approaches can be taken:
1) Pause the running instance in order to prevent a short period of time where
its network interface does not have firewall rules. This can happen due to
the firewall driver calling OVS to obtain information about the OVS port. Once
the instance is paused and no traffic is flowing, we can delete the qvo
interface from the integration bridge, detach the tap device from the qbr
bridge and plug the tap device back into the integration bridge. Once this is
done, the firewall rules are applied for the OVS tap interface and the instance
is resumed from its paused state.
2) Set drop rules for the instance's tap interface, delete the qbr bridge and
related veths, plug the tap device into the integration bridge, apply the OVS
firewall rules and finally remove the drop rules for the instance.
3) Compute nodes can be upgraded one at a time. A free node can be switched to
use the OVS firewall, and instances from other nodes can be live-migrated to
it. Once the first node is evacuated, its firewall driver can then be
switched to the OVS driver.
.. |table_0| replace:: ``table 0`` (LOCAL_SWITCHING)
.. |table_60| replace:: ``table 60`` (TRANSIENT)
.. |table_71| replace:: ``table 71`` (BASE_EGRESS)
.. |table_72| replace:: ``table 72`` (RULES_EGRESS)
.. |table_73| replace:: ``table 73`` (ACCEPT_OR_INGRESS)
.. |table_81| replace:: ``table 81`` (BASE_INGRESS)
.. |table_82| replace:: ``table 82`` (RULES_INGRESS)
.. |table_91| replace:: ``table 91`` (ACCEPTED_EGRESS_TRAFFIC)
.. |table_92| replace:: ``table 92`` (ACCEPTED_INGRESS_TRAFFIC)
.. |table_93| replace:: ``table 93`` (DROPPED_TRAFFIC)
.. |table_94| replace:: ``table 94`` (ACCEPTED_EGRESS_TRAFFIC_NORMAL)