
Merge "Migration document update."

tags/6.0.0.0b1
Zuul
commit 0a6a14de3c

1 changed file with 252 additions and 118 deletions:
doc/source/install/migration.rst (+252, -118)

Migration Strategy
==================

This document details an in-place migration strategy from ML2/OVS to ML2/OVN
in either ovs-firewall or ovs-hybrid mode for a TripleO OpenStack deployment.

For non-TripleO deployments, please refer to the file ``migration/README.rst``
and the ansible playbook ``migration/migrate-to-ovn.yml``.

Overview
--------

The migration process is orchestrated through the shell script
``ovn_migration.sh``, which is provided with networking-ovn.

The administrator uses ``ovn_migration.sh`` to perform the readiness steps
and the migration itself from the undercloud node. The readiness steps, such
as host inventory production and DHCP and MTU adjustments, prepare the
environment for the procedure.

Subsequent steps start the migration via Ansible.

Plan for a 24-hour wait after the setup-mtu-t1 step to allow VMs to catch up
with the new MTU size. The default neutron ML2/OVS configuration has a
dhcp_lease_duration of 86400 seconds (24 hours).

Also, if there are instances using static IP assignment, the administrator
should be ready to update the MTU of those instances to the new value, which
is 8 bytes less than the ML2/OVS (VXLAN) MTU value. For example, the typical
1500 MTU underlay that gives VXLAN tenant networks an MTU of 1450 bytes will
require an MTU of 1442 under Geneve. Likewise, over the same underlay a GRE
encapsulated tenant network would use an MTU of 1458, but again 1442 under
Geneve.
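
For instance, inside a statically configured Linux guest the new MTU could be
applied with ``ip link`` (a sketch; the interface name ``eth0`` is
hypothetical, and the change must also be made persistent through the guest's
own network configuration):

.. code-block:: console

   # Apply the Geneve-compatible MTU (current VXLAN MTU minus 8 bytes).
   $ sudo ip link set dev eth0 mtu 1442
   # Confirm the interface picked up the new value.
   $ ip link show eth0 | grep mtu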

If there are instances which use DHCP but don't support lease update during
the T1 period, the administrator will need to reboot them to ensure that the
MTU is updated inside those instances.
 

Steps for migration
-------------------

Perform the following steps in the overcloud/undercloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Ensure that you have updated to the latest openstack/neutron version.
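
   For example, on an RPM-based overcloud node you could check the installed
   neutron package version with (a sketch; the node and package names assume
   a standard TripleO/RDO deployment):

   .. code-block:: console

      [heat-admin@overcloud-controller-0 ~]$ rpm -q openstack-neutron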
 
Perform the following steps in the undercloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
1. Install python-networking-ovn-migration-tool.

  .. code-block:: console

     yum install python-networking-ovn-migration-tool
 
2. Create a working directory on the undercloud, and copy the ansible
   playbooks.

  .. code-block:: console

     mkdir ~/ovn_migration
     cd ~/ovn_migration
     cp -rfp /usr/share/ansible/networking-ovn-migration/playbooks .
 
3. Create or edit the ``overcloud-deploy-ovn.sh`` script in your ``$HOME``.
   This script must source your stackrc file, and then execute an ``openstack
   overcloud deploy`` with your original deployment parameters, plus the
   following environment files, added to the end of the command in the
   following order:
 
  When your network topology is DVR and your compute nodes have connectivity
  to the external network:

  .. code-block:: console

     -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-dvr-ha.yaml \
     -e $HOME/ovn-extras.yaml

  When your compute nodes don't have external connectivity and you don't use
  DVR:

  .. code-block:: console

     -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-ha.yaml \
     -e $HOME/ovn-extras.yaml

Make sure that all users have execution privileges on the script, because it
will be called by ovn_migration.sh/ansible during the migration process.

  .. code-block:: console

      $ chmod a+x ~/overcloud-deploy-ovn.sh
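
For reference, ``overcloud-deploy-ovn.sh`` could look roughly like the
following minimal sketch, here using the DVR variant of the environment
files; the bracketed placeholder stands in for your original deployment
parameters and is not part of this guide:

  .. code-block:: console

     #!/bin/bash
     # Sketch of overcloud-deploy-ovn.sh: source the undercloud credentials,
     # then re-run the original deploy command with the OVN environment
     # files appended at the end.
     source ~/stackrc
     openstack overcloud deploy \
         --templates \
         <your original deployment parameters> \
         -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-dvr-ha.yaml \
         -e $HOME/ovn-extras.yaml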
 
4. To configure the parameters of your migration, set the environment
   variables that will be used by ``ovn_migration.sh``. You can skip setting
   any values matching the defaults.
 
    * STACKRC_FILE - must point to your stackrc file in your undercloud.
      Default: ~/stackrc

    * OVERCLOUDRC_FILE - must point to your overcloudrc file in your
      undercloud.
      Default: ~/overcloudrc

    * OVERCLOUD_OVN_DEPLOY_SCRIPT - must point to the deploy script described
      in step 3 above.
      Default: ~/overcloud-deploy-ovn.sh

    * PUBLIC_NETWORK_NAME - Name of your public network.
      Default: 'public'.
      To support migration validation, this network must have available
      floating IPs, and those floating IPs must be pingable from the
      undercloud. If that's not possible, please set VALIDATE_MIGRATION
      to False.
 
    * IMAGE_NAME - Name/ID of the glance image to use for booting a test
      server.
      Default: 'cirros'.
      It will be automatically downloaded during the pre-validation /
      post-validation process.

    * VALIDATE_MIGRATION - Create migration resources to validate the
      migration. The migration script, before starting the migration, boots a
      server and validates that the server is reachable after the migration.
      Default: True.

    * SERVER_USER_NAME - User name to use for logging in to the migration
      instances.
      Default: 'cirros'.

    * DHCP_RENEWAL_TIME - DHCP renewal time in seconds to configure in the
      DHCP agent configuration file.
      Default: 30
 
    .. warning::

       Please note that VALIDATE_MIGRATION requires enough quota (2
       available floating IPs, 2 networks, 2 subnets, 2 instances,
       and 2 routers as admin).

    For example:

    .. code-block:: console

       $ export PUBLIC_NETWORK_NAME=my-public-network
       $ ovn_migration.sh .........


5. Run ``ovn_migration.sh generate-inventory`` to generate the inventory
   file ``hosts_for_migration`` and ``ansible.cfg``. Please review
   ``hosts_for_migration`` for correctness.

  .. code-block:: console

       $ ovn_migration.sh generate-inventory
 
6. Run ``ovn_migration.sh setup-mtu-t1``. This lowers the T1 parameter
   of the internal neutron DHCP servers by configuring ``dhcp_renewal_time``
   in /var/lib/config-data/puppet-generated/neutron/etc/neutron/dhcp_agent.ini
   on all the nodes where the DHCP agent is running.
 
  .. code-block:: console

       $ ovn_migration.sh setup-mtu-t1
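
  To confirm the change, you can grep the generated configuration on a node
  that runs the DHCP agent (a sketch; the node name is hypothetical and the
  value shown assumes the default DHCP_RENEWAL_TIME of 30):

  .. code-block:: console

     [heat-admin@overcloud-controller-0 ~]$ sudo grep dhcp_renewal_time \
           /var/lib/config-data/puppet-generated/neutron/etc/neutron/dhcp_agent.ini
     dhcp_renewal_time = 30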
 
7. If you are using VXLAN or GRE tenant networking, **wait at least 24 hours**
   before continuing. This will allow VMs to catch up with the new MTU size
   of the next step.

  .. warning::

        If you are using VXLAN or GRE networks, this 24-hour wait step is
        critical. If you are using VLAN tenant networks you can proceed to
        the next step without delay.

  .. warning::

        If you have any instance with static IP assignment on VXLAN or
        GRE tenant networks, you must manually modify the configuration of
        those instances to set the new Geneve MTU, which is the current VXLAN
        MTU minus 8 bytes. For instance, if the VXLAN-based MTU was 1450,
        change it to 1442. If your instances don't honor the T1 parameter of
        DHCP, they will need to be rebooted.
 
  .. note::

        The 24-hour wait is based on the default configuration. The actual
        time depends on the dhcp_renewal_time parameter in
        /var/lib/config-data/puppet-generated/neutron/etc/neutron/dhcp_agent.ini
        and the dhcp_lease_duration parameter in
        /var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf
        (which defaults to 86400 seconds).

  .. note::

        Please note that migrating a deployment which uses VLAN for
        tenant/project networks is not recommended at this time because of a
        bug in core OVN; full support is being worked on here:
        https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347594.html
 
  One way to verify that the T1 parameter has propagated to existing VMs
  is to connect to one of the compute nodes and run ``tcpdump`` over one
  of the VM taps attached to a tenant network. If T1 propagation was a
  success, you should see that requests happen on an interval of
  approximately 30 seconds.

  .. code-block:: console

        [heat-admin@overcloud-novacompute-0 ~]$ sudo tcpdump -i tap52e872c2-e6 port 67 or port 68 -n
        tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
        ...
        13:17:56.241156 IP 192.168.99.5.bootpc > 192.168.99.3.bootps: BOOTP/DHCP, Request from fa:16:3e:6b:41:3d, length 300
        13:17:56.249899 IP 192.168.99.3.bootps > 192.168.99.5.bootpc: BOOTP/DHCP, Reply, length 355
 
  .. note::

        This verification is not possible with cirros VMs. The cirros
        udhcpc implementation does not obey DHCP option 58 (T1). Please
        try this verification on a port that belongs to a full Linux VM.
        We recommend checking all the different types of workloads your
        system runs (Windows, different flavors of Linux, etc.).
 
8. Run ``ovn_migration.sh reduce-mtu``.

   This lowers the MTU of the pre-migration VXLAN and GRE networks. The tool
   will ignore non-VXLAN/GRE networks, so if you use VLAN for tenant networks
   it is safe to run: the step will simply do nothing.

   .. code-block:: console

        $ ovn_migration.sh reduce-mtu
 

   This step will go network by network reducing the MTU, and tagging with
   ``adapted_mtu`` the networks which have already been handled.
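
   To check which networks the tool has already handled, you can look for
   the ``adapted_mtu`` tag (a sketch; ``my-tenant-network`` is a
   hypothetical network name):

   .. code-block:: console

      $ openstack network list --tags adapted_mtu
      $ openstack network show my-tenant-network -c mtu -c tags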

9. Make TripleO prepare the new container images for OVN.

   If your deployment didn't have a containers-prepare-parameter.yaml, you can
   create one with:

   .. code-block:: console

       $ test -f $HOME/containers-prepare-parameter.yaml || \
             openstack tripleo container image prepare default \
                   --output-env-file $HOME/containers-prepare-parameter.yaml

   If you had to create the file, please make sure it's included at the end
   of your $HOME/overcloud-deploy-ovn.sh and $HOME/overcloud-deploy.sh.

   Change the neutron_driver in the containers-prepare-parameter.yaml file to
   ovn:

   .. code-block:: console

      $ sed -i -E 's/neutron_driver:([ ]\w+)/neutron_driver: ovn/' $HOME/containers-prepare-parameter.yaml

   You can verify with:

   .. code-block:: console

      $ grep neutron_driver containers-prepare-parameter.yaml
      neutron_driver: ovn

   Then update the images:
 
   .. code-block:: console

      $ openstack tripleo container image prepare \
           --environment-file /home/stack/containers-prepare-parameter.yaml

   .. note::

      It's important to provide the full path to your
      containers-prepare-parameter.yaml, otherwise the command will finish
      very quickly and won't work (the current version doesn't seem to
      output any error).

   TripleO will validate the containers and push them to your local
   registry.

10. Run ``ovn_migration.sh start-migration`` to kick-start the migration
    process.

   .. code-block:: console

       $ ovn_migration.sh start-migration

   Under the hood, this is what will happen:

    * Create pre-migration resources (network and VM) to validate the
      existing deployment and the final migration.

    * Update the overcloud stack to deploy OVN alongside the reference
      implementation services, using a temporary bridge "br-migration"
      instead of br-int.

    * Start the migration process:

      1. generate the OVN north db by running the neutron-ovn-db-sync util
      2. clone the existing resources from br-int to br-migration, so that
         OVN finds the same resource UUIDs over br-migration
      3. re-assign ovn-controller to br-int instead of br-migration
      4. clean up network namespaces (fip, snat, qrouter, qdhcp)
      5. remove any unnecessary patch ports on br-int
      6. remove the br-tun and br-migration ovs bridges
      7. delete qr-*, ha-* and qg-* ports from br-int (via neutron netns
         cleanup)

    * Delete neutron agents and neutron HA internal networks from the
      database via API.

    * Validate connectivity on pre-migration resources.

    * Delete pre-migration resources.

    * Create post-migration resources.

    * Validate connectivity on post-migration resources.

    * Clean up post-migration resources.

    * Re-run the deployment tool to update OVN on br-int.
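
   After the process finishes, a quick sanity check that neutron is now
   backed by OVN (a sketch; with ML2/OVN the traditional L3, DHCP and OVS
   agents should no longer appear in the agent list):

   .. code-block:: console

      $ source ~/overcloudrc
      $ openstack network agent list
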
 Migration is complete !!!
