Migration document update.

This update makes the migration document clearer.

Change-Id: I83701640625bdc40fa60a7c38bbb39278a155648
changes/21/609721/6
Miguel Angel Ajo 4 years ago
parent c6a5543f9e
commit 8798518263
      doc/source/install/migration.rst

Migration Strategy
==================

This document details an in-place migration strategy from ML2/OVS to ML2/OVN
in either ovs-firewall or ovs-hybrid mode for a TripleO OpenStack deployment.

For non-TripleO deployments, please refer to the file ``migration/README.rst``
and the ansible playbook ``migration/migrate-to-ovn.yml``.

Overview
--------

The migration process is orchestrated through the shell script
ovn_migration.sh, which is provided with networking-ovn.

The administrator uses ovn_migration.sh to perform readiness steps
and migration from the undercloud node.

The readiness steps, such as host inventory production, DHCP and MTU
adjustments, prepare the environment for the procedure.

Subsequent steps start the migration via Ansible.

Plan for a 24-hour wait after the setup-mtu-t1 step to allow VMs to catch up
with the new MTU size. The default neutron ML2/OVS configuration has a
dhcp_lease_duration of 86400 seconds (24h).

Also, if there are instances using static IP assignment, the administrator
should be ready to update the MTU of those instances to the new value of 8
bytes less than the ML2/OVS (VXLAN) MTU value. For example, the typical
1500 MTU network value that makes VXLAN tenant networks use 1450 bytes of MTU
will need to change to 1442 under Geneve. Or under the same overlay network,
a GRE encapsulated tenant network would use a 1458 MTU, but again a 1442 MTU
for Geneve.
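
For a Linux guest the interface MTU can be updated with ``ip link`` (a sketch;
the interface name is illustrative, and the change is not persistent, so the
guest's own network configuration should be updated accordingly):

.. code-block:: console

   # inside the guest; eth0 is an illustrative interface name
   $ sudo ip link set dev eth0 mtu 1442
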

If there are instances which use DHCP but don't support lease update during
the T1 period, the administrator will need to reboot them to ensure that the
MTU is updated inside those instances.
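
For example, such an instance can be rebooted from the undercloud with the
overcloud credentials loaded (the server name below is a placeholder):

.. code-block:: console

   $ source ~/overcloudrc
   $ openstack server reboot my-dhcp-instance
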
Steps for migration
-------------------

Perform the following steps in the overcloud/undercloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Ensure that you have updated to the latest openstack/neutron version.

Perform the following steps in the undercloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Install python-networking-ovn-migration-tool.

   .. code-block:: console

      yum install python-networking-ovn-migration-tool

2. Create a working directory on the undercloud, and copy the ansible
   playbooks:

   .. code-block:: console

      mkdir ~/ovn_migration
      cd ~/ovn_migration
      cp -rfp /usr/share/ansible/networking-ovn-migration/playbooks .

3. Create or edit the ``overcloud-deploy-ovn.sh`` script in your ``$HOME``.
   This script must source your stackrc file, and then execute an ``openstack
   overcloud deploy`` with your original deployment parameters, plus
   the following environment files, added to the end of the command
   in the following order:

   When your network topology is DVR and your compute nodes have connectivity
   to the external network:

   .. code-block:: console

      -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-dvr-ha.yaml \
      -e $HOME/ovn-extras.yaml

   When your compute nodes don't have external connectivity and you don't use
   DVR:

   .. code-block:: console

      -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-ha.yaml \
      -e $HOME/ovn-extras.yaml

   Make sure that all users have execution privileges on the script, because
   it will be called by ovn_migration.sh/ansible during the migration process.

   .. code-block:: console

      $ chmod a+x ~/overcloud-deploy-ovn.sh
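
   As a rough sketch, assuming the DVR case above, such a script could look
   like this; the ``--templates`` path and the other arguments must match
   your original deployment:

   .. code-block:: bash

      #!/bin/bash
      # Illustrative sketch only: include your original overcloud deploy
      # parameters before the OVN environment files below.
      set -e
      source ~/stackrc
      openstack overcloud deploy \
          --templates /usr/share/openstack-tripleo-heat-templates \
          -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-dvr-ha.yaml \
          -e $HOME/ovn-extras.yaml
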

4. To configure the parameters of your migration you can set the environment
   variables that will be used by ``ovn_migration.sh``. You can skip setting
   any values matching the defaults.

   * STACKRC_FILE - must point to your stackrc file in your undercloud.
     Default: ~/stackrc

   * OVERCLOUDRC_FILE - must point to your overcloudrc file in your
     undercloud.
     Default: ~/overcloudrc

   * OVERCLOUD_OVN_DEPLOY_SCRIPT - must point to the deployment script
     described in step 3.
     Default: ~/overcloud-deploy-ovn.sh

   * PUBLIC_NETWORK_NAME - Name of your public network.
     Default: 'public'.
     To support migration validation, this network must have available
     floating IPs, and those floating IPs must be pingable from the
     undercloud. If that's not possible please configure VALIDATE_MIGRATION
     to False.

   * IMAGE_NAME - Name/ID of the glance image to use for booting a test
     server.
     Default: 'cirros'.
     It will be automatically downloaded during the pre-validation /
     post-validation process.

   * VALIDATE_MIGRATION - Create migration resources to validate the
     migration. The migration script, before starting the migration, boots a
     server and validates that the server is reachable after the migration.
     Default: True.

   * SERVER_USER_NAME - User name to use for logging into the migration
     instances.
     Default: 'cirros'.

   * DHCP_RENEWAL_TIME - DHCP renewal time in seconds to configure in the
     DHCP agent configuration file.
     Default: 30

   .. warning::

      Please note that VALIDATE_MIGRATION requires enough quota (2
      available floating ips, 2 networks, 2 subnets, 2 instances,
      and 2 routers as admin).

   For example:

   .. code-block:: console

      $ export PUBLIC_NETWORK_NAME=my-public-network
      $ ovn_migration.sh .........

5. Run ``ovn_migration.sh generate-inventory`` to generate the inventory
   file - ``hosts_for_migration`` and ``ansible.cfg``. Please review
   ``hosts_for_migration`` for correctness.

   .. code-block:: console

      $ ovn_migration.sh generate-inventory

6. Run ``ovn_migration.sh setup-mtu-t1``. This lowers the T1 parameter
   of the internal neutron DHCP servers configuring the ``dhcp_renewal_time``
   in /var/lib/config-data/puppet-generated/neutron/etc/neutron/dhcp_agent.ini
   in all the nodes where the DHCP agent is running.

   .. code-block:: console

      $ ovn_migration.sh setup-mtu-t1

7. If you are using VXLAN or GRE tenant networking, ``wait at least 24 hours``
   before continuing. This will allow VMs to catch up with the new MTU size
   of the next step.

   .. warning::

      If you are using VXLAN or GRE networks, this 24-hour wait step is
      critical. If you are using VLAN tenant networks you can proceed to the
      next step without delay.

   .. warning::

      If you have any instance with static IP assignment on VXLAN or
      GRE tenant networks, you must manually modify the configuration of those
      instances to configure the new geneve MTU, which is the current VXLAN
      MTU minus 8 bytes. For instance, if the VXLAN-based MTU was 1450, change
      it to 1442.

   .. note::

      24 hours is the time based on default configuration. It actually depends
      on the
      /var/lib/config-data/puppet-generated/neutron/etc/neutron/dhcp_agent.ini
      dhcp_renewal_time and the
      /var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf
      dhcp_lease_duration parameters (dhcp_lease_duration defaults to 86400
      seconds).
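
   One way to check the values currently in effect on a controller node, using
   the paths from the note above:

   .. code-block:: console

      $ sudo grep dhcp_renewal_time /var/lib/config-data/puppet-generated/neutron/etc/neutron/dhcp_agent.ini
      $ sudo grep dhcp_lease_duration /var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf
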

   .. note::

      Please note that migrating a deployment which uses VLAN for
      tenant/project networks is not recommended at this time because of a bug
      in core ovn; full support is being worked out here:
      https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347594.html

   One way to verify that the T1 parameter has propagated to existing VMs
   is to connect to one of the compute nodes, and run ``tcpdump`` over one
   of the VM taps attached to a tenant network. If T1 propagation was a
   success, you should see that requests happen on an interval of
   approximately 30 seconds.

   .. code-block:: console

      [heat-admin@overcloud-novacompute-0 ~]$ sudo tcpdump -i tap52e872c2-e6 port 67 or port 68 -n
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      13:17:56.241156 IP 192.168.99.5.bootpc > 192.168.99.3.bootps: BOOTP/DHCP, Request from fa:16:3e:6b:41:3d, length 300
      13:17:56.249899 IP 192.168.99.3.bootps > 192.168.99.5.bootpc: BOOTP/DHCP, Reply, length 355

   .. note::

      This verification is not possible with cirros VMs. The cirros
      udhcpc implementation does not obey DHCP option 58 (T1). Please
      try this verification on a port that belongs to a full Linux VM.
      We recommend checking all the different types of workloads your
      system runs (Windows, different flavors of Linux, etc.).

8. Run ``ovn_migration.sh reduce-mtu``.

   This lowers the MTU of the pre-migration VXLAN and GRE networks. The
   tool will ignore non-VXLAN/GRE networks, so if you use VLAN for tenant
   networks it is fine if you find this step not doing anything.

   .. code-block:: console

      $ ovn_migration.sh reduce-mtu

   This step will go network by network reducing the MTU, and tagging with
   ``adapted_mtu`` the networks which have been already handled.
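
   Assuming ``adapted_mtu`` is applied as a regular neutron tag, the networks
   already handled can be listed with:

   .. code-block:: console

      $ openstack network list --tags adapted_mtu
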

9. Have TripleO prepare the new container images for OVN.

   If your deployment didn't have a containers-prepare-parameter.yaml, you can
   create one with:

   .. code-block:: console

      $ test -f $HOME/containers-prepare-parameter.yaml || \
          openstack tripleo container image prepare default \
          --output-env-file $HOME/containers-prepare-parameter.yaml

   If you had to create the file, please make sure it's included at the end of
   your $HOME/overcloud-deploy-ovn.sh and $HOME/overcloud-deploy.sh scripts.

   Change the neutron_driver in the containers-prepare-parameter.yaml file to
   ovn:

   .. code-block:: console

      $ sed -i -E 's/neutron_driver:([ ]\w+)/neutron_driver: ovn/' $HOME/containers-prepare-parameter.yaml

   You can verify with:

   .. code-block:: console

      $ grep neutron_driver $HOME/containers-prepare-parameter.yaml
      neutron_driver: ovn

   Then update the images:

   .. code-block:: console

      $ openstack tripleo container image prepare \
          --environment-file /home/stack/containers-prepare-parameter.yaml

   .. note::

      It's important to provide the full path to your
      containers-prepare-parameter.yaml, otherwise the command will finish
      very quickly and won't work (the current version doesn't seem to output
      any error).

   TripleO will validate the containers and push them to your local
   registry.

10. Run ``ovn_migration.sh start-migration`` to kick start the migration
    process.

    .. code-block:: console

       $ ovn_migration.sh start-migration

    Under the hood, this is what will happen:

    * Create pre-migration resources (network and VM) to validate the existing
      deployment and the final migration.
    * Update the overcloud stack to deploy OVN alongside the reference
      implementation services using a temporary bridge "br-migration" instead
      of br-int.
    * Start the migration process:

      1. generate the OVN north db by running the neutron-ovn-db-sync util
      2. clone the existing resources from br-int to br-migration, so that OVN
         can find the same resource UUIDs over br-migration
      3. re-assign ovn-controller to br-int instead of br-migration
      4. clean up network namespaces (fip, snat, qrouter, qdhcp)
      5. remove any unnecessary patch ports on br-int
      6. remove the br-tun and br-migration ovs bridges
      7. delete qr-*, ha-* and qg-* ports from br-int (via neutron netns
         cleanup)

    * Delete neutron agents and neutron HA internal networks from the database
      via the API.
    * Validate connectivity on pre-migration resources.
    * Delete pre-migration resources.
    * Create post-migration resources.
    * Validate connectivity on post-migration resources.
    * Clean up post-migration resources.
    * Re-run the deployment tool to update OVN on br-int.
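
    When the run finishes, a suggested sanity check (not part of the script)
    on a compute node is to confirm that ovn-controller has its OVN southbound
    connection configured and that the temporary bridges are gone:

    .. code-block:: console

       $ sudo ovs-vsctl get Open_vSwitch . external_ids:ovn-remote
       $ sudo ovs-vsctl list-br   # br-tun and br-migration should be absent
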
Migration is complete !!!
