Merge "[OVN] Admin procedure for duplicated or deleted OVN agents"

Zuul 2023-04-27 14:21:54 +00:00 committed by Gerrit Code Review
commit 1ee0e38588


@ -43,3 +43,82 @@ This problem is not unique to OVN but is amplified due to the possible larger
size of the Geneve header compared to other common tunneling protocols (VXLAN).
If you are using VMs as compute nodes, make sure that you either lower the MTU
size on the virtual interface or enable fragmentation on it.

Duplicated or deleted OVN agents
--------------------------------

The "ovn-controller" process is the local controller daemon for OVN. It runs
on every host that belongs to the OVN network and registers the host in the
OVN Southbound database by creating the corresponding "Chassis" and
"Chassis_Private" registers. When the process is stopped gracefully, it
deletes both registers. Neutron uses these registers to track the OVN agents.

.. code-block:: console

   $ openstack network agent list -c ID -c "Agent Type" -c Host -c Alive -c State
   +--------------------------------------+------------------------------+--------+-------+-------+
   | ID                                   | Agent Type                   | Host   | Alive | State |
   +--------------------------------------+------------------------------+--------+-------+-------+
   | a55c8d85-2071-4452-92cb-95d15c29bde7 | OVN Controller Gateway agent | u20ovn | :-)   | UP    |
   | 62e29a01-a0ac-55c9-b4ec-e223d5c90853 | OVN Metadata agent           | u20ovn | :-)   | UP    |
   | ce9a1471-79c1-4472-adfc-9e5ce86eba07 | OVN Controller Gateway agent | u20ovn | XXX   | DOWN  |
   | 3755938f-9aac-4f08-a1ab-32fcff56d1ce | OVN Metadata agent           | u20ovn | XXX   | DOWN  |
   +--------------------------------------+------------------------------+--------+-------+-------+


If the OVS "system-id" changes during a system upgrade, the "Chassis" and
"Chassis_Private" registers are created again, but with a different UUID.
If the previous registers are not deleted (the deletion normally happens when
the "ovn-controller" process is stopped gracefully), Neutron shows duplicated
agents for the same host. Only one agent of each type is alive; the other one
is down because its "Chassis_Private.nb_cfg_timestamp" is no longer updated.
The administrator should then manually delete the stale registers from the
OVN Southbound database. For example:

* List the "Chassis" registers, filtering by hostname and name (OVS
  "system-id"):

  .. code-block:: console

     $ sudo ovn-sbctl list Chassis | grep name
     hostname            : u20ovn
     name                : "a55c8d85-2071-4452-92cb-95d15c29bde7"
     hostname            : u20ovn
     name                : "ce9a1471-79c1-4472-adfc-9e5ce86eba07"

* Delete the stale "Chassis" register:

  .. code-block:: console

     $ sudo ovn-sbctl destroy Chassis ce9a1471-79c1-4472-adfc-9e5ce86eba07

* List the "Chassis_Private" registers, filtering by name:

  .. code-block:: console

     $ sudo ovn-sbctl list Chassis_Private | grep name
     name                : "a55c8d85-2071-4452-92cb-95d15c29bde7"
     name                : "ce9a1471-79c1-4472-adfc-9e5ce86eba07"

* Delete the stale "Chassis_Private" register:

  .. code-block:: console

     $ sudo ovn-sbctl destroy Chassis_Private ce9a1471-79c1-4472-adfc-9e5ce86eba07

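
When many hosts are involved, spotting duplicates by eye in the "Chassis"
listing is error prone. The following Python sketch is not part of Neutron or
OVN; it only parses text in the ``key : value`` layout shown in the listing
step above and reports host names that have more than one "Chassis" register.
The function names ``parse_chassis`` and ``duplicated_hosts`` are hypothetical,
and the sketch assumes each ``hostname`` line is followed by its ``name`` line,
as in the output above. It does not decide which register is stale; that still
has to be checked against the agent list.

```python
from collections import defaultdict


def parse_chassis(listing):
    """Return (hostname, name) pairs from `ovn-sbctl list Chassis | grep name`."""
    pairs = []
    hostname = None
    for line in listing.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line (for example, the prompt line)
        key = key.strip()
        value = value.strip().strip('"')
        if key == "hostname":
            hostname = value
        elif key == "name" and hostname is not None:
            pairs.append((hostname, value))
            hostname = None
    return pairs


def duplicated_hosts(pairs):
    """Map each hostname with more than one Chassis register to its names."""
    by_host = defaultdict(list)
    for hostname, name in pairs:
        by_host[hostname].append(name)
    return {host: names for host, names in by_host.items() if len(names) > 1}


# Sample text copied from the listing step of the procedure.
SAMPLE_CHASSIS = """\
hostname            : u20ovn
name                : "a55c8d85-2071-4452-92cb-95d15c29bde7"
hostname            : u20ovn
name                : "ce9a1471-79c1-4472-adfc-9e5ce86eba07"
"""

if __name__ == "__main__":
    for host, names in duplicated_hosts(parse_chassis(SAMPLE_CHASSIS)).items():
        print(host, names)
```

In practice the text would come from the command output saved to a file or
piped on standard input rather than from the embedded sample.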

If the host name also changes during the system upgrade, the Neutron agent
list can show entries with different host names, but the older ones will
also be down. The procedure to remove them is the same.
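
Because stale agents may appear under a different host name, grouping by host
is not always enough; listing every OVN agent that is not alive gives a more
complete set of candidates. The Python sketch below assumes the agent list was
exported with ``openstack network agent list -f json`` and that the JSON keys
match the column titles shown earlier ("ID", "Agent Type", "Host", "Alive").
Whether "Alive" is emitted as a boolean or as the ``:-)`` marker is an
assumption here, so both forms are accepted. The function names are
hypothetical. An agent that is down is only a candidate: it may belong to a
host whose "ovn-controller" is temporarily stopped.

```python
import json


def is_alive(value):
    """Accept a JSON boolean or the ":-)" marker used in the table view."""
    return value is True or value == ":-)"


def stale_agent_candidates(agents_json):
    """Return the OVN agents that are reported as not alive."""
    agents = json.loads(agents_json)
    return [
        {"ID": a["ID"], "Agent Type": a["Agent Type"], "Host": a["Host"]}
        for a in agents
        if a["Agent Type"].startswith("OVN") and not is_alive(a["Alive"])
    ]


# Sample data built from the agent list shown earlier in this section.
SAMPLE_AGENTS = json.dumps([
    {"ID": "a55c8d85-2071-4452-92cb-95d15c29bde7",
     "Agent Type": "OVN Controller Gateway agent",
     "Host": "u20ovn", "Alive": True},
    {"ID": "62e29a01-a0ac-55c9-b4ec-e223d5c90853",
     "Agent Type": "OVN Metadata agent",
     "Host": "u20ovn", "Alive": True},
    {"ID": "ce9a1471-79c1-4472-adfc-9e5ce86eba07",
     "Agent Type": "OVN Controller Gateway agent",
     "Host": "u20ovn", "Alive": False},
    {"ID": "3755938f-9aac-4f08-a1ab-32fcff56d1ce",
     "Agent Type": "OVN Metadata agent",
     "Host": "u20ovn", "Alive": False},
])

if __name__ == "__main__":
    for agent in stale_agent_candidates(SAMPLE_AGENTS):
        print(agent["Host"], agent["Agent Type"], agent["ID"])
```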

It can also happen that, during a node decommission, the "Chassis" register
is deleted but the "Chassis_Private" one is not. In that case, the OVN agent
list shows the corresponding agents with the following message:
"('Chassis' register deleted)". Again, the procedure is the same: the
administrator should manually delete the orphaned register from the OVN
Southbound database. Neutron will receive this event and delete the
associated OVN agents.
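
An orphaned "Chassis_Private" register is one whose name has no matching
"Chassis" register, so it can be found with a set difference over the two
listings. The Python sketch below is an illustration under that assumption,
not a Neutron or OVN tool; ``extract_names`` and ``orphaned_private`` are
hypothetical helper names, and the input is text in the ``name : "..."``
layout produced by the ``grep name`` listings shown above.

```python
def extract_names(listing):
    """Collect the values of the `name` lines in an ovn-sbctl listing."""
    names = set()
    for line in listing.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            names.add(value.strip().strip('"'))
    return names


def orphaned_private(chassis_listing, private_listing):
    """Return Chassis_Private names that have no Chassis with the same name."""
    orphans = extract_names(private_listing) - extract_names(chassis_listing)
    return sorted(orphans)


# Illustrative input: the "Chassis" register of the second entry is gone.
SAMPLE_CHASSIS_ONLY = """\
hostname            : u20ovn
name                : "a55c8d85-2071-4452-92cb-95d15c29bde7"
"""

SAMPLE_PRIVATE = """\
name                : "a55c8d85-2071-4452-92cb-95d15c29bde7"
name                : "ce9a1471-79c1-4472-adfc-9e5ce86eba07"
"""

if __name__ == "__main__":
    for name in orphaned_private(SAMPLE_CHASSIS_ONLY, SAMPLE_PRIVATE):
        print(name)
```

Each name reported this way is a candidate for the
``ovn-sbctl destroy Chassis_Private`` step of the procedure.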