docs: updated information in the troubleshooting guide

"service" replaced to "systemctl" and add information about OVN

Change-Id: Ib45005e0d88d9fa0eeee504000c948a55afa770d
Co-authored-by: Dmitriy Chubinidze <dcu995@gmail.com>
Signed-off-by: Ivan Anfimov <lazekteam@gmail.com>
Ivan Anfimov
2025-09-06 23:10:47 +00:00
committed by Dmitriy Chubinidze
parent a8f49dcb78
commit 0b671b172a
3 changed files with 132 additions and 75 deletions

View File

@@ -14,14 +14,14 @@ required for the OpenStack control plane to function properly.
This does not cover any networking related to instance connectivity.
These instructions assume an OpenStack-Ansible installation using LXC
containers, VXLAN overlay, and the ML2/OVS driver.
containers, a VXLAN overlay for the ML2/OVS driver, and a Geneve overlay for the ML2/OVN driver.
Network List
------------
1. ``HOST_NET`` (Physical Host Management and Access to Internet)
2. ``CONTAINER_NET`` (LXC container network used OpenStack Services)
3. ``OVERLAY_NET`` (VXLAN overlay network)
2. ``MANAGEMENT_NET`` (LXC container network used by OpenStack services)
3. ``OVERLAY_NET`` (VXLAN overlay network for OVS, Geneve overlay network for OVN)
Useful network utilities and commands:
@@ -36,7 +36,6 @@ Useful network utilities and commands:
# iptables -nL
# arping [-c NUMBER] [-d] <TARGET_IP_ADDRESS>
Troubleshooting host-to-host traffic on HOST_NET
------------------------------------------------
@@ -64,8 +63,8 @@ tagged sub-interface, or in some cases the bridge interface:
valid_lft forever preferred_lft forever
...
Troubleshooting host-to-host traffic on CONTAINER_NET
-----------------------------------------------------
Troubleshooting host-to-host traffic on MANAGEMENT_NET
------------------------------------------------------
Perform the following checks:
@@ -116,6 +115,17 @@ physical interface or tagged-subinterface:
99999999_eth1
...
You can also use the :command:`ip` command to display bridges:
.. code-block:: console
# ip link show master br-mgmt
12: bond0.100@bond0: ... master br-mgmt state UP mode DEFAULT group default qlen 1000
....
51: 11111111_eth1_eth1@if3: ... master br-mgmt state UP mode DEFAULT group default qlen 1000
....
Troubleshooting host-to-host traffic on OVERLAY_NET
---------------------------------------------------
@@ -147,7 +157,7 @@ Checking services
~~~~~~~~~~~~~~~~~
You can check the status of an OpenStack service by accessing every controller
node and running the :command:`service <SERVICE_NAME> status`.
node and running the :command:`systemctl status <SERVICE_NAME>`.
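For example, to check the Networking service API on a controller node (the unit
name shown is illustrative and may differ in your deployment):

.. code-block:: console

   # systemctl status neutron-server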
See the following links for additional information to verify OpenStack
services:
@@ -156,9 +166,11 @@ services:
- `Image service (glance) <https://docs.openstack.org/glance/latest/install/verify.html>`_
- `Compute service (nova) <https://docs.openstack.org/nova/latest/install/verify.html>`_
- `Networking service (neutron) <https://docs.openstack.org/neutron/latest/install/verify.html>`_
- `Block Storage service <https://docs.openstack.org/cinder/latest/install/cinder-verify.html>`_
- `Block Storage service (cinder) <https://docs.openstack.org/cinder/latest/install/cinder-verify.html>`_
- `Object Storage service (swift) <https://docs.openstack.org/swift/latest/install/verify.html>`_
For some useful commands to manage LXC, see :ref:`command-line-reference`.
Restarting services
~~~~~~~~~~~~~~~~~~~
@@ -173,87 +185,105 @@ The following table lists the commands to restart an OpenStack service.
* - OpenStack service
- Commands
* - Image service
- .. code-block:: console
# service glance-api restart
# systemctl restart glance-api
* - Compute service (controller node)
- .. code-block:: console
# service nova-api-os-compute restart
# service nova-consoleauth restart
# service nova-scheduler restart
# service nova-conductor restart
# service nova-api-metadata restart
# service nova-novncproxy restart (if using novnc)
# service nova-spicehtml5proxy restart (if using spice)
# systemctl restart nova-api-os-compute
# systemctl restart nova-scheduler
# systemctl restart nova-conductor
# systemctl restart nova-api-metadata
# systemctl restart nova-novncproxy (if using noVNC)
# systemctl restart nova-spicehtml5proxy (if using SPICE)
* - Compute service (compute node)
- .. code-block:: console
# service nova-compute restart
* - Networking service
# systemctl restart nova-compute
* - Networking service (controller node, for OVS)
- .. code-block:: console
# service neutron-server restart
# service neutron-dhcp-agent restart
# service neutron-l3-agent restart
# service neutron-metadata-agent restart
# service neutron-openvswitch-agent restart
# systemctl restart neutron-server
# systemctl restart neutron-dhcp-agent
# systemctl restart neutron-l3-agent
# systemctl restart neutron-metadata-agent
# systemctl restart neutron-openvswitch-agent
* - Networking service (compute node, for OVS)
- .. code-block:: console
# service neutron-openvswitch-agent restart
# systemctl restart neutron-openvswitch-agent
* - Networking service (controller node, for OVN)
- .. code-block:: console
# systemctl restart neutron-server
# systemctl restart neutron-ovn-maintenance-worker
# systemctl restart neutron-periodic-workers
* - Networking service (compute node, for OVN)
- .. code-block:: console
# systemctl restart neutron-ovn-metadata-agent
* - Block Storage service
- .. code-block:: console
# service cinder-api restart
# service cinder-backup restart
# service cinder-scheduler restart
# service cinder-volume restart
# systemctl restart cinder-api
# systemctl restart cinder-backup
# systemctl restart cinder-scheduler
# systemctl restart cinder-volume
* - Shared Filesystems service
- .. code-block:: console
# service manila-api restart
# service manila-data restart
# service manila-share restart
# service manila-scheduler restart
# systemctl restart manila-api
# systemctl restart manila-data
# systemctl restart manila-share
# systemctl restart manila-scheduler
* - Object Storage service
- .. code-block:: console
# service swift-account-auditor restart
# service swift-account-server restart
# service swift-account-reaper restart
# service swift-account-replicator restart
# service swift-container-auditor restart
# service swift-container-server restart
# service swift-container-reconciler restart
# service swift-container-replicator restart
# service swift-container-sync restart
# service swift-container-updater restart
# service swift-object-auditor restart
# service swift-object-expirer restart
# service swift-object-server restart
# service swift-object-reconstructor restart
# service swift-object-replicator restart
# service swift-object-updater restart
# service swift-proxy-server restart
# systemctl restart swift-account-auditor
# systemctl restart swift-account-server
# systemctl restart swift-account-reaper
# systemctl restart swift-account-replicator
# systemctl restart swift-container-auditor
# systemctl restart swift-container-server
# systemctl restart swift-container-reconciler
# systemctl restart swift-container-replicator
# systemctl restart swift-container-sync
# systemctl restart swift-container-updater
# systemctl restart swift-object-auditor
# systemctl restart swift-object-expirer
# systemctl restart swift-object-server
# systemctl restart swift-object-reconstructor
# systemctl restart swift-object-replicator
# systemctl restart swift-object-updater
# systemctl restart swift-proxy-server
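After restarting a service, you can verify that it is active again and review
its recent logs, for example (the unit name shown is illustrative):

.. code-block:: console

   # systemctl is-active nova-compute
   # journalctl -u nova-compute --since "10 minutes ago"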
Troubleshooting instance connectivity issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section will focus on troubleshooting general instance (VM)
This section will focus on troubleshooting general instance
connectivity. This does not cover any networking related
to instance connectivity. This is assuming a OpenStack-Ansible install using
LXC containers, VXLAN overlay and the ML2/OVS driver.
to instance connectivity. This assumes an OpenStack-Ansible installation using LXC
containers, a VXLAN overlay for the ML2/OVS driver, and a Geneve overlay for the ML2/OVN driver.
**Data flow example**
**Data flow example (for OVS)**
.. code-block:: console
COMPUTE NODE
+-------------+ +-------------+
+->"If VXLAN"+->+ *br vxlan +--->+ bond#.#00 +---+
+->"If VXLAN"+->+ *br vxlan +--->+ bond0.#00 +---+
| +-------------+ +-------------+ |
+-------------+ | +-----------------+
Instance +---> | qbr bridge |++ +-->| physical network|
@@ -266,21 +296,37 @@ LXC containers, VXLAN overlay and the ML2/OVS driver.
NETWORK NODE
+-------------+ +-------------+
+->"If VXLAN"+->+ *bond#.#00 +--->+ *br vxlan +-->
+->"If VXLAN"+->+ *bond#.#00 +--->+ *br-vxlan +-->
| +-------------+ +-------------+ |
+----------------+ | +-------------+
|physical network|++ +--->+| qbr bridge |+--> Neutron DHCP/Router
+----------------+ | +-------------+
| +-------------+ +-------------+ |
+->"If VLAN"+->+ bond1 +--->+ br vlan +-->
+->"If VLAN"+->+ bond1 +--->+ br-vlan +-->
+-------------+ +-------------+
**Data flow example (for OVN)**
.. code-block:: console
COMPUTE NODE
+-------------+ +-------------+
+->"If Geneve"+->+ *br-vxlan +--->+ bond0.#00 +---+
| +-------------+ +-------------+ |
+-------------+ | +-----------------+
Instance +---> | br-int |++ +-->| physical network|
+-------------+ | +-----------------+
| +-------------+ +-------------+ |
+->"If VLAN"+->+ br-vlan +--->+ bond1 +-----+
+-------------+ +-------------+
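On a compute node, the Open vSwitch bridges used by both drivers can be
inspected directly, for example:

.. code-block:: console

   # ovs-vsctl show
   # ovs-vsctl list-ports br-int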
Preliminary troubleshooting questions to answer:
------------------------------------------------
- Which compute node is hosting the VM in question?
- Which compute node is hosting the instance in question? (see the example after this list)
- Which interface is used for provider network traffic?
- Which interface is used for VXLAN overlay?
- Which interface is used for VXLAN (Geneve) overlay?
- Is there a connectivity issue ingress to the instance?
- Is there a connectivity issue egress from the instance?
- What is the source address of the traffic?
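A quick way to answer the first questions is the OpenStack client, run from a
utility container (``<INSTANCE_NAME>`` is a placeholder):

.. code-block:: console

   # openstack server show <INSTANCE_NAME> | grep -iE 'host|addresses'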
@@ -343,6 +389,11 @@ No:
on compute and network nodes.
- Check ``security-group-rules``,
consider adding allow ICMP rule for testing.
When using OVN, additionally check the following (see the example commands after this list):
- Check that ``ovn-controller`` is running on all nodes.
- Verify that ``ovn-northd`` is running and the OVN databases are healthy.
- Ensure ``neutron-ovn-metadata-agent`` is active.
- Review the logs for ``ovn-controller`` and ``ovn-northd``.
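A minimal set of commands for these checks might look like the following (the
exact unit names depend on your distribution and deployment):

.. code-block:: console

   # systemctl status ovn-controller
   # systemctl status ovn-northd
   # systemctl status neutron-ovn-metadata-agent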
Yes:
- Good! The instance can ping its intended gateway.
@@ -359,7 +410,7 @@ Yes:
Do not continue until the instance can reach its gateway.
If VXLAN:
If VXLAN (Geneve):
Does the physical interface show link and are all VLANs properly trunked
across the physical network?
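To confirm link state on the physical interface, for example (interface names
are illustrative):

.. code-block:: console

   # ethtool bond0 | grep "Link detected"
   # ip -d link show bond0.100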
@@ -377,24 +428,24 @@ Yes:
Do not continue until physical network is properly configured.
Are VXLAN VTEP addresses able to ping each other?
Are VXLAN (Geneve) VTEP addresses able to ping each other?
No:
- Check ``br-vxlan`` interface on Compute and Network nodes
- Check the ``br-vxlan`` interface on compute and network nodes.
- Check veth pairs between containers and Linux bridges on the host.
- Check that OVS bridges contain the proper interfaces
on compute and network nodes.
Yes:
- Check ml2 config file for local VXLAN IP
and other VXLAN configuration settings.
- Check the ML2 config file for the local VXLAN (Geneve) IP
and other VXLAN (Geneve) configuration settings (see the example after this list).
- Check VTEP learning method (multicast or l2population):
- If multicast, make sure the physical switches are properly
allowing and distributing multicast traffic.
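For the ML2/OVS agent, the local tunnel endpoint is normally configured as
``local_ip``; a quick check might look like this (the configuration path can
vary between deployments):

.. code-block:: console

   # grep -r local_ip /etc/neutron/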
.. important::
Do not continue until VXLAN endpoints have reachability to each other.
Do not continue until VXLAN (Geneve) endpoints can reach each other.
Does the instance's IP address ping from the network's DHCP namespace
or other instances in the same network?
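A sketch of such a test from a network node, assuming the default ``qdhcp-``
namespace naming:

.. code-block:: console

   # ip netns list
   # ip netns exec qdhcp-<NETWORK_ID> ping -c 3 <INSTANCE_IP>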
@@ -410,7 +461,8 @@ No:
- Check syslogs.
- Check Neutron Open vSwitch agent logs.
- Check that Bridge Forwarding Database (fdb) contains the proper
entries on both the compute and Neutron agent container.
entries on both the compute and Neutron agent container
(``ovs-appctl fdb/show br-int``).
Yes:
- Good! This suggests that the instance received its IP address
@@ -434,7 +486,14 @@ No:
- Check ``security-group-rules``,
consider adding allow ICMP rule for testing.
- Check that Bridge Forwarding Database (fdb) contains
the proper entries on both the compute and Neutron agent container.
the proper entries on both the compute and Neutron agent container
(``ovs-appctl fdb/show br-int``).
When using OVN, additionally check the following (see the example after this list):
- Check that ``ovn-controller`` is running on all nodes.
- Verify that ``ovn-northd`` is running and the OVN databases are healthy.
- Ensure ``neutron-ovn-metadata-agent`` is active.
- Review the logs for ``ovn-controller`` and ``ovn-northd``.
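For OVN, chassis registration and database contents can be reviewed from the
node hosting the OVN databases, for example:

.. code-block:: console

   # ovn-sbctl show
   # ovn-nbctl show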
Yes:
- Good! The instance can ping its intended gateway.
@@ -446,7 +505,7 @@ Yes:
- Check upstream routes, NATs or ``access-control-lists``.
Diagnose Image service issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``glance-api`` service handles API interactions and the image store.
@@ -463,12 +522,12 @@ identity problems:
registry is working.
For an example and more information, see `Verify operation
<https://docs.openstack.org/glance/latest/install/verify.html>`_.
<https://docs.openstack.org/glance/latest/install/verify.html>`_
and `Manage Images
<https://docs.openstack.org/glance/latest/admin/manage-images.html>`_.
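A basic check from a utility container is to list the available images with
the OpenStack client, for example:

.. code-block:: console

   # openstack image list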
Failed security hardening after host kernel upgrade from version 3.13
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ubuntu kernel packages newer than version 3.13 contain a change in
module naming from ``nf_conntrack`` to ``br_netfilter``. After
@@ -522,11 +581,9 @@ To clear facts for a single host, find its file within
a JSON file that is named after its hostname. The facts for that host
will be regenerated on the next playbook run.
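For example, assuming the default OpenStack-Ansible fact cache directory
(adjust the path if your deployment differs):

.. code-block:: console

   # rm /etc/openstack_deploy/ansible_facts/<HOSTNAME>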
Failed Ansible playbooks during an upgrade
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Container networking issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -613,6 +670,5 @@ Example inventory restore process.
cd -
rm -rf /tmp/inventory_restore
At the completion of this operation, the inventory will be restored to the
earlier version.

View File

@@ -1,3 +1,5 @@
.. _command-line-reference:
======================
Command Line Reference
======================
@@ -38,4 +40,3 @@ The following are some useful commands to manage LXC:
.. code-block:: shell-session
# lxc-stop --name container_name

View File

@@ -11,7 +11,7 @@ All service passwords are defined and stored as Ansible variables in
OpenStack-Ansible.
This allows the operator to store passwords in an encrypted format using
`Ansible Vault <https://docs.ansible.com/ansible/latest/vault_guide/index.html>`_
or define them as a lookup to `SOPS <https://getsops.io/>`_ or `OpenBao <https://openbao.org/>`_
or define them as a lookup to `SOPS <https://getsops.io/>`_ or `OpenBao <https://openbao.org/>`_.
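For example, the secrets file can be encrypted in place with Ansible Vault
(the path shown is the OpenStack-Ansible default):

.. code-block:: console

   # ansible-vault encrypt /etc/openstack_deploy/user_secrets.yml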
Typical password change processes include the following steps: