docs: update information in the troubleshooting guide

"service" replaced to "systemctl" and add information about OVN

Change-Id: Ib45005e0d88d9fa0eeee504000c948a55afa770d
Co-authored-by: Dmitriy Chubinidze <dcu995@gmail.com>
Signed-off-by: Ivan Anfimov <lazekteam@gmail.com>
commit 0b671b172a (parent a8f49dcb78)
Author: Ivan Anfimov
Date: 2025-09-06 23:10:47 +00:00
Committed by: Dmitriy Chubinidze
3 changed files with 132 additions and 75 deletions

View File

@@ -14,14 +14,14 @@ required for the OpenStack control plane to function properly.
This does not cover any networking related to instance connectivity.
These instructions assume an OpenStack-Ansible installation using LXC
-containers, VXLAN overlay, and the ML2/OVS driver.
+containers, VXLAN overlay for ML2/OVS, and Geneve overlay for the ML2/OVN driver.

Network List
------------

1. ``HOST_NET`` (Physical Host Management and Access to Internet)
-2. ``CONTAINER_NET`` (LXC container network used OpenStack Services)
-3. ``OVERLAY_NET`` (VXLAN overlay network)
+2. ``MANAGEMENT_NET`` (LXC container network used by OpenStack services)
+3. ``OVERLAY_NET`` (VXLAN overlay network for OVS, Geneve overlay network for OVN)

Useful network utilities and commands:
@@ -36,7 +36,6 @@ Useful network utilities and commands:
# iptables -nL
# arping [-c NUMBER] [-d] <TARGET_IP_ADDRESS>
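
For example, ``arping`` can confirm that an address on ``HOST_NET`` answers
ARP and reveal duplicate-address problems (the target address below is
illustrative):

.. code-block:: console

   # arping -c 3 -d 172.29.236.10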
Troubleshooting host-to-host traffic on HOST_NET
------------------------------------------------
@@ -64,8 +63,8 @@ tagged sub-interface, or in some cases the bridge interface:
valid_lft forever preferred_lft forever
...

-Troubleshooting host-to-host traffic on CONTAINER_NET
+Troubleshooting host-to-host traffic on MANAGEMENT_NET
------------------------------------------------------

Perform the following checks:
@@ -116,6 +115,17 @@ physical interface or tagged-subinterface:
99999999_eth1
...

+You can also use the ``ip`` command to display bridge members:
+
+.. code-block:: console
+
+   # ip link show master br-mgmt
+   12: bond0.100@bond0: ... master br-mgmt state UP mode DEFAULT group default qlen 1000
+   ...
+   51: 11111111_eth1@if3: ... master br-mgmt state UP mode DEFAULT group default qlen 1000
+   ...
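
The iproute2 ``bridge`` utility gives a comparable per-port view (shown as
an optional alternative; output format varies by version):

.. code-block:: console

   # bridge link show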
Troubleshooting host-to-host traffic on OVERLAY_NET
---------------------------------------------------
@@ -147,7 +157,7 @@ Checking services
~~~~~~~~~~~~~~~~~

You can check the status of an OpenStack service by accessing every controller
-node and running the :command:`service <SERVICE_NAME> status`.
+node and running the :command:`systemctl status <SERVICE_NAME>`.
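
For example, to check the Image service API and review its recent log
output (unit names here mirror the restart table below):

.. code-block:: console

   # systemctl status glance-api
   # journalctl -u glance-api --since "30 min ago"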
See the following links for additional information to verify OpenStack
services:
@@ -156,9 +166,11 @@ services:
- `Image service (glance) <https://docs.openstack.org/glance/latest/install/verify.html>`_
- `Compute service (nova) <https://docs.openstack.org/nova/latest/install/verify.html>`_
- `Networking service (neutron) <https://docs.openstack.org/neutron/latest/install/verify.html>`_
-- `Block Storage service <https://docs.openstack.org/cinder/latest/install/cinder-verify.html>`_
+- `Block Storage service (cinder) <https://docs.openstack.org/cinder/latest/install/cinder-verify.html>`_
- `Object Storage service (swift) <https://docs.openstack.org/swift/latest/install/verify.html>`_

+For some useful commands to manage LXC, see :ref:`command-line-reference`.
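
For instance, containers can be listed and entered with the standard LXC
tools (a quick sketch; the container name is a placeholder):

.. code-block:: console

   # lxc-ls -f
   # lxc-attach --name <CONTAINER_NAME>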
Restarting services
~~~~~~~~~~~~~~~~~~~
@@ -173,87 +185,105 @@ The following table lists the commands to restart an OpenStack service.
   * - OpenStack service
     - Commands
   * - Image service
     - .. code-block:: console

-          # service glance-api restart
+          # systemctl restart glance-api
   * - Compute service (controller node)
     - .. code-block:: console

-          # service nova-api-os-compute restart
-          # service nova-consoleauth restart
-          # service nova-scheduler restart
-          # service nova-conductor restart
-          # service nova-api-metadata restart
-          # service nova-novncproxy restart (if using novnc)
-          # service nova-spicehtml5proxy restart (if using spice)
+          # systemctl restart nova-api-os-compute
+          # systemctl restart nova-scheduler
+          # systemctl restart nova-conductor
+          # systemctl restart nova-api-metadata
+          # systemctl restart nova-novncproxy (if using noVNC)
+          # systemctl restart nova-spicehtml5proxy (if using SPICE)
   * - Compute service (compute node)
     - .. code-block:: console

-          # service nova-compute restart
+          # systemctl restart nova-compute
-   * - Networking service
+   * - Networking service (controller node, for OVS)
     - .. code-block:: console

-          # service neutron-server restart
-          # service neutron-dhcp-agent restart
-          # service neutron-l3-agent restart
-          # service neutron-metadata-agent restart
-          # service neutron-openvswitch-agent restart
+          # systemctl restart neutron-server
+          # systemctl restart neutron-dhcp-agent
+          # systemctl restart neutron-l3-agent
+          # systemctl restart neutron-metadata-agent
+          # systemctl restart neutron-openvswitch-agent
   * - Networking service (compute node)
     - .. code-block:: console

-          # service neutron-openvswitch-agent restart
+          # systemctl restart neutron-openvswitch-agent
+   * - Networking service (controller node, for OVN)
+     - .. code-block:: console
+
+          # systemctl restart neutron-server
+          # systemctl restart neutron-ovn-maintenance-worker
+          # systemctl restart neutron-periodic-workers
+   * - Networking service (compute node, for OVN)
+     - .. code-block:: console
+
+          # systemctl restart neutron-ovn-metadata-agent
   * - Block Storage service
     - .. code-block:: console

-          # service cinder-api restart
-          # service cinder-backup restart
-          # service cinder-scheduler restart
-          # service cinder-volume restart
+          # systemctl restart cinder-api
+          # systemctl restart cinder-backup
+          # systemctl restart cinder-scheduler
+          # systemctl restart cinder-volume
   * - Shared Filesystems service
     - .. code-block:: console

-          # service manila-api restart
-          # service manila-data restart
-          # service manila-share restart
-          # service manila-scheduler restart
+          # systemctl restart manila-api
+          # systemctl restart manila-data
+          # systemctl restart manila-share
+          # systemctl restart manila-scheduler
   * - Object Storage service
     - .. code-block:: console

-          # service swift-account-auditor restart
-          # service swift-account-server restart
-          # service swift-account-reaper restart
-          # service swift-account-replicator restart
-          # service swift-container-auditor restart
-          # service swift-container-server restart
-          # service swift-container-reconciler restart
-          # service swift-container-replicator restart
-          # service swift-container-sync restart
-          # service swift-container-updater restart
-          # service swift-object-auditor restart
-          # service swift-object-expirer restart
-          # service swift-object-server restart
-          # service swift-object-reconstructor restart
-          # service swift-object-replicator restart
-          # service swift-object-updater restart
-          # service swift-proxy-server restart
+          # systemctl restart swift-account-auditor
+          # systemctl restart swift-account-server
+          # systemctl restart swift-account-reaper
+          # systemctl restart swift-account-replicator
+          # systemctl restart swift-container-auditor
+          # systemctl restart swift-container-server
+          # systemctl restart swift-container-reconciler
+          # systemctl restart swift-container-replicator
+          # systemctl restart swift-container-sync
+          # systemctl restart swift-container-updater
+          # systemctl restart swift-object-auditor
+          # systemctl restart swift-object-expirer
+          # systemctl restart swift-object-server
+          # systemctl restart swift-object-reconstructor
+          # systemctl restart swift-object-replicator
+          # systemctl restart swift-object-updater
+          # systemctl restart swift-proxy-server
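
After restarting a service, it is worth confirming that the unit actually
came back healthy, for example (substitute any unit from the table above):

.. code-block:: console

   # systemctl is-active neutron-server
   # journalctl -xeu neutron-server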
Troubleshooting instance connectivity issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-This section will focus on troubleshooting general instance (VM)
-connectivity communication. This does not cover any networking related
-to instance connectivity. This is assuming a OpenStack-Ansible install using
-LXC containers, VXLAN overlay and the ML2/OVS driver.
+This section focuses on troubleshooting general instance (VM)
+connectivity. These instructions assume an OpenStack-Ansible installation using LXC
+containers, VXLAN overlay for ML2/OVS, and Geneve overlay for the ML2/OVN driver.
-**Data flow example**
+**Data flow example (for OVS)**

.. code-block:: console

                     COMPUTE NODE
     +-------------+ +-------------+
-    +->"If VXLAN"+->+ *br vxlan +--->+ bond#.#00 +---+
+    +->"If VXLAN"+->+ *br vxlan +--->+ bond0.#00 +---+
     | +-------------+ +-------------+ |
     +-------------+ | +-----------------+
     Instance +---> | qbr bridge |++ +-->| physical network|
@@ -266,21 +296,37 @@ LXC containers, VXLAN overlay and the ML2/OVS driver.
                     NETWORK NODE
     +-------------+ +-------------+
-    +->"If VXLAN"+->+ *bond#.#00 +--->+ *br vxlan +-->
+    +->"If VXLAN"+->+ *bond#.#00 +--->+ *br-vxlan +-->
     | +-------------+ +-------------+ |
     +----------------+ | +-------------+
     |physical network|++ +--->+| qbr bridge |+--> Neutron DHCP/Router
     +----------------+ | +-------------+
     | +-------------+ +-------------+ |
-    +->"If VLAN"+->+ bond1 +--->+ br vlan +-->
+    +->"If VLAN"+->+ bond1 +--->+ br-vlan +-->
     +-------------+ +-------------+
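
To verify that the bridges shown in this path exist on a given node, list
them directly (a minimal sketch; bridge names follow the defaults above):

.. code-block:: console

   # ovs-vsctl list-br
   # ip -br link show type bridge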
+**Data flow example (for OVN)**
+
+.. code-block:: console
+
+                     COMPUTE NODE
+     +-------------+ +-------------+
+     +->"If Geneve"+->+ *br-vxlan +--->+ bond0.#00 +---+
+     | +-------------+ +-------------+ |
+     +-------------+ | +-----------------+
+     Instance +---> | br-int |++ +-->| physical network|
+     +-------------+ | +-----------------+
+     | +-------------+ +-------------+ |
+     +->"If VLAN"+->+ br-vlan +--->+ bond1 +-----+
+     +-------------+ +-------------+
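
With OVN, the wiring of the integration bridge can be inspected from Open
vSwitch itself (illustrative commands; ``br-int`` is the OVN integration
bridge):

.. code-block:: console

   # ovs-vsctl show
   # ovs-vsctl list-ports br-int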
Preliminary troubleshooting questions to answer:
------------------------------------------------

-- Which compute node is hosting the VM in question?
+- Which compute node is hosting the instance in question? (see the example below)
- Which interface is used for provider network traffic?
-- Which interface is used for VXLAN overlay?
+- Which interface is used for VXLAN (Geneve) overlay?
- Is there a connectivity issue ingress to the instance?
- Is there a connectivity issue egress from the instance?
- What is the source address of the traffic?
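
The first question can usually be answered from the API, assuming admin
credentials (the server name is a placeholder):

.. code-block:: console

   # openstack server show <INSTANCE_NAME> -c OS-EXT-SRV-ATTR:host -c addresses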
@@ -343,6 +389,11 @@ No:
  on compute and network nodes.
- Check ``security-group-rules``,
  consider adding allow ICMP rule for testing.
+
+When using OVN, additionally check the following:
+
+- Check that ``ovn-controller`` is running on all nodes.
+- Verify that ``ovn-northd`` is running and the OVN databases are healthy.
+- Ensure that ``ovn-metadata-agent`` is active.
+- Review the logs of ``ovn-controller`` and ``ovn-northd``.
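
A minimal health pass covering these checks might look like the following
(a sketch; run ``ovn-nbctl``/``ovn-sbctl`` on a node that can reach the OVN
databases, and note that unit names can vary by distribution):

.. code-block:: console

   # systemctl status ovn-controller
   # systemctl status ovn-northd
   # ovn-nbctl show
   # ovn-sbctl show
   # journalctl -u ovn-controller --since "1 hour ago"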
Yes:

- Good! The instance can ping its intended gateway.
@@ -359,7 +410,7 @@ Yes:
Do not continue until the instance can reach its gateway.

-If VXLAN:
+If VXLAN (Geneve):

Does the physical interface show link and all VLANs properly trunked
across the physical network?
@@ -377,24 +428,24 @@ Yes:
Do not continue until the physical network is properly configured.

-Are VXLAN VTEP addresses able to ping each other?
+Are VXLAN (Geneve) VTEP addresses able to ping each other?

No:

-- Check ``br-vxlan`` interface on Compute and Network nodes
+- Check the ``br-vxlan`` interface on Compute and Network nodes.
- Check veth pairs between containers and Linux bridges on the host.
- Check that OVS bridges contain the proper interfaces
  on compute and network nodes.

Yes:

-- Check ml2 config file for local VXLAN IP
-  and other VXLAN configuration settings.
+- Check the ml2 config file for the local VXLAN (Geneve) IP
+  and other VXLAN (Geneve) configuration settings.
- Check VTEP learning method (multicast or l2population):

  - If multicast, make sure the physical switches are properly
    allowing and distributing multicast traffic.

.. important::

-   Do not continue until VXLAN endpoints have reachability to each other.
+   Do not continue until VXLAN (Geneve) endpoints have reachability to each other.
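
One practical test is to ping between VTEP addresses with the
don't-fragment bit set, which checks both reachability and encapsulation
headroom (address and payload size are illustrative; size the payload to
your underlay MTU):

.. code-block:: console

   # ping -c 3 -M do -s 1472 172.29.240.11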
Does the instance's IP address ping from the network's DHCP namespace
or other instances in the same network?
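
On the node hosting the DHCP agent, this test can be run from inside the
``qdhcp`` namespace (sketch for ML2/OVS; the network UUID and instance
address are placeholders):

.. code-block:: console

   # ip netns list
   # ip netns exec qdhcp-<NETWORK_ID> ping -c 3 <INSTANCE_IP>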
@@ -410,7 +461,8 @@ No:
- Check syslogs.
- Check Neutron Open vSwitch agent logs.
- Check that Bridge Forwarding Database (fdb) contains the proper
-  entries on both the compute and Neutron agent container.
+  entries on both the compute and Neutron agent container
+  (``ovs-appctl fdb/show br-int``).
Yes:

- Good! This suggests that the instance received its IP address
@@ -434,7 +486,14 @@ No:
- Check ``security-group-rules``,
  consider adding allow ICMP rule for testing.
- Check that Bridge Forwarding Database (fdb) contains
-  the proper entries on both the compute and Neutron agent container.
+  the proper entries on both the compute and Neutron agent container
+  (``ovs-appctl fdb/show br-int``).
+
+When using OVN, additionally check the following:
+
+- Check that ``ovn-controller`` is running on all nodes.
+- Verify that ``ovn-northd`` is running and the OVN databases are healthy.
+- Ensure that ``ovn-metadata-agent`` is active.
+- Review the logs of ``ovn-controller`` and ``ovn-northd``.
Yes:

- Good! The instance can ping its intended gateway.
@@ -446,7 +505,7 @@ Yes:
- Check upstream routes, NATs or ``access-control-lists``.

Diagnose Image service issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``glance-api`` handles the API interactions and image store.
@@ -463,12 +522,12 @@ identity problems:
registry is working.

For an example and more information, see `Verify operation
-<https://docs.openstack.org/glance/latest/install/verify.html>`_.
+<https://docs.openstack.org/glance/latest/install/verify.html>`_
and `Manage Images
<https://docs.openstack.org/glance/latest/admin/manage-images.html>`_.

Failed security hardening after host kernel upgrade from version 3.13
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ubuntu kernel packages newer than version 3.13 contain a change in
module naming from ``nf_conntrack`` to ``br_netfilter``. After
@@ -522,11 +581,9 @@ To clear facts for a single host, find its file within
a JSON file that is named after its hostname. The facts for that host
will be regenerated on the next playbook run.

Failed Ansible playbooks during an upgrade
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Container networking issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -613,6 +670,5 @@ Example inventory restore process.
cd -
rm -rf /tmp/inventory_restore

At the completion of this operation the inventory will be restored to the
earlier version.

View File

@@ -1,3 +1,5 @@
+.. _command-line-reference:
+
======================
Command Line Reference
======================
@@ -38,4 +40,3 @@ The following are some useful commands to manage LXC:
.. code-block:: shell-session

   # lxc-stop --name container_name

View File

@@ -11,7 +11,7 @@ All service passwords are defined and stored as Ansible variables in
OpenStack-Ansible.

This allows the operator to store passwords in an encrypted format using
`Ansible Vault <https://docs.ansible.com/ansible/latest/vault_guide/index.html>`_
-or define them as a lookup to `SOPS <https://getsops.io/>`_ or `OpenBao <https://openbao.org/>`_
+or define them as a lookup to `SOPS <https://getsops.io/>`_ or `OpenBao <https://openbao.org/>`_.
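
For example, a single password can be encrypted in place with
``ansible-vault`` (the variable name and value are illustrative):

.. code-block:: console

   # ansible-vault encrypt_string 'Sup3rS3cret' --name 'keystone_auth_admin_password'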
Typical password change processes include the following steps: