[troubleshooting] Update commands and paths used by modern versions of TripleO.

Previously, the content was primarily targeted at Newton, which is EOL.

Change-Id: I2702c9b027be72d2dc636df4c81189b55e213ca0
Signed-off-by: Luke Short <ekultails@gmail.com>
Luke Short <ekultails@gmail.com> committed 2020-01-09 18:20:34 -05:00
parent f10168f2a6
commit c6da4c6215
5 changed files with 42 additions and 45 deletions

File 1 of 5

@@ -1,8 +1,8 @@
 Troubleshooting Image Build
------------------------------------
+---------------------------
 Images fail to build
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^
 More space needed
 ^^^^^^^^^^^^^^^^^
@@ -13,4 +13,4 @@ can fail with a message like "At least 174MB more space needed on
 the / filesystem". If freeing up more RAM isn't a possibility,
 images can be built on disk by exporting an environment variable::
-    export DIB_NO_TMPFS=1
+    $ export DIB_NO_TMPFS=1
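For context, a minimal sketch of how the variable is consumed during a build (assuming the usual tripleoclient entry point, ``openstack overcloud image build``; exact options vary by release)::

    $ export DIB_NO_TMPFS=1
    $ openstack overcloud image build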

File 2 of 5

@@ -19,7 +19,7 @@ collect controller` will match all the overcloud nodes that contain the word
 `controller`. To run the collection and download the logs to a local
 directory, run the following command::
-    openstack overcloud support report collect controller
+    $ openstack overcloud support report collect controller
 .. note:: By default, if ``-o`` is not specified, the logs will be downloaded to a folder
    in the current working directory called `support_logs`
@@ -31,7 +31,7 @@ Example: Download logs from a single host
 To download logs from a specific host, you must specify the complete name as
 reported by `openstack server list` from the undercloud::
-    openstack overcloud support report collect -o /home/stack/logs overcloud-novacompute-0
+    $ openstack overcloud support report collect -o /home/stack/logs overcloud-novacompute-0
 Example: Leave logs in a swift container
@@ -42,14 +42,14 @@ logs, you can leave them in a swift container for later retrieval. The
 ``--collect-only`` and ``-c`` options can be leveraged to store the
 logs in a swift container. For example::
-    openstack overcloud support report collect -c logs_20170601 --collect-only controller
+    $ openstack overcloud support report collect -c logs_20170601 --collect-only controller
 This will run sosreport on the nodes and upload the logs to a container named
 `logs_20170601` on the undercloud, from which standard swift tooling can be
 used to download the logs. Alternatively, you can fetch the logs using
 the `openstack overcloud support report collect` command by running::
-    openstack overcloud support report collect -c logs_20170601 --download-only -o /tmp/mylogs controller
+    $ openstack overcloud support report collect -c logs_20170601 --download-only -o /tmp/mylogs controller
 .. note:: There is a ``--skip-container-delete`` option that can be used if you
    want to leave the logs in swift but still download them. This option
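As a sketch of the "standard swift tooling" mentioned above, the uploaded log bundles can be listed and fetched with the generic object-store commands (the object name is illustrative)::

    $ openstack object list logs_20170601
    $ openstack object save logs_20170601 <OBJECT NAME>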
@@ -64,6 +64,4 @@ The ``openstack overcloud support report collect`` command has additional
 options that can be passed to work with the log bundles. Run the command with
 ``--help`` to see additional options::
-    openstack overcloud support report collect --help
+    $ openstack overcloud support report collect --help

File 3 of 5

@@ -5,13 +5,13 @@ Where Are the Logs?
 -------------------
 Some logs are stored in *journald*, but most are stored as text files in
-``/var/log``. They are only accessible by the root user.
+``/var/log/containers``. They are only accessible by the root user.
 ironic-inspector
 ~~~~~~~~~~~~~~~~
 The introspection logs (from ironic-inspector) are located in
-``/var/log/ironic-inspector``. If something fails during the introspection
+``/var/log/containers/ironic-inspector``. If something fails during the introspection
 ramdisk run, ironic-inspector stores the ramdisk logs in
 ``/var/log/ironic-inspector/ramdisk/`` as gz-compressed tar files.
 File names contain date, time and IPMI address of the node if it was detected
@@ -27,9 +27,9 @@ To collect introspection logs on success as well, set
 ironic
 ~~~~~~
-The deployment logs (from ironic) are located in ``/var/log/ironic``. If
+The deployment logs (from ironic) are located in ``/var/log/containers/ironic``. If
 something goes wrong during deployment or cleaning, the ramdisk logs are
-stored in ``/var/log/ironic/deploy``. See `ironic logs retrieving documentation
+stored in ``/var/log/containers/ironic/deploy``. See `ironic logs retrieving documentation
 <https://docs.openstack.org/ironic/latest/admin/troubleshooting.html#retrieving-logs-from-the-deploy-ramdisk>`_
 for more details.
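As an illustration of inspecting those ramdisk log bundles (the file name is hypothetical, and the path assumes the containerized layout used elsewhere in this commit)::

    $ sudo ls /var/log/containers/ironic-inspector/ramdisk/
    $ sudo tar -tzf /var/log/containers/ironic-inspector/ramdisk/<FILE NAME>.tar.gz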
@@ -60,16 +60,16 @@ For example, a wrong MAC can be fixed in two steps:
 * Find out the assigned port UUID by running
   ::
-    openstack baremetal port list --node <NODE UUID>
+    $ openstack baremetal port list --node <NODE UUID>
 * Update the MAC address by running
   ::
-    openstack baremetal port set --address <NEW MAC> <PORT UUID>
+    $ openstack baremetal port set --address <NEW MAC> <PORT UUID>
 A wrong IPMI address can be fixed with the following command::
-    openstack baremetal node set <NODE UUID> --driver-info ipmi_address=<NEW IPMI ADDRESS>
+    $ openstack baremetal node set <NODE UUID> --driver-info ipmi_address=<NEW IPMI ADDRESS>
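A worked example of the two-step MAC fix, with hypothetical UUIDs and MAC address::

    $ openstack baremetal port list --node 3f1c9a04-77f2-4b14-9c5a-1d2e3f4a5b6c
    $ openstack baremetal port set --address 52:54:00:aa:bb:cc 8d0e6f2b-1a2b-3c4d-5e6f-7a8b9c0d1e2f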
 Node power state is not enforced by Ironic
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -103,7 +103,7 @@ power management, and it gets stuck in an abnormal state.
 Ironic requires that nodes that cannot be operated normally are put in
 maintenance mode. This is done by the following command::
-    openstack baremetal node maintenance set <NODE UUID> --reason "<EXPLANATION>"
+    $ openstack baremetal node maintenance set <NODE UUID> --reason "<EXPLANATION>"
 Ironic will stop checking power and health state for such nodes, and Nova will
 not pick them for deployment. Power commands will still work on them, though.
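To see which nodes are currently in maintenance mode, the node listing can be filtered (assuming a reasonably recent python-ironicclient)::

    $ openstack baremetal node list --maintenance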
@@ -112,11 +112,11 @@ After a node is in maintenance mode, you can attempt to repair it, e.g. by
 `Fixing invalid node information`_. If you manage to make the node operational
 again, move it out of maintenance mode::
-    openstack baremetal node maintenance unset <NODE UUID>
+    $ openstack baremetal node maintenance unset <NODE UUID>
 If repairing is not possible, you can force deletion of such a node::
-    openstack baremetal node delete <NODE UUID>
+    $ openstack baremetal node delete <NODE UUID>
 Forcing node removal will leave it powered on, accessing the network with
 the old IP address(es) and with all services running. Before proceeding, make
@@ -163,7 +163,7 @@ or DHCP logs from
 ::
-    sudo journalctl -u openstack-ironic-inspector-dnsmasq
+    $ sudo journalctl -u openstack-ironic-inspector-dnsmasq
 SSH as a root user with the temporary password or the SSH key.
@@ -189,5 +189,4 @@ How can introspection be stopped?
 Introspection for a node can be stopped with the following command::
-    openstack baremetal introspection abort <NODE UUID>
+    $ openstack baremetal introspection abort <NODE UUID>
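The progress or final state of introspection can be checked with the companion commands provided by python-ironic-inspector-client::

    $ openstack baremetal introspection list
    $ openstack baremetal introspection status <NODE UUID>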

File 4 of 5

@@ -11,7 +11,7 @@ Identifying Failed Component
 In most cases, Heat will show the failed overcloud stack when a deployment
 has failed::
-    $ heat stack-list
+    $ openstack stack list
     +--------------------------------------+------------+--------------------+----------------------+
     | id                                   | stack_name | stack_status       | creation_time        |
@@ -19,10 +19,10 @@ has failed::
     | 7e88af95-535c-4a55-b78d-2c3d9850d854 | overcloud  | CREATE_FAILED      | 2015-04-06T17:57:16Z |
     +--------------------------------------+------------+--------------------+----------------------+
-Occasionally, Heat is not even able to create the stack, so the ``heat
-stack-list`` output will be empty. If this is the case, observe the message
-that was printed to the terminal when ``openstack overcloud deploy`` or ``heat
-stack-create`` was run.
+Occasionally, Heat is not even able to create the stack, so the ``openstack
+stack list`` output will be empty. If this is the case, observe the message
+that was printed to the terminal when ``openstack overcloud deploy`` or ``openstack
+stack create`` was run.
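When the stack exists but is in a ``*_FAILED`` state, the failing resources can also be narrowed down with the failures listing from python-heatclient (a sketch; output format varies by release)::

    $ openstack stack failures list overcloud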
 Next, there are a few layers on which the deployment can fail:
@@ -50,7 +50,7 @@ in the resulting table.
   You can check the actual cause using the following command::
-    openstack baremetal node show <UUID> -f value -c maintenance_reason
+    $ openstack baremetal node show <UUID> -f value -c maintenance_reason
   For example, **Maintenance** goes to ``True`` automatically if wrong power
   credentials are provided.
@@ -58,7 +58,7 @@ in the resulting table.
   Fix the cause of the failure, then move the node out of maintenance
   mode::
-    openstack baremetal node maintenance unset <NODE UUID>
+    $ openstack baremetal node maintenance unset <NODE UUID>
 * If **Provision State** is ``available``, then the problem occurred before
   bare metal deployment even started. Proceed with `Debugging Using Heat`_.
@@ -75,7 +75,7 @@ in the resulting table.
 * If **Provision State** is ``error`` or ``deploy failed``, then bare metal
   deployment has failed for this node. Look at the **last_error** field::
-    openstack baremetal node show <UUID> -f value -c last_error
+    $ openstack baremetal node show <UUID> -f value -c last_error
 If the error message is vague, you can use logs to clarify it, see
 :ref:`ironic_logs` for details.
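The **Provision State** and **Maintenance** columns referenced above come from the node listing; as a reminder::

    $ openstack baremetal node list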
@@ -89,7 +89,7 @@ Showing deployment failures
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Deployment failures can be shown with the following command::
-    [stack@undercloud ]$ openstack overcloud failures --plan my-deployment
+    $ openstack overcloud failures --plan my-deployment
 The command will show any errors encountered when running ``ansible-playbook``
 to configure the overcloud during the ``config-download`` process. See
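As an assumption about where to dig deeper on releases of this era: the ``config-download`` working directory on the undercloud, typically ``/var/lib/mistral/<stack>/``, keeps the complete ``ansible.log``::

    $ sudo less /var/lib/mistral/overcloud/ansible.log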
@@ -104,7 +104,7 @@ Debugging Using Heat
 ::
-    $ heat resource-list overcloud
+    $ openstack stack resource list overcloud
     +-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
     | resource_name                     | physical_resource_id                          | resource_type                                     | resource_status | updated_time         |
@@ -154,7 +154,7 @@ Debugging Using Heat
 ::
-    $ heat resource-show overcloud ControllerNodesPostDeployment
+    $ openstack stack resource show overcloud ControllerNodesPostDeployment
     +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
     | Property               | Value                                                                                                                                                       |
@@ -175,7 +175,7 @@ Debugging Using Heat
     | updated_time           | 2015-04-06T21:15:20Z                                                                                                                                        |
     +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-The ``resource-show`` doesn't always show a clear reason why the resource
+The ``resource show`` doesn't always show a clear reason why the resource
 failed. In these cases, logging into the Overcloud node is required to
 further troubleshoot the issue.
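Overcloud nodes deployed by TripleO are normally reachable from the undercloud as the ``heat-admin`` user, using an address taken from the ``openstack server list`` output shown below (the IP here is illustrative)::

    $ ssh heat-admin@192.168.24.15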
@@ -185,7 +185,7 @@ Debugging Using Heat
 ::
-    $ nova list
+    $ openstack server list
     +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+---------------------+
     | ID                                   | Name                                                  | Status | Task State | Power State | Networks            |
@@ -219,17 +219,17 @@ Debugging Using Heat
 ::
-    $ nova list
-    $ nova show <server-id>
+    $ openstack server list
+    $ openstack server show <server-id>
 The most common error shown will reference the error message ``No valid host
 was found``. Refer to `No Valid Host Found Error`_ below.
 In other cases, look at the following log files for further troubleshooting::
-    /var/log/nova/*
-    /var/log/heat/*
-    /var/log/ironic/*
+    /var/log/containers/nova/*
+    /var/log/containers/heat/*
+    /var/log/containers/ironic/*
 * Using SOS
@@ -247,7 +247,7 @@ Debugging Using Heat
 No Valid Host Found Error
 ^^^^^^^^^^^^^^^^^^^^^^^^^
-Sometimes ``/var/log/nova/nova-conductor.log`` contains the following error::
+Sometimes ``/var/log/containers/nova/nova-conductor.log`` contains the following error::
     NoValidHost: No valid host was found. There are not enough hosts available.
@@ -266,7 +266,7 @@ you have enough nodes corresponding to each flavor/profile. Watch
 ::
-    openstack baremetal node show <UUID> --fields properties
+    $ openstack baremetal node show <UUID> --fields properties
 It should contain e.g. ``profile:compute`` for compute nodes.
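To cross-check the flavor side of the scheduling match (the flavor name is illustrative)::

    $ openstack flavor show compute -f value -c properties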

File 5 of 5

@@ -19,7 +19,7 @@ Known Issues:
 The workaround is to delete the libvirt capabilities cache and restart the service::
-    rm -Rf /var/cache/libvirt/qemu/capabilities/
-    systemctl restart libvirtd
+    $ rm -Rf /var/cache/libvirt/qemu/capabilities/
+    $ systemctl restart libvirtd
 .. _bug in libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=1195882
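A quick way to confirm the service came back up and the capabilities cache was rebuilt (a sketch)::

    $ systemctl status libvirtd
    $ sudo virsh capabilities | head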