[troubleshooting] Update commands and paths used by modern versions of TripleO.

Previously, the content primarily targeted Newton, which is EOL.

Change-Id: I2702c9b027be72d2dc636df4c81189b55e213ca0
Signed-off-by: Luke Short <ekultails@gmail.com>
Author: Luke Short
Date: 2020-01-09 18:20:34 -05:00
Parent: f10168f2a6
Commit: c6da4c6215
5 changed files with 42 additions and 45 deletions

View File

@ -1,8 +1,8 @@
Troubleshooting Image Build
-----------------------------------
---------------------------
Images fail to build
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^
More space needed
^^^^^^^^^^^^^^^^^
@ -13,4 +13,4 @@ can fail with a message like "At least 174MB more space needed on
the / filesystem". If freeing up more RAM isn't a possibility,
images can be built on disk by exporting an environment variable::
export DIB_NO_TMPFS=1
$ export DIB_NO_TMPFS=1
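For example, a minimal sketch of retrying the build on a memory-constrained undercloud, assuming the standard ``openstack overcloud image build`` workflow is in use::
$ export DIB_NO_TMPFS=1
$ openstack overcloud image build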

View File

@ -19,7 +19,7 @@ collect controller` will match all the overcloud nodes that contain the word
`controller`. To download the logs to a local
directory, run the following command::
openstack overcloud support report collect controller
$ openstack overcloud support report collect controller
.. note:: By default, if ``-o`` is not specified, the logs will be downloaded to a folder
in the current working directory called `support_logs`.
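A hedged example combining both options, where ``/home/stack/support_logs`` is only an illustrative destination directory::
$ openstack overcloud support report collect -o /home/stack/support_logs controller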
@ -31,7 +31,7 @@ Example: Download logs from a single host
To download logs from a specific host, you must specify the complete name as
reported by `openstack service list` from the undercloud::
openstack overcloud support report collect -o /home/stack/logs overcloud-novacompute-0
$ openstack overcloud support report collect -o /home/stack/logs overcloud-novacompute-0
Example: Leave logs in a swift container
@ -42,14 +42,14 @@ logs, you can leave them in a swift container for later retrieval. The
``--collect-only`` and ``-c`` options can be leveraged to store the
logs in a swift container. For example::
openstack overcloud support report collect -c logs_20170601 --collect-only controller
$ openstack overcloud support report collect -c logs_20170601 --collect-only controller
This will run sosreport on the nodes and upload the logs to a container named
`logs_20170601` on the undercloud, from which standard swift tooling can be
used to download the logs. Alternatively, you can then fetch the logs using
the `openstack overcloud support report collect` command by running::
openstack overcloud support report collect -c logs_20170601 --download-only -o /tmp/mylogs controller
$ openstack overcloud support report collect -c logs_20170601 --download-only -o /tmp/mylogs controller
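If you prefer plain object storage tooling instead, the container can be inspected and its archives downloaded with standard OpenStackClient commands (a sketch reusing the ``logs_20170601`` container from the example above; object names will vary)::
$ openstack object list logs_20170601
$ openstack object save logs_20170601 <OBJECT NAME>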
.. note:: There is a ``--skip-container-delete`` option that can be used if you
want to leave the logs in swift but still download them. This option
@ -64,6 +64,4 @@ The ``openstack overcloud support report collect`` command has additional
options that can be passed to work with the log bundles. Run the command with
``--help`` to see additional options::
openstack overcloud support report collect --help
$ openstack overcloud support report collect --help

View File

@ -5,13 +5,13 @@ Where Are the Logs?
-------------------
Some logs are stored in *journald*, but most are stored as text files in
``/var/log``. They are only accessible by the root user.
``/var/log/containers``. They are only accessible by the root user.
ironic-inspector
~~~~~~~~~~~~~~~~
The introspection logs (from ironic-inspector) are located in
``/var/log/ironic-inspector``. If something fails during the introspection
``/var/log/containers/ironic-inspector``. If something fails during the introspection
ramdisk run, ironic-inspector stores the ramdisk logs in
``/var/log/ironic-inspector/ramdisk/`` as gz-compressed tar files.
File names contain date, time and IPMI address of the node if it was detected
@ -27,9 +27,9 @@ To collect introspection logs on success as well, set
ironic
~~~~~~
The deployment logs (from ironic) are located in ``/var/log/ironic``. If
The deployment logs (from ironic) are located in ``/var/log/containers/ironic``. If
something goes wrong during deployment or cleaning, the ramdisk logs are
stored in ``/var/log/ironic/deploy``. See `ironic logs retrieving documentation
stored in ``/var/log/containers/ironic/deploy``. See `ironic logs retrieving documentation
<https://docs.openstack.org/ironic/latest/admin/troubleshooting.html#retrieving-logs-from-the-deploy-ramdisk>`_
for more details.
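In both cases the ramdisk logs are plain gz-compressed tarballs, so they can be unpacked with standard tools; a sketch, where the file name and target directory are only placeholders::
$ sudo mkdir -p /tmp/ramdisk-logs
$ sudo tar -xzf /var/log/containers/ironic/deploy/<TARBALL NAME>.tar.gz -C /tmp/ramdisk-logs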
@ -60,16 +60,16 @@ For example, a wrong MAC can be fixed in two steps:
* Find out the assigned port UUID by running
::
openstack baremetal port list --node <NODE UUID>
$ openstack baremetal port list --node <NODE UUID>
* Update the MAC address by running
::
openstack baremetal port set --address <NEW MAC> <PORT UUID>
$ openstack baremetal port set --address <NEW MAC> <PORT UUID>
A wrong IPMI address can be fixed with the following command::
openstack baremetal node set <NODE UUID> --driver-info ipmi_address=<NEW IPMI ADDRESS>
$ openstack baremetal node set <NODE UUID> --driver-info ipmi_address=<NEW IPMI ADDRESS>
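After making either change, it can be worth confirming that the update took effect, for example by re-reading the port list and driver info (a sketch using standard ``openstack baremetal`` commands)::
$ openstack baremetal port list --node <NODE UUID>
$ openstack baremetal node show <NODE UUID> --fields driver_info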
Node power state is not enforced by Ironic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -103,7 +103,7 @@ power management, and it gets stuck in an abnormal state.
Ironic requires that nodes which cannot be operated normally are put into
maintenance mode. This is done with the following command::
openstack baremetal node maintenance set <NODE UUID> --reason "<EXPLANATION>"
$ openstack baremetal node maintenance set <NODE UUID> --reason "<EXPLANATION>"
Ironic will stop checking power and health state for such nodes, and Nova will
not pick them for deployment. Power commands will still work on them, though.
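To see whether a node is in maintenance and why, its maintenance fields can be queried directly (a sketch using the same output filtering as elsewhere in this guide)::
$ openstack baremetal node show <NODE UUID> -f value -c maintenance -c maintenance_reason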
@ -112,11 +112,11 @@ After a node is in maintenance mode, you can attempt to repair it, e.g. by
`Fixing invalid node information`_. If you manage to make the node operational
again, move it out of maintenance mode::
openstack baremetal node maintenance unset <NODE UUID>
$ openstack baremetal node maintenance unset <NODE UUID>
If repair is not possible, you can force deletion of such a node::
openstack baremetal node delete <NODE UUID>
$ openstack baremetal node delete <NODE UUID>
Forcing node removal will leave it powered on, accessing the network with
the old IP address(es) and with all services running. Before proceeding, make
@ -163,7 +163,7 @@ or DHCP logs from
::
sudo journalctl -u openstack-ironic-inspector-dnsmasq
$ sudo journalctl -u openstack-ironic-inspector-dnsmasq
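To watch these DHCP logs live while a node boots, the journal can be followed; a sketch (the unit name may differ between releases)::
$ sudo journalctl -fu openstack-ironic-inspector-dnsmasq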
SSH as a root user with the temporary password or the SSH key.
@ -189,5 +189,4 @@ How can introspection be stopped?
Introspection for a node can be stopped with the following command::
openstack baremetal introspection abort <NODE UUID>
$ openstack baremetal introspection abort <NODE UUID>
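Afterwards, the introspection state of the node can be verified; a hedged example using the ironic-inspector client plugin::
$ openstack baremetal introspection status <NODE UUID>
$ openstack baremetal introspection list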

View File

@ -11,7 +11,7 @@ Identifying Failed Component
In most cases, Heat will show the failed overcloud stack when a deployment
has failed::
$ heat stack-list
$ openstack stack list
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
@ -19,10 +19,10 @@ has failed::
| 7e88af95-535c-4a55-b78d-2c3d9850d854 | overcloud | CREATE_FAILED | 2015-04-06T17:57:16Z |
+--------------------------------------+------------+--------------------+----------------------+
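The failure reason for the stack itself can usually be extracted directly; a minimal sketch using standard output filtering::
$ openstack stack show overcloud -f value -c stack_status_reason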
Occasionally, Heat is not even able to create the stack, so the ``heat
stack-list`` output will be empty. If this is the case, observe the message
that was printed to the terminal when ``openstack overcloud deploy`` or ``heat
stack-create`` was run.
Occasionally, Heat is not even able to create the stack, so the ``openstack
stack list`` output will be empty. If this is the case, observe the message
that was printed to the terminal when ``openstack overcloud deploy`` or ``openstack
stack create`` was run.
Next, there are a few layers on which the deployment can fail:
@ -50,7 +50,7 @@ in the resulting table.
You can check the actual cause using the following command::
openstack baremetal node show <UUID> -f value -c maintenance_reason
$ openstack baremetal node show <UUID> -f value -c maintenance_reason
For example, **Maintenance** goes to ``True`` automatically if wrong power
credentials are provided.
@ -58,7 +58,7 @@ in the resulting table.
Fix the cause of the failure, then move the node out of the maintenance
mode::
openstack baremetal node maintenance unset <NODE UUID>
$ openstack baremetal node maintenance unset <NODE UUID>
* If **Provision State** is ``available``, then the problem occurred before
bare metal deployment has even started. Proceed with `Debugging Using Heat`_.
@ -75,7 +75,7 @@ in the resulting table.
* If **Provision State** is ``error`` or ``deploy failed``, then bare metal
deployment has failed for this node. Look at the **last_error** field::
openstack baremetal node show <UUID> -f value -c last_error
$ openstack baremetal node show <UUID> -f value -c last_error
If the error message is vague, you can use logs to clarify it; see
:ref:`ironic_logs` for details.
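Before digging into a specific node, a plain listing is often enough to survey the provision and maintenance states of all nodes at once; a minimal sketch::
$ openstack baremetal node list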
@ -89,7 +89,7 @@ Showing deployment failures
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Deployment failures can be shown with the following command::
[stack@undercloud ]$ openstack overcloud failures --plan my-deployment
$ openstack overcloud failures --plan my-deployment
The command will show any errors encountered when running ``ansible-playbook``
to configure the overcloud during the ``config-download`` process. See
@ -104,7 +104,7 @@ Debugging Using Heat
::
$ heat resource-list overcloud
$ openstack stack resource list overcloud
+-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time |
@ -154,7 +154,7 @@ Debugging Using Heat
::
$ heat resource-show overcloud ControllerNodesPostDeployment
$ openstack stack resource show overcloud ControllerNodesPostDeployment
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
@ -175,7 +175,7 @@ Debugging Using Heat
| updated_time | 2015-04-06T21:15:20Z |
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The ``resource-show`` doesn't always show a clear reason why the resource
The ``resource show`` doesn't always show a clear reason why the resource
failed. In these cases, logging into the Overcloud node is required to
further troubleshoot the issue.
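Failed resources can also be listed recursively, which often points at the exact nested resource to inspect; a hedged sketch, assuming a heat client recent enough to support ``--filter``, ``-n`` and ``stack failures list``::
$ openstack stack resource list overcloud -n 5 --filter status=FAILED
$ openstack stack failures list overcloud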
@ -185,7 +185,7 @@ Debugging Using Heat
::
$ nova list
$ openstack server list
+--------------------------------------+-------------------------------------------------------+--------+------------+-------------+---------------------+
| ID | Name | Status | Task State | Power State | Networks |
@ -219,17 +219,17 @@ Debugging Using Heat
::
$ nova list
$ nova show <server-id>
$ openstack server list
$ openstack server show <server-id>
The most common error shown will reference the error message ``No valid host
was found``. Refer to `No Valid Host Found Error`_ below.
In other cases, look at the following log files for further troubleshooting::
/var/log/nova/*
/var/log/heat/*
/var/log/ironic/*
/var/log/containers/nova/*
/var/log/containers/heat/*
/var/log/containers/ironic/*
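These directories are only readable by root, so a quick scan for recent errors might look like the following (a sketch; file names vary by service and release)::
$ sudo grep -i error /var/log/containers/nova/nova-conductor.log
$ sudo tail -f /var/log/containers/heat/heat-engine.log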
* Using SOS
@ -247,7 +247,7 @@ Debugging Using Heat
No Valid Host Found Error
^^^^^^^^^^^^^^^^^^^^^^^^^
Sometimes ``/var/log/nova/nova-conductor.log`` contains the following error::
Sometimes ``/var/log/containers/nova/nova-conductor.log`` contains the following error::
NoValidHost: No valid host was found. There are not enough hosts available.
@ -266,7 +266,7 @@ you have enough nodes corresponding to each flavor/profile. Watch
::
openstack baremetal node show <UUID> --fields properties
$ openstack baremetal node show <UUID> --fields properties
It should contain e.g. ``profile:compute`` for compute nodes.
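It can also help to compare the node's properties against the flavor being scheduled, since the two must agree; a hedged example where the flavor name ``compute`` is only illustrative::
$ openstack flavor show compute -f value -c properties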

View File

@ -19,7 +19,7 @@ Known Issues:
The workaround is to delete the libvirt capabilities cache and restart the service::
rm -Rf /var/cache/libvirt/qemu/capabilities/
systemctl restart libvirtd
$ rm -Rf /var/cache/libvirt/qemu/capabilities/
$ systemctl restart libvirtd
.. _bug in libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=1195882
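To confirm that the cache was regenerated and the service came back up cleanly, something like the following can be used (a sketch with standard systemd tooling)::
$ systemctl status libvirtd
$ ls /var/cache/libvirt/qemu/capabilities/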