[troubleshooting] Update commands and paths used by modern versions of TripleO.

Previously, the content was primarily targeted at Newton, which is EOL.

Change-Id: I2702c9b027be72d2dc636df4c81189b55e213ca0
Signed-off-by: Luke Short <ekultails@gmail.com>
parent f10168f2a6
commit c6da4c6215
@@ -1,8 +1,8 @@
 Troubleshooting Image Build
------------------------------------
+---------------------------
 
 Images fail to build
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^
 
 More space needed
 ^^^^^^^^^^^^^^^^^
@@ -13,4 +13,4 @@ can fail with a message like "At least 174MB more space needed on
 the / filesystem". If freeing up more RAM isn't a possibility,
 images can be built on disk by exporting an environment variable::
 
-export DIB_NO_TMPFS=1
+$ export DIB_NO_TMPFS=1
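The hunk above exports ``DIB_NO_TMPFS=1`` when freeing RAM is not possible. A minimal sketch of automating that choice, assuming a Linux build host with ``/proc/meminfo``: ``choose_dib_tmpfs`` is a hypothetical helper, and its 4096 MiB default threshold is an assumed value rather than a documented diskimage-builder limit.

```shell
# Hypothetical helper (not part of TripleO or diskimage-builder).
# Exports DIB_NO_TMPFS=1 when MemAvailable is below the threshold given in
# MiB (default 4096, an assumed value), so the images are built on disk.
choose_dib_tmpfs() {
    threshold_mib=${1:-4096}
    avail_mib=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo 2>/dev/null)
    if [ "${avail_mib:-0}" -lt "$threshold_mib" ]; then
        export DIB_NO_TMPFS=1
        echo "MemAvailable ${avail_mib:-unknown} MiB < ${threshold_mib} MiB: DIB_NO_TMPFS=1" >&2
    fi
}
```

Define the function in the shell that will run the image build and call it first, for example ``choose_dib_tmpfs 8192``; the exported variable only reaches that shell and the processes it starts.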
@@ -19,7 +19,7 @@ colect controller` will match all the overcloud nodes that contain the word
 `controller`. To download the run the command and download them to a local
 directory, run the following command::
 
-openstack overcloud support report collect controller
+$ openstack overcloud support report collect controller
 
 .. note:: By default if -o is not specified, the logs will be downloaded to a folder
 in the current working directory called `support_logs`
@@ -31,7 +31,7 @@ Example: Download logs from a single host
 To download logs from a specific host, you must specify the complete name as
 reported by `openstack service list` from the undercloud::
 
-openstack overcloud support report collect -o /home/stack/logs overcloud-novacompute-0
+$ openstack overcloud support report collect -o /home/stack/logs overcloud-novacompute-0
 
 
 Example: Leave logs in a swift container
@@ -42,14 +42,14 @@ logs, you can leave them in a swift container for later retrieval. The
 ``--collect-only`` and ``-c`` options can be leveraged to store the
 logs in a swift container. For example::
 
-openstack overcloud support report collect -c logs_20170601 --collect-only controller
+$ openstack overcloud support report collect -c logs_20170601 --collect-only controller
 
 This will run sosreport on the nodes and upload the logs to a container named
 `logs_20170601` on the undercloud. From which standard swift tooling can be
 used to download the logs. Alternatively, you can then fetch the logs using
 the `openstack overcloud support report collect` command by running::
 
-openstack overcloud support report collect -c logs_20170601 --download-only -o /tmp/mylogs controller
+$ openstack overcloud support report collect -c logs_20170601 --download-only -o /tmp/mylogs controller
 
 .. note:: There is a ``--skip-container-delete`` option that can be used if you
 want to leave the logs in swift but still download them. This option
@@ -64,6 +64,4 @@ The ``openstack overcloud support report collect`` command has additional
 that can be passed to work with the log bundles. Run the command with
 ``--help`` to see additional options::
 
-openstack overcloud support report collect --help
-
-
+$ openstack overcloud support report collect --help
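The hunks above say standard Swift tooling can download the logs from the ``logs_20170601`` container. A minimal sketch using the ``openstack object list`` and ``openstack object save`` commands, assuming the undercloud credentials (for example ``~/stackrc``) are already sourced; ``download_container`` is a hypothetical helper name.

```shell
# Hypothetical helper: save every object in a Swift container to a local
# directory. Requires sourced undercloud credentials.
download_container() {
    container=$1
    dest=${2:-./support_logs}
    mkdir -p "$dest" || return 1
    openstack object list "$container" -f value -c Name |
    while IFS= read -r obj; do
        # Keep only the last path component of each object name.
        openstack object save --file "$dest/$(basename "$obj")" "$container" "$obj"
    done
}
```

For example, ``download_container logs_20170601 /tmp/mylogs``. Object names are reduced to their base name, so two objects with the same base name would overwrite each other.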
@@ -5,13 +5,13 @@ Where Are the Logs?
 -------------------
 
 Some logs are stored in *journald*, but most are stored as text files in
-``/var/log``. They are only accessible by the root user.
+``/var/log/containers``. They are only accessible by the root user.
 
 ironic-inspector
 ~~~~~~~~~~~~~~~~
 
 The introspection logs (from ironic-inspector) are located in
-``/var/log/ironic-inspector``. If something fails during the introspection
+``/var/log/containers/ironic-inspector``. If something fails during the introspection
 ramdisk run, ironic-inspector stores the ramdisk logs in
 ``/var/log/ironic-inspector/ramdisk/`` as gz-compressed tar files.
 File names contain date, time and IPMI address of the node if it was detected
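Because the ramdisk logs are gz-compressed tar files, plain ``tar`` can unpack them. A minimal sketch with a hypothetical ``unpack_ramdisk_logs`` helper; the hunk above moves the main log directory to ``/var/log/containers/ironic-inspector`` but leaves the ramdisk path as ``/var/log/ironic-inspector/ramdisk/``, so the source directory is a required argument rather than a guessed default.

```shell
# Hypothetical helper: extract each ramdisk log archive in SRC into its own
# subdirectory of DEST, named after the archive without ".tar.gz".
unpack_ramdisk_logs() {
    src=$1
    dest=${2:-/tmp/ramdisk-logs}
    for archive in "$src"/*; do
        # Skips non-files, including the unexpanded glob when SRC is empty.
        [ -f "$archive" ] || continue
        out="$dest/$(basename "$archive" .tar.gz)"
        mkdir -p "$out" && tar -xzf "$archive" -C "$out"
    done
}
```

Run it from a root shell, for example ``unpack_ramdisk_logs /var/log/ironic-inspector/ramdisk``, then read the extracted files under ``/tmp/ramdisk-logs``.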
@@ -27,9 +27,9 @@ To collect introspection logs on success as well, set
 ironic
 ~~~~~~
 
-The deployment logs (from ironic) are located in ``/var/log/ironic``. If
+The deployment logs (from ironic) are located in ``/var/log/containers/ironic``. If
 something goes wrong during deployment or cleaning, the ramdisk logs are
-stored in ``/var/log/ironic/deploy``. See `ironic logs retrieving documentation
+stored in ``/var/log/containers/ironic/deploy``. See `ironic logs retrieving documentation
 <https://docs.openstack.org/ironic/latest/admin/troubleshooting.html#retrieving-logs-from-the-deploy-ramdisk>`_
 for more details.
 
@@ -60,16 +60,16 @@ For example, a wrong MAC can be fixed in two steps:
 * Find out the assigned port UUID by running
 ::
 
-openstack baremetal port list --node <NODE UUID>
+$ openstack baremetal port list --node <NODE UUID>
 
 * Update the MAC address by running
 ::
 
-openstack baremetal port set --address <NEW MAC> <PORT UUID>
+$ openstack baremetal port set --address <NEW MAC> <PORT UUID>
 
 A Wrong IPMI address can be fixed with the following command::
 
-openstack baremetal node set <NODE UUID> --driver-info ipmi_address=<NEW IPMI ADDRESS>
+$ openstack baremetal node set <NODE UUID> --driver-info ipmi_address=<NEW IPMI ADDRESS>
 
 Node power state is not enforced by Ironic
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
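The two MAC-fixing steps in the hunk above can be combined. A minimal sketch, assuming the node has exactly one port (the function stops rather than guess when it finds zero or several); ``is_valid_mac`` and ``fix_node_mac`` are hypothetical helper names, while the ``openstack baremetal port`` commands are the ones shown in the text.

```shell
# Hypothetical helper: accept only colon-separated 6-octet MAC addresses.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Hypothetical helper: look up the node's single port and set its address.
fix_node_mac() {
    node=$1
    new_mac=$2
    is_valid_mac "$new_mac" || { echo "invalid MAC: $new_mac" >&2; return 1; }
    port=$(openstack baremetal port list --node "$node" -f value -c UUID)
    count=$(printf '%s\n' "$port" | grep -c .)
    if [ "$count" -ne 1 ]; then
        echo "expected 1 port on $node, found $count" >&2
        return 1
    fi
    openstack baremetal port set --address "$new_mac" "$port"
}
```

For example, ``fix_node_mac <NODE UUID> 52:54:00:ab:cd:ef``.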
@@ -103,7 +103,7 @@ power management, and it gets stuck in an abnormal state.
 Ironic requires that nodes that cannot be operated normally are put in the
 maintenance mode. It is done by the following command::
 
-openstack baremetal node maintenance set <NODE UUID> --reason "<EXPLANATION>"
+$ openstack baremetal node maintenance set <NODE UUID> --reason "<EXPLANATION>"
 
 Ironic will stop checking power and health state for such nodes, and Nova will
 not pick them for deployment. Power command will still work on them, though.
@@ -112,11 +112,11 @@ After a node is in the maintenance mode, you can attempt repairing it, e.g. by
 `Fixing invalid node information`_. If you manage to make the node operational
 again, move it out of the maintenance mode::
 
-openstack baremetal node maintenance unset <NODE UUID>
+$ openstack baremetal node maintenance unset <NODE UUID>
 
 If repairing is not possible, you can force deletion of such node::
 
-openstack baremetal node delete <NODE UUID>
+$ openstack baremetal node delete <NODE UUID>
 
 Forcing node removal will leave it powered on, accessing the network with
 the old IP address(es) and with all services running. Before proceeding, make
@@ -163,7 +163,7 @@ or DHCP logs from
 
 ::
 
-sudo journalctl -u openstack-ironic-inspector-dnsmasq
+$ sudo journalctl -u openstack-ironic-inspector-dnsmasq
 
 SSH as a root user with the temporary password or the SSH key.
 
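On a busy provisioning network the dnsmasq journal can be long. A minimal sketch that narrows it to one NIC, assuming the ``openstack-ironic-inspector-dnsmasq`` unit name from the hunk above (pass another name as the second argument if your undercloud differs); ``dhcp_log_for_mac`` is a hypothetical helper and the one-hour window is an arbitrary choice.

```shell
# Hypothetical helper: show inspector dnsmasq journal lines from the last
# hour that mention the given MAC address (case-insensitive match).
dhcp_log_for_mac() {
    mac=$1
    unit=${2:-openstack-ironic-inspector-dnsmasq}
    sudo journalctl -u "$unit" --since "1 hour ago" --no-pager | grep -i -- "$mac"
}
```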
@@ -189,5 +189,4 @@ How can introspection be stopped?
 
 Introspection for a node can be stopped with the following command::
 
-openstack baremetal introspection abort <NODE UUID>
-
+$ openstack baremetal introspection abort <NODE UUID>
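After ``openstack baremetal introspection abort``, it can be useful to wait until ironic-inspector reports the run as finished. A minimal sketch with a hypothetical ``abort_introspection`` helper; it assumes ``openstack baremetal introspection status`` exposes a ``finished`` column that prints ``True`` when done, and the polling interval and limit are arbitrary.

```shell
# Hypothetical helper: abort introspection, then poll the status until it
# reports finished. Ten polls, six seconds apart, are arbitrary choices.
abort_introspection() {
    node=$1
    openstack baremetal introspection abort "$node" || return 1
    for attempt in 1 2 3 4 5 6 7 8 9 10; do
        finished=$(openstack baremetal introspection status "$node" -f value -c finished)
        if [ "$finished" = "True" ]; then
            return 0
        fi
        sleep 6
    done
    echo "introspection of $node still running after $attempt polls" >&2
    return 1
}
```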
@@ -11,7 +11,7 @@ Identifying Failed Component
 In most cases, Heat will show the failed overcloud stack when a deployment
 has failed::
 
-$ heat stack-list
+$ openstack stack list
 
 +--------------------------------------+------------+--------------------+----------------------+
 | id | stack_name | stack_status | creation_time |
@@ -19,10 +19,10 @@ has failed::
 | 7e88af95-535c-4a55-b78d-2c3d9850d854 | overcloud | CREATE_FAILED | 2015-04-06T17:57:16Z |
 +--------------------------------------+------------+--------------------+----------------------+
 
-Occasionally, Heat is not even able to create the stack, so the ``heat
-stack-list`` output will be empty. If this is the case, observe the message
-that was printed to the terminal when ``openstack overcloud deploy`` or ``heat
-stack-create`` was run.
+Occasionally, Heat is not even able to create the stack, so the ``openstack
+stack list`` output will be empty. If this is the case, observe the message
+that was printed to the terminal when ``openstack overcloud deploy`` or ``openstack
+stack create`` was run.
 
 Next, there are a few layers on which the deployment can fail:
 
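The ``openstack stack list`` check from the hunks above can be followed by a search for the failed nested resources. A minimal sketch, assuming the stack is named ``overcloud`` as in the example table; ``show_failed_resources`` is a hypothetical helper, and ``--nested-depth 5`` is an arbitrary depth.

```shell
# Hypothetical helper: print the stack name and status, then every nested
# resource whose status contains FAILED.
show_failed_resources() {
    stack=${1:-overcloud}
    openstack stack list -f value -c "Stack Name" -c "Stack Status"
    openstack stack resource list --nested-depth 5 "$stack" -f value \
        -c resource_name -c resource_status | grep FAILED
}
```

``grep`` exits non-zero when nothing matches, so a non-zero status from the function can simply mean no resource failed.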
@@ -50,7 +50,7 @@ in the resulting table.
 
 You can check the actual cause using the following command::
 
-openstack baremetal node show <UUID> -f value -c maintenance_reason
+$ openstack baremetal node show <UUID> -f value -c maintenance_reason
 
 For example, **Maintenance** goes to ``True`` automatically, if wrong power
 credentials are provided.
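When several nodes are in maintenance, the ``maintenance_reason`` check can be run for all of them at once. A minimal sketch; ``show_maintenance_reasons`` is a hypothetical helper built on ``openstack baremetal node list --maintenance`` and the ``node show`` command from the text.

```shell
# Hypothetical helper: print "<UUID><TAB><reason>" for every node that is
# currently in maintenance mode.
show_maintenance_reasons() {
    openstack baremetal node list --maintenance -f value -c UUID |
    while IFS= read -r uuid; do
        reason=$(openstack baremetal node show "$uuid" -f value -c maintenance_reason)
        printf '%s\t%s\n' "$uuid" "$reason"
    done
}
```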
@@ -58,7 +58,7 @@ in the resulting table.
 Fix the cause of the failure, then move the node out of the maintenance
 mode::
 
-openstack baremetal node maintenance unset <NODE UUID>
+$ openstack baremetal node maintenance unset <NODE UUID>
 
 * If **Provision State** is ``available`` then the problem occurred before
 bare metal deployment has even started. Proceed with `Debugging Using Heat`_.
@@ -75,7 +75,7 @@ in the resulting table.
 * If **Provision State** is ``error`` or ``deploy failed``, then bare metal
 deployment has failed for this node. Look at the **last_error** field::
 
-openstack baremetal node show <UUID> -f value -c last_error
+$ openstack baremetal node show <UUID> -f value -c last_error
 
 If the error message is vague, you can use logs to clarify it, see
 :ref:`ironic_logs` for details.
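The **last_error** field can be collected for every node in the ``error`` or ``deploy failed`` provision state in one pass. A minimal sketch with a hypothetical ``show_deploy_errors`` helper, filtering with ``openstack baremetal node list --provision-state``.

```shell
# Hypothetical helper: print last_error for every node whose provision state
# is "error" or "deploy failed", the two states named in the text.
show_deploy_errors() {
    for state in error "deploy failed"; do
        openstack baremetal node list --provision-state "$state" -f value -c UUID |
        while IFS= read -r uuid; do
            printf '%s (%s): ' "$uuid" "$state"
            openstack baremetal node show "$uuid" -f value -c last_error
        done
    done
}
```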
@@ -89,7 +89,7 @@ Showing deployment failures
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Deployment failures can be shown with the following command::
 
-[stack@undercloud ]$ openstack overcloud failures --plan my-deployment
+$ openstack overcloud failures --plan my-deployment
 
 The command will show any errors encountered when running ``ansible-playbook``
 to configure the overcloud during the ``config-download`` process. See
@@ -104,7 +104,7 @@ Debugging Using Heat
 
 ::
 
-$ heat resource-list overcloud
+$ openstack stack resource list overcloud
 
 +-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
 | resource_name | physical_resource_id | resource_type | resource_status | updated_time |
@ -154,7 +154,7 @@ Debugging Using Heat
|
||||
|
||||
::
|
||||
|
||||
$ heat resource-show overcloud ControllerNodesPostDeployment
|
||||
$ openstack stack resource show overcloud ControllerNodesPostDeployment
|
||||
|
||||
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| Property | Value |
|
||||
@@ -175,7 +175,7 @@ Debugging Using Heat
 | updated_time | 2015-04-06T21:15:20Z |
 +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 
-The ``resource-show`` doesn't always show a clear reason why the resource
+The ``resource show`` doesn't always show a clear reason why the resource
 failed. In these cases, logging into the Overcloud node is required to
 further troubleshoot the issue.
 
@@ -185,7 +185,7 @@ Debugging Using Heat
 
 ::
 
-$ nova list
+$ openstack server list
 
 +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+---------------------+
 | ID | Name | Status | Task State | Power State | Networks |
@@ -219,17 +219,17 @@ Debugging Using Heat
 
 ::
 
-$ nova list
-$ nova show <server-id>
+$ openstack server list
+$ openstack server show <server-id>
 
 The most common error shown will reference the error message ``No valid host
 was found``. Refer to `No Valid Host Found Error`_ below.
 
 In other cases, look at the following log files for further troubleshooting::
 
-/var/log/nova/*
-/var/log/heat/*
-/var/log/ironic/*
+/var/log/containers/nova/*
+/var/log/containers/heat/*
+/var/log/containers/ironic/*
 
 * Using SOS
 
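Before reading the log files listed above, the fault that Nova recorded for each failed server often points at the cause. A minimal sketch with a hypothetical ``show_server_faults`` helper; it assumes the failed servers are in the ``ERROR`` status and that ``openstack server show`` includes a ``fault`` field for them, printing a placeholder when it does not.

```shell
# Hypothetical helper: print the recorded fault for each server in ERROR.
show_server_faults() {
    openstack server list --status ERROR -f value -c ID |
    while IFS= read -r id; do
        printf '%s: ' "$id"
        # "fault" is only present when Nova stored one; fall back otherwise.
        openstack server show "$id" -f value -c fault 2>/dev/null ||
            echo "(no fault recorded)"
    done
}
```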
@@ -247,7 +247,7 @@ Debugging Using Heat
 No Valid Host Found Error
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Sometimes ``/var/log/nova/nova-conductor.log`` contains the following error::
+Sometimes ``/var/log/containers/nova/nova-conductor.log`` contains the following error::
 
 NoValidHost: No valid host was found. There are not enough hosts available.
 
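To find which Nova log actually recorded the ``NoValidHost`` error, the containerized log directory can be searched. A minimal sketch; ``find_novalidhost`` is a hypothetical helper that prints each matching file with its most recent matching line, and it has to run as root because those logs are only readable by root.

```shell
# Hypothetical helper: for every file under DIR that mentions NoValidHost,
# print "<file>: <last matching line>".
find_novalidhost() {
    dir=${1:-/var/log/containers/nova}
    grep -rl -- NoValidHost "$dir" 2>/dev/null |
    while IFS= read -r f; do
        printf '%s: ' "$f"
        grep -- NoValidHost "$f" | tail -n 1
    done
}
```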
@@ -266,7 +266,7 @@ you have enough nodes corresponding to each flavor/profile. Watch
 
 ::
 
-openstack baremetal node show <UUID> --fields properties
+$ openstack baremetal node show <UUID> --fields properties
 
 It should contain e.g. ``profile:compute`` for compute nodes.
 
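Checking ``properties`` node by node gets tedious with many nodes. A minimal sketch that prints the profile of every node, assuming ``python3`` is available on the undercloud to parse the JSON output and that the profile is stored in the ``capabilities`` string as ``profile:<name>``; ``profile_from_caps`` and ``show_node_profiles`` are hypothetical helper names.

```shell
# Hypothetical helper: extract the profile from a capabilities string such
# as "boot_option:local,profile:compute"; prints nothing if there is none.
profile_from_caps() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^profile://p'
}

# Hypothetical helper: print "<UUID><TAB><profile>" for every node.
show_node_profiles() {
    openstack baremetal node list -f value -c UUID |
    while IFS= read -r uuid; do
        caps=$(openstack baremetal node show "$uuid" --fields properties -f json |
            python3 -c 'import json, sys; print(json.load(sys.stdin)["properties"].get("capabilities", ""))')
        printf '%s\t%s\n' "$uuid" "$(profile_from_caps "$caps")"
    done
}
```

Nodes with an empty second column have no profile assigned.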
@@ -19,7 +19,7 @@ Known Issues:
 
 The workaround is to do delete the libvirt capabilities cache and restart the service::
 
-rm -Rf /var/cache/libvirt/qemu/capabilities/
-systemctl restart libvirtd
+$ rm -Rf /var/cache/libvirt/qemu/capabilities/
+$ systemctl restart libvirtd
 
 .. _bug in libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=1195882