[troubleshooting] Update commands and paths used by modern versions of TripleO.

Previously, the content was primarily targeted at Newton, which is EOL.

Change-Id: I2702c9b027be72d2dc636df4c81189b55e213ca0
Signed-off-by: Luke Short <ekultails@gmail.com>
@@ -1,8 +1,8 @@
Troubleshooting Image Build
---------------------------

Images fail to build
^^^^^^^^^^^^^^^^^^^^

More space needed
^^^^^^^^^^^^^^^^^
@@ -13,4 +13,4 @@ can fail with a message like "At least 174MB more space needed on
the / filesystem". If freeing up more RAM isn't a possibility,
images can be built on disk by exporting an environment variable::

    $ export DIB_NO_TMPFS=1
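To confirm whether memory or root-filesystem space is the limiting factor
before choosing this workaround, standard tools can be used (shown here only
as a generic illustration)::

    $ free -h
    $ df -h /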
@@ -19,7 +19,7 @@ collect controller` will match all the overcloud nodes that contain the word
`controller`. To download the logs to a local directory, run the following
command::

    $ openstack overcloud support report collect controller

.. note:: By default, if -o is not specified, the logs will be downloaded to a folder
          in the current working directory called `support_logs`.
@@ -31,7 +31,7 @@ Example: Download logs from a single host
To download logs from a specific host, you must specify the complete name as
reported by `openstack server list` from the undercloud::

    $ openstack overcloud support report collect -o /home/stack/logs overcloud-novacompute-0


Example: Leave logs in a swift container
@@ -42,14 +42,14 @@ logs, you can leave them in a swift container for later retrieval. The
``--collect-only`` and ``-c`` options can be leveraged to store the
logs in a swift container. For example::

    $ openstack overcloud support report collect -c logs_20170601 --collect-only controller
This will run sosreport on the nodes and upload the logs to a container named
`logs_20170601` on the undercloud, from which standard swift tooling can be
used to download the logs. Alternatively, you can then fetch the logs using
the `openstack overcloud support report collect` command by running::

    $ openstack overcloud support report collect -c logs_20170601 --download-only -o /tmp/mylogs controller
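If you prefer to retrieve the bundles with the standard swift tooling
directly, the objects can be listed and saved with the generic OpenStack
client commands, for example (the container name matches the one used above;
the object name is a placeholder)::

    $ openstack object list logs_20170601
    $ openstack object save logs_20170601 <OBJECT NAME>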
.. note:: There is a ``--skip-container-delete`` option that can be used if you
          want to leave the logs in swift but still download them. This option
@@ -64,6 +64,4 @@ The ``openstack overcloud support report collect`` command has additional
options that can be passed to work with the log bundles. Run the command with
``--help`` to see additional options::

    $ openstack overcloud support report collect --help
@@ -5,13 +5,13 @@ Where Are the Logs?
-------------------

Some logs are stored in *journald*, but most are stored as text files in
``/var/log/containers``. They are only accessible by the root user.
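For example, a service's journal or its log files can be read with the usual
tools (the unit and file names below are placeholders, not specific values
from this guide)::

    $ sudo journalctl -u <SERVICE UNIT>
    $ sudo less /var/log/containers/<SERVICE>/<LOG FILE>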
ironic-inspector
~~~~~~~~~~~~~~~~

The introspection logs (from ironic-inspector) are located in
``/var/log/containers/ironic-inspector``. If something fails during the introspection
ramdisk run, ironic-inspector stores the ramdisk logs in
``/var/log/containers/ironic-inspector/ramdisk/`` as gz-compressed tar files.
File names contain date, time and IPMI address of the node if it was detected
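To read one of these ramdisk log bundles, it can be extracted with standard
tools, for example (the file name is a placeholder for one of the tar files
in the directory above)::

    $ mkdir /tmp/ramdisk-logs
    $ sudo tar -xzf /var/log/containers/ironic-inspector/ramdisk/<FILE NAME>.tar.gz -C /tmp/ramdisk-logs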
@@ -27,9 +27,9 @@ To collect introspection logs on success as well, set
ironic
~~~~~~

The deployment logs (from ironic) are located in ``/var/log/containers/ironic``. If
something goes wrong during deployment or cleaning, the ramdisk logs are
stored in ``/var/log/containers/ironic/deploy``. See the `ironic log retrieval documentation
<https://docs.openstack.org/ironic/latest/admin/troubleshooting.html#retrieving-logs-from-the-deploy-ramdisk>`_
for more details.
@@ -60,16 +60,16 @@ For example, a wrong MAC can be fixed in two steps:

* Find out the assigned port UUID by running
  ::

      $ openstack baremetal port list --node <NODE UUID>

* Update the MAC address by running
  ::

      $ openstack baremetal port set --address <NEW MAC> <PORT UUID>

A wrong IPMI address can be fixed with the following command::

    $ openstack baremetal node set <NODE UUID> --driver-info ipmi_address=<NEW IPMI ADDRESS>
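After updating the node, the stored information can be double-checked, for
example::

    $ openstack baremetal node show <NODE UUID> --fields driver_info
    $ openstack baremetal port list --node <NODE UUID>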
Node power state is not enforced by Ironic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -103,7 +103,7 @@ power management, and it gets stuck in an abnormal state.
Ironic requires that nodes that cannot be operated normally are put into
maintenance mode. This is done with the following command::

    $ openstack baremetal node maintenance set <NODE UUID> --reason "<EXPLANATION>"

Ironic will stop checking power and health state for such nodes, and Nova will
not pick them for deployment. Power commands will still work on them, though.
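To see which nodes are currently in maintenance mode, the node list can be
filtered, for example::

    $ openstack baremetal node list --maintenance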
@@ -112,11 +112,11 @@ After a node is in the maintenance mode, you can attempt repairing it, e.g. by
`Fixing invalid node information`_. If you manage to make the node operational
again, move it out of maintenance mode::

    $ openstack baremetal node maintenance unset <NODE UUID>

If repairing is not possible, you can force deletion of such a node::

    $ openstack baremetal node delete <NODE UUID>
Forcing node removal will leave it powered on, accessing the network with
the old IP address(es) and with all services running. Before proceeding, make
@@ -163,7 +163,7 @@ or DHCP logs from

::

    $ sudo journalctl -u openstack-ironic-inspector-dnsmasq

SSH as a root user with the temporary password or the SSH key.
@@ -189,5 +189,4 @@ How can introspection be stopped?

Introspection for a node can be stopped with the following command::

    $ openstack baremetal introspection abort <NODE UUID>
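The current state of the introspection run can then be checked, for example::

    $ openstack baremetal introspection status <NODE UUID>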
@@ -11,7 +11,7 @@ Identifying Failed Component
In most cases, Heat will show the failed overcloud stack when a deployment
has failed::

    $ openstack stack list

    +--------------------------------------+------------+--------------------+----------------------+
    | id                                   | stack_name | stack_status       | creation_time        |
@@ -19,10 +19,10 @@ has failed::
    | 7e88af95-535c-4a55-b78d-2c3d9850d854 | overcloud  | CREATE_FAILED      | 2015-04-06T17:57:16Z |
    +--------------------------------------+------------+--------------------+----------------------+

Occasionally, Heat is not even able to create the stack, so the ``openstack
stack list`` output will be empty. If this is the case, observe the message
that was printed to the terminal when ``openstack overcloud deploy`` or ``openstack
stack create`` was run.

Next, there are a few layers on which the deployment can fail:
@@ -50,7 +50,7 @@ in the resulting table.

You can check the actual cause using the following command::

    $ openstack baremetal node show <UUID> -f value -c maintenance_reason

For example, **Maintenance** goes to ``True`` automatically if wrong power
credentials are provided.
@@ -58,7 +58,7 @@ in the resulting table.
Fix the cause of the failure, then move the node out of maintenance
mode::

    $ openstack baremetal node maintenance unset <NODE UUID>

* If **Provision State** is ``available`` then the problem occurred before
  bare metal deployment even started. Proceed with `Debugging Using Heat`_.
@@ -75,7 +75,7 @@ in the resulting table.
* If **Provision State** is ``error`` or ``deploy failed``, then bare metal
  deployment has failed for this node. Look at the **last_error** field::

      $ openstack baremetal node show <UUID> -f value -c last_error

  If the error message is vague, you can use logs to clarify it; see
  :ref:`ironic_logs` for details.
@@ -89,7 +89,7 @@ Showing deployment failures
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Deployment failures can be shown with the following command::

    $ openstack overcloud failures --plan my-deployment
The command will show any errors encountered when running ``ansible-playbook``
to configure the overcloud during the ``config-download`` process. See
@@ -104,7 +104,7 @@ Debugging Using Heat

::

    $ openstack stack resource list overcloud

    +-----------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+
    | resource_name | physical_resource_id | resource_type | resource_status | updated_time |
@@ -154,7 +154,7 @@ Debugging Using Heat

::

    $ openstack stack resource show overcloud ControllerNodesPostDeployment

    +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Property | Value |
@@ -175,7 +175,7 @@ Debugging Using Heat
    | updated_time | 2015-04-06T21:15:20Z |
    +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The ``resource show`` output doesn't always show a clear reason why the
resource failed. In these cases, logging into the Overcloud node is required
to further troubleshoot the issue.
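As a shortcut, recent versions of the heat client can also summarize all
failed resources of a stack in one step; assuming that plugin is available,
something like the following can be used::

    $ openstack stack failures list overcloud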
@@ -185,7 +185,7 @@ Debugging Using Heat

::

    $ openstack server list

    +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+---------------------+
    | ID | Name | Status | Task State | Power State | Networks |
@@ -219,17 +219,17 @@ Debugging Using Heat

::

    $ openstack server list
    $ openstack server show <server-id>

The most common error shown will reference the error message ``No valid host
was found``. Refer to `No Valid Host Found Error`_ below.

In other cases, look at the following log files for further troubleshooting::

    /var/log/containers/nova/*
    /var/log/containers/heat/*
    /var/log/containers/ironic/*
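A quick, generic way to scan these logs for problems (not specific to
TripleO) is to search them for errors and tracebacks, for example::

    $ sudo grep -riE 'error|traceback' /var/log/containers/nova/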
* Using SOS
@@ -247,7 +247,7 @@ Debugging Using Heat
No Valid Host Found Error
^^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes ``/var/log/containers/nova/nova-conductor.log`` contains the following error::

    NoValidHost: No valid host was found. There are not enough hosts available.
@@ -266,7 +266,7 @@ you have enough nodes corresponding to each flavor/profile. Watch

::

    $ openstack baremetal node show <UUID> --fields properties

It should contain e.g. ``profile:compute`` for compute nodes.
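If the expected profile is missing, it can usually be added to the node's
capabilities; the exact capability string depends on your environment, but a
typical example looks like::

    $ openstack baremetal node set <UUID> --property capabilities='profile:compute,boot_option:local'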
@@ -19,7 +19,7 @@ Known Issues:

The workaround is to delete the libvirt capabilities cache and restart the service::

    $ rm -Rf /var/cache/libvirt/qemu/capabilities/
    $ systemctl restart libvirtd

.. _bug in libvirt: https://bugzilla.redhat.com/show_bug.cgi?id=1195882