======================
Debug StarlingX Issues
======================

This guide contains some basic steps for debugging issues on StarlingX.

.. contents::
   :local:
   :depth: 1

----------------
Record the issue
----------------

Record information about the issue so it can be reproduced during debugging. The
items below describe some issue characteristics to capture.

*   Deployment issue type, such as bootstrap failure, provisioning failure, or
    functional failure.

*   Check the StarlingX version with the command:
    ::

      cat /etc/build.info

*   Check the StarlingX deployment configuration (for example, Simplex, Duplex,
    or multi-node) by viewing the platform configuration file:
    ::

      cat /etc/platform/platform.conf

*   Server type, such as bare metal server(s) or VMs.

*   Hardware device types and characteristics, such as NICs, PCI cards, number
    of hard disks, and RAM size.

*   Other aspects of the issue, including steps for reproducing, expected
    results, and actual results.

*   Whether the issue can be reproduced consistently or only occasionally.

*   Gather log files and configuration files using the ``collect`` command.

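The version and configuration checks above can be gathered in one pass. The
following is a minimal sketch, not an official StarlingX tool. The helper names
``conf_value`` and ``record_issue_info`` are hypothetical, and the sketch assumes
that both files use plain ``KEY=value`` lines. The ``system_mode`` key is an
assumption about the contents of ``platform.conf``; adjust it to match your
system.

```shell
#!/bin/sh
# Hypothetical helpers; not part of StarlingX.

# Print the value of KEY from a KEY=value style FILE (first match only).
conf_value() {
    key="$1"
    file="$2"
    # Strip optional double quotes around the value.
    sed -n "s/^${key}=//p" "$file" | head -n 1 | tr -d '"'
}

# Print a short summary for an issue report.
# Arguments default to the paths named in this guide.
record_issue_info() {
    build_info="${1:-/etc/build.info}"
    platform_conf="${2:-/etc/platform/platform.conf}"

    echo "== Build info =="
    if [ -r "$build_info" ]; then
        cat "$build_info"
    else
        echo "not readable: $build_info"
    fi

    echo "== Deployment =="
    if [ -r "$platform_conf" ]; then
        # Assumption: platform.conf has a system_mode=... line.
        echo "system_mode: $(conf_value system_mode "$platform_conf")"
    else
        echo "not readable: $platform_conf"
    fi
}
```

After defining the functions, ``record_issue_info > issue-summary.txt`` would
save the summary to a file that can be attached to a bug report alongside the
output of ``collect``.
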
---------------------
Check status and logs
---------------------

*   Log in to the active controller.

*   Check services using the ``sm-dump`` command:
    ::

      sudo sm-dump

*   Check services using the ``systemctl`` command.

*   Apply the platform environment for ``sysadmin`` using:
    ::

      source /etc/platform/openrc

*   Check alarms from Fault-Manager using:
    ::

      fm alarm-list --uuid
      fm alarm-show <uuid>

*   Search for errors in ``/var/log``.

    *   You **must** check ``/var/log/sysinv.log`` for errors.
    *   You can get hints from ``sysinv.log`` for many deployment failures.
    *   Look into other log files based on the functional area.

*   If a functional area log file includes errors, check the associated
    configuration file, which is typically located under the ``/etc/``
    directory.

*   You may need to enable the ``debug`` option in the configuration file.

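The log search described above can be narrowed to the most recent error
entries. This is a minimal sketch under stated assumptions: the helper name
``recent_errors`` is hypothetical, and it assumes that error entries contain the
literal string ``ERROR``, which may not hold for every log file.

```shell
#!/bin/sh
# Hypothetical helper; not part of StarlingX.
# Print the last COUNT lines containing ERROR from LOG_FILE, with line numbers.
recent_errors() {
    log_file="${1:-/var/log/sysinv.log}"
    count="${2:-20}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # Assumption: error entries contain the literal string ERROR.
    grep -n 'ERROR' "$log_file" | tail -n "$count"
}
```

Run the function as a user with read access to the log, for example
``recent_errors /var/log/sysinv.log 50``. For the ``systemctl`` check,
``systemctl --failed`` lists the units that are currently in a failed state.
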
----------------
Debug and triage
----------------

*   Check the status of Kubernetes resources: nodes, pods/jobs, endpoints,
    services, secrets, and configmaps.

*   Check the two major namespaces: ``kube-system`` and ``openstack``.

*   If issues occur inside containerized components, open a shell in the
    affected container using the ``kubectl exec`` command.

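The Kubernetes checks above can be sketched as a pair of shell functions. The
function names ``k8s_overview`` and ``pod_shell`` and the ``KUBECTL`` override
variable are hypothetical. The sketch also assumes that the target container
image provides ``/bin/sh``, which is not guaranteed.

```shell
#!/bin/sh
# KUBECTL can be overridden, for example KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: list nodes, then key resources in each namespace
# named in this guide.
k8s_overview() {
    $KUBECTL get nodes
    for ns in kube-system openstack; do
        $KUBECTL get pods,jobs,endpoints,services,secrets,configmaps -n "$ns"
    done
}

# Hypothetical helper: open an interactive shell in a pod.
# Usage: pod_shell NAMESPACE POD
# Assumption: the container image provides /bin/sh.
pod_shell() {
    ns="$1"
    pod="$2"
    $KUBECTL exec -it "$pod" -n "$ns" -- /bin/sh
}
```

Setting ``KUBECTL=echo`` before calling either function prints the commands
without running them, which is a convenient way to review them first.
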
---------------
Implement fixes
---------------

*   You can try to resolve the issue by manually making some online
    changes without rebooting Linux or even re-deploying StarlingX. For
    example, you can modify system config files or the StarlingX
    config/database. You can make the changes and restart the corresponding
    services using the ``systemctl`` command or the StarlingX ``sm`` (service
    management) command.

*   If the fixes must be applied to specific nodes (controller, worker,
    storage), you can temporarily **lock** the node, make changes using
    StarlingX commands, and then **unlock** the node to make the changes take
    effect.

*   If the changes must be made in C/C++/Go code, you can:

    *   Make the changes in your *development workspace* with the StarlingX
        codebase.
    *   Build the related packages using ``build-pkgs <package_name>``.
    *   Create and apply the patch using the :ref:`starlingx_patching` guide.
    *   Restart the services using the ``systemctl`` command or the StarlingX
        ``sm`` (service management) command.

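The lock, change, and unlock sequence described above can be sketched as a
wrapper function. The function name ``with_host_locked`` and the ``SYSTEM``
override variable are hypothetical. The sketch assumes that the platform
environment has already been loaded with ``source /etc/platform/openrc``. It
does not wait for the host to reach the locked state before running the
change, which a real procedure would likely need to do.

```shell
#!/bin/sh
# SYSTEM can be overridden, for example SYSTEM=echo for a dry run.
SYSTEM="${SYSTEM:-system}"

# Hypothetical helper: lock a host, run a change command, then unlock.
# Usage: with_host_locked HOST COMMAND [ARGS...]
with_host_locked() {
    host="$1"
    shift
    $SYSTEM host-lock "$host" || return 1
    # Run the caller's change; remember its status but always try to unlock.
    status=0
    "$@" || status=$?
    $SYSTEM host-unlock "$host" || return 1
    return "$status"
}
```

With ``SYSTEM=echo``, a call such as ``with_host_locked worker-0 true`` prints
the lock and unlock commands instead of running them. The host name
``worker-0`` is only an example.
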
--------------------
Additional resources
--------------------

*   Review the `StarlingX Discuss list <http://lists.starlingx.io/pipermail/starlingx-discuss/>`_
    for similar questions and workarounds from the community.

*   Check the `StarlingX Launchpad <https://launchpad.net/starlingx>`_ for
    similar issues and potential workarounds.

*   Open a new `StarlingX Launchpad <https://launchpad.net/starlingx>`_ item to
    report a bug.