Debug StarlingX Issues
This guide contains some basic steps for debugging issues on StarlingX.
Record the issue
Record information about the issue so it can be reproduced during debugging. The items below describe some issue characteristics to capture.
Deployment issue type, such as bootstrap failure, provisioning failure, or functional failure.
Check the StarlingX version with the command:
cat /etc/build.info
Check the StarlingX deployment configuration (for example, Simplex, Duplex, or Multi-node) by viewing the platform configuration file:
cat /etc/platform/platform.conf
Server type, such as bare metal server(s) or VMs.
Hardware device types and characteristics, such as NICs, PCI cards, number of hard disks, and RAM size.
Other aspects of the issue include: steps for reproducing, expected results, actual results, and so on.
Can the issue be reproduced regularly or occasionally?
Gather log files and configuration files using the collect command.
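The version and configuration checks above can be condensed into a few values to paste into a bug report. The following is a minimal sketch, not part of StarlingX: get_conf_value is a hypothetical helper, and it assumes both files use simple KEY=value lines, so confirm that against the cat output on your system.

```shell
#!/bin/sh
# get_conf_value KEY FILE
# Hypothetical helper (not a StarlingX command): print the value of the first
# "KEY=value" line in FILE, with double quotes removed.
# Assumes KEY is a plain identifier with no regex metacharacters.
get_conf_value() {
    key=$1
    file=$2
    # Keep only lines starting with "KEY=", strip that prefix, take the first.
    sed -n "s/^${key}=//p" "$file" | head -n 1 | tr -d '"'
}
```

For example, get_conf_value SW_VERSION /etc/build.info or get_conf_value system_mode /etc/platform/platform.conf would print one value each, assuming those key names exist in your build. The key names are assumptions and are not taken from this guide.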
Check status and logs
Log in to the active controller.
Check services using the sm-dump command:
sudo sm-dump
Check services using the systemctl command.
Apply the platform environment for sysadmin using:
source /etc/platform/openrc
Check alarms from Fault-Manager using:
fm alarm-list --uuid
fm alarm-show <uuid>
Search for errors in /var/log.
- You must check /var/log/sysinv.log for errors.
- You can get hints from sysinv.log for many deployment failures.
- Look into other log files based on the functional area.
If a functional area log file includes errors, check the associated configuration file, which is typically located under the /etc/ subdirectory.
You may need to enable the debug option in the configuration file.
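To decide which log files to read first, it can help to rank them by how many error lines they contain. The sketch below is one way to do that with standard grep and sort. The function name count_log_matches, the default pattern, and the restriction to *.log files are illustrative assumptions, not StarlingX conventions.

```shell
#!/bin/sh
# count_log_matches [DIR] [PATTERN]
# Hypothetical helper: for each *.log file in DIR (default /var/log), count
# the lines matching PATTERN (default "error", case-insensitive) and print
# "COUNT PATH" for files with at least one match, highest count first.
count_log_matches() {
    dir=${1:-/var/log}
    pattern=${2:-error}
    for f in "$dir"/*.log; do
        [ -f "$f" ] || continue
        # grep -c prints 0 and exits non-zero when nothing matches, and
        # prints nothing if the file is unreadable; treat both as zero.
        n=$(grep -ci -e "$pattern" "$f" 2>/dev/null || true)
        if [ "${n:-0}" -gt 0 ]; then
            printf '%s %s\n' "$n" "$f"
        fi
    done | sort -rn
}
```

Running count_log_matches with no arguments scans /var/log. Some log files may only be readable with elevated privileges, in which case they are silently skipped. A second argument narrows the search, for example count_log_matches /var/log 'traceback'.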
Debug and triage
- Check the status of the Kubernetes resources: nodes, pods/jobs, endpoints, services, secrets, and configmaps.
- Check the two major namespaces: kube-system and openstack.
- If issues occur inside containerized components, you need to enter the service using the kubectl exec command.
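A quick way to find the pods worth entering is to filter the pod listing for anything that is not healthy. The sketch below assumes the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). The function name unhealthy_pods is hypothetical.

```shell
#!/bin/sh
# unhealthy_pods
# Hypothetical filter: read `kubectl get pods --all-namespaces` output on
# stdin, skip the header line, and print "NAMESPACE/NAME STATUS" for every
# pod whose STATUS column (field 4) is neither Running nor Completed.
unhealthy_pods() {
    awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}
```

A typical use is kubectl get pods --all-namespaces | unhealthy_pods. For a pod that is still running, kubectl -n <namespace> exec -it <pod> -- /bin/sh opens a shell inside it, assuming the container image ships a shell.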
Implement fixes
- You can try to resolve the issue by making online changes manually, without rebooting Linux or re-deploying StarlingX. For example, you can modify system configuration files or the StarlingX configuration/database, and then restart the corresponding services using the systemctl command or the StarlingX sm (service management) command.
- If the fixes must be applied on certain nodes (controller, worker, storage), you can temporarily lock the node, make the changes using StarlingX commands, and then unlock the node so that the changes take effect.
- If the changes must be made in C/C++/Go code, you can:
  - Make the changes in your development workspace with the StarlingX codebase.
  - Build the related packages using build-pkgs <package_name>.
  - Create and apply the patch using the starlingx_patching guide.
  - Restart the services using the systemctl command or the StarlingX sm (service management) command.
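The lock, change, unlock sequence for node-specific fixes can be rehearsed before it touches a live system. The sketch below wraps the StarlingX system host-lock and system host-unlock commands in a dry-run helper. The run and change_on_locked_host functions and the DRY_RUN variable are hypothetical names introduced here, and the sketch does not wait for the host to finish locking, which a real procedure would need to do.

```shell
#!/bin/sh
# run CMD [ARGS...]
# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# change_on_locked_host HOST CMD [ARGS...]
# Hypothetical wrapper: lock HOST, run the change command, then unlock HOST.
# The chain stops at the first failing step, so a failed change leaves the
# host locked for inspection. Polling for the "locked" state is omitted.
change_on_locked_host() {
    host=$1
    shift
    run system host-lock "$host" &&
    run "$@" &&
    run system host-unlock "$host"
}
```

With DRY_RUN=1 set, change_on_locked_host worker-0 <command> <args> only prints the three steps it would run, which makes it easy to review the sequence before executing it for real.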
Additional resources
- Review the StarlingX Discuss list for similar questions and workarounds from the community.
- Check the StarlingX Launchpad for similar issues and potential workarounds.
- Open a new StarlingX Launchpad item to report a bug.