diff --git a/doc/source/_includes/data-networks-overview.rest b/doc/source/_includes/data-networks-overview.rest
new file mode 100644
index 000000000..0c828b361
--- /dev/null
+++ b/doc/source/_includes/data-networks-overview.rest
@@ -0,0 +1 @@
+.. This file must exist to satisfy build requirements.
\ No newline at end of file
diff --git a/doc/source/_includes/openstack-alarm-messages-xxxs.rest b/doc/source/_includes/openstack-alarm-messages-xxxs.rest
new file mode 100644
index 000000000..82e640ffc
--- /dev/null
+++ b/doc/source/_includes/openstack-alarm-messages-xxxs.rest
@@ -0,0 +1,36 @@
+The system inventory and maintenance service reports system changes with
+different degrees of severity. Use the reported alarms to monitor the overall
+health of the system.
+
+For more information, see :ref:`Overview <fault-management-overview>`.
+
+In the following tables, the severity of the alarms is represented by one or
+more letters, as follows:
+
+.. _alarm-messages-300s-ul-jsd-jkg-vp:
+
+- C: Critical
+
+- M: Major
+
+- m: Minor
+
+- W: Warning
+
+A slash-separated list of letters is used when the alarm can be triggered with
+one of several severity levels.
+
+An asterisk \(\*\) indicates the management-affecting severity, if any. A
+management-affecting alarm is one that cannot be ignored at the indicated
+severity level or higher by using relaxed alarm rules during an orchestrated
+patch or upgrade operation.
+
+
+Differences exist between the terminology emitted by some alarms and that
+used in the CLI, GUI, and elsewhere in the documentation:
+
+- References to provider networks in alarms refer to data networks.
+
+- References to data networks in alarms refer to physical networks.
+
+- References to tenant networks in alarms refer to project networks.
\ No newline at end of file
diff --git a/doc/source/_includes/openstack-customer-log-messages-xxxs b/doc/source/_includes/openstack-customer-log-messages-xxxs
new file mode 100644
index 000000000..cdf117757
--- /dev/null
+++ b/doc/source/_includes/openstack-customer-log-messages-xxxs
@@ -0,0 +1,13 @@
+The Customer Logs include events that do not require immediate user action.
+
+The following types of events are included in the Customer Logs. The severity of the events is represented in the table by one or more letters, as follows:
+
+- C: Critical
+
+- M: Major
+
+- m: Minor
+
+- W: Warning
+
+- NA: Not applicable
\ No newline at end of file
diff --git a/doc/source/_includes/openstack-customer-log-messages-xxxs.rest b/doc/source/_includes/openstack-customer-log-messages-xxxs.rest
new file mode 100644
index 000000000..dd1cd9cb7
--- /dev/null
+++ b/doc/source/_includes/openstack-customer-log-messages-xxxs.rest
@@ -0,0 +1,15 @@
+The Customer Logs include events that do not require immediate user action.
+
+The following types of events are included in the Customer Logs. The severity of the events is represented in the table by one or more letters, as follows:
+
+.. _customer-log-messages-401s-services-ul-jsd-jkg-vp:
+
+- C: Critical
+
+- M: Major
+
+- m: Minor
+
+- W: Warning
+
+- NA: Not applicable
\ No newline at end of file
diff --git a/doc/source/_includes/troubleshooting-log-collection.rest b/doc/source/_includes/troubleshooting-log-collection.rest
new file mode 100644
index 000000000..0c828b361
--- /dev/null
+++ b/doc/source/_includes/troubleshooting-log-collection.rest
@@ -0,0 +1 @@
+.. This file must exist to satisfy build requirements.
\ No newline at end of file
diff --git a/doc/source/_includes/x00-series-alarm-messages.rest b/doc/source/_includes/x00-series-alarm-messages.rest
new file mode 100644
index 000000000..c8a9374df
--- /dev/null
+++ b/doc/source/_includes/x00-series-alarm-messages.rest
@@ -0,0 +1,33 @@
+
+.. rsg1586183719424
+.. _alarm-messages-overview:
+
+Alarm messages are numerically coded by the type of alarm.
+
+For more information, see
+:ref:`Fault Management Overview <fault-management-overview>`.
+
+In the alarm description tables, the severity of the alarms is represented by
+one or more letters, as follows:
+
+.. _alarm-messages-overview-ul-jsd-jkg-vp:
+
+- C: Critical
+
+- M: Major
+
+- m: Minor
+
+- W: Warning
+
+A slash-separated list of letters is used when the alarm can be triggered with
+one of several severity levels.
+
+An asterisk \(\*\) indicates the management-affecting severity, if any. A
+management-affecting alarm is one that cannot be ignored at the indicated
+severity level or higher by using relaxed alarm rules during an orchestrated
+patch or upgrade operation.
+
+.. note::
+   **Degrade Affecting Severity: Critical** indicates a node will be
+   degraded if the alarm reaches a Critical level.
\ No newline at end of file
diff --git a/doc/source/fault/100-series-alarm-messages.rst b/doc/source/fault/100-series-alarm-messages.rst
new file mode 100644
index 000000000..596d2bdd9
--- /dev/null
+++ b/doc/source/fault/100-series-alarm-messages.rst
@@ -0,0 +1,336 @@
+
+.. jsy1579701868527
+.. _100-series-alarm-messages:
+
+=========================
+100 Series Alarm Messages
+=========================
+
+The system inventory and maintenance service reports system changes with
+different degrees of severity. Use the reported alarms to monitor the overall
+health of the system.
+
+.. include:: ../_includes/x00-series-alarm-messages.rest
+
+.. _100-series-alarm-messages-table-zrd-tg5-v5:
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.101**
+     - Platform CPU threshold exceeded; threshold x%, actual y%.
+
+       CRITICAL @ 95%
+
+       MAJOR @ 90%
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - Critical
+   * - Severity:
+     - C/M\*
+   * - Proposed Repair Action
+     - Monitor and if condition persists, contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.103**
+     - Memory threshold exceeded; threshold x%, actual y%.
+
+       CRITICAL @ 90%
+
+       MAJOR @ 80%
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - Critical
+   * - Severity:
+     - C/M
+   * - Proposed Repair Action
+     - Monitor and if condition persists, contact next level of support; may
+       require additional memory on Host.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.104**
+     - File System threshold exceeded; threshold x%, actual y%.
+
+       CRITICAL @ 90%
+
+       MAJOR @ 80%
+   * - Entity Instance
+     - host=.filesystem=
+   * - Degrade Affecting Severity:
+     - Critical
+   * - Severity:
+     - C\*/M
+   * - Proposed Repair Action
+     - Monitor and if condition persists, consider adding additional physical
+       volumes to the volume group.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.105**
+     - filesystem is not added on both controllers and/or does not
+       have the same size: .
+   * - Entity Instance
+     - fs\_name=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - C/M\*
+   * - Proposed Repair Action
+     - Add image-conversion filesystem on both controllers.
+
+       Consult the System Administration Manual for more details.
+
+       If problem persists, contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.106**
+     - 'OAM' Port failed.
+   * - Entity Instance
+     - host=.port=
+   * - Degrade Affecting Severity:
+     - Major
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Check cabling and far-end port configuration and status on adjacent
+       equipment.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.107**
+     - 'OAM' Interface degraded.
+
+       or
+
+       'OAM' Interface failed.
+   * - Entity Instance
+     - host=.interface=
+   * - Degrade Affecting Severity:
+     - Major
+   * - Severity:
+     - C or M\*
+   * - Proposed Repair Action
+     - Check cabling and far-end port configuration and status on adjacent
+       equipment.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.108**
+     - 'MGMT' Port failed.
+   * - Entity Instance
+     - host=.port=
+   * - Degrade Affecting Severity:
+     - Major
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Check cabling and far-end port configuration and status on adjacent
+       equipment.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.109**
+     - 'MGMT' Interface degraded.
+
+       or
+
+       'MGMT' Interface failed.
+   * - Entity Instance
+     - host=.interface=
+   * - Degrade Affecting Severity:
+     - Major
+   * - Severity:
+     - C or M\*
+   * - Proposed Repair Action
+     - Check cabling and far-end port configuration and status on adjacent
+       equipment.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.110**
+     - 'CLUSTER-HOST' Port failed.
+   * - Entity Instance
+     - host=.port=
+   * - Degrade Affecting Severity:
+     - Major
+   * - Severity:
+     - C or M\*
+   * - Proposed Repair Action
+     - Check cabling and far-end port configuration and status on adjacent
+       equipment.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.111**
+     - 'CLUSTER-HOST' Interface degraded.
+
+       OR
+
+       'CLUSTER-HOST' Interface failed.
+   * - Entity Instance
+     - host=.interface=
+   * - Degrade Affecting Severity:
+     - Major
+   * - Severity:
+     - C or M\*
+   * - Proposed Repair Action
+     - Check cabling and far-end port configuration and status on adjacent
+       equipment.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.112**
+     - 'DATA-VRS' Port down.
+   * - Entity Instance
+     - host=.port=
+   * - Degrade Affecting Severity:
+     - Major
+   * - Severity:
+     - M
+   * - Proposed Repair Action
+     - Check cabling and far-end port configuration and status on adjacent
+       equipment.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.113**
+     - 'DATA-VRS' Interface degraded.
+
+       or
+
+       'DATA-VRS' Interface down.
+   * - Entity Instance
+     - host=.interface=
+   * - Degrade Affecting Severity:
+     - Major
+   * - Severity:
+     - C or M\*
+   * - Proposed Repair Action
+     - Check cabling and far-end port configuration and status on adjacent
+       equipment.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.114**
+     - NTP configuration does not contain any valid or reachable NTP servers.
+       The alarm is raised regardless of NTP enabled/disabled status.
+
+       NTP address is not a valid or a reachable NTP server.
+
+       Connectivity to external PTP Clock Synchronization is lost.
+
+   * - Entity Instance
+     - host=.ntp
+
+       host=.ntp=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M or m
+   * - Proposed Repair Action
+     - Monitor and if condition persists, contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.118**
+     - Controller cannot establish connection with remote logging server.
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - m
+   * - Proposed Repair Action
+     - Ensure Remote Log Server IP is reachable from Controller through OAM
+       interface; otherwise contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 100.119**
+     - Major: PTP configuration or out-of-tolerance time-stamping conditions.
+
+       Minor: PTP out-of-tolerance time-stamping condition.
+   * - Entity Instance
+     - host=.ptp OR host=.ptp=no-lock
+
+       OR
+
+       host=.ptp=.unsupported=hardware-timestamping
+
+       OR
+
+       host=.ptp=.unsupported=software-timestamping
+
+       OR
+
+       host=.ptp=.unsupported=legacy-timestamping
+
+       OR
+
+       host=.ptp=out-of-tolerance
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M or m
+   * - Proposed Repair Action
+     - Monitor and, if condition persists, contact next level of support.
\ No newline at end of file
diff --git a/doc/source/fault/200-series-alarm-messages.rst b/doc/source/fault/200-series-alarm-messages.rst
new file mode 100644
index 000000000..a49c3691e
--- /dev/null
+++ b/doc/source/fault/200-series-alarm-messages.rst
@@ -0,0 +1,402 @@
+
+.. uof1579701912856
+.. _200-series-alarm-messages:
+
+=========================
+200 Series Alarm Messages
+=========================
+
+The system inventory and maintenance service reports system changes with
+different degrees of severity. Use the reported alarms to monitor the overall
+health of the system.
+
+.. include:: ../_includes/x00-series-alarm-messages.rest
+
+.. _200-series-alarm-messages-table-zrd-tg5-v5:
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 200.001**
+     - was administratively locked to take it out-of-service.
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - W\*
+   * - Proposed Repair Action
+     - Administratively unlock Host to bring it back in-service.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 200.004**
+     - experienced a service-affecting failure.
+
+       Host is being auto recovered by Reboot.
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - C\*
+   * - Proposed Repair Action
+     - If auto-recovery is consistently unable to recover host to the
+       unlocked-enabled state, contact next level of support or lock and
+       replace failing host.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 200.005**
+     - Degrade:
+
+       is experiencing intermittent 'Management Network'
+       communication failures that have exceeded its lower alarming threshold.
+
+       Failure:
+
+       is experiencing a persistent Critical 'Management Network'
+       communication failure.
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\* (Degrade) or C\* (Failure)
+   * - Proposed Repair Action
+     - Check 'Management Network' connectivity and support for multicast
+       messaging. If problem consistently occurs after that and Host is reset,
+       then contact next level of support or lock and replace failing host.
+
+-----
+
+..
list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 200.006**
+     - Main Process Monitor Daemon Failure \(Major\)
+
+       'Process Monitor' \(pmond\) process is not running or
+       functioning properly. The system is trying to recover this process.
+
+       Monitored Process Failure \(Critical/Major/Minor\)
+
+       Critical: Critical '' process has failed and
+       could not be auto-recovered gracefully. Auto-recovery progression by
+       host reboot is required and in progress.
+
+       Major: is degraded due to the failure of its ''
+       process. Auto recovery of this Major process is in progress.
+
+       Minor:
+
+       '' process has failed. Auto recovery of this
+       Minor process is in progress.
+
+       '' process has failed. Manual recovery is required.
+
+       ptp4l/phc2sys process failure. Manual recovery is required.
+   * - Entity Instance
+     - host=.process=
+   * - Degrade Affecting Severity:
+     - Major
+   * - Severity:
+     - C/M/m\*
+   * - Proposed Repair Action
+     - If this alarm does not automatically clear after some time and continues
+       to be asserted after Host is locked and unlocked then contact next level
+       of support for root cause analysis and recovery.
+
+       If problem consistently occurs after Host is locked and unlocked then
+       contact next level of support for root cause analysis and recovery.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 200.007**
+     - Critical: \(with host degrade\):
+
+       Host is degraded due to a 'Critical' out-of-tolerance reading from the
+       '' sensor
+
+       Major: \(with host degrade\)
+
+       Host is degraded due to a 'Major' out-of-tolerance reading from the
+       '' sensor
+
+       Minor:
+
+       Host is reporting a 'Minor' out-of-tolerance reading from the
+       '' sensor
+   * - Entity Instance
+     - host=.sensor=
+   * - Degrade Affecting Severity:
+     - Critical
+   * - Severity:
+     - C/M/m
+   * - Proposed Repair Action
+     - If problem consistently occurs after Host is power cycled and/or reset,
+       contact next level of support or lock and replace failing host.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 200.009**
+     - Degrade:
+
+       is experiencing intermittent 'Cluster-host Network'
+       communication failures that have exceeded its lower alarming threshold.
+
+       Failure:
+
+       is experiencing a persistent Critical 'Cluster-host Network'
+       communication failure.
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\* (Degrade) or C\* (Failure)
+   * - Proposed Repair Action
+     - Check 'Cluster-host Network' connectivity and support for multicast
+       messaging. If problem consistently occurs after that and Host is reset,
+       then contact next level of support or lock and replace failing host.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 200.010**
+     - access to board management module has failed.
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - W
+   * - Proposed Repair Action
+     - Check Host's board management configuration and connectivity.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 200.011**
+     - experienced a configuration failure during initialization.
+       Host is being re-configured by Reboot.
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - C\*
+   * - Proposed Repair Action
+     - If auto-recovery is consistently unable to recover host to the
+       unlocked-enabled state, contact next level of support or lock and
+       replace failing host.
+
+-----
+
+..
list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 200.012** + - controller function has in-service failure while compute + services remain healthy. + * - Entity Instance + - host= + * - Degrade Affecting Severity: + - Major + * - Severity: + - C\* + * - Proposed Repair Action + - Lock and then Unlock host to recover. Avoid using 'Force Lock' action + as that will impact compute services running on this host. If lock action + fails then contact next level of support to investigate and recover. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 200.013** + - compute service of the only available controller is not + operational. Auto-recovery is disabled. Degrading host instead. + * - Entity Instance + - host= + * - Degrade Affecting Severity: + - Major + * - Severity: + - M\* + * - Proposed Repair Action + - Enable second controller and Switch Activity \(Swact\) over to it as + soon as possible. Then Lock and Unlock host to recover its local compute + service. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 200.014** + - The Hardware Monitor was unable to load, configure and monitor one + or more hardware sensors. + * - Entity Instance + - host= + * - Degrade Affecting Severity: + - None + * - Severity: + - m + * - Proposed Repair Action + - Check Board Management Controller provisioning. Try reprovisioning the + BMC. If problem persists try power cycling the host and then the entire + server including the BMC power. If problem persists then contact next + level of support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 200.015** + - Unable to read one or more sensor groups from this host's board + management controller. + * - Entity Instance + - host= + * - Degrade Affecting Severity: + - None + * - Severity: + - M + * - Proposed Repair Action + - Check board management connectivity and try rebooting the board + management controller. If problem persists contact next level of + support or lock and replace failing host. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 210.001** + - System Backup in progress. + * - Entity Instance + - host=controller + * - Degrade Affecting Severity: + - None + * - Severity: + - m\* + * - Proposed Repair Action + - No action required. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 250.001** + - Configuration is out-of-date. + * - Entity Instance + - host= + * - Degrade Affecting Severity: + - None + * - Severity: + - M\* + * - Proposed Repair Action + - Administratively lock and unlock to update config. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 250.003** + - Kubernetes certificates rotation failed on host . + * - Entity Instance + - host= + * - Degrade Affecting Severity: + - None + * - Severity: + - M/w + * - Proposed Repair Action + - Rotate kubernetes certificates manually. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 270.001** + - Host compute services failure\[, reason = \] + * - Entity Instance + - host=.services=compute + * - Degrade Affecting Severity: + - None + * - Severity: + - C\* + * - Proposed Repair Action + - Wait for host services recovery to complete; if problem persists contact + next level of support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 280.001** + - is offline. 
+
+   * - Entity Instance
+     - subcloud=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - C\*
+   * - Proposed Repair Action
+     - Wait for subcloud to become online; if problem persists contact next
+       level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 280.002**
+     - sync status is out-of-sync.
+   * - Entity Instance
+     - \[subcloud=.resource= \| \|
+       \| \]
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - If problem persists contact next level of support.
\ No newline at end of file
diff --git a/doc/source/fault/200-series-maintenance-customer-log-messages.rst b/doc/source/fault/200-series-maintenance-customer-log-messages.rst
new file mode 100644
index 000000000..c0e3dfee4
--- /dev/null
+++ b/doc/source/fault/200-series-maintenance-customer-log-messages.rst
@@ -0,0 +1,120 @@
+
+.. lzz1579291773073
+.. _200-series-maintenance-customer-log-messages:
+
+============================================
+200 Series Maintenance Customer Log Messages
+============================================
+
+The Customer Logs include events that do not require immediate user action.
+
+The following types of events are included in the Customer Logs. The severity
+of the events is represented in the table by one or more letters, as follows:
+
+.. _200-series-maintenance-customer-log-messages-ul-jsd-jkg-vp:
+
+- C: Critical
+
+- M: Major
+
+- m: Minor
+
+- W: Warning
+
+- NA: Not applicable
+
+.. _200-series-maintenance-customer-log-messages-table-zgf-jvw-v5:
+
+
+.. table:: Table 1. Customer Log Messages
+   :widths: auto
+
+   +-----------------+------------------------------------------------------------------+----------+
+   | Log ID          | Description                                                      | Severity |
+   +                 +------------------------------------------------------------------+----------+
+   |                 | Entity Instance ID                                               |          |
+   +=================+==================================================================+==========+
+   | 200.020         | has been 'discovered' on the network                             | NA       |
+   |                 |                                                                  |          |
+   |                 | host=.event=discovered                                           |          |
+   +-----------------+------------------------------------------------------------------+----------+
+   | 200.020         | has been 'added' to the system                                   | NA       |
+   |                 |                                                                  |          |
+   |                 | host=.event=add                                                  |          |
+   +-----------------+------------------------------------------------------------------+----------+
+   | 200.020         | has 'entered' multi-node failure avoidance                       | NA       |
+   |                 |                                                                  |          |
+   |                 | host=.event=mnfa\_enter                                          |          |
+   +-----------------+------------------------------------------------------------------+----------+
+   | 200.020         | has 'exited' multi-node failure avoidance                        | NA       |
+   |                 |                                                                  |          |
+   |                 | host=.event=mnfa\_exit                                           |          |
+   +-----------------+------------------------------------------------------------------+----------+
+   | 200.021         | board management controller has been 'provisioned'               | NA       |
+   |                 |                                                                  |          |
+   |                 | host=.command=provision                                          |          |
+   +-----------------+------------------------------------------------------------------+----------+
+   | 200.021         | board management controller has been 're-provisioned'            | NA       |
+   |                 |                                                                  |          |
+   |                 | host=.command=reprovision                                        |          |
+   +-----------------+------------------------------------------------------------------+----------+
+   | 200.021         | board management controller has been 'de-provisioned'            | NA       |
+   |                 |                                                                  |          |
+   |                 | host=.command=deprovision                                        |          |
+   +-----------------+------------------------------------------------------------------+----------+
+   | 200.021         | manual 'unlock' request                                          | NA       |
+   |                 |                                                                  |          |
+   |                 | host=.command=unlock                                             |          |
+-----------------+------------------------------------------------------------------+----------+ + | 200.021 | manual 'reboot' request | NA | + | | | | + | | host=.command=reboot | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.021 | manual 'reset' request | NA | + | | | | + | | host=.command=reset | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.021 | manual 'power-off' request | NA | + | | | | + | | host=.command=power-off | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.021 | manual 'power-on' request | NA | + | | | | + | | host=.command=power-on | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.021 | manual 'reinstall' request | NA | + | | | | + | | host=.command=reinstall | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.021 | manual 'force-lock' request | NA | + | | | | + | | host=.command=force-lock | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.021 | manual 'delete' request | NA | + | | | | + | | host=.command=delete | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.021 | manual 'controller switchover' request | NA | + | | | | + | | host=.command=swact | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.022 | is now 'disabled' | NA | + | | | | + | | host=.state=disabled | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.022 | is now 'enabled' | NA | + | | | | + | | host=.state=enabled | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.022 | is now 'online' | NA | + | | | | + | | host=.status=online | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.022 | is now 'offline' | NA | + | | | | + | | host=.status=offline | | + +-----------------+------------------------------------------------------------------+----------+ + | 200.022 | is 'disabled-failed' to the system | NA | + | | | | + | | host=.status=failed | | + +-----------------+------------------------------------------------------------------+----------+ \ No newline at end of file diff --git a/doc/source/fault/300-series-alarm-messages.rst b/doc/source/fault/300-series-alarm-messages.rst new file mode 100644 index 000000000..8229f628f --- /dev/null +++ b/doc/source/fault/300-series-alarm-messages.rst @@ -0,0 +1,53 @@ + +.. zwe1579701930425 +.. _300-series-alarm-messages: + +========================= +300 Series Alarm Messages +========================= + +The system inventory and maintenance service reports system changes with +different degrees of severity. Use the reported alarms to monitor the +overall health of the system. + +.. include:: ../_includes/x00-series-alarm-messages.rest + +.. _300-series-alarm-messages-table-zrd-tg5-v5: + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 300.001** + - 'Data' Port failed. + * - Entity Instance + - host=.port= + * - Degrade Affecting Severity: + - None + * - Severity: + - M\* + * - Proposed Repair Action + - Check cabling and far-end port configuration and status on adjacent + equipment. 
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 300.002**
+     - 'Data' Interface degraded.
+
+       or
+
+       'Data' Interface failed.
+   * - Entity Instance
+     - host=.interface=
+   * - Degrade Affecting Severity:
+     - Critical
+   * - Severity:
+     - C/M\*
+   * - Proposed Repair Action
+     - Check cabling and far-end port configuration and status on adjacent
+       equipment.
\ No newline at end of file
diff --git a/doc/source/fault/400-series-alarm-messages.rst b/doc/source/fault/400-series-alarm-messages.rst
new file mode 100644
index 000000000..42533121f
--- /dev/null
+++ b/doc/source/fault/400-series-alarm-messages.rst
@@ -0,0 +1,69 @@
+
+.. ots1579702138430
+.. _400-series-alarm-messages:
+
+=========================
+400 Series Alarm Messages
+=========================
+
+The system inventory and maintenance service reports system changes with
+different degrees of severity. Use the reported alarms to monitor the overall
+health of the system.
+
+.. include:: ../_includes/x00-series-alarm-messages.rest
+
+.. _400-series-alarm-messages-table-zrd-tg5-v5:
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 400.003**
+     - License key is not installed; a valid license key is required for
+       operation.
+
+       or
+
+       License key has expired or is invalid; a valid license key is required
+       for operation.
+
+       or
+
+       Evaluation license key will expire on ; there are days
+       remaining in this evaluation.
+
+       or
+
+       Evaluation license key will expire on ; there is only 1 day
+       remaining in this evaluation.
+   * - Entity Instance:
+     - host=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - C\*
+   * - Proposed Repair Action
+     - Contact next level of support to obtain a new license key.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 400.005**
+     - Communication failure detected with peer over port .
+
+       or
+
+       Communication failure detected with peer over port
+       within the last 30 seconds.
+   * - Entity Instance:
+     - host=.network=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Check cabling and far-end port configuration and status on adjacent
+       equipment.
\ No newline at end of file
diff --git a/doc/source/fault/400-series-customer-log-messages.rst b/doc/source/fault/400-series-customer-log-messages.rst
new file mode 100644
index 000000000..3af6539d8
--- /dev/null
+++ b/doc/source/fault/400-series-customer-log-messages.rst
@@ -0,0 +1,81 @@
+
+.. pgb1579292662158
+.. _400-series-customer-log-messages:
+
+================================
+400 Series Customer Log Messages
+================================
+
+The Customer Logs include events that do not require immediate user action.
+
+The following types of events are included in the Customer Logs. The severity
+of the events is represented in the table by one or more letters, as follows:
+
+.. _400-series-customer-log-messages-ul-jsd-jkg-vp:
+
+- C: Critical
+
+- M: Major
+
+- m: Minor
+
+- W: Warning
+
+- NA: Not applicable
+
+.. _400-series-customer-log-messages-table-zgf-jvw-v5:
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 400.003**
+     - License key has expired or is invalid
+
+       or
+
+       Evaluation license key will expire on
+
+       or
+
+       License key is valid
+   * - Entity Instance
+     - host=
+   * - Severity:
+     - C
+
+-----
+
+..
list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 400.005** + - Communication failure detected with peer over port on host + + + or + + Communication failure detected with peer over port on host + within the last seconds + + or + + Communication established with peer over port on host + * - Entity Instance + - host=.network= + * - Severity: + - C + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 400.007** + - Swact or swact-force + * - Entity Instance + - host= + * - Severity: + - C \ No newline at end of file diff --git a/doc/source/fault/500-series-alarm-messages.rst b/doc/source/fault/500-series-alarm-messages.rst new file mode 100644 index 000000000..d4424bf6c --- /dev/null +++ b/doc/source/fault/500-series-alarm-messages.rst @@ -0,0 +1,49 @@ + +.. xpx1579702157578 +.. _500-series-alarm-messages: + +========================= +500 Series Alarm Messages +========================= + +The system inventory and maintenance service reports system changes with +different degrees of severity. Use the reported alarms to monitor the overall +health of the system. + +.. include:: ../_includes/x00-series-alarm-messages.rest + +.. _500-series-alarm-messages-table-zrd-tg5-v5: + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 500.100** + - TPM initialization failed on host. + * - Entity Instance + - tenant= + * - Degrade Affecting Severity: + - None + * - Severity: + - M + * - Proposed Repair Action + - Reinstall HTTPS certificate; if problem persists contact next level of + support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 500.101** + - Developer patch certificate enabled. + * - Entity Instance + - host=controller + * - Degrade Affecting Severity: + - None + * - Severity: + - M + * - Proposed Repair Action + - Reinstall system to disable developer certificate and remove untrusted + patches. \ No newline at end of file diff --git a/doc/source/fault/750-series-alarm-messages.rst b/doc/source/fault/750-series-alarm-messages.rst new file mode 100644 index 000000000..d54adba79 --- /dev/null +++ b/doc/source/fault/750-series-alarm-messages.rst @@ -0,0 +1,118 @@ + +.. cta1579702173704 +.. _750-series-alarm-messages: + +========================= +750 Series Alarm Messages +========================= + +The system inventory and maintenance service reports system changes with +different degrees of severity. Use the reported alarms to monitor the overall +health of the system. + +.. include:: ../_includes/x00-series-alarm-messages.rest + +.. _750-series-alarm-messages-table-zrd-tg5-v5: + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 750.001** + - Application upload failure. + * - Entity Instance + - k8s\_application= + * - Degrade Affecting Severity: + - None + * - Severity: + - W + * - Proposed Repair Action + - Check the system inventory log for the cause. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 750.002** + - Application apply failure. + * - Entity Instance + - k8s\_application= + * - Degrade Affecting Severity: + - None + * - Severity: + - M + * - Proposed Repair Action + - Retry applying the application. If the issue persists, please check the + system inventory log for cause. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 750.003** + - Application remove failure. 
+
+   * - Entity Instance
+     - k8s\_application=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M
+   * - Proposed Repair Action
+     - Retry removing the application. If the issue persists, please check
+       the system inventory log for cause.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 750.004**
+     - Application apply in progress.
+   * - Entity Instance
+     - k8s\_application=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - W
+   * - Proposed Repair Action
+     - No action is required.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 750.005**
+     - Application update in progress.
+   * - Entity Instance
+     - k8s\_application=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - W
+   * - Proposed Repair Action
+     - No action is required.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 750.006**
+     - Automatic application re-apply is pending.
+   * - Entity Instance
+     - k8s\_application=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - W
+   * - Proposed Repair Action
+     - Ensure all hosts are either locked or unlocked. When the system is
+       stable the application will automatically be reapplied.
\ No newline at end of file
diff --git a/doc/source/fault/800-series-alarm-messages.rst b/doc/source/fault/800-series-alarm-messages.rst
new file mode 100644
index 000000000..f134c0924
--- /dev/null
+++ b/doc/source/fault/800-series-alarm-messages.rst
@@ -0,0 +1,152 @@
+
+.. rww1579702317136
+.. _800-series-alarm-messages:
+
+=========================
+800 Series Alarm Messages
+=========================
+
+The system inventory and maintenance service reports system changes with
+different degrees of severity. Use the reported alarms to monitor the overall
+health of the system.
+
+.. include:: ../_includes/x00-series-alarm-messages.rest
+
+.. _800-series-alarm-messages-table-zrd-tg5-v5:
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 800.001**
+     - Storage Alarm Condition:
+
+       1 mons down, quorum 1,2 controller-1,storage-0
+   * - Entity Instance
+     - cluster=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - C/M\*
+   * - Proposed Repair Action
+     - If problem persists, contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 800.003**
+     - Storage Alarm Condition: Quota/Space mismatch for the tier.
+       The sum of Ceph pool quotas does not match the tier size.
+   * - Entity Instance
+     - cluster=.tier=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - m
+   * - Proposed Repair Action
+     - Update ceph storage pool quotas to use all available tier space.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 800.010**
+     - Potential data loss. No available OSDs in storage replication group.
+   * - Entity Instance
+     - cluster=.peergroup=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - C\*
+   * - Proposed Repair Action
+     - Ensure storage hosts from replication group are unlocked and available.
+       Check if OSDs of each storage host are up and running. If problem
+       persists contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 800.011**
+     - Loss of replication in peergroup.
+   * - Entity Instance
+     - cluster=.peergroup=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Ensure storage hosts from replication group are unlocked and available.
+ Check if OSDs of each storage host are up and running. If problem + persists contact next level of support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 800.102** + - Storage Alarm Condition: + + PV configuration on . + Reason: . + * - Entity Instance + - pv= + * - Degrade Affecting Severity: + - None + * - Severity: + - C/M\* + * - Proposed Repair Action + - Remove failed PV and associated Storage Device then recreate them. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 800.103** + - Storage Alarm Condition: + + \[ Metadata usage for LVM thin pool / exceeded + threshold and automatic extension failed + + Metadata usage for LVM thin pool / exceeded + threshold \]; threshold x%, actual y%. + * - Entity Instance + - .lvmthinpool=/ + * - Degrade Affecting Severity: + - None + * - Severity: + - C\* + * - Proposed Repair Action + - Increase Storage Space Allotment for Cinder on the 'lvm' backend. + Consult the user documentation for more details. If problem persists, + contact next level of support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 800.104** + - Storage Alarm Condition: + + configuration failed to apply on host: . + * - Degrade Affecting Severity: + - None + * - Severity: + - C\* + * - Proposed Repair Action + - Update backend setting to reapply configuration. Consult the user + documentation for more details. If problem persists, contact next level + of support. \ No newline at end of file diff --git a/doc/source/fault/900-series-alarm-messages.rst b/doc/source/fault/900-series-alarm-messages.rst new file mode 100644 index 000000000..c7d5641c8 --- /dev/null +++ b/doc/source/fault/900-series-alarm-messages.rst @@ -0,0 +1,260 @@ + +.. pti1579702342696 +.. _900-series-alarm-messages: + +========================= +900 Series Alarm Messages +========================= + +The system inventory and maintenance service reports system changes with +different degrees of severity. Use the reported alarms to monitor the overall +health of the system. + +.. include:: ../_includes/x00-series-alarm-messages.rest + +.. _900-series-alarm-messages-table-zrd-tg5-v5: + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 900.001** + - Patching operation in progress. + * - Entity Instance + - host=controller + * - Degrade Affecting Severity: + - None + * - Severity: + - m\* + * - Proposed Repair Action + - Complete reboots of affected hosts. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 900.002** + - Obsolete patch in system. + * - Entity Instance + - host= + * - Degrade Affecting Severity: + - None + * - Severity: + - W\* + * - Proposed Repair Action + - Remove and delete obsolete patches. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 900.003** + - Patch host install failure. + * - Entity Instance + - host= + * - Degrade Affecting Severity: + - None + * - Severity: + - M\* + * - Proposed Repair Action + - Undo patching operation. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 900.004** + - Host version mismatch. + * - Entity Instance + - host= + * - Degrade Affecting Severity: + - None + * - Severity: + - M\* + * - Proposed Repair Action + - Reinstall host to update applied load. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 900.005** + - System Upgrade in progress. 
+
+   * - Entity Instance
+     - host=controller
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - m\*
+   * - Proposed Repair Action
+     - No action required.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 900.101**
+     - Software update auto-apply in progress.
+   * - Entity Instance
+     - sw-update
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Wait for software update auto-apply to complete; if problem persists
+       contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 900.102**
+     - Software update auto-apply aborting.
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Wait for software update auto-apply abort to complete; if problem
+       persists contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 900.103**
+     - Software update auto-apply failed.
+   * - Entity Instance
+     - host=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Attempt to apply software updates manually; if problem persists contact
+       next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 900.201**
+     - Software upgrade auto-apply in progress.
+   * - Entity Instance
+     - orchestration=sw-upgrade
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Wait for software upgrade auto-apply to complete; if problem persists
+       contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 900.202**
+     - Software upgrade auto-apply aborting.
+   * - Entity Instance
+     - orchestration=sw-upgrade
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Wait for software upgrade auto-apply abort to complete; if problem
+       persists contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 900.203**
+     - Software upgrade auto-apply failed.
+   * - Entity Instance
+     - orchestration=sw-upgrade
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Attempt to apply software upgrade manually; if problem persists contact
+       next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 900.301**
+     - Firmware Update auto-apply in progress.
+   * - Entity Instance
+     - orchestration=fw-upgrade
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Wait for firmware update auto-apply to complete; if problem persists
+       contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 900.302**
+     - Firmware Update auto-apply aborting.
+   * - Entity Instance
+     - orchestration=fw-upgrade
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M\*
+   * - Proposed Repair Action
+     - Wait for firmware update auto-apply abort to complete; if problem
+       persists contact next level of support.
+
+-----
+
+.. list-table::
+   :widths: 6 15
+   :header-rows: 0
+
+   * - **Alarm ID: 900.303**
+     - Firmware Update auto-apply failed.
+ * - Entity Instance + - orchestration=fw-upgrade + * - Degrade Affecting Severity: + - None + * - Severity: + - M\* + * - Proposed Repair Action + - Attempt to apply firmware update manually; if problem persists + contact next level of support. \ No newline at end of file diff --git a/doc/source/fault/900-series-orchestration-customer-log-messages.rst b/doc/source/fault/900-series-orchestration-customer-log-messages.rst new file mode 100644 index 000000000..99c7ddc8c --- /dev/null +++ b/doc/source/fault/900-series-orchestration-customer-log-messages.rst @@ -0,0 +1,168 @@ + +.. bdq1579700719122 +.. _900-series-orchestration-customer-log-messages: + +============================================== +900 Series Orchestration Customer Log Messages +============================================== + +The Customer Logs include events that do not require immediate user action. + +The following types of events are included in the Customer Logs. The severity +of the events is represented in the table by one or more letters, as follows: + +.. _900-series-orchestration-customer-log-messages-ul-jsd-jkg-vp: + +- C: Critical + +- M: Major + +- m: Minor + +- W: Warning + +- NA: Not applicable + +.. _900-series-orchestration-customer-log-messages-table-zgf-jvw-v5: + +.. table:: Table 1. Customer Log Messages + :widths: auto + + +-------------------+--------------------------------------------+----------+ + | Log ID | Description | Severity | + + +--------------------------------------------+----------+ + | | Entity Instance ID | + +===================+============================================+==========+ + | 900.111 | Software update auto-apply start | C | + | | | | + | | orchestration=sw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.112 | Software update auto-apply inprogress | C | + | | | | + | | orchestration=sw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.113 | Software update auto-apply rejected | C | + | | | | + | | orchestration=sw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.114 | Software update auto-apply canceled | C | + | | | | + | | orchestration=sw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.115 | Software update auto-apply failed | C | + | | | | + | | orchestration=sw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.116 | Software update auto-apply completed | C | + | | | | + | | orchestration=sw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.117 | Software update auto-apply abort | C | + | | | | + | | orchestration=sw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.118 | Software update auto-apply aborting | C | + | | | | + | | orchestration=sw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.119 | Software update auto-apply abort rejected | C | + | | | | + | | orchestration=sw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.120 | Software update auto-apply abort failed | C | + | | | | + | | orchestration=sw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.121 | Software update auto-apply aborted | C | + | | | | + | | orchestration=sw-update | | + 
+-------------------+--------------------------------------------+----------+ + | 900.211 | Software upgrade auto-apply start | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.212 | Software upgrade auto-apply inprogress | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.213 | Software upgrade auto-apply rejected | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.214 | Software upgrade auto-apply canceled | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.215 | Software upgrade auto-apply failed | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.216 | Software upgrade auto-apply completed | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.217 | Software upgrade auto-apply abort | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.218 | Software upgrade auto-apply aborting | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.219 | Software upgrade auto-apply abort rejected | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.220 | Software upgrade auto-apply abort failed | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.221 | Software upgrade auto-apply aborted | C | + | | | | + | | orchestration=sw-upgrade | | + +-------------------+--------------------------------------------+----------+ + | 900.311 | Firmware update auto-apply | C | + | | | | + | | orchestration=fw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.312 | Firmware update auto-apply in progress | C | + | | | | + | | orchestration=fw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.313 | Firmware update auto-apply rejected | C | + | | | | + | | orchestration=fw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.314 | Firmware update auto-apply canceled | C | + | | | | + | | orchestration=fw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.315 | Firmware update auto-apply failed | C | + | | | | + | | orchestration=fw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.316 | Firmware update auto-apply completed | C | + | | | | + | | orchestration=fw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.317 | Firmware update auto-apply aborted | C | + | | | | + | | orchestration=fw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.318 | Firmware update auto-apply aborting | C | + | | | | + | | orchestration=fw-update | | + +-------------------+--------------------------------------------+----------+ + | 900.319 | Firmware update 
auto-apply abort rejected  | C        |
+   |                   |                                            |          |
+   |                   | orchestration=fw-update                    |          |
+   +-------------------+--------------------------------------------+----------+
+   | 900.320           | Firmware update auto-apply abort failed    | C        |
+   |                   |                                            |          |
+   |                   | orchestration=fw-update                    |          |
+   +-------------------+--------------------------------------------+----------+
+   | 900.321           | Firmware update auto-apply aborted         | C        |
+   |                   |                                            |          |
+   |                   | orchestration=fw-update                    |          |
+   +-------------------+--------------------------------------------+----------+
+
diff --git a/doc/source/fault/adding-an-snmp-community-string-using-the-cli.rst b/doc/source/fault/adding-an-snmp-community-string-using-the-cli.rst
new file mode 100644
index 000000000..41490c9f0
--- /dev/null
+++ b/doc/source/fault/adding-an-snmp-community-string-using-the-cli.rst
@@ -0,0 +1,111 @@
+
+.. xti1552680491532
+.. _adding-an-snmp-community-string-using-the-cli:
+
+==========================================
+Add an SNMP Community String Using the CLI
+==========================================
+
+To enable SNMP services you need to define one or more SNMP community strings
+using the command line interface.
+
+.. rubric:: |context|
+
+No default community strings are defined on |prod| after the initial
+commissioning of the cluster. This means that no SNMP operations are enabled
+by default.
+
+The following exercise illustrates the system commands available to manage and
+query SNMP community strings. It uses the string **commstr1** as an example.
+
+.. caution::
+   For security, do not use the string **public**, or other community strings
+   that could easily be guessed.
+
+.. rubric:: |prereq|
+
+All commands must be executed on the active controller's console, which can be
+accessed using the OAM floating IP address. You must acquire Keystone **admin**
+credentials in order to execute the commands.
+
+.. rubric:: |proc|
+
+#. Add the SNMP community string commstr1 to the system.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ system snmp-comm-add -c commstr1
+      +-----------+--------------------------------------+
+      | Property  | Value                                |
+      +-----------+--------------------------------------+
+      | access    | ro                                   |
+      | uuid      | eccf5729-e400-4305-82e2-bdf344eb868d |
+      | community | commstr1                             |
+      | view      | .1                                   |
+      +-----------+--------------------------------------+
+
+
+   The following are attributes associated with the new community string:
+
+   **access**
+      The SNMP access type. In |prod| all community strings provide read-only
+      access.
+
+   **uuid**
+      The UUID associated with the community string.
+
+   **community**
+      The community string value.
+
+   **view**
+      The view is always the full MIB tree.
+
+#. List available community strings.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ system snmp-comm-list
+      +----------------+--------------------+--------+
+      | SNMP community | View               | Access |
+      +----------------+--------------------+--------+
+      | commstr1       | .1                 | ro     |
+      +----------------+--------------------+--------+
+
+#. Query details of a specific community string.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ system snmp-comm-show commstr1
+      +------------+--------------------------------------+
+      | Property   | Value                                |
+      +------------+--------------------------------------+
+      | access     | ro                                   |
+      | created_at | 2014-08-14T21:12:10.037637+00:00     |
+      | uuid       | eccf5729-e400-4305-82e2-bdf344eb868d |
+      | community  | commstr1                             |
+      | view       | .1                                   |
+      +------------+--------------------------------------+
+
+#. Delete a community string.
+
+   ..
code-block:: none + + ~(keystone_admin)$ system snmp-comm-delete commstr1 + Deleted community commstr1 + +.. rubric:: |result| + +Community strings in |prod| provide query access to any SNMP monitor +workstation that can reach the controller's OAM address on UDP port 161. + +You can verify SNMP access using any monitor tool. For example, the freely +available command :command:`snmpwalk` can be issued from any host to list +the state of all SNMP Object Identifiers \(OID\): + +.. code-block:: none + + $ snmpwalk -v 2c -c commstr1 10.10.10.100 > oids.txt + +In this example, 10.10.10.100 is the |prod| OAM floating IP address. The output, +which is a large file, is redirected to the file oids.txt. + diff --git a/doc/source/fault/cli-commands-and-paged-output.rst b/doc/source/fault/cli-commands-and-paged-output.rst new file mode 100644 index 000000000..7d6043c83 --- /dev/null +++ b/doc/source/fault/cli-commands-and-paged-output.rst @@ -0,0 +1,61 @@ + +.. idb1552680603462 +.. _cli-commands-and-paged-output: + +============================= +CLI Commands and Paged Output +============================= + +There are some CLI commands that perform paging, and you can use options to +limit the paging or to disable it, which is useful for scripts. + +CLI fault management commands that perform paging include: + +.. _cli-commands-and-paged-output-ul-wjz-y4q-bw: + +- :command:`fm event-list` + +- :command:`fm event-suppress` + +- :command:`fm event-suppress-list` + +- :command:`fm event-unsuppress` + +- :command:`fm event-unsuppress-all` + + +To turn paging off, use the --nopaging option for the above commands. The +--nopaging option is useful for bash script writers. + +.. _cli-commands-and-paged-output-section-N10074-N1001C-N10001: + +-------- +Examples +-------- + +The following examples demonstrate the resulting behavior from the use and +non-use of the paging options. + +This produces a paged list of events. + +.. code-block:: none + + ~(keystone_admin)$ fm event-list + +This produces a list of events without paging. + +.. code-block:: none + + ~(keystone_admin)$ fm event-list --nopaging + +This produces a paged list of 50 events. + +.. code-block:: none + + ~(keystone_admin)$ fm event-list --limit 50 + +This will produce a list of 50 events without paging. + +.. code-block:: none + + ~(keystone_admin)$ fm event-list --limit 50 --nopaging \ No newline at end of file diff --git a/doc/source/fault/configuring-snmp-trap-destinations.rst b/doc/source/fault/configuring-snmp-trap-destinations.rst new file mode 100644 index 000000000..a8d219a89 --- /dev/null +++ b/doc/source/fault/configuring-snmp-trap-destinations.rst @@ -0,0 +1,89 @@ + +.. sjb1552680530874 +.. _configuring-snmp-trap-destinations: + +================================ +Configure SNMP Trap Destinations +================================ + +SNMP trap destinations are hosts configured in |prod| to receive unsolicited +SNMP notifications. + +.. rubric:: |context| + +Destination hosts are specified by IP address, or by host name if it can be +properly resolved by |prod|. Notifications are sent to the hosts using a +designated community string so that they can be validated. + +.. rubric:: |proc| + +#. Configure IP address 10.10.10.1 to receive SNMP notifications using the + community string commstr1. + + .. 
code-block:: none + + ~(keystone_admin)$ system snmp-trapdest-add -c commstr1 --ip_address 10.10.10.1 + +------------+--------------------------------------+ + | Property | Value | + +------------+--------------------------------------+ + | uuid | c7b6774e-7f45-40f5-bcca-3668de2a186f | + | ip_address | 10.10.10.1 | + | community | commstr1 | + | type | snmpv2c_trap | + | port | 162 | + | transport | udp | + +------------+--------------------------------------+ + + The following are attributes associated with the new trap destination: + + **uuid** + The UUID associated with the trap destination object. + + **ip\_address** + The trap destination IP address. + + **community** + The community string value to be associated with the notifications. + + **type** + snmpv2c\_trap, the only supported message type for SNMP traps. + + **port** + The destination UDP port that SNMP notifications are sent to. + + **transport** + The transport protocol used to send notifications. + +#. List defined trap destinations. + + .. code-block:: none + + ~(keystone_admin)$ system snmp-trapdest-list + +------------+----------------+------+--------------+-----------+ + | IP Address | SNMP Community | Port | Type | Transport | + +------------+----------------+------+--------------+-----------+ + | 10.10.10.1 | commstr1 | 162 | snmpv2c_trap | udp | + +------------+----------------+------+--------------+-----------+ + +#. Query access details of a specific trap destination. + + .. code-block:: none + + ~(keystone_admin)$ system snmp-trapdest-show 10.10.10.1 + +------------+--------------------------------------+ + | Property | Value | + +------------+--------------------------------------+ + | uuid | c7b6774e-7f45-40f5-bcca-3668de2a186f | + | ip_address | 10.10.10.1 | + | community | commstr1 | + | type | snmpv2c_trap | + | port | 162 | + | transport | udp | + +------------+--------------------------------------+ + +#. Disable the sending of SNMP notifications to a specific IP address. + + .. code-block:: none + + ~(keystone_admin)$ system snmp-trapdest-delete 10.10.10.1 + Deleted ip 10.10.10.1 \ No newline at end of file diff --git a/doc/source/fault/deleting-an-alarm-using-the-cli.rst b/doc/source/fault/deleting-an-alarm-using-the-cli.rst new file mode 100644 index 000000000..df8a00f2b --- /dev/null +++ b/doc/source/fault/deleting-an-alarm-using-the-cli.rst @@ -0,0 +1,34 @@ + +.. cpy1552680695138 +.. _deleting-an-alarm-using-the-cli: + +============================= +Delete an Alarm Using the CLI +============================= + +You can manually delete an alarm that is not automatically cleared by the +system. + +.. rubric:: |context| + +Do not manually delete an alarm unless it is absolutely clear that there is +no reason for it to remain active. + +Alarms usually clear automatically when the related trigger or fault +condition is corrected. In rare conditions, an alarm may remain active/set +for no apparent reason; you can use the :command:`fm alarm-delete` command +to remove such an alarm manually. + +.. rubric:: |proc| + +.. _deleting-an-alarm-using-the-cli-steps-clp-fzw-nkb: + +- To delete an alarm, use the :command:`fm alarm-delete` command. + + For example: + + .. code-block:: none + + ~(keystone_admin)$ fm alarm-delete 4ab5698a-19cb-4c17-bd63-302173fef62c + + Substitute the UUID of the alarm you wish to delete.
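+ + If you are not sure of the alarm's UUID, you can list the active alarms with their UUIDs first. The following is a minimal sketch; it assumes the :command:`fm alarm-list` command accepts the --uuid option to include the UUID column in its output: + + .. code-block:: none + + ~(keystone_admin)$ fm alarm-list --uuid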
\ No newline at end of file diff --git a/doc/source/fault/enabling-snmp-support.rst b/doc/source/fault/enabling-snmp-support.rst new file mode 100644 index 000000000..4a53712a2 --- /dev/null +++ b/doc/source/fault/enabling-snmp-support.rst @@ -0,0 +1,26 @@ + +.. nat1580220934509 +.. _enabling-snmp-support: + +=================== +Enable SNMP Support +=================== + +SNMP support must be enabled before you can begin using it to monitor a system. + +.. rubric:: |context| + +To create a working SNMP configuration, you must complete the following steps +using the command-line interface on the active controller. + +.. rubric:: |proc| + +#. Define at least one SNMP community string. + + See |fault-doc|: :ref:`Adding an SNMP Community String Using the CLI ` for details. + +#. Configure at least one SNMP trap destination. + + This allows alarms and logs to be reported as they occur. + + For more information, see :ref:`Configuring SNMP Trap Destinations `. \ No newline at end of file diff --git a/doc/source/fault/events-suppression-overview.rst b/doc/source/fault/events-suppression-overview.rst new file mode 100644 index 000000000..35c2a35ba --- /dev/null +++ b/doc/source/fault/events-suppression-overview.rst @@ -0,0 +1,33 @@ + +.. pmt1552680681730 +.. _events-suppression-overview: + +=========================== +Events Suppression Overview +=========================== + +All alarms are unsuppressed by default. A suppressed alarm is excluded from the +Active Alarm and Events displays and is not included in the Active Alarm +Counts; suppressed alarms can be included in the displays by setting the +**Suppression Status** filter in the Horizon Web interface, the CLI, or the +REST APIs. + +.. warning:: + Suppressing an alarm will result in the system NOT notifying the operator + of this particular fault. + +The Events Suppression page, available from **Admin** \> **Fault Management** +\> **Events Suppression** in the left-hand pane, provides the suppression +status of each event type, as well as controls for suppressing or +unsuppressing each event type. + +As shown below, the Events Suppression page lists each event type by ID, and +provides a description of the event and a current status indicator. Each event +can be suppressed using the **Suppress Event** button. + +You can sort events by clicking the **Event ID**, **Description**, and +**Status** column headers. You can also use these as filtering criteria +from the **Search** field. + +.. figure:: figures/uty1463514747661.png + :scale: 70 % + :alt: Event Suppression \ No newline at end of file diff --git a/doc/source/fault/fault-management-overview.rst b/doc/source/fault/fault-management-overview.rst new file mode 100644 index 000000000..7c1b22765 --- /dev/null +++ b/doc/source/fault/fault-management-overview.rst @@ -0,0 +1,69 @@ + +.. yrq1552337051689 +.. _fault-management-overview: + +========================= +Fault Management Overview +========================= + +An admin user can view |prod-long| fault management alarms and logs to monitor +and respond to fault conditions. + +See :ref:`Alarm Messages <100-series-alarm-messages>` for the list of +alarms and :ref:`Customer Log Messages +<200-series-maintenance-customer-log-messages>` +for the list of customer logs reported by |prod|. + +You can access active and historical alarms, as well as customer logs, using +the CLI, GUI, REST APIs, and SNMP. + +To use the CLI, see +:ref:`Viewing Active Alarms Using the CLI +` +and :ref:`Viewing the Event Log Using the CLI +`.
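+ +As a quick first check from the CLI, you can display a summary of the active alarm counts by severity. This is a minimal sketch; it assumes that Keystone admin credentials have been acquired and that the :command:`fm alarm-summary` command is available: + + .. code-block:: none + + ~(keystone_admin)$ fm alarm-summary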
+ +Using the GUI, you can obtain fault management information in a number of +places. + +.. _fault-management-overview-ul-nqw-hbp-mx: + +- The Fault Management pages, available from + **Admin** \> **Fault Management** in the left-hand pane, provide access to + the following: + + - The Global Alarm Banner in the page header of all screens provides the + active alarm counts for all alarm severities; see + :ref:`The Global Alarm Banner `. + + - **Admin** \> **Fault Management** \> **Active Alarms**—Alarms that are + currently set and require user action to clear them. For more + information about active alarms, see + :ref:`Viewing Active Alarms Using the CLI + ` + and :ref:`Deleting an Alarm Using the CLI + `. + + - **Admin** \> **Fault Management** \> **Events**—The event log + consolidates historical alarm events, that + is, both the set and clear events of alarms, as well as customer + logs. + + For more information about the event log, which includes historical alarms + and customer logs, see + :ref:`Viewing the Event Log Using Horizon + `. + + - **Admin** \> **Fault Management** \> **Events Suppression**—Individual + events can be put into a suppressed state or an unsuppressed state. A + suppressed alarm is excluded from the Active Alarm and Events displays. + All alarms are unsuppressed by default. An event can be suppressed or + unsuppressed using the Horizon Web interface, the CLI, or REST APIs. + +- The Data Network Topology view provides real-time alarm information for + data networks and their associated worker hosts and + data/pci-passthru/pci-sriov interfaces. + +.. xreflink For more information, see |datanet-doc|: :ref:`The Data Network Topology View `. + +To use SNMP, see :ref:`SNMP Overview `. \ No newline at end of file diff --git a/doc/source/fault/figures/nlc1463584178366.png b/doc/source/fault/figures/nlc1463584178366.png new file mode 100644 index 000000000..f625eb77d Binary files /dev/null and b/doc/source/fault/figures/nlc1463584178366.png differ diff --git a/doc/source/fault/figures/psa1567524091300.png b/doc/source/fault/figures/psa1567524091300.png new file mode 100644 index 000000000..7466ac8bb Binary files /dev/null and b/doc/source/fault/figures/psa1567524091300.png differ diff --git a/doc/source/fault/figures/uty1463514747661.png b/doc/source/fault/figures/uty1463514747661.png new file mode 100644 index 000000000..ada2844f4 Binary files /dev/null and b/doc/source/fault/figures/uty1463514747661.png differ diff --git a/doc/source/fault/figures/xyj1558447807645.png b/doc/source/fault/figures/xyj1558447807645.png new file mode 100644 index 000000000..dcdec9bd1 Binary files /dev/null and b/doc/source/fault/figures/xyj1558447807645.png differ diff --git a/doc/source/fault/index.rs1 b/doc/source/fault/index.rs1 new file mode 100644 index 000000000..9de4788d4 --- /dev/null +++ b/doc/source/fault/index.rs1 @@ -0,0 +1,71 @@ +============================ +|prod-long| Fault Management +============================ + +- Fault Management Overview + + - :ref:`Fault Management Overview ` + +- The Global Alarm Banner + + - :ref:`The Global Alarm Banner ` + +- Viewing Active Alarms + + - :ref:`Viewing Active Alarms Using Horizon ` + - :ref:`Viewing Active Alarms Using the CLI ` + - :ref:`Viewing Alarm Details Using the CLI ` + +- Viewing the Event Log + + - :ref:`Viewing the Event Log Using Horizon ` + - :ref:`Viewing the Event Log Using the CLI ` + +- Deleting an Alarm + + - :ref:`Deleting an Alarm Using the CLI ` + +- Events Suppression + + - :ref:`Events Suppression 
Overview ` + - :ref:`Suppressing and Unsuppressing Events ` + - :ref:`Viewing Suppressed Alarms Using the CLI ` + - :ref:`Suppressing an Alarm Using the CLI ` + - :ref:`Unsuppressing an Alarm Using the CLI ` + +- CLI Commands and Paged Output + + - :ref:`CLI Commands and Paged Output ` + +- SNMP + + - :ref:`SNMP Overview ` + - :ref:`Enabling SNMP Support ` + - :ref:`Traps ` + + - :ref:`Configuring SNMP Trap Destinations ` + + - :ref:`SNMP Event Table ` + - :ref:`Adding an SNMP Community String Using the CLI ` + - :ref:`Setting SNMP Identifying Information ` + +- :ref:`Troubleshooting Log Collection ` +- Cloud Platform Alarm Messages + + - :ref:`Alarm Messages Overview ` + - :ref:`100 Series Alarm Messages <100-series-alarm-messages>` + - :ref:`200 Series Alarm Messages <200-series-alarm-messages>` + - :ref:`300 Series Alarm Messages <300-series-alarm-messages>` + - :ref:`400 Series Alarm Messages <400-series-alarm-messages>` + - :ref:`500 Series Alarm Messages <500-series-alarm-messages>` + - :ref:`750 Series Alarm Messages <750-series-alarm-messages>` + - :ref:`800 Series Alarm Messages <800-series-alarm-messages>` + - :ref:`900 Series Alarm Messages <900-series-alarm-messages>` + +- Cloud Platform Customer Log Messages + + - :ref:`200 Series Maintenance Customer Log Messages <200-series-maintenance-customer-log-messages>` + - :ref:`400 Series Customer Log Messages <400-series-customer-log-messages>` + - :ref:`900 Series Orchestration Customer Log Messages <900-series-orchestration-customer-log-messages>` + + diff --git a/doc/source/fault/index.rst b/doc/source/fault/index.rst new file mode 100644 index 000000000..b9e0771d9 --- /dev/null +++ b/doc/source/fault/index.rst @@ -0,0 +1,161 @@ +.. Fault Management file, created by + sphinx-quickstart on Thu Sep 3 15:14:59 2020. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +================ +Fault Management +================ + +-------------------- +StarlingX Kubernetes +-------------------- + +.. toctree:: + :maxdepth: 1 + + fault-management-overview + +***************** +The global banner +***************** + +.. toctree:: + :maxdepth: 1 + + the-global-alarm-banner + +********************* +Viewing active alarms +********************* + +.. toctree:: + :maxdepth: 1 + + viewing-active-alarms-using-horizon + viewing-active-alarms-using-the-cli + viewing-alarm-details-using-the-cli + +********************* +Viewing the event log +********************* + +.. toctree:: + :maxdepth: 1 + + viewing-the-event-log-using-horizon + viewing-the-event-log-using-the-cli + +***************** +Deleting an alarm +***************** + +.. toctree:: + :maxdepth: 1 + + deleting-an-alarm-using-the-cli + +***************** +Event suppression +***************** + +.. toctree:: + :maxdepth: 1 + + events-suppression-overview + suppressing-and-unsuppressing-events + viewing-suppressed-alarms-using-the-cli + suppressing-an-alarm-using-the-cli + unsuppressing-an-alarm-using-the-cli + +***************************** +CLI commands and paged output +***************************** + +.. toctree:: + :maxdepth: 1 + + cli-commands-and-paged-output + +**** +SNMP +**** + +.. toctree:: + :maxdepth: 1 + + snmp-overview + enabling-snmp-support + traps + configuring-snmp-trap-destinations + snmp-event-table + adding-an-snmp-community-string-using-the-cli + setting-snmp-identifying-information + +****************************** +Troubleshooting log collection +****************************** + +.. 
toctree:: + :maxdepth: 1 + + troubleshooting-log-collection + +************** +Alarm messages +************** + +.. toctree:: + :maxdepth: 1 + + 100-series-alarm-messages + 200-series-alarm-messages + 300-series-alarm-messages + 400-series-alarm-messages + 500-series-alarm-messages + 750-series-alarm-messages + 800-series-alarm-messages + 900-series-alarm-messages + +************ +Log messages +************ + +.. toctree:: + :maxdepth: 1 + + 200-series-maintenance-customer-log-messages + 400-series-customer-log-messages + 900-series-orchestration-customer-log-messages + +------------------- +StarlingX OpenStack +------------------- + +.. toctree:: + :maxdepth: 1 + + openstack-fault-management-overview + +************************ +OpenStack alarm messages +************************ + +.. toctree:: + :maxdepth: 1 + + openstack-alarm-messages-300s + openstack-alarm-messages-400s + openstack-alarm-messages-700s + openstack-alarm-messages-800s + +******************************* +OpenStack customer log messages +******************************* + +.. toctree:: + :maxdepth: 1 + + openstack-customer-log-messages-270s-virtual-machines + openstack-customer-log-messages-401s-services + openstack-customer-log-messages-700s-virtual-machines \ No newline at end of file diff --git a/doc/source/fault/openstack-alarm-messages-300s.rst b/doc/source/fault/openstack-alarm-messages-300s.rst new file mode 100644 index 000000000..c98f0a7fc --- /dev/null +++ b/doc/source/fault/openstack-alarm-messages-300s.rst @@ -0,0 +1,135 @@ + +.. slf1579788051430 +.. _alarm-messages-300s: + +===================== +Alarm Messages - 300s +===================== + +.. include:: ../_includes/openstack-alarm-messages-xxxs.rest + +.. _alarm-messages-300s-table-zrd-tg5-v5: + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 300.003** + - Networking Agent not responding. + * - Entity Instance + - host=.agent= + * - Severity: + - M\* + * - Proposed Repair Action + - If condition persists, attempt to clear issue by administratively locking and unlocking the Host. + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 300.004** + - No enabled compute host with connectivity to provider network. + * - Entity Instance + - host=.providernet= + * - Severity: + - M\* + * - Proposed Repair Action + - Enable compute hosts with required provider network connectivity. + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 300.005** + - Communication failure detected over provider network x% for ranges y% on host z%. + + or + + Communication failure detected over provider network x% on host z%. + * - Entity Instance + - providernet=.host= + * - Severity: + - M\* + * - Proposed Repair Action + - Check neighbor switch port VLAN assignments. + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 300.010** + - ML2 Driver Agent non-reachable + + or + + ML2 Driver Agent reachable but non-responsive + + or + + ML2 Driver Agent authentication failure + + or + + ML2 Driver Agent is unable to sync Neutron database + * - Entity Instance + - host=.ml2driver= + * - Severity: + - M\* + * - Proposed Repair Action + - Monitor and if condition persists, contact next level of support. + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 300.012** + - Openflow Controller connection failed. + * - Entity Instance + - host=.openflow-controller= + * - Severity: + - M\* + * - Proposed Repair Action + - Check cabling and far-end port configuration and status on adjacent equipment. 
+ +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 300.013** + - No active Openflow controller connections found for this network. + + or + + One or more Openflow controller connections in disconnected state for this network. + * - Entity Instance + - host=.openflow-network= + * - Severity: + - C/M\* + * - Proposed Repair Action + - Check cabling and far-end port configuration and status on adjacent equipment. + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 300.015** + - No active OVSDB connections found. + * - Entity Instance + - host= + * - Severity: + - C\* + * - Proposed Repair Action + - Check cabling and far-end port configuration and status on adjacent equipment. + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 300.016** + - Dynamic routing agent x% lost connectivity to peer y% + * - Entity Instance + - host=,agent=,bgp-peer= + * - Severity: + - M\* + * - Proposed Repair Action + - If condition persists, fix connectivity to peer. \ No newline at end of file diff --git a/doc/source/fault/openstack-alarm-messages-400s.rst b/doc/source/fault/openstack-alarm-messages-400s.rst new file mode 100644 index 000000000..ec58d929d --- /dev/null +++ b/doc/source/fault/openstack-alarm-messages-400s.rst @@ -0,0 +1,55 @@ + +.. msm1579788069384 +.. _alarm-messages-400s: + +===================== +Alarm Messages - 400s +===================== + +.. include:: ../_includes/openstack-alarm-messages-xxxs.rest + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 400.001** + - Service group failure; . + + or + + Service group degraded; + + or + + Service group Warning; . + * - Entity Instance + - service\_domain=.service\_group=.host= + * - Severity: + - C/M/m\* + * - Proposed Repair Action + - Contact next level of support. + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 400.002** + - Service group loss of redundancy; expected standby member but no standby members available. + + or + + Service group loss of redundancy; expected standby member but only standby member available. + + or + + Service group loss of redundancy; expected active member but no active members available. + + or + + Service group loss of redundancy; expected active member but only active member available. + * - Entity Instance + - service\_domain=.service\_group= + * - Severity: + - M\* + * - Proposed Repair Action + - Bring a controller node back into service; otherwise, contact next level of support. \ No newline at end of file diff --git a/doc/source/fault/openstack-alarm-messages-700s.rst b/doc/source/fault/openstack-alarm-messages-700s.rst new file mode 100644 index 000000000..66857bfb1 --- /dev/null +++ b/doc/source/fault/openstack-alarm-messages-700s.rst @@ -0,0 +1,275 @@ + +.. uxo1579788086872 +.. _alarm-messages-700s: + +===================== +Alarm Messages - 700s +===================== + +.. include:: ../_includes/openstack-alarm-messages-xxxs.rest + +.. _alarm-messages-700s-table-zrd-tg5-v5: + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.001** + - Instance owned by has failed on host + + + Instance owned by has failed to + schedule + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - The system will attempt recovery; no repair action required. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.002** + - Instance owned by is paused on host + .
+ * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Unpause the instance. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.003** + - Instance owned by is suspended on host + . + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Resume the instance. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.004** + - Instance owned by is stopped on host + . + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Start the instance. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.005** + - Instance owned by is rebooting on host + . + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Wait for reboot to complete; if problem persists contact next level of + support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.006** + - Instance owned by is rebuilding on host + . + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Wait for rebuild to complete; if problem persists contact next level of + support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.007** + - Instance owned by is evacuating from host + . + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Wait for evacuate to complete; if problem persists contact next level of + support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.008** + - Instance owned by is live migrating from + host + * - Entity Instance + - tenant=.instance= + * - Severity: + - W\* + * - Proposed Repair Action + - Wait for live migration to complete; if problem persists contact next + level of support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.009** + - Instance owned by is cold migrating from + host + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Wait for cold migration to complete; if problem persists contact next + level of support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.010** + - Instance owned by has been cold-migrated + to host waiting for confirmation. + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Confirm or revert cold-migrate of instance. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.011** + - Instance owned by is reverting cold + migrate to host + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Wait for cold migration revert to complete; if problem persists contact + next level of support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.012** + - Instance owned by is resizing on host + + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Wait for resize to complete; if problem persists contact next level of + support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.013** + - Instance owned by has been resized on + host waiting for confirmation. 
+ * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Confirm or revert resize of instance. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.014** + - Instance owned by is reverting resize + on host . + * - Entity Instance + - tenant=.instance= + * - Severity: + - C\* + * - Proposed Repair Action + - Wait for resize revert to complete; if problem persists contact next + level of support. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.016** + - Multi-Node Recovery Mode + * - Entity Instance + - subsystem=vim + * - Severity: + - m\* + * - Proposed Repair Action + - Wait for the system to exit out of this mode. + +----- + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 700.017** + - Server group policy was not satisfied. + * - Entity Instance + - server-group + * - Severity: + - M + * - Proposed Repair Action + - Migrate instances in an attempt to satisfy the policy; if problem + persists contact next level of support. \ No newline at end of file diff --git a/doc/source/fault/openstack-alarm-messages-800s.rst b/doc/source/fault/openstack-alarm-messages-800s.rst new file mode 100644 index 000000000..7a8c05b5e --- /dev/null +++ b/doc/source/fault/openstack-alarm-messages-800s.rst @@ -0,0 +1,98 @@ + +.. tsh1579788106505 +.. _alarm-messages-800s: + +===================== +Alarm Messages - 800s +===================== + +.. include:: ../_includes/openstack-alarm-messages-xxxs.rest + +.. _alarm-messages-800s-table-zrd-tg5-v5: + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 800.002** + - Image storage media is full: There is not enough disk space on the image storage media. + + or + + Instance snapshot failed: There is not enough disk space on the image storage media. + + or + + Supplied \(\) and generated from uploaded image \(\) did not match. Setting image status to 'killed'. + + or + + Error in store configuration. Adding images to store is disabled. + + or + + Forbidden upload attempt: + + or + + Insufficient permissions on image storage media: + + or + + Denying attempt to upload image larger than bytes. + + or + + Denying attempt to upload image because it exceeds the quota: + + or + + Received HTTP error while uploading image + + or + + Client disconnected before sending all data to backend + + or + + Failed to upload image + * - Entity Instance + - image=, instance= + + or + + image=, instance= + * - Severity: + - W\* + * - Proposed Repair Action + - If problem persists, contact next level of support. + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 800.100** + - Storage Alarm Condition: + + Cinder I/O Congestion is above normal range and is building + * - Entity Instance + - cinder\_io\_monitor + * - Severity: + - M + * - Proposed Repair Action + - Reduce the I/O load on the Cinder LVM backend. Use Cinder QoS mechanisms on high usage volumes. + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Alarm ID: 800.101** + - Storage Alarm Condition: + + Cinder I/O Congestion is high and impacting guest performance + * - Entity Instance + - cinder\_io\_monitor + * - Severity: + - C\* + * - Proposed Repair Action + - Reduce the I/O load on the Cinder LVM backend. Cinder actions may fail until congestion is reduced. Use Cinder QoS mechanisms on high usage volumes. 
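+ +To check whether any of these storage alarms are currently active, you can filter the active alarm list by alarm ID. This is a minimal sketch; it assumes the :command:`fm alarm-list` command accepts a --query filter of the form shown: + + .. code-block:: none + + ~(keystone_admin)$ fm alarm-list --query alarm_id=800.101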
diff --git a/doc/source/fault/openstack-customer-log-messages-270s-virtual-machines.rst b/doc/source/fault/openstack-customer-log-messages-270s-virtual-machines.rst new file mode 100644 index 000000000..b37f47348 --- /dev/null +++ b/doc/source/fault/openstack-customer-log-messages-270s-virtual-machines.rst @@ -0,0 +1,38 @@ + +.. ftb1579789103703 +.. _customer-log-messages-270s-virtual-machines: + +============================================= +Customer Log Messages 270s - Virtual Machines +============================================= + +.. include:: ../_includes/openstack-customer-log-messages-xxxs.rest + +.. _customer-log-messages-270s-virtual-machines-table-zgf-jvw-v5: + +.. table:: Table 1. Customer Log Messages - Virtual Machines + :widths: auto + + +-----------+----------------------------------------------------------------------------------+----------+ + | Log ID | Description | Severity | + + +----------------------------------------------------------------------------------+----------+ + | | Entity Instance ID | | + +===========+==================================================================================+==========+ + | 270.101 | Host compute services failure\[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +-----------+----------------------------------------------------------------------------------+----------+ + | 270.102 | Host compute services enabled | C | + | | | | + | | tenant=.instance= | | + +-----------+----------------------------------------------------------------------------------+----------+ + | 270.103 | Host compute services disabled | C | + | | | | + | | tenant=.instance= | | + +-----------+----------------------------------------------------------------------------------+----------+ + | 275.001 | Host hypervisor is now - | C | + | | | | + | | tenant=.instance= | | + +-----------+----------------------------------------------------------------------------------+----------+ + +See also :ref:`Customer Log Messages 700s - Virtual Machines ` \ No newline at end of file diff --git a/doc/source/fault/openstack-customer-log-messages-401s-services.rst b/doc/source/fault/openstack-customer-log-messages-401s-services.rst new file mode 100644 index 000000000..6173825de --- /dev/null +++ b/doc/source/fault/openstack-customer-log-messages-401s-services.rst @@ -0,0 +1,45 @@ + +.. hwr1579789203684 +.. _customer-log-messages-401s-services: + +===================================== +Customer Log Messages 401s - Services +===================================== + +.. include:: ../_includes/openstack-customer-log-messages-xxxs.rest + +.. _customer-log-messages-401s-services-table-zgf-jvw-v5: + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Log Message: 401.001** + - Service group state change from to on host + * - Entity Instance + - service\_domain=.service\_group=.host= + * - Severity: + - C + +.. list-table:: + :widths: 6 15 + :header-rows: 0 + + * - **Log Message: 401.002** + - Service group loss of redundancy; expected standby member but no standby members available. + + or + + Service group loss of redundancy; expected standby member but only standby member\(s\) available. + + or + + Service group has no active members available; expected active member\(s\) + + or + + Service group loss of redundancy; expected active member\(s\) but only active member\(s\) available. 
+ * - Entity Instance + - service\_domain=.service\_group= + * - Severity: + - C diff --git a/doc/source/fault/openstack-customer-log-messages-700s-virtual-machines.rst b/doc/source/fault/openstack-customer-log-messages-700s-virtual-machines.rst new file mode 100644 index 000000000..3a13c8750 --- /dev/null +++ b/doc/source/fault/openstack-customer-log-messages-700s-virtual-machines.rst @@ -0,0 +1,480 @@ + +.. qfy1579789227230 +.. _customer-log-messages-700s-virtual-machines: + +============================================= +Customer Log Messages 700s - Virtual Machines +============================================= + +.. include:: ../_includes/openstack-customer-log-messages-xxxs.rest + +.. _customer-log-messages-700s-virtual-machines-table-zgf-jvw-v5: + +.. table:: Table 1. Customer Log Messages + :widths: auto + + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | Log ID | Description | Severity | + + +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | | Entity Instance ID | | + +==========+====================================================================================================================================================================================+==========+ + | 700.101 | Instance is enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.102 | Instance owned by has failed\[, reason = \]. 
| C | + | | Instance owned by has failed to schedule\[, reason = \] | | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.103 | Create issued by or by the system against owned by | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.104 | Creating instance owned by | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.105 | Create rejected for instance \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.106 | Create canceled for instance \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.107 | Create failed for instance \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.108 | Instance owned by has been created | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.109 | Delete issued by or by the system against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.110 | Deleting instance owned by | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.111 | Delete rejected for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.112 | Delete canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.113 | Delete failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + 
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.114 | Deleted instance owned by | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.115 | Pause issued by or by the system against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.116 | Pause inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.117 | Pause rejected for instance enabled on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.118 | Pause canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.119 | Pause failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.120 | Pause complete for instance now paused on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.121 | Unpause issued by or by the system against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.122 | Unpause inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.123 | Unpause rejected for instance paused on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.124 | Unpause canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + 
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.125 | Unpause failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.126 | Unpause complete for instance now enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.127 | Suspend issued by or by the system> against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.128 | Suspend inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.129 | Suspend rejected for instance enabled on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.130 | Suspend canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.131 | Suspend failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.132 | Suspend complete for instance now suspended on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.133 | Resume issued by or by the system against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.134 | Resume inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.135 | Resume rejected for instance suspended on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + 
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.136 | Resume canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.137 | Resume failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.138 | Resume complete for instance now enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.139 | Start issued by or by the system against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.140 | Start inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.141 | Start rejected for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.142 | Start canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.143 | Start failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.144 | Start complete for instance now enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.145 | Stop issued by \ or by the system or by the instance against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.146 | Stop inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + 
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.147 | Stop rejected for instance enabled on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.148 | Stop canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.149 | Stop failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.150 | Stop complete for instance now disabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.151 | Live-Migrate issued by or by the system against instance owned by from host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.152 | Live-Migrate inprogress for instance from host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.153 | Live-Migrate rejected for instance now on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.154 | Live-Migrate canceled for instance now on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.155 | Live-Migrate failed for instance now on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.156 | Live-Migrate complete for instance now enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.157 | Cold-Migrate issued by or by the system against instance owned by from host \[, reason = \] | C | 
+ | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.158 | Cold-Migrate inprogress for instance from host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.159 | Cold-Migrate rejected for instance now on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.160 | Cold-Migrate canceled for instance now on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.161 | Cold-Migrate failed for instance now on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.162 | Cold-Migrate complete for instance now enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.163 | Cold-Migrate-Confirm issued by or by the system against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.164 | Cold-Migrate-Confirm inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.165 | Cold-Migrate-Confirm rejected for instance now enabled on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.166 | Cold-Migrate-Confirm canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.167 | Cold-Migrate-Confirm failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 
700.168 | Cold-Migrate-Confirm complete for instance enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.169 | Cold-Migrate-Revert issued by or by the system\> against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.170 | Cold-Migrate-Revert inprogress for instance from host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.171 | Cold-Migrate-Revert rejected for instance now on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.172 | Cold-Migrate-Revert canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.173 | Cold-Migrate-Revert failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.174 | Cold-Migrate-Revert complete for instance now enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.175 | Evacuate issued by or by the system against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.176 | Evacuating instance owned by from host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.177 | Evacuate rejected for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.178 | Evacuate canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + 
+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.179 | Evacuate failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.180 | Evacuate complete for instance now enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.181 | Reboot <\(soft-reboot\) or \(hard-reboot\)> issued by or by the system or by the instance against instance owned by | C | + | | on host \[, reason = \] | | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.182 | Reboot inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.183 | Reboot rejected for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.184 | Reboot canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.185 | Reboot failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.186 | Reboot complete for instance now enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.187 | Rebuild issued by or by the system against instance using image on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.188 | Rebuild inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.189 | Rebuild rejected for instance on host \[, reason = \] | C 
| + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.190 | Rebuild canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.191 | Rebuild failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.192 | Rebuild complete for instance now enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.193 | Resize issued by or by the system against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.194 | Resize inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.195 | Resize rejected for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.196 | Resize canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.197 | Resize failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.198 | Resize complete for instance enabled on host waiting for confirmation | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.199 | Resize-Confirm issued by or by the system against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.200 | Resize-Confirm inprogress for instance on 
host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.201 | Resize-Confirm rejected for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.202 | Resize-Confirm canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.203 | Resize-Confirm failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.204 | Resize-Confirm complete for instance enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.205 | Resize-Revert issued by or by the system against instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.206 | Resize-Revert inprogress for instance on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.207 | Resize-Revert rejected for instance owned by on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.208 | Resize-Revert canceled for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.209 | Resize-Revert failed for instance on host \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.210 | Resize-Revert complete for instance enabled on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.214 | Instance has been 
renamed to owned by on host | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.215 | Guest Health Check failed for instance \[, reason = \] | C | + | | | | + | | tenant=.instance= | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.216 | Entered Multi-Node Recovery Mode | C | + | | | | + | | subsystem-vim | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + | 700.217 | Exited Multi-Node Recovery Mode | C | + | | | | + | | subsystem-vim | | + +----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+ + +See also :ref:`Customer Log Messages 270s - Virtual Machines `
\ No newline at end of file
diff --git a/doc/source/fault/openstack-fault-management-overview.rst b/doc/source/fault/openstack-fault-management-overview.rst
new file mode 100644
index 000000000..6dcd056e4
--- /dev/null
+++ b/doc/source/fault/openstack-fault-management-overview.rst
@@ -0,0 +1,19 @@
+
+.. ekn1458933172232
+.. _openstack-fault-management-overview:
+
+========
+Overview
+========
+
+|prod-os| is a containerized application running on top of |prod|.
+
+All Fault Management interfaces for displaying alarms and logs, suppressing
+and unsuppressing events, enabling SNMP, and enabling remote log collection
+are available through the |prod| REST APIs, CLIs, and GUIs.
+
+.. xreflink See :ref:`Fault Management Overview ` for details on these interfaces.
+
+This section lists the OpenStack-related Alarms and Customer Logs that are
+monitored and reported for the |prod-os| application through the |prod|
+fault management interfaces.
\ No newline at end of file
diff --git a/doc/source/fault/setting-snmp-identifying-information.rst b/doc/source/fault/setting-snmp-identifying-information.rst
new file mode 100644
index 000000000..3d3423d15
--- /dev/null
+++ b/doc/source/fault/setting-snmp-identifying-information.rst
@@ -0,0 +1,30 @@
+
+.. tie1580219717420
+.. _setting-snmp-identifying-information:
+
+================================
+Set SNMP Identifying Information
+================================
+
+You can set SNMP system information, including name, location, and contact
+details.
+
+.. rubric:: |proc|
+
+- Use the following command syntax to set the **sysContact** attribute.
+
+  .. code-block:: none
+
+     ~(keystone_admin)$ system modify --contact 
+
+- Use the following command syntax to set the **sysLocation** attribute.
+
+  .. code-block:: none
+
+     ~(keystone_admin)$ system modify --location 
+
+- Use the following command syntax to set the **sysName** attribute.
+
+  .. code-block:: none
+
+     ~(keystone_admin)$ system modify --name 
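+
+For example, the following hypothetical invocation sets all three attributes
+at once. The values are illustrative only, and combining the options in a
+single :command:`system modify` call is an assumption:
+
+.. code-block:: none
+
+   ~(keystone_admin)$ system modify --name lab-system-1 --location "Ottawa Lab, Rack 4" --contact "noc@example.com"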
_snmp-event-table:
+
+================
+SNMP Event Table
+================
+
+|prod| supports SNMP active and historical alarms, and customer logs, in an
+event table.
+
+The event table contains historical alarms \(sets and clears\) and customer
+logs. It does not contain active alarms. Each entry in the table includes
+the following variables:
+
+.. _snmp-event-table-ul-y1w-4lk-qq:
+
+- 
+
+- 
+
+- 
+
+- 
+
+- 
+
+- 
+
+- 
+
+- 
+
+- 
+
+- 
+
+- 
+
+- 
+
+.. note::
+   The previous SNMP Historical Alarm Table and the SNMP Customer Log Table
+   are still supported, but are marked as deprecated in the MIB.
\ No newline at end of file
diff --git a/doc/source/fault/snmp-overview.rst b/doc/source/fault/snmp-overview.rst
new file mode 100644
index 000000000..5a7a89e37
--- /dev/null
+++ b/doc/source/fault/snmp-overview.rst
@@ -0,0 +1,136 @@
+
+.. gzl1552680561274
+.. _snmp-overview:
+
+=============
+SNMP Overview
+=============
+
+|prod| can generate SNMP traps for |prod| Alarm Events and Customer Log Events.
+
+This includes alarms based on hardware sensors monitored by board management
+controllers.
+
+.. xreflink For more information, see |node-doc|: :ref:`Sensors Tab `.
+
+.. contents::
+   :local:
+   :depth: 1
+
+.. _snmp-overview-section-N10027-N1001F-N10001:
+
+------------------
+About SNMP Support
+------------------
+
+Support for Simple Network Management Protocol \(SNMP\) is implemented as follows:
+
+.. _snmp-overview-ul-bjv-cjd-cp:
+
+- access is disabled by default and must be enabled manually from the
+  command line interface
+
+- available using the controller node's floating OAM IP address, over the
+  standard SNMP UDP port 161
+
+- supported version is SNMPv2c
+
+- access is read-only for all SNMP communities
+
+- all SNMP communities have access to the entire OID tree; there is no
+  support for VIEWS
+
+- supported SNMP operations are GET, GETNEXT, GETBULK, and SNMPv2C-TRAP2
+
+- the SNMP SET operation is not supported
+
+For information on enabling SNMP support, see
+:ref:`Enabling SNMP Support `.
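+
+As an illustration only \(not part of the product interface\), once SNMP is
+enabled, a standard net-snmp client on a remote workstation could read the
+System group over the floating OAM address. The community string and address
+below are hypothetical, and the standard MIB files are assumed to be
+installed on the client:
+
+.. code-block:: none
+
+   $ snmpget -v2c -c mycommunity 10.10.10.2 sysName.0 sysLocation.0 sysContact.0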
+.. _snmp-overview-section-N10099-N1001F-N10001:
+
+-----------------------
+SNMPv2-MIB \(RFC 3418\)
+-----------------------
+
+Support for the basic standard MIB for SNMP entities is limited to the System
+and SNMP groups, as follows:
+
+.. _snmp-overview-ul-ulb-ypl-hp:
+
+- System Group, **.iso.org.dod.internet.mgmt.mib-2.system**
+
+- SNMP Group, **.iso.org.dod.internet.mgmt.mib-2.snmp**
+
+- coldStart and warmStart Traps
+
+The following system attributes are used in support of the SNMP implementation.
+They can be displayed using the :command:`system show` command.
+
+**contact**
+   A read-write system attribute used to populate the **sysContact** attribute
+   of the SNMP System group.
+
+**location**
+   A read-write system attribute used to populate the **sysLocation** attribute
+   of the SNMP System group.
+
+**name**
+   A read-write system attribute used to populate the **sysName** attribute of
+   the SNMP System group.
+
+**software\_version**
+   A read-only system attribute set automatically by the system. Its value is
+   used to populate the **sysDescr** attribute of the SNMP System group.
+
+For information on setting the **sysContact**, **sysLocation**, and **sysName**
+attributes, see
+:ref:`Setting SNMP Identifying Information <setting-snmp-identifying-information>`.
+
+The following SNMP attributes are used as follows:
+
+**sysObjectId**
+   Set to **iso.org.dod.internet.private.enterprise.wrs.titanium** \(1.3.6.1.4.1.1.2\).
+
+**sysUpTime**
+   Set to the up time of the active controller.
+
+**sysServices**
+   Set to the nominal value of 72 to indicate that the host provides services
+   at layers 1 to 7.
+
+.. _snmp-overview-section-N100C9-N1001F-N10001:
+
+--------------------------
+Wind River Enterprise MIBs
+--------------------------
+
+|prod| supports the Wind River Enterprise Registration and Alarm MIBs.
+
+**Enterprise Registration MIB, wrsEnterpriseReg.mib**
+   Defines the Wind River Systems \(WRS\) hierarchy underneath the
+   **iso\(1\).org\(3\).dod\(6\).internet\(1\).private\(4\).enterprise\(1\)**.
+   This hierarchy is administered as follows:
+
+   - **.wrs\(731\)**, the IANA-registered enterprise code for Wind River
+     Systems
+
+   - **.wrs\(731\).wrsCommon\(1\).wrs\(1-...\)**,
+     defined in wrsCommon.mib.
+
+   - **.wrs\(731\).wrsProduct\(2-...\)**, defined in wrs.mib.
+
+**Alarm MIB, wrsAlarmMib.mib**
+   Defines the common TRAP and ALARM MIBs for |org| products.
+   The definition includes textual conventions, an active alarm table, a
+   historical alarm table, a customer log table, and traps.
+
+   **Textual Conventions**
+      Semantic statements used to simplify definitions in the active alarm
+      table and traps components of the MIB.
+
+   **Tables**
+      See :ref:`SNMP Event Table <snmp-event-table>` for detailed
+      descriptions.
+
+   **Traps**
+      See :ref:`Traps <traps>` for detailed descriptions.
\ No newline at end of file
diff --git a/doc/source/fault/suppressing-an-alarm-using-the-cli.rst b/doc/source/fault/suppressing-an-alarm-using-the-cli.rst
new file mode 100644
index 000000000..80167f293
--- /dev/null
+++ b/doc/source/fault/suppressing-an-alarm-using-the-cli.rst
@@ -0,0 +1,47 @@
+
+.. ani1552680633324
+.. _suppressing-an-alarm-using-the-cli:
+
+===============================
+Suppress an Alarm Using the CLI
+===============================
+
+You can use the CLI to prevent a monitored system parameter from generating
+unnecessary alarms.
+
+.. rubric:: |proc|
+
+#. Use the :command:`fm event-suppress` command to suppress a single alarm or
+   multiple alarms by ID.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ fm event-suppress [--nowrap] --alarm_id [,] \
+      [--nopaging] [--uuid]
+
+   where
+
+   ****
+      is a comma-separated list of Alarm IDs to suppress.
+
+   **--nowrap**
+      disables output wrapping.
+
+   **--nopaging**
+      disables paged output.
+
+   **--uuid**
+      includes the alarm type UUIDs in the output.
+
+   An error message is generated in the case of an invalid Alarm ID:
+   **Alarm ID not found: **.
+
+   If more than one Alarm ID is specified and at least one is invalid, the
+   suppress command is not applied \(none of the specified Alarm IDs are
+   suppressed\).
+
+   .. note::
+      Suppressing an Alarm will result in the system NOT notifying the
+      operator of this particular fault.
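+
+   For example, to suppress the alarm types 100.101 and 100.103 \(IDs shown
+   for illustration only\):
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ fm event-suppress --alarm_id 100.101,100.103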
\ No newline at end of file
diff --git a/doc/source/fault/suppressing-and-unsuppressing-events.rst b/doc/source/fault/suppressing-and-unsuppressing-events.rst
new file mode 100644
index 000000000..ea5f29f0c
--- /dev/null
+++ b/doc/source/fault/suppressing-and-unsuppressing-events.rst
@@ -0,0 +1,37 @@
+
+.. sla1552680666298
+.. _suppressing-and-unsuppressing-events:
+
+==============================
+Suppress and Unsuppress Events
+==============================
+
+You can set events to a suppressed state and toggle them back to unsuppressed.
+
+.. rubric:: |proc|
+
+#. Open the Events Suppression page, available from **Admin** \>
+   **Fault Management** \> **Events Suppression** in the left-hand pane.
+
+   The Events Suppression page appears. It provides the suppression status of
+   each event type, and lets you suppress or unsuppress each event, depending
+   on the current status of the event.
+
+#. Locate the event ID that you want to suppress.
+
+#. Click the **Suppress Event** button for that event.
+
+   You are prompted to confirm that you want to suppress the event.
+
+   .. caution::
+      Suppressing an Alarm will result in the system *not* notifying the
+      operator of this particular fault.
+
+#. Click **Suppress Event** in the Confirm Suppress Event dialog box.
+
+   The Events Suppression tab is refreshed to show the selected event ID with
+   a status of Suppressed, as shown below. The **Suppress Event** button is
+   replaced by **Unsuppress Event**, providing a way to toggle the event back
+   to unsuppressed.
+
+   .. image:: figures/nlc1463584178366.png
\ No newline at end of file
diff --git a/doc/source/fault/the-global-alarm-banner.rst b/doc/source/fault/the-global-alarm-banner.rst
new file mode 100644
index 000000000..102cb54ef
--- /dev/null
+++ b/doc/source/fault/the-global-alarm-banner.rst
@@ -0,0 +1,25 @@
+
+.. wtg1552680748451
+.. _the-global-alarm-banner:
+
+=======================
+The Global Alarm Banner
+=======================
+
+The |prod| Horizon Web interface provides an active alarm counts banner in the
+page header of all screens.
+
+The global alarm banner provides a high-level indicator of faults on the
+system that is always visible, regardless of the current page in the GUI. The
+banner provides a color-coded snapshot of current active alarm counts for each
+alarm severity.
+
+.. image:: figures/xyj1558447807645.png
+
+.. note::
+   Suppressed alarms are not shown. For more about suppressed alarms, see
+   :ref:`Events Suppression Overview `.
+
+Clicking the alarm banner opens the Fault Management page, where more
+detailed information about the alarms is provided.
+
diff --git a/doc/source/fault/traps.rst b/doc/source/fault/traps.rst
new file mode 100644
index 000000000..f2d66327b
--- /dev/null
+++ b/doc/source/fault/traps.rst
@@ -0,0 +1,63 @@
+
+.. lmy1552680547012
+.. _traps:
+
+=====
+Traps
+=====
+
+|prod| supports SNMP traps. Traps send unsolicited information to monitoring
+software when significant events occur.
+
+The following traps are defined:
+
+.. _traps-ul-p1j-tvn-c5:
+
+- **wrsAlarmCritical**
+
+- **wrsAlarmMajor**
+
+- **wrsAlarmMinor**
+
+- **wrsAlarmWarning**
+
+- **wrsAlarmMessage**
+
+- **wrsAlarmClear**
+
+- **wrsAlarmHierarchicalClear**
+
+.. note::
+   Customer Logs always result in **wrsAlarmMessage** traps.
+
+For Critical, Major, Minor, Warning, and Message traps, all variables in the
+active alarm table are included as varbinds \(variable bindings\), where each
+varbind is a pair of fields consisting of an object identifier and a value
+for the object.
+
+For the Clear trap, varbinds include only the following variables:
+
+.. _traps-ul-uks-byn-nkb:
+
+- 
+
+- 
+
+- 
+
+- 
+
+For the HierarchicalClear trap, varbinds include only the following variables:
+
+.. _traps-ul-isn-fyn-nkb:
+
+- 
+
+- 
+
+- 
+
+For all alarms, the Notification Type is based on the severity of the trap or
+alarm. This is done to facilitate interaction with most SNMP trap viewers,
+which typically use the Notification Type to drive the coloring of traps,
+that is, red for critical, yellow for minor, and so on.
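+
+As an illustration only \(not part of the product interface\), a monitoring
+host running the net-snmp trap daemon could receive and log these traps with
+a minimal configuration such as the following; the community string is
+hypothetical:
+
+.. code-block:: none
+
+   # /etc/snmp/snmptrapd.conf on the monitoring host
+   authCommunity log mycommunity
+
+   # run the daemon in the foreground, logging traps to stdout
+   $ snmptrapd -f -Lo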
\ No newline at end of file
diff --git a/doc/source/fault/troubleshooting-log-collection.rst b/doc/source/fault/troubleshooting-log-collection.rst
new file mode 100644
index 000000000..91ac87338
--- /dev/null
+++ b/doc/source/fault/troubleshooting-log-collection.rst
@@ -0,0 +1,99 @@
+
+.. ley1552581824091
+.. _troubleshooting-log-collection:
+
+===========================
+Troubleshoot Log Collection
+===========================
+
+The |prod| log collection tool gathers detailed diagnostic information from
+the hosts in a system.
+
+.. contents::
+   :local:
+   :depth: 1
+
+.. _troubleshooting-log-collection-section-N10061-N1001C-N10001:
+
+------------------------------
+Collect Tool Caveats and Usage
+------------------------------
+
+.. _troubleshooting-log-collection-ul-dpj-bxp-jdb:
+
+- Log in as **sysadmin**, NOT as root, on the active controller and use the
+  :command:`collect` command.
+
+- All usage options can be found by using the following command:
+
+  .. code-block:: none
+
+     (keystone_admin)$ collect --help
+
+- For |prod| Simplex or Duplex systems, use the following command:
+
+  .. code-block:: none
+
+     (keystone_admin)$ collect --all
+
+- For |prod| Standard systems, use the following commands:
+
+  - For a small deployment \(fewer than two worker nodes\):
+
+    .. code-block:: none
+
+       (keystone_admin)$ collect --all
+
+  - For large deployments:
+
+    .. code-block:: none
+
+       (keystone_admin)$ collect --list host1 host2 host3
+
+- For systems with an up-time of more than two months, use the date range
+  options; the two can be combined, as shown in the example after this list.
+
+  Use --start-date for the collection of logs on and after a given date:
+
+  .. code-block:: none
+
+     (keystone_admin)$ collect [--start-date | -s] 
+
+  Use --end-date for the collection of logs on and before a given date:
+
+  .. code-block:: none
+
+     (keystone_admin)$ collect [--end-date | -e] 
+
+- To prefix the collect tarball name so that a particular collection can be
+  easily identified when several are present, use the following command:
+
+  .. code-block:: none
+
+     (keystone_admin)$ collect [--name | -n] 
+
+  For example, the following prepends **TEST1** to the name of the tarball:
+
+  .. code-block:: none
+
+     (keystone_admin)$ collect --name TEST1
+     [sudo] password for sysadmin:
+     collecting data from 1 host(s): controller-0
+     collecting controller-0_20200316.155805 ... done (00:01:39 56M)
+     creating user-named tarball /scratch/TEST1_20200316.155805.tar ... done (00:01:39 56M)
+
+- Prior to using the :command:`collect` command, the nodes must be
+  unlocked-enabled or disabled-online, and must have been unlocked at least
+  once.
+
+- For a node that is rebooting indefinitely, lock the node and wait for it
+  to reach the disabled-online state before collecting its logs.
+
+- You may be required to run :command:`collect` locally if the collect tool
+  running from the active controller node fails to collect logs from one of
+  the system nodes. Execute the :command:`collect` command using the console
+  or BMC connection on the node that displays the failure.
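+
+For example, a collection from all hosts restricted to logs generated during
+January 2020 \(the YYYYMMDD date format shown is an assumption\):
+
+.. code-block:: none
+
+   (keystone_admin)$ collect --all --start-date 20200101 --end-date 20200131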
+
+.. only:: partner
+
+   .. include:: ../_includes/troubleshooting-log-collection.rest
\ No newline at end of file
diff --git a/doc/source/fault/unsuppressing-an-alarm-using-the-cli.rst b/doc/source/fault/unsuppressing-an-alarm-using-the-cli.rst
new file mode 100644
index 000000000..ff581675e
--- /dev/null
+++ b/doc/source/fault/unsuppressing-an-alarm-using-the-cli.rst
@@ -0,0 +1,41 @@
+
+.. maj1552680619436
+.. _unsuppressing-an-alarm-using-the-cli:
+
+=================================
+Unsuppress an Alarm Using the CLI
+=================================
+
+If you need to reactivate a suppressed alarm, you can do so using the CLI.
+
+.. rubric:: |proc|
+
+- Use the :command:`fm event-unsuppress` CLI command to unsuppress a
+  currently suppressed alarm.
+
+  .. code-block:: none
+
+     ~(keystone_admin)$ fm event-unsuppress [--nowrap] --alarm_id [,] \
+     [--nopaging] [--uuid]
+
+  where
+
+  ****
+     is a comma-separated list of **Alarm IDs** of alarms to unsuppress.
+
+  **--nowrap**
+     disables output wrapping.
+
+  **--nopaging**
+     disables paged output.
+
+  **--uuid**
+     includes the alarm type UUIDs in the output.
+
+  Alarm types with the specified IDs will be unsuppressed.
+
+  You can unsuppress all currently suppressed alarms using the following
+  command:
+
+  .. code-block:: none
+
+     ~(keystone_admin)$ fm event-unsuppress --all [--nopaging] [--uuid]
\ No newline at end of file
diff --git a/doc/source/fault/viewing-active-alarms-using-horizon.rst b/doc/source/fault/viewing-active-alarms-using-horizon.rst
new file mode 100644
index 000000000..6baaf884f
--- /dev/null
+++ b/doc/source/fault/viewing-active-alarms-using-horizon.rst
@@ -0,0 +1,47 @@
+
+.. sqv1552680735693
+.. _viewing-active-alarms-using-horizon:
+
+================================
+View Active Alarms Using Horizon
+================================
+
+The |prod| Horizon Web interface provides a page for viewing active alarms.
+
+Alarms are fault conditions that have a state; they are set and cleared by the
+system as a result of monitoring and detecting a change in a fault condition.
+Active alarms are alarms that are in the set condition. Active alarms typically
+require user action to be cleared, for example, replacing a faulty cable or
+removing files from a nearly full filesystem.
+
+.. note::
+   For data networks and worker host data interfaces, you can also use the
+   Data Network Topology view to monitor active alarms.
+
+.. xreflink For more information, see |datanet-doc|: :ref:`The Data Network Topology View `.
+
+.. rubric:: |proc|
+
+.. _viewing-active-alarms-using-horizon-steps-n43-ssf-pkb:
+
+#. Select **Admin** \> **Fault Management** \> **Active Alarms** in the left pane.
+
+   The currently Active Alarms are displayed in a table, by default sorted by
+   severity with the most critical alarms at the top. A color-coded summary
+   count of active alarms is also shown at the top of the Active Alarms tab.
+
+   You can change the sorting of entries by clicking on the column titles.
+   For example, to sort the table by timestamp, click **Timestamp**. The
+   entries are re-sorted by timestamp.
+
+   Suppressed alarms are excluded by default from the table. Suppressed alarms
+   can be included or excluded in the table with the **Show Suppressed** and
+   **Hide Suppressed** filter buttons at the top right of the table. The
+   suppression filter buttons are only shown when one or more alarms are
+   suppressed.
+
+   The **Suppression Status** column is only shown in the table when the
+   **Show Suppressed** filter button is selected.
+
+#. Click the Alarm ID of an alarm entry in the table to display the details
+   of the alarm.
\ No newline at end of file
diff --git a/doc/source/fault/viewing-active-alarms-using-the-cli.rst b/doc/source/fault/viewing-active-alarms-using-the-cli.rst
new file mode 100644
index 000000000..eb50a923e
--- /dev/null
+++ b/doc/source/fault/viewing-active-alarms-using-the-cli.rst
@@ -0,0 +1,192 @@
+
+.. 
pdd1551804388161 +.. _viewing-active-alarms-using-the-cli: + +================================ +View Active Alarms Using the CLI +================================ + +You can use the CLI to find information about currently active system alarms. + +.. rubric:: |context| + +.. note:: + You can also use the command :command:`fm alarm-summary` to view the count + of alarms and warnings for the system. + +To review detailed information about a specific alarm instance, see + :ref:`Viewing Alarm Details Using the CLI `. + +.. rubric:: |proc| + +.. _viewing-active-alarms-using-the-cli-steps-gsj-prg-pkb: + +#. Log in with administrative privileges. + + .. code-block:: none + + $ source /etc/platform/openrc + +#. Run the :command:`fm alarm-list` command to view alarms. + + The command syntax is: + + .. code-block:: none + + fm alarm-list [--nowrap] [-q ] [--uuid] [--include_suppress] [--mgmt_affecting] [--degrade_affecting] + + **--nowrap** + Prevent word-wrapping of output. This option is useful when output will + be piped to another process. + + **-q** + is a query string to filter the list output. The typical + OpenStack CLI syntax for this query string is used. The syntax is a + combination of attribute, operator and value. For example: + severity=warning would filter alarms with a severity of warning. More + complex queries can be built. See the upstream OpenStack CLI syntax + for more details on string syntax. Also see additional query + examples below. + + You can use one of the following --query command filters to view + specific subsets of alarms, or a particular alarm: + + .. table:: + :widths: auto + + +----------------------------------------------------------------------------+----------------------------------------------------------------------------+ + | Query Filter | Comment | + +============================================================================+============================================================================+ + | :command:`uuid=` | Query alarms by UUID, for example: | + | | | + | | .. code-block:: none | + | | | + | | ~(keystone_admin)$ fm alarm-list --query uuid=4ab5698a-19cb... | + +----------------------------------------------------------------------------+----------------------------------------------------------------------------+ + | :command:`alarm\_id=` | Query alarms by alarm ID, for example: | + | | | + | | .. code-block:: none | + | | | + | | ~(keystone_admin)$ fm alarm-list --query alarm_id=100.104 | + +----------------------------------------------------------------------------+----------------------------------------------------------------------------+ + | :command:`alarm\_type=` | Query alarms by type, for example: | + | | | + | | .. code-block:: none | + | | | + | | ~(keystone_admin)$ fm alarm-list --query \ | + | | alarm_type=operational-violation | + +----------------------------------------------------------------------------+----------------------------------------------------------------------------+ + | :command:`entity\_type\_id=` | Query alarms by entity type ID, for example: | + | | | + | | .. code-block:: none | + | | | + | | ~(keystone_admin)$ fm alarm-list --query \ | + | | entity_type_id=system.host | + +----------------------------------------------------------------------------+----------------------------------------------------------------------------+ + | :command:`entity\_instance\_id=` | Query alarms by entity instance id, for example: | + | | | + | | .. 
code-block:: none | + | | | + | | ~(keystone_admin)$ fm alarm-list --query \ | + | | entity_instance_id=host=worker-0 | + +----------------------------------------------------------------------------+----------------------------------------------------------------------------+ + | :command:`severity=` | Query alarms by severity type, for example: | + | | | + | | .. code-block:: none | + | | | + | | ~(keystone_admin)$ fm alarm-list --query severity=warning | + | | | + | | The valid severity types are critical, major, minor, and warning. | + +----------------------------------------------------------------------------+----------------------------------------------------------------------------+
+
+   Query command filters can be combined into a single expression
+   separated by semicolons, as illustrated in the following example:
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ fm alarm-list -q 'alarm_id=400.002;entity_instance_id=service_domain=controller.service_group=directory-services'
+
+   **--uuid**
+      The --uuid option on the :command:`fm alarm-list` command lists the
+      active alarms with a unique UUID for each alarm, so that the UUID can
+      be used to display alarm details with the :command:`fm alarm-show`
+      command.
+
+   **--include\_suppress**
+      Use this option to include suppressed alarms in the list. When it is
+      used, all active alarms are displayed, including suppressed alarms.
+      Suppressed alarms are displayed with their Alarm ID set to
+      S<\(alarm-id\)>.
+
+   **--mgmt\_affecting**
+      Management affecting alarms prevent some critical administrative
+      actions, for example software upgrades, from being performed. Using
+      the --mgmt\_affecting option lists an additional column in the output,
+      'Management Affecting', which indicates whether the alarm is
+      management affecting or not.
+
+   **--degrade\_affecting**
+      Include degrade affecting status in the output.
+
+   The following example shows alarm UUIDs.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ fm alarm-list --uuid
+      +--------------+-------+------------------+---------------+----------+-----------+
+      | UUID         | Alarm | Reason Text      | Entity ID     | Severity | Time      |
+      |              | ID    |                  |               |          | Stamp     |
+      +--------------+-------+------------------+---------------+----------+-----------+
+      | 6056e290-    | 200.  | compute-0 was    | host=         | warning  | 2019      |
+      | 2e56-        | 001   | administratively | compute-0     |          | -08-29T   |
+      | 4e22-b07a-   |       | locked to take   |               |          | 17:00:16. |
+      | ff9cf4fbd81a |       | it out-of        |               |          | 363072    |
+      |              |       | -service.        |               |          |           |
+      |              |       |                  |               |          |           |
+      |              |       |                  |               |          |           |
+      | 0a8a4aec-    | 100.  | NTP address      | host=         | minor    | 2019      |
+      | a2cb-        | 114   | 2607:5300:201:3  | controller-1. |          | -08-29T   |
+      | 46aa-8498-   |       | is not a valid   | ntp=          |          | 15:44:44. |
+      | 9ed9b6448e0c |       | or a reachable   | 2607:5300:    |          | 773704    |
+      |              |       | NTP server.      | 201:3         |          |           |
+      |              |       |                  |               |          |           |
+      |              |       |                  |               |          |           |
+      +--------------+-------+------------------+---------------+----------+-----------+
+
+   This command shows a column to track the management affecting severity of
+   each alarm type.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ fm alarm-list --mgmt_affecting
+      +-------+-------------------+---------------+----------+------------+-------------+
+      | Alarm | Reason Text       | Entity ID     | Severity | Management | Time Stamp  |
+      | ID    |                   |               |          | Affecting  |             |
+      +-------+-------------------+---------------+----------+------------+-------------+
+      | 100.  | Platform Memory   | host=         | major    | False      | 2019-05-21T |
+      | 103   | threshold         | controller-0. |          |            | 13:15:26.   |
+      |       | exceeded ;        | numa=node0    |          |            | 464231      |
+      |       | threshold 80%,    |               |          |            |             |
+      |       | actual 80%        |               |          |            |             |
+      |       |                   |               |          |            |             |
+      | 100.  | Platform Memory   | host=         | major    | False      | 2019-05-21T |
+      | 103   | threshold         | controller-0  |          |            | 13:15:26.   |
+      |       | exceeded ;        |               |          |            | 456738      |
+      |       | threshold 80%,    |               |          |            |             |
+      |       | actual 80%        |               |          |            |             |
+      |       |                   |               |          |            |             |
+      | 200.  | controller-0 is   | host=         | major    | True       | 2019-05-20T |
+      | 006   | degraded due to   | controller-0. |          |            | 23:56:51.   |
+      |       | the failure of    | process=ceph  |          |            | 557509      |
+      |       | its 'ceph (osd.0, | (osd.0, )     |          |            |             |
+      |       | )' process. Auto  |               |          |            |             |
+      |       | recovery of this  |               |          |            |             |
+      |       | major process is  |               |          |            |             |
+      |       | in progress.      |               |          |            |             |
+      |       |                   |               |          |            |             |
+      | 200.  | controller-0 was  | host=         | warning  | True       | 2019-05-17T |
+      | 001   | administratively  | controller-0  |          |            | 14:17:32.   |
+      |       | locked to take it |               |          |            | 794640      |
+      |       | out-of-service.   |               |          |            |             |
+      |       |                   |               |          |            |             |
+      +-------+-------------------+---------------+----------+------------+-------------+
\ No newline at end of file
diff --git a/doc/source/fault/viewing-alarm-details-using-the-cli.rst b/doc/source/fault/viewing-alarm-details-using-the-cli.rst
new file mode 100644
index 000000000..1c3c149d9
--- /dev/null
+++ b/doc/source/fault/viewing-alarm-details-using-the-cli.rst
@@ -0,0 +1,56 @@
+
+.. kfs1580755127017
+.. _viewing-alarm-details-using-the-cli:
+
+================================
+View Alarm Details Using the CLI
+================================
+
+You can view detailed information to help troubleshoot an alarm.
+
+.. rubric:: |proc|
+
+- Use the following command to view details about an alarm.
+
+  .. code-block:: none
+
+     fm alarm-show 
+
+  The required argument is the UUID of the alarm to query. Use the
+  :command:`fm alarm-list` command to obtain UUIDs, as described in
+  :ref:`Viewing Active Alarms Using the CLI <viewing-active-alarms-using-the-cli>`.
+
+  .. code-block:: none
+
+     ~(keystone_admin)$ fm alarm-show 4ab5698a-19cb-4c17-bd63-302173fef62c
+     +------------------------+-------------------------------------------------+
+     | Property               | Value                                           |
+     +------------------------+-------------------------------------------------+
+     | alarm_id               | 100.104                                         |
+     | alarm_state            | set                                             |
+     | alarm_type             | operational-violation                           |
+     | entity_instance_id     | system=hp380-1_4.host=controller-0              |
+     | entity_type_id         | system.host                                     |
+     | probable_cause         | threshold-crossed                               |
+     | proposed_repair_action | /dev/sda3 check usage                           |
+     | reason_text            | /dev/sda3 critical threshold set (0.00 MB left) |
+     | service_affecting      | False                                           |
+     | severity               | critical                                        |
+     | suppression            | True                                            |
+     | timestamp              | 2014-06-25T16:58:57.324613                      |
+     | uuid                   | 4ab5698a-19cb-4c17-bd63-302173fef62c            |
+     +------------------------+-------------------------------------------------+
+
+  The pair of attributes **\(alarm\_id, entity\_instance\_id\)** uniquely
+  identifies an active alarm:
+
+  **alarm\_id**
+     An ID identifying the particular alarm condition. Note that there are
+     some alarm conditions, such as *administratively locked*, that can be
+     raised by more than one entity-instance-id.
+
+  **entity\_instance\_id**
+     Type and instance information of the object raising the alarm. This is
+     a period-separated list of \(key, value\) pairs, representing the
+     containment structure of the overall entity instance. This structure
+     is used for processing hierarchical clearing of alarms.
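+
+  For example, the identifying pair can be used together in a query; the
+  values below are reused from the sample output above:
+
+  .. code-block:: none
+
+     ~(keystone_admin)$ fm alarm-list -q 'alarm_id=100.104;entity_instance_id=system=hp380-1_4.host=controller-0'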
\ No newline at end of file
diff --git a/doc/source/fault/viewing-suppressed-alarms-using-the-cli.rst b/doc/source/fault/viewing-suppressed-alarms-using-the-cli.rst
new file mode 100644
index 000000000..994f309fc
--- /dev/null
+++ b/doc/source/fault/viewing-suppressed-alarms-using-the-cli.rst
@@ -0,0 +1,49 @@
+
+.. ohs1552680649558
+.. _viewing-suppressed-alarms-using-the-cli:
+
+====================================
+View Suppressed Alarms Using the CLI
+====================================
+
+Alarms may be suppressed. List them to determine whether any need to be
+unsuppressed or otherwise managed.
+
+.. rubric:: |proc|
+
+.. _viewing-suppressed-alarms-using-the-cli-steps-hyn-g1x-nkb:
+
+- Use the :command:`fm event-suppress-list` CLI command to view a list of
+  all currently suppressed alarms.
+
+  This command shows the suppressed alarm IDs along with their suppression
+  status.
+
+  .. code-block:: none
+
+     ~(keystone_admin)$ fm event-suppress-list [--nopaging] [--uuid] [--include-unsuppressed]
+
+  where
+
+  **--nopaging**
+     disables paged output; see :ref:`CLI Commands and Paged Output `
+
+  **--uuid**
+     includes the alarm type UUIDs in the output.
+
+  **--include-unsuppressed**
+     includes unsuppressed alarm types in the output. By default, only
+     suppressed alarm types are shown.
+
+  For example:
+
+  .. code-block:: none
+
+     [sysadmin@controller-0 ~(keystone_admin)] fm event-suppress-list
+     +----------+-------------+
+     | Event ID | Status      |
+     +----------+-------------+
+     | 100.101  | suppressed  |
+     | 100.103  | suppressed  |
+     | 100.105  | suppressed  |
+     | ...      | ...         |
+     +----------+-------------+
\ No newline at end of file
diff --git a/doc/source/fault/viewing-the-event-log-using-horizon.rst b/doc/source/fault/viewing-the-event-log-using-horizon.rst
new file mode 100644
index 000000000..2b695f77a
--- /dev/null
+++ b/doc/source/fault/viewing-the-event-log-using-horizon.rst
@@ -0,0 +1,55 @@
+
+.. ubf1552680722858
+.. _viewing-the-event-log-using-horizon:
+
+================================
+View the Event Log Using Horizon
+================================
+
+The |prod| Horizon Web interface provides a convenient way to work with
+historical alarms and customer logs.
+
+.. rubric:: |context|
+
+The event log consolidates historical alarm events, that is, the sets and
+clears of alarms that have occurred in the past, and customer logs.
+
+Customer logs capture important system events and provide useful information
+to the administrator for the purposes of overall fault management. Customer
+log events do not have a state and do not typically require administrator
+action; for example, they may report a failed login attempt or the fact that
+a container was evacuated to another host.
+
+Customer logs and historical alarm set and clear actions are held in a
+buffer, with older entries discarded as needed to release logging space.
+
+.. rubric:: |proc|
+
+#. Select **Admin** \> **Fault Management** \> **Events** in the left pane.
+
+   The Events window appears. By default, the Events screen shows all events,
+   including both historical set/clear alarms and logs, with the most recent
+   events at the top.
+
+#. Use the filter selections from the search field to select the information
+   you want to view.
+
+   Use the **All Events**, **Alarm Events**, and **Log Events** filter
+   buttons to display all events, only historical alarm set/clear events, or
+   only customer log events. By default, all events are displayed.
+
+   Suppressed events are excluded from the table by default.
+   Suppressed events can be included or excluded in the table with the
+   **Show Suppressed** and **Hide Suppressed** filter buttons at the top
+   right of the table. The suppression filter buttons are only shown when
+   one or more events are suppressed.
+
+   The **Suppression Status** column is only shown in the table when the
+   **Show Suppressed** filter button is selected.
+
+   .. image:: figures/psa1567524091300.png
+
+   You can sort the entries by clicking on the column titles. For example, to
+   sort the view of the entries by severity, click **Severity**; the entries
+   are re-sorted and grouped by severity.
+
+#. Click the arrow to the left of an event entry in the table for an expanded
+   view of event details.
\ No newline at end of file
diff --git a/doc/source/fault/viewing-the-event-log-using-the-cli.rst b/doc/source/fault/viewing-the-event-log-using-the-cli.rst
new file mode 100644
index 000000000..5ee0ceacf
--- /dev/null
+++ b/doc/source/fault/viewing-the-event-log-using-the-cli.rst
@@ -0,0 +1,183 @@
+
+.. fcv1552680708686
+.. _viewing-the-event-log-using-the-cli:
+
+================================
+View the Event Log Using the CLI
+================================
+
+You can use CLI commands to work with historical alarms and logs in the event
+log.
+
+.. rubric:: |proc|
+
+.. _viewing-the-event-log-using-the-cli-steps-v3r-stf-pkb:
+
+#. Log in with administrative privileges.
+
+   .. code-block:: none
+
+      $ source /etc/platform/openrc
+
+#. Use the :command:`fm event-list` command to view historical alarm
+   sets/clears and logs. By default, only unsuppressed events are shown.
+
+   For more about event suppression, see
+   :ref:`Events Suppression Overview `.
+
+   The syntax of the command is:
+
+   .. code-block:: none
+
+      fm event-list [-q ] [-l ] [--alarms] [--logs] [--include_suppress]
+
+   Optional arguments:
+
+   **-q QUERY, --query QUERY**
+      \- key\[op\]data\_type::value; list. data\_type is optional, but if
+      supplied must be string, integer, float, or boolean.
+
+   **-l NUMBER, --limit NUMBER**
+      Maximum number of event logs to return.
+
+   **--alarms**
+      Show historical alarm set/clears only.
+
+   **--logs**
+      Show customer logs only.
+
+   **--include\_suppress**
+      Show suppressed alarms as well as unsuppressed alarms.
+
+   **--uuid**
+      Include the unique event UUID in the listing, so that it can be used
+      to display event details with :command:`fm event-show`.
+
+   **--nopaging**
+      Disable output paging.
+
+   For details on CLI paging, see
+   :ref:`CLI Commands and Paged Output `.
+
+   For example:
+
+   .. code-block:: none
+
+      [sysadmin@controller-0 ~(keystone_admin)]$ fm event-list -l 5
+      +-----------+-----+-----+--------------------+-----------------+---------+
+      |Time Stamp |State|Event|Reason Text         |Entity Instance  |Severity |
+      |           |     |Log  |                    |ID               |         |
+      |           |     |ID   |                    |                 |         |
+      +-----------+-----+-----+--------------------+-----------------+---------+
+      |2019-05-21T| set |100. |Platform Memory     |host=controller-0|major    |
+      | 13:15:26. |     |103  |threshold exceeded ;|numa=node0       |         |
+      | 464231    |     |     |threshold 80%,actual|                 |         |
+      |           |     |     |80%                 |                 |         |
+      |           |     |     |                    |                 |         |
+      |2019-05-21T| set | 100.|Platform Memory     |host=controller-0|major    |
+      | 13:15:26. |     | 103 |threshold exceeded; |                 |         |
+      | 456738    |     |     |threshold 80%,actual|                 |         |
+      |           |     |     |80%                 |                 |         |
+      |           |     |     |                    |                 |         |
+      |2019-05-21T|clear| 100.|Platform Memory     |host=controller-0|major    |
+      | 13:07:26. 
| | 103 |threshold exceeded; |numa=node0 | | + | 658374 | | |threshold 80%,actual| | | + | | | |79% | | | + | | | | | | | + |2019-05-21T|clear| 100.|Platform Memory |host=controller-0|major | + | 13:07:26. | | 103 |threshold exceeded; | | | + | 656608 | | |threshold 80%,actual| | | + | | | |79% | | | + | | | | | | | + |2019-05-21T| set | 100 |Platform Memory |host=controller-0|major | + | 13:05:26. | | 103 |threshold exceeded; |numa=node0 | | + | 481240 | | |threshold 80%,actual| | | + | | | |79% | | | + | | | | | | | + +-----------+-----+-----+--------------------+-----------------+---------+ + + .. note:: + You can also use the --nopaging option to avoid paging long event + lists. + + In the following example, the :command:`fm event-list` command shows + alarms only; the **State** column indicates either **set** or **clear**. + + .. code-block:: none + + [sysadmin@controller-0 ~(keystone_admin)]$ fm event-list -l 5 --alarms + +-------------+-------+-------+--------------------+---------------+----------+ + | Time Stamp | State | Event | Reason Text | Entity | Severity | + | | | Log | | Instance ID | | + | | | ID | | | | + +-------------+-------+-------+--------------------+---------------+----------+ + | 2019-05-21T | set | 100. | Platform Memory | host= | major | + | 13:15:26. | | 103 | threshold exceeded | controller-0. | | + | 464231 | | | ; threshold 80%, | numa=node0 | | + | | | | actual 80% | | | + | | | | | | | + | 2019-05-21T | set | 100. | Platform Memory | host= | | + | 13:15:26. | | 103 | threshold exceeded | controller-0 | major | + | 456738 | | | ; threshold 80%, | | | + | | | | actual 80% | | | + | | | | | | | + | 2019-05-21T | clear | 100. | Platform Memory | host= | | + | 13:07:26. | | 103 | threshold exceeded | controller-0. | major | + | 658374 | | | ; threshold 80%, | numa=node0 | | + | | | | actual 79% | | | + | | | | | | | + | 2019-05-21T | clear | 100. | Platform Memory | host= | | + | 13:07:26. | | 103 | threshold exceeded | controller-0 | major | + | 656608 | | | ; threshold 80%, | | | + | | | | actual 79% | | | + | | | | | | | + | 2019-05-21T | set | 100. | Platform Memory | host= | | + | 13:05:26. | | 103 | threshold exceeded | controller-0. | major | + | 481240 | | | ; threshold 80%, | numa=node0 | | + | | | | actual 79% | | | + | | | | | | | + +-------------+-------+-------+--------------------+---------------+----------+ + + + In the following example, the :command:`fm event-list` command shows logs + only; the **State** column indicates **log**. + + .. code-block:: none + + [sysadmin@controller-0 ~(keystone_admin)]$ fm event-list -l 5 --logs + +-------------+-------+-------+---------------------+---------------+----------+ + | Time Stamp | State | Event | Reason Text | Entity | Severity | + | | | Log | | Instance ID | | + | | | ID | | | | + +-------------+-------+-------+---------------------+---------------+----------+ + | 2019-05-21T | log | 700. | Exited Multi-Node | subsystem=vim | critical | + | 00:50:29. | | 217 | Recovery Mode | | | + | 525068 | | | | | | + | | | | | | | + | 2019-05-21T | log | 700. | Entered Multi-Node | subsystem=vim | critical | + | 00:49:49. | | 216 | Recovery Mode | | | + | 979021 | | | | | | + | | | | | | | + | 2019-05-21T | log | 401. | Service group vim- | service | | + | 00:49:31. | | 002 | services redundancy | _domain= | critical | + | 205116 | | | restored | controller. | | + | | | | | service_group | | + | | | | | =vim- | | + | | | | | services | | + | | | | | | | + | 2019-05-21T | log | 401. 
| Service group vim- | service | | + | 00:49:30. | | 001 | services state | _domain= | critical | + | 003221 | | | change from go- | controller. | | + | | | | active to active on | service_group | | + | | | | host controller-0 | =vim-services | | + | | | | | .host= | | + | | | | | controller-0 | | + | | | | | | | + | 2019-05-21T | log | 401. | Service group | service | | + | 00:49:29. | | 002 | controller-services | _domain= | critical | + | 950524 | | | redundancy restored | controller. | | + | | | | | service | | + | | | | | _group= | | + | | | | | controller | | + | | | | | -services | | + | | | | | | | + +-------------+-------+-------+---------------------+---------------+----------+ \ No newline at end of file diff --git a/doc/source/index.rst b/doc/source/index.rst index 50ededd7d..ccf6d0063 100755 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -57,6 +57,15 @@ Configuration configuration/index +---------------- +Fault Management +---------------- + +.. toctree:: + :maxdepth: 2 + + fault/index + ---------------- Operation guides ---------------- @@ -91,18 +100,13 @@ General information Governance ---------- -StarlingX is a top-level Open Infrastructure Foundation confirmed project that -is governed by two separate bodies: The `Open Infrastructure Foundation Board of -Directors`_ and the `StarlingX Technical Steering Committee`_. +StarlingX is a top-level OpenStack Foundation pilot project that is governed by +two separate bodies: The `OpenStack Foundation Board of Directors`_ and the +`StarlingX Technical Steering Committee`_. See `StarlingX Governance`_ for additional information about StarlingX project governance. -.. _`Open Infrastructure Foundation Board of Directors`: https://openinfra.dev/about/board/ +.. _`OpenStack Foundation Board of Directors`: https://wiki.openstack.org/wiki/Governance/Foundation .. _`StarlingX Technical Steering Committee`: https://docs.starlingx.io/governance/reference/tsc/ -.. _`StarlingX Governance`: https://docs.starlingx.io/governance/ - - - - - +.. _`StarlingX Governance`: https://docs.starlingx.io/governance/ \ No newline at end of file