226 Commits

Author SHA1 Message Date
Zuul
2ff34d2ca1 Merge "Enable the plugin tests" 2019-03-06 19:42:17 +00:00
Zuul
635cd44258 Merge "Remove include bits/siginfo.h from pmon.h" 2019-03-06 19:42:17 +00:00
Zuul
064cf08e7f Merge "Add EXTRALDFLAGS to linker in a number of Makefiles" 2019-03-06 19:36:08 +00:00
Zuul
aab9dc64cd Merge "Set a fixed install dir and simplify the install process" 2019-03-06 19:36:07 +00:00
Zuul
427e21a7a8 Merge "Fix up requirements for centos7 and bionic" 2019-03-06 19:10:30 +00:00
Zuul
6a2ed1d5f2 Merge "pmond: don't error log first active pulse miss" 2019-03-06 15:28:39 +00:00
Zuul
62955f086c Merge "fix compiling warning in pingUtil.cpp" 2019-03-06 14:45:37 +00:00
Yi Wang
9d837a4cc6 fix compiling warning in pingUtil.cpp
* StarlingX devstack has switched to Ubuntu Bionic, whose default
  compiler is gcc 7.3.0. gcc 7.3.0 reports the compile error
  "error: format not a string literal and no format arguments
  [-Werror=format-security]" for the snprintf call in
  pingUtil_send of pingUtil.cpp
* gcc 4.8.5 doesn't report this warning, which is why the current
  StarlingX build doesn't hit the issue.

Passed tests:
* Fresh building
* Deployment test
* Unit tests, verified the change doesn't impact the code behavior.
* System-level verification, mtcAgent and hwmond can start normally.

Story: 2003161
Task: 29793

Change-Id: I21e84ac4b2c9deb8926c752fe79ea284a0d92b30
Signed-off-by: Yi Wang <yi.c.wang@intel.com>
2019-03-06 11:24:53 +08:00
Dean Troyer
732e31b381 Enable the plugin tests
The preceding 4 reviews all needed to be in place in order for
the devstack run to complete.  Enable it now.

Change-Id: I139c862b8edbe7214ad11b9820e400b7e613bd61
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2019-02-28 23:40:52 -06:00
Dean Troyer
91b046dc89 Remove include bits/siginfo.h from pmon.h
libc6 renamed siginfo.h to siginfo-const.h sometime between
2.23 (in Xenial) and 2.27 (in bionic).

This builds on bionic and centos7 and in fact is required to
get DevStack to complete on bionic.

This is last in the stack since it has not been tested
beyond the compile/install that DevStack does.  There
may be a better/alternate solution...but with this we should
get a passing DevStack job.

Change-Id: I5a2ed9455b05e604731c3775d0f402c6137da2ef
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2019-02-28 22:34:54 -06:00
Dean Troyer
83101e95ba Add EXTRALDFLAGS to linker in a number of Makefiles
This allows the DevStack plugin to add its configured STX_INST_DIR
to the linker search path.

Change-Id: I277204cd89767b93eec6c96969fc33d23e04516b
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2019-02-28 22:34:54 -06:00
Dean Troyer
f1c8043abf Set a fixed install dir and simplify the install process
* Install build artifacts to a fixed dir rather than attempting
  to infer a location based on the Python binary location.  That
  was intended to work seamlessly in venvs; we'll burn that bridge
  when we come to it.  For now just put it all in
  $DEST/usr/{include|lib}.  This also removes the need for
  root access for these files, which allows the build steps to be
  performed on laptops that may not otherwise run DevStack.

* Install systemd unit files directly to /etc/systemd/system
  and skip the requirement to copy them a second time

* Add the declarations to settings for the devstack playbook to
  handle plugin precedence order properly.

Change-Id: I5d68465384e000c05eb650a8358b70f7a7a6c293
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2019-02-28 22:34:47 -06:00
Dean Troyer
a1a98d3514 Fix up requirements for centos7 and bionic
* Add dependencies for bionic:
  libevent-2.1
  libjson-c*

* Fix a couple of bugs setting /etc/hosts

Change-Id: Ice77cb9db8db367faa982e3113ed1c16065be896
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2019-02-28 13:39:02 -06:00
Yi Wang
ab3f2385e8 fix a devstack plugin bug
The default devstack user (stack) may not have permission to modify the
system file /etc/hosts. Use sudo to make sure the modification is done.

Change-Id: Iabe47cae88da9d70a1f7788c1847d99856963713
Closes-Bug: 1816520
Signed-off-by: Yi Wang <yi.c.wang@intel.com>
final-non-containers final-non-containers-green
2019-02-20 14:54:12 +08:00
Eric MacDonald
9d38f56f7f pmond: don't error log first active pulse miss
Change-Id: I31ef5e290993e8d6b492d0d9b58709b854c4dffa
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-02-15 10:20:59 -05:00
Alex Kozyrev
506ef3fd7f MTCE: reading BMC passwords from Barbican secret storage.
Use Openstack Barbican API to retrieve BMC passwords stored by SysInv.
See SysInv commit for details on how to write password to Barbican.
MTCE finds the corresponding secret by host uuid and retrieves the
secret payload associated with it. mtcSecretApi_get is used to find
the secret reference, based on a hostname. mtcSecretApi_read is used
to read a password using the reference found in the previous step.
Also did a little cleanup and removed old unused token handling code.

Depends-On: I7102a9662f3757c062ab310737f4ba08379d0100
Change-Id: I66011dc95bb69ff536bd5888c08e3987bd666082
Story: 2003108
Task: 27700
Signed-off-by: Alex Kozyrev <alex.kozyrev@windriver.com>
2019-02-14 09:04:46 -05:00
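The two-step lookup described in the commit above (find the secret reference, then read its payload) can be sketched in Python. This is only an illustration and not the mtcSecretApi code: the helper names are hypothetical, the base URL and host identifier are made up, and it assumes the secret is stored under a name that identifies the host. No HTTP request is made; the sketch only builds URLs and parses a response body shaped like a Barbican secret listing.

```python
from urllib.parse import urlencode


def secret_list_url(barbican_base: str, name: str) -> str:
    """URL that lists secrets filtered by name (step 1: find the reference)."""
    return f"{barbican_base.rstrip('/')}/v1/secrets?{urlencode({'name': name})}"


def secret_ref_from_list(list_body: dict, name: str):
    """Pick the secret_ref of the entry whose name matches, or None."""
    for secret in list_body.get("secrets", []):
        if secret.get("name") == name:
            return secret["secret_ref"]
    return None


def payload_url(secret_ref: str) -> str:
    """URL that returns the secret payload (step 2: read the password)."""
    return secret_ref.rstrip("/") + "/payload"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    base = "http://controller:9311"
    host_id = "host-uuid-1234"
    listing = {"secrets": [{"name": host_id,
                            "secret_ref": base + "/v1/secrets/abcd"}]}
    print(secret_list_url(base, host_id))
    print(payload_url(secret_ref_from_list(listing, host_id)))
```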
Zuul
b01f8ea964 Merge "Inventory: store BMC password in Openstack Barbican." 2019-02-13 15:54:43 +00:00
Zuul
348857cc49 Merge "Add devstack job and fix linters" 2019-02-11 20:41:30 +00:00
Zuul
7af72995d0 Merge "fix tox python3 overrides" 2019-02-11 16:17:32 +00:00
Dean Troyer
5133f09a0f Add devstack job and fix linters
Add the base DevStack job and make sure bashate runs on
the devstack plugin files.

Begin to re-structure the plugin to match the common structure.

Add devstack/build.sh and split out the build steps into
separate functions in devstack/lib/stx-metal

This is complete, further work to be done in follow-up changes.

Change-Id: I05f6df758e18f182fb0a05731eddc6cb7f599e51
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2019-02-07 11:28:14 -06:00
Tao Liu
5a44b5be49 Configurable Host HTTP/HTTPS Port Binding
Update pxeboot-update script to accept parameter for
installer base URL

Add a common function to parse the port number from
inst.repo

Update pxeboot and kickstart URLs to support a configurable
HTTP port

Story: 2004642
Task: 28593
Depends-On: https://review.openstack.org/#/c/634237/

Change-Id: Ibd66e89e49794ca57b938eb43d227860eda6674a
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2019-02-06 16:04:07 -06:00
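The commit above adds a common function that parses the port number from inst.repo. A minimal Python sketch of that idea follows; the real change lives in the installer scripts, so the function name, the fallback to well-known ports, and the sample URLs here are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Well-known ports used when the URL carries no explicit port (an assumption).
DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_repo_port(inst_repo: str) -> int:
    """Return the TCP port encoded in an inst.repo URL."""
    parts = urlsplit(inst_repo)
    if parts.port is not None:
        return parts.port
    if parts.scheme in DEFAULT_PORTS:
        return DEFAULT_PORTS[parts.scheme]
    raise ValueError(f"unsupported scheme in inst.repo: {inst_repo!r}")


if __name__ == "__main__":
    # Hypothetical installer URLs, for illustration only.
    print(parse_repo_port("http://pxecontroller:8080/feed/rel"))
    print(parse_repo_port("https://pxecontroller/feed/rel"))
```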
Alex Kozyrev
938d9551c4 Inventory: store BMC password in Openstack Barbican.
Replacing existing mechanism of storing BMC passwords in Inventory.
Porting all the changes made in SysInv to Inventory to make them on par.
Inventory is going to use Barbican API instead of keyring to store
BMC passwords for MTCE as well.

Depends-On: I7102a9662f3757c062ab310737f4ba08379d0100
Change-Id: I74e971495fa7538d77cfebc28d76fd752af69f5e
Story: 2003108
Task: 27700
Signed-off-by: Alex Kozyrev <alex.kozyrev@windriver.com>
2019-02-06 13:20:20 -05:00
Zuul
0760eb2c4d Merge "Fix the misspelling of "configuration"" 2019-02-04 21:59:17 +00:00
Zuul
6e7bbf6e35 Merge "Add new Link Monitor (lmond) daemon to Mtce" 2019-02-04 17:07:02 +00:00
Eric MacDonald
7e8be89143 Make Mtce default to Simplex system type if label is missing
This update refactors daemon_system_type function so that it
returns a SIMPLEX system type if it is unable to properly
find and parse the system_mode/system_type from platform.conf

This is needed for Ansible Bootstrap Deployment where mtcAgent
and mtcClient need to run and function like it would in a
simplex system prior to the system type being added to the
platform.conf file.

Change-Id: Ib0130f3559ee3aa8d8d8203ea59d4896a571944f
Story: 2004695
Task: 28714
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-02-04 14:15:40 +00:00
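The fallback behavior described in the commit above can be sketched as follows. This is not the daemon_system_type implementation: it is a simplified Python illustration that reads only a system_mode label from key=value text, and the set of recognized modes is an assumption.

```python
# Modes treated as valid in this sketch (an assumption, not taken from the log).
KNOWN_MODES = {"simplex", "duplex", "duplex-direct"}


def system_mode_from_conf(conf_text: str) -> str:
    """Return the configured system mode, or 'simplex' if it cannot be found."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    mode = settings.get("system_mode")
    # A missing or unrecognized label falls back to simplex.
    return mode if mode in KNOWN_MODES else "simplex"


if __name__ == "__main__":
    print(system_mode_from_conf("nodetype=controller\n"))
    print(system_mode_from_conf("system_mode=duplex\n"))
```

Defaulting instead of failing lets a daemon start before the label has been written, which is the bootstrap case the commit describes.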
Eric MacDonald
7941ee5bbb Add new Link Monitor (lmond) daemon to Mtce
This update introduces a new Link Monitor daemon to the Mtce
flock of daemons and disables rmon's interface monitoring.

This new daemon parses the platform.conf file and, using the
interface names assigned to each monitored network (mgmt,
infra and oam), queries the kernel for their physical,
bonded and vlan interface names and then registers to listen
for netlink events.

All link/interface state change (netlink) events that correspond
to any of the interfaces or links associated with the monitored
networks are tracked by this new daemon.

This new daemon then also implements an http listener for
localhost initiated GET requests targeted to /mtce/lmond
on port 2122 and responds with a json link_info string that
contains a summary of monitored networks, links and their
current Up/Down status.

lmond behavioral summary:
  1. learn interface/port model,
  2. load initial link status for learned links,
  3. listen for link status change events
  4. provide link status info to http GET Query requests.

Another update to stx-integ implements the collectd interface
plugin that periodically issues the Link Status GET requests
for the purpose of alarming port and interface Down conditions,
clearing alarms on Up state changes, and storing sample data
that represents the percentage of active links for each monitored
network.

Test Plan:

PASS: Verify lmond process startup
PASS: Verify lmond logging and log rotation
PASS: Verify lmond process monitoring by pmon
PASS: Verify lmond interface learning on process startup
PASS: Verify lmond port learning on process startup
PASS: Verify lmond handling of vlan and bond interface types
PASS: Verify lmond http link info GET Query handling
PASS: Verify lmond has no memory leak during normal and eventful operation

Change-Id: I58915644e60f31e3a12c3b451399c4f76ec2ea37
Story: 2002823
Task: 28635
Depends-On:
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-02-01 14:57:40 -05:00
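The commit above says a collectd plugin polls lmond and stores the percentage of active links per monitored network. The Python sketch below shows that calculation under stated assumptions: the URL uses the path and port named in the commit message, but the JSON layout of the link_info response is hypothetical because the log does not show it, and no request is actually sent.

```python
import json

# Path and port come from the commit message; the request itself is not made here.
LMOND_URL = "http://localhost:2122/mtce/lmond"

# Hypothetical response body: the real link_info schema is not shown in the log.
SAMPLE_BODY = """
{"link_info": [
  {"network": "mgmt", "links": [{"name": "eth0", "state": "Up"},
                                {"name": "eth1", "state": "Down"}]},
  {"network": "oam",  "links": [{"name": "eth2", "state": "Up"}]}
]}
"""


def percent_links_up(payload: dict) -> dict:
    """Map each monitored network to the percentage of its links that are Up."""
    result = {}
    for network in payload["link_info"]:
        links = network["links"]
        up = sum(1 for link in links if link["state"] == "Up")
        result[network["network"]] = 100.0 * up / len(links) if links else 0.0
    return result


if __name__ == "__main__":
    print(percent_links_up(json.loads(SAMPLE_BODY)))
```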
Eric MacDonald
ff8ef3ea8a Change Mtce token endpoint lookup to be 'platform'.
The maintenance token request's response parser looks for the
nova compute endpoint, a day one implementation from when
mtce actually managed nova. That has long since changed but
this endpoint lookup remained.

In the new containerized environment the nova compute
endpoint is not always present, and when it's not, mtce
fails to get its token.

Since mtce needs the token for communication with sysinv
this update changes the endpoint lookup type to 'platform'
to match that of sysinv.

Change-Id: I389b64d345e47f7d7bc062671da7c7cc51ac398f
Story: 2004695
Task: 29213
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-01-30 12:55:55 -05:00
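The endpoint lookup change in the commit above can be pictured with a short Python sketch. It is not the mtce parser: the function name is hypothetical, the sample URL and the choice of the internal interface are assumptions, and the token body follows the general shape of a Keystone v3 token response, where the service catalog lists services by type.

```python
def find_endpoint_url(token_body: dict, service_type: str,
                      interface: str = "internal"):
    """Return the first endpoint URL for a service type, or None if absent."""
    for service in token_body["token"]["catalog"]:
        if service.get("type") != service_type:
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("interface") == interface:
                return endpoint["url"]
    return None


if __name__ == "__main__":
    # Hypothetical catalog with a 'platform' service and no 'compute' service.
    body = {"token": {"catalog": [
        {"type": "platform", "name": "sysinv",
         "endpoints": [{"interface": "internal",
                        "url": "http://controller:6385/v1"}]},
    ]}}
    print(find_endpoint_url(body, "platform"))
    print(find_endpoint_url(body, "compute"))
```

A lookup keyed on a service that may be absent returns nothing, which is the failure the commit avoids by searching for 'platform' instead.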
Jack Ding
9ececd7623 Remove nova storage aggregates
Remove the automated creation of storage host aggregates and host
population in inventory.

Story: 2004607
Task: 29068
Change-Id: I4a74a1ee1f8b3bc8dc6293a5c971d9c7ed1442b5
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2019-01-25 09:56:09 -05:00
Zuul
9c271569d6 Merge "Clean up and standardize landing pages" 2019-01-23 14:23:28 +00:00
98k
90cb458f7e fix tox python3 overrides
We want to default to running all tox environments under python 3, so
set the basepython value in each environment.

We do not want to specify a minor version number, because we do not
want to have to update the file every time we upgrade python.

We do not want to set the override once in testenv, because that
breaks the more specific versions used in default environments like
py35 and py36.

Change-Id: I1bd6a3aebbbe539d4f21ca71c76d92e3c325c1e8
Closes-Bug:  #1802032
2019-01-12 03:06:01 +00:00
Zuul
887bd34471 Merge "Add NTP server monitoring as a collectd plugin" 2019-01-11 21:00:05 +00:00
Eric MacDonald
f7031cf5fb Add NTP server monitoring as a collectd plugin
This update disables rmon NTP monitoring which is now done
as a collectd plugin with the following depends update.

Story: 2002823
Task: 22859

Depends-On: https://review.openstack.org/#/c/628685/
Change-Id: I736703542c8a6ba3dd9e9db2d6fb7ccbdc906643
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-01-11 09:15:58 -05:00
Zuul
b1a7d73ee8 Merge "Automatically create cgts-vg volume group on worker nodes" 2019-01-10 22:36:09 +00:00
Kristal Dale
3522ead301 Clean up and standardize landing pages
doc index.rst:
1. Update intro sentence to read as a complete sentence
2. Remove unused toctree
3. Correct heading levels (impacting side nav and correct rendering
of content)
4. Remove "Indices and Tables" section: genindex page not used,
search searches only index (not useful here)

api-ref index.rst:
1. Update intro sentence to read as a complete sentence
2. Update text around search link for consistency (move to
follow intro)
3. Add heading before toctree for consistency with other pages

releasenotes index.rst:
1. Standardize page title reST markup
2. Remove search (make consistent with other openstack release
note pages)

Story: 2004737
Task: 28805

Change-Id: I388cc5d69db56e6e94bf034ece2478933c9d9c1e
Signed-off-by: Kristal Dale <kristal.dale@intel.com>
2019-01-09 09:34:38 -08:00
Mingyuan Qi
4273c21af7 Add devstack plugin
Add maintenance services as stx-metal plugin.
Enable services by both node type and metal components.

Target:
Mtce services are installed and active(running) in devstack.

Story: 2003161
Task: 23296

Change-Id: I2123c64fb1b70bd135e8945d7ff7f4f3691bdbcc
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
2019-01-09 19:11:18 +08:00
Wei Zhou
fe397d5d27 Automatically create cgts-vg volume group on worker nodes
This commit creates cgts-vg volume group automatically on worker
nodes by kickstart. This cgts-vg volume group reserves space for
log-lv, scratch-lv, docker-lv and ceph-mon-lv.

This commit reserves space in cgts-vg volume group for 30G
docker-lv and 20G ceph-mon-lv for AIO configuration.

Story: 2004520
Task: 28663
Change-Id: Ic77d00c354da1070e2c4c2da4545d70ab4a93d91
Signed-off-by: Wei Zhou <wei.zhou@windriver.com>
2019-01-07 22:03:03 -05:00
Eric MacDonald
64c1d400b9 Implement collectd startup in manifest apply post stage
Starting collectd too early in the manifest apply is seen
to occasionally fail due to a dependency configuration on
hostname resolution in FQDNLookup not being complete.

Since influxdb is used by collectd and is a controller
only service, this update also moves it to the manifest apply
post stage and filters it out of non controller load types.

This issue is fixed by the following multi-git changes.

stx-metal: This update.
   Filter influxdb out of storage and compute only loads.
   No real inter git merge dependency

stx-integ:
   Add startup Before=pmond dependency

stx-config:
   Move collectd config and startup to manifest apply post stage
   Move influxdb config and startup to manifest apply post stage

Test Plan:
PASS: Build iso
PASS: verify install storage system and collectd startup
PASS: Verify Storage system DOR
PASS: Verify influxdb and extensions excluded in non-controller loads
PASS: Verify collectd starts properly on all nodes (CC,DOR,UNLOCK)
PASS: Verify influxdb starts properly on controller nodes (CC,DOR,UNLOCK)
PASS: Verify collectd pmond process monitoring and recovery
PASS: Verify influxdb pmond process monitoring and recovery

PEND: Verify collectd statistics storage and fetch to/from influxdb
PEND: Install AIO DX and verify collectd and influxdb startup

Change-Id: I8c71f36978620e0650062cc848bfb9d85f6810b2
Closes-Bug: 1797909
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-01-02 09:55:42 -05:00
lijunjie
f538394537 Fix the misspelling of "configuration"
Change-Id: If9c2ae83843a78a01a9c29ff820448c7aefa6b1b
2018-12-27 16:22:15 +08:00
zhipengl
68ab0560cf Fix trivial issue found during code review for hbs related code
1. Build-iso - PASS
2. Install iso and unlock all hosts - PASS
3. Force reboot on unlocked host to verify heartbeat failure detection
and graceful recovery. - PASS
4. Verify hbsAgent logs for unexpected logs. - PASS

Change-Id: Ia4f52d3ffa52152914f3c221fa6eb860d127724b
Signed-off-by: zhipengl <zhipengs.liu@intel.com>
2018-12-27 07:56:23 +00:00
Zuul
351cc87c9c Merge "Remove version from installer" 2018-12-21 15:32:32 +00:00
Zuul
7512c6b105 Merge "Mtce: Improve robustness of heartbeat Loss reporting" 2018-12-21 14:59:11 +00:00
Zuul
a4a5a86a08 Merge "Mtce: fix hbsClient active monitoring over config reload" 2018-12-21 14:32:52 +00:00
Eric MacDonald
4fb3ce1121 Mtce: Improve robustness of heartbeat Loss reporting
Closes-Bug: 1806963

In the case where the active controller experiences a
spontaneous reboot failure there is the potential for
a race condition in the new Active-Active Heartbeat
model between the inactive hbsAgent and mtcAgent
starting up on the newly active controller.

The inactive hbsAgent can report a heartbeat Loss before
SM starts up the mtcAgent. As a result, the heartbeat failed
host goes undetected.

This update modifies the hbsAgent to continue to report
heartbeat Loss at a throttled rate while the hbsAgent
continues to experience heartbeat loss of enabled monitored
hosts. This change is implemented in nodeClass.cpp.

Debug of this issue also revealed another undesirable race
condition and logging issue when a controller is locked. This
issue is remedied with the introduction of a control structure
'locked' state that is set on controller lock and looked at in
the hbs_cluster_update utility. hbsCluster.cpp

Two additional hbsAgent logging changes were implemented with
this update.

  1. Only print "missing peer controller cluster view" on a
     state change event. Otherwise, this becomes excessive
     whenever the inactive controller fails.
     hbsAgent.cpp

  2. Don't print the full heartbeat inventory and state banner
     with hbsInv.print_node_info on every heartbeat Loss event.
     Otherwise, this becomes excessive in larger systems.
     hbsCluster.cpp

Test Plan:
PASS: Verify hbsAgent log stream for implemented improvements.
PASS: Verify Lock inactive controller several times.
PASS: Fail inactive controller several times. verify detect.
PASS: Reboot active controller several times. verify detect.
PASS: DOR System several times. Verify proper recovery.
PASS: DOR system but prevent power-up of several hosts. Verify detect.

Change-Id: I36e6309e141e9c7844b736cce0cf0cddff3eb588
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-12-20 15:46:03 -05:00
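The throttled reporting described in the commit above can be sketched in Python. This is an illustration only, not the nodeClass.cpp change: the class name and the interval value are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class ThrottledLossReporter:
    """Report heartbeat loss per host at most once per interval."""

    def __init__(self, min_interval_secs: float):
        self.min_interval_secs = min_interval_secs
        self._last_report = {}  # hostname -> time of the last report

    def should_report(self, hostname: str, now: float) -> bool:
        """True if a loss report for this host is due at time 'now'."""
        last = self._last_report.get(hostname)
        if last is None or now - last >= self.min_interval_secs:
            self._last_report[hostname] = now
            return True
        return False

    def clear(self, hostname: str) -> None:
        """Forget a host once its heartbeat has recovered."""
        self._last_report.pop(hostname, None)


if __name__ == "__main__":
    reporter = ThrottledLossReporter(min_interval_secs=5.0)
    for now in (0.0, 2.0, 5.0, 6.0, 10.0):
        print(now, reporter.should_report("compute-0", now))
```

Because the report repeats for as long as the loss persists, a receiver that starts late still sees it, while the interval keeps the message rate bounded.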
Angie Wang
45da23bbce Increase the partition size for docker distribution
This increases the default docker distribution partition size from
1G to 16G. This also increases the minimum disk requirements from
130G to 145G for small disk, 170G to 185G for large disk.

Story: 2004520
Task: 28526
Change-Id: I898cfac45757ff1f9e6ce7c4928bbd9a42dca77d
Signed-off-by: Angie Wang <angie.wang@windriver.com>
2018-12-18 20:52:12 -05:00
Tao Liu
9661e49411 Change compute node to worker node personality
This update replaces compute references with worker in mtce,
kickstarts, installer and bsp files.

Tests Performed:
Non-containerized deployment
AIO-SX: Sanity and Nightly automated test suite
AIO-DX: Sanity and Nightly automated test suite
2+2 System: Sanity and Nightly automated test suite
2+2 System: Horizon Patch Orchestration

Kubernetes deployment:
AIO-SX: Create, delete, reboot and rebuild instances
2+2+2 System: worker nodes are unlocked-enabled with no alarms

Story: 2004022
Task: 27013

Depends-On: https://review.openstack.org/#/c/624452/

Change-Id: I225f7d7143d841f80459603b27b95ac3f846c46f
Signed-off-by: Tao Liu <tao.liu@windriver.com>
2018-12-13 13:08:48 -05:00
Zuul
8eb55b2b03 Merge "Mtce: Add Thresholded Maintenance Enable Recovery support" 2018-12-13 15:57:44 +00:00
Eric MacDonald
4e132af308 Mtce: fix hbsClient active monitoring over config reload
The maintenance process monitor is failing the hbsClient
process over config or process reload operations.

The issue relates to the hbsClient's subfunction being
'last-config' without pmon properly gating the active
monitoring FSM from starting until the passive monitoring
phase is complete and in the MANAGE state.

Test Plan

PASS: Verify active monitoring failure detection and handling
PASS: Verify proper process monitoring over pmond config reload
PASS: Verify proper process monitoring over SIGHUP -> pmond
PASS: Verify proper process monitoring over SIGUSR2 -> pmond
PASS: Verify proper process monitoring over process failure recovery
PASS: Verify pmond regression test soak ; on active and inactive controllers
PASS: Verify pmond regression test soak ; on compute node
PASS: Verify pmond regression test soak ; kill/recovery function
PASS: Verify pmond regression test soak ; restart function
PASS: Verify pmond regression test soak ; alarming function
PASS: Verify pmond handles critical process failure with no restart config
PASS: Verify pmond handles ntpd process failure

PASS: Verify AIO DX Install
PASS: Verify AIO DX Inactive Controller process management over Lock/Unlock.

Change-Id: Ie2fe7b6ce479f660725e5600498cc98f36f78337
Closes-Bug: 1807724
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-12-12 13:53:18 -05:00
Zuul
373f21e5cd Merge "Set SHELL in Makefiles that use bash constructs" 2018-12-12 14:27:53 +00:00
Eric MacDonald
3a5c578355 Mtce: Add Thresholded Maintenance Enable Recovery support
This update stops trying to recover hosts that have failed the
Enable sequence after a thresholded number of back-to-back tries.

When a host reaches a particular failure mode's max failure
threshold, maintenance puts it into the 'unlocked-disabled-failed'
state and leaves it that way, with no further recovery action, until
it is manually locked and unlocked.

The thresholded Enable failure causes are

 Configuration Failure ....... threshold:2 retry interval:30 secs
 In-Test GoEnabled Failure ... threshold:2 retry interval:30 secs
 Start Host Services Failure . threshold:2 retry interval:30 secs
 Heartbeat Soak Failure ...... threshold:2 retry interval:10 mins

This update refactors the old auto recovery for AIO SX into this
more generic framework.

Story: 2003576
Task: 24905

Test Plan:

PASS: Verify AIO DX System Install
PASS: Verify AIO SX DOR
PASS: Verify Auto recovery disabled state is maintained over AIO SX DOR
PASS: Verify Lock/Unlock recovers host from Auto recovery disabled state
PASS: Verify AIO SX Main Config Failure handling
PASS: Verify AIO SX Main Config Timeout handling
PASS: Verify AIO SX Main GoEnabled Failure Handling
PASS: Verify AIO SX Main Host Services Failure handling
PASS: Verify AIO SX Main Host Services Timeout handling
PASS: Verify AIO SX Subf Config Failure handling
PASS: Verify AIO SX Subf Config Timeout handling
PASS: Verify AIO SX Subf GoEnabled Failure Handling
PASS: Verify AIO SX Subf Host Services Failure handling

PASS: Verify AIO DX System Install
PASS: Verify AIO DX DOR
PASS: Verify AIO DX DOR ; one time active controller GoEnabled failure ; swact requested
PASS: Verify AIO DX Main First Unlock Failure handling
PASS: Verify AIO DX Main Config Failure handling (inactive ctrl)
PASS: Verify AIO DX Main one time Config Failure handling
PASS: Verify AIO DX Main one time GoEnabled Failure handling.
PASS: Verify AIO DX SUBF Inactive Controller 1 GoEnable Failure handling.
PASS: Verify AIO DX Inactive Controller 1 GoEnable Failure with recovery on retry.
PASS: Verify AIO DX Active controller Enable failure with no or locked peer controller.
PASS: Verify AIO DX Reboot Active controller with peer in auto recovery disabled state.
PASS: Verify AIO DX Active controller failure with peer in auto recovery disabled state. (vswitch process)
PASS: Verify AIO DX Active controller failure then recovery after reboot with peer in auto recovery disabled state. (goenabled)
PASS: Verify AIO DX Inactive Controller Enable Heartbeat Soak Failure handling.
PASS: Verify AIO DX Active controller unhealthy detection and handling. (degrade)
PASS: Verify AIO DX Inactive controller unhealthy detection and handling. (fail)

PASS: Verify Normal System Install
PASS: Verify Compute Enable Configuration Failure handling (wc71-75)
PASS: Verify Compute Enable GoEnabled Failure handling (recover after 1)
PASS: Verify Compute Enable Start Host Services Failure handling
PASS: Verify Compute Enable Heartbeat Soak Failure handling
PASS: Verify Inactive Controller Enable Heartbeat Soak Failure handling
PASS: Verify Inactive Controller Configuration Failure handling
PASS: Verify Inactive Controller GoEnabled Failure handling
PASS: Verify Inactive Controller Host Services Failure handling
PASS: Verify goEnabled failure after active controller reboot with no peer controller (C0 rebooted with C1 locked) - no SM startup
PASS: Verify auto recovery threshold number is configurable
PASS: Verify auto recovery retry interval is configurable
PASS: Verify auto recovery host state and status message

Regression:

PASS: Verify Swact behavior, over and back
PASS: Verify 5 node DOR
PASS: Verify 3 host MNFA behavior
PASS: verify in-service heartbeat failure handling
PASS: verify no segfaults during UT

Corner Cases:

PASS: Verify mtcAlive boot failure behavior. reset progression. retry forever. - sleep in config script
PASS: Verify AIO SX mtcAgent process restart while in autorecovery disabled state
PASS: Verify autorecovery disabled state is preserved over mtcAgent process restart.

Change-Id: I7098f16243caef27c5295971ef3c9de5be975755
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-12-12 08:11:36 -05:00
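The thresholded recovery described in the commit above can be sketched in Python. The thresholds and retry intervals are the ones listed in the commit message; everything else is an assumption made for illustration, including the class and method names, the cause keys, and the exact way back-to-back failures are counted.

```python
from dataclasses import dataclass, field

# cause -> (threshold, retry interval in seconds), from the commit message.
THRESHOLDS = {
    "config": (2, 30),
    "goenabled": (2, 30),
    "host_services": (2, 30),
    "heartbeat_soak": (2, 600),
}


@dataclass
class EnableRecovery:
    failures: dict = field(default_factory=dict)
    disabled: bool = False  # True models 'unlocked-disabled-failed'

    def on_enable_failure(self, cause: str):
        """Return the retry delay in seconds, or None once recovery has stopped."""
        if self.disabled:
            return None
        threshold, interval = THRESHOLDS[cause]
        count = self.failures.get(cause, 0) + 1
        self.failures[cause] = count
        if count >= threshold:
            self.disabled = True
            return None
        return interval

    def on_enable_success(self) -> None:
        """A successful enable ends the back-to-back failure run."""
        self.failures.clear()

    def on_lock_unlock(self) -> None:
        """A manual lock and unlock re-arms auto recovery."""
        self.failures.clear()
        self.disabled = False


if __name__ == "__main__":
    host = EnableRecovery()
    print(host.on_enable_failure("config"))
    print(host.on_enable_failure("config"), host.disabled)
```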
Zuul
42ad23ae83 Merge "No json_object_put() for the json_obj created by json_object_object_get_ex()." 2018-12-11 21:10:30 +00:00