192 Commits

Author SHA1 Message Date
Eric MacDonald
04055390fa pmond: add support for no script label in conf files
Many new services being added to our system are no longer accompanied
by an init script, only a service file. Even with the migration from sysvinit
to systemd, pmond still requires process conf files to provide a script label.

This update removes that dependency. Instead, pmond will use the service
or script label to find the most appropriate process failure recovery method
while handling the omission of either, but not both, of the service and script
labels.

The change is to first search for a service file that corresponds with the
service label in the conf file.
If the service label does not exist then the script label is looked at.
If the basename of the script has a corresponding service file then use it.
If no service file is found then the script is searched for by its full path.
If no script file is found then the process monitor errors out.
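
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not pmond's actual code: the function names, the search directories, and the choice to fall through to the script label when the service label has no matching service file are all assumptions made for the example.

```python
import os

# Hypothetical search locations; the directories pmond really consults are
# not stated in the commit message.
DEFAULT_SERVICE_DIRS = ("/usr/lib/systemd/system", "/etc/systemd/system")


def find_service_file(name, service_dirs=DEFAULT_SERVICE_DIRS):
    """Return the path of '<name>.service' in the first directory that has it."""
    unit = name if name.endswith(".service") else name + ".service"
    for directory in service_dirs:
        candidate = os.path.join(directory, unit)
        if os.path.isfile(candidate):
            return candidate
    return None


def resolve_recovery_method(service, script, service_dirs=DEFAULT_SERVICE_DIRS):
    """Return ("service", path) or ("script", path) for a process conf entry."""
    # Either label may be omitted, but not both.
    if not service and not script:
        raise ValueError("conf file must provide a service or script label")
    # 1. Prefer a service file matching the service label.
    if service:
        found = find_service_file(service, service_dirs)
        if found:
            return ("service", found)
    if script:
        # 2. Try a service file named after the script's basename.
        found = find_service_file(os.path.basename(script), service_dirs)
        if found:
            return ("service", found)
        # 3. Fall back to the script at its full path.
        if os.path.isfile(script):
            return ("script", script)
    # 4. Nothing usable was found, so the monitor reports an error.
    raise FileNotFoundError("no service file or script found for process")
```

Passing the search directories as a parameter keeps the sketch easy to exercise against a temporary directory.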

This update also improves how pmond deals with the absence of the
hostw process. The current code base blocks startup if it cannot connect
to the hostw process.

This update implements host watchdog socket failure auto recovery while
continuing to monitor processes. With this update, if the host watchdog
process is restarted or is not running then pmond will continue to monitor
processes while periodically trying to re-establish its connection to the
host watchdog.
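
One way to picture this behaviour is a connection helper that is polled from the monitoring loop and retries at a fixed interval, so a missing host watchdog never blocks process monitoring. The sketch below is an assumption-laden illustration: `HostwLink`, `monitor_tick`, the injected `connect` callable, and the 5 second retry interval are hypothetical and do not come from pmond.

```python
class HostwLink:
    """Tracks the host watchdog connection and retries it without blocking."""

    def __init__(self, connect, retry_interval=5.0):
        self._connect = connect            # callable that raises OSError on failure
        self._retry_interval = retry_interval
        self._next_attempt = 0.0           # earliest time of the next attempt
        self.connected = False

    def mark_failed(self):
        """Record a socket failure; recovery happens on a later maintain()."""
        self.connected = False

    def maintain(self, now):
        """Attempt a reconnect if one is due; return the connection state."""
        if self.connected or now < self._next_attempt:
            return self.connected
        try:
            self._connect()
            self.connected = True
        except OSError:
            # Leave monitoring untouched and try again later.
            self._next_attempt = now + self._retry_interval
        return self.connected


def monitor_tick(link, processes, now, is_healthy):
    """One loop pass: service the hostw link, then check every process."""
    link.maintain(now)
    return [p for p in processes if not is_healthy(p)]
```

The clock value is passed in rather than read inside the class, which keeps the retry timing deterministic when the sketch is exercised.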

Change-Id: Icf27090d4d00954195b0ac931474587c67341207
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-07-01 21:18:33 -04:00
Abraham Arce
2b7a79ec58 SpellCheck: Typo heartbeat
While reviewing code, a typo in the word heartbeat was seen.

Change-Id: I6fed9f82e1d3b3e78a7d91d2cfc9091f4bf83f70
Signed-off-by: Abraham Arce <abraham.arce.moreno@intel.com>
2018-07-01 16:25:36 -05:00
Zuul
784a5f8a6c Merge "Remove non-voting gate job" 2018-06-29 19:56:40 +00:00
Dean Troyer
9215d018dc Remove non-voting gate job
Change-Id: I06b95f9c4ce0591b7437b0ba407cf785a491f2f6
2018-06-29 14:31:56 -05:00
Bin Qian
5eb13b2eeb Controller Services swact/failover time reduction
Add full support for Active/Active redundancy model
1. Services can have an enable dependency on services in other service
   groups (standby group).
2. An active/active service failure will degrade the service group it
   is in.
3. A failure of an active/active service will not prevent a swact.
4. Locking a controller that is the sole active/active service
   provider will be rejected, but a lock with the force option will still
   proceed to lock the node.
5. sm-api binds to port 7777 on the mgmt interface (was localhost:7777).

Change-Id: I0da78a51a50fd60ec128edc91c2eeec31af4a956
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-06-28 15:51:50 -04:00
Kam Nasim
5e725a7a0a Multi-Region: Support shared LDAP service
Decouple NSLCD from the open-ldap SM service and manage it by PMOND
instead. This is needed because in the Shared LDAP case, we deprovision
the open-ldap service on the Secondary Region which renders NSLCD
unmanaged.

Additionally, we allow the Secondary Region or Sub Clouds to bind
anonymously, but still need to support LDAP read operations in these
regions such as ldapfinger or lsldap. For this purpose, the ldapscripts
runtime library has been modified to allow anonymous binds during LDAP
search operations.

Change-Id: Ic01a8097e8124348d493c9e0c82fda94700e28e2
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-06-28 15:49:45 -04:00
Zuul
25ee1b19a4 Merge "Mtce: Avoid running subfunction FSM for AIO-DX compute only hosts" 2018-06-28 13:39:02 +00:00
Zuul
4765055961 Merge "Mtce: Implement all token fetches as non-blocking operations." 2018-06-28 13:39:01 +00:00
Zuul
24ba2ae211 Merge "Fix dir creation for downloading patches from pxeboot server" 2018-06-28 13:14:31 +00:00
Zuul
885885195f Merge "Mtce: Stop calling 'event_base_loopbreak' for nonblocking http requests" 2018-06-27 22:06:01 +00:00
Zuul
2b5a400b92 Merge "Mtce: Improve AIO DOR handling" 2018-06-27 21:09:37 +00:00
Zuul
eb18b90340 Merge "Exclude mlnx-ofa_kernel-rt from installation" 2018-06-27 21:01:09 +00:00
Zuul
ea6e03b765 Merge "Spectre kernel options set to off in kickstarts" 2018-06-27 20:56:03 +00:00
Don Penney
168c73f3e0 Spectre kernel options set to off in kickstarts
Change-Id: I4721b8881c7e40556b90eb882b9712f3e6c93841
Signed-off-by: Don Penney <don.penney@windriver.com>
2018-06-27 15:54:26 -04:00
Don Penney
6086b545fd Fix dir creation for downloading patches from pxeboot server
When the pxeboot setup feature is used from a patched ISO, the patch
repo is downloaded from the pxeboot server during installation.
The kickstart was missing a change in working directory when mirroring
the patch repo, leading to issues when installing the second controller.
This update corrects this issue.

Change-Id: I5926ee4f196adf3938b8934f57c15eadde83a5fb
Signed-off-by: Don Penney <don.penney@windriver.com>
2018-06-27 15:51:59 -04:00
Don Penney
8267e3ce99 Add ntpd to installer, sync time from active controller during install
To avoid potential issues due to large time jumps when NTP first syncs
the system time at runtime, this update adds ntpd to the installer
rootfs and adds a pre-script to the kickstarts to sync the time from
the active controller before starting to install the software. This
also ensures that any filesystem timestamps will be accurate right
from the node installation.

Change-Id: I166c52430cec6ba64e5a33ebde64ee65639d623c
Signed-off-by: Don Penney <don.penney@windriver.com>
2018-06-27 15:48:41 -04:00
Don Penney
325947768a Exclude mlnx-ofa_kernel-rt from installation
During patchback testing, it was found that Anaconda may also install
mlnx-ofa_kernel-rt from the patching repo, if it's in the patch.
This update adds it to the exclusion lists.

Change-Id: I0809898c2dc748543033bc6b11d5e22bd8e462ad
Signed-off-by: Don Penney <don.penney@windriver.com>
2018-06-27 15:46:13 -04:00
Zuul
8344b27b67 Merge "Kickstart updates to resolve prepatching issues" 2018-06-27 19:41:48 +00:00
Eric MacDonald
2351cd73a5 Mtce: Avoid running subfunction FSM for AIO-DX compute only hosts
The AIO-DX system type was enhanced to allow the customer to
provision additional compute hosts.

However, the maintenance system made a global assumption about the
'system type' and would run the enable subfunction FSMs against
these compute only hosts. In doing so the unlock of these added
hosts would time out and heartbeat would not enable.

This update ensures that the subfunction FSM is only run against
hosts that have both controller and compute functions.
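
The per-host check described above amounts to a simple predicate on the functions a host carries. The sketch below is illustrative only; the function name and the string labels for host functions are assumptions, not identifiers from the maintenance code.

```python
def should_run_subfunction_fsm(host_functions):
    """True only when the host has both controller and compute functions."""
    return {"controller", "compute"} <= set(host_functions)
```

Deciding per host, rather than from a global system type, is what keeps compute only hosts in an AIO-DX system out of the subfunction FSM.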

Change-Id: If7711519d3435ef19faa13e7905afae2ce9084bc
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-06-27 15:32:08 -04:00
Eric MacDonald
08e66abb1b Mtce: Implement all token fetches as non-blocking operations.
Fetching a keystone token is seen to take a long time when the system
is overloaded. Such delay, due to overload, is most often seen over a
Swact in an All-In-One Duplex (AIO-DX) configuration.

Any blocking call that takes a long time to complete can cause a process
stall. If that stall is sufficiently long then the command will either
finish on or before the timeout. The timeout is currently set to 15
seconds which is comparable to the SM monitored audit timeout.

Although rare, the race condition does exist and if there are 2
timeouts within 2 minutes then SM fails the process and triggers a
swact.
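
The "2 timeouts within 2 minutes" rule can be read as a sliding-window count. The sketch below only illustrates that reading; it is not SM's implementation, and the class name, the default values as parameters, and the treatment of a timeout exactly 120 seconds old as still inside the window are assumptions.

```python
from collections import deque


class TimeoutWindow:
    """Counts audit timeouts that fall inside a sliding time window."""

    def __init__(self, threshold=2, window_secs=120.0):
        self._threshold = threshold
        self._window = window_secs
        self._events = deque()

    def record_timeout(self, now):
        """Record a timeout at time `now`; return True if the process should fail."""
        self._events.append(now)
        # Drop timeouts that are older than the window.
        while self._events and now - self._events[0] > self._window:
            self._events.popleft()
        return len(self._events) >= self._threshold
```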

Rather than tune the timeouts, this update implements a more robust
fix by making all token fetches, both 'initial token get' and
'runtime token refresh', non-blocking.

The change applies to both mtcAgent and hwmond, both of which need
to get and manage their own authentication token.
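
The general idea of a non-blocking token fetch can be sketched as below: the main loop requests a refresh and later polls for the result, never waiting on the request itself. Note the swapped-in technique: this example uses a worker thread from Python's `concurrent.futures`, which is an assumption chosen for brevity and not how mtcAgent or hwmond issue their requests. `TokenManager` and the injected `fetch` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class TokenManager:
    """Keeps an auth token fresh without ever blocking the main loop."""

    def __init__(self, fetch):
        self._fetch = fetch                       # slow callable returning a token
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None                      # in-flight fetch, if any
        self.token = None

    def request_refresh(self):
        """Start a fetch in the background unless one is already in flight."""
        if self._pending is None:
            self._pending = self._pool.submit(self._fetch)

    def poll(self):
        """Call from the main loop; collects a finished fetch and returns at once."""
        if self._pending is not None and self._pending.done():
            future, self._pending = self._pending, None
            try:
                self.token = future.result()
            except Exception:
                # Keep the previous token; the caller may request another refresh.
                pass
        return self.token
```

The same `request_refresh` and `poll` pair covers both the initial token get and the runtime token refresh, since neither path waits for the fetch to finish.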

Change-Id: I2730b76ae78daec4b9edeaff5c1ca614b75ab52c
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-06-27 15:00:23 -04:00
Zuul
c15fe119cb Merge "Mtce: Create NTP alarm if only reachable NTP server is peer controller" 2018-06-27 18:50:15 +00:00
Eric MacDonald
0328da92d2 Mtce: Stop calling 'event_base_loopbreak' for nonblocking http requests
The SM API event handler is calling event_base_loopbreak for all requests.
This sometimes causes a segfault when it is done in the http handler
for non-blocking requests.

This update prevents this call for non-blocking requests.

Change-Id: Ib083100ccd74aa984bd86921c7521bfec925e779
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-06-27 14:28:27 -04:00
Zuul
942e7b1a03 Merge "pxe-network-installer to use updeated grub" 2018-06-27 18:22:14 +00:00
Don Penney
7c756f7ff3 Kickstart updates to resolve prepatching issues
The prepatching feature changed how kickstarts are delivered. Where
before, they were generated as part of build-iso and copied during
installation, they are now delivered in packages. However, when the
controller kickstart mirrors the feed directory from the active
controller after installation, it may overwrite these installed files.
This update changes the wget command options to protect against this.

In addition, testing with an RT kernel patch showed Anaconda was also
installing the mlnx-ofa_kernel-rt-modules package from the patching
repo on a standard node, as it attempts to resolve a packaging
requirement. This update also adds explicit exclusions to the package
lists in the standard and lowlatency kickstarts to avoid installing
rt modules on the standard nodes, and vice versa.

Change-Id: I56b22fb0846db05a96004184c1060c05566d5363
Signed-off-by: Don Penney <don.penney@windriver.com>
2018-06-27 13:21:50 -04:00
jmckenna
d58a4ca481 pxe-network-installer to use updeated grub
pxe-network-installer package modified to use an updated version
of grub2 (0.64)

Change-Id: I94f3c010b8d32474d82c27df68621788e44511de
Depends-On: https://review.openstack.org/#/c/578440
2018-06-27 11:42:46 -04:00
Kristine Bujold
09a6a36c36 Mtce: Create NTP alarm if only reachable NTP server is peer controller
The maintenance resource monitor is not asserting a 'no reachable
NTP servers' alarm while the only reachable server is the peer
controller.

Change-Id: I8df7bda01676bbaaeca2d97cb6893265107700aa
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-06-26 17:16:56 -04:00
Eric MacDonald
e0d9d60d28 Mtce: Improve AIO DOR handling
The AIO (All-In-One) inactive controller is failed by maintenance in a
DOR (Dead-Office-Recovery) situation: power off then power on of the
system.

This update scales the AIO DOR timeout to accommodate the extra time
needed for the compute function manifest to apply.

Change-Id: I3006060fe04285881f95d2084cada40ec1002d1c
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-06-26 16:59:42 -04:00
Eric MacDonald
85a30b56e6 Mtce: Improve efficiency of mtcAgent's end-of-batch message handling
The mtcAgent's message inbox batch handling was needlessly zeroing a
buffer that it would never use after it reached the last received
message.

This update refactors the mtc_service_inbox entry code so that the
message buffer zeroing operation only occurs on the unused part of
fully received messages. To facilitate this change, a new utility for
zeroing a bounded part of a mtce message buf was introduced to common
utils.
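
A bounded zeroing utility of the kind described can be sketched as follows. This is an illustration under assumptions: the name `zero_range`, the use of a Python `bytearray` in place of the mtce message buffer, and the clamping behaviour are all choices made for the example.

```python
def zero_range(buf: bytearray, start: int, end: int) -> int:
    """Zero buf[start:end] in place, clamped to the buffer; return bytes zeroed."""
    # Clamp both bounds so the call can never grow or overrun the buffer.
    start = max(0, min(start, len(buf)))
    end = max(start, min(end, len(buf)))
    buf[start:end] = bytes(end - start)
    return end - start
```

Calling it with the number of received bytes as `start` clears only the unused tail of a message buffer, rather than the whole buffer.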

Two additional enhancements were also made to the same procedure:
 - variable scoping change.
 - hostaddr and hostname lookup scoping change.

Change-Id: Ia2ef97dad611507b824927ed1652c8df8b54eee5
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-06-26 15:02:07 -04:00
Zuul
c90cddb408 Merge "Package and Enable Memcached on Controllers/AIO" 2018-06-25 13:02:00 +00:00
Zuul
bd36605a6d Merge "Rewrite virtio message handler for guest heartbeat" 2018-06-25 13:01:17 +00:00
Zuul
c41b9686de Merge "Fix guest heartbeat status and reporting state" 2018-06-25 13:01:17 +00:00
Zuul
ee15b88a90 Merge "Fix memory leak in guestServer" 2018-06-25 12:53:57 +00:00
Zuul
6a31c03d29 Merge "Add default test framework" 2018-06-25 12:46:31 +00:00
Jack Ding
723942e009 Package and Enable Memcached on Controllers/AIO
Filter out memcached service on compute and storage nodes.

Change-Id: Ie558ac297d4943038ecb194b38cd4825afc61f99
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-06-22 21:00:05 -04:00
Jack Ding
94cdbb73d4 Rewrite virtio message handler for guest heartbeat
Due to race conditions, multiple messages might be received from a
single read by guestServer. guestServer in this case would only handle
the first message and discard the remaining ones.

In this particular issue, guestServer received a heartbeat challenge
response message and a vote notification response (reject) message from
a single read, and the latter message was discarded.

This fix rewrites the message handler for the virtio serial channel to
handle segmented and multiple messages. It uses the newline character to
delimit messages, so it assumes any newline characters in client log
messages are removed.
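
Newline-delimited framing that copes with both cases (several messages in one read, and one message split across reads) can be sketched as below. This is a generic illustration and not guestServer's code; the class name `LineFramer` and the decision to skip empty lines are assumptions.

```python
class LineFramer:
    """Reassembles newline-delimited messages from arbitrary read chunks."""

    def __init__(self):
        self._pending = b""   # bytes of a message whose newline has not arrived

    def feed(self, data: bytes) -> list:
        """Add one read's worth of bytes; return every complete message."""
        self._pending += data
        # Everything before the last newline is complete; the remainder
        # (possibly empty) is kept for the next read.
        *complete, self._pending = self._pending.split(b"\n")
        return [msg for msg in complete if msg]
```

With this shape, a read carrying a heartbeat challenge response followed by a vote notification response yields two messages instead of one.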

Change-Id: Ic6f0509c98fcedf3631f4d210f753c32c37aa442
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-06-22 21:00:05 -04:00
Jack Ding
89e4e574e8 Fix guest heartbeat status and reporting state
Fixed a few day one bugs:
- Heartbeat status changes are not properly passed to VIM from
  guestAgent.
- Heartbeat reporting state is not properly updated in guestServer when
  guest heartbeat is enabled.

These bugs could cause heartbeat status/state mismatches among
VIM-guestAgent-guestServer and result in intermittent issues.

Change-Id: I2198760345821fa4af0437af252e3ec6a39978d8
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-06-22 20:59:56 -04:00
Jack Ding
6f183bd257 Fix memory leak in guestServer
pop_front() only deletes the internal copy of jobj_msg in the
message_list. The original object still needs to be released.

Change-Id: I869bdfac9de59d512a50d10a073682e5dcd9bbcf
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-06-22 20:55:31 -04:00
Scott Little
89e1fb4dfc Split centos-pkg-dirs along git boundaries.
Problem:
The centos-pkg-dirs files should only reference packages with
compilation instructions hosted in the same git.

Solution:
Create centos-pkg-dirs files in other stx-* gits, and relocate
the relevant entries from the stx-utils centos-pkg-dirs into
the appropriate destination git.

Change-Id: If7e08356d7791f0cfcbf3f0c95bc0e4d0b693329
Signed-off-by: Scott Little <scott.little@windriver.com>
2018-06-20 16:25:33 -04:00
Matt Peters
6eaea4fa0d Open vSwitch integration with host and configuration framework
Integrates the latest Open vSwitch with DPDK into the host management
and configuration framework and configures the default system
vswitch type to be ovs-dpdk.

Change-Id: I59f2346a3b7c0eec34613c6f394e407ba7dd2ddd
Signed-off-by: Matt Peters <matt.peters@windriver.com>
2018-06-12 12:12:09 -05:00
Dean Troyer
3d9425425a Add default test framework
Change-Id: I18bfbf72a120b6832fbc52e630cb3c96daa663b1
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2018-06-11 18:51:02 -05:00
Dean Troyer
18922761a6 StarlingX open source release updates
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2018-05-31 07:36:43 -07:00
Dean Troyer
7f0544bc4b Add .gitreview
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2018-05-31 07:36:43 -07:00