192 Commits

Author SHA1 Message Date
zhipengl
55f5327a69 move memcached changes from platform-utils
The related filter files need to change because the memcached package
is installed only on the controller node.
Basic multinode deployment test passes.

Story: 2004108
Task: 27517

Change-Id: Id21d3db9d172398f6c23f86e5dd5f9e5a249c6b8
Signed-off-by: zhipengl <zhipengs.liu@intel.com>
2018-10-31 01:43:29 +00:00
Zuul
f5de6cc2d6 Merge "get rid of duplicate LICENSE files in 3 packages" 2018-10-31 00:58:33 +00:00
Zuul
7fd3f05a97 Merge "releasenotes: Grammar edit." 2018-10-30 17:27:12 +00:00
Yong Hu
2402fb16ae get rid of duplicate LICENSE files in 3 packages
There are 2 duplicated LICENSE files in mtce-control, mtce-compute,
and mtce-storage. Additionally, LICENSE was not placed in the root
directory of the src RPM, so this patch is made as an enhancement or fix.
After this change, license file location and code structure in all 4
modules (mtce-common, mtce-compute, mtce-storage and mtce-control)
will be the same.

Test method: make a clean build and check the src RPM and binary RPM
to ensure there is only one LICENSE, in the correct place.

Story: 2004186
Task: 27676

Change-Id: Id71a7450e8b45438c5d15976ae8e853b9ba8f4f5
Signed-off-by: Yong Hu <yong.hu@intel.com>
2018-10-30 02:55:34 +00:00
Zuul
e472f51aed Merge "fix compilation warnings in c/cpp files" 2018-10-30 02:37:06 +00:00
Bart Wensley
dcfee58e7b Lock down flake8 version
The new flake8 version 3.6.0 introduces new warnings that cause
the check and gate jobs to fail. Lock down the flake8 version
to avoid these surprises in the future. We can later increment
the flake8 version and fix the new warnings in a controlled
manner.

Change-Id: I0cec95de9fe9d58536752038b94939962919d166
Partial-Bug: 1799721
Signed-off-by: Bart Wensley <barton.wensley@windriver.com>
2018-10-24 10:08:38 -05:00
Zuul
ebb9dae417 Merge "Fix resource leak issues, file not close case" 2018-10-23 18:45:01 +00:00
Yong Hu
40404e86de fix compilation warnings in c/cpp files
These warnings may be treated as errors when the build system uses a compiler
such as gcc/g++ 8.2.1 with the "-O2 -Wall -Wextra -Werror" options.

Story: 2004134
Task: 27591

Change-Id: I576a8c0305a4c32772fbc750ef39c73334b19336
Signed-off-by: Yong Hu <yong.hu@intel.com>
2018-10-23 07:38:33 +00:00
Zuul
fd1f04bbdb Merge "[Doc] openstackdocstheme starlingxdocs theme" 2018-10-23 00:44:23 +00:00
Zuul
0fb6d1a135 Merge "Fix host check wrong in virt-support-goenabled.sh" 2018-10-22 19:30:07 +00:00
Martin Chen
50b77e7d62 Fix host check wrong in virt-support-goenabled.sh
Closes-Bug: 1798773

Change-Id: Ie62b7dfdf835ef60f9e7ba81f60dbcc517417a78
Signed-off-by: Martin Chen <haochuan.z.chen@intel.com>
2018-10-23 05:14:07 +08:00
Zuul
c45eca2c2c Merge "Fix resource leak issue, memory not free" 2018-10-22 14:39:56 +00:00
Abraham Arce
a526093ba8 [Doc] openstackdocstheme starlingxdocs theme
Enable starlingxdocs theme support for:

- Documentation
- Release Notes
- API Reference

Change-Id: I5a53eecaf97e835e85ea4d5a7fae2bc3ca3d45a8
Signed-off-by: Abraham Arce <abraham.arce.moreno@intel.com>
2018-10-22 14:37:08 +00:00
Zuul
196eb85eba Merge "remove cgts- prefix to align with other sub-projects (packages)" 2018-10-22 01:25:56 +00:00
Martin Chen
c89f0ffa20 Fix resource leak issues, file not close case
Partial-Bug: 1794903

Change-Id: Id6de282c27374d578a0a41869ec6a934e6675db4
Signed-off-by: Martin Chen <haochuan.z.chen@intel.com>
2018-10-20 02:08:42 +08:00
Yong Hu
718efbcf0d remove cgts- prefix to align with other sub-projects (packages)
Rename files and folders in mtce-compute, mtce-control, and
mtce-storage. Also update the package names in the bsp-files/
filter_out_* scripts accordingly.

Story: 2004079
Task: 27485

Change-Id: Ic1e9bd4bb8d72f30ddcc2a2bfc602a1a34e583da
Signed-off-by: Yong Hu <yong.hu@intel.com>
2018-10-19 06:07:31 +00:00
haochuan
18624ceeb1 Fix resource leak issue, memory not free
Partial-Bug: 1794903

Change-Id: I5d12f9bec2f089674fc601e4cd297f72daefa6f8
Signed-off-by: Martin Chen <haochuan.z.chen@intel.com>
2018-10-19 04:37:54 +08:00
Zuul
0362090b73 Merge "Mtce: Add heartbeat cluster information for SM query" 2018-10-16 18:51:26 +00:00
Zuul
918e9b8bc6 Merge "fix uninitialized scalar variable issue" 2018-10-16 15:41:48 +00:00
Martin Chen
19cf6880ec fix uninitialized scalar variable issue
Closes-Bug: 1794910

Change-Id: Iaf1452c39673f963e1eb5dc8f68f8add26a4aa30
Signed-off-by: Martin Chen <haochuan.z.chen@intel.com>
2018-10-17 12:29:02 +08:00
Dean Troyer
5558276784 Add api-ref and relnotes publish jobs
* Add pointers in the main doc to api-ref and releasenotes pages
* Add publish-stx-api-ref and publish-stx-releasenotes jobs
* Add search at bottom of api-ref and relnotes pages to trigger the jobs

Change-Id: Ib41f10ce72eb283d4edbeb1ecc0543403295d7bf
Signed-off-by: Abraham Arce <abraham.arce.moreno@intel.com>
2018-10-11 08:21:53 -05:00
Scott Rifenbark
aef1a3b2ba releasenotes: Grammar edit.
Change-Id: I2d79be365c48a85180810d9a4b44d9f103e34fb0
Signed-off-by: Scott Rifenbark <srifenbark@gmail.com>
2018-10-10 21:50:46 +00:00
Zuul
cddb485837 Merge "Add publish job for docs" 2018-10-10 13:21:00 +00:00
Zuul
2ebe8c2e8e Merge "[Doc] SPDX License Identifier" 2018-10-09 15:17:34 +00:00
Dean Troyer
6e8731e572 Add publish job for docs
Also set the theme to alabaster until starlingxdocs is ready

Change-Id: I6a113b9fddb64792b5454b3ef0ef866ef9f74fc6
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2018-10-06 14:23:21 -05:00
Eric MacDonald
8a223f395d Mtce: Add heartbeat cluster information for SM query
This is part one of a two-part HA Improvements feature that introduces
the collection of heartbeat health at the system level.

The full feature is intended to provide service management (SM)
with the last 2 seconds of maintenance's heartbeat health view that
is reflective of each controller's connectivity to each host
including its peer controller.

The heartbeat cluster summary information is additional information
for SM to draw on when needing to make a choice of which controller
is healthier, if/when to switch over and to ultimately avoid split
brain scenarios in a two controller system.

Feature Behavior: A common heartbeat cluster data structure is
introduced and published to the sysroot for SM. The heartbeat
service populates and maintains a local copy of this structure
with data that reflects the responsiveness for each monitored
network of all the monitored hosts for the last 20 heartbeat
periods. Mtce sends the current cluster summary to SM upon request.

General flow of cluster feature wrt hbsAgent:

  hbs_cluster_init: general data init
  hbs_cluster_nums: set controller and network numbers
  forever:

    select:
      hbs_cluster_add / hbs_cluster_del: - add/del hosts from mtcAgent
      hbs_sm_handler -> hbs_cluster_send: - send cluster to SM

    heartbeating:
      hbs_cluster_append: add controller cluster to pulse request
      hbs_cluster_update: get controller cluster data from pulse responses
      hbs_cluster_save: save other controller cluster view in cluster vault
      hbs_cluster_log: log cluster state changes (clog)
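
As a rough illustration of the data described above, the per-network, per-host history of the last 20 heartbeat periods could be modelled as a fixed-length ring buffer. This is a Python sketch only, not the hbsAgent code (which is C++); the class and method names are hypothetical, and only the 20-period window comes from the commit message.

```python
from collections import defaultdict, deque

HISTORY_PERIODS = 20  # from the commit message: last 20 heartbeat periods


class ClusterHistory:
    """Hypothetical per-network, per-host heartbeat responsiveness history."""

    def __init__(self, periods=HISTORY_PERIODS):
        self.periods = periods
        # (network, host) -> deque of booleans, newest entry last
        self._history = defaultdict(lambda: deque(maxlen=self.periods))

    def add_host(self, network, host):
        # Touching the key creates an empty history for the host.
        self._history[(network, host)]

    def del_host(self, network, host):
        self._history.pop((network, host), None)

    def record(self, network, host, responded):
        # Entries older than the window fall off the left end automatically.
        self._history[(network, host)].append(bool(responded))

    def summary(self):
        """Return {(network, host): (responses, periods_recorded)}."""
        return {key: (sum(h), len(h)) for key, h in self._history.items()}
```

A bounded deque keeps the memory per host constant, which matters for a structure that is updated every heartbeat period and queried repeatedly.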

Test Plan:

  PASS: Verify compute system install
  PASS: Verify storage system install
  PASS: Verify cluster data ; all members of structure
  PASS: Verify storage-0 state management
  PASS: Verify add of second controller
  PASS: Verify add of storage-0 node
  PASS: Verify behavior over Swact
  PASS: Verify lock/unlock of second controller ; overall behavior
  PASS: Verify lock/unlock of storage-0 ; overall behavior
  PASS: Verify lock/unlock of storage-1 ; overall behavior
  PASS: Verify lock/unlock of compute nodes ; overall behavior
  PASS: Verify heartbeat failure and recovery of compute node
  PASS: Verify heartbeat failure and recovery of storage-0
  PASS: Verify heartbeat failure and recovery of controller
  PASS: Verify delete of controller node
  PASS: Verify delete of storage-0
  PASS: Verify delete of compute node
  PASS: Verify cluster when controller-1 active / controller-0 disabled
  PASS: Verify MNFA and recovery handling
  PASS: Verify handling in presence of multiple failure conditions
  PASS: Verify hbsAgent memory leak soak test with continuous SM query.
  PASS: Verify active controller-1 infra network failure behavior.
  PASS: Verify inactive controller-1 infra network failure behavior.

Change-Id: I4154287f6dcf5249be5ab3180f2752ab47c5da3c
Story: 2003576
Task: 24907
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-10-05 22:47:17 +00:00
Robert Church
ded5907f5b Update minimal PV size to include k8s LVs
Make sure the minimally sized physical volume for the cgts-vg volume
group is sized correctly when the --kubernetes option is used with
config_controller. Include the kubernetes specific logical volumes that
will be present in a kubernetes configuration.

Change-Id: Ifc6d2c0a5dbb880f8e1ff73b01bc05cb8ab22855
Partial-Bug: #1794567
Signed-off-by: Robert Church <robert.church@windriver.com>
2018-10-03 14:11:21 -04:00
Zuul
0d5aeffc35 Merge "Mtce: Increase swact receive retry delay" 2018-10-02 21:59:31 +00:00
Eric MacDonald
66ba248389 Mtce: Increase swact receive retry delay
Maintenance is seen to intermittently fail Swact requests when
it fails to get a response from SM 500 msecs after having issued
the request successfully.

A recent instrumentation update went in which verified that the
http request was being launched properly even in the failure cases.

It seems the 500 msec timeout might not be long enough to account
for SM's scheduling/handling.

This update increases the receive retry delay from 50 msec to 1 second.
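
The effect of a longer receive retry delay can be sketched as a simple poll-and-wait loop. This is an illustrative Python sketch, not the maintenance code; the function name, the `poll` callback and the retry count are assumptions, and only the 1 second delay comes from the commit message.

```python
import time

RECEIVE_RETRY_DELAY_SECS = 1.0  # new value from the commit message (was 50 msec)


def receive_with_retry(poll, max_retries,
                       delay=RECEIVE_RETRY_DELAY_SECS, sleep=time.sleep):
    """Call poll() until it returns a response or the retries run out.

    poll is a hypothetical callable that returns None while no response
    has arrived yet. Returns the response, or None on timeout.
    """
    for attempt in range(max_retries + 1):
        response = poll()
        if response is not None:
            return response
        if attempt < max_retries:
            sleep(delay)  # give the peer time to schedule and reply
    return None
```

With the same number of retries, a longer delay widens the total window in which a late response is still accepted.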

Change-Id: I29d6ba03094843a2af9d8720dd074572d76a31a4
Related-Bug: https://bugs.launchpad.net/starlingx/+bug/1791381
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-10-02 19:04:19 +00:00
Kristine Bujold
3f337d5edb Fix bug with PXE Boot Server
The UEFI grub config file used for PXE Boot Server contains a mix
of menuentry and submenu entries. A submenu opens a new context and global
variables cannot be used within it (see
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1175127). This
was causing pxe_root to be empty for submenu selections. Moving
'1) UEFI Boot from hard drive' into a submenu fixes this problem and
also removes the need to save the root variable. This was previously
required when a user would select boot from hard drive and the hard
drive did not exist, root would no longer be set to the pxeboot
server and selecting the other menu entries would fail.

Remove the '1)' prefix in the 'Boot from hard drive' menu option in
both Legacy and UEFI boot as this is not used by other menus.

Closes-Bug: 1794863

Change-Id: I5bc62039bfb68477e9cd0166ce17877693037640
Signed-off-by: Kristine Bujold <kristine.bujold@windriver.com>
2018-10-01 14:20:41 -04:00
Scott Rifenbark
90252d2421 stx-metal: API ref doc content added.
I added the api-ref-sysinv-v1-metal.rst file, which holds content
for the API reference manual, to the api-ref/source dir. It is the
conversion of the old-style files to the newer OpenStack-supported
RST format.

I updated the index.rst file to include the new .rst file so that
the api-ref document can build.

I updated the sample output by querying the Platform API at
http://10.10.10.2:6385/v1.

Change-Id: I943dd136b2b083743e053db5070d4da7fbe78685
Signed-off-by: Scott Rifenbark <srifenbark@gmail.com>
2018-09-28 17:33:34 +00:00
Dean Troyer
56b739594d Add api-ref job
Change-Id: I8b41c1ecc9ee22aacc55e8413e539ff7b7879a30
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2018-09-28 11:00:48 -05:00
Abraham Arce
2a8193f812 [Doc] stx.2018.10 Release Summary
Create stx.2018.10 release summary note.

Change-Id: I1ad08a3f770603d2e51fed89c8aa2ad412fb5a36
Signed-off-by: Abraham Arce <abraham.arce.moreno@intel.com>
2018.10.rc1
2018-09-27 11:45:18 -05:00
Abraham Arce
d948fdc228 [Doc] SPDX License Identifier
- Change to SPDX-License-Identifier: Apache-2.0

Change-Id: Ie98277f4b8fca7084ff8d3f9622ced048a5d5f9c
Signed-off-by: Abraham Arce <abraham.arce.moreno@intel.com>
2018-09-25 09:39:07 -05:00
Stefan Dinescu
bea3956ee9 Change backup partition size for small disks
Minimum backup partition size is calculated as the sum between the
sizes of the glance and database partitions, as well as a 20GB
overhead.

This fix increases the default backup partition size to 40GB to
be in line with the above calculations, considering the database
and the glance partitions are 10GB each by default.
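
The calculation above can be written out as a short Python sketch. The function name is hypothetical; the 20GB overhead and the 10GB defaults are the figures given in the commit message.

```python
BACKUP_OVERHEAD_GB = 20  # overhead stated in the commit message


def min_backup_partition_gb(glance_gb=10, database_gb=10):
    """Minimum backup partition size: glance + database + fixed overhead."""
    return glance_gb + database_gb + BACKUP_OVERHEAD_GB
```

With the default partition sizes this gives 10 + 10 + 20 = 40GB, matching the new default.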

This also increases the minimum disk requirements from 120GB
to 130GB.

Change-Id: I5cfc329870a84a6245d868b4c4990829e702e886
Closes-bug: 1793543
Signed-off-by: Stefan Dinescu <stefan.dinescu@windriver.com>
2018-09-24 15:34:37 +03:00
Jim Gauld
6a5e10492c Decouple Guest-server/agent from stx-metal
This decouples the build and packaging of guest-server, guest-agent from
mtce, by splitting guest component into stx-nfv repo.

This leaves existing C++ code, scripts, and resource files untouched,
so there is no functional change. Code refactoring is beyond the scope
of this update.

Makefiles were modified to include devel headers directories
/usr/include/mtce-common and /usr/include/mtce-daemon.
This ensures there is no contamination with other system headers.

The cgts-mtce-common package is renamed and split into:
- repo stx-metal: mtce-common, mtce-common-dev
- repo stx-metal: mtce
- repo stx-nfv: mtce-guest
- repo stx-ha: updates package dependencies to mtce-pmon for
  service-mgmt, sm, and sm-api

mtce-common:
- contains common and daemon shared source utility code

mtce-common-dev:
- based on mtce-common, contains devel package required to build
  mtce-guest and mtce
- contains common library archives and headers

mtce:
- contains components: alarm, fsmon, fsync, heartbeat, hostw, hwmon,
  maintenance, mtclog, pmon, public, rmon

mtce-guest:
- contains guest component guest-server, guest-agent

Story: 2002829
Task: 22748

Change-Id: I9c7a9b846fd69fd566b31aa3f12a043c08f19f1f
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
2018-09-18 17:15:08 -04:00
Teresa Ho
eb7559f335 Support mgmt and infra network on an interface
Currently, the management interface can be shared with infrastructure only
over a VLAN. This update supports the management and infrastructure networks
sharing a single interface.

Story: 2003087
Task: 23171
Depends-On: https://review.openstack.org/#/c/601156

Change-Id: Ie97dbd1260f5c98d7401b0e48361ebd87f060f65
Signed-off-by: Teresa Ho <teresa.ho@windriver.com>
2018-09-17 12:47:04 -04:00
Dean Troyer
a2dc830d33 Add some jobs for docs and releasenotes
* Use build-openstack-docs-pti job template for docs
* Use build-openstack-releasenotes job for release notes
  (We can't use the OpenStack releasenotes template as it includes
  publish jobs, stx needs its own)
* Add newnote tox environment as convenience for creating new release
  notes, re-using the releasenotes venv.
* Create a release summary note.

Change-Id: I5a610cfe271707fd704248ede0db75be6d031121
Signed-off-by: Dean Troyer <dtroyer@gmail.com>
2018-09-13 20:59:12 -05:00
Zuul
6e094435f0 Merge "[Doc] OpenStack API Reference Guide" 2018-09-13 14:23:17 +00:00
Zuul
2cbcd3e282 Merge "[Doc] Release Notes Management" 2018-09-13 14:17:53 +00:00
Zuul
d80884b5da Merge "[Doc] Building docs following Docs Contrib Guide" 2018-09-13 14:17:50 +00:00
Stefan Dinescu
103cccd786 Simplify disk space allocation
Allocation of space on the root disk was done in kickstart, combined
with config_controller. Various checks for disk and volume group
space made it difficult to add new filesystem partitions.

To simplify this, all checks are now merged under disk size checks.

Kickstart files now create rootfs and log partitions of the same
size, no matter the disk size, with the only variable size being
the cgts-vg size.

Thus, any future changes need to consider only this size to
ensure that enough space is present.

The limit between small and big disks is 240GB, with a minimum
disk size of 120GB.
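
The two thresholds above amount to a single size check, sketched here in Python. The function name is hypothetical, and treating a disk of exactly 240GB as "big" is an assumption; the commit message only gives the two limits.

```python
MIN_DISK_GB = 120  # minimum disk size from the commit message
BIG_DISK_GB = 240  # limit between small and big disks


def classify_root_disk(size_gb):
    """Return 'small' or 'big'; reject disks below the minimum size."""
    if size_gb < MIN_DISK_GB:
        raise ValueError("disk too small: %dGB < %dGB" % (size_gb, MIN_DISK_GB))
    # Assumption: the 240GB boundary itself counts as a big disk.
    return "big" if size_gb >= BIG_DISK_GB else "small"
```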

Depends-On: https://review.openstack.org/#/c/600743/
Change-Id: I37ecc8eb5468811d1ca3a71f8e2a0629525e8fad
Closes-bug: 1791170
Signed-off-by: Stefan Dinescu <stefan.dinescu@windriver.com>
2018-09-12 10:12:56 +03:00
Zuul
0e9b4a9b2f Merge "Mtce: Improve non-blocking http request dispatch" 2018-09-11 14:25:35 +00:00
Zuul
31c4beff75 Merge "Mtce: Make Heartbeat Failure Action Configurable" 2018-09-11 13:50:31 +00:00
Zuul
5ea0426b13 Merge "pep8 job enable and fix pep8 reported issue" 2018-09-11 00:57:05 +00:00
Eric MacDonald
316032b904 Mtce: Improve non-blocking http request dispatch
Maintenance is seen to intermittently fail Swact requests early
after initial system provisioning, without logging an error
reason, only to always succeed later on.

The issue is difficult to reproduce so this update adds extra
logging to this code path and implements a speculative fix.

The event_base_loop calls' non-zero return code is never being
logged. The libevent documentation states that this API will
return 1 while the target has not yet provided any data.

Theory is, because the call is local, that normally it returns
with data even on the first dispatch case. However, during early
system configuration, when the system is busy, that first dispatch
does not complete immediately like it normally does later on.

Speculation is, instead it returns a 1 stating retry but the
existing code path treats that as a failure.

This update modifies the code to return a PASS if the command
dispatch returns a 1 while the error case of -1 gets enhanced
logging and continues to be treated as a failure.
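
The return-code handling described above can be sketched as follows. This is Python for illustration only; the real code is C++ around libevent's event_base_loop, and the PASS/FAIL values, the function name and the treatment of a 0 return code are assumptions.

```python
PASS = 0  # hypothetical status values, not the real mtce constants
FAIL = 1


def dispatch_status(rc, log=print):
    """Map a dispatch return code to PASS or FAIL, logging non-zero codes."""
    if rc == 0:
        return PASS  # assumed: dispatch completed with data
    if rc == 1:
        # Previously treated as a failure; now a retry is expected later.
        log("dispatch returned 1; response pending")
        return PASS
    log("dispatch failed with rc=%d" % rc)
    return FAIL
```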

Test Plan:
PASS: Swact 5 times
PASS: Lock/Unlock Host
PASS: Large System DOR

Related-Bug: https://bugs.launchpad.net/starlingx/+bug/1791381
Change-Id: I19b22e07d3224b2e9dd3f3569ecbe9aed7d9402f
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-09-10 19:02:42 +00:00
Eric MacDonald
74c5f89ab4 Mtce: Make Heartbeat Failure Action Configurable
The current maintenance heartbeat failure action handling is to Fail
and Gracefully Recover the host. This means that maintenance will
ensure that a heartbeat failed host is rebooted/reset before it is
recovered but will avoid rebooting it a second time if its recovered
uptime indicates that it has already rebooted.

This update expands that single action handling behavior to support
three new actions. In doing so it adds a new configuration service
parameter called heartbeat_failure_action. The customer can configure
this new parameter with any one of the following 4 actions in order of
decreasing impact.

   fail - Host is failed and gracefully recovered.
        - Current Network specific alarms continue to be raised/cleared.
          Note: Prior to this update this was standard system behavior.
degrade - Host is only degraded while it is failing heartbeat.
        - Current Network specific alarms continue to be raised/cleared.
        - heartbeat degrade reason is cleared as are the alarms when
          heartbeat responses resume.
  alarm - The only indication of a heartbeat failure is by alarm.
        - Same set of alarms as in above action cases
        - Only in this case no degrade, no failure, no reboot/reset
   none - Heartbeat is disabled ; no multicast heartbeat message is sent.
        - All existing heartbeat alarms are cleared.
        - The heartbeat soak as part of the enable sequence is bypassed.

The selected action is a system wide setting.
The selected setting also applies to Multi-Node Failure Avoidance.
The default action is the legacy action Fail.
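
The four actions and the default can be summarised in a small Python sketch. The enum, the function names and the effect labels are hypothetical and simplify the behavior listed above; only the four action names and the default of fail come from the commit message.

```python
from enum import Enum


class HeartbeatFailureAction(Enum):
    FAIL = "fail"
    DEGRADE = "degrade"
    ALARM = "alarm"
    NONE = "none"


DEFAULT_ACTION = HeartbeatFailureAction.FAIL  # legacy behavior


def parse_action(value):
    """Convert a service parameter string to an action; None means default."""
    if value is None:
        return DEFAULT_ACTION
    return HeartbeatFailureAction(value)  # raises ValueError if unknown


def effects_on_heartbeat_loss(action):
    """Return the simplified set of effects a heartbeat loss has on a host."""
    effects = {
        HeartbeatFailureAction.FAIL: {"alarm", "fail"},
        HeartbeatFailureAction.DEGRADE: {"alarm", "degrade"},
        HeartbeatFailureAction.ALARM: {"alarm"},
        HeartbeatFailureAction.NONE: set(),
    }
    return effects[action]
```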

This update also

 1. Removes redundant inservice failure alarm for MNFA case in support
    of degrade only action. Keeping it would make that alarm handling
    case unnecessarily complicated.
 2. No longer used 'hbs calibration' code is removed (cleanup).
 3. Small amount of heartbeat logging cleanup.

Test Plan:
PASS:    fail: Verify MNFA and recovery
PASS:    fail: Verify Single Host heartbeat failure and recovery
PASS:    fail: Verify Single Host heartbeat failure and recovery (from none)
PASS: degrade: Verify MNFA and recovery
PASS: degrade: Verify Single Host heartbeat failure and recovery
PASS: degrade: Verify Single Host heartbeat failure and recovery (from alarm)
PASS:   alarm: Verify MNFA and recovery
PASS:   alarm: Verify Single Host heartbeat failure and recovery
PASS:   alarm: Verify Single Host heartbeat failure and recovery (from degrade)
PASS:    none: Verify heartbeat disable, fail ignore and no recovery
PASS:    none: Verify Single Host heartbeat ignore and no recovery
PASS:    none: Verify Single Host heartbeat ignore and no recovery (from fail)
PASS: Verify action change behavior from none to alarm with active MNFA
PASS: Verify action change behavior from alarm to degrade with active MNFA
PASS: Verify action change behavior from degrade to none with active MNFA
PASS: Verify action change behavior from none to fail with active MNFA
PASS: Verify action change behavior from fail to none with active MNFA
PASS: Verify action change behavior from degrade to fail then MNFA timeout
PASS: Verify all heartbeat action change customer logs
PASS: Verify heartbeat stats clear over action change
PASS: Verify LO DOR (several large labs - compute and storage systems)
PASS: Verify recovery from failure of active controller
PASS: Verify 3 host failure behavior with MNFA threshold at 3 (action:fail)
PASS: Verify 2 host failure behavior with MNFA threshold at 3 (action:fail)

Depends-On: https://review.openstack.org/601264
Change-Id: Iede5cdbb1c923898fd71b3a95d5289182f4287b4
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2018-09-10 13:03:30 -04:00
Jack Ding
c9a4b9c1b5 Exclude platform-util-controller from non-controller
Filter out platform-util-controller from compute and storage nodes.

Story: 2002826
Task: 26228
Depends-On: https://review.openstack.org/600868

Change-Id: I3d8fc737c9a59caef7658d4139b302cee3841592
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-09-07 17:41:48 -04:00
Sun Austin
90ce692186 pep8 job enable and fix pep8 reported issue
Use flake8 as the pep8 tool.
Enable check and gate jobs for pep8 (voting).
Fix the flake8 issues below:
    E127 continuation line over-indented for visual indent
    E211 whitespace before '('
    E222 multiple spaces after operator
    E302 expected 2 blank lines, found 1
    E501 line too long (101 > 79 characters)
    E502 the backslash is redundant between brackets
    F401 'platform' imported but unused
    W391 blank line at end of file
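
For readers unfamiliar with these codes, the following Python sketch shows a few of them after fixing, with the offending form noted in comments. The functions are hypothetical examples and are not taken from the repository.

```python
# F401: an unused "import platform" line was simply deleted.


# E302: two blank lines now precede each top-level definition.
def total(values):
    # E222: was "result =  0" (two spaces after the operator).
    result = 0
    for value in values:
        result += value
    return result


def describe(values):
    # E211: was "total (values)" (whitespace before the parenthesis).
    # E502: a trailing backslash inside the brackets was dropped.
    return "sum=%d count=%d" % (total(values),
                                len(values))
```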

Change-Id: Idfb953e52c8ee35c2adefdf0e4143a381c7f49e2
Story: 2003426
Task: 24596
Signed-off-by: Sun Austin <austin.sun@intel.com>
2018-09-06 09:45:51 +08:00
Abraham Arce
0162fa2526 [Doc] OpenStack API Reference Guide
Baseline changes to comply with OpenStack API documentation
from OpenStack Documentation Contributor Guide [0]:

- [1] How to document your OpenStack API service

[0] https://docs.openstack.org/doc-contrib-guide
[1] https://docs.openstack.org/doc-contrib-guide/api-guides.html

Story: 2002712
Task: 24451

Change-Id: I91f7640583081c83d3d43786e979ccfb5bee0490
Signed-off-by: Abraham Arce <abraham.arce.moreno@intel.com>
2018-09-05 19:59:26 -05:00