250 Commits

Author SHA1 Message Date
marvin
0d5e7e5409 Removing unused flag disable_worker_services
The disable_worker_services file was originally created
to prevent the (bare metal) nova-compute services from
running on a newly upgraded controller in an AIO-DX
configuration. This situation no longer exists because
the bare metal nova-compute services do not exist after
transitioning to containers, so this flag is no longer needed.
Removing all references to the disable_worker_services file.

Change-Id: Ic9555a36890f613f440e97f9090b22ff5ec8fd82
Partial-Bug: #1838432
Signed-off-by: marvin <weifei.yu@intel.com>
2019-11-02 08:34:56 +08:00
Zuul
df4d161b04 Merge "Build layering, add layer build config file" 2019-10-30 12:19:09 +00:00
Zuul
f69bba0b34 Merge "openSUSE: Runtime Dependencies" 2019-10-23 03:39:15 +00:00
Zuul
bb2cafa72c Merge "openSUSE: Open Build Service Artifacts" 2019-10-23 03:35:40 +00:00
Scott Little
fa609c5ee7 Build layering, add layer build config file
Story: 2006166
Task: 37121

Change-Id: I57708587d763b2f87b78eec1878b17a68e2a36c8
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-10-21 10:53:26 +08:00
Zuul
ba726d9f3e Merge "Change shebang to help rpm runtime dependency detector." 2019-10-18 13:36:17 +00:00
Al Bailey
6260cb0b74 Turn off devstack as a zuul job
devstack is failing, most likely because StarlingX
uses postgres, and postgres was dropped in devstack by:
cf1c847191

I am not removing the devstack job declaration, or the devstack files
because in the future StarlingX could convert from postgres to
another DB backend, at which point we might want to revisit
using devstack.

Change-Id: I3adec4669d9181d71421f43905f86bf2e7e211c2
Partial-Bug: 1848557
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-10-17 16:08:14 -05:00
Zuul
d2e00bf56b Merge "Remove not needed shebangs in sm_client" 2019-10-16 14:11:28 +00:00
Abraham Arce
03b3f990ea openSUSE: Runtime Dependencies
Resolve runtime dependencies for the following service manager
components:

- sm-client
- sm-tools
- sm-api

High availability OBS workspace has been moved to xe1gyq home
project [0], adding repository
Cloud_StarlingX_2.0_openSUSE_Leap_15.1 [1] in order to:

- allow all successful packages to appear under the xe1gyq
  repository [2].
- automatically include other flock dependencies
  (e.g. mtce-devel).

Refer to the following OBS workspaces to verify all service
management packages have built successfully under repository
Cloud_StarlingX_2.0_openSUSE_Leap_15.1:

- https://build.opensuse.org/package/show/home:xe1gyq/sm-db
- https://build.opensuse.org/package/show/home:xe1gyq/sm-common
- https://build.opensuse.org/package/show/home:xe1gyq/sm
- https://build.opensuse.org/package/show/home:xe1gyq/sm-client
- https://build.opensuse.org/package/show/home:xe1gyq/sm-tools
- https://build.opensuse.org/package/show/home:xe1gyq/sm-api

[0] https://build.opensuse.org/project/show/home:xe1gyq
[1] https://build.opensuse.org/repositories/home:xe1gyq
[2] https://download.opensuse.org/repositories/home:/xe1gyq/

Depends-On: https://review.opendev.org/#/c/679686

Story: 2006684
Task: 36968
Task: 36969
Task: 36970

Change-Id: I0a21652fff83b5da8acdfb0191df87165b88389e
Signed-off-by: Abraham Arce <abraham.arce.moreno@intel.com>
2019-10-09 10:05:54 -05:00
Abraham Arce
f38de3f45f openSUSE: Open Build Service Artifacts
OBS is a generic system to build and distribute binary packages
from sources [0]. StarlingX OBS Project:

- Cloud:StarlingX:2.0 [1]

Build Service Management uses Open Build Service (OBS) with the
following base artifacts under Service Management repository:

- Specfiles
- Changelogs
- Rpmlintrcs

The following components are included and building successfully
(with their source OBS repository):

- sm        [2]
- sm-common [3]
- sm-db     [4]
- sm-api    [5]
- sm-client [6]
- sm-tools  [7]

The following considerations are taken for Gerrit files:

- Added %changelog directive to all specfiles

The following considerations are taken for OBS _service files:

- Added parameter "extract" to get spec, changes and rpmlintrc files.
- All component versions standardized to 1.0.0

[0] openbuildservice.org
[1] https://build.opensuse.org/project/show/Cloud:StarlingX:2.0
[2] https://build.opensuse.org/package/show/home:xe1gyq:branches:Cloud:StarlingX:2.0/sm
[3] https://build.opensuse.org/package/show/home:xe1gyq:branches:Cloud:StarlingX:2.0/sm-common
[4] https://build.opensuse.org/package/show/home:xe1gyq:branches:Cloud:StarlingX:2.0/sm-db
[5] https://build.opensuse.org/package/show/home:xe1gyq:branches:Cloud:StarlingX:2.0/sm-api
[6] https://build.opensuse.org/package/show/home:xe1gyq:branches:Cloud:StarlingX:2.0/sm-client
[7] https://build.opensuse.org/package/show/home:xe1gyq:branches:Cloud:StarlingX:2.0/sm-tools

Story: 2006508
Task: 36495
Task: 36496
Task: 36497
Task: 36498
Task: 36534
Task: 36794

Change-Id: I06a7e132de4892b846d99977ff1bfc5bf240ade4
Co-authored-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
Signed-off-by: Abraham Arce <abraham.arce.moreno@intel.com>
2019-10-09 10:05:20 -05:00
Bin Qian
fc0828238f Bug1845393 remove interface recovering state
In the case of a switch recycle, the connected nic will go down and up
but the communication will restore after the switch is up and running.
This could take a few seconds (much longer than anticipated).

This holds off the i/f state update to the peer.

Also remove the batching interface failover state change. This is already
handled in the failover fsm fail_pending state.

Change-Id: Ia810927dbbc4b3821f7915e6a42bceeac43d9e46
Closes-Bug: 1845393
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-10-07 09:04:08 -04:00
Erich Cordoba
cc401099d7 Change shebang to help rpm runtime dependency detector.
Not having this change causes a linter error in opensuse.

Change-Id: I52830fa64bdb5f1b5bb00c4052f3c047be728bb3
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
2019-10-04 10:25:58 -05:00
Erich Cordoba
c8735e882a Remove version from sm folder
The sm component had the 1.0.0 version in the folder name, this
change removes that version and updates the centos_pkg_dirs.

Story: 2006623
Task: 36827

Depends-On: https://review.opendev.org/#/c/685128/
Change-Id: I6725d1f961c2a82275da5fabbff8e89a8dd6f245
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
2019-09-26 14:11:31 -05:00
Erich Cordoba
54a16057ff Remove version from sm-db folder
The sm-db component had the 1.0.0 version in the folder name, this
change removes that version and updates the centos_pkg_dirs.

Story: 2006623
Task: 36829

Depends-On: https://review.opendev.org/#/c/685127
Change-Id: Ia6025337529f4f48a89c175bb524548d81bc993f
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
2019-09-26 14:08:15 -05:00
Erich Cordoba
44f220a3b8 Remove version from sm-common folder
The sm-common component had the 1.0.0 version in the folder name, this
change removes that version and updates the centos_pkg_dirs.

Story: 2006623
Task: 36828

Change-Id: I0e998a3e2482bc06f3a91f9494a3e5d21faa28e7
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
2019-09-26 12:00:43 -05:00
Zuul
1ec7911bfe Merge "Fix LSB headers in sm" 2019-09-20 16:43:47 +00:00
Erich Cordoba
326ffc96f4 Fix LSB headers in sm
The opensuse build system reported two linter issues regarding
the LSB scripts in sm. The issues are:

  - For `sm`: Has `Should-Start` but no `Should-Stop`.
  - For `sm-shutdown`: LSB header not found.

To fix these issues the `Should-Stop` line was added in `sm` and
the LSB header was added in the `sm.shutdown` script.

In `sm.shutdown` the `Default-Start` and `Default-Stop` were set
as the same as `sm`. `sm.shutdown` does nothing on the start stage
so this change won't affect any functionality.
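
The two linter findings above can be illustrated with a small checker.
This is only a sketch: the function name `lsb_issues` and the message
strings are hypothetical and do not reproduce the actual openSUSE
linter output. It assumes the conventional LSB comment block delimited
by `### BEGIN INIT INFO` and `### END INIT INFO`.

```python
import re


def lsb_issues(script_text):
    """Return a list of (hypothetical) issue strings for an init script."""
    # Locate the LSB comment block; without it nothing else can be checked.
    match = re.search(
        r"^### BEGIN INIT INFO\n(.*?)^### END INIT INFO",
        script_text,
        re.S | re.M,
    )
    if match is None:
        return ["LSB header not found"]

    # Collect the keywords present in the block, e.g. "Should-Start".
    keys = set(re.findall(r"^#\s*([A-Za-z-]+):", match.group(1), re.M))

    issues = []
    if "Should-Start" in keys and "Should-Stop" not in keys:
        issues.append("Should-Start without Should-Stop")
    return issues
```

A script with no header yields the first message; a header that has
`Should-Start` but lacks `Should-Stop` yields the second.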

Story: 2006508
Task: 36648

Change-Id: I4fac67a0a1c1abd82e47a3293aeae3036ee9722b
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
2019-09-19 18:17:20 -05:00
Zuul
f1bc749aa4 Merge "Ensure in AIO-SX, i/f down does not block node going enabled" 2019-09-19 18:01:55 +00:00
Bin Qian
2966c89c1c Ensure in AIO-SX, i/f down does not block node going enabled
AIO-SX by design does not have a peer, so it never needs to
communicate with a potential peer before determining its role. On
AIO-SX, even if all network interfaces are down, the node should
still go enabled based on the situation of the node.

Closes-Bug: 1844427

Change-Id: Iafe0a8209cdbd3f83514c07041856cf6b6824f9c
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-09-19 13:59:00 +00:00
Erich Cordoba
59321f0438 Remove not needed shebangs in sm_client
The linters in the Opensuse build service are failing because sm_client has
unneeded python shebangs in the code. This is because a python source code
file that is not intended to be executed shouldn't include this shebang.

Also, the linter fails when `/usr/bin/env python` is used because it
causes the dependency discovery tool to fail. It is safe to use
`/usr/bin/python` as currently we don't provide any other python version.
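
The two rules described above (no shebang in modules that are not
executed, and a direct interpreter path instead of `env` in scripts that
are) can be sketched as a small text transformation. The helper name
`fix_shebang` and its `executable` parameter are hypothetical; this is
not the change that was actually applied to sm_client.

```python
ENV_SHEBANG = "#!/usr/bin/env python"
DIRECT_SHEBANG = "#!/usr/bin/python"


def fix_shebang(source, executable):
    """Return source with its python shebang removed or normalized."""
    lines = source.split("\n")
    # Leave files alone when the first line is not a python shebang.
    if not lines[0].startswith("#!") or "python" not in lines[0]:
        return source
    if not executable:
        # Library modules are imported, not run, so drop the shebang.
        return "\n".join(lines[1:])
    if lines[0].strip() == ENV_SHEBANG:
        # Executable scripts keep a shebang, but with a fixed path.
        lines[0] = DIRECT_SHEBANG
    return "\n".join(lines)
```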

Story: 2006508
Task: 36647

Change-Id: If3f83b9562414c3392515828a3c716a5bc23015d
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
2019-09-16 08:34:39 -05:00
Zuul
b5f7d066f3 Merge "Modify memory leak in some abnormal cases by adding free function." 2019-09-11 21:10:30 +00:00
Zuul
193e7c390a Merge "Fix format-truncation warnings in sm" 2019-09-11 21:07:59 +00:00
Erich Cordoba
c8691c93d8 Fix format-truncation warnings in sm
Building sm is not possible in opensuse because the code produces
format-truncation warnings and the opensuse build system
enforces the -Werror flag.

The solution is to define the proper string lengths.

  - SM_INTERFACE_NAME_MAX_CHAR was set to IFNAMSIZ.
  - SM_SERVICE_ACTION_PLUGIN_EXIT_CODE_MAX_CHAR was increased to 32.
  - SM_SERVICE_HEARTBEAT_ADDRESS_MAX_CHAR was decreased to 108.

These changes were updated in the database schema as well.

Story: 2006523
Task: 36551

Change-Id: Icce1d912c147fc6caaf06cc93de3cddadbcb0720
Signed-off-by: Erich Cordoba <erich.cordoba.malibran@intel.com>
2019-09-11 12:34:54 -05:00
Zuul
ebba79607c Merge "Extend timeout for the kubectl cmds in dbmon" 2019-09-10 13:07:35 +00:00
Zuul
986825c832 Merge "Fix IPv6 standby controller boot loop" 2019-09-10 01:49:48 +00:00
Bin Qian
7f52df37bd Fix IPv6 standby controller boot loop
IPv6 multicast should be sent to the interface that the socket
binds to.

Closes-Bug: 1842949
Change-Id: I14b6c5193c67a0ddd69e31d1044219c4e9fd6b94
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-09-09 13:15:59 +00:00
Bin Qian
cfd3686d8e Extend timeout for the kubectl cmds in dbmon
In AIO-DX, during a swact, dbmon sees kubectl commands respond more
slowly than expected. dbmon reports an error when a kubectl command
does not respond within 5 seconds, and that 5-second timeout is too
short.

Extend the timeout to 10 seconds to avoid reporting unnecessary errors.
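
As an illustration of bounding a slow command with a timeout, the
following sketch wraps a subprocess call. It is not the dbmon OCF script
itself; the helper name `run_with_timeout` is hypothetical, and only the
10-second default is taken from the message above.

```python
import subprocess
import sys

KUBECTL_TIMEOUT_SECONDS = 10  # value named in the commit message


def run_with_timeout(cmd, timeout=KUBECTL_TIMEOUT_SECONDS):
    """Run cmd and return (ok, output).

    ok is False when the command exits non-zero or does not finish
    within the timeout, so a hung command cannot block the caller.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False, "timed out after %s seconds" % timeout
    return result.returncode == 0, result.stdout.strip()


if __name__ == "__main__":
    # Stand-in command; a real caller would pass a kubectl invocation.
    print(run_with_timeout([sys.executable, "-c", "print('ready')"]))
```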

Change-Id: Ie07c84e0a53c00ac78970bf6b06e6cf0b19479e1
Closes-Bug: 1837919
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-09-05 15:34:28 -04:00
Scott Little
7775f12ddd Config file changes to add 'stx-ocf-scripts' after relocation from 'stx-upstream'
Story: 2006166
Task: 35687
Depends-On: I665dc7fabbfffc798ad57843eb74dca16e7647a3
Change-Id: Icadc524bca7b6027caebf3923fce8260e17d9ef1
Signed-off-by: Scott Little <scott.little@windriver.com>
Depends-On: I363681d077eb5724982ca0e9d7d4fa17ac7298dd
Depends-On: I814d35ca3e55fbfb9e0a462f3f05ff2db6a9cca5
2019-09-04 15:59:21 -04:00
Don Penney
e544061f67 Update barbican OCF scripts to enhance logging
This commit updates the barbican OCF scripts to address
logging issues:
- barbican-api is updated to set permissions on the logfile
  to restrict access
- barbican-keystone-listener and barbican-worker are updated
  to log via syslog

Depends-On: I31b29bb8ffff28cd329b383704b88cf73199bcec
Change-Id: I814d35ca3e55fbfb9e0a462f3f05ff2db6a9cca5
Partial-Bug: 1836632
Signed-off-by: Don Penney <don.penney@windriver.com>
2019-09-04 15:43:48 -04:00
Bin Qian
9de51a38bc Fix dbmon warning
mariadb secret name changed to mariadb-dbadmin-password,
update the ocf script accordingly.

Depends-On: I777895497300cc605762db002958a778cd204e49
Change-Id: I31b29bb8ffff28cd329b383704b88cf73199bcec
Closes-Bug: 1826891
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-09-04 15:42:59 -04:00
Chris Friesen
80e84d3b0d Add dbmon timeouts to handle swact scenario
It turns out that when swacting we can end up with kubernetes going
down for a while, causing kubectl commands to hang.

Accordingly, let's add some timeouts to critical commands to limit
how long they can hang for.

Depends-On: I8d91dc13cb9a9adb7f7a7a95faadad4339ddb466
Change-Id: I777895497300cc605762db002958a778cd204e49
Story: 2004712
Task: 30410
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2019-09-04 15:42:41 -04:00
Chris Friesen
2f44c61d11 Add dbmon ocf script for containerized mariadb on AIO-DX
On a two-node system the openstack-helm chart for mariadb has issues.

If you run with a single replica then your failover times are very
long due to internal timeouts in kubernetes which prevent accessing
the backing volume on the newly-active node.

At the same time, you can't run a "garbd" pod the way we do on a full
lab configuration because there is no third node to run it on.

The only viable option we've found is to trigger something to
explicitly tell the mariadb pod on the active node to bootstrap a new
primary cluster if it loses quorum due to the other mariadb pod going
away unexpectedly.

Accordingly, this commit creates a new "dbmon" OCF script which
behaves basically as follows:

start -- return $OCF_SUCCESS
stop -- return $OCF_NOT_RUNNING
standby -- return $OCF_SUCCESS or $OCF_NOT_RUNNING depending on whether
           mariadb on this node is a member of the primary cluster
active -- if mariadb on this node is not a member of the primary cluster
          then tell it to bootstrap a new primary cluster.  Then check
          again and return $OCF_SUCCESS or $OCF_NOT_RUNNING depending on
          whether mariadb on this node is a member of the primary cluster
monitor -- If mariadb on this node is a member of the primary cluster
           then return $OCF_RUNNING_MASTER on the active controller and
           $OCF_SUCCESS on the standby controller.  If mariadb is not a
           member of the primary cluster return $OCF_NOT_RUNNING.

There are a few complicating factors.
If the openstack application or mariadb chart is not installed, then
treat it like being a member of the primary cluster.
If the mariadb pod is still initializing, treat it like not being a
member of the primary cluster.
If we're in a standard lab (with garbd running on a compute node) then
don't actually tell mariadb to bootstrap a new primary cluster but just
report whether it's a member of the primary cluster or not.
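
The action table and the complicating factors above can be sketched as
follows. This is an illustration in Python, not the OCF shell script
added by this commit. The function names and parameters are
hypothetical; the numeric return codes are the standard OCF values.

```python
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8


def is_primary_member(chart_installed, pod_initializing, reports_primary):
    """Apply the special cases before trusting mariadb's own report."""
    if not chart_installed:
        return True   # nothing to monitor: treat as a member
    if pod_initializing:
        return False  # pod not ready yet: treat as not a member
    return reports_primary


def dbmon_action(action, in_primary, is_active_controller=False,
                 bootstrap=None, can_bootstrap=True):
    """Return the OCF code for an action.

    in_primary is a callable returning True when mariadb on this node
    is a member of the primary cluster. can_bootstrap is False on a
    standard lab, where garbd runs on a compute node.
    """
    if action == "start":
        return OCF_SUCCESS
    if action == "stop":
        return OCF_NOT_RUNNING
    if action == "standby":
        return OCF_SUCCESS if in_primary() else OCF_NOT_RUNNING
    if action == "active":
        if not in_primary() and can_bootstrap and bootstrap is not None:
            bootstrap()  # ask mariadb to form a new primary cluster
        # Check membership again after the optional bootstrap.
        return OCF_SUCCESS if in_primary() else OCF_NOT_RUNNING
    if action == "monitor":
        if not in_primary():
            return OCF_NOT_RUNNING
        return OCF_RUNNING_MASTER if is_active_controller else OCF_SUCCESS
    raise ValueError("unknown action: %s" % action)
```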

Story: 2004712
Task: 30410
Depends-On: I2667d56a71b7d3881c03b6a5c1e5ed61d4f0b902
Change-Id: I8d91dc13cb9a9adb7f7a7a95faadad4339ddb466
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2019-09-04 15:42:18 -04:00
Alex Kozyrev
0e9618e96c OCF scripts to manage Barbican processes as an HA resource.
Create OCF scripts for controlling the lifecycle of Barbican processes.
There are three Barbican processes that need to be managed:
barbican-api, barbican-keystone-listener and barbican-worker.

Depends-On: I63a6fd3d112a98449ea22524bb2a83b5db8ce6d1
Change-Id: I2667d56a71b7d3881c03b6a5c1e5ed61d4f0b902
Story: 2003108
Task: 27700
Signed-off-by: Alex Kozyrev <alex.kozyrev@windriver.com>
2019-09-04 15:41:59 -04:00
Scott Little
490a99a667 Add OCF file for cinder-backup
As part of switching to the upstream implementation of cinder B&R, we
need an OCF script to manage the cinder-backup process.

Depends-On: I6bec51c7401339f4c71f9558d73389d0c793093d
Change-Id: I63a6fd3d112a98449ea22524bb2a83b5db8ce6d1
Story: 2003715
Task:  26375
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-09-04 15:41:36 -04:00
Scott Little
8d887a37ae Move StarlingX OCF scripts into a stand alone package.
The following upstream projects did not have OCF scripts and these were
created for StarlingX:

aodh-api
aodh-evaluator
aodh-listener
aodh-notifier
ceilometer-agent-notification
heat-api
heat-api-cfn
heat-api-cloudwatch
ironic-api
ironic-conductor
magnum-api
magnum-conductor
murano-api
murano-engine
nova-conductor
nova-placement-api
nova-serialproxy
panko-api

Move these out of stx/git.openstack-ras and place them into a separate
package within the openstack/stx-upstream repo.

Depends-On: I080b6e893d5f6ccff04951879eed71e8ccbe0b52
Change-Id: I6bec51c7401339f4c71f9558d73389d0c793093d
Story: 2003715
Task:  26375
Signed-off-by: Scott Little <scott.little@windriver.com>
2019-09-04 15:41:10 -04:00
Zuul
e9fa094a82 Merge "Fix memory leak bugs in sm-provision." 2019-08-29 19:29:48 +00:00
ZhangQig
8cbee33774 Fix memory leak bugs in sm-provision.
Also, add a free call in sm_service_group_member_deprovision() 
and sm_service_deprovision().

Change-Id: If6009ce9df3b2a133610e7ce74f5006ecfc99803
Closes-Bug: #1837975
Signed-off-by: ZhangQing <zqhsh527@163.com>
2019-08-29 00:50:06 +00:00
Zuul
3d052b977c Merge "Enhance timer system to avoid double deregister" 2019-08-22 13:35:06 +00:00
Andreas Jaeger
13e42caf7b Use Zuul templates
Use templates instead of individual jobs so that these
can be changed in one place.

Depends-On: https://review.opendev.org/677606
Change-Id: Ic70832ed4e4fba3343381f7ead611085c0849994
2019-08-21 12:54:55 +00:00
Al Bailey
d7dc7b1eaa Fixing failing devstack zuul job
The glance devstack plugin is not working for us,
and is not needed for our devstack to work, so updating
the zuul job to use the "min" devstack version that is used
by other repos such as 'fault' and avoid setting up the
glance devstack plugin altogether.

Change-Id: Id16671961e10962530d2eaff28387b4b206e0a3b
Partial-Bug: 1840292
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
2019-08-15 13:45:26 -05:00
Bin Qian
66e0404217 Enhance timer system to avoid double deregister
The reported bug occurred because the dbmon service audit timer was
accidentally overwritten, so no audit was performed and the dbmon
service was not actually being audited.

The major change is to enhance the timer system to use globally unique
timer ids (never reused) to ensure a timer is not double deregistered
by 2 different mechanisms (disarm/deregister).
Change the timer id to a 64-bit integer to ensure the id never
overflows.

The above change eliminates the double-deregistration issue, which
could accidentally deregister a new timer that reuses the same id.

Also do some cleanup to get rid of cases that could double deregister
a timer (although this is no longer harmful now that the above change
is in place).
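
The idea of never reusing a timer id can be sketched as below. This is
an illustration in Python rather than the sm timer code; the class name
`TimerRegistry` and its methods are hypothetical. Python integers do not
overflow, so the 64-bit width mentioned above is not modeled here.

```python
import itertools


class TimerRegistry:
    """Hands out ids from a counter that only ever increases."""

    def __init__(self):
        self._next_id = itertools.count(1)  # ids are never reused
        self._timers = {}

    def register(self, name, callback):
        timer_id = next(self._next_id)
        self._timers[timer_id] = (name, callback)
        return timer_id

    def deregister(self, timer_id):
        """Return True if a timer was removed, False if the id is stale."""
        return self._timers.pop(timer_id, None) is not None

    def is_registered(self, timer_id):
        return timer_id in self._timers
```

Because a stale id can never match a newer timer, a second deregister
call with the old id is a harmless no-op instead of removing an
unrelated timer.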

Change-Id: I2603870d2eb2749d78456e406095ae543353963f
Closes-Bug: 1837724
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-08-13 13:53:39 +00:00
Zuul
2d04c4e428 Merge "dcdbsync for containerized openstack services - SM" 2019-08-12 16:24:09 +00:00
Andy Ning
b64d3a384b dcdbsync for containerized openstack services - SM
This update adds the dcdbsync service for containerized openstack
services to SM. Note that this second dcdbsync instance also
runs on the platform (not containerized).

Story: 2004766
Task: 36099
Change-Id: If406127d26d6230771c0d44105da3a08facf3277
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2019-08-06 16:36:01 -04:00
Kristine Bujold
4ef138fcf1 Collapse the glance filesystem into platform
The filesystem /opt/cgcs is removed and the “helm_charts” and “keystone”
folders now reside under /opt/platform.

  ls /opt/platform/
  armada  config  helm  nfv  puppet  sysinv

  ls /opt/cgcs/
  helm_charts  keystone

Resources related to cgcs-drbd and /opt/cgcs are removed from puppet.
SMS is no longer monitoring these resources.

Tested in AIO-SX, AIO-DX and Standard hardware labs.

Depends-On: https://review.opendev.org/674360
Partial-Bug: 1830142

Change-Id: I4be7a877efb89bb9e5c2b067bdc7e4259f2b0c0c
Signed-off-by: Kristine Bujold <kristine.bujold@windriver.com>
2019-08-02 13:35:54 -04:00
YeHuiSheng
5696e1e56d Modify memory leak in some abnormal cases by adding free function.
Change-Id: I0076635d9c19d39637bf56a3404a4e4ad1f9506f
Closes-Bug: #1837976
Signed-off-by: YeHuiSheng <hsye@fiberhome.com>
2019-07-26 15:20:58 +08:00
Stefan Dinescu
95367fd675 Increase SM timeout for ceph-mon
Note: this only affects AIO-DX setups as that is the only kind
      of setup where ceph-mon is managed by SM

In some edge-cases, during a swact, ceph-mon may take too long
to be stopped on the active controller resulting in a failed
swact.

This change increases the timeout to account for those
edge cases.

Change-Id: I3ace73650e4fe9aafc84c82e2ffe048f2039305e
Partial-bug: 1836075
Signed-off-by: Stefan Dinescu <stefan.dinescu@windriver.com>
v2.0.0.rc0
2019-07-25 14:59:03 +03:00
Zuul
bb8d962771 Merge "Fix the error links for ha docs" 2019-07-23 13:57:33 +00:00
Zuul
026d5fd730 Merge "Add mtc-agent service dependency to fm-mgr" 2019-07-18 16:34:39 +00:00
Bin Qian
a729bbabc6 Add mtc-agent service dependency to fm-mgr
Add mtc-agent service dependency to fm-mgr to ensure mtc-agent shuts
down before fm-mgr does.

An issue was found where, in rare cases, a swact occurs while mtc-agent
is trying to clear an alarm. Because fm-mgr has already been disabled,
the clear-alarm message is lost and the alarm can therefore never be
cleared.

Closes-Bug: 1829289

Change-Id: I39196d5f3ce764a14b4d1e0fb1a4f3344ddd6a1a
Signed-off-by: Bin Qian <bin.qian@windriver.com>
2019-07-12 13:00:13 -04:00
Mingyuan Qi
85b0ec621b Add floating ip for ironic network
This commit adds ironic-ip service to sm_db for ironic floating ip.

Story: 2004760
Task: 35689

Change-Id: I45039427cc5c96fd0639cf086d7e431244c4e1d9
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
2019-07-09 10:08:55 +08:00