36 Commits

Author SHA1 Message Date
Lucas Cavalcante
31c4390122 Fix nova plugin fqdn override
Setting a custom domain for ingress endpoints breaks apply.
osh-nova and osh-nova-api-proxy try to use the same domain,
both starting with 'nova'. This causes a Kubernetes error.

Signed-off-by: Lucas Cavalcante <lucasmedeiros.cavalcante@windriver.com>
Closes-bug: 1938342
Change-Id: Ic284b83425917102a652330f8349aed38731f9df
2021-07-28 18:30:14 -03:00
Angie Wang
eec60f8b48 Add lifecycle semantic check for auto update
The stx-openstack app is not an RPM-installed app, so it
does not support auto-update.

Change-Id: Iec0233910c9e7725c12767138e25b3bd314f82b0
Story: 2007960
Task: 42833
Depends-On: https://review.opendev.org/c/starlingx/config/+/800821/
Signed-off-by: Angie Wang <angie.wang@windriver.com>
2021-07-16 13:20:10 -04:00
Yvonne Ding
4379649008 Disallow application-apply when vim_progress_status is not enabled
This fix is specific to AIO-SX: when the node is
unlocked/enabled/available, vim_progress_status can still be
services-disabled. The status needs a few more seconds to become
services-enabled.

Add a pre-check in openstack-armada-app/lifecycle_openstack.py that
verifies the AIO-SX node has reached a stable state before
perform_app_apply. This prevents stx-openstack apply from being
triggered manually during the initialization stage after node unlock.
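
The pre-check described above can be pictured with a minimal sketch. The
function and exception names are hypothetical and are not taken from
lifecycle_openstack.py; only the status strings come from this commit
message.

```python
class ApplyNotAllowed(Exception):
    """Raised when an apply must be rejected (hypothetical name)."""


def check_host_ready_for_apply(vim_progress_status):
    """Reject an apply until the host reports services-enabled.

    Assumption: the caller passes the vim_progress_status string of the
    single AIO-SX host.
    """
    if vim_progress_status != "services-enabled":
        raise ApplyNotAllowed(
            "vim_progress_status is %r, expected 'services-enabled'"
            % vim_progress_status
        )
    return True
```

A caller would run this check before perform_app_apply and surface the
exception to the user instead of starting the apply.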

Closes-bug: 1929775
Signed-off-by: Yvonne Ding <yvonne.ding@windriver.com>
Change-Id: I563f77f617a68092b59f6cb38f5fb436a7933498
2021-06-08 09:26:21 -04:00
Thiago Brito
963e63cd55 Fix cpu_shared/dedicated_set config location
Change I61514389b616db754b0d2f35deb0101f90dbdd02 removed the deprecated
property vcpu_pin_set in favor of the newer cpu_shared_set and
cpu_dedicated_set, but those new configs are placed under the [compute]
section of nova.conf instead of [DEFAULT]. This is causing VMs to be
scheduled on platform reserved cores. This commit will fix it.
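
As an illustration of the section placement, here is a small sketch that
reads the two options from the [compute] section. The CPU ranges are made
up sample values, and the helper name is hypothetical.

```python
import configparser

# Sample nova.conf fragment; the CPU ranges are illustrative only.
NOVA_CONF = """
[DEFAULT]
debug = false

[compute]
cpu_shared_set = 0-1
cpu_dedicated_set = 2-7
"""


def read_cpu_sets(conf_text):
    """Return the CPU set options found under [compute], or None."""
    # default_section is renamed so that [DEFAULT] is treated as an
    # ordinary section and its values do not leak into [compute].
    parser = configparser.ConfigParser(default_section="__unused__")
    parser.read_string(conf_text)
    return {
        name: parser.get("compute", name, fallback=None)
        for name in ("cpu_shared_set", "cpu_dedicated_set")
    }
```

With the options misplaced under [DEFAULT], the helper returns None for
both keys, which mirrors the options not being found where they are
expected.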

Closes-Bug: #1928683

Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I541760619f4c79c66a2bf22715afdc873b8343ce
2021-05-17 18:26:12 +00:00
Zuul
38470c8045 Merge "Update cpu_shared_set and cpu_dedicated_set in nova config" 2021-04-09 14:40:07 +00:00
Gustavo Santos
58f4d9ffca Add k8s proxy-body-size to horizon overrides
The network.dashboard.ingress.annotations in the values.yaml of the
horizon helm chart do not include the kubernetes property
'proxy-body-size'. As a result, the nginx.conf generated for ingress
applies the default rule 'client_max_body_size 1m' to the horizon
servers. This limits every http request inside horizon to 1MiB, which
makes it impossible, for example, to upload larger images to glance
through the horizon GUI.

This change adds the property to the horizon overrides, so that
horizon's servers in nginx.conf get a 'client_max_body_size' of 2500MiB,
which makes uploading images up to that size possible again.
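
A sketch of what such an override could look like when built in a Python
helm plugin. The function name is hypothetical, and the full annotation
key 'nginx.ingress.kubernetes.io/proxy-body-size' is an assumption based
on the ingress-nginx annotation naming; the commit message only names
the property 'proxy-body-size'.

```python
def horizon_ingress_overrides(max_body_size="2500m"):
    """Build the nested override for the horizon ingress annotations.

    Assumption: the annotation key follows the ingress-nginx prefix.
    """
    return {
        "network": {
            "dashboard": {
                "ingress": {
                    "annotations": {
                        "nginx.ingress.kubernetes.io/proxy-body-size":
                            max_body_size,
                    }
                }
            }
        }
    }
```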

Story: 2008692
Task: 41996
Change-Id: I91888ce238d5304c08eb1e97918989b8f93ee34f
2021-03-08 14:56:55 -03:00
Dan Voiculeasa
b5c1f62088 Introduce metadata for app behavior control
Keep existing behavior when evaluating app reapplies.

Story: 2007960
Task: 41755
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Change-Id: Ie02743cdf056dda3feb66911c74f9dabe69d98dd
2021-02-25 10:34:57 +02:00
Martin, Chen
eab750b7ff Add override setting in openstack helm plugin for rook-ceph
When deploying with rook-ceph, without "system storage-backend-add ceph",
there is no storage-ceph object in the database. Because the current
openstack helm plugin depends on the storage-ceph object, use a fixed
override setting in the rook-ceph case.

Story: 2005527
Task: 39914

Depends-On: https://review.opendev.org/#/c/713084/

Change-Id: Ied852d60e8b15d55865747e0b6f4b54f2392d6df
Signed-off-by: Martin, Chen <haochuan.z.chen@intel.com>
2021-01-27 14:29:20 +00:00
Zuul
591f5aa40d Merge "Fix apply of stx-openstack when host is locked" 2021-01-22 23:00:35 +00:00
Dan Voiculeasa
852d8d61db Introduce lifecycle operator to openstack app
A large part of the logic is moved from the sysinv conductor to the
application itself.

The following hooks were necessary:
pre-apply, post-apply, pre-manifest-apply, pre-apply-rbd,
pre-apply-resource, post-remove-rbd, post-remove-resource, post-remove
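
To make the hook list concrete, here is a minimal dispatch sketch. The
class and method names are hypothetical and do not reflect the actual
sysinv lifecycle interface; only the hook names come from this commit
message.

```python
HOOKS = (
    "pre-apply", "post-apply", "pre-manifest-apply", "pre-apply-rbd",
    "pre-apply-resource", "post-remove-rbd", "post-remove-resource",
    "post-remove",
)


class LifecycleOperator:
    """Maps hook names to handlers owned by the application (sketch)."""

    def __init__(self):
        self._handlers = {}

    def register(self, hook, handler):
        # Only the hooks listed in the commit message are accepted.
        if hook not in HOOKS:
            raise ValueError("unknown hook: %s" % hook)
        self._handlers[hook] = handler

    def run(self, hook, context):
        # Hooks without a registered handler are treated as no-ops.
        handler = self._handlers.get(hook)
        if handler is None:
            return None
        return handler(context)
```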

Change-Id: I41858c831a4af564dbdf38934d51d34489bf8a9a
Story: 2007960
Task: 41293
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
2021-01-13 22:39:07 +02:00
hbrito
b64b020446 Increase proxy-connect-timeout to avoid nginx timeout errors
This patch increases the proxy-connect-timeout from 5 to 30 seconds,
avoiding the Bad Gateway 502 error when CLI commands are executed.

Closes-bug: 1908720
Change-Id: I557456e9d0550a906b6d849d682de7ea3f0f42ad
Signed-off-by: hbrito <hugo.brito@windriver.com>
2021-01-07 20:00:51 +00:00
Zhipeng Liu
cb9854c701 Update cpu_shared_set and cpu_dedicated_set in nova config
Starting from Ussuri, OpenStack is deprecating vcpu_pin_set
in favor of cpu_dedicated_set and cpu_shared_set. These
overrides must be generated via StarlingX system commands.

Closes-Bug: 1904729
Change-Id: I61514389b616db754b0d2f35deb0101f90dbdd02
Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>
2021-01-05 14:18:35 +00:00
Shuicheng Lin
ed82abff0f Create stx_admin account for flock service to communicate with openstack
The admin account was used before, but if the admin password is
changed, flock services cannot be notified and cannot get the new
password, so flock services such as nfv-vim can no longer fetch
openstack VM info. The stx_admin account is created for this case.

Depends-On: https://review.opendev.org/753971
Closes-Bug: 1887755

Change-Id: I36f2442036bf6c98fbb0af727fddf1dd50e58330
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
2020-12-01 12:55:22 +08:00
Zuul
8bd9842dfd Merge "Remove kube-system-ingress from openstack operator" 2020-11-05 15:41:28 +00:00
Shuicheng Lin
e972af2ec6 Correct CEPH_POOL_BACKUP_PG_NUM name to fix python module error
The correct name should be CEPH_POOL_BACKUP_CHUNK_SIZE.

Closes-Bug: 1900710

Change-Id: Ie3aa2c6009cc626c2224ea464e8bea8c719316a3
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
2020-10-21 09:28:37 +08:00
Mihnea Saracin
5fee64eca7 Remove kube-system-ingress from openstack operator
When we apply stx-openstack with the 'mode' argument
like `system application apply restore_db`, only
some of the openstack charts must be deployed.
If the kube-system-ingress chart group is specified,
it won't be found in the armada manifest and the
openstack application will always be deployed
in the default way (deploying all the charts),
ignoring the value of the 'mode' argument.

Depends-on: https://review.opendev.org/#/c/698003/
Change-Id: I6791974e337cd3193bf2a75e9d75f48841f0676d
Story: 2006770
Task: 37780
Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
2020-10-13 17:36:44 +03:00
Elena Taivan
a643665af4 Change default pg_num values for ceph pools:
- cinder-volumes
- cinder.backups
- images
- ephemeral

Pg_num values were increased to avoid ceph health warning
that occurs on larger systems due to the default
pg_num settings not being large enough.

Change-Id: I23feffe613c37b12dff51c73e7ced9a9c7663089
Closes-bug: 1899128
Signed-off-by: Elena Taivan <elena.taivan@windriver.com>
2020-10-13 06:10:47 +00:00
Mihnea Saracin
fc68439414 Fix apply of stx-openstack when host is locked
Currently, all of the stx-openstack services have the
replica count set to the number of the controllers.
If one of the controllers is locked, the replica
count will still be 2, which is incorrect.
We solve this by changing the number of replicas
to be equal to the number of the active controllers.
The rabbitmq and mariadb services cannot use this approach because
they are unable to work properly if their replica number
is decreased from 2 to 1. So a kubernetes toleration
is used here to allow the rabbitmq and mariadb pods to be
deployed on the locked controller.
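
The replica rule can be sketched as follows. The host dictionary shape
and the reading of "active" as administratively unlocked are
assumptions made for illustration, not the plugin's actual data model.

```python
def replica_count(controllers):
    """Return the number of controllers that are not locked.

    Assumption: each controller is a dict with an 'administrative'
    field set to 'locked' or 'unlocked'.
    """
    return sum(
        1 for host in controllers
        if host.get("administrative") == "unlocked"
    )
```

For a duplex system with one locked controller this yields 1 instead of
the fixed value 2.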

Change-Id: I15cf2a3f62525751435ddbe66760935f3ab21d2b
Closes-Bug: 1879018
Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
2020-09-11 18:46:52 +03:00
Mihnea Saracin
d73c7c494d Revert "Fix apply of stx-openstack when host is locked"
The commit that we are reverting broke the normal lock/unlock
case when stx-openstack is applied. More specifically,
the mariadb pod failed to start when stx-openstack
was applied automatically after unlock.

This reverts commit 754a1d33de.

Change-Id: I0f1e5854d22ed54747d0237153ada3985f29ef96
2020-08-25 11:35:18 +03:00
Zuul
cc42f7cf54 Merge "Update mariadb-server suspect_timeout to default value to align with garbd's suspect_timeout" 2020-08-19 13:57:28 +00:00
Zuul
d50204f174 Merge "Remove subcloud openstack overrides" 2020-08-16 18:41:30 +00:00
Dan Voiculeasa
260378f6de Remove subcloud openstack overrides
VMs fail to launch using openstack on subclouds.
Remove openstack admin endpoint overrides as they are not used
currently and they cause openstack services interaction.

Closes-Bug: 1875914
Change-Id: I2ad12ff9b10adb9f3838bca348856ce152a45b21
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
2020-08-15 18:03:56 +03:00
Angie Wang
b21dfbda22 Optimize nova and neutron helm plugins
The dbapi calls in the nova and neutron plugins scale linearly with
the number of worker nodes, which results in poor performance on
systems with a large number of nodes.

Currently, the dbapi calls are invoked for each worker node.
This commit reduces them to a fixed number of calls.

Tested stx-openstack upload and apply on a lab with 6 worker nodes.
The time of override generation for nova reduced from 11s to 0.6s
and for neutron reduced from 24s to 0.2s.
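
The general idea of replacing per-node queries with one bulk query can
be sketched like this. FakeDbApi and its method names are hypothetical
stand-ins, not the sysinv dbapi.

```python
class FakeDbApi:
    """In-memory stand-in that counts how many queries are issued."""

    def __init__(self, interfaces):
        self._interfaces = interfaces
        self.calls = 0

    def iface_get_by_host(self, host):
        self.calls += 1
        return [i for i in self._interfaces if i["host"] == host]

    def iface_get_all(self):
        self.calls += 1
        return list(self._interfaces)


def interfaces_per_host_slow(dbapi, hosts):
    # One query per worker node: the call count grows with len(hosts).
    return {h: dbapi.iface_get_by_host(h) for h in hosts}


def interfaces_per_host_fast(dbapi, hosts):
    # A single query, grouped in memory: the call count stays constant.
    grouped = {h: [] for h in hosts}
    for iface in dbapi.iface_get_all():
        if iface["host"] in grouped:
            grouped[iface["host"]].append(iface)
    return grouped
```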

Also tested on vbox and on a lab that has "vf" type interfaces. Verified
that the content of the generated overrides is the same as before.

Change-Id: I2d19d30b01e3348d6eb60b8e2681e3a30ef93ebc
Partial-Bug: 1886563
Signed-off-by: Angie Wang <angie.wang@windriver.com>
2020-08-14 14:52:15 -04:00
Zuul
9d2e4fcd40 Merge "Add nova pci alias for GPUs Matrox G200E, NVidia M60, P40, T4" 2020-08-12 16:52:50 +00:00
Jim Gauld
c86fae1b23 Add nova pci alias for GPUs Matrox G200E, NVidia M60, P40, T4
This adds nova pci-alias definitions for these GPUs:
- Matrox G200E (type-PCI), 'matrox-g200e'
- NVidia Tesla M60 (type-PCI), 'nvidia-tesla-m60'
- NVidia Tesla P40 (type-PCI), 'nvidia-tesla-p40'
- NVidia Tesla T4 (type-PF), 'nvidia-tesla-t4-pf'

The end user no longer needs to first override the nova helm chart
to launch VMs with these GPUs.

Previously the user needed to provide overrides like this:
cat << EOF > ./gpu_override.yaml
conf:
  nova:
    pci:
      alias:
        type: multistring
        values:
        - '{"vendor_id": "8086", "product_id": "0435", "name":
          "qat-dh895xcc-pf"}'
        - '{"vendor_id": "8086", "product_id": "0443", "name":
          "qat-dh895xcc-vf"}'
        - '{"vendor_id": "8086", "product_id": "37c8", "name":
          "qat-c62x-pf"}'
        - '{"vendor_id": "8086", "product_id": "37c9", "name":
          "qat-c62x-vf"}'
        - '{"name": "gpu"}'
        - '{"vendor_id": "102b", "product_id": "0522", "name":
          "matrox-g200e"}'
        - '{"vendor_id": "10de", "product_id": "13f2", "name":
          "nvidia-tesla-m60"}'
        - '{"vendor_id": "10de", "product_id": "1b38", "name":
          "nvidia-tesla-p40"}'
        - '{"vendor_id": "10de", "product_id": "1eb8",
          "device_type":
          "type-PF", "name": "nvidia-tesla-t4-pf"}'
EOF

system helm-override-update \
 --values ./gpu_override.yaml stx-openstack nova openstack --reuse-values
system application-apply stx-openstack

Closes-Bug: 1880997
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
Change-Id: Iaa212351c13b9d279afff2d25dfeb1ffac0bb99d
2020-08-11 16:54:34 -04:00
Martin, Chen
2f664927c4 Update mariadb-server suspect_timeout to default value to align
with garbd's suspect_timeout

In openstack-helm-infra, evs.suspect_timeout=PT30S is set for
mariadb-server in the mariadb-etc configmap. This setting was meant
for a deployment of three mariadb-server pods, each with the same
suspect_timeout of 30s. After the change to two mariadb-server pods
and one garbd arbitrator, the evs.suspect_timeout=PT30S setting in the
mariadb-etc configmap only takes effect for the two mariadb-server
pods; the garbd arbitrator uses the galera default,
evs.suspect_timeout=PT5S. If mariadb-server-1 exits abnormally, the
garbd arbitrator suspects after 5s that mariadb-server-1 is dead, but
because 30s have not yet elapsed, mariadb-server-0 still considers
mariadb-server-1 alive. In this state quorum fails, the garbd
arbitrator and mariadb-server-0 both become a non-primary component,
and the service goes down.

To fix this, set value.conf.data.config_override to override
wsrep_provider_options in the mariadb helm chart, so that the garbd
arbitrator and mariadb-server launch with the same setting,
"evs.suspect_timeout=PT5S", the default value. This also improves
mariadb server recovery time. To change "evs.suspect_timeout", update
the overrides for both the mariadb and garbd helm charts.
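
The timing mismatch can be illustrated with a deliberately simplified
sketch. This is not Galera's actual failure-detection algorithm; it
only shows which members would consider a silent peer suspect at a
given moment, using the timeout values from this commit message. The
function name is hypothetical.

```python
# Suspect timeouts in seconds, before and after the fix.
MISMATCHED = {"mariadb-server-0": 30, "garbd": 5}
ALIGNED = {"mariadb-server-0": 5, "garbd": 5}


def members_suspecting_peer(seconds_since_failure, timeouts):
    """Return the members whose suspect timeout has already expired."""
    return sorted(
        name for name, timeout in timeouts.items()
        if seconds_since_failure >= timeout
    )
```

Ten seconds after mariadb-server-1 stops responding, only garbd
suspects it under the mismatched settings, while both remaining members
agree under the aligned settings.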

The setting "gmcast.listen_addr=tcp://0.0.0.0:<port>" takes effect
for both IPv4 and IPv6, so it is kept.

Reference link for wsrep option and galera cluster quorum
https://mariadb.com/kb/en/wsrep_provider_options/
https://galeracluster.com/library/documentation/weighted-quorum.html

Closes-Bug: 1888546

Change-Id: I06983cf0d91d4d9aa88f352e64b1e6571b816ec6
Signed-off-by: Martin, Chen <haochuan.z.chen@intel.com>
2020-08-10 10:38:39 +08:00
Mihnea Saracin
754a1d33de Fix apply of stx-openstack when host is locked
Currently, all of the stx-openstack services have the
replica count set to the number of the controllers.
If one of the controllers is locked, the replica
count will still be 2, which is incorrect.
We solve this by changing the number of replicas
to be equal to the number of the active controllers.
The rabbitmq service cannot use this approach because
it is unable to work properly if its replica count
is decreased from 2 to 1. So a kubernetes toleration
is used here to allow the second rabbitmq pod to be
deployed on the locked controller.

Change-Id: Ie979c7b5f2755ad673bd180e38b68e0d53c5f9b2
Closes-Bug: 1879018
Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
2020-07-23 17:08:36 +03:00
Zhipeng Liu
2496a170fe Fix the second mariadb node could not join cluster issue.
This bug was introduced by the commit below:
d3164c63dc
The update after PATCH SET 10 prevents the second mariadb node from
joining the cluster. In this case, bind_address=:: cannot be set for
IPv4; it only works for IPv6.

conf.database.config_override can be overridden through the
system helm-override-update command, but it cannot be overridden
dynamically by the python plugin, because that introduces a "-|" line
as the first line of the config file.
A user override for conf.database.config_override might break the IPv6
system overrides, so it needs to include the IPv6 config in the IPv6
case as well.

Tested on a duplex setup. The openstack application applied successfully.

Closes-Bug: 1886003

Change-Id: I23c2fb6a7c8b5a38af1e046894d5fae247df2d6f
Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>
2020-07-07 17:23:41 +08:00
Zuul
e181787872 Merge "Add mariadb database config override to support ipv6" 2020-06-28 04:44:37 +00:00
Robert Church
0cdedf8e44 Make plugins that are common across applications unique
The current implementation of the application framework requires that
plugin names are unique across all applications loaded on the system.
This adjusts the PSP RoleBinding and Helm Toolkit plugins so they don't
conflict with other applications.

Change-Id: Ia5e301d869a4e7200e92010e30f0ee93f2590472
Story: 2006537
Task: 40154
Signed-off-by: Robert Church <robert.church@windriver.com>
2020-06-24 07:55:51 -04:00
Zhipeng Liu
d959e6b7fe Add mariadb database config override to support ipv6
Add "config_override" in configmap-etc.yaml for ipv6.
It cannot be overridden dynamically in helm/mariadb.py.
Upstream patch: https://review.opendev.org/735277

Story: 2007474
Task:  39879

Change-Id: I9342e16fd98d0099e7e7043b46f00b2374203a51
Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>
2020-06-23 00:00:00 +00:00
Zuul
eb41746b8d Merge "Fix render error in cinder during openstack-helm rebase" 2020-06-22 02:14:06 +00:00
Zhipeng Liu
6d8f305873 Fix render error in cinder during openstack-helm rebase
Upstream commit https://review.opendev.org/#/c/706387/ modified the
template format and so we need to tweak our overrides to match the new
template format.

Story: 2007474
Task:  39546

Change-Id: I8a43727068f18bc32b60acaf282369d914c04ae0
Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>
2020-06-19 00:46:07 +00:00
Jerry Sun
92ed6fecc7 Set up openstack cluster role for pod security policies
This commit adds a helm chart that deploys a rolebinding to the openstack
application to allow deployments to the openstack namespace after
PodSecurityPolicy plugin is enabled on the Kubernetes cluster.

Change-Id: I57d3a31c9fcc7e03499e605d6d722fdb36004339
Partial-bug: 1878900
Depends-On: https://review.opendev.org/#/c/734408/
Depends-On: https://review.opendev.org/#/c/735998/
Signed-off-by: Jerry Sun <jerry.sun@windriver.com>
2020-06-17 08:39:49 -04:00
Stefan Dinescu
21a6be081b Disable default settings for NUMA-aware vswitch
By default, nova-compute has the NUMA-aware vswitch feature
enabled. This can be restrictive to users. If needed,
users can use the sysinv helm-overrides commands to enable
it again.

Partial revert of 1a923c8474b0e4d7ef78b3444d131682babfe6aa

Change-Id: Ic820e91cbc81a7f22927e47ec7b9e55934d42cfa
Closes-bug: 1881672
Signed-off-by: Stefan Dinescu <stefan.dinescu@windriver.com>
2020-06-05 09:22:07 +00:00
Robert Church
949dd5aa77 Enable helm/armada plugin delivery with the application
This creates a new package spec called python-k8sapp-openstack that will
hold all the stevedore plugins needed to support the application. This
spec will build two packages python-k8sapp-openstack and
python-k8sapp-openstack-wheels.

These packages are included in the build dependencies for the
stx-openstack-helm application package build where the wheels file is
included in the application tarball.

The helm and armada plugins have been relocated to this repo and
provided in a k8sapp_openstack python module. This module will be
extracted from the wheels and installed on the platform via the sysinv
application framework. The module will be made available when the
application is enabled.

Change-Id: I342308fbff23d29bfdf64a07dbded4bae01b79fd
Depends-On: https://review.opendev.org/#/c/688191/
Story: 2006537
Task: 36978
Signed-off-by: Robert Church <robert.church@windriver.com>
2020-05-27 15:05:02 -04:00