This review updates openstack-helm-infra to commit
8351fdd0f1228717342c2accc96977b0cdc36dc3 and removes patches that were
merged on osh-i. It also refreshes the remaining patches against the
current diffs and makes minor adaptations so osh-i works on StarlingX.
Story: 2009161
Task: 43151
Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I36159b0264a79c3727b20e6ff1b7831183e47c3a
Adding a certificate and ca_certificate using
`certificate-install -m {openstack | openstack_ca}` ends up breaking
the openstack application. OS-STX forces the public endpoint, and when
that endpoint has TLS enabled everything breaks. Therefore, based on
the implementation of TLS support for openstack-helm that enables TLS
for the openstack services, we picked the trust cert code without
actually enabling TLS backends.
Signed-off-by: Lucas Cavalcante <lucasmedeiros.cavalcante@windriver.com>
Change-Id: I2dfc7c12defcc948fcdc353251301980e65f3011
Closes-Bug: 1937260
In a DX scenario, after a lock/unlock of a controller the remaining
MariaDB instance (let's say maria-server0) goes to a Non-Primary +
Initializing state (non-operational). It then keeps searching for the
now-deleted pod (maria-server1) using the old IP, the one from before
the restart. maria-server0 flags the old IP as delayed and suspect for
eviction; however, being a Non-Primary member, it cannot actually evict
the old node and start looking for new members. Setting a LivenessProbe
that detects non-operational members and restarts them fixes this, as
the recreated pod starts looking for a cluster to join.
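
A minimal sketch of the kind of check such a liveness probe could perform,
assuming the probe can read the Galera status variables wsrep_cluster_status
and wsrep_ready. The function name and the dict input are illustrative and
are not taken from the chart.

```python
def is_operational(wsrep_status: dict) -> bool:
    """Return True when a Galera member is in the primary component and ready."""
    cluster_status = wsrep_status.get("wsrep_cluster_status", "")
    ready = wsrep_status.get("wsrep_ready", "OFF")
    return cluster_status == "Primary" and ready == "ON"


# A member stuck in the Non-Primary state described above fails the check,
# so the kubelet would restart it.
stuck = {"wsrep_cluster_status": "non-Primary", "wsrep_ready": "OFF"}
healthy = {"wsrep_cluster_status": "Primary", "wsrep_ready": "ON"}
print(is_operational(stuck), is_operational(healthy))
```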
Closes-Bug: #1938346
Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I38d788f720cbd6bd13b6b6147db6f3d2a2ff9c92
In the event of an uncontrolled reboot on a Standard configuration,
we were seeing the MariaDB pods repeatedly trying to elect a leader
and restarting until they reached CrashLoopBackOff. After
checking the logs closely and reproducing the problem quite easily by
deleting both pods at the same time, we came to the conclusion that the
cluster wasn't having enough time to elect a new leader and recover from
the crash. This patch increases the timeout for the startup probe of the
mariadb statefulset with some slack to allow databases that are in
production to fully resync the data between the 2 pods.
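
As a rough illustration of how much time a startup probe grants, the sketch
below adds the initial delay to the number of allowed failures multiplied by
the probe period. The numbers are hypothetical and are not the values set by
this patch.

```python
def startup_budget_seconds(initial_delay: int, period: int,
                           failure_threshold: int) -> int:
    """Approximate time a container has to pass its startup probe."""
    return initial_delay + period * failure_threshold


# Hypothetical values: 30s initial delay, probe every 30s, 10 failures allowed.
print(startup_budget_seconds(30, 30, 10))  # 330
```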
Closes-Bug: #1938346
Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I19e49dab55f3a8661fa71be315093029adb0947e
Update remaining StarlingX packages with hardcoded TIS_PATCH_VER to
use PKG_GITREVCOUNT where possible, with offsets as needed to ensure
the version is incremented above the hardcoded version.
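
The offset arithmetic can be sketched as follows. This only illustrates the
"increment above the hardcoded version" rule; the function name is
hypothetical and the build tooling's own syntax is not shown here.

```python
def minimum_offset(hardcoded_ver: int, git_rev_count: int) -> int:
    """Smallest non-negative offset so git_rev_count + offset > hardcoded_ver."""
    return max(0, hardcoded_ver - git_rev_count + 1)


# Hypothetical package: hardcoded version 10, 3 commits counted so far.
offset = minimum_offset(10, 3)
print(offset, 3 + offset)  # 8 11
```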
Story: 2008455
Task: 41455
Signed-off-by: Don Penney <don.penney@windriver.com>
Change-Id: Icdc9d71d1268a4d3dd9e569c8642717bceadda5e
Currently, all of the stx-openstack services have the
replica count set to the number of controllers.
If one of the controllers is locked, the replica
count is still 2, which is incorrect.
We solve this by setting the number of replicas
equal to the number of active controllers.
The rabbitmq and mariadb services cannot use this approach because
they are unable to work properly if their replica count
is decreased from 2 to 1. So a kubernetes toleration
is used here to allow the rabbitmq and mariadb pods to be
deployed on the locked controller.
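
The replica rule above can be sketched as below. The function, the state
strings and the controller dict are assumptions made for illustration; the
actual plugin code is not reproduced here.

```python
# Services that keep a fixed replica count and rely on a toleration instead.
FIXED_REPLICA_SERVICES = {"rabbitmq", "mariadb"}


def replica_count(service: str, controllers: dict) -> int:
    """controllers maps hostname -> administrative state ('locked'/'unlocked')."""
    if service in FIXED_REPLICA_SERVICES:
        return len(controllers)
    return sum(1 for state in controllers.values() if state == "unlocked")


controllers = {"controller-0": "unlocked", "controller-1": "locked"}
print(replica_count("nova", controllers), replica_count("mariadb", controllers))
```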
Change-Id: I15cf2a3f62525751435ddbe66760935f3ab21d2b
Closes-Bug: 1879018
Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Requests to OpenStack services sometimes hang or fail
due to message loss when connecting to internal service
endpoints. This issue was observed before and fixed in
commit https://review.opendev.org/#/c/683818/ by setting
net.ipv4.tcp_tw_reuse to 0; however, it is still being
seen on recent STX loads.
Testing has shown that requests that go through the
ingress pod do not have the issue. This commit updates
the helm charts and manifest so that all requests sent
to openstack services go to the ingress pod, which then
forwards them to the corresponding api service.
Changes included:
- update helm-toolkit manifest job-ks-endpoint.yaml
to provide the ability to conditionally configure
all types of openstack endpoints with the public endpoint
url when endpoints.identity.force_public_endpoint is
true. Same update for keystone and keystone-api-proxy.
With the update, for example, the admin, internal
and public endpoints for neutron will be
neutron.openstack.svc.cluster.local:80
- update armada manifest to make the necessary overrides
in the openstack service configuration files so that
communications between services go through ingress
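
The endpoint selection described in the first item can be sketched as
follows. The function name and the non-public URL are hypothetical; only the
public value comes from the example above.

```python
def endpoint_urls(admin: str, internal: str, public: str,
                  force_public_endpoint: bool) -> dict:
    """Pick the URL registered for each endpoint interface."""
    if force_public_endpoint:
        # Every interface is registered with the public (ingress) URL.
        return {"admin": public, "internal": public, "public": public}
    return {"admin": admin, "internal": internal, "public": public}


public_url = "neutron.openstack.svc.cluster.local:80"
service_url = "neutron-server.openstack.svc.cluster.local:9696"  # hypothetical
print(endpoint_urls(service_url, service_url, public_url, True))
```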
Change-Id: Icd646dd07d544da420a75f920bd7c3e8fc931327
Closes-Bug: 1880777
Signed-off-by: Angie Wang <angie.wang@windriver.com>
The commit that we are reverting broke the normal lock/unlock
case when stx-openstack is applied. More specifically,
the mariadb pod failed to start when stx-openstack
was applied automatically after unlock.
This reverts commit 754a1d33de.
Change-Id: I0f1e5854d22ed54747d0237153ada3985f29ef96
with garbd's suspect_timeout
openstack-helm-infra sets evs.suspect_timeout=PT30S for
mariadb-server in the mariadb-etc configmap. This setting was
meant for a deployment with three mariadb-server pods, each
using the same suspect_timeout of 30s. After the change to two
mariadb-server pods and one garbd arbitrator, the
evs.suspect_timeout=PT30S setting in the mariadb-etc configmap
only takes effect for the 2 mariadb-server pods; the garbd
arbitrator uses the galera default, evs.suspect_timeout=PT5S.
If mariadb-server-1 exits abnormally, after 5s the garbd
arbitrator suspects that mariadb-server-1 is dead, but because
30s have not yet passed, mariadb-server-0 considers
mariadb-server-1 still alive. In this state quorum fails, the
garbd arbitrator and mariadb-server-0 both move to a non-primary
component, and the service goes down.
As a fix, set value.conf.data.config_override to override
wsrep_provider_options in the mariadb helm chart, so that the garbd
arbitrator and mariadb-server launch with the same setting,
"evs.suspect_timeout=PT5S", the default value. This also improves
the mariadb server recovery time. To change the
"evs.suspect_timeout" setting, the overrides for both the mariadb
and garbd helm charts must be updated.
The setting "gmcast.listen_addr=tcp://0.0.0.0:<port>" takes
effect for both ipv4 and ipv6, so it is kept.
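
A small sketch of how the mismatch could be detected, assuming the provider
options are available as the usual semicolon-separated "key=value" string.
The helper names are hypothetical; PT5S is the galera default mentioned above.

```python
GALERA_DEFAULT_SUSPECT_TIMEOUT = "PT5S"


def parse_provider_options(options: str) -> dict:
    """Parse 'key=value; key=value' into a dict, ignoring empty items."""
    parsed = {}
    for item in options.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed


def suspect_timeouts_match(server_options: str, garbd_options: str) -> bool:
    """True when both sides end up with the same evs.suspect_timeout."""
    key = "evs.suspect_timeout"
    server = parse_provider_options(server_options).get(
        key, GALERA_DEFAULT_SUSPECT_TIMEOUT)
    garbd = parse_provider_options(garbd_options).get(
        key, GALERA_DEFAULT_SUSPECT_TIMEOUT)
    return server == garbd


# Before the fix: server overrides to 30s, garbd falls back to the default.
print(suspect_timeouts_match("evs.suspect_timeout=PT30S;", ""))
# After the fix: both use PT5S.
print(suspect_timeouts_match("evs.suspect_timeout=PT5S;",
                             "evs.suspect_timeout=PT5S"))
```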
Reference links for wsrep options and galera cluster quorum:
https://mariadb.com/kb/en/wsrep_provider_options/
https://galeracluster.com/library/documentation/weighted-quorum.html
Closes-Bug: 1888546
Change-Id: I06983cf0d91d4d9aa88f352e64b1e6571b816ec6
Signed-off-by: Martin, Chen <haochuan.z.chen@intel.com>
Currently, all of the stx-openstack services have the
replica count set to the number of controllers.
If one of the controllers is locked, the replica
count is still 2, which is incorrect.
We solve this by setting the number of replicas
equal to the number of active controllers.
The rabbitmq service cannot use this approach because
it is unable to work properly if its replica count
is decreased from 2 to 1. So a kubernetes toleration
is used here to allow the second rabbitmq pod to be
deployed on the locked controller.
Change-Id: Ie979c7b5f2755ad673bd180e38b68e0d53c5f9b2
Closes-Bug: 1879018
Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
This bug was introduced by the commit below:
d3164c63dc
The update after PATCH SET 10 prevents the second mariadb from
joining the cluster. bind_address=:: cannot be set for ipv4; it
only works for ipv6.
As for conf.database.config_override, we can override it through
the system helm-override-update command, but cannot use the python
plugin to override it dynamically, as that introduces a "-|" line
as the first line of the config file.
A user override for conf.database.config_override might break the IPv6
system overrides, so it needs to include the ipv6 config for the ipv6
case as well.
Tests pass on a duplex setup. The openstack application applied
successfully.
Closes-Bug: 1886003
Change-Id: I23c2fb6a7c8b5a38af1e046894d5fae247df2d6f
Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>
Add "config_override" in configmap-etc.yaml for ipv6.
It could not be dynamically overridden in helm/mariadb.py.
Upstream patch: https://review.opendev.org/735277
Story: 2007474
Task: 39879
Change-Id: I9342e16fd98d0099e7e7043b46f00b2374203a51
Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>
Upgrade openstack-helm-infra to the version below.
commit 34d54f2812b7d54431d548cff08fe8da7f838124
Date: Sat Apr 11 15:24:54 2020 +0200
Cleanup py27 support and docs
The 3 patches below are removed as they are already merged.
Allow-multiple-containers-per-daemonset-pod.patch
Add-TLS-support-for-Gnocchi-public-endpoint.patch
Update ingress chart for Helm v3
Story: 2007474
Task: 39394
Change-Id: Icf624c8a0a6c74c8cfdb75ad45162e4a7aa5e404
Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>
This adds support for Helm v3.
- 'helm init' and initialization is no longer required
- 'chartmuseum' is used as a drop-in replacement for 'helm serve'
- all Charts require the tag: apiVersion: v1 (or v2)
This updates ingress chart to specify apiVersion.
Change-Id: Ie41cde4ad450b63a78a0a677995e9c28eefd9798
Story: 2007000
Task: 39327
Depends-On: https://review.opendev.org/719962
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
Add probe parameters so that armada can override them in duplex AIO and
multi-node deployments. Specifically, there are 2 mariadb-servers in
the DB cluster for OpenStack services in the duplex and multi-node cases.
These 2 mariadb-server pods are placed on Controller-0 and Controller-1
respectively (enforced by anti-affinity). Whenever one Controller is
rebooted on purpose or, even worse, accidentally shut down for any reason,
the mariadb-server pod on that controller goes down with it. To keep the
mariadb cluster working even with only one instance, we have to adjust
the default probe behaviors. To that end, we expose the probe
parameters for "startupProbe" and "readinessProbe" so that the StarlingX
Armada application can set these parameters accordingly, and thereby the
mariadb server can still work as expected with only one pod when a
Controller node reboots or shuts down.
Closes-bug: 1855474
Change-Id: I3a8a99edd44d7ac4257ddf79b6baba5c52714324
Signed-off-by: Hu, Yong <yong.hu@intel.com>
Co-Authored-By: Zhipeng, Liu <zhipengs.liu@intel.com>
When we use Armada to deploy openstack services for ipv6, the rabbitmq
pod cannot start listening on [::]:5672 and [::]:15672.
For ipv6, we need an override for the configuration file.
Upstream patch link is:
https://review.opendev.org/#/c/714027/
Partial-Bug: 1859641
Depends-on: https://review.opendev.org/#/c/714034/
Change-Id: I34e92afe291c4b7f31f53f1b974ad5fdc47b9560
Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>
nginx.tmpl does not enclose ipv6 addresses in square brackets,
so they cannot be parsed, which prevents the mariadb
ingress pod from becoming ready.
Tested on both ipv4 and ipv6 simplex setups; it fixes the mariadb
ingress not-ready issue.
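
The bracket rule can be illustrated with a short Python sketch. The real fix
lives in the nginx.tmpl template, not in Python; the function name here is
hypothetical.

```python
import ipaddress


def format_upstream(address: str, port: int) -> str:
    """Return 'host:port', wrapping IPv6 literals in square brackets."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        return f"[{address}]:{port}"
    return f"{address}:{port}"


print(format_upstream("fd00::1", 3306))   # [fd00::1]:3306
print(format_upstream("10.0.0.1", 3306))  # 10.0.0.1:3306
```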
Upstream patch submitted as below
https://review.opendev.org/#/c/710413/
Partial-Bug: 1859641
Change-Id: Ic7726eea671bbedf4f37fbe31965bc8fffd2e8cd
Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>
Add variables for initial delay, period and timeout for rabbitmq
liveness and readiness probes. Default to current upstream settings.
Do not recommend this for upstreaming to openstack-helm-infra as
enhancements have been added since the last starlingx rebase to enable
more generic override of probes. On next rebase of starlingx on
openstack-helm-infra, recommend refactoring this change based on these
upstream commits (assuming upstream hasn't done it already):
https://review.opendev.org/#/c/668710/
https://review.opendev.org/#/c/631597/
Partial-Bug: 1837426
Change-Id: I0a8d8f466c4b8482cc9161d28de37bff6fc7ced3
Signed-off-by: Gerry Kopec <gerry.kopec@windriver.com>
Here is the upstream patch link:
https://review.opendev.org/#/c/677425
Change-Id: I71b6d37b5e335ce9045937009fa93d47a49bcd4c
Partial-Bug: 1834796
Signed-off-by: Bin Yang <bin.yang@intel.com>
This is based on upstream(openstack-helm/openstack-helm-infra) review:
https://review.opendev.org/#/c/672966/
The only difference is that we do not carry over the Ceph changes
as we do not use Helm's Ceph in StarlingX.
Change-Id: Iabc3689bca198a861f2ade03a620895320897568
Closes-Bug: 1820902
Signed-off-by: Ovidiu Poncea <ovidiu.poncea@windriver.com>
This commit adds the capability for Aodh, Panko and Gnocchi
charts to support TLS on overridden FQDNs for public endpoints.
Upstream(openstack-helm/openstack-helm-infra) reviews:
https://review.opendev.org/#/c/670121/
https://review.opendev.org/#/c/670123/
Change-Id: I3011a9f0f07c9cf1b30694c97f3c02db6cdef56e
Partial-Bug: 1826583
Signed-off-by: Angie Wang <angie.wang@windriver.com>
The configmap is for the nginx ingress controller in mariadb
chart. With it, we enable the capability of overriding default
nginx configurations in the ingress controller.
Submitted this patch to upstream openstack-helm-infra also.
https://review.opendev.org/#/c/659560/
Closes-Bug: #1823803
Change-Id: Ibda2aef7413b4bf3cb990600463389a0b3661022
Signed-off-by: Yi Wang <yi.c.wang@intel.com>
In order to get swift working on containerized openstack,
changes were needed both on platform and application side.
From platform side, settings from ceph.conf file were replaced.
A runtime manifest was added to update ceph.conf after a successful
application apply:
1. Keystone auth url was updated with keystone openstack url
2. 'rgw_keystone_admin_domain' and 'rgw_keystone_project' settings
were updated with 'service'.
From application side the following changes have been implemented:
1. Ceph-rgw chart from openstack-helm-infra repo was included
in stx-openstack
2. A chart schema for ceph-rgw was added
3. An override file was generated
Change-Id: I7a17d55e1cb6cab2488237d923e02a3515379015
Signed-off-by: Elena Taivan <elena.taivan@windriver.com>
Story: 2003909
Task: 30607
Remove patches that were added on top of upstream to adapt helm to Ceph
Jewel.
Change-Id: I4d05a05ad116e33ee7c24432219c176c8a0b8d61
Depends-On: I815894e712c5ac7e2a3b83c7962a5a837e77e6df
Co-Authored-By: Robert Church <robert.church@windriver.com>
Signed-off-by: Daniel Badea <daniel.baeda@windriver.com>
Each patch included in this commit contains a commit message that
describes the required purpose of the patch.
Change-Id: Ia92158b77478c602e65280b09a744414c1bb31aa
Depends-On: Ic788a2c86edfbceca1f1ff18dd0344472546c81b
Story: 2004520
Task: 29966
Signed-off-by: Robert Church <robert.church@windriver.com>
In the docker image for mariadb-ingress if there are many cores
the calculated value for worker_rlimit_nofile ends up being 1024
which is too small. This change sets the min to 2048.
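
A sketch of the clamping idea, assuming the value is derived by dividing a
file descriptor limit among the worker processes. That formula and the names
below are assumptions; only the 2048 minimum comes from this change.

```python
def worker_rlimit_nofile(fd_limit: int, worker_processes: int,
                         minimum: int = 2048) -> int:
    """Per-worker open-file limit, never below the configured minimum."""
    calculated = fd_limit // max(worker_processes, 1)
    return max(calculated, minimum)


# With many cores the per-worker share drops to 1024 and is raised to 2048.
print(worker_rlimit_nofile(65536, 64))  # 2048
print(worker_rlimit_nofile(65536, 4))   # 16384
```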
Closes-Bug: 1816479
Change-Id: I4f198b703eda61d9a9531640ec01a2770f9ec172
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
The functionality of local docker registry authentication will be
enabled in commit https://review.openstack.org/#/c/626355/. However,
the OSH doesn't support a way to pass credentials to kubernetes to
pull images from a registry with authentication turned on.
This commit adds an "imagePullSecrets" field to the service account
template resource and references the well-known secret
"default-registry-key", which is created in sysinv during application
apply. With this change, kubernetes will pull images from the local
registry using this secret.
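
For illustration, the resulting service account resource would look roughly
like the dict built below. The function and the example names are
hypothetical; the secret name is the one referenced above.

```python
def service_account_manifest(name: str, namespace: str,
                             pull_secret: str = "default-registry-key") -> dict:
    """Build a ServiceAccount manifest that references an image pull secret."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
        "imagePullSecrets": [{"name": pull_secret}],
    }


manifest = service_account_manifest("example-sa", "openstack")
print(manifest["imagePullSecrets"])
```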
Note:
- This is a short-term solution. The long-term solution is to implement
the BP
https://blueprints.launchpad.net/openstack-helm/+spec/support-docker-registry-with-authentication-turned-on
which creates the secret in the chart and passes the secret to the
service account conditionally.
- It works with an unauthed registry and non-existent or existent
secret "default-registry-key" as well.
Change-Id: Icdff8b385cee7f8b0311086ae892b3b1edacea37
Story: 2002840
Task: 28945
Signed-off-by: Angie Wang <angie.wang@windriver.com>
When removing the mariadb release from the cluster, this upstream commit
produces mariadb-ingress pods that are stuck in the "Terminating" state
with the associated containers becoming hung. This ultimately impacts
certain docker operations leading to PLEG health issues in the cluster.
The root cause of this is that the ingress pod uses dumb-init to start
the nginx-ingress-controller process. When the mariadb-ingress pod
terminates (via kill -TERM 1) all child processes are terminated but the
docker-containerd-shim remains causing the hung container condition.
Temporarily reverting this commit. A fix will be introduced upstream
dealing with dumb-init and this commit will be picked up again on the
next full chart rebase.
Change-Id: I25ad9bc3213468a9060e741917d96d9ac5c01b40
Story: 2004520
Task: 29420
Signed-off-by: Robert Church <robert.church@windriver.com>
The mariadb startup script was trying to optimize the single-replica
case but missed the fact that the variable it was checking was a
string rather than an int.
Converting it to an int before doing the comparison makes it work
as expected.
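
The pitfall can be reproduced in a few lines. The function name is
hypothetical; the point is only that a string read from the environment
never compares equal to an int until it is converted.

```python
def is_single_replica(replicas_env: str) -> bool:
    """Convert the string value to int before comparing."""
    return int(replicas_env) == 1


replicas = "1"                      # values read from the environment are strings
print(replicas == 1)                # False: str compared with int
print(is_single_replica(replicas))  # True
```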
Change-Id: I0f920b52c5cc92672a71ee3db3d7f8e5700fb709
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
Story: 2004712
Task: 29385
To enable cold migration, we need to update the nova charts in
openstack-helm and the helm-toolkit chart in openstack-helm-infra. These
changes build on existing upstream components which attempt to add a
second container to the nova-compute pod that runs an sshd process
listening on port 8022.
Nova chart changes include:
- Fix bug in ssh-config mapping so config file is generated properly in
/root/.ssh/config in nova-compute container.
- Move private key from sshd container to nova-compute container.
- Map private and public ssh keys to new configmap-ssh which will
default to acceptable file permissions (400) for ssh. Keys will be
provided in overrides.
- Add additional config to /etc/ssh/sshd_config to allow passwordless
root logins over appropriate subnet passed in from overrides. This
is the same as what is done in nova puppet currently.
- Remove chmods from sshd bash script as they are failing. Function is
replaced by configmap-ssh.
To enable cold migration in nova helm chart, we need to allow multiple
containers within the same daemonset pod. This requires a patch to
the helm-toolkit _daemonset_overrides template to remove upstream
restriction. This issue is tracked upstream by storyboard 2003876.
These changes should be upstreamed but may require further refinement.
Story: 2003909
Task: 28927
Change-Id: Id789ba051cec019e8b7564c713cf1b5296ecf9f6
Signed-off-by: Gerry Kopec <Gerry.Kopec@windriver.com>
The spec files for openstack-helm-infra and openstack-helm
have been updated to not require networking, and therefore
can be built the same as other std targets rather than as
a container target.
helm init --client-only was using networking and DNS lookup.
This commit sets up helm without running that command.
Story: 2004005
Task: 28793
Change-Id: I35c9b547a98fac559793bc2ec00012f6eded8ffa
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>
Upstream master has some fixes for the mariadb chart that we hope will improve
behaviour under fault scenarios so import them into our load.
When we update the repo to the latest upstream we should pick these up and
the patches can be dropped.
Change-Id: I5bb367db76b6d00d9922a4b1bb32d87aaa37cf91
Story: 2004520
Task: 28388
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
This commit is to add a cron job for gnocchi to periodically purge the
deleted openstack resources with its associated metrics.
Put all gnocchi chart updates in one patch.
Change-Id: Iab426887a7a997d72950674a7fc1a86a4bef480f
Story: 2003909
Task: 27083
Signed-off-by: Angie Wang <angie.wang@windriver.com>
Rebase to a current version of the upstream repos. This will retire the
reverted commits that we needed to enable per host overrides as this was
fixed upstream.
Change-Id: Iacbdd666687b8bc12053f9d3dd833f9896a508cd
Depends-On: Iedb814ce0c72a59ab0ce5e72e4601082b61f82b0
Story: 2003909
Task: 27632
Signed-off-by: Robert Church <robert.church@windriver.com>
There are 3 patches for openstack-helm-infra based on
upstream SHA 5ec85a5d70fab468160d2fdafed1a2a7a5151405
There are 3 patches for openstack-helm based on
upstream SHA add7a9bc1175f6fafa8ea2918bc1d62209aaf243
Those patches will be removed as the commits are squashed
and merged by the containerization team.
Story: 2003909
Task: 27632
Depends-On: I5c761b9261e72783f1771492d653e641193f7c52
Depends-On: I57c5ec5f3565e9e585f0935af745e495699aa28c
Change-Id: I566f5f841397195024db7c636c1db2be7b2c8f4d
Signed-off-by: Al Bailey <Al.Bailey@windriver.com>