In I7bfdef3ea2128bbb4e26e3a00161fe30ce29b8e7
we disabled some jobs that involve scripts from
the OSH git repo because these scripts had to be
aligned with the new values_overrides location and
directory structure.
Change-Id: I7d0509051c8cd563a3269e21fe09eb56dcdb8f37
This is the action item to implement the spec:
doc/source/specs/2025.1/chart_versioning.rst
Also add the overrides environment variables:
- OSH_VALUES_OVERRIDES_PATH
- OSH_INFRA_VALUES_OVERRIDES_PATH
This commit temporarily disables all jobs that involve scripts
in the OSH git repo because they need to be updated to work
with the new values_overrides structure in the OSH-infra repo.
Once I4974785c904cf7c8730279854e3ad9b6b7c35498 is merged,
all these disabled test jobs must be re-enabled.
Depends-On: I327103c18fc0e10e989a17f69b3bff9995c45eb4
Change-Id: I7bfdef3ea2128bbb4e26e3a00161fe30ce29b8e7
This PS updates ceph-osd pod containers making
sure that osd pods are not stuck at deletion. In
this PS we are taking care of another background
process that has to be terminated by preStop
script.
Change-Id: Icebb6119225b4b88fb213932cc3bcf78d650848f
This PS updates ceph-osd pod containers making sure
that osd pods are not stuck at deletion.
It adds the missing lifecycle preStop action for the log-runner container.
Change-Id: I8d6853a457d3142c33ca6b5449351d9b05ffacda
This PS updates ceph-osd pod containers making sure
that osd pods are not stuck at deletion. It also
takes a similar approach and adds a lifecycle ondelete
hook to kill the log-runner container process before pod restart.
It also adds a wait_for_degraded_object function to the
helm-test pod to make sure that newly deployed pods
have joined the ceph cluster and it is safe to go
on with the next ceph-osd chart release upgrade.
Change-Id: Ib31a5e1a82526906bff8c64ce1b199e3495b44b2
Remove tini from the ceph daemon, as it didn't resolve
the issue with the log runner process. That issue will be
resolved in another change, in the post-apply job.
Change-Id: I4ebb1d12e736d387e6e34354619a532dd50dfeae
When the storage class name is specified as default, do not add the
storageClassName option, so that Kubernetes picks the default storage class.
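A minimal sketch of that rule, written in Python rather than in the
chart's own templating. The helper name build_pvc_spec and its
parameters are hypothetical and only illustrate the condition:

```python
def build_pvc_spec(size, storage_class=None):
    """Build a PersistentVolumeClaim spec dict (hypothetical helper)."""
    spec = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": size}},
    }
    # "default" (or no value) means: leave storageClassName out, so that
    # Kubernetes applies the StorageClass marked as the cluster default.
    if storage_class and storage_class != "default":
        spec["storageClassName"] = storage_class
    return spec
```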
Change-Id: I25c60e49ba770ce10ea2ec68c3555ffea49848fe
Allow setting terminationGracePeriodSeconds for the server instance to
give it more time to shut down all clients gracefully.
Increase the timeout to 600 seconds by default.
Change-Id: I1f4ba7d5ca50d1282cedfacffbe818af7ccc60f2
It was observed that under certain circumstances
galera instances can use the old IP address of the node
after a pod restart. This patch changes the value of the
wsrep_cluster_address variable: instead of listing
all DNS names of the cluster nodes, the discovery service
IP address is used. In this case cluster_node_address is set to an IP
address instead of a DNS name, otherwise the SST method will fail.
Co-Authored-By: Oleksii Grudev <ogrudev@mirantis.com>
Change-Id: I8059f28943150785abd48316514c0ffde56dfde5
The method was deprecated and later dropped; switch to is_alive().
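The commit does not name the dropped method. Assuming it is
threading.Thread.isAlive(), which was removed in Python 3.9, a small
sketch of the replacement call looks like this (the worker function
is a placeholder):

```python
import threading


def worker(stop_event):
    # Placeholder worker: block until asked to stop.
    stop_event.wait()


stop_event = threading.Event()
thread = threading.Thread(target=worker, args=(stop_event,))
thread.start()

# is_alive() is the supported spelling; isAlive() no longer exists
# on Thread objects in Python 3.9 and later.
alive_while_running = thread.is_alive()

stop_event.set()
thread.join()
alive_after_join = thread.is_alive()
```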
Co-Authored-By: dbiletskiy <dbiletskiy@mirantis.com>
Change-Id: Ie259d0e59c68c9884e85025b1e44bcd347f45eff
* Move all probes into single script to reduce code duplication
* Check the percentage of free disk space and fail when 99% is
  consumed, to avoid data corruption
* Do not restart container when SST is in progress
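The disk check from the list above could be sketched as follows. This
is an illustration only: the function names and the way the threshold
is passed are assumptions, not the probe script itself.

```python
import shutil


def disk_used_percent(path):
    """Return the percentage of the filesystem at `path` that is consumed."""
    usage = shutil.disk_usage(path)
    return 100.0 * (usage.total - usage.free) / usage.total


def disk_check_ok(used_percent, limit=99.0):
    """Fail the probe once consumption reaches the limit (99% by default)."""
    return used_percent < limit
```

A probe would call disk_check_ok(disk_used_percent(data_dir)) and exit
non-zero when it returns False, where data_dir is whatever directory
holds the database files.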
Change-Id: I6efc7596753dc988aa9edd7ade4d57107db98bdd
Make the 'data too old' timeout dependent on the state report interval.
Increase the timeout to 5 times the report interval.
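As a rough sketch of the relationship described here, with hypothetical
names (DATA_TOO_OLD_FACTOR, is_data_too_old) and timestamps passed in
explicitly:

```python
from datetime import datetime, timedelta, timezone

# The timeout is derived from the report interval instead of being fixed.
DATA_TOO_OLD_FACTOR = 5


def is_data_too_old(last_report, report_interval_s, now=None):
    """Return True when the last report is older than 5 report intervals."""
    if now is None:
        now = datetime.now(timezone.utc)
    timeout = timedelta(seconds=DATA_TOO_OLD_FACTOR * report_interval_s)
    return now - last_report > timeout
```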
Change-Id: I0c350f9e64b65546965002d0d6a1082fd91f6f58
Sometimes the "endpoints_dict" variable evaluates to None,
resulting in a "TypeError: 'NoneType' object is not iterable"
error. This patch catches the exception while getting
the list of endpoints and checks the value of
endpoints_dict. The number of active endpoints is also logged
for debugging purposes.
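The guard might look roughly like the sketch below. The callable
fetch_endpoints and the assumption that endpoints_dict is an iterable
keyed by endpoint are illustrative, not taken from the patch:

```python
import logging

logger = logging.getLogger(__name__)


def get_active_endpoints(fetch_endpoints):
    """Return a list of endpoints, or [] when none could be read."""
    try:
        endpoints_dict = fetch_endpoints()
    except Exception:
        logger.exception("Failed to get the list of endpoints")
        return []
    # The lookup itself can succeed and still yield None.
    if not endpoints_dict:
        logger.debug("No active endpoints found")
        return []
    active = list(endpoints_dict)
    logger.debug("Number of active endpoints: %d", len(active))
    return active
```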
Change-Id: I79f6b0b5ced8129b9a28c120b61e3ee050af4336
The retries were originally added at [0] but they never worked.
To avoid a race condition, we pass safe_update_configmap the fixed
revision that we expect to see during the patch. We can't organize retries
inside the function, as that would require changing the original revision,
which may happen only at the upper layer. Partially revert the patch.
[0] https://review.opendev.org/c/openstack/openstack-helm-infra/+/788886
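To illustrate why a retry inside the function cannot help, here is a
sketch with an in-memory stand-in for the configmap. FakeConfigMap,
safe_update and update_with_retries are hypothetical names and do not
reproduce the real safe_update_configmap signature:

```python
class RevisionConflict(Exception):
    """Raised when the stored revision differs from the expected one."""


class FakeConfigMap:
    """In-memory stand-in for a configmap with a revision counter."""

    def __init__(self):
        self.revision = 1
        self.data = {}


def safe_update(cm, expected_revision, data):
    # Retrying here with the same expected_revision would fail again:
    # the revision is fixed by the caller and never changes in this scope.
    if cm.revision != expected_revision:
        raise RevisionConflict(f"expected {expected_revision}, got {cm.revision}")
    cm.data.update(data)
    cm.revision += 1


def update_with_retries(cm, make_data, attempts=3):
    """Upper layer: re-read the revision and rebuild the data per attempt."""
    for _ in range(attempts):
        revision = cm.revision
        data = make_data(cm.data)
        try:
            safe_update(cm, revision, data)
            return True
        except RevisionConflict:
            continue
    return False
```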
Change-Id: I81850d5e534a3cfb3c4993275757c244caec8be9
Stop the monitor cluster and leader election threads on sigkill.
This allows all threads from start.py to terminate and the process to
actually exit earlier than terminationGracePeriod in the statefulset.
Drop the preStop hook, which is redundant with the stop_mysqld() function call.
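A minimal sketch of stopping background threads from a signal handler.
It uses SIGTERM, since SIGKILL cannot be caught by a handler; the
thread names and the worker_loop body are placeholders, not the code
from start.py:

```python
import signal
import threading

stop_event = threading.Event()


def worker_loop(stop, interval=0.05):
    """Stand-in for a monitor or leader election loop."""
    while not stop.is_set():
        # Periodic work would go here.
        stop.wait(interval)  # sleeps, but wakes at once when stop is set


def start_threads():
    threads = [
        threading.Thread(target=worker_loop, args=(stop_event,), name=name)
        for name in ("monitor_cluster", "leader_election")
    ]
    for t in threads:
        t.start()
    return threads


def handle_signal(signum, frame):
    # Ask every loop to finish so the process can exit promptly.
    stop_event.set()


def install_handler():
    # signal.signal() must be called from the main thread.
    signal.signal(signal.SIGTERM, handle_signal)
```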
Change-Id: Ibc4b7604f00b1c5b3a398370dafed4d19929fd7d
During cold start we pick the leader node by seqno. When a node is running
or finished non-gracefully, seqno may stay at -1 until the periodic task
updates it based on the local grastate.dat or detects the latest seqno via
wsrep_recover. This patch adds an indefinite waiter to the leader election
function that waits until all nodes report a seqno different from -1, to
make sure we detect the leader based on correct data.
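The waiter could be sketched as below. The get_seqnos callable, the
polling interval and the tie-breaking by node name are assumptions made
for the example:

```python
import time


def wait_for_seqnos(get_seqnos, poll_interval=1.0, sleep=time.sleep):
    """Block until every node reports a seqno other than -1."""
    while True:
        seqnos = get_seqnos()  # mapping: node name -> reported seqno
        if seqnos and all(s != -1 for s in seqnos.values()):
            return seqnos
        sleep(poll_interval)


def pick_leader(seqnos):
    """Pick the node with the highest seqno (first by name on a tie)."""
    return max(sorted(seqnos), key=lambda node: seqnos[node])
```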
Change-Id: Id042f6f4c915b21b905bde4d57d40e159d924772
Sometimes the pod fails to terminate correctly,
leaving zombie processes. Add option to use tini
to handle processes correctly. Additionally update
log-tail script to handle sigterm and sigint.
Change-Id: I96af2f3bef5f6c48858f1248ba85abdf7740279c
Recently we switched from Deployment to Statefulset
to make it possible to work with memcached instances
directly, without a load balancer. The strategy field is not
valid for statefulsets, so here we remove it.
Change-Id: I52db7dd4563639a55c12850147cf256cec8b1ee4
This commit adds the recommended Kubernetes name label to pod definitions.
This label is used by FluxCD operators to correctly look for the
status of every pod.
Change-Id: I866f1dfdb3ca8379682e090aca4c889d81579e5a
Signed-off-by: Johnny Chia <johnny.chialung@windriver.com>