starlingx/integ

Felipe Sanches Zanoni df9ae04e54 Add Ceph mds client hung detection

When there is a buggy cephfs client, the ceph health detail output
will show a message like the one below:

HEALTH_WARN 1 clients failing to respond to capability release; 1 \
              clients failing to advance oldest client/flush tid

MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability \
              release
    mds.controller-0(mds.0): Client controller-0 failing to respond \
              to capability release client_id: 774246

MDS_CLIENT_OLDEST_TID 1 clients failing to advance oldest \
              client/flush tid
    mds.controller-0(mds.0): Client controller-0 failing to advance \
              its oldest client/flush tid.  client_id: 774246

When this happens, the cephfs client cannot read from or write to
the volume. To restore communication, the client must be forced to
reconnect.

To force this reconnection, the client must be evicted by Ceph. The
client will be disconnected and added to the Ceph blacklist. After
clearing the blacklist, the client will reconnect to the Ceph cluster.

The client hung detection and the eviction procedure are implemented
in the /etc/init.d/ceph script when checking the status of the MDS
process. The script looks for error output like the following:

 mds.controller-0(mds.0): Client controller-0: failing to respond to \
     capability release client_id: 774246
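
The detection and eviction steps described above can be sketched roughly
as follows. This is a minimal illustration, not the code added to
/etc/init.d/ceph: the function names extract_hung_client_ids and
evict_hung_clients are hypothetical, the sketch assumes each client
report arrives on a single unwrapped line, and it assumes a Ceph release
that still uses the "blacklist" command names. Note that
"ceph osd blacklist clear" removes every blacklist entry, not only the
evicted client's.

```shell
#!/bin/bash

# Hypothetical helper: read `ceph health detail` output on stdin and print
# the client_id of each client reported as failing to respond to
# capability release, one id per line, de-duplicated.
extract_hung_client_ids() {
    grep -E 'failing to respond to capability release' \
        | sed -n 's/.*client_id: *\([0-9][0-9]*\).*/\1/p' \
        | sort -u
}

# Hypothetical helper: evict every hung client through the given MDS
# (for example "controller-0"), then clear the blacklist so that the
# evicted clients are allowed to reconnect.
evict_hung_clients() {
    local mds_name="$1"
    local id
    ceph health detail 2>/dev/null | extract_hung_client_ids \
        | while read -r id; do
            ceph tell "mds.${mds_name}" client evict "id=${id}"
        done
    # Assumption: clearing the whole blacklist is acceptable here.
    ceph osd blacklist clear
}
```

Keeping the parsing in its own function means it can be exercised with
injected health output, as in the test plan below, without touching a
live cluster.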

Test-Plan:
  PASS: Start a pod reading from and writing to a cephfs pvc in a loop
  PASS: Inject the error line into the Ceph health detail output, verify
        that the detection appears in the ceph-process-states.log file,
        and check that the client is evicted and then reconnects.

Closes-bug: 2085648

Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: I2fad851652cf269b4ebb758b2dfdbe994f2a7b0c

2024-11-06 14:51:36 +00:00

Fix HA clock selection algorithm

2024-10-16 18:19:55 -03:00

Merge "Remove CentOS/OpenSUSE build support"

2024-05-22 15:14:42 +00:00

centos-debian-compat

Update integ debian package ver based on git

2023-03-01 18:53:50 +00:00

Add Ceph mds client hung detection

2024-11-06 14:51:36 +00:00

Dynamize Postgres Auth Method Definition

2024-09-26 20:04:41 +02:00

database/mariadb/debian

Remove CentOS/OpenSUSE build support

2024-05-01 16:39:19 -04:00

Relocated some packages to repo 'utilities'

2019-09-05 20:31:36 -04:00

Fix tox-docs failing sphinx

2023-08-29 16:52:04 -03:00

docker/python-docker/debian

Remove CentOS/OpenSUSE build support

2024-05-01 16:39:19 -04:00

Remove CentOS/OpenSUSE build support

2024-05-01 16:39:19 -04:00

golang-github-dev

fix golang-github-golang-jwt-jwt-dev url

2024-01-31 10:39:34 -05:00

gpu/gpu-operator

Remove CentOS/OpenSUSE build support

2024-05-01 16:39:19 -04:00

Remove CentOS/OpenSUSE build support

2024-05-01 16:39:19 -04:00

kata-containers/debian

Changing Kata-container files path.

2024-10-28 15:58:45 -03:00

Set build size advisories

2024-09-26 13:26:58 -04:00

Merge "Change ldapsetpasswd error message"

2024-06-03 16:24:57 +00:00

Correct support for gateway address checking in dual-stack

2024-09-09 10:30:31 -03:00

Make sure the default driver is the in-tree drivers

2024-10-17 01:08:52 +00:00

Debian: setuptools: fix CVE-2022-40897/CVE-2024-6345

2024-09-26 16:52:28 +08:00

Switch to newer openstackdocstheme and reno versions

2020-06-04 14:28:48 +02:00

requests-toolbelt

Remove CentOS/OpenSUSE build support

2024-05-01 16:39:19 -04:00

Remove CentOS/OpenSUSE build support

2024-05-01 16:39:19 -04:00

storage-drivers/trident-installer/debian

Remove CentOS/OpenSUSE build support

2024-05-01 16:39:19 -04:00

Fix crashdump could not generate after kernel upgrade

2024-11-02 00:11:19 +08:00

Remove CentOS/OpenSUSE build support

2024-05-01 16:39:19 -04:00

.gitignore

Add Docker Registry Token Server

2019-01-08 11:42:04 -05:00

.gitreview

OpenDev Migration Patch

2019-04-19 19:52:31 +00:00

.pylintrc

tox: fixed warnings

2023-09-06 17:54:55 -03:00

.yamllint

Add .yamllint file

2021-09-09 19:05:36 +03:00

.zuul.yaml

Update pylint test to use debian-bullseye nodeset

2024-08-23 12:33:23 -05:00

bindep.txt

Fix pylint zuul jobs failing due to libvirt-python and pkgconfig

2019-07-04 14:14:39 -05:00

CONTRIBUTORS.wrs

StarlingX open source release updates

2018-05-31 07:36:35 -07:00

debian_build_layer.cfg

Add debian_build_layer.cfg file

2021-10-05 14:08:19 -04:00

debian_iso_image.inc

Fix crashdump could not generate after kernel upgrade

2024-11-02 00:11:19 +08:00

debian_pkg_dirs

Fix crashdump could not generate after kernel upgrade

2024-11-02 00:11:19 +08:00

debian_stable_docker_images.inc

Disable n3000 container build

2024-07-03 10:43:52 -04:00

distroless_stable_docker_images.inc

Remove Armada related packages from stx build

2023-09-27 18:58:13 +00:00

LICENSE

StarlingX open source release updates

2018-05-31 07:36:35 -07:00

README.rst

Followup opendev cleanup and test jobs

2019-04-21 09:23:19 -05:00

test-requirements.txt

Update pylint test to use debian-bullseye nodeset

2024-08-23 12:33:23 -05:00

tox.ini

Update pylint test to use debian-bullseye nodeset

2024-08-23 12:33:23 -05:00

README.rst

integ

StarlingX Integration

Languages

JavaScript 31.7%

Shell 27.2%

Python 17.3%

Perl 9.4%

Makefile 5.5%

Other 8.8%