df9ae04e54222414d576d831eccb6339c3b1e643
When a CephFS client misbehaves, the output of 'ceph health detail'
shows messages like the following:
HEALTH_WARN 1 clients failing to respond to capability release; 1 \
clients failing to advance oldest client/flush tid
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability \
release
mds.controller-0(mds.0): Client controller-0 failing to respond \
to capability release client_id: 774246
MDS_CLIENT_OLDEST_TID 1 clients failing to advance oldest \
client/flush tid
mds.controller-0(mds.0): Client controller-0 failing to advance \
its oldest client/flush tid. client_id: 774246
When this happens, the CephFS client cannot read from or write to
the volume. To restore communication, the client must be forced to
reconnect.
To force this reconnection, the client must be evicted by Ceph. The
client will be disconnected and added to the Ceph blacklist. After
clearing the blacklist, the client will reconnect to the Ceph cluster.
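The evict-and-clear sequence described above can be sketched as
follows. This is only an illustration, not the code added to
/etc/init.d/ceph: the function name, its arguments and the CEPH
variable are hypothetical, and the client address is assumed to be
known already (for example from 'ceph osd blacklist ls'). On Ceph
releases from Pacific onward the subcommand is named 'blocklist'
rather than 'blacklist'.

```shell
#!/bin/sh
# Hypothetical helper; not the code from /etc/init.d/ceph.
# CEPH defaults to the real CLI; set CEPH=echo for a dry run.
CEPH="${CEPH:-ceph}"

evict_and_unblacklist() {
    mds_name="$1"      # MDS daemon name, e.g. controller-0
    client_id="$2"     # client_id taken from the health detail output
    client_addr="$3"   # blacklist entry as listed by "ceph osd blacklist ls"

    # Evict the session; Ceph also adds the client to the blacklist.
    $CEPH tell "mds.${mds_name}" client evict "id=${client_id}" || return 1

    # Remove the blacklist entry so the client can reconnect.
    $CEPH osd blacklist rm "${client_addr}"
}
```

With CEPH=echo the function only prints the two commands it would
run, which is a convenient way to review them before touching a
live cluster.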
Hung-client detection and the eviction procedure are implemented
in the /etc/init.d/ceph script, as part of the MDS process status
check. The script looks for error output like the following:
mds.controller-0(mds.0): Client controller-0: failing to respond to \
capability release client_id: 774246
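A minimal sketch of how such a line could be matched and the
client_id extracted is shown here. The function name and the exact
pattern are assumptions made for illustration; the pattern actually
used by /etc/init.d/ceph may differ.

```shell
#!/bin/sh
# Hypothetical sketch; pattern and function name are illustrative.
# Reads "ceph health detail" text on stdin and prints the client_id of
# every client reported as failing to respond to capability release.
find_hung_clients() {
    grep 'failing to respond to capability release' |
        sed -n 's/.*client_id: *\([0-9][0-9]*\).*/\1/p' |
        sort -u
}
```

It would typically be fed from the live command, for example
'ceph health detail | find_hung_clients'. Summary lines that mention
the warning but carry no client_id produce no output.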
Test-Plan:
PASS: Start a pod reading from and writing to a cephfs pvc in a loop
PASS: Inject the error line into the Ceph health detail output, verify
that the detection is logged in ceph-process-states.log, and
confirm that the client is evicted and then reconnects.
Closes-bug: 2085648
Signed-off-by: Felipe Sanches Zanoni <Felipe.SanchesZanoni@windriver.com>
Change-Id: I2fad851652cf269b4ebb758b2dfdbe994f2a7b0c
integ
StarlingX Integration