[Spec] NFS related improvement for filesystem driver
Change-Id: Ic9b316a284641f4c5f33fe4238e08cf1d0faf2a1

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===========================================================
Improve filesystem store driver to utilize NFS capabilities
===========================================================

https://blueprints.launchpad.net/glance/+spec/improve-filesystem-driver

Problem description
===================

The filesystem backend of glance can be used to mount an NFS share as a
local filesystem, so no special configuration needs to be stored on the
glance side. Glance does not care about the NFS server address or the
NFS share path at all; it simply assumes that each image is stored in
the local filesystem. The downside of this assumption is that glance is
not aware of whether the NFS server is connected/available or whether
the NFS share is mounted, and it just keeps performing add/delete
operations on the local filesystem directory, which might later cause
synchronization problems when NFS comes back online.

Use case: In a k8s environment where OpenStack Glance is installed on
top of OpenShift and the NFS share is mounted via the `Volume/VolumeMount`
interface, the Glance pod won't start if the NFS share isn't ready.
Whereas if the NFS share becomes unavailable after the Glance pod is
running, the upload operation will fail with the following error::

  sh-5.1$ openstack image create --container-format bare --disk-format raw --file /tmp/cirros-0.5.2-x86_64-disk.img cirros
  ConflictException: 409: Client Error for url: https://glance-default-public-openstack.apps-crc.testing/v2/images/0ce1f894-5af7-44fa-987d-f4c47c77d0cf/file, Conflict

Even though the Glance Pod is still up, the `liveness` and `readiness`
probes start failing and as a result the Glance Pods are marked as
`Unhealthy`::

  Normal   Started    12m                    kubelet  Started container glance-api
  Warning  Unhealthy  5m24s (x2 over 9m24s)  kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  5m24s (x3 over 9m24s)  kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  5m24s                  kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  4m54s (x2 over 9m24s)  kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  4m54s                  kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Later in time, according to the failure threshold set for the Pod, the
kubelet marks the Pod as Failed, and given that the restart policy is
supposed to recreate it, we can see the failure::

  glance-default-single-0   0/3   CreateContainerError   4 (3m39s ago)   28m

  $ oc describe pod glance-default-single-0 | tail
  Normal   Started    29m                     kubelet  Started container glance-api
  Warning  Unhealthy  10m (x3 over 26m)       kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  10m                     kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  10m                     kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  9m30s (x4 over 26m)     kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  9m30s (x5 over 26m)     kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  9m30s (x2 over 22m)     kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  9m30s (x3 over 22m)     kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  9m30s                   kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Failed     4m47s (x2 over 6m48s)   kubelet  Error: context deadline exceeded

Unlike other deployments (deployment != k8s), where the glance service
keeps running even if the NFS share is not available and uploads or
deletes the data from the local filesystem, in this case we can
definitely say that when the NFS share is not available Glance won't be
able to upload any image to the filesystem local to the container, the
Pod will be marked as failed, and it will fail to be recreated.

Proposed change
===============

We propose to use the `statvfs(path)` function of the inbuilt `os`
library to record the `f_fsid` attribute at the start of the glance-api
service, when the filesystem store is initialized. If the local
directory is mounted to an NFS share, then `statvfs(path)` returns Zero
for `f_fsid`; otherwise it returns a Non Zero value.

For example, if the local FS `/opt/stack/data/glance/images` is mounted
to an NFS share::

  $ df -h
  10.0.108.117:/mnt/nfsshare_glance  117G   44G   73G  38% /opt/stack/data/glance/images

  >>> import os
  >>> info = os.statvfs('/opt/stack/data/glance/images')
  >>> print(info.f_fsid)
  0

Whereas if the local FS `/opt/stack/data/glance/images` is not mounted
as an NFS share::

  >>> import os
  >>> info = os.statvfs('/opt/stack/data/glance/images')
  >>> print(info.f_fsid)
  3294141091232417704

So we can record this `f_fsid` value at service startup and, if it is
Zero, assume that an NFS share is configured for glance. We will
retrieve this value again while adding image data to the filesystem
store or deleting data from it. If the newly retrieved value is Non
Zero then we will simply abort the operation and return HTTP 400 to the
end user.

If the `f_fsid` is found to be Non Zero at service start then we will
ignore it, considering that a local filesystem is used for Glance
storage.
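
The startup and per-operation checks described above can be sketched as
follows. This is a minimal illustration only, with hypothetical names
(`FilesystemStore`, `_check_nfs_available`, `NFSShareUnavailable`), not
the actual glance_store implementation; it assumes, as stated above,
that a Zero `f_fsid` indicates an NFS-backed directory::

  import os

  class NFSShareUnavailable(Exception):
      """Raised when the NFS share backing the store has gone away."""

  class FilesystemStore:
      def __init__(self, datadir):
          self.datadir = datadir
          # Record f_fsid at startup; Zero is assumed to mean the
          # datadir is backed by an NFS share.
          self.nfs_configured = os.statvfs(datadir).f_fsid == 0

      def _check_nfs_available(self):
          # Skip the check entirely for a plain local filesystem.
          if not self.nfs_configured:
              return
          # A Non Zero f_fsid now means the NFS share was unmounted
          # underneath us; abort before writing to the local directory.
          if os.statvfs(self.datadir).f_fsid != 0:
              raise NFSShareUnavailable(self.datadir)

      def add(self, image_id, data):
          # The real driver would map this error to an HTTP 400 response.
          self._check_nfs_available()
          path = os.path.join(self.datadir, image_id)
          with open(path, 'wb') as f:
              f.write(data)
          return path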

Alternatives
------------

Introduce a few configuration options for the filesystem driver which
will help to detect whether the NFS share is unmounted from underneath
the Glance service. We propose to introduce the below new configuration
options for the same:

* `filesystem_is_nfs_configured` - boolean, verify if NFS is configured or not
* `filesystem_nfs_host` - IP address of the NFS server
* `filesystem_nfs_share_path` - Mount path of NFS mapped with the local filesystem
* `filesystem_nfs_mount_options` - Mount options to be passed to the NFS client
* `rootwrap_config` - To run commands as the root user

If `filesystem_is_nfs_configured` is set, i.e. if NFS is configured,
then the deployer must specify the `filesystem_nfs_host` and
`filesystem_nfs_share_path` config options in glance-api.conf, otherwise
the respective glance store will be disabled and will not be used for
any operation.
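
For illustration, a glance-api.conf fragment using these proposed
options might look like the following. The option names are as proposed
above and do not exist yet; the host and share path values are taken
from the earlier `df` example and are purely illustrative::

  [glance_store]
  filesystem_store_datadir = /opt/stack/data/glance/images
  filesystem_is_nfs_configured = True
  filesystem_nfs_host = 10.0.108.117
  filesystem_nfs_share_path = /mnt/nfsshare_glance
  filesystem_nfs_mount_options = vers=4.1
  rootwrap_config = /etc/glance/rootwrap.conf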

We are planning to use the existing os-brick library (already used by
the cinder driver of glance_store) to create the NFS client with the
help of the above configuration options and to check whether the NFS
share is available during service initialization as well as before each
image upload/import/delete operation. If the NFS share is not available
during service initialization then add and delete operations will be
disabled, but if NFS goes down afterwards we will raise an HTTP 410
(HTTP GONE) response to the user.

Glance still doesn't have the capability to check beforehand whether a
particular NFS store has enough storage capacity to hold a particular
image. It also does not have the capability to detect a network failure
that occurs during an upload/import operation.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

The performance impact will be minimal: on each add/delete operation we
will call `os.statvfs()` and compare the `f_fsid` to validate whether
the NFS share is available or not.
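
As a rough sanity check of that claim, the cost of the extra syscall can
be measured directly (illustrative snippet, not part of the driver)::

  import os
  import timeit

  # Time 1000 statvfs calls; each is a single cheap syscall, so the
  # per-operation overhead added to add/delete is tiny compared to
  # transferring image data.
  elapsed = timeit.timeit(lambda: os.statvfs('/tmp'), number=1000)
  print('avg per call: %.1f us' % (elapsed / 1000 * 1e6))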

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  abhishekk

Other contributors:
  None

Work Items
----------

* Modify the Filesystem Store init process to record `f_fsid`

* Modify the add and delete operations to compare `f_fsid`

* Unit/Functional tests for coverage

Dependencies
============

None

Testing
=======

* Unit Tests
* Functional Tests
* Tempest Tests

Documentation Impact
====================

Need to document the new behavior of the filesystem driver when NFS is
configured.

References
==========

* os.statvfs - https://docs.python.org/3/library/os.html#os.statvfs

   :glob:
   :maxdepth: 1

2024.2 approved specs for glance:

.. toctree::
   :glob:
   :maxdepth: 1

   glance_store/*