[Spec] NFS related improvement for filesystem driver
Change-Id: Ic9b316a284641f4c5f33fe4238e08cf1d0faf2a1
parent 3bdda0e98f
commit 5940c59d44

 specs/2024.2/approved/glance/improve-filesystem-driver.rst | 230 (new file)
@@ -0,0 +1,230 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===========================================================
Improve filesystem store driver to utilize NFS capabilities
===========================================================

https://blueprints.launchpad.net/glance/+spec/improve-filesystem-driver

Problem description
===================

The filesystem backend of Glance can be used with an NFS share mounted as a
local filesystem, so no special configuration needs to be stored on the
Glance side. Glance does not know the NFS server address or the NFS share
path at all; it simply assumes that each image is stored in the local
filesystem. The downside of this assumption is that Glance is not aware of
whether the NFS server is connected/available or whether the NFS share is
mounted, and it keeps performing add/delete operations on the local
filesystem directory, which may later cause synchronization problems when
NFS comes back online.

Use case: In a k8s environment where OpenStack Glance is installed on
top of OpenShift and the NFS share is mounted via the `Volume/VolumeMount`
interface, the Glance pod won't start if the NFS share isn't ready. If,
however, the NFS share becomes unavailable after the Glance pod is up, the
upload operation fails with the following error::

    sh-5.1$ openstack image create --container-format bare --disk-format raw --file /tmp/cirros-0.5.2-x86_64-disk.img cirros
    ConflictException: 409: Client Error for url: https://glance-default-public-openstack.apps-crc.testing/v2/images/0ce1f894-5af7-44fa-987d-f4c47c77d0cf/file, Conflict

Even though the Glance Pod is still up, the `liveness` and `readiness`
probes start failing, and as a result the Glance Pods are marked as
`Unhealthy`::

    Normal   Started    12m                    kubelet  Started container glance-api
    Warning  Unhealthy  5m24s (x2 over 9m24s)  kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  5m24s (x3 over 9m24s)  kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  5m24s                  kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  4m54s (x2 over 9m24s)  kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  4m54s                  kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Later, according to the failure threshold set for the Pod, the kubelet
marks the Pod as Failed, and given that the restart policy is supposed to
recreate it, we can see the failure::

    glance-default-single-0   0/3   CreateContainerError   4 (3m39s ago)   28m

    $ oc describe pod glance-default-single-0 | tail
    Normal   Started    29m                    kubelet  Started container glance-api
    Warning  Unhealthy  10m (x3 over 26m)      kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  10m                    kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  10m                    kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  9m30s (x4 over 26m)    kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  9m30s (x5 over 26m)    kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  9m30s (x2 over 22m)    kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  9m30s (x3 over 22m)    kubelet  Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy  9m30s                  kubelet  Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    Warning  Failed     4m47s (x2 over 6m48s)  kubelet  Error: context deadline exceeded

This is unlike other (non-k8s) deployments, where even if the NFS share is
not available the Glance service keeps running and uploads or deletes data
from the local filesystem. In the k8s case we can definitely say that when
the NFS share is not available, Glance won't be able to upload any image to
the filesystem local to the container, the Pod will be marked as failed,
and it will fail to be recreated.

Proposed change
===============

We are planning to add a new plugin, `enable_by_files`, to the
`healthcheck` WSGI middleware in `oslo.middleware`, which can be used by
all OpenStack components: if any of the desired paths is not present, the
middleware reports a `503 <REASON>` error, and `200 OK` if everything
is fine.

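As a rough sketch, the new backend could be modeled on the existing
`disable_by_file` plugin in `oslo_middleware.healthcheck`; the class name
and option handling below are illustrative assumptions, not the final
implementation:

.. code-block:: python

    import os

    from oslo_middleware.healthcheck import pluginbase


    class EnableByFilesHealthcheck(pluginbase.HealthcheckBaseExtension):
        """Report healthy only while every configured marker file exists.

        Illustrative sketch; the merged oslo.middleware code may differ.
        """

        def healthcheck(self, server_port):
            # Assumes 'enable_by_file_paths' is registered as a list
            # option in the [healthcheck] section, as proposed above.
            paths = self._conf_get('enable_by_file_paths') or []
            for path in paths:
                if not os.path.exists(path):
                    return pluginbase.HealthcheckResult(
                        available=False,
                        reason='FILE PATH MISSING: %s' % path)
            return pluginbase.HealthcheckResult(available=True, reason='OK')
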
In Glance, we can configure this healthcheck middleware as an application
in glance-api-paste.ini:

.. code-block:: ini

    [app:healthcheck]
    paste.app_factory = oslo_middleware:Healthcheck.app_factory
    backends = enable_by_files (optional, default: empty)
    # used by the 'enable_by_files' backend
    enable_by_file_paths = /var/lib/glance/images/filename,/var/lib/glance/cache/filename (optional, default: empty)

    # Use this composite for keystone auth with caching and cache management
    [composite:glance-api-keystone+cachemanagement]
    paste.composite_factory = glance.api:root_app_factory
    /: api-keystone+cachemanagement
    /healthcheck: healthcheck

The middleware will return "200 OK" if everything is OK, or "503 <REASON>"
if not, with the reason why this API should not be used.

"backends" will be the names of stevedore extensions in the namespace
"oslo.middleware.healthcheck".

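For illustration, the middleware resolves those names through a stevedore
extension manager roughly as follows; this is a simplified sketch of the
loading step, not the exact oslo.middleware code:

.. code-block:: python

    from stevedore import named


    def load_healthcheck_backends(names, oslo_conf, conf):
        # Each name in 'backends' corresponds to an entry point registered
        # under the 'oslo.middleware.healthcheck' namespace, e.g.
        # "enable_by_files = <module>:EnableByFilesHealthcheck".
        return named.NamedExtensionManager(
            namespace='oslo.middleware.healthcheck',
            names=names,
            invoke_on_load=True,
            invoke_args=(oslo_conf, conf))
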
In Glance, if the local filesystem path is mounted on an NFS share, we
propose to add a marker file named `.glance` to the NFS share and then use
that file path to configure the `enable_by_files` healthcheck middleware
plugin as shown below:

.. code-block:: ini

    [app:healthcheck]
    paste.app_factory = oslo_middleware:Healthcheck.app_factory
    backends = enable_by_files
    enable_by_file_paths = /var/lib/glance/images/.glance

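The marker itself is just an ordinary file on the share: while the share is
mounted the path exists and the healthcheck passes; if the mount disappears,
so does the path. For illustration, a deployment tool could create it once
after mounting (the helper below is hypothetical):

.. code-block:: python

    import pathlib


    def create_glance_marker(share_root='/var/lib/glance/images'):
        # Hypothetical helper: drop the '.glance' marker file onto the
        # mounted NFS share so the healthcheck passes only while the
        # mount is actually present.
        marker = pathlib.Path(share_root) / '.glance'
        marker.touch(exist_ok=True)
        return marker
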
If NFS goes down, or the `/healthcheck` endpoint otherwise starts reporting
`503 <REASON>`, the admin can take appropriate actions to make the NFS
share available again.

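For example, a monitoring job could poll the endpoint and alert on a
non-200 response; the URL below is deployment-specific and assumes the
paste configuration above:

.. code-block:: python

    import requests

    resp = requests.get('http://localhost:9292/healthcheck', timeout=5)
    if resp.status_code == 200:
        print('glance-api healthy')
    else:
        # Per this spec, a 503 carries the reason in the body, e.g.
        # 'FILE PATH MISSING: /var/lib/glance/images/.glance'.
        print('glance-api unhealthy: %s' % resp.text)
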
Alternatives
------------

Introduce a few configuration options for the filesystem driver which will
help detect if the NFS share is unmounted from underneath the Glance
service. We propose to introduce the following new configuration options:

* `filesystem_is_nfs_configured` - boolean, whether NFS is configured or not
* `filesystem_nfs_host` - IP address of the NFS server
* `filesystem_nfs_share_path` - Mount path of the NFS share mapped to the local filesystem
* `filesystem_nfs_mount_options` - Mount options to be passed to the NFS client
* `rootwrap_config` - Rootwrap configuration used to run commands as the root user

If `filesystem_is_nfs_configured` is set, i.e. if NFS is configured, the
deployer must specify the `filesystem_nfs_host` and
`filesystem_nfs_share_path` config options in glance-api.conf; otherwise
the respective glance store will be disabled and will not be used for any
operation.

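A sketch of that startup validation, using the option names proposed above;
the exception type and config plumbing are assumptions based on existing
glance_store drivers:

.. code-block:: python

    from glance_store import exceptions


    def validate_nfs_config(store_conf):
        # Option names are the ones proposed in this spec.
        if not store_conf.filesystem_is_nfs_configured:
            return
        if not (store_conf.filesystem_nfs_host
                and store_conf.filesystem_nfs_share_path):
            raise exceptions.BadStoreConfiguration(
                store_name='filesystem',
                reason='filesystem_nfs_host and filesystem_nfs_share_path '
                       'are required when filesystem_is_nfs_configured '
                       'is set')
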
We are planning to use the existing os-brick library (already used by the
cinder driver of glance_store) to create the NFS client with the help of
the above configuration options, and to check whether the NFS share is
available during service initialization as well as before each image
upload/import/delete operation. If the NFS share is not available during
service initialization, add and delete operations will be disabled; if NFS
goes down afterwards, we will return an HTTP 410 (HTTP GONE) response to
the user.

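A simplified sketch of how the mount step could look with os-brick's
remotefs client; the root helper command, mount base, and helper name are
assumptions for illustration, not the final driver code:

.. code-block:: python

    from os_brick.remotefs import remotefs
    from oslo_concurrency import processutils


    def mount_nfs_share(host, share_path, mount_options, rootwrap_config):
        # The root helper command is an assumption; the real driver would
        # build it from the proposed 'rootwrap_config' option.
        root_helper = 'sudo glance-rootwrap %s' % rootwrap_config
        client = remotefs.RemoteFsClient(
            'nfs', root_helper,
            execute=processutils.execute,
            nfs_mount_point_base='/var/lib/glance/mnt',
            nfs_mount_options=mount_options)
        share = '%s:%s' % (host, share_path)
        client.mount(share)  # raises if the share cannot be mounted
        return client.get_mount_point(share)
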
Glance still doesn't have the capability to check beforehand whether a
particular NFS store has enough capacity to store a given image. It also
cannot detect a network failure that occurs during an upload/import
operation.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

Need to configure the healthcheck middleware for Glance.

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  abhishekk

Other contributors:
  None

Work Items
----------

* Add `enable_by_files` healthcheck backend in oslo.middleware
* Document how to configure the `enable_by_files` healthcheck middleware
* Unit/Functional tests for coverage

Dependencies
============

None

Testing
=======

* Unit Tests
* Functional Tests
* Tempest Tests

Documentation Impact
====================

Need to document the new behavior of the filesystem driver when NFS and
the healthcheck middleware are configured.

References
==========

* Oslo.Middleware Implementation - https://review.opendev.org/920055

@@ -6,7 +6,13 @@
 :glob:
 :maxdepth: 1

-TODO: fill this in once a new approved spec is added.
+2024.2 approved specs for glance:
+
+.. toctree::
+   :glob:
+   :maxdepth: 1
+
+   glance/*