[Spec] NFS related improvement for filesystem driver
Change-Id: Ic9b316a284641f4c5f33fe4238e08cf1d0faf2a1
This commit is contained in:
parent
3bdda0e98f
commit
5940c59d44
specs/2024.2/approved
230
specs/2024.2/approved/glance/improve-filesystem-driver.rst
Normal file
230
specs/2024.2/approved/glance/improve-filesystem-driver.rst
Normal file
@ -0,0 +1,230 @@
|
|||||||
|
..
|
||||||
|
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||||
|
License.
|
||||||
|
|
||||||
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||||
|
|
||||||
|
===========================================================
|
||||||
|
Improve filesystem store driver to utilize NFS capabilities
|
||||||
|
===========================================================
|
||||||
|
|
||||||
|
https://blueprints.launchpad.net/glance/+spec/improve-filesystem-driver
|
||||||
|
|
||||||
|
Problem description
|
||||||
|
===================
|
||||||
|
|
||||||
|
The filesystem backend of glance can be used to mount NFS share as local
|
||||||
|
filesystem, so it is not required to store any special configs at
|
||||||
|
glance side. Glance does not care about NFS server address or NFS share
|
||||||
|
path at all, it just assumes that each image is stored in the local
|
||||||
|
filesystem. The downside of this assumption is that glance is not
|
||||||
|
aware whether NFS server is connected/available or not, NFS share
|
||||||
|
is mounted or not and just keeps performing add/delete operations
|
||||||
|
on local filesystem directory which later might causes problem
|
||||||
|
in synchronization when NFS is back online.
|
||||||
|
|
||||||
|
Use case: In a k8s environment where OpenStack Glance is installed on
|
||||||
|
top of OpenShift and NFS share is mounted via the `Volume/VolumeMount`
|
||||||
|
interface, the Glance pod won't start if NFS share isn't ready. Whereas
|
||||||
|
if NFS share is not available after Glance pod is available then
|
||||||
|
upload operation will fail with following error::
|
||||||
|
|
||||||
|
sh-5.1$ openstack image create --container-format bare --disk-format raw --file /tmp/cirros-0.5.2-x86_64-disk.img cirros
|
||||||
|
ConflictException: 409: Client Error for url: https://glance-default-public-openstack.apps-crc.testing/v2/images/0ce1f894-5af7-44fa-987d-f4c47c77d0cf/file, Conflict
|
||||||
|
|
||||||
|
Even though the Glance Pod is still up, `liveness` and `readiness` probes
|
||||||
|
starts failing and as a result the Glance Pods are marked as `Unhealthy`::
|
||||||
|
|
||||||
|
Normal Started 12m kubelet Started container glance-api
|
||||||
|
Warning Unhealthy 5m24s (x2 over 9m24s) kubelet Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 5m24s (x3 over 9m24s) kubelet Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 5m24s kubelet Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 4m54s (x2 over 9m24s) kubelet Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 4m54s kubelet Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
|
||||||
|
Later in time, according to the failure threshold set for the Pod,
|
||||||
|
the kubelet marks the Pod as Failed, and we can see a failure, and
|
||||||
|
given that the policy is supposed to recreate it::
|
||||||
|
|
||||||
|
glance-default-single-0 0/3 CreateContainerError 4 (3m39s ago) 28m
|
||||||
|
|
||||||
|
$ oc describe pod glance-default-single-0 | tail
|
||||||
|
Normal Started 29m kubelet Started container glance-api
|
||||||
|
Warning Unhealthy 10m (x3 over 26m) kubelet Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 10m kubelet Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 10m kubelet Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 9m30s (x4 over 26m) kubelet Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 9m30s (x5 over 26m) kubelet Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 9m30s (x2 over 22m) kubelet Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 9m30s (x3 over 22m) kubelet Readiness probe failed: Get "https://10.217.0.247:9292/healthcheck": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Unhealthy 9m30s kubelet Liveness probe failed: Get "https://10.217.0.247:9292/healthcheck": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
|
||||||
|
Warning Failed 4m47s (x2 over 6m48s) kubelet Error: context deadline exceeded
|
||||||
|
|
||||||
|
Unlike other deployments (deployment != k8s) where even if NFS share is not
|
||||||
|
available the glance service keeps running and uploads or deletes the data
|
||||||
|
from local filesystem. In this case we can definitely say that NFS share is
|
||||||
|
not available, the Glance won't be able to upload any image in the
|
||||||
|
filesystem local to the container and the Pod will be marked as failed and
|
||||||
|
it fails to be recreated.
|
||||||
|
|
||||||
|
Proposed change
|
||||||
|
===============
|
||||||
|
|
||||||
|
We are planning to add new plugin `enable_by_files` to `healthcheck`
|
||||||
|
wsgi middleware in `oslo.middleware` which can be used by all openstack
|
||||||
|
components to check if desired path is not present then report
|
||||||
|
`503 <REASON>` error or `200 OK` if everything is OK.
|
||||||
|
|
||||||
|
In glance we can configure this healthcheck middleware as an application
|
||||||
|
in glance-api-paste.ini as an application:
|
||||||
|
|
||||||
|
.. code-block:: ini
|
||||||
|
|
||||||
|
[app:healthcheck]
|
||||||
|
paste.app_factory = oslo_middleware:Healthcheck.app_factory
|
||||||
|
backends = enable_by_files (optional, default: empty)
|
||||||
|
# used by the 'enable_by_files' backend
|
||||||
|
enable_by_file_paths = /var/lib/glance/images/filename,/var/lib/glance/cache/filename (optional, default: empty)
|
||||||
|
|
||||||
|
# Use this composite for keystone auth with caching and cache management
|
||||||
|
[composite:glance-api-keystone+cachemanagement]
|
||||||
|
paste.composite_factory = glance.api:root_app_factory
|
||||||
|
/: api-keystone+cachemanagement
|
||||||
|
/healthcheck: healthcheck
|
||||||
|
|
||||||
|
The middleware will return "200 OK" if everything is OK,
|
||||||
|
or "503 <REASON>" if not with the reason of why this API should not be used.
|
||||||
|
|
||||||
|
"backends" will the name of a stevedore extentions in the namespace
|
||||||
|
"oslo.middleware.healthcheck".
|
||||||
|
|
||||||
|
In glance, if local filesystem path is mounted on NFS share then we
|
||||||
|
propose to add one marker file named `.glance` to NFS share and then
|
||||||
|
use that file path to configure `enable_by_files` healthcheck
|
||||||
|
middleware plugin as shown below:
|
||||||
|
|
||||||
|
.. code-block:: ini
|
||||||
|
|
||||||
|
[app:healthcheck]
|
||||||
|
paste.app_factory = oslo_middleware:Healthcheck.app_factory
|
||||||
|
backends = enable_by_files
|
||||||
|
enable_by_file_paths = /var/lib/glance/images/.glance
|
||||||
|
|
||||||
|
If NFS goes down or somehow the `/healthcheck` starts reporting
|
||||||
|
`503 <REASON>` admin can take appropriate actions to make NFS
|
||||||
|
share available again.
|
||||||
|
|
||||||
|
Alternatives
|
||||||
|
------------
|
||||||
|
|
||||||
|
Introduce few configuration options for filesystem driver which will help to
|
||||||
|
detect if the NFS share is unmounted from underneath the Glance service. We
|
||||||
|
proposed to introduce below new configuration options for the same:
|
||||||
|
|
||||||
|
* `filesystem_is_nfs_configured` - boolean, verify if NFS is configured or not
|
||||||
|
* `filesystem_nfs_host` - IP address of NFS server
|
||||||
|
* `filesystem_nfs_share_path` - Mount path of NFS mapped with local filesystem
|
||||||
|
* `filesystem_nfs_mount_options` - Mount options to be passed to NFS client
|
||||||
|
* `rootwrap_config` - To run commands as root user
|
||||||
|
|
||||||
|
If `filesystem_is_nfs_configured` is set, i.e. if NFS is configured then
|
||||||
|
deployer must specify `filesystem_nfs_host` and `filesystem_nfs_share_path`
|
||||||
|
config options in glance-api.conf otherwise the respective glance store will
|
||||||
|
be disabled and will not be used for any operation.
|
||||||
|
|
||||||
|
We are planning to use existing os-brick library (already used by cinder driver
|
||||||
|
of glance_store) to create the NFS client with the help of above configuration
|
||||||
|
options and check if NFS share is available or not during service
|
||||||
|
initialization as well as before each image upload/import/delete operation. If
|
||||||
|
NFS share is not available during service initialization then add and delete
|
||||||
|
operations will be disabled but if NFS goes down afterwards we will raise
|
||||||
|
HTTP 410 (HTTP GONE) response to the user.
|
||||||
|
|
||||||
|
Glance still doesn't have capability to check whether particular NFS store has
|
||||||
|
storage capability to store any particular image beforehand. Also it does not
|
||||||
|
have capability to verify if network failure occurs during upload/import
|
||||||
|
operation.
|
||||||
|
|
||||||
|
Data model impact
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
REST API impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Security impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Notifications impact
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Other end user impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Performance Impact
|
||||||
|
------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Other deployer impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
Need to configure healthcheck middleware for glance.
|
||||||
|
|
||||||
|
Developer impact
|
||||||
|
----------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Implementation
|
||||||
|
==============
|
||||||
|
|
||||||
|
Assignee(s)
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Primary assignee:
|
||||||
|
abhishekk
|
||||||
|
|
||||||
|
Other contributors:
|
||||||
|
None
|
||||||
|
|
||||||
|
Work Items
|
||||||
|
----------
|
||||||
|
|
||||||
|
* Add `enable_by_files` healthcheck backend in oslo.middleware
|
||||||
|
|
||||||
|
* Document how to configure `enable_by_files` healthcheck middleware
|
||||||
|
|
||||||
|
* Unit/Functional tests for coverage
|
||||||
|
|
||||||
|
Dependencies
|
||||||
|
============
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Testing
|
||||||
|
=======
|
||||||
|
|
||||||
|
* Unit Tests
|
||||||
|
* Functional Tests
|
||||||
|
* Tempest Tests
|
||||||
|
|
||||||
|
Documentation Impact
|
||||||
|
====================
|
||||||
|
|
||||||
|
Need to document new behavior of filesystem driver if NFS and healthcheck
|
||||||
|
middleware is configured.
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
* Oslo.Middleware Implementation - https://review.opendev.org/920055
|
@ -6,7 +6,13 @@
|
|||||||
:glob:
|
:glob:
|
||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
|
|
||||||
TODO: fill this in once a new approved spec is added.
|
2024.2 approved specs for glance:
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:glob:
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
glance/*
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Loading…
x
Reference in New Issue
Block a user