Document management and history of lock files
Something that comes up regularly on the mailing list is the topic of leftover lock files from interprocess locks. An Oslo developer then has to explain why they work the way they do, and this is a not-inconsiderable time suck because it is a complex topic. This change adds an administrator guide that explains what should and should not be done with lock files, and provides an FAQ section to (hopefully) pre-emptively answer many of the common questions people ask. Change-Id: I5442806ef2f107e3b7569a79bffdc344769545b0
This commit is contained in:
parent
85c341aced
commit
f6bf926db7
96
doc/source/admin/index.rst
Normal file
96
doc/source/admin/index.rst
Normal file
@ -0,0 +1,96 @@
|
||||
=====================
|
||||
Administrator Guide
|
||||
=====================
|
||||
|
||||
This section contains information useful to administrators operating a service
|
||||
that uses oslo.concurrency.
|
||||
|
||||
Lock File Management
|
||||
====================
|
||||
|
||||
For services that use oslo.concurrency's external lock functionality for
|
||||
interprocess locking, lock files will be stored in the location specified
|
||||
by the ``lock_path`` config option in the ``oslo_concurrency`` section.
|
||||
These lock files are not automatically deleted by oslo.concurrency because
|
||||
the library has no way to know when the service is done with a lock, and
|
||||
deleting a lock file that is being held by the service would cause
|
||||
concurrency problems. Some services do delete lock files when they are done
|
||||
with them, but deletion of a service's lock files while the service is
|
||||
running should only be done by the service itself. External cleanup methods
|
||||
cannot reasonably know when a lock is no longer needed.
|
||||
|
||||
However, to prevent the ``lock_path`` directory from growing indefinitely,
|
||||
it is a good idea to occasionally delete all the lock files from it. The only
|
||||
safe time to do this is when the service is not running, such as after a
|
||||
reboot or when the service is down for maintenance. In the latter case, make
|
||||
sure that all related services (such as api, worker, conductor, etc) are down
|
||||
If any process that might hold locks is still running, deleting lock files may
|
||||
introduce inconsistency in the service. One possible approach to this cleanup
|
||||
is to put the ``lock_path`` in tmpfs so it will be automatically cleared on
|
||||
reboot.
|
||||
|
||||
Note that in general, leftover lock files are a cosmetic nuisance at worst.
|
||||
If you do run into a functional problem as a result of large numbers of
|
||||
lock files, please report it to the Oslo team so we can look into other
|
||||
mitigation strategies.
|
||||
|
||||
Frequently Asked Questions
|
||||
==========================
|
||||
|
||||
What is the history of the lock file issue?
|
||||
-------------------------------------------
|
||||
|
||||
It comes up every few months when a deployer of OpenStack notices that they
|
||||
have a large number of lock files lying around, apparently unused. A thread
|
||||
is started on the mailing list and one of the Oslo developers has to provide
|
||||
an explanation of why it works the way it does. This FAQ is intended to be an
|
||||
official replacement for the one-off explanation that is usually given.
|
||||
|
||||
The code responsible for this behavior has actually moved to the
|
||||
`fasteners <https://github.com/harlowja/fasteners>`_ project, and there
|
||||
is an
|
||||
`issue addressing the leftover lock files <https://github.com/harlowja/fasteners/issues/26>`_
|
||||
there. It covers much of the technical history of the problem, as well as
|
||||
some proposed solutions.
|
||||
|
||||
Why hasn't this been fixed yet?
|
||||
-------------------------------
|
||||
|
||||
Because to the Oslo developers' knowledge, no one has ever had a functional
|
||||
issue as a result of leftover lock files. This makes it a lower priority
|
||||
problem, and because of the complexity of fixing it nobody has been able to
|
||||
yet. If functional issues are found, they should be reported as a bug
|
||||
against oslo.concurrency so they can be tracked. In the meantime, this will
|
||||
likely continue to be treated as a cosmetic annoyance and prioritized
|
||||
appropriately.
|
||||
|
||||
Why aren't lock files deleted when the lock is released?
|
||||
--------------------------------------------------------
|
||||
|
||||
In our testing, when a lock file was deleted while another process was waiting
|
||||
for it, it created a sort of split-brain situation between any process that had
|
||||
been waiting for the deleted file, and any process that attempted to lock the
|
||||
file after it had been deleted. Essentially, two processes could end up holding
|
||||
the same lock at the same time, which made this an unacceptable solution.
|
||||
|
||||
Why don't you use some other method of interprocess locking?
|
||||
------------------------------------------------------------
|
||||
|
||||
We tried. Both Posix and SysV IPC were explored as alternatives. Unfortunately,
|
||||
both have significant issues on Linux. Posix locks cannot be broken if the
|
||||
process holding them crashes (at least not without a reboot). SysV locks have
|
||||
a limited range of numerical ids, and because oslo.concurrency supports
|
||||
string-based lock names, the possibility of collisions when hashing names was
|
||||
too high. It was deemed better to have the file-based locking mechanism that
|
||||
would always work than a different method that introduced serious new problems.
|
||||
|
||||
Bonus Question: Why doesn't ``lock_path`` default to a temp directory?
|
||||
----------------------------------------------------------------------
|
||||
|
||||
Because every process that may need to hold a lock must use the same value
|
||||
for ``lock_path`` or it becomes useless. If we allowed ``lock_path`` to be
|
||||
unset and just created a temp directory on startup, each process would create
|
||||
its own temp directory and there would be no actual coordination between them.
|
||||
|
||||
While this isn't strictly related to the lock file issue, it is another FAQ
|
||||
about oslo.concurrency so it made sense to mention it here.
|
@ -10,6 +10,7 @@ external processes.
|
||||
:maxdepth: 1
|
||||
|
||||
install/index
|
||||
admin/index
|
||||
user/index
|
||||
configuration/index
|
||||
contributor/index
|
||||
|
Loading…
Reference in New Issue
Block a user