Basic design doc for Bank Plugin lease
design document for Bank Plugin lease

Closes-Bug: #1529199
Change-Id: Iaaeb7d50e998f68ba53414932d72cd3a2dbb339c
parent 4b77c7c651
commit 4cd1110dbe
154
doc/source/specs/bank-plugin-lease.rst
Normal file
@ -0,0 +1,154 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

=================
Bank Plugin Basic
=================

Bank Plugin is a component of smaug (an OpenStack project that provides data protection as a
service). It is responsible for executing CRUD actions in the Bank.

The bank is a backend (such as swift) that stores the metadata/data of protection plans.
Here, we take swift as an example bank implementation.

******
Leases
******

Smaug will create a checkpoint when protecting a protection plan. This checkpoint is maintained with
a status, which is an enum type: protecting, available, restoring, deleted, etc.

The status is used by the smaug API layer to control user access to a checkpoint.

With the 'protecting' status, there are two cases that we can't tell apart:

1. The protection service is working and those 'protecting' protection plans are being executed;

2. The protection service has crashed, so those 'protecting' protection plans are actually zombies,
   and their checkpoints are zombies too.

In the second case, we need a garbage collection component (GC) to clean up those zombie checkpoints.

In order to tell whether a checkpoint is a zombie or not, we introduce a lease mechanism based on
the bank plugin.

Here, we take swift as an example. The lease is stored as an object in swift that is deleted
automatically once it expires.

The owner of a checkpoint will periodically refresh the expiry time of its lease object.

When the protection service crashes, the leases of its bank plugins are no longer renewed and will
be auto-deleted by the swift-object-expirer (one service of swift).

When GC checks whether a checkpoint is a zombie to be collected, it first gets the owner of the
checkpoint. Then it checks whether the lease of that owner exists.

If the lease exists, those 'protecting' checkpoints cannot be deleted by the GC; otherwise the GC
will clean them up.

Granularity
===========

To avoid flooding the bank server, we don't keep one lease per checkpoint. Instead, we keep one
lease per checkpoint owner, so the granularity of a lease is per bank plugin instance.

When a protection service instance is initialized, each of its bank plugin instances is
initialized as well. Each bank plugin then starts to maintain its own lease with its corresponding
bank server.

Here, every bank plugin acts as a lease client while the bank server (swift cluster) acts as the
lease server.

Functions
=========

acquire_lease
-------------

Each bank plugin (lease client) will use this function to acquire a lease from the bank server
(lease server).

For swift specifically, it will create a lease object in a swift container and set an expire_window
for this lease.

The expire_window represents the validity period of this lease, from creation (or latest renewal)
until it is auto-deleted by the swift server. The value of expire_window should be configurable.

We use owner_id to identify one instance of a bank plugin. The owner_id is a uuid created when the
bank plugin instance is initialized, say, generated with sha256 from the hostname and the timestamp
at which the instance was initialized.

The key of the lease object stored in swift looks like this: /account/leases/owner_id.

In order to map a checkpoint to its owner, we will create an index like this:
/account/checkpoints/checkpoint_id/owner when creating a checkpoint.

- create_owner_id: create a uuid to represent this bank plugin instance
- put_object: use swift-client to create a lease object in swift, and set 'X-Delete-After' to
  expire_window
- set_expire_time: in memory on the lease client side, set the expire_time to now + expire_window
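
The three steps above can be sketched roughly as follows. This is only an illustration under
assumptions, not the plugin's actual code: ``LeaseClient``, ``FakeConnection`` and the ``leases``
container name are hypothetical, and ``conn`` is assumed to expose a ``put_object`` method shaped
like the one on python-swiftclient's ``Connection``.

```python
import hashlib
import socket
import time
import uuid


def create_owner_id(hostname=None, started_at=None):
    """Derive a uuid-shaped owner_id from sha256(hostname + start timestamp)."""
    hostname = hostname or socket.gethostname()
    started_at = time.time() if started_at is None else started_at
    digest = hashlib.sha256(f"{hostname}:{started_at}".encode()).hexdigest()
    # Use the first 128 bits of the digest as the uuid value.
    return str(uuid.UUID(hex=digest[:32]))


class LeaseClient:
    """Hypothetical lease client; `conn` mimics swiftclient's Connection."""

    def __init__(self, conn, container="leases", expire_window=600):
        self.conn = conn
        self.container = container          # assumed container name
        self.expire_window = expire_window  # seconds, configurable
        self.owner_id = create_owner_id()
        self.expire_time = 0.0              # client-side view of lease expiry

    def acquire_lease(self):
        # Taking `now` before the request keeps the client's view conservative.
        now = time.time()
        # X-Delete-After asks swift to delete the object after N seconds.
        self.conn.put_object(
            self.container,
            self.owner_id,
            contents=b"",
            headers={"X-Delete-After": str(self.expire_window)},
        )
        self.expire_time = now + self.expire_window


class FakeConnection:
    """In-memory stand-in so the sketch runs without a swift cluster."""

    def __init__(self):
        self.objects = {}

    def put_object(self, container, obj, contents=None, headers=None):
        self.objects[(container, obj)] = dict(headers or {})
```

With a real cluster, ``FakeConnection()`` would be replaced by an authenticated swift connection;
the lease object itself carries no payload, only the expiry header.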

renew_lease
-----------

This function will be called periodically in the background by each lease client.

The renew_window represents the period at which the lease client refreshes the lease.
This renew_window is configurable as well, where renew_window < expire_window.

If the lease client succeeds in renewing the lease, the lease has a new expire_window on the lease
server from now on. The lease client then updates the expire_time in memory: expire_time =
now + expire_window.

If the lease client fails to renew, the lease object keeps its old expiry on the lease server side,
and the lease client won't update its expire_time in memory.

- post_object: use swift-client to reset the 'X-Delete-After' header to expire_window
- update_expire_time: if post_object succeeds, update expire_time to now + expire_window; otherwise,
  don't refresh the expire_time.
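
A minimal sketch of the renew step and its background loop, under the same kind of assumptions:
``LeaseRenewer``, ``renew_periodically``, ``start_background_renewal`` and ``FakeConnection`` are
hypothetical names, and ``conn`` is assumed to expose a ``post_object`` method shaped like the one
on python-swiftclient's ``Connection``.

```python
import threading
import time


class LeaseRenewer:
    """Hypothetical renew logic; `conn` mimics swiftclient's Connection."""

    def __init__(self, conn, owner_id, container="leases", expire_window=600):
        self.conn = conn
        self.owner_id = owner_id
        self.container = container          # assumed container name
        self.expire_window = expire_window  # seconds
        self.expire_time = 0.0              # client-side view of lease expiry

    def renew_lease(self):
        now = time.time()
        try:
            # Resetting X-Delete-After restarts the expiry countdown in swift.
            self.conn.post_object(
                self.container,
                self.owner_id,
                headers={"X-Delete-After": str(self.expire_window)},
            )
        except Exception:
            # Broad catch for the sketch; real code would catch swiftclient
            # and network errors specifically. expire_time stays unchanged.
            return False
        self.expire_time = now + self.expire_window
        return True


def renew_periodically(renewer, renew_window, stop_event):
    """Call renew_lease every renew_window seconds until stop_event is set."""
    while not stop_event.wait(renew_window):
        renewer.renew_lease()


def start_background_renewal(renewer, renew_window):
    """Run the renew loop in a daemon thread; set the event to stop it."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=renew_periodically,
        args=(renewer, renew_window, stop_event),
        daemon=True,
    )
    thread.start()
    return thread, stop_event


class FakeConnection:
    """In-memory stand-in that can simulate a failing lease server."""

    def __init__(self):
        self.fail = False
        self.headers = {}

    def post_object(self, container, obj, headers=None):
        if self.fail:
            raise IOError("simulated network failure")
        self.headers[(container, obj)] = dict(headers or {})
```

A failed renewal leaves ``expire_time`` untouched, so later validity checks keep working against
the last expiry the server is known to have accepted.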

check_lease_validity
--------------------

This function is used by the checkpoint owner to check whether there is enough time to execute an
update operation on a checkpoint (or anything else guarded by the lease) before the lease expires.

We use validity_window to represent the time window inside which an update operation on a checkpoint
should complete. This window is configurable and should be estimated by the admin.

This function will check if validity_window <= expire_time - now. If it's true, this function will
return true and thus allow the update operation to go ahead; otherwise, this function will return
false and the update operation will abort.

When validity_window > expire_time - now, the lease may not have expired yet, but there might not be
enough time to finish the update operation. If we allowed the update operation to go ahead in this
situation, there is a risk that the lease is recycled by the lease server while the operation is
still ongoing.
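
The check itself is a single comparison. The sketch below assumes that ``expire_time`` and ``now``
are timestamps in seconds and that ``validity_window`` is a duration in seconds; the optional
``now`` parameter is an addition for illustration so the function can be exercised with fixed
values.

```python
import time


def check_lease_validity(expire_time, validity_window, now=None):
    """Return True if an update can finish before the lease expires."""
    now = time.time() if now is None else now
    # Remaining lease time must cover the whole estimated update duration.
    return validity_window <= expire_time - now


# 100 s of lease left, update needs at most 60 s: allowed.
allowed = check_lease_validity(expire_time=1000, validity_window=60, now=900)
# 50 s of lease left, update needs at most 60 s: rejected although the
# lease has not expired yet.
rejected = check_lease_validity(expire_time=1000, validity_window=60, now=950)
```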

check_lease_existence
---------------------

This function is used by GC to check whether the lease object exists on the lease server side.

Specifically for checkpoints, GC will scan all checkpoints in 'protecting' status. It will first get
the owner of a checkpoint through its index, and then check the existence of the lease object on the
lease server. If the lease object doesn't exist, GC will treat this checkpoint as a zombie and go
ahead to recycle it. Otherwise, it will skip this checkpoint and leave it there.
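
The GC scan described above could look roughly like this. It is a sketch under assumptions:
``collect_zombies``, ``owner_of``, ``NotFound`` and ``FakeConnection`` are hypothetical, and
``conn.head_object`` is assumed to behave like python-swiftclient's, raising an exception that
carries ``http_status == 404`` when the object is missing.

```python
def check_lease_existence(conn, owner_id, container="leases"):
    """Return True if the owner's lease object still exists on the server."""
    try:
        conn.head_object(container, owner_id)
    except Exception as exc:
        # Only "not found" means the lease is gone; other errors are
        # re-raised so a network problem is never mistaken for a dead owner.
        if getattr(exc, "http_status", None) == 404:
            return False
        raise
    return True


def collect_zombies(conn, protecting_checkpoints, owner_of):
    """Return ids of 'protecting' checkpoints whose owner's lease is gone.

    `owner_of` maps a checkpoint id to its owner_id, standing in for a
    lookup of the /account/checkpoints/checkpoint_id/owner index.
    """
    zombies = []
    for checkpoint_id in protecting_checkpoints:
        owner_id = owner_of(checkpoint_id)
        if not check_lease_existence(conn, owner_id):
            zombies.append(checkpoint_id)
    return zombies


class NotFound(Exception):
    """Stand-in for a swift 'object not found' error."""
    http_status = 404


class FakeConnection:
    """In-memory lease server holding the set of live owner ids."""

    def __init__(self, live_owners):
        self.live_owners = set(live_owners)

    def head_object(self, container, obj):
        if obj not in self.live_owners:
            raise NotFound(obj)
        return {}
```

Re-raising anything other than a 404 is a deliberate choice in this sketch: if the lease server is
unreachable, the GC skips the decision instead of recycling checkpoints that may still be in use.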

Configurations
==============

renew_window
------------

- the period at which the lease client renews the lease in the background.

expire_window
-------------

- how long the lease lasts on the lease server side, from creation or latest renewal until expiry.
- Note: expire_window > renew_window. To make the renew mechanism more robust, we recommend setting
  expire_window = N*renew_window. With this setting, we tolerate (N-1) failed renewals caused by
  an unstable network or IO scheduling issues.

validity_window
---------------

- an optional configuration. The default value is set according to the renew_window:
  validity_window <= renew_window.
- the window estimated by the admin: how long one update operation will take at most. The constraint
  here should be: validity_window < expire_window.
- Note: for the same reason as the renew_window setting, to tolerate (N-1) failed renewals, we
  recommend setting validity_window <= renew_window.
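
The constraints between the three windows can be checked when the configuration is loaded. The
sketch below is illustrative only: ``LeaseConfig`` is a hypothetical container, all values are
assumed to be in seconds, and defaulting validity_window to renew_window is an assumption that
merely satisfies the recommendation validity_window <= renew_window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseConfig:
    renew_window: int
    expire_window: int
    validity_window: Optional[int] = None

    def __post_init__(self):
        if self.validity_window is None:
            # Assumed default: the largest value within the recommendation.
            self.validity_window = self.renew_window
        if not 0 < self.renew_window < self.expire_window:
            raise ValueError("need 0 < renew_window < expire_window")
        if not 0 < self.validity_window < self.expire_window:
            raise ValueError("need 0 < validity_window < expire_window")

    def follows_recommendation(self):
        """True if validity_window <= renew_window, as recommended."""
        return self.validity_window <= self.renew_window

    def tolerated_failures(self):
        """Failed renewals tolerated: N - 1 for expire_window = N * renew_window."""
        return self.expire_window // self.renew_window - 1


cfg = LeaseConfig(renew_window=60, expire_window=180)
```

Here N = 3, so ``cfg.tolerated_failures()`` is 2: renewals at 60 s and 120 s may both fail and the
attempt at 180 s is the last chance before the lease expires.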
@ -33,6 +33,7 @@ Spec Template

skeleton
template
bank-plugin-lease

Indices and tables