Basic design doc for Bank Plugin lease
design document for Bank Plugin lease

Closes-Bug: #1529199
Change-Id: Iaaeb7d50e998f68ba53414932d72cd3a2dbb339c

parent 4b77c7c651
commit 4cd1110dbe

@@ -0,0 +1,154 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=================
Bank Plugin Basic
=================
Bank Plugin is a component of smaug (an OpenStack project working as a service for data protection).
It is responsible for executing CRUD actions in the Bank.

The bank is a backend (such as Swift) that stores the metadata and data of protection plans.
Here, we take Swift as an example bank implementation.

*******
leases
*******
Smaug will create a checkpoint when protecting a protection plan. Each checkpoint carries a
status, which is an enum type: protecting, available, restoring, deleted, etc.

The smaug API layer uses the status to control user access to a checkpoint.

With the 'protecting' status, there are two cases that we cannot tell apart:

1. The protection service is working and those 'protecting' protection plans are being executed;

2. The protection service has crashed, so those 'protecting' protection plans are actually zombies,
   and their checkpoints are zombies too.

In the second case, we need a garbage collection component (GC) to clean up those zombie checkpoints.

In order to tell whether a checkpoint is a zombie or not, we introduce a lease mechanism based on
the bank plugin.

Here, we take Swift as an example. The lease is stored as an object in Swift that is deleted
automatically once it expires.

The owner of a checkpoint periodically refreshes the expire time of its lease object.

When the protection service crashes, the leases of its bank plugins are no longer refreshed, so they
will be auto-deleted by the swift-object-expirer (one service of Swift).

When the GC checks whether a checkpoint is a zombie to be collected, it first gets the owner of the
checkpoint. Then it checks whether the lease of the owner exists.

If the lease exists, the owner's 'protecting' checkpoints cannot be deleted by the GC; otherwise the
GC will clean them up.

Granularity
=================
To avoid flooding the bank server, we don't keep one lease per checkpoint. Instead, we keep one
lease per checkpoint owner, so the granularity of a lease is per bank plugin instance.

When a protection service instance is initialized, each of its bank plugin instances is
initialized as well. Each bank plugin then starts to maintain its own lease with its corresponding
bank server.

Here, every bank plugin acts as a lease client, while the bank server (Swift cluster) acts as the
lease server.

Functions
===============

acquire_lease
-------------
Each bank plugin (lease client) uses this function to acquire a lease from the bank server (lease
server).

For Swift specifically, it creates a lease object in a Swift container and sets an expire_window
for this lease.

The expire_window is the period for which the lease stays valid, counted from its creation (or
latest renewal) until it is auto-deleted by the Swift server. The value of expire_window should be
configurable.

We use owner_id to identify one instance of a bank plugin. The owner_id is a uuid created when the
bank plugin instance is initialized, for example generated with sha256 from the hostname and the
timestamp at which the instance was initialized.

The key of the lease object stored in Swift looks like this: /account/leases/owner_id.

In order to map a checkpoint to its owner, we create an index like this when creating a checkpoint:
/account/checkpoints/checkpoint_id/owner.

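The identifier and key layout described above could be sketched as follows. This is only an illustration: the function names `lease_key` and `checkpoint_owner_key`, the `account` parameter, and the exact way the hostname and timestamp are combined are assumptions, not part of the design.

```python
import hashlib
import socket
import time


def create_owner_id(hostname=None, started_at=None):
    """Derive an owner_id from the hostname and the instance start time.

    Assumption: the two values are joined with ':' before hashing.
    """
    hostname = socket.gethostname() if hostname is None else hostname
    started_at = time.time() if started_at is None else started_at
    seed = "{}:{!r}".format(hostname, started_at).encode("utf-8")
    return hashlib.sha256(seed).hexdigest()


def lease_key(account, owner_id):
    # Key of the lease object: /account/leases/owner_id
    return "/{}/leases/{}".format(account, owner_id)


def checkpoint_owner_key(account, checkpoint_id):
    # Index mapping a checkpoint to its owner:
    # /account/checkpoints/checkpoint_id/owner
    return "/{}/checkpoints/{}/owner".format(account, checkpoint_id)
```

Note that a sha256 hex digest is 64 characters long, so it is a unique identifier rather than a UUID in the strict RFC sense.
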
- create_owner_id: create a uuid to represent this bank plugin instance
- put_object: use swift-client to create a lease object in Swift, and set 'X-Delete-After' to
  expire_window
- set_expire_time: on the lease client side, set the in-memory expire_time to now + expire_window

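A minimal sketch of the acquire steps listed above. The `LeaseClient` class is hypothetical, and `conn` is assumed to be an object whose `put_object(container, obj, contents, headers)` method behaves like the one on python-swiftclient's `Connection`. Reading the clock before the request is a choice made here so that the local expire_time is never later than the server-side expiry.

```python
import time


class LeaseClient:
    """Hypothetical lease client; `conn` is assumed to mimic a Swift connection."""

    def __init__(self, conn, owner_id, expire_window,
                 container="leases", clock=time.time):
        self.conn = conn
        self.owner_id = owner_id
        self.expire_window = expire_window  # seconds
        self.container = container
        self.clock = clock
        self.expire_time = None  # in-memory expiry, unset until acquired

    def acquire_lease(self):
        # Take the timestamp first: the server starts counting no earlier
        # than this, so the local expire_time errs on the safe side.
        now = self.clock()
        self.conn.put_object(
            self.container,
            self.owner_id,
            contents=b"",
            headers={"X-Delete-After": str(self.expire_window)},
        )
        self.expire_time = now + self.expire_window
```
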
renew_lease
-----------------
Each lease client calls this function periodically in the background.

The renew_window is the interval at which the lease client refreshes its lease.
The renew_window is configurable as well, with the constraint renew_window < expire_window.

If the lease client succeeds in renewing the lease, the lease gets a fresh expire_window on the
lease server, counted from now. The lease client then updates the expire_time in memory:
expire_time = now + expire_window.

If the lease client fails to renew, the lease object keeps its old expiry on the lease server side,
and the lease client does not update its expire_time in memory.

- post_object: use swift-client to reset the 'X-Delete-After' header to expire_window
- update_expire_time: if post_object succeeds, update expire_time to now + expire_window; otherwise,
  don't refresh the expire_time.

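The renew steps could look roughly like the sketch below. `LeaseRenewer` and `renew_periodically` are hypothetical names; `conn` is assumed to expose `post_object(container, obj, headers)` in the style of python-swiftclient's `Connection`, and `stop_event` is assumed to be a `threading.Event`. The broad `except Exception` is a simplification; real code would likely catch the client's own exception type.

```python
import time


class LeaseRenewer:
    """Hypothetical helper holding the in-memory expire_time of one lease."""

    def __init__(self, conn, owner_id, expire_window, expire_time,
                 container="leases", clock=time.time):
        self.conn = conn
        self.owner_id = owner_id
        self.expire_window = expire_window  # seconds
        self.expire_time = expire_time
        self.container = container
        self.clock = clock

    def renew_lease(self):
        now = self.clock()
        try:
            self.conn.post_object(
                self.container,
                self.owner_id,
                headers={"X-Delete-After": str(self.expire_window)},
            )
        except Exception:
            # Renewal failed: the server keeps the old expiry, so the
            # in-memory expire_time must stay unchanged as well.
            return False
        self.expire_time = now + self.expire_window
        return True


def renew_periodically(renewer, renew_window, stop_event):
    # Event.wait returns True once the event is set, which ends the loop;
    # otherwise it times out after renew_window seconds and we renew.
    while not stop_event.wait(renew_window):
        renewer.renew_lease()
```
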
check_lease_validity
--------------------
The checkpoint owner uses this function to check whether there is enough time to execute an
update operation on a checkpoint (or anything else guarded by the lease) before the lease expires.

We use validity_window to represent the time window inside which an update operation on a checkpoint
should complete. This window is configurable and should be estimated by the admin.

This function checks whether validity_window <= expire_time - now. If so, it returns true and the
update operation is allowed to go ahead; otherwise, it returns false and the update operation is
aborted.

When validity_window > expire_time - now, the lease may not have expired yet, but there might not be
enough time to finish the update operation. If we allowed the update operation to go ahead in this
situation, there would be a risk that the lease is recycled by the lease server while the operation
is still ongoing.

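The check itself is a single comparison. A minimal sketch, assuming all values are in seconds and that `expire_time` is the in-memory value kept by the lease client (the `now` parameter is added here only to make the function easy to exercise):

```python
import time


def check_lease_validity(expire_time, validity_window, now=None):
    """Return True if an update can still finish before the lease expires."""
    now = time.time() if now is None else now
    return validity_window <= expire_time - now
```

For example, with expire_time = 1060 and validity_window = 30, an update starting at now = 1000 is allowed (60 seconds remain), while one starting at now = 1040 is rejected: the lease is still alive, but only 20 seconds remain.
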
check_lease_existence
---------------------
The GC uses this function to check whether the lease object exists on the lease server side.

Specifically for checkpoints, the GC scans all checkpoints in 'protecting' status. It first gets
the owner of a checkpoint through its index, and then checks the existence of the lease object on
the lease server. If the lease object doesn't exist, the GC treats this checkpoint as a zombie and
goes ahead to recycle it. Otherwise, it skips this checkpoint and leaves it in place.

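The GC scan described above can be sketched as below. The function name and the two callables are hypothetical: `get_owner` stands for reading the checkpoint's owner index and `lease_exists` for the existence check against the lease server. Caching the result per owner within one scan is a choice made here to limit requests to the bank server; it is not stated in the design.

```python
def find_zombie_checkpoints(protecting_checkpoint_ids, get_owner, lease_exists):
    """Return the 'protecting' checkpoints whose owner no longer holds a lease."""
    zombies = []
    lease_alive = {}  # owner_id -> bool, valid for this scan only
    for checkpoint_id in protecting_checkpoint_ids:
        owner_id = get_owner(checkpoint_id)
        if owner_id not in lease_alive:
            lease_alive[owner_id] = lease_exists(owner_id)
        if not lease_alive[owner_id]:
            zombies.append(checkpoint_id)
    return zombies
```

With Swift, `lease_exists` could be backed by a HEAD request on the lease object; that wiring is left out of the sketch.
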
Configurations
==============

renew_window
------------
- the interval at which the lease client renews the lease in the background.

expire_window
--------------
- how long the lease lives on the lease server side, counted from its creation or latest renewal.
- Note: expire_window > renew_window. To make the renew mechanism more robust, we recommend setting
  expire_window = N*renew_window. With this setting, we tolerate (N-1) failed lease renewals
  caused by an unstable network or IO scheduling issues.

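To make the N*renew_window recommendation concrete, the following sketch lists the renew attempts that happen strictly before the lease expires, counted from the last successful renewal. The function name is hypothetical and the model ignores request latency and clock drift.

```python
def renew_attempts_before_expiry(expire_window, renew_window):
    """Offsets (in seconds after the last successful renewal) of the renew
    attempts that occur strictly before the lease expires."""
    if renew_window <= 0 or expire_window <= renew_window:
        raise ValueError("need 0 < renew_window < expire_window")
    attempts = []
    k = 1
    while k * renew_window < expire_window:
        attempts.append(k * renew_window)
        k += 1
    return attempts
```

With expire_window = 60 and renew_window = 20 (N = 3), the attempts at 20 and 40 seconds fall before the expiry, which gives N-1 attempts. The N-th attempt coincides with the expiry instant, so under this simplified model it seems safest to treat that attempt as too late when choosing N.
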
validity_window
---------------
- an optional configuration. The default value is set according to the renew_window, such that
  validity_window <= renew_window.
- the window estimated by the admin: how long one update operation will take at most. The constraint
  here should be: validity_window < expire_window.
- Note: for the same reason as the expire_window recommendation above, to tolerate (N-1) failed
  lease renewals, we recommend setting validity_window <= renew_window.

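The constraints between the three windows can be checked when the configuration is loaded. A minimal sketch, assuming values in seconds; the `LeaseConfig` class is hypothetical, and defaulting validity_window to exactly renew_window is one possible reading of "set according to the renew_window".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseConfig:
    renew_window: int
    expire_window: int
    validity_window: Optional[int] = None

    def __post_init__(self):
        if self.validity_window is None:
            # Assumed default: the largest recommended value.
            self.validity_window = self.renew_window
        if self.renew_window <= 0:
            raise ValueError("renew_window must be positive")
        if not self.renew_window < self.expire_window:
            raise ValueError("renew_window must be < expire_window")
        if not 0 < self.validity_window < self.expire_window:
            raise ValueError("validity_window must be < expire_window")

    def follows_recommendation(self):
        # The recommendation (not a hard constraint) from the text.
        return self.validity_window <= self.renew_window
```
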
@@ -33,6 +33,7 @@ Spec Template

 skeleton
 template
+bank-plugin-lease


 Indices and tables