openstack/swift

Commit Graph

Author

SHA1

Message

Date

Romain LE DISEZ

3061ec803f

relinker: Improve performance by limiting I/O

This commit reduces the amount of I/O done by the swift-object-relinker.

First, it saves the progress state of relinking and cleanup in case the
process is interrupted during the operation. This allows the operation
to resume without rescanning all partitions.
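
The resume behaviour described above can be sketched as follows. This is a minimal illustration and not the relinker's actual code: the state file layout and the names `load_state`, `save_state` and `process_partitions` are assumptions made for this example.

```python
import json
import os
import tempfile


def load_state(state_file):
    """Return the saved {partition: done} mapping, or {} if there is none."""
    try:
        with open(state_file) as fd:
            return json.load(fd)
    except (OSError, ValueError):
        # Missing or unreadable state: start again from scratch.
        return {}


def save_state(state_file, state):
    """Write the state atomically so an interruption never leaves half a file."""
    dirname = os.path.dirname(state_file) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
    os.replace(tmp_path, state_file)


def process_partitions(partitions, state_file, work):
    """Run work(partition) for every partition not yet marked as done.

    All names here are hypothetical; `work` stands in for relink or cleanup.
    """
    state = load_state(state_file)
    for part in partitions:
        if state.get(str(part)):
            continue  # finished in an earlier run, nothing to rescan
        work(part)
        state[str(part)] = True
        save_state(state_file, state)
```

Saving after every partition costs one small write per partition, which is cheap compared with rescanning a partition's directory tree after an interruption.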

Secondly, it skips, for both relink and cleanup, all partitions numbered
at or above 2^part_power (or (2^next_part_power)/2). These partitions
did not exist before the part_power increase began, so there is nothing
in them to relink or clean up.
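
A minimal sketch of that filter, assuming partition directories are named by their decimal number. The function name `partitions_to_scan` is hypothetical and chosen for this example only.

```python
def partitions_to_scan(partition_names, next_part_power):
    """Keep only partitions that already existed before the increase.

    Hypothetical helper: entries that are not plain numbers are dropped
    because they cannot be partition directories.
    """
    limit = 2 ** next_part_power // 2  # first partition number that is "new"
    return [
        name for name in partition_names
        if name.isdigit() and int(name) < limit
    ]
```

With `next_part_power` set to 2, partitions 0 and 1 are kept and partitions 2 and 3 are skipped.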

Thirdly, it scans partitions in reverse order so that some useless work
is avoided. If a device contains partitions 1 and 3, relinking
partition 1 first creates "new" objects in partition 3, which then have
to be scanned needlessly when the relinker reaches partition 3. If
partition 3 is done first, it contains only the objects that actually
need to be relinked.
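
The following sketch shows why partition 1 can only feed partitions 2 and 3, and what the reverse ordering looks like. It assumes the partition is taken from the top bits of an MD5 digest of the object path; the salting that a real cluster applies to the path is omitted, and the helper names are made up for this example.

```python
import hashlib
import struct


def partition_for(name_hash, part_power):
    """Take the top part_power bits of the first 4 bytes of the digest."""
    return struct.unpack_from(">I", name_hash)[0] >> (32 - part_power)


def relink_order(partitions):
    """Highest partition first, so freshly created links are not rescanned."""
    return sorted(partitions, reverse=True)


# Hypothetical object path, hashed without any cluster-specific salt.
digest = hashlib.md5(b"/account/container/object").digest()
old_part = partition_for(digest, 1)
new_part = partition_for(digest, 2)
# One extra bit can only select the lower or upper child of the old partition.
assert new_part in (2 * old_part, 2 * old_part + 1)
```

Because a new partition number is never smaller than the old one, walking from the highest partition down means every partition is scanned before anything is linked into it.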

Fourthly, it allows a single device to be specified to work on.

To do that, hooks were added to audit_location_generator so that custom
code can be executed before/after iterating over a
device/partition/suffix/hash.
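
The hook idea can be illustrated with a small generator. This is not the signature of audit_location_generator; `walk_partitions` and its two keyword arguments are assumptions for this sketch, which covers only the partition level.

```python
import os


def walk_partitions(device_path, hook_pre_partition=None,
                    hook_post_partition=None):
    """Yield (partition, entry) pairs, calling hooks around each partition.

    Hypothetical walker for illustration only.
    """
    for partition in sorted(os.listdir(device_path)):
        partition_path = os.path.join(device_path, partition)
        if not os.path.isdir(partition_path):
            continue
        if hook_pre_partition:
            hook_pre_partition(partition_path)
        for entry in sorted(os.listdir(partition_path)):
            yield partition, entry
        # Reached only once the caller has consumed every entry above.
        if hook_post_partition:
            hook_post_partition(partition_path)
```

A post-partition hook of this kind is a natural place to record that a partition is finished, since it runs only after the consumer has handled every entry in it.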

Change-Id: If1bf8ed9036fb0ec619b0d4f16061a81a1af2082

2020-03-31 17:33:06 -04:00

Ondřej Nový

611b28f73a

Add manpage for swift-object-relinker

Change-Id: I56dd9c646faba91e9f124f343ea0e08f8c3c4249

2017-12-09 19:10:35 +01:00

Christian Schwede

e1140666d6

Add support to increase object ring partition power

This patch adds methods to increase the partition power of an existing
object ring without downtime for users, using a 3-step process. Data
won't be moved to other nodes; objects using the new, increased
partition power will be located on the same device and are hardlinked
to avoid data movement.

1. A new setting "next_part_power" will be added to the rings, and once
the proxy server has reloaded the rings it will send this value to the
object servers on any write operation. Object servers will now create a
hard link in the new location to the original DiskFile object. Already
existing data will be relinked into the new locations with hard links
by a new tool.

2. The actual partition power itself will be increased. Servers will now
use the new partition power to read from and write to. Hard links in
the old object locations are no longer required and have to be removed
now by the relinker tool; the relinker tool reads the next_part_power
setting to find object locations that need to be cleaned up.

3. The "next_part_power" flag will be removed.
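
The hard-link handling in steps 1 and 2 can be sketched with the standard library alone. The functions `relink` and `cleanup` and their path arguments are hypothetical and do not reproduce the relinker tool's code.

```python
import os


def relink(old_path, new_path):
    """Step 1: expose the same file under the new partition directory."""
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    try:
        os.link(old_path, new_path)  # no data is copied, only a new name
    except FileExistsError:
        pass  # already linked, for example by a write during step 1


def cleanup(old_path, new_path):
    """Step 2: drop the old name once the new one is the same file."""
    if (os.path.exists(old_path) and os.path.exists(new_path)
            and os.path.samefile(old_path, new_path)):
        os.unlink(old_path)
```

Both names refer to one inode, so removing the old name in step 2 frees no data and leaves the object readable at its new location.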

This mostly implements the spec in [1]; however it's not using an
"epoch" as described there. The idea of the epoch was to store data
using different partition powers in their own namespace to avoid
conflicts with auditors and replicators as well as being able to abort
such an operation and just remove the new tree. This would require
heavy changes to the on-disk data layout, and other object-server
implementations would be required to adopt this scheme too.

Instead, the object-replicator is now aware that a partition power
increase is in progress and will skip replication of data in that
storage policy; the relinker tool should simply be run, and afterwards
the partition power will be increased. This shouldn't take much time
(it only walks the filesystem and creates hard links), so the impact
should be low. The relinker should be run on all storage nodes in
parallel to decrease the required time (though this is not mandatory).
Failures during relinking should not affect cluster operations;
relinking can even be aborted manually and restarted later.

Auditors do not quarantine objects written to a path with a different
partition power and therefore work as before (though in the worst case
they read each object twice until the no-longer-needed hard links are
removed).

Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>

[1] https://specs.openstack.org/openstack/swift-specs/specs/in_progress/increasing_partition_power.html

Change-Id: I7d6371a04f5c1c4adbb8733a71f3c177ee5448bb

2017-06-15 15:08:48 -07:00

3 Commits