swift


Kota Tsuyuzaki 40ba7f6172 EC Fragment Duplication - Foundational Global EC Cluster Support

This patch enables efficient PUT/GET for globally distributed clusters[1].

Problem:

Erasure coding can reduce the amount of data actually stored compared with the replicated model. For example, with ec_k=6 and ec_m=3 the stored data is 1.5x the original data, which is smaller than 3x replicated. However, unlike replication, erasure coding requires that at least ec_k fragments of the total ec_k + ec_m fragments be available to service a read (e.g. 6 of 9 in the case above). As such, if we stored the EC object in a swift cluster spanning 2 geographically distributed data centers which have the same volume of disks, it is likely the fragments will be stored evenly (about 4 and 5), so we still need to access a faraway data center to decode the original object. In addition, if one of the data centers was lost in a disaster, the stored objects will be lost forever, and we have to cry a lot. To ensure highly durable storage, you would think of making more parity fragments (e.g. ec_k=6, ec_m=10), but unfortunately this causes significant performance degradation due to the cost of the mathematical calculation for erasure coding encode/decode.

How this resolves the problem:

EC Fragment Duplication extends the initial solution by adding more fragments from which to rebuild an object, similar to the approach described above. The difference is that it makes copies of encoded fragments instead of computing more parity. Experimental results[1][2] show that a small ec_k and ec_m give enough performance to store/retrieve objects.

On PUT:

- Encode the incoming object with small ec_k and ec_m <- faster!
- Make duplicated copies of the encoded fragments. The # of copies is determined by 'ec_duplication_factor' in swift.conf
- Store all fragments in the Swift Global EC Cluster

The duplicated fragments increase pressure on existing requirements when decoding objects in service to a read request. All fragments are stored with their X-Object-Sysmeta-Ec-Frag-Index. In this change, the X-Object-Sysmeta-Ec-Frag-Index represents the actual fragment index encoded by PyECLib, so there will be duplicates. Any time we must decode the original object data, we must only consider the ec_k fragments as unique according to their X-Object-Sysmeta-Ec-Frag-Index. No duplicate X-Object-Sysmeta-Ec-Frag-Index may be used when decoding an object; duplicates should be expected and avoided if possible.

On GET:

This patch includes the following changes:

- Change the GET path to sort primary nodes, grouping them into subsets so that each subset includes unique fragments
- Change the Reconstructor to be more aware of possibly duplicate fragments

For example, with this change, a policy could be configured such that

swift.conf:
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_duplication_factor = 2
(object ring must have 6 replicas)

At Object-Server:
node index (from object ring):  0 1 2 3 4 5 <- keep node index for reconstruct decision
X-Object-Sysmeta-Ec-Frag-Index: 0 1 2 0 1 2 <- each object keeps actual fragment index for backend (PyECLib)

Additional improvements to Global EC Cluster Support will require features such as Composite Rings, and more efficient fragment rebalance/reconstruction.

1: http://goo.gl/IYiNPk (Swift Design Spec Repository)
2: http://goo.gl/frgj6w (Slide Share for OpenStack Summit Tokyo)

Doc-Impact
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Idd155401982a2c48110c30b480966a863f6bd305

2017-02-22 10:56:13 -08:00
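The node-index to fragment-index table in the commit message can be sketched in a few lines of Python. This is only an illustration, not code from the patch: the function names are hypothetical, and the modulo mapping is an assumption inferred from the "0 1 2 0 1 2" example above.

```python
# Illustrative sketch only; all names here are hypothetical, not Swift's API.

def backend_frag_index(node_index, ec_k, ec_m):
    """Map a ring node index to the fragment index stored on that node.

    Assumes duplicates repeat in ring order, as in the commit's example
    (node indexes 0..5 hold fragment indexes 0 1 2 0 1 2).
    """
    return node_index % (ec_k + ec_m)


def pick_unique_fragments(node_indexes, ec_k, ec_m):
    """Return node indexes holding ec_k distinct fragments, or None.

    Duplicate fragment indexes are skipped, because a decode may not use
    the same fragment index twice.
    """
    chosen = {}  # fragment index -> first node index seen holding it
    for node_index in node_indexes:
        frag = backend_frag_index(node_index, ec_k, ec_m)
        chosen.setdefault(frag, node_index)
        if len(chosen) == ec_k:
            return sorted(chosen.values())
    return None


def storage_overhead(ec_k, ec_m, duplication_factor=1):
    """Stored bytes as a multiple of the original object size."""
    return (ec_k + ec_m) * duplication_factor / ec_k


# Policy from the commit message: ec_k=2, ec_m=1, ec_duplication_factor=2
frag_indexes = [backend_frag_index(n, 2, 1) for n in range(6)]
print(frag_indexes)
print(pick_unique_fragments([0, 3, 4], 2, 1))
print(storage_overhead(2, 1, 2))
```

With the example policy, nodes 0 and 3 hold the same fragment, so a read that reaches only those two nodes cannot decode the object even though it has ec_k responses. The overhead helper also makes the trade-off explicit: duplication multiplies storage cost, in exchange for encoding with a small ec_k and ec_m.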
..
account-server.conf-sample	Fixed rysnc -> rsync typo	2016-10-19 20:17:00 +02:00
container-reconciler.conf-sample	Change schedule priority of daemon/server in config	2016-08-10 23:56:15 +02:00
container-server.conf-sample	Fixed rysnc -> rsync typo	2016-10-19 20:17:00 +02:00
container-sync-realms.conf-sample	Removing some redundant words	2016-03-25 17:20:24 +07:00
dispersion.conf-sample	Fix swift-dispersion in multi-region setups	2016-06-01 15:35:47 +02:00
drive-audit.conf-sample	Added comment for "user" option in drive-audit config	2016-11-21 22:13:11 +01:00
internal-client.conf-sample	Removed default value for log_statsd_host	2016-02-10 10:36:59 -06:00
memcache.conf-sample	fixups for ipv6 memcache_servers docs	2016-01-12 21:08:58 -08:00
mime.types-sample	PEP 8 compliance and small modification to mime.types file	2010-11-23 19:26:02 -06:00
object-expirer.conf-sample	Documentation enhancements of nice/ionice feature	2016-08-19 07:39:49 +02:00
object-server.conf-sample	Deprecate broken handoffs_first in favor of handoffs_only	2017-02-13 21:13:29 -08:00
proxy-server.conf-sample	Default object_post_as_copy to False	2017-01-20 12:37:01 -05:00
rsyncd.conf-sample	Allows to configure the rsync modules where the replicators will send data	2015-09-07 08:00:18 +02:00
swift-rsyslog.conf-sample	Add sample rsyslog.conf.	2013-06-25 10:24:26 +08:00
swift.conf-sample	EC Fragment Duplication - Foundational Global EC Cluster Support	2017-02-22 10:56:13 -08:00