swift/etc
Kota Tsuyuzaki 40ba7f6172 EC Fragment Duplication - Foundational Global EC Cluster Support
This patch enables efficent PUT/GET for global distributed cluster[1].

Problem:
Erasure coding has the capability to decrease the amout of actual stored
data less then replicated model. For example, ec_k=6, ec_m=3 parameter
can be 1.5x of the original data which is smaller than 3x replicated.
However, unlike replication, erasure coding requires availability of at
least some ec_k fragments of the total ec_k + ec_m fragments to service
read (e.g. 6 of 9 in the case above). As such, if we stored the
EC object into a swift cluster on 2 geographically distributed data
centers which have the same volume of disks, it is likely the fragments
will be stored evenly (about 4 and 5) so we still need to access a
faraway data center to decode the original object. In addition, if one
of the data centers was lost in a disaster, the stored objects will be
lost forever, and we have to cry a lot. To ensure highly durable
storage, you would think of making *more* parity fragments (e.g.
ec_k=6, ec_m=10), unfortunately this causes *significant* performance
degradation due to the cost of mathmetical caluculation for erasure
coding encode/decode.

How this resolves the problem:
EC Fragment Duplication extends on the initial solution to add *more*
fragments from which to rebuild an object similar to the solution
described above. The difference is making *copies* of encoded fragments.
With experimental results[1][2], employing small ec_k and ec_m shows
enough performance to store/retrieve objects.

On PUT:

- Encode incomming object with small ec_k and ec_m  <- faster!
- Make duplicated copies of the encoded fragments. The # of copies
  are determined by 'ec_duplication_factor' in swift.conf
- Store all fragments in Swift Global EC Cluster

The duplicated fragments increase pressure on existing requirements
when decoding objects in service to a read request.  All fragments are
stored with their X-Object-Sysmeta-Ec-Frag-Index.  In this change, the
X-Object-Sysmeta-Ec-Frag-Index represents the actual fragment index
encoded by PyECLib, there *will* be duplicates.  Anytime we must decode
the original object data, we must only consider the ec_k fragments as
unique according to their X-Object-Sysmeta-Ec-Frag-Index.  On decode no
duplicate X-Object-Sysmeta-Ec-Frag-Index may be used when decoding an
object, duplicate X-Object-Sysmeta-Ec-Frag-Index should be expected and
avoided if possible.

On GET:

This patch inclues following changes:
- Change GET Path to sort primary nodes grouping as subsets, so that
  each subset will includes unique fragments
- Change Reconstructor to be more aware of possibly duplicate fragments

For example, with this change, a policy could be configured such that

swift.conf:
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_duplication_factor = 2
(object ring must have 6 replicas)

At Object-Server:
node index (from object ring):  0 1 2 3 4 5 <- keep node index for
                                               reconstruct decision
X-Object-Sysmeta-Ec-Frag-Index: 0 1 2 0 1 2 <- each object keeps actual
                                               fragment index for
                                               backend (PyEClib)

Additional improvements to Global EC Cluster Support will require
features such as Composite Rings, and more efficient fragment
rebalance/reconstruction.

1: http://goo.gl/IYiNPk (Swift Design Spec Repository)
2: http://goo.gl/frgj6w (Slide Share for OpenStack Summit Tokyo)

Doc-Impact

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Idd155401982a2c48110c30b480966a863f6bd305
2017-02-22 10:56:13 -08:00
..
account-server.conf-sample Fixed rysnc -> rsync typo 2016-10-19 20:17:00 +02:00
container-reconciler.conf-sample Change schedule priority of daemon/server in config 2016-08-10 23:56:15 +02:00
container-server.conf-sample Fixed rysnc -> rsync typo 2016-10-19 20:17:00 +02:00
container-sync-realms.conf-sample Removing some redundant words 2016-03-25 17:20:24 +07:00
dispersion.conf-sample Fix swift-dispersion in multi-region setups 2016-06-01 15:35:47 +02:00
drive-audit.conf-sample Added comment for "user" option in drive-audit config 2016-11-21 22:13:11 +01:00
internal-client.conf-sample Removed default value for log_statsd_host 2016-02-10 10:36:59 -06:00
memcache.conf-sample fixups for ipv6 memcache_servers docs 2016-01-12 21:08:58 -08:00
mime.types-sample PEP 8 compliance and small modification to mime.types file 2010-11-23 19:26:02 -06:00
object-expirer.conf-sample Documantation enhancements of nice/ionice feature 2016-08-19 07:39:49 +02:00
object-server.conf-sample Deprecate broken handoffs_first in favor of handoffs_only 2017-02-13 21:13:29 -08:00
proxy-server.conf-sample Default object_post_as_copy to False 2017-01-20 12:37:01 -05:00
rsyncd.conf-sample Allows to configure the rsync modules where the replicators will send data 2015-09-07 08:00:18 +02:00
swift-rsyslog.conf-sample Add sample rsyslog.conf. 2013-06-25 10:24:26 +08:00
swift.conf-sample EC Fragment Duplication - Foundational Global EC Cluster Support 2017-02-22 10:56:13 -08:00