README edited online with Bitbucket
This commit is contained in:
363
README
363
README
@@ -1,170 +1,193 @@
|
||||
This is v1.0-rc1 of PyECLib. This library provides a simple Python interface for
|
||||
implementing erasure codes and is known to work with Python v2.6, 2.7 and 3.x.
|
||||
|
||||
To obtain the best possible performance, the library utilizes liberasurecode,
|
||||
which is a C based erasure code library. Please let us know if you have any
|
||||
other issues building or installing (email: kmgreen2@gmail.com or
|
||||
tusharsg@gmail.com).
|
||||
|
||||
This library makes use of Jesasure for Reed-Solomon as implemented by the
|
||||
liberasurecode library and provides its' own flat XOR-based erasure code
|
||||
encoder and decoder. Currently, it implements a specific class of HD
|
||||
Combination Codes (see "Flat XOR-based erasure codes in storage systems:
|
||||
Constructions, efficient recovery, and tradeoffs" in IEEE MSST 2010). These
|
||||
codes are well-suited to archival use-cases, have a simple construction and
|
||||
require a minimum number of participating disks during single-disk
|
||||
reconstruction (think XOR-based LRC code).
|
||||
|
||||
Examples of using this library are provided in "tools" directory:
|
||||
|
||||
Command-line encoder::
|
||||
|
||||
tools/pyeclib_encode.py
|
||||
|
||||
Command-line decoder::
|
||||
|
||||
tools/pyeclib_decode.py
|
||||
|
||||
Utility to determine what is needed to reconstruct missing fragments::
|
||||
|
||||
tools/pyeclib_fragments_needed.py
|
||||
|
||||
|
||||
PyEClib initialization::
|
||||
|
||||
ec_driver = ECDriver(k=<num_encoded_data_fragments>,
|
||||
m=<num_encoded_parity_fragments>,
|
||||
ec_type=<ec_scheme>))
|
||||
|
||||
Supported ``ec_type`` values:
|
||||
|
||||
* ``jerasure_rs_vand`` => Vandermonde Reed-Solomon encoding, based on Jerasure [1]
|
||||
* ``jerasure_rs_cauchy`` => Cauchy Reed-Solomon encoding (Jerasure variant), based on Jerasure [2]
|
||||
* ``flat_xor_hd_3``, ``flat_xor_hd_4`` => Flat-XOR based HD combination codes, liberasurecode [3]
|
||||
* ``isa_l_rs_vand`` => Intel Storage Acceleration Library (ISA-L) - SIMD accelerated Erasure Coding backends [4]
|
||||
* ``shss`` => NTT Lab Japan's Erasure Coding Library
|
||||
|
||||
A configuration utility is provided to help compare available EC schemes in
|
||||
terms of performance and redundancy:: tools/pyeclib_conf_tool.py
|
||||
|
||||
|
||||
The Python API supports the following functions:
|
||||
|
||||
- EC Encode
|
||||
|
||||
Encode N bytes of a data object into k (data) + m (parity) fragments::
|
||||
|
||||
def encode(self, data_bytes)
|
||||
|
||||
input: data_bytes - input data object (bytes)
|
||||
returns: list of fragments (bytes)
|
||||
|
||||
|
||||
- EC Decode
|
||||
|
||||
Decode between k and k+m fragments into original object::
|
||||
|
||||
def decode(self, fragment_payloads)
|
||||
|
||||
input: list of fragment_payloads (bytes)
|
||||
returns: decoded object (bytes)
|
||||
|
||||
|
||||
*Note*: ``bytes`` is a synonym to ``str`` in Python 2.6, 2.7.
|
||||
In Python 3.x, ``bytes`` and ``str`` types are non-interchangeable and care
|
||||
needs to be taken when handling input to and output from the ``encode()`` and
|
||||
``decode()`` routines.
|
||||
|
||||
|
||||
- EC Reconstruct
|
||||
|
||||
Reconstruct "missing_fragment_indexes" using "available_fragment_payloads"::
|
||||
|
||||
def reconstruct(self, available_fragment_payloads, missing_fragment_indexes)
|
||||
|
||||
|
||||
|
||||
- Minimum parity fragments needed for durability gurantees::
|
||||
|
||||
def min_parity_fragments_needed(self)
|
||||
|
||||
|
||||
|
||||
- Fragments needed for EC Reconstruct
|
||||
|
||||
Return the indexes of fragments needed to reconstruct "missing_fragment_indexes"::
|
||||
|
||||
def fragments_needed(self, missing_fragment_indexes)
|
||||
|
||||
|
||||
|
||||
- Get EC Metadata
|
||||
|
||||
Return an opaque buffer known by the underlying library::
|
||||
|
||||
def get_metadata(self, fragment)
|
||||
|
||||
|
||||
|
||||
- Verify EC Stripe Consistency
|
||||
|
||||
Use opaque buffers from get_metadata() to verify a the consistency of a stripe::
|
||||
|
||||
def verify_stripe_metadata(self, fragment_metadata_list)
|
||||
|
||||
|
||||
|
||||
- Get EC Segment Info
|
||||
|
||||
Return a dict with the keys - segment_size, last_segment_size, fragment_size, last_fragment_size and num_segments::
|
||||
|
||||
def get_segment_info(self, data_len, segment_size)
|
||||
|
||||
|
||||
|
||||
Quick Start:
|
||||
|
||||
Standard stuff to install::
|
||||
|
||||
``Python 2.6``, ``2.7`` or ``3.x`` (including development packages), ``argparse`` and ``liberasurecode``.
|
||||
|
||||
|
||||
As mentioned above, PyECLib depends on the installation of the liberasurecde library (liberasurecode
|
||||
can be found at https://bitbucket.org/tsg-/liberasurecode)
|
||||
|
||||
|
||||
Install PyECLib::
|
||||
|
||||
$ sudo python setup.py install
|
||||
|
||||
Run test suite included::
|
||||
|
||||
$ sudo python setup.py test && (cd test; ./ec_pyeclib_file_test.sh)
|
||||
|
||||
If all of this works, then you should be good to go. If not, send us an email!
|
||||
|
||||
If the test suite fails because it cannot find any of the shared libraries,
|
||||
then you probably need to add /usr/local/lib to the path searched when loading
|
||||
libraries. The best way to do this (on Linux) is to add '/usr/local/lib' to::
|
||||
|
||||
/etc/ld.so.conf
|
||||
|
||||
and then run::
|
||||
|
||||
$ ldconfig
|
||||
|
||||
|
||||
References
|
||||
|
||||
[1] Jerasure, C library that supports erasure coding in storage applications, http://jerasure.org
|
||||
|
||||
[2] Greenan, Kevin M et al, "Flat XOR-based erasure codes in storage systems", http://www.kaymgee.com/Kevin_Greenan/Publications_files/greenan-msst10.pdf
|
||||
|
||||
[3] liberasurecode, C API abstraction layer for erasure coding backends, https://bitbucket.org/tsg-/liberasurecode
|
||||
|
||||
[4] Intel(R) Storage Acceleration Library (Open Source Version), https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version
|
||||
|
||||
[5] Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>, Ryuta Kon <kon.ryuta@po.ntts.co.jp>, "NTT SHSS Erasure Coding backend"
|
||||
|
||||
--
|
||||
1.0
|
||||
This is v1.0-rc1 of PyECLib. This library provides a simple Python interface for
|
||||
implementing erasure codes and is known to work with Python v2.6, 2.7 and 3.x.
|
||||
|
||||
To obtain the best possible performance, the library utilizes liberasurecode,
|
||||
which is a C based erasure code library. Please let us know if you have any
|
||||
other issues building or installing (email: kmgreen2@gmail.com or
|
||||
tusharsg@gmail.com).
|
||||
|
||||
This library makes use of Jesasure for Reed-Solomon as implemented by the
|
||||
liberasurecode library and provides its' own flat XOR-based erasure code
|
||||
encoder and decoder. Currently, it implements a specific class of HD
|
||||
Combination Codes (see "Flat XOR-based erasure codes in storage systems:
|
||||
Constructions, efficient recovery, and tradeoffs" in IEEE MSST 2010). These
|
||||
codes are well-suited to archival use-cases, have a simple construction and
|
||||
require a minimum number of participating disks during single-disk
|
||||
reconstruction (think XOR-based LRC code).
|
||||
|
||||
Examples of using this library are provided in "tools" directory:
|
||||
|
||||
Command-line encoder::
|
||||
|
||||
tools/pyeclib_encode.py
|
||||
|
||||
Command-line decoder::
|
||||
|
||||
tools/pyeclib_decode.py
|
||||
|
||||
Utility to determine what is needed to reconstruct missing fragments::
|
||||
|
||||
tools/pyeclib_fragments_needed.py
|
||||
|
||||
|
||||
PyEClib initialization::
|
||||
|
||||
ec_driver = ECDriver(k=<num_encoded_data_fragments>,
|
||||
m=<num_encoded_parity_fragments>,
|
||||
ec_type=<ec_scheme>))
|
||||
|
||||
Supported ``ec_type`` values:
|
||||
|
||||
* ``jerasure_rs_vand`` => Vandermonde Reed-Solomon encoding, based on Jerasure [1]
|
||||
* ``jerasure_rs_cauchy`` => Cauchy Reed-Solomon encoding (Jerasure variant), based on Jerasure [2]
|
||||
* ``flat_xor_hd_3``, ``flat_xor_hd_4`` => Flat-XOR based HD combination codes, liberasurecode [3]
|
||||
* ``isa_l_rs_vand`` => Intel Storage Acceleration Library (ISA-L) - SIMD accelerated Erasure Coding backends [4]
|
||||
* ``shss`` => NTT Lab Japan's Erasure Coding Library
|
||||
|
||||
A configuration utility is provided to help compare available EC schemes in
|
||||
terms of performance and redundancy:: tools/pyeclib_conf_tool.py
|
||||
|
||||
|
||||
The Python API supports the following functions:
|
||||
|
||||
- EC Encode
|
||||
|
||||
Encode N bytes of a data object into k (data) + m (parity) fragments::
|
||||
|
||||
def encode(self, data_bytes)
|
||||
|
||||
input: data_bytes - input data object (bytes)
|
||||
returns: list of fragments (bytes)
|
||||
|
||||
|
||||
- EC Decode
|
||||
|
||||
Decode between k and k+m fragments into original object::
|
||||
|
||||
def decode(self, fragment_payloads)
|
||||
|
||||
input: list of fragment_payloads (bytes)
|
||||
returns: decoded object (bytes)
|
||||
|
||||
|
||||
*Note*: ``bytes`` is a synonym to ``str`` in Python 2.6, 2.7.
|
||||
In Python 3.x, ``bytes`` and ``str`` types are non-interchangeable and care
|
||||
needs to be taken when handling input to and output from the ``encode()`` and
|
||||
``decode()`` routines.
|
||||
|
||||
|
||||
- EC Reconstruct
|
||||
|
||||
Reconstruct "missing_fragment_indexes" using "available_fragment_payloads"::
|
||||
|
||||
def reconstruct(self, available_fragment_payloads, missing_fragment_indexes)
|
||||
|
||||
|
||||
|
||||
- Minimum parity fragments needed for durability gurantees::
|
||||
|
||||
def min_parity_fragments_needed(self)
|
||||
|
||||
|
||||
|
||||
- Fragments needed for EC Reconstruct
|
||||
|
||||
Return the indexes of fragments needed to reconstruct "missing_fragment_indexes"::
|
||||
|
||||
def fragments_needed(self, missing_fragment_indexes)
|
||||
|
||||
|
||||
|
||||
- Get EC Metadata
|
||||
|
||||
Return an opaque buffer known by the underlying library::
|
||||
|
||||
def get_metadata(self, fragment)
|
||||
|
||||
|
||||
|
||||
- Verify EC Stripe Consistency
|
||||
|
||||
Use opaque buffers from get_metadata() to verify a the consistency of a stripe::
|
||||
|
||||
def verify_stripe_metadata(self, fragment_metadata_list)
|
||||
|
||||
|
||||
|
||||
- Get EC Segment Info
|
||||
|
||||
Return a dict with the keys - segment_size, last_segment_size, fragment_size, last_fragment_size and num_segments::
|
||||
|
||||
def get_segment_info(self, data_len, segment_size)
|
||||
|
||||
|
||||
|
||||
- Get EC Segment Info given a data length and segment size::
|
||||
|
||||
def get_segment_info_byterange(self, ranges, data_len, segment_size)
|
||||
|
||||
Assume a range request is given for an object with segment size 3K and
|
||||
a 1 MB file:
|
||||
|
||||
Ranges = (0, 1), (1, 12), (10, 1000), (0, segment_size-1),
|
||||
(1, segment_size+1), (segment_size-1, 2*segment_size)
|
||||
|
||||
This will return a map keyed on the ranges, where there is a recipe
|
||||
given for each range:
|
||||
|
||||
{
|
||||
(0, 1): {0: (0, 1)},
|
||||
(10, 1000): {0: (10, 1000)},
|
||||
(1, 12): {0: (1, 12)},
|
||||
(0, 3071): {0: (0, 3071)},
|
||||
(3071, 6144): {0: (3071, 3071), 1: (0, 3071), 2: (0, 0)},
|
||||
(1, 3073): {0: (1, 3071), 1: (0,0)}
|
||||
}
|
||||
|
||||
|
||||
Quick Start:
|
||||
|
||||
Standard stuff to install::
|
||||
|
||||
``Python 2.6``, ``2.7`` or ``3.x`` (including development packages), ``argparse`` and ``liberasurecode``.
|
||||
|
||||
|
||||
As mentioned above, PyECLib depends on the installation of the liberasurecde library (liberasurecode
|
||||
can be found at https://bitbucket.org/tsg-/liberasurecode)
|
||||
|
||||
|
||||
Install PyECLib::
|
||||
|
||||
$ sudo python setup.py install
|
||||
|
||||
Run test suite included::
|
||||
|
||||
$ sudo python setup.py test && (cd test; ./ec_pyeclib_file_test.sh)
|
||||
|
||||
If all of this works, then you should be good to go. If not, send us an email!
|
||||
|
||||
If the test suite fails because it cannot find any of the shared libraries,
|
||||
then you probably need to add /usr/local/lib to the path searched when loading
|
||||
libraries. The best way to do this (on Linux) is to add '/usr/local/lib' to::
|
||||
|
||||
/etc/ld.so.conf
|
||||
|
||||
and then run::
|
||||
|
||||
$ ldconfig
|
||||
|
||||
|
||||
References
|
||||
|
||||
[1] Jerasure, C library that supports erasure coding in storage applications, http://jerasure.org
|
||||
|
||||
[2] Greenan, Kevin M et al, "Flat XOR-based erasure codes in storage systems", http://www.kaymgee.com/Kevin_Greenan/Publications_files/greenan-msst10.pdf
|
||||
|
||||
[3] liberasurecode, C API abstraction layer for erasure coding backends, https://bitbucket.org/tsg-/liberasurecode
|
||||
|
||||
[4] Intel(R) Storage Acceleration Library (Open Source Version), https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version
|
||||
|
||||
[5] Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>, Ryuta Kon <kon.ryuta@po.ntts.co.jp>, "NTT SHSS Erasure Coding backend"
|
||||
|
||||
--
|
||||
1.0
|
||||
Reference in New Issue
Block a user