diff --git a/README b/README index b5f20d2..39ddefb 100644 --- a/README +++ b/README @@ -1,170 +1,193 @@ -This is v1.0-rc1 of PyECLib. This library provides a simple Python interface for -implementing erasure codes and is known to work with Python v2.6, 2.7 and 3.x. - -To obtain the best possible performance, the library utilizes liberasurecode, -which is a C based erasure code library. Please let us know if you have any -other issues building or installing (email: kmgreen2@gmail.com or -tusharsg@gmail.com). - -This library makes use of Jesasure for Reed-Solomon as implemented by the -liberasurecode library and provides its' own flat XOR-based erasure code -encoder and decoder. Currently, it implements a specific class of HD -Combination Codes (see "Flat XOR-based erasure codes in storage systems: -Constructions, efficient recovery, and tradeoffs" in IEEE MSST 2010). These -codes are well-suited to archival use-cases, have a simple construction and -require a minimum number of participating disks during single-disk -reconstruction (think XOR-based LRC code). - -Examples of using this library are provided in "tools" directory: - - Command-line encoder:: - - tools/pyeclib_encode.py - - Command-line decoder:: - - tools/pyeclib_decode.py - - Utility to determine what is needed to reconstruct missing fragments:: - - tools/pyeclib_fragments_needed.py - - -PyEClib initialization:: - - ec_driver = ECDriver(k=, - m=, - ec_type=)) - -Supported ``ec_type`` values: - - * ``jerasure_rs_vand`` => Vandermonde Reed-Solomon encoding, based on Jerasure [1] - * ``jerasure_rs_cauchy`` => Cauchy Reed-Solomon encoding (Jerasure variant), based on Jerasure [2] - * ``flat_xor_hd_3``, ``flat_xor_hd_4`` => Flat-XOR based HD combination codes, liberasurecode [3] - * ``isa_l_rs_vand`` => Intel Storage Acceleration Library (ISA-L) - SIMD accelerated Erasure Coding backends [4] - * ``shss`` => NTT Lab Japan's Erasure Coding Library - -A configuration utility is provided to help compare available EC schemes in -terms of performance and redundancy:: tools/pyeclib_conf_tool.py - - -The Python API supports the following functions: - -- EC Encode - - Encode N bytes of a data object into k (data) + m (parity) fragments:: - - def encode(self, data_bytes) - - input: data_bytes - input data object (bytes) - returns: list of fragments (bytes) - - -- EC Decode - - Decode between k and k+m fragments into original object:: - - def decode(self, fragment_payloads) - - input: list of fragment_payloads (bytes) - returns: decoded object (bytes) - - -*Note*: ``bytes`` is a synonym to ``str`` in Python 2.6, 2.7. -In Python 3.x, ``bytes`` and ``str`` types are non-interchangeable and care -needs to be taken when handling input to and output from the ``encode()`` and -``decode()`` routines. - - -- EC Reconstruct - - Reconstruct "missing_fragment_indexes" using "available_fragment_payloads":: - - def reconstruct(self, available_fragment_payloads, missing_fragment_indexes) - - - -- Minimum parity fragments needed for durability gurantees:: - - def min_parity_fragments_needed(self) - - - -- Fragments needed for EC Reconstruct - - Return the indexes of fragments needed to reconstruct "missing_fragment_indexes":: - - def fragments_needed(self, missing_fragment_indexes) - - - -- Get EC Metadata - - Return an opaque buffer known by the underlying library:: - - def get_metadata(self, fragment) - - - -- Verify EC Stripe Consistency - - Use opaque buffers from get_metadata() to verify a the consistency of a stripe:: - - def verify_stripe_metadata(self, fragment_metadata_list) - - - -- Get EC Segment Info - - Return a dict with the keys - segment_size, last_segment_size, fragment_size, last_fragment_size and num_segments:: - - def get_segment_info(self, data_len, segment_size) - - - -Quick Start: - - Standard stuff to install:: - - ``Python 2.6``, ``2.7`` or ``3.x`` (including development packages), ``argparse`` and ``liberasurecode``. - - - As mentioned above, PyECLib depends on the installation of the liberasurecde library (liberasurecode - can be found at https://bitbucket.org/tsg-/liberasurecode) - - - Install PyECLib:: - - $ sudo python setup.py install - - Run test suite included:: - - $ sudo python setup.py test && (cd test; ./ec_pyeclib_file_test.sh) - - If all of this works, then you should be good to go. If not, send us an email! - - If the test suite fails because it cannot find any of the shared libraries, - then you probably need to add /usr/local/lib to the path searched when loading - libraries. The best way to do this (on Linux) is to add '/usr/local/lib' to:: - - /etc/ld.so.conf - - and then run:: - - $ ldconfig - - -References - - [1] Jerasure, C library that supports erasure coding in storage applications, http://jerasure.org - - [2] Greenan, Kevin M et al, "Flat XOR-based erasure codes in storage systems", http://www.kaymgee.com/Kevin_Greenan/Publications_files/greenan-msst10.pdf - - [3] liberasurecode, C API abstraction layer for erasure coding backends, https://bitbucket.org/tsg-/liberasurecode - - [4] Intel(R) Storage Acceleration Library (Open Source Version), https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version - - [5] Kota Tsuyuzaki , Ryuta Kon , "NTT SHSS Erasure Coding backend" - --- -1.0 +This is v1.0-rc1 of PyECLib. This library provides a simple Python interface for +implementing erasure codes and is known to work with Python v2.6, 2.7 and 3.x. + +To obtain the best possible performance, the library utilizes liberasurecode, +which is a C based erasure code library. Please let us know if you have any +other issues building or installing (email: kmgreen2@gmail.com or +tusharsg@gmail.com). + +This library makes use of Jesasure for Reed-Solomon as implemented by the +liberasurecode library and provides its' own flat XOR-based erasure code +encoder and decoder. Currently, it implements a specific class of HD +Combination Codes (see "Flat XOR-based erasure codes in storage systems: +Constructions, efficient recovery, and tradeoffs" in IEEE MSST 2010). These +codes are well-suited to archival use-cases, have a simple construction and +require a minimum number of participating disks during single-disk +reconstruction (think XOR-based LRC code). + +Examples of using this library are provided in "tools" directory: + + Command-line encoder:: + + tools/pyeclib_encode.py + + Command-line decoder:: + + tools/pyeclib_decode.py + + Utility to determine what is needed to reconstruct missing fragments:: + + tools/pyeclib_fragments_needed.py + + +PyEClib initialization:: + + ec_driver = ECDriver(k=, + m=, + ec_type=)) + +Supported ``ec_type`` values: + + * ``jerasure_rs_vand`` => Vandermonde Reed-Solomon encoding, based on Jerasure [1] + * ``jerasure_rs_cauchy`` => Cauchy Reed-Solomon encoding (Jerasure variant), based on Jerasure [2] + * ``flat_xor_hd_3``, ``flat_xor_hd_4`` => Flat-XOR based HD combination codes, liberasurecode [3] + * ``isa_l_rs_vand`` => Intel Storage Acceleration Library (ISA-L) - SIMD accelerated Erasure Coding backends [4] + * ``shss`` => NTT Lab Japan's Erasure Coding Library + +A configuration utility is provided to help compare available EC schemes in +terms of performance and redundancy:: tools/pyeclib_conf_tool.py + + +The Python API supports the following functions: + +- EC Encode + + Encode N bytes of a data object into k (data) + m (parity) fragments:: + + def encode(self, data_bytes) + + input: data_bytes - input data object (bytes) + returns: list of fragments (bytes) + + +- EC Decode + + Decode between k and k+m fragments into original object:: + + def decode(self, fragment_payloads) + + input: list of fragment_payloads (bytes) + returns: decoded object (bytes) + + +*Note*: ``bytes`` is a synonym to ``str`` in Python 2.6, 2.7. +In Python 3.x, ``bytes`` and ``str`` types are non-interchangeable and care +needs to be taken when handling input to and output from the ``encode()`` and +``decode()`` routines. + + +- EC Reconstruct + + Reconstruct "missing_fragment_indexes" using "available_fragment_payloads":: + + def reconstruct(self, available_fragment_payloads, missing_fragment_indexes) + + + +- Minimum parity fragments needed for durability gurantees:: + + def min_parity_fragments_needed(self) + + + +- Fragments needed for EC Reconstruct + + Return the indexes of fragments needed to reconstruct "missing_fragment_indexes":: + + def fragments_needed(self, missing_fragment_indexes) + + + +- Get EC Metadata + + Return an opaque buffer known by the underlying library:: + + def get_metadata(self, fragment) + + + +- Verify EC Stripe Consistency + + Use opaque buffers from get_metadata() to verify a the consistency of a stripe:: + + def verify_stripe_metadata(self, fragment_metadata_list) + + + +- Get EC Segment Info + + Return a dict with the keys - segment_size, last_segment_size, fragment_size, last_fragment_size and num_segments:: + + def get_segment_info(self, data_len, segment_size) + + + +- Get EC Segment Info given a data length and segment size:: + + def get_segment_info_byterange(self, ranges, data_len, segment_size) + + Assume a range request is given for an object with segment size 3K and + a 1 MB file: + + Ranges = (0, 1), (1, 12), (10, 1000), (0, segment_size-1), + (1, segment_size+1), (segment_size-1, 2*segment_size) + + This will return a map keyed on the ranges, where there is a recipe + given for each range: + + { + (0, 1): {0: (0, 1)}, + (10, 1000): {0: (10, 1000)}, + (1, 12): {0: (1, 12)}, + (0, 3071): {0: (0, 3071)}, + (3071, 6144): {0: (3071, 3071), 1: (0, 3071), 2: (0, 0)}, + (1, 3073): {0: (1, 3071), 1: (0,0)} + } + + +Quick Start: + + Standard stuff to install:: + + ``Python 2.6``, ``2.7`` or ``3.x`` (including development packages), ``argparse`` and ``liberasurecode``. + + + As mentioned above, PyECLib depends on the installation of the liberasurecde library (liberasurecode + can be found at https://bitbucket.org/tsg-/liberasurecode) + + + Install PyECLib:: + + $ sudo python setup.py install + + Run test suite included:: + + $ sudo python setup.py test && (cd test; ./ec_pyeclib_file_test.sh) + + If all of this works, then you should be good to go. If not, send us an email! + + If the test suite fails because it cannot find any of the shared libraries, + then you probably need to add /usr/local/lib to the path searched when loading + libraries. The best way to do this (on Linux) is to add '/usr/local/lib' to:: + + /etc/ld.so.conf + + and then run:: + + $ ldconfig + + +References + + [1] Jerasure, C library that supports erasure coding in storage applications, http://jerasure.org + + [2] Greenan, Kevin M et al, "Flat XOR-based erasure codes in storage systems", http://www.kaymgee.com/Kevin_Greenan/Publications_files/greenan-msst10.pdf + + [3] liberasurecode, C API abstraction layer for erasure coding backends, https://bitbucket.org/tsg-/liberasurecode + + [4] Intel(R) Storage Acceleration Library (Open Source Version), https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version + + [5] Kota Tsuyuzaki , Ryuta Kon , "NTT SHSS Erasure Coding backend" + +-- +1.0 \ No newline at end of file