
This is v1.0-rc1 of PyECLib.  This library provides a simple Python interface for
implementing erasure codes and is known to work with Python v2.6, 2.7 and 3.x.

To obtain the best possible performance, the library utilizes liberasurecode,
which is a C-based erasure code library.  Please let us know if you run into
any issues building or installing it (email: kmgreen2@gmail.com or
tusharsg@gmail.com).

This library makes use of Jerasure for Reed-Solomon as implemented by the
liberasurecode library, and provides its own flat XOR-based erasure code
encoder and decoder.  Currently, it implements a specific class of HD
Combination Codes (see "Flat XOR-based erasure codes in storage systems:
Constructions, efficient recovery, and tradeoffs" in IEEE MSST 2010).  These
codes are well-suited to archival use-cases, have a simple construction and
require a minimum number of participating disks during single-disk
reconstruction (think XOR-based LRC code).
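To illustrate why XOR-based codes make reconstruction cheap, here is a minimal sketch of the simplest possible flat XOR code: k data fragments plus a single parity fragment that is the byte-wise XOR of all of them. This is not the HD combination code PyECLib implements (which uses several overlapping parity fragments), and ``toy_xor_encode`` / ``toy_xor_reconstruct`` are hypothetical names used only for illustration; the sketch just shows how one missing fragment can be rebuilt by XOR-ing the survivors.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_xor_encode(data: bytes, k: int) -> List[bytes]:
    """Hypothetical toy encoder (not PyECLib): k zero-padded data fragments + 1 XOR parity."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)  # d0 ^ d1 ^ ... ^ d(k-1)
    return fragments + [parity]


def toy_xor_reconstruct(fragments: List[bytes], missing_index: int) -> bytes:
    """Rebuild a single missing fragment (data or parity) by XOR-ing all survivors."""
    survivors = [f for i, f in enumerate(fragments) if i != missing_index]
    return reduce(xor_bytes, survivors)


if __name__ == "__main__":
    frags = toy_xor_encode(b"hello erasure codes", k=4)
    lost = 2
    assert toy_xor_reconstruct(frags, lost) == frags[lost]
```

With a single parity fragment every surviving fragment takes part in a rebuild; HD combination codes spread parity across smaller groups so that a single-disk reconstruction touches fewer disks.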

Examples of using this library are provided in the ``tools`` directory:

  Command-line encoder::
  
      tools/pyeclib_encode.py

  Command-line decoder::
  
      tools/pyeclib_decode.py

  Utility to determine what is needed to reconstruct missing fragments::
  
      tools/pyeclib_fragments_needed.py


PyECLib initialization::

  from pyeclib.ec_iface import ECDriver

  ec_driver = ECDriver(k=<num_encoded_data_fragments>,
                       m=<num_encoded_parity_fragments>,
                       ec_type=<ec_scheme>)

Supported ``ec_type`` values:

  * ``jerasure_rs_vand`` => Vandermonde Reed-Solomon encoding
  * ``jerasure_rs_cauchy`` => Cauchy Reed-Solomon encoding (Jerasure variant)
  * ``flat_xor_hd`` => Flat-XOR based HD combination codes
  * ``isa_l_rs_vand`` => SIMD-based Reed-Solomon implementation from ISA-L (Intel(R) Storage Acceleration Library)
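When picking ``k`` and ``m``, a quick back-of-the-envelope check can help. The sketch below assumes an MDS code such as the Reed-Solomon types listed above, where any ``k`` of the ``k+m`` fragments suffice to decode; ``mds_profile`` is a hypothetical helper, not part of PyECLib. Flat XOR HD codes are not MDS, so this arithmetic may overstate how many losses they tolerate.

```python
from fractions import Fraction


def mds_profile(k: int, m: int) -> dict:
    """Storage overhead and loss tolerance, assuming an MDS code (e.g. Reed-Solomon).

    Hypothetical helper for illustration only; not part of PyECLib.
    """
    if k < 1 or m < 0:
        raise ValueError("k must be >= 1 and m must be >= 0")
    return {
        "total_fragments": k + m,
        # raw bytes stored divided by object bytes (ignoring fragment headers)
        "storage_overhead": Fraction(k + m, k),
        # an MDS code can lose any m fragments and still decode
        "max_lost_fragments": m,
    }


if __name__ == "__main__":
    print(mds_profile(10, 4))
```

For example, ``k=10, m=4`` stores 14 fragments, uses 1.4x the raw space, and survives the loss of any 4 fragments under the MDS assumption.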

A configuration utility is provided to help compare available EC schemes in
terms of performance and redundancy::

  tools/pyeclib_conf_tool.py


The Python API supports the following functions:

- EC Encode

  Encode N bytes of a data object into k (data) + m (parity) fragments::

    def encode(self, data_bytes)

    input:   data_bytes - input data object (bytes)
    returns: list of fragments (bytes)


- EC Decode

  Decode between k and k+m fragments into original object::

    def decode(self, fragment_payloads)

    input:   list of fragment_payloads (bytes)
    returns: decoded object (bytes)


*Note*: ``bytes`` is a synonym for ``str`` in Python 2.6 and 2.7.
In Python 3.x, ``bytes`` and ``str`` types are non-interchangeable and care
needs to be taken when handling input to and output from the ``encode()`` and
``decode()`` routines.
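In Python 3, one way to keep this straight is to convert text to ``bytes`` right before calling ``encode()`` and back to text right after ``decode()``. ``to_payload`` and ``from_payload`` are hypothetical helpers, and UTF-8 is an assumed encoding; use whatever encoding the application actually stored the object with.

```python
from typing import Union


def to_payload(obj: Union[bytes, str], encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: return bytes suitable for passing to encode()."""
    if isinstance(obj, bytes):
        return obj  # already bytes, pass through unchanged
    if isinstance(obj, str):
        return obj.encode(encoding)  # text must be encoded explicitly in Python 3
    raise TypeError("expected bytes or str, got %s" % type(obj).__name__)


def from_payload(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: turn the bytes returned by decode() back into text."""
    return data.decode(encoding)
```

Used with a driver, this would look something like ``ec_driver.encode(to_payload(text))`` and ``from_payload(ec_driver.decode(fragments))``.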


- EC Reconstruct

  Reconstruct "missing_fragment_indexes" using "available_fragment_payloads"::

    def reconstruct(self, available_fragment_payloads, missing_fragment_indexes)
    

- Minimum parity fragments needed for durability guarantees

  Return the minimum number of parity fragments needed to meet the durability guarantees::

    def min_parity_fragments_needed(self)

 
- Fragments needed for EC Reconstruct

  Return the indexes of fragments needed to reconstruct "missing_fragment_indexes"::

    def fragments_needed(self, missing_fragment_indexes)


- Get EC Metadata

  Return an opaque buffer known by the underlying library::

    def get_metadata(self, fragment)


- Verify EC Stripe Consistency

  Use opaque buffers from get_metadata() to verify the consistency of a stripe::

    def verify_stripe_metadata(self, fragment_metadata_list)


- Get EC Segment Info

  Return a dict with the keys ``segment_size``, ``last_segment_size``, ``fragment_size``, ``last_fragment_size`` and ``num_segments``::

    def get_segment_info(self, data_len, segment_size)

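As a rough mental model for the segment-level keys of ``get_segment_info()``, the sketch below splits ``data_len`` bytes into fixed-size segments with plain ceiling division, so only the last segment may be shorter. ``naive_segment_info`` is a hypothetical name; it omits ``fragment_size`` and ``last_fragment_size`` (which depend on the backend, ``k``, and any per-fragment header), and the real implementation may handle a short trailing segment differently.

```python
def naive_segment_info(data_len: int, segment_size: int) -> dict:
    """Hypothetical simplification of get_segment_info(): segment-level keys only."""
    if data_len <= 0 or segment_size <= 0:
        raise ValueError("data_len and segment_size must be positive")
    num_segments = -(-data_len // segment_size)  # ceiling division
    # every segment except the last is exactly segment_size bytes
    last_segment_size = data_len - segment_size * (num_segments - 1)
    return {
        # an object smaller than segment_size is a single segment of its own length
        "segment_size": min(segment_size, data_len),
        "last_segment_size": last_segment_size,
        "num_segments": num_segments,
    }
```

For instance, a 2500-byte object with 1024-byte segments yields three segments, the last one 452 bytes long.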

Quick Start:

  Standard stuff to install:

  * ``Python 2.6``, ``2.7`` or ``3.x`` (including development packages)
  * ``argparse``
  * ``liberasurecode``


  As mentioned above, PyECLib depends on the installation of the liberasurecode library (liberasurecode
  can be found at https://bitbucket.org/tsg-/liberasurecode).


  Install PyECLib::

    $ sudo python setup.py install

  Run test suite included::

    $ sudo python setup.py test && (cd test; ./ec_pyeclib_file_test.sh)

  If all of this works, then you should be good to go.  If not, send us an email!

  If the test suite fails because it cannot find any of the shared libraries,
  then you probably need to add /usr/local/lib to the path searched when loading
  libraries.  The best way to do this (on Linux) is to add '/usr/local/lib' to::

    /etc/ld.so.conf

  and then run::

    $ sudo ldconfig
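Before re-running the tests, a quick way to check from Python whether the dynamic loader can see liberasurecode is ``ctypes.util.find_library``, which on Linux typically consults the same ``ldconfig`` cache updated above. The library name ``erasurecode`` (for ``liberasurecode.so``) is an assumption about how the library was installed, and ``locate_liberasurecode`` / ``can_load`` are hypothetical helper names.

```python
import ctypes
import ctypes.util
from typing import Optional


def locate_liberasurecode(name: str = "erasurecode") -> Optional[str]:
    """Return the library name the loader resolves for `name`, or None if not found."""
    return ctypes.util.find_library(name)


def can_load(name: str = "erasurecode") -> bool:
    """True if the shared library can be located and actually loaded."""
    path = locate_liberasurecode(name)
    if path is None:
        return False
    try:
        ctypes.CDLL(path)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("liberasurecode found:", locate_liberasurecode())
    print("liberasurecode loadable:", can_load())
```

If ``locate_liberasurecode()`` returns ``None`` even after editing ``/etc/ld.so.conf`` and running ``ldconfig``, the library was probably installed under a different prefix or name.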
