Update code_organization.md docs

Liberasurecode docs haven't been updated in a while. Some new
implementations have been added since then, so add them to the
code_organization.md doc. I didn't add ALL files in the repo, just
the more important implementation ones, namely:

  - isa_l_rs_cauchy.c
  - isa_l_rs_vand_inv.c
  - liberasurecode_rs_vand.c

Tim wrote an overview of erasure coding for a colleague. It makes a
good additional doc for this repo, and I have his permission to add it:

  doc/erasure_coding.md

Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: Ifd3e4aea4dbed664fb77a3e4a3106bd3b0d6f343
Signed-off-by: Matthew Oliver <matt@oliver.net.au>
This commit is contained in:
Matthew Oliver
2025-07-31 16:54:28 +10:00
committed by Tim Burke
parent 5d25e4a663
commit f6fa6c668b
2 changed files with 188 additions and 25 deletions


@@ -3,49 +3,53 @@ Code organization
```
|-- include
| +-- erasurecode
| | +-- erasurecode.h --> liberasurecode frontend API header
| | +-- erasurecode_backend.h --> liberasurecode backend API header
| +-- xor_codes --> headers for the built-in XOR codes
| | +-- erasurecode.h --> liberasurecode frontend API header
| | +-- erasurecode_backend.h --> liberasurecode backend API header
| +-- xor_codes --> headers for the built-in XOR codes
|
|-- src
| |-- erasurecode.c --> liberasurecode API implementation
| | (frontend + backend)
| |-- erasurecode.c --> liberasurecode API implementation
| | (frontend + backend)
| |-- backends
| | +-- null
| | +--- null.c --> 'null' erasure code backend (template backend)
| | +-- null.c --> 'null' erasure code backend (template backend)
| | +-- xor
| | +--- flat_xor_hd.c --> 'flat_xor_hd' erasure code backend (built-in)
| | +-- jerasure
| | +-- jerasure_rs_cauchy.c --> 'jerasure_rs_cauchy' erasure code backend (jerasure.org)
| | +-- jerasure_rs_vand.c --> 'jerasure_rs_vand' erasure code backend (jerasure.org)
| | +-- flat_xor_hd.c --> 'flat_xor_hd' erasure code backend (built-in)
| | +-- rs_vand
| | +-- liberasurecode_rs_vand.c --> 'liberasurecode_rs_vand' erasure code backend (built-in)
| | +-- jerasure
| | +-- jerasure_rs_cauchy.c --> 'jerasure_rs_cauchy' erasure code backend (jerasure.org)
| | +-- jerasure_rs_vand.c --> 'jerasure_rs_vand' erasure code backend (jerasure.org)
| | +-- isa-l
| | +-- isa_l_rs_vand.c --> 'isa_l_rs_vand' erasure code backend (Intel)
| | +-- isa_l_rs_vand.c --> 'isa_l_rs_vand' erasure code backend (Intel)
| | +-- isa_l_rs_vand_inv.c --> 'isa_l_rs_vand_inv' erasure code backend (Intel)
| | +-- isa_l_rs_cauchy.c --> 'isa_l_rs_cauchy' erasure code backend (Intel)
| | +-- shss
| | +-- shss.c --> 'shss' erasure code backend (NTT Labs)
| | +-- shss.c --> 'shss' erasure code backend (NTT Labs)
| | +-- phazrio
| | +-- libphazr.c --> 'libphazr' erasure code backend (Phazr.IO)
| | +-- libphazr.c --> 'libphazr' erasure code backend (Phazr.IO)
| |
| |-- builtin
| | +-- xor_codes --> XOR HD code backend, built-in erasure
| | | code implementation (shared library)
| | +-- xor_codes --> XOR HD code backend, built-in erasure
| | | code implementation (shared library)
| | +-- xor_code.c
| | +-- xor_hd_code.c
| | +-- rs_vand --> liberasurecode native Reed-Solomon codes
| | +-- rs_vand --> liberasurecode native Reed-Solomon codes
| |
| +-- utils
| +-- chksum --> fragment checksum utils for erasure
| +-- alg_sig.c coded fragments
| +-- chksum --> fragment checksum utils for erasure
| +-- alg_sig.c coded fragments
| +-- crc32.c
|
|-- doc --> API Documentation
|-- doc --> API Documentation
| +-- Doxyfile
| +-- html
|
|--- test --> Test routines
| +-- builtin
| | +-- xor_codes
| +-- liberasurecode_test.c
| +-- utils
|-- test --> Test routines
| +-- builtin
| | +-- xor_codes
| +-- liberasurecode_test.c
| +-- utils
|
|-- autogen.sh
|-- configure.ac
@@ -57,4 +61,4 @@ Code organization
|-- INSTALL
+-- ChangeLog
```
---
---

doc/erasure_coding.md Normal file

@@ -0,0 +1,159 @@
Overview
========
Erasure coding allows the distribution of data across several independent
disks, improving data durability without requiring as much overhead as
high-replica replication. Data is broken into `k` data fragments, then
`k + m` fragments are calculated and stored. Given some `n ∈ [k, k+m)`
of these stored fragments, the original data can be reconstructed. Optimal
codes ensure that all subsets of `k` stored fragments can be used for
reconstruction.
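
The overhead comparison above can be made concrete with a little arithmetic. This is a minimal sketch; the function names and the `k = 10`, `m = 4` and three-replica values are illustrative assumptions, not settings taken from liberasurecode.

```python
from fractions import Fraction


def storage_overhead(k: int, m: int) -> Fraction:
    """Bytes stored per byte of user data when k + m fragments are kept."""
    return Fraction(k + m, k)


def replication_overhead(replicas: int) -> Fraction:
    """Bytes stored per byte of user data with full-copy replication."""
    return Fraction(replicas)


# Hypothetical policies chosen only for illustration.
ec = storage_overhead(k=10, m=4)
rep = replication_overhead(replicas=3)
print(f"erasure coded: {float(ec):.1f}x, replicated: {float(rep):.1f}x")
```

With these assumed values the erasure-coded policy stores 1.4 bytes per byte of data and, for an optimal code, survives the loss of any 4 fragments. Triple replication stores 3 bytes per byte and survives the loss of 2 copies.
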
Theory
======
Any [Reed-Solomon](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction)
code uses linear algebra over a [Galois field](https://en.wikipedia.org/wiki/Finite_field).
The `k` data fragments are represented as a series of vectors and multiplied
by a `k × (k + m)` encoding matrix `E` to produce the `k + m` fragments for
storage. To decode a set of fragments `[f₁, f₂, ..., fₙ]`, select the
corresponding columns of `E` to create a `k × n` matrix `E′`, then compute
the decoding matrix `D` as a left-inverse of `E′ᵀ` (i.e., `D × E′ᵀ = Iₖ`).
Multiply the fragments by `D` to recover the original data.
Note that for systematic encodings, the left-most `k × k` submatrix of `E` is `Iₖ`.
The encoding matrix `E` is typically based upon either
a [Vandermonde matrix](https://en.wikipedia.org/wiki/Vandermonde_matrix) or
a [Cauchy matrix](https://en.wikipedia.org/wiki/Cauchy_matrix).
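
The encode and decode steps described above can be sketched in a few lines. To keep the arithmetic readable, this sketch assumes the prime field GF(257), whereas real backends typically work in GF(2^w). The helper names, the choice of Vandermonde points and the sample data are all hypothetical, and none of this is liberasurecode's own code.

```python
P = 257  # assumption: a small prime field instead of GF(2^w)


def mat_mul(a, b):
    """Multiply two matrices with entries reduced modulo P."""
    return [[sum(x * y for x, y in zip(row, col)) % P for col in zip(*b)]
            for row in a]


def mat_inv(m):
    """Invert a square matrix over GF(P) using Gauss-Jordan elimination."""
    n = len(m)
    aug = [[v % P for v in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(P)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse, Python 3.8+
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(v - factor * w) % P
                          for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def encoding_matrix(k, m):
    """k x (k + m) Vandermonde matrix; column j holds powers of x_j = j + 1."""
    return [[pow(j + 1, i, P) for j in range(k + m)] for i in range(k)]


def encode(data, e):
    """Multiply the row vector of k data symbols by E to get k + m symbols."""
    return mat_mul([data], e)[0]


def decode(available, e, k):
    """Recover the data from a {fragment index: symbol} dict of >= k entries."""
    idx = sorted(available)[:k]
    # Rows of the transposed submatrix correspond to the surviving fragments.
    e_sub_t = [[e[i][j] for i in range(k)] for j in idx]
    d = mat_inv(e_sub_t)
    frags = [[available[j]] for j in idx]
    return [row[0] for row in mat_mul(d, frags)]


k, m = 3, 2
E = encoding_matrix(k, m)
data = [72, 105, 33]                     # one symbol per data fragment
fragments = encode(data, E)
survivors = {j: fragments[j] for j in (1, 3, 4)}  # fragments 0 and 2 lost
assert decode(survivors, E, k) == data
```

Because the columns use distinct points, every `k × k` selection of columns is itself a Vandermonde matrix and can be inverted, so any `k` surviving fragments decode. Note that this particular matrix is not systematic.
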
The flat XOR codes eschew matrix inversion and multiplication (which are both
expensive) in favor of XOR-ing particular subsets of fragments together to
create parity fragments. For more information, see
"[Flat XOR-based erasure codes in storage systems: Constructions, efficient recovery, and tradeoffs](https://web.archive.org/web/20161001210233/https://www.kaymgee.com/Kevin_Greenan/Publications_files/greenan-msst10.pdf)".
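
The idea of XOR-ing subsets of fragments into parity can be sketched as follows. The subset layout in `PARITY_SUBSETS` is a made-up example for illustration only; it is not the construction used by `flat_xor_hd3` or `flat_xor_hd4`, and the function names are hypothetical.

```python
def xor_bytes(*blocks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


# Hypothetical layout: each parity fragment covers a subset of data indices.
PARITY_SUBSETS = [(0, 1, 2), (1, 2, 3), (0, 2, 3)]


def encode(data_frags):
    """Build one parity fragment per subset by XOR-ing its members."""
    return [xor_bytes(*(data_frags[i] for i in subset))
            for subset in PARITY_SUBSETS]


def recover(missing, data_frags, parity_frags):
    """Rebuild one missing data fragment from a parity that covers it."""
    for subset, parity in zip(PARITY_SUBSETS, parity_frags):
        others = [i for i in subset if i != missing]
        if missing in subset and all(data_frags[i] is not None for i in others):
            return xor_bytes(parity, *(data_frags[i] for i in others))
    raise ValueError("no parity equation can recover this fragment")


data = [b"abcd", b"efgh", b"ijkl", b"mnop"]
parity = encode(data)
damaged = list(data)
damaged[2] = None                      # simulate losing data fragment 2
assert recover(2, damaged, parity) == data[2]
```

Recovery here needs no matrix inversion, only XOR over the surviving members of one subset. The trade-off is that not every set of `k` fragments is enough to decode.
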
Relevant Projects
=================
- [liberasurecode](https://opendev.org/openstack/liberasurecode/)
The primary entrypoint, offering a unifying interface for multiple
possible backends.
- [pyeclib](https://opendev.org/openstack/pyeclib/)
Python bindings for liberasurecode.
- [isa-l](https://github.com/intel/isa-l/)
Collection of optimized low-level functions for storage applications.
Uses [multi-binary dispatch](https://github.com/intel/isa-l/blob/master/doc/functions.md#multi-binary-dispatchers)
to offer optimized assembly to CPUs with a range of capabilities from
a single binary. Notably, provides fast block Reed-Solomon type erasure
codes for arbitrary encode/decode matrices as well as two functions for
generating specific encoding matrices.
- [jerasure](https://github.com/ceph/jerasure/)
First Reed-Solomon codes supported by liberasurecode. Requires gf-complete.
Written by James Plank, who has since made [the original website](jerasure.org)
read-only and [issued a notice](https://web.eecs.utk.edu/~jplank/plank/www/software.html)
regarding claims of patent-infringement.
- [gf-complete](https://github.com/ceph/gf-complete/)
Galois field library used by jerasure; also written by James Plank,
also potentially patent-encumbered.
- shss
Proprietary; developed by NTT. Requires additional data to be stored with
every fragment.
- libphazr
Proprietary; developed by Phazr.io. Requires additional data to be stored
with every fragment.
Supported Backends
==================
Provided by liberasurecode
--------------------------
- `liberasurecode_rs_vand` (added in liberasurecode 1.0.8, pyeclib 1.0.8)
- `flat_xor_hd3`
- `flat_xor_hd4`
Provided by isa-l
-----------------
- `isa_l_rs_vand`
Uses the Reed-Solomon functions provided by isa-l with
[an encoding matrix also provided by isa-l](https://github.com/intel/isa-l/blob/v2.31.1/erasure_code/ec_base.c#L78-L96).
Since this matrix is constructed by extending `Iₖ` with a `k × m` Vandermonde
matrix, a sufficient condition for optimality is that `m ≤ 4`; beyond that,
some `k × k` submatrices may not be invertible.
Prior to liberasurecode 1.3.0, it did not detect a failure to invert the
selected submatrix of `E`,
leading to incidents of data corruption. See [bug #1639691](https://bugs.launchpad.net/liberasurecode/+bug/1639691)
for more information.
- `isa_l_rs_vand_inv` (added in liberasurecode 1.7.0, pyeclib 1.7.0)
Uses the Reed-Solomon functions provided by isa-l with an encoding matrix
provided by liberasurecode. To construct the encoding matrix, start with a
`k × (k + m)` Vandermonde matrix `V`, define `V′` as the left-most `k × k`
submatrix, then calculate `E = inv(V′) × V`. This makes a systematic code
that is optimal for all `k` and `m`.
- `isa_l_rs_cauchy` (added in liberasurecode 1.4.0, pyeclib 1.4.0)
Uses the Reed-Solomon functions provided by isa-l with
[an encoding matrix also provided by isa-l](https://github.com/intel/isa-l/blob/v2.31.1/erasure_code/ec_base.c#L78-L96).
Being a Cauchy matrix, it forms an optimal code for all `k` and `m`.
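
The construction described for `isa_l_rs_vand_inv` (multiply a Vandermonde matrix by the inverse of its left-most `k × k` block) can be checked numerically. This is a sketch under stated assumptions: it works in the prime field GF(257) rather than the GF(2^8) arithmetic isa-l uses, and the helper names and Vandermonde points are hypothetical, so it illustrates the algebra rather than the backend's actual code.

```python
from itertools import combinations

P = 257  # assumption: prime field for readability, not GF(2^8)


def mat_mul(a, b):
    """Multiply two matrices with entries reduced modulo P."""
    return [[sum(x * y for x, y in zip(row, col)) % P for col in zip(*b)]
            for row in a]


def mat_inv(m):
    """Invert a square matrix over GF(P); raise ValueError if singular."""
    n = len(m)
    aug = [[v % P for v in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(P)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(v - factor * w) % P
                          for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def systematic_vandermonde(k, m):
    """Return inv(left k x k block of V) x V for a k x (k + m) Vandermonde V."""
    v = [[pow(j + 1, i, P) for j in range(k + m)] for i in range(k)]
    v_left = [row[:k] for row in v]
    return mat_mul(mat_inv(v_left), v)


k, m = 4, 6
E = systematic_vandermonde(k, m)
identity = [[int(i == j) for j in range(k)] for i in range(k)]

# Systematic: the left-most k x k block is the identity matrix.
assert [row[:k] for row in E] == identity

# Optimal: every choice of k columns is invertible (mat_inv would raise).
for cols in combinations(range(k + m), k):
    mat_inv([[row[j] for j in cols] for row in E])
```

Multiplying by an invertible matrix on the left does not change which column subsets are invertible, which is why the result stays optimal while gaining the identity block.
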
Provided by jerasure
--------------------
- `jerasure_rs_vand`
- `jerasure_rs_cauchy`
Proprietary
-----------
- `shss` (added in liberasurecode 1.0.0, pyeclib 1.0.1)
- `libphazr` (added in liberasurecode 1.5.0, pyeclib 1.5.0)
Classifications
===============
Required Fragments
------------------
### n ≡ k
Most supported backends are optimal erasure codes, where any `k` fragments
are sufficient to recover the original data.
### n > k
The flat XOR codes require more than `k` fragments to decode in the general
case. In particular, `flat_xor_hd3` requires at least `n ≡ k + m - 2`
fragments and `flat_xor_hd4` requires at least `n ≡ k + m - 3`.
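
The fragment counts above reduce to a small lookup. This is a sketch only: `min_fragments_to_decode` is a hypothetical helper, not a liberasurecode or pyeclib function. It treats every backend other than the two flat XOR codes as optimal, and the `k` and `m` values used are arbitrary and not necessarily a combination a given backend accepts.

```python
# Fragments that may be lost in the general case, per the text above.
FLAT_XOR_TOLERATED_LOSSES = {
    "flat_xor_hd3": 2,
    "flat_xor_hd4": 3,
}


def min_fragments_to_decode(ec_type: str, k: int, m: int) -> int:
    """Fragments needed to decode in the general case."""
    if ec_type in FLAT_XOR_TOLERATED_LOSSES:
        return k + m - FLAT_XOR_TOLERATED_LOSSES[ec_type]
    # Assumption: any other backend behaves as an optimal code.
    return k


for name in ("isa_l_rs_cauchy", "flat_xor_hd3", "flat_xor_hd4"):
    print(name, min_fragments_to_decode(name, k=10, m=5))
```
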
Systematic vs. Non-systematic
-----------------------------
[Systematic codes](https://en.wikipedia.org/wiki/Systematic_code) ensure that
the first `k` fragments for storage correspond to the initial `k` data
fragments. This can greatly speed up decoding when all `k` data fragments are
available as well as provide more recovery options in certain failure cases.
Non-systematic encodings do not ensure that. Rather, they often will seek to
ensure that *none* of the original data is directly present in the storage
fragments, thus ensuring confidentiality of data when fewer than `n` fragments
are available. See also: [secure secret sharing](https://en.wikipedia.org/wiki/Secret_sharing).
The following backends are non-systematic:
- `shss`
- `libphazr`
All other supported backends are systematic.