Updating multi-hash spec for revision
Change-Id: I1f25c8d49c2b117cefb09d7edab494d32fed4ddb
parent
660f7c7280
commit
04a9379a62
237
specs/queens/approved/glance/multihash.rst
Normal file
@@ -0,0 +1,237 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=============================
Secure Hash Algorithm Support
=============================

https://blueprints.launchpad.net/glance/+spec/multihash

This spec provides a future-proof mechanism for providing a
secure SHA512 hash for each image.

Problem description
===================

The hash of an image can be obtained from the ``checksum`` field in the
image metadata. Currently, this checksum field holds an md5 hexdigest
value. While md5 can still be used to verify data integrity against
unintentional corruption, it should not be used to verify data integrity
against tampering. A more collision-resistant hashing algorithm should
be used for broader data integrity coverage.

Proposed change
===============

This spec proposes adding two new fields to the image metadata. The
proposed keys for this change are ``os_hash_algo`` and ``os_hash_value``.
``os_hash_algo`` will hold the name of the hashing algorithm used, and
``os_hash_value`` will hold the hexdigest of the image data computed with
that algorithm, obtained using the current Python hashlib library. At this
point in time, the config will default to sha3_512, but this change will
allow us to alter the allowed hashing algorithms at any time. The
``os_hash_algo`` field also allows users to pass the algorithm name directly
to their own Python code. We will keep the MD5 calculation and ``checksum``
field for backwards compatibility.

Adding a new image will set all of the ``checksum``, ``os_hash_algo`` and
``os_hash_value`` fields.

Requesting the image metadata of an image without ``os_hash_algo`` and
``os_hash_value`` set will result in null values for those fields in the
metadata response.

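For illustration only, a minimal client-side verification sketch that assumes
the two fields are exposed in the image metadata dict as described above (the
helper itself is hypothetical and not part of this change):

.. code:: python

    import hashlib


    def verify_image_data(image_meta, filename, chunk_size=65536):
        """Return True if the local file matches the Glance-provided hash."""
        # Prefer the new fields; fall back to the legacy md5 ``checksum`` for
        # images created before this change (the new fields will be null).
        algo = image_meta.get("os_hash_algo")
        expected = image_meta.get("os_hash_value")
        if not algo or not expected:
            algo, expected = "md5", image_meta["checksum"]
        m = hashlib.new(algo)  # e.g. hashlib.new("sha3_512")
        with open(filename, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                m.update(chunk)
        return m.hexdigest() == expected

Because the algorithm name travels alongside the value, consumer code like
the above keeps working unchanged if the configured default algorithm changes
later.
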
Alternatives
------------

1. Update the value of the ``checksum`` field with the hexdigest of a
   different hashing algorithm. This will likely break client-side
   checksum verification code.

2. Add only the new ``os_hash_value`` field. While it is a simpler change, it
   is a less future-proof approach because it ties the sha512 algorithm to the
   ``os_hash_value`` property in the same way that the md5 algorithm is tied
   to the current ``checksum`` property. When a sha512 collision is produced,
   a new spec will be needed to add yet another checksum field.

   The approach described in this spec of using an additional ``os_hash_algo``
   field will allow changing the hash algorithm without adding a new field or
   breaking the API contract. All we will have to do is update the algorithm
   used and put its name in the ``os_hash_algo`` property, and then any image
   consumer will know how to interpret what is in the ``os_hash_value``
   property.

   It is worth noting that the Glance implementation of image signature
   validation gives us a precedent for having a value in one property
   (``img_signature``) and the name of the algorithm used in another property
   (``img_signature_hash_method``) [1]_. Thus the proposal in this spec is
   completely consistent with a related Glance workflow.

3. Use the hexdigest of a different algorithm. Tests using hashlib's
   sha256 and sha512 algorithms consistently yielded faster times for
   sha512 (SEE: Performance impact). The implementation of the sha512
   algorithm within hashlib demonstrates reasonable performance and
   should be considered collision-resistant for many years.

4. Implement a single ``multihash`` field in the image metadata in which we
   store the SHA512 value. The single field would contain the encoded
   algorithm name and hash [0]_. The advantage is that only one field is
   added to the data model and schema. However, having only one field would
   make it harder to use the hash quickly, as the end user would have to
   decode which algorithm is being used before verifying the hash (see the
   short decoding sketch below).

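To make that extra step concrete, a minimal decoding sketch for alternative 4
(the multihash format per [0]_; single-byte varints are assumed, which holds
for sha2-512, and the helper is purely illustrative):

.. code:: python

    # Illustrative only: split a hex-encoded multihash value into the
    # algorithm it names and the plain hexdigest.
    MULTIHASH_CODES = {0x13: "sha512"}  # sha2-512; subset for this example


    def split_multihash(hex_value):
        raw = bytes.fromhex(hex_value)
        code, length, digest = raw[0], raw[1], raw[2:]
        if len(digest) != length:
            raise ValueError("truncated multihash value")
        return MULTIHASH_CODES[code], digest.hex()

With the two-field approach proposed above, no such decoding step is needed.
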
Data model impact
-----------------

* Triggers: None
* Expand: Two new columns (``os_hash_algo`` & ``os_hash_value``) with type
  string/varchar (defaulting to null) will be added to the images table. We
  will create an index similar to the one on ``checksum`` as well (a rough
  sketch of such an expand migration follows this list).
* Migrate: None
* Contract: None
* Conflicts: None

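For illustration, a rough sketch of what the expand migration could look
like; the Alembic revision boilerplate is omitted, and the column sizes and
index name are placeholders rather than decisions made by this spec:

.. code:: python

    # Hypothetical expand migration sketch; names and sizes are placeholders.
    from alembic import op
    import sqlalchemy as sa


    def upgrade():
        op.add_column('images',
                      sa.Column('os_hash_algo', sa.String(64), nullable=True))
        op.add_column('images',
                      sa.Column('os_hash_value', sa.String(128),
                                nullable=True))
        # Index comparable to the existing one on ``checksum``.
        op.create_index('os_hash_value_image_idx', 'images',
                        ['os_hash_value'])
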
REST API impact
---------------

Two new fields (``os_hash_algo`` & ``os_hash_value``) will be present in
responses to requests for the image metadata.

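For illustration, the relevant slice of the image metadata a client would
see, with placeholder values (all existing properties are unchanged):

.. code:: python

    # Hypothetical excerpt of the image metadata returned to a client.
    image_metadata_excerpt = {
        "checksum": "<md5 hexdigest, unchanged>",
        "os_hash_algo": "sha3_512",
        "os_hash_value": "<hexdigest produced by the configured algorithm>",
    }
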
Security impact
---------------

A new, more collision-resistant hash will be returned in addition to the
current checksum.

Notification impact
-------------------

None

Other end user impact
---------------------

None

Performance impact
------------------

New image uploads (and the first download of previously uploaded images)
will take an additional amount of time to calculate the sha512 checksum:

5G binary blob of random data:

* md5: ~9s
* sha256: ~22s
* sha512: ~14s
* 1Gbps line speed upload: 42s

1.5G Ubuntu 16.04 cloud image:

* md5: ~2.9s
* sha256: ~7.2s
* sha512: ~4.6s
* 1Gbps line speed upload: 12s

555M Debian 8 cloud image:

* md5: ~1.0s
* sha256: ~2.5s
* sha512: ~1.6s
* 1Gbps line speed upload: 4.5s

Note: SHA512 has been selected; its calculation should have minimal impact
on overall upload time relative to the entire upload process.

Test Code:

.. code:: python

    #!/usr/bin/env python3

    import hashlib
    import time


    def runtime(f):
        # Decorator that prints how long the wrapped call took.
        def wrapper(*args, **kwargs):
            start = time.time()
            f(*args, **kwargs)
            print("Time elapsed: %s" % (time.time() - start))
        return wrapper


    @runtime
    def checksum(filename, algorithm):
        algorithms = {"md5": hashlib.md5,
                      "256": hashlib.sha256,
                      "512": hashlib.sha512,
                      }
        with open(filename, "rb") as f:
            m = algorithms[algorithm]()
            # Read in 64KiB chunks; the sentinel must be b"" for a binary file.
            for chunk in iter(lambda: f.read(65536), b""):
                m.update(chunk)
        print("%s: %s" % (algorithm, m.hexdigest()))


    checksum("fake.img", "512")
    checksum("fake.img", "256")
    checksum("fake.img", "md5")
    checksum("fake.img", "256")
    checksum("fake.img", "md5")
    checksum("fake.img", "512")

Developer impact
----------------

Any future checksum verification code should use the ``os_hash_algo`` &
``os_hash_value`` fields, falling back to the ``checksum`` field when they
are not populated.

Implementation
==============

Assignee(s)
-----------

Primary assignee: Scott McClymont

Other contributors:

Work Items
----------

* Add tests

* Update the db to add ``os_hash_algo`` & ``os_hash_value`` columns to the
  images table (including expand, migrate, contract, and monolith code)

* Update the sections of code that calculate the ``checksum`` to also
  calculate ``os_hash_algo`` & ``os_hash_value`` (includes calculation on
  upload)

* Discuss updating ``os_hash_algo`` & ``os_hash_value`` on download when the
  values are null

* Update internal checksum verification code to use the ``os_hash_value``
  field and fall back to the ``checksum`` field when not present

* Update glance client

* Update docs

* Add the ``os_hash_algo`` value to the discovery API if the API is ready

Dependencies
============

None

Testing
=======

Update the tests to verify proper population of the new image properties


References
==========

.. [0] http://multiformats.io/multihash/
.. [1] https://docs.openstack.org/glance/latest/user/signature.html

@@ -1,214 +0,0 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=================
Multihash Support
=================

https://blueprints.launchpad.net/glance/+spec/multihash

This spec provides a future-proof mechanism for providing a
self-describing hash for each image.

Problem description
===================

The hash of an image can be obtained from the ``checksum`` field in the
image metadata. Currently, this checksum field holds an md5 hexdigest
value. While md5 can still be used to verify data integrity against
unintentional corruption, it should not be used to verify data integrity
against tampering. A more collision-resistant hashing algorithm should
be used for broader data integrity coverage.

Proposed change
===============

This spec proposes adding a new field to the image metadata. The
proposed key name for this field is ``multihash``. The proposed value is
a multihash [0]_ formatted sha512 hexdigest (codec: sha2-512, code: 0x13).
The hexdigest will be obtained using the current hashlib library. The
digest length and hash function type will be prepended as described in
the documentation [0]_ for the multihash format.

Adding a new image will set both the ``checksum`` and ``multihash``
fields.

Requesting the image metadata of an image without the ``multihash``
value set will result in a null for the ``multihash`` field in the
metadata response.

Downloading an image without the ``multihash`` value set will result in
the value being populated upon completion.

Uploading an image to an existing image location will result in the
``multihash`` value being set / updated.

Alternatives
------------

1. Update the value of the ``checksum`` field with the hexdigest of a
   different hashing algorithm. This will likely break client-side
   checksum verification code.

2. Add a new non-multihash checksum field. This is a less future-proof
   approach that will likely result in a new spec in a few years to add
   yet another checksum field. Using the multihash format allows
   changing the algorithm without adding a new field or breaking the API
   contract.

3. Use the hexdigest of a different algorithm. Tests using hashlib's
   sha256 and sha512 algorithms consistently yielded faster times for
   sha512 (SEE: Performance impact). The implementation of the sha512
   algorithm within hashlib demonstrates reasonable performance and
   should be considered collision-resistant for many years.

Data model impact
-----------------

* Triggers: None
* Expand: A new column (``multihash``) with type string/varchar
  (defaulting to null) will be added to the images table along with an
  index similar to the one on ``checksum``
* Migrate: None
* Contract: None
* Conflicts: None

REST API impact
---------------

A new field (``multihash``) will be present in responses to requests for the
image metadata.

Security impact
---------------

A new, more collision-resistant hash will be returned in addition to the
current checksum.

Notification impact
-------------------

None

Other end user impact
---------------------

None

Performance impact
------------------

New image uploads (and the first download of previously uploaded images)
will take an additional amount of time to calculate the sha512 checksum:

5G binary blob of random data:

* md5: ~9s
* sha256: ~22s
* sha512: ~14s
* 1Gbps line speed upload: 42s

1.5G Ubuntu 16.04 cloud image:

* md5: ~2.9s
* sha256: ~7.2s
* sha512: ~4.6s
* 1Gbps line speed upload: 12s

555M Debian 8 cloud image:

* md5: ~1.0s
* sha256: ~2.5s
* sha512: ~1.6s
* 1Gbps line speed upload: 4.5s

Test Code:

.. code:: python

    #!/usr/bin/env python3

    import hashlib
    import time


    def runtime(f):
        # Decorator that prints how long the wrapped call took.
        def wrapper(*args, **kwargs):
            start = time.time()
            f(*args, **kwargs)
            print("Time elapsed: %s" % (time.time() - start))
        return wrapper


    @runtime
    def checksum(filename, algorithm):
        algorithms = {"md5": hashlib.md5,
                      "256": hashlib.sha256,
                      "512": hashlib.sha512,
                      }
        with open(filename, "rb") as f:
            m = algorithms[algorithm]()
            # Read in 64KiB chunks; the sentinel must be b"" for a binary file.
            for chunk in iter(lambda: f.read(65536), b""):
                m.update(chunk)
        print("%s: %s" % (algorithm, m.hexdigest()))


    checksum("fake.img", "512")
    checksum("fake.img", "256")
    checksum("fake.img", "md5")
    checksum("fake.img", "256")
    checksum("fake.img", "md5")
    checksum("fake.img", "512")

Developer impact
----------------

Any future checksum verification code should use the ``multihash`` field.


Implementation
==============

Assignee(s)
-----------

Primary assignee: unassigned

Other contributors:

Work Items
----------

* Add tests

* Update the db to add a ``multihash`` column to the images table (including
  expand, migrate, contract, and monolith code)

* Update the sections of code that calculate the ``checksum`` to also
  calculate ``multihash`` (includes calculation on upload)

* Update ``multihash`` on image download when the value is null

* Update internal checksum verification code to use the ``multihash``
  field and fall back to the ``checksum`` field when not present

* Update glance client

* Update docs

Dependencies
============

The multihash specification [0]_

Testing
=======

Update the tests to verify multihash


References
==========

.. [0] http://multiformats.io/multihash/