Merge "Add spec to mitigate OSSN-0075"
This commit is contained in:
commit
9dddf33103
276
specs/rocky/approved/glance/mitigate-ossn-0075.rst
Normal file
276
specs/rocky/approved/glance/mitigate-ossn-0075.rst
Normal file
@ -0,0 +1,276 @@
|
|||||||
|
..
|
||||||
|
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||||
|
License.
|
||||||
|
|
||||||
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||||
|
|
||||||
|
==================
|
||||||
|
Mitigate OSSN-0075
|
||||||
|
==================
|
||||||
|
|
||||||
|
https://blueprints.launchpad.net/glance/+spec/mitigate-ossn-0075
|
||||||
|
|
||||||
|
OpenStack Security Note `OSSN-0075`_, "Deleted Glance image IDs may be
|
||||||
|
reassigned", was made public on 13 September 2016. The current situation is
|
||||||
|
that due to a lack of agreement of how to fix it, we've left operators in a bad
|
||||||
|
state: our advice is that soft-deleted rows in the 'images' table in the Glance
|
||||||
|
database should *not* be purged from the database, yet at the same time, the
|
||||||
|
``glance-manage`` tool deletes such rows without warning.
|
||||||
|
|
||||||
|
Problem description
|
||||||
|
===================
|
||||||
|
|
||||||
|
Briefly, the problem is that Glance has always allowed a user with permission
|
||||||
|
to make the image-create call the option of specifying an image_id. If the
|
||||||
|
specified image_id clashed with an existing image_id, the image-create
|
||||||
|
operation would fail; otherwise, the specified image_id would be applied to the
|
||||||
|
new image. Consistency is enforced by a uniqueness constraint on the 'id'
|
||||||
|
column in the 'images' table in the database. Since Glance database entries
|
||||||
|
are soft-deleted, a proposed image_id will be checked against all image_ids
|
||||||
|
that were assigned since the last purge of the 'images' table.
|
||||||
|
|
||||||
|
As described in `OSSN-0075`_, this problem becomes a security exploit when (a)
|
||||||
|
a popular public or community image is deleted, (b) the database is purged,
|
||||||
|
and (c) a user creates a new image with that same image_id. Users consuming an
|
||||||
|
image by image_id, which is the way Nova and Cinder consume images, may then
|
||||||
|
wind up booting virtual machines using an image different from the one they
|
||||||
|
intend to use.
|
||||||
|
|
||||||
|
Note that the new image would have its own data and checksum that would be
|
||||||
|
different from the original data and checksum, but there would be no way for
|
||||||
|
Nova, for instance, to know that these had changed. Were someone to boot a
|
||||||
|
server using the image_id, Nova would receive image data and then verify the
|
||||||
|
checksum against whatever checksum Glance has recorded as associated with the
|
||||||
|
image, which would be the *new* checksum.
|
||||||
|
|
||||||
|
The idea that once an image goes to 'active' status, the (image_id, image data,
|
||||||
|
checksum) will not change is called *image immutability*. It's important to
|
||||||
|
note that image immutability is required for Glance or else it cannot function
|
||||||
|
as an image catalog. If each consumer had to keep track of the image_id *and*
|
||||||
|
checksum *and* other essential properties in order to verify the downloaded
|
||||||
|
data, then there'd be no point in having Glance maintain this information.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
The primary use case for allowing end-users to specify an image_id at the
|
||||||
|
time of image creation is to make it easy to find the "same" image data
|
||||||
|
(that is, the data is bit-for-bit identical although it's stored in
|
||||||
|
different locations) in different regions of a cloud. It's important to
|
||||||
|
note that the "sameness" of images in different regions is *not* guaranteed
|
||||||
|
by Glance. (A Glance installation can guarantee the immutability of images
|
||||||
|
within its own region, but it has no way of knowing what's happening in
|
||||||
|
other regions.) Thus, under the current situation, when an end user relies
|
||||||
|
on the image_id as the guarantor that they're getting the "same" data in
|
||||||
|
different cloud regions, the end user is actually relying upon the
|
||||||
|
trustworthiness of the *image owner*.
|
||||||
|
|
||||||
|
This is a separate issue from `OSSN-0075`_ and is independent of whether or
|
||||||
|
not the Glance database is ever purged. We point it out as something for
|
||||||
|
operators to keep in mind. To be clear about the issue, here's an example.
|
||||||
|
Suppose that a cloud operator puts an image with image_id A in regions R, S,
|
||||||
|
T, though for some reason the operator does not put that image in region U.
|
||||||
|
Any cloud user in region U could create an image with image_id A in
|
||||||
|
region U. The image could then be made available to some target user by
|
||||||
|
image sharing, or with the entire cloud by giving it 'community' visibility.
|
||||||
|
|
||||||
|
An operator can avoid this scenario by creating an image record with
|
||||||
|
image_id A in region U and not uploading any data to it. The image will
|
||||||
|
remain in 'queued' status, and if the visibility is not changed to 'public'
|
||||||
|
or 'community', the image will not appear in any end user's image-list
|
||||||
|
response.
|
||||||
|
|
||||||
|
There is also room for end user education here, namely, that image
|
||||||
|
consumers should *not* rely solely upon image_id to guarantee that they are
|
||||||
|
receiving the same image data in cross-region scenarios.
|
||||||
|
|
||||||
|
Through discussions with operators, it's clear that the ability to set the
|
||||||
|
image_id on image creation is being used out in the field, so we can't simply
|
||||||
|
block this ability. At the same time, we must allow the database to be
|
||||||
|
occasionally purged, as there is evidence that for large deployments, having a
|
||||||
|
large number of soft-deleted rows in the 'images' table affects the response
|
||||||
|
time of the image-list API call.
|
||||||
|
|
||||||
|
Proposed change
|
||||||
|
===============
|
||||||
|
|
||||||
|
Modify the current ``glance-manage db purge`` command so that it will not purge
|
||||||
|
the images table.
|
||||||
|
|
||||||
|
Introduce a new command, ``glance-manage db purge-images-table`` to purge the
|
||||||
|
images table. The new command will take the same options as the current purge,
|
||||||
|
namely, ``--age-in-days`` and ``--max-rows``. The rationale for this being a
|
||||||
|
new command (rather than a ``--force`` option to the current command) is
|
||||||
|
twofold: (1) it's likely that the age-in-days used will be different for the
|
||||||
|
images table, and (2) given that purging the images table has a security
|
||||||
|
impact, having it as a completely separate command emphasizes this.
|
||||||
|
|
||||||
|
Alternatives
|
||||||
|
------------
|
||||||
|
|
||||||
|
1. Introduce a policy governing whether or not a user is allowed to specify
|
||||||
|
the image_id at the time of image creation. The downside of this proposal
|
||||||
|
is twofold:
|
||||||
|
|
||||||
|
* it breaks backward compatibility given that this ability has been allowed
|
||||||
|
up to now in both the v1 and v2 versions of the Image API
|
||||||
|
* it breaks interoperability in that end uses will have the ability in some
|
||||||
|
clouds but not in others
|
||||||
|
|
||||||
|
A further problem with this proposal is that if the cross-region use of
|
||||||
|
a particular image_id is denied to end users, they will have to use some
|
||||||
|
other piece of image metadata for this purpose. Since cinder and nova both
|
||||||
|
use the image_id when services are requested, user workflows will have to
|
||||||
|
change to introduce an extra call to the image service to find the image
|
||||||
|
record before the image_id to pass to cinder or nova is determined.
|
||||||
|
|
||||||
|
2. Instead of introducing a new column in the images table, introduce a new
|
||||||
|
single-column table with a uniqueness constraint to record "used" UUIDs.
|
||||||
|
The image-create operation would try to insert a proposed UUID into this
|
||||||
|
table instead of the 'images' table and fail as it currently does if the
|
||||||
|
uniqueness constraint were violated. This "used" UUID table would *never*
|
||||||
|
be purged, but the glance-manage tool could continue to purge all other
|
||||||
|
tables.
|
||||||
|
|
||||||
|
This alternative has the advantage of not impacting the image-list call. It
|
||||||
|
would eventually introduce a small delay into the image-create operation,
|
||||||
|
but that's probably acceptable.
|
||||||
|
|
||||||
|
The downside is that this proposal introduces an unpurgable table that is
|
||||||
|
unbounded in size.
|
||||||
|
|
||||||
|
3. A variation on alternative #2: instead of a single-column table, have at
|
||||||
|
least a deleted_at column in addition to the image_id. This table would not
|
||||||
|
be touched by the "normal" ``glance-manage`` database purge operation.
|
||||||
|
Rather, an additional purge operation could be introduced for this table
|
||||||
|
that would purge rows that were, say, 5 years old from the table.
|
||||||
|
|
||||||
|
A problem with this suggestion is that a determined attacker could
|
||||||
|
nonetheless flood the "used" image_ids table. This is possible because
|
||||||
|
while it might make sense to limit the number of existing images a user
|
||||||
|
owns, it doesn't make sense to limit the number of deleted images a user
|
||||||
|
owns. For example, an end user who creates an image of some important
|
||||||
|
server every day, but only keeps around a week's worth, will accumulate many
|
||||||
|
deleted images (multiplied by the number of servers this is being done for),
|
||||||
|
but this is perfectly legitimate behavior. So I'm not sure how flooding the
|
||||||
|
"used" image_id table could be prevented, except by something like
|
||||||
|
rate-limiting, though that would have to be set in such a way as not to
|
||||||
|
impact legitimate use cases.
|
||||||
|
|
||||||
|
4. Introduce a new field, ``preserve_id``, for use in the images table. This
|
||||||
|
field will be for internal Glance use only and will not be exposed through
|
||||||
|
the API. This field will be null by default and will be set true whenever
|
||||||
|
the 'visibility' field of an image is set to 'public' or 'community'. There
|
||||||
|
will be no way to unset the value of the field. In addition to this, modify
|
||||||
|
the glance-manage tool so that it will never delete an entry from the images
|
||||||
|
table that has ``preserve_id`` == True.
|
||||||
|
|
||||||
|
As with alternatives 2 and 3, the database table will continue to grow, but
|
||||||
|
this growth is constrained by keeping only rows relevant to the OSSN-0075
|
||||||
|
exploit. On the other hand, all an attacker has to do is read this spec to
|
||||||
|
realize that by creating image records with community visibilty, the images
|
||||||
|
table can still be flooded with spurious image records. Thus this strategy
|
||||||
|
is too easily defeated to be worth implementing, especially as it might give
|
||||||
|
operators a false sense of security.
|
||||||
|
|
||||||
|
Data model impact
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
REST API impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Security impact
|
||||||
|
---------------
|
||||||
|
|
||||||
|
This change will enhance security by providing operators with a means of
|
||||||
|
mitigating the exploit described in `OSSN-0075`_.
|
||||||
|
|
||||||
|
Notifications impact
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Other end user impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Performance Impact
|
||||||
|
------------------
|
||||||
|
|
||||||
|
The images table will grow indefinitely, though the associated tables
|
||||||
|
(image_properties, image_tags, image_members, image_locations) can be purged by
|
||||||
|
the ``glance-manage`` tool.
|
||||||
|
|
||||||
|
The images table can be partially purged at appropriate intervals.
|
||||||
|
|
||||||
|
Other deployer impact
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
Operators will have to monitor Glance for abnormal usage patterns and take
|
||||||
|
appropriate action.
|
||||||
|
|
||||||
|
Additionally, operators should be made aware of the cross-region version of the
|
||||||
|
OSSN-0075 exploit (as discussed in the Note in the Problem Description
|
||||||
|
section).
|
||||||
|
|
||||||
|
Developer impact
|
||||||
|
----------------
|
||||||
|
|
||||||
|
None
|
||||||
|
|
||||||
|
Implementation
|
||||||
|
==============
|
||||||
|
|
||||||
|
Assignee(s)
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Primary assignee:
|
||||||
|
|
||||||
|
* brian-rosmaita
|
||||||
|
|
||||||
|
Other contributors:
|
||||||
|
|
||||||
|
* undetermined
|
||||||
|
|
||||||
|
Work Items
|
||||||
|
----------
|
||||||
|
|
||||||
|
1. Modify the ``glance-manage`` tool:
|
||||||
|
|
||||||
|
* The current behavior is that it purges all tables of soft-deleted rows.
|
||||||
|
Change the behavior so that the images table is not purged by default.
|
||||||
|
|
||||||
|
* Add a new command to purge the images table. It should take the
|
||||||
|
``--age-in-days`` and ``--max-rows`` options just like the current purge
|
||||||
|
command.
|
||||||
|
|
||||||
|
2. update operator documentation
|
||||||
|
|
||||||
|
3. release note
|
||||||
|
|
||||||
|
Dependencies
|
||||||
|
============
|
||||||
|
|
||||||
|
No new dependencies.
|
||||||
|
|
||||||
|
Testing
|
||||||
|
=======
|
||||||
|
|
||||||
|
Appropriate unit tests to ensure the changes to glance and the glance-manage
|
||||||
|
tool function correctly.
|
||||||
|
|
||||||
|
Documentation Impact
|
||||||
|
====================
|
||||||
|
|
||||||
|
The Glance Administrator Guide will need to be updated.
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
`OSSN-0075`_: `Deleted Glance image IDs may be reassigned`.
|
||||||
|
|
||||||
|
.. _OSSN-0075: https://wiki.openstack.org/wiki/OSSN/OSSN-0075
|
Loading…
Reference in New Issue
Block a user