Merge "Add spec to mitigate OSSN-0075"

Zuul 10 months ago
parent
commit
9dddf33103
1 changed files with 276 additions and 0 deletions

+ 276
- 0
specs/rocky/approved/glance/mitigate-ossn-0075.rst

@@ -0,0 +1,276 @@
1
+..
2
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
3
+ License.
4
+
5
+ http://creativecommons.org/licenses/by/3.0/legalcode
6
+
7
+==================
8
+Mitigate OSSN-0075
9
+==================
10
+
11
+https://blueprints.launchpad.net/glance/+spec/mitigate-ossn-0075
12
+
13
+OpenStack Security Note `OSSN-0075`_, "Deleted Glance image IDs may be
14
+reassigned", was made public on 13 September 2016.  The current situation is
15
+that due to a lack of agreement on how to fix it, we've left operators in a bad
16
+state: our advice is that soft-deleted rows in the 'images' table in the Glance
17
+database should *not* be purged from the database, yet at the same time, the
18
+``glance-manage`` tool deletes such rows without warning.
19
+
20
+Problem description
21
+===================
22
+
23
+Briefly, the problem is that Glance has always given a user with permission
24
+to make the image-create call the option of specifying an image_id.  If the
25
+specified image_id clashed with an existing image_id, the image-create
26
+operation would fail; otherwise, the specified image_id would be applied to the
27
+new image.  Consistency is enforced by a uniqueness constraint on the 'id'
28
+column in the 'images' table in the database.  Since Glance database entries
29
+are soft-deleted, a proposed image_id will be checked against all image_ids
30
+that were assigned since the last purge of the 'images' table.
31
+
32
+As described in `OSSN-0075`_, this problem becomes a security exploit when (a)
33
+a popular public or community image is deleted, (b) the database is purged,
34
+and (c) a user creates a new image with that same image_id.  Users consuming an
35
+image by image_id, which is the way Nova and Cinder consume images, may then
36
+wind up booting virtual machines using an image different from the one they
37
+intend to use.
38
+
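
The sequence (a), (b), (c) above can be reproduced in miniature. The following is only a sketch under stated assumptions: it uses an in-memory SQLite database with a hypothetical three-column ``images`` table, not Glance's real schema or code, and the helper names are invented for illustration. It shows that the uniqueness constraint blocks reuse of an image_id only while the soft-deleted row still exists.

```python
import sqlite3

# Hypothetical, minimal stand-in for the 'images' table (not Glance's schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    " id TEXT PRIMARY KEY,"          # uniqueness constraint on the image_id
    " checksum TEXT,"
    " deleted INTEGER NOT NULL DEFAULT 0)"
)


def create_image(image_id, checksum):
    """Return True if the row was inserted, False on an image_id clash."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO images (id, checksum) VALUES (?, ?)",
                (image_id, checksum),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def soft_delete(image_id):
    """Mark the row deleted but keep it, so the image_id stays reserved."""
    with conn:
        conn.execute("UPDATE images SET deleted = 1 WHERE id = ?", (image_id,))


def purge():
    """Physically remove soft-deleted rows, releasing their image_ids."""
    with conn:
        conn.execute("DELETE FROM images WHERE deleted = 1")


IMAGE_ID = "11111111-2222-3333-4444-555555555555"  # made-up UUID

created_original = create_image(IMAGE_ID, "checksum-of-original-data")
soft_delete(IMAGE_ID)                                   # step (a)
reuse_before_purge = create_image(IMAGE_ID, "checksum-of-other-data")
purge()                                                 # step (b)
reuse_after_purge = create_image(IMAGE_ID, "checksum-of-other-data")  # step (c)

stored_checksum = conn.execute(
    "SELECT checksum FROM images WHERE id = ?", (IMAGE_ID,)
).fetchone()[0]
```

In this sketch the second create attempt is rejected while the soft-deleted row is present, and the same request succeeds once the row has been purged. A consumer that looks up the image by image_id afterwards sees only the new checksum.
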
39
+Note that the new image would have its own data and checksum that would be
40
+different from the original data and checksum, but there would be no way for
41
+Nova, for instance, to know that these had changed.  Were someone to boot a
42
+server using the image_id, Nova would receive image data and then verify the
43
+checksum against whatever checksum Glance has recorded as associated with the
44
+image, which would be the *new* checksum.
45
+
46
+The idea that once an image goes to 'active' status, the (image_id, image data,
47
+checksum) will not change is called *image immutability*.  It's important to
48
+note that image immutability is required for Glance or else it cannot function
49
+as an image catalog.  If each consumer had to keep track of the image_id *and*
50
+checksum *and* other essential properties in order to verify the downloaded
51
+data, then there'd be no point in having Glance maintain this information.
52
+
53
+.. note::
54
+
55
+   The primary use case for allowing end-users to specify an image_id at the
56
+   time of image creation is to make it easy to find the "same" image data
57
+   (that is, the data is bit-for-bit identical although it's stored in
58
+   different locations) in different regions of a cloud.  It's important to
59
+   note that the "sameness" of images in different regions is *not* guaranteed
60
+   by Glance.  (A Glance installation can guarantee the immutability of images
61
+   within its own region, but it has no way of knowing what's happening in
62
+   other regions.)  Thus, under the current situation, when an end user relies
63
+   on the image_id as the guarantor that they're getting the "same" data in
64
+   different cloud regions, the end user is actually relying upon the
65
+   trustworthiness of the *image owner*.
66
+
67
+   This is a separate issue from `OSSN-0075`_ and is independent of whether or
68
+   not the Glance database is ever purged.  We point it out as something for
69
+   operators to keep in mind.  To be clear about the issue, here's an example.
70
+   Suppose that a cloud operator puts an image with image_id A in regions R, S,
71
+   T, though for some reason the operator does not put that image in region U.
72
+   Any cloud user in region U could create an image with image_id A in
73
+   region U.  The image could then be made available to some target user by
74
+   image sharing, or with the entire cloud by giving it 'community' visibility.
75
+
76
+   An operator can avoid this scenario by creating an image record with
77
+   image_id A in region U and not uploading any data to it.  The image will
78
+   remain in 'queued' status, and if the visibility is not changed to 'public'
79
+   or 'community', the image will not appear in any end user's image-list
80
+   response.
81
+
82
+   There is also room for end user education here, namely, that image
83
+   consumers should *not* rely solely upon image_id to guarantee that they are
84
+   receiving the same image data in cross-region scenarios.
85
+
86
+Through discussions with operators, it's clear that the ability to set the
87
+image_id on image creation is being used out in the field, so we can't simply
88
+block this ability.  At the same time, we must allow the database to be
89
+occasionally purged, as there is evidence that for large deployments, having a
90
+large number of soft-deleted rows in the 'images' table affects the response
91
+time of the image-list API call.
92
+
93
+Proposed change
94
+===============
95
+
96
+Modify the current ``glance-manage db purge`` command so that it will not purge
97
+the images table.
98
+
99
+Introduce a new command, ``glance-manage db purge-images-table`` to purge the
100
+images table.  The new command will take the same options as the current purge,
101
+namely, ``--age-in-days`` and ``--max-rows``.  The rationale for this being a
102
+new command (rather than a ``--force`` option to the current command) is
103
+twofold: (1) it's likely that the age-in-days used will be different for the
104
+images table, and (2) given that purging the images table has a security
105
+impact, having it as a completely separate command emphasizes this.
106
+
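
The split between the two commands can be sketched as follows. This is an illustration only, assuming an in-memory SQLite database, a simplified schema with ``deleted`` and ``deleted_at`` columns, and hypothetical function names; it is not the ``glance-manage`` implementation. It also assumes the row limit is applied per table.

```python
import sqlite3
from datetime import datetime, timedelta

# Tables the spec lists as associated with images; schema below is simplified.
ASSOCIATED_TABLES = (
    "image_properties", "image_tags", "image_members", "image_locations",
)


def _purge_table(conn, table, age_in_days, max_rows, now):
    """Delete up to max_rows soft-deleted rows older than age_in_days."""
    cutoff = (now - timedelta(days=age_in_days)).isoformat()
    with conn:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN ("
            f" SELECT rowid FROM {table}"
            "  WHERE deleted = 1 AND deleted_at < ?"
            "  ORDER BY deleted_at LIMIT ?)",
            (cutoff, max_rows),
        )
    return cur.rowcount


def purge(conn, age_in_days, max_rows, now):
    """Stand-in for 'db purge': every table except 'images'."""
    return {
        table: _purge_table(conn, table, age_in_days, max_rows, now)
        for table in ASSOCIATED_TABLES
    }


def purge_images_table(conn, age_in_days, max_rows, now):
    """Stand-in for 'db purge-images-table': only the 'images' table."""
    return _purge_table(conn, "images", age_in_days, max_rows, now)


conn = sqlite3.connect(":memory:")
for table in ASSOCIATED_TABLES + ("images",):
    conn.execute(
        f"CREATE TABLE {table} (id TEXT, deleted INTEGER, deleted_at TEXT)"
    )

now = datetime(2018, 6, 1)
old = (now - timedelta(days=400)).isoformat()
conn.execute("INSERT INTO images VALUES ('img-1', 1, ?)", (old,))
conn.execute("INSERT INTO image_tags VALUES ('tag-1', 1, ?)", (old,))
conn.commit()

purged = purge(conn, age_in_days=30, max_rows=100, now=now)
images_left = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
images_purged = purge_images_table(conn, age_in_days=365, max_rows=100, now=now)
```

The ordinary purge removes the old ``image_tags`` row but leaves the soft-deleted image row in place. Only the separate call, here with a longer age threshold, removes it, which mirrors rationale (1) and (2) above.
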
107
+Alternatives
108
+------------
109
+
110
+1. Introduce a policy governing whether or not a user is allowed to specify
111
+   the image_id at the time of image creation.  The downside of this proposal
112
+   is twofold:
113
+
114
+   * it breaks backward compatibility given that this ability has been allowed
115
+     up to now in both the v1 and v2 versions of the Image API
116
+   * it breaks interoperability in that end users will have the ability in some
117
+     clouds but not in others
118
+
119
+   A further problem with this proposal is that if the cross-region use of
120
+   a particular image_id is denied to end users, they will have to use some
121
+   other piece of image metadata for this purpose.  Since cinder and nova both
122
+   use the image_id when services are requested, user workflows will have to
123
+   change to introduce an extra call to the image service to find the image
124
+   record before the image_id to pass to cinder or nova is determined.
125
+
126
+2. Instead of adding a column to the images table (see alternative 4), add a
127
+   new single-column table with a uniqueness constraint to record "used" UUIDs.
128
+   The image-create operation would try to insert a proposed UUID into this
129
+   table instead of the 'images' table and fail as it currently does if the
130
+   uniqueness constraint were violated.  This "used" UUID table would *never*
131
+   be purged, but the glance-manage tool could continue to purge all other
132
+   tables.
133
+
134
+   This alternative has the advantage of not impacting the image-list call.  It
135
+   would eventually introduce a small delay into the image-create operation,
136
+   but that's probably acceptable.
137
+
138
+   The downside is that this proposal introduces an unpurgeable table that is
139
+   unbounded in size.
140
+
141
+3. A variation on alternative #2: instead of a single-column table, have at
142
+   least a deleted_at column in addition to the image_id.  This table would not
143
+   be touched by the "normal" ``glance-manage`` database purge operation.
144
+   Rather, an additional purge operation could be introduced for this table
145
+   that would purge rows that were, say, 5 years old from the table.
146
+
147
+   A problem with this suggestion is that a determined attacker could
148
+   nonetheless flood the "used" image_ids table.  This is possible because
149
+   while it might make sense to limit the number of existing images a user
150
+   owns, it doesn't make sense to limit the number of deleted images a user
151
+   owns.  For example, an end user who creates an image of some important
152
+   server every day, but only keeps around a week's worth, will accumulate many
153
+   deleted images (multiplied by the number of servers this is being done for),
154
+   but this is perfectly legitimate behavior.  So I'm not sure how flooding the
155
+   "used" image_id table could be prevented, except by something like
156
+   rate-limiting, though that would have to be set in such a way as not to
157
+   impact legitimate use cases.
158
+
159
+4. Introduce a new field, ``preserve_id``, for use in the images table.  This
160
+   field will be for internal Glance use only and will not be exposed through
161
+   the API.  This field will be null by default and will be set true whenever
162
+   the 'visibility' field of an image is set to 'public' or 'community'.  There
163
+   will be no way to unset the value of the field.  In addition to this, modify
164
+   the glance-manage tool so that it will never delete an entry from the images
165
+   table that has ``preserve_id`` == True.
166
+
167
+   As with alternatives 2 and 3, the database table will continue to grow, but
168
+   this growth is constrained by keeping only rows relevant to the OSSN-0075
169
+   exploit.  On the other hand, all an attacker has to do is read this spec to
170
+   realize that by creating image records with community visibility, the images
171
+   table can still be flooded with spurious image records.  Thus this strategy
172
+   is too easily defeated to be worth implementing, especially as it might give
173
+   operators a false sense of security.
174
+
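
For comparison, alternative 2 above (a never-purged table of "used" UUIDs) could look roughly like this. It is a sketch under assumptions: an in-memory SQLite database, a hypothetical table name ``used_image_ids``, and invented helper functions, none of which come from Glance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (id TEXT PRIMARY KEY,"
    " deleted INTEGER NOT NULL DEFAULT 0)"
)
# Hypothetical single-column table; its rows are never purged.
conn.execute("CREATE TABLE used_image_ids (id TEXT PRIMARY KEY)")


def create_image(image_id):
    """Reserve the image_id first; fail if it was ever used before."""
    try:
        with conn:  # both inserts commit together or roll back together
            conn.execute(
                "INSERT INTO used_image_ids (id) VALUES (?)", (image_id,)
            )
            conn.execute("INSERT INTO images (id) VALUES (?)", (image_id,))
        return True
    except sqlite3.IntegrityError:
        return False


def delete_and_purge(image_id):
    """Remove the image row entirely; the reservation row is left alone."""
    with conn:
        conn.execute("DELETE FROM images WHERE id = ?", (image_id,))


IMAGE_ID = "11111111-2222-3333-4444-555555555555"  # made-up UUID

first_create = create_image(IMAGE_ID)
delete_and_purge(IMAGE_ID)
second_create = create_image(IMAGE_ID)

images_rows = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
used_rows = conn.execute("SELECT COUNT(*) FROM used_image_ids").fetchone()[0]
```

Here the images table can be emptied by a purge while the reservation still blocks reuse of the image_id. The cost is visible too: ``used_image_ids`` gains a row for every image ever created and never shrinks.
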
175
+Data model impact
176
+-----------------
177
+
178
+None
179
+
180
+REST API impact
181
+---------------
182
+
183
+None
184
+
185
+Security impact
186
+---------------
187
+
188
+This change will enhance security by providing operators with a means of
189
+mitigating the exploit described in `OSSN-0075`_.
190
+
191
+Notifications impact
192
+--------------------
193
+
194
+None
195
+
196
+Other end user impact
197
+---------------------
198
+
199
+None
200
+
201
+Performance Impact
202
+------------------
203
+
204
+The images table will grow indefinitely, though the associated tables
205
+(image_properties, image_tags, image_members, image_locations) can be purged by
206
+the ``glance-manage`` tool.
207
+
208
+The images table can be partially purged at appropriate intervals.
209
+
210
+Other deployer impact
211
+---------------------
212
+
213
+Operators will have to monitor Glance for abnormal usage patterns and take
214
+appropriate action.
215
+
216
+Additionally, operators should be made aware of the cross-region version of the
217
+OSSN-0075 exploit (as discussed in the Note in the Problem Description
218
+section).
219
+
220
+Developer impact
221
+----------------
222
+
223
+None
224
+
225
+Implementation
226
+==============
227
+
228
+Assignee(s)
229
+-----------
230
+
231
+Primary assignee:
232
+
233
+* brian-rosmaita
234
+
235
+Other contributors:
236
+
237
+* undetermined
238
+
239
+Work Items
240
+----------
241
+
242
+1. Modify the ``glance-manage`` tool:
243
+
244
+   * The current behavior is that it purges all tables of soft-deleted rows.
245
+     Change the behavior so that the images table is not purged by default.
246
+
247
+   * Add a new command to purge the images table.  It should take the
248
+     ``--age-in-days`` and ``--max-rows`` options just like the current purge
249
+     command.
250
+
251
+2. Update the operator documentation.
252
+
253
+3. Add a release note.
254
+
255
+Dependencies
256
+============
257
+
258
+No new dependencies.
259
+
260
+Testing
261
+=======
262
+
263
+Appropriate unit tests to ensure the changes to glance and the glance-manage
264
+tool function correctly.
265
+
266
+Documentation Impact
267
+====================
268
+
269
+The Glance Administrator Guide will need to be updated.
270
+
271
+References
272
+==========
273
+
274
+`OSSN-0075`_: `Deleted Glance image IDs may be reassigned`.
275
+
276
+.. _OSSN-0075: https://wiki.openstack.org/wiki/OSSN/OSSN-0075
