Description of how to use etags to avoid lost updates

This new document describes etags and how they can be used to
avoid the lost update problem. To avoid confusion it contains an
introduction that explains that it does not cover all of the many
use and corner cases for etags.

Change-Id: Iacb7cb7cddef32b3cabd371255ae8443ce5beb80
Chris Dent 2016-04-05 18:29:28 +01:00
parent 51ee384774
commit 959e6cdceb
1 changed files with 199 additions and 0 deletions

guidelines/etags.rst Normal file

@ -0,0 +1,199 @@
ETags
=====
ETags_ are "opaque validator[s] for differentiating between
multiple representations of the same resource". They are used in a
variety of ways in HTTP to determine the outcome of conditional
requests as described in :rfc:`7232`. Understanding the full breadth
of ETags requires a very complete understanding of HTTP and the
nuances of resources and their representations. This document does
not attempt to address all applications of ETags at once; instead it
addresses specific use cases that have arisen in response to other
guidelines. It will evolve over time.

ETags and the lost update problem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Problem
-----------

HTTP is fundamentally a system for sending representations of
resources back and forth across a network connection. A common
interaction is to ``GET /some/resource``, modify the representation,
and then ``PUT /some/resource`` to update the resource on the server.
This is an extremely useful and superficially simple pattern that drives
many APIs in OpenStack and beyond.
That apparent simplicity is misleading: If there are two or more
clients performing operations on ``/some/resource`` in the same time
frame they can experience the `lost update problem`_:
* Client A and client B both ``GET /some/resource`` and make changes to
their local representation.
* Client B does a ``PUT /some/resource`` at time 1.
* Client A does a ``PUT /some/resource`` at time 2.
Client B's changes have been lost. Neither client is made aware of this.
This is a problem.
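
The sequence above can be reproduced with a minimal sketch. The
in-memory ``store`` and the ``get`` and ``put`` helpers are hypothetical
stand-ins for a server with no concurrency control; they are not part
of any OpenStack API.

```python
# Hypothetical in-memory "server": one resource, no concurrency control.
store = {"/some/resource": {"name": "original", "size": 1}}


def get(path):
    """Return a copy of the stored representation, as a GET would."""
    return dict(store[path])


def put(path, representation):
    """Unconditionally overwrite the stored representation."""
    store[path] = dict(representation)


# Both clients fetch the same starting state.
client_a = get("/some/resource")
client_b = get("/some/resource")

# Each client changes a different attribute of its local copy.
client_b["name"] = "renamed by B"
client_a["size"] = 2

put("/some/resource", client_b)  # time 1
put("/some/resource", client_a)  # time 2: silently discards B's rename

final_state = get("/some/resource")
print(final_state)
```

Client A's copy still carries the original ``name``, so its PUT
overwrites client B's rename and neither call reports a problem.
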

A Solution
----------

HTTP/1.1 and later versions have a solution for this problem called ETags_.
These provide a validator for different representations of a
resource that makes it straightforward to determine if the
representation provided by a request or a response is the same as
one already in hand. This is very useful when validating cached GET
requests (the ETag answers the question "is what I have in my cache
the same as what the server would give me?") but is also useful for
avoiding the lost update problem.
If the scenario described above is modified to use ETags it would
work like this:
* Client A and client B both ``GET /some/resource``. Each response
includes a header named ``ETag`` whose value is the same for both
clients (let's make the ETag ``red57``). Details on ETag generation
can be found below.
* They both make changes to their local representation.
* Client B does a ``PUT /some/resource`` and includes a header
named If-Match_ with a value of ``red57``. The request is
successful because the ETag sent in the request is the same as the
ETag the server generates for the current state of the resource.
* Client A does a ``PUT /some/resource`` and includes the If-Match_
header with value ``red57``. This request fails (with a 412_
response code) because ``red57`` no longer matches the ETag
generated by the server: Its current state has been updated by the
request from client B.
Client B's changes have not been lost and client A has not
inadvertently changed something that is not in the form they
expected. Client A is made aware of this by the response code.
At this stage, client A can choose to GET the resource again and compare
their local representation with that just retrieved and choose a course
of action.
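
The exchange above might look like the following sketch. The
``Resource`` class, the version-counter ETag and the ``204`` status for
a successful PUT are assumptions made for illustration; only the
``ETag`` and ``If-Match`` headers and the ``412`` response come from
the scenario itself.

```python
class Resource:
    """Hypothetical single resource guarded by If-Match."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 1

    def etag(self):
        # Assumption: a stored version counter is the validator.
        # ETag values are double-quoted strings.
        return '"v%d"' % self.version

    def get(self):
        return 200, {"ETag": self.etag()}, dict(self.data)

    def put(self, headers, data):
        # Reject the write unless the client saw the current state.
        if headers.get("If-Match") != self.etag():
            return 412, {}, None
        self.data = dict(data)
        self.version += 1
        return 204, {"ETag": self.etag()}, None


resource = Resource({"name": "original", "size": 1})

_, headers_a, body_a = resource.get()
_, headers_b, body_b = resource.get()

body_b["name"] = "renamed by B"
body_a["size"] = 2

status_b, _, _ = resource.put({"If-Match": headers_b["ETag"]}, body_b)
status_a, _, _ = resource.put({"If-Match": headers_a["ETag"]}, body_a)
print(status_b, status_a)

# Client A recovers: GET again, reapply its change, PUT with the new ETag.
_, headers_a, fresh = resource.get()
fresh["size"] = 2
status_retry, _, _ = resource.put({"If-Match": headers_a["ETag"]}, fresh)
print(status_retry, resource.data)
```

Here client A chooses to reapply its change on top of client B's
update; a real client might instead surface the conflict to the user.
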

Details
-------

If a service accepts PUT requests and needs to avoid lost updates it
can do so by:
* Sending responses to GET requests with an ETag **header** (see
below for some discussion on ETag-like attributes in
representations).
* Requiring clients to send an If-Match header with a valid ETag when
processing PUT requests.
* Processing the If-Match header on the server side to compare the
ETag provided in the request with the generated ETag of the
currently stored representation. If there is a match, carry on
with the request action; if not, respond with a 412 status code.
.. note:: An ETag value is a double-quoted string: ``"the etag"``.
.. note:: The If-Match header may contain multiple ETags (separated
by commas). If it does, at least one must match for the
request to proceed.
.. note:: Which section of a codebase takes responsibility for
managing the ETag and If-Match headers depends greatly on
the architecture of the service. In general the handler or
controller for each resource should be the locus of
responsibility. There may be decorators or libraries
that can be shared but such things are beyond the scope of
this document. Early implementors are encouraged to write code
that is transparent and easy to inspect, allowing easier
future extraction.
.. note:: ETags_ can be either strong or weak. See :rfc:`7232` for
discussion on how weak ETags may be used. They are not
addressed in this document as their import is primarily
related to cache handling. Strong ETags signify
byte-for-byte equivalence between representations of the
same resource. Weak ETags indicate only semantic equivalence.
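
The notes above on quoting and on multiple values can be sketched as a
small header check. The function names are hypothetical and the
regular expression is more lenient than the grammar in :rfc:`7232`.
Two behaviours come from the RFC rather than from this document:
``If-Match: *`` matches any current representation, and ``If-Match``
uses strong comparison, so a weak ETag (prefixed with ``W/``) never
matches.

```python
import re

# An entity-tag is an optional W/ prefix followed by a quoted string.
# Matching quoted strings (rather than splitting on commas) keeps an
# ETag that itself contains a comma in one piece.
ENTITY_TAG = re.compile(r'(W/)?"([^"]*)"')


def parse_if_match(header_value):
    """Return '*' or a list of (is_weak, opaque_tag) tuples."""
    value = header_value.strip()
    if value == "*":
        return "*"
    return [(bool(weak), tag) for weak, tag in ENTITY_TAG.findall(value)]


def if_match_passes(header_value, current_etag):
    """Decide whether a request may proceed.

    current_etag is the server's quoted ETag for the stored
    representation, or None if there is no current representation.
    """
    if current_etag is None:
        return False
    candidates = parse_if_match(header_value)
    if candidates == "*":
        return True
    current = parse_if_match(current_etag)
    if not current or current[0][0]:
        # Unparseable or weak current ETag: strong comparison fails.
        return False
    current_tag = current[0][1]
    # At least one strong candidate must match for the request to proceed.
    return any(not weak and tag == current_tag for weak, tag in candidates)


print(if_match_passes('"old", "red57"', '"red57"'))
print(if_match_passes('"old"', '"red57"'))
```

A handler would respond with 412 whenever ``if_match_passes`` returns
``False``.
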
Each of the steps listed above requires functionality to generate ETags
for representations. Whenever the representation is different the ETag
should be different. :rfc:`7232#section-2.3.1` has advice on how to
generate good ETags. In practice they should be:
* Different for different forms of the same resource. For example, the
XML and JSON representations of the same version of a resource
should have different ETags.
* Different from version to version.
* Not based on something that will change when the system restarts.
For example, they should not be based on inodes or database keys that
are ints or other non-universal identifiers.
* Not based on hashes of strings that do not have reliable
ordering. For example, it can be tempting to make md5 or sha hashes
of the JSON string that represents a resource. If the ordering in
that JSON is not guaranteed, the ETag is not useful.
Ideally they should be fast to calculate or, if not fast, then easy
to store (when the representation is written). A hash of a last
updated timestamp and the content-type can work, but only if updates
are less frequent than clock updates (that is, no two updates can
share a timestamp).
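
One way to meet the criteria above is sketched here, assuming the
representation can be serialized as JSON. The ``etag_for`` helper is
hypothetical. It hashes a serialization with sorted keys, so ordering
is reliable, and mixes in the content type, so different forms of the
same resource get different ETags.

```python
import hashlib
import json


def etag_for(data, content_type):
    """Return a quoted ETag for data as served with content_type."""
    # sort_keys gives a stable ordering regardless of how the dict
    # was built; compact separators avoid whitespace differences.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    digest.update(content_type.encode("utf-8"))
    digest.update(b"\0")  # separator between the two hashed fields
    digest.update(canonical.encode("utf-8"))
    return '"%s"' % digest.hexdigest()


first = etag_for({"name": "disk", "size": 2}, "application/json")
reordered = etag_for({"size": 2, "name": "disk"}, "application/json")
as_xml = etag_for({"name": "disk", "size": 2}, "application/xml")
changed = etag_for({"name": "disk", "size": 3}, "application/json")

print(first == reordered, first != as_xml, first != changed)
```

Hashing the full representation on every request can be costly for
large resources; storing the result when the representation is written
is one way to avoid recomputing it.
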
.. note:: Many details of how ETags can be useful are left out of this
document. It is worth reading :rfc:`7232` in its entirety to
understand their purpose, how they work, edge cases and
how they interact with other modes of conditional request
handling.

Special Cases
-------------

For simple resources that represent a single unified entity the
above handling works well. For more complex resources the situation
becomes more complicated. Some scenarios worth considering:
* When there is a resource which represents a collection of
resources (e.g. ``GET /resources`` versus ``GET
/resources/some-id``) the strict process for updating one of the
resources in that collection when using ETags would be:
* ``GET /resources`` to get the list of resources.
* Do some client-side processing to choose a single resource's id.
* ``GET /resources/that-id`` to get the resource and its ``ETag``
header.
* Modify the local representation.
* ``PUT /resources/that-id`` with an ``If-Match`` header
containing the ETag.
This may be considered cumbersome. One way to optimize this is to
include an attribute whose value is the ETag in the individual
representations of the singular resources in the collection
resource. Then the second GET above can be skipped as the ETag is
already available.
* When a resource has sub resources (e.g. an ``/image/id`` resource
contains a metadata attribute whose content is also available at
``/image/id/metadata``) it can be desirable to retrieve the image
resource and then PUT to the metadata resource. Strictly speaking
this would require a GET of the metadata resource to determine the
ETag.
If this is a problem, an optimization to work around this is to
allow the ETag of the image resource to be an acceptable ETag of
the metadata resource when provided in an ``If-Match`` header.
If this is done, then it is important that the reverse not be
true: The ETag sent with the metadata resource should not be valid
in an ``If-Match`` header sent to the image resource.
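
Both optimizations above can be sketched in a few lines. The
``etags`` mapping, the ``etag`` attribute name and the helper functions
are hypothetical; the paths mirror the image and metadata example.

```python
# Hypothetical stored ETags, keyed by resource path.
etags = {
    "/image/42": '"img-7"',
    "/image/42/metadata": '"meta-3"',
}


def collection_entry(path, data):
    """Embed a member's ETag as an attribute of its entry in a
    collection representation, so the second GET can be skipped."""
    entry = dict(data)
    entry["etag"] = etags[path]
    return entry


def acceptable_etags(path):
    """Return the ETags accepted in If-Match for a PUT to path."""
    accepted = {etags[path]}
    suffix = "/metadata"
    if path.endswith(suffix):
        # The parent image's ETag is also accepted for the metadata
        # sub-resource. The reverse is deliberately not true.
        accepted.add(etags[path[: -len(suffix)]])
    return accepted


print(collection_entry("/image/42", {"name": "cirros"}))
print(sorted(acceptable_etags("/image/42/metadata")))
print(sorted(acceptable_etags("/image/42")))
```

The asymmetry lives in one place, ``acceptable_etags``, which keeps
the exception easy to inspect and easy to remove later.
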
.. note:: In both of the above scenarios the semantics of ETags are being
violated. An ETag is not a magic key to unlock a resource and
make it writable. It is a value used to determine if two
representations of the same resource are in fact the same. In
the situations above the ETags are being used to compare different
resources. Services should only do so if they must, either because the
performance benefit is huge (in which case consider fixing the
performance of the API) or because the user experience improvement is
significant. The latter is far more important and legitimate
than the former.
.. _lost update problem: https://www.w3.org/1999/04/Editing/
.. _ETags: https://tools.ietf.org/html/rfc7232#section-2.3
.. _412: https://tools.ietf.org/html/rfc7232#section-4.2
.. _If-Match: https://tools.ietf.org/html/rfc7232#section-3.1