Description of how to use etags to avoid lost updates

This new document describes etags and how they can be used to avoid the lost update problem. To avoid confusion it contains an introduction that explains that it does not cover all of the many use and corner cases for etags.

Change-Id: Iacb7cb7cddef32b3cabd371255ae8443ce5beb80
2016-04-05 18:29:28 +01:00 · 959e6cdceb
commit 959e6cdceb
parent 51ee384774
1 changed files with 199 additions and 0 deletions
--- a/guidelines/etags.rst
+++ b/guidelines/etags.rst
@@ -0,0 +1,199 @@
+
+ETags
+=====
+
+ETags_ are "opaque validator[s] for differentiating between
+multiple representations of the same resource". They are used in a
+variety of ways in HTTP to determine the outcome of conditional
+requests as described in :rfc:`7232`. Understanding the full breadth
+of ETags requires a very complete understanding of HTTP and the
+nuances of resources and their representations. This document does
+not attempt to address all applications of ETags at once; instead it
+addresses specific use cases that have arisen in response to other
+guidelines. It will evolve over time.
+
+ETags and the lost update problem
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Problem
+-----------
+
+HTTP is fundamentally a system for sending representations of
+resources back and forth across a network connection. A common
+interaction is to ``GET /some/resource``, modify the representation,
+and then ``PUT /some/resource`` to update the resource on the server.
+This is an extremely useful and superficially simple pattern that drives
+many APIs in OpenStack and beyond.
+
+That apparent simplicity is misleading: if there are two or more
+clients performing operations on ``/some/resource`` in the same time
+frame they can experience the `lost update problem`_:
+
+* Client A and client B both ``GET /some/resource`` and make changes to
+  their local representation.
+* Client B does a ``PUT /some/resource`` at time 1.
+* Client A does a ``PUT /some/resource`` at time 2.
+
+Client B's changes have been lost. Neither client is made aware of this.
+This is a problem.
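+
+The sequence above can be reproduced with a minimal sketch. The
+in-memory ``server`` dict and the ``get`` and ``put`` helpers are
+hypothetical stand-ins for a real HTTP service, not part of any
+OpenStack API:
+

```python
import copy

# Hypothetical in-memory "server": one resource stored as a dict.
server = {"/some/resource": {"name": "widget", "color": "blue", "size": 1}}


def get(path):
    """Return a private copy of the representation, as a GET would."""
    return copy.deepcopy(server[path])


def put(path, representation):
    """Unconditionally replace the stored state, as a plain PUT would."""
    server[path] = copy.deepcopy(representation)


# Both clients fetch the same starting state.
client_a = get("/some/resource")
client_b = get("/some/resource")

# Each client changes a different attribute of its local copy.
client_b["color"] = "red"
client_a["size"] = 2

put("/some/resource", client_b)  # time 1
put("/some/resource", client_a)  # time 2: silently overwrites client B

final = get("/some/resource")
print(final)
```

+
+After both writes the stored ``color`` is back to ``blue``: client A's
+PUT carried its stale copy of that attribute, and nothing signalled the
+overwrite to either client.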
+
+A Solution
+----------
+
+HTTP/1.1 and later have a solution for this problem called ETags_.
+These provide a validator for different representations of a
+resource that makes it straightforward to determine if the
+representation provided by a request or a response is the same as
+one already in hand. This is very useful when validating cached GET
+requests (the ETag answers the question "is what I have in my cache
+the same as what the server would give me?") but is also useful for
+avoiding the lost update problem.
+
+If the scenario described above is modified to use ETags it would
+work like this:
+
+* Client A and client B both ``GET /some/resource``. Each response
+  includes a header named ``ETag`` whose value is the same for both
+  clients (let's make the ETag ``red57``). Details on ETag generation
+  can be found below.
+* They both make changes to their local representation.
+* Client B does a ``PUT /some/resource`` and includes a header
+  named If-Match_ with a value of ``red57``. The request is
+  successful because the ETag sent in the request is the same as the
+  ETag the server generates for its current state of the resource.
+* Client A does a ``PUT /some/resource`` and includes the If-Match_
+  header with value ``red57``. This request fails (with a 412_
+  response code) because ``red57`` no longer matches the ETag
+  generated by the server: Its current state has been updated by the
+  request from client B.
+
+
+Client B's changes have not been lost and client A has not
+inadvertently changed something that is not in the form they
+expected. Client A is made aware of this by the response code.
+At this stage, client A can GET the resource again, compare their local
+representation with the one just retrieved, and choose a course of
+action.
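+
+A minimal sketch of this exchange follows. The ``Server`` class, its
+hash-based ETag and the 204 success code are assumptions made for
+illustration; a missing ``If-Match`` header is rejected with 412 here
+only to keep the example short:
+

```python
import hashlib
import json


class Server:
    """Hypothetical in-memory server holding JSON resources by path."""

    def __init__(self, resources):
        self.resources = resources

    def etag(self, path):
        # Hash a canonical encoding so equal states give equal ETags.
        body = json.dumps(self.resources[path], sort_keys=True)
        return '"%s"' % hashlib.sha256(body.encode("utf-8")).hexdigest()

    def get(self, path):
        """Return (status, headers, representation)."""
        return 200, {"ETag": self.etag(path)}, dict(self.resources[path])

    def put(self, path, headers, representation):
        """Apply the update only if If-Match names the current state."""
        if headers.get("If-Match") != self.etag(path):
            return 412  # Precondition Failed
        self.resources[path] = dict(representation)
        return 204


server = Server({"/some/resource": {"color": "blue", "size": 1}})

# Both clients GET the same state and therefore the same ETag.
_, headers_a, client_a = server.get("/some/resource")
_, headers_b, client_b = server.get("/some/resource")

client_b["color"] = "red"
client_a["size"] = 2

status_b = server.put(
    "/some/resource", {"If-Match": headers_b["ETag"]}, client_b)
status_a = server.put(
    "/some/resource", {"If-Match": headers_a["ETag"]}, client_a)
print(status_b, status_a)  # prints: 204 412
```

+
+Client B's change is stored and client A receives the 412, so it knows
+to fetch the resource again before retrying.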
+
+Details
+-------
+
+If a service accepts PUT requests and needs to avoid lost updates it
+can do so by:
+
+* Sending responses to GET requests with an ETag **header** (see
+  below for some discussion on ETag-like attributes in
+  representations).
+* Requiring clients to send an If-Match header with a valid ETag when
+  processing PUT requests.
+* Processing the If-Match header on the server side to compare the
+  ETag provided in the request with the generated ETag of the
+  currently stored representation. If there is a match, carry on
+  with the request action; if not, respond with a 412 status code.
+
+.. note:: An ETag value is a double-quoted string: ``"the etag"``.
+
+.. note:: The If-Match header may contain multiple ETags (separated
+          by commas). If it does, at least one must match for the
+          request to proceed.
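+
+As a sketch of that comparison, the hypothetical helper below extracts
+each quoted ETag from an ``If-Match`` value and checks it against the
+current strong ETag. It uses a regular expression rather than splitting
+on commas because a comma may legally appear inside an ETag, and it
+skips weak ETags because :rfc:`7232` requires the strong comparison for
+``If-Match``. Handling of the ``*`` form is left out:
+

```python
import re

# One entity-tag: optional weak prefix, then a double-quoted opaque value.
_ENTITY_TAG = re.compile(r'(W/)?"([^"]*)"')


def if_match_succeeds(if_match_header, current_etag):
    """Return True if any strong ETag in the header equals current_etag.

    current_etag is the quoted strong ETag of the stored representation,
    for example '"red57"'.
    """
    for match in _ENTITY_TAG.finditer(if_match_header):
        is_weak = match.group(1) is not None
        candidate = '"%s"' % match.group(2)
        if not is_weak and candidate == current_etag:
            return True
    return False
```

+
+A handler would respond with 412 whenever this returns ``False``.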
+
+.. note:: What section of a codebase takes the responsibility of
+          managing the ETag and If-Match headers is greatly dependent on
+          the architecture of the service. In general the handler or
+          controller for each resource should be the locus of
+          responsibility. It may be there are decorators or libraries
+          that can be shared but such things are beyond the scope of
+          this document. Early implementors are encouraged to write code
+          that is transparent and easy to inspect, allowing easier
+          future extraction.
+
+.. note:: ETags_ can be either strong or weak. See :rfc:`7232` for
+          discussion on how weak ETags may be used. They are not
+          addressed in this document as their import is primarily
+          related to cache handling. Strong ETags signify
+          byte-for-byte equivalence between representations of the
+          same resource. Weak ETags indicate only semantic equivalence.
+
+Each of the steps listed above requires functionality to generate ETags
+for representations. Whenever the representation is different the ETag
+should be different. :rfc:`7232#section-2.3.1` has advice on how to
+generate good ETags. In practice they should be:
+
+* Different for different forms of the same resource. For example, the
+  XML and JSON representations of the same version of a resource
+  should have different ETags.
+* Different from version to version.
+* Not based on something that will change when the system restarts.
+  For example, they should not be based on inodes or database keys
+  that are ints or other non-universal identifiers.
+* Not based on hashes of strings that do not have reliable
+  ordering. For example it can be tempting to make md5 or sha hashes
+  of the JSON string that represents a resource. If the ordering in
+  that JSON is not guaranteed, the ETag is not useful.
+
+Ideally they should be fast to calculate or if not fast then easy
+to store (when the representation is written). A hash of a
+last-updated timestamp and the content-type can work, but only if
+updates to the resource are less frequent than clock updates.
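+
+One way to satisfy these properties is sketched below. The function name
+``make_etag`` is hypothetical, and the sketch assumes the resource can
+be serialized as a JSON object; it hashes the content type together
+with a sorted-key encoding so that key order cannot change the result:
+

```python
import hashlib
import json


def make_etag(resource, content_type):
    """Return a quoted strong ETag for a resource dict and a media type."""
    # sort_keys and fixed separators give a stable string for equal data.
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    # Mixing in the content type gives each representation its own ETag.
    digest.update(content_type.encode("utf-8"))
    digest.update(b"\0")
    digest.update(canonical.encode("utf-8"))
    return '"%s"' % digest.hexdigest()
```

+
+Two dicts with the same keys and values produce the same ETag whatever
+their insertion order, while a changed value or a different content
+type produces a different one. If hashing on every request is too slow,
+the value can be computed once and stored when the representation is
+written.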
+
+.. note:: Many details of how ETags can be useful are left out of this
+          document. It is worth reading :rfc:`7232` in its entirety to
+          understand their purpose, how they work, edge cases and
+          how they interact with other modes of conditional request
+          handling.
+
+Special Cases
+-------------
+
+For simple resources that represent a single unified entity the
+above handling works well. For more complex resources the situation
+becomes more complicated. Some scenarios worth considering:
+
+* When there is a resource which represents a collection of
+  resources (e.g. ``GET /resources`` versus ``GET
+  /resources/some-id``) the strict process for updating one of the
+  resources in that collection when using ETags would be:
+
+  * ``GET /resources`` to get the list of resources.
+  * Do some client side processing to choose a single resource's id.
+  * ``GET /resources/that-id`` to get the resource and its ``ETag``
+    header.
+  * Modify the local representation.
+  * ``PUT /resources/that-id`` with an ``If-Match`` header
+    containing the ETag.
+
+  This may be considered cumbersome. One way to optimize this is to
+  include an attribute whose value is the ETag in the individual
+  representations of the singular resources in the collection
+  resource. Then the second GET above can be skipped as the ETag is
+  already available.
+
+* When a resource has sub resources (e.g. an ``/image/id`` resource
+  contains a metadata attribute whose content is also available at
+  ``/image/id/metadata``) it can be desirable to retrieve the image
+  resource and then PUT to the metadata resource. Strictly speaking
+  this would require a GET of the metadata resource to determine the
+  ETag.
+
+  If this is a problem, an optimization to work around this is to
+  allow the ETag of the image resource to be an acceptable ETag of
+  the metadata resource when provided in an ``If-Match`` header.
+  If this is done, then it is important that the reverse not be
+  true: The ETag sent with the metadata resource should not be valid
+  in an ``If-Match`` header sent to the image resource.
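+
+The one-way rule for sub resources can be sketched as follows. The
+``current_etag`` and ``parent_of`` mappings and the ETag values are
+hypothetical; a real service would look these up from its own storage:
+

```python
def acceptable_etags(path, current_etag, parent_of):
    """Return the set of ETags that satisfy If-Match for path.

    current_etag maps each path to its current quoted ETag; parent_of
    maps a sub-resource path to its parent path.
    """
    accepted = {current_etag[path]}
    parent = parent_of.get(path)
    if parent is not None:
        # A sub resource also accepts its parent's ETag.
        accepted.add(current_etag[parent])
    # A parent never accepts a sub resource's ETag: nothing is added here.
    return accepted


current_etag = {"/image/id": '"img-7"', "/image/id/metadata": '"meta-3"'}
parent_of = {"/image/id/metadata": "/image/id"}

metadata_ok = acceptable_etags("/image/id/metadata", current_etag, parent_of)
image_ok = acceptable_etags("/image/id", current_etag, parent_of)
```

+
+Here ``metadata_ok`` contains both ETags while ``image_ok`` contains
+only the image's own ETag.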
+
+.. note:: In both of the above scenarios the semantics of ETags are being
+          violated. An ETag is not a magic key to unlock a resource and
+          make it writable. It is a value used to determine if two
+          representations of the same resource are in fact the same. In
+          the situations above they are comparing different resources.
+          Services should only do so if they must, either because the
+          performance benefit is huge (in which case consider fixing the
+          performance of the API) or because the user experience
+          improvement is significant. The latter is far more important
+          and legitimate than the former.
+
+.. _lost update problem: https://www.w3.org/1999/04/Editing/
+.. _ETags: https://tools.ietf.org/html/rfc7232#section-2.3
+.. _412: https://tools.ietf.org/html/rfc7232#section-4.2
+.. _If-Match: https://tools.ietf.org/html/rfc7232#section-3.1