Pagination, Filtering, and Sorting
This topic document provides guidance on how to paginate large result sets and how best to provide filtering and sorting capabilities in a project's public REST API.
Pagination
Pagination can be implemented using one or both of two query parameters:
- limit: defines the maximum number of items returned in the response
- marker: specifies the ID of the last seen item
Note that the marker need not always be the ID/UUID field of the record; it can be any field or combination of fields that will uniquely identify that record. Using a marker that is not unique will result in overlap or skipped records between paged results.
For example:
GET /app/items?limit=30
Would return at most 30 items:
{
  "items": [
    {
      "id": "719aae5f70db4364850f6198ea874aa6",
      "foo": "bar",
      "baz": "quux",
      "size": 9
    },
    ...
    {
      "id": "08ec231f6d9a43dda97d4b950c3393df",
      "foo": "buzz",
      "baz": "honk",
      "size": 6
    }
  ]
}
If we then wanted the next 30 items after the last one, we would request:
GET /app/items?limit=30&marker=08ec231f6d9a43dda97d4b950c3393df
This would return the next (at most) 30 items after the item with ID 08ec231f6d9a43dda97d4b950c3393df.
The ability to page through results implies that the items are sorted in a consistent fashion in each request, even if that order is nothing more than the order the items were added to the dataset. If the order of the results changes between requests, the returned pages will not have a meaningful relation to each other.
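As an illustration of the limit/marker behavior above, here is a minimal in-memory sketch. It assumes each item is a dict with a unique "id" and that the service orders results by that id on every request; `paginate` and `MarkerNotFound` are hypothetical names, not part of any OpenStack library.

```python
from typing import Dict, List, Optional


class MarkerNotFound(Exception):
    """Hypothetical error a service could map to 400 Bad Request."""


def paginate(items: List[Dict], limit: int,
             marker: Optional[str] = None) -> List[Dict]:
    # Sort on the unique id so every request sees the same order.
    ordered = sorted(items, key=lambda item: item["id"])
    start = 0
    if marker is not None:
        ids = [item["id"] for item in ordered]
        if marker not in ids:
            raise MarkerNotFound(marker)
        # The page begins right after the last item the client has seen.
        start = ids.index(marker) + 1
    return ordered[start:start + limit]


items = [{"id": f"{n:02d}"} for n in range(1, 8)]
page1 = paginate(items, limit=3)                           # ids 01, 02, 03
page2 = paginate(items, limit=3, marker=page1[-1]["id"])   # ids 04, 05, 06
page3 = paginate(items, limit=3, marker=page2[-1]["id"])   # id 07 only
```

Each follow-up request passes the id of the last item it received as the marker, exactly as in the GET examples.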
A similar consideration is how to handle the case where the item whose ID is the marker value is deleted between requests. In that event, the response should start with the next logical item. The definition of "logical" is necessarily fuzzy and will depend on how the data is sorted. There may be some cases, however, where it is not reasonable to determine what the next logical item would be. In those cases, a 400 Bad Request response should be returned, with an error message clearly explaining that the requested marker value does not exist.
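When results are ordered by a single unique key, the next logical item is well defined: it is the first item whose key sorts after the marker value, whether or not the marker item still exists. The sketch below assumes an in-memory list and a hypothetical `paginate_after` helper; a database-backed service would typically express the same idea as a "key > marker ORDER BY key LIMIT n" query.

```python
import bisect
from typing import Dict, List, Optional


def paginate_after(items: List[Dict], limit: int, marker: Optional[str] = None,
                   sort_key: str = "id") -> List[Dict]:
    # Order by a unique sort key; the marker is a value of that key.
    ordered = sorted(items, key=lambda item: item[sort_key])
    keys = [item[sort_key] for item in ordered]
    # bisect_right returns the first position whose key is strictly greater
    # than the marker, so it works even if the marker item was deleted.
    start = 0 if marker is None else bisect.bisect_right(keys, marker)
    return ordered[start:start + limit]


items = [{"id": "a"}, {"id": "b"}, {"id": "d"}, {"id": "e"}]
# "c" was deleted after the client saw it; paging resumes at "d".
print([i["id"] for i in paginate_after(items, limit=2, marker="c")])  # ['d', 'e']
```

Because the lookup is a comparison rather than an exact match, a deleted marker needs no special handling for this kind of ordering; the 400 response remains the fallback for orderings where no such comparison makes sense.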
Pagination Links
It is also helpful to users if services generate pagination links and include them in the links portion of the response body. Providing the following link types will make pagination navigation easier:
- first
- prev
- self
- next
- last
It is important to note that unless the data being paged is static, these links cannot be guaranteed to be accurate. For example, if some items are deleted, the prev link might contain some items from the current result.
For example, a response to:
GET /app/items?limit=30&marker=752b0b9997f24be49e5a1d89d1c53279
Would look similar to:
{
  "items": [
    {
      "id": "719aae5f70db4364850f6198ea874aa6",
      "foo": "bar",
      "baz": "quux",
      "size": 9
    },
    ...
    {
      "id": "08ec231f6d9a43dda97d4b950c3393df",
      "foo": "buzz",
      "baz": "honk",
      "size": 6
    }
  ],
  "links": [
    {
      "rel": "self",
      "href": "http://example.com/app/items?limit=30&marker=752b0b9997f24be49e5a1d89d1c53279"
    },
    {
      "rel": "first",
      "href": "http://example.com/app/items?limit=30"
    },
    {
      "rel": "prev",
      "href": "http://example.com/app/items?limit=30&marker=eff79f5b4f8743caa1f775846302c1d5"
    },
    {
      "rel": "next",
      "href": "http://example.com/app/items?limit=30&marker=08ec231f6d9a43dda97d4b950c3393df"
    },
    {
      "rel": "last",
      "href": "http://example.com/app/items?limit=30&marker=6835afb7ea29491bb2722c6c43f1f070"
    }
  ]
}
When using links, the links that are included change based on which page the user requested. For example, if the user has requested the first page, then it still makes sense to include first, self, next, and last, but not prev. Likewise, if it is the last page, then including next is optional, but the rest (first, prev, self, last) are sensible.
It should also be emphasized that calculating the last link can be costly. In many cases, such link calculation would require querying the entire dataset. Therefore, implementing the last link is optional.
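The rules above for which links to include can be sketched as follows. This assumes the service can see the complete, consistently sorted list of ids, which is realistic only for small datasets; `build_links` and its parameters are hypothetical. The optional last link is computed only when `include_last` is set, reflecting its potential cost.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode


def build_links(base_url: str, ids: List[str], limit: int,
                marker: Optional[str] = None,
                include_last: bool = False) -> List[Dict[str, str]]:
    """Return pagination links for the page that follows `marker`.

    `ids` is the full list of item ids in the service's consistent sort order.
    """

    def href(page_marker: Optional[str]) -> str:
        params: Dict[str, object] = {"limit": limit}
        if page_marker is not None:
            params["marker"] = page_marker
        return f"{base_url}?{urlencode(params)}"

    # Position of the first item on the requested page.
    start = 0 if marker is None else ids.index(marker) + 1
    end = start + limit

    links = [{"rel": "self", "href": href(marker)},
             {"rel": "first", "href": href(None)}]
    if start > 0:
        # The previous page starts `limit` items earlier; its marker is the item
        # just before that point, or no marker when it is the first page.
        prev_start = max(0, start - limit)
        prev_marker = ids[prev_start - 1] if prev_start > 0 else None
        links.append({"rel": "prev", "href": href(prev_marker)})
    if end < len(ids):
        # The next page continues after the last item shown on this page.
        links.append({"rel": "next", "href": href(ids[end - 1])})
    if include_last:
        # Optional: needs the size of the whole dataset, which can be costly.
        last_start = max(0, len(ids) - limit)
        last_marker = ids[last_start - 1] if last_start > 0 else None
        links.append({"rel": "last", "href": href(last_marker)})
    return links


ids = [f"id{n}" for n in range(10)]
for link in build_links("http://example.com/app/items", ids, limit=3,
                        marker="id2", include_last=True):
    print(link["rel"], link["href"])
```

On the first page no prev link is produced, and on the last page this sketch chooses to omit next, which the guideline leaves optional.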
Link Header Alternative
If services are not including JSON Hyper-Schema links in their responses, or if they cannot include them for some reason, they should return pagination links in the Link header as defined in RFC 5988 and RFC 6903.
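A minimal sketch of serializing link objects like those in the response body example into a Link header value, using the RFC 5988 syntax of a bracketed URI followed by a rel parameter, with multiple links separated by commas. `format_link_header` is a hypothetical helper.

```python
from typing import Dict, List


def format_link_header(links: List[Dict[str, str]]) -> str:
    # RFC 5988: each link-value is "<URI>" followed by ";"-separated
    # parameters, and multiple link-values are separated by commas.
    return ", ".join(f'<{link["href"]}>; rel="{link["rel"]}"' for link in links)


header = format_link_header([
    {"rel": "next",
     "href": "http://example.com/app/items?limit=30&marker=08ec231f6d9a43dda97d4b950c3393df"},
    {"rel": "first", "href": "http://example.com/app/items?limit=30"},
])
print("Link: " + header)
```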
Note
Adding the Link header to responses should not be considered an API contract change that needs either a minor version bump or a microversion. Because of the nature of HTTP headers and the relationship of REST services with proxies, load balancers and API gateways, HTTP clients must already handle the existence of additional headers that may not be relevant.
Consuming pagination is a fundamental operation that is frequently not done on a per-service basis. Requiring a user to negotiate a microversion or minor version is extra per-service work that is both difficult and of no value. Users can simply check whether a Link header exists and, if it does, consume the data in it.
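On the client side, checking for and reading the header takes only a few lines. The parser below is a simplified sketch that assumes parameter values contain no commas or semicolons; `parse_link_header` is a hypothetical name. Clients using the Python requests library can instead read the already-parsed `Response.links` dictionary.

```python
import re
from typing import Dict, Optional

# One link-value: "<target>" plus any ";"-separated parameters.
# Simplification: parameter values must not contain "," or ";".
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)')
_REL_RE = re.compile(r'rel\s*=\s*"?([^";]+)"?')


def parse_link_header(value: Optional[str]) -> Dict[str, str]:
    """Map each rel type to its target URL; a missing header gives {}."""
    links: Dict[str, str] = {}
    if not value:
        return links
    for target, params in _LINK_RE.findall(value):
        rel = _REL_RE.search(params)
        if rel:
            # A rel value may hold several space-separated relation types.
            for name in rel.group(1).split():
                links[name] = target
    return links


header = ('<http://example.com/app/items?limit=30&marker=08ec231f6d9a43dda97d4b950c3393df>; rel="next", '
          '<http://example.com/app/items?limit=30>; rel="first"')
next_url = parse_link_header(header).get("next")  # None when there is no next page
```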
Filtering
Filtering can be implemented as a query parameter named for the field to be filtered on; the value should (naturally) be the value to filter for.
An existing example is filtering in Nova. It is notable that Nova doesn't support OR filters, requiring separate requests per query.
A different strategy is to specify query objects and pass them as a single URL-encoded JSON list. This is less client-friendly because it requires extra encoding steps.
The simplest way to allow filtering is to map filterable parameters to query parameters. Take the sample object:
GET /app/items
{
  "items": [
    {
      "foo": "bar",
      "baz": "quux",
      "size": 9
    },
    {
      "foo": "buzz",
      "baz": "honk",
      "size": 6
    }
  ]
}
To filter on a field, simply add that field and its value to the query:
GET /app/items?foo=buzz
{
  "items": [
    {
      "foo": "buzz",
      "baz": "honk",
      "size": 6
    }
  ]
}
Multiple filters result in an implicit AND, so in our example /app/items?foo=buzz&baz=quux would provide no results.
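A sketch of equality filtering with an implicit AND, assuming items are plain dicts and that filter values arrive as strings from the query string (parsed here with the standard library's `urllib.parse.parse_qsl`). `filter_items` is a hypothetical helper.

```python
from typing import Dict, List
from urllib.parse import parse_qsl


def filter_items(items: List[Dict], filters: Dict[str, str]) -> List[Dict]:
    # Every filter must match (implicit AND). Query values are strings, so
    # the stored value is compared in its string form.
    return [item for item in items
            if all(field in item and str(item[field]) == value
                   for field, value in filters.items())]


items = [{"foo": "bar", "baz": "quux", "size": 9},
         {"foo": "buzz", "baz": "honk", "size": 6}]

print(filter_items(items, dict(parse_qsl("foo=buzz"))))           # only the "buzz" item
print(filter_items(items, dict(parse_qsl("foo=buzz&baz=quux"))))  # []
```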
IN operations are available for single fields, using comma-separated options for the field value and colon separation for the in operator. An item matches if its value for the field is one of the listed values:
GET /app/items?foo=in:buzz,bar
{
  "items": [
    {
      "foo": "bar",
      "baz": "quux",
      "size": 9
    },
    {
      "foo": "buzz",
      "baz": "honk",
      "size": 6
    }
  ]
}
If values contain commas, they can be quoted similar to CSV escaping. For example, a query for the value a,bc or d would be ?foo=in:"a,bc",d. If values contain double quotes, those can be backslash-escaped inside quotes. Newline ("\n") and carriage return ("\r") escapes are also allowed. Actual backslashes must be doubled. For a value a"b\c the query would be ?foo="a\"b\\c". Unquoted values may not contain quotes, and backslashes in them are treated as any other character. So for a value a\b the query would be ?foo=a\b.
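The quoting rules above resemble CSV, but escapes use backslashes and \n and \r are decoded, so a small hand-written splitter is a natural fit. The sketch below applies those rules to the text after the "in:" prefix, once the query string has been URL-decoded; `split_values` is a hypothetical name.

```python
from typing import List

# Escape sequences allowed inside a quoted value.
_ESCAPES = {'"': '"', "\\": "\\", "n": "\n", "r": "\r"}


def split_values(raw: str) -> List[str]:
    """Split a URL-decoded value list such as '"a,bc",d' into its items."""
    values: List[str] = []
    i = 0
    while True:
        if i < len(raw) and raw[i] == '"':
            # Quoted item: read to the closing quote, decoding escapes.
            i += 1
            chars: List[str] = []
            while True:
                if i >= len(raw):
                    raise ValueError("unterminated quoted value")
                ch = raw[i]
                if ch == "\\":
                    if i + 1 >= len(raw) or raw[i + 1] not in _ESCAPES:
                        raise ValueError("invalid escape sequence")
                    chars.append(_ESCAPES[raw[i + 1]])
                    i += 2
                elif ch == '"':
                    i += 1
                    break
                else:
                    chars.append(ch)
                    i += 1
            values.append("".join(chars))
            if i < len(raw) and raw[i] != ",":
                raise ValueError("unexpected text after closing quote")
        else:
            # Unquoted item: runs to the next comma; backslashes are literal.
            end = raw.find(",", i)
            if end == -1:
                end = len(raw)
            item = raw[i:end]
            if '"' in item:
                raise ValueError("unquoted values may not contain quotes")
            values.append(item)
            i = end
        if i >= len(raw):
            return values
        i += 1  # skip the comma separating this item from the next


print(split_values('"a,bc",d'))  # ['a,bc', 'd']
```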
For queries that need comparisons other than simple equals, operators are supported for membership, non-membership, inequality, greater-than, greater-than-or-equal-to, less-than, and less-than-or-equal-to. In order, the operators are: in, nin, neq, gt, gte, lt, and lte. Simple equality is the default operation, and is performed as ?param=foo.
An operator is written as a prefix to the value it applies to, separated by a colon. For example, finding objects with a size greater than 8 would be written as ?size=gt:8 and would return:
GET /app/items?size=gt:8
{
  "items": [
    {
      "foo": "bar",
      "baz": "quux",
      "size": 9
    }
  ]
}
Operators must be followed by colons, so the query ?foo=gte searches for the literal string "gte", and searching for "gte:" can be done by quoting the value as ?foo="gte:".
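The operator syntax can be sketched as a small parser plus a comparison step. This sketch assumes values are coerced to the type of the stored field before comparing, treats a quoted value as a literal without decoding its escapes, and splits in/nin lists naively on commas; `parse_filter` and `matches` are hypothetical names.

```python
import operator
from typing import Any, Tuple

# Operator names from this guideline, mapped to Python comparisons.
COMPARISONS = {"neq": operator.ne, "gt": operator.gt, "gte": operator.ge,
               "lt": operator.lt, "lte": operator.le}
MEMBERSHIP = {"in", "nin"}


def parse_filter(raw: str) -> Tuple[str, str]:
    """Split 'gt:8' into ('gt', '8'); anything else means plain equality."""
    name, sep, rest = raw.partition(":")
    if sep and (name in COMPARISONS or name in MEMBERSHIP):
        return name, rest
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        # A quoted value is literal, so '"gte:"' is the string gte: itself.
        # (Backslash escapes are not decoded in this sketch.)
        return "eq", raw[1:-1]
    return "eq", raw


def _coerce(value: str, like: Any) -> Any:
    # Query values arrive as strings; convert them to the stored field's type.
    if isinstance(like, bool):
        return value.lower() == "true"
    if isinstance(like, (int, float)):
        return type(like)(value)
    return value


def matches(actual: Any, raw: str) -> bool:
    name, value = parse_filter(raw)
    if name in MEMBERSHIP:
        # Naive comma split; quoted list members are not handled here.
        found = actual in [_coerce(v, actual) for v in value.split(",")]
        return found if name == "in" else not found
    expected = _coerce(value, actual)
    if name == "eq":
        return actual == expected
    return COMPARISONS[name](actual, expected)


items = [{"foo": "bar", "baz": "quux", "size": 9},
         {"foo": "buzz", "baz": "honk", "size": 6}]
print([item for item in items if matches(item["size"], "gt:8")])  # the size-9 item
```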
TODO: Add guidance on a "LIKE" or regex operator to search text.
Paginating responses should be done after applying the filters in a query. Otherwise a page may contain no matches even though matching items exist further on, and returning an empty page is a poor API when the user explicitly requested a number of results.
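A toy dataset makes the difference concrete. In this sketch (hypothetical `list_items`, marker handling omitted), filtering first fills the requested page, while taking the first page and then filtering it returns nothing.

```python
from typing import Callable, Dict, List


def list_items(items: List[Dict], predicate: Callable[[Dict], bool],
               limit: int) -> List[Dict]:
    # Filter the whole (sorted) collection first, then cut the page, so a page
    # is short only when fewer matching items actually remain.
    return [item for item in items if predicate(item)][:limit]


def wanted(item: Dict) -> bool:
    return item["size"] == 9


items = [{"id": n, "size": n % 10} for n in range(100)]

print(len(list_items(items, wanted, limit=5)))   # 5
print(len([i for i in items[:5] if wanted(i)]))  # 0 (page first, filter second)
```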
Time-based filtering queries
To support filtering based on time intervals such as those described on the ISO 8601 time intervals Wikipedia page, it should be possible to express the following use cases through API queries:
- a two-ISO8601-date timestamp interval
- an open-ended, single-ISO8601-date interval
- multiple time intervals an item may belong to
- equality with a default value where no time has been set yet
For instance, the Ironic Inspector project keeps track of node introspection statuses that include the started_at and finished_at fields. While the former is always set, the latter is set only once the introspection has finished:
GET /app/items
{
  "items": [
    {"id": "item1", "started_at": "2016-10-10T15:00Z",
     "finished_at": "2016-10-10T15:30Z"},
    {"id": "item2", "started_at": "2016-10-10T15:15Z",
     "finished_at": "2016-10-10T16:00Z"},
    {"id": "item3", "started_at": "2016-10-10T15:45Z",
     "finished_at": null}
  ]
}
To obtain items that finished between 15:30 and 16:00 UTC today, use an interval with two boundaries:
GET /app/items?finished_at=gte:15:30&finished_at=lt:16:00
{
  "items": [
    {"id": "item1", "started_at": "2016-10-10T15:00Z",
     "finished_at": "2016-10-10T15:30Z"}
  ]
}
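A sketch of evaluating such repeated bounds, assuming the service accepts full ISO 8601 timestamps rather than the bare times of day used in the examples; `in_interval` and `parse_ts` are hypothetical helpers. Repeated parameters must be kept as a list of conditions, which `urllib.parse.parse_qsl` does.

```python
import operator
from datetime import datetime
from typing import List, Optional
from urllib.parse import parse_qsl

# Range operators that make sense for timestamps.
BOUNDS = {"gt": operator.gt, "gte": operator.ge,
          "lt": operator.lt, "lte": operator.le}


def parse_ts(value: str) -> datetime:
    # Older Python versions' fromisoformat() reject a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def in_interval(ts: Optional[str], conditions: List[str]) -> bool:
    """True if ts satisfies every 'op:timestamp' condition (implicit AND)."""
    if ts is None:
        # An unset timestamp never falls inside a range.
        return False
    actual = parse_ts(ts)
    for condition in conditions:
        op, _, bound = condition.partition(":")
        if not BOUNDS[op](actual, parse_ts(bound)):
            return False
    return True


items = [{"id": "item1", "finished_at": "2016-10-10T15:30Z"},
         {"id": "item2", "finished_at": "2016-10-10T16:00Z"},
         {"id": "item3", "finished_at": None}]

# parse_qsl keeps repeated parameters; a "+" in a UTC offset would have to be
# sent percent-encoded as %2B, which is why "Z" is used here.
query = "finished_at=gte:2016-10-10T15:30Z&finished_at=lt:2016-10-10T16:00Z"
conditions = [v for k, v in parse_qsl(query) if k == "finished_at"]
print([i["id"] for i in items if in_interval(i["finished_at"], conditions)])  # ['item1']
```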
To list items that finished at or after 15:30 UTC today, use an open-ended time interval query:
GET /app/items?finished_at=gte:15:30
{
  "items": [
    {"id": "item1", "started_at": "2016-10-10T15:00Z",
     "finished_at": "2016-10-10T15:30Z"},
    {"id": "item2", "started_at": "2016-10-10T15:15Z",
     "finished_at": "2016-10-10T16:00Z"}
  ]
}
Finally, to include items that haven't finished yet, use equality with the default value. Since filters are implicitly AND-ed, two requests are needed:
GET /app/items?finished_at=gte:16:00
{
  "items": [
    {"id": "item2", "started_at": "2016-10-10T15:15Z",
     "finished_at": "2016-10-10T16:00Z"}
  ]
}
GET /app/items?finished_at=null
{
  "items": [
    {"id": "item3", "started_at": "2016-10-10T15:45Z",
     "finished_at": null}
  ]
}
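A client can combine the two responses into a single result set. In this sketch the lists stand in for the two response bodies above, and `union_by_id` is a hypothetical helper; deduplicating by id keeps the result correct should the two queries ever overlap.

```python
from typing import Dict, Iterable, List


def union_by_id(*responses: Iterable[Dict]) -> List[Dict]:
    # Merge several result lists, keeping the first copy of each id, so an OR
    # built from separate AND-only requests contains no duplicates.
    seen = set()
    merged = []
    for items in responses:
        for item in items:
            if item["id"] not in seen:
                seen.add(item["id"])
                merged.append(item)
    return merged


finished_late = [{"id": "item2", "started_at": "2016-10-10T15:15Z",
                  "finished_at": "2016-10-10T16:00Z"}]
unfinished = [{"id": "item3", "started_at": "2016-10-10T15:45Z",
               "finished_at": None}]
print([i["id"] for i in union_by_id(finished_late, unfinished)])  # ['item2', 'item3']
```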
Sorting
Sorting is determined through the use of the 'sort' query string parameter. The value of this parameter is a comma-separated list of sort keys. Sort directions can optionally be appended to each sort key, separated by the ':' character.
The supported sort directions are either 'asc' for ascending or 'desc' for descending.
The caller may (but is not required to) specify a sort direction for each key. If a sort direction is not specified for a key, then a default is set by the server.
For example:
- Only sort keys specified:
  sort=key1,key2,key3
  - 'key1' is the first key, 'key2' is the second key, etc.
  - Sort directions are defaulted by the server
- Some sort directions specified:
  sort=key1:asc,key2,key3
  - Any sort key without a corresponding direction is defaulted
  - 'key1' is the first key (ascending order), 'key2' is the second key (direction defaulted by the server), etc.
- Equal number of sort keys and directions specified:
  sort=key1:asc,key2:desc,key3:asc
  - Each key is paired with the corresponding direction
  - 'key1' is the first key (ascending order), 'key2' is the second key (descending order), etc.
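A sketch of parsing the sort parameter and applying it, assuming the server's default direction is ascending (the guideline leaves that choice to the server); `parse_sort` and `apply_sort` are hypothetical names.

```python
from typing import Dict, List, Tuple


def parse_sort(value: str, default_dir: str = "asc") -> List[Tuple[str, str]]:
    """Turn 'key1:asc,key2,key3:desc' into [(key, direction), ...]."""
    result = []
    for part in value.split(","):
        key, _, direction = part.partition(":")
        direction = direction or default_dir  # server-chosen default
        if direction not in ("asc", "desc"):
            raise ValueError(f"invalid sort direction: {direction!r}")
        result.append((key, direction))
    return result


def apply_sort(items: List[Dict], spec: List[Tuple[str, str]]) -> List[Dict]:
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last yields a correct
    # multi-key ordering even with mixed directions.
    ordered = list(items)
    for key, direction in reversed(spec):
        ordered.sort(key=lambda item: item[key], reverse=(direction == "desc"))
    return ordered


items = [{"name": "b", "size": 1}, {"name": "a", "size": 2}, {"name": "a", "size": 1}]
spec = parse_sort("name:asc,size:desc")
print([(i["name"], i["size"]) for i in apply_sort(items, spec)])
# [('a', 2), ('a', 1), ('b', 1)]
```

A database-backed service would more likely translate the parsed keys into an ORDER BY clause than sort in memory.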
Note that many projects have implemented sorting using repeated 'sort_key' and 'sort_dir' query string parameters; see [1]. As these projects adopt these guidelines, they should deprecate the older parameters appropriately.
[1]: https://wiki.openstack.org/wiki/API_Working_Group/Current_Design/Sorting