Files
deb-python-colander/docs/index.rst
Chris McDonough c6ec36b5c0 Documentation.
2010-03-12 05:08:14 +00:00

15 KiB

Cereal

Cereal is useful as a system for validating and deserializing data obtained via XML, JSON, an HTML form post or any other equally simple data serialization. Cereal can be used to:

  • Define a data schema
  • Serialize an arbitrary Python structure to a data structure composed of strings, mappings, and lists.
  • Deserialize a data structure composed of strings, mappings, and lists into an arbitrary Python structure after validating the data structure against a data schema.

Out of the box, Cereal can serialize the following types of objects:

  • A mapping object (e.g. dictionary)
  • A variable-length sequence of objects (each object is of the same type).
  • A fixed-length tuple of objects (each object is of a different type).
  • A string or Unicode object.
  • An integer.
  • A dotted Python object path.

Cereal allows additional data structures to be serialized and deserialized by allowing a developer to define new "types".

Defining A Cereal Schema

Imagine you want to deserialize and validate a serialization of data you've obtained by reading a YAML document. An example of such a data serialization might look something like this:

{
 'name':'keith',
 'age':'20',
 'friends':[('1', 'jim'),('2', 'bob'), ('3', 'joe'), ('4', 'fred')],
 'phones':[{'location':'home', 'number':'555-1212'},
           {'location':'work', 'number':'555-8989'},],
}

Let's further imagine you'd like to make sure, on demand, that a particular serialization of this type read from this YAML document or another YAML document is "valid".

Notice that all the innermost values in the serialization are strings, even though some of them (such as age and the position of each friend) are more naturally integer-like. Let's define a schema which will attempt to convert a serialization to a data structure that has different types.

import cereal

class Friend(cereal.TupleSchema):
    rank = cereal.Structure(cereal.Int(), validator=cereal.Range(0, 9999))
    name = cereal.Structure(cereal.String())

class Phone(cereal.MappingSchema):
    location = cereal.Structure(cereal.String(), 
                                validator=cereal.OneOf(['home', 'work']))
    number = cereal.Structure(cereal.String())

class PersonSchema(cereal.MappingSchema):
    name = cereal.Structure(cereal.String())
    age = cereal.Structure(cereal.Int(), validator=cereal.Range(0, 200))
    friends = cereal.Structure(cereal.Sequence(Friend()))
    phones = cereal.Structure(cereal.Sequence(Phone()))

For ease of reading, we've actually defined three schemas above, but we coalesce them all into a single PersonSchema. As the result of our definitions, a PersonSchema represents:

  • A name, which must be a string.
  • An age, which must be deserializable to an integer; after deserialization happens, a validator ensures that the integer is between 0 and 200 inclusive.
  • A sequence of friend structures. Each friend structure is a two-element tuple. The first element represents an integer rank; it must be between 0 and 9999 inclusive. The second element represents a string name.
  • A sequence of phone structures. Each phone structure is a mapping. Each phone mapping has two keys: location and number. The location must be one of work or home. The number must be a string.

Structure Objects

A schema is composed of one or more structure objects, usually in a nested arrangement. Each structure object has a required type, an optional validator, and a slightly less optional name.

The type of a structure indicates its data type (such as cereal.Int or cereal.String).

The validator of a structure is called after deserialization; it makes sure the deserialized value matches a constraint. An example of such a validator is provided in the schema above: validator=cereal.Range(0, 200). The name of a structure appears in error reports.

The name of a structure that is introduced as a class-level attribute of a cereal.MappingSchema or cereal.TupleSchema is its class attribute name. For example:

import cereal

class Phone(cereal.MappingSchema):
    location = cereal.Structure(cereal.String(), 
                                validator=cereal.OneOf(['home', 'work']))
    number = cereal.Structure(cereal.String())

The name of the structure defined by location = cereal.Structure(..) is location.

Schema Objects

The result of creating an instance of a cereal.MappingSchema or cereal.TupleSchema object is also a structure object.

Instantiating a cereal.MappingSchema creates a structure which has a type value of cereal.Mapping. Instantiating a cereal.TupleSchema creates a structure which has a type value of cereal.Tuple.

A structure defined by instantiating a cereal.MappingSchema or a cereal.TupleSchema usually has no validator, and has the empty string as its name.

Deserializing A Data Structure Using a Schema

Earlier we defined a schema:

import cereal

class Friend(cereal.TupleSchema):
    rank = cereal.Structure(cereal.Int(), validator=cereal.Range(0, 9999))
    name = cereal.Structure(cereal.String())

class Phone(cereal.MappingSchema):
    location = cereal.Structure(cereal.String(), 
                                validator=cereal.OneOf(['home', 'work']))
    number = cereal.Structure(cereal.String())

class PersonSchema(cereal.MappingSchema):
    name = cereal.Structure(cereal.String())
    age = cereal.Structure(cereal.Int(), validator=cereal.Range(0, 200))
    friends = cereal.Structure(cereal.Sequence(Friend()))
    phones = cereal.Structure(cereal.Sequence(Phone()))

Let's now use this schema to try to deserialize some concrete data structures.

Deserializing A Valid Serialization

data = {
       'name':'keith',
       'age':'20',
       'friends':[('1', 'jim'),('2', 'bob'), ('3', 'joe'), ('4', 'fred')],
       'phones':[{'location':'home', 'number':'555-1212'},
                 {'location':'work', 'number':'555-8989'},],
       }
schema = PersonSchema()
deserialized = schema.deserialize(data)

When schema.deserialize(data) is called, because all the data in the schema is valid, and the structure represented by data conforms to the schema, deserialized will be the following:

{
'name':'keith',
'age':20,
'friends':[(1, 'jim'),(2, 'bob'), (3, 'joe'), (4, 'fred')],
'phones':[{'location':'home', 'number':'555-1212'},
          {'location':'work', 'number':'555-8989'},],
}

Note that all the friend rankings have been converted to integers, likewise for the age.

Deserializing An Invalid Serialization

Below, the data` structure has some problems. Theageis a negative number. The rank forbobistwhich is not a valid integer. Thelocationof the first phone isbar, which is not a valid location (it is not one of "work" or "home"). What happens when a data structure cannot be deserialized due to a data type error or a validation error? .. code-block:: python :linenos: import cereal data = { 'name':'keith', 'age':'-1', 'friends':[('1', 'jim'),('t', 'bob'), ('3', 'joe'), ('4', 'fred')], 'phones':[{'location':'bar', 'number':'555-1212'}, {'location':'work', 'number':'555-8989'},], } schema = PersonSchema() try: schema.deserialize(data) except cereal.Invalid, e: print e.asdict() Thedeserializemethod will raise an exception, and theexceptclause above will be invoked, causinge.asdict()to be printed. This wil print: .. code-block:: python :linenos: {'age':'-1 is less than minimum value 0', 'friends.1.0':'"t" is not a number', 'phones.0.location:'"bar" is not one of ["home", "work"]'} The above error dictionary is telling us that: - The top-level age variable failed validation. - Bob's rank (the Friend tuple namebob's zeroth element) is not a valid number. - The zeroth phone number has a bad location: it should be one of "home" or "work". Defining A Schema Imperatively ------------------------------ The above schema we defined was defined declaratively via a set ofclassstatements. It's often useful to create schemas more dynamically. For this reason, Cereal offers an "imperative" mode of schema configuration. Here's our previous declarative schema: .. code-block:: python :linenos: import cereal class Friend(cereal.TupleSchema): rank = cereal.Structure(cereal.Int(), validator=cereal.Range(0, 9999)) name = cereal.Structure(cereal.String()) class Phone(cereal.MappingSchema): location = cereal.Structure(cereal.String(), validator=cereal.OneOf(['home', 'work'])) number = cereal.Structure(cereal.String()) class PersonSchema(cereal.MappingSchema): name = cereal.Structure(cereal.String()) age = cereal.Structure(cereal.Int(), validator=cereal.Range(0, 200)) friends = cereal.Structure(cereal.Sequence(Friend())) phones = cereal.Structure(cereal.Sequence(Phone())) We can imperatively construct a completely equivalent schema like so: .. code-block:: python :linenos: import cereal friend = cereal.Structure(Tuple()) friend.add(cereal.Structure(cereal.Int(), validator=cereal.Range(0, 9999), name='rank')) friend.add(cereal.Structure(cereal.String()), name='name') phone = cereal.Structure(Mapping()) phone.add(cereal.Structure(cereal.String(), validator=cereal.OneOf(['home', 'work']), name='location')) phone.add(cereal.Structure(cereal.String(), name='number')) schema = cereal.Structure(Mapping()) schema.add(cereal.Structure(cereal.String(), name='name')) schema.add(cereal.Structure(cereal.Int(), name='age'), validator=cereal.Range(0, 200)) schema.add(cereal.Structure(cereal.Sequence(friend), name='friends')) schema.add(cereal.Structure(cereal.Sequence(phone), name='phones')) Defining a schema imperatively is a lot uglier than defining a schema declaratively, but it's often more useful when you need to define a schema dynamically. Perhaps in the body of a function or method you may need to disinclude a particular schema field based on a business condition; when you define a schema imperatively, you have more opportunity to control the schema composition. Serializing and deserializing using a schema created imperatively is done exactly the same way as you would serialize or deserialize using a schema created declaratively: .. code-block:: python :linenos: data = { 'name':'keith', 'age':'20', 'friends':[('1', 'jim'),('2', 'bob'), ('3', 'joe'), ('4', 'fred')], 'phones':[{'location':'home', 'number':'555-1212'}, {'location':'work', 'number':'555-8989'},], } deserialized = schema.deserialize(data) Defining a New Type ------------------- A new type is a class with two methods::serializeanddeserialize.serializeconverts a Python data structure to a serialization.deserializeconverts a value to a Python data structure. Here's a type which implements boolean serialization and deserialization. It serializes a boolean to the stringtrueorfalse; it deserializes a string (presumablytrueorfalse, but allows some wiggle room fort,on,yes,y, and1) to a boolean value. .. code-block:: python :linenos: class Boolean(object): def deserialize(self, struct, value): if not isinstance(value, basestring): raise Invalid(struct, '%r is not a string' % value) value = value.lower() if value in ('true', 'yes', 'y', 'on', 't', '1'): return True return False def serialize(self, struct, value): if not isinstance(value, bool): raise Invalid(struct, '%r is not a boolean') return value and 'true' or 'false' Here's how you would use the resulting class as part of a schema: .. code-block:: python :linenos: import cereal class Schema(cereal.MappingSchema): interested = cereal.Structure(Boolean()) The above schema has a member namedinterestedwhich will now be serialized and deserialized as a boolean, according to the logic defined in theBooleantype class. Note that the only real constraint of a type class is that itsserializemethod must be able to make sense of a value generated by itsdeserializemethod and vice versa. Defining a New Validator ------------------------ A validator is a callable which accepts two positional arguments:structandvalue. It returnsNoneif the value is valid. It raises acereal.Invalidexception if the value is not valid. Here's a validator that checks if the value is a valid credit card number. .. code-block:: python :linenos: def luhnok(struct, value): """ checks to make sure that the value passes a luhn mod-10 checksum """ sum = 0 num_digits = len(value) oddeven = num_digits & 1 for count in range(0, num_digits): digit = int(value[count]) if not (( count & 1 ) ^ oddeven ): digit = digit * 2 if digit > 9: digit = digit - 9 sum = sum + digit if not (sum % 10) == 0: raise Invalid(struct, '%r is not a valid credit card number' % value) Here's how the resultingluhnokvalidator might be used in a schema: .. code-block:: python :linenos: import cereal class Schema(cereal.MappingSchema): cc_number = cereal.Structure(cereal.String(), validator=lunhnok) Note that the validator doesn't need to check if thevalueis a string: this has already been done as the result of the type of thecc_numberstructure beingcereal.String``. Validators are always passed the deserialized value when they are invoked.

Interface and API Documentation

interfaces.rst api.rst

Indices and tables

  • genindex
  • modindex
  • search