Customizing and extending yaql
==============================

Configuring yaql parser
-----------------------

yaql has two main points of customization:

* yaql engine settings allow one to configure the query language and execution
  flags shared by all queries that are processed by the same YAQL parser. This
  includes the list of available operators, yaql resource quotas, and other
  engine parameters.
* By customizing the yaql context object, one can change the list of available
  functions (add new, override existing) and change naming conventions.

Engine options are supplied to the `yaql.language.factory.YaqlFactory` class.
YaqlFactory is used to create instances of the YaqlEngine, that is, the YAQL
parser. This is done by calling the `create` method of the factory. Once the
engine is created, it captures all the factory options so that they cannot be
changed for that particular parser any longer. In general, it is recommended
to have one yaql engine instance per application, because construction of the
parser is an expensive operation and the parser has no internal state and thus
can be reused for several queries, including in different threads. However, the
host may have several YAQL parsers for different option sets or dialects.

In contrast, the context object is cheap to create and is mutable by
design, since it holds the input data for the query. In most cases it is a good
idea to execute each query in its own context, although all such contexts might
be the children of some other, fixed context that is created just once.

Customizing operators
~~~~~~~~~~~~~~~~~~~~~

The `YaqlFactory` object holds an operator table that is recognized by the
parser produced by it. By default, it is prepopulated with standard operators
and most applications never need to do anything here. However, if the host
wants to have some custom operator symbol available in its expressions, this
table needs to be modified. `YaqlFactory` holds the operator symbols and other
information about the operator that is relevant to the parser, but not the
implementations. The implementations (what operators actually do) are put
in the `context` and can be configured for each expression, but the list of
available operator symbols cannot be changed for the parser once it has been
built.

Each operator in the table is represented by the tuple
`(op_symbols, op_type, op_alias)`:

* op_symbols are the operator symbols. There are no limitations on how the
  operators can be named as long as they do not contain whitespace. It can
  be one symbol (like `+`), several symbols (like `=~`) or even a word
  (like `not`). List/index and dictionary expressions require `[]` and `{}`
  binary left associative operators to be present in the table. Otherwise
  corresponding constructions will not work (and can be disabled by removing
  corresponding operators from the table).
* op_type is one of the values in `yaql.language.factory.OperatorType`
  enumeration: BINARY_LEFT_ASSOCIATIVE and BINARY_RIGHT_ASSOCIATIVE for binary
  operators, PREFIX_UNARY and SUFFIX_UNARY for unary operators, NAME_VALUE_PAIR
  for the keyword/mapping pseudo-operator (that is `=>`, by default).
* op_alias is the alias name for the operator. See YAQL language reference on
  how operator aliases are used. Aliases are optional and most operators do not
  have one and thus are represented by a tuple of two elements.

Operators are grouped by their precedence. Operators with a higher precedence
come first in the operator table. Operators within the same group have the same
precedence. Groups are separated by an empty tuple (`()`).

The operator table, which is a list of tuples, is available through the
`operators` attribute of the factory and is open for modification. To simplify
the editing, `YaqlFactory` provides the `insert_operator` helper method to
insert an operator before or after some other existing operator to get the
desired precedence.

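The table layout described above can be pictured with a plain Python list.
The sketch below is only a model: the operator symbols, their grouping, the
`matches` alias, and the `precedence_levels` helper are illustrative
assumptions, not the contents of the real `operators` attribute or part of the
yaql API. Plain strings stand in for the `OperatorType` values so that the
snippet has no dependencies.

```python
# Stand-ins for the OperatorType values (plain strings, assumed for the sketch).
BINARY_LEFT = "BINARY_LEFT_ASSOCIATIVE"
PREFIX_UNARY = "PREFIX_UNARY"

# A hypothetical operator table: groups are separated by empty tuples,
# and groups that come first have higher precedence.
operators = [
    ("*", BINARY_LEFT),
    ("/", BINARY_LEFT),
    (),
    ("+", BINARY_LEFT),
    ("-", BINARY_LEFT),
    (),
    ("=~", BINARY_LEFT, "matches"),  # three elements: symbol, type, alias
    (),
    ("not", PREFIX_UNARY),
]


def precedence_levels(table):
    """Map each operator symbol to its group index (0 = highest precedence)."""
    levels = {}
    level = 0
    for entry in table:
        if not entry:          # an empty tuple starts the next group
            level += 1
            continue
        levels[entry[0]] = level
    return levels


levels = precedence_levels(operators)
print(levels)
```

In this model, moving an entry across an empty tuple changes its precedence,
which is the kind of edit that `insert_operator` is meant to simplify.
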
Execution options
~~~~~~~~~~~~~~~~~

Execution options are the settings and flags that affect execution of each
query and are accessible and processed by both yaql runtime and standard
library functions.

Options are passed to the `create` method of the `YaqlFactory` class in a
plain key-value dictionary. The factory does not process the dictionary but
rather attaches the options to the constructed engine (YAQL parser) after which
they cannot be changed. However, the engine provides a `copy` method that can
be used to clone the engine with different execution options.

The options that are honored by yaql are:

* `"yaql.limitIterators": <INT>` - limits iterators to the given number of
  elements. When set, each time any function declares its parameter to be an
  iterator, that iterator is modified to not produce more than the given number
  of items. Also, upon the expression evaluation, all the output collections
  and iterators are limited as well. If not set (or set to -1) the result data
  is allowed to contain endless iterators that would cause errors if the result
  were to be serialized (to JSON or any other format). Default is -1 (do not
  limit).
* `"yaql.memoryQuota": <INT>` - the memory usage quota (in bytes) for all
  data produced by the expression (or any part of it). Default is -1 (do not
  limit).
* `"yaql.convertTuplesToLists": <True|False>` - when set to true, yaql converts
  all tuples in the expression result to lists. The default is `True`.
* `"yaql.convertSetsToLists": <True|False>` - when set to true, yaql converts
  all sets in the expression result to lists. Otherwise the produced result
  may contain sets that are not JSON-serializable. The default is `False`.
* `"yaql.iterableDicts": <True|False>` - when set to true, dictionaries are
  considered to be iterable and iteration over dictionaries produces their
  keys (as in Python and yaql 0.2). Defaults to `False`.

Consumers are free to use their own settings or use the options dictionary to
provide some other environment information to their own custom functions.

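As an illustration of the options dictionary and of what limiting an iterator
means, here is a minimal, dependency-free sketch. The option keys are the ones
listed above; the values are arbitrary, and `limit_iterable` is a hypothetical
helper rather than yaql code. Raising an error once the limit is exceeded is an
assumption made for the sketch.

```python
import itertools

# Option keys as listed above; the values are arbitrary examples.
options = {
    "yaql.limitIterators": 3,
    "yaql.memoryQuota": 10000,
    "yaql.convertTuplesToLists": True,
    "yaql.convertSetsToLists": True,
    "yaql.iterableDicts": False,
}


def limit_iterable(iterable, options):
    """Yield items from `iterable`, honouring the iterator limit option.

    A missing key or -1 means "do not limit". Raising ValueError on
    overflow is a choice made for this sketch.
    """
    limit = options.get("yaql.limitIterators", -1)
    for count, item in enumerate(iterable, start=1):
        if limit != -1 and count > limit:
            raise ValueError("iterator produced more than %d items" % limit)
        yield item


bounded = list(limit_iterable(range(3), options))      # within the limit

try:
    list(limit_iterable(itertools.count(), options))   # endless iterator
    overflowed = False
except ValueError:
    overflowed = True

print(bounded, overflowed)
```

Without a limit, consuming the endless iterator in the second call would never
finish, which is the situation the option is meant to prevent.
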
Other engine customizations
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The `YaqlFactory` class initializer has two optional parameters that can be
used to further customize the YAQL parser:

* `keyword_operator` allows one to configure the keyword/mapping symbol. The
  default is `=>`. The ability to pass named arguments can be disabled
  altogether if `None` or an empty string is provided.
* `allow_delegates` enables or disables delegate expression parsing. Default
  is False (disabled).

Working with contexts
~~~~~~~~~~~~~~~~~~~~~

Context is an interface that yaql runtime uses to obtain a list of available
functions and variables. Any context object must implement the
`yaql.language.contexts.ContextBase` interface, and yaql provides several such
implementations ranging from the `yaql.language.contexts.Context` class,
which is a basic context implementation, to contexts that allow one to merge
several other contexts into one or link an existing context into the list of
contexts.

Any context may have a parent context. Any lookup that is done in the context
is also performed in its parent context, extending all the way up its chain of
contexts. During expression evaluation, yaql can create a long chain of
contexts that are all children of the context that was originally passed with
the query.

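The parent lookup described above can be modelled in a few lines of plain
Python. `ChainedContext` is a hypothetical toy class written for this
illustration; it only stores variables, whereas real yaql contexts also hold
functions. The `create_child_context` name mirrors the method mentioned in this
document.

```python
class ChainedContext:
    """A toy context: a name-to-value mapping with an optional parent."""

    def __init__(self, parent=None):
        self._parent = parent
        self._values = {}

    def __setitem__(self, name, value):
        self._values[name] = value

    def __getitem__(self, name):
        # Look in this context first, then walk up the parent chain.
        context = self
        while context is not None:
            if name in context._values:
                return context._values[name]
            context = context._parent
        raise KeyError(name)

    def create_child_context(self):
        return ChainedContext(parent=self)


root = ChainedContext()
root["limit"] = 10

query_context = root.create_child_context()
query_context["user"] = "alice"
query_context["limit"] = 5           # shadows the parent's value

print(query_context["user"], query_context["limit"], root["limit"])
# alice 5 10
```

Changes made in the child never leak into the parent, which is why a fixed
root context can safely be shared by many per-query child contexts.
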
Most of the yaql customizations are achieved by context manipulations.
This includes:

* Overriding YAQL functions
* Building context chains and evaluating sub-expressions in different
  contexts
* Composing context chains from pre-built contexts
* Having custom `ContextBase` implementations and mixing them with regular
  contexts in a single chain

In fact, it is the context that provides the entry point for expression
evaluation, and thus custom context implementations may completely change
the way queries are evaluated.

There are three ways to create a context instance:

#. Directly instantiate one of the `ContextBase` implementations to get an
   empty context
#. Call the `create_child_context` method on any existing context object to
   get a child context
#. Use the `yaql.create_context` function to create the root context that is
   prepopulated with YAQL standard library functions

`yaql.create_context` allows one to selectively disable standard library
modules.

Naming conventions
~~~~~~~~~~~~~~~~~~

Naming conventions define how Python functions and parameter names are
translated into YAQL names. Conventions are implementations of the
`yaql.language.conventions.Convention` interface that has just two methods:
one to translate the function name and another to translate the function
parameter name.

yaql has two implementations included:

* `yaql.language.conventions.CamelCaseConvention` that translates Python
  conventions into camel case. For example, it will convert
  `my_func(arg_name)` into `myFunc(argName)`. This convention is used by
  default.

* `yaql.language.conventions.PythonConvention` that leaves function and
  parameter names intact.

Each context, either directly or indirectly through its parent context, is
configured to use some convention. When a function is registered in the
context, its name and parameters are translated with the convention methods.
Also, regardless of the convention used, all trailing underscores are stripped
from the names. This makes it possible to define several Python functions that
differ only by trailing underscores and get the same name in YAQL (to create
several overloads of a single function). It also allows one to have function
or parameter names that would otherwise conflict with Python keywords.

An instance of a convention class can be specified as a context initializer
parameter or as a parameter of the `yaql.create_context` function. Child
contexts created with the `create_child_context` method inherit their parent's
convention.

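The name translation described in this section can be approximated with a
short, standalone sketch. Both helper functions below are hypothetical and are
not the code of the convention classes; in particular, how the real classes
treat leading underscores or other edge cases is not modelled here.

```python
import re


def to_camel_case(name):
    """Strip trailing underscores, then turn each `_x` into `X`."""
    stripped = name.rstrip("_")
    return re.sub(r"_([a-zA-Z0-9])", lambda m: m.group(1).upper(), stripped)


def keep_python_name(name):
    """Leave the name intact apart from the trailing-underscore rule."""
    return name.rstrip("_")


print(to_camel_case("my_func"))      # myFunc
print(to_camel_case("arg_name"))     # argName
print(to_camel_case("in_"))          # in
print(keep_python_name("my_func_"))  # my_func
```

The `in_` case shows why the trailing-underscore rule is useful: `in` is a
Python keyword and cannot be used directly as a function or parameter name.
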
Extending yaql
--------------

Extending yaql with new functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For a function to become available to YAQL queries, it must be present in
the provided context object. The default context implementation
(`yaql.language.contexts.Context`) has a `register_function` method to register
the function implementation.

In yaql, all functions are represented by instances of the
`yaql.language.specs.FunctionDefinition` class. FunctionDefinition describes
the complete function signature including:

* Function name
* List of parameters - instances of `yaql.language.specs.ParameterDefinition`
* Function payload (Python callable)
* Function type: function, method or extension method
* The flag to disable the keyword arguments syntax for the function
* Documentation string
* Custom function metadata (dict)

`register_function` method can accept either an instance of
the `FunctionDefinition` class or a regular Python function. In the latter
case, it constructs a `FunctionDefinition` instance from the declaration of
the function using Python introspection. Because a YAQL function signature has
much more information than the Python one, yaql provides a number of function
decorators that can be used to fill the missing properties.

The decorators are located in the `yaql.language.specs` module.
Below is the list of available function decorators:

* ``@name(function_name)``: set function name to be `function_name` rather
  than its Python name
* ``@parameter(...)`` is used to declare the type of one of the function
  parameters
* ``@inject(...)`` is used to declare a hidden function parameter
* ``@method`` declares function to be YAQL method
* ``@extension_method`` declares function to be YAQL extension method
* ``@no_kwargs`` disables the keyword arguments syntax for the function
* ``@meta(name, value)`` appends the `name` attribute with the given value to
  the function metadata dictionary

Specifying function parameter types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When yaql constructs `FunctionDefinition`, it collects all possible information
about its parameters. For each parameter, it records its name, position,
whether it is a keyword-only argument (available in Python 3), whether it is
an `*args` or `**kwargs`, and its default parameter value.

The only parameter attribute that cannot be obtained through introspection is
the parameter type. For that purpose, yaql has a ``@parameter(name, type)``
decorator that can be used to explicitly declare the parameter type.
`name` must match the name of one of the function parameters, and `type` must
be of the `yaql.language.yaqltypes.SmartType` type.

`SmartType` is the base class for all yaql type descriptors - classes that
check if the value is compatible with the desired type and can do type
conversion between compatible types.

YAQL type system slightly differs from Python's:

* Strings are not considered to be collections of characters
* Booleans are not integers
* Dictionaries are not iterable
* For most of the types one can specify if the `null` (`None`) value is
  acceptable

`yaql.language.yaqltypes` module has many useful smart-type classes. The most
generic smart-type for primitive types is the `PythonType` class, which
validates if the value is an instance of a given Python type. Due to the
mentioned differences between YAQL and Python type systems and because
Python types have a lot of nuances (several string types, differences between
Python 2 and Python 3, separation between mutable and immutable type versions:
list-tuple, set-frozenset, dict-FrozenDict, the last of which is missing in
Python and provided by yaql instead), yaql provides specialized smart-types
for most primitive types:

* `String` - str and unicode
* `Integer`
* `Number` - integer or float
* `DateTime`
* `Sequence` - fixed-size iterable collection, except for the dictionary
* `Iterable` - any iterable or generator
* `Iterator` - iterator over the iterable

And several specialized variants that enforce particular representation in the
YAQL syntax:

* `Keyword`
* `BooleanConstant`
* `NumericConstant`
* `StringConstant`

It is also possible to aggregate several smart-types so that the value can be
of any of the given types, conform to all of them, or be rejected when it
matches a given type:

* `AnyOf`
* `Chain`
* `NotOfType`

These three smart-types accept other smart-type(s) as their initializer
parameter(s).

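The type-system differences and the idea of combining checks can be
illustrated with plain predicates. Everything in the sketch below is
hypothetical: the function names are invented for the illustration, the
predicates only validate (real smart-types can also convert values), and the
combinators merely mimic the roles suggested by the `AnyOf` and `NotOfType`
names.

```python
def is_string(value):
    return isinstance(value, str)


def is_integer(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_sequence(value):
    # Strings and dictionaries are deliberately not treated as sequences.
    return isinstance(value, (list, tuple))


def any_of(*checks):
    """Accept the value if at least one check accepts it."""
    return lambda value: any(check(value) for check in checks)


def not_of_type(check):
    """Accept the value only if the wrapped check rejects it."""
    return lambda value: not check(value)


def nullable(check):
    """Additionally accept None."""
    return lambda value: value is None or check(value)


int_or_string = any_of(is_integer, is_string)
print(int_or_string(5), int_or_string("5"), int_or_string(True))
# True True False
```
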
In addition to the smart-types, the second parameter of the `@parameter` can be
a Python type. For example, ``@parameter("name", unicode)`` or
``@parameter("name", unicode, nullable=True)``. In this case the Python type
is automatically wrapped in the `PythonType` smart-type. If nullability is not
specified, yaql tries to infer it from the parameter declaration - it is
nullable only if the parameter has its default value set to `None`.

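The nullability rule from the previous paragraph can be expressed with the
standard `inspect` module. This is a sketch of the rule only; `infer_nullable`
and `greet` are hypothetical names and the code is not taken from yaql.

```python
import inspect


def infer_nullable(func, parameter_name):
    """Return True only when the parameter's default value is None."""
    parameter = inspect.signature(func).parameters[parameter_name]
    return parameter.default is None


def greet(name, title=None, punctuation="!"):
    prefix = title + " " if title else ""
    return "Hello, " + prefix + name + punctuation


print(infer_nullable(greet, "name"))         # False: no default at all
print(infer_nullable(greet, "title"))        # True: default is None
print(infer_nullable(greet, "punctuation"))  # False: default is a string
```
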
Lazy evaluated function parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All the smart-types from the previous section are for parameters that are
evaluated before the function gets invoked. But sometimes the function might
need the parameter to remain unevaluated so that it can be evaluated by the
function itself, possibly with additional parameters or in a different context.

There are two possible representations of non-evaluated arguments:

* Get it as a Python callable that the function can call to do the evaluation
* Get it as a YAQL expression (AST) that can be analyzed

The first method is available through the `Lambda` smart-type. The parameter,
which is declared as a ``Lambda()``, has an `*args/**kwargs` signature and can
be called from the function: ``parameter(arg1, arg2)``. If it was declared as
``Lambda(with_context=True)`` the function may invoke it in a context other
than the one used for the function:
``parameter(new_context, arg1, arg2)``. ``Lambda(method=True)`` specifies
that the parameter must be a method and the caller can specify the receiver
object for it: ``parameter(receiver, arg1, arg2)``. Both flags can also be
combined: ``Lambda(with_context=True, method=True)`` so the callable is
invoked as ``parameter(receiver, new_context, arg1, arg2)``. All supplied
callable arguments are automatically published to the `$1` (`$`), `$2` and
so on context variables for the context in which the callable will be executed.

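To make the callable representation more concrete, here is a dependency-free
model of a lazily evaluated argument. It is a simplification under stated
assumptions: `make_lazy` and `where` are hypothetical helpers, the context is
an ordinary dictionary, and a Python function stands in for the unevaluated
YAQL expression. Only the plain ``parameter(arg1, arg2)`` calling form is
modelled.

```python
def make_lazy(expression, context):
    """Wrap `expression` (a function of a context dict) as a callable that
    evaluates it on demand, publishing positional arguments as $1, $2, ..."""
    def call(*args):
        child = dict(context)          # leave the caller's context untouched
        for position, value in enumerate(args, start=1):
            child["$%d" % position] = value
        if args:
            child["$"] = args[0]       # `$` is shorthand for `$1`
        return expression(child)
    return call


def where(collection, predicate):
    """Keep the items for which the lazily evaluated predicate holds."""
    return [item for item in collection if predicate(item)]


# Stands in for the unevaluated expression `$ > $threshold`.
predicate = make_lazy(
    lambda ctx: ctx["$"] > ctx["$threshold"],
    {"$threshold": 2},
)
print(where([1, 2, 3, 4], predicate))   # [3, 4]
```

The predicate is evaluated once per item, each time with a different value
bound to `$`, which would not be possible if the argument had been evaluated
before `where` was invoked.
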
The second method is available through the `YaqlExpression` smart-type. It
also allows one to request the parameter to be of a particular expression type
rather than an arbitrary YAQL expression.

Auto-injected function parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Besides regular parameters, yaql also supports auto-injected (hidden)
parameters. This is also known as function parameter dependency injection.
The values of injected parameters come from the yaql runtime rather than from
the caller. Functions use injected parameters to get information on their
execution environment.

Auto-injected parameters are declared using the ``@inject(...)`` decorator,
which has exactly the same signature as `@parameter` with the only difference
being that `@inject` checks that the supplied smart-type is an instance
of the `yaql.language.yaqltypes.HiddenParameterType` class (in addition to
`SmartType`), whereas the `@parameter` decorator checks that it is not. This
difference exists to clearly distinguish explicitly passed parameters from
those that are injected by the system.

yaql has the following hidden parameter smart types:

* `Context` - injects the current function context object
* `Engine` - injects the `YaqlEngine` object that was used to parse the
  expression. The engine object may be used to access execution options or to
  parse some other expression
* `FunctionDefinition` - injects the `FunctionDefinition` object of the
  function. May be used to obtain function metadata and doc-string
* `Delegate` - injects a Python callable to some other YAQL function by its
  name. This is a convenient way to call one YAQL function from another without
  depending on its Python implementation signature and location. The syntax
  is very similar to the `Lambda` smart-type
* `Super` - similar to `Delegate` - injects a callable to an overload of itself
  from the parent context. Useful when the function overload wants to call its
  base implementation (analogous to Python's ``super()``)
* `Receiver` - injects a method receiver object if the function was called as
  a method and `None` otherwise. Can be used in an extension method to
  distinguish the case when it was invoked as a method rather than as a
  function. Do not do it without a good reason!
* `YaqlInterface` - injects a convenient wrapper (`YaqlInterface`) around yaql
  functionality, which also encapsulates many of the values above

Auto-injected parameters may appear anywhere in the function signature as they
do not affect caller syntax. Implementations can add additional hidden
parameters without breaking existing queries. However, it is important to
call YAQL function implementations through the yaql mechanisms (such as
`Delegate`), rather than to call their Python implementations directly.

Automatic parameters
~~~~~~~~~~~~~~~~~~~~

In some cases there is no need to declare the parameter at all. yaql uses
the parameter name and default value to guess the parameter type if it was not
declared.

If the parameter name is `context` or `__context` it will automatically
be treated as if it was declared as a `Context`. `engine`/`__engine` is
considered as an `Engine`, and `yaql_interface`/`__yaql_interface` is
considered as a `YaqlInterface`.

The host can override this logic by providing a callable to Context's
`register_function` method through the `parameter_type_func` parameter.
When yaql encounters an undeclared parameter, it calls this function, passing
the parameter name as an argument, and expects it to return a smart-type
for the parameter.

If the `parameter_type_func` callable returns `None`, yaql assumes that
the smart type should be `PythonType(object)`, that is, anything except for
the `None` value, unless the parameter has the default value `None`.

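The guessing rules above can be summarized in a short standalone model. The
helper names (`default_type_func`, `guess_parameter_types`, `lookup`) are
hypothetical, and smart-types are represented by descriptive strings rather
than real yaql objects, so the sketch shows the decision order only.

```python
import inspect

# Parameter names that are recognized automatically, per the rules above.
INJECTED_BY_NAME = {
    "context": "Context", "__context": "Context",
    "engine": "Engine", "__engine": "Engine",
    "yaql_interface": "YaqlInterface", "__yaql_interface": "YaqlInterface",
}


def default_type_func(name):
    """Name-based guess; returns None when the name is not special."""
    return INJECTED_BY_NAME.get(name)


def guess_parameter_types(func, parameter_type_func=default_type_func):
    """Return {parameter name: description of the guessed smart-type}."""
    guessed = {}
    for name, parameter in inspect.signature(func).parameters.items():
        smart_type = parameter_type_func(name)
        if smart_type is None:
            # Fallback: any object; None is accepted only if it is the default.
            nullable = parameter.default is None
            smart_type = "PythonType(object, nullable=%s)" % nullable
        guessed[name] = smart_type
    return guessed


def lookup(context, key, default=None):
    return default


print(guess_parameter_types(lookup))
```

Passing a different `parameter_type_func` replaces the name-based guess
entirely, which corresponds to the host overriding the default logic.
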
Function resolution rules
~~~~~~~~~~~~~~~~~~~~~~~~~

Function resolution rules are used to determine the correct overload of the
function when more than one overload is present in the context. Each time a
function with a given list of parameters is called, yaql does the following:

#. Walks through the chain of context objects and collects all the
   implementations with a given name and appropriate type (either functions
   and extension methods or methods and extension methods, depending on the
   call syntax).
#. Organizes all found overloads into layers so that overloads from the
   same context are put in the same layer whereas overloads from different
   contexts are in different layers. Overloads from contexts that are closer
   to the initial context have precedence over those which were obtained from
   the parent contexts. Also `FunctionDefinition` may have a flag that prevents
   all overload lookups in the parent contexts. If the search encounters an
   overload with such a flag, it does not go any further in the chain.
#. Scans all found overloads and excludes those that cannot be called by the
   given syntax. This can happen because the overload has more mandatory
   parameters than there are arguments in the calling expression, or because
   the calling expression passes an argument by keyword name and no such
   parameter exists.
#. Validates laziness of overload parameters. If at least one function overload
   has a lazy evaluated parameter, all other overloads must have it in the same
   position. Violation of this rule causes an exception to be thrown.
#. Evaluates all the non-lazy parameters. The result values are validated
   by appropriate smart-type instances corresponding to each parameter of
   each overload. All the overloads that are not type-compatible with the
   given arguments are excluded in each layer.
#. Takes the first non-empty layer. If no such layer exists (that is, all the
   overloads were excluded), an exception is thrown.
#. If the found layer has more than one overload, there is an ambiguity.
   In this case an exception is thrown since the right overload cannot be
   determined unambiguously.
#. Otherwise, calls the single overload with the previously evaluated
   arguments.

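The layered selection at the heart of these rules can be sketched in plain
Python. The model below is deliberately reduced: overloads are
`(parameter_types, payload)` pairs checked with `isinstance`, and keyword
arguments, lazy parameters, and the flag that stops the parent lookup are left
out. `resolve` and `ResolutionError` are hypothetical names.

```python
class ResolutionError(Exception):
    pass


def resolve(layers, args):
    """Pick an overload for `args`.

    `layers` is a list of lists of (parameter_types, payload) pairs, ordered
    from the context closest to the call site to the outermost parent.
    """
    for layer in layers:
        candidates = [
            payload
            for parameter_types, payload in layer
            if len(parameter_types) == len(args)
            and all(isinstance(arg, expected)
                    for arg, expected in zip(args, parameter_types))
        ]
        if len(candidates) > 1:
            raise ResolutionError("ambiguous call")
        if candidates:
            return candidates[0]       # the first non-empty layer wins
    raise ResolutionError("no matching overload")


child_layer = [((str,), lambda s: "child:" + s)]
parent_layer = [
    ((str,), lambda s: "parent:" + s),
    ((int,), lambda n: n + 1),
]
layers = [child_layer, parent_layer]

print(resolve(layers, ("x",))("x"))   # child:x
print(resolve(layers, (41,))(41))     # 42
```

The string overload in the child layer hides the one in the parent layer,
while the integer call falls through to the parent because the child layer has
no compatible overload.
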
Function development hints
~~~~~~~~~~~~~~~~~~~~~~~~~~

* Avoid side effects in your functions, unless you absolutely have to.
* Do not make changes to the data structures coming from the parameters or the
  context. Functions that modify the data should return the modified copy
  rather than touch the original.
* If you need to make changes to the context, create a child context and
  make them there. It is usually possible to pass the new context to other
  parts of the query.
* Strongly prefer immutable data structures over mutable ones. Use `tuple`
  rather than `list`, and `frozenset` instead of `set`. Python does not have a
  built-in immutable dictionary class so yaql provides one of its own -
  `yaql.language.utils.FrozenDict`.
* Do not call Python implementations of YAQL functions directly. yaql provides
  plenty of ways to invoke them through its own mechanisms (such as
  `Delegate`).
* Do not reuse contexts between multiple queries unless it is intentional.
  However, all of these contexts can be children of a single prepared context.
* Do not register all the custom functions for each query. It is better to
  prepare all the contexts with functions at the beginning and then use
  child contexts for each query executed.
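
The hints about avoiding mutation can be illustrated with a small pure-Python
function. `append_item` is a hypothetical example written for this note, not
part of yaql.

```python
def append_item(collection, item):
    """Return a new tuple with `item` added; the input is never modified."""
    return tuple(collection) + (item,)


original = (1, 2, 3)
extended = append_item(original, 4)
print(original, extended)   # (1, 2, 3) (1, 2, 3, 4)
```

Because the result is a new immutable tuple, other parts of the query that
still hold the original value are unaffected.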