yaql/doc/source/language_reference.rst
Stan Lagun 7790de30c9 YAQL syntax documentation
Change-Id: I411ecffd0ab4f26041c626fcce2351983b9507e0
2016-12-06 21:17:27 -08:00

403 lines
17 KiB
ReStructuredText

Language reference
==================
YAQL is a single expression language and as such does not have any block
constructs, line formatting, end of statement marks or comments. The expression
can be of any length. All whitespace characters (including newline) that are
not enclosed in quote marks are stripped. Thus, the expressions may span
multiple lines.
Expressions consist of:
* Literals
* Keywords
* Variable access
* Function calls
* Binary and unary operators
* List expressions
* Dictionary expressions
* Index expressions
* Delegate expressions
Terminology
~~~~~~~~~~~
* `YAQL` - the name of the language - acronym for `Yet Another Query Language`
* `yaql` - Python implementation of the YAQL language (this package)
* `expression` - a YAQL query that takes context as an input and produces
result value
* `context` - an object that (directly or indirectly) holds all the data
available to expression and all the function implementations accessible to
expression
* `host` - the application that hosts the yaql interpreter. The host uses yaql
to evaluate expressions, provides initial data, and decides which functions
are going to be available to the expression. The host has ultimate power
to customize yaql - provide additional functions, operators, decide not to
use standard library or use only parts of it, override function and operator
behavior
* `variable` - any data item that is available through the context
* `function` - a Python callable that is exposed to the YAQL expression and
can be called either explicitly or implicitly
* `delegate` - a Python callable that is available as a context variable (in
expression data rather than registered in context)
* `operator` - a form of implicit function on one (unary operator) or two
(binary operator) operands
* `alphanumeric` - consists of latin letters and digits (`A-Z`, `a-z`, `0-9`)
Literals
~~~~~~~~
Literals refer to fixed values in expressions. YAQL has the following literals:
* Integer literals: ``123``
* Floating point literals: ``1.23``, ``1.0``
* Boolean and null literals represented by `keywords` (see below)
* String literals enclosed in either single (') or double (") quotes:
``"abc"``, ``'def'``. The backslash (\) character is used to escape
characters that otherwise have a special meaning, such as newline, backslash
itself, or the quote character
* Verbatim strings enclosed in back quote characters, for example ```abc```,
are used to suppress escape sequences. This is equivalent to ``r'strings'``
in Python and is especially useful for regular expressions
Keywords
~~~~~~~~
Keyword is a sequence of characters that conforms to the following criteria:
* Consists of non-zero alphanumeric characters and an underscore (`_`)
* Doesn't start with a digit
* Doesn't start with two underscore characters (`__`)
* Is not enclosed in quote marks of any type
YAQL has only three predefined keywords: `true`, `false`, and `null` that have
the value of similar JSON keywords.
There are also four keyword operators: `and`, `or`, `not`, `in`. However, this
list is not fixed. The yaql host may decide to have additional keyword
operators or not to have any of the four aforementioned keywords.
All other keywords have the value of their string representation. Thus, except
for the predefined keywords and operators they can be considered as string
literals and can be used anywhere where string is expected. However the
opposite is not true. That is, keywords can be used as string literals but
string literals cannot be used where a token is expected.
Examples:
* ``John + Snow`` - the same as ``"John" + "Snow"``
* ``true + love`` - syntactically valid, but cannot be evaluated because
there is no plus operator that accepts boolean and string (unless you define
one)
* ``not true`` - evaluates to `false`, `not` is an operator
* ``"foo"()`` - invalid expression because the function name must be a token
* ``John Snow`` - invalid expression - two tokens with no operator between
them
Variable access
~~~~~~~~~~~~~~~
Each YAQL expression is a function that takes inputs (arguments) and produces
the result value (usually by doing some computations on those inputs).
Expressions get the input through a `context` - an object that holds all the
data and a list of functions, available for expression.
Besides the argument values, expressions may populate additional data items
to the context. All these data are collectively known as a `variables` and
available to all parts of an expression (unless overwritten with another
value).
The syntax for accessing variable values is ``$variableName`` where
`variableName` is the name of the variable. Variable names may consist of
alphanumeric and underscore characters only. Unlike tokens, variable names
may start with digit, any number of underscores and even be an empty string.
By convention, the first (usually the single) function parameter is accessible
through ``$`` expression (i.e. empty string variable name) which is an alias
for ``$1``. The usual case is to pass the main expression data in a single
structure (document) and access it through the ``$`` variable.
If the variable with given name is not provided, it is assumed to be `null`.
There is no built-in syntax to check if a variable exists to distinguish cases
where it does not and when it is just set to null. However in the future such a
function might be added to yaql standard library.
When the yaql parser encounters the ``$variable`` expression, it automatically
translates it to the ``#get_context_data("$variable")`` function call.
By default, the `#get_context_data` function returns a variable value from the
current context. However the yaql host may decide to override it and provide
another behavior. For example, the host may try to look up the value in an
external data source (database) or throw an exception due to a missing
variable.
Function calls
~~~~~~~~~~~~~~
The power of YAQL comes from the fact that almost everything in YAQL is a
function call (explicit or implicit) and any function may be overridden
by the host. In YAQL there are two types of functions:
* `explicit function` - those that can be called from expressions
* `implicit (system) functions` - functions with predefined names that get
called upon some operations. For example, ``2 + 3`` is translated to
``#operator_+(2, 3)``. In this case, `#operator_+` is the name of the
implicit function. However, because ``#operator_+(2, 3)`` is not a valid YAQL
expression (because of `#`), implicit functions cannot be called explicitly
but still can be redefined by the host.
The syntax for explicit function is:
.. productionlist::
call: funcName "(" [parameters] ")"
funcName: token
parameters: positionalParameters |
: keywordParameters |
: positionalParameters "," keywordParameters
positionalParameters: parameter ("," parameter)*
parameter: expression | empty-string
keywordParameters: keywordParameter ("," keywordParameter)
keywordParameter: parameterName "=>" expression
parameterName: token
In simple words:
* The function name must be a token.
* Parameters may be positional, keyword or both. But keyword parameters
may not come before positional.
* Positional parameters can be skipped if they have a default value, for
example, ``foo(1,,3)``.
* Keyword arguments must have a token name that must match the parameter name
in the function declaration. Therefore, you must know the function signature
for the right name.
Examples:
* ``foo(2 + 3)``
* ``bar(hello, world)``
* ``baz(a,b, kwparam1 => c, kwparam2 => d)``
Functions have ultimate control over how they can be called. In particular:
* Each parameter may (and usually does) have an associated type check. That is,
the function may specify that the expected parameter type and if it can be
null.
* Usually, any parameters can be passed either by positional or keyword syntax.
However, function declaration may force one particular way and make it
positional-only or keyword-only.
* A function may have a variable number of positional (aka `*args`) and/or
keyword (aka `**kwarg`) arguments.
* In most languages, function arguments are evaluated prior to function
invocation. This is not always true in YAQL. In YAQL, a function may declare
a lazy argument. In this case, it is not evaluated and the function
implementation receives a passed value as a callable or even as an AST,
depending on how the parameter was declared. Thus in YAQL there is no special
syntax for lambdas. ``foo($ + 1)`` may mean either "call `foo` with value of
``$ + 1``" or "call `foo` with expression ``$ + 1`` as a parameter". In the
latter case it corresponds to ``foo(lambda *args, **kwargs: args[0] + 1)`` in
Python. Actual argument interpretation depends on the parameter declaration.
* Function may decide to disable keyword argument syntax altogether. For such
functions, the ``name => expr`` expression will be interpreted as a
positional parameter ``yaql.language.utils.MappingRule(name, expr)`` and
the left side of `=>` can be any expression and not just a keyword. This
allows for functions like ``switch($ > 0 => 1, $ < 0 => -1, $ = 0 => 0)``.
Additionally, there are three subtypes of explicit functions. Suppose that
there is a declared function ``foo(string, int)``. By default, the syntax to
call it will be ``foo(something, 123)``. But it can be declared as a `method`.
In this case, the syntax is going to be ``something.foo(123)``. Because of the
type checking, ``something.foo(123)`` will work since `something` is a
string, but not the ``123.foo(456)``. Thus `foo` becomes a method of a string
type.
A function may also be declared as being an extension method. If foo were to be
declared as an extension method it could be called both as a function
(``foo(string, int)``) and as a method (``something.foo(123)``).
YAQL makes use of a full function signature to determine which function
implementation needs to be executed. This allows several overloads of the same
function as long as they differ by parameter count or parameter type,
or anything else that allows unambiguous identification of the right overload
from the function call expression. For example, ``something.foo(123)`` may
be resolved to a completely different implementation of `foo` from that in
``foo(something, 123)`` if there are two functions with the name `foo` present
in the context, but one of them was declared as a function while the other as
a method. If several overloads are equally suitable for the call expression,
an `AmbiguousFunctionException` or `AmbiguousMethodException` exception gets
raised.
Operators
~~~~~~~~~
YAQL has both binary and unary operators, like most other languages do.
Parentheses and `=>` sequence are not considered as operators and handled
internally by the yaql parser. However, it is possible to configure yaql to use
sequence other than `=>` for that purpose.
The list of available operators is not fixed and can be modified by the host.
The following operators are available by default:
Binary operators:
+--------------------------+---------------------------------+
| Group | Operators |
+==========================+=================================+
| math operators | `+`, `-`, `*`, `/`, `mod` |
+--------------------------+---------------------------------+
| comparision operators | `>`, `<`, `>=`, `<=`, `=`, `!=` |
+--------------------------+---------------------------------+
| logical operators | `and`, `or` |
+--------------------------+---------------------------------+
| method/member access | `.`, `?.` |
+--------------------------+---------------------------------+
| regex operators | `=~`, `!~` |
+--------------------------+---------------------------------+
| membership operator | `in` |
+--------------------------+---------------------------------+
| context passing operator | `->` |
+--------------------------+---------------------------------+
Unary operators:
+--------------------------+---------------------------------+
| Group | Operators |
+==========================+=================================+
| math operators | `+`, `-` |
+--------------------------+---------------------------------+
| logical operators | `not` |
+--------------------------+---------------------------------+
YAQL supports for both prefix and suffix unary operators. However, only the
prefix operators are provided by default.
In YAQL there are no built-in operators. The parser is given a list of all
possible operator names (symbols), their associativity, precedence, and type,
but it knows nothing about what operators are applicable for what operands.
Each time a parser recognizes the ``X OP Y`` construct and `OP` is a known
binary operator name, it translates the expression to ``#operator_OP(X, Y)``.
Thus. ``2 + 3`` becomes ``#operator_+(2, 3)`` where `#operator_+` is an
implicit function with several implementations including the one for number
addition and defined in standard library. The host may override it and even
completely disable it. For unary operators, ``OP X`` (or ``X OP`` for suffix
unary operators) becomes ``#unary_operator_OP(X)``.
Upon yaql parser initialization, an operator might be given an alias name.
In such cases, ``X OP Y`` is translated to ``*ALIAS(X, Y)`` and ``OP X`` to
``*ALIAS(X)``. This decouples the operator implementation from the operator
symbol. For example, the `=` operator has the `equal` alias. The host may
configure yaql to have the `==` operator instead of `=` keeping the same alias
so that operator implementation and all its consumers work equally well for the
new operator symbol. In default configuration only `=` and `!=` operators have
alias names.
For information on default operators, see the YAQL standard library reference.
List expressions
~~~~~~~~~~~~~~~~
List expressions have the following form:
.. productionlist::
listExpression: "[" [expressions] "]"
expressions: expression ("," expression)*
When a yaql parser encounters an expression of the form ``[A, B, C]``, it
translates it into ``#list(A, B, C)`` (for arbitrary number of arguments).
Default `#list` function implementation in standard library produces a list
(tuple) comprised of given elements. However, the host might decide to give it
a different implementation.
Map expressions
~~~~~~~~~~~~~~~
Map expressions have the following form:
.. productionlist::
mapExpression: "{" [mappings] "}"
mappings: mapping ("," mapping)*
mapping: expression "=>" expression
When a yaql parser encounters an expression of the form ``{A => X, B => Y}``,
it translates it into ``#map(A => X, B => Y)``.
The default `#map` implementation disables the keyword arguments syntax and
thus receives a variable length list of mappings, which allows dictionary
keys to be expressions rather than a keyword. It returns a (frozen) dictionary
that itself can be used as a key in another map expression. For example,
``{{a => b} => {[2 + 2, 2 * 2] => 4}}`` is a valid YAQL expression though
yaql REPL utility will fail to display its output due to the fact that it is
not JSON-compatible.
Index expressions
~~~~~~~~~~~~~~~~~
Index expressions have the following form:
.. productionlist::
indexExpression: expression listExpression
Examples:
* ``[1, 2, 3][0]``
* ``$arr[$index + 1]``
* ``{foo => 1, bar => 2}[foo]``
When a yaql parser encounters such an expression, it translates it into
``#indexer(expression, index)``.
The standard library provides a number of `#indexer` implementations for
different types.
The right side of the index expression is a list expression. Therefore, an
expression like ``$foo[1, x, null]`` is also a valid YAQL expression and will
be translated to ``#indexer($foo, 1, x, null)``. However, any attempt to
evaluate such expression will result in `NoMatchingFunctionException` exception
because there is no `#indexer` implementation that accepts such arguments
(unless the host defines one).
Delegate expressions
~~~~~~~~~~~~~~~~~~~~
Delegate expressions is an optional language feature that is disabled by
default. It makes possible to pass delegates (callables) as part of the context
data and invoke them from the expression. It has the same syntax as explicit
function calls with the only difference being that instead of function name
(keyword) there is a non-keyword expression that must produce the delegate.
Examples:
* ``$foo(1, arg => 2)`` - call delegate returned by ``$foo`` with parameters
``(1, arg => 2)``
* ``[$foo, $bar][0](x)`` - the same as ``$foo(x)``
* ``foo()()`` - can be written as ``(foo())()`` - ``foo()`` must return a
delegate
Delegate expressions are translated into ``#call(callable, arguments)``.
Thus ``$foo(1, 2)`` becomes ``#call($foo, 1, 2)``.
The default implementation of ``#call`` invokes the result of the evaluation
of its first arguments with the given arguments.