Getting started with YAQL
=========================

Introduction to YAQL
--------------------

YAQL (Yet Another Query Language) is an embeddable and extensible query
language that allows performing complex queries against arbitrary data structures.
`Embeddable` means that you can easily integrate a YAQL query processor in your code. Queries come
from your DSLs (domain-specific languages), user input, JSON, and so on. YAQL has a
vast and comprehensive standard library of functions that can be used to query data of any complexity.
Also, YAQL can be extended even further with user-specified functions.
YAQL is written in Python and is distributed through PyPI.

YAQL was inspired by Microsoft LINQ for Objects and its first aim is to execute expressions
on the data in memory. A YAQL expression has the same role as an SQL query to databases:
to search and operate on data. In general, any SQL query can be transformed to a YAQL expression,
but YAQL can also be used for computational statements. For example, `2 + 3*4` is a valid
YAQL expression.

Moreover, in YAQL, the following operations are supported out of the box:

* Complex data queries
* Creation and transformation of lists, dicts, and arrays
* String operations
* Basic math operations
* Conditional expressions
* Date and time operations (will be supported in yaql 1.1)

An interesting thing in YAQL is that everything is a function and any function can
be customized or overridden. This is true even for built-in functions.
YAQL cannot call any function that was not explicitly registered to be accessible
by YAQL. The same is true for operators.

YAQL can be used in two different ways: as an independent CLI tool, and as a
Python module.

Installation
------------

You can install YAQL in two different ways:

#. Using PyPI:

   .. code-block:: console

      pip install yaql

#. Using your system package manager (for example, on Ubuntu):

   .. code-block:: console

      sudo apt-get install python-yaql

HowTo: Use YAQL in Python
-------------------------

You can operate with YAQL from Python in three easy steps:

* Create a YAQL engine
* Parse a YAQL expression
* Execute the parsed expression

.. NOTE::
    The engine should be created once for a set of operators and parser rules. It can
    be reused for all queries.

Here is an example of how this can be done, using a YAML file (``shop.yaml``) that looks like this:

.. code-block:: yaml

    customers_city:
      - city: New York
        customer_id: 1
      - city: Saint Louis
        customer_id: 2
      - city: Mountain View
        customer_id: 3
    customers:
      - customer_id: 1
        name: John
        orders:
          - order_id: 1
            item: Guitar
            quantity: 1
      - customer_id: 2
        name: Paul
        orders:
          - order_id: 2
            item: Banjo
            quantity: 2
          - order_id: 3
            item: Piano
            quantity: 1
      - customer_id: 3
        name: Diana
        orders:
          - order_id: 4
            item: Drums
            quantity: 1

.. code-block:: python

    import yaql
    import yaml

    # safe_load parses plain YAML data without constructing arbitrary objects
    with open('shop.yaml', 'r') as f:
        data_source = yaml.safe_load(f)

    # Step 1: create the engine (once; it can be reused for all queries)
    engine = yaql.factory.YaqlFactory().create()

    # Step 2: parse the expression
    expression = engine(
        '$.customers.orders.selectMany($.where($.order_id = 4))')

    # Step 3: execute the parsed expression against the data
    order = expression.evaluate(data=data_source)

The content of ``order`` will be the following:

.. code-block:: console

    [{u'item': u'Drums', u'order_id': 4, u'quantity': 1}]

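
To make the query above easier to read, here is a rough plain-Python
equivalent of what it computes. This is only an illustrative sketch that does
not call yaql; the trimmed ``data_source`` dictionary is an assumption that
stands in for the loaded ``shop.yaml``.

.. code-block:: python

    # Trimmed stand-in for the data loaded from shop.yaml (assumption).
    data_source = {
        'customers': [
            {'name': 'John', 'orders': [
                {'order_id': 1, 'item': 'Guitar', 'quantity': 1}]},
            {'name': 'Paul', 'orders': [
                {'order_id': 2, 'item': 'Banjo', 'quantity': 2},
                {'order_id': 3, 'item': 'Piano', 'quantity': 1}]},
            {'name': 'Diana', 'orders': [
                {'order_id': 4, 'item': 'Drums', 'quantity': 1}]},
        ]
    }

    order = [
        o
        for customer in data_source['customers']  # $.customers
        for o in customer['orders']               # .orders, flattened like selectMany
        if o['order_id'] == 4                     # where($.order_id = 4)
    ]
    print(order)
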
YAQL grammar
------------

YAQL has a very simple grammar:

* Three keywords as in JSON: true, false, null
* Numbers, such as 12 and 34.5
* Strings: `'foo'` and `"bar"`
* Access to the data: $variable, $
* Binary and unary operators: 2 + 2, -1, 1 != 2, $list[1]

Data access
~~~~~~~~~~~

Although YAQL expressions may be self-sufficient, the most important value of YAQL
is its ability to operate on user-passed data. Such data is placed into variables
which are accessible in a YAQL expression as `$<variable_name>`. The `variable_name`
can contain numbers, English alphabetic characters, and underscore symbols. The `variable_name`
can also be empty, in which case you use `$`. Variables can be set prior to executing
a YAQL expression or can be changed during the execution of some functions.

According to the convention in YAQL, function parameters, including input data,
are stored in variables like `$1`, `$2`, and so on. The `$` stands for `$1`.
In most cases, all function parameters are passed in one piece and can be accessed
using `$`, which is why this variable is the most used one in YAQL expressions.
Besides, some functions expect a YAQL expression as one of their
parameters (for example, a predicate for collection sorting). In this case,
the passed expression is granted access to the data through `$`.

Strings
~~~~~~~

In YAQL, strings can be enclosed in `"` and `'`. Both forms are equivalent and
support all standard escape symbols including unicode code-points. Having both types
of quotes is useful when you need to include one type of quotes inside the
other. In addition, the backtick character can be used to create a string in which the
only possible escape sequence is a backslash followed by a backtick.
This is especially suitable for regular expressions.

If a string does not start with a digit or `__` and contains only digits, `_`, and English letters,
it is called an identifier and can be used without quotes at all. An identifier can be used
as a name for a function, a parameter, or a property, as in `$obj.property`.

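
The identifier rule described above can be sketched as a regular expression in
Python. This is only an approximation of the rule as worded in this section,
not YAQL's actual lexer; the ``looks_like_identifier`` helper is a hypothetical
name, and the sketch ignores reserved words such as ``true`` or ``and``.

.. code-block:: python

    import re

    # Only English letters, digits and "_"; must not start with a digit
    # and must not start with "__".
    IDENTIFIER = re.compile(r'(?!__)[A-Za-z_][A-Za-z0-9_]*')


    def looks_like_identifier(text):
        """Return True if `text` satisfies the rule stated in the text."""
        return IDENTIFIER.fullmatch(text) is not None


    for candidate in ['name', 'order_id', '2fast', '__hidden', 'two words']:
        print(candidate, looks_like_identifier(candidate))
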
Functions
~~~~~~~~~

A function call has the syntax `functionName(functionParameters)`. Parentheses are required
even if there are no parameters. In YAQL, there are two types of parameters:

* Positional parameters

  ``foo(1, 2, someValue)``

* Named parameters

  ``foo(paramName1 => value1, paramName2 => 123)``

Also, a function can be called using both positional and named parameters: ``foo(1, false, param => null)``.
In this case, named arguments must be written after positional arguments. In
``name => value``, `name` must be a valid identifier and must match the name of
the parameter in the function definition. Usually, arguments can be passed in both ways,
but YAQL also supports named-only parameters, because Python 3 supports them.

Parameters can have default values. Named parameters are a good way to pass only the needed
parameters and skip arguments that can use their default values. You can also simply
skip parameters in a function call: ``foo(1,,3)``.

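
Since the text relates YAQL's named-only parameters to Python 3, the Python
side of that analogy can be sketched as follows. The function ``foo`` and its
parameter names are hypothetical; note that Python writes ``name=value`` where
YAQL writes ``name => value``, and Python has no counterpart of ``foo(1,,3)``.

.. code-block:: python

    def foo(first, second=2, *, param=None):
        """Hypothetical function: `param` can only be passed by name."""
        return (first, second, param)


    print(foo(1, 5))          # positional arguments only
    print(foo(1, param='x'))  # `second` is skipped and keeps its default

    try:
        foo(1, 2, 3)          # `param` cannot be passed positionally
    except TypeError as error:
        print('rejected:', error)
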
In YAQL, there are three types of functions:

* Regular functions: ``max(1,2)``
* Method-like functions, which are called by specifying an object for which the
  function is called, followed by a dot and a function call: ``stringValue.toUpper()``
* Extension methods, which can be called both ways: ``len(string)``, ``string.len()``

The YAQL standard library contains hundreds of functions which belong to one of these types.
Moreover, applications can add new functions and override functions from the standard library.

Operators
~~~~~~~~~

YAQL supports the following types of operators out of the box:

* Arithmetic: `+`, `-`, `*`, `/`, `mod`
* Comparison and logical: `=`, `!=`, `>=`, `<=`, `and`, `or`, `not`
* Regexp operations: `=~`, `!~`
* Method call, call to the attribute: `.`, `?.`
* Context pass: `->`
* Indexing: `[ ]`
* Membership test operations: `in`

Data structures
~~~~~~~~~~~~~~~

YAQL supports these types out of the box:

* Scalars

  YAQL supports such types as string, int, and boolean. Datetime and timespan
  will be available after the yaql 1.1 release.

* Lists

  List creation: ``[1, 2, value, true]``

  Alternative syntax: ``list(1, 2, value, true)``

  List elements can be accessed by index: ``$list[0]``

* Dictionaries

  Dict creation: ``{key1 => value1, true => 1, 0 => false}``

  Alternative syntax: ``dict(key1 => value1, true => 1, 0 => false)``

  Dictionaries can be indexed by keys: ``$dict[key]``. An exception will be raised
  if the key is missing in the dictionary. Also, you can specify a value which will
  be returned if the key is not in the dictionary: ``dict.get(key, default)``.

  .. NOTE::
      During iteration through the dictionary, `key` can be accessed as ``$.key``

* (Optional) Sets

  Set creation: ``set(1, 2, value, true)``

.. NOTE::
    YAQL is designed to keep input data unchanged. All the functions that
    look as if they change data actually return an updated copy and keep the original
    data unchanged. This is one reason why YAQL is thread-safe.

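
The "return an updated copy" behavior described in the note can be illustrated
with a small plain-Python sketch. It shows only the general idea and is not
how YAQL is implemented; the ``append_copy`` helper is a hypothetical name.

.. code-block:: python

    def append_copy(items, value):
        """Return a new list with `value` added; `items` is left untouched."""
        return items + [value]


    original = [1, 2, 3]
    updated = append_copy(original, 4)

    print(original)  # [1, 2, 3]
    print(updated)   # [1, 2, 3, 4]

Because the input is never modified, the same data can be shared between
several readers without locking, which is the property the note refers to.
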
Basic YAQL query operations
---------------------------

YAQL can naturally be compared with SQL, as both are designed to solve
similar tasks. Here we will take a look at the YAQL functions which have a direct
equivalent in SQL.

We will use the YAML from `HowTo: Use YAQL in Python`_ as the data source in our examples.

Filtering
~~~~~~~~~

.. NOTE::

    The SQL analog is WHERE

The most common query against a data set is filtering. This is a type of
query which returns only the elements for which the filtering condition is true. In YAQL,
we use ``where`` to apply filtering queries.

.. code-block:: console

    yaql> $.customers.where($.name = John)

.. code-block:: yaml

    - customer_id: 1
      name: John
      orders:
        - order_id: 1
          item: Guitar
          quantity: 1

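
For readers more familiar with Python, the ``where`` query above corresponds
roughly to a list comprehension. The following is a plain-Python sketch that
does not call yaql; the trimmed ``customers`` list is an assumption standing
in for the example data.

.. code-block:: python

    # Trimmed stand-in for $.customers (assumption).
    customers = [
        {'customer_id': 1, 'name': 'John'},
        {'customer_id': 2, 'name': 'Paul'},
        {'customer_id': 3, 'name': 'Diana'},
    ]

    # Keep only the elements for which the condition is true.
    johns = [c for c in customers if c['name'] == 'John']
    print(johns)
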
Ordering
~~~~~~~~

.. NOTE::

    The SQL analog is ORDER BY

It may be required to sort the data returned by some YAQL query. The ``orderBy`` clause will cause
the elements in the returned sequence to be sorted according to the default comparer
for the type being sorted. For example, the following query sorts
the customers by the `name` property.

.. code-block:: console

    yaql> $.customers.orderBy($.name)

.. code-block:: yaml

    - customer_id: 3
      name: Diana
      orders:
        - order_id: 4
          item: Drums
          quantity: 1
    - customer_id: 1
      name: John
      orders:
        - order_id: 1
          item: Guitar
          quantity: 1
    - customer_id: 2
      name: Paul
      orders:
        - order_id: 2
          item: Banjo
          quantity: 2
        - order_id: 3
          item: Piano
          quantity: 1

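
In plain Python, the ``orderBy`` query above corresponds roughly to ``sorted``
with a key function. This sketch does not call yaql, and the trimmed
``customers`` list is an assumption standing in for the example data.

.. code-block:: python

    # Trimmed stand-in for $.customers (assumption).
    customers = [
        {'customer_id': 1, 'name': 'John'},
        {'customer_id': 2, 'name': 'Paul'},
        {'customer_id': 3, 'name': 'Diana'},
    ]

    # sorted() returns a new list and leaves `customers` unchanged.
    by_name = sorted(customers, key=lambda c: c['name'])
    print([c['name'] for c in by_name])
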
Grouping
~~~~~~~~

.. NOTE::

    The SQL analog is GROUP BY

The ``groupBy`` clause allows you to group the results according to the key you specified.
For example, the following query groups the example data by `name`.

.. code-block:: console

    yaql> $.customers.groupBy($.name)

.. code-block:: yaml

    - Diana:
      - customer_id: 3
        name: Diana
        orders:
          - order_id: 4
            item: Drums
            quantity: 1
    - Paul:
      - customer_id: 2
        name: Paul
        orders:
          - order_id: 2
            item: Banjo
            quantity: 2
          - order_id: 3
            item: Piano
            quantity: 1
    - John:
      - customer_id: 1
        name: John
        orders:
          - order_id: 1
            item: Guitar
            quantity: 1

Here you can see the difference between ``groupBy`` and ``orderBy``. We use
the same parameter `name` for both operations, but in the output of ``groupBy``
the `name` value additionally appears as the key that precedes each group.

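
A rough plain-Python analog of grouping is to collect elements into a
dictionary keyed by the grouping value. The sketch below does not call yaql,
uses a trimmed ``customers`` list as an assumption, and the exact shape of
the value that yaql returns may differ from this dictionary.

.. code-block:: python

    # Trimmed stand-in for $.customers (assumption).
    customers = [
        {'customer_id': 1, 'name': 'John'},
        {'customer_id': 2, 'name': 'Paul'},
        {'customer_id': 3, 'name': 'Diana'},
    ]

    groups = {}
    for customer in customers:
        # The grouping key comes first; matching elements are listed under it.
        groups.setdefault(customer['name'], []).append(customer)

    print(groups)
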
Selecting
~~~~~~~~~

.. NOTE::

    The SQL analog is SELECT

The ``select`` method allows building new objects out of objects of some collection.
In the following example, the result will contain a list of name/orders pairs.

.. code-block:: console

    yaql> $.customers.select([$.name, $.orders])

.. code-block:: yaml

    - John:
      - order_id: 1
        item: Guitar
        quantity: 1
    - Paul:
      - order_id: 2
        item: Banjo
        quantity: 2
      - order_id: 3
        item: Piano
        quantity: 1
    - Diana:
      - order_id: 4
        item: Drums
        quantity: 1

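
The projection performed by ``select`` can be sketched in plain Python as a
list comprehension that builds one name/orders pair per customer. This does
not call yaql; the trimmed ``customers`` list is an assumption standing in
for the example data.

.. code-block:: python

    # Trimmed stand-in for $.customers (assumption).
    customers = [
        {'name': 'John', 'orders': [{'order_id': 1, 'item': 'Guitar'}]},
        {'name': 'Paul', 'orders': [{'order_id': 2, 'item': 'Banjo'},
                                    {'order_id': 3, 'item': 'Piano'}]},
        {'name': 'Diana', 'orders': [{'order_id': 4, 'item': 'Drums'}]},
    ]

    # Build a new [name, orders] pair out of every customer.
    pairs = [[c['name'], c['orders']] for c in customers]
    print(pairs)
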
Joining
~~~~~~~

.. NOTE::

    The SQL analog is JOIN

The ``join`` method creates a new collection by joining two other collections on
some condition.

.. code-block:: console

    yaql> $.customers.join($.customers_city, $1.customer_id = $2.customer_id, {customer=>$1.name, city=>$2.city, orders=>$1.orders})

.. code-block:: yaml

    - customer: John
      city: New York
      orders:
        - order_id: 1
          item: Guitar
          quantity: 1
    - customer: Paul
      city: Saint Louis
      orders:
        - order_id: 2
          item: Banjo
          quantity: 2
        - order_id: 3
          item: Piano
          quantity: 1
    - customer: Diana
      city: Mountain View
      orders:
        - order_id: 4
          item: Drums
          quantity: 1

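
The three arguments of the ``join`` call above (the second collection, the
condition, and the expression that builds each result) map onto a nested
comprehension in plain Python. This sketch does not call yaql; the trimmed
lists are assumptions standing in for the example data.

.. code-block:: python

    # Trimmed stand-ins for $.customers and $.customers_city (assumptions).
    customers = [
        {'customer_id': 1, 'name': 'John', 'orders': [{'order_id': 1}]},
        {'customer_id': 2, 'name': 'Paul', 'orders': [{'order_id': 2},
                                                      {'order_id': 3}]},
        {'customer_id': 3, 'name': 'Diana', 'orders': [{'order_id': 4}]},
    ]
    customers_city = [
        {'city': 'New York', 'customer_id': 1},
        {'city': 'Saint Louis', 'customer_id': 2},
        {'city': 'Mountain View', 'customer_id': 3},
    ]

    joined = [
        # Result builder: {customer=>$1.name, city=>$2.city, orders=>$1.orders}
        {'customer': c['name'], 'city': cc['city'], 'orders': c['orders']}
        for c in customers                         # $1
        for cc in customers_city                   # $2
        if c['customer_id'] == cc['customer_id']   # join condition
    ]
    print(joined)
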
Take an element from collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

YAQL supports two general methods that help you take elements from a collection:
``skip`` and ``take``.

.. code-block:: console

    yaql> $.customers.skip(1).take(2)

.. code-block:: yaml

    - customer_id: 2
      name: Paul
      orders:
        - order_id: 2
          item: Banjo
          quantity: 2
        - order_id: 3
          item: Piano
          quantity: 1
    - customer_id: 3
      name: Diana
      orders:
        - order_id: 4
          item: Drums
          quantity: 1

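
In plain Python, skipping and taking elements corresponds roughly to
``itertools.islice``. The sketch below does not call yaql; the trimmed
``customers`` list is an assumption standing in for the example data.

.. code-block:: python

    from itertools import islice

    # Trimmed stand-in for $.customers (assumption).
    customers = [
        {'customer_id': 1, 'name': 'John'},
        {'customer_id': 2, 'name': 'Paul'},
        {'customer_id': 3, 'name': 'Diana'},
    ]

    skip, take = 1, 2
    # Skip the first `skip` elements, then take at most `take` elements.
    page = list(islice(customers, skip, skip + take))
    print([c['name'] for c in page])
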
First element of collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``first`` method returns the first element of a collection. The result is
the element itself rather than a one-element list.

.. code-block:: console

    yaql> $.customers.first()

.. code-block:: yaml

    customer_id: 1
    name: John
    orders:
      - order_id: 1
        item: Guitar
        quantity: 1

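
A plain-Python counterpart of taking the first element, which works for any
iterable and not only for lists, is ``next(iter(...))``. This sketch does not
call yaql; the trimmed ``customers`` list is an assumption standing in for the
example data.

.. code-block:: python

    # Trimmed stand-in for $.customers (assumption).
    customers = [
        {'customer_id': 1, 'name': 'John'},
        {'customer_id': 2, 'name': 'Paul'},
        {'customer_id': 3, 'name': 'Diana'},
    ]

    # The result is the element itself, not a one-element list.
    first_customer = next(iter(customers))
    print(first_customer)
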