yaql/doc/source/getting_started.rst
Omar Shykhkerimov d01a78cc7b Fix typo in YAML example
Change-Id: Ic03d341e6e081a96d790915087273e5b9f1af90b
2016-05-24 16:25:06 +03:00

481 lines
13 KiB
ReStructuredText

Getting started with YAQL
=========================
Introduction to YAQL
--------------------
YAQL (Yet Another Query Language) is an embeddable and extensible query
language that allows performing complex queries against arbitrary data structures.
`Embeddable` means that you can easily integrate a YAQL query processor in your code. Queries come
from your DSLs (domain specific language), user input, JSON, and so on. YAQL has a
vast and comprehensive standard library of functions that can be used to query data of any complexity.
Also, YAQL can be extended even further with user-specified functions.
YAQL is written in Python and is distributed through PyPI.
YAQL was inspired by Microsoft LINQ for Objects and its first aim is to execute expressions
on the data in memory. A YAQL expression has the same role as an SQL query to databases:
search and operate the data. In general, any SQL query can be transformed to a YAQL expression,
but YAQL can also be used for computational statements. For example, `2 + 3*4` is a valid
YAQL expression.
Moreover, in YAQL, the following operations are supported out of the box:
* Complex data queries
* Creation and transformation of lists, dicts, and arrays
* String operations
* Basic math operations
* Conditional expression
* Date and time operations (will be supported in yaql 1.1)
An interesting thing in YAQL is that everything is a function and any function can
be customized or overridden. This is true even for built-in functions.
YAQL cannot call any function that was not explicitly registered to be accessible
by YAQL. The same is true for operators.
YAQL can be used in two different ways: as an independent CLI tool, and as a
Python module.
Installation
------------
You can install YAQL in two different ways:
#. Using PyPi:
.. code-block:: console
pip install yaql
#. Using your system package manager (for example Ubuntu):
.. code-block:: console
sudo apt-get install python-yaql
HowTo: Use YAQL in Python
-------------------------
You can operate with YAQL from Python in three easy steps:
* Create a YAQL engine
* Parse a YAQL expression
* Execute the parsed expression
.. NOTE::
The engine should be created once for a set of operators and parser rules. It can
be reused for all queries.
Here is an example how it can be done with the YAML file which looks like:
.. code-block:: yaml
customers_city:
- city: New York
customer_id: 1
- city: Saint Louis
customer_id: 2
- city: Mountain View
customer_id: 3
customers:
- customer_id: 1
name: John
orders:
- order_id: 1
item: Guitar
quantity: 1
- customer_id: 2
name: Paul
orders:
- order_id: 2
item: Banjo
quantity: 2
- order_id: 3
item: Piano
quantity: 1
- customer_id: 3
name: Diana
orders:
- order_id: 4
item: Drums
quantity: 1
.. code-block:: python
import yaql
import yaml
data_source = yaml.load(open('shop.yaml', 'r'))
engine = yaql.factory.YaqlFactory().create()
expression = engine(
'$.customers.orders.selectMany($.where($.order_id = 4))')
order = expression.evaluate(data=data_source)
Content of the ``order`` will be the following:
.. code-block:: console
[{u'item': u'Drums', u'order_id': 4, u'quantity': 1}]
YAQL grammar
------------
YAQL has a very simple grammar:
* Three keywords as in JSON: true, false, null
* Numbers, such as 12 and 34.5
* Strings: `'foo'` and `"bar"`
* Access to the data: $variable, $
* Binary and unary operators: 2 + 2, -1, 1 != 2, $list[1]
Data access
~~~~~~~~~~~
Although YAQL expressions may be self-sufficient, the most important value of YAQL
is its ability to operate on user-passed data. Such data is placed into variables
which are accessible in a YAQL expression as `$<variable_name>`. The `variable_name`
can contain numbers, English alphabetic characters, and underscore symbols. The `variable_name`
can be empty, in this case you will use `$`. Variables can be set prior to executing
a YAQL expression or can be changed during the execution of some functions.
According to the convention in YAQL, function parameters, including input data,
are stored in variables like `$1`, `$2`, and so on. The `$` stands for `$1`.
For most cases, all function parameters are passed in one piece and can be accessed
using `$`, that is why this variable is the most used one in YAQL expressions.
Besides, some functions are expected to get a YAQL expression as one of the
parameters (for example, a predicate for collection sorting). In this case,
passed expression is granted access to the data by `$`.
Strings
~~~~~~~
In YAQL, strings can be enclosed in `"` and `'`. Both types are absolutely equal and
support all standard escape symbols including unicode code-points. In YAQL, both types
of quotes are useful when you need to include one type of quotes into the
other. In addition, ` is used to create a string where only one escape symbol \` is possible.
This is especially suitable for regexp expressions.
If a string does not start with a digit or `__` and contains only digits, `_`, and English letters,
it is called identifier and can be used without quotes at all. An identifier can be used
as a name for function, parameter or property in `$obj.property` case.
Functions
~~~~~~~~~
A function call has syntax of `functionName(functionParameters)`. Brackets are necessary
even if there are no parameters. In YAQL, there are two types of parameters:
* Positional parameters
``foo(1, 2, someValue)``
* Named parameters
``foo(paramName1 => value1, paramName2 => 123)``
Also, a function can be called using both positional and named parameters: ``foo(1, false, param => null)``.
In this case, named arguments must be written after positional arguments. In
``name => value``, `name` must be a valid identifier and must match the name of
parameter in function definition. Usually, arguments can be passed in both ways,
but named-only parameters are supported in YAQL since Python 3 supports them.
Parameters can have default values. Named parameters is a good way to pass only needed
parameters and skip arguments which can be use default values, also you can simply
skip parameters in function call: ``foo(1,,3)``.
In YAQL, there are three types of functions:
* Regular functions: ``max(1,2)``
* Method-like functions, which are called by specifying an object for which the
function is called, followed by a dot and a function call: ``stringValue.toUpper()``
* Extension methods, which can be called both ways: ``len(string)``, ``string.len()``
YAQL standard library contains hundreds of functions which belong to one of these types.
Moreover, applications can add new functions and override functions from the standard library.
Operators
~~~~~~~~~
YAQL supports the following types of operators out of the box:
* Arithmetic: `+`. `-`, `*`, `/`, `mod`
* Logical: `=`, `!=`, `>=`, `<=`, `and`, `or`, `not`
* Regexp operations: `=~`, `!~`
* Method call, call to the attribute: `.`, `?.`
* Context pass: `->`
* Indexing: `[ ]`
* Membership test operations: `in`
Data structures
~~~~~~~~~~~~~~~
YAQL supports these types out of the box:
* Scalars
YAQL supports such types as string, int. boolean. Datetime and timespan
will be available after yaql 1.1 release.
* Lists
List creation: ``[1, 2, value, true]``
Alternative syntax: ``list(1, 2, value, true)``
List elemenets can be accesessed by index: ``$list[0]``
* Dictionaries
Dict creation: ``{key1 => value1, true => 1, 0 => false}``
Alternative syntax: ``dict(key1 => value1, true => 1, 0 => false)``
Dictionaries can be indexed by keys: ``$dict[key]``. Exception will be raised
if the key is missing in the dictionary. Also, you can specify value which will
be returned if the key is not in the dictionary: ``dict.get(key, default)``.
.. NOTE::
During iteration through the dictionary, `key` can be called like: ``$.key``
* (Optional) Sets
Set creation: ``set(1, 2, value, true)``
.. NOTE::
YAQL is designed to keep input data unchanged. All the functions that
look as if they change data, actually return an updated copy and keep the original
data unchanged. This is one reason why YAQL is thread-safe.
Basic YAQL query operations
---------------------------
It is obvious that we can compare YAQL with SQL as they both are designed to solve
similar tasks. Here we will take a look at the YAQL functions which have a direct
equivalent with SQL.
We will use YAML from `HowTo: use YAQL in Python`_ as a data source in our examples.
Filtering
~~~~~~~~~
.. NOTE::
Analog is SQL WHERE
The most common query to the data sets is filtering. This is a type of
query which will return only elements for which the filtering query is true. In YAQL,
we use ``where`` to apply filtering queries.
.. code-block:: console
yaql> $.customers.where($.name = John)
.. code-block:: yaml
- customer_id: 1
name: John
orders:
- order_id: 1
item: Guitar
quantity: 1
Ordering
~~~~~~~~
.. NOTE::
Analog is SQL ORDER BY
It may be required to sort the data returned by some YAQL query. The ``orderBy`` clause will cause
the elements in the returned sequence to be sorted according to the default comparer
for the type being sorted. For example, the following query can be extended to sort
the results based on the profession property.
.. code-block:: console
yaql> $.customers.orderBy($.name)
.. code-block:: yaml
- customer_id: 3
name: Diana
orders:
- order_id: 4
item: Drums
quantity: 1
- customer_id: 1
name: John
orders:
- order_id: 1
item: Guitar
quantity: 1
- customer_id: 2
name: Paul
orders:
- order_id: 2
item: Banjo
quantity: 2
- order_id: 3
item: Piano
quantity: 1
Grouping
~~~~~~~~
.. NOTE::
Analog is SQL GROUP BY
The ``groupBy`` clause allows you to group the results according to the key you specified.
Thus, it is possible to group example json by gender.
.. code-block:: console
yaql> $.customers.groupBy($.name)
.. code-block:: yaml
- Diana:
- customer_id: 3
name: Diana
orders:
- order_id: 4
item: Drums
quantity: 1
- Paul:
- customer_id: 2
name: Paul
orders:
- order_id: 2
item: Banjo
quantity: 2
- order_id: 3
item: Piano
quantity: 1
- John:
- customer_id: 1
name: John
orders:
- order_id: 1
item: Guitar
quantity: 1
So, here you can see the difference between ``groupBy`` and ``orderBy``. We use
the same parameter `name` for both operations, but in the output for ``groupBy``
`name` is located in additional place before everything else.
Selecting
~~~~~~~~~
.. NOTE::
Analog is SQL SELECT
The ``select`` method allows building new objects out of objects of some collection.
In the following example, the result will contain a list of name/orders pairs.
.. code-block:: console
yaql> $.customers.select([$.name, $.orders])
.. code-block:: console
- John:
- order_id: 1
item: Guitar
quantity: 1
- Paul:
- order_id: 2
item: Banjo
quantity: 2
- order_id: 3
item: Piano
quantity: 1
- Diana:
- order_id: 4
item: Drums
quantity: 1
Joining
~~~~~~~
.. NOTE::
Analog is SQL JOIN
The ``join`` method creates a new collection by joining two other collections by
some condition.
.. code-block:: console
yaql> $.customers.join($.customers_city, $1.customer_id = $2.customer_id, {customer=>$1.name, city=>$2.city, orders=>$1.orders})
.. code-block:: yaml
- customer: John
city: New York
orders:
- order_id: 1
item: Guitar
quantity: 1
- customer: Paul
city: Saint Louis
orders:
- order_id: 2
item: Banjo
quantity: 2
- order_id: 3
item: Piano
quantity: 1
- customer: Diana
city: Mountain View
orders:
- order_id: 4
item: Drums
quantity: 1
Take an element from collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
YAQL supports two general methods that can help you to take elements from collection
``skip`` and ``take``.
.. code-block:: console
yaql> $.customers.skip(1).take(2)
.. code-block:: yaml
- customer_id: 2
name: Paul
orders:
- order_id: 2
item: Banjo
quantity: 2
- order_id: 3
item: Piano
quantity: 1
- customer_id: 3
name: Diana
orders:
- order_id: 4
item: Drums
quantity: 1
First element of collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``first`` method will return the first element of a collection.
.. code-block:: console
yaql> $.customers.first()
.. code-block:: yaml
- customer_id: 1
name: John
orders:
- order_id: 1
item: Guitar
quantity: 1