Yet another query language
Stan Lagun 1d1f187c5c Small improvements to yaql
Contexts:
* context module was renamed to contexts
* convention property is now part of ContextBase class.
* Standard Context implementation automatically uses convention
  from parent context
* Context interface was enhanced to add capability to check key presence,
   get data with different default or not try to access parent context
* MultiContext to virtually merge several contexts without making them related.
  All merged contexts may have parents of their own which are also merged.
  Source contexts are not modified. This is a sort of context mix-in

Parser:
* keyword arguments (name => value) now require name to be keyword token.
   So f('abc' => value) will not work. This doesn't affect dict() and similar functions
   as they use different argument format.
* tokens that start with two underscores (__) are no longer valid. This is done so that it
  would be possible to have auto-injected Python arguments like "__context" without
  affecting possible kwargs of the same function

Delegates:
* Added ability to call delegate (callable) passed as a context value.
  The syntax is $var(args) (or even $(args)).
* Delegate doesn't have to be a context value but also can be result of expression:
  func(args1)(args2), (f() op g())() and so on
* Delegates are disabled by default for security reasons. "allow_delegates" parameter
  was added to both YaqlFactory classes to enable them
* (expr)(args) will be translated to #call(expr, args). So for this to work #call function
  must be present in context. Standard context doesn't provide this function by default.
  Use delegates=True parameter of create_context method (including legacy mode version)
  to register #call() and lambda() functions
* additional lambda(expression) method that returns delegate to passed expression thus
  making it 2nd order lambda. This delegate may be stored in context using let() method
  (or alternatives) or be called as usual. lambda(expression)(args) is equal to expression(args)

Function specs:
* FunctionDefinition now has "meta" dictionary to store arbitrary extended metadata for the
   function (for example version etc.)
* use @specs.meta('key', 'value') decorator to assign meta value to the function. Several
  decorators may be applied to single function
* Context.__call__() / runner.call() now accept optional function_filter parameter that is a
  predicate (functionDefinition, context -> boolean) to additionally filter function candidates
  based on context values, function metadata and so on.
* returns_context decorator and functionality was removed because it was shown to be useless
* "__context" and "__engine" parameters, when not explicitly declared, are treated the
  same way as "context" and "engine"
* added ability to control what type is assigned to parameters without explicit specification
  based on parameter name
* added ability to get function definition with stripped hidden parameters for function wrappers
* refactoring of ParameterDefinition creation code. Code from decorator was moved to a more
  generic set_parameter method of FunctionDefinition to support wider range of scenarios
  (for example to overwrite auto-generated parameter specs)
* FunctionDefinition.__call__ was changed:
  a) order of parameter was changed so the sender could have default value
  b) args now automatically prefixed with sender when provided.
       There is no need to provide it twice anymore
* a helper function was added to utils to get yaql 0.2 style extension method specs
  for functions that are already registered in context as pure functions or methods

Smart types:
* "context" argument was added to each check() method so that it would be possible to do
  checks using context values. None of the standard smart-types use it, however custom types may do.
* return_context flag was removed from Lambda/Delegate/Super types for the same reasons as in FD
* GenericType helper class was added as a base class for custom non-lazy smart-types simplifying
  their development
* added ability to pass received Lambda (Delegate etc) to another method (or base method version)
  when the latter expects it in another format (for example sender had Lambda() while the receiver
  had Lambda(with_context=True))

Standard library:
* "#operator_." method for obj.method() now expects first argument (sender expression) to be object
   (pre-evaluated) rather than Lambda. This doesn't break anything but allows overriding this function
   in child contexts for some more specific types of sender without getting a resolution error because
   several versions of the same function differ by parameter laziness
* collection.flatten() method was added
* string.matches(regexpString) method was added to complement
   regexp.matches(string) that was before
* string.join(collection) method was added to complement collection.join(string)
   that was before. Also now both functions apply str() function to each element
   of collection
* now int(null) = 0 and float(null) = 0.0

Also bumps requirements to latest OpenStack global-requirements and removes version
number from setup.cfg to use git tag-driven versioning of pbr

Change-Id: I0dd06bf1fb70296157ebb0e9831d2b70d93ca137
2015-07-24 02:54:04 +03:00
doc/source yaql 1.0 2015-02-25 14:38:45 +03:00
yaql Small improvements to yaql 2015-07-24 02:54:04 +03:00
.coveragerc Toxify project 2014-07-03 18:01:24 +04:00
.gitignore Update gitignore 2014-11-14 03:30:29 +03:00
.gitreview Toxify project 2014-07-03 18:01:24 +04:00
.mailmap Toxify project 2014-07-03 18:01:24 +04:00
.testr.conf Toxify project 2014-07-03 18:01:24 +04:00
babel.cfg Toxify project 2014-07-03 18:01:24 +04:00
CONTRIBUTING.rst Workflow documentation is now in infra-manual 2014-12-05 03:30:47 +00:00
HACKING.rst Toxify project 2014-07-03 18:01:24 +04:00
LICENSE Toxify project 2014-07-03 18:01:24 +04:00
MANIFEST.in Toxify project 2014-07-03 18:01:24 +04:00
README.rst Toxify project 2014-07-03 18:01:24 +04:00
requirements.txt Small improvements to yaql 2015-07-24 02:54:04 +03:00
setup.cfg Small improvements to yaql 2015-07-24 02:54:04 +03:00
setup.py Fix pep8 checks: W292,W391 2014-07-03 18:34:16 +04:00
test-requirements.txt Small improvements to yaql 2015-07-24 02:54:04 +03:00
tox.ini Small improvements to yaql 2015-07-24 02:54:04 +03:00

YAQL - Yet Another Query Language

At the beginning of the millennium, the growing trend towards data format standardization and application integrability made XML extremely popular. XML became the lingua franca of data. Applications tended to process lots of XML files, ranging from small config files to very large datasets. As this data often had a complex structure with many levels of nesting, it quickly became obvious that there was a need for specially crafted domain-specific languages to query these data sets. This is how XPath and later XQL were born.

With the later popularization of REST services and Web 2.0, JSON started to take XML's place. JSON's main advantage (besides being simpler than XML) is that it closely resembles data structures found in most programming languages (arrays, dictionaries, scalars), making it very convenient for data serialization. As JSON lacked all the brilliant XML-related technologies like XSLT, XML Schema, XPath etc., various attempts to develop similar languages for JSON were made. One of those efforts was the JSONPath library developed in 2007 by Stefan Gössner. The initial implementation was for PHP and JavaScript, but later on ports to other languages, including Python, were written.

JSONPath allows navigating and querying, well, JSON. Suppose we have the following JSON:

{
    "customers": [
        {
            "customer_id": 1,
            "name": "John",
            "orders": [
                { "order_id": 1, "item": "Guitar", "quantity": 1 }
            ]
        },
        {
            "customer_id": 2,
            "name": "Paul",
            "orders": [
                { "order_id": 2, "item": "Banjo", "quantity": 2 },
                { "order_id": 3, "item": "Piano", "quantity": 1 }
            ]
        }
    ]
}

then

jsonpath(data, "$.customers[0].name") -> [John]
jsonpath(data, "$.customers[*].orders[*].order_id") -> [1, 2, 3]
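
For comparison, the two queries above correspond roughly to the following plain-Python traversal of the parsed data. This is only an illustrative sketch: it uses the standard json module and ordinary indexing rather than any JSONPath library, and the variable names are made up for the example.

```python
import json

# The sample document from above, parsed into dicts and lists.
data = json.loads("""
{
  "customers": [
    {"customer_id": 1, "name": "John",
     "orders": [{"order_id": 1, "item": "Guitar", "quantity": 1}]},
    {"customer_id": 2, "name": "Paul",
     "orders": [{"order_id": 2, "item": "Banjo", "quantity": 2},
                {"order_id": 3, "item": "Piano", "quantity": 1}]}
  ]
}
""")

# $.customers[0].name : JSONPath returns a list of matches, so wrap the value.
first_name = [data["customers"][0]["name"]]

# $.customers[*].orders[*].order_id : each "[*]" visits every element.
order_ids = [
    order["order_id"]
    for customer in data["customers"]
    for order in customer["orders"]
]

print(first_name)
print(order_ids)
```

The nested comprehension is what each "[*]" step does implicitly: it fans out over a list and keeps going on every element.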

But what if we need, for example, to find the order with ID = 2? Here is how it is done in JSONPath:

jsonpath(data, "$.customers[*].orders[?(@.order_id == 2)]") -> [{'order_id': 2, 'item': 'Banjo', 'quantity': 2}]

The construct [?(expression)] filters items using an arbitrary expression, which in our case is any Python expression. The @ character is replaced with the current value and then the whole expression is evaluated. Evaluating an arbitrary Python expression requires the eval() function, unless one wants to develop one's own complete parser and interpreter of the Python programming language. Needless to say, eval() is a serious security hole. If JSONPath expressions are only used to simplify program logic this is not a big deal, but what if the JSONPath expressions are written by the program's users?
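
To make the risk concrete, here is a minimal sketch of an eval()-based filter. The naive_filter function is hypothetical and written only for this illustration; it is not the code of any actual JSONPath port, although it follows the same idea of substituting @ and evaluating the result.

```python
import re

orders = [
    {"order_id": 1, "item": "Guitar", "quantity": 1},
    {"order_id": 2, "item": "Banjo", "quantity": 2},
    {"order_id": 3, "item": "Piano", "quantity": 1},
]


def naive_filter(items, condition):
    """Hypothetical stand-in for the [?(...)] step of a JSONPath port."""
    # Rewrite "@.key" into a dictionary lookup on the current item ...
    code = re.sub(r"@\.(\w+)", r"item['\1']", condition)
    # ... and hand the resulting Python source to eval().
    return [item for item in items if eval(code, {}, {"item": item})]


# The intended use: a harmless comparison.
matches = naive_filter(orders, "@.order_id == 2")
print([order["item"] for order in matches])

# Nothing forces the "condition" to be a comparison. This one rewrites the
# data it was supposed to query and then matches every order.
hostile = "item.update(quantity=0) or True"
leaked = naive_filter(orders, hostile)
print(len(leaked), [order["quantity"] for order in orders])
```

The hostile expression here only zeroes a field, but the same mechanism would run any Python code the expression author chooses.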

A JSONPath expression is just a plain string. There is no such concept as a parameter. That is, if one wants to find the order whose ID equals the value of some variable, one has to construct the expression string dynamically using string formatting or concatenation. Again, that might be okay for internal usage, but it becomes difficult for external usage and also opens the door to injection attacks (remember SQL injection?).
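
The following sketch contrasts the two approaches. Everything in it is an assumption made for illustration: naive_filter is a hypothetical eval()-based filter, and the "wanted" keyword is an invented way of passing a value separately from the expression text. Neither is the API of JSONPath or of YAQL.

```python
import re

ORDERS = [
    {"order_id": 1, "item": "Guitar", "quantity": 1},
    {"order_id": 2, "item": "Banjo", "quantity": 2},
    {"order_id": 3, "item": "Piano", "quantity": 1},
]


def naive_filter(items, condition, **params):
    """Hypothetical filter: '@.key' becomes item['key'], then eval()."""
    code = re.sub(r"@\.(\w+)", r"item['\1']", condition)
    return [item for item in items
            if eval(code, {}, dict(params, item=item))]


def find_by_concatenation(order_id_text):
    # The user-supplied text becomes part of the expression itself.
    return naive_filter(ORDERS, "@.order_id == %s" % order_id_text)


def find_by_parameter(order_id):
    # The expression is fixed; the value travels separately as data.
    return naive_filter(ORDERS, "@.order_id == wanted", wanted=order_id)


print(len(find_by_concatenation("2")))          # the intended lookup
print(len(find_by_concatenation("2 or True")))  # injected condition
print(len(find_by_parameter("2 or True")))      # treated as a plain value
print(len(find_by_parameter(2)))
```

With concatenation, the input "2 or True" changes the meaning of the filter and every order matches. With a separate parameter, the same input is just a string that equals no order ID.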

Another limitation of JSONPath is JSON itself. Technically speaking, JSONPath operates not on JSON itself (i.e. the text representation) but on a JSON-like object model that is a mixture of arrays, dictionaries and scalar values. But what if one wants to query an object model consisting of custom objects? What if some parts of this model are dynamically computed? Or the model is a graph rather than a tree?

It seems that JSONPath is good enough for use in Python code, where you can eval() things and have many helper functions to work with the data beyond JSONPath's capabilities, but it is not enough for external use, where you need sufficient power to query the model without manual coding and still keep it secure. This is why we designed YAQL. YAQL follows the JSONPath ideas and has a very similar syntax but offers much more for data querying.

YAQL expressions are quite similar to JSONPath. Here is how the examples above can be translated to YAQL:

$.customers[0].name -> $.customers[0].name (no change)
$.customers[*].orders[*].order_id -> $.customers.orders.order_id

The main addition to JSONPath is functions and operators. Consider the following YAQL expressions:

$.customers.orders[$.quantity > 0].quantity.sum() -> 4
$.customers.orders.select($.quantity * $.quantity).sum() -> 6
$.customers.orders.order_id.orderDesc($) -> [3, 2, 1]
$.customers.orders.order_id.orderDesc($).take(2) -> [3, 2]
$.customers.orders.order_id.orderDesc($).first() -> 3
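
As a cross-check of the results above, the same computations can be written in plain Python over the sample data. This sketch does not use YAQL at all; it only mirrors what each expression is described as returning, and the variable names are invented for the example.

```python
data = {
    "customers": [
        {"customer_id": 1, "name": "John",
         "orders": [{"order_id": 1, "item": "Guitar", "quantity": 1}]},
        {"customer_id": 2, "name": "Paul",
         "orders": [{"order_id": 2, "item": "Banjo", "quantity": 2},
                    {"order_id": 3, "item": "Piano", "quantity": 1}]},
    ]
}

# $.customers.orders : all orders of all customers, flattened
orders = [order for customer in data["customers"]
          for order in customer["orders"]]

# $.customers.orders[$.quantity > 0].quantity.sum()
positive_total = sum(o["quantity"] for o in orders if o["quantity"] > 0)

# $.customers.orders.select($.quantity * $.quantity).sum()
sum_of_squares = sum(o["quantity"] * o["quantity"] for o in orders)

# $.customers.orders.order_id.orderDesc($)
ids_desc = sorted((o["order_id"] for o in orders), reverse=True)

# ...orderDesc($).take(2) and ...orderDesc($).first()
first_two = ids_desc[:2]
highest = ids_desc[0]

print(positive_total, sum_of_squares, ids_desc, first_two, highest)
```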

Does this mean that YAQL has a large built-in function and operator library? Yes and no. The YAQL library ships with a large set of commonly used functions out of the box, but they are not built into the language. All the functions and operators (which are also functions: a + b = operator_+(a, b) etc.) are user-supplied. The user is free to add other functions that can be used in expressions and to remove standard ones.

The JSONPath library needs two arguments: the input JSON data and an expression. The YAQL library requires a third parameter: the context.

A context is a repository of functions and variables that can be used in expressions. So all the functions above are just ordinary Python functions that are registered in a Context object. But because they all need to be registered in a Context, the user can always customize them, add model-specific ones and have full control over expression evaluation.
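
The idea of a context as a function registry can be sketched in a few lines. ToyContext below is a hypothetical, heavily simplified class written for this illustration; it is not YAQL's Context implementation, and its method names and parent lookup are assumptions.

```python
class ToyContext:
    """Hypothetical, simplified context: a name -> function registry."""

    def __init__(self, parent=None):
        self._functions = {}
        self._parent = parent

    def register(self, name, func):
        self._functions[name] = func

    def lookup(self, name):
        # Search this context first, then fall back to the parent chain.
        context = self
        while context is not None:
            if name in context._functions:
                return context._functions[name]
            context = context._parent
        raise KeyError("unknown function: %s" % name)


# A "standard library" is just a context with functions registered in it.
# Operators are ordinary entries too: a + b is a call to "operator_+".
standard = ToyContext()
standard.register("operator_+", lambda a, b: a + b)
standard.register("sum", lambda values: sum(values))

# A user context can override or extend what the parent provides.
custom = ToyContext(parent=standard)
custom.register("operator_+", lambda a, b: "%s%s" % (a, b))

plain = standard.lookup("operator_+")(1, 2)
overridden = custom.lookup("operator_+")(1, 2)
inherited = custom.lookup("sum")([1, 2, 1])
print(plain, overridden, inherited)
```

Because every function, including each operator, is resolved through the registry, whoever builds the context decides exactly what an expression is allowed to call.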