Z3 as an alternative Datalog engine

Z3 is an open source automatic theorem prover developed by Microsoft.
This spec describes how to add Z3 as an alternative Datalog engine
in Congress.

Change-Id: Ie1e891408c3f03618926cff4d88e7904b0c0e370
This commit is contained in:
Pierre Crégut 2018-07-06 18:13:23 +02:00
parent 35d99d2193
commit ffd9d30469
1 changed files with 313 additions and 0 deletions

View File

@ -0,0 +1,313 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Z3 as an alternative Datalog engine
==========================================
https://blueprints.launchpad.net/congress/+spec/alternative-engine-z3
Z3 is an open source Datalog engine developed by Microsoft.
We propose an extension of Congress that lets the user define
theories (ie. set of rules producing new tables from existing ones
that can be later used by other theories) that will be treated
by the Z3 engine instead of the existing one.
Problem description
===================
A cloud operator may want to check if connectivity exists between two machines
or if they are effectively isolated. This means understanding how traffic is
routed (ie. what are the path the traffic may take) and how security groups and
firewall rules are applied along those path.
Here are the main characteristics needed by the Datalog engine for
solving such problems:
* Recursion: routing packets means building paths (link paths, networks)
and paths are naturally described recursively.
* Efficient representation of IP addresses and IP address ranges as commonly
found in switching devices and firewalls.
* Efficient handling of large volumes of data.
Proposed change
===============
We propose to integrate Z3, Microsoft open-source automatic prover, as an
alternate Datalog engine for Congress.
Z3 theories will be expressed in the same syntax as standard theories.
But the Datalog language supported by these theories will be tailored to what
Z3 supports in Datalog mode:
* different and reduced set of built-ins as explained in the Policy section,
* no modals.
Architecture
------------
Overview
^^^^^^^^
A new kind of theory will be introduced alongside the classical
NonRecursiveTheory. Z3 theories will consume data from
other theories and use the Z3 engine to apply rules.
There will be one single Z3 solver instance per analyzer that will be
abstracted in Python as a Z3 context object. Z3 context will host all tables
of Z3 theories in Z3 format.
Z3 theories should be callable from other theories in particular from
regular non-recursive theories.
Z3 theories can use data from other theories. The external tables used by
Z3 theories will be completely imported in the Z3 context.
Compilation of Z3 theories will be delayed until queries are done on those
theories. We will regularly update the representation of external tables in
the Z3 engine to keep them synchronized. Z3 does not support undoing relations
or rules, so a new context must be created each time a change occurs.
Type-checking
^^^^^^^^^^^^^
Z3 requires typed data. Type-checking will be completely implicit and we will
not add type-constraints in the language.
The type-checker will preserve the types from external tables. It will raise
an error if a variable should have two different types according to two
different tables.
The type-checker will try to give the most precise type to constants that
appear in Datalog programs to make them compatible with external tables.
For example the value ``Ingress`` is usually typed as a string (``Str`` type)
but in the context of a Neutron security group, it will be given the type
``Direction``
To be able to have two representations for a value, we will introduce
explicit converter built-ins: ``builtin:cast(x, 'Direction', y, 'Str')``
will convert a variable ``x`` of type ``Direction`` in a variable ``y``
of the standard string type. It will be implemented as a table where each
row is a pair of the representations in the two different types of a same
object. Those tables will be filled when constants in programs are compiled
and external tables are imported.
The same trick will be used to have a bit field representation of Ip addresses.
Finally the type-checker should support some kind of generic polymorphism for
built-ins. For example equalities and comparison should work as long as the
compared objects are of the same type, addition of two items belonging to a
given subtype of integers should give back an item in the same subtype. We do
not plan to support user-defined polymorphic tables. For example the following
rule is ill defined: ::
add3(x,y,z,t) :- builtin:plus(x,y,u), builtin:plus(u,z,t)
It may be desirable to have type-constraints in the language to constrain
such variables.
Internal representations
^^^^^^^^^^^^^^^^^^^^^^^^
Internally Z3 will use bit-vectors. Integers will be translated in 32 bits
bit-vectors. Strings will be bijectively associated with an integer and
represented by its associated bit-vector.
When possible we will use finer types (enumerations) that require smaller
bit-vector.
Alternatives
------------
We could try to rephrase the theories to fit the current engine. For example we
could unroll recursive calls up to a reasonable depth (packets have a TTL).
Unfortunately this solution does not scale well. Unrolling recursive calls
change in fact the overall complexity of the program. Experiments with octant
have shown that some problems that require several minutes of computation with
the current engine can be solved in tenth of seconds with octant.
We could try to incorporate Z3 algorithmic in Congress engine: this would be
time consuming and hazardous. Z3 uses very efficient C++ data structures to
represent tables and more that ten years have been devoted to its development.
We could use another external engine: there may be cases where another Datalog
engine could be a better candidate. The new Congress architecture with explicit
data-type has been designed to let several engines share a same pool of data.
The strong points of Z3 are its overall efficiency and its data-structures
specially crafted for network related problems.
There may be cases where we may wish to have several Z3 contexts with different
settings. Contexts define the variant of the Z3 engine in use and different
theories may benefit from different settings.
Policy
------
Here are the main characteristics of Z3 policies:
* Adds support for recursive rules.
* Support of built-in predicates for equalities / comparison
* Support for basic bit arithmetic (to be used with IP address).
* New built-ins for converting opaque IpAddress to bit fields and IpNetwork
to a pair of bit-fields (mask and address).
* Built-ins to convert a value that exists in different types from one
representation to the other.
The last two kinds of built-ins will in fact be implemented by tables computed
when data-sources and programs are synchronized.
Here is an example of policy that verifies if two networks are connected by
a chain of routers and if VMs are attached to two networks that belong to two
sets of networks that should be kept isolated: ::
connect1(X) :- neutronv2:networks(id=X, name="N1")
connect2(X) :- neutronv2:networks(id=X, name="N2")
linked(X,Y) :-
neutronv2:ports(network_id=X, device_id=Z), neutronv2:routers(id=Z),
neutronv2:ports(network_id=Y, device_id=Z)
path(X,Y) :- linked(X,Y)
path(X,Y) :- path(X,Z), linked(Z,Y)
interco_error(X,Y) :- connect1(X), connect2(Y), path(X,Y)
network1(X) :- connect1(Y), path(Y, X)
network2(X) :- connect2(Y), path(Y, X)
double_attach(X) :-
nova:server(id=X),
neutronv2:ports(network_id=X, device_id=Y), network1(Y),
neutronv2:ports(network_id=X, device_id=Z), network2(Z)
Policy actions
--------------
None. Z3 theories will not support actions. Support may be added later.
Data sources
------------
Independent from the data-source used.
Data model impact
-----------------
None
REST API impact
---------------
We may want to move to more atomic definitions of theories. A theory is
a complete program and should be considered as such. Adding and removing
rules on the fly makes compilation and type-checking unnecessarily complex.
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
For python-congressclient, when a new theory is created, there is a new
kind alternative: z3.
Performance impact
------------------
Well behaved Z3 theories will be much faster than equivalent non-recursive
theories. But programs that require too much computing power may be harder to
abort on the Z3 engine.
Other deployer impact
---------------------
Z3 is not a pypi package. There is an unofficial z3solver package but it has
some limitations (not an up-to-date version and a version emitting a lot of
spurious messages). For some Linux distributions, there is no pre-compiled
version of Z3 available. Z3 must be compiled from source on those systems.
Due to all these constraints, we have chosen NOT to add z3 as a requirement.
z3theory will be available IF z3 has been deployed on the server hosting
congress.
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
pcregut
Other contributors:
Work items
----------
For Rocky we do not plan to add built-ins. The basic Datalog language is sufficient for interesting examples.
During Stein iteration, built-ins and the treatment of polymorphism
in the type-checker will be added.
Dependencies
============
This feature depends on Microsoft theorem prover Z3.
Testing
=======
Unit tests will be done with a mock of z3 library.
For integration tests, new settings will be added to devstack so that Z3
can be either installed from source (rather slow) or directly deployed from
a pre-compiled release (only available for Ubuntu 14.04, 16.04 and Debian
8.10). As the python package is deployed globally, tempest may skip Z3 tests
if it cannot import the python package.
Documentation impact
====================
Documentation should at least mention:
* How to enable Z3,
* Z3 theories and the kind of builtins supported,
* Explain the benefit of recusive theories.
References
==========
[1] https://github.com/Z3Prover/z3
[2] https://github.com/Orange-OpenSource/octant
[3] https://specs.openstack.org/openstack/congress-specs/specs/rocky/explicit-data-types