July 2018

FO3 and the algebra of binary relations

Submitted by Jan Van den Bussche on Tue, 07/31/2018 - 09:23

Illustration of binary relation
                                Image credit: Why U

The algebra of binary relations, denoted here simply by RA, is one of the oldest logics, created by De Morgan, Pierce and Schroder in the 19th century. Tarski and his collaborators revived the logic in the 20th century. Nowadays RA is still very relevant as it lies at the basis of dynamic logics, description logics, and XPath-like query languages for tree and graph data.

A very interesting result by Tarski and Givant is the equivalence between RA and \(\mathrm{FO}^3\), the three-variable fragment of first-order logic. The monograph by Tarski and Givant on the formalization of set theory without variables contains a proof, but that monograph is not for the impatient. The book by Marx and Venema on multi-dimensional modal logic contains an elegant and short proof. As that book is again not something you can readily dive in and start reading somewhere in the middle, we give here a self-contained version of the Marx-Venema proof.

Let us begin by recalling the definition of RA. We start from a binary relational vocabulary \(\Upsilon\). Recall that an \(\Upsilon\)-structure \(I\) consists of a nonempty domain of atomic data elements, denoted by \(\mathrm{dom}(I)\), and, for every relation name \(R\) from \(\Upsilon\), a binary relation \(I(R)\) on \(\mathrm{dom}(I)\). We can think of \(\Upsilon\) as a database schema with only binary relation names, and of \(I\) as an instance of that schema. The only difference is that usually database instances are defined without an explicit domain, instead using the active-domain semantics for first-order logic. Below we continue to work with structures with explicit domains, but everything can be adapted to work with active-domain semantics instead.

The expressions \(e\) of RA are generated by the following grammar:

$$ e ::= R \mid e \cup e \mid \neg e \mid e \circ e \mid e^{-1} \mid {\it id} \mid {\it all} $$ Here, as above, the symbol $R$ ranges over the relation names from \(\Upsilon\). Given any \(\Upsilon\)-structure \(I\), an RA-expression $e$ defines a binary relation \(e(I)\) on \(\mathrm{dom}(I)\) in a natural manner. Obviously, \(\cup\) is union; \(\neg\) is complement with respect to \(\mathrm{dom}(I)^2\). The operator \(\circ\) is the composition of two binary relations, i.e., \(r \circ s = \{(a,c) \mid \exists b : (a,b) \in r\) and \((b,c) \in s\}\). The operator \({}^{-1}\), called converse, inverts a binary relation, i.e., \(r^{-1} = \{(b,a) \mid (a,b) \in r\}\). Finally, the primitives \({\it id}\) and \({\it all}\) evaluate to the identity relation and the complete binary relation on \(\mathrm{dom}(I)\), respectively.

It is not difficult to see that RA is subsumed by \(\mathrm{FO}^3\), in that for every RA-expression \(e\) there exists a first-order logic formula \(\varphi\) with two free variables and three variables overall, so that on every structure, \(\varphi\) defines the same relation as \(e\). For example, the expression \(R \circ (S \circ T)\) can be equivalently expressed as $$ \{(x,y) \mid \exists z(R(x,z) \land \exists x(S(z,x) \land T(x,y)))\}. $$ Note the reuse of the variable \(x\). The interesting result is that the converse holds as well: for every \(\varphi\) as above there exists an \(e\) defining the same relation on every structure.

To prove this result, let us fix the three available variables as \(x\), \(y\) and \(z\). We begin by defining, for every two-element subset \(\{u,v\} \subseteq \{x,y,z\}\), the logic \(L_{\{u,v\}}\). This logic contains all atomic formulas over \(\Upsilon\) involving the variables \(u\) and \(v\), as well as an explicit proposition \(\top\) for true; is closed under the boolean connectives; and also contains all quantified formulas of the form \(\exists w(\gamma \land \chi)\), where \(w\) is the third variable, i.e., the variable not in \(\{u,v\}\), and \(\gamma\) and \(\chi\) are formulas in \(L_{\{u,w\}}\) and \(L_{\{w,v\}}\), respectively. Thus, the three logics \(L_{\{x,y\}}\), \(L_{\{x,z\}}\) and \(L_{\{y,z\}}\) are defined by mutual recursion. Note that the only variables that can occur free in a formula in \(L_{\{u,v\}}\) are \(u\) and \(v\).

It is not difficult to see that, for every formula \(\varphi\) in \(L_{\{u,v\}}\), we can express \(\{(u,v) \mid \varphi\}\) in RA. For example, we can express $$ \{(x,y) \mid \exists z(R(z,x) \land \top) \land y=x\} $$ by \((R^{-1}\circ {\it all}) \cap {\it id}\), where we use intersection \(\cap\) as a shorthand which can be expressed in terms of union and complement.

We now claim that every \(\mathrm{FO}^3\) formula over \(\Upsilon\) is equivalent to a boolean combination of formulas from \(L_{\{x,y\}}\), \(L_{\{x,z\}}\) and \(L_{\{y,z\}}\). This claim can be proven by structural induction. The basis of the induction is formed by the atomic formulas. Since \(\Upsilon\) has only binary relation names, every atomic formula is always in \(L_{\{x,y\}}\), \(L_{\{x,z\}}\) or \(L_{\{y,z\}}\) as desired. For the induction step, the boolean connectives pose no problem. So, the only case that is not immediately clear is when \(\varphi\) is of the form \(\exists w\, \psi\). Let \(u\) and \(v\) be the two variables other than \(w\). By induction, we can write \(\psi\) as a boolean combination of formulas from \(L_{\{u,v\}}\), \(L_{\{u,w\}}\) and \(L_{\{w,v\}}\). We can put this boolean combination in disjunctive normal form and distribute the existential quantifier over the disjuncts, so that we are left with formulas of the form \(\exists w(\beta \land \gamma \land \chi)\), where \(\beta\), \(\gamma\) and \(\chi\) are from \(L_{\{u,v\}}\), \(L_{\{u,w\}}\) and \(L_{\{w,v\}}\), respectively. We can rewrite this formula as \(\beta \land \exists w(\gamma \land \chi)\) which is a formula in \(L_{\{u,v\}}\) as desired.

The result now follows. Let \(\varphi\) be an \(\mathrm{FO}^3\) formula with two free variables \(u\) and \(v\). Let \(w\) be the third variable. By the above claim, we can write \(\varphi\) as a boolean combination of formulas from \(L_{\{u,v\}}\), \(L_{\{u,w\}}\) and \(L_{\{w,v\}}\). Of these constituent formulas, those that are in \(L_{\{u,w\}}\) can have only \(u\) free, so they can be rewritten as \(L_{\{u,v\}}\)-formulas by swapping \(w\) and \(v\). Similarly, the constituent formulas that are in \(L_{\{v,w\}}\) can be rewritten as \(L_{\{u,v\}}\)-formulas by swapping \(w\) and \(u\). The final result is an \(L_{\{u,v\}}\)-formula, which concludes the proof, as we have seen earlier that \(\{(u,v)\mid \varphi\}\), with \(\varphi\) an \(L_{\{u,v\}}\)-formula, is expressible in RA.

Thanks to Floris Geerts for suggesting that I write this blog post.

Categories

ICDT 2019 - Call for Papers (Second Cycle)

Submitted by Pablo Barcelo on Mon, 07/30/2018 - 22:43

CALL FOR PAPERS

22nd INTERNATIONAL CONFERENCE ON DATABASE THEORY (ICDT 2019) 

Lisbon, Portugal, March 26-28, 2019 

http://edbticdt2019.inesc-id.pt/

About ICDT 

ICDT is a series of international scientific conferences on research of data management theory 
(https://databasetheory.org/icdt-pages). Since 2009, it is annually and jointly held with EDBT (Extending DB Technology). The 22nd edition of ICDT, in 2019, will take place in Lisbon, Portugal. 

Broadening Scope

Continuing with the idea of broadening its scope, ICDT 2019 will have a Reach Out track that calls for
- novel formal frameworks and/or
- articles that connect principles of data management to other communities.
Papers submitted to this track should suggest novel and important directions for database theory and provide a theoretical basis for understanding emerging areas in data management. This year we especially encourage articles that introduce novel formal frameworks and/or establish connections between the area of principles of data management and neighboring communities such as Database Systems, Operating Systems, Machine Learning, Artificial Intelligence, Knowledge Representation, Programming Languages, and Distributed Computing.

It is reasonable for articles submitted to the Reach Out track to be be shorter than regular ICDT submissions. Such papers will be judged mainly in terms of their potential to lead to further theoretical developments, and therefore the efforts should be the focused on illustrating the proposed framework via convincing examples, preliminary results, and clearly defined open problems.   

Topics of Interest

Every topic related to the principles of data management is relevant to ICDT. Particularly welcome are contributions that connect data management to theoretical computer science, and those that connect database theory and database practice. Examples of relevant topics are:
- Data mining, information extraction, information retrieval, and database aspects of machine learning
- Data models, design, structures, semantics, query languages, and algorithms for data management
- Distributed and parallel databases, cloud computing
- Connections between databases and knowledge representation
- Graph databases, (semantic) Web data, and Web services
- Data streams and sketching
- Data-centric (business) process management and workflows
- Data and knowledge integration and exchange, data provenance, views, and data warehouses
- Domain-specific databases (multimedia, scientific, spatial, temporal, text)
- Data privacy and security, concurrency, and recovery

Important dates 

Second cycle abstract deadline: September 16, 2018
Full paper submission deadline: September 23, 2018
Notification: December 5, 2018

Program Committee 

Program Committee Chair:
Pablo Barcelo (University of Chile)

Program Committee Members:
Isolde Adler (University of Leeds)    
Antoine Amarilli (Telecom ParisTech)
Paul Beame (University of Washington)
Elena Botoeva (University of Bolzano) 
Victor Dalmau (Universitat Pompeu Fabra) 
Diego Figueira (Labri, CNRS)    
Dominik D. Freydenberger (Loughborough University)
Paolo Guagliardo (University of Edinburgh)    
Zengfeng Huang (The University of New South Wales)
Benny Kimelfeld (Technion, Israel) 
Paris Koutris (University of Wisconsin)     
Maurizio Lenzerini (Sapienza University of Rome)    
Yakov Nekrich (University of Waterloo)    
Matthias Niewerth (University of Bayreuth)
Cristian Riveros (PUC, Chile) 
Sebastian Skritek (TU Vienna)    
Julia Stoyanovich (Drexel University)    
Szymon Torunczyk (University of Warsaw)
Ke Yi (HKUST)     
Thomas Zeume (University of Dortmund)

Submission Instructions

All submissions will be electronic via EasyChair at https://easychair.org/conferences/?conf=icdt2019.

Papers must be written in English and provide sufficient detail to allow the program committee to assess their merits. The results must be unpublished and not submitted for publication elsewhere, including the proceedings of other symposia or workshops. 

Papers must be submitted as PDF documents, using the LIPIcs style (http://www.dagstuhl.de/en/publications/lipics/instructions-for-authors). Papers must be at most 15 pages, excluding references. Additional details may be included in a clearly marked appendix, which, however, will be read at the discretion of the program committee (online appendices are not allowed). Papers not conforming to these requirements may be rejected without further consideration. 

The proceedings will appear in the Leibniz International Proceedings in Informatics (LIPIcs) series, based at Schloss Dagstuhl. This guarantees that the proceedings will be available online and free of charge, while the authors retain the rights over their work. 

At least one author of each accepted paper is expected to register at the conference and to present the paper.

Awards

An award will be given to the Best Paper. Also, an award will be given to the Best Newcomer Paper written by newcomers to the field of database theory. The latter award will preferentially be given to a paper written only by students; in that case the award will be called Best Student-Paper Award. The program committee reserves the following rights: not to give any award; to split an award among several papers; and to define the notion of a newcomer. Papers authored or co-authored by program committee members are not eligible for any award.