Monotone queries and preservation theorems

Submitted by Jan Van den Bussche on Thu, 10/27/2016 - 14:11

motone bird

Let me begin with a call for blog posts: if you do research work on the principles of data management, please consider registering and contribute a blog post! It is really not the intention that this blog contains mainly blog posts by myself!

Another request: if you think this site is (or has the potential to become) relevant, please link to it from your homepage! For example, you could say your area is the principles of data management and then put a link to this site under that.

Anyway. Last week I was visiting Dirk Van Gucht at Indiana University Bloomington, together with my student Dimitri Surinx. Visiting there always feels a bit like coming home. Among the many things we discussed was the notion of monotone query. We actually realized that there are two different definitions of "monotone query". Let's focus on yes/no queries for simplicity.

Let us call a query Q "model-theory-monotone" if, whenever you have two database instances A and B with the same active domain, and B has all the facts of A, and Q is yes on A, then Q is also yes on B.

A stronger notion is "database-theory-monotone". The definition is the same, except that A and B need not have the same active domain.

A classical result in model theory, that follows from Lyndon's interpolation theorem, is that the model-theory-monotone first-order queries are exactly those expressible by positive sentences. Beware that this result allows infinite instances, so the definition of "monotone" refers to all possible instances A and B, even infinite ones.

If you loosen the definition so that it only needs to hold for finite instances, the characterization by positive sentences fails, as shown by Ajtai and Gurevich.

Consider the query expressed by the positive sentence "forall x forall y R(x,y)", interpreted under active-domain semantics. This query basically says that relation R is completely full given the current active domain, so it is obviously model-theory-monotone. However, it is clearly not database-theory monotone. The database-theory monotone queries, still allowing infinite instances, are now characterized not by the positive sentences but by the positive-existential sentences with nonequalities. (Positive-existential is the same as union of conjunctive queries.) This is proven in the recent great book on query reformulation by Benedikt, Leblay, ten Cate, and Tsamoura.

It appears to be an open question whether that characterization fails in restriction to finite instances. For a long time it seemed like basically every classical model-theoretic result fails in restriction to finite instances (this sentiment probably originates in the classic paper by Gurevich on logic tailored for computational complexity). However, Benjamin Rossman famously proved more than 10 years ago that the queries that are invariant under homomorphisms, even just over finite instances only, are characterized by the unions of conjunctive queries without nonequalities. A year ago, Benjamin put out an improved version of that result.

The question whether database-theory-monotone over finite instances equals unions of conjunctive queries with nonequalities was also raised in Martin Ugarte's PhD thesis, which is an interesting read as it translates classical preservation theorems to the RDF data model and the SPARQL query language.