July 2016

Gems of PODS

Submitted by Marcelo Arenas on Wed, 07/20/2016 - 16:16

For the first time in PODS, there was a session devoted to some papers that were published in this conference and have been very influential. In PODS 2016 this session included two talks:

  • Optimal Score Aggregation Algorithms by Ron Fagin
  • Hypertree Decompositions: Question and Answers by Georg Gottlob

The first talk was about the paper [1] published in PODS 2001, while the second one was about the paper [2] published in PODS 1999. It is well worth reading both articles! You can find the presentations of both talks at the Gems of PODS page.

In this post I would to give some more details about [2], which I know better and have used in my own research.

Consider the following evaluation problem: given a Boolean conjunctive query Q and a database instance I, decide whether Q holds in I. This problem is known to be NP-complete, which motivated Gottlob, Leone and Scarcello to look for natural restrictions on conjunctive queries that could lead to tractability. In particular, motivated by this issue they introduced in [2] the notions of hypertree decomposition and hypertree width of a conjunctive query. The hypertree width of a conjunctive query Q is a natural number greater than or equal to 1 that essentially measures how acyclic Q is, the lower this value the more acyclic the query is. In fact, the hypertree width of a conjunctive query Q is 1 if and only if Q is acyclic in the usual sense studied in the database literature since the early 80s (see [3] for a definition of this notion). The notion of hypertree width helps in solving the intractability issue of the evaluation problem; for the class of conjunctive queries whose hypertree width is bounded by a fixed constant k the evaluation problem can be solved in polynomial time. Moreover, it can be verified in polynomial time whether the hypertree width of a conjunctive query is bounded by a fixed constant k.

It is important to notice that the notions of hypertree decomposition and hypertree width can also be used for the more general case of non-Boolean conjunctive queries, but the size of the output needs to be taken into consideration in this context. A more detailed discussion about this and many more details about these two notions can be found in the article accompanying the invited talk by Georg [4], including some more recent results concerning them and a section discussing the applicability of these techniques in real-world applications.

References

  1. Ronald Fagin, Amnon Lotem, Moni Naor: Optimal Aggregation Algorithms for Middleware. PODS 2001: 102-113.
  2. Georg Gottlob, Nicola Leone, Francesco Scarcello: Hypertree Decompositions and Tractable Queries. PODS 1999: 21-32.
  3. Mihalis Yannakakis: Algorithms for Acyclic Database Schemes. VLDB 1981: 82-94.
  4. Georg Gottlob, Gianluigi Greco, Nicola Leone, Francesco Scarcello: Hypertree Decompositions: Questions and Answers. PODS 2016: 57-74.

Nominate SIGMOD Record Highlights!

Submitted by Jan Van den Bussche on Tue, 07/05/2016 - 12:37

SIGMOD Record logoSIGMOD Record now accepts nominations for papers on data management that you think deserve to be placed in the spotlight. We should nominate theory papers from ICDT as well as from PODS! I just submitted the following nomination:

"Distributed Streaming with Finite Memory" by Frank Neven, Nicole Schweikardt, Frédéric Servais, and Tony Tan

Presented at ICDT 2015.

This paper proposes a tractable formal computational model for MapReduce computations. The focus is on computations where the worker nodes use bounded memory. This models the common situation where the data local to each node is processed in a streaming fashion. Formal computational models are useful because they allow to rigorously prove impossibility results. For example, it is well known how to compute a join in MapReduce. The algorithm distributes tuples on their join attribute in the Map phase; then all the Reduce phase has to do is to essentially output Cartesian products. Obviously, this Cartesian product phase can become a quadratic bottleneck if the data is skewed. The model in the paper allows us to prove formally that this quadratic bottleneck is unavoidable.

The paper contains many more insights. What if we just want to compute a semijoin? Then the output is linear. Suppose we semijoin R to S on a common attribute A. The reducer assigned to a particular A-value x can detect in a streaming fashion that the semijoin is nonempty; it simply checks that it has received both R- and S-tuples. However, once it has detected that, it is too late to produce the output consisting of all the R-tuples; it could not remember all of them because the memory is bounded. If the reducer gets a second pass over the data, it can now stream again over the R-tuples and output them. The necessity of a second pass is again something that can be rigorously proved.

Other results of the paper include a formal proof that allowing more rounds allows more and more queries to be computed. This paper is theoretical computer science at its best; it models a practical paradigm of computing and attempts to explain and gain understanding about what is possible and what is not when using this paradigm.

Categories