Gems of PODS

The Gems of PODS event features topics and results in PODS that have been highly influential in the PODS community and beyond. The first edition of the Gems of PODS took place at SIGMOD/PODS2016.

2019

Latent Semantic Indexings [PDF]

Abstract

In the late 1990s, the possibility of algorithmic extraction of insight from soulless data loomed potentially important and very intriguing. I will look back at our attempt at understanding and advancing this research program in the light of two decades of blistering progress in spectral methods, machine learning, data harvesting and deep nets.

Consistent Query Answers in Inconsistent Databases [PDF]

Leo Bertossi (Carleton University, Relational AI)

Abstract

In this talk I will review the main concepts around database repairs and consistent query answering, with emphasis on tracing back the origin, motivation, and early developments. I will also describe some research directions that has spun from those main concepts and the original line of research. I will emphasize, in particular, fruitful and recent connections between repairs and causality in databases.

2018

Reflections on Schema Mappings, Data Exchange, and Metadata Managements [PDF]

Phokion G. Kolaitis (UC Santa Cruz and IBM Almaden Research Center)

Abstract

A schema mapping is a high-level specification of the relationship between two database schemas. For the past fifteen years, schema mappings have played an essential role in the modeling and analysis of data exchange, data integration, and related data inter-operability tasks. The aim of this talk is to critically reflect on the body of work carried out to date, describe some of the persisting challenges, and suggest directions for future work. The first part of the talk will focus on schema-mapping languages, especially on the language of GLAV (global-and-local as view) mappings and its two main sublanguages, the language of GAV (global-as-view) mappings and the language of LAV (local-as-view) mappings. After highlighting the fundamental structural properties of these languages, we will discuss how structural properties can actually characterize schema-mapping languages. The second part of the talk will focus on metadata management by considering operators on schema mappings, such as the composition operator and the inverse operator. We will discuss why richer languages are needed to express these operators, and will illustrate some of their uses in schema-mapping evolution. The third and final part of the talk will focus on the derivation of schema mappings from semantic information. In particular, we will discuss a variety of approaches for deriving schema mappings from data examples, including casting the derivation of schema mappings as an optimization problem and as a learning problem.

Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems [PDF]

Hung Q. Ngo (Relational AI)

Abstract

Worst-case optimal join algorithms are the class of join algorithms whose runtime match the worst-case output size of a given join query. While the first provably worse-case optimal join algorithm was discovered relatively recently, the techniques and results surrounding these algorithms grow out of decades of research from a wide range of areas, intimately connecting graph theory, algorithms, information theory, constraint satisfaction, database theory, and geometric inequalities. These ideas are not just paperware: in addition to academic project implementations, two variations of such algorithms are the work-horse join algorithms of commercial database and data analytics engines. This paper aims to be a brief introduction to the design and analysis of worst-case optimal join algorithms. We discuss the key techniques for proving runtime and output size bounds. We particularly focus on the fascinating connection between join algorithms and information theoretic inequalities, and the idea of how one can turn a proof into an algorithm. Finally, we conclude with a representative list of fundamental open problems in this area.

2017

Data Integration: From the Enterprise into Your Kitchen [PDF]

Alon Y. Halevy (Recruit Institute of Technology)

Abstract

Twenty five years ago, data integration was a concern mainly for large enterprises with many autonomous data sources. Since then, while data integration became even more important to enterprises, the technology has ventured out into new areas, such as the Web and recently into voice-activated consumer devices that answer a plethora of questions in your home.  At the core of these systems is a mechanism for describing the semantics and capabilities of individual data sources. Database theory has made important contributions to the area of modeling data sources and query answering in data integration systems. I this talk I will cover some of the main developments in the field of data integration since the mid-90’s, and discuss the challenges that lie ahead.

The Semiring Framework for Database Provenance [PDF]

Val Tannen (University of Pennsylvania)

Abstract

Imagine a computational process that uses a complex input consisting of multiple items'' (e.g.,files, tables, tuples, parameters, configuration rules) The provenance analysis of such a process allows us to understand how the different input items affect the output of the computation. It can be used, for example, to derive confidence in the output (given confidences in the input items), to derive the minimum access clearance for the output (given input items with different classifications), to minimize the cost of obtaining the output (given a complex input item pricing scheme).  It also applies to probabilistic reasoning about an output (given input item distributions), as well as to output maintenance, and to debugging.

Provenance analysis for queries, views, database ETL tools, and schema mappings is strongly influenced by their declarative nature, providing mathematically nice descriptions of the output-inputs correlation. In a series of papers starting with PODS 2007 we have developed an algebraic framework for describing such provenance based on commutative semirings and semimodules over such semirings. So far, the framework has exploited usefully the observation that, for database provenance, data use has two flavors: joint and alternative.

Here, we have selected several insights that we consider essential for the appreciation of this framework's nature and effectiveness and we also give some idea of its applicability.

2016

Optimal Score Aggregation Algorithms  [PDF]

Ronald Fagin (IBM Research - Almaden)

Abstract

Assume that there is a set of "voters" and a set of "candidates", where each voter assigns a numerical score to each candidate. There is a scoring function (such as the mean or the median), and a consensus ranking is obtained by applying the scoring function to each candidate's scores. The problem is to find the top k candidates, while minimizing the number of database accesses. The speaker will present an algorithm that is optimal in an extremely strong sense: not just in the worst case or the average case, but in every case! Even though the algorithm is only 10 lines long (!), the paper containing the algorithm won the 2014 Gödel Prize, the top prize for a paper in theoretical computer science.

Hypertree Decompositions: Questions and Answers  [PDF]

Georg Gottlob (University of Oxford)

Abstract

In the database context, the hypertree decomposition method is used for query optimization, whereby conjunctive queries having a low hypertree width (i.e., a low degree of cyclicity) can be recognized and decomposed automatically, and efficiently evaluated. Queries having bounded hypertree width are also highly parallelizable. Hypertree decompositions were introduced at ACM PODS 1999. This talk reviews - in form of questions and answers - the main relevant concepts and algorithms and surveys selected related work including applications and test results. The talk is accompanied by a paper of the same title authored by Georg Gottlob, Gianluigi Greco, Nicola Leone, and Francesco Scarcello.