# Gems of PODS

The Gems of PODS event features topics and results in PODS that have been highly influential in the PODS community and beyond. The first edition of the Gems of PODS took place at SIGMOD/PODS2016.

### 2017

Data Integration: From the Enterprise into Your Kitchen

Alon Y. Halevy (Recruit Institute of Technology)

Abstract

Twenty five years ago, data integration was a concern mainly for large enterprises with many autonomous data sources. Since then, while data integration became even more important to enterprises, the technology has ventured out into new areas, such as the Web and recently into voice-activated consumer devices that answer a plethora of questions in your home.  At the core of these systems is a mechanism for describing the semantics and capabilities of individual data sources. Database theory has made important contributions to the area of modeling data sources and query answering in data integration systems. I this talk I will cover some of the main developments in the field of data integration since the mid-90’s, and discuss the challenges that lie ahead.

The Semiring Framework for Database Provenance [PDF]

Val Tannen (University of Pennsylvania)

Abstract

Imagine a computational process that uses a complex input consisting of multiple items'' (e.g.,files, tables, tuples, parameters, configuration rules) The provenance analysis of such a process allows us to understand how the different input items affect the output of the computation. It can be used, for example, to derive confidence in the output (given confidences in the input items), to derive the minimum access clearance for the output (given input items with different classifications), to minimize the cost of obtaining the output (given a complex input item pricing scheme).  It also applies to probabilistic reasoning about an output (given input item distributions), as well as to output maintenance, and to debugging.

Provenance analysis for queries, views, database ETL tools, and schema mappings is strongly influenced by their declarative nature, providing mathematically nice descriptions of the output-inputs correlation. In a series of papers starting with PODS 2007 we have developed an algebraic framework for describing such provenance based on commutative semirings and semimodules over such semirings. So far, the framework has exploited usefully the observation that, for database provenance, data use has two flavors: joint and alternative.

Here, we have selected several insights that we consider essential for the appreciation of this framework's nature and effectiveness and we also give some idea of its applicability.

### 2016

#### Optimal Score Aggregation Algorithms [PDF]

Ronald Fagin (IBM Research - Almaden)

#### Abstract

Assume that there is a set of "voters" and a set of "candidates", where each voter assigns a numerical score to each candidate. There is a scoring function (such as the mean or the median), and a consensus ranking is obtained by applying the scoring function to each candidate's scores. The problem is to find the top k candidates, while minimizing the number of database accesses. The speaker will present an algorithm that is optimal in an extremely strong sense: not just in the worst case or the average case, but in every case! Even though the algorithm is only 10 lines long (!), the paper containing the algorithm won the 2014 Gödel Prize, the top prize for a paper in theoretical computer science.

#### Hypertree Decompositions: Questions and Answers [PDF]

Georg Gottlob (University of Oxford)

#### Abstract

In the database context, the hypertree decomposition method is used for query optimization, whereby conjunctive queries having a low hypertree width (i.e., a low degree of cyclicity) can be recognized and decomposed automatically, and efficiently evaluated. Queries having bounded hypertree width are also highly parallelizable. Hypertree decompositions were introduced at ACM PODS 1999. This talk reviews - in form of questions and answers - the main relevant concepts and algorithms and surveys selected related work including applications and test results. The talk is accompanied by a paper of the same title authored by Georg Gottlob, Gianluigi Greco, Nicola Leone, and Francesco Scarcello.