standardization

Task force for the design of a query language for graph-structured data

Submitted by Pablo Barcelo on Tue, 08/30/2016 - 17:03

For more than a year now, I have been collaborating with people from industry and academia on the standardization of a data model and query language for graph-structured data. The group includes participants from the companies behind world-leading graph database engines, in particular Neo Technologies, SAP, Oracle, Sparsity Technologies, and IBM. It also includes graph database researchers from academic institutions around the world, with interests ranging from the purely theoretical to the most applied. As the reader might imagine, it is not always easy to understand each other in such a diverse environment, nor is agreement always easily reached. Nevertheless, it is fair to say that the discussions have, in general, been enriching and fruitful, and that the decisions taken by the group are mostly reasonable and well thought out. I am personally convinced that this diversity of views is a necessary condition for a standardization process of this kind to succeed: while developers bring the perspective of real-world applications and provide use cases for the features of the language, theoreticians define its formal syntax and semantics, and identify where demands for expressiveness become unreasonably costly.

So far, the group has accomplished several tasks. First, we identified a graph data model that can be applied in a wide range of practical applications. This is the so-called property graph model: property graphs are essentially finite, directed, node- and edge-labeled graphs, with attributes on both nodes and edges. The group then embarked on an extensive review of the literature and of use cases in order to understand which features are needed in a graph query language. The basic features identified are pattern matching, path-based navigation, aggregation, and transformations among different data types (e.g., tables, paths, and graphs themselves). Based on these features, the group is currently working on the design of a general-purpose query language, which should be finished during the first half of 2017.
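To make the property graph model described above concrete, here is a minimal sketch in Python. It is only an illustration and not the data model adopted by the group: the class and method names (Node, Edge, PropertyGraph, add_node, add_edge, out_edges) are hypothetical, and giving each node exactly one label is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional


@dataclass
class Node:
    node_id: str
    label: str  # assumption: a single label per node
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: str
    source: str  # id of the node the edge leaves
    target: str  # id of the node the edge enters
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Finite, directed, node- and edge-labeled graph with attributes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: Dict[str, Edge] = {}

    def add_node(self, node_id: str, label: str, **properties: Any) -> Node:
        node = Node(node_id, label, dict(properties))
        self.nodes[node_id] = node
        return node

    def add_edge(self, edge_id: str, source: str, target: str,
                 label: str, **properties: Any) -> Edge:
        # Edges may only connect nodes that are already in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must already be nodes of the graph")
        edge = Edge(edge_id, source, target, label, dict(properties))
        self.edges[edge_id] = edge
        return edge

    def out_edges(self, node_id: str,
                  label: Optional[str] = None) -> Iterator[Edge]:
        # Yield the edges leaving node_id, optionally filtered by edge label.
        for edge in self.edges.values():
            if edge.source == node_id and (label is None or edge.label == label):
                yield edge


# A tiny example graph: two labeled nodes and one labeled edge with attributes.
g = PropertyGraph()
g.add_node("n1", "Person", name="Ann")
g.add_node("n2", "Person", name="Bob")
g.add_edge("e1", "n1", "n2", "KNOWS", since=2015)
```

Keeping edges in a dictionary keyed by their own identifier allows several edges with the same label between the same pair of nodes, which a plain adjacency matrix would not.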

The most rewarding conclusion I can draw from this task force is that an important part of the theoretical work done on graph databases over the last three decades is relevant to practice. For instance, while the well-studied regular path queries (RPQs) are not currently integrated into graph query languages (e.g., Cypher), the group has agreed on their importance, and they will surely become part of the standard. In the same way, recent work on graph query languages with data comparisons has helped establish the limits of what the standard should be able to express with them. Finally, the work done by Mendelzon and Wood in the late 1980s on the high cost of interpreting RPQs under a simple-path semantics has given us good arguments for discarding that interpretation (which is often the one developers prefer), prompting us to look for more efficient, yet practically relevant, versions of it.
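As an illustration of why the choice of semantics matters, the sketch below evaluates an RPQ under arbitrary-path semantics, where a path may revisit nodes, rather than under the simple-path semantics discussed above. It uses the textbook product construction between the graph and a finite automaton for the regular expression, which runs in time polynomial in the size of both. This is not the semantics chosen by the group. The function name eval_rpq, the hand-written automaton for the expression knows+, and the example edges are all assumptions made for this example.

```python
from collections import deque


def eval_rpq(edges, delta, start_state, accepting, source):
    """Return the nodes reachable from `source` by some path (nodes may
    repeat) whose sequence of edge labels is accepted by the automaton.

    edges:  iterable of (node, label, node) triples
    delta:  dict mapping (state, label) to a set of successor states
    """
    adjacency = {}
    for u, label, v in edges:
        adjacency.setdefault(u, []).append((label, v))

    # Breadth-first search over (graph node, automaton state) pairs.
    start = (source, start_state)
    seen = {start}
    queue = deque([start])
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, next_node in adjacency.get(node, []):
            for next_state in delta.get((state, label), ()):
                pair = (next_node, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


# Hypothetical data: a cycle of "knows" edges plus one "likes" edge.
edges = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "knows", "a"),
    ("c", "likes", "d"),
]

# Hand-written automaton for the expression knows+ (one or more "knows").
delta = {(0, "knows"): {1}, (1, "knows"): {1}}

reachable = eval_rpq(edges, delta, start_state=0, accepting={1}, source="a")
```

Each (node, state) pair is visited at most once, so the search stays polynomial. Requiring that the witnessing path never repeat a node would rule out this kind of bookkeeping, because the search would then have to remember the whole path rather than only its current endpoint and automaton state.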

Submitted by Jan Van den Bussche on Thu, 09/01/2016 - 14:16

I would be interested in hearing how the graph query language standardization efforts relate or compare to the JSON query language standardization efforts, which these days go under the name "SQL++".