Regular path query

From HandWiki

In databases and specifically in graph databases, a regular path query[1] or RPQ is a query asking for pairs of endpoints in the database that are connected by a path satisfying a certain regular expression. A similar feature exists in the SPARQL query language as "property paths"

Definition

A graph database consists of a directed graph whose edges carry a label. A regular path query is just a regular expression over the set of labels. For instance, in a graph database where vertices represent users and there is an edge label "parent" for edges from a parent to a child, the regular path query [math]\displaystyle{ \text{parent} \text{parent}^* }[/math] would select pairs of a node x and a descendant y of x, with a path from x to y of "parent" edges having length 1 or more.

Semantics

The answers to RPQs can consist of endpoint pairs, i.e., pairs of nodes x and y that are connected by some path satisfying the regular expression; or it can consist of the list of all paths satisfying the regular expression. However, this set of paths is generally infinite.

To ensure that the number of results is not infinite, the semantics of RPQs is sometimes defined to return only the simple paths, i.e., the paths that do not go twice via the same vertex; or the trails, i.e., the paths that do not go twice through the same edge.[2]

Complexity

The evaluation of regular path queries (RPQ), in the sense of returning all endpoint pairs, can be performed in polynomial time. To do this, for every endpoint pair, we can see the graph database as a finite automaton, also represent the regular path query as a finite automaton, and check if a suitable path exists by checking that the intersection of both languages is nonempty (i.e., solving the emptiness problem), for instance via the product automaton construction.

Other problems

Several classical problems about queries have been studied for regular path queries, such as query containment[3] and query rewriting.[4]

Extensions

Database theory research has investigated more expressive variants of RPQs:

  • Two-way RPQs aka 2RPQs, which can also traverse edges in the reverse direction. More precisely, a 2RPQ is a regular expression that uses the labels of the graph together with labels corresponding to reverse edges. For instance, the RPQ [math]\displaystyle{ \text{parent}^- \text{parent} }[/math] selects pairs of nodes x and y with a path from x to y going first backward on a parent edge, then forward on a parent edge, i.e., x and y are siblings.
  • Conjunctive regular path queries aka CRPQ, which are conjunctive queries whose atoms are RPQs. Such queries make it possible to test for more complex patterns than just paths, however they are intractable to evaluate.
  • A further extension allowing both disjunctions (like union of conjunctive queries) and two-way expressions are UC2RPQs.[5]

References

  1. "Answering regular path queries using views | IEEE Conference Publication | IEEE Xplore". https://ieeexplore.ieee.org/abstract/document/839439. 
  2. Martens, Wim; Trautner, Tina (2019-10-15). "Dichotomies for Evaluating Simple Regular Path Queries". ACM Transactions on Database Systems 44 (4): 16:1–16:46. doi:10.1145/3331446. ISSN 0362-5915. https://doi.org/10.1145/3331446. 
  3. Calvanese, D.; De Giacomo, G.; Lenzerini, M.; Vardi, M. Y. (2003-12-01). "Reasoning on regular path queries". ACM SIGMOD Record 32 (4): 83–92. doi:10.1145/959060.959076. ISSN 0163-5808. https://doi.org/10.1145/959060.959076. 
  4. Calvanese, Diego; De Giacomo, Giuseppe; Lenzerini, Maurizio; Vardi, Moshe Y. (1999-05-01). "Rewriting of regular expressions and regular path queries". Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. PODS '99 (New York, NY, USA: Association for Computing Machinery): 194–204. doi:10.1145/303976.303996. ISBN 978-1-58113-062-1. https://dl.acm.org/doi/10.1145/303976.303996. 
  5. Figueira, Diego; Morvan, Rémi (November 2023). Semantic Tree-Width and Path-Width of Conjunctive Regular Path Queries. https://hal.science/hal-04311290.