Reverse data management

From HandWiki

Reverse data management describes a branch and set of research questions in relational database theory that aim to reverse the common focus of standard data management. Instead of focusing on the "forward" transformation of an input databases (a set of relational tables) to an output table, which is the main focus of standard query evaluation, reverse data management reverses that focus and studies the possible input database transformations that would achieve a desired output. Usually the objective is to find an intervention (a deletion, addition, or change of tuples) of minimal size, in order to achieve a particular change in the output.

The problem has been studied at least since the 1980s,[1] but has received renewed attention due to an influential paper in the early 2000s[2] that made a connection between provenance and view propagation. The term was coined in a VLDB 2011 vision paper.[3] The problem has been receiving significant attention in recent years due to its connection to computational fairness.

Topics reverse data management problems

Example topics in reverse data management include:

  1. Deletion propagation with source side-effects: Find a minimal number of tuples to delete in the database in order to delete a particular tuple in the output.[1][2]
  2. Deletion propagation with view side-effects: Find a set of tuples to delete in the database in order to delete a particular tuple in the output, while removing the minimal number of other output tuples.[2][4]
  3. Causal responsibility: Find a minimal number of tuples to delete in the database in order to make a particular input tuple counterfactual.[5] This notion is inspired by the notions of actual cause and causal responsibility from the work of Halpern and Pearl [6][7]
  4. Resilience: Find a minimal number of tuples to delete in the database in order to make a Boolean query false. The complexity of this problem is identical to the problem of deletion propagation with source-side effects over a different database.[8]
  5. Smallest witness problem: Find a minimal number of tuples to keep in the a database (or equivalently, delete a maximal number of tuples) while keeping a particular tuple in the output.[9]
  6. Minimum repair: Given a database that violates certain integrity constraints, find a minimal number of tuples to delete in the database in order to fulfill all constraints (also called to "repair" the database).[10]

References

  1. 1.0 1.1 Dayal, Bernstein. On the correct translation of update operations on relational views, TODS 1982, https://doi.org/10.1145/319732.319740
  2. 2.0 2.1 2.2 Buneman, Khanna, Tan. On propagation of deletions and annotations through views, PODS 2002, https://doi.org/10.1145/543613.543633
  3. Meliou, Gatterbauer, Suciu. Reverse Data Management. VLDB 2011 Vision track, https://doi.org/10.14778/3402755.3402803
  4. Kimelfeld, Vondrak, Williams. Maximizing conjunctive views in deletion propagation, PODS 2011, https://doi.org/10.1145/1989284.1989308
  5. Meliou, Gatterbauer, Moore, Suciu. The complexity of causality and responsibility for query answers and non-answers, VLDB 2010. https://doi.org/10.14778/1880172.1880176
  6. Joe Y. Halpern and Judea Pearl. Causes and explanations: A structural-model approach. Part I: Causes. Brit. J. Phil. Sci., 56:843–887, 2005.https://doi.org/10.1093/bjps/axi147
  7. H. Chockler and J. Y. Halpern. Responsibility and blame: A structural-model approach. J. Artif. Intell. Res. (JAIR), 22:93–115, 2004.https://doi.org/10.1613/jair.1391
  8. Freire, Gatterbauer, Immerman, Meliou. The complexity of resilience and responsibility for self-join-free conjunctive queries, VLDB 2015. https://doi.org/10.14778/2850583.2850592
  9. Miao, Roy, Yang. Explaining Wrong Queries Using Small Examples. SIGMOD 2019. https://doi.org/10.1145/3299869.3319866
  10. Marcelo Arenas, Leopoldo Bertossi, Jan Chomicki. Consistent Query Answers in Inconsistent Databases. PODS 1999.https://doi.org/10.1145/303976.303983