Lossless join decomposition

From HandWiki

In database design, a lossless join decomposition is a decomposition of a relation [math]\displaystyle{ R }[/math] into relations [math]\displaystyle{ R_1, R_2 }[/math] such that a natural join of the two smaller relations yields back the original relation. This is central in removing redundancy safely from databases while preserving the original data.[1]

Criteria

Lossless join can also be called nonadditive.[2]

If [math]\displaystyle{ R }[/math] is split into [math]\displaystyle{ R_1 }[/math] and [math]\displaystyle{ R_2 }[/math], for this decomposition to be lossless (i.e., [math]\displaystyle{ R_1 \bowtie R_2 = R }[/math]) then at least one of the two following criteria should be met.

Check 1: Verify join explicitly

Projecting on [math]\displaystyle{ R_1 }[/math] and [math]\displaystyle{ R_2 }[/math], and joining them back, results in the relation you started with.[3][unreliable source?]

Check 2: Via functional dependencies

Let [math]\displaystyle{ R }[/math] be a relation schema.

Let F be a set of functional dependencies on [math]\displaystyle{ R }[/math].

Let [math]\displaystyle{ R_1 }[/math] and [math]\displaystyle{ R_2 }[/math] form a decomposition of [math]\displaystyle{ R }[/math] .

The decomposition is lossless if one of the sub-relations (i.e. [math]\displaystyle{ R_1 }[/math] or [math]\displaystyle{ R_2 }[/math]) is a subset of the closure of their intersection. In other words, the decomposition is a lossless-join decomposition of [math]\displaystyle{ R }[/math] if at least one of the following functional dependencies are in F+ (where F+ stands for the closure for every attribute or attribute sets in F):[4]

  • [math]\displaystyle{ R_1 \cap R_2 \rightarrow R_1 }[/math]
  • [math]\displaystyle{ R_1 \cap R_2 \rightarrow R_2 }[/math]

Examples

  • Let [math]\displaystyle{ R = (A, B, C, D) }[/math] be the relation schema, with attributes A, B, C and D.
  • Let [math]\displaystyle{ F = \{ A \rightarrow BC \} }[/math] be the set of functional dependencies.
  • Decomposition into [math]\displaystyle{ R_1 = (A, B, C) }[/math] and [math]\displaystyle{ R_2 = (A, D) }[/math] is lossless under F because [math]\displaystyle{ R_1 \cap R_2 = A) }[/math]. A is a superkey in [math]\displaystyle{ R_1 }[/math], meaning we have a functional dependency [math]\displaystyle{ \{A \rightarrow BC\} }[/math].  In other words, now we have proven that [math]\displaystyle{ (R_1 \cap R_2 \rightarrow R_1) \in F^+ }[/math].

[5][6]

References

  1. Pohler, K (2015). "Lossless-Join Decomposition: applications in quantitative computing metrics". International Journal of Applied Computer Science 21 (4): 190–212. 
  2. Elmasri, Ramez (2016). Fundamentals of database systems (Seventh ed.). Hoboken, NJ: Pearson. p. 461. ISBN 978-0133970777. 
  3. "Lossless Join Property". https://stackoverflow.com/questions/5771810/lossless-join-property. 
  4. "Lossless Join Decomposition". University at Buffalo (Jan Chomicki). http://www.cse.buffalo.edu/~chomicki/560/handout-design.pdf. 
  5. "Lossless-Join Decomposition". http://www.cs.sfu.ca/CourseCentral/354/zaiane/material/notes/Chapter7/node7.html. 
  6. "www.data-e-education.com - Lossless Join Decomposition". http://www.data-e-education.com/E121_Lossless_Join_Decomposition.html.