Imieliński-Lipski algebra
An Imieliński-Lipski algebras is an extension of relational algebra onto tables with different types of null values. It is used to operate on relations with incomplete information. Imieliński-Lipski algebras are defined to satisfy precise conditions for semantically meaningful extension of the usual relational operators, such as projection, selection, union, and join, from operators on relations to operators on relations with various kinds of "null values".
These conditions require that the system be safe in the sense that no incorrect conclusion is derivable by using a specified subset F of the relational operators; and that it be complete in the sense that all valid conclusions expressible by relational expressions using operators in F are in fact derivable in this system. For example, it is well known that the 3-valued logic approach to deal with null values, supported treatment of nulls values by SQL is not complete, see Ullman book.[1] To show this, let T be:
NAME | CLASS | GRADE | SEMESTER |
---|---|---|---|
Rohit | Databases | B | Spring |
Igor | Networks | A | Template:NULL |
Take SQL query Q
Select NAME From T Where (CLASS = 'Networks' AND SEMESTER = 'Spring') OR (GRADE = 'A' AND SEMESTER <> 'Spring')
SQL query Q will return empty set (no results) under 3-valued semantics currently adopted by all variants of SQL. This is the case because in SQL, NULL is never equal to any constant – in this case, neither to “Spring” nor “Fall” nor “Winter” (if there is Winter semester in this school). NULL='Spring'
will evaluate to MAYBE and so will NULL='Fall'
. The disjunction MAYBE OR MAYBE evaluates to MAYBE (not TRUE). Thus Igor will not be part of the answer (and of course neither will Rohit). But Igor should be returned as the answer.
Indeed, regardless what semester Igor took the Networks class (no matter what was the unknown value of NULL), the selection condition will be true. This “Igor” will be missed by SQL and the SQL answer won’t be complete according to completeness requirements specified in Tomasz Imieliński, Witold Lipski, 'Incomplete Information in Relational Databases'.[2] It is also argued there that 3-valued logic (TRUE, FALSE, MAYBE) can never provide guarantee of complete answer for tables with incomplete information.
Three algebras which satisfy conditions of safety and completeness are defined as Imielinski-Lipski algebras: the Codd-Tables algebra, the V-tables algebra and the Conditional tables (C-tables) algebra.
Codd-tables Algebra
Codd-tables Algebra is based on the usual Codd's singe NULL values. The table T above is an example of Codd-table. Codd-table algebra supports projection and positive selections only. It is also demonstrated in [IL84 that it is not possible to correctly extend more relational operators over Codd-Tables. For example, such basic operation as join is not extendable over Codd-tables. It is not possible to define selections with Boolean conditions involving negation and preserve completeness. For example, queries like the above query Q cannot be supported. In order to be able to extend more relational operators, more expressive form of null value representation is needed in tables which are called V-table.
V-Tables Algebra
V-tables algebra is based on many different ("marked") null values or variables allowed to appear in a table. V-tables allow to show that a value may be unknown but the same for different tuples. For example, in the table below Gaurav and Igor order the same (but unknown) beer in two unknown bars (which may, or may not be different – but remain unknown). Gaurav and Jane frequent the same unknown bad (Y1). Thus, instead one NULL value, we use indexed variables, or Skolem constants .[3]
DRINKER | BEER | BAR |
---|---|---|
Zhihan | Heineken | Cabana |
Gaurav | X1 | Y1 |
Igor | X1 | Y2 |
Jane | Bud | Y1 |
John | X2 | Y3 |
V-Tables algebra is shown to correctly support projection, positive selection (with no negation occurring in the selection condition), union, and renaming of attributes, which allows for processing arbitrary conjunctive queries. A very desirable property enjoyed by the V-table algebra is that all relational operators on tables are performed in exactly the same way as in the case of the usual relations.
Conditional Tables (C-Tables) Algebra
Example of conditional table (c-table) is shown below.
NAME | CLASS | GRADE | SEMESTER | con |
---|---|---|---|---|
Rohit | Databases | B | Spring | true |
Igor | Networks | A | x | x = 'Spring' |
Igor | Networks | A | x | x <> 'Spring' |
It has additional column “con” which is a Boolean condition involving variables, null values – same as in V-tables.
Select * From T Where (CLASS = 'Networks' AND SEMESTER = 'Spring') OR (GRADE = 'A' AND SEMESTER <> 'Spring')
over the following table c-table
NAME | CLASS | GRADE | SEMESTER | con |
---|---|---|---|---|
Rohit | Databases | B | Spring | true |
Igor | Networks | A | x | true |
Conditional Tables algebra, mainly of theoretical interest, supports projection, selection, union, join, and renaming. Under closed world assumption, it can also handle the operator of difference, thus it can support all relational operators.
History
Imieliński-Lipski algebras were introduced by Tomasz Imieliński and Witold Lipski Jr. in 'Incomplete Information in Relational Databases' [2]
References
- ↑ J.D. Ullman (1982). Principles of Database Systems (2nd ed.). Computer Science Press, Potomac, MD..
- ↑ 2.0 2.1 Imieliński, T.; Lipski Jr., W. (1984). "Incomplete information in relational databases". Journal of the ACM 31 (4): 761–791. doi:10.1145/1634.1886. https://scholar.google.com/citations?view_op=view_citation&hl=en&user=fEYp6hEAAAAJ&citation_for_view=fEYp6hEAAAAJ:UxriW0iASnsC.
- ↑ Skolem constants, http://www.sds.lcs.mit.edu/spd/larch/LP/glossary/skolem.html
Further reading
- Abiteboul, S.; Kanellakis, P.; Grahne, G. (1991). "On the representation and querying of sets of possible worlds". Theoretical Computer Science 78 (1): 159–187. doi:10.1016/0304-3975(51)90007-2. http://users.encs.concordia.ca/~grahne/papers/sigmod87.pdf.
- Abiteboul, S.; Duschka, O.M. (1998). "Complexity of Answering Queries Using Materialized Views". Proc. ACM SIGMOD-SIGACT-SIGART, PODS: 254–263. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.2956&rep=rep1&type=pdf.
- Green, T.J.; Karvounarakis, G.; Tannen, Val (2007). "Provenance Semiring". Proc. ACM SIGMOD-SIGACT-SIGART, PODS: 31–40. http://repository.upenn.edu/cgi/viewcontent.cgi?article=1022&context=db_research.
- Karvounarakis, G.; Green, T.J. (2012). "Semiring-Annotated Data: Queries and Provenance". ACM SIGMOD 41 (3): 5–14. doi:10.1145/2380776.2380778. https://users.dcc.uchile.cl/~pbarcelo/KG.pdf.
- T.J. Green (2009). Models for Incomplete and Probabilistic Information; Chapter 2, in Managing and Mining Uncertain Data. Springer Link.
- Grahne, G.; Kiricenko, V. (Nov 2004). "Towards an algebraic theory of information integration". Information and Computation 194 (2): 79–100. doi:10.1016/j.ic.2004.07.003. http://users.encs.concordia.ca/~grahne/papers/take5.pdf.