Quality of Data


Quality of Data (QoD) is a designation coined by L. Veiga that specifies and describes the Quality of Service required of a distributed storage system from the point of view of the consistency of its data. It can be used to support big data management frameworks, workflow management, and HPC systems (mainly for data replication and consistency). It takes data semantics into account, namely the Time interval of data freshness, the Sequence (the tolerable number of outstanding versions of the data read before a refresh), and the Value divergence allowed before displaying it. It was initially based on a model from existing research on Vector-Field Consistency,[1] which was awarded the best-paper prize at the ACM/IFIP/USENIX Middleware Conference 2007, and it was later enhanced for increased scalability and fault tolerance.[2]

This consistency model has been successfully applied to the big data key/value store Apache HBase,[3] where it was initially designed as a middleware[4] module sitting between clusters in separate data centres. The HBase-QoD coupling[5] minimises bandwidth usage and optimises resource allocation during replication, achieving the desired consistency level at a finer granularity.

QoD is defined by the three dimensions (time, sequence, and value) of the vector k = (θ, σ, ν), but it takes a broader view of the issue and is also applicable to large-scale data management techniques with regard to their timely delivery.[6]
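As an illustration only, the three bounds of the vector can be read as a per-item replication trigger. The following minimal Python sketch assumes that an update must be propagated as soon as any one of the three bounds is exceeded, and that value divergence is measured as an absolute numeric difference; the names QoDBound, ReplicaState and must_replicate are hypothetical and are not taken from the HBase-QoD implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoDBound:
    """Hypothetical container for the vector k = (theta, sigma, nu)."""
    theta: float  # time: max seconds a replica may go without a refresh
    sigma: int    # sequence: max number of updates not yet applied to the replica
    nu: float     # value: max tolerated divergence (assumed absolute difference)


@dataclass
class ReplicaState:
    """Hypothetical bookkeeping for one replicated data item."""
    seconds_since_sync: float
    pending_updates: int
    replica_value: float
    primary_value: float


def must_replicate(state: ReplicaState, bound: QoDBound) -> bool:
    """Return True if any of the three bounds is exceeded (assumed semantics)."""
    divergence = abs(state.primary_value - state.replica_value)
    return (
        state.seconds_since_sync > bound.theta
        or state.pending_updates > bound.sigma
        or divergence > bound.nu
    )


# Example: tolerate 60 s of staleness, 5 outstanding versions, a difference of 10.
bound = QoDBound(theta=60.0, sigma=5, nu=10.0)
fresh = ReplicaState(seconds_since_sync=12.0, pending_updates=2,
                     replica_value=100.0, primary_value=104.0)
stale = ReplicaState(seconds_since_sync=12.0, pending_updates=6,
                     replica_value=100.0, primary_value=104.0)
```

Under these assumptions, updates to the first item can still be held back to save bandwidth, while the second item has exceeded its sequence bound and would have to be shipped to the remote cluster.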

Other descriptions

Quality of Data should not be confused with other definitions of data quality, such as completeness, validity, and accuracy.[7][8]

References