VACUUM

From HandWiki

VACUUM[1][2][3][4] is a set of normative principles for ensuring the quality of structured training and test datasets in data science and machine learning. The garbage in, garbage out principle explains why data quality matters but does not say how to achieve it. Unlike most of the ad hoc data quality assessment metrics often used by practitioners,[5] VACUUM specifies qualitative principles for data quality management and serves as a basis for defining more detailed quantitative metrics of data quality.[6]

VACUUM is an acronym that stands for:

  • valid
  • accurate
  • consistent
  • uniform
  • unified
  • model
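
The article does not say how any of these principles is measured. As a rough illustration of how a qualitative principle might be turned into a quantitative metric, the following Python sketch assumes simple readings for three of them: "valid" as each value passing a range rule, "consistent" as repeated records for one key agreeing with each other, and "uniform" as a column using a single unit. The records, field names, thresholds, and function names are all hypothetical and are not taken from the VACUUM sources.

```python
from collections import defaultdict

# Hypothetical toy records; field names and rules are illustrative only.
records = [
    {"id": 1, "age": 34, "height": 180.0, "height_unit": "cm"},
    {"id": 2, "age": -5, "height": 165.0, "height_unit": "cm"},
    {"id": 3, "age": 51, "height": 70.0, "height_unit": "in"},
    {"id": 3, "age": 52, "height": 70.0, "height_unit": "in"},
]


def valid_fraction(rows):
    """Share of rows whose 'age' is an int in an assumed range of 0 to 120."""
    ok = sum(1 for r in rows if isinstance(r["age"], int) and 0 <= r["age"] <= 120)
    return ok / len(rows)


def consistent_fraction(rows):
    """Share of distinct ids whose repeated rows agree on every field."""
    by_id = defaultdict(list)
    for r in rows:
        by_id[r["id"]].append(r)
    agree = sum(1 for group in by_id.values() if all(g == group[0] for g in group))
    return agree / len(by_id)


def uniform_fraction(rows):
    """Share of rows that use the most common unit for 'height'."""
    counts = defaultdict(int)
    for r in rows:
        counts[r["height_unit"]] += 1
    return max(counts.values()) / len(rows)


if __name__ == "__main__":
    print("valid:", valid_fraction(records))
    print("consistent:", consistent_fraction(records))
    print("uniform:", uniform_fraction(records))
```

Each function returns a score between 0 and 1, so the scores can be tracked per dataset or compared against a threshold. Other readings of the principles would lead to different metrics.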

References

  1. https://books.google.com/books?id=XPBbEAAAQBAJ&q=VACUUM
  2. "VACUUM". https://www.enterprisedb.com/edb-docs/d/postgresql/reference/manual/12.3/sql-vacuum.html. 
  3. Jim Nasby (2015), All the Dirt on VACUUM, PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross, https://av.tib.eu/media/19118, retrieved 2021-04-27 
  4. "An Overview of VACUUM Processing in PostgreSQL" (in en). 2019-11-22. https://severalnines.com/database-blog/overview-vacuum-processing-postgresql. 
  5. Pipino, Leo L.; Lee, Yang W.; Wang, Richard Y. (2002-04-01). "Data quality assessment". Communications of the ACM 45 (4): 211–218. doi:10.1145/505248.506010. ISSN 0001-0782. https://doi.org/10.1145/505248.506010. 
  6. Wang, R.Y.; Storey, V.C.; Firth, C.P. (August 1995). "A framework for analysis of data quality research". IEEE Transactions on Knowledge and Data Engineering 7 (4): 623–640. doi:10.1109/69.404034. https://ieeexplore.ieee.org/document/404034.