Data-centric AI
Data-centric AI is an approach within artificial intelligence that emphasizes improving the quality, consistency, and representativeness of the data used to train machine learning models, rather than focusing primarily on optimizing model architectures or algorithms.[1] The idea has gained traction as researchers and practitioners have come to believe that many performance limitations of machine learning systems stem from issues such as noisy labels, biased datasets, and gaps in data coverage.[2] Data-centric AI involves a disciplined approach to data cleaning, augmentation, labeling, and governance, with the aim of improving model performance and reliability in applications such as computer vision, natural language processing, and other domains.[3][4][5][6]
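As a rough illustration of the kind of data check this approach encourages, the following minimal Python sketch flags inputs that appear with conflicting labels (one simple symptom of label noise) and counts examples per label as a crude view of coverage. The function names, the normalization rule, and the toy dataset are assumptions made for this example only; they are not drawn from the cited sources or from any particular library.

```python
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so near-identical inputs compare equal."""
    return " ".join(text.lower().split())


def find_label_conflicts(examples):
    """Return normalized inputs that occur with more than one distinct label."""
    labels_by_text = defaultdict(set)
    for text, label in examples:
        labels_by_text[normalize(text)].add(label)
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


def label_distribution(examples):
    """Count examples per label as a rough check on class coverage."""
    return Counter(label for _, label in examples)


# Hypothetical toy dataset of (text, label) pairs.
examples = [
    ("Great product", "positive"),
    ("great   product", "negative"),  # same input, conflicting label
    ("Terrible service", "negative"),
    ("Okay I guess", "neutral"),
]

conflicts = find_label_conflicts(examples)
distribution = label_distribution(examples)

print(conflicts)
print(distribution)
```

In practice, flagged items would typically be sent back for relabeling or review rather than dropped automatically, and exact-match grouping is only a starting point; statistical methods such as confident learning address label noise that does not show up as literal duplicates.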
See also
- Artificial intelligence
- Machine learning
- Data preprocessing
- Training data
- Data quality
- Feature engineering
- MLOps
- Data governance
References
- Ng, Andrew (2021). "MLOps: From Model-centric to Data-centric AI". https://www.deeplearning.ai/the-batch/data-centric-ai-development-part-2/.
- Sambasivan, Nithya; Kapania, Shubham; Highfill, Hannah; Akrong, Danaë; Paritosh, Praveen; Aroyo, Lora (2021). "'Everyone wants to do the model work, not the data work': Data Cascades in High-Stakes AI". Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. pp. 1–15. doi:10.1145/3411764.3445518. ISBN 978-1-4503-8096-6.
- Zaharia, Matei (2021). "The Rise of Data-centric AI". https://databricks.com/blog/2021/07/19/the-rise-of-data-centric-ai.html.
- Polyzotis, Neoklis; Roy, Sudip; Whang, Steven Euijong; Zinkevich, Martin (2017). "Data Management Challenges in Production Machine Learning". doi:10.1145/3035918.3054782.
- Halevy, Alon; Norvig, Peter; Pereira, Fernando (2009). "The Unreasonable Effectiveness of Data". IEEE Intelligent Systems 24 (2): 8–12. doi:10.1109/MIS.2009.36. Bibcode: 2009IISys..24b...8H.
- Northcutt, Curtis G.; Jiang, Lu; Chuang, Isaac L. (2021). "Confident Learning: Estimating Uncertainty in Dataset Labels". Journal of Artificial Intelligence Research 70: 1373–1411. doi:10.1613/jair.1.12125.
