Data mesh

Short description: Distributed architecture framework for data management

Data mesh is a sociotechnical approach to building a decentralized data architecture using a domain-oriented, self-serve design. From a software development perspective, it borrows from Eric Evans’ theory of domain-driven design[1] and from Manuel Pais’ and Matthew Skelton’s theory of team topologies.[2] Data mesh concerns itself mainly with the data itself, treating the data lake and the pipelines as secondary concerns.[3] Its main proposition is to scale analytical data through domain-oriented decentralization.[4] With data mesh, the responsibility for analytical data shifts from the central data team to the domain teams, supported by a data platform team that provides a domain-agnostic data platform.[5]

History

The term data mesh was first defined in 2019 by Zhamak Dehghani[6] while she was working as a principal consultant at the technology company Thoughtworks.[7][8] She then provided greater detail on its principles and logical architecture throughout 2020. The approach was predicted to be a “big contender” for companies in 2022.[9][10] Data meshes have been implemented by companies including Zalando,[11] Netflix,[12] Intuit,[13] VistaPrint, and PayPal,[14] among others.

In 2022, Dehghani left Thoughtworks to found Nextdata Technologies to focus on decentralized data.[15]

Principles

Data mesh is based on four core principles:[16]

  • Domain ownership
  • Data as a product[17]
  • Self-serve data platform
  • Federated computational governance

In addition to these principles, Dehghani writes that the data products created by each domain team should be discoverable, addressable, trustworthy, possess self-describing semantics and syntax, be interoperable, secure, and governed by global standards and access controls.[18] In other words, the data should be treated as a product that is ready to use and reliable.[19]
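The attributes listed above can be made concrete as metadata that each domain team publishes alongside its data product. The following is a minimal sketch in Python, not a format defined by Dehghani or by any data mesh tool: the class DataProductDescriptor, its fields, and the function missing_attributes are hypothetical names chosen for illustration, and each check is only a rough proxy for the attribute it is named after.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(frozen=True)
class DataProductDescriptor:
    """Hypothetical metadata record published by a domain team."""
    name: str
    domain: str
    owner: str                     # accountable domain team or person
    address: str                   # stable URI where the product is served
    description: str = ""          # human-readable summary for a catalog
    schema: dict[str, str] = field(default_factory=dict)  # column -> type
    access_policy: str = ""        # e.g. "internal" or "restricted"


def missing_attributes(product: DataProductDescriptor) -> list[str]:
    """Return the attributes (as rough proxies) the descriptor does not meet."""
    missing = []
    if not product.description:
        missing.append("discoverable")       # nothing to show in a catalog
    parsed = urlparse(product.address)
    if not (parsed.scheme and parsed.netloc):
        missing.append("addressable")        # no resolvable URI
    if not product.schema:
        missing.append("self-describing")    # no declared syntax
    if not product.access_policy:
        missing.append("secure")             # no access control declared
    return missing
```

A platform team could run such a check automatically whenever a domain registers a product, which is one possible reading of "computational" in federated computational governance.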

Data mesh in practice

After its introduction in 2019,[6] multiple companies started to implement a data mesh[11][13][14] and to share their experiences. Challenges (C) and best practices (BP) for practitioners include:

C1. Federated data governance
Companies report difficulty adopting a federated governance structure for activities and processes that were previously centrally owned and enforced. This is especially true for security, privacy, and regulatory topics.[20][21][22]
C2. Responsibility shift
In a data mesh, individuals within domains are responsible for data products end to end. This new responsibility can be challenging because it is rarely compensated and usually benefits other domains.[20][21]
C3. Comprehension
Research has shown a severe lack of comprehension of the data mesh paradigm among employees of companies implementing a data mesh.[20]
BP1. Cross-domain unit
Addressing C1, organizations should introduce a cross-domain steering unit responsible for strategic planning, use case prioritization, and the enforcement of specific governance rules, especially those concerning security, regulatory, and privacy-related topics. Nevertheless, a cross-domain steering unit can only complement and support the federated governance structure, and it may become obsolete as the data mesh matures.[20][23]
BP2. Track and observe
Addressing C2, organizations should observe and score data product quality, because tracking and ranking key data products can encourage high-quality offerings, motivate domain owners, and support budget negotiations.[20]
BP3. Conscious adoption
Organizations should thoroughly assess their existing data systems, consider organizational factors, and weigh the potential benefits before implementing a data mesh. Addressing C3, organizations should introduce data mesh terminology carefully and deliberately so that the concept is clearly understood.[20]
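The scoring and ranking described in BP2 could be sketched as follows. This is an illustrative example only: the cited study does not prescribe particular metrics or weights, so the three metrics, the WEIGHTS values, and the names QualityMetrics, quality_score, and rank_products are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityMetrics:
    """Hypothetical per-product measurements, each in the range 0..1."""
    product: str
    completeness: float   # share of required fields that are populated
    freshness: float      # share of updates delivered on schedule
    availability: float   # share of time the product was reachable


# Assumed weights; an organization would agree on these across domains.
WEIGHTS = {"completeness": 0.4, "freshness": 0.3, "availability": 0.3}


def quality_score(metrics: QualityMetrics) -> float:
    """Weighted sum of the individual metrics."""
    return sum(weight * getattr(metrics, name) for name, weight in WEIGHTS.items())


def rank_products(all_metrics: list[QualityMetrics]) -> list[tuple[str, float]]:
    """Return (product, score) pairs, best score first."""
    scored = [(m.product, round(quality_score(m), 3)) for m in all_metrics]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Publishing such a ranking across domains is one way the tracking in BP2 might be made visible to domain owners and used in budget discussions.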

Community

Scott Hirleman started a data mesh community whose Slack channel has over 7,500 members.[24]

See also

  • Data management
  • Data platform
  • Data vault modeling, method of data modeling with storage of data from various operational systems and tracing of data origin, facilitating auditing, loading speeds and resilience
  • Data warehouse, a well-established type of database system for organizing data in a thematic way
  • ETL and ELT

References

  1. Evans, Eric (2004). Domain-driven design : tackling complexity in the heart of software. Boston: Addison-Wesley. ISBN 0-321-12521-5. OCLC 52134890. https://www.worldcat.org/oclc/52134890. 
  2. Skelton, Matthew (2019). Team topologies : organizing business and technology teams for fast flow. Manuel Pais. Portland, OR. ISBN 978-1-942788-84-3. OCLC 1108538721. https://www.worldcat.org/oclc/1108538721. 
  3. Machado, Inês Araújo; Costa, Carlos; Santos, Maribel Yasmina (2022-01-01). "Data Mesh: Concepts and Principles of a Paradigm Shift in Data Architectures" (in en). Procedia Computer Science. International Conference on ENTERprise Information Systems / ProjMAN - International Conference on Project MANagement / HCist - International Conference on Health and Social Care Information Systems and Technologies 2021 196: 263–271. doi:10.1016/j.procs.2021.12.013. ISSN 1877-0509. 
  4. "Data Mesh Architecture" (in en). https://datamesh-architecture.com/. 
  5. Dehghani, Zhamak (2022). Data Mesh. Sebastopol, CA. ISBN 978-1-4920-9236-0. OCLC 1260236796. https://www.worldcat.org/oclc/1260236796. 
  6. "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh". martinfowler.com. https://martinfowler.com/articles/data-monolith-to-mesh.html. 
  7. Baer (dbInsight), Tony. "Data Mesh: Should you try this at home?" (in en). https://www.zdnet.com/article/data-mesh-should-you-try-this-at-home/. 
  8. Andy Mott (2022-01-12). "Driving Faster Insights with a Data Mesh" (in en-US). https://www.rtinsights.com/driving-faster-insights-with-a-data-mesh/. 
  9. "Developments that will define data governance and operational security in 2022" (in en-US). 2021-12-28. https://www.helpnetsecurity.com/2021/12/28/data-governance-2022/. 
  10. Bane, Andy. "Council Post: Where Is Industrial Transformation Headed In 2022?" (in en). https://www.forbes.com/sites/forbestechcouncil/2022/01/13/where-is-industrial-transformation-headed-in-2022/. 
  11. Schultze, Max; Wider, Arif (2021). Data Mesh in Practice. ISBN 978-1-09-810849-6. 
  12. (in en) Netflix Data Mesh: Composable Data Processing - Justin Cunningham, https://www.youtube.com/watch?v=TO_IiN06jJ4, retrieved 2022-04-29 
  13. Baker, Tristan (2021-02-22). "Intuit's Data Mesh Strategy" (in en). https://medium.com/intuit-engineering/intuits-data-mesh-strategy-778e3edaa017. 
  14. "The next generation of Data Platforms is the Data Mesh" (in en-US). 2022-08-03. https://medium.com/paypal-tech/the-next-generation-of-data-platforms-is-the-data-mesh-b7df4b825522/. 
  15. "Why We Started Nextdata" (in en-US). 2022-01-16. https://medium.com/@zhamakd/why-we-started-nextdata-dd30b8528fca/. 
  16. Dehghani, Zhamak (2022). Data Mesh. Sebastopol, CA. ISBN 978-1-4920-9236-0. OCLC 1260236796. https://www.worldcat.org/oclc/1260236796. 
  17. "Data Mesh defined | James Serra's Blog" (in en-US). 16 February 2021. https://www.jamesserra.com/archive/2021/02/data-mesh/. 
  18. "Analytics in 2022 Means Mastery of Distributed Data Politics" (in en-US). 2021-12-29. https://thenewstack.io/analytics-in-2022-means-mastery-of-distributed-data-politics/. 
  19. "Developments that will define data governance and operational security in 2022" (in en-US). 2021-12-28. https://www.helpnetsecurity.com/2021/12/28/data-governance-2022/. 
  20. Bode, Jan; Kühl, Niklas; Kreuzberger, Dominik; Hirschl, Sebastian; Holtmann, Carsten (2023-05-04). "Data Mesh: Motivational Factors, Challenges, and Best Practices". arXiv:2302.01713v2 [cs.AI].
  21. Vestues, Kathrine; Hanssen, Geir Kjetil; Mikalsen, Marius; Buan, Thor Aleksander; Conboy, Kieran (2022). "Agile Data Management in NAV: A Case Study". Agile Processes in Software Engineering and Extreme Programming. Lecture Notes in Business Information Processing. 445. Springer. pp. 220–235. doi:10.1007/978-3-031-08169-9_14. ISBN 978-3-031-08168-2. 
  22. Joshi, Divya; Pratik, Sheetal; Rao, Madhu Podila (2021). "Data governance in data mesh infrastructures: The Saxo Bank case study". 21. pp. 599–604. 
  23. Whyte, Martin; Odenkirchen, Andreas; Bautz, Stephan; Heringer, Agnes; Krukow, Oliver (2022). "Data Mesh - Just another buzzword or the next generation data platform?". PwC study 2022: Changing data platforms. https://www.pwc.de/en/digitale-transformation/data-mesh-the-next-generation-enterprise-data-platform.html. 
  24. "The Global Home for Data Mesh" (in en-US). https://datameshlearning.com/.