Software:Modern Data Stack

From HandWiki
Short description: Set of data technologies


The Modern Data Stack (MDS) is a software stack used to implement a data management platform, with a strong focus on scalability, interoperability and ease of use.[1]

The technology that enabled the emergence of the Modern Data Stack was the cloud data warehouse.[1][2] With the broad adoption of Amazon Redshift, the first cloud data warehouse to gain significant traction, an entire ecosystem of technologies emerged. While most technologies in the MDS are available only as software as a service, several can be self-hosted because they are released under open-source licenses.

History

Although the earliest technologies in the MDS were launched as early as 2011 (BigQuery), the term "Modern Data Stack" did not gain broad adoption until 2020, when Fivetran organised the first Modern Data Stack Conference (MDSCON).[3] Coalesce, an analytics engineering conference organised by dbt Labs, the creators of dbt, also has a strong focus on the Modern Data Stack and was likewise inaugurated in 2020.[4]

Layers

A useful way to look at the Modern Data Stack is to group its technologies into functional layers:[5][6][7][8][9][10][11]

  • Storage and Processing
  • Ingestion and Transport
  • Transformation and Modeling
  • Analysis and Output
  • Quality and Observability
  • Discovery and Governance
  • Privacy and Security
  • Orchestration
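
How several of these layers fit together can be illustrated with a minimal sketch. It is not taken from any of the tools listed in this article: Python's built-in sqlite3 module (in memory) stands in for a cloud data warehouse, and every table and function name (raw_orders, customer_revenue, run_pipeline and so on) is a hypothetical chosen for illustration. Raw records are loaded unchanged, a model is derived with SQL inside the storage layer, a simple expectation is checked, and the result is queried.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a cloud data
# warehouse. All table and function names below are hypothetical.


def ingest(conn, rows):
    """Ingestion and Transport: load raw records into storage unchanged."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transformation and Modeling: derive a model from raw data with SQL."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount_cents) AS revenue_cents
        FROM raw_orders
        GROUP BY customer
        """
    )


def check_quality(conn):
    """Quality and Observability: fail early if an expectation is violated."""
    (bad,) = conn.execute(
        "SELECT COUNT(*) FROM customer_revenue "
        "WHERE revenue_cents IS NULL OR revenue_cents < 0"
    ).fetchone()
    if bad:
        raise ValueError(f"{bad} row(s) failed the revenue check")


def analyze(conn):
    """Analysis and Output: query the modeled table."""
    return conn.execute(
        "SELECT customer, revenue_cents FROM customer_revenue "
        "ORDER BY revenue_cents DESC, customer"
    ).fetchall()


def run_pipeline(rows):
    """Orchestration: run the steps in dependency order."""
    conn = sqlite3.connect(":memory:")
    try:
        ingest(conn, rows)
        transform(conn)
        check_quality(conn)
        return analyze(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [(1, "acme", 1200), (2, "globex", 500), (3, "acme", 300)]
    print(run_pipeline(sample))  # [('acme', 1500), ('globex', 500)]
```

In this sketch the transformation runs as SQL inside the storage layer after the raw data has been loaded. The Discovery and Governance and the Privacy and Security layers are left out to keep the example short.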

Storage and Processing

The storage and processing layer contains cloud data storage services with a SQL interface, such as the cloud data warehouses Amazon Redshift and BigQuery.

Ingestion and Transport

Services and technologies in this layer:

  • Fivetran
  • Stitch
  • Airbyte (open source)
  • Meltano (open source)
  • PipelineWise (open source)

Transformation and Modeling

Services and technologies in this layer:

  • dbt (open source)
  • LookML
  • Dataform

Analysis and Output

Services and technologies in this layer:

Quality and Observability

Services and technologies in this layer:

  • Monte Carlo
  • Great Expectations (open source)
  • Datafold

Discovery and Governance

Services and technologies in this layer:

  • Collibra
  • Amundsen (open source)
  • Atlan
  • Alation
  • DataHub (open source)[12]

Privacy and Security

Services and technologies in this layer:

  • Privacera
  • Immuta

Orchestration

Services and technologies in this layer:

References

  1. "The Modern Data Stack: Past, Present, and Future" (in en). 2020-12-01. https://blog.getdbt.com/future-of-the-modern-data-stack/.
  2. "Future Data 2020 - Tristan Handy - The Modern Data Stack: Past, Present, and Future" (in en). https://www.youtube.com/watch?v=1Zj8gTLdf5s. Retrieved 2022-04-24.
  3. "Fivetran Hosts Inaugural Modern Data Stack Conference on October 21-22, 2020" (in en). 2020-10-06. https://www.businesswire.com/news/home/20201006005433/en/Fivetran-Hosts-Inaugural-Modern-Data-Stack-Conference-on-October-21-22-2020. 
  4. "Coalesce 2020" (in en-US). https://www.getdbt.com/coalesce-2020/. 
  5. "Data Architecture Revisited: The Platform Hypothesis" (in en). 2020-10-15. https://future.a16z.com/emerging-architectures-modern-data-infrastructure/. 
  6. "What Is A Data Platform? And How To Build One" (in en). 2021-07-08. https://www.montecarlodata.com/blog-what-is-a-data-platform-and-how-to-build-one/. 
  7. "Resilience and Vibrancy: The 2020 Data & AI Landscape" (in en-US). 2020-09-30. https://mattturck.com/data2020/. 
  8. "Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape" (in en-US). 2021-09-28. https://mattturck.com/data2021/. 
  9. "Creating A Unified Experience For The Modern Data Stack At Mozart Data - Episode 242" (in en-US). https://www.dataengineeringpodcast.com/mozart-data-modern-data-stack-episode-242/. 
  10. "Reflections On Designing A Data Platform From Scratch - Episode 268" (in en-US). https://www.dataengineeringpodcast.com/data-platform-design-episode-268/. 
  11. Prukalpa (2021-06-17). "The Beginner's Guide to the Modern Data Stack" (in en). https://towardsdatascience.com/the-beginners-guide-to-the-modern-data-stack-d1c54bd1793e. 
  12. "A Metadata Platform for the Modern Data Stack | DataHub" (in en). https://datahubproject.io/.