Modern Data Stack
The Modern Data Stack (MDS) is a software stack that can be used to implement a data management platform with a strong focus on scalability, interoperability and ease of use.[1]
The technology that enabled the emergence of the Modern Data Stack was the cloud data warehouse.[1][2] With the broad adoption of Amazon Redshift, the first cloud data warehouse to gain significant traction, an entire ecosystem of technologies emerged. While most MDS technologies are available only as software as a service (SaaS), several of them can also be self-hosted because they are distributed under open-source licenses.
History
Although the first MDS technologies were launched as early as 2011 (BigQuery), the term "Modern Data Stack" did not gain broad adoption until 2020, the year Fivetran organised the first Modern Data Stack Conference (MDSCON).[3] Another event with a strong focus on the Modern Data Stack is Coalesce, an analytics engineering conference organised by dbt Labs, the creators of dbt, which was also inaugurated in 2020.[4]
Layers
A useful way to look at the Modern Data Stack is to group its technologies into functional layers:[5][6][7][8][9][10][11]
- Storage and Processing
- Ingestion and Transport
- Transformation and Modeling
- Analysis and Output
- Quality and Observability
- Discovery and Governance
- Privacy and Security
- Orchestration
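How the layers hand data to one another can be illustrated with a toy pipeline. The sketch below is purely illustrative: it assumes Python's built-in sqlite3 module as an in-memory stand-in for a cloud data warehouse with a SQL interface, and the sample rows, table names and functions (`ingest`, `transform`, `check_quality`, `run_pipeline`) are hypothetical. None of it reflects the API of any product listed in this article, and it covers only some of the layers. Each function is labelled with the layer whose role it imitates.

```python
import sqlite3

# Hypothetical sample data standing in for records that an ingestion tool
# might extract from a source system (order_id, customer, amount, status).
RAW_ORDERS = [
    ("o1", "alice", 120.0, "paid"),
    ("o2", "bob", 35.5, "refunded"),
    ("o3", "alice", 80.0, "paid"),
    ("o4", "bob", 40.0, "paid"),
]


def ingest(conn: sqlite3.Connection, rows) -> None:
    """Ingestion and Transport (toy): load raw rows unchanged into the warehouse."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def transform(conn: sqlite3.Connection) -> None:
    """Transformation and Modeling (toy): derive a modeled table with SQL in the warehouse."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount) AS revenue
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY customer
        """
    )


def check_quality(conn: sqlite3.Connection) -> list[str]:
    """Quality and Observability (toy): simple assertions on the modeled table."""
    problems = []
    nulls = conn.execute(
        "SELECT COUNT(*) FROM customer_revenue WHERE customer IS NULL"
    ).fetchone()[0]
    if nulls:
        problems.append(f"{nulls} rows with NULL customer")
    negative = conn.execute(
        "SELECT COUNT(*) FROM customer_revenue WHERE revenue < 0"
    ).fetchone()[0]
    if negative:
        problems.append(f"{negative} rows with negative revenue")
    return problems


def run_pipeline() -> list[tuple[str, float]]:
    """Orchestration (toy): run the steps in dependency order and return the result."""
    conn = sqlite3.connect(":memory:")  # Storage and Processing stand-in
    try:
        ingest(conn, RAW_ORDERS)
        transform(conn)
        problems = check_quality(conn)
        if problems:
            raise ValueError("; ".join(problems))
        # Analysis and Output: a BI tool would query this table; here we read it back.
        return conn.execute(
            "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline())  # [('alice', 200.0), ('bob', 40.0)]
```

In an actual stack each of these steps would typically be handled by a separate tool from the corresponding layer, for example an ingestion service loading the raw table and an orchestrator scheduling the transformation and quality checks.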
Storage and Processing
The storage layer contains cloud data storage services with a SQL interface:
- Amazon Redshift
- BigQuery
- Snowflake
- Azure Synapse
- Databricks Delta Lake
Ingestion and Transport
Services and technologies in this layer:
- Fivetran
- Stitch
- Airbyte (open source)
- Meltano (open source)
- PipelineWise (open source)
Transformation and Modeling
Services and technologies in this layer:
- dbt (open source)
- LookML
- Dataform
Analysis and Output
Services and technologies in this layer:
- Looker
- Metabase (open source)
- Mode
- Tableau
- Power BI
- Apache Superset (open source)
Quality and Observability
Services and technologies in this layer:
- Monte Carlo
- Great Expectations (open source)
- Datafold
Discovery and Governance
Services and technologies in this layer:
- Collibra
- Amundsen (open source)
- Atlan
- Alation
- DataHub (open source)[12]
Privacy and Security
Services and technologies in this layer:
- Privacera
- Immuta
Orchestration
Services and technologies in this layer:
- Apache Airflow (open source)
- Prefect (open source)
- Dagster (open source)
References
- [1] "The Modern Data Stack: Past, Present, and Future". 2020-12-01. https://blog.getdbt.com/future-of-the-modern-data-stack/.
- [2] "Future Data 2020 - Tristan Handy - The Modern Data Stack: Past, Present, and Future". https://www.youtube.com/watch?v=1Zj8gTLdf5s. Retrieved 2022-04-24.
- [3] "Fivetran Hosts Inaugural Modern Data Stack Conference on October 21-22, 2020". 2020-10-06. https://www.businesswire.com/news/home/20201006005433/en/Fivetran-Hosts-Inaugural-Modern-Data-Stack-Conference-on-October-21-22-2020.
- [4] "Coalesce 2020". https://www.getdbt.com/coalesce-2020/.
- [5] "Data Architecture Revisited: The Platform Hypothesis". 2020-10-15. https://future.a16z.com/emerging-architectures-modern-data-infrastructure/.
- [6] "What Is A Data Platform? And How To Build One". 2021-07-08. https://www.montecarlodata.com/blog-what-is-a-data-platform-and-how-to-build-one/.
- [7] "Resilience and Vibrancy: The 2020 Data & AI Landscape". 2020-09-30. https://mattturck.com/data2020/.
- [8] "Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape". 2021-09-28. https://mattturck.com/data2021/.
- [9] "Creating A Unified Experience For The Modern Data Stack At Mozart Data - Episode 242". https://www.dataengineeringpodcast.com/mozart-data-modern-data-stack-episode-242/.
- [10] "Reflections On Designing A Data Platform From Scratch - Episode 268". https://www.dataengineeringpodcast.com/data-platform-design-episode-268/.
- [11] Prukalpa. "The Beginner's Guide to the Modern Data Stack". 2021-06-17. https://towardsdatascience.com/the-beginners-guide-to-the-modern-data-stack-d1c54bd1793e.
- [12] "A Metadata Platform for the Modern Data Stack | DataHub". https://datahubproject.io/.
Original source: https://en.wikipedia.org/wiki/Modern Data Stack.