Delta Lake (Software)
Original author(s) | Michael Armbrust, Databricks |
---|---|
Initial release | April 2019 |
Written in | Scala, Python |
Operating system | Cross-platform |
Type | Data warehouse, Data lake |
License | Apache License 2.0 |
Website | https://delta.io/ |
Delta Lake is an open-source storage framework that enables building a data lakehouse architecture with various compute engines and APIs. It brings ACID transactions and scalable metadata handling to big data workloads and addresses common issues with data lakes such as data quality, schema evolution, and concurrency control.[1] Delta Lake is a project under the Linux Foundation and is released under the Apache License.[2]
History
Databricks open-sourced Delta Lake in April 2019 under the Linux Foundation, but kept some features proprietary. In June 2022, Databricks open-sourced all of Delta Lake.[3]
Features
Delta Lake supports multiple compute engines, such as Apache Spark, Presto, Flink, Trino, and Apache Hive. It also provides APIs for different programming languages, such as Scala, Java, Python, Rust, and Ruby. Delta Lake extends Apache Parquet data files with a file-based transaction log that tracks every change to the data and prevents data corruption.
Architecture
Internally, Delta Lake extends Parquet data files with a file-based transaction log (also called the "delta log") that tracks every change to the data and ensures ACID transactions. The transaction log consists of JSON files that record the actions performed on the data, such as adding a data file, removing a data file, setting a transaction identifier, and recording commit information. The log also maintains a snapshot of the current state of the data through checkpoints, which are written in Parquet format and consolidate the state of the log up to a given version.[4]
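The way a reader derives the current table state from such a log can be illustrated with a minimal sketch. It assumes a simplified log in which each commit is a string of newline-delimited JSON actions and only the add and remove actions affect the set of live data files; the function name `replay_log` and the file names are hypothetical, and real logs carry additional fields and action types that are ignored here.

```python
import json


def replay_log(commits):
    """Return the set of live data file paths after applying commits in order.

    `commits` is a list of strings, each holding newline-delimited JSON
    actions (a simplified stand-in for the JSON files in the delta log).
    """
    live_files = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue  # skip blank lines
            action = json.loads(line)
            if "add" in action:
                live_files.add(action["add"]["path"])
            elif "remove" in action:
                live_files.discard(action["remove"]["path"])
            # Other action types (e.g. commit information) do not change
            # the set of live files in this sketch.
    return live_files


# Hypothetical commits: the first writes two files, the second
# replaces one of them.
commit_0 = "\n".join([
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
    json.dumps({"add": {"path": "part-0000.parquet"}}),
    json.dumps({"add": {"path": "part-0001.parquet"}}),
])
commit_1 = "\n".join([
    json.dumps({"commitInfo": {"operation": "DELETE"}}),
    json.dumps({"remove": {"path": "part-0000.parquet"}}),
    json.dumps({"add": {"path": "part-0002.parquet"}}),
])

snapshot = replay_log([commit_0, commit_1])
print(sorted(snapshot))  # ['part-0001.parquet', 'part-0002.parquet']
```

Replaying only the first commit yields the earlier version of the table, which is why an ordered log of this kind can describe every historical state of the data as well as the current one.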
References
- Armbrust, Michael; Das, Tathagata; Sun, Liwen; Yavuz, Burak; Zhu, Shixiong (2020-08-01). "Delta lake: high-performance ACID table storage over cloud object stores". Proceedings of the VLDB Endowment 13 (12): 3411–3424. doi:10.14778/3415478.3415560. https://www.vldb.org/pvldb/vol13/p3411-armbrust.pdf.
- "apache/iceberg GitHub License". The Apache Software Foundation. 5 October 2022. https://github.com/apache/iceberg/blob/master/LICENSE.
- Armbrust, Michael; Ghodsi, Ali (2022-06-30). "Open Sourcing All of Delta Lake". News. Databricks. https://www.databricks.com/blog/2022/06/30/open-sourcing-all-of-delta-lake.html. Retrieved 2023-03-02.
- "Build Lakehouses with Delta Lake". https://delta.io/. Retrieved 2023-03-02.
Original source: https://en.wikipedia.org/wiki/Delta Lake (Software).