lakeFS
| Original author(s) | Einat Orr, Oz Katz |
|---|---|
| Developer(s) | Treeverse |
| Initial release | August 3, 2020 |
| Stable release | 1.72.0 |
| Repository | https://github.com/treeverse/lakeFS |
| Written in | Go |
| Type | Data version control |
| License | Apache 2.0 |
| Website | lakefs |
lakeFS is an open-source data version control system for managing data stored in object storage.[1] It provides Git-like operations such as branching, committing, merging, and reverting for large-scale data stored in systems including Amazon S3, Azure Blob Storage, and Google Cloud Storage, as well as other S3-compatible object storage platforms.[2] lakeFS is used in data engineering and machine learning workflows to manage changes to data, support reproducibility, and enable data governance across data lakes.[3] The software is available as an open-source project, as well as in enterprise and managed service offerings, including lakeFS Cloud.[3][1]
History
lakeFS was created in 2020 by Einat Orr and Oz Katz at Treeverse.[4] Its first public release, version 0.8.1, appeared in August 2020 and introduced Git-style operations with support for Amazon S3.[5]
In 2021, Treeverse raised $23 million in a Series A funding round led by Dell Technologies Capital, Norwest Venture Partners, and Zeev Ventures.[6] The same year, lakeFS was included in InfoWorld’s Best of Open Source Software (Bossie) awards.[7]
In June 2022, Treeverse introduced lakeFS Cloud, a managed service providing hosted lakeFS deployments for cloud-based data lakes.[3] Version 1.0 was released in October 2023, adding integrations with platforms such as Databricks and Apache Iceberg, as well as support for orchestration tools including Apache Airflow.[1][8] Public case studies and conference materials describe the use of lakeFS by organizations such as Microsoft, Volvo, and NASA.[1]
In July 2025, Treeverse announced an additional $20 million in growth funding to support further development of lakeFS.[9][10]
In November 2025, Treeverse announced the acquisition of the open-source data version control project DVC.[11]
Software
Overview
lakeFS provides Git-like operations such as branching, committing, merging, and reverting for datasets stored in object storage.[1] These operations are used to manage changes to data, test modifications in isolation, reproduce specific data states, and recover from errors or unintended updates.[2]
Architecture
lakeFS operates as a metadata layer on top of object storage systems such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.[2] It stores repository metadata describing commits, branches, and tags, enabling versioned views of data without copying underlying objects.[2]
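The idea of versioned views without copied objects can be illustrated with a minimal sketch. The Python below is a toy in-memory model, not the lakeFS implementation: the `ToyRepo` and `Commit` classes, their methods, and the `s3://bucket/...` addresses are all hypothetical names chosen for illustration. It assumes only what the paragraph above states, namely that branches and commits are metadata pointing at objects that stay where they are.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """An immutable snapshot: maps logical paths to object-store addresses."""
    id: str
    parent: Optional[str]
    message: str
    entries: Dict[str, str]  # logical path -> address in the object store


class ToyRepo:
    """Hypothetical in-memory model of a metadata layer (illustration only)."""

    def __init__(self) -> None:
        root = Commit(id="c0", parent=None, message="init", entries={})
        self.commits: Dict[str, Commit] = {root.id: root}
        self.branches: Dict[str, str] = {"main": root.id}  # branch -> commit id
        self.staged: Dict[str, Dict[str, str]] = {"main": {}}

    def create_branch(self, name: str, source: str) -> None:
        # Only a pointer is copied; no data objects are duplicated.
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def put(self, branch: str, path: str, address: str) -> None:
        # Stage a change: record where the new object lives.
        self.staged[branch][path] = address

    def commit(self, branch: str, message: str) -> str:
        head = self.commits[self.branches[branch]]
        entries = {**head.entries, **self.staged[branch]}
        new = Commit(id=f"c{len(self.commits)}", parent=head.id,
                     message=message, entries=entries)
        self.commits[new.id] = new
        self.branches[branch] = new.id
        self.staged[branch] = {}
        return new.id

    def read(self, ref: str, path: str) -> str:
        # A ref is either a branch name or a commit id.
        commit_id = self.branches.get(ref, ref)
        return self.commits[commit_id].entries[path]
```

In this sketch, creating a branch from `main` and committing a changed path on it leaves `read("main", path)` returning the original address, while the new branch resolves the same path to the new object. Unchanged paths on both branches resolve to the same stored object, which is the sense in which nothing is copied.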
The system provides access through multiple interfaces, including a web user interface, command-line tools, a REST API, and software development kits.[2] It is designed to integrate with existing data engineering and machine learning workflows, and can be deployed either in self-hosted environments or as a managed service.[3]
Functions
lakeFS provides version control functionality for data stored in object storage–based data lakes. Core features include:
- Atomic commits and version tracking for datasets, supporting reproducibility and auditability.[1]
- Branching and merging mechanisms that allow isolated development and testing without duplicating data.[2]
- Configurable hooks that can validate data or trigger external processes during commit and merge operations.[1]
- The ability to revert repositories to earlier states to recover from data errors or failed changes.[2]
- Recording of commit history and associated metadata for lineage tracking.[3]
- Support for managing data across multiple object storage systems, including Amazon S3, Azure Blob Storage, Google Cloud Storage, and MinIO.[3]
- Use of fixed data versions to reproduce experiments and machine learning model training.[1]
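Two of the features listed above, validation hooks on merge and reverting to an earlier state, can be sketched conceptually as follows. This is a simplified illustration under stated assumptions rather than lakeFS's hook mechanism or merge algorithm: the `merge`, `revert`, and `require_parquet` functions are hypothetical, a snapshot is modeled as a plain dictionary of paths to object addresses, and the merge simply lets the source side win, ignoring deletions and conflicts.

```python
from typing import Callable, Dict, List

Snapshot = Dict[str, str]          # logical path -> object address
Hook = Callable[[Snapshot], None]  # raises ValueError to reject a change


def require_parquet(snapshot: Snapshot) -> None:
    """Hypothetical validation rule: every path must end in .parquet."""
    bad = [p for p in snapshot if not p.endswith(".parquet")]
    if bad:
        raise ValueError(f"non-parquet paths: {bad}")


def merge(source: Snapshot, dest: Snapshot, pre_merge: List[Hook]) -> Snapshot:
    """Return dest updated with source's entries, if every hook passes."""
    merged = {**dest, **source}  # simplification: source wins, no conflicts
    for hook in pre_merge:
        hook(merged)  # an exception here aborts the merge
    return merged  # dest itself is never mutated


def revert(history: List[Snapshot], steps: int = 1) -> Snapshot:
    """Return the snapshot `steps` commits before the latest one."""
    return history[-1 - steps]
```

Because the hook runs before the merged snapshot is returned, a failed validation leaves the destination unchanged, and reverting amounts to selecting an earlier snapshot rather than rewriting stored data.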
Integrations
Coverage of lakeFS describes integrations with platforms such as Databricks and Apache Iceberg, and support for environments including Red Hat OpenShift.[1][2] Other material describes its use with Trino, for example to validate data changes before they are merged, and its compatibility with orchestration tools such as Apache Airflow.[12]
References
1. Kerner, Sean Michael (October 2023). "Open-source lakeFS data version control levels up to 1.0". https://venturebeat.com/data-infrastructure/open-source-lakefs-data-version-control-levels-up-to-1-0/.
2. "LakeFS brings Git-like version control to virtual dataset copies". March 27, 2023. https://blocksandfiles.com/2023/03/27/lakefs-brings-git-like-version-control-to-virtual-dataset-copies/.
3. Kerner, Sean Michael (June 22, 2022). "Treeverse set to launch lakeFS cloud data lake service". https://www.techtarget.com/searchdatamanagement/news/252521898/Treeverse-set-to-launch-LakeFS-cloud-data-lake-service.
4. Orbach, Meir (July 28, 2021). "Treeverse raises $15 million Series A to leverage lakeFS". https://www.calcalistech.com/ctech/articles/0,7340,L-3913525,00.html.
5. "v0.8.1". https://github.com/treeverse/lakeFS/releases/tag/v0.8.1.
6. Sawers, Paul (July 28, 2021). "Treeverse raises $23M to bring Git-like version control to data lakes". https://venturebeat.com/business/treeverse-raises-23m-to-bring-git-like-version-control-to-data-lakes/.
7. Borck, James R. (October 18, 2021). "The best open source software of 2021". https://www.infoworld.com/article/3637038/the-best-open-source-software-of-2021.html.
8. "Real-Time Analytics News for the Week Ending October 28". October 2023. https://www.rtinsights.com/real-time-analytics-news-for-the-week-ending-october-28/.
9. "LakeFS nabs $20M to build “Git for Big Data”". July 29, 2025. https://www.bigdatawire.com/2025/07/29/lakefs-nabs-20m-to-build-git-for-big-data/.
10. "LakeFS Secures $20M in Growth Capital, Transforms Critical Gap in Enterprise Data and AI Tech Stack". July 2025. https://www.dbta.com/Editorial/News-Flashes/lakeFS-Secures-20M-in-Growth-Capital-Transforms-Critical-Gap-in-Enterprise-Data-and-AI-Tech-Stack-171058.aspx.
11. "DVC Joins lakeFS: Your Questions Answered". November 18, 2025. https://dvc.org/blog/dvc-joins-lakefs-your-questions-answered/.
12. "Trino Community Broadcast 27: Data versioning with lakeFS". https://trino.io/episodes/27.html.
