Apache Iceberg

Apache Iceberg
Original author(s)	Ryan Blue, Daniel Weeks
Initial release	10 August 2017
Written in	Java, Python
Operating system	Cross-platform
Type	Data warehouse, Data lake
License	Apache License 2.0
Website	iceberg.apache.org

Apache Iceberg is an open-source, high-performance format for huge analytic tables. Iceberg brings SQL tables to big data and allows engines such as Spark, Trino, Flink, Presto, Hive, Impala, StarRocks, Doris, and Pig to safely work with the same tables at the same time.^[1] Iceberg is released under the Apache License.^[2] It addresses the performance and usability problems of using Apache Hive tables in large, demanding data lake environments.^[3] Vendors that support Apache Iceberg tables in their products include CelerData, Cloudera, Dremio, IOMETE, Snowflake, Starburst, Tabular,^[4] and AWS.^[5]

History

Iceberg was started at Netflix by Ryan Blue and Dan Weeks. Many different services and engines in the Netflix infrastructure used Hive. However, Hive could not guarantee correctness and did not provide stable atomic transactions.^[3] Many engineers at Netflix avoided using these services or changing the data, to avert unintended consequences of the Hive format.^[3] Ryan Blue created Iceberg to address three problems with Hive tables:^[3]

Ensure the correctness of the data and support ACID transactions.
Improve performance by enabling finer-grained operations at the file level, for optimal writes.
Simplify and abstract away the general operation and maintenance of tables.^[citation needed]

Iceberg development started in 2017.^[6] The project was open-sourced and donated to the Apache Software Foundation in November 2018.^[7] In May 2020, the Iceberg project graduated to become a top-level Apache project.^[7]

Companies that use Iceberg include Airbnb,^[8] Apple,^[3] Expedia,^[9] LinkedIn,^[10] Adobe,^[11] and Lyft, among others.^[12]

References

[1] "Apache Iceberg". https://iceberg.apache.org/.
[2] "apache/iceberg GitHub License". The Apache Software Foundation. 5 October 2022. https://github.com/apache/iceberg/blob/master/LICENSE.
[3] Woodie, Alex (8 February 2021). "Apache Iceberg: The Hub of an Emerging Data Service Ecosystem?". https://www.datanami.com/2021/02/08/apache-iceberg-the-hub-of-an-emerging-data-service-ecosystem/.
[4] "Vendors". https://iceberg.apache.org/vendors/.
[5] "Using Apache Iceberg tables – Amazon Athena". Amazon Web Services, Inc. https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg.html.
[6] "Initial public release in apache/iceberg". https://github.com/apache/iceberg/commit/a5eb3f6ba171ecfc517a4f09ae9654e7d8ae0291.
[7] "Incubation Status Template - Apache Incubator". https://incubator.apache.org/projects/iceberg.html.
[8] Zhu, Ronnie (26 September 2022). "Upgrading Data Warehouse Infrastructure at Airbnb". https://medium.com/airbnb-engineering/upgrading-data-warehouse-infrastructure-at-airbnb-a4e18f09b6d5.
[9] Mathiesen, Christine (26 January 2021). "A Short Introduction to Apache Iceberg". https://medium.com/expedia-group-tech/a-short-introduction-to-apache-iceberg-d34f628b6799.
[10] "FastIngest: Low-latency Gobblin with Apache Iceberg and ORC format". https://engineering.linkedin.com/blog/2021/fastingest-low-latency-gobblin.
[11] Bremner, Jaemi (3 December 2020). "Iceberg at Adobe". https://blog.developer.adobe.com/iceberg-at-adobe-88cf1950e866.
[12] Data Council. "Open Source Highlight: Apache Iceberg". https://www.datacouncil.ai/blog/apache-iceberg.

Original source: https://en.wikipedia.org/wiki/Apache Iceberg.