
Dipankar Mazumdar: Apache Iceberg: enabling an open lakehouse architecture for large-scale analytics



Dipankar is currently a Developer Advocate at Dremio, where his primary focus is helping data practitioners work with Dremio’s lakehouse platform and open-source projects such as Apache Iceberg, Arrow, and Project Nessie that help data teams apply and scale analytics. In his past roles, he worked at the intersection of machine learning and data visualization. Dipankar is a co-author of the upcoming O’Reilly book ‘Apache Iceberg: The Definitive Guide’. He also holds a Master's degree in Computer Science, with research focused on Explainable AI.

 

Apache Iceberg: enabling an open lakehouse architecture for large-scale analytics.

Data Lakes have been built with a desire to democratize data - to allow more and more people, tools, and applications to make use of data.

A key capability needed to achieve this is hiding the complexity of the underlying data structures and physical data storage from users. The de facto standard has been the Hive table format, released by Facebook, which addresses some of these problems but falls short at data, user, and application scale.


Apache Iceberg is a foundational technology for implementing an open data lakehouse, an architecture that addresses the limitations of traditional data architecture patterns. These limitations include having to ETL the data into each tool, which creates data drift and data silos; high costs, which make it prohibitive to extend warehouse features to all of your data; and a lack of flexibility, which forces you to adjust your workflow to the tool your data is locked in.


Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. In this talk, we will go through:

- What is a Lakehouse architecture?

- What are table formats in a data lake?

- Architecture of an Iceberg table

- Benefits of this architecture (cost savings, etc.) and how it enables workloads such as BI and ML

