
Lakehouse articles

The converged lakehouse paradigm: combining warehouse reliability with data lake flexibility using open table formats.

13 articles

Iceberg Lakehouse with AI Agents: A Guide — AI agent robots navigating an Apache Iceberg lakehouse with analytics dashboards, AI brain, and governance shield icons, Build like Netflix subtitle
Apache Iceberg, Lakehouse, LakeOps, Data Platforms

Iceberg Lakehouse with AI Agents: A Guide

AI agents are becoming primary consumers of Iceberg lakehouse data — querying tables iteratively, at high frequency, and without human review. This guide walks through the five components your infrastructure needs to support agentic workloads — MCP connectivity, guardrails, multi-engine routing, self-optimizing storage, and observability — and shows how LakeOps provides each one.

Jonathan Saring
24 min read
Databricks to Iceberg smooth migration — Databricks and Apache Iceberg connected by a data bridge, with table data flowing into an open Iceberg lakehouse
Databricks, Apache Iceberg, LakeOps, Delta Lake

Databricks to Iceberg Smooth Migration

A smooth migration from Databricks to Iceberg opens up a multi-engine lakehouse; it is not a platform exit. Databricks stays central for ML and Spark, while Iceberg adds Trino, Snowflake, and open catalogs. The article covers five tools: LakeOps, UC managed Iceberg, Delta UniForm, Spark, and Lakehouse Federation.

David W
18 min read
Snowflake to Iceberg migration — Snowflake tables flowing into an Apache Iceberg lakehouse, illustrating a hybrid multi-engine architecture where Snowflake remains a valued component
Snowflake, Apache Iceberg, LakeOps, Data Platforms

Snowflake to Iceberg Smooth Migration

A practical guide for senior data engineers expanding Snowflake into a multi-engine Iceberg lakehouse. Covers five production tools — LakeOps, managed Iceberg, Open Catalog sync, Spark, and AWS Glue — with migration patterns, operational trade-offs, and a phased rollout sequence.

David W
17 min read
Multiple Query Engines with Iceberg — Ferris the Rust crab routing queries to Trino, Snowflake, DataFusion, Databricks, Presto, ClickHouse, DuckDB, and Apache Spark over an Iceberg Lakehouse
Apache Iceberg, QueryFlux, query routing, Lakehouse

Routing Multiple Query Engines with Iceberg

How to route queries across Trino, Spark, DuckDB, Snowflake, Athena, and Flink on shared Iceberg tables — covering the architecture of a SQL routing proxy, dialect translation, routing strategies, table-aware optimization, and the tooling that makes it work.

Rob M
18 min read
Diagram showing seven Iceberg catalog options — Polaris, Nessie, Glue, Unity, Gravitino, Lakekeeper, and Hive — connected to a central Apache Iceberg symbol
Apache Iceberg, Iceberg catalog, Lakehouse, Data Lake

Best Catalog for Apache Iceberg? A Useful Comparison

A technical comparison of the seven major Apache Iceberg catalogs — Hive Metastore, AWS Glue, Apache Polaris, Project Nessie, Databricks Unity Catalog, Apache Gravitino, and Lakekeeper — across protocol support, access control, multi-engine interoperability, credential vending, and production readiness.

Chris P
21 min read
Data Lake vs Lakehouse vs Warehouse: A Practical Guide — watercolor illustration comparing a natural data lake (raw flexible storage), a lakehouse (open storage with analytics on the water), and a data warehouse (structured BI building with charts in the windows)
Data Platforms, Data Lake, Lakehouse, Apache Iceberg

Data Lake vs Lakehouse vs Warehouse: A Practical Guide

Data lakes, warehouses, and lakehouses are not interchangeable — each has hard limits the others cannot cover. A practical guide for platform leaders: where each architecture wins, where it fails, cost and governance trade-offs, and how to choose (or combine) them in 2026.

Chris P
22 min read
Iceberg Table Maintenance Solution Comparison — side-by-side feature matrix for LakeOps, AWS Glue, S3 Tables, Snowflake, BigLake, Cloudera, and Starburst
Apache Iceberg, Compaction, Lakehouse, Data Platforms

9 Iceberg Table Compaction Tools Compared for Production Lakehouses

Compaction keeps Apache Iceberg lakehouses fast and lean — but every tool approaches it differently. A side-by-side look at nine production options: LakeOps, AWS Glue, Amazon S3 Tables, Snowflake, Google BigLake, Cloudera, Starburst, Dremio, and Databricks.

Jonathan Saring
17 min read
LakeOps lakehouse control plane — connected to Iceberg catalogs on the left, query engines on the right, with observability, autonomous optimization, and cost management in the center
Apache Iceberg, LakeOps, Lakehouse, FinOps

Iceberg Lakehouse Optimization with LakeOps

A practical walkthrough of optimizing an Apache Iceberg lakehouse end to end — from connecting catalogs and diagnosing table health through autonomous compaction, lifecycle management, and multi-engine routing to measurable cost and performance outcomes.

Rob M
16 min read
From data swamp to modern Iceberg lakehouse — illustrated journey from scattered files and broken schemas through Apache Iceberg to a managed lakehouse with a control plane
Data Platforms, Data Swamp, Apache Iceberg, Lakehouse

From Data Swamp to Modern Iceberg Lakehouse

Every data lake starts with a promise of unlimited flexibility — and most end up as a swamp. Stale files, broken schemas, no observability, and engineers spending more time maintaining pipelines than analyzing data. Apache Iceberg fixed the reliability gap. A lakehouse control plane fixes everything else. A practical guide to the full transition — component by component.

Jonathan Saring
23 min read
Optimizing Iceberg Lake Compaction — scattered small data-block cubes funnel through a compaction machine onto a conveyor belt of optimized blocks, leading to a crystal-clear iceberg lakehouse
Apache Iceberg, Compaction, Lakehouse, LakeOps

Optimizing Iceberg Lake Compaction: A Guide

Compaction is the most impactful operation in an Apache Iceberg lakehouse — and the hardest to get right at scale. File merging is the easy part. Knowing when to trigger it, what sort strategy to apply per table, how to avoid conflicting with other maintenance, and how to do it without spinning up expensive JVM clusters — that is the real problem. A breakdown of what modern compaction actually requires.

Jonathan Saring
16 min read
Iceberg lakehouse optimization — multi-engine ecosystem (AWS, Databricks, Trino, DuckDB, Snowflake, Flink, and more) around a shared Iceberg lake, with observability and optimization above the waterline
Apache Iceberg, Lakehouse, LakeOps, lakehouse optimization

Iceberg Lakehouse Optimization — The Right Way

Apache Iceberg gives your lakehouse warehouse-grade reliability on object storage — but the format does not optimize itself. A practical guide to every operational pillar a production Iceberg lakehouse needs — from lake-wide observability and query-aware compaction to snapshot lifecycle, metadata health, and governance — and how LakeOps runs it all from a single control plane.

Jonathan Saring
21 min read
Modern lakehouse architecture: LakeOps control plane for autonomous management and optimization — observability, compaction, routing, AI guardrails, and governance above Iceberg on S3, with catalogs and multi-engine compute (Spark, Trino, Snowflake, Databricks, and more)
Data Platforms, Apache Iceberg, Snowflake, Databricks

From Databricks and Snowflake to an Open Data Platform

For a decade, Snowflake and Databricks defined enterprise data. Then the lakehouse emerged — open formats on open storage. What was missing was the operational layer to make it work at scale. An autonomous control plane turns a lakehouse into a managed open data platform — without the lock-in.

Jonathan Saring
18 min read
Delta Lake vs Apache Iceberg: Choosing the Right Table Format (External)
Delta Lake, Apache Iceberg, Data Lake, Lakehouse

Delta Lake vs Apache Iceberg: Choosing the Right Table Format

A detailed comparison between Delta Lake and Apache Iceberg, exploring their architectures, performance characteristics, and ideal use cases to help you make the right choice.

Amit Gilad
10 min read