Data Lake articles

Fundamentals of data lake architecture — storage layout, partitioning, format selection, and lifecycle management.

7 articles

LakeOps measured results on real Iceberg workloads: 95% faster compaction, 12x query performance improvement, 80% cost reduction
Apache Iceberg, LakeOps, Data Platforms, Observability

Apache Iceberg Cost Optimization in 2026

Your Iceberg lake is overcharging you from four directions at once — storage bloat, query compute waste, compaction overhead, and engineering time. This post breaks down exactly where each dollar goes and how autonomous table management eliminates the waste without touching your pipelines.

Amit Gilad
21 min read
LakeOps Routing Groups showing four enterprise workload endpoints with stable URLs, engine pools, and query-type scope for AI agent connectivity
Apache Iceberg, LakeOps, QueryFlux, Data Platforms

Optimizing Apache Iceberg for Agentic AI: From Slow Tables to Sub-Second Agent Queries

AI agents issue SQL iteratively, repeat query templates at high frequency, and need sub-second responses from tables designed for batch workloads. This post covers what breaks when agents hit a production Iceberg lake — and the five infrastructure layers that fix it: MCP connectivity, guardrails, multi-engine routing, self-optimizing storage, and closed-loop feedback.

Chris P
18 min read
LakeOps dashboard showing optimization activity, key metrics, and recent operations across production Iceberg tables
Apache Iceberg, LakeOps, Data Platforms, Observability

Managed Iceberg in 2026: Autonomous Data Lake

Iceberg tables degrade silently — small files pile up, snapshots bloat metadata, and query latency creeps higher. A breakdown of the nine components every production data lake needs to stay healthy — starting with observability and telemetry collection, through compaction, snapshot management, and fleet-wide policies, to multi-engine routing and agentic AI enablement.

David W
22 min read
External
Apache Iceberg, Data Lake, LakeOps

From 350TB to 230TB in 10 Minutes: The Hidden Weight of Stale Data

See how a 350TB data lake shrank to 230TB in 10 minutes by removing stale data—saving 34% in AWS S3 costs and proving the need for a control plane.

Amit Gilad
5 min read
External
Apache Iceberg, Data Lake, LakeOps

Why Every Data Lake Needs a Control Plane: Lessons from Apache Iceberg

Apache Iceberg delivers speed, but without a control plane, snapshots pile up, costs surge, and queries take longer. The fix starts with snapshot expiration.

Amit Gilad
8 min read
External
Apache Iceberg, Data Lake, Data Platforms

Cracking the Ice: The Battle Between Sort and Binpack in Apache Iceberg

Unlocking performance vs. optimizing storage — choosing the right compaction strategy for your data lake.

Amit Gilad
7 min read
External
Delta Lake, Apache Iceberg, Data Lake, Lakehouse

Delta Lake vs Apache Iceberg: Choosing the Right Table Format

A detailed comparison between Delta Lake and Apache Iceberg, exploring their architectures, performance characteristics, and ideal use cases to help you make the right choice.

Amit Gilad
10 min read