
Data Platforms articles

Architecture, strategy, and tooling for modern data platforms — from lakehouse design to multi-engine orchestration.

8 articles

LakeOps measured results on real Iceberg workloads: 95% faster compaction, 12x query performance improvement, 80% cost reduction
Apache Iceberg, LakeOps, Data Platforms, Observability

Apache Iceberg Cost Optimization in 2026

Your Iceberg lake is overcharging you from four directions at once — storage bloat, query compute waste, compaction overhead, and engineering time. This post breaks down exactly where each dollar goes and how autonomous table management eliminates the waste without touching your pipelines.

Amit Gilad
21 min read
LakeOps Routing Groups showing four enterprise workload endpoints with stable URLs, engine pools, and query-type scope for AI agent connectivity
Apache Iceberg, LakeOps, QueryFlux, Data Platforms

Optimizing Apache Iceberg for Agentic AI: From Slow Tables to Sub-Second Agent Queries

AI agents issue SQL iteratively, repeat query templates at high frequency, and need sub-second responses from tables designed for batch workloads. This post covers what breaks when agents hit a production Iceberg lake — and the five infrastructure layers that fix it: MCP connectivity, guardrails, multi-engine routing, self-optimizing storage, and closed-loop feedback.

Chris P
18 min read
LakeOps dashboard showing optimization activity, key metrics, and recent operations across production Iceberg tables
Apache Iceberg, LakeOps, Data Platforms, Observability

Managed Iceberg in 2026: Autonomous Data Lake

Iceberg tables degrade silently — small files pile up, snapshots bloat metadata, and query latency creeps higher. A breakdown of the nine components every production data lake needs to stay healthy — starting with observability and telemetry collection, through compaction, snapshot management, and fleet-wide policies, to multi-engine routing and agentic AI enablement.

David W
22 min read
External
QueryFlux, Apache Iceberg, Data Platforms

Introducing QueryFlux: Open-Source Universal Multi-Engine Query Router and SQL Proxy

QueryFlux is a universal SQL proxy and multi-engine query router written in Rust: one access layer in front of Trino, DuckDB, StarRocks, and Athena, with routing, dialect translation, and observability.

Joni Sar
12 min read
External
Apache Iceberg, LakeOps, Data Platforms

Benchmarking LakeOps: A Production-Grade Compaction Engine for Apache Iceberg

How we compacted 4.5 TB across 10 real production tables, achieved up to 99.8% file reduction, and finished in 11 minutes a job that made Apache Spark run out of memory.

Amit Gilad
9 min read
External
Apache Iceberg, LakeOps, Data Platforms

Building a Distributed Compaction Engine for Apache Iceberg with Rust + DataFusion

How we built a high-performance, distributed compaction engine for Apache Iceberg using Rust and DataFusion: architecture, design choices, and lessons learned.

Amit Gilad
9 min read
External
Apache Iceberg, Data Lake, Data Platforms

Cracking the Ice: The Battle Between Sort and Binpack in Apache Iceberg

Unlocking performance vs. optimizing storage — choosing the right compaction strategy for your data lake.

Amit Gilad
7 min read
External
Apache Iceberg, Apache Spark, Data Platforms

Incremental Processing with Apache Iceberg & Spark: A Comprehensive Guide

Learn how to implement efficient incremental processing with Apache Iceberg and Spark, including best practices for data lake optimization and performance tuning.

Amit Gilad
9 min read