Compaction articles

Iceberg compaction strategies, engines, and automation — binpack, sort, Z-order, streaming compaction, and production benchmarks.

24 articles

Apache Iceberg Commit Conflicts — causes, prevention, and recovery with concurrent write paths
Apache Iceberg, Streaming, Apache Flink, Compaction

Apache Iceberg Commit Conflicts: Causes, Prevention, and Recovery

Every concurrent write to an Apache Iceberg table risks a commit conflict. This guide covers how Iceberg's optimistic concurrency works, what triggers CommitFailedException, the common conflict scenarios in streaming and maintenance workloads, and the strategies — from partition isolation to branch-based writes — that eliminate conflicts in production.

Chris P
33 min read
Apache Iceberg Operational Runbook — incidents, symptoms, and fixes with detect, diagnose, resolve, and verify workflow
Apache Iceberg, Observability, LakeOps, Compaction

Apache Iceberg Operational Runbook: Incidents, Symptoms, and Fixes

A production-ready runbook for Iceberg incidents: queries suddenly slow, planning takes minutes, write conflicts spike, storage grows uncontrolled, compaction OOMs, time travel breaks, and delete files degrade reads. Each incident follows Symptom → Root Cause → Diagnosis → Fix → Prevention.

David W
24 min read
Automating Apache Iceberg Table Maintenance — compaction, snapshot expiration, orphan cleanup, manifest rewrite, and table health orbiting an Iceberg table.
Apache Iceberg, Compaction, LakeOps, Observability

Automating Apache Iceberg Table Maintenance

Apache Iceberg ships the maintenance primitives — compaction, snapshot expiration, orphan cleanup, and manifest rewriting — but none of them run themselves. This guide covers why each operation matters, the correct execution order, the limitations of scripts and cron jobs, and how to automate the full lifecycle with policies, observability, and a purpose-built control plane.

Chris P
21 min read
Kafka to Iceberg Compaction — Kafka events streaming into an Iceberg table, compacted through a gear process into optimized blocks.
Compaction, Apache Iceberg, Apache Kafka, Streaming

Kafka to Iceberg Compaction — Done Right

Streaming from Kafka into Apache Iceberg creates small files faster than any other write pattern. This guide covers why standard compaction approaches fail for streaming tables, how to measure compaction need, implement partition-aware compaction that avoids writer conflicts, tune rewriteDataFiles parameters, and run maintenance autonomously at scale.

Rob M
26 min read
Apache Iceberg 1.11.0 What's New — Nessie mascot beside an iceberg with icons for performance, security, routing, and extensibility.
Apache Iceberg, Lakehouse, Compaction, LakeOps

Apache Iceberg 1.11.0 — What's New?

Apache Iceberg 1.11.0 lands V3 maturity with production-ready deletion vectors, a native Variant type for semi-structured data, server-side scan planning, built-in table encryption, and a pluggable File Format API that opens the door to next-generation storage formats.

Jonathan Saring
10 min read
AWS Glue Iceberg Optimization — an S3 bucket with scattered data objects funneled through an optimization lens into a geometric iceberg, with icons for Search, Analytics, and Tuning
Apache Iceberg, AWS, Compaction, LakeOps

AWS Glue Iceberg Optimization: A Practical Guide

AWS Glue provides native Iceberg support for cataloging, ETL, and built-in table maintenance — but production lakehouses hit limitations fast. This guide covers Glue catalog configuration, ETL best practices, compaction tuning, common pitfalls, and how a dedicated control plane fills the operational gaps.

David W
20 min read
Apache Iceberg with dbt Optimization — dbt logo above SQL model cards flowing through a transformation pipeline into a geometric iceberg, with chart and analytics icons
Apache Iceberg, dbt, Compaction, Lakehouse

Apache Iceberg with dbt: Optimization Guide

dbt transforms your data — but who maintains the Iceberg tables underneath? A practical guide to dbt adapters, incremental strategies, table properties, and the maintenance gap that every dbt + Iceberg team hits in production.

Rob M
16 min read
Apache Iceberg with Flink Optimization — Flink squirrel mascot with streaming data flowing through an optimization ring into a geometric iceberg, with performance metric icons
Apache Iceberg, Apache Flink, Streaming, Compaction

Apache Iceberg with Flink: Streaming Optimization Guide

Flink streaming into Iceberg creates thousands of small files per hour. This guide covers checkpoint tuning, write distribution modes, Flink SQL patterns, and why external maintenance is essential for production streaming tables.

Chris P
15 min read
Apache Iceberg Delete Files — stacked data blocks with pink delete file markers funneled through compaction into clean, optimized data with a performance gauge showing improved read speed
Apache Iceberg, Compaction, LakeOps, Streaming

Apache Iceberg Delete Files: Reducing Merge-on-Read Overhead

Delete files let Iceberg avoid rewriting data on every UPDATE or DELETE — but every unresolved delete file forces readers to reconcile at query time. A deep guide to position deletes, equality deletes, measuring overhead, and resolving accumulation before it tanks performance.

David W
17 min read
Apache Iceberg Table Partitioning Best Practices — a geometric iceberg branching into date, region, and category partition columns, each with table and folder icons showing the partition hierarchy
Apache Iceberg, LakeOps, Analytics, Lakehouse

Apache Iceberg Table Partitioning Best Practices

Partitioning determines how much data every query must scan. Apache Iceberg's hidden partitioning and partition evolution change the game — but choosing the wrong strategy still creates performance cliffs. A practical guide to transforms, sizing, evolution, and avoiding the small-files trap.

Chris P
18 min read
Fixing Small Files in Apache Iceberg — scattered small data cubes compacted into larger organized file blocks flowing toward a geometric iceberg
Compaction, Apache Iceberg, LakeOps, Apache Flink

Fixing Small Files in Apache Iceberg: A Practical Guide

Small files silently degrade every Apache Iceberg lakehouse — inflating S3 costs, slowing query planning, and bloating metadata. This guide covers root causes, measurement, manual and automated fixes, and how to eliminate the problem at scale.

Rob M
19 min read
Apache Iceberg Table Health and Maintenance — health score dashboard showing 92 Healthy with status indicators for Snapshots, Manifests, Delete Files, Orphan Files, and File Health beside a geometric iceberg
Apache Iceberg, Compaction, Observability, LakeOps

Apache Iceberg Table Health and Maintenance: A Complete Guide

Iceberg tables degrade silently in production — small files multiply, snapshots accumulate, orphans waste storage, and manifests fragment. A comprehensive guide to the five maintenance operations, why sequencing matters, the metrics that reveal problems early, and how to automate the full lifecycle.

David W
20 min read
Apache Iceberg with Trino Optimization — Trino logo with an optimization gauge sending query streams into a geometric iceberg, with performance metric icons for throughput, latency, and efficiency
Apache Iceberg, Trino, Compaction, LakeOps

Apache Iceberg with Trino: Performance Optimization Guide

A practical guide to optimizing Apache Iceberg queries and table maintenance with Trino — covering scan planning, predicate pushdown, file pruning, Trino-side tuning, maintenance procedures, physical layout optimization, and how a dedicated control plane eliminates JVM overhead while adding cross-engine intelligence.

Chris P
18 min read
Annual cloud bill infographic showing Iceberg lakehouse spend doubling year over year — FinOps and cost reduction framing for data platform teams in 2026
FinOps, Apache Iceberg, LakeOps, Cloud Cost

State of Iceberg FinOps and Cost Reduction in 2026

State of Iceberg FinOps in 2026: where lakehouse spend leaks, what to measure, how autonomous management and optimization are replacing manual maintenance — and a practical survey of tools from cloud optimizers to control planes.

David W
24 min read
Iceberg Lake for Data Analytics: Optimization Guide — iceberg on water with analytics dashboard showing 9.4× query speed, 68% cost efficiency gain, and 82% less data scanned
Apache Iceberg, Data Platforms, Data Lake, LakeOps

Iceberg Lake for Data Analytics: Optimization Guide

Eight optimization layers for data platform engineers running BI, ad-hoc SQL, and aggregation pipelines on Apache Iceberg — from partition design and file sizing through compaction, routing, and continuous maintenance.

Jonathan Saring
15 min read
Iceberg lakehouse cost reduction — cost waste flows through LakeOps autonomous operations to deliver 80% savings
Apache Iceberg, LakeOps, Cloud Cost, FinOps

7 Iceberg Lakehouse Cost Reduction Strategies

Iceberg lakehouses silently accumulate cost from small files, dead snapshots, orphan data, unoptimized layouts, and over-provisioned compute. Seven practical strategies — from deploying an autonomous control plane to leveraging partition evolution — that production data teams use to cut lakehouse spend by up to 80%.

Jonathan Saring
9 min read
Optimizing Iceberg Lakehouse Performance — problems (small files, fragmented manifests, unsorted data, delete files) flow through autonomous maintenance into faster queries, lower costs, higher throughput, and healthier data
Apache Iceberg, LakeOps, Analytics, Data Platforms

Optimizing Iceberg Lakehouse Performance

Iceberg tables degrade silently — small files from streaming, unsorted data, fragmented manifests, accumulated delete files. Each one caps query speed regardless of engine. Six concrete optimization layers, how they interact, and how autonomous maintenance keeps every table at peak performance.

David W
11 min read
Iceberg Table Maintenance Solution Comparison — side-by-side feature matrix for LakeOps, AWS Glue, S3 Tables, Snowflake, BigLake, Cloudera, and Starburst
Compaction, Apache Iceberg, Lakehouse, Data Platforms

9 Iceberg Table Compaction Tools Compared for Production Lakehouses

Compaction keeps Apache Iceberg lakehouses fast and lean — but every tool approaches it differently. A side-by-side look at nine production options: LakeOps, AWS Glue, Amazon S3 Tables, Snowflake, Google BigLake, Cloudera, Starburst, Dremio, and Databricks.

Jonathan Saring
17 min read
Optimizing Iceberg Lake Compaction — scattered small data-block cubes funnel through a compaction machine onto a conveyor belt of optimized blocks, leading to a crystal-clear iceberg lakehouse
Compaction, Apache Iceberg, Lakehouse, LakeOps

Optimizing Iceberg Lake Compaction: A Guide

Compaction is the most impactful operation in an Apache Iceberg lakehouse — and the hardest to get right at scale. File merging is the easy part. Knowing when to trigger it, what sort strategy to apply per table, how to avoid conflicting with other maintenance, and how to do it without spinning up expensive JVM clusters — that is the real problem. A breakdown of what modern compaction actually requires.

Jonathan Saring
16 min read
Iceberg lakehouse optimization — multi-engine ecosystem (AWS, Databricks, Trino, DuckDB, Snowflake, Flink, and more) around a shared Iceberg lake, with observability and optimization above the waterline
Apache Iceberg, Lakehouse, LakeOps, Observability

Iceberg Lakehouse Optimization — The Right Way

Apache Iceberg gives your lakehouse warehouse-grade reliability on object storage — but the format does not optimize itself. A practical guide to every operational pillar a production Iceberg lakehouse needs — from lake-wide observability and query-aware compaction to snapshot lifecycle, metadata health, and governance — and how LakeOps runs it all from a single control plane.

Jonathan Saring
21 min read
LakeOps measured results on real Iceberg workloads: 95% faster compaction, 12x query performance improvement, 80% cost reduction
Apache Iceberg, LakeOps, Cloud Cost, FinOps

Apache Iceberg Cost Optimization in 2026

Your Iceberg lake is overcharging you from four directions at once — storage bloat, query compute waste, compaction overhead, and engineering time. This post breaks down exactly where each dollar goes and how autonomous table management eliminates the waste without touching your pipelines.

David W
22 min read
Benchmarking Lakeops: A Production-Grade Compaction Engine for Apache Iceberg (External)
Apache Iceberg, Compaction, LakeOps, Lakehouse

Benchmarking Lakeops: A Production-Grade Compaction Engine for Apache Iceberg

How we compacted 4.5 TB across 10 real production tables, achieved up to 99.8% file reduction, and made Apache Spark OOM on a job we finished in 11 minutes.

Amit Gilad
9 min read
Building a Distributed Compaction Engine for Apache Iceberg with Rust + DataFusion (External)
Apache Iceberg, Compaction, LakeOps, Lakehouse

Building a Distributed Compaction Engine for Apache Iceberg with Rust + DataFusion

How we built a high-performance, distributed compaction engine for Apache Iceberg using Rust and DataFusion—architecture, design choices, and lessons learned.

Amit Gilad
9 min read
Cracking the Ice: The Battle Between Sort and Binpack in Apache Iceberg (External)
Apache Iceberg, Compaction, Data Platforms, Lakehouse

Cracking the Ice: The Battle Between Sort and Binpack in Apache Iceberg

Unlocking performance vs. optimizing storage — choosing the right compaction strategy for your data lake.

Amit Gilad
7 min read