
Apache Iceberg gives you ACID transactions, schema evolution, partition evolution, and time travel — all in an open table format readable by every major query engine. What it does not give you is a way to keep tables healthy over time.
Every write to an Iceberg table creates a new snapshot, adds manifest entries, and produces data files. Over weeks and months, this accumulates into a structural debt that degrades query performance, inflates storage costs, and slows down every engine connected to the table. The fix requires four distinct maintenance operations — compaction, snapshot expiration, orphan file cleanup, and manifest rewriting — executed in the right order, at the right frequency, across every table in the lake.
Iceberg ships the SQL procedures and Java APIs for each operation. But it does not ship scheduling, health detection, sequencing logic, conflict handling, multi-table prioritization, observability, or alerting. Those are left to the operator — and the gap between "we have the procedures" and "the lake is healthy" is where most teams struggle.
This guide covers what each maintenance operation does, why the order matters, how teams typically automate with scripts and orchestrators, where that approach breaks down, and how a dedicated control plane like LakeOps replaces manual maintenance with an autonomous, policy-driven system that keeps every table optimized without human intervention.
The five maintenance operations
Iceberg's maintenance surface consists of five operations. Each targets a different layer of the table — data files, metadata, or storage — and each has dependencies on the others.
1. Snapshot expiration
Every commit to an Iceberg table — every write, every compaction, every schema change — creates a new snapshot. Snapshots enable time travel and rollback, but they also pin references to data files. A table with 5-minute streaming commits accumulates 8,640 snapshots per month. Each one prevents the storage layer from reclaiming superseded data files.
1CALL catalog.system.expire_snapshots(2 table => 'analytics.clickstream',3 older_than => TIMESTAMP '2026-06-04 00:00:00',4 retain_last => 1005);Key parameters:
- `older_than` — snapshots older than this timestamp are expired. Typical streaming retention is 3–7 days; batch tables may retain 14–30 days for compliance
- `retain_last` — minimum number of snapshots to keep regardless of age. Prevents accidentally expiring all snapshots on a low-write table
- `max_concurrent_deletes` — parallelism for file deletion. Increase on tables with many expired files to reduce job duration
Without regular expiration, the metadata layer grows unboundedly and query planning slows as engines traverse an ever-longer snapshot chain. See Iceberg Table Health & Maintenance for a detailed walkthrough of snapshot lifecycle management.
2. Orphan file cleanup
Orphan files are data files that exist in object storage but are not referenced by any Iceberg snapshot. They are created by failed writes, aborted commits, crashed compaction jobs, and concurrent writer conflicts. Iceberg does not track them — they are invisible to query engines but fully billable by S3, GCS, or ADLS.
On mature data lakes, orphan files routinely account for 25–40% of billable storage on affected table prefixes.
1CALL catalog.system.remove_orphan_files(2 table => 'analytics.clickstream',3 older_than => TIMESTAMP '2026-05-31 00:00:00'4);Critical safety rule: Always use a retention threshold of 7+ days. Files from in-progress writes are temporarily orphaned until the writer commits. Deleting them prematurely corrupts the table. The default threshold in Iceberg is 3 days; production environments with long-running Spark or Flink jobs should use 7 days or more.
3. Data file compaction
Streaming ingestion, micro-batch ETL, and frequent appends create thousands of undersized data files. A Flink job checkpointing every 60 seconds against 100 partitions produces 144,000 new files per day — each one tracked in manifests, each one triggering an S3 GET on every query.
Compaction merges these small files into 256–512 MB targets, reducing file count by 10–100x and improving query performance proportionally. Iceberg supports three strategies — binpack, sort, and Z-order — each making a different tradeoff between compaction speed and query improvement.
1CALL catalog.system.rewrite_data_files(2 table => 'analytics.clickstream',3 strategy => 'sort',4 sort_order => 'event_type ASC NULLS LAST',5 where => 'event_date < current_date()',6 options => map(7 'target-file-size-bytes', '268435456',8 'min-input-files', '5',9 'partial-progress.enabled', 'true',10 'partial-progress.max-commits', '10',11 'max-concurrent-file-group-rewrites', '10'12 )13);The where clause is critical for streaming tables — it restricts compaction to cold partitions, preventing conflicts with active writers. partial-progress.enabled allows the job to commit in chunks so a single conflict does not invalidate the entire run. For a deep dive on compaction strategies and tuning, see Kafka to Iceberg Compaction — Done Right.
4. Manifest rewriting
Every commit creates at least one new manifest file. Manifests track which data files belong to which partitions, along with per-file column statistics. After weeks of streaming writes, a table can accumulate thousands of manifest files — each one read during query planning.
1CALL catalog.system.rewrite_manifests(2 table => 'analytics.clickstream'3);Manifest rewriting consolidates fragmented manifests into a handful of properly sized ones, organized by partition spec. On a table with 2,000+ manifests, this can cut query planning time from seconds to milliseconds. Always run manifest rewriting after compaction — compaction changes the file set, and manifests should reflect the final layout.
5. Metadata file cleanup
Each snapshot also produces a metadata file. Iceberg can clean these automatically if you enable the table property:
1ALTER TABLE analytics.clickstream SET TBLPROPERTIES (2 'write.metadata.delete-after-commit.enabled' = 'true',3 'write.metadata.previous-versions-max' = '100'4);This is a write-time optimization that does not require a separate maintenance job. Enable it on every production table.
Why the order matters
The four active maintenance operations have dependencies. Running them in the wrong order wastes compute, misses cleanup opportunities, or produces stale metadata.
The correct sequence is: expire snapshots → remove orphan files → compact data files → rewrite manifests.

Why expire before compact. Snapshot expiration dereferences data files that are no longer needed. If you compact first, the compaction engine reads and rewrites files that expiration would have removed — wasting compute on data no query will ever read again. On a table with months of accumulated snapshots, this waste can amount to hundreds of gigabytes of unnecessary I/O.
Why clean orphans before compact. After expiration releases file references, orphan cleanup removes the physical files. This reduces the storage footprint before compaction runs, so compaction operates on a clean dataset. It also prevents stale orphan files from inflating per-partition file counts and skewing compaction decisions.
Why compact before rewriting manifests. Compaction removes input files and creates new output files. Manifests track files. If you rewrite manifests first, they will be aligned to the pre-compaction layout. The moment compaction runs, those carefully rewritten manifests become stale and fragmented again. Compact first, then rewrite manifests against the final file set.
Running operations independently — separate cron jobs on separate schedules — produces exactly these problems. Snapshot expiration runs on Monday, compaction runs on Tuesday, orphan cleanup runs on Wednesday. Each operation has already missed the window where its output would feed the next step cleanly.
Measuring table health
Before automating maintenance, you need to know which tables need attention and how urgently. Iceberg's metadata tables provide the raw signals.
File health per partition
1SELECT2 partition,3 COUNT(*) AS file_count,4 AVG(file_size_in_bytes) / 1048576 AS avg_size_mb,5 SUM(CASE WHEN file_size_in_bytes < 67108864 THEN 1 ELSE 0 END) AS small_files6FROM analytics.clickstream.files7GROUP BY partition8ORDER BY file_count DESC;Healthy: fewer than 100 files per partition, averaging 256–512 MB. Warning: 500+ files or average size below 128 MB. Critical: 1,000+ files or average below 64 MB.
Snapshot accumulation
1SELECT2 COUNT(*) AS snapshot_count,3 MIN(committed_at) AS oldest_snapshot,4 MAX(committed_at) AS latest_snapshot5FROM analytics.clickstream.snapshots;If snapshot count exceeds 1,000, expiration is not running or not aggressive enough. Each snapshot pins file references and adds planning overhead.
Manifest fragmentation
1SELECT2 COUNT(*) AS manifest_count,3 AVG(added_data_files_count + existing_data_files_count) AS avg_files_per_manifest4FROM analytics.clickstream.manifests;Target: fewer than 100 manifests per snapshot. Streaming tables with 5-minute commits can accumulate 8,600+ manifests in 30 days.
Running these queries manually does not scale beyond a handful of tables. At 50+ tables, you need automated health classification — which is one of the core capabilities a control plane like LakeOps provides. LakeOps continuously monitors these signals across every table and classifies each as Healthy, Warning, or Critical, surfacing the tables that need attention without manual SQL inspection.
Automating with Airflow and scripts
The most common approach to Iceberg maintenance automation is an Airflow DAG that runs Spark SQL procedures on a schedule:
1from airflow import DAG2from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator3from datetime import datetime, timedelta4 5with DAG(6 'iceberg_maintenance',7 schedule_interval='0 2 * * *',8 start_date=datetime(2026, 1, 1),9 catchup=False,10) as dag:11 12 expire = SparkSqlOperator(13 task_id='expire_snapshots',14 sql="""15 CALL catalog.system.expire_snapshots(16 table => 'analytics.clickstream',17 older_than => current_timestamp() - INTERVAL 5 DAYS,18 retain_last => 10019 )20 """,21 )22 23 orphans = SparkSqlOperator(24 task_id='remove_orphans',25 sql="""26 CALL catalog.system.remove_orphan_files(27 table => 'analytics.clickstream',28 older_than => current_timestamp() - INTERVAL 7 DAYS29 )30 """,31 )32 33 compact = SparkSqlOperator(34 task_id='compact_files',35 sql="""36 CALL catalog.system.rewrite_data_files(37 table => 'analytics.clickstream',38 strategy => 'binpack',39 where => 'event_date < current_date()',40 options => map(41 'target-file-size-bytes', '268435456',42 'min-input-files', '5',43 'partial-progress.enabled', 'true'44 )45 )46 """,47 )48 49 manifests = SparkSqlOperator(50 task_id='rewrite_manifests',51 sql="""52 CALL catalog.system.rewrite_manifests(53 table => 'analytics.clickstream'54 )55 """,56 )57 58 expire >> orphans >> compact >> manifestsThis works for a single table. It enforces the correct operation order, runs daily, and isolates failures per step. But it has structural limitations that compound as the lake grows.
Where scripts and cron jobs break down
The gap between Iceberg's maintenance procedures and production-grade automation is not a single missing feature — it is an entire operational layer. Here is what scripts and Airflow DAGs cannot provide:
| Capability | Iceberg procedures | Scripts / Airflow | What you actually need |
|---|---|---|---|
| Maintenance operations | Four SQL procedures | Same, wrapped in tasks | Same |
| Scheduling | None | Cron / Airflow scheduler | Event-driven triggers based on table state |
| Health detection | None | Manual SQL queries | Continuous per-table monitoring with thresholds |
| Multi-table prioritization | None | First-in-first-out queue | Cost-aware ordering (worst tables first) |
| Operation sequencing | None | DAG dependency chains | Coordinated pipeline with awareness of each step's output |
| Conflict handling | ValidationException on retry | Catch-and-retry logic you build | Partition-aware execution that avoids hot partitions |
| Resource isolation | None | Cluster sizing | Bounded memory per operation, no OOM |
| Observability | Procedure return values | Log parsing | Per-operation audit trail with before/after metrics |
| Alerting | None | Custom integrations | Severity-ranked insights (Critical / Warning / Healthy) |
| Policy governance | None | DAG configuration files | Versioned, auditable policies at catalog/namespace/table scope |
Fixed schedules are wasteful. A nightly compaction job runs on every table regardless of whether it needs compaction. A streaming table that accumulated 50,000 small files in 6 hours waits until midnight. A batch table that received no new data since last week gets compacted anyway.
Per-table DAGs do not scale. At 10 tables, you maintain 10 DAGs. At 100 tables across 3 catalogs, you maintain 100 DAGs with different schedules, different retention windows, different compaction strategies. Every new table requires a new DAG — and the DAGs themselves become a maintenance burden.
No awareness of table state. Airflow runs what you schedule, when you schedule it. It does not know that raw_clickstream has 200,000 small files and needs urgent compaction while dim_products has 12 files and needs nothing. Both get the same treatment at the same time.
JVM overhead compounds. Each Spark maintenance job carries JVM startup (2–5 minutes), garbage collection pauses, and executor provisioning costs. Across 100 tables, the compaction compute bill can rival the query compute bill. On tables above 1 TB with complex delete patterns, Spark routinely OOMs — requiring cluster resizing or job splitting.
No cross-operation coordination. Even with DAG dependencies enforcing the correct order per table, there is no coordination across tables. Compaction on Table A may conflict with a streaming writer on the same cluster. Orphan cleanup on Table B may start before snapshot expiration on Table B has finished because they are in separate DAGs.
LakeOps: autonomous maintenance for every table
LakeOps is an autonomous control plane for Apache Iceberg that replaces scripts, cron jobs, and Airflow DAGs with a closed-loop maintenance system. It connects to your existing catalogs (Glue, REST, Polaris, Nessie, S3 Tables) and query engines (Trino, Spark, Flink, Snowflake, Athena, DuckDB) without moving data or changing pipelines. Your data stays in your storage account; LakeOps reads metadata and telemetry, executes maintenance, and commits through standard Iceberg APIs.

Here is how LakeOps automates each maintenance operation, component by component.
Compaction: Rust engine with query-aware sort
LakeOps runs compaction on a purpose-built engine written in Rust on Apache DataFusion. The engine processes Parquet data through Arrow columnar buffers with bounded memory, lock-free parallelism, and no JVM — eliminating the startup, GC, and OOM problems that plague Spark-based compaction.
Production benchmarks show binpack completing in 221 seconds versus Spark's 1,612 seconds on identical 200 GB datasets — at roughly $5/TB versus $50/TB. A 1.2 TB table that caused Spark to OOM completed in 11 minutes on the Rust engine. For a full benchmark analysis, see 9 Iceberg Compaction Tools Compared.

Beyond speed, LakeOps compaction is query-aware. The engine collects query telemetry across every connected engine — Trino, Spark, Snowflake, Athena, DuckDB, Flink — and identifies which columns appear in WHERE, JOIN, and GROUP BY clauses per table. Compaction then sorts data by those columns, so Parquet min/max statistics enable engines to skip entire files. The result: every query runs faster across every engine, without changing a single query or pipeline.
Compaction is configurable per table through the Optimization tab — strategy (binpack or sort), predicate scoping, cron schedule, and an instant Execute button for on-demand runs.
Snapshot expiration: automated retention with safety guardrails
LakeOps runs snapshot expiration as the first step in the maintenance pipeline. Retention is configurable per policy — use the table's own history.expire.max-snapshot-age-ms setting, or override with a custom retention period (e.g., 5 days for streaming, 30 days for compliance). A minimum snapshot retention count prevents accidentally expiring all snapshots on low-write tables.
Configuration options include whether to delete associated metadata files and data files — giving you control over how aggressively storage is reclaimed. The operation runs automatically on the configured cron schedule or as part of Adaptive Maintenance, with every execution logged in the Events tab.
Orphan file cleanup: safe reclamation with age thresholds
After snapshot expiration releases file references, LakeOps runs orphan cleanup to remove the physical files from storage. The configurable age threshold (default: 7 days) ensures files from in-progress writes are never deleted. On mature streaming lakes where orphans account for 25–40% of storage costs, this operation alone can produce significant savings. See Reduce S3 Costs on Iceberg Tables for storage cost analysis.
Manifest rewriting: planning-time optimization
After compaction produces the final file layout, LakeOps rewrites manifests to match. Fragmented manifests from weeks of streaming writes are consolidated into properly sized manifests organized by partition spec. This cuts query planning time — the overhead engines pay before reading any data — from seconds to milliseconds on tables with heavy write activity.
The coordinated pipeline
These four operations do not run independently. LakeOps sequences them per table in the correct order: expire snapshots → remove orphans → compact → rewrite manifests. Each step's output feeds the next. The pipeline is conflict-aware — compaction targets cold partitions while active streaming partitions continue receiving writes. If a conflict occurs, LakeOps retries the affected partition on the next cycle. No data is lost or corrupted.

Policies: maintenance rules at scale
Configuring maintenance per table works at 10 tables. At 100 or 500, you need policies. LakeOps policies apply maintenance rules at the catalog, namespace, or table level — every table in scope inherits the policy automatically, including new tables created after the policy is defined.
Policy types
- Adaptive Maintenance — bundles compaction, snapshot expiration, manifest rewrite, and delete file handling into a single data-driven policy. LakeOps monitors table activity and runs the right operations at the right time. When active on a table, it manages all individual optimization sections automatically
- Expire Snapshots — configurable retention period, minimum snapshot count, metadata/data file deletion toggles
- Remove Orphan Files — configurable age threshold (default: 7 days)
- Rewrite Manifests — consolidate manifests on a schedule
- Configuration & Governance — enforce table-level settings like Iceberg format version, default file format (Parquet/ORC/Avro), write distribution mode, and commit retry settings across the organization
Scope and precedence
Policies cascade: catalog-wide → namespace → table. More specific policies override broader ones. A per-table policy always takes precedence over a namespace or catalog-wide policy for the same operation type. This lets you set sensible defaults at the catalog level and override only where needed.
New tables automatically inherit policies from their parent namespace or catalog — no manual setup required. Tables can be excluded from inherited policies with a single toggle.
Example policy set
| Policy | Type | Scope | Schedule |
|---|---|---|---|
| prod_adaptive_maintenance | Adaptive Maintenance | analytics catalog | Data-driven |
| prod_expire_snapshots | Expire Snapshots | analytics catalog | Hourly |
| prod_rewrite_manifests | Rewrite Manifests | analytics catalog | Daily at 4:00 AM |
| org_orphan_cleanup | Remove Orphan Files | All catalogs | Daily at 3:00 AM |
| streaming_compaction_override | Per-table compaction | raw_clickstream | Every 2 hours |
Every policy change is versioned and auditable — you can see when a policy was created, modified, enabled, or disabled, and by whom.

Observability: seeing the health of every table
Autonomous maintenance without observability is a black box. LakeOps provides three levels of visibility:
Lake-wide dashboard
The dashboard shows aggregate metrics across all connected catalogs: total operations run, average query acceleration (pre vs. post optimization), estimated cost savings, CPU and storage reduction, and total data optimized. This is the executive view — a single screen that answers "is the lake getting healthier or worse?"
Per-table health classification
Every table is continuously classified as Healthy, Warning, or Critical based on file count, file size distribution, manifest fragmentation, snapshot depth, and orphan file accumulation. Streaming tables with small-file buildup or delete file accumulation surface as Warning or Critical immediately — not after a user reports slow queries.

Insights: proactive alerts
The Insights tab surfaces problems at four severity levels before users notice them:
- CRITICAL — severe fragmentation (e.g., raw_clickstream with 312 partitions and extreme file count)
- HIGH — excessive manifests or snapshot accumulation (e.g., search_query_logs with 2,000+ manifests)
- WARNING — partition skew, small file buildup, or rising delete-to-data ratios
- LOW — early signs of accumulation that will become problems if unaddressed
Each insight links directly to the affected table for one-click investigation. After policy-driven maintenance resolves the issue, the insight clears automatically.

Per-table events and metrics
The Events tab provides a complete audit trail — every compaction, snapshot expiration, orphan removal, and manifest rewrite with duration, before/after file counts, data volume, and status. The Metrics tab shows convergence toward targets: avgFileSize trending toward 256 MB, totalFiles decreasing, snapshot count stabilizing.
Simulations: preview before you apply
Before committing to a compaction strategy, LakeOps lets you simulate the impact on a real Iceberg branch. Layout simulations show how many files will be rewritten, expected output file count and average size, field access frequency analysis for query-aware optimization, and side-by-side comparison of multiple strategies. This eliminates the guesswork of "should I binpack or sort?" — run both simulations and compare the projected query improvement.
Getting started: from manual to autonomous
The path from manual scripts to autonomous maintenance is incremental. LakeOps supports every step:
1. Connect your catalogs. Point LakeOps at your existing Glue, REST, Polaris, Nessie, or S3 Tables catalog. Discovery and health classification begin immediately — no agents to install, no data to move.

2. Inspect table health. Review which tables are Healthy, Warning, or Critical. Check the Insights tab for severity-ranked issues. Identify the tables that need attention first.
3. Run maintenance manually. Use the per-table Optimization tab to run compaction, snapshot expiration, orphan cleanup, or manifest rewriting on individual tables. Review results in the Events and Metrics tabs. Build confidence in how each operation affects your tables.

4. Enable per-table schedules. Toggle on scheduled compaction, snapshot expiration, and manifest rewriting for individual tables. Set cron expressions and verify results over a few cycles.
5. Create catalog-wide policies. Once validated on individual tables, create policies at the namespace or catalog level. Every existing and future table inherits the maintenance rules automatically.
6. Enable Adaptive Maintenance. For fully autonomous operation, activate Adaptive Maintenance. LakeOps monitors table activity signals and runs the right operations at the right time — no fixed schedules, no manual tuning. The system adapts as workloads change.
Configuration checklist
Table properties (set once per table)
- Enable metadata cleanup: `write.metadata.delete-after-commit.enabled = true`
- Set target file size: `write.target-file-size-bytes = 268435456` (256 MB) for analytics,
536870912(512 MB) for heavy scans - For streaming tables, use `hash` write distribution mode to reduce cross-partition file scatter
- Prefer `day(timestamp)` hidden partitioning over
hour(timestamp)unless volume exceeds 5 GB/hour. See Partitioning Best Practices
Maintenance operations (per table or via policy)
- Expire snapshots: 3–7 day retention for streaming, 14–30 days for batch/compliance. Retain at least 100 snapshots
- Orphan cleanup: 7+ day age threshold. Run daily after snapshot expiration
- Compaction: Binpack every 1–4 hours for streaming tables, daily for batch. Sort compaction daily or weekly for query optimization. Always exclude the active partition
- Manifest rewrite: Daily, after compaction completes
Monitoring thresholds (alert when breached)
- File count per partition exceeds 500
- Average file size drops below 64 MB
- Snapshot count exceeds 1,000
- Manifest count exceeds 100 per snapshot
- Query planning time exceeds 5 seconds
- S3 GET request costs spike more than 20% week-over-week
Summary
Apache Iceberg provides the maintenance operations. It does not provide the automation, scheduling, health detection, conflict handling, sequencing, observability, or governance that production lakehouses require. Most teams start with Airflow DAGs and Spark SQL procedures — and most teams outgrow that approach within months as the table count crosses 50.
The production-grade approach is a dedicated control plane that monitors every table continuously, sequences operations correctly, adapts to each table's write pattern, executes compaction on an engine built for the workload, and provides lake-wide observability with per-table audit trails.
LakeOps provides the full stack — from catalog connection to autonomous, policy-driven maintenance that keeps every Iceberg table compacted, lean, and queryable without scripts, cron jobs, or manual intervention.
Further reading
- Iceberg Table Health & Maintenance — the correct maintenance sequence and why order matters
- Kafka to Iceberg Compaction — Done Right — compaction strategies, tuning, and conflict avoidance for streaming tables
- Kafka to Iceberg: Ingestion Guide — ingestion paths and the operational reality of streaming into Iceberg
- Fixing Small Files in Apache Iceberg — root causes, measurement, and automated resolution
- 9 Iceberg Compaction Tools Compared — LakeOps, Spark, AWS Glue, S3 Tables, Snowflake, and more
- Reduce S3 Costs on Iceberg Tables — storage cost optimization and orphan file impact
- Iceberg Delete Files Guide — monitoring and resolving delete file accumulation
- Iceberg Partitioning Best Practices — choosing the right partition strategy
- Apache Iceberg 1.11.0 — What's New? — deletion vectors, Variant type, and V3 improvements



