Apache Iceberg logo

Query Performance

Incredible Iceberg
query performance

LakeOps continuously reshapes data layout based on actual query patterns — sort order, file sizes, manifests, and delete files all adapt to how your tables are really used. The result: up to 12× faster queries across Trino, Spark, and Snowflake without changing a single query or pipeline.

12×
faster queries
51%
less data scanned
99.8%
max file reduction
95%
faster compaction

Measured Results

Production benchmarks

Query speed

12×faster

After compaction + layout optimization

Scan reduction

51%less data

Query-aware sort enables data skipping

File reduction

99.8%fewer files

42,633 → 69 files on streaming tables

TPC-H benchmark suite • Production Iceberg tables • Multi-cloud, multi-engine

The problem

Why Iceberg tables get slow

Iceberg's metadata architecture is built for fast queries. But without active maintenance, physical table state degrades — and every query pays the penalty.

Small files multiply per-query overhead

Streaming ingestion creates thousands of tiny files. Each file costs an S3 GET request, a metadata read, and a connection — query time scales with file count, not data volume.

Unsorted data defeats data skipping

Without sort order aligned to query patterns, Parquet min/max statistics are useless. Engines scan every row group regardless of predicate filters.

Fragmented manifests bloat planning time

Hundreds of small manifests force the query planner to read excessive metadata. Planning often dominates total query time at 200+ manifests per table.

Delete files compound read amplification

Merge-on-read tables accumulate position delete files. Every query reconciles deletes at read time — performance degrades linearly with delete file count.

How LakeOps accelerates queries

Six layers of
performance optimization.

Each layer amplifies the others. Fewer files + sorted data + lean manifests + clean deletes + right engine = compound acceleration on every query.

Query-aware compaction

Data sorted by how it's actually queried

LakeOps tracks which columns appear in WHERE, JOIN, and GROUP BY clauses for every table. During compaction, data is physically sorted by those columns — so Parquet row group statistics enable engines to skip irrelevant data without reading it.

  • 51% less data scanned — sorted by real filter columns, per table
  • 47,000 → 280 files: same data, same query — 52s drops to 5.8s
  • Self-improving: sort strategy adapts as query patterns evolve
Query Acceleration12× faster

Files before

47,000

Files after

280

Query before

52s

Query after

5.8s

Scan volume reduced 51%

Query-aware sort + optimized file layout

95% faster Rust engine

Tables stay optimized because compaction is fast enough to run continuously

A purpose-built Rust engine with Apache DataFusion eliminates JVM/GC overhead. Compaction completes in minutes instead of hours — so tables never degrade between maintenance windows.

  • 221s vs 1,612s (Spark) vs 6,300s (S3 Tables) on identical 200 GB
  • 2,522 MB/s peak throughput — TB-scale tables compacted in minutes
  • Bounded memory: no OOM crashes regardless of table size
Compaction Speed95% faster
6300s
S3 Tables
1612s
Spark
221s
LakeOps
780s
LakeOps Sort

Manifest & metadata optimization

Query planning stays fast at any table scale

LakeOps consolidates fragmented manifests and computes Puffin column statistics (NDV, min/max, null counts). Planners read fewer manifests and make smarter skip decisions — planning drops from seconds to milliseconds.

  • Manifest consolidation: 200+ manifests → ~30 in a single atomic rewrite
  • Puffin statistics enable aggressive file-level pruning across all engines
  • Auto-triggered after compaction cycles — manifests never drift
Metadata Optimization3 operations

Rewrite Manifests

Consolidate for faster planning

Planning

Rewrite Position Deletes

Eliminate read-time overhead

Reads

Compute Puffin Statistics

Enable aggressive file pruning

Skipping

Delete file optimization

Eliminate read-time reconciliation overhead

Position delete files from merge-on-read workloads accumulate and force every query to reconcile deletions at scan time. LakeOps consolidates and physically applies delete files so reads are always clean.

  • Rewrite Position Deletes: consolidate without full table rewrite
  • Full compaction: physically merge deletes — zero read-time overhead
  • 23,433 delete files (551M rows) cleaned in one compaction cycle
Delete File CleanupZero overhead

Delete files

23,433

After cleanup

0

Rows affected

551M

Read overhead

Eliminated

Multi-engine query routing

Every query on the fastest engine for its shape

LakeOps routes queries across Trino, Spark, DuckDB, Snowflake, Athena, and Flink based on latency profile, query shape, and engine availability. Interactive queries hit sub-second engines. Heavy scans go where compute is strongest.

  • DuckDB: 0.5s point lookups vs 2.3s on Athena for same query
  • Three strategies: latency, cost, throughput — per routing group
  • Optimized tables unlock faster engines for more workload shapes
Engine RoutingLatency-aware
DuckDBPoint lookups0.5s
TrinoInteractive SQL1.8s
SnowflakeHigh concurrency2.1s
SparkLarge distributed scans4.2s

Right engine for every query shape

Layout simulations

Test sort strategies on real data before rewriting anything

Run proposed layout changes on an isolated Iceberg branch — real data, real query patterns replayed. Compare scan reduction and planning overhead across multiple strategies. Discard the branch. Zero production risk.

  • Field access frequency analysis: which columns in FILTER, SELECT, JOIN
  • Side-by-side comparison of file sizes, strategies, and sort keys
  • Predicted vs actual: measurable before committing to a rewrite
Layout SimulationBranch-based
1Analyze field access frequency (FILTER, SELECT, JOIN)
2Create isolated Iceberg branch
3Apply proposed sort order to real data
4Replay production queries, measure scan reduction

Branch discarded after analysis — zero production impact

Runs on your stack

AWSAzureGoogle CloudSnowflakeDatabricksApache FlinkApache HadoopApache IcebergDelta LakeSparkLakekeeperStarRocks

Minutes to value with no risk

1

Connect & collect telemetry

Apache Iceberg
AWS
Snowflake
Trino
2

Manual or autonomous management

Manual
Autonomous
3

Operations run & optimize

Compaction
Snapshots
Orphan cleanup
Manifests & metadata
4

Observability & governance

Metrics
Health
Agents
Routing
Logs
Policies
No vendor lock-in
No code / infra changes
No data changes

Production benchmarks

5.5 TB across 10 production tables

Real workloads. Real data. Batch, streaming, delete-heavy, multi-writer, and terabyte-scale tables — all on the same engine, same hardware.

101K → 19K
files (81% reduction)
2,522 MB/s
peak throughput
99.8%
max file reduction
551M
deleted rows cleaned
TableSizeWorkloadFiles (B → A)ThroughputTimeNotes
balance_snapshots1,192 GBTB-Scale batch11,9573,2701,572 MB/s11 minSpark OOM on same hardware
user_accounts174 GBBatch8784002,269 MB/s74sSingle Node
events_analytics484 GBDelete-Heavy16,1287,198729 MB/s11m 21s23,433 delete files; 551M rows removed
raw_sdk_events8 GBStreaming42,63369167 MB/s138s99.8% file reduction
site_traffic292 GBMulti-Writer2,7407541,465 MB/s3m 25sSingle partition
cluster_registry322 GBBatch9984402,522 MB/s2mPeak throughput

Compaction cost per TB

Normalized to Spark = 100%

Apache Spark100%
AWS S3 Tables / Databricks100%
LakeOps10%

Source: 200 GB (~1 TB uncompressed) benchmark. Spark cost index 100 vs LakeOps 10.

Self-improving: same table, zero config changes

balance_snapshots — 1.192 TB across consecutive runs

Run 122 min · 925 MB/s
Run 218 min · 1,100 MB/s
Run 3 (learned)11 min · 1,572 MB/s

Same data and hardware; planner learns workload telemetry and improves runtime from 22 to 11 minutes.

See your projected acceleration

Connect your catalog and get a free performance analysis in 10 minutes — see exactly where your tables are degraded and how much LakeOps can accelerate them.