Query Performance
Incredible Iceberg
query performance
LakeOps continuously reshapes data layout based on actual query patterns — sort order, file sizes, manifests, and delete files all adapt to how your tables are really used. The result: up to 12× faster queries across Trino, Spark, and Snowflake without changing a single query or pipeline.
Measured Results
Production benchmarks
Query speed
After compaction + layout optimization
Scan reduction
Query-aware sort enables data skipping
File reduction
42,633 → 69 files on streaming tables
TPC-H benchmark suite • Production Iceberg tables • Multi-cloud, multi-engine
The problem
Why Iceberg tables get slow
Iceberg's metadata architecture is built for fast queries. But without active maintenance, physical table state degrades — and every query pays the penalty.
Small files multiply per-query overhead
Streaming ingestion creates thousands of tiny files. Each file costs an S3 GET request, a metadata read, and a connection — query time scales with file count, not data volume.
Unsorted data defeats data skipping
Without sort order aligned to query patterns, Parquet min/max statistics are useless. Engines scan every row group regardless of predicate filters.
Fragmented manifests bloat planning time
Hundreds of small manifests force the query planner to read excessive metadata. Planning often dominates total query time at 200+ manifests per table.
Delete files compound read amplification
Merge-on-read tables accumulate position delete files. Every query reconciles deletes at read time — performance degrades linearly with delete file count.
How LakeOps accelerates queries
Six layers of
performance optimization.
Each layer amplifies the others. Fewer files + sorted data + lean manifests + clean deletes + right engine = compound acceleration on every query.
Query-aware compaction
Data sorted by how it's actually queried
LakeOps tracks which columns appear in WHERE, JOIN, and GROUP BY clauses for every table. During compaction, data is physically sorted by those columns — so Parquet row group statistics enable engines to skip irrelevant data without reading it.
- 51% less data scanned — sorted by real filter columns, per table
- 47,000 → 280 files: same data, same query — 52s drops to 5.8s
- Self-improving: sort strategy adapts as query patterns evolve
Files before
47,000
Files after
280
Query before
52s
Query after
5.8s
Scan volume reduced 51%
Query-aware sort + optimized file layout
95% faster Rust engine
Tables stay optimized because compaction is fast enough to run continuously
A purpose-built Rust engine with Apache DataFusion eliminates JVM/GC overhead. Compaction completes in minutes instead of hours — so tables never degrade between maintenance windows.
- 221s vs 1,612s (Spark) vs 6,300s (S3 Tables) on identical 200 GB
- 2,522 MB/s peak throughput — TB-scale tables compacted in minutes
- Bounded memory: no OOM crashes regardless of table size
Manifest & metadata optimization
Query planning stays fast at any table scale
LakeOps consolidates fragmented manifests and computes Puffin column statistics (NDV, min/max, null counts). Planners read fewer manifests and make smarter skip decisions — planning drops from seconds to milliseconds.
- Manifest consolidation: 200+ manifests → ~30 in a single atomic rewrite
- Puffin statistics enable aggressive file-level pruning across all engines
- Auto-triggered after compaction cycles — manifests never drift
Rewrite Manifests
Consolidate for faster planning
Rewrite Position Deletes
Eliminate read-time overhead
Compute Puffin Statistics
Enable aggressive file pruning
Delete file optimization
Eliminate read-time reconciliation overhead
Position delete files from merge-on-read workloads accumulate and force every query to reconcile deletions at scan time. LakeOps consolidates and physically applies delete files so reads are always clean.
- Rewrite Position Deletes: consolidate without full table rewrite
- Full compaction: physically merge deletes — zero read-time overhead
- 23,433 delete files (551M rows) cleaned in one compaction cycle
Delete files
23,433
After cleanup
0
Rows affected
551M
Read overhead
Eliminated
Multi-engine query routing
Every query on the fastest engine for its shape
LakeOps routes queries across Trino, Spark, DuckDB, Snowflake, Athena, and Flink based on latency profile, query shape, and engine availability. Interactive queries hit sub-second engines. Heavy scans go where compute is strongest.
- DuckDB: 0.5s point lookups vs 2.3s on Athena for same query
- Three strategies: latency, cost, throughput — per routing group
- Optimized tables unlock faster engines for more workload shapes
Right engine for every query shape
Layout simulations
Test sort strategies on real data before rewriting anything
Run proposed layout changes on an isolated Iceberg branch — real data, real query patterns replayed. Compare scan reduction and planning overhead across multiple strategies. Discard the branch. Zero production risk.
- Field access frequency analysis: which columns in FILTER, SELECT, JOIN
- Side-by-side comparison of file sizes, strategies, and sort keys
- Predicted vs actual: measurable before committing to a rewrite
Branch discarded after analysis — zero production impact
Runs on your stack
Go deeper
Optimizing Iceberg Lakehouse Performance
Six layers that determine query speed and how to optimize each one.
Efficient Lakehouse Compaction at Scale
Event-driven triggers, query-aware sort, Rust engine, and coordinated maintenance.
7 Iceberg Cost Reduction Strategies
Performance and cost reinforce each other — seven strategies for both.
Minutes to value with no risk
Connect & collect telemetry
Manual or autonomous management
Operations run & optimize
Observability & governance
Production benchmarks
5.5 TB across 10 production tables
Real workloads. Real data. Batch, streaming, delete-heavy, multi-writer, and terabyte-scale tables — all on the same engine, same hardware.
| Table | Size | Workload | Files (B → A) | Throughput | Time | Notes |
|---|---|---|---|---|---|---|
| balance_snapshots | 1,192 GB | TB-Scale batch | 11,957 → 3,270 | 1,572 MB/s | 11 min | Spark OOM on same hardware |
| user_accounts | 174 GB | Batch | 878 → 400 | 2,269 MB/s | 74s | Single Node |
| events_analytics | 484 GB | Delete-Heavy | 16,128 → 7,198 | 729 MB/s | 11m 21s | 23,433 delete files; 551M rows removed |
| raw_sdk_events | 8 GB | Streaming | 42,633 → 69 | 167 MB/s | 138s | 99.8% file reduction |
| site_traffic | 292 GB | Multi-Writer | 2,740 → 754 | 1,465 MB/s | 3m 25s | Single partition |
| cluster_registry | 322 GB | Batch | 998 → 440 | 2,522 MB/s | 2m | Peak throughput |
Compaction cost per TB
Normalized to Spark = 100%
Source: 200 GB (~1 TB uncompressed) benchmark. Spark cost index 100 vs LakeOps 10.
Self-improving: same table, zero config changes
balance_snapshots — 1.192 TB across consecutive runs
Same data and hardware; planner learns workload telemetry and improves runtime from 22 to 11 minutes.
See your projected acceleration
Connect your catalog and get a free performance analysis in 10 minutes — see exactly where your tables are degraded and how much LakeOps can accelerate them.
