Managed Apache Iceberg

Managed and optimized Iceberg Lakehouse

LakeOps continuously handles compaction, data layout, and table maintenance (snapshots, manifests, metadata, and orphan files) across every engine and cloud, so your Iceberg tables stay fast, lean, and production-ready.

80% cost reduction
12× faster queries
95% faster compaction
100% autonomous

The challenge

Iceberg is powerful, but hard to operate at scale

Rising cost & latency

Small files, stale snapshots, and orphaned files compound quietly, driving up compute and storage cost while query latency keeps drifting.

Manual engine operations

Spark, Trino, Athena, Snowflake, and Databricks optimize differently. Teams juggle per-engine scripts, configs, and schedules that do not scale.

Metadata & layout drift

Manifests bloat, partitions skew, and layouts drift from real workloads, degrading scan efficiency, cache locality, and query planning.

Ops debt & compliance

Ad-hoc scripts and reactive firefighting create operational debt. Retention, DR, and GDPR policies stay manual instead of control-plane enforced.

Results

Measured impact on real Iceberg workloads

Benchmarks from production-grade tables across multiple engines and cloud providers.

Compaction speed

95% faster

vs. Apache Spark on identical datasets

Query performance

12× faster

After compaction + layout optimization

Cost savings

80% reduction

In compute & storage spend

Table health

100% healthy

Autonomous maintenance keeps every table optimized

TPC-DS benchmark suite · Production Iceberg tables · Multi-cloud, multi-engine

Capabilities

Autonomous Lakehouse orchestration and optimization

Every layer of your lakehouse — from compaction and metadata to engines, observability, and policy enforcement — managed from one control plane.

Compaction Benchmark · 1 TB TPC-DS
29× faster

Compaction Duration (seconds)

S3 Tables: 6,300 s
Apache Spark: 1,612 s
LakeOps: 221 s
LakeOps (Sort): 780 s

Cost of Compaction

[Chart: Cost ($) for S3 Tables, Apache Spark, LakeOps, and LakeOps (Sort)]

Triggered by telemetry signals, not cron. Runs only when file health degrades.

Compaction

Intelligent compaction

Not just file merging — LakeOps analyzes which columns your queries actually filter, join, and group on, then organizes data files accordingly. The result: predicate pushdown and column pruning skip entire file groups, reducing I/O, query time, and compute cost across every engine reading the table. Powered by a Rust-based engine with Apache DataFusion — 95% faster and ~10x cheaper than Spark.

  • Query-aware: sorts data by the columns your workloads use most to cut query time and CPU
  • Triggered when needed; no cron jobs
  • Self-improving planner that adapts as query patterns change
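
To make the query-aware idea concrete, here is a minimal Python sketch of how sort columns could be ranked from query telemetry. It is an illustration only, not the LakeOps planner: the `query_log` structure, its field names, and the `rank_sort_columns` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical telemetry: the columns each query used in its
# filter, join, and group-by clauses. Field names are illustrative.
query_log = [
    {"filter": ["event_date", "region"], "join": ["customer_id"], "group": ["region"]},
    {"filter": ["event_date"], "join": [], "group": ["event_date"]},
    {"filter": ["event_date", "region"], "join": [], "group": []},
    {"filter": ["status"], "join": [], "group": []},
]

def rank_sort_columns(log, top_n=2):
    """Return the top_n columns by the number of queries that touch them."""
    usage = Counter()
    for query in log:
        # Count a column once per query, whichever clause it appears in.
        touched = set(query["filter"]) | set(query["join"]) | set(query["group"])
        usage.update(touched)
    # Order by descending count, then by name so ties are deterministic.
    ranked = sorted(usage.items(), key=lambda item: (-item[1], item[0]))
    return [column for column, _ in ranked[:top_n]]

sort_columns = rank_sort_columns(query_log)
print(sort_columns)  # ['event_date', 'region']
```

Re-running the ranking as new queries arrive is one simple way a sort order could follow changing workloads.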
customer_orders · Maintenance Health
Autonomous
Compaction Scope: what will be targeted this run
already compacted
38% hot zone
#4,812 baseline · #7,104 watermark · #8,506 now

Coordinated Operations

execution order →

Compaction · In Progress · 72%
38% small files, merging 970 → 87 at 512 MB target

then → Expire Snapshots · Cooling · 45%
154 snapshots, 62 past 30-day retention

then → Rewrite Manifests · Idle · 18%
12 manifests, below threshold, waiting for compaction

then → Orphan Cleanup · Idle · 8%
847 MB unreferenced, scheduled after expiration

Learning from telemetry

Query patterns

event_date, region

Top sort columns (Trino + Spark)

Improvement

12.4× faster

Avg query speed after optimization

Cycle

Self-tuning

Sort orders adapt as patterns change

Maintenance

Adaptive maintenance that learns

LakeOps continuously collects telemetry — file counts, partition health, snapshot velocity, delete ratios, manifest growth, and query patterns — and uses that signal to decide what to run, when, and in what order. Each operation's outcome feeds back into the next decision. The result is a coordinated maintenance loop that eliminates redundant work, adapts to changing workloads, and keeps every table in optimal shape without human intervention.

  • Scores every table on file health, snapshot growth, and manifest overhead — acts only when needed
  • Sequences operations so each step's output is the next step's clean input
  • Learns from query telemetry across all engines to optimize sort orders and compaction targets
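
The "act only when needed, in a fixed order" behavior can be sketched as below. This is a simplified illustration under stated assumptions: the `TableStats` fields, the threshold values, and the `plan` function are hypothetical and are not taken from LakeOps itself. The sample numbers mirror the customer_orders panel.

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    small_file_ratio: float          # fraction of data files below target size
    snapshots_past_retention: int
    manifest_count: int
    orphan_bytes: int

# Hypothetical trigger thresholds; a real system would tune these per table.
TRIGGERS = {
    "compaction": lambda s: s.small_file_ratio >= 0.30,
    "expire_snapshots": lambda s: s.snapshots_past_retention > 0,
    "rewrite_manifests": lambda s: s.manifest_count >= 100,
    "orphan_cleanup": lambda s: s.orphan_bytes > 0,
}

# Fixed execution order so each step leaves clean input for the next.
ORDER = ["compaction", "expire_snapshots", "rewrite_manifests", "orphan_cleanup"]

def plan(stats):
    """Return the operations whose trigger fires, in execution order."""
    return [op for op in ORDER if TRIGGERS[op](stats)]

customer_orders = TableStats(
    small_file_ratio=0.38,
    snapshots_past_retention=62,
    manifest_count=12,
    orphan_bytes=847 * 1024**2,
)
maintenance_plan = plan(customer_orders)
print(maintenance_plan)  # ['compaction', 'expire_snapshots', 'orphan_cleanup']
```

With 12 manifests the table sits below the assumed manifest threshold, so the rewrite is skipped for this run.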
Snapshot Lifecycle · customer_orders
Policy active

Total Snapshots

154

Retention

30 days

Expired Today

12

Storage Freed

18.4 GB

Recent snapshots

684720193 · Mar 15, 12:18 PM · Append · +4 data files
684720191 · Mar 15, 11:45 AM · Append · +2 data files
684720189 · Mar 15, 10:30 AM · Overwrite · +6 data files
684720182 · Mar 14, 08:00 PM · Append · +3 data files · Expiring

Version control

Snapshot lifecycle management

Automated retention, expiration, and version history for every table. Set policies once — LakeOps expires old snapshots safely with full awareness of concurrent readers. Time-travel to any point, compare snapshots, and roll back without manual intervention.

  • Policy-based retention: set once, enforced continuously across every table and catalog
  • Concurrency-safe expiration that respects active readers and in-flight queries
  • Full version history with time-travel, snapshot comparison, and one-click rollback
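
As a rough sketch of what policy-based, reader-aware expiration involves, the Python below selects snapshots that are past retention while skipping the current snapshot and any snapshot an active reader still holds. The function name, the snapshot IDs, and the dates are hypothetical.

```python
from datetime import datetime, timedelta

def snapshots_to_expire(snapshots, now, retention_days, in_use_ids, current_id):
    """Return IDs that are past retention and not needed by anyone."""
    cutoff = now - timedelta(days=retention_days)
    return [
        snapshot_id
        for snapshot_id, committed_at in snapshots
        if committed_at < cutoff            # older than the retention window
        and snapshot_id != current_id       # never expire the table's current state
        and snapshot_id not in in_use_ids   # respect active readers
    ]

# Illustrative data: (snapshot_id, commit time)
snapshots = [
    (101, datetime(2025, 1, 10, 8, 0)),
    (102, datetime(2025, 2, 1, 9, 0)),
    (103, datetime(2025, 3, 1, 10, 0)),
    (104, datetime(2025, 3, 15, 12, 18)),
]
expired = snapshots_to_expire(
    snapshots,
    now=datetime(2025, 3, 15, 12, 30),
    retention_days=30,
    in_use_ids={102},   # a long-running query still reads snapshot 102
    current_id=104,
)
print(expired)  # [101]
```

Snapshot 102 is also past the 30-day window, but it stays until its reader finishes.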
Metadata Optimization · 3 operations
customer_orders

Manifests

487 → 12

97.5% reduced

Planner Latency

−2.1s

3.4s → 1.3s

Puffin Stats

100%

All columns indexed

Auto-operations

Rewrite Manifests

Consolidate manifest files for faster query planning

Rewrite Position Deletes

Optimize position delete files to improve read performance

Compute Statistics (Puffin)

Calculate column stats to optimize query planning and pruning

Last rewrite: 487 → 12 manifests · Planner overhead reduced 62% · 2 min ago

Manifest & metadata optimization

Consolidate and rewrite manifest files so query planning stays fast at any scale. Smaller manifests mean faster planning and fewer metadata scans for Trino, Spark, Flink, and every engine that touches your lake. Includes position delete file optimization and Puffin statistics computation.

  • Rewrites manifests after compaction so planners scan fewer metadata files per query
  • Resolves position delete files to eliminate per-row filter overhead on reads
  • Computes Puffin column statistics for tighter partition and file pruning
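
The percentages in the panel above follow directly from the before and after values. A short Python check, using only the figures shown (487 → 12 manifests, 3.4s → 1.3s planner latency):

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100

manifest_reduction = percent_reduction(487, 12)
planner_reduction = percent_reduction(3.4, 1.3)

print(f"{manifest_reduction:.1f}% fewer manifests")        # 97.5% fewer manifests
print(f"{planner_reduction:.0f}% lower planner latency")   # 62% lower planner latency
```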
Orphan File Cleanup · Scheduled
847 MB detected

Unreferenced

847 MB

59,831 files

Age Threshold

7 days

Safety window

Last Cleanup

74.8 GB

Reclaimed 3 hrs ago

Recent cleanups

ecommerce_prod · 59,831 files · 74.81 GB · 13m 6.9s · 3 hrs ago
analytics · 22,034 files · 179.49 GB · 32m · 1 day ago
staging · 4,128 files · 12.3 GB · 2m 14s · 2 days ago

Configuration

Schedule: 0 3 * * *
Scope: All catalogs
Files only removed if unreferenced + older than threshold

Orphan file detection & cleanup

Detect and safely remove files no longer referenced by any table. Eliminate storage drift from failed jobs, aborted commits, and legacy tables. Configurable retention thresholds, catalog-wide or per-table scope, and scheduled execution — reclaim capacity without risking data integrity.

  • Age-threshold safety: only removes files unreferenced for 7+ days — no risk to in-flight jobs
  • Runs after snapshot expiration so newly dereferenced files are caught in the same sweep
  • Catalog-wide or per-table scope with full audit trail of every file removed
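
The safety rule (remove a file only if it is unreferenced and older than the threshold) can be sketched in a few lines of Python. This is an illustration under assumptions, not the product's implementation: `find_orphans`, the file paths, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def find_orphans(storage_files, referenced_paths, now, min_age_days=7):
    """Return paths that no table references and that are past the safety window."""
    cutoff = now - timedelta(days=min_age_days)
    return sorted(
        path
        for path, last_modified in storage_files.items()
        if path not in referenced_paths and last_modified <= cutoff
    )

now = datetime(2025, 3, 16, 3, 0)
# Illustrative object listing: path -> last modified time
storage_files = {
    "data/a.parquet": datetime(2025, 3, 1),    # referenced by table metadata
    "data/b.parquet": datetime(2025, 3, 1),    # unreferenced and old
    "data/c.parquet": datetime(2025, 3, 15),   # unreferenced but recent
}
referenced_paths = {"data/a.parquet"}

orphans = find_orphans(storage_files, referenced_paths, now)
print(orphans)  # ['data/b.parquet']
```

The recent file is kept because it may belong to a commit that is still in flight.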
Lake Observability · All engines connected

Queries Today

12,485

+12% from yesterday

Avg Latency

1.2s

−0.3s from last week

Active Engines

4 / 6

All critical online

Active Alerts

3

1 critical

Proactive insights

raw_clickstream · Critical
312 partitions exceed file threshold
Query scan amplified 8×

search_query_logs · High
Excessive manifests (487), planning overhead
Planner latency +2.1s

payment_transactions · Warning
Small file ratio 38%, compaction recommended
S3 GET costs elevated

Recent operations

Compact Data Files · customer_orders · 1.24 TB, 16→1 files · 4s
Expire Snapshots · payment_transactions · 12 snapshots · 4.6s
Remove Orphan Files · user_sessions · 847 MB reclaimed · 1.8s

Observability

Full lake observability

Continuous analysis of table structure, file health, and optimization opportunities. Monitor active engines, query latency, throughput, and error rates. Cross-system telemetry from S3, GCS, ADLS, and every engine — view, alert, and act from one place.

  • Proactive insights surface file-health issues, partition skew, and manifest bloat before they impact queries
  • Unified event history for every operation — compaction, expiration, orphan removal — with duration, impact, and status
  • Cross-engine telemetry: one view across Trino, Spark, Snowflake, Athena, DuckDB, and Flink
Query Routing · Healthy
7 engines

Active Groups

2 / 3

Routing traffic

Engines in Use

7

8 registered

Routed Volume

7,285

This period

Queries this week

[Chart: queries per day, Mon to Sun]

Engine mix

Spark 35% · Trino 25% · Presto 20% · PostgreSQL 15% · Others 5%

Routing Groups

Data Science · Performance
1,243 q · 1.8s avg
Spark, Presto, Trino
BI Reporting · Cost
5,621 q · 0.9s avg
Trino, PostgreSQL
ETL Workloads · Balanced
421 q · 5.2s avg
Spark, Databricks

Query Routing

Multi-engine query routing

Connect Trino, Spark, Snowflake, Athena, DuckDB, and Flink to one routing layer. Intelligent query routing optimizes for cost, latency, or throughput automatically. Compare engine performance, monitor health, and add new engines — all without engine-specific scripts or duplicate tooling.

  • One SQL endpoint for all engines — route by cost, latency, or workload type automatically
  • Side-by-side engine comparison on the same queries to find the best fit for each workload
  • Add or remove engines without changing application code or connection strings
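
Routing by objective can be pictured as choosing, among healthy engines, the one that minimizes the chosen metric. The Python below is a minimal sketch of that idea only. `EngineProfile`, `route`, and all latency and cost figures are hypothetical, and a production router would weigh far more signals.

```python
from dataclasses import dataclass

@dataclass
class EngineProfile:
    name: str
    avg_latency_s: float      # observed average query latency
    cost_per_query: float     # illustrative cost units, not real prices
    healthy: bool = True

# Each objective maps to the metric it minimizes.
OBJECTIVES = {
    "latency": lambda engine: engine.avg_latency_s,
    "cost": lambda engine: engine.cost_per_query,
}

def route(engines, objective):
    """Return the name of the healthy engine that minimizes the objective."""
    candidates = [engine for engine in engines if engine.healthy]
    if not candidates:
        raise RuntimeError("no healthy engine available")
    return min(candidates, key=OBJECTIVES[objective]).name

engines = [
    EngineProfile("trino", avg_latency_s=0.9, cost_per_query=0.04),
    EngineProfile("spark", avg_latency_s=5.2, cost_per_query=0.02),
    EngineProfile("duckdb", avg_latency_s=1.8, cost_per_query=0.01, healthy=False),
]

print(route(engines, "latency"))  # trino
print(route(engines, "cost"))     # spark (duckdb is skipped as unhealthy)
```

Because callers pass only an objective, engines can be added to or removed from the list without touching the calling code.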
MCP Interface · Agent-native
Connected

Wire compatibility

PostgreSQL · MySQL · Arrow Flight
psql -h agent.lakeops.dev -U ai_agent -d ecommerce_prod

Schema discovery

Catalogs

4

Tables

127

Columns

1,842

Layered guardrails

ReadOnly

Blocks DDL and DML from agent sessions

CostEstimate

Rejects queries exceeding scan thresholds

PIIMask

Hashes sensitive columns before results reach the model

HumanApproval

Pauses high-stakes operations for review

Agent query telemetry feeds back into compaction and sort-order decisions

Agentic AI

Agentic AI enablement

Built for AI and ML pipelines — optimized metadata, layout, and table structure for agents, feature stores, and autonomous data workflows. Run simulations on file layout changes before applying them. Fast, consistent access to table state and history so AI pipelines get the data they need without extra glue.

  • MCP interface with schema discovery, async queries, and PostgreSQL/MySQL/Arrow Flight wire compatibility
  • Layered guardrails: ReadOnly, CostEstimate, PIIMask, and HumanApproval — configurable per agent session
  • Closed-loop optimization: agent query telemetry feeds back into compaction and sort-order decisions
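
To show how layered guardrails compose, here is a small Python sketch of three of the four layers: a read-only check, a scan-cost check, and PII masking. It is a simplified illustration, not the LakeOps guard implementation. The function names are hypothetical, the write detection is a naive prefix match where a real guard would parse the SQL, and the human-approval step is omitted.

```python
import hashlib
import re

# Naive prefix check for write statements; a real guard would use a SQL parser.
WRITE_STATEMENT = re.compile(
    r"^\s*(insert|update|delete|merge|create|alter|drop|truncate)\b",
    re.IGNORECASE,
)

def check_query(sql, estimated_scan_bytes, scan_limit_bytes):
    """Run the guards in order and return (allowed, reason)."""
    if WRITE_STATEMENT.match(sql):
        return False, "ReadOnly: DDL and DML are blocked"
    if estimated_scan_bytes > scan_limit_bytes:
        return False, "CostEstimate: scan exceeds threshold"
    return True, "ok"

def mask_pii(rows, sensitive_columns):
    """Replace sensitive values with a short SHA-256 digest."""
    return [
        {
            column: (
                hashlib.sha256(str(value).encode()).hexdigest()[:12]
                if column in sensitive_columns
                else value
            )
            for column, value in row.items()
        }
        for row in rows
    ]

GIB = 1024 ** 3
print(check_query("SELECT region, count(*) FROM orders GROUP BY region", 2 * GIB, 10 * GIB))
print(check_query("DROP TABLE orders", 0, 10 * GIB))
print(check_query("SELECT * FROM raw_clickstream", 50 * GIB, 10 * GIB))

masked_rows = mask_pii([{"email": "a@example.com", "region": "EU"}], {"email"})
print(masked_rows)
```

Ordering the cheap checks first means a blocked statement never reaches cost estimation or execution.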
Policies · 4 active
prod_adaptive_maintenance · Adaptive Maintenance
Scope: ecommerce_prod.* · Next: Data-driven
nightly_snapshot_expiry · Expire Snapshots
Scope: ecommerce_prod.* · Next: Mar 16, 01:00
daily_orphan_cleanup · Orphan Files
Scope: All catalogs · Next: Mar 16, 03:00
manifest_consolidation · Rewrite Manifests
Scope: analytics.* · Next: Mar 16, 04:00
staging_config · Configuration
Scope: staging.* · Next:

Total Policies

5

Maintenance

4

Configuration

1

Governance

Governance and policies

Define and enforce compaction, retention, orphan cleanup, and maintenance policies across catalogs and tables. Set schedules, priorities, and target scopes — then let LakeOps execute continuously. Every policy is auditable, versioned, and controllable with one toggle.

  • One-toggle policies for compaction, retention, orphan cleanup, and manifest optimization
  • Catalog-wide or per-table scoping with priority levels and cron-based scheduling
  • Full audit trail: every policy execution logged with duration, impact, and outcome
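
Scopes such as `ecommerce_prod.*` read like glob patterns over table names. Assuming that interpretation, the Python sketch below resolves which policies apply to a given table. The matching rule, the use of `*` for "All catalogs", and the fully qualified table names are assumptions for illustration.

```python
from fnmatch import fnmatchcase

# Policy names and scopes from the panel above; "*" stands in for "All catalogs".
POLICIES = [
    ("prod_adaptive_maintenance", "ecommerce_prod.*"),
    ("nightly_snapshot_expiry", "ecommerce_prod.*"),
    ("daily_orphan_cleanup", "*"),
    ("manifest_consolidation", "analytics.*"),
    ("staging_config", "staging.*"),
]

def policies_for(table):
    """Return the names of policies whose scope pattern matches the table."""
    return [name for name, scope in POLICIES if fnmatchcase(table, scope)]

print(policies_for("ecommerce_prod.customer_orders"))
print(policies_for("analytics.events"))
```

The first call returns the two ecommerce_prod policies plus the catalog-wide orphan cleanup. The second returns the orphan cleanup and the manifest consolidation policy.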

Minutes to value with no risk

1. Connect & collect telemetry
Apache Iceberg, AWS, Snowflake, Trino

2. Manual or autonomous management
Manual, Autonomous

3. Operations run & optimize
Compaction, Snapshots, Orphan cleanup, Manifests & metadata

4. Observability & governance
Metrics, Health, Agents, Routing, Logs, Policies

No vendor lock-in
No code / infra changes
No data changes
Set up in 10 minutes · Works with your existing stack

Works with your stack

One control plane — any engine, catalog, or cloud

LakeOps connects to your existing infrastructure. No vendor lock-in — your data, metadata, and execution stay under your control.

LakeOps Control Plane

Connects, analyzes, optimizes

Engines · Catalogs · Storage · On-prem

Engines

Snowflake · Databricks · Amazon Athena · Dremio · DuckDB · Apache Flink · ClickHouse

Catalogs

AWS Glue · Apache Polaris · Apache Gravitino · Project Nessie · LakeKeeper

Clouds & on-prem

AWS · Azure · Google Cloud
AWS
Azure
Google Cloud
Snowflake
Databricks
Apache Flink
Apache Hadoop
Apache Iceberg
Delta Lake
Spark
Lakekeeper
StarRocks

Agentic AI readiness

Your Iceberg lake, ready for AI agents

AI agents are becoming primary consumers of SQL infrastructure. LakeOps is the control plane that makes your lake intelligent — agent-native interface, built-in guardrails, self-optimizing storage, and a closed-loop feedback system that learns from every query.

AI Agents
Claude, LangChain, custom MCP agents

LakeOps
Route · Guard · Optimize · Learn

Iceberg Lake
Tables, metadata, engines, catalogs

Closed-loop feedback

Agent-native interface

Native MCP server connects any compatible agent — Claude, LangChain, or custom — with zero integration code. Schema-aware tools, async queries with SSE streaming, and Postgres/MySQL/Arrow Flight wire compatibility.

Safety & governance

Layered guardrails for unsupervised execution — ReadOnlyGuard blocks DDL, CostEstimateGuard rejects expensive scans, PIIMaskGuard scrubs sensitive columns, HumanApprovalGuard pauses high-stakes queries.

Intelligent routing

Three-router stack — Adaptive routes on history, LLM reasons over new templates with live table stats, Semantic matches intent. 0ms cached decisions, data-quality-aware routing enriched by IceProbe.

Self-optimizing lake

Agents querying uncompacted tables pay a 5–10× latency penalty. The workload analyst feeds agent query signals to the Rust compaction engine, and the feedback loop auto-updates routing as tables improve.

Production benchmarks

5.5 TB across 10 production tables

Real workloads. Real data. Batch, streaming, delete-heavy, multi-writer, and terabyte-scale tables — all on the same engine, same hardware.

101K → 19K
files (81% reduction)
2,522 MB/s
peak throughput
99.8%
max file reduction
551M
deleted rows cleaned
Table | Size | Workload | Files (before → after) | Throughput | Time | Notes
balance_snapshots | 1,192 GB | TB-Scale batch | 11,957 → 3,270 | 1,572 MB/s | 11 min | Spark OOM on same hardware
user_accounts | 174 GB | Batch | 878 → 400 | 2,269 MB/s | 74s | Single Node
events_analytics | 484 GB | Delete-Heavy | 16,128 → 7,198 | 729 MB/s | 11m 21s | 23,433 delete files; 551M rows removed
raw_sdk_events | 8 GB | Streaming | 42,633 → 69 | 167 MB/s | 138s | 99.8% file reduction
site_traffic | 292 GB | Multi-Writer | 2,740 → 754 | 1,465 MB/s | 3m 25s | Single partition
cluster_registry | 322 GB | Batch | 998 → 440 | 2,522 MB/s | 2m | Peak throughput

Compaction cost per TB

Normalized to Spark = 100%

Apache Spark: 100%
AWS S3 Tables / Databricks: 100%
LakeOps: 10%

Source: 200 GB (~1 TB uncompressed) benchmark. Spark cost index 100 vs LakeOps 10.

Self-improving: same table, zero config changes

balance_snapshots — 1.192 TB across consecutive runs

Run 1: 22 min · 925 MB/s
Run 2: 18 min · 1,100 MB/s
Run 3 (learned): 11 min · 1,572 MB/s

Same data and hardware; planner learns workload telemetry and improves runtime from 22 to 11 minutes.
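
For reference, the improvement between the first and third run works out as follows. This Python snippet uses only the run figures quoted above (22 min at 925 MB/s, then 11 min at 1,572 MB/s).

```python
# (minutes, MB/s) per run, as listed above
runs = {
    "Run 1": (22, 925),
    "Run 2": (18, 1100),
    "Run 3": (11, 1572),
}

first_minutes, first_mbps = runs["Run 1"]
last_minutes, last_mbps = runs["Run 3"]

runtime_cut = (first_minutes - last_minutes) / first_minutes * 100
throughput_gain = last_mbps / first_mbps

print(f"Runtime reduced by {runtime_cut:.0f}%")          # Runtime reduced by 50%
print(f"Throughput up {throughput_gain:.2f}x")           # Throughput up 1.70x
```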

Full Iceberg benefits.
Snowflake-level ease.

Get a personalized walkthrough on your own Iceberg tables — see the impact in minutes.

No vendor lock-in · No infra or data changes · 10 min to install · Secure and compliant