
Managed Apache Iceberg

Managed Iceberg,
smart and simple

LakeOps continuously optimizes compaction, data layout, and autonomous table maintenance — snapshots, manifests, metadata, and orphan files — across every engine and cloud, so your Iceberg tables stay fast, lean, and production-ready.

80%
cost reduction
12×
faster queries
95%
faster compaction
100%
autonomous

The challenge

Iceberg is powerful — but hard
to operate at scale

Rising cost & latency

Small files, stale snapshots, and orphaned files compound quietly, driving up compute and storage costs while query latency steadily climbs.

Manual engine operations

Spark, Trino, Athena, Snowflake, and Databricks optimize differently. Teams juggle per-engine scripts, configs, and schedules that do not scale.

Metadata & layout drift

Manifests bloat, partitions skew, and layouts drift from real workloads, degrading scan efficiency, cache locality, and query planning.

Ops debt & compliance

Ad-hoc scripts and reactive firefighting create operational debt. Retention, DR, and GDPR policies stay manual instead of control-plane enforced.

Results

Measured impact on
real Iceberg workloads

Benchmarks from production-grade tables across multiple engines and cloud providers.

Compaction speed

95% faster

vs. Apache Spark on identical datasets

Query performance

12× faster

After compaction + layout optimization

Cost savings

80% reduction

In compute & storage spend

TPC-DS benchmark suite · Production Iceberg tables · Multi-cloud, multi-engine

Capabilities

Every layer of your Iceberg lake — managed

From compaction and metadata to engine routing, observability, and policy enforcement — one AI-driven control plane that optimizes continuously and autonomously.

Faster and Smarter Compaction

A Rust-based compaction engine analyzes query patterns and access frequency to optimize file layout at scale. 95% faster than Spark, it organizes data by real query usage to cut I/O, so your lake stays performant without blocking writes or queries.
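To make the small-file problem concrete, here is a minimal sketch of the bin-packing idea behind compaction: group many small data files into rewrite groups near a target output size. The function name, the greedy strategy, and the 512 MB target (Iceberg's default `write.target-file-size-bytes`) are illustrative assumptions, not LakeOps internals.

```python
# Hypothetical sketch: greedily pack small data files into compaction
# groups close to a target output size. Thresholds are illustrative.
TARGET_BYTES = 512 * 1024 * 1024  # target output file size (Iceberg default)

def plan_compaction(file_sizes: list[int]) -> list[list[int]]:
    """Group file sizes so each group's total stays near TARGET_BYTES."""
    groups: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for size in sorted(file_sizes, reverse=True):
        # Flush the current group once adding this file would overshoot.
        if current and current_size + size > TARGET_BYTES:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

# Example: twenty 64 MB files pack into three rewrite groups.
plan = plan_compaction([64 * 1024 * 1024] * 20)
```

A production engine would also weigh access frequency and sort order, but the payoff is the same: fewer, larger files mean fewer reads per scan.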

Snapshot Lifecycle Management

Automated retention, expiration, and version history for every table. Set policies once — LakeOps expires old snapshots safely with full awareness of concurrent readers. Time-travel to any point, compare snapshots, and roll back without manual intervention.
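The retention logic can be sketched in a few lines: keep the most recent N snapshots plus anything newer than a cutoff, and expire the rest. The knobs mirror what Iceberg itself exposes (retain-last, older-than); the function name and signature here are illustrative, not the LakeOps API.

```python
# Hypothetical sketch of a snapshot retention policy. Not the LakeOps API.
from datetime import datetime, timedelta

def snapshots_to_expire(snapshots, retain_last=5, older_than_days=7, now=None):
    """snapshots: list of (snapshot_id, committed_at) tuples.

    Returns the snapshot ids that are both outside the retained window
    and older than the cutoff; everything else is kept.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=older_than_days)
    ordered = sorted(snapshots, key=lambda s: s[1], reverse=True)
    kept = {sid for sid, _ in ordered[:retain_last]}  # newest N always survive
    return [sid for sid, ts in ordered if sid not in kept and ts < cutoff]
```

A safe implementation additionally checks for concurrent readers before deleting the underlying files, which is the part LakeOps automates.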

Manifest & Metadata Optimization

Consolidate and rewrite manifest files so query planning stays fast at any scale. Smaller manifests mean faster planning and fewer metadata scans for Trino, Spark, Flink, and every engine. Includes position delete file optimization and Puffin statistics computation.
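A simple heuristic captures when a manifest rewrite pays off: many small manifests slow query planning, so consolidate once the count is high or the average manifest is far below a target size. The 8 MB figure matches Iceberg's default `commit.manifest.target-size-bytes`; the function and thresholds are illustrative assumptions.

```python
# Hypothetical trigger for manifest consolidation. Thresholds illustrative.
TARGET_MANIFEST_BYTES = 8 * 1024 * 1024  # Iceberg's default manifest target

def should_rewrite_manifests(manifest_sizes: list[int],
                             max_count: int = 100) -> bool:
    """Rewrite when there are too many manifests, or they are tiny on
    average, since both inflate planning time and metadata scans."""
    if not manifest_sizes:
        return False
    avg = sum(manifest_sizes) / len(manifest_sizes)
    return len(manifest_sizes) > max_count or avg < TARGET_MANIFEST_BYTES / 4
```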

Orphan File Detection & Cleanup

Detect and safely remove files no longer referenced by any table. Eliminate storage drift from failed jobs, aborted commits, and legacy tables. Configurable retention thresholds, catalog-wide or per-table scope, and scheduled execution — reclaim capacity without risking data integrity.
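At its core, orphan detection is a set difference with a safety margin: files present in storage but referenced by no snapshot, and old enough to clear a retention threshold so in-flight writes are never touched. This sketch uses hypothetical names and a made-up threshold; it is the shape of the check, not the LakeOps implementation.

```python
# Hypothetical sketch of orphan-file detection. Names are illustrative.
from datetime import datetime, timedelta

def find_orphans(listed, referenced, older_than=timedelta(days=3), now=None):
    """listed: dict of storage path -> last_modified datetime.
    referenced: set of paths reachable from any table snapshot.

    Returns unreferenced paths older than the retention threshold; newer
    unreferenced files are spared in case a commit is still in flight.
    """
    now = now or datetime.utcnow()
    cutoff = now - older_than
    return sorted(p for p, mtime in listed.items()
                  if p not in referenced and mtime < cutoff)
```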

Full-Stack Observability

Continuous analysis of table structure, file health, and optimization opportunities. Monitor active engines, query latency, throughput, and error rates. Cross-system telemetry from S3, GCS, ADLS, and every engine — view, alert, and act from one place.

Organization-Wide Policies

Define and enforce compaction, retention, orphan cleanup, and maintenance policies across catalogs and tables. Set schedules, priorities, and target scopes — then let LakeOps execute continuously. Every policy is auditable, versioned, and controllable with one toggle.

Multi-Engine Query Routing

Connect Trino, Spark, Snowflake, Athena, DuckDB, and Flink to one routing layer. Intelligent query routing optimizes for cost, latency, or throughput automatically. Compare engine performance, monitor health, and add new engines — no engine-specific scripts or duplicate tooling.
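The routing idea can be sketched as scoring each connected engine against the declared objective and picking the best. The engine names come from this page; the per-engine numbers, the lookup table, and the `route` function are invented for illustration, nothing here reflects real benchmark figures or the LakeOps routing logic.

```python
# Hypothetical objective-based engine routing. Numbers are made up.
ENGINES = {
    # engine: (relative cost per query, typical latency in seconds)
    "Trino":  (1.0, 2.0),
    "Spark":  (0.6, 30.0),
    "Athena": (1.2, 5.0),
    "DuckDB": (0.1, 4.0),
}

def route(objective: str) -> str:
    """Pick the engine minimizing the chosen metric: 'cost' or 'latency'."""
    idx = 0 if objective == "cost" else 1
    return min(ENGINES, key=lambda e: ENGINES[e][idx])
```

A real router would condition these scores on the query shape and live table statistics rather than static constants.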

Agentic AI Enablement

Built for AI and ML pipelines — optimized metadata, layout, and table structure for agents, feature stores, and autonomous data workflows. Fast, consistent access to table state and history so AI pipelines get the data they need without extra glue.

Value in minutes with zero risk

No agents, no data movement, no pipeline changes. Connect your catalog, let LakeOps analyze, simulate, and optimize — safely.

1. Connect in ~10 minutes

Connect your catalog and storage in ~10 minutes. No agents, data movement, or pipeline changes.

2. AI analyzes & simulates

LakeOps continuously models table health from metadata, query patterns, and cost signals.

3. Automated optimization

Compaction, manifest cleanup, snapshot hygiene, and layout tuning run continuously on autopilot.

4. Visibility & governance

Unified dashboards track cost, performance, and table health. Every action is logged and reversible.

No vendor lock-in
No code / infra changes
No data changes

Works with your stack

One control plane — any engine, catalog, or cloud

LakeOps connects to your existing infrastructure. No vendor lock-in — your data, metadata, and execution stay under your control.

LakeOps Control Plane

Connects, analyzes, optimizes

Engines · Catalogs · Storage · On-prem

Engines

Snowflake · Databricks · Amazon Athena · Dremio · DuckDB · Apache Flink · ClickHouse

Catalogs

AWS Glue · Apache Polaris · Apache Gravitino · Project Nessie · LakeKeeper

Clouds & on-prem

AWS · Azure · Google Cloud
AWS · Azure · Google Cloud · Snowflake · Databricks · Apache Flink · Apache Hadoop · Apache Iceberg · Delta Lake · Spark · Lakekeeper · StarRocks

Agentic AI readiness

Your Iceberg lake,
ready for AI agents

AI agents are becoming primary consumers of SQL infrastructure. LakeOps is the control plane that makes your lake intelligent — agent-native interface, built-in guardrails, self-optimizing storage, and a closed-loop feedback system that learns from every query.

[Diagram: AI Agents (Claude, LangChain, custom MCP agents) ↔ LakeOps (Route · Guard · Optimize · Learn) ↔ Iceberg Lake (tables, metadata, engines, catalogs), with closed-loop feedback]

Agent-native interface

Native MCP server connects any compatible agent — Claude, LangChain, or custom — with zero integration code. Schema-aware tools, async queries with SSE streaming, and Postgres/MySQL/Arrow Flight wire compatibility.

Safety & governance

Layered guardrails for unsupervised execution — ReadOnlyGuard blocks DDL, CostEstimateGuard rejects expensive scans, PIIMaskGuard scrubs sensitive columns, HumanApprovalGuard pauses high-stakes queries.

Intelligent routing

Three-router stack: Adaptive routes on query history, LLM reasons over new templates with live table stats, and Semantic matches intent. Cached decisions resolve in 0 ms, with data-quality-aware routing enriched by IceProbe.

Self-optimizing lake

Agents querying uncompacted tables pay a 5–10× latency penalty. The workload analyst feeds agent query signals to the Rust compaction engine, and the feedback loop auto-updates routing as tables improve.

Full Iceberg benefits.
Snowflake-level ease.

Get a personalized walkthrough on your own Iceberg tables — see the impact in minutes.

No vendor lock-in · No infra or data changes · 10 min to install · Secure and compliant