LakeOps Documentation
LakeOps is the autonomous control plane for Apache Iceberg data lakes. It automates compaction, table maintenance, multi-engine query routing, and real-time optimization across your entire lake — with built-in observability, governance policies, and agentic AI support.
These docs cover every feature of the platform with step-by-step guides, configuration references, and best practices for common data lake operations.
Platform features
Getting Started
Connect your catalogs and start optimizing in under 10 minutes.
Compaction
Rust-based compaction engine that organizes data files around real query patterns for faster reads and lower costs.
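At its core, compaction planning groups many small data files into tasks that each produce roughly one well-sized output file. The sketch below is illustrative only: a simple size-based bin-packing pass, with a hypothetical 128 MB target that is not a LakeOps default.

```python
# Illustrative sketch of a compaction planner's grouping step.
# The target size is a made-up example value, not a LakeOps setting.

TARGET_BYTES = 128 * 1024 * 1024  # hypothetical target output file size


def plan_compaction(file_sizes, target=TARGET_BYTES):
    """Group small files into compaction tasks of roughly `target` bytes each."""
    groups, current, current_size = [], [], 0
    for size in sorted(file_sizes):
        # Start a new task when adding this file would overflow the target.
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

Real planners also weigh partition boundaries, delete files, and observed query patterns; this shows only the size dimension.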
Snapshot Management
Automated retention, expiration, time-travel, and rollback for every table.
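A retention policy of this kind typically combines two rules: keep the most recent N snapshots, and keep anything younger than a maximum age. The helper below is a sketch under those assumptions; the function name and parameters are hypothetical, not the LakeOps API.

```python
# Illustrative retention logic (hypothetical names, not a LakeOps API):
# a snapshot is expirable only if it is outside the "keep last N" window
# AND older than the maximum age.

from datetime import datetime, timedelta


def snapshots_to_expire(snapshots, keep_last=5, max_age=timedelta(days=7), now=None):
    """snapshots: list of (snapshot_id, committed_at), ordered newest first."""
    now = now or datetime.now()
    cutoff = now - max_age
    return [
        sid
        for i, (sid, committed_at) in enumerate(snapshots)
        if i >= keep_last and committed_at < cutoff
    ]
```

Expiring a snapshot removes the ability to time-travel or roll back to it, which is why both guards must pass before a snapshot is a candidate.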
Manifest Optimization
Consolidate manifest files so query planning stays fast at any table scale.
Orphan File Cleanup
Detect and safely remove unreferenced files from failed jobs, aborted commits, and legacy tables.
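The core check is a set difference: a file that exists in storage but is referenced by no table metadata is an orphan candidate, and a grace period protects files from in-flight writes. A minimal sketch, assuming hypothetical inputs (not LakeOps internals):

```python
# Sketch of the orphan-file check. Parameter shapes are assumptions:
# files_in_storage maps path -> last-modified epoch seconds,
# files_in_metadata is the set of paths reachable from table metadata.

def find_orphans(files_in_storage, files_in_metadata, min_age_s, now_s):
    """Return paths that are unreferenced AND older than the grace period."""
    return sorted(
        path
        for path, mtime in files_in_storage.items()
        if path not in files_in_metadata and now_s - mtime >= min_age_s
    )
```

The grace period matters: a file written by a commit that has not yet landed is unreferenced but must not be deleted, so recent files are always skipped.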
Observability
Table health, engine metrics, query latency, throughput, and cross-system telemetry from one place.
Policies
Define and enforce compaction, retention, and cleanup policies across catalogs and tables.
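Applying one policy across many catalogs and tables amounts to scoping it with a pattern over fully qualified table names. The glob-style syntax below is an illustrative assumption, not LakeOps' actual policy language:

```python
# Illustrative policy scoping over "catalog.namespace.table" names.
# The glob pattern syntax here is an assumption for the sketch.

from fnmatch import fnmatch


def tables_in_scope(policy_pattern, tables):
    """Return the tables a policy pattern applies to."""
    return [t for t in tables if fnmatch(t, policy_pattern)]
```

For example, a compaction policy scoped to `prod.sales.*` would match every table in that namespace while leaving dev catalogs untouched.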
Engine Management
Connect Trino, Spark, Snowflake, Athena, DuckDB, and Flink. Monitor health and compare performance.
Query Routing
Automatically route each query to the engine best suited for cost, latency, or throughput.
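Conceptually, routing scores every healthy engine on the chosen objective and picks the best one. The sketch below is a toy illustration; the engine metrics and field names are made up, not LakeOps telemetry:

```python
# Hypothetical routing sketch. Each engine dict carries made-up metrics:
# 'cost' per query, 'latency_ms', and 'rows_per_s' throughput.

def route(engines, objective="cost"):
    """Pick the name of the best healthy engine for the given objective."""
    key = {
        "cost": lambda e: e["cost"],
        "latency": lambda e: e["latency_ms"],
        "throughput": lambda e: -e["rows_per_s"],  # higher is better
    }[objective]
    candidates = [e for e in engines if e["healthy"]]
    if not candidates:
        raise RuntimeError("no healthy engine available")
    return min(candidates, key=key)["name"]
```

A real router would also consider the query shape itself (an interactive point lookup and a large batch join rarely favor the same engine), but the objective-driven selection is the essential idea.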
Simulations
Run layout simulations to preview the impact of compaction strategies before applying them.
Agentic AI
Agent-native MCP interface, guardrails, and self-optimizing lake for AI pipelines.
