Platform

Control plane for your data lake

End-to-end optimization for tables and metadata across storage and query engines. Telemetry-driven orchestration, real-time maintenance automation, and full visibility in one place.

Runs on your stack

AWS
Azure
Google Cloud
Snowflake
Databricks
Apache Flink
Apache Hadoop
Apache Iceberg
Delta Lake
Spark
Lakekeeper
StarRocks
Platform overview

Full Iceberg benefits. Snowflake-level ease.

Monitor health, run compaction and maintenance—across catalogs and engines—and manage policies from a single view.

LakeOps

Optimization Activity (Last 30 Days)

Total Operations
12,211
successfully run in last 3 months
Query speed
12.4x
Avg. acceleration across engines
Cost savings
$1,374,672
Saved in last 3 months
CPU & Storage
-76%
Resources saved in last 3 months
Data Optimized
46.8 PB
Volume compacted in last 30 days

Key Metrics

Total Tables
786
Tables in all catalogs
Critical Tables
70
Require immediate attention
Warning Tables
105
Should be addressed or auto-piloted
Healthy Tables
566
Tables in optimal state
Total Data
112.4 PB
Total lake data size

Recent Operations

Last 10 operations
Operation | Table | Duration | Impact | Time | Status
Compact Data Files | customer_orders / orders | 4s | 1.24 TB, 16 → 1 files | 57 minutes ago | SUCCESS
Expire Snapshots | payment_transactions / payments | 27s | 8.2 TB | 4 hours ago | SUCCESS
Expire Snapshots | inventory_snapshots_20250702 / warehouse | 3s | 2.1 TB | 4 hours ago | SUCCESS

Compaction Duration

[Bar chart, seconds] S3 Tables: 6,300 · Apache Spark: 1,612 · LakeOps: 221 · LakeOps (Sort): 780

Cost of Compaction

[Bar chart, cost ($): S3 Tables · Apache Spark · LakeOps · LakeOps (Sort)]
Feature 01

20x Faster compaction with Rust and AI

Rust-based compaction engine for Iceberg—optimizes file layout at scale. Run more compactions in less time with minimal resource footprint, so your lake stays performant without blocking writes or queries.

Rewrite Manifests

Consolidate and optimize manifest files for improved metadata performance

Rewrite Position Delete Files

Optimize position delete files to improve query performance

Compute Table Statistics (Puffin)

Calculate statistics to optimize query planning and performance.
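For reference, the operations listed above correspond to maintenance actions that open-source Iceberg exposes as Spark procedures. A minimal sketch of a compaction call, using a hypothetical `lake` catalog and `retail.orders` table:

```sql
-- Compact small data files (catalog and table names are illustrative)
CALL lake.system.rewrite_data_files(
  table    => 'retail.orders',
  strategy => 'binpack',  -- 'sort' additionally clusters rows for faster scans
  options  => map('target-file-size-bytes', '536870912')  -- 512 MB targets
);
```

The `binpack` strategy merges small files toward the target size; the `sort` strategy pays more compute up front to cluster data for range queries.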

Feature 02

Manifest Rewrites

Compact metadata so query planning stays fast across the lake. Smaller manifests mean faster planning and fewer metadata scans for every engine—Trino, Spark, Flink, and more.
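The equivalent open-source Iceberg Spark procedure, shown here as a sketch with a hypothetical `lake` catalog:

```sql
-- Consolidate manifest files so planners scan less metadata
-- (catalog and table names are illustrative)
CALL lake.system.rewrite_manifests(table => 'retail.orders');
```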

Recent Operations

Last 10 operations
Operation | Table | Duration | Size Reclaimed | Time | Status
Expire Snapshots | customer_orders / orders | 3m 47s | 1.8 TB | 1 hour ago |
Expire Snapshots | product_catalog / catalog | 0s | - | 1 hour ago |
Expire Snapshots | payment_transactions / payments | 23s | 12.4 TB | 1 hour ago |
Expire Snapshots | loyalty_points_balance / loyalty | 3s | 9.2 TB | 1 hour ago |
Feature 03

Snapshot Optimization

Automated retention and expiration—no manual snapshot hygiene. Set policies once; LakeOps expires old snapshots and cleans history safely, with full awareness of concurrent readers and writers.
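In open-source Iceberg, the same retention policy maps to the `expire_snapshots` Spark procedure. A sketch, with a hypothetical catalog name and cutoff timestamp:

```sql
-- Expire snapshots older than a cutoff while always keeping the last 10
-- (catalog/table names and the timestamp are illustrative)
CALL lake.system.expire_snapshots(
  table       => 'retail.orders',
  older_than  => TIMESTAMP '2025-03-01 00:00:00',
  retain_last => 10
);
```

`retain_last` acts as a safety floor: snapshots within that count survive even if they are older than the cutoff, which keeps time travel and concurrent readers working.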

Remove Orphan Files Policy

Clean up files no longer referenced by any table

1. Basic Information: name and priority (e.g. "Production Orphan Cleanup", priority 1); Status: Enabled
2. Target Scope: where this policy applies. Select a catalog; leave empty for the entire catalog.
3. Execution Schedule: when the policy runs. Cron 0 0 * * * (at 12:00 AM daily).
4. Orphan File Configuration: how orphans are identified. Unreferenced files older than the configured threshold (7) are removed.
Feature 04

Orphan File Cleanup

Detect and remove orphaned files safely. Eliminate storage drift from failed jobs, aborted commits, and legacy tables—reclaim capacity without risking data integrity.
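Open-source Iceberg exposes this as the `remove_orphan_files` Spark procedure, which supports a dry run so nothing is deleted until the candidate list has been reviewed. A sketch with hypothetical names:

```sql
-- List orphan candidates first without deleting anything
-- (catalog/table names and the timestamp are illustrative)
CALL lake.system.remove_orphan_files(
  table      => 'retail.orders',
  older_than => TIMESTAMP '2025-03-08 00:00:00',
  dry_run    => true
);
```

Once the returned file list looks right, the same call with `dry_run => false` performs the actual cleanup.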

Policies

Manage all policies including configuration, maintenance, delete, and truncate policies.

All Types
All Status
Status | Policy | Type | Next | Actions
- | Orders compaction | Manifests | Mar 16, 02:00 | -
- | Catalog manifest rewrite | Manifests | - | -
- | Payments orphan cleanup | Orphan Files | Mar 16, 03:00 | -
- | Warehouse snapshot expiry | Snapshots | Mar 16, 01:00 | -
- | Loyalty stats refresh | Config | - | -
Feature 05

Organization policies

Define and enforce policies across catalogs and tables—retention, compaction thresholds, and maintenance windows. Keep the whole organization aligned with consistent rules and guardrails.

Tables

Browse and manage your tables

main
All Namespaces
All Statuses
Search tables...
Table name | Namespace | Records | Size | Status | Last modified
customer_orders | orders | 2.4M | 1.2 GB | HEALTHY | Mar 15, 2026, 12:18 PM
product_catalog | catalog | 156K | 84 MB | HEALTHY | Mar 15, 2026, 12:18 PM
payment_transactions | payments | 8.1M | 2.4 GB | HEALTHY | Mar 15, 2026, 12:17 PM
inventory_snapshots | warehouse | 432K | 356 MB | HEALTHY | Mar 15, 2026, 12:17 PM
loyalty_points_balance | loyalty | 1.2M | 128 MB | HEALTHY | Mar 15, 2026, 12:17 PM
user_sessions | analytics | 5.8M | 892 MB | HEALTHY | Mar 15, 2026, 12:16 PM
Feature 06

Table Health Monitoring

Continuous analysis of table structure and optimization opportunities. See which tables need compaction, have too many small files, or have stale metadata—with clear priorities and one-click or automated remediation.
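The small-file signal behind this kind of health check can be computed directly from Iceberg's `files` metadata table. A sketch, where the `lake` catalog name and the 32 MB threshold are illustrative:

```sql
-- Count small files and their average size via the `files` metadata table
SELECT COUNT(*)                          AS small_files,
       AVG(file_size_in_bytes) / 1048576 AS avg_size_mb
FROM lake.retail.orders.files
WHERE file_size_in_bytes < 32 * 1024 * 1024;
```

A high count of sub-threshold files is the usual trigger for scheduling a compaction run.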

orders

Total Partitions

3,632

Total Data Files

110,961

Total Data Size

52.52 TB

Avg Files/Partition

31

Partition Details

Showing 50 of 3,632 partitions.

Page 1 of 73
Partition Path | Data Files | Data Size | Records | Avg File Size | Delete Files
region=eu/country=de/order_date=2025-01-15 | 20 | 8.94 GB | 270,209,993 | 457.71 MB | -
region=na/country=us/order_date=2025-02-01 | 4 | 1.54 GB | 41,333,014 | 394.93 MB | -
region=eu/country=uk/order_date=2025-01-28 | 16 | 7.36 GB | 205,158,536 | 470.75 MB | -
Feature 07

In-depth table exploration

Drill into any table with partitions, metrics, and SQL. View partition details, file size distribution, records over time, and run queries—all in one place with full visibility.

orders
SQL Query · Engine: Spark · Press Ctrl+Enter to execute
SELECT *
FROM ecommerce.retail.orders
LIMIT 5;
Results: 5 rows · Spark · 0.4s
order_id | customer_id | region | country | order_date | amount_usd | status
ORD-2847109 | C-88234 | eu | DE | 2025-01-15 | 429.00 | delivered
ORD-2847112 | C-91002 | na | US | 2025-01-15 | 1,249.50 | shipped
ORD-2847118 | C-77401 | eu | UK | 2025-01-16 | 89.99 | delivered
ORD-2847124 | C-55291 | apac | JP | 2025-01-16 | 312.00 | processing
ORD-2847130 | C-12088 | na | CA | 2025-01-17 | 567.25 | shipped
Feature 08

Test queries and engines

Run SQL against any table and choose the engine—Spark, Trino, Flink, or others. Validate queries and compare results across engines without leaving the control plane.

Query Engines

Manage and monitor your connected query engines.

Compare Engines

Compare performance across engines

Compare

Engine Health

Monitor health of query engines

View health

Add Engine

Connect a new query engine

Add engine
Status: All

AWS Athena (Active)
Queries: 128 · Avg: 2.3s · Cost: $0.05 · 10 min ago
View Queries · Configure

Trino (Active)
Queries: 256 · Avg: 1.8s · Cost: $0.03 · 5 min ago
View Queries · Configure

Snowflake (Active)
Queries: 192 · Avg: 2.1s · Cost: $0.08 · 30 min ago
View Queries · Configure

Spark (Inactive)
Queries: 32 · Avg: 1.2s · Cost: $0.02 · 1 day ago
View Queries · Configure

Flink (Maintenance)
Queries: 48 · Avg: 0.9s · Cost: $0.02 · 3 days ago
View Queries · Configure

DuckDB (Active)
Queries: 64 · Avg: 0.5s · Cost: $0.01 · 2 hrs ago
View Queries · Configure
Feature 09

Multi-Engine Routing

Optimize for Trino, Spark, Flink, and more in one operational layer. No engine-specific scripts or duplicate tooling—one set of policies and one execution layer for your entire lake.

Before: id · name · created_at
+ add
After: id · name · created_at · region
Feature 10

Managed Schema Evolution

Schema changes applied safely across engines and workloads. Add, drop, or rename columns with compatibility checks and rollout orchestration so every consumer stays in sync.
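The additions and renames described above map to standard Iceberg DDL, which is metadata-only: no data files are rewritten. A sketch, with illustrative catalog and column names:

```sql
-- Add the new column shown in the diagram, then rename an existing one
-- (catalog/table and the renamed column are illustrative)
ALTER TABLE lake.retail.orders ADD COLUMN region string;
ALTER TABLE lake.retail.orders RENAME COLUMN name TO customer_name;
```

Because Iceberg tracks columns by ID rather than by name or position, renames and drops stay safe for readers pinned to older snapshots.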

Feature 11

Cross-System Telemetry

One source of truth across storage, engines, and catalogs. Ingest metrics from S3, GCS, ADLS, and every engine that touches your tables—then view, alert, and act from a single control plane.

Iceberg works with AI
Feature 12

Native to AI Agents

Built for AI and ML pipelines—optimized metadata and layout for agents and feature stores. Fast, consistent access to table state and history so training and inference pipelines get the data they need without extra glue.

Get in touch

See LakeOps in action

Get a personalized walkthrough of the LakeOps platform with your data. Short call, your architecture.

No commitment · Typically 30 min