
Data Governance articles

Data governance, compliance, and policy enforcement — GDPR, access control, lineage, and catalog management.

4 articles

Apache Iceberg, Lakehouse, LakeOps, Data Platforms

Iceberg Lakehouse with AI Agents: A Guide

AI agents are becoming primary consumers of Iceberg lakehouse data — querying tables iteratively, at high frequency, and without human review. This guide walks through the five components your infrastructure needs to support agentic workloads — MCP connectivity, guardrails, multi-engine routing, self-optimizing storage, and observability — and shows how LakeOps provides each one.

Jonathan Saring
24 min read
Apache Iceberg, Iceberg catalog, Lakehouse, Data Lake

Best Catalog for Apache Iceberg? A Useful Comparison

A technical comparison of the seven major Apache Iceberg catalogs — Hive Metastore, AWS Glue, Apache Polaris, Project Nessie, Databricks Unity Catalog, Apache Gravitino, and Lakekeeper — across protocol support, access control, multi-engine interoperability, credential vending, and production readiness.

Chris P
21 min read
Apache Iceberg, Data Platforms, Data Lake, LakeOps

Iceberg Metadata Lifecycle: Maintenance and Optimization

A deep technical guide to managing the metadata layer that makes Apache Iceberg fast — snapshots, manifests, metadata.json files, and Puffin statistics — covering expiration, consolidation, orphan cleanup, and the sequencing that prevents production incidents.

Jonathan Saring
19 min read
Apache Iceberg, LakeOps, QueryFlux, Data Platforms

Optimizing Apache Iceberg for Agentic AI: From Slow Tables to Sub-Second Agent Queries

AI agents issue SQL iteratively, repeat query templates at high frequency, and need sub-second responses from tables designed for batch workloads. This post covers what breaks when agents hit a production Iceberg lake — and the five infrastructure layers that fix it: MCP connectivity, guardrails, multi-engine routing, self-optimizing storage, and closed-loop feedback.

Chris P
18 min read