
Best Catalog for Apache Iceberg? A Useful Comparison

A technical comparison of the seven major Apache Iceberg catalogs — Hive Metastore, AWS Glue, Apache Polaris, Project Nessie, Databricks Unity Catalog, Apache Gravitino, and Lakekeeper — across protocol support, access control, multi-engine interoperability, credential vending, and production readiness.

[Figure: Seven Iceberg catalog options (Polaris, Nessie, Glue, Unity, Gravitino, Lakekeeper, and Hive) connected to a central Apache Iceberg symbol]

The table format question is settled — Apache Iceberg won. The catalog question is not.

Choosing the right catalog is now one of the most consequential infrastructure decisions in a lakehouse deployment. The catalog controls metadata resolution, access control, engine interoperability, credential management, and the operational surface of every table in your lake. The wrong fit creates operational debt that compounds with every table you add.

This article compares the seven catalogs that matter in production: Hive Metastore, AWS Glue, Apache Polaris, Project Nessie, Databricks Unity Catalog, Apache Gravitino, and Lakekeeper. For each, we cover architecture, strengths, limitations, and the deployment profile it fits.

What does an Iceberg catalog actually do?

An Iceberg table is a collection of data files (typically Parquet), metadata files, manifest lists, and manifest files in object storage. The catalog resolves the current metadata pointer, answering one question: where is the latest `metadata.json` for this table? Without that pointer, no engine can read or write the table.
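To make the "pointer" idea concrete, here is a minimal sketch of the read path, assuming in-memory dictionaries stand in for the catalog and for object storage. The catalog answers only the first lookup; everything after that comes from the metadata file itself. The field names `current-snapshot-id`, `snapshots`, `snapshot-id`, and `manifest-list` follow the Iceberg table spec; the paths and IDs are made up.

```python
import json

# Hypothetical stand-ins: the catalog maps a table identifier to the location of its
# current metadata file, and OBJECT_STORE maps file paths to their contents.
CATALOG = {("analytics", "events"): "s3://lake/events/metadata/00002-abc.metadata.json"}
OBJECT_STORE = {
    "s3://lake/events/metadata/00002-abc.metadata.json": json.dumps({
        "format-version": 2,
        "current-snapshot-id": 42,
        "snapshots": [
            {"snapshot-id": 41, "manifest-list": "s3://lake/events/metadata/snap-41.avro"},
            {"snapshot-id": 42, "manifest-list": "s3://lake/events/metadata/snap-42.avro"},
        ],
    }),
}


def resolve_manifest_list(namespace: str, table: str) -> str:
    """Follow catalog pointer -> metadata.json -> current snapshot -> manifest list."""
    try:
        # This lookup is the only question the catalog itself answers.
        metadata_location = CATALOG[(namespace, table)]
    except KeyError:
        raise LookupError(f"table {namespace}.{table} not found in catalog") from None
    metadata = json.loads(OBJECT_STORE[metadata_location])
    current_id = metadata["current-snapshot-id"]
    for snapshot in metadata["snapshots"]:
        if snapshot["snapshot-id"] == current_id:
            return snapshot["manifest-list"]
    raise ValueError(f"snapshot {current_id} missing from {metadata_location}")


print(resolve_manifest_list("analytics", "events"))
```

Because the catalog stores only a location, a commit amounts to atomically replacing that one string with the location of a newly written metadata file.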

But modern catalogs do far more than pointer resolution:

- Access control — enforce who can read, write, or admin each table and namespace.
- Credential vending — issue short-lived, table-scoped storage tokens so compute engines never hold long-lived cloud keys.
- Commit sequencing — coordinate concurrent writers with server-side deconflicting instead of client-side optimistic locking.
- Namespace management — organize tables into logical hierarchies with properties.
- View tracking — store Iceberg view definitions alongside tables.
- Governance integration — serve as the single point for lineage, audit, and policy enforcement.
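Credential vending is easiest to reason about as two checks: scope and expiry. The sketch below models a vended credential as a random token bound to one table's storage prefix and a deadline, plus the check the storage side would apply to each request. It is an illustration under stated assumptions: the `VendedCredential` shape, `vend`, and `is_permitted` are hypothetical, and real catalogs delegate this to cloud mechanisms such as STS session credentials rather than checking prefixes themselves.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VendedCredential:
    """Hypothetical vended credential: an opaque token plus the scope it grants."""
    token: str
    allowed_prefix: str  # storage prefix of the single table it was issued for
    expires_at: float    # epoch seconds
    can_write: bool


def vend(table_location: str, can_write: bool, ttl_seconds: int = 900,
         now: Optional[float] = None) -> VendedCredential:
    """Issue a random token scoped to one table's location for ttl_seconds."""
    issued_at = time.time() if now is None else now
    # The trailing slash stops "s3://lake/events" from also matching "s3://lake/events_archive".
    prefix = table_location.rstrip("/") + "/"
    return VendedCredential(secrets.token_urlsafe(16), prefix, issued_at + ttl_seconds, can_write)


def is_permitted(cred: VendedCredential, path: str, write: bool,
                 now: Optional[float] = None) -> bool:
    """The check the storage side would enforce on every request."""
    current = time.time() if now is None else now
    if current >= cred.expires_at:
        return False  # expired: a leaked token is only useful until the TTL runs out
    if write and not cred.can_write:
        return False  # read-only grant
    return path.startswith(cred.allowed_prefix)


cred = vend("s3://lake/analytics/events", can_write=False, now=1_000.0)
print(is_permitted(cred, "s3://lake/analytics/events/data/part-0.parquet", write=False, now=1_100.0))
```

The blast radius of a leaked token is bounded by both dimensions at once: one table prefix, for at most `ttl_seconds`.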

The catalog is the API boundary between your compute engines and your data. Every engine, every query, every write goes through it.

The Iceberg REST Catalog protocol

Before comparing implementations, it helps to understand the protocol that unifies most of them.

The Iceberg REST Catalog specification defines a standard HTTP API (`GET /v1/namespaces`, `POST /v1/namespaces/{ns}/tables/{table}`, `POST /v1/oauth/tokens`, and more) that any engine can call and any server can implement. The spec is open and versioned with a published OpenAPI definition.
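A small detail that trips up hand-written clients is how multi-level namespaces appear in these paths. The sketch below builds request paths assuming the spec's convention of joining namespace levels with the 0x1F unit separator (URL-encoded as `%1F`) and an optional server-advertised prefix obtained from `GET /v1/config`. The helper names are hypothetical; production clients such as the Java or Python Iceberg libraries do this for you.

```python
from typing import Optional, Sequence
from urllib.parse import quote

# Multi-level namespace levels are joined with the 0x1F unit separator,
# which appears URL-encoded as %1F in request paths.
NAMESPACE_SEPARATOR = "\x1f"


def encode_namespace(levels: Sequence[str]) -> str:
    return quote(NAMESPACE_SEPARATOR.join(levels), safe="")


def base_path(prefix: Optional[str] = None) -> str:
    # The optional prefix is whatever the server advertises from GET /v1/config.
    return "/v1" + (f"/{quote(prefix, safe='')}" if prefix else "")


def namespaces_path(prefix: Optional[str] = None) -> str:
    return f"{base_path(prefix)}/namespaces"


def table_path(namespace: Sequence[str], table: str, prefix: Optional[str] = None) -> str:
    # GET on this path loads the table; POST on the same path commits changes to it.
    return f"{namespaces_path(prefix)}/{encode_namespace(namespace)}/tables/{quote(table, safe='')}"


print(table_path(["sales", "eu"], "orders"))
```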

Before REST, every engine needed a dedicated connector for every catalog: Spark had a Hive connector, a JDBC connector, a Nessie connector. Supporting every pair meant writing O(engines × catalogs) integration code. REST collapses that matrix to O(engines + catalogs): implement the REST client once per engine, implement the REST server once per catalog, and everything interoperates.
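The arithmetic behind that claim, with an illustrative count of six engines against the seven catalogs in this article (the engine count is an assumption chosen for the example):

```python
def integrations_point_to_point(engines: int, catalogs: int) -> int:
    # Every engine needs a dedicated connector for every catalog.
    return engines * catalogs


def integrations_with_rest(engines: int, catalogs: int) -> int:
    # One REST client per engine plus one REST server per catalog.
    return engines + catalogs


print(integrations_point_to_point(6, 7), integrations_with_rest(6, 7))
```

With these counts that is 42 pairwise connectors versus 13 implementations, and the gap widens as either side grows.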

The REST protocol also enables server-side capabilities that were impossible with the old Thrift-based HMS approach:

- Credential vending — the catalog issues short-lived, table-scoped storage tokens. Engines never hold long-lived cloud keys, and the blast radius of a leaked credential is one table for a few minutes.
- Remote signing — for the most sensitive data, the engine never receives credentials at all. Every file access is pre-signed by the catalog, scoped to a single file and operation.
- Server-side commit deconflicting — the catalog sequences concurrent writes and retries conflicts, instead of relying on client-side optimistic locking.
- Multi-table commits — atomic visibility of changes across multiple tables in a single operation.
- Lazy snapshot loading — metadata is fetched incrementally, reducing cold-start latency for large tables.
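Server-side sequencing can be sketched as a catalog that validates each commit's precondition and applies it under a single lock, so the server rather than the clients decides ordering. This is a minimal model, assuming the precondition is the spec's `assert-ref-snapshot-id` requirement on the `main` ref; `CommitSequencer` and `CommitConflict` are hypothetical names, and the retry or rebase policy applied after a conflict differs between implementations, so it is left out.

```python
import threading
from typing import Dict, List, Optional


class CommitConflict(Exception):
    """Stands in for the HTTP 409 a REST catalog returns when a requirement fails."""


class CommitSequencer:
    """Minimal model of a catalog that validates and applies commits one at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._main: Dict[str, Optional[int]] = {}
        self._history: Dict[str, List[int]] = {}

    def current(self, table: str) -> Optional[int]:
        return self._main.get(table)

    def commit(self, table: str, expected_main: Optional[int], new_snapshot_id: int) -> int:
        with self._lock:  # commits to the catalog are serialized here, on the server
            actual = self._main.get(table)
            # Modeled on the "assert-ref-snapshot-id" requirement for ref "main".
            if actual != expected_main:
                raise CommitConflict(f"{table}: main is at {actual}, client expected {expected_main}")
            self._history.setdefault(table, []).append(new_snapshot_id)
            self._main[table] = new_snapshot_id
            return new_snapshot_id


catalog = CommitSequencer()
catalog.commit("events", expected_main=None, new_snapshot_id=1)
# Two writers both read main == 1; the first commit wins, the second is rejected.
catalog.commit("events", expected_main=1, new_snapshot_id=2)
try:
    catalog.commit("events", expected_main=1, new_snapshot_id=3)
except CommitConflict as err:
    print(err)
```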

Nearly every catalog released since 2023 speaks REST or is adding REST support. The practical question is no longer whether to use the REST protocol, but which REST implementation fits your stack.

Side-by-side comparison

Before diving into each catalog, here is a high-level feature matrix. The sections below expand on the nuances behind each cell.

| Feature | Hive Metastore | AWS Glue | Apache Polaris | Project Nessie | Unity Catalog | Apache Gravitino | Lakekeeper |
|---|---|---|---|---|---|---|---|
| Protocol | Thrift | REST + Glue API | REST | REST | REST | REST | REST |
| License | Apache 2.0 | Proprietary | Apache 2.0 | Apache 2.0 | OSS: Apache 2.0 / Managed: proprietary | Apache 2.0 | Apache 2.0 |
| Deployment | Self-managed (JVM) | Fully managed (AWS) | Self-managed (JVM) or Snowflake Open Catalog | Self-managed (JVM) | Managed (Databricks) or self-managed (OSS) | Self-managed (JVM) | Self-managed (Rust binary) |
| Credential vending | No | No | Yes (STS session tags) | No | Yes (managed) | Yes | Yes |
| Fine-grained RBAC | External (Ranger) | IAM + Lake Formation | Built-in + OPA | External | Built-in | Built-in | Built-in (OpenFGA + OPA) |
| Branching / versioning | No | No | No | Yes (Git-style) | No | No | No |
| Multi-table commits | No | No | Yes | Yes (branch-based) | Yes (managed) | Yes | Yes |
| Catalog federation | No | No | Yes (1.4+) | No | Yes (read) | Yes (native) | No |
| Multi-cloud | Yes (self-hosted) | No (AWS only) | Yes | Yes | Limited (Databricks regions) | Yes | Yes |
| Iceberg v3 | Via engine | Yes (EMR, Glue ETL) | Yes | Via engine | Yes (Runtime 18.0+) | Via engine | Via engine |
| Non-Iceberg formats | Hive tables | Hive, Delta, Hudi | Roadmap (generic tables) | No | Delta (native), Iceberg, Hudi (foreign) | Hive, Hudi, Paimon, Delta, ClickHouse | No |
| AI/ML asset support | No | No | No | No | Yes (Mosaic AI) | Yes (models, UDFs) | No |

1. Hive Metastore (HMS)

The original Iceberg catalog. Hive Metastore stores metadata pointers in a relational database (typically MySQL or PostgreSQL) and exposes them through a Thrift RPC interface. Almost every engine that supports Iceberg also supports HMS, making it the most broadly compatible option.

Strengths:

- Universal engine support — Spark, Flink, Trino, Hive, Presto, and many others have mature HMS integrations. If an engine supports Iceberg, it almost certainly supports HMS.
- Well-understood operational model — most data platform teams have run HMS for years. Runbooks, monitoring patterns, and failure modes are familiar.
- No additional infrastructure — if you already have HMS for Hive tables, adding Iceberg tables requires zero new services.

Limitations:

- No REST protocol — HMS uses Thrift, which means no credential vending, no server-side deconflicting, and no multi-table commits. Every engine needs the Hive client JARs on its classpath.
- JVM dependency — the HMS server requires a JVM process, and client libraries pull in a transitive dependency tree that includes Hadoop and Hive JARs.
- Scalability ceiling — `listTables` calls degrade as namespace size grows. At 8,000+ tables in a single namespace, operations that should take milliseconds take minutes. The root cause is the `getTableObjectsByName` Thrift call, which fetches full table objects to filter Iceberg tables from Hive tables.
- No branching or versioning — there is no concept of catalog-level branches, tags, or time travel.
- No native access control — fine-grained table permissions require an external layer like Apache Ranger.
- No multi-catalog isolation — the `HiveCatalog` implementation ignores the catalog name parameter, so isolating metadata between logical catalogs on the same HMS instance is unreliable.
- Hive 3 compatibility issues — Spark's classloader isolation prevents using Hive 3 features (like multi-catalog support) in the Iceberg `HiveCatalog`, even when connecting to a Hive 3 Metastore.
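Why listing scales with namespace size becomes clear from a simplified model. HMS stores Iceberg and Hive tables side by side, and Iceberg tables are distinguished by the `table_type=ICEBERG` table parameter, so telling them apart requires fetching each table's full object. In the sketch below, `FakeMetastore` and its method names are hypothetical Python stand-ins for the Thrift calls, and the batch size is arbitrary; the counter shows that objects fetched grow linearly with the number of tables, regardless of how many turn out to be Iceberg.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HiveTableObject:
    """Simplified stand-in for a full HMS table object (schema, storage descriptor, params)."""
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)


class FakeMetastore:
    def __init__(self, tables: List[HiveTableObject]) -> None:
        self._tables = {t.name: t for t in tables}
        self.objects_fetched = 0  # how many full objects crossed the wire

    def get_all_table_names(self) -> List[str]:
        return list(self._tables)  # cheap: names only

    def get_table_objects_by_name(self, names: List[str]) -> List[HiveTableObject]:
        self.objects_fetched += len(names)  # expensive: full objects
        return [self._tables[n] for n in names]


def list_iceberg_tables(ms: FakeMetastore, batch_size: int = 100) -> List[str]:
    names = ms.get_all_table_names()
    result = []
    for start in range(0, len(names), batch_size):
        for obj in ms.get_table_objects_by_name(names[start:start + batch_size]):
            # Iceberg's HiveCatalog marks its tables with table_type=ICEBERG.
            if obj.parameters.get("table_type", "").upper() == "ICEBERG":
                result.append(obj.name)
    return result


ms = FakeMetastore([HiveTableObject(f"t{i}", {"table_type": "ICEBERG"} if i % 2 == 0 else {})
                    for i in range(1_000)])
print(len(list_iceberg_tables(ms)), ms.objects_fetched)
```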

Best suited for: Teams that already run HMS, need broad engine compatibility, and do not require credential vending, branching, or fine-grained catalog-level access control. For greenfield deployments, a REST-based catalog is typically the better starting point.

2. AWS Glue Data Catalog

AWS Glue Data Catalog is a fully managed, serverless metadata service. It stores Iceberg metadata pointers and integrates natively with IAM, Lake Formation, Athena, EMR, and Redshift Spectrum. In late 2024, AWS added a REST endpoint (`https://glue.<region>.amazonaws.com/iceberg`) that implements the Iceberg REST spec, letting external engines connect without Glue-specific SDKs.

Strengths:

- Zero operational overhead — no servers to provision, no databases to manage, no upgrades to schedule. AWS handles availability, scaling, and durability.
- Deep AWS ecosystem integration — IAM policies for auth, Lake Formation for column-level security, CloudTrail for audit logging, built-in compaction for managed tables, and native S3 Tables support.
- REST endpoint — the Iceberg REST API surface supports `CreateTable`, `LoadTable`, `ListNamespaces`, `ListTables`, and `DeleteTable`.
- Iceberg v2 and v3 — including deletion vectors and row lineage on EMR 7.12+ and Glue ETL 5.0+.
- Familiar to AWS teams — Glue is the default catalog for Athena, EMR, and Redshift, so most AWS-based data teams already have it configured.

Limitations:

- AWS lock-in — the catalog is a managed AWS service. Multi-cloud deployments need a separate catalog per cloud, with no built-in federation between them.
- Single-level namespaces — Glue only supports one level of database nesting. Multi-level Iceberg namespace hierarchies require workaround patterns.
- No branching or versioning — no concept of catalog branches, tags, or catalog-level time travel.
- No multi-table commits — each table commit is independent; there is no atomic cross-table visibility.
- REST API gaps — `UpdateTable` is not supported for Iceberg tables through the REST API. Iceberg v3 tables cannot be created via the `CreateTable` API (only v1 and v2). The REST endpoint does not support credential vending, so engines still need their own IAM credentials.
- No cross-region catalog — each Glue catalog is regional. Cross-region table access requires additional configuration or replication.
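One common shape for the single-level namespace workaround is to flatten a multi-level namespace into one database name with a reserved separator. The sketch below is a hypothetical convention, not an AWS feature: it joins levels with `__` and rejects any level that could make the mapping ambiguous (an empty level, a doubled underscore, or a leading or trailing underscore), so the flattened name can always be split back into its levels. Because it lowercases input, mixed-case levels do not round-trip.

```python
import re
from typing import Sequence, Tuple

SEPARATOR = "__"  # hypothetical convention; levels may never contain it
# Lowercase alphanumeric runs joined by single underscores: no "__", no edge underscores.
_LEVEL = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def flatten_namespace(levels: Sequence[str]) -> str:
    """Map ("Sales", "eu_west", "orders") to a single Glue-style database name."""
    if not levels:
        raise ValueError("namespace must have at least one level")
    normalized = [level.lower() for level in levels]
    for level in normalized:
        if not _LEVEL.fullmatch(level):
            raise ValueError(f"level {level!r} cannot be flattened unambiguously")
    return SEPARATOR.join(normalized)


def unflatten_namespace(database: str) -> Tuple[str, ...]:
    return tuple(database.split(SEPARATOR))


db = flatten_namespace(("Sales", "eu_west", "orders"))
print(db, unflatten_namespace(db))
```

The validation is the part that matters: without it, a level such as `a_` followed by `b` flattens to `a___b` and splits back into the wrong levels.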

Best suited for: All-AWS teams that want zero ops and do not need multi-cloud access, branching, or multi-table atomicity. For teams already on Athena or EMR, Glue is the path of least resistance.

3. Apache Polaris

Apache Polaris was open-sourced by Snowflake in June 2024 and donated to the Apache Software Foundation, where it graduated from the incubator in February 2026. It is a full open-source implementation of the Iceberg REST Catalog specification — purpose-built for multi-engine access with fine-grained access control.

The latest release, Polaris 1.4.0 (April 2026), is the first post-graduation release. It introduced enhanced credential vending with AWS STS session tags for CloudTrail correlation, storage-scoped credentials, S3 KMS encryption support, CockroachDB as a persistence backend, and Iceberg metrics persistence to database.

Strengths:

- Full REST Catalog implementation — credential vending (with STS session tags for audit correlation), server-side deconflicting, multi-table commits, OAuth2 authentication, and namespace management.
- Fine-grained RBAC — principals, principal roles, and catalog roles with table-level and namespace-level grants. Open Policy Agent (OPA) integration for external authorization is maturing.
- Multi-catalog — a single Polaris server manages multiple logical catalogs, each with its own storage locations, permissions, and encryption keys.
- Catalog federation — Polaris 1.4 supports federation to Hive Metastore, Glue, and other Iceberg REST catalogs. A single Polaris instance acts as a routing layer for tables that live in other catalogs, enabling incremental adoption without a big-bang metadata migration.
- Broad engine support — tested with Spark, Flink, Trino, Dremio, StarRocks, and Doris.
- Managed option — Snowflake Open Catalog is a managed service built on Polaris, offering the same REST API with zero self-hosting. Currently GA and free; pay-per-request billing planned for 2026.
- Active community — monthly release cadence. Upcoming features include generic table support for Delta and Hudi, expanded event persistence, and deeper OPA integration.
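The three-layer role model above (principals hold principal roles, principal roles are granted catalog roles, catalog roles hold privileges on catalogs, namespaces, or tables) can be sketched as a chain of lookups. This is an illustrative model, not Polaris code: the class and method names are hypothetical, the privilege string is modeled on Polaris-style naming, and the rule that a grant on a namespace covers every table beneath it is an assumption made for the example.

```python
from typing import Dict, Set, Tuple

Securable = Tuple[str, ...]  # ("catalog",), ("catalog", "ns"), ("catalog", "ns", "table")
Grant = Tuple[str, Securable]


class RbacSketch:
    def __init__(self) -> None:
        self.principal_roles: Dict[str, Set[str]] = {}  # principal -> principal roles
        self.catalog_roles: Dict[str, Set[str]] = {}    # principal role -> catalog roles
        self.grants: Dict[str, Set[Grant]] = {}         # catalog role -> (privilege, securable)

    def assign(self, principal: str, principal_role: str) -> None:
        self.principal_roles.setdefault(principal, set()).add(principal_role)

    def attach(self, principal_role: str, catalog_role: str) -> None:
        self.catalog_roles.setdefault(principal_role, set()).add(catalog_role)

    def grant(self, catalog_role: str, privilege: str, securable: Securable) -> None:
        self.grants.setdefault(catalog_role, set()).add((privilege, securable))

    def is_allowed(self, principal: str, privilege: str, target: Securable) -> bool:
        for p_role in self.principal_roles.get(principal, set()):
            for c_role in self.catalog_roles.get(p_role, set()):
                for granted, securable in self.grants.get(c_role, set()):
                    # Assumption: a grant on a catalog or namespace covers everything beneath it.
                    if granted == privilege and target[:len(securable)] == securable:
                        return True
        return False


rbac = RbacSketch()
rbac.assign("alice", "data_engineer")
rbac.attach("data_engineer", "sales_reader")
rbac.grant("sales_reader", "TABLE_READ_DATA", ("prod", "sales"))
print(rbac.is_allowed("alice", "TABLE_READ_DATA", ("prod", "sales", "orders")))
```

The indirection is the point of the design: revoking `sales_reader` from `data_engineer` removes access for every principal holding that role in one change.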

Limitations:

- Self-managed (open source) — you deploy and operate the Polaris server (Quarkus-based JVM application), its persistence layer (PostgreSQL, MySQL, or CockroachDB), and its availability.
- No branching — unlike Nessie, Polaris does not offer Git-style branching or catalog-level version history.
- Snowflake adjacency — the line between the open-source Apache project and Snowflake's commercial Open Catalog can be confusing. Feature parity between the two is not guaranteed.

Best suited for: Teams that want a vendor-neutral, open-source REST catalog with credential vending, RBAC, and federation — and are willing to operate a JVM service (or use Snowflake Open Catalog for managed hosting).

4. Project Nessie

Project Nessie brings Git-like semantics to catalog metadata. Created by Dremio, it lets you create branches, tags, and commits over your entire catalog state — enabling isolated experimentation, CI/CD workflows for data, and catalog-wide time travel.

Strengths:

- Git-style branching — create dev, staging, and feature branches of your catalog. Write to a branch in isolation, then merge when ready. This is transformative for testing schema changes, backfill jobs, or ML feature engineering against production data without affecting live tables.
- Catalog-level time travel — roll back the entire catalog to any previous commit, not just individual table snapshots. This gives you a global undo across every table.
- Multi-table transactions — loosely coupled atomic visibility via branch-based commits. All changes committed to a branch become visible together on merge.
- Cherry-pick and merge — selectively apply commits from one branch to another, exactly like Git.
- Iceberg REST API — Nessie implements the Iceberg REST Catalog interface, so engines connect via standard REST.
- Broad engine support — Spark (including Spark SQL extensions for branch/tag management), Flink, Trino, Hive, and Presto. Latest release (0.107.5, April 2026) adds Spark SQL 4.0 extensions.
- Docker and Kubernetes — ships as a Docker image with Helm chart support for K8s deployments.
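The branching model can be illustrated with a toy version: each commit captures the whole catalog state (table name to metadata location), a branch is a named pointer to a commit, and a merge applies every table changed on the source since the fork point in one new commit on the target. This sketch is illustrative only; `ToyVersionedCatalog` is hypothetical and does not reflect Nessie's storage format, API, or exact merge semantics (for example, its merge commit records only one parent).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[int]
    tables: Dict[str, str]  # full catalog state: table name -> metadata location


class ToyVersionedCatalog:
    def __init__(self) -> None:
        self.commits: Dict[int, Commit] = {0: Commit(None, {})}
        self.branches: Dict[str, int] = {"main": 0}

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        self.branches[name] = self.branches[from_branch]  # a branch is just a named pointer

    def table(self, branch: str, name: str) -> Optional[str]:
        return self.commits[self.branches[branch]].tables.get(name)

    def _new_commit(self, branch: str, tables: Dict[str, str]) -> int:
        commit_id = len(self.commits)
        self.commits[commit_id] = Commit(self.branches[branch], tables)
        self.branches[branch] = commit_id
        return commit_id

    def commit(self, branch: str, name: str, metadata_location: str) -> int:
        tables = dict(self.commits[self.branches[branch]].tables)
        tables[name] = metadata_location
        return self._new_commit(branch, tables)

    def _history(self, commit_id: Optional[int]) -> List[int]:
        ids = []
        while commit_id is not None:
            ids.append(commit_id)
            commit_id = self.commits[commit_id].parent
        return ids

    def merge(self, source: str, target: str = "main") -> int:
        """Apply every table changed on `source` since the fork point to `target` at once."""
        target_history = set(self._history(self.branches[target]))
        base_id = next(c for c in self._history(self.branches[source]) if c in target_history)
        base = self.commits[base_id].tables
        src = self.commits[self.branches[source]].tables
        tgt = self.commits[self.branches[target]].tables
        changed = {name for name, loc in src.items() if base.get(name) != loc}
        # A conflict is a table that both branches changed, to different values.
        conflicts = sorted(n for n in changed if tgt.get(n) != base.get(n) and tgt.get(n) != src[n])
        if conflicts:
            raise ValueError(f"tables changed on both branches: {conflicts}")
        merged = dict(tgt)
        merged.update({name: src[name] for name in changed})
        # Simplification: the merge commit records only the target as its parent.
        return self._new_commit(target, merged)


cat = ToyVersionedCatalog()
cat.commit("main", "orders", "orders-v1.json")
cat.create_branch("dev")
cat.commit("dev", "orders", "orders-v2.json")
cat.commit("dev", "customers", "customers-v1.json")
cat.merge("dev")  # both tables become visible on main in a single commit
print(cat.table("main", "orders"), cat.table("main", "customers"))
```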

Limitations:

- Operational investment — Nessie requires a dedicated server process and a persistent backend store (for example, a JDBC database, RocksDB, DynamoDB, or MongoDB). Running it reliably at scale with replication, backups, and monitoring is non-trivial.
- No built-in access control — Nessie does not have fine-grained RBAC. You need to pair it with Polaris, an OPA layer, or a custom authorization service for production-grade permissions.
- No credential vending — engines need their own storage credentials; Nessie does not vend scoped tokens.
- Branching adds complexity — branch management overhead is only justified if your workflows benefit from data CI/CD. For simpler deployments that just need metadata resolution and access control, branching is unnecessary complexity.
- Loosely coupled, not true ACID — while branch merges provide atomic visibility, individual operations within a branch are still separate Iceberg commits. This is not the same as true multi-statement ACID transactions.

Best suited for: Teams that need data CI/CD — isolated branches for development, testing, or experimentation against production catalog state. Nessie is the only catalog on this list that offers Git-level version control over metadata. Consider pairing it with Polaris or another access-control layer for production security.

5. Databricks Unity Catalog

Unity Catalog is Databricks' governance layer for the lakehouse. The open-source version was released under Linux Foundation governance in June 2024, but the production-grade managed version — deeply integrated with Databricks Runtime, Mosaic AI, and Delta Sharing — is the one most teams deploy.

Strengths:

- Feature-complete governance — Unity Catalog manages tables, volumes, ML models, functions, and AI assets in a single namespace hierarchy with row-level and column-level security, attribute-based access control, data lineage tracking, and audit logging.
- Iceberg REST Catalog API — Unity implements the Iceberg REST spec at `/api/2.1/unity-catalog/iceberg-rest`, enabling external engines (Spark, Flink, Trino) to read and write Unity-managed Iceberg tables.
- Iceberg v3 — deletion vectors, row lineage, and VARIANT data type support on Databricks Runtime 18.0+.
- Predictive Optimization — automatic compaction, vacuum, and Liquid Clustering for managed Iceberg tables. Maintenance is handled by the platform based on table access patterns.
- Federation — Unity federates to AWS Glue, Hive Metastore, and Snowflake Horizon Catalog for read access to foreign Iceberg tables.
- Delta Sharing — cross-organization, cross-cloud data sharing using an open protocol.

Limitations:

- Databricks coupling — the managed version is tightly integrated with the Databricks platform. The full feature set (Predictive Optimization, Liquid Clustering, Mosaic AI, Delta Sharing) is only available on Databricks.
- Open-source gap — the open-source Unity Catalog under Linux Foundation governance is a separate, slower-moving project with significantly fewer features than the managed version.
- Write path constraints — writing to managed Iceberg tables via the REST API requires compatible Iceberg clients (1.9.2+ recommended). External engines have full read access but write support varies by table type. Credential vending for foreign Iceberg tables is not supported.
- Cost model — Unity Catalog comes with the Databricks platform. Adopting it for non-Databricks workloads means adopting the Databricks ecosystem.

Best suited for: Teams running Databricks. Unity Catalog is the natural — and often mandatory — choice for Databricks-centric lakehouses. For non-Databricks environments, the open-source version is an option, but expect feature gaps.

6. Apache Gravitino

Apache Gravitino is the most ambitious catalog on this list. After incubating at Apache, it graduated to a Top-Level Project in June 2025. It positions itself as a federated metadata lake — not just an Iceberg catalog, but a unified metadata layer for tables, files, models, Kafka topics, and UDFs across multiple backend systems.

The latest release, Gravitino 1.2.0 (March 2026), introduced a Table Maintenance Service, ClickHouse catalog, end-to-end UDF management, scan planning offload for DuckDB and Spark, and multi-version Trino connector support (435–478).

Strengths:

- Federated metadata — Gravitino connects to Hive, MySQL, PostgreSQL, HDFS, S3, Iceberg, Hudi, Paimon, ClickHouse, StarRocks, OceanBase, and more through a unified API. Changes in underlying systems are reflected immediately via direct connectors — no ETL-based metadata sync.
- Iceberg REST Catalog service — Gravitino runs a native Iceberg REST endpoint, so any REST-compatible engine can use it as an Iceberg catalog.
- Multi-engine integration — Trino connector with multi-version support (435–478), Spark FunctionCatalog integration for UDFs, and Flink user authentication.
- Scan planning offload — query engines like DuckDB and Spark can delegate scan planning to Gravitino's IRC server, reducing client-side complexity and latency.
- AI asset management — ML model tracking, UDF registry with centralized governance, and feature metadata.
- Geo-distribution — designed for multi-region and multi-cloud metadata synchronization with multi-cluster fileset support.
- Access control and audit — built-in RBAC, auditing, and metadata discovery across all asset types.
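The federation idea, one API in front of many backends with metadata read live rather than copied, reduces to a routing table keyed by catalog name. The sketch below is a hypothetical illustration and not Gravitino's API: `FederatedMetadataRouter`, the provider strings, and the lambda loaders are all invented for the example, with each loader standing in for a connector that queries its backend at request time.

```python
from typing import Callable, Dict, Tuple

# A loader fetches live metadata from one backend: (schema, table) -> metadata dict.
Loader = Callable[[str, str], dict]


class FederatedMetadataRouter:
    """Hypothetical router: one namespace of catalogs, each backed by a different system."""

    def __init__(self) -> None:
        self._backends: Dict[str, Tuple[str, Loader]] = {}

    def register(self, catalog: str, provider: str, loader: Loader) -> None:
        self._backends[catalog] = (provider, loader)

    def load_table(self, qualified_name: str) -> dict:
        parts = qualified_name.split(".")
        if len(parts) != 3:
            raise ValueError(f"expected catalog.schema.table, got {qualified_name!r}")
        catalog, schema, table = parts
        if catalog not in self._backends:
            raise LookupError(f"unknown catalog {catalog!r}")
        provider, loader = self._backends[catalog]
        # Metadata is read from the backend at request time, not synced in ahead of time.
        return {"provider": provider, **loader(schema, table)}


router = FederatedMetadataRouter()
router.register("lake", "iceberg", lambda s, t: {"location": f"s3://lake/{s}/{t}"})
router.register("ops", "postgresql", lambda s, t: {"columns": ["id", "status"]})
print(router.load_table("lake.sales.orders"))
```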

Limitations:

- Documentation gaps — the project is powerful but documentation is incomplete, especially around advanced configuration and production hardening.
- Operational complexity — running Gravitino means managing a JVM server, its connector layer, and the federation topology. The breadth of scope creates a large configuration surface.
- Trino-first — engine integration is most mature for Trino. Spark and Flink support is progressing but not at parity.
- Scope breadth — the ambition to catalog everything (tables, files, models, topics, UDFs, schemas across a dozen backend systems) means the project spreads effort across a wide surface area. For teams that only need an Iceberg catalog, Gravitino may be more than necessary.

Best suited for: Teams with a heterogeneous data platform — Hive here, PostgreSQL there, Kafka over there — that need a single metadata layer to federate across all of them. Particularly relevant when Trino is the primary query engine.

7. Lakekeeper

Lakekeeper is the youngest catalog on this list and the most opinionated about simplicity. Written entirely in Rust, it ships as a single binary with no JVM or Python dependency — point it at a PostgreSQL database and start serving REST requests.

The latest release, Lakekeeper 0.12.0 (April 2026), focused on authorization — adding an audit event handler with exactly-once guarantees, OPA batch optimization, Trino custom rule extensions, configurable admin users, and improved role lifecycle management.

Strengths:

- Rust performance — single binary, no JVM warmup, low memory footprint. Starts in milliseconds. Ideal for containerized deployments and Kubernetes.
- Full REST Catalog — implements the Iceberg REST specification including multi-table commits, server-side deconflicting, and table/view statistics.
- Credential vending — storage access management using vended credentials and remote signing for S3, GCS, ADLS, and on-premise S3-compatible stores. Configurable STS endpoints for fine-grained control.
- Fine-grained authorization — OpenFGA-based policy engine with OPA bridge for Trino integration. OIDC authentication from any identity provider plus native Kubernetes service account auth.
- Multi-tenant — a single deployment serves multiple isolated projects, each with multiple warehouses and independent auth policies.
- Change events — built-in CloudEvents emission for reacting to table changes (trigger compaction, feed CDC pipelines, populate audit logs). Contract verification hooks allow external approval before commits.
- Extensible architecture — `Catalog`, `SecretsStore`, `Authorizer`, `CloudEventBackend`, and `ContractVerification` are exposed as Rust traits for custom implementations.
- Kubernetes-native — Helm chart, horizontal scaling with no local state, and native K8s service account authentication.
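The pairing of contract verification and change events can be sketched as a commit wrapper: every registered verifier must approve the proposed change, and only then is an event emitted. The sketch below assumes a CloudEvents 1.0 envelope (required attributes `specversion`, `id`, `source`, `type`); the verifier signature, the `source` URI, the event `type` string, and the shape of `change` are hypothetical and not Lakekeeper's actual formats.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Callable, Iterable

# A verifier returns True to approve a proposed change (hypothetical signature).
Verifier = Callable[[str, dict], bool]


def commit_with_hooks(table: str, change: dict, verifiers: Iterable[Verifier],
                      emit: Callable[[str], None]) -> dict:
    for verify in verifiers:
        if not verify(table, change):
            raise PermissionError(f"commit to {table} rejected by a contract verifier")
    # ... the actual metadata commit would happen here ...
    event = {
        "specversion": "1.0",                         # required CloudEvents attributes:
        "id": str(uuid.uuid4()),                      # specversion, id, source, type
        "source": "/warehouses/demo",                 # hypothetical source URI
        "type": "com.example.iceberg.table.updated",  # hypothetical event type
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": {"table": table, "change": change},
    }
    emit(json.dumps(event))
    return event


def no_dropped_columns(table: str, change: dict) -> bool:
    """Example contract: downstream consumers rely on existing columns, so never remove any."""
    return not change.get("removed_columns")


sent = []
commit_with_hooks("sales.orders", {"added_columns": ["discount"]}, [no_dropped_columns], sent.append)
print(len(sent))
```

A downstream consumer, such as a compaction trigger, would subscribe to these events instead of polling the catalog for changes.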

Limitations:

- Young project — v0.12.0 with ~1,000 GitHub stars. Production deployment stories are still emerging.
- No branching — unlike Nessie, there is no Git-style version control.
- PostgreSQL dependency — the backing store is PostgreSQL. Alternative backends require implementing the `Catalog` trait.
- Smaller integration surface — tested with Spark, PyIceberg, Trino, and StarRocks. Flink and Hive support is less validated.

Best suited for: Teams that want a lightweight, high-performance REST catalog that deploys as a single binary on Kubernetes — with strong authz (OpenFGA + OPA) and minimal dependencies.

Managed services and cloud-native catalogs

Beyond the seven catalogs above, the major cloud providers offer managed catalog services that wrap or complement the open-source options:

- [Snowflake Open Catalog](https://other-docs.snowflake.com/en/polaris/overview) — a fully managed service built on Apache Polaris. Same REST API, RBAC, and credential vending — zero self-hosting. Currently GA and free to use; pay-per-request billing planned for 2026.
- [Google BigLake Metastore](https://cloud.google.com/bigquery/docs/blms-rest-catalog) — a serverless, managed Iceberg REST catalog on GCP (GA). Supports interoperability between Spark, Trino, and BigQuery on the same tables in GCS. Includes BigQuery federation so tables created in Spark are queryable in BigQuery without data copies.
- Microsoft Fabric OneLake Catalog — manages metadata for tables across Fabric workspaces with Delta/Iceberg support. Tightly coupled to the Fabric ecosystem.
- Dremio's Nessie-based catalog — Dremio integrates Nessie for Git-like branching with an auto-optimization layer for compaction and maintenance.

These managed services reduce operational burden but introduce platform coupling. The trade-off is the same as any managed-vs-self-hosted decision: less ops in exchange for less portability.

Also worth knowing: the JDBC catalog

The Iceberg project includes a built-in [JDBC catalog](https://iceberg.apache.org/docs/1.4.3/jdbc/) that stores table metadata pointers in any JDBC-compatible relational database: PostgreSQL, MySQL, SQLite, Oracle, or SQL Server. It requires no external service beyond the database itself.

- Development use — SQLite-backed JDBC catalog gives you a fully local Iceberg environment with no cloud credentials or services. Ideal for unit tests, local prototyping, and CI pipelines.
- Simple production use — PostgreSQL-backed JDBC catalog works for single-writer or moderate-concurrency workloads where you do not need credential vending, RBAC, or REST API access from multiple engines.
- Not a REST catalog — the JDBC catalog uses direct database connections, not HTTP. Engines need JDBC drivers on the classpath. There is no credential vending, no server-side commit deconflicting, and no multi-table commits.
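The core of a database-backed catalog is a single row per table and a compare-and-swap update: a commit succeeds only if the stored pointer still equals the location the writer started from. The sketch below uses Python's built-in `sqlite3` with a column layout modeled on the Iceberg JDBC catalog's `iceberg_tables` table; treat the exact schema and the helper names (`register`, `load`, `commit`) as illustrative rather than a drop-in replacement for the Java implementation.

```python
import sqlite3
from typing import Optional

# Column layout modeled on the Iceberg JDBC catalog's iceberg_tables table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS iceberg_tables (
    catalog_name TEXT NOT NULL,
    table_namespace TEXT NOT NULL,
    table_name TEXT NOT NULL,
    metadata_location TEXT,
    previous_metadata_location TEXT,
    PRIMARY KEY (catalog_name, table_namespace, table_name)
)
"""


def register(conn: sqlite3.Connection, namespace: str, name: str, location: str,
             catalog: str = "local") -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO iceberg_tables VALUES (?, ?, ?, ?, NULL)",
                     (catalog, namespace, name, location))


def load(conn: sqlite3.Connection, namespace: str, name: str,
         catalog: str = "local") -> Optional[str]:
    row = conn.execute(
        "SELECT metadata_location FROM iceberg_tables "
        "WHERE catalog_name = ? AND table_namespace = ? AND table_name = ?",
        (catalog, namespace, name),
    ).fetchone()
    return row[0] if row else None


def commit(conn: sqlite3.Connection, namespace: str, name: str, expected: str, new: str,
           catalog: str = "local") -> bool:
    """Compare-and-swap: succeeds only if the pointer still equals `expected`."""
    with conn:
        cursor = conn.execute(
            "UPDATE iceberg_tables SET metadata_location = ?, previous_metadata_location = ? "
            "WHERE catalog_name = ? AND table_namespace = ? AND table_name = ? "
            "AND metadata_location = ?",
            (new, expected, catalog, namespace, name, expected),
        )
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
register(conn, "db", "events", "s3://lake/events/v1.metadata.json")
print(commit(conn, "db", "events", "s3://lake/events/v1.metadata.json",
             "s3://lake/events/v2.metadata.json"))
```

A writer whose `expected` value is stale updates zero rows and must reload, reapply its changes, and try again, which is the client-side optimistic retry the REST catalogs move to the server.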

The JDBC catalog is useful as a stepping stone or for constrained environments. For production multi-engine lakehouses, a REST-based catalog is the better long-term choice.

Governance portability: the hidden problem

One of the biggest practical concerns in the catalog landscape is governance portability. Access control policies — who can query what, at what granularity — are defined in the catalog, but there is no industry standard for sharing these policies across catalogs.

If you set up row-level security in Unity Catalog, that policy does not transfer to Polaris. If you define namespace-level grants in Polaris, those rules do not apply when the same table is accessed through Glue. This is why many architects recommend picking a single catalog as the governance boundary and routing all engine access through it, rather than running multiple catalogs with duplicated (and inevitably inconsistent) governance rules.

For organizations that do run multiple catalogs — and most large enterprises will — federation features in Polaris, Unity, and Gravitino can help by centralizing the access control layer even when metadata lives in distributed backends.

Production considerations

Regardless of which catalog you choose, a few operational patterns apply universally:

- Treat the catalog as a Tier-1 dependency. If the catalog is down, no engine can resolve metadata, so reads and writes stop. Monitor P99 latency, alert when it exceeds a target (for example, 500 ms), and plan for failover.
- Use credential vending when available. Distributing long-lived storage credentials to every engine and job is a security liability. Vended credentials scope access to the table being queried and expire in minutes.
- Plan for catalog migration before you need it. The REST protocol makes catalog implementations swappable. If you start with a REST catalog, you can switch backends later while keeping engine-side changes to a few connection properties, typically the catalog URI and credentials. If you start with HMS, you will need a migration.
- Separate governance from metadata resolution. Not every catalog needs to enforce access control. In some architectures, the catalog handles metadata and a separate policy engine (OPA, Cedar, Ranger) handles authorization. This decouples the two concerns and avoids single-vendor governance lock-in.
- Monitor metadata growth independently. Catalogs resolve pointers but do not tell you whether a table is degrading: whether manifests are bloated, orphan files are accumulating, or snapshot history is consuming excessive storage. That requires a separate observability layer.
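A P99 alert reduces to a percentile over a window of latency samples compared against a threshold. The sketch below uses the nearest-rank percentile definition and the 500 ms example target; the window size and the choice of nearest-rank (rather than an interpolating estimator) are assumptions, and most monitoring systems compute this for you from histograms.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Smallest sample such that at least pct percent of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank; integer-friendly arithmetic
    return ordered[max(rank, 1) - 1]


def should_alert(latencies_ms: Sequence[float], threshold_ms: float = 500.0) -> bool:
    return percentile_nearest_rank(latencies_ms, 99) > threshold_ms


window = [100.0] * 98 + [900.0] * 2  # 100 catalog calls, two of them slow
print(should_alert(window))
```

With 100 samples, the 99th-percentile rank is 99, so one slow call in a hundred does not trip the alert but two do.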

The catalog does not replace operational management

Every catalog on this list resolves metadata pointers and some manage access control. None of them tell you whether a table is healthy.

They do not track how many orphan files are accumulating, whether manifests need consolidation, whether snapshot history is consuming excessive storage, or whether a compaction schedule is keeping up with ingestion volume. They do not alert when a table degrades or automatically fix it when it does.
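Orphan detection, one of the health checks listed above, is at heart a set difference with a safety margin: files present in storage that no retained snapshot references and that are older than a cutoff. The sketch assumes you already have a storage listing with modification times and the full set of referenced paths (data files, delete files, manifests, manifest lists, and metadata files across all retained snapshots); the function name and inputs are hypothetical.

```python
from typing import Dict, List, Set


def find_orphan_files(listed: Dict[str, float], referenced: Set[str],
                      older_than: float) -> List[str]:
    """Files in storage that no retained snapshot references and that predate the cutoff."""
    return sorted(
        path
        for path, modified_at in listed.items()
        # The age cutoff protects files written by commits that are still in flight.
        if path not in referenced and modified_at < older_than
    )


listing = {
    "s3://lake/events/data/a.parquet": 10.0,   # referenced by a live snapshot
    "s3://lake/events/data/b.parquet": 10.0,   # left behind by a failed write
    "s3://lake/events/data/c.parquet": 99.0,   # unreferenced but recent: maybe in flight
}
print(find_orphan_files(listing, {"s3://lake/events/data/a.parquet"}, older_than=50.0))
```

The cutoff is the design choice that matters: set it too aggressively and cleanup can delete files a concurrent writer is about to commit.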

LakeOps connects to your existing catalogs — Glue, Hive, REST catalogs (Polaris, Nessie, Gravitino, Lakekeeper), and S3 Tables — and adds the operational layer on top. It reads Iceberg metadata to surface table health metrics, identifies degraded tables, and runs autonomous maintenance on a purpose-built Rust engine — compaction, snapshot expiration, orphan cleanup, and manifest optimization.

The catalog handles metadata resolution and access control. The operational layer handles everything after — keeping every table in your lake healthy, compact, and query-ready. The two layers are complementary: pick whichever catalog fits your architecture, and layer operational management on top.

Choosing the right catalog

There is no single right answer. The choice depends on your constraints, your existing stack, and which trade-offs you can accept:

- All-AWS, zero ops → AWS Glue. Serverless. IAM-native. Accept the cloud lock-in.
- Multi-engine, multi-cloud, open standards → Apache Polaris. Full REST implementation with RBAC, credential vending, and federation. Use Snowflake Open Catalog if you prefer managed hosting.
- Data CI/CD with branch isolation → Project Nessie. The only option for Git-style branching over catalog metadata. Consider pairing with Polaris for access control.
- Databricks ecosystem → Unity Catalog. Deep platform integration, Predictive Optimization, AI/ML governance.
- Heterogeneous metadata federation → Apache Gravitino. Unifies Iceberg, Hive, RDBMS, Kafka, and file metadata under one API.
- Lightweight, Kubernetes-native → Lakekeeper. Single Rust binary, strong authz (OpenFGA + OPA), minimal dependencies.
- GCP-native → Google BigLake Metastore. Managed REST catalog with BigQuery federation.
- Legacy compatibility → Hive Metastore. If your engine matrix depends on it, migration has to be justified.
- Local development and CI → JDBC catalog (SQLite). Zero infrastructure, fully portable.

For many organizations, the realistic path is not choosing one catalog exclusively. You may run Glue in AWS for existing workloads, add Polaris for multi-engine access, or use Nessie for a development environment that needs branch isolation. The REST protocol makes this coexistence practical — and federation capabilities in Polaris, Unity, and Gravitino make it manageable.

The safest long-term bet is a REST-compatible implementation. If you start with REST, you can swap catalog backends later while keeping engine-side changes to a few connection properties, typically the catalog URI and credentials. That flexibility is worth more than any individual feature.
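To see how small the engine-side surface is, here is a sketch that builds Spark properties for an Iceberg REST catalog. The property keys (`spark.sql.catalog.<name>` set to `org.apache.iceberg.spark.SparkCatalog`, plus `.type=rest`, `.uri`, `.warehouse`, `.token`) follow Iceberg's Spark configuration; the catalog name, URLs, and warehouse value are hypothetical, and some catalogs need additional authentication properties (for example, request signing for Glue) that are not shown.

```python
from typing import Dict, Optional


def spark_rest_catalog_conf(name: str, uri: str, warehouse: Optional[str] = None,
                            token: Optional[str] = None) -> Dict[str, str]:
    """Spark properties that register an Iceberg REST catalog under `name`."""
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
    }
    if warehouse is not None:
        conf[f"{prefix}.warehouse"] = warehouse
    if token is not None:
        conf[f"{prefix}.token"] = token
    return conf


# Hypothetical endpoints for two different REST catalog servers.
before = spark_rest_catalog_conf("lake", "https://polaris.example.com/api/catalog", "prod")
after = spark_rest_catalog_conf("lake", "https://lakekeeper.example.com/catalog", "prod")
print(sorted(k for k in before if before[k] != after[k]))
```

Queries keep addressing tables as `lake.<namespace>.<table>` either way; only the connection properties move.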
