Back to blog

Apache Iceberg Multi-Engine Architecture: Spark, Trino, Snowflake, Athena on the Same Tables

How production Iceberg lakehouses run Spark, Trino, Snowflake, Athena, Flink, and DuckDB on the same tables — covering engine decoupling, write isolation, conflict resolution, catalog coordination, read path optimization, query routing, cross-engine governance, and the control plane that ties it together.

Jonathan Saring

Jonathan Saring

26 min read
Apache Iceberg Multi-Engine Architecture — Spark, Trino, Snowflake, and Athena on the same Iceberg tables

Running multiple engines on the same Iceberg tables is the promise of the open lakehouse. But without coordination, engines conflict, costs multiply, and nobody knows which engine is the right one for a given query. LakeOps is the control plane that makes multi-engine actually work — routing queries, coordinating maintenance, and providing unified observability across all engines.

This article covers the full architecture: why multi-engine is the natural state of every production lakehouse, how Iceberg enables engine decoupling at the format level, what each engine contributes, how write conflicts are handled, why the catalog is the coordination point, how query routing sends work to the right place, how governance stays consistent, and how LakeOps provides the control plane that ties it all together.

Why multi-engine is inevitable

Same Data, Different Engines — open lakehouse architecture
Same physical data on object storage. No duplication across engines. Each engine connects to the same Iceberg tables through a shared catalog — bringing different strengths to different workloads.

Before Iceberg, using multiple engines on the same data meant maintaining multiple copies. Spark wrote to HDFS in one format, Snowflake stored a copy in proprietary storage, Redshift compressed into its own columnar layout. Syncing pipelines between them were expensive, fragile, and always slightly behind. Teams chose one engine and lived with its limitations because the alternative — maintaining N copies of the same data — was worse.

Two forces make multi-engine inevitable in every production lakehouse: economics and workload diversity.

The economic case is stark. A point lookup that costs $0.01 on DuckDB costs $0.08 on Snowflake. A 10 TB daily ETL job on Spark spot instances costs 70–90% less than the same job on on-demand warehouse credits. An ad-hoc scan on Athena with good partition pruning costs pennies, while the same query on a permanently provisioned Trino cluster costs the cluster's hourly rate regardless of utilization. Across thousands of queries per day, routing work to the cheapest viable engine compounds into six-figure annual savings.

Workload diversity makes a single engine a bottleneck. No engine does everything well. Trino is poor at large-scale ETL writes. Spark is slow at interactive point lookups. DuckDB cannot distribute across nodes. Athena charges per byte scanned, which punishes compute-heavy joins. Snowflake provides warehouse ergonomics but at credit-based pricing that makes high-volume batch work prohibitively expensive. Flink handles streaming beautifully but is wrong for ad-hoc exploration. These are not marginal differences — they are orders of magnitude for mismatched workloads.

Every production Iceberg lakehouse we see runs at least three engines. The question is not whether you will run multiple engines. It is whether you manage them as a coherent architecture — with intelligent routing, coordinated maintenance, and unified observability — or as a collection of independent systems that happen to read the same files.

How Iceberg enables engine decoupling

Multi-engine access is not just about open file formats. CSV is open, but nobody runs a multi-engine architecture on CSV files. Iceberg enables true engine decoupling through three mechanisms that work together.

Format-level compatibility

Every engine reads the same physical artifacts: Parquet data files, manifest files (Avro), manifest lists (Avro), and metadata files (JSON). The spec defines exactly how these relate — the metadata file points to the manifest list, which points to manifests, which point to data files with per-file statistics. Any engine that implements this traversal sees the same table state. There is no proprietary encoding, no engine-specific optimization layer that locks you in.

Schema evolution, partition evolution, and sort order changes are all tracked in metadata. An engine that wrote the table with one schema can be replaced by a different engine that reads the evolved schema — the metadata carries the full history. Hidden partitioning means partition transforms are defined in metadata, not in directory naming conventions that vary by engine.

Catalog abstraction via the REST specification

The Iceberg REST Catalog specification defines a standard HTTP API for all catalog operations: listing namespaces, loading tables, committing updates, and managing credentials. Before REST, every engine needed a dedicated connector for every catalog — O(engines × catalogs) integration code. REST collapses this: implement the REST client once per engine, implement the REST server once per catalog, and everything interoperates.

The practical effect is that adding a new engine to your lakehouse requires zero catalog changes. Configure the engine with the REST endpoint, provide OAuth2 credentials, and it can immediately see every table. The catalog handles credential vending (short-lived, table-scoped storage tokens), server-side commit deconflicting, and multi-table commits — capabilities that are impossible with the old Thrift-based Hive Metastore approach.

In 2026, every major catalog implements or consumes the REST spec. Snowflake's Horizon Catalog runs on Apache Polaris (which graduated to an Apache top-level project in February 2026), enabling bidirectional read/write access from external engines via the standard REST endpoint. AWS Glue exposes a REST interface. Google BigLake Metastore provides a serverless REST catalog on GCP. Lakekeeper, Nessie, and self-hosted Polaris serve the same spec independently. For a deep comparison, see our Iceberg catalog comparison.

Snapshot isolation

Iceberg's snapshot-based architecture means readers and writers never block each other. A reader opens the table at a specific snapshot and sees a consistent view regardless of concurrent writes. A writer creates new data files and commits a new snapshot atomically. The catalog ensures that the pointer swap from old metadata to new metadata is atomic — if two writers race, exactly one wins and the other retries.

This isolation model means engines can operate completely independently. Spark can write a batch while Trino reads the previous snapshot. Flink can stream new rows while DuckDB queries a historical version. No coordination protocol between engines is required beyond the catalog's atomic commit — the format handles the rest.

Engine roles and strengths

Managed Iceberg lakehouse — engines and catalogs
Each engine in the ecosystem brings distinct capabilities — from Spark's complete write-path support to DuckDB's zero-infrastructure local analytics. The catalog layer connects them all.

Each engine has a distinct operational profile that determines where it belongs in the architecture.

Apache Spark — batch ETL, ML pipelines, table maintenance

Spark has the most complete Iceberg integration. It supports all write operations (INSERT, MERGE INTO, DELETE, UPDATE), all maintenance procedures (compaction, snapshot expiration, orphan cleanup, manifest optimization), schema and partition evolution, and branch/tag management. Spark is the only engine with full write-path support for every Iceberg operation.

Use Spark for: daily/hourly batch ingestion, complex multi-stage ETL, ML feature engineering pipelines, large-scale table rewrites, and any operation that requires distributed write coordination. Spark on spot instances provides the best cost profile for batch workloads that tolerate interruption.

Configuration for connecting Spark to an Iceberg REST catalog:

python
1from pyspark.sql import SparkSession2 3spark = SparkSession.builder \4    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog") \5    .config("spark.sql.catalog.lakehouse.type", "rest") \6    .config("spark.sql.catalog.lakehouse.uri", "https://catalog.example.com/api/v1") \7    .config("spark.sql.catalog.lakehouse.credential", "<client-id>:<client-secret>") \8    .config("spark.sql.catalog.lakehouse.warehouse", "s3://my-lakehouse/warehouse") \9    .config("spark.sql.catalog.lakehouse.header.X-Iceberg-Access-Delegation", "vended-credentials") \10    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \11    .getOrCreate()

For Snowflake-managed Iceberg tables accessed via Horizon Catalog, replace the URI with the Horizon REST endpoint (https://<account>.snowflakecomputing.com/polaris/api/catalog) and authenticate with a Snowflake personal access token. Horizon handles credential vending and governance enforcement — the Spark code stays the same.

Trino — interactive SQL, low-latency analytics, BI serving

Trino's stateless distributed architecture makes it the natural choice for interactive analytics. Workers scale independently, queries compile to efficient execution plans, and dynamic filtering pushes join predicates into Iceberg scans at runtime. Trino reads Iceberg metadata directly and leverages per-file statistics for aggressive file pruning.

Use Trino for: dashboard queries, ad-hoc SQL exploration, BI tool connections (Looker, Tableau, Superset), and any workload where sub-10-second response time matters. Trino handles complex joins on compacted data exceptionally well.

Trino catalog configuration for Iceberg REST:

properties
1# etc/catalog/lakehouse.properties2connector.name=iceberg3iceberg.catalog.type=rest4iceberg.rest-catalog.uri=https://catalog.example.com/api/v15iceberg.rest-catalog.security=OAUTH26iceberg.rest-catalog.oauth2.credential=<client-id>:<client-secret>7iceberg.table-statistics-enabled=true8iceberg.projection-pushdown-enabled=true9iceberg.query-partition-filter-required=true

Snowflake — governed analytics, semantic layers, BI connectivity

Snowflake reads and writes Iceberg tables through external volumes and catalog-linked databases. The Horizon Catalog — built on Apache Polaris — exposes Iceberg tables via the REST spec, allowing external engines to access Snowflake-managed tables. Bidirectional write support became generally available in May 2026, meaning external engines like Spark and Trino can now write to Snowflake-managed Iceberg tables through the standard REST protocol with full ACID consistency. Conversely, Snowflake can read tables managed by external catalogs (Polaris, Glue, Nessie) as a consumer via catalog-linked databases.

Use Snowflake for: governed BI workloads where warehouse ergonomics justify credit costs, semantic layer serving, workloads that benefit from Snowflake's role-based security model (including row-access policies and column masking enforced at the catalog layer), and teams that need zero-ops SQL without managing clusters. In a multi-engine architecture Snowflake's strength is governance, ease of use, and deep ecosystem integration with BI tools — not raw throughput. For organizations transitioning from single-vendor, see moving from Databricks/Snowflake to an open data platform.

Amazon Athena — serverless, pay-per-scan ad-hoc queries

Athena requires no infrastructure. You point it at a Glue catalog (or configure an Iceberg REST endpoint), submit SQL, and pay per byte scanned. There are no clusters to provision, no idle costs, and no capacity planning. Athena supports Iceberg v2 tables with full predicate pushdown and partition pruning.

Use Athena for: infrequent or unpredictable query volumes where per-query pricing beats reserved compute, ad-hoc exploration by analysts who do not want to manage infrastructure, and workloads where the scan volume is small (good partitioning, selective predicates). Athena becomes expensive for compute-heavy joins or frequent full-table scans — those belong on compute-priced engines.

Athena Iceberg table setup via AWS Glue:

sql
1CREATE TABLE analytics.events (2    event_id STRING,3    user_id BIGINT,4    event_type STRING,5    event_timestamp TIMESTAMP,6    properties MAP<STRING, STRING>7)8PARTITIONED BY (day(event_timestamp))9LOCATION 's3://my-lakehouse/warehouse/analytics/events'10TBLPROPERTIES (11    'table_type' = 'ICEBERG',12    'format' = 'PARQUET',13    'write_compression' = 'zstd'14);

Flink writes to Iceberg tables continuously using checkpoint-based commits. Each Flink checkpoint triggers an Iceberg commit, producing a new snapshot with the rows accumulated since the last checkpoint. This provides exactly-once semantics for streaming ingestion — if a checkpoint fails, uncommitted data is discarded and reprocessed from the Kafka offset.

Use Flink for: real-time data ingestion from Kafka or Kinesis, CDC pipelines that write change events to Iceberg tables, and any use case where data freshness matters (minutes, not hours). Flink's streaming writes produce many small files — a compaction layer is essential to prevent performance degradation for downstream readers.

DuckDB — single-node, in-process, zero-infrastructure analytics

DuckDB runs as an embedded library inside a Python process, a CLI session, or a notebook. It reads Iceberg tables directly from object storage (via the iceberg extension and httpfs or native S3 support) without any cluster or server process. For selective queries on well-compacted tables, DuckDB delivers sub-second response times with zero infrastructure cost.

Use DuckDB for: local data exploration during development, CI/CD data validation pipelines, selective queries on small-to-medium tables, and any scenario where spinning up a distributed cluster is overhead that is not justified by the query complexity. DuckDB cannot handle very large tables (hundreds of GB) efficiently because it is single-node — those queries belong on Trino or Spark.

Conflict and coordination challenges

The most dangerous operational pattern in a multi-engine lakehouse is uncoordinated concurrent writes. Understanding how Iceberg handles conflicts — and where it cannot — is essential for designing a stable architecture.

How Iceberg resolves write conflicts

Iceberg uses optimistic concurrency control. Each writer performs its work locally (writing new Parquet files to object storage), then attempts to commit by atomically swapping the metadata pointer in the catalog. The commit includes the writer's assumption about the current table state — specifically, which metadata version it read before starting the write.

If the current metadata has not changed since the writer started, the commit succeeds atomically. If another writer committed in the interim, the catalog rejects the commit with a CommitFailedException. The failed writer must then reload the current metadata, check whether its changes conflict with the intervening commit, and retry if they do not.

Conflict detection is operation-aware. Two appends to different partitions never conflict — they add disjoint sets of files. A compaction that rewrites files conflicts with a delete that removed one of those files. An append conflicts with a compaction only if the compaction's output includes the append's new files (which it cannot, since the append had not committed yet).

Conflict scenarios in multi-engine deployments

Spark ETL + Flink streaming on the same table. This is the most common multi-writer pattern. Flink appends new rows continuously. Spark runs periodic compaction or merge operations. If Spark compacts partition day=2026-06-15 while Flink is appending to the same partition, a conflict can occur at commit time. The mitigation: scope compaction to partitions that are no longer receiving streaming writes (e.g., WHERE day < current_date - 1).

Multiple Spark jobs writing to different partitions. This is safe. Iceberg's conflict detection recognizes that appends to disjoint partitions do not overlap. Both commits succeed without retry.

Spark maintenance + Trino reads. No conflict. Readers never block writers and writers never block readers. Trino reads the snapshot that was current when its query started, regardless of concurrent compaction or expiration operations.

Two engines writing to the same partition. This will conflict at commit time. One writer succeeds, the other retries. With proper retry configuration, the retry usually succeeds on the next attempt because the operations are compatible (both appends). But if both writers attempt MERGE operations on overlapping rows, the conflict is irreconcilable and one must fail.

Snowflake bidirectional writes + external engines. With Horizon Catalog now supporting bidirectional writes, an external Spark job writing to a Snowflake-managed Iceberg table while Snowflake itself writes to the same table creates a multi-writer scenario. The Horizon Catalog's server-side commit deconflicting handles the sequencing — but scoping engines to different partitions or tables remains the safest pattern.

Write isolation patterns

Preventing write conflicts is better than resolving them. Three patterns provide write isolation in multi-engine deployments.

Single writer principle. Designate one engine as the writer for each table, and all other engines are read-only consumers. Spark writes the batch ETL. Flink writes the streaming tables. Trino, Snowflake, Athena, and DuckDB only read. This eliminates all write conflicts by design.

Partition-level separation. When multiple engines must write to the same table, assign each engine exclusive write access to specific partitions. Flink writes to today's partition. Spark compacts yesterday's and earlier partitions. Because Iceberg's conflict detection is partition-aware, writes to different partitions never conflict.

Branch-based isolation. For workloads that require isolated experimentation — testing schema changes, validating backfill jobs, ML feature engineering — Iceberg table-level branches or catalog-level branches (via Nessie) provide full write isolation. Each branch maintains an independent metadata history. Changes are merged when validation passes.

Configuration for conflict resilience

Configure retry behavior to handle transient conflicts gracefully:

properties
1# Spark retry configuration for Iceberg commits2spark.sql.catalog.lakehouse.commit.retry.num-retries=43spark.sql.catalog.lakehouse.commit.retry.min-wait-ms=1004spark.sql.catalog.lakehouse.commit.retry.max-wait-ms=600005spark.sql.catalog.lakehouse.commit.retry.total-timeout-ms=1800000

For Flink streaming writers, the checkpoint interval determines the conflict window. Shorter checkpoints (30 seconds) mean smaller batches and faster retries. Longer checkpoints (5 minutes) accumulate more data per commit but have wider conflict windows with concurrent maintenance.

This is exactly where a coordination layer becomes critical. Without centralized awareness of which engines are writing to which partitions, and when maintenance is scheduled, teams resort to tribal knowledge and ad-hoc scheduling. The result is conflicts at 3 AM, failed pipelines, and data freshness regressions. Coordinated maintenance — where compaction and cleanup are scheduled around active write patterns across every engine — eliminates this entire class of operational failure.

The routing layer: sending queries to the right engine

Having multiple engines is not valuable unless queries land on the right one. Without routing, every team picks the engine they know — regardless of whether it is the cheapest or fastest for the query at hand. A heavy ETL job runs on the interactive Trino cluster because the data engineer has Trino credentials. A point lookup pays Snowflake credit pricing because the analyst's BI tool is configured for Snowflake. Teams report that 40–60% of their multi-engine cost savings come from routing alone — simply sending each query to the cheapest viable engine.

Routing architecture

A SQL routing proxy sits between clients and engines. It accepts connections on standard protocols (Trino HTTP, PostgreSQL wire, MySQL wire, Arrow Flight SQL), evaluates routing rules against the query, selects the appropriate engine pool, translates the SQL dialect if needed, and dispatches. To the client, it looks like a regular database connection. The routing decision is transparent.

The routing decision happens in two stages. First, routing rules evaluate the query (protocol, SQL text, client tags, headers, or custom logic) and select a cluster group — a named pool of engines configured for a specific workload type. Second, within that group, a selection strategy (round-robin, least-loaded, failover, or weighted) picks the specific engine instance that is healthy, enabled, and under its concurrency limit.

QueryFlux: the multi-engine routing layer

LakeOps provides QueryFlux as the multi-engine routing layer — an open-source, Rust-based SQL proxy built for exactly this problem. QueryFlux accepts connections on Trino HTTP, PostgreSQL wire, MySQL wire, and Arrow Flight SQL protocols. It evaluates routing rules, selects the optimal engine, translates the SQL dialect via sqlglot, and dispatches — all with ~0.35ms p50 proxy overhead.

Six router types are available: protocol-based (route by connection type), header-based (inspect HTTP headers), query regex (match SQL patterns), client tags (route by session metadata), compound conditions (AND/OR logic across multiple signals), and Python scripts for fully custom routing logic. First match wins.

QueryFlux configuration defines clusters, groups, and routing rules:

yaml
1clusters:2  trino-prod:3    engine: trino4    endpoint: http://trino-coordinator:80805  spark-batch:6    engine: spark7    endpoint: http://spark-thrift:100018  duckdb-local:9    engine: duckDb10    databasePath: /data/lakehouse.duckdb11  athena-adhoc:12    engine: athena13    region: us-east-114    workgroup: analytics15 16clusterGroups:17  interactive:18    members: [trino-prod, duckdb-local]19    maxRunningQueries: 20020    strategy:21      type: leastLoaded22  batch:23    members: [spark-batch, athena-adhoc]24    maxRunningQueries: 5025    strategy:26      type: costOptimized27  governed:28    members: [trino-prod]29    maxRunningQueries: 10030 31routers:32  - type: queryRegex33    rules:34      - regex: "^INSERT|^MERGE|^CREATE|^ALTER"35        targetGroup: batch36  - type: headerBased37    rules:38      - header: X-Workload-Type39        value: bi-dashboard40        targetGroup: governed41  - type: protocolBased42    postgresWire: interactive43    trinoHttp: interactive44 45routingFallback: interactive

Routing strategies

Three strategies apply to different objectives:

Cost-optimized routing sends each query to the cheapest engine that can execute it within acceptable latency bounds. CPU-heavy joins go to compute-priced engines (Trino, where CPU-seconds are cheap). Scan-heavy reads go to scan-priced engines (Athena, where selective queries with good partitioning pay very little). This maximizes savings for batch workloads, scheduled reports, and background processing.

Latency-optimized routing sends each query to the fastest engine for the given query pattern. Uses historical P50 latency per engine for similar query shapes. Interactive dashboards, BI tools, and AI agent queries that need near-instant responses use latency routing.

Throughput-balanced routing distributes queries across engines to maximize total throughput and prevent bottlenecks. Considers current engine utilization and remaining capacity. Mixed workloads with varying patterns use throughput routing.

In practice, you assign a strategy per workload group rather than globally. Analytics dashboards get latency routing on a Trino + DuckDB pool. ETL pipelines get cost routing on a Spark + Athena pool. BI gets governed routing on Snowflake with Trino fallback.

Routing endpoints

Named, workload-scoped endpoints give each consumer a stable connection point. An analytics endpoint routes SELECT and AGGREGATE queries to a Trino + DuckDB pool at high priority. An ETL endpoint routes INSERT and MERGE to a Spark + Athena pool. A BI endpoint routes to Snowflake + Trino with governed access. A reports endpoint routes to Snowflake + ClickHouse for scheduled batch exports.

Each endpoint has its own URL, engine pool, concurrency limits, priority level, and fallback rules. Applications connect to their endpoint and inherit all routing, translation, and failover behavior. Adding or removing engines from a pool requires no client changes — the endpoint abstraction decouples consumers from the backend topology.

Dialect translation

Multi-engine routing requires SQL compatibility across dialects. A client connected via PostgreSQL wire sending Trino-flavored SQL to a DuckDB backend will fail without translation. QueryFlux handles this with sqlglot-backed automatic dialect conversion across 30+ dialects. The SQL proxy detects the source dialect from the frontend protocol, detects the target dialect from the engine type of the selected cluster, and translates the AST automatically. Compatible dialects (e.g., MySQL → StarRocks) skip translation entirely.

Common translation cases include catalog-qualified names (Trino's three-part catalog.schema.table stripped to schema.table for Athena), function mapping (date_diff in Trino vs datediff in DuckDB), type casting variations, and window function syntax differences. For edge cases, custom Python transform scripts run after sqlglot translation. The result: clients write SQL in whatever dialect they know, and the proxy handles the rest. For the full routing architecture and dialect translation details, see routing multiple query engines with Iceberg.

Governance across engines

The hardest operational challenge in multi-engine architectures is not performance — it is governance. When six engines access the same tables, ensuring consistent access control, audit logging, and data masking across all of them requires deliberate design.

The consistency problem

Each engine has its own security model. Trino uses its own authentication and authorization plugins. Spark relies on the underlying catalog's permissions or Ranger policies. Snowflake enforces RBAC through its own role hierarchy. Athena delegates to IAM and Lake Formation. DuckDB has no built-in access control. If you configure permissions independently in each engine, they will inevitably diverge — a user blocked in Snowflake can access the same data through Trino.

Catalog-level governance

The solution is to enforce access control at the catalog layer, not the engine layer. When the catalog controls who can access each table — via RBAC, credential vending, and scoped tokens — engines inherit those restrictions regardless of their own security model. A REST catalog with credential vending only issues storage tokens for tables the authenticated principal is authorized to access. Even if Trino has no access-control plugin configured, the principal cannot read data that the catalog refuses to vend credentials for.

Snowflake's Horizon Catalog enforces row-access policies and column masking for external engines connecting via the REST spec. As of mid-2026, this enforcement is GA for Spark (via the Snowflake Connector v3.1.6+) and in preview for other engines via the Iceberg REST Scan Plan API, which applies governance policies server-side so fine-grained protections travel across compatible engines. Apache Polaris provides built-in RBAC with namespace-level and table-level grants. Lakekeeper uses OpenFGA for policy decisions with OPA integration for Trino-specific rule enforcement. For a comparison of governance capabilities across catalogs, see catalog comparison.

Audit across engines

Audit logging is equally critical. Every engine access should produce an audit trail that answers: who queried what table, when, from which engine, and how much data was scanned. The catalog is the natural collection point — every metadata operation passes through it. REST catalogs that emit CloudEvents (Lakekeeper) or persist metrics (Polaris 1.4+) provide this without instrumenting each engine independently.

But catalog-level audit only captures metadata operations. Query-level audit — what SQL was executed, how long it took, how much it cost, which engine ran it — requires cross-engine observability. This is where a routing layer with unified telemetry becomes essential: every query passes through the proxy, and the proxy records the full lifecycle regardless of which engine ultimately executed it.

The unified control plane: LakeOps

Modern lakehouse with LakeOps control plane
LakeOps as the multi-engine control plane — Spark, Trino, Flink, Snowflake, Athena, and DuckDB all connected through a unified management layer that coordinates maintenance, routing, and optimization with visibility across every engine.
LakeOps platform walkthrough — multi-engine catalog connection and query routing in action.

Running multiple engines on shared Iceberg tables creates operational challenges that no single engine can solve alone. Each engine sees only its own queries. No engine has visibility into how other engines access the same tables. Maintenance decisions made by one engine (sort orders, compaction schedules, file sizes) affect every other engine's performance — but there is no shared context to make those decisions well.

LakeOps is a dedicated lakehouse control plane that connects to your existing catalogs and all your engines simultaneously. It provides the cross-engine intelligence layer that multi-engine architectures require.

Engine management from one UI

LakeOps provides a single console to connect all engines, monitor health, and compare performance side by side. Register Spark clusters, Trino coordinators, Snowflake warehouses, Athena workgroups, Flink deployments, and DuckDB instances. Each engine shows status (Active, Inactive, Maintenance), query success rate, average runtime, cost per query, total queries, and data scanned. Trigger maintenance mode on any engine and it is automatically excluded from routing — in-flight queries fail over to remaining healthy members.

The engine comparison view surfaces which engine performs best for each query shape on your actual workloads. These numbers drive routing rules: if Trino handles a join at $0.03 in 1.8 seconds while Snowflake runs the same join at $0.08 in 2.1 seconds, the routing layer knows where to send it.

Cross-engine query telemetry

LakeOps collects query patterns from every connected engine — Trino, Spark, Flink, Athena, Snowflake, and DuckDB. It tracks which columns appear in WHERE, JOIN, and GROUP BY clauses per table, across all engines. This unified observability is impossible to achieve from within any single engine.

The telemetry feeds three optimization loops:

Sort order optimization. When Trino queries filter on customer_id but Athena queries filter on event_date, LakeOps evaluates whether a composite sort, Z-order, or primary/secondary sort best serves the combined workload. The decision reflects all engines, not just the loudest one.

Routing improvement. Historical latency and cost data per query shape per engine feeds back into routing decisions. The system learns which engine performs best for each recurring query template — routing improves automatically without manual rule updates.

Per-engine cost attribution. Cross-engine cost telemetry shows total spend per engine, cost per query per engine, and spend by workload group. Platform teams see exactly where their money goes and which routing changes would have the biggest impact. Teams using LakeOps report 40–60% cost savings from routing alone — sending queries to the cheapest viable engine eliminates the casual overspend that happens when every team defaults to the engine they know.

Coordinated maintenance across engines

The most insidious problem in multi-engine deployments is uncoordinated maintenance. A cron-scheduled Spark compaction job that runs at midnight does not know that Flink is actively writing to the same partitions. A manual orphan cleanup does not know that a Trino query started 20 minutes ago is still referencing snapshots that are about to be expired.

LakeOps runs the full maintenance pipeline — compaction, snapshot expiration, orphan cleanup, manifest optimization — on its own Rust-based execution engine, completely independent of your query engines. Maintenance does not consume Trino cluster resources, does not compete with Spark ETL for executor slots, and does not require provisioning dedicated maintenance clusters.

The maintenance pipeline is coordinated with write activity across all engines. LakeOps observes Flink's commit cadence and schedules compaction during checkpoint gaps. It detects Spark ETL completion events and triggers compaction on freshly written partitions. It avoids compacting partitions that are actively receiving streaming writes. Compaction and maintenance never conflicts with any engine's active writes. This multi-engine write awareness prevents the conflicts that plague cron-scheduled maintenance.

The Rust execution engine processes Parquet data through Arrow columnar buffers with bounded memory and lock-free parallelism. Binpack compaction runs 86% faster than Spark in production benchmarks — no JVM overhead, no GC pauses, no warm-up time.

Table health across the fleet

Every table in the lake is continuously scored — Critical, Warning, or Healthy — based on structural metrics: file count, average file size, delete file ratio, manifest depth, snapshot accumulation, and partition skew. The health classification reflects how the table's current physical state affects query performance across all engines.

A table scored Critical has degraded beyond the point where any engine can query it efficiently. A Warning-level table is functional but losing performance. The routing layer integrates with this health data: when a table is fragmented, queries on that table route to engines that handle degraded state better — Trino over DuckDB, for instance — until compaction resolves the issue. As tables are compacted and sorted, more engines become eligible for each query shape, and routing adapts automatically.

Putting it together: the five-layer architecture

A production multi-engine Iceberg architecture has five layers:

1. Storage layer. Object storage (S3, GCS, ADLS) holding Parquet data files and Iceberg metadata. The physical substrate. All engines read from and write to this layer.

2. Catalog layer. A REST catalog (Polaris, Horizon, Nessie, Lakekeeper, Glue REST endpoint) that resolves metadata, coordinates commits, vends credentials, and enforces access control. The single coordination point for all engines.

3. Engine layer. Multiple compute engines, each handling its workload type. Spark for batch writes, Trino for interactive reads, Flink for streaming, Snowflake for governed BI, Athena for serverless ad-hoc, DuckDB for local development.

4. Routing layer. A SQL proxy (QueryFlux) that dispatches queries to the right engine based on workload type, cost model, engine health, and table state. Provides stable, workload-scoped endpoints for clients regardless of which engine ultimately executes the query.

5. Control plane. LakeOps as the cross-engine management layer — collecting telemetry from all engines, running autonomous maintenance, optimizing table layouts based on combined access patterns, coordinating maintenance with active writes across every engine, and adapting routing to table health.

This layered architecture separates concerns cleanly. The catalog handles identity and coordination. Engines handle compute. The routing layer handles dispatch and dialect translation. The control plane handles optimization and observability. Each layer can evolve independently — you can swap engines, change catalogs, or adjust routing without rebuilding the others.

Getting started with multi-engine

If you are running a single engine today and want to add multi-engine access incrementally, the path is straightforward.

1. Adopt a REST catalog. If you are on Hive Metastore, migrate to a REST-based catalog (Polaris, Lakekeeper, Horizon, or Glue REST endpoint). This is the prerequisite for clean multi-engine access. See the catalog comparison for guidance.

2. Add a read-only engine. Start with a second engine in read-only mode. Trino for interactive queries if you currently only have Spark. DuckDB for local development. Athena for serverless ad-hoc. No write conflicts are possible when the new engine only reads.

3. Deploy routing. Once you have two or more engines, deploy QueryFlux as a routing proxy. Start with protocol-based routing (PostgreSQL wire → Trino, Trino HTTP → Trino). Add query-type routing as patterns emerge (DML → Spark, SELECT → Trino). See the multi-engine routing solution for deployment patterns.

4. Add governance. Ensure that catalog-level access control applies to all engines. If using Polaris or Horizon, define roles and grants at the namespace level. If using Lakekeeper, configure OpenFGA policies. Verify that a principal blocked in the catalog is blocked across every engine.

5. Connect LakeOps. Attach LakeOps to your catalog and engines. Enable cross-engine telemetry collection. Let the maintenance pipeline take over compaction, expiration, and cleanup — coordinated with every engine's write activity. Enable table-health-aware routing. The system will surface which tables need attention, which sort orders should change based on real access patterns, and which routing adjustments would reduce costs — all based on real workload data across every engine.

Over time, the architecture stabilizes. Query routing automatically directs workloads to the most cost-effective engine. Maintenance keeps tables compact and sorted for all consumers. Governance ensures consistent access control. The platform team manages configuration and policies — not individual engine operations.

Conclusion

Multi-engine is not a nice-to-have in production Iceberg lakehouses — it is the natural state. The table format was designed for it. The REST catalog enables it. The economics demand it. But multi-engine without coordination is just multi-engine chaos: conflicting writes, duplicated costs, inconsistent governance, and no visibility into which engine is the right one for a given query.

The architectural requirements are clear: a REST catalog as the coordination point, write isolation patterns that prevent conflicts, a routing layer that dispatches queries intelligently based on cost, latency, and workload shape, dialect translation that makes engine choice transparent to clients, governance that stays consistent across every access path, and maintenance that is coordinated with active writes across every engine. LakeOps provides the control plane that ties these requirements together — cross-engine telemetry, autonomous maintenance, QueryFlux routing with workload-scoped endpoints, per-engine cost attribution, and table health management — so your platform team focuses on architecture decisions rather than operational firefighting.

For more on specific aspects of multi-engine architecture: query routing patterns, catalog selection, moving from single-vendor to open, and the multi-engine routing solution.

Related articles

Found this useful? Share it with your team.