
Trino has become one of the most popular engines for querying Apache Iceberg tables in production lakehouses. Its stateless, distributed SQL architecture makes it a natural fit for interactive analytics, ad-hoc exploration, and BI workloads over open table formats. But getting fast, cost-efficient queries out of Trino on Iceberg is not automatic — it requires understanding how Trino plans scans against Iceberg metadata, how file layout affects pruning, and where Trino-native maintenance falls short at scale.
This guide covers the full optimization surface: how Trino resolves Iceberg metadata and plans queries, the configuration knobs that matter, Trino-native maintenance procedures and their limitations, physical layout strategies, and how a dedicated control plane like LakeOps closes the gaps that Trino cannot address on its own. For broader context on Iceberg performance tuning across engines, see our Iceberg lake analytics optimization guide.
How Trino queries Iceberg tables
Understanding what happens between SELECT and the first byte of results is essential for optimization. Trino's Iceberg connector follows a multi-stage process to translate SQL into efficient object storage reads.
Metadata resolution
When a query references an Iceberg table, Trino contacts the Iceberg catalog (Hive Metastore, AWS Glue, Nessie, or Iceberg REST) to locate the current metadata pointer. It reads the latest metadata.json file, which references the current snapshot. From the snapshot, Trino walks the manifest list to identify which manifest files describe the table's data files. Each manifest contains per-file metadata: file path, partition values, column-level min/max statistics, row counts, and file size.
This metadata walk is the foundation of Iceberg's query planning advantage over Hive-style tables. Instead of listing directories in object storage (which is slow and eventually consistent on S3), Trino reads a structured metadata tree that tells it exactly which files could contain matching rows.
Scan planning and predicate pushdown
Once Trino has the manifest entries, the coordinator evaluates query predicates against the metadata to eliminate files before any data is read. This happens in three layers:
- 1.Partition pruning — if the table is partitioned and the query filters on a partition column, Trino eliminates entire partitions by comparing the predicate against partition values in the manifest. A query filtering
WHERE event_date = '2026-05-01'on a day-partitioned table skips every partition except that date. - 2.Min/max file pruning — for each surviving partition, Trino checks per-file column statistics. If a file's max value for
user_idis 500 and the query filtersWHERE user_id > 1000, that file is skipped entirely. This works on any column with statistics, not just partition columns. - 3.Row group skipping — within individual Parquet files, Trino uses row group (page) statistics to skip blocks of rows that cannot match the predicate. This is the finest-grained pruning level and depends on data being physically sorted so that value ranges within row groups are tight.
The effectiveness of these layers compounds. Partition pruning might eliminate 95% of files. Min/max pruning removes another 80% of what remains. Row group skipping reduces the bytes read within each surviving file. The combined effect can be orders of magnitude less I/O than a full scan — but only if the table's physical layout cooperates. Dedicated control planes like LakeOps can optimize this layout automatically based on real query patterns — we'll cover how later in this guide.
Query performance tuning
Trino's Iceberg connector exposes several mechanisms for reducing scan volume and query latency. Most of them depend on metadata quality and file layout rather than Trino configuration alone.
Partition pruning
Partition pruning is the highest-impact optimization. When a query filters on a partition column, Trino can skip entire sets of files without reading any data. Iceberg supports hidden partitioning transforms — day(ts), month(ts), bucket(id, 16), truncate(region, 3) — which means partitioning does not require users to add synthetic columns to their data.
To verify that partition pruning is active, use EXPLAIN and check for partitionConstraint in the output:
1EXPLAIN SELECT * FROM lakehouse.events2WHERE event_date = DATE '2026-05-01';If the constraint appears, Trino is pruning at the partition level. If it does not, the predicate may not align with the partition spec — a common issue when timestamp filters do not match the partition transform granularity. For detailed partition design guidance, see our Iceberg partitioning best practices guide.
Min/max column statistics and file pruning
Beyond partitions, Trino uses per-file column statistics stored in Iceberg manifests to skip files whose value ranges do not overlap with query predicates. This is sometimes called "data skipping" or "file pruning" and it works on any column, not just partition columns.
The effectiveness depends on how well the data is physically sorted. If a column is used frequently in WHERE clauses but the data is written in arrival order, the min/max ranges per file will span nearly the entire domain, and no files can be pruned. Sorting the data on that column during compaction tightens the ranges dramatically — each file covers a narrow slice of the value space, and most files can be skipped.
Consider a transactions table with 10,000 files. Without sorting on account_id, a query filtering WHERE account_id = 42 may need to read all 10,000 files because every file's min/max range includes 42. After sorting on account_id, the same query might read only 3 files — a 3,000x reduction in I/O.
Row group skipping
Parquet files are internally organized into row groups, each with its own column statistics. When Trino reads a file, it checks row group stats before decompressing data. If a row group's statistics show that no rows can match the predicate, Trino skips that row group entirely.
Row group skipping is most effective when the data within each file is sorted on the filtered column. Unsorted files have row groups with wide, overlapping value ranges — sorting produces row groups with tight, non-overlapping ranges. The Parquet writer's row-group-size (default 128 MB in most configurations) interacts with sort order to determine how many row groups exist per file and how selective they can be.
Dynamic filtering
Trino's dynamic filtering feature collects filter values from the build side of a join and pushes them into the probe-side scan at runtime. For Iceberg tables, this means that a join on a high-cardinality column can reduce the scan on the probe side to only the files and row groups that contain matching values — even though the original query did not have an explicit filter on that column.
Enable dynamic filtering in Trino's configuration:
1# config.properties2enable-dynamic-filtering=true3dynamic-filtering.large.max-distinct-values-per-driver=100004dynamic-filtering.large.range-row-limit-per-driver=100000Dynamic filtering is particularly effective when joining a large fact table against a small dimension table. The dimension-side values are collected during the build phase and pushed down as predicates into the Iceberg scan, potentially eliminating the majority of fact-table files.
Trino-side configuration
Beyond scan planning, several Trino configuration parameters directly affect query performance on Iceberg tables.
Worker memory and split sizes
Trino divides each query into splits — units of work assigned to workers. For Iceberg, each split typically corresponds to a file or a range within a file. The split size and worker memory determine how many splits a worker can process concurrently.
1# config.properties2query.max-memory-per-node=16GB3query.max-total-memory-per-node=20GB4node-scheduler.max-splits-per-node=256Smaller splits increase parallelism but add scheduling overhead. Larger splits reduce overhead but may cause memory pressure on workers with limited RAM. For most Iceberg workloads, 16 GB per-node memory with reasonable split concurrency provides a good balance. Tune node-scheduler.max-splits-per-node based on your worker count and object storage throughput.
Task concurrency
The task.concurrency property controls how many threads each worker uses for processing splits. Higher values increase throughput for I/O-bound scans but can cause contention for CPU-bound operations like aggregations.
1# config.properties2task.concurrency=163task.writer-count=4For scan-heavy Iceberg workloads reading from S3, increasing task.concurrency to 16–32 often improves throughput because workers spend much of their time waiting for object storage responses.
Iceberg-specific connector properties
The Iceberg connector itself has tuning knobs that affect metadata resolution and scan planning:
1# iceberg.properties2iceberg.table-statistics-enabled=true3iceberg.projection-pushdown-enabled=true4iceberg.query-partition-filter-required=true5iceberg.dynamic-filtering.wait-timeout=1sTable statistics feed Trino's cost-based optimizer for better join ordering and aggregation strategies. Projection pushdown eliminates unnecessary columns early in the scan. Requiring partition filters prevents accidental full-table scans on partitioned tables — a common source of runaway queries in production. Dynamic filtering wait timeout controls how long the scan side waits for the build-side filter in join queries.
Table maintenance with Trino
Trino provides built-in procedures for Iceberg table maintenance. These are useful for small-to-medium deployments but come with significant limitations at scale.
Compaction — rewriting data files
Trino's Iceberg connector supports the optimize statement for compacting small files into larger ones. This is Trino's native compaction mechanism:
1-- Compact files with a size threshold (Trino 4xx+)2ALTER TABLE lakehouse.analytics.events EXECUTE optimize3 WHERE event_date > CURRENT_DATE - INTERVAL '7' DAY;4 5-- With explicit file size threshold6ALTER TABLE lakehouse.analytics.events7 EXECUTE optimize(file_size_threshold => '128MB')8 WHERE event_date > CURRENT_DATE - INTERVAL '7' DAY;The optimize statement supports partition-level filtering with WHERE, allowing you to target recent partitions without scanning the entire table. The file_size_threshold parameter controls which files are eligible for compaction — files smaller than this value are candidates for merging.
Snapshot expiration
Iceberg retains snapshots for time travel and rollback. Over time, old snapshots accumulate metadata that slows down planning. Trino can expire snapshots using the expire_snapshots procedure:
1ALTER TABLE lakehouse.analytics.events2 EXECUTE expire_snapshots(retention_threshold => '7d');This removes snapshot metadata older than the retention threshold and marks the associated data files as candidates for cleanup — but does not delete the files themselves.
Orphan file removal
After snapshot expiration, data files that are no longer referenced by any snapshot become orphans. These files consume storage but serve no purpose. Trino can remove them:
1ALTER TABLE lakehouse.analytics.events2 EXECUTE remove_orphan_files(retention_threshold => '3d');The retention_threshold is a safety measure to avoid deleting files from in-progress writes. Setting it to at least 3 days is common practice.
Manifest optimization
Over time, tables accumulate many small manifest files — especially under frequent streaming or micro-batch ingestion. Trino's optimize_manifests procedure rewrites manifests to cluster them by partition, improving scan planning performance:
1ALTER TABLE lakehouse.analytics.events2 EXECUTE optimize_manifests;This is particularly useful when queries include partition filters but manifest files are not grouped by partition — the optimizer must read every manifest to determine which partitions they cover.
Limitations of Trino-native maintenance
Running maintenance through Trino works for small deployments. At scale — hundreds of tables, mixed streaming and batch ingestion, multiple concurrent engines — Trino's built-in procedures become a bottleneck.
JVM overhead. Trino runs on the JVM. Compaction is a memory-intensive, I/O-heavy operation that competes with query workloads for heap space and garbage collection cycles. Running large compaction jobs on the same Trino cluster that serves interactive queries degrades both workloads. Dedicating a separate Trino cluster for maintenance solves contention but doubles infrastructure cost — a significant factor in Iceberg lakehouse cost optimization.
No cross-engine awareness. Trino only sees queries that Trino executes. If Spark, Flink, or Athena also query the same tables, Trino has no visibility into their access patterns. Sort order decisions based only on Trino query patterns may be suboptimal for the overall workload. Similarly, a compaction job triggered by Trino may conflict with a Spark write that just started — Trino has no mechanism to coordinate across engines.
No coordinated sequencing. The three maintenance operations — snapshot expiration, orphan cleanup, compaction — need to run in a specific order. Expiring snapshots first frees files that are no longer needed, so compaction does not waste time rewriting files that will be deleted. Trino provides the individual procedures but no orchestration layer to sequence them correctly. Teams end up building custom cron jobs or Airflow DAGs to coordinate, adding operational overhead.
No event-driven triggers. Trino maintenance is schedule-driven — you run optimize when a cron job fires. There is no mechanism to detect that a table has accumulated enough small files or delete files to warrant compaction. Cron-based scheduling either wastes compute on tables that do not need maintenance or misses tables that need it urgently. For a deep dive into how event-driven compaction works at scale, see efficient lakehouse compaction at scale.
No per-table tuning at lake scale. Each optimize call requires explicit parameters. Across hundreds of tables with different file size targets, partition schemes, and SLA requirements, maintaining per-table configuration in scheduler scripts becomes unwieldy.

Physical layout optimization
The physical layout of data within Iceberg files — how rows are sorted across and within files — is the single most impactful optimization for Trino query performance. Layout determines whether min/max statistics enable file pruning or are useless, whether row group skipping works or does not, and how much data Trino must read to answer a query.
Sort orders
Iceberg tables support a sorted_by table property that determines how data is arranged within partitions. When data is compacted, the compaction engine re-sorts files according to this order.
For Trino workloads where queries predominantly filter on one column (e.g., customer_id), a single-column sort order is optimal:
1ALTER TABLE lakehouse.analytics.events2 SET PROPERTIES sorted_by = ARRAY['customer_id'];After setting the sort order, the next compaction pass will physically re-sort the data. Each file will cover a contiguous range of customer_id values, and min/max statistics will reflect tight, non-overlapping ranges. A query filtering on customer_id will prune the vast majority of files.
Multi-column sort orders
When queries filter on multiple columns, a composite sort order prioritizes the most selective column first:
1ALTER TABLE lakehouse.analytics.events2 SET PROPERTIES sorted_by = ARRAY['customer_id', 'event_type'];The sort hierarchy matters — filtering on customer_id alone still benefits from pruning. Filtering on event_type alone does not, because the secondary sort key is only ordered within each primary key group. Choose the column that appears in the most queries as the primary sort key.
Z-order for multi-dimensional queries
When queries filter on multiple columns with roughly equal selectivity, Z-ordering interleaves the sort dimensions so that filtering on any subset of the Z-ordered columns produces spatial locality. This is useful when there is no single dominant filter column — for example, tables queried by both region and timestamp in different dashboards.
Z-order trades optimal single-column pruning for good multi-column pruning. If one column dominates 90% of queries, a linear sort on that column will outperform Z-order. If two or three columns each appear in 30–40% of queries, Z-order is the better choice.
Choosing the right strategy
The optimal layout depends on actual query patterns — which columns appear in WHERE, JOIN, and GROUP BY clauses, and how frequently. In practice, most teams set a sort order at table creation and never revisit it, even as query patterns evolve. The gap between the current layout and the optimal layout grows silently, manifesting as gradual query slowdowns that are hard to diagnose because the data volume is also growing.
This is where production telemetry becomes essential. Without visibility into how each table is actually queried — across all engines, not just Trino — layout decisions are guesswork.
How LakeOps enhances Trino-based Iceberg lakehouses
LakeOps is a dedicated lakehouse control plane built in Rust on Apache DataFusion. It connects to your existing Iceberg catalogs and query engines — including Trino — without code changes, infrastructure provisioning, or data movement. For Trino-centric lakehouses, it addresses every maintenance limitation outlined above while adding capabilities no engine-embedded approach can provide.

Key capabilities for Trino-based lakehouses:
- Cross-engine query telemetry — collects query patterns from Trino, Spark, Flink, Athena, Snowflake, and DuckDB. Tracks which columns appear in WHERE, JOIN, and GROUP BY clauses per table. Sort orders reflect how Trino actually queries tables — not how a data engineer guessed six months ago.
- Query-aware sort optimization — evaluates single-column sort, multi-column sort, and Z-order strategies independently per table. Selects the approach that maximizes data skipping for the actual query mix from all engines, not a global default.
- Layout simulations — tests proposed sort changes on a real Iceberg branch, replaying production queries and comparing scan reduction before modifying any production data.
- Autonomous maintenance off-cluster — runs compaction, snapshot expiration, orphan cleanup, and manifest optimization on its own Rust engine, not on your Trino cluster. Interactive query performance is unaffected by background maintenance.
- Rust execution engine — Parquet data processed through Arrow columnar buffers with bounded memory and lock-free parallelism. Binpack compaction 86% faster than Spark in production benchmarks — no JVM overhead, no GC pauses.
- Coordinated maintenance pipeline — snapshot expiration → orphan cleanup → compaction → manifest optimization, sequenced per table. Triggered by structural signals (file count, delete-file ratio, manifest depth), not cron schedules.
- Lake-wide health classification — every table scored as Critical, Warning, or Healthy. Severity-ranked Insights surface issues (excessive manifests, partition skew, small files) before Trino queries degrade. See table health maintenance and observability.
- Multi-engine conflict avoidance — maintenance scheduling respects active writes from Flink, Spark, and other engines, preventing conflicts when Trino's
optimizewould collide with concurrent operations.




Best practices checklist
Use this checklist to systematically optimize your Trino-on-Iceberg deployment:
Query planning and pruning:
- Partition tables on the highest-cardinality column used in
WHEREclauses (typically a date or timestamp) - Verify partition pruning is active with
EXPLAINfor your most frequent queries - Ensure Iceberg column statistics are enabled and up-to-date — stale or missing stats disable file pruning
- Sort data on the columns most frequently used in predicates to tighten min/max ranges
- Choose between single-column sort (dominant filter) and Z-order (multiple roughly-equal filters) based on actual query patterns
Trino configuration:
- Enable dynamic filtering for join-heavy workloads
- Set
task.concurrencyto 16–32 for I/O-bound S3 scans - Enable
iceberg.table-statistics-enabledandiceberg.projection-pushdown-enabledfor cost-based optimization - Set
iceberg.query-partition-filter-required=trueto prevent accidental full-table scans - Size worker memory to support your target split concurrency — 16 GB per node with
node-scheduler.max-splits-per-node=256is a solid baseline - Monitor query plans for full-table scans that bypass pruning
Table maintenance:
- Run snapshot expiration before compaction — free unreferenced files first so compaction does not rewrite data that will be deleted
- Run orphan cleanup after snapshot expiration to reclaim storage from dereferenced files
- Target file sizes of 128–256 MB for point-lookup tables, 256–512 MB for full-scan analytics
- Do not run
optimizeon the same Trino cluster serving interactive queries — use a dedicated cluster or offload to a purpose-built engine - Revisit sort orders quarterly, or use telemetry-driven tooling to detect when access patterns have shifted
At scale — use a dedicated control plane:
- Replace cron-scheduled maintenance with event-driven triggers based on table structural signals
- Use a dedicated maintenance engine (like LakeOps) to avoid JVM overhead and query contention — maintenance should not compete with Trino for cluster resources
- Collect query telemetry across all engines — not just Trino — to inform sort order decisions. A control plane that sees Spark, Flink, Athena, and Trino traffic makes layout decisions no single engine can
- Classify table health lake-wide (Critical → Warning → Healthy) and address the most degraded tables first
- Automate the full maintenance pipeline — snapshot expiration, orphan cleanup, compaction, manifest optimization — as a coordinated sequence, not independent cron jobs
Conclusion
Trino is a powerful query engine for Apache Iceberg lakehouses, but getting the best performance requires work beyond writing SQL. Partition pruning, min/max statistics, sorted file layouts, and proper Trino configuration form the foundation. Trino-native maintenance procedures handle basic compaction and cleanup but break down at scale — JVM overhead, no cross-engine awareness, no coordinated sequencing, no event-driven triggers.
For production lakehouses with hundreds of tables and multiple query engines, a dedicated control plane bridges the gap. LakeOps collects query telemetry from Trino and every other engine, applies sort orders based on actual access patterns, runs maintenance on a purpose-built Rust engine without touching your Trino cluster, and coordinates the full maintenance pipeline autonomously. The result: Trino queries run faster, storage costs drop, and platform teams stop maintaining maintenance infrastructure.



