Apache Iceberg 1.11.0 — What's New?

Apache Iceberg 1.11.0 was released on May 19, 2026, and it represents a significant milestone for anyone operating Iceberg-based lakehouses in production. Where previous releases introduced V3 specification features as experimental capabilities, this release hardens them into stable, production-ready defaults.

The highlights: deletion vectors backed by Roaring bitmaps replace fragmented positional delete files, a native Variant type handles semi-structured data without the performance tradeoffs of raw strings, and the REST catalog gains server-side scan planning to shift metadata traversal off query engines entirely. On top of that, built-in table encryption with KMS support, a pluggable File Format API, and first-class Spark 4.1 and Flink 2.1 support round out a release that touches nearly every layer of the stack.

This post walks through each major change, what it means for your pipelines, and where LakeOps fits in.

Deletion vectors — from fragmented deletes to bitmap precision

If you run CDC, streaming ingest, or heavy MERGE INTO workloads on Iceberg v2 tables, you know the pain of positional delete file accumulation. Every row-level update or delete generates a separate positional delete file. Over time, a single data file can be tied to dozens of small delete files. At read time, the engine has to open, read, and cross-reference all of them — a process that degrades query performance until a compaction job cleans things up.

Iceberg 1.11.0 stabilizes deletion vectors (DVs) as the default mechanism for row-level deletes in V3 tables. Instead of writing separate positional delete files, the engine records deleted row positions in a highly compressed Roaring bitmap stored in the Puffin file format. Each data file has at most one deletion vector. When more rows are deleted, the writer produces an updated bitmap that replaces the previous one rather than adding yet another delete file. At read time, the engine loads that single bitmap and applies it as a mask, so there are no piles of small delete files to open and cross-reference.
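To make the bitmap idea concrete, here is a minimal Python sketch of a per-file deletion vector. It is an illustration only: a plain Python integer stands in for the compressed Roaring bitmap, and the `DeletionVector` class and `read_live_rows` helper are hypothetical names, not Iceberg APIs.

```python
from dataclasses import dataclass


@dataclass
class DeletionVector:
    """Toy deletion vector: one bitmap of deleted row positions per data file.

    Assumption: a Python int is used as the bitset instead of a Roaring bitmap.
    """
    bits: int = 0

    def delete(self, position: int) -> None:
        # Setting a bit marks the row at that 0-based position as deleted.
        self.bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return (self.bits >> position) & 1 == 1

    def cardinality(self) -> int:
        # Number of deleted rows recorded in the bitmap.
        return bin(self.bits).count("1")


def read_live_rows(rows: list, dv: DeletionVector) -> list:
    """Apply the bitmap as a mask while scanning a data file's rows."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


# Hypothetical data file with five rows.
data_file_rows = ["order-0", "order-1", "order-2", "order-3", "order-4"]
dv = DeletionVector()
dv.delete(1)
dv.delete(3)
dv.delete(3)  # deleting the same position twice changes nothing

live = read_live_rows(data_file_rows, dv)
print(live)  # ['order-0', 'order-2', 'order-4']
```

The point of the sketch is that the reader consults one compact structure per data file, no matter how many delete operations touched that file.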

To enable deletion vectors in your Spark pipelines:

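```sql
CREATE TABLE prod.logistics.orders (
    order_id BIGINT,
    customer_id STRING,
    status STRING,
    updated_at TIMESTAMP
)
USING iceberg
TBLPROPERTIES (
    'format-version' = '3',
    -- Row-level operations write deletes instead of rewriting data files
    'write.delete.mode' = 'merge-on-read',
    'write.update.mode' = 'merge-on-read',
    -- MERGE INTO has its own mode property and defaults to copy-on-write
    'write.merge.mode' = 'merge-on-read',
    -- V3 deletion vectors are stored in Puffin files
    'write.delete.format' = 'puffin'
);
```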

With this configuration, standard MERGE INTO statements manage the underlying bitmaps automatically without littering your storage with small files.

The practical impact is substantial: read performance on tables with heavy update patterns improves immediately, and the interval between compaction runs can be extended because there is no delete file accumulation to clean up.

What this means for maintenance: deletion vectors reduce the urgency of compaction but do not eliminate it. Data files still fragment over time as new files are written through inserts and overwrites. A well-tuned compaction strategy remains essential for query performance at scale — deletion vectors simply remove one of the most painful sources of read amplification.

Native Variant type — semi-structured data without the tradeoffs

Handling JSON in a data lake has always required choosing between flexibility and performance. Store it as a raw string and every query pays the cost of parsing the full document. Flatten it into typed columns and you break downstream pipelines the moment an upstream application adds a field.

The Variant type, now stable in 1.11.0, eliminates this tradeoff. It is a first-class binary encoding for semi-structured data that preserves schema flexibility while enabling predicate pushdown directly into the binary structure.

The key mechanism is shredding: frequently accessed nested paths are automatically materialized as sub-columns under the hood, while the schema remains a single unified VARIANT column. The query engine can push filters into these sub-columns without scanning the full document.

```sql
CREATE TABLE prod.ecommerce.events (
    event_id STRING,
    event_time TIMESTAMP,
    payload VARIANT
)
USING iceberg
TBLPROPERTIES ('format-version' = '3');

-- Predicate pushdown into nested Variant fields
SELECT event_id, variant_get(payload, '$.device.os', 'string') AS os_type
FROM prod.ecommerce.events
WHERE variant_get(payload, '$.customer.region', 'string') = 'APAC';
```
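The shredding mechanism described above can be pictured with a small Python sketch. It is a simplification under stated assumptions: the hot paths are hard-coded in `SHREDDED_PATHS` rather than chosen automatically, JSON strings stand in for the binary Variant encoding, and the full document is kept alongside the sub-columns. All names are hypothetical.

```python
import json

# Hypothetical events; each payload is a semi-structured document.
events = [
    {"event_id": "e1", "payload": {"customer": {"region": "APAC"}, "device": {"os": "ios"}}},
    {"event_id": "e2", "payload": {"customer": {"region": "EMEA"}, "device": {"os": "android"}}},
    {"event_id": "e3", "payload": {"customer": {"region": "APAC"}, "extra": [1, 2, 3]}},
]

# Assumption: these are the frequently accessed paths chosen for shredding.
SHREDDED_PATHS = [("customer", "region"), ("device", "os")]


def get_path(doc, path):
    """Walk a nested dict; return None when the path is absent."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def shred(rows, paths):
    """Write time: copy hot paths into sub-columns, keep each document as a blob."""
    columns = {path: [get_path(r["payload"], path) for r in rows] for path in paths}
    blobs = [json.dumps(r["payload"]) for r in rows]
    return columns, blobs


def filter_on_shredded(columns, path, value):
    """Read time: evaluate the predicate on the sub-column only; no blob is parsed."""
    return [i for i, v in enumerate(columns[path]) if v == value]


columns, blobs = shred(events, SHREDDED_PATHS)
matches = filter_on_shredded(columns, ("customer", "region"), "APAC")
os_types = [columns[("device", "os")][i] for i in matches]
print(matches, os_types)  # [0, 2] ['ios', None]
```

Rows that lack a shredded path simply carry a null in the sub-column, which is why the schema can stay flexible while the filter stays cheap.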

For data engineers running IoT, clickstream, or event-driven pipelines, this removes an entire class of ETL complexity. You no longer need to maintain separate raw and curated layers for semi-structured data — a single Variant column handles both exploration and production queries.

Server-side scan planning in the REST catalog

In previous Iceberg versions, the query engine handled scan orchestration locally. The driver would traverse the table's metadata tree — retrieving manifest lists and manifest files from object storage — to determine which data files to read. For large tables with thousands of manifests, this meant significant latency before a query could even begin execution.

Iceberg 1.11.0 shifts this work into the catalog through server-side scan planning. Instead of traversing manifests locally, the engine submits a single POST …/plan request to the REST catalog. The catalog returns optimized FileScanTask objects directly.

The API handles scale gracefully: smaller scans return immediate results, extensive operations return a plan-id for polling, and massive datasets are retrieved via parallel POST …/tasks requests.
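The three response modes can be sketched from the client's point of view. The following Python example runs against an in-memory `FakeCatalog` rather than a real REST endpoint; the method names, thresholds, and response field names (`status`, `plan-id`, `plan-tasks`, `file-scan-tasks`) are assumptions made for illustration, so check the REST catalog specification for the actual contract.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeCatalog:
    """In-memory stand-in for a REST catalog (hypothetical, not the real API)."""

    def __init__(self, files, inline_limit=2, async_limit=4, task_size=2):
        self.files = files
        self.inline_limit = inline_limit  # scans this small complete immediately
        self.async_limit = async_limit    # larger scans are split into plan tasks
        self.task_size = task_size
        self.polls = 0

    def _chunks(self):
        n = self.task_size
        return [self.files[i:i + n] for i in range(0, len(self.files), n)]

    def post_plan(self):
        # Stands in for the POST .../plan request.
        if len(self.files) <= self.inline_limit:
            return {"status": "completed", "file-scan-tasks": list(self.files)}
        return {"status": "submitted", "plan-id": "plan-1"}

    def get_plan(self, plan_id):
        # Stands in for polling with the plan-id; finishes on the second poll.
        self.polls += 1
        if self.polls < 2:
            return {"status": "submitted"}
        if len(self.files) <= self.async_limit:
            return {"status": "completed", "file-scan-tasks": list(self.files)}
        return {"status": "completed", "plan-tasks": list(range(len(self._chunks())))}

    def post_tasks(self, plan_task):
        # Stands in for the POST .../tasks request for one plan task.
        return {"file-scan-tasks": self._chunks()[plan_task]}


def plan_scan(catalog, max_polls=10):
    result = catalog.post_plan()
    plan_id = result.get("plan-id")
    for _ in range(max_polls):
        if result["status"] != "submitted":
            break
        result = catalog.get_plan(plan_id)  # a real client would back off between polls
    if result["status"] != "completed":
        raise TimeoutError("scan planning did not complete")
    tasks = list(result.get("file-scan-tasks", []))
    # Fetch any remaining plan tasks in parallel; map() preserves their order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for page in pool.map(catalog.post_tasks, result.get("plan-tasks", [])):
            tasks.extend(page["file-scan-tasks"])
    return tasks


small = plan_scan(FakeCatalog(["f0"]))
medium = plan_scan(FakeCatalog(["f0", "f1", "f2"]))
large = plan_scan(FakeCatalog([f"f{i}" for i in range(5)]))
print(len(small), len(medium), len(large))  # 1 3 5
```

Whichever mode the catalog picks, the engine ends up with the same thing: a list of file scan tasks it can hand to workers without reading manifests itself.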

The result is that the driver no longer touches metadata in object storage. Planning moves from the engine to the catalog, where it can be cached, optimized, and served at catalog-level throughput rather than object-storage latency.

Operational implication: this change makes manifest optimization even more important. Fewer, well-organized manifests mean faster scan planning responses from the catalog. If your tables have hundreds of small manifests, the catalog still has to traverse them — server-side planning shifts the work but does not eliminate the need for clean metadata.

Built-in table encryption with KMS support

As Iceberg lakehouses increasingly store sensitive PII, financial, and healthcare data, bucket-level encryption is no longer sufficient for compliance requirements. Iceberg 1.11.0 introduces built-in table encryption using envelope encryption with a three-tier key hierarchy:

Table master keys reside in your KMS (such as AWS KMS or Google Cloud KMS) and never leave the key management service
Key-encryption keys (KEKs) are stored in table metadata, wrapped by the master key
Data-encryption keys (DEKs) are unique per file, wrapped by a KEK

Every data file and manifest list is encrypted with AES-GCM under its own unique DEK. This means that even if an attacker gains direct access to your object storage, the data remains unreadable. Built-in authentication tags guard against tampering, and key rotation happens automatically as keys age — no need to rewrite datasets.
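The key hierarchy is easier to follow in code. The Python sketch below shows only the wrapping structure and is not secure: to stay dependency-free it swaps AES-GCM for a toy XOR keystream derived with SHA-256, and `FakeKms`, `table_metadata`, `read_file`, and `rotate_kek` are hypothetical names. The rotation step illustrates a general property of envelope encryption, not Iceberg's specific rotation procedure.

```python
import hashlib
import secrets


def toy_wrap(key: bytes, plaintext: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Toy stand-in for AES-GCM; NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


toy_unwrap = toy_wrap  # XOR is its own inverse


class FakeKms:
    """Holds the master key; the key itself never leaves this object."""

    def __init__(self):
        self._master_keys = {"table-key-v1": secrets.token_bytes(32)}

    def wrap(self, key_id: str, kek: bytes) -> bytes:
        return toy_wrap(self._master_keys[key_id], kek)

    def unwrap(self, key_id: str, wrapped_kek: bytes) -> bytes:
        return toy_unwrap(self._master_keys[key_id], wrapped_kek)


kms = FakeKms()

# Tier 2: the KEK lives in table metadata, wrapped by the master key.
kek = secrets.token_bytes(32)
table_metadata = {"wrapped_kek": kms.wrap("table-key-v1", kek)}

# Tier 3: every file gets its own DEK, wrapped by the KEK.
files = {}
for name, content in [("data-0.parquet", b"row group A"), ("data-1.parquet", b"row group B")]:
    dek = secrets.token_bytes(32)
    files[name] = {"wrapped_dek": toy_wrap(kek, dek), "ciphertext": toy_wrap(dek, content)}


def read_file(name: str) -> bytes:
    """Unwrap top-down: master key -> KEK -> DEK -> file contents."""
    current_kek = kms.unwrap("table-key-v1", table_metadata["wrapped_kek"])
    file_dek = toy_unwrap(current_kek, files[name]["wrapped_dek"])
    return toy_unwrap(file_dek, files[name]["ciphertext"])


def rotate_kek() -> None:
    """Re-wrap the small DEKs under a new KEK; file ciphertext is untouched."""
    old_kek = kms.unwrap("table-key-v1", table_metadata["wrapped_kek"])
    new_kek = secrets.token_bytes(32)
    for entry in files.values():
        entry["wrapped_dek"] = toy_wrap(new_kek, toy_unwrap(old_kek, entry["wrapped_dek"]))
    table_metadata["wrapped_kek"] = kms.wrap("table-key-v1", new_kek)


ciphertext_before = files["data-0.parquet"]["ciphertext"]
rotate_kek()
print(read_file("data-0.parquet"))  # b'row group A'
```

Because only the wrapped keys change during rotation, the cost scales with the number of keys rather than the volume of data.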

This architecture is particularly relevant for multi-engine lakehouses where the same tables are accessed by Spark, Trino, Flink, and BI tools. Encryption at the table level ensures consistent protection regardless of which engine reads the data.

Pluggable File Format API

Iceberg's format-handling code historically grew organically around Parquet, Avro, and ORC. Adding a new format or enforcing consistent feature support across all formats meant duplicating complex engine-specific code paths.

The finalized File Format API in 1.11.0 introduces a clean plugin model:

FormatModel — a standardized implementation defining how a file format handles reader/writer construction and its capabilities
FormatModelRegistry — a central directory where query engines fetch appropriate read and write builders
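As a rough picture of the plugin model, the following Python sketch mirrors the two roles named above. It borrows the names `FormatModel` and `FormatModelRegistry` for readability, but the method signatures are invented for this example and do not reproduce the actual Java API; JSON Lines and CSV stand in for real table file formats.

```python
import csv
import io
import json
from typing import Protocol


class FormatModel(Protocol):
    """What a format plugin must provide (hypothetical interface)."""
    name: str

    def write(self, rows: list[dict]) -> bytes: ...
    def read(self, data: bytes) -> list[dict]: ...


class JsonLinesModel:
    name = "jsonl"

    def write(self, rows):
        return "\n".join(json.dumps(r, sort_keys=True) for r in rows).encode()

    def read(self, data):
        return [json.loads(line) for line in data.decode().splitlines() if line]


class CsvModel:
    name = "csv"

    def write(self, rows):
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue().encode()

    def read(self, data):
        return list(csv.DictReader(io.StringIO(data.decode())))


class FormatModelRegistry:
    """Central lookup: engines ask for a format by name, never by class."""

    def __init__(self):
        self._models = {}

    def register(self, model: FormatModel) -> None:
        if model.name in self._models:
            raise ValueError(f"format {model.name!r} is already registered")
        self._models[model.name] = model

    def get(self, name: str) -> FormatModel:
        if name not in self._models:
            raise ValueError(f"no format model registered for {name!r}")
        return self._models[name]


registry = FormatModelRegistry()
registry.register(JsonLinesModel())
registry.register(CsvModel())

# Engine-side code is identical for every format.
rows = [{"order_id": "1", "status": "shipped"}, {"order_id": "2", "status": "pending"}]
round_trips = {
    fmt: registry.get(fmt).read(registry.get(fmt).write(rows)) for fmt in ("jsonl", "csv")
}
print(round_trips["jsonl"] == rows, round_trips["csv"] == rows)  # True True
```

Adding a new format in this model means registering one more object; the engine-side loop does not change.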

Beyond code cleanup, this API has two forward-looking implications. First, it opens the door to next-generation formats optimized for AI workloads — like Lance for vector embeddings or Vortex for GPU-accelerated reads — without requiring engine-level rewrites. Second, it enables column families: vertical partitioning of storage where different groups of columns can be read or updated independently.

For maintenance operations, column families could mean targeted compaction on specific column groups rather than rewriting entire files — a meaningful efficiency gain for tables with wide schemas and mixed access patterns.

Spark 4.1 and Flink 2.1 support

Iceberg 1.11.0 adds native support for Spark 4.1 and Flink 2.1, making both the default build targets. Spark 3.4 support is dropped.

Spark 4.1 highlights

MERGE INTO now supports automatic schema evolution via the WITH SCHEMA EVOLUTION clause. If a merge source carries columns the target table lacks, Spark adds those columns within the same statement — no separate ALTER TABLE needed. The 1.11 connector also modernizes against Spark's DataSource V2 APIs and adds an asynchronous micro-batch planner that speeds up Structured Streaming.

Flink 2.1 and DynamicIcebergSink

The centerpiece of the Flink work is DynamicIcebergSink, an experimental sink that breaks the one-sink-per-table model. A single sink routes each record to a table chosen at runtime, creating tables on demand and evolving schemas and partition specs on the fly. Flink also gains support for nanosecond timestamps, the Variant type, and the unknown type from the V3 spec.
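The routing behavior can be approximated in a few lines of Python. This is a toy model, not the Flink API: `DynamicSink`, its `route` callback, and the dictionary used as a table are all hypothetical, and partition-spec evolution is left out.

```python
class DynamicSink:
    """Toy dynamic sink: routes each record to a table chosen at runtime."""

    def __init__(self, route):
        self.route = route  # function mapping a record to a table name
        self.tables = {}

    def write(self, record: dict) -> None:
        name = self.route(record)
        # Create the table on demand the first time a record is routed to it.
        table = self.tables.setdefault(name, {"schema": [], "rows": []})
        # Evolve the schema when a record carries a column the table lacks.
        for column in record:
            if column not in table["schema"]:
                table["schema"].append(column)
        table["rows"].append(record)


sink = DynamicSink(route=lambda record: f"events.{record['type']}")

# Hypothetical stream with two event types and one late-arriving column.
for record in [
    {"type": "click", "user": "u1"},
    {"type": "purchase", "user": "u2", "amount": 30},
    {"type": "click", "user": "u3", "page": "/home"},
]:
    sink.write(record)

print(sorted(sink.tables))                   # ['events.click', 'events.purchase']
print(sink.tables["events.click"]["schema"])  # ['type', 'user', 'page']
```

One sink instance ends up feeding two tables, and the `page` column appears in `events.click` without any separate DDL step.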

For teams running multi-table streaming pipelines, DynamicIcebergSink can dramatically simplify topology. Instead of maintaining a separate Flink job per destination table, a single job can handle the routing — though this also means more tables ingesting data simultaneously, which increases the importance of automated table maintenance to keep file counts and metadata healthy across all of them.

SQL UDF specification

1.11.0 introduces a metadata format for SQL user-defined functions — both scalar and table functions. UDFs are stored as versioned, immutable JSON files in object storage with full rollback support. Parameter and return types map to Iceberg type representations, including support for nested maps, structs, and the new Variant type. Each UDF carries engine-specific implementations, so a single logical function can be optimized for Spark, Trino, or any other engine.

Nanosecond timestamp precision

V3 tables now support timestamp_ns and timestamptz_ns types natively. This is critical for high-frequency financial data, precision IoT logging, and any workload where microsecond timestamps introduce rounding errors. The new types are supported across Spark, Flink, and the core Iceberg library.
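A short arithmetic example shows what is lost at microsecond precision. The epoch values below are arbitrary and the helper name `to_micros` is made up for this sketch.

```python
NANOS_PER_MICRO = 1_000


def to_micros(epoch_nanos: int) -> int:
    """Truncate a nanosecond epoch timestamp to microsecond precision."""
    return epoch_nanos // NANOS_PER_MICRO


# Two hypothetical trades 300 ns apart.
t1 = 1_747_612_800_000_000_100
t2 = t1 + 300

same_micro = to_micros(t1) == to_micros(t2)
lost_nanos = t1 - to_micros(t1) * NANOS_PER_MICRO

print(t2 - t1)     # 300  (ordering preserved at nanosecond precision)
print(same_micro)  # True (both events collapse into one microsecond)
print(lost_nanos)  # 100  (nanoseconds dropped from t1 by truncation)
```

At microsecond precision the two events become indistinguishable, so their relative order can no longer be recovered from the timestamp alone.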

Infrastructure changes

JDK 17 baseline — Java 11 support is dropped. The Iceberg build and runtime now target JDK 17, bringing better garbage collection for containers, improved performance, and modern language features
Spark 3.4 removed — only Spark 3.5, 4.0, and 4.1 are supported
Geospatial types — the V3 spec adds geometry bounding box types with intersects checking, laying groundwork for spatial queries on Iceberg tables
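For the geospatial item, an intersects check on bounding boxes is what lets a planner skip files whose geometries cannot overlap a query window. The Python sketch below assumes simple planar boxes with inclusive edges and ignores cases such as antimeridian wraparound; `BoundingBox` and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two boxes overlap when their ranges overlap on both axes (edges inclusive).
        return (
            self.xmin <= other.xmax
            and other.xmin <= self.xmax
            and self.ymin <= other.ymax
            and other.ymin <= self.ymax
        )


# Hypothetical per-file bounds, as a planner might keep in file-level statistics.
file_bounds = {
    "f1.parquet": BoundingBox(0, 0, 10, 10),
    "f2.parquet": BoundingBox(20, 20, 30, 30),
    "f3.parquet": BoundingBox(10, 10, 15, 15),
}
query_window = BoundingBox(5, 5, 10, 10)

files_to_scan = [name for name, box in file_bounds.items() if box.intersects(query_window)]
print(files_to_scan)  # ['f1.parquet', 'f3.parquet']
```

Here `f2.parquet` is pruned without being opened, while `f3.parquet` is kept because it touches the query window at a corner.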

What 1.11.0 means for lakehouse operations

Every major feature in this release has maintenance implications:

Deletion vectors reduce delete file accumulation but tables still need compaction for fragmented inserts. The operational question shifts from "how often do I compact to clean up delete files?" to "how do I compact based on actual file health signals?"

Variant type adoption means tables may carry wider effective schemas with mixed access patterns. Sort-order and compaction strategies need to account for which shredded columns are actually queried.

Server-side scan planning rewards well-maintained metadata. Tables with consolidated manifests get faster planning responses; tables with hundreds of small manifests still pay metadata traversal costs — just on the catalog side instead of the engine side.

Flink DynamicIcebergSink multiplies the number of tables receiving streaming writes. Each table needs its own maintenance lifecycle — compaction, snapshot expiration, orphan cleanup — managed automatically rather than through per-table scripts.

Table encryption adds a security layer but does not change maintenance patterns. Compaction, snapshot expiration, and manifest rewrites operate transparently on encrypted tables.

This is where autonomous lakehouse operations become essential. As the number of tables, engines, and ingestion patterns grows, manual maintenance does not scale. LakeOps connects to your Iceberg catalogs, continuously monitors table health telemetry — file counts, partition distributions, snapshot velocity, manifest overhead, and query patterns across all engines — and runs coordinated maintenance operations (compaction, snapshot expiration, orphan cleanup, manifest rewrites) automatically, triggered by actual data signals rather than rigid schedules.

For teams upgrading to 1.11.0, the combination of V3 features and autonomous maintenance means you can adopt deletion vectors, Variant columns, and multi-engine streaming without scaling your operations team to match.

Upgrading to 1.11.0

Update your build dependencies to version 1.11.0. Key migration considerations:

Ensure your runtime environment supports JDK 17
If you are on Spark 3.4, upgrade to a supported version (Spark 3.5, 4.0, or 4.1) before updating Iceberg
Test MERGE INTO workloads with deletion vectors enabled to validate performance improvements
Review your manifest rewrite policies — server-side scan planning makes lean metadata more valuable than ever
If you use encryption, configure your KMS provider and enable table-level encryption for sensitive datasets

For a complete list of changes, see the official Apache Iceberg release notes.