Organization-Wide Policies

LakeOps lets you define and enforce compaction, retention, orphan cleanup, and maintenance policies across catalogs and tables. Set schedules, priorities, and target scopes — then let LakeOps execute continuously. Every policy is auditable, versioned, and controllable with one toggle.

Why use policies?

Configuring optimization settings table-by-table works for a handful of tables, but doesn't scale. Policies let you define optimization rules once and apply them across hundreds or thousands of tables automatically.

  • Consistency — every table gets the same optimization standards without manual setup
  • Scale — onboard new tables automatically as they inherit catalog or org-wide policies
  • Governance — every policy change is versioned and auditable with full history
  • Control — enable or disable any policy instantly with a single toggle

How policies work

A policy is a named rule that applies a specific optimization operation to a set of tables on a schedule. Each policy has:

  • A name and optional description
  • A type (Compact Data Files, Expire Snapshots, etc.)
  • A scope (which tables it applies to)
  • A cron schedule controlling when it runs
  • An enable/disable toggle for instant control
  • Type-specific settings (target file size, retention period, strategy, etc.)

When enabled, LakeOps executes the policy on the configured cron schedule. You can also trigger any policy manually at any time.
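
As a rough illustration of the fields listed above, a policy could be modeled as a small record. This is a minimal sketch, not the LakeOps API: the class, its field names, the "organization" scope string, and the example setting values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical in-memory model of a policy. Field names are illustrative
# and are not taken from the LakeOps API.
@dataclass
class Policy:
    name: str
    type: str                   # e.g. "Expire Snapshots"
    scope: str                  # which tables it applies to (format assumed)
    cron: str                   # cron expression controlling when it runs
    enabled: bool = False       # the enable/disable toggle
    description: str = ""       # optional
    settings: dict[str, Any] = field(default_factory=dict)  # type-specific

policy = Policy(
    name="prod_expire_snapshots",
    type="Expire Snapshots",
    scope="organization",
    cron="0 0 * * * *",         # hourly, as in the schedule tables on this page
    settings={"retention_days": 7, "min_snapshots_to_retain": 10},  # example values
)
```

Keeping type-specific settings in a separate mapping lets one record shape cover every policy type.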

Policy types

LakeOps supports the following policy types, each mapping to a specific table optimization operation:

Compact Data Files

Merge small data files into optimally sized files using the Binpack or Sort strategy. Reduces file count, improves query performance, and lowers storage request costs.

Configurable settings

  • Target file size (default: 512 MB)
  • Compaction strategy: Binpack (optimizes for file size) or Sort (optimizes for query patterns)
  • Cron schedule

Learn more in Compaction docs
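
To make the Binpack idea concrete, the sketch below groups small files so that each group's total stays at or under the target file size. It uses a simple first-fit-decreasing heuristic chosen for illustration. It is not presented as the algorithm LakeOps or Iceberg runs, and the function name and sample sizes are hypothetical.

```python
def binpack(file_sizes_mb, target_mb=512):
    """Group file sizes into bins whose totals stay at or under target_mb.

    First-fit decreasing: take files largest first and place each one in
    the first bin that still has room, opening a new bin when none does.
    """
    bins = []  # each bin is a list of file sizes in MB
    for size in sorted(file_sizes_mb, reverse=True):
        for b in bins:
            if sum(b) + size <= target_mb:
                b.append(size)
                break
        else:
            # No existing bin had room for this file.
            bins.append([size])
    return bins

# Eight small files (sizes in MB, invented for the example).
groups = binpack([300, 200, 200, 100, 60, 50, 12, 8])
# groups == [[300, 200, 12], [200, 100, 60, 50, 8]]
```

Each group would be rewritten as a single output file, so eight input files become two.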

Expire Snapshots

Remove snapshots older than the retention period while respecting minimum retention count and concurrent readers. Keeps metadata lean and enables storage reclamation.

Configurable settings

  • Retention period (use table config or custom days)
  • Minimum snapshots to retain
  • Delete associated metadata files (on/off)
  • Create associated files (on/off)
  • Cron schedule

Learn more in Snapshot Management docs
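
The interaction between the retention period and the minimum snapshot count can be sketched as follows. This is an illustrative model only: the function and parameter names are hypothetical, and it does not model the protection of snapshots held by concurrent readers.

```python
from datetime import datetime, timedelta

def snapshots_to_expire(snapshot_times, now, retention_days, min_to_retain):
    """Return timestamps of snapshots eligible for expiration.

    A snapshot is eligible only if it is older than the retention period
    AND it is not among the min_to_retain most recent snapshots.
    """
    cutoff = now - timedelta(days=retention_days)
    newest_first = sorted(snapshot_times, reverse=True)
    protected = set(newest_first[:min_to_retain])
    return [t for t in newest_first if t < cutoff and t not in protected]

now = datetime(2024, 1, 31)
# Snapshots taken 1, 3, 10, 20, and 30 days ago.
snaps = [now - timedelta(days=d) for d in (1, 3, 10, 20, 30)]
expired = snapshots_to_expire(snaps, now, retention_days=7, min_to_retain=4)
# expired == [datetime(2024, 1, 1)]
```

Three snapshots are older than seven days, but two of them fall within the four most recent, so only the oldest one is expired.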

Rewrite Manifests

Consolidate manifest files to reduce metadata overhead and improve query planning performance across all connected engines.

Configurable settings

  • Cron schedule

Learn more in Manifest Optimization docs

Remove Orphan Files

Detect and safely remove unreferenced data files older than the configured age threshold. Reclaims storage from failed writes, expired snapshots, and dropped tables.

Configurable settings

  • Retention threshold (default: 7 days)
  • Cron schedule

Learn more in Orphan Cleanup docs
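
The detection rule described above (unreferenced and older than the threshold) can be sketched as a simple filter. The function name, the path-to-timestamp mapping, and the sample paths are assumptions for illustration, not how LakeOps lists storage or table metadata.

```python
from datetime import datetime, timedelta

def find_orphans(storage_files, referenced_paths, now, threshold_days=7):
    """Return paths that no table metadata references and that are old enough.

    storage_files maps each path found in storage to its last-modified time.
    The age check keeps recently written files, which may belong to a write
    that has not committed yet, out of the result.
    """
    cutoff = now - timedelta(days=threshold_days)
    return sorted(
        path
        for path, modified in storage_files.items()
        if path not in referenced_paths and modified < cutoff
    )

now = datetime(2024, 1, 31)
storage = {
    "data/a.parquet": datetime(2024, 1, 1),   # referenced: kept
    "data/b.parquet": datetime(2024, 1, 1),   # unreferenced and old: orphan
    "data/c.parquet": datetime(2024, 1, 30),  # unreferenced but too recent: kept
}
orphans = find_orphans(storage, referenced_paths={"data/a.parquet"}, now=now)
# orphans == ["data/b.parquet"]
```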

Configuration & Governance (UI label: Configuration)

Configuration & Governance policies let you enforce table-level settings, format standards, and operational guardrails across your organization. Instead of relying on teams to manually configure each table, define rules once and apply them everywhere.

What you can enforce

  • Iceberg format version (e.g. require v2 across all production catalogs)
  • Default file format (Parquet, ORC, Avro)
  • Write distribution mode (hash, range, none)
  • Commit retry and isolation settings
  • Naming conventions and metadata standards

Example use cases

  • Standardize format version — ensure every table uses Iceberg v2 so all teams get row-level deletes, position deletes, and improved statistics.
  • Enforce Parquet as default — prevent teams from accidentally creating ORC or Avro tables that break downstream tooling assumptions.
  • Set write distribution mode — apply hash distribution across high-ingestion tables to prevent write hotspots and ensure balanced partition sizing.
  • Governance for new tables — when a team creates a new table in a governed catalog, it automatically inherits the organization's configuration policy — no manual setup required.
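
A configuration check of this kind can be pictured as comparing a table's properties against a required set. In the sketch below the property keys follow Apache Iceberg's table property names, while the REQUIRED mapping, the violations function, and the sample table are hypothetical and only illustrate the idea.

```python
# Required settings, keyed by Apache Iceberg table property name.
# The chosen values mirror the example use cases above.
REQUIRED = {
    "format-version": "2",
    "write.format.default": "parquet",
    "write.distribution-mode": "hash",
}

def violations(table_properties, required=REQUIRED):
    """Return {property: (expected, actual)} for every property that differs.

    A property missing from the table is reported with actual=None.
    """
    return {
        key: (expected, table_properties.get(key))
        for key, expected in required.items()
        if table_properties.get(key) != expected
    }

# A table still on format v1 with no distribution mode set.
props = {"format-version": "1", "write.format.default": "parquet"}
report = violations(props)
# report == {"format-version": ("2", "1"),
#            "write.distribution-mode": ("hash", None)}
```

An empty report means the table already complies with the policy.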

Policy scope

Policies can be scoped at different levels of your data hierarchy:

Scope | Applies to | Use case
Per-table | A single specific table | Custom settings for critical or unusual tables
Per-namespace | All tables in a namespace | Team or domain-level standards
Per-catalog | All tables in a catalog | Environment-level rules (prod, staging)
Organization-wide | All tables across all catalogs | Global hygiene (e.g. orphan cleanup everywhere)

Precedence rules

More specific policies override broader ones. A per-table policy always takes precedence over a namespace, catalog, or organization-wide policy for the same operation type. This lets you set sensible defaults at the org level and override only where needed.
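
The precedence rule can be sketched as picking, for one operation type, the candidate policy with the most specific scope. The dictionary keys, the scope labels, and the orders_compaction and org_compaction policy names below are hypothetical. This illustrates the rule as stated and is not LakeOps's resolution code.

```python
# Scopes ordered from most to least specific, following the scope table above.
SCOPE_ORDER = ["table", "namespace", "catalog", "organization"]

def effective_policy(policies, operation_type):
    """Pick the most specific policy for one operation type.

    policies: the policies that apply to a single table, as dicts with the
    (assumed) keys "name", "type", and "scope". Returns None if no policy
    of that type applies.
    """
    candidates = [p for p in policies if p["type"] == operation_type]
    if not candidates:
        return None
    return min(candidates, key=lambda p: SCOPE_ORDER.index(p["scope"]))

applicable = [
    {"name": "org_compaction", "type": "Compact Data Files", "scope": "organization"},
    {"name": "prod_daily_compaction", "type": "Compact Data Files", "scope": "catalog"},
    {"name": "orders_compaction", "type": "Compact Data Files", "scope": "table"},
    {"name": "org_orphan_cleanup", "type": "Remove Orphan Files", "scope": "organization"},
]
winner = effective_policy(applicable, "Compact Data Files")
# winner["name"] == "orders_compaction"
```

Precedence is resolved per operation type, so the per-table compaction policy does not affect the organization-wide orphan cleanup policy.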

Global Policies screen

Navigate to Manage > Policies in the sidebar to access the central policy management screen. This is where you create, search, filter, and manage all policies across your organization.

Screen layout

Element | Description
+ Create Policy | Opens a form to define a new policy (name, type, scope, schedule, settings)
Search bar | Filter policies by name or description
Type filter | Filter by policy type (All Types, Compact Data Files, Expire Snapshots, etc.)
Status filter | Filter by enabled/disabled status (All Status, Enabled, Disabled)

Policy table columns

Column | Description
Status | Toggle switch to enable or disable the policy instantly
Policy | Policy name and optional description (e.g. “For all tables in all catalogs every 7 days”)
Type | Color-coded badge showing the policy type
Next Run | When the policy will next execute (based on cron schedule)
Last Run | Timestamp of the most recent execution
Updated | When the policy configuration was last modified
Actions | Edit (pencil icon) and Delete (trash icon) buttons

Creating a policy

1. Click + Create Policy in the top-right corner of the Policies screen.
2. Enter a policy name and optional description. Use descriptive names that reflect scope and purpose (e.g. prod_daily_compaction).
3. Select a policy type: Compact Data Files, Expire Snapshots, Rewrite Manifests, Remove Orphan Files, or Configuration & Governance.
4. Configure type-specific settings. For example, for Compact Data Files: set target file size, compaction strategy (Binpack/Sort), and cron schedule.
5. Define the scope: select specific tables, a namespace, a catalog, or apply organization-wide.
6. Toggle Enabled to activate immediately, or leave disabled to configure now and activate later. Click Save.

Per-table policy assignment

You can also view and manage policies from the perspective of a single table. Navigate to Explore, select a table, then open the Policies tab.

What you see

The per-table Policies tab shows all policies currently assigned to the selected table, including inherited policies from namespace, catalog, or organization scope. Each row shows:

Column | Description
Status | Toggle to enable/disable the policy for this table
Policy | Policy name
Type | Color-coded type badge
Next Run | Next scheduled execution
Last Run | Most recent execution timestamp

Assigning a policy

Click + Assign Policy to link an existing policy to this table. You can assign multiple policies of different types to the same table (e.g. one compaction policy plus one snapshot expiration policy).

Example: typical policy set

A production table typically has multiple policies covering different optimization operations:

Policy | Type | Schedule
prod_daily_compaction | Compact Data Files | Daily at 2:00 AM
prod_expire_snapshots | Expire Snapshots | Hourly
prod_rewrite_manifests | Rewrite Manifests | Daily at 4:00 AM
org_orphan_cleanup | Remove Orphan Files | Daily at 3:00 AM

Compaction and manifest rewrites run daily to keep file layout optimal. Snapshot expiration runs hourly to prevent metadata bloat. Orphan cleanup runs daily at the org level to catch stragglers across all catalogs.

Policies vs. per-table Optimization tab

Both policies and the per-table Optimization tab configure the same underlying operations. The difference is scope and management:

Policies

  • Managed centrally from the Policies screen
  • Apply to many tables at once
  • Versioned with full audit trail
  • New tables inherit automatically
  • Best for org-wide standards

Per-table Optimization tab

  • Configured on individual tables
  • Quick setup for one-off cases
  • Override policy defaults when needed
  • Includes Simulate button for preview
  • Best for table-specific tuning

A common pattern: set organization-wide policies for baseline hygiene, then use the per-table Optimization tab to override settings on tables that need special treatment.

Scheduling best practices

Policies use cron expressions to control execution timing. Consider these guidelines:

  • Stagger schedules — avoid running compaction, snapshot expiration, and manifest rewrites at the same time. Spread them across different hours.
  • Run during low-traffic windows — schedule heavy operations like compaction during off-peak hours to minimize impact on query workloads.
  • Order matters — run snapshot expiration before orphan cleanup so that expired data files become detectable as orphans.
  • Start conservative — begin with longer intervals and tighten as you observe results.
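
The staggering guideline can be checked mechanically. The sketch below assumes each daily policy is reduced to its start hour and reports any hour shared by two or more policies. The function name and the staging_compaction policy are hypothetical.

```python
from collections import defaultdict

def colliding_hours(daily_start_hours):
    """Return {hour: [policy names]} for every start hour used by 2+ policies.

    daily_start_hours maps a policy name to the hour (0-23) at which its
    daily run starts.
    """
    by_hour = defaultdict(list)
    for name, hour in daily_start_hours.items():
        by_hour[hour].append(name)
    return {hour: names for hour, names in by_hour.items() if len(names) > 1}

staggered = {
    "prod_daily_compaction": 2,
    "org_orphan_cleanup": 3,
    "prod_rewrite_manifests": 4,
}
clashes = colliding_hours(staggered)
# clashes == {}  (every policy starts in a different hour)

# Adding a second compaction policy at 2:00 produces a collision.
clashes_after = colliding_hours({**staggered, "staging_compaction": 2})
# clashes_after == {2: ["prod_daily_compaction", "staging_compaction"]}
```

A check like this only compares start times. Long-running operations can still overlap even when their start hours differ.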

Recommended schedule

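Operation | Frequency | Cron (second minute hour day-of-month month day-of-week)
Compact Data Files | Daily at 2:00 AM | 0 0 2 * * *
Expire Snapshots | Hourly | 0 0 * * * *
Rewrite Manifests | Daily at 4:00 AM | 0 0 4 * * *
Remove Orphan Files | Daily at 3:00 AM | 0 0 3 * * *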

Auditing & versioning

Every policy change is tracked with full audit history:

  • Version history — see when a policy was created, modified, enabled, or disabled, and by whom
  • Execution log — every policy run is recorded in the Events tab for each affected table, showing the operation performed, duration, and impact
  • Updated timestamp — the global Policies table shows when each policy was last modified

Monitoring policy execution

Track policy health and impact through:

  • Global Policies screen — Next Run and Last Run columns show scheduling health at a glance
  • Events tab (per-table) — detailed log of every operation executed by the policy, including before/after metrics
  • Insights tab (per-table) — warnings that should resolve after policy-driven optimizations take effect
  • Dashboard — aggregated operations count and cost savings reflect the cumulative impact of your policies