Organization-Wide Policies
LakeOps lets you define and enforce compaction, retention, orphan cleanup, and maintenance policies across catalogs and tables. Set schedules, priorities, and target scopes — then let LakeOps execute continuously. Every policy is auditable, versioned, and controllable with one toggle.
Why use policies?
Configuring optimization settings table-by-table works for a handful of tables, but doesn't scale. Policies let you define optimization rules once and apply them across hundreds or thousands of tables automatically.
- Consistency: every table gets the same optimization standards without manual setup
- Scale: new tables are onboarded automatically because they inherit catalog or org-wide policies
- Governance: every policy change is versioned and auditable with full history
- Control: enable or disable any policy instantly with a single toggle
How policies work
A policy is a named rule that applies a specific optimization operation to a set of tables on a schedule. Each policy has:
- A name and optional description
- A type (Compact Data Files, Expire Snapshots, etc.)
- A scope (which tables it applies to)
- A cron schedule controlling when it runs
- An enable/disable toggle for instant control
- Type-specific settings (target file size, retention period, strategy, etc.)
When enabled, LakeOps executes the policy on the configured cron schedule. You can also trigger any policy manually at any time.
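The fields listed above can be pictured as a small record. The following is a minimal sketch, not LakeOps's actual schema: the `Policy` class, its field names, the `scope` string format, and the settings keys are all hypothetical. The cron string is copied from the hourly row of the recommended schedule later on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Policy types named in this document; the exact identifiers are assumptions.
POLICY_TYPES = {
    "Compact Data Files",
    "Expire Snapshots",
    "Rewrite Manifests",
    "Remove Orphan Files",
    "Configuration",
}


@dataclass
class Policy:
    """Hypothetical in-memory model of a policy definition."""
    name: str
    type: str
    scope: str                       # hypothetical format, e.g. "org" or "catalog:prod"
    cron: str                        # stored as an opaque schedule string
    enabled: bool = True             # the single on/off toggle
    description: Optional[str] = None
    settings: Dict[str, Any] = field(default_factory=dict)  # type-specific settings

    def __post_init__(self) -> None:
        if self.type not in POLICY_TYPES:
            raise ValueError(f"unknown policy type: {self.type!r}")

    def runs_on_schedule(self) -> bool:
        # A disabled policy is skipped by the scheduler.
        return self.enabled


policy = Policy(
    name="prod_expire_snapshots",
    type="Expire Snapshots",
    scope="catalog:prod",
    cron="0 0 * * * *",
    settings={"retention_days": 7, "min_snapshots_to_retain": 5},  # illustrative values
)
print(policy.runs_on_schedule())
```

Keeping the type-specific settings in a free-form dictionary mirrors the way each policy type exposes its own configurable settings in the sections that follow.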
Policy types
LakeOps supports the following policy types, each mapping to a specific table optimization operation:
Compact Data Files
Merge small data files into optimally sized files using the Binpack or Sort strategy. Compaction reduces file count, improves query performance, and lowers storage request costs.
Configurable settings
- Target file size (default: 512 MB)
- Compaction strategy: Binpack (optimizes for file size) or Sort (optimizes for query patterns)
- Cron schedule
Learn more in Compaction docs
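To make the idea of bin-packing small files concrete, here is a rough sketch using a first-fit-decreasing heuristic. It is a stand-in for illustration only, not the algorithm LakeOps or Apache Iceberg actually runs; the `binpack` function and the sample file sizes are hypothetical. The 512 MB target matches the default listed above.

```python
from typing import List


def binpack(file_sizes_mb: List[int], target_mb: int = 512) -> List[List[int]]:
    """Group files so that each group's total stays at or under target_mb.

    Uses first-fit decreasing: take files largest first and place each one
    into the first group that still has room, opening a new group if none does.
    """
    groups: List[List[int]] = []
    totals: List[int] = []
    for size in sorted(file_sizes_mb, reverse=True):
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                groups[i].append(size)
                totals[i] += size
                break
        else:
            # No existing group can take this file.
            groups.append([size])
            totals.append(size)
    return groups


groups = binpack([300, 200, 128, 64, 64, 256, 12])
print(groups)  # [[300, 200, 12], [256, 128, 64, 64]]
```

In this example seven small files collapse into two output groups of exactly 512 MB each, which is the file-count reduction the policy is after.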
Expire Snapshots
Remove snapshots older than the retention period while respecting minimum retention count and concurrent readers. Keeps metadata lean and enables storage reclamation.
Configurable settings
- Retention period (use table config or custom days)
- Minimum snapshots to retain
- Delete associated metadata files (on/off)
- Create associated files (on/off)
- Cron schedule
Learn more in Snapshot Management docs
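The interaction between the retention period and the minimum snapshot count can be sketched as follows. This is a simplified illustration under stated assumptions: snapshots are represented only by their timestamps, concurrent readers are not modelled, and the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def snapshots_to_expire(
    snapshot_times: List[datetime],
    now: datetime,
    retention_days: int,
    min_to_retain: int,
) -> List[datetime]:
    """Return snapshots older than the retention period, newest first,
    while always protecting the min_to_retain most recent snapshots."""
    cutoff = now - timedelta(days=retention_days)
    newest_first = sorted(snapshot_times, reverse=True)
    protected = set(newest_first[:min_to_retain])
    return [t for t in newest_first if t < cutoff and t not in protected]


def jan(day: int) -> datetime:
    return datetime(2024, 1, day, tzinfo=timezone.utc)


expired = snapshots_to_expire(
    [jan(1), jan(10), jan(20), jan(30)],
    now=jan(31),
    retention_days=7,
    min_to_retain=2,
)
print([t.day for t in expired])  # [10, 1]
```

The January 20 snapshot is older than the 7-day cutoff but survives because it is one of the two most recent snapshots.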
Rewrite Manifests
Consolidate manifest files to reduce metadata overhead and improve query planning performance across all connected engines.
Configurable settings
- Cron schedule
Learn more in Manifest Optimization docs
Remove Orphan Files
Detect and safely remove unreferenced data files older than the configured age threshold. Reclaims storage from failed writes, expired snapshots, and dropped tables.
Configurable settings
- Retention threshold (default: 7 days)
- Cron schedule
Learn more in Orphan Cleanup docs
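The two conditions for an orphan file, unreferenced and older than the threshold, can be sketched like this. It is an illustrative model only: the `find_orphans` function, its inputs, and the sample paths are hypothetical, and real cleanup would list object storage and read table metadata. The age threshold matters because a file written by an in-progress commit is not yet referenced by any snapshot and must not be deleted.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Set


def find_orphans(
    files_in_storage: Dict[str, datetime],  # path -> last-modified time
    referenced: Set[str],                   # paths reachable from table metadata
    now: datetime,
    threshold_days: int = 7,
) -> List[str]:
    """Return unreferenced files whose last-modified time is older than the threshold."""
    cutoff = now - timedelta(days=threshold_days)
    return sorted(
        path
        for path, modified in files_in_storage.items()
        if path not in referenced and modified < cutoff
    )


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
storage = {
    "data/a.parquet": now - timedelta(days=30),  # referenced, kept
    "data/b.parquet": now - timedelta(days=30),  # unreferenced and old, orphan
    "data/c.parquet": now - timedelta(hours=1),  # unreferenced but recent, kept
}
orphans = find_orphans(storage, referenced={"data/a.parquet"}, now=now)
print(orphans)  # ['data/b.parquet']
```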
Configuration & Governance (UI label: Configuration)
Configuration & Governance policies let you enforce table-level settings, format standards, and operational guardrails across your organization. Instead of relying on teams to manually configure each table, define rules once and apply them everywhere.
What you can enforce
- Iceberg format version (e.g. require v2 across all production catalogs)
- Default file format (Parquet, ORC, Avro)
- Write distribution mode (hash, range, none)
- Commit retry and isolation settings
- Naming conventions and metadata standards
Example use cases
- Standardize format version: ensure every table uses Iceberg v2 so all teams get row-level deletes, position deletes, and improved statistics.
- Enforce Parquet as default: prevent teams from accidentally creating ORC or Avro tables that break downstream tooling assumptions.
- Set write distribution mode: apply hash distribution across high-ingestion tables to prevent write hotspots and ensure balanced partition sizing.
- Governance for new tables: when a team creates a new table in a governed catalog, it automatically inherits the organization's configuration policy, with no manual setup required.
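A configuration policy of this kind amounts to comparing each table's properties against a set of required values. The sketch below assumes the rules are expressed with Apache Iceberg table property keys (`format-version`, `write.format.default`, `write.distribution-mode`); how LakeOps stores and evaluates these rules internally is not documented here, and the `violations` helper is hypothetical.

```python
from typing import Dict, List

# Required values for a hypothetical "production" configuration policy.
REQUIRED: Dict[str, str] = {
    "format-version": "2",
    "write.format.default": "parquet",
    "write.distribution-mode": "hash",
}


def violations(table_props: Dict[str, str], required: Dict[str, str]) -> List[str]:
    """List every required property that is missing or has a different value."""
    problems: List[str] = []
    for key, expected in required.items():
        actual = table_props.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    return problems


compliant = {
    "format-version": "2",
    "write.format.default": "parquet",
    "write.distribution-mode": "hash",
}
legacy = {"format-version": "1", "write.format.default": "orc"}

print(violations(compliant, REQUIRED))  # []
for problem in violations(legacy, REQUIRED):
    print(problem)
```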
Policy scope
Policies can be scoped at different levels of your data hierarchy:
| Scope | Applies to | Use case |
|---|---|---|
| Per-table | A single specific table | Custom settings for critical or unusual tables |
| Per-namespace | All tables in a namespace | Team or domain-level standards |
| Per-catalog | All tables in a catalog | Environment-level rules (prod, staging) |
| Organization-wide | All tables across all catalogs | Global hygiene (e.g. orphan cleanup everywhere) |
Precedence rules
More specific policies override broader ones. A per-table policy always takes precedence over a namespace, catalog, or organization-wide policy for the same operation type. This lets you set sensible defaults at the org level and override only where needed.
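The precedence rule can be sketched as picking, for each operation type, the applicable policy with the most specific scope. This is a minimal model under assumptions: tables are identified as `catalog.namespace.table` with a single-level namespace, and the `ScopedPolicy` type, scope labels, and policy names other than those used elsewhere on this page are hypothetical.

```python
from typing import Dict, List, NamedTuple

# Higher rank means more specific scope.
SCOPE_RANK = {"org": 0, "catalog": 1, "namespace": 2, "table": 3}


class ScopedPolicy(NamedTuple):
    name: str
    type: str
    scope_level: str  # one of the SCOPE_RANK keys
    target: str       # "" for org, "prod", "prod.sales", or "prod.sales.orders"


def applies(policy: ScopedPolicy, table: str) -> bool:
    """Check whether a policy's scope covers the given table identifier."""
    if policy.scope_level == "org":
        return True
    if policy.scope_level == "table":
        return policy.target == table
    # Catalog and namespace scopes match by dotted prefix.
    return table.startswith(policy.target + ".")


def effective_policies(policies: List[ScopedPolicy], table: str) -> Dict[str, ScopedPolicy]:
    """For each operation type, keep the most specific policy that applies."""
    winners: Dict[str, ScopedPolicy] = {}
    for p in policies:
        if not applies(p, table):
            continue
        current = winners.get(p.type)
        if current is None or SCOPE_RANK[p.scope_level] > SCOPE_RANK[current.scope_level]:
            winners[p.type] = p
    return winners


policies = [
    ScopedPolicy("org_orphan_cleanup", "Remove Orphan Files", "org", ""),
    ScopedPolicy("prod_daily_compaction", "Compact Data Files", "catalog", "prod"),
    ScopedPolicy("orders_compaction", "Compact Data Files", "table", "prod.sales.orders"),
]

result = effective_policies(policies, "prod.sales.orders")
print({op: p.name for op, p in result.items()})
```

For `prod.sales.orders` the table-level compaction policy wins over the catalog-level one, while orphan cleanup still comes from the organization-wide default.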
Global Policies screen
Navigate to Manage > Policies in the sidebar to access the central policy management screen. This is where you create, search, filter, and manage all policies across your organization.
Screen layout
| Element | Description |
|---|---|
| + Create Policy | Opens a form to define a new policy (name, type, scope, schedule, settings) |
| Search bar | Filter policies by name or description |
| Type filter | Filter by policy type (All Types, Compact Data Files, Expire Snapshots, etc.) |
| Status filter | Filter by enabled/disabled status (All Status, Enabled, Disabled) |
Policy table columns
| Column | Description |
|---|---|
| Status | Toggle switch to enable or disable the policy instantly |
| Policy | Policy name and optional description (e.g. “For all tables in all catalogs every 7 days”) |
| Type | Color-coded badge showing the policy type |
| Next Run | When the policy will next execute (based on cron schedule) |
| Last Run | Timestamp of the most recent execution |
| Updated | When the policy configuration was last modified |
| Actions | Edit (pencil icon) and Delete (trash icon) buttons |
Creating a policy
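Click + Create Policy and complete the form: name (e.g. prod_daily_compaction), type, scope, schedule, and type-specific settings.
Per-table policy assignment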
You can also view and manage policies from the perspective of a single table. Navigate to Explore, select a table, then open the Policies tab.
What you see
The per-table Policies tab shows all policies currently assigned to the selected table, including inherited policies from namespace, catalog, or organization scope. Each row shows:
| Column | Description |
|---|---|
| Status | Toggle to enable/disable the policy for this table |
| Policy | Policy name |
| Type | Color-coded type badge |
| Next Run | Next scheduled execution |
| Last Run | Most recent execution timestamp |
Assigning a policy
Click + Assign Policy to link an existing policy to this table. You can assign multiple policies of different types to the same table (e.g. one compaction policy plus one snapshot expiration policy).
Example: typical policy set
A production table typically has multiple policies covering different optimization operations:
| Policy | Type | Schedule |
|---|---|---|
| prod_daily_compaction | Compact Data Files | Daily at 2:00 AM |
| prod_expire_snapshots | Expire Snapshots | Hourly |
| prod_rewrite_manifests | Rewrite Manifests | Daily at 4:00 AM |
| org_orphan_cleanup | Remove Orphan Files | Daily at 3:00 AM |
Compaction and manifest rewrites run daily to keep file layout optimal. Snapshot expiration runs hourly to prevent metadata bloat. Orphan cleanup runs daily at the org level to catch stragglers across all catalogs.
Policies vs. per-table Optimization tab
Both policies and the per-table Optimization tab configure the same underlying operations. The difference is scope and management:
Policies
- Managed centrally from the Policies screen
- Apply to many tables at once
- Versioned with full audit trail
- New tables inherit automatically
- Best for org-wide standards
Per-table Optimization tab
- Configured on individual tables
- Quick setup for one-off cases
- Override policy defaults when needed
- Includes Simulate button for preview
- Best for table-specific tuning
A common pattern: set organization-wide policies for baseline hygiene, then use the per-table Optimization tab to override settings on tables that need special treatment.
Scheduling best practices
Policies use cron expressions to control execution timing. Consider these guidelines:
- Stagger schedules: avoid running compaction, snapshot expiration, and manifest rewrites at the same time. Spread them across different hours.
- Run during low-traffic windows: schedule heavy operations like compaction during off-peak hours to minimize impact on query workloads.
- Order matters: run snapshot expiration before orphan cleanup so that expired data files become detectable as orphans.
- Start conservative: begin with longer intervals and tighten as you observe results.
Recommended schedule
| Operation | Frequency | Cron (sec min hour dom mon dow) |
|---|---|---|
| Compact Data Files | Daily | 0 0 2 * * * |
| Expire Snapshots | Hourly | 0 0 * * * * |
| Rewrite Manifests | Daily | 0 0 4 * * * |
| Remove Orphan Files | Daily | 0 0 3 * * * |
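The staggering and ordering guidelines can be checked mechanically before schedules are saved. The sketch below is a hypothetical lint, not a LakeOps feature: it takes the start hour of each daily operation (2, 3, and 4 for the daily operations recommended above) and reports collisions, plus the case where a daily snapshot expiration would start no earlier than orphan cleanup.

```python
from typing import Dict, List


def schedule_warnings(daily_hours: Dict[str, int]) -> List[str]:
    """Flag daily operations that share a start hour or run in the wrong order."""
    warnings: List[str] = []

    # Stagger check: group operations by start hour.
    by_hour: Dict[int, List[str]] = {}
    for operation, hour in daily_hours.items():
        by_hour.setdefault(hour, []).append(operation)
    for hour, operations in sorted(by_hour.items()):
        if len(operations) > 1:
            warnings.append(f"{', '.join(sorted(operations))} all start at {hour:02d}:00")

    # Order check: only relevant when snapshot expiration is also scheduled daily.
    expire = daily_hours.get("Expire Snapshots")
    orphan = daily_hours.get("Remove Orphan Files")
    if expire is not None and orphan is not None and expire >= orphan:
        warnings.append("Expire Snapshots should start before Remove Orphan Files")
    return warnings


recommended = {
    "Compact Data Files": 2,
    "Remove Orphan Files": 3,
    "Rewrite Manifests": 4,
}
print(schedule_warnings(recommended))  # []

clashing = {"Compact Data Files": 2, "Rewrite Manifests": 2}
print(schedule_warnings(clashing))
```

Hourly operations such as snapshot expiration are left out of the collision check because they overlap every hour by design.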
Auditing & versioning
Every policy change is tracked with full audit history:
- Version history: see when a policy was created, modified, enabled, or disabled, and by whom
- Execution log: every policy run is recorded in the Events tab for each affected table, showing the operation performed, duration, and impact
- Updated timestamp: the global Policies table shows when each policy was last modified
Monitoring policy execution
Track policy health and impact through:
- Global Policies screen: Next Run and Last Run columns show scheduling health at a glance
- Events tab (per-table): detailed log of every operation executed by the policy, including before/after metrics
- Insights tab (per-table): warnings that should resolve after policy-driven optimizations take effect
- Dashboard: aggregated operations count and cost savings reflect the cumulative impact of your policies
