The most performant, cost-effective lakehouse is one that optimizes itself as data volumes, query patterns, and organizational usage continue to evolve. Predictive Optimization (PO) in Unity Catalog enables this behavior by continuously analyzing how data is written and queried, then applying the appropriate maintenance actions automatically without requiring manual work from users or platform teams. In 2025, Predictive Optimization moved from an optional automation feature to the default platform behavior, managing performance and storage efficiency across millions of production tables while removing the operational burden traditionally associated with table tuning. Here’s a look at the milestones that got us here, and what’s coming next in 2026.
Throughout 2025, Predictive Optimization saw rapid adoption across the Databricks Platform as customers increasingly relied on autonomous maintenance to manage a growing data estate:
Based on consistent performance improvements observed at this scale, Predictive Optimization is now enabled by default for all new Unity Catalog managed tables, workspaces, and accounts.
Predictive Optimization (PO) functions as the platform intelligence layer for the lakehouse, continuously optimizing your data layout, reducing storage footprint, and maintaining the precise file statistics required for efficient query planning on UC managed tables.
Based on observed usage patterns, PO automatically determines when and how to run maintenance commands such as OPTIMIZE, VACUUM, and ANALYZE.
All optimization decisions are workload-driven and adaptive, eliminating the need to manage schedules, tune parameters, or revisit optimization strategies as query patterns change.
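For context, here is a minimal sketch of the maintenance statements PO issues on your behalf; the three-part table name is illustrative, and with PO enabled none of these need to be scheduled manually:

```sql
-- Illustrative maintenance that Predictive Optimization schedules automatically
-- (the table name is hypothetical; do not hand-schedule these when PO is enabled).
OPTIMIZE main.sales.orders;                          -- compact small files and recluster data
VACUUM main.sales.orders;                            -- remove unreferenced data files
ANALYZE TABLE main.sales.orders COMPUTE STATISTICS;  -- refresh table-level statistics
```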
Accurate statistics are critical for building efficient query plans, yet manually managing statistics becomes increasingly impractical as data volume and query diversity grow.
With Automatic Statistics (now generally available), Predictive Optimization determines which columns matter based on observed query behavior and ensures that statistics remain up to date without manual ANALYZE commands.
Statistics are maintained through two complementary mechanisms: statistics are computed inline as new data is written, and Predictive Optimization runs background ANALYZE operations to cover existing data and keep statistics fresh as tables change.
Across real customer production workloads, this approach delivered up to 22% faster queries while removing the operational cost of manual statistics management.
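For comparison, this is the kind of manual, column-level statistics maintenance that Automatic Statistics replaces; the table and column names below are illustrative:

```sql
-- Manual equivalent that Automatic Statistics makes unnecessary:
-- compute statistics only for the columns queries actually filter and join on.
ANALYZE TABLE main.sales.orders
COMPUTE STATISTICS FOR COLUMNS order_date, customer_id;
```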
VACUUM plays a critical role in managing storage costs and compliance by deleting unreferenced data files. Standard vacuuming requires listing all files in a table directory to identify candidates for removal, an operation that can take over 40 minutes for tables with 10 million files.
Predictive Optimization now applies an optimized VACUUM execution path that leverages the Delta transaction log to identify removable files directly, avoiding costly directory listings whenever possible.
At scale, this resulted in:
The engine dynamically determines when to use this log-based approach and when to perform a full directory scan to clean up fragments from aborted transactions.
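For reference, the operation being accelerated is the same VACUUM you could run by hand; the table name and retention window below are illustrative, and PO picks the execution path on its own:

```sql
-- Manual VACUUM shown for comparison (illustrative table name and retention).
-- The standard path lists files in the table directory; PO's optimized path
-- derives removable files from the Delta transaction log when it safely can.
VACUUM main.sales.orders RETAIN 168 HOURS;
```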
Automatic Liquid Clustering reached general availability in 2025 and is already optimizing millions of tables in production.
The process is entirely workload-driven: Predictive Optimization analyzes your query history, identifies the clustering columns that best match the filters and joins it observes, and incrementally re-clusters data during its background maintenance runs.
You get faster queries with zero manual tuning. By automatically analyzing workloads and applying the optimal data layout, PO removes the complex task of clustering key selection and ensures your tables remain highly performant as your query patterns evolve.
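If you want to opt a table into automatic key selection explicitly, the CLUSTER BY AUTO clause is the relevant surface; the table names and columns below are illustrative:

```sql
-- Opt an existing table into Automatic Liquid Clustering (illustrative name).
ALTER TABLE main.sales.orders CLUSTER BY AUTO;

-- Or create a new table that lets PO choose and evolve clustering keys from day one.
CREATE TABLE main.sales.orders_v2 (
  order_id    BIGINT,
  order_date  DATE,
  customer_id BIGINT
) CLUSTER BY AUTO;
```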
Predictive Optimization has expanded beyond traditional tables to cover a broader set of assets across the Databricks Platform.
This ensures autonomous maintenance across your full data estate rather than isolated optimization of individual tables.
We’re committed to delivering features that replace manual table tuning with automated maintenance. In parallel, we’re planning to extend beyond physical table health to address total data lifecycle intelligence—automated storage cost savings, data lifecycle management, and row deletion. We are also prioritizing enhanced observability, integrating Predictive Optimization insights into common table operations and the Governance Hub to provide clearer visibility into PO operations and their ROI.
Managing data retention or controlling storage costs is a critical, yet often manual, task. We're excited to introduce Auto-TTL, a new Predictive Optimization capability that completely automates row deletion. Using this feature, you’ll be able to set a simple time-to-live policy directly on any UC managed table using a command like:
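The exact DDL is still being finalized in Private Preview, so the sketch below only illustrates the intent; the TTL clause, table name, and column are all hypothetical:

```sql
-- Hypothetical Auto-TTL policy (illustrative syntax; the Private Preview DDL may differ).
-- Intent: rows whose event_date is older than 90 days become eligible for deletion.
ALTER TABLE main.analytics.web_events
  SET TTL INTERVAL 90 DAYS ON event_date;
```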
Once the policy is set, Predictive Optimization takes care of the rest. It automates the entire two-step process by first running a DELETE operation to soft-delete the expired rows, and then following up with a VACUUM to permanently remove them from physical storage.
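Conceptually, that automation covers the same ground as this manual two-step pattern; the table, column, and retention window are illustrative:

```sql
-- Manual equivalent of what Auto-TTL automates (illustrative names).
-- Step 1: soft-delete expired rows in the Delta transaction log.
DELETE FROM main.analytics.web_events
WHERE event_date < date_sub(current_date(), 90);

-- Step 2: permanently remove the underlying data files once they fall
-- outside the table's retention window.
VACUUM main.analytics.web_events;
```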
Reach out to your account team today to try this in Private Preview!
Improved Predictive Optimization Observability
You will be able to track the direct impact and ROI of Predictive Optimization in the new Data Governance Hub. This out-of-the-box observability dashboard will provide a centralized view into PO's operations, surfacing key metrics that quantify its value.
Use this to see exactly what PO is doing under the hood, with clear visualizations for bytes compacted, bytes clustered by Liquid Clustering, bytes vacuumed, and bytes analyzed. Most importantly, the hub translates these actions into direct business value by showing your estimated storage cost savings, making it easier than ever to understand and communicate the impact PO is having on both your storage costs and query performance.
In DESCRIBE EXTENDED, you will also be able to see the reasons that Predictive Optimization skipped optimization (e.g., the table is already well clustered, or too small to benefit from compaction).
Furthermore, we’ve added the ability to see the column selections made for data skipping and Automatic Liquid Clustering in the PO system table.
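For a sense of what that looks like, PO activity can already be inspected through the existing system table; the query below is a sketch, and the column-selection fields mentioned above will surface there as the feature rolls out (verify column names against your workspace):

```sql
-- Recent Predictive Optimization activity for one table (illustrative filter).
SELECT start_time, operation_type, operation_status, operation_metrics
FROM system.storage.predictive_optimization_operations_history
WHERE table_name = 'orders'
ORDER BY start_time DESC
LIMIT 20;
```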
Reach out to your account team today to try the Data Governance Hub in Private Preview!
Improved Table-level Storage Observability
To provide greater clarity into your storage footprint, we will introduce enhanced observability features for Predictive Optimization. You will be able to monitor the health and evolution of your tables through high-level metrics like file counts and storage growth. By surfacing these insights directly, we’re making it easier to visualize the impact of automated maintenance and identify new opportunities to reduce costs and streamline your data estate.
Predictive Optimization is available today for Unity Catalog managed tables and is enabled by default for new workloads.
When enabled, customers automatically benefit from faster VACUUM execution, workload-aware Automatic Statistics, and autonomous data layout through Automatic Liquid Clustering.
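If you need to confirm or explicitly control the setting, the catalog- and schema-level DDL looks like this; the catalog and schema names are illustrative:

```sql
-- Explicitly enable Predictive Optimization for a catalog (illustrative name),
-- or have a schema inherit its catalog's setting.
ALTER CATALOG main ENABLE PREDICTIVE OPTIMIZATION;
ALTER SCHEMA main.sales INHERIT PREDICTIVE OPTIMIZATION;
```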
You can also explore Auto-TTL and Predictive Optimization observability (Data Governance Hub) through Private Preview by reaching out to your account team.