Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!
Last updated: 17-Mar-2024
[Microsoft Fabric] Z-Order- {def} technique to collocate related information in the same set of files [2]
- ⇐ reorganizes the layout of each data file so that similar column values are strategically collocated near one another for maximum efficiency [1]
- {benefit} efficient query performance
- reduces the amount of data to read [2] for certain queries
- when the data is appropriately ordered, more files can be skipped [3]
- particularly important for the ordering of multiple columns [3]
- {benefit} data skipping
- automatically skips irrelevant data, further enhancing query speeds
- via data-skipping algorithms [2]
- {benefit} flexibility
- can be applied to multiple columns, making it versatile for various data schemas
- aims to produce evenly-balanced data files with respect to the number of tuples
- ⇐ but not necessarily data size on disk [2]
- ⇐ the two measures are most often correlated [2]
- ⇐ but there can be situations when that is not the case, leading to skew in optimize task times [2]
- via ZORDER BY clause
- applicable to columns with high cardinality commonly used in query predicates [2]
- multiple columns can be specified as a comma-separated list
- {warning} the effectiveness of the locality drops with each extra column [2]
- has tradeoffs
- it’s important to analyze query patterns and select the right columns when Z Ordering data [3]
- {warning} using columns that do not have statistics collected on them is ineffective and wastes resources [2]
- statistics collection can be configured on certain columns by reordering columns in the schema, or by increasing the number of columns to collect statistics on [2]
- {characteristic} not idempotent
- every time is executed, it will try to create a new clustering of data in all files in a partition [2]
- it includes new and existing files that were part of previous z-ordering [2]
- compatible with v-order
- {concept} [Databricks] liquid clustering
- replaces table partitioning and ZORDER to simplify data layout decisions and optimize query performance [4] [6]
- not compatible with the respective features [4] [6]
- tables created with liquid clustering enabled have numerous Delta table features enabled at creation [4] [6]
- provides flexibility to redefine clustering keys without rewriting existing data [4] [6]
- ⇒ allows data layout to evolve alongside analytic needs over time [4] [6]
- applies to
- streaming tables
- materialized views
- {scenario} tables often filtered by high cardinality columns [4] [6]
- {scenario} tables with significant skew in data distribution [4] [6]
- {scenario} tables that grow quickly and require maintenance and tuning effort [4] [6]
- {scenario} tables with concurrent write requirements [4] [6]
- {scenario} tables with access patterns that change over time [4] [6]
- {scenario} tables where a typical partition key could leave the table with too many or too few partitions [4] [6]
[4] Delta Lake (2025) Use liquid clustering for Delta tables [link]
[5] Databricks (2025) Delta Lake table format interoperability [link]
[6] Microsoft Learn (2025) Use liquid clustering for Delta tables [link]
[R1] Azure Guru (2024) Z Order in Delta Lake - Part 1 [link]
MF - Microsoft Fabric