17 March 2025

🏭🗒️Microsoft Fabric: Z-Order [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 17-Mar-2024

[Microsoft Fabric] Z-Order
  • {def} technique to collocate related information in the same set of files [2]
    • ⇐ reorganizes the layout of each data file so that similar column values are strategically collocated near one another for maximum efficiency [1]
    • {benefit} efficient query performance
      • reduces the amount of data to read [2] for certain queries
        • when the data is appropriately ordered, more files can be skipped [3]
        • particularly important for the ordering of multiple columns [3]
    • {benefit} data skipping
      • automatically skips irrelevant data, further enhancing query speeds
        • via data-skipping algorithms [2]
    • {benefit} flexibility
      • can be applied to multiple columns, making it versatile for various data schemas
    • aims to produce evenly-balanced data files with respect to the number of tuples
      • ⇐ but not necessarily data size on disk [2]
        • ⇐ the two measures are most often correlated [2]
          • ⇐ but there can be situations when that is not the case, leading to skew in optimize task times [2]
    • via ZORDER BY clause 
      • applicable to columns with high cardinality commonly used in query predicates [2]
      • multiple columns can be specified as a comma-separated list
        • {warning} the effectiveness of the locality drops with each extra column [2]
          • has tradeoffs
            • it’s important to analyze query patterns and select the right columns when Z Ordering data [3]
        • {warning} using columns that do not have statistics collected on them is  ineffective and wastes resources [2] 
          • statistics collection can be configured on certain columns by reordering columns in the schema, or by increasing the number of columns to collect statistics on [2]
      • {characteristic} not idempotent
        • every time is executed, it will try to create a new clustering of data in all files in a partition [2]
          • it includes new and existing files that were part of previous z-ordering [2]
      • compatible with v-order
    • {concept} [Databricks] liquid clustering 
      • replaces table partitioning and ZORDER to simplify data layout decisions and optimize query performance [4] [6]
        • not compatible with the respective features [4] [6]
      • tables created with liquid clustering enabled have numerous Delta table features enabled at creation [4] [6]
      • provides flexibility to redefine clustering keys without rewriting existing data [4] [6]
        • ⇒ allows data layout to evolve alongside analytic needs over time [4] [6]
      • applies to 
        • streaming tables 
        • materialized views
      • {scenario} tables often filtered by high cardinality columns [4] [6]
      • {scenario} tables with significant skew in data distribution [4] [6]
      • {scenario} tables that grow quickly and require maintenance and tuning effort [4] [6]
      • {scenario} tables with concurrent write requirements [4] [6]
      • {scenario} tables with access patterns that change over time [4] [6]
      • {scenario} tables where a typical partition key could leave the table with too many or too few partitions [4] [6]

    References:
    [1] Bennie Haelen & Dan Davis (2024) Delta Lake Up & Running: Modern Data Lakehouse Architectures with Delta Lake
    [2] Delta Lake (2023) Optimizations [link]
    [3] Delta Lake (2023) Delta Lake Z Order, by Matthew Powers [link]
    [4] Delta Lake (2025) Use liquid clustering for Delta tables [link]
    [5] Databricks (2025) Delta Lake table format interoperability [link]
    [6] Microsoft Learn (2025) Use liquid clustering for Delta tables [link]

    Resources:
    [R1] Azure Guru (2024) Z Order in Delta Lake - Part 1 [link]
    [R2] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

    Acronyms:
    MF - Microsoft Fabric 

    No comments:

    Related Posts Plugin for WordPress, Blogger...

    About Me

    My photo
    Koeln, NRW, Germany
    IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.