17 March 2025

🏭🗒️Microsoft Fabric: V-Order [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 17-Mar-2024

[Microsoft Fabric] V-Order
  • {def} write time optimization to the parquet file format that enables fast reads under the MF compute engine [2]
    • all parquet engines can read the files as regular parquet files [2]
    • results in a smaller and therefore faster files to read [5]
      • {benefit} improves read performance 
      • {benefit} decreases storage requirements
      • {benefit} optimizes resources' usage
        • reduces the compute resources required for reading data
          • e.g. network bandwidth, disk I/O, CPU usage
    • still conforms to the open-source Parquet file format [5]
      • they can be read by non-Fabric tools [5]
    • delta tables created and loaded by Fabric items automatically apply V-Order
      • e.g. data pipelines, dataflows, notebooks [5]
    • delta tables and its features are orthogonal to V-Order [2]
      •  e.g. Z-Order, compaction, vacuum, time travel
      • table properties and optimization commands can be used to control the v-order of the partitions [2]
    • compatible with Z-Order [2]
    • not all files have this optimization applied [5]
      • e.g. Parquet files uploaded to a Fabric lakehouse, or that are referenced by a shortcut 
      • the files can still be read, the read performance likely won't be as fast as an equivalent Parquet file that's had V-Order applied [5]
    • required by certain features
      • [hash encoding] to assign a numeric identifier to each unique value contained in the column [5]
    • {command} OPTIMIZE 
      • optimizes a Delta table to coalesce smaller files into larger ones [5]
      • can apply V-Order to compact and rewrite the Parquet files [5]
  • [warehouse] 
    • works by applying certain operations on Parquet files
      • special sorting
      • row group distribution
      • dictionary encoding
      • compression 
    • enabled by default
    •  ⇒ compute engines require less network, disk, and CPU resources to read data from storage [1]
      • provides cost efficiency and performance [1]
        • the effect of V-Order on performance can vary depending on tables' schemas, data volumes, query, and ingestion patterns [1]
      • fully-compliant to the open-source parquet format [1]
        • ⇐ all parquet engines can read it as regular parquet files [1]
    • required by certain features
      • [Direct Lake mode] depends on V-Order
    • {operation} disable V-Order
      • causes any new Parquet files produced by the warehouse engine to be created without V-Order optimization [3]
      • irreversible operation
        •  once disabled, it cannot be enabled again [3]
      • {scenario} write-intensive warehouses
        • warehouses dedicated to staging data as part of a data ingestion process [1]
      • {warning} consider the effect of V-Order on performance before deciding to disable it [1]
        • {recommendation} test how V-Order affects the performance of data ingestion and queries before deciding to disable it [1]
      • via ALTER DATABASE CURRENT SET VORDER = OFF; [3]
    • {operation} check current status
      • via  SELECT name, is_vorder_enabled FROM sys.databases; [post]
  • {feature} [lakehouse] Load to Table
    • allows to load a single file or a folder of files to a table [6]
    • tables are always loaded using the Delta Lake table format with V-Order optimization enabled [6]
  • [Direct Lake semantic model] 
    • data is prepared for fast loading into memory [5]
      • makes less demands on capacity resources [5]
      • results in faster query performance [5]
        • because less memory needs to be scanned [5]

References:
[1] Microsoft Learn (2024) Fabric: Understand V-Order for Microsoft Fabric Warehouse [link]
[2] Microsoft Learn (2024) Delta Lake table optimization and V-Order [link]
[3] Microsoft Learn (2024) Disable V-Order on Warehouse in Microsoft Fabric [link]
[4] Miles Cole (2024) To V-Order or Not: Making the Case for Selective Use of V-Order in Fabric Spark [link]
[5] Microsoft Learn (2024) Understand storage for Direct Lake semantic models [link]
[6] Microsoft Learn (2025] Fabric: Load to Delta Lake table [link]

Resources:
[R1] Serverless.SQL (2024) Performance Analysis of V-Ordering in Fabric Warehouse: On or Off?, by Andy Cutler [link]
[R2] Redgate (2023 Microsoft Fabric: Checking and Fixing Tables V-Order Optimization, by Dennes Torres [link]
[R3] Sandeep Pawar (2023) Checking If Delta Table in Fabric is V-order Optimized [link]
[R4] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
MF - Microsoft Fabric

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.