Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!
Last updated: 17-Mar-2024
[Microsoft Fabric] V-Order- {def} write time optimization to the parquet file format that enables fast reads under the MF compute engine [2]
- all parquet engines can read the files as regular parquet files [2]
- results in a smaller and therefore faster files to read [5]
- {benefit} improves read performance
- {benefit} decreases storage requirements
- {benefit} optimizes resources' usage
- reduces the compute resources required for reading data
- e.g. network bandwidth, disk I/O, CPU usage
- still conforms to the open-source Parquet file format [5]
- they can be read by non-Fabric tools [5]
- delta tables created and loaded by Fabric items automatically apply V-Order
- e.g. data pipelines, dataflows, notebooks [5]
- delta tables and its features are orthogonal to V-Order [2]
- e.g. Z-Order, compaction, vacuum, time travel
- table properties and optimization commands can be used to control the v-order of the partitions [2]
- compatible with Z-Order [2]
- not all files have this optimization applied [5]
- e.g. Parquet files uploaded to a Fabric lakehouse, or that are referenced by a shortcut
- the files can still be read, the read performance likely won't be as fast as an equivalent Parquet file that's had V-Order applied [5]
- required by certain features
- [hash encoding] to assign a numeric identifier to each unique value contained in the column [5]
- {command} OPTIMIZE
- optimizes a Delta table to coalesce smaller files into larger ones [5]
- can apply V-Order to compact and rewrite the Parquet files [5]
- [warehouse]
- works by applying certain operations on Parquet files
- special sorting
- row group distribution
- dictionary encoding
- compression
- enabled by default
- ⇒ compute engines require less network, disk, and CPU resources to read data from storage [1]
- provides cost efficiency and performance [1]
- the effect of V-Order on performance can vary depending on tables' schemas, data volumes, query, and ingestion patterns [1]
- fully-compliant to the open-source parquet format [1]
- ⇐ all parquet engines can read it as regular parquet files [1]
- required by certain features
- [Direct Lake mode] depends on V-Order
- {operation} disable V-Order
- causes any new Parquet files produced by the warehouse engine to be created without V-Order optimization [3]
- irreversible operation
- once disabled, it cannot be enabled again [3]
- {scenario} write-intensive warehouses
- warehouses dedicated to staging data as part of a data ingestion process [1]
- {warning} consider the effect of V-Order on performance before deciding to disable it [1]
- {recommendation} test how V-Order affects the performance of data ingestion and queries before deciding to disable it [1]
- via ALTER DATABASE CURRENT SET VORDER = OFF; [3]
- {operation} check current status
- via SELECT name, is_vorder_enabled FROM sys.databases; [post]
- {feature} [lakehouse] Load to Table
- allows to load a single file or a folder of files to a table [6]
- tables are always loaded using the Delta Lake table format with V-Order optimization enabled [6]
- [Direct Lake semantic model]
- data is prepared for fast loading into memory [5]
- makes less demands on capacity resources [5]
- results in faster query performance [5]
- because less memory needs to be scanned [5]
References:
[1] Microsoft Learn (2024) Fabric: Understand V-Order for Microsoft Fabric Warehouse [link]
[2] Microsoft Learn (2024) Delta Lake table optimization and V-Order [link]
[3] Microsoft Learn (2024) Disable V-Order on Warehouse in Microsoft Fabric [link]
[4] Miles Cole (2024) To V-Order or Not: Making the Case for Selective Use of V-Order in Fabric Spark [link]
[5] Microsoft Learn (2024) Understand storage for Direct Lake semantic models [link]
[1] Microsoft Learn (2024) Fabric: Understand V-Order for Microsoft Fabric Warehouse [link]
[2] Microsoft Learn (2024) Delta Lake table optimization and V-Order [link]
[3] Microsoft Learn (2024) Disable V-Order on Warehouse in Microsoft Fabric [link]
[4] Miles Cole (2024) To V-Order or Not: Making the Case for Selective Use of V-Order in Fabric Spark [link]
[5] Microsoft Learn (2024) Understand storage for Direct Lake semantic models [link]
[6] Microsoft Learn (2025] Fabric: Load to Delta Lake table [link]
Resources:
[R1] Serverless.SQL (2024) Performance Analysis of V-Ordering in Fabric Warehouse: On or Off?, by Andy Cutler [link]
[R2] Redgate (2023 Microsoft Fabric: Checking and Fixing Tables V-Order Optimization, by Dennes Torres [link]
[R3] Sandeep Pawar (2023) Checking If Delta Table in Fabric is V-order Optimized [link]
[R1] Serverless.SQL (2024) Performance Analysis of V-Ordering in Fabric Warehouse: On or Off?, by Andy Cutler [link]
[R2] Redgate (2023 Microsoft Fabric: Checking and Fixing Tables V-Order Optimization, by Dennes Torres [link]
[R3] Sandeep Pawar (2023) Checking If Delta Table in Fabric is V-order Optimized [link]
[R4] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
Acronyms:
MF - Microsoft Fabric
MF - Microsoft Fabric
No comments:
Post a Comment