18 March 2025

🏭🗒️Microsoft Fabric: Statistics in Warehouse [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 18-Mar-2025

[Microsoft Fabric] Statistics
  • {def} objects that contain relevant information about data, to allow query optimizer to estimate plans' costs [1]
    • critical for the warehouse and lakehouse SQL endpoint for executing queries quickly and efficiently [3]
      • when a query is executed, the engine tries to collect existing statistics for certain columns in the query and use that information to assist in choosing an optimal execution plan [3]
      • inaccurate statistics can lead to unoptimized query plans and execution times [5]
  • {type} user-defined statistics
    • statistics defined manually by the users via DDL statement [1]
    • users can create, update and drop statistics (see the example after these notes)
      • via CREATE|UPDATE|DROP STATISTICS
    • users can review the contents of histogram-based single-column statistics [1]
      • via DBCC SHOW_STATISTICS
        • ⇐ only a limited version of these statements is supported [1]
    • {recommendation} focus on columns heavily used in query workloads
      • e.g. GROUP BYs, ORDER BYs, filters, and JOINs
    • {recommendation} consider updating column-level statistics regularly [1]
      • e.g. after data changes that significantly change rowcount or distribution of the data [1]
  • {type} automatic statistics
    • statistics created and maintained automatically by the query engine at query time [1]
    • when a query is issued and query optimizer requires statistics for plan exploration, MF automatically creates those statistics if they don't already exist [1]
      • then the query optimizer can utilize them in estimating the plan costs of the triggering query [1]
      • if the query engine determines that existing statistics relevant to query no longer accurately reflect the data, those statistics are automatically refreshed [1]
        • these automatic operations are done synchronously [1]
          • the query duration includes this time [1]
  • {object type} histogram statistics
    • created per column needing histogram statistics at query time [1]
    • contains histogram and density information regarding the distribution of a particular column [1]
    • similar to the statistics automatically created at query-time in Azure Synapse Analytics dedicated pools [1]
    • name begins with _WA_Sys_.
    • contents can be viewed with DBCC SHOW_STATISTICS
  • {object type} average column length statistics
    • created for variable character columns (varchar) with length greater than 100 that need average column length at query-time [1]
    • contain a value representing the average row size of the varchar column at the time of statistics creation [1]
    • name begins with ACE-AverageColumnLength_
    • contents cannot be viewed and are nonactionable by users [1]
  • {object type} table-based cardinality statistics
    • created per table needing cardinality estimation at query-time [1]
    • contain an estimate of the rowcount of a table [1]
    • named ACE-Cardinality [1]
    • contents cannot be viewed and are nonactionable by users [1]
  • [lakehouse] SQL analytics endpoint
    • uses the same engine as the warehouse to serve high performance, low latency SQL queries [4]
    • {feature} automatic metadata discovery
      • a seamless process reads the delta logs and the files folder, and ensures that the SQL metadata for tables is always up to date [4]
        • e.g. statistics [4]
  • {limitation} only single-column histogram statistics can be manually created and modified [1]
  • {limitation} multi-column statistics creation is not supported [1]
  • {limitation} other statistics objects might appear in sys.stats
    • besides the statistics created manually/automatically [1]
      • ⇐ the objects are not used for query optimization [1]
  • {limitation} if a transaction inserts data into an empty table and issues a SELECT before rolling back, the automatically generated statistics can still reflect the uncommitted data, causing inaccurate statistics [5]
    • {recommendation} update statistics for the columns mentioned in the SELECT [5]
  • {recommendation} ensure all table statistics are updated after large DML transactions [2]
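
A minimal sketch of the statements mentioned above, assuming a warehouse table dbo.Sales whose CustomerId column is heavily used in JOINs and filters (the table, column and statistics names are illustrative):

-- create single-column histogram statistics on a frequently used column
CREATE STATISTICS stats_sales_customer_id
ON dbo.Sales (CustomerId) WITH FULLSCAN;

-- refresh the statistics after significant changes in rowcount or distribution
UPDATE STATISTICS dbo.Sales (stats_sales_customer_id) WITH FULLSCAN;

-- review the histogram and density information
DBCC SHOW_STATISTICS ('dbo.Sales', 'stats_sales_customer_id');

-- review the statistics objects available for the table (manual and automatic)
SELECT OBJECT_NAME(s.object_id) table_name
, s.name stats_name
, s.auto_created
, s.user_created
FROM sys.stats s
WHERE s.object_id = OBJECT_ID('dbo.Sales');

-- drop the statistics when no longer needed
DROP STATISTICS dbo.Sales.stats_sales_customer_id;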


References:
[1] Microsoft Learn (2025) Fabric: Statistics in Fabric data warehousing [link]
[2] Microsoft Learn (2025) Fabric: Troubleshoot the Warehouse [link]
[3] Microsoft Fabric Updates Blog (2023) Microsoft Fabric July 2023 Update [link]
[4] Microsoft Learn (2024) Fabric: Better together: the lakehouse and warehouse [link]
[5] Microsoft Learn (2024) Fabric: Transactions in Warehouse tables in Microsoft Fabric [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
DDL - Data Definition Language
MF - Microsoft Fabric

17 March 2025

🏭🗒️Microsoft Fabric: Z-Order [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 17-Mar-2025

[Microsoft Fabric] Z-Order
  • {def} technique to collocate related information in the same set of files [2]
    • ⇐ reorganizes the layout of each data file so that similar column values are strategically collocated near one another for maximum efficiency [1]
    • {benefit} efficient query performance
      • reduces the amount of data to read [2] for certain queries
        • when the data is appropriately ordered, more files can be skipped [3]
        • particularly important for the ordering of multiple columns [3]
    • {benefit} data skipping
      • automatically skips irrelevant data, further enhancing query speeds
        • via data-skipping algorithms [2]
    • {benefit} flexibility
      • can be applied to multiple columns, making it versatile for various data schemas
    • aims to produce evenly-balanced data files with respect to the number of tuples
      • ⇐ but not necessarily data size on disk [2]
        • ⇐ the two measures are most often correlated [2]
          • ⇐ but there can be situations when that is not the case, leading to skew in optimize task times [2]
    • via the ZORDER BY clause (see the example after these notes)
      • applicable to columns with high cardinality commonly used in query predicates [2]
      • multiple columns can be specified as a comma-separated list
        • {warning} the effectiveness of the locality drops with each extra column [2]
          • has tradeoffs
            • it’s important to analyze query patterns and select the right columns when Z-Ordering data [3]
        • {warning} using columns that do not have statistics collected on them is ineffective and wastes resources [2]
          • statistics collection can be configured on certain columns by reordering columns in the schema, or by increasing the number of columns to collect statistics on [2]
      • {characteristic} not idempotent
        • every time it is executed, it will try to create a new clustering of data in all files in a partition [2]
          • it includes new and existing files that were part of previous z-ordering [2]
      • compatible with V-Order
    • {concept} [Databricks] liquid clustering 
      • replaces table partitioning and ZORDER to simplify data layout decisions and optimize query performance [4] [6]
        • not compatible with the respective features [4] [6]
      • tables created with liquid clustering enabled have numerous Delta table features enabled at creation [4] [6]
      • provides flexibility to redefine clustering keys without rewriting existing data [4] [6]
        • ⇒ allows data layout to evolve alongside analytic needs over time [4] [6]
      • applies to 
        • streaming tables 
        • materialized views
      • {scenario} tables often filtered by high cardinality columns [4] [6]
      • {scenario} tables with significant skew in data distribution [4] [6]
      • {scenario} tables that grow quickly and require maintenance and tuning effort [4] [6]
      • {scenario} tables with concurrent write requirements [4] [6]
      • {scenario} tables with access patterns that change over time [4] [6]
      • {scenario} tables where a typical partition key could leave the table with too many or too few partitions [4] [6]
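
A minimal sketch of applying Z-Order via Spark SQL (e.g. from a Fabric notebook), assuming a Delta table sales whose high-cardinality columns customer_id and order_date are commonly used in query predicates (the names are illustrative):

-- rewrite the data files so that similar column values are collocated
OPTIMIZE sales
ZORDER BY (customer_id, order_date);

Because the operation is not idempotent, rerunning it re-clusters the new and the existing files in the affected partitions (see the notes above).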

    References:
    [1] Bennie Haelen & Dan Davis (2024) Delta Lake Up & Running: Modern Data Lakehouse Architectures with Delta Lake
    [2] Delta Lake (2023) Optimizations [link]
    [3] Delta Lake (2023) Delta Lake Z Order, by Matthew Powers [link]
    [4] Delta Lake (2025) Use liquid clustering for Delta tables [link]
    [5] Databricks (2025) Delta Lake table format interoperability [link]
    [6] Microsoft Learn (2025) Use liquid clustering for Delta tables [link]

    Resources:
    [R1] Azure Guru (2024) Z Order in Delta Lake - Part 1 [link]
    [R2] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

    Acronyms:
    MF - Microsoft Fabric 

    🏭🗒️Microsoft Fabric: V-Order [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 17-Mar-2025

    [Microsoft Fabric] V-Order
    • {def} write time optimization to the parquet file format that enables fast reads under the MF compute engine [2]
      • all parquet engines can read the files as regular parquet files [2]
      • results in smaller and therefore faster files to read [5]
        • {benefit} improves read performance 
        • {benefit} decreases storage requirements
        • {benefit} optimizes resources' usage
          • reduces the compute resources required for reading data
            • e.g. network bandwidth, disk I/O, CPU usage
      • still conforms to the open-source Parquet file format [5]
        • the files can be read by non-Fabric tools [5]
      • delta tables created and loaded by Fabric items automatically apply V-Order
        • e.g. data pipelines, dataflows, notebooks [5]
      • delta tables and their features are orthogonal to V-Order [2]
        • e.g. Z-Order, compaction, vacuum, time travel
        • table properties and optimization commands can be used to control the v-order of the partitions [2]
      • compatible with Z-Order [2]
      • not all files have this optimization applied [5]
        • e.g. Parquet files uploaded to a Fabric lakehouse, or that are referenced by a shortcut 
        • the files can still be read, though the read performance likely won't be as fast as that of an equivalent Parquet file that's had V-Order applied [5]
      • required by certain features
        • [hash encoding] to assign a numeric identifier to each unique value contained in the column [5]
      • {command} OPTIMIZE 
        • optimizes a Delta table to coalesce smaller files into larger ones [5]
        • can apply V-Order to compact and rewrite the Parquet files [5]
    • [warehouse] 
      • works by applying certain operations on Parquet files
        • special sorting
        • row group distribution
        • dictionary encoding
        • compression 
      • enabled by default
      • ⇒ compute engines require less network, disk, and CPU resources to read data from storage [1]
        • provides cost efficiency and performance [1]
          • the effect of V-Order on performance can vary depending on tables' schemas, data volumes, query, and ingestion patterns [1]
        • fully compliant with the open-source parquet format [1]
          • ⇐ all parquet engines can read it as regular parquet files [1]
      • required by certain features
        • [Direct Lake mode] depends on V-Order
      • {operation} disable V-Order
        • causes any new Parquet files produced by the warehouse engine to be created without V-Order optimization [3]
        • irreversible operation
          • once disabled, it cannot be enabled again [3]
        • {scenario} write-intensive warehouses
          • warehouses dedicated to staging data as part of a data ingestion process [1]
        • {warning} consider the effect of V-Order on performance before deciding to disable it [1]
          • {recommendation} test how V-Order affects the performance of data ingestion and queries before deciding to disable it [1]
        • via ALTER DATABASE CURRENT SET VORDER = OFF; [3]
      • {operation} check current status
        • via SELECT name, is_vorder_enabled FROM sys.databases; [post] (example after these notes)
    • {feature} [lakehouse] Load to Table
      • allows loading a single file or a folder of files into a table [6]
      • tables are always loaded using the Delta Lake table format with V-Order optimization enabled [6]
    • [Direct Lake semantic model] 
      • data is prepared for fast loading into memory [5]
        • places fewer demands on capacity resources [5]
        • results in faster query performance [5]
          • because less memory needs to be scanned [5]
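
    A minimal sketch combining the statements referenced above, run against a Fabric warehouse (remember that disabling V-Order is irreversible):

    -- check whether V-Order is enabled for the databases
    SELECT name
    , is_vorder_enabled
    FROM sys.databases;

    -- disable V-Order for the current warehouse (cannot be enabled again!)
    ALTER DATABASE CURRENT
    SET VORDER = OFF;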

    References:
    [1] Microsoft Learn (2024) Fabric: Understand V-Order for Microsoft Fabric Warehouse [link]
    [2] Microsoft Learn (2024) Delta Lake table optimization and V-Order [link]
    [3] Microsoft Learn (2024) Disable V-Order on Warehouse in Microsoft Fabric [link]
    [4] Miles Cole (2024) To V-Order or Not: Making the Case for Selective Use of V-Order in Fabric Spark [link]
    [5] Microsoft Learn (2024) Understand storage for Direct Lake semantic models [link]
    [6] Microsoft Learn (2025) Fabric: Load to Delta Lake table [link]

    Resources:
    [R1] Serverless.SQL (2024) Performance Analysis of V-Ordering in Fabric Warehouse: On or Off?, by Andy Cutler [link]
    [R2] Redgate (2023) Microsoft Fabric: Checking and Fixing Tables V-Order Optimization, by Dennes Torres [link]
    [R3] Sandeep Pawar (2023) Checking If Delta Table in Fabric is V-order Optimized [link]
    [R4] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

    Acronyms:
    MF - Microsoft Fabric

    🏭🗒️Microsoft Fabric: Caching in Warehouse [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 17-Mar-2025

    [Microsoft Fabric] Caching
    • {def} technique that improves the performance of data processing by storing frequently accessed data and metadata in a faster storage layer [1]
      • e.g. local memory, local SSD disk
      • ⇐ subsequent requests can be served faster, directly from the cache [1]
        • if a set of data has been previously accessed by a query, any subsequent queries will retrieve that data directly from the in-memory cache [1]
        • local memory operations are notably faster compared to fetching data from remote storage [1]
        • ⇐ significantly diminishes IO latency [1]
      • fully transparent to the user
      • consistently active and operates seamlessly in the background [1]
      • orchestrated and upheld by MF
        • it doesn't offer users the capability to manually clear the cache [1] 
      • provides transactional consistency
        • ensures that any modifications to the data in storage after it has been initially loaded into the in-memory cache, will result in consistent data [1]
      • when the cache reaches its capacity threshold and fresh data is being read for the first time, objects that have remained unused for the longest duration will be removed from the cache [1]
        • process is enacted to create space for the influx of new data and maintain an optimal cache utilization strategy [1] 
    • {type} in-memory cache
      • data in cache is organized in a compressed columnar format (aka columnar storage) [1]
        • ⇐ optimized for analytical queries
        • each column of data is stored separately [1]
          • {benefit} allows for better compression [1]
            • since similar data values are stored together [1]
          • {benefit} reduces the memory footprint
      • when queries need to perform operations on a specific column, the engine can work more efficiently speeding up queries' execution [1]
        • it doesn't have to process unnecessary data from other columns
        • can perform operations on multiple columns simultaneously [1]
          • taking advantage of modern multi-core processors [1]
      • when the engine retrieves data from storage, the data is transformed from its original file-based format into highly optimized structures in the in-memory cache [1]
      • {scenario} analytical workloads where queries involve scanning large amounts of data to perform aggregations, filtering, and other data manipulations [1]
    • {type} disk cache
      • complementary extension to the in-memory cache
      • any data loaded into the in-memory cache is also serialized to the SSD cache [1]
        • data removed from the in-memory cache remains within the SSD cache for an extended period
        • when subsequent query requests the data, it is retrieved from the SSD cache into the in-memory cache quicker [1]
    • {issue} cold run (aka cold cache) performance
      • the first 1-3 executions of a query perform noticeably slower than subsequent executions [2]
        • if the first run's performance is crucial, try manually creating statistics (aka pre-warming the cache), as sketched after these notes [2]
        • otherwise, one can rely on the automatic statistics generated in the first query run and leveraged in subsequent runs [2]
        • as long as underlying data does not change significantly [2]
    • differentiated from
      • [Kusto] caching policy [link]
      • [Apache Spark] intelligent cache [link]
      • [Power BI] query caching [link]
      • [Azure] caching [link]
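
    A minimal sketch of pre-warming via manually created statistics, as suggested in [2], assuming a warehouse table dbo.Sales frequently filtered on OrderDate (the names are illustrative):

    -- create statistics up front so that the first run doesn't pay the cost
    CREATE STATISTICS stats_sales_order_date
    ON dbo.Sales (OrderDate) WITH FULLSCAN;

    -- optionally, warm the cache by running a representative query once
    SELECT COUNT(*) record_count
    FROM dbo.Sales
    WHERE OrderDate >= '20250101';
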
      References:
      [1] Microsoft Learn (2024) Fabric: Caching in Fabric data warehousing [link]
      [2] Microsoft Learn (2024) Fabric Data Warehouse performance guidelines [link]
      [3] Sandeep Pawar (2023) Pre-Warming The Direct Lake Dataset For Warm Cache Import-Like Performance [link]

      Resources:
      [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

      Acronyms:
      IO - Input/Output
      MF - Microsoft Fabric
      SSD - Solid State Drive

      16 March 2025

      💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part XII: Databases)

      After reviewing the server and database properties, the next step is to review the database configurations using the sys.databases metadata. Such an object exists even for warehouses and lakehouses, respectively for mirrored databases. 

      Except for the last attribute in the query (is_vorder_enabled), all the other attributes are supported by all the above-mentioned repositories. 

      -- database metadata
      SELECT db.name
      , db.database_id
      --, db.source_database_id
      , db.owner_sid
      , db.create_date
      , db.compatibility_level
      , db.collation_name
      --, db.user_access
      , db.user_access_desc
      --, db.is_read_only
      --, db.is_auto_close_on
      --, db.is_auto_shrink_on
      --, db.state
      , db.state_desc
      --, db.is_in_standby
      --, db.is_cleanly_shutdown
      --, db.is_supplemental_logging_enabled
      --, db.snapshot_isolation_state
      , db.snapshot_isolation_state_desc
      --, db.is_read_committed_snapshot_on
      --, db.recovery_model
      , db.recovery_model_desc
      , db.page_verify_option
      , db.page_verify_option_desc
      , db.is_auto_create_stats_on
      --, db.is_auto_create_stats_incremental_on
      , db.is_auto_update_stats_on
      --, db.is_auto_update_stats_async_on
      --, db.is_ansi_null_default_on
      --, db.is_ansi_nulls_on
      --, db.is_ansi_padding_on
      --, db.is_ansi_warnings_on
      --, db.is_arithabort_on
      --, db.is_concat_null_yields_null_on
      --, db.is_numeric_roundabort_on
      --, db.is_quoted_identifier_on
      --, db.is_recursive_triggers_on
      --, db.is_cursor_close_on_commit_on
      --, db.is_local_cursor_default
      , db.is_fulltext_enabled
      --, db.is_trustworthy_on
      , db.is_db_chaining_on
      --, db.is_parameterization_forced
      , db.is_master_key_encrypted_by_server
      --, db.is_query_store_on
      --, db.is_published
      --, db.is_subscribed
      --, db.is_merge_published
      --, db.is_distributor
      --, db.is_sync_with_backup
      , db.service_broker_guid
      --, db.is_broker_enabled
      --, db.log_reuse_wait
      , db.log_reuse_wait_desc
      --, db.is_date_correlation_on
      --, db.is_cdc_enabled
      --, db.is_encrypted
      --, db.is_honor_broker_priority_on
      --, db.replica_id
      --, db.group_database_id
      --, db.resource_pool_id
      --, db.default_language_lcid
      --, db.default_language_name
      --, db.default_fulltext_language_lcid
      --, db.default_fulltext_language_name
      --, db.is_nested_triggers_on
      --, db.is_transform_noise_words_on
      --, db.two_digit_year_cutoff
      --, db.containment
      --, db.containment_desc
      , db.target_recovery_time_in_seconds
      --, db.delayed_durability
      --, db.delayed_durability_desc
      --, db.is_memory_optimized_elevate_to_snapshot_on
      --, db.is_federation_member
      --, db.is_remote_data_archive_enabled
      --, db.is_mixed_page_allocation_on
      , db.is_temporal_history_retention_enabled
      --, db.catalog_collation_type
      , db.catalog_collation_type_desc
      , db.physical_database_name
      --, db.is_result_set_caching_on
      , db.is_accelerated_database_recovery_on
      --, db.is_tempdb_spill_to_remote_store
      --, db.is_stale_page_detection_on
      , db.is_memory_optimized_enabled
      , db.is_data_retention_enabled
      --, db.is_ledger_on
      --, db.is_change_feed_enabled
      --, db.is_data_lake_replication_enabled
      --, db.is_change_streams_enabled
      --, db.data_lake_log_publishing
      , db.data_lake_log_publishing_desc
      , db.is_vorder_enabled
      --, db.is_optimized_locking_on
      FROM sys.databases db
      

      Output (consolidated):

      Attribute | Warehouse | Mirrored Db | Lakehouse | SQL database
      name | Warehouse Test 001 | MirroredDatabase_1 | Lakehouse_DWH | SQL DB Test...
      database_id | 5 | 6 | 7 | 28
      owner_sid | 0x01010000000000051... | 0x01010000000000051... | x01010000000000051... | AAAAAAWQAAAAAA...
      create_date | 2025-02-22T18:56:28.700 | 2025-03-16T14:57:56.600 | 2025-03-16T15:07:59.563 | 2025-02-22T03:01:53.7130000
      compatibility_level | 160 | 160 | 160 | 160
      collation_name | Latin1_General_100_BIN2_UTF8 | Latin1_General_100_BIN2_UTF8 | Latin1_General_100_BIN2_UTF8 | SQL_Latin1_General_CP1_CI_AS
      user_access_desc | MULTI_USER | MULTI_USER | MULTI_USER | MULTI_USER
      state_desc | ONLINE | ONLINE | ONLINE | ONLINE
      snapshot_isolation_state_desc | ON | ON | ON | ON
      recovery_model_desc | SIMPLE | SIMPLE | SIMPLE | FULL
      page_verify_option | 0 | 0 | 0 | 2
      page_verify_option_desc | NONE | NONE | NONE | CHECKSUM
      is_auto_create_stats_on | 1 | 1 | 1 | 1
      is_auto_update_stats_on | 1 | 1 | 1 | 1
      is_fulltext_enabled | 1 | 1 | 1 | 1
      is_db_chaining_on | 0 | 0 | 0 | 0
      is_master_key_encrypted_by_server | 0 | 0 | 0 | 0
      service_broker_guid | 1F2261FC-5031-... | 7D882362-567E-... | 0D8938AB-BA79-... | 2b74bed3-4405-...
      log_reuse_wait_desc | NOTHING | NOTHING | NOTHING | NOTHING
      target_recovery_time_in_seconds | 60 | 60 | 60 | 60
      is_temporal_history_retention_enabled | 1 | 1 | 1 | 1
      catalog_collation_type_desc | DATABASE_DEFAULT | DATABASE_DEFAULT | DATABASE_DEFAULT | SQL_Latin1_General_CP1_CI_AS
      physical_database_name | Warehouse Test 001 | MirroredDatabase_1 | Lakehouse_DWH | 3f4a3e79-e53e-...
      is_accelerated_database_recovery_on | 1 | 1 | 1 | 1
      is_memory_optimized_enabled | 1 | 1 | 1 | 1
      is_data_retention_enabled | 1 | 1 | 1 | 1
      data_lake_log_publishing_desc | AUTO | UNSUPPORTED | UNSUPPORTED | UNSUPPORTED
      is_vorder_enabled | 1 | 1 | 1 | 0

      The output for the SQL database was formatted slightly differently, and is_vorder_enabled is not available. Otherwise, the above query can be used for all environments. 

      All attributes except the last two are known from the earlier versions of SQL Server. is_vorder_enabled reflects the current status of V-Order [2], while data_lake_log_publishing_desc reflects the current state of Delta Lake log publishing [3].

      Consolidating the output from different sources helps identify the differences and one can easily use Excel formulas for this.

      Previous Post <<||>> Next Post

      References:
      [1] Microsoft Learn (2025) SQL Server 2022: sys.databases (Transact-SQL) [link]
      [2] Microsoft Learn (2025) Fabric: Disable V-Order on Warehouse in Microsoft Fabric [link]
      [3] Microsoft Learn (2025) Fabric: Delta Lake logs in Warehouse in Microsoft Fabric [link]

      💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part XI: Database and Server Properties)

      When taking over a SQL Server, respectively a database, one of the first checks I do focuses on the overall configuration, going through the UI available for admins to see whether I can find anything that requires further investigation. If no documentation is available, I run a few scripts and export their output as a baseline. 

      Especially when documenting the configuration, it's useful to export the database options and properties defined at the database level. Besides the collation and probably the recovery model, the rest of the configuration is typically similar, though in exceptional cases one should also expect surprises that require further investigation! 

      The following query retrieves in a consolidated way all the options and properties of a SQL database in Microsoft Fabric. 

      -- database settings/properties 
      SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') Collation
      --, DATABASEPROPERTYEX(DB_NAME(), 'ComparisonStyle')  ComparisonStyle
      , DATABASEPROPERTYEX(DB_NAME(), 'Edition') Edition
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiNullDefault') IsAnsiNullDefault
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiNullsEnabled') IsAnsiNullsEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiPaddingEnabled') IsAnsiPaddingEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsAnsiWarningsEnabled') IsAnsiWarningsEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsArithmeticAbortEnabled') IsArithmeticAbortEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoClose') IsAutoClose
      , DATABASEPROPERTYEX(DB_NAME(), 'IsAutoCreateStatistics') IsAutoCreateStatistics
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoCreateStatisticsIncremental') IsAutoCreateStatisticsIncremental
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsAutoShrink') IsAutoShrink
      , DATABASEPROPERTYEX(DB_NAME(), 'IsAutoUpdateStatistics') IsAutoUpdateStatistics
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsClone') IsClone
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsCloseCursorsOnCommitEnabled') IsCloseCursorsOnCommitEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsDatabaseSuspendedForSnapshotBackup') IsDatabaseSuspendedForSnapshotBackup
      , DATABASEPROPERTYEX(DB_NAME(), 'IsFulltextEnabled') IsFulltextEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsInStandBy') IsInStandBy
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsLocalCursorsDefault') IsLocalCursorsDefault
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsMemoryOptimizedElevateToSnapshotEnabled') IsMemoryOptimizedElevateToSnapshotEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsMergePublished') IsMergePublished
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsNullConcat') IsNullConcat
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsNumericRoundAbortEnabled') IsNumericRoundAbortEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsParameterizationForced') IsParameterizationForced
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsQuotedIdentifiersEnabled') IsQuotedIdentifiersEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsPublished') IsPublished
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsRecursiveTriggersEnabled') IsRecursiveTriggersEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsSubscribed') IsSubscribed
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsSyncWithBackup') IsSyncWithBackup
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsTornPageDetectionEnabled') IsTornPageDetectionEnabled
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsVerifiedClone') IsVerifiedClone
      --, DATABASEPROPERTYEX(DB_NAME(), 'IsXTPSupported') IsXTPSupported
      , DATABASEPROPERTYEX(DB_NAME(), 'LastGoodCheckDbTime') LastGoodCheckDbTime
      , DATABASEPROPERTYEX(DB_NAME(), 'LCID') LCID
      --, DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') MaxSizeInBytes
      , DATABASEPROPERTYEX(DB_NAME(), 'Recovery') Recovery
      --, DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective') ServiceObjective
      --, DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjectiveId') ServiceObjectiveId
      , DATABASEPROPERTYEX(DB_NAME(), 'SQLSortOrder') SQLSortOrder
      , DATABASEPROPERTYEX(DB_NAME(), 'Status') Status
      , DATABASEPROPERTYEX(DB_NAME(), 'Updateability') Updateability
      , DATABASEPROPERTYEX(DB_NAME(), 'UserAccess') UserAccess
      , DATABASEPROPERTYEX(DB_NAME(), 'Version') Version
      --, DATABASEPROPERTYEX(DB_NAME(), 'ReplicaID') ReplicaID
      

      Output:

      Collation | Edition | IsAutoCreateStatistics | IsAutoUpdateStatistics | IsFulltextEnabled | LastGoodCheckDbTime | LCID | Recovery | SQLSortOrder | Status | Updateability | UserAccess | Version
      SQL_Latin1_General_CP1_CI_AS | FabricSQLDB | 1 | 1 | 1 | 12/31/1899 | 1033 | FULL | 52 | ONLINE | READ_WRITE | MULTI_USER | 981

      The query can also be run against the SQL analytics endpoints available for warehouses in Microsoft Fabric.

      Output:

      Collation | Edition | IsAutoCreateStatistics | IsAutoUpdateStatistics | IsFulltextEnabled | LastGoodCheckDbTime | LCID | Recovery | SQLSortOrder | Status | Updateability | UserAccess | Version
      Latin1_General_100_BIN2_UTF8 | DataWarehouse | 1 | 1 | 1 | 12/31/1899 | 1033 | SIMPLE | 0 | ONLINE | READ_WRITE | MULTI_USER | 987

      Respectively, for lakehouses:

      Collation | Edition | IsAutoCreateStatistics | IsAutoUpdateStatistics | IsFulltextEnabled | LastGoodCheckDbTime | LCID | Recovery | SQLSortOrder | Status | Updateability | UserAccess | Version
      Latin1_General_100_BIN2_UTF8 | LakeWarehouse | 1 | 1 | 1 | 12/31/1899 | 1033 | SIMPLE | 0 | ONLINE | READ_WRITE | MULTI_USER | 987

      A similar output is obtained if one runs the query against SQL database's SQL analytics endpoint:

      Output:

      Collation | Edition | IsAutoCreateStatistics | IsAutoUpdateStatistics | IsFulltextEnabled | LastGoodCheckDbTime | LCID | Recovery | SQLSortOrder | Status | Updateability | UserAccess | Version
      Latin1_General_100_BIN2_UTF8 | LakeWarehouse | 1 | 1 | 1 | 12/31/1899 | 1033 | SIMPLE | 0 | ONLINE | READ_WRITE | MULTI_USER | 987

      SQL databases seem to inherit the collation from the earlier versions of SQL Server.

      Another meaningful value for SQL databases is MaxSizeInBytes, which in my environment had a value of 3,298,534,883,328 bytes (÷ 1,073,741,824 = 3,072 GB).

      There are however also server properties. Here's the consolidated overview:

      -- server properties
      SELECT --SERVERPROPERTY('BuildClrVersion') BuildClrVersion
       SERVERPROPERTY('Collation') Collation
      --, SERVERPROPERTY('CollationID') CollationID
      , SERVERPROPERTY('ComparisonStyle') ComparisonStyle
      --, SERVERPROPERTY('ComputerNamePhysicalNetBIOS') ComputerNamePhysicalNetBIOS
      , SERVERPROPERTY('Edition') Edition
      --, SERVERPROPERTY('EditionID') EditionID
      , SERVERPROPERTY('EngineEdition') EngineEdition
      --, SERVERPROPERTY('FilestreamConfiguredLevel') FilestreamConfiguredLevel
      --, SERVERPROPERTY('FilestreamEffectiveLevel') FilestreamEffectiveLevel
      --, SERVERPROPERTY('FilestreamShareName') FilestreamShareName
      --, SERVERPROPERTY('HadrManagerStatus') HadrManagerStatus
      --, SERVERPROPERTY('InstanceDefaultBackupPath') InstanceDefaultBackupPath
      , SERVERPROPERTY('InstanceDefaultDataPath') InstanceDefaultDataPath
      --, SERVERPROPERTY('InstanceDefaultLogPath') InstanceDefaultLogPath
      --, SERVERPROPERTY('InstanceName') InstanceName
      , SERVERPROPERTY('IsAdvancedAnalyticsInstalled') IsAdvancedAnalyticsInstalled
      --, SERVERPROPERTY('IsBigDataCluster') IsBigDataCluster
      --, SERVERPROPERTY('IsClustered') IsClustered
      , SERVERPROPERTY('IsExternalAuthenticationOnly') IsExternalAuthenticationOnly
      , SERVERPROPERTY('IsExternalGovernanceEnabled') IsExternalGovernanceEnabled
      , SERVERPROPERTY('IsFullTextInstalled') IsFullTextInstalled
      --, SERVERPROPERTY('IsHadrEnabled') IsHadrEnabled
      --, SERVERPROPERTY('IsIntegratedSecurityOnly') IsIntegratedSecurityOnly
      --, SERVERPROPERTY('IsLocalDB') IsLocalDB
      --, SERVERPROPERTY('IsPolyBaseInstalled') IsPolyBaseInstalled
      --, SERVERPROPERTY('IsServerSuspendedForSnapshotBackup') IsServerSuspendedForSnapshotBackup
      --, SERVERPROPERTY('IsSingleUser') IsSingleUser
      --, SERVERPROPERTY('IsTempDbMetadataMemoryOptimized') IsTempDbMetadataMemoryOptimized
      , SERVERPROPERTY('IsXTPSupported') IsXTPSupported
      , SERVERPROPERTY('LCID') LCID
      , SERVERPROPERTY('LicenseType') LicenseType
      , SERVERPROPERTY('MachineName') MachineName
      , SERVERPROPERTY('NumLicenses') NumLicenses
      , SERVERPROPERTY('PathSeparator') PathSeparator
      --, SERVERPROPERTY('ProcessID') ProcessID
      , SERVERPROPERTY('ProductBuild') ProductBuild
      --, SERVERPROPERTY('ProductBuildType') ProductBuildType
      --, SERVERPROPERTY('ProductLevel') ProductLevel
      --, SERVERPROPERTY('ProductMajorVersion') ProductMajorVersion
      --, SERVERPROPERTY('ProductMinorVersion') ProductMinorVersion
      --, SERVERPROPERTY('ProductUpdateLevel') ProductUpdateLevel
      --, SERVERPROPERTY('ProductUpdateReference') ProductUpdateReference
      --, SERVERPROPERTY('ProductUpdateType') ProductUpdateType
      , SERVERPROPERTY('ProductVersion') ProductVersion
      , SERVERPROPERTY('ResourceLastUpdateDateTime') ResourceLastUpdateDateTime
      , SERVERPROPERTY('ResourceVersion') ResourceVersion
      , SERVERPROPERTY('ServerName') ServerName
      , SERVERPROPERTY('SqlCharSet') SqlCharSet
      , SERVERPROPERTY('SqlCharSetName') SqlCharSetName
      , SERVERPROPERTY('SqlSortOrder') SqlSortOrder
      , SERVERPROPERTY('SqlSortOrderName') SqlSortOrderName
      , SERVERPROPERTY('SuspendedDatabaseCount') SuspendedDatabaseCount
      

      Output (consolidated):

      Property | SQL database | Warehouse | Lakehouse
      Collation | SQL_Latin1_General_CP1_CI_AS | SQL_Latin1_General_CP1_CI_AS | SQL_Latin1_General_CP1_CI_AS
      ComparisonStyle | 196609 | 196609 | 196609
      Edition | SQL Azure | SQL Azure | SQL Azure
      EngineEdition | 12 | 11 | 11
      InstanceDefaultDataPath | NULL | NULL | NULL
      IsAdvancedAnalyticsInstalled | 1 | 1 | 1
      IsExternalAuthenticationOnly | 1 | 0 | 0
      IsExternalGovernanceEnabled | 1 | 1 | 1
      IsFullTextInstalled | 1 | 0 | 0
      IsXTPSupported | 1 | 1 | 1
      LCID | 1033 | 1033 | 1033
      LicenseType | DISABLED | DISABLED | DISABLED
      MachineName | NULL | NULL | NULL
      NumLicenses | NULL | NULL | NULL
      PathSeparator | \ | \ | \
      ProductBuild | 2000 | 502 | 502
      ProductVersion | 12.0.2000.8 | 12.0.2000.8 | 12.0.2000.8
      ResourceLastUpdateDateTime | 11/6/2024 3:41:27 PM | 3/5/2025 12:05:50 PM | 3/5/2025 12:05:50 PM
      ResourceVersion | 16.00.5751 | 17.00.502 | 17.00.502
      ServerName | ... | ... | ...
      SqlCharSet | 1 | 1 | 1
      SqlCharSetName | iso_1 | iso_1 | iso_1
      SqlSortOrder | 52 | 52 | 52
      SqlSortOrderName | nocase_iso | nocase_iso | nocase_iso
      SuspendedDatabaseCount | NULL | 0 | 0

      It's interesting that all three instances have the same general collation, while the Engine Edition of SQL databases differs from the one of the other repositories (12 vs. 11) [2]. The server names have been removed manually from the output for obvious reasons. The warehouse and lakehouse are in the same environment (the same SQL Azure instance, see sys.databases), and therefore the same values are shown (though this might happen independently of the environments used).

      The queries were run in a trial Microsoft Fabric environment. Other environments may have different properties, depending on the case. Just remove the "--" from the commented code to get a complete overview.

      The queries should also run in the other editions of SQL Server. If DATABASEPROPERTYEX is not supported, one should try DATABASEPROPERTY instead.
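
      A minimal sketch of the fallback, retrieving a few of the properties used above (the sets of properties supported by the two functions differ):

      -- fallback for editions in which DATABASEPROPERTYEX is not supported
      SELECT DATABASEPROPERTY(DB_NAME(), 'IsAutoCreateStatistics') IsAutoCreateStatistics
      , DATABASEPROPERTY(DB_NAME(), 'IsAutoUpdateStatistics') IsAutoUpdateStatistics
      , DATABASEPROPERTY(DB_NAME(), 'IsFulltextEnabled') IsFulltextEnabled
      , DATABASEPROPERTY(DB_NAME(), 'Version') Version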

      Happy coding!

      Previous Post <<||>> Next Post

      References:
      [1] Microsoft Learn (2024) SQL Server 2022: DATABASEPROPERTYEX (Transact-SQL) [link]
      [2] Microsoft Learn (2024) SQL Server 2022: SERVERPROPERTY (Transact-SQL) [link]

      🏭🗒️Microsoft Fabric: Azure Storage Account [Notes]

      Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

      Last updated: 16-Mar-2025

      [Microsoft Fabric] Azure Storage Account

      • {def} a container provided by Microsoft Azure that houses all storage data in the cloud, including blobs (binary large objects), files, queues, and tables 
      • provides a unique namespace for Azure Storage data that's accessible from anywhere in the world over HTTP or HTTPS [1]
        • every object stored has a URL address that includes the customer's unique account name [1]
          • the combination of the account name and the service endpoint forms the endpoints for the storage account [1]
      • is an ARM resource
        • ⇐ the deployment and management service for Azure
        • belongs to an Azure resource group
          • ⇐ a logical container for grouping Azure services [3]
      • {characteristic} durable
      • {characteristic} highly available
      • {characteristic} secure
      • {characteristic} massively scalable
      • {concept} storage account name
        • must be between 3 and 24 characters in length
        • may contain numbers and lowercase letters only
        • must be unique within Azure
      • {operation} create storage account 
      • {operation} migrate storage account
        • {scenario} move a storage account to a different subscription
        • {scenario} move a storage account to a different resource group
        • {scenario} move a storage account to a different region
        • {scenario} upgrade to a general-purpose v2 storage account
        • {scenario} migrate a classic storage account to Azure Resource Manager
      • {operation} delete storage account
        • deletes the entire account, including all data in the account [3]
        • recovery is not guaranteed
          • under certain circumstances, a deleted storage account might be recovered [3]
        • deleting the resource group deletes the storage account and any other resources in that resource group [3]
        • {recommendation} back up any data before deleting the account [3]
      • {operation} monitor storage accounts
        • see storage metrics in Azure Monitor
      • {operation} transfer data into a storage account
      • {type} general purpose v1 (GPv1)
        • can no longer be created from the Azure portal [3]
        • no new feature development is expected for this account type [3]
        • currently there's no plan to deprecate support [3]
          • at least one year's advance notice will be provided before deprecating [3]
      • {type} general purpose v2 (GPv2)
        • recommended for most scenarios [3]
        • supports the latest Azure Storage features and incorporates all of the functionality of GPv1 [4]
        • upgrading to a GPv2 storage account from GPv1 or Blob storage accounts is straightforward [4]
          • permanent and cannot be undone [4]
          • there's no downtime or risk of data loss associated [4]
          • happens via a simple ARM operation that changes the account type [4]
          • the upgrade is free
          • changing the storage access tier after the upgrade may result in bill changes [4]
      • blob access tiers 
        • enable you to choose the most cost-effective storage based on your anticipated usage patterns [4]
      • storage account encryption
        • all data is automatically encrypted on the service side [1]
      • provides three storage services: 
        • table storage 
          • a NoSQL key/value store
        • queue storage 
          • a simple queueing service
        • blob storage
          • used extensively throughout Azure to store things from logs to virtual machine disks [2]
          • enables storing files predominantly in three different formats [2]
            • ⇐ according to read/write workload [2]
            • block blobs 
              • optimized for storing text or binary files
              • allow for efficient parallel upload/download of a block or list of blocks and modification of a blob at the block granularity [2]
              • modifications to an individual blob take a two-phase approach [2]
                • {phase 1} one uploads the changes as a set of blocks. 
                • {phase 2} one commits the changes by identifying the list of uploaded blocks
                • commonly used for files where the typical workload is to read or write the entire file [2]
                  • e.g. text files, CSVs, and binary files 
            • append blobs
              • a variant of a block blob that is optimized for append-only write workloads [2]
                • e.g. logging
              • doesn't allow deleting or updating of existing blocks [2]
            • page blobs
              • optimized for predominantly random read/write workloads against portions of the blob, where data is stored in pages [2]
                • e.g. virtual machine disks
          • file structure for a blob in Blob Storage
            • container
              • logical grouping of blobs
                • similar to how folders group files on your local machine [2]
              • at the root of the Storage account [2]
              • it can be used to set access permissions on the blobs it contains [2]
              • each container can hold an unlimited quantity of blobs [2]
              • each blob (or more precisely, each container and blob pair) identifies a partition [2]
                • each file in Blob Storage is its own partition [2]
      • when creating the Storage account one can define the degree of replication of the data desired for high availability and disaster recovery purposes [2]
        • at minimum data stored within a Storage account is replicated on three separate nodes within a single facility (i.e., building)
          • {option} locally redundant storage (LRS)
            • stores three copies of the data within a facility (which is naturally within a specific geographic region) [2]
          • {option} zone-redundant storage (ZRS)
            • augments LRS by enabling a replica within another facility within the same region. 
            • supports only block blobs [2]
          • {option} geo-redundant storage (GRS)
            • automatically replicates blob storage to another geographic region that is hundreds of miles away from the primary [2]
            • this secondary replica is not readable unless the primary becomes unavailable 
              • when this happens, the failover is transparent to your application, but Azure will send you an email notification [2]
              • when new data arrives, that data is first replicated to the three local replicas and then asynchronously replicated to the secondary geographic replica (where it is also replicated three times) [2]
          • {option} read-only geo-redundant storage (RA-GRS)
            • variant of GRS providing a secondary endpoint that enables reading from the secondary Storage account [2]
        • Azure Monitor 
          • provides a unified monitoring experience [4]
          • stores metrics that include aggregated transaction statistics and capacity data about requests to the storage service [4]
          • receives metric data from Azure Storage [4]

      References:
      [1] Microsoft Learn (2025) Azure: Storage account overview [link]
      [2] Zoiner Tejada (2017) Mastering Azure Analytics 
      [3] Microsoft Learn (2025) Azure: Create a storage account [link]
      [4] Microsoft Learn (2025) Upgrade to a general-purpose v2 storage account [link]

      Resources:
      [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

      Acronyms:
      ARM - Azure Resource Manager 
      GRS - Geo-Redundant Storage
      LRS - Locally Redundant Storage
      RA-GRS - Read-Only Geo-Redundant Storage
      ZRS - Zone-Redundant Storage

      15 March 2025

      🏭🗒️Microsoft Fabric: Azure Data Lake Storage Gen2 (ADLS Gen2) [Notes]

      Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes.

      Last updated: 21-Jan-2025

      [Microsoft Fabric] Azure Data Lake Storage Gen2 (ADLS Gen2)

      • {def} a set of capabilities built on Azure Blob Storage and dedicated to big data analytics [27]
      • {capability} low-cost storage
        • provides cloud storage that is less expensive than the cloud storage that relational databases provide [25]
      • {capability} performant massive storage
        • designed to service multiple petabytes of information while sustaining hundreds of gigabits of throughput [27]
          • processing is executed at near-constant per-request latencies that are measured at the service, account, and file levels [27]
      • {capability} tiered storage
      • {capability} high availability/disaster recovery
      • {capability} file system semantics
      • {capability} file-level security
      • {capability} scalability
      • supports a range of tools and programming languages that enable large amounts of data to be reported on, queried, and transformed [25]
      • included in customer subscriptions
        • customers can bring their data lake and integrate it with D365FO
      • combines BYOD and Entity store [25]
      • entity store 
        • staged in the data lake and provides a set of simplified (denormalized) data structures to make reporting easier [25]
          • users can be given direct access to the relevant data and can create reports [25]
        • instead of exporting data by using BYOD, customers can select the data that is staged in the data lake
          • the data in the data lake is synchronized via the D365FO data feed service [25]
          • reflects the updated D365FO data within a few minutes after the changes occur [25]
        • further data can be brought into the data lake
        • the data can be consumed via cloud-based services [25]
        • {benefit} no need to monitor and manage complex data export and orchestration schedules [25]
        • {benefit} no user intervention is required to update data in the data lake [25]
        • {benefit} cost-effective data storage
      • [licensing] included in a customer's subscription
        • the customer must pay for 
          • data storage
          • I/O costs that are incurred when data is read and written to the data lake [25]
          • I/O costs because D365FO apps write data to the data lake or update the data in it [25]
      • {prerequisite} the data lakes must be provisioned in the same country or region as the D365 environment [25]
        • the stored data complies with the CDM folder standard
        • integration makes warm-path reporting available as the default reporting option
          • because data in a data lake is updated within minutes [25]
          • approach acceptable for most reporting scenarios (incl. near-real-time reporting) [25]
      • {benefit} designed to store large amounts of data [25]
      • {benefit} designed for big data analytics [25]
      • {benefit} has many associated services that enable analytics, data transformation, and the application of AI and machine learning [25]
        • hierarchical namespace 
          • organizes objects/files into a hierarchy of directories for efficient data access [27]
          • renaming or deleting a directory becomes a single atomic metadata operation on the directory [28]
          • no need to enumerate and process all objects that share the name prefix of the directory [28]
      • {feature} query acceleration 
        • enables applications and analytics frameworks to significantly optimize data processing by retrieving only the data that they require to perform a given operation [28]
        • accepts filtering predicates and column projections which enable applications to filter rows and columns at the time that data is read from disk [28]
          • only the data that meets the conditions of a predicate is transferred over the network to the application [28]
          • reduces network latency and compute cost [28]
            • a query acceleration request is specified via SQL (see the example after these notes)
          • a request processes only one file
        • advanced relational features of SQL aren't supported [28] 
          • e.g. joins and group by aggregates
        • supports CSV and JSON formatted data as input to each request
        • compatible with the blobs in storage accounts that don't have a hierarchical namespace enabled on them [28]
        • designed for distributed analytics frameworks 
          • e.g. Apache Spark and Apache Hive
            • these engines include query optimizers that can incorporate knowledge of the underlying I/O service's capabilities when determining an optimal query plan for user queries [28]
            • these frameworks integrate query acceleration
              • ⇒ users see improved query latency and a lower total cost of ownership without having to make any changes to the queries 
          • includes a storage abstraction layer within the framework  [28]
        • designed for data processing applications that perform large-scale data transformations that might not directly lead to analytics insights [28]
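
      A minimal sketch of the SQL dialect accepted by query acceleration, assuming a headerless CSV blob whose columns are addressed positionally as _1, _2, etc. (the request itself is submitted via the REST API or an SDK, not via a database endpoint):

      -- return two columns from the rows matching the predicate;
      -- only the filtered data is transferred over the network [28]
      SELECT _1, _2
      FROM BlobStorage
      WHERE _3 > 100

      The FROM clause always references BlobStorage, i.e. the file targeted by the request; joins and GROUP BY aggregates aren't supported (see the notes above).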


      References:
      [25] Azure Synapse Analytics (2020) Azure Data Lake overview [link]
      [27] Introduction to Azure Data Lake Storage Gen2 [link]
      [28] Azure Data Lake Storage query acceleration [link]

      Resources:
      [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

      Acronyms:
      AI - Artificial Intelligence
      BYOD - Bring Your Own Database
      CDM - Common Data Model
      CSV - Comma-Separated Values
      D365FO - Dynamics 365 for Finance and Operations
      I/O - Input/Output
      JSON - JavaScript Object Notation
