
24 May 2025

🏭🗒️Microsoft Fabric: Materialized Lake Views (MLV) [Notes] 🆕🗓️

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 24-May-2025

-- create schema
CREATE SCHEMA IF NOT EXISTS <lakehouse_name>.<schema_name>

-- create a materialized lake view
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS <lakehouse_name>.<schema_name>.<view_name>
(
    CONSTRAINT <constraint_name> CHECK (<constraint>) ON MISMATCH DROP 
) 
AS 
SELECT ...
FROM ...
-- WHERE ...
--GROUP BY ...
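
A minimal sketch of how the template might be instantiated, assuming a bronze-to-silver transformation in a lakehouse (all lakehouse, schema, table and column names below are hypothetical):

-- create the target schema (hypothetical names)
CREATE SCHEMA IF NOT EXISTS SalesLakehouse.silver;

-- materialized lake view with a data quality constraint
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS SalesLakehouse.silver.orders_cleaned
(
    CONSTRAINT valid_order_date CHECK (OrderDate IS NOT NULL) ON MISMATCH DROP
)
AS
SELECT OrderID
, CustomerID
, OrderDate
, SUM(Amount) Amount
FROM SalesLakehouse.bronze.orders
WHERE Status <> 'Cancelled'
GROUP BY OrderID
, CustomerID
, OrderDate;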

[Microsoft Fabric] Materialized Lake Views (MLV)

  • {def} persisted, continuously updated view of data [1]
    • {benefit} allows to build declarative data pipelines using SQL, complete with built-in data quality rules and automatic monitoring of data transformations
      • simplifies the implementation of multi-stage Lakehouse processing [1]
        • streamline data workflows
        • enable developers to focus on business logic [1]
          • ⇐ not on infrastructural or data quality-related issues [1]
        • the views can be created in a notebook [2]
      • can have data quality constraints enforced and visualized for every run, showing completion status and conformance to data quality constraints defined in a single view [1]
      • empowers developers to set up complex data pipelines with just a few SQL statements and then handle the rest automatically [1]
        • faster development cycles 
        • trustworthy data
        • quicker insights
  • {goal} process only the new or changed data instead of reprocessing everything each time [1]
    • ⇐  leverages Delta Lake’s CDF under the hood
      • ⇒ it can update just the portions of data that changed rather than recompute the whole view from scratch [1]
  • {operation} creation
    • allows defining transformations at each layer [1]
      • e.g. aggregation, projection, filters
    • allows specifying certain checks that the data must meet [1]
      • incorporate data quality constraints directly into the pipeline definition
    • via CREATE MATERIALIZED LAKE VIEW
      • the SQL syntax is declarative and Fabric figures out how to produce and maintain it [1]
  • {operation} refresh
    • refreshes only when its source has new data [1]
      • if there’s no change, it can skip running entirely (saving time and resources) [1]
  • {feature} automatically generates a visual report that shows trends on data quality constraints
    • {benefit} allows to easily identify the checks that introduce maximum errors and the associated MLVs for easy troubleshooting [1]
  • {feature} can be combined with Shortcut Transformation feature for CSV ingestion 
    • {benefit} allows building an end-to-end Medallion architecture
  • {feature} dependency graph
    • allows to see the dependencies existing between the various objects [2]
      • ⇐ automatically generated [2]
  • {feature} data quality report
    • built-in Power BI dashboard that shows several aggregated metrics [2]
  • {feature|planned} support for PySpark
  • {feature|planned} incremental refresh
  • {feature|planned} integration with Data Activator

References:
[1] Microsoft Fabric Update Blog (2025) Simplifying Medallion Implementation with Materialized Lake Views in Fabric [link|aka]
[2] Power BI Tips (2025) Microsoft Fabric Notebooks with Materialized Views - Quick Tips [link]
[3] Microsoft Learn (2025)  [link]

Resources:
[R1] Databricks (2025) Use materialized views in Databricks SQL [link]

Acronyms:
CDF - Change Data Feed
ETL - Extract, Transform, Load
MF - Microsoft Fabric
MLV - Materialized Lake Views

23 May 2025

🏭🗒️Microsoft Fabric: Warehouse Snapshots [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 23-May-2025

[Microsoft Fabric] Warehouse Snapshots

  • {def} read-only representation of a warehouse at a specific point in time [1]
  • supports analytics, reporting, and historical analysis scenarios without the worry of volatile live data updates [1]
    • provides a consistent and stable view of data [1]
    • ensures that analytical workloads remain unaffected by ongoing changes or ETL operations [1]
  • {benefit} guarantees data consistency
    • the dataset remains unaffected by ongoing ETL processes [1]
  • {benefit} immediate roll-forward updates
    • can be seamlessly rolled forward on demand to reflect the latest state of the warehouse
      • ⇒ {benefit} consumers access the same snapshot using a consistent connection string, even from third-party tools [1]
      • ⇐ updates are applied immediately, as if in a single, atomic transaction [1]
  • {benefit} facilitates historical analysis
    • snapshots can be created on an hourly, daily, or weekly basis to suit their business requirements [1]
  • {benefit} enhanced reporting
    • provides a point-in-time reliable dataset for precise reporting [1]
      • ⇐ free from disruptions caused by data modifications [1]
  • {benefit} doesn't require separate storage [1]
    • relies on source Warehouse [1]
  • {limitation} doesn't support database objects 
  • {limitation} captures a state within the last 30 days
  • {operation} create snapshot
    • via New warehouse snapshot
    • multiple snapshots can be created for the same parent warehouse [1]
      • appear as child items of the parent warehouse in the workspace view [1]
      • queries run against a snapshot provide the version of the data as of the snapshot's point in time [1]
  • {operation} read properties 
    • via GET https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{warehousesnapshotId} with the header Authorization: Bearer <bearer token>
  • {operation} update snapshot timestamp
    • allows users to roll forward data instantly, ensuring consistency [1] (see the sketch after this list)
      • use current state
        • via ALTER DATABASE [<snapshot name>] SET TIMESTAMP = CURRENT_TIMESTAMP;
      • use point in time
        • via ALTER DATABASE [<snapshot name>] SET TIMESTAMP = 'YYYY-MM-DDTHH:MM:SS.SS'; -- UTC time
    • queries that are in progress during point in time update will complete against the version of data they were started against [1]
  • {operation} rename snapshot
  • {operation} delete snapshot
    • via DELETE
    • when the parent warehouse gets deleted, the snapshot is also deleted [1]
  • {operation} modify source table
    • DDL changes to source will only impact queries in the snapshot against tables affected [1]
  • {operation} join multiple snapshots
    • the resulting snapshot date will be applied to each warehouse connection [1]
  • {operation} retrieve metadata
    • via sys.databases [1]
  • [permissions] inherited from the source warehouse [1]
    • ⇐ any permission changes in the source warehouse applies instantly to the snapshot [1]
    • security updates on the source database are reflected immediately in the snapshot databases [1]
  • {limitation} can only be created against new warehouses [1]
    • created after Mar-2025
  • {limitation} do not appear in SSMS Object Explorer but will show up in the database selection dropdown [1]
  • {limitation} datetime can be set to any date in the past up to 30 days or database creation time (whichever is later)  [1]
  • {limitation} modified objects after the snapshot timestamp become invalid in the snapshot [1]
    • applies to tables, views, and stored procedures [1]
  • {limitation} must be recreated if the data warehouse is restored [1]
  • {limitation} aren’t supported on the SQL analytics endpoint of the Lakehouse [1]
  • {limitation} aren’t supported as a source for OneLake shortcuts [1]
  • [Power BI] {limitation} requires DirectQuery or Import mode [1]
    • doesn't support Direct Lake
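
A minimal T-SQL sketch of the roll-forward operation and of retrieving the snapshot's metadata, based on the statements above (the snapshot name is hypothetical):

-- roll the snapshot forward to the warehouse's current state
ALTER DATABASE [SalesWarehouse_Snapshot] SET TIMESTAMP = CURRENT_TIMESTAMP;

-- roll the snapshot to a specific point in time (UTC, within the last 30 days)
ALTER DATABASE [SalesWarehouse_Snapshot] SET TIMESTAMP = '2025-05-20T08:00:00.00';

-- retrieve the snapshot's metadata
SELECT name
, database_id
, create_date
FROM sys.databases;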

    References:
    [1] Microsoft Learn (2025) Fabric: Warehouse Snapshots in Microsoft Fabric (Preview) [link]
    [2] Microsoft Learn (2025) Warehouse snapshots (preview) [link]
    [3] Microsoft Learn (2025) Create and manage a warehouse snapshot (preview) [link]

    Resources:


    Acronyms:
    DDL - Data Definition Language
    ETL - Extract, Transform, Load
    MF - Microsoft Fabric
    SSMS - SQL Server Management Studio

    26 April 2025

    🏭🗒️Microsoft Fabric: Parameters in Dataflows Gen2 [Notes] 🆕

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 26-Apr-2025

    [Microsoft Fabric] Dataflow Gen2 Parameters

    • {def} parameters that allow to dynamically control and customize Dataflows Gen2
      • makes them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [1]
      • the dataflow is refreshed by passing parameter values outside of the Power Query editor through either
        • Fabric REST API [1]
        • native Fabric experiences [1]
      • parameter names are case sensitive [1]
      • {type} required parameters
        • {warning} the refresh fails if no value is passed for it [1]
      • {type} optional parameters
      • enabled via Parameters >> Enable parameters to be discovered and overridden for execution [1]
    • {limitation} dataflows with parameters can't be
      • scheduled for refresh through the Fabric scheduler [1]
      • manually triggered through the Fabric Workspace list or lineage view [1]
    • {limitation} parameters that affect the resource path of a data source or a destination are not supported [1]
      • ⇐ connections are linked to the exact data source path defined in the authored dataflow
        • they can't currently be overridden to use other connections or resource paths [1]
    • {limitation} can't be leveraged by dataflows with incremental refresh [1]
    • {limitation} only parameters of type decimal number, whole number, text, and true/false can be passed for override
      • any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [1]
    • {warning} allows other users who have permissions to the dataflow to refresh the data with other values [1]
    • {limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [1]
    • {limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [1]
    • {limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [1]
    • {limitation} only the first request will be accepted from duplicated requests for the same parameter values [1]
      • subsequent requests are rejected until the first request finishes its evaluation [1]

    References:
    [1] Microsoft Learn (2025) Use public parameters in Dataflow Gen2 (Preview) [link]

    Resources:
    [R1] Microsoft Fabric Blog (2025) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

    Acronyms:
    API - Application Programming Interface
    REST - Representational State Transfer

    25 April 2025

    🏭🗒️Microsoft Fabric: Dataflows Gen2's Incremental Refresh [Notes] 🆕

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 25-Apr-2025

    [Microsoft Fabric] Incremental Refresh in Dataflows Gen2

    • {feature} enables incrementally extracting data from data sources, applying Power Query transformations, and loading into various output destinations [5]
      • designed to reduce the amount of data that needs to be processed and retrieved from the source system [8]
      • configurable directly in the dataflow editor [8]
      • doesn't need to specify the historical data range [8]
        • ⇐ the dataflow doesn't remove any data from the destination that's outside the bucket range [8]
      • doesn't need to specify the parameters for the incremental refresh [8]
        • the filters and parameters are automatically added as the last step in the query [8]
    • {prerequisite} the data source 
      • supports folding [8]
      • needs to contain a Date/DateTime column that can be used to filter the data [8]
    • {prerequisite} the data destination supports incremental refresh [8]
      • available destinations
        • Fabric Warehouse
        • Azure SQL Database
        • Azure Synapse Analytics
        • Fabric Lakehouse [preview]
      • other destinations can be used in combination with incremental refresh by using a second query that references the staged data to update the data destination [8]
        • allows to use incremental refresh to reduce the amount of data that needs to be processed and retrieved from the source system [8]
          • a full refresh from the staged data to the data destination is still needed [8]
    • works by dividing the data into buckets based on a DateTime column [8] (see the sketch after this list)
      • each bucket contains the data that changed since the last refresh [8]
        • the dataflow knows what changed by checking the maximum value in the specified column 
          • if the maximum value changed for that bucket, the dataflow retrieves the whole bucket and replaces the data in the destination [8]
          • if the maximum value didn't change, the dataflow doesn't retrieve any data [8]
    • {limitation} the data destination must be set to a fixed schema [8]
      • ⇒ the table's schema in the data destination must be fixed and can't change [8]
        • ⇒ a dynamic schema must be changed to a fixed schema before configuring incremental refresh [8]
    • {limitation} the only supported update method in the data destination: replace
      • ⇒ the dataflow replaces the data for each bucket in the data destination with the new data [8]
        • data that is outside the bucket range isn't affected [8]
    • {limitation} maximum number of buckets
      • single query: 50
        • {workaround} increase the bucket size or reduce the bucket range to lower the number of buckets [8]
      • whole dataflow: 150
        • {workaround} reduce the number of incremental refresh queries or increase the bucket size [8]
    • {downside} the dataflow may take longer to refresh after enabling incremental refresh [8]
      • because the additional overhead of checking if data changed and processing the buckets is higher than the time saved by processing less data [8]
      • {recommendation} review the settings for incremental refresh and adjust them to better fit the scenario
        • {option} increase the bucket size to reduce the number of buckets and the overhead of processing them [8]
        • {option} disable incremental refresh [8]
    • {recommendation} don't use the column for detecting changes also for filtering [8]
      • because this can lead to unexpected results [8]
    • {setting} limit the number of concurrent evaluations
      • setting the value to a lower number reduces the number of requests sent to the source system [8]
      • via global settings >> Scale tab >> maximum number of parallel query evaluations
      • {recommendation} don't enable this limit unless there are issues with the source system [8]
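
    Conceptually, each bucket translates into a filter on the chosen DateTime column that is folded to the source, similar to the queries below (an illustrative sketch with hypothetical table and column names, not the exact query generated by the dataflow):

    -- illustrative source-side filter for a single daily bucket
    SELECT OrderID
    , OrderDate
    , Amount
    FROM dbo.Orders
    WHERE OrderDate >= '2025-04-24T00:00:00'
      AND OrderDate < '2025-04-25T00:00:00';

    -- the maximum value per bucket is what the dataflow compares between refreshes
    SELECT MAX(OrderDate) max_order_date
    FROM dbo.Orders
    WHERE OrderDate >= '2025-04-24T00:00:00'
      AND OrderDate < '2025-04-25T00:00:00';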

    References:
    [5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
    [8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]

    Resources:


    14 April 2025

    🏭🗒️Microsoft Fabric: Quotas [Notes] 🆕

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 13-Apr-2025

    [Microsoft Fabric] Quotas

    • {def} assigned number of resources for an Azure subscription
      • depend on the Azure subscription type [5]
        • set and enforced in the scope of the subscription [1]
        • each subscription has a default value for each quota [1]
      • determine the maximum number of CUs for each of the capacities on the subscription [5]
        • customers aren't charged for quotas, but for capacities [5]
      • each quota represents a specific countable resource
        • e.g. number of VMs that can be created [2]
        • e.g. the number of storage accounts that can be used concurrently [2]
        • e.g. the number of networking resources that can be consumed [2]
        • e.g. the number of API calls to a particular service that can be made [2]
      • designed to help protect customers from unexpected behavior [1]
        • e.g. inaccurately resourced deployments, mistaken consumption
        • helps minimize risks from deceptive or inappropriate consumption and unexpected demand [1]
      • limitations
        • ⇐ some limits are managed at a regional level [2]
        • ⇐ variable and dependent on several factors
    • {type} adjustable quotas
      • quotas for which customers can request quota increases
        • via Azure Home My quotas page
        • based on amount or usage percentage and submitting it directly [1]
        • each subscription has a default quota value for each quota [1]
      • the quickest way to increase quotas [1]
    • {type} non-adjustable quotas
      • quotas which have a hard limit, usually determined by the scope of the subscription [1]
      • to make changes, customers must submit a support request for the Azure support team [1]
    • [Fabric] limits the number of CUs customers can provision across multiple capacities in a subscription
      • calculation based on
        • subscription plan type
        • Azure region
    • {action} provision a new capacity
      • {level} Azure management group limits
      • {level} Azure subscription limits
      • {level} Azure resource group limits
      • {level} Template limits
      • {level} Microsoft Entra ID limits
      • {level} Azure API Center limits
      • {level} Azure API Management limits
      • {level} API Management v2 tiers
      • {level} Azure App Service limits
      • {level} Azure Automation limits
      • {action} request quota increase
        • via support request
        • there is no cost associated with requesting a quota increase [1]
          • ⇐ costs are incurred based on resource usage, not the quotas themselves [1]
        • should be based on workloads' characteristics [1]
      • {action} view quota
        • [permission] contributor role, or another role that includes contributor access [5]
      • {action} manage quota
        • [permission] see Quota Request Operator
      • [Azure] quota alerts
        • notifications triggered when the usage of a specific Azure resource nears the predefined quota limit [4]
        • facilitates proactive resource management [4]
        • multiple alert rules can be created for a given quota or across quotas in a subscription [4]

        References:
        [1] Microsoft Learn (2024) Fabric: Quotas overview [link]
        [2] Microsoft Learn (2025) Fabric: Azure subscription and service limits, quotas, and constraints [link]
        [3] Microsoft Learn (2025) Fabric: Quota monitoring and alerting [link]
        [4] Microsoft Fabric Update Blog (2024) Announcing the launch of Microsoft Fabric Quotas [link]
        [5] Microsoft Learn (2025) Fabric: Microsoft Fabric capacity quotas [link]

        Resources:
        [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

        Acronyms:
        API - Application Programming Interface
        CU - Capacity Units
        VM - Virtual Machine

        07 April 2025

        🏭🗒️Microsoft Fabric: User Data Functions (UDFs) in Python [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 7-Apr-2025

        [Microsoft Fabric] User Data Functions (UDFs)
        • {def} a platform that allows customers to host and run custom logic on Fabric from different types of items and data sources [1]
        • empower data developers to write custom logic and embed it into their Fabric ecosystem [1]
          • the logic can include internal algorithms and libraries [1]
          • {benefit} allows handling complex transformations, optimizing performance, and integrating diverse data sources beyond the capabilities of low-code tools [5]
            • via public libraries from PyPI
            • via native Fabric integrations to 
              • connect to Fabric data sources
                • e.g. warehouse, lakehouse or SQL Database
              • invoke functions from other Fabric items
                • e.g. notebooks, Power BI reports, data pipelines
            • via editing functions directly in the Fabric portal [1]
              • via in-browser tooling, VS Code extension
            • supports the Python 3.11 runtime [1]
          • {goal} reusability
            • by creating libraries of standardized functionality [1]
              • can be used in many solutions across the organization [1]
          • {goal} customization
            • the applications are tailored to customers’ needs [1]
          • {goal} encapsulation
            • functions can perform various tasks to build sophisticated workflows [1]
          • {goal} external connectivity
            • data functions can be invoked from external client applications using a REST endpoint [1]
            • allows integrations with external systems [1]
        • {feature} programming model
          • SDK that provides the necessary functionality to author and publish runnable functions in Fabric [4]
          • {benefit} allows to seamlessly integrate with other items in the Fabric ecosystem [4]
            • e.g. Fabric data sources
          • {concept} user data functions item 
            • contains one or many functions that can be invoked from the Fabric portal using the provided REST endpoint [4]
              • from another Fabric item, or from an external application 
            • each function is a method in a Python script that allows passing parameters and returning an output to the invoker [4]
          • {component} fabric.functions library 
            • provides the code needed to create user data functions in Python [4]
            • imported in the template by default [4]
          • {method} fn.UserDataFunctions() 
            • provides the execution context [4]
            • added at the beginning of the code file in all new user data functions items, before any function definitions [4]
          • {concept} functions
            • every function is identified with a @udf.function() decorator
              • can be invoked individually from the portal or an external invoker [4]
              • functions without the decorator can't be invoked directly [4]
        • {feature} invocation logs
          • function invocations are logged
          • {benefit} allows to check the status or perform debugging  [3]
          • {limitation} can take a few minutes to appear [3]
            • {recommendation} refresh the page after a few minutes [3]
          • {limitation} daily ingestion limit: 250 MB [3]
            • reset the next day 
              • a new log is made available [3]
          • {limitation} logs are sampled while preserving a statistically correct analysis of application data [3]
            • the sampling is done by User data functions to reduce the volume of logs ingested [3]
            • if some logs are partially missing, it might be because of sampling [3]
          • {limitation} supported log types: information, error, warning, trace [3]
        • {feature} Manage connections
          • {limitation} only supports connecting to Fabric-native data sources [2]
        • {limitation} only available in a subset of Fabric regions [2]
        • {limitation} editable by the owner only [2]
          • can only be modified and published by the user who is the owner of the User Data Functions Fabric item [2]
        • {limitation} adds further keywords to the list of reserved keywords
          • namely: req, context, reqInvocationId
          • can't be used as parameter names or function names [2]

        References:
        [1] Microsoft Learn (2025) Microsoft Fabric: What is Fabric User data functions (Preview)? [link]
        [2] Microsoft Learn (2025) Microsoft Fabric: Service details and limitations of Fabric User Data Functions [link]
        [3] Microsoft Learn (2025) Microsoft Fabric: User data functions invocation logs (Preview) [link]
        [4] Microsoft Learn (2025) Microsoft Fabric: Fabric User data functions programming model overview (Preview) [link]
        [5] Microsoft Fabric Updates Blog (2025) Announcing Fabric User Data Functions (Preview) [link]
        [6] Microsoft Learn (2025) Microsoft Fabric: Use the Functions activity to run Fabric user data functions and Azure Functions [link]

        Resources:
        [R1] Microsoft Fabric Updates Blog (2025) Utilize User Data Functions in Data pipelines with the Functions activity (Preview) [link]

        Acronyms:
        PyPI - Python Package Index
        REST - Representational State Transfer
        SDK - Software Development Kit
        UDF - User Data Function
        VS - Visual Studio

        06 April 2025

        🏭🗒️Microsoft Fabric: Query Optimizer in Warehouse [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 6-Apr-2025

        [Microsoft Fabric] Hints in Warehouse
        • {def} keywords that users can add to SQL statements to provide additional information or instructions to the query optimizer [2]
          • options or strategies specified for enforcement by the SQL Server query processor [1]
            • applicable to SELECT, INSERT, UPDATE, or DELETE statements [1]
        • {benefit} help improve the performance, scalability, or consistency of queries by overriding the default behavior of the query optimizer [2]
        • {type} join hints
          • specify what join strategy/algorithm is used between two tables [3] (see the sketch after this list)
            • improves the performance of queries that involve large or complex joins [2]
            • specified in the FROM clause of a query [1]
            • if a join hint is specified for any two tables, the query optimizer automatically enforces the join order for all joined tables in the query [3]
              • based on the position of the ON keywords [3]
                • when a CROSS JOIN is used without the ON clause, parentheses can be used to indicate the join order [3]
          • looping
            • via LOOP JOIN
            • {restriction} can't be specified together with RIGHT or FULL as a join type [3]
          • hashing
            • via HASH JOIN
          • merging
            • via MERGE JOIN
          • REPLICATE
            • causes a broadcast move operation
              • a specific table to be replicated across all distribution nodes [2]
            • with INNER or LEFT join
              • the broadcast move operation will replicate the right side of the join to all nodes [2]
            • with RIGHT join
              • the broadcast move operation will replicate the left side of the join to all nodes [2]
            • with FULL join
              • an estimated plan cannot be created [2]
          • REDISTRIBUTE [(columns_count)]
            • ensures two data sources are distributed based on JOIN clause columns [2]
            • handles multiple join conditions, specified by the first n columns in both tables, where n is the columns_count argument [2]
            • redistributing data optimizes query performance by evenly spreading data across nodes during intermediate steps of execution [2]
            • the (columns_count) argument is only supported in MF warehouse [2]
        • {type} query hint
          • specify that the indicated hints are used in the scope of a query [3]
            • affect all operators in the statement
              • [UNION] only the last query involving a UNION operation can have the OPTION clause [3]
            • specified as part of the OPTION clause [3]
            • Error 8622 occurs if one or more query hints cause the Query Optimizer not to generate a valid plan [3]
          • used via the OPTION clause at the end of a query [2]
            • followed by the name of the query hint and its optional parameters in parentheses [2]
            • multiple hints can be used in the same query, separated by commas
              • e.g. FORCE ORDER and MAX_GRANT_PERCENT
                • instruct the QO to preserve the join order specified in the query and to limit the memory grant to 20 percent of the available memory
          • {hint} HASH GROUP
            • specifies that the QO should use a hash-based algorithm for the GROUP BY operation [2]
            • {benefit} can improve the performance of queries that involve large or complex grouping sets [2]
          • {hint} ORDER GROUP
            • specifies that the QO should use a sort-based algorithm for the GROUP BY operation [2]
            • {benefit} can improve the performance of queries that involve small or simple grouping sets [2]
          • {hint} MERGE UNION
            • specifies that the QO should use a merge-based algorithm for the UNION or UNION ALL operation [2]
            • {benefit} can improve the performance of queries that involve sorted inputs [2]
          • {hint} HASH UNION
            • specifies that the query optimizer should use a hash-based algorithm for the UNION or UNION ALL operation [2]
            • {benefit} can improve the performance of queries that involve unsorted or large inputs [2]
          • {hint} CONCAT UNION
            • specifies that the QO should use a concatenation-based algorithm for the UNION or UNION ALL operation [2]
            • {benefit} can improve the performance of queries that involve distinct or small inputs [2]
          • {hint} FORCE ORDER
            • specifies that the QO should preserve the join order specified in the query [2]
            • {benefit} can improve the performance or consistency of queries that involve complex join conditions or hints [2]
          • {hint} FORCE SINGLE NODE PLAN/FORCE DISTRIBUTED PLAN
            • allows to choose whether to force a single node plan or a distributed plan for query’s execution [2]
          • {hint} USE HINT
            • adds one or more extra hints to the query processor; the hints are specified with a hint name inside single quotation marks within the OPTION clause [2]
              • e.g. OPTION (USE HINT('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'))
            • used with several hint names, changing the behavior of CE derivation
              • {hint name} ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES
                • applies when calculating cardinality estimates for AND predicates for filters [2]
                • MF assumes full correlation among filters when a high level of underestimation on AND predicates for filters is observed [2]
                • [SQL Server] equivalent TF 4137 [4]
              • {hint name} ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES
                • applies when calculating cardinality estimates for AND predicates for filters  [2]
                • MF assumes full independence among filters [2]
                  • if a high level of overestimation on AND predicates for filters is observed, this hint can help produce a better estimate [2]
              • {hint name} ASSUME_PARTIAL_CORRELATION_FOR_FILTER_ESTIMATES
                • applies when calculating cardinality estimates for AND predicates for filters [2]
                • {default} MF assumes partial correlation among filters [2]
                  • ⇐ it is unlikely that this hint will help improve the estimates [2]
              • {hint name} ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS
                • applies when calculating cardinality estimate for joins [2]
                • uses Simple Containment assumption instead of the default Base Containment assumption [2]
                • [SQL Server] equivalent TF 9476 [4]
        • {type} table hints
          • none enforced currently
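
        A hedged sketch of how the two hint types can be applied in a Fabric warehouse query (the tables and columns are hypothetical; the hint names are the ones discussed above):

        -- join hint: force a hash join between two tables (FROM clause)
        SELECT C.CustomerID
        , SUM(S.Amount) Amount
        FROM dbo.Customer C
             INNER HASH JOIN dbo.Sale S
               ON C.CustomerID = S.CustomerID
        GROUP BY C.CustomerID;

        -- query hints: preserve the join order and change the CE assumption for filters (OPTION clause)
        SELECT C.CustomerID
        , S.Amount
        FROM dbo.Customer C
             JOIN dbo.Sale S
               ON C.CustomerID = S.CustomerID
        WHERE C.Country = 'Germany'
          AND C.Segment = 'Retail'
        OPTION (FORCE ORDER, USE HINT('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'));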

        References:
        [1] Microsoft Learn (2025) SQL: Hints (T-SQL) [link]
        [2] Microsoft Fabric Updates Blog (2025) Hints in Fabric Data Warehouse [link]
        [3] Microsoft Learn (2025) SQL: OPTION clause (T-SQL) [link]
        [4] Microsoft Support (2016) KB3189813 - Update introduces USE HINT query hint argument in SQL Server 2016 [link]

        Resources:
        [R1] Microsoft Learn (2025) SQL Server: Query hints (T-SQL) [link]
        [R2] Most Useful Query Hints [link]

        Acronyms:
        MF - Microsoft Fabric
        TF - Trace Flag
        QO - Query Optimizer

        28 March 2025

        🏭🗒️Microsoft Fabric: OneLake Role-Based Access Control (RBAC) [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 28-Mar-2025

        [Microsoft Fabric] OneLake Role-based access control (RBAC)

        • {def} security framework that allows to manage access to resources by assigning roles to users or groups 
          • applies to Lakehouse Items only [1]
          • restricts data access for users with Workspace Viewer or read access to a lakehouse [1]
          • doesn't apply to Workspace Admins, Members, or Contributors [1]
            • ⇒ supports only Read level of permissions [1]
          • uses role assignments to apply permissions to its members
            • assigned to 
              • individuals
              • security groups
              • Microsoft 365 groups
              • distribution lists
              • ⇐ every member of the user group gets the assigned role [1]
            • users in multiple groups get the highest level of permission that is provided by the roles [1]
          • managed through the lakehouse data access settings [1]
          • when a lakehouse is created, OneLake generates a default RBAC Role named Default Readers [1]
            • allows all users with ReadAll permission to read all folders in the Item [1]
          • permissions always inherit to the entire hierarchy of the folder's files and subfolders [1]
          • provides automatic traversal of parent items to ensure that data is easy to discover [1]
            • ⇐ similar to Windows folder permissions [1]
            • [shortcuts] shortcuts to other OneLake locations have specialized behavior [1]
              • the access to a OneLake shortcut is determined by the target permissions of the shortcut [1]
                • when listing shortcuts, no call is made to check the target access [1]
                  • ⇒ when listing a directory all internal shortcuts will be returned regardless of a user's access to the target [1]
                    • when a user tries to open the shortcut, the access check is evaluated and the user only sees the data they have the required permissions to see [1]
          • allows restricting data access in OneLake to specific folders only [1]
        • {action} share a lakehouse
          • grants other users or a group of users access to a lakehouse without giving access to the workspace and the rest of its items [1]
          • found through 
            • Data Hub 
            • 'Shared with Me' section in Microsoft Fabric
        • [shortcuts] permissions always inherit to all Internal shortcuts where a folder is defined as target [1]
          • when a user accesses data through a shortcut to another OneLake location, the identity of the calling user is used to authorize access to the data in the target path of the shortcut [1]
            • ⇒ the user must have OneLake RBAC permissions in the target location to read the data [1]
            • defining RBAC permissions for the internal shortcut is not allowed [1]
              • must be defined on the target folder located in the target item [1]
              • OneLake enables RBAC permissions only for shortcuts targeting folders in lakehouse items [1]


        References:
        [1] Microsoft Learn (2024) Fabric: Role-based access control (RBAC) [link]
        [2] Microsoft Learn (2024) Best practices for OneLake security [link]

        Resources:
        [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

        Acronyms:
        ADLS - Azure Data Lake Storage
        RBAC - Role-Based Access Control

        12 March 2025

        🏭🎗️🗒️Microsoft Fabric: Query Acceleration [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 12-Mar-2025


        [Microsoft Fabric] Query Acceleration

        • {def} indexes and caches data landing in OneLake on the fly [2]
            • {benefit} allows to 
              • analyze real-time streams coming directly into Eventhouse and combine it with data landing in OneLake 
              • ⇐ either coming from mirrored databases, Warehouses, Lakehouses or Spark [2] 
              • ⇒ accelerate data landing in OneLake
                • ⇐ including existing data and any new updates, and expect similar performance [1]
                • eliminates the need to 
                  • manage ingestion pipelines [1]
                  • maintain duplicate copies of data [1]
                • ensures that data remains in sync without additional effort [4]
                • the initial process is dependent on the size of the external table [4]
            • ⇐ provides significant performance comparable to ingesting data in Eventhouse [1]
              • in some cases up to 50x and beyond [2]
            • ⇐ supported in Eventhouse over delta tables from OneLake shortcuts, etc. [4]
              • when creating a shortcut from an Eventhouse to a OneLake delta table, users can choose if they want to accelerate the shortcut [2]
              • accelerating the shortcut means equivalent ingestion into the Eventhouse
              • ⇐  optimizations that deliver the same level of performance for accelerated shortcuts as native Eventhouse tables [2]
              • e.g. indexing, caching, etc. 
            • all data management is done by the data writer and in the Eventhouse the accelerated table shortcut [2]  
            • behave like external tables, with the same limitations and capabilities [4]
              • {limitation} materialized views aren't supported [1]
              • {limitation} update policies aren't supported [1]
          • allows specifying a policy on top of external delta tables that defines the number of days to cache data for high-performance queries [1]
            • ⇐ queries run over OneLake shortcuts can be less performant than on data that is ingested directly to Eventhouses [1]
              • ⇐ due to network calls to fetch data from storage, the absence of indexes, etc. [1]
          • {costs} charged under OneLake Premium cache meter [2]
            • ⇐ similar to native Eventhouse tables [2]
            • one can control the amount of data to accelerate by configuring number of days to cache [2]
            • indexing activity may also count towards CU consumption [2]
          • {limitation} the number of columns in the external table can't exceed 900 [1]
          • {limitation} query performance over accelerated external delta tables which have partitions may not be optimal during preview [1]
          • {limitation} the feature assumes delta tables with static advanced features
            • e.g. column mapping doesn't change, partitions don't change, etc
            • {recommendation} to change advanced features, first disable the policy, and once the change is made, re-enable the policy [1]
          • {limitation} schema changes on the delta table must also be followed with the respective .alter external delta table schema [1]
            • might result in acceleration starting from scratch if there was breaking schema change [1]
          • {limitation} index-based pruning isn't supported for partitions [1]
          • {limitation} parquet files with a compressed size higher than 6 GB won't be cached [1]

        References:
        [1] Microsoft Learn (2024) Fabric: Query acceleration for OneLake shortcuts - overview (preview) [link]
        [2] Microsoft Fabric Updates Blog (2024) Announcing Eventhouse Query Acceleration for OneLake Shortcuts (Preview) [link]
        [3] Microsoft Learn (2024) Fabric: Query acceleration over OneLake shortcuts (preview) [link]
        [4] Microsoft Fabric Updates Blog (2025) Eventhouse Accelerated OneLake Table Shortcuts – Generally Available [link]

        Resources:
        [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

        25 February 2025

        🏭💠🗒️Microsoft Fabric: T-SQL Notebook [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 25-Feb-2025

        [Microsoft Fabric] T-SQL notebook

        • {def} notebook that enables to write and run T-SQL code within a notebook [1]
        • {feature} allows to manage complex queries and write better markdown documentation [1]
        • {feature} allows the direct execution of T-SQL on
          • connected warehouse
          • SQL analytics endpoint
          • ⇐ queries can be run directly on the connected endpoint [1]
            • multiple connections are allowed [1]
        • allows running cross-database queries to gather data from multiple warehouses and SQL analytics endpoints [1] (see the sketch after this list)
        • the code is run by the primary warehouse
          • used as the default in commands that support three-part naming when no warehouse is provided [1]
          • three-part naming consists of 
            • database name
              • the name of the warehouse or SQL analytics endpoint [1]
            • schema name
            • table name
        • {feature} autogenerate T-SQL code using the code template from the object explorer's context menu [1]
        • {concept} code cells
          • allow to create and run T-SQL code
            • each code cell is executed in a separate session [1]
              • {limitation} the variables defined in one cell are not available in another cell [1]
              • one can check the execution summary after the code is executed [1]
            • cells can be run individually or together [1]
            • one cell can contain multiple lines of code [1]
              • users can select and run subparts of a cell’s code [1]
          • {feature} Table tab
            • lists the records from the returned result set
              • if the execution returns multiple result sets, one can switch from one to another via the dropdown menu [1]
        • a query can be saved as 
          • view
            • via 'Save as' view
            • {limitation} does not support three-part naming [1]
              • the view is always created in the primary warehouse [1]
                • ⇐ to create it in a different warehouse, set that warehouse as the primary one [1]
          • table
            • via 'Save as' table
            • saved as CTAS 
          • ⇐ 'Save as' is only available for the selected query text
            • the query text must be selected before using the Save as options
        • {limitation} doesn’t support 
          • parameter cell
            • the parameter passed from pipeline or scheduler can't be used [1]
          • {feature} Recent Run 
            • {workaround} use the current data warehouse monitoring feature to check the execution history of the T-SQL notebook [1]
          • {feature} the monitor URL inside the pipeline execution
          • {feature} snapshot 
          • {feature} Git support 
          • {feature} deployment pipeline support 
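
        A minimal sketch of a cross-database query using three-part naming, as referenced above (the warehouse, schema and table names are hypothetical):

        -- cross-database query across two warehouses via three-part naming
        SELECT Ord.CustomerID
        , SUM(Ord.Amount) Amount
        FROM SalesWarehouse.dbo.Orders Ord
             JOIN FinanceWarehouse.dbo.Customers Cst
               ON Ord.CustomerID = Cst.CustomerID
        GROUP BY Ord.CustomerID;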

        References:
        [1] Microsoft Learn (2025) T-SQL support in Microsoft Fabric notebooks [link]
        [2] Microsoft Learn (2025) Create and run a SQL Server notebook [link]
        [3] Microsoft Learn (2025) T-SQL surface area in Microsoft Fabric [link]
        [4] Microsoft Fabric Updates Blog (2024) Announcing Public Preview of T-SQL Notebook in Fabric [link]

        Resources:
        [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

        Acronyms
        CTAS - Create Table as Select
        T-SQL - Transact SQL

        17 January 2025

        💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VIII: Permissions) [new feature]

        Data-based solutions usually target a set of users who (ideally) have restricted permissions to the functionality. Therefore, several personas targeting different use cases are defined as part of the process, and the permissions must be restricted accordingly for each of them.

        In the simplest scenario the user must have access to the underlying objects for querying the data. Supposing that an Entra User was created already, the respective user must also be given access in the Fabric database (see [1], [2]). From the database's main menu follow the path below to assign read permissions:
        Security >> Manage SQL Security >> (select role: db_datareader)

        Manage SQL Security

        Manage access >> Add >> (search for User)

        Manage access

        (select user) >> Share database >> (select additional permissions) >> Save

        Manage additional permissions

        The easiest way to test whether the permissions work before building the functionality is to log in via SQL Server Management Studio (SSMS) and check the access using Microsoft Entra MFA. Ideally, one should have a User's credentials that can be used only for testing purposes. After the above setup was done, the new User was able to access the data.

        A second User can be created for testing with the maximum of permissions allowed on the SQL database side, which is useful for troubleshooting. Alternatively, one can use only one User for testing and assign or remove the permissions as needed by the test scenario. 

        It's a good idea to try to understand what's happening in the background. For example, the expectation was that a SQL user would also be created for the Entra User created above, which doesn't seem to be the case, at least per the current functionality available.

        Before diving deeper, it's useful to retrieve the User's details:

        -- retrieve current user
        SELECT SUser_Name() sys_user_name
        , User_Id() user_id 
        , USER_NAME() user_name
        , current_user [current_user]
        , user [user]; 
        
        Output:
        sys_user_name user_id user_name current_user user
        JamesClavell@[domain].onmicrosoft.com 0 JamesClavell@[domain].onmicrosoft.com JamesClavell@[domain].onmicrosoft.com JamesClavell@[domain].onmicrosoft.com

        Retrieving the current User is useful especially when testing functionality in parallel with different Users. Strangely, the User's ID is 0 when only read permissions were assigned. However, a valid User identifier is assigned, for example, when the db_datawriter role is also granted to the User. Removing the db_datawriter role afterwards keeps the User's ID, as expected. For troubleshooting purposes, at least per the current functionality, it might be a good idea to create the Users with a valid User ID (e.g. by assigning temporarily the db_datawriter role to the User).
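
        A minimal sketch of that workaround, assuming the Entra User created above (ALTER ROLE is used here only for illustration; the same can be achieved via the Manage SQL Security page):

        -- assign the db_datawriter role temporarily (replace the <domain> placeholder)
        ALTER ROLE db_datawriter ADD MEMBER [JamesClavell@<domain>.onmicrosoft.com];
        
        -- remove the role afterwards; the User keeps its identifier
        ALTER ROLE db_datawriter DROP MEMBER [JamesClavell@<domain>.onmicrosoft.com];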

        The next step is to look at the Users with access to the database:

        -- database access 
        SELECT USR.uid
        , USR.name
        --, USR.sid 
        , USR.hasdbaccess 
        , USR.islogin
        , USR.issqluser
        --, USR.createdate 
        --, USR.updatedate 
        FROM sys.sysusers USR
        WHERE USR.hasdbaccess = 1
          AND USR.islogin = 1
        ORDER BY uid
        
        Output:
        uid name hasdbaccess islogin issqluser
        1 dbo 1 1 1
        6 CharlesDickens@[...].onmicrosoft.com 1 1 0
        7 TestUser 1 1 1
        9 JamesClavell@[...].onmicrosoft.com 1 1 0

        For testing purposes, besides the standard dbo user and the two Entra-based Users, a SQL User was also created, to which access to the SalesLT schema was granted (see the initial post):

        -- create the user
        CREATE USER TestUser WITHOUT LOGIN;
        
        -- assign access to SalesLT schema 
        GRANT SELECT ON SCHEMA::SalesLT TO TestUser;
          
        -- test impersonation (run together)
        EXECUTE AS USER = 'TestUser';
        
        SELECT * FROM SalesLT.Customer;
        
        REVERT; 

        Notes:
        1) Strangely, even if access was given explicitly only to the SalesLT schema, the TestUser User also has access to sys.sysusers and other DMVs. The same applies to the access over SSMS.
        2) For the above created User there are no records in the sys.user_token and sys.login_token DMVs, in contrast with the user(s) created for administering the SQL database. 
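
        These DMVs can be checked with two simple queries, for example while impersonating TestUser via EXECUTE AS as above:

        -- security tokens for the current security context
        SELECT * FROM sys.user_token;
        
        SELECT * FROM sys.login_token;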

        Let's look at the permissions granted explicitly:

        -- permissions granted explicitly
        SELECT DPR.principal_id
        , DPR.name
        , DPR.type_desc
        , DPR.authentication_type_desc
        , DPE.state_desc
        , DPE.permission_name
        FROM sys.database_principals DPR
             JOIN sys.database_permissions DPE
        	   ON DPR.principal_id = DPE.grantee_principal_id
        WHERE DPR.principal_id != 0 -- removing the public user
        ORDER BY DPR.principal_id
        , DPE.permission_name;
        
        Result:
        principal_id name type_desc authentication_type_desc state_desc permission_name
        1 dbo SQL_USER INSTANCE GRANT CONNECT
        6 CharlesDickens@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT AUTHENTICATE
        6 CharlesDickens@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT CONNECT
        7 TestUser SQL_USER NONE GRANT CONNECT
        7 TestUser SQL_USER NONE GRANT SELECT
        9 JamesClavell@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT CONNECT

        During troubleshooting it might be useful to check the current user's permissions at the various levels via sys.fn_my_permissions:

        -- retrieve database-scoped permissions for current user
        SELECT *
        FROM sys.fn_my_permissions(NULL, 'Database');
        
        -- retrieve schema-scoped permissions for current user
        SELECT *
        FROM sys.fn_my_permissions('SalesLT', 'Schema');
        
        -- retrieve object-scoped permissions for current user
        SELECT *
        FROM sys.fn_my_permissions('SalesLT.Customer', 'Object')
        WHERE permission_name = 'SELECT';
        

        Notes:
        1) See also [1] and [4] in what concerns the limitations that apply to managing permissions in SQL databases.

        Happy coding!


        References:
        [1] Microsoft Learn (2024) Microsoft Fabric: Share your SQL database and manage permissions [link]
        [2] Microsoft Learn (2024) Microsoft Fabric: Share data and manage access to your SQL database in Microsoft Fabric  [link]
        [3] Microsoft Learn (2024) Authorization in SQL database in Microsoft Fabric [link]
        [4] Microsoft Learn (2024) Authentication in SQL database in Microsoft Fabric [link]

        [5] Microsoft Learn (2025) Manage access for SQL databases in Microsoft Fabric with workspace roles and item permissions [link]

        06 January 2025

        💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VII: Things That Don't Work) 🆕

        Microsoft does a relatively good job of documenting what doesn't work in Microsoft Fabric's SQL Databases. There's a good overview available already in the documentation, though beyond this the current post lists my findings while testing the code previously written on this blog.

        USE Database

        The standard syntax allows changing via USE the database context to the specified database or database snapshot. Unfortunately, this syntax doesn't seem to be supported currently, and many scripts seem to rely heavily on it. Thus, the following lines of code throw an error:

        -- changing the context
        USE master;
        GO
        USE tempdb;
        

        "Msg 40508, Level 16, State 1, Line 1, USE statement is not supported to switch between databases. Use a new connection to connect to a different database"

        However, one can use the 3-part naming convention to reference the various objects:

        -- sys metadata - retrieving the database files
        
        SELECT *
        FROM tempdb.sys.database_files dbf
        ORDER BY name;

        Even if the tempdb is not listed in the sys.databases table, it's still available for querying, which can prove helpful for troubleshooting. 
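
        One can check this by listing the databases visible to the current session (as a quick sketch; tempdb is expected to be missing from the output):

        -- sys metadata - retrieving the visible databases
        SELECT database_id
        , name
        , create_date
        FROM sys.databases
        ORDER BY name;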

        DBCC commands 

        The documentation warns that some DBCC commands won't work, though in some cases there are also alternatives. For example:

        -- clearing the procedure cache via DBCC
        DBCC FREEPROCCACHE;
        
        Output:
        "Msg 2571, Level 14, State 9, Line 1, User '<user>' does not have permission to run DBCC freeproccache."

        Alternatively, one can use the following command, which seems to work:

        -- clearing the procedure cache via ALTER
        ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;

        CHECKDB, which checks the logical and physical integrity of all the objects in the specified database, can't be used either:
         
        -- Checking the logical and physical integrity of a database
        DBCC CHECKDB();
        
        Output:
        "Msg 916, Level 14, State 2, Line 1, The server principal "..." is not able to access the database "..." under the current security context."
        The same error message is received for CHECKTABLE, a utility which checks the integrity of all the pages and structures that make up the table (or indexed view):

        -- checking a table's integrity
        DBCC CHECKTABLE ('SalesLT.Address');
        
        Output:
        "Msg 916, Level 14, State 2, Line 2, The server principal "..." is not able to access the database "..." under the current security context."
        A similar error message is received for SQLPERF, which provides transaction log space usage statistics for all databases:

        -- retrieving the LOGSPACE information for all databases
        DBCC SQLPERF (LOGSPACE);
        
        Output: 
        "Msg 297, Level 16, State 10, Line 1, The user does not have permission to perform this action."
        There are however DBCC commands like SHOW_STATISTICS or SHRINKDATABASE which do work. 
         
        -- current query optimization statistics
        DBCC SHOW_STATISTICS('SalesLT.Address','PK_Address_AddressID');
        
        Output:
        Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows Persisted Sample Percent
        PK_Address_AddressID Dec 21 2024 3:02AM 450 450 197 1 4 NO 450 0

        SHRINKDATABASE shrinks the size of the data and log files in the specified database:

        -- shrinking database
        DBCC SHRINKDATABASE([AdventureWorks01-...]) WITH NO_INFOMSGS;
        

        Update 29-Jan-2025: According to an answer from Ask the Expert session on Fabric Database [3], Microsoft seems to be working in bringing more DBCC features to SQL databases.

        Happy coding!


        References:
        [1] Microsoft Learn (2024) SQL Server: USE <database> [link]
        [2] Microsoft Learn (2024) Database console commands [link]
        [3] Microsoft Reactor (2025) Ask The Expert - Fabric Edition - Fabric Databases [link]


        About Me

        Koeln, NRW, Germany
        IT Professional with more than 25 years of experience in IT across the full life cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.