
24 May 2025

🏭🗒️Microsoft Fabric: Materialized Lake Views (MLV) [Notes] 🆕🗓️

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 24-May-2025

-- create schema
CREATE SCHEMA IF NOT EXISTS <lakehouse_name>.<schema_name>

-- create a materialized lake view
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS <lakehouse_name>.<schema_name>.<view_name>
(
    CONSTRAINT <constraint_name> CHECK (<constraint>) ON MISMATCH DROP 
) 
AS 
SELECT ...
FROM ...
-- WHERE ...
--GROUP BY ...
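
A minimal sketch of how the template might be instantiated, assuming a bronze-to-silver transformation in a lakehouse (all lakehouse, schema, table and column names below are hypothetical):

-- create the target schema (hypothetical names)
CREATE SCHEMA IF NOT EXISTS SalesLakehouse.silver;

-- materialized lake view with a data quality constraint
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS SalesLakehouse.silver.orders_cleaned
(
    CONSTRAINT valid_order_date CHECK (OrderDate IS NOT NULL) ON MISMATCH DROP
)
AS
SELECT OrderID
, CustomerID
, OrderDate
, SUM(Amount) Amount
FROM SalesLakehouse.bronze.orders
WHERE Status <> 'Cancelled'
GROUP BY OrderID
, CustomerID
, OrderDate;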

[Microsoft Fabric] Materialized Lake Views (MLV)

  • {def} persisted, continuously updated view of data [1]
    • {benefit} allows to build declarative data pipelines using SQL, complete with built-in data quality rules and automatic monitoring of data transformations
      • simplifies the implementation of multi-stage Lakehouse processing [1]
        • streamline data workflows
        • enable developers to focus on business logic [1]
          • ⇐ not on infrastructural or data quality-related issues [1]
        • the views can be created in a notebook [2]
      • can have data quality constraints enforced and visualized for every run, showing completion status and conformance to data quality constraints defined in a single view [1]
      • empowers developers to set up complex data pipelines with just a few SQL statements and then handle the rest automatically [1]
        • faster development cycles 
        • trustworthy data
        • quicker insights
  • {goal} process only the new or changed data instead of reprocessing everything each time [1]
    • ⇐  leverages Delta Lake’s CDF under the hood
      • ⇒ it can update just the portions of data that changed rather than recompute the whole view from scratch [1]
  • {operation} creation
    • allows defining transformations at each layer [1]
      • e.g. aggregation, projection, filters
    • allows specifying certain checks that the data must meet [1]
      • incorporate data quality constraints directly into the pipeline definition
    • via CREATE MATERIALIZED LAKE VIEW
      • the SQL syntax is declarative and Fabric figures out how to produce and maintain it [1]
  • {operation} refresh
    • refreshes only when its source has new data [1]
      • if there’s no change, it can skip running entirely (saving time and resources) [1]
  • {feature} automatically generates a visual report that shows trends on data quality constraints
    • {benefit} allows to easily identify the checks that introduce maximum errors and the associated MLVs for easy troubleshooting [1]
  • {feature} can be combined with Shortcut Transformation feature for CSV ingestion 
    • {benefit} allows building an end-to-end Medallion architecture
  • {feature} dependency graph
    • allows to see the dependencies existing between the various objects [2]
      • ⇐ automatically generated [2]
  • {feature} data quality report
    • built-in Power BI dashboard that shows several aggregated metrics [2]
  • {feature|planned} support for PySpark
  • {feature|planned} incremental refresh
  • {feature|planned} integration with Data Activator

References:
[1] Microsoft Fabric Update Blog (2025) Simplifying Medallion Implementation with Materialized Lake Views in Fabric [link|aka]
[2] Power BI Tips (2025) Microsoft Fabric Notebooks with Materialized Views - Quick Tips [link]
[3] Microsoft Learn (2025)  [link]

Resources:
[R1] Databricks (2025) Use materialized views in Databricks SQL [link]

Acronyms:
CDF - Change Data Feed
ETL - Extract, Transform, Load
MF - Microsoft Fabric
MLV - Materialized Lake Views

23 May 2025

🏭🗒️Microsoft Fabric: Warehouse Snapshots [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 23-May-2025

[Microsoft Fabric] Warehouse Snapshots

  • {def} read-only representation of a warehouse at a specific point in time [1]
  • supports analytics, reporting, and historical analysis scenarios without the worry of volatile live data updates [1]
    • provides a consistent and stable view of data [1]
    • ensures that analytical workloads remain unaffected by ongoing changes or ETL operations [1]
  • {benefit} guarantees data consistency
    • the dataset remains unaffected by ongoing ETL processes [1]
  • {benefit} immediate roll-forward updates
    • can be seamlessly rolled forward on demand to reflect the latest state of the warehouse
      • ⇒ {benefit} consumers access the same snapshot using a consistent connection string, even from third-party tools [1]
      • ⇐ updates are applied immediately, as if in a single, atomic transaction [1]
  • {benefit} facilitates historical analysis
    • snapshots can be created on an hourly, daily, or weekly basis to suit their business requirements [1]
  • {benefit} enhanced reporting
    • provides a point-in-time reliable dataset for precise reporting [1]
      • ⇐ free from disruptions caused by data modifications [1]
  • {benefit} doesn't require separate storage [1]
    • relies on source Warehouse [1]
  • {limitation} doesn't support database objects 
  • {limitation} captures a state within the last 30 days
  • {operation} create snapshot
    • via New warehouse snapshot
    • multiple snapshots can be created for the same parent warehouse [1]
      • appear as child items of the parent warehouse in the workspace view [1]
      • queries run against a snapshot provide the version of the data as of the snapshot's point in time [1]
  • {operation} read properties 
    • via GET https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{warehousesnapshotId} with the header Authorization: Bearer <bearer token>
  • {operation} update snapshot timestamp
    • allows users to roll forward data instantly, ensuring consistency [1] (see the sketch after this list)
      • use current state
        • via ALTER DATABASE [<snapshot name>] SET TIMESTAMP = CURRENT_TIMESTAMP;
      • use point in time
        • via ALTER DATABASE [<snapshot name>] SET TIMESTAMP = 'YYYY-MM-DDTHH:MM:SS.SS'; -- UTC time
    • queries that are in progress during point in time update will complete against the version of data they were started against [1]
  • {operation} rename snapshot
  • {operation} delete snapshot
    • via DELETE
    • when the parent warehouse gets deleted, the snapshot is also deleted [1]
  • {operation} modify source table
    • DDL changes to source will only impact queries in the snapshot against tables affected [1]
  • {operation} join multiple snapshots
    • the resulting snapshot date will be applied to each warehouse connection [1]
  • {operation} retrieve metadata
    • via sys.databases [1]
  • [permissions] inherited from the source warehouse [1]
    • ⇐ any permission changes in the source warehouse applies instantly to the snapshot [1]
    • security updates on the source database are reflected immediately in the snapshot databases [1]
  • {limitation} can only be created against new warehouses [1]
    • created after Mar-2025
  • {limitation} do not appear in SSMS Object Explorer but will show up in the database selection dropdown [1]
  • {limitation} datetime can be set to any date in the past up to 30 days or database creation time (whichever is later)  [1]
  • {limitation} modified objects after the snapshot timestamp become invalid in the snapshot [1]
    • applies to tables, views, and stored procedures [1]
  • {limitation} must be recreated if the data warehouse is restored [1]
  • {limitation} aren’t supported on the SQL analytics endpoint of the Lakehouse [1]
  • {limitation} aren’t supported as a source for OneLake shortcuts [1]
  • [Power BI] {limitation} requires DirectQuery or Import mode [1]
    • doesn't support Direct Lake
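
A minimal T-SQL sketch of the roll-forward operation and of retrieving the snapshot's metadata, based on the statements above (the snapshot name is hypothetical):

-- roll the snapshot forward to the warehouse's current state
ALTER DATABASE [SalesWarehouse_Snapshot] SET TIMESTAMP = CURRENT_TIMESTAMP;

-- roll the snapshot to a specific point in time (UTC, within the last 30 days)
ALTER DATABASE [SalesWarehouse_Snapshot] SET TIMESTAMP = '2025-05-20T08:00:00.00';

-- retrieve the snapshot's metadata
SELECT name
, database_id
, create_date
FROM sys.databases;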

    References:
    [1] Microsoft Learn (2025) Fabric: Warehouse Snapshots in Microsoft Fabric (Preview) [link]
    [2] Microsoft Learn (2025) Warehouse snapshots (preview) [link]
    [3] Microsoft Learn (2025) Create and manage a warehouse snapshot (preview) [link]

    Resources:


    Acronyms:
    DDL - Data Definition Language
    ETL - Extract, Transform, Load
    MF - Microsoft Fabric
    SSMS - SQL Server Management Studio

    26 April 2025

    🏭🗒️Microsoft Fabric: Parameters in Dataflows Gen2 [Notes] 🆕

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 26-Apr-2025

    [Microsoft Fabric] Dataflow Gen2 Parameters

    • {def} parameters that allow to dynamically control and customize Dataflows Gen2
      • makes them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [1]
      • the dataflow is refreshed by passing parameter values outside of the Power Query editor through either
        • Fabric REST API [1]
        • native Fabric experiences [1]
      • parameter names are case sensitive [1]
      • {type} required parameters
        • {warning} the refresh fails if no value is passed for it [1]
      • {type} optional parameters
      • enabled via Parameters >> Enable parameters to be discovered and overridden for execution [1]
    • {limitation} dataflows with parameters can't be
      • scheduled for refresh through the Fabric scheduler [1]
      • manually triggered through the Fabric Workspace list or lineage view [1]
    • {limitation} parameters that affect the resource path of a data source or a destination are not supported [1]
      • ⇐ connections are linked to the exact data source path defined in the authored dataflow
        • they can't currently be overridden to use other connections or resource paths [1]
    • {limitation} can't be leveraged by dataflows with incremental refresh [1]
    • {limitation} only parameters of type decimal number, whole number, text, and true/false can be passed for override
      • any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [1]
    • {warning} allows other users who have permissions to the dataflow to refresh the data with other values [1]
    • {limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [1]
    • {limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [1]
    • {limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [1]
    • {limitation} only the first request will be accepted from duplicated requests for the same parameter values [1]
      • subsequent requests are rejected until the first request finishes its evaluation [1]

    References:
    [1] Microsoft Learn (2025) Use public parameters in Dataflow Gen2 (Preview) [link]

    Resources:
    [R1] Microsoft Fabric Blog (2025) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

    Acronyms:
    API - Application Programming Interface
    REST - Representational State Transfer

    25 April 2025

    🏭🗒️Microsoft Fabric: Dataflows Gen2's Incremental Refresh [Notes] 🆕

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 25-Apr-2025

    [Microsoft Fabric] Incremental Refresh in Dataflows Gen2

    • {feature} enables incrementally extracting data from data sources, applying Power Query transformations, and loading into various output destinations [5]
      • designed to reduce the amount of data that needs to be processed and retrieved from the source system [8]
      • configurable directly in the dataflow editor [8]
      • doesn't need to specify the historical data range [8]
        • ⇐ the dataflow doesn't remove any data from the destination that's outside the bucket range [8]
      • doesn't need to specify the parameters for the incremental refresh [8]
        • the filters and parameters are automatically added as the last step in the query [8]
    • {prerequisite} the data source 
      • supports folding [8]
      • needs to contain a Date/DateTime column that can be used to filter the data [8]
    • {prerequisite} the data destination supports incremental refresh [8]
      • available destinations
        • Fabric Warehouse
        • Azure SQL Database
        • Azure Synapse Analytics
        • Fabric Lakehouse [preview]
      • other destinations can be used in combination with incremental refresh by using a second query that references the staged data to update the data destination [8]
        • allows to use incremental refresh to reduce the amount of data that needs to be processed and retrieved from the source system [8]
          • a full refresh from the staged data to the data destination is still needed [8]
    • works by dividing the data into buckets based on a DateTime column [8] (see the sketch after this list)
      • each bucket contains the data that changed since the last refresh [8]
        • the dataflow knows what changed by checking the maximum value in the specified column 
          • if the maximum value changed for that bucket, the dataflow retrieves the whole bucket and replaces the data in the destination [8]
          • if the maximum value didn't change, the dataflow doesn't retrieve any data [8]
    • {limitation} the data destination must be set to a fixed schema [8]
      • ⇒ the table's schema in the data destination must be fixed and can't change [8]
        • ⇒ a dynamic schema must be changed to a fixed schema before configuring incremental refresh [8]
    • {limitation} the only supported update method in the data destination: replace
      • ⇒ the dataflow replaces the data for each bucket in the data destination with the new data [8]
        • data that is outside the bucket range isn't affected [8]
    • {limitation} maximum number of buckets
      • single query: 50
        • {workaround} increase the bucket size or reduce the bucket range to lower the number of buckets [8]
      • whole dataflow: 150
        • {workaround} reduce the number of incremental refresh queries or increase the bucket size [8]
    • {downside} the dataflow may take longer to refresh after enabling incremental refresh [8]
      • because the additional overhead of checking if data changed and processing the buckets is higher than the time saved by processing less data [8]
      • {recommendation} review the settings for incremental refresh and adjust them to better fit the scenario
        • {option} increase the bucket size to reduce the number of buckets and the overhead of processing them [8]
        • {option} disable incremental refresh [8]
    • {recommendation} don't use the column for detecting changes also for filtering [8]
      • because this can lead to unexpected results [8]
    • {setting} limit the number of concurrent evaluations
      • setting the value to a lower number reduces the number of requests sent to the source system [8]
      • via global settings >> Scale tab >> maximum number of parallel query evaluations
      • {recommendation} don't enable this limit unless there are issues with the source system [8]
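
    Conceptually, each bucket translates into a filter on the chosen DateTime column that is folded to the source, similar to the queries below (an illustrative sketch with hypothetical table and column names, not the exact query generated by the dataflow):

    -- illustrative source-side filter for a single daily bucket
    SELECT OrderID
    , OrderDate
    , Amount
    FROM dbo.Orders
    WHERE OrderDate >= '2025-04-24T00:00:00'
      AND OrderDate < '2025-04-25T00:00:00';

    -- the maximum value per bucket is what the dataflow compares between refreshes
    SELECT MAX(OrderDate) max_order_date
    FROM dbo.Orders
    WHERE OrderDate >= '2025-04-24T00:00:00'
      AND OrderDate < '2025-04-25T00:00:00';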

    References:
    [5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
    [8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]

    Resources:


    14 April 2025

    🏭🗒️Microsoft Fabric: Quotas [Notes] 🆕

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 13-Apr-2025

    [Microsoft Fabric] Quotas

    • {def} assigned number of resources for an Azure subscription
      • depend on the Azure subscription type [5]
        • set and enforced in the scope of the subscription [1]
        • each subscription has a default value for each quota [1]
      • determine the maximum number of CUs for each of the capacities on the subscription [5]
        • customers aren't charged for quotas, but for capacities [5]
      • each quota represents a specific countable resource
        • e.g. number of VMs that can be created [2]
        • e.g. the number of storage accounts that can be used concurrently [2]
        • e.g. the number of networking resources that can be consumed [2]
        • e.g. the number of API calls to a particular service that can be made [2]
      • designed to help protect customers from unexpected behavior [1]
        • e.g. inaccurately resourced deployments, mistaken consumption
        • helps minimize risks from deceptive or inappropriate consumption and unexpected demand [1]
      • limitations
        • ⇐ some limits are managed at a regional level [2]
        • ⇐ variable and dependent on several factors
    • {type} adjustable quotas
      • quotas for which customers can request quota increases
        • via Azure Home My quotas page
        • based on amount or usage percentage and submitting it directly [1]
        • each subscription has a default quota value for each quota [1]
      • the quickest way to increase quotas [1]
    • {type} non-adjustable quotas
      • quotas which have a hard limit, usually determined by the scope of the subscription [1]
      • to make changes, customers must submit a support request for the Azure support team [1]
    • [Fabric] limits the number of CUs customers can provision across multiple capacities in a subscription
      • calculation based on
        • subscription plan type
        • Azure region
    • {action} provision a new capacity
      • {level} Azure management group limits
      • {level} Azure subscription limits
      • {level} Azure resource group limits
      • {level} Template limits
      • {level} Microsoft Entra ID limits
      • {level} Azure API Center limits
      • {level} Azure API Management limits
      • {level} API Management v2 tiers
      • {level} Azure App Service limits
      • {level} Azure Automation limits
      • {action} request quota increase
        • via support request
        • there is no cost associated with requesting a quota increase [1]
          • ⇐ costs are incurred based on resource usage, not the quotas themselves [1]
        • should be based on workloads' characteristics [1]
      • {action} view quota
        • [permission] contributor role, or another role that includes contributor access [5]
      • {action} manage quota
        • [permission] see Quota Request Operator
      • [Azure] quota alerts
        • notifications triggered when the usage of a specific Azure resource nears the predefined quota limit [4]
        • facilitates proactive resource management [4]
        • multiple alert rules can be created for a given quota or across quotas in a subscription [4]

        References:
        [1] Microsoft Learn (2024) Fabric: Quotas overview [link]
        [2] Microsoft Learn (2025) Fabric: Azure subscription and service limits, quotas, and constraints [link]
        [3] Microsoft Learn (2025) Fabric: Quota monitoring and alerting [link]
        [4] Microsoft Fabric Update Blog (2024) Announcing the launch of Microsoft Fabric Quotas [link]
        [5] Microsoft Learn (2025) Fabric: Microsoft Fabric capacity quotas [link]

        Resources:
        [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

        Acronyms:
        API - Application Programming Interface
        CU - Capacity Units
        VM - Virtual Machine

        07 April 2025

        🏭🗒️Microsoft Fabric: User Data Functions (UDFs) in Python [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 7-Apr-2025

        [Microsoft Fabric] User Data Functions (UDFs)
        • {def} a platform that allows customers to host and run custom logic on Fabric from different types of items and data sources [1]
        • empower data developers to write custom logic and embed it into their Fabric ecosystem [1]
          • the logic can include internal algorithms and libraries [1]
          • {benefit} allows handling complex transformations, optimizing performance, and integrating diverse data sources beyond the capabilities of low-code tools [5]
            • via public libraries from PyPI
            • via native Fabric integrations to 
              • connect to Fabric data sources
                • e.g. warehouse, lakehouse or SQL Database
              • invoke functions from other Fabric items
                • e.g. notebooks, Power BI reports, data pipelines
            • via editing functions directly in the Fabric portal [1]
              • via in-browser tooling, VS Code extension
            • supports the Python 3.11 runtime [1]
          • {goal} reusability
            • by creating libraries of standardized functionality [1]
              • can be used in many solutions across the organization [1]
          • {goal} customization
            • the applications are tailored to customers’ needs [1]
          • {goal} encapsulation
            • functions can perform various tasks to build sophisticated workflows [1]
          • {goal} external connectivity
            • data functions can be invoked from external client applications using a REST endpoint [1]
            • allows integrations with external systems [1]
        • {feature} programming model
          • SDK that provides the necessary functionality to author and publish runnable functions in Fabric [4]
          • {benefit} allows to seamlessly integrate with other items in the Fabric ecosystem [4]
            • e.g. Fabric data sources
          • {concept} user data functions item 
            • contains one or many functions that can be invoked from the Fabric portal using the provided REST endpoint [4]
              • from another Fabric item, or from an external application 
            • each function is a method in a Python script that allows passing parameters and returning an output to the invoker [4]
          • {component} fabric.functions library 
            • provides the code needed to create user data functions in Python [4]
            • imported in the template by default [4]
          • {method} fn.UserDataFunctions() 
            • provides the execution context [4]
            • added at the beginning of the code file in all new user data functions items, before any function definitions [4]
          • {concept} functions
            • every function is identified with a @udf.function() decorator
              • can be invoked individually from the portal or an external invoker [4]
              • functions without the decorator can't be invoked directly [4]
        • {feature} invocation logs
          • function invocations are logged
          • {benefit} allows to check the status or perform debugging  [3]
          • {limitation} can take a few minutes to appear [3]
            • {recommendation} refresh the page after a few minutes [3]
          • {limitation} daily ingestion limit: 250 MB [3]
            • reset the next day 
              • a new log is made available [3]
          • {limitation} logs are sampled while preserving a statistically correct analysis of application data [3]
            • the sampling is done by User data functions to reduce the volume of logs ingested [3]
            • if some logs are partially missing, it might be because of sampling [3]
          • {limitation} supported log types: information, error, warning, trace [3]
        • {feature} Manage connections
          • {limitation} only supports connecting to Fabric-native data sources [2]
        • {limitation} only available in a subset of Fabric regions [2]
        • {limitation} editable by the owner only [2]
          • can only be modified and published by the user who is the owner of the User Data Functions Fabric item [2]
        • {limitation} adds further keywords to the list of reserved keywords
          • namely: req, context, reqInvocationId
          • can't be used as parameter names or function names [2]

        References:
        [1] Microsoft Learn (2025) Microsoft Fabric: What is Fabric User data functions (Preview)? [link]
        [2] Microsoft Learn (2025) Microsoft Fabric: Service details and limitations of Fabric User Data Functions [link]
        [3] Microsoft Learn (2025) Microsoft Fabric: User data functions invocation logs (Preview) [link]
        [4] Microsoft Learn (2025) Microsoft Fabric: Fabric User data functions programming model overview (Preview) [link]
        [5] Microsoft Fabric Updates Blog (2025) Announcing Fabric User Data Functions (Preview) [link]
        [6] Microsoft Learn (2025) Microsoft Fabric: Use the Functions activity to run Fabric user data functions and Azure Functions [link]

        Resources:
        [R1] Microsoft Fabric Updates Blog (2025) Utilize User Data Functions in Data pipelines with the Functions activity (Preview) [link]

        Acronyms:
        PyPI - Python Package Index
        REST - Representational State Transfer
        SDK - Software Development Kit
        UDF - User Data Function
        VS - Visual Studio

        06 April 2025

        🏭🗒️Microsoft Fabric: Query Optimizer in Warehouse [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 6-Apr-2025

        [Microsoft Fabric] Hints in Warehouse
        • {def} keywords that users can add to SQL statements to provide additional information or instructions to the query optimizer [2]
          • options or strategies specified for enforcement by the SQL Server query processor [1]
            • applicable to SELECT, INSERT, UPDATE, or DELETE statements [1]
        • {benefit} help improve the performance, scalability, or consistency of queries by overriding the default behavior of the query optimizer [2]
        • {type} join hints
          • specify what join strategy/algorithm is used between two tables [3] (see the sketch after this list)
            • improves the performance of queries that involve large or complex joins [2]
            • specified in the FROM clause of a query [1]
            • if a join hint is specified for any two tables, the query optimizer automatically enforces the join order for all joined tables in the query [3]
              • based on the position of the ON keywords [3]
                • when a CROSS JOIN is used without the ON clause, parentheses can be used to indicate the join order [3]
          • looping
            • via LOOP JOIN
            • {restriction} can't be specified together with RIGHT or FULL as a join type [3]
          • hashing
            • via HASH JOIN
          • merging
            • via MERGE JOIN
          • REPLICATE
            • causes a broadcast move operation
              • a specific table to be replicated across all distribution nodes [2]
            • with INNER or LEFT join
              • the broadcast move operation will replicate the right side of the join to all nodes [2]
            • with RIGHT join
              • the broadcast move operation will replicate the left side of the join to all nodes [2]
            • with FULL join
              • an estimated plan cannot be created [2]
          • REDISTRIBUTE [(columns_count)]
            • ensures two data sources are distributed based on JOIN clause columns [2]
            • handles multiple join conditions, specified by the first n columns in both tables, where n is the columns_count argument [2]
            • redistributing data optimizes query performance by evenly spreading data across nodes during intermediate steps of execution [2]
            • the (columns_count) argument is only supported in MF warehouse [2]
        • {type} query hint
          • specify that the indicated hints are used in the scope of a query [3]
            • affect all operators in the statement
              • [UNION] only the last query involving a UNION operation can have the OPTION clause [3]
            • specified as part of the OPTION clause [3]
            • Error 8622 occurs if one or more query hints cause the Query Optimizer not to generate a valid plan [3]
          • used via the OPTION clause at the end of a query [2]
            • followed by the name of the query hint and its optional parameters in parentheses [2]
            • multiple hints can be used in the same query, separated by commas
              • e.g. FORCE ORDER and MAX_GRANT_PERCENT
                • instruct the QO to preserve the join order specified in the query and to limit the memory grant to 20 percent of the available memory
          • {hint} HASH GROUP
            • specifies that the QO should use a hash-based algorithm for the GROUP BY operation [2]
            • {benefit} can improve the performance of queries that involve large or complex grouping sets [2]
          • {hint} ORDER GROUP
            • specifies that the QO should use a sort-based algorithm for the GROUP BY operation [2]
            • {benefit} can improve the performance of queries that involve small or simple grouping sets [2]
          • {hint} MERGE UNION
            • specifies that the QO should use a merge-based algorithm for the UNION or UNION ALL operation [2]
            • {benefit} can improve the performance of queries that involve sorted inputs [2]
          • {hint} HASH UNION
            • specifies that the query optimizer should use a hash-based algorithm for the UNION or UNION ALL operation [2]
            • {benefit} can improve the performance of queries that involve unsorted or large inputs [2]
          • {hint} CONCAT UNION
            • specifies that the QO should use a concatenation-based algorithm for the UNION or UNION ALL operation [2]
            • {benefit} can improve the performance of queries that involve distinct or small inputs [2]
          • {hint} FORCE ORDER
            • specifies that the QO should preserve the join order specified in the query [2]
            • {benefit} can improve the performance or consistency of queries that involve complex join conditions or hints [2]
          • {hint} FORCE SINGLE NODE PLAN/FORCE DISTRIBUTED PLAN
            • allows to choose whether to force a single node plan or a distributed plan for query’s execution [2]
          • {hint} USE HINT
            • adds one or more extra hints to the query processor; the hints are specified with a hint name inside single quotation marks within the OPTION clause [2]
              • e.g. OPTION (USE HINT('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'))
            • used with several hint names, changing the behavior of CE derivation
              • {hint name} ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES
                • applies when calculating cardinality estimates for AND predicates for filters [2]
                • MF assumes full correlation among filters when a high level of underestimation on AND predicates for filters is observed [2]
                • [SQL Server] equivalent TF 4137 [4]
              • {hint name} ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES
                • applies when calculating cardinality estimates for AND predicates for filters  [2]
                • MF assumes full independence among filters [2]
                  • if a high level of overestimation on AND predicates for filters is observed, this hint can help produce a better estimate [2]
              • {hint name} ASSUME_PARTIAL_CORRELATION_FOR_FILTER_ESTIMATES
                • applies when calculating cardinality estimates for AND predicates for filters [2]
                • {default} MF assumes partial correlation among filters [2]
                  • ⇐ it is unlikely that this hint will help improve the estimates [2]
              • {hint name} ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS
                • applies when calculating cardinality estimate for joins [2]
                • uses Simple Containment assumption instead of the default Base Containment assumption [2]
                • [SQL Server] equivalent TF 9476 [4]
        • {type} table hints
          • none enforced currently
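
        A hedged sketch of how the two hint types can be applied in a Fabric warehouse query (the tables and columns are hypothetical; the hint names are the ones discussed above):

        -- join hint: force a hash join between two tables (FROM clause)
        SELECT C.CustomerID
        , SUM(S.Amount) Amount
        FROM dbo.Customer C
             INNER HASH JOIN dbo.Sale S
               ON C.CustomerID = S.CustomerID
        GROUP BY C.CustomerID;

        -- query hints: preserve the join order and change the CE assumption for filters (OPTION clause)
        SELECT C.CustomerID
        , S.Amount
        FROM dbo.Customer C
             JOIN dbo.Sale S
               ON C.CustomerID = S.CustomerID
        WHERE C.Country = 'Germany'
          AND C.Segment = 'Retail'
        OPTION (FORCE ORDER, USE HINT('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'));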

        References:
        [1] Microsoft Learn (2025) SQL: Hints (T-SQL) [link]
        [2] Microsoft Fabric Updates Blog (2025) Hints in Fabric Data Warehouse [link]
        [3] Microsoft Learn (2025) SQL: OPTION clause (T-SQL) [link]
        [4] Microsoft Support (2016) KB3189813 - Update introduces USE HINT query hint argument in SQL Server 2016 [link]

        Resources:
        [R1] Microsoft Learn (2025) SQL Server: Query hints (T-SQL) [link]
        [R2] Most Useful Query Hints [link]

        Acronyms:
        MF - Microsoft Fabric
        TF - Trace Flag
        QO - Query Optimizer

        28 March 2025

        🏭🗒️Microsoft Fabric: OneLake Role-Based Access Control (RBAC) [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 28-Mar-2025

        [Microsoft Fabric] OneLake Role-based access control (RBAC)

        • {def} security framework that allows to manage access to resources by assigning roles to users or groups 
          • applies to Lakehouse Items only [1]
          • restricts data access for users with Workspace Viewer or read access to a lakehouse [1]
          • doesn't apply to Workspace Admins, Members, or Contributors [1]
            • ⇒ supports only Read level of permissions [1]
          • uses role assignments to apply permissions to its members
            • assigned to 
              • individuals
              • security groups
              • Microsoft 365 groups
              • distribution lists
              • ⇐ every member of the user group gets the assigned role [1]
            • users in multiple groups get the highest level of permission that is provided by the roles [1]
          • managed through the lakehouse data access settings [1]
          • when a lakehouse is created, OneLake generates a default RBAC Role named Default Readers [1]
            • allows all users with ReadAll permission to read all folders in the Item [1]
          • permissions always inherit to the entire hierarchy of the folder's files and subfolders [1]
          • provides automatic traversal of parent items to ensure that data is easy to discover [1]
            • ⇐ similar to Windows folder permissions [1]
            • [shortcuts] shortcuts to other OneLake locations have specialized behavior [1]
              • the access to a OneLake shortcut is determined by the target permissions of the shortcut [1]
                • when listing shortcuts, no call is made to check the target access [1]
                  • ⇒ when listing a directory all internal shortcuts will be returned regardless of a user's access to the target [1]
                    • when a user tries to open the shortcut, the access check is evaluated and the user only sees the data they have the required permissions to see [1]
          • allows restricting data access in OneLake to specific folders only [1]
        • {action} share a lakehouse
          • grants other users or a group of users access to a lakehouse without giving access to the workspace and the rest of its items [1]
          • found through 
            • Data Hub 
            • 'Shared with Me' section in Microsoft Fabric
        • [shortcuts] permissions always inherit to all Internal shortcuts where a folder is defined as target [1]
          • when a user accesses data through a shortcut to another OneLake location, the identity of the calling user is used to authorize access to the data in the target path of the shortcut [1]
            • ⇒ the user must have OneLake RBAC permissions in the target location to read the data [1]
            • defining RBAC permissions for the internal shortcut is not allowed [1]
              • must be defined on the target folder located in the target item [1]
              • OneLake enables RBAC permissions only for shortcuts targeting folders in lakehouse items [1]


        References:
        [1] Microsoft Learn (2024) Fabric: Role-based access control (RBAC) [link]
        [2] Microsoft Learn (2024) Best practices for OneLake security [link]

        Resources:
        [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

        Acronyms:
        ADLS - Azure Data Lake Storage
        RBAC - Role-Based Access Control

        12 March 2025

        🏭🎗️🗒️Microsoft Fabric: Query Acceleration [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 12-Mar-2025


        [Microsoft Fabric] Query Acceleration

        • {def} indexes and caches data landing in OneLake on the fly [2]
            • {benefit} allows to 
              • analyze real-time streams coming directly into Eventhouse and combine it with data landing in OneLake 
              • ⇐ either coming from mirrored databases, Warehouses, Lakehouses or Spark [2] 
              • ⇒ accelerate data landing in OneLake
                • ⇐ including existing data and any new updates, and expect similar performance [1]
                • eliminates the need to 
                  • manage ingestion pipelines [1]
                  • maintain duplicate copies of data [1]
                • ensures that data remains in sync without additional effort [4]
                • the initial process is dependent on the size of the external table [4]
            • ⇐ provides significant performance comparable to ingesting data in Eventhouse [1]
              • in some cases up to 50x and beyond [2]
            • ⇐ supported in Eventhouse over delta tables from OneLake shortcuts, etc. [4]
              • when creating a shortcut from an Eventhouse to a OneLake delta table, users can choose if they want to accelerate the shortcut [2]
              • accelerating the shortcut means equivalent ingestion into the Eventhouse
              • ⇐  optimizations that deliver the same level of performance for accelerated shortcuts as native Eventhouse tables [2]
              • e.g. indexing, caching, etc. 
            • all data management is done by the data writer and in the Eventhouse the accelerated table shortcut [2]  
            • behave like external tables, with the same limitations and capabilities [4]
              • {limitation} materialized views aren't supported [1]
              • {limitation} update policies aren't supported [1]
          • allows specifying a policy on top of external delta tables that defines the number of days to cache data for high-performance queries [1]
            • ⇐ queries run over OneLake shortcuts can be less performant than on data that is ingested directly to Eventhouses [1]
              • ⇐ due to network calls to fetch data from storage, the absence of indexes, etc. [1]
          • {costs} charged under OneLake Premium cache meter [2]
            • ⇐ similar to native Eventhouse tables [2]
            • one can control the amount of data to accelerate by configuring number of days to cache [2]
            • indexing activity may also count towards CU consumption [2]
          • {limitation} the number of columns in the external table can't exceed 900 [1]
          • {limitation} query performance over accelerated external delta tables which have partitions may not be optimal during preview [1]
          • {limitation} the feature assumes delta tables with static advanced features
            • e.g. column mapping doesn't change, partitions don't change, etc
            • {recommendation} to change advanced features, first disable the policy, and once the change is made, re-enable the policy [1]
          • {limitation} schema changes on the delta table must also be followed with the respective .alter external delta table schema [1]
            • might result in acceleration starting from scratch if there was breaking schema change [1]
          • {limitation} index-based pruning isn't supported for partitions [1]
          • {limitation} parquet files with a compressed size higher than 6 GB won't be cached [1]

        References:
        [1] Microsoft Learn (2024) Fabric: Query acceleration for OneLake shortcuts - overview (preview) [link]
        [2] Microsoft Fabric Updates Blog (2024) Announcing Eventhouse Query Acceleration for OneLake Shortcuts (Preview) [link]
        [3] Microsoft Learn (2024) Fabric: Query acceleration over OneLake shortcuts (preview) [link]
        [4] Microsoft Fabric Updates Blog (2025) Eventhouse Accelerated OneLake Table Shortcuts – Generally Available [link]

        Resources:
        [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

        25 February 2025

        🏭💠🗒️Microsoft Fabric: T-SQL Notebook [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 25-Feb-2025

        [Microsoft Fabric] T-SQL notebook

        • {def} notebook that enables to write and run T-SQL code within a notebook [1]
        • {feature} allows to manage complex queries and write better markdown documentation [1]
        • {feature} allows the direct execution of T-SQL on
          • connected warehouse
          • SQL analytics endpoint
          • ⇐ queries can be run directly on the connected endpoint [1]
            • multiple connections are allowed [1]
        • allows running cross-database queries to gather data from multiple warehouses and SQL analytics endpoints [1] (see the sketch after this list)
        • the code is run by the primary warehouse
          • used as the default in commands that support three-part naming when no warehouse is provided [1]
          • three-part naming consists of 
            • database name
              • the name of the warehouse or SQL analytics endpoint [1]
            • schema name
            • table name
        • {feature} autogenerate T-SQL code using the code template from the object explorer's context menu [1]
        • {concept} code cells
          • allow to create and run T-SQL code
            • each code cell is executed in a separate session [1]
              • {limitation} the variables defined in one cell are not available in another cell [1]
              • one can check the execution summary after the code is executed [1]
            • cells can be run individually or together [1]
            • one cell can contain multiple lines of code [1]
              • users can select and run subparts of a cell’s code [1]
          • {feature} Table tab
            • lists the records from the returned result set
              • if the execution returns multiple result sets, one can switch from one to another via the dropdown menu [1]
        • a query can be saved as 
          • view
            • via 'Save as' view
            • {limitation} does not support three-part naming [1]
              • the view is always created in the primary warehouse [1]
                • ⇐ to create it in a different warehouse, set that warehouse as the primary one [1]
          • table
            • via 'Save as' table
            • saved as CTAS 
          • ⇐ 'Save as' is only available for the selected query text
            • the query text must be selected before using the Save as options
        • {limitation} doesn’t support 
          • parameter cell
            • the parameter passed from pipeline or scheduler can't be used [1]
          • {feature} Recent Run 
            • {workaround} use the current data warehouse monitoring feature to check the execution history of the T-SQL notebook [1]
          • {feature} the monitor URL inside the pipeline execution
          • {feature} snapshot 
          • {feature} Git support 
          • {feature} deployment pipeline support 
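
        A minimal sketch of a cross-database query using three-part naming, as referenced above (the warehouse, schema and table names are hypothetical):

        -- cross-database query across two warehouses via three-part naming
        SELECT Ord.CustomerID
        , SUM(Ord.Amount) Amount
        FROM SalesWarehouse.dbo.Orders Ord
             JOIN FinanceWarehouse.dbo.Customers Cst
               ON Ord.CustomerID = Cst.CustomerID
        GROUP BY Ord.CustomerID;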

        References:
        [1] Microsoft Learn (2025) T-SQL support in Microsoft Fabric notebooks [link]
        [2] Microsoft Learn (2025) Create and run a SQL Server notebook [link]
        [3] Microsoft Learn (2025) T-SQL surface area in Microsoft Fabric [link]
        [4] Microsoft Fabric Updates Blog (2024) Announcing Public Preview of T-SQL Notebook in Fabric [link]

        Resources:
        [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

        Acronyms
        CTAS - Create Table as Select
        T-SQL - Transact SQL

        17 January 2025

        💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VIII: Permissions) [new feature]

        Data-based solutions usually target a set of users who (ideally) have restricted permissions to the functionality. Therefore, several personas targeting different use cases are defined as part of the process, and the permissions must be restricted accordingly for each of them.

        In the simplest scenario the user must have access to the underlying objects for querying the data. Supposing that an Entra User was created already, the respective user must also be given access in the Fabric database (see [1], [2]). From the database's main menu follow the path below to assign read permissions:
        Security >> Manage SQL Security >> (select role: db_datareader)

        Manage SQL Security

        Manage access >> Add >> (search for User)

        Manage access

        (select user) >> Share database >> (select additional permissions) >> Save

        Manage additional permissions

        The easiest way to test whether the permissions work before building the functionality is to log in via SQL Server Management Studio (SSMS) and check the access using Microsoft Entra MFA. Ideally, one should have a User's credentials that can be used only for testing purposes. After the above setup was done, the new User was able to access the data.

        A second User can be created for testing with the maximum of permissions allowed on the SQL database side, which is useful for troubleshooting. Alternatively, one can use only one User for testing and assign or remove the permissions as needed by the test scenario. 

        It's a good idea to try to understand what's happening in the background. For example, the expectation was that a SQL user would also be created for the Entra User created above, which doesn't seem to be the case, at least per the current functionality available.

        Before diving deeper, it's useful to retrieve the User's details:

        -- retrieve current user
        SELECT SUser_Name() sys_user_name
        , User_Id() user_id 
        , USER_NAME() user_name
        , current_user [current_user]
        , user [user]; 
        
        Output:
        sys_user_name user_id user_name current_user user
        JamesClavell@[domain].onmicrosoft.com 0 JamesClavell@[domain].onmicrosoft.com JamesClavell@[domain].onmicrosoft.com JamesClavell@[domain].onmicrosoft.com

        Retrieving the current User is useful especially when testing functionality in parallel with different Users. Strangely, the User's ID is 0 when only read permissions were assigned. However, a valid User identifier is assigned, for example, when the db_datawriter role is also granted to the User. Removing the db_datawriter role afterwards keeps the User's ID, as expected. For troubleshooting purposes, at least per the current functionality, it might be a good idea to create the Users with a valid User ID (e.g. by assigning temporarily the db_datawriter role to the User).
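
        A minimal sketch of that workaround, assuming the Entra User created above (ALTER ROLE is used here only for illustration; the same can be achieved via the Manage SQL Security page):

        -- assign the db_datawriter role temporarily (replace the <domain> placeholder)
        ALTER ROLE db_datawriter ADD MEMBER [JamesClavell@<domain>.onmicrosoft.com];
        
        -- remove the role afterwards; the User keeps its identifier
        ALTER ROLE db_datawriter DROP MEMBER [JamesClavell@<domain>.onmicrosoft.com];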

        The next step is to look at the Users with access to the database:

        -- database access 
        SELECT USR.uid
        , USR.name
        --, USR.sid 
        , USR.hasdbaccess 
        , USR.islogin
        , USR.issqluser
        --, USR.createdate 
        --, USR.updatedate 
        FROM sys.sysusers USR
        WHERE USR.hasdbaccess = 1
          AND USR.islogin = 1
        ORDER BY uid
        
        Output:
        uid name hasdbaccess islogin issqluser
        1 dbo 1 1 1
        6 CharlesDickens@[...].onmicrosoft.com 1 1 0
        7 TestUser 1 1 1
        9 JamesClavell@[...].onmicrosoft.com 1 1 0

        For testing purposes, besides the standard dbo user and the two Entra-based Users, a SQL User was also created, to which access to the SalesLT schema was granted (see the initial post):

        -- create the user
        CREATE USER TestUser WITHOUT LOGIN;
        
        -- assign access to SalesLT schema 
        GRANT SELECT ON SCHEMA::SalesLT TO TestUser;
          
        -- test impersonation (run together)
        EXECUTE AS USER = 'TestUser';
        
        SELECT * FROM SalesLT.Customer;
        
        REVERT; 

        Notes:
        1) Strangely, even if access was given explicitly only to the SalesLT schema, the TestUser User also has access to sys.sysusers and other DMVs. The same applies to the access over SSMS.
        2) For the above created User there are no records in the sys.user_token and sys.login_token DMVs, in contrast with the user(s) created for administering the SQL database. 
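
        These DMVs can be checked with two simple queries, for example while impersonating TestUser via EXECUTE AS as above:

        -- security tokens for the current security context
        SELECT * FROM sys.user_token;
        
        SELECT * FROM sys.login_token;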

        Let's look at the permissions granted explicitly:

        -- permissions granted explicitly
        SELECT DPR.principal_id
        , DPR.name
        , DPR.type_desc
        , DPR.authentication_type_desc
        , DPE.state_desc
        , DPE.permission_name
        FROM sys.database_principals DPR
             JOIN sys.database_permissions DPE
        	   ON DPR.principal_id = DPE.grantee_principal_id
        WHERE DPR.principal_id != 0 -- removing the public user
        ORDER BY DPR.principal_id
        , DPE.permission_name;
        
        Result:
        principal_id name type_desc authentication_type_desc state_desc permission_name
        1 dbo SQL_USER INSTANCE GRANT CONNECT
        6 CharlesDickens@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT AUTHENTICATE
        6 CharlesDickens@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT CONNECT
        7 TestUser SQL_USER NONE GRANT CONNECT
        7 TestUser SQL_USER NONE GRANT SELECT
        9 JamesClavell@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT CONNECT

        During troubleshooting it might be useful to check the current user's permissions at the various levels via sys.fn_my_permissions:

        -- retrieve database-scoped permissions for current user
        SELECT *
        FROM sys.fn_my_permissions(NULL, 'Database');
        
        -- retrieve schema-scoped permissions for current user
        SELECT *
        FROM sys.fn_my_permissions('SalesLT', 'Schema');
        
        -- retrieve object-scoped permissions for current user
        SELECT *
        FROM sys.fn_my_permissions('SalesLT.Customer', 'Object')
        WHERE permission_name = 'SELECT';
        

        Notes:
        1) See also [1] and [4] in what concerns the limitations that apply to managing permissions in SQL databases.

        Happy coding!


        References:
        [1] Microsoft Learn (2024) Microsoft Fabric: Share your SQL database and manage permissions [link]
        [2] Microsoft Learn (2024) Microsoft Fabric: Share data and manage access to your SQL database in Microsoft Fabric  [link]
        [3] Microsoft Learn (2024) Authorization in SQL database in Microsoft Fabric [link]
        [4] Microsoft Learn (2024) Authentication in SQL database in Microsoft Fabric [link]

        [5] Microsoft Learn (2025) Manage access for SQL databases in Microsoft Fabric with workspace roles and item permissions [link]

        06 January 2025

        💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VII: Things That Don't Work) 🆕

        Microsoft does a relatively good job of documenting what doesn't work in Microsoft Fabric's SQL Databases. There's a good overview available already in the documentation, though beyond this the current post lists my findings while testing the code previously written on this blog.

        USE Database

        The standard syntax allows changing via USE the database context to the specified database or database snapshot. Unfortunately, this syntax doesn't seem to be supported currently, and many scripts seem to rely heavily on it. Thus, the following lines of code throw an error:

        -- changing the context
        USE master;
        GO
        USE tempdb;
        

        "Msg 40508, Level 16, State 1, Line 1, USE statement is not supported to switch between databases. Use a new connection to connect to a different database"

        However, one can use the 3-part naming convention to reference the various objects:

        -- sys metadata - retrieving the database files
        
        SELECT *
        FROM tempdb.sys.database_files dbf
        ORDER BY name;

        Even if the tempdb is not listed in the sys.databases table, it's still available for querying, which can prove helpful for troubleshooting. 
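
        One can check this by listing the databases visible to the current session (as a quick sketch; tempdb is expected to be missing from the output):

        -- sys metadata - retrieving the visible databases
        SELECT database_id
        , name
        , create_date
        FROM sys.databases
        ORDER BY name;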

        DBCC commands 

        The documentation warns that some DBCC commands won't work, though in some cases there are also alternatives. For example:

        -- clearing the procedure cache via DBCC
        DBCC FREEPROCCACHE;
        
        Output:
        "Msg 2571, Level 14, State 9, Line 1, User '<user>' does not have permission to run DBCC freeproccache."

        Alternatively, one can use the following command, which seems to work:

        -- clearing the procedure cache via ALTER
        ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;

        CHECKDB, which checks the logical and physical integrity of all the objects in the specified database, can't be used either:
         
        -- Checking the logical and physical integrity of a database
        DBCC CHECKDB();
        
        Output:
        "Msg 916, Level 14, State 2, Line 1, The server principal "..." is not able to access the database "..." under the current security context."
        The same error message is received for CHECKTABLE, a utility which checks the integrity of all the pages and structures that make up the table (or indexed view):

        -- checking a table's integrity
        DBCC CHECKTABLE ('SalesLT.Address');
        
        Output:
        "Msg 916, Level 14, State 2, Line 2, The server principal "..." is not able to access the database "..." under the current security context."
        A similar error message is received for SQLPERF, which provides transaction log space usage statistics for all databases:

        -- retrieving the LOGSPACE information for all databases
        DBCC SQLPERF (LOGSPACE);
        
        Output: 
        "Msg 297, Level 16, State 10, Line 1, The user does not have permission to perform this action."
        There are however DBCC commands like SHOW_STATISTICS or SHRINKDATABASE which do work. 
         
        -- current query optimization statistics
        DBCC SHOW_STATISTICS('SalesLT.Address','PK_Address_AddressID');
        
        Output:
        Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows Persisted Sample Percent
        PK_Address_AddressID Dec 21 2024 3:02AM 450 450 197 1 4 NO 450 0

        SHRINKDATABASE shrinks the size of the data and log files in the specified database:

        -- shrinking database
        DBCC SHRINKDATABASE([AdventureWorks01-...]) WITH NO_INFOMSGS;
        

        Update 29-Jan-2025: According to an answer from Ask the Expert session on Fabric Database [3], Microsoft seems to be working in bringing more DBCC features to SQL databases.

        Happy coding!


        References:
        [1] Microsoft Learn (2024) SQL Server: USE <database> [link]
        [2] Microsoft Learn (2024) Database console commands [link]
        [3] Microsoft Reactor (2025) Ask The Expert - Fabric Edition - Fabric Databases [link]


        About Me

        Koeln, NRW, Germany
        IT Professional with more than 25 years of experience in IT across the full life cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.