Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premises SQL Server; for other platforms, please refer to the documentation.
Last updated: 15-Feb-2024
[SQL Server] columnstore indexes (CI)
- {def} a technology for storing, retrieving and managing data by using a columnar data format (aka columnstore)
- stores compressed data on a per-column rather than a per-row basis [5]
- {benefit} designed for analytics and data warehousing workloads
- data warehousing
- {scenario} store fact tables and large dimension tables
- ⇐ tend to require full table scans rather than table seeks
- analytics workloads
- {scenario} [SQL Server 2016 SP1] can be used for real-time analytics on operational databases
- ⇐ an updatable nonclustered columnstore index can be created on a rowstore table
- {benefit} performance increase
- can achieve up to 100x better performance [4]
- offers an order of magnitude better performance than a rowstore index
- {feature} uses batch mode execution
- improves query performance typically by two to four times
- provides high performance gains for analytic queries that scan large amounts of data, especially on large tables (>1 million rows)
- {benefit} reduces significantly the data warehouse storage costs
- {feature} data compression
- ⇒ provides high compression rates, typically by 10 times
- ⇒ reduces total I/O from the physical media
- ⇐ queries often select only a few columns from a table
- minimizes or eliminates system I/O bottlenecks
- reduces significantly the memory footprint
- ⇒ query performance can improve
- because SQL Server can perform more query and data operations in memory
- {benefit} built in memory
- ⇒ sufficient memory must be available
- {benefit} part of the database engine
- no special hardware is needed
- {concept} columnstore
- {def} data structure logically organized as a table with rows and columns, and physically stored in a column-wise data format
- stores values from the same domain, which commonly have similar values
- when a query references a column, only that column is fetched from disk [3]
- ⇐ the columns not requested are skipped
- ⇒ they are not loaded into memory
- when a query is executed, the rows must be reconstructed
- ⇒ row reconstruction takes some time and uses some CPU and memory resources [3]
- [SQL Server 2016] columnstore index on rowstore tables
- the columnstore is updated when data changes in the rowstore table
- both indexes work against the same data
- {concept} rowstore
- {def} data that's logically organized as a table with rows and columns, and physically stored in a row-wise data format
- ⇐ the traditional way to store relational table data
- refers to a table where the underlying data storage format is either
- a heap
- a clustered index
- a memory-optimized table
- {concept} rowstore index
- performs best on queries that seek into the data, when searching for a particular value, or for queries on a small range of values
- ⇒ appropriate for transactional workloads
- because they tend to require mostly table seeks instead of table scans
- {concept} rowgroup
- {def} a group of rows that are compressed into columnstore format at the same time
- {constraint} has a maximum number of rows per rowgroup, which is 1,048,576 (2^20) rows
- contains one column segment for every column in the table
- can have more than one delta rowgroup that form the deltastore
- e.g. when multiple threads create columnstore indexes using parallel execution plans [5]
- ⇐ each thread will work with its own subset of data, creating separate rowgroups [5]
- [partitions] each table partition has its own set of rowgroups [5]
- ⇐ too many partitions may prevent workloads from benefiting from a CCI [11]
- ⇐ data aren’t pushed into a compressed columnstore segment until the rowgroup limit is reached
- {event} rowgroup is compressed
- marked as read-only [16]
- a compressed rowgroup is considered fragmented when either
- the number of rows is less than the rowgroup limit and the trim_reason is other than DICTIONARY_SIZE
- ⇐ when the maximum dictionary size was reached, nothing can be done to increase the number of rows [15]
- it has nonzero deleted rows that exceed a minimum threshold [15]
- {event} all data from rowgroup deleted
- transitions from COMPRESSED into TOMBSTONE state
- later removed by the tuple-mover background process
- {event} rows in the columnstore indexes can be moved to different locations
- the row-ids in the nonclustered indexes aren’t updated
- ⇐ the mappings between old and new row locations are stored in an internal structure (aka mapping index)
- {event} rowgroup build
- all column data are combined on a per-rowgroup basis, encoded, and compressed [5]
- the rows within a rowgroup can be rearranged if that helps to achieve a better compression rate [5]
- {feature} data compression
- the table is sliced into rowgroups, and each rowgroup is compressed in a column-wise manner
- the number of rows in a rowgroup must be
- large enough to improve compression rates
- small enough to benefit from in-memory operations
- having too many small rowgroups decreases the columnstore index’s quality
- uses its own compression mechanism
- ⇒ row or page compression cannot be used on it [3]
- [SQL Server 2016] page compression has been removed
- ⇐ in some cases, page compression disallowed the creation of columnstore indexes with a very large number of columns [5]
- {feature} compression delay
- computed when a delta rowgroup is closed [7]
- keeps the ‘active’ rows in the delta rowgroup and transitions them to a compressed rowgroup only after a specified delay [7]
- ⇐ reduces the overall maintenance overhead of NCCI [7]
- ⇒ leads to a larger number of delta rowgroups [7]
- {best practice} if the workload is primarily inserting data and querying it, the default COMPRESSION_DELAY of 0 is the recommended option [7]
- {best practice} [OLTP workload] if > 10% of rows are marked deleted in recently compressed rowgroups, then consider a value that accommodates the behavior [7]
- via: CREATE NONCLUSTERED COLUMNSTORE INDEX ... WITH (COMPRESSION_DELAY = 150)
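A sketch of the compression-delay option above (table, index, and column names are illustrative):

```sql
-- Hypothetical hot table: keep rows in the delta rowgroup for ~150 minutes
-- before compression, so recent updates/deletes don't hit compressed rowgroups.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Orders
ON dbo.Orders (OrderId, OrderDate, Amount)
WITH (COMPRESSION_DELAY = 150);  -- minutes

-- The delay can be changed later without recreating the index:
ALTER INDEX ncci_Orders ON dbo.Orders
SET (COMPRESSION_DELAY = 60);
```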
- {feature} data encoding
- all values in the data are replaced with 64-bit integers using one of two encoding algorithms
- {concept} dictionary encoding
- stores distinct values from the data in a separate structure (aka dictionary)
- every value in a dictionary has a unique ID assigned [5]
- the ID is used for replacement
- {concept} global dictionary
- shared across all segments that belong to the same index partition [5]
- {concept} local dictionary
- created for individual segments using values that are not present in the global dictionary
- {concept} value-based encoding
- mainly used for numeric and integer data types that do not have enough duplicated values [5]
- ⇐ dictionary encoding would be inefficient [5]
- converts integer and numeric values to a smaller range of 64-bit integers in two steps
- {step} [numeric data types] values are converted to integers using the minimum positive exponent (aka magnitude) that allows the conversion [5]
- {goal} convert all numeric values to integers [5]
- [integer data types] the smallest negative exponent is chosen that can be applied to all values without losing their precision [5]
- {goal} reduce the interval between the minimum and maximum values stored in the segment [5]
- {step} the minimum value (aka base value) in the segment is identified and subtracted from all other values [5]
- ⇒ makes the minimum value in the segment number 0 [5]
- after encoding, the data are compressed and stored as a LOB allocation unit
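The two encoding steps can be illustrated with made-up values (not actual engine output):

```sql
-- Illustration (hypothetical values): value-based encoding of a numeric(9,2) segment
--   raw values:                       1.24,  1.59,  2.01
--   step 1 - scale by 10^2, the minimum exponent making all values integers:
--                                     124,   159,   201
--   step 2 - subtract the base value (the minimum, 124):
--                                     0,     35,    77
-- The segment stores the small integers plus metadata (exponent, base value),
-- from which the original values can be reconstructed.
```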
- {concept} column segment
- {def} a column of data from within the rowgroup
- is compressed together and stored on physical media
- SQL Server loads an entire segment into memory when it needs to access its data
- {concept} segment metadata
- stores metadata about each segment
- e.g. minimum and maximum values
- ⇐ segments that do not have the required data are skipped [5]
- {concept} deltastore
- {def} all of the delta rowgroups of a columnstore index
- its operations are handled behind the scenes
- can be in either of two states
- {state} open (aka open deltastore)
- accepts new rows and allows modifications and deletions of data
- {state} closed (aka closed deltastore)
- a deltastore is closed when it reaches its rowgroup limit
- {concept} delta rowgroup
- {def} a clustered B-tree index that's used only with columnstore indexes
- improves columnstore compression and performance by storing rows until the number of rows reaches the rowgroup limit; they are then moved into the columnstore
- {event} reaches the maximum number of rows
- transitions from an ‘open’ to a ‘closed’ state
- a closed rowgroup is compressed by the tuple-mover and stored into the columnstore as a COMPRESSED rowgroup
- {event} compressed
- the existing delta rowgroup transitions into TOMBSTONE state, to be removed later by the tuple-mover when there is no reference to it
- {concept} tuple-mover
- background process that checks for closed rowgroups
- if it finds a closed rowgroup, it compresses the delta rowgroup and stores it into the columnstore as a COMPRESSED rowgroup
- {concept} clustered columnstore index (CCI)
- is the primary storage for the entire table
- {characteristic} updatable
- has two structures that support data modifications
- ⇐ both use the B-tree format to store data [5]
- ⇐ created on demand [5]
- delete bitmap
- indicates which rows were deleted from a table
- upon deletion, the row continues to be stored in the rowgroup
- during query execution, SQL Server checks the delete bitmap and excludes deleted rows from the processing [5]
- deltastore
- includes newly inserted rows
- updating a row triggers the deletion of the existing row and the insertion of a new version of the row into a deltastore
- ⇒ the update does not change the row data in place
- ⇒ the old version of the row is recorded in the delete bitmap
- [partitions] each partition can have a single delete bitmap and multiple deltastores
- ⇐ this makes each partition self-contained and independent from other partitions
- ⇒ allows performing a partition switch on tables that have clustered columnstore indexes defined [5]
- {feature} supports minimal logging for batch sizes >= the rowgroup’s limit [12]
- [SQL Server 2017] supports non-persisted computed columns in clustered columnstore indexes [2]
- stores some data temporarily in a clustered index (aka deltastore) and a B-tree list of IDs for deleted rows
- ⇐ {benefit} reduces fragmentation of the column segments and improves performance
- combines query results from both the columnstore and the deltastore to return the correct query results
- [partitions] too many partitions can hurt the performance of a clustered columnstore index [11]
- {concept} nonclustered columnstore index (NCCI)
- {def} a secondary index that's created on a rowstore table
- is defined on one or more columns of the table and has an optional condition that filters the rows
- designed to be used for workloads involving a mix of transactional and analytics workloads
- functions the same as a clustered columnstore index
- ⇐ has the same performance optimizations (incl. batch mode operators)
- {exception} doesn’t support persisted computed columns
- can’t be created on a columnstore index that has a computed column [2]
- however, behaves differently between the various versions of SQL Server
- [SQL Server 2012|2014] {restriction} read-only
- contains a copy of part or all of the rows and columns in the underlying table
- includes a row-id, which is either the address of
- a row in a heap table
- a clustered index key value
- includes all columns from the clustered index even when not explicitly defined in the CREATE statement
- the unspecified columns will not be available in the sys.index_columns view
- [SQL Server 2016] multiple nonclustered rowstore indexes can be created on a columnstore index and perform efficient table seeks on the underlying columnstore
- ⇒ once created, makes it possible to drop one or more B-tree nonclustered indexes
- enables real-time operational analytics where the OLTP workload uses the underlying clustered index while analytics run concurrently on the columnstore index
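A filtered NCCI for the real-time operational analytics scenario might look like this (table, index, and filter are illustrative):

```sql
-- Hypothetical OLTP table: analytics mostly touch completed orders, so the
-- filter keeps hot, frequently updated rows out of the columnstore.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Orders_Analytics
ON dbo.Orders (OrderDate, CustomerId, Quantity, Amount)
WHERE Status = 'Completed';
```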
- {concept} batch mode execution (aka vector-based execution, vectorized execution)
- {def} query processing method used to process multiple rows together in groups of rows, or batches, rather than one row at a time
- SQL Server can push a predicate to the columnstore index scan operator, preventing unnecessary rows from being loaded into the batch [5]
- queries can process up to 900 rows together
- enables efficient query execution (by a 3-4x factor) [4]
- ⇐ the size of the batches varies to fit into the CPU cache
- ⇒ reduces the number of times that the CPU needs to request external data from memory or other components [5]
- improves the performance of aggregations, which can be calculated on a per-batch rather than a per-row basis [5]
- tries to minimize the copying of data between operators by creating and maintaining a special bitmap that indicates if a row is still valid in the batch [5]
- ⇐ subsequent operators will ignore the non-valid rows
- every operator has a queue of work items (batches) to process [5]
- worker threads from a shared pool pick items from queues and process them while migrating from operator to operator [5]
- is closely integrated with, and optimized around, the columnstore storage format
- columnstore indexes use batch mode execution
- ⇐ improves query performance typically by two to four times
- {concept} tuple mover
- single-threaded process that works in the background, preserving system resources
- converts closed deltastores to rowgroups that store data in a column-based storage format [5]
- can be disabled via trace flag 634
- ⇐ the conversion of closed deltastores to rowgroups can be forced by reorganizing an index [5]
- runs in parallel using multiple threads
- decreases significantly the conversion time at a cost of extra CPU load and memory usage [5]
- via: ALTER INDEX REORGANIZE command
- doesn’t prevent other sessions from inserting new data into a table [5]
- deletions and data modifications would be blocked for the duration of the operation [5]
- {recommendation} consider forcing index reorganization manually to reduce execution, and therefore locking, time [5]
- considered fragmented if it has
- multiple delta rowgroups
- deleted rows
- requires maintenance like that of regular B-tree indexes [5]
- {issue} partially populated rowgroups
- {issue} overhead of deltastore and delete bitmap scans during query execution
- rebuilding the columnstore index addresses these issues
- the strategy depends on the volatility of the data and the ETL processes implemented in the system [5]
- {recommendation} rebuild indexes when a table has a considerable volume of deleted rows and/or a large number of partially populated rowgroups [5]
- {recommendation} rebuild partition(s) that still have a large number of rows in open deltastores after the ETL process has completed, especially if the ETL process does not use a bulk insert API [5]
- creating/dropping/disabling/rebuilding functions like any other index
- columnstore statistics
- a statistics object is created at the time of columnstore index creation; however, it is neither populated nor updated afterward [5]
- ⇐ SQL Server relies on segment information, B-tree indexes (when available), and column-level statistics when deciding if a columnstore index needs to be used [5]
- it is beneficial to create missing column-level statistics on the columns that participate in a columnstore index and are used in query predicates and as join keys [5]
- ⇐ statistics rarely update automatically on very large tables [5]
- ⇒ statistics must be updated ‘manually’
- [SQL Server 2019] included in the schema-only clone of a database functionality [8]
- enables performance troubleshooting without the need to manually capture the statistics information
- [SQL Server 2019] columnstore indexes have been added to sp_estimate_data_compression_savings
- ⇐ the COLUMNSTORE and COLUMNSTORE_ARCHIVE options allow estimating the space savings if either of these indexes is used on a table [8]
- [in-memory tables]
- {limitation} a columnstore index must include all the columns and can’t have a filtered condition [2]
- {limitation} queries on columnstore indexes run only in InterOP mode, and not in the in-memory native mode [2]
- {operation} designing columnstore indexes
- {best practice} understand as much as possible the data’s characteristics
- {best practice} identify the workload’s characteristics
- {operation} create a clustered columnstore index
- via: CREATE CLUSTERED COLUMNSTORE INDEX command
- no columns need to be specified in the statement
- ⇐ the index will include all table columns
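A minimal sketch of the command (table and index names are illustrative):

```sql
-- A CCI always covers all columns, so no column list is given.
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
ON dbo.FactSales;

-- Converting an existing rowstore clustered index to a CCI instead;
-- the index name must match the existing clustered index being replaced.
CREATE CLUSTERED COLUMNSTORE INDEX pk_FactSales
ON dbo.FactSales
WITH (DROP_EXISTING = ON);
```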
- {operation} index rebuilding
- forces SQL Server to remove deleted rows physically from the index and to merge the deltastores’ and rowgroups’ data [5]
- all column segments are recreated with rowgroups fully populated [5]
- [<SQL Server 2019] offline operation
- [SQL Server 2019 Enterprise] online operation
- ⇒ higher availability
- ⇐ pausing and resuming create and rebuild operations are not supported [11]
- very resource-intensive process
- holds a schema modification (Sch-M) lock on the table
- ⇒ prevents other sessions from accessing it [5]
- ⇐ the overhead can be mitigated by using table/index partitioning
- ⇒ indexes can be rebuilt on a partition basis for those partitions with volatile data [5]
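A partition-level rebuild might look like this (index/table names and the partition number are illustrative):

```sql
-- Rebuild only the partition that received volatile data.
ALTER INDEX cci_FactSales ON dbo.FactSales
REBUILD PARTITION = 12;

-- [SQL Server 2019 Enterprise] the rebuild can run online:
ALTER INDEX cci_FactSales ON dbo.FactSales
REBUILD PARTITION = 12 WITH (ONLINE = ON);
```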
- {operation} index reorganization
- [<SQL Server 2019] a reorganize operation is required to merge smaller COMPRESSED rowgroups, following an internal threshold policy that determines how to remove deleted rows and combine the compressed rowgroups
- [SQL Server 2019] a background merge task also works to merge COMPRESSED rowgroups from which a large number of rows has been deleted
- ⇐ after merging smaller rowgroups, the index quality should be improved
- the tuple-mover is helped by a background merge task that automatically compresses smaller OPEN delta rowgroups that have existed for some time as determined by an internal threshold, or merges COMPRESSED rowgroups from which a large number of rows has been deleted
- via: ALTER INDEX REORGANIZE command
- [SQL Server 2016] performs additional defragmentation
- removes deleted rows from rowgroups in which 10% or more of the rows have been logically deleted [5]
- merges closed rowgroups together, keeping the total number of rows less than or equal to the rowgroup’s limit [5]
- ⇐ both processes can be done together [5]
- [SQL Server 2014] the only action performed is compressing and moving the data from closed deltastores to rowgroups [5]
- ⇐ delete bitmap and open deltastores stay intact [5]
- via: ALTER INDEX REORGANIZE
- uses all available system resources while it is running [5]
- ⇒ speeds up the execution process
- ⇒ reduces the time during which other sessions cannot modify or delete data in a table [5]
- close and compress all open rowgroups
- via: ALTER INDEX REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON)
- rowgroups aren’t merged during this operation [5]
- {operation} estimate compression savings
- [SQL Server 2019] COLUMNSTORE and COLUMNSTORE_ARCHIVE added
- allows estimating the space savings if either of these indexes is used on a table [8]
- {limitation} not available in all editions
- via: sp_estimate_data_compression_savings
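A sketch of the estimation call (schema and table names are illustrative):

```sql
-- Estimate the savings of compressing dbo.FactSales with a columnstore index;
-- @data_compression also accepts 'COLUMNSTORE_ARCHIVE' in SQL Server 2019+.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactSales',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'COLUMNSTORE';
```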
- {operation} [bulk loads] when the number of rows is less than the deltastore’s limit, all the rows go directly to the deltastore
- [large bulk load] most of the rows go directly to the columnstore without passing through the deltastore
- some rows at the end of the bulk load might be too few in number to meet the minimum size of a rowgroup
- ⇒ the final rows go to the deltastore instead of the columnstore
- bulk insert operations provide the number of rows in the batch as part of the API call [5]
- best results are achieved by choosing a batch size that is divisible by the rowgroup’s limit [5]
- ⇐ guarantees that every batch produces one or several fully populated rowgroups [5]
- ⇒ reduces the total number of rowgroups in a table [5]
- ⇒ improves query performance
- ⇐ the batch size shouldn’t exceed the rowgroup’s limit [5]
- rowgroups can still be created on the fly, in a manner similar to a bulk insert, when the size of the insert batch is close to or exceeds the rowgroup’s limit [5]
- {operation} [non-bulk operations] trickle inserts go directly to a deltastore
- {feature} parallel inserts
- [SQL Server 2016] requires the following conditions for parallel insert on a CCI [6]
- must specify TABLOCK
- no NCI on the clustered columnstore index
- no identity column
- database compatibility level is set to 130
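Under those conditions, a staging-to-fact load can run in parallel (table names are illustrative):

```sql
-- TABLOCK enables parallel insert into the CCI, assuming the target has no
-- nonclustered indexes, no identity column, and compatibility level 130+.
INSERT INTO dbo.FactSales WITH (TABLOCK)
SELECT OrderDate, CustomerId, Quantity, Amount
FROM dbo.StagingSales;
```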
- {recommendation} minimize the use of string columns in fact tables [5]
- string data use more space
- their encoding involves additional overhead during batch mode execution [5]
- queries with predicates on string columns may have less efficient execution plans that also require significantly larger memory grants as compared to their non-string counterparts [5]
- {limitation} [SQL Server 2012|2014] string predicates are not pushed down toward the lowest operators in execution plans
- {recommendation} add another dimension table and replace the string value in the fact table with a synthetic, integer-based ID key that references the new table [5]
- {operation} upgrading to SQL Server 2016
- make sure that queries against the tables with columnstore indexes can utilize parallelism if the database compatibility level is less than 130 [5]
- {feature} [SQL Server 2019] automated columnstore index maintenance [8]
- {improvement} [SQL Server 2019] better columnstore metadata memory management
- {improvement} [SQL Server 2019] low-memory load path for columnstore tables
- {improvement} [SQL Server 2019] improved performance for bulk loading to columnstore indexes
- {improvement} [SQL Server 2019] server startup process has been made faster for databases that use in-memory columnstore tables for HTAP
- {feature} DMVs
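The rowgroup states and fragmentation signals discussed above can be inspected via a DMV; a sketch (the table name is illustrative):

```sql
-- Rowgroup health check (SQL Server 2016+): shows state, deleted rows, and
-- why a rowgroup was trimmed below the 1,048,576-row limit.
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_desc,            -- OPEN / CLOSED / COMPRESSED / TOMBSTONE
       total_rows,
       deleted_rows,
       trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.FactSales')
ORDER BY row_group_id;
```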
References:
[1] SQL Docs (2020) Columnstore indexes: Overview [link]
[2] Microsoft Learn (2024) SQL: What's new in columnstore indexes [link]
[3] Dejan Sarka et al (2012) Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012 (Training Kit)
[4] SQL Docs (2019) Columnstore indexes - Query performance [link]
[5] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.
[6] Microsoft Learn (2016) Columnstore Index: Parallel load into clustered columnstore index from staging table [link]
[7] Microsoft Learn (2016) Columnstore Index Defragmentation using REORGANIZE Command [link]
[8] Microsoft (2018) Microsoft SQL Server 2019: Technical white paper [link]
Acronyms:
CCI - clustered columnstore index
CI - columnstore index
DBCC - Database Console Commands
DMV - Dynamic Management View
ETL - Extract, Transform, Load
HTAP - Hybrid Transactional/Analytical Processing
LOB - Large Object
NCCI - nonclustered columnstore index
NCI - nonclustered index
OLTP - On-Line Transaction Processing
SP - Service Pack