
05 July 2025

🏭🗒️Microsoft Fabric: Git Repository [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 4-Jul-2025

[Microsoft Fabric] Git Repository

  • {def} set of features that enable developers to integrate their development processes, tools, and best practices straight into the Fabric platform [2]

  • {goal} the repo serves as single-source-of-truth
  • {feature} backup and version control [2]
  • {feature} revert to previous stages [2]
  • {feature} collaborate with others or work alone using Git branches [2]
  • {feature} source control 
    • provides tools to manage Fabric items [2]
    • supported for Azure DevOps and GitHub [3]
  • {configuration} tenant switches 
    • ⇐ must be enabled from the Admin portal 
      • by the tenant admin, capacity admin, or workspace admin
        • dependent on organization's settings [3]
    • users can create Fabric items
    • users can synchronize workspace items with their Git repositories
    • create workspaces
      • only if needed to branch out to a new workspace [3]
    • users can synchronize workspace items with GitHub repositories
      • for GitHub users only [3]
  • {concept} release process 
    • begins once new updates complete a Pull Request process and merge into the team’s shared branch [3]
  • {concept} branch
    • {operation} switch branches
      • the workspace syncs with the new branch and all items in the workspace are overridden [3]
        • if there are different versions of the same item in each branch, the item is replaced [3]
        • if an item is in the old branch, but not the new one, it gets deleted [3]
      • one can't switch branches if there are any uncommitted changes in the workspace [3]
    • {action} branch out to another workspace 
      • creates a new workspace, or switches to an existing workspace based on the last commit to the current workspace, and then connects to the target workspace and branch [4]
      • {permission} contributor and above
    • {action} checkout new branch
      • creates a new branch based on the last synced commit in the workspace [4]
      • changes the Git connection in the current workspace [4]
      • doesn't change the workspace content [4]
      • {permission} workspace admin
    • {action} switch branch
      • syncs the workspace with another new or existing branch and overrides all items in the workspace with the content of the selected branch [4]
      • {permission} workspace admin
    • {limitation} maximum length of branch name: 244 characters.
    • {limitation} maximum length of full path for file names: 250 characters
    • {limitation} maximum file size: 25 MB
  • {operation} connect a workspace to a Git repository 
    • can be done only by a workspace admin [4]
      • once connected, anyone with permissions can work in the workspace [4]
    • synchronizes the content between the two (aka initial sync)
      • {scenario} either of the two is empty while the other has content
        • the content is copied from the nonempty location to the empty one [4]
      • {scenario} both have content
        • one must decide which direction the sync should go [4]
          • the sync overwrites the content in the destination [4]
      • includes folder structures [4]
        • workspace items in folders are exported to folders with the same name in the Git repo [4]
        • items in Git folders are imported to folders with the same name in the workspace [4]
        • if the workspace has folders and the connected Git folder doesn't yet have subfolders, they're considered to be different [4]
          • leads to uncommitted changes status in the source control panel [4]
            • one must commit the changes to Git before updating the workspace [4]
              • if one updates first, the Git folder structure overwrites the workspace folder structure [4]
        • {limitation} empty folders aren't copied to Git
          • when creating or moving items to a folder, the folder is created in Git [4]
        • {limitation} empty folders in Git are deleted automatically [4]
        • {limitation} empty folders in the workspace aren't deleted automatically even if all items are moved to different folders [4]
        • {limitation} folder structure is retained up to 10 levels deep [4]
    •  Git status
      • synced 
        • the item is the same in the workspace and Git branch [4]
      •  conflict 
        • the item was changed in both the workspace and Git branch [4]
      •  unsupported item
      •  uncommitted changes in the workspace
      •  update required from Git [4]
      •  item is identical in both places but needs to be updated to the last commit [4]
  • source control panel
    • shows the number of items that are different in the workspace and Git branch
      • when changes are made, the number is updated
      • when the workspace is synced with the Git branch, the Source control icon displays a 0
  • commit and update panel 
    • {section} changes 
      • shows the number of items that were changed in the workspace and need to be committed to Git [4]
      • changed workspace items are listed in the Changes section
        • when there's more than one changed item, one can select which items to commit to the Git branch [4]
      • if there were updates made to the Git branch, commits are disabled until you update your workspace [4]
    • {section} updates 
      • shows the number of items that were modified in the Git branch and need to be updated to the workspace [4]
      • the Update command always updates the entire branch and syncs to the most recent commit [4]
        • {limitation} one can’t select specific items to update [4]
        • if changes were made in the workspace and in the Git branch on the same item, updates are disabled until the conflict is resolved [4]
    • in each section, the changed items are listed with an icon indicating the status
      •  new
      •  modified
      •  deleted
      •  conflict
      •  same-changes
  • {concept} related workspace
    • workspace with the same connection properties as the current branch [4]
      • e.g.  the same organization, project, repository, and git folder [4] 

References:
[2] Microsoft Learn (2025) Fabric: What is Microsoft Fabric Git integration? [link]
What is lifecycle management in Microsoft Fabric? [link]
[3] Microsoft Fabric Updates Blog (2025) Fabric: Introducing New Branching Capabilities in Fabric Git Integration [link]
[4] Microsoft Learn (2025) Fabric: Basic concepts in Git integration [link]

Resources:

Acronyms:
CI/CD - Continuous Integration and Continuous Deployment

21 June 2025

🏭🗒️Microsoft Fabric: Result Set Caching in SQL Analytics Endpoints [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 21-Jun-2025

[Microsoft Fabric] Result Set Caching in SQL Analytics Endpoints

  • {def} built-in performance optimization for Warehouse and Lakehouse that improves read latency [1]
    • fully transparent to the user [3]
    • persists the final result sets for applicable SELECT T-SQL queries
      • caches all the data accessed by a query [3]
      • subsequent runs that "hit" cache will process just the final result set
        • can bypass complex compilation and data processing of the original query [1]
          • ⇐ subsequent queries return faster [1]
      • cache creation and reuse are applied opportunistically for queries
    • works on
      • warehouse tables
      • shortcuts to OneLake sources
      • shortcuts to non-Azure sources
    • the management of cache is handled automatically [1]
      • regularly evicts cache as needed
    • as data changes, result consistency is ensured by invalidating cache created earlier [1]
  • {operation} enable setting
    • via ALTER DATABASE <database_name> SET RESULT_SET_CACHING ON (see the consolidated sketch at the end of this list)
  • {operation} validate setting
    • via SELECT name, is_result_set_caching_on FROM sys.databases
  • {operation} configure setting
    • configurable at item level
      • once enabled, it can then be disabled 
        • at the item level
        • for individual queries
          • e.g. debugging or A/B testing a query
        • via OPTION (USE HINT ('DISABLE_RESULT_SET_CACHE'))
    • {default} during the preview, result set caching is off for all items [1]
  • [monitoring] 
    • via Message Output
      • applicable to Fabric Query editor, SSMS
      • the statement "Result set cache was used" is displayed after query execution if the query was able to use an existing result set cache
    • via queryinsights.exec_requests_history system view
      • the result_cache_hit column indicates result set cache usage for each query execution [1]
        • {value} 2: the query used result set cache (cache hit)
        • {value} 1: the query created result set cache
        • {value} 0: the query wasn't applicable for result set cache creation or usage [1]
          • {reason} the cache no longer exists
          • {reason} the cache was invalidated by a data change, disqualifying it for reuse [1]
          • {reason} query isn't deterministic
            • isn't eligible for cache creation [1]
          • {reason} query isn't a SELECT statement
  • [warehousing] 
    • {scenario} analytical queries that process large amounts of data to produce a relatively small result [1]
    • {scenario} workloads that trigger the same analytical queries repeatedly [1]
      • the same heavy computation can be triggered multiple times, even though the final result remains the same [1]
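
A minimal SQL sketch consolidating the statements above, assuming a warehouse named SalesWH; the warehouse name and the sample query (dbo.FactSales) are illustrative:

-- enable result set caching at the item level
ALTER DATABASE [SalesWH] SET RESULT_SET_CACHING ON;

-- validate the setting
SELECT name, is_result_set_caching_on FROM sys.databases;

-- bypass the cache for a single query (e.g. while debugging or A/B testing)
SELECT COUNT(*) AS row_count
FROM dbo.FactSales
OPTION (USE HINT ('DISABLE_RESULT_SET_CACHE'));

-- review cache usage per execution (2 = cache hit, 1 = cache created, 0 = not applicable)
SELECT *
FROM queryinsights.exec_requests_history
WHERE result_cache_hit = 2;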

References:
[1] Microsoft Learn (2025) Result set caching (preview) [link]
[2] Microsoft Fabric Update Blog (2025) Result Set Caching for Microsoft Fabric Data Warehouse (Preview) [link|aka]
[3] Microsoft Learn (2025) In-memory and disk caching [link]
[4] Microsoft Learn (2025) Performance guidelines in Fabric Data Warehouse [link]

Resources:
[R1] Microsoft Fabric (2025) Fabric Update - June 2025 [link]

Acronyms:
MF - Microsoft Fabric
SSMS - SQL Server Management Studio

24 May 2025

🏭🗒️Microsoft Fabric: Materialized Lake Views (MLV) [Notes] 🆕🗓️

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 27-Jul-2025

-- create schema
CREATE SCHEMA IF NOT EXISTS <lakehouse_name>.<schema_name>

-- create a materialized view
CREATE MATERIALIZED VIEW IF NOT EXISTS <lakehouse_name>.<schema_name>.<view_name> 
[(
    CONSTRAINT <constraint_name> CHECK (<constraint>) ON MISMATCH DROP 
)] 
[PARTITIONED BY (col1, col2, ... )] 
[COMMENT "description or comment"] 
[TBLPROPERTIES ("key1"="val1", "key2"="val2", ...)] 
AS 
SELECT ...
FROM ...
-- WHERE ...
-- GROUP BY ...

[Microsoft Fabric] Materialized Lake Views (MLVs)

  • {def} persisted, continuously updated view of data [1]
    • {benefit} allows to build declarative data pipelines using SQL, complete with built-in data quality rules and automatic monitoring of data transformations
      • simplifies the implementation of multi-stage Lakehouse processing [1]
        • ⇐ aids in the creation, management, and monitoring of views [3]
        • ⇐ improves transformations through a declarative approach [3]
        • streamline data workflows
        • enable developers to focus on business logic [1]
          • ⇐ not on infrastructural or data quality-related issues [1]
        • the views can be created in a notebook [2]
    • {benefit} allows developers to visualize lineage across all entities in the lakehouse, view the dependencies, and track execution progress [3]
      • can have data quality constraints enforced and visualized for every run, showing completion status and conformance to data quality constraints defined in a single view [1]
      • empowers developers to set up complex data pipelines with just a few SQL statements and then handle the rest automatically [1]
        • faster development cycles 
        • trustworthy data
        • quicker insights
  • {goal} process only the new or changed data instead of reprocessing everything each time [1]
    • ⇐  leverages Delta Lake’s CDF under the hood
      • ⇒ it can update just the portions of data that changed rather than recompute the whole view from scratch [1]
  • {operation} creation
    • allows defining transformations at each layer [1]
      • e.g. aggregation, projection, filters
    • allows specifying certain checks that the data must meet [1]
      • incorporate data quality constraints directly into the pipeline definition
    • via CREATE MATERIALIZED LAKE VIEW (see the consolidated sketch at the end of this list)
      • the SQL syntax is declarative and Fabric figures out how to produce and maintain it [1]
  • {operation} refresh
    • refreshes only when its source has new data [1]
      • if there’s no change, it can skip running entirely (saving time and resources) [1]
    • via REFRESH MATERIALIZED LAKE VIEW [workspace.lakehouse.schema].MLV_Identifier [FULL];
  • {operation} list views from schema [3]
    • via SHOW MATERIALIZED LAKE VIEWS <IN/FROM> Schema_Name;
  • {operation} retrieve definition
    • via SHOW CREATE MATERIALIZED LAKE VIEW MLV_Identifier;
  • {operation} rename view
    • via ALTER MATERIALIZED LAKE VIEW MLV_Identifier RENAME TO MLV_Identifier_New;
  • {operation} drop view
    • via DROP MATERIALIZED LAKE VIEW MLV_Identifier;
    • {warning} dropping or renaming a materialized lake view affects the lineage view and scheduled refresh [3]
    • {recommendation} update the reference in all dependent materialized lake views [3]
  • {operation} schedule view run
    • lets users set how often the MLV should be refreshed based on business needs and lineage execution timing [5]
    • depends on
      • data update frequency: the frequency with which the data is updated [5]
      • query performance requirements: the business requirement to refresh the data at defined intervals [5]
      • system load: optimizing the time to run the lineage without overloading the system [5]
  • {operation} view run history
    • users can access the last 25 runs including lineage and run metadata
      • available from the dropdown for monitoring and troubleshooting
  • {concept} lineage
    • the sequence of MLVs that need to be executed to refresh the MLV once new data is available [5]
  • {feature} automatically generate a visual report that shows trends on data quality constraints 
    • {benefit} allows to easily identify the checks that introduce maximum errors and the associated MLVs for easy troubleshooting [1]
  • {feature} can be combined with Shortcut Transformation feature for CSV ingestion 
    • {benefit} facilitate the building of end-to-end Medallion architectures
  • {feature} dependency graph
    • allows to see the dependencies existing between the various objects [2]
      • ⇐ automatically generated [2]
  • {feature} data quality
    • {benefit} allows to compose precise queries to exclude poor quality data from the source tables [5]
    • [medallion architecture] ensuring data quality is essential at every stage of the architecture [5]
    • maintained by setting constraints when defining the MLVs [5]
    • {action} FAIL
      • stops refreshing an MLV if any constraint is violated [5]
      • {default} processing halts at the first violation
        • even without specifying the FAIL keyword [5]
        • FAIL takes precedence over DROP
    • {action} DROP
      • processes the MLV and removes records that don't meet the specified constraint [5]
        • provides the count of removed records in the lineage view [5]
    • {constraint} updating data quality constraints after creating an MLV isn't supported [5]
      • ⇐ the MLV must be recreated
    • {constraint} the use of functions and pattern search with operators in constraint condition is restricted [5]
      • e.g. LIKE, regex 
    • {known issue} the creation and refresh of an MLV with a FAIL action in constraint may result in a "delta table not found" error
      • {recommendation} recreate the MLV and avoid using the FAIL action [5] 
    • {feature} data quality report
      • built-in Power BI dashboard that shows several aggregated metrics [2]
    • {feature} monitor hub
      • centralized portal to browse MLV runs in the lakehouse [7]
      • {operation} view runs' status [7]
      • {operation} search and filter the runs [7]
        • based on different criteria
      • {operation} cancel in-progress run [7]
      • {operation} drill down run execution details [7]
    • doesn't support
      • {feature|planned} PySpark [3]
      • {feature|planned} incremental refresh [3]
      • {feature|planned} integration with Data Activator [3]
      • {feature|planned} API [3]
      • {feature|planned} cross-lakehouse lineage and execution [3]
      • {limitation} Spark properties set at the session level aren't applied during scheduled lineage refresh [4]
      • {limitation} creation with delta time-travel [4]
      • {limitation} DML statements [4]
      • {limitation} UDFs in CTAS [4] 
      • {limitation} temporary views can't be used to define MLVs [4]
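
A consolidated SQL sketch of the operations above, assuming a lakehouse named sales_lh with a bronze.orders source table and a silver schema; all object and column names are illustrative:

-- create the target schema if needed
CREATE SCHEMA IF NOT EXISTS sales_lh.silver;

-- define a silver-layer MLV; rows violating the constraint are dropped at refresh time
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS sales_lh.silver.orders_clean
(
    CONSTRAINT valid_amount CHECK (amount > 0) ON MISMATCH DROP
)
COMMENT 'orders with non-positive amounts removed'
AS
SELECT order_id, customer_id, amount, order_date
FROM sales_lh.bronze.orders;

-- refresh when new data arrives (append FULL to force a complete rebuild)
REFRESH MATERIALIZED LAKE VIEW sales_lh.silver.orders_clean;

-- inspect and clean up
SHOW MATERIALIZED LAKE VIEWS IN silver;
SHOW CREATE MATERIALIZED LAKE VIEW sales_lh.silver.orders_clean;
DROP MATERIALIZED LAKE VIEW sales_lh.silver.orders_clean;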

    References:
    [1] Microsoft Fabric Update Blog (2025) Simplifying Medallion Implementation with Materialized Lake Views in Fabric [link|aka]
    [2] Power BI Tips (2025) Microsoft Fabric Notebooks with Materialized Views - Quick Tips [link]
    [3] Microsoft Learn (2025) What are materialized lake views in Microsoft Fabric? [link]
    [4] Microsoft Learn (2025) Materialized lake views Spark SQL reference [link]
    [5] Microsoft Learn (2025) Manage Fabric materialized lake views lineage [link] 
    [6] Microsoft Learn (2025) Data quality in materialized lake views [link]
    [7] Microsoft Learn (2025) Monitor materialized lake views [link]

    Resources:
    [R1] Databricks (2025) Use materialized views in Databricks SQL [link]
    [R2] Microsoft Learn (2025) Implement medallion architecture with materialized lake views [link]

    Acronyms:
    API - Application Programming Interface
    CDF - Change Data Feed
    CTAS - Create Table As Select
    DML - Data Manipulation Language
    ETL - Extract, Transform, Load
    MF - Microsoft Fabric
    MLV - Materialized Lake View
    UDF - User-Defined Function

    23 May 2025

    🏭🗒️Microsoft Fabric: Warehouse Snapshots [Notes] 🆕

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 23-May-2025

    [Microsoft Fabric] Warehouse Snapshots

    • {def} read-only representation of a warehouse at a specific point in time [1]
    • allows support for analytics, reporting, and historical analysis scenarios without worrying about the volatility of live data updates [1]
      • provide a consistent and stable view of data [1]
      • ensuring that analytical workloads remain unaffected by ongoing changes or ETL operations [1]
    • {benefit} guarantees data consistency
      • the dataset remains unaffected by ongoing ETL processes [1]
    • {benefit} immediate roll-forward updates
      • can be seamlessly rolled forward on demand to reflect the latest state of the warehouse
        • ⇒ {benefit} consumers access the same snapshot using a consistent connection string, even from third-party tools [1]
        • ⇐ updates are applied immediately, as if in a single, atomic transaction [1]
    • {benefit} facilitates historical analysis
      • snapshots can be created on an hourly, daily, or weekly basis to suit their business requirements [1]
    • {benefit} enhanced reporting
      • provides a point-in-time reliable dataset for precise reporting [1]
        • ⇐ free from disruptions caused by data modifications [1]
    • {benefit} doesn't require separate storage [1]
      • relies on source Warehouse [1]
    • {limit} doesn't support database objects 
    • {limit} capture a state within the last 30 days
    • {operation} create snapshot
      • via New warehouse snapshot
      • multiple snapshots can be created for the same parent warehouse [1]
        • appear as child items of the parent warehouse in the workspace view [1]
        • queries run against a snapshot return the data as it existed at the snapshot's point in time [1]
    • {operation} read properties 
      • via the Fabric REST API:
        GET https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{warehousesnapshotId}
        Authorization: Bearer <bearer token>
    • {operation} update snapshot timestamp (see the sketch after this list)
      • allows users to roll forward data instantly, ensuring consistency [1]
        • use current state
          • via ALTER DATABASE [<snapshot name>] SET TIMESTAMP = CURRENT_TIMESTAMP; 
        • use point in time
          • via ALTER DATABASE [<snapshot name>] SET TIMESTAMP = 'YYYY-MM-DDTHH:MM:SS.SS'; -- UTC time
      • queries that are in progress during point in time update will complete against the version of data they were started against [1]
    • {operation} rename snapshot
    • {operation} delete snapshot
      • via DELETE
      • when the parent warehouse gets deleted, the snapshot is also deleted [1]
    • {operation} modify source table
      • DDL changes to source will only impact queries in the snapshot against tables affected [1]
    • {operation} join multiple snapshots
      • the resulting snapshot date will be applied to each warehouse connection [1]
    • {operation} retrieve metadata
      • via sys.databases [1]
    • [permissions] inherited from the source warehouse [1]
      • ⇐ any permission changes in the source warehouse applies instantly to the snapshot [1]
      • security updates on source database will be rendered immediately to the snapshot databases [1]
    • {limitation} can only be created against new warehouses [1]
      • created after Mar-2025
    • {limitation} do not appear in SSMS Object Explorer but will show up in the database selection dropdown [1]
    • {limitation} datetime can be set to any date in the past up to 30 days or database creation time (whichever is later)  [1]
    • {limitation} modified objects after the snapshot timestamp become invalid in the snapshot [1]
      • applies to tables, views, and stored procedures [1]
    • {limitation} must be recreated if the data warehouse is restored [1]
    • {limitation} aren’t supported on the SQL analytics endpoint of the Lakehouse [1]
    • {limitation} aren’t supported as a source for OneLake shortcuts [1]
    • [Power BI] {limitation} require DirectQuery or Import mode [1]
      • don’t support Direct Lake
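
    A minimal SQL sketch of the roll-forward operations above, assuming a snapshot named SalesWH_Snapshot; the snapshot name and timestamp are illustrative:

    -- roll the snapshot forward to the parent warehouse's current state
    ALTER DATABASE [SalesWH_Snapshot] SET TIMESTAMP = CURRENT_TIMESTAMP;

    -- or pin it to a specific point in time (UTC, within the last 30 days)
    ALTER DATABASE [SalesWH_Snapshot] SET TIMESTAMP = '2025-05-20T08:00:00.00';

    -- confirm the snapshot appears in the metadata
    SELECT name FROM sys.databases;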

      References:
      [1] Microsoft Learn (2025) Fabric: Warehouse Snapshots in Microsoft Fabric (Preview) [link]
      [2] Microsoft Learn (2025) Warehouse snapshots (preview) [link]
      [3] Microsoft Learn (2025) Create and manage a warehouse snapshot (preview) [link]

      Resources:


      Acronyms:
      DDL - Data Definition Language
      ETL - Extract, Transform, Load
      MF - Microsoft Fabric
      SSMS - SQL Server Management Studio

      29 April 2025

      🏭🗒️Microsoft Fabric: Purview [Notes]

      Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

      Last updated: 29-Apr-2025

      [Microsoft Purview] Purview
      • {def} comprehensive data governance and security platform designed to help organizations manage, protect, and govern their data across various environments [1]
        • incl. on-premises, cloud & SaaS applications [1]
        • provides the highest and most flexible level of functionality for data governance in MF [1]
          • offers comprehensive tools for 
            • data discovery
            • data classification
            • data cataloging
      • {capability} managing the data estate
        • {tool} dedicated portal
          • aka Fabric Admin portal
          • used to control tenant settings, capacities, domains, and other objects, typically reserved for administrators
        • {type} logical containers
          • used to control access to data and capabilities [1]
          • {level} tenants
            • settings for Fabric administrators [1]
          • {level} domains
            • group data that is relevant to a single business area or subject field [1]
          • {level} workspaces 
            • group Fabric items used by a single team or department [1]
        • {type} capacities
          • objects that limit compute resource usage for all Fabric workloads [1]
      • {capability} metadata scanning
        • extracts values from data lakes
          • e.g. names, identities, sensitivities, endorsements, etc. 
          • can be used to analyze and set governance policies [1]
      • {capability} secure and protect data
        • assure that data is protected against unauthorized access and destructive attacks [1]
        • compliant with data storage regulations applicable in your region [1]
        • {tool} data tags
          • allows to identify the sensitivity of data and apply data retention and protection policies [1]
        • {tool} workspace roles
          • define the users who are authorized to access the data in a workspace [1]
        • {tool} data-level controls
          • used at the level of Fabric items
            • e.g. tables, rows, and columns to impose granular restrictions.
        • {tool} certifications
          • Fabric is compliant with many data management certifications
            • incl. HIPAA BAA, ISO/IEC 27017, ISO/IEC 27018, ISO/IEC 27001, ISO/IEC 27701 [1]
      • {feature} OneLake data hub
        • allows users to find and explore the data in their estate.
      • {feature} endorsement
        • allows users to endorse a Fabric item to identify it as being of high quality [1]
          • help other users to trust the data that the item contains [1]
      • {feature} data lineage
        • allows users to understand the flow of data between items in a workspace and the impact that a change would have [1]
      • {feature} monitoring hub
        • allows to monitor activities for the Fabric items for which the user has the permission to view [1]
      • {feature} capacity metrics
        • app used to monitor usage and consumption
      • {feature} allows to automate the identification of sensitive information and provides a centralized repository for metadata [1]
      • {feature} allows to find, manage, and govern data across various environments
        • incl. both on-premises and cloud-based systems [1]
        • supports compliance and risk management with features that monitor regulatory adherence and assess data vulnerabilities [1]
      • {feature} integrated with other Microsoft services and third-party tools 
        • {benefit} enhances its utility
        • {benefit} streamlines data access controls
          • enforcing policies, and delivering insights into data lineage [1]
      • {benefit} helps organizations maintain data integrity, comply with regulations, and use their data effectively for strategic decision-making [1]
      • {feature} Data Catalog
        • {benefit} allows users to discover, understand, and manage their organization's data assets
          • search for and browse datasets
          • view metadata
          • gain insights into the data’s lineage, classification, and sensitivity labels [1]
        • {benefit} promotes collaboration
          • users can annotate datasets with tags to improve discoverability and data governance [1]
        • targets both users and administrators
        • {benefit} allows to discover where patient records are held by searching for keywords [1]
        • {benefit} allows to label documents and items based on their sensitiveness [1]
        • {benefit} allows to use access policies to manage self-service access requests [1]
      • {feature} Information Protection
        • used to classify, label, and protect sensitive data throughout the organization [1]
          • by applying customizable sensitivity labels, users classify records. [1]
          • {concept} policies
            • define access controls and enforce encryption
            • labels follow the data wherever it goes
            • helps organizations meet compliance requirements while safeguarding data against accidental exposure or malicious threats [1]
        • allows to protect records with policies to encrypt data and impose IRM
      • {feature} Data Loss Prevention (DLP)
        • the practice of protecting sensitive data to reduce the risk from oversharing [2]
          • implemented by defining and applying DLP policies [2]
      • {feature} Audit
        • user activities are automatically logged and appear in the Purview audit log
          • e.g. creating files or accessing Fabric items
      • {feature} connect Purview to Fabric in a different tenant
        • all functionality is supported, except that 
          • {limitation} Purview's live view isn't available for Fabric items [1]
          • {limitation} the system can't identify user registration automatically [1]
          • {limitation} managed identity can’t be used for authentication in cross-tenant connections [1]
            • {workaround} use a service principal or delegated authentication [1]
      • {feature} Purview hub
        • displays reports and insights about Fabric items [1]
          • acts as a centralized location to begin data governance and access more advanced features [1]
          • via Settings >> Microsoft Purview hub
          • administrators see information about their entire organization's Fabric data estate
          • provides information about
            • Data Catalog
            • Information Protection
            • Audit
        • the data section displays tables and graphs that analyze the entire organization's items in MF
          • users only see information about their own Fabric items and data

      References:
      [1] Microsoft Learn (2024) Purview: Govern data in Microsoft Fabric with Purview[link]
      [2] Microsoft Learn (2024) Purview: Learn about data loss prevention [link]

      Resources:

      Acronyms:
      DLP - Data Loss Prevention
      M365 - Microsoft 365
      MF - Microsoft Fabric
      SaaS - Software-as-a-Service

      🏭🗒️Microsoft Fabric: Data Loss Prevention (DLP) in Purview [Notes]

      Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

      Last updated: 10-Jun-2025

      [Microsoft Purview] Data Loss Prevention (DLP)
      • {def} the practice of protecting sensitive data to reduce the risk from oversharing [2]
        • implemented by defining and applying DLP policies [2]
      • {benefit} helps to protect sensitive information with policies that automatically detect, monitor, and control the sharing or movement of sensitive data [1]
        • administrators can customize rules to block, restrict, or alert when sensitive data is transferred to prevent accidental or malicious data leaks [1]
      • {concept} DLP policies
        • allow to monitor the activities users take on sensitive items and then take protective actions [2]
          • applies to sensitive items 
            • at rest
            • in transit [2]
            • in use [2]
          • created and maintained in the Microsoft Purview portal [2]
        • {scope} only supported for Power BI semantic models [1]
        • {action} show a pop-up policy tip to the user that warns that they might be trying to share a sensitive item inappropriately [2]
        • {action} block the sharing and, via a policy tip, allow the user to override the block and capture the users' justification [2]
        • {action} block the sharing without the override option [2]
        • {action} [data at rest] sensitive items can be locked and moved to a secure quarantine location [2]
        • {action} sensitive information won't be displayed 
          • e.g. Teams chat
      • DLP reports
        • provides data from monitoring policy matches and actions, to user activities [2]
          • used as basis for tuning policies and triage actions taken on sensitive items [2]
        • telemetry uses M365 audit logs and processes the data for the different reporting tools [2]
          • M365 provides with visibility into risky user activities [2]
          • scans the audit logs for risky activities and runs them through a correlation engine to find activities that are occurring at a high volume [1]
            • no DLP policies are required [2]
      • {feature} detects sensitive items by using deep content analysis [2]
        • ⇐ not by just a simple text scan [2]
        • based on
          • keywords matching [2]
          • evaluation of regular expressions [2] 
          • internal function validation [2]
          • secondary data matches that are in proximity to the primary data match [2]
          • ML algorithms and other methods to detect content that matches DLP policies
        • all DLP monitored activities are recorded to the Microsoft 365 Audit log [2]
      • DLP lifecycle
        • {phase} plan for DLP
          • train and acclimate users to DLP practices on well-planned and tuned policies [2]
          • {recommendation} use policy tips to raise awareness with users before changing the policy status from simulation mode to more restrictive modes [2]
        • {phase} prepare for DLP
        • {phase} deploy policies in production
          • {action} define control objectives, and how they apply across workloads [2]
          • {action} draft a policy that embodies the objectives
          • {action} start with one workload at a time, or across all workloads - there's no impact yet
          • {feature} implement policies in simulation mode
            • {benefit} allows to evaluate the impact of controls
              • the actions defined in a policy aren't applied yet
            • {benefit} allows to monitor the outcomes of the policy and fine-tune it so that it meets the control objectives while ensuring it doesn't adversely or inadvertently impact valid user workflows and productivity [2]
              • e.g. adjusting the locations and people/places that are in or out of scope
              • e.g. tune the conditions that are used to determine if an item and what is being done with it matches the policy
              • e.g. the sensitive information definition/s
              • e.g. add new controls
              • e.g. add new people
              • e.g. add new restricted apps
              • e.g. add new restricted sites
            • {step} enable the control and tune policies [2]
              • policies take effect about an hour after being turned on [2]
          • {action} create DLP policy 
          • {action} deploy DLP policy 
      • DLP alerts 
        • alerts generated when a user performs an action that meets the criteria of a DLP policy [2]
          • there are incident reports configured to generate alerts [2]
          • {limitation} available in the alerts dashboard for 30 days [2]
        • DLP posts the alert for investigation in the DLP Alerts dashboard
        • {tool} DLP Alerts dashboard 
          • allows to view alerts, triage them, set investigation status, and track resolution
            • routed to Microsoft Defender portal 
            • {limitation} available for six months [2]
          • {constraint} administrative unit restricted admins see the DLP alerts for their administrative unit only [2]
      • {concept} egress activities (aka exfiltration)
        • {def} actions related to exiting or leaving a space, system or network [2]
      • {concept}[Microsoft Fabric] policy
        • when a DLP policy detects a supported item type containing sensitive information, the actions configured in the policy are triggered [3]
        • {feature} Activity explorer
          • allows to view Data from DLP for Fabric and Power BI
          • for accessing the data, user's account must be a member of any of the following roles or higher [3]
            • Compliance administrator
            • Security administrator
            • Compliance data administrator
            • Global Administrator 
              • {warning} a highly privileged role that should only be used in scenarios where a lesser privileged role can't be used [3]
            • {recommendation} use a role with the fewest permissions [3]
        • {warning} DLP evaluation workloads impact capacity consumption [3]
        • {action} define policy
          • in the data loss prevention section of the Microsoft Purview portal [3]
          • allows to specify 
            •  conditions 
              • e.g. sensitivity labels
            •  sensitive info types that should be detected [3]
          • [semantic model] evaluated against DLP policies 
            • whenever one of the following events occurs:
              • publish
              • republish
              • on-demand refresh
              • scheduled refresh
            •  the evaluation  doesn't occur if either of the following is true
              • the initiator of the event is an account using service principal authentication [3]
              • the semantic model owner is a service principal [3]
          • [lakehouse] evaluated against DLP policies when the data within a lakehouse undergoes a change
            • e.g. getting new data, connecting a new source, adding or updating existing tables, etc. [3]

      References:
      [1] Microsoft Learn (2025) Learn about data loss prevention [link]
      [2] Microsoft Learn (2024) Purview: Learn about data loss prevention [link]
      [3] Microsoft Learn (2025) Get started with Data loss prevention policies for Fabric and Power BI [link]

      Resources:
      [R1] Microsoft Fabric Updates Blog (2024) Secure Your Data from Day One: Best Practices for Success with Purview Data Loss Prevention (DLP) Policies in Microsoft Fabric [link]
      [R2] 

      Acronyms:
      DLP - Data Loss Prevention
      M365 - Microsoft 365

      26 April 2025

      🏭🗒️Microsoft Fabric: Parameters in Dataflows Gen2 [Notes] 🆕

      Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

      Last updated: 26-Apr-2025

      [Microsoft Fabric] Dataflow Gen2 Parameters

      • {def} parameters that allow to dynamically control and customize Dataflows Gen2
        • makes them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [1]
        • the dataflow is refreshed by passing parameter values outside of the Power Query editor through either
          • Fabric REST API [1]
          • native Fabric experiences [1]
        • parameter names are case sensitive [1]
        • {type} required parameters
          • {warning} the refresh fails if no value is passed for it [1]
        • {type} optional parameters
        • enabled via Parameters >> Enable parameters to be discovered and override for execution [1]
      • {limitation} dataflows with parameters can't be
        • scheduled for refresh through the Fabric scheduler [1]
        • manually triggered through the Fabric Workspace list or lineage view [1]
      • {limitation} parameters that affect the resource path of a data source or a destination are not supported [1]
        • ⇐ connections are linked to the exact data source path defined in the authored dataflow
          • can't currently be overridden to use other connections or resource paths [1]
      • {limitation} can't be leveraged by dataflows with incremental refresh [1]
      • {limitation} only parameters of type decimal number, whole number, text, and true/false can be passed for override
        • any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [1]
      • {warning} allow other users who have permissions to the dataflow to refresh the data with other values [1]
      • {limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [1]
      • {limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [1]
      • {limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [1]
      • {limitation} only the first request will be accepted from duplicated requests for the same parameter values [1]
        • subsequent requests are rejected until the first request finishes its evaluation [1]

      References:
      [1] Microsoft Learn (2025) Use public parameters in Dataflow Gen2 (Preview) [link]

      Resources:
      [R1] Microsoft Fabric Blog (2025) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

      Acronyms:
      API - Application Programming Interface
      REST - Representational State Transfer

      🏭🗒️Microsoft Fabric: Deployment Pipelines [Notes]

      Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

      Last updated: 26-Apr-2025

      [Microsoft Fabric] Deployment Pipelines

      • {def} a structured process that enables content creators to manage the lifecycle of their organizational assets [5]
        • enable creators to develop and test content in the service before it reaches the users [5]
          • can simplify the deployment process to development, test, and production workspaces [5]
          • one Premium workspace is assigned to each stage [5]
          • each stage can have 
            • different configurations [5]
            • different databases or different query parameters [5]
      • {action} create pipeline
        • from the deployment pipelines entry point in Fabric [5]
          • creating a pipeline from a workspace automatically assigns it to the pipeline [5]
        • {action} define how many stages it should have and what they should be called [5]
          • {default} has three stages
            • e.g. Development, Test, and Production
            • the number of stages can be set anywhere between 2 and 10
            • {action} add another stage
            • {action} delete stage
            • {action} rename stage 
              • by typing a new name in the box
            • {action} share a pipeline with others
              • users receive access to the pipeline and become pipeline admins [5]
            • ⇐ the number of stages is permanent [5]
              • can't be changed after the pipeline is created [5]
        • {action} add content to the pipeline [5]
          • done by assigning a workspace to the pipeline stage [5]
            • the workspace can be assigned to any stage [5]
        • {action|optional} make a stage public
          • {default} the final stage of the pipeline is made public
          • a consumer of a public stage without access to the pipeline sees it as a regular workspace [5]
            • without the stage name and deployment pipeline icon on the workspace page next to the workspace name [5]
        • {action} deploy to an empty stage
          • when finishing the work in one pipeline stage, the content can be deployed to the next stage [5] 
            • deployment can happen in any direction [5]
          • {option} full deployment 
            • deploy all content to the target stage [5]
          • {option} selective deployment 
            • allows select the content to deploy to the target stage [5]
          • {option} backward deployment 
            • deploy content from a later stage to an earlier stage in the pipeline [5] 
            • {restriction} only possible when the target stage is empty [5]
        • {action} deploy content between stages [5]
          • content can be deployed even if the next stage has content
            • paired items are overwritten [5]
        • {action|optional} create deployment rules
          • when deploying content between pipeline stages, allow changes to content while keeping some settings intact [5] 
          • once a rule is defined or changed, the content must be redeployed
            • the deployed content inherits the value defined in the deployment rule [5]
            • the value always applies as long as the rule is unchanged and valid [5]
        • {feature} deployment history 
          • allows to see the last time content was deployed to each stage [5]
          • allows to track time between deployments [5]
      • {concept} pairing
        • {def} the process by which an item in one stage of the deployment pipeline is associated with the same item in the adjacent stage
          • applies to reports, dashboards, semantic models
          • paired items appear on the same line in the pipeline content list [5]
            • ⇐ items that aren't paired, appear on a line by themselves [5]
          • the items remain paired even if their name changes
          • items added after the workspace is assigned to a pipeline aren't automatically paired [5]
            • ⇐ one can have identical items in adjacent workspaces that aren't paired [5]
      • [lakehouse]
        • can be removed as a dependent object upon deployment [3]
        • supports mapping different Lakehouses within the deployment pipeline context [3]
        • {default} a new empty Lakehouse object with same name is created in the target workspace [3]
          • ⇐ if nothing is specified during deployment pipeline configuration
          • notebook and Spark job definitions are remapped to reference the new lakehouse object in the new workspace [3]
          • {warning} a new empty Lakehouse object with same name still is created in the target workspace [3]
          • SQL Analytics endpoints and semantic models are provisioned
          • no object inside the Lakehouse is overwritten [3]
          • updates to Lakehouse name can be synchronized across workspaces in a deployment pipeline context [3] 
      • [notebook] deployment rules can be used to customize the behavior of notebooks when deployed [4]
        • e.g. change notebook's default lakehouse [4]
        • {feature} auto-binding
          • binds the default lakehouse and attached environment within the same workspace when deploying to next stage [4]
      • [environment] custom pool is not supported in deployment pipeline
        • the configurations of Compute section in the destination environment are set with default values [6]
        • ⇐ subject to change in upcoming releases [6]
      • [warehouse]
        • [database project] ALTER TABLE to add a constraint or column
          • {limitation} the table will be dropped and recreated when deploying, resulting in data loss
        • {recommendation} do not create a Dataflow Gen2 with an output destination to the warehouse
          • ⇐ deployment would be blocked by a new item named DataflowsStagingWarehouse that appears in the deployment pipeline [10]
        • SQL analytics endpoint is not supported
      • [Eventhouse]
        • {limitation} the connection must be reconfigured in destination that use Direct Ingestion mode [8]
      • [EventStream]
        • {limitation} limited support for cross-workspace scenarios
          • {recommendation} make sure all EventStream destinations within the same workspace [8]
      • [KQL database]
        • applies to tables, functions, materialized views [7]
      • [KQL queryset]
        • applies to tabs, data sources [7]
      • [real-time dashboard]
        • data sources, parameters, base queries, tiles [7]
      • [SQL database]
        • includes the specific differences between the individual database objects in the development and test workspaces [9]
      • can be also used with

        References:
        [1] Microsoft Learn (2024) Get started with deployment pipelines [link]
        [2] Microsoft Learn (2024) Implement continuous integration and continuous delivery (CI/CD) in Microsoft Fabric [link]
        [3] Microsoft Learn (2024)  Lakehouse deployment pipelines and git integration (Preview) [link]
        [4] Microsoft Learn (2024) Notebook source control and deployment [link]
        [5] Microsoft Learn (2024) Introduction to deployment pipelines [link]
        [6] Environment Git integration and deployment pipeline [link]
        [7] Microsoft Learn (2024) Real-Time Intelligence: Git integration and deployment pipelines (Preview) [link]
        [8] Microsoft Learn (2024) Eventstream CI/CD - Git Integration and Deployment Pipeline [link]
        [9] Microsoft Learn (2024) Get started with deployment pipelines integration with SQL database in Microsoft Fabric [link]
        [10] Microsoft Learn (2025) Source control with Warehouse (preview) [link]

        Resources:

        Acronyms:
        CLM - Content Lifecycle Management
        UAT - User Acceptance Testing

        🏭🗒️Microsoft Fabric: Power BI Environments [Notes]

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 26-Apr-2025

        Enterprise Content Publishing [2]

        [Microsoft Fabric] Power BI Environments

        • {def} structured spaces within Microsoft Fabric that help organizations manage Power BI assets through the entire lifecycle
        • {environment} development 
          • allows to develop the solution
          • accessible only to the development team 
            • via Contributor access
          • {recommendation} use Power BI Desktop as local development environment
            • {benefit} allows to try, explore, and review updates to reports and datasets
              • once the work is done, upload the new version to the development stage
            • {benefit} enables collaborating and changing dashboards
            • {benefit} avoids duplication 
              • making online changes, downloading the .pbix file, and then uploading it again, creates reports and datasets duplication
          • {recommendation} use version control to keep the .pbix files up to date
            • [OneDrive] use Power BI's autosync
              • {alternative} SharePoint Online with folder synchronization
              • {alternative} GitHub and/or VSTS with local repository & folder synchronization
          • [enterprise scale deployments] 
            • {recommendation} separate dataset from reports and dashboards’ development
              • use the deployment pipelines selective deploy option [22]
              • create separate .pbix files for datasets and reports [22]
                • create a dataset .pbix file and upload it to the development stage (see shared datasets) [22]
                • create .pbix only for the report, and connect it to the published dataset using a live connection [22]
              • {benefit} allows different creators to separately work on modeling and visualizations, and deploy them to production independently
            • {recommendation} separate data model from report and dashboard development
              • allows using advanced capabilities 
                • e.g. source control, merging diff changes, automated processes
              • separate the development from test data sources [1]
                • the development database should be relatively small [1]
          • {recommendation} use only a subset of the data [1]
            • ⇐ otherwise the data volume can slow down the development [1]
        • {environment} user acceptance testing (UAT)
          • test environment that sits between development and production within the deployment lifecycle
            • it's not necessary for all Power BI solutions [3]
            • allows to test the solution before deploying it into production
              • all testers must have 
                • View access for testing
                • Contributor access for report authoring
            • involves business users who are SMEs
              • provide approval that the content 
                • is accurate
                • meets requirements
                • can be deployed for wider consumption
          • {recommendation} check report’s load and the interactions to find out if changes impact performance [1]
          • {recommendation} monitor the load on the capacity to catch extreme loads before they reach production [1]
          • {recommendation} test data refresh in the Power BI service regularly during development [20]
        • {environment} production
          • {concept} staged deployment
            • {goal} help minimize risk, user disruption, or address other concerns [3]
              • the deployment involves a smaller group of pilot users who provide feedback [3]
          • {recommendation} set production deployment rules for data sources and parameters defined in the dataset [1]
            • allows ensuring the data in production is always connected and available to users [1]
          • {recommendation} don’t upload a new .pbix version directly to the production stage
            •  ⇐ without going through testing
        • {feature|preview} deployment pipelines 
          • enable creators to develop and test content in the service before it reaches the users [5]
        • {recommendation} build separate databases for development and testing 
          • helps protect production data [1]
        • {recommendation} make sure that the test and production environment have similar characteristics [1]
          • e.g. data volume, usage volume, similar capacity 
          • {warning} testing in production can make production unstable [1]
          • {recommendation} use Azure A capacities [22]
        • {recommendation} for formal projects, consider creating an environment for each phase
        • {recommendation} enable users to connect to published datasets to create their own reports
        • {recommendation} use parameters to store connection details 
          • e.g. instance names, database names
          • ⇐  deployment pipelines allow configuring parameter rules to set specific values for the development, test, and production stages
            • alternatively data source rules can be used to specify a connection string for a given dataset
              • {restriction} in deployment pipelines, this isn't supported for all data sources
        • {recommendation} keep the data in blob storage under the 50k blobs and 5GB data in total to prevent timeouts [29]
        • {recommendation} provide data to self-service authors from a centralized data warehouse [20]
          • allows to minimize the amount of work that self-service authors need to take on [20]
        • {recommendation} minimize the use of Excel, csv, and text files as sources when practical [20]
        • {recommendation} store source files in a central location accessible by all coauthors of the Power BI solution [20]
        • {recommendation} be aware of API connectivity issues and limits [20]
        • {recommendation} know how to support SaaS solutions from AppSource and expect further data integration requests [20]
        • {recommendation} minimize the query load on source systems [20]
          • use incremental refresh in Power BI for the dataset(s)
          • use a Power BI dataflow that extracts the data from the source on a schedule
          • reduce the dataset size by only extracting the needed amount of data 
        • {recommendation} expect data refresh operations to take some time [20]
        • {recommendation} use relational database sources when practical [20]
        • {recommendation} make the data easily accessible [20]
        • [knowledge area] knowledge transfer
          • {recommendation} maintain a list of best practices and review it regularly [24]
          • {recommendation} develop a training plan for the various types of users [24]
            • usability training for read-only report/app users [24]
            • self-service reporting for report authors & data analysts [24]
            • more elaborated training for advanced analysts & developers [24]
        • [knowledge area] lifecycle management
          • consists of the processes and practices used to handle content from its creation to its eventual retirement [6]
          • {recommendation} suffix file names with a 3-part version number in the Development stage [24]
            • remove the version number when publishing files to UAT and production (see the sketch after this list)
          • {recommendation} back up files for archiving 
          • {recommendation} track version history 
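
          A minimal sketch of the versioning convention above (the exact naming pattern is an assumption; adjust the regular expression to the team's convention), stripping the 3-part version suffix from a development file name before publishing:

          import re

          def publish_name(dev_file_name: str) -> str:
              """Remove a trailing ' v<major>.<minor>.<patch>' suffix from a .pbix file name."""
              return re.sub(r"\s+v\d+\.\d+\.\d+(?=\.pbix$)", "", dev_file_name, flags=re.IGNORECASE)

          print(publish_name("Sales Report v1.2.3.pbix"))   # -> "Sales Report.pbix"
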

          References:
          [1] Microsoft Learn (2021) Fabric: Deployment pipelines best practices [link]
          [2] Microsoft Learn (2024) Power BI: Power BI usage scenarios: Enterprise content publishing [link]
          [3] Microsoft Learn (2024) Deploy to Power BI [link]
          [4] Microsoft Learn (2024) Power BI implementation planning: Content lifecycle management [link]
          [5] Microsoft Learn (2024) Introduction to deployment pipelines [link]
          [6] Microsoft Learn (2024) Power BI implementation planning: Content lifecycle management [link]
          [20] Microsoft (2020) Planning a Power BI Enterprise Deployment [White paper] [link]
          [22] Microsoft Learn (2021) Create Power BI Embedded capacity in the Azure portal [link]
          [24] Paul Turley (2019) A Best Practice Guide and Checklist for Power BI Projects

          Resources:

          Acronyms:
          API - Application Programming Interface
          CLM - Content Lifecycle Management
          COE - Center of Excellence
          SaaS - Software-as-a-Service
          SME - Subject Matter Expert
          UAT - User Acceptance Testing
          VSTS - Visual Studio Team System

          25 April 2025

          🏭🗒️Microsoft Fabric: Dataflows Gen2's Incremental Refresh [Notes] 🆕

          Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

          Last updated: 25-Apr-2025

          [Microsoft Fabric] Incremental Refresh in Dataflows Gen2

          • {feature} enables you to incrementally extract data from data sources, apply Power Query transformations, and load it into various output destinations [5]
            • designed to reduce the amount of data that needs to be processed and retrieved from the source system [8]
            • configurable directly in the dataflow editor [8]
            • doesn't need to specify the historical data range [8]
              • ⇐ the dataflow doesn't remove any data from the destination that's outside the bucket range [8]
            • doesn't need to specify the parameters for the incremental refresh [8]
              • the filters and parameters are automatically added as the last step in the query [8]
          • {prerequisite} the data source 
            • supports folding [8]
            • needs to contain a Date/DateTime column that can be used to filter the data [8]
          • {prerequisite} the data destination supports incremental refresh [8]
            • available destinations
              • Fabric Warehouse
              • Azure SQL Database
              • Azure Synapse Analytics
              • Fabric Lakehouse [preview]
            • other destinations can be used in combination with incremental refresh by using a second query that references the staged data to update the data destination [8]
              • allows using incremental refresh to reduce the amount of data that needs to be processed and retrieved from the source system [8]
                • a full refresh from the staged data to the data destination is still needed [8]
          • works by dividing the data into buckets based on a DateTime column [8] (see the sketch below)
            • each bucket contains the data that changed since the last refresh [8]
              • the dataflow knows what changed by checking the maximum value in the specified column 
                • if the maximum value changed for that bucket, the dataflow retrieves the whole bucket and replaces the data in the destination [8]
                • if the maximum value didn't change, the dataflow doesn't retrieve any data [8]
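
          A simplified illustration of the bucket logic described above (not the actual engine; the monthly buckets and the data layout are assumptions): the refresh window is split into buckets, each bucket's current maximum of the DateTime column is compared with the value remembered from the previous run, and only buckets whose maximum changed are reloaded:

          from datetime import date

          def bucket_key(d: date) -> str:
              return f"{d.year}-{d.month:02d}"                 # monthly buckets (assumption)

          def buckets_to_reload(rows, previous_max_by_bucket):
              """rows: iterable of (modified_on, payload); returns the bucket keys to reload."""
              current_max = {}
              for modified_on, _ in rows:
                  key = bucket_key(modified_on)
                  current_max[key] = max(current_max.get(key, date.min), modified_on)
              return [
                  key for key, max_value in current_max.items()
                  if previous_max_by_bucket.get(key) != max_value   # max changed -> replace the whole bucket
              ]

          rows = [(date(2025, 3, 5), "a"), (date(2025, 3, 20), "b"), (date(2025, 4, 2), "c")]
          print(buckets_to_reload(rows, {"2025-03": date(2025, 3, 20)}))   # -> ['2025-04']
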
          • {limitation} the data destination must be set to a fixed schema [8]
            • ⇒ the table's schema in the data destination must be fixed and can't change [8]
              • ⇒ dynamic schema must be changed to fixed schema before configuring incremental refresh [8]
          • {limitation} the only supported update method in the data destination: replace
            • ⇒ the dataflow replaces the data for each bucket in the data destination with the new data [8]
              • data that is outside the bucket range isn't affected [8]
          • {limitation} maximum number of buckets (see the worked example below)
            • single query: 50
              • {workaround} increase the bucket size or reduce the bucket range to lower the number of buckets [8]
            • whole dataflow: 150
              • {workaround} reduce the number of incremental refresh queries or increase the bucket size [8]
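
          A small worked example (approximating the bucket count as the refresh range divided by the bucket size, which is a simplification) of why a wide range with small buckets can exceed the 50-buckets-per-query limit:

          refresh_range_days = 365

          print(refresh_range_days // 1)     # daily buckets    -> 365, well above the 50-per-query limit
          print(refresh_range_days // 30)    # ~monthly buckets -> 12, within the limit
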
          • {downside} the dataflow may take longer to refresh after enabling incremental refresh [8]
            • because the additional overhead of checking if data changed and processing the buckets is higher than the time saved by processing less data [8]
            • {recommendation} review the settings for incremental refresh and adjust them to better fit the scenario
              • {option} increase the bucket size, reducing the number of buckets and the overhead of processing them [8]
              • {option} disable incremental refresh [8]
          • {recommendation} don't use the column that detects changes also for filtering the data [8]
            • because this can lead to unexpected results [8]
          • {setting} limit the number of concurrent evaluations
            • setting the value to a lower number reduces the number of requests sent to the source system [8]
            • via global settings >> Scale tab >> maximum number of parallel query evaluations
            • {recommendation} don't enable this limit unless there are issues with the source system [8]

          References:
          [5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
          [8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]

          Resources:

