
24 May 2025

🏭🗒️Microsoft Fabric: Materialized Lake Views (MLV) [Notes] 🆕🗓️

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 24-May-2025

-- create schema
CREATE SCHEMA IF NOT EXISTS <lakehouse_name>.<schema_name>

-- create a materialized view
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS <lakehouse_name>.<schema_name>.<view_name> 
(
    CONSTRAINT <constraint_name> CHECK (<constraint>) ON MISMATCH DROP 
) 
AS 
SELECT ...
FROM ...
-- WHERE ...
--GROUP BY ...

[Microsoft Fabric] Materialized Lake Views (MLV)

  • {def} persisted, continuously updated view of data [1]
    • {benefit} allows to build declarative data pipelines using SQL, complete with built-in data quality rules and automatic monitoring of data transformations
      • simplifies the implementation of multi-stage Lakehouse processing [1]
        • streamline data workflows
        • enable developers to focus on business logic [1]
          • ⇐ not on infrastructural or data quality-related issues [1]
        • the views can be created in a notebook [2]
      • can have data quality constraints enforced and visualized for every run, showing completion status and conformance to data quality constraints defined in a single view [1]
      • empowers developers to set up complex data pipelines with just a few SQL statements and have the rest handled automatically [1]
        • faster development cycles 
        • trustworthy data
        • quicker insights
  • {goal} process only the new or changed data instead of reprocessing everything each time [1]
    • ⇐  leverages Delta Lake’s CDF under the hood
      • ⇒ it can update just the portions of data that changed rather than recompute the whole view from scratch [1]
  • {operation} creation
    • allows defining transformations at each layer [1]
      • e.g. aggregation, projection, filters
    • allows specifying certain checks that the data must meet [1]
      • incorporate data quality constraints directly into the pipeline definition
    • via CREATE MATERIALIZED LAKE VIEW (see the example after this list)
      • the SQL syntax is declarative and Fabric figures out how to produce and maintain it [1]
  • {operation} refresh
    • refreshes only when its source has new data [1]
      • if there’s no change, it can skip running entirely (saving time and resources) [1]
  • {feature} automatically generate a visual report that shows trends on data quality constraints 
    • {benefit} allows to easily identify the checks that introduce maximum errors and the associated MLVs for easy troubleshooting [1]
  • {feature} can be combined with Shortcut Transformation feature for CSV ingestion 
    • {benefit} allows building an end-to-end Medallion architecture
  • {feature} dependency graph
    • allows to see the dependencies existing between the various objects [2]
      • ⇐ automatically generated [2]
  • {feature} data quality report
    • built-in Power BI dashboard that shows several aggregated metrics [2]
  • {feature|planned} support for PySpark
  • {feature|planned} incremental refresh
  • {feature|planned} integration with Data Activator
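
A minimal worked example of the syntax above, using hypothetical lakehouse, schema, table, and column names (sales, bronze, silver, customers); the constraint drops rows without an email value:

-- create the target schema (hypothetical names)
CREATE SCHEMA IF NOT EXISTS sales.silver;

-- bronze-to-silver materialized lake view with a data quality constraint
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS sales.silver.customers_clean
(
    CONSTRAINT has_email CHECK (email IS NOT NULL) ON MISMATCH DROP
)
AS
SELECT customer_id
    , TRIM(customer_name) AS customer_name
    , LOWER(email) AS email
FROM sales.bronze.customers;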

References:
[1] Microsoft Fabric Update Blog (2025) Simplifying Medallion Implementation with Materialized Lake Views in Fabric [link|aka]
[2] Power BI Tips (2025) Microsoft Fabric Notebooks with Materialized Views - Quick Tips [link]
[3] Microsoft Learn (2025)  [link]

Resources:
[R1] Databricks (2025) Use materialized views in Databricks SQL [link]

Acronyms:
CDF - Change Data Feed
ETL - Extract, Transform, Load
MF - Microsoft Fabric
MLV - Materialized Lake Views

23 May 2025

🏭🗒️Microsoft Fabric: Warehouse Snapshots [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 23-May-2025

[Microsoft Fabric] Warehouse Snapshots

  • {def} read-only representation of a warehouse at a specific point in time [1]
  • supports analytics, reporting, and historical analysis scenarios without concern for the volatility of live data updates [1]
    • provides a consistent and stable view of data [1]
    • ensures that analytical workloads remain unaffected by ongoing changes or ETL operations [1]
  • {benefit} guarantees data consistency
    • the dataset remains unaffected by ongoing ETL processes [1]
  • {benefit} immediate roll-forward updates
    • can be seamlessly rolled forward on demand to reflect the latest state of the warehouse
      • ⇒ {benefit} consumers access the same snapshot using a consistent connection string, even from third-party tools [1]
      • ⇐ updates are applied immediately, as if in a single, atomic transaction [1]
  • {benefit} facilitates historical analysis
    • snapshots can be created on an hourly, daily, or weekly basis to suit their business requirements [1]
  • {benefit} enhanced reporting
    • provides a point-in-time reliable dataset for precise reporting [1]
      • ⇐ free from disruptions caused by data modifications [1]
  • {benefit} doesn't require separate storage [1]
    • relies on source Warehouse [1]
  • {limitation} doesn't support database objects 
  • {limitation} captures a state only within the last 30 days
  • {operation} create snapshot
    • via New warehouse snapshot
    • multiple snapshots can be created for the same parent warehouse [1]
      • appear as child items of the parent warehouse in the workspace view [1]
      • queries run against a snapshot provide the current version of the data being accessed [1]
  • {operation} read properties 
    • via GET https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{warehousesnapshotId}
      Authorization: Bearer <bearer token>
  • {operation} update snapshot timestamp (see the consolidated sketch after this list)
    • allows users to roll forward data instantly, ensuring consistency [1]
      • use current state
        • via ALTER DATABASE [<snapshot name>] SET TIMESTAMP = CURRENT_TIMESTAMP; 
      • use point in time
        • via ALTER DATABASE [<snapshot name>] SET TIMESTAMP = 'YYYY-MM-DDTHH:MM:SS.SS'; -- UTC time
    • queries that are in progress during point in time update will complete against the version of data they were started against [1]
  • {operation} rename snapshot
  • {operation} delete snapshot
    • via DELETE
    • when the parent warehouse gets deleted, the snapshot is also deleted [1]
  • {operation} modify source table
    • DDL changes to the source only impact snapshot queries against the affected tables [1]
  • {operation} join multiple snapshots
    • the resulting snapshot date will be applied to each warehouse connection [1]
  • {operation} retrieve metadata
    • via sys.databases [1]
  • [permissions] inherited from the source warehouse [1]
    • ⇐ any permission changes in the source warehouse applies instantly to the snapshot [1]
    • security updates on source database will be rendered immediately to the snapshot databases [1]
  • {limitation} can only be created against new warehouses [1]
    • created after Mar-2025
  • {limitation} do not appear in SSMS Object Explorer but will show up in the database selection dropdown [1]
  • {limitation} datetime can be set to any date in the past up to 30 days or database creation time (whichever is later)  [1]
  • {limitation} modified objects after the snapshot timestamp become invalid in the snapshot [1]
    • applies to tables, views, and stored procedures [1]
  • {limitation} must be recreated if the data warehouse is restored [1]
  • {limitation} aren’t supported on the SQL analytics endpoint of the Lakehouse [1]
  • {limitation} aren’t supported as a source for OneLake shortcuts [1]
  •  [Power BI]{limitation} require Direct Query or Import mode [1]
    • don’t support Direct Lake
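
  A consolidated sketch of the snapshot operations above; the snapshot name and timestamp are placeholders:

  -- roll the snapshot forward to the current state of the parent warehouse
  ALTER DATABASE [<snapshot name>] SET TIMESTAMP = CURRENT_TIMESTAMP;

  -- or pin the snapshot to a point in time (UTC, within the last 30 days)
  ALTER DATABASE [<snapshot name>] SET TIMESTAMP = '2025-05-20T08:00:00.00';

  -- list the databases visible to the connection, incl. snapshots
  SELECT name, create_date
  FROM sys.databases;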

    References:
    [1] Microsoft Learn (2025) Fabric: Warehouse Snapshots in Microsoft Fabric (Preview) [link]
    [2] Microsoft Learn (2025) Warehouse snapshots (preview) [link]
    [3] Microsoft Learn (2025) Create and manage a warehouse snapshot (preview) [link]

    Resources:


    Acronyms:
    DDL - Data Definition Language
    ETL - Extract, Transform, Load
    MF - Microsoft Fabric
    SSMS - SQL Server Management Studio

    29 April 2025

    🏭🗒️Microsoft Fabric: Purview [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 29-Apr-2025

    [Microsoft Purview] Purview
    • {def} comprehensive data governance and security platform designed to help organizations manage, protect, and govern their data across various environments [1]
      • incl. on-premises, cloud & SaaS applications [1]
      • provides the highest and most flexible level of functionality for data governance in MF [1]
        • offers comprehensive tools for 
          • data discovery
          • data classification
          • data cataloging
    • {capability} managing the data estate
      • {tool} dedicated portal
        • aka Fabric Admin portal
        • used to control tenant settings, capacities, domains, and other objects, typically reserved for administrators
      • {type} logical containers
        • used to control access to data and capabilities [1]
        • {level} tenants
          • settings for Fabric administrators [1]
        • {level} domains
          • group data that is relevant to a single business area or subject field [1]
        • {level} workspaces 
          • group Fabric items used by a single team or department [1]
      • {type} capacities
        • objects that limit compute resource usage for all Fabric workloads [1]
    • {capability} metadata scanning
      • extracts values from data lakes
        • e.g. names, identities, sensitivities, endorsements, etc. 
        • can be used to analyze and set governance policies [1]
    • {capability} secure and protect data
      • assure that data is protected against unauthorized access and destructive attacks [1]
      • compliant with data storage regulations applicable in your region [1]
      • {tool} data tags
        • allows to identify the sensitivity of data and apply data retention and protection policies [1]
      • {tool} workspace roles
        • define the users who are authorized to access the data in a workspace [1]
      • {tool} data-level controls
        • used at the level of Fabric items
          • e.g. tables, rows, and columns to impose granular restrictions.
      • {tool} certifications
        • Fabric is compliant with many data management certifications
          • incl. HIPAA BAA, ISO/IEC 27017, ISO/IEC 27018, ISO/IEC 27001, ISO/IEC 27701 [1]
    • {feature} OneLake data hub
      • allows users to find and explore the data in their estate.
    • {feature} endorsement
      • allows users to endorse a Fabric item to identify it as being of high quality [1]
        • help other users to trust the data that the item contains [1]
    • {feature} data lineage
      • allows users to understand the flow of data between items in a workspace and the impact that a change would have [1]
    • {feature} monitoring hub
      • allows to monitor activities for the Fabric items for which the user has the permission to view [1]
    • {feature} capacity metrics
      • app used to monitor usage and consumption
    • {feature} allows to automate the identification of sensitive information and provides a centralized repository for metadata [1]
    • {feature} allows to find, manage, and govern data across various environments
      • incl. both on-premises and cloud-based systems [1]
      • supports compliance and risk management with features that monitor regulatory adherence and assess data vulnerabilities [1]
    • {feature} integrated with other Microsoft services and third-party tools 
      • {benefit} enhances its utility
      • {benefit} streamlines data access controls
        • enforcing policies, and delivering insights into data lineage [1]
    • {benefit} helps organizations maintain data integrity, comply with regulations, and use their data effectively for strategic decision-making [1]
    • {feature} Data Catalog
      • {benefit} allows users to discover, understand, and manage their organization's data assets
        • search for and browse datasets
        • view metadata
        • gain insights into the data’s lineage, classification, and sensitivity labels [1]
      • {benefit} promotes collaboration
        • users can annotate datasets with tags to improve discoverability and data governance [1]
      • targets users and administrators
      • {benefit} allows to discover where patient records are held by searching for keywords [1]
      • {benefit} allows to label documents and items based on their sensitiveness [1]
      • {benefit} allows to use access policies to manage self-service access requests [1]
    • {feature} Information Protection
      • used to classify, label, and protect sensitive data throughout the organization [1]
        • by applying customizable sensitivity labels, users classify records. [1]
        • {concept} policies
          • define access controls and enforce encryption
          • labels follow the data wherever it goes
          • helps organizations meet compliance requirements while safeguarding data against accidental exposure or malicious threats [1]
      • allows to protect records with policies to encrypt data and impose IRM
    • {feature} Data Loss Prevention (DLP)
      • the practice of protecting sensitive data to reduce the risk from oversharing [2]
        • implemented by defining and applying DLP policies [2]
    • {feature} Audit
      • user activities are automatically logged and appear in the Purview audit log
        • e.g. creating files or accessing Fabric items
    • {feature} connect Purview to Fabric in a different tenant
      • all functionality is supported, except that 
        • {limitation} Purview's live view isn't available for Fabric items [1]
        • {limitation} the system can't identify user registration automatically [1]
        • {limitation} managed identity can’t be used for authentication in cross-tenant connections [1]
          • {workaround} use a service principal or delegated authentication [1]
    • {feature} Purview hub
      • displays reports and insights about Fabric items [1]
        • acts as a centralized location to begin data governance and access more advanced features [1]
        • via Settings >> Microsoft Purview hub
        • administrators see information about their entire organization's Fabric data estate
        • provides information about
          • Data Catalog
          • Information Protection
          • Audit
      • the data section displays tables and graphs that analyze the entire organization's items in MF
        • users only see information about their own Fabric items and data

    References:
    [1] Microsoft Learn (2024) Purview: Govern data in Microsoft Fabric with Purview[link]
    [2] Microsoft Learn (2024) Purview: Learn about data loss prevention [link]
    [3] Microsoft Learn (2024) [link]

    Resources:

    Acronyms:
    DLP - Data Loss Prevention
    M365 - Microsoft 365
    MF - Microsoft Fabric
    SaaS - Software-as-a-Service

    🏭🗒️Microsoft Fabric: Data Loss Prevention (DLP) in Purview [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 10-Jun-2025

    [Microsoft Purview] Data Loss Prevention (DLP)
    • {def} the practice of protecting sensitive data to reduce the risk from oversharing [2]
      • implemented by defining and applying DLP policies [2]
    • {benefit} helps to protect sensitive information with policies that automatically detect, monitor, and control the sharing or movement of sensitive data [1]
      • administrators can customize rules to block, restrict, or alert when sensitive data is transferred to prevent accidental or malicious data leaks [1]
    • {concept} DLP policies
      • allow to monitor the activities users take on sensitive items and then take protective actions [2]
        • applies to sensitive items 
          • at rest
          • in transit [2]
          • in use [2]
        • created and maintained in the Microsoft Purview portal [2]
      • {scope} only supported for Power BI semantic models [1]
      • {action} show a pop-up policy tip to the user that warns that they might be trying to share a sensitive item inappropriately [2]
      • {action} block the sharing and, via a policy tip, allow the user to override the block and capture the users' justification [2]
      • {action} block the sharing without the override option [2]
      • {action} [data at rest] sensitive items can be locked and moved to a secure quarantine location [2]
      • {action} sensitive information won't be displayed 
        • e.g. Teams chat
    • DLP reports
      • provides data from monitoring policy matches and actions, to user activities [2]
        • used as basis for tuning policies and triage actions taken on sensitive items [2]
      • telemetry uses M365 audit logs and processes the data for the different reporting tools [2]
        • M365 provides with visibility into risky user activities [2]
        • scans the audit logs for risky activities and runs them through a correlation engine to find activities that are occurring at a high volume [1]
          • no DLP policies are required [2]
    • {feature} detects sensitive items by using deep content analysis [2]
      • ⇐ not by just a simple text scan [2]
      • based on
        • keywords matching [2]
        • evaluation of regular expressions [2] 
        • internal function validation [2]
        • secondary data matches that are in proximity to the primary data match [2]
        • ML algorithms and other methods to detect content that matches DLP policies
      • all DLP monitored activities are recorded to the Microsoft 365 Audit log [2]
    • DLP lifecycle
      • {phase} plan for DLP
        • train and acclimate users to DLP practices on well-planned and tuned policies [2]
        • {recommendation} use policy tips to raise awareness with users before changing the policy status from simulation mode to more restrictive modes [2]
      • {phase} prepare for DLP
      • {phase} deploy policies in production
        • {action} define control objectives, and how they apply across workloads [2]
        • {action} draft a policy that embodies the objectives
        • {action} start with one workload at a time, or across all workloads - there's no impact yet
        • {feature} implement policies in simulation mode
          • {benefit} allows to evaluate the impact of controls
            • the actions defined in a policy aren't applied yet
          • {benefit} allows to monitor the outcomes of the policy and fine-tune it so that it meets the control objectives while ensuring it doesn't adversely or inadvertently impact valid user workflows and productivity [2]
            • e.g. adjusting the locations and people/places that are in or out of scope
            • e.g. tune the conditions that are used to determine if an item and what is being done with it matches the policy
            • e.g. the sensitive information definition/s
            • e.g. add new controls
            • e.g. add new people
            • e.g. add new restricted apps
            • e.g. add new restricted sites
          • {step} enable the control and tune policies [2]
            • policies take effect about an hour after being turned on [2]
        • {action} create DLP policy 
        • {action} deploy DLP policy 
    • DLP alerts 
      • alerts generated when a user performs an action that meets the criteria of a DLP policy [2]
        • there are incident reports configured to generate alerts [2]
        • {limitation} available in the alerts dashboard for 30 days [2]
      • DLP posts the alert for investigation in the DLP Alerts dashboard
      • {tool} DLP Alerts dashboard 
        • allows to view alerts, triage them, set investigation status, and track resolution
          • routed to Microsoft Defender portal 
          • {limitation} available for six months [2]
        • {constraint} administrative unit restricted admins see the DLP alerts for their administrative unit only [2]
    • {concept} egress activities (aka exfiltration)
      • {def} actions related to exiting or leaving a space, system or network [2]
    • {concept}[Microsoft Fabric] policy
      • when a DLP policy detects a supported item type containing sensitive information, the actions configured in the policy are triggered [3]
      • {feature} Activity explorer
        • allows to view Data from DLP for Fabric and Power BI
        • for accessing the data, user's account must be a member of any of the following roles or higher [3]
          • Compliance administrator
          • Security administrator
          • Compliance data administrator
          • Global Administrator 
            • {warning} a highly privileged role that should only be used in scenarios where a lesser privileged role can't be used [3]
          • {recommendation} use a role with the fewest permissions [3]
      • {warning} DLP evaluation workloads impact capacity consumption [3]
      • {action} define policy
        • in the data loss prevention section of the Microsoft Purview portal [3]
        • allows to specify 
          •  conditions 
            • e.g. sensitivity labels
          •  sensitive info types that should be detected [3]
        • [semantic model] evaluated against DLP policies 
          • whenever one of the following events occurs:
            • publish
            • republish
            • on-demand refresh
            • scheduled refresh
          •  the evaluation  doesn't occur if either of the following is true
            • the initiator of the event is an account using service principal authentication [3]
            • the semantic model owner is a service principal [3]
        • [lakehouse] evaluated against DLP policies when the data within a lakehouse undergoes a change
          • e.g. getting new data, connecting a new source, adding or updating existing tables, etc. [3]

    References:
    [1] Microsoft Learn (2025) Learn about data loss prevention [link]
    [2] Microsoft Learn (2024) Purview: Learn about data loss prevention [link]
    [3] Microsoft Learn (2025) Get started with Data loss prevention policies for Fabric and Power BI [link]

    Resources:
    [R1] Microsoft Fabric Updates Blog (2024) Secure Your Data from Day One: Best Practices for Success with Purview Data Loss Prevention (DLP) Policies in Microsoft Fabric [link]
    [R2] 

    Acronyms:
    DLP - Data Loss Prevention
    M365 - Microsoft 365

    26 April 2025

    🏭🗒️Microsoft Fabric: Parameters in Dataflows Gen2 [Notes] 🆕

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 26-Apr-2025

    [Microsoft Fabric] Dataflow Gen2 Parameters

    • {def} parameters that allow to dynamically control and customize Dataflows Gen2
      • makes them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [1]
      • the dataflow is refreshed by passing parameter values outside of the Power Query editor through either
        • Fabric REST API [1]
        • native Fabric experiences [1]
      • parameter names are case sensitive [1]
      • {type} required parameters
        • {warning} the refresh fails if no value is passed for it [1]
      • {type} optional parameters
      • enabled via Parameters >> Enable parameters to be discovered and override for execution [1]
    • {limitation} dataflows with parameters can't be
      • scheduled for refresh through the Fabric scheduler [1]
      • manually triggered through the Fabric Workspace list or lineage view [1]
    • {limitation} parameters that affect the resource path of a data source or a destination are not supported [1]
      • ⇐ connections are linked to the exact data source path defined in the authored dataflow
        • can't currently be overridden to use other connections or resource paths [1]
    • {limitation} can't be leveraged by dataflows with incremental refresh [1]
    • {limitation} only parameters of type decimal number, whole number, text, and true/false can be passed for override
      • any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [1]
    • {warning} parameters allow other users who have permissions to the dataflow to refresh the data with other values [1]
    • {limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [1]
    • {limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [1]
    • {limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [1]
    • {limitation} only the first request will be accepted from duplicated requests for the same parameter values [1]
      • subsequent requests are rejected until the first request finishes its evaluation [1]

    References:
    [1] Microsoft Learn (2025) Use public parameters in Dataflow Gen2 (Preview) [link]

    Resources:
    [R1] Microsoft Fabric Blog (2025) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

    Acronyms:
    API - Application Programming Interface
    REST - Representational State Transfer

    🏭🗒️Microsoft Fabric: Deployment Pipelines [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 26-Apr-2025

    [Microsoft Fabric] Deployment Pipelines

    • {def} a structured process that enables content creators to manage the lifecycle of their organizational assets [5]
      • enable creators to develop and test content in the service before it reaches the users [5]
        • can simplify the deployment process to development, test, and production workspaces [5]
        • one Premium workspace is assigned to each stage [5]
        • each stage can have 
          • different configurations [5]
          • different databases or different query parameters [5]
    • {action} create pipeline
      • from the deployment pipelines entry point in Fabric [5]
        • creating a pipeline from a workspace automatically assigns it to the pipeline [5]
      • {action} define how many stages it should have and what they should be called [5]
        • {default} has three stages
          • e.g. Development, Test, and Production
          • the number of stages can be set to anywhere between 2 and 10
          • {action} add another stage
          • {action} delete stage
          • {action} rename stage 
            • by typing a new name in the box
          • {action} share a pipeline with others
            • users receive access to the pipeline and become pipeline admins [5]
          • ⇐ the number of stages is permanent [5]
            • can't be changed after the pipeline is created [5]
      • {action} add content to the pipeline [5]
        • done by assigning a workspace to the pipeline stage [5]
          • the workspace can be assigned to any stage [5]
      • {action|optional} make a stage public
        • {default} the final stage of the pipeline is made public
        • a consumer of a public stage without access to the pipeline sees it as a regular workspace [5]
          • without the stage name and deployment pipeline icon on the workspace page next to the workspace name [5]
      • {action} deploy to an empty stage
        • when finishing the work in one pipeline stage, the content can be deployed to the next stage [5] 
          • deployment can happen in any direction [5]
        • {option} full deployment 
          • deploy all content to the target stage [5]
        • {option} selective deployment 
          • allows select the content to deploy to the target stage [5]
        • {option} backward deployment 
          • deploy content from a later stage to an earlier stage in the pipeline [5] 
          • {restriction} only possible when the target stage is empty [5]
      • {action} deploy content between stages [5]
        • content can be deployed even if the next stage has content
          • paired items are overwritten [5]
      • {action|optional} create deployment rules
        • when deploying content between pipeline stages, allow changes to content while keeping some settings intact [5] 
        • once a rule is defined or changed, the content must be redeployed
          • the deployed content inherits the value defined in the deployment rule [5]
          • the value always applies as long as the rule is unchanged and valid [5]
      • {feature} deployment history 
        • allows to see the last time content was deployed to each stage [5]
        • allows to track time between deployments [5]
    • {concept} pairing
      • {def} the process by which an item in one stage of the deployment pipeline is associated with the same item in the adjacent stage
        • applies to reports, dashboards, semantic models
        • paired items appear on the same line in the pipeline content list [5]
          • ⇐ items that aren't paired, appear on a line by themselves [5]
        • the items remain paired even if their name changes
        • items added after the workspace is assigned to a pipeline aren't automatically paired [5]
          • ⇐ one can have identical items in adjacent workspaces that aren't paired [5]
    • [lakehouse]
      • can be removed as a dependent object upon deployment [3]
      • supports mapping different Lakehouses within the deployment pipeline context [3]
      • {default} a new empty Lakehouse object with same name is created in the target workspace [3]
        • ⇐ if nothing is specified during deployment pipeline configuration
        • notebook and Spark job definitions are remapped to reference the new lakehouse object in the new workspace [3]
        • {warning} a new empty Lakehouse object with same name still is created in the target workspace [3]
        • SQL Analytics endpoints and semantic models are provisioned
        • no object inside the Lakehouse is overwritten [3]
        • updates to Lakehouse name can be synchronized across workspaces in a deployment pipeline context [3] 
    • [notebook] deployment rules can be used to customize the behavior of notebooks when deployed [4]
      • e.g. change notebook's default lakehouse [4]
      • {feature} auto-binding
        • binds the default lakehouse and attached environment within the same workspace when deploying to next stage [4]
    • [environment] custom pool is not supported in deployment pipeline
      • the configurations of Compute section in the destination environment are set with default values [6]
      • ⇐ subject to change in upcoming releases [6]
    • [warehouse]
      • [database project] ALTER TABLE to add a constraint or column
        • {limitation} the table will be dropped and recreated when deploying, resulting in data loss (see the illustration after this list)
      • {recommendation} do not create a Dataflow Gen2 with an output destination to the warehouse
        • ⇐ deployment would be blocked by a new item named DataflowsStagingWarehouse that appears in the deployment pipeline [10]
      • SQL analytics endpoint is not supported
    • [Eventhouse]
      • {limitation} the connection must be reconfigured in destination that use Direct Ingestion mode [8]
    • [EventStream]
      • {limitation} limited support for cross-workspace scenarios
        • {recommendation} make sure all EventStream destinations within the same workspace [8]
    • KQL database
      • applies to tables, functions, materialized views [7]
    • KQL queryset
      • ⇐ tabs, data sources [7]
    • [real-time dashboard]
      • data sources, parameters, base queries, tiles [7]
    • [SQL database]
      • includes the specific differences between the individual database objects in the development and test workspaces [9]
    • can be also used with
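
      A hypothetical illustration of the warehouse limitation above: a database-project change such as the following causes the table to be dropped and recreated on deployment, so the existing data must be preserved separately (table and constraint names are made up):

      -- adding a constraint to an existing table via a database project
      ALTER TABLE dbo.Sales
      ADD CONSTRAINT CK_Sales_Amount CHECK (Amount >= 0);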

      References:
      [1] Microsoft Learn (2024) Get started with deployment pipelines [link]
      [2] Microsoft Learn (2024) Implement continuous integration and continuous delivery (CI/CD) in Microsoft Fabric [link]
      [3] Microsoft Learn (2024)  Lakehouse deployment pipelines and git integration (Preview) [link]
      [4] Microsoft Learn (2024) Notebook source control and deployment [link]
      [5] Microsoft Learn (2024) Introduction to deployment pipelines [link]
      [6] Environment Git integration and deployment pipeline [link]
      [7] Microsoft Learn (2024) Real-Time Intelligence: Git integration and deployment pipelines (Preview) [link]
      [8] Microsoft Learn (2024) Eventstream CI/CD - Git Integration and Deployment Pipeline [link]
      [9] Microsoft Learn (2024) Get started with deployment pipelines integration with SQL database in Microsoft Fabric [link]
      [10] Microsoft Learn (2025) Source control with Warehouse (preview) [link]

      Resources:

      Acronyms:
      CLM - Content Lifecycle Management
      UAT - User Acceptance Testing

      🏭🗒️Microsoft Fabric: Power BI Environments [Notes]

      Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

      Last updated: 26-Apr-2025

      Enterprise Content Publishing [2]

      [Microsoft Fabric] Power BI Environments

      • {def} structured spaces within Microsoft Fabric that help organizations manage Power BI assets through their entire lifecycle
      • {environment} development 
        • allows to develop the solution
        • accessible only to the development team 
          • via Contributor access
        • {recommendation} use Power BI Desktop as local development environment
          • {benefit} allows to try, explore, and review updates to reports and datasets
            • once the work is done, upload the new version to the development stage
          • {benefit} enables collaborating and changing dashboards
          • {benefit} avoids duplication 
            • making online changes, downloading the .pbix file, and then uploading it again, creates reports and datasets duplication
        • {recommendation} use version control to keep the .pbix files up to date
          • [OneDrive] use Power BI's autosync
            • {alternative} SharePoint Online with folder synchronization
            • {alternative} GitHub and/or VSTS with local repository & folder synchronization
        • [enterprise scale deployments] 
          • {recommendation} separate dataset from reports and dashboards’ development
            • use the deployment pipelines selective deploy option [22]
            • create separate .pbix files for datasets and reports [22]
              • create a dataset .pbix file and uploaded it to the development stage (see shared datasets [22]
              • create .pbix only for the report, and connect it to the published dataset using a live connection [22]
            • {benefit} allows different creators to separately work on modeling and visualizations, and deploy them to production independently
          • {recommendation} separate data model from report and dashboard development
            • allows using advanced capabilities 
              • e.g. source control, merging diff changes, automated processes
            • separate the development from test data sources [1]
              • the development database should be relatively small [1]
        • {recommendation} use only a subset of the data [1]
          • ⇐ otherwise the data volume can slow down the development [1]
      • {environment} user acceptance testing (UAT)
        • test environment that sits between development and production in the deployment lifecycle
          • it's not necessary for all Power BI solutions [3]
          • allows to test the solution before deploying it into production
            • all testers must have 
              • View access for testing
              • Contributor access for report authoring
          • involves business users who are SMEs
            • provide approval that the content 
              • is accurate
              • meets requirements
              • can be deployed for wider consumption
        • {recommendation} check report’s load and the interactions to find out if changes impact performance [1]
        • {recommendation} monitor the load on the capacity to catch extreme loads before they reach production [1]
        • {recommendation} test data refresh in the Power BI service regularly during development [20]
      • {environment} production
        • {concept} staged deployment
          • {goal} help minimize risk, user disruption, or address other concerns [3]
            • the deployment involves a smaller group of pilot users who provide feedback [3]
        • {recommendation} set production deployment rules for data sources and parameters defined in the dataset [1]
          • allows ensuring the data in production is always connected and available to users [1]
        • {recommendation} don’t upload a new .pbix version directly to the production stage
          •  ⇐ without going through testing
      • {feature|preview} deployment pipelines 
        • enable creators to develop and test content in the service before it reaches the users [5]
      • {recommendation} build separate databases for development and testing 
        • helps protect production data [1]
      • {recommendation} make sure that the test and production environment have similar characteristics [1]
        • e.g. data volume, usage volume, similar capacity 
        • {warning} testing into production can make production unstable [1]
        • {recommendation} use Azure A capacities [22]
      • {recommendation} for formal projects, consider creating an environment for each phase
      • {recommendation} enable users to connect to published datasets to create their own reports
      • {recommendation} use parameters to store connection details 
        • e.g. instance names, database names
        • ⇐  deployment pipelines allow configuring parameter rules to set specific values for the development, test, and production stages
          • alternatively data source rules can be used to specify a connection string for a given dataset
            • {restriction} in deployment pipelines, this isn't supported for all data sources
      • {recommendation} keep the data in blob storage under the 50k blobs and 5GB data in total to prevent timeouts [29]
      • {recommendation} provide data to self-service authors from a centralized data warehouse [20]
        • allows to minimize the amount of work that self-service authors need to take on [20]
      • {recommendation} minimize the use of Excel, csv, and text files as sources when practical [20]
      • {recommendation} store source files in a central location accessible by all coauthors of the Power BI solution [20]
      • {recommendation} be aware of API connectivity issues and limits [20]
      • {recommendation} know how to support SaaS solutions from AppSource and expect further data integration requests [20]
      • {recommendation} minimize the query load on source systems [20]
        • use incremental refresh in Power BI for the dataset(s)
        • use a Power BI dataflow that extracts the data from the source on a schedule
        • reduce the dataset size by only extracting the needed amount of data 
      • {recommendation} expect data refresh operations to take some time [20]
      • {recommendation} use relational database sources when practical [20]
      • {recommendation} make the data easily accessible [20]
      • [knowledge area] knowledge transfer
        • {recommendation} maintain a list of best practices and review it regularly [24]
        • {recommendation} develop a training plan for the various types of users [24]
          • usability training for read-only report/app users [24]
          • self-service reporting for report authors & data analysts [24]
          • more elaborated training for advanced analysts & developers [24]
      • [knowledge area] lifecycle management
        • consists of the processes and practices used to handle content from its creation to its eventual retirement [6]
        • {recommendation} postfix files with 3-part version number in Development stage [24]
          • remove the version number when publishing files in UAT and production 
        • {recommendation} backup files for archive 
        • {recommendation} track version history 

        References:
        [1] Microsoft Learn (2021) Fabric: Deployment pipelines best practices [link]
        [2] Microsoft Learn (2024) Power BI: Power BI usage scenarios: Enterprise content publishing [link]
        [3] Microsoft Learn (2024) Deploy to Power BI [link]
        [4] Microsoft Learn (2024) Power BI implementation planning: Content lifecycle management [link]
        [5] Microsoft Learn (2024) Introduction to deployment pipelines [link]
        [6] Microsoft Learn (2024) Power BI implementation planning: Content lifecycle management [link]
        [20] Microsoft (2020) Planning a Power BI  Enterprise Deployment [White paper] [link]
        [22] Power BI Docs (2021) Create Power BI Embedded capacity in the Azure portal [link]
        [24] Paul Turley (2019)  A Best Practice Guide and Checklist for Power BI Projects

        Resources:

        Acronyms:
        API - Application Programming Interface
        CLM - Content Lifecycle Management
        COE - Center of Excellence
        SaaS - Software-as-a-Service
        SME - Subject Matter Expert
        UAT - User Acceptance Testing
        VSTS - Visual Studio Team System

        25 April 2025

        🏭🗒️Microsoft Fabric: Dataflows Gen2's Incremental Refresh [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 25-Apr-2025

        [Microsoft Fabric] Incremental Refresh in Dataflows Gen2

        • {feature} enables to incrementally extract data from data sources, apply Power Query transformations, and load into various output destinations [5]
          • designed to reduce the amount of data that needs to be processed and retrieved from the source system [8]
          • configurable directly in the dataflow editor [8]
          • doesn't need to specify the historical data range [8]
            • ⇐ the dataflow doesn't remove any data from the destination that's outside the bucket range [8]
          • doesn't need to specify the parameters for the incremental refresh [8]
            • the filters and parameters are automatically added as the last step in the query [8]
        • {prerequisite} the data source 
          • supports folding [8]
          • needs to contain a Date/DateTime column that can be used to filter the data [8]
        • {prerequisite} the data destination supports incremental refresh [8]
          • available destinations
            • Fabric Warehouse
            • Azure SQL Database
            • Azure Synapse Analytics
            • Fabric Lakehouse [preview]
          • other destinations can be used in combination with incremental refresh by using a second query that references the staged data to update the data destination [8]
            • allows to use incremental refresh to reduce the amount of data that needs to be processed and retrieved from the source system [8]
              • a full refresh from the staged data to the data destination is still needed [8]
        • works by dividing the data into buckets based on a DateTime column [8] (see the sketch after this list)
          • each bucket contains the data that changed since the last refresh [8]
            • the dataflow knows what changed by checking the maximum value in the specified column 
              • if the maximum value changed for that bucket, the dataflow retrieves the whole bucket and replaces the data in the destination [8]
              • if the maximum value didn't change, the dataflow doesn't retrieve any data [8]
        • {limitation} 
          • the data destination must be set to a fixed schema [8]
          • ⇒table's schema in the data destination must be fixed and can't change [8]
            • ⇒ dynamic schema must be changed to fixed schema before configuring incremental refresh [8]
        • {limitation} the only supported update method in the data destination: replace
          • ⇒the dataflow replaces the data for each bucket in the data destination with the new data [8]
            • data that is outside the bucket range isn't affected [8]
        • {limitation} maximum number of buckets
          • single query: 50
            • {workaround} increase the bucket size or reduce the bucket range to lower the number of buckets [8]
          • whole dataflow: 150
            • {workaround} reduce the number of incremental refresh queries or increase the bucket size [8]
        • {downside} the dataflow may take longer to refresh after enabling incremental refresh [8]
          • because the additional overhead of checking if data changed and processing the buckets is higher than the time saved by processing less data [8]
          • {recommendation} review the settings for incremental refresh and adjust them to better fit the scenario
            • {option} increase the bucket size to reduce the number of buckets and the overhead of processing them [8]
            • {option} reduce the number of buckets by increasing the bucket size [8]
            • {option} disable incremental refresh [8]
        • {recommendation} don't use the column for detecting changes also for filtering [8]
          • because this can lead to unexpected results [8]
        • {setting} limit number of concurrent evaluation
          • setting the value to a lower number, reduces the number of requests sent to the source system [8]
          • via global settings >> Scale tab >> maximum number of parallel query evaluations
          • {recommendation} don't enable this limit unless there are issues with the source system [8]
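
        An illustrative sketch of the bucket logic described above, expressed as the equivalent folded source queries (table, column, and bucket boundaries are hypothetical; the actual queries are generated by the dataflow engine):

        -- change detection for one daily bucket: compare the maximum value with the value stored at the last refresh
        SELECT MAX(ModifiedDate) AS max_value
        FROM dbo.Orders
        WHERE ModifiedDate >= '2025-04-24' AND ModifiedDate < '2025-04-25';

        -- only if the maximum changed: re-extract the whole bucket and replace it in the destination
        SELECT *
        FROM dbo.Orders
        WHERE ModifiedDate >= '2025-04-24' AND ModifiedDate < '2025-04-25';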

        References:
        [5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
        [8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]

        Resources:


        💫🗒️ERP Systems: Microsoft Dynamics 365's Business Process Catalog (BPC) [Notes]

        Disclaimer: This is work in progress intended to consolidate information from the various sources and not to provide a complete overview of all the features. Please refer to the documentation for a complete overview!

        Last updated: 25-Apr-2025

        Business Process Catalog - End-to-End Scenarios

        [Dynamics 365] Business Process Catalog (BPC)

        • {def} lists of end-to-end processes that are commonly used to manage or support work within an organization [1]
          • agnostic catalog of business processes contained within the entire D365 solution space [3]
            • {benefit} efficiency and time savings [3]
            • {benefit} best practices [3]
            • {benefit} reduced risk [3]
            • {benefit} technology alignment [3]
            • {benefit} scalability [3]
            • {benefit} cross-industry applicability [3]
          • stored in an Excel workbook
            • used to organize and prioritize the work on the business process documentation [1]
            • {recommendation} check the latest versions (see [R1])
          • assigns unique IDs to 
            • {concept} end-to-end scenario
              • describe in business terms 
                • not in terms of software technology
              • includes the high-level products and features that map to the process [3]
              • covers two or more business process areas
              • {purpose} map products and features to benefits that can be understood in business contexts [3]
            • {concept} business process areas
              • combination of business language and basic D365 terminology [3]
              • groups business processes for easier searching and navigation [1]
              • separated by major job functions or departments in an organization [1]
              • {purpose} map concepts to benefits that can be understood in business context [3]
              • more than 90 business process areas defined [1]
            • {concept} business processes
              • a series of structured activities and tasks that organizations use to achieve specific goals and objectives [3]
                • efficiency and productivity
                • consistency and quality
                • cost reduction
                • risk management
                • scalability
                • data-driven decision-making
              • a set of tasks in a sequence that is completed to achieve a specific objective [5]
                • define when each step is done in the implementation [5] [3]
                • define how many are needed [5] [3]
              • covers a wide range of structured, often sequenced, activities or tasks to achieve a predetermined organizational goal
              • can refer to the cumulative effects of all steps progressing toward a business goal
              • describes a function or process that D365 supports
                • more than 700 business processes identified
                • {goal} provide a single entry point with links to relevant product-specific content [1]
              • {concept} business process guide
                • provides documentation on the structure and patterns of the process along with guidance on how to use them in a process-oriented implementation [3]
                • based on a catalog of business process supported by D365 [3]
              • {concept} process steps 
                • represented sequentially, top to bottom
                  • can include hyperlinks to the product documentation [5] 
                  • {recommendation} avoid back and forth in the steps as much as possible [5]
                • can be
                  • forms used in D365 [5]
                  • steps completed in LCS, PPAC, Azure or other Microsoft products [5]
                  • steps that are done outside the system (incl. third-party system) [5]
                  • steps that are done manually [5]
                • are not 
                  • product documentation [5]
                  • a list of each click to perform a task [5]
              • {concept} process states
                • include
                  • project phase 
                    • e.g. strategize, initialize, develop, prepare, operate
                  • configuration 
                    • e.g. base, foundation, optional
                  • process type
                    • e.g. configuration, operational
            • {concept} patterns
              • repeatable configurations that support a specific business process [1]
                • specific way of setting up D365 to achieve an objective [1]
                • address specific challenges in implementations and are based on a specific scenario or best practice [6]
                • the solution is embedded into the application [6]
                • includes high-level process steps [6]
              • include the most common use cases, scenarios, and industries [1]
              • {goal} provide a baseline for implementations
                • more than 2000 patterns, and we expect that number to grow significantly over time [1]
              • {activity} naming a new pattern
                • starts with a verb
                • describes a process
                • includes product names
                • indicates the industry
                • indicates AppSource products
            • {concept} reference architecture 
              • acts as a core architecture with a common solution that applies to many scenarios [6]
              • typically used for integrations to external solutions [6]
              • must include an architecture diagram [6]
          • {concept} process governance
            • {benefit} improved quality
            • {benefit} enhanced decision making
            • {benefit} agility and adaptability
            • {benefit} Sbd alignment
            • {goal} enhance efficiency 
            • {goal} ensure compliance 
            • {goal} facilitate accountability 
            • {concept} policy
            • {concept} procedure
            • {concept} control
          • {concept} scope definition
            • {recommendation} avoid replicating current processes without considering future needs [4]
              • {risk} replicating processes in the new system without re-evaluating and optimizing [4] 
              • {impact} missed opportunities for process improvement [4]
            • {recommendation} align processes with overarching business goals rather than the limitations of the current system [4]
          • {concept} guidance hub
            • a central landing spot for D365 guidance and tools
            • contains cross-application documentations
        • {purpose} provide considerations and best practices for implementation [6]
        • {purpose} provide technical information for implementation [6]
        • {purpose} provide link to product documentation to achieve the tasks in scope [6]

        References:
        [1] Microsoft Learn (2024) Dynamics 365: Overview of end-to-end scenarios and business processes in Dynamics 365 [link]
        [2] Microsoft Dynamics 365 Community (2023) Business Process Guides - Business Process Guides [link]
        [3] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance - Part 2 Introduction to Business Processes [link]
        [4] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance - Part 3: Using the Business Process Catalog to Manage Project Scope and Estimation [link]
        [5] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance - Part 4: Authoring Business Processes [link]
        [6] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance - Part 5:  Authoring Business Processes Patterns and Use Cases [link]
        [7] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance  - Part 6: Conducting Process-Centric Discovery [link]
        [8] Microsoft Dynamics 365 Community (2024) Business Process Catalog and Guidance  - Part 7: Introduction to Process Governance [link]

        Resources:
        [R1] GitHub (2024) Business Process Catalog [link]
        [R2] Microsoft Learn (2024) Dynamics 365 guidance documentation and other resources [link]
        [R3] Dynamics 365 Blog (2025) Process, meet product: The business process catalog for Dynamics 365 [link]

        Acronyms:
        3T - Tools, Techniques, Tips
        ADO - Azure DevOps
        BPC - Business Process Catalog
        D365 - Dynamics 365
        LCS - Lifecycle Services
        PPAC - Power Platform admin center
        RFI - Request for Information
        RFP - Request for Proposal

        14 April 2025

        🏭🗒️Microsoft Fabric: Quotas [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 13-Apr-2025

        [Microsoft Fabric] Quotas

        • {def} assigned number of resources for an Azure subscription
          • depend on the Azure subscription type [5]
            • set and enforced in the scope of the subscription [1]
            • each subscription has a default value for each quota [1]
          • determine the maximum number of CUs for each of the capacities on the subscription [5]
            • customers aren't charged for quotas, but for the capacities they provision [5]
          • each quota represents a specific countable resource
            • e.g. number of VMs that can be created [2]
            • e.g. the number of storage accounts that can be used concurrently [2]
            • e.g. the number of networking resources that can be consumed [2]
            • e.g. the number of API calls to a particular service that can be made [2]
          • designed to help protect customers from unexpected behavior [1]
            • e.g. inaccurately resourced deployments, mistaken consumption
            • helps minimize risks from deceptive or inappropriate consumption and unexpected demand [1]
          • limitations
            • ⇐ some limits are managed at a regional level [2]
            • ⇐ limits vary with factors such as the subscription type and region
        • {type} adjustable quotas
          • quotas for which customers can request quota increases
            • via the My quotas page in the Azure portal
            • by specifying a new amount or usage percentage and submitting the request directly [1]
            • each subscription has a default quota value for each quota [1]
          • the quickest way to increase quotas [1]
        • {type} non-adjustable quotas
          • quotas which have a hard limit, usually determined by the scope of the subscription [1]
          • to make changes, customers must submit a support request to the Azure support team [1]
        • [Fabric] quotas limit the number of CUs customers can provision across multiple capacities in a subscription (see the sketch after this list)
          • calculation based on
            • subscription plan type
            • Azure region
        • {action} provision a new capacity
          • Azure enforces limits at several levels [2]
            • {level} Azure management group limits
            • {level} Azure subscription limits
            • {level} Azure resource group limits
            • {level} Template limits
            • {level} Microsoft Entra ID limits
          • {action} request quota increase
            • via support request
            • there is no cost associated with requesting a quota increase [1]
              • ⇐ costs are incurred based on resource usage, not the quotas themselves [1]
            • should be based on workloads' characteristics [1]
          • {action} view quota
            • [permission] contributor role, or another role that includes contributor access [5]
          • {action} manage quota
            • [permission] Quota Request Operator role
          • [Azure] quota alerts
            • notifications triggered when the usage of a specific Azure resource nears the predefined quota limit [4]
            • facilitates proactive resource management [4]
            • multiple alert rules can be created for a given quota or across quotas in a subscription [4]
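
The CU-quota behavior noted above can be illustrated with a short arithmetic sketch. It is a minimal example assuming a hypothetical quota of 256 CUs and made-up capacity names; real values depend on the subscription type and Azure region:

# Minimal sketch: check whether a new Fabric capacity fits within the
# subscription-level CU quota; quota value and capacity sizes are hypothetical
cu_quota = 256                                            # assumed CU quota of the subscription
existing_capacities = {"F64-prod": 64, "F32-test": 32}    # CUs already provisioned

def can_provision(new_capacity_cus: int) -> bool:
    """Return True if the requested capacity stays within the CU quota."""
    used = sum(existing_capacities.values())
    return used + new_capacity_cus <= cu_quota

print(can_provision(128))   # True:  96 + 128 = 224 <= 256
print(can_provision(256))   # False: 96 + 256 = 352 > 256 -> request a quota increase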

            References:
            [1] Microsoft Learn (2024) Fabric: Quotas overview [link]
            [2] Microsoft Learn (2025) Fabric: Azure subscription and service limits, quotas, and constraints [link]
            [3] Microsoft Learn (2025) Fabric: Quota monitoring and alerting [link]
            [4] Microsoft Fabric Update Blog (2024) Announcing the launch of Microsoft Fabric Quotas [link]
            [5] Microsoft Learn (2025) Fabric: Microsoft Fabric capacity quotas [link]

            Resources:
            [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

            Acronyms:
            API - Application Programming Interface
            CU - Capacity Units
            VM - Virtual Machine

            13 April 2025

            🏭🗒️Microsoft Fabric: Continuous Integration & Continuous Deployment [CI/CD] [Notes]

            Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

            Last updated: 13-Apr-2025

            [Microsoft Fabric] Continuous Integration & Continuous Deployment [CI/CD] 
            • {def} development processes, tools, and best practices used to automate the integration, testing, and deployment of code changes, ensuring efficient and reliable development
              • can be used in combination with a client tool
                • e.g. VS Code, Power BI Desktop
                • don’t necessarily need a workspace
                  • developers can create branches and commit changes to that branch locally, push those to the remote repo and create a pull request to the main branch, all without a workspace
                  • workspace is needed only as a testing environment [1]
                    • to check that everything works in a real-life scenario [1]
              • addresses a few pain points [2]
                • manual integration issues
                  • manual changes can lead to conflicts and errors
                    • slow down development [2]
                • development delays
                  • manual deployments are time-consuming and prone to errors
                    • lead to delays in delivering new features and updates [2]
                • inconsistent environments
                  • inconsistencies between environments cause issues that are hard to debug [2]
                • lack of visibility
                  • can be challenging to
                    • track changes through their lifecycle [2]
                    • understand the state of the codebase [2]
              • {process} continuous integration (CI)
              • {process} continuous deployment (CD)
              • architecture
                • {layer} development database 
                  • {recommendation} should be relatively small [1]
                • {layer} test database 
                  • {recommendation} should be as similar as possible to the production database [1]
                • {layer} production database

                • data items
                  • items that store data
                  • items' definition in Git defines how the data is stored [1]
              • {stage} development 
                • {best practice} back up work to a Git repository
                  • back up the work by committing it into Git [1]
                  • {prerequisite} the work environment must be isolated [1]
                    • so others don’t override the work before it gets committed [1]
                    • commit to a branch no other developer is using [1]
                    • commit together changes that must be deployed together [1]
                      • helps later when 
                        • deploying to other stages
                        • creating pull requests
                        • reverting changes
                • {warning} big commits might hit the max commit size limit [1]
                  • {bad practice} storing large items in source control systems, even if it works [1]
                  • {recommendation} consider ways to reduce items’ size if they have lots of static resources, like images [1]
                • {action} revert to a previous version
                  • {operation} undo
                    • revert the immediate changes made, as long as they aren't committed yet [1]
                    • each item can be reverted separately [1]
                  • {operation} revert
                    • reverting to older commits
                      • {recommendation} promote an older commit to be the HEAD 
                        • via git revert or git reset [1]
                        • the revert shows up as an update in the source control pane [1]
                        • the workspace can be updated with that new commit [1]
                    • {warning} reverting a data item to an older version might break the existing data and could possibly require dropping the data or the operation might fail [1]
                    • {recommendation} check dependencies in advance before reverting changes back [1]
                • {concept} private workspace
                  • a workspace that provides an isolated environment [1]
                  • allows to work in isolation, using a separate branch [1]
                  • {prerequisite} the workspace is assigned to a Fabric capacity [1]
                  • {prerequisite} access to the data used in the workspace [1]
                  • {step} create a new branch from the main branch [1]
                    • allows to have most up-to-date version of the content [1]
                    • can be used for any future branch created by the user [1]
                      • when a sprint is over, the changes are merged and one can start a fresh new task [1]
                        • switch the connection to a new branch on the same workspace
                      • the approach can also be used when a bug needs to be fixed in the middle of a sprint [1]
                    • {validation} connect to the correct folder in the branch to pull the right content into the workspace [1]
                • {best practice} make small incremental changes that are easy to merge and less likely to get into conflicts [1]
                  • update the branch to resolve the conflicts first [1]
                • {best practice} change workspace’s configurations to enable productivity [1]
                  • e.g. connections between items, to different data sources, or changes to parameters on a given item [1]
                • {recommendation} make sure you're working with the supported structure of the item you're authoring [1]
                  • if you’re not sure, first clone a repo with content already synced to a workspace, then start authoring from there, where the structure is already in place [1]
                • {constraint} a workspace can only be connected to a single branch at a time [1]
                  • {recommendation} treat this as a 1:1 mapping [1]
              • {stage} test
                • {best practice} simulate a real production environment for testing purposes [1]
                  • {alternative} simulate this by connecting Git to another workspace [1]
                • factors to consider for the test environment
                  • data volume
                  • usage volume
                  • production environment’s capacity
                    • the test stage should ideally have the same (minimal) capacity as production [1]
                      • using the same capacity can make production unstable during load testing [1]
                        • {recommendation} test using a different capacity similar in resources to the production capacity [1]
                        • {recommendation} use a capacity that allows to pay only for the testing time [1]
                          • allows to avoid unnecessary costs [1]
                • {best practice} use deployment rules with a real-life data source
                  • {recommendation} use data source rules to switch data sources in the test stage, or parameterize the connection if not working through deployment pipelines [1] (see the sketch after this list)
                  • {recommendation} separate the development and test data sources [1]
                  • {recommendation} check related items
                    • the changes made can also affect the dependent items [1]
                  • {recommendation} verify that the changes don’t affect or break the performance of dependent items [1]
                    • via impact analysis.
                • {operation} update data items in the workspace
                  • imports items’ definition into the workspace and applies it to the existing data [1]
                  • the operation is the same for Git and deployment pipelines [1]
                  • {recommendation} know in advance what the changes are and what impact they have on the existing data [1]
                  • {recommendation} use commit messages to describe the changes made [1]
                  • {recommendation} upload the changes first to a dev or test environment [1]
                    • {benefit} allows to see how that item handles the change with test data [1]
                  • {recommendation} check the changes on a staging environment, with real-life data (or as close to it as possible) [1]
                    • {benefit} allows to minimize the unexpected behavior in production [1]
                  • {recommendation} consider the best timing when updating the Prod environment [1]
                    • {benefit} minimize the impact errors might cause on the business [1]
                  • {recommendation} perform post-deployment tests in Prod to verify that everything works as expected [1]
                  • {recommendation} have a deployment, respectively a recovery plan [1]
                    • {benefit} allows to minimize the effort, respectively the downtime [1]
              • {stage} production
                • {best practice} let only specific people manage sensitive operations [1]
                • {best practice} use workspace permissions to manage access [1]
                  • applies to all BI creators for a specific workspace who need access to the pipeline
                • {best practice} limit access to the repo or pipeline by only granting permissions to users who are part of the content creation process [1]
                • {best practice} set deployment rules to ensure production stage availability [1]
                  • {goal} ensure the data in production is always connected and available to users [1]
                  • {benefit} allows deployments to run while minimizing downtime
                  • applies to data sources and parameters defined in the semantic model [1]
                • deployment into production using Git branches
                  • {recommendation} use release branches [1]
                    • requires changing the connection of the workspace to the new release branch before every deployment [1]
                    • if the build or release pipeline requires to change the source code, or run scripts in a build environment before deployment, then connecting the workspace to Git won't help [1]
                • {recommendation} after deploying to each stage, make sure to change all the configuration specific to that stage [1]
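
As referenced in the test-stage notes above, the following is a minimal sketch of parameterizing a connection per deployment stage so the same code can run in dev, test and prod; the lakehouse names, paths and the DEPLOYMENT_STAGE variable are hypothetical placeholders, not Fabric APIs:

# Minimal sketch: resolve stage-specific configuration (dev/test/prod) so the
# same notebook code can run in every stage; all names are hypothetical
import os

STAGE_CONFIG = {
    "dev":  {"lakehouse": "lh_sales_dev",  "source_path": "Files/raw_sample/"},
    "test": {"lakehouse": "lh_sales_test", "source_path": "Files/raw_full/"},
    "prod": {"lakehouse": "lh_sales_prod", "source_path": "Files/raw_full/"},
}

def get_stage_config(stage: str | None = None) -> dict:
    """Pick the configuration for the current deployment stage."""
    stage = stage or os.getenv("DEPLOYMENT_STAGE", "dev")   # hypothetical variable name
    if stage not in STAGE_CONFIG:
        raise ValueError(f"unknown stage: {stage}")
    return STAGE_CONFIG[stage]

cfg = get_stage_config()
print(f"reading from {cfg['lakehouse']}/{cfg['source_path']}")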

              References:
              [1] Microsoft Learn (2025) Fabric: Best practices for lifecycle management in Fabric [link]
              [2] Microsoft Learn (2025) Fabric: CI/CD for pipelines in Data Factory in Microsoft Fabric [link]
              [3] Microsoft Learn (2025) Fabric: Choose the best Fabric CI/CD workflow option for you [link]

              Acronyms:
              API - Application Programming Interface
              BI - Business Intelligence
              CI/CD - Continuous Integration and Continuous Deployment
              VS - Visual Studio
