25 April 2025

🏭🗒️Microsoft Fabric: Dataflows Gen2's Incremental Refresh [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 25-Apr-2025

[Microsoft Fabric] Incremental Refresh in Dataflows Gen2

  • {feature} enables incrementally extracting data from data sources, applying Power Query transformations, and loading the result into various output destinations [5]
    • designed to reduce the amount of data that needs to be processed and retrieved from the source system [8]
    • configurable directly in the dataflow editor [8]
    • no need to specify the historical data range [8]
      • ⇐ the dataflow doesn't remove any data from the destination that's outside the bucket range [8]
    • no need to specify the parameters for the incremental refresh [8]
      • the filters and parameters are added automatically as the last step in the query [8] (see the sketch below)
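
A minimal Python sketch of the kind of range filter the dataflow appends for each bucket; the column name ModifiedDate, the sample rows, and the bucket boundaries are placeholders, and the real step is generated by Dataflows Gen2 inside the Power Query query rather than written by hand:

```python
from datetime import datetime

# Conceptual sketch only: Dataflows Gen2 generates an equivalent filter as the last
# step of the query; "ModifiedDate" and the bucket boundaries are placeholders.
def filter_bucket(rows, column, bucket_start, bucket_end):
    """Keep only the rows whose DateTime column falls inside the current bucket."""
    return [row for row in rows if bucket_start <= row[column] < bucket_end]

rows = [
    {"OrderId": 1, "ModifiedDate": datetime(2025, 4, 20, 9, 0)},
    {"OrderId": 2, "ModifiedDate": datetime(2025, 4, 24, 17, 30)},
]
print(filter_bucket(rows, "ModifiedDate",
                    datetime(2025, 4, 24), datetime(2025, 4, 25)))
# only OrderId 2 falls inside the 24-Apr bucket
```
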
  • {prerequisite} the data source 
    • supports query folding [8]
    • needs to contain a Date/DateTime column that can be used to filter the data [8]
  • {prerequisite} the data destination supports incremental refresh [8]
    • available destinations
      • Fabric Warehouse
      • Azure SQL Database
      • Azure Synapse Analytics
      • Fabric Lakehouse [preview]
    • other destinations can be used in combination with incremental refresh by using a second query that references the staged data to update the data destination [8]
      • this allows using incremental refresh to reduce the amount of data that needs to be processed and retrieved from the source system [8]
        • a full refresh from the staged data to the data destination is still needed [8] (see the sketch after this list)
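
A minimal sketch of that workaround pattern, assuming two hypothetical helpers (read_staged and replace_destination) standing in for the staged table and the unsupported destination; the point is that only the staging query touches the source incrementally, while the final destination still receives a full load:

```python
from typing import Callable, Iterable, Mapping

Row = Mapping[str, object]

# Conceptual sketch only: read_staged and replace_destination are hypothetical helpers,
# not a Fabric API. Query 1 keeps the staging table current via incremental refresh;
# this second step copies the staged data to a destination without incremental-refresh support.
def refresh_via_staging(read_staged: Callable[[], Iterable[Row]],
                        replace_destination: Callable[[Iterable[Row]], None]) -> None:
    staged_rows = list(read_staged())     # no additional load on the source system
    replace_destination(staged_rows)      # still a full (replace) load of the destination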
  • works by dividing the data into buckets based on a DateTime column [8] (see the sketch after this list)
    • each bucket contains the data that changed since the last refresh [8]
      • the dataflow knows what changed by checking the maximum value in the specified column 
        • if the maximum value changed for that bucket, the dataflow retrieves the whole bucket and replaces the data in the destination [8]
        • if the maximum value didn't change, the dataflow doesn't retrieve any data [8]
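
A rough Python sketch of this per-bucket check; the callables (source_max_for, retrieve_bucket, replace_bucket) and the previous_max store are hypothetical stand-ins, since Fabric's internal implementation isn't documented at this level. The prerequisite that the source supports query folding is what keeps the check cheap: the filter and the maximum can be evaluated at the source rather than after retrieving the data.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Bucket = Tuple[datetime, datetime]

# Conceptual sketch only: the callables and the previous_max store are hypothetical,
# not how Fabric implements incremental refresh internally.
def refresh_buckets(buckets: List[Bucket],
                    previous_max: Dict[Bucket, datetime],
                    source_max_for: Callable[[datetime, datetime], datetime],
                    retrieve_bucket: Callable[[datetime, datetime], List[dict]],
                    replace_bucket: Callable[[datetime, datetime, List[dict]], None]) -> None:
    for start, end in buckets:
        current_max = source_max_for(start, end)   # e.g. a MAX() evaluated at the source
        if current_max != previous_max.get((start, end)):
            # the maximum changed: retrieve the whole bucket and replace it in the destination
            replace_bucket(start, end, retrieve_bucket(start, end))
            previous_max[(start, end)] = current_max
        # otherwise nothing changed in this bucket and no data is retrieved
```
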
  • {limitation} the data destination must be set to a fixed schema [8]
    • ⇒ the table's schema in the data destination must be fixed and can't change [8]
      • ⇒ a dynamic schema must be changed to a fixed schema before configuring incremental refresh [8]
  • {limitation} the only supported update method in the data destination: replace
    • ⇒ the dataflow replaces the data for each bucket in the data destination with the new data [8]
      • data that is outside the bucket range isn't affected [8]
  • {limitation} maximum number of buckets (see the arithmetic sketch after this list)
    • single query: 50
      • {workaround} increase the bucket size or reduce the bucket range to lower the number of buckets [8]
    • whole dataflow: 150
      • {workaround} reduce the number of incremental refresh queries or increase the bucket size [8]
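
As a back-of-the-envelope illustration of these limits (the scenario values below are made up), the number of buckets is roughly the refresh range divided by the bucket size, so widening the bucket size is the quickest way back under the per-query limit:

```python
import math

# Hypothetical scenario: two years of data; bucket count is roughly range / bucket size.
def bucket_count(range_in_days: int, bucket_size_in_days: int) -> int:
    return math.ceil(range_in_days / bucket_size_in_days)

print(bucket_count(730, 1))    # daily buckets   -> 730, far above the 50-per-query limit
print(bucket_count(730, 31))   # monthly buckets -> 24, within the limit
```
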
  • {downside} the dataflow may take longer to refresh after enabling incremental refresh [8]
    • because the additional overhead of checking if data changed and processing the buckets is higher than the time saved by processing less data [8]
    • {recommendation} review the settings for incremental refresh and adjust them to better fit the scenario
      • {option} increase the bucket size to reduce the number of buckets and the overhead of processing them [8]
      • {option} disable incremental refresh [8]
  • {recommendation} don't use the column used for detecting changes also for filtering in the query [8]
    • because this can lead to unexpected results [8]
  • {setting} limit the number of concurrent evaluations
    • setting the value to a lower number reduces the number of requests sent to the source system [8] (see the sketch below)
    • via global settings >> Scale tab >> maximum number of parallel query evaluations
    • {recommendation} don't enable this limit unless there are issues with the source system [8]
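
A minimal sketch of what capping parallel evaluations means in practice, using a thread pool as a stand-in for the dataflow engine; evaluate_query, the query list, and the default of 4 are placeholders, and the actual limit is configured in the dataflow's global settings rather than in code:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Conceptual sketch only: a lower max_parallel_evaluations means fewer queries hit
# the source system at the same time, at the cost of a potentially longer refresh.
def run_refresh(queries: Iterable[str],
                evaluate_query: Callable[[str], object],
                max_parallel_evaluations: int = 4) -> List[object]:
    with ThreadPoolExecutor(max_workers=max_parallel_evaluations) as pool:
        return list(pool.map(evaluate_query, queries))
```
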

References:
[5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
[8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]
