25 April 2025

🏭🗒️Microsoft Fabric: Dataflows Gen2's Incremental Refresh [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information, please consult the documentation (see the links below)!

Last updated: 24-Nov-2025

[Microsoft Fabric] Incremental Refresh in Dataflows Gen2

  • {feature} enables incrementally extracting data from data sources, applying Power Query transformations, and loading the data into various output destinations [5]
    • designed to reduce the amount of data that needs to be processed and retrieved from the source system [8]
    • configurable directly in the dataflow editor [8]
    • there's no need to specify the historical data range [8]
      • ⇐ the dataflow doesn't remove any data from the destination that's outside the bucket range [8]
    • there's no need to specify the parameters for the incremental refresh [8]
      • the filters and parameters are automatically added as the last step in the query [8]
  • {prerequisite} the data source 
    • supports folding [8]
    • needs to contain a Date/DateTime column that can be used to filter the data [8]
  • {prerequisite} the data destination supports incremental refresh [8]
    • available destinations
      • Fabric Warehouse
      • Azure SQL Database
      • Azure Synapse Analytics
      • Fabric Lakehouse [preview]
    • other destinations can be used in combination with incremental refresh by using a second query that references the staged data to update the data destination [8]
      • allows using incremental refresh to reduce the amount of data that needs to be processed and retrieved from the source system [8]
        • a full refresh from the staged data to the data destination is still needed [8]
  • works by dividing the data into buckets based on a DateTime column [8]
    • each bucket contains the data that changed since the last refresh [8]
      • the dataflow knows what changed by checking the maximum value in the specified column 
        • if the maximum value changed for that bucket, the dataflow retrieves the whole bucket and replaces the data in the destination [8]
        • if the maximum value didn't change, the dataflow doesn't retrieve any data [8]
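To make the mechanism concrete, here's a minimal Python sketch of the bucket logic as described above (all names are hypothetical; the engine's actual implementation isn't public):

```python
from datetime import datetime, timedelta

# Minimal sketch of the bucket mechanism (names hypothetical).
# Rows are (key, modified) tuples; 'modified' is the DateTime column.
def refresh(rows, start, end, bucket_size, watermarks, destination):
    """Re-extract only buckets whose maximum 'modified' value changed."""
    bucket_start = start
    while bucket_start < end:
        bucket_end = min(bucket_start + bucket_size, end)
        bucket = [r for r in rows if bucket_start <= r[1] < bucket_end]
        # change detection: compare the bucket's maximum value with the
        # maximum recorded at the previous refresh
        max_value = max((r[1] for r in bucket), default=None)
        if max_value != watermarks.get(bucket_start):
            # maximum changed: retrieve the whole bucket and replace the
            # destination's data for that range (update method: replace)
            destination[bucket_start] = bucket
            watermarks[bucket_start] = max_value
        # unchanged buckets are skipped entirely
        bucket_start = bucket_end

# usage: daily buckets over one week
rows = [(1, datetime(2025, 4, 20, 10)), (2, datetime(2025, 4, 22, 8))]
watermarks, destination = {}, {}
refresh(rows, datetime(2025, 4, 20), datetime(2025, 4, 27),
        timedelta(days=1), watermarks, destination)
```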
  • {setting|required} bucket size
    • specifies the size of the buckets that the dataflow uses to filter data [8] 
    • smaller bucket size
      • the dataflow processes less data in each iteration, but requires more iterations to process all the data [8]
    • larger bucket size
      • the dataflow processes more data in each iteration, but requires fewer iterations to process all the data [8]
  • {setting} only extract new data when the maximum value in this column changes
    • specifies the column that the dataflow uses to determine if data changed [8]
      • the dataflow compares the maximum value to the maximum value from the previous refresh [8]
        • if the maximum value changed, the dataflow gets the data that changed since the last refresh [8]
        • if the maximum value didn't change, the dataflow doesn't get any data [8]
  • {setting|optional} only extract data for concluded periods
    • specifies whether the dataflow should only extract data for concluded periods [8]
    • when disabled, the dataflow extracts data for all periods, including periods that aren't complete and contain future data [8]
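A small, hypothetical sketch of how concluded periods could be derived: only buckets whose entire range already lies in the past are processed, so the current, incomplete period is skipped:

```python
from datetime import datetime, timedelta

# Hypothetical helper: with "only extract data for concluded periods"
# enabled, a bucket qualifies only once its full range has elapsed.
def concluded_buckets(start, end, bucket_size, now):
    bucket_start = start
    while bucket_start < end:
        bucket_end = min(bucket_start + bucket_size, end)
        if bucket_end <= now:          # period fully elapsed
            yield bucket_start, bucket_end
        bucket_start = bucket_end

# with daily buckets, the current (incomplete) day is excluded
print(list(concluded_buckets(datetime(2025, 4, 20), datetime(2025, 4, 27),
                             timedelta(days=1),
                             now=datetime(2025, 4, 25, 14))))
```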
  • {setting} require incremental refresh query to fully fold [8]
    • controls whether the incremental refresh query must 'fully fold' [8]
      • ⇒ it gets pushed down entirely to the source system for processing [8]
    • {recommendation} keep this setting enabled
      • after saving the dataflow, it's validated whether the query can fold to the source
        • if the validation fails, the dataflow may suffer from reduced performance and could end up retrieving unnecessary, unfiltered data [8]
    • when enabled, the query must fully fold [8]
    • when disabled, the query can be partially processed by the dataflow instead of the source system [8]
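The difference can be illustrated with a hypothetical sqlite3 sketch: a folded query ships the bucket filter to the source engine, while an unfolded one retrieves everything and filters locally:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, modified TEXT)")
conn.execute("INSERT INTO sales VALUES (1, '2025-04-21'), (2, '2025-04-26')")

# analogous to a fully folded query: the range filter is evaluated by the
# source engine, so only the bucket's rows ever leave the source
def read_bucket_folded(start, end):
    return conn.execute(
        "SELECT * FROM sales WHERE modified >= ? AND modified < ?",
        (start, end)).fetchall()

# analogous to a query that doesn't fold: every row is retrieved first and
# the filter is applied afterwards, moving unnecessary, unfiltered data
def read_bucket_unfolded(start, end):
    rows = conn.execute("SELECT * FROM sales").fetchall()
    return [r for r in rows if start <= r[1] < end]

print(read_bucket_folded('2025-04-20', '2025-04-25'))  # [(1, '2025-04-21')]
```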
  • {limitation} the maximum number of concurrent evaluations: 10 [8]
    • the dataflow can only evaluate 10 buckets at the same time [8]
    • for more than 10 buckets, it's necessary to either limit the number of buckets or limit the number of concurrent evaluations [8]
  • {limitation} [lakehouse] if other tools (like Spark) or processes also write to the same table, they can interfere with incremental refresh [8]
    • {recommendation} avoid using other writers while using incremental refresh [8]
      • {recommendation} make sure other writers don't conflict with the incremental refresh process [8]
  • {limitation} table maintenance operations aren't supported for tables that use incremental refresh [8]
    • e.g. OPTIMIZE/REORG TABLE
  • {limitation} the data destination must be set to a fixed schema [8]
    • ⇒ table's schema in the data destination must be fixed and can't change [8]
      • ⇒ dynamic schema must be changed to fixed schema before configuring incremental refresh [8]
  • {limitation} the only supported update method in the data destination: replace
    • ⇒ the dataflow replaces the data for each bucket in the data destination with the new data [8]
      • data that is outside the bucket range isn't affected [8]
  • {limitation} maximum number of buckets
    • single query: 50
      • {workaround} increase the bucket size or reduce the bucket range to lower the number of buckets [8]
    • whole dataflow: 150
      • {workaround} reduce the number of incremental refresh queries or increase the bucket size [8]
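A back-of-the-envelope check of these limits (the helper below is illustrative, not part of any API) shows how the bucket size drives the bucket count:

```python
from datetime import date, timedelta
from math import ceil

# number of buckets needed to cover a refresh range at a given bucket size
def bucket_count(start, end, bucket_size):
    return ceil((end - start) / bucket_size)

start, end = date(2025, 1, 1), date(2025, 12, 31)
print(bucket_count(start, end, timedelta(days=1)))   # 364 -> exceeds 50 per query
print(bucket_count(start, end, timedelta(weeks=1)))  # 52  -> still exceeds 50
print(bucket_count(start, end, timedelta(days=30)))  # 13  -> within the limit
```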
  • {downside} the dataflow may take longer to refresh after enabling incremental refresh [8]
    • because the additional overhead of checking if data changed and processing the buckets is higher than the time saved by processing less data [8]
    • {recommendation} review the settings for incremental refresh and adjust them to better fit the scenario
      • {option} increase the bucket size, which reduces the number of buckets and the overhead of processing them [8]
      • {option} disable incremental refresh [8]
  • {recommendation} don't use the column for detecting changes also for filtering [8]
    • because this can lead to unexpected results [8]
  • {recommendation} ensure that the gateway is updated at least to the May 2025 update (version 3000.270) [8]
    • allows maintaining compatibility while ensuring that incremental refresh functions correctly with lakehouse destinations [8]
  • {limitation} switching from non-incremental to incremental refresh with existing overlapping data in the destination is not supported [8]
    • {recommendation} filter the initial ingestion to include only data before the earliest incremental bucket to avoid overlap and ensure correct refresh behavior [8]
  • {setting} limit number of concurrent evaluations
    • setting the value to a lower number reduces the number of requests sent to the source system [8]
    • via global settings >> Scale tab >> maximum number of parallel query evaluations
    • {recommendation} don't enable this limit unless there are issues with the source system [8]
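Conceptually, this setting behaves like a semaphore around bucket evaluations; the sketch below is illustrative only and not Fabric code:

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

# conceptual cap on parallel bucket evaluations, analogous to the
# "maximum number of parallel query evaluations" setting
MAX_PARALLEL = 4                     # a lower value means fewer concurrent requests
gate = threading.Semaphore(MAX_PARALLEL)

def evaluate_bucket(bucket_id: int) -> int:
    with gate:                       # at most MAX_PARALLEL evaluations at once
        time.sleep(0.1)              # stands in for querying the source system
        return bucket_id

with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(evaluate_bucket, range(12)))
```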

References:
[5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
[8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]
