Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!
Last updated: 24-Nov-2025
[Microsoft Fabric] Incremental Refresh in Dataflows Gen2
- {feature} enables incrementally extracting data from data sources, applying Power Query transformations, and loading the results into various output destinations [5]
- designed to reduce the amount of data that needs to be processed and retrieved from the source system [8]
- configurable directly in the dataflow editor [8]
- there's no need to specify the historical data range [8]
- ⇐ the dataflow doesn't remove any data from the destination that's outside the bucket range [8]
- there's no need to specify the parameters for the incremental refresh [8]
- the filters and parameters are automatically added as the last step in the query [8]
- {prerequisite} the data source
- supports folding [8]
- needs to contain a Date/DateTime column that can be used to filter the data [8]
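- {example} a minimal sketch in Power Query M of a source query that satisfies both prerequisites, assuming a hypothetical Azure SQL source with a dbo.Orders table and a ModifiedDate DateTime column (all names are illustrative)
    let
        // relational source that supports query folding (server/database names are hypothetical)
        Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
        // navigate to a table exposing a DateTime column that incremental refresh can filter on
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data]
    in
        Orders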
- {prerequisite} the data destination supports incremental refresh [8]
- available destinations
- Fabric Warehouse
- Azure SQL Database
- Azure Synapse Analytics
- Fabric Lakehouse [preview]
- other destinations can be used in combination with incremental refresh by using a second query that references the staged data to update the data destination [8]
- this still allows incremental refresh to reduce the amount of data that needs to be processed and retrieved from the source system [8]
- a full refresh from the staged data to the data destination is still needed [8]
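- {example} a minimal sketch of the second-query pattern in Power Query M, assuming a staged query named StagedOrders that has incremental refresh enabled; the referencing query below performs the full load into the otherwise unsupported destination (the query name is hypothetical)
    let
        // reference the staged query by name; only StagedOrders is refreshed incrementally,
        // while this query pushes the complete staged result to its data destination
        Source = StagedOrders
    in
        Source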
- works by dividing the data into buckets based on a DateTime column [8]
- each bucket contains the data that changed since the last refresh [8]
- the dataflow knows what changed by checking the maximum value in the specified column
- if the maximum value changed for that bucket, the dataflow retrieves the whole bucket and replaces the data in the destination [8]
- if the maximum value didn't change, the dataflow doesn't retrieve any data [8]
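- {example} a conceptual sketch in Power Query M of what is effectively evaluated per bucket; the bucket filter and the maximum-value check are added automatically by the dataflow and aren't authored by the user (source, table, and column names are hypothetical)
    let
        Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
        // one daily bucket: 23-Nov-2025 00:00 (inclusive) up to 24-Nov-2025 00:00 (exclusive)
        Bucket = Table.SelectRows(Orders, each [ModifiedDate] >= #datetime(2025, 11, 23, 0, 0, 0)
            and [ModifiedDate] < #datetime(2025, 11, 24, 0, 0, 0)),
        // the maximum of the change-detection column is compared with the value stored from
        // the previous refresh; the bucket is re-extracted only if this value changed
        MaxChange = List.Max(Bucket[ModifiedDate])
    in
        MaxChange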
- {setting|required} bucket size
- specifies the size of the buckets that the dataflow uses to filter data [8]
- smaller bucket size
- the dataflow processes less data in each iteration, but requires more iterations to process all the data [8]
- larger bucket size
- the dataflow processes more data in each iteration, but requires fewer iterations to process all the data [8]
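- {example} a small Power Query M sketch that only makes the trade-off visible: a 14-day range split into daily buckets yields 14 bucket start dates, whereas a monthly bucket would cover the same range in a single bucket (the range is illustrative)
    let
        // 14 daily bucket boundaries starting 10-Nov-2025; a larger bucket size means fewer buckets
        DailyBucketStarts = List.Dates(#date(2025, 11, 10), 14, #duration(1, 0, 0, 0))
    in
        DailyBucketStarts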
- {setting} only extract new data when the maximum value in this column changes
- specifies the column that the dataflow uses to determine if data changed [8]
- the dataflow compares the maximum value to the maximum value from the previous refresh [8]
- if the maximum value changed, the dataflow gets the data that changed since the last refresh [8]
- if the maximum value didn't change, the dataflow doesn't get any data [8]
- {setting|optional} only extract data for concluded periods
- when enabled, the dataflow extracts data only for periods that have fully concluded (e.g. with a monthly bucket, the current month is extracted only after it ends) [8]
- when disabled, the dataflow extracts data for all periods, including periods that aren't complete yet and may contain future data [8]
- {setting} require incremental refresh query to fully fold [8]
- controls whether the incremental refresh query must 'fully fold' [8]
- ⇒ it gets pushed down entirely to the source system for processing [8]
- {recommendation} keep this setting enabled
- after saving the dataflow, it validates whether query folding to the source is possible
- if the validation fails, the dataflow may suffer from reduced performance and could end up retrieving unnecessary, unfiltered data [8]
- when enabled the query must fully fold [8]
- when disabled, the query can be partially processed by the dataflow instead of the source system [8]
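- {example} a minimal Power Query M sketch contrasting a step that folds with one that breaks folding, using the same hypothetical SQL source as above; with this setting enabled, a query ending in a non-folding step fails validation
    let
        Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
        // folds: the filter translates to a WHERE clause and is pushed to the source
        Shipped = Table.SelectRows(Orders, each [Status] = "Shipped"),
        // breaks folding by design: Table.Buffer materializes the data locally, so later steps
        // (including the incremental refresh filter) can no longer be pushed to the source
        Buffered = Table.Buffer(Shipped)
    in
        Buffered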
- {limitation} the maximum number of concurrent evaluations: 10 [8]
- the dataflow can only evaluate 10 buckets at the same time [8]
- when more than 10 buckets are needed, either the number of buckets or the number of concurrent evaluations must be limited [8]
- {limitation} [lakehouse] if other tools (like Spark) or processes also write to the same table, they can interfere with incremental refresh [8]
- {recommendation} avoid using other writers while using incremental refresh [8]
- {recommendation} make sure other writers don't conflict with the incremental refresh process [8]
- {limitation} table maintenance operations aren't supported for tables that use incremental refresh [8]
- e.g. OPTIMIZE/REORG TABLE
- {limitation} the data destination must be set to a fixed schema [8]
- ⇒ table's schema in the data destination must be fixed and can't change [8]
- ⇒ dynamic schema must be changed to fixed schema before configuring incremental refresh [8]
- {limitation} the only supported update method in the data destination: replace
- ⇒ the dataflow replaces the data for each bucket in the data destination with the new data [8]
- data that is outside the bucket range isn't affected [8]
- {limitation} maximum number of buckets
- single query: 50
- {workaround} increase the bucket size or reduce the bucket range to lower the number of buckets (e.g. two years of daily buckets is roughly 730 buckets, while monthly buckets over the same range give 24) [8]
- whole dataflow: 150
- {workaround} reduce the number of incremental refresh queries or increase the bucket size [8]
- {downside} the dataflow may take longer to refresh after enabling incremental refresh [8]
- because the additional overhead of checking if data changed and processing the buckets is higher than the time saved by processing less data [8]
- {recommendation} review the settings for incremental refresh and adjust them to better fit the scenario
- {option} increase the bucket size to reduce the number of buckets and the overhead of processing them [8]
- {option} disable incremental refresh [8]
- {recommendation} don't use the column used for detecting changes also for filtering the data [8]
- because this can lead to unexpected results [8]
- {recommendation} ensure that the gateway is updated to the May 2025 version (3000.270) or later [8]
- this maintains compatibility and ensures that incremental refresh functions correctly with lakehouse destinations [8]
- {limitation} switching from non-incremental to incremental refresh with existing overlapping data in the destination is not supported [8]
- {recommendation} filter the initial ingestion to include only data before the earliest incremental bucket to avoid overlap and ensure correct refresh behavior [8]
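- {example} a minimal Power Query M sketch of such an initial-ingestion filter, assuming incremental refresh will later cover data from 1-Jan-2025 onward (the cutoff, source, and column names are hypothetical)
    let
        Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
        // load only history strictly before the earliest incremental bucket to avoid overlap
        HistoryOnly = Table.SelectRows(Orders, each [ModifiedDate] < #datetime(2025, 1, 1, 0, 0, 0))
    in
        HistoryOnly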
- {setting} limit number of concurrent evaluations
- setting the value to a lower number reduces the number of requests sent to the source system [8]
- via global settings >> Scale tab >> maximum number of parallel query evaluations
- {recommendation} don't enable this limit unless there are issues with the source system [8]
References:
[5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
[8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]