Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!
Last updated: 24-Nov-2025
[Microsoft Fabric] Incremental Refresh in Dataflows Gen2
- {feature} enables incrementally extracting data from data sources, applying Power Query transformations, and loading the results into various output destinations [5]
- designed to reduce the amount of data that needs to be processed and retrieved from the source system [8]
- configurable directly in the dataflow editor [8]
- there's no need to specify the historical data range [8]
- ⇐ the dataflow doesn't remove any data from the destination that's outside the bucket range [8]
- there's no need to specify the parameters for the incremental refresh [8]
- the filters and parameters are automatically added as the last step in the query [8]
- {prerequisite} the data source
- supports folding [8]
- needs to contain a Date/DateTime column that can be used to filter the data [8]
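- {example} a minimal sketch in Power Query M of a source query that satisfies both prerequisites, assuming a hypothetical Azure SQL source with a dbo.Orders table and a ModifiedDate DateTime column (all names are illustrative)
    let
        // relational source that supports query folding (server/database names are hypothetical)
        Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
        // navigate to a table exposing a DateTime column that incremental refresh can filter on
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data]
    in
        Orders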
- {prerequisite} the data destination supports incremental refresh [8]
- available destinations
- Fabric Warehouse
- Azure SQL Database
- Azure Synapse Analytics
- Fabric Lakehouse [preview]
- other destinations can be used in combination with incremental refresh by using a second query that references the staged data to update the data destination [8]
- this still allows incremental refresh to reduce the amount of data that needs to be processed and retrieved from the source system [8]
- a full refresh from the staged data to the data destination is still needed [8]
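- {example} a minimal sketch of the second-query pattern in Power Query M, assuming a staged query named StagedOrders that has incremental refresh enabled; the referencing query below performs the full load into the otherwise unsupported destination (the query name is hypothetical)
    let
        // reference the staged query by name; only StagedOrders is refreshed incrementally,
        // while this query pushes the complete staged result to its data destination
        Source = StagedOrders
    in
        Source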
- works by dividing the data into buckets based on a DateTime column [8]
- each bucket contains the data that changed since the last refresh [8]
- the dataflow knows what changed by checking the maximum value in the specified column
- if the maximum value changed for that bucket, the dataflow retrieves the whole bucket and replaces the data in the destination [8]
- if the maximum value didn't change, the dataflow doesn't retrieve any data [8]
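- {example} a conceptual sketch in Power Query M of what is effectively evaluated per bucket; the bucket filter and the maximum-value check are added automatically by the dataflow and aren't authored by the user (source, table, and column names are hypothetical)
    let
        Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
        // one daily bucket: 23-Nov-2025 00:00 (inclusive) up to 24-Nov-2025 00:00 (exclusive)
        Bucket = Table.SelectRows(Orders, each [ModifiedDate] >= #datetime(2025, 11, 23, 0, 0, 0)
            and [ModifiedDate] < #datetime(2025, 11, 24, 0, 0, 0)),
        // the maximum of the change-detection column is compared with the value stored from
        // the previous refresh; the bucket is re-extracted only if this value changed
        MaxChange = List.Max(Bucket[ModifiedDate])
    in
        MaxChange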
- {setting|required} bucket size
- specifies the size of the buckets that the dataflow uses to filter data [8]
- smaller bucket size
- the dataflow processes less data in each iteration, but requires more iterations to process all the data [8]
- larger bucket size
- the dataflow processes more data in each iteration, but requires fewer iterations to process all the data [8]
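- {example} a small Power Query M sketch that only makes the trade-off visible: a 14-day range split into daily buckets yields 14 bucket start dates, whereas a monthly bucket would cover the same range in a single bucket (the range is illustrative)
    let
        // 14 daily bucket boundaries starting 10-Nov-2025; a larger bucket size means fewer buckets
        DailyBucketStarts = List.Dates(#date(2025, 11, 10), 14, #duration(1, 0, 0, 0))
    in
        DailyBucketStarts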
- {setting} only extract new data when the maximum value in this column changes
- specifies the column that the dataflow uses to determine if data changed [8]
- the dataflow compares the maximum value to the maximum value from the previous refresh [8]
- if the maximum value changed, the dataflow gets the data that changed since the last refresh [8]
- if the maximum value didn't change, the dataflow doesn't get any data [8]
- {setting|optional} only extract data for concluded periods
- when enabled, the dataflow extracts data only for periods that have fully concluded (e.g. with a monthly bucket, the current month is extracted only after it ends) [8]
- when disabled, the dataflow extracts data for all periods, including periods that aren't complete yet and may contain future data [8]
- {setting} require incremental refresh query to fully fold [8]
- controls whether the incremental refresh query must 'fully fold' [8]
- ⇒ it gets pushed down entirely to the source system for processing [8]
- {recommendation} keep this setting enabled
- after saving the dataflow, it validates whether query folding to the source is possible
- if the validation fails, the dataflow may suffer from reduced performance and could end up retrieving unnecessary, unfiltered data [8]
- when enabled the query must fully fold [8]
- when disabled, the query can be partially processed by the dataflow instead of the source system [8]
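- {example} a minimal Power Query M sketch contrasting a step that folds with one that breaks folding, using the same hypothetical SQL source as above; with this setting enabled, a query ending in a non-folding step fails validation
    let
        Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
        // folds: the filter translates to a WHERE clause and is pushed to the source
        Shipped = Table.SelectRows(Orders, each [Status] = "Shipped"),
        // breaks folding by design: Table.Buffer materializes the data locally, so later steps
        // (including the incremental refresh filter) can no longer be pushed to the source
        Buffered = Table.Buffer(Shipped)
    in
        Buffered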
- {limitation} the maximum number of concurrent evaluations: 10 [8]
- the dataflow can only evaluate 10 buckets at the same time [8]
- when more than 10 buckets are needed, either the number of buckets or the number of concurrent evaluations must be limited [8]
- {limitation} [lakehouse] if other tools (like Spark) or processes also write to the same table, they can interfere with incremental refresh [8]
- {recommendation} avoid using other writers while using incremental refresh [8]
- {recommendation} make sure other writers don't conflict with the incremental refresh process [8]
- {limitation} table maintenance operations aren't supported for tables that use incremental refresh [8]
- e.g. OPTIMIZE/REORG TABLE
- {limitation} the data destination must be set to a fixed schema [8]
- ⇒ table's schema in the data destination must be fixed and can't change [8]
- ⇒ dynamic schema must be changed to fixed schema before configuring incremental refresh [8]
- {limitation} the only supported update method in the data destination: replace
- ⇒ the dataflow replaces the data for each bucket in the data destination with the new data [8]
- data that is outside the bucket range isn't affected [8]
- {limitation} maximum number of buckets
- single query: 50
- {workaround} increase the bucket size or reduce the bucket range to lower the number of buckets (e.g. two years of daily buckets is roughly 730 buckets, while monthly buckets over the same range give 24) [8]
- whole dataflow: 150
- {workaround} reduce the number of incremental refresh queries or increase the bucket size [8]
- {downside} the dataflow may take longer to refresh after enabling incremental refresh [8]
- because the additional overhead of checking if data changed and processing the buckets is higher than the time saved by processing less data [8]
- {recommendation} review the settings for incremental refresh and adjust them to better fit the scenario
- {option} increase the bucket size to reduce the number of buckets and the overhead of processing them [8]
- {option} disable incremental refresh [8]
- {recommendation} don't use the column used for detecting changes also for filtering the data [8]
- because this can lead to unexpected results [8]
- {recommendation} ensure that the gateway is updated to the May 2025 version (3000.270) or later [8]
- this maintains compatibility and ensures that incremental refresh functions correctly with lakehouse destinations [8]
- {limitation} switching from non-incremental to incremental refresh with existing overlapping data in the destination is not supported [8]
- {recommendation} filter the initial ingestion to include only data before the earliest incremental bucket to avoid overlap and ensure correct refresh behavior [8]
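- {example} a minimal Power Query M sketch of such an initial-ingestion filter, assuming incremental refresh will later cover data from 1-Jan-2025 onward (the cutoff, source, and column names are hypothetical)
    let
        Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
        // load only history strictly before the earliest incremental bucket to avoid overlap
        HistoryOnly = Table.SelectRows(Orders, each [ModifiedDate] < #datetime(2025, 1, 1, 0, 0, 0))
    in
        HistoryOnly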
- {setting} limit number of concurrent evaluations
- setting the value to a lower number reduces the number of requests sent to the source system [8]
- via global settings >> Scale tab >> maximum number of parallel query evaluations
- {recommendation} don't enable this limit unless there are issues with the source system [8]
References:
[5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
[8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]