SQL Troubles: 🏭🗒️Microsoft Fabric: Dataflows Gen2 [Notes]

10 March 2024


Disclaimer: This is a work in progress intended to consolidate information from various sources for learning purposes. For the latest information, please consult the documentation (see the links below)!

Last updated: 24-Nov-2025

Dataflow (Gen2) Architecture [4]

[Microsoft Fabric] Dataflow (Gen2)

cloud-based, low-code interface that provides a modern data integration experience allowing users to ingest, prepare and transform data from a rich set of data sources incl. databases, data warehouses, lakehouses, real-time data repositories, etc. [11]

new generation of dataflows that resides alongside the Power BI Dataflow (Gen1) [2]

brings new features, improved experience [2] and enhanced performance [11]
similar to Dataflow Gen1 in Power BI [2]
{recommendation} implement new functionality using Dataflow (Gen2) [11]

allows leveraging the many features and experiences not available in Dataflow (Gen1)

{recommendation} migrate from Dataflow (Gen1) to (Gen2) [11]

allows leveraging the modern experience and capabilities

allows to

extract data from various sources [1]
transform it using a wide range of transformation operations [1]
load it into a destination [1]

{goal} provide an easy, reusable way to perform ETL tasks using Power Query Online [1]

allows promoting reusable ETL logic

⇒ prevents the need to create more connections to the data source [1]
offers a wide variety of transformations [1]

can be horizontally partitioned

{component} Lakehouse

used to stage data being ingested

{component} Warehouse

used as a compute engine and means to write back results to staging or supported output destinations faster

{component} mashup engine

extracts, transforms, or loads the data to staging or data destinations when either [4]

warehouse compute cannot be used [4]
{limitation} staging is disabled for a query [4]

{operation} create a dataflow

can be created in a

Data Factory workload
Power BI workspace
Lakehouse

when a dataflow (Gen2) is created in a workspace, lakehouse and warehouse items are provisioned along with their related SQL analytics endpoint and semantic models [12]

shared by all dataflows in the workspace and are required for Dataflow Gen2 to operate [12]

{warning} shouldn't be deleted, and aren't intended to be used directly by users [12]
aren't visible in the workspace, but might be accessible in other experiences such as the Notebook, SQL-endpoint, Lakehouse, and Warehouse experience [12]
the items can be recognized by their prefix: 'DataflowsStaging' [12]
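
As a small illustration of the prefix rule above, the following sketch separates staging items from regular items in a list of names. The function name and the sample item names are assumptions made for the example; it doesn't call any Fabric API.

```python
STAGING_PREFIX = "DataflowsStaging"  # prefix mentioned in the notes above


def split_staging_items(item_names):
    """Separate system staging items from regular items by name prefix."""
    staging = [n for n in item_names if n.startswith(STAGING_PREFIX)]
    regular = [n for n in item_names if not n.startswith(STAGING_PREFIX)]
    return staging, regular


if __name__ == "__main__":
    # Sample names for illustration only
    names = ["DataflowsStagingLakehouse", "DataflowsStagingWarehouse", "SalesLakehouse"]
    staging, regular = split_staging_items(names)
    print("shouldn't be deleted:", staging)
    print("user items:", regular)
```

Such a check can help to avoid touching the staging items when cleaning up a workspace via script.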

{operation} set a default destination for the dataflow

helps to get started quickly by loading all queries to the same destination [14]
via ribbon or the status bar in the editor
users are prompted to choose a destination and select which queries to bind to it [14]
to update the default destination, delete the current default destination and set a new one [14]
{default} any new query has as destination the lakehouse, warehouse, or KQL database from which it got started [14]

{operation} publish a dataflow

generates the dataflow's definition

⇐ the program that runs once the dataflow is refreshed to produce tables in staging storage and/or output destination [4]
used by the dataflow engine to generate an orchestration plan, manage resources, and orchestrate execution of queries across data sources, gateways, and compute engines, and to create tables in either the staging storage or data destination [4]

saves changes and runs validations that must be performed in the background [2]

{operation} export/import dataflows [11]

also allows migrating from dataflow (Gen1) to (Gen2) [11]

{operation} refresh a dataflow

applies the transformation steps defined during authoring
can be triggered on-demand or by setting up a refresh schedule
{action} cancel refresh

enables canceling ongoing Dataflow Gen2 refreshes from the workspace items view [6]
once canceled, the dataflow's refresh history status is updated to reflect cancellation status [15]
{scenario} stop a refresh during peak time, if a capacity is nearing its limits, or if refresh is taking longer than expected [15]
it may have different outcomes

data from the last successful refresh is available [15]
data written up to the point of cancellation is available [15]

{warning} if a refresh is canceled before evaluation of a query that loads data to a destination began, there's no change to data in that query's destination [15]

{limitation} each dataflow is allowed up to 300 refreshes per 24-hour rolling window [15]

{warning} attempting 300 refreshes within a short burst (e.g., 60 seconds) may trigger throttling and result in rejected requests [15]

protections in place to ensure system reliability [15]

if the scheduled dataflow refresh fails consecutively, dataflow refresh schedule is paused and an email is sent to the owner [15]

{limitation} a single evaluation of a query has a limit of 8 hours [15]
{limitation} total refresh time of a single refresh of a dataflow is limited to a max of 24 hours [15]
{limitation} per dataflow one can have a maximum of 50 staged queries, or queries with output destination, or combination of both [15]
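
The limits listed above can be collected into a small pre-flight check. This is only a sketch under stated assumptions: the helper names are hypothetical, the rolling window is evaluated on the client side from a list of refresh start times, and the total refresh time is estimated as the sum of the query durations (i.e., queries are treated as running sequentially, which is a pessimistic simplification).

```python
from datetime import datetime, timedelta

# Limits as stated in the notes [15]
MAX_REFRESHES_PER_WINDOW = 300
WINDOW = timedelta(hours=24)
MAX_QUERY_EVALUATION = timedelta(hours=8)
MAX_TOTAL_REFRESH = timedelta(hours=24)
MAX_STAGED_OR_DESTINATION_QUERIES = 50


def refreshes_in_window(refresh_times, now):
    """Count refreshes that started within the 24 hours before `now`."""
    return sum(1 for t in refresh_times if now - WINDOW < t <= now)


def can_request_refresh(refresh_times, now):
    """True if one more refresh would stay within the rolling-window limit."""
    return refreshes_in_window(refresh_times, now) < MAX_REFRESHES_PER_WINDOW


def check_design(query_durations, staged_or_destination_queries):
    """Return a list of limit violations for a planned dataflow."""
    problems = []
    if staged_or_destination_queries > MAX_STAGED_OR_DESTINATION_QUERIES:
        problems.append("too many staged/destination queries")
    if any(d > MAX_QUERY_EVALUATION for d in query_durations):
        problems.append("a query evaluation exceeds 8 hours")
    # Assumption: sequential execution, so durations are summed
    if sum(query_durations, timedelta()) > MAX_TOTAL_REFRESH:
        problems.append("total refresh time exceeds 24 hours")
    return problems


if __name__ == "__main__":
    print(check_design([timedelta(hours=9), timedelta(hours=2)], 12))
```

Note that staying below 300 refreshes per window doesn't protect against throttling when the requests arrive in a short burst.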

{operation} copy and paste code in Power Query [11]

allows migrating dataflow (Gen1) to (Gen2) [11]

{operation} save a dataflow [11]

via 'Save As' feature
can be used to save a dataflow (Gen1) as (Gen2) dataflow [11]

{operation} save a dataflow as draft

allows making changes to dataflows without immediately publishing them to a workspace [13]

can be later reviewed, and then published, if needed [13]

{operation} publish draft dataflow

performed as a background job [13]
publishing related errors are visible next to the dataflow's name [13]

selecting the indication reveals the publishing errors and allows editing the dataflow from the last saved version [13]

{operation} run a dataflow

can be performed

manually
on a refresh schedule
as part of a Data Pipeline orchestration

{operation} monitor pipeline runs

allows checking the pipelines' status, spotting issues early, and troubleshooting them
[Workspace Monitoring] provides log-level visibility for all items in a workspace [link]

via Workspace Settings >> select Monitoring

[Monitoring Hub] serves as a centralized portal for browsing pipeline runs across items within the Data Factory or Data Engineering experience [link]

{feature} connect multiple activities in a pipeline [11]

allows building end-to-end, automated data workflows

{feature} author dataflows with Power Query

uses the full Power Query experience of Power BI dataflows [2]

{feature} shorter authoring flow

uses a step-by-step experience for getting the data into the dataflow [2]

the number of steps required to create dataflows was reduced [2]

a few new features were added to improve the experience [2]

{feature} AutoSave and background publishing

changes made to a dataflow are autosaved to the cloud (aka draft version of the dataflow) [2]

⇐ without having to wait for the validation to finish [2]

{functionality} save as draft

stores a draft version of the dataflow every time you make a change [2]
seamless experience and doesn't require any input [2]

{concept} published version

the version of the dataflow that passed validation and is ready to refresh [5]

{feature} integration with data pipelines

integrates directly with Data Factory pipelines for scheduling and orchestration [2]

{feature} high-scale compute

leverages a new, higher-scale compute architecture [2]

improves the performance of both transformations of referenced queries and get data scenarios [2]
creates both Lakehouse and Warehouse items in the workspace, and uses them to store and access data to improve performance for all dataflows [2]

{feature} improved monitoring and refresh history

integrates support for Monitoring Hub [2]
Refresh History experience upgraded [2]

{feature} get data via Dataflows connector

supports a wide variety of data source connectors

include cloud and on-premises relational databases

{feature} incremental refresh

enables incrementally extracting data from data sources, applying Power Query transformations, and loading the results into various output destinations [5]

{feature} data destinations

allows to

specify an output destination
separate ETL logic and destination storage [2]

every tabular data query can have a data destination [3]

available destinations

Azure SQL databases
Azure Data Explorer (Kusto)
Fabric Lakehouse
Fabric Warehouse
Fabric KQL database

a destination can be specified for every query individually [3]
multiple different destinations can be used within a dataflow [3]
connecting to the data destination is similar to connecting to a data source
{limitation} functions and lists aren't supported

{operation} creating a new table

{default} table name has the same name as the query name.

{operation} picking an existing table
{operation} deleting a table manually from the data destination

doesn't recreate the table on the next refresh [3]

{operation} reusing queries from Dataflow Gen1

{method} export Dataflow Gen1 query and import it into Dataflow Gen2

export the queries as a PQT file and import them into Dataflow Gen2 [2]

{method} copy and paste in Power Query

copy the queries and paste them in the Dataflow Gen2 editor [2]

{feature} automatic settings:

{limitation} supported only for Lakehouse and Azure SQL database
{setting} Update method replace:

data in the destination is replaced at every dataflow refresh with the output data of the dataflow [3]

{setting} Managed mapping:

when the query changes (e.g., a column is added or a data type is changed), the mapping is automatically adjusted when republishing the dataflow

⇒ doesn't need to be updated manually into the data destination experience every time changes occur [3]

{setting} Drop and recreate table:

on every dataflow refresh the table is dropped and recreated to allow schema changes
{limitation} the dataflow refresh fails if any relationships or measures were added to the table [3]

{feature} update methods

{method} replace

on every dataflow refresh, the data is dropped from the destination and replaced by the output data of the dataflow.
{limitation} not supported by Fabric KQL databases and Azure Data Explorer

{method} append

on every dataflow refresh, the output data from the dataflow is appended to the existing data in the data destination table; existing rows are kept as they are, so this is not a merge/upsert
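
The difference between the two update methods can be illustrated with plain Python lists standing in for the destination table. It's a minimal sketch, assuming that append simply adds the output rows after the existing ones without any deduplication; the function name is hypothetical and the real write is performed by the dataflow engine.

```python
def apply_update(existing_rows, output_rows, method):
    """Return the destination rows after a refresh under the given update method."""
    if method == "replace":
        # destination data is dropped and replaced by the dataflow output
        return list(output_rows)
    if method == "append":
        # output rows are added after the existing rows; nothing is deduplicated
        return list(existing_rows) + list(output_rows)
    raise ValueError(f"unknown update method: {method}")


if __name__ == "__main__":
    existing = [("A", 1)]
    output = [("A", 1), ("B", 2)]
    print(apply_update(existing, output, "replace"))
    print(apply_update(existing, output, "append"))
```

Under this assumption, refreshing the same source data twice with append leaves duplicate rows in the destination, which is why incremental loading patterns usually filter the output before appending.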

{feature} data staging

{default} enabled

allows using Fabric compute to execute queries

⇐ enhances the performance of query processing

the data is loaded into the staging location

⇐ an internal Lakehouse location accessible only by the dataflow itself

[Warehouse] staging is required before the write operation to the data destination

⇐ improves performance
{limitation} only loading into the same workspace as the dataflow is supported

using staging locations can enhance performance in some cases

disabled

{recommendation} [Lakehouse] disable staging on the query to avoid loading twice into a similar destination

⇐ once for staging and once for data destination
improves dataflow's performance

{scenario} use a dataflow to load data into the lakehouse and then use a notebook to analyze the data [2]
{scenario} use a dataflow to load data into an Azure SQL database and then use a data pipeline to load the data into a data warehouse [2]

{feature} Fast Copy

allows ingesting terabytes of data with the easy experience and the scalable back-end of the pipeline Copy Activity [7]

enables large-scale data ingestion directly utilizing the pipelines Copy Activity capability [6]
supports sources such as Azure SQL Databases, CSV, and Parquet files in Azure Data Lake Storage and Blob Storage [6]
significantly scales up the data processing capacity providing high-scale ELT capabilities

the feature must be enabled [7]

after enabling, Dataflows automatically switch the back-end when data size exceeds a particular threshold [7]
⇐ there's no need to change anything during authoring of the dataflows
one can check the refresh history to see if fast copy was used [7]
⇐ see the Engine type
{option} Require fast copy

{prerequisite} Fabric capacity is available [7]

requires a Fabric capacity or a Fabric trial capacity [11]

{prerequisite} data files

are in CSV or Parquet format
are at least 100 MB in size
are stored in an ADLS Gen2 or a Blob storage account [6]

{prerequisite} [Azure SQL DB|PostgreSQL] >= 5 million rows in the data source [7]
{limitation} doesn't support [7]

the VNet gateway
writing data into an existing table in Lakehouse
fixed schema
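
The size and row-count prerequisites above can be expressed as a simple eligibility check. This is a rough sketch based only on the thresholds in these notes: the function and constant names are hypothetical, 100 MB is interpreted as binary megabytes, and sources not listed above are treated as not eligible, which may be stricter than the actual service behavior.

```python
MIN_FILE_SIZE_BYTES = 100 * 1024 * 1024  # "at least 100 MB"; binary megabytes assumed
MIN_DB_ROWS = 5_000_000                  # threshold for database sources
FILE_FORMATS = {"csv", "parquet"}
DB_SOURCES = {"azure sql db", "postgresql"}


def meets_fast_copy_prerequisites(source_kind, *, file_format=None, size_bytes=0, row_count=0):
    """Check the prerequisites listed in the notes for a single source."""
    kind = source_kind.lower()
    if kind == "file":
        fmt = (file_format or "").lower().lstrip(".")
        return fmt in FILE_FORMATS and size_bytes >= MIN_FILE_SIZE_BYTES
    if kind in DB_SOURCES:
        return row_count >= MIN_DB_ROWS
    # Assumption: anything else is considered not eligible in this sketch
    return False


if __name__ == "__main__":
    print(meets_fast_copy_prerequisites("file", file_format="parquet", size_bytes=250 * 1024 * 1024))
    print(meets_fast_copy_prerequisites("PostgreSQL", row_count=1_000_000))
```

The check covers only the data-related prerequisites; capacity availability and the listed limitations (e.g., the VNet gateway) would need to be verified separately.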

{feature} parameters

allow dynamically controlling and customizing dataflows

makes them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [9]
the dataflow is refreshed by passing parameter values outside of the Power Query editor through either

Fabric REST API [9]
native Fabric experiences [9]

parameter names are case sensitive [9]
{type} required parameters

{warning} the refresh fails if no value is passed for it [9]

{type} optional parameters
enabled via Parameters >> Enable parameters to be discovered and override for execution [9]

{limitation} dataflows with parameters can't be

scheduled for refresh through the Fabric scheduler [9]
manually triggered through the Fabric Workspace list or lineage view [9]

{limitation} parameters that affect the resource path of a data source or a destination are not supported [9]

⇐connections are linked to the exact data source path defined in the authored dataflow

can't currently be overridden to use other connections or resource paths [9]

{limitation} can't be leveraged by dataflows with incremental refresh [9]
{limitation} only parameters of the type decimal number, whole number, text, and true/false can be passed for override

any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [9]

{warning} allow other users who have permissions to the dataflow to refresh the data with other values [9]
{limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [9]
{limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [9]
{limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [9]
{limitation} only the first request will be accepted from duplicated requests for the same parameter values [9]

subsequent requests are rejected until the first request finishes its evaluation [9]
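
Several of the rules above (case-sensitive names, required parameters, supported types, rejection of duplicated requests) can be checked on the caller's side before a refresh is requested. The sketch below is only an illustration of those rules: the shape of `declared`, the helper names, and the `DuplicateGuard` class are assumptions made for the example, and no Fabric REST API call is performed.

```python
# true/false, whole number, decimal number, text
SUPPORTED_TYPES = (bool, int, float, str)


def validate_overrides(declared, overrides):
    """Check override values against the declared parameters.

    declared:  dict name -> {"required": bool} (hypothetical description of the parameters)
    overrides: dict name -> value to pass for the refresh
    Returns a list of problems; an empty list means the overrides look acceptable.
    """
    problems = []
    for name, value in overrides.items():
        if name not in declared:  # exact match, as names are case sensitive
            problems.append(f"unknown parameter: {name}")
        elif not isinstance(value, SUPPORTED_TYPES):
            problems.append(f"unsupported type for {name}: {type(value).__name__}")
    for name, spec in declared.items():
        if spec.get("required") and name not in overrides:
            problems.append(f"missing required parameter: {name}")
    return problems


class DuplicateGuard:
    """Accept only the first of several in-flight requests with identical values."""

    def __init__(self):
        self._in_flight = set()

    def submit(self, overrides):
        key = tuple(sorted(overrides.items()))
        if key in self._in_flight:
            return False  # rejected until the first request finishes
        self._in_flight.add(key)
        return True

    def finish(self, overrides):
        self._in_flight.discard(tuple(sorted(overrides.items())))


if __name__ == "__main__":
    declared = {"Region": {"required": True}, "Limit": {"required": False}}
    print(validate_overrides(declared, {"region": "EU"}))
```

Validating up front is useful because, per the notes, unsupported types don't even produce a refresh request in the refresh history, which makes such failures harder to trace afterwards.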

{feature} support for CI/CD and Git integration

allows creating, editing, and managing dataflows in a Git repository that's connected to a Fabric workspace [10]
allows using deployment pipelines to automate the deployment of dataflows between workspaces [10]
allows using Public APIs to create and manage Dataflow Gen2 with CI/CD and Git integration [10]
allows creating Dataflow Gen2 directly in a workspace folder [10]
allows using the Fabric settings and scheduler to refresh and edit settings for Dataflow Gen2 [10]
{action} save a dataflow

replaces the publish operation

when saving the dataflow, it automatically publishes the changes to the dataflow [10]

{action} delete a dataflow

the staging artifacts become visible in the workspace and are safe to be deleted [10]

{action} schedule a refresh

can be done manually or by scheduling a refresh [10]
{limitation} the Workspace view doesn't show if a refresh is ongoing for the dataflow [10]
refresh information is available in the refresh history [10]

{action} branching out to another workspace

{limitation} the refresh can fail with the message that the staging lakehouse couldn't be found [10]
{workaround} create a new Dataflow Gen2 with CI/CD and Git support in the workspace to trigger the creation of the staging lakehouse [10]
⇐ all other dataflows in the workspace should start to function again.

{action} syncing changes from Git into the workspace

requires opening the new or updated dataflow and saving the changes manually with the editor [10]

triggers a publish action in the background to allow the changes to be used during refresh of the dataflow [10]

[Power Automate] {limitation} the connector for dataflows isn't working [10]

{feature} Copilot for Dataflow Gen2

provides AI-powered assistance for creating data integration solutions using natural language prompts [11]
{benefit} helps streamline the dataflow development process by allowing users to use conversational language to perform data transformations and operations [11]

{benefit} enhances flexibility by allowing dynamic adjustments without altering the dataflow itself [9]
{benefit} extends data with consistent data, such as a standard date dimension table [1]
{benefit} allows self-service users access to a subset of data warehouse separately [1]
{benefit} optimizes performance with dataflows, which enable extracting data once for reuse, reducing data refresh time for slower sources [1]
{benefit} simplifies data source complexity by only exposing dataflows to larger analyst groups [1]
{benefit} ensures consistency and quality of data by enabling users to clean and transform data before loading it to a destination [1]
{benefit} simplifies data integration by providing a low-code interface that ingests data from various sources [1]
{limitation} not a replacement for a data warehouse [1]
{limitation} row-level security isn't supported [1]
{limitation} Fabric or Fabric trial capacity workspace is required [1]

Feature	Dataflow Gen2	Dataflow Gen1
Author dataflows with Power Query	✓	✓
Shorter authoring flow	✓
Auto-Save and background publishing	✓
Data destinations	✓
Improved monitoring and refresh history	✓
Integration with data pipelines	✓
High-scale compute	✓
Get Data via Dataflows connector	✓	✓
Direct Query via Dataflows connector		✓
Incremental refresh	✓*
Fast Copy	✓*
Cancel refresh	✓*
AI Insights support		✓

Dataflow Gen1 vs Gen2 [2]


References:
[1] Microsoft Learn (2023) Fabric: Ingest data with Microsoft Fabric [link]
[2] Microsoft Learn (2023) Fabric: Getting from Dataflow Generation 1 to Dataflow Generation 2 [link]
[3] Microsoft Learn (2023) Fabric: Dataflow Gen2 data destinations and managed settings [link]
[4] Microsoft Learn (2023) Fabric: Dataflow Gen2 pricing for Data Factory in Microsoft Fabric [link]
[5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
[6] Microsoft Learn (2023) Fabric: What's new and planned for Data Factory in Microsoft Fabric [link]
[7] Microsoft Learn (2023) Fabric: Fast copy in Dataflows Gen2 [link]
[8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]
[9] Microsoft Learn (2025) Fabric: Use public parameters in Dataflow Gen2 (Preview) [link]
[10] Microsoft Learn (2025) Fabric: Dataflow Gen2 with CI/CD and Git integration support [link]
[11] Microsoft Learn (2025) Fabric: What is Dataflow Gen2? [link]
[12] Microsoft Learn (2025) Fabric: Use a dataflow in a pipeline [link]
[13] Microsoft Learn (2025) Fabric: Save a draft of your dataflow [link]
[14] Microsoft Learn (2025) Fabric: Dataflow destinations and managed settings [link]
[15] Microsoft Learn (2025) Fabric: Dataflow refresh [link]

Resources:
[R1] Arshad Ali & Bradley Schacht (2024) Learn Microsoft Fabric [link]
[R2] Microsoft Learn: Fabric (2023) Data Factory limitations overview [link]
[R3] Microsoft Fabric Blog (2023) Data Factory Spotlight: Dataflow Gen2, by Miguel Escobar [link]
[R4] Microsoft Learn (2023) Fabric: Dataflow Gen2 connectors in Microsoft Fabric [link]
[R5] Microsoft Learn (2023) Fabric: Pattern to incrementally amass data with Dataflow Gen2 [link]
[R6] Fourmoo (2004) Microsoft Fabric – Comparing Dataflow Gen2 vs Notebook on Costs and usability, by Gilbert Quevauvilliers [link]
[R7] Microsoft Learn: Fabric (2023) A guide to Fabric Dataflows for Azure Data Factory Mapping Data Flow users [link]
[R8] Microsoft Learn: Fabric (2023) Quickstart: Create your first dataflow to get and transform data [link]
[R9] Microsoft Learn: Fabric (2023) Microsoft Fabric decision guide: copy activity, dataflow, or Spark [link]
[R10] Microsoft Fabric Blog (2023) Dataflows Gen2 data destinations and managed settings, by Miquella de Boer [link]
[R11] Microsoft Fabric Blog (2023) Service principal support to connect to data in Dataflow, Datamart, Dataset and Dataflow Gen 2, by Miquella de Boer [link]
[R12] Chris Webb's BI Blog (2023) Fabric Dataflows Gen2: To Stage Or Not To Stage? [link]
[R13] Power BI Tips (2023) Let's Learn Fabric ep.7: Fabric Dataflows Gen2 [link]
[R14] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
[R15] Microsoft Fabric Blog (2023) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

Acronyms:
ADLS - Azure Data Lake Storage

CI/CD - Continuous Integration/Continuous Deployment

ETL - Extract, Transform, Load

KQL - Kusto Query Language
PQO - Power Query Online
PQT - Power Query Template
