Showing posts with label features. Show all posts

07 May 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 181: How the Early Layers in Deep Learning Models Can Detect Simple Features)

Prompt Engineering Series

Prompt: "write a post of 600 words on how the early layers in deep learning models can detect simple features in the context of AI"

Introduction

One of the most elegant aspects of deep learning is the way its internal structure mirrors the layered complexity of the world. At the foundation of this architecture lie the early layers - components that specialize in detecting simple, low‑level features. These layers may seem modest compared to the deeper, more abstract parts of the network, but they play a crucial role. They act as the sensory foundation upon which all higher‑level understanding is built. Understanding how these early layers work reveals not only the mechanics of deep learning but also why these models are so effective at capturing patterns that humans often overlook.

The first key insight is that early layers operate as feature detectors, identifying the most basic building blocks of a signal. In image models, these features include edges, corners, textures, and simple color gradients. In language models, they correspond to character patterns, subword fragments, punctuation structures, and basic syntactic cues. These features are not meaningful on their own, but they form the raw material from which meaning emerges. Just as the human visual system begins by detecting edges before recognizing objects, deep learning models begin by identifying simple patterns before constructing complex representations.

A second important aspect is how these early layers learn. They are not programmed to detect specific features. Instead, they discover them automatically through training. When a model is exposed to large amounts of data, the early layers adjust their parameters to capture the most statistically useful patterns. In images, edges are among the most informative features because they define boundaries and shapes. In text, character sequences and word fragments are essential for understanding structure. The model learns these features because they consistently help reduce prediction error. This self‑organization is one of the reasons deep learning is so powerful: the model discovers the right features without human intervention.

Another strength of early layers is their universality. The simple features they detect tend to be useful across many tasks. An edge detector trained on one dataset will often work well on another. This is why transfer learning is so effective. When a model trained on millions of images is fine‑tuned for a new task, the early layers usually remain unchanged. They provide a stable foundation of general-purpose features, while the deeper layers adapt to the specifics of the new problem. This mirrors biological systems, where early sensory processing is largely universal, and higher-level interpretation is specialized.

Early layers also excel at capturing local patterns, which is essential for building more complex representations. In convolutional neural networks, for example, early filters scan small regions of an image, detecting local structures. These local features are then combined by deeper layers to form larger, more abstract patterns - textures, shapes, and eventually full objects. In language models, early layers capture local dependencies between characters or words, which deeper layers then assemble into phrases, sentences, and semantic relationships. This hierarchical composition is what allows deep learning models to scale from simple signals to sophisticated understanding.
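To make the idea of an early filter scanning small regions concrete, here is a minimal sketch in plain Python. It assumes a hand-written 3x3 Sobel kernel and a toy image; in a trained network the filter weights are learned rather than written by hand, so this is only an analogy for what such a filter ends up computing.

```python
# Hand-written Sobel kernel that responds to vertical edges.
# Assumption: real early-layer filters are learned during training;
# this fixed kernel only illustrates the kind of pattern they pick up.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

def apply_filter(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    This is the cross-correlation that CNN layers usually call convolution.
    """
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
    return out

# Toy 5x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
feature_map = apply_filter(image, SOBEL_X)

for row in feature_map:
    print(row)  # each row is [0, 4, 4, 0]
```

The response is zero over the flat regions and peaks where the window straddles the dark-to-bright boundary, which is the local structure a deeper layer would then combine with other responses.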

A further advantage is robustness. Simple features tend to be stable across variations in data. An edge remains an edge even when lighting changes. A character sequence remains the same even when the surrounding context shifts. By anchoring their understanding in these stable features, deep learning models become more resilient to noise and variation. This stability is essential for generalization - the ability to perform well on new, unseen data.

Ultimately, the early layers of deep learning models are not just technical components; they are the foundation of the model’s perceptual world. They transform raw data into structured signals, enabling deeper layers to build meaning, context, and abstraction. When humans and AI collaborate, understanding these foundations helps us appreciate how machines perceive the world - and how their perception can complement our own.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


26 April 2025

🏭🗒️Microsoft Fabric: Parameters in Dataflows Gen2 [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 26-Apr-2

[Microsoft Fabric] Dataflow Gen2 Parameters

  • {def} parameters that allow dynamically controlling and customizing Dataflows Gen2
    • makes them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [1]
    • the dataflow is refreshed by passing parameter values outside of the Power Query editor through either
      • Fabric REST API [1]
      • native Fabric experiences [1]
    • parameter names are case sensitive [1]
    • {type} required parameters
      • {warning} the refresh fails if no value is passed for it [1]
    • {type} optional parameters
    • enabled via Parameters >> Enable parameters to be discovered and overridden for execution [1]
  • {limitation} dataflows with parameters can't be
    • scheduled for refresh through the Fabric scheduler [1]
    • manually triggered through the Fabric Workspace list or lineage view [1]
  • {limitation} parameters that affect the resource path of a data source or a destination are not supported [1]
    • ⇐ connections are linked to the exact data source path defined in the authored dataflow
      • can't currently be overridden to use other connections or resource paths [1]
  • {limitation} can't be leveraged by dataflows with incremental refresh [1]
  • {limitation} only parameters of the type decimal number, whole number, text and true/false can be passed for override
    • any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [1]
  • {warning} allow other users who have permissions to the dataflow to refresh the data with other values [1]
  • {limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [1]
  • {limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [1]
  • {limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [1]
  • {limitation} only the first request will be accepted from duplicated requests for the same parameter values [1]
    • subsequent requests are rejected until the first request finishes its evaluation [1]
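
The rules above (case-sensitive names, required parameters, supported types) can be sketched as a client-side check to run before requesting a refresh. This is only an illustration: `validate_overrides`, the `declared` structure and the parameter names are hypothetical and not part of any Fabric API, and the mapping of the four supported types to Python types is an assumption.

```python
from datetime import date

# Assumed Python stand-ins for the four supported parameter types:
# true/false, whole number, decimal number, text.
SUPPORTED_TYPES = (bool, int, float, str)

def validate_overrides(declared, overrides):
    """Return a list of problems found in the override values.

    declared:  hypothetical dict of parameter name -> {"required": bool}
    overrides: dict of parameter name -> value to pass for the refresh
    """
    problems = []
    for name, value in overrides.items():
        if name not in declared:  # names are case sensitive
            problems.append(f"unknown parameter: {name}")
        elif not isinstance(value, SUPPORTED_TYPES):
            problems.append(f"unsupported type for {name}: {type(value).__name__}")
    for name, spec in declared.items():
        if spec["required"] and name not in overrides:
            problems.append(f"missing required parameter: {name}")
    return problems

# Hypothetical parameters of a dataflow.
declared = {"Region": {"required": True}, "RowLimit": {"required": False}}

print(validate_overrides(declared, {"region": "EU"}))
print(validate_overrides(declared, {"Region": "EU", "RowLimit": date(2025, 4, 26)}))
print(validate_overrides(declared, {"Region": "EU", "RowLimit": 1000}))
```

The first call reports both an unknown name and a missing required parameter because `region` and `Region` are different names; the second flags the date value; the third passes.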

References:
[1] Microsoft Learn (2025) Use public parameters in Dataflow Gen2 (Preview) [link]

Resources:
[R1] Microsoft Fabric Blog (2025) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

Acronyms:
API - Application Programming Interface
REST - Representational State Transfer

16 April 2025

🧮ERP: Implementations (Part XIV: A Never-Ending Story)

ERP Implementations Series

An ERP implementation is occasionally considered a one-time endeavor after which an organization will live happily ever after. In an ideal world that would be true, though the work never stops – things that were carved out from the implementation, optimizations, new features, new regulations, new requirements, integration with other systems, etc. An implementation is thus just the beginning of what comes next, and it's essential to get the foundation right – that’s the purpose of the ERP implementation: to provide a foundation on which something bigger and solid can be erected.

No matter how well an ERP implementation is managed and executed, and how well people work towards the same goals, there’s always something forgotten or carved out from the initial project. Typically, the usual suspects are the integrations with other systems, though there can also be minor or even bigger features that are planned to be addressed later, provided the implementation hasn’t already consumed all the financial resources available, as is usually the case. Some of the topics can be addressed as Change Requests or consolidated into projects of their own.

Even simple integrations can become complex when the processes are poorly designed, and that typically happens more often than people think. It’s not necessarily about the lack of skillset or about the technologies used, but about the degree to which the processes can work in a loosely coupled interconnected manner. Even unidirectional integrations can raise challenges, though everything increases in complexity when the flow of data is bidirectional. Moreover, the complexity increases with each system added to the overall architecture. 

As in the manual creation of a sculpture, processes in an ERP implementation form a skeleton that needs chiseling and smoothing until the form reaches the desired optimized shape. However, optimization is not a one-time attempt but a continuous work of exploring what is achievable, what works, what is optimal. Sometimes optimization is an exact science, while other times it’s about (scientific) experimentation in which theory, ideas and investments are put to good use. However, experimentation tends to be expensive at least in terms of time and effort, and probably these are the main reasons why some organizations don’t even attempt it – or maybe it’s just laziness, pure indifference or self-preservation. After all, why change something that already works?

Typically, software manufacturers make new releases available on a periodic basis as part of their planning for growth and for attracting more businesses. Each release that touches used functionality typically needs proper evaluation, testing and whatever else organizations consider important as part of the release management process. Ideally, everything should go smoothly, though life never ceases to surprise, and even a minor release can have an important impact when previously working critical functionality stops working. Test automation and other practices can make an important difference for organizations, though these require additional effort and investments that usually pay off when done right.

Regulations and other similar requirements must be addressed as they can involve penalties or other risks that are usually worth avoiding. Ideally such requirements should be supported by design, though even then a certain volume of work is involved. Moreover, the business context can change unexpectedly, and further requirements need to be considered eventually. 

The work on an ERP system and the infrastructure built around it is a never-ending story. Therefore, organizations must have the resources not only for the initial project, but also for what comes after it. Of course, some work can be performed manually, some requirements can be delayed, some risks can be assumed, though the value of an ERP system increases with its extended usage, at least in theory.

12 March 2025

🏭🎗️🗒️Microsoft Fabric: Query Acceleration [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 12-Mar-2025

Query Acceleration [2]

[Microsoft Fabric] Query Acceleration

  • {def} 
    • indexes and caches on the fly data landing in OneLake [2]
      • {benefit} allows to 
        • analyze real-time streams coming directly into Eventhouse and combine it with data landing in OneLake 
        • ⇐ either coming from mirrored databases, Warehouses, Lakehouses or Spark [2] 
        • ⇒ accelerate data landing in OneLake
          • ⇐ including existing data and any new updates, and expect similar performance [1]
          • eliminates the need to 
            • manage ingestion pipelines [1]
            • maintain duplicate copies of data [1]
          • ensures that data remains in sync without additional effort [4]
          • the initial process is dependent on the size of the external table [4]
      • ⇐ provides a significant performance gain, comparable to ingesting data in Eventhouse [1]
        • in some cases up to 50x and beyond [2]
      • ⇐ supported in Eventhouse over delta tables from OneLake shortcuts, etc. [4]
        • when creating a shortcut from an Eventhouse to a OneLake delta table, users can choose if they want to accelerate the shortcut [2]
        • accelerating the shortcut means equivalent ingestion into the Eventhouse
        • ⇐  optimizations that deliver the same level of performance for accelerated shortcuts as native Eventhouse tables [2]
        • e.g. indexing, caching, etc. 
      • all data management is done by the data writer [2]
      • in the Eventhouse, accelerated table shortcuts behave like external tables, with the same limitations and capabilities [4]
        • {limitation} materialized views aren't supported [1]
        • {limitation} update policies aren't supported [1]
    • allows specifying a policy on top of external delta tables that defines the number of days to cache data for high-performance queries [1]
      • ⇐ queries run over OneLake shortcuts can be less performant than on data that is ingested directly to Eventhouses [1]
        • ⇐ due to network calls to fetch data from storage, the absence of indexes, etc. [1]
    • {costs} charged under OneLake Premium cache meter [2]
      • ⇐ similar to native Eventhouse tables [2]
      • one can control the amount of data to accelerate by configuring number of days to cache [2]
      • indexing activity may also count towards CU consumption [2]
    • {limitation} the number of columns in the external table can't exceed 900 [1]
    • {limitation} query performance over accelerated external delta tables which have partitions may not be optimal during preview [1]
    • {limitation} the feature assumes delta tables with static advanced features
      • e.g. column mapping doesn't change, partitions don't change, etc
      • {recommendation} to change advanced features, first disable the policy, and once the change is made, re-enable the policy [1]
    • {limitation} schema changes on the delta table must also be followed with the respective .alter external delta table schema [1]
      • might result in acceleration starting from scratch if there was a breaking schema change [1]
    • {limitation} index-based pruning isn't supported for partitions [1]
    • {limitation} parquet files with a compressed size higher than 6 GB won't be cached [1]
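
Two of the limits above are numeric and easy to check before enabling acceleration. The following is a minimal sketch: `acceleration_warnings` is a hypothetical helper, the column count and file sizes are assumed to be already known to the caller, and reading "6 GB" as binary gigabytes is an assumption.

```python
MAX_COLUMNS = 900                   # column limit for the external table [1]
MAX_PARQUET_BYTES = 6 * 1024 ** 3   # 6 GB compressed; binary units are an assumption

def acceleration_warnings(column_count, parquet_sizes_bytes):
    """Return warnings for an external delta table based on the documented limits.

    column_count:        number of columns in the external table
    parquet_sizes_bytes: compressed sizes of its parquet files, in bytes
    """
    warnings = []
    if column_count > MAX_COLUMNS:
        warnings.append(f"{column_count} columns exceed the limit of {MAX_COLUMNS}")
    oversized = [size for size in parquet_sizes_bytes if size > MAX_PARQUET_BYTES]
    if oversized:
        warnings.append(f"{len(oversized)} parquet file(s) above 6 GB won't be cached")
    return warnings

print(acceleration_warnings(120, [512 * 1024 ** 2]))   # []
print(acceleration_warnings(901, [7 * 1024 ** 3]))
```

The other limitations (static advanced features, partitions, schema changes) depend on how the table evolves over time and don't reduce to a simple check like this.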

References:
[1] Microsoft Learn (2024) Fabric: Query acceleration for OneLake shortcuts - overview (preview) [link]
[2] Microsoft Fabric Updates Blog (2024) Announcing Eventhouse Query Acceleration for OneLake Shortcuts (Preview) [link]
[3] Microsoft Learn (2024) Fabric: Query acceleration over OneLake shortcuts (preview) [link]
[4] Microsoft Fabric Updates Blog (2025) Eventhouse Accelerated OneLake Table Shortcuts – Generally Available [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

22 January 2025

🏭🗒️Microsoft Fabric: Folders [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 22-Jan-2025

[Microsoft Fabric] Folders

  • {def} organizational units inside a workspace that enable users to efficiently organize and manage artifacts in the workspace [1]
  • identifiable by its name
    • {constraint} must be unique in a folder or at the root level of the workspace
    • {constraint} can’t include certain special characters [1]
      • C0 and C1 control codes [1]
      • leading or trailing spaces [1]
      • characters: ~"#.&*:<>?/{|} [1]
    • {constraint} can’t have system-reserved names
      • e.g. $recycle.bin, recycled, recycler.
    • {constraint} its length can't exceed 255 characters
  • {operation} create folder
    • can be created in
      • an existing folder (aka nested subfolder) [1]
        • {restriction} a maximum of 10 levels of nested subfolders can be created [1]
        • up to 10 folders can be created in the root folder [1]
        • {benefit} provide a hierarchical structure for organizing and managing items [1]
      • the root
  • {operation} move folder
  • {operation} rename folder
    • the same rules apply as for folder creation [1]
  • {operation} delete folder
    • {restriction} currently only empty folders can be deleted [1]
      • {recommendation} make sure the folder is empty [1]
  •  {operation} create item in folder
    • {restriction} certain items can’t be created in a folder
      • dataflows gen2
      • streaming semantic models
      • streaming dataflows
    • ⇐ items created from the home page or the Create hub, are created at the root level of the workspace [1]
  • {operation} move file(s) between folders [1]
  • {operation} publish to folder [1]
    •   Power BI reports can be published to specific folders
      • {restriction} report names must be unique throughout an entire workspace, regardless of their location [1]
        • when publishing a report to a workspace that has another report with the same name in a different folder, the report will publish to the location of the already existing report [1]
  • {limitation} may not be supported by certain features
    • e.g. Git
  • {recommendation} use folders to organize workspaces [1]
    • helps improve content organization and navigation [AN]
    • helps improve collaboration, access control and governance [AN]
  • {permissions}
    • inherit the permissions of the workspace where they're located [1] [2]
    • workspace admins, members, and contributors can create, modify, and delete folders in the workspace [1]
    • viewers can only view folder hierarchy and navigate in the workspace [1]
  • [deployment pipelines] when deploying items in folders to a different stage, the folder hierarchy is automatically applied [2]
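
The naming constraints listed above translate directly into a small check. This is a sketch rather than Fabric's own validation: `folder_name_problems` is a hypothetical helper, and treating reserved names case-insensitively is an assumption.

```python
FORBIDDEN_CHARS = set('~"#.&*:<>?/{|}')
RESERVED_NAMES = {"$recycle.bin", "recycled", "recycler"}
MAX_NAME_LENGTH = 255

def is_control_code(ch):
    """True for C0 (U+0000 to U+001F) and C1 (U+0080 to U+009F) control codes."""
    code = ord(ch)
    return code <= 0x1F or 0x80 <= code <= 0x9F

def folder_name_problems(name, sibling_names=()):
    """Return the constraints a folder name violates (empty list if none).

    sibling_names: names already used in the same folder or at the root level
    """
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than 255 characters")
    if name != name.strip(" "):
        problems.append("leading or trailing spaces")
    if any(is_control_code(ch) for ch in name):
        problems.append("contains control codes")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    if name.lower() in RESERVED_NAMES:  # case-insensitive match is an assumption
        problems.append("system-reserved name")
    if name in sibling_names:
        problems.append("name already used at this level")
    return problems

print(folder_name_problems("Sales 2025"))        # []
print(folder_name_problems("Q1/Q2"))
print(folder_name_problems("HR", ["HR", "IT"]))
```

Returning a list of problems instead of a single boolean makes it easier to show a user everything that needs fixing in one pass.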


References:
[1] Microsoft Fabric (2024) Create folders in workspaces [link]
[2] Microsoft Fabric (2024) The deployment pipelines process [link]
[3] Microsoft Fabric Updates Blog (2025) Define security on folders within a shortcut using OneLake data access roles [link]
[4] Microsoft Fabric Updates Blog (2025) Announcing the General Availability of Folder in Workspace [link]
[5] Microsoft Fabric Updates Blog (2025) Announcing Folder in Workspace in Public Preview [link]
[6] Microsoft Fabric Updates Blog (2025) Getting the size of OneLake data items or folders [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

20 January 2025

🏭🗒️Microsoft Fabric: [Azure] Service Principals (SPN) [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 20-Jan-2025

[Azure] Service Principal (SPN)  

  • {def} a non-human, application-based security identity used by applications or automation tools to access specific Azure resources [1]
    • can be assigned precise permissions, making them perfect for automated processes or background services
      • helps minimize the risks of human error and identity-based vulnerabilities
      • supported in datasets, Gen1/Gen2 dataflows, datamarts [2]
      • authentication type 
        • supported only by [2]
          • Azure Data Lake Storage
          • Azure Data Lake Storage Gen2
          • Azure Blob Storage
          • Azure Synapse Analytics
          • Azure SQL Database
          • Dataverse
          • SharePoint online
        • doesn’t support
          • SQL data source with Direct Query in datasets [2]
  • when registering a new application in Microsoft Entra ID, an SPN is automatically created for the app registration [4]
    • the access to resources is restricted by the roles assigned to the SPN
      • ⇒ gives control over which resources can be accessed and at which level [4]
    • {recommendation} use SPN with automated tools [4]
      • rather than allowing them to sign in with a user identity  [4]
    • {prerequisite} an active Microsoft Entra user account with sufficient permissions to 
      • register an application with the tenant [4]
      • assign to the application a role in the Azure subscription [4]
      •  requires Application.ReadWrite.All permission [4]
  • extended to support Fabric Data Warehouses [1]
    • {benefit} automation-friendly API Access
      • allows creating, updating, reading, and deleting Warehouse items via Fabric REST APIs using service principals [1]
      • enables automating repetitive tasks without relying on user credentials [1]
        • e.g. provisioning or managing warehouses
        • increases security by limiting human error
      • the warehouses thus created will be displayed in the Workspace list view in Fabric UI, with the Owner name of the SPN [1]
      • applicable to users with administrator, member, or contributor workspace role [3]
      • minimizes risk
        • the warehouses created with delegated account or fixed identity (owner’s identity) will stop working when the owner leaves the organization [1]
          • Fabric requires the user to login every 30 days to ensure a valid token is provided for security reasons [1]
    • {benefit} seamless integration with Client Tools: 
      • tools like SSMS can connect to the Fabric DWH using SPN [1]
      • SPN provides secure access for developers to 
        • run COPY INTO
          • with and without firewall enabled storage [1]
        • run any T-SQL query programmatically on a schedule with ADF pipelines [1]
    • {benefit} granular access control
      • Warehouses can be shared with an SPN through the Fabric portal [1]
        • once shared, administrators can use T-SQL commands to assign specific permissions to SPN [1]
          • allows controlling precisely which data and operations an SPN has access to [1]
            • GRANT SELECT ON <table name> TO <Service principal name>  
      • warehouses' ownership can be changed from an SPN to user, and vice-versa [3]
    • {benefit} improved DevOps and CI/CD Integration
      • SPN can be used to automate the deployment and management of DWH resources [1]
        •  ensures faster, more reliable deployment processes while maintaining strong security postures [1]
    • {limitation} default semantic models are not supported for SPN created warehouses [3]
      • ⇒ features such as listing tables in dataset view, creating report from the default dataset don’t work [3]
    • {limitation} SPN for SQL analytics endpoints is not currently supported
    • {limitation} SPNs are currently not supported for COPY INTO error files [3]
      • ⇐ Entra ID credentials are not supported either [3]
    • {limitation} SPNs are not supported for GIT APIs. SPN support exists only for Deployment pipeline APIs [3]
    • monitoring tools
      • [DMV] sys.dm_exec_sessions.login_name column [3] 
      • [Query Insights] queryinsights.exec_requests_history.login_name [3]
      • Query activity
        • submitter column in Fabric query activity [3]
      • Capacity metrics app: 
        • compute usage for warehouse operations performed by SPN appears as the Client ID under the User column in Background operations drill through table [3]
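
For automation scenarios like the ones above, an SPN typically authenticates with the OAuth 2.0 client credentials flow before calling the Fabric REST APIs. The sketch below only builds the token request with the standard library and does not send it. The token endpoint and the `.default` scope follow the Microsoft identity platform conventions and are not taken from the notes above, so treat them as assumptions to verify against the documentation; `build_token_request` and the placeholder IDs are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed scope for the Fabric REST APIs (not stated in the notes above).
FABRIC_SCOPE = "https://api.fabric.microsoft.com/.default"

def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request for an SPN."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": FABRIC_SCOPE,
    }).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

# Placeholder values; real ones come from the app registration.
request = build_token_request("my-tenant-id", "my-client-id", "my-secret")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would return a JSON document containing the access token, which is then sent as a bearer token in the `Authorization` header. In practice the secret should come from a vault or environment variable rather than from source code.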

References:
[1] Microsoft Fabric Updates Blog (2024) Service principal support for Fabric Data Warehouse [link]
[2] Microsoft Fabric Learn (2024) Service principal support in Data Factory [link]
[3] Microsoft Fabric Learn (2024) Service principal in Fabric Data Warehouse [link]
[4] Microsoft Fabric Learn (2024) Register a Microsoft Entra app and create a service principal [link]
[5] Microsoft Fabric Updates Blog (2024) Announcing Service Principal support for Fabric APIs [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
 
Acronyms:
ADF - Azure Data Factory
API - Application Programming Interface
CI/CD - Continuous Integration/Continuous Deployment
DMV - Dynamic Management View
DWH - Data Warehouse
SPN - service principal
SSMS - SQL Server Management Studio

18 December 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part VII: Data Stores Comparison)

Business Intelligence Series

Microsoft made available a reference guide for the data stores supported for Microsoft Fabric workloads [1], including the new Fabric SQL database (see previous post). Here's the consolidated table followed by a few aspects to consider: 

Area | Lakehouse | Warehouse | Eventhouse | Fabric SQL database | Power BI Datamart
Data volume | Unlimited | Unlimited | Unlimited | 4 TB | Up to 100 GB
Type of data | Unstructured, semi-structured, structured | Structured, semi-structured (JSON) | Unstructured, semi-structured, structured | Structured, semi-structured, unstructured | Structured
Primary developer persona | Data engineer, data scientist | Data warehouse developer, data architect, data engineer, database developer | App developer, data scientist, data engineer | AI developer, App developer, database developer, DB admin | Data scientist, data analyst
Primary dev skill | Spark (Scala, PySpark, Spark SQL, R) | SQL | No code, KQL, SQL | SQL | No code, SQL
Data organized by | Folders and files, databases, and tables | Databases, schemas, and tables | Databases, schemas, and tables | Databases, schemas, tables | Database, tables, queries
Read operations | Spark, T-SQL | T-SQL, Spark* | KQL, T-SQL, Spark | T-SQL | Spark, T-SQL
Write operations | Spark (Scala, PySpark, Spark SQL, R) | T-SQL | KQL, Spark, connector ecosystem | T-SQL | Dataflows, T-SQL
Multi-table transactions | No | Yes | Yes, for multi-table ingestion | Yes, full ACID compliance | No
Primary development interface | Spark notebooks, Spark job definitions | SQL scripts | KQL Queryset, KQL Database | SQL scripts | Power BI
Security | RLS, CLS**, table level (T-SQL), none for Spark | Object level, RLS, CLS, DDL/DML, dynamic data masking | RLS | Object level, RLS, CLS, DDL/DML, dynamic data masking | Built-in RLS editor
Access data via shortcuts | Yes | Yes | Yes | Yes | No
Can be a source for shortcuts | Yes (files and tables) | Yes (tables) | Yes | Yes (tables) | No
Query across items | Yes | Yes | Yes | Yes | No
Advanced analytics | Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Time Series native elements, full geo-spatial and query capabilities | T-SQL analytical capabilities, data replicated to delta parquet in OneLake for analytics | Interface for data processing with automated performance tuning
Advanced formatting support | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Full indexing for free text and semi-structured data like JSON | Table support for OLTP, JSON, vector, graph, XML, spatial, key-value | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format
Ingestion latency | Available instantly for querying | Available instantly for querying | Queued ingestion, streaming ingestion has a couple of seconds latency | Available instantly for querying | Available instantly for querying

The table can be used as a map of what one needs to know to use each feature, and to identify how one can build on previous experience, and here I'm referring to the many SQL developers. One must also consider the capabilities and limitations of each storage repository.

However, what I'm missing are references regarding the performance of data access, especially compared with on-premises workloads. Moreover, the devil is in the details, therefore one must test thoroughly before committing to any of the above choices. For the newest overview please check the referenced documentation!

For lakehouses, the hardest limitation is the lack of multi-table transactions, though that's understandable given its scope. However, probably the most important aspect is whether it can scale with the volume of reads/writes as currently the SQL endpoint seems to lag. 

The warehouse seems to be more versatile, though careful attention needs to be given to its design. 

The Eventhouse opens the door to a wide range of time-based scenarios, though it will be interesting to see how developers cope with its lack of functionality in some areas.

Fabric SQL databases are a new addition, and hopefully they'll allow considering a wide range of OLTP scenarios. Starting with 28th of March 2025, SQL databases will be ON by default and tenant admins must manually turn them OFF before the respective date [3].

Power BI datamarts have been in preview for a couple of years.


References:
[1] Microsoft Fabric (2024) Microsoft Fabric decision guide: choose a data store [link]
[2] Reitse's blog (2024) Testing Microsoft Fabric Capacity: Data Warehouse vs Lakehouse Performance [link]
[3] Microsoft Fabric Update Blog (2025) Extending flexibility: default checkbox changes on tenant settings for SQL database in Fabric [link]
[4] Microsoft Fabric Update Blog (2025) Enhancing SQL database in Fabric: share your feedback and shape the future [link]
[5] Microsoft Fabric Update Blog (2025) Why SQL database in Fabric is the best choice for low-code/no-code Developers [link]

09 December 2024

🏭🗒️Microsoft Fabric: Microsoft Fabric [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 8-Dec-2024

Microsoft Fabric 

  • {goal} complete (end-to-end) analytics platform [6]
    • {characteristic} unified
      • {objective} provides a single, integrated environment for the whole organization
        • {benefit} data professionals and business users can collaborate on data projects and solutions [5]
    • {characteristic} serverless SaaS model (aka SaaS-ified)
      • {objective} provisioned automatically with the tenant [6]
      • {objective} highly scalable [5]
      • {objective} cost-effectiveness [5]
      • {objective} accessible 
        • ⇐ from anywhere with an internet connection [5]
      • {objective} continuous updates
        • ⇐ provided by Microsoft
      • {objective} continuous maintenance 
        • ⇐ provided by Microsoft
      • provides a set of integrated services that enable ingesting, storing, processing, and analyzing data in a single environment [5]
    • {objective} secure
    • {objective} governed
  • {goal} lake-centric
    • {characteristic} OneLake-based
      • all workloads automatically store their data in the OneLake workspace folders [6]
      • all the data is organized in an intuitive hierarchical namespace [6]
      • data is automatically indexed [6]
      • provides a set of features 
        • discovery
        • MIP labels
        • lineage
        • PII scans
        • sharing
        • governance
        • compliance
    • {characteristic} one copy
      • available for all computes 
      • all compute engines store their data automatically in OneLake
        •  the data is stored in a (single) common format
          •  delta parquet file format
            • open standards format
            • the storage format for all tabular data in Microsoft Fabric 
        • ⇐ the data is directly accessible by all the engines [6]
          • ⇐ no import/export needed
      • all compute engines are fully optimized to work with Delta Parquet as their native format [6]
      • a shared universal security model is enforced across all the engines [6]
    • {characteristic} open at every tier
  • {goal} empowering
    • {characteristic} intuitive
    • {characteristic} built into M365
    • {characteristic} insight to action
  • {goal} AI-powered
    • {characteristic} Copilot accelerated 
    • {characteristic} ChatGPT enabled
    • {characteristic} AI-driven insights
  •  complete analytics platform
    • addresses the needs of all data professionals and business users who target harnessing the value of data 
  • {feature} scales automatically
    • the system automatically allocates an appropriate number of compute resources based on the job size
    • the cost is proportional to total resource consumption, rather than size of cluster or number of resources allocated 
    •  jobs in general complete faster (and usually, at less overall cost)
      • ⇒ no need to specify cluster sizes
  • natively supports 
    • Spark
    • data science
    • log-analytics
    • real-time ingestion and messaging
    • alerting
    • data pipelines
    • Power BI reporting 
    • interoperability with third-party services 
      • from other vendors that support the same open 
  • data virtualization mechanisms 
    • {feature} mirroring [notes]
    • {feature} shortcuts [notes]
      • allow users to reference data without copying it
      • {benefit} make other domain data available locally without the need for copying data
  • {feature} tenant (aka Microsoft Fabric tenant, MF tenant)
    • a single instance of Fabric for an organization that is aligned with a Microsoft Entra ID
    • can contain any number of workspaces
  • {feature} workspaces
    • {definition} a collection of items that brings together different functionality in a single environment designed for collaboration
    • associated with a domain [3]
  • {feature} domains [notes]
    • {definition} a way of logically grouping together data in an organization that is relevant to a particular area or field [1]
    • subdomains
      • a way of fine-tuning the logical grouping of data under a domain [1]
        • subdivisions of a domain
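
The containment relationships in the notes above (a tenant holds any number of workspaces, a workspace is associated with a domain, and subdomains subdivide a domain) can be pictured with a small data model. This is only an illustrative sketch: the class names, the `workspaces_in` helper, and the sample names (`contoso`, `ws-sales`, etc.) are hypothetical and are not part of any Fabric API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Domain:
    """Logical grouping of data for a particular area or field."""
    name: str
    parent: Optional["Domain"] = None  # set only for subdomains

    def path(self) -> str:
        # Walk up to the root domain to build a readable path.
        if self.parent is None:
            return self.name
        return f"{self.parent.path()}/{self.name}"


@dataclass
class Workspace:
    """Collection of items; may be associated with a domain."""
    name: str
    domain: Optional[Domain] = None


@dataclass
class Tenant:
    """Single Fabric instance for an organization."""
    name: str
    workspaces: List[Workspace] = field(default_factory=list)

    def workspaces_in(self, domain: Domain) -> List[Workspace]:
        # Include workspaces assigned to the domain or to any of its subdomains.
        def belongs(d: Optional[Domain]) -> bool:
            while d is not None:
                if d is domain:
                    return True
                d = d.parent
            return False

        return [w for w in self.workspaces if belongs(w.domain)]


# Hypothetical sample data
sales = Domain("Sales")
emea = Domain("EMEA", parent=sales)
tenant = Tenant(
    "contoso",
    [Workspace("ws-sales", sales), Workspace("ws-emea", emea), Workspace("ws-hr")],
)
print(emea.path())                                   # Sales/EMEA
print([w.name for w in tenant.workspaces_in(sales)])  # ['ws-sales', 'ws-emea']
```

Modeling the subdomain as a domain with a parent keeps the grouping logic in one place; a workspace without a domain simply falls outside every grouping.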

Resources:
[1] Microsoft Learn (2023) Administer Microsoft Fabric [link]
[2] Microsoft Learn: Fabric (2024) Governance overview and guidance [link]
[3] Microsoft Learn: Fabric (2023) Fabric domains [link]
[4] Establishing Data Mesh architectural pattern with Domains and OneLake on Microsoft Fabric, by Maheswaran Arunachalam [link]
[5] Microsoft Learn: Fabric (2024) Introduction to end-to-end analytics using Microsoft Fabric [link]
[6] Microsoft Fabric (2024) Fabric Analyst in a Day [course notes]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
API - Application Programming Interface
M365 - Microsoft 365
MF - Microsoft Fabric
PII - Personally Identifiable Information
SaaS - software-as-a-service

13 June 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part V: One Person Can’t Learn or Do Everything)

Business Intelligence Series

Today’s Explicit Measures webcast [1] considered an article written by Kurt Buhler (The Data Goblins): [Microsoft] "Fabric is a Team Sport: One Person Can’t Learn or Do Everything" [2]. It’s a well-written article that deserves some thought as there are several important points made. I can’t say I agree with the full extent of some statements, even if some disagreements are probably just a matter of semantics.

My main disagreement starts with the title “One Person Can’t Learn or Do Everything”. As clarified in the webcast's chat, the author defines “everything” as an umbrella for “all the capabilities and experiences that comprise Fabric including both technical (like Power BI) or non-technical (like adoption data literacy) and everything in between” [1].

For me “everything” is relative and considers a domain's core set of knowledge, while "expertise" (≠ "mastery") refers to the degree to which a person can use the respective knowledge to build back-to-back solutions for a given area. I’d say that it becomes more and more challenging for beginners or average data professionals to cover the core features. Moreover, I’d separate the non-technical skills because then one will also need to consider topics like Data, Project, Information or Knowledge Management.

There are different levels of expertise. They can vary in depth (specialization) or breadth (covering multiple areas), and they depend on previous experience (whether one worked with similar technologies). Usually, there’s a minimum set of requirements that needs to be covered to be considered an expert (e.g. certification, building a solution from beginning to end, troubleshooting, performance optimization, etc.). It’s also challenging to define even roughly where one’s expertise starts (or ends), as there are different perspectives on the topics. 

Conversely, the term “expert” is in general misused extensively, sometimes even with mischievous intent. An “expert” is usually taken to be an external consultant or a person who got certified in an area, even if that person may not be able to build solutions that address a customer’s needs. 

Even data professionals with many years of experience can be overwhelmed by the volume of knowledge, especially when one considers the different experiences available in MF and the volume of new features released monthly. Conversely, expertise can be considered with respect to one or more MF experiences, or to one area within a certain layer. A lot of the knowledge can be transferred from other areas: writing SQL and complex database objects, modelling (enterprise) semantic layers, programming in Python, R or Power Query, building data pipelines, managing SQL databases, etc. 

Besides the standard documentation, training sessions, and some reference architectures, Microsoft also made available some labs and other material, which help in discovering the available features, though they don’t teach people how to build complete solutions. More important than explicitly declaring the role-based audience, I find, is the creation of learning paths for the various roles.

During the past 6-7 months I've spent on average 2 days per week learning MF topics. My problem is not the documentation but the lack of maturity of some features, the gaps in functionality, identifying the respective gaps, knowing what and when new features will be made available. The fact that features are made available or changed while learning makes the process more challenging. 

My goal is to be able to provide back-to-back solutions and I believe that’s possible, even if I might not consider all the experiences available. During the past 22 years, at least until MF, I could build complete BI solutions starting from requirements elicitation, data extraction, modeling and processing for data consumption, respectively data consumption for the various purposes. At least this was the journey of a Software Engineer into the world of data. 

References:
[1] Explicit Measures (2024) Power BI tips Ep.328: Microsoft Fabric is a Team Sport (link)
[2] Data Goblins (2024) Fabric is a Team Sport: One Person Can’t Learn or Do Everything (link)

10 March 2024

🏭🗒️Microsoft Fabric: Dataflows Gen2 [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 24-Nov-2025

Dataflow (Gen2) Architecture [4]

[Microsoft Fabric] Dataflow (Gen2) 

  • cloud-based, low-code interface that provides a modern data integration experience allowing users to ingest, prepare and transform data from a rich set of data sources incl. databases, data warehouses, lakehouses, real-time data repositories, etc. [11]
    • new generation of dataflows that resides alongside the Power BI Dataflow (Gen1) [2]
      • brings new features, improved experience [2] and enhanced performance [11]
      • similar to Dataflow Gen1 in Power BI [2] 
      • {recommendation} implement new functionality using Dataflow (Gen2) [11]
        • allows to leverage the many features and experiences not available in (Gen1) 
      • {recommendation} migrate from Dataflow (Gen1) to (Gen2) [11] 
        • allows to leverage the modern experience and capabilities 
    • allows to 
      • extract data from various sources [1]
      • transform it using a wide range of transformation operations [1]
      • load it into a destination [1]
    • {goal} provide an easy, reusable way to perform ETL tasks using Power Query Online [1]
      • allows to promote reusable ETL logic 
        • ⇒ prevents the need to create more connections to the data source [1]
        • offer a wide variety of transformations [1]
    • can be horizontally partitioned
  • {component} Lakehouse 
    • used to stage data being ingested
  • {component} Warehouse 
    • used as a compute engine and means to write back results to staging or supported output destinations faster
  • {component} mashup engine
    • extracts, transforms, or loads the data to staging or data destinations when either [4]
      • warehouse compute cannot be used [4]
      • {limitation} staging is disabled for a query [4]
  • {operation} create a dataflow
    • can be created in a
      • Data Factory workload
      • Power BI workspace
      • Lakehouse
    • when a dataflow (Gen2) is created in a workspace, lakehouse and warehouse items are provisioned along with their related SQL analytics endpoint and semantic models [12]
      •  shared by all dataflows in the workspace and are required for Dataflow Gen2 to operate [12]
        • {warning} shouldn't be deleted, and aren't intended to be used directly by users [12]
        •  aren't visible in the workspace, but might be accessible in other experiences such as the Notebook, SQL-endpoint, Lakehouse, and Warehouse experience [12]
        • the items can be recognized by their prefix: 'DataflowsStaging' [12]
  • {operation} set a default destination for the dataflow 
    • helps to get started quickly by loading all queries to the same destination [14]
    • via ribbon or the status bar in the editor
    • users are prompted to choose a destination and select which queries to bind to it [14]
    • to update the default destination, delete the current default destination and set a new one [14]
    • {default} any new query has as destination the lakehouse, warehouse, or KQL database from which it got started [14] 
  • {operation} publish a dataflow
    • generates dataflow's definition  
      • ⇐ the program that runs once the dataflow is refreshed to produce tables in staging storage and/or output destination [4]
      • used by the dataflow engine to generate an orchestration plan, manage resources, and orchestrate execution of queries across data sources, gateways, and compute engines, and to create tables in either the staging storage or data destination [4]
    • saves changes and runs validations that must be performed in the background [2]
  • {operation} export/import dataflows [11]
    •  allows also to migrate from dataflow (Gen1) to (Gen2) [11]
  • {operation} refresh a dataflow
    • applies the transformation steps defined during authoring 
    • can be triggered on-demand or by setting up a refresh schedule
    • {action} cancel refresh
      • enables to cancel ongoing Dataflow Gen2 refreshes from the workspace items view [6]
      • once canceled, the dataflow's refresh history status is updated to reflect cancellation status [15] 
      • {scenario} stop a refresh during peak time, if a capacity is nearing its limits, or if refresh is taking longer than expected [15]
      • it may have different outcomes
        • data from the last successful refresh is available [15]
        • data written up to the point of cancellation is available [15]
      • {warning} if a refresh is canceled before evaluation of a query that loads data to a destination began, there's no change to data in that query's destination [15]
    • {limitation} each dataflow is allowed up to 300 refreshes per 24-hour rolling window [15]
      •  {warning} attempting 300 refreshes within a short burst (e.g., 60 seconds) may trigger throttling and result in rejected requests [15]
        •  protections in place to ensure system reliability [15]
      • if a scheduled dataflow refresh fails consecutively, the dataflow refresh schedule is paused and an email is sent to the owner [15]
    • {limitation} a single evaluation of a query has a limit of 8 hours [15]
    • {limitation} total refresh time of a single refresh of a dataflow is limited to a max of 24 hours [15]
    • {limitation} per dataflow one can have a maximum of 50 staged queries, or queries with output destination, or combination of both [15]
  • {operation} copy and paste code in Power Query [11]
    •   allows to migrate dataflow (Gen1) to (Gen2) [11]
  • {operation} save a dataflow [11]
    • via 'Save As'  feature
    • can be used to save a dataflow (Gen1) as (Gen2) dataflow [11] 
  • {operation} save a dataflow as draft 
    •  allows to make changes to dataflows without immediately publishing them to a workspace [13]
      • can be later reviewed, and then published, if needed [13]
    • {operation} publish draft dataflow 
      • performed as a background job [13]
      • publishing related errors are visible next to the dataflow's name [13]
        • selecting the indication reveals the publishing errors and allows to edit the dataflow from the last saved version [13]
  • {operation} run a dataflow 
    • can be performed
      • manually
      • on a refresh schedule
      • as part of a Data Pipeline orchestration
  •  {operation} monitor pipeline runs 
    • allows to check pipelines' status, spot issues early, respectively troubleshoot issues
    • [Workspace Monitoring] provides log-level visibility for all items in a workspace [link]
      • via Workspace Settings >> select Monitoring 
    • [Monitoring Hub] serves as a centralized portal for browsing pipeline runs across items within the Data Factory or Data Engineering experience [link]
  • {feature} connect multiple activities in a pipeline [11]
    •  allows to build end-to-end, automated data workflows
  • {feature} author dataflows with Power Query
    • uses the full Power Query experience of Power BI dataflows [2]
  • {feature} shorter authoring flow
    • uses a step-by-step flow for getting the data into the dataflow [2]
      • the number of steps required to create dataflows was reduced [2]
    • a few new features were added to improve the experience [2]
  • {feature} AutoSave and background publishing
    • changes made to a dataflow are autosaved to the cloud (aka draft version of the dataflow) [2]
      • ⇐ without having to wait for the validation to finish [2]
    • {functionality} save as draft 
      • stores a draft version of the dataflow every time you make a change [2]
      • seamless experience and doesn't require any input [2]
    • {concept} published version
      • the version of the dataflow that passed validation and is ready to refresh [5]
  • {feature} integration with data pipelines
    • integrates directly with Data Factory pipelines for scheduling and orchestration [2] 
  • {feature} high-scale compute
    • leverages a new, higher-scale compute architecture [2] 
      •  improves the performance of both transformations of referenced queries and get data scenarios [2]
      • creates both Lakehouse and Warehouse items in the workspace, and uses them to store and access data to improve performance for all dataflows [2]
  • {feature} improved monitoring and refresh history
    • integrate support for Monitoring Hub [2]
    • Refresh History experience upgraded [2]
  • {feature} get data via Dataflows connector
    • supports a wide variety of data source connectors
      • include cloud and on-premises relational databases
  • {feature} incremental refresh
    • enables to incrementally extract data from data sources, apply Power Query transformations, and load into various output destinations [5]
  • {feature} data destinations
    • allows to 
      • specify an output destination
      • separate ETL logic and destination storage [2]
    • every tabular data query can have a data destination [3]
      • available destinations
        • Azure SQL databases
        • Azure Data Explorer (Kusto)
        • Fabric Lakehouse
        • Fabric Warehouse
        • Fabric KQL database
      • a destination can be specified for every query individually [3]
      • multiple different destinations can be used within a dataflow [3]
      • connecting to the data destination is similar to connecting to a data source
      • {limitation} functions and lists aren't supported
    • {operation} creating a new table
      • {default} table name has the same name as the query name.
    • {operation} picking an existing table
    • {operation} deleting a table manually from the data destination 
      • doesn't recreate the table on the next refresh [3]
    • {operation} reusing queries from Dataflow Gen1
      • {method} export Dataflow Gen1 query and import it into Dataflow Gen2
        • export the queries as a PQT file and import them into Dataflow Gen2 [2]
      • {method} copy and paste in Power Query
        • copy the queries and paste them in the Dataflow Gen2 editor [2]
    • {feature} automatic settings:
      • {limitation} supported only for Lakehouse and Azure SQL database
      • {setting} Update method replace: 
        • data in the destination is replaced at every dataflow refresh with the output data of the dataflow [3]
      • {setting} Managed mapping: 
        • the mapping is automatically adjusted when republishing the data flow to reflect the change 
          • ⇒ doesn't need to be updated manually into the data destination experience every time changes occur [3]
      • {setting} Drop and recreate table: 
        • on every dataflow refresh the table is dropped and recreated to allow schema changes
        • {limitation} the dataflow refresh fails if any relationships or measures were added to the table [3]
    • {feature} update methods
      • {method} replace
        • on every dataflow refresh, the data is dropped from the destination and replaced by the output data of the dataflow.
        • {limitation} not supported by Fabric KQL databases and Azure Data Explorer 
      • {method} append
        • on every dataflow refresh, the output data from the dataflow is appended (aka merged) to the existing data in the data destination table (aka upsert)
    • {feature} data staging 
      • {default} enabled
        • allows to use Fabric compute to execute queries
          • ⇐ enhances the performance of query processing
        • the data is loaded into the staging location
          • ⇐ an internal Lakehouse location accessible only by the dataflow itself
        • [Warehouse] staging is required before the write operation to the data destination
          • ⇐ improves performance
          • {limitation} only loading into the same workspace as the dataflow is supported
        •  using staging locations can enhance performance in some cases
      • disabled
        • {recommendation} [Lakehouse] disable staging on the query to avoid loading twice into a similar destination
          • ⇐ once for staging and once for data destination
          • improves dataflow's performance
    • {scenario} use a dataflow to load data into the lakehouse and then use a notebook to analyze the data [2]
    • {scenario} use a dataflow to load data into an Azure SQL database and then use a data pipeline to load the data into a data warehouse [2]
  • {feature} Fast Copy
    • allows ingesting terabytes of data with the easy experience and the scalable back-end of the pipeline Copy Activity [7]
      • enables large-scale data ingestion directly utilizing the pipelines Copy Activity capability [6]
      • supports sources such as Azure SQL Databases, CSV, and Parquet files in Azure Data Lake Storage and Blob Storage [6]
      • significantly scales up the data processing capacity providing high-scale ELT capabilities
    • the feature must be enabled [7]
      • after enabling, Dataflows automatically switch the back-end when data size exceeds a particular threshold [7]
      • ⇐ there's no need to change anything during authoring of the dataflows
      • one can check the refresh history to see if fast copy was used [7]
      • ⇐ see the Engine type
      • {option} Require fast copy
    • {prerequisite} Fabric capacity is available [7]
      •  requires a Fabric capacity or a Fabric trial capacity [11]
    • {prerequisite} data files 
      • are in .csv or parquet format
      • have at least 100 MB
      • are stored in an ADLS Gen2 or a Blob storage account [6]
    • {prerequisite} [Azure SQL DB|PostgreSQL] >= 5 million rows in the data source [7]
    • {limitation} doesn't support [7] 
      • the VNet gateway
      • writing data into an existing table in Lakehouse
      • fixed schema
  • {feature} parameters
    • allow to dynamically control and customize dataflows
      • makes them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [9]
      • the dataflow is refreshed by passing parameter values outside of the Power Query editor through either
        • Fabric REST API [9]
        • native Fabric experiences [9]
      • parameter names are case sensitive [9]
      • {type} required parameters
        • {warning} the refresh fails if no value is passed for it [9]
      • {type} optional parameters
      • enabled via Parameters >> Enable parameters to be discovered and override for execution [9]
    • {limitation} dataflows with parameters can't be
      • scheduled for refresh through the Fabric scheduler [9]
      • manually triggered through the Fabric Workspace list or lineage view [9]
    • {limitation} parameters that affect the resource path of a data source or a destination are not supported [9]
      • ⇐ connections are linked to the exact data source path defined in the authored dataflow
        • can't currently be overridden to use other connections or resource paths [9]
    • {limitation} can't be leveraged by dataflows with incremental refresh [9]
    • {limitation} only parameters of the type decimal number, whole number, text, and true/false can be passed for override
      • any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [9]
    • {warning} allow other users who have permissions to the dataflow to refresh the data with other values [9]
    • {limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [9]
    • {limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [9]
    • {limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [9]
    • {limitation} only the first request will be accepted from duplicated requests for the same parameter values [9]
      • subsequent requests are rejected until the first request finishes its evaluation [9]
  • {feature} support for CI/CD and Git integration
    • allows to create, edit, and manage dataflows in a Git repository that's connected to a Fabric workspace [10]
    • allows to use the deployment pipelines to automate the deployment of dataflows between workspaces [10]
    • allows to use Public APIs to create and manage Dataflow Gen2 with CI/CD and Git integration [10]
    • allows to create Dataflow Gen2 directly into a workspace folder [10]
    • allows to use the Fabric settings and scheduler to refresh and edit settings for Dataflow Gen2 [10]
    • {action} save a dataflow
      • replaces the publish operation
      • saving the dataflow automatically publishes the changes [10]
    • {action} delete a dataflow
      • the staging artifacts become visible in the workspace and are safe to be deleted [10]
    • {action} schedule a refresh
      • can be done manually or by scheduling a refresh [10]
      • {limitation} the Workspace view doesn't show if a refresh is ongoing for the dataflow [10]
      • refresh information is available in the refresh history [10]
    • {action} branching out to another workspace
      • {limitation} the refresh can fail with the message that the staging lakehouse couldn't be found [10]
      • {workaround} create a new Dataflow Gen2 with CI/CD and Git support in the workspace to trigger the creation of the staging lakehouse [10]
      •  all other dataflows in the workspace should start to function again.
    • {action} syncing changes from Git into the workspace
      • requires to open the new or updated dataflow and save changes manually with the editor [10]
        • triggers a publish action in the background to allow the changes to be used during refresh of the dataflow [10]
    • [Power Automate] {limitation} the connector for dataflows isn't working [10]
  •  {feature} Copilot for Dataflow Gen2
    • provides AI-powered assistance for creating data integration solutions using natural language prompts [11]
    • {benefit} helps streamline the dataflow development process by allowing users to use conversational language to perform data transformations and operations [11]
  • {benefit} enhance flexibility by allowing dynamic adjustments without altering the dataflow itself [9]
  • {benefit} extends data with consistent data, such as a standard date dimension table [1]
  • {benefit} allows self-service users access to a subset of data warehouse separately [1]
  • {benefit} optimizes performance with dataflows, which enable extracting data once for reuse, reducing data refresh time for slower sources [1]
  • {benefit} simplifies data source complexity by only exposing dataflows to larger analyst groups [1]
  • {benefit} ensures consistency and quality of data by enabling users to clean and transform data before loading it to a destination [1]
  • {benefit} simplifies data integration by providing a low-code interface that ingests data from various sources [1]
  • {limitation} not a replacement for a data warehouse [1]
  • {limitation} row-level security isn't supported [1]
  • {limitation} Fabric or Fabric trial capacity workspace is required [1]
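
The refresh limit noted above (up to 300 refreshes per dataflow within a 24-hour rolling window [15]) can be pictured with a small rolling-window counter. This is a minimal sketch under stated assumptions: the `RefreshWindow` class is hypothetical, it only illustrates what "rolling window" means, and it does not model how the service actually enforces the limit or its burst throttling.

```python
from collections import deque
from datetime import datetime, timedelta

MAX_REFRESHES = 300            # per dataflow, as stated in the notes [15]
WINDOW = timedelta(hours=24)   # rolling window, as stated in the notes [15]


class RefreshWindow:
    """Tracks refresh start times for one dataflow (illustrative only)."""

    def __init__(self, limit: int = MAX_REFRESHES, window: timedelta = WINDOW):
        self.limit = limit
        self.window = window
        self._starts = deque()

    def try_refresh(self, now: datetime) -> bool:
        # Drop timestamps that have fallen out of the rolling window.
        while self._starts and now - self._starts[0] >= self.window:
            self._starts.popleft()
        if len(self._starts) >= self.limit:
            return False  # over the limit: the request would be rejected
        self._starts.append(now)
        return True


# Small limit to keep the example readable.
w = RefreshWindow(limit=3)
t0 = datetime(2025, 1, 1)
results = [w.try_refresh(t0 + timedelta(hours=h)) for h in (0, 1, 2, 3, 24)]
print(results)  # [True, True, True, False, True]
```

The fourth request is rejected because three refreshes already sit inside the window; the fifth succeeds because the first one has aged out by then. A fixed daily quota would behave differently, which is the point of the "rolling" qualifier.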


Feature Dataflow Gen2 Dataflow Gen1
Author dataflows with Power Query
Shorter authoring flow
Auto-Save and background publishing
Data destinations
Improved monitoring and refresh history
Integration with data pipelines
High-scale compute
Get Data via Dataflows connector
Direct Query via Dataflows connector
Incremental refresh ✓*
Fast Copy ✓*
Cancel refresh ✓*
AI Insights support
Dataflow Gen1 vs Gen2 [2]
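
The Fast Copy prerequisites listed in the notes (a Fabric capacity, .csv or parquet files of at least 100 MB in ADLS Gen2 or Blob storage, or at least 5 million rows for Azure SQL DB and PostgreSQL sources [6][7]) can be summarized as a simple check. This is a sketch for reasoning about the thresholds only: the function, the source labels, and the choice of binary megabytes are assumptions, and the service itself decides when the fast copy back-end is used.

```python
from typing import Optional

MIN_FILE_BYTES = 100 * 1024 * 1024  # "at least 100 MB"; binary megabytes are an assumption
MIN_DB_ROWS = 5_000_000             # ">= 5 million rows" from the notes [7]
FILE_FORMATS = {"csv", "parquet"}
FILE_STORES = {"adls_gen2", "blob"}          # hypothetical labels
DB_SOURCES = {"azure_sql_db", "postgresql"}  # hypothetical labels


def meets_fast_copy_prerequisites(
    source_kind: str,
    *,
    has_fabric_capacity: bool,
    file_format: Optional[str] = None,
    size_bytes: int = 0,
    row_count: int = 0,
) -> bool:
    """Return True if the noted prerequisites are met (illustrative only)."""
    if not has_fabric_capacity:
        return False
    if source_kind in FILE_STORES:
        # File sources: format and minimum size both matter.
        return file_format in FILE_FORMATS and size_bytes >= MIN_FILE_BYTES
    if source_kind in DB_SOURCES:
        # Database sources: only the row count threshold is noted.
        return row_count >= MIN_DB_ROWS
    return False


print(meets_fast_copy_prerequisites(
    "blob", has_fabric_capacity=True,
    file_format="parquet", size_bytes=250 * 1024 * 1024))  # True
print(meets_fast_copy_prerequisites(
    "azure_sql_db", has_fabric_capacity=True, row_count=1_000_000))  # False
```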

References:
[1] Microsoft Learn (2023) Fabric: Ingest data with Microsoft Fabric [link]
[2] Microsoft Learn (2023) Fabric: Getting from Dataflow Generation 1 to Dataflow Generation 2 [link]
[3] Microsoft Learn (2023) Fabric: Dataflow Gen2 data destinations and managed settings [link]
[4] Microsoft Learn (2023) Fabric: Dataflow Gen2 pricing for Data Factory in Microsoft Fabric [link]
[5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
[6] Microsoft Learn (2023) Fabric: What's new and planned for Data Factory in Microsoft Fabric [link]
[7] Microsoft Learn (2023) Fabric: Fast copy in Dataflows Gen2 [link]
[8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]
[9] Microsoft Learn (2025) Fabric: Use public parameters in Dataflow Gen2 (Preview) [link]
[10] Microsoft Learn (2025) Fabric: Dataflow Gen2 with CI/CD and Git integration support [link]
[11] Microsoft Learn (2025) Fabric: What is Dataflow Gen2? [link]
[12] Microsoft Learn (2025) Fabric: Use a dataflow in a pipeline [link]
[13] Microsoft Learn (2025) Fabric: Save a draft of your dataflow [link]
[14] Microsoft Learn (2025) Fabric: Dataflow destinations and managed settings [link]
[15] Microsoft Learn (2025) Fabric: Dataflow refresh [link]

Resources:
[R1] Arshad Ali & Bradley Schacht (2024) Learn Microsoft Fabric [link]
[R2] Microsoft Learn: Fabric (2023) Data Factory limitations overview [link]
[R3] Microsoft Fabric Blog (2023) Data Factory Spotlight: Dataflow Gen2, by Miguel Escobar [link]
[R4] Microsoft Learn (2023) Fabric: Dataflow Gen2 connectors in Microsoft Fabric [link]
[R5] Microsoft Learn (2023) Fabric: Pattern to incrementally amass data with Dataflow Gen2 [link]
[R6] Fourmoo (2004) Microsoft Fabric – Comparing Dataflow Gen2 vs Notebook on Costs and usability, by Gilbert Quevauvilliers [link]
[R7] Microsoft Learn: Fabric (2023) A guide to Fabric Dataflows for Azure Data Factory Mapping Data Flow users [link]
[R8] Microsoft Learn: Fabric (2023) Quickstart: Create your first dataflow to get and transform data [link]
[R9] Microsoft Learn: Fabric (2023) Microsoft Fabric decision guide: copy activity, dataflow, or Spark [link]
[R10] Microsoft Fabric Blog (2023) Dataflows Gen2 data destinations and managed settings, by Miquella de Boer  [link]
[R11] Microsoft Fabric Blog (2023) Service principal support to connect to data in Dataflow, Datamart, Dataset and Dataflow Gen 2, by Miquella de Boer [link]
[R12] Chris Webb's BI Blog (2023) Fabric Dataflows Gen2: To Stage Or Not To Stage? [link]
[R13] Power BI Tips (2023) Let's Learn Fabric ep.7: Fabric Dataflows Gen2 [link]
[R14] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
[R15] Microsoft Fabric Blog (2023) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

Acronyms:
ADLS - Azure Data Lake Storage
CI/CD - Continuous Integration/Continuous Deployment 
ETL - Extract, Transform, Load
KQL - Kusto Query Language
PQO - Power Query Online
PQT - Power Query Template

About Me

Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.