
26 April 2025

🏭🗒️Microsoft Fabric: Parameters in Dataflows Gen2 [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 26-Apr-2025

[Microsoft Fabric] Dataflow Gen2 Parameters

  • {def} parameters that allow one to dynamically control and customize Dataflows Gen2
    • make them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [1]
    • the dataflow is refreshed by passing parameter values outside of the Power Query editor through either
      • Fabric REST API [1] (see the sketch after this list)
      • native Fabric experiences [1]
    • parameter names are case sensitive [1]
    • {type} required parameters
      • {warning} the refresh fails if no value is passed for it [1]
    • {type} optional parameters
    • enabled via Parameters >> Enable parameters to be discovered and override for execution [1]
  • {limitation} dataflows with parameters can't be
    • scheduled for refresh through the Fabric scheduler [1]
    • manually triggered through the Fabric Workspace list or lineage view [1]
  • {limitation} parameters that affect the resource path of a data source or a destination are not supported [1]
    • ⇐ connections are linked to the exact data source path defined in the authored dataflow
      • can't currently be overridden to use other connections or resource paths [1]
  • {limitation} can't be leveraged by dataflows with incremental refresh [1]
  • {limitation} only parameters of type decimal number, whole number, text and true/false can be passed for override
    • any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [1]
  • {warning} parameters allow other users who have permissions to the dataflow to refresh the data with other values [1]
  • {limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [1]
  • {limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [1]
  • {limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [1]
  • {limitation} only the first request will be accepted from duplicated requests for the same parameter values [1]
    • subsequent requests are rejected until the first request finishes its evaluation [1]
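
A minimal sketch (Python) of passing parameter values to a dataflow refresh from outside the Power Query editor, as mentioned above: it assumes an Entra ID access token with rights to call the Fabric REST API; the endpoint path, jobType value and executionData payload shape follow the pattern described in [1] and [R1], so treat them as assumptions and verify them against the current documentation. The IDs and parameter names are placeholders.

import requests

# placeholders - none of these values come from the post
TOKEN = "<entra-id-access-token>"   # token for the Fabric REST API scope
WORKSPACE_ID = "<workspace-guid>"
DATAFLOW_ID = "<dataflow-guid>"

# on-demand refresh job for the dataflow; path and payload shape are assumptions based on [1]
url = (f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
       f"/items/{DATAFLOW_ID}/jobs/instances?jobType=Refresh")

payload = {
    "executionData": {
        "parameters": [
            # parameter names are case sensitive [1]
            {"parameterName": "StartDate", "type": "Text", "value": "2025-01-01"},
            {"parameterName": "RowLimit", "type": "Number", "value": 1000},
        ]
    }
}

response = requests.post(url, json=payload, headers={"Authorization": f"Bearer {TOKEN}"})
response.raise_for_status()   # a 2xx status means the refresh request was accepted
print(response.status_code, response.headers.get("Location"))

If a required parameter is omitted or an unsupported data type is passed, expect the behavior described in the limitations above (failed refresh, respectively no refresh request in the refresh history).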

References:
[1] Microsoft Learn (2025) Use public parameters in Dataflow Gen2 (Preview) [link]

Resources:
[R1] Microsoft Fabric Blog (2025) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

Acronyms:
API - Application Programming Interface
REST - Representational State Transfer

16 April 2025

🧮ERP: Implementations (Part XIV: A Never-Ending Story)

ERP Implementations Series

An ERP implementation is occasionally considered a one-time endeavor after which an organization will live happily ever after. In an ideal world that would be true, though the work never stops – things that were carved out from the implementation, optimizations, new features, new regulations, new requirements, integration with other systems, etc. An implementation is thus just the beginning of what is to come, and it's essential to get the foundation right – that's the purpose of the ERP implementation – to provide a foundation on which something bigger and solid can be erected. 

No matter how well an ERP implementation is managed and executed, respectively how well people work towards the same goals, there’s always something forgotten or carved out from the initial project. The usual suspects are the integrations with other systems, though there can also be minor or even bigger features that are planned to be addressed later, if the implementation hasn't already consumed all the financial resources available, as is usually the case. Some of the topics can be addressed as Change Requests or consolidated into projects of their own. 

Even simple integrations can become complex when the processes are poorly designed, and that typically happens more often than people think. It’s not necessarily about the lack of skillset or about the technologies used, but about the degree to which the processes can work in a loosely coupled interconnected manner. Even unidirectional integrations can raise challenges, though everything increases in complexity when the flow of data is bidirectional. Moreover, the complexity increases with each system added to the overall architecture. 

Like the manual creation of a sculpture, the processes in an ERP implementation form a skeleton that needs chiseling and smoothing until the form reaches the desired, optimized shape. However, optimization is not a one-time attempt but a continuous work of exploring what is achievable, what works, what is optimal. Sometimes optimization is an exact science, while other times it's about (scientific) experimentation in which theory, ideas and investments are put to good use. However, experimentation tends to be expensive, at least in terms of time and effort, and probably these are the main reasons why some organizations don't even attempt it – or maybe it's just laziness, pure indifference or self-preservation. In fact, why change something that already works?

Typically, software manufacturers make available new releases on a periodic basis as part of their planning for growth and of attracting more businesses. Each release that touches used functionality typically needs proper evaluation, testing and whatever organizations consider important as part of the release management process. Ideally, everything should go smoothly, though life never ceases to surprise, and even a minor release can have an important impact when critical functionality that worked earlier stops working. Test automation and other practices can make an important difference for organizations, though these require additional effort and investments that usually pay off when done right. 

Regulations and other similar requirements must be addressed as they can involve penalties or other risks that are usually worth avoiding. Ideally such requirements should be supported by design, though even then a certain volume of work is involved. Moreover, the business context can change unexpectedly, and further requirements need to be considered eventually. 

The work on an ERP system and the infrastructure built around it is a never-ending story. Therefore, organizations must secure not only the resources for the initial project, but also for what comes after it. Of course, some work can be performed manually, some requirements can be delayed, some risks can be assumed, though the value of an ERP system increases with its extended usage, at least in theory. 

12 March 2025

🏭🎗️🗒️Microsoft Fabric: Query Acceleration [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 12-Mar-2025

Query Acceleration [2]

[Microsoft Fabric] Query Acceleration

  • {def} a capability that
    • indexes and caches on the fly data landing in OneLake [2]
      • {benefit} allows to 
        • analyze real-time streams coming directly into Eventhouse and combine them with data landing in OneLake (see the sketch after this list)
        • ⇐ either coming from mirrored databases, Warehouses, Lakehouses or Spark [2] 
        • ⇒ accelerate data landing in OneLake
          • ⇐ including existing data and any new updates, and expect similar performance [1]
          • eliminates the need to 
            • manage ingestion pipelines [1]
            • maintain duplicate copies of data [1]
          • ensures that data remains in sync without additional effort [4]
          • the initial process is dependent on the size of the external table [4]
      • ⇐ provides significant performance gains, comparable to ingesting data in Eventhouse [1]
        • in some cases up to 50x and beyond [2]
      • ⇐ supported in Eventhouse over delta tables from OneLake shortcuts, etc. [4]
        • when creating a shortcut from an Eventhouse to a OneLake delta table, users can choose if they want to accelerate the shortcut [2]
        • accelerating the shortcut means equivalent ingestion into the Eventhouse
        • ⇐  optimizations that deliver the same level of performance for accelerated shortcuts as native Eventhouse tables [2]
        • e.g. indexing, caching, etc. 
      • all data management is done by the data writer [2]
      • accelerated table shortcuts behave like external tables, with the same limitations and capabilities [4]
        • {limitation} materialized views aren't supported [1]
        • {limitation} update policies aren't supported [1]
    • allows specifying a policy on top of external delta tables that defines the number of days to cache data for high-performance queries [1]
      • ⇐ queries run over OneLake shortcuts can be less performant than on data that is ingested directly to Eventhouses [1]
        • ⇐ due to network calls to fetch data from storage, the absence of indexes, etc. [1]
    • {costs} charged under OneLake Premium cache meter [2]
      • ⇐ similar to native Eventhouse tables [2]
      • one can control the amount of data to accelerate by configuring number of days to cache [2]
      • indexing activity may also count towards CU consumption [2]
    • {limitation} the number of columns in the external table can't exceed 900 [1]
    • {limitation} query performance over accelerated external delta tables which have partitions may not be optimal during preview [1]
    • {limitation} the feature assumes delta tables with static advanced features
      • e.g. column mapping doesn't change, partitions don't change, etc
      • {recommendation} to change advanced features, first disable the policy, and once the change is made, re-enable the policy [1]
    • {limitation} schema changes on the delta table must also be followed by the respective .alter external delta table schema command [1]
      • might result in acceleration starting from scratch if there was a breaking schema change [1]
    • {limitation} index-based pruning isn't supported for partitions [1]
    • {limitation} parquet files with a compressed size higher than 6 GB won't be cached [1]
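
A minimal sketch (Python, azure-kusto-data) of the scenario referenced above: querying an accelerated OneLake shortcut from an Eventhouse and combining it with a streaming table. The cluster URI, database and table names (RawEvents, AcceleratedSales) as well as the join key are hypothetical placeholders; the service-principal credentials are assumed to already have access to the KQL database.

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# placeholders - replace with the Eventhouse query URI and a registered Entra ID app
CLUSTER = "https://<eventhouse-query-uri>.kusto.fabric.microsoft.com"
DATABASE = "<kql-database>"

kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
    CLUSTER, "<client-id>", "<client-secret>", "<tenant-id>")
client = KustoClient(kcsb)

# hypothetical query: recent streaming events joined with an accelerated shortcut table,
# which is queried like any other table once acceleration is enabled
query = """
RawEvents
| where ingestion_time() > ago(1h)
| join kind=inner (AcceleratedSales) on CustomerId
| summarize Orders = count() by CustomerId
"""

response = client.execute(DATABASE, query)
for row in response.primary_results[0]:
    print(row["CustomerId"], row["Orders"])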

References:
[1] Microsoft Learn (2024) Fabric: Query acceleration for OneLake shortcuts - overview (preview) [link]
[2] Microsoft Fabric Updates Blog (2024) Announcing Eventhouse Query Acceleration for OneLake Shortcuts (Preview) [link]
[3] Microsoft Learn (2024) Fabric: Query acceleration over OneLake shortcuts (preview) [link]
[4] Microsoft Fabric Updates Blog (2025) Eventhouse Accelerated OneLake Table Shortcuts – Generally Available [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

22 January 2025

🏭🗒️Microsoft Fabric: Folders [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 22-Jan-2025

[Microsoft Fabric] Folders

  • {def} organizational units inside a workspace that enable users to efficiently organize and manage artifacts in the workspace [1]
  • identifiable by its name
    • {constraint} must be unique in a folder or at the root level of the workspace
    • {constraint} can’t include certain special characters [1] (see the validation sketch after this list)
      • C0 and C1 control codes [1]
      • leading or trailing spaces [1]
      • characters: ~"#.&*:<>?/{|} [1]
    • {constraint} can’t have system-reserved names
      • e.g. $recycle.bin, recycled, recycler.
    • {constraint} its length can't exceed 255 characters
  • {operation} create folder
    • can be created in
      • an existing folder (aka nested subfolder) [1]
        • {restriction} a maximum of 10 levels of nested subfolders can be created [1]
        • up to 10 folders can be created in the root folder [1]
        • {benefit} provide a hierarchical structure for organizing and managing items [1]
      • the root
  • {operation} move folder
  • {operation} rename folder
    • the same rules apply as for folder creation [1]
  • {operation} delete folder
    • {restriction} currently only empty folders can be deleted [1]
      • {recommendation} make sure the folder is empty [1]
  •  {operation} create item in folder
    • {restriction} certain items can’t be created in a folder
      • dataflows gen2
      • streaming semantic models
      • streaming dataflows
    • ⇐ items created from the home page or the Create hub are created at the root level of the workspace [1]
  • {operation} move file(s) between folders [1]
  • {operation} publish to folder [1]
    •   Power BI reports can be published to specific folders
      • {restriction} report names must be unique throughout an entire workspace, regardless of their location [1]
        • when publishing a report to a workspace that has another report with the same name in a different folder, the report will publish to the location of the already existing report [1]
  • {limitation} may not be supported by certain features
    •   e.g. Git
  • {recommendation} use folders to organize workspaces [1]
    •  allows to improve content’s organization and navigation [AN]
    •  allows to improve collaboration, access control and governance [AN]
  • {permissions}
    • inherit the permissions of the workspace where they're located [1] [2]
    • workspace admins, members, and contributors can create, modify, and delete folders in the workspace [1]
    • viewers can only view folder hierarchy and navigate in the workspace [1]
  • [deployment pipelines] when deploying items in folders to a different stage, the folder hierarchy is automatically applied [2]
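
A small illustrative sketch (Python) of the naming constraints listed above; the helper and the sample names are hypothetical, and the uniqueness constraint within the parent folder or workspace root isn't checked here.

# illustrative check of the folder naming constraints summarized above [1]
RESERVED_NAMES = {"$recycle.bin", "recycled", "recycler"}
INVALID_CHARS = set('~"#.&*:<>?/{|}')

def is_valid_folder_name(name: str) -> bool:
    if not name or len(name) > 255:
        return False                                  # length limit of 255 characters
    if name != name.strip():
        return False                                  # no leading or trailing spaces
    if name.lower() in RESERVED_NAMES:
        return False                                  # no system-reserved names
    if any(ch in INVALID_CHARS for ch in name):
        return False                                  # no disallowed special characters
    if any(ord(ch) < 0x20 or 0x80 <= ord(ch) <= 0x9F for ch in name):
        return False                                  # no C0/C1 control codes
    return True

print(is_valid_folder_name("Sales Reports"))   # True
print(is_valid_folder_name("recycled"))        # False - reserved name
print(is_valid_folder_name("Q1:Results"))      # False - invalid character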


References:
[1] Microsoft Fabric (2024) Create folders in workspaces [link]
[2] Microsoft Fabric (2024) The deployment pipelines process [link]
[3] Microsoft Fabric Updates Blog (2025) Define security on folders within a shortcut using OneLake data access roles [link]
[4] Microsoft Fabric Updates Blog (2025) Announcing the General Availability of Folder in Workspace [link]
[5] Microsoft Fabric Updates Blog (2025) Announcing Folder in Workspace in Public Preview [link]
[6] Microsoft Fabric Updates Blog (2025) Getting the size of OneLake data items or folders [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

20 January 2025

🏭🗒️Microsoft Fabric: [Azure] Service Principals (SPN) [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 20-Jan-2025

[Azure] Service Principal (SPN)  

  • {def} a non-human, application-based security identity used by applications or automation tools to access specific Azure resources [1]
    • can be assigned precise permissions, making them perfect for automated processes or background services
      • allows to minimize the risks of human error and identity-based vulnerabilities
      • supported in datasets, Gen1/Gen2 dataflows, datamarts [2]
      • authentication type 
        • supported only by [2]
          • Azure Data Lake Storage
          • Azure Data Lake Storage Gen2
          • Azure Blob Storage
          • Azure Synapse Analytics
          • Azure SQL Database
          • Dataverse
          • SharePoint online
        • doesn’t support
          • SQL data source with Direct Query in datasets [2]
  • when registering a new application in Microsoft Entra ID, a SPN is automatically created for the app registration [4]
    • the access to resources is restricted by the roles assigned to the SPN
      • ⇒ gives control over which resources can be accessed and at which level [4]
    • {recommendation} use SPN with automated tools [4]
      • rather than allowing them to sign in with a user identity  [4]
    • {prerequisite} an active Microsoft Entra user account with sufficient permissions to 
      • register an application with the tenant [4]
      • assign to the application a role in the Azure subscription [4]
      •  requires Application.ReadWrite.All permission [4]
  • extended to support Fabric Data Warehouses [1]
    • {benefit} automation-friendly API Access
      • allows to create, update, read, and delete Warehouse items via Fabric REST APIs using service principals [1]
      • enables to automate repetitive tasks without relying on user credentials [1]
        • e.g. provisioning or managing warehouses
        • increases security by limiting human error
      • the warehouses thus created will be displayed in the Workspace list view in Fabric UI, with the Owner name of the SPN [1]
      • applicable to users with administrator, member, or contributor workspace role [3]
      • minimizes risk
        • the warehouses created with delegated account or fixed identity (owner’s identity) will stop working when the owner leaves the organization [1]
          • Fabric requires the user to log in every 30 days to ensure a valid token is provided for security reasons [1]
    • {benefit} seamless integration with Client Tools: 
      • tools like SSMS can connect to the Fabric DWH using SPN [1] (see the connection sketch after this list)
      • SPN provides secure access for developers to 
        • run COPY INTO
          • with and without firewall enabled storage [1]
        • run any T-SQL query programmatically on a schedule with ADF pipelines [1]
    • {benefit} granular access control
      • Warehouses can be shared with an SPN through the Fabric portal [1]
        • once shared, administrators can use T-SQL commands to assign specific permissions to SPN [1]
          • allows to control precisely which data and operations an SPN has access to  [1]
            • GRANT SELECT ON <table name> TO <Service principal name>  
      • warehouses' ownership can be changed from an SPN to user, and vice-versa [3]
    • {benefit} improved DevOps and CI/CD Integration
      • SPN can be used to automate the deployment and management of DWH resources [1]
        •  ensures faster, more reliable deployment processes while maintaining strong security postures [1]
    • {limitation} default semantic models are not supported for SPN created warehouses [3]
      • ⇒ features such as listing tables in the dataset view or creating a report from the default dataset don’t work [3]
    • {limitation} SPN for SQL analytics endpoints is not currently supported
    • {limitation} SPNs are currently not supported for COPY INTO error files [3]
      • ⇐ Entra ID credentials are not supported as well [3]
    • {limitation} SPNs are not supported for GIT APIs. SPN support exists only for Deployment pipeline APIs [3]
    • monitoring tools
      • [DMV] sys.dm_exec_sessions.login_name column [3] 
      • [Query Insights] queryinsights.exec_requests_history.login_name [3]
      • Query activity
        • submitter column in Fabric query activity [3]
      • Capacity metrics app: 
        • compute usage for warehouse operations performed by SPN appears as the Client ID under the User column in Background operations drill through table [3]
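
A minimal connection sketch (Python) for the client-tools scenario referenced above, assuming the azure-identity and pyodbc packages, the ODBC Driver 18 for SQL Server, and an SPN that has already been granted access to the warehouse. The server and database names are placeholders, and the token scope shown is the one commonly used for SQL-based endpoints, so verify both against the current documentation.

import struct
import pyodbc
from azure.identity import ClientSecretCredential

# placeholders - none of these values come from the post
TENANT_ID = "<tenant-guid>"
CLIENT_ID = "<app-registration-client-id>"
CLIENT_SECRET = "<client-secret>"
SERVER = "<workspace-connection-string>.datawarehouse.fabric.microsoft.com"  # assumed endpoint format
DATABASE = "<warehouse-name>"

# acquire a token for the SPN; the scope is an assumption (commonly used for SQL endpoints)
credential = ClientSecretCredential(TENANT_ID, CLIENT_ID, CLIENT_SECRET)
token = credential.get_token("https://database.windows.net/.default").token

# pyodbc expects the access token as a length-prefixed UTF-16-LE byte structure
token_bytes = token.encode("utf-16-le")
token_struct = struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)
SQL_COPT_SS_ACCESS_TOKEN = 1256  # pre-connect attribute for access tokens

conn = pyodbc.connect(
    f"Driver={{ODBC Driver 18 for SQL Server}};Server={SERVER};Database={DATABASE};Encrypt=yes;",
    attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct},
)
cursor = conn.cursor()
cursor.execute("SELECT TOP 5 name FROM sys.tables;")
for row in cursor.fetchall():
    print(row.name)
conn.close()

The permissions granted to the SPN via T-SQL (e.g. the GRANT SELECT example above) determine which objects such a session can actually query.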

References:
[1] Microsoft Fabric Updates Blog (2024) Service principal support for Fabric Data Warehouse [link]
[2] Microsoft Fabric Learn (2024) Service principal support in Data Factory [link]
[3] Microsoft Fabric Learn (2024) Service principal in Fabric Data Warehouse [link]
[4] Microsoft Fabric Learn (2024) Register a Microsoft Entra app and create a service principal [link]
[5] Microsoft Fabric Updates Blog (2024) Announcing Service Principal support for Fabric APIs [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
 
Acronyms:
ADF - Azure Data Factory
API - Application Programming Interface
CI/CD - Continuous Integration/Continuous Deployment
DMV - Dynamic Management View
DWH - Data Warehouse
SPN - service principal
SSMS - SQL Server Management Studio

18 December 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part VII: Data Stores Comparison)

Business Intelligence Series

Microsoft made available a reference guide for the data stores supported for Microsoft Fabric workloads [1], including the new Fabric SQL database (see previous post). Here's the consolidated table followed by a few aspects to consider: 

Area: Lakehouse | Warehouse | Eventhouse | Fabric SQL database | Power BI Datamart
Data volume: Unlimited | Unlimited | Unlimited | 4 TB | Up to 100 GB
Type of data: Unstructured, semi-structured, structured | Structured, semi-structured (JSON) | Unstructured, semi-structured, structured | Structured, semi-structured, unstructured | Structured
Primary developer persona: Data engineer, data scientist | Data warehouse developer, data architect, data engineer, database developer | App developer, data scientist, data engineer | AI developer, App developer, database developer, DB admin | Data scientist, data analyst
Primary dev skill: Spark (Scala, PySpark, Spark SQL, R) | SQL | No code, KQL, SQL | SQL | No code, SQL
Data organized by: Folders and files, databases, and tables | Databases, schemas, and tables | Databases, schemas, and tables | Databases, schemas, tables | Database, tables, queries
Read operations: Spark, T-SQL | T-SQL, Spark* | KQL, T-SQL, Spark | T-SQL | Spark, T-SQL
Write operations: Spark (Scala, PySpark, Spark SQL, R) | T-SQL | KQL, Spark, connector ecosystem | T-SQL | Dataflows, T-SQL
Multi-table transactions: No | Yes | Yes, for multi-table ingestion | Yes, full ACID compliance | No
Primary development interface: Spark notebooks, Spark job definitions | SQL scripts | KQL Queryset, KQL Database | SQL scripts | Power BI
Security: RLS, CLS**, table level (T-SQL), none for Spark | Object level, RLS, CLS, DDL/DML, dynamic data masking | RLS | Object level, RLS, CLS, DDL/DML, dynamic data masking | Built-in RLS editor
Access data via shortcuts: Yes | Yes | Yes | Yes | No
Can be a source for shortcuts: Yes (files and tables) | Yes (tables) | Yes | Yes (tables) | No
Query across items: Yes | Yes | Yes | Yes | No
Advanced analytics: Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Time Series native elements, full geo-spatial and query capabilities | T-SQL analytical capabilities, data replicated to delta parquet in OneLake for analytics | Interface for data processing with automated performance tuning
Advanced formatting support: Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Full indexing for free text and semi-structured data like JSON | Table support for OLTP, JSON, vector, graph, XML, spatial, key-value | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format
Ingestion latency: Available instantly for querying | Available instantly for querying | Queued ingestion, streaming ingestion has a couple of seconds latency | Available instantly for querying | Available instantly for querying

It can be used as a map of what one needs to know in order to use each feature, respectively to identify how previous experience can be leveraged – and here I'm referring to the many SQL developers. One must also consider the capabilities and limitations of each storage repository.

However, what I'm missing are some references regarding the performance of data access, especially compared with on-premises workloads. Moreover, the devil hides in the details, therefore one must test thoroughly before committing to any of the above choices. For the newest overview please check the referenced documentation!

For lakehouses, the hardest limitation is the lack of multi-table transactions, though that's understandable given its scope. However, probably the most important aspect is whether it can scale with the volume of reads/writes as currently the SQL endpoint seems to lag. 

The warehouse seems to be more versatile, though careful attention needs to be given to its design. 

The Eventhouse opens the door to a wide range of time-based scenarios, though it will be interesting how developers cope with its lack of functionality in some areas. 

Fabric SQL databases are a new addition, and hopefully they'll allow considering a wide range of OLTP scenarios. Starting with the 28th of March 2025, SQL databases will be ON by default, and tenant admins must manually turn them OFF before the respective date [3].

Power BI datamarts have been in preview for a couple of years.


References:
[1] Microsoft Fabric (2024) Microsoft Fabric decision guide: choose a data store [link]
[2] Reitse's blog (2024) Testing Microsoft Fabric Capacity: Data Warehouse vs Lakehouse Performance [link]
[3] Microsoft Fabric Update Blog (2025) Extending flexibility: default checkbox changes on tenant settings for SQL database in Fabric [link]
[4] Microsoft Fabric Update Blog (2025) Enhancing SQL database in Fabric: share your feedback and shape the future [link]
[5] Microsoft Fabric Update Blog (2025) Why SQL database in Fabric is the best choice for low-code/no-code Developers [link]

09 December 2024

🏭🗒️Microsoft Fabric: Microsoft Fabric [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 8-Dec-2024

Microsoft Fabric 

  • {goal} complete (end-to-end) analytics platform [6]
    • {characteristic} unified
      • {objective} provides a single, integrated environment for all the organization
        • {benefit} data professionals and the business users can collaborate on data projects [5] and solutions
    • {characteristic} serverless SaaS model (aka SaaS-ified)
      • {objective} provisioned automatically with the tenant [6]
      • {objective} highly scalable [5]
      • {objective} cost-effectiveness [5]
      • {objective} accessible 
        • ⇐ from anywhere with an internet connection [5]
      • {objective} continuous updates
        • ⇐ provided by Microsoft
      • {objective} continuous maintenance 
        • ⇐ provided by Microsoft
      • provides a set of integrated services that enable to ingest, store, process, and analyze data in a single environment [5]
    • {objective} secure
    • {objective} governed
  • {goal} lake-centric
    • {characteristic} OneLake-based
      • all workloads automatically store their data in the OneLake workspace folders [6]
      • all the data is organized in an intuitive hierarchical namespace [6]
      • data is automatically indexed [6]
      • provides a set of features 
        • discovery
        • MIP labels
        • lineage
        • PII scans
        • sharing
        • governance
        • compliance
    • {characteristic} one copy
      • available for all computes 
      • all compute engines store their data automatically in OneLake
        •  the data is stored in a (single) common format
          •  delta parquet file format
            • open standards format
            • the storage format for all tabular data in Microsoft Fabric 
        • ⇐ the data is directly accessible by all the engines [6]
          • ⇐ no import/export needed
      • all compute engines are fully optimized to work with Delta Parquet as their native format [6]
      • a shared universal security model is enforced across all the engines [6]
    • {characteristic} open at every tier
  • {goal} empowering
    • {characteristic} intuitive
    • {characteristic} built into M365
    • {characteristic} insight to action
  • {goal} AI-powered
    • {characteristic} Copilot accelerated 
    • {characteristic} ChatGPT enabled
    • {characteristic} AI-driven insights
  •  complete analytics platform
    • addresses the needs of all data professionals and business users who target harnessing the value of data 
  • {feature} scales automatically
    • the system automatically allocates an appropriate number of compute resources based on the job size
    • the cost is proportional to total resource consumption, rather than size of cluster or number of resources allocated 
    •  jobs in general complete faster (and usually, at less overall cost)
      • ⇒ not need to specify cluster sizes
  • natively supports 
    • Spark
    • data science
    • log-analytics
    • real-time ingestion and messaging
    • alerting
    • data pipelines
    • Power BI reporting 
    • interoperability with third-party services 
      • from other vendors that support the same open standards
  • data virtualization mechanisms 
    • {feature} mirroring [notes]
    • {feature} shortcuts [notes]
      • allow users to reference data without copying it
      • {benefit} make other domain data available locally without the need for copying data
  • {feature} tenant (aka Microsoft Fabric tenant, MF tenant)
    • a single instance of Fabric for an organization that is aligned with a Microsoft Entra ID
    • can contain any number of workspaces
  • {feature} workspaces
    • {definition} a collection of items that brings together different functionality in a single environment designed for collaboration
    • associated with a domain [3]
  • {feature} domains [notes]
    • {definition} a way of logically grouping together data in an organization that is relevant to a particular area or field [1]
    • subdomains
      • a way for fine tuning the logical grouping data under a domain [1]
        • subdivisions of a domain

References:
[1] Microsoft Learn (2023) Administer Microsoft Fabric [link]
[2] Microsoft Learn: Fabric (2024) Governance overview and guidance [link]
[3] Microsoft Learn: Fabric (2023) Fabric domains [link]
[4] Establishing Data Mesh architectural pattern with Domains and OneLake on Microsoft Fabric, by Maheswaran Arunachalam [link]
[5] Microsoft Learn: Fabric (2024) Introduction to end-to-end analytics using Microsoft Fabric [link]
[6] Microsoft Fabric (2024) Fabric Analyst in a Day [course notes]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
API - Application Programming Interface
M365 - Microsoft 365
MF - Microsoft Fabric
PII - Personally Identifiable Information
SaaS - software-as-a-service

13 June 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part V: One Person Can’t Learn or Do Everything)

Business Intelligence Series

Today’s Explicit Measures webcast [1] considered an article written by Kurt Buhler (The Data Goblins): [Microsoft] "Fabric is a Team Sport: One Person Can’t Learn or Do Everything" [2]. It’s a well-written article that deserves some thought as there are several important points made. I can’t say I agree with the full extent of some statements, even if some disagreements are probably just a matter of semantics.

My main disagreement starts with the title “One Person Can’t Learn or Do Everything”. As clarified in webcast's chat, the author defines “everything" as an umbrella for “all the capabilities and experiences that comprise Fabric including both technical (like Power BI) or non-technical (like adoption data literacy) and everything in between” [1].

For me “everything” is relative and considers a domain's core set of knowledge, while "expertise" (≠ "mastery") refers to the degree to which a person can use the respective knowledge to build back-to-back solutions for a given area. I’d say that it becomes more and more challenging for beginners or average data professionals to cover the core features. Moreover, I’d separate the non-technical skills because then one will also need to consider topics like Data, Project, Information or Knowledge Management.

There are different levels of expertise, and they can vary in depth (specialization) or breadth (covering multiple areas), respectively depend on previous experience (whether one worked with similar technologies). Usually, there’s a minimum of requirements that need to be covered for being considered as expert (e.g. certification, building a solution from beginning to the end, troubleshooting, performance optimization, etc.). It’s also challenging to roughly define when one’s expertise starts (or ends), as there are different perspectives on the topics. 

Conversely, the term expert is in general misused extensively, sometimes even with mischievous intent. An “expert” is usually considered to be an external consultant or a person who got certified in an area, even if the person may not be able to build solutions that address a customer’s needs. 

Even data professionals with many years of experience can be overwhelmed by the volume of knowledge, especially when one considers the different experiences available in MF, respectively the volume of new features released monthly. Conversely, expertise can be considered with respect to only one or more MF experiences, or to one area within a certain layer. A lot of the knowledge can be transported from other areas – writing SQL and complex database objects, modelling (enterprise) semantic layers, programming in Python, R or Power Query, building data pipelines, managing SQL databases, etc. 

Besides the standard documentation, training sessions, and some reference architectures, Microsoft also made available labs and other material, which help in discovering the available features, though they don't teach people how to build complete solutions. More important than explicitly declaring the role-based audience, I find, is the creation of learning paths for the various roles.

During the past 6-7 months I've spent on average 2 days per week learning MF topics. My problem is not the documentation but the lack of maturity of some features, the gaps in functionality, identifying the respective gaps, and knowing what new features will be made available and when. The fact that features are made available or changed while learning makes the process more challenging. 

My goal is to be able to provide back-to-back solutions and I believe that’s possible, even if I might not consider all the experiences available. During the past 22 years, at least until MF, I could build complete BI solutions starting from requirements elicitation, data extraction, modeling and processing for data consumption, respectively data consumption for the various purposes. At least this was the journey of a Software Engineer into the world of data. 

References:
[1] Explicit Measures (2024) Power BI tips Ep.328: Microsoft Fabric is a Team Sport (link)
[2] Data Goblins (2024) Fabric is a Team Sport: One Person Can’t Learn or Do Everything (link)

10 March 2024

🏭🗒️Microsoft Fabric: Dataflows Gen2 [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 24-Nov-2025

Dataflow (Gen2) Architecture [4]

[Microsoft Fabric] Dataflow (Gen2) 

  • cloud-based, low-code interface that provides a modern data integration experience allowing users to ingest, prepare and transform data from a rich set of data sources incl. databases, data warehouses, lakehouses, real-time data repositories, etc. [11]
    • new generation of dataflows that resides alongside the Power BI Dataflow (Gen1) [2]
      • brings new features, improved experience [2] and enhanced performance [11]
      • similar to Dataflow Gen1 in Power BI [2] 
      • {recommendation} implement new functionality using Dataflow (Gen2) [11]
        • allows to leverage the many features and experiences not available in (Gen1) 
      • {recommendation} migrate from Dataflow (Gen1) to (Gen2) [11] 
        • allows to leverage the modern experience and capabilities 
    • allows to 
      • extract data from various sources [1]
      • transform it using a wide range of transformation operations [1]
      • load it into a destination [1]
    • {goal} provide an easy, reusable way to perform ETL tasks using Power Query Online [1]
      • allows to promote reusable ETL logic 
        • ⇒ prevents the need to create more connections to the data source [1]
        • offer a wide variety of transformations [1]
    • can be horizontally partitioned
  • {component} Lakehouse 
    • used to stage data being ingested
  • {component} Warehouse 
    • used as a compute engine and means to write back results to staging or supported output destinations faster
  • {component} mashup engine
    • extracts, transforms, or loads the data to staging or data destinations when either [4]
      • warehouse compute cannot be used [4]
      • {limitation} staging is disabled for a query [4]
  • {operation} create a dataflow
    • can be created in a
      • Data Factory workload
      • Power BI workspace
      • Lakehouse
    •  when a dataflow (Gen2) is created in a workspace, lakehouse and warehouse items are provisioned along with their related SQL analytics endpoint and semantic models [12]
      •  shared by all dataflows in the workspace and are required for Dataflow Gen2 to operate [12]
        • {warning} shouldn't be deleted, and aren't intended to be used directly by users [12]
        •  aren't visible in the workspace, but might be accessible in other experiences such as the Notebook, SQL-endpoint, Lakehouse, and Warehouse experience [12]
        •  the items can be recognized by their prefix: 'DataflowsStaging' [12]
  • {operation} set a default destination for the dataflow 
    • helps to get started quickly by loading all queries to the same destination [14]
    • via ribbon or the status bar in the editor
    • users are prompted to choose a destination and select which queries to bind to it [14]
    • to update the default destination, delete the current default destination and set a new one [14]
    • {default} any new query has as destination the lakehouse, warehouse, or KQL database from which it got started [14] 
  • {operation} publish a dataflow
    • generates dataflow's definition  
      • ⇐ the program that runs once the dataflow is refreshed to produce tables in staging storage and/or output destination [4]
      • used by the dataflow engine to generate an orchestration plan, manage resources, and orchestrate execution of queries across data sources, gateways, and compute engines, and to create tables in either the staging storage or data destination [4]
    • saves changes and runs validations that must be performed in the background [2]
  • {operation} export/import dataflows [11]
    •  allows also to migrate from dataflow (Gen1) to (Gen2) [11]
  • {operation} refresh a dataflow
    • applies the transformation steps defined during authoring 
    • can be triggered on-demand or by setting up a refresh schedule
    • {action} cancel refresh
      • enables to cancel ongoing Dataflow Gen2 refreshes from the workspace items view [6]
      • once canceled, the dataflow's refresh history status is updated to reflect cancellation status [15] 
      • {scenario} stop a refresh during peak time, if a capacity is nearing its limits, or if refresh is taking longer than expected [15]
      • it may have different outcomes
        • data from the last successful refresh is available [15]
        • data written up to the point of cancellation is available [15]
      • {warning} if a refresh is canceled before evaluation of a query that loads data to a destination began, there's no change to data in that query's destination [15]
    • {limitation} each dataflow is allowed up to 300 refreshes per 24-hour rolling window [15]
      •  {warning} attempting 300 refreshes within a short burst (e.g., 60 seconds) may trigger throttling and result in rejected requests [15] (see the retry sketch at the end of these notes)
        •  protections in place to ensure system reliability [15]
      • if the scheduled dataflow refresh fails consecutively, the dataflow refresh schedule is paused and an email is sent to the owner [15]
    • {limitation} a single evaluation of a query has a limit of 8 hours [15]
    • {limitation} total refresh time of a single refresh of a dataflow is limited to a max of 24 hours [15]
    • {limitation} per dataflow one can have a maximum of 50 staged queries, or queries with output destination, or combination of both [15]
  • {operation} copy and paste code in Power Query [11]
    •   allows to migrate dataflow (Gen1) to (Gen2) [11]
  • {operation} save a dataflow [11]
    • via 'Save As'  feature
    • can be used to save a dataflow (Gen1) as (Gen2) dataflow [11] 
  • {operation} save a dataflow as draft 
    •  allows to make changes to dataflows without immediately publishing them to a workspace [13]
      • can be later reviewed, and then published, if needed [13]
    • {operation} publish draft dataflow 
      • performed as a background job [13]
      • publishing related errors are visible next to the dataflow's name [13]
        • selecting the indication reveals the publishing errors and allows to edit the dataflow from the last saved version [13]
  • {operation} run a dataflow 
    • can be performed
      • manually
      • on a refresh schedule
      • as part of a Data Pipeline orchestration
  •  {operation} monitor pipeline runs 
    • allows to check pipelines' status, spot issues early, respectively troubleshoot issues
    • [Workspace Monitoring] provides log-level visibility for all items in a workspace [link]
      • via Workspace Settings >> select Monitoring 
    • [Monitoring Hub] serves as a centralized portal for browsing pipeline runs across items within the Data Factory or Data Engineering experience [link]
  • {feature} connect multiple activities in a pipeline [11]
    •  allows to build end-to-end, automated data workflows
  • {feature} author dataflows with Power Query
    • uses the full Power Query experience of Power BI dataflows [2]
  • {feature} shorter authoring flow
    • uses a step-by-step experience for getting the data into the dataflow [2]
      • the number of steps required to create dataflows were reduced [2]
    • a few new features were added to improve the experience [2]
  • {feature} AutoSave and background publishing
    • changes made to a dataflow are autosaved to the cloud (aka draft version of the dataflow) [2]
      • ⇐ without having to wait for the validation to finish [2]
    • {functionality} save as draft 
      • stores a draft version of the dataflow every time you make a change [2]
      • seamless experience and doesn't require any input [2]
    • {concept} published version
      • the version of the dataflow that passed validation and is ready to refresh [5]
  • {feature} integration with data pipelines
    • integrates directly with Data Factory pipelines for scheduling and orchestration [2] 
  • {feature} high-scale compute
    • leverages a new, higher-scale compute architecture [2] 
      •  improves the performance of both transformations of referenced queries and get data scenarios [2]
      • creates both Lakehouse and Warehouse items in the workspace, and uses them to store and access data to improve performance for all dataflows [2]
  • {feature} improved monitoring and refresh history
    • integrate support for Monitoring Hub [2]
    • Refresh History experience upgraded [2]
  • {feature} get data via Dataflows connector
    • supports a wide variety of data source connectors
      • include cloud and on-premises relational databases
  • {feature} incremental refresh
    • enables to incrementally extract data from data sources, apply Power Query transformations, and load into various output destinations [5]
  • {feature} data destinations
    • allows to 
      • specify an output destination
      • separate ETL logic and destination storage [2]
    • every tabular data query can have a data destination [3]
      • available destinations
        • Azure SQL databases
        • Azure Data Explorer (Kusto)
        • Fabric Lakehouse
        • Fabric Warehouse
        • Fabric KQL database
      • a destination can be specified for every query individually [3]
      • multiple different destinations can be used within a dataflow [3]
      • connecting to the data destination is similar to connecting to a data source
      • {limitation} functions and lists aren't supported
    • {operation} creating a new table
      • {default} the table has the same name as the query.
    • {operation} picking an existing table
    • {operation} deleting a table manually from the data destination 
      • doesn't recreate the table on the next refresh [3]
    • {operation} reusing queries from Dataflow Gen1
      • {method} export Dataflow Gen1 query and import it into Dataflow Gen2
        • export the queries as a PQT file and import them into Dataflow Gen2 [2]
      • {method} copy and paste in Power Query
        • copy the queries and paste them in the Dataflow Gen2 editor [2]
    • {feature} automatic settings:
      • {limitation} supported only for Lakehouse and Azure SQL database
      • {setting} Update method replace: 
        • data in the destination is replaced at every dataflow refresh with the output data of the dataflow [3]
      • {setting} Managed mapping: 
        • the mapping is automatically adjusted when republishing the data flow to reflect the change 
          • ⇒ doesn't need to be updated manually into the data destination experience every time changes occur [3]
      • {setting} Drop and recreate table: 
        • on every dataflow refresh the table is dropped and recreated to allow schema changes
        • {limitation} the dataflow refresh fails if any relationships or measures were added to the table [3]
    • {feature} update methods
      • {method} replace
        • on every dataflow refresh, the data is dropped from the destination and replaced by the output data of the dataflow.
        • {limitation} not supported by Fabric KQL databases and Azure Data Explorer 
      • {method} append
        • on every dataflow refresh, the output data from the dataflow is appended (aka merged) to the existing data in the data destination table (aka upsert)
    • {feature} data staging 
      • {default} enabled
        • allows to use Fabric compute to execute queries
          • ⇐ enhances the performance of query processing
        • the data is loaded into the staging location
          • ⇐ an internal Lakehouse location accessible only by the dataflow itself
        • [Warehouse] staging is required before the write operation to the data destination
          • ⇐ improves performance
          • {limitation} only loading into the same workspace as the dataflow is supported
        •  using staging locations can enhance performance in some cases
      • disabled
        • {recommendation} [Lakehouse] disable staging on the query to avoid loading twice into a similar destination
          • ⇐ once for staging and once for data destination
          • improves dataflow's performance
    • {scenario} use a dataflow to load data into the lakehouse and then use a notebook to analyze the data [2]
    • {scenario} use a dataflow to load data into an Azure SQL database and then use a data pipeline to load the data into a data warehouse [2]
  • {feature} Fast Copy
    • allows ingesting terabytes of data with the easy experience and the scalable back-end of the pipeline Copy Activity [7]
      • enables large-scale data ingestion directly utilizing the pipelines Copy Activity capability [6]
      • supports sources such Azure SQL Databases, CSV, and Parquet files in Azure Data Lake Storage and Blob Storage [6]
      • significantly scales up the data processing capacity providing high-scale ELT capabilities
    • the feature must be enabled [7]
      • after enabling, Dataflows automatically switch the back-end when data size exceeds a particular threshold [7]
      • ⇐ there's no need to change anything during authoring of the dataflows
      • one can check the refresh history to see if fast copy was used [7]
      • ⇐ see the Engine type
      • {option} Require fast copy
    • {prerequisite} Fabric capacity is available [7]
      •  requires a Fabric capacity or a Fabric trial capacity [11]
    • {prerequisite} data files 
      • are in .csv or parquet format
      • have at least 100 MB
      • are stored in an ADLS Gen2 or a Blob storage account [6]
    • {prerequisite} [Azure SQL DB|PostgreSQL] >= 5 million rows in the data source [7]
    • {limitation} doesn't support [7] 
      • the VNet gateway
      • writing data into an existing table in Lakehouse
      • fixed schema
  • {feature} parameters
    • allow to dynamically control and customize dataflows
      • makes them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [9]
      • the dataflow is refreshed by passing parameter values outside of the Power Query editor through either
        • Fabric REST API [9]
        • native Fabric experiences [9]
      • parameter names are case sensitive [9]
      • {type} required parameters
        • {warning} the refresh fails if no value is passed for it [9]
      • {type} optional parameters
      • enabled via Parameters >> Enable parameters to be discovered and override for execution [9]
    • {limitation} dataflows with parameters can't be
      • scheduled for refresh through the Fabric scheduler [9]
      • manually triggered through the Fabric Workspace list or lineage view [9]
    • {limitation} parameters that affect the resource path of a data source or a destination are not supported [9]
      • ⇐ connections are linked to the exact data source path defined in the authored dataflow
        • can't currently be overridden to use other connections or resource paths [9]
    • {limitation} can't be leveraged by dataflows with incremental refresh [9]
    • {limitation} only parameters of type decimal number, whole number, text and true/false can be passed for override
      • any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [9]
    • {warning} parameters allow other users who have permissions to the dataflow to refresh the data with other values [9]
    • {limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [9]
    • {limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [9]
    • {limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [9]
    • {limitation} only the first request will be accepted from duplicated requests for the same parameter values [9]
      • subsequent requests are rejected until the first request finishes its evaluation [9]
  • {feature} support for CI/CD and Git integration
    • allows to create, edit, and manage dataflows in a Git repository that's connected to a Fabric workspace [10]
    • allows to use the deployment pipelines to automate the deployment of dataflows between workspaces [10]
    • allows to use Public APIs to create and manage Dataflow Gen2 with CI/CD and Git integration [10]
    • allows to create Dataflow Gen2 directly into a workspace folder [10]
    • allows to use the Fabric settings and scheduler to refresh and edit settings for Dataflow Gen2 [10]
    • {action} save a workflow
      • replaces the publish operation
      • when saving the dataflow, the changes are automatically published [10]
    • {action} delete a dataflow
      • the staging artifacts become visible in the workspace and are safe to be deleted [10]
    • {action} schedule a refresh
      • can be done manually or by scheduling a refresh [10]
      • {limitation} the Workspace view doesn't show if a refresh is ongoing for the dataflow [10]
      • refresh information is available in the refresh history [10]
    • {action} branching out to another workspace
      • {limitation} the refresh can fail with the message that the staging lakehouse couldn't be found [10]
      • {workaround} create a new Dataflow Gen2 with CI/CD and Git support in the workspace to trigger the creation of the staging lakehouse [10]
      •  all other dataflows in the workspace should start to function again.
    • {action} syncing changes from GIT into the workspace
      • requires to open the new or updated dataflow and save changes manually with the editor [10]
        • triggers a publish action in the background to allow the changes to be used during refresh of the dataflow [10]
    • [Power Automate] {limitation} the connector for dataflows isn't working [10]
  •  {feature} Copilot for Dataflow Gen2
    • provide AI-powered assistance for creating data integration solutions using natural language prompts [11]
    • {benefit} helps streamline the dataflow development process by allowing users to use conversational language to perform data transformations and operations [11]
  • {benefit} enhance flexibility by allowing dynamic adjustments without altering the dataflow itself [9]
  • {benefit} extends data with consistent data, such as a standard date dimension table [1]
  • {benefit} allows self-service users access to a subset of data warehouse separately [1]
  • {benefit} optimizes performance with dataflows, which enable extracting data once for reuse, reducing data refresh time for slower sources [1]
  • {benefit} simplifies data source complexity by only exposing dataflows to larger analyst groups [1]
  • {benefit} ensures consistency and quality of data by enabling users to clean and transform data before loading it to a destination [1]
  • {benefit} simplifies data integration by providing a low-code interface that ingests data from various sources [1]
  • {limitation} not a replacement for a data warehouse [1]
  • {limitation} row-level security isn't supported [1]
  • {limitation} Fabric or Fabric trial capacity workspace is required [1]
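
A small illustrative sketch (Python) of handling the refresh limits called out in the refresh notes above (throttling on bursts, rejection of duplicate requests): it retries with exponential backoff when the service signals throttling. The trigger_refresh callable and the status codes used are hypothetical placeholders rather than a documented Fabric API contract.

import time
import random

def refresh_with_backoff(trigger_refresh, max_attempts: int = 5) -> bool:
    """Call a refresh trigger and back off when the service throttles.

    trigger_refresh is any callable returning an HTTP status code, e.g. a thin
    wrapper around a REST call that starts the dataflow refresh; 429 is used
    here as the conventional 'too many requests' signal (an assumption).
    """
    delay = 5.0
    for _ in range(max_attempts):
        status = trigger_refresh()
        if status in (200, 202):
            return True                               # refresh request accepted
        if status == 429 or status >= 500:
            time.sleep(delay + random.uniform(0, 1))  # throttled/transient: wait and retry
            delay *= 2
            continue
        return False                                  # duplicate or invalid request: don't retry
    return False

Pair this with client-side bookkeeping of how many refreshes were issued in the rolling 24-hour window to stay under the 300-refresh limit mentioned above.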


Dataflow Gen1 vs Gen2 feature comparison [2]: Author dataflows with Power Query; Shorter authoring flow; AutoSave and background publishing; Data destinations; Improved monitoring and refresh history; Integration with data pipelines; High-scale compute; Get Data via Dataflows connector; Direct Query via Dataflows connector; Incremental refresh; Fast Copy; Cancel refresh; AI Insights support (see [2] for which features each generation supports).

References:
[1] Microsoft Learn (2023) Fabric: Ingest data with Microsoft Fabric [link]
[2] Microsoft Learn (2023) Fabric: Getting from Dataflow Generation 1 to Dataflow Generation 2 [link]
[3] Microsoft Learn (2023) Fabric: Dataflow Gen2 data destinations and managed settings [link]
[4] Microsoft Learn (2023) Fabric: Dataflow Gen2 pricing for Data Factory in Microsoft Fabric [link]
[5] Microsoft Learn (2023) Fabric: Save a draft of your dataflow [link]
[6] Microsoft Learn (2023) Fabric: What's new and planned for Data Factory in Microsoft Fabric [link]
[7] Microsoft Learn (2023) Fabric: Fast copy in Dataflows Gen2 [link]
[8] Microsoft Learn (2025) Fabric: Incremental refresh in Dataflow Gen2 [link]
[9] Microsoft Learn (2025) Fabric: Use public parameters in Dataflow Gen2 (Preview) [link]
[10] Microsoft Learn (2025) Fabric: Dataflow Gen2 with CI/CD and Git integration support [link]
[11] Microsoft Learn (2025) Fabric: What is Dataflow Gen2? [link]
[12] Microsoft Learn (2025) Fabric: Use a dataflow in a pipeline [link]
[13] Microsoft Learn (2025) Fabric: Save a draft of your dataflow [link]
[14] Microsoft Learn (2025) Fabric: Dataflow destinations and managed settings [link]
[15] Microsoft Learn (2025) Fabric: Dataflow refresh [link]

Resources:
[R1] Arshad Ali & Bradley Schacht (2024) Learn Microsoft Fabric [link]
[R2] Microsoft Learn: Fabric (2023) Data Factory limitations overview [link]
[R3] Microsoft Fabric Blog (2023) Data Factory Spotlight: Dataflow Gen2, by Miguel Escobar [link]
[R4] Microsoft Learn (2023) Fabric: Dataflow Gen2 connectors in Microsoft Fabric [link]
[R5] Microsoft Learn (2023) Fabric: Pattern to incrementally amass data with Dataflow Gen2 [link]
[R6] Fourmoo (2024) Microsoft Fabric – Comparing Dataflow Gen2 vs Notebook on Costs and usability, by Gilbert Quevauvilliers [link]
[R7] Microsoft Learn: Fabric (2023) A guide to Fabric Dataflows for Azure Data Factory Mapping Data Flow users [link]
[R8] Microsoft Learn: Fabric (2023) Quickstart: Create your first dataflow to get and transform data [link]
[R9] Microsoft Learn: Fabric (2023) Microsoft Fabric decision guide: copy activity, dataflow, or Spark [link]
[R10] Microsoft Fabric Blog (2023) Dataflows Gen2 data destinations and managed settings, by Miquella de Boer  [link]
[R11] Microsoft Fabric Blog (2023) Service principal support to connect to data in Dataflow, Datamart, Dataset and Dataflow Gen 2, by Miquella de Boer [link]
[R12] Chris Webb's BI Blog (2023) Fabric Dataflows Gen2: To Stage Or Not To Stage? [link]
[R13] Power BI Tips (2023) Let's Learn Fabric ep.7: Fabric Dataflows Gen2 [link]
[R14] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
[R15] Microsoft Fabric Blog (2025) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

Acronyms:
ADLS - Azure Data Lake Storage
CI/CD - Continuous Integration/Continuous Deployment 
ETL - Extract, Transform, Load
KQL - Kusto Query Language
PQO - Power Query Online
PQT - Power Query Template

30 December 2023

💫🗒️ERP Systems: Microsoft Dynamics 365's Invoice Capture (Features) [Notes]

Disclaimer: This is work in progress intended to consolidate information from the various sources and not to provide an overview of all the features. Please refer to the documentation for a complete overview!

For details on the releases, see [2].
Last updated: 10-Feb-2025

Invoice Capture - Main Components
Invoice Capture - Main Components [3]

AI Model

  • {feature} prebuilt model (aka Invoice processing model) [by design]
    • can handle the most common invoices in various languages
    • owned by Microsoft and cannot be trained by the customers
  • {feature} custom prebuilt models [planned]
    • built on top of the prebuilt model to handle more complex invoice layouts
    • only requires to train the exceptional invoices
    • after a model is published, additional mapping is required to map the model fields to the invoice files
  • {feature} custom model [by design]
    • requires training the model from scratch 
      • costs time and effort
    • supports the Charges and Sales Tax Amount fields [1.5.0.2]
    • supports lookup lists on custom fields [1.6.0.x]

Channels

  • flows that collect the invoices into one location 
    • there's always a 1:1 relationship between flows and channels
  • multiple channels can be defined using different triggers based on Power Automate connectors
  • {feature} default channel for Upload files [1.0.1.0]
  • {feature} supports multiple sources via flow templates [by design]
    • Outlook.com
    • Microsoft Outlook 365
    • Microsoft Outlook 365 shared mailbox [1.1.0.10]
      • to achieve similar behavior one had to modify the first steps of the generated flow with the "When a new email arrives in a shared mailbox (V2)" trigger
    • SharePoint
    • OneDrive
    • OneDrive for business [1.0.1.0]
  • {feature} assign legal entity on the channel
    • the LE value is automatically assigned without applying additional derivation logic [1]
  • invoices sent via predefined channels are captured and appear on the 'Received files' page

Configuration groups 

  • allow managing the list of invoice fields and the manual review settings 
  • can be assigned per LE or per Vendor
    • all the legal entities in the same configuration group use the same invoice fields and manual review setting
  • default configuration group 
    • created after deployment, can't be changed or deleted

Fields

  • {feature} standard fields 
    • Legal entity 
      • organizations registered with legal authorities in D365 F&O and selected for Invoice capture
      • {feature} allow to enforce role-based security model
      • {feature} synchronized from D365 F&O to CRM 
      • {missing feature} split the invoices between multiple LEs
      • {feature} Sync deleted legal entities 
        • when legal entities are deleted in D365, their status is set to 'Inactive' [1.9.0.3]
            • ⇐ are not going to be derived during Invoice capture processing
          • ⇒ invoices with inactive legal entities cannot be transferred to D365FO
          • ⇐ same rules apply to vendor accounts
    • Vendor master 
      • individuals or organizations that supply goods or services
      • used to automatically derive the Vendor account
      • {feature} synchronized from D365 F&O to CRM
        • synchronization issues solved [1.6.0.x]
      • {feature} Vendor account can be derived from the tax number [1.0.1.0]
      • {feature} Synchronize vendors based on filter conditions [1.9.0.3]
        • one can set filter conditions to only synchronize vendors that are suitable for inclusion in Invoice capture
      • {feature} Sync deleted vendor accounts 
        • when vendor accounts are deleted in D365, their status is set to 'Inactive' [1.9.0.3]
    • Invoice header
      • {feature} Currency code can be derived from the currency symbol [1.0.1.0]
    • Invoice lines
    • Item master
      • {feature} Item number can be derived from External item number [1.0.1.0]
      • {feature} defaults the item description for procurement category items to the value from the original document [1.1.0.32/10.0.39]
    • Expense types (aka Procurement categories) 
    • Charges
      • amounts added to the lines or header
      • {feature} header-level charges [1.1.0.10]
      • {missing feature} line-level charges
    • {feature} Financial dimensions
      • header level [1.1.0.32/10.0.39]
      • line level [1.3.0.x]
    • Purchase orders
      • {feature} PO formatting based on the number sequence settings
      • {missing feature} PO details
      • {missing feature} multiple POs, multiple receipts 
    • {missing feature} Project information integration
      • {workaround} the Project Id can be added as a custom field on the Invoice line for Cost invoices; the field must then be mapped to the corresponding data entity so that the value is transferred to D365 F&O
  • {feature} custom fields [1.1.0.32/10.0.39]

File filter

  • applies additional filtering to incoming files at the application level (see the sketch after this list)
  • a default, updatable global file filter is provided with the installation
  • can be applied at different channel levels
  • {event} an invoice document is received
    • the channel is checked first for a file filter
    • if no file filter is assigned to the channel level, the file filter at the system level is used [1]
  • {configuration} maximum file size: 20 MB
  • {configuration} supported files types: PDF, PNG, JPG, JPEG, TIF, TIFF
  • {configuration} supported file names 
    • filter out files that aren't relevant to invoices [1]
    • rules can be applied to accept/exclude files whose name contains predefined strings [1]
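The sketch referenced above shows how the three documented checks (size, file type, file-name rules) could combine. It only illustrates the rules; the real logic is internal to Invoice Capture, and the accept/exclude strings are made-up examples.

import os

# Illustrative re-implementation of the documented file-filter checks; the
# accept/exclude name rules below are placeholders.
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}
ACCEPT_IF_NAME_CONTAINS = ["invoice"]    # placeholder rule
EXCLUDE_IF_NAME_CONTAINS = ["reminder"]  # placeholder rule

def passes_file_filter(file_name: str, file_size_bytes: int) -> bool:
    """Apply the size, file-type and file-name checks to an incoming file."""
    if file_size_bytes > MAX_FILE_SIZE_BYTES:
        return False
    if os.path.splitext(file_name)[1].lower() not in SUPPORTED_EXTENSIONS:
        return False
    name = file_name.lower()
    if any(token in name for token in EXCLUDE_IF_NAME_CONTAINS):
        return False
    return any(token in name for token in ACCEPT_IF_NAME_CONTAINS)

print(passes_file_filter("INV-4711_invoice.pdf", 2_000_000))  # True
print(passes_file_filter("payment_reminder.pdf", 2_000_000))  # False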
Actions 

  • Import Invoice
    • {feature} Channel for file upload
      • a default channel is provided for directly uploading the invoice files
      • a maximum of 20 files can be uploaded simultaneously
  • Capture Invoice
    • {feature} Invoice capture processing
      • different derivation rules are applied to ensure that the invoices are complete and correct [1]
    • {missing feature} differentiate between relevant and non-relevant content
    • {missing feature} merging/splitting files
      • {workaround} export the file(s), merge/split them, and import the result
  • Void files [1.0.1.0]
    • once the files are voided, they can be deleted from Dataverse
      • saves storage costs
  • Classify Invoice
    • {feature} search for LEs
    • {feature} assign LE to Channel
      • AP clerks see only the invoices under the LEs assigned to them
    • {feature} search for Vendor accounts by name or address 
  • Maintaining Headers
    • {feature} multiple sales taxes [1.1.0.26]
    • {missing feature} rounding-off [only in D365 F&O]
    • {feature} support Credit Notes [1.5.0.2]
  • Maintaining Lines
    • {feature} Add/remove lines
    • {feature} "Remove all" option for deleting all the invoice lines in Side-by-Side Viewer [1.1.0.26]
      • previously the lines could only be deleted one by one, which for incorrectly formatted big invoices (e.g. invoices from Microsoft or other cloud providers spanning 5-10 pages of lines) led to considerable effort
    • {feature} Aggregate multiple lines [requested]
      • currently all lines of an invoice carry the same importance; especially for big invoices it would be useful to aggregate the amounts from multiple lines under one
    • {feature} Show the total amount across the lines [requested]
      • When removing/adding lines, it would be useful to compare the total amount across the lines with the one from the header. 
    • Check the UoM consistency between invoice line and linked purchase order line
      • For the Invoice to be correctly processed, the two values must match. 
    • Support for discounts [requested]
      • as a workaround, discounts can be entered as separate lines
  • Transfer invoice

  • Automation
    • {parameter} Auto invoice cleanup
      • automatically cleans up transferred and voided invoices older than 180 days, once per day [1]
    • Use continuous learning 
      • select this option to turn on the continuous learning feature
      • learns from the corrections made by the AP clerk on a previous instance of the same invoice [1]
        • records the mapping relationship between the invoice context and the derived entities [1]
        • the entities are automatically derived for the next time a similar invoice is captured [1]
      • {missing feature} standard way to copy the continuous learning from UAT to PROD
      • {feature} migration tool for continuous learning knowledge
        • allows transferring the learned knowledge from one environment to another [1.6.0.x]
      • {feature} Continuous learning for decimal format [1.9.0.3]
        • allows the system to learn from the history record and automatically apply the correct decimal format on the amount fields
          • {requirement} manually correct the first incoming invoice and do the successful transfer
      • {feature} date format handling
        • fixes the issue with date formatting, which is caused by ambiguity in date recognition [1.8.3.0]
          • when a user corrects the date on the first invoice, the corresponding date format will be automatically applied to the future invoice if it is coming from the same vendor
          • enabled only when the 'Use continuous learning' parameter is active
    • Confidence score check
      • for the prebuilt model the confidence score is always the same 
        • its value is returned by the AI Builder service
        • the confidence score can be improved only when the custom prebuilt model is used
        • it can be increased by uploading more samples and tagging them accordingly
        • a low confidence score indicates that not enough samples with the same pattern have been trained
      • {parameter} control the confidence score check [1.1.0.32/10.0.39]
  • Manage file filters
User Interface
  • Side-by-side view [by design]
    • users can adjust column widths [1.8.3.0]
      • improves the review experience
  • History logs [1.0.1.0]
    • supported in Received files and Captured invoices
    • help AP clerks know the actions and results in each step during invoice processing 
  • Navigation to D365 F&O [1.0.1.0]
    • once the invoice is successfully transferred to D365 F&O, a quick link is provided for the AP clerk to open the Pending vendor invoice list in F&O
  • Reporting
    • {missing feature} consolidated overview across multiple environments (e.g. for licensing needs evaluation)
    • {missing feature} metrics by Legal entity, Processing status, Vendor and/or Invoice type
      • {workaround} a paginated report can be built based on Dataverse (see the sketch below)
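A hypothetical sketch of the Dataverse-based workaround is given below: it counts captured invoices per processing status through the Dataverse Web API, and the same aggregation could feed a paginated report. The organization URL, token, entity set and column names are placeholders; the concrete Invoice Capture table names must be looked up in the Dataverse environment.

import requests

# Hypothetical sketch: count captured invoices per processing status via the
# Dataverse Web API. The entity set and column names are placeholders.
dataverse_url = "https://<org>.crm.dynamics.com"
access_token = "<entra-id-access-token>"
entity_set = "<captured-invoice-entity-set>"   # placeholder
status_column = "<status-column>"              # placeholder

url = (
    f"{dataverse_url}/api/data/v9.2/{entity_set}"
    f"?$apply=groupby(({status_column}),aggregate($count as invoiceCount))"
)
response = requests.get(
    url, headers={"Authorization": f"Bearer {access_token}"}, timeout=30
)
response.raise_for_status()
for row in response.json().get("value", []):
    print(row[status_column], row["invoiceCount"])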

Data Validation

  • [Invoice Capture] derivation rules 
    • applied to ensure that the invoices are complete and correct [1]
    • {missing feature} derive vendor using the associated email address
    • {parameter} Format purchase order 
      • used to check the number sequence settings in D365 F&O to format the PO number [1]
    • {parameter} Derive currency code for cost invoice
      • used to derive the value from invoice master data in D365 F&O [1]
      • ⇐ the currency code on PO invoices must be identical to the one on the PO
    • {parameter} Validate total sales tax amount 
      • validates the consistency between the sales tax amount on the Sales tax card and the total sales tax amount, when there's a sales tax line [1]
    • {parameter} Validate total amount 
      • confirms alignment between the calculated total invoice amount and the captured total amount [1] (see the sketch after this list)
  • [Invoice capture] automation
    • {feature} Automatically remove invalid field value [1.9.0.3]
      • when enabled, the system automatically removes the value if it doesn't exist in the lookup list
        • eliminates the need to manually remove values during the review, streamlining the process
        • {scenario} invalid payment terms errors
  • [D365 F&O] before workflow submissions [requested]
    • it makes sense to have out-of-box rules 
    • currently this can be done by implementing extensions
  • [D365 F&O] during workflow execution [by design]
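The sketch referenced under 'Validate total amount' is shown below. It merely illustrates how the two consistency checks could be expressed; the actual rules run inside Invoice Capture, and the rounding tolerance used here is an assumption.

from decimal import Decimal

# Illustrative consistency checks in the spirit of 'Validate total sales tax
# amount' and 'Validate total amount'; the tolerance is an assumption.
TOLERANCE = Decimal("0.01")

def totals_consistent(line_amounts, charges, tax_lines,
                      captured_tax_total, captured_invoice_total) -> bool:
    calculated_tax = sum(tax_lines, Decimal("0"))
    calculated_total = (sum(line_amounts, Decimal("0"))
                        + sum(charges, Decimal("0")) + calculated_tax)
    return (abs(calculated_tax - captured_tax_total) <= TOLERANCE
            and abs(calculated_total - captured_invoice_total) <= TOLERANCE)

print(totals_consistent(
    line_amounts=[Decimal("100.00"), Decimal("50.00")],
    charges=[Decimal("10.00")],
    tax_lines=[Decimal("30.40")],
    captured_tax_total=Decimal("30.40"),
    captured_invoice_total=Decimal("190.40"),
))  # True: 100 + 50 + 10 + 30.40 = 190.40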
Dynamics 365 for Finance [aka D365 F&O]
  • Attachments 
    • {parameter} control document type for persisting the invoice attachment in D365 F&O [1.1.0.32/10.0.39]
  • Invoice Capture
    • {parameter} select the entities in scope
    • {parameter} differentiate by Invoice type whether a Vendor invoice or an Invoice journal is used to book the invoices
    • {parameter} Transfer attachment
  • Fixed Assets
    • {missing feature} create the Fixed asset automatically when the invoice is imported
  • Vendor Invoice Journal 
    • {missing feature} configure what journal the invoice is sent to
      • the system seems to pick the first journal available
      • {parameter} control journal name for creating 'Invoice journal'  [1.1.0.32/10.0.39]
    • {missing feature} grouping multiple invoices together in a journal (e.g. by vendor group, payment terms, method of payment)
  •  Approval Workflow
    • {missing feature} involve Responsible person for the Vendor [requested]
    • {missing feature} involve Buyer for Cost invoices [requested]
    • {missing feature} differentiator between the various types of invoices [requested]
    • {missing feature} amount-based approval [requested]
  • Billing schedule
    • {feature} integration with Billing schedules
    • {feature} modify or cancel Billing schedules
  • Reporting
    • {missing feature} Vendor invoices hanging in the approval workflow (incl. responsible person for current action, respectively error message)
    • {missing feature} Report for GL reconciliation between Vendor invoice and GL via Billing schedules
    • {missing feature} Overview of the financial dimensions used (to identify whether further setup is needed)
Licensing
  • licensing name: Dynamics 365 E-Invoicing Documents - Electronic Invoicing Add-on for Dynamics 365
  • customers are entitled to 100 invoice capture transactions per tenant per month
    • if customers need more transactions, they must purchase extra Electronic Invoicing SKUs for 1,000 transactions per tenant per month [5] 
      • the transaction capacity is available on a monthly, use-it-or-lose-it basis [5]
      • customers must purchase for peak capacity [5] (see the worked example after this list)
  • only the captured invoices are going to be considered as valid transactions [4]
    • files filtered by the file filter setting won't be counted [4]
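As noted above, the add-on must be sized for the peak month on a use-it-or-lose-it basis. The worked example below uses the documented quota (100 included transactions) and SKU size (1,000 transactions); the peak volume of 3,250 invoices is an assumed figure for illustration.

import math

INCLUDED_PER_MONTH = 100  # transactions included per tenant per month
SKU_SIZE = 1_000          # extra transactions per purchased SKU per month

peak_monthly_invoices = 3_250  # assumed peak volume for the example

extra_needed = max(0, peak_monthly_invoices - INCLUDED_PER_MONTH)
skus_required = math.ceil(extra_needed / SKU_SIZE)
print(skus_required)  # 4 SKUs -> 100 + 4 * 1,000 = 4,100 transactions/month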
Architecture
  •  invoice capture is supported only when the integrated Dataverse environment exists [4]
Security
  • [role] Invoice capture operator
    • must be included in the role settings to 
      • successfully run the derivation and validation logic in Invoice capture [5]
      • transfer the invoice to D365 F&O [5]
    • the role must be added to the corresponding flow user on the app side [5]
    • is only applied on the D365 side [4]
      • ⇒ it doesn't block the user from logging in to Invoice capture [4]
      • ⇐ the corresponding access to the virtual entities will be missing [4]
        • then the user will receive an error [4]
  • [role] Environment maker 
    • must be assigned to the Accounts payable administrator if they create channels in Invoice capture [5]

Previous post <<||>> Next post

Resources:
[1] Microsoft Learn (2023) Invoice capture overview [link]
[2] Yammer (2023) Invoice Capture for Dynamics 365 Finance [link]
[3] Microsoft (2023) Invoice Capture for Dynamics 365 Finance - Implementation Guide
[4] Yammer (2025) Quota/licensing follow up [link]
[5] Microsoft Learn (2024) Dynamics 365 Finance: Invoice capture solution [link]

Release history (requires authentication)

Acronyms:
AI - Artificial Intelligence 
AP - Accounts Payable
F&O - Finance & Operations
LE - Legal Entity
PO - Purchase Order
SKU - Stock Keeping Unit
UoM - Unit of Measure
