
18 December 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part VI: Data Stores Comparison)

Business Intelligence Series

Microsoft made available a reference guide for the data stores supported by Microsoft Fabric workloads [1], including the new Fabric SQL database (see the previous post). Here's the consolidated table, followed by a few aspects to consider:

| Area | Lakehouse | Warehouse | Eventhouse | Fabric SQL database | Power BI Datamart |
|---|---|---|---|---|---|
| Data volume | Unlimited | Unlimited | Unlimited | 4 TB | Up to 100 GB |
| Type of data | Unstructured, semi-structured, structured | Structured, semi-structured (JSON) | Unstructured, semi-structured, structured | Structured, semi-structured, unstructured | Structured |
| Primary developer persona | Data engineer, data scientist | Data warehouse developer, data architect, data engineer, database developer | App developer, data scientist, data engineer | AI developer, app developer, database developer, DB admin | Data scientist, data analyst |
| Primary dev skill | Spark (Scala, PySpark, Spark SQL, R) | SQL | No code, KQL, SQL | SQL | No code, SQL |
| Data organized by | Folders and files, databases, and tables | Databases, schemas, and tables | Databases, schemas, and tables | Databases, schemas, tables | Database, tables, queries |
| Read operations | Spark, T-SQL | T-SQL, Spark* | KQL, T-SQL, Spark | T-SQL | Spark, T-SQL |
| Write operations | Spark (Scala, PySpark, Spark SQL, R) | T-SQL | KQL, Spark, connector ecosystem | T-SQL | Dataflows, T-SQL |
| Multi-table transactions | No | Yes | Yes, for multi-table ingestion | Yes, full ACID compliance | No |
| Primary development interface | Spark notebooks, Spark job definitions | SQL scripts | KQL Queryset, KQL Database | SQL scripts | Power BI |
| Security | RLS, CLS**, table level (T-SQL), none for Spark | Object level, RLS, CLS, DDL/DML, dynamic data masking | RLS | Object level, RLS, CLS, DDL/DML, dynamic data masking | Built-in RLS editor |
| Access data via shortcuts | Yes | Yes | Yes | Yes | No |
| Can be a source for shortcuts | Yes (files and tables) | Yes (tables) | Yes | Yes (tables) | No |
| Query across items | Yes | Yes | Yes | Yes | No |
| Advanced analytics | Interface for large-scale data processing, built-in data parallelism and fault tolerance | Interface for large-scale data processing, built-in data parallelism and fault tolerance | Time-series native elements, full geo-spatial and query capabilities | T-SQL analytical capabilities, data replicated to Delta Parquet in OneLake for analytics | Interface for data processing with automated performance tuning |
| Advanced formatting support | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive-compatible file format | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive-compatible file format | Full indexing for free text and semi-structured data like JSON | Table support for OLTP, JSON, vector, graph, XML, spatial, key-value | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive-compatible file format |
| Ingestion latency | Available instantly for querying | Available instantly for querying | Queued ingestion; streaming ingestion has a couple of seconds latency | Available instantly for querying | Available instantly for querying |

The table can be used as a map of what one needs to know in order to use each data store, respectively to identify where previous experience can be leveraged; here I'm referring to the many SQL developers. One must also consider the capabilities and limitations of each storage repository.
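
For instance, much of a SQL developer's skill set carries over directly to the lakehouse via Spark SQL. Below is a minimal sketch, assuming a Fabric notebook attached to a lakehouse containing a hypothetical Delta table named "sales"; the spark session is preconfigured by the notebook environment:

```python
# Minimal sketch: querying a lakehouse Delta table with Spark SQL from a
# Fabric notebook. The table name "sales" is hypothetical; the "spark"
# session is preconfigured by the notebook environment.
df = spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
""")
df.show()

# Persisting the result as a Delta table makes it queryable through the
# lakehouse's SQL analytics endpoint as well.
df.write.mode("overwrite").format("delta").saveAsTable("sales_by_region")
```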

However, what I'm missing are references regarding the performance of data access, especially compared with on-premises workloads. Moreover, the devil hides in the details, therefore one must test thoroughly before committing to any of the above choices. For the newest overview, please check the referenced documentation [1]!

For lakehouses, the hardest limitation is the lack of multi-table transactions, though that's understandable given their scope. However, probably the most important aspect is whether they can scale with the volume of reads/writes, as the SQL analytics endpoint currently seems to lag.

The warehouse seems to be more versatile, though careful attention needs to be given to its design. 

The Eventhouse opens the door to a wide range of time-based scenarios, though it will be interesting to see how developers cope with its lack of functionality in some areas.

Fabric SQL databases are a new addition, and hopefully they'll make it possible to consider a wide range of OLTP scenarios.

Power BI datamarts have been in preview for a couple of years.

References:
[1] Microsoft Fabric (2024) Microsoft Fabric decision guide: choose a data store [link]
[2] Reitse's blog (2024) Testing Microsoft Fabric Capacity: Data Warehouse vs Lakehouse Performance [link]

09 December 2024

🏭🗒️Microsoft Fabric: Microsoft Fabric [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 8-Dec-2024

Microsoft Fabric 

  • {goal} complete (end-to-end) analytics platform [6]
    • {characteristic} unified
      • {objective} provides a single, integrated environment for the whole organization
        • {benefit} data professionals and the business users can collaborate on data projects [5] and solutions
    • {characteristic} serverless SaaS model (aka SaaS-ified)
      • {objective} provisioned automatically with the tenant [6]
      • {objective} highly scalable [5]
      • {objective} cost-effectiveness [5]
      • {objective} accessible 
        • ⇐ from anywhere with an internet connection [5]
      • {objective} continuous updates
        • ⇐ provided by Microsoft
      • {objective} continuous maintenance 
        • ⇐ provided by Microsoft
      • provides a set of integrated services that enable ingesting, storing, processing, and analyzing data in a single environment [5]
    • {objective} secure
    • {objective} governed
  • {goal} lake-centric
    • {characteristic} OneLake-based
      • all workloads automatically store their data in the OneLake workspace folders [6]
      • all the data is organized in an intuitive hierarchical namespace [6]
      • data is automatically indexed [6]
      • provides a set of features 
        • discovery
        • MIP labels
        • lineage
        • PII scans
        • sharing
        • governance
        • compliance
    • {characteristic} one copy
      • available for all computes 
      • all compute engines store their data automatically in OneLake
        •  the data is stored in a (single) common format
          •  delta parquet file format
            • open standards format
            • the storage format for all tabular data in Microsoft Fabric 
        • ⇐ the data is directly accessible by all the engines [6]
          • ⇐ no import/export needed
      • all compute engines are fully optimized to work with Delta Parquet as their native format [6]
      • a shared universal security model is enforced across all the engines [6]
    • {characteristic} open at every tier
  • {goal} empowering
    • {characteristic} intuitive
    • {characteristic} built into M365
    • {characteristic} insight to action
  • {goal} AI-powered
    • {characteristic} Copilot accelerated 
    • {characteristic} ChatGPT enabled
    • {characteristic} AI-driven insights
  • complete analytics platform
    • addresses the needs of all data professionals and business users who aim to harness the value of data
  • {feature} scales automatically
    • the system automatically allocates an appropriate number of compute resources based on the job size
    • the cost is proportional to total resource consumption, rather than size of cluster or number of resources allocated 
    • jobs in general complete faster (and usually at less overall cost)
      • ⇒ no need to specify cluster sizes
  • natively supports 
    • Spark
    • data science
    • log-analytics
    • real-time ingestion and messaging
    • alerting
    • data pipelines, and 
    • Power BI reporting 
    • interoperability with third-party services 
      • from other vendors that support the same open standards
  • data virtualization mechanisms 
    • {feature} mirroring [notes]
    • {feature} shortcuts [notes]
      • allow users to reference data without copying it
      • {benefit} make other domain data available locally without the need for copying data (see the sketch after these notes)
  • {feature} tenant (aka Microsoft Fabric tenant, MF tenant)
    • a single instance of Fabric for an organization that is aligned with a Microsoft Entra ID
    • can contain any number of workspaces
  • {feature} workspaces
    • {definition} a collection of items that brings together different functionality in a single environment designed for collaboration
    • associated with a domain [3]
  • {feature} domains [notes]
    • {definition} a way of logically grouping together data in an organization that is relevant to a particular area or field [1]
    • subdomains
      • a way of fine-tuning the logical grouping of data under a domain [1]
        • subdivisions of a domain
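
As a small illustration of the "one copy" principle and the shortcuts mentioned above, here is a minimal sketch for a Fabric notebook; the table and shortcut names are hypothetical, and the preconfigured spark session is assumed:

```python
# Minimal sketch, assuming a Fabric notebook with its preconfigured
# "spark" session; the table and shortcut names are hypothetical.

# Data written once as Delta Parquet is directly accessible to the other
# engines (SQL analytics endpoint, Power BI, etc.), with no import/export.
spark.createDataFrame(
    [("2024-12-01", 100.0), ("2024-12-02", 250.0)],
    ["order_date", "amount"],
).write.mode("overwrite").format("delta").saveAsTable("orders")

# A shortcut defined in the lakehouse (e.g. to another domain's data)
# can be read like any local table, without the data being copied.
df = spark.read.table("orders_shortcut")
df.show()
```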

Acronyms:
API - Application Programming Interface
M365 - Microsoft 365
MF - Microsoft Fabric
PII - Personally Identifiable Information
SaaS - software-as-a-service

Resources:
[1] Microsoft Learn (2023) Administer Microsoft Fabric [link]
[2] Microsoft Learn: Fabric (2024) Governance overview and guidance [link]
[3] Microsoft Learn: Fabric (2023) Fabric domains [link]
[4] Establishing Data Mesh architectural pattern with Domains and OneLake on Microsoft Fabric, by Maheswaran Arunachalam [link]
[5] Microsoft Learn: Fabric (2024) Introduction to end-to-end analytics using Microsoft Fabric [link]
[6] Microsoft Fabric (2024) Fabric Analyst in a Day [course notes]

13 June 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part V: One Person Can’t Learn or Do Everything)

Business Intelligence Series

Today’s Explicit Measures webcast [1] considered an article written by Kurt Buhler (The Data Goblins): [Microsoft] "Fabric is a Team Sport: One Person Can’t Learn or Do Everything" [2]. It’s a well-written article that deserves some thought as there are several important points made. I can’t say I agree with the full extent of some statements, even if some disagreements are probably just a matter of semantics.

My main disagreement starts with the title “One Person Can’t Learn or Do Everything”. As clarified in the webcast's chat, the author defines “everything” as an umbrella for “all the capabilities and experiences that comprise Fabric including both technical (like Power BI) or non-technical (like adoption data literacy) and everything in between” [1].

For me, “everything” is relative and refers to a domain's core set of knowledge, while “expertise” (≠ “mastery”) refers to the degree to which a person can use that knowledge to build back-to-back solutions for a given area. I’d say that it becomes more and more challenging for beginners or average data professionals to cover the core features. Moreover, I’d separate out the non-technical skills, because then one also needs to consider topics like Data, Project, Information or Knowledge Management.

There are different levels of expertise, and they can vary in depth (specialization) or breadth (covering multiple areas), respectively depend on previous experience (whether one has worked with similar technologies). Usually, there’s a minimum set of requirements that needs to be covered to be considered an expert (e.g. certification, building a solution from beginning to end, troubleshooting, performance optimization, etc.). It’s also challenging to define roughly where one’s expertise starts (or ends), as there are different perspectives on the topics.

Conversely, the term expert is generally misused extensively, sometimes even with mischievous intent. An “expert” is usually considered to be an external consultant or a person who got certified in an area, even if the person may not be able to build solutions that address a customer’s needs.

Even data professionals with many years of experience can be overwhelmed by the volume of knowledge, especially when one considers the different experiences available in MF, respectively the volume of new features released monthly. Conversely, expertise can be considered with respect to only one or more MF experiences, or to one area within a certain layer. A lot of the knowledge can be transferred from other areas: writing SQL and complex database objects, modelling (enterprise) semantic layers, programming in Python, R or Power Query, building data pipelines, managing SQL databases, etc.

Besides the standard documentation, training sessions, and some reference architectures, Microsoft has also made available labs and other material, which help in discovering the available features, though they don’t teach people how to build complete solutions. More important than explicitly declaring the role-based audience, I find, is the creation of learning paths for the various roles.

During the past 6-7 months I've spent on average 2 days per week learning MF topics. My problem is not the documentation, but the lack of maturity of some features, the gaps in functionality, identifying those gaps, and knowing what new features will be made available, and when. The fact that features are made available or changed while learning makes the process even more challenging.

My goal is to be able to provide back-to-back solutions, and I believe that’s possible, even if I might not consider all the experiences available. During the past 22 years, at least until MF, I could build complete BI solutions spanning requirements elicitation, data extraction, modeling and processing, and data consumption for the various purposes. At least this was the journey of a Software Engineer into the world of data.

References:
[1] Explicit Measures (2024) Power BI tips Ep.328: Microsoft Fabric is a Team Sport (link)
[2] Data Goblins (2024) Fabric is a Team Sport: One Person Can’t Learn or Do Everything (link)

30 December 2023

💫ERP Systems: Microsoft Dynamics 365's Invoice Capture (Features)

Disclaimer: This is work in progress intended to consolidate information from various sources (what I considered important during my exploration of the documentation and further communication) and not to provide an overview of all the features. Please refer to the documentation for a complete overview!

Regarding the releases, see [2].
Last updated: 23-May-2024

Invoice Capture - Main Components [3]

AI Model

  • {feature} prebuilt model (aka Invoice processing model) [by design]
    • can handle the most common invoices in various languages
    • owned by Microsoft and cannot be trained by the customers
  • {feature} custom prebuilt models [planned]
    • built on top of the prebuilt model to handle more complex invoice layouts
    • only requires training the exceptional invoices
    • after a model is published, additional mapping is required to map the model fields to the invoice files
  • {feature} custom model [by design]
    • requires training the model from scratch 
      • costs time and effort
    • supports the Charges and Sales Tax Amount fields [1.5.0.2]
    • supports lookup lists on custom fields [1.6.0.x]

Channels

  • flows that collect the invoices into one location 
    • there's always a 1:1 relationship between flows and channels
  • multiple channels can be defined using different triggers based on Power Automate connectors
  • {feature} default channel for Upload files [1.0.1.0]
  • {feature} supports multiple sources via flow templates [by design]
    • Outlook.com
    • Microsoft Outlook 365
    • Microsoft Outlook 365 shared mailbox [1.1.0.10]
      • to achieve similar behavior one had to modify the first steps of the generated flow with the "When a new email arrives in a shared mailbox (V2)" trigger
    • SharePoint
    • OneDrive
    • OneDrive for business [1.0.1.0]
  • {feature} assign legal entity on the channel
    • the LE value is automatically assigned without applying additional derivation logic [1]
  • invoices sent via predefined channels are captured and appear on the 'Received files' page

Configuration groups 

  • allow managing the list of invoice fields and the manual review settings 
  • can be assigned per LE or Vendor
    • all the legal entities in the same configuration group use the same invoice fields and manual review setting
  • default configuration group 
    • created after deployment, can't be changed or deleted

Fields

  • {feature} standard fields 
    • Legal entity 
      • organizations registered with legal authorities in D365 F&O and selected for Invoice capture
      • {feature} allows enforcing a role-based security model
      • {feature} synchronized from D365 F&O to CRM 
      • {missing feature} split the invoices between multiple LEs
    • Vendor master 
      • individuals or organizations that supply goods or services
      • used to automatically derive the Vendor account
      • {feature} synchronized from D365 F&O to CRM
        • synchronization issues solved [1.6.0.x]
      • {feature} Vendor account can be derived from the tax number [1.0.1.0]
    • Invoice header
      • {feature} Currency code can be derived from the currency symbol [1.0.1.0]
    • Invoice lines
    • Item master
      • {feature} Item number can be derived from External item number [1.0.1.0]
      • {feature} defaults the item description for procurement category items to the value from the original document [1.1.0.32/10.0.39]
    • Expense types (aka Procurement categories) 
    • Charges
      • amounts added to the lines or header
      • {feature} header-level charges [1.1.0.10]
      • {missing feature} line-level charges
    • {feature} Financial dimensions
      • header level [1.1.0.32/10.0.39]
      • line level [1.3.0.x]
    • Purchase orders
      • {feature} PO formatting based on the number sequence settings
      • {missing feature} PO details
      • {missing feature} multiple POs, multiple receipts 
    • {missing feature} Project information integration
      • {workaround} the Project Id can be added as a custom field on the Invoice line for Cost invoices; the field should then be mapped to the corresponding Data entity to transfer the value to D365 F&O
  • {feature} custom fields [1.1.0.32/10.0.39]

File filter

  • applies additional filtering to incoming files at the application level
  • with the installation a default updatable global file filter is provided
  • can be applied at different channel levels
  • {event} an invoice document is received
    • the channel is checked first for a file filter
    • if no file filter is assigned to the channel level, the file filter at the system level is used [1]
  • {configuration} maximum file size: 20 MB
  • {configuration} supported file types: PDF, PNG, JPG, JPEG, TIF, TIFF
  • {configuration} supported file names 
    • filter out files that aren't relevant to invoices [1]
    • rules can be applied to accept/exclude files whose name contains predefined strings [1]

Actions

  • Import Invoice
    • {feature} Channel for file upload
      • a default channel is provided for directly uploading the invoice files
      • maximum 20 files can be uploaded simultaneously
  • Capture Invoice
    • {feature} Invoice capture processing
      • different derivation rules are applied to ensure that the invoices are complete and correct [1]
    • {missing feature} differentiate between relevant and non-relevant content
    • {missing feature} merging/splitting files
      • {workaround} export the file(s), merge/split them, and import the result
  • Void files [1.0.1.0]
    • once the files are voided, they can be deleted from Dataverse
      • saves storage costs
  • Classify Invoice
    • {feature} search for LEs
    • {feature} assign LE to Channel
      • AP clerks view only the invoices under the LEs assigned to them
    • {feature} search for Vendor accounts by name or address 
  • Maintaining Headers
    • {feature} multiple sales taxes [1.1.0.26]
    • {missing feature} rounding-off [only in D365 F&O]
    • {feature} support Credit Notes [1.5.0.2]
  • Maintaining Lines
    • {feature} Add/remove lines
    • {feature} "Remove all" option for deleting all the invoice lines in Side-by-Side Viewer [1.1.0.26]
      • Previously, lines could only be deleted one by one, which for incorrectly formatted big invoices would lead to considerable effort. Imagine the invoices from Microsoft or other cloud providers that contain 5-10 pages of lines.
    • {feature} Aggregate multiple lines [requested]
      • Currently, all lines of an invoice have the same importance. Especially for big invoices, it would be useful to aggregate the amounts from multiple lines under one.
    • {feature} Show the total amount across the lines [requested]
      • When removing/adding lines, it would be useful to compare the total amount across the lines with the one from the header. 
    • Check the UoM consistency between invoice line and linked purchase order line
      • For the Invoice to be correctly processed, the two values must match. 
    • Support for discounts [requested]
      • as workaround, discounts can be entered as separate lines
  • Transfer invoice

  • Automation
    • {parameter} Auto invoice cleanup
      • automatically cleans up, on a daily basis, the transferred and voided invoices older than 180 days [1]
    • Use continuous learning 
      • select this option to turn on the continuous learning feature
      • learns from the corrections made by the AP clerk on a previous instance of the same invoice [1]
        • records the mapping relationship between the invoice context and the derived entities [1]
        • the entities are automatically derived for the next time a similar invoice is captured [1]
      • {missing feature} standard way to copy the continuous learning from UAT to PROD
      • {feature} migration tool for continuous learning knowledge
        • allows us to transfer the learning knowledge from one environment to another [1.6.0.x]
    • Confidence score check
      • for the prebuilt model the confidence score is always the same 
        • its value is returned by the AI Builder service
        • the confidence score can be improved only when the custom prebuilt model is used
        • it can be increased by uploading more samples and tagging them accordingly
        • a low confidence score is caused by the fact that not enough samples with the same pattern have been trained
      • {parameter} control the confidence score check [1.1.0.32/10.0.39]
  • Manage file filters

User Interface

  • Side-by-side view [by design]

  • History logs [1.0.1.0]
    • supported in Received files and Captured invoices
    • help AP clerks know the actions and results in each step during invoice processing 
  • Navigation to D365 F&O [1.0.1.0]
    • once the invoice is successfully transferred to D365 F&O, a quick link is provided for the AP clerk to open the Pending vendor invoice list in F&O
  • Reporting
    • {missing feature} consolidated overview across multiple environments (e.g. for licensing needs evaluation)
    • {missing feature} metrics by Legal entity, Processing status, Vendor, and/or Invoice type
      • {workaround} a paginated report can be built based on Dataverse (see the sketch below)
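
As a sketch of the workaround above: the captured invoices live in Dataverse, so they can be queried via the Dataverse Web API (OData) and fed into a paginated report. The environment URL and the entity set name below are hypothetical; check the actual logical names in your environment:

```python
# Minimal sketch: querying captured-invoice data from Dataverse via the
# Web API (OData). The environment URL and entity set name are
# hypothetical; acquire the bearer token via MSAL/Microsoft Entra ID.
import requests

ENV_URL = "https://yourorg.crm.dynamics.com"  # hypothetical environment URL
TOKEN = "<access token>"

resp = requests.get(
    f"{ENV_URL}/api/data/v9.2/msdyn_capturedinvoices",  # hypothetical entity set
    headers={"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"},
    params={"$select": "msdyn_vendorname,msdyn_totalamount,statuscode",
            "$top": "50"},
)
resp.raise_for_status()
for invoice in resp.json()["value"]:
    print(invoice)
```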

Data Validation

  • [Invoice Capture] derivation rules 
    • applied to ensure that the invoices are complete and correct [1]
    • {missing feature} derive vendor using the associated email address
    • {parameter} Format purchase order 
      • used to check the number sequence settings in D365 F&O to format the PO number [1]
    • {parameter} Derive currency code for cost invoice
      • used to derive the value from invoice master data in D365 F&O [1]
      • ⇐ the currency code on PO Invoices must be identical to the one on the PO
    • {parameter} Validate total sales tax amount 
      • validates the consistency between the sales tax amount on the Sales tax card and the total sales tax amount, when there's a sales tax line [1]
    • {parameter} Validate total amount 
      • confirm alignment between the calculated total invoice amount and the captured total amount [1]
  • [D365 F&O] before workflow submissions [requested]
    • it makes sense to have out-of-box rules 
    • currently this can be done by implementing extensions
  • [D365 F&O] during workflow execution [by design]

Dynamics 365 for Finance [aka D365 F&O]

  • Attachments 
    • {parameter} control document type for persisting the invoice attachment in D365 F&O [1.1.0.32/10.0.39]
  • Invoice Capture
    • {parameter} select the entities in scope
    • {parameter} differentiate by Invoice type whether a Vendor invoice or an Invoice journal is used to book the invoices
    • {parameter} Transfer attachment
  • Fixed Assets
    • {missing feature} create Fixed asset automatically during the time the invoice is imported
  • Vendor Invoice Journal 
    • {missing feature} configure what journal the invoice is sent to
      • the system seems to pick the first journal available
      • {parameter} control journal name for creating 'Invoice journal'  [1.1.0.32/10.0.39]
    • {missing feature} grouping multiple invoices together in a journal (e.g., by vendor group, payment terms, method of payment)
  •  Approval Workflow
    • {missing feature} involve Responsible person for the Vendor [requested]
    • {missing feature} involve Buyer for Cost invoices [requested]
    • {missing feature} differentiator between the various types of invoices [requested]
    • {missing feature} amount-based approval [requested]
  • Billing schedule
    • {feature} integration with Billing schedules
    • {feature} modify or cancel Billing schedules
  • Reporting
    • {missing feature} Vendor invoices hanging in the approval workflow (incl. responsible person for current action, respectively error message)
    • {missing feature} Report for GL reconciliation between Vendor invoice and GL via Billing schedules
    • {missing feature} Overview of the financial dimensions used (to identify whether further setup is needed)


Resources:
[1] Microsoft Learn (2023) Invoice capture overview (link)
[2] Yammer (2023) Invoice Capture for Dynamics 365 Finance (link)
[3] Microsoft (2023) Invoice Capture for Dynamics 365 Finance - Implementation Guide

Acronyms:
AI - Artificial Intelligence 
AP - Accounts Payable
F&O - Finance & Operations
LE - Legal Entity
PO - Purchase Order
UoM - Unit of Measure

29 March 2021

Notes: Team Data Science Process (TDSP)

Acronyms:
Artificial Intelligence (AI)
Cross-Industry Standard Process for Data Mining (CRISP-DM)
Data Mining (DM)
Knowledge Discovery in Databases (KDD)
Team Data Science Process (TDSP) 
Version Control System (VCS)
Visual Studio Team Services (VSTS)

Resources:
[1] Microsoft Azure (2020) What is the Team Data Science Process? [source]
[2] Microsoft Azure (2020) The business understanding stage of the Team Data Science Process lifecycle [source]
[3] Microsoft Azure (2020) Data acquisition and understanding stage of the Team Data Science Process [source]
[4] Microsoft Azure (2020) Modeling stage of the Team Data Science Process lifecycle [source]
[5] Microsoft Azure (2020) Deployment stage of the Team Data Science Process lifecycle [source]
[6] Microsoft Azure (2020) Customer acceptance stage of the Team Data Science Process lifecycle [source]

13 December 2018

🔭Data Science: Bayesian Networks (Just the Quotes)

"The best way to convey to the experimenter what the data tell him about theta is to show him a picture of the posterior distribution." (George E P Box & George C Tiao, "Bayesian Inference in Statistical Analysis", 1973)

"In the design of experiments, one has to use some informal prior knowledge. How does one construct blocks in a block design problem for instance? It is stupid to think that use is not made of a prior. But knowing that this prior is utterly casual, it seems ludicrous to go through a lot of integration, etc., to obtain 'exact' posterior probabilities resulting from this prior. So, I believe the situation with respect to Bayesian inference and with respect to inference, in general, has not made progress. Well, Bayesian statistics has led to a great deal of theoretical research. But I don't see any real utilizations in applications, you know. Now no one, as far as I know, has examined the question of whether the inferences that are obtained are, in fact, realized in the predictions that they are used to make." (Oscar Kempthorne, "A conversation with Oscar Kempthorne", Statistical Science, 1995)

"Bayesian methods are complicated enough, that giving researchers user-friendly software could be like handing a loaded gun to a toddler; if the data is crap, you won't get anything out of it regardless of your political bent." (Brad Carlin, "Bayes offers a new way to make sense of numbers", Science, 1999)

"Bayesian inference is a controversial approach because it inherently embraces a subjective notion of probability. In general, Bayesian methods provide no guarantees on long run performance." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Bayesian inference is appealing when prior information is available since Bayes’ theorem is a natural way to combine prior information with data. Some people find Bayesian inference psychologically appealing because it allows us to make probability statements about parameters. […] In parametric models, with large samples, Bayesian and frequentist methods give approximately the same inferences. In general, they need not agree." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The Bayesian approach is based on the following postulates: (B1) Probability describes degree of belief, not limiting frequency. As such, we can make probability statements about lots of things, not just data which are subject to random variation. […] (B2) We can make probability statements about parameters, even though they are fixed constants. (B3) We make inferences about a parameter θ by producing a probability distribution for θ. Inferences, such as point estimates and interval estimates, may then be extracted from this distribution." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The important thing is to understand that frequentist and Bayesian methods are answering different questions. To combine prior beliefs with data in a principled way, use Bayesian inference. To construct procedures with guaranteed long run performance, such as confidence intervals, use frequentist methods. Generally, Bayesian methods run into problems when the parameter space is high dimensional." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004) 

"Bayesian networks can be constructed by hand or learned from data. Learning both the topology of a Bayesian network and the parameters in the CPTs in the network is a difficult computational task. One of the things that makes learning the structure of a Bayesian network so difficult is that it is possible to define several different Bayesian networks as representations for the same full joint probability distribution." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"Bayesian networks provide a more flexible representation for encoding the conditional independence assumptions between the features in a domain. Ideally, the topology of a network should reflect the causal relationships between the entities in a domain. Properly constructed Bayesian networks are relatively powerful models that can capture the interactions between descriptive features in determining a prediction." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"Bayesian networks use a graph-based representation to encode the structural relationships - such as direct influence and conditional independence - between subsets of features in a domain. Consequently, a Bayesian network representation is generally more compact than a full joint distribution (because it can encode conditional independence relationships), yet it is not forced to assert a global conditional independence between all descriptive features. As such, Bayesian network models are an intermediary between full joint distributions and naive Bayes models and offer a useful compromise between model compactness and predictive accuracy." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"Bayesian networks inhabit a world where all questions are reducible to probabilities, or (in the terminology of this chapter) degrees of association between variables; they could not ascend to the second or third rungs of the Ladder of Causation. Fortunately, they required only two slight twists to climb to the top." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"The main differences between Bayesian networks and causal diagrams lie in how they are constructed and the uses to which they are put. A Bayesian network is literally nothing more than a compact representation of a huge probability table. The arrows mean only that the probabilities of child nodes are related to the values of parent nodes by a certain formula (the conditional probability tables) and that this relation is sufficient. That is, knowing additional ancestors of the child will not change the formula. Likewise, a missing arrow between any two nodes means that they are independent, once we know the values of their parents. [...] If, however, the same diagram has been constructed as a causal diagram, then both the thinking that goes into the construction and the interpretation of the final diagram change." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"The transparency of Bayesian networks distinguishes them from most other approaches to machine learning, which tend to produce inscrutable 'black boxes'. In a Bayesian network you can follow every step and understand how and why each piece of evidence changed the network’s beliefs." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"With Bayesian networks, we had taught machines to think in shades of gray, and this was an important step toward humanlike thinking. But we still couldn’t teach machines to understand causes and effects. [...] By design, in a Bayesian network, information flows in both directions, causal and diagnostic: smoke increases the likelihood of fire, and fire increases the likelihood of smoke. In fact, a Bayesian network can’t even tell what the 'causal direction' is." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

19 September 2018

🔭Data Science: Features (Just the Quotes)

"The preliminary examination of most data is facilitated by the use of diagrams. Diagrams prove nothing, but bring outstanding features readily to the eye; they are therefore no substitutes for such critical tests as may be applied to the data, but are valuable in suggesting such tests, and in explaining the conclusions founded upon them." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"Every bit of knowledge we gain and every conclusion we draw about the universe or about any part or feature of it depends finally upon some observation or measurement. Mankind has had again and again the humiliating experience of trusting to intuitive, apparently logical conclusions without observations, and has seen Nature sail by in her radiant chariot of gold in an entirely different direction." (Oliver J Lee, "Measuring Our Universe: From the Inner Atom to Outer Space", 1950)

"Probability is the mathematics of uncertainty. Not only do we constantly face situations in which there is neither adequate data nor an adequate theory, but many modem theories have uncertainty built into their foundations. Thus learning to think in terms of probability is essential. Statistics is the reverse of probability (glibly speaking). In probability you go from the model of the situation to what you expect to see; in statistics you have the observations and you wish to estimate features of the underlying model." (Richard W Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"Complexity is not an objective factor but a subjective one. Supersignals reduce complexity, collapsing a number of features into one. Consequently, complexity must be understood in terms of a specific individual and his or her supply of supersignals. We learn supersignals from experience, and our supply can differ greatly from another individual's. Therefore there can be no objective measure of complexity." (Dietrich Dorner, "The Logic of Failure: Recognizing and Avoiding Error in Complex Situations", 1989)

"Formulation of a mathematical model is the first step in the process of analyzing the behaviour of any real system. However, to produce a useful model, one must first adopt a set of simplifying assumptions which have to be relevant in relation to the physical features of the system to be modelled and to the specific information one is interested in. Thus, the aim of modelling is to produce an idealized description of reality, which is both expressible in a tractable mathematical form and sufficiently close to reality as far as the physical mechanisms of interest are concerned." (Francois Axisa, "Discrete Systems" Vol. I, 2001)

"Graphical displays are often constructed to place principal focus on the individual observations in a dataset, and this is particularly helpful in identifying both the typical positions of data points and unusual or influential cases. However, in many investigations, principal interest lies in identifying the nature of underlying trends and relationships between variables, and so it is often helpful to enhance graphical displays in ways which give deeper insight into these features. This can be very beneficial both for small datasets, where variation can obscure underlying patterns, and large datasets, where the volume of data is so large that effective representation inevitably involves suitable summaries." (Adrian W Bowman, "Smoothing Techniques for Visualisation" [in "Handbook of Data Visualization"], 2008)

"It is impossible to construct a model that provides an entirely accurate picture of network behavior. Statistical models are almost always based on idealized assumptions, such as independent and identically distributed (i.i.d.) interarrival times, and it is often difficult to capture features such as machine breakdowns, disconnected links, scheduled repairs, or uncertainty in processing rates." (Sean Meyn, "Control Techniques for Complex Networks", 2008)

"In order to deal with these phenomena, we abstract from details and attempt to concentrate on the larger picture - a particular set of features of the real world or the structure that underlies the processes that lead to the observed outcomes. Models are such abstractions of reality. Models force us to face the results of the structural and dynamic assumptions that we have made in our abstractions." (Bruce Hannon and Matthias Ruth, "Dynamic Modeling of Diseases and Pests", 2009)

"Despite the enormous success of deep learning, relatively little is understood theoretically about why these techniques are so successful at feature learning and compression." (Pankaj Mehta & David J Schwab, "An exact mapping between the Variational Renormalization Group and Deep Learning", 2014)

"A predictive model overfits the training set when at least some of the predictions it returns are based on spurious patterns present in the training data used to induce the model. Overfitting happens for a number of reasons, including sampling variance and noise in the training set. The problem of overfitting can affect any machine learning algorithm; however, the fact that decision tree induction algorithms work by recursively splitting the training data means that they have a natural tendency to segregate noisy instances and to create leaf nodes around these instances. Consequently, decision trees overfit by splitting the data on irrelevant features that only appear relevant due to noise or sampling variance in the training data. The likelihood of overfitting occurring increases as a tree gets deeper because the resulting predictions are based on smaller and smaller subsets as the dataset is partitioned after each feature test in the path." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"Bayesian networks provide a more flexible representation for encoding the conditional independence assumptions between the features in a domain. Ideally, the topology of a network should reflect the causal relationships between the entities in a domain. Properly constructed Bayesian networks are relatively powerful models that can capture the interactions between descriptive features in determining a prediction." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"Bayesian networks use a graph-based representation to encode the structural relationships - such as direct influence and conditional independence - between subsets of features in a domain. Consequently, a Bayesian network representation is generally more compact than a full joint distribution (because it can encode conditional independence relationships), yet it is not forced to assert a global conditional independence between all descriptive features. As such, Bayesian network models are an intermediary between full joint distributions and naive Bayes models and offer a useful compromise between model compactness and predictive accuracy." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"Decision trees are also discriminative models. Decision trees are induced by recursively partitioning the feature space into regions belonging to the different classes, and consequently they define a decision boundary by aggregating the neighboring regions belonging to the same class. Decision tree model ensembles based on bagging and boosting are also discriminative models." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"There are two kinds of mistakes that an inappropriate inductive bias can lead to: underfitting and overfitting. Underfitting occurs when the prediction model selected by the algorithm is too simplistic to represent the underlying relationship in the dataset between the descriptive features and the target feature. Overfitting, by contrast, occurs when the prediction model selected by the algorithm is so complex that the model fits to the dataset too closely and becomes sensitive to noise in the data."(John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"The power of deep learning models comes from their ability to classify or predict nonlinear data using a modest number of parallel nonlinear steps4. A deep learning model learns the input data features hierarchy all the way from raw data input to the actual classification of the data. Each layer extracts features from the output of the previous layer." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Decision trees are important for a few reasons. First, they can both classify and regress. It requires literally one line of code to switch between the two models just described, from a classification to a regression. Second, they are able to determine and share the feature importance of a given training set." (Russell Jurney, "Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark", 2017)

"Extracting good features is the most important thing for getting your analysis to work. It is much more important than good machine learning classifiers, fancy statistical techniques, or elegant code. Especially if your data doesn’t come with readily available features (as is the case with web pages, images, etc.), how you reduce it to numbers will make the difference between success and failure." (Field Cady, "The Data Science Handbook", 2017)

"Feature extraction is also the most creative part of data science and the one most closely tied to domain expertise. Typically, a really good feature will correspond to some real‐world phenomenon. Data scientists should work closely with domain experts and understand what these phenomena mean and how to distill them into numbers." (Field Cady, "The Data Science Handbook", 2017)

"Variables which follow symmetric, bell-shaped distributions tend to be nice as features in models. They show substantial variation, so they can be used to discriminate between things, but not over such a wide range that outliers are overwhelming." (Steven S Skiena, "The Data Science Design Manual", 2017)

"The idea behind deeper architectures is that they can better leverage repeated regularities in the data patterns in order to reduce the number of computational units and therefore generalize the learning even to areas of the data space where one does not have examples. Often these repeated regularities are learned by the neural network within the weights as the basis vectors of hierarchical features." (Charu C Aggarwal, "Neural Networks and Deep Learning: A Textbook", 2018)

"We humans are reasonably good at defining rules that check one, two, or even three attributes (also commonly referred to as features or variables), but when we go higher than three attributes, we can start to struggle to handle the interactions between them. By contrast, data science is often applied in contexts where we want to look for patterns among tens, hundreds, thousands, and, in extreme cases, millions of attributes." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Any machine learning model is trained based on certain assumptions. In general, these assumptions are the simplistic approximations of some real-world phenomena. These assumptions simplify the actual relationships between features and their characteristics and make a model easier to train. More assumptions means more bias. So, while training a model, more simplistic assumptions = high bias, and realistic assumptions that are more representative of actual phenomena = low bias." (Imran Ahmad, "40 Algorithms Every Programmer Should Know", 2020)

19 May 2018

🔬Data Science: Convolutional Neural Network [CNN] (Definitions)

"A multi layer neural network similar to artificial neural networks only differs in its architecture and mainly built to recognize visual patterns from image pixels." (Nishu Garg et al, "An Insight Into Deep Learning Architectures, Latent Query Features", 2018)

"In machine learning, a convolutional neural network is a class of deep, feed-forward artificial neural networks that has successfully been applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics." (V E Jayanthi, "Automatic Detection of Tumor and Bleed in Magnetic Resonance Brain Images", 2018)

"A special type of feed-forward neural network optimized for image data processing. The key features of CNN architecture include sharing weights, using pooling layers, implementing deep structures with multiple hidden layers." (Lyudmila N. Tuzova et al, "Teeth and Landmarks Detection and Classification Based on Deep Neural Networks", 2019)

"A type of artificial neural networks, which uses a set of filters with tunable (learnable) parameters to extract local features from the input data." (Sergei Savin & Aleksei Ivakhnenko, "Enhanced Footsteps Generation Method for Walking Robots Based on Convolutional Neural Networks", 2019) 

"A convolutional neural network (CNN) is a type of artificial neural network used in image recognition and processing that is specifically designed to process pixel data by means of learnable filters." (Loris Nanni et al, "Digital Recognition of Breast Cancer Using TakhisisNet: An Innovative Multi-Head Convolutional Neural Network for Classifying Breast Ultrasonic Images", 2020)

"A convolutional neural network (CNN) is a type of artificial neural network used in image recognition and processing that is specifically designed to process pixel data. CNNs are powerful image processing, artificial intelligence (AI) that use deep learning to perform both generative and descriptive tasks, often using machine vision that includes image and video recognition, along with recommender systems and natural language processing (NLP)." (Mohammad F Hashmi et al, "Subjective and Objective Assessment for Variation of Plant Nitrogen Content to Air Pollutants Using Machine Intelligence", 2020)

"A neural network with a convolutional layer which does the mathematical operation of convolution in addition to the other layers of deep neural network." (S Kayalvizhi & D Thenmozhi, "Deep Learning Approach for Extracting Catch Phrases from Legal Documents", 2020)

"A special type of neural networks used popularly to analyze photography and imagery." (Murad Al Shibli, "Hybrid Artificially Intelligent Multi-Layer Blockchain and Bitcoin Cryptology", 2020)

"In deep learning, a convolutional neural network is a class of deep neural networks, most commonly applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing." (R Murugan, "Implementation of Deep Learning Neural Network for Retinal Images", 2020)

"A class of deep neural networks applied to image processing where some of the layers apply convolutions to input data." (Mário P Véstias, "Convolutional Neural Network", 2021)

"A convolution neural network is a kind of ANN used in image recognition and processing of image data." (M Srikanth Yadav & R Kalpana, "A Survey on Network Intrusion Detection Using Deep Generative Networks for Cyber-Physical Systems", 2021)

"A multi-layer neural network similar to artificial neural networks only differs in its architecture and mainly built to recognize visual patterns from image pixels." (Udit Singhania & B K Tripathy, "Text-Based Image Retrieval Using Deep Learning", 2021) 

"A type of deep learning algorithm commonly applied in analyzing image inputs." (Jinnie Shin et al, "Automated Essay Scoring Using Deep Learning Algorithms", 2021)

"It is a class of deep neural networks, most commonly applied to analyzing visual imagery." (Sercan Demirci et al, "Detection of Diabetic Retinopathy With Mobile Application Using Deep Learning", 2021)

"They are a class of deep neural networks that are generally used to analyze image data. They use convolution instead of simple matrix multiplication in a few layers of the network. They have shared weights architecture and have translation invariant characteristics." Vijayaraghavan Varadharajan & J Rian Leevinson, "Next Generation of Intelligent Cities: Case Studies from Europe", 2021) 

13 May 2018

🔬Data Science: Self-Organizing Map (Definitions)

"A clustering neural net, with topological structure among cluster units." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

"A self organizing map is a form of Kohonen network that arranges its clusters in a (usually) two-dimensional grid so that the codebook vectors (the cluster centers) that are close to each other on the grid are also close in the k-dimensional feature space. The converse is not necessarily true, as codebook vectors that are close in feature-space might not be close on the grid. The map is similar in concept to the maps produced by descriptive techniques such as multi-dimensional scaling (MDS)." (William J Raynor Jr., "The International Dictionary of Artificial Intelligence", 1999)

"result of a nonparametric regression process that is mainly used to represent high-dimensional, nonlinearly related data items in an illustrative, often two-dimensional display, and to perform unsupervised classification and clustering." (Teuvo Kohonen, "Self-Organizing Maps" 3rd Ed., 2001)

"a method of organizing and displaying textual information according to the frequency of occurrence of text and the relationship of text from one document to another." (William H Inmon, "Building the Data Warehouse", 2005)

"A type of unsupervised neural network used to group similar cases in a sample. SOMs are unsupervised (see supervised network) in that they do not require a known dependent variable. They are typically used for exploratory analysis and to reduce dimensionality as an aid to interpretation of complex data. SOMs are similar in purpose to Ic-means clustering and factor analysis." (David Scarborough & Mark J Somers, "Neural Networks in Organizational Research: Applying Pattern Recognition to the Analysis of Organizational Behavior", 2006)

"A method to learn to cluster input vectors according to how they are naturally grouped in the input space. In its simplest form, the map consists of a regular grid of units and the units learn to represent statistical data described by model vectors. Each map unit contains a vector used to represent the data. During the training process, the model vectors are changed gradually and then the map forms an ordered non-linear regression of the model vectors into the data space." (Atiq Islam et al, "CNS Tumor Prediction Using Gene Expression Data Part II", Encyclopedia of Artificial Intelligence, 2009)

"A neural-network method that reduces the dimensions of data while preserving the topological properties of the input data. SOM is suitable for visualizing high-dimensional data such as microarray data." (Emmanuel Udoh & Salim Bhuiyan, "C-MICRA: A Tool for Clustering Microarray Data", 2009)

"A neural network unsupervised method of vector quantization widely used in classification. Self-Organizing Maps are a much appreciated for their topology preservation property and their associated data representation system. These two additive properties come from a pre-defined organization of the network that is at the same time a support for the topology learning and its representation. (Patrick Rousset & Jean-Francois Giret, "A Longitudinal Analysis of Labour Market Data with SOM" Encyclopedia of Artificial Intelligence, 2009)

"A simulated neural network based on a grid of artificial neurons by means of prototype vectors. In an unsupervised training the prototype vectors are adapted to match input vectors in a training set. After completing this training the SOM provides a generalized K-means clustering as well as topological order of neurons." (Laurence Mukankusi et al, "Relationships between Wireless Technology Investment and Organizational Performance", 2009)

"A subtype of artificial neural network. It is trained using unsupervised learning to produce low dimensional representation of the training samples while preserving the topological properties of the input space." (Soledad Delgado et al, "Growing Self-Organizing Maps for Data Analysis", 2009)

"An unsupervised neural network providing a topology-preserving mapping from a high-dimensional input space onto a two-dimensional output space." (Thomas Lidy & Andreas Rauber, "Music Information Retrieval", 2009)

"Category of algorithms based on artificial neural networks that searches, by means of self-organization, to create a map of characteristics that represents the involved samples in a determined problem." (Paulo E Ambrósio, "Artificial Intelligence in Computer-Aided Diagnosis", 2009)

"Self-organizing maps (SOMs) are a data visualization technique which reduce the dimensions of data through the use of self-organizing neural networks." (Lluís Formiga & Francesc Alías, "GTM User Modeling for aIGA Weight Tuning in TTS Synthesis", Encyclopedia of Artificial Intelligence, 2009)

"SOFM [self-organizing feature map] is a data mining method used for unsupervised learning. The architecture consists of an input layer and an output layer. By adjusting the weights of the connections between input and output layer nodes, this method identifies clusters in the data." (Indranil Bose, "Data Mining in Tourism", 2009)

"The self-organizing map is a subtype of artificial neural networks. It is trained using unsupervised learning to produce low dimensional representation of the training samples while preserving the topological properties of the input space. The self-organizing map is a single layer feed-forward network where the output syntaxes are arranged in low dimensional (usually 2D or 3D) grid. Each input is connected to all output neurons. Attached to every neuron there is a weight vector with the same dimensionality as the input vectors. The number of input dimensions is usually a lot higher than the output grid dimension. SOMs are mainly used for dimensionality reduction rather than expansion." (Larbi Esmahi et al, "Adaptive Neuro-Fuzzy Systems", Encyclopedia of Artificial Intelligence, 2009)

"A type of neural network that uses unsupervised learning to produce two-dimensional representations of an input space." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The Self-organizing map is a non-parametric and non-linear neural network that explores data using unsupervised learning. The SOM can produce output that maps multidimensional data onto a two-dimensional topological map. Moreover, since the SOM requires little a priori knowledge of the data, it is an extremely useful tool for exploratory analyses. Thus, the SOM is an ideal visualization tool for analyzing complex time-series data." (Peter Sarlin, "Visualizing Indicators of Debt Crises in a Lower Dimension: A Self-Organizing Maps Approach", 2012)

"SOMs or Kohonen networks have a grid topology, with unequal grid weights. The topology of the grid provides a low dimensional visualization of the data distribution." (Siddhartha Bhattacharjee et al, "Quantum Backpropagation Neural Network Approach for Modeling of Phenol Adsorption from Aqueous Solution by Orange Peel Ash", 2013)

"An unsupervised neural network widely used in exploratory data analysis and to visualize multivariate object relationships." (Manuel Martín-Merino, "Semi-Supervised Dimension Reduction Techniques to Discover Term Relationships", 2015)

"ANN used for visualizing low-dimensional views of high-dimensional data." (Pablo Escandell-Montero et al, "Artificial Neural Networks in Physical Therapy", 2015)

"Is a unsupervised learning ANN, which means that no human intervention is needed during the learning and that little needs to be known about the characteristics of the input data." (Nuno Pombo et al, "Machine Learning Approaches to Automated Medical Decision Support Systems", 2015)

"A kind of artificial neural network which attempts to mimic brain functions to provide learning and pattern recognition techniques. SOM have the ability to extract patterns from large datasets without explicitly understanding the underlying relationships. They transform nonlinear relations among high dimensional data into simple geometric connections among their image points on a low-dimensional display." (Felix Lopez-Iturriaga & Iván Pastor-Sanz, "Using Self Organizing Maps for Banking Oversight: The Case of Spanish Savings Banks", 2016)

"Neural network which simulated some cerebral functions in elaborating visual information. It is usually used to classify a large amount of data." (Gaetano B Ronsivalle & Arianna Boldi, "Artificial Intelligence Applied: Six Actual Projects in Big Organizations", 2019)

"Classification technique based on unsupervised-learning artificial neural networks allowing to group data into clusters." Julián Sierra-Pérez & Joham Alvarez-Montoya, "Strain Field Pattern Recognition for Structural Health Monitoring Applications", 2020)

"It is a type of artificial neural network (ANN) trained using unsupervised learning for dimensionality reduction by discretized representation of the input space of the training samples called as map." (Dinesh Bhatia et al, "A Novel Artificial Intelligence Technique for Analysis of Real-Time Electro-Cardiogram Signal for the Prediction of Early Cardiac Ailment Onset", 2020)

"Being a particular type of ANNs, the Self Organizing Map is a simple mapping from inputs: attributes directly to outputs: clusters by the algorithm of unsupervised learning. SOM is a clustering and visualization technique in exploratory data analysis." (Yuh-Wen Chen, "Social Network Analysis: Self-Organizing Map and WINGS by Multiple-Criteria Decision Making", 2021)

14 March 2018

🔬Data Science: Deep Learning (Definitions)

"Deep learning is an area of machine learning that emerged from the intersection of neural networks, artificial intelligence, graphical modeling, optimization, pattern recognition and signal processing." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Methods that are used to train models with several levels of abstraction from the raw input to the output. For example, in visual recognition, the lowest level is an image composed of pixels. In layers as we go up, a deep learner combines them to form strokes and edges of different orientations, which can then be combined to detect longer lines, arcs, corners, and junctions, which in turn can be combined to form rectangles, circles, and so on. The units of each layer may be thought of as a set of primitives at a different level of abstraction." (Ethem Alpaydın, "Machine learning : the new AI", 2016)

"A branch of machine learning to whose architectures belong deep ANNs. The term “deep” denotes the application of multiple layers with a complex structure." (Iva Mihaylova, "Applications of Artificial Neural Networks in Economics and Finance", 2018)

"A deep-learning model is a neural network that has multiple (more than two) layers of hidden units (or neurons). Deep networks are deep in terms of the number of layers of neurons in the network. Today many deep networks have tens to hundreds of layers. The power of deep-learning models comes from the ability of the neurons in the later layers to learn useful attributes derived from attributes that were themselves learned by the neurons in the earlier layers." (John D Kelleher & Brendan Tierney, "Data science", 2018)

"Also known as deep structured learning or hierarchical learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms." (Soraya Sedkaoui, "Big Data Analytics for Entrepreneurial Success", 2018)

"Deep learning broadly describes the large family of neural network architectures that contain multiple, interacting hidden layers." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"It is a part of machine learning approach used for learning data representations." (Dharmendra S Rajput et al, "Investigation on Deep Learning Approach for Big Data: Applications and Challenges", 2018)

"The ability of a neural network to improve its learning process." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A learning algorithm using a number of layers for extracting and learning feature hierarchies before providing an output for any input." (Tanu Wadhera & Deepti Kakkar, "Eye Tracker: An Assistive Tool in Diagnosis of Autism Spectrum Disorder", 2019)

"a machine-learning technique that extends standard artificial neural network models to many layers representing different levels of abstraction, say going from individual pixels of an image through to recognition of objects." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"A part of a broader family of machine learning methods based on learning data representations." (Nil Goksel & Aras Bozkurt, "Artificial Intelligence in Education: Current Insights and Future Perspectives", 2019)

"A recent method of machine learning based on neural networks with more than one hidden layer." (Samih M Jammoul et al, "Open Source Software Usage in Education and Research: Network Traffic Analysis as an Example", 2019)

"A subbranch of machine learning which inspires from the artificial neural network. It has eliminated the need to design handcrafted features as in deep learning features are automatically learned by the model from the data." (Aman Kamboj et al, "Ear Localizer: A Deep-Learning-Based Ear Localization Model for Side Face Images in the Wild", 2019)

"It is class of one machine learning algorithms that can be supervised, unsupervised, or semi-supervised. It uses multiple layers of processing units for feature extraction and transformation." (Siddhartha Kumar Arjaria & Abhishek S Rathore, "Heart Disease Diagnosis: A Machine Learning Approach", 2019)

"Is the complex, unsupervised processing of unstructured data in order to create patterns used in decision making, patterns that are analogous to those of the human brain." (Samia H Rizk, "Risk-Benefit Evaluation in Clinical Research Practice", 2019)

"The ability for machines to autonomously mimic human thought patterns through artificial neural networks composed of cascading layers of information." (Kirti R Bhatele et al, "The Role of Artificial Intelligence in Cyber Security", 2019)

"The method for solving problems that have more probabilistic calculations based on artificial neural networks." (Tolga Ensari et al, "Overview of Machine Learning Approaches for Wireless Communication", 2019)

"A category of machine learning methods which is inspired by the artificial neural networks" (Shouvik Chakraborty & Kalyani Mali, "An Overview of Biomedical Image Analysis From the Deep Learning Perspective", 2020)

"A sub-field of machine learning which is based on the algorithms and layers of artificial networks." (S Kayalvizhi & D Thenmozhi, "Deep Learning Approach for Extracting Catch Phrases from Legal Documents", 2020)

"A type of machine learning based on artificial neural networks. It can be supervised, unsupervised, or semi-supervised, and it uses an artificial neural network with multiple layers between the input and output layers." (Timofei Bogomolov et al, "Identifying Patterns in Fresh Produce Purchases: The Application of Machine Learning Techniques", 2020)

"An extension of machine learning approach, which uses neural network." (Neha Garg & Kamlesh Sharma, "Machine Learning in Text Analysis", 2020)

"Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised." (R Murugan, "Implementation of Deep Learning Neural Network for Retinal Images", 2020)

 "Deep learning is a collection of algorithms used in machine learning, used to model high-level abstractions in data through the use of model architectures, which are composed of multiple nonlinear transformations. It is part of a broad family of methods used for machine learning that are based on learning representations of data." (Edward T Chen, "Deep Learning and Sustainable Telemedicine", 2020)

"Deep learning is a collection of neural-network techniques that generally use multiple layers." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"Deep learning is a kind of machine learning technique with automatic image interpretation and feature learning facility. The different deep learning algorithms are convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), genetic adversarial networks (GAN), etc." (Rajandeep Kaur & Rajneesh Rani, "Comparative Study on ASD Identification Using Machine and Deep Learning", 2020)

"Deep learning is a subset of machine learning that models high-level abstractions in data by means of network architectures, which are composed of multiple nonlinear transformations." (Loris Nanni et al, "Digital Recognition of Breast Cancer Using TakhisisNet: An Innovative Multi-Head Convolutional Neural Network for Classifying Breast Ultrasonic Images", 2020)

"In contradistinction to surface or superficial learning, deep learning is inextricably associated with long-term retention of pertinent and solid knowledge, based on a thorough and critical understanding of the object of study, be it curricular content or not." (Leonor M Martínez-Serrano, "The Pedagogical Potential of Design Thinking for CLIL Teaching: Creativity, Critical Thinking, and Deep Learning", 2020)

"Is a group of methods that allow multilayer computing models to work with data that has an abstraction hierarchy." (Heorhii Kuchuk et al, "Application of Deep Learning in the Processing of the Aerospace System's Multispectral Images", 2020)

"It is a part of machine learning intended for learning form large amounts of data, as in the case of experience-based learning. It can be considered that feature engineering in deep learning-based models is partly left to the machine. In the case of artificial neural networks, deep neural networks are expected to have various layers within architectures for solving complex problems with higher accuracy compared to traditional machine learning. Moreover, high performance automatic results are expected without human intervention." (Ana Gavrovska & Andreja Samčović, "Intelligent Automation Using Machine and Deep Learning in Cybersecurity of Industrial IoT", 2020)

"Is a subset of AI and machine learning that uses multi-layered artificial neural networks to learn from data that is unstructured or unlabeled." (Lejla Banjanović-Mehmedović & Fahrudin Mehmedović, "Intelligent Manufacturing Systems Driven by Artificial Intelligence in Industry 4.0", 2020)

"This method is also called as hierarchical learning or deep structured learning. It is one of the machine learning method that is based on learning methods like supervised, semi-supervised or unsupervised. The only difference between deep learning and other machine learning algorithm is that deep learning method uses big data as input." (Anumeera Balamurali & Balamurali Ananthanarayanan,"Develop a Neural Model to Score Bigram of Words Using Bag-of-Words Model for Sentiment Analysis", 2020)

"A form of machine learning which uses multi-layered architectures to automatically learn complex representations of the input data. Deep models deliver state-of-the-art results across many fields, e.g. computer vision and NLP." (Vincent Karas & Björn W Schuller, "Deep Learning for Sentiment Analysis: An Overview and Perspectives", 2021)

"A sub branch of Artificial intelligence in which we built the DL model and we don’t need to specify any feature to the learning model . In case of DL the model will classify the data based on the input data." (Ajay Sharma, "Smart Agriculture Services Using Deep Learning, Big Data, and IoT", 2021)

"A sub-set of machine learning in artificial intelligence (AI) with network capabilities supporting learning unsupervised from unstructured data." (Mark Schofield, "Gamification Tools to Facilitate Student Learning Engagement in Higher Education: A Burden or Blessing?", 2021)

"A subarea of machine learning, which adopts a deeper and more complex neural structure to reach state-of-the-art accuracy in a given problem. Commonly applied in machine learning areas, such as classification and prediction." (Jinnie Shin et al, "Automated Essay Scoring Using Deep Learning Algorithms", 2021)

"A subset of a broader family of machine learning methods that makes use of multiple layers to extract data from raw input in order to learn its features." (R Karthik et al, "Performance Analysis of GAN Architecture for Effective Facial Expression Synthesis", 2021)

"An artificial intelligence function that imitates the workings of the human brain in processing data and creating patterns for use in decision making." (Wissam Abbass et al, "Internet of Things Application for Intelligent Cities: Security Risk Assessment Challenges", 2021)

"Another term for unsupervised learning that includes reinforcement learning in which the machine responds to reaching goals given input data and constraints. Deep learning deals with multiple layers simulating neural networks with ability to process immense amount of data." (Sujata Ramnarayan, "Marketing and Artificial Intelligence: Personalization at Scale", 2021)

"Application of multi neuron, multi-layer neural networks to perform learning tasks." (Revathi Rajendran et al, "Convergence of AI, ML, and DL for Enabling Smart Intelligence: Artificial Intelligence, Machine Learning, Deep Learning, Internet of Things", 2021)

 "Deep learning approach is a subfield of the machine learning technique. The concepts of deep learning influenced by neuron and brain structure based on ANN (Artificial Neural Network)." (Sayani Ghosal & Amita Jain, "Research Journey of Hate Content Detection From Cyberspace", 2021)

"Deep learning is a compilation of algorithms used in machine learning, and used to model high-level abstractions in data through the use of model architectures." (M Srikanth Yadav & R Kalpana, "A Survey on Network Intrusion Detection Using Deep Generative Networks for Cyber-Physical Systems", 2021)

"Deep learning is a subfield of machine learning that uses artificial neural networks to predict, classify, and generate data." (Usama A Khan & Josephine M Namayanja, "Reevaluating Factor Models: Feature Extraction of the Factor Zoo", 2021)

"Deep leaning is a subset of machine learning to solve complex problems/datasets." (R Suganya et al, "A Literature Review on Thyroid Hormonal Problems in Women Using Data Science and Analytics: Healthcare Applications", 2021)

"Deep learning is a type of machine learning that can process a wider range of data resources, requires less data preprocessing by humans, and can often produce more accurate results than traditional machine-learning approaches. In deep learning, interconnected layers of software-based calculators known as 'neurons' form a neural network. The network can ingest vast amounts of input data and process them through multiple layers that learn increasingly complex features of the data at each layer. The network can then make a determination about the data, learn if its determination is correct, and use what it has learned to make determinations about new data. For example, once it learns what an object looks like, it can recognize the object in a new image." (Bistra K Vassileva, "Artificial Intelligence: Concepts and Notions", 2021)

"Deep learning refers to artificial neural networks that mimic the workings of the human brain in the formation of patterns used in data processing and decision-making. Deep learning is a subset of machine learning. They are artificial intelligence networks capable of learning from unstructured or unlabeled data." (Atakan Gerger, "Technologies for Connected Government Implementation: Success Factors and Best Practices", 2021)

"It is a machine learning method using multiple layers of nonlinear processing units to extract features from data." (Sercan Demirci et al, "Detection of Diabetic Retinopathy With Mobile Application Using Deep Learning", 2021)

"It is a subarea of machine learning, where the models are built using multiple layers of artificial neural networks for learning useful patterns from raw data." (Gunjan Ansari et al, "Natural Language Processing in Online Reviews", 2021)

"It is an artificial intelligence technology that imitates the role of the human brain in data processing and the development of decision-making patterns." (Mehmet A Cifci, "Optimizing WSNs for CPS Using Machine Learning Techniques", 2021)

"One part of the broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised." (Jan Bosch et al, "Engineering AI Systems: A Research Agenda", 2021)

"Part of Machine Learning, where methods of higher complexity are used for training data representation." (Andrej Zgank et al, "Embodied Conversation: A Personalized Conversational HCI Interface for Ambient Intelligence", 2021)

"Sub-domain in the field of machine learning that deals with the use of algorithms inspired by human brain cells to solve complex real-world problems." (Shatakshi Singhet al, "A Survey on Intelligence Tools for Data Analytics", 2021)

"This is also a subset of AI where unstructured data is processed using layers of neural networks to identify, predict and detect patterns. Deep learning is used when there is a large amount of unlabeled data and problem is too complex to be solved using machine learning algorithms. Deep learning algorithms are used in computer vision and facial recognition systems." (Vijayaraghavan Varadharajan & Akanksha Rajendra Singh, "Building Intelligent Cities: Concepts, Principles, and Technologies", 2021)

"A rapidly evolving machine learning technique used to build, train, and test neural networks that probabilistically predict outcomes and/or classify unstructured data." (Forrester)

"Deep Learning is a subset of machine learning concerned with large amounts of data with algorithms that have been inspired by the structure and function of the human brain, which is why deep learning models are often referred to as deep neural networks. It is is a part of a broader family of machine learning methods based on learning data representations, as opposed to traditional task-specific algorithms." (Databricks) [source]

"Deep Learning refers to complex multi-layer neural nets.  They are especially suitable for image and voice recognition, and for unsupervised tasks with complex, unstructured data." (Statistics.com)

"is a machine learning methodology where a system learns the patterns in data by automatically learning a hierarchical layer of features. " (Accenture)
