Showing posts with label features. Show all posts

26 April 2025

🏭🗒️Microsoft Fabric: Parameters in Dataflows Gen2 [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 26-Apr-2025

[Microsoft Fabric] Dataflow Gen2 Parameters

  • {def} parameters that allow users to dynamically control and customize Dataflows Gen2
    • makes them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [1]
    • the dataflow is refreshed by passing parameter values outside of the Power Query editor, through either
      • the Fabric REST API [1] (see the sketch after this list)
      • native Fabric experiences [1]
    • parameter names are case sensitive [1]
    • {type} required parameters
      • {warning} the refresh fails if no value is passed for it [1]
    • {type} optional parameters
    • enabled via Parameters >> Enable parameters to be discovered and override for execution [1]
  • {limitation} dataflows with parameters can't be
    • scheduled for refresh through the Fabric scheduler [1]
    • manually triggered through the Fabric Workspace list or lineage view [1]
  • {limitation} parameters that affect the resource path of a data source or a destination are not supported [1]
    • ⇐ connections are linked to the exact data source path defined in the authored dataflow
      • can't currently be overridden to use other connections or resource paths [1]
  • {limitation} can't be leveraged by dataflows with incremental refresh [1]
  • {limitation} only parameters of type decimal number, whole number, text, and true/false can be passed for override
    • any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [1]
  • {warning} allows other users who have permissions to the dataflow to refresh the data with other values [1]
  • {limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [1]
  • {limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [1]
  • {limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [1]
  • {limitation} only the first request will be accepted from duplicated requests for the same parameter values [1]
    • subsequent requests are rejected until the first request finishes its evaluation [1]
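A minimal Python sketch of such a parameterized refresh, assuming the Fabric on-demand job endpoint and an executionData payload of the shape shown below; the exact schema, the jobType value and the parameter type names should be verified against [1], and all IDs and the token are placeholders:

```python
import requests

# Placeholders: supply your own workspace/dataflow IDs and a valid Microsoft Entra token.
WORKSPACE_ID = "<workspace-id>"
DATAFLOW_ID = "<dataflow-id>"
TOKEN = "<bearer-token>"  # e.g. acquired via azure-identity or MSAL

# Assumed on-demand job endpoint for Fabric items; jobType "Refresh" is an assumption for Dataflow Gen2.
url = (
    "https://api.fabric.microsoft.com/v1/workspaces/"
    f"{WORKSPACE_ID}/items/{DATAFLOW_ID}/jobs/instances?jobType=Refresh"
)

# Assumed payload shape for public parameters; parameter names are case sensitive,
# and only decimal number, whole number, text and true/false values can be overridden.
body = {
    "executionData": {
        "parameters": [
            {"parameterName": "Region", "type": "Text", "value": "DE"},
            {"parameterName": "TopN", "type": "Number", "value": 100},
        ]
    }
}

resp = requests.post(url, json=body, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()
# A 202 Accepted response returns a Location header pointing to the job instance that can be polled.
print(resp.status_code, resp.headers.get("Location"))
```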

References:
[1] Microsoft Learn (2025) Use public parameters in Dataflow Gen2 (Preview) [link]

Resources:
[R1] Microsoft Fabric Blog (2025) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

Acronyms:
API - Application Programming Interface
REST - Representational State Transfer

16 April 2025

🧮ERP: Implementations (Part XIV: A Never-Ending Story)

ERP Implementations Series

An ERP implementation is occasionally considered a one-time endeavor after which an organization will live happily ever after. In an ideal world that would be true, though the work never stops: there are the topics carved out of the implementation, optimizations, new features, new regulations, new requirements, integration with other systems, etc. An implementation is thus just the beginning of what comes next, and it's essential to get the foundation right, because that's the purpose of the ERP implementation: to provide a foundation on which something bigger and solid can be erected.

No matter how well an ERP implementation is managed and executed, respectively how well people work towards the same goals, there's always something forgotten or carved out of the initial project. The usual suspects are the integrations with other systems, though there can also be minor or even bigger features that are planned to be addressed later, if the implementation hasn't already consumed all the financial resources available, as is usually the case. Some of the topics can be addressed as Change Requests or consolidated into projects of their own.

Even simple integrations can become complex when the processes are poorly designed, and that typically happens more often than people think. It’s not necessarily about the lack of skillset or about the technologies used, but about the degree to which the processes can work in a loosely coupled interconnected manner. Even unidirectional integrations can raise challenges, though everything increases in complexity when the flow of data is bidirectional. Moreover, the complexity increases with each system added to the overall architecture. 

Like the manual creation of a sculpture, the processes in an ERP implementation form a skeleton that needs chiseling and smoothing until the form reaches the desired, optimized shape. However, optimization is not a one-time attempt but continuous work of exploring what is achievable, what works, what is optimal. Sometimes optimization is an exact science, while other times it's about (scientific) experimentation in which theory, ideas and investments are put to good use. However, experimentation tends to be expensive, at least in terms of time and effort, and probably these are the main reasons why some organizations don't even attempt it; or maybe it's just laziness, pure indifference or self-preservation. In fact, why change something that already works?

Typically, software manufacturers make new releases available on a periodic basis as part of their planning for growth and for attracting more business. Each release that touches used functionality typically needs proper evaluation, testing and whatever else organizations consider important as part of the release management process. Ideally, everything should go smoothly, though life never ceases to surprise and even a minor release can have an important impact when critical functionality that worked before suddenly stops working. Test automation and other practices can make an important difference for organizations, though they require additional effort and investments that usually pay off when done right.

Regulations and other similar requirements must be addressed as they can involve penalties or other risks that are usually worth avoiding. Ideally such requirements should be supported by design, though even then a certain volume of work is involved. Moreover, the business context can change unexpectedly, and further requirements need to be considered eventually. 

The work on an ERP system and the infrastructure built around it is a never-ending story. Therefore, organizations must have not only the resources for the initial project, but also for what comes after it. Of course, some work can be performed manually, some requirements can be delayed, some risks can be assumed, though the value of an ERP system increases with its extended usage, at least in theory.

12 March 2025

🏭🎗️🗒️Microsoft Fabric: Query Acceleration [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 12-Mar-2025

Query Acceleration [2]

[Microsoft Fabric] Query Acceleration

  • {def} indexes and caches on the fly data landing in OneLake [2]
      • {benefit} allows to 
        • analyze real-time streams coming directly into Eventhouse and combine them with data landing in OneLake 
        • ⇐ either coming from mirrored databases, Warehouses, Lakehouses or Spark [2] 
        • ⇒ accelerate data landing in OneLake
          • ⇐ including existing data and any new updates, and expect similar performance [1]
          • eliminates the need to 
            • manage ingestion pipelines [1]
            • maintain duplicate copies of data [1]
          • ensures that data remains in sync without additional effort [4]
          • the initial process is dependent on the size of the external table [4]
      • ⇐ provides significant performance comparable to ingesting data in Eventhouse [1]
        • in some cases up to 50x and beyond [2]
      • ⇐ supported in Eventhouse over delta tables from OneLake shortcuts, etc. [4]
        • when creating a shortcut from an Eventhouse to a OneLake delta table, users can choose if they want to accelerate the shortcut [2]
        • accelerating the shortcut means equivalent ingestion into the Eventhouse
        • ⇐  optimizations that deliver the same level of performance for accelerated shortcuts as native Eventhouse tables [2]
        • e.g. indexing, caching, etc. 
      • all data management is done by the data writer and in the Eventhouse the accelerated table shortcut [2]  
      • behave like external tables, with the same limitations and capabilities [4]
        • {limitation} materialized views aren't supported [1]
        • {limitation} update policies aren't supported [1]
    • allows specifying a policy on top of external delta tables that defines the number of days to cache data for high-performance queries [1] (see the sketch after this list)
      • ⇐ queries run over OneLake shortcuts can be less performant than on data that is ingested directly to Eventhouses [1]
        • ⇐ due to network calls to fetch data from storage, the absence of indexes, etc. [1]
    • {costs} charged under OneLake Premium cache meter [2]
      • ⇐ similar to native Eventhouse tables [2]
      • one can control the amount of data to accelerate by configuring number of days to cache [2]
      • indexing activity may also count towards CU consumption [2]
    • {limitation} the number of columns in the external table can't exceed 900 [1]
    • {limitation} query performance over accelerated external delta tables which have partitions may not be optimal during preview [1]
    • {limitation} the feature assumes delta tables with static advanced features
      • e.g. column mapping doesn't change, partitions don't change, etc.
      • {recommendation} to change advanced features, first disable the policy, and once the change is made, re-enable the policy [1]
    • {limitation} schema changes on the delta table must also be followed with the respective .alter external delta table schema [1]
      • might result in acceleration starting from scratch if there was breaking schema change [1]
    • {limitation} index-based pruning isn't supported for partitions [1]
    • {limitation} parquet files with a compressed size higher than 6 GB won't be cached [1]
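A minimal Python sketch of setting the acceleration (caching) policy on an external delta table from the Eventhouse side, using the azure-kusto-data client; the management command text, policy JSON, table name and cache window are assumptions that should be verified against [1] before use:

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholders: the Eventhouse query URI and the KQL database that contains the shortcut/external table.
CLUSTER_URI = "https://<eventhouse-query-uri>"
DATABASE = "<kql-database>"

# Authenticate with the Azure CLI login of the current user; service-principal builders also exist.
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(CLUSTER_URI)
client = KustoClient(kcsb)

# Assumed management command and policy JSON: enable acceleration on an external delta table
# with a 30-day hot cache window; check the exact syntax in the documentation [1].
policy = '{"IsEnabled": true, "Hot": "30.00:00:00"}'
command = f".alter external table MyDeltaShortcut policy query_acceleration '{policy}'"

response = client.execute_mgmt(DATABASE, command)
for row in response.primary_results[0]:
    print(row)  # the policy returned by the control command, if it succeeds
```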

References:
[1] Microsoft Learn (2024) Fabric: Query acceleration for OneLake shortcuts - overview (preview) [link]
[2] Microsoft Fabric Updates Blog (2024) Announcing Eventhouse Query Acceleration for OneLake Shortcuts (Preview) [link]
[3] Microsoft Learn (2024) Fabric: Query acceleration over OneLake shortcuts (preview) [link]
[4] Microsoft Fabric Updates Blog (2025) Eventhouse Accelerated OneLake Table Shortcuts – Generally Available [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

22 January 2025

🏭🗒️Microsoft Fabric: Folders [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 22-Jan-2025

[Microsoft Fabric] Folders

  • {def} organizational units inside a workspace that enable users to efficiently organize and manage artifacts in the workspace [1]
  • identifiable by its name
    • {constraint} must be unique in a folder or at the root level of the workspace
    • {constraint} can’t include certain special characters [1]
      • C0 and C1 control codes [1]
      • leading or trailing spaces [1]
      • characters: ~"#.&*:<>?/{|} [1]
    • {constraint} can’t have system-reserved names
      • e.g. $recycle.bin, recycled, recycler.
    • {constraint} its length can't exceed 255 characters
  • {operation} create folder (see the sketch after this list)
    • can be created in
      • an existing folder (aka nested subfolder) [1]
        • {restriction} a maximum of 10 levels of nested subfolders can be created [1]
        • up to 10 folders can be created in the root folder [1]
        • {benefit} provide a hierarchical structure for organizing and managing items [1]
      • the root
  • {operation} move folder
  • {operation} rename folder
    • the same rules apply as for folder creation [1]
  • {operation} delete folder
    • {restriction} currently only empty folders can be deleted [1]
      • {recommendation} make sure the folder is empty [1]
  • {operation} create item in folder
    • {restriction} certain items can’t be created in a folder
      • dataflows gen2
      • streaming semantic models
      • streaming dataflows
    • ⇐ items created from the home page or the Create hub, are created at the root level of the workspace [1]
  • {operation} move file(s) between folders [1]
  • {operation} publish to folder [1]
    •   Power BI reports can be published to specific folders
      • {restriction} folders' name must be unique throughout an entire workspace, regardless of their location [1]
        • when publishing a report to a workspace that has another report with the same name in a different folder, the report will publish to the location of the already existing report [1]
  • {limitation} may not be supported by certain features
    •   e.g. Git
  • {recommendation} use folders to organize workspaces [1]
  • {permissions}
    • inherit the permissions of the workspace where they're located [1] [2]
    • workspace admins, members, and contributors can create, modify, and delete folders in the workspace [1]
    • viewers can only view folder hierarchy and navigate in the workspace [1]
  • [deployment pipelines] deploying items in folders to a different stage, the folder hierarchy is automatically applied [2]
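A minimal Python sketch of creating a folder and a nested subfolder programmatically, assuming a workspace-level folders endpoint in the Fabric REST API with the payload fields shown (displayName, parentFolderId); the endpoint path and field names should be verified against the REST API reference, and the IDs and token are placeholders:

```python
import requests

# Placeholders: workspace ID and a Microsoft Entra bearer token with rights to modify the workspace.
WORKSPACE_ID = "<workspace-id>"
TOKEN = "<bearer-token>"
headers = {"Authorization": f"Bearer {TOKEN}"}
base = f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"

# Assumed endpoint/payload: create a folder at the workspace root
# (name constraints from the notes above apply: uniqueness, <= 255 characters, no reserved names).
root = requests.post(f"{base}/folders", json={"displayName": "Sales"}, headers=headers)
root.raise_for_status()
folder_id = root.json()["id"]

# Nested subfolder (max 10 nesting levels per the notes); "parentFolderId" is an assumed field name.
sub = requests.post(
    f"{base}/folders",
    json={"displayName": "Bronze", "parentFolderId": folder_id},
    headers=headers,
)
sub.raise_for_status()
print(sub.json())
```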

Previous Post  <<||>>  Next Post

References:
[1] Microsoft Fabric (2024) Create folders in workspaces [link]
[2] Microsoft Fabric (2024) The deployment pipelines process [link]
[3] Microsoft Fabric Updates Blog (2025) Define security on folders within a shortcut using OneLake data access roles [link]
[4] Microsoft Fabric Updates Blog (2025) Announcing the General Availability of Folder in Workspace [link]
[5] Microsoft Fabric Updates Blog (2025) Announcing Folder in Workspace in Public Preview [link]
[6] Microsoft Fabric Updates Blog (2025) Getting the size of OneLake data items or folders [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

20 January 2025

🏭🗒️Microsoft Fabric: [Azure] Service Principals (SPN) [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 20-Jan-2025

[Azure] Service Principal (SPN)  

  • {def} a non-human, application-based security identity used by applications or automation tools to access specific Azure resources [1]
    • can be assigned precise permissions, making them perfect for automated processes or background services
      • allows to minimize the risks of human error and identity-based vulnerabilities
      • supported in datasets, Gen1/Gen2 dataflows, datamarts [2]
      • authentication type 
        • supported only by [2]
          • Azure Data Lake Storage
          • Azure Data Lake Storage Gen2
          • Azure Blob Storage
          • Azure Synapse Analytics
          • Azure SQL Database
          • Dataverse
          • SharePoint online
        • doesn’t support
          • SQL data source with Direct Query in datasets [2]
  • when registering a new application in Microsoft Entra ID, a SPN is automatically created for the app registration [4]
    • the access to resources is restricted by the roles assigned to the SPN
      • ⇒ gives control over which resources can be accessed and at which level [4]
    • {recommendation} use SPN with automated tools [4]
      • rather than allowing them to sign in with a user identity  [4]
    • {prerequisite} an active Microsoft Entra user account with sufficient permissions to 
      • register an application with the tenant [4]
      • assign to the application a role in the Azure subscription [4]
      •  requires Application.ReadWrite.All permission [4]
  • extended to support Fabric Data Warehouses [1]
    • {benefit} automation-friendly API Access
      • allows creating, updating, reading, and deleting Warehouse items via Fabric REST APIs using service principals [1] (see the sketch after this list)
      • enables to automate repetitive tasks without relying on user credentials [1]
        • e.g. provisioning or managing warehouses
        • increases security by limiting human error
      • the warehouses thus created will be displayed in the Workspace list view in Fabric UI, with the Owner name of the SPN [1]
      • applicable to users with administrator, member, or contributor workspace role [3]
      • minimizes risk
        • the warehouses created with a delegated account or fixed identity (owner’s identity) will stop working when the owner leaves the organization [1]
          • Fabric requires the user to login every 30 days to ensure a valid token is provided for security reasons [1]
    • {benefit} seamless integration with Client Tools: 
      • tools like SSMS can connect to the Fabric DWH using SPN [1]
      • SPN provides secure access for developers to 
        • run COPY INTO
          • with and without firewall enabled storage [1]
        • run any T-SQL query programmatically on a schedule with ADF pipelines [1]
    • {benefit} granular access control
      • Warehouses can be shared with an SPN through the Fabric portal [1]
        • once shared, administrators can use T-SQL commands to assign specific permissions to SPN [1]
          • allows to control precisely which data and operations an SPN has access to  [1]
            • GRANT SELECT ON <table name> TO <Service principal name>  
      • warehouses' ownership can be changed from an SPN to user, and vice-versa [3]
    • {benefit} improved DevOps and CI/CD Integration
      • SPN can be used to automate the deployment and management of DWH resources [1]
        •  ensures faster, more reliable deployment processes while maintaining strong security postures [1]
    • {limitation} default semantic models are not supported for SPN created warehouses [3]
      • ⇒ features such as listing tables in dataset view, creating report from the default dataset don’t work [3]
    • {limitation} SPN for SQL analytics endpoints is not currently supported
    • {limitation} SPNs are currently not supported for COPY INTO error files [3]
      • ⇐ Entra ID credentials are not supported as well [3]
    • {limitation} SPNs are not supported for GIT APIs. SPN support exists only for Deployment pipeline APIs [3]
    • monitoring tools
      • [DMV] sys.dm_exec_sessions.login_name column [3] 
      • [Query Insights] queryinsights.exec_requests_history.login_name [3]
      • Query activity
        • submitter column in Fabric query activity [3]
      • Capacity metrics app: 
        • compute usage for warehouse operations performed by SPN appears as the Client ID under the User column in Background operations drill through table [3]
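A minimal Python sketch of the automation path described above, assuming azure-identity for the client-credentials token and a workspace warehouses endpoint in the Fabric REST API; the scope string and the endpoint are assumptions to verify against [1] and [5], and all IDs and secrets are placeholders:

```python
import requests
from azure.identity import ClientSecretCredential

# Placeholders for the app registration behind the service principal and the target workspace.
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<app-client-id>"
CLIENT_SECRET = "<client-secret>"
WORKSPACE_ID = "<workspace-id>"

# Client-credentials flow: no user identity and no interactive sign-in involved.
credential = ClientSecretCredential(TENANT_ID, CLIENT_ID, CLIENT_SECRET)
# Assumed scope for the Fabric REST APIs; verify it against the API documentation.
token = credential.get_token("https://api.fabric.microsoft.com/.default").token

# Assumed endpoint for creating a Warehouse item in a workspace where the SPN has a contributor (or higher) role.
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/warehouses",
    json={"displayName": "wh_automation_demo"},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
# Creation may complete synchronously or as a long-running operation (202 + Location header to poll).
print(resp.status_code, resp.headers.get("Location"))
```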

References:
[1] Microsoft Fabric Updates Blog (2024) Service principal support for Fabric Data Warehouse [link]
[2] Microsoft Fabric Learn (2024) Service principal support in Data Factory [link]
[3] Microsoft Fabric Learn (2024) Service principal in Fabric Data Warehouse [link]
[4] Microsoft Fabric Learn (2024) Register a Microsoft Entra app and create a service principal [link]
[5] Microsoft Fabric Updates Blog (2024) Announcing Service Principal support for Fabric APIs [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
 
Acronyms:
ADF - Azure Data Factory
API - Application Programming Interface
CI/CD - Continuous Integration/Continuous Deployment
DMV - Dynamic Management View
DWH - Data Warehouse
SPN - service principal
SSMS - SQL Server Management Studio

18 December 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part VII: Data Stores Comparison)

Business Intelligence Series

Microsoft made available a reference guide for the data stores supported for Microsoft Fabric workloads [1], including the new Fabric SQL database (see previous post). Here's the consolidated table followed by a few aspects to consider: 

| Area | Lakehouse | Warehouse | Eventhouse | Fabric SQL database | Power BI Datamart |
|---|---|---|---|---|---|
| Data volume | Unlimited | Unlimited | Unlimited | 4 TB | Up to 100 GB |
| Type of data | Unstructured, semi-structured, structured | Structured, semi-structured (JSON) | Unstructured, semi-structured, structured | Structured, semi-structured, unstructured | Structured |
| Primary developer persona | Data engineer, data scientist | Data warehouse developer, data architect, data engineer, database developer | App developer, data scientist, data engineer | AI developer, App developer, database developer, DB admin | Data scientist, data analyst |
| Primary dev skill | Spark (Scala, PySpark, Spark SQL, R) | SQL | No code, KQL, SQL | SQL | No code, SQL |
| Data organized by | Folders and files, databases, and tables | Databases, schemas, and tables | Databases, schemas, and tables | Databases, schemas, tables | Database, tables, queries |
| Read operations | Spark, T-SQL | T-SQL, Spark* | KQL, T-SQL, Spark | T-SQL | Spark, T-SQL |
| Write operations | Spark (Scala, PySpark, Spark SQL, R) | T-SQL | KQL, Spark, connector ecosystem | T-SQL | Dataflows, T-SQL |
| Multi-table transactions | No | Yes | Yes, for multi-table ingestion | Yes, full ACID compliance | No |
| Primary development interface | Spark notebooks, Spark job definitions | SQL scripts | KQL Queryset, KQL Database | SQL scripts | Power BI |
| Security | RLS, CLS**, table level (T-SQL), none for Spark | Object level, RLS, CLS, DDL/DML, dynamic data masking | RLS | Object level, RLS, CLS, DDL/DML, dynamic data masking | Built-in RLS editor |
| Access data via shortcuts | Yes | Yes | Yes | Yes | No |
| Can be a source for shortcuts | Yes (files and tables) | Yes (tables) | Yes | Yes (tables) | No |
| Query across items | Yes | Yes | Yes | Yes | No |
| Advanced analytics | Interface for large-scale data processing; built-in data parallelism and fault tolerance | Interface for large-scale data processing; built-in data parallelism and fault tolerance | Time Series native elements; full geo-spatial and query capabilities | T-SQL analytical capabilities; data replicated to delta parquet in OneLake for analytics | Interface for data processing with automated performance tuning |
| Advanced formatting support | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Full indexing for free text and semi-structured data like JSON | Table support for OLTP, JSON, vector, graph, XML, spatial, key-value | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format |
| Ingestion latency | Available instantly for querying | Available instantly for querying | Queued ingestion; streaming ingestion has a couple of seconds latency | Available instantly for querying | Available instantly for querying |

It can be used as a map of what one needs to know in order to use each feature, respectively to identify how previous experience can be leveraged, and here I'm referring to the many SQL developers. One must also consider the capabilities and limitations of each storage repository.

However, what I'm missing are some references regarding the performance of data access, especially compared with on-premises workloads. Moreover, the devil hides in the details, therefore one must test thoroughly before committing to any of the above choices. For the newest overview please check the referenced documentation!

For lakehouses, the hardest limitation is the lack of multi-table transactions, though that's understandable given their scope. However, probably the most important aspect is whether they can scale with the volume of reads/writes, as currently the SQL endpoint seems to lag. 

The warehouse seems to be more versatile, though careful attention needs to be given to its design. 

The Eventhouse opens the door to a wide range of time-based scenarios, though it will be interesting how developers cope with its lack of functionality in some areas. 

Fabric SQL databases are a new addition, and hopefully they'll allow considering a wide range of OLTP scenarios. Starting with the 28th of March 2025, SQL databases will be ON by default and tenant admins must manually turn them OFF before that date [3].

Power BI datamarts have been in preview for a couple of years.


References:
[1] Microsoft Fabric (2024) Microsoft Fabric decision guide: choose a data store [link]
[2] Reitse's blog (2024) Testing Microsoft Fabric Capacity: Data Warehouse vs Lakehouse Performance [link]
[3] Microsoft Fabric Update Blog (2025) Extending flexibility: default checkbox changes on tenant settings for SQL database in Fabric [link]
[4] Microsoft Fabric Update Blog (2025) Enhancing SQL database in Fabric: share your feedback and shape the future [link]
[5] Microsoft Fabric Update Blog (2025) Why SQL database in Fabric is the best choice for low-code/no-code Developers [link]

09 December 2024

🏭🗒️Microsoft Fabric: Microsoft Fabric [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 8-Dec-2024

Microsoft Fabric 

  • {goal}complete (end-to-end) analytics platform [6]
    • {characteristic} unified
      • {objective} provides a single, integrated environment for all the organization
        • {benefit} data professionals and the business users can collaborate on data projects [5] and solutions
    • {characteristic}serverless SaaS model (aka SaaS-ified)
      • {objective} provisioned automatically with the tenant [6]
      • {objective} highly scalable [5]
      • {objective} cost-effectiveness [5]
      • {objective} accessible 
        • ⇐ from anywhere with an internet connection [5]
      • {objective} continuous updates
        • ⇐ provided by Microsoft
      • {objective} continuous maintenance 
        • ⇐ provided by Microsoft
      • provides a set of integrated services that enable to ingest, store, process, and analyze data in a single environment [5]
    • {objective} secure
    • {objective} governed
  • {goal} lake-centric
    • {characteristic} OneLake-based
      • all workloads automatically store their data in the OneLake workspace folders [6]
      • all the data is organized in an intuitive hierarchical namespace [6]
      • data is automatically indexed [6]
      • provides a set of features 
        • discovery
        • MIP labels
        • lineage
        • PII scans
        • sharing
        • governance
        • compliance
    • {characteristic} one copy
      • available for all computes 
      • all compute engines store their data automatically in OneLake
        •  the data is stored in a (single) common format
          •  delta parquet file format
            • open standards format
            • the storage format for all tabular data in Microsoft Fabric 
        • ⇐ the data is directly accessible by all the engines [6]
          • ⇐ no import/export needed
      • all compute engines are fully optimized to work with Delta Parquet as their native format [6]
      • a shared universal security model is enforced across all the engines [6]
    • {characteristic} open at every tier
  • {goal} empowering
    • {characteristic} intuitive
    • {characteristic} built into M365
    • {characteristic} insight to action
  • {goal} AI-powered
    • {characteristic} Copilot accelerated 
    • {characteristic} ChatGPT enabled
    • {characteristic} AI-driven insights
  •  complete analytics platform
    • addresses the needs of all data professionals and business users who target harnessing the value of data 
  • {feature} scales automatically
    • the system automatically allocates an appropriate number of compute resources based on the job size
    • the cost is proportional to total resource consumption, rather than size of cluster or number of resources allocated 
    •  jobs in general complete faster (and usually, at less overall cost)
      • ⇒ not need to specify cluster sizes
  • natively supports 
    • Spark
    • data science
    • log-analytics
    • real-time ingestion and messaging
    • alerting
    • data pipelines
    • Power BI reporting 
    • interoperability with third-party services 
      • from other vendors that support the same open standards
  • data virtualization mechanisms 
    • {feature} mirroring [notes]
    • {feature} shortcuts [notes]
      • allow users to reference data without copying it (see the sketch after this list)
      • {benefit} make other domain data available locally without the need for copying data
  • {feature} tenant (aka Microsoft Fabric tenant, MF tenant)
    • a single instance of Fabric for an organization that is aligned with a Microsoft Entra ID
    • can contain any number of workspaces
  • {feature} workspaces
    • {definition} a collection of items that brings together different functionality in a single environment designed for collaboration
    • associated with a domain [3]
  • {feature} domains [notes]
    • {definition} a way of logically grouping together data in an organization that is relevant to a particular area or field [1]
    • subdomains
      • a way of fine-tuning the logical grouping of data under a domain [1]
        • subdivisions of a domain
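As an illustration of the one-copy principle, a minimal sketch of reading a hypothetical delta table (or shortcut) from a Fabric Spark notebook; it assumes the notebook's predefined spark session, and the table and column names are placeholders:

```python
# Assumes a Fabric Spark notebook attached to a lakehouse, where the spark session is predefined;
# "sales_orders" is a hypothetical delta table (or shortcut) under the lakehouse's Tables/ folder.
df = spark.read.format("delta").load("Tables/sales_orders")
df.printSchema()
df.groupBy("Country").count().show()  # "Country" is a hypothetical column

# The same single copy in OneLake can be queried through the SQL analytics endpoint (T-SQL)
# or an Eventhouse shortcut (KQL) without import/export, since every engine reads delta parquet natively.
```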

References:
[1] Microsoft Learn (2023) Administer Microsoft Fabric [link]
[2] Microsoft Learn: Fabric (2024) Governance overview and guidance [link]
[3] Microsoft Learn: Fabric (2023) Fabric domains [link]
[4] Establishing Data Mesh architectural pattern with Domains and OneLake on Microsoft Fabric, by Maheswaran Arunachalam [link]
[5] Microsoft Learn: Fabric (2024) Introduction to end-to-end analytics using Microsoft Fabric [link]
[6] Microsoft Fabric (2024) Fabric Analyst in a Day [course notes]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
API - Application Programming Interface
M365 - Microsoft 365
MF - Microsoft Fabric
PII - Personal Identification Information
SaaS - software-as-a-service

13 June 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part V: One Person Can’t Learn or Do Everything)

Business Intelligence Series

Today’s Explicit Measures webcast [1] considered an article written by Kurt Buhler (The Data Goblins): [Microsoft] "Fabric is a Team Sport: One Person Can’t Learn or Do Everything" [2]. It’s a well-written article that deserves some thought as there are several important points made. I can’t say I agree with the full extent of some statements, even if some disagreements are probably just a matter of semantics.

My main disagreement starts with the title “One Person Can’t Learn or Do Everything”. As clarified in the webcast's chat, the author defines “everything" as an umbrella for “all the capabilities and experiences that comprise Fabric including both technical (like Power BI) or non-technical (like adoption data literacy) and everything in between” [1].

For me “everything” is relative and considers a domain's core set of knowledge, while "expertise" (≠ "mastery") refers to the degree to which a person can use the respective knowledge to build back-to-back solutions for a given area. I’d say that it becomes more and more challenging for beginners or average data professionals to cover the core features. Moreover, I’d separate the non-technical skills because then one will also need to consider topics like Data, Project, Information or Knowledge Management.

There are different levels of expertise, and they can vary in depth (specialization) or breadth (covering multiple areas), respectively depend on previous experience (whether one worked with similar technologies). Usually, there’s a minimum of requirements that need to be covered for being considered as expert (e.g. certification, building a solution from beginning to the end, troubleshooting, performance optimization, etc.). It’s also challenging to roughly define when one’s expertise starts (or ends), as there are different perspectives on the topics. 

Conversely, the term expert is in general misused extensively, sometimes even with mischievous intent. An “expert” is usually considered to be an external consultant or a person who got certified in an area, even if that person may not be able to build solutions that address a customer’s needs. 

Even data professionals with many years of experience can be overwhelmed by the volume of knowledge, especially when one considers the different experiences available in MF, respectively the volume of new features released monthly. Conversely, expertise can be considered with respect to only one or more MF experiences, or to one area within a certain layer. A lot of the knowledge can be transported from other areas: writing SQL and complex database objects, modelling (enterprise) semantic layers, programming in Python, R or Power Query, building data pipelines, managing SQL databases, etc. 

Besides the standard documentation, training sessions, and some reference architectures, Microsoft has also made available some labs and other material, which helps in discovering the available features, though it doesn't teach people how to build complete solutions. More important than explicitly declaring the role-based audience is, I find, the creation of learning paths for the various roles.

During the past 6-7 months I've spent on average 2 days per week learning MF topics. My problem is not the documentation, but the lack of maturity of some features, the gaps in functionality, identifying the respective gaps, and knowing what new features will be made available and when. The fact that features are made available or changed while learning makes the process more challenging. 

My goal is to be able to provide back-to-back solutions and I believe that’s possible, even if I might not consider all the experiences available. During the past 22 years, at least until MF, I could build complete BI solutions starting from requirements elicitation, data extraction, modeling and processing for data consumption, respectively data consumption for the various purposes. At least this was the journey of a Software Engineer into the world of data. 

References:
[1] Explicit Measures (2024) Power BI tips Ep.328: Microsoft Fabric is a Team Sport (link)
[2] Data Goblins (2024) Fabric is a Team Sport: One Person Can’t Learn or Do Everything (link)

30 December 2023

💫🗒️ERP Systems: Microsoft Dynamics 365's Invoice Capture (Features) [Notes]

Disclaimer: This is work in progress intended to consolidate information from the various sources and not to provide an overview of all the features. Please refer to the documentation for a complete overview!

Regarding the releases, see [2].
Last updated: 10-Feb-2025

Invoice Capture - Main Components [3]

AI Model

  • {feature} prebuilt model (aka Invoice processing model) [by design]
    • can handle the most common invoices in various languages
    • owned by Microsoft and cannot be trained by the customers
  • {feature} custom prebuilt models [planned]
    • built on top of the prebuilt model to handle more complex invoice layouts
    • only requires training the exceptional invoices
    • after a model is published, additional mapping is required to map the model fields to the invoice files
  • {feature} custom model [by design]
    • requires training the model from scratch 
      • costs time and effort
    • supports the Charges and Sales Tax Amount fields [1.5.0.2]
    • supports lookup lists on custom fields [1.6.0.x]

Channels

  • flows that collect the invoices into one location 
    • there's always a 1:1 relationship between flows and channels
  • multiple channels can be defined using different triggers based on Power Automate connectors
  • {feature}default channel for Upload files [1.0.1.0]
  • {feature} supports multiple sources via flow templates [by design]
    • Outlook.com
    • Microsoft Outlook 365
    • Microsoft Outlook 365 shared mailbox [1.1.0.10]
      • to achieve similar behavior one had to modify the first steps of the generated flow with the "When a new email arrives in a shared mailbox (V2)" trigger
    • SharePoint
    • OneDrive
    • OneDrive for business [1.0.1.0]
  • {feature} assign legal entity on the channel
    • the LE value is automatically assigned without applying additional derivation logic [1]
  • invoices sent via predefined channels are captured and appear on the 'Received files' page

Configuration groups 

  • allow managing the list of invoice fields and the manual review settings 
  • can be assigned for each LE or Vendor
    • all the legal entities in the same configuration group use the same invoice fields and manual review setting
  • default configuration group 
    • created after deployment, can't be changed or deleted

Fields

  • {feature} standard fields 
    • Legal entity 
      • organizations registered with legal authorities in D365 F&O and selected for Invoice capture
      • {feature} allow to enforce role-based security model
      • {feature} synchronized from D365 F&O to CRM 
      • {missing feature} split the invoices between multiple LEs
      • {feature} Sync deleted legal entities 
        • when legal entities are deleted in D365, their status is set to 'Inactive' [1.9.0.3]
            • ⇐ are not going to be derived during Invoice capture processing
          • ⇒ invoices with inactive legal entities cannot be transferred to D365FO
          • ⇐ same rules apply to vendor accounts
    • Vendor master 
      • individuals or organizations that supply goods or services
      • used to automatically derive the Vendor account
      • {feature} synchronized from D365 F&O to CRM
        • synchronization issues solved [1.6.0.x]
      • {feature}Vendor account can be derived from tax number [1.0.1.0]
      • {feature} Synchronize vendors based on filter conditions [1.9.0.3]
        • one can set filter conditions to only synchronize vendors that are suitable for inclusion in Invoice capture
      • {feature} Sync deleted vendor accounts 
        • when vendor accounts are deleted in D365, their status is set to 'Inactive' [1.9.0.3]
    • Invoice header
      • {feature} Currency code can be derived from the currency symbol [1.0.1.0]
    • Invoice lines
    • Item master
      • {feature} Item number can be derived from External item number [1.0.1.0]
      • {feature} default the item description for procurement category item by the value from original document [1.1.0.32/10.0.39]
    • Expense types (aka Procurement categories) 
    • Charges
      • amounts added to the lines or header
      • {feature} header-level charges [1.1.0.10]
      • {missing feature}line-level charges 
    • {feature}Financial dimensions 
      • header level [1.1.0.32/10.0.39]
      • line level [1.3.0.x]
    • Purchase orders
      • {feature} PO formatting based on the number sequence settings
      • {missing feature}PO details
      • {missing feature} multiple POs, multiple receipts 
    • {missing feature} Project information integration
      • {workaround} the Project Id can be added as custom field on Invoice line for Cost invoices and with this the field should be mapped with the corresponding Data entity for transferring the value for D365 F&O
  • {feature} custom fields [1.1.0.32/10.0.39]

File filter

  • applies additional filtering to incoming files at the application level
  • with the installation a default updatable global file filter is provided
  • can be applied at different channel levels
  • {event} an invoice document is received
    • the channel is checked first for a file filter
    • if no file filter is assigned to the channel level, the file filter at the system level is used [1]
  • {configuration} maximum files size; 20 MB
  • {configuration} supported files types: PDF, PNG, JPG, JPEG, TIF, TIFF
  • {configuration} supported file names 
    • filter out files that aren't relevant to invoices [1]
    • rules can be applied to accept/exclude files whose name contains predefined strings [1]
Actions 

  • Import Invoice
    • {feature} Channel for file upload
      • a default channel is provided for directly uploading the invoice files
      • maximum 20 files can be uploaded simultaneously
  • Capture Invoice
    • {feature} Invoice capture processing
      • different derivation rules are applied to ensure that the invoices are complete and correct [1]
    • {missing feature} differentiate between relevant and non-relevant content
    • {missing feature} merging/splitting files
      • {workaround} export the file(s), merge/split them, and import the result
  • Void files [1.0.1.0]
    • once the files are voided, they can be deleted from Dataverse
      • saves storage costs
  • Classify Invoice
    • {feature} search for LEs
    • {feature} assign LE to Channel
      • AP clerks view only the invoices under the LEs that are assigned to them
    • {feature} search for Vendor accounts by name or address 
  • Maintaining Headers
    • {feature} multiple sales taxes [1.1.0.26]
    • {missing feature} rounding-off [only in D365 F&O]
    • {feature} support Credit Notes [1.5.0.2]
  • Maintaining Lines
    • {feature} Add/remove lines
    • {feature} "Remove all" option for deleting all the invoice lines in Side-by-Side Viewer [1.1.0.26]
      • Previously it was possible to delete the lines only one by one, which for incorrectly formatted big invoices would lead to considerable effort. Imagine the invoices from Microsoft or other cloud providers that contain 5-10 pages of lines. 
    • {feature}Aggregate multiple lines [requested]
      • Now all lines from an invoice have the same importance. Especially for big invoices, it would be useful to aggregate the amounts from multiple lines under one. 
    • {feature} Show the total amount across the lines [requested]
      • When removing/adding lines, it would be useful to compare the total amount across the lines with the one from the header. 
    • Check the UoM consistency between invoice line and linked purchase order line
      • For the Invoice to be correctly processed, the two values must match. 
    • Support for discounts [requested]
      • as workaround, discounts can be entered as separate lines
  • Transfer invoice

  • Automation
    • {parameter} Auto invoice cleanup
      •  automatically cleans up the transferred invoices and voided invoices older than 180 days every day [1]
    • Use continuous learning 
      • select this option to turn on the continuous learning feature
      • learns from the corrections made by the AP clerk on a previous instance of the same invoice [1]
        • records the mapping relationship between the invoice context and the derived entities [1]
        • the entities are automatically derived for the next time a similar invoice is captured [1]
      • {missing feature} standard way to copy the continuous learning from UAT to PROD
      • {feature} migration tool for continuous learning knowledge
        • allows us to transfer the learning knowledge from one environment to another [1.6.0.x]
      • {feature} Continuous learning for decimal format [1.9.0.3]
        • allows the system to learn from the history record and automatically apply the correct decimal format on the amount fields
          • {requirement} manually correct the first incoming invoice and do the successful transfer
      • {feature} date format handling
        • fixes the issue with date formatting, which is caused by ambiguity in date recognition [1.8.3.0]
          • when a user corrects the date on the first invoice, the corresponding date format will be automatically applied to the future invoice if it is coming from the same vendor
          •  enabled only when the "Using continuous learning" parameter is active
    • Confidence score check
      • for the prebuilt model the confidence score is always the same 
        • its value is returned by the AI Builder service
        • the confidence score can be improved only when the custom prebuilt model is used 
        • it can be increased by uploading more samples and doing the tagging accordingly
        • a low confidence score is caused by the fact that not enough samples with the same pattern have been trained
      • {parameter} control the confidence score check [1.1.0.32/10.0.39]
  • Manage file filters
User Interface
  • Side-by-side view [by design]
    • users can adjust column widths [1.8.3.0]
      • improves the review experience
  • History logs [1.0.1.0]
    • supported in Received files and Captured invoices
    • help AP clerks know the actions and results in each step during invoice processing 
  • Navigation to D365 F&O [1.0.1.0]
    • once the invoice is successfully transferred to D365 F&O, a quick link is provided for the AP clerk to open the Pending vendor invoice list in F&O
  • Reporting
    • {missing feature} consolidated overview across multiple environments (e.g. for licensing needs evaluation)
    • {missing features} metrics by Legal entity, Processing status, Vendor and/or Invoice type
      • {workaround} a paginated report can be built based on Dataverse (see the sketch below)
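A minimal Python sketch of pulling such metrics directly from Dataverse with the Web API, as the workaround above suggests; the table and column names below are hypothetical and must be replaced with the actual Invoice capture entities in the integrated Dataverse environment:

```python
import requests

# Placeholders: Dataverse environment URL and a bearer token with read access to the tables.
ORG_URL = "https://<your-org>.crm.dynamics.com"
TOKEN = "<bearer-token>"

# "msdyn_capturedinvoices" and the column names are hypothetical; look up the actual
# Invoice capture tables/columns in the target environment before using this query.
url = (
    f"{ORG_URL}/api/data/v9.2/msdyn_capturedinvoices"
    "?$apply=groupby((msdyn_legalentity,statuscode),aggregate($count as invoices))"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"})
resp.raise_for_status()
for row in resp.json().get("value", []):
    print(row)  # one aggregate row per legal entity / processing status combination
```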

Data Validation

  • [Invoice Capture] derivation rules 
    • applied to ensure that the invoices are complete and correct [1]
    • {missing feature} derive vendor using the associated email address
    • {parameter} Format purchase order 
      • used to check the number sequence settings in D365 F&O to format the PO number [1]
    • {parameter} Derive currency code for cost invoice
      • used to derive the value from invoice master data in D365 F&O [1]
      • ⇐ the currency code on PO Invoices must be identical to the one on PO
    • {parameter} Validate total sales tax amount 
      • validates the consistency between the sales tax amount on the Sales tax card and the total sales tax amount, when there's a sales tax line [1]
    • {parameter} Validate total amount 
      • confirm alignment between the calculated total invoice amount and the captured total amount [1]
  • [Invoice capture] automation
    • {feature} Automatically remove invalid field value [1.9.0.3]
      • when enabled, the system automatically removes the value if it doesn't exist in the lookup list
        • eliminates the need to manually remove values during the review, streamlining the process
        • {scenario} invalid payment terms errors
  • [D365 F&O] before workflow submissions [requested]
    • it makes sense to have out-of-box rules 
    • currently this can be done by implementing extensions
  • [D365 F&O] during workflow execution [by design]
Dynamics 365 for Finance [aka D365 F&O]
  • Attachments 
    • {parameter} control document type for persisting the invoice attachment in D365 F&O [1.1.0.32/10.0.39]
  • Invoice Capture
    • {parameter} select the entities in scope
    • {parameter}differentiate by Invoice type whether Vendor invoice or Invoice journal is used to book the invoices
    • {parameter} Transfer attachment
  • Fixed Assets
    • {missing feature} create Fixed asset automatically during the time the invoice is imported
  • Vendor Invoice Journal 
    • {missing feature} configure what journal the invoice is sent to
      • the system seems to pick the first journal available
      • {parameter} control journal name for creating 'Invoice journal'  [1.1.0.32/10.0.39]
    • {missing feature} grouping multiple invoices together in a journal (e.g. vendor group, payment terms, method of payment)
  •  Approval Workflow
    • {missing feature} involve Responsible person for the Vendor [requested]
    • {missing feature} involve Buyer for Cost invoices [requested]
    • {missing feature} differentiator between the various types of invoices [requested]
    • {missing feature} amount-based approval [requested]
  • Billing schedule
    • {feature} integration with Billing schedules
    • {feature} modify or cancel Billing schedules
  • Reporting
    • {missing feature} Vendor invoices hanging in the approval workflow (incl. responsible person for current action, respectively error message)
    • {missing feature} Report for GL reconciliation between Vendor invoice and GL via Billing schedules
    • {missing feature} Overview of the financial dimensions used (to identify whether further setup is needed)
Licensing
  • licensing name: Dynamics 365 E-Invoicing Documents - Electronic Invoicing Add-on for Dynamics 365
  • customers are entitled to 100 invoice capture transactions per tenant per month
    • if customers need more transactions, they must purchase extra Electronic Invoicing SKUs for 1,000 transactions per tenant per month [5] 
      • the transaction capacity is available on a monthly, use-it-or-lose-it basis [5]
      • customers must purchase for peak capacity [5]
  • only the captured invoices are going to be considered as valid transactions [4]
    • files filtered by the file filter setting won't be counted [4]
Architecture
  •  invoice capture is supported only when the integrated Dataverse environment exists [4]
Security
  • [role] Invoice capture operator
    • must be included in the role settings to 
      • successfully run the derivation and validation logic in Invoice capture [5]
      • transfer the invoice to D365F&O [5]
      • the role must be added to the corresponding flow user on the app side [5]
    •  is only applied on the D365 side [4]
      • ⇒ it doesn't block users from logging in to Invoice capture [4]
      • ⇐ the corresponding access to the virtual entities will be missed [4]
        • then the user will receive an error [4]
  • [role] Environment maker 
    • must be assigned to the Accounts payable administrator if they create channels in Invoice capture [5]

Previous post <<||>> Next post

Resources:
[1] Microsoft Learn (2023) Invoice capture overview (link)
[2] Yammer (2023) Invoice Capture for Dynamics 365 Finance (link)
[3] Microsoft (2023) Invoice Capture for Dynamics 365 Finance - Implementation Guide
[4] Yammer (2025) Quota/licensing follow up [link]
[5] Microsoft Learn (2024) Dynamics 365 Finance: Invoice capture solution [link]

Release history (requires authentication)

Acronyms:
AI - Artificial Intelligence 
AP - Accounts Payable
F&O - Finance & Operations
LE - Legal Entity
PO - Purchase Order
SKU - Stock Keeping Unit
UoM - Unit of Measure

29 March 2021

Notes: Team Data Science Process (TDSP)

Acronyms:
Artificial Intelligence (AI)
Cross-Industry Standard Process for Data Mining (CRISP-DM)
Data Mining (DM)
Knowledge Discovery in Databases (KDD)
Team Data Science Process (TDSP) 
Version Control System (VCS)
Visual Studio Team Services (VSTS)

Resources:
[1] Microsoft Azure (2020) What is the Team Data Science Process? [source]
[2] Microsoft Azure (2020) The business understanding stage of the Team Data Science Process lifecycle [source]
[3] Microsoft Azure (2020) Data acquisition and understanding stage of the Team Data Science Process [source]
[4] Microsoft Azure (2020) Modeling stage of the Team Data Science Process lifecycle [source]
[5] Microsoft Azure (2020) Deployment stage of the Team Data Science Process lifecycle [source]
[6] Microsoft Azure (2020) Customer acceptance stage of the Team Data Science Process lifecycle [source]

About Me

Koeln, NRW, Germany
IT Professional with more than 25 years of experience in IT, covering the full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.