SQL Troubles: Search results for Data warehousing

Showing posts sorted by date for query Data warehousing. Sort by relevance Show all posts

21 June 2025

🏭🗒️Microsoft Fabric: Result Set Caching in SQL Analytics Endpoints [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 21-Jun-2025

[Microsoft Fabric] Result Set Caching in SQL Analytics Endpoints

{def} built-in performance optimization for Warehouse and Lakehouse that improves read latency [1]

fully transparent to the user [3]
persists the final result sets for applicable SELECT T-SQL queries

caches all the data accessed by a query [3]
subsequent runs that "hit" cache will process just the final result set

can bypass complex compilation and data processing of the original query[1]

⇐ returns subsequent queries faster [1]

the cache creation and reuse is applied opportunistically for queries

works on

warehouse tables
shortcuts to OneLake sources
shortcuts to non-Azure sources

the management of cache is handled automatically [1]

regularly evicts cache as needed

as data changes, result consistency is ensured by invalidating cache created earlier [1]

{operation} enable setting

via ALTER DATABASE <database_name> SET RESULT_SET_CACHING ON

{operation} validate setting

via SELECT name, is_result_set_caching_on FROM sys.databases

{operation} configure setting

configurable at item level

once enabled, it can then be disabled

at the item level
for individual queries

e.g. debugging or A/B testing a query

via OPTION ( USE HINT ('DISABLE_RESULT_SET_CACHE')

{default} during the preview, result set caching is off for all items [1]

[monitoring]

via Message Output

applicable to Fabric Query editor, SSMS
the statement "Result set cache was used" is displayed after query execution if the query was able to use an existing result set cache

via queryinsights.exec_requests_history system view

result_cache_hit displays indicates result set cache usage for each query execution [1]

{value} 2: the query used result set cache (cache hit)
{value} 1: the query created result set cache
{value} 0: the query wasn't applicable for result set cache creation or usage [1]

{reason} the cache no longer exists
{reason} the cache was invalidated by a data change, disqualifying it for reuse [1]
{reason} query isn't deterministic

isn't eligible for cache creation [1]

{reason} query isn't a SELECT statement

[warehousing]

{scenario} analytical queries that process large amounts of data to produce a relatively small result [1]
{scenario} workloads that trigger the same analytical queries repeatedly [1]

the same heavy computation can be triggered multiple times, even though the final result remains the same [1]

Previous Post <<||>> Next Post

References:

[1] Microsoft Learn (2025) Result set caching (preview) [link]

[2] Microsoft Fabric Update Blog (2025) Result Set Caching for Microsoft Fabric Data Warehouse (Preview) [link|aka]

[3] Microsoft Learn (2025) In-memory and disk caching [link]

[4] Microsoft Learn (2025) Performance guidelines in Fabric Data Warehouse [link]

Resources:
[R1] Microsoft Fabric (2025) Fabric Update - June 2025 [link]

Acronyms:

MF - Microsoft Fabric

SSMS - SQL Server Management Studio

18 May 2025

#️⃣Software Engineering: Mea Culpa (Part VII: A Look Forward)

Software Engineering Series

I worked for more than 20 years in various areas related to ERP systems - Data Migrations, Business Intelligence/Analytics, Data Warehousing, Data Management, Project Management, (data) integrations, Quality Assurance, and much more, having experience with IFS IV, Oracle e-Business Suite, MS Dynamics AX 2009 and during the past 3-7 years also with MS Dynamics 365 Finance, SCM & HR (in that order). Much earlier, I started to work with SQL Server (2000-2019), Oracle, and more recently with Azure Synapse and MS Fabric, writing over time more than 800 ad-hoc queries and reports for the various stakeholders, covering all the important areas, respectively many more queries for monitoring the various environments.

In the areas where I couldn’t acquire experience on the job, I tried to address this by learning in my free time. I did it because I take seriously my profession, and I want to know how (some) things work. I put thus a lot of time into trying to keep actual with what’s happening in the MS Fabric world, from Power BI to KQL, Python, dataflows, SQL databases and much more. These technologies are Microsoft’s bet, though at least from German’s market perspective, all bets are off! Probably, many companies are circumspect or need more time to react to the political and economic impulses, or probably some companies are already in bad shape.

Unfortunately, the political context has a broad impact on the economy, on what’s happening in the job market right now! However, the two aspects are not the only problem. Between candidates and jobs, the distance seems to grow, a dense wall of opinion being built, multiple layers based on presumptions filtering out voices that (still) matter! Does my experience matter or does it become obsolete like the technologies I used to work with? But I continued to learn, to keep actual… Or do I need to delete everything that reminds the old?

To succeed or at least be hired today one must fit a pattern that frankly doesn’t make sense! Yes, soft skills are important though not all of them are capable of compensating for the lack of technical skills! There seems to be a tendency to exaggerate some of the qualities associated with skills, or better said, of hiding behind big words. Sometimes it feels like a Shakespearian inaccurate adaptation of the stage on which we are merely players.

More likely, this lack of pragmatism will lead to suboptimal constructions that will tend to succumb under their own structure. All the inefficiencies need to be corrected, or somebody (or something) must be able to bear their weight. I saw this too often happening in ERP implementations! Big words don’t compensate for the lack of pragmatism, skills, knowledge, effort or management! For many organizations the answer to nowadays problems is more management, which occasionally might be the right approach, though this is not a universal solution for everything that crosses our path(s).

One of society’s answers to nowadays’ problem seems to be the refuge in AI. So, I wonder – where I’m going now? Jobless, without an acceptable perspective, with AI penetrating the markets and making probably many jobs obsolete. One must adapt, but adapt to what? AI is brainless even if it can mimic intelligence! Probably, it can do more in time to the degree that many more jobs will become obsolete (and I’m wondering what will happen to all those people).

Conversely, to some trends there will be probably other trends against them, however it’s challenging to depict in clear terms the future yet in making. Society seems to be at a crossroad, more important than mine.

Previous Post <<||>> Next Post

26 March 2025

💠🏭🗒️Microsoft Fabric: Polaris SQL Pool [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!

Unfortunately, besides the references papers, there's almost no material that could be used to enhance the understanding of the concepts presented.

Last updated: 26-Mar-2025

Read and Write Operations in Polaris [2]

[Microsoft Fabric] Polaris SQL Pool

{def} distributed SQL query engine that powers Microsoft Fabric's data warehousing capabilities

designed to unify data warehousing and big data workloads while separating compute and state for seamless cloud-native operations
based on a robust DCP

designed to execute read-only queries in a scalable, dynamic and fault-tolerant way [1]
a highly-available micro-service architecture with well-defined responsibilities [2]

data and query processing is packaged into units (aka tasks)

can be readily moved across compute nodes and re-started at the task level

widely-partitioned data with a flexible distribution model [2]
a task-level "workflow-DAG" that is novel in spanning multiple queries [2]
a framework for fine-grained monitoring and flexible scheduling of tasks [2]

{component} SQL Server Front End (SQL-FE)

responsible for

compilation
authorization
authentication
metadata

used by the compiler to

{operation} generate the search space (aka MEMO) for incoming queries
{operation} bind metadata to data cells
leveraged to ensure the durability of the transaction manifests at commit [2]

only transactions that successfully commit need to be actively tracked to ensure consistency [2]
any manifests and data associated with aborted transactions are systematically garbage-collected from OneLake through specialized system tasks [2]

{component} SQL Server Backend (SQL-BE)

used to perform write operations on the LST [2]

inserting data into a LST creates a set of Parquet files that are then recorded in the transaction manifest [2]
a transaction is represented by a single manifest file that is modified concurrently by (one or more) SQL BEs [2]

SQL BE leverages the Block Blob API provided by ADLS to coordinate the concurrent writes [2]
each SQL BE instance serializes the information about the actions it performed, either adding a Parquet file or removing it [2]

the serialized information is then uploaded as a block to the manifest file
uploading the block does not yet make any visible changes to the file [2]

each block is identified by a unique ID generated on the writing SQL BE [2]

after completion, each SQL BE returns the ID of the block(s) it wrote to the Polaris DCP [2]

the block IDs are then aggregated by the Polaris DCP and returned to the SQL FE as the result of the query [2]

the SQL FE further aggregates the block IDs and issues a Commit Block operation against storage with the aggregated block IDs [2]

at this point, the changes to the file on storage will become effective [2]

changes to the manifest file are not visible until the Commit operation on the SQL FE

the Polaris DCP can freely restart any part of the operation in case there is a failure in the node topology [2]

the IDs of any blocks written by previous attempts are not included in the final list of block IDs and are discarded by storage [2]

[read operations] SQL BE is responsible for reconstructing the table snapshot based on the set of manifest files managed in the SQL FE

the result is the set of Parquet data files and deletion vectors that represent the snapshot of the table [2]

queries over these are processed by the SQL Server query execution engine [2]
the reconstructed state is cached in memory and organized in such a way that the table state can be efficiently reconstructed as of any point in time [2]

enables the cache to be used by different operations operating on different snapshots of the table [2]
enables the cache to be incrementally updated as new transactions commit [2]

{feature} supports explicit user transactions

can execute multiple statements within the same transaction in a consistent way

the manifest file associated with the current transaction captures all the (reconciled) changes performed by the transaction [2]

changes performed by prior statements in the current transaction need to be visible to any subsequent statement inside the transaction (but not outside of the transaction) [2]

[multi-statement transactions] in addition to the committed set of manifest files, the SQL BE reads the manifest file of the current transaction and then overlays these changes on the committed manifests [1]
{write operations} the behavior of the SQL BE depends on the type of the operation.

insert operations

only add new data and have no dependency on previous changes [2]
the SQL BE can serialize the metadata blocks holding information about the newly created data files just like before [2]
the SQL FE, instead of committing only the IDs of the blocks written by the current operation, will instead append them to the list of previously committed blocks

⇐ effectively appends the data to the manifest file [2]

{update|delete operations}

handled differently

⇐ since they can potentially further modify data already modified by a prior statement in the same transaction [2]

e.g. an update operation can be followed by another update operation touching the same rows

the final transaction manifest should not contain any information about the parts from the first update that were made obsolete by the second update [2]

SQL BE leverages the partition assignment from the Polaris DCP to perform a distributed rewrite of the transaction manifest to reconcile the actions of the current operation with the actions recorded by the previous operation [2]

the resulting block IDs are sent again to the SQL FE where the manifest file is committed using the (rewritten) block IDs [2]

{concept} Distributed Query Processor (DQP)

responsible for

distributed query optimization
distributed query execution
query execution topology management

{concept} Workload Management (WLM)

consists of a set of compute servers that are, simply, an abstraction of a host provided by the compute fabric, each with a dedicated set of resources (disk, CPU and memory) [2]

each compute server runs two micro-services

{service} Execution Service (ES)

responsible for tracking the life span of tasks assigned to a compute container by the DQP [2]

{service} SQL Server instance

used as the back-bone for execution of the template query for a given task [2]

⇐ holds a cache on top of local SSDs

in addition to in-memory caching of hot data

data can be transferred from one compute server to another

via dedicated data channels

the data channel is also used by the compute servers to send results to the SQL FE that returns the results to the user [2]
the life cycle of a query is tracked via control flow channels from the SQL FE to the DQP, and the DQP to the ES [2]

{concept} cell data abstraction

the key building block that enables to abstract data stores

abstracts DQP from the underlying store [1]
any dataset can be mapped to a collection of cells [1]
allows distributing query processing over data in diverse formats [1]
tailored for vectorized processing when the data is stored in columnar formats [1]
further improves relational query performance

2-dimenstional

distributions (data alignment)
partitions (data pruning)

each cell is self-contained with its own statistics [1]

used for both global and local QO [1]
cells can be grouped physically in storage [1]
queries can selectively reference either cell dimension or even individual cells depending on predicates and type of operations present in the query [1]

{concept} distributed query processing (DQP) framework

operates at the cell level
agnostic to the details of the data within a cell

data extraction from a cell is the responsibility of the (single node) query execution engine, which is primarily SQL Server, and is extensible for new data types [1], [2]

{concept} dataset

logically abstracted as a collection of cells [1]
can be arbitrarily assigned to compute nodes to achieve parallelism [1]
uniformly distributed across a large number of cells

[scale-out processing] each dataset must be distributed across thousands of buckets or subsets of data objects,
such that they can be processed in parallel across nodes

{concept} session

supports a spectrum of consumption models, ranging from serverless ad-hoc queries to long-standing pools or clusters [1]
all data are accessible from any session [1]

multiple sessions can access all underlying data concurrently [1]

{concept} Physical Metadata layer

new layer introduced in the SQL Server storage engine [2]

25 March 2025

🏭🗒️Microsoft Fabric: Security in Warehouse [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 25-Mar-2024

[Microsoft Fabric] Security in Warehouse

{def} suite of technologies aimed at safeguarding sensitive information in Fabric [1]

leverages SQL engine’s security features [1]

allows for security mechanism at the warehouse level [1]
⇐ the warehouse and SQL analytics endpoint items also allow for the defining of native SQL security [4]

the permissions configured only apply to the queries executed against the respective surfaces [4]

the access to OneLake data is controlled separately through OneLake data access roles [4]

{recommendation}don’t include users with SQL specific permissions in a OneLake data access role [4]

allows to ensure users don't see data they don't have SQL access to [5r]

supports a range of data protection features that enable administrators to shield sensitive data from unauthorized access [1]

⇐ across warehouses and SQL analytics endpoints without necessitating changes to applications [1]

{type} object-level security (OLS)

permissions governing DML operations [1]

applies to tables and views
⇐ when denied, the user will be prevented from performing the respective operation
SELECT

allows users to view the data within the object [1]

INSERT

allows users to insert data in the object [1]

UPDATE

allows users to update data within the object [1]

DELETE

allows users to delete the data within the object [1]

permissions can be granted, revoked or denied on database objects [1]

tables and views
GRANT

permission is granted to user or role [1]

DENY

permissions is denied to user or role [1]

REVOKE

permissions is revoked to user or role [1]

ALTER

grants the user the ability to change the definition of the object [1]

CONTROL

grants the user all rights to the object [1]

{principle} least privilege

users and applications should only be given the permissions needed in order for them to complete the task

{type} column-level security (CLS)

allows to restrict column access to sensitive data [1]

provides granular control over who can access specific pieces of data [1]

⇒ enhances the overall security of the data warehouse [1]

steps

identify the sensitive columns [1]
define access roles [1]
assign roles to users [1]
implement access control [1]

restrict access to ta column based on user's role [1]

{type} row-level security (RLS)

provides granular control over access to rows in a table based on group membership or execution context [1]

using WHERE clause filters [1]

works by associating a function (aka security predicate) with a table [1]

defined to return true or false based on certain conditions [1]

⇐ typically involving the values of one or more columns in the table [1]
when a user attempts to access data in the table, the security predicate function is invoked [1]

if the function returns true, the row is accessible to the user; otherwise, the row doesn't show up in the query results [1]

the predicate can be as simple/complex as required [1]
the process is transparent to the user and is enforced automatically by SQL Server

⇐ ensures consistent application of security rules [1]

implemented in two main steps:

filter predicates

an inline table-valued function that filters the results based on the predicate defined [1]

security policy

invokes an inline table-valued function to protect access to the rows in a table [1]

because access control is configured and applied at the warehouse level, application changes are minimal - if any [1]
users can directly have access to the tables and can query their own data [1]

{recommendation} create a separate schema for predicate functions and security policies [1]
{recommendation} avoid type conversions in predicate functions [1]
{recommendation}avoid using excessive table joins and recursion in predicate functions [1]

allows to maximize performance,

{type} dynamic data masking (DDM)

allows to limits data exposure to nonprivileged users by obscuring sensitive data [1]

e.g. email addresses

{benefit} enhance the security and manageability of the data [1]
{feature} real-time masking

when querying sensitive data, DDM applies dynamic masking to it in real time [1]

the actual data is never exposed to unauthorized users, thus enhancing the security of your data [1]

straightforward to implement [1]
doesn’t require complex coding, making it accessible for users of all skill levels [1]
{benefit} the data in the database isn’t changed when DDM is applied

⇒ the actual data remains intact and secure, while nonprivileged users only see a masked version of the data [1]

{operation} define masking rule

set up at column level [1]
offers a suite of features [1]

comprehensive and partial masking capabilities [1]
supports several masking types

help prevent unauthorized viewing of sensitive data [1]

by enabling administrators to specify how much sensitive data to reveal [1]

⇒ minimal effect on the application layer [1]

applied to query results, so the data in the database isn't changed

⇒ allows many applications to mask sensitive data without modifying existing queries [1]

random masking function designed for numeric data [1]

{risk} unprivileged users with query permissions can infer the actual data since the data isn’t physically obfuscated [1]

{recommendation} DDM should be used as part of a comprehensive data security strategy [1]

should include

the proper management of object-level security with SQL granular permissions [1]
adherence to the principle of minimal required permissions [1]

{concept} Dynamic SQL

allows T-SQL statements to be generated within a stored procedure or a query itself [1]

executed via sp_executesql stored procedure

{risk} SQL injection attacks

use QUOTENAME to sanitize inputs [1]

write access to a warehouse or SQL analytics endpoint

{approach} granted through the Fabric workspace roles

the role automatically translates to a corresponding role in SQL that grants equivalent write access [4]
{recommendation} if a user needs write access to all warehouses and endpoints, assign the user to a workspace role [4]

use the Contributor role unless the user needs to assign other users to workspace roles [4]

{recommendation} grant direct access through SQL permissions if the user only needs to write to specific warehouses or endpoints [4]

{approach} grant read access to the SQL engine, and grant custom SQL permissions to write to some or all the data [4]

write access to a warehouse or SQL analytics endpoint

{approach} grant read access through the ReadData permission, granted as part of the Fabric workspace roles [4]

ReadData permission maps the user to a SQL role that gives SELECT permissions on all tables in the warehouse or lakehouse

helpful if the user needs to see all or most of the data in the lakehouse or warehouse [4]
any SQL DENY permissions set on a particular lakehouse or warehouse still apply and limit access to tables [4]
row and column level security can be set on tables to restrict access at a granular level [4]

{approach} grant read access to the SQL engine, and grant custom SQL permissions to read to some or all the data [4]

if the user needs access only to a specific lakehouse or warehouse, the share feature provides access to only the shared item [4]

during the share, users can choose to give only Read permission or Read + ReadData

granting Read permission allows the user to connect to the warehouse or SQL analytics endpoint but gives no table access [4]
granting users the ReadData permissions gives them full read access to all tables in the warehouse or SQL analytics endpoint

⇐ additional SQL security can be configured to grant or deny access to specific tables [4]

Previous Post <<||>> Next Post

References:

[1] Microsoft Learn (2024) Secure a Microsoft Fabric data warehouse [link]

[2] Data Mozart (2025) Lock Up! Understanding Data Access Options in Microsoft Fabric, by Nikola Ilic [link]
[3] Microsoft Learn (2024) Security in Microsoft Fabric [link]
[4] Microsoft Learn (2024) Microsoft Fabric: How to secure a lakehouse for Data Warehousing teams [link]

Resources:

[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
[R2] Microsoft Learn (2025) Fabric: Security for data warehousing in Microsoft Fabric [link]

[R3] Microsoft Learn (2025) Fabric: Share your data and manage permissions [link]

Acronyms:

CLS - Column-Level Security

DDM - Dynamic Data Masking
DML - Data Manipulation Language
MF - Microsoft Fabric

OLS - Object-Level Security

RLS - Row-Level Security
SQL - Structured Query Language

18 March 2025

🏭🗒️Microsoft Fabric: Statistics in Warehouse [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 18-Mar-2024

[Microsoft Fabric] Statistics

{def} objects that contain relevant information about data, to allow query optimizer to estimate plans' costs [1]

critical for the warehouse and lakehouse SQL endpoint for executing queries quickly and efficiently [3]

when a query is executed, the engine tries to collect existing statistics for certain columns in the query and use that information to assist in choosing an optimal execution plan [3]
inaccurate statistics can lead to unoptimized query plans and execution times [5]

{type} user-defined statistics

statistics defined manually by the users via DDL statement [1]
users can create, update and drop statistic

via CREATE|UPDATE|DROP STATISTICS

users can review the contents of histogram-based single-column statistics [1]

via DBCC SHOW_STATISTICS

only, a limited version of these statements is supported [1]

{recommendation} focus on columns heavily used in query workloads

e.g. GROUP BYs, ORDER BYs, filters, and JOINs

{recommendation} consider updating column-level statistics regularly [1]

e.g. after data changes that significantly change rowcount or distribution of the data [1]

{type} automatic statistics

statistics created and maintained automatically by the query engine at query time [1]
when a query is issued and query optimizer requires statistics for plan exploration, MF automatically creates those statistics if they don't already exist [1]

then the query optimizer can utilize them in estimating the plan costs of the triggering query [1]
if the query engine determines that existing statistics relevant to query no longer accurately reflect the data, those statistics are automatically refreshed [1]

these automatic operations are done synchronously [1]

the query duration includes this time [1]

{object type} histogram statistics

created per column needing histogram statistics at query time [1]
contains histogram and density information regarding the distribution of a particular column [1]
similar to the statistics automatically created at query-time in Azure Synapse Analytics dedicated pools [1]
name begins with _WA_Sys_.
contents can be viewed with DBCC SHOW_STATISTICS

{object type} average column length statistics

created for variable character columns (varchar) greater than 100 needing average column length at query-time [1]
contain a value representing the average row size of the varchar column at the time of statistics creation [1]
name begins with ACE-AverageColumnLength_
contents cannot be viewed and are nonactionable by users [1]

{object type} table-based cardinality statistics

created per table needing cardinality estimation at query-time [1]
contain an estimate of the rowcount of a table [1]
named ACE-Cardinality [1]
contents cannot be viewed and are nonactionable by user [1]

[lakehouse] SQL analytics endpoint

uses the same engine as the warehouse to serve high performance, low latency SQL queries [4]
{feature} automatic metadata discovery

a seamless process reads the delta logs and from the files folder and ensures SQL metadata for tables is always up to date [4]

e.g. statistics [4]

{limitation} only single-column histogram statistics can be manually created and modified [1]
{limitation} multi-column statistics creation is not supported [1]
{limitation} other statistics objects might appear in sys.stats

besides the statistics created manually/automatically [1]

⇐ the objects are not used for query optimization [1]

{limitation} if a transaction has data insertion into an empty table and issues a SELECT before rolling back, the automatically generated statistics can still reflect the uncommitted data, causing inaccurate statistics [5]

{recommendation} update statistics for the columns mentioned in the SELECT [5]

{recommendation} ensure all table statistics are updated after large DML transactions [2]

17 March 2025

🏭🗒️Microsoft Fabric: Caching in Warehouse [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 17-Mar-2024

[Microsoft Fabric] Caching

{def} technique that improves the performance of data processing by storing frequently accessed data and metadata in a faster storage layer [1]

e.g. local memory, local SSD disk
⇐ subsequent requests can be served faster, directly from the cache [1]

if a set of data has been previously accessed by a query, any subsequent queries will retrieve that data directly from the in-memory cache [1]
local memory operations are notably faster compared to fetching data from remote storage [1]
⇐ significantly diminishes IO latency [1]

fully transparent to the user
consistently active and operates seamlessly in the background [1]
orchestrated and upheld by MF

it doesn't offer users the capability to manually clear the cache [1]

provides transactional consistency

ensures that any modifications to the data in storage after it has been initially loaded into the in-memory cache, will result in consistent data [1]

when the cache reaches its capacity threshold and fresh data is being read for the first time, objects that have remained unused for the longest duration will be removed from the cache [1]

process is enacted to create space for the influx of new data and maintain an optimal cache utilization strategy [1]

{type} in-memory cache

data in cache is organized in a compressed columnar format (aka columnar storage) [1]

⇐ optimized for analytical queries
each column of data is stored separately [1]

{benefit} allows for better compression [1]

since similar data values are stored together [1]

{benefit} reduces the memory footprint

when queries need to perform operations on a specific column, the engine can work more efficiently speeding up queries' execution [1]

it doesn't have to process unnecessary data from other columns
can perform operations on multiple columns simultaneously [1]

taking advantage of modern multi-core processors [1]

when retrieves data from storage, the data is transformed from its original file-based format into highly optimized structures in in-memory cache [1]
{scenario} analytical workloads where queries involve scanning large amounts of data to perform aggregations, filtering, and other data manipulations [1]

{type} disk cache

complementary extension to the in-memory cache
any data loaded into the in-memory cache is also serialized to the SSD cache [1]

data removed from the in-memory cache remains within the SSD cache for an extended period
when subsequent query requests the data, it is retrieved from the SSD cache into the in-memory cache quicker [1]

{issue} cold run (aka cold cache) performance

the first 1-3 executions of a query perform noticeably slower than subsequent executions [2]

if the first run's performance is crucial, try manually creating statistics (aka pre-warming the cache) [2]
otherwise, one can rely on the automatic statistics generated in the first query run and leveraged in subsequent runs [2]
as long as underlying data does not change significantly [2]

differentiated from

[Kusto] caching policy [link]
[Apache Spark] intelligent cache [link]
[Power BI] query caching [link]
[Azure] caching [link]

Previous Post <<||>> Next Post

References:

[1] Microsoft Learn (2024) Fabric: Caching in Fabric data warehousing [link]

[2] Microsoft Learn (2024) Fabric Data Warehouse performance guidelines [link]

[3] Sandeep Pawar (2023) Pre-Warming The Direct Lake Dataset For Warm Cache Import-Like Performance [link]

Resources:

[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
IO - Input/Output
MF - Microsoft Fabric
SSD - Solid State Drive

15 March 2025

💫🗒️ERP Systems: Microsoft Dynamics 365's Business Performance Analytics (BPA) [notes]

Disclaimer: This is work in progress intended to consolidate information from the various sources and not to provide a complete overview of all the features. Please refer to the documentation for a complete overview!

Last updated: 15-Mar-2025

[Dynamics 365] Business Performance Analytics (BPA)

{def} centralized reporting hub within D365 F&O designed to streamline insights and help organizations make faster, data driven decisions [3]

solution designed to transform organization's data into actionable insights [1]
provides an intuitive out-of-box data model along with familiar tools like Microsoft Excel and Power BI for self-service analytics [4]

data extracted from D365 is classified in BPA in the form of value chains

⇐ a group of business processes on top of the value chain [4]

{benefit} allows to simplify data insights by providing a unified view of business data across entities in near real time [4]
{benefit} allows to streamline financial and operations reporting to reduce the cycle times [4]
{benefit} allows users of all technical abilities to quickly access and analyze data to facilitate data driven decisions [4]
{benefit} provides auditors with direct access to financial data, making the audit process more efficient
{benefit} enables ease of use through familiar apps like Excel and Power BI, in addition to AI driven insights and automation in this platform that can be scalable and extendable [4]
{feature} extends into Microsoft Fabric

{benefit} provide a scalable, secure environment for handling large data sets and ensuring insights are always powered by the latest technology [3]

{feature} ETL process

involves extracting data from finance and operations database, transforming and loading it into Dataverse [4]

each of the entities required for the generation of the dimensional model for the value chains that were mentioned earlier, they are backed by the underlying tables in finance and operations database [4]

installed in Dataverse, virtual entities that are created will then pull in the data into the managed data lake [4]
the data is then transformed to generate the dimensional model which is then pushed into the embedded Power BI workspace in the form of analytical tables [4]
BPA consumes this data from Power BI workspace to render the power BI reports [4]
this data can also be extended to Fabric if there is a need to consolidate data from multiple sources [4]

{feature} reports

designed to provide a detailed overview of an organization's financial health [8]
further reports will be added to expand the coverage for the value chains [8]
out-of-box reports can't be modified

⇐ users cannot rename, delete or edit these type of reports [8]
there’s the option to duplicate the base report and edit the version thus created [8]

can be shared with other users who have access to BPA

⇐ they can receive an in-app notification [8]
can be shared over email with another user by entering user’s email address [8]
one can configure whether the recipient can edit or view the report [8]

{feature} allows to create a new Power BI or Excel report from scratch [8]

{option} start with a blank report or duplicate an existing report [8]

{feature} data refresh

automatic data refreshes run currently two times a day [4]

at 12:00 AM and 12:00 PM UTC
the volume of data is also constrained by the storage capacity of the A3 SKU for Power BI Embedded [1]

future release, may support additional data reporting capacity [1]

⇐ so that larger data sets can be reported and analyzed [1]

the target is to have refreshes every hour or less [3]

data volume will be initially for about eight quarters of data [4]
extensibility will be supported with bring your own Fabric [4]

architecture

SaaS solution

{capability} immediate deployment

businesses can start to analyze data and generate insights with minimal setup [1]

{capability} comprehensive reporting and dashboards

provides access to a wide range of preconfigured reports that cover multiple business functions [1]

{capability} near-real-time analytics

future releases will offer more frequent data refreshes to enable near-real-time data analysis and reporting

{capability} predictive insights

future releases will introduce predictive analytics capabilities that enable businesses to

forecast trends
identify risks
seize opportunities [1]

{capability} user-friendly interface

intuitive design ⇒ minimal training

fosters broader adoption
enables a data-driven culture across the organization [1]

{capability} cost-effectiveness

available as part of D365 license

⇒ provides advanced analytics without requiring significant investments in IT infrastructure [1]

DaaS solution

{capability} organizations can integrate its data models with their existing data warehousing infrastructure in Microsoft Fabric [1]

maximizes the value of existing data solutions [1]
positions businesses for future enhancements [1]

{capability} unified and scalable data models

customers can build custom models on top of a unified framework

ensures consistency and scalability across data sets [1]

{capability} future-proofing with automatic upgrades

data models integrate seamlessly with future D365 updates

reduces manual maintenance and ensures access to the latest features [1]

{capability} consistency and standardization

data models provide consistency and standardization across data sources

ensure high data quality and integrity [1]

{capability} advanced analytics and AI

by customizing the data models, organizations can take advantage of advanced analytics and AI capabilities [1]

deeper insights without having to develop them from scratch [1]

{capability} enhanced data governance

unified data models support better data governance by providing standardized data definitions, relationships, and hierarchies [1]

ensure consistency and quality across the organization [1]

requires an integrated Power Platform environment [5]

must be integrated with the Microsoft Entra tenant [5]

uses shared Dataverse entitlements [1]

includes access to the data lake [1]

setup

dimensions

the selection of dimensions might affect the dimension groups that are created using these dimensions and the users who are assigned there [7]

e.g. legal entity, business unit

dimension groups

users can select specific values for the legal entity, or add a range of values [7]

selecting an invalid combination of dimension values, the dimension group will filter out all the records on the report [7]

{warning} assigning too many dimension groups to a user, slows the load for that user [7]

roles

determine which reports the user can access [7]

security

secure data through role-based access control on top of the value chains [7]
the first user who signs into the app is assigned the BPA admin role [7]

allows a user to access the administrator section of the BPA [7]

where the security can be set up [7]

has automatically assigned

Microsoft report viewer role
the All Access Dimension group [7]

allow the admin to see the data in all the reports across all the dimensions [7]

{feature} dimension-based role-level security

ensures that users only see the data relevant to them based on their role

confidently share reports without duplicating them

⇐ data is automatically filtered by organization's security policies [3]

simple but powerful way to maintain control while providing access for teams that love working in Excel [3]

accessibility

can be accessed through either

Power Platform

admins can access BPA app through PowerApps' makeup portal [6]

Dynamics 365

through the BPA preview shortcut in the homepage or the default dashboard [6]
for end users, the BPA preview shortcut is provided when they have certain duties associated to their role(s) [6]

licensing

included in D365 F&O license [4]

requirements

requires a tier two environment and Dynamics 365 finance version 1.0.38 or later [5]

{project} timeline

[2025 wave 1] backup and restore custom reports and analytics

{benefit} support better lifecycle management and empower customers to develop on sandbox instances before publishing to production [3]

2025: available in all regions where F&O is available [3]
Oct-2024: GA

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Dynamics 365 Finance: What is Business performance analytics? [link]
[2] Microsoft Learn (2025) Business performance analytics (BPA) with Dynamics 365 Finance [link]
[3] Dynamics 365 Finance - Business Performance Analytics 2025 Release Wave 1 Release Highlights [link]
[4] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 1 [link]
[5] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 2 [link]
[6] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 3 [link]
[7] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 4 [link]
[8] Dynamics 365 Community (2024) Dynamics 365 Bites: Business Performance Analytics Part 5 [link]
[9] Microsoft Learn (2024) Dynamics 365: Business performance analytics introduction [link]

Acronyms:
AI - Artificial Intelligence
BPA - Business Performance Analytics
D365 F&O - Dynamics 365 for Finance and Operations
DaaS - Data-as-a-Service
ETL - Extract, Transfer, Load
GA - General Availability
MF - Microsoft Fabric
PP - Public Preview
SaaS - Software-as-a-Service
SKU - Stock Keeping Unit
UTC - Coordinated Universal Time

08 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventstreams [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 8-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Eventstream(s)

{def} feature in Microsoft Fabric's Real-Time Intelligence experience, that allows to bring real-time events into Fabric

bring real-time events into Fabric, transform them, and then route them to various destinations without writing any code

⇐ aka no-code solution
{feature} drag and drop experience

gives users an intuitive and easy way to create your event data processing, transforming, and routing logic without writing any code

work by creating a pipeline of events from multiple internal and external sources to different destinations

a conveyor belt that moves data from one place to another [1]
transformations to the data can be added along the way [1]

filtering, aggregating, or enriching

{def} eventstream

an instance of the Eventstream item in Fabric [2]
{feature} end-to-end data flow diagram

provide a comprehensive understanding of the data flow and organization [2].

{feature} eventstream visual editor

used to design pipelines by dragging and dropping different nodes [1]
sources

where event data comes from
one can choose

the source type
the data format
the consumer group

Azure Event Hubs

allows to get event data from an Azure event hub [1]
allows to create a cloud connection with the appropriate authentication and privacy level [1]

Azure IoT Hub

SaaS service used to connect, monitor, and manage IoT assets with a no-code experience [1]

CDC-enabled databases

software process that identifies and tracks changes to data in a database, enabling real-time or near-real-time data movement [1]
Azure SQL Database
PostgreSQL Database
MySQL Database
Azure Cosmos DB

Google Cloud Pub/Sub

messaging service for exchanging event data among applications and services [1]

Amazon Kinesis Data Streams

collect, process, and analyze real-time, streaming data [1]

Confluent Cloud Kafka

fully managed service based on Apache Kafka for stream processing [1]

Fabric workspace events

events triggered by changes in Fabric Workspace

e.g. creating, updating, or deleting items.

allows to capture, transform, and route events for in-depth analysis and monitoring within Fabric [1]
the integration offers enhanced flexibility in tracking and understanding workspace activities [1]

Azure blob storage events

system triggers for actions like creating, replacing, or deleting a blob [1]

these actions are linked to Fabric events

allowing to process Blob Storage events as continuous data streams for routing and analysis within Fabric [1]

support streamed or unstreamed events [1]

custom endpoint

REST API or SDKs can be used to send event data from custom app to eventstream [1]
allows to specify the data format and the consumer group of the custom app [1]

sample data

out-of-box sample data

destinations

where transformed event data is stored.

in a table in an eventhouse or a lakehouse [1]
redirect data to

another eventstream for further processing [1]
an activator to trigger an action [1]

Eventhouse

offers the capability to funnel your real-time event data into a KQL database [1]

Lakehouse

allows to preprocess real-time events before their ingestion in the lakehouse

the events are transformed into Delta Lake format and later stored in specific lakehouse tables [1]

facilitating the data warehousing needs [1]

custom endpoint

directs real-time event traffic to a bespoke application [1]
enables the integration of proprietary applications with the event stream, allowing for the immediate consumption of event data [1]
{scenario} aim to transfer real-time data to an independent system not hosted on the Microsoft Fabric [1]

Derived Stream

specialized destination created post-application of stream operations like Filter or Manage Fields to an eventstream
represents the altered default stream after processing, which can be routed to various destinations within Fabric and monitored in the Real-Time hub [1]

Fabric Activator

enables to use Fabric Activator to trigger automated actions based on values in streaming data [1]

transformations

filter or aggregate the data as is processed from the stream [1]
include common data operations

filtering

filter events based on the value of a field in the input
depending on the data type (number or text), the transformation keeps the values that match the selected condition, such as is null or is not null [1]

joining

transformation that combines data from two streams based on a matching condition between them [1]

aggregating

calculates an aggregation every time a new event occurs over a period of time [1]

Sum, Minimum, Maximum, or Average

allows renaming calculated columns, and filtering or slicing the aggregation based on other dimensions in your data [1]
one can have one or more aggregations in the same transformation [1]

grouping

allows to calculate aggregations across all events within a certain time window [1]

one can group by the values in one or more fields [1]

allows for the renaming of columns

similar to the Aggregate transformation
⇐ provides more options for aggregation and includes more complex options for time windows [1]

allows to add more than one aggregation per transformation [1]

allows to define the logic needed for processing, transforming, and routing event data [1]

union

allows to connect two or more nodes and add events with shared fields (with the same name and data type) into one table [1]

fields that don't match are dropped and not included in the output [1]

expand

array transformation that allows to create a new row for each value within an array [1]

manage fields

allows to add, remove, change data type, or rename fields coming in from an input or another transformation [1]

temporal windowing functions

enable to analyze data events within discrete time periods [1]
way to perform operations on the data contained in temporal windows [1]

e.g. aggregating, filtering, or transforming streaming events that occur within a specified time period [1]
allow analyzing streaming data that changes over time [1]

e.g. sensor readings, web-clicks, on-line transactions, etc.
provide great flexibility to keep an accurate record of events as they occur [1]

{type} tumbling windows

divides incoming events into fixed and nonoverlapping intervals based on arrival time [1]

{type} sliding windows

take the events into fixed and overlapping intervals based on time and divides them [1]

{type} session windows

divides events into variable and nonoverlapping intervals that are based on a gap of lack of activity [1]

{type} hopping windows

are different from tumbling windows as they model scheduled overlapping window [1]

{type} snapshot windows

group event stream events that have the same timestamp and are unlike the other windowing functions, which require the function to be named [1]
one can add the System.Timestamp() to the GROUP BY clause [1]

{type} window duration

the length of each window interval [1]
can be in seconds, minutes, hours, and even days [1]

{parameter} window offset

optional parameter that shifts the start and end of each window interval by a specified amount of time [1]

{concept} grouping key

one or more columns in an event data use to group the data by [1]

aggregation function

one or more of the functions applied to each group of events in each window [1]

where the counts, sums, averages, min/max, and even custom functions become useful [1]

see the event data flowing through the pipeline in real-time [1]
handles the scaling, reliability, and security of event stream automatically [1]

no need to write any code or manage any infrastructure [1]

{feature} eventstream editing canvas

used to

add and manage sources and destinations [1]
see the event data [1]
check the data insights [1]
view logs for each source or destination [1]

{feature} Apache Kafka endpoint on the Eventstream item

{benefit} enables users to connect and consume streaming events through the Kafka protocol [2]

application using the protocol can send or receive streaming events with specific topics [2]
requires updating the connection settings to use the Kafka endpoint provided in the Eventstream [2]

{feature} support runtime logs and data insights for the connector sources in Live View mode [3]

allows to examine detailed logs generated by the connector engines for the specific connector [3]

help with identifying failure causes or warnings [3]
⇐ accessible in the bottom pane of an eventstream by selecting the relevant connector source node on the canvas in Live View mode [3]

{feature} support data insights for the connector sources in Live View mode [3]
{feature} integrates eventstreams CI/CD tools

{benefit} developers can efficiently build and maintain eventstreams from end-to-end in a web-based environment, while ensuring source control and smooth versioning across projects [3]

{feature} REST APIs

allow to automate and manage eventstreams programmatically

{benefit} simplify CI/CD workflows and making it easier to integrate eventstreams with external applications [3]

{recommendation} use event streams feature with at least SKU: F4 [2]
{limitation} maximum message size: 1 MB [2]
{limitation} maximum retention period of event data: 90 days [2]

Previous Post <<||>> Next Post

References:

[1] Microsoft Learn (2024) Microsoft Fabric: Use real-time eventstreams in Microsoft Fabric [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Fabric Eventstream - overview [link]
[3] Microsoft Learn (2024) Microsoft Fabric: What's new in Fabric event streams? [link]

[4] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]

[5] Microsoft Learn (2025) Use real-time eventstreams in Microsoft Fabric [link]

Resources:

[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Microsoft Fabric Updates Blog (2024) CI/CD – Git Integration and Deployment Pipeline [link]

[R3] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:

API - Application Programming Interface
CDC - Change Data Capture
CI/CD - Continuous Integration/Continuous Delivery
DB - database
IoT - Internet of Things
KQL - Kusto Query Language
RTI - Real-Time Intelligence

SaaS - Software-as-a-Service
SDK - Software Development Kit
SKU - Stock Keeping Unit

SQL Troubles

Pages

21 June 2025

🏭🗒️Microsoft Fabric: Result Set Caching in SQL Analytics Endpoints [Notes] 🆕

18 May 2025

#️⃣Software Engineering: Mea Culpa (Part VII: A Look Forward)

26 March 2025

💠🏭🗒️Microsoft Fabric: Polaris SQL Pool [Notes]

25 March 2025

🏭🗒️Microsoft Fabric: Security in Warehouse [Notes]

18 March 2025

🏭🗒️Microsoft Fabric: Statistics in Warehouse [Notes]

17 March 2025

🏭🗒️Microsoft Fabric: Caching in Warehouse [Notes]

15 March 2025

💫🗒️ERP Systems: Microsoft Dynamics 365's Business Performance Analytics (BPA) [notes]

08 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventstreams [Notes]

About Me