16 April 2025

🧮ERP: Implementations (Part XIV: A Never-Ending Story)

ERP Implementations Series

An ERP implementation is occasionally considered a one-time endeavor after which an organization will live happily ever after. In an ideal world that would be true, though the work never stops: things that were carved out of the initial implementation, optimizations, new features, new regulations, new requirements, integrations with other systems, etc. An implementation is thus just the beginning of what comes afterwards, and it's essential to get the foundation right. That's the purpose of the ERP implementation: to provide a foundation on which something bigger and solid can be erected. 

No matter how well an ERP implementation is managed and executed, respectively how well people work towards the same goals, there's always something forgotten or carved out of the initial project. The usual suspects are the integrations with other systems, though there can also be minor or even major features that are planned to be addressed later, if the implementation hasn't already consumed all the financial resources available, as is usually the case. Some of the topics can be addressed as Change Requests or consolidated into projects of their own. 

Even simple integrations can become complex when the processes are poorly designed, and that happens more often than people think. It's not necessarily about the lack of skillset or about the technologies used, but about the degree to which the processes can work in a loosely coupled, interconnected manner. Even unidirectional integrations can raise challenges, though everything increases in complexity when the flow of data is bidirectional. Moreover, the complexity increases with each system added to the overall architecture. 

Like a sculpture carved by hand, the processes in an ERP implementation form a skeleton that needs chiseling and smoothing until the form reaches the desired optimized shape. However, optimization is not a one-time attempt but a continuous effort of exploring what is achievable, what works, what is optimal. Sometimes optimization is an exact science, while other times it's about (scientific) experimentation in which theory, ideas and investments are put to good use. However, experimentation tends to be expensive, at least in terms of time and effort, and these are probably the main reasons why some organizations don't even attempt it – or maybe it's just laziness, pure indifference or self-preservation. In fact, why change something that already works?

Typically, software manufacturers make new releases available on a periodic basis as part of their planning for growth and of attracting more business. Each release that touches used functionality typically needs proper evaluation, testing and whatever else organizations consider important as part of the release management process. Ideally, everything should go smoothly, though life never ceases to surprise, and even a minor release can have an important impact when critical functionality that worked before stops working. Test automation and other practices can make an important difference for organizations, though they require additional effort and investments that usually pay off when done right. 

Regulations and other similar requirements must be addressed as they can involve penalties or other risks that are usually worth avoiding. Ideally such requirements should be supported by design, though even then a certain volume of work is involved. Moreover, the business context can change unexpectedly, and further requirements need to be considered eventually. 

The work on an ERP system and the infrastructure built around it is a never-ending story. Therefore, organizations must have not only the resources for the initial project, but also for what comes after it. Of course, some work can be performed manually, some requirements can be delayed, some risks can be assumed, though the value of an ERP system increases with its extended usage, at least in theory. 

🧮ERP: Implementations (Part XIII: On Project Management)

ERP Implementations Series

Given its intrinsic complexity and extended implications, an ERP implementation can be considered the real test of endurance for a Project Manager, respectively for the team being managed. Such projects typically deal with multiple internal and external parties with various interests in the outcomes of the project. Moreover, such projects involve multiple technologies, systems, and even methodologies. More importantly, such projects tend to have specific characteristics associated with their scale, being challenging to manage within the predefined constraints: time, scope, costs and quality.

From a Project Manager's perspective, only the current project counts. From a PMO perspective, a project, independent of its type, must be put into a broader perspective, looking at the synergies and other important aspects that can help the organization. Unfortunately, for many organizations everything begins and ends with the implementation, independently of the project's outcomes. Failure often lurks in the background, and small differences can have a considerable impact in the long term. ERP implementations are, more than other projects, sensitive to the initial conditions – the premises under which the project starts and progresses. 

One way of coping with this inherent complexity is to split projects into several phases treated as projects or subprojects in their own right. This allows organizations to narrow the focus and split the overall work into more manageable pieces, reducing the risks to some degree while learning in the process about the organization's capabilities in addressing the various aspects. However, the phases are not necessarily sequential but often must overlap to better manage the resources and minimize waste. 

Given that an implementation project can take years, it's normal for people to come and go, some taking over work from colleagues, with or without knowledge transfer. The knowledge remains available as long as the resources don't leave the organization, though knowledge transfer can't be taken for granted. It's also normal for resources to suddenly become unavailable or disappear, increasing the burden that needs to be shifted onto others' shoulders. There's seldom a project without such events, and one needs to make the best of each situation, even if several tries and iterations are needed in the process.

Somebody needs to manage all this, and the weight of the whole project falls on the PM's shoulders. Managing by exception and other management principles break under the weight of implementation projects, and it's often challenging to make progress without addressing this. Fortunately, PMs can shift part of the burden onto Key Users and other parties involved in the project. Splitting a project into subprojects can help set boundaries, even if more management is occasionally involved. Also, having clear responsibilities and resources who can take over the burden when needed can be a sign of maturity of the teams, respectively the organization. 

Teams in Project Management are often compared with teams in sports, though the metaphor is only partially apt when each party has its own ball to play with, while some of the players or even whole teams prefer to play alone at their own pace. It takes time to build effective teams that play well together, and team spirit or other similar concepts can't fill all the gaps existing in organizations! Training in team sports has certain characteristics that must be mirrored in organizations to allow teams to improve. Various parties expect the PM to be the binder and troubleshooter of something that should have been part of the organization's DNA! Bringing in external players to do the heavy lifting may sometimes work, though who'll do the lifting after the respective resources are gone? 


15 April 2025

🧮ERP: Implementations (Part XII: The Process Perspective)

ERP Implementations Series

Technology can have a tremendous potential impact on organizations, helping them achieve their strategic goals and objectives; however, it takes more than the implementation of one or more technologies to leverage that potential! This applies to ERP and other technology implementations alike, though the stakes are higher for ERP given its transformative role. ERP implementations can be the foundation on which the whole future of the organization is built, and it's ideal to have a broader strategy that looks at all the facets of an organization pre-, during and post-implementation. 

One of the most important assets an organization has is its processes, an organization's success depending on the degree to which they are used to leverage the various strategies. Many customers want their business processes implemented as-is on the new platform, and that's the point where many projects go in the wrong direction! There are probably areas where this approach makes sense, though organizations need to also look at the alternatives available in the new ecosystem, and identify and prioritize the missing features accordingly. There will also be extreme cases in which one system or a mix of systems is deemed not feasible, and this is an outcome that should be considered during such evaluations! 

An ERP system allows organizations to implement their key value-creation processes by providing a technological skeleton with a set of configurations and features that can be used to address a wide range of requirements. Such a framework is an enabler – it makes things possible – though the potential is not reached automatically, and this is one of the many false assumptions associated with such projects. Customers choose such a system and expect magic to happen! Many of the false perceptions are reinforced by implementers or other parties involved in the projects. As in other IT areas, many misconceptions pervade. 

An ERP thus provides a basis on which an organization can implement its processes. Doing an ERP implementation without process redesign is seldom possible, even if many organizations want to avoid it at all costs. Even if an organization's processes are highly standardized, expecting a system to model them by design is utopian, given that ERP systems tend to target the most important aspects identified across industries. And thus customizations come into play, some of them done without looking at alternatives already existing in the intrinsic or extended range of solutions available in an ERP's ecosystem. 

One of the most important dangers is when an organization's processes are so complex that their replication in the new environment creates more issues than the implementation can solve. At least in the first phases of the implementation, organizations must learn to compromise and focus on the critical aspects without which the organization can't do its business. Moreover, the costs of implementations tend to increase exponentially when multiple complex requirements are added to address the gaps. Organizations should always look at alternatives – integrations with third-party systems tend to be more cost-effective than rebuilding the respective functionality from scratch! 

It's also true that some processes are too complex to be implemented, though the solution usually resides somewhere in the middle. Each customization adds another level of complexity, and a whole range of risks that many customers take on. Conversely, there's no blueprint that works for everybody. Organizations must thus compromise, and that's probably one of the most important aspects they should be aware of! However, compromises must also be made in the right places, while evaluating the alternatives and the possible outcomes. It's important to be aware of the full extent of the implications of such decisions. 

14 April 2025

🧮ERP: Implementations (Part XI: Tales from the Crypt)

ERP Implementations Series

One can seldom hear stranger or more frightening stories than the ones told by people who have worked on ERP implementations. Such projects attempt to model an organization's main functions and processes, independently of whether the focus is on production, finance, supply chain, services, projects or human resources. Because they tend to touch all important aspects of a business, such projects become so complex and political that they are often challenging to manage and are occasionally predestined to fail by design.

For those who have never participated in an ERP implementation, imagine an average project and the number of challenges associated with it, and multiply that by 10 or a similar factor reflecting the increase in complexity at broader scales. The jump in complexity can be compared with the jump from assembling a bed from a set of instructions to building a whole house at the same level of detail. The scale can further increase by moving from a house to a whole building or a complex of residential houses. Even if that's technically achievable, a further challenge is how to build all this in a short amount of time, with minimal costs and acceptable quality levels.

With the increase in scale, imagine the amount of planning and coordination that needs to happen to avoid delays. Even if many plan with the "first-time right" objective in mind, inherent issues are often unavoidable, and an organization's agility can be measured by how robustly it handles the foreseeable and unforeseeable challenges altogether. Of course, there are many approaches that allow one to minimize, defer or share the risks, or even the opportunities, though there's usually an important gap between one's planning and reality!

This doesn't mean that such projects are unmanageable! Everything can be managed to some level of detail and within some tolerance margins; however, many organizations are tempted to answer complexity with complexity, and that's seldom the right approach! Ideally, complexity should be broken down into manageable parts, though that's challenging to do when one doesn't know what is being done. That's why many organizations search for partners with whom to share the risks and the success, though that works only if the customer and its partners can steer the same ship toward common destinations, at least for the main itinerary if not for the whole duration of the trip.  

Unfortunately, as happens in partnerships that diverge toward distinct goals, the misalignment and other similar factors resulting from this divergence can lead to further challenges that increase the complexity of ERP implementations even more. Ideally, a partner should behave like the mechanics at a pit stop, though that's utopian, especially when they must always be available, and this for the whole duration of the project. So, all parties need to compromise somehow, and, even if there are general recipes that can be used, it's challenging to make everybody happy!

Often in an ERP implementation it is defined from the beginning whose needs are the most important, and from there one can build a whole hierarchy of scenarios, models and analyses that should show the right path(s). There's a lot of knowledge that can be carried over between projects, respectively between the different phases of a project, though there will always be surprises and one should be prepared for them! Just as a captain must occasionally change course to avoid or weather storms and other hazards, so must the corresponding structures act when the situation requires it! Occasionally, each team member may be in the position to act like a captain and rise to the expectations, though project designs must allow for this!

🏭🗒️Microsoft Fabric: Quotas [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 13-Apr-2025

[Microsoft Fabric] Quotas

  • {def} assigned number of resources for an Azure subscription
    • depend on the Azure subscription type [5]
      • set and enforced in the scope of the subscription [1]
      • each subscription has a default value for each quota [1]
    • determine the maximum number of CUs for each of the capacities on the subscription [5]
      • customers don't get charged for quotas, but for capacities [5]
    • each quota represents a specific countable resource
      • e.g. number of VMs that can be created [2]
      • e.g. the number of storage accounts that can be used concurrently [2]
      • e.g. the number of networking resources that can be consumed [2]
      • e.g. the number of API calls to a particular service that can be made [2]
    • designed to help protect customers from unexpected behavior [1]
      • e.g. inaccurately resourced deployments, mistaken consumption
      • helps minimize risks from deceptive or inappropriate consumption and unexpected demand [1]
    • limitations
      • ⇐ some limits are managed at a regional level [2]
      • ⇐ variable and dependent on several factors
  • {type} adjustable quotas
    • quotas for which customers can request quota increases
      • via the My quotas page in Azure Home
      • by specifying a new amount or usage percentage and submitting the request directly [1]
      • each subscription has a default quota value for each quota [1]
    • the quickest way to increase quotas [1]
  • {type} non-adjustable quotas
    • quotas which have a hard limit, usually determined by the scope of the subscription [1]
    • to make changes, customers must submit a support request for the Azure support team [1]
  • [Fabric] limits the number of CUs customers can provision across multiple capacities in a subscription (see the sketch after these notes)
    • calculation based on
      • subscription plan type
      • Azure region
  • {action} provision a new capacity
    • {level} Azure management group limits
    • {level} Azure subscription limits
    • {level} Azure resource group limits
    • {level} Template limits
    • {level} Microsoft Entra ID limits
    • {level} Azure API Center limits
    • {level} Azure API Management limits
    • {level} API Management v2 tiers
    • {level} Azure App Service limits
    • {level} Azure Automation limits
    • {action} request quota increase
      • via support request
      • there is no cost associated with requesting a quota increase [1]
        • ⇐ costs are incurred based on resource usage, not the quotas themselves [1]
      • should be based on workloads' characteristics [1]
    • {action} view quota
      • [permission] contributor role, or another role that includes contributor access [5]
    • {action} manage quota
      • [permission] see Quota Request Operator
    • [Azure] quota alerts
      • notifications triggered when the usage of a specific Azure resource nears the predefined quota limit [4]
      • facilitates proactive resource management [4]
      • multiple alert rules can be created for a given quota or across quotas in a subscription [4]
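      The following Python sketch illustrates the kind of check implied above: before provisioning a new capacity, the CUs already provisioned across the subscription's capacities plus the requested CUs must stay within the subscription's CU quota. All names and values are made up for illustration; in practice the quota and usage come from the My quotas page or a support request.

        # hypothetical helper: quota and usage values would come from the Azure portal (My quotas)
        def can_provision(requested_cus: int, quota_cus: int, provisioned_cus: int) -> bool:
            """True if the requested capacity stays within the subscription's CU quota."""
            return provisioned_cus + requested_cus <= quota_cus

        # e.g. a 512-CU quota with an F64 and an F128 capacity already provisioned
        print(can_provision(requested_cus=256, quota_cus=512, provisioned_cus=64 + 128))  # True
        print(can_provision(requested_cus=512, quota_cus=512, provisioned_cus=64 + 128))  # False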

      References:
      [1] Microsoft Learn (2024) Fabric: Quotas overview [link]
      [2] Microsoft Learn (2025) Fabric: Azure subscription and service limits, quotas, and constraints [link]
      [3] Microsoft Learn (2025) Fabric: Quota monitoring and alerting [link]
      [4] Microsoft Fabric Update Blog (2024) Announcing the launch of Microsoft Fabric Quotas [link]
      [5] Microsoft Learn (2025) Fabric: Microsoft Fabric capacity quotas [link]

      Resources:
      [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

      Acronyms:
      API - Application Programming Interface
      CU - Capacity Units
      VM - Virtual Machine

13 April 2025

🏭🗒️Microsoft Fabric: Continuous Integration & Continuous Deployment [CI/CD] [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 13-Apr-2025

[Microsoft Fabric] Continuous Integration & Continuous Deployment [CI/CD]
      • {def} development processes, tools, and best practices used to automate the integration, testing, and deployment of code changes to ensure efficient and reliable development
        • can be used in combination with a client tool
          • e.g. VS Code, Power BI Desktop
          • don’t necessarily need a workspace
            • developers can create branches and commit changes to that branch locally, push those to the remote repo and create a pull request to the main branch, all without a workspace
            • workspace is needed only as a testing environment [1]
              • to check that everything works in a real-life scenario [1]
        • addresses a few pain points [2]
          • manual integration issues
            • manual changes can lead to conflicts and errors
              • slow down development [2]
          • development delays
            • manual deployments are time-consuming and prone to errors
              • lead to delays in delivering new features and updates [2]
          • inconsistent environments
            • inconsistencies between environments cause issues that are hard to debug [2]
          • lack of visibility
            • can be challenging to
              • track changes through their lifecycle [2]
              • understand the state of the codebase [2]
        • {process} continuous integration (CI)
        • {process} continuous deployment (CD)
        • architecture
          • {layer} development database 
            • {recommendation} should be relatively small [1]
          • {layer} test database 
            • {recommendation} should be as similar as possible to the production database [1]
          • {layer} production database

          • data items
            • items that store data
            • items' definition in Git defines how the data is stored [1]
        • {stage} development 
          • {best practice} back up work to a Git repository
            • back up the work by committing it into Git [1]
            • {prerequisite} the work environment must be isolated [1]
              • so others don’t override the work before it gets committed [1]
              • commit to a branch no other developer is using [1]
              • commit together changes that must be deployed together [1]
                • helps later when 
                  • deploying to other stages
                  • creating pull requests
                  • reverting changes
          • {warning} big commits might hit the max commit size limit [1]
            • {bad practice} store large-size items in source control systems, even if it works [1]
            • {recommendation} consider ways to reduce items’ size if they have lots of static resources, like images [1]
          • {action} revert to a previous version
            • {operation} undo
              • revert the immediate changes made, as long as they aren't committed yet [1]
              • each item can be reverted separately [1]
            • {operation} revert
              • reverting to older commits
                • {recommendation} promote an older commit to be the HEAD 
                  • via git revert or git reset [1]
                  • shows that there’s an update in the source control pane [1]
                  • the workspace can be updated with that new commit [1]
              • {warning} reverting a data item to an older version might break the existing data and could possibly require dropping the data or the operation might fail [1]
              • {recommendation} check dependencies in advance before reverting changes back [1]
          • {concept} private workspace
            • a workspace that provides an isolated environment [1]
            • allows working in isolation, in a separate workspace [1]
            • {prerequisite} the workspace is assigned to a Fabric capacity [1]
            • {prerequisite} access to data to work in the workspace [1]
            • {step} create a new branch from the main branch [1]
              • allows to have most up-to-date version of the content [1]
              • can be used for any future branch created by the user [1]
                • when a sprint is over, the changes are merged and one can start a fresh new task [1]
                  • switch the connection to a new branch on the same workspace
                • the approach can also be used when a bug needs to be fixed in the middle of a sprint [1]
              • {validation} connect to the correct folder in the branch to pull the right content into the workspace [1]
          • {best practice} make small incremental changes that are easy to merge and less likely to get into conflicts [1]
            • update the branch to resolve the conflicts first [1]
          • {best practice} change workspace’s configurations to enable productivity [1]
            • e.g. connections between items, connections to different data sources, or changes to parameters on a given item [1]
          • {recommendation} make sure you're working with the supported structure of the item you're authoring [1]
            • if you’re not sure, first clone a repo with content already synced to a workspace, then start authoring from there, where the structure is already in place [1]
          • {constraint} a workspace can only be connected to a single branch at a time [1]
            • {recommendation} treat this as a 1:1 mapping [1]
        • {stage} test
          • {best practice} simulate a real production environment for testing purposes [1]
            • {alternative} simulate this by connecting Git to another workspace [1]
          • factors to consider for the test environment
            • data volume
            • usage volume
            • production environment’s capacity
              • stage and production should have the same (minimal) capacity [1]
                • using the same capacity as production can make production unstable during load testing [1]
                  • {recommendation} test using a different capacity similar in resources to the production capacity [1]
                  • {recommendation} use a capacity that allows to pay only for the testing time [1]
                    • allows to avoid unnecessary costs [1]
          • {best practice} use deployment rules with a real-life data source
            • {recommendation} use data source rules to switch data sources in the test stage, or parameterize the connections if not working through deployment pipelines [1] (see the sketch after these notes)
            • {recommendation} separate the development and test data sources [1]
            • {recommendation} check related items
              • the changes made can also affect the dependent items [1]
            • {recommendation} verify that the changes don’t affect or break the performance of dependent items [1]
              • via impact analysis.
          • {operation} update data items in the workspace
            • imports items’ definition into the workspace and applies it on the existing data [1]
            • the operation is the same for Git and deployment pipelines [1]
            • {recommendation} know in advance what the changes are and what impact they have on the existing data [1]
            • {recommendation} use commit messages to describe the changes made [1]
            • {recommendation} upload the changes first to a dev or test environment [1]
              • {benefit} allows to see how that item handles the change with test data [1]
            • {recommendation} check the changes on a staging environment, with real-life data (or as close to it as possible) [1]
              • {benefit} allows to minimize the unexpected behavior in production [1]
            • {recommendation} consider the best timing when updating the Prod environment [1]
              • {benefit} minimize the impact errors might cause on the business [1]
            • {recommendation} perform post-deployment tests in Prod to verify that everything works as expected [1]
            • {recommendation} have a deployment, respectively a recovery plan [1]
              • {benefit} allows to minimize the effort, respectively the downtime [1]
        • {stage} production
          • {best practice} let only specific people manage sensitive operations [1]
          • {best practice} use workspace permissions to manage access [1]
            • applies to all BI creators for a specific workspace who need access to the pipeline
          • {best practice} limit access to the repo or pipeline by enabling permissions only for users who are part of the content creation process [1]
          • {best practice} set deployment rules to ensure production stage availability [1]
            • {goal} ensure the data in production is always connected and available to users [1]
            • {benefit} allows deployments to run while minimizing the downtime
            • applies to data sources and parameters defined in the semantic model [1]
          • deployment into production using Git branches
            • {recommendation} use release branches [1]
              • requires changing the connection of the workspace to the new release branch before every deployment [1]
              • if the build or release pipeline requires to change the source code, or run scripts in a build environment before deployment, then connecting the workspace to Git won't help [1]
          • {recommendation} after deploying to each stage, make sure to change all the configuration specific to that stage [1]
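        A minimal Python sketch of the stage-specific configuration idea above; all names, endpoints and capacities are made up, and in Fabric this would typically be expressed through deployment rules or parameterized connections rather than application code.

          # hypothetical per-stage settings applied right after deploying to a stage
          from dataclasses import dataclass

          @dataclass
          class StageConfig:
              sql_endpoint: str    # data source the items should point to in this stage
              capacity_name: str   # capacity assigned to the stage's workspace

          STAGES = {
              "dev":  StageConfig("dev-sql.example.com",  "F2-dev"),
              "test": StageConfig("test-sql.example.com", "F64-test"),
              "prod": StageConfig("prod-sql.example.com", "F64-prod"),
          }

          def config_for(stage: str) -> StageConfig:
              """Return the configuration to apply after deploying to the given stage."""
              return STAGES[stage]

          print(config_for("test").sql_endpoint)  # test-sql.example.com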

        References:
        [1] Microsoft Learn (2025) Fabric: Best practices for lifecycle management in Fabric [link]
        [2] Microsoft Learn (2025) Fabric: CI/CD for pipelines in Data Factory in Microsoft Fabric [link]
        [3] Microsoft Learn (2025) Fabric: Choose the best Fabric CI/CD workflow option for you [link]

        Acronyms:
        API - Application Programming Interface
        BI - Business Intelligence
        CI/CD - Continuous Integration and Continuous Deployment
        VS - Visual Studio

12 April 2025

🏭🗒️Microsoft Fabric: Copy job in Data Factory [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 11-Apr-2025

[Microsoft Fabric] Copy job in Data Factory
        • {def} elevates the data ingestion experience to a more streamlined and user-friendly process, from any source to any destination [1]
          • {benefit} simplifies data ingestion with built-in patterns for batch and incremental copy, eliminating the need for pipeline creation [1]
            • across cloud data stores [1]
            • from on-premises data stores behind a firewall [1]
            • within a virtual network via a gateway [1]
        • {benefit} provides seamless data integration 
          • through over 100 built-in connectors [3]
          • provides essential tools for data operations [3]
        • {benefit} provides intuitive experience
          • easy configuration and monitoring [1]
        • {benefit} efficiency
          • enable incremental copying effortlessly, reducing manual intervention [1]
        • {benefit} less resource utilization and faster copy durations
          • flexibility to control data movement [1]
            • choose which tables and columns to copy
            • map the data
            • define read/write behavior
              • set schedules that fit requirements [1]
          • applies to one-time or recurring jobs [1]
        • {benefit} robust performance
          • the serverless setup enables data transfer with large-scale parallelism
          • maximizes data movement throughput [1]
            • fully utilizes network bandwidth and data store IOPS for optimal performance [3]
        • {feature} monitoring
          • once a job is executed, users can monitor its progress and metrics through either [1] 
            • the Copy job panel
              • shows data from the most recent runs [1]
            • reports several metrics
              • status
              • rows read
              • rows written
              • throughput
            • the Monitoring hub
              • acts as a centralized portal for reviewing runs across various items [4]
        • {mode} full copy
          • copies all data from the source to the destination at once
        • {mode|preview} incremental copy
          • the initial job run copies all data, and subsequent job runs only copy changes since the last run [1]
          • an incremental column must be selected for each table to identify changes [1]
            • used as a watermark
              • allows comparing its value with the value from the last run, in order to copy only the new or updated data [1] (a minimal sketch of this logic follows these notes)
              • the incremental column can be a timestamp or an increasing INT [1]
            • {scenario} copying from a database
              • new or updated rows will be captured and moved to the destination [1]
            • {scenario} copying from a storage store
              • new or updated files identified by their LastModifiedTime are captured and moved to the destination [1]
            • {scenario} copy data to storage store
              • new rows from the tables or files are copied to new files in the destination [1]
                • files with the same name are overwritten [1]
            • {scenario} copy data to database
              • new rows from the tables or files are appended to destination tables [1]
                • the update method can be set to merge or overwrite [1]
        • {default} appends data to the destination [1]
          • the update method can be adjusted to 
            • {operation} merge
              • a key column must be provided
                • {default} the primary key is used, if available [1]
            • {operation} overwrite
        • availability 
          • the same regional availability as the pipeline [1]
        • billing meter
          • Data Movement, with an identical consumption rate [1]
        • {feature} robust Public API
          • {benefit} allows to automate and manage Copy Job efficiently [2]
        • {feature} Git Integration
          • {benefit} allows to leverage Git repositories in Azure DevOps or GitHub [2]
          • {benefit} allows to seamlessly deploy Copy Job with Fabric’s built-in CI/CD workflows [2]
        • {feature|preview} VNET gateway support
          • enables secure connections to data sources within virtual network or behind firewalls
            • Copy Job can be executed directly on the VNet data gateway, ensuring seamless and secure data movement [2]
        • {feature} upsert to Azure SQL Database
        • {feature} overwrite to Fabric Lakehouse
        • {feature} [Jul-2025] native CDC
          • enables efficient and automated replication of changed data including inserted, updated and deleted records from a source to a destination [5]
            •  ensures destination data stays up to date without manual effort
              • improves efficiency in data integration while reducing the load on source systems [5]
            • see Data Movement - Incremental Copy meter
              •  consumption rate of 3 CU
            • {benefit} zero manual intervention
              • automatically captures incremental changes directly from the source [5]  
            • {benefit} automatic replication
              • keeps destination data continuously synchronized with source changes [5]  
            • {benefit} optimized performance
              • processes only changed data
                • reduces processing time and minimizing load on the source [5]
            • smarter incremental copy 
              • automatically detects CDC-enabled source tables and allows to select either CDC-based or watermark-based incremental copy for each table [5]
          • applies to 
            • CDC-enabled tables
              • CDC automatically captures and replicates actions on data
            • non-CDC-enabled tables
              • Copy Job detects changes by comparing an incremental column against the last run [5]
                • then merges or appends the changed data to the destination based on configuration [5]
          • supported connectors
            • ⇐ applies to sources and destinations
            • Azure SQL DB [5]
            • On-premises SQL Server [5]
            • Azure SQL Managed Instance [5]
        • {enhancement} column mapping for simple data modification to storage as destination store [2]
        • {enhancement} data preview to help select the right incremental column  [2]
        • {enhancement} search functionality to quickly find tables or columns  [2]
        • {enhancement} real-time monitoring with an in-progress view of running Copy Jobs  [2]
        • {enhancement} customizable update methods & schedules before job creation [2]
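        A rough Python sketch of the watermark-based incremental copy described above, followed by the merge behavior of the update method. The table layout, column names and key are illustrative only; the Copy job handles all of this internally once an incremental column and an update method are selected.

          from datetime import datetime

          def incremental_rows(rows: list[dict], last_watermark: datetime) -> tuple[list[dict], datetime]:
              """Select only the rows changed since the last run and compute the new watermark."""
              changed = [r for r in rows if r["ModifiedAt"] > last_watermark]
              new_watermark = max((r["ModifiedAt"] for r in changed), default=last_watermark)
              return changed, new_watermark

          def merge_by_key(destination: dict[int, dict], changed: list[dict]) -> None:
              """Merge update method: update existing rows by key, append the new ones."""
              for row in changed:
                  destination[row["Id"]] = row  # overwrite if the key exists, insert otherwise

          source = [
              {"Id": 1, "ModifiedAt": datetime(2025, 4, 10)},
              {"Id": 2, "ModifiedAt": datetime(2025, 4, 12)},
          ]
          delta, watermark = incremental_rows(source, last_watermark=datetime(2025, 4, 11))
          destination: dict[int, dict] = {1: {"Id": 1, "ModifiedAt": datetime(2025, 4, 10)}}
          merge_by_key(destination, delta)  # only Id 2 is transferred; the watermark moves to 2025-04-12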

        References:
        [1] Microsoft Learn (2025) Fabric: What is the Copy job in Data Factory for Microsoft Fabric? [link]
        [2] Microsoft Fabric Updates Blog (2025) Recap of Data Factory Announcements at Fabric Conference US 2025 [link]
        [3] Microsoft Fabric Updates Blog (2025) Fabric: Announcing Public Preview: Copy Job in Microsoft Fabric [link]
        [4] Microsoft Learn (2025) Fabric: Learn how to monitor a Copy job in Data Factory for Microsoft Fabric [link]
        [5] Microsoft Fabric Updates Blog (2025) Fabric: Simplifying Data Ingestion with Copy job – Introducing Change Data Capture (CDC) Support (Preview) [link]
        [6] Microsoft Learn (2025) Fabric: Change data capture (CDC) in Copy Job (Preview) [link]

        Resources:
        [R1] Microsoft Learn (2025) Fabric: Learn how to create a Copy job in Data Factory for Microsoft Fabric [link]

        Acronyms:
        API - Application Programming Interface
        CDC - Change Data Capture
        CI/CD - Continuous Integration and Continuous Deployment
        CU - Capacity Unit
        DevOps - Development & Operations
        DF - Data Factory
        IOPS - Input/Output Operations Per Second
        VNet - Virtual Network

09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premise SQL Server, for other platforms please refer to the documentation.

Last updated: 9-Apr-2025

[SQL Server 2005] Tempdb database

        • {def} system database available as a global resource to all users connected to a Database Engine instance [1]
          • does not persist after SQL Server shuts down 
            • created at restart using the specifications from the model database 
            • doesn’t need crash recovery
            • requires rollback capabilities
              • {scenario} when updating a row in a global temporary table and rolling back the transaction [12]
                • there's no way to undo this change unless the ‘before value’ of the update was logged [12]
                • there is no need to log the ‘after value’ of the update because this value is only needed when the transaction needs to be ‘redone’ which happens during database recovery [12]
              • {scenario} during an Insert operation
                • only redo information is needed
                • when inserting a row into, say, a global temporary table, the actual ‘row value’ is not logged, because SQL Server does not need the row to undo the operation; it only needs to set the offsets within the page appropriately or, if the insert caused a new page to be allocated, to deallocate that page [12]
                • not all objects in TempDB are subject to logging [18]
          • {characteristic} critical database
            • ⇒ needs to be configured adequately 
            •  has the highest level of create and drop actions [5]
            • under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks [5]
            • workers use tempdb like any other database
              • any worker can issue I/O to and from tempdb as needed [5]
          • if the tempdb database cannot be created, SQL Server will not start [7]
            • {recommendation} start SQL Server by using the (-f) startup parameter
        • files
          • data file:
            • {default} tempdb.mdf
          • log file
            • {default} templog.ldf
        • {operation} changing the tempdb database
          • {step} use ALTER database statement and MODIFY FILE clause to change the physical file names of each file in the tempdb database to refer to the new physical location [7]
          • {step} stop and restart SQL Server
        • {operation} create file group
          • only one file group for data and one file group for logs are allowed.
          • {default} the number of files is set to 1
          • can be a high contention point for internal tracking structures
            •  often it's better to create multiple files for tempdb
          • multiple files can be created for each file group
            • {recommendation} set the number of files to match the number of CPUs that are configured for the instance (see the sketch at the end of these notes)
              • ⇐ if the number of logical processors is less than or equal to eight [1]
                • otherwise, use eight data files [1]
                • if contention is still observed, increase the number of data files by multiples of four until the contention decreases to acceptable levels, or make changes to the workload [1]
              • ⇐ it’s not imperative

            • {scenario} add a file per CPU (aka multiple files)
              • allows the file group manager to 
                • pick the next file 
                  • there are some optimizations here for tempdb to avoid contention and skip to the next file when under contention [4]
                •  then use the target location to start locating free space [4]                
              • ⇐ SQL Server creates a logical scheduler for each of the CPUs presented to it (logical or physical)
                • allows each of the logical schedulers to loosely align with a file [4]
                  • since there can be only 1 active worker per scheduler, this allows each worker to have its own tempdb file (at that instant) and avoid allocation contention [4]
              • internal latching and other activities achieve better separation (per file) 
                • ⇒ the workers don’t cause a resource contention on a single tempdb file
                  • rare type of contention [5]
              • {misconception}
                • leads to new I/O threads per file [5]       
            • {scenario} add more data files 
              • may help to solve potential performance problems that are due to I/O operations. 
              • helps to avoid a latch contention on allocation pages
                • manifested as a UP-latch
            • {downside} having too many files 
              • increases the cost of file switching
              • requires more IAM pages
              • increases the manageability overhead. 
            • {recommendation} configure the size of the files to be equal
              • ⇐ to better use the allocation mechanism (proportional fill),
            • {recommendation} create tempdb files striped across fast disks 
          • the size of the tempdb affects the performance of the database 
          • {event} server restarts
            • the tempdb file size is reset to the configured value (the default is 8 MB).
          • {feature} auto grow 
            • temporary 
              • ⇐ unlike other types of databases
            • when the tempdb grows, all transactional activity may come to a halt 
              • because TempDB is used by most operations and these activities will get blocked until more disk space gets allocated to the TempDB [12]
              • {recommendation} pre-allocate the size of the TempDB that matches the needs of workload [12]
            • {recommendation} should only be used more for exceptions rather than as a strategy [12]
          • transactions
            • lose the durability attribute
            • can be rolled back
              • there is no need to REDO them because the contents of tempdb do not persist across server restarts
                • because the transaction log does not need to be flushed, transactions are committed faster in tempdb than in user databases
              • most internal operations on tempdb do not generate log records because there is no need to roll back
          • {restriction} add filegroups [1]
          • {restriction} remove the primary filegroup, primary data file, or log file [1]
          • {restriction} renaming the database or primary filegroup [1]
          • {restriction} back up or restore the database [1]
          • {restriction} change collation [1]
            • the default collation is the server collation [1]
          • {restriction} change the database owner [1]
            • owned by sa
          • {restriction} create a database snapshot [1]
          • {restriction} drop the database [1]
          • {restriction} drop the guest user from the database [1]
          • {restriction} enable CDC
          • {restriction} participate in database mirroring [1]
          • {restriction} can only be configured in the simple recovery model
            • ⇒ the value can’t be changed
          • {restriction} auto shrink is not allowed for tempdb
            • database shrink and file shrink capabilities are limited
              •  because many of the hidden objects stored in tempdb cannot be moved by shrink operations
          • {restriction} the database CHECKSUM option cannot be enabled
          • {restriction} database snapshots cannot be created
          • {restriction} DBCC CHECKALLOC and DBCC CHECKCATALOG are not supported.
            • only offline checking for DBCC CHECKTABLE is performed
              • ⇒ TAB-S lock is needed
              • there are internal consistency checks that occur when tempdb is in use
                • if the checks fail, the user connection is broken and the tempdb space used by the connection is freed. 
          • {restriction} set the database to OFFLINE
          • {restriction} set the database or primary filegroup to READ_ONLY [1]
        • used by
          • {feature} query
          • {feature} temporary tables
          • {feature} table variables
          • {feature} table-valued functions
          • {feature} user-defined functions
          • {feature} online/offline index creation
          • {feature} triggers 
          • {feature} cursors
          • {feature} RCSI
          • {feature} MARS
          • {feature} DBCC CHECK
          • {feature} LOB parameters 
          • {feature} Service Broker and event notification
          • {feature} XML and LOB variable
          • {feature} query notifications
          • {feature} database mail
        • used to store
          • user objects
            • objects explicitly created
              • user objects that can be created in a user database can also be created in tempdb [1]
                • {limitation} there's no durability guarantee [1]
                • {limitation} dropped when the Database Engine instance restarts [1]
            • {type} global and local temporary tables 
              • correspond to ## and # tables and table variables created explicitly by the application [12]
                • REDO information is not logged
            • {type} indexes on global and local temporary tables
            • {type} table variables
            • {type} tables returned in table-valued functions
            • {type} cursors
            • improved caching for temporary objects
          • temp objects
            • only cached when none of the following conditions is violated: [12]
              • named constraints are not created
            • DDL statements that affect the table are not run after the temp table has been created
              • e.g. CREATE INDEX or CREATE STATISTICS statements
            • not created by using dynamic SQL
              • e.g. sp_executesql N'create table #t(a int)'
          • internal objects
            • objects created internally by SQL Server
              • each object uses a minimum of nine pages [1]
                • an IAM page and an eight-page extent [1]
            • {type} work tables
              • store intermediate results for spools, cursors, sorts, and temporary LOB storage [1]
            • {type} work files
              • used for hash join or hash aggregate operations [1]
            • {type} intermediate sort results
              • used for operations 
                • creating or rebuilding indexes
                  • if SORT_IN_TEMPDB is specified
                • certain GROUP BY, ORDER BY
                • UNION queries
            • {restriction} applications cannot directly insert into or delete rows from them
            • {restriction} metadata is stored in memory
            • {restriction} metadata does not appear in system catalog views such as sys.all_objects.
            • {restriction} considered to be hidden objects
            • {restriction} updates to them do not generate log records
            • {restriction} page allocations do not generate log records unless on a sort unit. If the statement fails, these objects are deallocated. 
            • each object occupies at least nine pages (one IAM page and eight data pages) in tempdb
            • used
              • to store intermediate runs for sort
              • to store intermediate results for hash joins and hash aggregates.
              • to store XML variables or other LOB data type variables
                • e.g. text, image, ntext, varchar(max), varbinary(max), and all others
              • by queries that need a spool to store intermediate results
              • by keyset cursors to store the keys
              • by static cursors to store a query result
              • by Service Broker to store messages in transit
              • by INSTEAD OF triggers to store data for internal processing
              • by any feature that uses the above mentioned operations 
          • version stores
            • collections of data pages that hold the data rows that support row versioning [1]
            • {type} common version store
            • {type} online index build version store
        • storage
          • I/O characteristics
            • {requirement} read after writes
              • {def} the ability of the subsystem to service read requests with the latest data image when the read is issued after any write is successfully completed [7]
            • {recommendation} writer ordering
              • {def} the ability of the subsystem to maintain the correct order of write operations [7]
            • {recommendation} torn I/O prevention
              • {def} the ability of the system to avoid splitting individual I/O requests [7]
            • {requirement} physical sector alignment and size
              • devices are required to support sector attributes permitting SQL Server to perform writes on physical sector-aligned boundaries and in multiples of the sector size [7]
          • can be put on specialty systems
            • ⇐  they can’t be used for other databases
            • RAM disks
              • non-durable media
              • support double RAM cache
                • one in the buffer pool and one on the RAM disk
                  •  directly takes away from the buffer pool’s total possible size and generally decreases the performance [7]
              • give up RAM
                • implementations of RAM disks and RAM-based file caches
            • solid state
              • high speed subsystems
              • {recommendation} confirm with the product vendor to guarantee full compliance with SQL Server I/O needs [7]
          • sort errors were frequently solved by moving tempdb to a non-caching local drive or by disabling the read caching mechanism [8]
        • {feature} logging optimization 
          • avoids logging the "after value" in certain log records in tempdb
            • when an UPDATE operation is performed without this optimization, the before and after values of the data are recorded in the log file 
          • can significantly reduce the size of the tempdb log as well as reduce the amount of I/O traffic on the tempdb log device
        • {feature} instant data file initialization 
          • doesn’t zero out the NTFS file when the file is created or when the size of the file is increased
          • {benefit} minimizes overhead significantly when tempdb needs to auto grow. 
            • without this, auto grow could take a long time and lead to application timeout.
          • reduces the impact of database creation [5]
            • because zeros don’t have to be stamped in all bytes of a database file, only the log files [5]
              • reduces the gain from using multiple threads during database creation. [5]
        • {feature} proportional fill optimization
          • reduces UP latch contention in tempdb
          • when there are multiple data files in tempdb, each file is filled in proportion to the free space that is available in the file so that all of the files fill up at about the same time
            •  accomplished by removing a latch that was taken during proportional fill.
        • {feature} deferred drop in tempdb
          • when a large temporary table is dropped by an application, it is handled by a background task and the application does not have to wait
            • ⇒ faster response time to applications.
        • {feature} worktable caching 
          • {improvement} when a query execution plan is cached, the work tables needed by the plan are not dropped across multiple executions of the plan but merely truncated
            • ⇐ in addition, the first nine pages for the work table are kept.
        • {feature} caches temporary objects
          • when table-valued functions, table variables, or local temporary tables are used in a stored procedure, function, or trigger, the frequent drop and create of these temporary objects can be time consuming
            •  this can cause contentions on tempdb system catalog tables and allocation pages 
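        A rough Python sketch of the data-file guidance and of proportional fill mentioned above; the numbers are illustrative only, as the actual behavior is internal to SQL Server.

          def recommended_data_files(logical_cpus: int) -> int:
              """Start with one data file per logical CPU, capped at eight."""
              return min(logical_cpus, 8)

          def proportional_fill(free_mb: list[float], pages_to_allocate: int) -> list[int]:
              """Distribute new allocations in proportion to each file's free space."""
              total_free = sum(free_mb)
              return [round(pages_to_allocate * f / total_free) for f in free_mb]

          print(recommended_data_files(16))              # 8
          print(proportional_fill([100, 100, 50], 100))  # [40, 40, 20] - equally sized files fill at the same rate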

        References
        [1] Microsoft Learn (2024) SQL Server: Tempdb database [link]
        [2] Wei Xiao et al (2006) Working with tempdb in SQL Server 2005
        [4] CSS SQL Server Engineers (2009) SQL Server TempDB – Number of Files – The Raw Truth [link]
        [5] SQL Server Support Blog (2007) SQL Server Urban Legends Discussed [link]
        [6] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
        [7] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
        [8] Microsoft Learn (2023) SQL Server: SQL Server diagnostics detects unreported I/O problems due to stale reads or lost writes [link]
        [12a] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: Simple Example), by Sunil Agarwal [link]
        [12b] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: logical structure), by Sunil Agarwal [link]
        [18] Simple Talk (2020) Temporary Tables in SQL Server, by Phil Factor [link]
        [19] Microsoft Learn (2023) SQL Server: Recommendations to reduce allocation contention in SQL Server tempdb database [link]

        Acronyms:
        DB - database
        DDL - Data Definition Language
        I/O - Input/Output
        LOB - large object
        MARS - Multiple Active Result Sets
        NTFS - New Technology File System
        RAM - Random-Access Memory
        RCSI - Read Committed Snapshot Isolation

07 April 2025

🏭🗒️Microsoft Fabric: User Data Functions (UDFs) in Python [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 7-Apr-2025

[Microsoft Fabric] User Data Functions (UDFs)
        • {def} a platform that allows customers to host and run custom logic on Fabric from different types of items and data sources [1]
        • empower data developers to write custom logic and embed it into their Fabric ecosystem [1]
          • the logic can include internal algorithms and libraries [1]
          • {benefit} allows handling complex transformations, optimizing performance, and integrating diverse data sources beyond the capabilities of low-code tools [5]
            • via public libraries from PyPI
            • via native Fabric integrations to 
              • connect to Fabric data sources
                • e.g. warehouse, lakehouse or SQL Database
              • invoke functions from other Fabric items
                • e.g. notebooks, Power BI reports, data pipelines
            • via editing functions directly in the Fabric portal [1]
              • using the in-browser tooling or the VS Code extension
            • supports the Python 3.11 runtime [1]
          • {goal} reusability
            • by creating libraries of standardized functionality [1]
              • can be used in many solutions across the organization [1]
          • {goal} customization
            • the applications are tailored to customers’ needs [1]
          • {goal} encapsulation
            • functions can perform various tasks to build sophisticated workflows [1]
          • {goal} external connectivity
            • data functions can be invoked from external client applications using a REST endpoint [1]
            • allows integrations with external systems [1]
        • {feature} programming model
          • SDK that provides the necessary functionality to author and publish runnable functions in Fabric [4]
          • {benefit} allows to seamlessly integrate with other items in the Fabric ecosystem [4]
            • e.g. Fabric data sources
          • {concept} user data functions item 
            • contains one or many functions that can be invoked from the Fabric portal using the provided REST endpoint [4]
              • from another Fabric item, or from an external application 
            • each function is a method in a Python script that accepts parameters and returns an output to the invoker [4]
          • {component} fabric.functions library 
            • provides the code needed to create user data functions in Python [4]
            • imported in the template by default [4]
          • {method} fn.UserDataFunctions() 
            • provides the execution context [4]
            • added at the beginning of the code file in all new user data functions items, before any function definitions [4]
          • {concept} functions
            • every function is identified with a @udf.function() decorator
              • can be invoked individually from the portal or an external invoker [4]
              • functions without the decorator can't be invoked directly [4]
        • {feature} invocation logs
          • function invocations are logged
          • {benefit} allows checking the status or performing debugging [3]
          • {limitation} can take a few minutes to appear [3]
            • {recommendation} refresh the page after a few minutes [3]
          • {limitation} daily ingestion limit: 250 MB [3]
            • resets the next day
              • a new log is made available [3]
          • {limitation} logs are sampled while preserving a statistically correct analysis of application data [3]
            • the sampling is done by User data functions to reduce the volume of logs ingested [3]
            • if some logs are partially missing, it might be because of sampling [3]
          • {limitation} supported log types: information, error, warning, trace [3]
        • {feature} Manage connections
          • {limitation} only supports connecting to Fabric-native data sources [2]
        • {limitation} only available in a subset of Fabric regions [2]
        • {limitation} editable by the owner only [2]
          • can only be modified and published by the user who is the owner of the User Data Functions Fabric item [2]
        • {limitation} adds further keywords to the list of reserved keywords
          • namely: req, context, reqInvocationId
          • can't be used as parameter names or function names [2]

        References:
        [1] Microsoft Learn (2025) Microsoft Fabric: What is Fabric User data functions (Preview)? [link]
        [2] Microsoft Learn (2025) Microsoft Fabric: Service details and limitations of Fabric User Data Functions [link]
        [3] Microsoft Learn (2025) Microsoft Fabric: User data functions invocation logs (Preview) [link]
        [4] Microsoft Learn (2025) Microsoft Fabric: Fabric User data functions programming model overview (Preview) [link]
        [5] Microsoft Fabric Updates Blog (2025) Announcing Fabric User Data Functions (Preview) [link]
        [6] Microsoft Learn (2025) Microsoft Fabric: Use the Functions activity to run Fabric user data functions and Azure Functions [link]

        Resources:
        [R1] Microsoft Fabric Updates Blog (2025) Utilize User Data Functions in Data pipelines with the Functions activity (Preview) [link]

        Acronyms:
        PyPI - Python Package Index
        REST - Representational State Transfer
        SDK - Software Development Kit
        UDF - User Data Function
        VS - Visual Studio

        06 April 2025

        🏭🗒️Microsoft Fabric: Query Optimizer in Warehouse [Notes] 🆕

        Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

        Last updated: 6-Apr-2025

        [Microsoft Fabric] Hints in Warehouse
        • {def} keywords that users can add to SQL statements to provide additional information or instructions to the query optimizer [2]
          • options or strategies specified for enforcement by the SQL Server query processor [1]
            • applicable to SELECT, INSERT, UPDATE, or DELETE statements [1]
        • {benefit} help improve the performance, scalability, or consistency of queries by overriding the default behavior of the query optimizer [2]
        • {type} join hints
          • specify which join strategy/algorithm is used between two tables [3] (see the sketches after these notes)
            • improves the performance of queries that involve large or complex joins [2]
            • specified in the FROM clause of a query [1]
            • if a join hint is specified for any two tables, the query optimizer automatically enforces the join order for all joined tables in the query [3]
              • based on the position of the ON keywords [3]
                • when a CROSS JOIN is used without the ON clause, parentheses can be used to indicate the join order [3]
          • looping
            • via LOOP JOIN
            • {restriction} can't be specified together with RIGHT or FULL as a join type [3]
          • hashing
            • via HASH JOIN
          • merging
            • via MERGE JOIN
          • REPLICATE
            • causes a broadcast move operation
              • i.e. a specific table is replicated across all distribution nodes [2]
            • with INNER or LEFT join
              • the broadcast move operation will replicate the right side of the join to all nodes [2]
            • with RIGHT join
              • the broadcast move operation will replicate the left side of the join to all nodes [2]
            • with FULL join
              • an estimated plan cannot be created [2]
          • REDISTRIBUTE [(columns_count)]
            • ensures two data sources are distributed based on JOIN clause columns [2]
            • handles multiple join conditions, specified by the first n columns in both tables, where n is the columns_count argument [2]
            • redistributing data optimizes query performance by evenly spreading data across nodes during intermediate steps of execution [2]
            • the (columns_count) argument is only supported in MF warehouse [2]
        • {type} query hint
          • specify that the indicated hints are used in the scope of a query [3]
            • affect all operators in the statement
              • [UNION] only the last query involving a UNION operation can have the OPTION clause [3]
            • specified as part of the OPTION clause [3]
            • Error 8622 occurs if one or more query hints cause the Query Optimizer not to generate a valid plan [3]
          • used via the OPTION clause at the end of a query [2]
            • followed by the name of the query hint and its optional parameters in parentheses [2]
            • multiple hints can be used in the same query, separated by commas
              • e.g. FORCE ORDER and MAX_GRANT_PERCENT
                • instruct the QO to preserve the join order specified in the query and to limit the memory grant to 20 percent of the available memory (worked out in the sketch after these notes)
          • {hint} HASH GROUP
            • specifies that the QO should use a hash-based algorithm for the GROUP BY operation [2]
            • {benefit} can improve the performance of queries that involve large or complex grouping sets [2]
          • {hint} ORDER GROUP
            • specifies that the QO should use a sort-based algorithm for the GROUP BY operation [2]
            • {benefit} can improve the performance of queries that involve small or simple grouping sets [2]
          • {hint} MERGE UNION
            • specifies that the QO should use a merge-based algorithm for the UNION or UNION ALL operation [2]
            • {benefit} can improve the performance of queries that involve sorted inputs [2]
          • {hint} HASH UNION
            • specifies that the query optimizer should use a hash-based algorithm for the UNION or UNION ALL operation [2]
            • {benefit} can improve the performance of queries that involve unsorted or large inputs [2]
          • {hint} CONCAT UNION
            • specifies that the QO should use a concatenation-based algorithm for the UNION or UNION ALL operation [2]
            • {benefit} can improve the performance of queries that involve distinct or small inputs [2]
          • {hint} FORCE ORDER
            • specifies that the QO should preserve the join order specified in the query [2]
            • {benefit} can improve the performance or consistency of queries that involve complex join conditions or hints [2]
          • {hint} FORCE SINGLE NODE PLAN/FORCE DISTRIBUTED PLAN
            • allows choosing whether to force a single-node plan or a distributed plan for the query's execution [2]
          • {hint} USE HINT
            • adds one or more extra hints to the query processor; the hints are specified by a hint name inside single quotation marks within the OPTION clause [2]
              • e.g. OPTION (USE HINT('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'))
            • used with several hint names, changing the behavior of CE derivation
              • {hint name} ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES
                • applies when calculating cardinality estimates for AND predicates for filters [2]
                • MF assumes full correlation among filters when a high level of underestimation on AND predicates for filters is observed [2]
                • [SQL Server] equivalent TF 4137 [4]
              • {hint name} ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES
                • applies when calculating cardinality estimates for AND predicates for filters  [2]
                • MF assumes full independence among filters [2]
                  • if a high level of overestimation on AND predicates for filters is observed, this hint can help produce a better estimate [2]
              • {hint name} ASSUME_PARTIAL_CORRELATION_FOR_FILTER_ESTIMATES
                • applies when calculating cardinality estimates for AND predicates for filters [2]
                • {default} MF assumes partial correlation among filters [2]
                  • ⇐ it is unlikely that this hint will help improve the estimates [2]
              • {hint name} ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS
                • applies when calculating cardinality estimate for joins [2]
                • uses Simple Containment assumption instead of the default Base Containment assumption [2]
                • [SQL Server] equivalent TF 9476 [4]
        • {type} table hints
          • none enforced currently
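
        To illustrate the hints above, two minimal T-SQL sketches against hypothetical tables (dbo.FactSales, dbo.DimDate and dbo.DimProduct are made up). The first forces a hash join by placing the HASH keyword between the join type and JOIN, which also fixes the join order:

          -- force a hash join between the two (hypothetical) tables
          SELECT d.CalendarYear,
                 SUM(f.SalesAmount) AS TotalSales
          FROM dbo.FactSales AS f
          INNER HASH JOIN dbo.DimDate AS d
              ON f.DateKey = d.DateKey
          GROUP BY d.CalendarYear;

        The second combines the FORCE ORDER and MAX_GRANT_PERCENT example from the notes with a USE HINT in a single OPTION clause at the end of the statement:

          -- preserve the written join order, cap the memory grant at 20 percent
          -- and adjust the cardinality estimates for the AND-ed filter predicates
          SELECT p.ProductCategory,
                 SUM(f.SalesAmount) AS TotalSales
          FROM dbo.FactSales AS f
          INNER JOIN dbo.DimProduct AS p
              ON f.ProductKey = p.ProductKey
          WHERE f.DateKey >= 20250101
            AND f.SalesAmount > 0
          GROUP BY p.ProductCategory
          OPTION (FORCE ORDER,
                  MAX_GRANT_PERCENT = 20,
                  USE HINT('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'));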

        References:
        [1] Microsoft Learn (2025) SQL: Hints (T-SQL) [link]
        [2] Microsoft Fabric Updates Blog (2025) Hints in Fabric Data Warehouse [link]
        [3] Microsoft Learn (2025) SQL: OPTION clause (T-SQL) [link]
        [4] Microsoft Support (2016) KB3189813 - Update introduces USE HINT query hint argument in SQL Server 2016 [link]

        Resources:
        [R1] Microsoft Learn (2025) SQL Server: Query hints (T-SQL) [link]
        [R2] Most Useful Query Hints [link]

        Acronyms:
        CE - Cardinality Estimation
        MF - Microsoft Fabric
        QO - Query Optimizer
        TF - Trace Flag
