Showing posts with label processes.

13 June 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 20: The Negative Influence of AI on the Recruitment Process)

Prompt Engineering Series

Prompt: "write a blogpost of 600 words on how much AI influences the recruitment process"

Introduction

Artificial Intelligence (AI) has transformed the hiring landscape, making recruitment faster, more efficient, and data-driven. However, despite its advantages, AI also introduces significant drawbacks that negatively impact job seekers and hiring practices. From bias in algorithms to depersonalization of the hiring process, AI’s influence on recruitment is not always positive.

1. AI Can Reinforce Bias in Hiring

One of the biggest concerns with AI-driven recruitment is algorithmic bias. AI models are trained on historical hiring data, which may contain gender, racial, or socioeconomic biases. If AI systems are not carefully monitored, they can perpetuate discrimination rather than eliminate it.

For example, AI-powered resume screening tools may favor candidates from certain backgrounds while filtering out others. In some cases, AI has been found to reject qualified applicants simply because their resumes were formatted differently than expected.

2. AI Depersonalizes the Hiring Process

AI-driven recruitment removes the human touch from hiring decisions. Automated systems prioritize efficiency over personal interaction, leading to:

  • Lack of meaningful engagement between recruiters and candidates.
  • Automated rejection emails without feedback, leaving applicants frustrated.
  • "Ghosting" effect, where candidates never hear back after applying.

This depersonalization can make job seekers feel disconnected and undervalued, reducing trust in the hiring process.

3. Over-Reliance on AI Reduces Human Judgment

AI excels at processing large volumes of applications, but it lacks human intuition. Recruiters who rely too heavily on AI may:

  • Overlook candidates with unconventional career paths who could be valuable assets.
  • Ignore soft skills and cultural fit, which are crucial for team dynamics.
  • Lose critical hiring expertise, as AI replaces traditional recruitment strategies.

AI-driven hiring decisions prioritize data points over human potential, leading to missed opportunities for both employers and job seekers.

4. AI Raises Ethical and Privacy Concerns

AI recruitment tools collect and analyze vast amounts of personal data, raising concerns about:

  • Privacy violations, as candidates’ information is stored and processed without transparency.
  • Data security risks, where sensitive applicant details could be exposed.
  • Unregulated AI decision-making, leading to unfair hiring practices.

Without proper oversight, AI can compromise candidate privacy and create ethical dilemmas in recruitment.

5. AI Can Lead to Unfair Job Market Competition

AI-driven hiring favors candidates who optimize their resumes for algorithms, rather than those with genuine qualifications. This creates:

  • Unfair advantages for applicants who understand AI screening systems.
  • Disadvantages for candidates with unique skills that AI fails to recognize.
  • A hiring process that prioritizes keywords over real-world experience.

Job seekers must adapt to AI-driven hiring trends, but this shift does not always reflect true talent or potential.

Conclusion: AI in Recruitment Needs Human Oversight

While AI enhances efficiency, its negative influence on recruitment cannot be ignored. Bias, depersonalization, ethical concerns, and reduced human judgment highlight the risks of over-reliance on AI.

To ensure fair hiring practices, companies must balance AI automation with human oversight, ensuring that recruitment remains inclusive, ethical, and candidate-focused.

Disclaimer: The whole text was generated by Copilot on the first attempt. This is just an experiment to evaluate the feature’s ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

12 June 2025

🤖〽️Prompt Engineering: Copilot Unabridged (Part 19: How Much AI Influences the Recruitment Process)

Prompt Engineering Series

Prompt: "write a blogpost of 600 words on how much AI influences the recruitment process"

Introduction

Artificial Intelligence (AI) is revolutionizing the way companies hire, assess, and engage with candidates. From automated resume screening to predictive hiring analytics, AI is reshaping recruitment by making it faster, more efficient, and data-driven. But how much influence does AI truly have on the hiring process? Let’s explore the impact AI has on recruitment and what it means for employers and job seekers alike.

1. AI-Powered Resume Screening and Candidate Matching

One of the most significant ways AI influences recruitment is through automated resume screening. Traditional hiring methods require recruiters to manually sift through hundreds - or even thousands - of applications. AI streamlines this process by:

  • Parsing resumes using Natural Language Processing (NLP) to extract relevant skills and experience.
  • Matching candidates to job descriptions based on predefined criteria.
  • Eliminating human bias by focusing on qualifications rather than subjective preferences.

AI-driven Applicant Tracking Systems (ATS) are reported to reduce hiring time by up to 50% and help recruiters focus on top-tier candidates.
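
To make the matching step more concrete, below is a minimal sketch of how an ATS-like screener might rank resumes against a job description using TF-IDF and cosine similarity; the library (scikit-learn), the toy texts and the scoring approach are illustrative assumptions, not a description of any specific vendor’s product.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    job_description = "Data analyst with SQL, Python and Power BI reporting experience"

    resumes = {
        "candidate_a": "Five years of SQL and Power BI reporting, plus some Python scripting",
        "candidate_b": "Retail manager with team leadership and budgeting experience",
    }

    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([job_description, *resumes.values()])

    # similarity of each resume (rows 1..n) to the job description (row 0)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()
    for name, score in zip(resumes, scores):
        print(f"{name}: {score:.2f}")

A candidate whose resume is phrased differently from the job description scores low even if qualified, which is exactly the keyword-over-experience problem discussed in the previous post.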

2. AI in Candidate Engagement and Communication

AI-powered chatbots and virtual assistants are transforming candidate interactions. These tools:

  • Answer applicant questions instantly, improving engagement.
  • Schedule interviews automatically, reducing administrative workload.
  • Provide real-time feedback, enhancing the candidate experience.

AI-driven chatbots ensure seamless communication, making recruitment more efficient and accessible.

3. Predictive Analytics for Smarter Hiring Decisions

AI enables data-driven hiring by analyzing past recruitment trends and predicting which candidates are most likely to succeed. Predictive analytics helps companies:

  • Identify high-performing candidates based on historical data.
  • Reduce turnover rates by selecting employees with long-term potential.
  • Optimize hiring strategies to align with business goals.

AI-driven hiring decisions minimize costly hiring mistakes, improving workforce quality.
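
As a rough illustration of the predictive idea, the sketch below fits a simple classifier on a handful of hypothetical historical outcomes and scores new applicants; the features, labels and model choice (scikit-learn logistic regression) are assumptions for demonstration, and any real model would need far more data, validation and bias auditing.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # hypothetical history: years_experience, assessment_score, referred (1/0)
    X_history = np.array([
        [2, 70, 0],
        [5, 85, 1],
        [1, 60, 0],
        [7, 90, 1],
        [3, 75, 0],
        [6, 80, 1],
    ])
    y_history = np.array([0, 1, 0, 1, 0, 1])  # 1 = hired and retained past one year

    model = LogisticRegression().fit(X_history, y_history)

    # score two new applicants: probability of a "successful" outcome
    new_applicants = np.array([[4, 82, 0], [1, 55, 1]])
    print(model.predict_proba(new_applicants)[:, 1])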

4. AI in Video Interview Analysis

AI is now being used to analyze video interviews, assessing candidates beyond their resumes. AI tools evaluate:

  • Speech patterns and tone to gauge confidence and communication skills.
  • Facial expressions and body language to assess engagement.
  • Keyword usage to determine alignment with company values.

AI-powered video analysis provides deeper insights into candidate suitability, helping recruiters make more informed decisions.

5. AI’s Role in Reducing Hiring Bias

Bias in hiring has long been a challenge, but AI offers solutions to create fairer recruitment processes. AI helps by:

  • Standardizing candidate evaluations, removing subjective biases.
  • Ensuring diverse hiring pools by focusing on skills rather than demographics.
  • Providing objective assessments, reducing discrimination risks.

However, AI is not perfect - if trained on biased data, it can reinforce existing inequalities. Companies must monitor AI algorithms to ensure fairness.

6. The Future of AI in Recruitment

AI is reshaping recruitment, but it won’t replace human recruiters entirely. Instead, AI will:

  • Enhance decision-making, allowing recruiters to focus on strategy.
  • Improve efficiency, reducing hiring costs and time.
  • Personalize candidate experiences, making hiring more engaging.

As AI adoption grows, companies must balance automation with human insight, ensuring ethical and effective hiring practices.

Conclusion: AI’s Influence on Recruitment Is Unstoppable

AI is revolutionizing recruitment, making hiring faster, smarter, and more data-driven. From resume screening to predictive analytics, AI enhances efficiency while reducing bias and improving candidate engagement. However, human oversight remains essential to ensure AI-driven hiring is fair, ethical, and effective.

Disclaimer: The whole text was generated by Copilot on the first attempt. This is just an experiment to evaluate the feature’s ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

26 April 2025

🏭🗒️Microsoft Fabric: Deployment Pipelines [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 26-Apr-2025

[Microsoft Fabric] Deployment Pipelines

  • {def} a structured process that enables content creators to manage the lifecycle of their organizational assets [5]
    • enable creators to develop and test content in the service before it reaches the users [5]
      • can simplify the deployment process to development, test, and production workspaces [5]
      • one Premium workspace is assigned to each stage [5]
      • each stage can have 
        • different configurations [5]
        • different databases or different query parameters [5]
  • {action} create pipeline
    • from the deployment pipelines entry point in Fabric [5]
      • creating a pipeline from a workspace automatically assigns it to the pipeline [5]
    • {action} define how many stages it should have and what they should be called [5]
      • {default} has three stages
        • e.g. Development, Test, and Production
        • the number of stages can be set anywhere between 2 and 10
        • {action} add another stage
        • {action} delete stage
        • {action} rename stage 
          • by typing a new name in the box
        • {action} share a pipeline with others
          • users receive access to the pipeline and become pipeline admins [5]
        • ⇐ the number of stages is permanent [5]
          • can't be changed after the pipeline is created [5]
    • {action} add content to the pipeline [5]
      • done by assigning a workspace to the pipeline stage [5]
        • the workspace can be assigned to any stage [5]
    • {action|optional} make a stage public
      • {default} the final stage of the pipeline is made public
      • a consumer of a public stage without access to the pipeline sees it as a regular workspace [5]
        • without the stage name and deployment pipeline icon on the workspace page next to the workspace name [5]
    • {action} deploy to an empty stage
      • when finishing the work in one pipeline stage, the content can be deployed to the next stage [5] 
        • deployment can happen in any direction [5]
      • {option} full deployment
        • deploy all content to the target stage [5] (see the REST-call sketch after these notes)
      • {option} selective deployment
        • allows selecting the content to deploy to the target stage [5]
      • {option} backward deployment 
        • deploy content from a later stage to an earlier stage in the pipeline [5] 
        • {restriction} only possible when the target stage is empty [5]
    • {action} deploy content between stages [5]
      • content can be deployed even if the next stage has content
        • paired items are overwritten [5]
    • {action|optional} create deployment rules
      • when deploying content between pipeline stages, allow changes to content while keeping some settings intact [5] 
      • once a rule is defined or changed, the content must be redeployed
        • the deployed content inherits the value defined in the deployment rule [5]
        • the value always applies as long as the rule is unchanged and valid [5]
    • {feature} deployment history 
      • allows to see the last time content was deployed to each stage [5]
      • allows to track the time between deployments [5]
  • {concept} pairing
    • {def} the process by which an item in one stage of the deployment pipeline is associated with the same item in the adjacent stage
      • applies to reports, dashboards, semantic models
      • paired items appear on the same line in the pipeline content list [5]
        • ⇐ items that aren't paired appear on a line by themselves [5]
      • the items remain paired even if their name changes
      • items added after the workspace is assigned to a pipeline aren't automatically paired [5]
        • ⇐ one can have identical items in adjacent workspaces that aren't paired [5]
  • [lakehouse]
    • can be removed as a dependent object upon deployment [3]
    • supports mapping different Lakehouses within the deployment pipeline context [3]
    • {default} a new empty Lakehouse object with same name is created in the target workspace [3]
      • ⇐ if nothing is specified during deployment pipeline configuration
      • notebook and Spark job definitions are remapped to reference the new lakehouse object in the new workspace [3]
      • {warning} a new empty Lakehouse object with same name is still created in the target workspace [3]
      • SQL Analytics endpoints and semantic models are provisioned
      • no object inside the Lakehouse is overwritten [3]
      • updates to Lakehouse name can be synchronized across workspaces in a deployment pipeline context [3] 
  • [notebook] deployment rules can be used to customize the behavior of notebooks when deployed [4]
    • e.g. change notebook's default lakehouse [4]
    • {feature} auto-binding
      • binds the default lakehouse and attached environment within the same workspace when deploying to next stage [4]
  • [environment] custom pool is not supported in deployment pipeline
    • the configurations of Compute section in the destination environment are set with default values [6]
    • ⇐ subject to change in upcoming releases [6]
  • [warehouse]
    • [database project] ALTER TABLE to add a constraint or column
      • {limitation} the table will be dropped and recreated when deploying, resulting in data loss
    • {recommendation} do not create a Dataflow Gen2 with an output destination to the warehouse
      • ⇐ deployment would be blocked by a new item named DataflowsStagingWarehouse that appears in the deployment pipeline [10]
    • SQL analytics endpoint is not supported
  • [Eventhouse]
    • {limitation} the connection must be reconfigured in destination that use Direct Ingestion mode [8]
  • [EventStream]
    • {limitation} limited support for cross-workspace scenarios
      • {recommendation} make sure all EventStream destinations are within the same workspace [8]
  • [KQL database]
    • applies to tables, functions, materialized views [7]
  • [KQL queryset]
    • applies to tabs, data sources [7]
  • [real-time dashboard]
    • data sources, parameters, base queries, tiles [7]
  • [SQL database]
    • includes the specific differences between the individual database objects in the development and test workspaces [9]
  • can be also used with
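
As a rough sketch of how a full deployment between stages can be triggered programmatically, the snippet below calls the deployment pipelines deployAll operation of the Power BI REST API; the pipeline ID, access token, stage ordering and option names are assumptions to be verified against the current API reference, and the same action can of course be performed from the Fabric portal.

    import requests

    PIPELINE_ID = "<pipeline-guid>"      # placeholder
    ACCESS_TOKEN = "<aad-access-token>"  # placeholder, e.g. acquired via MSAL

    # deploy everything from the source stage to the next stage (full deployment)
    url = f"https://api.powerbi.com/v1.0/myorg/pipelines/{PIPELINE_ID}/deployAll"
    payload = {
        "sourceStageOrder": 0,  # assumed: 0 = Development, 1 = Test, 2 = Production
        "options": {
            "allowCreateArtifact": True,     # create unpaired items in the target stage
            "allowOverwriteArtifact": True,  # overwrite paired items in the target stage
        },
    }

    response = requests.post(url, json=payload,
                             headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
    response.raise_for_status()
    print(response.status_code)  # the call is asynchronous; 202 means it was accepted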

    References:
    [1] Microsoft Learn (2024) Get started with deployment pipelines [link]
    [2] Microsoft Learn (2024) Implement continuous integration and continuous delivery (CI/CD) in Microsoft Fabric [link]
    [3] Microsoft Learn (2024) Lakehouse deployment pipelines and git integration (Preview) [link]
    [4] Microsoft Learn (2024) Notebook source control and deployment [link]
    [5] Microsoft Learn (2024) Introduction to deployment pipelines [link]
    [6] Environment Git integration and deployment pipeline [link]
    [7] Microsoft Learn (2024) Real-Time Intelligence: Git integration and deployment pipelines (Preview) [link]
    [8] Microsoft Learn (2024) Eventstream CI/CD - Git Integration and Deployment Pipeline [link]
    [9] Microsoft Learn (2024) Get started with deployment pipelines integration with SQL database in Microsoft Fabric [link]
    [10] Microsoft Learn (2025) Source control with Warehouse (preview) [link]

    Resources:

    Acronyms:
    CLM - Content Lifecycle Management
    UAT - User Acceptance Testing

    20 April 2025

    🧮ERP: Implementations (Part XVII: Taming the Monsters)

    ERP Implementations Series
     
    Given their extensive scope, duration, investment and complexity, ERP implementations are probably one of the most complex endeavors pursued by organizations. Moreover, they are often a matter of endurance with many junctions, detours, turns, shortcuts, ups and downs, a continuous carousel in which the various issues tend to misbehave like little monsters, many of them haunting one’s dreams unexpectedly during and long after implementations.

    Probably, the main drivers are the scale and mass of such projects, as they touch all or most important aspects of organizations. Just consider the typical project done for a single department and multiply its complexity by a constant representing the number of departments in scope. And the more one goes into details, the higher the complexity. To move forward the parties need to compromise, and as no one wants to do that, discussions are prolonged and become personal, issues are escalated, and further negative effects can follow.

    Tensions can be rooted in politics, in the friction between different goals, in the need to prioritize requirements, postponing or leaving things out of scope, or by pushing an agenda other parties don't agree with. Besides the typical constraints of projects, there’s the complexity of performing a huge amount of work within a limited period, time during which the resources must be available, the quality must match the expectations, and there are so many aspects to be considered!

    Of course, not all implementations are like this, though each such project is a real exam of maturity for the people involved in it. Sometimes, it’s better to have people who care about the decisions made. On the opposite side, there are organizations that go almost blindly with the solutions suggested to them, with all the effects resulting from this. Probably, the middle way between these two extremes is more advisable, though it’s hard to find such a path through all the complexity.

    An ERP implementation is highly dependent on the initial conditions under which the project has started, the commitment made by the various parties involved in the project, the way resources are made available, on what’s considered in the plan, on the communication that takes place, the planning done and its enforcement, etc. Of course, some topics can also be addressed later, though delays tend to create more delays that can have a ripple effect through the project. Under normal circumstances the backlog and other aspects can be manageable, though it’s enough for a few issues to gather momentum for their cumulative effect to grow exponentially.

    Certain sensitive project topics can easily lead to crises and abnormal behavior, though such situations are usually exceptions (until they are not). It’s important to have in place the processes and procedures that can be used to address this kind of situation, and, not less important, have them communicated to the team. Moreover, it’s not necessary to reinvent the wheel - the processes defined in IT and project methodologies can be used and adapted for this purpose.

    It's important to have in place all the processes, procedures and checkpoints needed to support the project. The people participating in a project should have some hands-on experience with them, including the exceptions (e.g. escalation procedures). It’s useful to have a mentor or some experienced person who can help with advice and even attend meetings and provide constructive feedback. Just having some awareness sessions with no feedback can be as dangerous as not having any training at all! It’s suboptimal to use the implementation itself as an environment for learning though in extremis this approach may work as well.

    Previous Post <<||>> Next Post

    15 April 2025

    🧮ERP: Implementations (Part XII: The Process Perspective)

    ERP Implementations Series

    Technology can have a tremendous potential impact on organizations, helping them achieve their strategic goals and objectives; however, it takes more than an implementation of one or more technologies to leverage that potential! This applies to ERP and other technology implementations altogether, though the role of technology is more important in the former given its transformative role. ERP implementations can be the foundation on which the whole future of the organization is built, and it’s ideal to have a broader strategy that looks at all the facets of an organization pre-, during and post-implementation.

    One of the most important assets an organization has is its processes, an organization’s success depending on the degree to which the processes are used to leverage the various strategies. Many customers want their business processes to be implemented on the new platform, and that’s the point where many projects go in the wrong direction! There are probably areas where this approach makes sense, though organizations need to look also at the alternatives available in the new ecosystem, and identify and prioritize the missing features accordingly. There will also be extreme cases in which one system or a mix of systems is considered not feasible, and this is an alternative that should be considered during such evaluations!

    An ERP system allows organizations to implement their key value-creation processes by providing a technological skeleton with a set of configurations and features that can be used to address a wide set of requirements. Such a framework is an enabler - it makes things possible - though the potential is not reached automatically, and this is one of the many false assumptions associated with such projects. Customers choose such a system and expect magic to happen! Many of the false perceptions are strengthened by implementers or the other parties involved in the projects. As in other IT areas, many misconceptions pervade.

    An ERP thus provides a basis on which an organization can implement its processes. Doing an ERP implementation without process redesign is seldom possible, even if many organizations want to avoid it at all costs. Even if an organization’s processes are highly standardized, expecting a system to model them by design is utopian, given that an ERP system tends to target the most important aspects identified across industries. And thus, customizations come into play, some of them done without looking for alternatives already existing in the intrinsic or extended range of solutions available in an ERP’s ecosystem.

    One of the most important dangers is when an organization’s processes are so complex that their replication in the new environment creates more issues than the implementation can solve. At least in the first phases of the implementation, organizations must learn to compromise and focus on the critical aspects without which the organization can’t do its business. Moreover, the costs of implementations tend to increase exponentially when multiple complex requirements are added to address the gaps. Organizations should always look at alternatives - integrations with third-party systems tend to be more cost-effective than rebuilding the respective functionality from scratch!

    It’s also true that some processes are too complex to be implemented, though the solution usually resides in the middle. Each customization adds another level of complexity, and a whole range of risks that many customers take. Moreover, there’s no blueprint that works for everybody. Organizations must thus compromise, and that’s probably one of the most important aspects they should be aware of! However, compromises must also be made in the right places, while evaluating alternatives and the possible outcomes. It’s important to be aware of the full extent of the implications of such decisions.

    15 February 2025

    🧭Business Intelligence: Perspectives (Part XXVII: A Tale of Two Cities II)

    Business Intelligence Series
    There’s a saying that applies to many contexts ranging from software engineering to data analysis and visualization related solutions: "fools rush in where angels fear to tread" [1]. Much earlier, an adage attributed to Confucius provides a similar perspective: "do not try to rush things; ignore matters of minor advantage". Ignoring this advice, there’s the drive in rapid prototyping to jump in with both feet without first checking how solid the ground is, often even without having adequate experience in the field. That’s understandable to some degree - people want to see progress and value fast, without building a foundation or getting an understanding of what’s happening, respectively of what’s possible, often ignoring the full extent of the problems.

    A prototype helps to bring the requirements closer to what’s intended to be achieved, though, as practice often shows, the gap between the initial steps and the final solution requires many iterations, sometimes even too many for making a solution cost-effective. There’s almost always a tradeoff between costs and quality, respectively time and scope. Sooner or later, one must compromise somewhere in between, even if the solution is not optimal. The fuzzier the requirements and what’s achievable with a set of data, the harder it gets to find the sweet spot.

    Even if people understand the steps, constraints and further aspects of a process relatively easily, making sense of the data generated by it, respectively using the respective data to optimize the process, can take a considerable effort. There’s a chain of tradeoffs and constraints that apply to a certain situation in each context, which makes it challenging to always find optimal solutions. Moreover, optimal local solutions don’t necessarily provide the optimum effect when one looks at the broader context of the problems. Further on, even if one brought a process under control, it doesn’t necessarily mean that the process works efficiently.

    This is the broader context in which data analysis and visualization topics need to be placed to build useful solutions, to make a sensible difference in one’s job. Especially when the data and processes look numb, one needs to find the perspectives that lead to useful information, respectively knowledge. It’s not realistic to expect to find new insight in any set of data. As experience often proves, insight is rarer than finding gold nuggets. Probably, the most important aspect in gold mining is to know where to look, though it also requires luck, research, the proper use of tools, effort, and probably much more.

    One of the problems in working with data is that usually data is analyzed and visualized in aggregates at different levels, often without identifying and depicting the factors that determine why data take certain shapes. Even if a well-suited set of dimensions is defined for data analysis, data are usually still considered in aggregate. Having the possibility to change between aggregates and details is quintessential for data’s understanding, or at least for getting an understanding of what's happening in the various processes. 

    There is one aspect of data modeling, respectively analysis and visualization that’s typically ignored in BI initiatives – process-wise there is usually data which is not available and approximating the respective values to some degree is often far from the optimal solution. Of course, there’s often a tradeoff between effort and value, though the actual value can be quantified only when gathering enough data for a thorough first analysis. It may also happen that the only benefit is getting a deeper understanding of certain aspects of the processes, respectively business. Occasionally, this price may look high, though searching for cost-effective solutions is part of the job!

    Previous Post  <<||>>  Next Post

    References:
    [1] Alexander Pope (cca. 1711) An Essay on Criticism

    04 February 2025

    🧭Business Intelligence: Perspectives (Part XXVI: Monitoring - A Cockpit View)

    Business Intelligence Series

    The monitoring of business imperatives is sometimes compared metaphorically with piloting an airplane, where pilots look at the cockpit instruments to verify whether everything is under control and the flight proceeds according to expectations. The use of a cockpit is supported by the fact that an airplane is an almost "closed" system in which the components were developed under strict requirements and tested thoroughly under specific technical conditions. Many instruments were engineered and evolved over decades to operate as such. The processes are standardized, inputs and outputs are under strict control, otherwise the whole edifice would crumble under its own complexity.

    In organizational setups, a similar approach is attempted for monitoring the most important aspects of a business. A few dashboards and reports are thus built to monitor and control what’s happening in the areas which were identified as critical for the organization. The various gauges and other visuals were designed to provide similar perspectives as the ones provided by an airplane’s cockpit. At first sight the cockpit metaphor makes sense, though at careful analysis, there are major differences. 

    Probably, the main difference is that businesses don’t necessarily have standardized processes that were brought under control (and thus have variation). Secondly, the data used doesn’t necessarily have the needed quality and occasionally isn’t fit for use in the business processes, including supporting processes like reporting or decision making. Thirdly, chances are high that the monitoring within the BI infrastructures doesn’t address the critical aspects of the business, at least not at the needed level of focus, detail or frequency. The interplay between these three main aspects can lead to complex issues and a muddy ground for a business to build a stable edifice upon.

    The comparison with an airplane’s cockpit was chosen because the number of instruments available for monitoring is somewhat comparable with the number of visuals existing in an organization. In contrast, cars have a smaller number of controls, simple enough to help the one(s) sitting behind the wheel. A car’s monitoring capabilities can probably reflect the needs of single departments or teams, though each unit needs its own gauges with specific business focus. The parallel is however limited because the areas of focus in organizations can change and shift in other directions, some topics may have a periodic character while others can regain momentum after a long time.

    There are further important aspects. At a high level, the expectation is for software products and processes, including the ones related to BI topics, to have the same stability and quality as the mass production of automobiles, airplanes or other artifacts that have similar complexity and manufacturing characteristics. Even if the design processes of software and manufacturing may share many characteristics, the similarities diverge as soon as the production processes start, respectively progress, and these are the areas where most of the differences lie. Starting from the requirements and ending with the overall goals, everything resembles quickly shifting sands on which it is challenging to build any stable edifice.

    At the micro level in manufacturing, each piece is carefully designed and produced according to a set of characteristics that were proved to work. Everything must fit perfectly in the grand design, and there are many tests and steps to make sure that happens. To some degree the same is attempted when building software products, though the processes break along the way with the many changes attempted, with the many cost, time and quality constraints. At some point the overall complexity kicks back; it might still be manageable, though the overall effort is higher than what organizations bargained for.

    26 January 2025

    🧭Business Intelligence: Perspectives (Part XXV: Grounding the Roots)

    Business Intelligence Series

    When building something that is supposed to last, one needs a solid foundation upon which the artifact can be built. That’s valid for castles, houses, IT architectures, and, probably most importantly, for BI infrastructures. There are so many tools out there that allow building a dashboard, report or other types of BI artifacts with a few drag-and-drops, moving things around, adding formatting and shiny things. In many cases all these steps are followed to create a prototype for a set of ideas or more formalized requirements, keeping the overall process to a minimum.

    Rapid prototyping, the process of building a proof-of-concept by focusing at a high level on the most important design and functional aspects, is helpful and sometimes a mandatory step in eliciting and addressing the requirements properly. It provides a fast road from an idea to the actual concept; however, the prototype, still in its early stages, can rapidly become the actual solution that unfortunately continues to haunt the dreams of its creator(s).

    Especially in the BI area, there are many solutions that started as a prototype and gained mass until they started to disturb many things around them, with implications for security, performance, data quality, and many other aspects. Moreover, the mass becomes critical in time, to the degree that it pulls more attention and effort than intended, with positive and negative impact altogether. It’s like building an artificial sun that suddenly becomes a danger for the nearby planet(s) and other celestial bodies.

    When building such artifacts, it’s important to define which goals the end result must meet and which would be nice to have, differentiating clearly between them, respectively when it is time to stop and properly address the aspects mandatory in transitioning from the prototype to an actual solution that addresses the best practices in scope. It’s also the point when one should decide upon the solution’s feasibility, the needed quality acceptance criteria, and broader aspects like supporting processes, human resources, data, and the various aspects that have an impact. Unfortunately, many solutions gain inertia without the proper foundation and in extremis succumb under the various forces.

    Developing software artifacts of any type is a balancing act between all these aspects, often under suboptimal circumstances. Therefore, one must be able to set priorities right, react and change direction (and gear) according to the changing context. Many wish all this to be a straight sequential road, when in reality it looks more like mountain climbing, with many peaks, valleys and change of scenery. The more exploration is needed, the slower the progress.

    All these aspects require additional time, effort, resources and planning, which can easily increase the overall complexity of projects to the degree that it leads to (exponential) effort and, more importantly, waste. Moreover, the complexity pushes back, leading to more effort, and with it to higher costs. On top of this there’s the iterative character of BI topics, multiple iterations being needed from the initial concept to the final solution(s); sometimes many steps are discarded in the process and corners are cut, with all the further implications following from this.

    Somewhere in the middle, between the minimum and the broad, overextending complexity, is the sweet spot that drives the most impact with a minimum of effort. For some organizations, respectively professionals, reaching and remaining in the zone will be quite a challenge, though that’s not impossible. It’s important to be aware of all the aspects that drive and sustain the quality of artifacts, data and processes. There’s a lot to learn from successful as well as from failed endeavors, and the various aspects should be reflected in the lessons learned.

    15 January 2025

    🧭Business Intelligence: Perspectives (Part XXIII: In between the Many Destinations)

    Business Intelligence Series

    In too many cases the development of queries, respectively reports or data visualizations (aka artifacts), becomes a succession of drag & drops, formatting, (re)ordering things around, a bit of makeup, configuring a set of parameters, and the desired product is good to go! There seems to be nothing wrong with this approach as long as the outcomes meet users’ requirements, though it also gives the impression that’s all the process is about.

    Given a set of data entities, usually there are at least as many perspectives into the data as there are entities. Further perspectives can be found in exceptions and gaps in data, process variations, and the further aspects that can influence an artifact’s logic. All these aspects increase the overall complexity of the artifact, respectively of the development process. One guideline in handling all this is to keep the process in focus, and this starts with requirements’ elicitation and ends with quality assurance and actual use.

    Sometimes, the two worlds - the processes and their projection into the data and (data) models - don’t reflect the reality adequately and one needs to compromise, at least until the gaps are better addressed. Process redesign, data harmonization and further steps need, upon case, to be considered in multiple iterations that should converge to optimal solutions, at least in theory.

    Therefore, in the development process there should be a continuous exploration of the various aspects until an optimal solution is reached. Often, there can be a couple of competing forces that pull the solution in two or more directions, and then compromising is necessary. Especially as part of continuous improvement initiatives there’s the tendency of optimizing processes locally to the detriment of the overall process, with all the consequences resulting from this.

    Unfortunately, many of the problems existing in organizations are ill-posed and misunderstood to the degree that, in extremis, more effort is wasted than the actual benefits justify. Optimization is a process of putting in balance all the important aspects, respectively of answering with agility to the changing nature of the business and environment. Ignoring the changing nature of the problems and their contexts is a recipe for failure in the long term.

    This implies that people in particular and organizations in general need to become and remain aware of the micro and macro changes occurring in organizations. Continuous learning is the key to coping with change. Organizations must learn to compromise and focus on what’s important, achievable and/or probable. Identifying, defining and following the value should be in an organization’s DNA. It also requires pragmatism (as opposed to idealism). Upon case, it may even require saying "no", at least until the changes in the landscape allow a reevaluation of the various aspects.

    One requires a lot from organizations when addressing optimization topics, especially when misalignment or important constraints or challenges may exist. Unfortunately, process-related problems don’t always admit linear solutions. The nonlinear aspects are reflected especially when changing the scale or perspective, or when translating the issues or solutions from one area to another.

    There are probably answers available in the related literature or in the approaches followed by other organizations. Reinventing the wheel is part of the game, though invention may require explorations outside of the optimal paths. Conversely, an organization that knows itself has more chances to cope with the challenges and opportunities altogether.

    A lot of what organizations do in a consistent manner looks occasionally like inertia, self-occupation, suboptimal or random behavior, in opposition to being self-driven, self-aware, or in self-control. It’s also true that these are ideal qualities or aspects of what organizations should become in time.

    11 September 2024

    🗄️Data Management: Data Culture (Part IV: Quo vadis? [Where are you going?])

    Data Management Series

    The people working for many years in the fields of BI/Data Analytics, Data and Process Management have probably met many reactions that at first sight seem funny, though they reflect bigger issues existing in organizations: people don’t always understand the data they work with, how data are brought together as part of the processes they support, respectively how data can be used to manage and optimize the respective processes. Moreover, occasionally people torture the data until it confesses something that doesn’t necessarily reflect the reality. It’s even more deplorable when the conclusions are used for decision-making, managing or optimizing the process. In extremis, the result is an iterative process that creates more and bigger issues than the ones it was supposed to solve!

    Behind each blunder there are probably bigger understanding issues that need to be addressed. Many of the issues revolve around understanding how data are created, how they are brought together, how the processes work and what data they need, use and generate. Moreover, few business and IT people look at the full lifecycle of data and try to optimize it, or they optimize it in the wrong direction. Data Management is supposed to help, and it does this occasionally, though a methodology, its processes and practices are only as good as people’s understanding about data and its use! No matter how good a data methodology is, it’s as weak as the weakest link in its use, and typically the issues revolving around data and data understanding are the weakest link.

    Besides technical people, few businesspeople understand the full extent of managing data and its lifecycle. Unfortunately, even if some of the topics are treated in the books, they are too dry, need hands-on experience and some thought in corroborating practices with theories. Without this, people will do things mechanically, processes being as good as the people using them, their value becoming suboptimal and hindering the business. That’s why training on Data Management is not enough without some hands-on experience!

    The most important impact is however in the BI/Data Analytics areas - how the various artifacts are created and used as support in decision-making, process optimization and other activities rooted in data. Ideally, some KPIs and other metrics should be enough for managing and directing a business; however, just basing the decisions on a set of KPIs without understanding the bigger picture, without having a feeling for the data and their quality, the whole architecture, no matter how splendid, can break down like a sandcastle on a shore meeting the first powerful wave!

    Sometimes it feels like organizations do things from inertia, driven by the forces of the moment, initiatives and business issues for which temporary and later permanent solutions are needed. The best chance for solving many of the issues would have been a long time ago, when the issues were still too small to create any powerful waves within the organizations. Therefore, a lot of effort is sometimes spent in solving the consequences of decisions not made at the right time, and that can be painful and costly!

    For building a good business one also needs a solid foundation. In the past it was enough to have a good set of products that are profitable. However, during the past decade(s) the rules of the game changed, driven by fierce competition across geographies and by inefficiencies, especially in the data and process areas, costing organizations in the short and long term. Data Management in general and Data Quality in particular, even if they’re challenging to quantify, have the power to address by design many of the issues existing in organizations, if given the right chance!

    Previous Post <<||>> Next Post

    06 August 2024

    🧭Business Intelligence: Perspectives (Part XVI: On the Cusps of Complexity)

    Business Intelligence Series

    We live in a complex world, which makes it difficult to model and work with the complex models that attempt to represent it. Thus, we try to simplify it to the degree that it becomes processable and understandable for us, while further simplification is needed when we try to depict it by digital means that make it processable by machines, respectively by us. Whenever we simplify something, we lose some aspects, which might be acceptable in many cases, but can create issues in a broader number of ways.

    Each layer of simplification results in a model that addresses some parts while ignoring others, which restricts the model’s usability, potentially to the degree that it becomes unusable. The more one moves toward the extremes of oversimplification or complexification, the higher the chances for models to become unusable.

    This aspect is relevant also in what concerns the business processes we deal with. Many processes are oversimplified to the degree that we track the entry and exit points, respectively the quantitative aspects we are interested in. In theory this information should be enough when answering some business questions, though might be insufficient when one dives deeper into processes. One can try to approximate, however there are high chances that such approximations deviate too much from the value approximated, which can lead to strange outcomes.

    Therefore, when a date or other values are important, organizations consider adding more fields to reflect the implemented process with higher accuracy. Unfortunately, unless we save a history of all the important changes in the data, it becomes challenging to derive the snapshots we need for our analyses. Moreover, it is more challenging to obtain consistent snapshots. There are systems which attempt to obtain such snapshots through the implementation of the processes, though also this approach involves some complexity and other challenges.
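
    To illustrate the point with a toy example, the sketch below reconstructs the state of an order as of a given date from a hypothetical field-level change log - the kind of snapshot that cannot be derived when only the created and last-modified dates are stored; the table layout and column names are assumptions made purely for illustration.

        import pandas as pd

        # hypothetical change log: one row per field change
        changes = pd.DataFrame({
            "order_id": [1001, 1001, 1001],
            "changed_at": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-03-01"]),
            "field": ["quantity", "price", "quantity"],
            "new_value": [10, 95.0, 8],
        })

        as_of = pd.Timestamp("2024-02-15")

        # keep the last change per field before the cutoff and pivot it into a snapshot
        snapshot = (changes[changes["changed_at"] <= as_of]
                    .sort_values("changed_at")
                    .groupby(["order_id", "field"])["new_value"]
                    .last()
                    .unstack("field"))
        print(snapshot)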

    Looking at the way business processes are implemented (see ERP, CRM and other similar systems), the systems track the created, modified and a few other dates that allow only limited perspectives. The fields typically provide the perspectives we need for data analysis. For many processes, it would be interesting to track other events and maybe other values taken in between.

    There is theoretical potential in tracking more detailed data, but also a complexity that’s difficult to transpose into useful information about the processes themselves. Despite tracking more data and the effort involved in such activities, processes can still behave like black boxes, especially when we have no or minimal information about the processes implemented in Information Systems.

    There’s another important aspect - even if systems provide similar implementations of similar processes, the behavior of users can make an important difference. The best example is the behavior of people entering the relevant data only when a process closes and ignoring the steps happening in between (dates, price or quantity changes).

    There is a lot of missing data/information not tracked by such a system, especially in what concerns users’ behavior. It’s true that such behavior can be tracked to some degree, though that happens only when data are modified physically. One can suppose that there are many activities happening outside of the system.

    The data gathered represents only the projection of certain events, which might not represent accurately and completely the processes or users’ behavior. We have the illusion of transparency, though we work with black boxes. There can be a lot of effort happening outside of these borders.  

    Fortunately, we can handle oversimplified processes and data maintenance, though one can but wonder how many important things can be found beyond the oversimplifications we work with, respectively what we miss in the process. 

    Previous Post <<||>>  Next Post

    10 April 2024

    🧭Business Intelligence: Perspectives (Part XI: Ways of Thinking about Data)

    Business Intelligence Series

    One can observe sometimes the tendency of data professionals to move from a business problem directly to data and data modeling without trying to understand the processes behind the data. One could say that the behavior is driven by the eagerness of exploring the data, though even later there are seldom questions considered about the processes themselves. One can argue that maybe the processes are self-explanatory, though that’s seldom the case. 

    Conversely, looking at the datasets available on the web, usually there’s a fact table and the associated dimensions, the data describing only one process. It’s natural to presume that there are data professionals who don’t think much about processes or, better said, in terms of processes. A similar big jump can be observed in blog posts on dashboards and/or reports, bloggers moving from the data directly to the data model.

    In the world of complex systems like Enterprise Resource Planning (ERP) systems thinking in terms of processes is mandatory because a fact table can hold the data for different processes, while processes can span over multiple fact-like tables, and have thus multiple levels of detail. Moreover, processes are broken down into sub-processes and procedures that have a counterpart in the data as well. 

    Moreover, within a process there can be multiple perspectives that are usually module or role dependent. A perspective is a role’s orientation to the world to which the data belongs, and it’s slightly different from what the data professional considers as a view, the perspective being a projection over a set of processes within the data, while a view is a projection of the perspectives into the data structure.

    For example, considering the order-to-cash process there are several sub-processes like order fulfillment, invoicing, and payment collection, though there can be several other processes involved like credit management or production and manufacturing. Creating, respectively updating, or canceling an order can be examples of procedures. 

    The sales representative, the shop worker and the accountant will have different perspectives projected into the data, focusing on the projection of the data on the modules they work with. Thinking in terms of modules is probably the easiest way to identify the boundaries of the perspectives, though the rules are occasionally more complex than this.

    When defining and/or attempting to understand a problem it’s important to understand which perspective needs to be considered. For example, the sales volume can be projected based on Sales orders or on invoiced Sales orders, respectively on the General ledger postings, and the three views can result in different numbers. Moreover, there are partitions within these perspectives based on business rules that determine what to include or exclude from the logic. 

    One can define a business rule as a set of conditional logic that constrains some part of the data in the data structures by specifying what is allowed or not, though usually we refer to a special type called a selection business rule that determines what data are selected (e.g. open Purchase orders, Products with Inventory, etc.). However, when building the data model we need to consider business rules as well, though we might also need to check whether they are enforced.
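
    As a small sketch, a selection business rule such as "open Purchase orders" can be captured as a reusable filter that both the data model and its validation can refer to; the column names and the openness condition below are hypothetical and would differ per system.

        import pandas as pd

        purchase_orders = pd.DataFrame({
            "po_number": ["PO-1", "PO-2", "PO-3"],
            "status": ["Open", "Received", "Open"],
            "ordered_qty": [100, 50, 20],
            "received_qty": [0, 50, 5],
        })

        def open_purchase_orders(df: pd.DataFrame) -> pd.DataFrame:
            """Selection rule: a PO counts as open while not everything ordered was received."""
            return df[(df["status"] == "Open") & (df["received_qty"] < df["ordered_qty"])]

        print(open_purchase_orders(purchase_orders))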

    Moreover, it’s useful to think also in terms of (data) entities and sub-entities, in which the data entity is an abstraction from the physical implementation of database tables. A data entity encapsulates (hides internal details of) a business concept and/or perspective into an abstraction (simplified representation) that makes development, integration, and data processing easier. In certain systems like Dynamics 365 it is important to think at this level because data entities can simplify data modelling considerably.

    Previous Post <<||>>  Next Post

    06 April 2024

    🧭Business Intelligence: Why Data Projects Fail to Deliver Real-Life Impact (Part I: First Thoughts)

    Business Intelligence Series

    A data project has a set of assumptions and requirements that must be met, otherwise the project has a high chance of failing. It starts with a clear idea of the goals and objectives, and they need to be achievable and feasible, with the involvement of key stakeholders and the executives, without whom it’s impossible to change the organization’s data culture. Ideally, there should also be a business strategy, respectively a data strategy, available to understand the driving forces and the broader requirements.

    An organization’s readiness is important not only in what concerns the data but also the things revolving around the data - processes, systems, decision-making, requirements management, project management, etc. One of the challenges is that the systems and processes available can’t be used as they are for answering important business questions; many such questions are quite basic, though the unavailability or poor quality of data makes answering them challenging if not impossible.

    Thus, when starting a data project an organization must be ready to change some of its processes to address a project’s needs, and thus the project can become more expensive as changes need to be made to the systems. For many organizations the best time to have done this was when they implemented the system, respectively the integration(s) between systems. Any changes made after that come in theory with higher costs derived from systems and processes’ redesign.

    Many projects start big and data projects are no exception to this. Some of them build a costly infrastructure without first analyzing the feasibility of the investment, or at least whether the data can form a basis for answering the targeted questions. On one side one can torture any dataset and some knowledge will be obtained from it (aka the data will confess), though few datasets can produce valuable insights, and this is where probably many data projects oversell their potential. Conversely, some initiatives are worth pursuing if only for the sake of the exposure and experience the employees get. However, trying to build something big only through the perspective of one project can easily become a disaster.

    When building a data infrastructure, the project needs to be an initiative given the transformative potential such an endeavor can have for the organization, and the different aspects must be managed accordingly. It starts with the management of stakeholders’ expectations, with building a data strategy, respectively with addressing the opportunities and risks associated with the broader context.

    Organizations often recognize that they aren’t capable of planning and executing such a project or initiative, and they search for a partner to lead the way. Becoming such a partner overnight is more than a challenge, as a good understanding of the industry and the business is needed. Some service providers have such knowledge, at least in theory, though the leap from knowledge to results can prove to be a challenge even for experienced service providers.

    Many projects follow the pattern: the service provider comes, analyzes the requirements, builds something wonderful, the solution is used for some time, and then the business realizes that the result is not what was intended. The causes are multiple and usually form a complex network of causality, though probably the most important aspect is that customers don’t have the in-house technical resources to evaluate the feasibility of requirements, solutions, respectively of the results. Even if organizations involve the best key users, good data professionals or similar resources are also needed to become the bond between the business and the service provider. Without such an intermediary the disconnect between the business and the service provider can grow, with all the implications.


    28 March 2024

    🗄️🗒️Data Management: Master Data Management [MDM] [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources. 
    Last updated: 28-Mar-2024

    Master Data Management (MDM)

    • {definition} the technologies, processes, policies, standards and guiding principles that enable the management of master data values for consistent, shared, contextual use across systems of the most accurate, timely, and relevant version of truth about essential business entities [2],[3]
    • {goal} enable sharing of information assets across business domains and applications within an organization [4]
    • {goal} provide authoritative source of reconciled and quality-assessed master (and reference) data [4]
    • {goal} lower cost and complexity through use of standards, common data models, and integration patterns [4]
    • {driver} meeting organizational data requirements
    • {driver} improving data quality
    • {driver} reducing the costs for data integration
    • {driver} reducing risks 
    • {type} operational MDM 
      • involves solutions for managing transactional data in operational applications [1]
      • relies heavily on data integration technologies
    • {type} analytical MDM
      • involves solutions for managing analytical master data
      • centered on providing high quality dimensions with multiple hierarchies [1]
      • cannot influence operational systems
        • any data cleansing done within the analytical environment isn’t propagated back to the operational applications [1]
          • ⇒ inconsistencies with the main operational data [1]
        • transactional application knowledge isn’t available to the cleansing process
    • {type} enterprise MDM
      • involves solutions for managing both transactional and analytical master data 
        • manages all master data entities
        • delivers maximum business value
      • operational data cleansing
        • improves the operational efficiencies of the applications and the business processes that use the applications
      • addresses cross-application data needs
        • consolidation
        • standardization
        • cleansing
        • distribution
      • needs to support high volume of transactions
        • ⇒ master data must be contained in data models designed for OLTP
          • ⇐ an ODS doesn’t fulfill this requirement
    • {enabler} high-quality data
    • {enabler} data governance
    • {benefit} single source of truth
      • used to support both operational and analytical applications in a consistent manner [1]
    • {benefit} consistent reporting
      • reduces the inconsistencies experienced previously, which were influenced by complex transformations
    • {benefit} improved competitiveness
      • MDM reduces the complexity of integrating new data and systems into the organization
        • ⇒ increased flexibility and improves competitiveness
      • ability to react to new business opportunities quickly with limited resources
    • {benefit} improved risk management
      • more reliable and consistent data improves the business’s ability to manage enterprise risk [1]
    • {benefit} improved operational efficiency and reduced costs
      • helps identify the business’ pain points
        • by developing a strategy for managing master data
    • {benefit} improved decision making
      • reducing data inconsistency diminishes organizational data mistrust and facilitates clearer (and faster) business decisions [1]
    • {benefit} more reliable spend analysis and planning
      • better data integration helps planners come up with better decisions
        • improves the ability to 
          • aggregate purchasing activities
          • coordinate competitive sourcing
          • be more predictable about future spending
          • generally improve vendor and supplier management
    • {benefit} regulatory compliance
      • allows to reduce compliance risk
        • helps satisfy governance, regulatory and compliance requirements
      • simplifies compliance auditing
        • enables more effective information controls that facilitate compliance with regulations
    • {benefit} increased information quality
      • enables organizations to monitor conformance more effectively
        • via metadata collection
        • it can track whether data meets information quality expectations across vertical applications, which reduces information scrap and rework
    • {benefit} quicker results
      • reduces the delays associated with extraction and transformation of data [1]
        • ⇒ it speeds up the implementation of application migrations, modernization projects, and data warehouse/data mart construction [1]
    • {benefit} improved business productivity
      • gives enterprise architects the chance to explore how effective the organization is in automating its business processes by exploiting the information asset [1]
        • ⇐ master data helps organizations realize how the same data entities are represented, manipulated, or exchanged across applications within the enterprise and how those objects relate to business process workflows [1]
    • {benefit} simplified application development
      • provides the opportunity to consolidate the application functionality associated with the data lifecycle [1]
        • ⇐ consolidation in MDM is not limited to the data
        • ⇒ provides a single functional service to which different applications can subscribe
          • ⇐ introducing a technical service layer for data lifecycle functionality provides the type of abstraction needed for deploying SOA or similar architectures
    • factors to consider for implementing an MDM:
      • effective technical infrastructure for collaboration [1]
      • organizational preparedness
        • for making a quick transition from a loosely combined confederation of vertical silos to a more tightly coupled collaborative framework
        • {recommendation} evaluate the kinds of training sessions and individual incentives required to create a smooth transition [1]
      • metadata management
        • via a metadata registry 
          • {recommendation} set up a mechanism for unifying a master data view when possible [1]
          • {recommendation} determine when that unification should be carried out [1]
      • technology integration
        • {recommendation} diagnose what technology needs to be integrated to support the process instead of developing the process around the technology [1]
      • anticipating/managing change
        • proper preparation and organization can subtly introduce change to the way people think and act, as with any shift in working patterns [1]
        • changes in reporting structures and needs are unavoidable
      • creating a partnership between Business and IT
        • IT roles
          • plays a major role in executing the MDM program [1]
        • business roles
          • identifying and standardizing master data [1]
          • facilitating change management within the MDM program [1]
          • establishing data ownership
      • measurably high data quality
      • overseeing processes via policies and procedures for data governance [1]
    • {challenge} establishing enterprise-wide data governance
      • {recommendation} define and distribute the policies and procedures governing the oversight of master data
        • seeking feedback from across the different application teams provides a chance to develop the stewardship framework agreed upon by the majority while preparing the organization for the transition [1]
    • {challenge} isolated islands of information
      • caused by vertical alignment of IT
        • makes it difficult to resolve the differences in roles and responsibilities in relation to the isolated data sets when they are integrated into a master view [1]
      • caused by data ownership
        • the politics of information ownership and management have created artificial exclusive domains supervised by individuals who have no desire to centralize information [1]
    • {challenge} consolidating master data into a centrally managed data asset [1]
      • transfers the responsibility and accountability for information management from the lines of business to the organization [1]
    • {challenge} managing MDM
      • MDM should be considered a program and not a project or an application [1]
    • {challenge} achieving timely and accurate synchronization across disparate systems [1]
    • {challenge} different definitions of master metadata 
      • different coding schemes, data types, collations, and more
        • ⇐ data definitions must be unified
    • {challenge} data conflicts 
      • {recommendation} resolve data conflicts during the project [5]
      • {recommendation} replicate the resolved data issues back to the source systems [5]
    • {challenge} domain knowledge 
      • {recommendation} involve domain experts in an MDM project [5]
    • {challenge} documentation
      • {recommendation} properly document your master data and metadata [5]
    • approaches
      • {architecture} no central MDM 
        • isn’t a real MDM approach
        • cross-system interactions are handled ad hoc, only when required [5]
          • e.g. performing analysis on data from multiple systems via ad-hoc merging and cleansing
        • {drawback} very inexpensive at the beginning; however, it turns out to be the most expensive over time [5]
      • {architecture} central metadata storage 
        • provides unified, centrally maintained definitions for master data [5] (see the central-definitions sketch after this list)
          • followed and implemented by all systems
        • ad-hoc merging and cleansing becomes somewhat simpler [5]
        • does not use a specialized solution for the central metadata storage [5]
          • ⇐ the central storage of metadata is probably in an unstructured form 
            • e.g. documents, worksheets, paper
      • {architecture} central metadata storage with identity mapping 
        • stores the keys that map the records of the same entity across systems (see the identity-mapping sketch after this list)
          • the MDM database holds only the keys from the source systems, without any other attributes [5]
        • {benefit} data integration applications can be developed much more quickly and easily [5]
        • {drawback} raises problems in regard to maintaining master data over time [5]
          • there is no versioning or auditing in place to follow the changes [5]
            • ⇒ viable for a limited time only
              • e.g. during upgrading, testing, and the initial usage of a new ERP system to provide mapping back to the old ERP system
      • {architecture} central metadata storage and central data that is continuously merged 
        • stores metadata as well as master data in a dedicated MDM system
        • master data is not inserted or updated in the MDM system [5]
        • the merging (and cleansing) of master data from source systems occurs continuously, regularly [5]
        • {drawback} continuous merging can become expensive [5]
        • the only viable use of this approach is to find out what has changed in the source systems since the last merge [5] (see the delta-detection sketch after this list)
          • enables merging only the delta (new and updated data)
        • frequently used for analytical systems
      • {architecture} central MDM, single copy 
        • involves a specialized MDM application
          • master data, together with its metadata, is maintained in a central location [5]
          • ⇒ all existing applications are consumers of the master data
        • {drawback} all existing applications must be upgraded to consume master data from the central storage instead of maintaining their own copies [5]
          • ⇒ can be expensive
          • ⇒ can be impossible (e.g. for older systems)
        • {drawback} needs to consolidate all metadata from all source systems [5]
        • {drawback} the process of creating and updating master data could simply be too slow [5]
          • because of the processes in place
      • {architecture} central MDM, multiple copies 
        • uses central storage of master data and its metadata
          • ⇐ the metadata here includes only an intersection of common metadata from source systems [5]
          • each source system maintains its own copy of master data, with additional attributes that pertain to that system only [5]
        • after master data is inserted into the central MDM system, it is replicated (preferably automatically) to source systems, where the source-specific attributes are updated [5]
        • {benefit} good compromise between cost, data quality, and the effectiveness of the CRUD process [5]
        • {drawback} update conflicts
          • different systems can also update the common data [5]
            • ⇒ involves continuous merges as well [5]
        • {drawback} requires a specialized MDM application
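    To make the approaches above more tangible, the three short Python sketches below illustrate the central metadata storage, identity mapping, and delta detection ideas. All system names, entities, attributes, and keys are assumptions made purely for illustration; they are not tied to any particular MDM product or to the sources cited above. The first sketch (central definitions) shows what a central metadata storage can boil down to: one agreed-upon definition per attribute of a master entity, which all source systems are expected to follow and against which their records can be validated.

    from dataclasses import dataclass
    from typing import Dict, List, Optional

    # Central registry of attribute definitions for a hypothetical Customer entity.
    @dataclass(frozen=True)
    class AttributeDefinition:
        name: str
        data_type: str                      # e.g. "string", "date"
        max_length: Optional[int] = None
        nullable: bool = False

    CUSTOMER_DEFINITION: List[AttributeDefinition] = [
        AttributeDefinition("customer_number", "string", max_length=10),
        AttributeDefinition("name", "string", max_length=100),
        AttributeDefinition("country_code", "string", max_length=2),
        AttributeDefinition("created_on", "date", nullable=True),
    ]

    def validate_record(record: Dict, definition: List[AttributeDefinition]) -> List[str]:
        """Return the violations of the central definition found in one record."""
        violations = []
        for attr in definition:
            value = record.get(attr.name)
            if value is None:
                if not attr.nullable:
                    violations.append(f"{attr.name}: missing mandatory value")
                continue
            if attr.max_length is not None and len(str(value)) > attr.max_length:
                violations.append(f"{attr.name}: exceeds max length of {attr.max_length}")
        return violations

    # A record coming from a source system can be checked against the shared definition.
    issues = validate_record({"customer_number": "C-1001", "name": "Acme"}, CUSTOMER_DEFINITION)
    assert issues == ["country_code: missing mandatory value"]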
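    The second sketch (identity mapping) reduces the MDM database to keys only: each source-system identifier is tied to a surrogate master id, and no descriptive attributes are stored centrally. Because only keys are kept, there is nothing to version or audit, which is exactly the drawback noted in the corresponding bullet.

    from collections import defaultdict
    from typing import Dict, Optional, Tuple

    class IdentityMap:
        """Keys-only mapping between source-system identifiers and a surrogate master id."""

        def __init__(self) -> None:
            self._next_master_id = 1
            # (system, local key) -> master id
            self._by_source: Dict[Tuple[str, str], int] = {}
            # master id -> {system: local key}
            self._by_master: Dict[int, Dict[str, str]] = defaultdict(dict)

        def register(self, system: str, local_key: str, master_id: Optional[int] = None) -> int:
            """Attach a source-system key to a master id; create a new master id if none is given."""
            existing = self._by_source.get((system, local_key))
            if existing is not None:
                return existing
            if master_id is None:
                master_id = self._next_master_id
                self._next_master_id += 1
            self._by_source[(system, local_key)] = master_id
            self._by_master[master_id][system] = local_key
            return master_id

        def lookup(self, system: str, local_key: str) -> Optional[int]:
            """Translate a source-system key into its master id, if already mapped."""
            return self._by_source.get((system, local_key))

    # The same customer known as "C-1001" in the ERP and "4711" in the CRM resolves
    # to one master id; no names, addresses, or other attributes are kept centrally.
    idmap = IdentityMap()
    master = idmap.register("ERP", "C-1001")
    idmap.register("CRM", "4711", master_id=master)
    assert idmap.lookup("CRM", "4711") == master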
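    The third sketch (delta detection) illustrates the point made for the continuously merged approach: each source row is hashed, the hashes from the last merge are kept, and only new or changed rows are passed on to the next merge. In a real solution the hashes (or timestamps/rowversion columns) would typically live in a staging area, but the principle is the same.

    import hashlib
    import json
    from typing import Dict, List, Tuple

    def row_hash(row: Dict) -> str:
        """Stable hash over the row content (keys sorted for determinism)."""
        return hashlib.sha256(json.dumps(row, sort_keys=True, default=str).encode()).hexdigest()

    def detect_delta(source_rows: Dict[str, Dict],
                     previous_hashes: Dict[str, str]) -> Tuple[Dict[str, Dict], List[str]]:
        """Split the source rows into (new or changed rows, unchanged keys) relative to the last merge."""
        new_or_changed: Dict[str, Dict] = {}
        unchanged: List[str] = []
        for key, row in source_rows.items():
            if previous_hashes.get(key) != row_hash(row):
                new_or_changed[key] = row
            else:
                unchanged.append(key)
        return new_or_changed, unchanged

    # Only "C-1002" (changed name) and "C-1003" (new record) would be merged.
    previous = {"C-1001": row_hash({"name": "Acme"}), "C-1002": row_hash({"name": "Blue Ltd"})}
    current = {
        "C-1001": {"name": "Acme"},
        "C-1002": {"name": "Blue Ltd."},
        "C-1003": {"name": "Contoso GmbH"},
    }
    delta, unchanged_keys = detect_delta(current, previous)
    assert sorted(delta) == ["C-1002", "C-1003"] and unchanged_keys == ["C-1001"]
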
    Acronyms:
    MDM - Master Data Management
    ODS - Operational Data Store
    OLAP - Online Analytical Processing
    OLTP - Online Transactional Processing
    SOA - Service Oriented Architecture

    References:
    [1] The Art of Service (2017) "Master Data Management Course"
    [2] DAMA International (2009) "The DAMA Guide to the Data Management Body of Knowledge" 1st Ed.
    [3] Tony Fisher (2009) "The Data Asset"
    [4] DAMA International (2017) "The DAMA Guide to the Data Management Body of Knowledge" 2nd Ed.
    [5] Dejan Sarka et al (2012) "Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012" (Training Kit)

