20 May 2025

#️⃣Software Engineering: Mea Culpa (Part IX: A Look Inward)

Software Engineering Series

Paraphrasing a saying attributed to Lao Tzu, "a journey starts with a look inward". Of course, there are also people who walk just for the sake of walking, without purpose or afterthought, though usually when we follow a path we ask ourselves: where are we? what's the destination? how long is the road? how long does it take to reach the intermediary and final goals? what do I need for the road? what can I find on the road? what else do I need to know? who can help me? with whom or what can I travel? does the journey make sense? am I missing something? am I fit for the road and the obstacles ahead? what challenges might I meet?

The traveler's metaphor can be easily applied to professional life as well, especially when one has reached a junction or a point at which a decision must be made about the road ahead. Usually, the first step is being aware of where one is, and this applies from all points of view: geographical, physical, mental, spiritual, social and even ideological or religious. Of course, not everything is equally important, and it depends on each individual's values, state of mind, situation in life, etc. The more dimensions one considers in the equations of life, the more challenging it is to find a solution, even if not an optimal one. Even if some things can be ignored, the overall texture can become quite complex, more complex than our linear way of thinking allows us to cope with!

It would be great if one could always choose the destination! Unfortunately, the destination depends on a series of variables we are constrained by: our capabilities and knowledge, our experience, the work ahead, what others want to achieve, and this can change from day to day according to the priorities set. Ideally, we should be aware of all the strings attached to us, respectively of which strings are pulled and who's pulling them, however life is usually at least one step ahead of us! The lower we find ourselves in the texture that decides upon our life, the more challenging it becomes. Ideally, we should have no attachment to things, situations, people or other material aspects. However, we aren't hermits living in a cave!

Once we set a destination, it is important to consider what one needs for the road, respectively for reaching the intermediary steps. That's probably one of the most challenging aspects of the whole experience. How can one prepare for a road not yet taken? The past is probably the only thing that can offer a sense of the future, though what if the future doesn't resemble the past in many essential points? One can always make leaps of faith, though faith alone is not enough! Blind faith without knowledge or wisdom can take one in the opposite or other wrong directions. One can but try to map the territory and establish a plan, however the contact with reality can be harsher than one can imagine. One can but try to adapt, though one must be built for this! Some foundation must already be there!

A good recommendation for the road ahead is to expect the unexpected. Planning is good for preparing the mind, though what one finds on the way might be surprising as well as challenging to overcome. Harsh long-distance journeys are not for everybody! Starting small and training the qualities needed for such journeys might be the best advice one can get. Though the beginning of wisdom takes more steps than this, no matter the road, no matter the experience or one's aims!

🤖〽️Prompt Engineering: Copilot Unabridged (Part 1: The Importance of AI in Society - A Transformational Force)

Prompt Engineering Series

Prompt: "write a blogpost of 600 words on the importance of AI in society"

Introduction

Artificial Intelligence (AI) has evolved from a futuristic concept to a daily reality that shapes industries, economies, and even personal lives. Once confined to science fiction, AI now drives technological progress and redefines the way we interact with the world. In this blog post, we will explore the significance of AI, how it influences various sectors, and the ethical considerations accompanying its rapid adoption.

AI: The Backbone of Modern Innovation

At its core, AI enables machines to perform tasks that typically require human intelligence - such as learning, reasoning, problem-solving, and decision-making. From automating repetitive tasks to augmenting human capabilities, AI has proven indispensable across multiple domains.

Take healthcare, for example. AI-powered algorithms analyze complex medical data to assist doctors in diagnosing diseases, predicting patient outcomes, and even recommending personalized treatment plans. Innovations like AI-driven robotic surgery and early cancer detection systems save lives and enhance the precision of medical procedures.

In finance, AI plays a crucial role in fraud detection, risk assessment, and automated trading. Financial institutions leverage machine learning models to analyze spending patterns, prevent cyberattacks, and provide tailored investment advice. AI’s ability to process massive amounts of data ensures accuracy and efficiency in financial operations.

Retail and e-commerce have also been transformed. AI-driven recommendation engines predict customer preferences based on purchasing behavior, allowing businesses to offer personalized shopping experiences. Chatbots streamline customer service, while automated inventory management optimizes supply chain efficiency. AI ensures that companies stay competitive in an increasingly digital marketplace.

AI and the Workplace: Automation vs. Augmentation

The integration of AI into the workplace sparks ongoing debates about its impact on jobs. While automation replaces certain manual tasks, it also opens doors for new career opportunities. AI-driven tools enable professionals to focus on creative and strategic work rather than repetitive processes.

For instance, AI-powered analytics help marketers assess consumer sentiment and predict trends with unprecedented accuracy. Similarly, AI assists engineers and researchers in designing smarter infrastructure, accelerating scientific discoveries, and optimizing business operations.

Rather than eliminating jobs, AI redefines them. Organizations must invest in upskilling employees to harness AI effectively, fostering a collaborative environment where human creativity complements machine intelligence.

Ethical Considerations and Responsible AI Development

With great power comes great responsibility. As AI grows more sophisticated, ethical concerns regarding data privacy, bias, and transparency become more pressing.

AI systems rely on large datasets to function, raising concerns about privacy and security. Governments and corporations must establish stringent regulations to protect user information from misuse or exploitation. Additionally, biases in AI models can lead to discriminatory outcomes, necessitating fair and inclusive development practices.

Transparency is another critical factor. Users should understand how AI-driven decisions are made, particularly in sensitive areas like healthcare and criminal justice. Ethical AI implementation requires collaboration between policymakers, technologists, and the public to ensure that advancements benefit society as a whole.

The Future of AI: A Balance Between Progress and Ethics

The importance of AI in society cannot be overstated. It enhances productivity, improves decision-making, and solves complex problems across industries. However, its responsible use is paramount to prevent unintended consequences.

As AI continues to evolve, collaboration between academia, businesses, and governments is necessary to establish ethical guidelines and maximize its benefits. The goal should not be to replace human intelligence but to amplify it, fostering innovation while ensuring accountability.

Ultimately, AI is more than just a tool - it is a transformative force shaping the future. Harnessing its potential wisely will determine whether it serves as a steppingstone toward progress or a challenge requiring careful navigation.

Disclaimer: The whole text was generated by Copilot at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, independently of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


19 May 2025

#️⃣Software Engineering: Mea Culpa (Part VIII: A Look Beyond)

Software Engineering Series

With AI on the rise, blogging and bloggers can easily become obsolete. Why bother navigating through the many blogs to get a broader perspective when the same can be obtained with AI? Just type in a prompt of the type "write a blogpost of 600 words on the importance of AI in society" and Copilot or any other similar AI agent will provide you with an answer that may look much better than the first draft of most of the bloggers out there! It doesn't matter whether the text follows a well-articulated idea, a personal perspective or something creative! One gets an acceptable answer with a minimum of effort, and that's what matters for many.

The results tend to increase in complexity the more models are assembled together, respectively the more uncontrolled the experiments become. Moreover, solutions that tend to work aren't necessarily optimal. Machines can't offer instant enlightenment or anything close to it. Though they have an incomparable processing power for retrieval, association, aggregation, segregation and/or iteration, which, coupled with the vast amount of data, information and knowledge, can generate anything in just a matter of seconds. Probably, the only areas in which humans can compete with machines are creativity and wisdom, though how many will be able to leverage these at scale? Probably, machines have some characteristics that can be associated with these intrinsic human qualities, though more likely the brute computational power will prevail.

At Microsoft Build, Satya Nadella mentioned that Foundry already encompasses more than 1,900 supported models. In theory, one can still evaluate and test such models adequately. What will happen when the scale increases by a few orders of magnitude? What will happen when there are one or more personalized AI models for each person? AI can help in many areas by rapidly generating and evaluating many plausible alternatives, though as soon as the models deal with some kind of randomization in processing, the chances for errors increase exponentially (at least in theory).

It's enough for one or more hallucinations or other unexpected behaviors to lead to further unexpected behavior. No matter how well a model was tested, as long as there's no stable, predictable mathematical model behind it, the chances for something to go wrong increase with the number of inputs, parameters, uses, or changes of context the model deals with. Unfortunately, all these aspects are seldom documented. It's not like using a formula, where you know that, given a set of inputs and operations, the result is always the same. The evolving nature of such models makes them unpredictable in the long term. Therefore, there must always be a way to observe the changes occurring in such models.

One of the important questions is: how many errors can we afford in such models? How long does it take until errors compound to create effects comparable with a tornado? And what if the tornado increases in magnitude to the degree that it wrecks everything that crosses its path? What if multiple tornadoes join forces? How many tornadoes can destroy a field, a country or a continent? How many, or how big, must the tornadoes be to trigger a warning?

Science-fiction authors love to create apocalyptic scenarios, and everything happens in just a few steps, respectively chapters. In nature, it usually takes many orders of magnitude for unpredictable behavior to emerge. But, as nature often reveals, unpredictable behavior does happen, probably more often than we expect or wish for. The more we poke the bear, the higher the chances that something unexpected will happen! Do we really want this? What will be the price we must pay for progress?


18 May 2025

#️⃣Software Engineering: Mea Culpa (Part VII: A Look Forward)

Software Engineering Series

I worked for more than 20 years in various areas related to ERP systems - Data Migrations, Business Intelligence/Analytics, Data Warehousing, Data Management, Project Management, (data) integrations, Quality Assurance, and much more - having experience with IFS IV, Oracle e-Business Suite, MS Dynamics AX 2009 and, during the past 3-7 years, also with MS Dynamics 365 Finance, SCM & HR (in that order). Much earlier, I started to work with SQL Server (2000-2019) and Oracle, and more recently with Azure Synapse and MS Fabric, writing over time more than 800 ad-hoc queries and reports for the various stakeholders, covering all the important areas, as well as many more queries for monitoring the various environments.

In the areas where I couldn't acquire experience on the job, I tried to address this by learning in my free time. I did it because I take my profession seriously and I want to know how (some) things work. I thus put a lot of time into trying to keep up to date with what's happening in the MS Fabric world, from Power BI to KQL, Python, dataflows, SQL databases and much more. These technologies are Microsoft's bet, though at least from the German market's perspective, all bets are off! Probably, many companies are circumspect or need more time to react to the political and economic impulses, or perhaps some companies are already in bad shape.

Unfortunately, the political context has a broad impact on the economy and on what's happening in the job market right now! However, these two aspects are not the only problem. The distance between candidates and jobs seems to grow, with a dense wall of opinions being built, multiple layers of presumptions filtering out voices that (still) matter! Does my experience matter, or does it become obsolete like the technologies I used to work with? But I continued to learn, to keep up to date… Or do I need to delete everything that reminds me of the old?

To succeed, or at least be hired today, one must fit a pattern that frankly doesn't make sense! Yes, soft skills are important, though not all of them are capable of compensating for the lack of technical skills! There seems to be a tendency to exaggerate some of the qualities associated with skills or, better said, to hide behind big words. Sometimes it feels like an inaccurate Shakespearean adaptation of the stage on which we are merely players.

More likely, this lack of pragmatism will lead to suboptimal constructions that tend to succumb under their own structure. All the inefficiencies need to be corrected, or somebody (or something) must be able to bear their weight. I have seen this happen too often in ERP implementations! Big words don't compensate for the lack of pragmatism, skills, knowledge, effort or management! For many organizations the answer to today's problems is more management, which occasionally might be the right approach, though it is not a universal solution for everything that crosses our path(s).

One of society's answers to today's problems seems to be taking refuge in AI. So, I wonder: where am I going now? Jobless, without an acceptable perspective, with AI penetrating the markets and probably making many jobs obsolete. One must adapt, but adapt to what? AI is brainless even if it can mimic intelligence! Probably, it can do more in time, to the degree that many more jobs will become obsolete (and I'm wondering what will happen to all those people).

Conversely, some trends will probably be met by countertrends, though it's challenging to describe in clear terms a future still in the making. Society seems to be at a crossroads, one more important than mine.


03 May 2025

🧭Business Intelligence: Perspectives (Part XXXI: More on Data Visualization)

Business Intelligence Series

There are many reasons why the data visualizations available in different mediums can be considered of poor quality, and unfortunately there is often more than one issue that contributes to this: the complexity of the data or of the models behind them, the failure to identify the right data, respectively the aspects that should be visualized, poor data visualization software or the lack of skills to use its capabilities, improper choice of visual displays, misleading choices of scales, axes and other elements, the lack of a clear outline for telling a story, respectively pushing a story too far, and not adapting visualizations to changing requirements or different perspectives, to name just the most important causes.

The complexity of the data increases with the dimensions typically associated with what we currently call big data - velocity, volume, value, variety, veracity, variability and whatever other V might be in scope. While it's relatively easy to work with a small dataset, understanding its shape and challenges, our power of understanding decreases with each V added to the picture. Of course, we can always treat the data in the same way, though the broader the timeframe, the higher the chances that the data has important changing characteristics that can impact the outcomes. It can be simple definition changes or, more importantly, changes in the model itself. Data, processes and perspectives change fluidly with the many requirements, and quite often the further implications for reporting, visualizations and other aspects are not considered.

Quite often there's a gap between what one wants to achieve with a data visualization and the data or knowledge available. It might be a matter of missing values or whole attributes that would help to clearly delimit the different perspectives, or of adequately modelling the processes behind. It can be intrinsic data quality issues that are challenging to correct after the fact. It can also be our understanding of the processes themselves as reflected in the data or, more importantly, of what's missing to provide better perspectives. Therefore, many are forced to work with what they have or what they know.

Many data visualizations inadvertently reflect their creators' understanding of the data, procedures, processes, and any other aspects related to them. Unfortunately, business users and other participants also have only limited views, and thus their knowledge must be elicited accordingly. Even then, there might be pieces of data that are not reflected in any of the knowledge available.

If one tortures the data enough, one or more stories worthy of telling can probably be identified. However, much of the data is dull to the degree that some creators feel forced to add elements. Earlier, one could have blamed the software for it, though modern software provides nice graphics and plenty of features that can help graphics creators in the process. Even data of high quality can reveal challenges that are difficult to overcome. One needs to compromise, and there can be compromises in so many places that one can but wonder whether the end result still reflects reality. Unfortunately, it's difficult to evaluate the impact of such gaps, however progress can occasionally be made by continuously evaluating the gaps and finding the appropriate methods to address them.

Not all stories must have complex visualizations in which multiple variables are used to provide the many perspectives. Some simple visualizations can be enough for establishing common ground on which something more complex (or simpler) can be built. Data visualization is a continuous process of exploration, extrapolation, evaluation, and of testing assumptions and ideas, where one's experience can be a useful mediator between the various forces.


📊Graphical Representation: Graphics We Live By (Part XI: Comparisons Between Data Series)

Graphical Representation Series

Over the past 10-20 years it has become easy to create data visualizations just by dropping some of the available data into a tool like Excel and getting a visual depiction of it with just a few clicks. In many cases, the first draft, typically provided by default by the tool used, doesn't even need further work, as the objective was reached, while in other cases the creator must have a minimum skillset for making the visualization useful, appealing, or whatever other quality is a final requirement for the work in scope. However, the audience might judge the visualization(s) from different perspectives, and there can be a broad audience with different skills in reading, evaluating and understanding the work.

There are many depictions on the web resembling the one below, taken from a LinkedIn post:

Example Chart - Boeing vs. Airbus

Even if the visualization is not perfect, it does a fair job of representing the data. Improvements can be made in the areas of labels, the title and positioning of elements, and the color palette used. At least these were the improvements made in the original post. One must also differentiate between the environments in which the charts are made available, the print format having different characteristics than the formats used in business setups. Unfortunately, the requirements of the two are widely confused, probably also because of the overlap of the mediums used.

Probably, it's a good idea to always start with the raw data (or summaries of it) when the result consists of only a few data points that can be easily displayed in a table like the one below (the feature to round the decimals for integer values should be available soon in Power BI):

Summary Table

Of course, one can calculate more meaningful values like percentages of the total, standard deviations and other values that offer more perspectives into the data. Even if the values adequately reflect reality, the reader can but wonder about the local and global minimum/maximum values, not to mention the meaning of the data points, which is easily identifiable in a chart. At least in the case of small data sets, using a table in combination with a chart can provide a more complete perspective and different ways of analyzing the data, especially when the navigation is interactive.
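
For illustration, such calculations could be added as measures along the following lines (a minimal sketch against the 'Boeing vs Airbus' table created by the script shown later in this post; the measure names are illustrative and only the Boeing series is shown):

-- DAX sketch: percentage of the grand total and standard deviation (illustrative names)
Boeing Total = SUM('Boeing vs Airbus'[Boeing Deliveries])
Boeing % of Total = DIVIDE([Boeing Total], CALCULATE([Boeing Total], ALL('Boeing vs Airbus')))
Boeing StdDev = STDEV.P('Boeing vs Airbus'[Boeing Deliveries])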

Column and bar charts do a fair job in comparing values over time, though they do use a lot of ink in the process (see D). While they make it easy to compare neighboring values, the rectangles used tend to occupy a lot of space when they are made too wide or too high to cover the empty space within the display (e.g. when just a few values are displayed, space being wasted in the process). As the main downside, it takes a lot of scanning until the reader identifies the overall trends, and the further away the bars are from each other, the more difficult it becomes to do comparisons. 

In theory, line charts are more efficient at representing the above data points, because the marks are usually small and the line thin enough to provide a better data-ink ratio, while one can see a lot at a glance. In Power BI the creator can use different types of interpolation: linear (A), step (B) or smooth (C). In many cases, it might be a good idea to use linear interpolation, though when there is no or minimal overlapping, it might be worthwhile to explore the other types of interpolation too (and further request feedback from the users):

Linear, Step and Smooth Line Charts

The nearness of values from different series can raise difficulties in identifying the points adequately, respectively in delimiting the lines (see B). When the density of values allows it, it also makes sense to include the averages for each data series to reflect the distance between the two data sets. Unfortunately, the chart can get crowded if further data series or summaries are added to the chart(s).
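
For example, the averages mentioned above could be defined as simple measures and added to the line charts as reference lines (a minimal sketch; the measure names are illustrative):

-- DAX sketch: per-series averages (illustrative names)
Boeing Monthly Avg = AVERAGE('Boeing vs Airbus'[Boeing Deliveries])
Airbus Monthly Avg = AVERAGE('Boeing vs Airbus'[Airbus Deliveries])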

While the column chart (E) is close to the chart provided in the original redesign, the other alternatives can, depending on the case, provide more value. Stacked column charts (D) also allow comparing the overall quantity by month, area charts (F) tend to use even more color than needed, while waterfall charts (G) allow comparing the difference between data points per time unit. Tornado charts (H) are a variation of bar charts that allows easier comparison of bar sizes, while ribbon charts (I) show the stacked values well.

Alternatives to Line Charts

One should consider changing the subtitle(s) slightly to reflect the chart type when the patterns shown imply a shift in attention or meaning. Upon case, more than one of the above charts can be used within the same report when two or more perspectives are important. Using a complementary perspective can facilitate the understanding of the data or help identify patterns that aren't easily identifiable otherwise.

In general, graphics creators try to use various representational means to facilitate a data set's understanding, though seldom do only two series or a small subset of dimensions provide a complete description. The value of data comes when multiple perspectives are combined. Frankly, the same can be said about the above data series. Yes, there are important differences between the two series, though how do the numbers compare when one looks at the bigger picture, especially when broken down by element type (e.g. airplane size)? How about plan vs. actual values, and how much longer do production or other processes take? It's one of a visualization's goals to improve the questions posed, but how efficient are visualizations that barely scratch the surface?

Regarding the code, the following scripts can be used to prepare the data:

// Power Query script (Boeing vs Airbus)
let
    Source = #table(
        {"Sorting", "Month Name", "Serial Date", "Boeing Deliveries", "Airbus Deliveries"},
        {
            {1, "Oct", #date(2023, 10, 31), 30, 50},
            {2, "Nov", #date(2023, 11, 30), 40, 40},
            {3, "Dec", #date(2023, 12, 31), 40, 110},
            {4, "Jan", #date(2024, 1, 31), 20, 30},
            {5, "Feb", #date(2024, 2, 29), 30, 40},  // Leap year adjustment
            {6, "Mar", #date(2024, 3, 31), 30, 60},
            {7, "Apr", #date(2024, 4, 30), 40, 60},
            {8, "May", #date(2024, 5, 31), 40, 50},
            {9, "Jun", #date(2024, 6, 30), 50, 80},
            {10, "Jul", #date(2024, 7, 31), 40, 90},
            {11, "Aug", #date(2024, 8, 31), 40, 50},
            {12, "Sep", #date(2024, 9, 30), 30, 50}
        }
    ),
    #"Changed Types" = Table.TransformColumnTypes(
        Source,
        {{"Sorting", Int64.Type}, {"Serial Date", type date}, {"Boeing Deliveries", Int64.Type}, {"Airbus Deliveries", Int64.Type}}
    )
in
    #"Changed Types"

It can be useful to create the labels for the charts dynamically:

-- DAX code for labels
MaxDate = FORMAT(MAX('Boeing vs Airbus'[Serial Date]), "MMM-YYYY")
MinDate = FORMAT(MIN('Boeing vs Airbus'[Serial Date]), "MMM-YYYY")
MinMaxDate = [MinDate] & " to " & [MaxDate]
Title Boeing Airbus = "Boeing and Airbus Deliveries " & [MinMaxDate]

Happy coding!


29 April 2025

🏭🗒️Microsoft Fabric: Purview [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 29-Apr-2025

[Microsoft Purview] Purview
  • {def} comprehensive data governance and security platform designed to help organizations manage, protect, and govern their data across various environments [1]
    • incl. on-premises, cloud & SaaS applications [1]
    • provides the highest and most flexible level of functionality for data governance in MF [1]
      • offers comprehensive tools for 
        • data discovery
        • data classification
        • data cataloging
  • {capability} managing the data estate
    • {tool} dedicated portal
      • aka Fabric Admin portal
      • used to control tenant settings, capacities, domains, and other objects, typically reserved for administrators
    • {type} logical containers
      • used to control access to data and capabilities [1]
      • {level} tenants
        • settings for Fabric administrators [1]
      • {level} domains
        • group data that is relevant to a single business area or subject field [1]
      • {level} workspaces 
        • group Fabric items used by a single team or department [1]
    • {type} capacities
      • objects that limit compute resource usage for all Fabric workloads [1]
  • {capability} metadata scanning
    • extracts values from data lakes
      • e.g. names, identities, sensitivities, endorsements, etc. 
      • can be used to analyze and set governance policies [1]
  • {capability} secure and protect data
    • assure that data is protected against unauthorized access and destructive attacks [1]
    • compliant with data storage regulations applicable in your region [1]
    • {tool} data tags
      • allows to identify the sensitivity of data and apply data retention and protection policies [1]
    • {tool} workspace roles
      • define the users who are authorized to access the data in a workspace [1]
    • {tool} data-level controls
      • used at the level of Fabric items
        • e.g. tables, rows, and columns to impose granular restrictions.
    • {tool} certifications
      • Fabric is compliant with many data management certifications
        • incl. HIPAA BAA, ISO/IEC 27017, ISO/IEC 27018, ISO/IEC 27001, ISO/IEC 27701 [1]
  • {feature} OneLake data hub
    • allows users to find and explore the data in their estate.
  • {feature} endorsement
    • allows users to endorse a Fabric item to identify it as being of high quality [1]
      • help other users to trust the data that the item contains [1]
  • {feature} data lineage
    • allows users to understand the flow of data between items in a workspace and the impact that a change would have [1]
  • {feature} monitoring hub
    • allows to monitor activities for the Fabric items that the user has permission to view [1]
  • {feature} capacity metrics
    • app used to monitor usage and consumption
  • {feature} allows to automate the identification of sensitive information and provides a centralized repository for metadata [1]
  • {feature} allows to find, manage, and govern data across various environments
    • incl. both on-premises and cloud-based systems [1]
    • supports compliance and risk management with features that monitor regulatory adherence and assess data vulnerabilities [1]
  • {feature} integrated with other Microsoft services and third-party tools 
    • {benefit} enhances its utility
    • {benefit} streamlines data access controls
      • enforcing policies, and delivering insights into data lineage [1]
  • {benefit} helps organizations maintain data integrity, comply with regulations, and use their data effectively for strategic decision-making [1]
  • {feature} Data Catalog
    • {benefit} allows users to discover, understand, and manage their organization's data assets
      • search for and browse datasets
      • view metadata
      • gain insights into the data’s lineage, classification, and sensitivity labels [1]
    • {benefit} promotes collaboration
      • users can annotate datasets with tags to improve discoverability and data governance [1]
    • targets users and administrators
    • {benefit} allows to discover where patient records are held by searching for keywords [1]
    • {benefit} allows to label documents and items based on their sensitivity [1]
    • {benefit} allows to use access policies to manage self-service access requests [1]
  • {feature} Information Protection
    • used to classify, label, and protect sensitive data throughout the organization [1]
      • by applying customizable sensitivity labels, users classify records. [1]
      • {concept} policies
        • define access controls and enforce encryption
        • labels follow the data wherever it goes
        • helps organizations meet compliance requirements while safeguarding data against accidental exposure or malicious threats [1]
    • allows to protect records with policies to encrypt data and impose IRM
  • {feature} Data Loss Prevention (DLP)
    • the practice of protecting sensitive data to reduce the risk from oversharing [2]
      • implemented by defining and applying DLP policies [2]
  • {feature} Audit
    • user activities are automatically logged and appear in the Purview audit log
      • e.g. creating files or accessing Fabric items
  • {feature} connect Purview to Fabric in a different tenant
    • all functionality is supported, except that 
      • {limitation} Purview's live view isn't available for Fabric items [1]
      • {limitation} the system can't identify user registration automatically [1]
      • {limitation} managed identity can’t be used for authentication in cross-tenant connections [1]
        • {workaround} use a service principal or delegated authentication [1]
  • {feature} Purview hub
    • displays reports and insights about Fabric items [1]
      • acts as a centralized location to begin data governance and access more advanced features [1]
      • via Settings >> Microsoft Purview hub
      • administrators see information about their entire organization's Fabric data estate
      • provides information about
        • Data Catalog
        • Information Protection
        • Audit
    • the data section displays tables and graphs that analyze the entire organization's items in MF
      • users only see information about their own Fabric items and data

References:
[1] Microsoft Learn (2024) Purview: Govern data in Microsoft Fabric with Purview[link]
[2] Microsoft Learn (2024) Purview: Learn about data loss prevention [link]

Resources:

Acronyms:
DLP - Data Loss Prevention
M365 - Microsoft 365
MF - Microsoft Fabric
SaaS - Software-as-a-Service

🏭🗒️Microsoft Fabric: Data Loss Prevention (DLP) in Purview [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 10-Jun-2025

[Microsoft Purview] Data Loss Prevention (DLP)
  • {def} the practice of protecting sensitive data to reduce the risk from oversharing [2]
    • implemented by defining and applying DLP policies [2]
  • {benefit} helps to protect sensitive information with policies that automatically detect, monitor, and control the sharing or movement of sensitive data [1]
    • administrators can customize rules to block, restrict, or alert when sensitive data is transferred to prevent accidental or malicious data leaks [1]
  • {concept} DLP policies
    • allow to monitor the activities users take on sensitive items and then take protective actions [2]
      • applies to sensitive items 
        • at rest
        • in transit [2]
        • in use [2]
      • created and maintained in the Microsoft Purview portal [2]
    • {scope} only supported for Power BI semantic models [1]
    • {action} show a pop-up policy tip to the user that warns that they might be trying to share a sensitive item inappropriately [2]
    • {action} block the sharing and, via a policy tip, allow the user to override the block and capture the users' justification [2]
    • {action} block the sharing without the override option [2]
    • {action} [data at rest] sensitive items can be locked and moved to a secure quarantine location [2]
    • {action} sensitive information won't be displayed 
      • e.g. Teams chat
  • DLP reports
    • provides data from monitoring policy matches and actions, to user activities [2]
      • used as basis for tuning policies and triage actions taken on sensitive items [2]
    • telemetry uses M365 audit logs and processes the data for the different reporting tools [2]
      • M365 provides with visibility into risky user activities [2]
      • scans the audit logs for risky activities and runs them through a correlation engine to find activities that are occurring at a high volume [1]
        • no DLP policies are required [2]
  • {feature} detects sensitive items by using deep content analysis [2]
    • ⇐ not by just a simple text scan [2]
    • based on
      • keywords matching [2]
      • evaluation of regular expressions [2] 
      • internal function validation [2]
      • secondary data matches that are in proximity to the primary data match [2]
      • ML algorithms and other methods to detect content that matches DLP policies
    • all DLP monitored activities are recorded to the Microsoft 365 Audit log [2]
  • DLP lifecycle
    • {phase} plan for DLP
      • train and acclimate users to DLP practices on well-planned and tuned policies [2]
      • {recommendation} use policy tips to raise awareness with users before changing the policy status from simulation mode to more restrictive modes [2]
    • {phase} prepare for DLP
    • {phase} deploy policies in production
      • {action} define control objectives, and how they apply across workloads [2]
      • {action} draft a policy that embodies the objectives
      • {action} start with one workload at a time, or across all workloads - there's no impact yet
      • {feature} implement policies in simulation mode
        • {benefit} allows to evaluate the impact of controls
          • the actions defined in a policy aren't applied yet
        • {benefit} allows to monitor the outcomes of the policy and fine-tune it so that it meets the control objectives while ensuring it doesn't adversely or inadvertently impact valid user workflows and productivity [2]
          • e.g. adjusting the locations and people/places that are in or out of scope
          • e.g. tune the conditions that are used to determine if an item and what is being done with it matches the policy
          • e.g. the sensitive information definition/s
          • e.g. add new controls
          • e.g. add new people
          • e.g. add new restricted apps
          • e.g. add new restricted sites
        • {step} enable the control and tune policies [2]
          • policies take effect about an hour after being turned on [2]
      • {action} create DLP policy 
      • {action} deploy DLP policy 
  • DLP alerts 
    • alerts generated when a user performs an action that meets the criteria of a DLP policy [2]
      • there are incident reports configured to generate alerts [2]
      • {limitation} available in the alerts dashboard for 30 days [2]
    • DLP posts the alert for investigation in the DLP Alerts dashboard
    • {tool} DLP Alerts dashboard 
      • allows to view alerts, triage them, set investigation status, and track resolution
        • routed to Microsoft Defender portal 
        • {limitation} available for six months [2]
      • {constraint} administrative unit restricted admins see the DLP alerts for their administrative unit only [2]
  • {concept} egress activities (aka exfiltration)
    • {def} actions related to exiting or leaving a space, system or network [2]
  • {concept}[Microsoft Fabric] policy
    • when a DLP policy detects a supported item type containing sensitive information, the actions configured in the policy are triggered [3]
    • {feature} Activity explorer
      • allows to view data from DLP for Fabric and Power BI
      • for accessing the data, user's account must be a member of any of the following roles or higher [3]
        • Compliance administrator
        • Security administrator
        • Compliance data administrator
        • Global Administrator 
          • {warning} a highly privileged role that should only be used in scenarios where a lesser privileged role can't be used [3]
        • {recommendation} use a role with the fewest permissions [3]
    • {warning} DLP evaluation workloads impact capacity consumption [3]
    • {action} define policy
      • in the data loss prevention section of the Microsoft Purview portal [3]
      • allows to specify 
        •  conditions 
          • e.g. sensitivity labels
        •  sensitive info types that should be detected [3]
      • [semantic model] evaluated against DLP policies 
        • whenever one of the following events occurs:
          • publish
          • republish
          • on-demand refresh
          • scheduled refresh
        •  the evaluation  doesn't occur if either of the following is true
          • the initiator of the event is an account using service principal authentication [3]
          • the semantic model owner is a service principal [3]
      • [lakehouse] evaluated against DLP policies when the data within a lakehouse undergoes a change
        • e.g. getting new data, connecting a new source, adding or updating existing tables, etc. [3]

References:
[1] Microsoft Learn (2025) Learn about data loss prevention [link]
[2] Microsoft Learn (2024) Purview: Learn about data loss prevention [link]
[3] Microsoft Learn (2025) Get started with Data loss prevention policies for Fabric and Power BI [link]

Resources:
[R1] Microsoft Fabric Updates Blog (2024) Secure Your Data from Day One: Best Practices for Success with Purview Data Loss Prevention (DLP) Policies in Microsoft Fabric [link]
[R2] 

Acronyms:
DLP - Data Loss Prevention
M365 - Microsoft 365

26 April 2025

🏭🗒️Microsoft Fabric: Parameters in Dataflows Gen2 [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 26-Apr-2025

[Microsoft Fabric] Dataflow Gen2 Parameters

  • {def} parameters that allow to dynamically control and customize Dataflows Gen2
    • makes them more flexible and reusable by enabling different inputs and scenarios without modifying the dataflow itself [1] (see the sketch after these notes)
    • the dataflow is refreshed by passing parameter values outside of the Power Query editor through either
      • Fabric REST API [1]
      • native Fabric experiences [1]
    • parameter names are case sensitive [1]
    • {type} required parameters
      • {warning} the refresh fails if no value is passed for it [1]
    • {type} optional parameters
    • enabled via Parameters >> Enable parameters to be discovered and override for execution [1]
  • {limitation} dataflows with parameters can't be
    • scheduled for refresh through the Fabric scheduler [1]
    • manually triggered through the Fabric Workspace list or lineage view [1]
  • {limitation} parameters that affect the resource path of a data source or a destination are not supported [1]
    • ⇐ connections are linked to the exact data source path defined in the authored dataflow
      • can't be currently override to use other connections or resource paths [1]
  • {limitation} can't be leveraged by dataflows with incremental refresh [1]
  • {limitation} only parameters of type decimal number, whole number, text and true/false can be passed for override
    • any other data types don't produce a refresh request in the refresh history but show in the monitoring hub [1]
  • {warning} parameters allow other users who have permissions to the dataflow to refresh the data with other values [1]
  • {limitation} refresh history does not display information about the parameters passed during the invocation of the dataflow [1]
  • {limitation} monitoring hub doesn't display information about the parameters passed during the invocation of the dataflow [1]
  • {limitation} staged queries only keep the last data refresh of a dataflow stored in the Staging Lakehouse [1]
  • {limitation} only the first request will be accepted from duplicated requests for the same parameter values [1]
    • subsequent requests are rejected until the first request finishes its evaluation [1]
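
For illustration, a Dataflow Gen2 parameter is defined in the Power Query editor like any other parameter, i.e. as a query whose metadata marks it as a parameter; the value authored in the dataflow acts as the default and can be overridden at refresh time via the Fabric REST API or the native Fabric experiences. The sketch below shows how such a parameter typically looks in M; the parameter name ("Region") and value are hypothetical, and the exact meta-record fields may vary depending on how the parameter was created:

// Power Query (M) sketch of a text parameter named "Region" (hypothetical)
// The parameter name is the query's name and is case sensitive when overridden at refresh time.
"Europe" meta [
    IsParameterQuery = true,
    Type = "Text",
    IsParameterQueryRequired = true
]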

References:
[1] Microsoft Learn (2025) Use public parameters in Dataflow Gen2 (Preview) [link]

Resources:
[R1] Microsoft Fabric Blog (2025) Passing parameter values to refresh a Dataflow Gen2 (Preview) [link]

Acronyms:
API - Application Programming Interface
REST - Representational State Transfer

🏭🗒️Microsoft Fabric: Deployment Pipelines [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 26-Apr-2025

[Microsoft Fabric] Deployment Pipelines

  • {def} a structured process that enables content creators to manage the lifecycle of their organizational assets [5]
    • enable creators to develop and test content in the service before it reaches the users [5]
      • can simplify the deployment process to development, test, and production workspaces [5]
      • one Premium workspace is assigned to each stage [5]
      • each stage can have 
        • different configurations [5]
        • different databases or different query parameters [5]
  • {action} create pipeline
    • from the deployment pipelines entry point in Fabric [5]
      • creating a pipeline from a workspace automatically assigns it to the pipeline [5]
    • {action} define how many stages it should have and what they should be called [5]
      • {default} has three stages
        • e.g. Development, Test, and Production
        • the number of stages can be set anywhere between 2-10
          • ⇐ the number of stages is permanent [5]
            • can't be changed after the pipeline is created [5]
        • {action} add another stage
        • {action} delete stage
        • {action} rename stage
          • by typing a new name in the box
        • {action} share a pipeline with others
          • users receive access to the pipeline and become pipeline admins [5]
    • {action} add content to the pipeline [5]
      • done by assigning a workspace to the pipeline stage [5]
        • the workspace can be assigned to any stage [5]
    • {action|optional} make a stage public
      • {default} the final stage of the pipeline is made public
      • a consumer of a public stage without access to the pipeline sees it as a regular workspace [5]
        • without the stage name and deployment pipeline icon on the workspace page next to the workspace name [5]
    • {action} deploy to an empty stage
      • when finishing the work in one pipeline stage, the content can be deployed to the next stage [5] 
        • deployment can happen in any direction [5]
      • {option} full deployment 
        • deploy all content to the target stage [5]
      • {option} selective deployment 
        • allows select the content to deploy to the target stage [5]
      • {option} backward deployment 
        • deploy content from a later stage to an earlier stage in the pipeline [5] 
        • {restriction} only possible when the target stage is empty [5]
    • {action} deploy content between stages [5]
      • content can be deployed even if the next stage has content
        • paired items are overwritten [5]
    • {action|optional} create deployment rules
      • when deploying content between pipeline stages, allow changes to content while keeping some settings intact [5] 
      • once a rule is defined or changed, the content must be redeployed
        • the deployed content inherits the value defined in the deployment rule [5]
        • the value always applies as long as the rule is unchanged and valid [5]
    • {feature} deployment history 
      • allows to see the last time content was deployed to each stage [5]
      • allows to track time between deployments [5]
  • {concept} pairing
    • {def} the process by which an item in one stage of the deployment pipeline is associated with the same item in the adjacent stage
      • applies to reports, dashboards, semantic models
      • paired items appear on the same line in the pipeline content list [5]
        • ⇐ items that aren't paired, appear on a line by themselves [5]
      • the items remain paired even if their name changes
      • items added after the workspace is assigned to a pipeline aren't automatically paired [5]
        • ⇐ one can have identical items in adjacent workspaces that aren't paired [5]
  • [lakehouse]
    • can be removed as a dependent object upon deployment [3]
    • supports mapping different Lakehouses within the deployment pipeline context [3]
    • {default} a new empty Lakehouse object with the same name is created in the target workspace [3]
      • ⇐ if nothing is specified during deployment pipeline configuration
      • notebook and Spark job definitions are remapped to reference the new lakehouse object in the new workspace [3]
      • {warning} a new empty Lakehouse object with the same name is still created in the target workspace [3]
      • SQL Analytics endpoints and semantic models are provisioned
      • no object inside the Lakehouse is overwritten [3]
      • updates to Lakehouse name can be synchronized across workspaces in a deployment pipeline context [3] 
  • [notebook] deployment rules can be used to customize the behavior of notebooks when deployed [4]
    • e.g. change notebook's default lakehouse [4]
    • {feature} auto-binding
      • binds the default lakehouse and attached environment within the same workspace when deploying to next stage [4]
  • [environment] custom pool is not supported in deployment pipeline
    • the configurations of Compute section in the destination environment are set with default values [6]
    • ⇐ subject to change in upcoming releases [6]
  • [warehouse]
    • [database project] ALTER TABLE to add a constraint or column
      • {limitation} the table will be dropped and recreated when deploying, resulting in data loss
    • {recommendation} do not create a Dataflow Gen2 with an output destination to the warehouse
      • ⇐ deployment would be blocked by a new item named DataflowsStagingWarehouse that appears in the deployment pipeline [10]
    • SQL analytics endpoint is not supported
  • [Eventhouse]
    • {limitation} the connection must be reconfigured in destination that use Direct Ingestion mode [8]
  • [EventStream]
    • {limitation} limited support for cross-workspace scenarios
      • {recommendation} make sure all EventStream destinations are within the same workspace [8]
  • [KQL database]
    • applies to tables, functions, materialized views [7]
  • [KQL queryset]
    • ⇐ tabs, data sources [7]
  • [real-time dashboard]
    • data sources, parameters, base queries, tiles [7]
  • [SQL database]
    • includes the specific differences between the individual database objects in the development and test workspaces [9]
  • can be also used with

References:
[1] Microsoft Learn (2024) Get started with deployment pipelines [link]
[2] Microsoft Learn (2024) Implement continuous integration and continuous delivery (CI/CD) in Microsoft Fabric [link]
[3] Microsoft Learn (2024) Lakehouse deployment pipelines and git integration (Preview) [link]
[4] Microsoft Learn (2024) Notebook source control and deployment [link]
[5] Microsoft Learn (2024) Introduction to deployment pipelines [link]
[6] Environment Git integration and deployment pipeline [link]
[7] Microsoft Learn (2024) Real-Time Intelligence: Git integration and deployment pipelines (Preview) [link]
[8] Microsoft Learn (2024) Eventstream CI/CD - Git Integration and Deployment Pipeline [link]
[9] Microsoft Learn (2024) Get started with deployment pipelines integration with SQL database in Microsoft Fabric [link]
[10] Microsoft Learn (2025) Source control with Warehouse (preview) [link]

Resources:

Acronyms:
CLM - Content Lifecycle Management
UAT - User Acceptance Testing

About Me

Koeln, NRW, Germany
IT Professional with more than 25 years of experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.