SQL Troubles: August 2024

22 August 2024

🧭Business Intelligence: Perspectives (Part 15: From Data to Storytelling III)

Business Intelligence Series

As children we heard or later read many stories, and even if few remained imprinted in memory, we can still recognize some of the metaphors and ideas used. Stories prepared us for life, and one can suppose that the business stories we hear nowadays have similar intent, charge and impact. However, if we dig deeper into each story and dissect it, we may be disappointed by its simplicity, the resemblance to other stories, to what we've heard over time. Moreover, stories can bring also negative connotations, that can impact any other story we hear.

From the scores or hundreds of distinct stories that have been told, few reach a magnitude that can become more than the stories themselves, few become a catalyst for the auditorium, and even then they tend to manipulate. Conversely, well-written transformative stories can move mountains when they resonate with the auditorium. In a leader’s motivational speech such stories can become a catalyst that moves people in the intended direction.

Children stories are quite simple and apparently don’t need special constructs even if the choice of words, structure and messages is important. Moving further into organizations, storytelling becomes more complex, upon case, structures and messages need to follow certain conventions within some politically correct scripts. Facts become important to the degree they serve the story, though the purposes they serve change with time, becoming secondary to the story. Storytelling becomes thus just of way of changing the facts as seems fit to the storyteller.

Storytelling has its role in organizations for channeling the multitude of messages across various structures. However, the more one hears the word storytelling, the more likely one is closer to fiction than to business decision-making. It's also true that the word in itself carries a power we all tasted during childhood and why not much later. The word has a magic power that appeals to our memories, to our feelings, to our expectations. However, as soon one's expectations are not met, the fight with the chimeras turns into a battle of our own. Yes, storytelling has great power when used right, when there's a story to tell, when the business narratives are worth telling.

The problem with stories is that no matter how much they are based on real facts or happenings, they become fictitious in time, to the degree that they lose some of the most important facts they were based on. That’s valid especially when there’s no written track of the story, though even then various versions of the story can multiply outside of the standard channels and boundaries.

Even if the author tried to keep the story as close to the facts, the way stories are understood, remembered and retold depend on too many factors - the words used, the degree to which metaphors and similar elements are understood, remembered and transmitted correctly, the language used, the mental structure existing in the auditorium, the association of words, ideas or metaphors, etc.

Unfortunately, the effect of stories can be negative too, especially when stories are designed to manipulate the auditorium beyond any ethical norms. When they don’t resonate with the crowd or are repeated unnecessary, the narratives may have adverse effects and the messages can get lost in the crowd or create resistance. Moreover, stories may have a multifold and opposite effect within different segments of the auditorium.

Storytelling can make hearts and minds resonate with the carried messages, though misdirected, improper or poorly conceived stories have also the power to destroy all that have been built over the years. Between the two extremes there’s a small space to send the messages across!

Previous Post <<||>> Next Post

21 August 2024

🧭Business Intelligence: Perspectives (Part 14: From Data to Storytelling II)

Business Intelligence Series

Being snapshots in people and organizations’ lives, data arrive to tell a story, even if the story might not be worth telling or might be important only in certain contexts. In fact each record in a dataset has the potential of bringing a story to life, though business people are more interested in the hidden patterns and “stories” the data reveal through more or less complex techniques. Therefore, data are usually tortured until they confess something, and unfortunately people stop analyzing the data with the first confession(s).

Even if it looks like torture, data need to be processed to reveal certain characteristics, trends or patterns that could help us in sense-making, decision-making or similar specific business purposes. Unfortunately, the volume of data increases with an incredible velocity to which further characteristics like variety, veracity, volume, velocity, value, veracity and variability may add up.

The data in a dashboard, presentation or even a report should ideally tell a story otherwise the data might not be worthy looking at, at least from some people’s perspective. Probably, that’s one of the reason why man dashboards remain unused shortly after they were made available, even if considerable time and money were invested in them. Seeing the same dull numbers gives the illusion that nothing changed, that nothing is worth reviewing, revealing or considering, which might be occasionally true, though one can’t take this as a rule! Lot of important facts could remain hidden or not considered.

One can suppose that there are businesses in which something important seldom happens and an alert can do a better job than reviewing a dashboard or a report frequently. Probably an alert is a better choice than reporting metrics nobody looks at!

Organizations usually define a set of KPIs (key performance indicators) and other types of metrics they (intend to) review periodically. Ideally, the numbers collected should define and reflect the critical points (aka pain points) of an organization, if they can be known in advance. Unfortunately, in dynamic businesses the focus can change considerably from one day to another. Moreover, in systemic contexts critical points can remain undiscovered in time if the set of metrics defined doesn’t consider them adequately.

Typically only one’s experience and current or past issues can tell what one should consider or ignore, which are the critical/pain points or important areas that must be monitored. Ideally, one should implement alerts for the critical points that require a immediate response and use KPIs for the recurring topics (though the two approaches may overlap).

Following the flow of goods, money and other resources one can look at the processes and identify the areas that must be monitored, prioritize them and identify the metrics that are worth tracking, respectively that reflect strengths, weaknesses, opportunities, threats and the risks associated with them.

One can start with what changed by how much, what caused the change(s) and what further impact is expected directly or indirectly, by what magnitude, respectively why nothing changed in the considered time unit. Causality diagrams can help in the process even if the representations can become quite complex.

The deeper one dives and the more questions one attempts to answer, the higher the chances to find a story. However, can we find a story that’s worth telling in any set of data? At least this is the point some adepts of storytelling try to make. Conversely, the data can be dull, especially when one doesn’t track or consider the right data. There are many aspects of a business that may look boring, and many metrics seem to track the boring but probably important aspects.

Previous Post <<||>> Next Post

18 August 2024

🧭Business Intelligence: Mea Culpa (Part III: Problem Solving)

Business Intelligence Series

I've been working for more than 20 years in BI and Data Analytics area, in combination with Software Engineering, ERP implementations, Project Management, IT services and several other areas, which allowed me to look at many recurring problems from different perspectives. One of the things I learnt is that problems are more complex and more dynamic than they seem, respectively that they may require tailored dynamic solutions. Unfortunately, people usually focus on one or two immediate perspectives, ignoring the dynamics and the multilayered character of the problems!

Sometimes, a quick fix and limited perspective is what we need to get started and fix the symptoms, and problem-solvers usually stop there. When left unsupervised, the problems tend to kick back, build up momentum and appear under more complex forms in various places. Moreover, the symptoms can remain hidden until is too late. To this also adds the political agendas and the further limitations existing in organizations (people, money, know-how, etc.).

It seems much easier to involve external people (individual experts, consultancy companies) to solve the problem(s), though unless they get a deep understanding of the business and the issues existing in it, the chances are high that they solve the wrong problems and/or implement the wrong solutions. Therefore, it's more advisable to have internal experts, when feasible, and that's the point where business people with technical expertise and/or IT people with business expertise can help. Ideally, one should have a good mix and the so called competency centers can do a great job in handling the challenges of organizations.

Between business and IT people there's a gap that can be higher or lower depending on resources know-how or the effort made by organizations to reduce it. To this adds the nature of the issues existing in organizations, which can vary considerable across departments, organizations or any other form of establishment. Conversely, the specific skillset can be transmuted where needed, which might happen naturally, though upon case also considerable effort needs to be involved in the process.

Being involved in similar tasks, one may get the impression that one can do whatever the others can do. This can happen in IT as well on the business side. There can be activities that can be done by parties from the other group, though there are also many exceptions in both directions, especially when one considers that one can’t generalize the applicability and/or transmutation of skillset.

A more concrete example is the know-how needed by a businessperson to use the BI infrastructure for answering business questions, and ideally for doing all or at least most of the activities a BI professional can do. Ideally, as part of the learning path, it would be helpful to have a pursuable path in between the two points. The mastery of tools helps in the process though there are different mindsets involved.

Unfortunately, the data-related fields are full of overconfident people who get the problem-solving process wrong. Data-based problem-solving resumes in gathering the right facts and data, building the right conceptual model, identifying the right questions to ask, collecting more data, refining methods and solutions, etc. There’s aways an easy wrong way to solve a problem!

The mastery of tools doesn’t imply the mastery of business domains! What people from the business side can bring is deeper insight in the business problems, though getting from there to implementing solutions can prove a long way, especially when problems require different approaches, different levels of approximations, etc. No tool alone can bridge such gaps yet! Frankly, this is the most difficult to learn and unfortunately many data professionals seem to get this wrong!

Previous Post <<||>> Next Post

16 August 2024

🧭Business Intelligence: Perspectives (Part 13: From Data to Storytelling I)

Business Intelligence Series

Data is an amalgam of signs, words, numbers and other visual or auditory elements used together to memorize, interpret, communicate and do whatever operation may seem appropriate with them. However, the data we use is usually part of one or multiple stories - how something came into being, what it represents, how is used in the various mental and non-mental processes - respectively, the facts, concepts, ideas, contexts places or other physical and nonphysical elements that are brought in connection with.

When we are the active creators of a story, we can in theory easily look at how the story came into being, the data used and its role in the bigger picture, respective the transformative elements considered or left out, etc. However, as soon we deal with a set of data, facts, or any other elements of a story we are not familiar with, we need to extrapolate the hypothetical elements that seem to be connected to the story. We need to make sense of these elements and consider all that seems meaningful, what we considered or left out shaping the story differently.

As children and maybe even later, all of us dealt with stories in one way or another, we all got fascinated by metaphors' wisdom and felt the energy that kept us awake, focused and even transformed by the words coming from narrator's voice, probably without thinking too much at the whole picture, but letting the words do their magic. Growing up, the stories grew in complexity, probably became richer in meaning and contexts, as we were able to decipher the metaphors and other elements, as we included more knowledge about the world around, about stories and storytelling.

In the professional context, storytelling became associated with our profession - data, information, knowledge and wisdom being created, assimilated and exchanged in more complex processes. From, this perspective, data storytelling is about putting data into a (business) context to seed cultural ground, to promote decision making and better understanding by building a narrative around the data, problems, challenges, opportunities, and further organizational context.

Further on, from a BI's perspective, all these cognitive processes impact on how data, information and knowledge are created, (pre)processed, used and communicated in organizations especially when considering data visualizations and their constituent elements (e.g. data, text, labels, metaphors, visual cues), the narratives that seem compelling and resonate with the auditorium.

There's no wonder that data storytelling has become something not to neglect in many business contexts. Storytelling has proved that words, images and metaphors can transmit ideas and knowledge, be transformative, make people think, or even act without much thinking. Stories have the power to seed memes, ideas, or more complex constructs into our minds, they can be used (for noble purposes) or misused.

A story's author usually takes compelling images, metaphors, and further elements, manipulates them to the degree they become interesting to himself/herself, to the auditorium, to the degree they are transformative and become an element of the business vocabulary, respectively culture, without the need to reiterate them when needed to bring more complex concepts, ideas or metaphors into being.

A story can be seen as a replication of the constituting elements, while storytelling is a set of functions that operate on them and change the initial structure and content into something that might look or not like the initial story. Through retelling and reprocessing in any form, the story changes independently of its initial form and content. Sometimes, the auditorium makes connections not recognized or intended by the storyteller. Other times, the use and manipulation of language makes the story change as seems fit.

Previous Post <<||>> Next Post

07 August 2024

🧭Business Intelligence: Perspectives (Part 12: From Data to Data Models)

Business Intelligence Series

A data model can be defined as an abstract, self-contained, logical definition of the data structures available in a database or similar repositories. It’s typically an abstraction of the data structures underpinning a set of processes, procedures and business logic used for a predefined purpose. A data model can be formed also of unrelated micromodels, depicting thus various aspects of a business.

The association between data and data models is bidirectional. Given a set of data, a data model can be built to underpin the respective data. Conversely, one can create or generate data based on a data model. However, in business setups a bidirectional relationship between data and the data model(s) underpinning them is more realistic as the business evolves. In extremis, the data model can be used to reflect a business’ needs, at least when the respective needs are addressed accordingly by extending the data model(s).

Given a set of data (e.g. the data stored in one or more spreadsheets or other type of files) there can be defined in theory multiple data models to reflect the respective data. Within a data model, the fields (aka attributes) are partitioned into a set of data entities, where a data entity is thus a nonunique grouping of attributes that attempt to define together one unitary aspect of the world. Customers, Vendors, Products, Invoices or Sales Orders are examples of such data entities, though entities can have a broader granularity (e.g. Customers can be modeled over several tables like Entity, Addresses, Contact information, etc.).

From an operational database’s perspective, a data entity is based on one or more tables, though several entities can share some of the tables. From a BI artifact’s perspective, an entity should be easy to create from the underlying tables, with a minimal set of transformations. Ideally, the BI data model should be as close as possible to the needed entity for reporting, however an optimal solution lies usually somewhere in between. In this resides the complexity of modeling BI solutions – providing an optimal data model which can be easily built on the source tables, and which allows addressing all or at least most of the BI requirements.

In other words, we deal with two optimization problems of two distinct data models. On one side the business data model must be flexible enough to provide fast read/write operations while keeping the referential data’s granularity efficient. Conversely, a BI data model needs to abstract these entities and provide a fast way of processing the data, while making data reads extremely efficient. These perspectives must apply when we move to Microsoft Fabric too.

The operational data layer must provide this abstraction, and in this resides the complexity of building optimal BI solutions. This is the layer at which the modeling problems need to be tackled. The challenge of BI and Analytics resides in finding an optimal data model that allows us to address most or ideally all the BI requirements. Several overlapping layers of abstraction may be built in the process.

Looking at the data modeling techniques used in notebooks and other similar solutions, data modeling has the chance of becoming a redundant practice prone to errors. Moreover, data models have a tendency of being multilayered and of being based on certain perspectives into the processes they model. Providing reliable flexible models involves finding the right view into the data for modeling aspects of the business. Database views allow us to easily model such perspectives, often in a unique way. Moving away from them just shifts the burden on the multiple solutions built around the base data, which can create other important challenges.

Previous Post <<||>> Next Post

06 August 2024

🧭Business Intelligence: Perspectives (Part 16: On the Cusps of Complexity)

Business Intelligence Series

We live in a complex world, which makes it difficult to model and work with the complex models that attempt to represent it. Thus, we try to simplify it to the degree that it becomes processable and understandable for us, while further simplification is needed when we try to depict it by digital means that make it processable by machines, respectively by us. Whenever we simplify something, we lose some aspects, which might be acceptable in many cases, but create issues in a broader number of ways.

With each layer of simplification results a model that addresses some parts while ignoring some parts of it, which restricts models’ usability to the degree that makes them unusable. The more one moves toward the extremes of oversimplification or complexification, the higher the chances for models to become unusable.

This aspect is relevant also in what concerns the business processes we deal with. Many processes are oversimplified to the degree that we track the entry and exit points, respectively the quantitative aspects we are interested in. In theory this information should be enough when answering some business questions, though might be insufficient when one dives deeper into processes. One can try to approximate, however there are high chances that such approximations deviate too much from the value approximated, which can lead to strange outcomes.

Therefore, when a date or other values are important, organizations consider adding more fields to reflect the implemented process with higher accuracy. Unfortunately, unless we save a history of all the important changes in the data, it becomes challenging to derive the snapshots we need for our analyses. Moreover, it is more challenging to obtain consistent snapshots. There are systems which attempt to obtain such snapshots through the implementation of the processes, though also this approach involves some complexity and other challenges.

Looking at the way business processes are implemented (see ERP, CRM and other similar systems), the systems track the created, modified and a few other dates that allow only limited perspectives. The fields typically provide the perspectives we need for data analysis. For many processes, it would be interesting to track other events and maybe other values taken in between.

There is theoretical potential in tracking more detailed data, but also a complexity that’s difficult to transpose into useful information about the processes themselves. Despite tracking more data and the effort involved in such activities, processes can still behave like black boxes, especially when we have no or minimal information about the processes implemented in Information Systems.

There’s another important aspect - even if systems provide similar implementations of similar processes, the behavior of users can make an important difference. The best example is the behavior of people entering the relevant data only when a process closes and ignoring the steps happening in between (dates, price or quantity changes).

There is a lot of missing data/information not tracked by such a system, especially in what concerns users’ behavior. It’s true that such behavior can be tracked to some degree, though that happens only when data are modified physically. One can suppose that there are many activities happening outside of the system.

The data gathered represents only the projection of certain events, which might not represent accurately and completely the processes or users’ behavior. We have the illusion of transparency, though we work with black boxes. There can be a lot of effort happening outside of these borders.

Fortunately, we can handle oversimplified processes and data maintenance, though one can but wonder how many important things can be found beyond the oversimplifications we work with, respectively what we miss in the process.

Previous Post <<||>> Next Post

04 August 2024

📊Graphical Representation: Graphics We Live By (Part X: Pie and Donut Charts in Power BI and Excel)

Graphical Representation Series

Pie charts are loved and hated by many altogether, and there are many entitled reasons to use them and avoid them, though the most important criteria to evaluate them is whether they do the intended job in an acceptable manner, especially when compared to other representational means. The most important aspect they depict is the part to whole ratio, which even if can be depicted by other graphical tools, few tools are efficient in representing it.

The pie chart works well as a visualization tool when it has only 3-5 values that are easily recognizable in the visualization, however as soon the size or the number of pieces vary considerably, the more difficult it is to visualize and interpret them, in case their representation has more negative than positive effects. There are many topics that form something like a long tail - the portion of the distribution having many occurrences far from the head or beginning. Displaying the items from the long tail together with the other components together can totally obscure the distribution of the items from the long tail as they become unrecognizable in the diagram.

One approach to handle this is to group all the items from the long tail together under a piece (e.g. Other) and use a second form of representation to display them separately. For example, Microsoft Excel offers a way to zoom in the section of a pie chart with small percentages by displaying them in a second pie chart (pie of pie) or bar chart (bar of pie), something like a "zoom in" perspective (see image below). Unfortunately, the feature seems to limit itself only to small percentages, and thus can't be used currently to offer a broader perspective. Ideally, it would be useful to zoom in on any piece of the pie, especially when the items are categorized as a hierarchy with two or even more levels.

Unfortunately, even modern visualization tools offer limited features in displaying this kind of perspective into a flexible unitary visualization, and thus users are forced to use their creativity in providing proper solutions. In the below example the "Renewables" piece of pie is further broken down into several components of a full pie, an ensemble supposed to function as a single form of representation. With a bit of effort, the reader probably will understand the meaning behind the two pie charts, however the encoding of colors and other elements used are suboptimal in the decoding process.

Pie Charts - Original Solution

In the above example, the arrow may suggest that in between the two donut charts exists a relationship, reflected also in the description provided, however the readers may still have difficulties in correctly interpreting the diagrams, especially when there's some kind of overlapping or other type of implied or unimplied resemblance. If the colors overlap or have other similarities, are they intentional? If the circles have the same size, does this observed resemblance have a meaning? The reader shouldn't bother himself with this type of questions, but see the resemblance and the meaning of the various elements with a minimum of effort while decoding a chart's elements. Of course, when the meaning is not clear, some guidance should be ideally provided!

Unfortunately, Power BI doesn't seem to have a similar visual like the one from Excel yet, however with a bit of effort one can obtain similar results, even if there are other minor or important limitations. For example, the lines between the two pie charts can't be drawn, so one is forced to use other encodings to show that there's a connection between the Renewable slice and the small pie chart. Moreover, the ensemble thus created isn't treated unitary and handled accordingly. Frankly, the maturity of a graphical representation environment can and should be judged also from this perspective!

The below representation built in Power BI uses a few tricks to display two pie charts together. The smaller pie chart representing the breakdown and pieces' colors are variations of parent's color, attempting to show that there's a relationship between the slice from the first chart and the pie chart with the details. Unfortunately, it wasn't possible to use similar lines like in Excel to show the relation between the two sections.

Pie of Pie in Power BI

Instead of a pie chart, one can use a donut, like in the original representation. Even if the donut uses a smaller area for representation, in theory the pie chart offers a better basis for comparisons, at least in theory. Stacked column charts can be used as well (see C), however one loses the certainty that the pieces must add up to 100%. Further limitations can appear when one wants to achieve more with the visualizations.

Custom charts can be used as well. The pie chart coming from xViz (see D) allows to increase the size of a pie piece by using another radius, technique which could be used to highlight the piece represented in the second chart. Frankly, sunburst diagrams (see E) are better at representing the parent to child proportions, where the same color encoding has been used. Unfortunately, the more information is shown, the more loaded the visualization seems to be.

Pie of Pie Alternatives in Power BI I

A treemap can prove to be a better representation alternative because it encodes proportions in a unitary way, much like pie charts do, though it takes more space if one wants to make the labels visible. Radial charts (see G) and Aster plots (see I) can be occasionally better choices, especially because they use less space as they display only the main categories. A second diagram chart can be used to display the subcategories, much like in A and B. Sankey charts (see H) can be used as well, even if they don't allow representing any quantitative values unless one encodes them directly in the labels.

Pie of Pie Alternatives in Power BI II

When one dives into the world of diagrams and goes behind the still limited representational choices provided by the standard tools, one can be surprised by the additional representational choices. However, their appropriateness should be considered against readers' skillset to read and interpret them! Frankly, the alternatives considered above could be a better choice when they will reach a representational maturity.

Many thanks to Christopher Chin, who in his weekly post on data visualization blunders, suggested the examples used as basis for this post (see [1])!

Previous Post <<||>> Next Post

References:

[1] LinkedIn (2024) Christopher Chin's post (link)

SQL Troubles

Pages

22 August 2024

🧭Business Intelligence: Perspectives (Part 15: From Data to Storytelling III)

21 August 2024

🧭Business Intelligence: Perspectives (Part 14: From Data to Storytelling II)

18 August 2024

🧭Business Intelligence: Mea Culpa (Part III: Problem Solving)

16 August 2024

🧭Business Intelligence: Perspectives (Part 13: From Data to Storytelling I)

07 August 2024

🧭Business Intelligence: Perspectives (Part 12: From Data to Data Models)

06 August 2024

🧭Business Intelligence: Perspectives (Part 16: On the Cusps of Complexity)

04 August 2024

📊Graphical Representation: Graphics We Live By (Part X: Pie and Donut Charts in Power BI and Excel)

About Me