
15 February 2025

🧭Business Intelligence: Perspectives (Part XXVII: A Tale of Two Cities II)

Business Intelligence Series
There’s a saying that applies to many contexts, ranging from software engineering to data analysis and visualization solutions: "fools rush in where angels fear to tread" [1]. Much earlier, an adage attributed to Confucius offered a similar perspective: "do not try to rush things; ignore matters of minor advantage". Ignoring this advice, there's a drive in rapid prototyping to jump in with both feet without first checking how solid the ground is, often without adequate experience in the field. That’s understandable to some degree – people want to see progress and value fast, without building a foundation or getting an understanding of what’s happening, respectively of what's possible, often ignoring the full extent of the problems.

A prototype helps to bring the requirements closer to what’s intended to be achieved, though, as practice often shows, closing the gap between the initial steps and the final solution requires many iterations, sometimes too many to keep the solution cost-effective. There’s almost always a tradeoff between costs and quality, respectively between time and scope. Sooner or later, one must compromise somewhere in between, even if the solution is not optimal. The fuzzier the requirements and what’s achievable with a set of data, the harder it gets to find the sweet spot.

Even if people understand the steps, constraints and further aspects of a process relatively easily, making sense of the data generated by it, respectively using that data to optimize the process, can take considerable effort. In each context there’s a chain of tradeoffs and constraints that applies to the situation at hand, which makes it challenging to always find optimal solutions. Moreover, locally optimal solutions don’t necessarily provide the optimum effect when one looks at the broader context of the problems. Further on, even if one brought a process under control, it doesn’t necessarily mean that the process works efficiently.

This is the broader context in which data analysis and visualization topics need to be placed to build useful solutions, to make a sensible difference in one’s job. Especially when the data and processes look dull, one needs to find the perspectives that lead to useful information, respectively knowledge. It’s not realistic to expect to find new insight in any set of data. As experience often proves, insight is rarer than gold nuggets. Probably the most important aspect in gold mining is knowing where to look, though it also requires luck, research, the proper use of tools, effort, and probably much more.

One of the problems in working with data is that data is usually analyzed and visualized in aggregates at different levels, often without identifying and depicting the factors that determine why the data takes certain shapes. Even if a well-suited set of dimensions is defined for data analysis, the data is usually still considered in aggregate. Having the possibility to switch between aggregates and details is essential for understanding the data, or at least for getting an understanding of what's happening in the various processes.

There is one aspect of data modeling, respectively of analysis and visualization, that’s typically ignored in BI initiatives – process-wise there is usually data that is not available, and approximating the respective values to some degree is often far from optimal. Of course, there’s often a tradeoff between effort and value, though the actual value can be quantified only when enough data is gathered for a thorough first analysis. It may also happen that the only benefit is a deeper understanding of certain aspects of the processes, respectively of the business. Occasionally, this price may look high, though searching for cost-effective solutions is part of the job!


References:
[1] Alexander Pope (ca. 1711) An Essay on Criticism

04 February 2025

🧭Business Intelligence: Perspectives (Part XXVI: Monitoring - A Cockpit View)

Business Intelligence Series

The monitoring of business imperatives is sometimes compared metaphorically with piloting an airplane, where pilots look at the cockpit instruments to verify whether everything is under control and the flight proceeds according to expectations. The metaphor is supported by the fact that an airplane is an almost "closed" system in which the components were developed under strict requirements and tested thoroughly under specific technical conditions. Many instruments were engineered and evolved over decades to operate as such. The processes are standardized, inputs and outputs are under strict control, otherwise the whole edifice would crumble under its own complexity.

In organizational setups, a similar approach is attempted for monitoring the most important aspects of a business. A few dashboards and reports are thus built to monitor and control what’s happening in the areas identified as critical for the organization. The various gauges and other visuals are designed to provide perspectives similar to the ones offered by an airplane’s cockpit. At first sight the cockpit metaphor makes sense, though on careful analysis there are major differences.

Probably the main difference is that businesses don’t necessarily have standardized processes that were brought under control (and thus exhibit variation). Secondly, the data used doesn’t necessarily have the needed quality and occasionally isn’t fit for use in the business processes, including supporting processes like reporting or decision making. Thirdly, chances are high that the monitoring within the BI infrastructure doesn’t address the critical aspects of the business, at least not at the needed level of focus, detail or frequency. The interplay between these three main aspects can lead to complex issues and a muddy ground for a business to build a stable edifice upon.

The comparison with an airplane’s cockpit was chosen because the number of instruments available for monitoring is somewhat comparable with the number of visuals existing in an organization. In contrast, cars have a smaller number of controls, simple enough to help the one(s) sitting behind the wheel. A car’s monitoring capabilities can probably reflect the needs of a single department or team, though each unit needs its own gauges with a specific business focus. The parallel is however limited because the areas of focus in organizations can change and shift in other directions; some topics may have a periodic character while others can regain momentum after a long time.

There are further important aspects. At a high level, the expectation is for software products and processes, including the ones related to BI topics, to have the same stability and quality as the mass production of automobiles, airplanes or other artifacts of similar complexity and manufacturing characteristics. Even if the design processes of software and manufacturing share many characteristics, they diverge as soon as production starts, respectively progresses, and these are the areas where most of the differences lie. Starting from the requirements and ending with the overall goals, everything resembles quickly shifting sands on which it is challenging to build any stable edifice.

At the micro level in manufacturing, each piece is carefully designed and produced according to a set of characteristics proven to work. Everything must fit perfectly in the grand design, and there are many tests and steps to make sure that happens. To some degree the same is attempted when building software products, though the processes break along the way under the many changes attempted and the many cost, time and quality constraints. At some point the overall complexity kicks in; it might still be manageable, though the overall effort is higher than what organizations bargained for.

15 January 2025

🧭Business Intelligence: Perspectives (Part XXIII: In between the Many Destinations)

Business Intelligence Series

In too many cases the development of queries, respectively reports or data visualizations (aka artifacts), becomes a succession of drag & drops, formatting, (re)ordering things around, a bit of makeup, configuring a set of parameters, and the desired product is good to go! There seems to be nothing wrong with this approach as long as the outcomes meet users’ requirements, though it also gives the impression that that's all the process is about.

Given a set of data entities, there are usually at least as many perspectives into the data as there are entities. Further perspectives can be found in exceptions and gaps in the data, process variations, and the further aspects that can influence an artifact’s logic. All these aspects increase the overall complexity of the artifact, respectively of the development process. One guideline for handling all this is to keep the process in focus, and this starts with requirements’ elicitation and ends with quality assurance and actual use.

Sometimes the two worlds, the processes and their projection into the data and (data) models, don’t reflect reality adequately and one needs to compromise, at least until the gaps are better addressed. Process redesign, data harmonization and further steps need to be considered case by case, in multiple iterations that should converge to optimal solutions, at least in theory.

Therefore, in the development process there should be a continuous exploration of the various aspects until an optimum solution is reached. Often there can be a couple of competing forces that pull the solution in two or more directions, and then compromising is necessary. Especially as part of continuous improvement initiatives there’s the tendency of optimizing processes locally to the detriment of the overall process, with all the consequences resulting from this.

Unfortunately, many of the problems existing in organizations are ill-posed and misunderstood to the degree that, in extremis, more effort is wasted than the benefits justify. Optimization is a process of balancing all the important aspects, respectively of answering with agility to the changing nature of the business and its environment. Ignoring the changing nature of the problems and their contexts is a recipe for failure in the long term.

This implies that people in particular and organizations in general need to become and remain aware of the micro and macro changes occurring in organizations. Continuous learning is the key to coping with change. Organizations must learn to compromise and focus on what’s important, achievable and/or probable. Identifying, defining and following the value should be in an organization’s DNA. It also requires pragmatism (as opposed to idealism). It may even require saying “no”, at least until changes in the landscape allow a reevaluation of the various aspects.

One requires a lot from organizations when addressing optimization topics, especially when misalignment or important constraints or challenges may exist. Unfortunately, process-related problems don’t always admit linear solutions. The nonlinear aspects are reflected especially when changing the scale or perspective, or when translating the issues or solutions from one area to another.

There are probably answers available in the literature or in the approaches followed by other organizations. Reinventing the wheel is part of the game, though invention may require explorations outside of the optimal paths. Conversely, an organization that knows itself has better chances of coping with the challenges and opportunities altogether.

A lot of what organizations do in a consistent manner looks occasionally like inertia, self-occupation, suboptimal or random behavior, in opposition to being self-driven, self-aware, or in self-control. It’s also true that these are ideal qualities or aspects of what organizations should become in time.

14 December 2024

🧭💹Business Intelligence: Perspectives (Part XXI: Data Visualization Revised)

Data Visualization Series

Creating data visualizations has nowadays become so easy that anybody can do it with a minimum of effort and knowledge, which on one side is great for the creators, but can easily become a nightmare for the readers, respectively users. Just dumping data into visuals can barely be called data visualization, even if the result is considered as such. The problems of visualization are multiple – the lack of data culture, the lack of understanding of processes, data and their characteristics, the lack of ability to define and model problems, the lack of educating the users, the lack of managing expectations, etc.

There are many books on data visualization, though they seem an expensive commodity for those who want rapid enlightenment, and often the illusion of knowing proves to be a barrier. It's also true that many sets of data are so dull that the lack of information and meaning is compensated by adding elements that give a kitsch look-and-feel (aka chartjunk), shifting the attention from the valuable elements to decorations. So, how do we overcome the various challenges?

Probably the most important step when visualizing data is to define the primary purpose of the end product. Is it to inform, to summarize or to navigate the data, to provide different perspectives at macro and micro levels, to help discovery, to explore, to sharpen the questions, to make people think, respectively understand, to carry a message, to be artistic or to represent the reality truthfully, or maybe it's just a filler or point of attraction in a textual content?

Clarifying the initial purpose is important because it makes the motives and expectations explicit upfront, allowing one to determine the further requirements and characteristics, and maybe set some limits concerning the time spent and the quantitative and/or qualitative criteria upon which the end result should eventually be evaluated. Narrowing down such aspects helps in planning and in the further steps performed.

Many of the steps are repetitive, and past experience can help reduce the overall effort. Therefore, professionals in the field, driven by intuition and experience, probably don't always need to go through the full extent of the process. Conversely, what is learned and done poorly has high chances of delivering poor quality.

A visualization can be considered effective when it serves the intended purpose(s), when it reveals with minimal effort the patterns, issues or facts hidden in the data, and when it allows people to explore the data, ask questions and find answers altogether. One can also talk about efficiency, especially when readers can see at a glance the many aspects encoded in the visualization. However, the more the discovery process depends on data navigation via filters or other techniques, the more difficult it becomes to talk about efficiency.

Better criteria for judging visualizations are whether they are meaningful and useful for the readers, and whether the readers understood the authors' intent and its further intrinsic implications, though multiple characteristics can be associated with these criteria: clarity, specificity, correctness, truthfulness, appropriateness, simplicity, etc. All of these are important to a lower or higher degree depending on the broader context of the visualization.

All these must be weighed in the bigger picture when creating visualizations, though there are probably also exceptions, especially on the artistic side, where artists can cut corners to create an artistic effect; though even here the authors need to be truthful to the data and make sure that their work doesn't distort the facts excessively. Failing to do so might not have an important impact in the short term, though in time the effects can ripple in unexpected ways.


13 December 2024

🧭💹Business Intelligence: Perspectives (Part XX: From BI to AI)

Business Intelligence Series

No matter how good data visualizations, reports or other forms of BI artifacts are, they only serve a set of purposes for a limited amount of time or a limited audience, their lifespan being influenced by further factors as well. Sooner or later the artifacts thus become obsolete, being eventually disabled, archived and/or removed from the infrastructure.

Many artifacts require a considerable number of resources for their creation and maintenance over time. Sometimes the costs can be considerably higher than the benefits brought, especially when the data or the infrastructure are used for a narrow scope, though there can be other components that need to be considered in the bigger picture. Having a report or visualization one can use when needed can have an important impact on the business in correcting issues, seizing opportunities or filling knowledge gaps.

Even if it’s challenging to quantify the costs associated with the loss of opportunities rooted in the lack of data, respectively information, the amounts can be considerably high, greater even than the cost of building a whole BI infrastructure. An organization’s agility in addressing the important gaps can make a considerable difference, at least in theory. Having resources that can be pulled on demand can give organizations the needed competitive boost. Internal and external resources can be used altogether, though, pragmatically speaking, there will always be a gap between demand and supply of knowledgeable resources.

The gap in BI artifacts can nowadays be addressed by AI-driven tools, which have the theoretical potential of shortening the distance between needs and the availability of solutions, respectively a set of answers that can be used in the process. Of course, the processes of sense-making and discovery are not as simple as we’d like, though it’s a considerable step forward.

Having the possibility of asking questions in natural language and guiding the exploration process to create visualizations and other artifacts using prompt engineering and other AI-enabled methods offers new possibilities and opportunities that at least some organizations have already started exploring. This however presumes the existence of an infrastructure on which the needed foundation can be built, the knowledge required to bridge the gap, respectively the resources required in the process.

It must be stressed that the exploration processes may bring no sensible benefits, at least not immediately, and the whole process depends on an organization’s capability of identifying and seizing the respective opportunities. Therefore, even if there are recipes for success, each organization must identify what matters and how to use the technologies and the available infrastructure to bridge the gap.

Ideally, to make progress, organizations need, besides financial resources, the required skillset, a set of projects that support learning and value creation, respectively the design and execution of a business strategy that addresses the steps ahead. Each of these aspects implies risks and opportunities altogether. It will be a test of maturity for many organizations. It will be interesting to see how many organizations can handle the challenge, respectively how much past successes or failures will weigh in the balance.

AI offers a set of capabilities and opportunities, however the chance of exploring and failing fast is of great importance. AI is an enabler and not a magic wand, no matter what is preached in technical workshops! Even if progress follows an exponential trajectory, it took us more than half a century from the first steps until now, and probably many challenges must still be overcome.

The future looks interesting enough to be pursued, though are organizations capable of seizing the opportunities, respectively of overcoming the challenges ahead? Are organizations capable of supporting the effort without neglecting the other priorities?

12 December 2024

🧭💹Business Intelligence: Perspectives (Part XIX: Data Visualization between Art, Pragmatism and Kitsch)

Business Intelligence Series

The data visualizations (aka dataviz) presented in the media, especially the ones coming from graphical artists, have the power to help us develop what is called graphical intelligence, graphical culture, graphical sense, etc., though without a tutor-like experience the process is suboptimal, because it depends on our ability to identify what is important and which steps are needed for decoding and interpreting such work, respectively for integrating their messages into our overall understanding of the world.

When such a skillset is lacking, without explicit annotations or other forms of support, the reader might misinterpret or fail to observe important visual cues even in simple visualizations, with all the implications deriving from this – a false understanding, and the further aspects deriving from it, this being probably the most important aspect to consider. Unfortunately, even the most elaborate work can fail if the reader doesn’t have a basic understanding of all that’s implied in the process.

The books of Willard Brinton, Anna Rogers, Jacques Bertin, William Cleveland, Leland Wilkinson, Stephen Few, Alberto Cairo, Scott Berinato and many others can help readers build a general understanding of the dataviz process and of how data visualizations or simple graphics can be used effectively or misused, though each reader must follow his/her own journey. It’s also true that the basics can be easily learned, though the deeper one dives, the more interesting and nontrivial the journey becomes. Fortunately, the average reader can stick to the basics, and many visualizations are simple enough to be understood.

To grasp the full extent of the implications, one can make comparisons with the domain of poetry, where the author uses basic constructs like metaphor, comparison, rhythm and epithet to create, communicate and imprint in the reader’s mind old and new meanings, images and feelings altogether. Artistic data visualizations tend to carry a charge similar to poetry's, even if the impact might not appeal as much to our artistic sensibility. From this perspective, dataviz is, or at least resembles, an art form.

Many people can write verses, though only a fraction can write good, meaningful poetry, from which a smaller fraction produce poems, respectively even fewer get books published. Conversely, not everything can be expressed in verses, unless one finds good metaphors and other aspects that can be leveraged in the process. The same can be said about good dataviz.

One can argue that in dataviz the author can explore and learn especially by failing fast (seeing what works and what doesn’t). One can also innovate, though the creator has probably a limited set of tools and rules for communication. Enabling readers to see the obvious or the hidden in complex visualizations or contexts requires skill and some kind of mastery of the visual form.

Therefore, dataviz must be more pragmatic and show the facts. In art one has the freedom to distort or move things around to create new meanings, while in dataviz it’s important for the meaning to be rooted in 'truth', at least by definition. The more the creator of a dataviz innovates, the higher the chances of being misunderstood. Moreover, readers need to be educated in interpreting the new meanings and get used to their continuous use.

Kitsch is a term applied to art and design that is perceived as naïve imitation, to the degree that it becomes a waste of resources even if somebody pays the asking price. There’s a trend in dataviz to add elements to visualizations that don’t bring any intrinsic value – images, colors and other elements can be misused to the degree that the result resembles kitsch, and the overall value of the visualization is diminished considerably.

16 October 2024

🧭💹Business Intelligence: Perspectives (Part XVIII: There’s More to Noise)

Business Intelligence Series

Visualizations should be built with an audience's characteristics in mind! Depending on the case, it might be sufficient to show only the values or labels of importance (minima, maxima, inflexion points, exceptions, trends), while other times it might be needed to show all or most of the values to provide an accurate extended perspective. It might even be useful to allow users to switch between the different perspectives, to reduce the clutter when navigating the data or to look at the patterns revealed by the clutter.
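As a rough illustration of the first approach, here's a minimal base-R sketch (in the style of the R posts on this blog, with a made-up series) that labels only the minimum and the maximum instead of every point:

# hypothetical series - label only the values of importance (min & max)
set.seed(42)
y = cumsum(rnorm(24)) # made-up time series
x = 1:24

plot(x, y, type = "o", pch = 20, xlab = "Period", ylab = "Value")
for (i in c(which.min(y), which.max(y))) {
	points(x[i], y[i], pch = 19, col = "red", cex = 1.3) # highlight the point
	text(x[i], y[i], labels = round(y[i], 1), pos = 3, col = "red") # label it
}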

In data-based storytelling, one typically shows the points, labels and further elements that support the story, the aspects the readers should focus on, though this approach limits the navigability and users’ overall experience. The audience should be able to compare magnitudes and make inferences based on what is shown, and accurate decoding shouldn’t be taken as given, especially when the audience can associate different meanings with what’s available and what’s missing.

In decision-making, selecting only some well-chosen values or perspectives to show might increase the chances of a decision being made, though is this equitable? Cherry-picking may be justified by the purpose, though it is in general not a recommended practice! What is not shown can be as important as what is shown, and people should be aware of the implications!

One person’s noise can be another person’s signal. Patterns in the noise can provide more insight compared with the trends revealed in the "unnoisy" data shown! Probably such scenarios are rare, though it’s worth investigating what hides behind the noise. The choice of scale, the use of special types of visualizations or the building of models can reveal more. If it’s not possible to identify such scenarios automatically using standard software, the users should have the possibility of changing the scale and perspective as they see fit.
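To give a feeling of how much the choice of scale alone can reveal, here's a small base-R sketch with made-up skewed values, shown once on a linear and once on a logarithmic scale:

# hypothetical skewed data - the same values on linear vs logarithmic scale
set.seed(1)
y = exp(rnorm(100, mean = 2)) # log-normal values look like noise on a linear scale

par(mfrow = c(1,2)) #1x2 matrix display
plot(y, pch = 20, main = "Linear Scale", ylab = "Value")
plot(y, pch = 20, log = "y", main = "Log Scale", ylab = "Value (log)")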

Identifying patterns in what seems random can prove to be a challenge no matter the context and one's experience in the field. Occasionally, one might need to go beyond the general methods available, and statistical packages can help when used intelligently. However, a presenter’s challenge is to find a plausible narrative around the findings and to communicate it adequately. Additional capabilities must be available to confirm the hypotheses framed and the other aspects related to this approach.

It's ideal to build data models and a set of visualizations around them. Most probably some noise will be removed in the process, while other noise will be further investigated. However, this should be done through adjustable visual filters, because what is removed can be important as well. Rare events do occur, probably more often than we are aware of, and they may remain hidden until we find the right perspective that takes them into consideration.

Probably, some of the noise can be explained by special events that don’t need to be that rare. The challenge is to identify those parameters, associations, models and perspectives that reveal such insights. One’s gut feeling and experience can help in this direction, though novel scenarios can surprise us as well.

Not in every set of data can one find patterns, respectively a story trying to come out. Whether we can identify something worth revealing depends also on the data available at our disposal, respectively on whether the chosen data allow identifying significant patterns. Occasionally, the focus might be too narrow, too wide or too shallow. It’s important to look behind the obvious, to look at data from different perspectives, even if the data seem dull. It’s ideal to have the tools and knowledge needed to explore such cases, and here the exposure to similar real-life scenarios is probably critical!

04 August 2024

📊Graphical Representation: Graphics We Live By (Part X: Pie and Donut Charts in Power BI and Excel)

Graphical Representation Series

Pie charts are loved and hated by many alike, and there are many justified reasons to use them or avoid them, though the most important criterion for evaluating them is whether they do the intended job in an acceptable manner, especially when compared to other representational means. The most important aspect they depict is the part-to-whole ratio which, even if it can be depicted by other graphical tools, few tools represent efficiently.

The pie chart works well as a visualization tool when it has only 3-5 values that are easily recognizable in the visualization; however, as soon as the size or the number of pieces varies considerably, it becomes more difficult to visualize and interpret them, in which case their representation has more negative than positive effects. Many topics form something like a long tail - the portion of the distribution having many occurrences far from the head or beginning. Displaying the items from the long tail together with the other components can totally obscure the distribution of the long-tail items, as they become unrecognizable in the diagram.

One approach to handle this is to group all the items from the long tail under one piece (e.g. Other) and use a second form of representation to display them separately. For example, Microsoft Excel offers a way to zoom in on the section of a pie chart with small percentages by displaying them in a second pie chart (pie of pie) or bar chart (bar of pie), something like a "zoom in" perspective (see image below). Unfortunately, the feature seems to limit itself only to small percentages, and thus can't currently be used to offer a broader perspective. Ideally, it would be useful to zoom in on any piece of the pie, especially when the items are categorized as a hierarchy with two or even more levels.
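The grouping itself is straightforward; as a minimal sketch in R (the language used elsewhere on this blog, with made-up values), the long tail can be collapsed under an "Other" slice and broken down in a second chart:

# hypothetical shares - collapse the long tail under "Other"
# and break it down in a second pie chart (a simple "pie of pie")
shares = c(Coal = 35, Gas = 27, Nuclear = 15, Hydro = 10, Wind = 6, Solar = 4, Biomass = 3)

threshold = 10 # slices below this value form the long tail
big = shares[shares >= threshold]
small = shares[shares < threshold]

par(mfrow = c(1,2)) #1x2 matrix display
pie(c(big, Other = sum(small)), main = "All Categories")
pie(small, main = "Breakdown of 'Other'")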


Unfortunately, even modern visualization tools offer limited features for displaying this kind of perspective in a flexible, unitary visualization, and thus users are forced to use their creativity to provide proper solutions. In the example below, the "Renewables" piece of the pie is further broken down into the components of a full pie, an ensemble supposed to function as a single form of representation. With a bit of effort, the reader will probably understand the meaning behind the two pie charts, however the encoding of colors and other elements used are suboptimal for the decoding process.

Pie Charts - Original Solution

In the above example, the arrow may suggest that a relationship exists between the two donut charts, reflected also in the description provided; however, readers may still have difficulties in correctly interpreting the diagrams, especially when there's some kind of overlapping or another type of implied or unimplied resemblance. If the colors overlap or have other similarities, are they intentional? If the circles have the same size, does this observed resemblance have a meaning? The reader shouldn't have to bother with this type of question, but should see the resemblance and the meaning of the various elements with a minimum of effort while decoding a chart's elements. Of course, when the meaning is not clear, some guidance should ideally be provided!

Unfortunately, Power BI doesn't seem to have a visual similar to the one from Excel yet; however, with a bit of effort one can obtain similar results, even if there are other minor or important limitations. For example, the lines between the two pie charts can't be drawn, so one is forced to use other encodings to show that there's a connection between the Renewables slice and the small pie chart. Moreover, the ensemble thus created isn't treated as a unit and handled accordingly. Frankly, the maturity of a graphical representation environment can and should be judged from this perspective as well!

The representation below, built in Power BI, uses a few tricks to display two pie charts together. In the smaller pie chart, which represents the breakdown, the pieces' colors are variations of the parent slice's color, attempting to show that there's a relationship between the slice from the first chart and the pie chart with the details. Unfortunately, it wasn't possible to use lines like in Excel to show the relation between the two sections.

Pie of Pie in Power BI

Instead of a pie chart, one can use a donut, like in the original representation. Even if the donut uses a smaller area for representation, the pie chart offers a better basis for comparisons, at least in theory. Stacked column charts can be used as well (see C), however one loses the certainty that the pieces add up to 100%. Further limitations can appear when one wants to achieve more with the visualizations.

Custom charts can be used as well. The pie chart coming from xViz (see D) allows increasing the size of a pie piece by using another radius, a technique which could be used to highlight the piece represented in the second chart. Frankly, sunburst diagrams (see E) are better at representing the parent-to-child proportions, where the same color encoding has been used. Unfortunately, the more information is shown, the more loaded the visualization seems to be.

Pie of Pie Alternatives in Power BI I

A treemap can prove to be a better representation alternative because it encodes proportions in a unitary way, much like pie charts do, though it takes more space if one wants to make the labels visible. Radial charts (see G) and Aster plots (see I) can occasionally be better choices, especially because they use less space, as they display only the main categories. A second chart can be used to display the subcategories, much like in A and B. Sankey charts (see H) can be used as well, even if they don't allow representing any quantitative values unless one encodes them directly in the labels.

Pie of Pie Alternatives in Power BI II

When one dives into the world of diagrams and goes beyond the still limited representational choices provided by the standard tools, one can be surprised by the additional representational choices. However, their appropriateness should be considered against readers' skillset for reading and interpreting them! Frankly, the alternatives considered above could become a better choice once they reach representational maturity.

Many thanks to Christopher Chin, who in his weekly post on data visualization blunders suggested the examples used as the basis for this post (see [1])!


References:
[1] LinkedIn (2024) Christopher Chin's post (link)

18 May 2024

📊Graphical Representation: Graphics We Live By (Part IV: Area Charts in MS Excel)

Graphical Representation Series

An area chart or area graph (see A) is a graphical representation of quantitative data based on a line chart, for which the areas between the axis and the lines of the series are commonly emphasized with colors, textures, or hatchings (Wikipedia [1]). It resembles a combination of line and bar charts. Each data series results in the formation of a region (aka area), thus allowing one to identify overlaps and make comparisons between the lines within the same visual display. This approach usually works well for two or three data series if the lines don't overlap, though the more data series are added to the chart, the higher the chances for lines to overlap or for one area to be covered by another (see B). This can easily become more than the chart can handle, even if the data series can be filtered dynamically.

Area Charts

Stacked area charts are a variation of area charts in which the areas are stacked, much like stacked bar charts (see C). Research papers abound with such charts, probably because they allow stacking together multiple data series within a small area, reflecting thus the many variables involved. Such charts allow tracking individual as well as intermediary and total aggregated trends (see the sketch below the image).

Stacked Area Charts
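For readers who want to experiment beyond Excel, here's a minimal base-R sketch of the stacking mechanics (the series are made up), built from cumulative sums and polygon():

# hypothetical data - a stacked area chart built from cumulative sums
set.seed(13)
x = 1:12 # e.g. months
series = rbind(A = runif(12, 1, 3), B = runif(12, 2, 4), C = runif(12, 1, 2))
stacked = apply(series, 2, cumsum) # each row becomes a stack boundary

plot(x, stacked[nrow(stacked),], type = "n", ylim = c(0, max(stacked)), xlab = "Month", ylab = "Value")
colors = c('steelblue', 'red', 'purple')
for (i in nrow(stacked):1) # draw top-down so lower bands overlay the upper ones
	polygon(c(x, rev(x)), c(stacked[i,], rep(0, length(x))), col = colors[i], border = NA)
legend("topleft", legend = rownames(series), fill = colors)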

Unfortunately, besides the fact that some areas are barely distinguishable or that distant areas can't be compared (especially when an area in between has strong fluctuations), the lack of ticks and/or gridlines (see D) makes it difficult to interpret such charts. Moreover, when the lines are smoothed, it becomes even more difficult to identify the actual points. To address this, it makes sense to use markers for data points, to show that one works with discrete and not continuous points (see further paragraphs).

In general, it's recommended to reduce the number of data series to 3-5. For example, one can split the data series into 2-3 groups or categories based on the series' characteristics (e.g. concentrate on the high values in one chart, respectively the low values in another, or group the low values under an "others" category), which would allow making better comparisons.

Being able to sort the time series by their average value or by other criteria (e.g. showing the areas with minimal variations first) can improve the readability of such charts.

Moreover, the areas under the curves can easily hide missing data (see F) and occasionally negative values (which is the case in the 8th example), or distort the rate of change when the charts are wider than needed (compare F with C).

Line Chart, respectively Area Chart based on a subset
Area Charts Variations

Area charts seem to encode a dimension as area, though that's not necessarily the case. It seems natural to display time series of different granularities (day, month, quarter, year), though one needs to be careful about one important aspect! On a time scale, the more one moves away from the day towards weeks and months as time units, the bigger the distance between points becomes. In the end, all the points in a series are discrete (not continuous), though the bigger the distance, the more category-like these series become (compare F with C; the charts have the same width).

Using the area under the curve as a dimension makes sense when there's continuity, or when the discrete points are close enough to each other to resemble continuity. Thus, area charts are useful when the number of points is high (and the distance between them becomes negligible), e.g. showing daily values within a year or monthly values over several years.

According to [2], [3] and several other sources, using area to encode quantitative information is a poor graphical method, and this applies to pie charts and area charts alike. By contrast, for a bar chart (see G) one has either height or width to use for comparisons, while the points are always delimited as bars. Scatter plots (see H), even if they might miss the time dimension, better reflect the dispersion of the points along the lines, with the series delimited by the color encoding (compare H with E).

Column Chart and Scatter Plot
Alternatives for Area Charts

The more category-like the data series and the fewer data points they have, the higher the chances that other graphical representation tools can better represent the data. For example, year- or even quarter-based data can be better visualized with Sankey charts (unfortunately, not yet available as a standard Excel visual).

Conversely, there are situations in which the area chart isn't supposed to convey specific values but to give a feeling of the areas' shape, or in which its simplicity is more appropriate; in such situations area charts do a good job. In the end, a graphical representation's utility is linked to the chart's purpose (and audience, of course).

References:
[1] Wikipedia (2023) Area charts (link)
[2] William S Cleveland (1993) Visualizing Data
[3] Robert L Harris (1996) Information Graphics: A Comprehensive Illustrated Reference

22 March 2024

🧭Business Intelligence: Perspectives (Part IX: Dashboards Are Dead & Other Crap)

Business Intelligence Series

I find annoying the posts that declare that a technology is dead, as they seem to seek the sensational and, in the end, don't offer enough arguments for the positions taken; it's all just surfing through a few random ideas. Almost every time I click on such a link I find myself disappointed. Maybe it's just me - having too great expectations from ad-hoc experts who haven't understood the role of technologies and their lifecycle.

At least until now, dashboards are the only visual tool that allows displaying related metrics in a consistent manner, reflecting business objectives, health, or other important perspectives on an organization's performance. More recently, notebooks seem to be getting closer, given their capability of presenting data visualizations together with some of the intermediary steps used to obtain the data, though they are still far from offering similar capabilities. So, where could any justification against dashboards' utility come from? Even if I heard one or two expert voices saying that they don't need KPIs for managing an organization, organizations still need metrics to understand how they are doing as a whole and in parts.

Many argue that the design of dashboards is poor, that they don't reflect data visualization best practices, or that they are too difficult to navigate. There are so many books on dashboard and/or graphic design that it is almost impossible not to find such a book in any big library, if one wants to learn more about design. There are many resources online as well, though it's tough to fight a mind's stubbornness when it shows no interest in the topic. Conversely, there's also a lot of crap on the social networks that passes in the mainstream as best practice.

Frankly, design is important, though as long as the dashboards show the right data and the organization can guide itself by the respective numbers, the perfectionists can say whatever they want, even if they are right! Unfortunately, the numbers shown in dashboards raise justified questions, and the reasons are multiple. Do dashboards show the right numbers? Do they focus on the objectives or important issues? Can the numbers be trusted? Do they reflect reality? Can we use them in decision-making?

There are so many things that can go wrong when building a dashboard - so many transformations need to be performed that the chances of failure are high. It's enough to have several blunders in the code or data visualizations for people to stop trusting the data shown.

Trust and quality are complex concepts, and there’s no standard path to addressing them because they are a matter of perception, which can vary and change dynamically based on the situation. There are, however, approaches that allow minimizing this. One can start, for example, by providing transparency. For each dashboard, provide also detailed reports that, through drilldown (or by running the reports separately if that’s not possible), allow validating the numbers from the report. If users don’t trust the data or the report, then they should pinpoint what’s wrong. Of course, the two sources must be in sync, otherwise the validation becomes more complex.

There are also issues related to the approach - the way a reporting tool was introduced, the way dashboards flooded the space, how people reacted, etc. Introducing a reporting tool for dashboards is also a matter of strategy, tactics and operations, and the various aspects related to them must be addressed. Few organizations address this properly. Many organizations work on the principle "build it and they will come", even if they build the wrong thing!


29 February 2024

📊R Language: Visualizing the Iris Dataset

When working with a dataset that has several numeric features, it's useful to visualize it to understand the shape of each feature, usually by category or, in the case of the iris dataset, by species. For this purpose one can use a combination of a boxplot and a stripchart to obtain a visualization like the one below (click on the image for a better resolution):

Iris features by species
Iris features by species (box & jitter plots combined)

And here's the code used to obtain the above visualization:

par(mfrow = c(2,2)) #2x2 matrix display
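# for each of the four features: draw a boxplot by species, then overlay
# the individual observations as jittered points on top (add = TRUE)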

boxplot(iris$Petal.Width ~ iris$Species) 
stripchart(iris$Petal.Width ~ iris$Species
	, method = "jitter"
	, add = TRUE
	, vertical = TRUE
	, pch = 20
	, jitter = .5
	, col = c('steelblue', 'red', 'purple'))

boxplot(iris$Petal.Length ~ iris$Species) 
stripchart(iris$Petal.Length ~ iris$Species
	, method = "jitter"
	, add = TRUE
	, vertical = TRUE
	, pch = 20
	, jitter = .5
	, col = c('steelblue', 'red', 'purple'))

boxplot(iris$Sepal.Width ~ iris$Species) 
stripchart(iris$Sepal.Width ~ iris$Species
	, method = "jitter"
	, add = TRUE
	, vertical = TRUE
	, pch = 20
	, jitter = .5
	, col = c('steelblue', 'red', 'purple'))

boxplot(iris$Sepal.Length ~ iris$Species) 
stripchart(iris$Sepal.Length ~ iris$Species
	, method = "jitter"
	, add = TRUE
	, vertical = TRUE
	, pch = 20
	, jitter = .5
	, col = c('steelblue', 'red', 'purple'))

mtext("© sql-troubles@blogspot.com 2024", side = 1, line = 4, adj = 1, col = "dodgerblue4", cex = .7)
title("Iris Features (cm) by Species", line = -2, outer = TRUE)

By contrast, one can obtain a similar visualization with just one command:

plot(iris, col = c('steelblue', 'red', 'purple')[iris$Species], pch = 20) # index the colors by species so each species gets its own color
title("Iris Features (cm) by Species", line = -1, outer = TRUE)
mtext("© sql-troubles@blogspot.com 2024", side = 1, line = 4, adj = 1, col = "dodgerblue4", cex = .7)

And here's the output:

Iris features by species (general plot)

One can improve the visualization by using a bigger contrast between colors (I preferred to use the same colors as in the previous visualization).

I find the first data visualization easier to understand, and it provides more information about the shape of the data, even if it requires more work.

Histograms make it easier to understand the distribution of values, though the visualizations make sense only when done by species:

Histograms of Setosa's features

And, here's the code:

par(mfrow = c(2,2)) #2x2 matrix display

setosa = subset(iris, Species == 'setosa') #focus only on setosa
hist(setosa$Sepal.Width)
hist(setosa$Sepal.Length)
hist(setosa$Petal.Width)
hist(setosa$Petal.Length)
title("Setosa's Features (cm)", line = -1, outer = TRUE)
mtext("© sql-troubles@blogspot.com 2024", side = 1, line = 4, adj = 1, col = "dodgerblue4", cex = .7)

There's however a visual called a stacked histogram that allows delimiting the data for each species:


Iris features by species (stacked histograms)

And, here's the code:

#installing plotrix & multcomp
install.packages("plotrix")
install.packages("multcomp")
library(plotrix) # provides histStack
library(multcomp)

par(mfrow = c(2,2)) #2x2 matrix display

histStack(iris$Sepal.Width
	, z = iris$Species
	, col = c('steelblue', 'red', 'purple')
	, main = "Sepal.Width"
	, xlab = "Width"
	, legend.pos = "topright")

histStack(iris$Sepal.Length
	, z = iris$Species
	, col = c('steelblue', 'red', 'purple')
	, main = "Sepal.Length"
	, xlab = "Length"
	, legend.pos = "topright")

histStack(iris$Petal.Width
	, z = iris$Species
	, col = c('steelblue', 'red', 'purple')
	, main = "Petal.Width"
	, xlab = "Width"
	, legend.pos = "topright")

histStack(iris$Petal.Length
	, z = iris$Species
	, col = c('steelblue', 'red', 'purple')
	, main = "Petal.Length"
	, xlab = "Length"
	, legend.pos = "topright")
title("Iris Features (cm) by Species - Histograms", line = -1, outer = TRUE)
mtext("© sql-troubles@blogspot.com 2024", side = 1, line = 4, adj = 1, col = "dodgerblue4", cex = .7)

Conversely, the standard histogram allows drawing the density curves within its boundaries:

par(mfrow = c(2,2)) #2x2 matrix display 

hist(iris$Sepal.Width
	, main = "Sepal.Width"
	, xlab = "Width"
	, las = 1, cex.axis = .8, freq = F)
eq = density(iris$Sepal.Width) # estimate density curve
lines(eq, lwd = 2) # plot density curve

hist(iris$Sepal.Length
	, main = "Sepal.Length"
	, xlab = "Length"
	, las = 1, cex.axis = .8, freq = F)
eq = density(iris$Sepal.Length) # estimate density curve
lines(eq, lwd = 2) # plot density curve

hist(iris$Petal.Width
	, main = "Petal.Width"
	, xlab = "Width"
	, las = 1, cex.axis = .8, freq = F)
eq = density(iris$Petal.Width) # estimate density curve
lines(eq, lwd = 2) # plot density curve

hist(iris$Petal.Length
	, main = "Petal.Length"
	, xlab = "Length"
	, las = 1, cex.axis = .8, freq = F)
eq = density(iris$Petal.Length) # estimate density curve
lines(eq, lwd = 2) # plot density curve

title("Iris Features (cm) by Species - Density plots", line = -1, outer = TRUE)
mtext("© sql-troubles@blogspot.com 2024", side = 1, line = 4, adj = 1, col = "dodgerblue4", cex = .7)

And, here's the diagram:

Iris features aggregated (histograms with density plots)

As a final visualization, one can also compare the width and length of the sepal, respectively of the petal:
 
par(mfrow = c(1,2)) #1x2 matrix display

plot(iris$Sepal.Width, iris$Sepal.Length, main = "Sepal Width vs Length", col = iris$Species)
plot(iris$Petal.Width, iris$Petal.Length, main = "Petal Width vs Length", col = iris$Species)

title("Iris Features (cm) by Species - Scatter Plots", line = -1, outer = TRUE)
mtext("© sql-troubles@blogspot.com 2024", side = 1, line = 4, adj = 1, col = "dodgerblue4", cex = .7)

And, here's the output:
 
Iris features by species (scatter plots)

Happy coding!


27 February 2024

🔖Book Review: Rolf Hichert & Jürgen Faisst's International Business Communication Standards (IBCS Version 1.2)

Over the last months I found several references to Rolf Hichert & Jürgen Faisst's booklet on business communication standards [1]. It drew my attention especially because it attempts to provide a standard for reports and data visualizations, which frankly seems like a tremendous endeavor, if done right. The two authors founded the IBCS Institute 20 years ago, which is the host, training institute, and certification body of the Creative Commons project called IBCS.

The 150-page booklet considers various standardization techniques with the help of more than 180 instructive figures, the overall structure being based on a set of principles and rules rooted in an acronym that spells "SUCCESS" - Say, Unify, Condense, Check, Express, Simplify, Structure. On one side the principles seem to form a solid foundation, however the foundation seems to suffer from the rigidity that results from fitting something into a nicely spelled acronym.

Say, or conveying a message, reflects the principle that each report should convey a message, otherwise the report is just a data collection. According to this "definition", most operational reports are just collections of data. Conversely, a lot of communication in organizations revolves around issues, metrics and decision making, scenarios in which the messages conveyed can be powerful though dependent on the business context. Settling on only one message can make the message fall short.

Unify, or applying semantic notation, reflects the principle that things that have the same meaning should look the same. There are many patterns out there that can be standardized, however it's questionable how much complex visualizations can be standardized, respectively how much liberty of expressing certain aspects the standardization allows.

Condense, or increasing the information density, reflects the requirement that all information necessary for understanding the content should, if possible, be included on one page. This makes it easier to navigate the content and prioritize what the audience is able to see. The principle however seems to have more to do with the data-ink ratio principle (see [2]).

Check, or ensuring the visual integrity, reflects the principle that the information should be presented in the most truthful and most easily understood way. This is something that many data visualizations out there lack.

Express, or choosing the proper visualizations, is based on the principle that the visuals considered should be as intuitive as possible. In theory, the more intuitive a visual, the easier it is to understand and reuse, however this depends on the "visual vocabulary" and "visual grammar" of each individual. Intuition is something that needs to grow through the interplay of these two areas. Having the expectation of displaying everything in terms of basic elements is unrealistic and suboptimal.

Simplify, or avoiding clutter, refers to eliminating the unnecessary from a visualization, until there's nothing left to take out without changing its meaning. At least the principle is correctly considered, even if it is in general difficult to apply, because quite often one needs to build something more complex and reduce the complexity through iterative steps until the simple is obtained.

Structure, or organizing the content, is based on the principle that content should follow a logical, consistent structure. The interplay between function and structure is an important topic in itself.

Browsing through the many data visualizations given as examples, I'd say that many of the recommendations make sense, though from there to standardization there is still a long way. The reader should evaluate the described practices against his/her own judgement and consider what seems to work.

The booklet is available on the IBCS website as a PDF, though the Kindle version is 40% cheaper. Overall, it is worth a read.


Resources:
[1] Rolf Hichert & Jürgen Faisst (2022) "International Business Communication Standards (IBCS Version 1.2): Conceptual, perceptual, and semantic design of comprehensible business reports, presentations, and dashboards" (link)
[2] Edward R Tufte (1983) "The Visual Display of Quantitative Information"
[3] IBCS Institute (2024) About (link)
