SQL Troubles: data visualization

Showing posts with label data visualization. Show all posts

09 August 2025

🧭Business Intelligence: Perspectives (Part 33: Data Lifecycle for Analytics)

Business Intelligence Series

In the context of BI, Analytics and other data-related topics, the various parties usually talk about data ingestion, preparation, storage, analysis and visualization, often ignoring processes like data generation, collection, and interpretation. It’s also true that a broader discussion may shift the attention unnecessarily, though it’s important to increase people’s awareness in respect to data’s full lifecycle. Otherwise, many of the data solutions become a mix of castles built into the air, respectively structures of cards waiting for the next flurry to be blown away.

Data is generated continuously by organizations, their customers, vendors, and third parties, as part of a complex network of processes, systems and integrations that extend beyond their intended boundaries. Independently of their type, scope and various other characteristics, all processes consume and generate data at a rapid pace that steadily exceeds organizations’ capabilities to make good use of it.

There are also scenarios in which the data must be collected via surveys, interviews, forms, measurements or direct observations, and whatever processes are used to elicit some aspect of importance. The volume and other characteristics of data generated in this way may depend on the goals and objectives in scope, respectively the methods, procedures and even the methodologies used.

Data ingestion is the process of importing data from the various sources into a central or intermediary repository for storage, processing, analysis and visualization. The repository can be a data mart, warehouse, lakehouse, data lake or any other destination intended for the intermediary or the final intended destination of data. Moreover, data can have different levels of quality in respect to its intended usage.

Data storage refers to the systems and approaches used to securely retain, organize, and access data throughout its journey within the various layers of the infrastructure. It focuses on where and how data is stored, independently on whether that’s done on-premises, in the cloud or across hybrid environments.

Data preparation is the process of transforming the data into a form close to what is intended for analysis and visualization. It may involve data aggregation, enrichment, transposition and other operations that facilitate further steps. It’s probably the most important step in a data project given that the final outcome can have an important impact on data analysis and visualization, facilitating or impeding the respective processes.

Data analysis consists of a multitude of processes that attempt to harness value from data in its various forms of aggregation. The ultimate purpose is to infer meaningful information, respectively knowledge from the data augmented as insights. The road from raw data to these targeted outcomes is a tedious one, where recipes can help and imped altogether. Expecting value from any pile of data can easily become a costly illusion when data, processes and their usage is poorly understood and harnessed.

Data visualization is the means of presenting data and its characteristics in the form of figures, diagrams and other forms of representation that facilitate data’s navigation, perception and understanding for various purposes. Usually, the final purpose is fact-checking, decision-making, problem-solving, etc., though there is a multitude of steps in between. Especially in these areas there are mixed good and poor practices altogether.

Data interpretation is the attempt of drawing meaningful conclusions from the data, information and knowledge gained mainly from data analysis and visualization. It is often a subjective interpretation as it’s usually regarded from people’s understanding of the various facts as they are considered. The inferences made in the process can be a matter of gut feeling, respectively of mature analysis. It’s about sense-making, contextualization, critical thinking, pattern recognition, internalization and externalization, and other similar cognitive processes.

Previous Post <<||>> Next Post

30 July 2025

📊Graphical Representation: Sense-making in Data Visualizations (Part 3: Heuristics)

Graphical Representation Series

Consider the following general heuristics in data visualizations (work in progress):

plan design

plan page composition

text

title, subtitles
dates

refresh, filters applied

parameters applied
guidelines/tooltips
annotation

navigation

main page(s)
additional views
drill-through
zoom in/out
next/previous page
landing page

slicers/selections

date-related

date range
date granularity

functional

metric
comparisons

categorical

structural relations

icons/images

company logo
button icons
background

pick a theme

choose a layout and color schema

use a color palette generator
use a focused color schema or restricted palette
use consistent and limited color scheme
use suggestive icons

use one source (with similar design)

use formatting standards

create a visual hierarchy

use placement, size and color for emphasis
organize content around eye movement pattern
minimize formatting changes
1 font, 2 weights, 4 sizes

plan the design

build/use predictable and consistent templates

e.g. using Figma

use layered design
aim for design unity
define & use formatting standards
check changes

GRACEFUL

group visuals with white space
right chart type
avoid clutter
consistent & limited color schema
enhanced readability
formatting standard
unity of design
layered design

keep it simple

be predictable and consistent
focus on the message

identify the core insights and design around them
pick suggestive titles/subtitles

use dynamics subtitles

align content with the message

avoid unnecessary complexity

minimize visual clutter

remove the unnecessary elements
round numbers

limit colors and fonts

use a restrained color palette (<5 colors)
stick to 1-2 fonts
ensure text is legible without zooming

aggregate values

group similar data points to reduce noise
use statistical methods

averages, medians, min/max

categories when detailed granularity isn’t necessary

highlight what matters

e.g. actionable items
guide attention to key areas

via annotations, arrows, contrasting colors
use conditional formatting

do not show only the metrics

give context

show trends

via sparklines and similar visuals

use familiar visuals

avoid questionable visuals

e.g. pie charts, gauges

avoid distortions

preserve proportions

scale accurately to reflect data values
avoid exaggerated visuals

don’t zoom in on axes to dramatize small differences

use consistent axes

compare data using the same scale and units across charts
don't use dual axes or shifting baselines that can mislead viewers

avoid manipulative scaling

use zero-baseline on bar charts
use logarithmic scales sparingly

design for usability

intuitive interaction
at-a-glance perception

use contrast for clarity
use familiar patterns

use consistent formats the audience already knows

design with the audience in mind

analytical vs managerial perspectives (e.g. dashboards)

use different level of data aggregations

in-depth data exploration

encourage scrutiny

give users enough context to assess accuracy

provide raw values or links to the source

explain anomalies, outliers or notable trends

via annotations

group related items together

helps identify and focus on patterns and other relationships

diversify

don't use only one chart type
pick the chart that reflects the best the data in the conrext considered

show variance

absolute vs relative variance
compare data series
show contribution to variance

use familiar encodings

leverage (known) design patterns

use intuitive navigation

synchronize slicers

use tooltips

be concise
use hover effects

use information buttons

enhances user interaction and understanding

by providing additional context, asking questions

use the full available surface

1080x1920 works usually better

keep standards in mind

e.g. IBCS

state the assumptions

be explicit

clearly state each assumption

instead of leaving it implied

contextualize assumptions

explain the assumption

use evidence, standard practices, or constraints

state scope and limitations

mention what the assumption includes and excludes

tie assumptions to goals & objectives

helps to clarify what underlying beliefs are shaping the analysis
helps identify whether the visualization achieves its intended purpose

show the data

be honest (aka preserve integrity)

avoid distortion, bias, or trickery

support interpretation

provide labels, axes, legends

emphasize what's meaningful

patterns, trends, outliers, correlations, local/global maxima/minima

show what's important

e.g. facts, relationships, flow, similarities, differences, outliers, unknown
prioritize and structure the content

e.g. show first an overview, what's important

make the invisible visible

think about what we do not see

know your (extended) users/audience

who'll use the content, at what level, for that

test for readability

get (early) feedback

have the content reviewed first

via peer review, dry run presentation

tell the story

know the audience and its needs
build momentum, expectation
don't leave the audience to figure it out
show the facts
build a narrative

show data that support it
arrange the visuals in a logical sequence

engage the reader

ask questions that bridge the gaps

e.g. in knowledge, in presentation's flow

show the unexpected
confirm logical deductions

Previous Post <<||>> Next Post

27 July 2025

📊Graphical Representation: Sense-making in Data Visualizations (Part 2: Guidelines)

Graphical Representation Series

Consider the following best practices in data visualizations (work in progress):

avoid poor labeling and annotation practices

label data points

considering labeling at least the important number of points

e.g. starts, ends, local/global minima/minima
when labels clutter the chart or there's minimal variation

avoid abbreviations

unless they are defined clearly upfront, consistent and/or universally understood
can hinder understanding

abbreviations should help compress content without losing meaning

use font types, font sizes, and text orientation that are easy to read
avoid stylish design that makes content hard to read
avoid redundant information
text should never overshadow or distort the actual message or data

use neutral, precise wording

avoid the use of pre-attentive attributes

aka visual features that our brains process almost instantly
color

has identity value: used to distinguish one thing from another

carries its own connotations
gives a visual scale of measure
the use of color doesn’t always help

refers to the dominant color family of a specific color, being processed by the brain based on the different wavelengths of light

allows to differentiate categories

use distinct hues to represent different categories

intensity (aka brightness)

refers to how strong or weak a color appears

saturation (aka chroma, intensity)

refers to the purity or vividness of a color

as saturation decreases, the color becomes more muted or washed out
highly saturated colors have little or no gray in it
highly desaturated colors are almost gray, with none of the original colors

use high saturation for important elements like outliers, trends, or alerts
use low saturation for background elements

avoid pure colors that are bright and saturated

drive attention to the respective elements

avoid colors that are too similar in tone or saturation
avoid colors hard to distinguish for color-blind users

e.g. red-green color blindness

brown-green, orange-red, blue-purple combinations
avoid red-green pairings for status indicators

e.g. success/error

e.g. blue-yellow color blindness

blue-green, yellow-ping, purple-blue

e.g. total color blindness (aka monochromacy)

all colors appear as shades of gray

⇒ users must rely entirely on contrast, shape, and texture

use icons, labels, or patterns alongside color
use tools to test for color issues

e.g. Color Oracle, WebAIM Contrast Checker

use colorblind-safe palettes

e.g. ColorBrewer or Viridis4

for sequential or diverging data, use one hue and vary saturation or brightness to show magnitude
start with all-gray data elements

use color only when it corresponds to differences in data

⇐ helps draw attention to whatever isn’t gray

dull and neutral colors give a sense of uniformity
can modify/contradict readers' intuitive response
choose colors to draw attention, to label, to show relationships

form

shape

allows to distinguish types of data points and encode information

well-shaped data has functional and aesthetic character

complex shapes can become more difficult to be perceived

size

attribute used to encode the magnitude or extent of elements
should be aligned to its probable use, importance, and amount of detail involved

larger elements draw more attention

its encoding should be meaningful

e.g. magnitudes of deviations from the baseline

overemphasis can lead to distortions
choose a size range that is appropriate for the data
avoid using size to represent nominal or categorical data where there's no inherent order to the sizes

orientation

angled or rotated items stand out.

length/width

useful in bar charts to show quantity
avoid stacked bar graphs

curvature

curved lines can contrast with straight ones.

collinearity

alignment can suggest grouping or flow

highlighting
spatial positioning

2D position

placement on axes or grids conveys value

3D position in 2D space

grouping

proximity implies relationships.
keep columns, respectively bars close together

enclosure

borders or shaded areas signal clusters.

depth (stereoscopic or shading)

adds dimensionality

avoid graphical features that are purely decorative

aka elements that don't affect understanding, structure or usability
stylistic embellishments

borders/frames

ornamental lines or patterns around content

background images

images used for ambiance, not content

drop shadows and gradients

enhance depth or style but don’t add meaning.

icons without function

decorative icons that don’t represent actions or concepts

non-informative imagery

stock photos

generic visuals that aren’t referenced in the text.

illustrations

added for visual interest, not explanation.

mascots or logos

when repeated or not tied to specific content.

layout elements

spacers

transparent or blank images used to control layout
leave the right amount of 'white' space between chart elements

custom bullets or list markers

designed for flair, not clarity

visual separators

lines or shapes that divide sections without conveying hierarchy or meaning

avoid bias

sampling bias

showing data that doesn’t represent the full population

avoid cherry-picking data

aka selecting only the data that support a particular viewpoint while ignoring others that might contradict it
enable users to look at both sets of data and contrast them
enable users to navigate the data

avoid survivor bias

aka focusing only on the data that 'survived' a process and ignoring the data that didn’t

use representative data

aka the dataset includes all relevant groups

check for collection bias

avoid data that only comes from one source
avoid data that excludes key demographics

cognitive bias

mental shortcut that sometimes affect interpretation

incl. confirmation bias, framing bias, pattern bias

balance visual hierarchies

don’t make one group look more important by overemphasizing it

show uncertainty

by including confidence intervals or error bars to reflect variability

separate comparisons

when comparing groups, use adjacent charts rather than combining them into one that implies a hierarchy

e.g. ethnicities, region

visual bias

design choices that unintentionally (or intentionally) distort meaning

respectively how viewers interpret the data

avoid manipulating axes

by truncating y-axis

exaggerates differences

by changing scale types

linear vs. logarithmic

a log scale compresses large values and expands small ones, which can flatten exponential growth or make small changes seem more significant

uneven intervals

using inconsistent spacing between tick marks can distort trends

by zooming in/out

adjusting the axis to focus on a specific range can highlight or hide variability and eventually obscure the bigger picture

by using dual axes

if the scales differ too much, it can falsely imply correlation or exaggerate relationships

by distorting the aspect ration

stretching or compressing the chart area can visually amplify or flatten trends

e.g. a steep slope might look flat if the x-axis is stretched

avoid inconsistent scales
label axes clearly
explain scale choices

avoid overemphasis

avoid unnecessary repetition

e.g. of the same graph, of content

avoid focusing on outliers, (short-term) trends
avoid truncating axes, exaggerating scales
avoid manipulating the visual hierarchy

avoid color bias

bright colors draw attention unfairly

avoid overplotting

too much data obscures patterns

avoid clutter

creates cognitive friction

users struggle to focus on what matters because their attention is pulled in too many directions
is about design excess

avoid unnecessary or distracting elements

they don’t contribute to understanding the data

avoid overloading

attempting to show too much data at once

is about data excess

overwhelms readers' processing capacity, making it hard to extract insights or spot patterns

algorithmic bias

the use of ML or other data processing techniques can reinforce certain aspects (e.g. social inequalities, stereotypes)
visualize uncertainty

include error bars, confidence intervals, and notes on limitations

audit data and algorithms

look for bias in inputs, model assumptions and outputs

intergroup bias

charts tend to reflect or reinforce societal biases

e.g. racial or gender disparities

use thoughtful ordering, inclusive labeling
avoid deficit-based comparisons

avoid overcomplicating the visualizations

e.g. by including too much data, details, other elements

avoid comparisons across varying dimensions

e.g. (two) circles of different radius, bar charts of different height, column charts of different length,
don't make users compare angles, areas, volumes

Previous Post <<||>> Next Post

21 July 2025

📊Graphical Representation: Sense-making in Data Visualizations (Part 1: An Introduction)

Graphical Representation Series

Introduction

Creating simple charts or more complex data visualizations may appear trivial for many, though their authors shouldn't forget that readers have different backgrounds, degrees of literacy, many of them not being maybe able to make sense of graphical displays, at least not without some help.

Beginners start with a limited experience and build upon it, then, on the road to mastery, they get acquainted with the many possibilities, a deeper sense is achieved and the choices become a few. Independently of one's experience, there are seldom 'yes' and 'no' answers for the various choices, but everything is a matter of degree that varies with one's experience, available time, audience's expectations, and many more aspects might be considered in time.

The following questions are intended to expand, respectively narrow down our choices when dealing with data visualizations from a data professional's perspective. The questions are based mainly on [1] though they were extended to include a broader perspective.

General Questions

Where does the data come from? Is the source reliable, representative (for the whole population in scope)? Is the data source certified? Are yhe data actual?

Are there better (usable) sources? What's the effort to consider them? Does the data overlap? To what degree? Are there any benefits in merging the data? How much this changes the overall picture? Are the changes (in trends) explainable?

Was the data collected? How, from where, and using what method? [1] What methodology/approach was used?

What's the dataset about? Can one recognize the data, the (data) entities, respectively the structures behind? How big is the fact table (in terms of rows and columns)? How many dimensions are in scope?

What transformations, calculations or modifications have been applied? What was left out and what's the overall impact?

Any significant assumptions were made? [1] Were the assumptions clearly stated? Are they entitled? Is it more to them?

Were any transformation applied? Do the transformations change any data characteristics? Were they adequately documented/explained? Do they make sense? Was it something important left out? What's the overall impact?

What criteria were used to include/exclude data from the display? [1] Are the criteria adequately explained/documented? Do they make sense?

Are similar data publicly available? Is it (freely) accessible/usable? To what degree? How much do the datasets overlap? Is there any benefit to analyze/use the respective data? Are the characteristics comparable? To what degree?

Dataviz Questions

What's the title/subtitle of the chart? Is it meaningful for the readers? Does the title reflect the data, respectively the findings adequately? Can it be better formulated? Is it an eye-catcher? Does it meet the expectations?

What data is shown? Of what type? At what level is the data aggregated?

What chart (type) is being used? [1] Are the readers familiar with the chart type? Does it needs further introduction/clarifications? Are there better means to represent the data? Does the chart offer the appropriate perspective? Does it make sense to offer different (complementary) perspective(s)? To what degree other perspectives help?

What items of data do the marks represent? What value associations do the attributes represent? [1] Are the marks visible? Are the marks adequately presented (e.g. due to missing data)?

What range of values are displayed? [1] What approximation the values support? To what degree can the values be rounded without losing meaning?

Is the data categorical, ordinal or continuous?

Are the axes property chosen/displayed/labeled? Is the scale properly chosen (linear, semilogarithmic, logarithmic), respectively displayed? Do they emphasize, diminish, distort, simplify, or clutter the information?

What features (shapes, patterns, differences or connections) are observable, interesting or vital for understanding the chart? [1]

Where are the largest, mid-sized and smallest values? (aka ‘stepped magnitude’ judgements). [1]

Where lie the most/least values? Where is the average or normal? (aka ‘global comparison’ judgements)” [1] How are the values distributed? Are there any outliers present? Are they explainable?

What features are expected or unexpected? [1] To what degree are they unexpected?

What features are important given the subject? [1]

What shapes and patterns strike readers as being semantically aligned with the subject? [1]

What is the overall feeling when looking at the final result? Is the chart overcrowded? Can anything be left out/included?

What colors were used? [1] Are the colors adequately chosen, respectively meaningful? Do they follow the general recommendations?

What colors, patterns, forms do readers see first? What impressions come next, respectively last longer?

Are the various elements adequately/intuitively positioned/distinguishable? What's the degree of overlapping/proximity? Do the elements respect an intuitive hierarchy? Do they match readers' expectations, respectively the best practices in scope? Are the deviations entitled?

Is the space properly used? To what degree? Are there major gaps?

Know Your Audience

What audience targets the visualization? Which are its characteristics (level of experience with data visualizations; authors, experts or casual attendees)? Are there any accidental attendees? How likely is the audience to pay attention?

What is audience’s relationship with the subject matter? What knowledge do they have or, conversely, lack about the subject? What assistance might they need to interpret the meaning of the subject? Do they have the capacity to comprehend what it means to them? [1]

Why do the audience wants/needs to understand the topic? Are they familiar, respectively actively interested or more passive? Is it able to grasp the intended meaning? [1] To what degree? What kind of challenges might be involved, of what nature?

What is their motivation? Do they have a direct, expressed need or are they more passive and indifferent? Is it needed a way to persuade them or even seduce them to engage? [1] Can this be done without distorting the data and its meaning(s)?

What are their visualization literacy skill set? Do they require assistance perceiving the chart(s)? Are they sufficiently comfortable with operating features of interactivity? Do they have any visual accessibility issues (e.g. red–green color blindness)? Do they need to be (re)factored into the design? [1]

Reflections

What has been learnt? Has it reinforced or challenged existing knowledge? [1] Was new knowledge gained? How valuable is this knowledge? Can it be reused? In which contexts?

Do the findings meet one's expectations? To what degree? Were the expectations entitled? On what basis? What's missing? What's gaps' relevance?

What feelings have been stirred? Has the experience had an impact emotionally? [1] To what degree? Is the impact positive/negative? Is the reaction entitled/explainable? Are there any factors that distorted the reactions? Are they explainable? Do they make sense?

What does one do with this understanding? Is it just knowledge acquired or something to inspire action (e.g. making a decision or motivating a change in behavior)? [1] How relevant/valuable is the information for us? Can it be used/misused? To what degree?

Are the data and its representation trustworthy? [1] To what degree?

Previous Post <<||>> Next Post

References:
[1] Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019

06 July 2025

🧭Business Intelligence: Perspectives (Part 32: Data Storytelling in Visualizations)

Business Intelligence Series

From data-related professionals to book authors on data visualization topics, there are many voices that require from any visualization to tell a story, respectively to conform to storytelling principles and best practices, and this independently of the environment or context in which the respective artifacts are considered. The need for data visualizations to tell a story may be entitled, though in business setups the data, its focus and context change continuously with the communication means, objectives, and, at least from this perspective, one can question storytelling’s hard requirement.

Data storytelling can be defined as "a structured approach for communicating data insights using narrative elements and explanatory visuals" [1]. Usually, this supposes the establishment of a context, respectively a fundament on which further facts, suppositions, findings, arguments, (conceptual) models, visualizations and other elements can be based upon. Stories help to focus the audience on the intended messages, they connect and eventually resonate with the audience, facilitate the retaining of information and understanding the chain of implications the decisions in scope have, respectively persuade and influence, when needed.

Conversely, besides the fact that it takes time and effort to prepare stories and the afferent content (presentations, manually created visualizations, documentation), expecting each meeting to be a storytelling session can rapidly become a nuisance for the auditorium as well for the presenters. Like in any value-generating process, one should ask where the value in storytelling is based on data visualizations and the effort involved, or whether the effort can be better invested in other areas.

In many scenarios, requesting from a dashboard to tell a story is an entitled requirement given that many dashboards look like a random combination of visuals and data whose relationship and meaning can be difficult to grasp and put into a plausible narrative, even if they are based on the same set of data. Data visualizations of any type should have an intentional well-structured design that facilitates visual elements’ navigation, understanding facts’ retention, respectively resonate with the auditorium.

It’s questionable whether such practices can be implemented in a consistent and meaningful manner, especially when rich navigation features across multiple visuals are available for users to look at data from different perspectives. In such scenarios the identification of cases that require attention and the associations existing between well-established factors help in the discovery process.

Often, it feels like visuals were arranged aleatorily in the page or that there’s no apparent connection between them, which makes the navigation and understanding more challenging. For depicting a story, there must be a logical sequencing of the various visualizations displayed in the dashboards or reports, especially when visuals’ arrangement doesn’t reflect the typical navigation of the visuals or when the facts need a certain sequencing that facilitates understanding. Moreover, the sequencing doesn’t need to be linear but have a clear start and end that encompasses everything in between.

Storytelling works well in setups in which something is presented as the basis for one-time or limited in scope sessions like decision-making, fact-checking, awareness raising and other types of similar communication. However, when building solutions for business monitoring and data exploration, there can be multiple stories or no story worth telling, at least not for the predefined scope. Even if one can zoom in or out, respectively rearrange the visuals and add others to highlight the stories encompassed, the value added by taking the information out of the dashboards and performing such actions can be often neglected to the degree that it doesn’t pay off. A certain consistency, discipline and acumen is needed then for focusing on the important aspects and ignoring thus the nonessential.

Previous Post <<||>> Next Post

References:
[1] Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019 [quotes]

03 May 2025

🧭Business Intelligence: Perspectives (Part 31: More on Data Visualization)

Business Intelligence Series

There are many reasons why the data visualizations available in the different mediums can be considerate as having poor quality and unfortunately there is often more than one issue that can be corroborated with this - the complexity of the data or of the models behind them, the lack of identifying the right data, respectively aspects that should be visualized, poor data visualization software or the lack of skills to use its capabilities, improper choice of visual displays, misleading choice of scales, axes and other elements, the lack of clear outlines for telling a story respectively of pushing a story too far, not adapting visualizations to changing requirements or different perspectives, to name just the most important causes.

The complexity of the data increases with the dimensions associated typically with what we call currently big data - velocity, volume, value, variety, veracity, variability and whatever V might be in scope. If it's relatively easy to work with a small dataset, understanding its shapes and challenges, our understanding power decreases with the Vs added into the picture. Of course, we can always treat the data alike, though the broader the timeframe, the higher the chances are for the data to have important changing characteristics that can impact the outcomes. It can be simple definition changes or more importantly, the model itself. Data, processes and perspectives change fluidly with the many requirements, and quite often the further implications for reporting, visualizations and other aspects are not considered.

Quite often there's a gap between what one wants to achieve with a data visualization and the data or knowledge available. It might be a matter of missing values or whole attributes that would help to delimit clearly the different perspectives or of modelling adequately the processes behind. It can be the intrinsic data quality issues that can be challenging to correct after the fact. It can also be our understanding about the processes themselves as reflected in the data, or more important, on what's missing to provide better perspectives. Therefore, many are forced to work with what they have or what they know.

Many of the data visualizations inadvertently reflect their creators' understanding about the data, procedures, processes, and any other aspects related to them. Unfortunately, also business users or other participants have only limited views and thus their knowledge must be elicited accordingly. Even then, it might be pieces of data that are not reflected in any knowledge available.

If one tortures enough data, one or more stories worthy of telling can probably be identified. However, much of the data is dull to the degree that some creators feel forced to add elements. Earlier, one could have blamed the software for it, though modern software provides nice graphics and plenty of features that can help graphics creators in the process. Even data with high quality can reveal some challenges difficult to overcome. One needs to compromise and there can be compromises in many places to the degree that one can but wonder whether the end result still reflects reality. Unfortunately, it's difficult to evaluate the impact of such gaps, however progress can be made occasionally by continuously evaluating the gaps and finding the appropriate methods to address them.

Not all stories must have complex visualizations in which multiple variables are used to provide the many perspectives. Some simple visualizations can be enough for establishing common ground on which something more complex (or simple) can be built upon. Data visualization is a continuous process of exploration, extrapolation, evaluation, testing assumptions and ideas, where one's experience can be a useful mediator between the various forces.

Previous Post <<||>> Next Post

📊Graphical Representation: Graphics We Live By (Part XI: Comparisons Between Data Series)

Graphical Representation Series

Over the past 10-20 years it became so easy to create data visualizations just by dropping some of the data available into a tool like Excel and providing a visual depiction of it with just a few clicks. In many cases, the first draft, typically provided by default in the tool used, doesn't even need further work as the objective was reached, while in others the creator must have a minimum skillset for making the visualization useful, appealing, or whatever quality is a final requirement for the work in scope. However, the audience might judge the visualization(s) from different perspectives, and there can be a broad audience with different skills in reading, evaluating and understanding the work.

There are many depictions on the web resembling the one below, taken from a LinkedIn post:

Example Chart - Boing vs. Airbus

Even if the visualization is not perfect, it does a fair job in representing the data. Improvements can be made in the areas of labels, the title and positioning of elements, and the color palette used. At least these were the improvements made in the original post. It must be differentiated also between the environment in which the charts are made available, the print format having different characteristics than the ones in business setups. Unfortunately, the requirements of the two are widely confused, probably also because of the overlapping of the mediums used.

Probably, it's a good idea to always start with the row data (or summaries of it) when the result consists of only a few data points that can be easily displayed in a table like the one below (the feature to round the decimals for integer values should be available soon in Power BI):

Summary Table

Of course, one can calculate more meaningful values like percentages from the total, standard deviations and other values that offer more perspectives into the data. Even if the values adequately reflect the reality, the reader can but wonder about the local and global minimal/maximal values, without talking much about the meaning of data points, which is easily identifiable in a chart. At least in the case of small data sets, using a table in combination with a chart can provide a more complete perspective and different ways of analyzing the data, especially when the navigation is interactive.

Column and bar charts do a fair job in comparing values over time, though they do use a lot of ink in the process (see D). While they make it easy to compare neighboring values, the rectangles used tend to occupy a lot of space when they are made too wide or too high to cover the empty space within the display (e.g. when just a few values are displayed, space being wasted in the process). As the main downside, it takes a lot of scanning until the reader identifies the overall trends, and the further away the bars are from each other, the more difficult it becomes to do comparisons.

In theory, line charts are more efficient in representing the above data points, because the marks are usually small and the line thin enough to provide a better data-ink ratio, while one can see a lot at a glance. In Power BI the creator can use different types of interpolation: linear (A), step (B) or smooth (C). In many cases, it might be a good idea to use a linear interpolation, though when there are no or minimal overlapping, it might be worthwhile to explore the other types if interpolation too (and further request feedback from the users):

Linear, Step and Smooth Line Charts

The nearness of values from different series can raise difficulties in identifying adequately the points, respectively delimiting the lines (see B).When the density of values allows it, it makes sense also to include the averages for each data series to reflect the distance between the two data sets. Unfortunately, the chart can get crowded if further data series or summaries are added to the cart(s).

If the column chart (E) is close to the redesigned chart provided in the original redesign, the other alternatives can provide upon case more value. Stacked column charts (D) allow also to compare the overall quantity by month, area charts (F) tend to use even more color than needed, while water charts (G) allow to compare the difference between data points per time unit. Tornado charts (H) are a variation of bar charts, allowing easier comparing of the size of the bars, while ribbon charts (I) show well the stacking values.

Alternatives to Line Charts

One should consider changing the subtitle(s) slightly to reflect the chart type when the patterns shown imply a shift in attention or meaning. Upon case, more that one of the above charts can be used within the same report when two or more perspectives are important. Using a complementary perspective can facilitate data's understanding or of identifying certain patterns that aren't easily identifiable otherwise.

In general, the graphics creators try to use various representational means of facilitating a data set's understanding, though seldom only two series or a small subset of dimensions provide a complete description. The value of data comes when multiple perspectives are combined. Frankly, the same can be said about the above data series. Yes, there are important differences between the two series, though how do the numbers compare when one looks at the bigger picture, especially when broken down on element types (e.g. airplane size). How about plan vs. actual values, how long does it take more for production or other processes? It's one of a visualization's goals to improve the questions posed, but how efficient are visualizations that barely scratch the surface?

In what concerns the code, the following scripts can be used to prepare the data:

-- Power Query script (Boeing vs Airbus)
= let
    Source = let
    Source = #table({"Sorting", "Month Name", "Serial Date", "Boeing Deliveries", "Airbus Deliveries"},
    {
        {1, "Oct", #date(2023, 10, 31), 30, 50},
        {2, "Nov", #date(2023, 11, 30), 40, 40},
        {3, "Dec", #date(2023, 12, 31), 40, 110},
        {4, "Jan", #date(2024, 1, 31), 20, 30},
        {5, "Feb", #date(2024, 2, 29), 30, 40},  // Leap year adjustment
        {6, "Mar", #date(2024, 3, 31), 30, 60},
        {7, "Apr", #date(2024, 4, 30), 40, 60},
        {8, "May", #date(2024, 5, 31), 40, 50},
        {9, "Jun", #date(2024, 6, 30), 50, 80},
        {10, "Jul", #date(2024, 7, 31), 40, 90},
        {11, "Aug", #date(2024, 8, 31), 40, 50},
        {12, "Sep", #date(2024, 9, 30), 30, 50}
    }
    ),
    #"Changed Types" = Table.TransformColumnTypes(Source, {{"Sorting", Int64.Type}, {"Serial Date", type date}, {"Boeing Deliveries", Int64.Type}, {"Airbus Deliveries", Int64.Type}})
in
    #"Changed Types"
in
    Source

It can be useful to create the labels for the charts dynamically:

-- DAX code for labels
MaxDate = Format(Max('Boeing vs Airbus'[Serial Date]),"MMM-YYYY")
MinDate = FORMAT (Min('Boeing vs Airbus'[Serial Date]),"MMM-YYYY")
MinMaxDate = [MinDate] & " to " & [MaxDate]
Title Boing Airbus = "Boing and Airbus Deliveries " & [MinMaxDate]

Happy coding!

Previous Post <<||>> Next Post

10 March 2025

🧭Business Intelligence: Perspectives (Part 28: Cutting through Complexity)

Business Intelligence Series

Independently of the complexity of the problems, one should start by framing the problem(s) correctly and this might take several steps and iterations until a foundation is achieved, upon which further steps can be based. Ideally, the framed problem should reflect reality and should provide a basis on which one can build something stable, durable and sustainable. Conversely, people want quick low-cost fixes and probably the easiest way to achieve this is by focusing on appearances, which are often confused with more.

In many data-related contexts, there’s the tendency to start with the "solution" in mind, typically one or more reports or visualizations which should serve as basis for further exploration. Often, the information exchange between the parties involved (requestor(s), respectively developer(s)) is kept to a minimum, though the formalized requirements barely qualify for the minimum required. The whole process starts with a gap that can create further changes as the development process progresses, with all the consequences deriving from this: the report doesn’t satisfy the needs, more iterations are needed - requirements’ reevaluation, redesign, redevelopment, retesting, etc.

The poor results are understandable, all parties start with a perspective based on facts which provide suboptimal views when compared with the minimum basis for making the right steps in the right direction. That’s not only valid for reports’ development but also for more complex endeavors – data models, data marts and warehouses, and other software products. Data professionals attempt to bridge the gaps by formalizing and validating the requirements, building mock-ups and prototypes, testing, though that’s more than many organizations can handle!

There are simple reports or data visualizations for which not having prior knowledge of the needed data sources, processes and the business rules has a minimal impact on the further steps of the processes involved in building the final product(s). However, "all generalizations are false" to some degree, and there’s a critical point after which minimal practices tend to create more waste than companies can afford. Consequently, applying the full extent of the processes can lead to waste when the steps aren’t imperative for the final product.

Even if one is aware of all the implications, one’s experience and the application of best practices doesn’t guarantee the quality of the results as long as some kind of novelty, unknown, fuzziness or complexity is involved. Novelty can appear in different ways – process, business rules, data or problem formulations, particularities that aren’t easily perceived or correctly understood. Each important minor piece of information can have an exponential impact under the wrong circumstances.

The unknown can encompass novelty, though can be also associated with the multitude of facts not explicitly and/or directly considered. "The devil is in details" and it’s so easy for important or minor facts to remain hidden under the veil of suppositions, expectations, respectively under the complex and fuzzy texture of logical aspects. Many processes operate under strict rules, though there are various consequences, facts or unnecessary information that tend to increase the overall complexity and fuzziness.

Predefined processes, procedures and practices can help cut and illuminate through this complex structure associated with the various requirements and aspects of problems. Plunging headfirst can be occasionally associated with the need to evaluate what is known and unknown from facts and data’s perspective, to identify the gaps and various factors that can weigh in the final solution. Unfortunately, too often it’s nothing of this!

Besides the multitude of good/best practices and problem-solving approaches, all one has is his experience and intuition to cut through the overall complexity. False steps are inevitable for finding the approachable path(s) from the requirements to the solution.

Previous Post <<||>> Next Post

15 February 2025

🧭Business Intelligence: Perspectives (Part 27: A Tale of Two Cities II)

Business Intelligence Series

There’s a saying that applies to many contexts ranging from software engineering to data analysis and visualization related solutions: "fools rush in where angels fear to tread" [1]. Much earlier, an adage attributed to Confucius provides a similar perspective: "do not try to rush things; ignore matters of minor advantage". Ignoring these advices, there's the drive in rapid prototyping to jump in with both feet forward without checking first how solid the ground is, often even without having adequate experience in the field. That’s understandable to some degree – people want to see progress and value fast, without building a foundation or getting an understanding of what’s happening, respectively possible, often ignoring the full extent of the problems.

A prototype helps to bring the requirements closer to what’s intended to achieve, though, as the practice often shows, the gap between the initial steps and the final solutions require many iterations, sometimes even too many for making a solution cost-effective. There’s almost always a tradeoff between costs and quality, respectively time and scope. Sooner or later, one must compromise somewhere in between even if the solution is not optimal. The fuzzier the requirements and what’s achievable with a set of data, the harder it gets to find the sweet spot.

Even if people understand the steps, constraints and further aspects of a process relatively easily, making sense of the data generated by it, respectively using the respective data to optimize the process can take a considerable effort. There’s a chain of tradeoffs and constraints that apply to a certain situation in each context, that makes it challenging to always find optimal solutions. Moreover, optimal local solutions don’t necessarily provide the optimum effect when one looks at the broader context of the problems. Further on, even if one brought a process under control, it doesn’t necessarily mean that the process works efficiently.

This is the broader context in which data analysis and visualization topics need to be placed to build useful solutions, to make a sensible difference in one’s job. Especially when the data and processes look numb, one needs to find the perspectives that lead to useful information, respectively knowledge. It’s not realistic to expect to find new insight in any set of data. As experience often proves, insight is rarer than finding gold nuggets. Probably, the most important aspect in gold mining is to know where to look, though it also requires luck, research, the proper use of tools, effort, and probably much more.

One of the problems in working with data is that usually data is analyzed and visualized in aggregates at different levels, often without identifying and depicting the factors that determine why data take certain shapes. Even if a well-suited set of dimensions is defined for data analysis, data are usually still considered in aggregate. Having the possibility to change between aggregates and details is quintessential for data’s understanding, or at least for getting an understanding of what's happening in the various processes.

There is one aspect of data modeling, respectively analysis and visualization that’s typically ignored in BI initiatives – process-wise there is usually data which is not available and approximating the respective values to some degree is often far from the optimal solution. Of course, there’s often a tradeoff between effort and value, though the actual value can be quantified only when gathering enough data for a thorough first analysis. It may also happen that the only benefit is getting a deeper understanding of certain aspects of the processes, respectively business. Occasionally, this price may look high, though searching for cost-effective solutions is part of the job!

Previous Post <<||>> Next Post

References:
[1] Alexander Pope (cca. 1711) An Essay on Criticism

04 February 2025

🧭Business Intelligence: Perspectives (Part 26: Monitoring - A Cockpit View)

Business Intelligence Series

The monitoring of business imperatives is sometimes compared metaphorically with piloting an airplane, where pilots look at the cockpit instruments to verify whether everything is under control and the flight ensues according to the expectations. The use of a cockpit is supported by the fact that an airplane is an almost "closed" system in which the components were developed under strict requirements and tested thoroughly under specific technical conditions. Many instruments were engineered and evolved over decades to operate as such. The processes are standardized, inputs and outputs are under strict control, otherwise the whole edifice would crumble under its own complexity.

In organizational setups, a similar approach is attempted for monitoring the most important aspects of a business. A few dashboards and reports are thus built to monitor and control what’s happening in the areas which were identified as critical for the organization. The various gauges and other visuals were designed to provide similar perspectives as the ones provided by an airplane’s cockpit. At first sight the cockpit metaphor makes sense, though at careful analysis, there are major differences.

Probably, the main difference is that businesses don’t necessarily have standardized processes that were brought under control (and thus have variation). Secondly, the data used doesn’t necessarily have the needed quality and occasionally isn’t fit for use in the business processes, including supporting processes like reporting or decision making. Thirdly, are high the chances that the monitoring within the BI infrastructures doesn’t address the critical aspects of the business, at least not at the needed level of focus, detail or frequency. The interplay between these three main aspects can lead to complex issues and a muddy ground for a business to build a stable edifice upon.

The comparison with airplanes’ cockpit was chosen because the number of instruments available for monitoring is somewhat comparable with the number of visuals existing in an organization. In contrast, autos have a smaller number of controls simple enough to help the one(s) sitting in the cockpit. A car’s monitoring capabilities can probably reflect the needs of single departments or teams, though each unit needs its own gauges with specific business focus. The parallel is however limited because the areas of focus in organizations can change and shift in other directions, some topics may have a periodic character while others can regain momentum after a long time.

There are further important aspects. At high level, the expectation is for software products and processes, including the ones related to BI topics, to have the same stability and quality as the mass production of automobiles, airplanes or other artifacts that have similar complexity and manufacturing characteristics. Even if the design process of software and manufacturing may share many characteristics, the similar aspects diverge as soon as the production processes start, respectively progress, and these are the areas where the most differences lie. Starting from the requirements and ending with the overall goals, everything resembles the characteristics of quick shifting sands on which is challenging to build any stabile edifice.

At micro level in manufacturing each piece was carefully designed and produced according to a set of characteristics that were proved to work. Everything must fit perfectly in the grand design and there are many tests and steps to make sure that happens. To some degree the same is attempted when building software products, though the processes break along the way with the many changes attempted, with the many cost, time and quality constraints. At some point the overall complexity kicks back; it might be still manageable though the overall effort is higher than what organizations bargained for.

Previous Post <<||>> Next Post

15 January 2025

🧭Business Intelligence: Perspectives (Part 23: In between the Many Destinations)

Business Intelligence Series

In too many cases the development of queries, respectively reports or data visualizations (aka artifacts) becomes a succession of drag & drops, formatting, (re)ordering things around, a bit of makeup, configuring a set of parameters, and the desired product is good to go! There seems nothing wrong with this approach as long as the outcomes meet users’ requirements, though it also gives the impression that’s all what the process is about.

Given a set of data entities, usually there are at least as many perspectives into the data as entities’ number. Further perspectives can be found in exceptions and gaps in data, process variations, and the further aspects that can influence an artifact’s logic. All these aspects increase the overall complexity of the artifact, respectively of the development process. One guideline in handling all this is to keep the process in focus, and this starts with requirements’ elicitation and ends with the quality assurance and actual use.

Sometimes, the two words, the processes and their projection into the data and (data) models don’t reflect the reality adequately and one needs to compromise, at least until the gaps are better addressed. Process redesign, data harmonization and further steps need to be upon case considered in multiple iterations that should converge to optimal solutions, at least in theory.

Therefore, in the development process there should be a continuous exploration of the various aspects until an optimum solution is reached. Often, there can be a couple of competing forces that can pull the solution in two or more directions and then compromising is necessary. Especially as part of continuous improvement initiatives there’s the tendency of optimizing locally processes in the detriment of the overall process, with all the consequences resulting from this.

Unfortunately, many of the problems existing in organizations are ill-posed and misunderstood to the degree that in extremis more effort is wasted than the actual benefits. Optimization is a process of putting in balance all the important aspects, respectively of answering with agility to the changing nature of the business and environment. Ignoring the changing nature of the problems and their contexts is a recipe for failure on the long term.

This implies that people in particular and organizations in general need to become and remain aware of the micro and macro changes occurring in organizations. Continuous learning is the key to cope with change. Organizations must learn to compromise and focus on what’s important, achievable and/or probable. Identifying, defining and following the value should be in an organization’s ADN. It also requires pragmatism (as opposed to idealism). Upon case, it may even require to say “no”, at least until the changes in the landscape offer a reevaluation of the various aspects.

One requires a lot from organizations when addressing optimization topics, especially when misalignment or important constraints or challenges may exist. Unfortunately, process related problems don’t always admit linear solutions. The nonlinear aspects are reflected especially when changing the scale, perspective or translating the issues or solutions from one are area to another.

There are probably answers available in the afferent literature or in the approaches followed by other organizations. Reinventing the wheel is part of the game, though invention may require explorations outside of the optimal paths. Conversely, an organization that knows itself has more chances to cope with the challenges and opportunities altogether.

A lot from what organizations do in a consistent manner looks occasionally like inertia, self-occupation, suboptimal or random behavior, in opposition to being self-driven, self-aware, or in self-control. It’s also true that these are ideal qualities or aspects of what organizations should become in time.

Previous Post <<||>> Next Post

SQL Troubles

Pages

09 August 2025

🧭Business Intelligence: Perspectives (Part 33: Data Lifecycle for Analytics)

30 July 2025

📊Graphical Representation: Sense-making in Data Visualizations (Part 3: Heuristics)

27 July 2025

📊Graphical Representation: Sense-making in Data Visualizations (Part 2: Guidelines)

21 July 2025

📊Graphical Representation: Sense-making in Data Visualizations (Part 1: An Introduction)

06 July 2025

🧭Business Intelligence: Perspectives (Part 32: Data Storytelling in Visualizations)

03 May 2025

🧭Business Intelligence: Perspectives (Part 31: More on Data Visualization)

📊Graphical Representation: Graphics We Live By (Part XI: Comparisons Between Data Series)

10 March 2025

🧭Business Intelligence: Perspectives (Part 28: Cutting through Complexity)

15 February 2025

🧭Business Intelligence: Perspectives (Part 27: A Tale of Two Cities II)

04 February 2025

🧭Business Intelligence: Perspectives (Part 26: Monitoring - A Cockpit View)

15 January 2025

🧭Business Intelligence: Perspectives (Part 23: In between the Many Destinations)

About Me