"The bar or column chart is the easiest type of graphic to prepare and use in reports. It employs a simple form: four straight lines that are joined to construct a rectangle or oblong box. When the box is shown horizontally it is called a bar; when it is shown vertically it is called a column. [...] The bar chart is an effective way to show comparisons between or among two or more items. It has the added advantage of being easily understood by readers who have little or no background in statistics and who are not accustomed to reading complex tables or charts." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)
A Software Engineer and data professional's blog on SQL, data, databases, data architectures, data management, programming, Software Engineering, Project Management, ERP implementation and other IT related topics.
25 July 2025
📉Graphical Representation: Rectangles (Just the Quotes)
21 July 2025
📉Graphical Representation: Visuals (Just the Quotes)
"Data storytelling can be defined as a structured approach for communicating data insights using narrative elements and explanatory visuals." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)
"Data storytelling involves the skillful combination of three key elements: data, narrative, and visuals. Data is the primary building block of every data story. It may sound simple, but a data story should always find its origin in data, and data should serve as the foundation for the narrative and visual elements of your story." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)
"Even with a solid narrative and insightful visuals, a data story cannot overcome a weak data foundation. As the master architect, builder, and designer of your data story, you play an instrumental role in ensuring its truthfulness, quality, and effectiveness. Because you are responsible for pouring the data foundation and framing the narrative structure of your data story, you need to be careful during the analysis process. Because all of the data is being processed and interpreted by you before it is shared with others, it can be exposed to cognitive biases and logical fallacies that distort or weaken the data foundation of your story." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)
"In addition to managing how the data is visualized to reduce noise, you can also decrease the visual interference by minimizing the extraneous cognitive load. In these cases, the nonrelevant information and design elements surrounding the data can cause extraneous noise. Poor design or display decisions by the data storyteller can inadvertently interfere with the communication of the intended signal. This form of noise can occur at both a macro and micro level." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)
"The success of your narratives will depend on your ability to effectively perform the following tasks and responsibilities as the data storyteller: Identify a key insight. [...] Minimize or remove bias. [...] Gain adequate context. [...] Understand the audience. [...] Curate the information. [...] Assemble the story. [...] Choose the visuals. [...] Add credibility." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)
"While visuals are an essential part of data storytelling, data visualizations can serve a variety of purposes from analysis to communication to even art. Most data charts are designed to disseminate information in a visual manner. Only a subset of data compositions is focused on presenting specific insights as opposed to just general information. When most data compositions combine both visualizations and text, it can be difficult to discern whether a particular scenario falls into the realm of data storytelling or not." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)
"Data visualization is a mix of science and art. Sometimes we want to be closer to the science side of the spectrum - in other words, use visualizations that allow readers to more accurately perceive the absolute values of data and make comparisons. Other times we may want to be closer to the art side of the spectrum and create visuals that engage and excite the reader, even if they do not permit the most accurate comparisons." (Jonathan Schwabish, "Better Data Visualizations: A guide for scholars, researchers, and wonks", 2021)
"Raw data without appropriate visualization is like dumped construction raw materials at a building construction site. The finished house is the actual visuals created from those data like raw materials." (Bill Inmon et al, "Building the Data Lakehouse", 2021)
"Good data stories have three key components: data, narrative, and visuals. [...] The data part is fairly obvious - data has to be accurate for the correct insights to be achieved. The narrative has to give a voice to the data in simple language, turning each data point into a character in the story with its own tale to tell. The visuals are what we are most concerned about. They have to allow us to be able to find trends and patterns in our datasets and do so easily and specifically. The last thing we want is for the most important points to be buried in rows and columns." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)
"Good design isn’t just choosing colors and fonts or coming up with an aesthetic for charts. That’s styling - part of design, but by no means the most important part. Rather, people with design talent develop and execute systems for effective visual communication. They understand how to create and edit visuals to focus an audience and distill ideas." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)
📊Graphical Representation: Sense-making in Data Visualizations (Part 1: An Introduction)
Graphical Representation Series
Introduction
Creating simple charts or more complex data visualizations may appear trivial to many, though authors shouldn't forget that readers have different backgrounds and degrees of literacy, and that many of them may not be able to make sense of graphical displays, at least not without some help.
Beginners start with limited experience and build upon it. On the road to mastery they get acquainted with the many possibilities, achieve a deeper understanding, and the choices narrow down to a few. Independently of one's experience, there are seldom 'yes' and 'no' answers for the various choices; everything is a matter of degree that varies with one's experience, the available time, the audience's expectations, and many other aspects that might be considered in time.
The following questions are intended to expand, respectively narrow down our choices when dealing with data visualizations from a data professional's perspective. The questions are based mainly on [1] though they were extended to include a broader perspective.
General Questions
Where does the data come from? Is the source reliable and representative (for the whole population in scope)? Is the data source certified? Is the data current?
Are there better (usable) sources? What's the effort to consider them? Does the data overlap? To what degree? Are there any benefits in merging the data? How much does this change the overall picture? Are the changes (in trends) explainable?
Was the data collected? How, from where, and using what method? [1] What methodology/approach was used?
What's the dataset about? Can one recognize the data, the (data) entities, respectively the structures behind? How big is the fact table (in terms of rows and columns)? How many dimensions are in scope?
What transformations, calculations or modifications have been applied? What was left out and what's the overall impact?
Were any significant assumptions made? [1] Were the assumptions clearly stated? Are they justified? Is there more to them?
Were any transformations applied? Do the transformations change any data characteristics? Were they adequately documented/explained? Do they make sense? Was something important left out? What's the overall impact?
What criteria were used to include/exclude data from the display? [1] Are the criteria adequately explained/documented? Do they make sense?
Are similar data publicly available? Is it (freely) accessible/usable? To what degree? How much do the datasets overlap? Is there any benefit to analyze/use the respective data? Are the characteristics comparable? To what degree?
Dataviz Questions
What's the title/subtitle of the chart? Is it meaningful for the readers? Does the title reflect the data, respectively the findings adequately? Can it be better formulated? Is it an eye-catcher? Does it meet the expectations?
What data is shown? Of what type? At what level is the data aggregated?
What chart (type) is being used? [1] Are the readers familiar with the chart type? Does it need further introduction/clarifications? Are there better means to represent the data? Does the chart offer the appropriate perspective? Does it make sense to offer different (complementary) perspective(s)? To what degree do other perspectives help?
What items of data do the marks represent? What value associations do the attributes represent? [1] Are the marks visible? Are the marks adequately presented (e.g. due to missing data)?
What range of values is displayed? [1] What approximation do the values support? To what degree can the values be rounded without losing meaning?
Is the data categorical, ordinal or continuous?
Are the axes properly chosen/displayed/labeled? Is the scale properly chosen (linear, semilogarithmic, logarithmic), respectively displayed? Do they emphasize, diminish, distort, simplify, or clutter the information?
What features (shapes, patterns, differences or connections) are observable, interesting or vital for understanding the chart? [1]
Where are the largest, mid-sized and smallest values? (aka ‘stepped magnitude’ judgements). [1]
Where lie the most/least values? Where is the average or normal? (aka ‘global comparison’ judgements) [1] How are the values distributed? Are there any outliers present? Are they explainable?
What features are expected or unexpected? [1] To what degree are they unexpected?
What features are important given the subject? [1]
What shapes and patterns strike readers as being semantically aligned with the subject? [1]
What is the overall feeling when looking at the final result? Is the chart overcrowded? Can anything be left out/included?
What colors were used? [1] Are the colors adequately chosen, respectively meaningful? Do they follow the general recommendations?
What colors, patterns, forms do readers see first? What impressions come next, respectively last longer?
Are the various elements adequately/intuitively positioned/distinguishable? What's the degree of overlapping/proximity? Do the elements respect an intuitive hierarchy? Do they match readers' expectations, respectively the best practices in scope? Are the deviations justified?
Is the space properly used? To what degree? Are there major gaps?
Know Your Audience
What audience does the visualization target? What are its characteristics (level of experience with data visualizations; authors, experts or casual attendees)? Are there any accidental attendees? How likely is the audience to pay attention?
What is the audience’s relationship with the subject matter? What knowledge do they have or, conversely, lack about the subject? What assistance might they need to interpret the meaning of the subject? Do they have the capacity to comprehend what it means to them? [1]
Why does the audience want/need to understand the topic? Are they familiar with it, respectively actively interested or more passive? Are they able to grasp the intended meaning? [1] To what degree? What kind of challenges might be involved, of what nature?
What is their motivation? Do they have a direct, expressed need or are they more passive and indifferent? Is a way needed to persuade them or even seduce them to engage? [1] Can this be done without distorting the data and its meaning(s)?
What is their visualization literacy skill set? Do they require assistance perceiving the chart(s)? Are they sufficiently comfortable with operating features of interactivity? Do they have any visual accessibility issues (e.g. red–green color blindness)? Do they need to be (re)factored into the design? [1]
Reflections
What has been learnt? Has it reinforced or challenged existing knowledge? [1] Was new knowledge gained? How valuable is this knowledge? Can it be reused? In which contexts?
Do the findings meet one's expectations? To what degree? Were the expectations justified? On what basis? What's missing? How relevant are the gaps?
What feelings have been stirred? Has the experience had an impact emotionally? [1] To what degree? Is the impact positive/negative? Is the reaction justified/explainable? Are there any factors that distorted the reactions? Are they explainable? Do they make sense?
What does one do with this understanding? Is it just knowledge acquired or something to inspire action (e.g. making a decision or motivating a change in behavior)? [1] How relevant/valuable is the information for us? Can it be used/misused? To what degree?
Are the data and its representation trustworthy? [1] To what degree?
References:
[1] Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019
03 May 2025
📊Graphical Representation: Graphics We Live By (Part XI: Comparisons Between Data Series)
Graphical Representation Series
Over the past 10-20 years it has become easy to create data visualizations just by dropping some of the available data into a tool like Excel and producing a visual depiction of it with a few clicks. In many cases the first draft, typically the tool's default, doesn't need further work because the objective was reached; in others, the creator needs a minimum skillset to make the visualization useful, appealing, or whatever quality the work in scope requires. However, the audience might judge the visualization(s) from different perspectives, and there can be a broad audience with different skills in reading, evaluating and understanding the work.
There are many depictions on the web resembling the one below, taken from a LinkedIn post:
Example Chart - Boeing vs. Airbus
Summary Table
Column and bar charts do a fair job in comparing values over time, though they do use a lot of ink in the process (see D). While they make it easy to compare neighboring values, the rectangles used tend to occupy a lot of space when they are made too wide or too high to cover the empty space within the display (e.g. when just a few values are displayed, space being wasted in the process). As the main downside, it takes a lot of scanning until the reader identifies the overall trends, and the further away the bars are from each other, the more difficult it becomes to do comparisons.
In theory, line charts are more efficient in representing the above data points, because the marks are usually small and the line thin enough to provide a better data-ink ratio, while one can see a lot at a glance. In Power BI the creator can use different types of interpolation: linear (A), step (B) or smooth (C). In many cases, it might be a good idea to use a linear interpolation, though when there is no or minimal overlapping, it might be worthwhile to explore the other types of interpolation too (and further request feedback from the users):
Linear, Step and Smooth Line Charts
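The difference between the linear and the step interpolation can be illustrated numerically. The following is a minimal Python sketch, not Power BI's implementation: the function names are hypothetical, and the step variant shown (holding the previous value until the next point is reached) is an assumption about how the step line is drawn. The example uses the Airbus deliveries for Nov (40) and Dec (110) from the dataset used in this post, with the sorting index as x-value.

```python
def linear(x0, y0, x1, y1, x):
    """Value on the straight segment between two neighboring points."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def step(x0, y0, x1, y1, x):
    """Hold the previous value until the next point is reached (assumed variant)."""
    return y1 if x >= x1 else y0


# Airbus deliveries: Nov (index 2) = 40, Dec (index 3) = 110
halfway_linear = linear(2, 40, 3, 110, 2.5)
halfway_step = step(2, 40, 3, 110, 2.5)
print(halfway_linear, halfway_step)
```

Halfway between the two months the linear line suggests an intermediate value, while the step line still shows the November value, which is why the step form is closer to the discrete nature of monthly totals.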
Alternatives to Line Charts
// Power Query script (Boeing vs Airbus)
let
    // static table with the monthly deliveries
    Source = #table(
        {"Sorting", "Month Name", "Serial Date", "Boeing Deliveries", "Airbus Deliveries"},
        {
            {1, "Oct", #date(2023, 10, 31), 30, 50},
            {2, "Nov", #date(2023, 11, 30), 40, 40},
            {3, "Dec", #date(2023, 12, 31), 40, 110},
            {4, "Jan", #date(2024, 1, 31), 20, 30},
            {5, "Feb", #date(2024, 2, 29), 30, 40}, // Leap year adjustment
            {6, "Mar", #date(2024, 3, 31), 30, 60},
            {7, "Apr", #date(2024, 4, 30), 40, 60},
            {8, "May", #date(2024, 5, 31), 40, 50},
            {9, "Jun", #date(2024, 6, 30), 50, 80},
            {10, "Jul", #date(2024, 7, 31), 40, 90},
            {11, "Aug", #date(2024, 8, 31), 40, 50},
            {12, "Sep", #date(2024, 9, 30), 30, 50}
        }
    ),
    // explicit data types for sorting, dates and measures
    #"Changed Types" = Table.TransformColumnTypes(Source, {{"Sorting", Int64.Type}, {"Serial Date", type date}, {"Boeing Deliveries", Int64.Type}, {"Airbus Deliveries", Int64.Type}})
in
    #"Changed Types"
-- DAX code for labels
MaxDate = FORMAT(MAX('Boeing vs Airbus'[Serial Date]), "MMM-YYYY")
MinDate = FORMAT(MIN('Boeing vs Airbus'[Serial Date]), "MMM-YYYY")
MinMaxDate = [MinDate] & " to " & [MaxDate]
Title Boing Airbus = "Boeing and Airbus Deliveries " & [MinMaxDate]
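For readers who want to check the label logic outside Power BI, the measures above can be mirrored in a few lines of Python. This is only an illustrative sketch: the names `deliveries`, `format_month` and `min_max_label` are hypothetical, and only the first and last rows of the table are needed to reproduce the date range.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# (serial date, Boeing deliveries, Airbus deliveries), same values as in the table above
deliveries = [
    (date(2023, 10, 31), 30, 50),
    (date(2023, 12, 31), 40, 110),
    (date(2024, 2, 29), 30, 40),
    (date(2024, 9, 30), 30, 50),
]


def format_month(d):
    """Format a date like the "MMM-YYYY" pattern, e.g. Oct-2023."""
    return f"{MONTHS[d.month - 1]}-{d.year}"


def min_max_label(rows):
    """Mirror the MinDate, MaxDate and MinMaxDate measures."""
    dates = [row[0] for row in rows]
    return f"{format_month(min(dates))} to {format_month(max(dates))}"


title = "Boeing and Airbus Deliveries " + min_max_label(deliveries)
print(title)
```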
04 August 2024
📊Graphical Representation: Graphics We Live By (Part X: Pie and Donut Charts in Power BI and Excel)
Graphical Representation Series
Pie charts are loved by many and hated by just as many, and there are justified reasons both to use them and to avoid them, though the most important criterion for evaluating them is whether they do the intended job in an acceptable manner, especially when compared to other representational means. The most important aspect they depict is the part-to-whole ratio, which can be depicted by other graphical tools too, though few of them represent it efficiently.
The pie chart works well as a visualization tool when it has only 3-5 values that are easily recognizable in the visualization. However, the more the size or the number of pieces varies, the more difficult it becomes to visualize and interpret them, to the point where their representation has more negative than positive effects. Many topics form something like a long tail - the portion of the distribution having many occurrences far from the head or beginning. Displaying the items from the long tail together with the other components can totally obscure their distribution, as they become unrecognizable in the diagram.
One approach to handle this is to group all the items from the long tail together under a piece (e.g. Other) and use a second form of representation to display them separately. For example, Microsoft Excel offers a way to zoom in the section of a pie chart with small percentages by displaying them in a second pie chart (pie of pie) or bar chart (bar of pie), something like a "zoom in" perspective (see image below). Unfortunately, the feature seems to limit itself only to small percentages, and thus can't be used currently to offer a broader perspective. Ideally, it would be useful to zoom in on any piece of the pie, especially when the items are categorized as a hierarchy with two or even more levels.
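The grouping step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `group_long_tail`, the 5% threshold and the sample values are all hypothetical and are not taken from Excel or Power BI.

```python
def group_long_tail(values, threshold=0.05, other_label="Other"):
    """Split categories into a head (kept as pieces) and a long tail.

    Categories whose share of the total is below the threshold are
    summed under a single piece, while the tail is returned separately
    so that it can be shown in a second chart.
    """
    total = sum(values.values())
    head, tail = {}, {}
    for name, value in values.items():
        target = head if value / total >= threshold else tail
        target[name] = value
    if tail:
        head[other_label] = sum(tail.values())
    return head, tail


# hypothetical shares that sum to 100
sample = {"A": 50, "B": 30, "C": 12, "D": 4, "E": 3, "F": 1}
head, tail = group_long_tail(sample)
print(head)
print(tail)
```

The head feeds the main pie, while the tail can be displayed in a second pie or bar chart, similar to the "pie of pie" idea.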
Pie Charts - Original Solution
In the above example, the arrow may suggest that a relationship exists between the two donut charts, which is also reflected in the description provided. However, the readers may still have difficulties in correctly interpreting the diagrams, especially when there's some kind of overlapping or other type of implied or unimplied resemblance. If the colors overlap or have other similarities, are they intentional? If the circles have the same size, does this observed resemblance have a meaning? The reader shouldn't have to bother with this type of question, but should see the resemblance and the meaning of the various elements with a minimum of effort while decoding a chart's elements. Of course, when the meaning is not clear, some guidance should ideally be provided!
Pie of Pie in Power BI
Pie of Pie Alternatives in Power BI I
A treemap can prove to be a better representation alternative because it encodes proportions in a unitary way, much like pie charts do, though it takes more space if one wants to make the labels visible. Radial charts (see G) and Aster plots (see I) can be occasionally better choices, especially because they use less space as they display only the main categories. A second chart can be used to display the subcategories, much like in A and B. Sankey charts (see H) can be used as well, even if they don't allow representing any quantitative values unless one encodes them directly in the labels.
Pie of Pie Alternatives in Power BI II
When one dives into the world of diagrams and goes beyond the still limited representational choices provided by the standard tools, one can be surprised by the additional representational choices. However, their appropriateness should be considered against readers' skillset to read and interpret them! Frankly, the alternatives considered above could be a better choice once they reach representational maturity.
Many thanks to Christopher Chin, who in his weekly post on data visualization blunders, suggested the examples used as basis for this post (see [1])!
References:
15 June 2024
🗒️Graphical Representation: Bar & Column Charts [Notes]
Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!
Last updated: 15-Jun-2024
Bar & Column Charts with Variations
- {definition} graphical representation of categorical data with rectangular figures (aka boxes) whose heights (column chart) or lengths (bar chart) are proportional to the values that they represent
- {benefit} allow to visually encode/decode quantitative information-size as magnitude and area based on the relative position of the end of the box along the common scale
- if the width of the box is the same, it's enough to compare the length
- ⇒ the basis of comparison is one-dimensional [1]
- ⇐ orient the reader to the relative magnitudes of the boxes
- area is typically encoded when the width varies
- ⇐ encoding by area is a poor encoding method as it can mislead
- can represent negative and positive values
- one of the most useful, simple, and adaptable techniques in graphic presentation [1]
- easily understood by readers
- sometimes avoided because they are so common
- almost everything could be a bar chart
- the length of each bar is proportional to the quantity or amount of each category represented [1]
- ⇒ the zero line must be shown [1]
- ⇒ the scale must not be broken [1]
- {exception} an excessively long bar in a series of bars may be broken off at the end, and the amount involved shown directly beyond it [1]
- {benefit} allow to visually represent categorical data
- ⇒ occasionally represented without scales, grid lines or tick marks
- the more data elements are presented, the more difficult it becomes to navigate and/or display the data
- {benefit} allow us to easily compare magnitudes
- sometimes without looking at the actual values
- {type} bar chart
- the box is shown horizontally
- represents magnitude by length
- allows comparing different items as of a specific time
- {type} column chart
- the box is shown vertically
- represents magnitude by height
- allows comparing different items over time
- ⇐ it still displays discrete points
- recommended for comparing similar items for different time periods [2]
- effective way to show most types of comparisons [2]
- {subtype} stacked chart
- variation of bar/column charts in which the boxes of a dimension's components are stacked over each other
- {exception} spaces can be used between boxes if the values aren't cumulative [3]
- {benefit} allows encoding a further dimension where the values are stacked within the same box
- {drawback} do not show data structure well
- ⇒ make it challenging to compare values across boxes
- {subtype} 100-percent chart
- variation of stacked chart in which the magnitude totals to 100%
- {benefit} allows to display part to whole relationships
- ⇐ preferable to circle chart's angle and area comparison [1]
- {subtype} clustered chart (aka grouped chart)
- variation of bar/column charts that allows encoding further quantitative information in distinct boxes tacked together which occasionally overlap
- ⇐ if there's space, it is usually kept to a minimum
- e.g. can be used to display multiple data series
- can be used with a secondary axis
- {benefit} allows comparisons within the cluster/group as well as between clusters/groups
- {drawback} more challenging to make comparisons across points
- {subtype} area chart (variable-width/variwide chart/graph)
- variation of bar/column charts in which the height/width have significance being proportional to some measure or characteristics of the data elements represented [3]
- {benefit} allow encoding a further dimension as part of the area
- {subtype} deviation chart
- variation of bar/column charts that display positive and negative values
- {subtype} joined chart
- variation of bar/column charts in which the boxes are tacked together
- {benefit} allow to better use the space available
- {subtype} paired chart
- variation of bar/column charts in which the boxes are paired in mirror based on an axis
- e.g. the values of one data series are displayed to the left, while the values for a second data series are displayed to the right
- {benefit} allows to study the correlation and/or other relationships between the values of two data series
- the hidden axes can have different scales
- {subtype} circular chart (aka radial chart)
- variation of bar/column charts in which the boxes are wrapped into a circle, the various categories being uniformly spaced along the radial or category axis [3]
- the value scale can have any upper or lower value and can progress in either direction [3]
- {benefit} useful to represent data that have a circular dimension in an aesthetic form
- e.g. months, hours
- {subtype} waterfall chart (aka progressing chart)
- variation of bar/column charts in which the boxes are displayed progressively, the start of a box corresponding to the end of the previous box
- time and activity charts can be considered as variations of this subtype [3]
- {advantage} allows to determine cumulative values, respectively the increase/decrease between consecutive boxes
- {subtype} composite chart (aka mixed chart, combination chart, overlay chart)
- variation of bar/column charts in which, besides boxes, other graphic types of encoding (line, area) are used
- ⇐ the different data graphics are overlaid on one another [3]
- {benefit} allows to improve clarity or highlight the relationships between several data series [3]
- {drawback} overlaying can result in clutter
- used to
- display totals, averages or frequencies
- display time series
- display the relationship between two or more items
- make a comparison among several items
- make a comparison between parts and the whole
- can be confounded with
- [histograms]
- show distribution through the frequency of quantitative values against defined intervals of quantitative values
- used for continuous numerical data or data that can be effectively modelled as continuous
- it doesn't have spaces between bars
- ⇐ older uses of bar/column charts don't use spaces
- if this aspect is ignored, histograms can be considered as a special type of area chart
- [vertical line chart] (aka price chart, bar chart)
- vertical line charts are sometimes referred to as bar charts (see [3])
- things to consider
- distance between bars
- the more distant the bars, the more difficult it becomes to make comparisons and the accuracy of judgment decreases
- sorting
- sorting the bars/columns by their size facilitates comparisons, though it can impede items' search, especially when there are many categories involved
- {exception} not recommended for time series
- clutter
- displaying too many items in a cluster and/or too many labels can lead to clutter
- {recommendation} display at maximum 3-4 clustered boxes
- color
- one should follow the general recommendations
- trend lines
- can be used especially with time series, e.g. to represent the linear regression line
- dual axis
- {benefit} allows to compare the magnitudes of two data series by employing a secondary axis
- overlapping
- overlapping boxes can make charts easier to read
- symbols
- can be used to designate reference points of comparison for each of the bars [3]
- {alternative} pie chart
- can be used to dramatize comparisons in relation to the whole [2]
- one should consider the drawbacks
- {alternative} choropleth maps
- more adequate for geographical dimensions
- provide minimal encoding
- {alternative} line charts
- can be much more informative
- provide an optimal data-ink ratio
- reduces the chart junk feeling
- {alternative} dot plots
- are closer to the original data
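Two of the subtypes noted above, the 100-percent chart and the waterfall chart, are defined by simple arithmetic on the values. The following Python sketch illustrates that arithmetic only; the function names and sample values are hypothetical and are not taken from the cited sources.

```python
def to_percent_stack(components):
    """100-percent chart: rescale the components so that they total 100."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


def waterfall_boxes(changes, start=0):
    """Waterfall chart: each box starts where the previous box ended.

    Returns one (start, end) pair per change; the end of the last box
    is the cumulative value.
    """
    boxes = []
    running = start
    for change in changes:
        boxes.append((running, running + change))
        running += change
    return boxes


print(to_percent_stack({"a": 1, "b": 1, "c": 2}))
print(waterfall_boxes([10, -4, 6]))
```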
References:
[1] Anna C Rogers (1961) "Graphic Charts Handbook"
[2] Robert Lefferts (1981) "Elements of Graphics: How to prepare charts and graphs for effective reports"
[3] Robert L Harris (1996) "Information Graphics: A Comprehensive Illustrated Reference"
14 June 2024
📊Graphical Representation: Graphics We Live By (Part IX: Word Clouds in Power BI)
Graphical Representation Series
A word cloud (aka tag cloud) is a visual representation of textual data in the form of a cloud - a mass of words in which each word is shown with a different font size and/or color based on its frequency, significance or categorization in the dataset considered. It is used to depict keyword metadata on websites, to visualize free form text or the frequency of specific values within a categorical dimension, respectively to navigate the same.
Word Clouds
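The idea of encoding frequency as font size can be sketched in a few lines of Python. This is an illustration only, assuming a linear mapping between the lowest and the highest frequency; the function name `font_sizes`, the size range and the sample words are hypothetical, and the word cloud visuals available in Power BI may scale sizes differently.

```python
from collections import Counter


def font_sizes(words, min_size=10, max_size=40):
    """Map each distinct word to a font size proportional to its frequency."""
    counts = Counter(words)
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for word, count in counts.items():
        if span == 0:
            # all words are equally frequent, so they share one size
            sizes[word] = max_size
        else:
            sizes[word] = min_size + (count - low) * (max_size - min_size) / span
    return sizes


sample_sizes = font_sizes(["data", "data", "data", "chart", "chart", "story"])
print(sample_sizes)
```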
References:
[1] Wikipedia (2024) Tag cloud (link)
[2] Microsoft Power BI Blog (2024) Power BI June 2024 Feature Summary (link)
01 June 2024
📊Graphical Representation: Graphics We Live By (Part VIII: List of Items in Power BI)
Graphical Representation Series
Introduction
There are situations in which one needs to visualize only the rating, other values, or ranking of a list of items (e.g. shopping cart, survey items) on a scale (e.g. 1 to 100, 1 to 10) for a given dimension (e.g. country, department). Besides tables, in Power BI there are 3 main visuals that can be used for this purpose: the clustered bar chart, the line chart (aka line graph), respectively the slopegraph:
Main Display Methods
Main Display Methods
For a small list of items and dimension values probably the best choice would be to use a clustered bar chart (see A). If the chart is big enough, one can display also the values as above. However, the more items in the list, respectively values in the dimension, the more space is needed. One can maybe focus then only on a subset of items from the list (e.g. by grouping several items under a category), respectively choose which dimension values to consider. Another important downside of this method is that one needs to remember the color encodings.
This downside applies also to the next method - the use of a line chart (see B) with categorical data, however applying labels to each line simplifies its navigation and decoding. With line charts the audience can directly see the order of the items, the local and general trends. Moreover, a line chart can better scale with the number of items and dimension values.
The third option (see C), the slopegraph, looks like a line chart though it focuses only on two dimension values (points) and categorizes the line as "down" (downward slope), "neutral" (no change) and "up" (upward slope). For this purpose, one can use field parameters with measures. Unfortunately, the slopegraph implementation is pretty basic and the labels overlap, which makes the graph more difficult to read. Probably, with the new set of changes planned by Microsoft, the conditional formatting of lines would allow implementing slopegraphs with line charts, thus creating a mix between (B) and (C).
This is one of the cases in which the Y-axis (see B and C) could be broken and start with the meaningful values.
Table Based Displays
Especially when combined with color encodings (see C & G) to create heatmap-like displays or with sparklines (see E), tables can provide an alternative navigation of the same data. The color encodings make it possible to identify the areas of focus (low, average, or high values), while the sparklines show the trends inline. Ideally, it should be possible to combine the two displays.
[Figure: Table Displays and the Aster Plot]
One can vary the use of tables. For example, one can display only the deviations from one of the data series (see F), where the values for the other countries are expressed relative to AUS. In (G), with the help of visual calculations, one can also display the ranking of the values.
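The two calculations behind (F) and (G) can be sketched in Python as follows. This is only an illustration of the arithmetic, not the Power BI visual calculations used in the report. The helper names are hypothetical, and the tie handling (equal values share the same rank) is an assumption.

```python
# Three items from the source data, one value per country.
values = {
    "Social media": {"Australia": 74, "Canada": 72, "U.S.": 62, "Japan": 47},
    "Credit card": {"Australia": 67, "Canada": 64, "U.S.": 66, "Japan": 68},
    "Government": {"Australia": 52, "Canada": 52, "U.S.": 58, "Japan": 39},
}
BASELINE = "Australia"


def deviations_from_baseline(row, baseline=BASELINE):
    """Return each country's value minus the baseline country's value."""
    base = row[baseline]
    return {country: value - base for country, value in row.items() if country != baseline}


def rank_items(table, country):
    """Rank the items within one country, highest value first (ties share a rank)."""
    scores = {label: row[country] for label, row in table.items()}
    ordered = sorted(scores.values(), reverse=True)
    return {label: ordered.index(score) + 1 for label, score in scores.items()}


deviations = {label: deviations_from_baseline(row) for label, row in values.items()}
ranks_japan = rank_items(values, "Japan")

print(deviations["Social media"])
print(ranks_japan)
```

Showing only the deviations keeps the baseline series out of the table, so the reader compares signed differences instead of four absolute values per item.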
Pie Charts
Pie charts and their variations appear nowadays almost everywhere. The Aster plot is a variation of the pie charts in which the values are encoded in the height of the pieces. This method was considered because the data used above were encoded in 4 similar plots. Unfortunately, the settings available in Power BI are quite basic - it's not possible to use gradient colors or link the labels as below:
[Figure: Source Data as Aster Plots]
Sankey Diagram
A Sankey diagram is a data visualization method that emphasizes the flow or change from one state (the source) to another (the destination). In theory it could be used to map the items to the dimensions and encode the values in the width of the lines (see I). Unfortunately, the diagram becomes challenging to read because all the lines and most of the labels intersect. Probably this could be solved with more flexible formatting and a rework of the algorithm used for the display of the labels (e.g. align the labels for AUS to the left, while the ones for CAN to the right).
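The mapping described above can be sketched as a list of links, one per item and dimension value, with the value scaled into a line width. The snippet below is a minimal Python illustration under stated assumptions: `build_links`, `scale_widths` and the maximum width of 20 units are hypothetical and do not reflect how the Power BI Sankey visual works internally.

```python
# Two items and two countries from the source data.
values = {
    "Credit card": {"Australia": 67, "Canada": 64},
    "Banking": {"Australia": 58, "Canada": 53},
}
MAX_WIDTH = 20.0  # hypothetical width of the thickest line


def build_links(table):
    """Create one (source, destination, value) link per item and country."""
    return [
        (item, country, value)
        for item, row in table.items()
        for country, value in row.items()
    ]


def scale_widths(links, max_width=MAX_WIDTH):
    """Scale the values linearly so that the largest value gets `max_width`."""
    largest = max(value for _, _, value in links)
    return [
        (source, target, round(value / largest * max_width, 1))
        for source, target, value in links
    ]


links = build_links(values)
widths = scale_widths(links)

for source, target, width in widths:
    print(f"{source} -> {target}: width {width}")
```

With 15 items and 4 countries the same mapping yields 60 links, which helps explain why the lines and labels intersect so heavily.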
[Figure: Sankey Diagram]
Data Preparation
A variation of the above image that contains only the Aster plots was given to ChatGPT to generate the base data as a table via the following prompts:
- retrieve the labels from the four charts by country and value in a table
- consolidate the values in a matrix table by label country and value
let
    Source = #table(
        {"Label", "Australia", "Canada", "U.S.", "Japan"},
        {
            {"Credit card", "67", "64", "66", "68"},
            {"Online retail", "55", "57", "48", "53"},
            {"Banking", "58", "53", "57", "48"},
            {"Mobile phone", "62", "55", "44", "48"},
            {"Social media", "74", "72", "62", "47"},
            {"Search engine", "66", "64", "56", "42"},
            {"Government", "52", "52", "58", "39"},
            {"Health insurance", "44", "48", "50", "36"},
            {"Media", "52", "50", "39", "23"},
            {"Retail store", "44", "40", "33", "23"},
            {"Car manufacturing", "29", "29", "26", "20"},
            {"Airline/hotel", "35", "37", "29", "16"},
            {"Branded manufacturing", "36", "33", "25", "16"},
            {"Loyalty program", "45", "41", "32", "12"},
            {"Cable", "40", "39", "29", "9"}
        }
    ),
    #"Changed Types" = Table.TransformColumnTypes(
        Source,
        {{"Australia", Int64.Type}, {"Canada", Int64.Type}, {"U.S.", Number.Type}, {"Japan", Number.Type}}
    )
in
    #"Changed Types"
IndustriesT = UNION (
    SUMMARIZECOLUMNS (
        Industries[Label],
        Industries[Australia],
        "Country", "Australia"
    ),
    SUMMARIZECOLUMNS (
        Industries[Label],
        Industries[Canada],
        "Country", "Canada"
    ),
    SUMMARIZECOLUMNS (
        Industries[Label],
        Industries[U.S.],
        "Country", "U.S."
    ),
    SUMMARIZECOLUMNS (
        Industries[Label],
        Industries[Japan],
        "Country", "Japan"
    )
)
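The DAX expression above reshapes the wide table (one column per country) into a long one (one row per label and country). For readers less familiar with DAX, roughly the same reshaping can be sketched in plain Python. The function `unpivot` and the column name `Value` are hypothetical, and unlike `SUMMARIZECOLUMNS` the sketch does not remove duplicate combinations.

```python
# Two rows of the wide source table, already typed as numbers.
wide = [
    {"Label": "Credit card", "Australia": 67, "Canada": 64, "U.S.": 66, "Japan": 68},
    {"Label": "Cable", "Australia": 40, "Canada": 39, "U.S.": 29, "Japan": 9},
]
COUNTRIES = ["Australia", "Canada", "U.S.", "Japan"]


def unpivot(rows, id_column, value_columns):
    """Turn one column per country into one row per (label, country) pair.

    Rows are emitted country by country, mirroring a union of per-country tables.
    """
    return [
        {id_column: row[id_column], "Country": country, "Value": row[country]}
        for country in value_columns
        for row in rows
    ]


long_rows = unpivot(wide, "Label", COUNTRIES)

for record in long_rows:
    print(record)
```

The long format is what the line chart, slopegraph and Sankey diagram need, since the country becomes a column that can be placed on an axis or in a legend.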
install.packages("XML")
install.packages("htmlwidgets")
install.packages("ggplot2")
install.packages("plotly")
29 May 2024
📊Graphical Representation: Graphics We Live By (Part VII: Reading a Conversion Rates Chart with ChatGPT and Copilot)
[Figure: Graphical Representation Series]
One of the areas where ChatGPT, Copilot and other similar AI-based chatbots can help is in summarizing a chart saved as an image. Ideally, the chatbots should also be able to approximate the points from the chart (an image is made of pixels, and thus areas should be easy to delimit). So, I was wondering how far the chatbots can be used for these purposes. I first used an image copied from the web, though I realized that not all visual elements could be read (e.g. Copilot had issues retrieving the values for some months) and that I had no base data against which to measure how big the deviations are.
So, I created a chart in Power BI based on the below chart (see original data):
[Figure: Conversion Rates Dual Axes Chart]
Sorting | Month | Original data: Conv. | Original data: Conv. Rate (%) | First attempt: Conv. | First attempt: Conv. Rate (%) | Second attempt: Conv. | Second attempt: Conv. Rate (%) | Third attempt: Conv. | Third attempt: Conv. Rate (%) | Fourth attempt: Conv. | Fourth attempt: Conv. Rate (%) |
1 | Jul | 8 | 4 | 10 | 1 | 10 | 1 | 8 | 4 | 8 | 4 |
2 | Aug | 280 | 16 | 275 | 15 | 275 | 15 | 275 | 18 | 275 | 18 |
3 | Sep | 100 | 13 | 225 | 12 | 225 | 10 | 225 | 12 | 225 | 12 |
4 | Oct | 280 | 14 | 275 | 12 | 275 | 11 | 275 | 11 | 275 | 11 |
5 | Nov | 90 | 4 | 75 | 5 | 75 | 6 | 75 | 6 | 75 | 6 |
6 | Dec | 85 | 3.5 | 100 | 5 | 100 | 5 | 100 | 5 | 100 | 5 |
7 | Jan | 70 | 4.5 | 50 | 3 | 50 | 3 | 50 | 4 | 50 | 4 |
8 | Feb | 30 | 1.5 | 50 | 3 | 25 | 2 | 50 | 2.5 | 50 | 2.5 |
9 | Mar | 70 | 4 | 25 | 1 | 50 | 2.5 | 25 | 1.5 | 25 | 1.5 |
10 | Apr | 185 | 11 | 200 | 10 | 200 | 10 | 200 | 10 | 200 | 10 |
11 | May | 25 | 3.5 | 50 | 4 | 50 | 4 | 50 | 3.5 | 50 | 3.5 |
12 | Jun | 195 | 4 | 225 | 10 | 225 | 10 | 225 | 11 | 195 | 4 |
Please note that the same values were repeated in the original data (e.g. 280 for both Aug and Oct) to check whether the chatbot is able to identify the resemblance correctly.
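The size of the deviations in the table above can be quantified with a simple measure such as the mean absolute deviation per attempt. The following Python sketch uses the "Conv." columns of the original data, the first attempt and the fourth attempt. The choice of metric and the helper names are assumptions of this sketch, not something used in the original test.

```python
MONTHS = ["Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# "Conv." values copied from the table above.
original = [8, 280, 100, 280, 90, 85, 70, 30, 70, 185, 25, 195]
first_attempt = [10, 275, 225, 275, 75, 100, 50, 50, 25, 200, 50, 225]
fourth_attempt = [8, 275, 225, 275, 75, 100, 50, 50, 25, 200, 50, 195]


def absolute_deviations(expected, estimated):
    """Absolute difference between each original value and its estimate."""
    return [abs(e - o) for o, e in zip(expected, estimated)]


def mean_absolute_deviation(expected, estimated):
    """Average of the absolute differences over all months."""
    deviations = absolute_deviations(expected, estimated)
    return sum(deviations) / len(deviations)


mad_first = mean_absolute_deviation(original, first_attempt)
mad_fourth = mean_absolute_deviation(original, fourth_attempt)

# Month with the largest deviation in the first attempt.
deviations_first = absolute_deviations(original, first_attempt)
worst_month = MONTHS[deviations_first.index(max(deviations_first))]

print(f"First attempt:  {mad_first:.2f}")
print(f"Fourth attempt: {mad_fourth:.2f}")
print(f"Largest deviation in the first attempt: {worst_month}")
```

A single number per attempt makes it easier to see whether the corrections given in the later prompts improved the estimates beyond the two months that were explicitly provided.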
Based on the data several tests were made with ChatGPT4o, where the following prompts were given:
- generate the data behind the chart as excel list (see 'First attempt' in the table);
- generate the data behind the chart as excel list where blue is represented on the left axis and orange on the right axis (see 'Second attempt');
- reevaluate the values from the previous output considering that the values for Jul are 8, respectively 4% (see 'Third attempt');
- reevaluate the values from the previous output considering that the values for Jun are 195, respectively 4% (see 'Fourth attempt');
- generate the data behind the dual axis chart as list from the last conversation;
- summarize the chart by revealing trends and other statistics.
Sorting | Month | Original data: Conv. | Original data: Conv. Rate (%) | First attempt: Conv. | First attempt: Conv. Rate (%) | Second attempt: Conv. | Second attempt: Conv. Rate (%) | Third attempt: Conv. | Third attempt: Conv. Rate (%) | Fourth attempt: Conv. | Fourth attempt: Conv. Rate (%) |
1 | Jul | 8 | 4 | 300 | 18 | 300 | 18 | 8 | 4 | 8 | 4 |
2 | Aug | 280 | 16 | 250 | 16 | 250 | 16 | 250 | 16 | 250 | 16 |
3 | Sep | 100 | 13 | 200 | 14 | 200 | 14 | 200 | 14 | 200 | 14 |
4 | Oct | 280 | 14 | 150 | 12 | 150 | 12 | 150 | 12 | 150 | 12 |
5 | Nov | 90 | 4 | 100 | 10 | 100 | 10 | 100 | 10 | 100 | 10 |
6 | Dec | 85 | 3.5 | 50 | 8 | 50 | 8 | 50 | 8 | 50 | 8 |
7 | Jan | 70 | 4.5 | 300 | 18 | 300 | 18 | 300 | 18 | 300 | 18 |
8 | Feb | 30 | 1.5 | 250 | 16 | 250 | 16 | 250 | 16 | 250 | 16 |
9 | Mar | 70 | 4 | 200 | 14 | 200 | 14 | 200 | 14 | 200 | 14 |
10 | Apr | 185 | 11 | 150 | 12 | 150 | 12 | 150 | 12 | 150 | 12 |
11 | May | 25 | 3.5 | 100 | 10 | 100 | 10 | 100 | 10 | 100 | 10 |
12 | Jun | 195 | 4 | 50 | 8 | 50 | 8 | 50 | 8 | 195 | 4 |
The outputs of other chatbots like Google's Gemini or Claude-3-Haiku (via Poe) are not yet comparable with the ones from ChatGPT or Copilot. Claude-3-Haiku does provide estimated values (even with decimals), though they deviate considerably from the original data.
It would be interesting to test how other charts and plots are processed by chatbots, and whether the various visual elements (e.g. gridlines, ticks, markers) make a difference.
About Me

- Adrian
- Koeln, NRW, Germany
- IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.