Showing posts with label area chart. Show all posts
Showing posts with label area chart. Show all posts

03 May 2025

📊Graphical Representation: Graphics We Live By (Part XI: Comparisons Between Data Series)

Graphical Representation Series
Graphical Representation Series

Over the past 10-20 years it became so easy to create data visualizations just by dropping some of the data available into a tool like Excel and providing a visual depiction of it with just a few clicks. In many cases, the first draft, typically provided by default in the tool used, doesn't even need further work as the objective was reached, while in others the creator must have a minimum skillset for making the visualization useful, appealing, or whatever quality is a final requirement for the work in scope. However, the audience might judge the visualization(s) from different perspectives, and there can be a broad audience with different skills in reading, evaluating and understanding the work.

There are many depictions on the web resembling the one below, taken from a LinkedIn post:

Example Chart - Boing vs. Airbus

Even if the visualization is not perfect, it does a fair job in representing the data. Improvements can be made in the areas of labels, the title and positioning of elements, and the color palette used. At least these were the improvements made in the original post. It must be differentiated also between the environment in which the charts are made available, the print format having different characteristics than the ones in business setups. Unfortunately, the requirements of the two are widely confused, probably also because of the overlapping of the mediums used. 

Probably, it's a good idea to always start with the row data (or summaries of it) when the result consists of only a few data points that can be easily displayed in a table like the one below (the feature to round the decimals for integer values should be available soon in Power BI):

Summary Table

Of course, one can calculate more meaningful values like percentages from the total, standard deviations and other values that offer more perspectives into the data. Even if the values adequately reflect the reality, the reader can but wonder about the local and global minimal/maximal values, without talking much about the meaning of data points, which is easily identifiable in a chart. At least in the case of small data sets, using a table in combination with a chart can provide a more complete perspective and different ways of analyzing the data, especially when the navigation is interactive. 

Column and bar charts do a fair job in comparing values over time, though they do use a lot of ink in the process (see D). While they make it easy to compare neighboring values, the rectangles used tend to occupy a lot of space when they are made too wide or too high to cover the empty space within the display (e.g. when just a few values are displayed, space being wasted in the process). As the main downside, it takes a lot of scanning until the reader identifies the overall trends, and the further away the bars are from each other, the more difficult it becomes to do comparisons. 

In theory, line charts are more efficient in representing the above data points, because the marks are usually small and the line thin enough to provide a better data-ink ratio, while one can see a lot at a glance. In Power BI the creator can use different types of interpolation: linear (A), step (B) or smooth (C). In many cases, it might be a good idea to use a linear interpolation, though when there are no or minimal overlapping, it might be worthwhile to explore the other types if interpolation too (and further request feedback from the users):

Linear, Step and Smooth Line Charts

The nearness of values from different series can raise difficulties in identifying adequately the points, respectively delimiting the lines (see B).When the density of values allows it, it makes sense also to include the averages for each data series to reflect the distance between the two data sets. Unfortunately, the chart can get crowded if further data series or summaries are added to the cart(s). 

If the column chart (E) is close to the redesigned chart provided in the original redesign, the other alternatives can provide upon case more value. Stacked column charts (D) allow also to compare the overall quantity by month, area charts (F) tend to use even more color than needed, while water charts (G) allow to compare the difference between data points per time unit. Tornado charts (H) are a variation of bar charts, allowing easier comparing of the size of the bars, while ribbon charts (I) show well the stacking values. 

Alternatives to Line Charts

One should consider changing the subtitle(s) slightly to reflect the chart type when the patterns shown imply a shift in attention or meaning. Upon case, more that one of the above charts can be used within the same report when two or more perspectives are important. Using a complementary perspective can facilitate data's understanding or of identifying certain patterns that aren't easily identifiable otherwise. 

In general, the graphics creators try to use various representational means of facilitating a data set's understanding, though seldom only two series or a small subset of dimensions provide a complete description. The value of data comes when multiple perspectives are combined. Frankly, the same can be said about the above data series. Yes, there are important differences between the two series, though how do the numbers compare when one looks at the bigger picture, especially when broken down on element types (e.g. airplane size). How about plan vs. actual values, how long does it take more for production or other processes? It's one of a visualization's goals to improve the questions posed, but how efficient are visualizations that barely scratch the surface?

In what concerns the code, the following scripts can be used to prepare the data:

-- Power Query script (Boeing vs Airbus)
= let
    Source = let
    Source = #table({"Sorting", "Month Name", "Serial Date", "Boeing Deliveries", "Airbus Deliveries"},
    {
        {1, "Oct", #date(2023, 10, 31), 30, 50},
        {2, "Nov", #date(2023, 11, 30), 40, 40},
        {3, "Dec", #date(2023, 12, 31), 40, 110},
        {4, "Jan", #date(2024, 1, 31), 20, 30},
        {5, "Feb", #date(2024, 2, 29), 30, 40},  // Leap year adjustment
        {6, "Mar", #date(2024, 3, 31), 30, 60},
        {7, "Apr", #date(2024, 4, 30), 40, 60},
        {8, "May", #date(2024, 5, 31), 40, 50},
        {9, "Jun", #date(2024, 6, 30), 50, 80},
        {10, "Jul", #date(2024, 7, 31), 40, 90},
        {11, "Aug", #date(2024, 8, 31), 40, 50},
        {12, "Sep", #date(2024, 9, 30), 30, 50}
    }
    ),
    #"Changed Types" = Table.TransformColumnTypes(Source, {{"Sorting", Int64.Type}, {"Serial Date", type date}, {"Boeing Deliveries", Int64.Type}, {"Airbus Deliveries", Int64.Type}})
in
    #"Changed Types"
in
    Source

It can be useful to create the labels for the charts dynamically:

-- DAX code for labels
MaxDate = Format(Max('Boeing vs Airbus'[Serial Date]),"MMM-YYYY")
MinDate = FORMAT (Min('Boeing vs Airbus'[Serial Date]),"MMM-YYYY")
MinMaxDate = [MinDate] & " to " & [MaxDate]
Title Boing Airbus = "Boing and Airbus Deliveries " & [MinMaxDate]

Happy coding!

Previous Post <<||>> Next Post

15 June 2024

🗒️Graphical Representation: Bar & Column Charts [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!
Last updated: 15-Jun-2024

Bar & Column Charts with Variations
Bar & Column Charts (Graphs) 

  • {definition} graphical representation of categorical data with rectangular figures (aka boxes) whose heights (column chart) or lengths (bar chart) are proportional to the values that they represent
  • {benefit} allow to visually encode/decode quantitative information-size as magnitude and area based on the relative position of the end of the box along the common scale
    • if the width of the box is the same, it's enough to compare the length
      • ⇒ the basis of comparison is one-dimensional [1]
      • ⇐ orient the reader to the relative magnitudes of the boxes
    • area is typically encoded when the width varies
      • ⇐ encoding by area is a poor encoding method as it can mislead
    • can represent negative and positive values 
    • one of the most useful, simple, and adaptable techniques in graphic presentation [1]
      • easily understood by readers
      • sometimes avoided because they are so common
      • almost everything could be a bar chart
    • the length of each bar is proportional to the quantity or amount of each category represented [1]
      • ⇒the zero line must be shown [1]
      • ⇒the scale must not be broken [1]
        • {exception} an excessively long bar in a series of bars may be broken off at the end, and the amount involved shown directly beyond it [1]
  • {benefit} allow to visually represent categorical data
    • ⇒ occasionally represented without scales, grid lines or tick marks
    • the more data elements are presented, the more difficult it becomes to navigate and/or display the data
  • {benefit} allow us to easily compare magnitudes 
    • sometimes without looking at the actual values
  • {type} bar chart
    • the box is shown horizontally
    • represents magnitude by length
    • allows comparing different items as of a specific time
  • {type} column chart
    • the box is shown vertically
    • represents magnitude by height
    • allows comparing different items over time
      • ⇐ it still displays discrete points
    • recommended for comparing similar items for different time periods [2]
    • effective way to show most types of comparisons [2]
  • {subtype} stacked chart
    • variation of bar/column charts in which the boxes of a dimension's components are staked over each other
      • {exception} spaces can be used between boxes if the values aren't cumulative [3]
    • {benefit} allows encoding a further dimension where the values are staked within the same box
    • {drawback} do not show data structure well
      • ⇒ make it challenging to compare values across boxes
  • {subtype} 100-percent chart
    • variation of stacked chart in which the magnitude totals to 100%
    • {benefit} allows to display part to whole relationships
      • ⇐ preferable to circle chart's angle and area comparison [1]
  • {subtype} clustered chart (aka grouped chart)
    • variation of bar/column charts that allows encoding further quantitative information in distinct boxes tacked together which occasionally overlap
      • ⇐ if there's space, it is usually kept to a minimum
      • e.g. can be used to display multiple data series 
    • can be used with a secondary axis
    • {benefit} allows comparisons within the cluster/group as well between clusters/groups
    • {drawback} more challenging to make comparisons across points
  • {subtype} area chart (variable-width/variwide chart/graph
    • variation of bar/column charts in which the height/width have significance being proportional to some measure or characteristics of the data elements represented [3]
    • {benefit} allow encoding a further dimension as part of the area
  • {subtype} deviation chart 
    • variation of bar/column charts that display positive and negative values 
  • {subtype} joined chart
    • variation of bar/column charts in which the boxes are tacked together
    • {benefit} allow to better use the space available 
  • {subtype} paired chart 
    • variation of bar/column charts in which the boxes are paired in mirror based on an axis
      • e.g. the values of one data series are displayed to the left, while the values for a second data series are displayed to the right 
    • {benefit} allows to study the correlation and/or other relationships between the values of two data series
    • the hidden axes can have different scales 
  • {subtype} circular chart (aka radial chart)
    • variation of bar/column charts in which the boxes are wrapped into a circle, the various categories being uniformly spaced along the radial or category axis [3]
    • the value scale can have any upper or lower value and can progress in either direction [3]
    • {benefit} useful to represent data that have a circular dimension in an aesthetic form
      • e.g. months, hours
  • {subtype} waterfall chart (aka progressing chart)
    • variation of bar/column charts in which the boxes are displayed progressively, the start of a box corresponding the end of the previous box 
    • time and activity charts can be considered as variations of this subtype [3]
    • {advantage} allows to determine cumulative values, respectively the increase/decrease between consecutive boxes
  • {subtype}composite chart (aka mixed chartcombination chart, overlay chart)
    • variation of bar/column charts in besides boxes are used other graphic types of encoding (line, area)
      • ⇐ the different data graphics are overlaid on one another [3]
    • {benefit} allows to improve clarity or highlight the relationships between several data series [3]
    • {drawback} overlaying can result in clutter 
  • used to  
    • display totals, averages or frequencies
    • display time series
    • display the relationship between two or more items
    • make a comparison among several items
    • make a comparison between parts and the whole
  • can be confounded with 
    • [histograms]
      • show distribution through the frequency of quantitative values against defined intervals of quantitative values
      • used for continuous numerical data or data that can be effectively modelled as continuous
      • it doesn't have spaces between bars
        • ⇐ older use of bar/column charts don't use spaces
        • if this aspect is ignored, histograms can be considered as a special type of area chart
    • [vertical line chart] (aka price chart, bar chart)
      • vertical line charts are sometimes referred as bar charts (see [3])
  • things to consider
    • distance between bars
      • the more distant the bars, the more difficult it becomes to make comparisons and the accuracy of judgment decreases
    • sorting
      • sorting the bars/columns by their size facilitates comparisons, though it can impede items' search, especially when there are many categories involved
        • {exception} not recommended for time series
    • clutter
      • displaying too many items in a cluster and/or too many labels can lead to clutter
      • {recommendation} display at maximum 3-4 clustered boxes
    • color
      • one should follow the general recommendations 
    • trend lines
      • can be used especially with time series especially to represent the linear regression line
    • dual axis
      • {benefit} allows to compare the magnitudes of two data series by employing a secondary axis
    • overlapping
      • overlapping boxes can make charts easier to read
    • symbols
      • can be used to designate reference points of comparison for each of the bars [3]
  • {alternative} pie chart
    • can be used to dramatize comparisons in relation to the whole [2]
    • one should consider the drawbacks 
  • {alternative} choropleth maps
    • more adequate for geographical dimensions
    • provide minimal encoding 
  • {alternative} line charts
    • can be much more informative
    • provides an optimal dat-ink ratio
    • reduces the chart junk feeling
  • {alternative} dot plots
    • are closer to the original data

References:
[1] Anna C Rogers (1961) "Graphic Charts Handbook"
[2] Robert Lefferts (1981) "Elements of Graphics: How to prepare charts and graphs for effective reports"
[3] Robert L Harris (1996) "Information Graphics: A Comprehensive Illustrated Reference"

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.