Showing posts with label bar chart. Show all posts
Showing posts with label bar chart. Show all posts

15 June 2024

Graphical Representation: Bar & Column Charts (Notes)

Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!
Last updated: 15-Jun-2024

Bar & Column Charts with Variations
Bar & Column Charts (Graphs) 

  • {definition} graphical representation of categorical data with rectangular figures (aka boxes) whose heights (column chart) or lengths (bar chart) are proportional to the values that they represent
  • {benefit} allow to visually encode/decode quantitative information-size as magnitude and area based on the relative position of the end of the box along the common scale
    • if the width of the box is the same, it's enough to compare the length
      • ⇒ the basis of comparison is one-dimensional [1]
      • ⇐ orient the reader to the relative magnitudes of the boxes
    • area is typically encoded when the width varies
      • ⇐ encoding by area is a poor encoding method as it can mislead
    • can represent negative and positive values 
    • one of the most useful, simple, and adaptable techniques in graphic presentation [1]
      • easily understood by readers
      • sometimes avoided because they are so common
      • almost everything could be a bar chart
    • the length of each bar is proportional to the quantity or amount of each category represented [1]
      • ⇒the zero line must be shown [1]
      • ⇒the scale must not be broken [1]
        • {exception} an excessively long bar in a series of bars may be broken off at the end, and the amount involved shown directly beyond it [1]
  • {benefit} allow to visually represent categorical data
    • ⇒ occasionally represented without scales, grid lines or tick marks
    • the more data elements are presented, the more difficult it becomes to navigate and/or display the data
  • {benefit} allow us to easily compare magnitudes 
    • sometimes without looking at the actual values
  • {type} bar chart
    • the box is shown horizontally
    • represents magnitude by length
    • allows comparing different items as of a specific time
  • {type} column chart
    • the box is shown vertically
    • represents magnitude by height
    • allows comparing different items over time
      • ⇐ it still displays discrete points
    • recommended for comparing similar items for different time periods [2]
    • effective way to show most types of comparisons [2]
  • {subtype} stacked chart
    • variation of bar/column charts in which the boxes of a dimension's components are staked over each other
      • {exception} spaces can be used between boxes if the values aren't cumulative [3]
    • {benefit} allows encoding a further dimension where the values are staked within the same box
    • {drawback} do not show data structure well
      • ⇒ make it challenging to compare values across boxes
  • {subtype} 100-percent chart
    • variation of stacked chart in which the magnitude totals to 100%
    • {benefit} allows to display part to whole relationships
      • ⇐ preferable to circle chart's angle and area comparison [1]
  • {subtype} clustered chart (aka grouped chart)
    • variation of bar/column charts that allows encoding further quantitative information in distinct boxes tacked together which occasionally overlap
      • ⇐ if there's space, it is usually kept to a minimum
      • e.g. can be used to display multiple data series 
    • can be used with a secondary axis
    • {benefit} allows comparisons within the cluster/group as well between clusters/groups
    • {drawback} more challenging to make comparisons across points
  • {subtype} area chart (variable-width/variwide chart/graph
    • variation of bar/column charts in which the height/width have significance being proportional to some measure or characteristics of the data elements represented [3]
    • {benefit} allow encoding a further dimension as part of the area
  • {subtype} deviation chart 
    • variation of bar/column charts that display positive and negative values 
  • {subtype} joined chart
    • variation of bar/column charts in which the boxes are tacked together
    • {benefit} allow to better use the space available 
  • {subtype} paired chart 
    • variation of bar/column charts in which the boxes are paired in mirror based on an axis
      • e.g. the values of one data series are displayed to the left, while the values for a second data series are displayed to the right 
    • {benefit} allows to study the correlation and/or other relationships between the values of two data series
    • the hidden axes can have different scales 
  • {subtype} circular chart (aka radial chart)
    • variation of bar/column charts in which the boxes are wrapped into a circle, the various categories being uniformly spaced along the radial or category axis [3]
    • the value scale can have any upper or lower value and can progress in either direction [3]
    • {benefit} useful to represent data that have a circular dimension in an aesthetic form
      • e.g. months, hours
  • {subtype} waterfall chart (aka progressing chart)
    • variation of bar/column charts in which the boxes are displayed progressively, the start of a box corresponding the end of the previous box 
    • time and activity charts can be considered as variations of this subtype [3]
    • {advantage} allows to determine cumulative values, respectively the increase/decrease between consecutive boxes
  • {subtype}composite chart (aka mixed chartcombination chart, overlay chart)
    • variation of bar/column charts in besides boxes are used other graphic types of encoding (line, area)
      • ⇐ the different data graphics are overlaid on one another [3]
    • {benefit} allows to improve clarity or highlight the relationships between several data series [3]
    • {drawback} overlaying can result in clutter 
  • used to  
    • display totals, averages or frequencies
    • display time series
    • display the relationship between two or more items
    • make a comparison among several items
    • make a comparison between parts and the whole
  • can be confounded with 
    • [histograms]
      • show distribution through the frequency of quantitative values against defined intervals of quantitative values
      • used for continuous numerical data or data that can be effectively modelled as continuous
      • it doesn't have spaces between bars
        • ⇐ older use of bar/column charts don't use spaces
        • if this aspect is ignored, histograms can be considered as a special type of area chart
    • [vertical line chart] (aka price chart, bar chart)
      • vertical line charts are sometimes referred as bar charts (see [3])
  • things to consider
    • distance between bars
      • the more distant the bars, the more difficult it becomes to make comparisons and the accuracy of judgment decreases
    • sorting
      • sorting the bars/columns by their size facilitates comparisons, though it can impede items' search, especially when there are many categories involved
        • {exception} not recommended for time series
    • clutter
      • displaying too many items in a cluster and/or too many labels can lead to clutter
      • {recommendation} display at maximum 3-4 clustered boxes
    • color
      • one should follow the general recommendations 
    • trend lines
      • can be used especially with time series especially to represent the linear regression line
    • dual axis
      • {benefit} allows to compare the magnitudes of two data series by employing a secondary axis
    • overlapping
      • overlapping boxes can make charts easier to read
    • symbols
      • can be used to designate reference points of comparison for each of the bars [3]
  • {alternative} pie chart
    • can be used to dramatize comparisons in relation to the whole [2]
    • one should consider the drawbacks 
  • {alternative} choropleth maps
    • more adequate for geographical dimensions
    • provide minimal encoding 
  • {alternative} line charts
    • can be much more informative
    • provides an optimal dat-ink ratio
    • reduces the chart junk feeling
  • {alternative} dot plots
    • are closer to the original data

References:
[1] Anna C Rogers (1961) "Graphic Charts Handbook"
[2] Robert Lefferts (1981) "Elements of Graphics: How to prepare charts and graphs for effective reports"
[3] Robert L Harris (1996) "Information Graphics: A Comprehensive Illustrated Reference"

01 June 2024

Graphical Representation: Graphics We Live By (Part VIII: List of Items in Power BI)

Graphical Representation Series
Graphical Representation Series

Introduction

There are situations in which one needs to visualize only the rating, other values, or ranking of a list of items (e.g. shopping cart, survey items) on a scale (e.g. 1 to 100, 1 to 10) for a given dimension (e.g. country, department). Besides tables, in Power BI there are 3 main visuals that can be used for this purpose: the clustered bar chart, the line chart (aka line graph), respectively the slopegraph:

Main Display Methods

Main Display Methods

For a small list of items and dimension values probably the best choice would be to use a clustered bar chart (see A). If the chart is big enough, one can display also the values as above. However, the more items in the list, respectively values in the dimension, the more space is needed. One can maybe focus then only on a subset of items from the list (e.g. by grouping several items under a category), respectively choose which dimension values to consider. Another important downside of this method is that one needs to remember the color encodings. 

This downside applies also to the next method - the use of a line chart (see B) with categorical data, however applying labels to each line simplifies its navigation and decoding. With line charts the audience can directly see the order of the items, the local and general trends. Moreover, a line chart can better scale with the number of items and dimension values.

The third option (see C), the slopegraph, looks like a line chart though it focuses only on two dimension values (points) and categorizes the line as "down" (downward slope), "neutral" (no change) and "up" (upward slope). For this purpose, one can use parameters fields with measures. Unfortunately, the slopegraph implementation is pretty basic and the labels overlap which makes the graph more difficult to read. Probably, with the new set of changes planned by Microsoft, the use of conditional formatting of lines would allow to implement slope graphs with line charts, creating thus a mix between (B) and (C).

This is one of the cases in which the Y-axis (see B and C) could be broken and start with the meaningful values. 

Table Based Displays

Especially when combined with color encodings (see C & G) to create heatmap-like displays or sparklines (see E), tables can provide an alternative navigation of the same data. The color encodings allow to identify the areas of focus (low, average, or high values), while the sparklines allow to show inline the trends. Ideally, it should be possible to combine the two displays.  

Table Displays and the Aster Plot

One can vary the use of tables. For example, one can display only the deviations from one of the data series (see F), where the values for the other countries are based on AUS. In (G), with the help of visual calculations one can also display values' ranking. 

Pie Charts

Pie charts and their variations appear nowadays almost everywhere. The Aster plot is a variation of the pie charts in which the values are encoded in the height of the pieces. This method was considered because the data used above were encoded in 4 similar plots. Unfortunately, the settings available in Power BI are quite basic - it's not possible to use gradient colors or link the labels as below:

Source Data as Aster Plots

Sankey Diagram

A Sankey diagram is a data visualization method that emphasizes the flow or change from one state (the source) to another (the destination). In theory it could be used to map the items to the dimensions and encode the values in the width of the lines (see I). Unfortunately, the diagram becomes challenging to read because all the lines and most of the labels intersect. Probably this could be solved with more flexible formatting and a rework of the algorithm used for the display of the labels (e.g. align the labels for AUS to the left, while the ones for CAN to the right).

Sankey Diagram

Data Preparation

A variation of the above image with the Aster Plots which contains only the plots was used in ChatGPT to generate the basis data as a table via the following prompts:

  • retrieve the labels from the four charts by country and value in a table
  • consolidate the values in a matrix table by label country and value
The first step generated 4 tables, which were consolidated in a matrix table in the second step. Frankly, the data generated in the first step should have been enough because using the matrix table required an additional step in DAX.

Here is the data imported in Power BI as the Industries query:

let
    Source = #table({"Label","Australia","Canada","U.S.","Japan"}
, {
 {"Credit card","67","64","66","68"}
, {"Online retail","55","57","48","53"}
, {"Banking","58","53","57","48"}
, {"Mobile phone","62","55","44","48"}
, {"Social media","74","72","62","47"}
, {"Search engine","66","64","56","42"}
, {"Government","52","52","58","39"}
, {"Health insurance","44","48","50","36"}
, {"Media","52","50","39","23"}
, {"Retail store","44","40","33","23"}
, {"Car manufacturing","29","29","26","20"}
, {"Airline/hotel","35","37","29","16"}
, {"Branded manufacturing","36","33","25","16"}
, {"Loyalty program","45","41","32","12"}
, {"Cable","40","39","29","9"}
}
),
    #"Changed Types" = Table.TransformColumnTypes(Source,{{"Australia", Int64.Type}, {"Canada", Int64.Type}, {"U.S.", Number.Type}, {"Japan", Number.Type}})
in
    #"Changed Types"

Transforming (unpivoting) the matrix to a table with the values by country:

IndustriesT = UNION (
    SUMMARIZECOLUMNS(
     Industries[Label]
     , Industries[Australia]
     , "Country", "Australia"
    )
    , SUMMARIZECOLUMNS(
     Industries[Label]
     , Industries[Canada]
     , "Country", "Canada"
    )
    , SUMMARIZECOLUMNS(
     Industries[Label]
     , Industries[U.S.]
     , "Country", "U.S."
    )
    ,  SUMMARIZECOLUMNS(
     Industries[Label]
     , Industries[Japan]
     , "Country", "Japan"
    )
)

Notes:
The slopechart from MAQ Software requires several R language libraries to be installed (see how to install the R language and optionally the RStudio). Run the following scripts, then reopen Power BI Desktop and enable running visual's scripts.

install.packages("XML")
install.packages("htmlwidgets")
install.packages("ggplot2")
install.packages("plotly")

Happy (de)coding!

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.