Showing posts with label categories. Show all posts

18 May 2024

Graphical Representation: Graphics We Live By (Part IV: Area Charts in MS Excel)

Graphical Representation

An area chart or area graph (see A) is a graphical representation of quantitative data based on a line chart, in which the areas between the axis and the lines of the series are commonly emphasized with colors, textures, or hatchings [1]. It resembles a combination of line and bar charts. Each data series forms a region (aka area), which makes it possible to identify overlaps and to compare the lines within the same visual display. This approach usually works well for two or three data series if the lines don't overlap, though the more data series are added to the chart, the higher the chances for lines to overlap or for one area to be covered by another (see B). This can easily become more than the chart can handle, even if the data series can be filtered dynamically.

Area Charts

Stacked area charts are a variation of area charts in which the areas are stacked, much like stacked bar charts (see C). Research papers abound with such charts, probably because they allow multiple data series to be stacked within a small area, thus reflecting the many variables involved. Such charts allow tracking individual as well as intermediary and total aggregated trends.

Stacked Area Charts
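
To make the stacking concrete, here is a minimal Python sketch of how the upper boundary of each layer could be derived as a running total per time point. The function name `stack_series` and the sample figures are assumptions made for illustration; they are not taken from the charts above or from Excel itself.

```python
from itertools import accumulate


def stack_series(series):
    """Return the upper boundary line of each stacked layer.

    `series` maps a series name to a list of values, one per time point.
    Layers are stacked in the mapping's insertion order.
    """
    names = list(series)
    # One running total per time point: [first, first+second, ...].
    per_point = [list(accumulate(values)) for values in zip(*series.values())]
    # Transpose back so that each series gets its own boundary line.
    return {name: [point[i] for point in per_point] for i, name in enumerate(names)}


# Hypothetical sample data: three series over four periods.
sales = {"A": [10, 12, 9, 11], "B": [5, 7, 6, 8], "C": [2, 1, 3, 2]}
boundaries = stack_series(sales)
print(boundaries)
```

The boundary of the last layer equals the total trend, while the boundaries in between are the intermediary aggregates. Only the first layer sits on a flat baseline, which is why the layers above it are harder to read.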

Unfortunately, besides the fact that some areas are barely distinguishable or that distant areas can't be compared (especially when an area in between fluctuates strongly), the lack of ticks and/or gridlines (see D) makes such charts difficult to interpret. Moreover, when the lines are smoothed, it becomes even more difficult to identify the actual points. To address this, it makes sense to use markers for the data points to show that one works with discrete and not continuous points (see further paragraphs).

In general, it's recommended to reduce the number of data series to 3-5. For example, one can split the data series into 2-3 groups or categories based on the series' characteristics (e.g. concentrate on the high values in one chart and on the low values in another, or group the low values under an "Others" category), which allows better comparisons.
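
One way to prepare the data before charting is sketched below in Python: the series with the largest totals are kept and the remaining ones are summed into a single "Others" series. The function name `group_small_series`, the `keep` parameter, and the regional figures are hypothetical and serve only as an illustration. Ranking by total is just one possible criterion.

```python
def group_small_series(series, keep=3, other_label="Others"):
    """Keep the `keep` largest series (by total) and sum the rest into one."""
    # Rank the series names by their total value, largest first.
    ranked = sorted(series, key=lambda name: sum(series[name]), reverse=True)
    kept, rest = ranked[:keep], ranked[keep:]
    grouped = {name: list(series[name]) for name in kept}
    if rest:
        # Add up the remaining series point by point.
        grouped[other_label] = [sum(vals) for vals in zip(*(series[n] for n in rest))]
    return grouped


# Hypothetical sample data: five regions over three periods.
data = {
    "North": [40, 42, 45],
    "South": [30, 28, 33],
    "East": [3, 4, 2],
    "West": [1, 2, 2],
    "Central": [20, 22, 21],
}
grouped = group_small_series(data, keep=3)
print(grouped)
```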

Sorting the time series by their average value or by other criteria (e.g. showing the areas with minimal variations first) can improve the readability of such charts.
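
As a rough illustration, the ordering could be prepared in Python before the data is loaded into the chart. The helper `order_series` and the sample values are assumptions. The population standard deviation is used here as one possible measure of variation, not as the only sensible one.

```python
from statistics import mean, pstdev


def order_series(series, key=mean, reverse=False):
    """Return the series as a new dict, ordered by `key` applied to the values."""
    return dict(sorted(series.items(), key=lambda item: key(item[1]), reverse=reverse))


# Hypothetical sample data: three series over four periods.
data = {"A": [10, 30, 5, 25], "B": [8, 9, 8, 9], "C": [20, 21, 19, 20]}

# Largest average first.
by_average = order_series(data, key=mean, reverse=True)
# Most stable series first, so that it forms the bottom layer of the stack.
by_variation = order_series(data, key=pstdev)
print(list(by_average), list(by_variation))
```

Placing the most stable series at the bottom keeps the baseline of the layers above it nearly flat, which makes their fluctuations easier to read.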

Moreover, areas under curves can easily hide missing data (see F) and occasionally negative values (which is the case of the 8th example), or distort the rate of change when the charts are wider than needed (compare F with C). 

Line Chart, respectively Area Chart based on a subset
Area Charts Variations

Area charts seem to encode a dimension as area, though that's not necessarily the case. It seems natural to display time series of different granularities (day, month, quarter, year), though one needs to be careful about one important aspect: on a time scale, the further one moves from days to weeks and months as time units, the bigger the distance between points. In the end, all the points in a series are discrete (not continuous), though the bigger the distance, the more category-like these series become (compare F with C; the charts have the same width).

Using the area under the curve as a dimension makes sense when there's continuity or when the discrete points are close enough to each other to resemble continuity. Thus, area charts are useful when the number of points is high (and the distance between them becomes negligible), e.g. showing daily values within a year or monthly values over several years.

According to [2], [3] and several other sources, using area to encode quantitative information is a poor graphical method, and this applies to pie charts and area charts alike. By contrast, a bar chart (see G) offers either height or width for comparisons, while the points are always delimited as bars. Scatter plots (see H), even if they might miss the time dimension, better reflect the dispersion of the points along the lines, with the series distinguished by color (compare H with E).

Column Chart and Scatter Plot
Alternatives for Area Charts

The more category-like the data series are and the fewer data points they have, the higher the chances that other graphical representations depict the data better. For example, year-based or even quarter-based data can be better visualized with Sankey charts (unfortunately, not yet available as a standard Excel visual).

Conversely, there are situations in which the area chart isn't supposed to convey specific values but to give a feeling for the areas' shape, or in which its simplicity is more appropriate; in such situations area charts do a good job. In the end, a graphical representation's utility is linked to the chart's purpose (and audience, of course).

References:
[1] Wikipedia (2023) Area charts
[2] William S Cleveland (1993) Visualizing Data
[3] Robert L Harris (1996) Information Graphics: A Comprehensive Illustrated Reference

22 December 2011

Graphical Representation: Categories (Just the Quotes)

"A bar graph typically presents either averages or frequencies. It is relatively simple to present raw data (in the form of dot plots or box plots). Such plots provide much more information, and they are closer to the original data. If the bar graph categories are linked in some way - for example, doses of treatments - then a line graph will be much more informative. Very complicated bar graphs containing adjacent bars are very difficult to grasp. If the bar graph represents frequencies, and the abscissa values can be ordered, then a line graph will be much more informative and will have substantially reduced chart junk." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Stacked bar graphs do not show data structure well. A trend in one of the stacked variables has to be deduced by scanning along the vertical bars. This becomes especially difficult when the categories do not move in the same direction." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Arbitrary category sequence and misplaced pie chart emphasis lead to general confusion and weaken messages. Although this can be used for quite deliberate and targeted deceit, manipulation of the category axis only really comes into its own with techniques that bend the relationship between the data and the optics in a more calculated way. Many of these techniques are just twins of similar ruses on the value axis, but are none the less powerful for that." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"Category definition and selection in the pre-graphical phase of communication offer varied manipulation opportunities. But once we get to designing the chart itself category distortion opportunities are even more attractive." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"Generally pie charts are to be avoided, as they can be difficult to interpret particularly when the number of categories is greater than five. Small proportions can be very hard to discern […] In addition, unless the percentages in each of the individual categories are given as numbers it can be much more difficult to estimate them from a pie chart than from a bar chart […]." (Jenny Freeman et al, "How to Display Data", 2008)

"Where there is no natural ordering to the categories it can be helpful to order them by size, as this can help you to pick out any patterns or compare the relative frequencies across groups. As it can be difficult to discern immediately the numbers represented in each of the categories it is good practice to include the number of observations on which the chart is based, together with the percentages in each category." (Jenny Freeman et al, "How to Display Data", 2008)

"Histograms are often mistaken for bar charts but there are important differences. Histograms show distribution through the frequency of quantitative values (y axis) against defined intervals of quantitative values (x axis). By contrast, bar charts facilitate comparison of categorical values. One of the distinguishing features of a histogram is the lack of gaps between the bars [...]" (Andy Kirk, "Data Visualization: A successful design process", 2012)

"Early exploration of a dataset can be overwhelming, because you don’t know where to start. Ask questions about the data and let your curiosities guide you. […] Make multiple charts, compare all your variables, and see if there are interesting bits that are worth a closer look. Look at your data as a whole and then zoom in on categories and individual data points. […] Subcategories, the categories within categories (within categories), are often more revealing than the main categories. As you drill down, there can be higher variability and more interesting things to see." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"If I had to pick a single go-to graph for categorical data, it would be the horizontal bar chart, which flips the vertical version on its side. Why? Because it is extremely easy to read. The horizontal bar chart is especially useful if your category names are long, as the text is written from left to right, as most audiences read, making your graph legible for your audience." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"A taxonomy is a classification scheme that organizes categories in a broader-narrower hierarchy. Items that share similar qualities are grouped into the same category, and the taxonomy provides a global organization by relating categories to one another." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)


About Me

IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.