(2) Does it make sense to use a chart only for the sake of visualizing the
data?
(3) Where is the benefit of using a chart as long as there's no information
conveyed?
One can see similar examples in the media where non-aggregated values are
shown in a chart just for the sake of visualizing the data. Sometimes the
authors compensate for the lack of meaning with junk elements, fancy titles
or other tricks. Usually, sense-making in a chart takes longer than looking
at the values in a table as there are more dimensions or elements to
consider. For a table there's the title, headers and the values, nothing
more! For a chart one has in addition the axes and some visualization
elements that can facilitate or complicate the decoding of the visualization.
Add to this that there are also many tricks to distort the data.
However, the more data is available in the table, the more difficult it
becomes to navigate the data. But again, if the chart shows the individual
data without any information gained, a table might still be more effective.
One shouldn't be afraid to show a table where that is the case!
On the other side, maybe by investing a bit more effort certain aspects can
be improved. In this area beginners start playing with the colors and
formatting the different elements of the chart. Unfortunately, even if color
plays a major role in the encoding and decoding of meaning, it is often
misused or overused.
In the next examples taken from the web (diagrams C and D), the author
changed the color of the column with the minimal value to red to contrast it
with the other values. Red is usually associated with danger, error,
warning, or other similar characteristics with negative impact. The chances
are high that the reader will associate the value with a negative
connotation, even if red is used also for conveying important information
(usually in text). Moreover, the reader will try to interpret the meaning of
the other colors. In practice, the color grey has a neutral tone (and
calming effect on the mind). Therefore, it's safe to use grey in
visualization (see diagram D in contrast with diagram C). Some even advise
setting grey as default for the visualization and changing the colors as
needed later.
In these charts, the author signaled in the titles that red denotes the
lowest value, though this only reduces the confusion without removing it.
One can meet titles in which several colors are used, reminiscent of a
Christmas tree. Frankly, this type of encoding is not esthetically pleasing,
and it can annoy the reader.
(6) What's in a name?
The titles and, where used, the subtitles are important elements in
communicating what the data reflects. The title should in general be short
and succinct in the information it conveys, having the role of
introducing and identifying the chart, especially when multiple
charts are used. Some charts can also use a subtitle, which can be longer
than the title and have more of a storytelling character by highlighting
the message and/or the finding in the data. In diagrams C and D the
subtitles were used as titles, which is not entirely wrong.
In the media and in presentations meant to influence, subtitles help
the user understand the message or the main findings, though it's not
appropriate to hardcode them in dynamic dashboards. Even if a logic
was identified to handle the various scenarios, this shifts users'
attention, and the chance is high that they'll stop further investigating
the visualization.
A data professional should present the facts with minimal interference
in how the audience and/or users perceive the data.
As a recommendation, one should aim for clear general titles and avoid
transmitting one's own message in charts. As a principle this can be
summarized as "aim for clarity and equidistance".
(7) What about meaning?
Until now we barely considered the meaning of data. Unfortunately, there's
no information about what the Discount rate means. It could be "the
minimum interest rate set by the US Federal Reserve (and some other
national banks) for lending to other banks" or "a rate used for
discounting bills of exchange", to use the definitions given by the Oxford
dictionary. Searching on the web, the results lead to discount rates for
royalty savings, resident tuitions, or retail for discount transactions.
Most probably the Discount rates from the data set refer to the latter.
We need a definition of the Discount rate to understand what the values
represent when they are ordered. For example, when Texas has a value of
25% (see B), does this value have a negative or a positive impact when
compared with other values? It depends on how it's used in the associated
formula. The last two charts consider that the minimum value has a
negative impact, though without more information the encoding might be
wrong!
Important formulas and definitions should be included as side
information in the visualization, accompanying text or documentation!
If further resources are required for understanding the data, then
links to the required resources should be provided as well. At least this
ensures that the reader can acquire the right information without major
overhead.
(8) What do readers look for?
Frankly, this should have been the first question!
Readers have different expectations from data visualizations. First
of all, there's curiosity - how the data look in raw and/or aggregated
form, or in more advanced form how they are shaped (e.g. statistical
characteristics like dispersion, variance, outliers). Secondly, readers
look in the first phase to understand mainly whether the "results" are
good or bad, even if there are many shades of grey in between. Further on,
a distinction must be made between readers who want to learn more
about the data, models, and processes behind them, and readers who
just want a confirmation of their expectations, opinions and beliefs (aka
bias). And, in the end, there are also people who are not interested in
the data and what it tells, for whom the title and/or subtitle provide
enough information.
Besides this there are further categories of readers segmented by their
role in the decision making, the planning and execution of operational,
tactical, or strategic activities. Each of these categories has different
needs. However, this exceeds the scope of our analysis.
Returning to our example, one can expect that the average reader will try
to identify the smallest and highest Discount rates from the data set, and
to compare the values between the different States.
Sorting the data and having the values close to each other facilitates
the comparison and ranking; otherwise the reader needs to do this
unaided. This latter aspect and the fact that bar charts handle the
display of categorical data well, encoding each value as the length of a
bar, make bar charts the tool of choice (see diagram E). So,
whenever you see categorical data, consider using a bar chart!
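The sorting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: apart from the 25% for Texas mentioned in the text, the states and values are invented, and the helper names `sorted_rates` and `text_bars` are hypothetical. The text-based bars merely stand in for a real bar chart.

```python
# Hypothetical discount rates in percent. Only the Texas value (25%) appears
# in the text; the other states and values are invented for illustration.
rates = {"Texas": 25.0, "Ohio": 25.5, "Utah": 26.0, "Iowa": 24.5, "Maine": 25.2}


def sorted_rates(data, descending=True):
    """Return (label, value) pairs ordered by value."""
    return sorted(data.items(), key=lambda item: item[1], reverse=descending)


def text_bars(pairs, scale=2):
    """Render one horizontal bar per pair; bar length grows with the value."""
    width = max(len(label) for label, _ in pairs)
    lines = []
    for label, value in pairs:
        bar = "#" * int(value * scale)
        lines.append(f"{label:<{width}} {bar} {value:.1f}%")
    return lines


ranked = sorted_rates(rates)
for line in text_bars(ranked):
    print(line)
```

With the values sorted, the smallest and highest rates sit at the two ends of the list, so the ranking can be read without any further effort.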
Despite sorting the data, the reader might still need to subtract the
various values to identify and compare the differences. The higher the
differences between the values, the more complex these operations become.
Diagram F is supposed to help in this area, the difference to the minimal
value being shown in orange. Unfortunately, small variances make the
display of the numbers more challenging, especially when the visualization
tools don't offer display alternatives.
To show the data from Diagram F, the third and fourth columns were added
to the table (see diagram A). There's a fifth column which designates
the percentage of a percentage (the increase in percent between the
minimal and the current value). Even if that's mathematically
possible, the gain from using such data is negligible, and it can create
confusion. This opens the door for another principle that applies in other
areas as well: "just because you can, it doesn't mean you should!".
One should weigh design decisions against common sense or one's intuition
on how something can be (mis)used and/or (mis)understood!
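As a rough sketch of how such derived columns could be computed, the following Python snippet calculates for each value the difference to the minimum (in percentage points) and the relative increase over the minimum (the "percentage of a percentage"). The data are hypothetical except for the 25% for Texas, the function name `compare_to_minimum` is made up, and the column layout is only one plausible reading of the table.

```python
# Hypothetical discount rates in percent. Only the Texas value (25%) appears
# in the text; the other states and values are invented for illustration.
rates = {"Texas": 25.0, "Ohio": 25.5, "Utah": 26.0, "Iowa": 24.5, "Maine": 25.2}


def compare_to_minimum(data):
    """Return rows of (label, value, difference, relative increase in %)."""
    minimum = min(data.values())
    rows = []
    for label, value in sorted(data.items(), key=lambda item: item[1]):
        difference = value - minimum           # percentage points
        relative = difference / minimum * 100  # percent of the minimum value
        rows.append((label, value, round(difference, 2), round(relative, 2)))
    return rows


rows = compare_to_minimum(rates)
for label, value, difference, relative in rows:
    print(f"{label:<6} {value:5.1f}%  +{difference:.2f} pp  +{relative:.2f}%")
```

The two derived columns answer different questions: the first is a difference in percentage points, the second a percentage of the minimal rate. Showing both side by side is exactly what can confuse a reader.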
The downside of Diagram F is that the comparisons are made only in
relation to the minimum value. Here the variations are small and still
allow further comparisons. The higher the differences, the more challenging
it becomes to make further comparisons. A matrix display (see diagram G)
which compares any two values will help if the number of points is
manageable. The numbers situated on and above the main diagonal were
grayed (and can be removed) because they are either not meaningful or the
negatives of the numbers found below the diagonal. Such diagrams are seldom
used, though in some cases they prove to be useful.
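A matrix of this kind can be sketched as follows, assuming the same kind of data as in the example. The values are hypothetical except for the 25% for Texas, and `difference_matrix` is an illustrative name. Only the cells below the main diagonal are filled; the rest are left empty (`None`), as they would only repeat the same information with the opposite sign.

```python
# Hypothetical discount rates in percent. Only the Texas value (25%) appears
# in the text; the other states and values are invented for illustration.
rates = {"Texas": 25.0, "Ohio": 25.5, "Utah": 26.0, "Iowa": 24.5, "Maine": 25.2}


def difference_matrix(data):
    """Return sorted labels and a matrix of row-minus-column differences.

    Cells on and above the main diagonal are set to None.
    """
    labels = sorted(data, key=data.get)  # ascending by value
    matrix = []
    for i, row_label in enumerate(labels):
        row = []
        for j, col_label in enumerate(labels):
            if j < i:
                row.append(round(data[row_label] - data[col_label], 2))
            else:
                row.append(None)
        matrix.append(row)
    return labels, matrix


labels, matrix = difference_matrix(rates)
print(" " * 6 + "".join(f"{label:>7}" for label in labels))
for label, row in zip(labels, matrix):
    cells = "".join(f"{'' if cell is None else cell:>7}" for cell in row)
    print(f"{label:<6}{cells}")
```

For n values the matrix holds n(n-1)/2 meaningful cells, which is why this display stays readable only for a small number of points.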
Choropleth maps (diagram H) are found almost everywhere data have a
geographical dimension. Like all the other visuals they have their own
advantages (e.g. relative location on the map) and disadvantages (e.g.
encoding or displaying data). The diagram shows only the regions with
data (remember the data-ink ratio principle).
(9) How about the shape of data?
When dealing with numerical data series, it's useful to show aggregated
summaries like the average, quartiles, or standard deviation to understand
how the data are shaped. Such summaries don't really make sense for our
data set given the nature of the numbers (five values with small
variance). One can still calculate them and show them in a box plot,
though the benefit is negligible.
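For completeness, such summaries can be computed with Python's standard `statistics` module. The sketch below uses five hypothetical values (only the 25.0 corresponds to the Texas rate mentioned in the text), and `summarize` is an illustrative helper. Note that the quartiles depend on the interpolation method chosen; the default "exclusive" method is used here.

```python
import statistics

# Hypothetical discount rates in percent; only 25.0 (Texas) comes from the text.
values = [24.5, 25.0, 25.2, 25.5, 26.0]


def summarize(data):
    """Return the summaries a box plot is typically built on, plus mean and stdev."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default method: "exclusive"
    return {
        "min": min(data),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(data),
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }


summary = summarize(values)
for name, value in summary.items():
    print(f"{name:<6} {value:.2f}")
```

With only five closely spaced values the summaries add little beyond what the sorted values already show, which matches the remark above.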
(10) Which chart should be used?
As mentioned above, each chart has advantages and disadvantages. Given the
simplicity and the number of data points, any of the above diagrams will
do. A table is simple enough despite not using any visualization effects.
Also, the bar charts are simple enough to use, with a plus maybe for
diagram F which shows a further dimension of the data. The choropleth map
adds the geographical dimension, which could be important for some
readers. The matrix table is more appropriate for technical readers and
involves more effort to understand, at least at first sight, though the
learning curve is small. The column charts were considered only for
exemplification purposes, though they might work as well.
In the end one should
go with one's own experience and consider the audience and the communication
channels used. One can also choose two different diagrams, especially when
they are complementary and offer an additional dimension (e.g. diagrams F
and H), though the context may dictate whether their use is appropriate or not.
The diagrams should be simple to read and understand, but this
doesn't mean that one should stick to the standard visuals. The data
professional should explore other means of representing the data, as a
fresh view has the chance of catching the reader's attention.
As a closing remark, nowadays data visualization tools allow building such
diagrams without much effort. Conversely, it takes more effort to go
beyond the basic functionality and provide more value for oneself and the
readers.
One should be able to evaluate upfront how much time it makes sense to
invest.
Hopefully, the few methods, principles and recommendations presented here
will help further!
Resources:
[1] Edward R Tufte (1983) "The Visual
Display of Quantitative Information"