07 November 2011

📉Graphical Representation: Emphasis (Just the Quotes)

"By [diagrams] it is possible to present at a glance all the facts which could be obtained from figures as to the increase, fluctuations, and relative importance of prices, quantities, and values of different classes of goods and trade with various countries; while the sharp irregularities of the curves give emphasis to the disturbing causes which produce any striking change." (Arthur L Bowley, "A Short Account of England's Foreign Trade in the Nineteenth Century, its Economic and Social Results", 1905)

"First, color has identity value. In other words, it serves to distinguish one thing from another. In many cases it does this much better and much quicker than black and white coding by different types of shading or lines. […] Second, color has suggestion value. […] Red is usually taken to mean a danger signal or an unfavorable condition. But since it is one of the most visible of colors it is excellent for adding emphasis, regardless of connotation. […] Green has no such unfavorable implication, and is usually appropriate for suggesting a "green light" condition. […] Similarly, every color carries its own connotations; and although they seldom make a vital difference one way or the other, it seems logical to try to make them work for you rather than against you." (Kenneth W Haemer, "Color in Chart Presentation", The American Statistician Vol. 4 (2) , 1950)

"Correct emphasis is basic to effective graphic presentation. Intensity of color is the simplest method of obtaining emphasis. For most reproduction purposes black ink on a white page is most generally used. Screens, dots and lines can, of course, be effectively used to give a gradation of tone from light grey to solid black. When original charts are the subjects of display presentation, use of colors is limited only by the subject and the emphasis desired." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Simplicity, accuracy. appropriate size, proper proportion, correct emphasis, and skilled execution - these are the factors that produce the effective chart. To achieve simplicity your chart must be designed with a definite audience in mind, show only essential information. Technical terms should be absent as far as possible. And in case of doubt it is wiser to oversimplify than to make matters unduly complex. Be careful to avoid distortion or misrepresentation. Accuracy in graphics is more a matter of portraying a clear reliable picture than reiterating exact values. Selecting the right scales and employing authoritative titles and legends are as important as precision plotting. The right size of a chart depends on its probable use, its importance, and the amount of detail involved." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Without adequate planning. it is seldom possible to achieve either proper emphasis of each component element within the chart or a presentation that is pleasing in its entirely. Too often charts are developed around a single detail without sufficient regard for the work as a whole. Good chart design requires consideration of these four major factors: (1) size, (2) proportion, (3) position and margins, and (4) composition." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"[...] exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as for those we believe might be there. Except for its emphasis on graphs, its tools are secondary to its purpose." (John W Tukey, [comment] 1979)

"There are several uses for which the line graph is particularly relevant. One is for a series of data covering a long period of time. Another is for comparing several series on the same graph. A third is for emphasizing the movement of data rather than the amount of the data. It also can be used with two scales on the vertical axis, one on the right and another on the left, allowing different series to use different scales, and it can be used to present trends and forecasts." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"As a general rule, plotted points and graph lines should be given more 'weight' than the axes. In this way the 'meat' will be easily distinguishable from the 'bones'. Furthermore, an illustration composed of lines of unequal weights is always more attractive than one in which all the lines are of uniform thickness. It may not always be possible to emphasise the data in this way however. In a scattergram, for example, the more plotted points there are, the smaller they may need to be and this will give them a lighter appearance. Similarly, the more curves there are on a graph, the thinner the lines may need to be. In both cases, the axes may look better if they are drawn with a somewhat bolder line so that they are easily distinguishable from the data." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"In order to be easily understood, a display of information must have a logical structure which is appropriate for the user's knowledge and needs, and this structure must be clearly represented visually. In order to indicate structure, it is necessary to be able to emphasize, divide and relate items of information. Visual emphasis can be used to indicate a hierarchical relationship between items of information, as in the case of systems of headings and subheadings for example. Visual separation of items can be used to indicate that they are different in kind or are unrelated functionally, and similarly a visual relationship between items will imply that they are of a similar kind or bear some functional relation to one another. This kind of visual 'coding' helps the reader to appreciate the extent and nature of the relationship between items of information, and to adopt an appropriate scanning strategy." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"[...] error bars are more effectively portrayed on dot charts than on bar charts. […] On the bar chart the upper values of the intervals stand out well, but the lower values are visually deemphasized and are not as well perceived as a result of being embedded in the bars. This deemphasis does not occur on the dot chart." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"The plotted points on a graph should always be made to stand out well. They are, after all, the most important feature of a graph, since any lines linking them are nearly always a matter of conjecture. These lines should stop just short of the plotted points so that the latter are emphasised by the space surrounding them. Where a point happens to fall on an axis line, the axis should be broken for a short distance on either side of the point." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"An axis is the ruler that establishes regular intervals for measuring information. Because it is such a widely accepted convention, it is often taken for granted and its importance overlooked. Axes may emphasize, diminish, distort, simplify, or clutter the information. They must be used carefully and accurately." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Area graphs are generally not used to convey specific values. Instead, they are most frequently used to show trends and relationships, to identify and/or add emphasis to specific information by virtue of the boldness of the shading or color, or to show parts-of-the-whole." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996) 

"Arbitrary category sequence and misplaced pie chart emphasis lead to general confusion and weaken messages. Although this can be used for quite deliberate and targeted deceit, manipulation of the category axis only really comes into its own with techniques that bend the relationship between the data and the optics in a more calculated way. Many of these techniques are just twins of similar ruses on the value axis. but are none the less powerful for that." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"What distinguishes data tables from graphics is explicit comparison and the data selection that this requires. While a data table obviously also selects information, this selection is less focused than a chart's on a particular comparison. To the extent that some figures in a table are visually emphasised. say in colour or size and style of print. the table is well on its way to becoming a chart. If you're making no comparisons - because you have no particular message and so need no selection (in other words, if you are simply providing a database, number quarry or recycling facility) - tables are easier to use than charts." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"One way a chart can lie is through overemphasis of the size and scale of items, particularly when the dimension of depth isnʼt considered." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Usually, diagrams contain some noise - information unrelated to the diagram’s primary goal. Noise is decorations, redundant, and irrelevant data, unnecessarily emphasized and ambiguous icons, symbols, lines, grids, or labels. Every unnecessary element draws attention away from the central idea that the designer is trying to share. Noise reduces clarity by hiding useful information in a fog of useless data. You may quickly identify noise elements if you can remove them from the diagram or make them less intense and attractive without compromising the function." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015)

"Beyond basic charts, practitioners must also learn to compose visualizations together elegantly. The perceptual stage focuses on making the literal charts more precise as well as working to de-emphasize the entire piece. Design choices start to consider distractions, reducing visual clutter and centering on the message. Minimalism is espoused as a core value with an emphasis on shifting toward precision as accuracy. This is the most common next step for practitioners. Minimalism is also a key stage in maturation. It is experimentation at one extreme that helps practitioners distill down to core, shared practices." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

📉Graphical Representation: Deception (Just the Quotes)

"The zero of the scale should appear on every chart, and should shown by a heavy line carried across the sheet. If this is not done the reader may assume the bottom of the sheet to be zero and so be misled. The scale should be graduated from zero to a little over the maximum figure to be plotted on the charts, so that there will be a space between the highest peak on the curve and the top of the chart." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Under certain conditions, however, the ordinary form of graphic chart is slightly misleading. It will be conceded that its true function is to portray comparative fluctuations. This result is practically secured when the factors or quantities compared are nearly of the same value or volume, but analysis will show that this is not accomplished when the amounts compared differ greatly in value or volume. [...] The same criticism applies to charts which employ or more scales for various curve. If the different scale are in proper proportion, the result is the same as with one scale, but when two or more scales are used which are not proportional an indication may be given with respect to comparative fluctuations which is absolutely false." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"When plotting any curve the vertical scale should, if possible, be chosen so that the zero of the scale will appear on the chart. Otherwise, the reader may assume the bottom of the chart to be zero and so be grossly misled. Zero should always be indicated by a broad line much wider than the ordinary co-ordinate lines used for the background of the chart." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"Admittedly a chart is primarily a picture, and for presentation purposes should be treated as such; but in most charts it is desirable to be able to read the approximate magnitudes by reference to the scales. Such reference is almost out of the question without some rulings to guide the eye. Second, the picture itself may be misleading without enough rulings to keep the eye 'honest'. Although sight is the most reliable of our senses for measuring (and most other) purposes, the unaided eye is easily deceived; and there are numerous optical illusions to prove it. A third reason, not vital, but still of some importance, is that charts without rulings may appear weak and empty and may lack the structural unity desirable in any illustration." (Kenneth W Haemer, "Hold That Line. A Plea for the Preservation of Chart Scale Ruling", The American Statistician Vol. 1 (1) 1947)

"[….] double-scale charts are likely to be misleading unless the two zero values coincide (either on or off the chart). To insure an accurate comparison of growth the scale intervals should be so chosen that both curves meet at some point. This treatment produces the effect of percentage relatives or simple index numbers with the point of juncture serving as the base point. The principal advantage of this form of presentation is that it is a short-cut method of comparing the relative change of two or more series without computation. It is especially useful for bringing together series that either vary widely in magnitude or are measured in different units and hence cannot be compared conveniently on a chart having only one absolute-amount scale. In general, the double scale treatment should not be used for presenting growth comparisons to the general reader." (Kenneth W Haemer, "Double Scales Are Dangerous", The American Statistician Vol. 2 (3) , 1948)

"An important rule in the drafting of curve charts is that the amount scale should begin at zero. In comparisons of size the omission of the zero base, unless clearly indicated, is likely to give a misleading impression of the relative values and trend." (Rufus R Lutz, "Graphic Presentation Simplified", 1949)

"Percentages offer a fertile field for confusion. And like the ever-impressive decimal they can lend an aura of precision to the inexact. […] Any percentage figure based on a small number of cases is likely to be misleading. It is more informative to give the figure itself. And when the percentage is carried out to decimal places, you begin to run the scale from the silly to the fraudulent." (Darell Huff, "How to Lie with Statistics", 1954)

"Just like the spoken or written word, statistics and graphs can lie. They can lie by not telling the full story. They can lead to wrong conclusions by omitting some of the important facts. [...] Always look at statistics with a critical eye, and you will not be the victim of misleading information." (Dyno Lowenstein, "Graphs", 1976)

"Probably one of the most common misuses (intentional or otherwise) of a graph is the choice of the wrong scale - wrong, that is, from the standpoint of accurate representation of the facts. Even though not deliberate, selection of a scale that magnifies or reduces - even distorts - the appearance of a curve can mislead the viewer." (Peter H Selby, "Interpreting Graphs and Tables", 1976)

"For many people the first word that comes to mind when they think about statistical charts is 'lie'. No doubt some graphics do distort the underlying data, making it hard for the viewer to learn the truth. But data graphics are no different from words in this regard, for any means of communication can be used to deceive. There is no reason to believe that graphics are especially vulnerable to exploitation by liars; in fact, most of us have pretty good graphical lie detectors that help us see right through frauds." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Graphs are used to meet the need to condense all the available information into a more usable quantity. The selection process of combining and condensing will inevitably produce a less than complete study and will lead the user in certain directions, producing a potential for misleading." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"Reliability is highly valued by accountants and has been defined as 'the faithfulness with which it (information) represents what it purports to represent'. The reason reliability is so important is that an essential characteristic of an accounting report is its acceptance, and if a report is considered to be misleading or superfluous, it and future reports will be disregarded." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"There are two kinds of misrepresentation. In one. the numerical data do not agree with the data in the graph, or certain relevant data are omitted. This kind of misleading presentation. while perhaps hard to determine, clearly is wrong and can be avoided. In the second kind of misrepresentation, the meaning of the data is different to the preparer and to the user." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"The bar of a bar chart has two aspects that can be used to visually decode quantitative information-size (length and area) and the relative position of the end of the bar along the common scale. The changing sizes of the bars is an important and imposing visual factor; thus it is important that size encode something meaningful. The sizes of bars encode the magnitudes of deviations from the baseline. If the deviations have no important interpretation, the changing sizes are wasted energy and even have the potential to mislead." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984) 

"The rule is that a graph of a change in a variable with time should always have a vertical scale that starts with zero. Otherwise, it is inherently misleading." (Douglas A Downing & Jeffrey Clark, "Forgotten Statistics: A Self-Teaching Refresher Course", 1996)

"Displaying numerical information always involves selection. The process of selection needs to be described so that the reader will not be misled." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Averages, ranges, and histograms all obscure the time-order for the data. If the time-order for the data shows some sort of definite pattern, then the obscuring of this pattern by the use of averages, ranges, or histograms can mislead the user. Since all data occur in time, virtually all data will have a time-order. In some cases this time-order is the essential context which must be preserved in the presentation." (Donald J Wheeler," Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"[...] when data is presented in certain ways, the patterns can be readily perceived. If we can understand how perception works, our knowledge can be translated into rules for displaying information. Following perception‐based rules, we can present our data in such a way that the important and informative patterns stand out. If we disobey the rules, our data will be incomprehensible or misleading." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"Comparing series visually can be misleading […]. Local variation is hidden when scaling the trends. We first need to make the series stationary (removing trend and/or seasonal components and/or differences in variability) and then compare changes over time. To do this, we log the series (to equalize variability) and difference each of them by subtracting last year’s value from this year’s value." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Arbitrary category sequence and misplaced pie chart emphasis lead to general confusion and weaken messages. Although this can be used for quite deliberate and targeted deceit, manipulation of the category axis only really comes into its own with techniques that bend the relationship between the data and the optics in a more calculated way. Many of these techniques are just twins of similar ruses on the value axis. but are none the less powerful for that." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"If you want to hide data, try putting it into a larger group and then use the average of the group for the chart. The basis of the deceit is the endearingly innocent assumption on the part of your readers that you have been scrupulous in using a representative average: one from which individual values do not deviate all that much. In scientific or statistical circles, where audiences tend to take less on trust, the 'quality' of the average (in terms of the scatter of the underlying individual figures) is described by the standard deviation, although this figure is itself an average." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"The donut, its spelling betrays its origins, is nearly always more deceit friendly than the pie, despite being modelled on a life-saving ring. This is because the hole destroys the second most important value- defining element, by hiding the slice angles in the middle." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"There are some chart types that occasionally appear in print but are so bad that they serve neither honesty nor deceit. Among these monuments to human ingenuity at the expense of common sense are the concentric donut and overlapping segments. The concentric donut is really just a bar or column chart bent back on itself to save space. However as anyone who has ever watched a two or four hundred metre race will know, to make sense of the order of arrival at the tape you have to stagger the start to take account of the bend in the track. Blithely ignoring this problem, the concentric donut uses to diminish the difference between the inner and the outer absolute values by anything up to 2.5 times." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"[…] a graph is nothing but a visual metaphor. To be truthful, it must correspond closely to the phenomena it depicts: longer bars or bigger pie slices must correspond to more, a rising line must correspond to an increasing amount. If a graphical depiction of data does not faithfully follow this principle, it is almost sure to be misleading. But the metaphoric attachment of a graphic goes farther than this. The character of the depiction ism a necessary and sufficient condition for the character of the data. When the data change, so too must their depiction; but when the depiction changes very little, we assume that the data, likewise, are relatively unchanging. If this convention is not followed, we are usually misled." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Good graphic design is not a panacea for bad copy, poor layout or misleading statistics. If any one of these facets are feebly executed it reflects poorly on the work overall, and this includes bad graphs and charts." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"It is tempting to make charts more engaging by introducing fancy graphics or three dimensions so they leap of f the page, but doing so obscures the real data and misleads people, intentionally or not." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

05 November 2011

📉Graphical Representation: Trends (Just the Quotes)

"Wherever unusual peaks or valleys occur on a curve it is a good plan to mark these points with a small figure inside a circle. This figure should refer to a note on the back of the chart explaining the reason for the unusual condition. It is not always sufficient to show that a certain item is unusually high or low; the executive will want to know why it is that way." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"An important rule in the drafting of curve charts is that the amount scale should begin at zero. In comparisons of size the omission of the zero base, unless clearly indicated, is likely to give a misleading impression of the relative values and trend." (Rufus R Lutz, "Graphic Presentation Simplified", 1949)

"A piece of self-deception - often dear to the heart of apprentice scientists - is the drawing of a 'smooth curve'" (how attractive it sounds!) through a set of points which have about as much trend as the currants in plum duff. Once this is done, the mind, looking for order amidst chaos, follows the Jack-o'-lantern line with scant attention to the protesting shouts of the actual points. Nor, let it be whispered, is it unknown for people who should know better to rub off the offending points and publish the trend line which their foolish imagination has introduced on the flimsiest of evidence. Allied to this sin is that of overconfident extrapolation, i.e. extending the graph by guesswork beyond the range of factual information. Whenever extrapolation is attempted it should be carefully distinguished from the rest of the graph, e.g. by showing the extrapolation as a dotted line in contrast to the full line of the rest of the graph. [...] Extrapolation always calls for justification, sooner or later. Until this justification is forthcoming, it remains a provisional estimate, based on guesswork." (Michael J Moroney, "Facts from Figures", 1951)

"In line charts with an arithmetic scale, it is essential to set the base line at zero in order that the correct perspective of the general movement may not be lost. Breaking or leaving off part of the scale leads to misinterpretation, because the trend then shows a disproportionate degree of variation in movement." (Mary E Spear, "Charting Statistics", 1952)

"Extrapolations are useful, particularly in the form of soothsaying called forecasting trends. But in looking at the figures or the charts made from them, it is necessary to remember one thing constantly: The trend to now may be a fact, but the future trend represents no more than an educated guess. Implicit in it is 'everything else being equal' and 'present trends continuing'. And somehow everything else refuses to remain equal." (Darell Huff, "How to Lie with Statistics", 1954)

"When numbers in tabular form are taboo and words will not do the work well as is often the case. There is one answer left: Draw a picture. About the simplest kind of statistical picture or graph, is the line variety. It is very useful for showing trends, something practically everybody is interested in showing or knowing about or spotting or deploring or forecasting." (Darell Huff, "How to Lie with Statistics", 1954)

"Since bars represent magnitude by their length, the zero line must be shown and the arithmetic scale must not be broken. Occasionally an excessively long bar in a series of bars may be broken off at the end, and the amount involved shown directly beyond it, without distorting the general trend of the other bars, but this practice applies solely when only one bar exceeds the scale." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Charts not only tell what was, they tell what is; and a trend from was to is" (projected linearly into the will be) contains better percentages than clumsy guessing." (Robert A Levy, "The Relative Strength Concept of Common Stock Forecasting", 1968)

"In certain respects, line graphs are uniquely applicable to particular graphic requirements for which a bar or circle chart could not be substituted. Strictly speaking, the line graph must be used to portray changes in a continuous variable, since technically such a variable must be represented by a line and not by 'points' or 'bars'. Line graphs are often uniquely applicable to problems of analysis, particularly when it is essential to visualize a trend, observe the behavior of a set of variables through time, or portray the same variable in differing time periods." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (William E Deming, "On Probability as Basis for Action" American Statistician Vol. 29" (4), 1975)

"A graphic is an illustration that, like a painting or drawing, depicts certain images on a flat surface. The graphic depends on the use of lines and shapes or symbols to represent numbers and ideas and show comparisons, trends, and relationships. The success of the graphic depends on the extent to which this representation is transmitted in a clear and interesting manner." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Graphic forms help us to perform and influence two critical functions of the mind: the gathering of information and the processing of that information. Graphs and charts are ways to increase the effectiveness and the efficiency of transmitting information in a way that enhances the reader's ability to process that information. Graphics are tools to help give meaning to information because they go beyond the provision of information and show relationships, trends, and comparisons. They help to distinguish which numbers and which ideas are more important than others in a presentation." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"There are several uses for which the line graph is particularly relevant. One is for a series of data covering a long period of time. Another is for comparing several series on the same graph. A third is for emphasizing the movement of data rather than the amount of the data. It also can be used with two scales on the vertical axis, one on the right and another on the left, allowing different series to use different scales, and it can be used to present trends and forecasts." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"A connected graph is appropriate when the time series is smooth, so that perceiving individual values is not important. A vertical line graph is appropriate when it is important to see individual values, when we need to see short-term fluctuations, and when the time series has a large number of values; the use of vertical lines allows us to pack the series tightly along the horizontal axis. The vertical line graph, however, usually works best when the vertical lines emanate from a horizontal line through the center of the data and when there are no long-term trends in the data." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Area graphs are generally not used to convey specific values. Instead, they are most frequently used to show trends and relationships, to identify and/or add emphasis to specific information by virtue of the boldness of the shading or color, or to show parts-of-the-whole." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996) 

"Graphic misrepresentation is a frequent misuse in presentations to the nonprofessional. The granddaddy of all graphical offenses is to omit the zero on the vertical axis. As a consequence, the chart is often interpreted as if its bottom axis were zero, even though it may be far removed. This can lead to attention-getting headlines about 'a soar' or 'a dramatic rise" (or fall)'. A modest, and possibly insignificant, change is amplified into a disastrous or inspirational trend." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998) 

"Stacked bar graphs do not show data structure well. A trend in one of the stacked variables has to be deduced by scanning along the vertical bars. This becomes especially difficult when the categories do not move in the same direction." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"Graphs are for the forest and tables are for the trees. Graphs give you the big picture and show you the trends; tables give you the details." (Naomi B Robbins, "Creating More Effective Graphs", 2005)

"Sparklines are compact line graphs that do not have a quantitative scale. They are meant to provide a quick sense of a metric's movement or trend, usually over time. They are more expressive than arrows, which only indicate change from the prior period and do not qualify the degree of change. Sparklines are significantly more compact than normal line graphs but are precise." (Wayne W Eckerson, "Performance Dashboards: Measuring, Monitoring, and Managing Your Business", 2010)

"Line graphs that show more than one line can be useful for making comparisons, but sometimes it is important to discuss each individual line. By using sparklines evaluators can call attention to and discuss individual cases. Sparklines can be embedded within a sentence to illustrate a trend and help stakeholders better understand the data. Evaluators can use this simple visualization when creating reports." (Christopher Lysy, "Developments in Quantitative Data Display and Their Implications for Evaluation", 2013) 

"What is good visualization? It is a representation of data that helps you see what you otherwise would have been blind to if you looked only at the naked source. It enables you to see trends, patterns, and outliers that tell you about yourself and what surrounds you. The best visualization evokes that moment of bliss when seeing something for the first time, knowing that what you see has been right in front of you, just slightly hidden. Sometimes it is a simple bar graph, and other times the visualization is complex because the data requires it." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Graphs can help us interpret data and draw inferences. They can help us see tendencies, patterns, trends, and relationships. A picture can be worth not only a thousand words, but a thousand numbers. However, a graph is essentially descriptive - a picture meant to tell a story. As with any story, bumblers may mangle the punch line and the dishonest may lie." (Gary Smith, "Standard Deviations", 2014)

"The most accurate but least interpretable form of data presentation is to make a table, showing every single value. But it is difficult or impossible for most people to detect patterns and trends in such data, and so we rely on graphs and charts. Graphs come in two broad types: Either they represent every data point visually (as in a scatter plot) or they implement a form of data reduction in which we summarize the data, looking, for example, only at means or medians." (Daniel J Levitin, "Weaponized Lies", 2017)

"As presenters of data visualizations, often we just want our audience to understand something about their environment – a trend, a pattern, a breakdown, a way in which things have been progressing. If we ask ourselves what we want our audience to do with that information, we might have a hard time coming up with a clear answer sometimes. We might just want them to know something." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"[...] scatterplots had advantages over earlier graphic forms: the ability to see clusters, patterns, trends, and relations in a cloud of points. Perhaps most importantly, it allowed the addition of visual annotations (point symbols, lines, curves, enclosing contours, etc.) to make those relationships more coherent and tell more nuanced stories." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Data storytelling is a method of communicating information that is custom-fit for a specific audience and offers a compelling narrative to prove a point, highlight a trend, make a sale, or all of the above. [...] Data storytelling combines three critical components, storytelling, data science, and visualizations, to create not just a colorful chart or graph, but a work of art that carries forth a narrative complete with a beginning, middle, and end." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

"Bad complexity neither elucidates important salient points nor shows coherent broader trends. It will obfuscate, frustrate, tax the mind, and ultimately convey trendlessness and confusion to the viewer. Good complexity, in contrast, emerges from visualizations that use more data than humans can reasonably process to form a few salient points." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

04 November 2011

📉Graphical Representation: Statistics (Just the Quotes)

"Graphical statistics can be defined as: 'the expression of statistical facts by means of geometric processes' (Levasseur). Its general usefulness consists of replacing figures which, by their multiplicity, confuse memory, with a figure whose general appearance can be discovered all at once and, by speaking to the eyes, is more easily engraved in the memory." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

"Although, the tabular arrangement is the fundamental form for presenting a statistical series, a graphic representation - in a chart or diagram - is often of great aid in the study and reporting of statistical facts. Moreover, sometimes statistical data must be taken, in their sources, from graphic rather than tabular records." (William L Crum et al, "Introduction to Economic Statistics", 1938)

"The primary purpose of a graph is to show diagrammatically how the values of one of two linked variables change with those of the other. One of the most useful applications of the graph occurs in connection with the representation of statistical data." (John F Kenney & E S Keeping, "Mathematics of Statistics" Vol. I 3rd Ed., 1954)

"When numbers in tabular form are taboo and words will not do the work well, as is often the case, there is one answer left: Draw a picture. About the simplest kind of statistical picture, or graph, is the line variety. It is very useful for showing trends, something practically everybody is interested in showing or knowing about or spotting or deploring or forecasting." (Darrell Huff, "How to Lie with Statistics", 1954)

"Indeed the language of statistics is rarely as objective as we imagine. The way statistics are presented, their arrangement in a particular way in tables, the juxtaposition of sets of figures, in itself reflects the judgment of the author about what is significant and what is trivial in the situation which the statistics portray." (Ely Devons, "Essays in Economics", 1961)

"[…] an outlier is an observation that lies an 'abnormal' distance from other values in a batch of data. There are two possible explanations for the occurrence of an outlier. One is that this happens to be a rare but valid data item that is either extremely large or extremely small. The other is that it is a mistake – maybe due to […] A good rule of thumb for deciding how long the analysis of the data actually will take is (1) to add up all the time for everything you can think of - editing the data, checking for errors, calculating various statistics, thinking about the results, going back to the data to try out a new idea, and (2) then multiply the estimate obtained in this first step by five." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Statistical techniques do not solve any of the common-sense difficulties about making causal inferences. Such techniques may help organize or arrange the data so that the numbers speak more clearly to the question of causality - but that is all statistical techniques can do. All the logical, theoretical, and empirical difficulties attendant to establishing a causal relationship persist no matter what type of statistical analysis is applied." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Just like the spoken or written word, statistics and graphs can lie. They can lie by not telling the full story. They can lead to wrong conclusions by omitting some of the important facts. [...] Always look at statistics with a critical eye, and you will not be the victim of misleading information." (Dyno Lowenstein, "Graphs", 1976)

"Learning to make graphs involves two things: (1) the techniques of plotting statistics, which might be called the artist's job; and (2) understanding the statistics. When you know how to work out graphs, all kinds of statistics will probably become more interesting to you." (Dyno Lowenstein, "Graphs", 1976)

"Of course, statistical graphics, just like statistical calculations, are only as good as what goes into them. An ill-specified or preposterous model or a puny data set cannot be rescued by a graphic (or by calculation), no matter how clever or fancy. A silly theory means a silly graphic." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Statistics is a tool. In experimental science you plan and carry out experiments, and then analyse and interpret the results. To do this you use statistical arguments and calculations. Like any other tool - an oscilloscope, for example, or a spectrometer, or even a humble spanner - you can use it delicately or clumsily, skillfully or ineptly. The more you know about it and understand how it works, the better you will be able to use it and the more useful it will be." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"There is an interplay between statistical models and graphics, so it is advantageous to think about models before making a series of plots." (Daniel B Carr, "Looking at Large Data Sets Using Binned Data Plots", [in "Computing and Graphics in Statistics"] 1991)

"There are two components to visualizing the structure of statistical data - graphing and fitting. Graphs are needed, of course, because visualization implies a process in which information is encoded on visual displays. Fitting mathematical functions to data is needed too. Just graphing raw data, without fitting them and without graphing the fits and residuals, often leaves important aspects of data undiscovered." (William S Cleveland, "Visualizing Data", 1993)

"But people treat mutant statistics just as they do other statistics - that is, they usually accept even the most implausible claims without question. [...] And people repeat bad statistics [...] bad statistics live on; they take on lives of their own. [...] Statistics, then, have a bad reputation. We suspect that statistics may be wrong, that people who use statistics may be 'lying' - trying to manipulate us by using numbers to somehow distort the truth. Yet, at the same time, we need statistics; we depend upon them to summarize and clarify the nature of our complex society. This is particularly true when we talk about social problems." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Every statistical analysis is an interpretation of the data, and missingness affects the interpretation. The challenge is that when the reasons for the missingness cannot be determined there is basically no way to make appropriate statistical adjustments. Sensitivity analyses are designed to model and explore a reasonable range of explanations in order to assess the robustness of the results." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Estimating the missing values in a dataset solves one problem - imputing reasonable values that have well-defined statistical properties. It fails to solve another, however - drawing inferences about parameters in a model fit to the estimated data. Treating imputed values as if they were known (like the rest of the observed data) causes confidence intervals to be too narrow and tends to bias other estimates that depend on the variability of the imputed values (such as correlations)." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The consequence of distinguishing statistical methods from the graphics displaying them is to separate form from function. That is, the same statistic can be represented by different types of graphics, and the same type of graphic can be used to display two different statistics. […] This separability of statistical and geometric objects is what gives a system a wide range of representational opportunities." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Oftentimes a statistical graphic provides the evidence for a plausible story, and the evidence, though perhaps only circumstantial, can be quite convincing. […] But such graphical arguments are not always valid. Knowledge of the underlying phenomena and additional facts may be required." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd Ed., 2008)

"Placing a fact within a context increases its value greatly. […] An efficacious way to add context to statistical facts is by embedding them in a graphic. Sometimes the most helpful context is geographical, and shaded maps come to mind as examples. Sometimes the most helpful context is temporal, and time-based line graphs are the obvious choice. But how much time? The ending date (today) is usually clear, but where do you start? The starting point determines the scale. […] The starting point and hence the scale are determined by the questions that we expect the graph to answer." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd Ed., 2008)

"Eye-catching data graphics tend to use designs that are unique (or nearly so) without being strongly focused on the data being displayed. In the world of Infovis, design goals can be pursued at the expense of statistical goals. In contrast, default statistical graphics are to a large extent determined by the structure of the data (line plots for time series, histograms for univariate data, scatterplots for bivariate nontime-series data, and so forth), with various conventions such as putting predictors on the horizontal axis and outcomes on the vertical axis. Most statistical graphs look like other graphs, and statisticians often think this is a good thing." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"After all, we do agree that statistical data analysis is concerned with generating and evaluating hypotheses about data. For us, generating hypotheses means that we are searching for patterns in the data - trying to 'see what the data seem to say'. And evaluating hypotheses means that we are seeking an explanation or at least a simple description of what we find - trying to verify what we believe we see." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

03 November 2011

📉Graphical Representation: Confusion (Just the Quotes)

"First, it is generally inadvisable to attempt to portray a series of more than four or five categories by means of pie charts. If, for example, there are six, eight, or more categories, it may be very confusing to differentiate the relative values portrayed, especially if several small sectors are of approximately the same size. Second, the pie chart may lose its effectiveness if an attempt is made to compare the component values of several circles, as might be found in a temporal or geographical series. In such case the one-hundred percent bar or column chart is more appropriate. Third, although the proportionate values portrayed in a pie chart are measured as distances along arcs about the circle, actually there is a tendency to estimate values in terms of areas of sectors or by the size of subtended angles at the center of the circle." (Calvin F Schmid, "Handbook of Graphic Presentation", 1954)

"Percentages offer a fertile field for confusion. And like the ever-impressive decimal they can lend an aura of precision to the inexact. […] Any percentage figure based on a small number of cases is likely to be misleading. It is more informative to give the figure itself. And when the percentage is carried out to decimal places, you begin to run the scale from the silly to the fraudulent." (Darrell Huff, "How to Lie with Statistics", 1954)

"The eye can accurately appraise only very few features of a diagram, and consequently a complicated or confusing diagram will lead the reader astray. The fundamental rule for all charting is to use a plan which is simple and which takes account, in its arrangement of the facts to be presented, of the above-mentioned capacities of the eye." (William L Crum et al, "Introduction to Economic Statistics", 1938)

"Besides being easier to construct than a bar chart, the line chart possesses other advantages. It is easier to read, for while the bars stand out more prominently than the line, they tend to become confusing if numerous, and especially so when they record alternate increase and decrease. It is easier for the eye to follow a line across the face of the chart than to jump from bar top to bar top, and the slope of the line connecting two points is a great aid in detecting minor changes. The line is also more suggestive of movement than are bars, and movement is the very essence of a time series. Again, a line chart permits showing two or more related variables on the same chart, or the same variable over two or more corresponding periods." (Walter E Weld, "How to Chart; Facts from Figures with Graphs", 1959)

"If two or more data paths are to appear on the graph, it is essential that these lines be labeled clearly, or at least a reference should be provided for the reader to make the necessary identifications. While clarity seems to be a most obvious goal, graphs with inadequate or confusing labeling do appear in publications. The user should not find identification of data paths troublesome or subject to misunderstanding. The designer normally should place no more than three data paths on the graph to prevent confusion - particularly if the data paths intersect at one or more points on the Cartesian plane." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"The information on a plot should be relevant to the goals of the analysis. This means that in choosing graphical methods we should match the capabilities of the methods to our needs in the context of each application. [...] Scatter plots, with the views carefully selected as in draftsman's displays, casement displays, and multiwindow plots, are likely to be more informative. We must be careful, however, not to confuse what is relevant with what we expect or want to find. Often wholly unexpected phenomena constitute our most important findings." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Confusion and clutter are failures of design, not attributes of information. And so the point is to find design strategies that reveal detail and complexity - rather than to fault the data for an excess of complication. Or, worse, to fault viewers for a lack of understanding. Among the most powerful devices for reducing noise and enriching the content of displays is the technique of layering and separation, visually stratifying various aspects of the data." (Edward R Tufte, "Envisioning Information", 1990)

"What about confusing clutter? Information overload? Doesn't data have to be ‘boiled down’ and ‘simplified’? These common questions miss the point, for the quantity of detail is an issue completely separate from the difficulty of reading. Clutter and confusion are failures of design, not attributes of information. Often the less complex and less subtle the line, the more ambiguous and less interesting is the reading. Stripping the detail out of data is a style based on personal preference and fashion, considerations utterly indifferent to substantive content." (Edward R Tufte, "Envisioning Information", 1990)

"Grouped area graphs sometimes cause confusion because the viewer cannot determine whether the areas for the data series extend down to the zero axis. […] Grouped area graphs can handle negative values somewhat better than stacked area graphs but they still have the problem of all or portions of data curves being hidden by the data series towards the front." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"Technically, there is no limit as to the number of data series that can be plotted on a single graph. Practically, if the number goes above three or four the graph becomes confusing." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"When it comes to drawing a picture of continuous data, you need to think through carefully where one interval ends and the next one begins. Failing to do this can result in overlaps or gaps between adjacent intervals, which can cause confusion." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Arbitrary category sequence and misplaced pie chart emphasis lead to general confusion and weaken messages. Although this can be used for quite deliberate and targeted deceit, manipulation of the category axis only really comes into its own with techniques that bend the relationship between the data and the optics in a more calculated way. Many of these techniques are just twins of similar ruses on the value axis, but are none the less powerful for that." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"Using colour, itʼs possible to increase the density of information even further. A single colour can be used to represent two variables simultaneously. The difficulty, however, is that there is a limited amount of information that can be packed into colour without confusion." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Bear in mind that the use of color doesn’t always help. Use it sparingly and with a specific purpose in mind. Remember that the reader’s brain is looking for patterns, and will expect both recurrence itself and the absence of expected recurrence to carry meaning. If you’re using color to differentiate categorical data, then you need to let the reader know what the categories are. If the dimension of data you’re encoding isn’t significant enough to your message to be labeled or explained in some way - or if there is no dimension to the data underlying your use of different colors - then you should limit your use so as not to confuse the reader." (Noah Iliinsky & Julie Steele, "Designing Data Visualizations", 2011)

"Graphs should not be mere decoration, to amuse the easily bored. A useful graph displays data accurately and coherently, and helps us understand the data. Chartjunk, in contrast, distracts, confuses, and annoys. Chartjunk may be well-intentioned, but it is misguided. It may also be a deliberate attempt to mystify." (Gary Smith, "Standard Deviations", 2014)

"Uncertainty confuses many people because they have the unreasonable expectation that science and statistics will unearth precise truths, when all they can yield is imperfect estimates that can always be subject to changes and updates." (Alberto Cairo, "How Charts Lie", 2019)

"Bad complexity neither elucidates important salient points nor shows coherent broader trends. It will obfuscate, frustrate, tax the mind, and ultimately convey trendlessness and confusion to the viewer. Good complexity, in contrast, emerges from visualizations that use more data than humans can reasonably process to form a few salient points." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

📉Graphical Representation: Groups (Just the Quotes)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (William E Deming, "On Probability as Basis for Action" American Statistician Vol. 29 (4), 1975)

"The basic principle which should be observed in designing tables is that of grouping related data, either by the use of space or, if necessary, rules. Items which are close together will be seen as being more closely related than items which are farther apart, and the judicious use of space is therefore vitally important. Similarly, ruled lines can be used to relate and divide information, and it is important to be sure which function is required. Rules should not be used to create closed compartments; this is time-wasting and it interferes with scanning." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"The space between columns, on the other hand, should be just sufficient to separate them clearly, but no more. The columns should not, under any circumstances, be spread out merely to fill the width of the type area. […] Sometimes, however, it is difficult to avoid undesirably large gaps between columns, particularly where the data within any given column vary considerably in length. This problem can sometimes be solved by reversing the order of the columns […]. In other instances the insertion of additional space after every fifth entry or row can be helpful, […] but care must be taken not to imply that the grouping has any special meaning." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"Scatter charts show the relationships between information, plotted as points on a grid. These groupings can portray general features of the source data, and are useful for showing where correlationships occur frequently. Some scatter charts connect points of equal value to produce areas within the grid which consist of similar features." (Bruce Robertson, "How to Draw Charts & Diagrams", 1988)

"A good chart delineates and organizes information. It communicates complex ideas, procedures, and lists of facts by simplifying, grouping, and setting and marking priorities. By spatial organization, it should lead the eye through information smoothly and efficiently." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Grouped area graphs sometimes cause confusion because the viewer cannot determine whether the areas for the data series extend down to the zero axis. […] Grouped area graphs can handle negative values somewhat better than stacked area graphs but they still have the problem of all or portions of data curves being hidden by the data series towards the front." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"When analyzing data it is many times advantageous to generate a variety of graphs using the same data. This is true whether there is little or lots of data. Reasons for this are: (1) Frequently, all aspects of a group of data can not be displayed on a single graph. (2) Multiple graphs generally result in a more in-depth understanding of the information. (3) Different aspects of the same data often become apparent. (4) Some types of graphs cause certain features of the data to stand out better. (5) Some people relate better to one type of graph than another." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"If you want to hide data, try putting it into a larger group and then use the average of the group for the chart. The basis of the deceit is the endearingly innocent assumption on the part of your readers that you have been scrupulous in using a representative average: one from which individual values do not deviate all that much. In scientific or statistical circles, where audiences tend to take less on trust, the 'quality' of the average (in terms of the scatter of the underlying individual figures) is described by the standard deviation, although this figure is itself an average." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"We tend automatically to think of all the categories represented on the horizontal axis of a column chart as being equally important. They vary of course on the value axis. Otherwise, there would be little point in the chart, but there is somehow this feeling that they are in other respects similar members of a group. This convention can be put to good use to manipulate the message of the most boring bar or column chart." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"Where there is no natural ordering to the categories it can be helpful to order them by size, as this can help you to pick out any patterns or compare the relative frequencies across groups. As it can be difficult to discern immediately the numbers represented in each of the categories it is good practice to include the number of observations on which the chart is based, together with the percentages in each category." (Jenny Freeman et al, "How to Display Data", 2008)

"Grouping charts according to a theme and in sequence with the message and putting them all on the same sheet or slide helps you find the thread of the message (even if the charts are separated again later)." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"The law of connectivity tells us that objects connected to other objects tend to be seen as a group. […] The law of common fate tells us that objects moving in the same direction are seen as a group." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"The law of continuity states that we interpret images so as not to generate abrupt transitions or otherwise create images that are more complex. […] we can arbitrarily fill in the missing elements to complete a pattern. It’s also the case of time series, in which we assume that data points in the future will be a smooth continuation of the past. […] In a line chart, those series with a similar slope (that is, they appear to follow the same direction) are understood as belonging to the same group." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"The law of segregation tells us that objects within a closed shape are seen as a group. A frame around objects (charts or legends, for example) has this function, but it’s also useful for adding visual annotations." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"A histogram represents the frequency distribution of the data. Histograms are similar to bar charts but group numbers into ranges. Also, a histogram lets you show the frequency distribution of continuous data. This helps in analyzing the distribution (for example, normal or Gaussian), any outliers present in the data, and skewness." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Another problem is that while data visualizations may appear to be objective, the designer has a great deal of control over the message a graphic conveys. Even using accurate data, a designer can manipulate how those data make us feel. She can create the illusion of a correlation where none exists, or make a small difference between groups look big." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

📉Graphical Representation: Mosaic Plots (Just the Quotes)

"We have so consistently inveighed against the use of areas to illustrate quantities that the reader will indeed be surprised at some coming retractions. [...] But the fact is that we now propose to turn to advantage the very feature of areas which has previously been their greatest fault. [...] We now come to data in which we wish to show simultaneously three ratios or sets of ratios, one of which is always the product of the other two. In other words, we wish to show two factors or sets of factors and their product." (Karl Karsten, "Charts and Graphs", 1925)

"A contingency table specifies the joint distribution of a number of discrete variables. The numbers in a contingency table are represented by rectangles of areas proportional to the numbers, with shape and position chosen to expose deviations from independence models. The collection of rectangles for the contingency table is called a mosaic." (John A Hartigan & B Kleiner, "Mosaics for Contingency Tables", 1981)

"Mosaic displays represent the counts in a contingency table by tiles whose size is proportional to the cell count. This graphical display for categorical data generalizes readily to multiway tables."  (Michael Friendly, "Mosaic Displays for Loglinear Models", Proceedings of the Statistical Graphics, 1992)

"Although the basic mosaic display shows the data in any contingency table, it does not in general provide a visual representation of the fit of the data to a specified model. In the two-way case independence is shown when the tiles in each row align vertically, but visual assessment of other models is more difficult." (Michael Friendly, "Mosaic Displays for Loglinear Models", Proceedings of the Statistical Graphics, 1992)

"Categorical data are most often modeled using loglinear models. For certain loglinear models, mosaic plots have unique shapes that do not depend on the actual data being modeled. These shapes reflect the structure of a model, defined by the presence and absence of particular model coefficients. Displaying the expected values of a loglinear model allows one to incorporate the residuals of the model graphically and to visually judge the adequacy of the loglinear fit. This procedure leads to stepwise interactive graphical modeling of loglinear models. We show that it often results in a deeper understanding of the structure of the data. Linking mosaic plots to other interactive displays offers additional power that allows the investigation of more complex dependence models than provided by static displays." (Martin Theus & Stephan R W Lauer, "Visualizing Loglinear Models", Journal of Computational and Graphical Statistics Vol. 8 (3), 1999)

"The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary." (Michael Friendly, "Extending Mosaic Displays: Marginal, Conditional, and Partial Views of Categorical Data", 199)

"A graphical display of a p-dimensional contingency table, the empirical distribution of p categorical variables, is a mosaic plot. Each tile (or bin) corresponds to one cell of the contingency table, its size to the number of the cell's entries. The shape of a tile is calculated during the (strictly hierarchical) construction." (Heike Hofmann, "Generalized Odds Ratios for Visual Modeling", Journal of Computational and Graphical Statistics Vol. 10 (4), 2001)

"Mosaics are space-filling designs composed of contiguous shapes ('tiles')." (Michael Friendly, "A Brief History of the Mosaic Display", Journal of Computational and Graphical Statistics, Vol. 11 (1), 2002)

"The principal graphical ideas [of mosaic plots] are: (*) using area = height x width, to represent a quantity which depends on a product of two other variables, each of interest; (*) using recursive subdivision to show any number of variables; (*) using shading to display some other attribute of the data; (*) purely multiplicative relations (e.g., Pij = Pi+P+j) produce equal subdivisions; (*) for two or more variables, the levels of subdivision are spaced with larger gaps at the earlier levels, to allow easier perception of the groupings at various levels, and to provide for empty cells." (Michael Friendly, "A Brief History of the Mosaic Display", Journal of Computational and Graphical Statistics, Vol. 11 (1), 2002)
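
The "area = height x width" idea can be illustrated with a minimal sketch for a two-way table. The function name `mosaic_tiles` and the counts are hypothetical, and the gaps between tiles that Friendly mentions are left out for brevity; this is one plausible layout, not the construction used by any particular package.

```python
def mosaic_tiles(table):
    """Return {(i, j): (x, y, width, height)} for a two-way table of counts.

    The unit square is first split along x by each row category's marginal
    share, then each strip is split along y by the conditional share of the
    column category within that row. No gaps are drawn (a simplification).
    """
    total = sum(sum(row) for row in table)
    tiles = {}
    x = 0.0
    for i, row in enumerate(table):
        row_total = sum(row)
        width = row_total / total            # marginal proportion
        y = 0.0
        for j, count in enumerate(row):
            # conditional proportion within this row category
            height = count / row_total if row_total else 0.0
            tiles[(i, j)] = (x, y, width, height)
            y += height
        x += width
    return tiles


counts = [[30, 10], [20, 40]]        # hypothetical counts
tiles = mosaic_tiles(counts)

# A table whose cells factor exactly into row and column proportions:
independent = mosaic_tiles([[10, 30], [20, 60]])
```

Each tile's area (width times height) equals its cell's share of the total count. In the `independent` table the strips are cut at the same heights, which is the "equal subdivisions" pattern the quote describes for purely multiplicative relations.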

"Due to their recursive definition, switching the order of variables in a mosaic plot has a strong impact on what can be read from the plot. For instance, exchanging the two variables in a two-dimensional mosaic plot results in a completely new plot rather than in a mere graphically transposed version of the original plot." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)  

"Mosaic plots are defined recursively, i.e., each variable that is introduced in a mosaic plot is plotted conditioned on the groups already established in the plot. As with barcharts, the area of bars or tiles is proportional to the number of observations (or the sum of the observation weights of a class). The direction along which bars are divided by a newly introduced variable is usually alternating, starting with the x-direction." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009) 

"Mosaic plots become more difficult to read for variables with more than two or three categories. One way out is to assign a constant space for all possible crossings of categories. This way, the data from the r×c table are plotted in a table-like layout. Whereas this regular layout makes it much easier to compare values across rows and columns, the plot space is used less efficiently than in a mosaic plot." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Conceptually, mosaic plots for s + 1 factors in strength s designs can be used for any s; in practice, the idea is limited by space constraints, especially for accommodating labels for the factor levels. All four margins are used for four-factor projections; with the next dimension, one margin has to be used for two factors. In practice, one will rarely consider mosaic plots for more factors than four at a time." (Ulrike Grömping, "Mosaic Plots are Useful for Visualizing Low-Order Projections of Factorial Designs", The American Statistician Vol. 68 (2), 2014)

"Mosaic plots are particularly useful for design and analysis of orthogonal main effect plans. [...] mosaic plots do not reflect geometric properties relevant for designs in quantitative factors. Nevertheless, mosaic plots can also be used to visualize confounding severity for designs with quantitative factors [...]" (Ulrike Grömping, "Mosaic Plots are Useful for Visualizing Low-Order Projections of Factorial Designs", The American Statistician Vol. 68 (2), 2014)

"Mosaic plots can get quite messy when increasing the number of variables, which is presumably the reason many commercial software products offer them for two variables only." (Ulrike Grömping, "Mosaic Plots are Useful for Visualizing Low-Order Projections of Factorial Designs", The American Statistician Vol. 68 (2), 2014)

"The way that the model differs from the data gives us clues about how we can improve our model. We can use mosaic displays to find the specific ways in which the model is different from the data, since mosaics show the residuals (or differences) of the cells with respect to the model. Looking at these differences, we can observe patterns in the deviation that will help us in our search." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

02 November 2011

📉Graphical Representation: Problems (Just the Quotes)

"Graphic methods are very commonly used in business correlation problems. On the whole, carefully handled and skillfully interpreted graphs have certain advantages over mathematical methods of determining correlation in the usual business problems. The elements of judgment and special knowledge of conditions can be more easily introduced in studying correlation graphically. Mathematical correlation is often much too rigid for the data at hand." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"One of the greatest values of the graphic chart is its use in the analysis of a problem. Ordinarily, the chart brings up many questions which require careful consideration and further research before a satisfactory conclusion can be reached. A properly drawn chart gives a cross-section picture of the situation. While charts may bring out hidden facts in tables or masses of data, they cannot take the place of careful analysis. In fact, charts may be dangerous devices when in the hands of those unwilling to base their interpretations upon careful study. This, however, does not detract from their value when they are properly used as aids in solving statistical problems." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"90 percent of all problems can be solved by using the techniques of data stratification, histograms, and control charts. Among the causes of nonconformance, only one-fifth or less are attributable to the workers." (Kaoru Ishikawa, The Quality Management Journal Vol. 1, 1993)

"Visual thinking can begin with the three basic shapes we all learned to draw before kindergarten: the triangle, the circle, and the square. The triangle encourages you to rank parts of a problem by priority. When drawn into a triangle, these parts are less likely to get out of order and take on more importance than they should. While the triangle ranks, the circle encloses and can be used to include and/or exclude. Some problems have to be enclosed to be managed. Finally, the square serves as a versatile problem-solving tool. By assigning it attributes along its sides or corners, we can suddenly give a vague issue a specific place to live and to move about." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"When visualization tools act as a catalyst to early visual thinking about a relatively unexplored problem, neither the semantics nor the pragmatics of map signs is a dominant factor. On the other hand, syntactics (or how the sign-vehicles, through variation in the visual variables used to construct them, relate logically to one another) are of critical importance." (Alan M MacEachren, "How Maps Work: Representation, Visualization, and Design", 1995)

"Although in most cases the actual value designated by a bar is determined by the location of the end of the bar, many people associate the length or area of the bar with its value. As long as the scale is linear, starts at zero, is continuous, and the bars are the same width, this presents no problem. When any of these conditions are changed, the potential exists that the graph will be misinterpreted." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"Grouped area graphs sometimes cause confusion because the viewer cannot determine whether the areas for the data series extend down to the zero axis. […] Grouped area graphs can handle negative values somewhat better than stacked area graphs but they still have the problem of all or portions of data curves being hidden by the data series towards the front." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"Pie charts have severe perceptual problems. Experiments in graphical perception have shown that compared with dot charts, they convey information far less reliably. But if you want to display some data, and perceiving the information is not so important, then a pie chart is fine." (Richard Becker & William S Cleveland, "S-Plus Trellis Graphics User's Manual", 1996)

"The ordinary histogram is constructed by binning data on a uniform grid. Although this is probably the most widely used statistical graphic, it is one of the more difficult ones to compute. Several problems arise, including choosing the number of bins (bars) and deciding where to place the cutpoints between bars." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
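
The effect of the bin count on a uniform-grid histogram can be seen in a small sketch. The helper `histogram_counts` and the data are hypothetical, and Sturges' rule is used here only as one common rule of thumb for the number of bins, not as a method taken from the quote.

```python
import math


def histogram_counts(data, bins):
    """Count values in `bins` equal-width intervals spanning min..max.

    Assumes the data are not all identical (otherwise the bin width is zero).
    The maximum value is placed in the last bin.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        k = min(int((x - lo) / width), bins - 1)
        counts[k] += 1
    return counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]          # hypothetical sample

coarse = histogram_counts(data, 2)
medium = histogram_counts(data, 4)

# Sturges' rule of thumb: ceil(log2(n)) + 1 bins
sturges_bins = math.ceil(math.log2(len(data))) + 1
sturges = histogram_counts(data, sturges_bins)
```

With two bins the isolated value 9 is merged with 5; with more bins it separates from the main cluster, so the same data can suggest different shapes depending on the grid.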

"Scatterplots are still the go-to visualization when one is examining relationships between continuous variables. One of the problems with the traditional scatterplot is that all data points are presented as if they are on equal footing. [...] Bubble maps are scatterplots with added dimensions. The most common usage is to add weight to individual data points based on population." (Christopher Lysy, "Developments in Quantitative Data Display and Their Implications for Evaluation", 2013) 

"One of the main problems with the visual approach to statistical data analysis is that it is too easy to generate too many plots: We can easily become totally overwhelmed by the sheer number and variety of graphics that we can generate. In a sense, we have been too successful in our goal of making it easy for the user: Many, many plots can be generated, so many that it becomes impossible to understand our data." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"One very common problem in data visualization is that encoding numerical variables to area is incredibly popular, but readers can’t translate it back very well." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Another problem is that while data visualizations may appear to be objective, the designer has a great deal of control over the message a graphic conveys. Even using accurate data, a designer can manipulate how those data make us feel. She can create the illusion of a correlation where none exists, or make a small difference between groups look big." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Whatever approach you take, it’s always a good idea to define a range of reusable colour palettes so you don’t need to face the same colour design problems every time you want to create a chart or map. There will always be exceptions that require a different treatment, but it’s good to have a solid default starting point." (Alan Smith, "How Charts Work: Understand and explain data with confidence", 2022)

📉Graphical Representation: Values (Just the Quotes)

"By [diagrams] it is possible to present at a glance all the facts which could be obtained from figures as to the increase, fluctuations, and relative importance of prices, quantities, and values of different classes of goods and trade with various countries; while the sharp irregularities of the curves give emphasis to the disturbing causes which produce any striking change." (Arthur L Bowley, "A Short Account of England's Foreign Trade in the Nineteenth Century, its Economic and Social Results", 1905)

"To summarize - with the ordinary arithmetical scale, fluctuations in large factors are very noticeable, while relatively greater fluctuations in smaller factors are barely apparent. The logarithmic scale permits the graphic representation of changes in every quantity without respect to the magnitude of the quantity itself. At the same time, the logarithmic scale shows the actual value by reference to the numbers in the vertical scale. By indicating both absolute and relative values and changes, the logarithmic scale combines the advantages of both the natural and the percentage scale without the disadvantages of either." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"With the ordinary scale, fluctuations in large factors are very noticeable, while relatively greater fluctuations in smaller factors are barely apparent. The semi-logarithmic scale permits the graphic representation of changes in every quantity on the same basis, without respect to the magnitude of the quantity itself. At the same time, it shows the actual value by reference to the numbers in the scale column. By indicating both absolute and relative value and changes to one scale, it combines the advantages of both the natural and percentage scale, without the disadvantages of either." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"An important rule in the drafting of curve charts is that the amount scale should begin at zero. In comparisons of size the omission of the zero base, unless clearly indicated, is likely to give a misleading impression of the relative values and trend." (Rufus R Lutz, "Graphic Presentation Simplified", 1949)

"The function of the regression lines, as approximate representations of means of arrays, is to isolate the mean value of one variable corresponding to any given value of the other; the variation of the first variable about its mean is ignored. A regression line is an average relation, and with it there is a variation of values about the average. In the regression of y on x, the variation ignored is in the vertical direction, a variation of y up and down about the line." (Roy D G Allen, "Statistics for Economists", 1951)

"First, it is generally inadvisable to attempt to portray a series of more than four or five categories by means of pie charts. If, for example, there are six, eight, or more categories, it may be very confusing to differentiate the relative values portrayed, especially if several small sectors are of approximately the same size. Second, the pie chart may lose its effectiveness if an attempt is made to compare the component values of several circles, as might be found in a temporal or geographical series. In such case the one-hundred percent bar or column chart is more appropriate. Third, although the proportionate values portrayed in a pie chart are measured as distances along arcs about the circle, actually there is a tendency to estimate values in terms of areas of sectors or by the size of subtended angles at the center of the circle." (Calvin F Schmid, "Handbook of Graphic Presentation", 1954)

"The primary purpose of a graph is to show diagrammatically how the values of one of two linked variables change with those of the other. One of the most useful applications of the graph occurs in connection with the representation of statistical data." (John F Kenney & E S Keeping, "Mathematics of Statistics" Vol. I 3rd Ed., 1954)

"Where the values of a series are such that a large part of the grid would be superfluous, it is the practice to break the grid thus eliminating the unused portion of the scale, but at the same time indicating the zero line. Failure to include zero in the vertical scale is a very common omission which distorts the data and gives an erroneous visual impression." (Calvin F Schmid, "Handbook of Graphic Presentation", 1954)

"In line charts the grid structure plays a controlling role in interpreting facts. The number of vertical rulings should be sufficient to indicate the frequency of the plottings, facilitate the reading of the time values on the horizontal scale, and indicate the interval or subdivision of time." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"To be useful data must be consistent - they must reflect periodic recordings of the value of the variable or at least possess logical internal connections. The definition of the variable under consideration cannot change during the period of measurement or enumeration. Also, if the data are to be valuable, they must be relevant to the question to be answered." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Missing data values pose a particularly sticky problem for symbols. For instance, if the ray corresponding to a missing value is simply left off of a star symbol, the result will be almost indistinguishable from a minimum (i.e., an extreme) value. It may be better either (i) to impute a value, perhaps a median for that variable, or a fitted value from some regression on other variables, (ii) to indicate that the value is missing, possibly with a dashed line, or (iii) not to draw the symbol for a particular observation if any value is missing." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"[...] error bars are more effectively portrayed on dot charts than on bar charts. […] On the bar chart the upper values of the intervals stand out well, but the lower values are visually deemphasized and are not as well perceived as a result of being embedded in the bars. This deemphasis does not occur on the dot chart." (William S Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4), 1984)

"The logarithm is an extremely powerful and useful tool for graphical data presentation. One reason is that logarithms turn ratios into differences, and for many sets of data, it is natural to think in terms of ratios. […] Another reason for the power of logarithms is resolution. Data that are amounts or counts are often very skewed to the right; on graphs of such data, there are a few large values that take up most of the scale and the majority of the points are squashed into a small region of the scale with no resolution." (William S Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4), 1984)

"It is common for positive data to be skewed to the right: some values bunch together at the low end of the scale and others trail off to the high end with increasing gaps between the values as they get higher. Such data can cause severe resolution problems on graphs, and the common remedy is to take logarithms. Indeed, it is the frequent success of this remedy that partly accounts for the large use of logarithms in graphical data display." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Use a reference line when there is an important value that must be seen across the entire graph, but do not let the line interfere with the data." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Scatter charts show the relationships between information, plotted as points on a grid. These groupings can portray general features of the source data, and are useful for showing where correlationships occur frequently. Some scatter charts connect points of equal value to produce areas within the grid which consist of similar features." (Bruce Robertson, "How to Draw Charts & Diagrams", 1988)

"A coordinate is a number or value used to locate a point with respect to a reference point, line, or plane. Generally the reference is zero. […] The major function of coordinates is to provide a method for encoding information on charts, graphs, and maps in such a way that viewers can accurately decode the information after the graph or map has been generated."  (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996) 

"Area graphs are generally not used to convey specific values. Instead, they are most frequently used to show trends and relationships, to identify and/or add emphasis to specific information by virtue of the boldness of the shading or color, or to show parts-of-the-whole." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996) 

"Grouped area graphs sometimes cause confusion because the viewer cannot determine whether the areas for the data series extend down to the zero axis. […] Grouped area graphs can handle negative values somewhat better than stacked area graphs but they still have the problem of all or portions of data curves being hidden by the data series towards the front." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"If you want to show the growth of numbers which tend to grow by percentages, plot them on a logarithmic vertical scale. When plotted against a logarithmic vertical axis, equal percentage changes take up equal distances on the vertical axis. Thus, a constant annual percentage rate of change will plot as a straight line. The vertical scale on a logarithmic chart does not start at zero, as it shows the ratio of values (in this case, land values), and dividing by zero is impossible." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)
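
The claim that a constant percentage rate of change plots as a straight line on a logarithmic axis can be checked numerically. The sketch below uses a hypothetical series growing 10% per period; the function name and figures are illustrative assumptions only.

```python
import math


def compound(start, rate, periods):
    """Values of a series growing by a constant percentage each period."""
    return [start * (1 + rate) ** t for t in range(periods + 1)]


values = compound(100.0, 0.10, 5)              # hypothetical: 10% per period

# Period-to-period steps on an arithmetic (linear) scale keep growing
linear_steps = [b - a for a, b in zip(values, values[1:])]

# The same steps on a log10 scale are all equal to log10(1.10)
log_values = [math.log10(v) for v in values]
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]
```

Equal steps in `log_steps` mean equal vertical distances per period, that is, a straight line, while `linear_steps` widen over time and produce an upward-bending curve on an arithmetic scale.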

"Estimating the missing values in a dataset solves one problem - imputing reasonable values that have well-defined statistical properties. It fails to solve another, however - drawing inferences about parameters in a model fit to the estimated data. Treating imputed values as if they were known (like the rest of the observed data) causes confidence intervals to be too narrow and tends to bias other estimates that depend on the variability of the imputed values (such as correlations)." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Use a logarithmic scale when it is important to understand percent change or multiplicative factors. […] Showing data on a logarithmic scale can cure skewness toward large values." (Naomi B Robbins, "Creating More Effective Graphs", 2005)

"A useful feature of a stem plot is that the values maintain their natural order, while at the same time they are laid out in a way that emphasizes the overall distribution of where the values are concentrated (that is, where the longer branches are). This enables you easily to pick out key values such as the median and quartiles." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Tables work in a variety of situations because they convey large amounts of data in a condensed fashion. Use tables in the following situations: (1) to structure data so the reader can easily pick out the information desired, (2) to display in a chart when the data contains too many variables or values, and (3) to display exact values that are more important than a visual moment in time." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Given the important role that correlation plays in structural equation modeling, we need to understand the factors that affect establishing relationships among multivariable data points. The key factors are the level of measurement, restriction of range in data values (variability, skewness, kurtosis), missing data, nonlinearity, outliers, correction for attenuation, and issues related to sampling variation, confidence intervals, effect size, significance, sample size, and power." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"The biggest difference between line graphs and sparklines is that a sparkline is compact with no grid lines. It isnʼt meant to give precise values; rather, it should be considered just like any other word in the sentence. Its general shape acts as another term and lends additional meaning in its context. The driving forces behind these compact sparklines are speed and convenience." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Color can modify - and possibly even contradict - our intuitive response to value, because of its own powerful connotations." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)

"Histograms are often mistaken for bar charts but there are important differences. Histograms show distribution through the frequency of quantitative values (y axis) against defined intervals of quantitative values (x axis). By contrast, bar charts facilitate comparison of categorical values. One of the distinguishing features of a histogram is the lack of gaps between the bars [...]" (Andy Kirk, "Data Visualization: A successful design process", 2012)

"After you visualize your data, there are certain things to look for […]: increasing, decreasing, outliers, or some mix, and of course, be sure you’re not mixing up noise for patterns. Also note how much of a change there is and how prominent the patterns are. How does the difference compare to the randomness in the data? Observations can stand out because of human or mechanical error, because of the uncertainty of estimated values, or because there was a person or thing that stood out from the rest. You should know which it is." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Upon discovering a visual image, the brain analyzes it in terms of primitive shapes and colors. Next, unity contours and connections are formed. As well, distinct variations are segmented. Finally, the mind attracts active attention to the significant things it found. That process is permanently running to react to similarities and dissimilarities in shapes, positions, rhythms, colors, and behavior. It can reveal patterns and pattern-violations among the hundreds of data values. That natural ability is the most important thing used in diagramming." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015)

"A scatterplot reveals the strength and shape of the relationship between a pair of variables. A scatterplot represents the two variables by axes drawn at right angles to each other, showing the observations as a cloud of points, each point located according to its values on the two variables. Various lines can be added to the plot to help guide our search for understanding." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"The simplest and most common way to represent the empirical distribution of a numerical variable is by showing the individual values as dots arranged along a line. The main difficulty with this plot concerns how to treat tied values. We usually don't want to represent them by the same point, since that means that the two values look like one. What we can do is 'jitter' the points a bit (i.e., move them back and forth at right angles to the plot axis) so that all points are visible. […] In addition to permitting you to identify individual points, dotplots allow you to look into some of the distributional properties of a variable. […] Dotplots can also be good for looking for modality. " (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
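
The jittering described above can be sketched in a few lines. The function `jitter`, its `amount` parameter, and the sample values are hypothetical; the offsets are drawn uniformly at random, which is one simple choice among several.

```python
import random


def jitter(values, amount=0.1, seed=0):
    """Pair each value with a small random offset at right angles to the axis.

    The value itself (position along the plot axis) is left unchanged;
    only the perpendicular coordinate is perturbed, within +/- `amount`.
    """
    rng = random.Random(seed)                  # seeded for reproducibility
    return [(v, rng.uniform(-amount, amount)) for v in values]


data = [2.0, 3.5, 3.5, 3.5, 4.0]               # hypothetical data with ties
points = jitter(data)
```

The three tied observations at 3.5 keep the same position along the axis but receive different perpendicular offsets, so they no longer overprint as a single dot.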

"The most accurate but least interpretable form of data presentation is to make a table, showing every single value. But it is difficult or impossible for most people to detect patterns and trends in such data, and so we rely on graphs and charts. Graphs come in two broad types: Either they represent every data point visually (as in a scatter plot) or they implement a form of data reduction in which we summarize the data, looking, for example, only at means or medians." (Daniel J Levitin, "Weaponized Lies", 2017)

"A time series is a sequence of values, usually taken in equally spaced intervals. […] Essentially, anything with a time dimension, measured in regular intervals, can be used for time series analysis." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Data is dirty. Let's just get that out there. How is it dirty? In all sorts of ways. Misspelled text values, date format problems, mismatching units, missing values, null values, incompatible geospatial coordinate formats, the list goes on and on." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020) 

"Another cardinal sin of data visualization is what is called 'breaking the bar' - that is, using a squiggly line or shape to show that you've cropped one or more of the bars. It's tempting to do this when you have an outlier, but it distorts the relative values between the bars." (Jonathan Schwabish, "Better Data Visualizations: A guide for scholars, researchers, and wonks", 2021)

