09 November 2011

📉Graphical Representation: Inquiry (Just the Quotes)

"Statistics are numerical statements of facts in any department of inquiry, placed in relation to each other; statistical methods are devices for abbreviating and classifying the statements and making clear the relations." (Arthur L Bowley, "An Elementary Manual of Statistics", 1934)

"One of the greatest values of the graphic chart is its use in the analysis of a problem. Ordinarily, the chart brings up many questions which require careful consideration and further research before a satisfactory conclusion can be reached. A properly drawn chart gives a cross-section picture of the situation. While charts may bring out hidden facts in tables or masses of data, they cannot take the place of careful analysis. In fact, charts may be dangerous devices when in the hands of those unwilling to base their interpretations upon careful study. This, however, does not detract from their value when they are properly used as aids in solving statistical problems." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"The histogram, with its columns of area proportional to number, like the bar graph, is one of the most classical of statistical graphs. Its combination with a fitted bell-shaped curve has been common since the days when the Gaussian curve entered statistics. Yet as a graphical technique it really performs quite poorly. Who is there among us who can look at a histogram-fitted Gaussian combination and tell us, reliably, whether the fit is excellent, neutral, or poor? Who can tell us, when the fit is poor, of what the poorness consists? Yet these are just the sort of questions that a good graphical technique should answer at least approximately." (John W Tukey, "The Future of Processes of Data Analysis", 1965)

"Statistical techniques do not solve any of the common-sense difficulties about making causal inferences. Such techniques may help organize or arrange the data so that the numbers speak more clearly to the question of causality - but that is all statistical techniques can do. All the logical, theoretical, and empirical difficulties attendant to establishing a causal relationship persist no matter what type of statistical analysis is applied." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The use of statistical methods to analyze data does not make a study any more 'scientific', 'rigorous', or 'objective'. The purpose of quantitative analysis is not to sanctify a set of findings. Unfortunately, some studies, in the words of one critic, 'use statistics as a drunk uses a street lamp, for support rather than illumination'. Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"There are as many types of questions as components in the information." (Jacques Bertin, "Semiology of graphics" ["Semiologie Graphique"], 1967)

"Overall [...] everyone also has a need to analyze data. The ability to analyze data is vital in its understanding of product launch success. Everyone needs the ability to find trends and patterns in the data and information. Everyone has a need to ‘discover or reveal (something) through detailed examination’, as our definition says. Not everyone needs to be a data scientist, but everyone needs to drive questions and analysis. Everyone needs to dig into the information to be successful with diagnostic analytics. This is one of the biggest keys of data literacy: analyzing data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution." (Edward R Tufte, "Envisioning Information", 1990)

"Data analysis is rarely as simple in practice as it appears in books. Like other statistical techniques, regression rests on certain assumptions and may produce unrealistic results if those assumptions are false. Furthermore it is not always obvious how to translate a research question into a regression model." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Data analysis [...] begins with a dataset in hand. Our purpose in data analysis is to learn what we can from those data, to help us draw conclusions about our broader research questions. Our research questions determine what sort of data we need in the first place, and how we ought to go about collecting them. Unless data collection has been done carefully, even a brilliant analyst may be unable to reach valid conclusions regarding the original research questions." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)

"When displaying information visually, there are three questions one will find useful to ask as a starting point. Firstly and most importantly, it is vital to have a clear idea about what is to be displayed; for example, is it important to demonstrate that two sets of data have different distributions or that they have different mean values? Having decided what the main message is, the next step is to examine the methods available and to select an appropriate one. Finally, once the chart or table has been constructed, it is worth reflecting upon whether what has been produced truly reflects the intended message. If not, then refine the display until satisfied; for example if a chart has been used would a table have been better or vice versa?" (Jenny Freeman et al, "How to Display Data", 2008)

"Data always vary randomly because the object of our inquiries, nature itself, is also random. We can analyze and predict events in nature with an increasing amount of precision and accuracy, thanks to improvements in our techniques and instruments, but a certain amount of random variation, which gives rise to uncertainty, is inevitable." (Alberto Cairo, "The Functional Art", 2011)

"The final step in creating your graphic is to refine it. Step back and look at it with fresh eyes. Is there anything that could be removed? Or anything that should be removed because it is distracting? Consider each element in your figure and question whether it contributes enough to your overall goal to justify its contribution. Also consider whether there is anything that could be represented more clearly. Perhaps you have been so effective at simplifying your graphic that you could now include another point in the same figure. Another method of refinement is to check the placement and alignment of your labels. They should be unobtrusive and clearly indicate which object they refer to. Consistency in fonts and alignment of labels can make the difference between something that is easy and pleasant to read, and something that is cluttered and frustrating." (Felice C Frankel & Angela H DePace, "Visual Strategies", 2012)

"Early exploration of a dataset can be overwhelming, because you don’t know where to start. Ask questions about the data and let your curiosities guide you. […] Make multiple charts, compare all your variables, and see if there are interesting bits that are worth a closer look. Look at your data as a whole and then zoom in on categories and individual data points. […] Subcategories, the categories within categories (within categories), are often more revealing than the main categories. As you drill down, there can be higher variability and more interesting things to see." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"There are myriad questions that we can ask from data today. As such, it’s impossible to write enough reports or design a functioning dashboard that takes into account every conceivable contingency and answers every possible question." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Visual Organizations benefit from routinely visualizing many different types and sources of data. Doing so allows them to garner a better understanding of what’s happening and why. Equipped with this knowledge, employees are able to ask better questions and make better business decisions." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"What would a successful outcome look like? If you only had a limited amount of time or a single sentence to tell your audience what they need to know, what would you say? In particular, I find that these last two questions can lead to insightful conversation." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"A data story starts out like any other story, with a beginning and a middle. However, the end should never be a fixed event, but rather a set of options or questions to trigger an action from the audience. Never forget that the goal of data storytelling is to encourage and energize critical thinking for business decisions." (James Richardson, 2017)

"Creating effective visualizations is hard. Not because a dataset requires an exotic and bespoke visual representation - for many problems, standard statistical charts will suffice. And not because creating a visualization requires coding expertise in an unfamiliar programming language [...]. Rather, creating effective visualizations is difficult because the problems that are best addressed by visualization are often complex and ill-formed. The task of figuring out what attributes of a dataset are important is often conflated with figuring out what type of visualization to use. Picking a chart type to represent specific attributes in a dataset is comparatively easy. Deciding on which data attributes will help answer a question, however, is a complex, poorly defined, and user-driven process that can require several rounds of visualization and exploration to resolve." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Using a question as a title is a great way to guide the audience. The question helps you ensure that your charts respond directly to the question and when they do not, you can remove them. And that is the main point: You need to answer the question. If the data is not conclusive, say so. Give an explanation that relates back to your title and close the loop so that your audience is informed and gets the complete picture included in your analysis." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"What is the purpose of collecting data? People gather and store data for at least three different reasons that I can discern. One reason is that they want to build an arsenal of evidence with which to prove a point or defend an agenda that they already had to begin with. This path is problematic for obvious reasons, and yet we all find ourselves traveling on it from time to time. Another reason people collect data is that they want to feed it into an artificial intelligence algorithm to automate some process or carry out some task. […] A third reason is that they might be collecting data in order to compile information to help them better understand their situation, to answer questions they have in their mind, and to unearth new questions that they didn't think to ask." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"A good visualization can do more than just answer questions; it can help you see that there are other questions you need to answer." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"Data literacy empowers us to know the usage of data and how an algorithm can potentially be misleading, biased, and so forth; data literacy empowers us with the right type of skepticism that is needed to question everything." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"For a chart to be truly insightful, context is crucial because it provides us with the visual answer to an important question - 'compared with what'? No number on its own is inherently big or small – we need context to make that judgement. Common contextual comparisons in charts are provided by time ('compared with last year...') and place ('compared with the north...'). With ranking, context is provided by relative performance ('compared with our rivals...')." (Alan Smith, "How Charts Work: Understand and explain data with confidence", 2022)

📉Graphical Representation: Completeness (Just the Quotes)

"The title for any chart presenting data in the graphic form should be so clear and so complete that the chart and its title could be removed from the context and yet give all the information necessary for a complete interpretation of the data. Charts which present new or especially interesting facts are very frequently copied by many magazines. A chart with its title should be considered a unit, so that anyone wishing to make an abstract of the article in which the chart appears could safely transfer the chart and its title for use elsewhere." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919) 

"A man's judgment cannot be better than the information on which he has based it. Give him no news, or present him only with distorted and incomplete data, with ignorant, sloppy, or biased reporting, with propaganda and deliberate falsehoods, and you destroy his whole reasoning process and make him somewhat less than a man." (Arthur H Sulzberger, [speech] 1948)

"Graphical methodology provides powerful diagnostic tools for conveying properties of the fitted regression, for assessing the adequacy of the fit, and for suggesting improvements. There is seldom any prior guarantee that a hypothesized regression model will provide a good description of the mechanism that generated the data. Standard regression models carry with them many specific assumptions about the relationship between the response and explanatory variables and about the variation in the response that is not accounted for by the explanatory variables. In many applications of regression there is a substantial amount of prior knowledge that makes the assumptions plausible; in many other applications the assumptions are made as a starting point simply to get the analysis off the ground. But whatever the amount of prior knowledge, fitting regression equations is not complete until the assumptions have been examined." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Labels should be complete but succinct. Long and complicated labels will defeat the viewer and therefore the purpose of the graph. Treat a label as a cue to jog the memory or to complete comprehension. Shorten long labels; avoid abbreviations unless they are universally understood; avoid repetition on the same graph. A title, for instance, should not repeat what is already in the axis labels. Be consistent in terminology." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"[…] a graphic with loose, incomplete information that is too verbose, vague or passive can actually impede your audience’s ability to make sense of the information at hand. If the graphic confuses or frustrates the audience, you’re likely to do more harm than good, leave them with more questions than answers and essentially turn them away from your publication." (Jennifer George-Palilonis, "A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Perception requires imagination because the data people encounter in their lives are never complete and always equivocal. [...] We also use our imagination and take shortcuts to fill gaps in patterns of nonvisual data. As with visual input, we draw conclusions and make judgments based on uncertain and incomplete information, and we conclude, when we are done analyzing the patterns, that our picture is clear and accurate. But is it?" (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"Information graphics are an essential component of technical communication. Very few technical documents or presentations can be considered complete without graphical elements to present some essential data. Because engineers are visually oriented, graphic aids allow their thoughts and ideas to be better understood by other engineers. Information graphics are essential in presenting data because they simplify the content, offer a visually pleasing alternative to gray text in a proposal or an article, and thereby invite interest." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Good visualization is a winding process that requires statistics and design knowledge. Without the former, the visualization becomes an exercise only in illustration and aesthetics, and without the latter, one of only analyses. On their own, these are fine skills, but they make for incomplete data graphics. Having skills in both provides you with the luxury - which is growing into a necessity - to jump back and forth between data exploration and storytelling." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Just because data is visualized doesn’t necessarily mean that it is accurate, complete, or indicative of the right course of action. Exhibiting a healthy skepticism is almost always a good thing." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"To become a great data analyst, you must be able to identify and deal with incomplete data and work to identify the data quality and accuracy issues in a data set." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Using a question as a title is a great way to guide the audience. The question helps you ensure that your charts respond directly to the question and when they do not, you can remove them. And that is the main point: You need to answer the question. If the data is not conclusive, say so. Give an explanation that relates back to your title and close the loop so that your audience is informed and gets the complete picture included in your analysis." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Data storytelling is a method of communicating information that is custom-fit for a specific audience and offers a compelling narrative to prove a point, highlight a trend, make a sale, or all of the above. [...] Data storytelling combines three critical components, storytelling, data science, and visualizations, to create not just a colorful chart or graph, but a work of art that carries forth a narrative complete with a beginning, middle, and end." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

📉Graphical Representation: Failure (Just the Quotes)

"The essential quality of graphic representations is clarity. If the diagram fails to give a clearer impression than the tables of figures it replaces, it is useless. To this end, we will avoid complicating the diagram by including too much data." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

"Where the values of a series are such that a large part of the grid would be superfluous, it is the practice to break the grid thus eliminating the unused portion of the scale, but at the same time indicating the zero line. Failure to include zero in the vertical scale is a very common omission which distorts the data and gives an erroneous visual impression." (Calvin F Schmid, "Handbook of Graphic Presentation", 1954)

"[…] the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies. […] Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"[…] the partial scale break is a weak indicator that the reader can fail to appreciate fully; visually the graph is still a single panel that invites the viewer to see, inappropriately, patterns between the two scales. […] The partial scale break also invites authors to connect points across the break, a poor practice indeed; […]" (William S Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38(4), 1984)

"When a graph is constructed, quantitative and categorical information is encoded, chiefly through position, size, symbols, and color. When a person looks at a graph, the information is visually decoded by the person's visual system. A graphical method is successful only if the decoding process is effective. No matter how clever and how technologically impressive the encoding, it is a failure if the decoding process is a failure. Informed decisions about how to encode data can be achieved only through an understanding of the visual decoding process, which is called graphical perception." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Confusion and clutter are failures of design, not attributes of information. And so the point is to find design strategies that reveal detail and complexity - rather than to fault the data for an excess of complication. Or, worse, to fault viewers for a lack of understanding. Among the most powerful devices for reducing noise and enriching the content of displays is the technique of layering and separation, visually stratifying various aspects of the data." (Edward R Tufte, "Envisioning Information", 1990)

"What about confusing clutter? Information overload? Doesn't data have to be ‘boiled down’ and ‘simplified’? These common questions miss the point, for the quantity of detail is an issue completely separate from the difficulty of reading. Clutter and confusion are failures of design, not attributes of information." (Edward R Tufte, "Envisioning Information", 1990)

"Audience boredom is usually a content failure, not a decoration failure." (Edward R Tufte, "The cognitive style of PowerPoint", 2003)

"Diagrams are a means of communication and explanation, and they facilitate brainstorming. They serve these ends best if they are minimal. Comprehensive diagrams of the entire object model fail to communicate or explain; they overwhelm the reader with detail and they lack meaning." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"No matter how clever the choice of the information, and no matter how technologically impressive the encoding, a visualization fails if the decoding fails. Some display methods lead to efficient, accurate decoding, and others lead to inefficient, inaccurate decoding. It is only through scientific study of visual perception that informed judgments can be made about display methods." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations. No matter how great the technology, a dashboard's success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately. Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think." (Stephen Few, "Information Dashboard Design", 2006)

"The Sixth Principle for the analysis and display of data: 'Analytical presentations ultimately stand or fall depending on the quality, relevance, and integrity of their content.' This suggests that the most effective way to improve a presentation is to get better content. It also suggests that design devices and gimmicks cannot salvage failed content." (Edward R Tufte, "Beautiful Evidence", 2006)

"The main goal of data visualization is its ability to visualize data, communicating information clearly and effectively. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex dataset by communicating its key aspects in a more intuitive way. Yet designers often tend to discard the balance between design and function, creating gorgeous data visualizations which fail to serve its main purpose - communicate information." (Vitaly Friedman, "Data Visualization and Infographics", Smashing Magazine, 2008)

"Designing good visual displays with an easy-to-use interactive system is difficult. The designer’s first attempts will usually fail, so it is critical that proposed systems be tested on at least several sets of typical users. These usability tests help the designer iterate to the best possible system." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"To be sure, data doesn’t always need to be visualized, and many data visualizations just plain suck. Look around you. It’s not hard to find truly awful representations of information. Some work in concept but fail because they are too busy; they confuse people more than they convey information [...]. Visualization for the sake of visualization is unlikely to produce desired results - and this goes double in an era of Big Data. Bad is still bad, even and especially at a larger scale." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"The goal of using data visualization to make better and faster decisions may lead people to think that any data visualization that is not immediately understood is a failure. Yes, a good visualization should allow you to see things that you might have missed, and to glean insights faster, but you still have to think." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"The rise of graphicacy and broader data literacy intersects with the technology that makes it possible and the critical need to understand information in ways current literacies fail. Like reading and writing, data literacy must become mainstream to fully democratize information access." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"A perfectly relevant visualization that breaks a few presentation rules is far more valuable - it’s better - than a perfectly executed, beautiful chart that contains the wrong data, communicates the wrong message, or fails to engage its audience. [...] The more relevant a data visualization is to its context, the more forgiving, to a point, we can be about its execution." (Scott Berinato, "Good Charts: the HBR guide to making smarter, more persuasive data visualizations", 2023)

08 November 2011

📉Graphical Representation: Numbers (Just the Quotes)

"They [diagrams] are designed not so much to allow of reference to particular numbers, which can be better had from printed tables of figures, as to exhibit to the eye the general results of large masses of figures which it is hopeless to attack in any other way than by graphical representation." (William S Jevons, [letter to Richard Hutton] 1863)

"If statistical graphics, although born just yesterday, extends its reach every day, it is because it replaces long tables of numbers and it allows one not only to embrace at a glance the series of phenomena, but also to signal the correspondences or anomalies, to find the causes, to identify the laws." (Émile Cheysson, cca. 1877)

"When numbers in tabular form are taboo and words will not do the work well, as is often the case, there is one answer left: Draw a picture. About the simplest kind of statistical picture or graph, is the line variety. It is very useful for showing trends, something practically everybody is interested in showing or knowing about or spotting or deploring or forecasting." (Darrell Huff, "How to Lie with Statistics", 1954)

"Graphic charts are ways of presenting quantitative as well as qualitative information in an efficient and effective visual form. Numbers and ideas presented graphically are often more easily understood, remembered, and integrated than when they are presented in narrative or tabular form. Descriptions, trends, relationships, and comparisons can be made more apparent. Less time is required to present and comprehend information when graphic methods are employed. As the old truism states, 'One picture is worth a thousand words.'" (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Inept graphics also flourish because many graphic artists believe that statistics are boring and tedious. It then follows that decorated graphics must pep up, animate, and all too often exaggerate what evidence there is in the data. […] If the statistics are boring, then you've got the wrong numbers." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Modern data graphics can do much more than simply substitute for small statistical tables. At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers - even a very large set - is to look at pictures of those numbers. Furthermore, of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The essence of a graphic display is that a set of numbers having both magnitudes and an order are represented by an appropriate visual metaphor - the magnitude and order of the metaphorical representation match the numbers. We can display data badly by ignoring or distorting this concept." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984) 

"Lurking behind chartjunk is contempt both for information and for the audience. Chartjunk promoters imagine that numbers and details are boring, dull, and tedious, requiring ornament to enliven. Cosmetic decoration, which frequently distorts the data, will never salvage an underlying lack of content. If the numbers are boring, then you've got the wrong numbers." (Edward R Tufte, "Envisioning Information", 1990)

"We analyze numbers in order to know when a change has occurred in our processes or systems. We want to know about such changes in a timely manner so that we can respond appropriately. While this sounds rather straightforward, there is a complication - the numbers can change even when our process does not. So, in our analysis of numbers, we need to have a way to distinguish those changes in the numbers that represent changes in our process from those that are essentially noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Not all statistics start out bad, but any statistic can be made worse. Numbers - even good numbers - can be misunderstood or misinterpreted. Their meanings can be stretched, twisted, distorted, or mangled. These alterations create what we can call mutant statistics - distorted versions of the original figures." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Not all statistics start out bad, but any statistic can be made worse. Numbers - even good numbers - can be misunderstood or misinterpreted. Their meanings can be stretched, twisted, distorted, or mangled. These alterations create what we can call mutant statistics - distorted versions of the original figures." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Statistics can certainly pronounce a fact, but they cannot explain it without an underlying context, or theory. Numbers have an unfortunate tendency to supersede other types of knowing. […] Numbers give the illusion of presenting more truth and precision than they are capable of providing." (Ronald J Baker, "Measure what Matters to Customers: Using Key Predictive Indicators", 2006)

"We need [graphic] techniques because figures do not speak for themselves. Numbers alone seldom make a convincing case or polish their author's image - the twin goals of that other great mind bender, rhetoric. While rhetoric deals in qualitative argument, its quantitative equivalent is graphics. As rhetoric has declined in popularity, so graphics have risen along with our acceptance of quantitative arguments. In graphics, figures finally find their own means of expression." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"Another way to obscure the truth is to hide it with relative numbers. […] Relative scales are always given as percentages or proportions. An increase or decrease of a given percentage only tells us part of the story, however. We are missing the anchoring of absolute values." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"By giving numbers a proper shape, by visually encoding them, the graphic has saved you time and energy that you would otherwise waste if you had to use a table that was not designed to aid your mind." (Alberto Cairo, "The Functional Art", 2011)

"The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning." (Nate Silver, "The Signal and the Noise: Why So Many Predictions Fail-but Some Don't", 2012)

"A common mistake is that all visualization must be simple, but this skips a step. You should actually design graphics that lend clarity, and that clarity can make a chart 'simple' to read. However, sometimes a dataset is complex, so the visualization must be complex. The visualization might still work if it provides useful insights that you wouldn’t get from a spreadsheet. […] Sometimes a table is better. Sometimes it’s better to show numbers instead of abstract them with shapes. Sometimes you have a lot of data, and it makes more sense to visualize a simple aggregate than it does to show every data point." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Statistics, because they are numbers, appear to us to be cold, hard facts. It seems that they represent facts given to us by nature and it’s just a matter of finding them. But it’s important to remember that people gather statistics. People choose what to count, how to go about counting, which of the resulting numbers they will share with us, and which words they will use to describe and interpret those numbers. Statistics are not facts. They are interpretations. And your interpretation may be just as good as, or better than, that of the person reporting them to you." (Daniel J Levitin, "Weaponized Lies", 2017)

"Numbers are ideal vehicles for promulgating bullshit. They feel objective, but are easily manipulated to tell whatever story one desires. Words are clearly constructs of human minds, but numbers? Numbers seem to come directly from Nature herself. We know words are subjective. We know they are used to bend and blur the truth. Words suggest intuition, feeling, and expressivity. But not numbers. Numbers suggest precision and imply a scientific approach. Numbers appear to have an existence separate from the humans reporting them." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

07 November 2011

📉Graphical Representation: Emphasis (Just the Quotes)

"By [diagrams] it is possible to present at a glance all the facts which could be obtained from figures as to the increase, fluctuations, and relative importance of prices, quantities, and values of different classes of goods and trade with various countries; while the sharp irregularities of the curves give emphasis to the disturbing causes which produce any striking change." (Arthur L Bowley, "A Short Account of England's Foreign Trade in the Nineteenth Century, its Economic and Social Results", 1905)

"First, color has identity value. In other words, it serves to distinguish one thing from another. In many cases it does this much better and much quicker than black and white coding by different types of shading or lines. […] Second, color has suggestion value. […] Red is usually taken to mean a danger signal or an unfavorable condition. But since it is one of the most visible of colors it is excellent for adding emphasis, regardless of connotation. […] Green has no such unfavorable implication, and is usually appropriate for suggesting a "green light" condition. […] Similarly, every color carries its own connotations; and although they seldom make a vital difference one way or the other, it seems logical to try to make them work for you rather than against you." (Kenneth W Haemer, "Color in Chart Presentation", The American Statistician Vol. 4 (2) , 1950)

"Correct emphasis is basic to effective graphic presentation. Intensity of color is the simplest method of obtaining emphasis. For most reproduction purposes black ink on a white page is most generally used. Screens, dots and lines can, of course, be effectively used to give a gradation of tone from light grey to solid black. When original charts are the subjects of display presentation, use of colors is limited only by the subject and the emphasis desired." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Simplicity, accuracy. appropriate size, proper proportion, correct emphasis, and skilled execution - these are the factors that produce the effective chart. To achieve simplicity your chart must be designed with a definite audience in mind, show only essential information. Technical terms should be absent as far as possible. And in case of doubt it is wiser to oversimplify than to make matters unduly complex. Be careful to avoid distortion or misrepresentation. Accuracy in graphics is more a matter of portraying a clear reliable picture than reiterating exact values. Selecting the right scales and employing authoritative titles and legends are as important as precision plotting. The right size of a chart depends on its probable use, its importance, and the amount of detail involved." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Without adequate planning. it is seldom possible to achieve either proper emphasis of each component element within the chart or a presentation that is pleasing in its entirely. Too often charts are developed around a single detail without sufficient regard for the work as a whole. Good chart design requires consideration of these four major factors: (1) size, (2) proportion, (3) position and margins, and (4) composition." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"[...] exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as for those we believe might be there. Except for its emphasis on graphs, its tools are secondary to its purpose." (John W Tukey, [comment] 1979)

"There are several uses for which the line graph is particularly relevant. One is for a series of data covering a long period of time. Another is for comparing several series on the same graph. A third is for emphasizing the movement of data rather than the amount of the data. It also can be used with two scales on the vertical axis, one on the right and another on the left, allowing different series to use different scales, and it can be used to present trends and forecasts." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"As a general rule, plotted points and graph lines should be given more 'weight' than the axes. In this way the 'meat' will be easily distinguishable from the 'bones'. Furthermore, an illustration composed of lines of unequal weights is always more attractive than one in which all the lines are of uniform thickness. It may not always be possible to emphasise the data in this way however. In a scattergram, for example, the more plotted points there are, the smaller they may need to be and this will give them a lighter appearance. Similarly, the more curves there are on a graph, the thinner the lines may need to be. In both cases, the axes may look better if they are drawn with a somewhat bolder line so that they are easily distinguishable from the data." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"In order to be easily understood, a display of information must have a logical structure which is appropriate for the user's knowledge and needs, and this structure must be clearly represented visually. In order to indicate structure, it is necessary to be able to emphasize, divide and relate items of information. Visual emphasis can be used to indicate a hierarchical relationship between items of information, as in the case of systems of headings and subheadings for example. Visual separation of items can be used to indicate that they are different in kind or are unrelated functionally, and similarly a visual relationship between items will imply that they are of a similar kind or bear some functional relation to one another. This kind of visual 'coding' helps the reader to appreciate the extent and nature of the relationship between items of information, and to adopt an appropriate scanning strategy." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"[...] error bars are more effectively portrayed on dot charts than on bar charts. […] On the bar chart the upper values of the intervals stand out well, but the lower values are visually deemphasized and are not as well perceived as a result of being embedded in the bars. This deemphasis does not occur on the dot chart." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"The plotted points on a graph should always be made to stand out well. They are, after all, the most important feature of a graph, since any lines linking them are nearly always a matter of conjecture. These lines should stop just short of the plotted points so that the latter are emphasised by the space surrounding them. Where a point happens to fall on an axis line, the axis should be broken for a short distance on either side of the point." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"An axis is the ruler that establishes regular intervals for measuring information. Because it is such a widely accepted convention, it is often taken for granted and its importance overlooked. Axes may emphasize, diminish, distort, simplify, or clutter the information. They must be used carefully and accurately." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Area graphs are generally not used to convey specific values. Instead, they are most frequently used to show trends and relationships, to identify and/or add emphasis to specific information by virtue of the boldness of the shading or color, or to show parts-of-the-whole." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996) 

"Arbitrary category sequence and misplaced pie chart emphasis lead to general confusion and weaken messages. Although this can be used for quite deliberate and targeted deceit, manipulation of the category axis only really comes into its own with techniques that bend the relationship between the data and the optics in a more calculated way. Many of these techniques are just twins of similar ruses on the value axis. but are none the less powerful for that." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"What distinguishes data tables from graphics is explicit comparison and the data selection that this requires. While a data table obviously also selects information, this selection is less focused than a chart's on a particular comparison. To the extent that some figures in a table are visually emphasised. say in colour or size and style of print. the table is well on its way to becoming a chart. If you're making no comparisons - because you have no particular message and so need no selection (in other words, if you are simply providing a database, number quarry or recycling facility) - tables are easier to use than charts." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"One way a chart can lie is through overemphasis of the size and scale of items, particularly when the dimension of depth isnʼt considered." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Usually, diagrams contain some noise - information unrelated to the diagram’s primary goal. Noise is decorations, redundant, and irrelevant data, unnecessarily emphasized and ambiguous icons, symbols, lines, grids, or labels. Every unnecessary element draws attention away from the central idea that the designer is trying to share. Noise reduces clarity by hiding useful information in a fog of useless data. You may quickly identify noise elements if you can remove them from the diagram or make them less intense and attractive without compromising the function." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015)

"Beyond basic charts, practitioners must also learn to compose visualizations together elegantly. The perceptual stage focuses on making the literal charts more precise as well as working to de-emphasize the entire piece. Design choices start to consider distractions, reducing visual clutter and centering on the message. Minimalism is espoused as a core value with an emphasis on shifting toward precision as accuracy. This is the most common next step for practitioners. Minimalism is also a key stage in maturation. It is experimentation at one extreme that helps practitioners distill down to core, shared practices." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

📉Graphical Representation: Deception (Just the Quotes)

"The zero of the scale should appear on every chart, and should shown by a heavy line carried across the sheet. If this is not done the reader may assume the bottom of the sheet to be zero and so be misled. The scale should be graduated from zero to a little over the maximum figure to be plotted on the charts, so that there will be a space between the highest peak on the curve and the top of the chart." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Under certain conditions, however, the ordinary form of graphic chart is slightly misleading. It will be conceded that its true function is to portray comparative fluctuations. This result is practically secured when the factors or quantities compared are nearly of the same value or volume, but analysis will show that this is not accomplished when the amounts compared differ greatly in value or volume. [...] The same criticism applies to charts which employ or more scales for various curve. If the different scale are in proper proportion, the result is the same as with one scale, but when two or more scales are used which are not proportional an indication may be given with respect to comparative fluctuations which is absolutely false." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"When plotting any curve the vertical scale should, if possible, be chosen so that the zero of the scale will appear on the chart. Otherwise, the reader may assume the bottom of the chart to be zero and so be grossly misled. Zero should always be indicated by a broad line much wider than the ordinary co-ordinate lines used for the background of the chart." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"Admittedly a chart is primarily a picture, and for presentation purposes should be treated as such; but in most charts it is desirable to be able to read the approximate magnitudes by reference to the scales. Such reference is almost out of the question without some rulings to guide the eye. Second, the picture itself may be misleading without enough rulings to keep the eye 'honest'. Although sight is the most reliable of our senses for measuring (and most other) purposes, the unaided eye is easily deceived; and there are numerous optical illusions to prove it. A third reason, not vital, but still of some importance, is that charts without rulings may appear weak and empty and may lack the structural unity desirable in any illustration." (Kenneth W Haemer, "Hold That Line. A Plea for the Preservation of Chart Scale Ruling", The American Statistician Vol. 1 (1) 1947)

"[….] double-scale charts are likely to be misleading unless the two zero values coincide (either on or off the chart). To insure an accurate comparison of growth the scale intervals should be so chosen that both curves meet at some point. This treatment produces the effect of percentage relatives or simple index numbers with the point of juncture serving as the base point. The principal advantage of this form of presentation is that it is a short-cut method of comparing the relative change of two or more series without computation. It is especially useful for bringing together series that either vary widely in magnitude or are measured in different units and hence cannot be compared conveniently on a chart having only one absolute-amount scale. In general, the double scale treatment should not be used for presenting growth comparisons to the general reader." (Kenneth W Haemer, "Double Scales Are Dangerous", The American Statistician Vol. 2 (3) , 1948)

"An important rule in the drafting of curve charts is that the amount scale should begin at zero. In comparisons of size the omission of the zero base, unless clearly indicated, is likely to give a misleading impression of the relative values and trend." (Rufus R Lutz, "Graphic Presentation Simplified", 1949)

"Percentages offer a fertile field for confusion. And like the ever-impressive decimal they can lend an aura of precision to the inexact. […] Any percentage figure based on a small number of cases is likely to be misleading. It is more informative to give the figure itself. And when the percentage is carried out to decimal places, you begin to run the scale from the silly to the fraudulent." (Darell Huff, "How to Lie with Statistics", 1954)

"Just like the spoken or written word, statistics and graphs can lie. They can lie by not telling the full story. They can lead to wrong conclusions by omitting some of the important facts. [...] Always look at statistics with a critical eye, and you will not be the victim of misleading information." (Dyno Lowenstein, "Graphs", 1976)

"Probably one of the most common misuses (intentional or otherwise) of a graph is the choice of the wrong scale - wrong, that is, from the standpoint of accurate representation of the facts. Even though not deliberate, selection of a scale that magnifies or reduces - even distorts - the appearance of a curve can mislead the viewer." (Peter H Selby, "Interpreting Graphs and Tables", 1976)

"For many people the first word that comes to mind when they think about statistical charts is 'lie'. No doubt some graphics do distort the underlying data, making it hard for the viewer to learn the truth. But data graphics are no different from words in this regard, for any means of communication can be used to deceive. There is no reason to believe that graphics are especially vulnerable to exploitation by liars; in fact, most of us have pretty good graphical lie detectors that help us see right through frauds." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Graphs are used to meet the need to condense all the available information into a more usable quantity. The selection process of combining and condensing will inevitably produce a less than complete study and will lead the user in certain directions, producing a potential for misleading." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"Reliability is highly valued by accountants and has been defined as 'the faithfulness with which it (information) represents what it purports to represent'. The reason reliability is so important is that an essential characteristic of an accounting report is its acceptance, and if a report is considered to be misleading or superfluous, it and future reports will be disregarded." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"There are two kinds of misrepresentation. In one. the numerical data do not agree with the data in the graph, or certain relevant data are omitted. This kind of misleading presentation. while perhaps hard to determine, clearly is wrong and can be avoided. In the second kind of misrepresentation, the meaning of the data is different to the preparer and to the user." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"The bar of a bar chart has two aspects that can be used to visually decode quantitative information-size (length and area) and the relative position of the end of the bar along the common scale. The changing sizes of the bars is an important and imposing visual factor; thus it is important that size encode something meaningful. The sizes of bars encode the magnitudes of deviations from the baseline. If the deviations have no important interpretation, the changing sizes are wasted energy and even have the potential to mislead." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984) 

"The rule is that a graph of a change in a variable with time should always have a vertical scale that starts with zero. Otherwise, it is inherently misleading." (Douglas A Downing & Jeffrey Clark, "Forgotten Statistics: A Self-Teaching Refresher Course", 1996)

"Displaying numerical information always involves selection. The process of selection needs to be described so that the reader will not be misled." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Averages, ranges, and histograms all obscure the time-order for the data. If the time-order for the data shows some sort of definite pattern, then the obscuring of this pattern by the use of averages, ranges, or histograms can mislead the user. Since all data occur in time, virtually all data will have a time-order. In some cases this time-order is the essential context which must be preserved in the presentation." (Donald J Wheeler," Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"[...] when data is presented in certain ways, the patterns can be readily perceived. If we can understand how perception works, our knowledge can be translated into rules for displaying information. Following perception‐based rules, we can present our data in such a way that the important and informative patterns stand out. If we disobey the rules, our data will be incomprehensible or misleading." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"Comparing series visually can be misleading […]. Local variation is hidden when scaling the trends. We first need to make the series stationary (removing trend and/or seasonal components and/or differences in variability) and then compare changes over time. To do this, we log the series (to equalize variability) and difference each of them by subtracting last year’s value from this year’s value." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Arbitrary category sequence and misplaced pie chart emphasis lead to general confusion and weaken messages. Although this can be used for quite deliberate and targeted deceit, manipulation of the category axis only really comes into its own with techniques that bend the relationship between the data and the optics in a more calculated way. Many of these techniques are just twins of similar ruses on the value axis. but are none the less powerful for that." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"If you want to hide data, try putting it into a larger group and then use the average of the group for the chart. The basis of the deceit is the endearingly innocent assumption on the part of your readers that you have been scrupulous in using a representative average: one from which individual values do not deviate all that much. In scientific or statistical circles, where audiences tend to take less on trust, the 'quality' of the average (in terms of the scatter of the underlying individual figures) is described by the standard deviation, although this figure is itself an average." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"The donut, its spelling betrays its origins, is nearly always more deceit friendly than the pie, despite being modelled on a life-saving ring. This is because the hole destroys the second most important value- defining element, by hiding the slice angles in the middle." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"There are some chart types that occasionally appear in print but are so bad that they serve neither honesty nor deceit. Among these monuments to human ingenuity at the expense of common sense are the concentric donut and overlapping segments. The concentric donut is really just a bar or column chart bent back on itself to save space. However as anyone who has ever watched a two or four hundred metre race will know, to make sense of the order of arrival at the tape you have to stagger the start to take account of the bend in the track. Blithely ignoring this problem, the concentric donut uses to diminish the difference between the inner and the outer absolute values by anything up to 2.5 times." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"[…] a graph is nothing but a visual metaphor. To be truthful, it must correspond closely to the phenomena it depicts: longer bars or bigger pie slices must correspond to more, a rising line must correspond to an increasing amount. If a graphical depiction of data does not faithfully follow this principle, it is almost sure to be misleading. But the metaphoric attachment of a graphic goes farther than this. The character of the depiction ism a necessary and sufficient condition for the character of the data. When the data change, so too must their depiction; but when the depiction changes very little, we assume that the data, likewise, are relatively unchanging. If this convention is not followed, we are usually misled." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Good graphic design is not a panacea for bad copy, poor layout or misleading statistics. If any one of these facets are feebly executed it reflects poorly on the work overall, and this includes bad graphs and charts." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"It is tempting to make charts more engaging by introducing fancy graphics or three dimensions so they leap of f the page, but doing so obscures the real data and misleads people, intentionally or not." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

05 November 2011

📉Graphical Representation: Trends (Just the Quotes)

"Wherever unusual peaks or valleys occur on a curve it is a good plan to mark these points with a small figure inside a circle. This figure should refer to a note on the back of the chart explaining the reason for the unusual condition. It is not always sufficient to show that a certain item is unusually high or low; the executive will want to know why it is that way." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"An important rule in the drafting of curve charts is that the amount scale should begin at zero. In comparisons of size the omission of the zero base, unless clearly indicated, is likely to give a misleading impression of the relative values and trend." (Rufus R Lutz, "Graphic Presentation Simplified", 1949)

"A piece of self-deception - often dear to the heart of apprentice scientists - is the drawing of a 'smooth curve'" (how attractive it sounds!) through a set of points which have about as much trend as the currants in plum duff. Once this is done, the mind, looking for order amidst chaos, follows the Jack-o'-lantern line with scant attention to the protesting shouts of the actual points. Nor, let it be whispered, is it unknown for people who should know better to rub off the offending points and publish the trend line which their foolish imagination has introduced on the flimsiest of evidence. Allied to this sin is that of overconfident extrapolation, i.e. extending the graph by guesswork beyond the range of factual information. Whenever extrapolation is attempted it should be carefully distinguished from the rest of the graph, e.g. by showing the extrapolation as a dotted line in contrast to the full line of the rest of the graph. [...] Extrapolation always calls for justification, sooner or later. Until this justification is forthcoming, it remains a provisional estimate, based on guesswork." (Michael J Moroney, "Facts from Figures", 1951)

"In line charts with an arithmetic scale, it is essential to set the base line at zero in order that the correct perspective of the general movement may not be lost. Breaking or leaving off part of the scale leads to misinterpretation, because the trend then shows a disproportionate degree of variation in movement." (Mary E Spear, "Charting Statistics", 1952)

"Extrapolations are useful, particularly in the form of soothsaying called forecasting trends. But in looking at the figures or the charts made from them, it is necessary to remember one thing constantly: The trend to now may be a fact, but the future trend represents no more than an educated guess. Implicit in it is 'everything else being equal' and 'present trends continuing'. And somehow everything else refuses to remain equal." (Darell Huff, "How to Lie with Statistics", 1954)

"When numbers in tabular form are taboo and words will not do the work well as is often the case. There is one answer left: Draw a picture. About the simplest kind of statistical picture or graph, is the line variety. It is very useful for showing trends, something practically everybody is interested in showing or knowing about or spotting or deploring or forecasting." (Darell Huff, "How to Lie with Statistics", 1954)

"Since bars represent magnitude by their length, the zero line must be shown and the arithmetic scale must not be broken. Occasionally an excessively long bar in a series of bars may be broken off at the end, and the amount involved shown directly beyond it, without distorting the general trend of the other bars, but this practice applies solely when only one bar exceeds the scale." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Charts not only tell what was, they tell what is; and a trend from was to is" (projected linearly into the will be) contains better percentages than clumsy guessing." (Robert A Levy, "The Relative Strength Concept of Common Stock Forecasting", 1968)

"In certain respects, line graphs are uniquely applicable to particular graphic requirements for which a bar or circle chart could not be substituted. Strictly speaking, the line graph must be used to portray changes in a continuous variable, since technically such a variable must be represented by a line and not by 'points' or 'bars'. Line graphs are often uniquely applicable to problems of analysis, particularly when it is essential to visualize a trend, observe the behavior of a set of variables through time, or portray the same variable in differing time periods." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (William E Deming, "On Probability as Basis for Action" American Statistician Vol. 29" (4), 1975)

"A graphic is an illustration that, like a painting or drawing, depicts certain images on a flat surface. The graphic depends on the use of lines and shapes or symbols to represent numbers and ideas and show comparisons, trends, and relationships. The success of the graphic depends on the extent to which this representation is transmitted in a clear and interesting manner." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Graphic forms help us to perform and influence two critical functions of the mind: the gathering of information and the processing of that information. Graphs and charts are ways to increase the effectiveness and the efficiency of transmitting information in a way that enhances the reader's ability to process that information. Graphics are tools to help give meaning to information because they go beyond the provision of information and show relationships, trends, and comparisons. They help to distinguish which numbers and which ideas are more important than others in a presentation." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"There are several uses for which the line graph is particularly relevant. One is for a series of data covering a long period of time. Another is for comparing several series on the same graph. A third is for emphasizing the movement of data rather than the amount of the data. It also can be used with two scales on the vertical axis, one on the right and another on the left, allowing different series to use different scales, and it can be used to present trends and forecasts." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"A connected graph is appropriate when the time series is smooth, so that perceiving individual values is not important. A vertical line graph is appropriate when it is important to see individual values, when we need to see short-term fluctuations, and when the time series has a large number of values; the use of vertical lines allows us to pack the series tightly along the horizontal axis. The vertical line graph, however, usually works best when the vertical lines emanate from a horizontal line through the center of the data and when there are no long-term trends in the data." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Area graphs are generally not used to convey specific values. Instead, they are most frequently used to show trends and relationships, to identify and/or add emphasis to specific information by virtue of the boldness of the shading or color, or to show parts-of-the-whole." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996) 

"Graphic misrepresentation is a frequent misuse in presentations to the nonprofessional. The granddaddy of all graphical offenses is to omit the zero on the vertical axis. As a consequence, the chart is often interpreted as if its bottom axis were zero, even though it may be far removed. This can lead to attention-getting headlines about 'a soar' or 'a dramatic rise" (or fall)'. A modest, and possibly insignificant, change is amplified into a disastrous or inspirational trend." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998) 

"Stacked bar graphs do not show data structure well. A trend in one of the stacked variables has to be deduced by scanning along the vertical bars. This becomes especially difficult when the categories do not move in the same direction." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"Graphs are for the forest and tables are for the trees. Graphs give you the big picture and show you the trends; tables give you the details." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"Sparklines are compact line graphs that do not have a quantitative scale. They are meant to provide a quick sense of a metric's movement or trend, usually over time. They are more expressive than arrows, which only indicate change from the prior period and do not qualify the degree of change. Sparklines are significantly more compact than normal line graphs but are precise." (Wayne W Eckerson, "Performance Dashboards: Measuring, Monitoring, and Managing Your Business", 2010)

"Line graphs that show more than one line can be useful for making comparisons, but sometimes it is important to discuss each individual line. By using sparklines evaluators can call attention to and discuss individual cases. Sparklines can be embedded within a sentence to illustrate a trend and help stakeholders better understand the data. Evaluators can use this simple visualization when creating reports." (Christopher Lysy, "Developments in Quantitative Data Display and Their Implications for Evaluation", 2013) 

"What is good visualization? It is a representation of data that helps you see what you otherwise would have been blind to if you looked only at the naked source. It enables you to see trends, patterns, and outliers that tell you about yourself and what surrounds you. The best visualization evokes that moment of bliss when seeing something for the first time, knowing that what you see has been right in front of you, just slightly hidden. Sometimes it is a simple bar graph, and other times the visualization is complex because the data requires it." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Graphs can help us interpret data and draw inferences. They can help us see tendencies, patterns, trends, and relationships. A picture can be worth not only a thousand words, but a thousand numbers. However, a graph is essentially descriptive - a picture meant to tell a story. As with any story, bumblers may mangle the punch line and the dishonest may lie." (Gary Smith, "Standard Deviations", 2014)

"The most accurate but least interpretable form of data presentation is to make a table, showing every single value. But it is difficult or impossible for most people to detect patterns and trends in such data, and so we rely on graphs and charts. Graphs come in two broad types: Either they represent every data point visually" (as in a scatter plot) or they implement a form of data reduction in which we summarize the data, looking, for example, only at means or medians." (Daniel J Levitin, "Weaponized Lies", 2017)

"As presenters of data visualizations, often we just want our audience to understand something about their environment – a trend, a pattern, a breakdown, a way in which things have been progressing. If we ask ourselves what we want our audience to do with that information, we might have a hard time coming up with a clear answer sometimes. We might just want them to know something." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"[...] scatterplots had advantages over earlier graphic forms: the ability to see clusters, patterns, trends, and relations in a cloud of points. Perhaps most importantly, it allowed the addition of visual annotations (point symbols, lines, curves, enclosing contours, etc.) to make those relationships more coherent and tell more nuanced stories." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Data storytelling is a method of communicating information that is custom-fit for a specific audience and offers a compelling narrative to prove a point, highlight a trend, make a sale, or all of the above. [...] Data storytelling combines three critical components, storytelling, data science, and visualizations, to create not just a colorful chart or graph, but a work of art that carries forth a narrative complete with a beginning, middle, and end." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

"Bad complexity neither elucidates important salient points nor shows coherent broader trends. It will obfuscate, frustrate, tax the mind, and ultimately convey trendlessness and confusion to the viewer. Good complexity, in contrast, emerges from visualizations that use more data than humans can reasonably process to form a few salient points." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

04 November 2011

📉Graphical Representation: Statistics (Just the Quotes)

"Graphical statistics can be defined as: 'the expression of statistical facts by means of geometric processes' (Levasseur). Its general usefulness consists of replacing figures which, by their multiplicity, confuse memory, with a figure whose general appearance can be discovered all at once and, by speaking to the eyes, is more easily engraved in the memory." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

"Although, the tabular arrangement is the fundamental form for presenting a statistical series, a graphic representation - in a chart or diagram - is often of great aid in the study and reporting of statistical facts. Moreover, sometimes statistical data must be taken, in their sources, from graphic rather than tabular records." (William L Crum et al, "Introduction to Economic Statistics", 1938)

"The primary purpose of a graph is to show diagrammatically how the values of one of two linked variables change with those of the other. One of the most useful applications of the graph occurs in connection with the representation of statistical data." (John F Kenney & E S Keeping, "Mathematics of Statistics" Vol. I 3rd Ed., 1954)

"When numbers in tabular form are taboo and words will not do the work well as is often the case. There is one answer left: Draw a picture. About the simplest kind of statistical picture or graph, is the line variety. It is very useful for showing trends, something practically everybody is interested in showing or knowing about or spotting or deploring or forecasting." (Darell Huff, "How to Lie with Statistics", 1954)

"Indeed the language of statistics is rarely as objective as we imagine. The way statistics are presented, their arrangement in a particular way in tables, the juxtaposition of sets of figures, in itself reflects the judgment of the author about what is significant and what is trivial in the situation which the statistics portray." (Ely Devons, "Essays in Economics", 1961)

"[…] an outlier is an observation that lies an 'abnormal' distance from other values in a batch of data. There are two possible explanations for the occurrence of an outlier. One is that this happens to be a rare but valid data item that is either extremely large or extremely small. The other is that it isa mistake – maybe due to A good rule of thumb for deciding how long the analysis of the data actually will take is (1) to add up all the time for everything you can think of - editing the data, checking for errors, calculating various statistics, thinking about the results, going back to the data to try out a new idea, and (2) then multiply the estimate obtained in this first step by five." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Statistical techniques do not solve any of the common-sense difficulties about making causal inferences. Such techniques may help organize or arrange the data so that the numbers speak more clearly to the question of causality - but that is all statistical techniques can do. All the logical, theoretical, and empirical difficulties attendant to establishing a causal relationship persist no matter what type of statistical analysis is applied." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Just like the spoken or written word, statistics and graphs can lie. They can lie by not telling the full story. They can lead to wrong conclusions by omitting some of the important facts. [...] Always look at statistics with a critical eye, and you will not be the victim of misleading information." (Dyno Lowenstein, "Graphs", 1976)

"Learning to make graphs involves two things: (l) the techniques of plotting statistics, which might be called the artist's job; and" (2) understanding the statistics. When you know how to work out graphs, all kinds of statistics will probably become more interesting to you." (Dyno Lowenstein, "Graphs", 1976)

"Of course, statistical graphics, just like statistical calculations, are only as good as what goes into them. An ill-specified or preposterous model or a puny data set cannot be rescued by a graphic (or by calculation), no matter how clever or fancy. A silly theory means a silly graphic." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Statistics is a tool. In experimental science you plan and carry out experiments, and then analyse and interpret the results. To do this you use statistical arguments and calculations. Like any other tool - an oscilloscope, for example, or a spectrometer, or even a humble spanner - you can use it delicately or clumsily, skillfully or ineptly. The more you know about it and understand how it works, the better you will be able to use it and the more useful it will be." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"There is an interplay between statistical models and graphics, so it is advantageous to think about models before making a series of plots." (Daniel B Carr, "Looking at Large Data Sets Using Binned Data Plots", [in "Computing and Graphics in Statistics"] 1991)

"There are two components to visualizing the structure of statistical data - graphing and fitting. Graphs are needed, of course, because visualization implies a process in which information is encoded on visual displays. Fitting mathematical functions to data is needed too. Just graphing raw data, without fitting them and without graphing the fits and residuals, often leaves important aspects of data undiscovered." (William S Cleveland, "Visualizing Data", 1993)

"But people treat mutant statistics just as they do other statistics - that is, they usually accept even the most implausible claims without question. [...] And people repeat bad statistics [...] bad statistics live on; they take on lives of their own. [...] Statistics, then, have a bad reputation. We suspect that statistics may be wrong, that people who use statistics may be 'lying' - trying to manipulate us by using numbers to somehow distort the truth. Yet, at the same time, we need statistics; we depend upon them to summarize and clarify the nature of our complex society. This is particularly true when we talk about social problems." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Every statistical analysis is an interpretation of the data, and missingness affects the interpretation. The challenge is that when the reasons for the missingness cannot be determined there is basically no way to make appropriate statistical adjustments. Sensitivity analyses are designed to model and explore a reasonable range of explanations in order to assess the robustness of the results." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Estimating the missing values in a dataset solves one problem - imputing reasonable values that have well-defined statistical properties. It fails to solve another, however - drawing inferences about parameters in a model fit to the estimated data. Treating imputed values as if they were known (like the rest of the observed data) causes confidence intervals to be too narrow and tends to bias other estimates that depend on the variability of the imputed values (such as correlations)." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"The consequence of distinguishing statistical methods from the graphics displaying them is to separate form from function. That is, the same statistic can be represented by different types of graphics, and the same type of graphic can be used to display two different statistics. […] This separability of statistical and geometric objects is what gives a system a wide range of representational opportunities." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Oftentimes a statistical graphic provides the evidence for a plausible story, and the evidence, though perhaps only circumstantial, can be quite convincing. […] But such graphical arguments are not always valid. Knowledge of the underlying phenomena and additional facts may be required." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Placing a fact within a context increases its value greatly. […] . An efficacious way to add context to statistical facts is by embedding them in a graphic. Sometimes the most helpful context is geographical, and shaded maps come to mind as examples. Sometimes the most helpful context is temporal, and time-based line graphs are the obvious choice. But how much time? The ending date (today) is usually clear, but where do you start? The starting point determines the scale. […] The starting point and hence the scale are determined by the questions that we expect the graph to answer." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Eye-catching data graphics tend to use designs that are unique (or nearly so) without being strongly focused on the data being displayed. In the world of Infovis, design goals can be pursued at the expense of statistical goals. In contrast, default statistical graphics are to a large extent determined by the structure of the data (line plots for time series, histograms for univariate data, scatterplots for bivariate nontime-series data, and so forth), with various conventions such as putting predictors on the horizontal axis and outcomes on the vertical axis. Most statistical graphs look like other graphs, and statisticians often think this is a good thing." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"After all, we do agree that statistical data analysis is concerned with generating and evaluating hypotheses about data. For us, generating hypotheses means that we are searching for patterns in the data - trying to 'see what the data seem to say'. And evaluating hypotheses means that we are seeking an explanation or at least a simple description of what we find - trying to verify what we believe we see." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

03 November 2011

📉Graphical Representation: Confusion (Just the Quotes)

"First, it is generally inadvisable to attempt to portray a series of more than four or five categories by means of pie charts. If, for example, there are six, eight, or more categories, it may be very confusing to differentiate the relative values portrayed, especially if several small sectors are of approximately the same size. Second, the pie chart may lose its effectiveness if an attempt is made to compare the component values of several circles, as might be found in a temporal or geographical series. In such case the one-hundred percent bar or column chart is more appropriate. Third, although the proportionate values portrayed in a pie chart are measured as distances along arcs about the circle, actually there is a tendency to estimate values in terms of areas of sectors or by the size of subtended angles at the center of the circle." (Calvin F Schmid, "Handbook of Graphic Presentation", 1954)

"Percentages offer a fertile field for confusion. And like the ever-impressive decimal they can lend an aura of precision to the inexact. […] Any percentage figure based on a small number of cases is likely to be misleading. It is more informative to give the figure itself. And when the percentage is carried out to decimal places, you begin to run the scale from the silly to the fraudulent." (Darell Huff, "How to Lie with Statistics", 1954)

"The eye can accurately appraise only very few features of a diagram, and consequently a complicated or confusing diagram will lead the reader astray. The fundamental rule for all charting is to use a plan which is simple and which takes account, in its arrangement of the facts to be presented, of the above-mentioned capacities of the eye."  (William L Crum et al, "Introduction to Economic Statistics", 1938)

"Besides being easier to construct than a bar chart, the line chart possesses other advantages. It is easier to read, for while the bars stand out more prominently than the line, they tend to become confusing if numerous, and especially so when they record alternate increase and decrease. It is easier for the eye to follow a line across the face of the chart than to jump from bar top to bar top, and the slope of the line connecting two points is a great aid in detecting minor changes. The line is also more suggestive of movement than arc bars, and movement is the very essence of a time series. Again, a line chart permits showing two or more related variables on the same chart, or the same variable over two or more corresponding periods." (Walter E Weld, "How to Chart; Facts from Figures with Graphs", 1959)

"If two or more data paths ate to appear on the graph, it is essential that these lines be labeled clearly, or at least a reference should be provided for the reader to make the necessary identifications. While clarity seems to be a most obvious goal, graphs with inadequate or confusing labeling do appear in publications, The user should not find identification of data paths troublesome or subject to misunderstanding. The designer normally should place no more than three data paths on the graph to prevent confusion - particularly if the data paths intersect at one or more points on the Cartesian plane." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"The information on a plot should be relevant to the goals of the analysis. This means that in choosing graphical methods we should match the capabilities of the methods to our needs in the context of each application. [...] Scatter plots, with the views carefully selected as in draftsman's displays, casement displays, and multiwindow plots, are likely to be more informative. We must be careful, however, not to confuse what is relevant with what we expect or want to find. Often wholly unexpected phenomena constitute our most important findings." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Confusion and clutter are failures of design, not attributes of information. And so the point is to find design strategies that reveal detail and complexity - rather than to fault the data for an excess of complication. Or, worse, to fault viewers for a lack of understanding. Among the most powerful devices for reducing noise and enriching the content of displays is the technique of layering and separation, visually stratifying various aspects of the data." (Edward R Tufte, "Envisioning Information", 1990)

"What about confusing clutter? Information overload? Doesn't data have to be ‘boiled down’ and  ‘simplified’? These common questions miss the point, for the quantity of detail is an issue completely separate from the difficulty of reading. Clutter and confusion are failures of design, not attributes of information. Often the less complex and less subtle the line, the more ambiguous and less interesting is the reading. Stripping the detail out of data is a style based on personal preference and fashion, considerations utterly indifferent to substantive content." (Edward R Tufte, "Envisioning Information", 1990)

"Grouped area graphs sometimes cause confusion because the viewer cannot determine whether the areas for the data series extend down to the zero axis. […] Grouped area graphs can handle negative values somewhat better than stacked area graphs but they still have the problem of all or portions of data curves being hidden by the data series towards the front." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"Technically, there is no limit as to the number of data series that can be plotted on a single graph. Practically, if the number goes above three or four the graph becomes confusing." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"When it comes to drawing a picture of continuous data, you need to think through carefully where one interval ends and the next one begins. Failing to do this can result in overlaps or gaps between adjacent intervals, which can cause confusion." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Arbitrary category sequence and misplaced pie chart emphasis lead to general confusion and weaken messages. Although this can be used for quite deliberate and targeted deceit, manipulation of the category axis only really comes into its own with techniques that bend the relationship between the data and the optics in a more calculated way. Many of these techniques are just twins of similar ruses on the value axis. but are none the less powerful for that." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"Using colour, itʼs possible to increase the density of information even further. A single colour can be used to represent two variables simultaneously. The difficulty, however, is that there is a limited amount of information that can be packed into colour without confusion." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Bear in mind is that the use of color doesn’t always help. Use it sparingly and with a specific purpose in mind. Remember that the reader’s brain is looking for patterns, and will expect both recurrence itself and the absence of expected recurrence to carry meaning. If you’re using color to differentiate categorical data, then you need to let the reader know what the categories are. If the dimension of data you’re encoding isn’t significant enough to your message to be labeled or explained in some way - or if there is no dimension to the data underlying your use of difference colors - then you should limit your use so as not to confuse the reader." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"Graphs should not be mere decoration, to amuse the easily bored. A useful graph displays data accurately and coherently, and helps us understand the data. Chartjunk, in contrast, distracts, confuses, and annoys. Chartjunk may be well-intentioned, but it is misguided. It may also be a deliberate attempt to mystify." (Gary Smith, "Standard Deviations", 2014)

"Uncertainty confuses many people because they have the unreasonable expectation that science and statistics will unearth precise truths, when all they can yield is imperfect estimates that can always be subject to changes and updates." (Alberto Cairo, "How Charts Lie", 2019)

"Bad complexity neither elucidates important salient points nor shows coherent broader trends. It will obfuscate, frustrate, tax the mind, and ultimately convey trendlessness and confusion to the viewer. Good complexity, in contrast, emerges from visualizations that use more data than humans can reasonably process to form a few salient points." (Scott Berinato, "Good Charts : the HBR guide to making smarter, more persuasive data visualizations", 2023)

📉Graphical Representation: Groups (Just the Quotes)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (William E Deming, "On Probability as Basis for Action" American Statistician Vol. 29 (4), 1975)

"The basic principle which should be observed in designing tables is that of grouping related data, either by the use of space or, if necessary, rules. Items which are close together will be seen as being more closely related than items which are farther apart, and the judicious use of space is therefore vitally important. Similarly, ruled lines can be used to relate and divide information, and it is important to be sure which function is required. Rules should not be used to create closed compartments; this is time-wasting and it interferes with scanning." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"The space between columns, on the other hand, should be just sufficient to separate them clearly, but no more. The columns should not, under any circumstances, be spread out merely to fill the width of the type area. […] Sometimes, however, it is difficult to avoid undesirably large gaps between columns, particularly where the data within any given column vary considerably in length. This problem can sometimes be solved by reversing the order of the columns […]. In other instances the insertion of additional space after every fifth entry or row can be helpful, […] but care must be taken not to imply that the grouping has any special meaning." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"Scatter charts show the relationships between information, plotted as points on a grid. These groupings can portray general features of the source data, and are useful for showing where correlationships occur frequently. Some scatter charts connect points of equal value to produce areas within the grid which consist of similar features." (Bruce Robertson, "How to Draw Charts & Diagrams", 1988)

"A good chart delineates and organizes information. It communicates complex ideas, procedures, and lists of facts by simplifying, grouping, and setting and marking priorities. By spatial organization, it should lead the eye through information smoothly and efficiently." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Grouped area graphs sometimes cause confusion because the viewer cannot determine whether the areas for the data series extend down to the zero axis. […] Grouped area graphs can handle negative values somewhat better than stacked area graphs but they still have the problem of all or portions of data curves being hidden by the data series towards the front." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"When analyzing data it is many times advantageous to generate a variety of graphs using the same data. This is true whether there is little or lots of data. Reasons for this are: (1) Frequently, all aspects of a group of data can not be displayed on a single graph. (2) Multiple graphs generally result in a more in-depth understanding of the information. (3) Different aspects of the same data often become apparent. (4) Some types of graphs cause certain features of the data to stand out better (5) Some people relate better to one type of graph than another." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996) 

"If you want to hide data, try putting it into a larger group and then use the average of the group for the chart. The basis of the deceit is the endearingly innocent assumption on the part of your readers that you have been scrupulous in using a representative average: one from which individual values do not deviate all that much. In scientific or statistical circles, where audiences tend to take less on trust, the 'quality' of the average (in terms of the scatter of the underlying individual figures) is described by the standard deviation, although this figure is itself an average." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"We tend automatically to think of all the categories represented on the horizontal axis of a column Chart as being equally important. They vary of course on the value axis. Otherwise, there would be little point in the chart, but there is somehow this feeling that they are in other respects similar members of a group. This convention can be put to good use to manipulate the message of the most boring bar or column chart." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"Where there is no natural ordering to the categories it can be helpful to order them by size, as this can help you to pick out any patterns or compare the relative frequencies across groups. As it can be difficult to discern immediately the numbers represented in each of the categories it is good practice to include the number of observations on which the chart is based, together with the percentages in each category." (Jenny Freeman et al, "How to Display Data", 2008)

"A histogram represents the frequency distribution of the data. Histograms are similar to bar charts but group numbers into ranges. Also, a histogram lets you show the frequency distribution of continuous data. This helps in analyzing the distribution (for example, normal or Gaussian), any outliers present in the data, and skewness." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Another problem is that while data visualizations may appear to be objective, the designer has a great deal of control over the message a graphic conveys. Even using accurate data, a designer can manipulate how those data make us feel. She can create the illusion of a correlation where none exists, or make a small difference between groups look big." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

02 November 2011

📉Graphical Representation: Problems (Just the Quotes)

"Graphic methods are very commonly used in business correlation problems. On the whole, carefully handled and skillfully interpreted graphs have certain advantages over mathematical methods of determining correlation in the usual business problems. The elements of judgment and special knowledge of conditions can be more easily introduced in studying correlation graphically. Mathematical correlation is often much too rigid for the data at hand." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"One of the greatest values of the graphic chart is its use in the analysis of a problem. Ordinarily, the chart brings up many questions which require careful consideration and further research before a satisfactory conclusion can be reached. A properly drawn chart gives a cross-section picture of the situation. While charts may bring out hidden facts in tables or masses of data, they cannot take the place of careful, analysis. In fact, charts may be dangerous devices when in the hands of those unwilling to base their interpretations upon careful study. This, however, does not detract from their value when they are properly used as aids in solving statistical problems." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"90 percent of all problems can be solved by using the techniques of data stratification, histograms, and control charts. Among the causes of nonconformance, only one-fifth or less are attributable to the workers." (Kaoru Ishikawa, The Quality Management Journal Vol. 1, 1993)

"Visual thinking can begin with the three basic shapes we all learned to draw before kindergarten: the triangle, the circle, and the square. The triangle encourages you to rank parts of a problem by priority. When drawn into a triangle, these parts are less likely to get out of order and take on more importance than they should. While the triangle ranks, the circle encloses and can be used to include and/or exclude. Some problems have to be enclosed to be managed. Finally, the square serves as a versatile problem-solving tool. By assigning it attributes along its sides or corners, we can suddenly give a vague issue a specific place to live and to move about." (Terry Richey, "The Marketer's Visual Tool Kit", 1994)

"When visualization tools act as a catalyst to early visual thinking about a relatively unexplored problem, neither the semantics nor the pragmatics of map signs is a dominant factor. On the other hand, syntactics (or how the sign-vehicles, through variation in the visual variables used to construct them, relate logically to one another) are of critical importance." (Alan M MacEachren, "How Maps Work: Representation, Visualization, and Design", 1995)

"Although in most cases the actual value designated by a bar is determined by the location of the end of the bar, many people associate the length or area of the bar with its value. As long as the scale is linear, starts at zero, is continuous, and the bars are the same width, this presents no problem. When any of these conditions are changed, the potential exists that the graph will be misinterpreted." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"Grouped area graphs sometimes cause confusion because the viewer cannot determine whether the areas for the data series extend down to the zero axis. […] Grouped area graphs can handle negative values somewhat better than stacked area graphs but they still have the problem of all or portions of data curves being hidden by the data series towards the front." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"Pie charts have severe perceptual problems. Experiments in graphical perception have shown that compared with dot charts, they convey information far less reliably. But if you want to display some data, and perceiving the information is not so important, then a pie chart is fine." (Richard Becker & William S Cleveland," S-Plus Trellis Graphics User's Manual", 1996)

"The ordinary histogram is constructed by binning data on a uniform grid. Although this is probably the most widely used statistical graphic, it is one of the more difficult ones to compute. Several problems arise, including choosing the number of bins (bars) and deciding where to place the cutpoints between bars." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Scatterplots are still the go-to visualization when one is examining relationships between continuous variables. One of the problems with the traditional scatterplot is that all data points are presented as if they are on equal footing. [...] Bubble maps are scatterplots with added dimensions. The most common usage is to add weight to individual data points based on population." (Christopher Lysy, "Developments in Quantitative Data Display and Their Implications for Evaluation", 2013) 

"One of the main problems with the visual approach to statistical data analysis is that it is too easy to generate too many plots: We can easily become totally overwhelmed by the shear number and variety of graphics that we can generate. In a sense, we have been too successful in our goal of making it easy for the user: Many, many plots can be generated, so many that it becomes impossible to understand our data." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"One very common problem in data visualization is that encoding numerical variables to area is incredibly popular, but readers can’t translate it back very well." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Another problem is that while data visualizations may appear to be objective, the designer has a great deal of control over the message a graphic conveys. Even using accurate data, a designer can manipulate how those data make us feel. She can create the illusion of a correlation where none exists, or make a small difference between groups look big." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Whatever approach you take, it’s always a good idea to define a range of reusable colour palettes so you don’t need to face the same colour design problems every time you want to create a chart or map. There will always be exceptions that require a different treatment, but it’s good to have a solid default starting point." (Alan Smith, "How Charts Work: Understand and explain data with confidence", 2022)

About Me

Koeln, NRW, Germany
IT professional with more than 24 years of experience across the full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.