18 November 2011

📉Graphical Representation: Relevance (Just the Quotes)

"Summarization of statistical data into tabular form is an art rather than a routine following a set of formal rules. Tabulation inevitably implies a loss of detail. The original data are far too voluminous to be appreciated and understood; the significant details are mixed up with much that is irrelevant. The art of tabulation lies in the sacrifice of detail which is less significant for the purposes in hand so that what is really important can be emphasized. Tabulation implies classification, the grouping of items into classes according to various characteristics. And classification depends on clear and precise definitions." (Roy D G Allen, "Statistics for Economists", 1951)

"Charts and graphs are a method of organizing information for a unique purpose. The purpose may be to inform, to persuade, to obtain a clear understanding of certain facts, or to focus information and attention on a particular problem. The information contained in charts and graphs must, obviously, be relevant to the purpose. For decision-making purposes, information must be focused clearly on the issue or issues requiring attention. The need is not simply for 'information', but for structured information, clearly presented and narrowed to fit a distinctive decision-making context. An advantage of having a 'formula' or 'model' appropriate to a given situation is that the formula indicates what kind of information is needed to obtain a solution or answer to a specific problem." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The information on a plot should be relevant to the goals of the analysis. This means that in choosing graphical methods we should match the capabilities of the methods to our needs in the context of each application. [...] Scatter plots, with the views carefully selected as in draftsman's displays, casement displays, and multiwindow plots, are likely to be more informative. We must be careful, however, not to confuse what is relevant with what we expect or want to find. Often wholly unexpected phenomena constitute our most important findings." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"There are two kinds of misrepresentation. In one. the numerical data do not agree with the data in the graph, or certain relevant data are omitted. This kind of misleading presentation. while perhaps hard to determine, clearly is wrong and can be avoided. In the second kind of misrepresentation, the meaning of the data is different to the preparer and to the user." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"Maps used as charts do not need fine cartographic detail. Their purpose is to express ideas, explain relationships, or store data for consultation. Keep your maps simple. Edit out irrelevant detail. Without distortion, try to present the facts as the main feature of your map, which should serve only as a springboard for the idea you're trying to put across." (Bruce Robertson, "How to Draw Charts & Diagrams", 1988)

"Visual displays rich with data are not only an appropriate and proper complement to human capabilities, but also such designs are frequently optimal. If the visual task is contrast, comparison, and choice - as so often it is - then the more relevant information within eyespan, the better. Vacant, low-density displays, the dreaded posterization of data spread over pages and pages, require viewers to rely on visual memory - a weak skill - to make a contrast, a comparison, a choice." (Edward R Tufte, "Envisioning Information", 1990)

"Often many tracings are shown together. Extraneous parts of the tracings must be eliminated and relevant tracings should be placed in a logical order. Repetitious labels should be eliminated and labels added that will fully clarify your information." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Areas surrounding data-lines may generate unintentional optical clutter. Strong frames produce melodramatic but content-diminishing visual effects. [...] A good way to assess a display for unintentional optical clutter is to ask 'Do the prominent visual effects convey relevant content?'" (Edward R Tufte, "Beautiful Evidence", 2006)

"Evidence is evidence, whether words, numbers, images, din grams- still or moving. It is all information after all. For readers and viewers, the intellectual task remains constant regardless of the particular mode Of evidence: to understand and to reason about the materials at hand, and to appraise their quality, relevance. and integrity." (Edward R Tufte, "Beautiful Evidence", 2006)

"People tend to give greater weight to the data that they have just been exposed to than other relevant data. […] This phenomenon, where people give greater attention to recent or easily available data, is often referred to as an availability error." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Making a presentation is a moral act as well as an intellectual activity. The use of corrupt manipulations and blatant rhetorical ploys in a report or presentation - outright lying, flagwaving, personal attacks, setting up phony alternatives, misdirection, jargon-mongering, evading key issues, feigning disinterested objectivity, willful misunderstanding of other points of view - suggests that the presenter lacks both credibility and evidence. To maintain standards of quality, relevance, and integrity for evidence, consumers of presentations should insist that presenters be held intellectually and ethically responsible for what they show and tell. Thus consuming a presentation is also an intellectual and a moral activity." (Edward R Tufte, "Beautiful Evidence", 2006)

"It is important to pay heed to the following detail: a disadvantage of logarithmic diagrams is that a graphical integration is not possible, i.e., the area under the curve (the integral) is of no relevance." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"A beautiful visualization has a clear goal, a message, or a particular perspective on the information that it is designed to convey. Access to this information should be as straightforward as possible, without sacrificing any necessary, relevant complexity. [...] Most importantly, beautiful visualizations reflect the qualities of the data that they represent, explicitly revealing properties and relationships inherent and implicit in the source data. As these properties and relationships become available to the reader, they bring new knowledge, insight, and enjoyment."  (Noah Iliinsky, "On Beauty", [in "Beautiful Visualization"] 2010)

"[...] communicating with data is less often about telling a specific story and more like starting a guided conversation. It is a dialogue with the audience rather than a monologue. While some data presentations may share the linear approach of a traditional story, other data products (analytical tools, in particular) give audiences the flexibility for exploration. In our experience, the best data products combine a little of both: a clear sense of direction defined by the author with the ability for audiences to focus on the information that is most relevant to them. The attributes of the traditional story approach combined with the self-exploration approach leads to the guided safari analogy." (Zach Gemignani et al, "Data Fluency", 2014)

"Further develop the situation or problem by covering relevant background. Incorporate external context or comparison points. Give examples that illustrate the issue. Include data that demonstrates the problem. Articulate what will happen if no action is taken or no change is made. Discuss potential options for addressing the problem. Illustrate the benefits of your recommended solution." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"A well-designed graph clearly shows you the relevant end points of a continuum. This is especially important if you’re documenting some actual or projected change in a quantity, and you want your readers to draw the right conclusions. […]" (Daniel J Levitin, "Weaponized Lies", 2017)

"The relevance to data visualization is that we are always conveying a message to some extent, and in the case of associations between variables, that message is sometimes a step removed from the data itself. If you are making visualizations, be careful not to impose your own interpretation too much when showing associations. If you are reading them, don’t assume that the message accompanying the data is as sound and scientifically based as the data themselves." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"If the data that go into the analysis are flawed, the specific technical details of the analysis don’t matter. One can obtain stupid results from bad data without any statistical trickery. And this is often how bullshit arguments are created, deliberately or otherwise. To catch this sort of bullshit, you don’t have to unpack the black box. All you have to do is think carefully about the data that went into the black box and the results that came out. Are the data unbiased, reasonable, and relevant to the problem at hand? Do the results pass basic plausibility checks? Do they support whatever conclusions are drawn?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"What is the secret to getting people to use charts and dashboards? Personalization. Inserting the audience into the visualization, and making it especially meaningful and relevant to the user, never fails." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

📉Graphical Representation: Parts-to-Whole (Just the Quotes)

"The pie or sector chart makes a comparison of various components with each other and with the whole. However, this type should be used sparingly, especially when there are many segments. It is not only difficult to compare area segments, but most difficult to label them properly. When there are many divisions of the data, a bar chart would give greater clarity." (Mary E Spear, "Charting Statistics", 1952)

"A drawing can show a true picture of both the situation as a whole and its separate components at a glance, and do the job better than could figures or the spoken word. In its essence, a chart is a medium of communication conveying a thought, an idea, a situation from one mind to another and not a work of art or a statistical table. The simpler, the more direct it is, the better it will perform that service which is its sole function." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Without adequate planning, it is seldom possible to achieve either proper emphasis of each component element within the chart or a presentation that is pleasing in its entirely. Too often charts are developed around a single detail without sufficient regard for the work as a whole. Good chart design requires consideration of these four major factors: (1) size, (2) proportion, (3) position and margins, and (4) composition." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"A pie chart is comprised of a circle that is divided into segments by straight lines within the circle. The circle represents the total or whole amount. Each segment or wedge of the circle represents the proportion that a particular factor is of the total or whole amount. Thus, a pie chart in its entirety always represents whole amounts of either 100% or a total absolute number, such as 100 cents or 5,000 people. All of the segments of the pie when taken together (that is, in the aggregate) must add up to the total." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"If you want to dramatize comparisons in relation to the whole. use a pie chart. If you want to add coherence to the narrative, the pie chart also helps because it depicts a whole. If your main interest is in stressing the relationship of one factor to another, use bar charts. If you wish to achieve all these effects. you can use either type of chart. and decide on the basis of which one is more aesthetically or pictorially interesting." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"The bar graph and the column graph are popular because they are simple and easy to read. These are the most versatile of the graph forms. They can be used to display time series, to display the relationship between two items, to make a comparison among several items, and to make a comparison between parts and the whole (total). They do not appear to be as 'statistical', which is an advantage to those people who have negative attitudes toward statistics. The column graph shows values over time, and the bar graph shows values at a point in time. bar graph compares different items as of a specific time (not over time)." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"There was a controversy [in the 1920s][...]about whether the divided bar chart or the pie chart was superior for portraying the parts of a whole. The contest appears to have ended in a draw. We conclude that neither graphical form should be used because other methods are demonstrably better." (William Cleveland & Robert McGill, "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Models", Journal of the American Statistical Association 79, 1984)

"Area graphs are generally not used to convey specific values. Instead, they are most frequently used to show trends and relationships, to identify and/or add emphasis to specific information by virtue of the boldness of the shading or color, or to show parts-of-the-whole." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"The unique thing you get with a pie chart is the concept of there being a whole and, thus, parts of a whole. But if the visual is difficult to read, is it worth it?" (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"Form simplification means simplifying relationships among the components of the whole, emphasizing the whole and reducing the relevance of individual components by standardizing and generalizing relationships. This results in an increased weight of useful information (signal) against useless information (noise)." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"The problem is that a pie chart does one thing well, and most people don’t use it for that one thing. Specifically, they’re great at giving you a fast and accurate estimate of the part-to-whole relationship for two of the slices. Other than that, pie charts are terrible. [...] The same strengths and shortcomings that apply to the pie chart also apply to the donut chart." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"Cohesion means ideas work together to build a unified whole, which helps conversation interlink in purposeful ways, and the basic parts adhere to grammar." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

📉Graphical Representation: Prediction (Just the Quotes)

"Factual science may collect statistics, and make charts. But its predictions are, as has been well said, but past history reversed." (John Dewey, "Art as Experience", 1934)

"The great trouble with all business data upon which the statisticians and economists base their forecasts is that they are ancient history before they ever become available. They pertain to conditions which existed some weeks or months previous. The figures for what is going on at the moment in all lines of business are never available. A business index, while of great interest and value, is always historical and never predictive." (Walter E Weld, "How to Chart; Facts from Figures with Graphs", 1959)

"In part, graphing data needs to be iterative because we often do not know what to expect of the data; a graph can help discover unknown aspects of the data, and once the unknown is known, we frequently find ourselves formulating a new question about the data. Even when we understand the data and are graphing them for presentation, a graph will look different from what we had expected; our mind's eye frequently does not do a good job of predicting what our actual eyes will see." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Changing measures are a particularly common problem with comparisons over time, but measures also can cause problems of their own. [...] We cannot talk about change without making comparisons over time. We cannot avoid such comparisons, nor should we want to. However, there are several basic problems that can affect statistics about change. It is important to consider the problems posed by changing - and sometimes unchanging - measures, and it is also important to recognize the limits of predictions. Claims about change deserve critical inspection; we need to ask ourselves whether apples are being compared to apples - or to very different objects." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Statistics can certainly pronounce a fact, but they cannot explain it without an underlying context, or theory. Numbers have an unfortunate tendency to supersede other types of knowing. […] Numbers give the illusion of presenting more truth and precision than they are capable of providing." (Ronald J Baker, "Measure what Matters to Customers: Using Key Predictive Indicators", 2006)

"Data visualization is a means to an end, not an end in itself. It's merely a bridge connecting the messenger to the receiver and its limitations are framed by our own inherent irrationalities, prejudices, assumptions, and irrational tastes. All these factors can undermine the consistency and reliability of any predicted reaction to a given visualization, but that is something we can't realistically influence." (Andy Kirk, "Data Visualization: A successful design process", 2012)

"The term shrinkage is used in regression modeling to denote two ideas. The first meaning relates to the slope of a calibration plot, which is a plot of observed responses against predicted responses. When a dataset is used to fit the model parameters as well as to obtain the calibration plot, the usual estimation process will force the slope of observed versus predicted values to be one. When, however, parameter estimates are derived from one dataset and then applied to predict outcomes on an independent dataset, overfitting will cause the slope of the calibration plot" (i.e., the shrinkage factor ) to be less than one, a result of regression to the mean. Typically, low predictions will be too low and high predictions too high. Predictions near the mean predicted value will usually be quite accurate. The second meaning of shrinkage is a statistical estimation method that preshrinks regression coefficients towards zero so that the calibration plot for new data will not need shrinkage as its calibration slope will be one." (Frank E. Harrell Jr., "Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis" 2nd Ed, 2015)

"The first myth is that prediction is always based on time-series extrapolation into the future (also known as forecasting). This is not the case: predictive analytics can be applied to generate any type of unknown data, including past and present. In addition, prediction can be applied to non-temporal (time-based) use cases such as disease progression modeling, human relationship modeling, and sentiment analysis for medication adherence, etc. The second myth is that predictive analytics is a guarantor of what will happen in the future. This also is not the case: predictive analytics, due to the nature of the insights they create, are probabilistic and not deterministic. As a result, predictive analytics will not be able to ensure certainty of outcomes." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"Models are formal structures represented in mathematics and diagrams that help us to understand the world. Mastery of models improves your ability to reason, explain, design, communicate, act, predict, and explore." (Scott E Page, "The Model Thinker", 2018)

📉Graphical Representation: Details (Just the Quotes)

"Graphic methods convey to the mind a more comprehensive grasp of essential features than do written reports, because one can naturally gather interesting details from a picture in far less time than from a written description. Further than this, the examination of a picture allows one to make deductions of his own, while in the case of a written description the reader must, to a great degree, accept the conclusions of the author." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"It pays to keep wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled. [...] Data and formulae should be given along with the graph, so that the interested reader may look at the details if he wishes." (Michael J Moroney, "Facts from Figures", 1951)

"Simplicity, accuracy, appropriate size, proper proportion, correct emphasis, and skilled execution - these are the factors that produce the effective chart. To achieve simplicity your chart must be designed with a definite audience in mind, show only essential information. Technical terms should be absent as far as possible. And in case of doubt it is wiser to oversimplify than to make matters unduly complex. Be careful to avoid distortion or misrepresentation. Accuracy in graphics is more a matter of portraying a clear reliable picture than reiterating exact values. Selecting the right scales and employing authoritative titles and legends are as important as precision plotting. The right size of a chart depends on its probable use, its importance, and the amount of detail involved." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Typically, data analysis is messy, and little details clutter it. Not only confounding factors, but also deviant cases, minor problems in measurement, and ambiguous results lead to frustration and discouragement, so that more data are collected than analyzed. Neglecting or hiding the messy details of the data reduces the researcher's chances of discovering something new." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"There are some who argue that a graph is a success only if the important information in the data can be seen within a few seconds. While there is a place for rapidly-understood graphs, it is too limiting to make speed a requirement in science and technology, where the use of graphs ranges from, detailed, in-depth data analysis to quick presentation." (William S Cleveland, "The Elements of Graphing Data", 1985)

"Confusion and clutter are failures of design, not attributes of information. And so the point is to find design strategies that reveal detail and complexity - rather than to fault the data for an excess of complication. Or, worse, to fault viewers for a lack of understanding. Among the most powerful devices for reducing noise and enriching the content of displays is the technique of layering and separation, visually stratifying various aspects of the data." (Edward R Tufte, "Envisioning Information", 1990)

"Lurking behind chartjunk is contempt both for information and for the audience. Chartjunk promoters imagine that numbers and details are boring, dull, and tedious, requiring ornament to enliven. Cosmetic decoration, which frequently distorts the data, will never salvage an underlying lack of content. If the numbers are boring, then you've got the wrong numbers." (Edward R Tufte, "Envisioning Information", 1990)

"Diagrams are a means of communication and explanation, and they facilitate brainstorming. They serve these ends best if they are minimal. Comprehensive diagrams of the entire object model fail to communicate or explain; they overwhelm the reader with detail and they lack meaning." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"Graphical design notations have been with us for a while [...] their primary value is in communication and understanding. A good diagram can often help communicate ideas about a design, particularly when you want to avoid a lot of details. Diagrams can also help you understand either a software system or a business process. As part of a team trying to figure out something, diagrams both help understanding and communicate that understanding throughout a team. Although they aren't, at least yet, a replacement for textual programming languages, they are a helpful assistant." (Martin Fowler, "UML Distilled: A Brief Guide to the Standard Object Modeling", 2004)

"Graphs are for the forest and tables are for the trees. Graphs give you the big picture and show you the trends; tables give you the details." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"One of the easiest ways to display data badly is to display as little information as possible. This includes not labelling axes and titles adequately, and not giving units. In addition, information that is displayed can be obscured by including unnecessary and distracting details." (Jenny Freeman et al, "How to Display Data", 2008)

"Missing data is the blind spot of statisticians. If they are not paying full attention, they lose track of these little details. Even when they notice, many unwittingly sway things our way. Most ranking systems ignore missing values." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Readability in visualization helps people interpret data and make conclusions about what the data has to say. Embed charts in reports or surround them with text, and you can explain results in detail. However, take a visualization out of a report or disconnect it from text that provides context (as is common when people share graphics online), and the data might lose its meaning; or worse, others might misinterpret what you tried to show." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"The first rule of communication is to shut up and listen, so that you can get to know about the audience for your communication, whether it might be politicians, professionals or the general public. We have to understand their inevitable limitations and any misunderstandings, and fight the temptation to be too sophisticated and clever, or put in too much detail." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Dashboards are collections of several linked visualizations all in one place. The idea is very popular as part of business intelligence: having current data on activity summarized and presented all inone place. One danger of cramming a lot of disparate information into one place is that you will quickly hit information overload. Interactivity and small multiples are definitely worth considering as ways of simplifying the information a reader has to digest in a dashboard. As with so many other visualizations, layering the detail for different readers is valuable." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"However, just as in cooking, the details matter: the wrong spice can ruin the stew. In graphing data, different methods or graphical features can make it easier or harder to perceive and understand relationships or comparisons from the same data." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"A semantic approach to visualization focuses on the interplay between charts, not just the selection of charts themselves. The approach unites the structural content of charts with the context and knowledge of those interacting with the composition. It avoids undue and excessive repetition by instead using referential devices, such as filtering or providing detail-on-demand. A cohesive analytical conversation also builds guardrails to keep users from derailing from the conversation or finding themselves lost without context. Functional aesthetics around color, sequence, style, use of space, alignment, framing, and other visual encodings can affect how users follow the script." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"As we enter into certain types of analytical conversations, we expect the conversations to flow in a predictable and cohesive manner. A KPI dashboard, for example, uses redundant structures across specific dimensions or measures to convey information. A dashboard with a top-down exposition style provides high-level information first and clarifies downward, while a bottom-up dashboard starts with the details and clarifies them against the larger picture." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Communication requires the ability to expand or contract a message based on norms within a given culture or language. Expansion provides more detail, sometimes adding in information that is culturally relevant or needed for the person to understand. Contraction preserves the same intent but discards information that isn't needed by that person. Some concepts in certain situations require greater detail than others." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

17 November 2011

📉Graphical Representation: Decision-Making (Just the Quotes)

"Charts and graphs are a method of organizing information for a unique purpose. The purpose may be to inform, to persuade, to obtain a clear understanding of certain facts, or to focus information and attention on a particular problem. The information contained in charts and graphs must, obviously, be relevant to the purpose. For decision-making purposes. information must be focused clearly on the issue or issues requiring attention. The need is not simply for 'information', but for structured information, clearly presented and narrowed to fit a distinctive decision-making context. An advantage of having a 'formula' or 'model' appropriate to a given situation is that the formula indicates what kind of information is needed to obtain a solution or answer to a specific problem." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Graphs can present internal accounting data effectively. Because one of the main functions of the accountant is to communicate accounting information to users. accountants should use graphs, at least to the extent that they clarify the presentation of accounting data. present the data fairly, and enhance management's ability to make a more informed decision. It has been argued that the human brain can absorb and understand images more easily than words and numbers, and, therefore, graphs may be better communicative devices than written reports or tabular statements." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"By showing recent change in relation to many past changes, sparklines provide a context for nuanced analysis - and, one hopes, better decisions. [...] Sparklines efficiently display and narrate binary data (presence/absence, occurrence/non-occurrence, win/loss). [...] Sparklines can simultaneously accommodate several variables. [...] Sparklines can narrate on-going results detail for any process producing sequential binary outcomes." (Edward R Tufte, "Beautiful Evidence", 2006)

"A good chart can tell a story about the data, helping you understand relationships among data so you can make better decisions. The wrong chart can make a royal mess out of even the best data set." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"A data story starts out like any other story, with a beginning and a middle. However, the end should never be a fixed event, but rather a set of options or questions to trigger an action from the audience. Never forget that the goal of data storytelling is to encourage and energize critical thinking for business decisions." (James Richardson, 2017)

"Most of us have difficulty figuring probabilities and statistics in our heads and detecting subtle patterns in complex tables of numbers. We prefer vivid pictures, images, and stories. When making decisions, we tend to overweight such images and stories, compared to statistical information. We also tend to misunderstand or misinterpret graphics." (Daniel J Levitin, "Weaponized Lies", 2017)

"The second rule of communication is to know what you want to achieve. Hopefully the aim is to encourage open debate, and informed decision-making. But there seems no harm in repeating yet again that numbers do not speak for themselves; the context, language and graphic design all contribute to the way the communication is received. We have to acknowledge we are telling a story, and it is inevitable that people will make comparisons and judgements, no matter how much we only want to inform and not persuade. All we can do is try to pre-empt inappropriate gut reactions by design or warning." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Well-designed data graphics provide readers with deeper and more nuanced perspectives, while promoting the use of quantitative information in understanding the world and making decisions." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

📉Graphical Representation: Metaphor (Just the Quotes)

"Every metaphor is the tip of a submerged model. […] Use of theoretical models resembles the use of metaphors in requiring analogical transfer of a vocabulary. Metaphor and model-making reveal new relationships; both are attempts to pour new content into old bottles." (Max Black," Models and Metaphors", 1962)

"One should employ a metaphor in science only when there is good evidence that an important similarity or analogy exists between its primary and secondary subjects. One should seek to discover more about the relevant similarities or analogies, always considering the possibility that there are no important similarities or analogies, or alternatively, that there are quite distinct similarities for which distinct terminology should be introduced. One should try to discover what the 'essential' features of the similarities or analogies are, and one should try to assimilate one’s account of them to other theoretical work in the same subject area - that is, one should attempt to explicate the metaphor." (Richard Boyd, "Metaphor and Theory Change: What Is ‘Metaphor’ a Metaphor For?", 1979)

"The essence of a graphic display is that a set of numbers having both magnitudes and an order are represented by an appropriate visual metaphor - the magnitude and order of the metaphorical representation match the numbers. We can display data badly by ignoring or distorting this concept." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984) 

"Despite the prevailing use of graphs as metaphors for communicating and reasoning about dependencies, the task of capturing informational dependencies by graphs is not at all trivial." (Judea Pearl, "Probabilistic Reasoning in Intelligent Systems: Network of Plausible Inference", 1988)

"Perhaps our ultimate understanding of scientific topics is measured in terms of our ability to generate metaphoric pictures of what is going on. Maybe understanding is coming up with metaphoric pictures." (Per Bak, "How Nature Works: the science of self-organized criticality", 1996)

"Make use of a simple data metaphor. Regardless of the concept you are trying to convey with an information graphic, you must make sure that the visual metaphor (i.e., a circle to represent a whole, as with a pie chart) be clear and logical. Don’t get so caught up in being clever that you make illogical comparisons or use unclear metaphors. In other words, don’t make your readers have to think too hard to get the point. They’ll appreciate you for it!" (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"Specific numbers, visual descriptions of objects or events and identifiable locations don’t always jump out, and a graphic may not always present itself right away. A good graphics reporter will often discover graphics potential in less obvious ways. Is the explanation in a story getting bogged down and hard to follow? If so, can the information be organized differently? Perhaps in a more graphic manner? Is there information that hat can be conveyed conceptually to put a thought or idea into a more visual perspective? Visual metaphors (or 'data metaphors' in the case of mathematical or quantifiable information) often make it easier for people to digest information." (Jennifer George-Palilonis," A Practical Guide to Graphics Reporting: Information Graphics for Print, Web & Broadcast", 2006)

"All graphics by definition employ metaphors, but some are more metaphorical than others. Sometimes the metaphor escapes from its graphical cage, takes on a life of its own and provides exciting deception opportunities." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)

"[…] a graph is nothing but a visual metaphor. To be truthful, it must correspond closely to the phenomena it depicts: longer bars or bigger pie slices must correspond to more, a rising line must correspond to an increasing amount. If a graphical depiction of data does not faithfully follow this principle, it is almost sure to be misleading. But the metaphoric attachment of a graphic goes farther than this. The character of the depiction ism a necessary and sufficient condition for the character of the data. When the data change, so too must their depiction; but when the depiction changes very little, we assume that the data, likewise, are relatively unchanging. If this convention is not followed, we are usually misled." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"All sorts of metaphorical interpretations are culturally ingrained. An astute designer will think about these possible interpretations and work with them, rather than against them." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"Visual metaphors are about integrating a certain visual quality in your work that somehow conveys that extra bit of connection between the data, the design, and the topic. It goes beyond just the choice of visual variable, though this will have a strong influence. Deploying the best visual metaphor is something that really requires a strong design instinct and a certain amount of experience." (Andy Kirk, "Data Visualization: A successful design process", 2012)

16 November 2011

📉Graphical Representation: Action (Just the Quotes)

"The types of graphics used in operating a business fall into three main categories: diagrams, maps, and charts. Diagrams, such as organization diagrams, flow diagrams, and networks, are usually intended to graphically portray how an activity should be, or is being, accomplished, and who is responsible for that accomplishment. Maps such as route maps, location maps, and density maps, illustrate where an activity is, or should be, taking place, and what exists there. [...] Charts such as line charts, column charts, and surface charts, are normally constructed to show the businessman how much and when. Charts have the ability to graphically display the past, present, and anticipated future of an activity. They can be plotted so as to indicate the current direction that is being followed in relationship to what should be followed. They can indicate problems and potential problems, hopefully in time for constructive corrective action to be taken." (Robert D Carlsen & Donald L Vest, "Encyclopedia of Business Charts", 1977)

"Part of the strategy of regression modelling is to improve the model until the residuals look 'structureless', or like a simple random sample. They should only contain structure that is already taken into account (such as nonconstant variance) or imposed by the fitting process itself. By plotting them against a variety of original and derived variables, we can look for systematic patterns that relate to the model's adequacy. Although we talk about graphics for use after the model is fit, if problems with the fit are discovered at this stage of the analysis, We should take corrective action and refit the equation or a modified form of it." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"We can gain further insight into what makes good plots by thinking about the process of visual perception. The eye can assimilate large amounts of visual information, perceive unanticipated structure, and recognize complex patterns; however, certain kinds of patterns are more readily perceived than others. If we thoroughly understood the interaction between the brain, eye, and picture, we could organize displays to take advantage of the things that the eye and brain do best, so that the potentially most important patterns are associated with the most easily perceived visual aspects in the display." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Many of the applications of visualization in this book give the impression that data analysis consists of an orderly progression of exploratory graphs, fitting, and visualization of fits and residuals. Coherence of discussion and limited space necessitate a presentation that appears to imply this. Real life is usually quite different. There are blind alleys. There are mistaken actions. There are effects missed until the very end when some visualization saves the day. And worse, there is the possibility of the nearly unmentionable: missed effects." (William S Cleveland, "Visualizing Data", 1993)

"Anyone who has seen, and especially used, a highly responsive interactive visualization tool will be struck by two features. First, that a mere rearrangement of how the data is displayed can lead to a surprising degree of additional insight into that data. Second, that the very property of interactivity can considerably enhance that tool's effectiveness, especially if the computer's response follows a user's action virtually immediately, say within a fraction of a second." (Robert Spence, "Information Visualization", 2001)

"All good KPIs that I have come across, that have made a difference, had the CEO’s constant attention, with daily calls to the relevant staff. [...] A KPI should tell you about what action needs to take place. [...] A KPI is deep enough in the organization that it can be tied down to an individual. [...] A good KPI will affect most of the core CSFs and more than one BSC perspective. [...] A good KPI has a flow on effect." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Many management reports are not a management tool; they are merely memorandums of information. As a management tool, management reports should encourage timely action in the right direction, by reporting on those activities the Board, management, and staff need to focus on. The old adage 'what gets measured gets done' still holds true." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"A persuasive visualization primarily serves the relationship between the designer and the reader. It is useful when the designer wishes to change the reader’s mind about something. It represents a very specific point of view, and advocates a change of opinion or action on the part of the reader. In this category of visualization, the data represented is specifically chosen for the purpose of supporting the designer’s point of view, and is presented carefully so as to convince the reader of same." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"Data alone isn’t valuable. In fact, it can be expensive in time and resources to manage and maintain. The analysis of this data is closer to something that is valuable. A clearly communicated analysis starts to transform a reflection of the world into knowledge in the minds of people. Even so, knowledge alone does not make your organization better. It is the decisions and actions of people - based on this data-sourced knowledge - that is the goal. But these decisions are seldom made in a vacuum. In most organizations, decisions are a collaborative, social experience. People come together to discuss options, review their knowledge of the situation, and arrive at a path to go down. Herein is one of the great powers of effective data products: They can shape and guide these discussions. Conclusions are seldom clear-cut, even when there is data to support a direction." (Zach Gemignani et al, "Data Fluency", 2014)

"Data captures actions and characteristics of the real world and transforms them into something that can be examined and explored after the fact." (Zach Gemignani et al, "Data Fluency", 2014)

"Just because data is visualized doesn’t necessarily mean that it is accurate, complete, or indicative of the right course of action. Exhibiting a healthy skepticism is almost always a good thing." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Further develop the situation or problem by covering relevant background. Incorporate external context or comparison points. Give examples that illustrate the issue. Include data that demonstrates the problem. Articulate what will happen if no action is taken or no change is made. Discuss potential options for addressing the problem. Illustrate the benefits of your recommended solution." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"If you simply present data, it’s easy for your audience to say, Oh, that’s interesting, and move on to the next thing. But if you ask for action, your audience has to make a decision whether to comply or not. This elicits a more productive reaction from your audience, which can lead to a more productive conversation - one that might never have been started if you hadn’t recommended the action in the first place." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"A data story starts out like any other story, with a beginning and a middle. However, the end should never be a fixed event, but rather a set of options or questions to trigger an action from the audience. Never forget that the goal of data storytelling is to encourage and energize critical thinking for business decisions." (James Richardson, 2017)

"Indicators represent a way of 'distilling' the larger volume of data collected by organizations. As data become bigger and bigger, due to the greater span of control or growing complexity of operations, data management becomes increasingly difficult. Actions and decisions are greatly influenced by the nature, use and time horizon (e.g., short or long-term) of indicators." (Fiorenzo Franceschini et al, "Designing Performance Measurement Systems: Theory and Practice of Key Performance Indicators", 2019)

"The intended endpoint or destination of a data story is to guide an audience toward a better understanding and appreciation of your main point or insight, which hopefully leads to discussion, action, and change. However, if you have several divergent findings and try to combine them into a single data story, you may run the risk of confusing your audience or overwhelming them with too much information. To tell a cohesive data story, you must prioritize and limit what you focus on. Sometimes an insight deserves its own data story rather than being appended to the narrative of another insight." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

📉Graphical Representation: Designers (Just the Quotes)

"The numerous design possibilities include several varieties of line graphs that are geared to particular types of problems. The design of a graph should be adapted to the type of data being structured. The data might be percentages, index numbers, frequency distributions, probability distributions, rates of change, numbers of dollars, and so on. Consequently, the designer must be prepared to structure his graph accordingly." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Because ease of use is the purpose, this ratio of function to conceptual complexity is the ultimate test of system design. Neither function alone nor simplicity alone defines a good design. [...] Function, and not simplicity, has always been the measure of excellence for its designers." (Fred P Brooks, "The Mythical Man-Month: Essays", 1975)

"The interior decoration of graphics generates a lot of ink that does not tell the viewer anything new. The purpose of decoration varies - to make the graphic appear more scientific and precise, to enliven the display, to give the designer an opportunity to exercise artistic skills. Regardless of its cause, it is all non-data-ink or redundant data-ink, and it is often chartjunk."  (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Good design protects you from the need for too many highly accurate components in the system. But such design principles are still, to this date, ill-understood and need to be researched extensively. Not that good designers do not understand this intuitively, merely it is not easily incorporated into the design methods you were taught in school. Good minds are still needed in spite of all the computing tools we have developed." (Richard Hamming, "The Art of Doing Science and Engineering: Learning to Learn", 1997)

"There is no end to the information we can use. A 'good' map provides the information we need for a particular purpose - or the information the mapmaker wants us to have. To guide us, a map’s designers must consider more than content and projection; any single map involves hundreds of decisions about presentation." (Peter Turchi, "Maps of the Imagination: The writer as cartographer", 2004)

"For a given dataset there is not a great deal of advice which can be given on content and context. hose who know their own data should know best for their specific purposes. It is advisable to think hard about what should be shown and to check with others if the graphic makes the desired impression. Design should be let to designers, though some basic guidelines should be followed: consistency is important (sets of graphics should be in similar style and use equivalent scaling); proximity is helpful (place graphics on the same page, or on the facing page, of any text that refers to them); and layout should be checked (graphics should be neither too small nor too large and be attractively positioned relative to the whole page or display)." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"The main goal of data visualization is its ability to visualize data, communicating information clearly and effectively. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex dataset by communicating its key aspects in a more intuitive way. Yet designers often tend to discard the balance between design and function, creating gorgeous data visualizations which fail to serve its main purpose - communicate information." (Vitaly Friedman, "Data Visualization and Infographics", Smashing Magazine, 2008)

"Designers are responsible for the project’s fit and finish, that is, specifying the geometry and sizes of components so they properly mate with each other and are ergonomically and aesthetically acceptable within the operating environment." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Having a purposeless or poorly performing dashboard is more common than not. This happens when the underlying architecture is not designed properly to support the needs of dashboard interaction. There is an obvious disconnect between the design of the data warehouse and the design of the dashboards. The people who design the data warehouse do not know what the dashboard will do; and the people who design the dashboards do not know how the data warehouse was designed, resulting in a lack of cohesion between the two. A similar disconnect can also exist between the dashboard designer and the business analyst, resulting in a dashboard that may look beautiful and dazzling but brings very little business value." (Nils H Rasmussen et al, "Business Dashboards: A visual catalog for design and deployment", 2009)

"Be aware that bar charts provide ample opportunities for chart junk. The space within the bars is enticingly empty and it is tempting to put images or textures in the background. Some designers even swap out the standard bars for graphics." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"All sorts of metaphorical interpretations are culturally ingrained. An astute designer will think about these possible interpretations and work with them, rather than against them." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"A persuasive visualization primarily serves the relationship between the designer and the reader. It is useful when the designer wishes to change the reader’s mind about something. It represents a very specific point of view, and advocates a change of opinion or action on the part of the reader. In this category of visualization, the data represented is specifically chosen for the purpose of supporting the designer’s point of view, and is presented carefully so as to convince the reader of same." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"[...] visual art, primarily serves the relationship between the designer and the data. [...] it often entails unidirectional encoding of information, meaning that the reader may not be able to decode the visual presentation to understand the underlying information. [...] visual art merely translates the data into a visual form. The designer may intend only to condense it, translate it into a new medium, or make it beautiful; she may not intend for the reader to be able to extract anything from it other than enjoyment." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"Information design, when successful - whether in print, on the web, or in the environment - represents the functional balance of the meaning of the information, the skills and inclinations of the designer, and the perceptions, education, experience, and needs of the audience." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)

"Good design is an important part of any visualization, while decoration (or chart-junk) is best omitted. Statisticians should also be careful about comparing themselves to artists and designers; our goals are so different that we will fare poorly in comparison." (Hadley Wickham, "Graphical Criticism: Some Historical Notes", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Developing a clear understanding of the requirements of a particular target audience is a tricky problem for a designer. While it might seem obvious to you that it would be a good idea to understand requirements, it’s a common pitfall for designers to cut corners by making assumptions rather than actually engaging with any target users. " (Tamara Munzner, "Visualization Analysis and Design", 2014)

"Usually, diagrams contain some noise - information unrelated to the diagram’s primary goal. Noise is decorations, redundant, and irrelevant data, unnecessarily emphasized and ambiguous icons, symbols, lines, grids, or labels. Every unnecessary element draws attention away from the central idea that the designer is trying to share. Noise reduces clarity by hiding useful information in a fog of useless data. You may quickly identify noise elements if you can remove them from the diagram or make them less intense and attractive without compromising the function." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015)

"Another problem is that while data visualizations may appear to be objective, the designer has a great deal of control over the message a graphic conveys. Even using accurate data, a designer can manipulate how those data make us feel. She can create the illusion of a correlation where none exists, or make a small difference between groups look big." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

📉Graphical Representation: Composition (Just the Quotes)

"Nothing is so illuminating as a set of properly proportioned diagrams. [...] In addition to the significance of graphics in analytical work, it is likewise a valuable aid to the memory. A picture is manifestly more readily retained in mind than a description of the same subject, no matter how vividly it may have been expressed. A pictorial or diagrammatic illustration usually produces a firmer and more lasting impression than any composition of words or tabulation of figures, however well they may be arranged or set forth." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Without adequate planning, it is seldom possible to achieve either proper emphasis of each component element within the chart or a presentation that is pleasing in its entirely. Too often charts are developed around a single detail without sufficient regard for the work as a whole. Good chart design requires consideration of these four major factors:" (1) size," (2) proportion," (3) position and margins, and" (4) composition." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"As a general rule, plotted points and graph lines should be given more 'weight' than the axes. In this way the 'meat' will be easily distinguishable from the 'bones'. Furthermore, an illustration composed of lines of unequal weights is always more attractive than one in which all the lines are of uniform thickness. It may not always be possible to emphasise the data in this way however. In a scattergram, for example, the more plotted points there are, the smaller they may need to be and this will give them a lighter appearance. Similarly, the more curves there are on a graph, the thinner the lines may need to be. In both cases, the axes may look better if they are drawn with a somewhat bolder line so that they are easily distinguishable from the data." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"Functional visualizations are more than innovative statistical analyses and computational algorithms. They must make sense to the user and require a visual language system that uses color, shape, line, hierarchy and composition to communicate clearly and appropriately, much like the alphabetic and character-based languages used worldwide between humans." (Matt Woolman, "Digital Information Graphics", 2002)

"While visuals are an essential part of data storytelling, data visualizations can serve a variety of purposes from analysis to communication to even art. Most data charts are designed to disseminate information in a visual manner. Only a subset of data compositions is focused on presenting specific insights as opposed to just general information. When most data compositions combine both visualizations and text, it can be difficult to discern whether a particular scenario falls into the realm of data storytelling or not." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"A semantic approach to visualization focuses on the interplay between charts, not just the selection of charts themselves. The approach unites the structural content of charts with the context and knowledge of those interacting with the composition. It avoids undue and excessive repetition by instead using referential devices, such as filtering or providing detail-on-demand. A cohesive analytical conversation also builds guardrails to keep users from derailing from the conversation or finding themselves lost without context. Functional aesthetics around color, sequence, style, use of space, alignment, framing, and other visual encodings can affect how users follow the script." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Aligning on data ink can be a powerful way to build relationships across charts. It can be used to obscure the lines between charts, making the composition feel more seamless. [....] Alignment paradigms can also influence the layout design needed. [...] The layout added to the alignment further supports this relationship." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Beyond basic charts, practitioners must also learn to compose visualizations together elegantly. The perceptual stage focuses on making the literal charts more precise as well as working to de-emphasize the entire piece. Design choices start to consider distractions, reducing visual clutter and centering on the message. Minimalism is espoused as a core value with an emphasis on shifting toward precision as accuracy. This is the most common next step for practitioners. Minimalism is also a key stage in maturation. It is experimentation at one extreme that helps practitioners distill down to core, shared practices." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Chart choices can also create weight within the entire composition. Presenting information as a comprehensive visualization, such as in a dashboard, requires thinking beyond individual charts. In writing, we not only craft sentences, but write the composition as an entire piece. Certain sentences may drive the writing more, but all sentences play a role in conveying the message." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Visualizations are abstractions, relying on primary graphicacy skills to fully understand the composition." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

15 November 2011

📉Graphical Representation: Distribution (Just the Quotes)

"Some distributions [...] are symmetrical about their central value. Other distributions have marked asymmetry and are said to be skew. Skew distributions are divided into two types. If the 'tail' of the distribution reaches out into the larger values of the variate, the distribution is said to show positive skewness; if the tail extends towards the smaller values of the variate, the distribution is called negatively skew." (Michael J Moroney, "Facts from Figures", 1951)

"The impression created by a chart depends to a great extent on the shape of the grid and the distribution of time and amount scales. When your individual figures are a part of a series make sure your own will harmonize with the other illustrations in spacing of grid rulings, lettering, intensity of lines, and planned to take the same reduction by following the general style of the presentation." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"The logarithmic transformation serves several purposes: (1) The resulting regression coefficients sometimes have a more useful theoretical interpretation compared to a regression based on unlogged variables. (2) Badly skewed distributions - in which many of the observations are clustered together combined with a few outlying values on the scale of measurement - are transformed by taking the logarithm of the measurements so that the clustered values are spread out and the large values pulled in more toward the middle of the distribution. (3) Some of the assumptions underlying the regression model and the associated significance tests are better met when the logarithm of the measured variables is taken." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)
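
As a rough illustration of point (2) in the quote above, the sketch below compares a moment-based skewness before and after taking logarithms. The data values, the `skewness` helper, and the choice of base-10 logs are assumptions made for this example and do not come from the quoted book.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical, badly skewed measurements: many clustered values, a few large ones.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 40, 200]
logged = [math.log10(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

For this particular sample the logged values are still skewed to the right, only less so, which is consistent with the caution elsewhere in this collection that transformations are not a magic wand.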

"Plotting on power-transformed scales (either cube roots or logs) is recommended only in those cases where the distribution is very asymmetric and the reference configuration for the untransformed plot would be a straight line through the origin." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Boxplots provide information at a glance about center (median), spread (interquartile range), symmetry, and outliers. With practice they are easy to read and are especially useful for quick comparisons of two or more distributions. Sometimes unexpected features such as outliers, skew, or differences in spread are made obvious by boxplots but might otherwise go unnoticed." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Comparing normal distributions reduces to comparing only means and standard deviations. If standard deviations are the same, the task is even simpler: just compare means. On the other hand, means and standard deviations may be incomplete or misleading as summaries for nonnormal distributions." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"If a distribution were perfectly symmetrical, all symmetry-plot points would be on the diagonal line. Off-line points indicate asymmetry. Points fall above the line when distance above the median is greater than corresponding distance below the median. A consistent run of above-the-line points indicates positive skew; a run of below-the-line points indicates negative skew." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)
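
The coordinates of a symmetry plot like the one described above can be sketched in a few lines. The pairing rule used here (the i-th value below the median against the i-th value above it) is one plausible reading of the quote, and the function name and sample data are hypothetical.

```python
import statistics


def symmetry_pairs(values):
    """Pair the i-th smallest value with the i-th largest one.

    Returns (distance_below_median, distance_above_median) tuples,
    which serve as the x and y coordinates of a symmetry plot.
    """
    ordered = sorted(values)
    median = statistics.median(ordered)
    n = len(ordered)
    return [(median - ordered[i], ordered[n - 1 - i] - median) for i in range(n // 2)]


sample = [1, 2, 3, 4, 5, 7, 10, 15, 30]  # hypothetical data with a long right tail
pairs = symmetry_pairs(sample)

# A point lies above the diagonal when the distance above the median is larger.
above_line = sum(1 for below, above in pairs if above > below)
print(pairs, above_line)
```

Every pair in this sample lies above the diagonal, the consistent run that the quote associates with positive skew.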

"Remember that normality and symmetry are not the same thing. All normal distributions are symmetrical, but not all symmetrical distributions are normal. With water use we were able to transform the distribution to be approximately symmetrical and normal, but often symmetry is the most we can hope for. For practical purposes, symmetry (with no severe outliers) may be sufficient. Transformations are not a magic wand, however. Many distributions cannot even be made symmetrical." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Many good things happen when data distributions are well approximated by the normal. First, the question of whether the shifts among the distributions are additive becomes the question of whether the distributions have the same standard deviation; if so, the shifts are additive. […] A second good happening is that methods of fitting and methods of probabilistic inference, to be taken up shortly, are typically simple and on well understood ground. […] A third good thing is that the description of the data distribution is more parsimonious." (William S Cleveland, "Visualizing Data", 1993)

"The quantile plot is a good general display since it is fairly easy to construct and does a good job of portraying many aspects of a distribution. Three convenient features of the plot are the following: First, in constructing it, we do not make any arbitrary choices of parameter values or cell boundaries [...] and no models for the data are fitted or assumed. Second, like a table, it is not a summary but a display of all the data. Third, on the quantile plot every point is plotted at a distinct location, even if there are duplicates in the data. The number of points that can be portrayed without overlap is limited only by the resolution of the plotting device. For a high resolution device several hundred points can be distinguished." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)
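
A minimal sketch of the coordinates behind such a quantile plot follows. It assumes the common convention of plotting the i-th ordered value against the fraction (i - 0.5)/n; the function name and data are hypothetical, and other fraction conventions exist.

```python
def quantile_plot_points(values):
    """Return (fraction, value) pairs, one per observation, sorted by value."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed convention: the i-th ordered value sits at fraction (i - 0.5) / n.
    return [((i - 0.5) / n, v) for i, v in enumerate(ordered, start=1)]


data = [3, 1, 2, 2, 5]  # hypothetical data containing a duplicate
points = quantile_plot_points(data)
for fraction, value in points:
    print(f"{fraction:.1f}  {value}")
```

Because each observation receives its own fraction, the two tied values land at different positions, which matches the third feature named in the quote.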

"A useful feature of a stem plot is that the values maintain their natural order, while at the same time they are laid out in a way that emphasizes the overall distribution of where the values are concentrated (that is, where the longer branches are). This enables you easily to pick out key values such as the median and quartiles." (Alan Graham, "Developing Thinking in Statistics", 2006)

"When displaying information visually, there are three questions one will find useful to ask as a starting point. Firstly and most importantly, it is vital to have a clear idea about what is to be displayed; for example, is it important to demonstrate that two sets of data have different distributions or that they have different mean values? Having decided what the main message is, the next step is to examine the methods available and to select an appropriate one. Finally, once the chart or table has been constructed, it is worth reflecting upon whether what has been produced truly reflects the intended message. If not, then refine the display until satisfied; for example if a chart has been used would a table have been better or vice versa?" (Jenny Freeman et al, "How to Display Data", 2008)

"'Distribution' refers to how the values of a variable are placed along an axis, keeping the proportional distances taken from the values in the table. In descriptive statistics, there are two complementary ways to study a distribution: searching for what is common (the measures of central tendency) and searching for what is different along with how much different it is (measures of dispersion)." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"The simplest and most common way to represent the empirical distribution of a numerical variable is by showing the individual values as dots arranged along a line. The main difficulty with this plot concerns how to treat tied values. We usually don't want to represent them by the same point, since that means that the two values look like one. What we can do is 'jitter' the points a bit (i.e., move them back and forth at right angles to the plot axis) so that all points are visible. […] In addition to permitting you to identify individual points, dotplots allow you to look into some of the distributional properties of a variable. […] Dotplots can also be good for looking for modality." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)
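
The jittering idea in the quote above can be sketched as follows. The uniform offset, its size, the fixed seed, and the function name are all assumptions chosen for illustration; the quoted authors do not prescribe a particular scheme.

```python
import random


def jitter_points(values, amount=0.2, seed=0):
    """Return (value, offset) pairs; the offset is perpendicular to the plot axis."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return [(v, rng.uniform(-amount, amount)) for v in values]


tied = [4, 4, 4, 7, 7, 9]  # hypothetical data with tied values
points = jitter_points(tied)
for value, offset in points:
    print(f"{value}  {offset:+.3f}")
```

The values themselves are left unchanged, so positions along the axis stay exact while ties become separately visible.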

"There is no ‘correct’ way to display sets of numbers: each of the plots we have used has some advantages: strip-charts show individual points, box-and-whisker plots are convenient for rapid visual summaries, and histograms give a good feel for the underlying shape of the data distribution." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

📉Graphical Representation: Simplification (Just the Quotes)

"Judgment must be used in the showing of figures in any chart or numerical presentation, so that the figures may not give an appearance of greater accuracy than their method of collection would warrant. Too many otherwise excellent reports contain figures which give the impression of great accuracy when in reality the figures may be only the crudest approximations. Except in financial statements, it is a safe rule to use ciphers whenever possible at the right of all numbers of great size. The use of the ciphers greatly simplifies the grasping of the figures by the reader, and, at the same time, it helps to avoid the impression of an accuracy which is not warranted by the methods of collecting the data." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919) 

"The great difference between the graphic representation of yesterday, which was poorly dissociated from the figurative image, and the graphics of tomorrow, is the disappearance of the congenital fixity of the image. […] When one can superimpose, juxtapose, transpose, and permute graphic images in ways that lead to groupings and classings, the graphic image passes from the dead image, the 'illustration,' to the living image, the widely accessible research instrument it is now becoming. The graphic is no longer only the 'representation' of a final simplification, it is a point of departure for the discovery of these simplifications and the means for their justification. The graphic has become, by its manageability, an instrument for information processing." (Jacques Bertin, "Semiology of graphics" ["Semiologie Graphique"], 1967)

"What about confusing clutter? Information overload? Doesn't data have to be ‘boiled down’ and ‘simplified’? These common questions miss the point, for the quantity of detail is an issue completely separate from the difficulty of reading. Clutter and confusion are failures of design, not attributes of information. Often the less complex and less subtle the line, the more ambiguous and less interesting is the reading. Stripping the detail out of data is a style based on personal preference and fashion, considerations utterly indifferent to substantive content." (Edward R Tufte, "Envisioning Information", 1990)

"A good chart delineates and organizes information. It communicates complex ideas, procedures, and lists of facts by simplifying, grouping, and setting and marking priorities. By spatial organization, it should lead the eye through information smoothly and efficiently." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"An axis is the ruler that establishes regular intervals for measuring information. Because it is such a widely accepted convention, it is often taken for granted and its importance overlooked. Axes may emphasize, diminish, distort, simplify, or clutter the information. They must be used carefully and accurately." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Good ideas do not communicate themselves. Ideas must be organized. Highly complex ideas need to be clarified and simplified whereas diffuse data may benefit from being combined. Ideas and data must be made interesting and comprehensible to those not familiar with them." (Mary H Briscoe, "Preparing Scientific Illustrations: A guide to better posters, presentations, and publications" 2nd ed., 1995)

"Mathematical models are continually invoking ideas of infinitely smooth surfaces, weightless strings, weightless beams, perfectly spherical balls, projectiles flying through airless space, gases which are perfectly compressible and liquids which are perfectly incompressible, and so on. The purpose of such simplifications is, in theory, to understand the world better despite the oversimplification, which you hope either will not matter or will be corrected when you construct a second (better) model." (David Wells, "You Are a Mathematician: A wise and witty introduction to the joy of numbers", 1995)

"Charts are used to represent quantitative data in a graphic format. A chart visually illustrates relationships between numbers. When creating a chart, keep in mind that the goal is to represent the data in a simplified and appealing way so as not to muddle the message the chart is meant to convey." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"Information graphics are an essential component of technical communication. Very few technical documents or presentations can be considered complete without graphical elements to present some essential data. Because engineers are visually oriented, graphic aids allow their thoughts and ideas to be better understood by other engineers. Information graphics are essential in presenting data because they simplify the content, offer a visually pleasing alternative to gray text in a proposal or an article, and thereby invite interest." (Dennis K Lieu & Sheryl Sorby, "Visualization, Modeling, and Graphics for Engineering Design", 2009)

"The data is a simplification - an abstraction - of the real world. So when you visualize data, you visualize an abstraction of the world, or at least some tiny facet of it. Visualization is an abstraction of data, so in the end, you end up with an abstraction of an abstraction, which creates an interesting challenge. […] Just like what it represents, data can be complex with variability and uncertainty, but consider it all in the right context, and it starts to make sense." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Form simplification means simplifying relationships among the components of the whole, emphasizing the whole and reducing the relevance of individual components by standardizing and generalizing relationships. This results in an increased weight of useful information (signal) against useless information (noise)." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"GIGO is a famous saying coined by early computer scientists: garbage in, garbage out. At the time, people would blindly put their trust into anything a computer output indicated because the output had the illusion of precision and certainty. If a statistic is composed of a series of poorly defined measures, guesses, misunderstandings, oversimplifications, mismeasurements, or flawed estimates, the resulting conclusion will be flawed." (Daniel J Levitin, "Weaponized Lies", 2017)

14 November 2011

📉Graphical Representation: Boxplots (Just the Quotes)

"Boxplots provide information at a glance about center (median), spread (interquartile range), symmetry, and outliers. With practice they are easy to read and are especially useful for quick comparisons of two or more distributions. Sometimes unexpected features such as outliers, skew, or differences in spread are made obvious by boxplots but might otherwise go unnoticed." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)
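
The quantities a boxplot displays can be computed directly, as in the sketch below. The 1.5 x IQR fences for flagging outliers are a widely used convention rather than something stated in the quote, and the quartile method, function name, and sample data are likewise assumptions made for illustration.

```python
import statistics


def boxplot_summary(values, fence_factor=1.5):
    """Median, quartiles, IQR, and values beyond the (assumed) 1.5 x IQR fences."""
    # "inclusive" treats the data as containing its own minimum and maximum.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - fence_factor * iqr
    high_fence = q3 + fence_factor * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"median": median, "q1": q1, "q3": q3, "iqr": iqr, "outliers": outliers}


sample = [2, 3, 4, 5, 6, 7, 8, 9, 40]  # hypothetical data with one extreme value
summary = boxplot_summary(sample)
print(summary)
```

Different software packages compute quartiles slightly differently, so the box edges may shift a little from one tool to another.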

"A bar graph typically presents either averages or frequencies. It is relatively simple to present raw data (in the form of dot plots or box plots). Such plots provide much more information, and they are closer to the original data. If the bar graph categories are linked in some way - for example, doses of treatments - then a line graph will be much more informative. Very complicated bar graphs containing adjacent bars are very difficult to grasp. If the bar graph represents frequencies, and the abscissa values can be ordered, then a line graph will be much more informative and will have substantially reduced chart junk." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Before calculating a confidence interval for a mean, first check that one of the situations just described holds. To determine whether the data are bell-shaped or skewed, and to check for outliers, plot the data using a histogram, dotplot, or stemplot. A boxplot can reveal outliers and will sometimes reveal skewness, but it cannot be used to determine the shape otherwise. The sample mean and median can also be compared to each other. Differences between the mean and the median usually occur if the data are skewed - that is, are much more spread out in one direction than in the other." (Jessica M Utts & Robert F Heckard, "Mind on Statistics", 2007)

"Symmetry and skewness can be judged, but boxplots are not entirely useful for judging shape. It is not possible to use a boxplot to judge whether or not a dataset is bell-shaped, nor is it possible to judge whether or not a dataset may be bimodal." (Jessica M Utts & Robert F Heckard, "Mind on Statistics", 2007)

"Sorting data is one of the most efficient actions to derive different views of data in order to see the variables from many angles. Sorting is usually not applied to the data itself, but to statistical objects of a plot. We might want to sort the bars in a barchart, the variables in a parallel boxplot or the categories in a boxplot y by x." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Need to consider outliers as they can affect statistics such as means, standard deviations, and correlations. They can either be explained, deleted, or accommodated (using either robust statistics or obtaining additional data to fill-in). Can be detected by methods such as box plots, scatterplots, histograms or frequency distributions." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"A boxplot is a dotplot enhanced with a schematic that provides information about the center and spread of the data, including the median, quartiles, and so on. This is a very useful way of summarizing a variable's distribution. The dotplot can also be enhanced with a diamond-shaped schematic portraying the mean and standard deviation (or the standard error of the mean)." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"Visual clutter is one of the most serious issues with bar charts. Using a bar to represent a simple data point is clearly overkill that results in no room for more data. At times, this may make us overlook less obvious things. The population pyramids offer a glaring example of this. But dot plots are not only about reducing clutter and avoiding overstimulation. Because we don’t compare heights, dot plots actually allow us to break the scale to improve resolution, and that’s a big plus over bar charts." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"[…] the drawback of the box plot is that it tends to hide the values due to its design." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Side-by-side box plots is a simpler approach that can give a crude understanding of the relationship between one quantitative variable and two or more qualitative variables. When we have many subgroups, side-by-side box-and-whisker plots can be very useful for comparing basic features of a distribution." (Deborah Nolan & Sara Stoudt, "Communicating with Data: The Art of Writing for Data Science", 2021)

📉Graphical Representation: Extremes (Just the Quotes)

"Missing data values pose a particularly sticky problem for symbols. For instance, if the ray corresponding to a missing value is simply left off of a star symbol, the result will be almost indistinguishable from a minimum (i.e., an extreme) value. It may be better either (i) to impute a value, perhaps a median for that variable, or a fitted value from some regression on other variables, (ii) to indicate that the value is missing, possibly with a dashed line, or (iii) not to draw the symbol for a particular observation if any value is missing." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Skewness is a measure of symmetry. For example, it's zero for the bell-shaped normal curve, which is perfectly symmetric about its mean. Kurtosis is a measure of the peakedness, or fat-tailedness, of a distribution. Thus, it measures the likelihood of extreme values." (John L Casti, "Reality Rules: Picturing the world in mathematics", 1992)

"If the underlying pattern of the data has gentle curvature with no local maxima and minima, then locally linear fitting is usually sufficient. But if there are local maxima or minima, then locally quadratic fitting typically does a better job of following the pattern of the data and maintaining local smoothness." (William S Cleveland, "Visualizing Data", 1993)

"Variance and its square root, the standard deviation, summarize the amount of spread around the mean, or how much a variable varies. Outliers influence these statistics too, even more than they influence the mean. On the other hand, the variance and standard deviation have important mathematical advantages that make them (together with the mean) the foundation of classical statistics. If a distribution appears reasonably symmetrical, with no extreme outliers, then the mean and standard deviation or variance are the summaries most analysts would use." (Lawrence C Hamilton, "Data Analysis for Social Scientists: A first course in applied statistics", 1995)

"Clearly, the mean is greatly influenced by extreme values, but it can be appropriate for many situations where extreme values do not arise. To avoid misuse, it is essential to know which summary measure best reflects the data and to use it carefully. Understanding the situation is necessary for making the right choice. Know the subject!" (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"A feature shared by both the range and the interquartile range is that they are each calculated on the basis of just two values - the range uses the maximum and the minimum values, while the IQR uses the two quartiles. The standard deviation, on the other hand, has the distinction of using, directly, every value in the set as part of its calculation. In terms of representativeness, this is a great strength. But the chief drawback of the standard deviation is that, conceptually, it is harder to grasp than other more intuitive measures of spread." (Alan Graham, "Developing Thinking in Statistics", 2006)
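
The contrast drawn above can be checked on a small example: moving one interior value leaves the range and the IQR untouched but changes the standard deviation. The data, the quartile method, and the function name are hypothetical choices for this sketch.

```python
import statistics


def spread_measures(values):
    """Range, interquartile range, and (population) standard deviation."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": statistics.pstdev(values),
    }


original = [1, 3, 5, 7, 9, 11, 13]
shifted = [1, 3, 5, 8, 9, 11, 13]  # only the middle value differs

before = spread_measures(original)
after = spread_measures(shifted)
print(before)
print(after)
```

Only the standard deviation registers the change, because it is the one measure of the three that uses every value in the set.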

"Many scientists who work not just with noise but with probability make a common mistake: They assume that a bell curve is automatically Gauss's bell curve. Empirical tests with real data can often show that such an assumption is false. The result can be a noise model that grossly misrepresents the real noise pattern. It also favors a limited view of what counts as normal versus non-normal or abnormal behavior. This assumption is especially troubling when applied to human behavior. It can also lead one to dismiss extreme data as error when in fact the data is part of a pattern." (Bart Kosko, "Noise", 2006)

"Standard quantile graphs offer certain advantages over cumulative percent frequency graphs. Among these advantages are ease of construction, actual data points are shown as opposed to summaries of class intervals, no decisions are required as to what the best size class interval might be, the same curve functions as a less-than and greater-than curve, and the actual maximum and minimum values are shown on the graph." (Robert L Harris, "Information Graphics: A Comprehensive Illustrated Reference", 1996)

"[…] an outlier is an observation that lies an 'abnormal' distance from other values in a batch of data. There are two possible explanations for the occurrence of an outlier. One is that this happens to be a rare but valid data item that is either extremely large or extremely small. The other is that it is a mistake - maybe due to a measuring or recording error." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Plotting data is a useful first stage to any analysis and will show extreme observations together with any discernible patterns. In addition the relative sizes of categories are easier to see in a diagram (bar chart or pie chart) than in a table. Graphs are useful as they can be assimilated quickly, and are particularly helpful when presenting information to an audience. Tables can be useful for displaying information about many variables at once, while graphs can be useful for showing multiple observations on groups or individuals. Although there are no hard and fast rules about when to use a graph and when to use a table, in the context of a report or a paper it is often best to use tables so that the reader can scrutinise the numbers directly." (Jenny Freeman et al, "How to Display Data", 2008)

13 November 2011

📉Graphical Representation: Density (Just the Quotes)

"Although arguments can be made that high data density does not imply that a graphic will be good, nor one with low density bad, it does reflect on the efficiency of the transmission of information. Obviously, if we hold clarity and accuracy constant, more information is better than less. One of the great assets of graphical techniques is that they can convey large amounts of information in a small space." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984) 

"Equal variability is not always achieved in plots. For instance, if the theoretical distribution for a probability plot has a density that drops off gradually to zero in the tails (as the normal density does), then the variability of the data in the tails of the probability plot is greater than in the center. Another example is provided by the histogram. Since the height of any one bar has a binomial distribution, the standard deviation of the height is approximately proportional to the square root of the expected height; hence, the variability of the longer bars is greater." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"[…] the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies. […] Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Visual displays rich with data are not only an appropriate and proper complement to human capabilities, but also such designs are frequently optimal. If the visual task is contrast, comparison, and choice - as so often it is - then the more relevant information within eyespan, the better. Vacant, low-density displays, the dreaded posterization of data spread over pages and pages, require viewers to rely on visual memory - a weak skill - to make a contrast, a comparison, a choice." (Edward R Tufte, "Envisioning Information", 1990)

"We envision information in order to reason about, communicate, document, and preserve that knowledge - activities nearly always carried out on two-dimensional paper and computer screen. Escaping this flatland and enriching the density of data displays are the essential tasks of information design." (Edward R Tufte, "Envisioning Information", 1990)

"Using colour, itʼs possible to increase the density of information even further. A single colour can be used to represent two variables simultaneously. The difficulty, however, is that there is a limited amount of information that can be packed into colour without confusion." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"The use of the density scale to construct the histogram ensures that the area of each rectangle in the histogram will be proportional to the corresponding relative frequency. The formula for density can also be used when class widths are equal. However, when the intervals are of equal width, the extra arithmetic required to obtain the densities is unnecessary." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)
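
The density scale mentioned above can be illustrated with unequal class widths. The sketch assumes the usual definition, density = relative frequency / class width; the class boundaries, counts, and function name are made up for the example.

```python
def histogram_densities(edges, counts):
    """Return (width, relative_frequency, density) for each class interval."""
    total = sum(counts)
    bars = []
    for left, right, count in zip(edges, edges[1:], counts):
        width = right - left
        relative_frequency = count / total
        # Assumed definition: density = relative frequency / class width.
        bars.append((width, relative_frequency, relative_frequency / width))
    return bars


edges = [0, 10, 20, 50]  # hypothetical class boundaries with unequal widths
counts = [5, 10, 5]      # hypothetical frequencies per class
bars = histogram_densities(edges, counts)

# Rectangle area = width x density, which recovers the relative frequency.
areas = [width * density for width, _, density in bars]
print(bars)
print(areas)
```

The first and last classes hold the same share of the data, yet the wider class gets a much shorter bar, so equal areas still signal equal relative frequencies.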

"Linking is a powerful dynamic interactive graphics technique that can help us better understand high-dimensional data. This technique works in the following way: When several plots are linked, selecting an observation's point in a plot will do more than highlight the observation in the plot we are interacting with - it will also highlight points in other plots with which it is linked, giving us a more complete idea of its value across all the variables. Selecting is done interactively with a pointing device. The point selected, and corresponding points in the other linked plots, are highlighted simultaneously. Thus, we can select a cluster of points in one plot and see if it corresponds to a cluster in any other plot, enabling us to investigate the high-dimensional shape and density of the cluster of points, and permitting us to investigate the structure of the disease space." (Forrest W Young et al, "Visual Statistics: Seeing data with dynamic interactive graphics", 2016)

"When there are few data points, place the data labels directly on the data. Data density refers to the amount of data shown in a visualization through encodings (points, bars, lines, etc.). A common mistake is presenting too much data in a single data graph. The data itself can obscure the insight. It can make the chart unreadable because the data values are not discernible. Examples include: overlapping data points, too many lines in a line chart, or too many slices in a pie chart. Selecting the appropriate amount of data requires a delicate balance. It is your job to determine how much detail is necessary." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

📉Graphical Representation: Missing Data (Just the Quotes)

"Missing data values pose a particularly sticky problem for symbols. For instance, if the ray corresponding to a missing value is simply left off of a star symbol, the result will be almost indistinguishable from a minimum (i.e., an extreme) value. It may be better either (i) to impute a value, perhaps a median for that variable, or a fitted value from some regression on other variables, (ii) to indicate that the value is missing, possibly with a dashed line, or (iii) not to draw the symbol for a particular observation if any value is missing." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"We often think, naïvely, that missing data are the primary impediments to intellectual progress - just find the right facts and all problems will dissipate. But barriers are often deeper and more abstract in thought. We must have access to the right metaphor, not only to the requisite information. Revolutionary thinkers are not, primarily, gatherers of facts, but weavers of new intellectual structures." (Stephen J Gould, "The Flamingo's Smile: Reflections in Natural History", 1985)

"Statistics depend on collecting information. If questions go unasked, or if they are asked in ways that limit responses, or if measures count some cases but exclude others, information goes ungathered, and missing numbers result. Nevertheless, choices regarding which data to collect and how to go about collecting the information are inevitable." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"People tend to give greater weight to the data that they have just been exposed to than other relevant data. […] This phenomenon, where people give greater attention to recent or easily available data, is often referred to as an availability error." (Alan Graham, "Developing Thinking in Statistics", 2006)

"There are many reasons for the existence of missing values: the failure of a sensor, different recording standards for different parts of a sample, or structural differences of the objects observed that make it impossible to record all attributes for all observed instances." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"There are several key issues in the field of statistics that impact our analyses once data have been imported into a software program. These data issues are commonly referred to as the measurement scale of variables, restriction in the range of data, missing data values, outliers, linearity, and nonnormality." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling", 3rd Ed., 2010)

"[…] events will always occur that cannot be foreseen by following a chain of logical deductive reasoning. Successful prediction requires intuitive leaps and/or information that is not part of the original data available." (John L Casti, "X-Events: The Collapse of Everything", 2012)

"Missing data is the blind spot of statisticians. If they are not paying full attention, they lose track of these little details. Even when they notice, many unwittingly sway things our way. Most ranking systems ignore missing values." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Having NUMBERSENSE means: (•) Not taking published data at face value; (•) Knowing which questions to ask; (•) Having a nose for doctored statistics. [...] NUMBERSENSE is that bit of skepticism, urge to probe, and desire to verify. It’s having the truffle hog’s nose to hunt the delicacies. Developing NUMBERSENSE takes training and patience. It is essential to know a few basic statistical concepts. Understanding the nature of means, medians, and percentile ranks is important. Breaking down ratios into components facilitates clear thinking. Ratios can also be interpreted as weighted averages, with those weights arranged by rules of inclusion and exclusion. Missing data must be carefully vetted, especially when they are substituted with statistical estimates. Blatant fraud, while difficult to detect, is often exposed by inconsistency." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Accuracy and coherence are related concepts pertaining to data quality. Accuracy refers to the comprehensiveness or extent of missing data, performance of error edits, and other quality assurance strategies. Coherence is the degree to which data - item value and meaning are consistent over time and are comparable to similar variables from other routinely used data sources." (Aileen Rothbard, "Quality Issues in the Use of Administrative Data Records", 2015)

"[…] people attempt to use highly flexible mathematical structures with large numbers of parameters that can be adjusted to fit the data, the result often being models that fit the data well but lack structural representation of the phenomena and thus are not predictive outside the range of the data. The situation is exacerbated by uncertainty regarding model parameters on account of insufficient data relative to model complexity, which in fact means uncertainty regarding the models themselves. More importantly from the standpoint of epistemology, the amount of available data is often miniscule in comparison to the amount needed for validation. The desire for knowledge has far outstripped experimental/observational capability. We are starved for data." (Edward R Dougherty, "The Evolution of Scientific Knowledge: From certainty to uncertainty", 2016)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Unless we’re collecting data ourselves, there’s a limit to how much we can do to combat the problem of missing data. But we can and should remember to ask who or what might be missing from the data we’re being told about. Some missing numbers are obvious […]. Other omissions show up only when we take a close look at the claim in question." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Correlation does not imply causation: often some other missing third variable is influencing both of the variables you are correlating. […] The need for a scatterplot arose when scientists had to examine bivariate relations between distinct variables directly. As opposed to other graphic forms - pie charts, line graphs, and bar charts - the scatterplot offered a unique advantage: the possibility to discover regularity in empirical data (shown as points) by adding smoothed lines or curves designed to pass 'not through, but among them', so as to pass from raw data to a theory-based description, analysis, and understanding." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)
