28 December 2018

Data Science: Statistics' (Mis)usage (Just the Quotes)

"A witty statesman said, you might prove anything by figures." (Thomas Carlyle, Chartism, 1840)

"It is difficult to understand why statisticians commonly limit their inquiries to Averages, and do not revel in more comprehensive views. Their souls seem as dull to the charm of variety as that of the native of one of our flat English counties, whose retrospect of Switzerland was that, if its mountains could be thrown into its lakes, two nuisances would be got rid of at once. An Average is but a solitary fact, whereas if a single other fact be added to it, an entire Normal Scheme, which nearly corresponds to the observed one, starts potentially into existence." (Sir Francis Galton, "Natural Inheritance", 1889)

"No doubt statistics can be easily misinterpreted; and are often very misleading when first applied to new problems. But many of the worst fallacies involved in the misapplications of statistics are definite and can be definitely exposed, till at last no one ventures to repeat them even when addressing an uninstructed audience: and on the whole arguments which can be reduced to statistical forms, though still in a backward condition, are making more sure and more rapid advances than any others towards obtaining the general acceptance of all who have studied the subjects to which they refer." (Alfred Marshall, "Principles of Economics", 1890)

"A statistical estimate may be good or bad, accurate or the reverse; but in almost all cases it is likely to be more accurate than a casual observer’s impression, and the nature of things can only be disproved by statistical methods." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"Some of the common ways of producing a false statistical argument are to quote figures without their context, omitting the cautions as to their incompleteness, or to apply them to a group of phenomena quite different to that to which they in reality relate; to take these estimates referring to only part of a group as complete; to enumerate the events favorable to an argument, omitting the other side; and to argue hastily from effect to cause, this last error being the one most often fathered on to statistics. For all these elementary mistakes in logic, statistics is held responsible." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"Statistics may, for instance, be called the science of counting. Counting appears at first sight to be a very simple operation, which any one can perform or which can be done automatically; but, as a matter of fact, when we come to large numbers, e.g., the population of the United Kingdom, counting is by no means easy, or within the power of an individual; limits of time and place alone prevent it being so carried out, and in no way can absolute accuracy be obtained when the numbers surpass certain limits." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"Figures may not lie, but statistics compiled unscientifically and analyzed incompetently are almost sure to be misleading, and when this condition is unnecessarily chronic the so-called statisticians may be called liars." (Edwin B Wilson, "Bulletin of the American Mathematical Society", Vol 18, 1912)

"Great discoveries which give a new direction to currents of thoughts and research are not, as a rule, gained by the accumulation of vast quantities of figures and statistics. These are apt to stifle and asphyxiate and they usually follow rather than precede discovery. The great discoveries are due to the eruption of genius into a closely related field, and the transfer of the precious knowledge there found to his own domain." (Theobald Smith, Boston Medical and Surgical Journal Volume 172, 1915)

"Of itself an arithmetic average is more likely to conceal than to disclose important facts; it is the nature of an abbreviation, and is often an excuse for laziness." (Arthur L Bowley, "The Nature and Purpose of the Measurement of Social Phenomena", 1915)

"Averages are like the economic man; they are inventions, not real. When applied to salaries they hide gaunt poverty at the lower end." (Julia Lathrop, 1919)

"A method is a dangerous thing unless its underlying philosophy is understood, and none more dangerous than the statistical. […] Over-attention to technique may actually blind one to the dangers that lurk about on every side- like the gambler who ruins himself with his system carefully elaborated to beat the game. In the long run it is only clear thinking, experienced methods, that win the strongholds of science." (Edwin B Wilson, "The Statistical Significance of Experimental Data", Science, Volume 58 (1493), 1923)

"[…] the methods of statistics are so variable and uncertain, so apt to be influenced by circumstances, that it is never possible to be sure that one is operating with figures of equal weight." (Havelock Ellis, "The Dance of Life", 1923)

"No human mind is capable of grasping in its entirety the meaning of any considerable quantity of numerical data." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"The preliminary examination of most data is facilitated by the use of diagrams. Diagrams prove nothing, but bring outstanding features readily to the eye; they are therefore no substitutes for such critical tests as may be applied to the data, but are valuable in suggesting such tests, and in explaining the conclusions founded upon them." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"Without an adequate understanding of the statistical methods, the investigator in the social sciences may be like the blind man groping in a dark room for a black cat that is not there. The methods of Statistics are useful in an over-widening range of human activities in any field of thought in which numerical data may be had." (Frederick E Croxton & Dudley J Cowden, "Practical Business Statistics", 1937)

"In earlier times they had no statistics and so they had to fall back on lies. Hence the huge exaggerations of primitive literature, giants, miracles, wonders! It's the size that counts. They did it with lies and we do it with statistics: but it's all the same." (Stephen Leacock, "Model memoirs and other sketches from simple to serious", 1939)

"It has long been recognized by public men of all kinds […] that statistics come under the head of lying, and that no lie is so false or inconclusive as that which is based on statistics." (Hilaire Belloc, "The Silence of the Sea", 1940)

"The enthusiastic use of statistics to prove one side of a case is not open to criticism providing the work is honestly and accurately done, and providing the conclusions are not broader than indicated by the data. This type of work must not be confused with the unfair and dishonest use of both accurate and inaccurate data, which too commonly occurs in business. Dishonest statistical work usually takes the form of: (1) deliberate misinterpretation of data; (2) intentional making of overestimates or underestimates; and (3) biasing results by using partial data, making biased surveys, or using wrong statistical methods." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1951)

"By the laws of statistics we could probably approximate just how unlikely it is that it would happen. But people forget - especially those who ought to know better, such as yourself - that while the laws of statistics tell you how unlikely a particular coincidence is, they state just as firmly that coincidences do happen." (Robert A Heinlein, "The Door Into Summer", 1957)

"The statistics themselves prove nothing; nor are they at any time a substitute for logical thinking. There are […] many simple but not always obvious snags in the data to contend with. Variations in even the simplest of figures may conceal a compound of influences which have to be taken into account before any conclusions are drawn from the data." (Alfred R Ilersic, "Statistics", 1959)

"Many people use statistics as a drunkard uses a street lamp - for support rather than illumination. It is not enough to avoid outright falsehood; one must be on the alert to detect possible distortion of truth. One can hardly pick up a newspaper without seeing some sensational headline based on scanty or doubtful data." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Myth is more individual and expresses life more precisely than does science. Science works with concepts of averages which are far too general to do justice to the subjective variety of an individual life." (Carl G Jung, "Memories, Dreams, Reflections", 1963)

"It has been said that data collection is like garbage collection: before you collect it you should have in mind what you are going to do with it." (Russell Fox et al, "The Science of Science", 1964)

"[…] statistical techniques are tools of thought, and not substitutes for thought." (Abraham Kaplan, "The Conduct of Inquiry", 1964)

"He who accepts statistics indiscriminately will often be duped unnecessarily. But he who distrusts statistics indiscriminately will often be ignorant unnecessarily. There is an accessible alternative between blind gullibility and blind distrust. It is possible to interpret statistics skillfully. The art of interpretation need not be monopolized by statisticians, though, of course, technical statistical knowledge helps. Many important ideas of technical statistics can be conveyed to the non-statistician without distortion or dilution. Statistical interpretation depends not only on statistical ideas but also on ordinary clear thinking. Clear thinking is not only indispensable in interpreting statistics but is often sufficient even in the absence of specific statistical knowledge. For the statistician not only death and taxes but also statistical fallacies are unavoidable. With skill, common sense, patience and above all objectivity, their frequency can be reduced and their effects minimised. But eternal vigilance is the price of freedom from serious statistical blunders." (W Allen Wallis & Harry V Roberts, "The Nature of Statistics", 1965)

"The manipulation of statistical formulas is no substitute for knowing what one is doing." (Hubert M Blalock Jr., "Social Statistics" 2nd Ed., 1972)

"Confidence in the omnicompetence of statistical reasoning grows by what it feeds on." (Harry Hopkins, "The Numbers Game: The Bland Totalitarianism", 1973)

"Probably one of the most common misuses (intentional or otherwise) of a graph is the choice of the wrong scale - wrong, that is, from the standpoint of accurate representation of the facts. Even though not deliberate, selection of a scale that magnifies or reduces - even distorts - the appearance of a curve can mislead the viewer." (Peter H Selby, "Interpreting Graphs and Tables", 1976)

"No matter how much reverence is paid to anything purporting to be ‘statistics’, the term has no meaning unless the source, relevance, and truth are all checked." (Tom Burnam, "The Dictionary of Misinformation", 1975)

"Crude measurement usually yields misleading, even erroneous conclusions no matter how sophisticated a technique is used." (Henry T Reynolds, "Analysis of Nominal Data", 1977)

"Graphs are used to meet the need to condense all the available information into a more usable quantity. The selection process of combining and condensing will inevitably produce a less than complete study and will lead the user in certain directions, producing a potential for misleading." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"It is all too easy to notice the statistical sea that supports our thoughts and actions. If that sea loses its buoyancy, it may take a long time to regain the lost support." (William Kruskal, "Coordination Today: A Disaster or a Disgrace", The American Statistician, Vol. 37, No. 3, 1983)

"There are two kinds of misrepresentation. In one. the numerical data do not agree with the data in the graph, or certain relevant data are omitted. This kind of misleading presentation. while perhaps hard to determine, clearly is wrong and can be avoided. In the second kind of misrepresentation, the meaning of the data is different to the preparer and to the user." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"’Common sense’ is not common but needs to [be] learnt systematically […]. A ‘simple analysis’ can be harder than it looks […]. All statistical techniques, however sophisticated, should be subordinate to subjective judgment." (Christopher Chatfield, "The Initial Examination of Data", Journal of The Royal Statistical Society, Series A, Vol. 148, 1985)

"The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." (John Tukey, The American Statistician, 40 (1), 1986)

"Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion." (Stephen M. Stigler, "Neutral Models in Biology", 1987)

"[In statistics] you have the fact that the concepts are not very clean. The idea of probability, of randomness, is not a clean mathematical idea. You cannot produce random numbers mathematically. They can only be produced by things like tossing dice or spinning a roulette wheel. With a formula, any formula, the number you get would be predictable and therefore not random. So as a statistician you have to rely on some conception of a world where things happen in some way at random, a conception which mathematicians don’t have." (Lucien LeCam, [interview] 1988)

"Torture numbers, and they will confess to anything." (Gregg Easterbrook, "New Republic", 1989)

"Statistics is a very powerful and persuasive mathematical tool. People put a lot of faith in printed numbers. It seems when a situation is described by assigning it a numerical value, the validity of the report increases in the mind of the viewer. It is the statistician's obligation to be aware that data in the eyes of the uninformed or poor data in the eyes of the naive viewer can be as deceptive as any falsehoods." (Theoni Pappas, "More Joy of Mathematics: Exploring mathematical insights & concepts", 1991)

"When looking at the end result of any statistical analysis, one must be very cautious not to over interpret the data. Care must be taken to know the size of the sample, and to be certain the method for gathering information is consistent with other samples gathered. […] No one should ever base conclusions without knowing the size of the sample and how random a sample it was. But all too often such data is not mentioned when the statistics are given - perhaps it is overlooked or even intentionally omitted." (Theoni Pappas, "More Joy of Mathematics: Exploring mathematical insights & concepts", 1991)

"[…] an honest exploratory study should indicate how many comparisons were made […] most experts agree that large numbers of comparisons will produce apparently statistically significant findings that are actually due to chance. The data torturer will act as if every positive result confirmed a major hypothesis. The honest investigator will limit the study to focused questions, all of which make biologic sense. The cautious reader should look at the number of ‘significant’ results in the context of how many comparisons were made." (James L Mills, "Data torturing", New England Journal of Medicine, 1993)

"Fairy tales lie just as much as statistics do, but sometimes you can find a grain of truth in them." (Sergei Lukyanenko, "The Night Watch", 1998)

"Averages, ranges, and histograms all obscure the time-order for the data. If the time-order for the data shows some sort of definite pattern, then the obscuring of this pattern by the use of averages, ranges, or histograms can mislead the user. Since all data occur in time, virtually all data will have a time-order. In some cases this time-order is the essential context which must be preserved in the presentation." (Donald J Wheeler," Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"No comparison between two values can be global. A simple comparison between the current figure and some previous value and convey the behavior of any time series. […] While it is simple and easy to compare one number with another number, such comparisons are limited and weak. They are limited because of the amount of data used, and they are weak because both of the numbers are subject to the variation that is inevitably present in weak world data. Since both the current value and the earlier value are subject to this variation, it will always be difficult to determine just how much of the difference between the values is due to variation in the numbers, and how much, if any, of the difference is due to real changes in the process." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Without meaningful data there can be no meaningful analysis. The interpretation of any data set must be based upon the context of those data. Unfortunately, much of the data reported to executives today are aggregated and summed over so many different operating units and processes that they cannot be said to have any context except a historical one - they were all collected during the same time period. While this may be rational with monetary figures, it can be devastating to other types of data." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Since the average is a measure of location, it is common to use averages to compare two data sets. The set with the greater average is thought to ‘exceed’ the other set. While such comparisons may be helpful, they must be used with caution. After all, for any given data set, most of the values will not be equal to the average." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Innumeracy - widespread confusion about basic mathematical ideas - means that many statistical claims about social problems don't get the critical attention they deserve. This is not simply because an innumerate public is being manipulated by advocates who cynically promote inaccurate statistics. Often, statistics about social problems originate with sincere, well-meaning people who are themselves innumerate; they may not grasp the full implications of what they are saying. Similarly, the media are not immune to innumeracy; reporters commonly repeat the figures their sources give them without bothering to think critically about them." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Not all statistics start out bad, but any statistic can be made worse. Numbers - even good numbers - can be misunderstood or misinterpreted. Their meanings can be stretched, twisted, distorted, or mangled. These alterations create what we can call mutant statistics - distorted versions of the original figures." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"The ease with which somewhat complex statistics can produce confusion is important, because we live in a world in which complex numbers are becoming more common. Simple statistical ideas - fractions, percentages, rates - are reasonably well understood by many people. But many social problems involve complex chains of cause and effect that can be understood only through complicated models developed by experts. [...] environment has an influence. Sorting out the interconnected causes of these problems requires relatively complicated statistical ideas - net additions, odds ratios, and the like. If we have an imperfect understanding of these ideas, and if the reporters and other people who relay the statistics to us share our confusion - and they probably do - the chances are good that we'll soon be hearing - and repeating, and perhaps making decisions on the basis of - mutated statistics." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"While some social problems statistics are deliberate deceptions, many - probably the great majority - of bad statistics are the result of confusion, incompetence, innumeracy, or selective, self-righteous efforts to produce numbers that reaffirm principles and interests that their advocates consider just and right. The best response to stat wars is not to try and guess who's lying or, worse, simply to assume that the people we disagree with are the ones telling lies. Rather, we need to watch for the standard causes of bad statistics - guessing, questionable definitions or methods, mutant numbers, and inappropriate comparisons." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"This is true only if you torture the statistics until they produce the confession you want." (Larry Schweikart, "Myths of the 1980s Distort Debate over Tax Cuts", 2001) [source]

"Every number has its limitations; every number is a product of choices that inevitably involve compromise. Statistics are intended to help us summarize, to get an overview of part of the world’s complexity. But some information is always sacrificed in the process of choosing what will be counted and how. Something is, in short, always missing. In evaluating statistics, we should not forget what has been lost, if only because this helps us understand what we still have." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"In short, some numbers are missing from discussions of social issues because certain phenomena are hard to quantify, and any effort to assign numeric values to them is subject to debate. But refusing to somehow incorporate these factors into our calculations creates its own hazards. The best solution is to acknowledge the difficulties we encounter in measuring these phenomena, debate openly, and weigh the options as best we can." (Joel Best, "More Damned Lies and Statistics : How numbers confuse public issues", 2004)

"Another way to obscure the truth is to hide it with relative numbers. […] Relative scales are always given as percentages or proportions. An increase or decrease of a given percentage only tells us part of the story, however. We are missing the anchoring of absolute values." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"A sin of omission – leaving something out – is a strong one and not always recognized; itʼs hard to ask for something you donʼt know is missing. When looking into the data, even before it is graphed and charted, there is potential for abuse. Simply not having all the data or the correct data before telling your story can cause problems and unhappy endings." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"The omission of zero magnifies the ups and downs in the data, allowing us to detect changes that might otherwise be ambiguous. However, once zero has been omitted, the graph is no longer an accurate guide to the magnitude of the changes. Instead, we need to look at the actual numbers." (Gary Smith, "Standard Deviations", 2014)

"The search for better numbers, like the quest for new technologies to improve our lives, is certainly worthwhile. But the belief that a few simple numbers, a few basic averages, can capture the multifaceted nature of national and global economic systems is a myth. Rather than seeking new simple numbers to replace our old simple numbers, we need to tap into both the power of our information age and our ability to construct our own maps of the world to answer the questions we need answering." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Even properly done statistics can’t be trusted. The plethora of available statistical techniques and analyses grants researchers an enormous amount of freedom when analyzing their data, and it is trivially easy to ‘torture the data until it confesses’." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"GIGO is a famous saying coined by early computer scientists: garbage in, garbage out. At the time, people would blindly put their trust into anything a computer output indicated because the output had the illusion of precision and certainty. If a statistic is composed of a series of poorly defined measures, guesses, misunderstandings, oversimplifications, mismeasurements, or flawed estimates, the resulting conclusion will be flawed." (Daniel J Levitin, "Weaponized Lies", 2017)

"Most of us have difficulty figuring probabilities and statistics in our heads and detecting subtle patterns in complex tables of numbers. We prefer vivid pictures, images, and stories. When making decisions, we tend to overweight such images and stories, compared to statistical information. We also tend to misunderstand or misinterpret graphics." (Daniel J Levitin, "Weaponized Lies", 2017)

"If we don’t understand the statistics, we’re likely to be badly mistaken about the way the world is. It is all too easy to convince ourselves that whatever we’ve seen with our own eyes is the whole truth; it isn’t. Understanding causation is tough even with good statistics, but hopeless without them. [...] And yet, if we understand only the statistics, we understand little. We need to be curious about the world that we see, hear, touch, and smell, as well as the world we can examine through a spreadsheet." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Do not put faith in what statistics say until you have carefully considered what they do not say." (William W Watt)

"Errors using inadequate data are much less than those using no data at all." (Charles Babbage)

"Facts are stubborn things, but statistics are pliable." (Mark Twain)

"I can prove anything by statistics except the truth." (George Canning

"If the statistics are boring, you've got the wrong numbers." (Edward Tufte)

"If your experiment needs statistics, you ought to have done a better experiment." (Ernest Rutherford)

"It is easy to lie with statistics. It is hard to tell the truth without it." (Andrejs Dunkels)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.