"It is difficult to understand why statisticians commonly limit their inquiries to Averages, and do not revel in more comprehensive views. Their souls seem as dull to the charm of variety as that of the native of one of our flat English counties, whose retrospect of Switzerland was that, if its mountains could be thrown into its lakes, two nuisances would be got rid of at once. An Average is but a solitary fact, whereas if a single other fact be added to it, an entire Normal Scheme, which nearly corresponds to the observed one, starts potentially into existence." (Sir Francis Galton, "Natural Inheritance", 1889)
"Statistics may rightly be called the science of averages. […] Great numbers and the averages resulting from them, such as we always obtain in measuring social phenomena, have great inertia. […] It is this constancy of great numbers that makes statistical measurement possible. It is to great numbers that statistical measurement chiefly applies." (Sir Arthur L Bowley, "Elements of Statistics", 1901)
"[…] the new mathematics is a sort of supplement to language, affording a means of thought about form and quantity and a means of expression, more exact, compact, and ready than ordinary language. The great body of physical science, a great deal of the essential facts of financial science, and endless social and political problems are only accessible and only thinkable to those who have had a sound training in mathematical analysis, and the time may not be very remote when it will be understood that for complete initiation as an efficient citizen of one of the new great complex world wide states that are now developing, it is as necessary to be able to compute, to think in averages and maxima and minima, as it is now to be able to read and to write." (Herbert G Wells, "Mankind In the Making", 1906)
"Of itself an arithmetic average is more likely to conceal than to disclose important facts; it is the nature of an abbreviation, and is often an excuse for laziness." (Arthur L Bowley, "The Nature and Purpose of the Measurement of Social Phenomena", 1915)
"Averages are like the economic man; they are inventions, not real. When applied to salaries they hide gaunt poverty at the lower end." (Julia Lathrop, 1919)
"Scientific laws, when we have reason to think them accurate, are different in form from the common-sense rules which have exceptions: they are always, at least in physics, either differential equations, or statistical averages." (Bertrand A Russell, "The Analysis of Matter", 1927)
"An average value is a single value within the range of the data that is used to represent all of the values in the series. Since an average is somewhere within the range of the data, it is sometimes called a measure of central value." (Frederick E Croxton & Dudley J Cowden, "Practical Business Statistics", 1937)
"An average is a single value which is taken to represent a group of values. Such a representative value may be obtained in several ways, for there are several types of averages. […] Probably the most commonly used average is the arithmetic average, or arithmetic mean." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)
"Because they are determined mathematically instead of according to their position in the data, the arithmetic and geometric averages are not ascertained by graphic methods." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)
"[…] statistical literacy. That is, the ability to read diagrams and maps; a 'consumer' understanding of common statistical terms, as average, percent, dispersion, correlation, and index number." (Douglas Scates, "Statistics: The Mathematics for Social Problems", 1943)
"[Disorganized complexity] is a problem in which the number of variables is very large, and one in which each of the many variables has a behavior which is individually erratic, or perhaps totally unknown. However, in spite of this helter-skelter, or unknown, behavior of all the individual variables, the system as a whole possesses certain orderly and analyzable average properties. [...] [Organized complexity is] not problems of disorganized complexity, to which statistical methods hold the key. They are all problems which involve dealing simultaneously with a sizable number of factors which are interrelated into an organic whole. They are all, in the language here proposed, problems of organized complexity." (Warren Weaver, "Science and Complexity", American Scientist Vol. 36, 1948)
"The economists, of course, have great fun - and show remarkable skill - in inventing more refined index numbers. Sometimes they use geometric averages instead of arithmetic averages (the advantage here being that the geometric average is less upset by extreme oscillations in individual items), sometimes they use the harmonic average. But these are all refinements of the basic idea of the index number [...]" (Michael J Moroney, "Facts from Figures", 1951)
"The mode would form a very poor basis for any further calculations of an arithmetical nature, for it has deliberately excluded arithmetical precision in the interests of presenting a typical result. The arithmetic average, on the other hand, excellent as it is for numerical purposes, has sacrificed its desire to be typical in favour of numerical accuracy. In such a case it is often desirable to quote both measures of central tendency."
"An average does not tell the full story. It is hardly fully representative of a mass unless we know the manner in which the individual items scatter around it. A further description of the series is necessary if we are to gauge how representative the average is." (George Simpson & Fritz Kafk, "Basic Statistics", 1952)
"An average is a single value selected from a group of values to represent them in some way, a value which is supposed to stand for whole group of which it is part, as typical of all the values in the group." (Albert E Waugh, "Elements of Statistical Methods" 3rd Ed., 1952)
"Only when there is a substantial number of trials involved is the law of averages a useful description or prediction." (Darell Huff, "How to Lie with Statistics", 1954)
"Place little faith in an average or a graph or a trend when those important figures are missing." (Darell Huff, "How to Lie with Statistics", 1954)
"Every economic and social situation or problem is now described in statistical terms, and we feel that it is such statistics which give us the real basis of fact for understanding and analysing problems and difficulties, and for suggesting remedies. In the main we use such statistics or figures without any elaborate theoretical analysis; little beyond totals, simple averages and perhaps index numbers. Figures have become the language in which we describe our economy or particular parts of it, and the language in which we argue about policy." (Ely Devons, "Essays in Economics", 1961)
"The fact that index numbers attempt to measure changes of items gives rise to some knotty problems. The dispersion of a group of products increases with the passage of time, principally because some items have a long-run tendency to fall while others tend to rise. Basic changes in the demand is fundamentally responsible. The averages become less and less representative as the distance from the period increases." (Anna C Rogers, "Graphic Charts Handbook", 1961)
"Myth is more individual and expresses life more precisely than does science. Science works with concepts of averages which are far too general to do justice to the subjective variety of an individual life." (Carl G Jung, "Memories, Dreams, Reflections", 1963)
"An average is sometimes called a 'measure of central tendency' because individual values of the variable usually cluster around it. Averages are useful, however, for certain types of data in which there is little or no central tendency." (William A Spirr & Charles P Bonini, "Statistical Analysis for Business Decisions" 3rd Ed., 1967)
"The most widely used mathematical tools in the social sciences are statistical, and the prevalence of statistical methods has given rise to theories so abstract and so hugely complicated that they seem a discipline in themselves, divorced from the world outside learned journals. Statistical theories usually assume that the behavior of large numbers of people is a smooth, average 'summing-up' of behavior over a long period of time. It is difficult for them to take into account the sudden, critical points of important qualitative change. The statistical approach leads to models that emphasize the quantitative conditions needed for equilibrium-a balance of wages and prices, say, or of imports and exports. These models are ill suited to describe qualitative change and social discontinuity, and it is here that catastrophe theory may be especially helpful." (Alexander Woodcock & Monte Davis, "Catastrophe Theory", 1978)
"The arithmetic mean has another familiar property that will be useful to remember. The sum of the deviations of the values from their mean is zero, and the sum of the squared deviations of the values about the mean is a minimum. That is to say, the sum of the squared deviations is less than the sum of the squared deviations about any other value." (Charles T Clark & Lawrence L Schkade, "Statistical Analysis for Administrative Decisions", 1979)
"Averaging results, whether weighted or not, needs to be done with due caution and commonsense. Even though a measurement has a small quoted error it can still be, not to put too fine a point on it, wrong. If two results are in blatant and obvious disagreement, any average is meaningless and there is no point in performing it. Other cases may be less outrageous, and it may not be clear whether the difference is due to incompatibility or just unlucky chance." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)
"All the law [of large numbers] tells us is that the average of a large number of throws will be more likely than the average of a small number of throws to differ from the true average by less than some stated amount. And there will always be a possibility that the observed result will differ from the true average by a larger amount than the specified bound." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)
"It is a consequence of the definition of the arithmetic mean that the mean will lie somewhere between the lowest and highest values. In the unrealistic and meaningless case that all values which make up the mean are the same, all values will be equal to the average. In an unlikely and impractical case, it is possible for only one of many values to be above or below the average. By the very definition of the average, it is impossible for all values to be above average in any case." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)
"Averages, ranges, and histograms all obscure the time-order for the data. If the time-order for the data shows some sort of definite pattern, then the obscuring of this pattern by the use of averages, ranges, or histograms can mislead the user. Since all data occur in time, virtually all data will have a time-order. In some cases this time-order is the essential context which must be preserved in the presentation." (Donald J Wheeler," Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)
"Since the average is a measure of location, it is common to use averages to compare two data sets. The set with the greater average is thought to ‘exceed’ the other set. While such comparisons may be helpful, they must be used with caution. After all, for any given data set, most of the values will not be equal to the average." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)
"A bar graph typically presents either averages or frequencies. It is relatively simple to present raw data (in the form of dot plots or box plots). Such plots provide much more information. and they are closer to the original data. If the bar graph categories are linked in some way - for example, doses of treatments - then a line graph will be much more informative. Very complicated bar graphs containing adjacent bars are very difficult to grasp. If the bar graph represents frequencies. and the abscissa values can be ordered, then a line graph will be much more informative and will have substantially reduced chart junk." (Gerald van Belle, "Statistical Rules of Thumb", 2002)
"If you want to hide data, try putting it into a larger group and then use the average of the group for the chart. The basis of the deceit is the endearingly innocent assumption on the part of your readers that you have been scrupulous in using a representative average: one from which individual values do not deviate all that much. In scientific or statistical circles, where audiences tend to take less on trust, the 'quality' of the average (in terms of the scatter of the underlying individual figures) is described by the standard deviation, although this figure is itself an average." (Nicholas Strange, "Smoke and Mirrors: How to bend facts and figures to your advantage", 2007)
"Prior to the discovery of the butterfly effect it was generally believed that small differences averaged out and were of no real significance. The butterfly effect showed that small things do matter. This has major implications for our notions of predictability, as over time these small differences can lead to quite unpredictable outcomes. For example, first of all, can we be sure that we are aware of all the small things that affect any given system or situation? Second, how do we know how these will affect the long-term outcome of the system or situation under study? The butterfly effect demonstrates the near impossibility of determining with any real degree of accuracy the long term outcomes of a series of events." (Elizabeth McMillan, Complexity, "Management and the Dynamics of Change: Challenges for practice", 2008)
"Having NUMBERSENSE means: (•) Not taking published data at face value; (•) Knowing which questions to ask; (•) Having a nose for doctored statistics. [...] NUMBERSENSE is that bit of skepticism, urge to probe, and desire to verify. It’s having the truffle hog’s nose to hunt the delicacies. Developing NUMBERSENSE takes training and patience. It is essential to know a few basic statistical concepts. Understanding the nature of means, medians, and percentile ranks is important. Breaking down ratios into components facilitates clear thinking. Ratios can also be interpreted as weighted averages, with those weights arranged by rules of inclusion and exclusion. Missing data must be carefully vetted, especially when they are substituted with statistical estimates. Blatant fraud, while difficult to detect, is often exposed by inconsistency." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)
"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010)
"A very different - and very incorrect - argument is that successes must be balanced by failures (and failures by successes) so that things average out. Every coin flip that lands heads makes tails more likely. Every red at roulette makes black more likely. […] These beliefs are all incorrect. Good luck will certainly not continue indefinitely, but do not assume that good luck makes bad luck more likely, or vice versa." (Gary Smith, "Standard Deviations", 2014)
"The indicators - through no particular fault of anyone in particular - have not kept up with the changing world. As these numbers have become more deeply embedded in our culture as guides to how we are doing, we rely on a few big averages that can never be accurate pictures of complicated systems for the very reason that they are too simple and that they are averages. And we have neither the will nor the resources to invent or refine our current indicators enough to integrate all of these changes." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)
"The search for better numbers, like the quest for new technologies to improve our lives, is certainly worthwhile. But the belief that a few simple numbers, a few basic averages, can capture the multifaceted nature of national and global economic systems is a myth. Rather than seeking new simple numbers to replace our old simple numbers, we need to tap into both the power of our information age and our ability to construct our own maps of the world to answer the questions we need answering." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)
"When a trait, such as academic or athletic ability, is measured imperfectly, the observed differences in performance exaggerate the actual differences in ability. Those who perform the best are probably not as far above average as they seem. Nor are those who perform the worst as far below average as they seem. Their subsequent performances will consequently regress to the mean." (Gary Smith, "Standard Deviations", 2014)
"The more complex the system, the more variable (risky) the outcomes. The profound implications of this essential feature of reality still elude us in all the practical disciplines. Sometimes variance averages out, but more often fat-tail events beget more fat-tail events because of interdependencies. If there are multiple projects running, outlier (fat-tail) events may also be positively correlated - one IT project falling behind will stretch resources and increase the likelihood that others will be compromised." (Paul Gibbons, "The Science of Successful Organizational Change", 2015)
"The no free lunch theorem for machine learning states that, averaged over all possible data generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points. In other words, in some sense, no machine learning algorithm is universally any better than any other. The most sophisticated algorithm we can conceive of has the same average performance (over all possible tasks) as merely predicting that every point belongs to the same class. [...] the goal of machine learning research is not to seek a universal learning algorithm or the absolute best learning algorithm. Instead, our goal is to understand what kinds of distributions are relevant to the 'real world' that an AI agent experiences, and what kinds of machine learning algorithms perform well on data drawn from the kinds of data generating distributions we care about." (Ian Goodfellow et al, "Deep Learning", 2015)
"[…] average isn’t something that should be considered in isolation. Your average is only as good as the data that supports it. If your sample isn’t representative of the full population, if you cherry- picked the data, or if there are other issues with your data, your average may be misleading." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)
"If you’re looking at an average, you are - by definition - studying a specific sample set. If you’re comparing averages, and those averages come from different sample sets, the differences in the sample sets may well be manifested in the averages. Remember, an average is only as good as the underlying data." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)
"In the real world, statistical issues rarely exist in isolation. You’re going to come across cases where there’s more than one problem with the data. For example, just because you identify some sampling errors doesn’t mean there aren’t also issues with cherry picking and correlations and averages and forecasts - or simply more sampling issues, for that matter. Some cases may have no statistical issues, some may have dozens. But you need to keep your eyes open in order to spot them all." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)
"Just as with aggregated data, an average is a summary statistic that can tell you something about the data - but it is only one metric, and oftentimes a deceiving one at that. By taking all of the data and boiling it down to one value, an average (and other summary statistics) may imply that all of the underlying data is the same, even when it’s not." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)
"Keep in mind that a weighted average may be different than a simple (non- weighted) average because a weighted average - by definition - counts certain data points more heavily. When you’re thinking about an average, try to determine if it’s a simple average or a weighted average. If it’s weighted, ask yourself how it’s being weighted, and see which data points count more than others." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)
"Outliers make it very hard to give an intuitive interpretation of the mean, but in fact, the situation is even worse than that. For a real‐world distribution, there always is a mean (strictly speaking, you can define distributions with no mean, but they’re not realistic), and when we take the average of our data points, we are trying to estimate that mean. But when there are massive outliers, just a single data point is likely to dominate the value of the mean and standard deviation, so much more data is required to even estimate the mean, let alone make sense of it." (Field Cady, "The Data Science Handbook", 2017)
"Theoretically, the normal distribution is most famous because many distributions converge to it, if you sample from them enough times and average the results. This applies to the binomial distribution, Poisson distribution and pretty much any other distribution you’re likely to encounter (technically, any one for which the mean and standard deviation are finite)." (Field Cady, "The Data Science Handbook", 2017)
"A recurring theme in machine learning is combining predictions across multiple models. There are techniques called bagging and boosting which seek to tweak the data and fit many estimates to it. Averaging across these can give a better prediction than any one model on its own. But here a serious problem arises: it is then very hard to explain what the model is (often referred to as a 'black box'). It is now a mixture of many, perhaps a thousand or more, models." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"Mean-averages can be highly misleading when the raw data do not form a symmetric pattern around a central value but instead are skewed towards one side [...], typically with a large group of standard cases but with a tail of a few either very high (for example, income) or low (for example, legs) values." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)
"Random forests are essentially an ensemble of trees. They use many short trees, fitted to multiple samples of the data, and the predictions are averaged for each observation. This helps to get around a problem that trees, and many other machine learning techniques, are not guaranteed to find optimal models, in the way that linear regression is. They do a very challenging job of fitting non-linear predictions over many variables, even sometimes when there are more variables than there are observations. To do that, they have to employ 'greedy algorithms', which find a reasonably good model but not necessarily the very best model possible." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
"Unfortunately, when an ‘average’ is reported in the media, it is often unclear whether this should be interpreted as the mean or median." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)
"Average deviation is the average amount of scatter of the items in a distribution from either the mean or the median, ignoring the signs of the deviations. The average that is taken of the scatter is an arithmetic mean, which accounts for the fact that this measure is often called the mean deviation." (Charles T Clark & Lawrence L Schkade)
"While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what anyone man will be up to, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician." (Sir Arthur C Doyle)