"Statistics may be defined as numerical statements of facts by means of which large aggregates are analyzed, the relations of individual units to their groups are ascertained, comparisons are made between groups, and continuous records are maintained for comparative purposes." (Melvin T Copeland. "Statistical Methods" [in: Harvard Business Studies, Vol. III, Ed. by Melvin T Copeland, 1917])
"The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation. […] Significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data." (Cherry A Clark, "Hypothesis Testing in Relation to Statistical Methodology", Review of Educational Research Vol. 33, 1963)
"[…] fitting lines to relationships between variables is often a useful and powerful method of summarizing a set of data. Regression analysis fits naturally with the development of causal explanations, simply because the research worker must, at a minimum, know what he or she is seeking to explain." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)
"Fitting lines to relationships between variables is the major tool of data analysis. Fitted lines often effectively summarize the data and, by doing so, help communicate the analytic results to others. Estimating a fitted line is also the first step in squeezing further information from the data.
"Modern data graphics can do much more than simply substitute for small statistical tables. At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers even a very large set - is to look at pictures of those numbers. Furthermore, of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)
"The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions." (Fergus Daly et al, "Elements of Statistics", 1995)
"Without meaningful data there can be no meaningful analysis. The interpretation of any data set must be based upon the context of those data. Unfortunately, much of the data reported to executives today are aggregated and summed over so many different operating units and processes that they cannot be said to have any context except a historical one - they were all collected during the same time period. While this may be rational with monetary figures, it can be devastating to other types of data." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)
"Ockham's Razor in statistical analysis is used implicitly when models are embedded in richer models -for example, when testing the adequacy of a linear model by incorporating a quadratic term. If the coefficient of the quadratic term is not significant, it is dropped and the linear model is assumed to summarize the data adequately." (Gerald van Belle, "Statistical Rules of Thumb", 2002)
"Every number has its limitations; every number is a product of choices that inevitably involve compromise. Statistics are intended to help us summarize, to get an overview of part of the world’s complexity. But some information is always sacrificed in the process of choosing what will be counted and how. Something is, in short, always missing. In evaluating statistics, we should not forget what has been lost, if only because this helps us understand what we still have."
"Data often arrive in raw form, as long lists of numbers. In this case your job is to summarize the data in a way that captures its essence and conveys its meaning. This can be done numerically, with measures such as the average and standard deviation, or graphically. At other times you find data already in summarized form; in this case you must understand what the summary is telling, and what it is not telling, and then interpret the information for your readers or viewers." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)
"Whereas regression is about attempting to specify the underlying relationship that summarises a set of paired data, correlation is about assessing the strength of that relationship. Where there is a very close match between the scatter of points and the regression line, correlation is said to be 'strong' or 'high' . Where the points are widely scattered, the correlation is said to be 'weak' or 'low'.
"Numerical precision should be consistent throughout and summary statistics such as means and standard deviations should not have more than one extra decimal place (or significant digit) compared to the raw data. Spurious precision should be avoided although when certain measures are to be used for further calculations or when presenting the results of analyses, greater precision may sometimes be appropriate." (Jenny Freeman et al, "How to Display Data", 2008)
"Graphical displays are often constructed to place principal focus on the individual observations in a dataset, and this is particularly helpful in identifying both the typical positions of data points and unusual or influential cases. However, in many investigations, principal interest lies in identifying the nature of underlying trends and relationships between variables, and so it is often helpful to enhance graphical displays in ways which give deeper insight into these features. This can be very beneficial both for small datasets, where variation can obscure underlying patterns, and large datasets, where the volume of data is so large that effective representation inevitably involves suitable summaries." (Adrian W Bowman, "Smoothing Techniques for Visualisation" [in "Handbook of Data Visualization"], 2008)
"There are three possible reasons for [the] absence of predictive power. First, it is possible that the models are misspecified. Second, it is possible that the model’s explanatory factors are measured at too high a level of aggregation [...] Third, [...] the search for statistically significant relationships may not be the strategy best suited for evaluating our model’s ability to explain real world events [...] the lack of predictive power is the result of too much emphasis having been placed on finding statistically significant variables, which may be overdetermined. Statistical significance is generally a flawed way to prune variables in regression models [...] Statistically significant variables may actually degrade the predictive accuracy of a model [...] [By using] models that are constructed on the basis of pruning undertaken with the shears of statistical significance, it is quite possible that we are winnowing our models away from predictive accuracy." (Michael D Ward et al, "The perils of policy by p-value: predicting civil conflicts" Journal of Peace Research 47, 2010)
"In order to be effective a descriptive statistic has to make sense - it has to distill some essential characteristic of the data into a value that is both appropriate and understandable. […] the justification for computing any given statistic must come from the nature of the data themselves - it cannot come from the arithmetic, nor can it come from the statistic. If the data are a meaningless collection of values, then the summary statistics will also be meaningless - no arithmetic operation can magically create meaning out of nonsense. Therefore, the meaning of any statistic has to come from the context for the data, while the appropriateness of any statistic will depend upon the use we intend to make of that statistic." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)
"[...] things that seem hopelessly random and unpredictable when viewed in isolation often turn out to be lawful and predictable when viewed in aggregate." (Steven Strogatz, "The Joy of X: A Guided Tour of Mathematics, from One to Infinity", 2012)
"In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world: our models are not the reality - a point well made by George Box in his oft-cited remark that "all models are wrong, but some are useful". (David Hand, "Wonderful examples, but let's not close our eyes", Statistical Science 29, 2014)
"Decision trees are also discriminative models. Decision trees are induced by recursively partitioning the feature space into regions belonging to the different classes, and consequently they define a decision boundary by aggregating the neighboring regions belonging to the same class. Decision tree model ensembles based on bagging and boosting are also discriminative models." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)
"Just as with aggregated data, an average is a summary statistic that can tell you something about the data - but it is only one metric, and oftentimes a deceiving one at that. By taking all of the data and boiling it down to one value, an average (and other summary statistics) may imply that all of the underlying data is the same, even when it’s not." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)
"At very small time scales, the motion of a particle is more like a random walk, as it gets jostled about by discrete collisions with water molecules. But virtually any random movement on small time scales will give rise to Brownian motion on large time scales, just so long as the motion is unbiased. This is because of the Central Limit Theorem, which tells us that the aggregate of many small, independent motions will be normally distributed." (Field Cady, "The Data Science Handbook", 2017)
"The most accurate but least interpretable form of data presentation is to make a table, showing every single value. But it is difficult or impossible for most people to detect patterns and trends in such data, and so we rely on graphs and charts. Graphs come in two broad types: Either they represent every data point visually (as in a scatter plot) or they implement a form of data reduction in which we summarize the data, looking, for example, only at means or medians."
"Again, classical statistics only summarizes data, so it does not provide even a language for asking [a counterfactual] question. Causal inference provides a notation and, more importantly, offers a solution. As with predicting the effect of interventions [...], in many cases we can emulate human retrospective thinking with an algorithm that takes what we know about the observed world and produces an answer about the counterfactual world." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)
"While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what anyone man will be up to, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician." (Sir Arthur C Doyle)