29 April 2006

🖍️John W Tukey - Collected Quotes

"[We] need men who can practice science - not a particular science - in a word, we need scientific generalists." (John W Tukey, "The Education of a Scientific Generalist", 1949)

"[...] the whole of modern statistics, philosophy and methods alike, is based on the principle of interpreting what did happen in terms of what might have happened." (John W Tukey, "Standard Methods of Analyzing Data", 1951)

"Just remember that not all statistics has been mathematized - and that we may not have to wait for its mathematization in order to use it." (John W Tukey, "The Growth of Experimental Design in a Research Laboratory", 1953)

"Difficulties in identifying problems have delayed statistics far more than difficulties in solving problems." (John W Tukey, "Unsolved Problems of Experimental Statistics", 1954)

"Predictions, prophecies, and perhaps even guidance - those who suggested this title to me must have hoped for such - even though occasional indulgences in such actions by statisticians have undoubtedly contributed to the characterization of a statistician as a man who draws straight lines from insufficient data to foregone conclusions!" (John W Tukey, "Where do We Go From Here?", Journal of the American Statistical Association, Vol. 55 (289), 1960)

"Today one of statistics' great needs is a body of able investigators who make it clear to the intellectual world that they are scientific statisticians, and they are proud of the fact that to them mathematics is incidental, though perhaps indispensable." (John W Tukey, "Statistical and Quantitative Methodology", 1961)

"If data analysis is to be well done, much of it must be a matter of judgment, and ‘theory’ whether statistical or non-statistical, will have to guide, not command." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics, Vol. 33 (1), 1962)

"The most important maxim for data analysis to heed, and one which many statisticians seem to have shunned is this: ‘Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.’ Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics, Vol. 33 (1), 1962)

"The physical sciences are used to ‘praying over’ their data, examining the same data from a variety of points of view. This process has been very rewarding, and has led to many extremely valuable insights. Without this sort of flexibility, progress in physical science would have been much slower. Flexibility in analysis is often to be had honestly at the price of a willingness not to demand that what has already been observed shall establish, or prove, what analysis suggests. In physical science generally, the results of praying over the data are thought of as something to be put to further test in another experiment, as indications rather than conclusions." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics, Vol. 33 (1), 1962)

"The histogram, with its columns of area proportional to number, like the bar graph, is one of the most classical of statistical graphs. Its combination with a fitted bell-shaped curve has been common since the days when the Gaussian curve entered statistics. Yet as a graphical technique it really performs quite poorly. Who is there among us who can look at a histogram-fitted Gaussian combination and tell us, reliably, whether the fit is excellent, neutral, or poor? Who can tell us, when the fit is poor, of what the poorness consists? Yet these are just the sort of questions that a good graphical technique should answer at least approximately." (John W Tukey, "The Future of Processes of Data Analysis", 1965)

"The first step in data analysis is often an omnibus step. We dare not expect otherwise, but we equally dare not forget that this step, and that step, and the other step, are all omnibus steps and that we owe the users of such techniques a deep and important obligation to develop ways, often varied and competitive, of replacing omnibus procedures by ones that are more sharply focused." (John W Tukey, "The Future of Processes of Data Analysis", 1965)

"The basic general intent of data analysis is simply stated: to seek through a body of data for interesting relationships and information and to exhibit the results in such a way as to make them recognizable to the data analyzer and recordable for posterity. Its creative task is to be productively descriptive, with as much attention as possible to previous knowledge, and thus to contribute to the mysterious process called insight." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"Comparable objectives in data analysis are (1) to achieve more specific description of what is loosely known or suspected; (2) to find unanticipated aspects in the data, and to suggest unthought-of-models for the data's summarization and exposure; (3) to employ the data to assess the (always incomplete) adequacy of a contemplated model; (4) to provide both incentives and guidance for further analysis of the data; and (5) to keep the investigator usefully stimulated while he absorbs the feeling of his data and considers what to do next." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"The science and art of data analysis concerns the process of learning from quantitative records of experience. By its very nature it exists in relation to people. Thus, the techniques and the technology of data analysis must be harnessed to suit human requirements and talents. Some implications for effective data analysis are: (1) that it is essential to have convenience of interaction of people and intermediate results and (2) that at all stages of data analysis the nature and detail of output, both actual and potential, need to be matched to the capabilities of the people who use it and want it." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"In many instances, a picture is indeed worth a thousand words. To make this true in more diverse circumstances, much more creative effort is needed to pictorialize the output from data analysis. Naive pictures are often extremely helpful, but more sophisticated pictures can be both simple and even more informative." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"Data analysis must be iterative to be effective. [...] The iterative and interactive interplay of summarizing by fit and exposing by residuals is vital to effective data analysis. Summarizing and exposing are complementary and pervasive." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"Summarizing data is a process of constrained and partial description - a process that essentially and inevitably corresponds to some sort of fitting, though it need not necessarily involve formal criteria or well-defined computations." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"The typical statistician has learned from bitter experience that negative results are just as important as positive ones, sometimes more so." (John W Tukey, "A Statistician's Comment", 1967)

"It is fair to say that statistics has made its greatest progress by having to move away from certainty [...] If we really want to make progress, we need to identify our next step away from certainty." (John W Tukey, "What Have Statisticians Been Forgetting", 1967)

"Every student of the art of data analysis repeatedly needs to build upon his previous statistical knowledge and to reform that foundation through fresh insights and emphasis." (John W Tukey, "Data Analysis, Including Statistics", 1968)

"Every graph is at least an indication, by contrast with some common instances of numbers." (John W Tukey, "Data Analysis, Including Statistics", 1968)

"Nothing can substitute for relatively direct assessment of variability." (John W Tukey, "Data Analysis, Including Statistics", 1968)

"No one knows how to appraise a procedure safely except by using different bodies of data from those that determined it."  (John W Tukey, "Data Analysis, Including Statistics", 1968)

"The problems of different fields are much more alike than their practitioners think, much more alike than different."  (John W Tukey, "Analyzing Data: Sanctification or Detective Work?", 1969)

"[...] bending the question to fit the analysis is to be shunned at all costs." (John W Tukey, "Analyzing Data: Sanctification or Detective Work?", 1969)

"Data analysis is in important ways an antithesis of pure mathematics." (John W Tukey, "Data Analysis, Computation and Mathematics", 1972)

"Undoubtedly, the swing to exploratory data analysis will go somewhat too far. However: it is better to ride a damped pendulum than to be stuck in the mud." (John W Tukey, "Exploratory Data Analysis as Part of a Larger Whole", 1973)

"The greatest value of a picture is when it forces us to notice what we never expected to see." (John W Tukey, "Exploratory Data Analysis", 1977)

"[...] exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as for those we believe might be there. Except for its emphasis on graphs, its tools are secondary to its purpose." (John W Tukey, [comment] 1979)

"There is NO question of teaching confirmatory OR exploratory - we need to teach both." (John W Tukey, "We Need Both Exploratory and Confirmatory", 1980)

"Finding the question is often more important than finding the answer." (John W Tukey, "We Need Both Exploratory and Confirmatory", 1980)

"[...] any hope that we are smart enough to find even transiently optimum solutions to our data analysis problems is doomed to failure, and, indeed, if taken seriously, will mislead us in the allocation of effort, thus wasting both intellectual and computational effort." (John W Tukey, "Choosing Techniques for the Analysis of Data", 1981)

"Detailed study of the quality of data sources is an essential part of applied work. [...] Data analysts need to understand more about the measurement processes through which their data come. To know the name by which a column of figures is headed is far from being enough." (John W Tukey, "An Overview of Techniques of Data Analysis, Emphasizing Its Exploratory Aspects", 1982)

"Exploratory data analysis, EDA, calls for a relatively free hand in exploring the data, together with dual obligations: (•) to look for all plausible alternatives and oddities - and a few implausible ones, (graphic techniques can be most helpful here) and (•) to remove each appearance that seems large enough to be meaningful - ordinarily by some form of fitting, adjustment, or standardization [...] so that what remains, the residuals, can be examined for further appearances." (John W Tukey, "Introduction to Styles of Data Analysis Techniques", 1982)

"The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." (John W Tukey, "Sunset Salvo", The American Statistician Vol. 40 (1), 1986)

"The worst, i.e., most dangerous, feature of 'accepting the null hypothesis' is the giving up of explicit uncertainty. […] Mathematics can sometimes be put in such black-and-white terms, but our knowledge or belief about the external world never can." (John W Tukey, "The Philosophy of Multiple Comparisons", Statistical Science Vol. 6 (1), 1991)

"Statistics is the science, the art, the philosophy, and the technique of making inferences from the particular to the general." (John W Tukey)

28 April 2006

🖍️William E Deming - Collected Quotes

"The definition of random in terms of a physical operation is notoriously without effect on the mathematical operations of statistical theory because so far as these mathematical operations are concerned random is purely and simply an undefined term." (Walter A Shewhart & William E Deming, "Statistical Method from the Viewpoint of Quality Control", 1939)

"Experience without theory teaches nothing." (William E Deming, "Out of the Crisis", 1986)

"It is important to realize that it is not the one measurement, alone, but its relation to the rest of the sequence that is of interest." (William E Deming, "Statistical Adjustment of Data", 1943)

"Sampling is the science and art of controlling and measuring the reliability of useful statistical information through the theory of probability." (William E Deming, "Some Theory of Sampling", 1950)

"Why waste knowledge? [...] No company can afford to waste knowledge. Failure of management to break down barriers between activities [...] is one way to waste knowledge. People that are not working together are not contributing their best to the company. People as they work together, feeling secure in the job, reinforce their knowledge and efforts. Their combined output, when they are working together, is more than the sum of their separate efforts." (William E Deming, "Quality, Productivity and Competitive Position", 1982)

"Experience by itself teaches nothing [...] Without theory, experience has no meaning. Without theory, one has no questions to ask. Hence without theory there is no learning." (William E Deming, "The New Economics for Industry, Government, Education", 1993)

"Knowledge is theory. We should be thankful if action of management is based on theory. Knowledge has temporal spread. Information is not knowledge. The world is drowning in information but is slow in acquisition of knowledge. There is no substitute for knowledge." (William E Deming, "The New Economics for Industry, Government, Education", 1993) 

"What is a system? A system is a network of interdependent components that work together to try to accomplish the aim of the system. A system must have an aim. Without an aim, there is no system. The aim of the system must be clear to everyone in the system. The aim must include plans for the future. The aim is a value judgment." (William E Deming, "The New Economics for Industry, Government, Education", 1993)

"The only useful function of a statistician is to make predictions, and thus to provide a basis for action." (William E Deming)

"Too little attention is given to the need for statistical control, or to put it more pertinently, since statistical control (randomness) is so rarely found, too little attention is given to the interpretation of data that arise from conditions not in statistical control." (William E Deming)

🖍️Donald J Wheeler - Collected Quotes

"Averages, ranges, and histograms all obscure the time-order for the data. If the time-order for the data shows some sort of definite pattern, then the obscuring of this pattern by the use of averages, ranges, or histograms can mislead the user. Since all data occur in time, virtually all data will have a time-order. In some cases this time-order is the essential context which must be preserved in the presentation." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Before you can improve any system you must listen to the voice of the system (the Voice of the Process). Then you must understand how the inputs affect the outputs of the system. Finally, you must be able to change the inputs (and possibly the system) in order to achieve the desired results. This will require sustained effort, constancy of purpose, and an environment where continual improvement is the operating philosophy." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Data are collected as a basis for action. Yet before anyone can use data as a basis for action the data have to be interpreted. The proper interpretation of data will require that the data be presented in context, and that the analysis technique used will filter out the noise."  (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Data are generally collected as a basis for action. However, unless potential signals are separated from probable noise, the actions taken may be totally inconsistent with the data. Thus, the proper use of data requires that you have simple and effective methods of analysis which will properly separate potential signals from probable noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"No comparison between two values can be global. A simple comparison between the current figure and some previous value cannot convey the behavior of any time series. […] While it is simple and easy to compare one number with another number, such comparisons are limited and weak. They are limited because of the amount of data used, and they are weak because both of the numbers are subject to the variation that is inevitably present in real world data. Since both the current value and the earlier value are subject to this variation, it will always be difficult to determine just how much of the difference between the values is due to variation in the numbers, and how much, if any, of the difference is due to real changes in the process." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"No matter what the data, and no matter how the values are arranged and presented, you must always use some method of analysis to come up with an interpretation of the data.
While every data set contains noise, some data sets may contain signals. Therefore, before you can detect a signal within any given data set, you must first filter out the noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"We analyze numbers in order to know when a change has occurred in our processes or systems. We want to know about such changes in a timely manner so that we can respond appropriately. While this sounds rather straightforward, there is a complication - the numbers can change even when our process does not. So, in our analysis of numbers, we need to have a way to distinguish those changes in the numbers that represent changes in our process from those that are essentially noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"When a system is predictable, it is already performing as consistently as possible. Looking for assignable causes is a waste of time and effort. Instead, you can meaningfully work on making improvements and modifications to the process. When a system is unpredictable, it will be futile to try and improve or modify the process. Instead you must seek to identify the assignable causes which affect the system. The failure to distinguish between these two different courses of action is a major source of confusion and wasted effort in business today." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"When a process displays unpredictable behavior, you can most easily improve the process and process outcomes by identifying the assignable causes of unpredictable variation and removing their effects from your process." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"While all data contain noise, some data contain signals. Before you can detect a signal, you must filter out the noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Without meaningful data there can be no meaningful analysis. The interpretation of any data set must be based upon the context of those data. Unfortunately, much of the data reported to executives today are aggregated and summed over so many different operating units and processes that they cannot be said to have any context except a historical one - they were all collected during the same time period. While this may be rational with monetary figures, it can be devastating to other types of data." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"[…] you simply cannot make sense of any number without a contextual basis. Yet the traditional attempts to provide this contextual basis are often flawed in their execution. [...] Data have no meaning apart from their context. Data presented without a context are effectively rendered meaningless." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"A control chart is a tool for maintaining the status quo - it was created to monitor a process after that process has been brought to a satisfactory level of operation." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"Data analysis is not generally thought of as being simple or easy, but it can be. The first step is to understand that the purpose of data analysis is to separate any signals that may be contained within the data from the noise in the data. Once you have filtered out the noise, anything left over will be your potential signals. The rest is just details." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)
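The filter Wheeler has in mind is the process behavior chart itself; for individual values the usual XmR recipe sets limits at the average plus or minus 2.66 times the average moving range. A minimal sketch with made-up data:

```python
# Process behavior (XmR) chart limits: values outside the limits are
# treated as potential signals, values inside as probable noise.
# The data below are invented for illustration.
values = [39, 41, 44, 38, 42, 40, 43, 39, 41, 58, 40, 42]

mean = sum(values) / len(values)
moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
avg_mr = sum(moving_ranges) / len(moving_ranges)

# 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two values.
upper = mean + 2.66 * avg_mr
lower = mean - 2.66 * avg_mr

signals = [v for v in values if v > upper or v < lower]
print(round(lower, 1), round(upper, 1), signals)  # the 58 falls outside
```

Everything inside the limits is left alone; only the excursions are worth an explanation, which is exactly the "separate signal from noise" step described above.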

"Descriptive statistics are built on the assumption that we can use a single value to characterize a single property for a single universe. […] Probability theory is focused on what happens to samples drawn from a known universe. If the data happen to come from different sources, then there are multiple universes with different probability models. If you cannot answer the homogeneity question, then you will not know if you have one probability model or many. [...] Statistical inference assumes that you have a sample that is known to have come from one universe." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"In order to be effective a descriptive statistic has to make sense - it has to distill some essential characteristic of the data into a value that is both appropriate and understandable. […] the justification for computing any given statistic must come from the nature of the data themselves - it cannot come from the arithmetic, nor can it come from the statistic. If the data are a meaningless collection of values, then the summary statistics will also be meaningless - no arithmetic operation can magically create meaning out of nonsense. Therefore, the meaning of any statistic has to come from the context for the data, while the appropriateness of any statistic will depend upon the use we intend to make of that statistic." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

[myth] "It has been said that process behavior charts work because of the central limit theorem." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

[myth] "It has been said that the data must be normally distributed before they can be placed on a process behavior chart." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

[myth] "It has been said that the observations must be independent - data with autocorrelation are inappropriate for process behavior charts." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

[myth] "It has been said that the process must be operating in control before you can place the data on a process behavior chart." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"The four questions of data analysis are the questions of description, probability, inference, and homogeneity. Any data analyst needs to know how to organize and use these four questions in order to obtain meaningful and correct results. [...] 
THE DESCRIPTION QUESTION: Given a collection of numbers, are there arithmetic values that will summarize the information contained in those numbers in some meaningful way?
THE PROBABILITY QUESTION: Given a known universe, what can we say about samples drawn from this universe? [...] 
THE INFERENCE QUESTION: Given an unknown universe, and given a sample that is known to have been drawn from that unknown universe, and given that we know everything about the sample, what can we say about the unknown universe? [...] 
THE HOMOGENEITY QUESTION: Given a collection of observations, is it reasonable to assume that they came from one universe, or do they show evidence of having come from multiple universes?" (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"The simplicity of the process behavior chart can be deceptive. This is because the simplicity of the charts is based on a completely different concept of data analysis than that which is used for the analysis of experimental data. When someone does not understand the conceptual basis for process behavior charts they are likely to view the simplicity of the charts as something that needs to be fixed. Out of these urges to fix the charts all kinds of myths have sprung up resulting in various levels of complexity and obstacles to the use of one of the most powerful analysis techniques ever invented." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

[myth] "The standard deviation statistic is more efficient than the range and therefore we should use the standard deviation statistic when computing limits for a process behavior chart." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

🖍️Nassim N Taleb - Collected Quotes

"A mistake is not something to be determined after the fact, but in the light of the information until that point." (Nassim N Taleb, "Fooled by Randomness", 2001)

"Probability is not about the odds, but about the belief in the existence of an alternative outcome, cause, or motive." (Nassim N Taleb, "Fooled by Randomness", 2001)

"A Black Swan is a highly improbable event with three principal characteristics: It is unpredictable; it carries a massive impact; and, after the fact, we concoct an explanation that makes it appear less random, and more predictable, than it was. […] The Black Swan idea is based on the structure of randomness in empirical reality. [...] the Black Swan is what we leave out of simplification." (Nassim N Taleb, "The Black Swan", 2007)

"Prediction, not narration, is the real test of our understanding of the world." (Nassim N Taleb, "The Black Swan", 2007)

"The inability to predict outliers implies the inability to predict the course of history." (Nassim N Taleb, "The Black Swan", 2007)

"While in theory randomness is an intrinsic property, in practice, randomness is incomplete information." (Nassim N Taleb, "The Black Swan", 2007)

"Complex systems are full of interdependencies - hard to detect - and nonlinear responses. […] Man-made complex systems tend to develop cascades and runaway chains of reactions that decrease, even eliminate, predictability and cause outsized events. So the modern world may be increasing in technological knowledge, but, paradoxically, it is making things a lot more unpredictable." (Nassim N Taleb, "Antifragile: Things that gain from disorder", 2012)

"Technology is the result of antifragility, exploited by risk-takers in the form of tinkering and trial and error, with nerd-driven design confined to the backstage." (Nassim N Taleb, "Antifragile: Things that gain from disorder", 2012)

"The higher the dimension, in other words, the higher the number of possible interactions, and the more disproportionally difficult it is to understand the macro from the micro, the general from the simple units. This disproportionate increase of computational demands is called the curse of dimensionality." (Nassim N Taleb, "Skin in the Game: Hidden Asymmetries in Daily Life", 2018)

"[…] whenever people make decisions after being supplied with the standard deviation number, they act as if it were the expected mean deviation." (Nassim N Taleb, "Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

27 April 2006

🖍️Andrew Gelman - Collected Quotes

"The idea of optimization transfer is very appealing to me, especially since I have never succeeded in fully understanding the EM algorithm." (Andrew Gelman, "Discussion", Journal of Computational and Graphical Statistics, Vol. 9, 2000)

"The difference between 'statistically significant' and 'not statistically significant' is not in itself necessarily statistically significant. By this, I mean more than the obvious point about arbitrary divisions, that there is essentially no difference between something significant at the 0.049 level or the 0.051 level. I have a bigger point to make. It is common in applied research - in the last couple of weeks, I have seen this mistake made in a talk by a leading political scientist and a paper by a psychologist - to compare two effects, from two different analyses, one of which is statistically significant and one which is not, and then to try to interpret/explain the difference. Without any recognition that the difference itself was not statistically significant." (Andrew Gelman, "The difference between ‘statistically significant’ and ‘not statistically significant’ is not in itself necessarily statistically significant", 2005)

"A naive interpretation of regression to the mean is that heights, or baseball records, or other variable phenomena necessarily become more and more 'average' over time. This view is mistaken because it ignores the error in the regression predicting y from x. For any data point xi, the point prediction for its yi will be regressed toward the mean, but the actual yi that is observed will not be exactly where it is predicted. Some points end up falling closer to the mean and some fall further." (Andrew Gelman & Jennifer Hill, "Data Analysis Using Regression and Multilevel/Hierarchical Models", 2007)
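The point can be checked with a quick simulation (illustrative only, not from Gelman & Hill): point predictions shrink toward the mean, yet the observed outcomes scatter around those predictions in both directions.

```python
import random

random.seed(1)
n, rho = 10_000, 0.5  # rho: correlation between x and y (both standardized)
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [rho * x + random.gauss(0, (1 - rho**2) ** 0.5) for x in xs]

# Among large x values (x > 1), the average y is pulled toward 0 ...
tall = [(x, y) for x, y in zip(xs, ys) if x > 1]
mean_x = sum(x for x, _ in tall) / len(tall)
mean_y = sum(y for _, y in tall) / len(tall)
print(round(mean_x, 2), round(mean_y, 2))  # mean_y ≈ rho * mean_x

# ... yet the individual ys do not uniformly move toward the mean:
frac = sum(1 for x, y in tall if abs(y) < abs(x)) / len(tall)
print(round(frac, 2))  # a fraction, not 100%, land closer to the mean
```

The group average regresses; any single observation may land closer to or further from the mean than its prediction.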

"You might say that there’s no reason to bother with model checking since all models are false anyway. I do believe that all models are false, but for me the purpose of model checking is not to accept or reject a model, but to reveal aspects of the data that are not captured by the fitted model." (Andrew Gelman, "Some thoughts on the sociology of statistics", 2007)

"It’s a commonplace among statisticians that a chi-squared test (and, really, any p-value) can be viewed as a crude measure of sample size: When sample size is small, it’s very difficult to get a rejection (that is, a p-value below 0.05), whereas when sample size is huge, just about anything will bag you a rejection. With large n, a smaller signal can be found amid the noise. In general: small n, unlikely to get small p-values. Large n, likely to find something. Huge n, almost certain to find lots of small p-values." (Andrew Gelman, "The sample size is huge, so a p-value of 0.007 is not that impressive", 2009)
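Gelman's point in a small simulation (an illustrative sketch, not his example): the same tiny true effect yields a large p-value at small n and a vanishing one at huge n.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)

def one_sample_p(n, effect=0.05):
    # z-test of H0: mu = 0 against a population whose true mean is `effect`
    xs = [random.gauss(effect, 1) for _ in range(n)]
    z = mean(xs) / (stdev(xs) / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_small = one_sample_p(100)        # small n: usually not "significant"
p_huge = one_sample_p(1_000_000)   # huge n: almost certain to be tiny
print(round(p_small, 3), p_huge)
```

The effect size (0.05 standard deviations) never changes; only the sample size does, which is exactly why a small p-value alone says little about practical importance.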

"The arguments I lay out are, briefly, that graphs are a distraction from more serious analysis; that graphs can mislead in displaying compelling patterns that are not statistically significant and that could easily enough be consistent with chance variation; that diagnostic plots could be useful in the development of a model but do not belong in final reports; that, when they take the place of tables, graphs place the careful reader one step further away from the numerical inferences that are the essence of rigorous scientific inquiry; and that the effort spent making flashy graphics would be better spent on the substance of the problem being studied." (Andrew Gelman et al, "Why Tables Are Really Much Better Than Graphs", Journal of Computational and Graphical Statistics, Vol. 20(1), 2011)

"Graphs are gimmicks, substituting fancy displays for careful analysis and rigorous reasoning. It is basically a trade-off: the snazzier your display, the more you can get away with a crappy underlying analysis. Conversely, a good analysis does not need a fancy graph to sell itself. The best quantitative research has an underlying clarity and a substantive importance whose results are best presented in a sober, serious tabular display. And the best quantitative researchers trust their peers enough to present their estimates and standard errors directly, with no tricks, for all to see and evaluate." (Andrew Gelman et al, "Why Tables Are Really Much Better Than Graphs", Journal of Computational and Graphical Statistics, Vol. 20(1), 2011)"

"Eye-catching data graphics tend to use designs that are unique (or nearly so) without being strongly focused on the data being displayed. In the world of Infovis, design goals can be pursued at the expense of statistical goals. In contrast, default statistical graphics are to a large extent determined by the structure of the data (line plots for time series, histograms for univariate data, scatterplots for bivariate nontime-series data, and so forth), with various conventions such as putting predictors on the horizontal axis and outcomes on the vertical axis. Most statistical graphs look like other graphs, and statisticians often think this is a good thing." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Providing the right comparisons is important, numbers on their own make little sense, and graphics should enable readers to make up their own minds on any conclusions drawn, and possibly see more. On the Infovis side, computer scientists and designers are interested in grabbing the readers' attention and telling them a story. When they use data in a visualization (and data-based graphics are only a subset of the field of Infovis), they provide more contextual information and make more effort to awaken the readers' interest. We might argue that the statistical approach concentrates on what can be got out of the available data and the Infovis approach uses the data to draw attention to wider issues. Both approaches have their value, and it would probably be best if both could be combined." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Statisticians tend to use standard graphic forms (e.g., scatterplots and time series), which enable the experienced reader to quickly absorb lots of information but may leave other readers cold. We personally prefer repeated use of simple graphical forms, which we hope draw attention to the data rather than to the form of the display." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"[…] we do see a tension between the goal of statistical communication and the more general goal of communicating the qualitative sense of a dataset. But graphic design is not on one side or another of this divide. Rather, design is involved at all stages, especially when several graphics are combined to contribute to the overall picture, something we would like to see more of." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013)

"Yes, it can sometimes be possible for a graph to be both beautiful and informative […]. But such synergy is not always possible, and we believe that an approach to data graphics that focuses on celebrating such wonderful examples can mislead people by obscuring the tradeoffs between the goals of visual appeal to outsiders and statistical communication to experts." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013) 

"Flaws can be found in any research design if you look hard enough. […] In our experience, it is good scientific practice to refine one's research hypotheses in light of the data. Working scientists are also keenly aware of the risks of data dredging, and they use confidence intervals and p-values as a tool to avoid getting fooled by noise. Unfortunately, a by-product of all this struggle and care is that when a statistically significant pattern does show up, it is natural to get excited and believe it. The very fact that scientists generally don't cheat, generally don't go fishing for statistical significance, makes them vulnerable to drawing strong conclusions when they encounter a pattern that is robust enough to cross the p < 0.05 threshold." (Andrew Gelman & Eric Loken, "The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

"There are many roads to statistical significance; if data are gathered with no preconceptions at all, statistical significance can obviously be obtained even from pure noise by the simple means of repeatedly performing comparisons, excluding data in different ways, examining different interactions, controlling for different predictors, and so forth. Realistically, though, a researcher will come into a study with strong substantive hypotheses, to the extent that, for any given data set, the appropriate analysis can seem evidently clear. But even if the chosen data analysis is a deterministic function of the observed data, this does not eliminate the problem posed by multiple comparisons." (Andrew Gelman & Eric Loken, "The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

"There is a growing realization that reported 'statistically significant' claims in statistical publications  are routinely mistaken. Researchers typically express the confidence in their data in terms of p-value: the probability that a perceived result is actually the result of random variation. The value of p (for 'probability') is a way of measuring the extent to which a data set provides evidence against a so-called null hypothesis. By convention, a p- value below 0.05 is considered a meaningful refutation of the null hypothesis; however, such conclusions are less solid than they appear." (Andrew Gelman & Eric Loken, "The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

"I agree with the general message: 'The right variables make a big difference for accuracy. Complex statistical methods, not so much.' This is similar to something Hal Stern told me once: the most important aspect of a statistical analysis is not what you do with the data, it’s what data you use." (Andrew Gelman, "The most important aspect of a statistical analysis is not what you do with the data, it’s what data you use", 2018)

"We thus echo the classical Bayesian literature in concluding that ‘noninformative prior information’ is a contradiction in terms. The flat prior carries information just like any other; it represents the assumption that the effect is likely to be large. This is often not true. Indeed, the signal-to-noise ratios is often very low and then it is necessary to shrink the unbiased estimate. Failure to do so by inappropriately using the flat prior causes overestimation of effects and subsequent failure to replicate them." (Erik van Zwet & Andrew Gelman, "A proposal for informative default priors scaled by the standard error of estimates", The American Statistician 76, 2022)

"Taking a model too seriously is really just another way of not taking it seriously at all." (Andrew Gelman)

26 April 2006

🖍️Gerald van Belle - Collected Quotes

"A bar graph typically presents either averages or frequencies. It is relatively simple to present raw data (in the form of dot plots or box plots). Such plots provide much more information. and they are closer to the original data. If the bar graph categories are linked in some way - for example, doses of treatments - then a line graph will be much more informative. Very complicated bar graphs containing adjacent bars are very difficult to grasp. If the bar graph represents frequencies. and the abscissa values can be ordered, then a line graph will be much more informative and will have substantially reduced chart junk." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"A good graph displays relationships and structures that are difficult to detect by merely looking at the data." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"A probability can frequently be expressed as a ratio of the number of events divided by the number of units eligible for the event. What the rule of thumb says is to be aware of what the numerator and denominator are, particularly when assessing probabilities in a personal situation. If someone never goes hang gliding, they clearly do not need to worry about the probability of dying in a hang gliding accident." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Assess agreement by addressing accuracy, scale differential, and precision. Accuracy can be thought of as the lack of bias." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Before choosing a measure of covariation determine the source of the data (sampling scheme), the nature of the variables, and the symmetry status of the measure." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Characterizing variability requires repeatedly observing the variability since the it is not a property inherent in the observation itself. " (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Displaying numerical information always involves selection. The process of selection needs to be described so that the reader will not be misled." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Distinguish among confidence, prediction, and tolerance intervals. Confidence intervals are statements about population means or other parameters. Prediction intervals address future (single or multiple) observations. Tolerance intervals describe the location of a specific proportion of a population, with specified confidence." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Do not let the scale of measurement rigidly determine the method of analysis." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Everyone agrees that there are degrees of quality of information but when asked to define the criteria there a great deal of disagreement. The simple statistical rule that the inverse of the variance of a statistic is a measure of the information contained in the statistic provides a useful criterion for a point estimate but is clearly inadequate for comparing much bigger chunks of information such as a study." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Every statistical analysis is an interpretation of the data, and missingness affects the interpretation. The challenge is that when the reasons for the missingness cannot be determined there is basically no way to make appropriate statistical adjustments. Sensitivity analyses are designed to model and explore a reasonable range of explanations in order to assess the robustness of the results." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"In assessing change, the spacing of the observations is much more important than the number of observations." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"In using a database, first look at the metadata, then look at the data. [...] The old computer acronym GIGO (Garbage In, Garbage Out) applies to the use of large databases. The issue is whether the data from the database will answer the research question. In order to determine this, the investigator must have some idea about the nature of the data in the database - that is, the metadata." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"It is crucial to have a broad understanding of the subject matter involved. Statistical analysis is much more than just carrying out routine computations. Only with keen understanding of the subject matter can statisticians, and statistics, be most usefully engaged." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Know what properties a transformation preserves and does not preserve." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Models can be viewed and used at three levels. The first is a model that fits the data. A test of goodness-of-fit operates at this level. This level is the least useful but is frequently the one at which statisticians and researchers stop. For example, a test of a linear model is judged good when a quadratic term is not significant. A second level of usefulness is that the model predicts future observations. Such a model has been called a forecast model. This level is often required in screening studies or studies predicting outcomes such as growth rate. A third level is that a model reveals unexpected features of the situation being described, a structural model, [...] However, it does not explain the data." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Observation is selection. [...] To observe one thing implies that another is not observed, hence there is selection. This implies that the observation is taken from a larger collective, the statistical population." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Ockham's Razor in statistical analysis is used implicitly when models are embedded in richer models -for example, when testing the adequacy of a linear model by incorporating a quadratic term. If the coefficient of the quadratic term is not significant, it is dropped and the linear model is assumed to summarize the data adequately." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Precision does not vary linearly with increasing sample size. As is well known, the width of a confidence interval is a function of the square root of the number of observations. But it is more complicate than that. The basic elements determining a confidence interval are the sample size, an estimate of variability, and a pivotal variable associated with the estimate of variability." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Randomization puts systematic sources of variability into the error term." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Since the analysis of variance is an analysis of variability of means it is possible to plot the means in many ways." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Stacked bar graphs do not show data structure well. A trend in one of the stacked variables has to be deduced by scanning along the vertical bars. This becomes especially difficult when the categories do not move in the same direction." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Statistics is the analysis of variation. There are many sources and kinds of variation. In environmental studies it is particularly important to understand the kinds of variation and the implications of the difference. Two important categories are variability and uncertainty. Variability refers to variation in environmental quantities (which may have special regulatory interest), uncertainty refers to the degree of precision with which these quantities are estimated." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The best rule is: Don't have any missing data, Unfortunately, that is unrealistic. Therefore, plan for missing data and develop strategies to account for them. Do this before starting the study. The strategy should state explicitly how the type of missingness will be examined, how it will be handled, and how the sensitivity of the results to the missing data will be assessed." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The bounds on the standard deviation are pretty crude but it is surprising how often the rule will pick up gross errors such as confusing the standard error and standard deviation, confusing the variance and the standard deviation, or reporting the mean in one scale and the standard deviation in another scale." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The content and context of the numerical data determines the most appropriate mode of presentation. A few numbers can be listed, many numbers require a table. Relationships among numbers can be displayed by statistics. However, statistics, of necessity, are summary quantities so they cannot fully display the relationships, so a graph can be used to demonstrate them visually. The attractiveness of the form of the presentation is determined by word layout, data structure, and design." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The most ubiquitous graph is the pie chart. It is a staple of the business world. [...] Never use a pie chart. Present a simple list of percentages, or whatever constitutes the divisions of the pie chart." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"[...] there are two problems with the indiscriminate multiplication of probabilities. First, multiplication without adjustment implies that the events represented by the probabilities are treated as independent. Second, since probabilities are always less than 1, the product will become smaller and smaller. If small probabilities are associated with unlikely events then, by a suitable selection, the joint occurrence of events can be made arbitrarily small." (Gerald van Belle, "Statistical Rules of Thumb", 2002)
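
Both problems can be demonstrated in a few lines (illustrative probabilities mine):

```python
# Problem 2: products of probabilities below 1 can be driven arbitrarily
# close to zero just by multiplying in more events.
product = 1.0
for _ in range(20):
    product *= 0.5   # twenty coin-flip events treated as independent
print(product)       # about 1e-6

# Problem 1: naive multiplication assumes independence. If event B always
# occurs whenever A does, P(A and B) = P(A) = 0.5, not 0.5 * 0.5 = 0.25.
p_a = 0.5
naive = p_a * p_a
actual = p_a
print(naive, actual)
```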

"This pie chart violates several of the rules suggested by the question posed in the introduction. First, immediacy: the reader has to turn to the legend to find out what the areas represent; and the lack of color makes it very difficult to determine which area belongs to what code. Second, the underlying structure of the data is completely ignored. Third, a tremendous amount of ink is used to display eight simple numbers." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Three key aspects of presenting high dimensional data are: rendering, manipulation, and linking. Rendering determines what is to be plotted, manipulation determines the structure of the relationships, and linking determines what information will be shared between plots or sections of the graph." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"When there is more than one source of variation it is important to identify those sources." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

🖍️George B Dyson - Collected Quotes

"An Internet search engine is a finite-state, deterministic machine, except at those junctures where people, individually and collectively, make a nondeterministic choice as to which results are selected as meaningful and given a click. These clicks are then immediately incorporated into the state of the deterministic machine, which grows ever so incrementally more knowledgeable with every click. This is what Turing defined as an oracle machine."  (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"If life, by some chance, happens to have originated, and survived, elsewhere in the universe, it will have had time to explore an unfathomable diversity of forms. Those best able to survive the passage of time, adapt to changing environments, and migrate across interstellar distances will become the most widespread. A life form that assumes digital representation, for all or part of its life cycle, will be able to travel at the speed of light." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"In our universe, we measure time with clocks, and computers have a 'clock speed', but the clocks that govern the digital universe are very different from the clocks that govern ours. In the digital universe, clocks exist to synchronize the translation between bits that are stored in memory (as structures in space) and bits that are communicated by code (as sequences in time). They are clocks more in the sense of regulating escapement than in the sense of measuring time." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"It is characteristic of objects of low complexity that it is easier to talk about the object than produce it and easier to predict its properties than to build it." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"Life evolved, so far, by making use of the viral cloud as a source of backup copies and a way to rapidly exchange genetic code. Life may be better adapted to the digital universe than we think." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"Monte Carlo is able to discover practical solutions to otherwise intractable problems because the most efficient search of an unmapped territory takes the form of a random walk. Today’s search engines, long descended from their ENIAC-era ancestors, still bear the imprint of their Monte Carlo origins: random search paths being accounted for, statistically, to accumulate increasingly accurate results. The genius of Monte Carlo - and its search-engine descendants - lies in the ability to extract meaningful solutions, in the face of overwhelming information, by recognizing that meaning resides less in the data at the end points and more in the intervening paths." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"Over long distances, it is expensive to transport structures, and inexpensive to transmit sequences. Turing machines, which by definition are structures that can be encoded as sequences, are already propagating themselves, locally, at the speed of light. The notion that one particular computer resides in one particular location at one time is obsolete. (George Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012) 

"Random search can be more efficient than nonrandom search - something that Good and Turing had discovered at Bletchley Park. A random network, whether of neurons, computers, words, or ideas, contains solutions, waiting to be discovered, to problems that need not be explicitly defined." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"The brain is a statistical, probabilistic system, with logic and mathematics running as higher-level processes. The computer is a logical, mathematical system, upon which higher-level statistical, probabilistic systems, such as human language and intelligence, could possibly be built." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"The good news is that, as Leibniz suggested, we appear to live in the best of all possible worlds, where the computable functions make life predictable enough to be survivable, while the noncomputable functions make life (and mathematical truth) unpredictable enough to remain interesting, no matter how far computers continue to advance."  (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"The fundamental, indivisible unit of information is the bit. The fundamental, indivisible unit of digital computation is the transformation of a bit between its two possible forms of existence: as structure (memory) or as sequence (code). This is what a Turing Machine does when reading a mark (or the absence of a mark) on a square of tape, changing its state of mind accordingly, and making (or erasing) a mark somewhere else." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"The genius of Monte Carlo - and its search-engine descendants - lies in the ability to extract meaningful solutions, in the face of overwhelming information, by recognizing that meaning resides less in the data at the end points and more in the intervening paths." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"The paradox of artificial intelligence is that any system simple enough to be understandable is not complicated enough to behave intelligently, and any system complicated enough to behave intelligently is not simple enough to understand. The path to artificial intelligence, suggested Turing, is to construct a machine with the curiosity of a child, and let intelligence evolve." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"Where does meaning come in? If everything is assigned a number, does this diminish the meaning in the world? What Gödel (and Turing) proved is that formal systems will, sooner or later, produce meaningful statements whose truth can be proved only outside the system itself. This limitation does not confine us to a world with any less meaning. It proves, on the contrary, that we live in a world where higher meaning exists." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"Nature uses digital computing for generation-to-generation information storage, combinatorics, and error correction but relies on analog computing for real-time intelligence and control." (George B Dyson, Analogia: The Emergence of Technology Beyond Programmable Control", 2020)

🖍️George E P Box - Collected Quotes

"Statistical criteria should (1) be sensitive to change in the specific factors tested, (2) be insensitive to changes, of a magnitude likely to occur in practice, in extraneous factors." (George E P Box, 1955)

"The method of least squares is used in the analysis of data from planned experiments and also in the analysis of data from unplanned happenings. The word 'regression' is most often used to describe analysis of unplanned data. It is the tacit assumption that the requirements for the validity of least squares analysis are satisfied for unplanned data that produces a great deal of trouble." (George E P Box, "Use and Abuse of Regression", 1966)

"To find out what happens to a system when you interfere with it you have to interfere with it (not just passively observe it)." (George E P Box, "Use and Abuse of Regression", 1966)

"A man in daily muddy contact with field experiments could not be expected to have much faith in any direct assumption of independently distributed normal errors." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"For the theory-practice iteration to work, the scientist must be, as it were, mentally ambidextrous; fascinated equally on the one hand by possible meanings, theories, and tentative models to be induced from data and the practical reality of the real world, and on the other with the factual implications deducible from tentative theories, models and hypotheses." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"One important idea is that science is a means whereby learning is achieved, not by mere theoretical speculation on the one hand, nor by the undirected accumulation of practical facts on the other, but rather by a motivated iteration between theory and practice." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"Since all models are wrong the scientist cannot obtain a ‘correct’ one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." (George E P Box, "Empirical Model-Building and Response Surfaces", 1987)

"The fact that [the model] is an approximation does not necessarily detract from its usefulness because models are approximations. All models are wrong, but some are useful." (George E P Box, 1987)

"Statistics is, or should be, about scientific investigation and how to do it better, but many statisticians believe it is a branch of mathematics." (George E P Box, Commentary, Technometrics 32, 1990)

"The central limit theorem says that, under conditions almost always satisfied in the real world of experimentation, the distribution of such a linear function of errors will tend to normality as the number of its components becomes large. The tendency to normality occurs almost regardless of the individual distributions of the component errors. An important proviso is that several sources of error must make important contributions to the overall error and that no particular source of error dominate the rest." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)

"Two things explain the importance of the normal distribution: (1) The central limit effect that produces a tendency for real error distributions to be 'normal like'. (2) The robustness to nonnormality of some common statistical procedures, where 'robustness' means insensitivity to deviations from theoretical normality." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)

"All models are approximations. Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind." (George E P Box & Norman R Draper, "Response Surfaces, Mixtures, and Ridge Analyses", 2007)

"In my view, statistics has no reason for existence except as the catalyst for investigation and discovery." (George E P Box)

🖍️Cathy O'Neil - Collected Quotes

"A model, after all, is nothing more than an abstract representation of some process, be it a baseball game, an oil company’s supply chain, a foreign government’s actions, or a movie theater’s attendance. Whether it’s running in a computer program or in our head, the model takes what we know and uses it to predict responses in various situations. All of us carry thousands of models in our heads. They tell us what to expect, and they guide our decisions." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"No model can include all of the real world’s complexity or the nuance of human communication. Inevitably, some important information gets left out. […] To create a model, then, we make choices about what’s important enough to include, simplifying the world into a toy version that can be easily understood and from which we can infer important facts and actions.[…] Sometimes these blind spots don’t matter. […] A model’s blind spots reflect the judgments and priorities of its creators. […] Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"The first question: Even if the participant is aware of being modeled, or what the model is used for, is the model opaque, or even invisible? […] the second question: Does the model work against the subject’s interest? In short, is it unfair? Does it damage or destroy lives? […] The third question is whether a model has the capacity to grow exponentially. As a statistician would put it, can it scale?" (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"Whether or not a model works is also a matter of opinion. After all, a key component of every model, whether formal or informal, is its definition of success." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"While Big Data, when managed wisely, can provide important insights, many of them will be disruptive. After all, it aims to find patterns that are invisible to human eyes. The challenge for data scientists is to understand the ecosystems they are wading into and to present not just the problems but also their possible solutions." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

25 April 2006

🖍️Darrell Huff - Collected Quotes

"Another thing to watch out for is a conclusion in which a correlation has been inferred to continue beyond the data with which it has been demonstrated." (Darrell Huff, "How to Lie with Statistics", 1954)

"Extrapolations are useful, particularly in the form of soothsaying called forecasting trends. But in looking at the figures or the charts made from them, it is necessary to remember one thing constantly: The trend to now may be a fact, but the future trend represents no more than an educated guess. Implicit in it is 'everything else being equal' and 'present trends continuing'. And somehow everything else refuses to remain equal." (Darrell Huff, "How to Lie with Statistics", 1954)

"If you can't prove what you want to prove, demonstrate something else and pretend that they are the same thing. In the daze that follows the collision of statistics with the human mind, hardly anybody will notice the difference." (Darrell Huff, "How to Lie with Statistics", 1954)

"Keep in mind that a correlation may be real and based on real cause and effect - and still be almost worthless in determining action in any single case." (Darrell Huff, "How to Lie with Statistics", 1954)

"Only when there is a substantial number of trials involved is the law of averages a useful description or prediction." (Darrell Huff, "How to Lie with Statistics", 1954)
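Huff's point about the number of trials is easy to see in a quick simulation (illustrative only; the seed and trial counts are arbitrary choices): the proportion of heads in simulated coin flips is erratic for small runs and settles toward one half only as the trials pile up.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

for n in (10, 1000, 100_000):
    # simulate n fair coin flips
    flips = [random.random() < 0.5 for _ in range(n)]
    prop = sum(flips) / n
    print(n, round(prop, 4))
```

For small n the printed proportion can stray far from 0.5; for the largest run it is very close.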

"Percentages offer a fertile field for confusion. And like the ever-impressive decimal they can lend an aura of precision to the inexact. […] Any percentage figure based on a small number of cases is likely to be misleading. It is more informative to give the figure itself. And when the percentage is carried out to decimal places, you begin to run the scale from the silly to the fraudulent." (Darrell Huff, "How to Lie with Statistics", 1954)

"Place little faith in an average or a graph or a trend when those important figures are missing." (Darrell Huff, "How to Lie with Statistics", 1954)

"Sometimes the big ado is made about a difference that is mathematically real and demonstrable but so tiny as to have no importance. This is in defiance of the fine old saying that a difference is a difference only if it makes a difference." (Darrell Huff, "How to Lie with Statistics", 1954)

"The fact is that, despite its mathematical base, statistics is as much an art as it is a science. A great many manipulations and even distortions are possible within the bounds of propriety. Often the statistician must choose among methods, a subjective process, and find the one that he will use to represent the facts." (Darrell Huff, "How to Lie with Statistics", 1954)

"The purely random sample is the only kind that can be examined with entire confidence by means of statistical theory, but there is one thing wrong with it. It is so difficult and expensive to obtain for many uses that sheer cost eliminates it." (Darrell Huff, "How to Lie with Statistics", 1954)

"The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, 'opinion' polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense." (Darrell Huff, "How to Lie with Statistics", 1954)

"There are often many ways of expressing any figure. […] The method is to choose the one that sounds best for the purpose at hand and trust that few who read it will recognize how imperfectly it reflects the situation." (Darrell Huff, "How to Lie with Statistics", 1954)

"To be worth much, a report based on sampling must use a representative sample, which is one from which every source of bias has been removed." (Darrell Huff, "How to Lie with Statistics", 1954)

"When numbers in tabular form are taboo and words will not do the work well, as is often the case, there is one answer left: Draw a picture. About the simplest kind of statistical picture, or graph, is the line variety. It is very useful for showing trends, something practically everybody is interested in showing or knowing about or spotting or deploring or forecasting." (Darrell Huff, "How to Lie with Statistics", 1954)

"When you are told that something is an average you still don't know very much about it unless you can find out which of the common kinds of average it is - mean, median, or mode. [...] The different averages come out close together when you deal with data, such as those having to do with many human characteristics, that have the grace to fall close to what is called the normal distribution. If you draw a curve to represent it you get something shaped like a bell, and mean, median, and mode fall at the same point." (Darrell Huff, "How to Lie with Statistics", 1954)
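For a symmetric, bell-like batch of numbers the three averages Huff names do coincide. A minimal check with Python's standard statistics module (the data are made up for illustration):

```python
from statistics import mean, median, mode

# A small symmetric, unimodal toy data set centered on 3
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# For data like this, mean, median, and mode all land on the same value
print(mean(data), median(data), mode(data))  # all three are 3
```

For skewed data the three values separate, which is exactly why Huff insists on asking which average is being reported.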

"When you find somebody - usually an interested party - making a fuss about a correlation, look first of all to see if it is not one of this type, produced by the stream of events, the trend of the times." (Darrell Huff, "How to Lie with Statistics", 1954)

🖍️John D Kelleher - Collected Quotes

"A predictive model overfits the training set when at least some of the predictions it returns are based on spurious patterns present in the training data used to induce the model. Overfitting happens for a number of reasons, including sampling variance and noise in the training set. The problem of overfitting can affect any machine learning algorithm; however, the fact that decision tree induction algorithms work by recursively splitting the training data means that they have a natural tendency to segregate noisy instances and to create leaf nodes around these instances. Consequently, decision trees overfit by splitting the data on irrelevant features that only appear relevant due to noise or sampling variance in the training data. The likelihood of overfitting occurring increases as a tree gets deeper because the resulting predictions are based on smaller and smaller subsets as the dataset is partitioned after each feature test in the path." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"Decision trees are also discriminative models. Decision trees are induced by recursively partitioning the feature space into regions belonging to the different classes, and consequently they define a decision boundary by aggregating the neighboring regions belonging to the same class. Decision tree model ensembles based on bagging and boosting are also discriminative models." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"Decision trees are also considered nonparametric models. The reason for this is that when we train a decision tree from data, we do not assume a fixed set of parameters prior to training that define the tree. Instead, the tree branching and the depth of the tree are related to the complexity of the dataset it is trained on. If new instances were added to the dataset and we rebuilt the tree, it is likely that we would end up with a (potentially very) different tree." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"It is important to remember that predictive data analytics models built using machine learning techniques are tools that we can use to help make better decisions within an organization and are not an end in themselves. It is paramount that, when tasked with creating a predictive model, we fully understand the business problem that this model is being constructed to address and ensure that it does address it." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"There are two kinds of mistakes that an inappropriate inductive bias can lead to: underfitting and overfitting. Underfitting occurs when the prediction model selected by the algorithm is too simplistic to represent the underlying relationship in the dataset between the descriptive features and the target feature. Overfitting, by contrast, occurs when the prediction model selected by the algorithm is so complex that the model fits to the dataset too closely and becomes sensitive to noise in the data."(John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"The main advantage of decision tree models is that they are interpretable. It is relatively easy to understand the sequences of tests a decision tree carried out in order to make a prediction. This interpretability is very important in some domains. [...] Decision tree models can be used for datasets that contain both categorical and continuous descriptive features. A real advantage of the decision tree approach is that it has the ability to model the interactions between descriptive features. This arises from the fact that the tests carried out at each node in the tree are performed in the context of the results of the tests on the other descriptive features that were tested at the preceding nodes on the path from the root. Consequently, if there is an interaction effect between two or more descriptive features, a decision tree can model this."  (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"Tree pruning identifies and removes subtrees within a decision tree that are likely to be due to noise and sample variance in the training set used to induce it. In cases where a subtree is deemed to be overfitting, pruning the subtree means replacing the subtree with a leaf node that makes a prediction based on the majority target feature level (or average target feature value) of the dataset created by merging the instances from all the leaf nodes in the subtree. Obviously, pruning will result in decision trees being created that are not consistent with the training set used to build them. In general, however, we are more interested in creating prediction models that generalize well to new data rather than that are strictly consistent with training data, so it is common to sacrifice consistency for generalization capacity." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)
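The replacement rule Kelleher describes - collapse a subtree to a single leaf that predicts the majority target level of the merged instances - can be sketched in a few lines. The labels below are hypothetical:

```python
from collections import Counter

# Hypothetical class labels from all leaf nodes of an overfitting subtree,
# merged together as described in the pruning procedure
merged_labels = ["spam", "ham", "spam", "spam", "ham"]

# The pruned leaf predicts the majority target level of the merged instances
majority_label, count = Counter(merged_labels).most_common(1)[0]
print(majority_label)  # -> spam
```

For a regression tree, the analogous leaf would predict the average target value of the merged instances instead of the majority class.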

"When datasets are small, a parametric model may perform well because the strong assumptions made by the model - if correct - can help the model to avoid overfitting. However, as the size of the dataset grows, particularly if the decision boundary between the classes is very complex, it may make more sense to allow the data to inform the predictions more directly. Obviously the computational costs associated with nonparametric models and large datasets cannot be ignored. However, support vector machines are an example of a nonparametric model that, to a large extent, avoids this problem. As such, support vector machines are often a good choice in complex domains with lots of data." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"When we find data quality issues due to valid data during data exploration, we should note these issues in a data quality plan for potential handling later in the project. The most common issues in this regard are missing values and outliers, which are both examples of noise in the data." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"A neural network consists of a set of neurons that are connected together. A neuron takes a set of numeric values as input and maps them to a single output value. At its core, a neuron is simply a multi-input linear-regression function. The only significant difference between the two is that in a neuron the output of the multi-input linear-regression function is passed through another function that is called the activation function." (John D Kelleher & Brendan Tierney, "Data Science", 2018)
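Kelleher and Tierney's description - a neuron is a multi-input linear-regression function whose output is passed through an activation function - translates directly into code. The weights and the choice of a logistic sigmoid activation here are illustrative assumptions, not from the quoted text:

```python
import math

def neuron(inputs, weights, bias):
    # The multi-input linear-regression part: a weighted sum plus bias
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The activation function: here a logistic sigmoid, squashing z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical inputs and weights
print(neuron([0.5, -1.2], [0.8, 0.3], 0.1))  # a value strictly between 0 and 1
```

Swapping the sigmoid for another activation (tanh, ReLU, and so on) changes only the last line of the function.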

"Data scientists should have some domain expertise. Most data science projects begin with a real-world, domain-specific problem and the need to design a data-driven solution to this problem. As a result, it is important for a data scientist to have enough domain expertise that they understand the problem, why it is important, and how a data science solution to the problem might fit into an organization’s processes. This domain expertise guides the data scientist as she works toward identifying an optimized solution." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"However, because ML algorithms are biased to look for different types of patterns, and because there is no one learning bias across all situations, there is no one best ML algorithm. In fact, a theorem known as the 'no free lunch theorem' states that there is no one best ML algorithm that on average outperforms all other algorithms across all possible data sets." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"One of the biggest myths is the belief that data science is an autonomous process that we can let loose on our data to find the answers to our problems. In reality, data science requires skilled human oversight throughout the different stages of the process. [...] The second big myth of data science is that every data science project needs big data and needs to use deep learning. In general, having more data helps, but having the right data is the more important requirement. [...] A third data science myth is that modern data science software is easy to use, and so data science is easy to do. [...] The last myth about data science [...] is the belief that data science pays for itself quickly. The truth of this belief depends on the context of the organization. Adopting data science can require significant investment in terms of developing data infrastructure and hiring staff with data science expertise. Furthermore, data science will not give positive results on every project." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"One of the most important skills for a data scientist is the ability to frame a real-world problem as a standard data science task." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Presenting data in a graphical format makes it much easier to see and understand what is happening with the data. Data visualization applies to all phases of the data science process."  (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"The goal of data science is to improve decision making by basing decisions on insights extracted from large data sets. As a field of activity, data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting nonobvious and useful patterns from large data sets. It is closely related to the fields of data mining and machine learning, but it is broader in scope." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"The patterns that we extract using data science are useful only if they give us insight into the problem that enables us to do something to help solve the problem." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"The promise of data science is that it provides a way to understand the world through data." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Using data science, we can uncover the important patterns in a data set, and these patterns can reveal the important attributes in the domain. The reason why data science is used in so many domains is that it doesn’t matter what the problem domain is: if the right data are available and the problem can be clearly defined, then data science can help."  (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"We humans are reasonably good at defining rules that check one, two, or even three attributes (also commonly referred to as features or variables), but when we go higher than three attributes, we can start to struggle to handle the interactions between them. By contrast, data science is often applied in contexts where we want to look for patterns among tens, hundreds, thousands, and, in extreme cases, millions of attributes." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

🖍️Larry A Wasserman - Collected Quotes

"A smaller model with fewer covariates has two advantages: it might give better predictions than a big model and it is more parsimonious (simpler). Generally, as you add more variables to a regression, the bias of the predictions decreases and the variance increases. Too few covariates yields high bias; this is called underfitting. Too many covariates yields high variance; this is called overfitting. Good predictions result from achieving a good balance between bias and variance. […] finding a good model involves trading off fit and complexity." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)
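The trade-off Wasserman describes is usually written as the decomposition of mean squared prediction error (standard notation, not taken from the quoted text):

```latex
\operatorname{MSE}\bigl(\hat{f}(x)\bigr)
  = \underbrace{\bigl(\operatorname{E}[\hat{f}(x)] - f(x)\bigr)^{2}}_{\text{bias}^{2}}
  + \underbrace{\operatorname{Var}\bigl(\hat{f}(x)\bigr)}_{\text{variance}}
  + \sigma^{2},
```

where $\sigma^{2}$ is the irreducible noise. Adding covariates typically shrinks the first term while inflating the second, which is why the best model is rarely the biggest one.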

"Bayesian inference is a controversial approach because it inherently embraces a subjective notion of probability. In general, Bayesian methods provide no guarantees on long run performance." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Bayesian inference is appealing when prior information is available since Bayes’ theorem is a natural way to combine prior information with data. Some people find Bayesian inference psychologically appealing because it allows us to make probability statements about parameters. […] In parametric models, with large samples, Bayesian and frequentist methods give approximately the same inferences. In general, they need not agree." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Inequalities are useful for bounding quantities that might otherwise be hard to compute." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Probability is a mathematical language for quantifying uncertainty." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Statistical inference, or 'learning' as it is called in computer science, is the process of using data to infer the distribution that generated the data." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"[…] studying methods for parametric models is useful for two reasons. First, there are some cases where background knowledge suggests that a parametric model provides a reasonable approximation. […] Second, the inferential concepts for parametric models provide background for understanding certain nonparametric methods." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The Bayesian approach is based on the following postulates: (B1) Probability describes degree of belief, not limiting frequency. As such, we can make probability statements about lots of things, not just data which are subject to random variation. […] (B2) We can make probability statements about parameters, even though they are fixed constants. (B3) We make inferences about a parameter θ by producing a probability distribution for θ. Inferences, such as point estimates and interval estimates, may then be extracted from this distribution." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The frequentist point of view is based on the following postulates: (F1) Probability refers to limiting relative frequencies. Probabilities are objective properties of the real world. (F2) Parameters are fixed, unknown constants. Because they are not fluctuating, no useful probability statements can be made about parameters. (F3) Statistical procedures should be designed to have well-defined long run frequency properties." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The important thing is to understand that frequentist and Bayesian methods are answering different questions. To combine prior beliefs with data in a principled way, use Bayesian inference. To construct procedures with guaranteed long run performance, such as confidence intervals, use frequentist methods. Generally, Bayesian methods run into problems when the parameter space is high dimensional." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004) 

"The most important aspect of probability theory concerns the behavior of sequences of random variables." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"There is a tendency to use hypothesis testing methods even when they are not appropriate. Often, estimation and confidence intervals are better tools. Use hypothesis testing only when you want to test a well-defined hypothesis." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Things are changing. Statisticians now recognize that computer scientists are making novel contributions while computer scientists now recognize the generality of statistical theory and methodology. Clever data mining algorithms are more scalable than statisticians ever thought possible. Formal statistical theory is more pervasive than computer scientists had realized." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Undirected graphs are an alternative to directed graphs for representing independence relations. Since both directed and undirected graphs are used in practice, it is a good idea to be facile with both. The main difference between the two is that the rules for reading independence relations from the graph are different." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

🖍️David S Salsburg - Collected Quotes

"A good estimator has to be more than just consistent. It also should be one whose variance is less than that of any other estimator. This property is called minimum variance. This means that if we run the experiment several times, the 'answers' we get will be closer to one another than 'answers' based on some other estimator." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)
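A small simulation makes the minimum-variance idea concrete: for normally distributed data both the sample mean and the sample median are consistent estimators of the center, but the mean's "answers" cluster more tightly across repeated experiments. The seed, sample size, and replication count below are arbitrary choices:

```python
import random
from statistics import mean, median, pvariance

random.seed(1)
means, medians = [], []
for _ in range(2000):                       # repeat the "experiment" 2000 times
    sample = [random.gauss(0, 1) for _ in range(50)]
    means.append(mean(sample))
    medians.append(median(sample))

# The estimator whose answers are closer to one another has lower variance;
# for normal data that is the sample mean
print(pvariance(means) < pvariance(medians))
```

Theory agrees: for normal data the sampling variance of the median is roughly π/2 times that of the mean.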

"All methods of dealing with big data require a vast number of mind-numbing, tedious, boring mathematical steps." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"An estimate (the mathematical definition) is a number derived from observed values that is as close as we can get to the true parameter value. Useful estimators are those that are 'better' in some sense than any others." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Correlation is not equivalent to cause for one major reason. Correlation is well defined in terms of a mathematical formula. Cause is not well defined." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Estimators are functions of the observed values that can be used to estimate specific parameters. Good estimators are those that are consistent and have minimum variance. These properties are guaranteed if the estimator maximizes the likelihood of the observations." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"One final warning about the use of statistical models (whether linear or otherwise): The estimated model describes the structure of the data that have been observed. It is unwise to extend this model very far beyond the observed data." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)
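Salsburg's warning is easy to demonstrate: a straight line fitted to a curved relationship can look reasonable inside the observed range and fail badly outside it. A minimal ordinary-least-squares sketch in pure Python, on toy data invented for illustration:

```python
# Toy data: y = x^2 observed only on x = 0..5
xs = [0, 1, 2, 3, 4, 5]
ys = [x * x for x in xs]

# Ordinary least-squares line: slope = cov(x, y) / var(x)
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# Inside the data the line is a rough fit; far outside it is badly wrong:
# the true value at x = 20 is 400
print(slope * 20 + intercept)  # about 96.7
```

The fitted model faithfully describes the structure of the observed data, exactly as Salsburg says, and that is all it describes.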

"The central limit conjecture states that most errors are the result of many small errors and, as such, have a normal distribution. The assumption of a normal distribution for error has many advantages and has often been made in applications of statistical models." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"The degree to which one variable can be predicted from another can be calculated as the correlation between them. The square of the correlation (R^2) is the proportion of the variance of one that can be 'explained' by knowledge of the other." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)
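The relationship Salsburg states - R^2 is the squared correlation - can be checked directly from the definition of the Pearson correlation. The paired observations below are invented for illustration:

```python
# Hypothetical paired observations with a near-linear relationship
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Pearson correlation computed from its definition
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sx = sum((x - mx) ** 2 for x in xs) ** 0.5
sy = sum((y - my) ** 2 for y in ys) ** 0.5
r = cov / (sx * sy)

# r^2 is the proportion of one variable's variance "explained" by the other
print(r, r ** 2)
```

Here r is close to 1, so nearly all of the variance of y is "explained" by x in the sense Salsburg describes.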

"The elements of this cloud of uncertainty (the set of all possible errors) can be described in terms of probability. The center of the cloud is the number zero, and elements of the cloud that are close to zero are more probable than elements that are far away from that center. We can be more precise in this definition by defining the cloud of uncertainty in terms of a mathematical function, called the probability distribution." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"The lack of variability is often a hallmark of faked data. […] The failure of faked data to have sufficient variability holds as long as the liar does not know this. If the liar knows this, his best approach is to start with real data and use it cleverly to adapt it to his needs." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"There is a constant battle between the cold abstract absolutes of pure mathematics and, the sometimes sloppy way in which mathematical methods are applied in science." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Two clouds of uncertainty may have the same center, but one may be much more dispersed than the other. We need a way of looking at the scatter about the center. We need a measure of the scatter. One such measure is the variance. We take each of the possible values of error and calculate the squared difference between that value and the center of the distribution. The mean of those squared differences is the variance." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)
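Salsburg's recipe for the variance - the mean of squared differences from the center - matches the library definition of population variance. The error values below are arbitrary:

```python
from statistics import pvariance

errors = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical error values centered on zero
center = sum(errors) / len(errors)

# Mean of squared differences from the center, exactly as described
var = sum((e - center) ** 2 for e in errors) / len(errors)

print(var, pvariance(errors))  # both give 2.0
```

(The stdlib's `variance`, by contrast, divides by n-1 rather than n; `pvariance` is the one that implements the mean-of-squared-deviations definition.)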

"What properties should a good statistical estimator have? Since we are dealing with probability, we start with the probability that our estimate will be very close to the true value of the parameter. We want that probability to become greater and greater as we get more and more data. This property is called consistency. This is a statement about probability. It does not say that we are sure to get the right answer. It says that it is highly probable that we will be close to the right answer." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"When we use algebraic notation in statistical models, the problem becomes more complicated because we cannot 'observe' a probability and know its exact number. We can only estimate probabilities on the basis of observations." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

24 April 2006

🖍️Schuyler W Huck - Collected Quotes

"As used here, the term statistical misconception refers to any of several widely held but incorrect notions about statistical concepts, about procedures for analyzing data and about the meaning of results produced by such analyses. To illustrate, many people think that (1) normal curves are bell shaped, (2) a correlation coefficient should never be used to address questions of causality, and (3) the level of significance dictates the probability of a Type I error. Some people, of course, have only one or two (rather than all three) of these misconceptions, and a few individuals realize that all three of those beliefs are false." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"Distributional shape is an important attribute of data, regardless of whether scores are analyzed descriptively or inferentially. Because the degree of skewness can be summarized by means of a single number, and because computers have no difficulty providing such measures (or estimates) of skewness, those who prepare research reports should include a numerical index of skewness every time they provide measures of central tendency and variability." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"If a researcher checks the normality assumption by visually inspecting each sample’s data (for example, by looking at a frequency distribution or a histogram), that researcher might incorrectly think that the data are nonnormal because the distribution appears to be too tall and skinny or too flat and squatty. As a result of this misdiagnosis, the researcher might unnecessarily abandon his or her initial plan to use a parametric statistical test in favor of a different procedure, perhaps one that is thought to be distribution-free." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"If data are normally distributed, certain things are known about the group and individual scores in the group. For example, the three most frequently used measures of central tendency - the arithmetic mean, median, and mode - all have the same numerical value in a normal distribution. Moreover, if a distribution is normal, we can determine a person’s percentile if we know his or her z-score or T-score." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"It is dangerous to think that standard scores, such as z and T, form a normal distribution because (1) they don’t have to and (2) they often won’t. If you mistakenly presume that a set of standard scores are normally distributed (when they’re not), your conversion of z-scores (or T-scores) into percentiles can lead to great inaccuracies." (Schuyler W Huck, "Statistical Misconceptions", 2008)
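Huck's point is that the z-to-percentile conversion is only valid under normality. Assuming the scores really are normal, the conversion itself is one call to the standard library's NormalDist (Python 3.8+):

```python
from statistics import NormalDist

z = 1.0
# Valid ONLY if the scores are normally distributed - the very assumption
# Huck warns against making blindly
percentile = NormalDist().cdf(z) * 100
print(round(percentile, 1))  # -> 84.1
```

If the standard scores come from a skewed distribution, this figure can be badly wrong, which is exactly the inaccuracy the quote describes.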

"It should be noted that any finite data set cannot “follow” the normal curve exactly. That’s because a normal curve’s two 'tails' extend out to positive and negative infinity. The curved line that forms a normal curve gets closer and closer to the baseline as the curved line moves further and further away from its middle section; however, the curved line never actually touches the abscissa." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"[…] kurtosis is influenced by the variability of the data. This fact leads to two surprising characteristics of kurtosis. First, not all rectangular distributions have the same amount of kurtosis. Second, certain distributions that are not rectangular are more platykurtic than are rectangular distributions!" (Schuyler W Huck, "Statistical Misconceptions", 2008)

"The shape of a normal curve is influenced by two things: (1) the distance between the baseline and the curve’s apex, and (2) the length, on the baseline, that’s set equal to one standard deviation. The arbitrary values chosen for these distances by the person drawing the normal curve determine the appearance of the resulting picture." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"The concept of kurtosis is often thought to deal with the 'peakedness' of a distribution. Compared to a normal distribution (which is said to have a moderate peak), distributions that have taller peaks are referred to as being leptokurtic, while those with smaller peaks are referred to as being platykurtic. Regarding the second of these terms, authors and instructors often suggest that the word flat (which rhymes with the first syllable of platykurtic) is a good mnemonic device for remembering that platykurtic distributions tend to be flatter than normal." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"The second surprising feature of kurtosis is that rectangular distributions, which are flat, are not maximally platykurtic. Bimodal distributions can yield lower kurtosis values than rectangular distributions, even in those situations where the number of scores and score variability are held constant." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"There are degrees to which a distribution can deviate from normality in terms of peakedness. A platykurtic distribution, for instance, might be slightly less peaked than a normal distribution, moderately less peaked than normal, or totally lacking in any peak. One is tempted to think that any perfectly rectangular distribution, being ultraflat in its shape, would be maximally platykurtic. However, this is not the case." (Schuyler W Huck, "Statistical Misconceptions", 2008)
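Huck's claim is easy to verify numerically: the excess kurtosis (fourth standardized moment minus 3) of a rectangular sample is about −1.2, while an extreme two-point bimodal sample reaches the theoretical minimum of −2. A small stdlib sketch (the particular samples are invented for illustration):

```python
def excess_kurtosis(xs):
    """Sample excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

rectangular = list(range(100))   # flat (rectangular) distribution
bimodal = [0] * 50 + [1] * 50    # all mass at two extreme points

print(excess_kurtosis(rectangular))  # ≈ -1.2
print(excess_kurtosis(bimodal))      # -2.0: more platykurtic than rectangular
```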

🖍️Field Cady - Collected Quotes

"A common misconception is that data scientists don’t need visualizations. This attitude is not only inaccurate: it is very dangerous. Most machine learning algorithms are not inherently visual, but it is very easy to misinterpret their outputs if you look only at the numbers; there is no substitute for the human eye when it comes to making intuitive sense of things." (Field Cady, "The Data Science Handbook", 2017)

"AI failed (at least relative to the hype it had generated), and it’s partly out of embarrassment on behalf of their discipline that the term 'artificial intelligence' is rarely used in computer science circles (although it’s coming back into favor, just without the over-hyping). We are as far away from mimicking human intelligence as we have ever been, partly because the human brain is fantastically more complicated than a mere logic engine." (Field Cady, "The Data Science Handbook", 2017)

"At very small time scales, the motion of a particle is more like a random walk, as it gets jostled about by discrete collisions with water molecules. But virtually any random movement on small time scales will give rise to Brownian motion on large time scales, just so long as the motion is unbiased. This is because of the Central Limit Theorem, which tells us that the aggregate of many small, independent motions will be normally distributed." (Field Cady, "The Data Science Handbook", 2017)
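A quick stdlib simulation of the effect Cady describes (step distribution, step count, and walk count are arbitrary choices for this sketch): endpoints of many unbiased random walks cluster the way a normal distribution predicts, with roughly 68% falling within one standard deviation of the mean.

```python
import random
import statistics

random.seed(0)

def walk_endpoint(steps):
    """Endpoint of an unbiased random walk with uniform(-1, 1) increments."""
    return sum(random.uniform(-1, 1) for _ in range(steps))

endpoints = [walk_endpoint(100) for _ in range(5000)]
mu = statistics.mean(endpoints)
sd = statistics.pstdev(endpoints)

# For a normal distribution, about 68% of values fall within one sd of the mean.
within = sum(abs(e - mu) <= sd for e in endpoints) / len(endpoints)
print(round(mu, 2), round(sd, 2), round(within, 2))
```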

"By far the greatest headache in machine learning is the problem of overfitting. This means that your results look great for the data you trained them on, but they don’t generalize to other data in the future. [...] The solution is to train on some of your data and assess performance on other data." (Field Cady, "The Data Science Handbook", 2017) 
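Cady's remedy - training on some data and assessing on the rest - can be sketched with a deliberately overfit model: a 1-nearest-neighbor classifier memorizes noisy training labels perfectly, and only the held-out set exposes the gap (the data generator and 20% noise rate are invented for this example):

```python
import random

random.seed(1)

def make_data(n, noise=0.2):
    """Points on [0, 1]; true label is x > 0.5, flipped with prob `noise`."""
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, data):
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)

train, test = make_data(200), make_data(200)
print(accuracy(train, train))  # 1.0: the model memorized its training data
print(accuracy(train, test))   # noticeably lower on unseen data
```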

"Extracting good features is the most important thing for getting your analysis to work. It is much more important than good machine learning classifiers, fancy statistical techniques, or elegant code. Especially if your data doesn’t come with readily available features (as is the case with web pages, images, etc.), how you reduce it to numbers will make the difference between success and failure." (Field Cady, "The Data Science Handbook", 2017)

"Feature extraction is also the most creative part of data science and the one most closely tied to domain expertise. Typically, a really good feature will correspond to some real‐world phenomenon. Data scientists should work closely with domain experts and understand what these phenomena mean and how to distill them into numbers." (Field Cady, "The Data Science Handbook", 2017)
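A minimal illustration of reducing non-numeric data to features - here, a few hand-picked word-level features from raw text; the specific features are invented for the example and a real project would choose them with a domain expert:

```python
def text_features(doc):
    """Reduce a raw string to a few illustrative numeric features."""
    words = doc.lower().split()
    return {
        "n_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "n_exclamations": doc.count("!"),
    }

print(text_features("Buy now!!! Limited offer!"))
```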

"Outliers make it very hard to give an intuitive interpretation of the mean, but in fact, the situation is even worse than that. For a real‐world distribution, there always is a mean (strictly speaking, you can define distributions with no mean, but they’re not realistic), and when we take the average of our data points, we are trying to estimate that mean. But when there are massive outliers, just a single data point is likely to dominate the value of the mean and standard deviation, so much more data is required to even estimate the mean, let alone make sense of it." (Field Cady, "The Data Science Handbook", 2017)
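The point is easy to see numerically: one extreme value drags the mean past every ordinary observation, while the median barely moves (the numbers are made up for illustration):

```python
import statistics

data = [10, 11, 12, 13, 14]
with_outlier = data + [1000]

print(statistics.mean(data), statistics.median(data))  # 12, 12
print(statistics.mean(with_outlier))                   # ≈ 176.7: dominated by one point
print(statistics.median(with_outlier))                 # 12.5: barely changed
```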

"The first step is always to frame the problem: understand the business use case and craft a well‐defined analytics problem (or problems) out of it. This is followed by an extensive stage of grappling with the data and the real‐world things that it describes, so that we can extract meaningful features. Finally, these features are plugged into analytical tools that give us hard numerical results." (Field Cady, "The Data Science Handbook", 2017)

"Theoretically, the normal distribution is most famous because many distributions converge to it, if you sample from them enough times and average the results. This applies to the binomial distribution, Poisson distribution and pretty much any other distribution you’re likely to encounter (technically, any one for which the mean and standard deviation are finite)." (Field Cady, "The Data Science Handbook", 2017)

"With time series though, there is absolutely no substitute for plotting. The pertinent pattern might end up being a sharp spike followed by a gentle taper down. Or, maybe there are weird plateaus. There could be noisy spikes that have to be filtered out. A good way to look at it is this: means and standard deviations are based on the naïve assumption that data follows pretty bell curves, but there is no corresponding 'default' assumption for time series data (at least, not one that works well with any frequency), so you always have to look at the data to get a sense of what’s normal. [...] Along the lines of figuring out what patterns to expect, when you are exploring time series data, it is immensely useful to be able to zoom in and out." (Field Cady, "The Data Science Handbook", 2017)

"The myth of replacing domain experts comes from people putting too much faith in the power of ML to find patterns in the data. [...] ML looks for patterns that are generally pretty crude - the power comes from the sheer scale at which they can operate. If the important patterns in the data are not sufficiently crude then ML will not be able to ferret them out. The most powerful classes of models, like deep learning, can sometimes learn good-enough proxies for the real patterns, but that requires more training data than is usually available and yields complicated models that are hard to understand and impossible to debug. It’s much easier to just ask somebody who knows the domain!" (Field Cady, "Data Science: The Executive Summary: A Technical Book for Non-Technical Professionals", 2021)
