12 November 2018

Data Science: Statisticians (Just the Quotes)

"Even trained statisticians often fail to appreciate the extent to which statistics are vitiated by the unrecorded assumptions of their interpreters." (George B Shaw, "The Doctor's Dilemma", 1906)

"Figures may not lie, but statistics compiled unscientifically and analyzed incompetently are almost sure to be misleading, and when this condition is unnecessarily chronic the so-called statisticians may be called liars." (Edwin B Wilson, "Bulletin of the American Mathematical Society", Vol 18, 1912)

"The statistician’s job is to draw general conclusions from fragmentary data. Too often the data supplied to him for analysis are not only fragmentary but positively incoherent, so that he can do next to nothing with them. Even the most kindly statistician swears heartily under his breath whenever this happens." (Michael J Moroney, "Facts from Figures", 1951)

"To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of." (Sir Ronald A Fisher, [presidential address] 1938)

"An inference, if it is to have scientific value, must constitute a prediction concerning future data. If the inference is to be made purely with the help of the distribution theory of statistics, the experiments that constitute evidence for the inference must arise from a state of statistical control; until that state is reached, there is no universe, normal or otherwise, and the statistician’s calculations by themselves are an illusion if not a delusion. The fact is that when distribution theory is not applicable for lack of control, any inference, statistical or otherwise, is little better than a conjecture. The state of statistical control is therefore the goal of all experimentation." (William E Deming, "Statistical Method from the Viewpoint of Quality Control", 1939)

"Errors of the third kind happen in conventional tests of differences of means, but they are usually not considered, although their existence is probably recognized. It seems to the author that there may be several reasons for this among which are 1) a preoccupation on the part of mathematical statisticians with the formal questions of acceptance and rejection of null hypotheses without adequate consideration of the implications of the error of the third kind for the practical experimenter, 2) the rarity with which an error of the third kind arises in the usual tests of significance." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"The characteristic which distinguishes the present-day professional statistician, is his interest and skill in the measurement of the fallibility of conclusions." (George W Snedecor, "On a Unique Feature of Statistics", [address] 1948)

"It is very easy to devise different tests which, on the average, have similar properties, [...] they behave satisfactorily when the null hypothesis is true and have approximately the same power of detecting departures from that hypothesis. Two such tests may, however, give very different results when applied to a given set of data. The situation leads to a good deal of contention amongst statisticians and much discredit of the science of statistics. The appalling position can easily arise in which one can get any answer one wants if only one goes around to a large enough number of statisticians." (Frank Yates, "Discussion on the Paper by Dr. Box and Dr. Andersen", Journal of the Royal Statistical Society B Vol. 17, 1955)

"One feature [...] which requires much more justification than is usually given, is the setting up of unplausible null hypotheses. For example, a statistician may set out a test to see whether two drugs have exactly the same effect, or whether a regression line is exactly straight. These hypotheses can scarcely be taken literally." (Cedric A B Smith, "Book review of Norman T. J. Bailey: Statistical Methods in Biology", Applied Statistics 9, 1960)

"The statistician cannot excuse himself from the duty of getting his head clear on the principles of scientific inference, but equally no other thinking man can avoid a like obligation." (Sir Ronald A Fisher, "The Design of Experiments", 1971)

"Statistics is a very powerful and persuasive mathematical tool. People put a lot of faith in printed numbers. It seems when a situation is described by assigning it a numerical value, the validity of the report increases in the mind of the viewer. It is the statistician's obligation to be aware that data in the eyes of the uninformed or poor data in the eyes of the naive viewer can be as deceptive as any falsehoods." (Theoni Pappas, "More Joy of Mathematics: Exploring mathematical insights & concepts", 1991)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule Your World", 2010)

"When statisticians, trained in math and probability theory, try to assess likely outcomes, they demand a plethora of data points. Even then, they recognize that unless it’s a very simple and controlled action such as flipping a coin, unforeseen variables can exert significant influence." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"Some scientists (e.g., econometricians) like to work with mathematical equations; others (e.g., hard-core statisticians) prefer a list of assumptions that ostensibly summarizes the structure of the diagram. Regardless of language, the model should depict, however qualitatively, the process that generates the data - in other words, the cause-effect forces that operate in the environment and shape the data generated." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)
