24 December 2018

🔭Data Science: Phenomena (Just the Quotes)

"The word ‘chance’ then expresses only our ignorance of the causes of the phenomena that we observe to occur and to succeed one another in no apparent order. Probability is relative in part to this ignorance, and in part to our knowledge.” (Pierre-Simon Laplace, "Mémoire sur les Approximations des Formules qui sont Fonctions de Très Grands Nombres", 1783)

"The aim of every science is foresight. For the laws of established observation of phenomena are generally employed to foresee their succession. All men, however little advanced make true predictions, which are always based on the same principle, the knowledge of the future from the past." (Auguste Compte, "Plan des travaux scientifiques nécessaires pour réorganiser la société", 1822)

"The insights gained and garnered by the mind in its wanderings among basic concepts are benefits that theory can provide. Theory cannot equip the mind with formulas for solving problems, nor can it mark the narrow path on which the sole solution is supposed to lie by planting a hedge of principles on either side. But it can give the mind insight into the great mass of phenomena and of their relationships, then leave it free to rise into the higher realms of action." (Carl von Clausewitz, "On War", 1832)

"Theories usually result from the precipitate reasoning of an impatient mind which would like to be rid of phenomena and replace them with images, concepts, indeed often with mere words." (Johann Wolfgang von Goethe, "Maxims and Reflections", 1833)

"[…] in order to observe, our mind has need of some theory or other. If in contemplating phenomena we did not immediately connect them with principles, not only would it be impossible for us to combine these isolated observations, and therefore to derive profit from them, but we should even be entirely incapable of remembering facts, which would for the most remain unnoted by us." (Auguste Comte, "Cours de Philosophie Positive", 1830-1842)

"The dimmed outlines of phenomenal things all merge into one another unless we put on the focusing-glass of theory, and screw it up sometimes to one pitch of definition and sometimes to another, so as to see down into different depths through the great millstone of the world." (James C Maxwell, "Are There Real Analogies in Nature?", 1856) 

"Isolated facts and experiments have in themselves no value, however great their number may be. They only become valuable in a theoretical or practical point of view when they make us acquainted with the law of a series of uniformly recurring phenomena, or, it may be, only give a negative result showing an incompleteness in our knowledge of such a law, till then held to be perfect." (Hermann von Helmholtz, "The Aim and Progress of Physical Science", 1869)

"If statistical graphics, although born just yesterday, extends its reach every day, it is because it replaces long tables of numbers and it allows one not only to embrace at glance the series of phenomena, but also to signal the correspondences or anomalies, to find the causes, to identify the laws." (Émile Cheysson, cca. 1877)

"There is no doubt that graphical expression will soon replace all others whenever one has at hand a movement or change of state - in a word, any phenomenon. Born before science, language is often inappropriate to express exact measures or definite relations." (Étienne-Jules Marey, "La méthode graphique dans les sciences expérimentales et principalement en physiologie et en médecine", 1878)

"Most surprising and far-reaching analogies revealed themselves between apparently quite disparate natural processes. It seemed that nature had built the most various things on exactly the same pattern; or, in the dry words of the analyst, the same differential equations hold for the most various phenomena." (Ludwig Boltzmann, "On the methods of theoretical physics", 1892)

"Some of the common ways of producing a false statistical argument are to quote figures without their context, omitting the cautions as to their incompleteness, or to apply them to a group of phenomena quite different to that to which they in reality relate; to take these estimates referring to only part of a group as complete; to enumerate the events favorable to an argument, omitting the other side; and to argue hastily from effect to cause, this last error being the one most often fathered on to statistics. For all these elementary mistakes in logic, statistics is held responsible." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"A model, like a novel, may resonate with nature, but it is not a ‘real’ thing. Like a novel, a model may be convincing - it may ‘ring true’ if it is consistent with our experience of the natural world. But just as we may wonder how much the characters in a novel are drawn from real life and how much is artifice, we might ask the same of a model: How much is based on observation and measurement of accessible phenomena, how much is convenience? Fundamentally, the reason for modeling is a lack of full access, either in time or space, to the phenomena of interest." (Kenneth Belitz, Science, Vol. 263, 1944)

"The principle of complementarity states that no single model is possible which could provide a precise and rational analysis of the connections between these phenomena [before and after measurement]. In such a case, we are not supposed, for example, to attempt to describe in detail how future phenomena arise out of past phenomena. Instead, we should simply accept without further analysis the fact that future phenomena do in fact somehow manage to be produced, in a way that is, however, necessarily beyond the possibility of a detailed description. The only aim of a mathematical theory is then to predict the statistical relations, if any, connecting the phenomena." (David Bohm, "A Suggested Interpretation of the Quantum Theory in Terms of ‘Hidden’ Variables", 1952)

"The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work" (John Von Neumann, "Method in the Physical Sciences", 1955)

"As shorthand, when the phenomena are suitably simple, words such as equilibrium and stability are of great value and convenience. Nevertheless, it should be always borne in mind that they are mere shorthand, and that the phenomena will not always have the simplicity that these words presuppose." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"Can there be laws of chance? The answer, it would seem should be negative, since chance is in fact defined as the characteristic of the phenomena which follow no law, phenomena whose causes are too complex to permit prediction." (Félix E Borel, "Probabilities and Life", 1962)

"Theories are usually introduced when previous study of a class of phenomena has revealed a system of uniformities. […] Theories then seek to explain those regularities and, generally, to afford a deeper and more accurate understanding of the phenomena in question. To this end, a theory construes those phenomena as manifestations of entities and processes that lie behind or beneath them, as it were." (Carl G Hempel, "Philosophy of Natural Science", 1966)

"The less we understand a phenomenon, the more variables we require to explain it." (Russell L Ackoff, "Management Science", 1967)

 "As soon as we inquire into the reasons for the phenomena, we enter the domain of theory, which connects the observed phenomena and traces them back to a single ‘pure’ phenomena, thus bringing about a logical arrangement of an enormous amount of observational material." (Georg Joos, "Theoretical Physics", 1968)

"A model is an abstract description of the real world. It is a simple representation of more complex forms, processes and functions of physical phenomena and ideas." (Moshe F Rubinstein & Iris R Firstenberg, "Patterns of Problem Solving", 1975)

"A real change of theory is not a change of equations - it is a change of mathematical structure, and only fragments of competing theories, often not very important ones conceptually, admit comparison with each other within a limited range of phenomena." (Yuri I Manin, "Mathematics and Physics", 1981)

"In all scientific fields, theory is frequently more important than experimental data. Scientists are generally reluctant to accept the existence of a phenomenon when they do not know how to explain it. On the other hand, they will often accept a theory that is especially plausible before there exists any data to support it." (Richard Morris, 1983)

"Nature is disordered, powerful and chaotic, and through fear of the chaos we impose system on it. We abhor complexity, and seek to simplify things whenever we can by whatever means we have at hand. We need to have an overall explanation of what the universe is and how it functions. In order to achieve this overall view we develop explanatory theories which will give structure to natural phenomena: we classify nature into a coherent system which appears to do what we say it does." (James Burke, "The Day the Universe Changed", 1985) 

"The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions." (Fergus Daly et al, "Elements of Statistics", 1995)

"[…] the simplest hypothesis proposed as an explanation of phenomena is more likely to be the true one than is any other available hypothesis, that its predictions are more likely to be true than those of any other available hypothesis, and that it is an ultimate a priori epistemic principle that simplicity is evidence for truth." (Richard Swinburne, "Simplicity as Evidence for Truth", 1997)

"The point is that scientific descriptions of phenomena in all of these cases do not fully capture reality they are models. This is not a shortcoming but a strength of science much of the scientist's art lies in figuring out what to include and what to exclude in a model, and this ability allows science to make useful predictions without getting bogged down by intractable details." (Philip Ball," The Self-Made Tapestry: Pattern Formation in Nature", 1998)

"A scientific theory is a concise and coherent set of concepts, claims, and laws (frequently expressed mathematically) that can be used to precisely and accurately explain and predict natural phenomena." (Mordechai Ben-Ari, "Just a Theory: Exploring the Nature of Science", 2005)

"Complexity arises when emergent system-level phenomena are characterized by patterns in time or a given state space that have neither too much nor too little form. Neither in stasis nor changing randomly, these emergent phenomena are interesting, due to the coupling of individual and global behaviours as well as the difficulties they pose for prediction. Broad patterns of system behaviour may be predictable, but the system's specific path through a space of possible states is not." (Steve Maguire et al, "Complexity Science and Organization Studies", 2006)

"Humans have difficulty perceiving variables accurately […]. However, in general, they tend to have inaccurate perceptions of system states, including past, current, and future states. This is due, in part, to limited ‘mental models’ of the phenomena of interest in terms of both how things work and how to influence things. Consequently, people have difficulty determining the full implications of what is known, as well as considering future contingencies for potential systems states and the long-term value of addressing these contingencies. " (William B. Rouse, "People and Organizations: Explorations of Human-Centered Design", 2007)

"A theory is a speculative explanation of a particular phenomenon which derives it legitimacy from conforming to the primary assumptions of the worldview of the culture in which it appears. There can be more than one theory for a particular phenomenon that conforms to a given worldview. […]  A new theory may seem to trigger a change in worldview, as in this case, but logically a change in worldview must precede a change in theory, otherwise the theory will not be viable. A change in worldview will necessitate a change in all theories in all branches of study." (M G Jackson, "Transformative Learning for a New Worldview: Learning to Think Differently", 2008)

"[...] construction of a data model is precisely the selective relevant depiction of the phenomena by the user of the theory required for the possibility of representation of the phenomenon."  (Bas C van Fraassen, "Scientific Representation: Paradoxes of Perspective", 2008)

"Put simply, statistics is a range of procedures for gathering, organizing, analyzing and presenting quantitative data. […] Essentially […], statistics is a scientific approach to analyzing numerical data in order to enable us to maximize our interpretation, understanding and use. This means that statistics helps us turn data into information; that is, data that have been interpreted, understood and are useful to the recipient. Put formally, for your project, statistics is the systematic collection and analysis of numerical data, in order to investigate or discover relationships among phenomena so as to explain, predict and control their occurrence." (Reva B Brown & Mark Saunders, "Dealing with Statistics: What You Need to Know", 2008)

"A theory is a set of deductively closed propositions that explain and predict empirical phenomena, and a model is a theory that is idealized." (Jay Odenbaugh, "True Lies: Realism, Robustness, and Models", Philosophy of Science, Vol. 78, No. 5, 2011)

"Mathematical modeling is the modern version of both applied mathematics and theoretical physics. In earlier times, one proposed not a model but a theory. By talking today of a model rather than a theory, one acknowledges that the way one studies the phenomenon is not unique; it could also be studied other ways. One's model need not claim to be unique or final. It merits consideration if it provides an insight that isn't better provided by some other model." (Reuben Hersh, ”Mathematics as an Empirical Phenomenon, Subject to Modeling”, 2017)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"The first epistemic principle to embrace is that there is always a gap between our data and the real world. We fall headfirst into a pitfall when we forget that this gap exists, that our data isn't a perfect reflection of the real-world phenomena it's representing. Do people really fail to remember this? It sounds so basic. How could anyone fall into such an obvious trap?" (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"We live on islands surrounded by seas of data. Some call it 'big data'. In these seas live various species of observable phenomena. Ideas, hypotheses, explanations, and graphics also roam in the seas of data and can clarify the waters or allow unsupported species to die. These creatures thrive on visual explanation and scientific proof. Over time new varieties of graphical species arise, prompted by new problems and inner visions of the fishers in the seas of data." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Although to penetrate into the intimate mysteries of nature and hence to learn the true causes of phenomena is not allowed to us, nevertheless it can happen that a certain fictive hypothesis may suffice for explaining many phenomena." (Leonhard Euler)

🔭Data Science: Data Mining (Just the Quotes)

"Data mining is the efficient discovery of valuable, nonobvious information from a large collection of data. […] Data mining centers on the automated discovery of new facts and relationships in data. The idea is that the raw material is the business data, and the data mining algorithm is the excavator, sifting through the vast quantities of raw data looking for the valuable nuggets of business information." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"Data mining is more of an art than a science. No one can tell you exactly how to choose columns to include in your data mining models. There are no hard and fast rules you can follow in deciding which columns either help or hinder the final model. For this reason, it is important that you understand how the data behaves before beginning to mine it. The best way to achieve this level of understanding is to see how the data is distributed across columns and how the different columns relate to one another. This is the process of exploring the data." (Seth Paul et al. "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis", 2002)

"Things are changing. Statisticians now recognize that computer scientists are making novel contributions while computer scientists now recognize the generality of statistical theory and methodology. Clever data mining algorithms are more scalable than statisticians ever thought possible. Formal statistical theory is more pervasive than computer scientists had realized." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Most mainstream data-mining techniques ignore the fact that real-world datasets are combinations of underlying data, and build single models from them. If such datasets can first be separated into the components that underlie them, we might expect that the quality of the models will improve significantly. (David Skillicorn, "Understanding Complex Datasets: Data Mining with Matrix Decompositions", 2007)

"The name ‘data mining’ derives from the metaphor of data as something that is large, contains far too much detail to be used as it is, but contains nuggets of useful information that can have value. So data mining can be defined as the extraction of the valuable information and actionable knowledge that is implicit in large amounts of data. (David Skillicorn, "Understanding Complex Datasets: Data Mining with Matrix Decompositions", 2007)

"Compared to traditional statistical studies, which are often hindsight, the field of data mining finds patterns and classifications that look toward and even predict the future. In summary, data mining can (1) provide a more complete understanding of data by finding patterns previously not seen and (2) make models that predict, thus enabling people to make better decisions, take action, and therefore mold future events." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"Traditional statistical studies use past information to determine a future state of a system (often called prediction), whereas data mining studies use past information to construct patterns based not solely on the input data, but also on the logical consequences of those data. This process is also called prediction, but it contains a vital element missing in statistical analysis: the ability to provide an orderly expression of what might be in the future, compared to what was in the past (based on the assumptions of the statistical method)." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"The difference between human dynamics and data mining boils down to this: Data mining predicts our behaviors based on records of our patterns of activity; we don't even have to understand the origins of the patterns exploited by the algorithm. Students of human dynamics, on the other hand, seek to develop models and theories to explain why, when, and where we do the things we do with some regularity." (Albert-László Barabási, "Bursts: The Hidden Pattern Behind Everything We Do", 2010)

"Data mining is a craft. As with many crafts, there is a well-defined process that can help to increase the likelihood of a successful result. This process is a crucial conceptual tool for thinking about data science projects. [...] data mining is an exploratory undertaking closer to research and development than it is to engineering." (Foster Provost, "Data Science for Business", 2013)

"There is another important distinction pertaining to mining data: the difference between (1) mining the data to find patterns and build models, and (2) using the results of data mining. Students often confuse these two processes when studying data science, and managers sometimes confuse them when discussing business analytics. The use of data mining results should influence and inform the data mining process itself, but the two should be kept distinct." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Unfortunately, creating an objective function that matches the true goal of the data mining is usually impossible, so data scientists often choose based on faith and experience." (Foster Provost, "Data Science for Business", 2013)

"Data Mining is the art and science of discovering useful innovative patterns from data. (Anil K. Maheshwari, "Business Intelligence and Data Mining", 2015)

"Machine learning takes many different forms and goes by many different names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and more. Each of these is used by different communities and has different associations. Some have a long half-life, some less so." (Pedro Domingos, "The Master Algorithm", 2015)

"Today we routinely learn models with millions of parameters, enough to give each elephant in the world his own distinctive wiggle. It’s even been said that data mining means 'torturing the data until it confesses'." (Pedro Domingos, "The Master Algorithm", 2015)

"Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked as a form of representation to an observational or otherwise empirical domain of interest. 'Structure' has long been understood as symmetry which can take many forms with respect to any transformation, including point, translational, rotational, and many others. Symmetries directly point to invariants, which pinpoint intrinsic properties of the data and of the background empirical domain of interest. As our data models change, so too do our perspectives on analysing data." (Fionn Murtagh, "Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics", 2018)

"The goal of data science is to improve decision making by basing decisions on insights extracted from large data sets. As a field of activity, data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting nonobvious and useful patterns from large data sets. It is closely related to the fields of data mining and machine learning, but it is broader in scope." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

🔭Data Science: Prediction (Just the Quotes)

"The aim of every science is foresight. For the laws of established observation of phenomena are generally employed to foresee their succession. All men, however little advanced make true predictions, which are always based on the same principle, the knowledge of the future from the past." (Auguste Compte, "Plan des travaux scientifiques nécessaires pour réorganiser la société", 1822)

"As a science progresses, its power of foresight rapidly increases, until the mathematician in his library acquires the power of anticipating nature, and predicting what will happen in circumstances which the eye of man has never examined." (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"No matter how solidly founded a prediction may appear to us, we are never absolutely sure that experiment will not contradict it, if we undertake to verify it . […] It is far better to foresee even without certainty than not to foresee at all." (Henri Poincaré, "The Foundations of Science", 1913)

"[…] the statistical prediction of the future from the past cannot be generally valid, because whatever is future to any given past, is in tum past for some future. That is, whoever continually revises his judgment of the probability of a statistical generalization by its successively observed verifications and failures, cannot fail to make more successful predictions than if he should disregard the past in his anticipation of the future. This might be called the ‘Principle of statistical accumulation’." (Clarence I Lewis, "Mind and the World-Order: Outline of a Theory of Knowledge", 1929)

"Postulate 1. All chance systems of causes are not alike in the sense that they enable us to predict the future in terms of the past. Postulate 2. Constant systems of chance causes do exist in nature. Postulate 3. Assignable causes of variation may be found and eliminated."(Walter A Shewhart, "Economic Control of Quality of Manufactured Product", 1931)

"Rule 1. Original data should be presented in a way that will preserve the evidence in the original data for all the predictions assumed to be useful." (Walter A Shewhart, "Economic Control of Quality of Manufactured Product", 1931)

"Rule 2. Any summary of a distribution of numbers in terms of symmetric functions should not give an objective degree of belief in any one of the inferences or predictions to be made therefrom that would cause human action significantly different from what this action would be if the original distributions had been taken as evidence." (Walter A Shewhart, "Economic Control of Quality of Manufactured Product", 1931)

"Factual science may collect statistics, and make charts. But its predictions are, as has been well said, but past history reversed." (John Dewey, "Art as Experience", 1934)

"It is never possible to predict a physical occurrence with unlimited precision." (Max Planck, "A Scientific Autobiography", 1949)

"To say that observations of the past are certain, whereas predictions are merely probable, is not the ultimate answer to the question of induction; it is only a sort of intermediate answer, which is incomplete unless a theory of probability is developed that explains what we should mean by ‘probable’ and on what ground we can assert probabilities." (Hans Reichenbach, "The Rise of Scientific Philosophy", 1951)

"The world is not made up of empirical facts with the addition of the laws of nature: what we call the laws of nature are conceptual devices by which we organize our empirical knowledge and predict the future." (Richard B Braithwaite, "Scientific Explanation", 1953)

"The predictions of physical theories for the most part concern situations where initial conditions can be precisely specified. If such initial conditions are not found in nature, they can be arranged." (Anatol Rapoport, "The Search for Simplicity", 1956)

"Predictions, prophecies, and perhaps even guidance – those who suggested this title to me must have hoped for such-even though occasional indulgences in such actions by statisticians has undoubtedly contributed to the characterization of a statistician as a man who draws straight lines from insufficient data to foregone conclusions!" (John W Tukey, "Where do We Go From Here?", Journal of the American Statistical Association, Vol. 55, No. 289, 1960)

"Can there be laws of chance? The answer, it would seem should be negative, since chance is in fact defined as the characteristic of the phenomena which follow no law, phenomena whose causes are too complex to permit prediction." (Félix E Borel, "Probabilities and Life", 1962)

"[…] All predictions are statistical, but some predictions have such a high probability that one tends to regard them as certain." (Marshall J Walker, "The Nature of Scientific Thought", 1963)

"Measurement, we have seen, always has an element of error in it. The most exact description or prediction that a scientist can make is still only approximate." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"The usefulness of the models in constructing a testable theory of the process is severely limited by the quickly increasing number of parameters which must be estimated in order to compare the predictions of the models with empirical results" (Anatol Rapoport, "Prisoner's Dilemma: A study in conflict and cooperation", 1965)

"It is of course desirable to work with manageable models which maximize generality, realism, and precision toward the overlapping but not identical goals of understanding, predicting, and modifying nature. But this cannot be done."(Richard Levins, "The strategy of model building in population biology", American Scientist Vol. 54 (4), 1966)

"The language of association and prediction is probably most often used because the evidence seems insufficient to justify a direct causal statement. A better practice is to state the causal hypothesis and then to present the evidence along with an assessment with respect to the causal hypothesis - instead of letting the quality of the data determine the language of the explanation." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The moment you forecast you know you’re going to be wrong, you just don’t know when and in which direction." (Edgar R Fiedler, 1977)

"But a theory is not like an airline or bus timetable. We are not interested simply in the accuracy of its predictions. A theory also serves as a base for thinking. It helps us to understand what is going on by enabling us to organize our thoughts. Faced with a choice between a theory which predicts well but gives us little insight into how the system works and one which gives us this insight but predicts badly, I would choose the latter, and I am inclined to think that most economists would do the same." (Ronald Coase, "How should economists choose?", 1981)

"Prediction can never be absolutely valid and therefore science can never prove some generalization or even test a single descriptive statement and in that way arrive at final truth." (Gregory Bateson, Mind and Nature: A necessary unity", 1988)

"We can predict only those things we set up to be predictable, not what we encounter in the real world of living and reactive processes." (Bill Mollison, "Permaculture: A Designers' Manual", 1988)

"A model is generally more believable if it can predict what will happen, rather than 'explain' something that has already occurred." (James R Thompson, "Empirical Model Building", 1989)

"The ability of a scientific theory to be refuted is the key criterion that distinguishes science from metaphysics. If a theory cannot be refuted, if there is no observation that will disprove it, then nothing can prove it - it cannot predict anything, it is a worthless myth." (Eric Lerner, "The Big Bang Never Happened", 1991)

"Unforeseen technological inventions can completely upset the most careful predictions." (Richard W Hamming, "The Art of Probability for Scientists and Engineers", 1991)

"Prediction (forecasting) is the process of generating information for the possible future development of a process from data about its past and its present development." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Probability theory is a serious instrument for forecasting, but the devil, as they say, is in the details - in the quality of information that forms the basis of probability estimates." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Under conditions of uncertainty, both rationality and measurement are essential to decision-making. Rational people process information objectively: whatever errors they make in forecasting the future are random errors rather than the result of a stubborn bias toward either optimism or pessimism. They respond to new information on the basis of a clearly defined set of preferences. They know what they want, and they use the information in ways that support their preferences." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"[…] the simplest hypothesis proposed as an explanation of phenomena is more likely to be the true one than is any other available hypothesis, that its predictions are more likely to be true than those of any other available hypothesis, and that it is an ultimate a priori epistemic principle that simplicity is evidence for truth." (Richard Swinburne, "Simplicity as Evidence for Truth", 1997)

"If you have only a small proportion of cases with missing data, you can simply throw out those cases for purposes of estimation; if you want to make predictions for cases with missing inputs, you don’t have the option of throwing those cases out." (Warren S Sarle, "Prediction with missing inputs", 1998) 

"The point is that scientific descriptions of phenomena in all of these cases do not fully capture reality they are models. This is not a shortcoming but a strength of science much of the scientist's art lies in figuring out what to include and what to exclude in a model, and this ability allows science to make useful predictions without getting bogged down by intractable details." (Philip Ball," The Self-Made Tapestry: Pattern Formation in Nature", 1998)

"We use mathematics and statistics to describe the diverse realms of randomness. From these descriptions, we attempt to glean insights into the workings of chance and to search for hidden causes. With such tools in hand, we seek patterns and relationships and propose predictions that help us make sense of the world." (Ivars Peterson, "The Jungles of Randomness: A Mathematical Safari", 1998)

"When a system is predictable, it is already performing as consistently as possible. Looking for assignable causes is a waste of time and effort. Instead, you can meaningfully work on making improvements and modifications to the process. When a system is unpredictable, it will be futile to try and improve or modify the process. Instead you must seek to identify the assignable causes which affect the system. The failure to distinguish between these two different courses of action is a major source of confusion and wasted effort in business today." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"When a process displays unpredictable behavior, you can most easily improve the process and process outcomes by identifying the assignable causes of unpredictable variation and removing their effects from your process." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Visualizations can be used to explore data, to confirm a hypothesis, or to manipulate a viewer. [...] In exploratory visualization the user does not necessarily know what he is looking for. This creates a dynamic scenario in which interaction is critical. [...] In a confirmatory visualization, the user has a hypothesis that needs to be tested. This scenario is more stable and predictable. System parameters are often predetermined." (Usama Fayyad et al, "Information Visualization in Data Mining and Knowledge Discovery", 2002)

"A smaller model with fewer covariates has two advantages: it might give better predictions than a big model and it is more parsimonious (simpler). Generally, as you add more variables to a regression, the bias of the predictions decreases and the variance increases. Too few covariates yields high bias; this called underfitting. Too many covariates yields high variance; this called overfitting. Good predictions result from achieving a good balance between bias and variance. […] fiding a good model involves trading of fit and complexity." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The only way to look into the future is use theories since conclusive data is only available about the past." (Clayton Christensen et al, "Seeing What’s Next: Using the Theories of Innovation to Predict Industry Change", 2004)

"Most long-range forecasts of what is technically feasible in future time periods dramatically underestimate the power of future developments because they are based on what I call the 'intuitive linear' view of history rather than the 'historical exponential' view." (Ray Kurzweil, "The Singularity is Near", 2005)

"There may be no significant difference between the point of view of inferring the true structure and that of making a prediction if an infinitely large quantity of data is available or if the data are noiseless. However, in modeling based on a finite quantity of real data, there is a significant gap between these two points of view, because an optimal model for prediction purposes may be different from one obtained by estimating the 'true model'." (Genshiro Kitagawa & Sadanori Konis, "Information Criteria and Statistical Modeling", 2007)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010) 

"Complexity carries with it a lack of predictability different to that of chaotic systems, i.e. sensitivity to initial conditions. In the case of complexity, the lack of predictability is due to relevant interactions and novel information created by them." (Carlos Gershenson, "Understanding Complex Systems", 2011)

"The illusion that we understand the past fosters overconfidence in our ability to predict the future." (Daniel Kahneman, "Thinking, Fast and Slow", 2011)

"[...] things that seem hopelessly random and unpredictable when viewed in isolation often turn out to be lawful and predictable when viewed in aggregate." (Steven Strogatz, "The Joy of X: A Guided Tour of Mathematics, from One to Infinity", 2012)

"In common usage, prediction means to forecast a future event. In data science, prediction more generally means to estimate an unknown value. This value could be something in the future (in common usage, true prediction), but it could also be something in the present or in the past. Indeed, since data mining usually deals with historical data, models very often are built and tested using events from the past." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"In data science, a predictive model is a formula for estimating the unknown value of interest: the target. The formula could be mathematical, or it could be a logical statement such as a rule. Often it is a hybrid of the two." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Under complexity science, the more interacting factors, the more unpredictable and irregular the outcome. To be succinct, the greater the complexity, the greater the unpredictability." (Lawrence K Samuels, "Defense of Chaos: The Chaology of Politics, Economics and Human Action", 2013)

"Without precise predictability, control is impotent and almost meaningless. In other words, the lesser the predictability, the harder the entity or system is to control, and vice versa. If our universe actually operated on linear causality, with no surprises, uncertainty, or abrupt changes, all future events would be absolutely predictable in a sort of waveless orderliness." (Lawrence K Samuels, "Defense of Chaos", 2013)

"A complete data analysis will involve the following steps: (i) Finding a good model to fit the signal based on the data. (ii) Finding a good model to fit the noise, based on the residuals from the model. (iii) Adjusting variances, test statistics, confidence intervals, and predictions, based on the model for the noise.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"For a confidence interval, the central limit theorem plays a role in the reliability of the interval because the sample mean is often approximately normal even when the underlying data is not. A prediction interval has no such protection. The shape of the interval reflects the shape of the underlying distribution. It is more important to examine carefully the normality assumption by checking the residuals […].(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Prediction about the future assumes that the statistical model will continue to fit future data. There are several reasons this is often implausible, but it also seems clear that the model will often degenerate slowly in quality, so that the model will fit data only a few periods in the future almost as well as the data used to fit the model. To some degree, the reliability of extrapolation into the future involves subject-matter expertise.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"The problem of complexity is at the heart of mankind’s inability to predict future events with any accuracy. Complexity science has demonstrated that the more factors found within a complex system, the more chances of unpredictable behavior. And without predictability, any meaningful control is nearly impossible. Obviously, this means that you cannot control what you cannot predict. The ability ever to predict long-term events is a pipedream. Mankind has little to do with changing climate; complexity does." (Lawrence K Samuels, "The Real Science Behind Changing Climate", LewRockwell.com, August 1, 2014)

"It is important to remember that predictive data analytics models built using machine learning techniques are tools that we can use to help make better decisions within an organization and are not an end in themselves. It is paramount that, when tasked with creating a predictive model, we fully understand the business problem that this model is being constructed to address and ensure that it does address it." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)

"Models are formal structures represented in mathematics and diagrams that help us to understand the world. Mastery of models improves your ability to reason, explain, design, communicate, act, predict, and explore." (Scott E Page, "The Model Thinker", 2018)

"Ideally, a decision maker or a forecaster will combine the outside view and the inside view - or, similarly, statistics plus personal experience. But it’s much better to start with the statistical view, the outside view, and then modify it in the light of personal experience than it is to go the other way around. If you start with the inside view you have no real frame of reference, no sense of scale - and can easily come up with a probability that is ten times too large, or ten times too small." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"This problem with adding additional variables is referred to as the curse of dimensionality. If you add enough variables into your black box, you will eventually find a combination of variables that performs well - but it may do so by chance. As you increase the number of variables you use to make your predictions, you need exponentially more data to distinguish true predictive capacity from luck." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"We filter new information. If it accords with what we expect, we’ll be more likely to accept it. […] Our brains are always trying to make sense of the world around us based on incomplete information. The brain makes predictions about what it expects, and tends to fill in the gaps, often based on surprisingly sparse data." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

More quotes on "Prediction" at the-web-of-knowledge.blogspot.com

🔭Data Science: Statistics (Just the Quotes)

"There are two aspects of statistics that are continually mixed, the method and the science. Statistics are used as a method, whenever we measure something, for example, the size of a district, the number of inhabitants of a country, the quantity or price of certain commodities, etc. […] There is, moreover, a science of statistics. It consists of knowing how to gather numbers, combine them and calculate them, in the best way to lead to certain results. But this is, strictly speaking, a branch of mathematics." (Alphonse P de Candolle, "Considerations on Crime Statistics", 1833)

"A judicious man looks at Statistics, not to get knowledge, but to save himself from having ignorance foisted on him." (Thomas Carlyle, "Chartism", 1840)

"What constitutes the well-being of a man? Many things; of which the wages he gets, and the bread he buys with them, are but one preliminary item. Grant, however, that the
wages were the whole; that once knowing the wages and the price of bread, we know all ; then what are the wages? Statistic Inquiry, in its present unguided condition, cannot
tell. The average rate of day's wages is not correctly ascertained for any portion of this country; not only not for half-centuries, it is not even ascertained anywhere for decades
or years: far from instituting comparisons with the past, the present itself is unknown to us." (Thomas Carlyle, "Chartism", 1840)

"Statistics has then for its object that of presenting a faithful representation of a state at a determined epoch." (Adolphe Quetelet, 1849)

"Most statistical arguments depend upon a few figures picked out at random." (William S Jevons, [letter to Richard Hutton] 1863)

"[Statistics] are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of man." (Sir Francis Galton, "Natural Inheritance", 1889)

"[…] statistics is the science of the measurement of the social organism, regarded as a whole, in all its manifestations." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"Statistics may rightly be called the science of averages. […] Great numbers and the averages resulting from them, such as we always obtain in measuring social phenomena, have great inertia. […] It is this constancy of great numbers that makes statistical measurement possible. It is to great numbers that statistical measurement chiefly applies." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"Statistics may, for instance, be called the science of counting. Counting appears at first sight to be a very simple operation, which any one can perform or which can be done automatically; but, as a matter of fact, when we come to large numbers, e.g., the population of the United Kingdom, counting is by no means easy, or within the power of an individual; limits of time and place alone prevent it being so carried out, and in no way can absolute accuracy be obtained when the numbers surpass certain limits." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"Statistics may be defined as numerical statements of facts by means of which large aggregates are analyzed, the relations of individual units to their groups are ascertained, comparisons are made between groups, and continuous records are maintained for comparative purposes." (Melvin T Copeland. "Statistical Methods" [in: Harvard Business Studies, Vol. III, Ed. by Melvin T Copeland, 1917])

"Statistics may be regarded as (i) the study of populations, (ii) as the study of variation, and (iii) as the study of methods of the reduction of data." (Sir Ronald A Fisher, "Statistical Methods for Research Worker", 1925)

"The conception of statistics as the study of variation is the natural outcome of viewing the subject as the study of populations; for a population of individuals in all respects identical is completely described by a description of anyone individual, together with the number in the group. The populations which are the object of statistical study always display variations in one or more respects. To speak of statistics as the study of variation also serves to emphasise the contrast between the aims of modern statisticians and those of their predecessors." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"The statistical examination of a body of data is thus logically similar to the general alternation of inductive and deductive methods throughout the sciences. A hypothesis is conceived and defined with all necessary exactitude; its logical consequences are ascertained by a deductive argument; these consequences are compared with the available observations; if these are completely in, accord with the deductions, the hypothesis is justified at least until fresh and more stringent observations are available." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"Statistics is a scientific discipline concerned with collection, analysis, and interpretation of data obtained from observation or experiment. The subject has a coherent structure based on the theory of Probability and includes many different procedures which contribute to research and development throughout the whole of Science and Technology." (Egon Pearson, 1936)

"All statistical analysis in business must aim at the control of action. The possible conclusions are: 1. Certain action must be taken. 2. No action is required. 3. Certain tendencies must be watched. 4. The analysis is not significant and either (a) certain further facts are required, or (b) there are no indications that further facts should be obtained." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"[Statistics] is both a science and an art. It is a science in that its methods are basically systematic and have general application; and an art in that their successful application depends to a considerable degree on the skill and special experience of the statistician, and on his knowledge of the field of application, e.g. economics." (Leonard H C Tippett, "Statistics", 1943)

"Statistics is the branch of scientific method which deals with the data obtained by counting or measuring the properties of populations of natural phenomena. In this definition 'natural phenomena' includes all the happenings of the external world, whether human or not " (Sir Maurice G Kendall, "Advanced Theory of Statistics", Vol. 1, 1943)

"To some people, statistics is ‘quartered pies, cute little battleships and tapering rows of sturdy soldiers in diversified uniforms’. To others, it is columns and columns of numerical facts. Many regard it as a branch of economics. The beginning student of the subject considers it to be largely mathematics." (The Editors, "Statistics, The Physical Sciences and Engineering", The American Statistician, Vol. 2, No. 4, 1948)

"For the most part, Statistics is a method of investigation that is used when other methods are of no avail; it is often a last resort and a forlorn hope. A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions. The surgeon must guard carefully against false incisions with his scalpel. Very often he has to sew up the patient as inoperable. The public knows too little about the statistician as a conscientious and skilled servant of true science." (Michael J Moroney, "Facts from Figures", 1951)

"Statistics is the name for that science and art which deals with uncertain inferences - which uses numbers to find out something about nature and experience." (Warren Weaver, 1952)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"In brief, the greatest care must be exercised in using any statistical data, especially when it has been collected by another agency. At all times, the statistician who uses published data must ask himself, by whom were the data collected, how and for what purpose?" (Alfred R Ilersic, "Statistics", 1959)

"Poor statistics may be attributed to a number of causes. There are the mistakes which arise in the course of collecting the data, and there are those which occur when those data are being converted into manageable form for publication. Still later, mistakes arise because the conclusions drawn from the published data are wrong. The real trouble with errors which arise during the course of collecting the data is that they are the hardest to detect." (Alfred R Ilersic, "Statistics", 1959)

"The statistics themselves prove nothing; nor are they at any time a substitute for logical thinking. There are […] many simple but not always obvious snags in the data to contend with. Variations in even the simplest of figures may conceal a compound of influences which have to be taken into account before any conclusions are drawn from the data." (Alfred R Ilersic, "Statistics", 1959)

"Many people use statistics as a drunkard uses a street lamp - for support rather than illumination. It is not enough to avoid outright falsehood; one must be on the alert to detect possible distortion of truth. One can hardly pick up a newspaper without seeing some sensational headline based on scanty or doubtful data." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"[Statistics] is concerned with things we can count. In so far as things, persons, are unique or ill-defi ned, statistics are meaningless and statisticians silenced; in so far as things are similar and definite - so many workers over 25, so many nuts and bolts made during December - they can be counted and new statistical facts are born." (Maurice S Bartlett, "Essays on Probability and Statistics", 1962)

"Statistics is the branch of scientific method which deals with the data obtained by counting or measuring the properties of populations of natural phenomena." (Sir Maurice G Kendall & Alan Stuart, "The Advanced Theory of Statistics", 1963)

"Statistics may be defined as the discipline concerned with the treatment of numerical data derived from groups of individuals." (Peter Armitage, "Statistical Methods in Medical Research", 1971)

"We provisionally define statistics as the study of how information should be employed to reflect on, and give guidance for action in, a practical situation involving uncertainty." (Vic Barnett, "Comparative Statistical Inference" 2nd Ed., 1982)

"Statistics is a tool. In experimental science you plan and carry out experiments, and then analyse and interpret the results. To do this you use statistical arguments and calculations. Like any other tool - an oscilloscope, for example, or a spectrometer, or even a humble spanner - you can use it delicately or clumsily, skillfully or ineptly. The more you know about it and understand how it works, the better you will be able to use it and the more useful it will be." (Roger Barlow, "Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences", 1989)

"Sometimes the proprietors of a bad model claim that parts of it are facts, not just beliefs. Evaluation then amounts to determining if facts support the claims, and disciplines like statistics have tools for this task. The difficulty of using statistical tools will vary depending on the problem." (James S Hodges, "Six (or So) Things You Can Do with a Bad Model", 1991)

"The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions." (Fergus Daly et al, "Elements of Statistics", 1995)

"Statistics is a general intellectual method that applies wherever data, variation, and chance appear. It is a fundamental method because data, variation and chance are omnipresent in modern life. It is an independent discipline with its own core ideas rather than, for example, a branch of mathematics. […] Statistics offers general, fundamental, and independent ways of thinking." (David S Moore, "Statistics among the Liberal Arts", Journal of the American Statistical Association, 1998)

"Statistics is the branch of mathematics that uses observations and measurements called data to analyze, summarize, make inferences, and draw conclusions based on the data gathered." (Allan G Bluman, "Probability Demystified", 2005)

"Sometimes the most important fit statistic you can get is ‘convergence not met’ - it can tell you something is wrong with your model." (Oliver Schabenberger, "Applied Statistics in Agriculture Conference", 2006)

"Put simply, statistics is a range of procedures for gathering, organizing, analyzing and presenting quantitative data. […] Essentially […], statistics is a scientific approach to analyzing numerical data in order to enable us to maximize our interpretation, understanding and use. This means that statistics helps us turn data into information; that is, data that have been interpreted, understood and are useful to the recipient. Put formally, for your project, statistics is the systematic collection and analysis of numerical data, in order to investigate or discover relationships among phenomena so as to explain, predict and control their occurrence." (Reva B Brown & Mark Saunders, "Dealing with Statistics: What You Need to Know", 2008)

"Statistics is the art of learning from data. It is concerned with the collection of data, their subsequent description, and their analysis, which often leads to the drawing of conclusions." (Sheldon M Ross, "Introductory Statistics" 3rd Ed., 2009)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010) 

"Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions." (Ron Larson & Betsy Farber, "Elementary Statistics: Picturing the World" 5th Ed., 2011)

"Statistics is the discipline of using data samples to support claims about populations." (Allen B Downey, "Think Stats: Probability and Statistics for Programmers", 2011)

"[… ] statistics is about understanding the role that variability plays in drawing conclusions based on data. […] Statistics is not about numbers; it is about data - numbers in context. It is the context that makes a problem meaningful and something worth considering." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"Statistics is the scientific discipline that provides methods to help us make sense of data. […] The field of statistics teaches us how to make intelligent judgments and informed decisions in the presence of uncertainty and variation." (Roxy Peck & Jay L Devore, "Statistics: The Exploration and Analysis of Data" 7th Ed, 2012)

"[…] statistics is a method of pursuing truth. At a minimum, statistics can tell you the likelihood that your hunch is true in this time and place and with these sorts of people. This type of pursuit of truth, especially in the form of an event’s future likelihood, is the essence of psychology, of science, and of human evolution." (Arthhur Aron et al, "Statistics for Phsychology" 6th Ed., 2012)

"Statistics is the scientific discipline that provides methods to help us make sense of data. Statistical methods, used intelligently, offer a set of powerful tools for gaining insight into the world around us." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"The four questions of data analysis are the questions of description, probability, inference, and homogeneity. [...] Descriptive statistics are built on the assumption that we can use a single value to characterize a single property for a single universe. […] Probability theory is focused on what happens to samples drawn from a known universe. If the data happen to come from different sources, then there are multiple universes with different probability models.  [...] Statistical inference assumes that you have a sample that is known to have come from one universe." (Donald J Wheeler," Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"Statistics is the art and science of designing studies and analyzing the data that those studies produce. Its ultimate goal is translating data into knowledge and understanding of the world around us. In short, statistics is the art and science of learning from data." (Alan Agresti & Christine Franklin, "Statistics: The Art and Science of Learning from Data" 3rd Ed., 2013)

"Statistics is a science that helps us make decisions and draw conclusions in the presence of variability." (Douglas C Montgomery & George C Runger, "Applied Statistics and Probability for Engineers" 6th Ed., 2014)

"Statistics is an integral part of the quantitative approach to knowledge. The field of statistics is concerned with the scientific study of collecting, organizing, analyzing, and drawing conclusions from data." (Kandethody M Ramachandran & Chris P Tsokos, "Mathematical Statistics with Applications in R" 2nd Ed., 2015)

"Statistics can be defined as a collection of techniques used when planning a data collection, and when subsequently analyzing and presenting data." (Birger S Madsen, "Statistics for Non-Statisticians", 2016)

"Statistics is the science of collecting, organizing, and interpreting numerical facts, which we call data. […] Statistics is the science of learning from data." (Moore McCabe & Alwan Craig, "The Practice of Statistics for Business and Economics" 4th Ed., 2016)

"Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions." (Michael Sullivan, "Statistics: Informed Decisions Using Data", 5th Ed., 2017)

"Estimates based on data are often uncertain. If the data were intended to tell us something about a wider population (like a poll of voting intentions before an election), or about the future, then we need to acknowledge that uncertainty. This is a double challenge for data visualization: it has to be calculated in some meaningful way and then shown on top of the data or statistics without making it all too cluttered." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"I believe that the backlash against statistics is due to four primary reasons. The first, and easiest for most people to relate to, is that even the most basic concepts of descriptive and inferential statistics can be difficult to grasp and even harder to explain. […] The second cause for vitriol is that even well-intentioned experts misapply the tools and techniques of statistics far too often, myself included. Statistical pitfalls are numerous and tough to avoid. When we can't trust the experts to get it right, there's a temptation to throw the baby out with the bathwater. The third reason behind all the hate is that those with an agenda can easily craft statistics to lie when they communicate with us  […] And finally, the fourth cause is that often statistics can be perceived as cold and detached, and they can fail to communicate the human element of an issue." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"Ideally, a decision maker or a forecaster will combine the outside view and the inside view - or, similarly, statistics plus personal experience. But it’s much better to start with the statistical view, the outside view, and then modify it in the light of personal experience than it is to go the other way around. If you start with the inside view you have no real frame of reference, no sense of scale - and can easily come up with a probability that is ten times too large, or ten times too small." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"If we don’t understand the statistics, we’re likely to be badly mistaken about the way the world is. It is all too easy to convince ourselves that whatever we’ve seen with our own eyes is the whole truth; it isn’t. Understanding causation is tough even with good statistics, but hopeless without them. [...] And yet, if we understand only the statistics, we understand little. We need to be curious about the world that we see, hear, touch, and smell, as well as the world we can examine through a spreadsheet." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The contradiction between what we see with our own eyes and what the statistics claim can be very real. […] The truth is more complicated. Our personal experiences should not be dismissed along with our feelings, at least not without further thought. Sometimes the statistics give us a vastly better way to understand the world; sometimes they mislead us. We need to be wise enough to figure out when the statistics are in conflict with everyday experience - and in those cases, which to believe." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The whole discipline of statistics is built on measuring or counting things. […] it is important to understand what is being measured or counted, and how. It is surprising how rarely we do this. Over the years, as I found myself trying to lead people out of statistical mazes week after week, I came to realize that many of the problems I encountered were because people had taken a wrong turn right at the start. They had dived into the mathematics of a statistical claim - asking about sampling errors and margins of error, debating if the number is rising or falling, believing, doubting, analyzing, dissecting - without taking the ti- me to understand the first and most obvious fact: What is being measured, or counted? What definition is being used?" (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

More quotes on "Statistics" at the-web-of-knowledge.blogspot.com

🔭Data Science: Variance (Just the Quotes)

"There is, then, in this analysis of variance no indication of any other than innate and heritable factors at work." (Sir Ronald A Fisher, "The Causes of Human Variability", Eugenics Review Vol. 10, 1918)

"The mean and variance are unambiguously determined by the distribution, but a distribution is, of course, not determined by its mean and variance: A number of different distributions have the same mean and the same variance." (Richard von Mises, "Probability, Statistics And Truth", 1928)

"However, perhaps the main point is that you are under no obligation to analyse variance into its parts if it does not come apart easily, and its unwillingness to do so naturally indicates that one’s line of approach is not very fruitful." (Sir Ronald A Fisher, [Letter to Lancelot Hogben] 1933)

"The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic." (Sir Ronald A Fisher, Journal of the Royal Statistical Society Vol. 1, 1934)

"Undoubtedly one of the most elegant, powerful, and useful techniques in modern statistical method is that of the Analysis of Variation and Co-variation by which the total variation in a set of data may be reduced to components associated with possible sources of variability whose relative importance we wish to assess. The precise form which any given analysis will take is intimately connected with the structure of the investigation from which the data are obtained. A simple structure will lead to a simple analysis; a complex structure to a complex analysis." (Michael J Moroney, "Facts from Figures", 1951)

"The statistics themselves prove nothing; nor are they at any time a substitute for logical thinking. There are […] many simple but not always obvious snags in the data to contend with. Variations in even the simplest of figures may conceal a compound of influences which have to be taken into account before any conclusions are drawn from the data." (Alfred R Ilersic, "Statistics", 1959)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (William E Deming, "On Probability as Basis for Action" American Statistician Vol. 29 (4), 1975)

"When the statistician looks at the outside world, he cannot, for example, rely on finding errors that are independently and identically distributed in approximately normal distributions. In particular, most economic and business data are collected serially and can be expected, therefore, to be heavily serially dependent. So is much of the data collected from the automatic instruments which are becoming so common in laboratories these days. Analysis of such data, using procedures such as standard regression analysis which assume independence, can lead to gross error. Furthermore, the possibility of contamination of the error distribution by outliers is always present and has recently received much attention. More generally, real data sets, especially if they are long, usually show inhomogeneity in the mean, the variance, or both, and it is not always possible to randomize." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"Analysis of variance [...] stems from a hypothesis-testing formulation that is difficult to take seriously and would be of limited value for making final conclusions." (Herman Chernoff, Comment,  The American Statistician 40(1), 1986)

"One of the classical assumptions in linear regression analysis is that of equal variance, which is frequently referred to as homoscedasticity. However, this assumption may not be valid in data analysis arising from many fields (e.g., economics, finance, engineering, and biological science). When heteroscedasticity (nonconstant variance) occurs, the statistical inferences and predictions via the ordinary least squares method are often not reliable. Therefore, it is crucial to study the heteroscedastic error structure in linear model fitting." (Xiaogang Su et al, "Treed Variance", Journal of Computational and Graphical Statistics, Vol. 15 (2), 2006)

"The flaw in the classical thinking is the assumption that variance equals dispersion. Variance tends to exaggerate outlying data because it squares the distance between the data and their mean. This mathematical artifact gives too much weight to rotten apples. It can also result in an infinite value in the face of impulsive data or noise. [...] Yet dispersion remains an elusive concept. It refers to the width of a probability bell curve in the special but important case of a bell curve. But most probability curves don't have a bell shape. And its relation to a bell curve's width is not exact in general. We know in general only that the dispersion increases as the bell gets wider. A single number controls the dispersion for stable bell curves and indeed for all stable probability curves - but not all bell curves are stable curves."  (Bart Kosko, "Noise", 2006)

"A good estimator has to be more than just consistent. It also should be one whose variance is less than that of any other estimator. This property is called minimum variance. This means that if we run the experiment several times, the 'answers' we get will be closer to one another than 'answers' based on some other estimator." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"High-bias models typically produce simpler models that do not overfit and in those cases the danger is that of underfitting. Models with low-bias are typically more complex and that complexity enables us to represent the training data in a more accurate way. The danger here is that the flexibility provided by higher complexity may end up representing not only a relationship in the data but also the noise. Another way of portraying the bias-variance trade-off is in terms of complexity v simplicity." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017) 

"If either bias or variance is high, the model can be very far off from reality. In general, there is a trade-off between bias and variance. The goal of any machine-learning algorithm is to achieve low bias and low variance such that it gives good prediction performance. In reality, because of so many other hidden parameters in the model, it is hard to calculate the real bias and variance error. Nevertheless, the bias and variance provide a measure to understand the behavior of the machine-learning algorithm so that the model model can be adjusted to provide good prediction performance." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"The tension between bias and variance, simplicity and complexity, or underfitting and overfitting is an area in the data science and analytics process that can be closer to a craft than a fixed rule. The main challenge is that not only is each dataset different, but also there are data points that we have not yet seen at the moment of constructing the model. Instead, we are interested in building a strategy that enables us to tell something about data from the sample used in building the model." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017) 

"Two clouds of uncertainty may have the same center, but one may be much more dispersed than the other. We need a way of looking at the scatter about the center. We need a measure of the scatter. One such measure is the variance. We take each of the possible values of error and calculate the squared difference between that value and the center of the distribution. The mean of those squared differences is the variance." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Variance is a prediction error due to different sets of training samples. Ideally, the error should not vary from one training sample to another sample, and the model should be stable enough to handle hidden variations between input and output variables. Normally this occurs with the overfitted model." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Variance is error from sensitivity to fluctuations in the training set. If our training set contains sampling or measurement error, this noise introduces variance into the resulting model. [...] Errors of variance result in overfit models: their quest for accuracy causes them to mistake noise for signal, and they adjust so well to the training data that noise leads them astray. Models that do much better on testing data than training data are overfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Variance quantifies how accurately a model estimates the target variable if a different dataset is used to train the model. It quantifies whether the mathematical formulation of our model is a good generalization of the underlying patterns. Specific overfitted rules based on specific scenarios and situations = high variance, and rules that are generalized and applicable to a variety of scenarios and situations = low variance." (Imran Ahmad, "40 Algorithms Every Programmer Should Know", 2020)

🔭Data Science: Models (Just the Quotes)

"A model, like a novel, may resonate with nature, but it is not a ‘real’ thing. Like a novel, a model may be convincing - it may ‘ring true’ if it is consistent with our experience of the natural world. But just as we may wonder how much the characters in a novel are drawn from real life and how much is artifice, we might ask the same of a model: How much is based on observation and measurement of accessible phenomena, how much is convenience? Fundamentally, the reason for modeling is a lack of full access, either in time or space, to the phenomena of interest." (Kenneth Belitz, Science, Vol. 263, 1944)

"The principle of complementarity states that no single model is possible which could provide a precise and rational analysis of the connections between these phenomena [before and after measurement]. In such a case, we are not supposed, for example, to attempt to describe in detail how future phenomena arise out of past phenomena. Instead, we should simply accept without further analysis the fact that future phenomena do in fact somehow manage to be produced, in a way that is, however, necessarily beyond the possibility of a detailed description. The only aim of a mathematical theory is then to predict the statistical relations, if any, connecting the phenomena." (David Bohm, "A Suggested Interpretation of the Quantum Theory in Terms of ‘Hidden’ Variables", 1952)

"Consistency and completeness can also be characterized in terms of models: a theory T is consistent if and only if it has at least one model; it is complete if and only if every sentence of T which is satified in one model is also satisfied in any other model of T. Two theories T1 and T2 are said to be compatible if they have a common consistent extension; this is equivalent to saying that the union of T1 and T2 is consistent." (Alfred Tarski et al, "Undecidable Theories", 1953)

"The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work" (John Von Neumann, "Method in the Physical Sciences", 1955)

"[…] no models are [true] = not even the Newtonian laws. When you construct a model you leave out all the details which you, with the knowledge at your disposal, consider inessential. […] Models should not be true, but it is important that they are applicable, and whether they are applicable for any given purpose must of course be investigated. This also means that a model is never accepted finally, only on trial." (Georg Rasch, "Probabilistic Models for Some Intelligence and Attainment Tests", 1960)

"[...] the null-hypothesis models [...] share a crippling flaw: in the real world the null hypothesis is almost never true, and it is usually nonsensical to perform an experiment with the sole aim of rejecting the null hypothesis." (Jum Nunnally, "The place of statistics in psychology", Educational and Psychological Measurement 20, 1960)

"If one technique of data analysis were to be exalted above all others for its ability to be revealing to the mind in connection with each of many different models, there is little doubt which one would be chosen. The simple graph has brought more information to the data analyst’s mind than any other device. It specializes in providing indications of unexpected phenomena." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics Vol. 33 (1), 1962)

"A model is essentially a calculating engine designed to produce some output for a given input." (Richard C Lewontin, "Models, Mathematics and Metaphors", Synthese, Vol. 15, No. 2, 1963)

"The usefulness of the models in constructing a testable theory of the process is severely limited by the quickly increasing number of parameters which must be estimated in order to compare the predictions of the models with empirical results" (Anatol Rapoport, "Prisoner's Dilemma: A study in conflict and cooperation", 1965)

"The validation of a model is not that it is 'true' but that it generates good testable hypotheses relevant to important problems." (Richard Levins, "The Strategy of Model Building in Population Biology", 1966)

"Models are to be used, but not to be believed." (Henri Theil, "Principles of Econometrics", 1971)

"A theory has only the alternative of being right or wrong. A model has a third possibility: it may be right, but irrelevant." (Manfred Eigen, 1973)

"A model is an abstract description of the real world. It is a simple representation of more complex forms, processes and functions of physical phenomena and ideas." (Moshe F Rubinstein & Iris R Firstenberg, "Patterns of Problem Solving", 1975)

"A model is an attempt to represent some segment of reality and explain, in a simplified manner, the way the segment operates." (E Frank Harrison, "The managerial decision-making process", 1975)

"The value of a model lies in its substitutability for the real system for achieving an intended purpose." (David I Cleland & William R King, "Systems analysis and project management", 1975)

"For the theory-practice iteration to work, the scientist must be, as it were, mentally ambidextrous; fascinated equally on the one hand by possible meanings, theories, and tentative models to be induced from data and the practical reality of the real world, and on the other with the factual implications deducible from tentative theories, models and hypotheses." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"Mathematical models are more precise and less ambiguous than quantitative models and are therefore of greater value in obtaining specific answers to certain managerial questions." (Henry L Tosi & Stephen J Carrol, "Management", 1976)

"The aim of the model is of course not to reproduce reality in all its complexity. It is rather to capture in a vivid, often formal, way what is essential to understanding some aspect of its structure or behavior." (Joseph Weizenbaum, "Computer power and human reason: From judgment to calculation" , 1976)

"Models, of course, are never true, but fortunately it is only necessary that they be useful. For this it is usually needful only that they not be grossly wrong. I think rather simple modifications of our present models will prove adequate to take account of most realities of the outside world. The difficulties of computation which would have been a barrier in the past need not deter us now." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"The purpose of models is not to fit the data but to sharpen the questions." (Samuel Karlin, 1983)

"The connection between a model and a theory is that a model satisfies a theory; that is, a model obeys those laws of behavior that a corresponding theory explicity states or which may be derived from it. [...] Computers make possible an entirely new relationship between theories and models. [...] A theory written in the form of a computer program is [...] both a theory and, when placed on a computer and run, a model to which the theory applies." (Joseph Weizenbaum, "Computer Power and Human Reason", 1984)

“There are those who try to generalize, synthesize, and build models, and there are those who believe nothing and constantly call for more data. The tension between these two groups is a healthy one; science develops mainly because of the model builders, yet they need the second group to keep them honest.” (Andrew Miall, “Principles of Sedimentary Basin Analysis”, 1984)

"Competent scientists do not believe their own models or theories, but rather treat them as convenient fictions. [...] The issue to a scientist is not whether a model is true, but rather whether there is another whose predictive power is enough better to justify movement from today’s fiction to a new one." (Steve Vardeman, "Comment", Journal of the American Statistical Association 82, 1987)

"Models are often used to decide issues in situations marked by uncertainty. However statistical differences from data depend on assumptions about the process which generated these data. If the assumptions do not hold, the inferences may not be reliable either. This limitation is often ignored by applied workers who fail to identify crucial assumptions or subject them to any kind of empirical testing. In such circumstances, using statistical procedures may only compound the uncertainty." (David A Greedman & William C Navidi, "Regression Models for Adjusting the 1980 Census", Statistical Science Vol. 1 (1), 1986)

"The fact that [the model] is an approximation does not necessarily detract from its usefulness because models are approximations. All models are wrong, but some are useful." (George Box, 1987)

"A theory is a good theory if it satisfies two requirements: it must accurately describe a large class of observations on the basis of a model that contains only a few arbitrary elements, and it must make definite predictions about the results of future observations." (Stephen Hawking, "A Brief History of Time: From Big Bang To Black Holes", 1988) 

"[…] no good model ever accounted for all the facts, since some data was bound to be misleading if not plain wrong. A theory that did fit all the data would have been ‘carpentered’ to do this and would thus be open to suspicion." (Francis H C Crick, "What Mad Pursuit: A Personal View of Scientific Discovery", 1988)

"A model is generally more believable if it can predict what will happen, rather than 'explain' something that has already occurred. […] Model building is not so much the safe and cozy codification of what we are confident about as it is a means of orderly speculation." (James R Thompson, "Empirical Model Building", 1989)

"Model is used as a theory. It becomes theory when the purpose of building a model is to understand the mechanisms involved in the developmental process. Hence as theory, model does not carve up or change the world, but it explains how change takes place and in what way or manner. This leads to build change in the structures." (Laxmi K Patnaik, "Model Building in Political Science", The Indian Journal of Political Science Vol. 50 (2), 1989)

"When evaluating a model, at least two broad standards are relevant. One is whether the model is consistent with the data. The other is whether the model is consistent with the ‘real world’." (Kenneth A Bollen, "Structural Equations with Latent Variables", 1989)

"Statistical models are sometimes misunderstood in epidemiology. Statistical models for data are never true. The question whether a model is true is irrelevant. A more appropriate question is whether we obtain the correct scientific conclusion if we pretend that the process under study behaves according to a particular statistical model." (Scott Zeger, "Statistical reasoning in epidemiology", American Journal of Epidemiology, 1991)

"No one has ever shown that he or she had a free lunch. Here, of course, 'free lunch' means 'usefulness of a model that is locally easy to make inferences from'. (John Tukey, "Issues relevant to an honest account of data-based inference, partially in the light of Laurie Davies’ paper", 1993)

"Model building is the art of selecting those aspects of a process that are relevant to the question being asked. As with any art, this selection is guided by taste, elegance, and metaphor; it is a matter of induction, rather than deduction. High science depends on this art." (John H Holland, "Hidden Order: How Adaptation Builds Complexity", 1995)

"So we pour in data from the past to fuel the decision-making mechanisms created by our models, be they linear or nonlinear. But therein lies the logician's trap: past data from real life constitute a sequence of events rather than a set of independent observations, which is what the laws of probability demand. [...] It is in those outliers and imperfections that the wildness lurks." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"A good model makes the right strategic simplifications. In fact, a really good model is one that generates a lot of understanding from focusing on a very small number of causal arrows." (Robert M Solow, "How Did Economics Get That Way and What Way Did It Get?", Daedalus, Vol. 126, No. 1, 1997)

"A model is a deliberately simplified representation of a much more complicated situation. […] The idea is to focus on one or two causal or conditioning factors, exclude everything else, and hope to understand how just these aspects of reality work and interact." (Robert M Solow, "How Did Economics Get That Way and What Way Did It Get?", Daedalus, Vol. 126, No. 1, 1997)

"We do not learn much from looking at a model - we learn more from building the model and manipulating it. Just as one needs to use or observe the use of a hammer in order to really understand its function, similarly, models have to be used before they will give up their secrets. In this sense, they have the quality of a technology - the power of the model only becomes apparent in the context of its use." (Margaret Morrison & Mary S Morgan, "Models as mediating instruments", 1999)

"Building statistical models is just like this. You take a real situation with real data, messy as this is, and build a model that works to explain the behavior of real data." (Martha Stocking, New York Times, 2000)

"As I left consulting to go back to the university, these were the perceptions I had about working with data to find answers to problems: (a) Focus on finding a good solution–that’s what consultants get paid for. (b) Live with the data before you plunge into modelling. (c) Search for a model that gives a good solution, either algorithmic or data. (d) Predictive accuracy on test sets is the criterion for how good the model is. (e) Computers are an indispensable partner." (
Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science Vol. 16(3), 2001)

"The goals in statistics are to use data to predict and to get information about the underlying data mechanism. Nowhere is it written on a stone tablet what kind of model should be used to solve problems involving data. To make my position clear, I am not against models per se. In some situations they are the most appropriate way to solve the problem. But the emphasis needs to be on the problem and on the data. Unfortunately, our field has a vested interest in models, come hell or high water." (Leo Breiman, "Statistical Modeling: The Two Cultures, Statistical Science" Vol. 16(3), 2001) 

"The point of a model is to get useful information about the relation between the response and predictor variables. Interpretability is a way of getting information. But a model does not have to be simple to provide reliable information about the relation between predictor and response variables; neither does it have to be a data model. The goal is not interpretability, but accurate information." (Leo Breiman, "Statistical Modeling: The Two Cultures, Statistical Science" Vol. 16(3), 2001)

"A good way to evaluate a model is to look at a visual representation of it. After all, what is easier to understand - a table full of mathematical relationships or a graphic displaying a decision tree with all of its splits and branches?" (Seth Paul et al. "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis", 2002)

"Models can be viewed and used at three levels. The first is a model that fits the data. A test of goodness-of-fit operates at this level. This level is the least useful but is frequently the one at which statisticians and researchers stop. For example, a test of a linear model is judged good when a quadratic term is not significant. A second level of usefulness is that the model predicts future observations. Such a model has been called a forecast model. This level is often required in screening studies or studies predicting outcomes such as growth rate. A third level is that a model reveals unexpected features of the situation being described, a structural model, [...] However, it does not explain the data." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Ockham's Razor in statistical analysis is used implicitly when models are embedded in richer models -for example, when testing the adequacy of a linear model by incorporating a quadratic term. If the coefficient of the quadratic term is not significant, it is dropped and the linear model is assumed to summarize the data adequately." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"A smaller model with fewer covariates has two advantages: it might give better predictions than a big model and it is more parsimonious (simpler). Generally, as you add more variables to a regression, the bias of the predictions decreases and the variance increases. Too few covariates yields high bias; this called underfitting. Too many covariates yields high variance; this called overfitting. Good predictions result from achieving a good balance between bias and variance. […] finding a good model involves trading of fit and complexity." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"[…] studying methods for parametric models is useful for two reasons. First, there are some cases where background knowledge suggests that a parametric model provides a reasonable approximation. […] Second, the inferential concepts for parametric models provide background for understanding certain nonparametric methods." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"I have often thought that outliers contain more information than the model." (Arnold Goodman,  [Joint Statistical Meetings] 2005)

"Sometimes the most important fit statistic you can get is ‘convergence not met’ - it can tell you something is wrong with your model." (Oliver Schabenberger, "Applied Statistics in Agriculture Conference", 2006)

"Effective models require a real world that has enough structure so that some of the details can be ignored. This implies the existence of solid and stable building blocks that encapsulate key parts of the real system’s behavior. Such building blocks provide enough separation from details to allow modeling to proceed."(John H. Miller & Scott E. Page, "Complex Adaptive Systems: An Introduction to Computational Models of Social Life", 2007)

"In science we try to explain reality by using models (theories). This is necessary because reality itself is too complex. So we need to come up with a model for that aspect of reality we want to understand – usually with the help of mathematics. Of course, these models or theories can only be simplifications of that part of reality we are looking at. A model can never be a perfect description of reality, and there can never be a part of reality perfectly mirroring a model." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"It is also inevitable for any model or theory to have an uncertainty (a difference between model and reality). Such uncertainties apply both to the numerical parameters of the model and to the inadequacy of the model as well. Because it is much harder to get a grip on these types of uncertainties, they are disregarded, usually." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Outliers or flyers are those data points in a set that do not quite fit within the rest of the data, that agree with the model in use. The uncertainty of such an outlier is seemingly too small. The discrepancy between outliers and the model should be subject to thorough examination and should be given much thought. Isolated data points, i.e., data points that are at some distance from the bulk of the data are not outliers if their values are in agreement with the model in use." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"What should be the distribution of random effects in a mixed model? I think Gaussian is just fine, unless you are trying to write a journal paper." (Terry Therneau, "Speaking at useR", 2007)

"You might say that there’s no reason to bother with model checking since all models are false anyway. I do believe that all models are false, but for me the purpose of model checking is not to accept or reject a model, but to reveal aspects of the data that are not captured by the fitted model." (Andrew Gelman, "Some thoughts on the sociology of statistics", 2007)

"A model is a good model if it:1. Is elegant 2. Contains few arbitrary or adjustable elements 3. Agrees with and explains all existing observations 4. Makes detailed predictions about future observations that can disprove or falsify the model if they are not borne out." (Stephen Hawking & Leonard Mlodinow, "The Grand Design", 2010)

"In other words, the model is terrific in all ways other than the fact that it is totally useless. So why did we create it? In short, because we could: we have a data set, and a statistical package, and add the former to the latter, hit a few buttons and voila, we have another paper." (Andew J Vickers & Angel M Cronin, "Everything you always wanted to know about evaluating prediction models (but were too afraid to ask)", Urology 76(6), 2010)

"Darn right, graphs are not serious. Any untrained, unsophisticated, non-degree-holding civilian can display data. Relying on plots is like admitting you do not need a statistician. Show pictures of the numbers and let people make their own judgments? That can be no better than airing your statistical dirty laundry. People need guidance; they need to be shown what the data are supposed to say. Graphics cannot do that; models can." (William M Briggs, Comment, Journal of Computational and Graphical Statistics Vol. 20(1), 2011)

"In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world: our models are not the reality - a point well made by George Box in his oft-cited remark that "all models are wrong, but some are useful". (David Hand, "Wonderful examples, but let's not close our eyes", Statistical Science 29, 2014)

"Things which ought to be expected can seem quite extraordinary if you’ve got the wrong model." (David Hand, "Significance", 2014)

"It is important to remember that predictive data analytics models built using machine learning techniques are tools that we can use to help make better decisions within an organization and are not an end in themselves. It is paramount that, when tasked with creating a predictive model, we fully understand the business problem that this model is being constructed to address and ensure that it does address it." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"The crucial concept that brings all of this together is one that is perhaps as rich and suggestive as that of a paradigm: the concept of a model. Some models are concrete, others are abstract. Certain models are fairly rigid; others are left somewhat unspecified. Some models are fully integrated into larger theories; others, or so the story goes, have a life of their own. Models of experiment, models of data, models in simulations, archeological modeling, diagrammatic reasoning, abductive inferences; it is difficult to imagine an area of scientific investigation, or established strategies of research, in which models are not present in some form or another. However, models are ultimately understood, there is no doubt that they play key roles in multiple areas of the sciences, engineering, and mathematics, just as models are central to our understanding of the practices of these fields, their history and the plethora of philosophical, conceptual, logical, and cognitive issues they raise." (Otávio Bueno, [in" Springer Handbook of Model-Based Science", Ed. by Lorenzo Magnani & Tommaso Bertolotti, 2017])

"The different classes of models have a lot to learn from each other, but the goal of full integration has proven counterproductive. No model can be all things to all people." (Olivier Blanchard, "On the future of macroeconomic models", Oxford Review of Economic Policy Vol. 34 (1–2), 2018)

"Bad data makes bad models. Bad models instruct people to make ineffective or harmful interventions. Those bad interventions produce more bad data, which is fed into more bad models." (Cory Doctorow, "Machine Learning’s Crumbling Foundations", 2021)

"On a final note, we would like to stress the importance of design, which often does not receive the attention it deserves. Sometimes, the large number of modeling options for spatial analysis may raise the false impression that design does not matter, and that a sophisticated analysis takes care of everything. Nothing could be further from the truth." (Hans-Peter Piepho et al, "Two-dimensional P-spline smoothing for spatial analysis of plant breeding trials", “Biometrical Journal”, 2022)
