24 December 2018

🔭Data Science: Prediction (Just the Quotes)

"The aim of every science is foresight. For the laws of established observation of phenomena are generally employed to foresee their succession. All men, however little advanced make true predictions, which are always based on the same principle, the knowledge of the future from the past." (Auguste Comte, "Plan des travaux scientifiques nécessaires pour réorganiser la société", 1822)

"As a science progresses, its power of foresight rapidly increases, until the mathematician in his library acquires the power of anticipating nature, and predicting what will happen in circumstances which the eye of man has never examined." (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"No matter how solidly founded a prediction may appear to us, we are never absolutely sure that experiment will not contradict it, if we undertake to verify it. […] It is far better to foresee even without certainty than not to foresee at all." (Henri Poincaré, "The Foundations of Science", 1913)

"[…] the statistical prediction of the future from the past cannot be generally valid, because whatever is future to any given past, is in turn past for some future. That is, whoever continually revises his judgment of the probability of a statistical generalization by its successively observed verifications and failures, cannot fail to make more successful predictions than if he should disregard the past in his anticipation of the future. This might be called the ‘Principle of statistical accumulation’." (Clarence I Lewis, "Mind and the World-Order: Outline of a Theory of Knowledge", 1929)

"Postulate 1. All chance systems of causes are not alike in the sense that they enable us to predict the future in terms of the past. Postulate 2. Constant systems of chance causes do exist in nature. Postulate 3. Assignable causes of variation may be found and eliminated." (Walter A Shewhart, "Economic Control of Quality of Manufactured Product", 1931)

"Rule 1. Original data should be presented in a way that will preserve the evidence in the original data for all the predictions assumed to be useful." (Walter A Shewhart, "Economic Control of Quality of Manufactured Product", 1931)

"Rule 2. Any summary of a distribution of numbers in terms of symmetric functions should not give an objective degree of belief in any one of the inferences or predictions to be made therefrom that would cause human action significantly different from what this action would be if the original distributions had been taken as evidence." (Walter A Shewhart, "Economic Control of Quality of Manufactured Product", 1931)

"Factual science may collect statistics, and make charts. But its predictions are, as has been well said, but past history reversed." (John Dewey, "Art as Experience", 1934)

"It is never possible to predict a physical occurrence with unlimited precision." (Max Planck, "A Scientific Autobiography", 1949)

"To say that observations of the past are certain, whereas predictions are merely probable, is not the ultimate answer to the question of induction; it is only a sort of intermediate answer, which is incomplete unless a theory of probability is developed that explains what we should mean by ‘probable’ and on what ground we can assert probabilities." (Hans Reichenbach, "The Rise of Scientific Philosophy", 1951)

"The world is not made up of empirical facts with the addition of the laws of nature: what we call the laws of nature are conceptual devices by which we organize our empirical knowledge and predict the future." (Richard B Braithwaite, "Scientific Explanation", 1953)

"The predictions of physical theories for the most part concern situations where initial conditions can be precisely specified. If such initial conditions are not found in nature, they can be arranged." (Anatol Rapoport, "The Search for Simplicity", 1956)

"Predictions, prophecies, and perhaps even guidance - those who suggested this title to me must have hoped for such - even though occasional indulgences in such actions by statisticians has undoubtedly contributed to the characterization of a statistician as a man who draws straight lines from insufficient data to foregone conclusions!" (John W Tukey, "Where do We Go From Here?", Journal of the American Statistical Association, Vol. 55, No. 289, 1960)

"Can there be laws of chance? The answer, it would seem should be negative, since chance is in fact defined as the characteristic of the phenomena which follow no law, phenomena whose causes are too complex to permit prediction." (Félix E Borel, "Probabilities and Life", 1962)

"[…] All predictions are statistical, but some predictions have such a high probability that one tends to regard them as certain." (Marshall J Walker, "The Nature of Scientific Thought", 1963)

"Measurement, we have seen, always has an element of error in it. The most exact description or prediction that a scientist can make is still only approximate." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"The usefulness of the models in constructing a testable theory of the process is severely limited by the quickly increasing number of parameters which must be estimated in order to compare the predictions of the models with empirical results" (Anatol Rapoport, "Prisoner's Dilemma: A study in conflict and cooperation", 1965)

"It is of course desirable to work with manageable models which maximize generality, realism, and precision toward the overlapping but not identical goals of understanding, predicting, and modifying nature. But this cannot be done." (Richard Levins, "The strategy of model building in population biology", American Scientist Vol. 54 (4), 1966)

"The language of association and prediction is probably most often used because the evidence seems insufficient to justify a direct causal statement. A better practice is to state the causal hypothesis and then to present the evidence along with an assessment with respect to the causal hypothesis - instead of letting the quality of the data determine the language of the explanation." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The moment you forecast you know you’re going to be wrong, you just don’t know when and in which direction." (Edgar R Fiedler, 1977)

"But a theory is not like an airline or bus timetable. We are not interested simply in the accuracy of its predictions. A theory also serves as a base for thinking. It helps us to understand what is going on by enabling us to organize our thoughts. Faced with a choice between a theory which predicts well but gives us little insight into how the system works and one which gives us this insight but predicts badly, I would choose the latter, and I am inclined to think that most economists would do the same." (Ronald Coase, "How should economists choose?", 1981)

"Prediction can never be absolutely valid and therefore science can never prove some generalization or even test a single descriptive statement and in that way arrive at final truth." (Gregory Bateson, "Mind and Nature: A necessary unity", 1988)

"We can predict only those things we set up to be predictable, not what we encounter in the real world of living and reactive processes." (Bill Mollison, "Permaculture: A Designers' Manual", 1988)

"A model is generally more believable if it can predict what will happen, rather than 'explain' something that has already occurred." (James R Thompson, "Empirical Model Building", 1989)

"The ability of a scientific theory to be refuted is the key criterion that distinguishes science from metaphysics. If a theory cannot be refuted, if there is no observation that will disprove it, then nothing can prove it - it cannot predict anything, it is a worthless myth." (Eric Lerner, "The Big Bang Never Happened", 1991)

"Unforeseen technological inventions can completely upset the most careful predictions." (Richard W Hamming, "The Art of Probability for Scientists and Engineers", 1991)

"Prediction (forecasting) is the process of generating information for the possible future development of a process from data about its past and its present development." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Probability theory is a serious instrument for forecasting, but the devil, as they say, is in the details - in the quality of information that forms the basis of probability estimates." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Under conditions of uncertainty, both rationality and measurement are essential to decision-making. Rational people process information objectively: whatever errors they make in forecasting the future are random errors rather than the result of a stubborn bias toward either optimism or pessimism. They respond to new information on the basis of a clearly defined set of preferences. They know what they want, and they use the information in ways that support their preferences." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"[…] the simplest hypothesis proposed as an explanation of phenomena is more likely to be the true one than is any other available hypothesis, that its predictions are more likely to be true than those of any other available hypothesis, and that it is an ultimate a priori epistemic principle that simplicity is evidence for truth." (Richard Swinburne, "Simplicity as Evidence for Truth", 1997)

"If you have only a small proportion of cases with missing data, you can simply throw out those cases for purposes of estimation; if you want to make predictions for cases with missing inputs, you don’t have the option of throwing those cases out." (Warren S Sarle, "Prediction with missing inputs", 1998) 

"The point is that scientific descriptions of phenomena in all of these cases do not fully capture reality - they are models. This is not a shortcoming but a strength of science - much of the scientist's art lies in figuring out what to include and what to exclude in a model, and this ability allows science to make useful predictions without getting bogged down by intractable details." (Philip Ball, "The Self-Made Tapestry: Pattern Formation in Nature", 1998)

"We use mathematics and statistics to describe the diverse realms of randomness. From these descriptions, we attempt to glean insights into the workings of chance and to search for hidden causes. With such tools in hand, we seek patterns and relationships and propose predictions that help us make sense of the world." (Ivars Peterson, "The Jungles of Randomness: A Mathematical Safari", 1998)

"When a system is predictable, it is already performing as consistently as possible. Looking for assignable causes is a waste of time and effort. Instead, you can meaningfully work on making improvements and modifications to the process. When a system is unpredictable, it will be futile to try and improve or modify the process. Instead you must seek to identify the assignable causes which affect the system. The failure to distinguish between these two different courses of action is a major source of confusion and wasted effort in business today." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"When a process displays unpredictable behavior, you can most easily improve the process and process outcomes by identifying the assignable causes of unpredictable variation and removing their effects from your process." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Visualizations can be used to explore data, to confirm a hypothesis, or to manipulate a viewer. [...] In exploratory visualization the user does not necessarily know what he is looking for. This creates a dynamic scenario in which interaction is critical. [...] In a confirmatory visualization, the user has a hypothesis that needs to be tested. This scenario is more stable and predictable. System parameters are often predetermined." (Usama Fayyad et al, "Information Visualization in Data Mining and Knowledge Discovery", 2002)

"A smaller model with fewer covariates has two advantages: it might give better predictions than a big model and it is more parsimonious (simpler). Generally, as you add more variables to a regression, the bias of the predictions decreases and the variance increases. Too few covariates yields high bias; this is called underfitting. Too many covariates yields high variance; this is called overfitting. Good predictions result from achieving a good balance between bias and variance. […] finding a good model involves trading off fit and complexity." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The only way to look into the future is to use theories since conclusive data is only available about the past." (Clayton Christensen et al, "Seeing What’s Next: Using the Theories of Innovation to Predict Industry Change", 2004)

"Most long-range forecasts of what is technically feasible in future time periods dramatically underestimate the power of future developments because they are based on what I call the 'intuitive linear' view of history rather than the 'historical exponential' view." (Ray Kurzweil, "The Singularity is Near", 2005)

"There may be no significant difference between the point of view of inferring the true structure and that of making a prediction if an infinitely large quantity of data is available or if the data are noiseless. However, in modeling based on a finite quantity of real data, there is a significant gap between these two points of view, because an optimal model for prediction purposes may be different from one obtained by estimating the 'true model'." (Genshiro Kitagawa & Sadanori Konishi, "Information Criteria and Statistical Modeling", 2007)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010) 

"Complexity carries with it a lack of predictability different to that of chaotic systems, i.e. sensitivity to initial conditions. In the case of complexity, the lack of predictability is due to relevant interactions and novel information created by them." (Carlos Gershenson, "Understanding Complex Systems", 2011)

"The illusion that we understand the past fosters overconfidence in our ability to predict the future." (Daniel Kahneman, "Thinking, Fast and Slow", 2011)

"[...] things that seem hopelessly random and unpredictable when viewed in isolation often turn out to be lawful and predictable when viewed in aggregate." (Steven Strogatz, "The Joy of X: A Guided Tour of Mathematics, from One to Infinity", 2012)

"In common usage, prediction means to forecast a future event. In data science, prediction more generally means to estimate an unknown value. This value could be something in the future (in common usage, true prediction), but it could also be something in the present or in the past. Indeed, since data mining usually deals with historical data, models very often are built and tested using events from the past." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"In data science, a predictive model is a formula for estimating the unknown value of interest: the target. The formula could be mathematical, or it could be a logical statement such as a rule. Often it is a hybrid of the two." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Under complexity science, the more interacting factors, the more unpredictable and irregular the outcome. To be succinct, the greater the complexity, the greater the unpredictability." (Lawrence K Samuels, "In Defense of Chaos: The Chaology of Politics, Economics and Human Action", 2013)

"Without precise predictability, control is impotent and almost meaningless. In other words, the lesser the predictability, the harder the entity or system is to control, and vice versa. If our universe actually operated on linear causality, with no surprises, uncertainty, or abrupt changes, all future events would be absolutely predictable in a sort of waveless orderliness." (Lawrence K Samuels, "In Defense of Chaos", 2013)

"A complete data analysis will involve the following steps: (i) Finding a good model to fit the signal based on the data. (ii) Finding a good model to fit the noise, based on the residuals from the model. (iii) Adjusting variances, test statistics, confidence intervals, and predictions, based on the model for the noise." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"For a confidence interval, the central limit theorem plays a role in the reliability of the interval because the sample mean is often approximately normal even when the underlying data is not. A prediction interval has no such protection. The shape of the interval reflects the shape of the underlying distribution. It is more important to examine carefully the normality assumption by checking the residuals […]." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Prediction about the future assumes that the statistical model will continue to fit future data. There are several reasons this is often implausible, but it also seems clear that the model will often degenerate slowly in quality, so that the model will fit data only a few periods in the future almost as well as the data used to fit the model. To some degree, the reliability of extrapolation into the future involves subject-matter expertise." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"The problem of complexity is at the heart of mankind’s inability to predict future events with any accuracy. Complexity science has demonstrated that the more factors found within a complex system, the more chances of unpredictable behavior. And without predictability, any meaningful control is nearly impossible. Obviously, this means that you cannot control what you cannot predict. The ability ever to predict long-term events is a pipedream. Mankind has little to do with changing climate; complexity does." (Lawrence K Samuels, "The Real Science Behind Changing Climate", LewRockwell.com, August 1, 2014)

"It is important to remember that predictive data analytics models built using machine learning techniques are tools that we can use to help make better decisions within an organization and are not an end in themselves. It is paramount that, when tasked with creating a predictive model, we fully understand the business problem that this model is being constructed to address and ensure that it does address it." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Practical Statistics for Data Scientists: 50 Essential Concepts", 2016)

"Models are formal structures represented in mathematics and diagrams that help us to understand the world. Mastery of models improves your ability to reason, explain, design, communicate, act, predict, and explore." (Scott E Page, "The Model Thinker", 2018)

"Ideally, a decision maker or a forecaster will combine the outside view and the inside view - or, similarly, statistics plus personal experience. But it’s much better to start with the statistical view, the outside view, and then modify it in the light of personal experience than it is to go the other way around. If you start with the inside view you have no real frame of reference, no sense of scale - and can easily come up with a probability that is ten times too large, or ten times too small." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"This problem with adding additional variables is referred to as the curse of dimensionality. If you add enough variables into your black box, you will eventually find a combination of variables that performs well - but it may do so by chance. As you increase the number of variables you use to make your predictions, you need exponentially more data to distinguish true predictive capacity from luck." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"We filter new information. If it accords with what we expect, we’ll be more likely to accept it. […] Our brains are always trying to make sense of the world around us based on incomplete information. The brain makes predictions about what it expects, and tends to fill in the gaps, often based on surprisingly sparse data." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

More quotes on "Prediction" at the-web-of-knowledge.blogspot.com

