03 December 2018

🔭Data Science: Events (Just the Quotes)

"[…] chance, that is, an infinite number of events, with respect to which our ignorance will not permit us to perceive their causes, and the chain that connects them together. Now, this chance has a greater share in our education than is imagined. It is this that places certain objects before us and, in consequence of this, occasions more happy ideas, and sometimes leads us to the greatest discoveries […]" (Claude A Helvetius, "On Mind", 1751)

"But ignorance of the different causes involved in the production of events, as well as their complexity, taken together with the imperfection of analysis, prevents our reaching the same certainty about the vast majority of phenomena. Thus there are things that are uncertain for us, things more or less probable, and we seek to compensate for the impossibility of knowing them by determining their different degrees of likelihood. So it was that we owe to the weakness of the human mind one of the most delicate and ingenious of mathematical theories, the science of chance or probability." (Pierre-Simon Laplace, "Recherches, 1º, sur l'Intégration des Équations Différentielles aux Différences Finies, et sur leur Usage dans la Théorie des Hasards", 1773)

"[…] determine the probability of a future or unknown event not on the basis of the number of possible combinations resulting in this event or in its complementary event, but only on the basis of the knowledge of order of familiar previous events of this kind" (Marquis de Condorcet, "Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix", 1785)

"Probability has reference partly to our ignorance, partly to our knowledge [..] The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all cases possible is the measure of this probability, which is thus simply a fraction whose number is the number of favorable cases and whose denominator is the number of all cases possible." (Pierre-Simon Laplace, "Philosophical Essay on Probabilities", 1814)

"Things of all kinds are subject to a universal law which may be called the law of large numbers. It consists in the fact that, if one observes very considerable numbers of events of the same nature, dependent on constant causes and causes which vary irregularly, sometimes in one direction, sometimes in the other, it is to say without their variation being progressive in any definite direction, one shall find, between these numbers, relations which are almost constant." (Siméon-Denis Poisson, "Poisson’s Law of Large Numbers", 1837)

"Some of the common ways of producing a false statistical argument are to quote figures without their context, omitting the cautions as to their incompleteness, or to apply them to a group of phenomena quite different to that to which they in reality relate; to take these estimates referring to only part of a group as complete; to enumerate the events favorable to an argument, omitting the other side; and to argue hastily from effect to cause, this last error being the one most often fathered on to statistics. For all these elementary mistakes in logic, statistics is held responsible." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought." (Pierre-Simon de Laplace, "Philosophical Essay on Probabilities", 1902)

"Every theory of the course of events in nature is necessarily based on some process of simplification and is to some extent, therefore, a fairy tale." (Sir Napier Shaw, "Manual of Meteorology", 1932)

"The most important application of the theory of probability is to what we may call 'chance-like' or 'random' events, or occurrences. These seem to be characterized by a peculiar kind of incalculability which makes one disposed to believe - after many unsuccessful attempts - that all known rational methods of prediction must fail in their case. We have, as it were, the feeling that not a scientist but only a prophet could predict them. And yet, it is just this incalculability that makes us conclude that the calculus of probability can be applied to these events." (Karl R Popper, "The Logic of Scientific Discovery", 1934)

"Multiple equilibria are not necessarily useless, but from the standpoint of any exact science the existence of a uniquely determined equilibrium is, of course, of the utmost importance, even if proof has to be purchased at the price of very restrictive assumptions; without any possibility of proving the existence of (a) uniquely determined equilibrium - or at all events, of a small number of possible equilibria - at however high a level of abstraction, a field of phenomena is really a chaos that is not under analytical control." (Joseph A Schumpeter, "History of Economic Analysis", 1954)

"In fact, it is empirically ascertainable that every event is actually produced by a number of factors, or is at least accompanied by numerous other events that are somehow connected with it, so that the singling out involved in the picture of the causal chain is an extreme abstraction. Just as ideal objects cannot be isolated from their proper context, material existents exhibit multiple interconnections; therefore the universe is not a heap of things but a system of interacting systems." (Mario Bunge, "Causality: The place of the casual principles in modern science", 1959)

"Certain properties are necessary or sufficient conditions for other properties, and the network of causal relations thus established will make the occurrence of one property at least tend, subject to the presence of other properties, to promote or inhibit the occurrence of another. Arguments from models involve those analogies which can be used to predict the occurrence of certain properties or events, and hence the relevant relations are causal, at least in the sense of implying a tendency to co-occur." (Mary B Hesse," Models and Analogies in Science", 1963)

"In complex systems cause and effect are often not closely related in either time or space. The structure of a complex system is not a simple feedback loop where one system state dominates the behavior. The complex system has a multiplicity of interacting feedback loops. Its internal rates of flow are controlled by nonlinear relationships. The complex system is of high order, meaning that there are many system states (or levels). It usually contains positive-feedback loops describing growth processes as well as negative, goal-seeking loops. In the complex system the cause of a difficulty may lie far back in time from the symptoms, or in a completely different and remote part of the system. In fact, causes are usually found, not in prior events, but in the structure and policies of the system." (Jay Wright Forrester, "Urban dynamics", 1969)

"There are different levels of organization in the occurrence of events. You cannot explain the events of one level in terms of the events of another. For example, you cannot explain life in terms of mechanical concepts, nor society in terms of individual psychology. Analysis can only take you down the scale of organization. It cannot reveal the workings of things on a higher level. To some extent the holistic philosophers are right." (Anatol Rapoport, "General Systems" Vol. 14, 1969)

"[I]n probability theory we are faced with situations in which our intuition or some physical experiments we have carried out suggest certain results. Intuition and experience lead us to an assignment of probabilities to events. As far as the mathematics is concerned, any assignment of probabilities will do, subject to the rules of mathematical consistency." (Robert Ash, "Basic probability theory", 1970)

"Perhaps randomness is not merely an adequate description for complex causes that we cannot specify. Perhaps the world really works this way, and many events are uncaused in any conventional sense of the word." (Stephen Jay Gould,"Hen's Teeth and Horse's Toes", 1983)

"If you perceive the world as some place where things happen at random - random events over which you have sometimes very little control, sometimes fairly good control, but still random events - well, one has to be able to have some idea of how these things behave. […] People who are not used to statistics tend to see things in data - there are random fluctuations which can sometimes delude them - so you have to understand what can happen randomly and try to control whatever can be controlled. You have to expect that you are not going to get a clean-cut answer. So how do you interpret what you get? You do it by statistics." (Lucien LeCam, [interview] 1988)

"According to the narrower definition of randomness, a random sequence of events is one in which anything that can ever happen can happen next. Usually it is also understood that the probability that a given event will happen next is the same as the probability that a like event will happen at any later time. [...] According to the broader definition of randomness, a random sequence is simply one in which any one of several things can happen next, even though not necessarily anything that can ever happen can happen next." (Edward N Lorenz, "The Essence of Chaos", 1993)

"So we pour in data from the past to fuel the decision-making mechanisms created by our models, be they linear or nonlinear. But therein lies the logician's trap: past data from real life constitute a sequence of events rather than a set of independent observations, which is what the laws of probability demand.[...] It is in those outliers and imperfections that the wildness lurks." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Events may appear to us to be random, but this could be attributed to human ignorance about the details of the processes involved." (Brain S Everitt, "Chance Rules", 1999)

"The subject of probability begins by assuming that some mechanism of uncertainty is at work giving rise to what is called randomness, but it is not necessary to distinguish between chance that occurs because of some hidden order that may exist and chance that is the result of blind lawlessness. This mechanism, figuratively speaking, churns out a succession of events, each individually unpredictable, or it conspires to produce an unforeseeable outcome each time a large ensemble of possibilities is sampled."  (Edward Beltrami, "What is Random?: Chaos and Order in Mathematics and Life", 1999)

"Entropy [...] is the amount of disorder or randomness present in any system. All non-living systems tend toward disorder; left alone they will eventually lose all motion and degenerate into an inert mass. When this permanent stage is reached and no events occur, maximum entropy is attained. A living system can, for a finite time, avert this unalterable process by importing energy from its environment. It is then said to create negentropy, something which is characteristic of all kinds of life." (Lars Skyttner, "General Systems Theory: Ideas and Applications", 2001)

"One can be highly functionally numerate without being a mathematician or a quantitative analyst. It is not the mathematical manipulation of numbers (or symbols representing numbers) that is central to the notion of numeracy. Rather, it is the ability to draw correct meaning from a logical argument couched in numbers. When such a logical argument relates to events in our uncertain real world, the element of uncertainty makes it, in fact, a statistical argument." (Eric R Sowey, "The Getting of Wisdom: Educating Statisticians to Enhance Their Clients' Numeracy", The American Statistician 57(2), 2003)

"Randomness is a difficult notion for people to accept. When events come in clusters and streaks, people look for explanations and patterns. They refuse to believe that such patterns - which frequently occur in random data - could equally well be derived from tossing a coin. So it is in the stock market as well." (Didier Sornette, "Why Stock Markets Crash: Critical events in complex financial systems", 2003)

"The basic concept of complexity theory is that systems show patterns of organization without organizer (autonomous or self-organization). Simple local interactions of many mutually interacting parts can lead to emergence of complex global structures. […] Complexity originates from the tendency of large dynamical systems to organize themselves into a critical state, with avalanches or 'punctuations' of all sizes. In the critical state, events which would otherwise be uncoupled became correlated." (Jochen Fromm, "The Emergence of Complexity", 2004)

"[myth:] Counting can be done without error. Usually, the counted number is an integer and therefore without (rounding) error. However, the best estimate of a scientifically relevant value obtained by counting will always have an error. These errors can be very small in cases of consecutive counting, in particular of regular events, e.g., when measuring frequencies." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"[...] in probability theory we are faced with situations in which our intuition or some physical experiments we have carried out suggest certain results. Intuition and experience lead us to an assignment of probabilities to events. As far as the mathematics is concerned, any assignment of probabilities will do, subject to the rules of mathematical consistency." (Robert Ash, "Basic Probability Theory", 2008)

"Regression toward the mean. That is, in any series of random events an extraordinary event is most likely to be followed, due purely to chance, by a more ordinary one." (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"In the network society, the space of flows dissolves time by disordering the sequence of events and making them simultaneous in the communication networks, thus installing society in structural ephemerality: being cancels becoming." (Manuel Castells, "Communication Power", 2009)

"Without precise predictability, control is impotent and almost meaningless. In other words, the lesser the predictability, the harder the entity or system is to control, and vice versa. If our universe actually operated on linear causality, with no surprises, uncertainty, or abrupt changes, all future events would be absolutely predictable in a sort of waveless orderliness." (Lawrence K Samuels, "Defense of Chaos: The Chaology of Politics, Economics and Human Action", 2013)

"The problem of complexity is at the heart of mankind’s inability to predict future events with any accuracy. Complexity science has demonstrated that the more factors found within a complex system, the more chances of unpredictable behavior. And without predictability, any meaningful control is nearly impossible. Obviously, this means that you cannot control what you cannot predict. The ability ever to predict long-term events is a pipedream. Mankind has little to do with changing climate; complexity does." (Lawrence K Samuels, "The Real Science Behind Changing Climate", 2014)

More quotes on "Events" at the-web-of-knowledge.blogspot.com

🔭Data Science: Regression (Just the Quotes)

"One feature [...] which requires much more justification than is usually given, is the setting up of unplausible null hypotheses. For example, a statistician may set out a test to see whether two drugs have exactly the same effect, or whether a regression line is exactly straight. These hypotheses can scarcely be taken literally." (Cedric A B Smith, "Book review of Norman T. J. Bailey: Statistical Methods in Biology", Applied Statistics 9, 1960)

"The method of least squares is used in the analysis of data from planned experiments and also in the analysis of data from unplanned happenings. The word 'regression' is most often used to describe analysis of unplanned data. It is the tacit assumption that the requirements for the validity of least squares analysis are satisfied for unplanned data that produces a great deal of trouble." (George E P Box, "Use and Abuse of Regression", 1966)

"[…] fitting lines to relationships between variables is often a useful and powerful method of summarizing a set of data. Regression analysis fits naturally with the development of causal explanations, simply because the research worker must, at a minimum, know what he or she is seeking to explain." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Logging size transforms the original skewed distribution into a more symmetrical one by pulling in the long right tail of the distribution toward the mean. The short left tail is, in addition, stretched. The shift toward symmetrical distribution produced by the log transform is not, of course, merely for convenience. Symmetrical distributions, especially those that resemble the normal distribution, fulfill statistical assumptions that form the basis of statistical significance testing in the regression model." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Logging skewed variables also helps to reveal the patterns in the data. […] the rescaling of the variables by taking logarithms reduces the nonlinearity in the relationship and removes much of the clutter resulting from the skewed distributions on both variables; in short, the transformation helps clarify the relationship between the two variables. It also […] leads to a theoretically meaningful regression coefficient." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The logarithmic transformation serves several purposes: (1) The resulting regression coefficients sometimes have a more useful theoretical interpretation compared to a regression based on unlogged variables. (2) Badly skewed distributions - in which many of the observations are clustered together combined with a few outlying values on the scale of measurement - are transformed by taking the logarithm of the measurements so that the clustered values are spread out and the large values pulled in more toward the middle of the distribution. (3) Some of the assumptions underlying the regression model and the associated significance tests are better met when the logarithm of the measured variables is taken." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Graphical methodology provides powerful diagnostic tools for conveying properties of the fitted regression, for assessing the adequacy of the fit, and for suggesting improvements. There is seldom any prior guarantee that a hypothesized regression model will provide a good description of the mechanism that generated the data. Standard regression models carry with them many specific assumptions about the relationship between the response and explanatory variables and about the variation in the response that is not accounted for by the explanatory variables. In many applications of regression there is a substantial amount of prior knowledge that makes the assumptions plausible; in many other applications the assumptions are made as a starting point simply to get the analysis off the ground. But whatever the amount of prior knowledge, fitting regression equations is not complete until the assumptions have been examined." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Stepwise regression is probably the most abused computerized statistical technique ever devised. If you think you need stepwise regression to solve a particular problem you have, it is almost certain that you do not. Professional statisticians rarely use automated stepwise regression." (Leland Wilkinson, "SYSTAT", 1984)

"Someone has characterized the user of stepwise regression as a person who checks his or her brain at the entrance of the computer center." (Dick R Wittink, "The application of regression analysis", 1988)

"Data analysis is rarely as simple in practice as it appears in books. Like other statistical techniques, regression rests on certain assumptions and may produce unrealistic results if those assumptions are false. Furthermore it is not always obvious how to translate a research question into a regression model." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Exploratory regression methods attempt to reveal unexpected patterns, so they are ideal for a first look at the data. Unlike other regression techniques, they do not require that we specify a particular model beforehand. Thus exploratory techniques warn against mistakenly fitting a linear model when the relation is curved, a waxing curve when the relation is S-shaped, and so forth." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Linear regression assumes that in the population a normal distribution of error values around the predicted Y is associated with each X value, and that the dispersion of the error values for each X value is the same. The assumptions imply normal and similarly dispersed error distributions." (Fred C Pampel, "Linear Regression: A primer", 2000)

"Whereas regression is about attempting to specify the underlying relationship that summarises a set of paired data, correlation is about assessing the strength of that relationship. Where there is a very close match between the scatter of points and the regression line, correlation is said to be 'strong' or 'high' . Where the points are widely scattered, the correlation is said to be 'weak' or 'low'." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Before best estimates are extracted from data sets by way of a regression analysis, the uncertainties of the individual data values must be determined.In this case care must be taken to recognize which uncertainty components are common to all the values, i.e., those that are correlated (systematic)." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"For linear dependences the main information usually lies in the slope. It is obvious that those points that lie far apart have the strongest influence on the slope if all points have the same uncertainty. In this context we speak of the strong leverage of distant points; when determining the parameter 'slope' these distant points carry more effective weight. Naturally, this weight is distinct from the 'statistical' weight usually used in regression analysis." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Regression toward the mean. That is, in any series of random events an extraordinary event is most likely to be followed, due purely to chance, by a more ordinary one." (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"There are three possible reasons for [the] absence of predictive power. First, it is possible that the models are misspecified. Second, it is possible that the model’s explanatory factors are measured at too high a level of aggregation [...] Third, [...] the search for statistically significant relationships may not be the strategy best suited for evaluating our model’s ability to explain real world events [...] the lack of predictive power is the result of too much emphasis having been placed on finding statistically significant variables, which may be overdetermined. Statistical significance is generally a flawed way to prune variables in regression models [...] Statistically significant variables may actually degrade the predictive accuracy of a model [...] [By using]models that are constructed on the basis of pruning undertaken with the shears of statistical significance, it is quite possible that we are winnowing our models away from predictive accuracy." (Michael D Ward et al, "The perils of policy by p-value: predicting civil conflicts" Journal of Peace Research 47, 2010)

"Regression analysis, like all forms of statistical inference, is designed to offer us insights into the world around us. We seek patterns that will hold true for the larger population. However, our results are valid only for a population that is similar to the sample on which the analysis has been done." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Multiple regression, like all statistical techniques based on correlation, has a severe limitation due to the fact that correlation doesn't prove causation. And no amount of measuring of 'control' variables can untangle the web of causality. What nature hath joined together, multiple regression cannot put asunder." (Richard Nisbett, "2014 : What scientific idea is ready for retirement?", 2013)

"Multiple regression, like all statistical techniques based on correlation, has a severe limitation due to the fact that correlation doesn't prove causation. And no amount of measuring of 'control' variables can untangle the web of causality. What nature hath joined together, multiple regression cannot put asunder." (Richard Nisbett, "2014 : What scientific idea is ready for retirement?", 2013)

"What nature hath joined together, multiple regression cannot put asunder."  (Richard Nisbett, "2014 : What scientific idea is ready for retirement?", 2013)

"A wide variety of statistical procedures (regression, t-tests, ANOVA) require three assumptions: (i) Normal observations or errors. (ii) Independent observations (or independent errors, which is equivalent, in normal linear models to independent observations). (iii) Equal variance - when that is appropriate (for the one-sample t-test, for example, there is nothing being compared, so equal variances do not apply).(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Regression does not describe changes in ability that happen as time passes […]. Regression is caused by performances fluctuating about ability, so that performances far from the mean reflect abilities that are closer to the mean." (Gary Smith, "Standard Deviations", 2014)

"We encounter regression in many contexts - pretty much whenever we see an imperfect measure of what we are trying to measure. Standardized tests are obviously an imperfect measure of ability. [...] Each experimental score is an imperfect measure of “ability,” the benefits from the layout. To the extent there is randomness in this experiment - and there surely is - the prospective benefits from the layout that has the highest score are probably closer to the mean than was the score." (Gary Smith, "Standard Deviations", 2014))

"When a trait, such as academic or athletic ability, is measured imperfectly, the observed differences in performance exaggerate the actual differences in ability. Those who perform the best are probably not as far above average as they seem. Nor are those who perform the worst as far below average as they seem. Their subsequent performances will consequently regress to the mean." (Gary Smith, "Standard Deviations", 2014)

"Working an integral or performing a linear regression is something a computer can do quite effectively. Understanding whether the result makes sense - or deciding whether the method is the right one to use in the first place - requires a guiding human hand. When we teach mathematics we are supposed to be explaining how to be that guide. A math course that fails to do so is essentially training the student to be a very slow, buggy version of Microsoft Excel." (Jordan Ellenberg, "How Not to Be Wrong: The Power of Mathematical Thinking", 2014)

"A basic problem with MRA is that it typically assumes that the independent variables can be regarded as building blocks, with each variable taken by itself being logically independent of all the others. This is usually not the case, at least for behavioral data. […] Just as correlation doesn’t prove causation, absence of correlation fails to prove absence of causation. False-negative findings can occur using MRA just as false-positive findings do—because of the hidden web of causation that we’ve failed to identify." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"One technique employing correlational analysis is multiple regression analysis (MRA), in which a number of independent variables are correlated simultaneously (or sometimes sequentially, but we won’t talk about that variant of MRA) with some dependent variable. The predictor variable of interest is examined along with other independent variables that are referred to as control variables. The goal is to show that variable A influences variable B 'net of' the effects of all the other variables. That is to say, the relationship holds even when the effects of the control variables on the dependent variable are taken into account." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"The fundamental problem with MRA, as with all correlational methods, is self-selection. The investigator doesn’t choose the value for the independent variable for each subject (or case). This means that any number of variables correlated with the independent variable of interest have been dragged along with it. In most cases, we will fail to identify all these variables. In the case of behavioral research, it’s normally certain that we can’t be confident that we’ve identified all the plausibly relevant variables." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"The theory behind multiple regression analysis is that if you control for everything that is related to the independent variable and the dependent variable by pulling their correlations out of the mix, you can get at the true causal relation between the predictor variable and the outcome variable. That’s the theory. In practice, many things prevent this ideal case from being the norm." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"Regression describes the relationship between an exploratory variable (i.e., independent) and a response variable (i.e., dependent). Exploratory variables are also referred to as predictors and can have a frequency of more than 1. Regression is being used within the realm of predictions and forecasting. Regression determines the change in response variable when one exploratory variable is varied while the other independent variables are kept constant. This is done to understand the relationship that each of those exploratory variables exhibits." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"Any time you run regression analysis on arbitrary real-world observational data, there’s a significant risk that there’s hidden confounding in your dataset and so causal conclusions from such analysis are likely to be (causally) biased." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

"Multiple regression provides scientists and analysts with a tool to perform statistical control - a procedure to remove unwanted influence from certain variables in the model." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

"The causal interpretation of linear regression only holds when there are no spurious relationships in your data. This is the case in two scenarios: when you control for a set of all necessary variables (sometimes this set can be empty) or when your data comes from a properly designed randomized experiment." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

More quotes on "Regression" at the-web-of-knowledge.blogspot.com

02 December 2018

🔭Data Science: Complexity (Just the Quotes)

"If we study the history of science we see happen two inverse phenomena […] Sometimes simplicity hides under complex appearances; sometimes it is the simplicity which is apparent, and which disguises extremely complicated realities. […] No doubt, if our means of investigation should become more and more penetrating, we should discover the simple under the complex, then the complex under the simple, then again the simple under the complex, and so on, without our being able to foresee what will be the last term. We must stop somewhere, and that science may be possible, we must stop when we have found simplicity. This is the only ground on which we can rear the edifice of our generalizations." (Henri Poincaré, "Science and Hypothesis", 1901)

"The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, ‘Seek simplicity and distrust it’." (Alfred N Whitehead, "The Concept of Nature", 1919)

"[Disorganized complexity] is a problem in which the number of variables is very large, and one in which each of the many variables has a behavior which is individually erratic, or perhaps totally unknown. However, in spite of this helter-skelter, or unknown, behavior of all the individual variables, the system as a whole possesses certain orderly and analyzable average properties. [...] [Organized complexity is] not problems of disorganized complexity, to which statistical methods hold the key. They are all problems which involve dealing simultaneously with a sizable number of factors which are interrelated into an organic whole. They are all, in the language here proposed, problems of organized complexity." (Warren Weaver, "Science and Complexity", American Scientist Vol. 36, 1948)

"Nor does complexity deny the valid simplification which is part of the process of analysis, and even a method of achieving complex architecture itself." (Robert Venturi, "Complexity and Contradiction in Architecture", 1966)

"The central task of a natural science is to make the wonderful commonplace: to show that complexity, correctly viewed, is only a mask for simplicity; to find pattern hidden in apparent chaos." (Herbert A Simon, "The Sciences of the Artificial", 1969)

"At each level of complexity, entirely new properties appear. [And] at each stage, entirely new laws, concepts, and generalizations are necessary, requiring inspiration and creativity to just as great a degree as in the previous one." (Herb Anderson, 1972)

"In general, complexity and precision bear an inverse relation to one another in the sense that, as the complexity of a problem increases, the possibility of analysing it in precise terms diminishes. Thus 'fuzzy thinking' may not be deplorable, after all, if it makes possible the solution of problems which are much too complex for precise analysis." (Lotfi A Zadeh, "Fuzzy languages and their relation to human intelligence", 1972)

"Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius - and a lot of courage to move in the opposite direction." (Ernst F Schumacher, "Small is Beautiful", 1973)

"The aim of the model is of course not to reproduce reality in all its complexity. It is rather to capture in a vivid, often formal, way what is essential to understanding some aspect of its structure or behavior." (Joseph Weizenbaum, "Computer power and human reason: From judgment to calculation", 1976)

"All nature is a continuum. The endless complexity of life is organized into patterns which repeat themselves at each level of system." (James G Miller, "Living Systems", 1978)

"Simplicity does not precede complexity, but follows it." (Alan J Perlis, "Epigrams on Programming", 1982)

"Organized simplicity occurs where a small number of significant factors and a large number of insignificant factors appear initially to be complex, but on investigation display hidden simplicity." (Robert L Flood & Ewart R Carson, "Dealing with Complexity: An introduction to the theory and application of systems", 1988)

"The state of development of mathematical theory in relation to some attributes of complexity is a clear measure of our ability/inability to deal with that attribute […]" (Robert L Flood & Ewart R Carson, "Dealing with Complexity: An introduction to the theory and application of systems", 1988)

"Complexity is not an objective factor but a subjective one. Supersignals reduce complexity, collapsing a number of features into one. Consequently, complexity must be understood in terms of a specific individual and his or her supply of supersignals. We learn supersignals from experience, and our supply can differ greatly from another individual's. Therefore there can be no objective measure of complexity." (Dietrich Dorner, "The Logic of Failure: Recognizing and Avoiding Error in Complex Situations", 1989)

"Modeling in its broadest sense is the cost-effective use of something in place of something else for some [cognitive] purpose. It allows us to use something that is simpler, safer, or cheaper than reality instead of reality for some purpose. A model represents reality for the given purpose; the model is an abstraction of reality in the sense that it cannot represent all aspects of reality. This allows us to deal with the world in a simplified manner, avoiding the complexity, danger and irreversibility of reality." (Jeff Rothenberg, "The Nature of Modeling. In: Artificial Intelligence, Simulation, and Modeling", 1989)

"A measure that corresponds much better to what is usually meant by complexity in ordinary conversation, as well as in scientific discourse, refers not to the length of the most concise description of an entity (which is roughly what AIC [algorithmic information content] is), but to the length of a concise description of a set of the entity’s regularities. Thus something almost entirely random, with practically no regularities, would have effective complexity near zero. So would something completely regular, such as a bit string consisting entirely of zeroes. Effective complexity can be high only a region intermediate between total order and complete." (Murray Gell-Mann, "What is Complexity?", Complexity Vol 1 (1), 1995)

"The larger, more detailed and complex the model - the less abstract the abstraction – the smaller the number of people capable of understanding it and the longer it takes for its weaknesses and limitations to be found out." (John Adams, "Risk", 1995)

"Complexity is that property of a model which makes it difficult to formulate its overall behaviour in a given language, even when given reasonably complete information about its atomic components and their inter-relations." (Bruce Edmonds, "Syntactic Measures of Complexity", 1999)

"Falling between order and chaos, the moment of complexity is the point at which self-organizing systems emerge to create new patterns of coherence and structures of behaviour." (Mark C Taylor, "The Moment of Complexity: Emerging Network Culture", 2001)

"[…] most earlier attempts to construct a theory of complexity have overlooked the deep link between it and networks. In most systems, complexity starts where networks turn nontrivial." (Albert-László Barabási, "Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life", 2002)

"The urge to tinker with a formula is a hunger that keeps coming back. Tinkering almost always leads to more complexity. The more complicated the metric, the harder it is for users to learn how to affect the metric, and the less likely it is to improve it." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

More on "Complexity" at the-web-of-knowledge.blogspot.com.

🔭Data Science: Error (Just the Quotes)

"The probable is something which lies midway between truth and error" (Christian Thomasius, "Institutes of Divine Jurisprudence", 1688)

"Knowledge being to be had only of visible and certain truth, error is not a fault of our knowledge, but a mistake of our judgment, giving assent to that which is not true." (John Locke, "An Essay Concerning Human Understanding", 1689)

"The errors of definitions multiply themselves according as the reckoning proceeds; and lead men into absurdities, which at last they see but cannot avoid, without reckoning anew from the beginning." (Thomas Hobbes, "The Moral and Political Works of Thomas Hobbes of Malmesbury", 1750)

"Men are often led into errors by the love of simplicity, which disposes us to reduce things to few principles, and to conceive a greater simplicity in nature than there really is." (Thomas Reid, "Essays on the Intellectual Powers of Man", 1785)

"The orbits of certainties touch one another; but in the interstices there is room enough for error to go forth and prevail." (Johann Wolfgang von Goethe, "Maxims and Reflections", 1833)

"Nothing hurts a new truth more than an old error." (Johann Wolfgang von Goethe, "Sprüche in Prosa", 1840)

"Every detection of what is false directs us towards what is true: every trial exhausts some tempting form of error. Not only so; but scarcely any attempt is entirely a failure; scarcely any theory, the result of steady thought, is altogether false; no tempting form of error is without some latent charm derived from truth." (William Whewell, "Lectures on the History of Moral Philosophy in England", 1852)

"[…] ideas may be both novel and important, and yet, if they are incorrect - if they lack the very essential support of incontrovertible fact, they are unworthy of credence. Without this, a theory may be both beautiful and grand, but must be as evanescent as it is beautiful, and as unsubstantial as it is grand." (George Brewster, "A New Philosophy of Matter", 1858)

"When a power of nature, invisible and impalpable, is the subject of scientific inquiry, it is necessary, if we would comprehend its essence and properties, to study its manifestations and effects. For this purpose simple observation is insufficient, since error always lies on the surface, whilst truth must be sought in deeper regions." (Justus von Liebig," Familiar Letters on Chemistry", 1859)

"As in the experimental sciences, truth cannot be distinguished from error as long as firm principles have not been established through the rigorous observation of facts." (Louis Pasteur, "Étude sur la maladie des vers à soie", 1870)

"It would be an error to suppose that the great discoverer seizes at once upon the truth, or has any unerring method of divining it. In all probability the errors of the great mind exceed in number those of the less vigorous one. Fertility of imagination and abundance of guesses at truth are among the first requisites of discovery; but the erroneous guesses must be many times as numerous as those that prove well founded. The weakest analogies, the most whimsical notions, the most apparently absurd theories, may pass through the teeming brain, and no record remain of more than the hundredth part. […] The truest theories involve suppositions which are inconceivable, and no limit can really be placed to the freedom of hypotheses." (W Stanley Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1877)

"Perfect readiness to reject a theory inconsistent with fact is a primary requisite of the philosophic mind. But it, would be a mistake to suppose that this candour has anything akin to fickleness; on the contrary, readiness to reject a false theory may be combined with a peculiar pertinacity and courage in maintaining an hypothesis as long as its falsity is not actually apparent." (William S Jevons, "The Principles of Science", 1887)

"One is almost tempted to assert that quite apart from its intellectual mission, theory is the most practical thing conceivable, the quintessence of practice as it were, since the precision of its conclusions cannot be reached by any routine of estimating or trial and error; although given the hidden ways of theory, this will hold only for those who walk them with complete confidence." (Ludwig E Boltzmann, "On the Significance of Theories", 1890)

"[…] to kill an error is as good a service as, and sometimes even better than, the establishing of a new truth or fact." (Charles R Darwin, "More Letters of Charles Darwin", Vol 2, 1903)

"Man's determination not to be deceived is precisely the origin of the problem of knowledge. The question is always and only this: to learn to know and to grasp reality in the midst of a thousand causes of error which tend to vitiate our observation." (Federigo Enriques, "Problems of Science", 1906)

"The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, ‘Seek simplicity and distrust it’." (Alfred N Whitehead, "The Concept of Nature", 1919)

"Poor statistics may be attributed to a number of causes. There are the mistakes which arise in the course of collecting the data, and there are those which occur when those data are being converted into manageable form for publication. Still later, mistakes arise because the conclusions drawn from the published data are wrong. The real trouble with errors which arise during the course of collecting the data is that they are the hardest to detect." (Alfred R Ilersic, "Statistics", 1959)

"When using estimated figures, i.e. figures subject to error, for further calculation make allowance for the absolute and relative errors. Above all, avoid what is known to statisticians as 'spurious' accuracy. For example, if the arithmetic Mean has to be derived from a distribution of ages given to the nearest year, do not give the answer to several places of decimals. Such an answer would imply a degree of accuracy in the results of your calculations which are quite un- justified by the data. The same holds true when calculating percentages." (Alfred R Ilersic, "Statistics", 1959)

"While it is true to assert that much statistical work involves arithmetic and mathematics, it would be quite untrue to suggest that the main source of errors in statistics and their use is due to inaccurate calculations." (Alfred R Ilersic, "Statistics", 1959)

"Errors may also creep into the information transfer stage when the originator of the data is unconsciously looking for a particular result. Such situations may occur in interviews or questionnaires designed to gather original data. Improper wording of the question, or improper voice inflections. and other constructional errors may elicit nonobjective responses. Obviously, if the data is incorrectly gathered, any graph based on that data will contain the original error - even though the graph be most expertly designed and beautifully presented." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"One grievous error in interpreting approximations is to allow only good approximations." (Preston C Hammer, "Mind Pollution", Cybernetics, Vol. 14, 1971)

"Thus, the construction of a mathematical model consisting of certain basic equations of a process is not yet sufficient for effecting optimal control. The mathematical model must also provide for the effects of random factors, the ability to react to unforeseen variations and ensure good control despite errors and inaccuracies." (Yakov Khurgin, "Did You Say Mathematics?", 1974)

"A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions." (Otis D Duncan, "Introduction to Structural Equation Models", 1975)

"Most people like to believe something is or is not true. Great scientists tolerate ambiguity very well. They believe the theory enough to go ahead; they doubt it enough to notice the errors and faults so they can step forward and create the new replacement theory. If you believe too much you'll never notice the flaws; if you doubt too much you won't get started. It requires a lovely balance." (Richard W Hamming, "You and Your Research", 1986) 

"We have found that some of the hardest errors to detect by traditional methods are unsuspected gaps in the data collection (we usually discovered them serendipitously in the course of graphical checking)." (Peter Huber, "Huge data sets", Compstat '94: Proceedings, 1994)

"Humans may crave absolute certainty; they may aspire to it; they may pretend, as partisans of certain religions do, to have attained it. But the history of science - by far the most successful claim to knowledge accessible to humans - teaches that the most we can hope for is successive improvement in our understanding, learning from our mistakes, an asymptotic approach to the Universe, but with the proviso that absolute certainty will always elude us. We will always be mired in error. The most each generation can hope for is to reduce the error bars a little, and to add to the body of data to which error bars apply." (Carl Sagan, "The Demon-Haunted World: Science as a Candle in the Dark", 1995)

"[myth:] Counting can be done without error. Usually, the counted number is an integer and therefore without (rounding) error. However, the best estimate of a scientifically relevant value obtained by counting will always have an error. These errors can be very small in cases of consecutive counting, in particular of regular events, e.g., when measuring frequencies." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In error analysis the so-called 'chi-squared' is a measure of the agreement between the uncorrelated internal and the external uncertainties of a measured functional relation. The simplest such relation would be time independence. Theory of the chi-squared requires that the uncertainties be normally distributed. Nevertheless, it was found that the test can be applied to most probability distributions encountered in practice." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"[myth:] Random errors can always be determined by repeating measurements under identical conditions. […] this statement is true only for time-related random errors ." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"[myth:] Systematic errors can be determined inductively. It should be quite obvious that it is not possible to determine the scale error from the pattern of data values." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010) 

"A key difference between a traditional statistical problems and a time series problem is that often, in time series, the errors are not independent." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

 "A wide variety of statistical procedures (regression, t-tests, ANOVA) require three assumptions: (i) Normal observations or errors. (ii) Independent observations (or independent errors, which is equivalent, in normal linear models to independent observations). (iii) Equal variance - when that is appropriate (for the one-sample t-test, for example, there is nothing being compared, so equal variances do not apply).(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"If the observations/errors are not independent, the statistical formulations are completely unreliable unless corrections can be made.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Once a model has been fitted to the data, the deviations from the model are the residuals. If the model is appropriate, then the residuals mimic the true errors. Examination of the residuals often provides clues about departures from the modeling assumptions. Lack of fit - if there is curvature in the residuals, plotted versus the fitted values, this suggests there may be whole regions where the model overestimates the data and other whole regions where the model underestimates the data. This would suggest that the current model is too simple relative to some better model.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

 "The random element in most data analysis is assumed to be white noise - normal errors independent of each other. In a time series, the errors are often linked so that independence cannot be assumed (the last examples). Modeling the nature of this dependence is the key to time series.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"When data is not normal, the reason the formulas are working is usually the central limit theorem. For large sample sizes, the formulas are producing parameter estimates that are approximately normal even when the data is not itself normal. The central limit theorem does make some assumptions and one is that the mean and variance of the population exist. Outliers in the data are evidence that these assumptions may not be true. Persistent outliers in the data, ones that are not errors and cannot be otherwise explained, suggest that the usual procedures based on the central limit theorem are not applicable.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Bias is error from incorrect assumptions built into the model, such as restricting an interpolating function to be linear instead of a higher-order curve. [...] Errors of bias produce underfit models. They do not fit the training data as tightly as possible, were they allowed the freedom to do so. In popular discourse, I associate the word 'bias' with prejudice, and the correspondence is fairly apt: an apriori assumption that one group is inferior to another will result in less accurate predictions than an unbiased one. Models that perform lousy on both training and testing data are underfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Variance is error from sensitivity to fluctuations in the training set. If our training set contains sampling or measurement error, this noise introduces variance into the resulting model. [...] Errors of variance result in overfit models: their quest for accuracy causes them to mistake noise for signal, and they adjust so well to the training data that noise leads them astray. Models that do much better on testing data than training data are overfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Machine learning bias is typically understood as a source of learning error, a technical problem. […] Machine learning bias can introduce error simply because the system doesn’t 'look' for certain solutions in the first place. But bias is actually necessary in machine learning - it’s part of learning itself." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

🔭Data Science: Certainty (Just the Quotes)

"Reasoning draws a conclusion and makes us grant the conclusion, but does not make the conclusion certain, nor does it remove doubt so that the mind may rest on the intuition of truth, unless the mind discovers it by the path of experience." (Roger Bacon, "Opus Majus", cca. 1267)

"[…] the highest probability amounts not to certainty, without which there can be no true knowledge." (John Locke, "An Essay Concerning Human Understanding", 1689)

"All effects follow not with like certainty from their supposed causes." (David Hume, "An Enquiry Concerning Human Understanding", 1748)

"As mathematical and absolute certainty is seldom to be attained in human affairs, reason and public utility require that judges and all mankind in forming their opinions of the truth of facts should be regulated by the superior number of the probabilities on the one side or the other whether the amount of these probabilities be expressed in words and arguments or by figures and numbers." (William Murray, 1773) 

"In order to supply the defects of experience, we will have recourse to the probable conjectures of analogy, conclusions which we will bequeath to our posterity to be ascertained by new observations, which, if we augur rightly, will serve to establish our theory and to carry it gradually nearer to absolute certainty." (Johann H Lambert, "The System of the World", 1800)

"One may even say, strictly speaking, that almost all our knowledge is only probable; and in the small number of things that we are able to know with certainty, in the mathematical sciences themselves, the principal means of arriving at the truth - induction and analogy - are based on probabilities, so that the whole system of human knowledge is tied up with the theory set out in this essay." (Pierre-Simon Laplace, "Philosophical Essay on Probabilities", 1814)

"The orbits of certainties touch one another; but in the interstices there is room enough for error to go forth and prevail." (Johann Wolfgang von Goethe, "Maxims and Reflections", 1833)

"All certainty which does not consist in mathematical demonstration is nothing more than the highest probability; there is no other historical certainty." (Voltaire, "A Philosophical Dictionary", 1881)

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (I) owing to the 'error of random sampling' the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals." William S Gosset, "The Probable Error of a Mean", Biometrika, 1908)

"Sometimes the probability in favor of a generalization is enormous, but the infinite probability of certainty is never reached." (William Dampier-Whetham, "Science and the Human Mind", 1912)

"No matter how solidly founded a prediction may appear to us, we are never absolutely sure that experiment will not contradict it, if we undertake to verify it . […] It is far better to foresee even without certainty than not to foresee at all." (Henri Poincaré, "The Foundations of Science", 1913)

"The very name calculus of probabilities is a paradox. Probability opposed to certainty is what we do not know, and how can we calculate what we do not know?" (Henri Poincaré, "The Foundations of Science", 1913)

"The making of decisions, as everyone knows from personal experience, is a burdensome task. Offsetting the exhilaration that may result from correct and successful decision and the relief that follows the termination of a struggle to determine issues is the depression that comes from failure, or error of decision, and the frustration which ensues from uncertainty." (Chester I Barnard, "The Functions of the Executive", 1938)

"Uncertainty is introduced, however, by the impossibility of making generalizations, most of the time, which happens to all members of a class. Even scientific truth is a matter of probability and the degree of probability stops somewhere short of certainty." (Wayne C Minnick, "The Art of Persuasion", 1957)

"Incomplete knowledge must be considered as perfectly normal in probability theory; we might even say that, if we knew all the circumstances of a phenomenon, there would be no place for probability, and we would know the outcome with certainty." (Félix E Borel, Probability and Certainty", 1963)

"It is a commonplace of modern technology that there is a high measure of certainty that problems have solutions before there is knowledge of how they are to be solved." (John K Galbraith, "The New Industrial State", 1967)

"Statistics is a body of methods and theory applied to numerical evidence in making decisions in the face of uncertainty." (Lawrence Lapin, "Statistics for Modern Business Decisions", 1973)

"The most dominant decision type [that will have to be made in an organic organization] will be decisions under uncertainty." (Henry L Tosi & Stephen J Carroll, "Management", 1976)

"The greater the uncertainty, the greater the amount of decision making and information processing. It is hypothesized that organizations have limited capacities to process information and adopt different organizing modes to deal with task uncertainty. Therefore, variations in organizing modes are actually variations in the capacity of organizations to process information and make decisions about events which cannot be anticipated in advance." (John K Galbraith, "Organization Design", 1977)

"Facts and theories are different things, not rungs in a hierarchy of increasing certainty. Facts are the world's data. Theories are structures of ideas that explain and interpret facts. Facts do not go away while scientists debate rival theories for explaining them." (Stephen J Gould "Evolution as Fact and Theory", 1981)

"Knowledge specialists may ascribe a degree of certainty to their models of the world that baffles and offends managers. Often the complexity of the world cannot be reduced to mathematical abstractions that make sense to a manager. Managers who expect complete, one-to-one correspondence between the real world and each element in a model are disappointed and skeptical." (Dale E Zand, "Information, Organization, and Power", 1981)

"But there is trouble in store for anyone who surrenders to the temptation of mistaking an elegant hypothesis for a certainty: the readers of detective stories know this quite well." (Primo Levi, "The Periodic Table", 1984)

"Probability is the mathematics of uncertainty. Not only do we constantly face situations in which there is neither adequate data nor an adequate theory, but many modem theories have uncertainty built into their foundations. Thus learning to think in terms of probability is essential. Statistics is the reverse of probability (glibly speaking). In probability you go from the model of the situation to what you expect to see; in statistics you have the observations and you wish to estimate features of the underlying model." (Richard W Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"Probability plays a central role in many fields, from quantum mechanics to information theory, and even older fields use probability now that the presence of 'noise' is officially admitted. The newer aspects of many fields start with the admission of uncertainty." (Richard Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"Models are often used to decide issues in situations marked by uncertainty. However statistical differences from data depend on assumptions about the process which generated these data. If the assumptions do not hold, the inferences may not be reliable either. This limitation is often ignored by applied workers who fail to identify crucial assumptions or subject them to any kind of empirical testing. In such circumstances, using statistical procedures may only compound the uncertainty." (David A Greedman & William C Navidi, "Regression Models for Adjusting the 1980 Census", Statistical Science Vol. 1 (1), 1986)

"The mathematical theories generally called 'mathematical theories of chance' actually ignore chance, uncertainty and probability. The models they consider are purely deterministic, and the quantities they study are, in the final analysis, no more than the mathematical frequencies of particular configurations, among all equally possible configurations, the calculation of which is based on combinatorial analysis. In reality, no axiomatic definition of chance is conceivable." (Maurice Allais, "An Outline of My Main Contributions to Economic Science", [Noble lecture] 1988)

"The worst, i.e., most dangerous, feature of 'accepting the null hypothesis' is the giving up of explicit uncertainty. [...] Mathematics can sometimes be put in such black-and-white terms, but our knowledge or belief about the external world never can." (John Tukey, "The Philosophy of Multiple Comparisons", Statistical Science Vol. 6 (1), 1991)

"In nonlinear systems - and the economy is most certainly nonlinear - chaos theory tells you that the slightest uncertainty in your knowledge of the initial conditions will often grow inexorably. After a while, your predictions are nonsense." (M Mitchell Waldrop, "Complexity: The Emerging Science at the Edge of Order and Chaos", 1992)

"It is in the nature of theoretical science that there can be no such thing as certainty. A theory is only ‘true’ for as long as the majority of the scientific community maintain the view that the theory is the one best able to explain the observations." (Jim Baggott, "The Meaning of Quantum Theory", 1992)

"Statistics as a science is to quantify uncertainty, not unknown." (Chamont Wang, "Sense and Nonsense of Statistical Inference: Controversy, Misuse, and Subtlety", 1993)

"There is a new science of complexity which says that the link between cause and effect is increasingly difficult to trace; that change (planned or otherwise) unfolds in non-linear ways; that paradoxes and contradictions abound; and that creative solutions arise out of diversity, uncertainty and chaos." (Andy P Hargreaves & Michael Fullan, "What’s Worth Fighting for Out There?", 1998)

"Information entropy has its own special interpretation and is defined as the degree of unexpectedness in a message. The more unexpected words or phrases, the higher the entropy. It may be calculated with the regular binary logarithm on the number of existing alternatives in a given repertoire. A repertoire of 16 alternatives therefore gives a maximum entropy of 4 bits. Maximum entropy presupposes that all probabilities are equal and independent of each other. Minimum entropy exists when only one possibility is expected to be chosen. When uncertainty, variety or entropy decreases it is thus reasonable to speak of a corresponding increase in information." (Lars Skyttner, "General Systems Theory: Ideas and Applications", 2001)

"Most physical systems, particularly those complex ones, are extremely difficult to model by an accurate and precise mathematical formula or equation due to the complexity of the system structure, nonlinearity, uncertainty, randomness, etc. Therefore, approximate modeling is often necessary and practical in real-world applications. Intuitively, approximate modeling is always possible. However, the key questions are what kind of approximation is good, where the sense of 'goodness' has to be first defined, of course, and how to formulate such a good approximation in modeling a system such that it is mathematically rigorous and can produce satisfactory results in both theory and applications." (Guanrong Chen & Trung Tat Pham, "Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems", 2001)

"Any scientific data without (a stated) uncertainty is of no avail. Therefore the analysis and description of uncertainty are almost as important as those of the data value itself . It should be clear that the uncertainty itself also has an uncertainty – due to its nature as a scientific quantity – and so on. The uncertainty of an uncertainty is generally not determined." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"For linear dependences the main information usually lies in the slope. It is obvious that those points that lie far apart have the strongest influence on the slope if all points have the same uncertainty. In this context we speak of the strong leverage of distant points; when determining the parameter 'slope' these distant points carry more effective weight. Naturally, this weight is distinct from the 'statistical' weight usually used in regression analysis." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"As uncertainties of scientific data values are nearly as important as the data values themselves, it is usually not acceptable that a best estimate is only accompanied by an estimated uncertainty. Therefore, only the size of nondominant uncertainties should be estimated. For estimating the size of a nondominant uncertainty we need to find its upper limit, i.e., we want to be as sure as possible that the uncertainty does not exceed a certain value." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Before best estimates are extracted from data sets by way of a regression analysis, the uncertainties of the individual data values must be determined.In this case care must be taken to recognize which uncertainty components are common to all the values, i.e., those that are correlated (systematic)." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In fact, H [entropy] measures the amount of uncertainty that exists in the phenomenon. If there were only one event, its probability would be equal to 1, and H would be equal to 0 - that is, there is no uncertainty about what will happen in a phenomenon with a single event because we always know what is going to occur. The more events that a phenomenon possesses, the more uncertainty there is about the state of the phenomenon. In other words, the more entropy, the more information." (Diego Rasskin-Gutman, "Chess Metaphors: Artificial Intelligence and the Human Mind", 2009)

"Data always vary randomly because the object of our inquiries, nature itself, is also random. We can analyze and predict events in nature with an increasing amount of precision and accuracy, thanks to improvements in our techniques and instruments, but a certain amount of random variation, which gives rise to uncertainty, is inevitable." (Alberto Cairo, "The Functional Art", 2011)

"The storytelling mind is allergic to uncertainty, randomness, and coincidence. It is addicted to meaning. If the storytelling mind cannot find meaningful patterns in the world, it will try to impose them. In short, the storytelling mind is a factory that churns out true stories when it can, but will manufacture lies when it can't." (Jonathan Gottschall, "The Storytelling Animal: How Stories Make Us Human", 2012)

"The data is a simplification - an abstraction - of the real world. So when you visualize data, you visualize an abstraction of the world, or at least some tiny facet of it. Visualization is an abstraction of data, so in the end, you end up with an abstraction of an abstraction, which creates an interesting challenge. […] Just like what it represents, data can be complex with variability and uncertainty, but consider it all in the right context, and it starts to make sense." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Without precise predictability, control is impotent and almost meaningless. In other words, the lesser the predictability, the harder the entity or system is to control, and vice versa. If our universe actually operated on linear causality, with no surprises, uncertainty, or abrupt changes, all future events would be absolutely predictable in a sort of waveless orderliness." (Lawrence K Samuels, "Defense of Chaos", 2013)

"We have minds that are equipped for certainty, linearity and short-term decisions, that must instead make long-term decisions in a non-linear, probabilistic world." (Paul Gibbons, "The Science of Successful Organizational Change", 2015)

"The greater the uncertainty, the bigger the gap between what you can measure and what matters, the more you should watch out for overfitting - that is, the more you should prefer simplicity." (Brian Christian & Thomas L Griffiths, "Algorithms to Live By: The Computer Science of Human Decisions", 2016)

"A notable difference between many fields and data science is that in data science, if a customer has a wish, even an experienced data scientist may not know whether it’s possible. Whereas a software engineer usually knows what tasks software tools are capable of performing, and a biologist knows more or less what the laboratory can do, a data scientist who has not yet seen or worked with the relevant data is faced with a large amount of uncertainty, principally about what specific data is available and about how much evidence it can provide to answer any given question. Uncertainty is, again, a major factor in the data scientific process and should be kept at the forefront of your mind when talking with customers about their wishes."  (Brian Godsey, "Think Like a Data Scientist", 2017)

"The elements of this cloud of uncertainty (the set of all possible errors) can be described in terms of probability. The center of the cloud is the number zero, and elements of the cloud that are close to zero are more probable than elements that are far away from that center. We can be more precise in this definition by defining the cloud of uncertainty in terms of a mathematical function, called the probability distribution." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Uncertainty is an adversary of coldly logical algorithms, and being aware of how those algorithms might break down in unusual circumstances expedites the process of fixing problems when they occur - and they will occur. A data scientist’s main responsibility is to try to imagine all of the possibilities, address the ones that matter, and reevaluate them all as successes and failures happen." (Brian Godsey, "Think Like a Data Scientist", 2017)

"Bootstrapping provides an intuitive, computer-intensive way of assessing the uncertainty in our estimates, without making strong assumptions and without using probability theory. But the technique is not feasible when it comes to, say, working out the margins of error on unemployment surveys of 100,000 people. Although bootstrapping is a simple, brilliant and extraordinarily effective idea, it is just too clumsy to bootstrap such large quantities of data, especially when a convenient theory exists that can generate formulae for the width of uncertainty intervals." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Entropy is a measure of amount of uncertainty or disorder present in the system within the possible probability distribution. The entropy and amount of unpredictability are directly proportional to each other." (G Suseela & Y Asnath V Phamila, "Security Framework for Smart Visual Sensor Networks", 2019)

"Estimates based on data are often uncertain. If the data were intended to tell us something about a wider population (like a poll of voting intentions before an election), or about the future, then we need to acknowledge that uncertainty. This is a double challenge for data visualization: it has to be calculated in some meaningful way and then shown on top of the data or statistics without making it all too cluttered." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Uncertainty confuses many people because they have the unreasonable expectation that science and statistics will unearth precise truths, when all they can yield is imperfect estimates that can always be subject to changes and updates." (Alberto Cairo, "How Charts Lie", 2019)

"We over-fit when we go too far in adapting to local circumstances, in a worthy but misguided effort to be ‘unbiased’ and take into account all the available information. Usually we would applaud the aim of being unbiased, but this refinement means we have less data to work on, and so the reliability goes down. Over-fitting therefore leads to less bias but at a cost of more uncertainty or variation in the estimates, which is why protection against over-fitting is sometimes known as the bias/variance trade-off." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what anyone man will be up to, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician." (Sir Arthur C Doyle)

More quotes on" Certainty" at the-web-of-knowledge.blogspot.com

🔭Data Science: Hypothesis (Just the Quotes)

"[…] it is not necessary that these hypotheses should be true, or even probably; but it is enough if they provide a calculus which fits the observations […]" (Andrew Osiander, "On the Revolutions of the Heavenly Spheres", 1543)

"The art of discovering the causes of phenomena, or true hypothesis, is like the art of decyphering, in which an ingenious conjecture greatly shortens the road." (Gottfried W Leibniz, "New Essays Concerning Human Understanding", 1704) [published 1765]

"In order to shake a hypothesis, it is sometimes not necessary to do anything more than push it as far as it will go." (Denis Diderot, "On the Interpretation of Nature", 1753)

"No hypothesis can lay claim to any value unless it assembles many phenomena under one concept." (Johann Wolfgang von Goethe, [letter to Sommering] 1795)

"Induction, analogy, hypotheses founded upon facts and rectified continually by new observations, a happy tact given by nature and strengthened by numerous comparisons of its indications with experience, such are the principal means for arriving at truth." (Pierre-Simon Laplace, "A Philosophical Essay on Probabilities", 1814)

"The hypothesis is like the captain, and the observations like the soldiers of an army: while he appears to command them, and in this way to work his own will, he does in fact derive all his power of conquest from their obedience, and becomes helpless and useless if they mutiny." (William Whewell, "Philosophy of the Inductive Sciences", 1840)

"The process of scientific discovery is cautious and rigorous, not by abstaining from hypothesis, but by rigorously comparing hypotheses with facts, and by resolutely rejecting all which the comparison does not confirm." (William Whewell, "The Philosophy of the Inductive Sciences Founded Upon Their History" Vol. 2, 1840)

"When the hypothesis, of itself and without adjustment for the purpose, gives us the rule and reason of a class of facts not contemplated in its construction, we have a criterion of its reality, which has never yet been produced in favour of falsehood." (William Whewell, "The Philosophy of the Inductive Sciences", 1840) 

"An hypothesis being a mere supposition, there are no other limits to hypotheses than those of the human imagination; we may, if we please, imagine, by way of accounting for an effect, some cause of a kind utterly unknown, and acting according to a law altogether fictitious." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1843)

"It appears, then, to be a condition of a genuinely scientific hypothesis, that it be not destined always to remain an hypothesis, but be certain to be either proved or disproved by [...] comparison with observed facts." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1843)

"The hypothesis, by suggesting observations and experiments, puts us upon the road to that independent evidence if it be really attainable; and till it be attained, the hypothesis ought not to count for more than a suspicion." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1843)

"The rules of scientific investigation always require us, when we enter the domains of conjecture, to adopt that hypothesis by which the greatest number of known facts and phenomena may be reconciled." (Matthew F Maury, "The Physical Geography of the Sea", 1855) 

"An anticipative idea or an hypothesis is, then, the necessary starting point for all experimental reasoning. Without it, we could not make any investigation at all nor learn anything; we could only pile up sterile observations. If we experiment without a preconceived idea, we should move at random […]" (Claude Bernard, "An Introduction to the Study of Experimental Medicine", 1865)

"In scientific investigations, it is permitted to invent any hypothesis and, if it explains various large and independent classes of facts, it rises to the ranks of a well-grounded theory." (Charles Darwin, "The Variations of Animals and Plants Under Domestication" Vol. 1, 1868)

"The great tragedy of Science - the slaying of a beautiful hypothesis by an ugly fact." (Thomas H Huxley, "Biogenesis and abiogenesis", [address] 1870)

"[…] wrong hypotheses, rightly worked from, have produced more useful results than unguided observation." (Augustus de Morgan, "A Budget of Paradoxes", 1872)

"An hypothesis is only a habit - a habit of looking through a glass of one peculiar colour, which imparts its hue to all around it." (Frederick Marryat, "The King's Own", 1873) 

"A discoverer is a tester of scientific ideas; he must not only be able to imagine likely hypotheses, and to select suitable ones for investigation, but, as hypotheses may be true or untrue, he must also be competent to invent appropriate experiments for testing them, and to devise the requisite apparatus and arrangements." (George Gore, "The Art of Scientific Discovery", 1878)

"The scientific discovery appears first as the hypothesis of an analogy; and science tends to become independent of the hypothesis." (William K Clifford, "Lectures and Essays", 1879)

"Every hypothesis must derive indubitable results from mechanically well-defined assumptions by mathematically correct methods." (Ludwig Boltzmann, "Certain Questions of the Theory of Gasses", Nature Vol. 51 (1322), 1895) 

"For the truly scientific man, the hypothesis is destined solely to enable him to get the facts of nature in some definite order, an order which shall make apparent their connection with the great order and harmony which is believed to be present in the universe." (James M Baldwin, "The Processes of Life Revealed by the Microscope: A Plea for Physiological Histology", Science N.S. Vol. 2 (34), 1895)

"If the working hypothesis fails in any essential particular he [the scientist] is ready to modify or discard it. For the truly inspired investigator, one undoubted fact weighs more in the balance than a thousand theories." (James M Baldwin, "The Processes of Life Revealed by the Microscope: A Plea for Physiological Histology", Science N.S. Vol. 2 (34), 1895)

"In scientific investigations, it is permitted to invent any hypothesis and, if it explains various large and independent classes of facts, it rises to the ranks of a well-grounded theory." (Charles Darwin, "The Variations of Animals and Plants Under Domestication" Vol. 1, 1896)

"Entia non sunt multiplicanda praeter necessitatem. That is to say; before you try a complicated hypothesis, you should make quite sure that no simplification of it will explain the facts equally well." (Charles S Peirce," Pragmatism and Pragmaticism", [lecture] 1903)

"A false hypothesis, if it serve as a guide for further enquiry, may, at the right stage of science, be as useful as, or more useful than, a truer one for which acceptable evidence is not yet at hand." (William C Dampier, "Science and the Human Mind, Science in the Ancient World", 1912) 

"Without hypothesis there can be no progress in knowledge." (Max Verworn, "Irritability", 1913) 

"The great difference between induction and hypothesis is that the former infers the existence of phenomena such as we have observed in cases which are similar, while hypothesis supposes something of a different kind from what we have directly observed, and frequently something which it would be impossible for us to observe directly." (Charles S Peirce, "Chance, Love and Logic: Philosophical Essays, Deduction, Induction, Hypothesis", 1914)

"Theory is the best guide for experiment - that were it not for theory and the problems and hypotheses that come out of it, we would not know the points we wanted to verify, and hence would experiment aimlessly" (Henry Hazlitt,  "Thinking as a Science", 1916)

"A good hypothesis in science must have other properties than those of the phenomenon it is immediately invoked to explain, otherwise it is not prolific enough." (William James, "Selected Papers on Philosophy", 1918) 

"An indispensable hypothesis, even though still far from being a guarantee of success, is however the pursuit of a specific aim, whose lighted beacon, even by initial failures, is not betrayed." (Max Planck, [Nobel lecture] 1918) 

"A hypothesis or theory is clear, decisive, and positive, but it is believed by no one but the man who created it. Experimental findings, on the other hand, are messy, inexact things, which are believed by everyone except the man who did the work." (Harlow Shapley, "Review of Scientific Instruments" Vol. 6, 1922) 

"However successful a theory or law may have been in the past, directly it fails to interpret new discoveries its work is finished, and it must be discarded or modified. However plausible the hypothesis, it must be ever ready for sacrifice on the altar of observation." (Joseph W Mellor, "A Comprehensive Treatise on Inorganic and Theoretical Chemistry", 1922) 

"Hypothesis, however, is an inference based on knowledge which is insufficient to prove its high probability." (Frederick L Barry, "The Scientific Habit of Thought", 1927) 

"Abstraction is the detection of a common quality in the characteristics of a number of diverse observations […] A hypothesis serves the same purpose, but in a different way. It relates apparently diverse experiences, not by directly detecting a common quality in the experiences themselves, but by inventing a fictitious substance or process or idea, in terms of which the experience can be expressed. A hypothesis, in brief, correlates observations by adding something to them, while abstraction achieves the same end by subtracting something." (Herbert Dingle, Science and Human Experience, 1931)

"Science does not aim, primarily, at high probabilities. It aims at a high informative content, well backed by experience. But a hypothesis may be very probable simply because it tells us nothing, or very little." (Karl Popper, "The Logic of Scientific Discovery", 1934) 

"All the theories and hypotheses of empirical science share this provisional character of being established and accepted ‘until further notice’, whereas a mathematical theorem, once proved, is established once and for all; it holds with that particular certainty which no subsequent empirical discoveries, however unexpected and extraordinary, can ever affect to the slightest extent." (Carl G Hempel, "Geometry and Empirical Science", 1935)

"In relation to any experiment we may speak of this hypothesis as the null hypothesis, and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis." (Ronald Fisher, "The Design of Experiments", 1935)

"The laws of science are the permanent contributions to knowledge - the individual pieces that are fitted together in an attempt to form a picture of the physical universe in action. As the pieces fall into place, we often catch glimpses of emerging patterns, called theories; they set us searching for the missing pieces that will fill in the gaps and complete the patterns. These theories, these provisional interpretations of the data in hand, are mere working hypotheses, and they are treated with scant respect until they can be tested by new pieces of the puzzle." (Edwin P Whipple, "Experiment and Experience", [Commencement Address, California Institute of Technology] 1938)

"When two hypotheses are possible, we provisionally choose that which our minds adjudge to the simpler on the supposition that this Is the more likely to lead in the direction of the truth." (James H Jeans, "Physics and Philosophy" 3rd Ed., 1943)

"We see what we want to see, and observation conforms to hypothesis." (Bergen Evans, "The Natural History of Nonsense", 1946)

"A successful hypothesis is not necessarily a permanent hypothesis, but it is one which stimulates additional research, opens up new fields, or explains and coordinates previously unrelated facts." (Farrington Daniels, "Outlines of Physical Chemistry", 1948)

"There would be cases where we would not want to accept an hypothesis even though the evidence gives a high d. c. [degree of confirmation] score, because we are fearful of the consequences of a wrong decision." (C West Churchman, "Theory of Experimental Inference", 1948) 

"Hypothesis is a tool which can cause trouble if not used properly. We must be ready to abandon out hypothesis as soon as it is shown to be inconsistent with the facts." (William I B Beveridge, "The Art of Scientific Investigation", 1950) 

"A collection of observable concepts in a purely formal hypothesis suggesting no analogy with anything would consequently not suggest either any directions for its own development." (Mary B Hesse, "Operational Definition and Analogy in Physical Theories", British Journal for the Philosophy of Science 2 (8), 1952)

"Whenever we attempt to test a hypothesis we naturally try to avoid errors in judging it. This seems to indicate the right way of proceeding: when choosing a test we should try to minimize the frequency of errors that may be committed in applying it." (Jerzy Neyman, "Lectures and Conferences on Mathematical Statistics", 1952) 

"The only relevant test of the validity of a hypothesis is comparison of prediction with experience." (Milton Friedman, "Essays in Positive Economics", 1953)

"[…] the grand aim of all science […] is to cover the greatest possible number of empirical facts by logical deductions from the smallest possible number of hypotheses or axioms." (Albert Einstein, 1954)

"One must credit an hypothesis with all that has had to be discovered in order to demolish it." (Jean Rostand, "The substance of man", 1962)

"The formulation of a hypothesis carries with it an obligation to test it as rigorously as we can command skills to do so." (Peter Medawar, "Hypothesis and Imagination", 1963)

"Truth in science can be defined as the working hypothesis best suited to open the way to the next better one." (Konrad Lorenz, "On Aggression", 1963) 

"The validation of a model is not that it is 'true' but that it generates good testable hypotheses relevant to important problems." (Richard Levins, "The Strategy of Model Building in Population Biology", 1966)

"All testing, all confirmation and disconfirmation of a hypothesis takes place already within a system. And this system is not a more or less arbitrary and doubtful point of departure for all our arguments; no it belongs to the essence of what we call an argument. The system is not so much the point of departure, as the element in which our arguments have their life." (Ludwig Wittgenstein, "On Certainty", 1969) 

"Science consists simply of the formulation and testing of hypotheses based on observational evidence; experiments are important where applicable, but their function is merely to simplify observation by imposing controlled conditions." (Henry L Batten, "Evolution of the Earth", 1971)

"An experiment is a failure only when it also fails adequately to test the hypothesis in question, when the data it produces don't prove anything one way or the other." (Robert M Pirsig, "Zen and the Art of Motorcycle Maintenance", 1974)

"A hypothesis is empirical or scientific only if it can be tested by experience. […] A hypothesis or theory which cannot be, at least in principle, falsified by empirical observations and experiments does not belong to the realm of science." (Francisco J Ayala, "Biological Evolution: Natural Selection or Random Walk", American Scientist, 1974)

"A hypothesis will in the end become a truth when all phenomena let themselves be derived from it in a natural and in an obvious manner, when all these consequences are connected with one another and with the general reasons, in short, when that hypothesis is consistent in all its parts with itself." (Johann H Lambert, 1976)

"The essential function of a hypothesis consists in the guidance it affords to new observations and experiments, by which our conjecture is either confirmed or refuted." (Ernst Mach, "Knowledge and Error: Sketches on the Psychology of Enquiry", 1976)

"Be suspicious of a theory if more and more hypotheses are needed to support it as new facts become available, or as new considerations are brought to bear." (Sir Fred Hoyle & Nalin C Wickramasinghe, "Evolution from Space", 1981)

"All interpretations made by a scientist are hypotheses, and all hypotheses are tentative. They must forever be tested and they must be revised if found to be unsatisfactory. Hence, a change of mind in a scientist, and particularly in a great scientist, is not only not a sign of weakness but rather evidence for continuing attention to the respective problem and an ability to test the hypothesis again and again." (Ernst Mayr, "The Growth of Biological Thought: Diversity, Evolution and Inheritance", 1982)

"Don't just read it; fight it! Ask your own question, look for your own examples, dicover your own proofs. Is the hypothesis necessary? Is the converse true? What happens in the classical special case? What about the degenerate cases? Where does the proof use the hypothesis?" (Paul R Halmos, "I Want to be a Mathematician", 1985)

"Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion." (Stephen M Stigler, "Testing Hypotheses or fitting Models? Another Look at Mass Extinctions" [in "Neutral Models in Biology"], 1987)

"All science is based on models, and every scientific model comprises three distinct stages: statement of well-defined hypotheses; deduction of all the consequences of these hypotheses, and nothing but these consequences; confrontation of these consequences with observed data." (Maurice Allais, "An Outline of My Main Contributions to Economic Science", [Noble lecture] 1988)

"Any physical theory is always provisional, in the sense that it is only a hypothesis: you can never prove it. No matter how many times the results of experiments agree with some theory, you can never be sure that the next time the result will not contradict the theory." (Stephen Hawking,  "A Brief History of Time", 1988)

"The heart of the scientific method is the problem-hypothesis-test process. And, necessarily, the scientific method involves predictions. And predictions, to be useful in scientific methodology, must be subject to test empirically." (Paul Davies, "The Cosmic Blueprint: New Discoveries in Nature's Creative Ability to, Order the Universe", 1988)

"The model and the theory it represents must be accepted, at least temporarily, or rejected, depending on the agreement or disagreement between observed data and the hypotheses and implications of the model. When neither the hypotheses nor the implications of a theory can be confronted with the real world, that theory is devoid of any scientific interest. Mere logical, even mathematical, deduction remains worthless in terms of the understanding of reality if it is not closely linked to that reality." (Maurice Allais, "An Outline of My Main Contributions to Economic Science", [Noble lecture] 1988)

"A fact is a simple statement that everyone believes. It is innocent, unless found guilty. A hypothesis is a novel suggestion that no one wants to believe. It is guilty, until found effective." (Edward Teller, "Conversations on the Dark Secrets of Physics", 1991)

"Visualizations can be used to explore data, to confirm a hypothesis, or to manipulate a viewer. [...] In exploratory visualization the user does not necessarily know what he is looking for. This creates a dynamic scenario in which interaction is critical. [...] In a confirmatory visualization, the user has a hypothesis that needs to be tested. This scenario is more stable and predictable. System parameters are often predetermined." (Usama Fayyad et al, "Information Visualization in Data Mining and Knowledge Discovery", 2002) 

"[…] a conceptual model is a diagram connecting variables and constructs based on theory and logic that displays the hypotheses to be tested." (Mary W Celsi et al, "Essentials of Business Research Methods", 2011)

"Data science is an iterative process. It starts with a hypothesis (or several hypotheses) about the system we’re studying, and then we analyze the information. The results allow us to reject our initial hypotheses and refine our understanding of the data. When working with thousands of fields and millions of rows, it’s important to develop intuitive ways to reject bad hypotheses quickly." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Observation and experiment, without a rational hypothesis, is like a man groping at objects at random with his eyes shut." (Henry P Tappan, "Elements of Logic", 2015)

"A hypothesis is a starting point for an investigation. When you hypothesize, you make a claim about why something might be the case, based on limited data, to offer an explanation or a path forward. You wouldn’t make a proposition about something you are certain of. You may not have enough evidence yet to even convince you that it’s true. But making such a claim puts a stake in the ground that suggests a path for focused analysis." (Eben Hewitt, "Technology Strategy Patterns: Architecture as strategy" 2nd Ed., 2019)

"Data science is, in reality, something that has been around for a very long time. The desire to utilize data to test, understand, experiment, and prove out hypotheses has been around for ages. To put it simply: the use of data to figure things out has been around since a human tried to utilize the information about herds moving about and finding ways to satisfy hunger. The topic of data science came into popular culture more and more as the advent of ‘big data’ came to the forefront of the business world." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Pure data science is the use of data to test, hypothesize, utilize statistics and more, to predict, model, build algorithms, and so forth. This is the technical part of the puzzle. We need this within each organization. By having it, we can utilize the power that these technical aspects bring to data and analytics. Then, with the power to communicate effectively, the analysis can flow throughout the needed parts of an organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)
