SQL Troubles

09 December 2018

🔭Data Science: Failure (Just the Quotes)

"Every detection of what is false directs us towards what is true: every trial exhausts some tempting form of error. Not only so; but scarcely any attempt is entirely a failure; scarcely any theory, the result of steady thought, is altogether false; no tempting form of error is without some latent charm derived from truth." (William Whewell, "Lectures on the History of Moral Philosophy in England", 1852)

"Scarcely any attempt is entirely a failure; scarcely any theory, the result of steady thought, is altogether false; no tempting form of Error is without some latent charm derived from Truth." (William Whewell, "Lectures on the History of Moral Philosophy in England", 1852)

"We learn wisdom from failure much more than from success. We often discover what will do, by finding out what will not do; and probably he who never made a mistake never made a discovery." (Samuel Smiles, "Facilities and Difficulties", 1859)

"[…] the statistical prediction of the future from the past cannot be generally valid, because whatever is future to any given past, is in tum past for some future. That is, whoever continually revises his judgment of the probability of a statistical generalization by its successively observed verifications and failures, cannot fail to make more successful predictions than if he should disregard the past in his anticipation of the future. This might be called the ‘Principle of statistical accumulation’." (Clarence I Lewis, "Mind and the World-Order: Outline of a Theory of Knowledge", 1929)

"Science condemns itself to failure when, yielding to the infatuation of the serious, it aspires to attain being, to contain it, and to possess it; but it finds its truth if it considers itself as a free engagement of thought in the given, aiming, at each discovery, not at fusion with the thing, but at the possibility of new discoveries; what the mind then projects is the concrete accomplishment of its freedom." (Simone de Beauvoir, "The Ethics of Ambiguity", 1947)

"Common sense […] may be thought of as a series of concepts and conceptual schemes which have proved highly satisfactory for the practical uses of mankind. Some of those concepts and conceptual schemes were carried over into science with only a little pruning and whittling and for a long time proved useful. As the recent revolutions in physics indicate, however, many errors can be made by failure to examine carefully just how common sense ideas should be defined in terms of what the experimenter plans to do." (James B Conant, "Science and Common Sense", 1951)

"Catastrophes are often stimulated by the failure to feel the emergence of a domain, and so what cannot be felt in the imagination is experienced as embodied sensation in the catastrophe. (William I Thompson, "Gaia, a Way of Knowing: Political Implications of the New Biology", 1987)

"What about confusing clutter? Information overload? Doesn't data have to be ‘boiled down’ and ‘simplified’? These common questions miss the point, for the quantity of detail is an issue completely separate from the difficulty of reading. Clutter and confusion are failures of design, not attributes of information." (Edward R Tufte, "Envisioning Information", 1990)

"When a system is predictable, it is already performing as consistently as possible. Looking for assignable causes is a waste of time and effort. Instead, you can meaningfully work on making improvements and modifications to the process. When a system is unpredictable, it will be futile to try and improve or modify the process. Instead you must seek to identify the assignable causes which affect the system. The failure to distinguish between these two different courses of action is a major source of confusion and wasted effort in business today." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"[…] in cybernetics, control is seen not as a function of one agent over something else, but as residing within circular causal networks, maintaining stabilities in a system. Circularities have no beginning, no end and no asymmetries. The control metaphor of communication, by contrast, punctuates this circularity unevenly. It privileges the conceptions and actions of a designated controller by distinguishing between messages sent in order to cause desired effects and feedback that informs the controller of successes or failures." (Klaus Krippendorff, "On Communicating: Otherness, Meaning, and Information", 2009)

"To get a true understanding of the work of mathematicians, and the need for proof, it is important for you to experiment with your own intuitions, to see where they lead, and then to experience the same failures and sense of accomplishment that mathematicians experienced when they obtained the correct results. Through this, it should become clear that, when doing any level of mathematics, the roads to correct solutions are rarely straight, can be quite different, and take patience and persistence to explore." (Alan Sultan & Alice F Artzt, "The Mathematics that every Secondary School Math Teacher Needs to Know", 2011)

"A very different - and very incorrect - argument is that successes must be balanced by failures (and failures by successes) so that things average out. Every coin flip that lands heads makes tails more likely. Every red at roulette makes black more likely. […] These beliefs are all incorrect. Good luck will certainly not continue indefinitely, but do not assume that good luck makes bad luck more likely, or vice versa." (Gary Smith, "Standard Deviations", 2014)

"We are seduced by patterns and we want explanations for these patterns. When we see a string of successes, we think that a hot hand has made success more likely. If we see a string of failures, we think a cold hand has made failure more likely. It is easy to dismiss such theories when they involve coin flips, but it is not so easy with humans. We surely have emotions and ailments that can cause our abilities to go up and down. The question is whether these fluctuations are important or trivial." (Gary Smith, "Standard Deviations", 2014)

"Although cascading failures may appear random and unpredictable, they follow reproducible laws that can be quantified and even predicted using the tools of network science. First, to avoid damaging cascades, we must understand the structure of the network on which the cascade propagates. Second, we must be able to model the dynamical processes taking place on these networks, like the flow of electricity. Finally, we need to uncover how the interplay between the network structure and dynamics affects the robustness of the whole system." (Albert-László Barabási, "Network Science", 2016)

More quotes in "Failure" at the-web-of-knowledge.blogspot.com.

🔭Data Science: Distributions (Just the Quotes)

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (I) owing to the 'error of random sampling' the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals." (William S Gosset, "The Probable Error of a Mean", Biometrika, 1908)

"We know not to what are due the accidental errors, and precisely because we do not know, we are aware they obey the law of Gauss. Such is the paradox." (Henri Poincaré, "The Foundations of Science", 1913)

"The problems which arise in the reduction of data may thus conveniently be divided into three types: (i) Problems of Specification, which arise in the choice of the mathematical form of the population. (ii) When a specification has been obtained, problems of Estimation arise. These involve the choice among the methods of calculating, from our sample, statistics fit to estimate the unknow n parameters of the population. (iii) Problems of Distribution include the mathematical deduction of the exact nature of the distributions in random samples of our estimates of the parameters, and of other statistics designed to test the validity of our specification (tests of Goodness of Fit)." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"An inference, if it is to have scientific value, must constitute a prediction concerning future data. If the inference is to be made purely with the help of the distribution theory of statistics, the experiments that constitute evidence for the inference must arise from a state of statistical control; until that state is reached, there is no universe, normal or otherwise, and the statistician’s calculations by themselves are an illusion if not a delusion. The fact is that when distribution theory is not applicable for lack of control, any inference, statistical or otherwise, is little better than a conjecture. The state of statistical control is therefore the goal of all experimentation. (William E Deming, "Statistical Method from the Viewpoint of Quality Control", 1939)

"Normality is a myth; there never was, and never will be, a normal distribution. This is an overstatement from the practical point of view, but it represents a safer initial mental attitude than any in fashion during the past two decades." (Roy C Geary, "Testing for Normality", Biometrika Vol. 34, 1947)

"A good estimator will be unbiased and will converge more and more closely (in the long run) on the true value as the sample size increases. Such estimators are known as consistent. But consistency is not all we can ask of an estimator. In estimating the central tendency of a distribution, we are not confined to using the arithmetic mean; we might just as well use the median. Given a choice of possible estimators, all consistent in the sense just defined, we can see whether there is anything which recommends the choice of one rather than another. The thing which at once suggests itself is the sampling variance of the different estimators, since an estimator with a small sampling variance will be less likely to differ from the true value by a large amount than an estimator whose sampling variance is large." (Michael J Moroney, "Facts from Figures", 1951)

"Some distributions [...] are symmetrical about their central value. Other distributions have marked asymmetry and are said to be skew. Skew distributions are divided into two types. If the 'tail' of the distribution reaches out into the larger values of the variate, the distribution is said to show positive skewness; if the tail extends towards the smaller values of the variate, the distribution is called negatively skew." (Michael J Moroney, "Facts from Figures", 1951)

"[A] sequence is random if it has every property that is shared by all infinite sequences of independent samples of random variables from the uniform distribution." (Joel N Franklin, 1962)

"Mathematical statistics provides an exceptionally clear example of the relationship between mathematics and the external world. The external world provides the experimentally measured distribution curve; mathematics provides the equation (the mathematical model) that corresponds to the empirical curve. The statistician may be guided by a thought experiment in finding the corresponding equation." (Marshall J Walker, "The Nature of Scientific Thought", 1963)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (William E Deming, "On Probability as Basis for Action" American Statistician Vol. 29 (4), 1975)

"When the statistician looks at the outside world, he cannot, for example, rely on finding errors that are independently and identically distributed in approximately normal distributions. In particular, most economic and business data are collected serially and can be expected, therefore, to be heavily serially dependent. So is much of the data collected from the automatic instruments which are becoming so common in laboratories these days. Analysis of such data, using procedures such as standard regression analysis which assume independence, can lead to gross error. Furthermore, the possibility of contamination of the error distribution by outliers is always present and has recently received much attention. More generally, real data sets, especially if they are long, usually show inhomogeneity in the mean, the variance, or both, and it is not always possible to randomize." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"At the heart of probabilistic statistical analysis is the assumption that a set of data arises as a sample from a distribution in some class of probability distributions. The reasons for making distributional assumptions about data are several. First, if we can describe a set of data as a sample from a certain theoretical distribution, say a normal distribution (also called a Gaussian distribution), then we can achieve a valuable compactness of description for the data. For example, in the normal case, the data can be succinctly described by giving the mean and standard deviation and stating that the empirical (sample) distribution of the data is well approximated by the normal distribution. A second reason for distributional assumptions is that they can lead to useful statistical procedures. For example, the assumption that data are generated by normal probability distributions leads to the analysis of variance and least squares. Similarly, much of the theory and technology of reliability assumes samples from the exponential, Weibull, or gamma distribution. A third reason is that the assumptions allow us to characterize the sampling distribution of statistics computed during the analysis and thereby make inferences and probabilistic statements about unknown aspects of the underlying distribution. For example, assuming the data are a sample from a normal distribution allows us to use the t-distribution to form confidence intervals for the mean of the theoretical distribution. A fourth reason for distributional assumptions is that understanding the distribution of a set of data can sometimes shed light on the physical mechanisms involved in generating the data." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Equal variability is not always achieved in plots. For instance, if the theoretical distribution for a probability plot has a density that drops off gradually to zero in the tails (as the normal density does), then the variability of the data in the tails of the probability plot is greater than in the center. Another example is provided by the histogram. Since the height of any one bar has a binomial distribution, the standard deviation of the height is approximately proportional to the square root of the expected height; hence, the variability of the longer bars is greater." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Symmetry is also important because it can simplify our thinking about the distribution of a set of data. If we can establish that the data are (approximately) symmetric, then we no longer need to describe the shapes of both the right and left halves. (We might even combine the information from the two sides and have effectively twice as much data for viewing the distributional shape.) Finally, symmetry is important because many statistical procedures are designed for, and work best on, symmetric data." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"We will use the convenient expression 'chosen at random' to mean that the probabilities of the events in the sample space are all the same unless some modifying words are near to the words 'at random'. Usually we will compute the probability of the outcome based on the uniform probability model since that is very common in modeling simple situations. However, a uniform distribution does not imply that it comes from a random source; […]" (Richard W Hamming, "The Art of Probability for Scientists and Engineers", 1991)

"Data that are skewed toward large values occur commonly. Any set of positive measurements is a candidate. Nature just works like that. In fact, if data consisting of positive numbers range over several powers of ten, it is almost a guarantee that they will be skewed. Skewness creates many problems. There are visualization problems. A large fraction of the data are squashed into small regions of graphs, and visual assessment of the data degrades. There are characterization problems. Skewed distributions tend to be more complicated than symmetric ones; for example, there is no unique notion of location and the median and mean measure different aspects of the distribution. There are problems in carrying out probabilistic methods. The distribution of skewed data is not well approximated by the normal, so the many probabilistic methods based on an assumption of a normal distribution cannot be applied." (William S Cleveland, "Visualizing Data", 1993)

"Fitting data means finding mathematical descriptions of structure in the data. An additive shift is a structural property of univariate data in which distributions differ only in location and not in spread or shape. […] The process of identifying a structure in data and then fitting the structure to produce residuals that have the same distribution lies at the heart of statistical analysis. Such homogeneous residuals can be pooled, which increases the power of the description of the variation in the data." (William S Cleveland, "Visualizing Data", 1993)

"Many good things happen when data distributions are well approximated by the normal. First, the question of whether the shifts among the distributions are additive becomes the question of whether the distributions have the same standard deviation; if so, the shifts are additive. […] A second good happening is that methods of fitting and methods of probabilistic inference, to be taken up shortly, are typically simple and on well understood ground. […] A third good thing is that the description of the data distribution is more parsimonious." (William S Cleveland, "Visualizing Data", 1993)

"Probabilistic inference is the classical paradigm for data analysis in science and technology. It rests on a foundation of randomness; variation in data is ascribed to a random process in which nature generates data according to a probability distribution. This leads to a codification of uncertainly by confidence intervals and hypothesis tests." (William S Cleveland, "Visualizing Data", 1993)

"When distributions are compared, the goal is to understand how the distributions shift in going from one data set to the next. […] The most effective way to investigate the shifts of distributions is to compare corresponding quantiles." (William S Cleveland, "Visualizing Data", 1993)

"When the distributions of two or more groups of univariate data are skewed, it is common to have the spread increase monotonically with location. This behavior is monotone spread. Strictly speaking, monotone spread includes the case where the spread decreases monotonically with location, but such a decrease is much less common for raw data. Monotone spread, as with skewness, adds to the difficulty of data analysis. For example, it means that we cannot fit just location estimates to produce homogeneous residuals; we must fit spread estimates as well. Furthermore, the distributions cannot be compared by a number of standard methods of probabilistic inference that are based on an assumption of equal spreads; the standard t-test is one example. Fortunately, remedies for skewness can cure monotone spread as well." (William S Cleveland, "Visualizing Data", 1993)

"A normal distribution is most unlikely, although not impossible, when the observations are dependent upon one another - that is, when the probability of one event is determined by a preceding event. The observations will fail to distribute themselves symmetrically around the mean." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Linear regression assumes that in the population a normal distribution of error values around the predicted Y is associated with each X value, and that the dispersion of the error values for each X value is the same. The assumptions imply normal and similarly dispersed error distributions." (Fred C Pampel, "Linear Regression: A primer", 2000)

"The principle of maximum entropy is employed for estimating unknown probabilities (which cannot be derived deductively) on the basis of the available information. According to this principle, the estimated probability distribution should be such that its entropy reaches maximum within the constraints of the situation, i.e., constraints that represent the available information. This principle thus guarantees that no more information is used in estimating the probabilities than available." (George J Klir & Doug Elias, "Architecture of Systems Problem Solving" 2nd Ed, 2003)

"The principle of minimum entropy is employed in the formulation of resolution forms and related problems. According to this principle, the entropy of the estimated probability distribution, conditioned by a particular classification of the given events (e.g., states of the variable involved), is minimum subject to the constraints of the situation. This principle thus guarantees that all available information is used, as much as possible within the given constraints (e.g., required number of states), in the estimation of the unknown probabilities." (George J Klir & Doug Elias, "Architecture of Systems Problem Solving" 2nd Ed, 2003)

"In the laws of probability theory, likelihood distributions are fixed properties of a hypothesis. In the art of rationality, to explain is to anticipate. To anticipate is to explain." (Eliezer S. Yudkowsky, "A Technical Explanation of Technical Explanation", 2005)

"The central limit theorem says that, under conditions almost always satisfied in the real world of experimentation, the distribution of such a linear function of errors will tend to normality as the number of its components becomes large. The tendency to normality occurs almost regardless of the individual distributions of the component errors. An important proviso is that several sources of error must make important contributions to the overall error and that no particular source of error dominate the rest." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)

"Two things explain the importance of the normal distribution: (1) The central limit effect that produces a tendency for real error distributions to be 'normal like'. (2) The robustness to nonnormality of some common statistical procedures, where 'robustness' means insensitivity to deviations from theoretical normality." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)

"For some scientific data the true value cannot be given by a constant or some straightforward mathematical function but by a probability distribution or an expectation value. Such data are called probabilistic. Even so, their true value does not change with time or place, making them distinctly different from most statistical data of everyday life." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In error analysis the so-called 'chi-squared' is a measure of the agreement between the uncorrelated internal and the external uncertainties of a measured functional relation. The simplest such relation would be time independence. Theory of the chi-squared requires that the uncertainties be normally distributed. Nevertheless, it was found that the test can be applied to most probability distributions encountered in practice." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"To fulfill the requirements of the theory underlying uncertainties, variables with random uncertainties must be independent of each other and identically distributed. In the limiting case of an infinite number of such variables, these are called normally distributed. However, one usually speaks of normally distributed variables even if their number is finite." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Traditional statistics is strong in devising ways of describing data and inferring distributional parameters from sample. Causal inference requires two additional ingredients: a science-friendly language for articulating causal knowledge, and a mathematical machinery for processing that knowledge, combining it with data and drawing new causal conclusions about a phenomenon." (Judea Pearl, "Causal inference in statistics: An overview", Statistics Surveys 3, 2009)

"The elements of this cloud of uncertainty (the set of all possible errors) can be described in terms of probability. The center of the cloud is the number zero, and elements of the cloud that are close to zero are more probable than elements that are far away from that center. We can be more precise in this definition by defining the cloud of uncertainty in terms of a mathematical function, called the probability distribution." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"It is not enough to give a single summary for a distribution - we need to have an idea of the spread, sometimes known as the variability. [...] The range is a natural choice, but is clearly very sensitive to extreme values [...] In contrast the inter-quartile range (IQR) is unaffected by extremes. This is the distance between the 25th and 75th percentiles of the data and so contains the ‘central half’ of the numbers [...] Finally the standard deviation is a widely used measure of spread. It is the most technically complex measure, but is only really appropriate for well-behaved symmetric data since it is also unduly influenced by outlying values." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"[...] the Central Limit Theorem [...] says that the distribution of sample means tends towards the form of a normal distribution with increasing sample size, almost regardless of the shape of the original data distribution." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"There is no ‘correct’ way to display sets of numbers: each of the plots we have used has some advantages: strip-charts show individual points, box-and-whisker plots are convenient for rapid visual summaries, and histograms give a good feel for the underlying shape of the data distribution." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

More quotes on "Distributions" at the-web-of-knowledge.blogspot.com.

🔭Data Science: Inference (Just the Quotes)

"Analysis is the obtaining of the thing sought by assuming it and so reasoning up to an admitted truth; synthesis is the obtaining of the thing sought by reasoning up to the inference and proof of it." (Eudoxus, cca. 4th century BC)

"Every stage of science has its train of practical applications and systematic inferences, arising both from the demands of convenience and curiosity, and from the pleasure which, as we have already said, ingenious and active-minded men feel in exercising the process of deduction." (William Whewell, "The Philosophy of the Inductive Sciences Founded Upon Their History", 1840)

"Truths are known to us in two ways: some are known directly, and of themselves; some through the medium of other truths. The former are the subject of Intuition, or Consciousness; the latter, of Inference; the latter of Inference. The truths known by Intuition are the original premises, from which all others are inferred." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1858)

"It is experience which has given us our first real knowledge of Nature and her laws. It is experience, in the shape of observation and experiment, which has given us the raw material out of which hypothesis and inference have slowly elaborated that richer conception of the material world which constitutes perhaps the chief, and certainly the most characteristic, glory of the modern mind." (Arthur J Balfour, "The Foundations of Belief", 1912)

"The only thing we know for sure about a missing data point is that it is not there, and there is nothing that the magic of statistics can do change that. The best that can be managed is to estimate the extent to which missing data have influenced the inferences we wish to draw." (Howard Wainer, "14 Conversations About Three Things", Journal of Educational and Behavioral Statistics Vol. 35(1, 2010)

"The study of inductive inference belongs to the theory of probability, since observational facts can make a theory only probable but will never make it absolutely certain." (Hans Reichenbach, "The Rise of Scientific Philosophy", 1951)

"Statistics is the name for that science and art which deals with uncertain inferences - which uses numbers to find out something about nature and experience." (Warren Weaver, 1952)

"The heart of all major discoveries in the physical sciences is the discovery of novel methods of representation and so of fresh techniques by which inferences can be drawn - and drawn in ways which fit the phenomena under investigation." (Stephen Toulmin, "The Philosophy of Science", 1957)

"Assumptions that we make, such as those concerning the form of the population sampled, are always untrue." (David R Cox, "Some problems connected with statistical inference", Annals of Mathematical Statistics 29, 1958)

"Exact truth of a null hypothesis is very unlikely except in a genuine uniformity trial." (David R Cox, "Some problems connected with statistical inference", Annals of Mathematical Statistics 29, 1958)

"[...] the test of significance has been carrying too much of the burden of scientific inference. It may well be the case that wise and ingenious investigators can find their way to reasonable conclusions from data because and in spite of their procedures. Too often, however, even wise and ingenious investigators [...] tend to credit the test of significance with properties it does not have." (David Bakan, "The test of significance in psychological research", Psychological Bulletin 66, 1966)

"[...] we need to get on with the business of generating [...] hypotheses and proceed to do investigations and make inferences which bear on them, instead of [...] testing the statistical null hypothesis in any number of contexts in which we have every reason to suppose that it is false in the first place." (David Bakan, "The test of significance in psychological research", Psychological Bulletin 66, 1966)

"An analogy is a relationship between two entities, processes, or what you will, which allows inferences to be made about one of the things, usually that about which we know least, on the basis of what we know about the other. […] The art of using analogy is to balance up what we know of the likenesses against the unlikenesses between two things, and then on the basis of this balance make an inference as to what is called the neutral analogy, that about which we do not know." (Rom Harré," The Philosophies of Science" , 1972)

"Almost all efforts at data analysis seek, at some point, to generalize the results and extend the reach of the conclusions beyond a particular set of data. The inferential leap may be from past experiences to future ones, from a sample of a population to the whole population, or from a narrow range of a variable to a wider range. The real difficulty is in deciding when the extrapolation beyond the range of the variables is warranted and when it is merely naive. As usual, it is largely a matter of substantive judgment - or, as it is sometimes more delicately put, a matter of 'a priori nonstatistical considerations'." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (W Edwards Deming, "On Probability as Basis for Action", American Statistician, Volume 29, Number 4, November 1975)

"The advantage of semantic networks over standard logic is that some selected set of the possible inferences can be made in a specialized and efficient way. If these correspond to the inferences that people make naturally, then the system will be able to do a more natural sort of reasoning than can be easily achieved using formal logical deduction." (Avron Barr, Natural Language Understanding, AI Magazine Vol. 1 (1), 1980)

"Another reason for the applied statistician to care about Bayesian inference is that consumers of statistical answers, at least interval estimates, commonly interpret them as probability statements about the possible values of parameters. Consequently, the answers statisticians provide to consumers should be capable of being interpreted as approximate Bayesian statements." (Donald B Rubin, "Bayesianly justifiable and relevant frequency calculations for the applied statistician", Annals of Statistics 12(4), 1984)

"The grotesque emphasis on significance tests in statistics courses of all kinds [...] is taught to people, who if they come away with no other notion, will remember that statistics is about tests for significant differences. [...] The apparatus on which their statistics course has been constructed is often worse than irrelevant, it is misleading about what is important in examining data and making inferences." (John A Nelder, "Discussion of Dr Chatfield’s paper", Journal of the Royal Statistical Society A 148, 1985)

"Models are often used to decide issues in situations marked by uncertainty. However statistical differences from data depend on assumptions about the process which generated these data. If the assumptions do not hold, the inferences may not be reliable either. This limitation is often ignored by applied workers who fail to identify crucial assumptions or subject them to any kind of empirical testing. In such circumstances, using statistical procedures may only compound the uncertainty." (David A Greedman & William C Navidi, "Regression Models for Adjusting the 1980 Census", Statistical Science Vol. 1 (1), 1986)

"It is difficult to distinguish deduction from what in other circumstances is called problem-solving. And concept learning, inference, and reasoning by analogy are all instances of inductive reasoning. (Detectives typically induce, rather than deduce.) None of these things can be done separately from each other, or from anything else. They are pseudo-categories." (Frank Smith, "To Think: In Language, Learning and Education", 1990)

"No one has ever shown that he or she had a free lunch. Here, of course, 'free lunch' means 'usefulness of a model that is locally easy to make inferences from'. (John Tukey, "Issues relevant to an honest account of data-based inference, partially in the light of Laurie Davies’ paper", 1993)

"In the design of experiments, one has to use some informal prior knowledge. How does one construct blocks in a block design problem for instance? It is stupid to think that use is not made of a prior. But knowing that this prior is utterly casual, it seems ludicrous to go through a lot of integration, etc., to obtain ‘exact’ posterior probabilities resulting from this prior. So, I believe the situation with respect to Bayesian inference and with respect to inference, in general, has not made progress. Well, Bayesian statistics has led to a great deal of theoretical research. But I don’t see any real utilizations in applications, you know. Now no one, as far as I know, has examined the question of whether the inferences that are obtained are, in fact, realized in the predictions that they are used to make." (Oscar Kempthorne, "A conversation with Oscar Kempthorne", Statistical Science vol. 10, 1995)

"The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions." (Fergus Daly et al, "Elements of Statistics", 1995)

"Theories rarely arise as patient inferences forced by accumulated facts. Theories are mental constructs potentiated by complex external prods (including, in idealized cases, a commanding push from empirical reality)." (Stephen J Gould, "Leonardo's Mountain of Clams and the Diet of Worms" , 1998)

"Let us regard a proof of an assertion as a purely mechanical procedure using precise rules of inference starting with a few unassailable axioms. This means that an algorithm can be devised for testing the validity of an alleged proof simply by checking the successive steps of the argument; the rules of inference constitute an algorithm for generating all the statements that can be deduced in a finite number of steps from the axioms." (Edward Beltrami, "What is Random?: Chaos and Order in Mathematics and Life", 1999)

"[…] philosophical theories are structured by conceptual metaphors that constrain which inferences can be drawn within that philosophical theory. The (typically unconscious) conceptual metaphors that are constitutive of a philosophical theory have the causal effect of constraining how you can reason within that philosophical framework." (George Lakoff, "Philosophy in the Flesh: The Embodied Mind and its Challenge to Western Thought", 1999)

"Even if our cognitive maps of causal structure were perfect, learning, especially double-loop learning, would still be difficult. To use a mental model to design a new strategy or organization we must make inferences about the consequences of decision rules that have never been tried and for which we have no data. To do so requires intuitive solution of high-order nonlinear differential equations, a task far exceeding human cognitive capabilities in all but the simplest systems." (John D Sterman, "Business Dynamics: Systems thinking and modeling for a complex world", 2000)

"Bayesian inference is a controversial approach because it inherently embraces a subjective notion of probability. In general, Bayesian methods provide no guarantees on long run performance." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The Bayesian approach is based on the following postulates: (B1) Probability describes degree of belief, not limiting frequency. As such, we can make probability statements about lots of things, not just data which are subject to random variation. […] (B2) We can make probability statements about parameters, even though they are fixed constants. (B3) We make inferences about a parameter θ by producing a probability distribution for θ. Inferences, such as point estimates and interval estimates, may then be extracted from this distribution." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Statistical inference, or 'learning' as it is called in computer science, is the process of using data to infer the distribution that generated the data." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"A mental model is conceived […] as a knowledge structure possessing slots that can be filled not only with empirically gained information but also with ‘default assumptions’ resulting from prior experience. These default assumptions can be substituted by updated information so that inferences based on the model can be corrected without abandoning the model as a whole. Information is assimilated to the slots of a mental model in the form of ‘frames’ which are understood here as ‘chunks’ of knowledge with a well-defined meaning anchored in a given body of shared knowledge." (Jürgen Renn, "Before the Riemann Tensor: The Emergence of Einstein’s Double Strategy", "The Universe of General Relativity" Ed. by A.J. Kox & Jean Eisenstaedt, 2005)

"Statistics is the branch of mathematics that uses observations and measurements called data to analyze, summarize, make inferences, and draw conclusions based on the data gathered." (Allan G Bluman, "Probability Demystified", 2005)

"In specific cases, we think by applying mental rules, which are similar to rules in computer programs. In most of the cases, however, we reason by constructing, inspecting, and manipulating mental models. These models and the processes that manipulate them are the basis of our competence to reason. In general, it is believed that humans have the competence to perform such inferences error-free. Errors do occur, however, because reasoning performance is limited by capacities of the cognitive system, misunderstanding of the premises, ambiguity of problems, and motivational factors. Moreover, background knowledge can significantly influence our reasoning performance. This influence can either be facilitation or an impedance of the reasoning process." (Carsten Held et al, "Mental Models and the Mind", 2006)

"One of the classical assumptions in linear regression analysis is that of equal variance, which is frequently referred to as homoscedasticity. However, this assumption may not be valid in data analysis arising from many fields (e.g., economics, finance, engineering, and biological science). When heteroscedasticity (nonconstant variance) occurs, the statistical inferences and predictions via the ordinary least squares method are often not reliable. Therefore, it is crucial to study the heteroscedastic error structure in linear model fitting." (Xiaogang Su et al, "Treed Variance", Journal of Computational and Graphical Statistics, Vol. 15 (2), 2006)

"[…] statistics is the key discipline for predicting the future or for making inferences about the unknown, or for producing convenient summaries of data." (David J Hand, "Statistics: A Very Short Introduction", 2008)

"When statistical inferences, such as p-values, follow extensive looks at the data, they no longer have their usual interpretation. Ignoring this reality is dishonest: it is like painting a bull’s eye around the landing spot of your arrow. This is known in some circles as p-hacking, and much has been written about its perils and pitfalls." (Robert E Kass et all, "Ten Simple Rules for Effective Statistical Practice", PLoS Comput Biol 12(6), 2016)

"Inference is to bring about a new thought, which in logic amounts to drawing a conclusion, and more generally involves using what we already know, and what we see or observe, to update prior beliefs. […] Inference is also a leap of sorts, deemed reasonable […] Inference is a basic cognitive act for intelligent minds. If a cognitive agent (a person, an AI system) is not intelligent, it will infer badly. But any system that infers at all must have some basic intelligence, because the very act of using what is known and what is observed to update beliefs is inescapably tied up with what we mean by intelligence. If an AI system is not inferring at all, it doesn’t really deserve to be called AI." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"In statistical inference and machine learning, we often talk about estimates and estimators. Estimates are basically our best guesses regarding some quantities of interest given (finite) data. Estimators are computational devices or procedures that allow us to map between a given (finite) data sample and an estimate of interest." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

"The basic goal of causal inference is to estimate the causal effect of one set of variables on another. In most cases, to do it accurately, we need to know which variables we should control for. [...] to accurately control for confounders, we need to go beyond the realm of pure statistics and use the information about the data-generating process, which can be encoded as a (causal) graph. In this sense, the ability to translate between graphical and statistical properties is central to causal inference." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

"Statistics is the science, the art, the philosophy, and the technique of making inferences from the particular to the general." (John W Tukey)

"The old rule of trusting the Central Limit Theorem if the sample size is larger than 30 is just that–old. Bootstrap and permutation testing let us more easily do inferences for a wider variety of statistics." (Tim Hesterberg)

More quotes on "Inference" at the-web-of-knowledge.blogspot.com.

08 December 2018

🔭Data Science: Creativity (Just the Quotes)

"[…] science conceived as resting on mere sense-perception, with no other source of observation, is bankrupt, so far as concerns its claim to self-sufficiency. Science can find no individual enjoyment in nature: Science can find no aim in nature: Science can find no creativity in nature; it finds mere rules of succession. These negations are true of Natural Science. They are inherent in it methodology." (Alfred N Whitehead, "Modes of Thought", 1938)

"The design process involves a series of operations. In map design, it is convenient to break this sequence into three stages. In the first stage, you draw heavily on imagination and creativity. You think of various graphic possibilities, consider alternative ways." (Arthur H Robinson, "Elements of Cartography", 1953)

"At each level of complexity, entirely new properties appear. [And] at each stage, entirely new laws, concepts, and generalizations are necessary, requiring inspiration and creativity to just as great a degree as in the previous one." (Herb Anderson, 1972)

"Facts do not ‘speak for themselves’; they are read in the light of theory. Creative thought, in science as much as in the arts, is the motor of changing opinion. Science is a quintessentially human activity, not a mechanized, robot-like accumulation of objective information, leading by laws of logic to inescapable interpretation." (Stephen J Gould, "Ever Since Darwin", 1977)

"Science is not a heartless pursuit of objective information. It is a creative human activity, its geniuses acting more as artists than information processors. Changes in theory are not simply the derivative results of the new discoveries but the work of creative imagination influenced by contemporary social and political forces." (Stephen J Gould, "Ever Since Darwin: Reflections in Natural History", 1977)

"Science, since people must do it, is a socially embedded activity. It progresses by hunch, vision, and intuition. Much of its change through time does not record a closer approach to absolute truth, but the alteration of cultural contexts that influence it so strongly. Facts are not pure and unsullied bits of information; culture also influences what we see and how we see it. Theories, moreover, are not inexorable inductions from facts. The most creative theories are often imaginative visions imposed upon facts; the source of imagination is also strongly cultural." (Stephen J Gould, "The Mismeasure of Man", 1980)

"Some methods, such as those governing the design of experiments or the statistical treatment of data, can be written down and studied. But many methods are learned only through personal experience and interactions with other scientists. Some are even harder to describe or teach. Many of the intangible influences on scientific discovery - curiosity, intuition, creativity - largely defy rational analysis, yet they are often the tools that scientists bring to their work." (Committee on the Conduct of Science, "On Being a Scientist", 1989)

"All of engineering involves some creativity to cover the parts not known, and almost all of science includes some practical engineering to translate the abstractions into practice." (Richard W Hamming, "The Art of Probability for Scientists and Engineers", 1991)

"Good engineering is not a matter of creativity or centering or grounding or inspiration or lateral thinking, as useful as those might be, but of decoding the clever, even witty, messages the solution space carves on the corpses of the ideas in which you believed with all your heart, and then building the road to the next message." (Fred Hapgood, "Up the infinite Corridor: MIT and the Technical Imagination", 1993)

"[…] creativity is the ability to see the obvious over the long term, and not to be restrained by short-term conventional wisdom." (Arthur J Birch, "To See the Obvious", 1995)

"Creativity is just connecting things. When you ask creative people how they did something, they feel a little guilty because they didn’t really do it, they just saw something. It seemed obvious to them after a while. That’s because they were able to connect experiences they’ve had and synthesize new things." (Steve Jobs, 1996)

"The pursuit of science is more than the pursuit of understanding. It is driven by the creative urge, the urge to construct a vision, a map, a picture of the world that gives the world a little more beauty and coherence than it had before." (John A Wheeler, "Geons, Black Holes, and Quantum Foam: A Life in Physics", 1998)

"Simple observation generally gets us nowhere. It is the creative imagination that increases our understanding by finding connections between apparently unrelated phenomena, and forming logical, consistent theories to explain them. And if a theory turns out to be wrong, as many do, all is not lost. The struggle to create an imaginative, correct picture of reality frequently tells us where to go next, even when science has temporarily followed the wrong path." (Richard Morris, "The Universe, the Eleventh Dimension, and Everything: What We Know and How We Know It", 1999)

"Science, and physics in particular, has developed out of the Newtonian paradigm of mechanics. In this world view, every phenomenon we observe can be reduced to a collection of atoms or particles, whose movement is governed by the deterministic laws of nature. Everything that exists now has already existed in some different arrangement in the past, and will continue to exist so in the future. In such a philosophy, there seems to be no place for novelty or creativity." (Francis Heylighen, "The science of self-organization and adaptivity", 2001)

"Evolution moves towards greater complexity, greater elegance, greater knowledge, greater intelligence, greater beauty, greater creativity, and greater levels of subtle attributes such as love. […] Of course, even the accelerating growth of evolution never achieves an infinite level, but as it explodes exponentially it certainly moves rapidly in that direction." (Ray Kurzweil, "The Singularity is Near", 2005)

"Systemic problems trace back in the end to worldviews. But worldviews themselves are in flux and flow. Our most creative opportunity of all may be to reshape those worldviews themselves. New ideas can change everything." (Anthony Weston, "How to Re-Imagine the World", 2007)

More quotes on "Creativity" at the-web-of-knowledge.blogspot.com.

🔭Data Science: Relations (Just the Quotes)

"[It] may be laid down as a general rule that, if the result of a long series of precise observations approximates a simple relation so closely that the remaining difference is undetectable by observation and may be attributed to the errors to which they are liable, then this relation is probably that of nature." (Pierre-Simon Laplace, "Mémoire sur les Inégalites Séculaires des Planètes et des Satellites", 1787)

"Discoveries are not generally made in the order of their scientific arrangement: their connexions and relations are made out gradually; and it is only when the fermentation of invention has subsided that the whole clears into simplicity and order. " (William Whewell, "An Elementary Treatise on Mechanics" Vol. I, 1819)

"There is no inquiry which is not finally reducible to a question of Numbers; for there is none which may not be conceived of as consisting in the determination of quantities by each other, according to certain relations." (Auguste Comte, "The Positive Philosophy", 1830)

"Things of all kinds are subject to a universal law which may be called the law of large numbers. It consists in the fact that, if one observes very considerable numbers of events of the same nature, dependent on constant causes and causes which vary irregularly, sometimes in one direction, sometimes in the other, it is to say without their variation being progressive in any definite direction, one shall find, between these numbers, relations which are almost constant." (Siméon-Denis Poisson, "Poisson’s Law of Large Numbers", 1837)

"A discovery is generally an unforeseen relation not included in theory." (Claude Bernard, "An Introduction to the Study of Experimental Medicine", 1865)

"[…] deduction consists in constructing an icon or diagram the relations of whose parts shall present a complete analogy with those of the parts of the object of reasoning, of experimenting upon this image in the imagination, and of observing the result so as to discover unnoticed and hidden relations among the parts." (Charles S Peirce, 1885)

"The use of figures is, above all, then, for the purpose of making known certain relations between the objects that we study, and these relations are those which occupy the branch of geometry that we have called Analysis Situs [that is, topology], and which describes the relative situation of points and lines on surfaces, without consideration of their magnitude." (Henri Poincaré, "Analysis Situs", Journal de l'Ecole Polytechnique 1, 1895)

"Deduction is that mode of reasoning which examines the state of things asserted in the premises, forms a diagram of that state of things, perceives in the parts of the diagram relations not explicitly mentioned in the premises, satisfies itself by mental experiments upon the diagram that these relations would always subsist, or at least would do so in a certain proportion of cases, and concludes their necessary, or probable, truth." (Charles S Peirce, "Kinds of Reasoning", cca. 1896)

"Mathematicians do not study objects, but the relations between objects; to them it is a matter of indifference if these objects are replaced by others, provided that the relations do not change. Matter does not engage their attention, they are interested in form alone." (Henri Poincaré, "Science and Hypothesis", 1901)

"The laws of nature are drawn from experience, but to express them one needs a special language: for, ordinary language is too poor and too vague to express relations so subtle, so rich, so precise. Here then is the first reason why a physicist cannot dispense with mathematics: it provides him with the one language he can speak [...]" (Henri Poincaré, "The Value of Science", 1905)

"The aim of science is not things themselves, as the dogmatists in their simplicity imagine, but the relation between things." (Henri Poincaré, "Science and Hypothesis", 1905)

"But surely it is self-evident that every theory is merely a framework or scheme of concepts together with their necessary relations to one another, and that the basic elements can be constructed as one pleases." (Gottlob Frege, "On the Foundations of Geometry and Formal Theories of Arithmetic" , cca. 1903-1909)

"Statistics may be defined as numerical statements of facts by means of which large aggregates are analyzed, the relations of individual units to their groups are ascertained, comparisons are made between groups, and continuous records are maintained for comparative purposes." (Melvin T Copeland. "Statistical Methods" [in: Harvard Business Studies, Vol. III, Ed. by Melvin T Copeland, 1917])

"Observed facts must be built up, woven together, ordered, arranged, systematized into conclusions and theories by reflection and reason, if they are to have full bearing on life and the universe. Knowledge is the accumulation of facts. Wisdom is the establishment of relations. And just because the latter process is delicate and perilous, it is all the more delightful." (Gamaliel Bradford, "Darwin", 1926)

"A system is said to be coherent if every fact in the system is related every other fact in the system by relations that are not merely conjunctive. A deductive system affords a good example of a coherent system." (Lizzie S Stebbing, "A modern introduction to logic", 1930)

"To apply the category of cause and effect means to find out which parts of nature stand in this relation. Similarly, to apply the gestalt category means to find out which parts of nature belong as parts to functional wholes, to discover their position in these wholes, their degree of relative independence, and the articulation of larger wholes into sub-wholes." (Kurt Koffka, 1931)

"Analogies are useful for analysis in unexplored fields. By means of analogies an unfamiliar system may be compared with one that is better known. The relations and actions are more easily visualized, the mathematics more readily applied, and the analytical solutions more readily obtained in the familiar system." (Harry F Olson, "Dynamical Analogies", 1943)

"Given any object, relatively abstracted from its surroundings for study, the behavioristic approach consists in the examination of the output of the object and of the relations of this output to the input. By output is meant any change produced in the surroundings by the object. By input, conversely, is meant any event external to the object that modifies this object in any manner." (Arturo Rosenblueth, Norbert Wiener & Julian Bigelow, "Behavior, Purpose and Teleology", Philosophy of Science 10, 1943)

"It is important to realize that it is not the one measurement, alone, but its relation to the rest of the sequence that is of interest." (William E Deming, "Statistical Adjustment of Data", 1943)

"When the mathematician speaks of the existence of a 'functional relation' between two variable quantities, he means that they are connected by a simple 'formula that is to say, if we are told the value of one of the variable quantities we can find the value of the second quantity by substituting in the formula which tells us how they are related. [...] The thing to be clear about before we proceed further is that a functional relationship in mathematics means an exact and predictable relationship, with no ifs or buts about lt. It is useful in practice so long as the ifs and buts are only tiny voices which even the most ardent protagonist of proportional representation can ignore with a clear conscience." (Michael J Moroney, "Facts from Figures", 1951)

"The principle of complementarity states that no single model is possible which could provide a precise and rational analysis of the connections between these phenomena [before and after measurement]. In such a case, we are not supposed, for example, to attempt to describe in detail how future phenomena arise out of past phenomena. Instead, we should simply accept without further analysis the fact that future phenomena do in fact somehow manage to be produced, in a way that is, however, necessarily beyond the possibility of a detailed description. The only aim of a mathematical theory is then to predict the statistical relations, if any, connecting the phenomena." (David Bohm, "A Suggested Interpretation of the Quantum Theory in Terms of ‘Hidden’ Variables", 1952)

"Every metaphor is the tip of a submerged model. […] Use of theoretical models resembles the use of metaphors in requiring analogical transfer of a vocabulary. Metaphor and model-making reveal new relationships; both are attempts to pour new content into old bottles." (Max Black," Models and Metaphors", 1962)

"Certain properties are necessary or sufficient conditions for other properties, and the network of causal relations thus established will make the occurrence of one property at least tend, subject to the presence of other properties, to promote or inhibit the occurrence of another. Arguments from models involve those analogies which can be used to predict the occurrence of certain properties or events, and hence the relevant relations are causal, at least in the sense of implying a tendency to co-occur." (Mary B Hesse," Models and Analogies in Science", 1963)

"[…] the human reason discovers new relations between things not by deduction, but by that unpredictable blend of speculation and insight […] induction, which - like other forms of imagination - cannot be formalized." (Jacob Bronowski, "The Reach of Imagination", 1967)

"Thus, there exist models, principles, and laws that apply to generalized systems or their subclasses, irrespective of their particular kind, the nature of their component elements, and the relations or 'forces' between them. It seems legitimate to ask for a theory, not of systems of a more or less special kind, but of universal principles applying to systems in general. In this way we postulate a new discipline called General System Theory. Its subject matter is the formulation and derivation of those principles which are valid for ‘systems’ in general." (Ludwig von Bertalanffy, "General System Theory: Foundations, Development, Applications", 1968)

"You cannot sum up the behavior of the whole from the isolated parts, and you have to take into account the relations between the various subordinate systems which are super-ordinated to them in order to understand the behavior of the parts." (Ludwig von Bertalanffy, "General System Theory", 1968)

"In complex systems cause and effect are often not closely related in either time or space. The structure of a complex system is not a simple feedback loop where one system state dominates the behavior. The complex system has a multiplicity of interacting feedback loops. Its internal rates of flow are controlled by nonlinear relationships. The complex system is of high order, meaning that there are many system states (or levels). It usually contains positive-feedback loops describing growth processes as well as negative, goal-seeking loops. In the complex system the cause of a difficulty may lie far back in time from the symptoms, or in a completely different and remote part of the system. In fact, causes are usually found, not in prior events, but in the structure and policies of the system." (Jay Wright Forrester, "Urban dynamics", 1969)

"The advantages of models are, on one hand, that they force us to present a 'complete' theory by which I mean a theory taking into account all relevant phenomena and relations and, on the other hand, the confrontation with observation, that is, reality." (Jan Tinbergen, "The Use of Models: Experience," 1969)

"Self-organization can be defined as the spontaneous creation of a globally coherent pattern out of local interactions. Because of its distributed character, this organization tends to be robust, resisting perturbations. The dynamics of a self-organizing system is typically non-linear, because of circular or feedback relations between the components. Positive feedback leads to an explosive growth, which ends when all components have been absorbed into the new configuration, leaving the system in a stable, negative feedback state. Non-linear systems have in general several stable states, and this number tends to increase (bifurcate) as an increasing input of energy pushes the system farther from its thermodynamic equilibrium." (Francis Heylighen, "The Science Of Self-Organization And Adaptivity", 1970)

"A system in one perspective is a subsystem in another. But the systems view always treats systems as integrated wholes of their subsidiary components and never as the mechanistic aggregate of parts in isolable causal relations." (Ervin László, "Introduction to Systems Philosophy", 1972)

"Understandability implies that the graph will mean something to the audience. If the presentation has little meaning to the audience, it has little value. Understandability is the difference between data and information. Data are facts. Information is facts that mean something and make a difference to whoever receives them. Graphic presentation enhances understanding in a number of ways. Many people find that the visual comparison and contrast of information permit relationships to be grasped more easily. Relationships that had been obscure become clear and provide new insights." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"Organization denotes those relations that must exist among the components of a system for it to be a member of a specific class. Structure denotes the components and relations that actually constitute a particular unity and make its organization real." (Humberto Maturana, "The Tree of Knowledge", 1987)

"A semantic network or net represents knowledge as a net-like graph. An idea, event, situation or object almost always has a composite structure; this is represented in a semantic network by a corresponding structure of nodes (drawn as circles or boxes) representing conceptual units, and directed links (drawn as arrows between the nodes) representing the relations between the units." (Fritz Lehman, "Semantic Networks", Computers & Mathematics with Applications Vol. 23 (2-5), 1992)

"Understanding ecological interdependence means understanding relationships. It requires the shifts of perception that are characteristic of systems thinking - from the parts to the whole, from objects to relationships, from contents to patterns." (Fritjof Capra, "The Web of Life: A New Scientific Understanding of Living Systems", 1996)

"[Schemata are] knowledge structures that represent objects or events and provide default assumptions about their characteristics, relationships, and entailments under conditions of incomplete information." (Paul J DiMaggio, "Culture and Cognition", Annual Review of Sociology No. 23, 1997)

"We use mathematics and statistics to describe the diverse realms of randomness. From these descriptions, we attempt to glean insights into the workings of chance and to search for hidden causes. With such tools in hand, we seek patterns and relationships and propose predictions that help us make sense of the world." (Ivars Peterson, "The Jungles of Randomness: A Mathematical Safari", 1998)

"Complexity is that property of a model which makes it difficult to formulate its overall behaviour in a given language, even when given reasonably complete information about its atomic components and their inter-relations." (Bruce Edmonds, "Syntactic Measures of Complexity", 1999)

"Fuzzy relations are developed by allowing the relationship between elements of two or more sets to take on an infinite number of degrees of relationship between the extremes of 'completely related' and 'not related', which are the only degrees of relationship possible in crisp relations. In this sense, fuzzy relations are to crisp relations as fuzzy sets are to crisp sets; crisp sets and relations are more constrained realizations of fuzzy sets and relations." (Timothy J Ross & W Jerry Parkinson, "Fuzzy Set Theory, Fuzzy Logic, and Fuzzy Systems", 2002)

"There exists an alternative to reductionism for studying systems. This alternative is known as holism. Holism considers systems to be more than the sum of their parts. It is of course interested in the parts and particularly the networks of relationships between the parts, but primarily in terms of how they give rise to and sustain in existence the new entity that is the whole whether it be a river system, an automobile, a philosophical system or a quality system." (Michael C Jackson, "Systems Thinking: Creative Holism for Manager", 2003)

"A diagram is a graphic shorthand. Though it is an ideogram, it is not necessarily an abstraction. It is a representation of something in that it is not the thing itself. In this sense, it cannot help but be embodied. It can never be free of value or meaning, even when it attempts to express relationships of formation and their processes. At the same time, a diagram is neither a structure nor an abstraction of structure." (Peter Eisenman, "Written Into the Void: Selected Writings", 1990-2004, 2007)

"A conceptual model of an interactive application is, in summary: the structure of the application - the objects and their operations, attributes, and relationships; an idealized view of the how the application works – the model designers hope users will internalize; the mechanism by which users accomplish the tasks the application is intended to support." (Jeff Johnson & Austin Henderson, "Conceptual Models", 2011)

"We use the term fuzzy logic to refer to all aspects of representing and manipulating knowledge that employ intermediary truth-values. This general, commonsense meaning of the term fuzzy logic encompasses, in particular, fuzzy sets, fuzzy relations, and formal deductive systems that admit intermediary truth-values, as well as the various methods based on them." (Radim Belohlavek & George J Klir, "Concepts and Fuzzy Logic", 2011)

"Mathematical abstraction is the process of considering and manipulating operations, rules, methods and concepts divested from their reference to real world phenomena and circumstances, and also deprived from the content connected to particular applications. […] abstraction is the process of passing from things to ideas, properties and relations, to properties of relations and relations of properties, to properties of relations between properties, etc. Being a fundamental thinking process, abstraction has two faces: a logical face and evidently a psychological aspect that is the target of cognitive sciences." (Hourya B Sinaceur,"Facets and Levels of Mathematical Abstraction", Standards of Rigor in Mathematical Practice 18-1, 2014)

More quotes on "Relations" at the-web-of-knowledge.blogspot.com.

07 December 2018

🔭Data Science: Intuition (Just the Quotes)

"We study the complex in the simple; and only from the intuition of the lower can we safely proceed to the intellection of the higher degrees. The only danger lies in the leaping from low to high, with the neglect of the intervening gradations." (Samuel T Coleridge, "Physiology of Life", 1848)

"The scientific value of truth is not, however, ultimate or absolute. It rests partly on practical, partly on aesthetic interests. As our ideas are gradually brought into conformity with the facts by the painful process of selection, - for intuition runs equally into truth and into error, and can settle nothing if not controlled by experience, - we gain vastly in our command over our environment. This is the fundamental value of natural science" (George Santayana, "The Sense of Beauty: Being the Outlines of Aesthetic Theory", 1896)

"It is by logic that we prove, but by intuition that we discover. To know how to criticize is good, to know how to create is better." (Henri Poincaré, "Science and Method", 1908)

"Mathematics is merely a shorthand method of recording physical intuition and physical reasoning, but it should not be a formalism leading from nowhere to nowhere, as it is likely to be made by one who does not realize its purpose as a tool." (Charles P Steinmetz, "Transactions of the American Institute of Electrical Engineers", 1909)

"Scientific hypotheses are intuitive leaps in the dark." (Alexander Goldenweiser, "Robots or Gods: An Essay on Craft and Mind", American Journal of Sociology 37 (3), 1931)

"There is no such thing as a logical method of having new ideas or a logical reconstruction of this process […] very discovery contains an ‘irrational element’ or a ‘creative intuition’." (Karl R Popper, "The logic of scientific discover", 1934)

"Science does not mean an idle resting upon a body of certain knowledge; it means unresting endeavor and continually progressing development toward an end which the poetic intuition may apprehend, but which the intellect can never fully grasp." (Max Planck, "The Philosophy of Physics", 1936)

"It is his intuition, his mystical insight into the nature of things, rather than his reasoning which makes a great scientist." (Karl R Popper, "The Open Society and Its Enemies", 1945)

"[...] when the pioneer in science sends for the groping feelers of his thoughts, he must have a vivid intuitive imagination, for new ideas are not generated by deduction, but by an artistically creative imagination. Nevertheless, the worth of a new idea is invariably determined, not by the degree of its intuitiveness - which, incidentally, is to a major extent a matter of experience and habit - but by the scope and accuracy of the individual laws to the discovery of which it eventually leads. (Max Planck, "The Meaning and Limits of Exact Science", Science Vol. 110 (2857), 1949)

"[…] observation is not enough, and it seems to me that in science, as in the arts, there is very little worth having that does not require the exercise of intuition as well as of intelligence, the use of imagination as well as of information." (Kathleen Lonsdale, "Facts About Crystals", American Scientist Vol. 39 (4), 1951)

"All great discoveries in experimental physics have been due to the intuition of men who made free use of models, which were for them not products of the imagination, but representatives of real things." (Max Born, "Physical Reality", Philosophical Quarterly Vol. 3 (11),1953)

"Mathematicians create by acts of insight and intuition. Logic then sanctions the conquests of intuition. It is the hygiene that mathematics practice to keep its ideas healthy and strong. Moreover, the whole structure rests fundamentally on uncertain ground, the intuitions of man." (Morris Kline, "Mathematics in Western Culture", 1953)

"The construction of hypotheses is a creative act of inspiration, intuition, invention; its essence is the vision of something new in familiar material." (Milton Friedman, "Essays in Positive Economics", 1953)

"Science, then, is the attentive consideration of common experience; it is common knowledge extended and refined. Its validity is of the same order as that of ordinary perception; memory, and understanding. Its test is found, like theirs, in actual intuition, which sometimes consists in perception and sometimes in intent." (George Santayana, "The Life of Reason, or the Phases of Human Progress", 1954)

"Intuition implies the act of grasping the meaning or significance or structure of a problem without explicit reliance on the analytical apparatus of one’s craft. It is the intuitive mode that yields hypotheses quickly, that produces interesting combinations of ideas before their worth is known. It precedes proof: indeed, it is what the techniques of analysis and proof are designed to test and check. It is founded on a kind of combinatorial playfulness that is only possible when the consequences of error are not overpowering or sinful." (Jerome S Bruner, "On Learning Mathematics", Mathematics Teacher Vol. 53, 1960)

"The functional validity of a working hypothesis is not a priori certain, because often it is initially based on intuition. However, logical deductions from such a hypothesis provide expectations (so called prognoses) as to the circumstances under which certain phenomena will appear in nature. Such a postulate or working hypothesis can then be substantiated by additional observations or by experiments especially arranged to test details. The value of the hypothesis is strengthened if the observed facts fit the expectation within the limits of permissible error." (R Willem van Bemmelen, "The Scientific Character of Geology", The Journal of Geology Vol 69 (4), 1961)

"The most natural way to give an independence proof is to establish a model with the required properties. This is not the only way to proceed since one can attempt to deal directly and analyze the structure of proofs. However, such an approach to set theoretic questions is unnatural since all our intuition come from our belief in the natural, almost physical model of the mathematical universe." (Paul J Cohen, "Set Theory and the Continuum Hypothesis", 1966)

"Real progress in understanding nature is rarely incremental. All important advances are sudden intuitions, new principles, new ways of seeing." (Marilyn Ferguson, "The Aquarian Conspiracy: Personal and Social Transformation in the 1980s", 1980)

"[…] science must be understood as a social phenomenon, a gutsy, human enterprise, not the work of robots programmed to collect pure information. […] Science, since people must do it, is a socially embedded activity. It progresses by hunch, vision, and intuition." (Stephen J Gould, "The Mismeasure of Man", 1980)

"That is to say, intuition is not a direct perception of something existing externally and eternally. It is the effect in the mind of certain experiences of activity and manipulation of concrete objects (at a later stage, of marks on paper or even mental images)." (Philip J Davis & Reuben Hersh, "The Mathematical Experience", 1981)

"The common perception of science as a rational activity, in which one confronts the evidence of fact with an open mind, could not be more false. Facts assume significance only within a pre-existing intellectual structure, which may be based as much on intuition and prejudice as on reason." (Walter Gratzer, The Guardian, 1989)

"Intuition is the art, peculiar to the human mind, of working out the correct answer from data that is, in itself, incomplete or even, perhaps, misleading." (Isaac Asimov, "Forward the Foundation", 1993)

"Scientists reach their conclusions for the damnedest of reasons: intuition, guesses, redirections after wild-goose chases, all combing with a dollop of rigorous observation and logical reasoning to be sure […] This messy and personal side of science should not be disparaged, or covered up, by scientists for two major reasons. First, scientists should proudly show this human face to display their kinship with all other modes of creative human thought […] Second, while biases and references often impede understanding, these mental idiosyncrasies may also serve as powerful, if quirky and personal, guides to solutions." (Stephen J Gould, "Dinosaur in a Haystack: Reflections in natural history", 1995)

"Patterns experienced again and again become intuitions. […] Intuitive judgments are made by our use of imagery; intuition is the result of mental model building. […] The mental model used and the form of the intuition is dependent upon the question being answered." (Roger Frantz,"Two Minds", 2005)

More quotes on "Intuition" at the-web-of-knowledge.blogspot.com.

06 December 2018

🔭Data Science: Assumptions (Just the Quotes)

"Every hypothesis must derive indubitable results from mechanically well-defined assumptions by mathematically correct methods." (Ludwig Boltzmann, "Certain Questions of the Theory of Gasses", Nature Vol. 51 (1322), 1895)

"As soon as science has emerged from its initial stages, theoretical advances are no longer achieved merely by a process of arrangement. Guided by empirical data, the investigator rather develops a system of thought which, in general, is built up logically from a small number of fundamental assumptions, the so-called axioms. We call such a system of thought a theory. The theory finds the justification for its existence in the fact that it correlates a large number of single observations, and it is just here that the 'truth' of the theory lies." (Albert Einstein: "Relativity: The Special and General Theory", 1916)

"We can invent as many theories we like, and any one of them can be made to fit the facts. But that theory is always preferred which makes the fewest number of assumptions." (Albert Einstein [interview] 1929)

"[…] the process of scientific discovery may be regarded as a form of art. This is best seen in the theoretical aspects of Physical Science. The mathematical theorist builds up on certain assumptions and according to well understood logical rules, step by step, a stately edifice, while his imaginative power brings out clearly the hidden relations between its parts. A well-constructed theory is in some respects undoubtedly an artistic production." (Ernest Rutherford, 1932)

"The scientist who discovers a theory is usually guided to his discovery by guesses; he cannot name a method by means of which he found the theory and can only say that it appeared plausible to him, that he had the right hunch or that he saw intuitively which assumption would fit the facts." (Hans Reichenbach, "The Rise of Scientific Philosophy", 1951)

"We are driven to conclude that science, like mathematics, is a system of axioms, assumptions, and deductions; it may start from being, but later leaves it to itself, and ends in the formation of a hypothetical reality that has nothing to do with existence; or it is the discovery of an ideal being which is, of course, present in what we call actuality, and renders it an existence for us only by being present in it." (Poolla T Raju, "Idealistic Thought of India", 1953)

"Assumptions that we make, such as those concerning the form of the population sampled, are always untrue." (David R Cox, "Some problems connected with statistical inference", Annals of Mathematical Statistics 29, 1958)

"A model is a useful (and often indispensable) framework on which to organize our knowledge about a phenomenon. […] It must not be overlooked that the quantitative consequences of any model can be no more reliable than the a priori agreement between the assumptions of the model and the known facts about the real phenomenon. When the model is known to diverge significantly from the facts, it is self-deceiving to claim quantitative usefulness for it by appeal to agreement between a prediction of the model and observation." (John R Philip, 1966)

"Mental models are fuzzy, incomplete, and imprecisely stated. Furthermore, within a single individual, mental models change with time, even during the flow of a single conversation. The human mind assembles a few relationships to fit the context of a discussion. As debate shifts, so do the mental models. Even when only a single topic is being discussed, each participant in a conversation employs a different mental model to interpret the subject. Fundamental assumptions differ but are never brought into the open. […] A mental model may be correct in structure and assumptions but, even so, the human mind - either individually or as a group consensus - is apt to draw the wrong implications for the future." (Jay W Forrester, "Counterintuitive Behaviour of Social Systems", Technology Review, 1971)

"However, and conversely, our models fall far short of representing the world fully. That is why we make mistakes and why we are regularly surprised. In our heads, we can keep track of only a few variables at one time. We often draw illogical conclusions from accurate assumptions, or logical conclusions from inaccurate assumptions. Most of us, for instance, are surprised by the amount of growth an exponential process can generate. Few of us can intuit how to damp oscillations in a complex system." (Donella H Meadows, "Limits to Growth", 1972)

“No equation, however impressive and complex, can arrive at the truth if the initial assumptions are incorrect.” (Arthur C Clarke, “Profiles of the Future”, 1973)

"A model […] is a story with a specified structure: to explain this catch phrase is to explain what a model is. The structure is given by the logical and mathematical form of a set of postulates, the assumptions of the model. The structure forms an uninterpreted system, in much the way the postulates of a pure geometry are now commonly regarded as doing. The theorems that follow from the postulates tell us things about the structure that may not be apparent from an examination of the postulates alone." (Allan Gibbard & Hal R. Varian, "Economic Models", The Journal of Philosophy, Vol. 75, No. 11, 1978)

"The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning." (Stephen J Gould, "The Mismeasure of Man", 1980)

"The assumptions and definitions of mathematics and science come from our intuition, which is based ultimately on experience. They then get shaped by further experience in using them and are occasionally revised. They are not fixed for all eternity." (Richard Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"The model is only a suggestive metaphor, a fiction about the messy and unwieldy observations of the real world. In order for it to be persuasive, to convey a sense of credibility, it is important that it not be too complicated and that the assumptions that are made be clearly in evidence. In short, the model must be simple, transparent, and verifiable." (Edward Beltrami, "Mathematics for Dynamic Modeling", 1987)

"The most misleading assumptions are the ones you don’t even know you’re making." Douglas N Adams, "Last Chance to See", 1990)

"Each of us carries within us a worldview, a set of assumptions about how the world works - what some call a paradigm - that forms the very questions we allow ourselves to ask, and determines our view of future possibilities." (Frances M Lappé, “Rediscovering America's Values”, 1991)

"A model is something one tries to construct when one has to describe a complicated situation. A model is therefore an approximate description of reality and invariably involves many simplifying assumptions. […] models are convenient idealisations." (Ganeschan Venkataraman, "Chandrasekhar and His Limit", 1992)

"Nature behaves in ways that look mathematical, but nature is not the same as mathematics. Every mathematical model makes simplifying assumptions; its conclusions are only as valid as those assumptions. The assumption of perfect symmetry is excellent as a technique for deducing the conditions under which symmetry-breaking is going to occur, the general form of the result, and the range of possible behaviour. To deduce exactly which effect is selected from this range in a practical situation, we have to know which imperfections are present" (Ian Stewart & Martin Golubitsky, "Fearful Symmetry: Is God a Geometer?", 1992)

"Mental models are the images, assumptions, and stories which we carry in our minds of ourselves, other people, institutions, and every aspect of the world. Like a pane of glass framing and subtly distorting our vision, mental models determine what we see. Human beings cannot navigate through the complex environments of our world without cognitive ‘mental maps’; and all of these mental maps, by definition, are flawed in some way." (Peter M Senge, "The Fifth Discipline Fieldbook: Strategies and Tools for Building a Learning Organization", 1994)

"Formulation of a mathematical model is the first step in the process of analyzing the behaviour of any real system. However, to produce a useful model, one must first adopt a set of simplifying assumptions which have to be relevant in relation to the physical features of the system to be modelled and to the specific information one is interested in. Thus, the aim of modelling is to produce an idealized description of reality, which is both expressible in a tractable mathematical form and sufficiently close to reality as far as the physical mechanisms of interest are concerned." (Francois Axisa, "Discrete Systems" Vol. I, 2001)

"What is a mathematical model? One basic answer is that it is the formulation in mathematical terms of the assumptions and their consequences believed to underlie a particular ‘real world’ problem. The aim of mathematical modeling is the practical application of mathematics to help unravel the underlying mechanisms involved in, for example, economic, physical, biological, or other systems and processes." (John A Adam, "Mathematics in Nature", 2003)

“Mathematics provides a good part of the cultural context for the worlds of science and technology. Much of that context lies not only in the explicit mathematics that is used, but also in the assumptions and worldview that mathematics brings along with it.” (William Byers, “How Mathematicians Think”, 2007)

"A theory is a speculative explanation of a particular phenomenon which derives it legitimacy from conforming to the primary assumptions of the worldview of the culture in which it appears. There can be more than one theory for a particular phenomenon that conforms to a given worldview." (Michael G Jackson, "Transformative Learning for a New Worldview: Learning to Think Differently", 2008)

"In order to deal with these phenomena, we abstract from details and attempt to concentrate on the larger picture - a particular set of features of the real world or the structure that underlies the processes that lead to the observed outcomes. Models are such abstractions of reality. Models force us to face the results of the structural and dynamic assumptions that we have made in our abstractions." (Bruce Hannon and Matthias Ruth, "Dynamic Modeling of Diseases and Pests", 2009)

"The four questions of data analysis are the questions of description, probability, inference, and homogeneity. [...] Descriptive statistics are built on the assumption that we can use a single value to characterize a single property for a single universe. […] Probability theory is focused on what happens to samples drawn from a known universe. If the data happen to come from different sources, then there are multiple universes with different probability models. [...] Statistical inference assumes that you have a sample that is known to have come from one universe." (Donald J Wheeler," Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"A wide variety of statistical procedures (regression, t-tests, ANOVA) require three assumptions: (i) Normal observations or errors. (ii) Independent observations (or independent errors, which is equivalent, in normal linear models to independent observations). (iii) Equal variance - when that is appropriate (for the one-sample t-test, for example, there is nothing being compared, so equal variances do not apply)." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"For a confidence interval, the central limit theorem plays a role in the reliability of the interval because the sample mean is often approximately normal even when the underlying data is not. A prediction interval has no such protection. The shape of the interval reflects the shape of the underlying distribution. It is more important to examine carefully the normality assumption by checking the residuals […]." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Once a model has been fitted to the data, the deviations from the model are the residuals. If the model is appropriate, then the residuals mimic the true errors. Examination of the residuals often provides clues about departures from the modeling assumptions. Lack of fit - if there is curvature in the residuals, plotted versus the fitted values, this suggests there may be whole regions where the model overestimates the data and other whole regions where the model underestimates the data. This would suggest that the current model is too simple relative to some better model." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Prediction about the future assumes that the statistical model will continue to fit future data. There are several reasons this is often implausible, but it also seems clear that the model will often degenerate slowly in quality, so that the model will fit data only a few periods in the future almost as well as the data used to fit the model. To some degree, the reliability of extrapolation into the future involves subject-matter expertise." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

“A worldview is a commitment, a fundamental orientation of the heart, that can be expressed as a story or in a set of presuppositions (assumptions which may be true, partially true or entirely false) which we hold (consciously or subconsciously, consistently or inconsistently) about the basic constitution of reality, and that provides the foundations on which we live and move and have our being.” (James W Sire, “Naming the Elephant: Worldview as a Concept”, 2015)

"The social world that humans have made for themselves is so complex that the mind simplifies the world by using heuristics, customs, and habits, and by making models or assumptions about how things generally work (the ‘causal structure of the world’). And because people rely upon (and are invested in) these mental models, they usually prefer that they remain uncontested." (Dr James Brennan, "Psychological Adjustment to Illness and Injury", West of England Medical Journal Vol. 117 (2), 2018)

"Any machine learning model is trained based on certain assumptions. In general, these assumptions are the simplistic approximations of some real-world phenomena. These assumptions simplify the actual relationships between features and their characteristics and make a model easier to train. More assumptions means more bias. So, while training a model, more simplistic assumptions = high bias, and realistic assumptions that are more representative of actual phenomena = low bias." (Imran Ahmad, "40 Algorithms Every Programmer Should Know", 2020)

More quotes on "Assumptions" at the-web-of-knowledge.blogspot.com.